AI in Dubbing: Almost There, But Still a Long Way to Go!

Submitted by jpressman on Sun, 03/31/2019 - 11:01

March 31, 2019

By Jacques Barreau - Dean of Dubbing | VP, Media & Interactive Entertainment, TransPerfect Media

AI is already part of our lives. It’s in our cars, our telephones, our computers, our transportation systems, and our hospitals, and it helps us make decisions every day. But what about language dubbing?

One new AI-based system tries to change actors’ facial expressions so that they accurately match the dubbed voices, but altering the on-screen content in this way could make distributing a TV product more complicated. AI can also help the dubbing process without touching the picture at all, by assisting the dubbing actors themselves.

Translation Memory is the first AI stage in the dubbing process. It is very different from automatic computer translation such as Phrase-Based Machine Translation, in which the machine translates text phrase by phrase without understanding the wider context. A translation memory instead stores segments that have already been translated and approved, so that they can be reused.
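At its core, a translation memory is a lookup of previously approved segments, plus "fuzzy" matches for sentences that are nearly identical to one already translated. The sketch below illustrates that idea under stated assumptions: the memory is a plain Python dict, similarity is measured with the standard library's `difflib.SequenceMatcher`, and the sample segments, the `tm_lookup` helper, and the 0.75 threshold are all hypothetical. Commercial TM tools use their own matching algorithms.

```python
from difflib import SequenceMatcher

# Hypothetical translation memory: approved English -> French segments.
TRANSLATION_MEMORY = {
    "Where is the money?": "Où est l'argent ?",
    "We need to leave now.": "Nous devons partir maintenant.",
}


def tm_lookup(segment, memory, threshold=0.75):
    """Return (stored_source, stored_target, score) for the closest stored
    segment, or None if no stored segment reaches the similarity threshold."""
    best = None
    for source, target in memory.items():
        # ratio() is a similarity score between 0.0 and 1.0.
        score = SequenceMatcher(None, segment.lower(), source.lower()).ratio()
        if score >= threshold and (best is None or score > best[2]):
            best = (source, target, score)
    return best


exact = tm_lookup("Where is the money?", TRANSLATION_MEMORY)   # exact match
fuzzy = tm_lookup("Where's the money?", TRANSLATION_MEMORY)    # near match
miss = tm_lookup("The storm is coming.", TRANSLATION_MEMORY)   # no match
print(exact, fuzzy, miss, sep="\n")
```

A fuzzy hit is only a suggestion: a translator still checks it, but starts from an approved wording instead of a blank line.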

Unlike humans, machines remember every word of a text, which makes them well suited to building a glossary for every episode of a TV series. Character names and locations can then be reused accurately, without human mistakes.
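To make the glossary idea concrete, here is a minimal sketch assuming a deliberately simple heuristic: a capitalized word that does not open a sentence is treated as a likely name or place, and it enters the series glossary once it recurs in at least two episodes. The `build_series_glossary` helper, the threshold, and the sample lines are hypothetical; a production system would more likely use named-entity recognition followed by human validation.

```python
import re
from collections import defaultdict


def candidate_terms(script):
    """Return capitalized words that do not start a sentence (likely names or places)."""
    terms = set()
    for sentence in re.split(r"(?<=[.!?])\s+", script):
        words = re.findall(r"[A-Za-z']+", sentence)
        for word in words[1:]:  # skip the sentence-initial word
            if word[0].isupper():
                terms.add(word)
    return terms


def build_series_glossary(episodes, min_episodes=2):
    """Map each recurring term to the sorted list of episodes it appears in."""
    seen_in = defaultdict(set)
    for episode_id, script in episodes.items():
        for term in candidate_terms(script):
            seen_in[term].add(episode_id)
    return {
        term: sorted(ids)
        for term, ids in seen_in.items()
        if len(ids) >= min_episodes
    }


# Hypothetical dialogue from three episodes.
episodes = {
    "S01E01": "We meet Clara in Lisbon tonight. Tell Marco nothing.",
    "S01E02": "Did you call Clara? The boat leaves Lisbon at dawn.",
    "S01E03": "Clara is gone. I never trusted Marco.",
}
glossary = build_series_glossary(episodes)
print(glossary)
```

Once a term such as "Clara" is in the glossary, it can be assigned one approved rendering that every translator on the series reuses.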

The next step is Neural Machine Translation (NMT), which brings prediction into play. NMT software relies on artificial neural networks: many simple processing units working together, loosely modeled on the neurons in our brain. Much as a child learns a language, the machine learns to predict complex sentences after first absorbing expressions, grammar, and linguistic patterns from examples. Furthermore, NMT can learn new languages largely by itself.
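The "predictive concept" means the system estimates the most likely next word given what came before. Real NMT systems do this with neural networks trained on large collections of translated text. The sketch below is deliberately NOT a neural network: it is a toy count-based bigram predictor, built on a hypothetical three-sentence corpus, included only to make next-word prediction concrete.

```python
from collections import Counter, defaultdict


def train_bigrams(sentences):
    """Count, for every word, which words follow it in the training text."""
    follows = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.lower().split()
        for current, nxt in zip(words, words[1:]):
            follows[current][nxt] += 1
    return follows


def predict_next(follows, word):
    """Return the most frequent follower of `word`, or None if it was never seen."""
    candidates = follows.get(word.lower())
    if not candidates:
        return None
    return candidates.most_common(1)[0][0]


# Hypothetical training sentences.
corpus = [
    "see you tomorrow",
    "see you later",
    "see you tomorrow at noon",
]
model = train_bigrams(corpus)
print(predict_next(model, "see"), predict_next(model, "you"))
```

An NMT model replaces these raw counts with learned representations of the whole source sentence and the translation so far, which is what lets it handle context that a bigram table never sees.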

Obviously, translating the narration of a documentary is very different from translating the dialogue of a TV series. And comedies will always be harder to translate, because the original humor has to be adapted for the localized version.

Today we can say that in some cases, depending on the complexity of the dialogue, the machine improves so much that the need for human editing is already significantly reduced by the third episode of a TV series.
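One way to check whether human editing really shrinks from episode to episode is to measure how many word edits editors make to the machine output. The sketch below assumes a simplified, shift-free variant of the edit-rate idea behind TER: word-level Levenshtein distance between the machine output and the human-approved line, divided by the number of approved words. The helper names and the sample line pairs are hypothetical illustrations, not measured data.

```python
def word_edit_distance(hyp, ref):
    """Levenshtein distance between two sentences, counted in words."""
    a, b = hyp.split(), ref.split()
    prev = list(range(len(b) + 1))
    for i, word_a in enumerate(a, start=1):
        curr = [i]
        for j, word_b in enumerate(b, start=1):
            cost = 0 if word_a == word_b else 1
            curr.append(min(prev[j] + 1,          # drop a machine word
                            curr[j - 1] + 1,      # add a missing word
                            prev[j - 1] + cost))  # substitute or keep
        prev = curr
    return prev[-1]


def post_edit_rate(pairs):
    """Edits per approved word over (machine_output, human_final) pairs."""
    edits = sum(word_edit_distance(mt, final) for mt, final in pairs)
    words = sum(len(final.split()) for _, final in pairs)
    return edits / words if words else 0.0


# Hypothetical (machine output, human-edited) pairs for two episodes.
episode_1 = [("he go to the house", "he goes to the house"),
             ("I am not knowing her", "I do not know her")]
episode_3 = [("she goes to the station", "she goes to the station"),
             ("we do not know him", "we don't know him")]
print(post_edit_rate(episode_1), post_edit_rate(episode_3))
```

Tracked over a whole season, a falling rate would be the concrete sign that the machine is learning the series.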

NMT is still in its infancy: we are only now seeing machines capable of learning how humans think. How many years will it take for machines to truly understand how humans think, once they start translating from one language to another, adapting a sense of humor for different cultures, and working out the degree of emotion that complex dialogue requires?

Broadcasters are requesting ever shorter turnaround times so that they can air subtitled or dubbed versions earlier and earlier, and AI answers this demand with a process that is more efficient, more reliable, and cheaper.

Translation companies are now reaching a limit where several translators must work in parallel on different parts of a project, which raises the risk of inconsistencies and mistakes. Machines will not only be faster, but will also help avoid the silly mistakes we regularly see on TV, such as dollars turning into euros, north becoming south, or “son” becoming “soon”.
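When several translators share a project, a machine can flag lines where a glossary term in the source is missing its approved translation in the target. The sketch below assumes a hypothetical English-to-French series glossary and a hypothetical `check_line` helper; it matches single words only, so plurals, inflections, and multi-word terms would need more work.

```python
import re

# Hypothetical series glossary: source term -> approved French term.
GLOSSARY = {"dollars": "dollars", "north": "nord", "son": "fils"}


def check_line(source, target, glossary):
    """Return glossary terms found in the source line whose approved
    translation does not appear in the target line."""
    source_words = set(re.findall(r"\w+", source.lower()))
    target_words = set(re.findall(r"\w+", target.lower()))
    return [
        term for term, approved in glossary.items()
        if term in source_words and approved not in target_words
    ]


# Hypothetical translated lines: the first one turned dollars into euros.
lines = [
    ("It costs ten dollars.", "Ça coûte dix euros."),
    ("Head north.", "Allez vers le nord."),
    ("My son is here.", "Mon fils est là."),
]
issues = {src: check_line(src, tgt, GLOSSARY) for src, tgt in lines}
print(issues)
```

Because the check runs on every line, it catches the same slip no matter which translator made it.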

Speech-to-text and text-to-speech technologies have made enormous progress in performance (or delivery). A few years ago, text-to-speech sounded like a robot. Now it adds variation to the delivery and sounds more like a very dull person. This is big progress! Text-to-speech can help actors rehearse and review scripts, and it can create mock-up versions for producers and dubbing directors.

AI needs to learn imperfection. It has to understand all the nuances of a voice (pitch, air-in-voice, throat compression, projection, movement, and intonation) so that it does not deliver every sentence in the same robot-like way.
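Many text-to-speech engines accept SSML, the W3C Speech Synthesis Markup Language, whose `<prosody>` element adjusts pitch and rate. The sketch below assumes SSML input and only varies those two coarse controls from line to line; it captures none of the finer nuances listed above, such as air-in-voice or throat compression. The `to_varied_ssml` helper and the chosen values are hypothetical, and engines differ in which SSML attributes they honor.

```python
import random
from xml.sax.saxutils import escape

PITCHES = ["low", "medium", "high"]  # named pitch values defined by SSML
RATES = ["90%", "100%", "110%"]      # speaking rate relative to the default


def to_varied_ssml(sentences, seed=None):
    """Wrap each sentence in an SSML <prosody> element with a randomly chosen
    pitch and rate, so consecutive lines are not delivered identically."""
    rng = random.Random(seed)  # a seed makes the variation reproducible
    parts = []
    for sentence in sentences:
        pitch = rng.choice(PITCHES)
        rate = rng.choice(RATES)
        parts.append(
            f'<prosody pitch="{pitch}" rate="{rate}">{escape(sentence)}</prosody>'
        )
    body = '<break time="300ms"/>'.join(parts)
    return ('<speak version="1.1" xmlns="http://www.w3.org/2001/10/synthesis" '
            f'xml:lang="en-US">{body}</speak>')


ssml = to_varied_ssml(["I told you.", "I told you!", "Tom & Jerry are late."], seed=7)
print(ssml)
```

Random variation is only a crude stand-in for intention: it breaks the monotony, but it does not know why a line should rise or fall.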

Data is the fuel of AI. One of the major challenges in the dubbing world is understanding what kind of data machines need to grasp the concepts of language and cultural adaptation. As the saying goes: we don’t dub to a language, we dub to a culture. This is where AI can help the dubbing community. Could AI recognize the emotions in a delivery, identify different accents, or even detect imperfections in voice projection, and build a neural analysis from them? For that, machines will need a huge bank of data covering the parameters that define not only a voice, but also concepts such as intention, punctuation, delivery, and projection.
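As an illustration of what one entry in such a data bank might hold, here is a minimal sketch of an annotated recording covering the parameters named above. Every field name, the 1-to-5 projection scale, the file path, and the sample values are hypothetical. Punctuation is kept simply by storing the scripted text exactly as written.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class AnnotatedLine:
    """One recorded line in a hypothetical dubbing data bank."""
    line_id: str
    language: str            # e.g. a language tag such as "fr-FR"
    text: str                # the line as scripted, punctuation included
    audio_path: str          # where the recording is stored
    emotion: str             # annotator label, e.g. "frustration"
    intention: str           # what the character is trying to do with the line
    accent: str = "neutral"
    delivery: str = "normal"  # e.g. "whispered", "shouted", "rushed"
    projection: int = 3       # hypothetical scale: 1 (intimate) to 5 (full voice)
    notes: list = field(default_factory=list)

    def __post_init__(self):
        if not 1 <= self.projection <= 5:
            raise ValueError("projection must be between 1 and 5")


record = AnnotatedLine(
    line_id="S01E02-0147",
    language="fr-FR",
    text="Tu ne comprends pas ?",
    audio_path="takes/S01E02-0147_take3.wav",
    emotion="frustration",
    intention="plead",
    delivery="rushed",
    projection=4,
)
serialized = json.dumps(asdict(record), ensure_ascii=False)
print(serialized)
```

Thousands of such records, consistently labeled across languages, are the kind of "huge bank of data" a model would need before it could say anything useful about intention or projection.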

Dubbing actors worldwide still follow a very manual recording process. AI could help in the background by analyzing each recorded line immediately and reviewing it against a bank of data built before or during the dubbing process.
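What might that immediate analysis check? A minimal sketch, assuming each take arrives as 16-bit mono WAV and that the data bank supplies a reference duration (for sync) and a reference loudness (a rough proxy for projection): the take is flagged when either strays too far. The helper names and the 10% and 6 dB tolerances are hypothetical, `make_tone` only synthesizes a test signal for the demo, and a real review would examine far more, such as lip sync, emotion, and pronunciation.

```python
import io
import math
import sys
import wave
from array import array


def take_stats(wav_bytes):
    """Return (duration in seconds, RMS level in dBFS) of a 16-bit mono WAV."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
        if wav.getsampwidth() != 2 or wav.getnchannels() != 1:
            raise ValueError("expected 16-bit mono audio")
        rate = wav.getframerate()
        samples = array("h", wav.readframes(wav.getnframes()))
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian
    duration = len(samples) / rate
    if not samples:
        return duration, float("-inf")
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    level = 20 * math.log10(rms / 32768) if rms else float("-inf")
    return duration, level


def review_take(wav_bytes, ref_duration, ref_level,
                max_timing_error=0.10, max_level_diff=6.0):
    """List the ways a take strays from the reference timing and loudness."""
    duration, level = take_stats(wav_bytes)
    issues = []
    if abs(duration - ref_duration) > max_timing_error * ref_duration:
        issues.append(f"timing: {duration:.2f}s vs {ref_duration:.2f}s reference")
    if abs(level - ref_level) > max_level_diff:
        issues.append(f"level: {level:.1f} dBFS vs {ref_level:.1f} dBFS reference")
    return issues


def make_tone(seconds, amplitude=0.25, rate=16000, freq=440.0):
    """Demo helper: synthesize a sine-wave 'take' as 16-bit mono WAV bytes."""
    n = int(seconds * rate)
    samples = array("h", (int(amplitude * 32767 * math.sin(2 * math.pi * freq * i / rate))
                          for i in range(n)))
    if sys.byteorder == "big":
        samples.byteswap()
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(rate)
        wav.writeframes(samples.tobytes())
    return buffer.getvalue()


print(review_take(make_tone(2.0), ref_duration=2.0, ref_level=-15.0))
print(review_take(make_tone(2.5), ref_duration=2.0, ref_level=-15.0))
```

Because the check runs in milliseconds, the dubbing director could see the flag before the actor moves on to the next line.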

Going even further, AI can now help develop new processes, such as cloud-based workflows that allow actors to record remotely and give them immediate access, in the cloud, to all parties involved in the production.

AI should not be seen as a threat to the dubbing community. In fact, it could be its savior!

The author of this article, Jacques Barreau, started as a researcher at the Music and Computer Science Laboratory in Marseille, France. During his tenure at Warner Bros. as head of the Dubbing and Subtitling department, Barreau applied his musical theories to the world of language dubbing, treating the human voice as a musical instrument. In his current role as VP at L.A.-based Translations.com, Barreau plays a key role in developing the company’s GlobalLink Studio and Media.Next suite of AI-powered media localization solutions.