A video of Obama giving a speech went viral a few months ago, and not because of his skill with a crowd. The video wasn't a real speech given by the former president. It was created by an artificial intelligence (AI) algorithm that can manipulate a person's facial movements to match a given audio clip. The video went viral partly because of people's fear of what this technology could do to the reliability of news videos.
But this type of technology isn't just used for evil. A team of researchers from the Max Planck Institute for Informatics, the University of Bath, Technicolor, TU Munich and Stanford University has created a new AI system that can edit an actor's facial expressions to match dubbed-in audio clips. The system could save film productions both time and money. The algorithm can correct an actor's gaze and head pose, eliminating the need for a reshoot when a line has to be changed. Instead of filming a whole new scene, the footage from the original take can be altered by the AI algorithm to match the new, dubbed-in line. If you're a fan of the Netflix series BoJack Horseman, this may sound like the "futuristic" technology the Secretariat movie crew uses to dub in BoJack's character when he runs away from the shoot for months during seasons two and three.
This algorithm would also be great for films that are translated into other languages. One of the most annoying things about watching a dubbed movie is the mismatch between the dialogue and the actor's lips. The mismatch can be distracting and ultimately take away from the viewer's enjoyment of the film.
Hyeongwoo Kim from the Max Planck Institute for Informatics says, "It works by using model-based 3D face performance capture to record the detailed movements of the eyebrows, mouth, nose, and head position of the dubbing actor in a video. It then transposes these movements onto the 'target' actor in the film to accurately sync the lips and facial movements with the new audio."
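Kim's description suggests a two-stage pipeline: first recover a compact set of 3D face parameters from each video frame, then swap the motion-related parameters from the dubbing actor onto the target actor. The sketch below illustrates that transfer step in Python. The FaceParams fields and the transfer_performance function are hypothetical stand-ins based on the quote, not the team's actual code, which also involves fitting the 3D face model and a photorealistic rendering stage.

```python
from dataclasses import dataclass, replace
from typing import List

@dataclass(frozen=True)
class FaceParams:
    """Per-frame parameters recovered by the 3D face capture (illustrative)."""
    identity: tuple    # the actor's facial geometry (kept fixed)
    expression: tuple  # eyebrow, mouth and nose movements (transferred)
    head_pose: tuple   # head rotation and translation (transferred)
    eye_gaze: tuple    # gaze direction (transferred)

def transfer_performance(source: List[FaceParams],
                         target: List[FaceParams]) -> List[FaceParams]:
    """Copy the dubbing actor's motion onto the target actor frame by frame,
    keeping the target's identity so the result still looks like them."""
    return [
        replace(tgt,
                expression=src.expression,
                head_pose=src.head_pose,
                eye_gaze=src.eye_gaze)
        for src, tgt in zip(source, target)
    ]
```

In the full system, each edited parameter set would then drive a rendering step that turns the modified 3D face model back into photorealistic video frames of the target actor, which is what makes the lips appear to speak the new line.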
Co-author of the paper, Dr. Christian Richardt, from the University of Bath's motion capture research center CAMERA, adds, "This technique could also be used for post-production in the film industry where computer graphics editing of faces is already widely used in today's feature films. Deep Video Portraits shows how such a visual effect could be created with less effort in the future. With our approach, even the positioning of an actor's head and their facial expression could be easily edited to change camera angles or subtly change the framing of a scene to tell the story better."
Similar to the Obama video, this technology could easily be used to trick the viewer. While it is still in the proof-of-concept stage, the researchers believe it could be widely used within the next few years.
Dr. Michael Zollhöfer, from Stanford University, explains, "The media industry has been touching up photos with photo-editing software for many years, meaning most of us have learned to take what we see in photos with a pinch of salt. With ever-improving video editing technology, we must also start being more critical about the video content we consume every day, especially if there is no proof of origin. We believe that the field of digital forensics should and will receive a lot more attention in the future to develop approaches that can automatically prove the authenticity of a video clip. This will lead to ever better approaches that can spot such modifications even if we humans might not be able to spot them with our own eyes."
The system's potential uses don't stop at movies. It could also be applied to video conferencing, video editing, VR teleconferencing and more.
The Max Planck Institute researchers will be presenting the technology at SIGGRAPH 2018.