Machine learning tool could improve 'deepfake' videos
Image credit: Carnegie Mellon University
Researchers at Carnegie Mellon University in the US have developed an automated technique for transferring the action in one video into the context of another, allowing the facial expressions and body movements of one person to be mapped onto another.
To demonstrate their work, the researchers transferred comedian John Oliver’s facial expressions onto a video of a cartoon frog and made a daffodil unfurl like a hibiscus. Aayush Bansal, the PhD student behind the project, was motivated by the need for more efficient tools in film production.
“It’s a tool for the artist that gives them an initial model that they can improve,” he said.
The tool uses a class of unsupervised machine-learning algorithms called generative adversarial networks (GANs), which – thanks to their ability to apply the style of one image to another – are often used to generate fake photographs that appear authentic.
A GAN typically pairs two models: a generator, which creates images or video to match a certain style, and a discriminator, which tries to spot the difference between generated and genuine content. The two ‘compete’, and this contest trains the GAN to transform the style of content more and more accurately. An existing variant, cycle-GAN, adds a cycle-consistency stage that checks the spatial characteristics of the output, although this can still leave defects in video footage.
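The generator-versus-discriminator contest can be illustrated with the standard adversarial losses. The following is a minimal sketch, not the researchers’ code: the discriminator here is an assumed toy logistic scorer (in a real GAN both models are neural networks trained jointly), and the samples are just numbers standing in for images.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy discriminator: a fixed logistic model scoring how "real" a sample looks.
# (Hypothetical stand-in for a trained neural network.)
def discriminator(x, w=2.0, b=-1.0):
    return sigmoid(w * x + b)

def discriminator_loss(real, fake):
    # The discriminator wants D(real) -> 1 and D(fake) -> 0.
    return -np.mean(np.log(discriminator(real)) +
                    np.log(1.0 - discriminator(fake)))

def generator_loss(fake):
    # The generator wants its fakes to be scored as real.
    return -np.mean(np.log(discriminator(fake)))

rng = np.random.default_rng(0)
real = rng.normal(1.0, 0.1, size=100)   # samples in the target "style"
fake = rng.normal(0.0, 0.1, size=100)   # generator output, initially off-target

print(discriminator_loss(real, fake))
print(generator_loss(fake))
```

Training alternates between minimising these two losses; as the generator’s output drifts towards the real distribution, its loss falls, which is the ‘competition’ described above.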
In order to improve the method, the Carnegie Mellon researchers developed a new iteration of the technique, Recycle-GAN, which incorporates temporal as well as spatial information for improved results. As this method does not require supervision, it is capable of altering large amounts of video very quickly.
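The ‘recycle’ idea can be sketched as a loss over frame triples: a frame is translated into the other domain, a temporal predictor estimates the next frame there, and the result is translated back and compared with the true next frame. In this illustrative toy version the translators `G`, `F` and the predictor `P` are assumed placeholder linear maps, not the trained networks from the paper:

```python
import numpy as np

# Placeholder "networks": in Recycle-GAN these are learned translators and a
# learned temporal predictor; simple linear maps are used here for illustration.
def G(x):              # translate domain X -> domain Y
    return 2.0 * x

def F(y):              # translate domain Y -> domain X
    return 0.5 * y

def P(y_prev, y_cur):  # predict the next Y frame by linear extrapolation
    return y_cur + (y_cur - y_prev)

def recycle_loss(x_prev, x_cur, x_next):
    # Translate two frames into Y, predict the next Y frame,
    # translate it back to X, and compare with the true next frame.
    y_next_pred = P(G(x_prev), G(x_cur))
    x_next_pred = F(y_next_pred)
    return float(np.mean((x_next - x_next_pred) ** 2))

frames = np.array([0.0, 1.0, 2.0])   # a perfectly linear toy "video"
print(recycle_loss(*frames))         # round-trips exactly, so the loss is 0.0
```

Because the loss involves predicting forward in time before translating back, minimising it forces the mapping to respect motion between frames, not just the appearance of individual frames.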
The researchers demonstrated the possibilities of Recycle-GAN by transferring facial expressions and body movements between videos of Last Week Tonight host John Oliver, civil rights activist Martin Luther King, former US President Barack Obama, US President Donald Trump and a cartoon frog. They also showed that Recycle-GAN could be applied to non-human subjects, making a daffodil appear to bloom like a hibiscus and slowing windswept clouds to give the impression of milder weather.
The researchers suggest that the technique could also be used to convert black-and-white films into colour, or to create content for virtual reality (VR) apps. Bansal says it could also be useful for driverless car technology, converting daytime training footage into night-time or stormy scenes that could be used to train cars to operate in those conditions. He acknowledges, however, that the technology could be used for “deepfakes”: photographs and videos (often pornographic) that use a neural network to insert a real person’s likeness into the media without their consent. Pornographic deepfakes have in recent months been banned on Reddit and other internet platforms that host user-generated content.
Last year, a study conducted at the University of Washington demonstrated that it was possible to lip-sync videos to different audio tracks automatically.
“It was an eye opener to all of us in the field that such fakes would be created and have such an impact,” said Bansal. “Finding ways to detect them will be important moving forward.”
If you are interested in this subject, register for our free EngTalk event: ‘AI and human digitisation: when seeing is not believing?’, which will be held at IET London: Savoy Place on 17th September 2018.