Synthetic video: don’t call them deepfakes
Image credit: Flawless
Synthetic video is best known for its abusive applications – deepfake pornography and disinformation. Some companies, however, are out to prove that the technology is a useful tool in video production.
One of the many ways in which AI is gnawing away at the fabric of society is by making us doubt the evidence of our own eyes. Deepfakes have added fuel to the existing bin-fires of social media, disinformation and online sexual abuse, attracting almost uniformly negative sentiment. A 2020 study by UCL researchers, published in Crime Science, ranked them as the most serious AI-enabled crime threat.
Although there is no universally agreed definition, a typical deepfake uses AI to replace a person in an existing video with someone else. The vast majority of deepfakes are pornographic, swapping the faces of adult performers for those of female celebrities, but they have attracted wider attention as tools of political disinformation. In March 2022, a poor-quality deepfake of President Volodymyr Zelensky announcing Ukraine’s surrender to Russia surfaced on social media, drew a brief round of ridicule and was removed.
There is an argument to be made that it is abusive intent, not the technology itself, that must be policed. Indeed, many companies have emerged in recent years seeking to demonstrate the potential of deepfakes, and of synthetic video more broadly: Deep Word, Flawless, Deepdub, Anymate Me and others.
These companies are responding to explosive demand for video, driven by internet trends. Social media platforms are pivoting to video in an effort to replicate the success of TikTok, and Cisco estimates that video will make up more than 82 per cent of consumer internet traffic this year (15 times higher than in 2017). Traditional video production, however, is expensive, complex and hard to scale. These companies hope to make it more accessible by using AI to replace cameras, editors, actors and other elements of the traditional production process.
Mina Samaan, a partner at MMC Ventures, says he sees new use-cases for synthetic video every day. He points to innovation in text-to-video – generating video from a text prompt, rather than an existing video – as an exciting trend in the area: “Today, because there’s so much capital going into [this space], the innovation is shifting from taking an existing video and maybe doing some lip-syncing to taking a completely blank slate and creating a video from scratch. That [...] wasn’t really possible three to four years ago.”
MMC Ventures is an investor in Synthesia, a London-based company founded in 2017 which recently secured the largest investment in the AI video space to date. Synthesia’s web-based app lets clients generate video from text, either bypassing traditional video production or, more usually, communicating via video what they would otherwise have communicated via text. (Video is often claimed to be the most effective mode of communication: according to Insivia, viewers retain 95 per cent of a message delivered by video, compared with 10 per cent of one delivered as text.)
Using Synthesia’s platform, McDonald’s delivers learning and development content to employees with AI-generated training videos. “It turns out that when people use AI videos, they don’t compare it with traditional video; they compare it with text,” says Victor Riparbelli, CEO and co-founder of Synthesia. “Compared with traditional video, sure, [synthetic video] is not as good. But, if someone in a big fast-food company needs to be trained, and they can decide between reading 20 pages of PDF documents or watching a six-minute video, it’s a very easy choice for the company.”
Deepfakes – and perhaps, by extension, synthetic video – have a public image problem. Synthetic video companies are wary of being associated with abusive deepfakes and prominently lay out their ethical frameworks to make their distance clear. Synthesia is building a library of viral deepfakes which it hopes will help fight disinformation, takes part in the debate around the regulation of synthetic media, and has set out firm rules on what its technology can be used for.
Samaan says: “There are quite a few steps to making sure that you’re not seen as a deepfake; you’re seen as synthetic video and ensuring that other people understand that is your position, despite the fact that news headlines will continue to set you back.”
Riparbelli, meanwhile, seems more relaxed about the association. “We don’t call ourselves deepfakes, but it’s not like: ‘oh no we’re definitely not deepfakes.’ We’re definitely building technology that you could put in the deepfake family of technologies, I guess,” he says. “I wouldn’t say I spend a lot of time trying to escape the deepfake narrative. I welcome it; people find it interesting.”
He believes that companies like Synthesia are already providing evidence that synthetic video is a general-purpose technology with many potential applications beyond abusive deepfakes: “The popular narrative is very focused on deepfakes, which makes a lot of sense. It certainly is a real threat. It’s causing real harm today. But I think it often becomes a red herring for a much more fundamental shift we’re going through, which is to switch from traditional software to AI ... and I think it misses the myriad of applications of what you could very well call deepfake technology that we’re all using every single day.”
It is worth noting that many technologies have entered the mainstream after emerging from slightly seedy applications. It is now widely acknowledged, for example, that many innovations – from the VCR to digital payments – were driven in large part by demand for pornography, and they have hardly suffered as a result.
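In the hands of these companies, synthetic video technology is unlikely to be used for abusive purposes. Even so, the most mundane applications might still feed the ‘liar’s dividend’: the more familiar people become with convincing fakes, the easier it becomes to dismiss genuine footage as fake.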
In 2020, Professor Andrew Chadwick, a political communications expert at Loughborough University, found that even ‘educative deepfakes’ intended for good can inadvertently deepen distrust: “When you look at people’s attitudes toward trust in news on social media before we expose them to the video, and then afterward ... we found that deepfakes seem to have this kind of effect on people’s general sense of trust, whether or not to trust the news they find on social media.”
Chadwick explained that deepfakes can give rise to a “culture of indeterminacy” in which it is difficult to tell truth from lies, prompting more cynicism about public information. He is, therefore, sceptical about the idea that deepfakes can be used for good: “[Non-abusive use of deepfakes] is not straightforward because it normalises the use of the technology. And I think if we normalise it and it spreads into all spheres of public life, we could be in a bit of trouble.” He wonders aloud what could be done to minimise that effect, such as adding disclaimers to synthetic videos, much like those displayed when witness testimony is recreated by an actor.
Synthesia does not require customers to disclose to viewers that their videos are AI-generated, although Riparbelli says that most do so anyway. He defends synthetic video as simply the latest in a long line of game-changing technologies, dating back to Gutenberg’s printing press, that may change how we scrutinise media.
“I think there’s definitely some merit to [the criticism], you know, we can’t trust video and audio the same way we did a good five to 10 years ago,” he says. “But I think that’s just the way the world evolves. I think none of us would be without the printing press; none of us would be without the internet; and I don’t think any of us will want to be without synthetic media.”
Flawless is a London-based AI company specialising in ‘visual dubbing’ for entertainment. “We focus on visual translation,” explains Pablo Garrido, head of research at the company. “So, we try to move the mouth according to a new audio track. It’s quite novel.”
When films are prepared for foreign-language markets, they are often dubbed. Dubbing, however, leaves the actors’ lip movements out of step with the new dialogue, creating a distracting mismatch between picture and sound for the audience.
Although some researchers have experimented with manually editing the video to fit the new audio, Garrido says that Flawless is the first to do so with a fully automated process.
He hopes that the company can provide the tools to prepare films for foreign-language markets, rendering the process “simpler and less costly” than before.