Researchers make realistic fake puppeteering possible from a single image

Image credit: Dreamstime

New research tests a method that relies on a minimal amount of data to produce personalised, photorealistic talking-head videos

A piece of research emerged this week that shows how to produce ‘talking heads’ from a single picture of a person.

The research paper, called ‘Few-Shot Adversarial Learning of Realistic Neural Talking Head Models’, authored by Egor Zakharov and others at Samsung Labs and the Skolkovo Institute of Science and Technology in Moscow, is a stark reminder that developments in machine learning may soon allow anyone to make lifelike but fake videos of a person with nothing more than a picture of them.

The new method promises to animate a person's head and mimic their facial traits without the vast amounts of data such techniques have historically required.

The research demonstrated how "to animate and imitate talking heads from a single picture of a person" and introduced a "framework for meta-learning of adversarial generative models, which is able to train highly realistic virtual talking heads in the form of deep generator networks", according to the authors.

So-called ‘puppeteering heads’ – videos that imitate celebrities using neural networks and other machine-learning techniques – are not new. What is new is how little effort they now take to produce.

'Fake' puppeteering videos of public figures are now prevalent online. Anyone can watch faked speeches in which Barack Obama appears to say things the former US president would never have dared to say (such as those presented by the University of Washington’s ‘Synthesizing Obama’ project from 2017).

"This is the future of fake news. We’ve long been told not to believe everything we read, but soon we’ll have to question everything we see and hear as well," writes Olivia Solon for the Guardian.

What is the difference between this research and previous applications? What makes these findings special, according to the authors, is that the new method requires remarkably little data to produce these entertaining videos.

Only a handful of photographs – as few as a single image – is needed to create a new working model. A model trained on 32 images achieved ‘perfect realism and personalisation’ scores in the user study.

In one instance, the research team concentrated on Leonardo da Vinci's Mona Lisa portrait - hailed as the "sphinx of beauty who smiles so mysteriously" by Théophile Gautier - to test the system. The model da Vinci painted, Lisa Gherardini, the wife of Francesco del Giocondo, is resurrected near-perfectly - despite losing some of her irresistible smirk in the process.

Few-Shot Adversarial Learning of Realistic Neural Talking Head Models

Image credit: Skolkovo Institute of Science and Technology

On Twitter, a PhD student at the Skolkovo Institute of Science and Technology shared his surprise that "no 3D face modelling was used, merely adaptive instance norms and GANs".
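The ‘adaptive instance norms’ mentioned in the tweet refer to adaptive instance normalisation (AdaIN), an operation in which each feature channel is normalised and then rescaled with parameters predicted from a reference input rather than learned as fixed weights. A minimal NumPy sketch of the operation follows; in the paper the scale and shift parameters would be produced by an embedder network from the reference photographs, whereas here they are plain arrays for illustration:

```python
import numpy as np

def adaptive_instance_norm(content, gamma, beta, eps=1e-5):
    """Adaptive instance normalisation (AdaIN).

    Normalises each channel of `content` (shape C x H x W) to zero mean
    and unit variance, then rescales it with the externally supplied
    per-channel parameters `gamma` and `beta` (each shape C). In the
    talking-head setting, gamma and beta would encode the target
    person's appearance.
    """
    mean = content.mean(axis=(1, 2), keepdims=True)
    std = content.std(axis=(1, 2), keepdims=True)
    normalised = (content - mean) / (std + eps)
    return gamma[:, None, None] * normalised + beta[:, None, None]
```

Because the style enters only through gamma and beta, the same generator weights can be re-styled for a new person by swapping in new normalisation parameters - which is what makes few-shot personalisation cheap.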

Some limitations still prevail. One crucial problem concerns what is called ‘mimics representation’: the current set of landmarks does not represent the gaze in any way, so something called ‘landmark adaptation’ would be needed. The implication is that driving the model with landmarks taken from a different person mars the representation in the resulting video and leads to a noticeable ‘personality mismatch’, the paper states.
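To illustrate why gaze is absent from such a representation: facial-landmark detectors output a fixed set of contour points, and in the widely used 68-point convention (adopted by dlib and similar libraries; the paper's exact landmark set may differ) the eyes are described only by their eyelid outlines, with no points for the pupils:

```python
# Index ranges of the common 68-point facial-landmark convention.
# The eye groups cover only the eyelid contours - there are no pupil
# points, so the direction of the subject's gaze is never encoded.
LANDMARK_GROUPS = {
    "jaw": range(0, 17),
    "right_eyebrow": range(17, 22),
    "left_eyebrow": range(22, 27),
    "nose": range(27, 36),
    "right_eye": range(36, 42),   # eyelid contour only, no pupil
    "left_eye": range(42, 48),    # eyelid contour only, no pupil
    "outer_lips": range(48, 60),
    "inner_lips": range(60, 68),
}
```

A generator conditioned on these points can therefore reproduce head pose and expression but not where the eyes are looking, which is one source of the ‘personality mismatch’ the authors describe.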

The research concludes that creating more convincing fake puppeteering videos without such a mismatch will require some form of landmark adaptation.

If ‘fake puppeteering’ is not the aim and the method is instead used only "to drive one’s own talking head", the approach the research team is offering already provides a "high-realism solution", the authors concluded.
