Accurate 3D facial models created using smartphone videos
A new process can quickly and easily create digital models of people’s faces by harvesting data gleaned from videos recorded on a smartphone.
While laser scanners, structured-light systems and multicamera studio setups can already produce highly accurate scans of the face, these specialised sensors are prohibitively expensive for most applications.
Carnegie Mellon University researchers were able to pull off this trick by shooting a continuous video of the front and sides of the face to generate a dense cloud of data.
Deep-learning algorithms are then applied in a two-step process that builds a digital reconstruction of the face.
The experiments show that the method can achieve sub-millimetre accuracy, outperforming other camera-based processes.
“Building a 3D reconstruction of the face has been an open problem in computer vision and graphics because people are very sensitive to the look of facial features,” said associate research professor Simon Lucey. “Even slight anomalies in the reconstructions can make the end result look unrealistic.”
The team said the digitally reproduced faces could be used to build an avatar for gaming or for virtual or augmented reality, and could also be used in animation, biometric identification and even medical procedures.
An accurate 3D rendering of the face might also be useful in building customised surgical masks or respirators.
The process begins by shooting 15-20 seconds of video; the researchers used an iPhone X recording on its slow-motion setting.
“The high frame rate of slow motion is one of the key things for our method because it generates a dense point cloud,” Lucey said.
The researchers then employ a commonly used technique called visual simultaneous localisation and mapping (SLAM).
Visual SLAM triangulates points on a surface to calculate its shape, while at the same time using that information to determine the position of the camera. This creates an initial geometry of the face, but missing data leave gaps in the model.
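At the heart of that triangulation step is a standard multi-view geometry calculation: given the same surface point seen from two camera positions, its 3D location can be recovered. The sketch below is not the CMU system, just a minimal illustration of linear (DLT) triangulation with two toy camera matrices, assuming known camera poses and an identity intrinsics matrix:

```python
import numpy as np

def triangulate(P1, P2, x1, x2):
    """Linear (DLT) triangulation of one 3D point from two views.

    P1, P2: 3x4 camera projection matrices.
    x1, x2: 2D observations of the same point in each view.
    """
    # Each observation contributes two linear constraints on the
    # homogeneous point X, via x cross (P @ X) = 0.
    A = np.array([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    # Least-squares solution: right singular vector of A with the
    # smallest singular value.
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]
    return X[:3] / X[3]  # homogeneous -> Euclidean

# Two toy cameras: one at the origin, one shifted 1 unit along x.
P1 = np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])

# Project a known 3D point into both views, then recover it.
X_true = np.array([0.2, 0.1, 5.0])
x1 = P1 @ np.append(X_true, 1.0); x1 = x1[:2] / x1[2]
x2 = P2 @ np.append(X_true, 1.0); x2 = x2[:2] / x2[2]

X_est = triangulate(P1, P2, x1, x2)
```

A full visual-SLAM system repeats this for thousands of tracked points per frame while simultaneously estimating the camera poses themselves, which is why the slow-motion footage's dense sampling helps.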
In the second step, the researchers fill in those gaps. Deep-learning algorithms first identify the person's profile and landmarks such as the ears, eyes and nose; classical computer-vision techniques then fill in the remaining gaps.
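The "classical" fill-in step amounts to interpolating missing surface measurements from the valid ones nearby. As a hedged illustration only (not the CMU pipeline, which uses landmark-constrained reconstruction), here is a simple nearest-neighbour fill of gaps in a depth map, assuming a mask marking which pixels SLAM actually recovered:

```python
import numpy as np

def fill_gaps_nearest(depth, valid):
    """Fill missing depth values with the nearest valid measurement.

    depth: 2D array of depth estimates; entries where valid is False
    are the gaps left by the sparse reconstruction.
    """
    h, w = depth.shape
    ys, xs = np.nonzero(valid)          # coordinates of valid pixels
    filled = depth.copy()
    for y in range(h):
        for x in range(w):
            if not valid[y, x]:
                # Squared distance from this gap to every valid pixel.
                d2 = (ys - y) ** 2 + (xs - x) ** 2
                nearest = np.argmin(d2)
                filled[y, x] = depth[ys[nearest], xs[nearest]]
    return filled

# Toy 2x3 depth map: zeros mark gaps to be filled.
depth = np.array([[1.0, 0.0, 3.0],
                  [0.0, 2.0, 0.0]])
valid = depth > 0
filled = fill_gaps_nearest(depth, valid)
```

Real systems would use smoother surface fitting, but the principle is the same: once landmarks anchor the geometry, the remaining holes are a well-posed interpolation problem rather than something the neural network has to hallucinate.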
“Deep learning is a powerful tool that we use every day,” Lucey said. “But deep learning has a tendency to memorise solutions,” which works against efforts to include distinguishing details of the face. “If you use these algorithms just to find the landmarks, you can use classical methods to fill in the gaps much more easily.”
The method isn’t quick; processing took 30-40 minutes in the team’s tests. But the entire pipeline can run on a smartphone.
In addition to face reconstructions, the CMU team’s methods might also be employed to capture the geometry of almost any object, Lucey said. Digital reconstructions of those objects can then be incorporated into animations or transmitted across the internet to sites where the objects could be duplicated with 3D printers.