An artificial intelligence algorithm can produce 'natural' sounds indistinguishable from the real ones

AI algorithm reproduces incredibly lifelike natural sounds

A new algorithm developed by MIT researchers can automatically produce the sounds of objects being hit, touched or scraped, and to human ears the results are hard to distinguish from the real thing.

The algorithm, which according to the researchers could be used to help robots navigate in various environments or produce sound effects for films and video games, predicts the sounds by processing visual input.

"When you run your finger across a wine glass, the sound it makes reflects how much liquid is in it," explained Andrew Owens, a PhD student at the Computer Science and Artificial Intelligence Laboratory of the Massachusetts Institute of Technology (MIT).

"An algorithm that simulates such sounds can reveal key information about objects' shapes and material types, as well as the force and motion of their interactions with the world."

The team used deep learning techniques to train the computer to predict sounds. First, the researchers fed the system 1,000 videos comprising 46,000 sounds of various objects being hit, scraped and prodded with a drumstick. The algorithm deconstructed the sounds and analysed their characteristics, including pitch and loudness.
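As a rough illustration of what analysing a sound's loudness and pitch can involve, here is a minimal sketch. It is not the researchers' code: the function names `rms_loudness` and `estimate_pitch`, the root-mean-square loudness measure and the autocorrelation-based pitch estimate are all assumptions chosen for simplicity.

```python
import math

def rms_loudness(samples):
    """Root-mean-square amplitude, used here as a simple stand-in for loudness."""
    if not samples:
        raise ValueError("samples must not be empty")
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def estimate_pitch(samples, sample_rate, fmin=50.0, fmax=1000.0):
    """Estimate the fundamental frequency in Hz from the autocorrelation peak.

    Searches lags corresponding to frequencies between fmin and fmax
    (both hypothetical defaults) and returns None if no positive peak is found.
    """
    min_lag = max(1, int(sample_rate / fmax))
    max_lag = min(int(sample_rate / fmin), len(samples) - 1)
    best_lag, best_score = None, 0.0
    for lag in range(min_lag, max_lag + 1):
        # Correlate the signal with a copy of itself shifted by `lag` samples.
        score = sum(samples[i] * samples[i + lag]
                    for i in range(len(samples) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag if best_lag else None

# Example: a 200 Hz sine wave at half amplitude, sampled at 8 kHz for 0.1 s.
rate = 8000
tone = [0.5 * math.sin(2 * math.pi * 200 * n / rate) for n in range(800)]
loudness = rms_loudness(tone)
pitch = estimate_pitch(tone, rate)
```

A real system would compute features like these over short windows of each recording rather than over a whole clip at once.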

After the training period, the computer was able to automatically produce sounds for a video using its database.

"The algorithm looks at the sound properties of each frame of that video and matches them to the most similar sounds in the database," said Owens. "Once the system has those bits of audio, it stitches them together to create one coherent sound."
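The match-and-stitch step Owens describes can be sketched as a nearest-neighbour lookup followed by a cross-fade. This is a toy version under stated assumptions, not the team's implementation: the feature vectors, the Euclidean distance measure, the linear cross-fade and the names `nearest_clip`, `stitch` and `synthesize` are all hypothetical.

```python
import math

def nearest_clip(feature, database):
    """Return the audio clip whose stored feature vector is closest to `feature`.

    `database` is a list of (feature_vector, clip_samples) pairs.
    Euclidean distance is an assumption made for this sketch.
    """
    best = min(database, key=lambda entry: math.dist(feature, entry[0]))
    return best[1]

def stitch(clips, overlap):
    """Join clips end to end, linearly cross-fading `overlap` samples at each join."""
    if not clips:
        return []
    out = list(clips[0])
    for clip in clips[1:]:
        n = min(overlap, len(out), len(clip))
        start = len(out) - n
        for i in range(n):
            w = (i + 1) / (n + 1)  # weight of the incoming clip
            out[start + i] = (1 - w) * out[start + i] + w * clip[i]
        out.extend(clip[n:])
    return out

def synthesize(frame_features, database, overlap=2):
    """Pick the closest clip for each frame's features and stitch the results."""
    clips = [nearest_clip(f, database) for f in frame_features]
    return stitch(clips, overlap)

# Example with a two-entry database and two video frames.
database = [
    ((0.0, 0.0), [1.0, 1.0, 1.0, 1.0]),
    ((1.0, 1.0), [0.0, 0.0, 0.0, 0.0]),
]
result = synthesize([(0.1, 0.0), (0.9, 1.0)], database, overlap=2)
```

The cross-fade is one simple way to avoid audible clicks where two retrieved clips meet; other blending schemes would serve the same purpose.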

The team tested the artificial sounds on a group of volunteers. In the experiment, conducted online, the test subjects were shown pairs of videos of collisions – one containing a real sound and the other one the sound generated by the computer.

When asked which sound was real, respondents picked the fake sound over the real one twice as often for the new algorithm as for a baseline algorithm. Sounds such as those produced by leaves and dirt proved to be more confusing than cleaner sounds such as those of wood or metal.

Another algorithm developed as part of the project was able to distinguish between hard and soft materials with 67 per cent accuracy based on the provided sounds.

The team believes that future work in this area could improve robots' abilities to interact with their surroundings.

"A robot could look at a sidewalk and instinctively know that the cement is hard and the grass is soft and therefore know what would happen if they stepped on either of them," said Owens. "Being able to predict sound is an important first step toward being able to predict the consequences of physical interactions with the world."

The library of 46,000 sounds used to train the system is now available for free to other researchers.
