
AI algorithm can create music that sounds like a live performance

01 March 2021

Researchers from the University of Washington have built an artificial intelligence (AI) algorithm that can create music sounding like a live performance using only visual cues, something once thought to be impossible. The new system is called Audeo.

Audeo creates audio from silent videos of piano performances. To do this, the algorithm must first identify the features in the video frames that relate to music generation, and then infer the sound occurring between frames. Audeo decodes the video and translates it into music in a series of steps. It first detects which keys are pressed in each frame to build a diagram of the performance over time, known as a piano roll. It then translates this diagram into a representation that a music synthesizer recognizes as piano sounds. This translation step cleans up the data and adds the information needed to produce clear audio.
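The detect, translate, and synthesize flow described above can be sketched in Python. This is a toy illustration, not Audeo's method: Audeo uses trained neural networks for its stages, whereas this sketch assumes per-frame key-press probabilities are already available, thresholds them into a piano roll, converts the roll into note events, and plays each note as a plain sine wave. The frame rate, sample rate, threshold, and all function names are hypothetical.

```python
import math

# Hypothetical parameters for illustration only; not taken from the Audeo paper.
FPS = 25            # assumed video frame rate
SAMPLE_RATE = 8000  # audio samples per second for the toy synthesizer
THRESHOLD = 0.5     # assumed cutoff for treating a key as pressed
LOWEST_MIDI = 21    # MIDI note number of A0, the lowest key on an 88-key piano


def to_piano_roll(frame_probs, threshold=THRESHOLD):
    """Binarize per-frame, per-key press probabilities into a piano roll."""
    return [[p >= threshold for p in frame] for frame in frame_probs]


def roll_to_notes(roll):
    """Turn a piano roll into (midi_pitch, onset_frame, offset_frame) events."""
    notes = []
    num_keys = len(roll[0]) if roll else 0
    for key in range(num_keys):
        onset = None
        for t, frame in enumerate(roll):
            if frame[key] and onset is None:
                onset = t  # key goes down
            elif not frame[key] and onset is not None:
                notes.append((LOWEST_MIDI + key, onset, t))  # key released
                onset = None
        if onset is not None:  # key still held at the last frame
            notes.append((LOWEST_MIDI + key, onset, len(roll)))
    return sorted(notes, key=lambda n: (n[1], n[0]))


def midi_to_hz(pitch):
    """Equal-temperament frequency, with A4 (MIDI 69) at 440 Hz."""
    return 440.0 * 2 ** ((pitch - 69) / 12)


def synthesize(notes, num_frames, fps=FPS, sr=SAMPLE_RATE):
    """Toy synthesizer: add a sine wave for each note over its duration."""
    total = int(num_frames * sr / fps)
    audio = [0.0] * total
    for pitch, on, off in notes:
        freq = midi_to_hz(pitch)
        start, end = int(on * sr / fps), int(off * sr / fps)
        for i in range(start, min(end, total)):
            audio[i] += 0.2 * math.sin(2 * math.pi * freq * (i - start) / sr)
    return audio


# Usage: four frames, 88 keys; key index 48 (MIDI 69, A4) is held in frames 1-2.
probs = [[0.0] * 88 for _ in range(4)]
probs[1][48] = probs[2][48] = 0.9
roll = to_piano_roll(probs)
notes = roll_to_notes(roll)  # [(69, 1, 3)]
audio = synthesize(notes, num_frames=len(roll))
```

The piano roll is kept as a simple frame-by-key grid because that matches the "diagram of keys pressed over time" the article describes; a real system would replace the thresholding and sine synthesis with learned models and a proper piano synthesizer.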

During the project, the system was trained and tested on videos of professional pianist Paul Barton. Training used about 172,000 video frames of Barton playing music by well-known composers, and Audeo was then tested on almost 19,000 frames of Barton playing different pieces. Audeo's output is turned into sound by a synthesizer; the team used two different synthesizers.

The team tested Audeo's music with music recognition apps, which correctly identified the pieces Audeo played about 86% of the time. In comparison, the apps identified the same pieces from the original audio of the source videos about 93% of the time.
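To put the two recognition rates in perspective, a short calculation compares them both as a percentage-point gap and as a ratio. Both inputs are the approximate figures quoted in the article, so the results are rough; the function name is hypothetical.

```python
def compare_rates(audeo_rate, original_rate):
    """Return the gap in percentage points and Audeo's rate relative to the original."""
    gap_points = (original_rate - audeo_rate) * 100
    relative = audeo_rate / original_rate
    return gap_points, relative


# Approximate identification rates reported in the article.
gap_points, relative = compare_rates(0.86, 0.93)
print(f"gap: {gap_points:.0f} percentage points")
print(f"Audeo output identified at {relative:.1%} of the original-audio rate")
```

In other words, Audeo's output was recognized roughly nine times out of ten as often as the original recordings.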

The researchers say more work is needed to determine how well the system would transcribe music played by other pianists or on other pianos.

A paper on this technology was presented at the NeurIPS 2020 conference.




Discussion – 1 comment

Re: AI algorithm can create music that sounds like a live performance
#1
2021-Mar-11 10:20 PM

A probably ridiculous question comes to mind... Is it possible that the visual cues include, if not rely upon, the body language the performing artist exhibits during the piece?

It seems that even a minimal analysis of the keystrokes in any short, even incongruent, sequence of keys, seeing just one or two fingers, would easily identify the composition being performed.

I must be missing or misunderstanding something.
