Researchers from the University of Washington have achieved what was once thought to be impossible. Their new artificial intelligence (AI) system, called Audeo, uses only visual cues to create music that sounds like a live performance.
Audeo creates audio from silent videos of piano performances. To do this, the algorithm must first identify the features in the video frames that relate to the music being played, and then infer the sounds that occur between frames. Audeo decodes a video into music in a series of steps. It first detects which keys are pressed and records them in a diagram, then translates that diagram into a form that a music synthesizer recognizes as piano sounds. This translation step cleans up the data and adds more information so that the resulting audio is clear.
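The article does not describe Audeo's internals, but the step from "keys pressed" to something a synthesizer can play can be illustrated. The sketch below assumes the diagram is a per-frame grid of 88 on/off values (one per piano key) and turns it into notes with start and end times. The names `roll_to_notes` and `Note` are hypothetical, and the 25 fps frame rate is an assumption, not a figure from the research.

```python
from dataclasses import dataclass

FPS = 25  # assumed video frame rate (hypothetical, not from the article)
LOWEST_MIDI_PITCH = 21  # key 0 of an 88-key piano is A0, MIDI note 21


@dataclass
class Note:
    pitch: int     # MIDI note number
    onset: float   # start time in seconds
    offset: float  # end time in seconds


def roll_to_notes(roll, fps=FPS):
    """Convert a per-frame grid of pressed keys (list of frames, each a list of
    88 values that are 1 if the key is pressed, else 0) into notes.
    A note starts when a key changes from 0 to 1 and ends when it returns
    to 0, or when the video ends."""
    notes = []
    num_keys = len(roll[0]) if roll else 0
    for key in range(num_keys):
        start = None
        for t, frame in enumerate(roll):
            pressed = frame[key] == 1
            if pressed and start is None:
                start = t  # key was just pressed
            elif not pressed and start is not None:
                notes.append(Note(LOWEST_MIDI_PITCH + key, start / fps, t / fps))
                start = None  # key was released
        if start is not None:  # key still held in the last frame
            notes.append(Note(LOWEST_MIDI_PITCH + key, start / fps, len(roll) / fps))
    notes.sort(key=lambda n: (n.onset, n.pitch))
    return notes


# Usage: middle C (key 39) held for two frames, E4 (key 43) from frame 1 to the end.
roll = [[0] * 88 for _ in range(4)]
roll[0][39] = roll[1][39] = 1
roll[1][43] = roll[2][43] = roll[3][43] = 1
for note in roll_to_notes(roll):
    print(note)
```

A real system has to cope with noisy detections (keys flickering on and off between frames), which is one reason a cleanup step like the one described above is needed before synthesis.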
During the project, the system was trained and tested on videos of professional pianist Paul Barton. Training used about 172,000 video frames of Barton playing music by well-known composers. Audeo was then tested on almost 19,000 frames of Barton playing different pieces. Audeo's output is turned into audible music by a synthesizer, and the team used two different synthesizers.
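To show what "turning notes into sound" means in the simplest case, here is a toy synthesizer that renders a list of notes as decaying sine waves. It is a stand-in for illustration only, not either of the synthesizers the team used; the function names, the 16 kHz sample rate, and the decay constant are all assumptions. Notes are plain `(pitch, onset_seconds, offset_seconds)` tuples with MIDI pitch numbers.

```python
import math

SAMPLE_RATE = 16000  # samples per second (assumed)


def midi_to_hz(pitch):
    """Equal-temperament frequency, with A4 (MIDI 69) tuned to 440 Hz."""
    return 440.0 * 2 ** ((pitch - 69) / 12)


def synthesize(notes, sample_rate=SAMPLE_RATE):
    """Render (pitch, onset_s, offset_s) tuples as mono samples in [-1, 1]."""
    if not notes:
        return []
    total = int(math.ceil(max(off for _, _, off in notes) * sample_rate))
    out = [0.0] * total
    for pitch, onset, offset in notes:
        freq = midi_to_hz(pitch)
        start = int(onset * sample_rate)
        end = min(int(offset * sample_rate), total)
        for i in range(start, end):
            t = (i - start) / sample_rate
            # Exponentially decaying sine: a crude imitation of a struck string.
            out[i] += math.exp(-3.0 * t) * math.sin(2 * math.pi * freq * t)
    # Scale so the loudest sample has magnitude 1 (avoid dividing by zero).
    peak = max(abs(s) for s in out) or 1.0
    return [s / peak for s in out]


# Usage: one second of audio, C4 then E4 overlapping it.
samples = synthesize([(60, 0.0, 0.5), (64, 0.25, 1.0)])
print(len(samples))
```

Real piano synthesizers model far more (timbre, pedal, key velocity), which is why the choice of synthesizer affects how realistic the result sounds.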
The team tested the Audeo-generated music with music recognition apps, which correctly identified the pieces Audeo played about 86% of the time. For comparison, the apps identified the same pieces from the source videos about 93% of the time.
The researchers say more work is needed to determine how well the system would transcribe other musicians and other pianos.
A paper on this technology was presented at the NeurIPS 2020 conference.
