Researchers at the University of Washington have developed an artificial intelligence (AI) system that allows headphone users to cancel all other sounds and focus on the voice of a target person in a noisy environment.
Called Target Speech Hearing, the AI system lets a user wearing headphones look at a person speaking for three to five seconds, which enrolls that speaker in the system. The headphones then play the enrolled speaker’s voice in real time, even as the listener moves around in noisy places and no longer faces the speaker.
While noise-canceling headphones have become ubiquitous in the audio marketplace, and many automatically adjust sound levels for wearers, sensing and isolating individual conversations remains harder to master.
AI is being used across multiple sectors for a variety of tasks, but this could usher in a new feature for popular wireless headphones like AirPods or other earbuds.
“We tend to think of AI now as web-based chatbots that answer questions,” said Shyam Gollakota, a UW professor in the Paul G. Allen School of Computer Science & Engineering. “But in this project, we develop AI to modify the auditory perception of anyone wearing headphones, given their preferences. With our devices you can now hear a single speaker clearly even if you are in a noisy environment with lots of other people talking.”
How it works
A person wearing headphones fitted with microphones taps a button while directing their head at someone talking. The Target Speech Hearing system relies on the fact that the sound waves from that speaker’s voice should then reach the microphones on both sides of the headset simultaneously.
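Conceptually, that simultaneity check is a time-difference-of-arrival (TDOA) test: a voice directly ahead of the wearer produces a cross-correlation peak between the two microphone channels at a lag near zero. The following is a minimal sketch of that idea, not the UW implementation; the channel arrays, sample rate, and lag threshold are all illustrative assumptions:

```python
import numpy as np

def tdoa_lag(left: np.ndarray, right: np.ndarray, sr: int) -> float:
    """Estimate the time difference of arrival (in seconds) between the
    left and right microphone channels from the peak of their
    cross-correlation."""
    corr = np.correlate(left, right, mode="full")
    lag_samples = int(np.argmax(corr)) - (len(right) - 1)
    return lag_samples / sr

def speaker_is_ahead(left: np.ndarray, right: np.ndarray, sr: int,
                     max_lag_s: float = 1.5e-4) -> bool:
    """A voice straight ahead reaches both microphones at (nearly) the
    same instant, so its cross-correlation peak sits at a lag near zero.
    The ~150 microsecond threshold is an illustrative assumption: it is
    roughly the interaural delay a source ~16 degrees off-axis would
    produce for a typical head width."""
    return abs(tdoa_lag(left, right, sr)) <= max_lag_s
```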
UW said there is a 16° margin of error. The signal is sent to an on-board embedded computer, where the AI software learns the desired speaker’s vocal patterns. The system then latches onto that voice and continues to play it back to the listener, even as the pair moves around. And as the speaker keeps talking, the system gathers more training data on the target voice, improving its ability to isolate it.
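The paper’s actual neural networks are beyond the scope of a news summary, but the overall control flow can be sketched. In the sketch below, embed, separate, frames, and out are all hypothetical stand-ins (a speaker-embedding network, a target-speech-extraction network, the live microphone stream, and the audio output), and the moving-average refinement is one plausible reading of learning from continued speech, not the published method:

```python
def enroll(enrollment_audio, embed):
    """Condense the 3-5 second enrollment clip into a single voice
    signature vector. embed is a hypothetical speaker-embedding network
    standing in for whatever model the UW system actually uses."""
    return embed(enrollment_audio)

def listen_loop(frames, signature, separate, embed, out, alpha=0.05):
    """Per-frame processing loop: isolate the enrolled voice, play it
    back immediately, and keep refining the signature as the target
    speaker keeps talking."""
    for frame in frames:                     # frames: live mic stream
        target = separate(frame, signature)  # keep only the target voice
        out.write(target)                    # low-latency playback
        # Refine the signature with an exponential moving average; this
        # mirrors (but does not reproduce) how continued speech gives
        # the system more training data on the target speaker.
        signature = (1 - alpha) * signature + alpha * embed(target)
```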
UW tested the AI system on 21 subjects and found that, on average, the enrolled speaker’s voice was nearly twice as clear as the unfiltered audio. If the sound quality is not good, the wearer can run another enrollment to improve clarity.
The next steps are to test the system on earbuds and hearing aids, and to allow more than one speaker to be enrolled at a time.
The full research can be found in the proceedings of the ACM CHI '24 conference.