Acquired Electronics360


Taking Voice Conversion to the Next Level

03 January 2017

Scientists and engineers have been working to develop a reliable and flexible voice-conversion technique. Most approaches rely on statistical models, with Gaussian mixture models considered the mainstream approach. Unfortunately, most of these voice-conversion methods require parallel data to train the system. This means that speech data from the source and target speakers must be aligned so that each frame of the source speaker’s data corresponds to a frame of the target speaker’s data. This reliance on parallel data has kept the techniques from gaining enough traction to achieve broad adoption.
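To make the frame-alignment requirement concrete, the sketch below aligns two short sequences of feature frames with dynamic time warping (DTW). The article does not say which alignment method parallel-trained systems use, so DTW, the function name `dtw_align`, and the toy one-dimensional frames are all assumptions chosen for illustration.

```python
import math


def dtw_align(source, target):
    """Align two sequences of feature frames with dynamic time warping.

    Illustrative only: `source` and `target` are lists of equal-length
    feature vectors (hypothetical stand-ins for real acoustic features).
    Returns a list of (source_index, target_index) pairs.
    """
    n, m = len(source), len(target)

    def dist(a, b):
        # Euclidean distance between two frames
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    inf = float("inf")
    # cost[i][j] = cheapest cost of aligning the first i source frames
    # with the first j target frames
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = dist(source[i - 1], target[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # source frame repeats a target frame
                                 cost[i][j - 1],      # target frame repeats a source frame
                                 cost[i - 1][j - 1])  # both sequences advance

    # Walk back from the end to recover the frame-to-frame pairing
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min((cost[i - 1][j - 1], i - 1, j - 1),
                      (cost[i - 1][j], i - 1, j),
                      (cost[i][j - 1], i, j - 1))
    path.reverse()
    return path


# The target holds its first sound for one extra frame
src = [[0.0], [1.0], [2.0]]
tgt = [[0.0], [0.0], [1.0], [2.0]]
print(dtw_align(src, tgt))  # [(0, 0), (0, 1), (1, 2), (2, 3)]
```

An alignment like this only makes sense when both speakers have said the same thing, which is why collecting parallel training data is such a burden.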

A voice-conversion technique developed by researchers at the University of Electro-Communications in Japan, however, may offer a more viable alternative. The model created by Toru Nakashika and his colleagues uses an adaptive restricted Boltzmann machine, which does not require parallel data from the two speakers to train the system. Testing has shown that Nakashika’s approach can deconstruct and rebuild the source speaker’s speech, creating a voice that sounds like a different person.

The researchers have based this voice-conversion model on the premise that the acoustic features of speech consist of two layers: neutral phonological information, which is not associated with a specific person; and speaker identity features, which make words sound like they come from a specific speaker. After training the system, the researchers found that the model’s performance was comparable to that of existing parallel-trained models, with one unique advantage: the system can generate new phonemic sounds for the target speaker, which makes it possible to generate speech in the target speaker’s voice in a different language.
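The two-layer premise can be illustrated with a deliberately simplified toy. It is not the researchers' adaptive restricted Boltzmann machine. Here each speaker is reduced to a hypothetical per-dimension scale and offset, and conversion strips the source speaker's parameters to reach a speaker-neutral representation before applying the target speaker's parameters. The `Speaker` class, the function names, and all numbers are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Speaker:
    """Hypothetical speaker-identity parameters (not from the article)."""
    scale: List[float]
    offset: List[float]


def encode(frame, speaker):
    # Remove the speaker-specific layer, leaving a neutral representation
    return [(x - o) / s for x, s, o in zip(frame, speaker.scale, speaker.offset)]


def decode(neutral, speaker):
    # Re-apply a speaker-specific layer to the neutral representation
    return [c * s + o for c, s, o in zip(neutral, speaker.scale, speaker.offset)]


def convert(frame, source, target):
    # Deconstruct with the source speaker, rebuild with the target speaker
    return decode(encode(frame, source), target)


source = Speaker(scale=[2.0, 1.0], offset=[1.0, 0.0])
target = Speaker(scale=[0.5, 3.0], offset=[-1.0, 2.0])

print(convert([5.0, 4.0], source, target))  # [0.0, 14.0]
```

Because each speaker's parameters stand alone in this toy, nothing in it requires the two speakers to have recorded the same utterances. That is the property that makes non-parallel training attractive.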

While this technology is still in the early stages of development, the simple, intuitive and flexible nature of the model promises to open the door to a number of applications. These include security functions, such as speaker identification and authentication, and interface control modalities, such as speech recognition.
