Information Technology

Skype Demos Real-Time Language Translation

17 December 2014

Text, voice and video chat company Skype has demonstrated real-time language translation between English and Spanish using technology supplied by Microsoft Research. Microsoft agreed to acquire Skype for about $8.5 billion in May 2011. Skype demonstrated the technology by way of a video chat between Peterson School in Mexico City and Stafford Elementary School in Tacoma, Washington.

While the technology is still in a "preview" or beta mode, the company is encouraging users to sign up to help improve it and extend the number of languages it supports.

Skype Translator relies on machine learning and recent improvements in speech recognition made possible by the introduction of deep neural networks, Skype said. The output of the neural network recognizer is passed to machine translation after disfluencies, the ums and ahs typical of speech, have first been excised. By training on data gathered during the preview stage, the software can learn to better recognize and translate the diversity of accents that people use and the specialized terms that may be relevant to various topics.
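The disfluency-removal step can be pictured with a minimal Python sketch. It is an illustration only, not Skype's implementation: the filler list, the function name `remove_disfluencies` and the regular-expression approach are all assumptions, since the article says only that "ums and ahs" are excised before translation.

```python
import re

# Hypothetical filler list; the article mentions only "ums and ahs".
FILLERS = ("um", "uh", "ah", "er", "hmm")

# Match a whole filler word, an optional trailing comma or period,
# and any whitespace that follows it.
_FILLER_RE = re.compile(
    r"\b(?:" + "|".join(FILLERS) + r")\b[,.]?\s*",
    flags=re.IGNORECASE,
)


def remove_disfluencies(transcript: str) -> str:
    """Strip filler words from a recognized transcript before translation."""
    cleaned = _FILLER_RE.sub("", transcript)
    # Collapse any doubled spaces left behind and trim the ends.
    return re.sub(r"\s+", " ", cleaned).strip()


cleaned = remove_disfluencies("Um, I want uh to book ah a flight")
```

A production system would more likely learn which tokens to drop from data than rely on a fixed word list; the fixed list simply keeps the sketch short.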

The training data comes from Skype Translator conversations that have been recorded for analysis, as well as from translated web pages and captioned videos. Skype Translator participants are notified as the call begins that their conversation will be recorded and used by Microsoft to improve the software.

As well as speaking the translation aloud, the software produces a text transcript divided into sentences, complete with punctuation and capitalization.

ANNs on graphics cards

The technology is based on the use of artificial neural networks, pioneered by Microsoft Research for speech transcription, as an alternative to context-dependent Gaussian mixture model hidden Markov models (CD-GMM-HMMs). The approach was proposed by Frank Seide, Gang Li and Dong Yu in a paper published at the Interspeech 2011 conference.

Part of the group's progress came from modelling senones, very short sub-phonetic units of speech of which there can be thousands, rather than the relatively few phonemes. By modelling senones directly using deep neural networks it was possible to outperform conventional CD-GMM-HMM large-vocabulary speech recognition systems. This capability also arrived at a time when it was becoming possible to map neural network computation onto graphics processing units and graphics cards. The resulting speed-up contributed to the feasibility of the architecture.
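To make the idea of modelling senones directly more concrete, the sketch below shows how a network's per-frame outputs are commonly turned into scores that a hidden Markov model decoder can use. It follows the usual hybrid recipe (a softmax over senones, then division by the senone prior) rather than anything stated in the article; the function names, the four-senone toy frame and the prior values are illustrative assumptions.

```python
import math


def softmax(logits):
    """Turn raw network outputs into a probability distribution over senones."""
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_log_likelihoods(logits, senone_priors):
    """Convert senone posteriors p(s|x) into log p(x|s) up to a constant.

    An HMM decoder expects likelihoods, so each posterior is divided by
    the senone prior p(s); p(x) is the same for every senone and is dropped.
    """
    posteriors = softmax(logits)
    return [
        math.log(post) - math.log(prior)
        for post, prior in zip(posteriors, senone_priors)
    ]


# Toy frame with 4 senones (assumed values); a real system has thousands.
logits = [2.0, 0.5, 0.1, -1.0]
priors = [0.4, 0.3, 0.2, 0.1]
scores = scaled_log_likelihoods(logits, priors)
best = max(range(len(scores)), key=scores.__getitem__)
```

With thousands of senones, the final layer becomes a large matrix multiplication for every frame of audio, which is the kind of workload that maps well onto graphics processors.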

In addition, Skype and Microsoft Research have created a customized "bot" that controls the call experience, organizes the audio, text and data streams, and acts as a human translator would.

The Skype Translator preview program is currently available for Spanish-speaking and English-speaking Skype customers who use Windows 8.1 or Windows 10 Technical Preview on their desktop or tablet computer.

Related links and articles:

Interspeech 2011 paper of Seide, Li and Yu

IHS IT research

