Natural language processing (NLP) and ambient sound processing have traditionally been cloud-only technologies, a constraint that has restricted adoption in many markets.
But with advances in deep learning model compression and artificial intelligence (AI) chipsets, both technologies could migrate quickly to consumer and connected devices, according to new data from ABI Research.
The market research firm forecasts that more than 2 billion end devices will be shipped with a dedicated chipset for ambient sound or NLP by 2026.
“NLP and ambient sound processing will follow the same cloud-to-edge evolutionary path as machine vision,” said Lian Jye Su, principal analyst for AI and machine learning at ABI Research. “Through efficient hardware and model compression technologies, this technology now requires fewer resources and can be fully embedded in end devices. At the moment, most of the implementations focus on simple tasks, such as wake word detection, scene recognition and voice biometrics. However, moving forward, AI-enabled devices will feature more complex audio and voice processing applications.”
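Wake word detection of the kind Su describes is well suited to end devices because the model can be tiny. As an illustrative sketch only (the feature vector, template and threshold below are invented for the example, not taken from any shipping product), a device might compare each incoming audio frame's feature vector against a stored wake-word template and fire only above a similarity threshold:

```python
import math

# Hypothetical wake-word template, e.g. averaged acoustic features learned
# offline; on a real device this would come from a compressed neural model.
WAKE_TEMPLATE = [0.9, 0.1, 0.4, 0.7]

def cosine_similarity(a, b):
    """Cosine similarity between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def is_wake_word(features, threshold=0.95):
    """Trigger only when the frame closely matches the stored template."""
    return cosine_similarity(features, WAKE_TEMPLATE) >= threshold

print(is_wake_word([0.9, 0.1, 0.4, 0.7]))  # frame matching the template
print(is_wake_word([0.1, 0.9, 0.7, 0.1]))  # unrelated frame
```

Because only this lightweight matching step runs continuously, the device can stay offline and wake the heavier speech pipeline (local or cloud) only on a hit.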
Alexa, Google Assistant, Siri and other voice AI assistants have exploded in the home and enterprise sectors. This year, Apple said Siri would process certain requests and actions offline, freeing it from constant internet connectivity and improving the overall smartphone experience, ABI said. Competitors will follow suit, especially Google, which is using its Tensor system-on-chip (SoC) to bring similar support to the Android operating system and billions of consumer and connected devices.
In the enterprise, ambient sound processing is still at a nascent stage, but it could grow quickly as sensor vendors trial machine-sound analysis for uptime tracking, predictive maintenance and machinery analytics. Combining machine sound with other readings such as temperature, pressure and torque can help predict machine health and longevity.
Meanwhile, chipset vendors are forming partnerships to bring multilingual speech recognition to low-power audio devices. A recent collaboration between Syntiant and Renesas aimed to provide multimodal AI that combines deep learning with audio processing.
“Aside from dedicated hardware, machine learning developers are also looking to leverage various novel machine learning techniques such as multimodal learning and federated learning,” Su said. “Through multimodal learning, edge AI systems can become smarter and more secure if they combine insights from multiple data sources. With federated learning, end users can personalize voice AI in end devices, as edge AI can improve based on learning from their unique local environments.”
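The federated learning pattern Su mentions can be sketched in a few lines. In this toy illustration (the model, data and function names are invented for the example, not any vendor's API), each device fine-tunes a tiny linear model on its own private data and a server averages only the resulting weights, so raw audio never leaves the device:

```python
def local_update(weights, data, lr=0.1):
    """One least-squares gradient step on a device's private (x, y) pairs."""
    grads = [0.0] * len(weights)
    for x, y in data:
        pred = sum(w * xi for w, xi in zip(weights, x))
        err = pred - y
        for i, xi in enumerate(x):
            grads[i] += 2 * err * xi / len(data)
    return [w - lr * g for w, g in zip(weights, grads)]

def federated_average(weight_sets):
    """Server-side step: average the weights returned by each device."""
    n = len(weight_sets)
    return [sum(ws[i] for ws in weight_sets) / n
            for i in range(len(weight_sets[0]))]

# Two hypothetical devices, each adapting to a different local environment
global_weights = [0.0, 0.0]
device_a_data = [([1.0, 0.0], 1.0)]
device_b_data = [([0.0, 1.0], 2.0)]
updates = [local_update(global_weights, d)
           for d in (device_a_data, device_b_data)]
global_weights = federated_average(updates)
```

Repeating this round many times lets the shared model improve from every device's unique environment while keeping the training data local, which is the privacy argument behind federated personalization of voice AI.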