For Hollywood, the conversational ability of machines (including evil robots) has long stood as a sort of barometer of the technological level of the story's world. By that measure, the leading edge of mass-marketed, voice-interactive products probably puts us somewhere in an early Star Trek TV episode. The good news is that these early products merely scratch the surface of underlying speech-recognition technologies that are becoming increasingly accessible to any developer.
Voice control is certainly not new. What is different with products such as Amazon Echo, ivee and VoicePod is not only their "always-on" voice recognition capabilities, but also their position in the connected-device hierarchy. Not necessarily hubs in the traditional sense of the word, these devices serve as voice-interactive user interfaces to connected devices, which link to them directly through standard wireless connectivity protocols or indirectly through a cloud-based app.
For developers, what is exciting about these mass-market voice-capable appliances is their promise to stand as a universal voice-responsive remote control for the user's personal Internet of Things (IoT) space. Most notably, Amazon has moved quickly to enable developers to leverage Echo and the associated cloud-based Alexa voice-processing service. Amazon's Alexa Skills Kit provides a collection of APIs, tools, documentation and code samples designed to let anyone add functionality (or "skills") to Alexa using a familiar "app store" model.
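To make the "skills" model concrete, the sketch below shows how a custom skill's back end might dispatch an incoming Alexa Skills Kit request and build a spoken response. The request/response envelope follows the kit's published JSON format, but the "LightOnIntent" intent name and the reply text are hypothetical examples, not part of any real skill.

```python
def handle_request(event):
    """Map an Alexa Skills Kit request to a response envelope.

    The JSON shapes follow the Alexa custom-skill interface; the
    "LightOnIntent" intent is a hypothetical example.
    """
    req = event["request"]
    if req["type"] == "LaunchRequest":
        text = "Welcome. What would you like to do?"
    elif req["type"] == "IntentRequest" and req["intent"]["name"] == "LightOnIntent":
        text = "Turning the light on."   # a real skill would command the device here
    else:
        text = "Sorry, I didn't understand that."
    return {
        "version": "1.0",
        "response": {
            "outputSpeech": {"type": "PlainText", "text": text},
            "shouldEndSession": True,
        },
    }

# A synthesized IntentRequest, shaped like what Alexa POSTs to a skill
sample = {"request": {"type": "IntentRequest",
                      "intent": {"name": "LightOnIntent", "slots": {}}}}
print(handle_request(sample)["response"]["outputSpeech"]["text"])
```

In a deployed skill this handler would typically run as an AWS Lambda function or behind an HTTPS endpoint, with Alexa performing all of the speech recognition before the JSON ever reaches the code.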
Even more intriguing for developers is Amazon's Alexa Voice Service (AVS), which enables developers to add voice interaction to their own connected devices. With this approach, users talk to the connected device through the device's own microphone; in turn, the device uses AVS to process the voice stream, but responds through its own speakers. AVS remains in the background throughout, much like any cloud-based software-as-a-service (SaaS) offering for database management, SMS delivery, or similar utility-level functions.
Google's OnHub offers an interesting contrast to voice-interactive UI appliances such as Echo. Along with its primary router functionality, OnHub lives up to its name with more traditional hub features and connectivity. OnHub notably includes a speaker—to support setup of connected devices—but its lack of a microphone suggests a more complementary role in the Google-verse of Android devices. Here, OnHub might serve as a voice-responsive hub controlled primarily by smartphone apps, Android voice services and Google's deep experience in voice recognition built over years of performing Google Voice transcriptions.
Whether through standalone appliances such as Echo or smartphone-based Android (or iOS) solutions, it is clear that voice control of connected devices is emerging. However, rather than waiting to see if it is a world driven by sentences starting with "Alexa," "Ok Google," or "Hey Siri," engineers can explore a number of other alternatives.
While high-volume commercial designs can take advantage of a number of powerful (but expensive) solutions, engineers looking to explore this voice-recognition trend can find modules and add-on boards designed to simplify implementation of voice recognition. For example, VeeaR's EasyVR 3 module is a general solution that connects to a system board through a standard UART interface. EasyVR 3 offers a fairly limited vocabulary, but enough to control simple applications. If you are an Arduino user, VeeaR also offers the module as an Arduino shield.
Another Arduino shield, MOVI (for "my own voice interface"), is a Kickstarter-funded project that promises offline voice recognition and voice synthesis, supporting up to 200 customizable sentences (in English). MOVI provides a simple programming interface that allows developers to code customized responses to spoken phrases. Designers can also use MOVI's serial interface to connect the board to non-Arduino designs.
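With serial-connected boards like EasyVR 3 and MOVI, the host's job reduces to parsing recognition results arriving over the UART and mapping them to actions. The sketch below shows that host-side pattern; the "[SENTENCE] ..." line format and the phrase table are illustrative assumptions, not either board's actual protocol.

```python
# Hypothetical host-side dispatcher for a serial voice-recognition board.
# The "[SENTENCE] ..." line format is illustrative, not a real protocol.

def make_dispatcher(actions):
    """Return a function that maps one raw serial line to an action result."""
    def dispatch(line):
        line = line.strip()
        if not line.startswith("[SENTENCE]"):
            return None  # status message or noise; ignore it
        phrase = line[len("[SENTENCE]"):].strip().lower()
        handler = actions.get(phrase)
        return handler() if handler else None
    return dispatch

# Hypothetical phrase-to-action table for a simple lighting application
dispatch = make_dispatcher({
    "turn on the light": lambda: "light_on",
    "turn off the light": lambda: "light_off",
})

print(dispatch("[SENTENCE] Turn on the light"))
```

In a real design, the lines fed to `dispatch` would come from the board's UART (for example via a serial library), and the lambdas would toggle GPIO pins or send commands to connected devices instead of returning strings.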
Voice recognition libraries continue to emerge in Arduino and other development communities. For example, the open-source Jasper platform runs on a Raspberry Pi outfitted with a WiFi adapter and USB microphone. Designed for always-on voice recognition, the platform works with a variety of third-party speech-to-text and text-to-speech engines including Google's. Jasper ties the various components together and provides both user-initiated interaction and event-based notification.
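Jasper's pluggable design can be sketched as a core loop that depends only on abstract speech-to-text and text-to-speech interfaces, with phrase-handling modules layered on top. The class and method names below are illustrative stand-ins, not Jasper's actual API.

```python
# A minimal sketch of a Jasper-style architecture: the core loop depends
# only on STT/TTS interfaces, so third-party engines can be swapped in.
# All names here are illustrative, not Jasper's real API.

class EchoBackModule:
    """A stand-in 'skill' module: handles any phrase by repeating it."""
    def matches(self, phrase):
        return bool(phrase)
    def handle(self, phrase):
        return "You said: " + phrase

def run_once(stt, tts, modules, keyword="jasper"):
    """One user-initiated interaction: listen for the keyword, then a command."""
    if keyword not in stt.transcribe_passive().lower():
        return None                    # keyword not heard; keep listening
    command = stt.transcribe_active()  # full transcription of the command
    for module in modules:
        if module.matches(command):
            response = module.handle(command)
            tts.say(response)
            return response
    return None

# Fake engines standing in for real back ends (e.g. Google's speech services)
class FakeSTT:
    def transcribe_passive(self): return "Jasper"
    def transcribe_active(self): return "what time is it"

class FakeTTS:
    def say(self, text): pass  # a real engine would synthesize audio here

print(run_once(FakeSTT(), FakeTTS(), [EchoBackModule()]))
```

Because the loop never touches a concrete engine, swapping Google's speech-to-text for an offline engine (or testing with fakes, as above) requires no change to the core logic, which mirrors how Jasper ties third-party components together.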
For Microsoft developers, CastleOS leverages the Microsoft Kinect microphone system to provide voice control of a wide range of supported devices through a CastleOS Kinect app available on all the popular mobile platforms. Engineers can write custom C# scripts to create specific voice-driven actions. Looking to reach a broader market for Echo-like voice-control home automation, CastleOS also has its own standalone voice-recognition hub in the works.
For developers, it is not difficult to jump on the voice-recognition bandwagon for voice-based home automation—and voice control of IoT devices in general. Inevitably, the more complex or unusual the voice command, the greater the chance for misconstrued phrases. It is a worthwhile gamble, though, because in the end, if voice is not working, you can always fall back on more traditional control or even "just use the keyboard."
Questions or comments on this story? Contact firstname.lastname@example.org