Speech recognition has been around for many years, but it was not until Apple launched Siri with the iPhone 4S in October 2011 that the technology became mainstream. Despite its potential, speech recognition has taken a back seat to other innovations like the touch screen, but new developments by Nuance, Google and Intel are setting the stage for this technology to finally take off in 2014.
These developments are driving a new wave of voice-controlled applications and consumer devices that are changing the mobile experience:
- Voice control is a key component of the UI in new consumer devices. This is particularly true for wearables, where voice commands are better suited to a small screen than keyboard or touchscreen entry. The Samsung smart watch launched in October 2013 and the Omate TrueSmart Android watch presented at CES 2014 both use voice control as the main UI, and the new Windows-based Intel ultrabooks and tablets launched for the 2013 holiday season come with a voice-controlled Personal Assistant (PA). All of these devices use speech recognition technology developed by Nuance.
- Speech recognition adds value to the mobile experience. Financial institutions in the United States and Europe are using voice biometrics technology developed by Nuance for secure mobile app authentication, and companies like USAA, Garanti and Geico are using the Nuance voice-controlled virtual assistant to upgrade their mobile apps from tools that simply fulfill transactions and data requests into virtual advisors.
- Smartphones are transitioning into voice-controlled, PA-centric devices. Apple started the trend with Siri, and Google is following closely. In December 2013 at the LeWeb conference in Paris, Google Engineering Director Scott Huffman stated that Google's goal is to create the "ultimate personal assistant," with conversation as the main user interface. Intel is also jumping on the PA bandwagon: at CES 2014 in Las Vegas, the company presented Jarvis, a Bluetooth headset that connects wirelessly with a smartphone and integrates with a PA app, letting users interact with the phone remotely via voice commands.
The technology driving these developments is natural language understanding (NLU), which allows devices to interpret what the user says in context. The main barrier to speech recognition until now has been usability; by incorporating NLU into its voice recognition technology, Nuance can enable an interaction that more closely resembles natural conversation. Google takes a different approach to the same end: by integrating Google Now and the Knowledge Graph with its search engine, it has designed a voice-controlled PA that allows a more natural conversation. These developments by Nuance and Google bring human-computer interaction (HCI) to a new level: enabling natural conversation patterns means users no longer have to structure voice commands in a rigid way for their device to understand which instructions to execute.
The stage is set for the next mobile battleground: the voice-controlled Personal Assistant. It is not yet clear how Google, Apple and Microsoft will manage PA-centric devices, or whether they will allow users to download a competing PA or block its access to the address book, camera and other device features. This is uncharted territory, but the app store experience indicates that an open platform with third-party developers, as opposed to a closed environment, creates value for users, device manufacturers and developers alike.
The voice-controlled PA will drive adoption of speech recognition in mobile applications. Developers will need to evaluate how voice can add value to their apps, and they will need to upgrade their skill sets to work with new speech capabilities at the hardware and OS levels. We expect conversational interfaces to be a major trend in 2014, one that will transform the overall mobile user experience.
Raúl Castañón is a senior analyst with Yankee Group, where he focuses on analyzing opportunities around mobile applications and their impact on the mobile ecosystem. Before joining Yankee Group, Castañón was a product manager at EMOSpeech, developing market analysis in the field of emotion recognition technology to identify strategic opportunities for product development. Prior to EMOSpeech, he worked at Novell, developing product and partner strategy for endpoint management, collaboration and cloud-based enterprise software applications. At Comverse Network Systems, he gained significant experience in product marketing, working side by side with Tier 1 mobile operators to define strategy, pricing and business models for voice and data products. Castañón holds a B.S. in marketing and business administration from ITESO University in Guadalajara, Mexico, and an M.B.A. from Duke University.