Speech technologies offer an innovative way to enhance and differentiate apps

Peggy Albright

Speech recognition technologies have advanced considerably in the past couple of years, crossing important business and technical milestones that now make it possible for mobile developers to add speech features to their applications. Developers seeking new ways to differentiate their products should pay attention to the opportunities speech can bring.

Speech technology has gained broad market acceptance in large part thanks to Apple's (NASDAQ:AAPL) virtual assistant, Siri, which has made speech-enabled interactions compelling to consumers on a personal level.

But advocates also consider speech a mature technology, one whose accuracy and performance have improved enough to justify wider use in the market. Speech technologies exploit the convenience and cost advantages of cloud-based computing, and they benefit from widespread consumer access to mobile broadband networks and smart devices. The speech engines themselves, refined through ongoing customer usage and constant innovation by the scientists who build them, now have vocabularies of millions of words, can handle natural, conversational language and can even infer the behavioral and other contexts in which consumers use their speech-enabled apps.

Dan Miller, senior analyst and founder at Opus Research, highly recommends that developers give speech a try. "Recognition of what people are saying and recognition of their intent, or understanding their context, are improving to the point that it really does make sense to add spoken conversational options to your app," he said.

Technology options available to developers are also broadening. AT&T (NYSE:T), for example, announced earlier this month that in June it will begin offering developers tools to integrate its AT&T Watson speech and language engine into mobile apps. The company will release APIs for a range of applications, reflecting a strategy to bring speech into everything from mobile search to TV guide interactions.

The first APIs will focus on Web search, local business search, question-and-answer apps, voicemail-to-text, SMS, the U-verse TV electronic programming guide and dictation. The company has more APIs coming that will extend its speech features to gaming and social media applications, among others.

"This is the beginning of a long road of very exciting set of APIs. Ultimately we are really trying to make speech pervasive and ubiquitous across all apps," said Mazin Gilbert, associate vice president of technical research at AT&T. 

AT&T has used the Watson engine for years in its customer-facing telecommunications services, as well as in some mobile voice search applications. One recent innovative example is the AT&T Translator app for iOS and Android phones, which translates conversations between people speaking different languages in real time.

Nuance Communications began offering developers access to its speech technologies a year ago, when it launched its Dragon Mobile SDK. Its developer program, called NDEV Mobile, now has 10,000 members.

Matt Revis, vice president of the handset business at Nuance Mobile, said developers are coming up with new and exciting apps every week. He cited several examples that illustrate the creative thinking going into these apps.

Amazon just launched a Dragon-powered speech application for iOS and Android phones called "Price Check." The app is for people who are shopping in a retail store and want to evaluate the price of a product. The shopper speaks the name of a product he or she is considering, and the app displays prices for that product from Amazon and other online merchants.

Kraft Foods recently used Dragon technology to add a voice input feature to its cooking application, iFood Assistant. With the app, now available on the iPhone, a customer can speak the name of a dish, such as "chicken casserole," and the app runs a search and returns a recipe along with a shopping list.

These few examples barely scratch the surface of what advocates envision for speech-enabled apps. While the technology gives developers a new interface with which to innovate and compete, success will be just as difficult to achieve with speech-enabled apps as it is with traditional ones.

"The successful applications will be creative," said Bill Meisel, president of TMA Associates. "Creativity is a skill that takes a certain type of personality and type of work. Creativity is what makes Siri so successful," he said.

As sophisticated as speech technologies have become, every system makes mistakes sometimes, and developers need to consider how occasional failures could affect the particular apps they're building. For example, developers should give users more than one way to get results from a query when recognition fails. If a virtual assistant needs to ask a user to clarify a request, give the assistant friendly, clever language to use when offering to help. Build in safeguards so that users still succeed with the app.--Peggy
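To make the error-handling advice above concrete, here is a minimal sketch of one way an app might react to an uncertain recognition result: act on confident results, politely confirm borderline ones, offer a short list of alternatives when the engine is unsure, and fall back to typed input when recognition fails outright. It does not use the AT&T Watson or Nuance Dragon Mobile APIs; the `RecognitionResult` type, the `handle_utterance` function and the two confidence thresholds are all hypothetical, assumed purely for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical recognizer output; real speech SDKs define their own result types.
@dataclass
class RecognitionResult:
    text: str                # top hypothesis from the engine
    confidence: float        # assumed to be in the range 0.0 to 1.0
    alternatives: List[str] = field(default_factory=list)  # other n-best hypotheses

ACCEPT_THRESHOLD = 0.80   # assumed value: act on the result directly
CONFIRM_THRESHOLD = 0.50  # assumed value: ask the user to confirm
FALLBACK_PROMPT = "Sorry, I didn't catch that. You can try again or type your request."

def handle_utterance(result: Optional[RecognitionResult]):
    """Return a (kind, payload) tuple telling the app what to do next."""
    if result is None or not result.text.strip():
        # Recognition failed outright: offer a non-speech path instead of a dead end.
        return ("fallback", FALLBACK_PROMPT)
    if result.confidence >= ACCEPT_THRESHOLD:
        # Confident enough to run the query as heard.
        return ("search", result.text)
    if result.confidence >= CONFIRM_THRESHOLD:
        # Borderline: a friendly confirmation instead of a bare error message.
        return ("confirm", f'Did you mean "{result.text}"?')
    if result.alternatives:
        # Low confidence but other candidates exist: let the user pick from up to three.
        return ("choose", [result.text] + result.alternatives[:2])
    return ("fallback", FALLBACK_PROMPT)

if __name__ == "__main__":
    print(handle_utterance(RecognitionResult("chicken casserole", 0.92)))
    print(handle_utterance(RecognitionResult("chicken casserole", 0.30,
                                             ["chicken cacciatore", "chickpea casserole"])))
```

Run as a script, the first call yields `('search', 'chicken casserole')` and the second offers the three candidates as a `choose` list. The thresholds would need tuning against a given engine's confidence scores, which vary from vendor to vendor.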