One key step toward achieving natural-feeling conversations with machines is combining the power of artificial intelligence with the ease of use of an API. Here's how it works and why it's a critical technology for developers to harness.

AI and the API: Developing the next generation of smart mobile apps

When thinking about smart apps or artificial intelligence (AI) today, Siri, Cortana, or Assistant probably come to mind. Hundreds of linguists and software engineers dedicate countless hours to building these services into responsive personal assistants that can answer questions, track down information, send messages, launch services, and more.

Developers outside the virtual assistant sphere tend to think they don't need and can't afford such sophisticated functionality in their own apps. This is no longer the case. All developers can and should add an intelligent voice interface into their apps.

Voice of reason

In the early days of iOS and Android, apps were highly fragmented. Since then, the ecosystem has become much more integrated. Device and OS manufacturers don't want users scrolling through dozens of apps just to get to the one they need. To improve navigability and convenience, developers are building deep linking into their apps through mechanisms such as Android intents, iOS URL schemes, and Facebook App Links, which let users move between apps without launching each one separately. Voice is a powerful way to make the navigation process even more seamless. Integrating intelligent voice interfaces into your apps will be an essential part of participating in this increasingly connected mobile ecosystem.

Another reason to integrate voice interfaces is the growth of the Internet of Things (IoT). Nearly four billion connected things were in use in 2014, and Gartner expects this figure to rise to 25 billion by 2020. Surrounded by dozens of connected devices, including phones, computers, watches, TVs, thermostats, and lights, consumers don't want the confusion of using multiple devices and interfaces to complete simple tasks. They want one universal interface that works across all these devices. Voice is that interface because it allows people to get what they need from their connected home (or car or office) with a few simple voice commands.

There is also a wide range of situations in which users are unable to tap buttons in an app, especially as devices continue to infiltrate more areas of our lives. When users are driving or cooking, or when their devices are out of arm's reach, intelligent voice capabilities greatly enhance the accessibility of apps by freeing up users' hands.

Finally, voice interfaces allow people to interact with your app in the most intuitive way possible: with natural language. This not only improves the user experience and cultivates a closer connection between your app and the user but also breaks through the chaotic, crowded din of daily life in the digital age.

Speech recognition gets smart

Speech recognition is one thing, but the understanding of speech required for AI is another. Many developers who have already integrated voice recognition into their apps mistakenly think that this step alone is enough. For example, if you ask an app for the weather and the system returns your request as a text string, it seems as though the system understood you. Right? Not quite. Imagine someone speaks Klingon, and the system returns a string such as "BIpIv'a'? nuq 'oH muD rur." The transcription may be accurate, but you still need a system that understands what the words mean.

This is the gap that my company, api.ai, and others like Nuance and wit.ai aim to bridge. There are several important steps.

The first is automatic speech recognition (ASR), or literally transcribing voice to text. This capability is provided by some platforms (Android and Windows Phone) and not by others (iOS). api.ai, wit.ai, and Nuance all provide speech recognition if it's not available for your platform.

Next is natural language understanding (NLU), which is where AI comes in. Developers provide examples of requests that the app should understand, and machine learning is then used to train the system to understand requests similar to those examples. In api.ai's case, the system returns each request as a formal JSON object that represents the meaning of the request. In addition, the system is contextually aware. It takes into account both the conversation so far and real-world context (gestures, GUI, etc.) to build complex interactions with the user. This approach is even more powerful on an open platform. A community of developers contributing data about how people use voice effectively crowdsources knowledge and creates a rising AI tide that lifts all boats.
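To make the NLU step concrete, here is a minimal sketch in Python of the "provide examples, get structured meaning back" idea. It is not api.ai's method or response format: simple word overlap stands in for real machine learning, and the intent names, example phrases, threshold, and JSON fields are all hypothetical choices made for illustration.

```python
import json
import re

# Hypothetical developer-supplied examples: intent name -> sample requests.
EXAMPLES = {
    "weather.get": [
        "what is the weather in Boston",
        "will it rain tomorrow",
        "weather forecast for today",
    ],
    "taxi.call": [
        "call me a taxi",
        "I need a cab to the airport",
        "book a taxi",
    ],
}


def tokenize(text):
    """Lowercase the text and split it into a set of word tokens."""
    return set(re.findall(r"[a-z']+", text.lower()))


def overlap(a, b):
    """Jaccard similarity between two token sets (0.0 when either is empty)."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def understand(utterance, examples=EXAMPLES, threshold=0.2):
    """Return a JSON-style dict describing the best-matching intent.

    A toy stand-in for a trained NLU model: the utterance is compared
    against every example phrase and the closest intent wins, unless the
    best score falls below the (arbitrary) threshold.
    """
    tokens = tokenize(utterance)
    best_intent, best_score = None, 0.0
    for intent, phrases in examples.items():
        for phrase in phrases:
            score = overlap(tokens, tokenize(phrase))
            if score > best_score:
                best_intent, best_score = intent, score
    if best_score < threshold:
        best_intent = None  # transcribed, but not understood
    return {
        "query": utterance,
        "intent": best_intent,
        "confidence": round(best_score, 2),
    }


if __name__ == "__main__":
    print(json.dumps(understand("what is the weather like in Paris"), indent=2))
    print(json.dumps(understand("bIpIv'a'? nuq 'oH muD rur"), indent=2))
```

In this sketch the weather question maps to the hypothetical "weather.get" intent, while the Klingon string comes back with no intent at all: it was transcribed, but nothing in the examples tells the system what it means.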

After you understand a user's request, you can either fulfill it in your app, such as by calling a taxi or searching for something, or send the request to an external service, such as a weather or news app. Developers can also integrate capabilities that api.ai does not include, such as voice identification and user profiling, to make their apps as smart and easy to use as possible.
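The fulfillment step might be sketched as a small dispatcher that takes an already-understood request and routes it either to an in-app handler or to an external service. Everything here is an assumption for illustration: the intent names, the "parameters" field, the handler functions, and the stubbed external service are hypothetical and do not reflect any particular vendor's API.

```python
def call_taxi(params):
    """In-app handler: pretend to request a taxi."""
    destination = params.get("destination", "your current location")
    return f"Taxi requested to {destination}"


def search(params):
    """In-app handler: pretend to run a search inside the app."""
    return f"Searching the app for '{params.get('term', '')}'"


def weather_service_stub(params):
    """Stand-in for an external weather service; a real app would make a network call."""
    return f"(would ask an external weather service about {params.get('city', 'here')})"


# Intents the app fulfills itself (hypothetical names).
LOCAL_HANDLERS = {
    "taxi.call": call_taxi,
    "search.query": search,
}

# Intents forwarded to some outside service (hypothetical names).
EXTERNAL_SERVICES = {
    "weather.get": weather_service_stub,
}


def fulfill(parsed):
    """Route a parsed request dict to a local handler or an external service."""
    intent = parsed.get("intent")
    params = parsed.get("parameters", {})
    if intent in LOCAL_HANDLERS:
        return {"handled_by": "app", "result": LOCAL_HANDLERS[intent](params)}
    if intent in EXTERNAL_SERVICES:
        return {"handled_by": "external", "result": EXTERNAL_SERVICES[intent](params)}
    return {"handled_by": None, "result": "Sorry, I did not understand that."}


if __name__ == "__main__":
    print(fulfill({"intent": "taxi.call", "parameters": {"destination": "the airport"}}))
    print(fulfill({"intent": "weather.get", "parameters": {"city": "Boston"}}))
    print(fulfill({"intent": None}))
```

Keeping the routing tables separate from the handlers means a new capability can be added by registering one more intent, without touching the speech or understanding layers.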

Developers, find your voice

A voice interface will enable you to gain a better understanding of your user. Voice input isn't restricted to a limited set of buttons and functions. This means you can easily see exactly what your users are looking for in your app and act accordingly. A better understanding of your user enables better targeting. By analyzing what your users are asking or talking about right now, you can drive revenue with targeted offers and ads. Voice capabilities make developers and their apps smarter.

Consumers are always pushing app developers to create intuitive experiences. They click away from confusing websites, order on-demand services to their doorsteps, and seek out products like smart watches that integrate technology more seamlessly into their lives. This trend is only going to continue, and developers will have to adapt to their users with natural interfaces.

*Image source: Manny Valdes/Flickr
