Voice AI’s Time is Now: SoundHound’s Nitesh Sharan on OPTO Sessions

With the rapid pace of technological advancement, it is easy to feel like you’re living in a sci-fi novel. But, as Nitesh Sharan, CFO at SoundHound AI [SOUN], explains, sometimes it is fiction that inspires reality. 

When the company’s founders, including CEO Keyvan Mohajer, were brainstorming an idea for a company in 2005, they asked a key question: “What do we not have in the world that you see permeated through Star Trek and Star Wars?” 

“One of the things that is embedded in the background of all those movies and shows is that people have conversations with anything and everything,” notes Sharan. Accordingly, developing voice artificial intelligence (AI) — that is, the capacity for machines to converse with humans — became the new company’s focus. 

Almost two decades later, both AI and voice tech have gone through “exponential change.” In this edition of OPTO Sessions, Sharan explains how AI is maturing, and why the voice tech segment has the potential to support aggressive growth.

AI Adolescence

Sharan compares the experience of consumers interacting with the first generation of voice assistants to having a child. “When your kids start speaking, you’re a little bit blown away … and then you quickly find out it’s sort of limited in utility.” The early voice assistants could set a timer or add items to a shopping list, but natural interactions and complex problem-solving were beyond their capabilities.

With the development of large language models and agentic AI, however, “you’re seeing massive inflection in how this technology can now handle the compound complexity that is part and parcel of how humans interact … maybe puberty is [a] good analogy, because you get exponential change.”

As a result of this rapid advancement, both the use cases and the monetization opportunities for voice AI have multiplied. Sharan provides the example of ordering coffee through your car’s voice assistant while driving to work. In a “seamless transaction”, the assistant suggests a coffee shop along your route and places your order so that it is ready for pick up when you arrive. 

Beyond ease for the consumer, such a use case is beneficial to all of the companies involved in the process, as Sharan points out. “The coffee shop is excited because they now have a new customer. And then our model is to share the economics of that transaction with the car manufacturer, so now even a car manufacturer or TV device maker can generate new revenue, which is a new opportunity for them.”

Talking Machines

Voice tech has come a long way since the heyday of Amazon’s [AMZN] Alexa and Apple’s [AAPL] Siri. With machines able to tackle ever more complex tasks, conversational interfaces with machines have become a reality, and pent-up demand is growing. A 2023 study conducted by data analysis firm PYMNTS found that 28% of Americans were willing to pay a monthly fee for a reliable, smart voice assistant, with that figure rising to 31% among high-income consumers and 43% among millennials.

The reason is quite simple, Sharan says. “Voice AI has got a competitive advantage because it’s so easy to access. You just have to speak.”

On the hardware side, as well, voice AI is relatively straightforward to integrate, requiring a microphone rather than a keyboard or touchscreen. 

To make voice AI even more useful in a range of situations, SoundHound has focused on edge deployment: voice AI that can work on devices without internet connectivity, in contrast to AI models that rely on access to the cloud to function.

Edge applications are especially important in environments with unreliable or limited internet connectivity where clients may want access to an AI voice assistant. Sharan cites healthcare and automotive applications: a doctor working in a hospital may need access to AI patient support even without connectivity; similarly, a driver may need navigation or maintenance support even in low-connectivity areas.

SoundHound has enabled this kind of interaction by integrating small language models “with the capability to ingest a car manual, so that if you have a weird light that shows up on your dashboard or you start to hear some squeaking in your brakes, you can just talk to your car”.

Ultimately, the sky’s the limit for voice AI applications. Sharan provides the example of a young couple preparing for parenthood with the help of a voice AI assistant. The assistant could provide health insurance recommendations, or advise on financial strategies to build savings for the child’s education. “All those interconnected systems used to be five different conversations. Now, just through one integrated architecture, that’s the type of use case that you’re seeing more through agentic solutions.”

The key step between text-based chatbots and this sort of natural interaction is exactly what SoundHound specializes in: voice AI’s “killer app”.

And the company is already reaping the benefits, Sharan says. “Our time is now, and we’re seeing that in our business.”
