Voice AI: What’s the Latest?

“Our time is now.”

So said Nitesh Sharan, the CFO of SoundHound AI [SOUN], in an interview with OPTO Sessions in July 2025. Sharan was describing a new dawn in voice-activated artificial intelligence (AI). 

With the development of large language models and agentic AI, he said, “you’re seeing massive inflection in how this technology can now handle the compound complexity that is part and parcel of how humans interact.”

Six months later, does it look like Sharan spoke too soon? 

Absolutely not. Everything suggests that this will be a big, big year for voice AI. Let’s take the temperature of the broader sector, and try to work out what it could mean for investors looking for the next opportunity in AI.

CES Reveals

As always, CES 2026, held in Las Vegas earlier in January, gave tech-watchers a good idea of what lies ahead.

SoundHound, which remains a leader in voice and conversational AI, unveiled new capabilities for its Amelia 7 agentic AI platform.

These include commerce capabilities such as restaurant reservations powered by OpenTable, the world’s leading provider of online restaurant bookings; parking payments powered by Parkopedia, the world’s largest parking services provider; takeout ordering from thousands of restaurant locations across the US; and flight bookings and hotel reservations. Additional capabilities include intelligent vehicle diagnostics; booking of service appointments and dealership visits; and access to popular calendars, emails and messages, including the ability to schedule meetings.

SoundHound also showcased its edge AI partnership with Nvidia [NVDA].

According to Keyvan Mohajer, CEO and Co-Founder, SoundHound’s agentic AI “can now orchestrate real-world commerce via the vehicle dashboard, via TVs and across many other media, using the human voice as its main interface. After decades of websites and mobile apps, we predict this is the way businesses will interact with customers in this new era of AI.”

Exciting stuff.

Other, smaller companies in the space debuted applications which, taken together, demonstrate the range of potential uses of voice AI.

Accessibility technology was a key theme at the trade show on Sunday, with companies such as Cearvol and Elehear unveiling AI-powered hearing aids designed to cut through background noise. 

Startup Subtle Computing demonstrated its new “voicebuds”, which use fine-tuned, “high-performance voice isolation models” to enable accurate dictation in both loud and quiet environments. According to the company, the new device generates one-fifth as many errors as Apple’s [AAPL] AirPods Pro 3 used in combination with OpenAI’s transcription model.

“We are seeing that there is a huge move toward voice as a new interface that a lot of folks are adopting. You can do much more with voice in a natural way than with a keyboard. However, we saw that voice is rarely an interface people use when others are around … using our noise-isolation model, we will give consumers a way to experience a voice interface in the form of our earbuds,” CEO Tyler Chen told TechCrunch.

Subtle has so far raised $6m in funding, and is working with firms including Qualcomm [QCOM] and Nothing. 

Going beyond in-ear devices, Gyges Labs showcased Vocci, an AI-powered note-taking ring capable of understanding 112 languages and using an agent to summarize transcriptions with awareness of implicit meaning and historical context. 

Interestingly, this ring strips out the biometric doohickeys (heart rate sensor, step counter, etc.) associated with traditional wearables, such as Samsung’s [SSNLF] Galaxy Ring.

Another CES debut, Maono’s DM40 Wireless Microphone, combines advanced audio and wireless technologies to deliver studio-quality vocal performance tailored to gamers, streamers and content creators. The microphone is complemented by Maono Link, an AI-powered software platform that applies large language models to enable realistic voice transformation.

On that note, and looking beyond CES, consider Modulate, a Boston-based, venture-backed voice AI startup. It initially built its reputation in the massively multiplayer gaming ecosystem, where its technology was deployed to moderate player communications in franchises such as Call of Duty and Grand Theft Auto. After establishing a foothold in gaming, the company expanded into enterprise use cases, applying its voice AI to contact center operations, with a particular focus on banking and insurance.

On January 21, Modulate unveiled a new system architecture, dubbed the Ensemble Listening Model, which coordinates multiple models simultaneously to deliver a richer, more accurate interpretation of real-world conversations.

According to the company’s own data, its models outperform rivals such as Alphabet’s [GOOGL] Gemini, ElevenLabs and OpenAI across metrics including conversational understanding, transcription accuracy and deepfake recognition.

In short, as Nat Rubio-Licht wrote in The Deep View, “The timing looks right for audio AI to explode. Industry voices are starting to question how useful large language models are when used solely for chatbot capabilities. And as consumers start to examine exactly how AI fits into their lives, audio-based models provide an easy way in.”

Hey Google, Can These Startups Compete?

The question is, can the startups behind this proliferation of interesting new use cases survive in an ecosystem dominated by big tech?

According to one report, the global voice assistant market, valued at $7.08bn in 2024, is expected to exceed $59.9bn by 2033, growing at a CAGR of 26.8% between 2025 and 2033. That’s a significant chunk of money, and the big tech players are keen to get a piece of it.  
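For readers who want to sanity-check that projection, here is a minimal sketch of the compound-growth arithmetic. It assumes the $7.08bn figure for 2024 is the base and that growth compounds over nine years to 2033; the report itself may define its forecast window differently, and the function names are purely illustrative.

```python
def project_value(start_value: float, cagr: float, years: int) -> float:
    """Compound start_value at a constant annual growth rate for `years` years."""
    return start_value * (1 + cagr) ** years


def implied_cagr(start_value: float, end_value: float, years: int) -> float:
    """Back out the constant annual growth rate linking two values."""
    return (end_value / start_value) ** (1 / years) - 1


# Figures quoted in the report, in $bn. The nine-year horizon
# (2024 base, 2033 end) is our assumption, not the report's stated method.
START_2024 = 7.08
END_2033 = 59.9
CAGR = 0.268
YEARS = 2033 - 2024

projected = project_value(START_2024, CAGR, YEARS)
implied = implied_cagr(START_2024, END_2033, YEARS)

print(f"Projected 2033 market size: ${projected:.1f}bn")
print(f"Implied CAGR: {implied:.1%}")
```

Under those assumptions the projection comes out at roughly $60bn, in line with the report's "exceed $59.9bn" headline, and the implied growth rate sits very close to the quoted 26.8%.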

Apple’s Siri, Amazon’s [AMZN] Alexa and Alphabet’s Google Assistant are looking like legacy tech at this point. But this does not mean that the giants aren’t working hard to corner the voice AI market.  

Google, for instance, is advancing its voice AI leadership by updating its core voice search experience on Android, reflecting a broader pivot toward conversational, AI‑powered interaction across products. The rollout of a redesigned voice search interface aligns with the company’s updated AI design language and reinforces voice as a central component of its search strategy. 

Recent enhancements to the Gemini platform — including contextual understanding and personal intelligence features that may extend into voice assistants — highlight Google’s continuing push to differentiate its ecosystem through deeper, multimodal AI integration.

Elsewhere, Apple’s strategy for voice AI centers on integrating Google’s Gemini models to overhaul Siri, scheduled for release in 2026. After years of challenges developing competitive in‑house AI, Apple is turning to a multi‑year partnership with Google to boost Siri’s conversational intelligence and contextual capabilities. 

Amazon is aggressively embedding generative AI into Alexa through the rollout of Alexa+, which now automatically upgrades eligible Prime members’ Echo devices to a more advanced conversational assistant. This expanded deployment, including broader device compatibility and a web‑accessible Alexa+ interface, aims to lock in engagement across its ecosystem and compete directly with other AI assistants.

Although early users report mixed performance, Amazon’s strategy is seemingly to leverage its scale and hardware install base to entrench voice AI usage in homes and commerce.

Lastly, Microsoft [MSFT] is expanding its voice AI footprint through Azure’s developer‑focused Voice Live API and deeper voice integrations in its Copilot products. Recent platform enhancements enable real‑time multilingual voice interaction and are designed to support custom enterprise applications beyond consumer assistants. 

Microsoft also continues embedding voice commands into Microsoft 365 Copilot and Windows, broadening use cases across productivity workflows. 

Conclusion

Voice AI startups can innovate faster than big tech, particularly in realistic speech synthesis, niche enterprise applications and creative content generation. In this sense, paying close attention to what they are working on, as we have done in this article, gives a good sense of where the broader sector is headed.

However, such startups face hurdles in scale, distribution and platform lock-in, where Google, Amazon, Apple and Microsoft dominate.

That said, if there is one challenger that might be able to carve out a space for itself, it’s SoundHound. 

That, at least, is the opinion of one analyst, Rick Orford, who earlier in January opined that “this under-the-radar voice AI company could be setting up for a massive comeback. After getting crushed over the last year, SoundHound is quietly landing real enterprise deals, posting record revenue growth and expanding into high-value, real-world use cases.”

Perhaps SoundHound’s time really is now. 

Disclaimer

Past performance is not a reliable indicator of future results.

CMC Markets is an execution-only service provider. The material (whether or not it states any opinions) is for general information purposes only, and does not take into account your personal circumstances or objectives. Nothing in this material is (or should be considered to be) financial, investment or other advice on which reliance should be placed. No opinion given in the material constitutes a recommendation by CMC Markets or the author that any particular investment, security, transaction or investment strategy is suitable for any specific person.

The material has not been prepared in accordance with legal requirements designed to promote the independence of investment research. Although we are not specifically prevented from dealing before providing this material, we do not seek to take advantage of the material prior to its dissemination.

CMC Markets does not endorse or offer opinion on the trading strategies used by the author. Their trading strategies do not guarantee any return and CMC Markets shall not be held responsible for any loss that you may incur, either directly or indirectly, arising from any investment based on any information contained herein.

*Tax treatment depends on individual circumstances and can change or may differ in a jurisdiction other than the UK.
