How Realtime Voice AI is Transforming Digital Interactions
This post is part of a series of reflections on the upcoming transformation of digital interactions:
- Use Cases
- Technical (coming soon)
- Responsible Design (coming soon)
In early November 2024, I spent a day with the OpenAI team in Paris. A few weeks earlier, the announcement of an API allowing developers to create real-time multimodal (voice + text) assistants had made waves. I was fortunate to have the time to explore this new Realtime API with the team that designed it, and to receive their advice firsthand.
This technology is what powers the ChatGPT app in voice mode, and it's now available to developers for creating new applications.
I was blown away.
In six hours, I prototyped an interaction for a cooking coach: 100% voice-based, with beautiful sound quality and a voice full of expressive intention and emotion. User-driven interactions, assistant-driven responses, references to contextual information: all orchestrated with very simple code, for a highly encouraging result.
What’s Happening Now
It was a truly unique moment: as I write these lines, I have in my hands the tools to create natural, instinctive interactions.
This is a remarkable achievement by the teams who developed these technologies, and I’m grateful to now be able to imagine, create, and develop new uses built on them.
For over 10 years, I’ve dreamed of new interfaces for communication, learning, and play. It’s what drives me: building a digital world that I would be both satisfied with and proud to leave to our children, because it would be one that suits me too.
The opportunity is wide open.
Monthly reminder about that big green tree pic.twitter.com/kM46Wsxo9O
— Tim Urban (@waitbutwhy) June 21, 2021
Collectively, we have traveled some of the darkest paths to reach this technology. This precise moment, where we all are now, is that GREEN point right in the middle. This point is beautiful because it’s the beginning. The beginning of an era of exploration.
The revolution in artificial intelligence opens up opportunities for entrepreneurs who want to thrive in their markets by embracing (1) simplified interactions with digital services and (2) the cognitive performance of software agents.
An Accelerating Trend? (Oh Yes)
Yes, and above all an acceleration in the democratization of this technology.
Less than three months after the technology’s initial release as an API, voice quality has improved and prices have fallen sharply (60% lower, and ten times lower with GPT-4o mini):
That’s it. That’s the tweet. The Realtime API now supports WebRTC—you can add Realtime capabilities with just a handful of lines of code.
— OpenAI Developers (@OpenAIDevs) December 17, 2024
We’ve also cut prices by 60%, added GPT-4o mini (10x cheaper than previous prices), improved voice quality, and made inputs more reliable. https://t.co/ggVAc5523K pic.twitter.com/07ep5rh0Kl
In this series of posts, I share what I believe is fundamental to understand in order to make informed choices in this revolution. It’s a personal journey that helps me structure my thinking, and I’m making an effort to open it up so everyone can share in the reflection.
What Use Cases?
First, yes, once again, it’s a technology looking for a problem to solve. And to be honest, I’m not yet convinced it has found one. Yes, it’s now possible to create a natural interface with a digital service, but do people want it?
Jared Friedman of YC has an analysis of AI’s impact on software markets that applies very well to the specific case of real-time interfaces. It classifies new opportunities into three buckets:
- The Obvious Use Cases: all the very obvious cases, like customer support (waiting for hours, no response, or a terrible answer: AI will always be better), sales assistance, or the super personal assistant connected to my calendar, my emails, etc. This will be the playground of the major players (Microsoft, Google, Apple, OpenAI...), who benefit from direct access to their current users to transition them to this new paradigm.
- The Big Shift from SaaS to AI: room for 300 unicorns on use cases currently handled by well-established, highly specialized companies, such as payment processing, customer relationship history, payroll management, visual communication production, data stream processing, etc. Expect major disruption in this area, where some services will likely disappear and others will emerge.
- New Horizons: the unknowns for now, the ones that will magically combine AI’s new possibilities with existing technologies to solve tough, deep problems that everyone takes for granted and considers unsolvable: the status quo, “that’s just how it works.” Yes, but. Think of Uber as the conjunction of mobile + GPS + the need to get from A to B, or of Airbnb as the conjunction of mobile + sharing opinions on the web + the desire to travel.
There’s enough here to anticipate at least a decade of creation and destruction. Let’s prepare ourselves: it’s going to be tumultuous.
To continue this line of thought, in a more technical post, we will look at use cases obvious enough to make us say: okay, it’s quite possible this happens as soon as 2025.