2024

Rico LeBot • Real-time Voice Interface Toolkit

In November 2024, I had the opportunity to participate in the OpenAI Builders Lab in Paris, where I was able to explore the potential of the Realtime API. I was amazed by how quickly I could build a prototype of a real-time, web-based voice interface that uses function calls.

The initial prototype was a cooking guide where you could ask for a recipe or for instructions on how to prepare a certain dish. It was an amazing and inspiring experience for me, because I've spent the last 10 years building similar technology. And it just works.

As I explored the possibilities of this prototype further, it became clear that some challenges had to be overcome before it could become a deployable product. That's why I created an open-source toolkit to address those problems.

Challenges and Solutions
  • WebSockets are not well suited to long-lived connections: The official OpenAI Realtime API toolkit relies on WebSockets, which proved unstable for long-running sessions over HTTP. The OpenAI team suggested using WebRTC bridges for better stability. The toolkit implements WebRTC through a LiveKit integration.

  • Dynamic UI: I wanted a dynamic UI that could respond to user input in real time. This meant connecting the model's function calls to front-end functions using remote procedure calls (RPC) over WebRTC. This brings the voice interface to life, allowing users to interact with different functions of the app seamlessly.

  • Architecture: A clear separation was needed between the web app's backend, the AI agent's backend, and the front-end. The toolkit achieves a lightweight design that is modular, has few dependencies, and is easy to use.

  • Roles: To iterate quickly on the user experience, you need to refine the agent's prompts and instructions very frequently. The toolkit supports this through an architecture where the 'roles' are separated from the code, which makes it possible to add and modify them very quickly.
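To make the "Dynamic UI" and "Roles" points more concrete, here is a minimal sketch of how a function call coming from the model could be routed to a front-end function, with the role kept as plain data. It is not the toolkit's actual code: `Role`, `ToolCall`, `FrontendRpcRegistry`, `showRecipe`, and the JSON payload shape are all hypothetical names chosen for illustration, and the transport (for example an RPC channel over WebRTC) is deliberately left out.

```typescript
// Hypothetical shapes for illustration only; none of these names come from the toolkit.
type ToolCall = { name: string; args: Record<string, unknown> };
type Handler = (args: Record<string, unknown>) => string | Promise<string>;

// A role is plain data, so its instructions can be edited without touching code.
interface Role {
  name: string;
  instructions: string;
  tools: string[]; // names of the front-end functions this role may call
}

class FrontendRpcRegistry {
  private handlers = new Map<string, Handler>();

  // Expose a front-end function under a name the agent can call.
  register(name: string, handler: Handler): void {
    this.handlers.set(name, handler);
  }

  // Handle one serialized tool call received from the agent side.
  // Always resolves to a JSON string so the caller can send it back as the RPC reply.
  async dispatch(role: Role, payload: string): Promise<string> {
    const call = JSON.parse(payload) as ToolCall;
    if (!role.tools.includes(call.name)) {
      return JSON.stringify({ ok: false, error: `tool not allowed: ${call.name}` });
    }
    const handler = this.handlers.get(call.name);
    if (!handler) {
      return JSON.stringify({ ok: false, error: `unknown tool: ${call.name}` });
    }
    const result = await handler(call.args);
    return JSON.stringify({ ok: true, result });
  }
}

// Example role and handler, assumed for the cooking-guide prototype.
const cookingRole: Role = {
  name: "cooking-guide",
  instructions: "Help the user pick a recipe and guide them through each step.",
  tools: ["showRecipe"],
};

const registry = new FrontendRpcRegistry();
registry.register("showRecipe", (args) => `Showing recipe: ${String(args.dish)}`);
```

In this sketch, changing the agent's behaviour means editing the `cookingRole` object (which could just as well be loaded from a file), while the UI functions stay registered in one place. A call such as `registry.dispatch(cookingRole, '{"name":"showRecipe","args":{"dish":"ratatouille"}}')` resolves to a JSON reply that the transport layer can return to the agent.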

Rico LeBot – Screenshot

Evolution of jobs

I remember, as a kid in the 90s, my mother, a hospital pharmacist, telling me about her struggles with the new computers she had to use. She felt these computers were stupid tools because they didn't understand her, and she often came home disturbed and uncomfortable with this new way of working. Using them was not her choice: it was more or less forced on her, with limited bandwidth to adapt, and that was a bad feeling.

At the same time, I remember my own passion for tech growing, wondering why my mother was so bothered by it while I found so much joy in it.

Jobs have always evolved over time. One notable example is the blacksmith, a profession that grew significantly from the Middle Ages onward, driven by population growth and the strong needs of the agricultural world.

Maréchal Ferrand

What I build

My interests revolve around natural interactions with digital services, with AI as the tool to make them happen.


Motivations

My goal?

Create ways of interacting with technology that everyone can trust. Be it for communicating, learning, or just playing around.

But here’s the thing.

How we connect with the digital world keeps evolving. We started with punch cards, then screens and keyboards. Then came the mouse and GUI. We got touchscreens, then pocket-sized devices. Now, it’s voice interactions.

More input. More ways to connect.

And with more input comes a deeper understanding of people.

But the big question: as things change, how important is data ownership in our future with digital assistants?

I want to build a world with the kind of tech I’d want my kids to use and live with for their entire lives.