
2025

The Evolution of My Coding Muscle

I've spent the last 20 years coding.

My brain thrived on programming languages: C, Java, Python, Dart/Flutter, JS/TS.

I learned tricks, optimized algorithms, and debated design patterns.

Crafting solutions was a meticulous skill—like building precise machinery.

Then AI-powered coding tools arrived. GitHub Copilot. Cursor.

Suddenly, my routine transformed.


Dynamic Confidence Estimation for LLM Task Execution

How do you measure the reliability of LLM outputs in production?

Run tasks multiple times, track dominant results, and calculate dynamic confidence with statistical methods (LLN, CLT). Practical guidelines and code in this post.

As developers, we often use Large Language Models (LLMs) in real-world systems where not only the answer to a task matters, but also how confident we are in that answer. In an experiment, I ran a simple task—"How many R's are in 'strawberry'?"—103 times sequentially and independently on the DeepSeek-R1-Distill-Llama-8B model. This experiment illustrates two key aspects of a production system:

  1. Task Result: What is the output of the task?
  2. Probabilistic Confidence: How much trust can we put in that output?


As we observe successive executions, both pieces of information evolve. The most likely outcome of the task can be inferred from the observed frequencies, following principles like the Law of Large Numbers (LLN) and the Central Limit Theorem (CLT). If the results do not converge to a stable answer, the task may be inherently ambiguous for the model. The confidence, for its part, is evaluated dynamically based on the number of executions and the distribution of outputs.

In other words, you want to ask, "Now that I've done X executions, and I see that Y of them point to one particular result while the rest differ, how confident am I that this result is indeed the correct one?"
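That question can be turned into a small calculation. The following is a minimal sketch under stated assumptions, not necessarily the method used in the full post. It treats the most frequent output as the dominant result and uses its observed share p̂ = Y/X as the LLN estimate. It then applies the CLT normal approximation p̂ ± z·√(p̂(1 − p̂)/X) to obtain a confidence interval. The function name `dominant_confidence`, the default z = 1.96 (roughly 95%), and the sample outputs are hypothetical.

```python
from collections import Counter
import math


def dominant_confidence(outputs, z=1.96):
    """Hypothetical helper: return the most frequent output, its observed
    share, and a normal-approximation (CLT / Wald) interval for that share."""
    if not outputs:
        raise ValueError("need at least one execution")
    counts = Counter(outputs)
    # On ties, most_common keeps the output that was seen first.
    answer, y = counts.most_common(1)[0]
    x = len(outputs)
    p_hat = y / x                                 # LLN: converges to the true share as x grows
    se = math.sqrt(p_hat * (1 - p_hat) / x)       # CLT: standard error of p_hat
    low = max(0.0, p_hat - z * se)                # clamp the interval to [0, 1]
    high = min(1.0, p_hat + z * se)
    return answer, p_hat, (low, high)


if __name__ == "__main__":
    # Hypothetical outputs for 103 runs, NOT the results of the experiment above.
    runs = ["3"] * 90 + ["2"] * 13
    answer, share, (low, high) = dominant_confidence(runs)
    print(f"dominant={answer!r} share={share:.3f} interval=[{low:.3f}, {high:.3f}]")
```

Recomputing after each new execution gives the "dynamic" view: the interval narrows as X grows if the dominant share stays stable. This interval measures how consistent the dominant answer is, not whether it is factually correct. It also collapses to zero width when every run agrees (p̂ = 1) and is unreliable for small X, so the Wilson score interval is a common alternative in those cases.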

I've saved 13% of my disk space on my Mac today

Today, I safely cleaned up 65 GB, and I feel better now. That's about 13% of my disk space.


See this big gray "System Data" space? That's where I've been cleaning up a lot of stuff.

What's annoying as a developer is that it keeps growing, and the built-in macOS storage optimization tools don't really handle it.

And I never really took the time to understand it until it became my last viable option before buying a bigger Mac.

Yesterday, even the 'magical' Store in iCloud didn't change a thing.


So I had to figure out what was going on.

How Realtime Voice AI is Transforming Digital Interactions

This post is part of a series of reflections on the upcoming transformation of digital interactions:

  • Use Cases
  • Technical (coming soon)
  • Responsible Design (coming soon)

In early November 2024, I spent a day with the OpenAI team in Paris. A few weeks earlier, the announcement of an API allowing developers to create multimodal assistants (voice + text) in real time had made waves. I was fortunate to spend time exploring this new Realtime API and to receive firsthand advice from the team that designed it.

This technology is what powers the ChatGPT app in voice mode, and it's now available to developers for creating new applications.

I was blown away.

In 6 hours, I prototyped an interaction for a cooking coach. 100% voice-based, with beautiful sound quality, expressive intonation, and emotion. User-driven interactions, assistant-driven responses, and references to contextual information, all orchestrated with very simple code for a highly encouraging result.