Plan 9: An Operating System That Shaped My Career

I just turned 50. In my head, I am 37. I am still as curious as ever, and the advent of generative AI is a topic that fascinates me.

Still, 50 is a round number, and I wanted to look back on one of the technical challenges that shaped my career: a bug I ran into while working on 9vx, the port of Plan 9 that runs on top of Linux. This project, which long fascinated free software developers, tested my ability to analyze, question assumptions, and propose concrete solutions.

The "double sleep" bug

The problem appeared when a kernel process (a kproc) remained stuck in a sleep state and was never woken up. After several exchanges with community contributors, it became clear that the root cause of the deadlock was an assumption made in the design of 9vx: the developers treated a Linux thread as a simple equivalent of a simulated processor, ignoring the fact that the thread could migrate from one core to another. This simplification held in many cases, but it broke down when the function that disables local interrupts was called from a thread running on a different processor than the one it was originally assigned to. The scheduler, relying on this assumption, left the kproc asleep indefinitely.
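To make that failure mode concrete, here is a deliberately tiny toy model in Python. It is not 9vx code: the Mach and Thread classes are hypothetical, and the function names splhi and splx are only borrowed from the Plan 9 kernel's interrupt-level routines. It assumes nothing more than "interrupt state lives on a processor, and the thread may be on a different processor by the time it restores that state".

```python
from dataclasses import dataclass


@dataclass
class Mach:
    """Hypothetical stand-in for one simulated processor."""
    ident: int
    interrupts_enabled: bool = True


@dataclass
class Thread:
    """A host thread; cpu is whichever Mach it is running on right now."""
    cpu: Mach


def splhi(thread):
    """Disable interrupts on the thread's current processor, return old state."""
    old = thread.cpu.interrupts_enabled
    thread.cpu.interrupts_enabled = False
    return old


def splx(thread, old):
    """Restore a saved state on the thread's current processor."""
    thread.cpu.interrupts_enabled = old


def run(migrate):
    """Return the interrupt state of both processors after one critical section."""
    cpus = [Mach(0), Mach(1)]
    t = Thread(cpu=cpus[0])
    saved = splhi(t)         # interrupts go off on processor 0
    if migrate:
        t.cpu = cpus[1]      # the host moves the thread mid-section
    splx(t, saved)           # restores on wherever the thread is now
    return [c.interrupts_enabled for c in cpus]


if __name__ == "__main__":
    print("no migration:", run(migrate=False))
    print("migration:   ", run(migrate=True))
```

Without migration both processors end up with interrupts enabled again. With migration, processor 0 is left with interrupts disabled, because the restore landed on processor 1. In this toy, that stale state is the kind of thing that could keep a wakeup from ever being delivered.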

For the curious, here is the original thread on comp.os.plan9: 9vx, kproc and double sleep.

The method: GDB, 40 pages, and a highlighter

To pinpoint the source of the malfunction, I took a very methodical approach. I first wrote a small GDB script that, each time execution stopped in the scheduler, displayed the relevant fields of the process: the processor ID, the process state, and the interrupt flags. The script redirected its output to a text file, which I then split into forty pages of approximately two hundred lines each. This breakdown let me work on paper, scanning the traces line by line and marking with a highlighter the hexadecimal values that changed unexpectedly. The numbers highlighted in yellow consistently corresponded to CPU identifiers different from the one on which the thread was supposed to be running. This observation confirmed that the interrupt-disabling function was being called from the wrong core.
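The highlighter pass can be approximated in a few lines of Python. This is a sketch under stated assumptions, not the script I used: the trace format (pid=, mach=, state=, flags=) is invented for illustration, as is the function name find_cpu_changes. It reports every line where a process shows up on a different processor than the last time it was seen.

```python
import re

# Hypothetical trace line format, e.g.:
#   pid=7 mach=0 state=Wakeme flags=0x200
LINE_RE = re.compile(
    r"pid=(\d+)\s+mach=(\d+)\s+state=(\w+)\s+flags=(0x[0-9a-fA-F]+)"
)


def find_cpu_changes(lines):
    """Return (line_number, pid, previous_mach, new_mach) for each change."""
    last_mach = {}   # pid -> processor ID seen most recently
    changes = []
    for lineno, line in enumerate(lines, start=1):
        m = LINE_RE.search(line)
        if m is None:
            continue  # skip anything that is not a trace record
        pid, mach = int(m.group(1)), int(m.group(2))
        prev = last_mach.get(pid)
        if prev is not None and prev != mach:
            changes.append((lineno, pid, prev, mach))
        last_mach[pid] = mach
    return changes


if __name__ == "__main__":
    sample = [
        "pid=7 mach=0 state=Running flags=0x200",
        "pid=7 mach=0 state=Wakeme flags=0x0",
        "pid=7 mach=1 state=Wakeme flags=0x0",
    ]
    for lineno, pid, prev, new in find_cpu_changes(sample):
        print(f"line {lineno}: pid {pid} moved from mach {prev} to mach {new}")
```

The per-process dictionary does the job my eyes did on paper: it remembers the last processor ID for each process and flags the line where that value changes.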

The community's resistance

When I presented my proposed fix to the community, I encountered some resistance. Some contributors felt that the solution was not immediately understandable and could introduce new complexities. This reaction, although difficult to accept at the time, was very enriching: it pushed me to further clarify my arguments and to explore the limits of my approach.

In the end, the fix I proposed was integrated and, to my knowledge, has never been replaced in subsequent versions of 9vx. Today, 9vx is rarely used, but the fact that my solution sparked debate among high-level engineers remains a source of pride for me.

The scientific method applied to code

This experience illustrates the very principle of the scientific method. Proposing a hypothesis, submitting it to criticism, and accepting that it may be refuted is at the heart of progress. Karl Popper argued that a theory is scientific only if it is falsifiable, that is, only if some observation could prove it wrong. In software development, likewise, every idea must be put to the test, confronted with facts and expert opinions. The resistance I encountered was therefore not an obstacle, but a natural step in the validation process.

Looking back on this project, I realize how much each challenge, even the most technical ones, contributes to enriching our understanding of how systems work and to strengthening our capacity to innovate.

What now?

Reaching the half-century mark is merely an invitation to keep exploring, learning, and sharing. For me today, that means, among other things, implementing an LLM inference engine in Rust and assembly language, along with a few FPGA projects.