Scalability Series - Part 2

Introduction

Before diving into the async architecture of mAIstrow, let us take a moment to look at what already exists. Kafka, Pulsar, Redpanda, Ray... these names come up whenever distributed systems are discussed. OpenAI and Google handle millions of users, but they do not tell you how. I want to be transparent, learn from existing solutions, and decide -- with full knowledge -- whether to draw inspiration from them or stay true to my vision: a distributed, sovereign, and ecological AI built on Rust.

Because, let us be honest: what if mAIstrow had to handle tens of thousands of users? It is not a question of "can we" -- it is a question of "how."


Why Explore Distributed Frameworks?

When I worked on Plan 9, I understood something fundamental: to innovate is also to understand what already exists. You do not reinvent the wheel -- you look at it, measure it, compare it, then decide whether to build it better.

My three-body system (server, AI engine, interface) is lightweight, local, and designed for sovereignty. But if mAIstrow were to become a public platform with thousands of users, I would need to understand how the giants do it. Not to copy, but to learn, and to decide whether to adapt or surpass.

That is why I explore Kafka, Pulsar, Redpanda, and Ray here -- not to abandon Rust, but to refine my choices and keep control over what I build.


Apache Kafka: The Streaming Giant

Kafka is a monolith of performance. It would replace my Rust server with a cluster of brokers and my requests/responses with topics: producers publish requests, partitions spread them across brokers, and consumer groups pull them in parallel.

It is powerful. It is what powers Netflix, LinkedIn, and Uber. But it is also massively complex.

My current system is centralized, lightweight, and local: a single Rust/Go server, a simple round-robin, SQLite for persistence. Kafka requires a cluster of brokers, Zookeeper (or equivalent) for coordination, and a full infrastructure.
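The round-robin mentioned above is about as simple as scheduling gets. A minimal sketch in Rust, assuming a fixed list of engine endpoints (the names here are placeholders, not mAIstrow's real configuration):

```rust
// A minimal round-robin dispatcher over a fixed set of AI engine
// endpoints. Each call to pick() returns the next engine, wrapping
// around at the end of the list.
struct RoundRobin {
    engines: Vec<String>,
    next: usize,
}

impl RoundRobin {
    fn new(engines: Vec<String>) -> Self {
        Self { engines, next: 0 }
    }

    /// Return the next engine and advance the cursor, wrapping around.
    fn pick(&mut self) -> &str {
        let i = self.next;
        self.next = (self.next + 1) % self.engines.len();
        &self.engines[i]
    }
}

fn main() {
    let mut rr = RoundRobin::new(vec![
        "engine-a".to_string(),
        "engine-b".to_string(),
    ]);
    for _ in 0..4 {
        println!("dispatch -> {}", rr.pick());
    }
}
```

No coordination, no cluster state: the whole scheduler fits in a dozen lines, which is exactly the lightness Kafka trades away.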

Yet Kafka inspires me. Its partitions show how to distribute load without depending on a single server. Its consumer groups embody resilience by nature.
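The partition idea is worth keeping even without Kafka. The core trick is just a stable hash of the message key: the same key always lands on the same partition, so load spreads across workers while per-key ordering is preserved. A sketch, assuming a keyed request stream (Kafka's default partitioner actually uses murmur2; the stdlib hasher stands in here):

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Map a message key to one of `n_partitions` partitions.
/// Hashing the key means the same user or session always lands
/// on the same partition, which is how Kafka preserves per-key
/// ordering while spreading load across brokers.
fn partition_for(key: &str, n_partitions: u64) -> u64 {
    let mut h = DefaultHasher::new();
    key.hash(&mut h);
    h.finish() % n_partitions
}

fn main() {
    for key in ["user-1", "user-2", "user-3"] {
        println!("{} -> partition {}", key, partition_for(key, 4));
    }
}
```

A single Rust server could use the same function to pin a session to one AI engine, no broker cluster required.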

But my system remains more sovereign, simpler, and more flexible thanks to my Rust traits. Kafka is a tool for mass. I want a tool for mastery.

The Challenges of Scaling Kafka in Production

Scaling Kafka in production comes with significant challenges: partition rebalancing whenever brokers join or leave, coordination through Zookeeper or KRaft, storage coupled directly to the brokers, careful tuning of partition counts and replication factors, and an operational burden that usually demands a dedicated team.


Apache Pulsar: A Modern, Modular Alternative

Pulsar distinguishes itself through its layered architecture: stateless brokers serve the messages, while a separate layer of Apache BookKeeper nodes (the bookies) persists them.

This separation allows scaling storage independently from compute. You can add brokers for throughput or bookies for capacity, without massive rebalancing or downtime.

Pulsar is more modular than Kafka, handles multi-tenancy natively, and supports built-in geo-replication. It is tempting.

But once again: it is too heavy for my use cases. My system is built to run on a Raspberry Pi, an old laptop, or a VM at a friend's house. Pulsar, even in its "light" version, requires a cluster.

Pulsar would be a good candidate if I wanted to isolate data flows for schools, labs, or educational projects. But for mAIstrow, I prefer to keep total control.


Redpanda: The Lightweight Kafka-Compatible Option

Redpanda is a dream for those who value lightness: a single binary written in C++, no JVM, no Zookeeper, and full compatibility with the Kafka API.

Imagine: a Redpanda broker takes the place of my Rust server. Topics named requests and responses carry the traffic. The interface and engines use rdkafka to communicate. It is almost my current system, but with automatic distribution, built-in resilience, and obvious scalability.

Redpanda is the first framework I want to test. Not to replace my system, but to experiment with it. A proof of concept with a local broker, a Rust client, and a Transport trait that can switch between WebSocket and Kafka/Redpanda.
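The Transport trait in that POC could look something like this. Both implementations below are stubs with hypothetical names; the real ones would wrap a WebSocket library and rdkafka respectively:

```rust
/// A transport abstraction the POC could switch on: the same trait
/// backed either by the current WebSocket link or by a Kafka/Redpanda
/// topic. The rest of the system only ever sees `dyn Transport`.
trait Transport {
    fn send(&mut self, topic: &str, payload: &[u8]) -> Result<(), String>;
    fn name(&self) -> &'static str;
}

struct WebSocketTransport;

impl Transport for WebSocketTransport {
    fn send(&mut self, _topic: &str, payload: &[u8]) -> Result<(), String> {
        // Real code would write a frame on the socket.
        println!("ws: {} bytes", payload.len());
        Ok(())
    }
    fn name(&self) -> &'static str { "websocket" }
}

struct KafkaTransport;

impl Transport for KafkaTransport {
    fn send(&mut self, topic: &str, payload: &[u8]) -> Result<(), String> {
        // Real code would produce to the topic via rdkafka.
        println!("kafka: {} bytes -> {}", payload.len(), topic);
        Ok(())
    }
    fn name(&self) -> &'static str { "kafka" }
}

/// Pick a backend at runtime, e.g. from a config flag.
fn make_transport(kind: &str) -> Box<dyn Transport> {
    match kind {
        "kafka" => Box::new(KafkaTransport),
        _ => Box::new(WebSocketTransport),
    }
}

fn main() {
    let mut t = make_transport("kafka");
    t.send("requests", b"hello").unwrap();
}
```

Because callers depend only on the trait, swapping WebSocket for Redpanda becomes a one-line configuration change instead of a rewrite.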


Ray: The Distributed Computing Champion

Ray is different. It does not manage streaming -- it manages distributed computing.

Ray is designed for machine learning, training, and parallel inference. Its architecture rests on two dedicated primitives: Tasks (stateless functions distributed across the cluster) and Actors (stateful objects pinned to a worker), each able to declare its CPU/GPU requirements.
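The actor half of that model maps naturally onto Rust without Ray at all: a thread owns the state and processes messages one at a time from a mailbox. A sketch, where the `Msg` enum and `spawn_counter` are illustrative names, not Ray's API:

```rust
use std::sync::mpsc;
use std::thread;

/// Messages understood by the actor.
enum Msg {
    Add(u64),
    Get(mpsc::Sender<u64>), // reply channel for reads
    Stop,
}

/// Spawn a tiny stateful actor in the Ray sense: it owns a counter
/// and handles messages sequentially, so no locks are needed.
/// Ray schedules such actors across a cluster; here a thread and a
/// channel stand in for the cluster.
fn spawn_counter() -> mpsc::Sender<Msg> {
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        let mut total: u64 = 0;
        for msg in rx {
            match msg {
                Msg::Add(n) => total += n,
                Msg::Get(reply) => { let _ = reply.send(total); }
                Msg::Stop => break,
            }
        }
    });
    tx
}

fn main() {
    let actor = spawn_counter();
    actor.send(Msg::Add(2)).unwrap();
    actor.send(Msg::Add(3)).unwrap();
    let (reply_tx, reply_rx) = mpsc::channel();
    actor.send(Msg::Get(reply_tx)).unwrap();
    println!("total = {}", reply_rx.recv().unwrap());
    actor.send(Msg::Stop).unwrap();
}
```

Ray adds cluster-wide scheduling and GPU placement on top of this pattern; the pattern itself costs nothing to keep local.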

But Ray is Python-first and cloud-oriented. My system is Rust, local, and modular. Ray is an option if I ever need to serve thousands of users with heavy computation. But for now, I prefer to stick with my traits, my abstractions, and my control.


What OpenAI, Google, and Anthropic Do

The AI giants handle millions of users. They likely use proprietary distributed systems, mixing custom orchestration, enormous GPU clusters, in-house serving stacks, and aggressive batching and caching.

But they do not talk about it. Not because they are secretive -- but because it is their competitive advantage. Their silence is a lesson.

The model is clearly cloud-centric: each depends on a proprietary cloud optimized for their AI workload, which closes the door on open resource access and open innovation.

I want to be different. I want mAIstrow to be transparent. I want people to know how it works, why it is designed this way, and what I learned building it.


Why Stay with Rust?

So, why not switch to Kafka, Pulsar, or Ray?

Because Rust is my tool for mastery, not just performance.

Kafka, Pulsar, Redpanda, Ray -- they are scalability beasts. But mAIstrow is a sovereignty beast.


Next Steps

  1. Test a POC with Redpanda: use rdkafka in Rust, create topics, integrate a Transport trait that switches between WebSocket and Kafka/Redpanda.
  2. Explore ray-rs: if I need to handle heavy parallel computation, try a prototype with Ray actors in Rust.
  3. Stay modular: the system remains lightweight, local, and sovereign, but it can draw inspiration from these tools without becoming heavy.

Quick Glossary