Towards Energy-Efficient Conversational AI

At mAIstrow, we love hunting down waste. Why? Because cutting waste reduces our carbon footprint and lowers our clients' bills -- and, let's be honest, because large language models (LLMs) appear to be real energy sinkholes.

The human brain: a model of efficiency

Picture this: your brain, that little genius, runs on just 12 to 20 watts, like a small light bulb illuminating your ideas. According to a 2023 study indexed on PubMed Central (PMC), biological computing is approximately 900 million times more energy-efficient than current artificial computing architectures. Yes, you read that right: 900 million times.

The powerful AIs we use today, such as ChatGPT, have a very different appetite. They impress with their answers, but at what cost? Burning serious electricity for a simple "Hello" is a bit like using a fighter jet to deliver a pizza. And as they grow in popularity, they are even starting to replace traditional search engines, which could inflate the energy bill even further.

LLMs: powerful but hungry champions

These super-powerful AIs sometimes forget their own limits. They produce stunning answers, but the energy cost is considerable: some estimates put inference for an LLM like GPT-4 at 0.5 to 1 kWh per interaction, while training such a model runs into millions of kWh. At global scale, consumption is exploding.
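To get a feel for the scale, here is a quick back-of-envelope calculation. It uses the per-interaction range quoted above; the daily query volume is purely an assumed, illustrative figure, not a measured one.

```python
# Back-of-envelope scale-up of LLM inference energy.
# KWH_PER_QUERY uses the range quoted above; QUERIES_PER_DAY is an
# assumed, illustrative volume, not a measured figure.
KWH_PER_QUERY = (0.5, 1.0)
QUERIES_PER_DAY = 10_000_000

low, high = (kwh * QUERIES_PER_DAY / 1e6 for kwh in KWH_PER_QUERY)
print(f"Estimated daily draw: {low:.1f} to {high:.1f} GWh")
# -> Estimated daily draw: 5.0 to 10.0 GWh
```

At those rates, a single day of chatting would rival the daily output of a mid-sized power plant.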

Fortunately, there is an alternative: Small Language Models (SLMs).

Our SLM experiments: small but mighty

We experimented with SLMs such as Qwen3-MoE, SmolLM3 from Hugging Face, and Phi-4 from Microsoft. The result? Inference on a 53-watt machine -- no power-hungry GPU required -- at up to 25 tokens per second. Not bad for the little guys, right?

Performance does not yet fully match that of LLMs, but progress is staggering. And above all, the performance-to-consumption ratio is incomparably better.
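For readers who want to reproduce this kind of measurement, here is a minimal CPU-only inference sketch using Hugging Face's transformers library. The model ID is an assumption on our part -- swap in whichever SLM checkpoint you actually use -- and throughput will vary with your hardware and quantization settings.

```python
# Minimal CPU-only SLM inference with a rough tokens/second measurement.
import time
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "HuggingFaceTB/SmolLM3-3B"  # assumed checkpoint; substitute your own
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)
model.eval()

prompt = "Explain why small language models can be energy-efficient."
inputs = tok(prompt, return_tensors="pt")

start = time.perf_counter()
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=128, do_sample=False)
elapsed = time.perf_counter() - start

new_tokens = out.shape[1] - inputs["input_ids"].shape[1]
print(tok.decode(out[0], skip_special_tokens=True))
print(f"{new_tokens / elapsed:.1f} tokens/s on CPU")
```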

The gem: Sapient's HRM

While browsing the web, I uncovered a gem: a blog post from Sapient Intelligence. They unveiled the Hierarchical Reasoning Model (HRM), a compact reasoning model with just 27 million parameters that outperforms LLMs like Claude 3.5 on challenges such as ARC-AGI, scoring 40.3% with only 1,000 training examples.

The cherry on top: it consumes very little energy (training under 10,000 kWh) and draws inspiration from the human brain. A nod to that 900-million-fold efficiency gap. And for the curious, Sapient has published the code on GitHub.
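To make the "hierarchical" idea concrete, here is a toy PyTorch sketch of a two-timescale recurrent loop: a slow high-level module that plans, and a fast low-level module that executes. This is our schematic reading of the idea, not Sapient's actual architecture -- for that, see their GitHub repository.

```python
# Toy two-timescale recurrent loop, loosely inspired by HRM's
# high-level/low-level split. Schematic only, not Sapient's code.
import torch
import torch.nn as nn

class ToyHRM(nn.Module):
    def __init__(self, dim=128, low_steps=4, high_steps=2):
        super().__init__()
        self.low = nn.GRUCell(dim, dim)   # fast module: detailed computation
        self.high = nn.GRUCell(dim, dim)  # slow module: abstract planning
        self.low_steps, self.high_steps = low_steps, high_steps
        self.readout = nn.Linear(dim, dim)

    def forward(self, x):
        z_h = torch.zeros_like(x)  # high-level (slow) state
        z_l = torch.zeros_like(x)  # low-level (fast) state
        for _ in range(self.high_steps):        # slow outer loop
            for _ in range(self.low_steps):     # fast inner loop
                z_l = self.low(x + z_h, z_l)    # low module, conditioned on the plan
            z_h = self.high(z_l, z_h)           # high module updates from the result
        return self.readout(z_h)

model = ToyHRM()
print(model(torch.randn(1, 128)).shape)  # torch.Size([1, 128])
```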

What if we combined everything?

Imagine: a lightweight SLM that chats with you, boosted by an HRM that solves the tough puzzles. No expensive GPU, just an AI that runs like a Swiss watch -- or like your brain.

This duo could revolutionize chatbots, customer support, and even healthcare assistants. The future of conversational AI may lie in this mix of technologies: the fluidity of an SLM for dialogue, the power of an HRM for reasoning, all at a fraction of the energy consumed by today's LLMs.
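What might that duo look like in code? Below is a hypothetical routing sketch: every function name and the keyword-based triage are illustrative assumptions, not a published design. A real system would likely use the SLM itself (or a small classifier) to decide when a query needs the heavy reasoner.

```python
# Hypothetical SLM + HRM routing sketch. All names and the keyword
# triage are illustrative assumptions, not a published design.

def chat_with_slm(message: str) -> str:
    """Placeholder for a lightweight conversational SLM call."""
    return f"[SLM reply to: {message}]"

def solve_with_hrm(puzzle: str) -> str:
    """Placeholder for a small HRM-style reasoning model call."""
    return f"[HRM solution for: {puzzle}]"

REASONING_HINTS = ("puzzle", "grid", "maze", "sudoku", "prove")

def respond(user_message: str) -> str:
    # Crude triage: send obvious reasoning tasks to the reasoner,
    # keep ordinary conversation on the cheap SLM.
    if any(hint in user_message.lower() for hint in REASONING_HINTS):
        solution = solve_with_hrm(user_message)
        return chat_with_slm(f"Explain this solution simply: {solution}")
    return chat_with_slm(user_message)

print(respond("Hello!"))
print(respond("Solve this sudoku grid for me."))
```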

Energy comparisons at a glance

| System | Consumption | Context |
| --- | --- | --- |
| Human brain | 12-20 W (approx. 0.48 kWh/day) | 100 billion neurons |
| SLM/HRM (inference) | ~0.01-0.05 kWh per interaction | 27M to a few billion parameters |
| LLM like GPT-4 (inference) | ~0.5-1 kWh per interaction | Hundreds of billions of parameters |
| LLM (training) | ~1.5 million kWh | Full initial training |

What do you think? Should we keep betting on large models that drain energy, or put our chips on these frugal little geniuses?