How Does Multiplication Work at Tenstorrent?

While optimizing my inference engine, I need to focus on an operation performed several billion times per token: multiplication.

Tenstorrent is a Canadian company specializing in high-performance processors for AI. It was founded by Ljubisa Bajic and Jim Keller (whom I greatly admire -- he has worked at Apple, AMD, and Tesla, no less).

Their approach to multiplication is truly original.

Refresher: the IEEE754 Standard

The IEEE754 standard defines how to represent a floating-point number in 3 parts: a sign bit, an exponent, and a mantissa (the fractional part).

For example, a 32-bit float uses 1 sign bit, 8 exponent bits, and 23 mantissa bits:

Example: 0_10000001_01000000000000000000000 (sign = 0, exponent = 129 - 127 = 2, mantissa = 1.25, so the value is 1.25 x 2^2 = 5.0).
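To make the three parts concrete, here is a small Python sketch that decodes this exact bit pattern by hand (the variable names are mine, chosen for illustration):

```python
import struct

# The 32-bit pattern from the example: sign _ exponent _ mantissa
bits = 0b0_10000001_01000000000000000000000

sign     = bits >> 31           # 0 -> positive number
exponent = (bits >> 23) & 0xFF  # 129; the bias is 127, so the real exponent is 2
mantissa = bits & 0x7FFFFF      # fraction bits; an implicit leading 1 is assumed

value = (-1) ** sign * (1 + mantissa / 2**23) * 2 ** (exponent - 127)
print(value)  # 5.0

# Cross-check: reinterpret the same raw bits as an actual IEEE754 float
assert struct.unpack(">f", bits.to_bytes(4, "big"))[0] == value
```

The `struct` cross-check confirms the manual decomposition matches what the hardware itself would read from those 32 bits.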

So Where Does Tenstorrent Fit In?

Traditionally, an FPU multiplies two floats by directly multiplying their full mantissas: precise, but expensive in transistors and therefore in energy.

At Tenstorrent, there is no full-width floating-point multiplier. Instead: a very small 7-bit x 5-bit integer multiplier. The mantissas are split into blocks (7 and 5 bits), the blocks are multiplied separately, then the partial products are recombined (shifts + additions).
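As a hedged sketch of the idea (the 4-bit chunk widths are illustrative and not Tenstorrent's exact microarchitecture), here is how an 8-bit x 8-bit multiply can be built from a narrow multiplier:

```python
def narrow_mul(a, b):
    """Stand-in for the small hardware multiplier (7-bit x 5-bit here)."""
    assert a < 2**7 and b < 2**5, "operands must fit the narrow multiplier"
    return a * b

def split_mul(x, y):
    """Multiply two 8-bit mantissa blocks via four narrow partial products."""
    x_hi, x_lo = x >> 4, x & 0xF  # split each operand into 4-bit blocks
    y_hi, y_lo = y >> 4, y & 0xF
    # Recombine the partial products with shifts + additions
    return ((narrow_mul(x_hi, y_hi) << 8)
          + (narrow_mul(x_hi, y_lo) << 4)
          + (narrow_mul(x_lo, y_hi) << 4)
          +  narrow_mul(x_lo, y_lo))

# The recombination is exact: all the information is still there
assert split_mul(0b10110011, 0b11001010) == 0b10110011 * 0b11001010
```

The exponents, meanwhile, are simply added, which costs almost nothing; the expensive part of a float multiply is the mantissa product shown above.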

In doing so, they have the ability to adjust the precision of computations: Math Fidelity.

This is a slider that lets you choose between several levels, from the fastest and least precise (LoFi) up to the most precise (HiFi4), each additional level feeding more mantissa bits into the computation.

Exactly like a long multiplication where you choose to stop after 1, 2, or 3 steps depending on the speed / precision trade-off.
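That analogy can be sketched in a few lines of Python. This is an illustration of the principle, not Tenstorrent's exact scheme: partial products are ordered from most to least significant, and a fidelity level simply decides how many of them are accumulated.

```python
def fidelity_mul(x, y, steps):
    """8-bit x 8-bit multiply keeping only the first `steps` partial
    products, ordered from most to least significant."""
    x_hi, x_lo = x >> 4, x & 0xF  # illustrative 4-bit chunks
    y_hi, y_lo = y >> 4, y & 0xF
    partials = [
        (x_hi * y_hi) << 8,  # step 1: coarsest, most significant
        (x_hi * y_lo) << 4,  # step 2
        (x_lo * y_hi) << 4,  # step 3
        (x_lo * y_lo),       # step 4: all bits -> exact result
    ]
    return sum(partials[:steps])

x, y = 0b10110011, 0b11001010  # 179 * 202 = 36158
for steps in (1, 2, 3, 4):
    approx = fidelity_mul(x, y, steps)
    print(f"{steps} step(s): {approx:5d} (error {x * y - approx})")
```

Each extra step costs one more pass through the small multiplier but shrinks the error, until step 4 recovers the exact product -- precisely the speed / precision trade-off the fidelity slider exposes.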

Why is this clever?

A tiny integer multiplier takes far fewer transistors than a full floating-point multiplier, so it costs less silicon area and energy, and because precision is adjustable in software, each workload pays only for the accuracy it actually needs.

And this fits perfectly with the evolution of current models (e.g., 4-bit quantization built into LLM design from the start).

In short: doing floating-point with small integers -- and letting the software choose the level of precision.


Small Language Models (SLMs): The Future of Agentic AI?

An article published in June 2025 by researchers at Nvidia aligns with our vision of AI: "Small Language Models are the Future of Agentic AI" (arXiv:2506.02153v1).

They argue that SLMs -- more compact, more energy-efficient, and more accessible than LLMs -- are particularly well suited to specific, structured, and repetitive tasks... in other words: ideal for AI agents.

What if intelligence were no longer measured by the size of the model, but by its contextual efficiency?


Questions about this article or your own project? Book a consultation