A methodology for building AI agents
On March 31, 2026, a 59.8 MB .map file was accidentally published in Claude Code's npm package. 512,000 lines of TypeScript, 1,906 files. Anthropic's entire agentic harness exposed in broad daylight.
Everyone talked about it, and rightly so. Codenames for future models (Capybara, Tengu), the autonomous agent KAIROS, the fake tools planted as anti-distillation decoys. Great material for headlines.
What interests me is something else. I want to be upfront that I did not have access to the leaked code myself. What follows is a synthesis of the technical analyses published by others in the days after the leak, cross-referenced with the claw-code project that reproduced the architecture from scratch in Python and Rust for legal reasons. From those sources, I ended up finding something that feels more valuable to me than codenames or code snippets: a methodology. Seven design principles that I compare to what I am building with mAIstrow, the agentic AI system I work on every day. They are useful to just about everyone I see getting seriously started on an agent system.
The number that changes everything
Before getting into the methodology, two numbers that should be enough to convince anyone this matters.
The first comes from SWE-bench Lite. With a basic scaffold, GPT-4 solves 2.7% of the problems. With the CodeR framework [1], the same model climbs to 28.3%. A factor of ten without touching a single weight. The second comes from Terminal-Bench 2.0, where Opus 4.5 tested with three different scaffolds (Augment, Cursor, Claude Code) shows a 17-problem gap across the 731 issues of the benchmark [2]. Same model, three infrastructures, three clearly different scores.
The conclusion is hard to dodge. The choice of model is not the determining factor of an agent's performance. The infrastructure around the model (the scaffold [3], the harness, the tooling) counts as much and often more. That is also what the published analyses of Claude Code's source describe, section after section.
Sebastian Raschka reaches the same conclusion on the day of the leak itself, in a first post [4] where he lists the same ingredients I will detail below: project-context management, prompt cache, structured tools, context compression, memory, sub-agents. He then develops the analysis in Components of a Coding Agent with a mini coding agent in Python, and his phrasing captures what all the other analysts of the leak are also saying: "the surrounding system plays as much of a role as the model itself". The principles that follow are a reading of the available analyses through that same lens.
Principle 1. The scaffold matters more than the model
The published analyses describe a system of more than fifty tools organized in a plugin registry. The model doesn't really "code". It orchestrates structured calls (read a file, run a command, search a pattern in a directory) and the scaffold executes them.
Each tool has a risk level classified as LOW, MEDIUM or HIGH, and the scaffold decides accordingly whether the action passes through, requires confirmation, or is blocked. This separation has a consequence that is less trivial than it looks at first: you can change the model without changing the system. Opus, Sonnet, a third-party model, the scaffold stays the same. The analyses also mention the existence of feature flags for switching between models on the fly.
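To make the separation concrete, here is a minimal sketch of what such a registry could look like. The names (ToolRegistry, Risk, dispatch) are mine, not Anthropic's; the only point is that the model never executes anything itself, it names a tool and the scaffold decides what happens.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable

class Risk(Enum):
    LOW = "low"        # passes through without confirmation
    MEDIUM = "medium"  # asks the user before running
    HIGH = "high"      # blocked unless explicitly approved

@dataclass
class Tool:
    name: str
    risk: Risk
    run: Callable[..., str]

class ToolRegistry:
    """Scaffold-side registry: the model only emits structured calls by name."""
    def __init__(self) -> None:
        self._tools: dict[str, Tool] = {}

    def register(self, tool: Tool) -> None:
        self._tools[tool.name] = tool

    def dispatch(self, name: str, approved: bool = False, **kwargs) -> str:
        tool = self._tools[name]
        if tool.risk is not Risk.LOW and not approved:
            return f"NEEDS APPROVAL: {name} is classified {tool.risk.value}"
        return tool.run(**kwargs)

# Swapping the model changes nothing here: the registry is the stable part.
registry = ToolRegistry()
registry.register(Tool("read_file", Risk.LOW, lambda path: open(path).read()))
registry.register(Tool("run_command", Risk.HIGH, lambda cmd: f"(would run) {cmd}"))
```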
What I take from this for my own projects is that it is more rewarding to start with the scaffold than with the choice of model. The model is something you will eventually swap out; the scaffold is what you are really building. That is the line mAIstrow follows on the system side, and the one herbert-rs follows a layer below on the inference engine side: the engine is deliberately decoupled from the model it loads, with weight format, transformer architecture and operators all treated as interchangeable modules. I made that effort from the start precisely because I knew that no model chosen today would still serve me in two years.
Principle 2. The prompt is the architecture
According to the analyses, Claude Code's system prompt [3] runs somewhere between 38,000 and 50,000 tokens. It is a structured document, closer to a specification than to a natural-language instruction.
The point that struck me most when reading those analyses is that a good part of the multi-agent orchestration is defined inside the prompt itself rather than in code. The Coordinator mode, which distributes work across sub-agents, would be described textually, as a directive addressed to the model rather than as TypeScript branching. The advantage is concrete: to change the orchestration behavior, you rewrite a section of the prompt and toggle a GrowthBook feature flag [3], whereas a hard-coded equivalent would have required a build, a release, a deployment and probably a pull request review.
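Here is, in Python rather than TypeScript, a minimal sketch of that idea. The section texts and the coordinator_mode flag name are hypothetical; what matters is that the orchestration lives in a prompt section gated by a flag, not in code branches.

```python
# Hypothetical prompt sections; the real ones are far longer and more structured.
BASE_SECTIONS = [
    "## Tools\n<tool descriptions>",
    "## Base rules\nBe concise. Act first, explain second.",
]

COORDINATOR_SECTION = """## Coordinator mode
You distribute work across sub-agents. Run research in parallel, synthesize
the findings yourself, then let a single agent implement."""

def build_system_prompt(flags: dict) -> str:
    sections = list(BASE_SECTIONS)
    # Changing the orchestration = editing the section above and flipping a
    # feature flag, not recompiling and redeploying the scaffold.
    if flags.get("coordinator_mode", False):
        sections.append(COORDINATOR_SECTION)
    return "\n\n".join(sections)

print(build_system_prompt({"coordinator_mode": True}))
```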
The prompt would also contain fairly precise behavioral directives. No emojis unless explicitly asked for, concise answers, action first and reasoning second, and above all the obligation to evaluate the reversibility and blast radius of every action before executing it. That last point is what I find wisest in the whole methodology. These are criteria any good sysadmin has applied forever, and it is reassuring to see them explicitly codified for an agent.
What I conclude from this is that in an agentic system the prompt is not configuration but a genuine architecture, and it deserves the same care as your code. Meaning a review, a structure, rules of coherence and probably some form of versioning.
Principle 3. Memory is a hint, not truth
Claude Code implements a 3-layer memory system:
| Layer | Content | Loading |
|---|---|---|
| Index (MEMORY.md) | One-line pointers (~150 chars max) | Always in context |
| Topic files (.md) | Detailed notes by subject | On demand |
| Transcripts (JSONL) | Complete session history | Search only |
The index is capped at 200 lines. Each entry is a pointer, not content. When the model needs details, it loads the matching topic file. The raw history is accessible only through search.
Consolidation happens in the background through a forked sub-agent (autoDream) that merges observations, removes contradictions, and updates the index. Three conditions must be met: 24h since the last consolidation, 5+ accumulated sessions, and an acquired lock.
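A sketch of that gate, under the assumption that the three conditions are simply checked in sequence (the function and constant names are mine):

```python
import time
from typing import Callable

CONSOLIDATION_INTERVAL_S = 24 * 3600   # at least 24h since the last run
MIN_SESSIONS = 5                       # at least 5 accumulated sessions

def should_consolidate(last_run_ts: float, pending_sessions: int,
                       acquire_lock: Callable[[], bool]) -> bool:
    if time.time() - last_run_ts < CONSOLIDATION_INTERVAL_S:
        return False
    if pending_sessions < MIN_SESSIONS:
        return False
    return acquire_lock()   # only one consolidator at a time
```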
The most important design principle, though, is this one:
"Memory says X exists" ≠ "X exists now."
The model is instructed to verify against the real code before acting on a memory. If memory contradicts reality, reality wins and memory gets updated right after.
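In code, the discipline looks something like this. The helper names (recall, verify) and the topic-file layout are placeholders of mine; the rule they illustrate is verify first, act second, correct the memory last.

```python
from pathlib import Path

def recall(topic: str, topic_dir: Path, verify):
    """Load a memory note, then check it against the real code before trusting it.

    `verify` is any callable that tests the note against reality
    (a grep, a file-existence check, a quick test run...).
    """
    topic_file = topic_dir / f"{topic}.md"
    if not topic_file.exists():
        return ""
    note = topic_file.read_text()
    if verify(note):
        return note
    # Reality wins: flag the note for the next consolidation pass
    # instead of acting on it.
    topic_file.write_text(note + "\n<!-- stale: contradicted by the code -->\n")
    return ""
```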
This pattern is not an arbitrary choice by Anthropic. Andrej Karpathy independently describes the same architecture in a recent thread on personalized memory for AI agents. He defends four principles there: explicit and inspectable memory rather than a black box, markdown files rather than a database (his "file over app"), data that stays under the user's control, and a swappable model ("BYOAI"). What MEMORY.md does in Claude Code matches that description almost exactly: markdown files, verifiable, local, independent of the model.
In the replies to Karpathy's thread, I came across a user describing his own system with "cron jobs and a dream cron to clean up stale memories/contradictions", shared memory across providers and real-time synchronization. Without ever seeing Claude Code's source, he had just reinvented autoDream and KAIROS. When the same pattern emerges independently at Anthropic, in Karpathy's proposal, and from an isolated practitioner, it gets hard to keep treating it as a mere implementation choice.
The way I phrase it for my own systems is that an agent's memory is a cache, not a database. It accelerates, it gives context, but it doesn't carry authority. Once you accept that distinction, you avoid a whole class of erratic behaviors driven by stale memories silently contradicting reality.
Principle 4. Defense in depth, not breadth
The analyses describe a bashSecurity.ts file that would contain 23 numbered checks applied to every command before execution. 18 Zsh builtins blocked (eval, exec, source, trap...). Unicode injection detection (zero-width spaces). Path traversal verification with Unicode normalization.
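Without having seen bashSecurity.ts, a few of those checks are easy to imagine. Here is my own minimal reconstruction of three of them, which certainly misses most of the real file's subtlety:

```python
import unicodedata
from pathlib import Path

BLOCKED_BUILTINS = {"eval", "exec", "source", "trap"}   # illustrative subset
ZERO_WIDTH = {"\u200b", "\u200c", "\u200d", "\ufeff"}   # invisible characters

def check_command(cmd: str, workdir: Path) -> list:
    findings = []
    if any(tok in BLOCKED_BUILTINS for tok in cmd.split()):
        findings.append("blocked shell builtin")
    if any(ch in cmd for ch in ZERO_WIDTH):
        findings.append("zero-width character, possible unicode injection")
    # Normalize before the traversal check, so lookalike characters cannot
    # hide a '..' from a naive string comparison.
    for tok in unicodedata.normalize("NFKC", cmd).split():
        if ".." in tok:
            target = (workdir / tok).resolve()
            if not target.is_relative_to(workdir.resolve()):
                findings.append(f"path escapes the working directory: {tok}")
    return findings
```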
The 23 checks are only the first layer. The full architecture has five:
| Layer | Function |
|---|---|
| 1. bashSecurity.ts | 23 syntactic and semantic checks |
| 2. Permission System | LOW / MEDIUM / HIGH classification |
| 3. User Confirmation | Approval required for HIGH actions |
| 4. Sandbox Execution | Isolated execution |
| 5. Output Sanitization | Result cleanup |
The permission system offers five modes that span the spectrum from strictest to most permissive: Default (frequent confirmations), Allow Edits (files OK, bash confirmed), Auto (where an ML classifier assesses command risk), Bypass and YOLO (everything passes, for development or users who know what they are doing). Five public CVEs [3] also document bypasses found and fixed through HackerOne, which is both reassuring about the maturity of the process and sobering about the real attack surface.
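The exact matrix that maps a risk level and a mode to a decision is not in the public analyses, so the sketch below is pure guesswork on my part; it only shows the shape of the thing, a small pure function the rest of the scaffold can trust.

```python
from enum import Enum

class Risk(Enum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3

class Mode(Enum):
    DEFAULT = "default"
    ALLOW_EDITS = "allow_edits"
    AUTO = "auto"
    BYPASS = "bypass"
    YOLO = "yolo"

def decision(risk: Risk, mode: Mode, is_file_edit: bool = False) -> str:
    """Hypothetical mapping; the real one is certainly more nuanced."""
    if mode in (Mode.BYPASS, Mode.YOLO):
        return "allow"
    if mode is Mode.ALLOW_EDITS and is_file_edit:
        return "allow"
    if risk is Risk.HIGH:
        return "ask_user"        # layer 3: explicit confirmation
    if risk is Risk.MEDIUM and mode is Mode.DEFAULT:
        return "ask_user"
    return "allow"
```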
What I take away from this is that the security of an AI agent cannot rest on a single firewall. It looks more like a stack of sieves, each of which assumes the previous one let something through. The fact that Anthropic runs an active bug bounty confirms, for me, that this discipline is never "done", and that is probably the only reasonable attitude in this area.
Principle 5. Read parallel, Write serial
Claude Code's multi-agent system defines three types of sub-agents:
| Type | Isolation | Use case |
|---|---|---|
| Fork | Byte-identical copy of parent context | Quick explorations |
| Teammate | Own context, separate terminal pane | Long parallel tasks |
| Worktree | Isolated git branch | Code changes without conflicts |
Orchestration follows four phases: Research (parallel, read-only) → Synthesis (coordinator merges) → Implementation (serial, a single agent writes) → Verification (parallel, tests and reviews).
The concurrency rule is simple. Read operations (Glob, Grep, Read) execute in parallel. Write operations (Edit, Write, Bash) execute serially. A single agent modifies files at any given moment.
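The rule translates almost word for word into code. A sketch, assuming an asyncio-style agent loop (the operation names are generic, not Claude Code's):

```python
import asyncio

write_lock = asyncio.Lock()   # a single agent writes at any given moment

async def run_reads(read_ops):
    # Reads (Glob, Grep, Read) have no side effects: fan them out in parallel.
    return await asyncio.gather(*(op() for op in read_ops))

async def run_writes(write_ops):
    # Writes (Edit, Write, Bash) go through one lock, strictly one at a time.
    results = []
    for op in write_ops:
        async with write_lock:
            results.append(await op())
    return results
```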
The analyses report explicit directives in the coordinator's prompt, along the lines of "Do not rubber-stamp weak work" or "You must understand findings". They are there to prevent a classic failure mode of this kind of architecture, where the coordinator passively relays the sub-agents' output without adding any judgment of its own.
This principle connects directly with what any engineer knows from concurrent programming. Reads parallelize painlessly, writes demand serialization. And in a system where the coordinator doesn't understand what it is coordinating, it becomes a bottleneck that adds nothing beyond message rerouting.
Principle 6. Design for cost from day one
According to the analyses, Claude Code's system prompt is split in two by a SYSTEM_PROMPT_DYNAMIC_BOUNDARY marker:
- Static section: tools, base rules, instructions. Identical for all users in an organization. Cacheable.
- Dynamic section: MEMORY.md, CLAUDE.md, project context. Specific to each session.
Prompt caching [3] charges cached tokens at one tenth of the price. This division cuts the system prompt cost by roughly 90%, the difference between ~$0.75 and ~$0.075 per request with Opus 4.6.
The analyses mention a promptCacheBreakDetection.ts file that would monitor fourteen vectors that can invalidate this cache: MEMORY.md changes, mode toggles, tool additions, model changes. Each vector has its mitigation. "Sticky latches" prevent mode oscillations (plan → normal → plan) from breaking the cache on every toggle.
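Here is how I picture the two mechanisms together, as a sketch. The boundary name comes from the analyses; the StickyLatch class and its behavior are my own interpretation of the idea, not Anthropic's implementation.

```python
BOUNDARY = "SYSTEM_PROMPT_DYNAMIC_BOUNDARY"   # marker name reported by the analyses

def assemble_prompt(static_part: str, dynamic_part: str) -> str:
    # Everything before the boundary is identical across requests and can be
    # served from the provider's prompt cache at roughly a tenth of the price.
    return f"{static_part}\n<<{BOUNDARY}>>\n{dynamic_part}"

class StickyLatch:
    """Once a mode has appeared in a session, keep its prompt section around,
    so plan -> normal -> plan toggles do not invalidate the cached prefix."""
    def __init__(self) -> None:
        self.seen_modes = set()

    def stable_sections(self, current_mode: str, sections: dict) -> str:
        self.seen_modes.add(current_mode)
        return "\n\n".join(sections.get(m, "") for m in sorted(self.seen_modes))
```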
And a comment in autoCompact.ts, reported by the analyses, reveals a historical bug:
"1,279 sessions had 50+ consecutive failures wasting ~250K API calls/day globally"
The fix documented in the code is a circuit breaker [3] capped at three consecutive failures. It is simple, effective, and at Anthropic's scale it very likely saves several tens of thousands of dollars a day.
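The pattern itself is trivial to write, which is part of the lesson; a minimal version (mine, not the autoCompact.ts one) fits in a dozen lines:

```python
class CircuitBreaker:
    """Stop retrying after N consecutive failures instead of grinding forever."""
    def __init__(self, max_failures: int = 3):
        self.max_failures = max_failures
        self.failures = 0

    def call(self, fn, *args, **kwargs):
        if self.failures >= self.max_failures:
            raise RuntimeError("circuit open: too many consecutive failures")
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self.failures += 1
            raise
        self.failures = 0   # any success closes the circuit again
        return result
```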
What this kind of detail reminds me of is that in a production AI system, cost is not an optimization problem you handle later, it is an architectural constraint you integrate from the start. Structuring the prompt to make the most of the cache, watching the invalidation vectors, putting circuit breakers on every loop where the agent might keep grinding, all of that belongs to initial design rather than late-stage tuning.
I find the same concern at two levels in what I am building. On the mAIstrow side, it translates into a token budget discipline that looks a lot like the one described in the Claude Code analyses. On the herbert-rs side, the scarce resource is no longer the billed token but the CPU or GPU cycle. Every unnecessary matrix operation becomes wasted time that ends up as latency for the user. Both levels share the same logic: profile, identify the waste, refuse what serves no purpose.
Principle 7. From reactive to proactive: KAIROS
KAIROS is reportedly referenced 150+ times in the codebase, according to the analyses. It is a daemon [3] agent, a persistent process running continuously, whose enabling flag is compiled to false in public builds even though the code is fully present.
The architecture: a cron every 5 minutes, GitHub webhooks, daemon workers, and a 15-second budget per proactive action cycle. The agent checks PR status, runs scheduled tasks, consolidates its memory through the /dream skill.
Exclusive tools are reserved for it: sending files to the user, push notifications, subscription to PR events. Brief mode is on by default, with minimal responses and a focus on action.
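The overall loop is easy to picture. A sketch of what a daemon with a per-cycle budget could look like, with the interval and budget taken from the analyses and everything else (names, task model) assumed by me:

```python
import asyncio
import time

CYCLE_INTERVAL_S = 5 * 60   # one pass every 5 minutes
CYCLE_BUDGET_S = 15         # hard budget per proactive cycle

async def proactive_cycle(tasks):
    """Run whatever fits in the budget; the rest waits for the next pass."""
    deadline = time.monotonic() + CYCLE_BUDGET_S
    for task in tasks:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            break
        try:
            await asyncio.wait_for(task(), timeout=remaining)
        except asyncio.TimeoutError:
            break

async def daemon(scheduled_tasks):
    while True:                                  # persistent background process
        await proactive_cycle(scheduled_tasks)
        await asyncio.sleep(CYCLE_INTERVAL_S)
```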
No known competitor has an equivalent today. Cursor, OpenHands and Copilot are reactive, Devin is autonomous but lacks both the daemon persistence and the memory consolidation. KAIROS describes a different regime, one where the agent stops waiting to be addressed and starts working on what it judges relevant.
That is probably where a good part of the future of AI agents is going to play out, and it will not be in the chat interface. It will be in this kind of daemon that runs in the background, prioritizes its own tasks, and only interrupts the human when it has something useful to say. The fifteen-second budget per proactive action cycle strikes me as a particularly well-thought-out detail. It prevents the agent from monopolizing resources while leaving enough headroom to do something useful on each pass.
Bonus. Anti-distillation, or how to protect your value
An unexpected aspect reported by the analyses: the anti-distillation mechanisms. Behavioral distillation [3] (training a small model on a large model's outputs) is the existential threat to proprietary agents.
According to those analyses, Claude Code implements four countermeasures:
| Mechanism | Principle | Effectiveness |
|---|---|---|
| Fake tools | Injection of fictitious tools into the prompt | Medium (bypassable via proxy) |
| Connector-text | Cryptographic signature of intermediate text | High |
| Undercover mode | Removal of internal codenames on public repos | High |
| CharCode encoding | Names encoded via String.fromCharCode() | Low (obscurity ≠ security) |
Fake tools are activated by four simultaneous conditions: compile-time flag, CLI launch, first-party Anthropic provider, and active GrowthBook flag. If a distiller trains on this data, their model learns tools that don't exist.
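The gating logic, as I understand it from the analyses, reduces to a conjunction of four booleans; the provider string below is a placeholder of mine:

```python
def fake_tools_enabled(compile_time_flag: bool, launched_from_cli: bool,
                       provider: str, growthbook_flag: bool) -> bool:
    # All four conditions must hold; if a single one differs (for instance a
    # third-party provider), the decoy tools never appear in the prompt.
    return (compile_time_flag
            and launched_from_cli
            and provider == "anthropic-first-party"
            and growthbook_flag)
```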
The strategy is pragmatic. It won't block a determined attacker, since a MITM proxy is enough to bypass fake tools, but it raises the cost of distillation above the profitability threshold for most actors who might be tempted by the exercise.
What this less spectacular part of the leak taught me is that value in today's AI economy has shifted. Models are gradually becoming commodities; what surrounds them has become, for Anthropic and its competitors alike, the real asset. And that asset is not protected by obscurity of the code, since this leak proves that code always leaks in one way or another. It is protected by data-poisoning mechanisms that make behavioral cloning costly. The claw-code project offers an almost perfect illustration of this reading. Its authors managed to reproduce the entire architecture from scratch in Python and Rust, solely from public analyses, which shows that the methodology itself is not a secret. The secret, if there is one left, is the discipline of execution.
What I take away for mAIstrow
None of these seven principles stayed theoretical for me. I recognize them in mAIstrow, the agentic AI system I work on every day. I also find them, to a lesser extent, in herbert-rs, the LLM inference engine in Rust and assembly that sits as its low-level layer.
On the mAIstrow side, the list of applicable principles is essentially the seven described above. The scaffold is decoupled from the underlying model so I can swap it without rewriting the rest. The system prompt is treated as a real structured architecture, with its rules of coherence and its cacheable static section. Memory follows the "hint, not truth" discipline, with an index at the front and topic files loaded on demand. Sensitive actions pass through several validation layers rather than a single check. Cost is built in as a design constraint, with particular attention to the ratio of cached tokens to fresh ones. And multi-agent orchestration follows the "reads in parallel, writes in series" rule, which I have found in practice avoids a whole class of shared-state bugs.
On the herbert-rs side, the analogy is more limited, because an inference engine is not an agent. Three principles still cascade down. Modularity first, with each operator (matmul, attention, RoPE) treated as a module replaceable independently from the others, in the same spirit as the tools in Claude Code. The obsession with cost next, transposed from the billed token to the CPU or GPU cycle consumed, with the same discipline of systematic profiling. And layered security finally, in a particular form: Rust eliminates a whole class of memory bugs by construction, which plays the role of a structural layer set once and for all at the language level rather than a runtime check.
What the Claude Code leak confirms, for me, is what many practitioners already knew without having formulated it as clearly. The model is a necessary condition for an agent to work, but the scaffold is what decides how useful it will be in practice. And it is on the methodology for building that scaffold that most of the difference between projects that hold up and those that stay at the demo stage is being settled today.
Sources
The analyses this article is based on:
- Alex Kim, Claude Code Source Leak: detailed technical analysis
- Kuber Studio, 512K Lines Analysis: the most comprehensive analysis
- WaveSpeedAI, Architecture Analysis: harness architecture
- Marc Bara, What the Leak Actually Reveals: prompt as architecture
- Anthropic Engineering, Infrastructure Noise: official source
- Penligent, Security Analysis: CVEs and security
Glossary (the [3] markers in the text point here):
- Scaffold (or harness): the software infrastructure surrounding the AI model. It is the code that receives the model's requests, executes actions (read a file, run a command), manages permissions, and returns results. The model "thinks", the scaffold "does".
- System prompt: the instruction text sent to the model at the start of every conversation. It defines its behavior, tools, and rules. In Claude Code, it is ~50,000 tokens.
- Tool call: a structured call from the model to a tool. The model emits JSON ("I want to read this file"), the scaffold executes it and returns the result.
- Prompt caching: a technique that avoids retransmitting invariant parts of the prompt on every request. Cached tokens cost 10x less.
- Circuit breaker: a pattern that stops an operation after N consecutive failures, to avoid wasting resources in a loop.
- MoE (Mixture of Experts): a model architecture where only a fraction of the parameters is activated per request. Allows a large model (120B) with the inference cost of a small one (5B active).
- Distillation: a technique that trains a small model to reproduce the behavior of a large one. "Anti-distillation" refers to mechanisms that prevent this cloning.
- Feature flag: a switch in the code that enables or disables a feature without redeploying. GrowthBook is a feature flag tool.
- Daemon: a process that runs continuously in the background, as opposed to a program that is launched then terminated.
- SWE-bench: a benchmark measuring an AI agent's ability to resolve real GitHub issues. 2,294 problems from open-source Python projects.
- Token: a unit of text used by language models. A common word = 1 token, a rare or technical word = 2-3 tokens. API costs are billed per token.
- CVE (Common Vulnerabilities and Exposures): a unique identifier for a known security vulnerability. CVE-2025-XXXXX = a publicly documented vulnerability.
- Webhook: a mechanism where an external service (GitHub, Slack...) automatically sends an HTTP notification when an event occurs (new PR, comment...).
- MITM (Man-In-The-Middle): an attack where an intermediary intercepts communications between two parties. A MITM proxy can modify API requests in transit.
- RAG (Retrieval-Augmented Generation): a technique that enriches a model's prompt with documents retrieved through search, rather than relying solely on what the model has memorized.
- Blast radius: a term borrowed from military security. In software engineering, it refers to the extent of potential damage if an action goes wrong.
[1] CodeR: Issue Resolving with Multi-Agent and Task Graphs (arXiv 2406.01304). CodeR uses a multi-agent framework with pre-defined task graphs to resolve GitHub issues. The same GPT-4 goes from 2.7% (basic RAG scaffold) to 28.3% with this orchestration. Source code: NL2Code/CodeR.
[2] Terminal-Bench 2.0: 6-point gap between the richest and poorest infrastructure configuration, same model.
[4] Sebastian Raschka. On the day of the leak: Claude Code's Real Secret Sauce (Probably) Isn't the Model (March 31, 2026, 2,830 likes). Then the developed analysis: Components of a Coding Agent, with a mini coding agent in Python. Raschka is the author of Build a Large Language Model (From Scratch) and Build a Reasoning Model (From Scratch).
Independent reproduction:
- claw-code: clean-room port of the architecture in Python/Rust, maintained by AI agents
My related work:
- herbert-rs: LLM inference engine in Rust and hand-written assembly
- Open-source LLM landscape 2026: the companion article on models
- Benchmark reference (71 benchmarks): benchmark reference used in this article