What the Claude Code leak actually reveals: a methodology for building AI agents
On March 31, 2026, a 59.8 MB .map file was accidentally published in Claude Code's npm package. 512,000 lines of TypeScript, 1,906 files. Anthropic's entire agent harness exposed.
Everyone talked about it, and rightly so. Codenames for future models (Capybara, Tengu), the autonomous agent KAIROS, fake tools injected as an anti-distillation measure. Great headline material.
But what interests me is something else. Studying the technical analyses published from this code, and looking at the claw-code project that reproduced the architecture from scratch in Python and Rust for legal reasons, I found something more valuable than codenames: a methodology. Seven design principles that I compare to what I'm building with herbert-rs, and that anyone can apply.
The number that changes everything
Before diving into the methodology, a single number:
GPT-4 on SWE-bench Lite: 2.7% with a basic scaffold, 28.3% with CodeR [1]. Same model. 10x difference.
And a second one:
Opus 4.5 with three different scaffoldings (Augment, Cursor, Claude Code): 17 problems apart on 731 SWE-bench issues [2].
The conclusion is brutal: the choice of model is not the determining factor. The infrastructure around the model (the scaffold [3], the harness, the tooling) matters as much, if not more. That's exactly what Claude Code's source confirms in every file.
Sebastian Raschka reached the same conclusion. On the day of the leak, he published Claude Code's Real Secret Sauce (Probably) Isn't the Model [4], identifying the same patterns: repo context, prompt cache, structured tools, context reduction, memory, sub-agents. He then developed the analysis in Components of a Coding Agent, with a mini coding agent in Python. His formulation: "the surrounding system plays as much of a role as the model itself". The principles that follow are a reading of Claude Code's source through that same lens.
Principle 1. The scaffold matters more than the model
The leaked code reveals a system of 50+ tools organized in a plugin registry. The model doesn't "code." It orchestrates structured calls (read a file, run a command, search a pattern) and the scaffold executes.
Each tool has a risk level (LOW, MEDIUM, HIGH). The scaffold decides whether the action passes directly, requires confirmation, or is blocked. The model proposes, the scaffold disposes.
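That proposer/disposer split can be sketched in a few lines. All names here are hypothetical illustrations, not Claude Code's actual identifiers:

```python
from enum import Enum

class Risk(Enum):
    LOW = "low"       # executes directly
    MEDIUM = "medium" # executes, but is logged
    HIGH = "high"     # requires user confirmation

# Hypothetical plugin registry: tool name -> (handler, risk level).
REGISTRY = {
    "read_file":   (lambda path: open(path).read(), Risk.LOW),
    "run_command": (lambda cmd: f"ran: {cmd}",      Risk.HIGH),
}

def dispatch(tool: str, arg: str, confirmed: bool = False) -> str:
    """The model proposes a structured call; the scaffold disposes."""
    handler, risk = REGISTRY[tool]
    if risk is Risk.HIGH and not confirmed:
        return "blocked: confirmation required"
    return handler(arg)
```

The point of the design is visible in the signature: the decision to execute lives in `dispatch`, not in the model's output.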
This design has a direct consequence: you can swap models without changing the system. Opus, Sonnet, or a third-party model, the scaffold stays the same. The code confirms the existence of feature flags for switching between models.
The lesson: if you're building an AI agent, start with the scaffold, not with model selection. The model is a replaceable component. The scaffold is your product.
It's the same approach in herbert-rs: the inference engine is decoupled from the model. Weight format, transformer architecture, operators, everything is modular. Models come and go, the engine stays.
Principle 2. The prompt is the architecture
Claude Code's system prompt [3] weighs in at approximately 38,000 to 50,000 tokens. It's a structured document, not a vague instruction.
But the most remarkable thing is that multi-agent orchestration is defined in the prompt, not in code. Coordinator mode, which distributes work across sub-agents, is a text directive. Not TypeScript code. Not branching. Text.
The advantage is obvious: changing the orchestration behavior means changing text. No compilation, no release, no deployment. A/B testing orchestration strategies means modifying a prompt section and toggling a GrowthBook feature flag [3].
The prompt also contains precise behavioral directives. No emojis unless explicitly requested. Concise answers. Action first, reasoning second. And crucially: evaluate the reversibility and blast radius of every action before executing it.
The lesson: in an agentic system, the prompt isn't configuration, it's the architecture. Invest in its structure as much as in your code.
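A minimal illustration of orchestration-as-text: behavior changes are string edits, not code paths. Section names are invented, and a plain dictionary stands in for GrowthBook:

```python
# Hypothetical prompt sections. The orchestration strategy is a text
# directive selected by a feature flag -- swapping it is a string edit,
# not a compile/release/deploy cycle.
SECTIONS = {
    "tools": "You have Read, Grep, Edit, Bash at your disposal.",
    "style": "No emojis unless asked. Be concise. Act first, explain second.",
    "orchestration_a": "Coordinator mode: split research across sub-agents.",
    "orchestration_b": "Solo mode: handle the task in a single agent.",
}

def build_system_prompt(flags: dict) -> str:
    """Assemble the prompt; the flag picks the orchestration directive."""
    orch = "orchestration_a" if flags.get("coordinator") else "orchestration_b"
    return "\n\n".join([SECTIONS["tools"], SECTIONS["style"], SECTIONS[orch]])
```

A/B testing two orchestration strategies then reduces to flipping `flags["coordinator"]` for a cohort of users.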
Principle 3. Memory is a hint, not truth
Claude Code implements a 3-layer memory system:
| Layer | Content | Loading |
|---|---|---|
| Index (MEMORY.md) | One-line pointers (~150 chars max) | Always in context |
| Topic files (.md) | Detailed notes by subject | On demand |
| Transcripts (JSONL) | Complete session history | Search only |
The index is limited to 200 lines. Each entry is a pointer, not content. When the model needs details, it loads the corresponding topic file. Raw history is accessible only through search.
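The always-loaded index versus on-demand topic files can be sketched like this. Helper names are hypothetical; the limits (~150 characters per entry, 200 index lines) come from the source analyses:

```python
MAX_ENTRY_CHARS = 150   # one-line pointers only
MAX_INDEX_LINES = 200   # hard cap on the always-in-context layer

def add_pointer(index: list, line: str, topic_file: str) -> None:
    """Layer 1 stores a pointer, never the content itself."""
    if len(index) >= MAX_INDEX_LINES:
        raise OverflowError("index full: consolidate first")
    index.append(f"{line[:MAX_ENTRY_CHARS]} -> {topic_file}")

def recall(index: list, topics: dict, query: str):
    """Scan the cheap layer; load the detailed topic file only on a hit."""
    for entry in index:
        if query in entry:
            _, topic = entry.rsplit(" -> ", 1)
            return topics[topic]   # layer 2, loaded on demand
    return None
```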
Consolidation happens in the background via a forked sub-agent (autoDream) that merges observations, removes contradictions, and updates the index. Three conditions must be met: 24h since last consolidation, 5+ accumulated sessions, and an acquired lock.
But the most important design principle is this one:
"The memory says X exists" ≠ "X exists now."
The model is instructed to always verify against actual code before acting on a memory. If memory contradicts reality, reality wins, and memory gets updated.
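The "memory is a hint" rule, sketched with a hypothetical helper: check the claim against reality before acting, and let reality win when they disagree:

```python
import os

def verified(memory: dict, key: str, path_exists=os.path.exists):
    """Return the remembered path only if it still exists on disk.

    Memory has no authority: if the entry is stale, drop it and
    report nothing, forcing a fresh look at the actual code.
    """
    claimed = memory.get(key)
    if claimed and path_exists(claimed):
        return claimed
    memory.pop(key, None)   # reality wins: evict the stale entry
    return None
```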
This pattern isn't an arbitrary choice by Anthropic. Andrej Karpathy independently describes the same architecture in a recent thread on personalized agent memory. His four principles: explicit and inspectable memory (not a black box), markdown files rather than a database ("file over app"), data under user control, and swappable model ("BYOAI"). That's exactly what MEMORY.md does: markdown files, verifiable, local, model-independent.
Even more striking: in the replies, a user describes their own system with "cron jobs and a dream cron to clean up stale memories/contradictions", shared memory across providers, and real-time synchronization. That's autoDream and KAIROS, reinvented by a practitioner who never saw Claude Code's source.
When the same pattern emerges independently at Anthropic, in Karpathy's writing, and among individual users, it's no longer an implementation choice. It's a principle.
The lesson: an agent's memory is a cache, not a database. It accelerates, but it doesn't have authority. Designing your memory system with this distinction changes everything: you avoid hallucinations based on stale memories.
Principle 4. Defense in depth, not width
The bashSecurity.ts file contains 23 numbered checks applied to every command before execution. 18 Zsh builtins blocked (eval, exec, source, trap...). Unicode injection detection (zero-width spaces). Path traversal verification with Unicode normalization.
But the 23 checks are only the first layer. The full architecture has five:
| Layer | Function |
|---|---|
| 1. bashSecurity.ts | 23 syntactic and semantic checks |
| 2. Permission System | LOW / MEDIUM / HIGH classification |
| 3. User Confirmation | Approval required for HIGH actions |
| 4. Sandbox Execution | Isolated execution |
| 5. Output Sanitization | Result cleanup |
The permission system has 5 modes: Default (frequent confirmations), Allow Edits (files OK, bash confirmed), Auto (ML classification), Bypass, and YOLO (everything passes, for development). An ML classifier determines command risk in Auto mode.
And 5 public CVEs [3] document bypasses found and fixed through HackerOne, proof that the attack surface is real and actively tested.
The lesson: an AI agent's security isn't a single firewall, it's an onion. Each layer assumes the previous one failed. And the fact that Anthropic runs an active bug bounty shows that security is never "done."
Principle 5. Read parallel, Write serial
Claude Code's multi-agent system defines three types of sub-agents:
| Type | Isolation | Use case |
|---|---|---|
| Fork | Byte-identical copy of parent context | Quick explorations |
| Teammate | Own context, separate terminal pane | Long parallel tasks |
| Worktree | Isolated git branch | Code changes without conflicts |
Orchestration follows four phases: Research (parallel, read-only) → Synthesis (coordinator merges) → Implementation (serial, single agent writes) → Verification (parallel, tests and reviews).
The concurrency rule is simple: read operations (Glob, Grep, Read) execute in parallel. Write operations (Edit, Write, Bash) execute serially. Only one agent modifies files at any given moment.
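In Python's asyncio, that rule maps directly onto `gather` for reads and a lock for writes. A sketch of the discipline, not claw-code's or Claude Code's actual scheduler:

```python
import asyncio

write_lock = asyncio.Lock()   # only one agent writes at any moment

async def read_op(name: str) -> str:
    """Glob, Grep, Read: side-effect-free, safe to parallelize."""
    await asyncio.sleep(0)
    return f"read:{name}"

async def write_op(target: str, log: list) -> None:
    """Edit, Write, Bash: mutating, strictly serialized behind the lock."""
    async with write_lock:
        log.append(target)

async def run():
    # Research phase: reads fan out in parallel.
    reads = await asyncio.gather(read_op("a"), read_op("b"), read_op("c"))
    # Implementation phase: writes happen one at a time, in order.
    log = []
    for target in ["main.rs", "lib.rs"]:
        await write_op(target, log)
    return reads, log
```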
The coordinator has explicit directives in its prompt: "Do not rubber-stamp weak work" and "You must understand findings." These directives guard against the passive-coordinator anti-pattern: an agent that merely relays results without adding value.
The lesson: multi-agent orchestration follows the same rules as concurrent programming. Reads are parallelizable, writes are exclusive. And the coordinator must understand what it's coordinating, otherwise it's a bottleneck with no added value.
Principle 6. Design for cost from day one
Claude Code's system prompt is split in two by a SYSTEM_PROMPT_DYNAMIC_BOUNDARY marker:
- Static section: tools, base rules, instructions. Identical for all users in an organization. Cacheable.
- Dynamic section: MEMORY.md, CLAUDE.md, project context. Session-specific.
Prompt caching [3] charges cached tokens at 1/10th the price. This division reduces the system prompt cost by approximately 90%, the difference between ~$0.75 and ~$0.075 per request with Opus 4.6.
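The arithmetic behind those figures, assuming an illustrative rate of $15 per million input tokens (the actual Opus pricing is not in the source) and the stated 1/10th cache discount:

```python
def prompt_cost(total_tokens: int, cached_tokens: int,
                price_per_token: float, cache_discount: float = 0.10) -> float:
    """Cost of one request: fresh tokens at full price, cached at 1/10th."""
    fresh = total_tokens - cached_tokens
    return (fresh * price_per_token
            + cached_tokens * price_per_token * cache_discount)

# A 50,000-token system prompt at a hypothetical $15/M input rate:
uncached = prompt_cost(50_000, 0, 15e-6)       # nothing cached
fully_cached = prompt_cost(50_000, 50_000, 15e-6)  # static section hit
```

With those assumed rates, `uncached` comes to $0.75 and `fully_cached` to $0.075 per request, the ~90% reduction the split is designed for.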
The promptCacheBreakDetection.ts file monitors 14 vectors that can invalidate this cache: MEMORY.md changes, mode toggles, tool additions, model changes... Each vector has its mitigation. "Sticky latches" prevent mode oscillations (plan → normal → plan) from breaking the cache on every toggle.
And a comment in autoCompact.ts reveals a historical bug:
"1,279 sessions had 50+ consecutive failures wasting ~250K API calls/day globally"
The fix: a circuit breaker [3] at 3 consecutive failures maximum. Simple, effective, and probably economically significant at Anthropic's scale.
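A minimal circuit breaker in the spirit of that fix. This is a generic sketch, not the autoCompact.ts code:

```python
class CircuitBreaker:
    """Stop retrying after N consecutive failures (3, per the fix above)."""

    def __init__(self, max_failures: int = 3):
        self.max_failures = max_failures
        self.failures = 0

    def call(self, fn):
        if self.failures >= self.max_failures:
            raise RuntimeError("circuit open: giving up")  # no wasted calls
        try:
            result = fn()
        except Exception:
            self.failures += 1
            raise
        self.failures = 0   # any success resets the counter
        return result
```

Once the circuit opens, the expensive operation is never attempted again until something explicitly resets the breaker, which is the whole point at 250K wasted calls a day.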
The lesson: in a production AI system, cost isn't an optimization problem, it's an architectural constraint. Design prompt structure for caching, monitor invalidation vectors, and put circuit breakers everywhere. Every API call that produces nothing is money burned.
This is a concern I encounter in herbert-rs too: local inference eliminates per-token API costs, but compute cost remains. Every unnecessary matrix operation is wasted time. The cost obsession translates differently (API vs compute), but the principle is the same.
Principle 7. From reactive to proactive: KAIROS
KAIROS is referenced 150+ times in the codebase. It's a daemon [3] agent, a persistent process that runs continuously, gated behind a flag compiled to false in public builds but entirely present in the code.
The architecture: a cron every 5 minutes, GitHub webhooks, daemon workers, and a 15-second budget per proactive action cycle. The agent checks PR status, executes scheduled tasks, consolidates its memory via the /dream skill.
Exclusive tools are reserved for it: sending files to the user, push notifications, PR event subscriptions. Brief mode is enabled by default, with minimal responses and focus on action.
No known competitor has an equivalent. Cursor, OpenHands, Copilot are reactive. Devin is autonomous but lacks daemon persistence and memory consolidation. KAIROS represents the shift from a tool that responds to an agent that anticipates.
The lesson: the future of AI agents isn't interactive chat. It's the intelligent daemon that works in the background, prioritizes, and only interrupts the human when necessary. The 15-second budget is a brilliant design detail: it prevents the agent from monopolizing resources while remaining useful.
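The budgeted cycle can be sketched as a loop that drains a task queue until its time allowance runs out. Names are hypothetical; the 15-second figure comes from the analyses above:

```python
import time

def proactive_cycle(tasks: list, budget_seconds: float = 15.0,
                    clock=time.monotonic) -> list:
    """Run queued checks until the per-cycle budget is spent, then yield.

    Leftover tasks simply wait for the next tick (every 5 minutes in the
    described architecture), so the daemon never monopolizes resources.
    """
    deadline = clock() + budget_seconds
    done = []
    while tasks and clock() < deadline:
        done.append(tasks.pop(0)())
    return done
```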
Bonus. Anti-distillation, or how to protect your value
An unexpected aspect of the code: anti-distillation mechanisms. Behavioral distillation [3] (training a small model on a large model's outputs) is the existential threat to proprietary agents.
Claude Code implements four countermeasures:
| Mechanism | Principle | Effectiveness |
|---|---|---|
| Fake tools | Inject fictitious tools into the prompt | Medium (bypassable via proxy) |
| Connector-text | Cryptographic signature of intermediate text | High |
| Undercover mode | Strip internal codenames on public repos | High |
| CharCode encoding | Names encoded via String.fromCharCode() | Low (obscurity ≠ security) |
Fake tools are activated by four simultaneous conditions: compile-time flag, CLI launch, Anthropic first-party provider, and active GrowthBook flag. If a distiller trains on this data, their model learns tools that don't exist.
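A sketch of that four-condition gate, with invented decoy tool names:

```python
def inject_fake_tools(compile_flag: bool, is_cli: bool,
                      provider: str, growthbook_flag: bool) -> bool:
    """All four conditions must hold before decoys enter the prompt."""
    return all([compile_flag, is_cli,
                provider == "anthropic-first-party", growthbook_flag])

def build_tool_list(real_tools: list, **conditions) -> list:
    tools = list(real_tools)
    if inject_fake_tools(**conditions):
        # Hypothetical decoys: a distiller training on transcripts that
        # mention these learns tools that exist in no real scaffold.
        tools += ["quantum_refactor", "neural_lint"]
    return tools
```

Note what the `provider` condition implies: anyone routing through a proxy never sees the decoys, which is exactly why the mechanism only raises the cost of distillation rather than preventing it.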
It's a pragmatic strategy: it doesn't stop determined attackers (a MITM proxy suffices to bypass fake tools), but it raises the cost of distillation above the profitability threshold for most actors.
The lesson: in the AI economy, value is no longer in the model (models are increasingly open) but in the scaffold. And the scaffold protects itself through data-poisoning mechanisms, not code obscurity, because, as this leak proves, code always leaks eventually. In fact, the claw-code project demonstrated that the entire architecture can be reproduced from scratch in Python and Rust, based solely on public analyses. Proof that the methodology itself is not a secret.
What I take away for herbert-rs
These seven principles aren't theoretical. Building herbert-rs, an LLM inference engine in Rust and hand-written assembly, I encounter the same trade-offs:
- The scaffold > the model: herbert-rs is designed to be model-agnostic. The inference engine is a component in a larger chain.
- Modular architecture: each operator (matmul, attention, RoPE) is a replaceable module, independently optimizable.
- Cost as a constraint: in local inference, "cost" means CPU/GPU cycles. Every operation is profiled. The same principle as the 14 cache-break vectors, applied to compute.
- Layered security: Rust eliminates an entire class of memory bugs by construction. That's a structural security layer, not a runtime check.
The Claude Code leak confirms what practitioners already know: the model is necessary, the scaffold is sufficient. The methodology for building the scaffold, that's what makes the difference.
Sources
The analyses this article is based on:
- Alex Kim, Claude Code Source Leak: detailed technical analysis
- Kuber Studio, 512K Lines Analysis: the most comprehensive analysis
- WaveSpeedAI, Architecture Analysis: harness architecture
- Marc Bara, What the Leak Actually Reveals: prompt as architecture
- Anthropic Engineering, Infrastructure Noise: official source
- Penligent, Security Analysis: CVEs and security
Glossary
- Scaffold (or harness): the software infrastructure surrounding the AI model. It's the code that receives the model's requests, executes actions (read a file, run a command), manages permissions, and returns results. The model "thinks", the scaffold "does".
- System prompt: the instruction text sent to the model at the start of every conversation. It defines its behavior, tools, and rules. In Claude Code, it's ~50,000 tokens.
- Tool call: a structured call from the model to a tool. The model emits JSON ("I want to read this file"), the scaffold executes it and returns the result.
- Prompt caching: a technique that avoids retransmitting invariant parts of the prompt on every request. Cached tokens cost 10x less.
- Circuit breaker: a pattern that stops an operation after N consecutive failures, to avoid wasting resources in a loop.
- MoE (Mixture of Experts): a model architecture where only a fraction of parameters is activated per request. Allows a large model (120B) with the inference cost of a small one (5B active).
- Distillation: a technique that trains a small model to reproduce the behavior of a large one. "Anti-distillation" refers to mechanisms that prevent this cloning.
- Feature flag: a switch in the code that enables or disables a feature without redeploying. GrowthBook is a feature flag tool.
- Daemon: a process that runs continuously in the background, as opposed to a program that is launched then terminated.
- SWE-bench: a benchmark measuring an AI agent's ability to resolve real GitHub issues. 2,294 problems from open-source Python projects.
- Token: a unit of text used by language models. A common word = 1 token, a rare or technical word = 2-3 tokens. API costs are billed per token.
- CVE (Common Vulnerabilities and Exposures): a unique identifier for a known security vulnerability. CVE-2025-XXXXX = a publicly documented vulnerability.
- Webhook: a mechanism where an external service (GitHub, Slack...) automatically sends an HTTP notification when an event occurs (new PR, comment...).
- MITM (Man-In-The-Middle): an attack where an intermediary intercepts communications between two parties. A MITM proxy can modify API requests in transit.
- RAG (Retrieval-Augmented Generation): a technique that enriches a model's prompt with documents retrieved through search, rather than relying solely on what the model has memorized.
- Blast radius: a term borrowed from military security. In software engineering, it refers to the extent of potential damage if an action goes wrong.
[1] CodeR: Issue Resolving with Multi-Agent and Task Graphs (arXiv 2406.01304). CodeR uses a multi-agent framework with pre-defined task graphs to resolve GitHub issues. The same GPT-4 goes from 2.7% (basic RAG scaffold) to 28.3% with this orchestration. Source code: NL2Code/CodeR.
[2] Terminal-Bench 2.0: 6-point gap between richest and poorest infrastructure configuration, same model.
[4] Sebastian Raschka. On the day of the leak: Claude Code's Real Secret Sauce (Probably) Isn't the Model (March 31, 2026, 2,830 likes). Then the developed analysis: Components of a Coding Agent, with a mini coding agent in Python. Raschka is the author of Build a Large Language Model (From Scratch) and Build a Reasoning Model (From Scratch).
Independent reproduction:
- claw-code: clean-room port of the architecture in Python/Rust, maintained by AI agents
My related work:
- herbert-rs: LLM inference engine in Rust and hand-written assembly
- Open-source LLM landscape 2026: the companion article on models
- Benchmark reference (71 benchmarks): benchmark reference used in this article