tags: [tokens, context, contracts, retrieval, agents, architecture, codemap, nlp]
related: [011_nlp_contracts.md, 012_dar_find.md]
supersedes: ~
status: current

The Three Token Debts — and the One Architecture That Pays Them

Building in the Age of Agents — Article 019

Every AI-assisted codebase has a hidden cost that compounds across sessions: the AI has to re-learn the project every time it wakes up. Not because it forgot — it never remembered. Context windows don’t persist. Each session starts cold.

You pay for that cold start in tokens. What most developers don’t realize is that there are three different kinds of tokens being spent, and they respond to completely different solutions.

The Cold Start Tax

Here’s what actually happens when an AI agent picks up a codebase it built last session.

The agent needs to understand the project before it can do anything useful. So it reads files. The App.js to understand the entry point. The engine modules to understand what’s available. The scenario files to understand the data shapes. The renderer to understand what methods exist before calling them. By the time it’s ready to do the actual work, it has spent 2,000–5,000 tokens just orienting.

That’s not the work. That’s the tax on the work.

And it’s not a fixed tax. It grows with the project. If every module has to be read, a codebase with 8 modules costs roughly eight times what a single-file project costs to re-enter. Every refactor, every new module, every renamed method makes the cold start more expensive.

The instinct is to fight this with better memory — “just remember what you built.” But that misses the structure of the problem. Cold start tokens aren’t all the same thing. They break into three distinct debts, and each one has its own creditor.

Debt 1 — Orientation Tokens

The first debt is orientation: understanding the shape of the project before touching anything.

Which files exist? What does each one own? How do the pieces relate? What’s the entry point? Where do I look for the board renderer versus the move validator?

Without any documentation, the agent answers these questions by reading files. Every file read burns tokens. Some files are read only to answer “is this the right place?” — and then discarded.

The cheapest possible solution is a file map. Not a full README — just a structured tree with one-line descriptions. Something like:

src/
├── App.js              ← top-level coordinator; wires all modules
├── engine/
│   ├── ChessEngine.js  ← chess.js wrapper (camelCase API)
│   └── FakeChessEngine.js ← controllable test stub
├── story/
│   ├── StoryDirector.js  ← fluent scenario builder
│   └── StoryPlayer.js    ← async step sequencer; emits events; no DOM
├── renderer/
│   └── BoardRenderer.js  ← diff-based piece renderer; no event listeners

An agent that loads this before doing anything else knows where to look before it reads. It skips the “is this the right file?” reads entirely. On an 8-module project, that’s typically 3–5 full file reads eliminated per session.

The file map in the README is the answer to Debt 1. Not because it’s documentation, but because it’s orientation that fits in a single context load instead of being scattered across 8 file reads.
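One way to keep such a map cheap to maintain is to generate it from a small manifest rather than editing the tree by hand. The sketch below is illustrative only: the `manifest` object and the `renderFileMap` function are hypothetical names, and the paths are taken from the example tree above.

```javascript
// Hypothetical manifest: file path -> one-line description of what the file owns.
const manifest = {
  "src/App.js": "top-level coordinator; wires all modules",
  "src/engine/ChessEngine.js": "chess.js wrapper (camelCase API)",
  "src/engine/FakeChessEngine.js": "controllable test stub",
  "src/story/StoryDirector.js": "fluent scenario builder",
  "src/story/StoryPlayer.js": "async step sequencer; emits events; no DOM",
  "src/renderer/BoardRenderer.js": "diff-based piece renderer; no event listeners",
};

// Renders the manifest as an indented tree, one line per directory or file.
function renderFileMap(entries) {
  const lines = [];
  const seenDirs = new Set();
  for (const path of Object.keys(entries).sort()) {
    const parts = path.split("/");
    // Emit each directory the first time it appears.
    for (let depth = 0; depth < parts.length - 1; depth++) {
      const dir = parts.slice(0, depth + 1).join("/");
      if (!seenDirs.has(dir)) {
        seenDirs.add(dir);
        lines.push("  ".repeat(depth) + parts[depth] + "/");
      }
    }
    const indent = "  ".repeat(parts.length - 1);
    lines.push(`${indent}${parts[parts.length - 1]}  ← ${entries[path]}`);
  }
  return lines.join("\n");
}

console.log(renderFileMap(manifest));
```

Adding a module then means adding one manifest entry, and the README tree can be regenerated from it.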

Debt 2 — Interface Tokens

The second debt is interface understanding: knowing what a module can do before you try to use it.

You need to call the board renderer. But what methods does it expose? What does render() take? Does movePiece handle en passant, or do you handle that in the caller? What’s the exact signature of highlightSquares?

Without a contracts layer, the answer is: read BoardRenderer.js. All of it. To answer three questions about method signatures.

The contracts layer flips this. Instead of reading the implementation to infer the interface, you read a dedicated interface file that contains nothing but the interface. A 200-line contracts file answers every “what does X take and return?” question for the entire project, at a fraction of the token cost of reading the implementation files.

The key insight is separating shape from behavior. The implementation file tells you how something works. The contracts file tells you what it looks like at the boundary. For 80% of agent tasks — calling an existing function, passing data through a chain, wiring up a new module — the boundary is all you need.

A contracts file for an 8-module project typically runs 150–250 lines. It replaces 4–8 full file reads per session. The ratio is roughly 10:1 in favor of the contracts approach.
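To make "shape without behavior" concrete, here is a minimal sketch of what one entry in such a contracts file might look like. Everything specific in it is an assumption for illustration: the `PositionMap` shape, the parameter lists for `render`, `movePiece`, and `highlightSquares`, and the `checkContract` helper are hypothetical, not the project's actual signatures.

```javascript
// contracts.js (hypothetical sketch): shapes only, no behavior.

/**
 * Square name -> piece code, e.g. { e4: "wP" }. Assumed shape.
 * @typedef {Object.<string, string>} PositionMap
 */

/**
 * Assumed boundary of the board renderer.
 * @typedef {Object} BoardRendererContract
 * @property {(position: PositionMap) => void} render
 * @property {(from: string, to: string) => void} movePiece
 * @property {(squares: string[]) => void} highlightSquares
 */

// Runtime mirror of the typedefs: module name -> method name -> parameter count.
const CONTRACTS = {
  BoardRenderer: { render: 1, movePiece: 2, highlightSquares: 1 },
};

// Returns a list of problems; an empty list means the object matches the contract.
function checkContract(name, impl) {
  const problems = [];
  for (const [method, arity] of Object.entries(CONTRACTS[name])) {
    if (typeof impl[method] !== "function") {
      problems.push(`${name}.${method} is missing`);
    } else if (impl[method].length !== arity) {
      problems.push(`${name}.${method} should take ${arity} argument(s)`);
    }
  }
  return problems;
}

// A stand-in implementation that is missing highlightSquares.
const partialRenderer = { render(position) {}, movePiece(from, to) {} };
console.log(checkContract("BoardRenderer", partialRenderer));
```

The typedefs are what the agent reads; the small runtime table is optional and only there so a test can catch the contract drifting from the implementation.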

This is Debt 2 paid. But notice: the contracts file still has to be found and loaded. The agent has to know it exists and reach for it before reaching for the implementation files. That’s a behavior change, not just an architecture change.

Debt 3 — Reuse-Discovery Tokens

The third debt is the hardest to see: discovering what already exists before writing new code.

Debt 1 and Debt 2 assume you know what you’re looking for. You want to understand the renderer; you find it in the file map; you load its contracts.

But what about the moment before that? You need to build a feature. You have a rough idea what it should do. Do you reach for something that already exists, or do you write something new?

If you write something new and a perfect match already existed, you’ve spent tokens writing duplicate code — and you’ll spend tokens again next session when the duplication surfaces as a bug.

The only way to answer “does something like this already exist?” without reading every file is a retrieval layer. A function-level index that maps intent to location without reading the implementation.

This is exactly what the @reuse-when and @tags fields in an NLP contract system provide. When you write:

/**
 * @contract
 * @does        Renders the board position by diffing current DOM state against new position map
 * @tags        board, render, diff, dom, position, pieces
 * @reuse-when  You need to update the board display after any position change
 * @complexity  simple
 */

…you’re not writing documentation. You’re writing a retrieval key. A TF-IDF index over those fields can answer “find me something that renders the board position” without touching a single implementation file.
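The retrieval idea can be sketched in a few dozen lines. The code below is a simplified stand-in, not the implementation behind dar find: the record format, the `StoryPlayer.play` entry, and the `buildIndex` and `find` functions are assumptions made for illustration. It scores each contract by summing term frequency times a smoothed inverse document frequency over the query terms, with no stemming.

```javascript
// Hypothetical contract records, as if extracted from @contract blocks.
const contracts = [
  {
    name: "BoardRenderer.render",
    file: "src/renderer/BoardRenderer.js",
    does: "Renders the board position by diffing current DOM state against new position map",
    tags: "board, render, diff, dom, position, pieces",
    reuseWhen: "You need to update the board display after any position change",
  },
  {
    name: "StoryPlayer.play", // illustrative entry, not from the article
    file: "src/story/StoryPlayer.js",
    does: "Runs scenario steps in order and emits an event per step",
    tags: "story, scenario, steps, async, events",
    reuseWhen: "You need to play back a scripted sequence of moves",
  },
];

// Lowercase word tokens; no stemming, so "render" and "renders" are different terms.
const tokenize = (text) => text.toLowerCase().match(/[a-z0-9]+/g) || [];

function buildIndex(records) {
  // Term counts per record, over the three retrieval fields only.
  const docs = records.map((r) => {
    const counts = new Map();
    for (const term of tokenize(`${r.does} ${r.tags} ${r.reuseWhen}`)) {
      counts.set(term, (counts.get(term) || 0) + 1);
    }
    return counts;
  });
  // Document frequency: in how many records each term appears.
  const df = new Map();
  for (const counts of docs) {
    for (const term of counts.keys()) df.set(term, (df.get(term) || 0) + 1);
  }
  const idf = (term) => Math.log((1 + records.length) / (1 + (df.get(term) || 0))) + 1;
  return { records, docs, idf };
}

function find(index, query, limit = 3) {
  const terms = tokenize(query);
  return index.records
    .map((record, i) => {
      let score = 0;
      for (const term of terms) score += (index.docs[i].get(term) || 0) * index.idf(term);
      return { name: record.name, file: record.file, score };
    })
    .filter((hit) => hit.score > 0)
    .sort((a, b) => b.score - a.score)
    .slice(0, limit);
}

const index = buildIndex(contracts);
console.log(find(index, "render board after move"));
```

Note that the index never opens an implementation file: a hit returns a name and a path, and the agent decides whether to read further.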

The agent that runs dar find "render board after move" before writing a renderer isn’t being cautious — it’s being cheap. If the retrieval hits, it reuses. If it misses, it generates new code and adds a contract so the next session can find it.

Debt 3 is the hardest to pay because it requires a cultural shift: search before writing. The architecture supports it (a retrieval tool, a contract index), but the behavior has to be enforced — by CLAUDE.md rules, by pre-commit hooks that block uncontracted functions, by making dar find the reflex before any implementation decision.
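The enforcement side can start very small. The sketch below shows the core check a pre-commit hook might run: given a file's source text, list exported functions that have no @contract block directly above them. It is an assumption-heavy illustration: `findUncontracted` is a hypothetical name, it assumes ES module `export function` syntax, and it uses a regular expression rather than a real parser, so it will miss other declaration styles. Reading staged files and failing the commit is left out.

```javascript
// Returns the names of exported functions that are not directly preceded
// by a /** ... */ block containing @contract.
function findUncontracted(source) {
  const missing = [];
  const exportPattern = /^export\s+(?:async\s+)?function\s+([A-Za-z_$][\w$]*)/gm;
  let match;
  while ((match = exportPattern.exec(source)) !== null) {
    // Everything before the function, with trailing whitespace removed.
    const before = source.slice(0, match.index).trimEnd();
    const docStart = before.lastIndexOf("/**");
    const doc = docStart === -1 ? "" : before.slice(docStart);
    // The comment must close exactly where `before` ends, i.e. sit directly above the function.
    const isAdjacentDoc = doc.endsWith("*/") && doc.indexOf("*/") === doc.length - 2;
    if (!(isAdjacentDoc && doc.includes("@contract"))) missing.push(match[1]);
  }
  return missing;
}

const sampleSource = `
/**
 * @contract
 * @does  Renders the board position
 */
export function render(position) {}

export function helper() {}
`;

console.log(findUncontracted(sampleSource)); // [ 'helper' ]
```

A hook built on this would exit non-zero whenever the returned list is not empty, which turns "write the contract in the same edit" from a convention into a gate.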

The Gap the Three Solutions Leave

Here’s the honest accounting: even with a file map, a contracts file, and a retrieval index, there’s still a gap.

The contracts file and the retrieval index only cover functions that have been explicitly contracted. New code, internal helpers, anything that wasn’t annotated — that’s still dark to retrieval. The agent still has to read the file.

And the @contract annotations live inside the source files. To find the contract for BoardRenderer, you still have to read BoardRenderer.js, or maintain a separate extracted index like CODEMAP.md.

CODEMAP is the bridge. It’s a flat symbol index: function name → file path → line number → one-line description. It fits in 30–50 lines for a mid-size project. An agent that loads CODEMAP alongside the file map has 90% of what it needs to start work without reading a single implementation file.
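As an illustration of how little structure this needs, here is a sketch of a CODEMAP held as plain lines and turned into a lookup table. The pipe-separated line format, the sample symbols, and the line numbers are all made up for the example; the article only fixes the four fields (name, path, line, description), not how they are written.

```javascript
// Hypothetical CODEMAP.md content: name | file path | line number | description.
const CODEMAP_TEXT = `
render | src/renderer/BoardRenderer.js | 42 | diff-based render of a position map
highlightSquares | src/renderer/BoardRenderer.js | 118 | marks a list of squares
play | src/story/StoryPlayer.js | 30 | runs scenario steps in order
`;

// Parses the flat index into a Map keyed by symbol name.
// Names are assumed unique here; a real index would need to handle collisions.
function parseCodemap(text) {
  const symbols = new Map();
  for (const line of text.split("\n")) {
    if (!line.trim()) continue; // skip blank lines
    const [name, file, lineNo, description] = line.split("|").map((part) => part.trim());
    symbols.set(name, { file, line: Number(lineNo), description });
  }
  return symbols;
}

const codemap = parseCodemap(CODEMAP_TEXT);
console.log(codemap.get("highlightSquares"));
```

With the file path and line number in hand, the agent can read a narrow slice of one file instead of the whole module.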

The full stack looks like:

| Layer | What it answers | Token cost |
|---|---|---|
| File map (README) | Where do I look? | ~50 tokens |
| CODEMAP.md | What symbols exist at what lines? | ~100 tokens |
| contracts.js | What does each module take and return? | ~200 tokens |
| NLP index (dar find) | Does something like this already exist? | ~20 tokens per query |

Together, these four layers replace 4,000–8,000 tokens of cold-start reading with ~400 tokens of structured loading per session. On a project that spans 3–5 sessions of active development, that’s roughly 10,000–40,000 tokens saved: real money in API spend, and real speed in every session.
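The savings range follows directly from the figures in the table. The short calculation below reproduces it, assuming one dar find query per session (an assumption made here to get a single per-session total; the variable and function names are illustrative).

```javascript
// Per-layer costs from the table above, in tokens.
const layerTokens = { fileMap: 50, codemap: 100, contracts: 200, darFindQuery: 20 };
const queriesPerSession = 1; // assumption for this estimate

// Structured load per session: 50 + 100 + 200 + 20 = 370, which the article rounds to ~400.
const structuredLoad =
  layerTokens.fileMap +
  layerTokens.codemap +
  layerTokens.contracts +
  layerTokens.darFindQuery * queriesPerSession;

// Tokens saved across a project, given the cold-read cost it replaces.
function tokensSaved(coldReadTokens, sessions) {
  return (coldReadTokens - structuredLoad) * sessions;
}

console.log(structuredLoad);        // 370
console.log(tokensSaved(4000, 3));  // 10890 (low end: 3 sessions)
console.log(tokensSaved(8000, 5));  // 38150 (high end: 5 sessions)
```

The low and high ends land at about 11,000 and 38,000 tokens, which is where the 10,000–40,000 range comes from.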

The Architecture Is the Habit

None of these layers are complicated to build. A README file map is an hour’s work. A contracts file is 10 minutes per module. CODEMAP is byproduct work — you add a line every time you touch a function, because you already know the line number. An NLP index is 80 lines of TF-IDF over JSON.

The hard part isn’t the tools. It’s the habit.

The file map has to be updated when you add modules. The contracts have to be written when you write functions, not “later.” CODEMAP has to get a line when you touch a function, not at the end of the session. The retrieval tool has to be run before writing new code, not after.

When these habits hold, the architecture pays dividends automatically. When they lapse, the project slowly reverts to cold-start poverty — the agent re-reads files it read before, re-generates code it generated before, re-derives structure it documented before.

The architecture doesn’t save tokens. The architecture, maintained as a discipline, saves tokens.

What This Means for Your Next Project

If you’re starting a project you’ll develop across multiple sessions with an AI agent, the minimum viable retrieval stack is:

File map in README on day one — one tree, one line per file, what it owns
contracts.js before first implementation file — all interfaces defined before any code
@contract on every public function — written in the same edit as the function, never deferred
CODEMAP.md as a byproduct — one line added per function touched, never a separate scan

These four habits, held consistently, mean the next session starts with full orientation, full interface visibility, and reuse discovery — at 1/10th the token cost of reading files cold.

The token economy of AI-assisted development isn’t about writing less code. It’s about building the retrieval layers that let the AI re-enter your project at full context with minimal reads. Every minute you spend on those layers pays back across every future session that touches the same code.