---
tags: [annotations, habits, codebase, ai-readable, contracts, byproduct, intent, incremental]
related:
  - 034_five-principles-ai-code-search.md
  - 012_code_for_machines.md
  - 023_living_codemap.md
supersedes: ~
status: current
---

035 — Six Habits That Make Your Codebase More AI-Readable

The Problem

Most codebases are readable by humans and opaque to AI agents. The agent can read the files — but reading is expensive. Without metadata that describes intent, the agent has to infer purpose from implementation every time. It re-reads, re-reasons, re-derives. That cost compounds across every session.

Making a codebase AI-readable isn’t about adding documentation. It’s about changing where you put information so that machines can retrieve it without needing to understand it.

These six habits do that. They’re not a framework. Each one applies independently and pays off immediately.

1. Write Intent at the Point of Definition

The most important habit: describe what a function is for at the same moment you write it, not after.

Not this:

function normalizeOrder(raw) {
  // implementation
}

This:

/**
 * @does    Converts raw order API payload into a normalized Order with subtotal, tax, total.
 * @reuse-when  You receive a raw payload from the orders REST endpoint and need a normalized Order.
 */
function normalizeOrder(raw) {
  // implementation
}

The difference isn’t documentation — it’s a query target. @does answers “what does this do?” @reuse-when answers “when should I reach for this?” Both questions will be asked by the AI agent every time it approaches this code. Without these fields, the agent reads the body and infers. With them, the agent queries and moves on.

The timing matters. When you’re writing the function, you know exactly what it’s for and when to use it. That knowledge is cheapest to capture at the moment you have it. A week later, you’ll have to re-read the body to reconstruct it. A session later, the AI will read it too, and you’ll pay in tokens for that inference.

The habit: write @does and @reuse-when on every public function, immediately, as part of “done.” Treat a function without these fields the same as a function without a closing brace: unfinished.
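Treating “unannotated” as “unfinished” is easy to automate. Below is a minimal sketch of such a check. It assumes annotations sit in a JSDoc-style block directly above a `function` declaration and that each tag’s value fits on one line; `extractContracts` and `findUnfinished` are hypothetical names, and the regular expression covers only this simple layout, not every way JavaScript can declare a function.

```javascript
// Matches an optional /** ... */ block followed by a function declaration.
// The block body may not contain "*/", so one comment never swallows the next.
const DECL_RE =
  /(?:\/\*\*((?:(?!\*\/)[\s\S])*)\*\/\s*)?(?:export\s+)?(?:async\s+)?function\s+([A-Za-z_$][\w$]*)\s*\(/g;
// One "@tag value" pair per line; multi-line values are not supported here.
const TAG_RE = /@([\w-]+)[ \t]+([^\n]*)/g;

// Hypothetical extractor: returns { name, tags } for every declared function,
// including functions that have no annotation block at all (empty tags).
function extractContracts(source) {
  const contracts = [];
  for (const [, body, name] of source.matchAll(DECL_RE)) {
    const tags = {};
    for (const [, tag, value] of (body ?? "").matchAll(TAG_RE)) {
      tags[tag] = value.trim();
    }
    contracts.push({ name, tags });
  }
  return contracts;
}

// A function without a non-empty @does and @reuse-when is not done yet.
const REQUIRED_TAGS = ["does", "reuse-when"];

function findUnfinished(contracts) {
  return contracts
    .filter((c) => REQUIRED_TAGS.some((tag) => !c.tags[tag]))
    .map((c) => c.name);
}
```

A lint step could run this over changed files and reject the change whenever `findUnfinished` returns a non-empty list.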

2. Use Closed Vocabularies for Classification

Open-ended fields are hard to search. “This is a utility function for orders” and “helper for order processing” and “order-related utility” all describe the same thing differently. No retrieval system reliably equates them.

Closed vocabularies fix this:

@role  transformer | validator | query | mutation | factory |
       renderer | adapter | guard | hook | coordinator | config | registry

Every function gets exactly one role. The set never grows without discussion. Now @role transformer is a filter, not a guess — you can retrieve all transformers in the orders domain, or exclude renderers from an impact analysis, or route all validators through the verification layer.

The same principle applies to @complexity:

@complexity  simple | moderate | complex

This single field drives the routing decision: reuse as-is, verify, or generate. It takes five seconds to write and saves every future session from re-reasoning about whether this function is safe to use without inspection.
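A minimal sketch of that routing step follows. The pairing of each @complexity value with an action (simple to reuse, moderate to verify, complex to generate) is an assumption that matches the two lists in the order the sentence gives them, the fallback for missing or unknown values is a judgment call, and `routeFor` and `planByRoute` are hypothetical names.

```javascript
// Assumed pairing of the closed @complexity vocabulary with routing actions.
const ROUTE_BY_COMPLEXITY = new Map([
  ["simple", "reuse"],     // call it as-is, without reading the body
  ["moderate", "verify"],  // inspect or test it before relying on it
  ["complex", "generate"], // do not reuse blindly; produce code for the task instead
]);

// Unannotated or unrecognized functions take the cautious "verify" path.
function routeFor(contract) {
  return ROUTE_BY_COMPLEXITY.get(contract.complexity) ?? "verify";
}

// Groups indexed functions by the action an agent would take for each one.
function planByRoute(contracts) {
  const plan = { reuse: [], verify: [], generate: [] };
  for (const c of contracts) plan[routeFor(c)].push(c.name);
  return plan;
}
```

Using a `Map` rather than a plain object keeps inherited keys such as `toString` from being mistaken for a valid complexity value.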

The habit: define a closed list for any field the retrieval system will filter or route on. Never let classification fields be free-text. Write the vocabulary once, share it across the team, enforce it with a linter.
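The linter rule for a closed vocabulary can be very small. This sketch checks the two vocabularies listed above. It assumes contracts were already extracted into plain objects with `name`, `role`, and `complexity` fields; that object shape and the `lintVocabulary` name are hypothetical.

```javascript
// The closed vocabularies from this article; growing them is a team decision.
const VOCABULARIES = {
  role: new Set([
    "transformer", "validator", "query", "mutation", "factory", "renderer",
    "adapter", "guard", "hook", "coordinator", "config", "registry",
  ]),
  complexity: new Set(["simple", "moderate", "complex"]),
};

// One message per violation: a missing field, or a value outside the list.
// Free text ("helper for orders") and multiple roles ("query, mutation")
// both fail the membership test, which enforces exactly one listed value.
function lintVocabulary(contracts) {
  const problems = [];
  for (const c of contracts) {
    for (const [field, allowed] of Object.entries(VOCABULARIES)) {
      const value = c[field];
      if (value === undefined) {
        problems.push(`${c.name}: missing @${field}`);
      } else if (!allowed.has(value)) {
        problems.push(`${c.name}: @${field} "${value}" is not in the closed list`);
      }
    }
  }
  return problems;
}
```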

3. Name Your Callers and Dependencies Explicitly

Static analysis can tell you what calls a function in the current codebase. It cannot tell you what should call it — the intended callers, the architectural dependencies, the design contract.

/**
 * @used-by     saveOrder, OrderDetail, InvoiceGenerator
 * @depends-on  PriceCalculator, TaxEngine
 */
function normalizeOrder(raw) { ... }

These two fields do two things:

First, they make impact analysis possible without parsing. Before editing normalizeOrder, the agent queries @used-by and gets the blast radius immediately — no graph traversal, no file reads.

Second, they document intent that static analysis misses. Maybe InvoiceGenerator doesn’t call normalizeOrder yet — it should. Listing it in @used-by expresses the architectural intention. When the agent generates InvoiceGenerator, it finds this contract and knows to wire the dependency correctly.

The habit: when you write a function, spend 30 seconds listing the functions that call it and the functions it needs. Update these fields when callers are added or removed. They’re cheap to maintain at the point of change and expensive to reconstruct from grep.
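Once the fields are extracted, the blast-radius query and the “intended but not yet wired” check described above each reduce to a lookup. This is a sketch under assumptions: contracts live in a `Map` keyed by function name with `usedBy` and `dependsOn` arrays parsed from @used-by and @depends-on, the observed callers come from a separate static-analysis pass, and `blastRadius` and `unwiredCallers` are hypothetical names.

```javascript
// Direct callers declared in @used-by: an index lookup, no file reads.
// (Direct callers only; transitive impact would need a walk over the index.)
function blastRadius(index, name) {
  return index.get(name)?.usedBy ?? [];
}

// Declared callers that static analysis has not observed yet: the
// architectural intentions that grep cannot recover.
function unwiredCallers(index, name, observedCallers) {
  const observed = new Set(observedCallers);
  return blastRadius(index, name).filter((caller) => !observed.has(caller));
}

// Hypothetical index entry built from the normalizeOrder annotations above.
const contractIndex = new Map([
  ["normalizeOrder", {
    usedBy: ["saveOrder", "OrderDetail", "InvoiceGenerator"],
    dependsOn: ["PriceCalculator", "TaxEngine"],
  }],
]);

blastRadius(contractIndex, "normalizeOrder");
// → ["saveOrder", "OrderDetail", "InvoiceGenerator"]
unwiredCallers(contractIndex, "normalizeOrder", ["saveOrder", "OrderDetail"]);
// → ["InvoiceGenerator"]
```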

4. Group Related Functions into Pipelines

Isolated functions are searchable. Pipelines are composable.

/**
 * @pipeline  order-creation
 * @step      2
 */
function normalizeOrder(raw) { ... }

When the agent needs to implement or modify the order creation flow, a single query — “show me the order-creation pipeline” — returns all steps in sequence. No exploration, no file navigation, no guessing which functions participate.
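The “show me the order-creation pipeline” query can be sketched as a filter and a sort, assuming each contract record carries the `pipeline` and `step` values parsed from its annotations. The record shape, the `pipelineSteps` name, and every function name except normalizeOrder are hypothetical.

```javascript
// Returns the names of one pipeline's functions, ordered by @step.
function pipelineSteps(contracts, pipeline) {
  return contracts
    .filter((c) => c.pipeline === pipeline)
    .sort((a, b) => a.step - b.step)
    .map((c) => c.name);
}

// Hypothetical records; only normalizeOrder comes from the article.
const orderContracts = [
  { name: "persistOrder", pipeline: "order-creation", step: 3 },
  { name: "normalizeOrder", pipeline: "order-creation", step: 2 },
  { name: "sendWelcomeEmail", pipeline: "user-registration", step: 4 },
  { name: "parseOrderRequest", pipeline: "order-creation", step: 1 },
];

pipelineSteps(orderContracts, "order-creation");
// → ["parseOrderRequest", "normalizeOrder", "persistOrder"]
```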

Without pipeline grouping, the agent has to discover the flow by reading call chains. It finds saveOrder, sees it calls normalizeOrder, traces that to PriceCalculator, and so on. Five file reads to reconstruct what @pipeline encodes in one metadata field.

Pipeline annotations also reveal gaps. If steps 1, 3, and 4 exist but step 2 is missing, the agent knows before writing anything. If you’re adding a new step, you know where in the sequence it belongs.
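Gap detection follows from the same records. A minimal sketch, assuming steps are numbered from 1 without intended holes; `missingSteps` and the function names in the example are hypothetical.

```javascript
// Step numbers missing between 1 and the highest annotated step.
function missingSteps(contracts, pipeline) {
  const present = new Set(
    contracts.filter((c) => c.pipeline === pipeline).map((c) => c.step),
  );
  const last = Math.max(0, ...present);
  const missing = [];
  for (let step = 1; step <= last; step++) {
    if (!present.has(step)) missing.push(step);
  }
  return missing;
}

// Hypothetical pipeline with steps 1, 3, and 4 annotated.
const reportContracts = [
  { name: "collectMetrics", pipeline: "report-generation", step: 1 },
  { name: "renderReport", pipeline: "report-generation", step: 3 },
  { name: "emailReport", pipeline: "report-generation", step: 4 },
];

missingSteps(reportContracts, "report-generation"); // → [2]
```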

The habit: for any multi-step workflow — order processing, user registration, report generation, data sync — assign a pipeline name and step number to each participating function. This is a one-time annotation that makes the entire flow queryable as a unit forever.

5. Extract Metadata as a Byproduct, Not a Task

The reason annotation coverage is usually incomplete: it’s treated as a separate task. You finish the code, call it done, and move on. The annotation becomes “tomorrow’s problem.” Tomorrow never comes.

The fix is to make extraction a byproduct of existing work, not a task you add to the backlog.

When writing a function: write the @contract block in the same edit. You’ve just written the body, so the intent is still in your head. Capture it now, in the same file and the same commit, with no extra context switch.

When editing a function: update any stale fields in the same edit. You’ve already read the body to make your change. Checking whether @does is still accurate costs five seconds.

When committing: a pre-commit hook runs dar-nlp extract and updates the contract index automatically. You never manually maintain the index — it tracks the code.
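Conceptually, the hook’s job is a merge: drop the index entries for the files in the commit, then add whatever the extractor finds in those files now. This sketch assumes an index keyed by `file#function` and receives the freshly extracted contracts as input. It does not describe dar-nlp’s actual behaviour or storage format, and `updateIndex` is a hypothetical name.

```javascript
// Rebuild only the entries for changed files; everything else is kept as-is.
// `index` maps "file#name" to a contract; `fresh` holds contracts re-extracted
// from the changed files (a deleted file simply contributes none).
function updateIndex(index, changedFiles, fresh) {
  const changed = new Set(changedFiles);
  const next = new Map();
  for (const [key, contract] of index) {
    if (!changed.has(contract.file)) next.set(key, contract);
  }
  for (const contract of fresh) {
    next.set(`${contract.file}#${contract.name}`, contract);
  }
  return next;
}

// Hypothetical commit: orders.js was edited and legacyTotal was removed.
const indexBefore = new Map([
  ["orders.js#normalizeOrder", { file: "orders.js", name: "normalizeOrder", does: "old text" }],
  ["orders.js#legacyTotal", { file: "orders.js", name: "legacyTotal", does: "removed" }],
  ["tax.js#taxFor", { file: "tax.js", name: "taxFor", does: "unchanged" }],
]);
const indexAfter = updateIndex(indexBefore, ["orders.js"], [
  { file: "orders.js", name: "normalizeOrder", does: "new text" },
]);
```

Returning a new `Map` instead of mutating in place means a failed extraction can leave the previous index untouched.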

The extraction overhead, amortized this way, is nearly zero. The overhead of not extracting — re-reading, re-inferring, regenerating — compounds across every session.

The habit: configure your commit hook to auto-extract. Write the annotation in the same edit as the code. Review @does and @returns when you touch an existing function. Make annotation a byproduct of the work you’re already doing.

6. Pre-Index Before the Session, Not During

The most expensive moment to build knowledge is when the agent needs it. A session that starts with “let me explore the codebase” is a session that spends its first 20 tool calls building context that could have been built offline.

Pre-indexing means running the expensive analysis work — parsing, graph building, annotation extraction — before the session starts, and storing the results in a queryable format the agent can hit in one call.

What to pre-build:

Symbol index: function names, file paths, and line numbers; answers “where is X?” in one lookup
Contract index: TF-IDF or embedding search over annotations; answers “what does X do?” without reading files
Call graph: callers and callees for every indexed function; answers “what calls X?” without tracing
Stale detection: which files have changed since the last extract; answers “what needs updating?” before the session touches anything
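The contract index is the least obvious of the four, so here is a minimal TF-IDF sketch over @does text. It assumes contracts are already extracted into `{ name, does }` records; the tokenizer is deliberately naive (lowercase words, no stemming), the smoothed IDF formula is one common choice among several, the descriptions other than normalizeOrder’s are hypothetical, and `buildContractIndex` and `searchContracts` are hypothetical names.

```javascript
// Naive tokenizer: lowercase alphanumeric words, no stemming or stop words.
const tokenize = (text) => text.toLowerCase().match(/[a-z0-9]+/g) ?? [];

// Term frequency (normalized by length) times smoothed inverse document frequency.
function weigh(tokens, idf) {
  const v = new Map();
  for (const t of tokens) v.set(t, (v.get(t) ?? 0) + 1);
  for (const [t, count] of v) v.set(t, (count / tokens.length) * idf(t));
  return v;
}

// Built once, ahead of the session.
function buildContractIndex(contracts) {
  const docs = contracts.map((c) => tokenize(c.does));
  const docFreq = new Map();
  for (const tokens of docs) {
    for (const t of new Set(tokens)) docFreq.set(t, (docFreq.get(t) ?? 0) + 1);
  }
  const idf = (t) => Math.log((1 + docs.length) / (1 + (docFreq.get(t) ?? 0))) + 1;
  return { contracts, idf, vectors: docs.map((tokens) => weigh(tokens, idf)) };
}

function cosine(a, b) {
  let dot = 0;
  for (const [t, w] of a) dot += w * (b.get(t) ?? 0);
  const norm = (v) => Math.sqrt([...v.values()].reduce((s, w) => s + w * w, 0));
  const denom = norm(a) * norm(b);
  return denom === 0 ? 0 : dot / denom;
}

// Top-k contracts whose @does text shares weighted terms with the query.
function searchContracts(index, query, k = 3) {
  const q = weigh(tokenize(query), index.idf);
  return index.vectors
    .map((v, i) => ({ name: index.contracts[i].name, score: cosine(q, v) }))
    .filter((hit) => hit.score > 0)
    .sort((a, b) => b.score - a.score)
    .slice(0, k);
}

const demoIndex = buildContractIndex([
  { name: "normalizeOrder", does: "Converts raw order API payload into a normalized Order with subtotal, tax, total." },
  { name: "validateAddress", does: "Checks that a shipping address has a street, city, and postal code." },
  { name: "renderInvoice", does: "Renders an invoice PDF for a saved order." },
]);
```

A query such as `searchContracts(demoIndex, "normalize raw order payload")` ranks normalizeOrder first without opening a single source file.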

The setup cost is one-time per project. The maintenance cost is a git hook that re-indexes changed files after each commit — typically 1–5 files, taking under a second.
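Stale detection can be as simple as comparing a content hash recorded at the last extract with each file’s current content. This sketch stays dependency-free with a hand-rolled 32-bit FNV-1a hash over UTF-16 code units (a real setup might prefer a cryptographic hash from Node’s `crypto` module), takes file contents as plain strings, and uses the hypothetical name `staleFiles`.

```javascript
// 32-bit FNV-1a over UTF-16 code units: cheap change detection, not security.
function fnv1a(text) {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  return (hash >>> 0).toString(16);
}

// Files that changed or are new since the last extract, plus files that
// were deleted (their index entries need removing).
// `lastHashes`: Map path -> hash; `currentFiles`: Map path -> content.
function staleFiles(lastHashes, currentFiles) {
  const stale = [];
  for (const [path, content] of currentFiles) {
    if (lastHashes.get(path) !== fnv1a(content)) stale.push(path);
  }
  for (const path of lastHashes.keys()) {
    if (!currentFiles.has(path)) stale.push(path);
  }
  return stale;
}

// Hypothetical snapshot: orders.js edited, invoice.js added, legacy.js deleted.
const recorded = new Map([
  ["src/orders.js", fnv1a("function normalizeOrder(raw) {}")],
  ["src/tax.js", fnv1a("function taxFor(order) {}")],
  ["src/legacy.js", fnv1a("function legacyTotal() {}")],
]);
const current = new Map([
  ["src/orders.js", "function normalizeOrder(raw) { return raw; }"],
  ["src/tax.js", "function taxFor(order) {}"],
  ["src/invoice.js", "function renderInvoice(order) {}"],
]);
```

Only the paths `staleFiles(recorded, current)` returns need re-extraction, which is what keeps the post-commit update small.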

A session that starts against a pre-built index answers orientation questions in 1–3 tool calls. A session that starts cold answers the same questions in 15–30. Multiply that difference across every session and the index pays back its setup cost in the first week.

The habit: run extract once on project setup, install the git hook, and never think about it again. The index stays current automatically. Every session that follows starts with a complete picture instead of an empty one.

Practical Takeaway

These six habits form a stack. Each one makes the next one more useful:

Intent at definition → annotations exist to search
Closed vocabularies → annotations can be filtered and routed
Named callers/dependencies → impact analysis without file reads
Pipeline grouping → workflow composition without exploration
Byproduct extraction → coverage stays high without extra effort
Pre-indexing → every session starts with the full picture available

None of them require a specific tool. They require a decision: to treat your codebase as a dataset that the AI queries, not just a directory that the AI reads. That decision, made consistently, is what separates a codebase that speeds up AI-assisted work from one that makes it expensive every session.
