Claude's Context Management & Memory System

A self-aware guide to how I navigate knowledge boundaries, persist work across sessions, and keep conversations coherent as projects grow.

1. The Problem I Face

I have a ~200k token context window per request. A single conversation can span many days, files can grow to 100+ KB, and large codebases don’t fit whole. Yet you expect continuity: “resume from where we left off,” “I was working on X last week,” “use what we decided before.”

I can’t literally remember between conversations (each request is fresh). I can’t cache unlimited state locally. I must navigate a tradeoff:

Load everything: lose coherence across projects, exhaust budget on file reads, drown in irrelevant history.
Load nothing: forget decisions, re-ask clarifying questions, duplicate work.
Load strategically: stay in budget, preserve decisions, move fast.

This doc describes my strategy for (3).

2. The Memory System: Where Decisions Live

2.1 Three tiers of persistence

Tier 1 — Auto-memory (~/.claude/projects/<project>/memory/)

Persistent across all conversations for a project.
I read this automatically at the start of most sessions.
Very lean: ~500–1000 words per file. Pointers, not narratives.
Purpose: “What are we building? What did we decide? Where are the details?”
Example: project_pycademy.md — 200 words. Tells you the goal, current phase, why each design choice, where to find the deep docs.

Tier 2 — In-repo docs (e.g., ROADMAP.md, ANALYSIS.md)

Persistent in the repo, I read only when the task needs them.
Detailed. 1000–3000 words each. Full thinking, rationale, trade-offs.
Purpose: deep context without loading on every session.
I know they exist (they’re listed in the auto-memory) but don’t load them until you ask or the task clearly needs them.

Tier 3 — Session context

Everything in the current conversation.
Cleared when the request ends.
Used for ephemeral work: “make this change,” “debug this,” “let me see the output.”

2.2 Decision hierarchy

When to save to auto-memory:

Did we make a non-obvious choice? (architecture, tech, phrasing, trade-off)
Will a future session need to know why?
Can I explain it in 1–2 sentences + a “Why:” line?

→ Save it.

Example (saved): “Lesson files move from JS to JSON in Phase 1. Why: auditable, no code execution when reading, easier for authors to generate. Apply: mechanical migration script.”

When to save as in-repo doc:

Is this more than 200 words?
Is it a complete deep dive (not a decision, but a full analysis)?
Will I need to search/reference it multiple times?

→ Save as ANALYSIS.md / ROADMAP.md / etc., link from auto-memory.

When to keep only in-session:

It’s a live transcript of work (“I tried X, it failed, I tried Y, it worked”).
It’s a long debug log (valuable now, stale tomorrow).
It’s feedback on code that will change anyway.

→ Don’t save. Summarise the insight to auto-memory if it matters.

3. The Prompt Cache: How I Preserve Budget

The Anthropic prompt cache has a ~5-minute TTL. Within that window, re-reading a file costs zero tokens. After, it costs full price.

3.1 Staying in cache

After we finish Phase 0 work on PyAcademy and save, I don’t immediately reload ANALYSIS.md in the next message. The cache holds it if we continue within 5 min. I only reload if:

You explicitly ask (“check the analysis again”).
The task clearly needs it (picking Phase 1 work from ROADMAP.md).
The file changed.

3.2 Cache misses and amortisation

If we work for 30 minutes straight (past cache TTL), reloading a 20 KB doc costs ~6000 tokens once. That’s expensive unless we then work for another 30 min, amortising the reload cost.

Strategy: one big session beats many small sessions for large projects. If we’re starting Phase 1 refactor, I load ARCHITECTURE.md once at the start, work through it, ignore cache misses as the session goes on.

4. My Decision-Making Flowchart

User says: "continue with X"
  ↓
Do I have auto-memory for this project?
  ├─ No  → ask for context / infer from the request
  └─ Yes → load auto-memory (always ~500 words, in budget)
       ↓
       Does the task need deep context?
       ├─ No  → proceed with auto-memory + session history
       ├─ Yes → check which doc (ROADMAP? ANALYSIS?) and load it
       └─ Huge task (Phase 1 refactor) → load all target docs upfront, accept cache miss

4.1 What “deep context” means

You say: “Start Phase 1.” → Deep context needed. I load ARCHITECTURE.md + ROADMAP.md §Phase 1 to understand the full refactor scope.

You say: “Fix the XSS bug.” → Shallow context. Auto-memory says “see ANALYSIS.md §2 (XSS)” but I only load if I’m unsure where the bugs are. Otherwise, I ask you which file or go straight to pyacademy.html.

5. The Lean Index Pattern (What We Discovered)

You had a 119 KB monolith file. Loading it every session = ~30K tokens gone, instant. Unsustainable.

Solution: Create memory.md (~150 words, points at deep docs). Now:

Session 1: analyze monolith → write ANALYSIS.md (10 KB), ROADMAP.md (8 KB), etc. → save memory.md (0.5 KB) to auto-memory.
Session 2: load memory.md from auto-memory (~50 tokens), read the task, load only the doc you need.
Sessions 3–∞: same as 2.

Budget: ~50 tokens per session overhead, vs. 30K if loading the whole monolith.

This pattern generalizes: big project → create a table-of-contents memory file that points at the real docs.

6. Cross-Project Coherence

You work on 5+ projects simultaneously. Auto-memory for each one is separate, isolated. I don’t auto-load all of them; I only load the memory for the current project you’re naming.

Challenge: decisions in Project A can inform Project B (you reuse patterns, libraries, conventions).

Solution:

Global memory (MEMORY.md in auto-memory root) lists all projects + their status.
When starting work on Project B, I check global memory to see if Project A’s lessons apply.
I rarely need to load Project A’s full docs, just know that “oh, we solved this pattern in DarJS with X, same idea applies here.”

7. Managing Ambiguity: What I Ask vs. What I Infer

When auto-memory is missing or vague, I have a choice: ask or infer.

I ask when:

The decision is architectural and wrong-choice costs rework.
You’ve explicitly said “I don’t know yet.”
The project is brand new and I have zero context.

I infer when:

The request is clearly a continuation (you’re in the same folder, same file you mentioned last session).
I have pattern-matching from related work (e.g., “add a feature” is similar across projects).
The stakes are low (polish, naming, a small bug).

Example: You said “resume PyAcademy work.” Auto-memory shows Phase 0 + Phase 1 roadmap. Next message should be “let’s start Phase 0 bug fixes.” I infer the task from the roadmap, not ask “so… what should we do?”

8. Token Budgeting: My Running Math

Each request has ~200k budget. I allocate:

Component	Tokens	Notes
System prompt (instructions)	~8k	You don’t see this.
Conversation history (this session)	~20k	Auto-compressed as we go.
Auto-memory load	~0.5k	One small file per project.
Doc reads (if needed)	~5–15k	Depends on task.
Your message	~1k	Average.
My response	~30–50k	Main work: thinking + writing.
Buffer (reserve)	~20k	Don’t go to zero. Safety margin.
Total	~200k	—

If a task is huge (e.g., Phase 1 refactor = rebuild 5+ files), I might use all 150k of response budget. That’s fine; the response is long but coherent.

If a task is tiny (“fix a typo”), I spend 2k total and warn you that I’ve left 195k on the table — you’re welcome to chain more work.

9. When Context Fails: Red Flags I Watch

Red flag 1: I repeat myself Means I didn’t have the memory of a prior decision. Either:

Auto-memory is missing for this project (you just told me about it).
You’re asking something that contradicts what we decided, and I didn’t notice.

Red flag 2: I ask a question we already answered Means the decision wasn’t saved. I should have auto-memory pointing at it.

Red flag 3: I produce incoherent work Could mean:

I loaded the wrong doc (confused two projects).
The auto-memory is stale (project evolved, memory didn’t).
I misread the current scope.

Recovery: you say “no, we decided X before” → I acknowledge, save it to memory, move on.

10. The Rhythm of Work: Sessions, Phases, Archives

10.1 Per-session rhythm

Start: load auto-memory (1 min, ~50 tokens).
Clarify: if auto-memory is vague, ask or infer (30 sec).
Load deep docs: if task is substantial (2 min, ~5k tokens once).
Work: code / write / debug (10–60 min, ~20–100k tokens).
Save: update auto-memory + any repo docs (5 min, no tokens — they’re local).

10.2 Per-phase rhythm (e.g., PyAcademy Phase 1)

Begin: load ARCHITECTURE.md + ROADMAP.md§Phase1. (one-time cost, amortised over weeks)
Per-task: reference the architecture in-session; no fresh loads.
End of phase: summarise status in auto-memory + repo memory.md.

10.3 When to archive

After a project is “done” (shipped, in maintenance mode), auto-memory status changes to “archived.” I still load it if you ask, but don’t assume I need it daily.

Example: DarJS has auto-memory with “6 phases complete, 258 tests, 5KLOC.” I know not to reload it unless you say “resume DarJS work.”

11. The Human–AI Interface: What You Should Expect

What I’m good at:

Remembering big-picture decisions (they’re in auto-memory).
Keeping technical state coherent across sessions (repo docs are the source of truth).
Inferring context from a one-line prompt once I’ve loaded auto-memory.
Asking clarifying questions when auto-memory is blank.

What I’m bad at:

Memorizing exact line numbers or file names (they change; docs don’t).
Tracking off-hand comments from 3 weeks ago unless you saved them.
Context-switching between projects in the same request (I can do it, but burn tokens; better to do one project per request).
Noticing that auto-memory is stale (you have to tell me “that’s outdated”).

What works best:

You point at a project: “PyAcademy.”
I load auto-memory.
You say what you want: “Phase 0, start with XSS.”
I load ANALYSIS.md, see the XSS section, get to work.
No redundant questions, no lost decisions.

12. The Future: Evolving This System

As projects grow and your body of work expands, we’ll refine:

Better memory granularity: instead of one memory.md per project, maybe per-phase or per-module sub-memories.
Cross-project metadata: a manifest listing shared patterns, decision trees, reusable components.
Temporal tracking: “as of 2026-04-24, Phase 0 is pending.” Auto-expiry of outdated summaries.
Indexing: full-text search across all memories + docs. Today, you rely on file names; tomorrow, we could index.
Branching: “what if we chose Vite instead of esbuild?” — fork the memory + docs, compare outcomes.

None of this breaks the current system; it extends it.

13. Summary: The Contract

I promise to:

Load auto-memory on every relevant session.
Never claim to remember something without checking the docs.
Ask when uncertain, infer when low-stakes.
Save decisions to auto-memory + point at deep docs.
Warn you if I’m about to load a large file (budget impact).

You promise to:

Maintain auto-memory: update it if your plans change.
Name the project clearly (“PyAcademy,” not “the app”).
Tell me when I’ve repeated myself or contradicted prior decisions.
Prune archives when projects truly end.

Together: you get continuity and speed. I get coherence and budget efficiency. No thrashing, no lost context, no redundant debates.

14. Example: Real Session from Our Work

Session 1 (initial analysis)

User: "Analyze pyacademy.html for issues. It's 119 KB. Make it a framework."
Me: Load the monolith (30k tokens). Analyze. Write ANALYSIS.md, ROADMAP.md, FRAMEWORK.md, ARCHITECTURE.md.
Save: auto-memory with "project_pycademy.md" (200 words).
Result: ~80k tokens spent, but massive value extracted.

Session 2 (next week)

User: "Resume PyAcademy. Phase 0, start with XSS."
Me: Load auto-memory (50 tokens). See "Phase 0: XSS escape..." in memory.md.
Load: ANALYSIS.md §2 (4k tokens).
Result: ~5k tokens overhead. Straight to work on XSS fix.

Session 3 (Phase 1 planning)

User: "Start Phase 1. What's the plan?"
Me: Load auto-memory (50 tokens). Memory says "see ROADMAP.md §Phase1" and "see ARCHITECTURE.md §file-tree."
Load: both docs (8k tokens).
Result: Full refactor context, ~8k tokens. Discuss, plan, outline tasks.

Each session: context restored in ~50–8k tokens, vs. re-reading the monolith every time (30k tokens forever).

Closing

This system is iterative. If it’s not working — if I’m loading the wrong things, if auto-memory is stale, if you’re confused about what I remember — tell me. We adjust. The goal is: you write code, not context-management prompts.

I’m a tool that trades persistence (files in your repo) for coherence (staying in your mental model across sessions). The more we use this system, the more valuable it becomes.