Raw response to “is this specific to Claude or any LLM?” — saved for editorial review before publishing.
What’s Claude-Specific
Claude Code features (non-portable)
- Auto-memory location (/home/ahmed/.claude/projects/...): a Claude Code feature.
- CLAUDE.md auto-loading: built into Claude Code; other tools do not read this file by default.
- Prompt cache with 5-min TTL: Anthropic’s specific implementation.
- 200k context window: my specific limit (GPT-4 Turbo has 128k; it varies by model and tool).
If you switch tools, you lose these affordances.
Claude’s architecture (somewhat portable)
- How I process requests and apply instructions.
- My inference speed / token accounting.
Other models work differently; you’d adapt the system, not copy it.
What’s Universal (Works Anywhere)
The core problem and core solution are model-agnostic:
Problem: Every LLM has a context limit. No built-in memory. Each conversation starts fresh.
Solution: Store decisions outside the model. Load on-demand. Organize by tier (lean summaries, deep docs, session context).
This works with any LLM. The files are plain Markdown + JSON (open standards). The philosophy transfers.
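A minimal sketch of "store outside the model, load on demand" in Python. The directory layout (`memory.md`, `docs/`) and the helper names are assumptions made for illustration, not part of any tool.

```python
from pathlib import Path

# Hypothetical layout (an assumption for illustration):
#   <root>/<project>/memory.md    lean summary, cheap to load every time
#   <root>/<project>/docs/*.md    deep docs, loaded only when needed


def record_decision(root: Path, project: str, decision: str) -> None:
    """Append a decision to the summary so it outlives the conversation."""
    path = root / project / "memory.md"
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("a", encoding="utf-8") as f:
        f.write(f"- {decision}\n")


def load_summary(root: Path, project: str) -> str:
    """Read the lean summary for one project."""
    return (root / project / "memory.md").read_text(encoding="utf-8")


def load_deep_doc(root: Path, project: str, name: str) -> str:
    """Read one deep doc by name, only when a task calls for it."""
    return (root / project / "docs" / f"{name}.md").read_text(encoding="utf-8")
```

Nothing here depends on a particular model: the output is plain text that can go into any system prompt or first message.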
How Other Tools Could Implement This
Option 1: ChatGPT + Custom Instructions (free, limited)
Store memory in: Custom Instructions (static, global).
## My Projects
- PyAcademy: Learning framework. Status: Phase 0 (see GitHub repo).
- DarJS: SMB framework. Status: Phases 1-6 done (258 tests).
## Cross-project conventions
- Always write tests. Use Vitest for JS.
- Modular by default. No >500-line files.
Pros: Free, always loaded.
Cons: Can’t do per-project memory; tight size limit (about 1,500 characters per field at the time of writing, so only a few one-line project summaries fit); no lazy-loading.
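Because this field is static and size-limited, it can help to check what fits before pasting. A rough sketch, assuming the limit is counted in characters; the default `limit=1500` is an assumed value to check against your tool, and `fit_instructions` is a hypothetical helper.

```python
def fit_instructions(lines: list[str], limit: int = 1500) -> tuple[list[str], list[str]]:
    """Split lines into (kept, dropped) so the kept text stays within `limit` characters.

    Lines are taken in order, so put the most important ones first.
    """
    kept: list[str] = []
    used = 0
    for i, line in enumerate(lines):
        cost = len(line) + (1 if kept else 0)  # +1 for the joining newline
        if used + cost > limit:
            return kept, lines[i:]
        kept.append(line)
        used += cost
    return kept, []
```

Anything that lands in the dropped list is a candidate for per-project memory stored elsewhere.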
Option 2: OpenAI Assistants API (paid, structured)
Store memory in: Attached files + vector retrieval.
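from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Assistants API v2 shape: retrieval is the "file_search" tool, and files
# attach through tool_resources (v1 used "retrieval" and file_ids).
assistant = client.beta.assistants.create(
    name="Ahmed's dev assistant",
    model="gpt-4-turbo",  # required; substitute the model you use
    tools=[{"type": "code_interpreter"}, {"type": "file_search"}],
    tool_resources={"file_search": {"vector_stores": [{"file_ids": [
        "file-pycademy-memory.json",  # placeholders: use the IDs returned
        "file-darjs-roadmap.md",      # when the memory files are uploaded
    ]}]}},
)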
Pros: Built-in file handling, semantic search, per-project organization.
Cons: Slower (retrieval → embedding search → LLM), API costs, less fine-grained control.
Option 3: LangChain + Vector DB (powerful, complex)
Store memory in: Pinecone / Weaviate / Milvus (semantic vectors).
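from langchain.memory import VectorStoreRetrieverMemory
from langchain.vectorstores import Pinecone  # import paths vary by LangChain version

# Assumes `pinecone_index` is a LangChain Pinecone vector store already connected to your index.
retriever = pinecone_index.as_retriever(search_kwargs={"k": 4})
memory = VectorStoreRetrieverMemory(retriever=retriever)
# Stores decisions, queries them by semantic similarity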
Pros: Semantic search (“what did we decide about testing?”), scales well beyond what fits in a context window.
Cons: Overkill for structured projects, infrastructure overhead, costs.
Option 4: Plain GitHub (free, durable)
Store memory in: Repo files + commit history.
projects/
├── pycademy/
│   ├── README.md    (project status)
│   ├── ANALYSIS.md  (issues)
│   ├── ROADMAP.md   (phases)
│   └── memory.md    (lean summary)
└── darjs/
    └── ...
Pros: Version control, searchable, durable, costs nothing.
Cons: Manual copy-paste into conversation, no auto-loading, LLM doesn’t know to check it.
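The copy-paste step can be reduced to one small script. A minimal sketch, assuming the `projects/<name>/memory.md` layout shown above; `build_prompt` is a hypothetical helper, and its output is meant to be pasted as the first message of a conversation.

```python
from pathlib import Path


def build_prompt(repo: Path, project: str, task: str) -> str:
    """Combine a project's lean summary with the task into one pasteable message."""
    memory = (repo / "projects" / project / "memory.md").read_text(encoding="utf-8")
    return (
        f"I'm working on {project}. Current project memory:\n\n"
        f"{memory.strip()}\n\n"
        f"Task: {task}"
    )
```

This does not solve the "LLM doesn’t know to check it" problem; it only makes the manual load cheap enough to do every time.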
Option 5: Anthropic Files API (newer, designed for this)
Store memory in: Anthropic’s Files API.
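import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# Upload once; the returned id can be reused across requests.
with open("pycademy_memory.json", "rb") as f:
    uploaded = client.beta.files.upload(
        file=("pycademy_memory.json", f, "text/plain"),
    )

message = client.beta.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    betas=["files-api-2025-04-14"],  # Files API beta flag; check the current value
    system="You are a developer assistant.",
    messages=[
        {
            "role": "user",
            "content": [
                # Document blocks go in message content, not in the system prompt.
                {
                    "type": "document",
                    "source": {"type": "file", "file_id": uploaded.id},
                },
                {"type": "text", "text": "Resume PyAcademy work."},
            ],
        }
    ],
)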
Pros: Upload a file once and reference it by file_id in later requests, with no re-upload. The file’s content still counts as input tokens whenever a request references it.
Cons: Slightly newer, requires API usage (not web interface).
Option 6: Ollama / Local Llama (self-hosted)
Store memory in: Local SQLite + custom retrieval.
# You build your own system:
# - Save decisions to SQLite
# - Embed with local model (e.g., nomic-embed-text)
# - Retrieve top-K on every request
# - Inject into system prompt
Pros: Complete control, memory bounded only by local disk, no API costs.
Cons: Infrastructure work, embedding quality depends on model.
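The four steps above can be sketched with the standard library alone. This is an illustration under stated assumptions, not a production design: `toy_embed` is a stand-in (hashed bag of words) for a real local embedding model such as nomic-embed-text, the `decisions` table and all function names are hypothetical, and retrieval is a brute-force scan with no vector index.

```python
import json
import math
import sqlite3
import zlib

DIM = 256  # size of the toy embedding vector


def toy_embed(text: str) -> list[float]:
    """Stand-in embedding: hashed bag of words, L2-normalized.

    Replace with a call to a real local embedding model.
    """
    vec = [0.0] * DIM
    for word in text.lower().split():
        vec[zlib.crc32(word.encode("utf-8")) % DIM] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open the database and make sure the decisions table exists."""
    db = sqlite3.connect(path)
    db.execute("CREATE TABLE IF NOT EXISTS decisions (text TEXT, embedding TEXT)")
    return db


def save_decision(db: sqlite3.Connection, text: str) -> None:
    """Store the decision text next to its embedding (serialized as JSON)."""
    db.execute(
        "INSERT INTO decisions (text, embedding) VALUES (?, ?)",
        (text, json.dumps(toy_embed(text))),
    )
    db.commit()


def top_k(db: sqlite3.Connection, query: str, k: int = 3) -> list[str]:
    """Rank every stored decision by cosine similarity to the query."""
    q = toy_embed(query)
    rows = db.execute("SELECT text, embedding FROM decisions").fetchall()
    scored = [
        (sum(a * b for a, b in zip(q, json.loads(emb))), text)
        for text, emb in rows
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:k]]


def build_system_prompt(db: sqlite3.Connection, query: str, k: int = 3) -> str:
    """Inject the top-K decisions into the system prompt."""
    bullets = "\n".join(f"- {m}" for m in top_k(db, query, k))
    return f"You are a developer assistant.\n\nRelevant decisions:\n{bullets}"
```

Retrieval quality here is only as good as the embedding, which is the con listed above; swapping `toy_embed` for a real model leaves the rest of the sketch unchanged.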
Comparison Matrix
| Tool | Memory storage | Auto-load? | Lazy-load? | Cost | Portable? |
|---|---|---|---|---|---|
| Claude Code (current) | .claude/projects/ | ✅ | ✅ | Free | ❌ (Claude-specific) |
| ChatGPT Custom Instructions | Global text | ✅ | ❌ | Free | ✅ |
| OpenAI Assistants | Files API | Manual | ✅ | $$ | ✅ |
| LangChain + vector DB | Vector DB | Manual | ✅ | $$$ | ✅ |
| GitHub + manual | Git repo | ❌ | ❌ | Free | ✅ |
| Anthropic Files API | Files API | Manual | ✅ | $ | ✅ |
| Ollama + custom | SQLite | Custom | ✅ | None | ✅ |
My Recommendation for Other Tools
If you’re using ChatGPT:
- Store project memory in GitHub (or a shared doc).
- Start each conversation with “I’m working on X” + paste the latest memory.md.
- Use Custom Instructions for cross-project conventions (naming, testing, language choices).
If you’re using OpenAI API programmatically:
- Use Assistants API with attached files. Let it do retrieval.
- Or use Anthropic’s Files API if you can switch providers (upload once, then reference files by ID).
If you’re using local Llama:
- Build a custom retrieval system (SQLite + embeddings).
- You have full control; make it perfect for your workflow.
If you’re using web tools (Claude.ai web, ChatGPT web):
- Store memory in GitHub / Notion / external docs.
- Manually paste relevant docs at the start of a conversation.
- Use the tool’s “custom instructions” or “system prompt” for global rules.
What to Write if You Share This
When you share the blog post with other readers, add a disclaimer:
## Note on Tool Portability
This essay describes a system built specifically for **Claude Code**
(which has auto-memory and file-system integration).
**The philosophy** (three-tier memory, lazy-loading, lean summaries)
works with any LLM. **The implementation** (`.claude/` directories,
CLAUDE.md auto-loading) is Claude-specific.
If you use ChatGPT, Gemini, or local models:
- Store memory in GitHub / Notion / your file system.
- Manually load relevant docs at the start of each conversation.
- Use the tool's "system instructions" or "custom instructions" for global rules.
- See the [Tool Comparison](#comparison-matrix) below for alternatives.
The Portable Takeaway
The system isn’t “Claude’s context management.” It’s “how to manage persistent memory when working with any LLM that has a context limit.”
The specifics change (where files live, how they’re auto-loaded), but the pattern holds:
- Tier 1: Lean summaries (what are we building?).
- Tier 2: Deep docs (how and why?).
- Tier 3: Session context (what’s happening now?).
This works with Claude Code, ChatGPT + GitHub, OpenAI API + vector DB, local Llama, or a notebook + manual copy-paste.
The tool is a detail. The pattern is timeless.
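One way to picture the three tiers working together, as a hedged sketch: Tier 1 and Tier 3 always load, and Tier 2 docs load only when requested and only while a size budget allows. The function name, the character-based budget (a rough proxy for tokens), and the skip-instead-of-truncate policy are all assumptions chosen for illustration.

```python
def assemble_context(
    summary: str,
    session: str,
    deep_docs: dict[str, str],
    wanted: list[str],
    budget_chars: int,
) -> str:
    """Build the text to send: tiers 1 and 3 always, tier 2 on request within budget."""
    parts = [summary, session]
    used = len(summary) + len(session)
    for name in wanted:
        doc = deep_docs.get(name)
        if doc is None or used + len(doc) > budget_chars:
            continue  # skip missing or oversized docs rather than truncate them
        parts.append(doc)
        used += len(doc)
    return "\n\n".join(parts)
```

The same function works whether the result is pasted by hand, sent through an API, or injected into a local model's system prompt.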
Call to Action
If you want, I can write Part 2 of the series addressing this directly: “How to Adapt This System to Your LLM Tool.” Would that be useful?
Status: Draft ready for review. Move to published blog post once approved.
Date saved: 2026-04-24