Draft: Tool Portability & Alternatives Response

Raw response to “is this specific to Claude or any LLM?” — saved for editorial review before publishing.

What’s Claude-Specific

Claude Code features (non-portable)

Auto-memory location (/home/ahmed/.claude/projects/...) — Claude Code’s feature.
CLAUDE.md auto-loading — built into Claude Code; other tools don’t read this file, though some have their own instruction-file conventions.
Prompt cache with 5-min TTL — Anthropic’s specific implementation.
200k context window — my specific limit (GPT-4 Turbo, for comparison, has 128k; limits vary by model and tool).

If you switch tools, you lose these affordances.

Claude’s architecture (somewhat portable)

How I process requests and apply instructions.
My inference speed / token accounting.

Other models work differently; you’d adapt the system, not copy it.

What’s Universal (Works Anywhere)

The core problem and core solution are model-agnostic:

Problem: Every LLM has a context limit. No built-in memory. Each conversation starts fresh.

Solution: Store decisions outside the model. Load on-demand. Organize by tier (lean summaries, deep docs, session context).

This works with any LLM. The files are plain Markdown + JSON (open standards). The philosophy transfers.
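
The tiered, load-on-demand idea can be sketched in a few lines of Python. This is a minimal illustration, not the actual system: the `TIER_FILES` mapping, the `session.md` file name, and the `load_context` helper are all hypothetical names chosen for the example.

```python
from pathlib import Path

# Hypothetical tier layout; the file names are illustrative, not a fixed convention.
TIER_FILES = {
    1: ["memory.md"],                   # lean summary, cheap enough to load every time
    2: ["ROADMAP.md", "ANALYSIS.md"],   # deep docs, loaded only when needed
    3: ["session.md"],                  # notes about the current session
}


def load_context(project_dir, tiers=(1,)):
    """Concatenate the files of the requested tiers, skipping any that are missing."""
    root = Path(project_dir)
    parts = []
    for tier in sorted(tiers):
        for name in TIER_FILES[tier]:
            path = root / name
            if path.is_file():
                body = path.read_text(encoding="utf-8").strip()
                parts.append(f"## {name} (tier {tier})\n{body}")
    return "\n\n".join(parts)
```

Calling `load_context(project_dir)` returns only the lean summary; passing `tiers=(1, 2)` pulls in the deep docs when a task actually needs them. The result is plain text, so it can be pasted or injected into any tool.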

How Other Tools Could Implement This

Option 1: ChatGPT + Custom Instructions (free, limited)

Store memory in: Custom Instructions (static, global).

## My Projects
- PyAcademy: Learning framework. Status: Phase 0 (see GitHub repo).
- DarJS: SMB framework. Status: Phases 1-6 done (258 tests).

## Cross-project conventions
- Always write tests. Use Vitest for JS.
- Modular by default. No >500-line files.

Pros: Free, always loaded.
Cons: No per-project memory; each Custom Instructions field is capped at roughly 1,500 characters (room for a few project one-liners, not full project notes); no lazy-loading.
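
Because the instruction field has a hard size cap, it can help to check what fits before pasting. The helper below is a rough sketch: `pack_lines` is a hypothetical name, and `limit` should be set to whatever cap your tool actually enforces.

```python
def pack_lines(lines, limit):
    """Keep lines in order until adding the next one would exceed `limit` characters."""
    kept = []
    used = 0
    for line in lines:
        cost = len(line) + (1 if kept else 0)  # +1 for the newline that joins lines
        if used + cost > limit:
            break
        kept.append(line)
        used += cost
    return "\n".join(kept)
```

For example, `pack_lines(project_summaries, limit=1500)` keeps the highest-priority summaries (listed first) and drops the rest; the 1500 here is an assumed value, not a documented constant.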

Option 2: OpenAI Assistants API (paid, structured)

Store memory in: Attached files + vector retrieval.

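from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Upload the memory files first; file_ids must be the IDs the API returns, not filenames.
uploaded = [
    client.files.create(file=open(path, "rb"), purpose="assistants")
    for path in ("pycademy-memory.json", "darjs-roadmap.md")
]

# Assistants v1 shape. In v2 the tool is named "file_search" and files attach via tool_resources.
assistant = client.beta.assistants.create(
    name="Ahmed's dev assistant",
    model="gpt-4-turbo",  # assumed; the API requires a model
    tools=[{"type": "code_interpreter"}, {"type": "retrieval"}],
    file_ids=[f.id for f in uploaded],  # Attach memory files
)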

Pros: Built-in file handling, semantic search, per-project organization.
Cons: Slower (retrieval → embedding search → LLM), API costs, less fine-grained control.

Option 3: LangChain + Vector DB (powerful, complex)

Store memory in: Pinecone / Weaviate / Milvus (semantic vectors).

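from langchain.memory import VectorStoreRetrieverMemory
from langchain.vectorstores import Pinecone

# Legacy LangChain memory API. Assumes `embeddings` is configured and that a
# Pinecone index named "project-memory" (hypothetical name) already exists.
vectorstore = Pinecone.from_existing_index("project-memory", embeddings)
memory = VectorStoreRetrieverMemory(retriever=vectorstore.as_retriever())
# Stores decisions, queries them by semantic similarity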

Pros: Semantic search (“what did we decide about testing?”), scales well beyond what fits in a context window.
Cons: Overkill for structured projects, infrastructure overhead, costs.

Option 4: Plain GitHub (free, durable)

Store memory in: Repo files + commit history.

projects/
├── pycademy/
│   ├── README.md (project status)
│   ├── ANALYSIS.md (issues)
│   ├── ROADMAP.md (phases)
│   └── memory.md (lean summary)
├── darjs/
│   ├── ...

Pros: Version control, searchable, durable, costs nothing.
Cons: Manual copy-paste into conversation, no auto-loading, LLM doesn’t know to check it.
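
The manual copy-paste step can be reduced to one command. The sketch below assumes the `projects/<name>/memory.md` layout shown above and a local `git` install; `recent_commits` and `build_opening_message` are hypothetical helper names.

```python
import subprocess
from pathlib import Path


def recent_commits(repo_root, limit=5):
    """Return the last `limit` one-line commit summaries, or [] if git is unavailable."""
    try:
        result = subprocess.run(
            ["git", "-C", str(repo_root), "log", f"--max-count={limit}", "--oneline"],
            capture_output=True,
            text=True,
            check=False,
        )
    except FileNotFoundError:  # git is not installed
        return []
    return result.stdout.splitlines() if result.returncode == 0 else []


def build_opening_message(repo_root, project, task):
    """Assemble a paste-ready first message from memory.md and recent commits."""
    memory_path = Path(repo_root) / "projects" / project / "memory.md"
    if memory_path.is_file():
        memory = memory_path.read_text(encoding="utf-8").strip()
    else:
        memory = "(no memory.md found)"
    parts = [
        f"I'm working on {project}. Task: {task}",
        f"Project memory:\n{memory}",
    ]
    commits = recent_commits(repo_root)
    if commits:
        parts.append("Recent commits:\n" + "\n".join(commits))
    return "\n\n".join(parts)
```

Printing `build_opening_message(".", "darjs", "continue Phase 7")` gives a block you can paste as the first message in any chat tool. The task string here is only an example.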

Option 5: Anthropic Files API (newer, designed for this)

Store memory in: Anthropic’s Files API.

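import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

with open("pycademy_memory.json", "rb") as f:
    # Sent as plain text here, on the assumption that document blocks accept PDF and plain-text files.
    uploaded = client.beta.files.upload(file=("pycademy_memory.txt", f, "text/plain"))

message = client.beta.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    betas=["files-api-2025-04-14"],  # Files API beta flag; check the current docs
    system="You are a developer assistant.",
    messages=[
        {
            "role": "user",
            "content": [
                # Document blocks belong in message content, not in the system prompt.
                {"type": "document", "source": {"type": "file", "file_id": uploaded.id}},
                {"type": "text", "text": "Resume PyAcademy work."},
            ],
        }
    ],
)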

Pros: Upload a file once and reference it by ID across requests instead of resending it. The file’s contents still count as input tokens whenever it is included in a message.
Cons: Slightly newer, requires API usage (not web interface).

Option 6: Ollama / Local Llama (self-hosted)

Store memory in: Local SQLite + custom retrieval.

# You build your own system:
# - Save decisions to SQLite
# - Embed with local model (e.g., nomic-embed-text)
# - Retrieve top-K on every request
# - Inject into system prompt

Pros: Complete control, unlimited memory, no API costs.
Cons: Infrastructure work, embedding quality depends on model.
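
The four steps above can be sketched with only the standard library. This is a toy version under loud assumptions: `toy_embed` is a hashed bag-of-words stand-in for a real embedding model (such as nomic-embed-text), and every function name here is hypothetical.

```python
import json
import math
import sqlite3
import zlib

DIM = 256  # size of the toy vectors; a real embedding model defines its own


def toy_embed(text):
    """Stand-in for a real embedding model: hashed bag-of-words, L2-normalized."""
    vec = [0.0] * DIM
    for word in text.lower().split():
        vec[zlib.crc32(word.encode("utf-8")) % DIM] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def open_store(path=":memory:"):
    """Open (or create) the SQLite store that holds decisions and their vectors."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS decisions ("
        "id INTEGER PRIMARY KEY, text TEXT NOT NULL, embedding TEXT NOT NULL)"
    )
    return conn


def save_decision(conn, text):
    """Save one decision together with its embedding (stored as JSON)."""
    conn.execute(
        "INSERT INTO decisions (text, embedding) VALUES (?, ?)",
        (text, json.dumps(toy_embed(text))),
    )
    conn.commit()


def top_k(conn, query, k=3):
    """Return the k stored decisions most similar to the query (cosine similarity)."""
    q = toy_embed(query)
    scored = []
    for text, emb in conn.execute("SELECT text, embedding FROM decisions"):
        score = sum(a * b for a, b in zip(q, json.loads(emb)))
        scored.append((score, text))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:k]]


def build_system_prompt(conn, query, k=3):
    """Inject the retrieved decisions into a system prompt string."""
    notes = "\n".join(f"- {text}" for text in top_k(conn, query, k))
    return f"Relevant past decisions:\n{notes}"
```

To use a real model, replace `toy_embed` with a call to your local embedding endpoint and keep the rest unchanged. A full table scan is fine for a few thousand rows; beyond that, a dedicated vector index is likely worth the extra setup.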

Comparison Matrix

Tool	Memory storage	Auto-load?	Lazy-load?	Cost	Portable?
Claude Code (current)	`.claude/projects/`	✅	✅	Free	❌ (Claude-specific)
ChatGPT Custom Instructions	Global text	✅	❌	Free	✅
OpenAI Assistants	Files API	Manual	✅	$$	✅
LangChain + vector DB	Vector DB	Manual	✅	$$$	✅
GitHub + manual	Git repo	❌	❌	Free	✅
Anthropic Files API	Files API	Manual	✅	$	✅
Ollama + custom	SQLite	Custom	✅	Free	✅

My Recommendation for Other Tools

If you’re using ChatGPT:

Store project memory in GitHub (or a shared doc).
Start each conversation with “I’m working on X” + paste the latest memory.md.
Use Custom Instructions for cross-project conventions (naming, testing, language choices).

If you’re using OpenAI API programmatically:

Use Assistants API with attached files. Let it do retrieval.
Or, if you can switch providers, use Anthropic’s Files API to upload memory files once and reference them by ID.

If you’re using local Llama:

Build a custom retrieval system (SQLite + embeddings).
You have full control; make it perfect for your workflow.

If you’re using web tools (Claude.ai web, ChatGPT web):

Store memory in GitHub / Notion / external docs.
Manually paste relevant docs at the start of a conversation.
Use the tool’s “custom instructions” or “system prompt” for global rules.

When you share the blog post with other readers, add a disclaimer:

## Note on Tool Portability

This essay describes a system built specifically for **Claude Code** 
(which has auto-memory and file-system integration). 

**The philosophy** (three-tier memory, lazy-loading, lean summaries) 
works with any LLM. **The implementation** (`.claude/` directories, 
CLAUDE.md auto-loading) is Claude-specific.

If you use ChatGPT, Gemini, or local models:
- Store memory in GitHub / Notion / your file system.
- Manually load relevant docs at the start of each conversation.
- Use the tool's "system instructions" or "custom instructions" for global rules.
- See the [Tool Comparison](#comparison-matrix) below for alternatives.

The Portable Takeaway

The system isn’t “Claude’s context management.” It’s “how to manage persistent memory when working with any LLM that has a context limit.”

The specifics change (where files live, how they’re auto-loaded), but the pattern holds:

Tier 1: Lean summaries (what are we building?).
Tier 2: Deep docs (how and why?).
Tier 3: Session context (what’s happening now?).

This works with Claude Code, ChatGPT + GitHub, OpenAI API + vector DB, local Llama, or a notebook + manual copy-paste.

The tool is a detail. The pattern is timeless.

Call to Action

If you want, I can write Part 2 of the series addressing this directly: “How to Adapt This System to Your LLM Tool.” Would that be useful?

Status: Draft ready for review. Move to published blog post once approved.
Date saved: 2026-04-24