Context Engineering: Stop Your Agent Forgetting — and Burning Tokens It Didn't Need

A coding agent fails in two quiet, expensive ways: it forgets what it learned last session, and it re-reads your whole codebase to answer a small question. Four open-source tools attack both — memory, knowledge graphs, and context compression — and they map cleanly onto ideas any data engineer already trusts.

In the last post I argued that skills teach an agent how you work. This one is about the other half of the problem: what the agent knows, and what it pays to know it.

A coding agent fails in two quiet ways, and both cost you money. It forgets — close the session, and the hard-won understanding of your repo evaporates, so you re-explain it tomorrow. And it over-reads — to answer “where do we validate webhooks?” it scans far more of your codebase than it needed, and every one of those tokens is on the bill.

I spend most of my time building data pipelines, so this framing came naturally: the agent’s context window is a query engine. And we already know how to make query engines fast — index once, query cheap, compress before you pay. Four tools do exactly that for coding agents.

The token bill, made literal: `headroom`

headroom (from a Netflix engineer) sits between your tools and the model and compresses everything the agent reads before it reaches the context window. The project reports 60–95% fewer tokens for the same responses.

The data-engineer analogy: column pruning and compression. You don’t ship every byte to the query node — you send the smallest representation that still answers the question.

Reach for it when: your agent runs long, tool-heavy sessions (big file reads, large tool outputs) and your token spend has started to sting.

Honest caveat: “same responses” is the claim to verify on your workload. Compression is lossy by design; test it on a few real tasks and watch for quality drift before you trust the savings.
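Here is roughly how I'd run that check before trusting any compression layer. This is a sketch, not headroom's API: the pruning function, the 4-characters-per-token estimate, and the sample tool output are all assumptions made up for illustration. It prunes a verbose tool result down to the fields the question needs, estimates the savings, and confirms the answer did not change.

```python
import json

def estimate_tokens(text: str) -> int:
    """Very rough estimate: about 4 characters per token (an assumption, not a tokenizer)."""
    return max(1, len(text) // 4)

def prune_records(records: list[dict], keep: set[str]) -> list[dict]:
    """Column pruning: keep only the fields the question needs."""
    return [{k: v for k, v in r.items() if k in keep} for r in records]

def savings(before: str, after: str) -> float:
    """Fraction of estimated tokens removed by compression."""
    return 1 - estimate_tokens(after) / estimate_tokens(before)

def answer(payload: str) -> list[str]:
    """The task under test: which files validate signatures?"""
    return sorted(r["file"] for r in json.loads(payload) if r["validates_signature"])

# Hypothetical tool output: a verbose listing of webhook handlers.
raw_records = [
    {
        "file": f"handlers/hook_{i}.py",
        "function": "process_webhook",
        "validates_signature": i % 2 == 0,
        "blame": "x" * 200,        # padding that stands in for noisy metadata
        "raw_source": "y" * 800,   # padding that stands in for full file bodies
    }
    for i in range(20)
]

raw = json.dumps(raw_records, indent=2)
compact = json.dumps(
    prune_records(raw_records, {"file", "validates_signature"}),
    separators=(",", ":"),
)

print(f"estimated savings: {savings(raw, compact):.0%}")
print("same answer:", answer(raw) == answer(compact))
```

The second print is the one that matters. A big savings number with a changed answer is a regression, not a win.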

Memory across sessions: `claude-mem`

claude-mem gives the agent persistent memory — it captures and condenses what happened in a session so the next one can pick up the thread instead of starting cold.

The data-engineer analogy: a materialized summary table. You don’t replay the whole event log every morning; you keep a rolled-up state you can read instantly.

Reach for it when: you work in the same repos for weeks and you’re tired of re-establishing context every single session.

Honest caveat: memory is only an asset while it’s true. Stale or wrong memories get recalled with the same confidence as good ones. Whatever you adopt, know how to inspect and prune what it remembers.
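To make the summary-table idea and the pruning advice concrete, here is a minimal sketch using Python's built-in sqlite3. It is not how claude-mem stores anything; the table layout, the day counter, and the sample memories are assumptions for illustration. Each topic holds one rolled-up row, recall skips anything older than a cutoff, and pruning deletes it.

```python
import sqlite3

def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Create the summary table: one row per topic, not an append-only log."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS memory ("
        " topic TEXT PRIMARY KEY,"
        " summary TEXT NOT NULL,"
        " updated_day INTEGER NOT NULL)"
    )
    return conn

def remember(conn: sqlite3.Connection, topic: str, summary: str, day: int) -> None:
    """Overwrite the rolled-up state for a topic."""
    conn.execute(
        "INSERT OR REPLACE INTO memory (topic, summary, updated_day) VALUES (?, ?, ?)",
        (topic, summary, day),
    )

def recall(conn: sqlite3.Connection, today: int, max_age_days: int) -> list[tuple[str, str, int]]:
    """Return (topic, summary, age_in_days) for memories that are still fresh."""
    rows = conn.execute(
        "SELECT topic, summary, updated_day FROM memory"
        " WHERE updated_day >= ? ORDER BY topic",
        (today - max_age_days,),
    )
    return [(topic, summary, today - day) for topic, summary, day in rows]

def prune(conn: sqlite3.Connection, today: int, max_age_days: int) -> int:
    """Delete stale memories and report how many were removed."""
    cur = conn.execute(
        "DELETE FROM memory WHERE updated_day < ?", (today - max_age_days,)
    )
    return cur.rowcount

# Hypothetical session history, with days as plain integers.
conn = open_store()
remember(conn, "webhooks", "Signatures are verified in verifySignature", day=1)
remember(conn, "deploy", "Deploys run from the main branch", day=2)
remember(conn, "webhooks", "Validation moved into processWebhook", day=30)

fresh = recall(conn, today=31, max_age_days=7)
removed = prune(conn, today=31, max_age_days=7)
print(fresh)
print("pruned:", removed)
```

Returning the age alongside each summary is the design choice I'd keep in any real setup: it lets you see how old a memory is before the agent acts on it.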

Understanding code without re-reading it: `codebase-memory-mcp`

codebase-memory-mcp builds a knowledge graph of your codebase — functions, files, relationships — exposed to Claude Code, Cursor, or Codex over MCP. It auto-syncs as the code changes, runs 100% locally, and the maintainers pitch it as adding zero extra tokens at query time.

The data-engineer analogy: an index. Instead of full-scanning files to learn that processWebhook calls verifySignature, the agent looks it up.

Reach for it when: you’re on a large or unfamiliar codebase and the agent keeps “exploring” the same files to reconstruct structure it could have just queried.

Honest caveat: an index is only as good as its freshness. Auto-sync is the whole game here — confirm it actually keeps up on a fast-moving branch, or you’ll get confidently outdated answers.
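A toy version of the index, with the freshness check built in, looks something like this. It is a sketch under my own assumptions, not codebase-memory-mcp's implementation: it indexes a single Python source string using the standard ast module, uses snake_case stand-ins for processWebhook and verifySignature, and treats a content hash as the staleness signal.

```python
import ast
import hashlib

def build_call_index(source: str) -> dict[str, set[str]]:
    """Map each top-level function to the plain names it calls."""
    index: dict[str, set[str]] = {}
    for node in ast.parse(source).body:
        if isinstance(node, ast.FunctionDef):
            calls = set()
            for child in ast.walk(node):
                if isinstance(child, ast.Call) and isinstance(child.func, ast.Name):
                    calls.add(child.func.id)
            index[node.name] = calls
    return index

def fingerprint(source: str) -> str:
    """Content hash used to detect that the code changed after indexing."""
    return hashlib.sha256(source.encode("utf-8")).hexdigest()

class CodeIndex:
    def __init__(self, source: str) -> None:
        self.refresh(source)

    def refresh(self, source: str) -> None:
        """Rebuild the index and remember which version of the code it reflects."""
        self.calls = build_call_index(source)
        self.digest = fingerprint(source)

    def is_stale(self, current_source: str) -> bool:
        return fingerprint(current_source) != self.digest

    def callees(self, function: str) -> set[str]:
        """Lookup instead of a full scan."""
        return self.calls.get(function, set())

# Hypothetical source file, before and after a rename.
SOURCE_V1 = '''
def verify_signature(payload, secret):
    return bool(payload and secret)

def process_webhook(payload, secret):
    if not verify_signature(payload, secret):
        raise ValueError("bad signature")
    return payload
'''
SOURCE_V2 = SOURCE_V1.replace("verify_signature", "check_signature")

index = CodeIndex(SOURCE_V1)
print("verify_signature" in index.callees("process_webhook"))
print("stale after rename:", index.is_stale(SOURCE_V2))
```

Until refresh runs against the renamed source, the index still answers with verify_signature. That is the confidently outdated answer the caveat warns about, in miniature.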

One graph for code and everything around it: `graphify`

graphify widens the same idea past code. It turns a folder of source, SQL schemas, R/shell scripts, docs, papers, even images and videos into a single queryable knowledge graph — app code, database schema, and infrastructure in one place.

The data-engineer analogy: a unified semantic layer over a messy lake. The value isn’t any one table — it’s the joins across them.

Reach for it when: the questions that actually slow you down cross boundaries — “this API field maps to which column, populated by which job?” That answer lives in three systems at once, and a single graph is where it stops being a scavenger hunt.

Honest caveat: the broader the graph, the more it costs to build and keep current. Point it at the corner where cross-system questions genuinely hurt — not at everything, on day one.
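The cross-boundary question is really a path query, which a small sketch can show. Nothing here reflects graphify's actual data model or API. The node naming scheme, the relation names, and every field, column, and job below are hypothetical. The point is only that once the three systems share one graph, "which job populates this API field?" becomes a short traversal.

```python
from collections import deque

Edge = tuple[str, str, str]  # (source node, relation, target node)

# Hypothetical edges spanning app code, database schema, and jobs.
EDGES: list[Edge] = [
    ("api:Order.total", "maps_to", "column:orders.total_cents"),
    ("column:orders.total_cents", "populated_by", "job:nightly_order_rollup"),
    ("job:nightly_order_rollup", "defined_in", "file:jobs/order_rollup.sql"),
    ("api:Order.status", "maps_to", "column:orders.status"),
]

def build_graph(edges: list[Edge]) -> dict[str, list[tuple[str, str]]]:
    """Adjacency list keyed by source node."""
    graph: dict[str, list[tuple[str, str]]] = {}
    for source, relation, target in edges:
        graph.setdefault(source, []).append((relation, target))
    return graph

def trace(graph: dict[str, list[tuple[str, str]]], start: str, target_prefix: str):
    """Breadth-first search from start to the nearest node whose id begins
    with target_prefix. Returns the list of hops, or None if unreachable."""
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        node, path = queue.popleft()
        if path and node.startswith(target_prefix):
            return path
        for relation, nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [(node, relation, nxt)]))
    return None

graph = build_graph(EDGES)
for hop in trace(graph, "api:Order.total", "job:") or []:
    print(" -> ".join(hop))
print(trace(graph, "api:Order.status", "job:"))
```

The second query returns None because no edge links that column to a job. A gap like that tells you which corner of the graph is worth building out next.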

How I’d actually layer them

These aren’t four answers to one question. They’re four layers of the same stack: compression for what the agent reads, memory across sessions, an index over the code, and a graph across the systems around it, each one added only when the pain calls for it.

That last clause matters. None of this is free: every layer is more setup, more moving parts, more freshness to babysit. Don’t bolt on a knowledge graph because it’s clever; add it the day you watch the agent re-read the same directory for the fifth time. Let the pain pick the tool. That’s true of data infrastructure, and it’s just as true here.

Part two of a short series on getting real work out of coding agents. Next: loop engineering — designing systems where agents keep working, checking, and fixing without you babysitting every step.
