Where AI coding tokens actually go

Your AI coding spend lands as one opaque number at the end of the month. It tells you how much — never what for. This post breaks down the four places the tokens actually go, and why the waste hides where it does.

The bill is mostly noise

A flaky test gets pasted back in. The assistant retries blind. The same error loops for thousands of tokens before anything lands. Multiply that across four assistants and a month, and the invoice is dominated by work that shipped nothing.
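One rough way to put a number on that loop is to count the tokens spent on calls that hit the same error as the call before them. This is a minimal sketch, not any assistant's real log format: `CallRecord` and `loopWasteTokens` are hypothetical names, and it assumes each logged call carries a token count plus the error text (if any) that was fed back.

```typescript
// Hypothetical shape of one assistant call in a session log (an assumption, not a real schema).
interface CallRecord {
  tokens: number;        // tokens billed for this call
  error: string | null;  // error text fed back to the assistant, or null if there was none
}

// Sum the tokens of every call whose error is identical to the previous call's error.
// The first occurrence of an error counts as legitimate work; only the repeats count as loop waste.
function loopWasteTokens(calls: CallRecord[]): number {
  let wasted = 0;
  for (let i = 1; i < calls.length; i++) {
    const previousError = calls[i - 1].error;
    const currentError = calls[i].error;
    if (currentError !== null && currentError === previousError) {
      wasted += calls[i].tokens;
    }
  }
  return wasted;
}

// Made-up example session: one error, retried twice without any change.
const retrySession: CallRecord[] = [
  { tokens: 1200, error: null },
  { tokens: 900, error: "TypeError: x is undefined" },
  { tokens: 950, error: "TypeError: x is undefined" },
  { tokens: 980, error: "TypeError: x is undefined" },
  { tokens: 700, error: null },
];

console.log(loopWasteTokens(retrySession)); // 1930
```

Exact string matching is the simplest possible signal. Real error output often varies by a timestamp or a line number, so a practical version would likely normalize the text first.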

The categories worth naming:

Root-context bloat — CLAUDE.md and config files re-sent on every call.
Loops & blind retries — runs that spun on the same error.
Tool & MCP failures — tool calls that error out, and servers loaded into context but never used.
Wrong-sized models — a heavyweight model doing featherweight edits.

You can’t cut what you can’t see. The first job is making the waste legible.
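Making the waste legible can start as a plain tally: label each call with one of the categories above and add up the tokens. The sketch below assumes the labeling has already happened. `LabeledCall`, `tokensByCategory`, the category keys, and the extra `useful` bucket are all hypothetical, and the sample numbers are invented for illustration.

```typescript
// Category keys are illustrative labels for the four waste types, plus a "useful" bucket (an assumption).
type Category = "root-context" | "loops" | "tools-mcp" | "model-size" | "useful";

// Hypothetical record: one call, already labeled with where its tokens went.
interface LabeledCall {
  category: Category;
  tokens: number;
}

// Add up tokens per category so the single invoice number splits into named parts.
function tokensByCategory(calls: LabeledCall[]): Record<Category, number> {
  const totals: Record<Category, number> = {
    "root-context": 0,
    "loops": 0,
    "tools-mcp": 0,
    "model-size": 0,
    "useful": 0,
  };
  for (const call of calls) {
    totals[call.category] += call.tokens;
  }
  return totals;
}

// Invented sample data, not measurements.
const sampleCalls: LabeledCall[] = [
  { category: "useful", tokens: 4000 },
  { category: "root-context", tokens: 2500 },
  { category: "loops", tokens: 1930 },
  { category: "root-context", tokens: 2500 },
  { category: "tools-mcp", tokens: 1200 },
  { category: "model-size", tokens: 800 },
];

const categoryTotals = tokensByCategory(sampleCalls);
console.log(categoryTotals["root-context"]); // 5000
```

The hard part in practice is the labeling, which this sketch takes as given.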

Naming the fix, not just the number

Finding waste once is easy. The part that sticks is feedback: a specific, quantified change — do this, save that — handed to the engineer who can act on it.

// Before: every tool loaded on every call
const tools = loadAll(config.mcp);

// After: load only what this session reaches for
const tools = loadUsed(config.mcp, session.history);
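The snippet above leaves `loadUsed` undefined. One possible reading, as a sketch only: keep the servers that appear at least once in the session's tool-call history. The `McpServer` and `HistoryEntry` shapes and the server names are assumptions made for this example, not Frugl's or any MCP client's actual API.

```typescript
// Hypothetical shapes: a configured MCP server and one past tool call.
interface McpServer {
  name: string;
}

interface HistoryEntry {
  server: string; // name of the server that handled the call
}

// Keep only the servers this session has actually called, preserving config order.
function loadUsed(servers: McpServer[], history: HistoryEntry[]): McpServer[] {
  return servers.filter((server) =>
    history.some((entry) => entry.server === server.name)
  );
}

// Made-up example: three servers configured, two ever used.
const exampleServers: McpServer[] = [
  { name: "github" },
  { name: "browser" },
  { name: "postgres" },
];
const exampleHistory: HistoryEntry[] = [
  { server: "github" },
  { server: "postgres" },
  { server: "github" },
];

const sessionTools = loadUsed(exampleServers, exampleHistory);
console.log(sessionTools.map((server) => server.name)); // github and postgres only
```

A filter like this loads nothing when the history is empty, so a new session would need a fallback such as a small default set.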

That’s the whole idea behind Frugl — read the firehose, flag the waste, and coach each engineer toward wasting less next time.