Ali Rathore

April 2026

LLM spend is an attribution problem

A month of anonymous model spend taught me that cost governance for AI is identity infrastructure, and the discipline that fixes it already has a name.

The strangest forensic work I have done lately was over a bill: a month of model API spend on a shared development server, attached to no human being. I pulled the coding agents’ local session logs off the box, priced each session by hand, and assigned each session to a person by the project directory it ran in and, in honest moments, by writing style.
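A pricing pass of that kind can be sketched roughly as follows. This is an illustration, not the script actually used: the session record shape, the directory-to-owner map, the tier names, and the per-token prices are all hypothetical placeholders.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical prices in USD per million tokens; real rates vary by vendor and tier.
PRICE_PER_MTOK = {
    "top-tier": {"input": 15.00, "output": 75.00},
    "mid-tier": {"input": 3.00, "output": 15.00},
}

# Hypothetical map from project directory to the person who works there.
DIR_OWNER = {
    "/srv/projects/billing": "alice",
    "/srv/projects/search": "bob",
}


@dataclass
class Session:
    cwd: str            # directory the agent session ran in
    model: str          # tier name, a key of PRICE_PER_MTOK
    input_tokens: int
    output_tokens: int


def session_cost(s: Session) -> float:
    """Price one session from its token counts."""
    price = PRICE_PER_MTOK[s.model]
    return (s.input_tokens * price["input"]
            + s.output_tokens * price["output"]) / 1_000_000


def attribute(sessions):
    """Sum cost per owner; anything without a known directory stays anonymous."""
    totals = defaultdict(float)
    for s in sessions:
        owner = DIR_OWNER.get(s.cwd, "unattributed")
        totals[owner] += session_cost(s)
    return dict(totals)


sessions = [
    Session("/srv/projects/billing", "top-tier", 2_000_000, 100_000),
    Session("/srv/projects/search", "top-tier", 1_000_000, 200_000),
    Session("/tmp/scratch", "top-tier", 4_000_000, 0),
]
totals = attribute(sessions)
```

The "unattributed" bucket is the point of the exercise: a directory heuristic can only assign what happened to leave a trace in a place someone owns.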

The cause was almost embarrassingly small. Several engineers shared the host under one operating-system account and one cloud role, and someone set a global environment default that pinned every coding-agent invocation to the most capable model tier. Nobody decided to spend that money. Architecture sessions, overnight evaluation runs, a quick question about where a config file lives: all billed at the top of the price list, because no person ever chose a model for any of it. Roughly a month passed before anyone noticed, because the surface where the spend appeared carried no names.

The archaeology mostly worked, and the part that did not work is the lesson. A large minority of the spend could never be assigned to anyone: sub-agents that logged elsewhere, an editor-embedded agent with its own session store, possibly other principals in the account altogether. That money is permanently anonymous. The identity simply did not get written down when the calls were made, and attribution is a write-time property.
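What "write-time" means in practice can be shown with a small wrapper. This is a minimal sketch under assumptions: `attributed_call`, the record fields, and the `call_model` callable are hypothetical names for illustration, and the stub stands in for whatever client actually makes the request.

```python
import io
import json
import time


def attributed_call(call_model, principal, task_id, model, prompt, log):
    """Invoke a model and write who asked, for what task, in the same code path.

    `call_model` is any callable taking (model, prompt) and returning a dict
    with "input_tokens" and "output_tokens"; its shape is an assumption.
    """
    if not principal:
        # No name, no call: an anonymous invocation can never be attributed later.
        raise PermissionError("refusing to invoke a model without a named principal")
    result = call_model(model, prompt)
    record = {
        "ts": time.time(),
        "principal": principal,
        "task_id": task_id,
        "model": model,
        "input_tokens": result["input_tokens"],
        "output_tokens": result["output_tokens"],
    }
    log.write(json.dumps(record) + "\n")  # one JSON line per invocation
    return result


def fake_model(model, prompt):
    """Stub standing in for a real API client."""
    return {"text": "ok", "input_tokens": len(prompt.split()), "output_tokens": 1}


log = io.StringIO()
attributed_call(fake_model, "alice", "task-42", "top-tier",
                "where is the config file", log)
record = json.loads(log.getvalue())
```

The principal is passed in explicitly on purpose. On a shared operating-system account, asking the OS who is calling returns the shared account, which is the anonymity the record is meant to prevent.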

The platform could not help; it counts the wrong thing. Its unit of account is the model: spend per model, invocations per model, latency per model. The principal, the who in “who spent this,” exists only if invocation logging was wired up in advance, exactly the work nobody does before their first incident. Vendors have lately begun shipping per-principal cost allocation. The observability tools that promise per-user cost see only the calls routed through their instrumented gateway. A developer’s coding agent on a shared box is the call the gateway misses.

Somewhere in there this stopped being a finance story. Security asks who did this. Audit asks whether we can prove who did this. Spend asks who pays for this. All three fail at once for one root cause: actions no principal owns. The shared role that made our bill anonymous would have made a breach investigation a shrug, and the cleanup, fittingly, also turned up a leftover credential and an orphaned process.

The logs also corrected my mental model of what an agent costs. One human task fanned out into hundreds of model calls across more than a day: planning, tool use, sub-agents, retries. Roughly half the cost bought no answers at all; it was context replay, the agent re-reading its own conversation into every call. Per-request cost is meaningless at that shape. The unit that survives agents is cost per outcome per principal, and both groupings, by outcome and by principal, depend on identity threaded through the fan-out as it happens.
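Given call records that already carry a principal and a task identifier, the rollup itself is short. The sketch below assumes each call's cost has been split into a replayed-context part and a fresh part; the tuple layout, names, and dollar figures are invented for illustration and are not data from the incident.

```python
from collections import defaultdict

# Hypothetical per-call records:
# (principal, task_id, replay_cost_usd, fresh_cost_usd)
calls = [
    ("alice", "migrate-schema", 0.40, 0.35),
    ("alice", "migrate-schema", 0.60, 0.25),
    ("bob", "fix-flaky-test", 0.10, 0.30),
]


def cost_per_outcome(calls):
    """Roll individual calls up to one row per (principal, task)."""
    totals = defaultdict(lambda: {"replay": 0.0, "fresh": 0.0, "calls": 0})
    for principal, task_id, replay, fresh in calls:
        bucket = totals[(principal, task_id)]
        bucket["replay"] += replay
        bucket["fresh"] += fresh
        bucket["calls"] += 1

    report = {}
    for key, t in totals.items():
        total = t["replay"] + t["fresh"]
        report[key] = {
            "total_usd": total,
            # Share of the spend that only re-sent earlier conversation.
            "replay_share": t["replay"] / total if total else 0.0,
            "calls": t["calls"],
        }
    return report


report = cost_per_outcome(calls)
```

The grouping key is the whole design. Without a principal and a task identifier on every call, including the ones made by sub-agents, there is nothing to group by and the rollup degrades back to cost per model.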

Everything we changed afterward was identity work. A principal per person, especially on shared machines. Expensive tiers made opt-in per task, the way production access is opt-in, so a default can never again commit everyone silently. Invocation logging on from the start, treated as the audit log that spend turns out to be. Alarms on invocation rates, the way we alarm on authentication failures. None of it required new engineering. Every organization already runs a discipline whose entire job is making actions attributable to principals, with the logging decided before the incident instead of after. It sits down the hall from finance and is called security.
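Two of those controls are small enough to sketch. Both are illustrations under assumptions, not the configuration we deployed: the tier names, `choose_tier`, and `InvocationRateAlarm` are hypothetical, and a real setup would more likely express them as platform policy and metric alarms than as application code.

```python
from collections import deque

# Hypothetical tier names.
DEFAULT_TIER = "small"
EXPENSIVE_TIER = "large"


def choose_tier(requested: str, task_opted_in: bool) -> str:
    """Grant the expensive tier only when this task explicitly opted in.

    A global default that requests the expensive tier is downgraded,
    so no setting can commit everyone silently.
    """
    if requested == EXPENSIVE_TIER and not task_opted_in:
        return DEFAULT_TIER
    return requested


class InvocationRateAlarm:
    """Sliding-window counter: fires when calls in the window exceed a limit."""

    def __init__(self, max_calls: int, window_seconds: float):
        self.max_calls = max_calls
        self.window_seconds = window_seconds
        self._stamps = deque()

    def record(self, now: float) -> bool:
        """Record one invocation at time `now`; return True if the alarm fires."""
        self._stamps.append(now)
        # Drop invocations that have aged out of the window.
        while self._stamps and now - self._stamps[0] > self.window_seconds:
            self._stamps.popleft()
        return len(self._stamps) > self.max_calls


# One alarm per principal, so a firing alarm already names someone.
alarms = {"alice": InvocationRateAlarm(max_calls=3, window_seconds=60)}
fired = [alarms["alice"].record(t) for t in (0, 1, 2, 3, 100)]
```

Keying the alarm by principal is what makes it resemble an authentication-failure alarm: it says who to ask as well as that something happened.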