Database lessons for the agent era

My first serious systems work was inside the SAP HANA database core, building a lockless persistent hash table for the engine’s hottest path. The problem was contention: parallel queries serializing on a shared structure, every lock a tax on throughput. The solution was to remove the locks and replace them with compare-and-swap progressions whose correctness had to hold under every interleaving the hardware could produce. Parallel query throughput improved 200%, but what stayed with me is the standard of proof: in a database core, “it works when things happen in a reasonable order” is not an argument. The adversary is the scheduler, and the scheduler owes you nothing.

I now build platforms where autonomous agents run against enterprise data, and the longer I do it, the more the work feels familiar. Strip the vocabulary away and an agent platform is a system in which concurrent, unreliable actors operate on shared state of record. That is a database. The agent era keeps presenting itself as a new discipline, yet most of its hard problems were settled by database engineers decades ago, under different names.

Take isolation. Databases learned early that concurrent transactions cannot be allowed to see each other’s intermediate state, and that the property has to be enforced by the engine, never volunteered by the transaction. Agent platforms relearn this with each incident in which agents share an execution environment and one agent’s mess becomes another’s context. The answer is the database’s answer: workspaces that bound what each actor can read and touch, enforced beneath the actor, so that a misbehaving agent is contained by construction. We even kept the word.
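To make "enforced beneath the actor" concrete, here is a minimal sketch in Python. The names Workspace and WorkspaceViolation are hypothetical, invented for illustration. A path check like this is not a sandbox, and a real platform would enforce the boundary at the operating system or container level. The sketch only shows the shape of the idea: the agent asks, and the layer beneath it decides.

```python
from pathlib import Path


class WorkspaceViolation(Exception):
    """Raised when an actor reaches outside its workspace."""


class Workspace:
    """Hypothetical per-agent workspace: all file access goes through it."""

    def __init__(self, root):
        self.root = Path(root).resolve()

    def resolve(self, relative):
        # Resolve ".." segments and symlinks first, then check containment,
        # so the check applies to the real target, not the requested string.
        candidate = (self.root / relative).resolve()
        if not candidate.is_relative_to(self.root):  # Python 3.9+
            raise WorkspaceViolation(f"{relative!r} escapes {self.root}")
        return candidate

    def read_text(self, relative):
        return self.resolve(relative).read_text(encoding="utf-8")

    def write_text(self, relative, data):
        target = self.resolve(relative)
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(data, encoding="utf-8")
```

The agent never receives a raw filesystem handle, so containment does not depend on the agent behaving well.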

Take recovery. A database assumes the crash. The write-ahead log exists because the engine commits to remembering what it intended before it acts, so that any interruption, at any instant, resolves to a state the system can stand behind: finish what was promised, discard what was not. Agent sessions need precisely this discipline and almost never get it. The common construction treats an agent’s run as ephemeral conversation, so a dropped connection or a crashed process discards an hour of accumulated work, the equivalent of a database that loses committed transactions on restart. Building sessions that are recoverable by design, where work survives interruption and resumes rather than restarts, is the write-ahead log lesson wearing new clothes.
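A sketch of what that discipline might look like for an agent session, in Python. SessionJournal and its record format are assumptions made up for this illustration, not any particular product's design. The point is the ordering: the intent reaches durable storage before the action runs, and recovery sorts the log into work that finished and work that was promised but never confirmed.

```python
import json
import os


class SessionJournal:
    """Hypothetical append-only journal for one agent session."""

    def __init__(self, path):
        self.path = path

    def _append(self, record):
        with open(self.path, "a", encoding="utf-8") as f:
            f.write(json.dumps(record) + "\n")
            f.flush()
            os.fsync(f.fileno())  # force the record to disk before returning

    def log_intent(self, step_id, action):
        # Written BEFORE the step runs: the write-ahead part.
        self._append({"type": "intent", "step": step_id, "action": action})

    def log_done(self, step_id, result):
        self._append({"type": "done", "step": step_id, "result": result})

    def recover(self):
        """Return (completed, pending): results by step, and unfinished intents."""
        completed, pending = {}, {}
        if not os.path.exists(self.path):
            return completed, pending
        with open(self.path, encoding="utf-8") as f:
            for line in f:
                try:
                    rec = json.loads(line)
                except json.JSONDecodeError:
                    break  # torn final write from a crash; ignore the tail
                if rec["type"] == "intent":
                    pending[rec["step"]] = rec["action"]
                elif rec["type"] == "done":
                    pending.pop(rec["step"], None)
                    completed[rec["step"]] = rec["result"]
        return completed, pending
```

On restart, the session resumes from the completed results and decides, step by step, what to do with each pending intent, instead of beginning the hour again.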

Take atomicity. A transaction either happens or it does not; the engine refuses to leave the world half-written. Agents perform multi-step operations against systems that offer no transactions, which makes partial completion the default failure mode: three of five steps done, the connection gone, the world in a state nobody intended. The platform must know, for every sequence, what its commit point is, what is safe to retry short of it, and what must never be attempted twice past it. Sagas and compensations rediscovered this for microservices; agents need it more, because the steps are chosen at runtime by a probabilistic planner.
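One way to sketch the retry rule in Python follows. Step, run_sequence, and NeedsReconciliation are hypothetical names, and the attempted and done sets stand in for what would have to be durable storage in a real platform. Steps marked retry-safe may be repeated; a step that is not retry-safe is recorded as attempted before it runs and is never run a second time, even across restarts.

```python
from dataclasses import dataclass
from typing import Callable


class NeedsReconciliation(Exception):
    """A non-repeatable step has an unknown outcome; a human or checker must decide."""


@dataclass
class Step:
    name: str
    run: Callable[[], None]
    retry_safe: bool  # True if repeating the step is harmless


def run_sequence(steps, attempted, done, max_retries=3):
    for step in steps:
        if step.name in done:
            continue  # finished in an earlier run
        if not step.retry_safe and step.name in attempted:
            # Tried before, outcome unknown: never attempt it twice.
            raise NeedsReconciliation(step.name)
        attempted.add(step.name)  # record the attempt before acting
        tries = max_retries if step.retry_safe else 1
        for attempt in range(1, tries + 1):
            try:
                step.run()
            except Exception as exc:
                if not step.retry_safe:
                    raise NeedsReconciliation(step.name) from exc
                if attempt == tries:
                    raise  # retries exhausted; surface the original error
            else:
                done.add(step.name)
                break
```

The planner can choose the steps at runtime, but whether a step is retry-safe is a property of the tool, declared ahead of time by whoever registered it.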

Take attribution. Inside a database engine, every change is stamped with its origin: which transaction wrote this version, in what order, visible to whom. Multiversion concurrency control is, among other things, an accountability mechanism, and the engine could not function without knowing who wrote what. The agent equivalent is identity flowing through every layer, so that each action an agent takes is attributable to the human authority it acted under. Most agent stacks instead funnel everything through a service account, which is a database where every row was last modified by “system.” No security review should accept that from an agent platform, and increasingly none does.
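As an illustration of identity flowing with the action, here is a small Python sketch. Principal, ActionRecord, and AttributedToolbox are hypothetical names, and the scope strings are invented. Every tool call is checked against the scopes of the human the agent acts for and is stamped with that human's identity before the tool runs.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Principal:
    user_id: str
    scopes: frozenset


@dataclass(frozen=True)
class ActionRecord:
    seq: int
    agent_id: str
    on_behalf_of: str  # the human authority, never a shared service account
    tool: str
    arguments: dict
    at: str


class AttributedToolbox:
    """Hypothetical tool gateway: maps tool name -> (required_scope, callable)."""

    def __init__(self, agent_id, principal, tools):
        self.agent_id = agent_id
        self.principal = principal
        self.tools = tools
        self.log = []

    def call(self, tool, **arguments):
        required_scope, fn = self.tools[tool]
        if required_scope not in self.principal.scopes:
            raise PermissionError(
                f"{self.principal.user_id} lacks {required_scope} for {tool}"
            )
        # Stamp the action with its origin before executing it.
        self.log.append(ActionRecord(
            seq=len(self.log) + 1,
            agent_id=self.agent_id,
            on_behalf_of=self.principal.user_id,
            tool=tool,
            arguments=dict(arguments),
            at=datetime.now(timezone.utc).isoformat(),
        ))
        return fn(**arguments)
```

Because the record carries both the agent and the person, the question "who wrote this row" has an answer other than "system."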

Even the audit trail has a lineage: it is the redo log, repurposed. The database keeps a record sufficient to reconstruct what happened, in order, from any point. An agent platform whose logs cannot replay what the agent did, tool call by tool call, fails the same test the same way.

There is a pattern in why these lessons transfer. Database engineering is the discipline that took unreliable concurrent actors and shared mutable state and made the combination boring through forty years of paying for every shortcut. Its core findings are about what it costs to guarantee behavior in the presence of failure and concurrency, and those costs do not change because the actor is a model instead of a transaction.

So my advice to engineers building agent infrastructure is the advice I least expected to give: study database internals. The isolation levels, the logs, the commit protocols, the visibility rules: that is what infrastructure looks like after every plausible shortcut has been tried in production and billed.