The physical world does not retry idempotently

A decade before anyone asked me whether an AI agent could be trusted with enterprise data, I led the software team at a robotics startup whose machine griddled, assembled, and boxed burgers with no human in the loop. Twenty-some units eventually ran in the field. I spent those years on the question of what software may do when its actions cannot be taken back, and I have never worked on anything since that taught me more about the systems I build now.

Software engineers live inside a generous abstraction: operations can be retried, transactions can be rolled back, state can be restored from backup. A patty dropped on the floor stays dropped. An over-extended actuator has hit whatever it hit. That single property reorganizes how you think about correctness, because the cost of acting stops being symmetric: a wrong action can cost far more to recover from than a right one ever saves.

An agent acting on production systems lives in the kitchen. The email to a customer cannot be unsent. The record merged in the CRM, the order placed with a supplier, the message posted to a channel: these are physical-world actions wearing software costumes. The retry-and-rollback instincts that serve engineers well everywhere else quietly fail here, and most of what I believe about agent platform design comes from the robotics lessons, transplanted.

The first lesson is that irreversibility is a property you classify. On the robot, every actuation was designed with the question “what does recovery look like if this goes wrong halfway,” and the honest answers sorted actions into tiers: freely retryable, recoverable with effort, unrecoverable. The control system treated the tiers differently, demanding more certainty before the unrecoverable ones. An agent platform needs exactly this taxonomy, and mostly the industry builds without it: every tool call treated alike, the reversible database read and the irreversible customer email one token apart in the same action space. Permission gates belong on the unrecoverable tier, which is also what keeps the gates rare enough that humans still read them.
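
A minimal sketch of that taxonomy, in Python. The names here (Tier, Tool, execute, and the approve callback) are hypothetical and do not come from any real agent platform; the only point is that the tier is declared on the tool, and the gate fires on the unrecoverable tier alone.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Callable


class Tier(IntEnum):
    RETRYABLE = 0      # safe to repeat, e.g. a database read
    RECOVERABLE = 1    # can be undone with effort, e.g. an edited CRM field
    UNRECOVERABLE = 2  # cannot be taken back, e.g. an email to a customer


@dataclass(frozen=True)
class Tool:
    name: str
    tier: Tier
    run: Callable[[], str]


def execute(tool: Tool, approve: Callable[[Tool], bool]) -> str:
    """Run a tool, asking for approval only on the unrecoverable tier."""
    if tool.tier == Tier.UNRECOVERABLE and not approve(tool):
        return f"blocked: {tool.name}"
    return tool.run()
```

Because the approve callback is never invoked for the lower tiers, a human reviewer in this sketch sees only the actions that cannot be taken back.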

The second lesson is graceful degradation as a design discipline. Hardware fails constantly, so robotics never asks whether components will fail, only what the machine does while they are failing. Every subsystem had a safe state it could reach from anywhere; the global design goal was that no single failure could turn lunch into a hazard. Agent systems need safe states with the same rigor: what does the agent do when a tool times out mid-sequence, when a session drops halfway through a multi-step change, when the model returns something malformed at the worst moment? “Resume cleanly, having committed nothing irreversible” is a safe state. Designing so it is always reachable is the work.
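
One way to keep that safe state reachable is to stage irreversible actions and release them only after the whole sequence has finished. The sketch below assumes that pattern; Outbox and run_sequence are hypothetical names, and it deliberately ignores the harder case of a failure partway through the commit itself.

```python
from typing import Callable, List

Action = Callable[[], None]


class Outbox:
    """Holds irreversible actions until the whole sequence has succeeded."""

    def __init__(self) -> None:
        self._pending: List[Action] = []

    def stage(self, action: Action) -> None:
        self._pending.append(action)

    def discard(self) -> None:
        self._pending.clear()

    def commit(self) -> None:
        # Swap the list out first so a second commit cannot re-send anything.
        pending, self._pending = self._pending, []
        for action in pending:
            action()


def run_sequence(steps: List[Callable[[Outbox], None]]) -> bool:
    """Run every step; commit staged actions only if all steps finished."""
    outbox = Outbox()
    try:
        for step in steps:
            step(outbox)
    except Exception:
        outbox.discard()  # safe state: nothing irreversible was committed
        return False
    outbox.commit()
    return True
```

A tool timeout in any step leaves the staged email unsent, so the sequence can be resumed or abandoned without cleanup.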

The third lesson is about noise. The robot’s cameras and sensors never produced the same reading twice: lighting shifted, produce arrived in shapes nobody planned for, ingredients behaved differently at every temperature. Early on, the team treated sensor noise as a defect to be eliminated. The machine became reliable only when we accepted noise as a permanent resident and built the control loops to be correct in its presence. Model nondeterminism is the same resident under a new name. The model will sometimes misread, sometimes hallucinate, sometimes do the right thing in a surprising way, and no amount of prompt engineering evicts this. Systems anchored in “the model will behave” are the burger machine that assumed every tomato slice would be round. The durable design assumes variance and bounds its consequences: verification passes, constrained action spaces, blast radius set by architecture.
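
As an illustration of bounding consequences rather than trusting behavior, the sketch below treats model output as untrusted input: it must parse, stay within an allowlisted action space, and stay under a per-turn cap, or it is rejected whole. The action names, the cap, and the JSON shape are all assumptions made for the example.

```python
import json
from typing import Any, Dict, List, Optional

ALLOWED_ACTIONS = {"lookup_order", "draft_reply"}  # hypothetical action names
MAX_ACTIONS_PER_TURN = 3  # hypothetical cap on blast radius


def parse_actions(model_output: str) -> Optional[List[Dict[str, Any]]]:
    """Return the validated action list, or None if the output is rejected."""
    try:
        data = json.loads(model_output)
    except ValueError:
        return None  # malformed output is expected sometimes; reject it
    if not isinstance(data, list) or len(data) > MAX_ACTIONS_PER_TURN:
        return None
    for item in data:
        if not isinstance(item, dict):
            return None
        name = item.get("action")
        if not isinstance(name, str) or name not in ALLOWED_ACTIONS:
            return None
    return data
```

Nothing here makes the model more accurate. It only ensures that a misread or a hallucinated action name costs one rejected turn instead of an executed action.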

The last lesson is fleet humility. The instant units left the lab, our model of how they behaved diverged from how they actually behaved, and centralized fleet telemetry was the only thing keeping the two honest. Agents at enterprise scale are a fleet, with the same divergence: what they do in production is not what they did in the demo environment, and only observability deep enough to replay any action makes the difference visible.
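
The smallest version of replayable observability is an append-only record of every action with its inputs and result, keyed by agent. The sketch below keeps the log in memory for brevity; ActionRecord and ActionLog are hypothetical names, and a real fleet would write these lines to durable, centralized storage.

```python
import json
import time
from dataclasses import asdict, dataclass
from typing import Any, Dict, List


@dataclass(frozen=True)
class ActionRecord:
    agent_id: str
    tool: str
    arguments: Dict[str, Any]
    result: str
    timestamp: float


class ActionLog:
    """Append-only log; each entry is one serialized JSON line."""

    def __init__(self) -> None:
        self._lines: List[str] = []

    def record(self, agent_id: str, tool: str,
               arguments: Dict[str, Any], result: str) -> ActionRecord:
        rec = ActionRecord(agent_id, tool, arguments, result, time.time())
        self._lines.append(json.dumps(asdict(rec), sort_keys=True))
        return rec

    def replay(self, agent_id: str) -> List[ActionRecord]:
        """Rebuild, in order, everything one agent did."""
        records = (ActionRecord(**json.loads(line)) for line in self._lines)
        return [r for r in records if r.agent_id == agent_id]
```

Storing serialized lines rather than live objects means the replay reads exactly what was written at the time, which is the property that makes divergence from the demo environment visible.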

People are surprised that years spent on burger robots turned out to be preparation for agent platforms. It is the least surprising thing in my career. Robotics is the discipline of autonomous systems taking irreversible actions in a noisy world under partial observation, which is precisely what an enterprise agent is.