What it takes to let agents touch enterprise data

The first version of our permission gating asked a human to approve nearly everything an agent did. Within a week people stopped reading what they approved. I watched it happen to us, the people who built the gate and should have known better. A rubber stamp is worse than no gate, because it launders responsibility while adding latency.

I keep that failure at the front of the file because it has the shape of most failures in this space: trust theater instead of trust. My job is making enterprises comfortable running AI agents against their operational data, and the meeting where that gets decided is security review. The reviewers never ask how good the model is. They ask what the agent can touch, on whose authority it acts, what happens when it goes wrong, and how anyone knows what it did. A model that’s wrong one time in a thousand is still wrong several times a day at enterprise volume, so “it is usually right” is not a control, and every question on their list is an architecture question.
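The arithmetic behind that last point is worth a few lines. The one-in-a-thousand error rate is the figure from the paragraph above. The daily action count is a number I am assuming purely for illustration, and the second calculation assumes errors are independent, which real systems may not honor.

```python
# Worked arithmetic: a small error rate times a large volume.
# error_rate comes from the text; actions_per_day is an ASSUMED figure.
error_rate = 1 / 1000
actions_per_day = 5000  # hypothetical enterprise volume

# Expected number of wrong actions per day.
expected_errors = error_rate * actions_per_day

# Chance that at least one action in a day is wrong,
# assuming each action fails independently of the others.
p_at_least_one = 1 - (1 - error_rate) ** actions_per_day

print(f"expected wrong actions per day: {expected_errors:.1f}")
print(f"chance of at least one wrong action per day: {p_at_least_one:.3f}")
```

At the assumed volume that is five wrong actions a day on average, and a day with none is the rare case.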

What the agent can touch: the naive design gives agents access to shared infrastructure and constrains behavior with instructions. The design that works inverts it. Agents run in isolated, per-project workspaces, so the blast radius of an action is bounded before the action exists. This is the oldest idea in systems engineering, relearned for every new kind of compute: make the damage small, observable, and recoverable, and you stop needing to trust the thing doing the acting.
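A minimal sketch of the idea, under heavy assumptions: real isolation belongs at the OS, container, or network layer, and this only illustrates the principle at the level of file paths. The `Workspace` class, its method names, and the project names are hypothetical.

```python
import tempfile
from pathlib import Path


class WorkspaceViolation(Exception):
    """Raised when a path resolves outside the agent's workspace."""


class Workspace:
    """A per-project root; every path the agent touches must stay under it."""

    def __init__(self, root: Path) -> None:
        self.root = root.resolve()

    def resolve(self, relative: str) -> Path:
        # Resolve ".." segments and symlinks first, then check containment.
        candidate = (self.root / relative).resolve()
        try:
            candidate.relative_to(self.root)
        except ValueError:
            raise WorkspaceViolation(f"{relative!r} escapes {self.root}") from None
        return candidate

    def write(self, relative: str, text: str) -> Path:
        target = self.resolve(relative)
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(text)
        return target


# Usage: the boundary is checked before any write happens.
with tempfile.TemporaryDirectory() as tmp:
    ws = Workspace(Path(tmp) / "project-a")
    ws.write("notes/summary.txt", "ok")
    try:
        ws.write("../project-b/secrets.txt", "should never land")
    except WorkspaceViolation as err:
        print("blocked:", err)
```

The point of the sketch is the ordering: the limit is enforced by the environment before the action runs, not by asking the agent to behave.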

On whose authority: an agent is never a principal. It acts for someone, and the system has to know who, all the way down. An agent acting for a user can do what that user could do and nothing more, and every action lands in the audit trail attributed to a person. The moment an agent acts as a generic service account, attribution dies, and the review fails.
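One way to sketch delegated authority, assuming a simple in-memory permission table. The `Delegation` type, the `authorize` function, and the users and resources are all hypothetical; a real system would check against its existing identity provider.

```python
from dataclasses import dataclass

# Hypothetical permission table: user -> allowed (action, resource) pairs.
USER_PERMISSIONS = {
    "alice": {("read", "orders"), ("write", "orders")},
    "bob": {("read", "orders")},
}


@dataclass(frozen=True)
class Delegation:
    """An agent session bound to the human it acts for."""

    agent_id: str
    on_behalf_of: str

    def __post_init__(self) -> None:
        # No anonymous or service-account sessions: a named user is required.
        if not self.on_behalf_of:
            raise ValueError("an agent must act for a named user")


def authorize(delegation: Delegation, action: str, resource: str) -> bool:
    """The agent has no permissions of its own; only the user's are consulted."""
    allowed = USER_PERMISSIONS.get(delegation.on_behalf_of, set())
    return (action, resource) in allowed


for_bob = Delegation(agent_id="agent-7", on_behalf_of="bob")
print(authorize(for_bob, "read", "orders"))
print(authorize(for_bob, "write", "orders"))
```

Note that `agent_id` never appears in the permission check. It exists for the audit trail, not for authorization.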

What happens when it goes wrong: assume it will, and make wrongness cheap. The gating asymmetry we landed on after the rubber-stamp episode is that reversible, contained actions proceed without ceremony, and anything that crosses a boundary or can’t be undone stops and waits for a person. The gate earns attention by firing rarely. Reliability turned out to be part of this question in a way I hadn’t expected: our first session handling treated interruption as an edge case, and a dropped connection could throw away an hour of agent work. A system that loses work trains its users to distrust it in ways no security document repairs.
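The gating asymmetry fits in a few lines. This is a sketch, assuming each action can be labeled up front as reversible or not and as contained or not. The `Action` fields and the example actions are hypothetical, and in practice the labeling is the hard part.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    PROCEED = "proceed"
    HOLD_FOR_HUMAN = "hold_for_human"


@dataclass(frozen=True)
class Action:
    name: str
    reversible: bool          # can the effect be undone?
    stays_in_workspace: bool  # does it avoid crossing a boundary?


def gate(action: Action) -> Decision:
    """Only actions that are both reversible and contained skip review."""
    if action.reversible and action.stays_in_workspace:
        return Decision.PROCEED
    return Decision.HOLD_FOR_HUMAN


actions = [
    Action("edit draft in workspace", reversible=True, stays_in_workspace=True),
    Action("send email to customer", reversible=False, stays_in_workspace=False),
    Action("write to shared database", reversible=True, stays_in_workspace=False),
]
for a in actions:
    print(f"{a.name}: {gate(a).value}")
```

The default is the important design choice: anything that fails either test stops, so an unlabeled or mislabeled risk waits for a person instead of slipping through.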

How anyone knows what it did: I’ve come to treat the audit trail as the product. What the customer is buying is the ability to answer, months later, why something happened. The trail is also how you verify the isolation and attribution claims actually hold, instead of taking the architecture diagram’s word for it.
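A sketch of what that can look like, assuming each audit event records the person, the workspace, and a stated reason. The `AuditEvent` fields, the helper names, and the sample events are hypothetical, and resources are modeled as path-like strings for simplicity.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class AuditEvent:
    timestamp: str
    agent_id: str
    on_behalf_of: str  # the human the agent acted for
    workspace: str     # e.g. "project-a"
    action: str
    resource: str      # path-like, e.g. "project-a/orders.csv"
    reason: str


def violations(trail: List[AuditEvent]) -> List[Tuple[int, str]]:
    """Check the trail itself for broken attribution or isolation."""
    problems = []
    for index, event in enumerate(trail):
        if not event.on_behalf_of:
            problems.append((index, "no human principal"))
        if not event.resource.startswith(event.workspace + "/"):
            problems.append((index, "resource outside workspace"))
    return problems


def why(trail: List[AuditEvent], resource: str) -> List[AuditEvent]:
    """Answer the months-later question: what touched this, for whom, and why."""
    return [event for event in trail if event.resource == resource]


trail = [
    AuditEvent("2024-01-05T10:00:00Z", "agent-7", "alice", "project-a",
               "write", "project-a/orders.csv", "user asked for dedupe"),
    AuditEvent("2024-01-05T10:02:00Z", "agent-7", "", "project-a",
               "read", "project-b/payroll.csv", "unspecified"),
]
print(violations(trail))
print([(e.on_behalf_of, e.reason) for e in why(trail, "project-a/orders.csv")])
```

The second sample event is deliberately bad: it has no person attached and reaches outside its workspace, and the trail is what surfaces both.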

None of this is new. It’s the standard playbook of multi-tenant systems applied to a new kind of compute, which is exactly why it works on reviewers who have seen everything. When it holds, security review stops being where the deal dies and becomes the strongest meeting in the sale: the customer’s hardest people ask their hardest questions, and the answers hold. Enterprises don’t adopt agents because they’ve decided to trust AI. They adopt them when trust stops being required.