A turn is not a request

I closed my laptop in the middle of a twenty-minute agent run and opened it on my phone twenty minutes later. The run was still going, four tool calls further along than where I left it, and the transcript scrolled into view from exactly the frame I had last seen. Nothing had been lost, because nothing about the work was tied to the connection that happened to be watching it.

That is the property I had to build toward, and the first version did not have it. The naive design is the obvious one: a browser opens a stream, the stream spawns the agent process, frames flow back over that stream, and when the stream closes the process dies with it. It demos perfectly. Then a real session runs long enough that a network blips, a phone locks, a tab is reloaded, and the agent is killed mid-edit with half a change written to disk. The work was never the request. The request was just a viewport onto the work.

So the unit had to move. A turn is a process the server owns, with a lifecycle of its own, and a connection is a subscriber that can come and go without the turn noticing. Starting a turn registers it in a process-wide table and launches it on the server’s lifetime, not the socket’s. Every frame the agent emits, a line of reasoning, a tool call, a tool result, gets a monotonic sequence number and lands in a ring buffer. A browser that connects says which sequence number it last saw, and the server replays everything after it, then switches to live. Disconnect at frame 4,000, reconnect at 4,200, and you get exactly frames 4,001 through 4,200 and then the present. The buffer holds the last 5,000 frames and survives five minutes past the end of a turn, which is enough to reattach from a train tunnel and enough that a finished run is still there when a second device wakes up.

The turn runs on the server's clock. The connection is a subscriber that replays the gap on return.

The harder consequence was the transport. I had a clean one-way stream working, and one-way is the wrong shape, because the interesting agents stop and ask. A coding agent wants to run a command that touches the repo, and it has to wait for a human to say yes before it runs, and that yes has to travel back up the same channel the frames came down. A one-way stream cannot carry the answer. So the agent’s request to use a tool becomes a frame to the browser, the browser renders a card, the operator approves or edits the input or denies, and the decision rides back to the waiting agent over a single bidirectional socket. I deleted the one-way path entirely. Carrying both was just two ways for the permission handshake to be subtly wrong.

The permission broker fails closed, which is the only safe direction. If no browser is attached to answer, if the operator walks away, if five minutes pass, the pending tool resolves to deny and the turn continues without running it. A tool never executes because nobody was around to refuse it. That sentence is worth more than it looks: it means leaving the room is the same as saying no, not the same as saying yes.

Decoupling the turn from the connection bought something I did not design for. If a connection is just a subscriber, there can be more than one. Two browsers, a laptop and a phone, can watch the same turn live, and either can answer the next permission prompt. I can point at an element in the shared view and the agent picks it up on its next turn. This is most of the way to multiplayer, and it stops short of the part that is actually hard: two people forming one intent. Today the last writer wins and the others watch. What happens when two operators answer the same permission prompt with different answers in the same second is a question I have not had to resolve yet, because so far it has always been me on both devices.