The harness remembers what the agent forgets

On build 60 the agent walked into the same wall it had walked into on build 12. Same test, same wrong assumption about how the test harness wanted to be called, same twenty minutes of flailing before it found its way through. Watching it happen twice was the thing that bothered me, because the second time was not the agent’s fault. It was mine. I had let it start the sixtieth build knowing exactly as much as it knew on the first, which is to say nothing it had not been handed in the prompt, and then I was annoyed it had not learned.

The agent is stateless across runs and there is no fixing that by wishing. Each build spins up a fresh context, does its work, opens a pull request, and evaporates. Whatever it figured out the hard way, the convention it finally inferred, the dead end it backed out of, the file it should never have opened, goes with it. The next build inherits none of it. If I want the second attempt at a recurring problem to go better than the first, the memory cannot live in the agent. It has to live in the thing that outlives the agent, which is the harness.

So the harness keeps the tape. Every build already records each tool call to an ordered ledger, originally just so I could audit what happened. The ledger turned out to be a record of friction too, if you read it for it. Where the agent ran the same command four times with small variations, it was lost. Where it opened six files in a directory and edited none of them, it was searching for something the layout did not make obvious. Where it hit a failing check and thrashed before recovering, there was a lesson the next agent should not have to relearn. None of that is in the diff. It is in the shape of the path the agent took to produce the diff.

The agent evaporates after every build. The instructions it reads on the next one are where the learning accrues.

So a step at the end of each build reads its own ledger and distills the friction into a sentence or two, and those sentences get written into the instruction file the next agent reads before it starts. Not the raw ledger, which is enormous and mostly noise. The lesson. The test wants to be invoked this way. This directory is generated, do not edit it by hand. The thing you are looking for lives over here. The wall the sixtieth build hit is, by the sixty-first, a line in the briefing. The agent did not get smarter. The room it walks into got better signposted, by the previous tenant, automatically.

I find the inversion genuinely useful and I am also wary of it, because a memory that writes itself can write down the wrong thing. The obvious trap is teaching the next agent a workaround for a bug I should have just fixed, so the lesson hardens the defect into doctrine and the real problem never gets the attention a recurring failure deserves. The quieter trap is rot: a lesson that was true in May and is misleading by July, sitting in the briefing with the authority of everything around it, with nothing checking whether it still holds. The instructions grow, and growth is not the same as getting wiser. I have started reading the distilled lessons more skeptically than the code, because the code at least gets a review, and for a while the thing the harness remembered about itself did not.