Managing engineers who never sleep

At one in the morning an engineer reported that a long-running data ingest had hung. The case was thorough: process parked in a wait state, CPU flat at zero, file-descriptor counts that looked wrong, a confident story tying it all together. I went and looked at the output instead. The job had finished, successfully, some time earlier.

The engineer was an AI coding agent, and what I did next was write it a rule: judge a background job by its output, never by process forensics. Then I looked at the file I had just added to and realized what it was. A year of running agents across twenty-some repositories had piled up more than a hundred and fifty rules like that one, each in the shape I learned writing feedback for humans: what happened, why it matters, what to do next time. Without meaning to, I had built a management apparatus for a team that never sleeps and never remembers yesterday.
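The rule about background jobs can be made mechanical. What follows is a minimal sketch, assuming the job writes a small summary file as its last step; the file name `summary.json`, the `status` field, and the function `job_outcome` are hypothetical, chosen for illustration and not taken from any real setup.

```python
import json
from pathlib import Path


def job_outcome(output_dir: Path) -> str:
    """Classify a background job from what it wrote, not from process state.

    Assumption: the job writes `summary.json` with a `status` field as its
    final step. Both names are hypothetical.
    """
    summary = output_dir / "summary.json"
    if not summary.exists():
        # No output yet. This says nothing about whether the process hung.
        return "no result yet"
    try:
        record = json.loads(summary.read_text())
    except json.JSONDecodeError:
        return "result unreadable"
    if not isinstance(record, dict):
        return "result unreadable"
    status = record.get("status")
    return "succeeded" if status == "ok" else f"finished with status {status!r}"
```

Nothing in the function looks at CPU, wait states, or file descriptors. If the summary says the job succeeded, the forensics are moot.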

The forgetting is the part that matters. An agent can’t learn from a conversation it won’t remember, so feedback that isn’t written into its standing context never happened. With people, documentation is one channel among several. With agents it’s the only one.

Some entries from the file:

Read the code before asserting what it does.

Don’t commit what you haven’t run.

Missing data is a finding. Never seed fake rows to make a test pass.

The odd setting is usually deliberate. Don’t edit configuration you don’t understand.

Ordinary junior coaching, except juniors make these mistakes one bad afternoon at a time. An agent repeats its bad afternoon across every session, in parallel, with perfect confidence, until somebody writes the rule down.

Other entries have no human precedent. Two agents sharing a branch produced a failure I had never needed a name for: one amended what it believed was its own commit and rewrote the other’s work, because HEAD had moved between its commands. The rules that came out of the wreckage (commit forward-only, never stash anyone’s working tree) amount to concurrency control for colleagues. No human team writes that down, because humans don’t interleave on a shared tree at millisecond resolution.
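The failure is a check that was never made: is HEAD still the commit I think it is? Here is a minimal sketch of that check, assuming the agent remembers the hash of the last commit it made. The names `head_sha`, `assert_head_unchanged`, and `HeadMoved` are hypothetical; `git rev-parse HEAD` is the real command for reading the current commit.

```python
import subprocess


class HeadMoved(RuntimeError):
    """HEAD no longer points at the commit the caller last created."""


def head_sha(repo: str) -> str:
    """Return the commit hash that HEAD points at right now."""
    result = subprocess.run(
        ["git", "-C", repo, "rev-parse", "HEAD"],
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout.strip()


def assert_head_unchanged(my_last_sha: str, head_now: str) -> None:
    """Refuse to rewrite history if someone else has committed since."""
    if head_now != my_last_sha:
        raise HeadMoved(
            f"expected HEAD at {my_last_sha[:12]}, found {head_now[:12]}"
        )
```

A caller would run `assert_head_unchanged(my_last_sha, head_sha("."))` before any history-rewriting command. The guard only narrows the window, because HEAD can still move between the check and the command that follows it. A forward-only rule removes the window, since a new commit never rewrites anyone’s work.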

Then there’s the bluffing. An agent once answered a question with specifics the cited page never contained. Real citation, invented claim, no awareness anywhere in the loop that a bluff was underway. I tried to prompt that away and failed, more than once. What works is a validation stage outside the agent that checks claims against sources. I stopped trying to talk it out of lying.
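A validation stage of this kind can start very small. The sketch below is a crude stand-in, assuming the agent is asked to attach a verbatim quote to every cited claim; it checks only that the quote appears in the cited source and makes no attempt to judge paraphrase. The names `Claim` and `unsupported_claims` and the source ids are hypothetical.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Claim:
    quote: str      # text the agent says the source contains
    source_id: str  # which source it cited


def _normalize(text: str) -> str:
    """Collapse whitespace and case so layout differences don't matter."""
    return re.sub(r"\s+", " ", text).strip().lower()


def unsupported_claims(claims, sources):
    """Return the claims whose quote is not found in the cited source.

    `sources` maps a source id to the full text of that source, fetched
    by the validator itself and never supplied by the agent.
    """
    failures = []
    for claim in claims:
        source_text = sources.get(claim.source_id)
        quote = _normalize(claim.quote)
        if (
            source_text is None                         # cited source missing
            or not quote                                # empty quote proves nothing
            or quote not in _normalize(source_text)     # source lacks the quote
        ):
            failures.append(claim)
    return failures
```

The point of the design is where it runs: outside the agent, on text the agent did not get to write.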

Rereading the file surprised me in one direction: nearly half the rules are accelerators, not brakes. Stop re-running the full check suite after a one-line edit. Stop presenting me with multiple choice when I already stated the criterion. Stop building shims for consumers that don’t exist. Tune an agent only with brakes and you get a very expensive, very timid engineer.

One cost has simply collapsed: review. I don’t ask the author to self-review, since its blind spots ride along in its context. I spawn a fresh agent to read the diff cold, with no investment in the change. A disinterested reviewer for every commit is a luxury no human team ever had.

When I give feedback to a person now, I notice I reach for the same shape. What happened, why it matters, what to do differently. I learned it writing for readers who would not remember me in the morning.