Writing
Essays on agentic systems in production, written from the inside.

The honest default is no
Four systems that do nothing alike share their most important line of code, the one that runs when the system has no idea, and in all four it refuses.

Inducing the schema instead of supplying it
The unsolved part of document AI is discovering the columns, not filling them, and a column should only exist if it pays for itself in bits.

I built it one validation too early
The abstractions I have had to delete were not wrong, they were premature, built to answer a question the work had not yet asked.

A good average hides the column that matters
A synthetic dataset that scores well on average can still be useless and unsafe, so the only scores worth trusting gate on the worst part and fail closed.

Point, don't drag
Drag-and-drop is the wrong handle for steering a model that builds your interface, because the thing you grab and the thing it wrote stopped being one-to-one the moment a list appeared.

The harness remembers what the agent forgets
The coding agent starts every build with no memory of the last one, so the system that runs it got better by writing down where the agent kept failing.

Letting an agent merge to main
Autonomy is not something the model has. It is the set of gates you are willing to put around it, and an audit trail of everything it did inside them.

Watching an agent work
The interesting part of an agent is not that it acts, it is the shared room where you can see what it sees, point at things, and take the controls back.

A turn is not a request
An agent's unit of work is a long-lived process, and the moment you treat it like an HTTP request you lose the run every time a laptop sleeps.

Managing engineers who never sleep
A year of running AI coding agents turned into an accidental management apparatus, written one rule at a time.

What it takes to let agents touch enterprise data
Trust in agentic systems is an architecture property, not a model property.

Your gold set is lying to you
Reference answers are artifacts with bugs, and they are the only software in the stack that nobody code-reviews.

An answer you cannot audit is worth nothing
Provenance is a data-model property, not a UI feature, and it cannot be retrofitted.

Authorization for answers
We have learned to secure what agents do. The harder question is what they are allowed to know.

Evaluating systems that answer in sentences
Evaluation infrastructure is the difference between a demo and a product.

LLM spend is an attribution problem
A month of anonymous model spend taught me that cost governance for AI is identity infrastructure, and the discipline that fixes it already has a name.

Data infrastructure and AI infrastructure are different disciplines
They optimize for different things, fail in different ways, and the hard problems live at the seam.

Why your knowledge graph isn't helping your RAG
We nearly published the fashionable verdict that GraphRAG is hype. The audit found something better, the structural reasons the published architectures break on enterprise data.

The physical world does not retry idempotently
What burger-making robots taught me about agents acting on production data.

The model designs, the code enforces
In an LLM ingestion pipeline, the model gets the judgment that requires reading the document and nothing else. The surprise is how badly the prompt wants to violate that in both directions.

Database lessons for the agent era
Agent platforms are rediscovering, one incident at a time, what database engines settled decades ago.

Tables are images
Meaning in a table lives in its geometry, and the architecture that survives a real corpus is a per-table router, not an ideology.

What regulation does to architecture
Years of healthcare engineering taught me that compliance, taken seriously, is a design input that produces better systems.

When documents become databases
Schema discovery is a corpus-statistics problem before it is a modeling problem.

Free like a puppy, times fifteen
A product assembled from open source is not the sum of its components. It is the resolution of their disagreements.

Forcing the race
Race conditions are not hard to test. They are untested, because teams accept probabilistic reproduction.