Ali Rathore

May 2026

An answer you cannot audit is worth nothing

Provenance is a data-model property, not a UI feature, and it cannot be retrofitted.

There is a moment in every enterprise AI demo where the system produces a fluent, confident, plausible answer, and everyone in the room is impressed, and nobody can say whether it is true. That moment is where most document-AI projects quietly die, months later, when someone asks the only question that matters for a business decision: how do I know this is right?

The pitch for AI over documents is that it removes human reading from the loop. But an answer that arrives without evidence defers the reading. If the system says the contract permits early termination and cannot show me where, then before anyone acts on that answer, a person has to go find the clause, which is the work the system was bought to eliminate. An unverifiable answer costs the full price of verification, every time, forever.

So the property that decides whether these systems reach production is not answer quality. It is whether the system can show its work: this answer, because of this passage, on this page, in this document. I have watched that single property move customers from demo to deployment, and its absence keep better-sounding systems in pilot purgatory indefinitely.

Here is the part most teams learn too late: provenance is a property of the data model, and either every stage of the pipeline preserves it or it does not exist.

Trace what happens to a fact on its way to an answer. A document is parsed. The parse is split into chunks. Chunks are extracted into fields, embedded, indexed, retrieved, reranked, stuffed into a context window, and finally transformed by a model into prose. That is seven or eight transformations, and provenance survives only if every single one carries the source pointer forward. One stage that drops it, anywhere in the chain, and the lineage is severed: the final answer can gesture at “the corpus” but can no longer point at the page.

This is why provenance cannot be retrofitted in any honest way. Teams try, usually by bolting citation onto the end: ask the model which sources it used, or run a similarity search from the answer back into the index and present the nearest chunks as references. Both produce something that looks like provenance and is actually decoration. The model’s account of its own reasoning is a generated artifact with the same epistemic status as the answer itself. Post-hoc similarity finds passages that resemble the answer, which is not the same as the passages the answer came from, and the difference is precisely the cases that matter: the confident hallucination dressed in a plausible citation. If the pipeline destroyed lineage at ingestion, no amount of cleverness at the end recovers it. The information is simply not there.

Building it for real means designing backwards from the requirement. Parsing has to preserve layout and position, because a fact’s address in the document is part of the fact. Extraction has to be schema-aware, producing structured fields that each carry their source span, rather than prose that carries nothing. Every index entry keeps its pointer. And extraction has to be verified: an independent pass that checks extracted values against their claimed sources and repairs what fails. A citation pointing at a page that does not say what the system claims teaches the user, correctly, that the citations are theater.

None of this is free. Provenance-first design taxes every stage of the pipeline, which is why it loses every short-term argument against the quick path, and why it has to be a day-one constraint imposed by someone with the authority to hold it. It pays at the moment of truth: the security review, the auditor’s question, the executive who asks “where does it say that” in the meeting where the deal is decided.

The deeper shift is in what these systems are for. The industry frames AI over documents as a machine for producing answers. Enterprises need answers they can defend: to an auditor, a regulator, a counterparty, a court.