
PAPER / TR-2026-07

Methodology / 2026

Receipts as training data

Governance receipts produced at runtime are not audit logs. They are labelled training examples, generated automatically as a side effect of governed agent action. This note explains what a receipt contains, why that makes it dense training material, and what the Rocky series demonstrated when models were trained exclusively on them.

2026-05-27

In early 2026, the receipts Trace was producing started getting richer.

What began as a decision record (allow or deny, action class, outcome) was accumulating more: semantic action type, mutation scope, data sensitivity, the full authority state at the moment of the decision, and eventually the observed outcome linked back to the original intent. Each receipt was becoming a complete picture of one agent decision, in context, with consequences attached.

That is when a different question became possible. Not "what happened?" but "can a model learn from this?"

THE THESIS

A governance receipt is a labelled training example. The label is the governance decision itself, produced automatically by the system. The input is the full evidence state. No human annotation required. No synthetic generation. The training data is a byproduct of governance.

What a receipt contains

Every time Trace intercepts an agent action, it produces a receipt. That receipt records:

  • The proposed action and its target
  • The authority state at decision time: lease validity, operator certification, scope
  • The evidence supporting or undermining the action
  • The governance decision: allow, deny, or freeze
  • The reason the decision was made
  • A cryptographic signature binding all of the above
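A minimal sketch of that record as a data structure, assuming a flat field layout. The `Receipt` class, its field names, and the `sign`/`verify` helpers are hypothetical. HMAC-SHA256 over canonical JSON stands in for whatever signature scheme Trace actually uses; a production system would more likely use an asymmetric signature so that verifiers never hold the signing key.

```python
import hashlib
import hmac
import json
from dataclasses import asdict, dataclass, replace


@dataclass(frozen=True)
class Receipt:
    """Hypothetical receipt layout mirroring the six items listed above."""
    action: str                # proposed action
    target: str                # what the action would touch
    lease_valid: bool          # authority state at decision time...
    operator_certified: bool
    scope: str                 # ...including the scope the lease grants
    evidence: tuple            # evidence supporting or undermining the action
    decision: str              # "allow", "deny" or "freeze"
    reason: str                # why the decision was made
    signature: str = ""        # set by sign(); binds every field above

    def payload(self) -> bytes:
        # Canonical JSON of every field except the signature itself.
        body = asdict(self)
        del body["signature"]
        return json.dumps(body, sort_keys=True, separators=(",", ":")).encode()


def sign(receipt: Receipt, key: bytes) -> Receipt:
    sig = hmac.new(key, receipt.payload(), hashlib.sha256).hexdigest()
    return replace(receipt, signature=sig)


def verify(receipt: Receipt, key: bytes) -> bool:
    expected = hmac.new(key, receipt.payload(), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, receipt.signature)


key = b"demo-key"  # illustrative only; never hard-code real keys
r = sign(Receipt("db.delete_rows", "orders", False, True, "read-only",
                 ("lease expired",), "deny", "lease expired"), key)
print(verify(r, key))                             # True
print(verify(replace(r, decision="allow"), key))  # False: any edit breaks the signature
```

Because the signature covers the decision along with the authority state, a label cannot be quietly rewritten without verification failing.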

The governance decision is the label. The rest is the input. The receipt is a structured, immutable, automatically labelled training example.
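Turning a receipt into a supervised example is then a simple projection. The sketch below assumes receipts arrive as flat dicts with the hypothetical field names shown. It drops `reason` from the input because the reason restates the decision and would leak the label, and it drops `signature` because the signature carries no decision-relevant signal. Both are design choices, not something the note specifies.

```python
LABELS = ("allow", "deny", "freeze")
NOT_INPUT = {"decision", "reason", "signature"}


def to_example(receipt: dict) -> tuple[dict, int]:
    """Split a receipt into (input features, integer label)."""
    features = {k: v for k, v in receipt.items() if k not in NOT_INPUT}
    label = LABELS.index(receipt["decision"])
    return features, label


receipt = {
    "action": "db.delete_rows", "target": "orders",
    "lease_valid": False, "operator_certified": True, "scope": "read-only",
    "decision": "deny", "reason": "lease expired", "signature": "ab12",
}
x, y = to_example(receipt)
print(sorted(x))  # ['action', 'lease_valid', 'operator_certified', 'scope', 'target']
print(y)          # 1, i.e. "deny"
```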

This is not how most training data is produced.

Why receipt data is different

Most fine-tuning data for agentic AI comes from one of three sources:

  • Synthetic: generated by another model, so it inherits that model's biases and blind spots.
  • Human-labelled: expensive, slow, and labelled outside the production context where decisions actually matter.
  • Scraped: broad coverage, but almost no relevance to specific agent behaviour in specific systems.

None of these capture what actually happened when a real agent, operating in a real system, made a real decision with real consequences.

There is also a subtler failure. Most training data does not make permission a first-class variable. A model trained on standard action logs learns surface patterns: which actions tend to appear in allowed contexts, which tend to get blocked. It learns "action X is usually allowed," not "action X is allowed when the authority conditions are met." Flip the lease from valid to expired, and the model may still output allow, because it learned the action, not the permission boundary.
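The failure can be reproduced with two toy majority-vote learners trained on invented data. One is keyed on the action alone. The other is keyed on the action together with lease state. Neither is how Rocky works. The point is only that a learner which never sees the permission variable cannot respond when that variable flips.

```python
from collections import Counter, defaultdict

# Invented toy history: (action, lease_valid, decision).
history = [
    ("deploy", True, "allow"),
    ("deploy", True, "allow"),
    ("deploy", True, "allow"),
    ("deploy", False, "deny"),
]


def fit_surface(rows):
    """Majority decision per action, ignoring authority state."""
    counts = defaultdict(Counter)
    for action, _lease, decision in rows:
        counts[action][decision] += 1
    return {a: c.most_common(1)[0][0] for a, c in counts.items()}


def fit_conditioned(rows):
    """Majority decision per (action, lease_valid) pair."""
    counts = defaultdict(Counter)
    for action, lease, decision in rows:
        counts[(action, lease)][decision] += 1
    return {k: c.most_common(1)[0][0] for k, c in counts.items()}


surface = fit_surface(history)
conditioned = fit_conditioned(history)

# Flip the lease to expired: the surface learner still answers from the action alone.
print(surface["deploy"])               # allow
print(conditioned[("deploy", False)])  # deny
```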

Receipt data is different on every dimension. It is real, not simulated. It is contextual: intent, authority, scope, and outcome are all present in the same artifact. It is automatically labelled: the governance system produces the label as part of doing its job. And it is immutable: the ground truth cannot be revised after the fact.

Critically, the label was produced by the actual permission check. Not inferred from surface patterns. Not guessed by another model. The governance decision is what it is because the authority state was what it was. That is what makes receipt labels authoritative: they are ground truth, not annotation.
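Here is a minimal sketch of what it means for the label to come from the check itself. The same function that enforces the decision also appends the receipt, so the label exists the moment the action is governed. The rules in `decide`, including when to freeze rather than deny, are hypothetical placeholders and not Trace's policy.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Authority:
    lease_valid: bool
    operator_certified: bool
    scope: frozenset  # actions the current lease covers


def decide(action: str, auth: Authority) -> tuple[str, str]:
    """Hypothetical permission check returning (decision, reason)."""
    if not auth.lease_valid:
        return "deny", "lease expired"
    if action not in auth.scope:
        return "deny", f"{action} outside leased scope"
    if not auth.operator_certified:
        return "freeze", "operator not certified; hold for review"
    return "allow", "authority conditions met"


def govern(action: str, auth: Authority, receipts: list) -> str:
    decision, reason = decide(action, auth)
    # Recording the receipt is part of enforcement: the label is never annotated later.
    receipts.append({
        "action": action,
        "lease_valid": auth.lease_valid,
        "operator_certified": auth.operator_certified,
        "scope": sorted(auth.scope),
        "decision": decision,
        "reason": reason,
    })
    return decision


receipts = []
lease = Authority(lease_valid=True, operator_certified=True, scope=frozenset({"deploy"}))
govern("deploy", lease, receipts)
govern("deploy", Authority(False, True, frozenset({"deploy"})), receipts)
print([r["decision"] for r in receipts])  # ['allow', 'deny']
```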

The loop

The relationship between governance and learning is not one-directional.

Governed agent actions produce receipts. Receipts are training material for models that make governance decisions. Better-trained models produce sharper governance decisions. Sharper decisions produce higher-quality receipts. The loop closes.

This is not theoretical. The Trace system has generated 46,169 receipts to date. The pattern distiller has extracted 279 distinct governance patterns from that corpus. Those patterns are already being read by the system and fed back into governance enforcement, without any model retraining, purely through structured memory.
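The note does not describe how the pattern distiller works. As a rough illustration only, one simple way to distil patterns from a receipt corpus is to group receipts by action type and authority configuration, then keep the groups whose decisions agree often enough. Enforcement can consult the resulting table as structured memory without any retraining. The grouping keys, thresholds, and `distill`/`recall` names are assumptions, and so is the question of whether a recalled pattern acts as a hint or an override.

```python
from collections import Counter, defaultdict

KEYS = ("action_type", "lease_valid", "scope")  # hypothetical grouping keys


def distill(receipts, min_support=3, min_confidence=0.9):
    """Return {key: pattern} for configurations with consistent past decisions."""
    groups = defaultdict(Counter)
    for r in receipts:
        groups[tuple(r[k] for k in KEYS)][r["decision"]] += 1
    patterns = {}
    for key, counts in groups.items():
        total = sum(counts.values())
        decision, n = counts.most_common(1)[0]
        if total >= min_support and n / total >= min_confidence:
            patterns[key] = {"decision": decision, "support": total, "confidence": n / total}
    return patterns


def recall(patterns, incoming):
    """Look up a prior pattern for an incoming action (None if there is none)."""
    return patterns.get(tuple(incoming[k] for k in KEYS))


corpus = (
    [{"action_type": "delete", "lease_valid": False, "scope": "prod", "decision": "deny"}] * 3
    + [{"action_type": "read", "lease_valid": True, "scope": "prod", "decision": "allow"}] * 2
)
patterns = distill(corpus)
print(len(patterns))  # 1: the "read" group has only 2 receipts, below min_support
print(recall(patterns, {"action_type": "delete", "lease_valid": False, "scope": "prod"}))
```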

The fine-tuning version of this loop operates at larger scale. But the mechanism is the same: governance labels its own data as a side effect of running.

What Rocky proved

The Rocky series is the empirical test of this thesis.

Rocky-α and Rocky-DAM-α were trained exclusively on receipt-shaped evidence: structured records combining proposed action, authority state, governance fields, and, in the DAM variant, physical sensor state. No other training signal. No large pre-trained backbone. No synthetic augmentation beyond what the counterfactual methodology produced from the receipts themselves.

A 6M-parameter model trained on this data learned to make sharp allow, deny, and freeze decisions on scenarios it had never encountered. It almost never permitted actions it should have blocked. When it erred, it erred conservative. It got the direction of failure right.
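Checking the direction of failure comes down to a tally over predictions. The minimal sketch below assumes three labels. It treats any `allow` prediction on a `deny` or `freeze` case as the unsafe direction. Confusion between `deny` and `freeze` is counted separately, because the note does not rank those two against each other.

```python
from collections import Counter


def error_direction(truth, pred):
    """Tally correct predictions and classify each error by direction."""
    tally = Counter()
    for t, p in zip(truth, pred):
        if t == p:
            tally["correct"] += 1
        elif p == "allow":
            tally["permissive"] += 1      # permitted something that should have been blocked
        elif t == "allow":
            tally["conservative"] += 1    # blocked or froze something that was permitted
        else:
            tally["deny_freeze_swap"] += 1
    return tally


truth = ["allow", "deny", "freeze", "allow", "deny"]
pred = ["allow", "deny", "deny", "freeze", "allow"]
print(error_direction(truth, pred))
```

A model that "almost never permitted actions it should have blocked" shows a near-zero `permissive` count. Erring conservative means its remaining errors land in the `conservative` bucket.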

Rocky-α was trained on approximately 3,000 real Trace receipts: live governance decisions produced by agents operating under Trace in software contexts. Rocky-DAM-α was trained on synthetic receipts generated from real infrastructure schemas, because live dam governance data does not yet exist at training scale. The receipt structure is identical in both cases; only the source differs. Real receipts for the software model. Synthetic receipts, shaped like real ones, for the physical infrastructure model. It was that receipt-shaped training set that produced Rocky-DAM-α, a model capable of generalising to novel physical scenarios and applying learned governance rules to inputs it had never encountered.

The density of the signal compensated for the volume. Every receipt carried a real label. Every label reflected a real governance decision. There was no noise from irrelevant examples, no ambiguity from unclear annotation, no domain mismatch between training data and deployment context. The corpus was small and precise, not large and approximate.

THE MECHANISM

Receipts are dense training material because they are produced by the exact task the model is trained to perform.

The receipts themselves can be made denser still. Pairing each example with a minimally edited twin, one accountability variable flipped, forces the model to attend to the permission boundary rather than the action surface. Accountability variables only become first-class when training data makes them counterfactually distinguishable. That method, and the results it produced, are covered in TR-2026-09.
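Here is a sketch of twin construction. It assumes a small set of boolean accountability fields, and it assumes the governance check can be re-run on each twin to label it. This illustrates the idea only and is not the method reported in TR-2026-09. The `Example`, `decide`, and `twins` names and the decision rules are hypothetical. Keeping only the flips that change the decision is one choice among several.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Example:
    action: str
    lease_valid: bool
    operator_certified: bool
    in_scope: bool


def decide(ex: Example) -> str:
    """Stand-in for the governance check that labels each twin (hypothetical rules)."""
    if not ex.lease_valid or not ex.in_scope:
        return "deny"
    if not ex.operator_certified:
        return "freeze"
    return "allow"


ACCOUNTABILITY = ("lease_valid", "operator_certified", "in_scope")


def twins(ex: Example):
    """Yield (flipped field, twin, twin label) for each single flip that changes the decision."""
    base = decide(ex)
    for name in ACCOUNTABILITY:
        twin = replace(ex, **{name: not getattr(ex, name)})
        label = decide(twin)
        if label != base:
            yield name, twin, label


ok = Example("deploy", lease_valid=True, operator_certified=True, in_scope=True)
for name, twin, label in twins(ok):
    print(name, label)
# lease_valid deny
# operator_certified freeze
# in_scope deny
```

Each twin shares the action with its original and differs in exactly one authority variable, so the only way to fit both labels is to attend to that variable.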

The immutability principle

The system learns from receipts. It cannot change them.

These are two separate layers. Receipts are signed at creation and immutable thereafter. The learning layer reads the receipt stream and updates its own behaviour, but it has no write path back into the record. A corrupted or poorly calibrated model cannot alter the historical record. The audit chain remains intact regardless of what the model does with it.
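One way to sketch the separation is an append-only, hash-chained store. The learning layer only ever receives fresh copies, and any edit to a stored entry breaks the chain. This is an assumption about structure, not Trace's storage design, and `ReceiptLog` is a hypothetical name. Python does not enforce private attributes, so real immutability would come from storage permissions and signatures rather than from the class itself.

```python
import hashlib
import json

GENESIS = "0" * 64


class ReceiptLog:
    """Append-only, hash-chained receipt store (illustrative sketch)."""

    def __init__(self):
        # Each entry is (canonical JSON body, chained SHA-256 digest).
        self._entries = []

    def append(self, receipt: dict) -> str:
        prev = self._entries[-1][1] if self._entries else GENESIS
        body = json.dumps(receipt, sort_keys=True, separators=(",", ":"))
        digest = hashlib.sha256((prev + body).encode()).hexdigest()
        self._entries.append((body, digest))
        return digest

    def read(self):
        """Learning-layer view: yields fresh copies, so readers cannot mutate the log."""
        for body, _ in self._entries:
            yield json.loads(body)

    def verify(self) -> bool:
        """Recompute the chain; an edited or reordered entry breaks it."""
        prev = GENESIS
        for body, digest in self._entries:
            if hashlib.sha256((prev + body).encode()).hexdigest() != digest:
                return False
            prev = digest
        return True


log = ReceiptLog()
log.append({"action": "open_gate", "decision": "deny"})
log.append({"action": "read_level", "decision": "allow"})

for r in log.read():
    r["decision"] = "allow"  # a misbehaving reader edits its own copy...
print([r["decision"] for r in log.read()])  # ['deny', 'allow']: the log is unchanged
print(log.verify())  # True
```

A hash chain alone does not detect truncation of the most recent entries. Anchoring or signing the latest digest outside the log closes that gap.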

This matters practically. For an infrastructure governance system, the receipt is not just a training artifact. It is the proof that a decision was made, by what authority, on what evidence. That proof must survive the model that reads it. Immutability is not a design preference. It is what makes the system trustworthy enough to operate near something consequential.

What compounds

A receipt from week one is useful. A receipt from month twelve, in the context of eleven months of prior receipts from the same agent operating in the same domain, is something more.

The history gives context. The context gives meaning. A governance decision that looks routine in isolation becomes significant when the receipt stream shows it is the first time a particular authority configuration has appeared, or that it follows a pattern of escalating scope requests, or that it is the exception in a domain where every prior similar action was denied.
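The three situations above can be expressed as features that only exist relative to earlier receipts. Here is a minimal sketch, assuming hypothetical fields `agent`, `action_type`, `lease_valid`, `operator_certified`, `scope_size` and `decision`. The sketch computes "all prior similar actions were denied" without looking at the current decision, so the feature does not leak the label. The exception the text describes is a receipt whose decision is `allow` while that feature is true.

```python
from collections import defaultdict


def contextualise(stream):
    """Annotate each receipt with features defined only relative to earlier receipts."""
    seen_configs = set()
    last_scope = {}               # agent -> scope size of its previous request
    escalation_run = {}           # agent -> consecutive scope increases so far
    prior = defaultdict(list)     # action type -> earlier decisions
    out = []
    for r in stream:
        config = (r["lease_valid"], r["operator_certified"])
        agent, size = r["agent"], r["scope_size"]
        prev_size = last_scope.get(agent)
        run = escalation_run.get(agent, 0) + 1 if prev_size is not None and size > prev_size else 0
        history = prior[r["action_type"]]
        out.append({
            **r,
            "novel_authority_config": config not in seen_configs,
            "scope_escalation_run": run,
            "all_prior_similar_denied": bool(history) and all(d == "deny" for d in history),
        })
        seen_configs.add(config)
        last_scope[agent] = size
        escalation_run[agent] = run
        history.append(r["decision"])
    return out


stream = [
    {"agent": "a", "action_type": "write", "lease_valid": True, "operator_certified": True, "scope_size": 1, "decision": "deny"},
    {"agent": "a", "action_type": "write", "lease_valid": True, "operator_certified": True, "scope_size": 2, "decision": "deny"},
    {"agent": "a", "action_type": "write", "lease_valid": True, "operator_certified": False, "scope_size": 3, "decision": "allow"},
]
for row in contextualise(stream):
    print(row["novel_authority_config"], row["scope_escalation_run"], row["all_prior_similar_denied"])
# True 0 False
# False 1 True
# True 2 True
```

The third receipt is routine in isolation. In context, it carries a new authority configuration, it is the second consecutive scope increase, and it is an allow where every earlier similar action was denied.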

This is what synthetic data cannot replicate. The corpus is not just a collection of labelled examples. It is a record of an agent operating in a specific environment over time. That record is institutional memory. Its value accumulates with use.

The strategic implication is direct. Whoever accumulates the largest corpus of real, labelled, domain-specific governance receipts holds an asymmetric advantage in training the next generation of agents for that domain. The corpus cannot be purchased, scraped, or synthesised after the fact. It only exists because governance ran. An organisation that has been running agents under governance for two years has two years of labelled, domain-specific decision history. There is no shortcut to that.

What is not yet answered

Two questions remain open.

The first is volume. At what receipt count does fine-tuning become viable compared to in-context learning? Approximately 3,000 receipts trained a 6M-parameter model to useful accuracy on a structured task. Whether that threshold holds for larger models, broader domains, or more complex decision spaces requires further measurement.

The second is chaining. When Agent A calls Agent B, both under governance, the receipts from each run contain partial pictures of the same decision chain. How cross-agent receipt sequences affect training signal quality is not yet characterised. The loop works cleanly for single-agent governance. The multi-agent case introduces dependencies that need their own treatment.

Both are tractable. Neither is a blocker for the thesis. The receipts are the training data. Rocky demonstrated it. The loop is running.

GET INVOLVED

Interested in the receipt flywheel? Reach out to james@transientintelligence.com.


TR-2026-07 / Methodology / 2026 / 2026-05-27. Active research. Not deployment-ready. If you want to discuss applying this work in your domain, reach out about becoming a design partner.