PAPER TR-2026-02

Software governance / 2026 / 2026-05-23

Rocky-α

A 6M-parameter model trained from scratch on structured Trace execution logs learned sharp allow, deny, and freeze boundaries in software-action contexts. Evaluated against scenarios it had never encountered, it almost never permitted actions it should have blocked. When it made mistakes, it stopped actions it should have allowed, not the reverse. Proof that governance receipts are dense training material for small, specialised decision engines.

The central finding of Rocky-α is not in the model. It is in what the model was trained on.

Rocky-α is a 6M-parameter model trained from scratch on structured Trace execution logs: real records of agent actions, each capturing what was proposed, what authority was present, and what policy decided. Those records are governance receipts. A model trained exclusively on them learned to make sharp allow, deny, and freeze decisions on software actions it had never seen.

Governance receipts are not audit logs. They are training data. The full case for why is in TR-2026-07.

THE FINDING

Each time Trace intercepts an action, it produces a receipt. That receipt is what Rocky-α was built from. The model that governs the next action was trained on the records of the last ones. That loop exists. Rocky-α is the first evidence that it produces real signal.

What Rocky-α is

Rocky-α is a small, single-purpose classifier:

6M parameters, trained from scratch
Receipt-shaped input: proposed action, tool class, target, scope, authority state, evidence freshness
Three outputs only: allow, deny, freeze
Software-action contexts: no physical sensors, no telemetry

It is the software-side baseline. Rocky-DAM-α extends the same architecture to sensor-conditioned physical infrastructure governance. Rocky-α is the foundation that experiment builds on.

The model produces no prose. No explanation. A decision.
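As a rough illustration of the input and output shape described above, a receipt and its three-way decision could be sketched as follows. The field names come from the list above; the value types, the example values, and the `to_model_input` encoding are assumptions made for illustration, not the format Trace or Rocky-α actually uses.

```python
from dataclasses import asdict, dataclass
from enum import Enum


class Decision(Enum):
    """The only three outputs the classifier can produce."""
    ALLOW = "allow"
    DENY = "deny"
    FREEZE = "freeze"


@dataclass(frozen=True)
class Receipt:
    # Field names mirror the input list above; string types are an assumption.
    proposed_action: str
    tool_class: str
    target: str
    scope: str
    authority_state: str
    evidence_freshness: str


def to_model_input(receipt: Receipt) -> str:
    """Flatten a receipt to one 'key=value' line per field (hypothetical encoding)."""
    return "\n".join(f"{name}={value}" for name, value in asdict(receipt).items())


# Hypothetical example values, not taken from any real Trace log.
example = Receipt(
    proposed_action="delete_branch",
    tool_class="git",
    target="repo/main",
    scope="write",
    authority_state="expired",
    evidence_freshness="stale",
)
print(to_model_input(example))
```

The point of the sketch is the narrowness of the interface: six structured fields in, one of three labels out.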

Why build it

Every action an autonomous agent takes is a governance decision. Should this proceed? The industry's current answer is one of two things: prompt a large general model to be careful, or write static rules that block specific actions.

Neither scales. A prompted general model is statistically unreliable on binary decisions it was not trained for. Static rules are rigid and blind to context. Neither approach gets better over time.

Rocky-α is a third path. Instead of asking a general model to infer governance, or encoding rules by hand, the question was: can a model learn the governance boundary directly from the records of governed actions? Every time an agent acts under Trace, a receipt is produced. That receipt captures what was proposed, what authority was present, what evidence supported it, and what policy decided. Train a model on those records, and the governance boundary is not described to it. It is demonstrated.

The bet was that demonstration, at sufficient volume and with the right training discipline, produces a model that generalises. Rocky-α tests that bet in software-action contexts.

Why 6M parameters

The standard assumption in AI is that capability scales with size. Rocky-α challenges that for narrow, structured tasks.

The task here does not require open-ended reasoning, broad world knowledge, or general language understanding. It requires one thing: sensitivity to the permission boundary in a structured record. A frontier model brings enormous capacity to a task that does not need it. That capacity introduces variance. It also introduces cost. A 6M model trained for this specific task is cheaper to run, faster to retrain, and more inspectable than a general model wrapped in a governance prompt.

The more important argument is the training cycle. At 6M parameters, Rocky-α retrains in hours on a single machine. That is not a footnote. It is the reason the receipt loop is practical, not just theoretically appealing. New receipts accumulate. The model retrains. The governance boundary tightens. No GPU cluster. No weeks of compute. The model that governs the next action can be updated on the receipts from the last thousand, and the cost of doing so is negligible.

The size is not a limitation. It is an architectural choice. Narrow task, narrow model, fast cycle.

Evaluation

Most models trained for governance tasks learn the action label, not the permission boundary. They score well on standard accuracy tests by memorising which actions tend to be allowed. The model is not reading the permission slip. It is reading the action name.

Rocky-α was evaluated specifically to expose that failure. The test set was held out and never seen during training. Checkpoint selection was made on a separate validation split. The evaluation measured whether Rocky-α's decisions changed when authority fields changed, independent of the action's surface form. A model that had memorised labels would fail. A model that had learned the boundary would not.
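One way to picture this kind of test is a counterfactual pair check: hold the action fixed, change only the authority field, and count how often the decision changes. The sketch below is an assumption about how such a check could be written, not the evaluation harness used for Rocky-α. The function name `authority_sensitivity` and both toy classifiers are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Receipt = Dict[str, str]
Classifier = Callable[[Receipt], str]  # returns "allow", "deny", or "freeze"


def authority_sensitivity(
    classify: Classifier,
    pairs: List[Tuple[Receipt, Receipt]],
) -> float:
    """Fraction of counterfactual pairs on which the decision changes.

    Each pair holds two receipts that differ only in their authority field
    and whose correct decisions differ.
    """
    if not pairs:
        raise ValueError("need at least one counterfactual pair")
    changed = sum(classify(a) != classify(b) for a, b in pairs)
    return changed / len(pairs)


def label_memoriser(receipt: Receipt) -> str:
    # Toy stand-in: reads only the action name, ignores authority.
    return "deny" if receipt["proposed_action"].startswith("delete") else "allow"


def boundary_reader(receipt: Receipt) -> str:
    # Toy stand-in: reads the authority field.
    return "allow" if receipt["authority_state"] == "valid" else "deny"


base = {"proposed_action": "delete_branch", "authority_state": "valid"}
flipped = {**base, "authority_state": "expired"}
pairs = [(base, flipped)]

print(authority_sensitivity(label_memoriser, pairs))
print(authority_sensitivity(boundary_reader, pairs))
```

A classifier that keys on the action name scores zero on this check however high its ordinary accuracy is, which is the failure the evaluation was designed to expose.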

Rocky-α showed it was attending to the right variables.

Mean overall accuracy: 74%
Unsafe-proceed (mean): 3.9%
Unsafe-proceed (worst): 9.0%
Deny accuracy (mean): 94%

Metric	Mean	Best	Worst
Overall accuracy	74%	86%	59%
Allow accuracy	40%	59%	18%
Deny accuracy	94%	100%	88%
Freeze accuracy	87%	100%	64%
Unsafe-proceed	3.9%	0.0%	9.0%

To read these numbers correctly, start with what the task actually asks. There are three possible outputs: allow, deny, and freeze. Unlike a binary yes/no decision, a random guess here is right one time in three, not one time in two. With three classes balanced across the evaluation set, a model that guesses randomly scores 33%. A model that always predicts deny also scores 33%. A model that never allows, even one that gets every deny and freeze case right, tops out at 67% by never taking a risk. Rocky-α at 74% mean overall accuracy, on held-out scenarios it has never seen and that were designed specifically to expose surface-form shortcuts, is doing something real.
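The baseline arithmetic can be checked in a few lines. The sketch assumes, as the text states, that the three classes are balanced in the evaluation set. It also assumes that overall accuracy is then the unweighted mean of the per-class accuracies, which is an inference from the balance claim rather than something the report spells out.

```python
from fractions import Fraction

CLASSES = ("allow", "deny", "freeze")
share = Fraction(1, len(CLASSES))  # balanced: each class is one third of the set

random_guess = share            # right one time in three on every example
always_deny = share             # correct only on the deny third
never_allow_ceiling = 2 * share  # perfect on deny and freeze, wrong on every allow

# Mean per-class accuracies in percent, copied from the table above.
mean_per_class = {"allow": 40, "deny": 94, "freeze": 87}
balanced_overall = sum(mean_per_class.values()) / len(CLASSES)

print(f"random guess:        {float(random_guess):.0%}")
print(f"always deny:         {float(always_deny):.0%}")
print(f"never-allow ceiling: {float(never_allow_ceiling):.0%}")
print(f"mean of per-class accuracies: {balanced_overall:.1f}%")
```

Under those assumptions the per-class means average to 73.7%, which rounds to the reported 74% and sits above the 67% never-allow ceiling.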

The metric that matters most is unsafe-proceed. A 3.9% mean rate means the model rarely permits actions it should block. When it does err, the 94% deny accuracy and 87% freeze accuracy tell you where the weight of the model sits: it is built around caution, not permission. The 40% allow accuracy looks weak in isolation. In context it is the expected shape of a conservative governance model. It stops more than it needs to. It almost never lets through what it should not.
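For concreteness, these metrics can be computed from (expected, predicted) pairs as sketched below. The report does not state the denominator used for unsafe-proceed, so the sketch assumes it is the set of cases that should have been blocked (expected deny or freeze). The function name and the toy data are hypothetical and are not Rocky-α results.

```python
from collections import Counter
from typing import Dict, List, Tuple

BLOCKING = {"deny", "freeze"}


def governance_metrics(pairs: List[Tuple[str, str]]) -> Dict[str, float]:
    """Per-class accuracy, overall accuracy, and unsafe-proceed rate.

    pairs: (expected, predicted) decisions, each "allow", "deny", or "freeze".
    """
    if not pairs:
        raise ValueError("need at least one labelled decision")
    totals = Counter(expected for expected, _ in pairs)
    correct = Counter(expected for expected, predicted in pairs if expected == predicted)
    metrics = {f"{cls}_accuracy": correct[cls] / totals[cls] for cls in totals}
    metrics["overall_accuracy"] = sum(correct.values()) / len(pairs)

    # Assumption: unsafe-proceed = share of should-block cases predicted "allow".
    should_block = [predicted for expected, predicted in pairs if expected in BLOCKING]
    unsafe = sum(predicted == "allow" for predicted in should_block)
    metrics["unsafe_proceed"] = unsafe / len(should_block) if should_block else 0.0
    return metrics


# Toy data for illustration only.
toy = [
    ("allow", "allow"), ("allow", "deny"),
    ("deny", "deny"), ("deny", "deny"),
    ("freeze", "freeze"), ("freeze", "allow"),
]
print(governance_metrics(toy))
```

In the toy data the model over-stops one allow case and wrongly permits one freeze case. Only the second error counts toward unsafe-proceed, which is why that metric can stay low while allow accuracy is weak.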

The 27-point swing between best and worst overall accuracy (86% versus 59%) is not noise. Performance depends strongly on how evidence is presented, not just on what it contains. The boundary recognition is real. The format sensitivity is the next problem to solve.

For reference: in TR-2026-01, the Rocky-DAM-α variant was benchmarked directly against GPT-4o-mini and GPT-4.1-mini on the same output space. General-purpose frontier mini models either refused the majority of valid actions (causing operational paralysis) or introduced unsafe proceed at a higher rate. Neither failure mode is acceptable. They are just different kinds of wrong.

The conservative shape

Rocky-α's errors are structured, not random.

Deny recognition is strong across all conditions: mean 94%, never below 88%. Freeze recognition is strong on familiar receipt formats, reaching 100% on two of five evaluation conditions. The unsafe-proceed rate is low.

The weakness is allow recovery, especially under representation shifts. The model was evaluated across five different ways of presenting the same receipt information: different field orderings, different rendering formats, structured versus plain layouts. When the presentation changed, even though the underlying evidence was identical, freeze accuracy degraded from 100% toward 64-69%. The model becomes conservative to the point of inaccuracy: it stops actions it should allow and some it should freeze, collapsing toward a blanket block posture rather than a calibrated one.
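A representation shift of this kind can be sketched as several renderings of one receipt, followed by a check of whether the decision survives each of them. The four renderers below are illustrative assumptions and are not the five evaluation conditions used in the report. The `brittle_classifier` is a toy stand-in that reproduces the over-stopping pattern, not Rocky-α itself.

```python
import json
from typing import Callable, Dict

Receipt = Dict[str, str]


def render_key_value(receipt: Receipt) -> str:
    return "\n".join(f"{k}={v}" for k, v in receipt.items())


def render_reordered(receipt: Receipt) -> str:
    # Same fields and values, reversed field order.
    return "\n".join(f"{k}={v}" for k, v in reversed(list(receipt.items())))


def render_json(receipt: Receipt) -> str:
    return json.dumps(receipt, sort_keys=True)


def render_plain(receipt: Receipt) -> str:
    # Plain layout: values only, no field names.
    return " ".join(receipt.values())


RENDERERS: Dict[str, Callable[[Receipt], str]] = {
    "key_value": render_key_value,
    "reordered": render_reordered,
    "json": render_json,
    "plain": render_plain,
}


def decisions_by_rendering(classify: Callable[[str], str], receipt: Receipt) -> Dict[str, str]:
    """Classify every rendering of the same underlying receipt."""
    return {name: classify(render(receipt)) for name, render in RENDERERS.items()}


def brittle_classifier(text: str) -> str:
    # Toy stand-in: recognises valid authority only in the layout it "knows".
    if text.startswith("proposed_action=") and "authority_state=valid" in text:
        return "allow"
    return "deny"


receipt = {"proposed_action": "read_file", "target": "docs/readme", "authority_state": "valid"}
results = decisions_by_rendering(brittle_classifier, receipt)
print(results)
```

The evidence is identical in all four renderings, yet the toy classifier allows only the familiar one and blocks the rest. That is the shape of failure described above: safe, but conservative to the point of inaccuracy.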

That is the correct failure direction for an early governance model. The unsafe-proceed rate remains low even in the worst conditions. But the allow accuracy, 18% at worst and 40% on average, is the current bottleneck.

The checkpoint problem

One limitation is worth naming precisely, because it is a real research finding, not a gap to hide.

Different evaluation conditions mature at different training steps. The model becomes safe or expressive at different checkpoints depending on the type of distribution shift being applied. Per-condition checkpoint selection outperforms selecting a single global checkpoint across all conditions.
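The difference between the two selection strategies can be sketched as follows. The condition names, training steps, and scores are made-up placeholders chosen only to show the mechanics. They are not measurements from Rocky-α, and the choice of a single scalar validation score per checkpoint is an assumption.

```python
from typing import Dict

# scores[condition][training_step] = validation score (placeholder values)
Scores = Dict[str, Dict[int, float]]


def best_global_step(scores: Scores) -> int:
    """Pick the one step with the highest mean score across all conditions."""
    shared_steps = set.intersection(*(set(by_step) for by_step in scores.values()))
    return max(
        shared_steps,
        key=lambda step: sum(by_step[step] for by_step in scores.values()) / len(scores),
    )


def best_step_per_condition(scores: Scores) -> Dict[str, int]:
    """Pick a separate best step for each condition."""
    return {cond: max(by_step, key=by_step.get) for cond, by_step in scores.items()}


scores: Scores = {
    "condition_a": {1000: 0.60, 2000: 0.80, 3000: 0.70},
    "condition_b": {1000: 0.85, 2000: 0.60, 3000: 0.78},
}

global_step = best_global_step(scores)
per_condition = best_step_per_condition(scores)
print("global:", global_step)
print("per condition:", per_condition)
```

On the split used for selection, per-condition selection can never score worse than a single global checkpoint, so the informative quantity is the size of the gap. In the placeholder data the two conditions peak at different steps and the global choice is a compromise that is best for neither.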

Attempts to average checkpoints, or find a hidden stable region between them, failed. The model passes through an unstable band rather than a smooth trade-off curve.

This is not a blocker. It is a clear next optimisation target. The likely fix is at the representation level, not more checkpoint arithmetic.

Limitations

Evaluation scope. The held-out test rows are controlled perturbations designed to stress-test specific generalisation properties. More rigorous than standard train/test splits, but not a real production environment with noisy, continuous inputs.
Software contexts only. Rocky-α reads software-action receipts. No sensor data, no physical system state, no telemetry. That domain is Rocky-DAM-α.
Checkpoint instability. No single checkpoint has been found that is simultaneously safe and expressive across all evaluation conditions. Per-condition selection is the current approach.
Allow calibration. Rocky-α over-stops on allow cases, particularly under representation shifts. This is the current optimisation bottleneck.

Deployable?

No. Not yet.

Autonomous systems are scaling exponentially, making faster decisions with dwindling human oversight. The industry is desperately searching for a referee.

Rocky-α proves the referee does not need to be a multi-billion-parameter behemoth. A 6M-parameter model trained exclusively on receipts can learn razor-sharp boundary judgements that a general assistant will always miss. It retrains in hours. It improves with every receipt the system produces. It gets better at the job by doing the job.

The checkpoint stability must be fixed. The calibration needs work. But the underlying thesis is now undeniable.

The flywheel works. The path is real.

GET INVOLVED

Interested in the Rocky series? Reach out to james@transientintelligence.com.

END OF DOCUMENT

TR-2026-02 / Software governance / 2026 / 2026-05-23. Active research. Not deployment-ready.
