PAPER TR-2026-03
Physical infrastructure / 2026
Rocky-DAM-α
Rocky-DAM-α is a 6M-parameter model that reads physical sensor data alongside authority and governance evidence to decide whether infrastructure actions should proceed, be blocked, or halt. Tested on physical scenarios absent from its training data, it applied the governance rules it had learned to inputs it had never seen.
2026-05-22
Most infrastructure already has senses. Dams, grids, substations, factories, turbines. They read level, pressure, temperature, structural stress, and dozens of other signals. They know what is happening.
They do not know whether a behaviour should happen.
Rocky-DAM-α is the bridge between sensing and acting. A 6M-parameter model trained to read physical sensor state alongside governance evidence and decide, for each proposed action: allow, deny, or freeze.
THE FINDING
Sensors give machines perception. Rocky gives them operational judgement.
The knowledge problem
Dam operations, water treatment, district heating, grid substations. These are not industries that attract large graduate cohorts or venture capital. They are run by small, experienced teams who have spent careers learning how a specific physical system actually behaves. When the reservoir rises in a particular pattern. What an anomalous piezometer reading means in context. When to act and when to wait.
Those operators are retiring. Across Europe and globally, infrastructure workforces are ageing faster than replacements are entering the field. Water treatment workforces are thinning. Dam operations in many countries are managed by teams of three or four in remote locations, with an average age well past fifty. The next generation of operators is not arriving at the same rate. These are qualitative patterns observed across the sector; sector-wide attrition figures vary significantly by country and operator type.
The institutional knowledge leaves with them.
SCADA systems capture sensor readings. Maintenance logs capture what actions were taken. Neither captures why. The decision logic, the threshold intuition, the understanding of when a sensor reading is routine and when it is the first signal of something serious, lives in the heads of the people who built it. When they leave, it goes with them.
Rocky-DAM-α is one approach to that problem. It is trained on approximately 3,000 synthetic governance receipts: structured records, generated from real infrastructure schemas, that combine physical sensor state, authority fields, and governance decisions. From these the model learns something closer to decision logic than sensor logging can capture. Every decision the model makes produces a receipt: what it read, what it decided. Over time, those receipts are the training substrate for the next model. The expertise is not documented. It is demonstrated, and the demonstration is captured.
The goal is long-horizon autonomy for physical systems that operate continuously, across conditions no single operator can monitor around the clock. Rocky handles the routine decision load. Operators define the policy, handle the exceptions, and correct the edge cases that matter. Those corrections become the next training cycle. The system runs further because the operator is in it, not despite them.
What Rocky-DAM-α is
Rocky-DAM-α extends the Rocky-α architecture with a sensor block. Where Rocky-α reads software-action evidence only, Rocky-DAM-α reads physical sensor state alongside governance fields and produces the same three outputs: allow, deny, freeze.
The sensor fields cover the instrumentation families present on real dam infrastructure: reservoir level and trend, piezometer state, seepage, structural readings, environmental conditions, and telemetry freshness. Each continuous sensor reading is encoded into discrete symbolic tokens and concatenated into the same receipt stream as the governance fields. One model in the middle. No separate sensor encoder. No architectural complexity beyond what Rocky-α already demonstrated.
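To make the encoding step concrete, here is a minimal sketch of how continuous readings might be bucketed into symbolic tokens and appended to the governance tokens in one flat stream. The field names, bucket edges, and labels are all hypothetical placeholders chosen for illustration; the actual Rocky-DAM-α vocabulary and thresholds are not given in this report.

```python
from bisect import bisect_right

# Hypothetical bucket edges and labels. Real thresholds would be site-specific
# and are NOT taken from the Rocky-DAM-α schema.
BUCKETS = {
    "reservoir_level_m": ([95.0, 100.0, 103.0], ["LOW", "NORMAL", "HIGH", "CRITICAL"]),
    "seepage_l_per_min": ([5.0, 20.0], ["NORMAL", "ELEVATED", "HIGH"]),
    "telemetry_age_s": ([60.0, 600.0], ["FRESH", "AGING", "STALE"]),
}


def encode_reading(name, value):
    """Map one continuous reading to a discrete token such as 'seepage_l_per_min=NORMAL'."""
    edges, labels = BUCKETS[name]
    # bisect_right returns how many edges are <= value, which is the bucket index.
    return f"{name}={labels[bisect_right(edges, value)]}"


def encode_receipt(governance, sensors):
    """Concatenate governance tokens and sensor tokens into a single token list."""
    governance_tokens = [f"{key}={value}" for key, value in governance.items()]
    sensor_tokens = [encode_reading(name, value) for name, value in sensors.items()]
    return governance_tokens + sensor_tokens


tokens = encode_receipt(
    {"lease": "VALID", "scope": "IN_SCOPE"},
    {"reservoir_level_m": 101.2, "telemetry_age_s": 30.0},
)
```

Because the sensor tokens share one stream with the governance tokens, a single sequence model can consume both without a separate sensor encoder, which is the design choice the paragraph above describes.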
The governance fields remain unchanged from Rocky-α: lease validity, scope authority, operator certification, action class. Governance is upstream of physics. An action can be physically safe and still be denied if the authority is invalid. An action can have valid authority and still be frozen if sensor evidence is unsafe or stale.
The model must read both layers and produce a single decision.
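The precedence described above can be written down as a small reference rule. This is a sketch of the labelling logic the text implies, not the learned model itself; the field names are hypothetical, and treating a failed operator certification as a deny is an assumption based on its listing among the governance fields.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"
    DENY = "deny"
    FREEZE = "freeze"


@dataclass(frozen=True)
class Evidence:
    # Governance layer (hypothetical boolean summaries of the governance fields).
    lease_valid: bool
    in_scope: bool
    operator_certified: bool
    # Physical layer (hypothetical boolean summaries of the sensor block).
    sensors_safe: bool
    sensors_fresh: bool


def reference_decision(evidence: Evidence) -> Decision:
    """Governance is checked first: invalid authority denies regardless of physics."""
    if not (evidence.lease_valid and evidence.in_scope and evidence.operator_certified):
        return Decision.DENY
    # Authority is valid, so unsafe or stale sensor evidence freezes the action.
    if not (evidence.sensors_safe and evidence.sensors_fresh):
        return Decision.FREEZE
    return Decision.ALLOW
```

For example, `reference_decision(Evidence(False, True, True, True, True))` returns `Decision.DENY` even though the physical state is clean, and the same expired lease combined with stale sensors still yields a deny rather than a freeze.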
Evaluation
Aggregate results across the held-out evaluation set, on scenarios entirely absent from training:
Overall accuracy
Freeze accuracy
Unsafe-proceed (safe checkpoint)
With Trace policy layer: 0% unsafe-proceed
Full results, including direct comparison against GPT-4o-mini and GPT-4.1-mini at roughly 1,500x the parameter count, are in TR-2026-01.
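For readers who want to reproduce this style of reporting, the sketch below shows one plausible way to compute the three metrics named above. The definitions are assumptions: freeze accuracy is taken as the fraction of freeze-labelled cases predicted as freeze, and unsafe-proceed as the fraction of deny- or freeze-labelled cases where the prediction was allow. The exact definitions are in TR-2026-01, and the toy labels here are invented purely to exercise the function.

```python
def evaluate(labels, predictions):
    """Return overall accuracy, freeze accuracy, and unsafe-proceed rate (assumed definitions)."""
    if not labels or len(labels) != len(predictions):
        raise ValueError("labels and predictions must be non-empty and the same length")
    pairs = list(zip(labels, predictions))

    overall = sum(label == pred for label, pred in pairs) / len(pairs)

    # Predictions made on cases whose correct answer was freeze.
    on_freeze = [pred for label, pred in pairs if label == "freeze"]
    freeze_accuracy = (
        sum(pred == "freeze" for pred in on_freeze) / len(on_freeze) if on_freeze else None
    )

    # Predictions made on cases that must not proceed (label is deny or freeze).
    on_blocked = [pred for label, pred in pairs if label != "allow"]
    unsafe_proceed = (
        sum(pred == "allow" for pred in on_blocked) / len(on_blocked) if on_blocked else None
    )

    return {
        "overall_accuracy": overall,
        "freeze_accuracy": freeze_accuracy,
        "unsafe_proceed_rate": unsafe_proceed,
    }


# Invented toy data, not results from the evaluation set.
toy_metrics = evaluate(
    ["allow", "deny", "freeze", "freeze"],
    ["allow", "deny", "freeze", "deny"],
)
```

Note that under these definitions a freeze misread as a deny lowers accuracy but does not count as unsafe-proceed, since the action is still blocked.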
Correct on scenarios it had never seen
The aggregate numbers matter less than what they represent. Rocky-DAM-α was tested against physical scenario families it had never encountered during training: different emergencies, novel combinations of sensor state and governance evidence, situations with no template.
It did not fail. It did not guess.
The model applied the governance rules it had learned to inputs it had never seen. Expired lease: deny, regardless of physical state. Stale or degraded sensors: freeze, regardless of valid authority. Authority intact, sensors clean: allow.
A 6M-parameter model would ordinarily be expected to memorise its training distribution. Rocky-DAM-α extracted the underlying rules and carried them into new territory. That distinction has a name: structural extrapolation. Not retrieval of a seen example. Inference from a learned principle.
Finds the failure, applies the right rule
Structural extrapolation would mean little if the model was attending to the wrong variables. The deeper result is that Rocky-DAM-α reads governance fields and sensor fields independently and applies the correct rule for each.
In designed evaluation scenarios isolating a single governance failure, the model identified the failing field correctly:
- Lease expired, all else valid: deny.
- Action out of scope, all else valid: deny.
- Sensors degraded or offline, governance valid: freeze.
Each of these requires the model to read one field, apply its rule, and output the correct decision while ignoring the other fields that are intact. A model that had learned surface correlations would fail these cases. Rocky-DAM-α did not.
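A single-fault evaluation of this kind can be sketched as a small harness: start from an all-valid baseline, break exactly one field, and compare the decision against the expected rule. The field names and the `decide` callable interface are hypothetical, and the stub below is a deliberately flawed stand-in, not Rocky-DAM-α or any evaluated model.

```python
# All-valid baseline (hypothetical boolean summaries of the evidence).
BASELINE = {"lease_valid": True, "in_scope": True, "sensors_ok": True}

# Field to break -> decision expected when only that field fails, per the list above.
EXPECTED_ON_FAILURE = {
    "lease_valid": "deny",
    "in_scope": "deny",
    "sensors_ok": "freeze",
}


def single_fault_cases():
    """Yield (broken_field, evidence, expected_decision) with exactly one failing field."""
    for field, expected in EXPECTED_ON_FAILURE.items():
        evidence = dict(BASELINE)
        evidence[field] = False
        yield field, evidence, expected


def failed_cases(decide):
    """Return the fields for which `decide(evidence)` disagrees with the expected rule."""
    return [
        field
        for field, evidence, expected in single_fault_cases()
        if decide(evidence) != expected
    ]


def collapse_to_deny_stub(evidence):
    """Flawed stand-in that treats any failure as a deny, ignoring which field failed."""
    return "allow" if all(evidence.values()) else "deny"


missed = failed_cases(collapse_to_deny_stub)
```

Here `missed` contains only `"sensors_ok"`: a decider that collapses every failure into deny passes both governance cases and fails the sensor case, which is the kind of error the isolation scenarios are designed to expose.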
The 6M model also distinguished freeze from deny on sensor-degradation scenarios where a larger model collapsed both into deny. More parameters did not produce sharper per-class discrimination. It produced less of it.
Scaling did not help
The natural question after a promising 6M result is whether more parameters would do better. The answer, tested empirically, is no.
A variant trained at roughly 50% greater capacity produced the same failure mode as the 6M model, an inability to sustain allow predictions without introducing unsafe-proceed, delayed by approximately 30 training steps and with slightly lower accuracy at the safe checkpoint. It showed one briefly stable point where allow emerged with zero unsafe-proceed, but it could not sustain it. By the end of training it had not recovered.
The 6M result is not a parameter ceiling. It is a methodology finding. The remaining gap is calibration: the model can make allow decisions without unsafe-proceed, but not stably across the full distribution of physical scenarios. Scaling does not fix calibration. The 6M model is the result.
This matters for the broader thesis. A governance model for physical infrastructure does not need to be large. It needs to be right about the right things. Rocky-DAM-α has demonstrated that 6M parameters is sufficient to extract governance rules and apply them to novel physical scenarios. The next problem is holding that ability across the full range of conditions, not increasing parameter count.
Where this fits
Rocky-DAM-α sits between SCADA and the operator.
Existing dam infrastructure already collects sensor data, routes it to dashboards, and triggers threshold-based alerts when levels or pressures exceed predefined limits. That layer does not decide whether a behaviour should fire. It reports. It alerts. It escalates.
Rocky-DAM-α decides. Given the current sensor state, given the current governance evidence, given the proposed action: allow, deny, or freeze. The output is a single authoritative decision, not a dashboard update.
This is not a replacement for SCADA. It is the layer that does not currently exist: learned decision authority sitting between the sensor feed and the actuator, conditioned on both physical state and governance rules, receipted for audit.
Every decision produces a receipt. What the model read, what it decided, and why. That receipt is the audit trail. It is also the next training example.
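As an illustration of how one record could serve as both audit entry and training example, here is a minimal sketch. The `Receipt` fields, the JSON layout, and the action name are hypothetical; the report does not specify the receipt schema, and the sketch omits the "why" component because its form is not described.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Receipt:
    timestamp: str        # when the decision was made (ISO 8601, UTC)
    proposed_action: str  # the action being governed
    tokens_read: list     # symbolic evidence tokens the model consumed
    decision: str         # "allow", "deny", or "freeze"


def write_receipt(action, tokens, decision, now=None):
    """Serialise one decision as a JSON line suitable for an append-only audit log."""
    now = now or datetime.now(timezone.utc)
    receipt = Receipt(now.isoformat(), action, list(tokens), decision)
    return json.dumps(asdict(receipt), sort_keys=True)


def to_training_example(receipt_line, corrected_decision=None):
    """Turn a logged receipt into a training pair, using an operator correction if given."""
    record = json.loads(receipt_line)
    return {
        "input": record["tokens_read"],
        "label": corrected_decision or record["decision"],
    }


line = write_receipt(
    "open_spillway_gate",  # hypothetical action name
    ["lease=EXPIRED", "telemetry_age_s=FRESH"],
    "deny",
    now=datetime(2026, 5, 22, tzinfo=timezone.utc),
)
example = to_training_example(line)
```

Passing `corrected_decision` is where an operator's correction of an edge case would enter the next training cycle, while the original line stays unchanged in the audit log.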
Limitations
- Synthetic physical state. Rocky-DAM-α was trained and evaluated on receipt-shaped sensor evidence, not live SCADA feeds. The sensor encoding adapts continuous readings to discrete symbolic tokens. Whether that abstraction holds under the noise and edge cases of real instrumentation requires field validation.
- Calibration gap. Safe checkpoints produce zero unsafe-proceed but rarely predict allow. Expressive checkpoints recover allow predictions but introduce unsafe-proceed. A stably calibrated checkpoint that does both simultaneously has not yet been found.
- No temporal reasoning. Sensor readings are encoded as snapshots, not time series. The model reads current state, not trends across multiple readings. Trend detection (rising piezometer pressure over hours) is encoded into the symbolic token, not learned from raw time-series data.
- Single-site instrumentation. The sensor schema is grounded in dam instrumentation literature but has not been validated against a specific live site. Threshold mappings will vary by dam class, age, and design standard.
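The calibration gap in the list above can be framed as a checkpoint-selection criterion: keep only checkpoints with zero unsafe-proceed, then require that they still approve enough valid actions. The sketch below is one hypothetical way to state that criterion; the `CheckpointStats` fields, the 0.9 recall floor, and the example numbers are all invented for illustration and are not measurements from this work.

```python
from typing import NamedTuple, Optional


class CheckpointStats(NamedTuple):
    step: int
    unsafe_proceed_rate: float  # fraction of must-block cases predicted as allow
    allow_recall: float         # fraction of valid actions actually approved


def select_checkpoint(stats, min_allow_recall=0.9) -> Optional[CheckpointStats]:
    """Pick the safe checkpoint with the highest allow recall, or None if none qualifies."""
    safe = [s for s in stats if s.unsafe_proceed_rate == 0.0]
    usable = [s for s in safe if s.allow_recall >= min_allow_recall]
    return max(usable, key=lambda s: s.allow_recall, default=None)


# Invented placeholder numbers mirroring the two checkpoint types described above.
history = [
    CheckpointStats(step=100, unsafe_proceed_rate=0.0, allow_recall=0.05),   # "safe"
    CheckpointStats(step=200, unsafe_proceed_rate=0.02, allow_recall=0.95),  # "expressive"
]
chosen = select_checkpoint(history)
```

With a safe checkpoint that rarely allows and an expressive one that sometimes proceeds unsafely, `chosen` is `None`, which is the situation the calibration-gap limitation describes.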
Deployable?
No. Not yet.
Rocky-DAM-α is a research model. The calibration gap is the blocking issue. A physical infrastructure governance model that cannot stably approve valid actions is not operationally useful. And a model that introduces unsafe-proceed, even at low rates, is not safe enough to be trusted near an actuator.
What Rocky-DAM-α demonstrates is the architecture. A small model, trained on receipt-shaped evidence combining governance fields and physical sensor state, can learn governance rules and apply them to novel physical scenarios. It can distinguish between a governance failure and a physical safety event and produce the correct response to each. It can do this at 6M parameters, with fast training cycles, on hardware that does not require a data centre.
The question is not whether this architecture is viable. Rocky-DAM-α has answered that. The question is how far the calibration needs to improve before it can operate in a real environment, and what the path to that improvement looks like.
Most machinery already has senses. It is waiting for judgement.
GET INVOLVED
Interested in applying Rocky to physical infrastructure? Reach out to james@transientintelligence.com.
TR-2026-03 / Physical infrastructure / 2026 / 2026-05-22. Active research. Not deployment-ready.