13. Fleet as Instrumented Experiment

Thursday, 7:02 a.m., five units on the bench, cross-site call connecting, no two run logs with the same fields.

By Friday, the team has stories about each one and no comparable record of what was different between them.

That is a missed experiment, not a successful fleet.

Fleet mindset: learning asset, not milestone artifact

A fleet build is valuable when each unit produces usable evidence.

Treating prototypes as proof-by-existence wastes the most expensive learning window in the program.

Fleet intent should be explicit before build:

Which decisions this fleet is meant to inform,
Which requirements/risks each unit is probing,
Which measurements and conditions must be captured.

A unit's evidence is gate-complete when its record alone answers all three, without asking the engineer who ran the test.

Minimum per-unit record

For each serial unit, capture:

configuration (revision/state),
manufacturing context (process/material lots),
test conditions,
results against requirement IDs,
anomalies and disposition,
owner/date for record completeness.

No per-unit record means no defensible cross-unit comparison.
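One way to make the minimum record enforceable is to give it a fixed shape and check it for gaps. The following is a minimal sketch, not a prescribed schema: the class name `UnitRecord`, the helper `missing_fields`, and every field name are assumptions chosen to mirror the list above.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Dict, List, Optional


@dataclass
class UnitRecord:
    """One record per serial unit. All field names are illustrative."""
    serial: str
    configuration: str = ""             # revision/state
    manufacturing_context: str = ""     # process/material lots
    test_conditions: str = ""
    results_by_requirement: Dict[str, str] = field(default_factory=dict)
    anomalies: List[str] = field(default_factory=list)  # each with its disposition
    owner: str = ""
    record_date: Optional[date] = None


def missing_fields(record: UnitRecord) -> List[str]:
    """Return the names of required fields that are still empty."""
    # An empty anomaly list can be legitimate, so it is not treated as a gap.
    required = [
        "configuration",
        "manufacturing_context",
        "test_conditions",
        "results_by_requirement",
        "owner",
        "record_date",
    ]
    return [name for name in required if not getattr(record, name)]


# Hypothetical partially filled record for one unit.
unit_03 = UnitRecord(serial="03", configuration="rev B", test_conditions="hot soak")
print(missing_fields(unit_03))
```

A unit whose `missing_fields` result is non-empty would be held out of cross-unit comparison until the gaps are closed.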

Build matrix, not pile of units

Plan fleet variation deliberately:

what to hold constant,
what to vary,
which interactions are intentionally sampled,
which units are designated for destructive/edge tests.

If all units are "nominal," fleet learning is shallow and late surprises remain likely. A common miss: a five-unit fleet built to identical nominal conditions produces five confirmations of one operating point rather than a sample of the variation the program will face at scale.
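The planning questions above can be checked against a build matrix before any unit is built. The sketch below is an assumption-laden illustration: the factor names (`seal_lot`, `torque`, `adhesive_cure`), their levels, and the helper functions are hypothetical, not taken from any real program.

```python
# Hypothetical build matrix: one row per unit, one column per build factor.
build_matrix = {
    "01": {"seal_lot": "A", "torque": "nominal", "adhesive_cure": "standard"},
    "02": {"seal_lot": "A", "torque": "low", "adhesive_cure": "standard"},
    "03": {"seal_lot": "B", "torque": "nominal", "adhesive_cure": "standard"},
    "04": {"seal_lot": "B", "torque": "high", "adhesive_cure": "standard"},
    "05": {"seal_lot": "B", "torque": "low", "adhesive_cure": "standard"},
}
# Units designated for destructive/edge tests (illustrative).
destructive_units = {"05"}


def held_constant(matrix):
    """Factors that take a single level across the whole fleet."""
    levels = {}
    for settings in matrix.values():
        for factor, level in settings.items():
            levels.setdefault(factor, set()).add(level)
    return sorted(f for f, seen in levels.items() if len(seen) == 1)


def sampled_pairs(matrix, factor_a, factor_b):
    """Level combinations of two factors that the fleet actually covers."""
    return {(s[factor_a], s[factor_b]) for s in matrix.values()}


print(held_constant(build_matrix))
print(sorted(sampled_pairs(build_matrix, "seal_lot", "torque")))
```

If `held_constant` returns every factor, the fleet is the all-nominal case described above. If a combination the program cares about is absent from `sampled_pairs`, that interaction is not being sampled.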

Connect fleet data to program controls

Fleet evidence must update:

requirement confidence,
risk register entries,
one-page status deltas,
next decision queue.

Example of a complete update: Unit 03 measures a leak rate well above Unit 01 at hot soak. That result moves the seal leak requirement confidence to red and changes the next gate call from go to hold pending seal redesign evidence.

If fleet data stays in test notes, the program keeps running on stale assumptions.

At a battery-adjacent program, five prototype units were built and instrumented over four weeks. Unit 03 measured a leak rate more than twice Unit 01's baseline at hot soak. This was not a sensor artifact: the same delta was confirmed across multiple test runs. The discrepancy sat in the test engineer's notes for nearly two weeks before it reached the program's risk register, because no per-unit record requirement forced the delta into the register. The seal leak requirement moved to red only after the team had already placed tooling orders on the assumption that Unit 01 was representative.

The program added a per-unit record requirement and a weekly fleet review cadence to its program controls (the program's standing records). The next fleet campaign surfaced three requirement deltas in the first review week rather than the fifth. The eleven-day gap in the first campaign had led to tooling decisions made on unrepresentative data, decisions that required a correction order once the real distribution of results was visible.

Old process: no per-unit record requirement; unit-level discrepancies captured in engineer notes with no forced path to the risk register.
Artifact changed: a per-unit record requirement and weekly fleet review cadence added to program controls.
Measured improvement: the next fleet campaign surfaced three requirement deltas in the first review week rather than the fifth.
Cost of the gap: eleven days of unlogged unit variance led to tooling decisions on unrepresentative data, with a correction order required once the real distribution was visible. An eleven-day lag between test observation and risk register entry costs one tooling decision at the fleet level; the same lag at the gate-closure level costs a release cycle.

Failure patterns in fleet execution

Common misses and what catches each:

unit history lost after rework — per-unit record requires a rework entry before the unit re-enters evidence
instrumentation setup drift between units — per-unit test-conditions field, visible at the evidence completeness check
anomalies documented but not tied to decision owners — step 3 of the weekly review (unresolved anomalies and owners)
test logs decoupled from build configuration — step 1 of the weekly review (unit-by-unit evidence completeness check)

Without a catch paired to each miss, these failures make apparently rich data unusable for high-stakes decisions.

Practical weekly fleet loop

During fleet campaigns, run a weekly review:

1. unit-by-unit evidence completeness check,
2. requirement/risk deltas from new data,
3. unresolved anomalies and owners,
4. decision queue updates for next gate.

Short, strict, repeatable. Each cycle ends with a documented delta note: which requirement or risk rows moved and what evidence moved them.
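The delta note can be generated by comparing last week's status snapshot with this week's. This is a minimal sketch under stated assumptions: the function `delta_note`, the row IDs such as `REQ-SEAL-LEAK`, and the status values are hypothetical, and a real program would read them from its own requirement and risk records.

```python
def delta_note(previous, current, evidence):
    """List rows whose status moved, each with the evidence that moved it."""
    note = []
    for row_id in sorted(current):
        old, new = previous.get(row_id), current[row_id]
        if old != new:
            # A moved row without cited evidence is flagged rather than dropped.
            cited = evidence.get(row_id, "NO EVIDENCE CITED")
            note.append(f"{row_id}: {old} -> {new} ({cited})")
    return note


# Hypothetical snapshots of requirement confidence, one week apart.
last_week = {"REQ-SEAL-LEAK": "green", "REQ-MASS": "green"}
this_week = {"REQ-SEAL-LEAK": "red", "REQ-MASS": "green"}
evidence = {"REQ-SEAL-LEAK": "Unit 03 hot-soak leak rate vs Unit 01"}

for line in delta_note(last_week, this_week, evidence):
    print(line)
```

An empty note after a week of testing is itself a finding worth raising in the review: either nothing moved, or the data did not reach the record.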

Why this sits after model/test chapter

Delta and traceability discipline at component and subsystem level scales directly into fleet work — the same logic, applied across physical units in real variation.

When done well, fleet turns uncertainty into bounded decisions before major spend.

If your weekly fleet review this week can name one risk-register row that moved because of a specific unit's measured delta, the campaign is doing its job. If it cannot, you are collecting prototypes, not evidence.

The Hardware OS