12. Models and Tests That Change Decisions

Wednesday, 6:40 p.m. The hallway outside the war room is empty. The decision review packet contains 20 model runs and 12 test reports, and it still cannot resolve the thermal case temperature requirement.

The verification team produced twenty model runs and twelve test reports for the same requirement decision.

The gate decision still hangs because no owner can show baseline, delta, and effect size in one traceable artifact (a written program record).

Evidence volume is high, but decision power stays low when each run is not linked to a baseline requirement state.

Evidence must be delta-first

A model or test result is useful only relative to a named baseline artifact.

For every run record, capture a decision delta package:

cite baseline reference,
state what changed,
predict expected effect,
report observed effect,
declare decision implication.

Without this package, results become isolated artifacts and cannot form a learning sequence that changes decisions.
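The five-item package above can be sketched as a small record type. This is a minimal illustration rather than a prescribed schema: the class name `DecisionDelta`, its field names, the `missing_fields` helper, and the sample values are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, fields


@dataclass
class DecisionDelta:
    """One run record expressed as a decision delta package (hypothetical schema)."""

    baseline_ref: str          # named baseline artifact the run is compared against
    change: str                # what changed relative to that baseline
    expected_effect: str       # prediction written down before the run
    observed_effect: str       # what the run actually showed
    decision_implication: str  # what the result means for the pending decision

    def missing_fields(self) -> list[str]:
        """Return the names of fields left empty or whitespace-only."""
        return [f.name for f in fields(self) if not getattr(self, f.name).strip()]


# Illustrative record only; every value here is made up for the example.
record = DecisionDelta(
    baseline_ref="REQ-EXAMPLE rev A",
    change="contact-resistance boundary condition updated",
    expected_effect="predicted case temperature drops",
    observed_effect="",
    decision_implication="",
)
print(record.missing_fields())  # ['observed_effect', 'decision_implication']
```

A record that reports any missing field is, in the chapter's terms, still an isolated artifact: it cannot yet take its place in a learning sequence.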

Naming and traceability discipline

Use one boring but enforceable traceability convention:

keep stable requirement ID linkage,
assign run and test IDs,
record revision IDs for design and setup,
stamp timestamp and owner,
store records in a location with immutable history.

This discipline prevents decision debt, the cost review leads pay at the gate when they cannot compare outcomes across builds and teams.
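One way to make the convention enforceable is to check each run record for the linkage fields and to fingerprint its contents. The sketch below assumes records are plain dictionaries. The key names, the ID formats, and the sample values are hypothetical, and a content hash is only one possible aid: it makes silent edits detectable but does not replace storage with immutable history.

```python
import hashlib
import json

# Hypothetical key names covering the linkage items in the convention above.
REQUIRED_KEYS = (
    "requirement_id", "run_id", "design_rev", "setup_rev", "timestamp", "owner",
)


def trace_gaps(record: dict) -> list:
    """Return the traceability keys that are absent or empty in a run record."""
    return [key for key in REQUIRED_KEYS if not record.get(key)]


def fingerprint(record: dict) -> str:
    """SHA-256 of the record's canonical JSON form; any later edit changes it."""
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Illustrative record only; IDs, timestamp, and owner are made up.
run = {
    "requirement_id": "REQ-EXAMPLE",
    "run_id": "RUN-0001",
    "design_rev": "D-03",
    "setup_rev": "",
    "timestamp": "2024-01-10T18:40:00+00:00",
    "owner": "example.owner",
}
print(trace_gaps(run))  # ['setup_rev']
```

Storing `fingerprint(run)` alongside the record lets a reviewer confirm at the gate that the record being compared is the one that was originally filed.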

Pair model claims with physical evidence

Models are strongest when analysts calibrate them against physical evidence. Tests are strongest when reviewers interpret them with model context and assumptions.

Use these pairing questions before changing a requirement value:

Where do model and test agree?
Where do they diverge?
Which assumptions explain divergence?
What decision can be made now despite remaining uncertainty?

The goal is not perfect correlation before every move; the goal is honest confidence for the next dated decision.

A thermal model for a power-electronics module predicted that the switching-stage case temperature would hold at 83 °C at full duty, safely below the 85 °C thermal case temperature limit. The chamber test measured 78 °C at steady state. The team's first interpretation was that the chamber was measuring something different from what the model represented. The calibration finding was narrower: the model's contact-resistance boundary condition assumed a thermal-grease application that the actual mounting jig could not deliver, because the jig had been designed for a different module family. The model was correct given its boundary condition; the boundary condition was wrong given the actual assembly. After one boundary-condition correction and a re-run, the model predicted 79 °C, matching the chamber within measurement uncertainty. The requirement was revised to 78 °C as a controlled update: the boundary-condition correction is the evidence, the new value is documented with a named owner, and the baseline moves.

Old process: model and chamber treated as independent confidence votes; discrepancies resolved by negotiation.
Artifact changed: the model's boundary-condition log, with the contact-resistance value traced to a direct contact-resistance measurement rather than assumed from a different module family.
Measured improvement: model-chamber delta reduced from 5 °C to 1 °C after one boundary-condition correction.
Cost of the gap: one week of "the chamber must be wrong" investigation before the boundary condition was examined.

That revision is a controlled evidence update — the requirement baseline moves only when evidence satisfies predeclared criteria and the rationale is documented.
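The model-chamber comparison in this example reduces to simple arithmetic on the stated temperatures (83 °C and 79 °C predicted, 78 °C measured). The sketch below works it through. The text does not give a value for the measurement uncertainty, so the `UNCERTAINTY_C` figure is an assumption chosen purely for illustration, and the function name is hypothetical.

```python
def model_test_delta(predicted_c: float, measured_c: float) -> float:
    """Absolute gap between a model prediction and a test measurement, in °C."""
    return abs(predicted_c - measured_c)


MEASURED_C = 78.0    # chamber steady-state result from the example
UNCERTAINTY_C = 1.5  # ASSUMPTION: illustrative value, not stated in the text

before = model_test_delta(83.0, MEASURED_C)  # original boundary condition
after = model_test_delta(79.0, MEASURED_C)   # corrected boundary condition

print(before, after)           # 5.0 1.0
print(after <= UNCERTAINTY_C)  # True
```

Writing the agreement criterion down before the re-run is what separates a controlled update from a negotiated one: the comparison either falls within the stated uncertainty or it does not.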

Evidence threshold for requirement changes

Do not revise a requirement on one attractive chart without corroborating evidence.

Set requirement-change thresholds in advance:

confirm coverage conditions are met,
verify repeatability is acceptable,
show sensitivity to key variables is understood,
bound measurement uncertainty enough for the decision at hand.

If the threshold is not met, the evidence owner documents what is missing in the review artifact and sets a closure date.
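A predeclared threshold can be represented as a fixed checklist that is evaluated the same way for every proposed change. The following is a minimal sketch under stated assumptions: the criterion names and the dictionary layout are hypothetical, and each criterion is reduced to a yes/no judgment that the evidence owner supplies.

```python
# Hypothetical names for the four criteria listed above.
CRITERIA = ("coverage", "repeatability", "sensitivity", "uncertainty")


def threshold_gaps(evidence: dict) -> list:
    """Return criteria not explicitly marked True; an empty list means the threshold is met."""
    return [name for name in CRITERIA if evidence.get(name) is not True]


# Illustrative assessment only; a missing key counts as unmet.
evidence = {"coverage": True, "repeatability": True, "sensitivity": False}

gaps = threshold_gaps(evidence)
print(gaps)  # ['sensitivity', 'uncertainty']
print("revise requirement" if not gaps else "hold: document gaps and set closure date")
```

Treating an unstated criterion as unmet is a deliberate choice in this sketch: it keeps a single attractive chart from passing by omission.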

Common evidence failure patterns

orphan files with no requirement linkage,
results impossible to reproduce from metadata,
"best run" selection without rationale,
model updates with no calibration note,
test deltas reported without the setup changes that produced them.

Each pattern inflates confidence in the artifact while leaving decision uncertainty unresolved at review.

Practical evidence package for decisions

For any decision review, provide one compact decision package:

state question to answer,
summarize baseline and delta,
rate evidence quality,
recommend a decision,
assign residual risk and next evidence step.

This package keeps technical depth available while giving executive and PM reviewers a clear, fair decision frame.

The boundary-condition correction is confirmed. The model re-run matches the chamber within measurement uncertainty. The gate decision changes from hold to proceed — one week after the chamber result that everyone initially attributed to measurement error. The requirement is 78 °C, the model predicts 79 °C, and the boundary-condition log now carries the contact-resistance value traced to the direct contact-resistance measurement. The decision packet is one page. The technical backup is three.