14. Risk and How We Define Done on Paper

Thursday, 9:40 a.m. The gate prep table is covered in coffee rings, and the gate review package marks all actions closed while two critical failure modes remain open in current lab evidence.

The gate review package artifact states "all actions closed."

In the lab log, verification and reliability owners still track two unresolved critical failure modes with no accepted closure evidence.

The paper package says done, but the program state is not done.

That gap is where late cost, schedule slips, and credibility damage enter through integration and executive review.

This is the gate-closure logic that depends on calibrated chamber evidence and a controlled requirement revision. In the canonical case, it is the closure that must happen before the thermal case temperature requirement and the supplier CpK risk row can close at the release gate.

"Done on paper" must match decision reality

Artifacts are useful only when a named owner uses them to change a specific decision and downstream behavior.

A credible "done" statement is a decision object that must:

state explicit criteria (gate chair's pass condition),
cite evidence references (gate chair's confidence source),
include owner sign-off,
declare residual risk (gate chair's accept-or-escalate input),
confirm downstream updates are complete.

If any element is missing, "done" is only a status label and fails as a decision condition at gate.
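The five elements above can be made checkable by treating a "done" statement as a small record and listing whatever is missing. The sketch below is illustrative only: the class name `DoneStatement`, its field names, and the sample evidence reference are assumptions introduced here, not part of any tool this chapter names.

```python
from dataclasses import dataclass, field


@dataclass
class DoneStatement:
    """Hypothetical record for one closure claim; field names are illustrative."""
    criteria: str = ""                                       # explicit pass condition
    evidence_refs: list[str] = field(default_factory=list)   # evidence references
    owner_signoff: str = ""                                  # named owner who signed off
    residual_risk: str = ""                                  # must be stated, even if "none"
    downstream_updated: bool = False                         # downstream updates confirmed

    def missing_elements(self) -> list[str]:
        """Return the names of the elements that are absent or blank."""
        missing = []
        if not self.criteria.strip():
            missing.append("criteria")
        if not self.evidence_refs:
            missing.append("evidence_refs")
        if not self.owner_signoff.strip():
            missing.append("owner_signoff")
        if not self.residual_risk.strip():
            missing.append("residual_risk")
        if not self.downstream_updated:
            missing.append("downstream_updated")
        return missing

    def is_decision_ready(self) -> bool:
        """True only when all five elements are present."""
        return not self.missing_elements()


# Sample claim with made-up values: residual risk and downstream updates are absent.
claim = DoneStatement(
    criteria="Case temperature within limit in calibrated chamber run",
    evidence_refs=["LAB-REPORT-EXAMPLE-01"],
    owner_signoff="verification owner",
)
print(claim.missing_elements())   # ['residual_risk', 'downstream_updated']
print(claim.is_decision_ready())  # False
```

A claim that returns a non-empty list is still a status label; the list itself tells the owner what to supply before the gate.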

Minimum credible artifact standard

For each risk or verification artifact, require this minimum metadata:

name artifact owner,
record revision and date,
link requirement or risk ID,
state decision linkage (what choice this supports),
report current confidence and known gaps.

If the artifact has no linkage, the gate chair cannot use it for a decision.

Artifact evolution across builds

Risk artifacts should evolve with new evidence across builds, not reset at each phase boundary.

Each artifact revision should show four deltas:

state what changed,
record what remains uncertain,
retire concerns with evidence and rationale,
declare what new risk emerged.

Resetting artifacts to "clean" each phase destroys institutional memory that gate and program leaders need.
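One way to catch a silent reset is to compare consecutive revisions and flag any open concern that is neither carried forward nor retired with evidence and rationale. This is a minimal sketch under stated assumptions: `ArtifactRevision`, `Retirement`, and the two helper functions are hypothetical names, and the sample concerns are borrowed from the case later in this chapter purely for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Retirement:
    """Closure basis for a retired concern; both fields are expected to be filled."""
    evidence_ref: str
    rationale: str


@dataclass
class ArtifactRevision:
    """Hypothetical snapshot of one risk artifact at one build."""
    label: str
    changed: list[str] = field(default_factory=list)              # what changed
    open_concerns: set[str] = field(default_factory=set)          # what remains uncertain
    retired: dict[str, Retirement] = field(default_factory=dict)  # concern -> closure basis
    new_risks: set[str] = field(default_factory=set)              # what new risk emerged


def silently_dropped(previous: ArtifactRevision, current: ArtifactRevision) -> set[str]:
    """Concerns live in the previous revision that are neither carried forward nor retired."""
    accounted_for = current.open_concerns | set(current.retired)
    return (previous.open_concerns | previous.new_risks) - accounted_for


def unsupported_retirements(current: ArtifactRevision) -> list[str]:
    """Retired concerns that lack an evidence reference or a rationale."""
    return sorted(
        name for name, basis in current.retired.items()
        if not basis.evidence_ref.strip() or not basis.rationale.strip()
    )


# Illustrative data: build 2 drops one concern without retiring it.
build_1 = ArtifactRevision(
    label="build 1",
    open_concerns={"thermal runaway propagation", "connector derating under humidity"},
)
build_2 = ArtifactRevision(
    label="build 2",
    changed=["chamber rerun added"],
    open_concerns={"thermal runaway propagation"},
)
print(sorted(silently_dropped(build_1, build_2)))  # ['connector derating under humidity']
```

An empty result from both helpers means every earlier concern is either still visible or retired with a stated basis.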

Busywork vs decision work

Reject artifact packages that:

are completed only to satisfy format,
contain generic failure text with no operating context,
omit a named owner or closure date,
cannot point to a decision they changed.

Keep artifact packages that:

expose uncertainty honestly,
map to active decisions,
summarize tradeoffs in language each tier can evaluate without follow-up questions.

A gate package at a large industrial program marked all thermal actions closed. Two critical failure modes (thermal runaway propagation and connector derating under field humidity) were still under active investigation in the test lab. The disconnect: the action tracker was PM-owned and tracked action-item status by ID. The failure-mode log was engineering-owned and tracked investigation state by failure mode. No artifact required a closure entry to cite a failure-mode log entry rather than just an action ID. One internal audit caught the gap three days before the gate review. If the gate had proceeded, the program would have released a build configuration while two unresolved failure modes were still in active investigation, with a six-to-eight-week integration cost if either materialized in production.

Old process: closure based on action-item status, not failure-mode investigation status.
Artifact changed: a requirement in the gate-packet template that closure evidence must cite the failure-mode log entry (not just the action ID) and state the investigation disposition.
Measured improvement: the audit caught the gap three days before the gate review; the alternative was 6–8 weeks of post-release investigation.
Cost of the gap: the organization had been running this way for two gate cycles and had closed gates with unresolved failure modes at least twice before the audit caught it.

Tie-ins to gates and one-page truth

Risk artifacts should not stay in a technical silo; they must drive shared program controls.

They must drive four downstream actions:

update gate decisions,
update risk register entries,
update one-page narrative deltas,
trigger escalation with named owners.

If this propagation is weak, paper confidence drifts from the real program state and misleads leadership decisions.

Acceptable implementations for the failure-mode tracker include a dedicated FRACAS (Failure Reporting, Analysis, and Corrective Action System) when the organization already has one, a separate backlog slice that models failure modes as first-class records linked bidirectionally to mitigation actions via a keyed join field, a controlled spreadsheet with a named failure-mode ID column that action items must reference at closure, or a markdown file under version control with explicit closure dispositions. The OS cares that the failure-mode investigation status and the action-item closure status are cross-referenced, not that they live in the same tool.
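Whatever tool holds the records, the cross-reference itself reduces to a join on the failure-mode ID. The sketch below assumes an exported action tracker and failure-mode log; the column names, IDs, disposition strings, and the function `closure_conflicts` are all hypothetical and stand in for whatever the organization's own tools use.

```python
# Hypothetical failure-mode log: failure-mode ID -> investigation disposition.
failure_mode_log = {
    "FM-001": "under investigation",
    "FM-002": "closed: root cause verified",
}

# Hypothetical action tracker rows; "failure_mode_id" is the keyed join field.
actions = [
    {"action_id": "A-101", "status": "closed", "failure_mode_id": "FM-001"},
    {"action_id": "A-102", "status": "closed", "failure_mode_id": "FM-002"},
    {"action_id": "A-103", "status": "closed", "failure_mode_id": None},
    {"action_id": "A-104", "status": "open", "failure_mode_id": "FM-001"},
]

# Dispositions that mean the investigation is not finished (assumed vocabulary).
OPEN_DISPOSITIONS = {"open", "under investigation"}


def closure_conflicts(actions, failure_mode_log):
    """Return (action_id, reason) for closed actions whose failure mode is not dispositioned."""
    conflicts = []
    for action in actions:
        if action["status"] != "closed":
            continue  # only closure claims are checked
        fm_id = action.get("failure_mode_id")
        if not fm_id:
            conflicts.append((action["action_id"], "no failure-mode ID cited"))
        elif fm_id not in failure_mode_log:
            conflicts.append((action["action_id"], f"{fm_id} not found in failure-mode log"))
        elif failure_mode_log[fm_id] in OPEN_DISPOSITIONS:
            conflicts.append((action["action_id"], f"{fm_id} is still {failure_mode_log[fm_id]}"))
    return conflicts


for action_id, reason in closure_conflicts(actions, failure_mode_log):
    print(action_id, "->", reason)
# A-101 -> FM-001 is still under investigation
# A-103 -> no failure-mode ID cited
```

Running a check like this during gate prep surfaces closed actions that point at an undispositioned failure mode, or at none at all, before the package reaches the gate chair.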

Practical "done" review in 15 minutes

For each high-impact closure item, ask:

What was the done criterion?
What evidence satisfies it?
What risk remains despite closure?
Which dependent decisions changed because of this closure?

If questions 3 and 4 are blank, closure is cosmetic: do not treat it as done, and return the item to the evidence owner before the gate proceeds.

Done means the failure mode is dispositioned, not merely that the action item is closed.