19. Training, Cadence, and Failure Recovery
Friday, 2:18 p.m., the building going quiet at the edges, the habits already starting to slip before people scatter.
The rollout looked good for six weeks. Then one lead change, one urgent supplier issue, and one skipped review cycle later, old behavior was back. The relapse analysis found three gaps: the skipped review had been covered by a handoff email, not a backup-trained lead; the lead change had no ownership-transfer protocol; the supplier issue had triggered an exception meeting that displaced the review cadence without a recovery date.
Three controls were added: a named backup DRI for each cadence slot, a thirty-day cadence audit after any lead change, and an explicit recovery trigger — if any scheduled review is displaced, the next one is held within five days regardless of program pressure. Six months later, a second lead change hit the same team. The cadence held.
- Old process: rollout with no durability controls — one personnel change and one supplier escalation collapsed the review cadence; no backup-DRI assignment, no recovery trigger.
- Artifact changed: backup DRI assignments per cadence slot, a thirty-day cadence audit following any lead change, and a five-day recovery trigger for displaced reviews.
- Measured improvement: a second lead change six months later did not produce a relapse; the cadence held through the disruption.
- Cost of the gap: six weeks of apparent adoption followed by full relapse — all adoption work restarted, plus a program quarter of uncontrolled drift during the collapse.
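The three controls above are simple enough to check mechanically. The following is a minimal sketch, not a prescribed tool: the names `CadenceSlot`, `recovery_deadline`, `audit_due`, and `slots_missing_backup` are hypothetical, the five-day window is assumed to mean calendar days, and the thirty-day audit is read here as "due thirty days after the lead change," which is one possible interpretation of the rule.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List, Optional

# Values taken from the text; the calendar-day interpretation is an assumption.
RECOVERY_WINDOW_DAYS = 5   # a displaced review is re-held within five days
AUDIT_WINDOW_DAYS = 30     # cadence audit follows a lead change by thirty days


@dataclass
class CadenceSlot:
    """One recurring review slot (hypothetical structure for illustration)."""
    name: str
    dri: str
    backup_dri: Optional[str] = None  # named backup DRI, if one is assigned


def recovery_deadline(displaced_on: date) -> date:
    """Latest date on which a displaced review may still be held."""
    return displaced_on + timedelta(days=RECOVERY_WINDOW_DAYS)


def audit_due(lead_changed_on: date) -> date:
    """Date the post-transition cadence audit falls due."""
    return lead_changed_on + timedelta(days=AUDIT_WINDOW_DAYS)


def slots_missing_backup(slots: List[CadenceSlot]) -> List[str]:
    """Names of cadence slots that have no backup DRI assigned."""
    return [slot.name for slot in slots if not slot.backup_dri]
```

A check like `slots_missing_backup` is most useful when it runs before a disruption rather than after one: an empty result means every slot can survive the absence of its primary owner.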
Training is behavior transfer, not slide transfer
New people do not need the full theory first.
They need operational moves in context.
Onboarding minimum:
- ownership map and closure expectations,
- one-page truth mechanics and source links,
- risk trigger/escalation behavior,
- decision record standard.
Teach these on live program threads, not only in abstract slides.
Cadence is the enforcement mechanism
Without cadence, standards drift into preference.
Minimum cadence set:
- weekly decision/risk sync (updated decision log and risk register entries),
- fixed one-page refresh rhythm (published one-page status before each review),
- gate prep/check with artifact traceability (gate artifact delta from the record system),
- monthly retro on control-loop failures (playbook update with named control adjustments).
Cadence should be predictable and short enough to survive workload spikes.
A PM who has a reliable weekly cadence stops pinging for status between reviews. That is not because they stopped caring about the program, but because the cadence gives them a predictable artifact update they can trust. An engineering lead who has locked in a reliable cadence has bought themselves a workable environment.
Failure recovery without blame loops
When the system slips:
- identify the control that failed (owner, record, gate, risk, status),
- identify why it failed (capacity, ambiguity, missing authority, poor handoff),
- restore minimum behavior on one active thread,
- capture adjustment in playbook.
Do not frame recovery as "who failed the process."
Frame it as "which control failed under what condition."
Metrics for real adoption
Activity metrics mislead when closure and mismatch outcomes stay flat.
Track these signals across consecutive cycles — a single-cycle improvement can be noise; a sustained trend is the signal:
- decision closure cycle time,
- reopen rate on closed decisions,
- one-page vs source mismatch frequency,
- late risk discovery rate,
- unresolved ownerless decision count.
If these metrics improve for two consecutive reporting cycles, behavior is changing.
If only template completion improves while outcome metrics stay flat, adoption is cosmetic.
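The distinction between real and cosmetic adoption can be expressed as a small classification rule. This is a sketch under stated assumptions: the metric keys, the function names, and the "inconclusive" fallback are hypothetical, every outcome metric is treated as lower-is-better, and "improve" is read as a strict drop in each of the last two cycles. Requiring all five outcome metrics to improve is also an assumption; a program may reasonably accept fewer.

```python
from typing import Dict, List

# Hypothetical keys for the five outcome signals listed above (lower is better).
OUTCOME_METRICS = [
    "decision_closure_cycle_time",
    "reopen_rate",
    "one_page_mismatch_frequency",
    "late_risk_discovery_rate",
    "ownerless_decision_count",
]


def falling(series: List[float]) -> bool:
    """True if each of the last two cycles dropped relative to the one before."""
    if len(series) < 3:
        return False  # two consecutive changes need three data points
    a, b, c = series[-3:]
    return c < b < a


def rising(series: List[float]) -> bool:
    """True if each of the last two cycles rose relative to the one before."""
    if len(series) < 3:
        return False
    a, b, c = series[-3:]
    return c > b > a


def classify_adoption(outcomes: Dict[str, List[float]],
                      template_completion: List[float]) -> str:
    """Classify adoption from per-cycle outcome metrics and template completion."""
    if all(falling(outcomes[m]) for m in OUTCOME_METRICS):
        return "behavior changing"
    no_outcome_moved = not any(falling(outcomes[m]) for m in OUTCOME_METRICS)
    if rising(template_completion) and no_outcome_moved:
        return "cosmetic"
    return "inconclusive"  # assumed fallback; the text defines only two cases
```

The three-point minimum in `falling` and `rising` encodes the warning about single-cycle noise: one better number never changes the classification by itself.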
Turnover resilience
Systems fail at role transitions unless handover is explicit.
Require a handover package for key roles:
- active decisions and owners,
- top risks and triggers,
- current one-page state,
- unresolved escalations,
- next gate commitments.
This turns personnel changes into manageable events.
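The handover package listed above lends itself to a completeness check before a transition is signed off. The sketch below is illustrative only: the section keys and the `missing_sections` function are hypothetical names, and the choice to accept an explicitly empty section (for example, no unresolved escalations) as present is an assumption about how a team might want the check to behave.

```python
from typing import Dict, List

# Hypothetical keys mirroring the five handover items listed above.
REQUIRED_SECTIONS = [
    "active_decisions_and_owners",
    "top_risks_and_triggers",
    "current_one_page_state",
    "unresolved_escalations",
    "next_gate_commitments",
]


def missing_sections(package: Dict[str, List[str]]) -> List[str]:
    """Return required sections that are absent from the handover package.

    A section that is present but empty counts as complete: the outgoing
    lead has stated explicitly that there is nothing to hand over there.
    """
    return [section for section in REQUIRED_SECTIONS if section not in package]
```

Treating an omitted section differently from an empty one keeps the check strict on ownership and traceability without forcing filler content into the package.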
Keep the bar practical
Overly rigid process breaks under load spikes.
Overly loose process breaks when ownership or data is ambiguous.
Durable cadence means:
- strict on ownership and traceability,
- flexible on meeting format and local workflow details.
Eight weeks later, same program. The lead changed. The supplier issue hit. The review cycle slipped by one day, not three weeks, because the one-page had been published before Friday, the risk register named an owner on the supplier item, and the incoming lead read both before the first review. The handover package was on the shared drive: active decisions, top risks, current one-page, unresolved escalations, next gate commitments.
- Old process: cadence habits lived in people, not program artifacts; when people changed, habits reset.
- Artifact changed: the handover package and the locked one-page refresh cadence, converted from personal routine to documented system requirement.
- Measured improvement: successor lead fully operational within one cycle instead of six weeks.
- Cost of the gap: six weeks of apparent stability built on individual memory, not system.