3. Failure Patterns: Chaos, Drift, and Unowned Decisions

The minimum control loop is running. What it cannot do on its own is tell you which failure pattern is breaking it — or where to look first.

Monday, 12:50 p.m. The afternoon already drags, and the coffee is going cold in the conference room.

Day 19 of an EVT build for a handheld controller, and this is the third meeting on the same leakage-limit decision; every slip keeps a $42k fixture order on hold.

Test says one thing. The status page says another. Procurement already acted on a third version because nobody marked which number was live.

Three people in the room have calendar proof they worked the issue; the program has no record of who owns the live number or when it was last confirmed.

The program still misses the decision window.

This is where teams waste months: treating recurring operating failures as isolated incidents.


Why naming patterns matters

Most postmortems describe events. Useful postmortems name patterns.

An event is "supplier lot failed incoming check." A pattern is "we had no owner for the handoff from incoming data to build decision, so three teams acted on different states for six days."

If you only name events, you fix the last fire. If you name patterns, you remove a class of fires.

Three patterns show up together in hardware programs:

  1. Chaos: too many truths active at once.
  2. Drift: reality moved, but plan/status did not.
  3. Unowned decisions: the question is known, but no person is accountable for closing it.

They look separate in meetings. In practice, each slipped closure spreads parallel assumptions and leaves status stale, so the three patterns reinforce each other.

Pattern 1: Chaos

Chaos means multiple conflicting versions of the same fact are active at once.

Symptoms are easy to spot:

  • Two "latest" revisions in circulation.
  • Lab evidence in one channel, supplier evidence in another, decision in neither.
  • A status page showing green, built from stale assumptions.

Chaos cost is immediate: rework, mis-buys, and wasted meeting cycles spent reconciling state before real problem-solving even starts.

The core mechanism is missing control of truth state. If nobody can answer "which value is live, where, and since when," the team is in chaos even when people feel organized.
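
One way to make the "which value is live, where, and since when" question checkable is to list every value currently in circulation for a fact and flag any fact that has more than one. The sketch below is a minimal illustration, not a prescribed tool: the `Sighting` record, the channel names, and the numbers are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Sighting:
    """One place where a value for a fact is currently in use (hypothetical record)."""
    fact: str      # e.g. "leakage_limit_uA"
    value: str     # the value as stated in that channel
    channel: str   # where it was seen: test report, status page, PO, ...
    as_of: date    # when that channel last stated it


def find_chaos(sightings):
    """Return {fact: sightings} for every fact with more than one distinct value."""
    by_fact = defaultdict(list)
    for s in sightings:
        by_fact[s.fact].append(s)
    return {
        fact: group
        for fact, group in by_fact.items()
        if len({s.value for s in group}) > 1
    }


# Illustrative data only; the numbers are made up.
sightings = [
    Sighting("leakage_limit_uA", "5", "test report", date(2024, 3, 4)),
    Sighting("leakage_limit_uA", "8", "status page", date(2024, 2, 26)),
    Sighting("leakage_limit_uA", "10", "fixture PO", date(2024, 2, 20)),
    Sighting("build_rev", "C", "status page", date(2024, 3, 1)),
]

for fact, group in find_chaos(sightings).items():
    distinct = len({s.value for s in group})
    print(f"{fact}: {distinct} distinct values in circulation")
    for s in group:
        print(f"  {s.value} in {s.channel} as of {s.as_of}")
```

A fact that appears in this output has no single live value yet. The check only finds the conflict; deciding which value is live still takes a named owner.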

Pattern 2: Drift

Drift starts after a legitimate change.

New evidence arrives. A requirement shifts. A supplier lead time stretches. Nothing dramatic happens that day. One week later the schedule, one-page status, and downstream tasks still assume the old world.

That lag is drift.

Drift cost compounds quietly:

  • Date risk is discovered late.
  • The risk register shows lower risk than the evidence supports, because nobody updated it after the change.
  • Teams optimize for targets that no longer match physics or supply reality.

The core mechanism is slow or missing propagation of change. A change without a written, owned update path is not a change. It is a pending surprise.
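
A written, owned update path can be as small as a change record that lists the artifacts it must reach. The sketch below assumes hypothetical `Change` and `Artifact` records and invented dates; it flags artifacts last touched before the change took effect. It is a coarse check: an artifact edited after the change date may still not reflect the change, so a pass here is necessary, not sufficient.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Artifact:
    name: str
    last_updated: date


@dataclass
class Change:
    description: str
    effective: date
    owner: str          # person accountable for propagating the change
    dependents: list    # artifacts that must reflect this change


def stale_dependents(change):
    """Artifacts not updated on or after the change date: the drift candidates."""
    return [a for a in change.dependents if a.last_updated < change.effective]


# Illustrative data only; names and dates are made up.
change = Change(
    description="supplier lead time stretched",
    effective=date(2024, 3, 4),
    owner="J. Doe",
    dependents=[
        Artifact("schedule", date(2024, 2, 27)),
        Artifact("one-page status", date(2024, 3, 5)),
        Artifact("risk register", date(2024, 2, 19)),
    ],
)

for a in stale_dependents(change):
    lag = (change.effective - a.last_updated).days
    print(f"{a.name}: last updated {lag} days before the change")
```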

Pattern 3: Unowned decisions

Unowned decisions usually sound like "we keep discussing this."

The question is clear. The data is mostly available. The trade is known. But there is no named person responsible for closing the decision, documenting it, and updating dependent work.

So the decision becomes a recurring meeting topic instead of a completed program action.

Unowned decisions create both chaos and drift:

  • While closure is delayed, multiple interim assumptions spread (chaos).
  • As closure slips, the plan and status diverge from current evidence (drift).

This is why ownership is not a management preference. It is a technical control.
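
If open decisions are kept as records, the unowned ones can be listed mechanically. A minimal sketch, assuming a hypothetical `OpenDecision` record with an optional owner and a count of meetings where the question was on the agenda; the example questions and names are invented for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class OpenDecision:
    question: str
    owner: Optional[str]      # named person accountable for closing it, or None
    meetings_discussed: int   # times it has appeared on an agenda


def unowned(decisions):
    """Open decisions with no named owner, most-discussed first."""
    missing = [d for d in decisions if not d.owner]
    return sorted(missing, key=lambda d: d.meetings_discussed, reverse=True)


# Illustrative data only.
decisions = [
    OpenDecision("Leakage limit for EVT units?", None, 3),
    OpenDecision("Second-source connector?", "A. Rivera", 1),
    OpenDecision("Drop-test sample size?", None, 1),
]

for d in unowned(decisions):
    print(f"{d.meetings_discussed} meetings, no owner: {d.question}")
```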

How the three patterns chain together

They usually run in this order:

  1. A decision is unowned, so closure slips.
  2. During the slip, teams run different assumptions (chaos).
  3. Status and schedule lag behind reality (drift).
  4. The eventual correction is expensive because many downstream actions are already committed.

You can interrupt the chain at any step, but the cheapest break point is the first: assign ownership and close the decision while the blast radius is still small.

Fast diagnostic you can run this week

Pick one active cross-functional issue and ask five questions:

  1. What exact question must be decided?
  2. Who is the owner of closing it?
  3. Where is the current agreed value or decision recorded — the one the team is building and testing to right now?
  4. What changed in the last week, and where is that recorded?
  5. Which downstream artifacts (schedule, status page, risk register) were updated after the change?

If your team cannot answer all five in under ten minutes, you are not dealing with one bad meeting. You are inside one or more of these patterns.
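
The five questions and the ten-minute limit can be written down as a small check so the result is recorded rather than remembered. This is a sketch under stated assumptions: the function name, the answer format, and the example answers are hypothetical, and "answered" here only means a non-empty answer was given, not that the answer is correct.

```python
QUESTIONS = [
    "What exact question must be decided?",
    "Who is the owner of closing it?",
    "Where is the current agreed value or decision recorded?",
    "What changed in the last week, and where is that recorded?",
    "Which downstream artifacts were updated after the change?",
]


def run_diagnostic(answers, minutes_taken, limit_minutes=10):
    """Return (passed, unanswered_questions).

    `answers` maps question number (1-5) to the team's answer. A missing,
    empty, or whitespace-only answer counts as unanswered. The check passes
    only if all five are answered in under `limit_minutes`.
    """
    unanswered = [
        q for i, q in enumerate(QUESTIONS, start=1)
        if not (answers.get(i) or "").strip()
    ]
    passed = not unanswered and minutes_taken < limit_minutes
    return passed, unanswered


# Illustrative answers only.
answers = {
    1: "Leakage limit for EVT units",
    2: "",
    3: "decision log on the status page",
    4: "new lab data arrived Thursday; recorded in the test report",
    5: "",
}

passed, unanswered = run_diagnostic(answers, minutes_taken=12)
print("passed:", passed)
for q in unanswered:
    print("unanswered:", q)
```

The unanswered questions point at the pattern: a blank on question 2 suggests an unowned decision, a blank on 3 suggests chaos, and blanks on 4 or 5 suggest drift.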

What to fix first

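Do not try to solve chaos, drift, and ownership with three separate initiatives. Start at the cheapest break point: name one owner for the live value and record it where the team already looks.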


The Day 19 stall on the handheld-controller EVT (the $42k fixture order on hold, the third meeting on the same leakage-limit decision) was resolved when one owner was named for the live value. A single entry was added to the existing status record: current live value, owner, and date last confirmed. The next week's review ran the five-question check in seven minutes. The fixture order cleared. The decision did not come back to a meeting.

  • Old process: three channels carrying different values, no truth-state owner.
  • Artifact changed: one entry added to the live decision record, holding owner name, confirmed value, and confirmation date.
  • Measured improvement: two repeat meetings eliminated; the $42k fixture order cleared in the same week as the ownership assignment.
  • Cost of the gap: nineteen days of delayed procurement action and three meetings on a decision that one named owner could have closed in the first.