Self-Healing
Three layers of automated recovery -- from quick one-shot fixes to deep root-cause analysis. Up to 6 fix attempts and 2 healing cycles before requesting human intervention.
Three Layers of Recovery
Each layer escalates from the previous. Most failures resolve at Layer 1. Complex issues reach Layer 2. Only persistent failures hit Layer 3.
Escalation path from first-responder fix to deep healing to coordinator retry.
Layer 1: dx-step-fix
First responder. ONE fix attempt per invocation — apply a minimal fix, re-run the test command, report success or failure. No refactoring, no exploration.
Layer 2: dx-step-fix (escalation)
Triggered after 2 consecutive fix failures. dx-step-fix escalates to root cause analysis using extended thinking (ultrathink). Creates corrective steps with a DIFFERENT approach.
Layer 3: Coordinator
The coordinator loop (dx-step-all, dx-agent-all) orchestrates the retry cadence: fix, fix, escalate, execute corrective steps, repeat up to 2 escalation cycles.
dx-step-fix -- First Responder
One-shot fix attempt. Read the error, apply a minimal correction, re-verify, report.
Strategy
- Read the blocked step’s error from implement.md
- Apply a minimal fix (not refactoring)
- Re-run the step’s test command
- Success: mark step as done
- Failure: mark step as blocked with a diagnosis, then STOP
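The strategy above can be sketched as a single Python function. This is a minimal sketch, not the skill's implementation. The `Step` fields and the `apply_minimal_fix` and `run_test` callables are hypothetical stand-ins for the real behavior, which edits source files and runs the step's test command. The function makes one attempt and returns, leaving the retry decision to the caller.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass
class Step:
    """Hypothetical in-memory view of one step from implement.md."""
    number: str                       # e.g. "3", "3h", "R1"
    test_command: str                 # the step's own test command
    status: str = "blocked"           # "done" or "blocked"
    error: Optional[str] = None       # error text recorded for the blocked step
    diagnosis: Optional[str] = None   # filled in when the fix attempt fails


def step_fix(step: Step,
             apply_minimal_fix: Callable[[str], None],
             run_test: Callable[[str], Tuple[bool, str]]) -> bool:
    """Make exactly ONE fix attempt: fix, re-verify, report, stop."""
    apply_minimal_fix(step.error or "")           # targeted correction, no refactoring
    passed, output = run_test(step.test_command)  # re-run the step's test command
    if passed:
        step.status, step.diagnosis = "done", None
    else:
        step.status, step.diagnosis = "blocked", output  # leave a diagnosis, then STOP
    return passed                                 # the coordinator decides what happens next
```

The function has no loop. Repeated attempts come from the coordinator calling it again, which is what makes "one attempt per invocation" enforceable.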
Superpowers Integration
Optionally invokes superpowers:systematic-debugging for structured 4-phase
diagnosis (Observe, Hypothesize, Test, Fix) when the error is ambiguous or multi-layered.
Falls back to inline diagnosis when superpowers is not installed.
One Attempt Only
dx-step-fix makes exactly ONE attempt per invocation. If the fix does not resolve the error,
it marks the step as blocked and stops. The coordinator decides what happens next.
dx-step-fix (escalation) -- Deep Root-Cause Analysis
After 2 consecutive fix failures, the healer takes over with extended thinking and a fundamentally different approach.
Type A: Step Blocked
Step has Status: blocked with a diagnosis. Heal creates ONE corrective step
with distinctive numbering: Step 3h (first cycle), Step 3h2 (second cycle).
The corrective step MUST use a different strategy from the original.
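The Type A numbering and the different-strategy rule can be sketched in Python as follows. This assumes strategies can be compared as plain text labels, which is a simplification for illustration: the real healer reasons about the approach with extended thinking rather than comparing strings. `make_corrective_step` and the strategy strings are hypothetical.

```python
def corrective_step_id(original: str, cycle: int) -> str:
    """Type A numbering: Step 3 -> 3h (first cycle) -> 3h2 (second cycle)."""
    if cycle not in (1, 2):
        raise ValueError("at most 2 healing cycles are allowed")
    return f"{original}h" if cycle == 1 else f"{original}h{cycle}"


def make_corrective_step(original: str, cycle: int, strategy: str,
                         tried_strategies: set) -> dict:
    """Create ONE corrective step, refusing to reuse a strategy that already failed."""
    if strategy in tried_strategies:
        raise ValueError(f"strategy {strategy!r} already failed; pick a different approach")
    return {"number": corrective_step_id(original, cycle), "strategy": strategy}
```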
Type B: Review Failed
Full code review (dx-step-verify) failed after 3 cycles. Heal groups remaining Critical/Important
issues by file and creates numbered corrective steps: R1, R2, etc.
Second iteration uses b suffix.
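The Type B grouping can be sketched in Python under two assumptions. First, each review issue is modeled as a hypothetical dict with `file`, `severity`, and `message` keys. Second, the "b suffix" is read as producing R1b, R2b, and so on. Files are sorted so that the numbering is deterministic; that ordering is also an assumption.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

BLOCKING = ("Critical", "Important")


def review_corrective_steps(issues: List[dict], iteration: int = 1
                            ) -> Dict[str, Tuple[str, List[str]]]:
    """Group remaining Critical/Important issues by file into R-numbered steps."""
    by_file: Dict[str, List[str]] = defaultdict(list)
    for issue in issues:
        if issue["severity"] in BLOCKING:       # Minor issues do not get a step
            by_file[issue["file"]].append(issue["message"])
    suffix = "" if iteration == 1 else "b"      # assumed: second iteration -> R1b, R2b, ...
    return {
        f"R{n}{suffix}": (path, messages)
        for n, (path, messages) in enumerate(sorted(by_file.items()), start=1)
    }
```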
Extended Thinking
Uses ultrathink mode for deep reasoning about the root cause. The healer sees the full error context, previous fix attempts, and the original step intent.
Never Writes Code
The healer NEVER writes source code directly. It only creates new steps in implement.md. The coordinator then executes those steps normally through dx-step.
Different Strategy
If the first fix failed, the corrective step MUST use a fundamentally different approach.
Returns healed (continue) or unrecoverable (stop).
Coordinator Loop
The full recovery flow orchestrated by dx-step-all and dx-agent-all.
Execute, fix (x2), heal, execute corrective steps, repeat. Maximum 2 healing cycles before human intervention.
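The coordinator cadence can be sketched in Python as a single loop. The `execute`, `fix`, and `heal` callables are hypothetical stand-ins for dx-step, dx-step-fix, and its escalation mode. In this sketch, `heal` returns the corrective step number, or `None` when it reports unrecoverable.

```python
from enum import Enum
from typing import Callable, Optional

MAX_FIX_ATTEMPTS = 2   # consecutive dx-step-fix calls before escalating
MAX_HEAL_CYCLES = 2    # escalation (healing) cycles before asking a human


class Outcome(Enum):
    DONE = "done"
    NEEDS_HUMAN = "needs human intervention"


def run_with_recovery(step: str,
                      execute: Callable[[str], bool],
                      fix: Callable[[str], bool],
                      heal: Callable[[str, int], Optional[str]]) -> Outcome:
    """Execute, fix (x2), heal, execute corrective step, repeat up to 2 heal cycles."""
    current = step
    heal_cycles = 0                      # kept in memory only, never persisted
    while True:
        if execute(current):
            return Outcome.DONE
        for _ in range(MAX_FIX_ATTEMPTS):
            if fix(current):             # each call is ONE fix attempt
                return Outcome.DONE
        if heal_cycles == MAX_HEAL_CYCLES:
            return Outcome.NEEDS_HUMAN   # budget exhausted
        heal_cycles += 1
        corrective = heal(step, heal_cycles)   # writes a new step, never source code
        if corrective is None:                 # healer reported "unrecoverable"
            return Outcome.NEEDS_HUMAN
        current = corrective                   # e.g. "3h", then "3h2"
```

When every attempt fails, this loop calls `fix` 6 times and `heal` twice before stopping. Because `heal_cycles` is a local variable, the count disappears when the process exits.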
dx-step-verify -- The 6-Phase Gate
Runs after all steps complete, before commit: five pre-review checks (Phases 1-5) followed by a deep code review (Phase 6).
The phases are sequential quality gates. Each phase gets at most 2 fix attempts before escalating, except the secret scan, which never retries.
Secret Scan (Phase 4)
IMMEDIATE STOP if secrets are found. No override, no retry, no healing. The pipeline halts and requires human intervention to remove the leaked secret.
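The gate sequence can be sketched in Python with the secret scan as a hard stop. Only the secret scan and the code review are named in this page. The other phase names used in this sketch, such as `lint` and `build`, are hypothetical. The Phase 6 review-fix loop, with its 3 cycles, is not modeled here.

```python
from typing import Callable, List, Tuple


class SecretFound(Exception):
    """Halts the pipeline outright: no override, no retry, no healing."""


def run_gate(phases: List[Tuple[str, Callable[[], bool]]],
             fix: Callable[[str], None],
             secret_phase: str = "secret-scan",
             max_fixes: int = 2) -> str:
    """Run phases in order; each failing phase gets up to max_fixes fix attempts."""
    for name, check in phases:
        if check():
            continue
        if name == secret_phase:
            raise SecretFound(name)          # IMMEDIATE STOP, a human must remove the secret
        for _ in range(max_fixes):
            fix(name)
            if check():
                break                        # phase now passes, move to the next one
        else:
            return f"escalate:{name}"        # both fix attempts failed
    return "passed"
```

Raising an exception rather than returning a status makes the secret scan impossible to "handle" by accident in a retry loop.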
Code Review (Phase 6)
Uses the dx-code-reviewer agent (Opus model) with a confidence threshold of 80 on a 0-100 scale, so only issues the reviewer is CERTAIN about are reported. Severity: Critical > Important > Minor. The review-fix loop runs for at most 3 cycles.
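The threshold and severity ordering can be sketched in Python as a filter followed by a sort. Two details are assumptions: the dict shape of an issue, and whether a score of exactly 80 passes. This sketch treats the threshold as inclusive.

```python
from typing import List

CONFIDENCE_THRESHOLD = 80                     # on a 0-100 scale
SEVERITY_RANK = {"Critical": 0, "Important": 1, "Minor": 2}


def reportable_issues(issues: List[dict]) -> List[dict]:
    """Drop low-confidence findings, then order by severity (Critical first)."""
    confident = [i for i in issues if i["confidence"] >= CONFIDENCE_THRESHOLD]
    return sorted(confident, key=lambda i: SEVERITY_RANK[i["severity"]])
```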
Final Verification Gate
After the review-fix loop concludes, dx-step-verify optionally invokes
superpowers:verification-before-completion for a final cross-cutting check —
confirming all acceptance criteria are met before marking the step done.
Healing Data Captured
What gets recorded during recovery and where it lives.
| Data Point | Where Stored | Used For |
|---|---|---|
| Step status (done/blocked) | implement.md | Flow control |
| Block diagnosis | implement.md (**Blocked:**) | Escalation (heal) input |
| Fix attempts count | Coordinator memory (not persisted) | Strike counting |
| Healing cycles count | Coordinator memory (not persisted) | Max cycle enforcement |
| Review issues | step-verify output (transient) | Fix prioritization |
| Corrective step numbering | implement.md (3h, R1, etc.) | Audit trail |
Known Gaps
Fix/heal counts are only in coordinator memory — lost between sessions. No pattern aggregation, no success rate tracking, and no cross-story learning. These gaps are addressed by the self-learning system (see Learning and Feedback page).
Recovery Statistics
Maximum recovery attempts before requesting human intervention.
Per-Step Execution Failures
- 2 fix attempts (dx-step-fix)
- 2 healing cycles (dx-step-fix escalation)
- Each healing cycle: new corrective step + 2 more fix attempts
- Total: up to 6 fix attempts + 2 heal analyses
Code Review Failures
- 3 review-fix cycles (dx-step-verify)
- 2 healing cycles (dx-step-fix escalation, run by the coordinator)
- Each healing: corrective steps + rebuild + re-review
- Total: up to 9 review cycles
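Both totals follow the same arithmetic: one initial round of attempts, plus one more full round after each healing cycle. A small Python helper makes this explicit. The function name is illustrative only.

```python
def max_rounds(per_round: int, healing_cycles: int) -> int:
    """One initial round plus one more round after each healing cycle."""
    return per_round * (1 + healing_cycles)


fix_attempts = max_rounds(per_round=2, healing_cycles=2)    # 2 * 3 = 6
review_cycles = max_rounds(per_round=3, healing_cycles=2)   # 3 * 3 = 9
```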
Superpowers Integration
Optional structured methodology hooks that enhance debugging and verification.
systematic-debugging (dx-step-fix)
When the error is ambiguous or multi-layered, dx-step-fix optionally invokes
superpowers:systematic-debugging for a structured 4-phase diagnosis:
- Observe — gather all error context
- Hypothesize — form candidate root causes
- Test — verify each hypothesis
- Fix — apply the validated correction
Falls back to inline diagnosis when superpowers is not installed.
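The four phases can be sketched in Python as a loop over candidate causes. This is a sketch of the general technique, not the superpowers skill itself. The callables are hypothetical, and hypotheses are tried in the order they are produced. The key property is that a fix is applied only after a hypothesis has been confirmed.

```python
from typing import Any, Callable, Iterable, Optional


def systematic_debug(observe: Callable[[], Any],
                     hypothesize: Callable[[Any], Iterable[str]],
                     test: Callable[[str], bool],
                     fix: Callable[[str], None]) -> Optional[str]:
    """Observe -> Hypothesize -> Test -> Fix; only a validated cause gets fixed."""
    context = observe()                       # 1. gather all error context
    for hypothesis in hypothesize(context):   # 2. candidate root causes, tried in order
        if test(hypothesis):                  # 3. confirm before touching code
            fix(hypothesis)                   # 4. apply the validated correction
            return hypothesis
    return None                               # nothing confirmed: report, do not guess
```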
verification-before-completion (dx-step-verify)
After the review-fix loop concludes, dx-step-verify optionally invokes
superpowers:verification-before-completion for a final cross-cutting check —
confirming all acceptance criteria are met before marking the step done. This acts as a
last safety net before commit.
Soft Dependency
All superpowers hooks use a soft-dependency pattern: if the superpowers plugin is installed, the structured methodology is invoked via the Skill tool. If not installed, the skill falls back to condensed inline guidance. No configuration needed — detection is automatic.
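The soft-dependency pattern can be sketched in Python as a branch on availability. Detection is modeled here as membership in a hypothetical `installed_skills` collection. The real skills detect availability through the Skill tool, and `invoke_skill` and `inline_guidance` are illustrative stand-ins.

```python
from typing import Callable, Iterable


def with_soft_dependency(skill: str,
                         installed_skills: Iterable[str],
                         invoke_skill: Callable[[str], str],
                         inline_guidance: Callable[[], str]) -> str:
    """Use the superpowers skill when it is installed, else fall back inline."""
    if skill in set(installed_skills):   # automatic detection, no config flag
        return invoke_skill(skill)       # structured methodology via the Skill tool
    return inline_guidance()             # condensed inline fallback
```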
Gaps Identified
Self-learning opportunities discovered through healing analysis.
| # | Gap | Impact |
|---|---|---|
| 1 | Fix/heal counts not persisted | Only in coordinator memory, lost between sessions |
| 2 | No pattern aggregation | Same fix types applied repeatedly without learning |
| 3 | No success rate tracking | Cannot identify which fix strategies work vs fail |
| 4 | No cross-story learning | Healing insights from Story A not available for Story B |
| 5 | Corrective step quality not measured | Cannot tell if heal creates better or worse steps |
| 6 | Review issue patterns not tracked | Same issues may recur across stories |
Addressed by Self-Learning
These gaps are the motivation for the self-learning architecture. The .ai/learning/ directory,
/dx-learn, and /dx-retro skills address gaps 1-6 by persisting fix patterns,
aggregating success rates, and enabling cross-story knowledge transfer. See the Learning and Feedback page.
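One way persistence could close gaps 1-4 is an append-only log under .ai/learning/, sketched here in Python. The `fixes.jsonl` file name, the record fields, and both functions are hypothetical illustrations, not the actual format used by /dx-learn or /dx-retro.

```python
import json
from collections import defaultdict
from pathlib import Path
from typing import Dict


def record_fix(learning_dir: Path, story: str, step: str,
               strategy: str, succeeded: bool) -> None:
    """Append one fix outcome so counts survive across sessions (gap 1)."""
    learning_dir.mkdir(parents=True, exist_ok=True)
    record = {"story": story, "step": step, "strategy": strategy, "succeeded": succeeded}
    with (learning_dir / "fixes.jsonl").open("a", encoding="utf-8") as log:
        log.write(json.dumps(record) + "\n")


def success_rates(learning_dir: Path) -> Dict[str, float]:
    """Aggregate outcomes per strategy across all stories (gaps 2-4)."""
    log = learning_dir / "fixes.jsonl"
    if not log.exists():
        return {}
    counts = defaultdict(lambda: [0, 0])          # strategy -> [successes, attempts]
    for line in log.read_text(encoding="utf-8").splitlines():
        record = json.loads(line)
        counts[record["strategy"]][0] += int(record["succeeded"])
        counts[record["strategy"]][1] += 1
    return {strategy: ok / total for strategy, (ok, total) in counts.items()}
```

An append-only log keeps each session's writes independent, and aggregation happens only when a later story reads the history.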