The Self-Running System

The Bottleneck Is You

Your tests catch a regression. Your eval harness flags an inaccurate summary. A linting rule fires on a new file. The system is doing its job: it found the problems.

Now what? You read the failure output. You diagnose the cause. You tell your AI assistant to fix it. You review the fix. You run the checks again. You move to the next failure.

Multiply that across five failures, three workstreams, and a growing test suite, and you see the bottleneck. The measurement is automated. The response is manual. Every failure waits in a queue, and the queue is you.

The advanced track addresses this directly: what if the system that finds the problem also fixes it?

The Pipeline

The core idea is a pipeline that connects detection to resolution without you as the middleman at every step.

  1. A check fails. A test, eval, or linting rule catches a problem in CI. This already happens if you have automated checks running on every change.
  2. A work item is auto-created. Instead of the failure sitting in a log for you to read, the pipeline creates an issue with the failure details, trace data, and enough context for someone (or something) to pick it up.
  3. An agent picks it up in an isolated worktree. A git worktree gives the agent its own copy of the codebase. It can make changes, run tests, and iterate without affecting anyone else's work.
  4. Quality gates verify the fix. Before anything merges, the full pipeline runs against the fix: tests, evals, linting, type checks. The same standards that caught the original failure now verify the repair.
  5. You review the result, not the process. The issue, the diagnosis, the fix, and the verification are all done by the time you look at it. Your review is "did the system handle this correctly?" not "let me sequence every step."
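The five steps above can be sketched as a small control loop. This is a minimal sketch under stated assumptions, not a definitive implementation: `Failure`, `WorkItem`, `handle_failure`, and the `fix` and `gates` callables are hypothetical names standing in for your issue tracker, your agent, and your CI checks.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Failure:
    """Step 1: what a failing check reports (hypothetical shape)."""
    check: str    # e.g. "unit-tests", "summary-eval", "lint"
    details: str  # failure output or trace excerpt


@dataclass
class WorkItem:
    """Step 2: the auto-created item an agent can pick up."""
    failure: Failure
    status: str = "open"
    log: list[str] = field(default_factory=list)


def handle_failure(
    failure: Failure,
    fix: Callable[[WorkItem], None],       # step 3: agent attempts a fix in its worktree
    gates: dict[str, Callable[[], bool]],  # step 4: gate name -> returns True on pass
    max_attempts: int = 3,
) -> WorkItem:
    item = WorkItem(failure)
    for attempt in range(1, max_attempts + 1):
        fix(item)
        # Run every gate, not only the one that originally failed.
        failed = [name for name, gate in gates.items() if not gate()]
        item.log.append(f"attempt {attempt}: failed gates = {failed}")
        if not failed:
            item.status = "ready_for_review"  # step 5: you review the result
            return item
    item.status = "needs_human"  # the agent could not get the gates to pass
    return item
```

The attempt limit is a design choice in this sketch: when the agent cannot satisfy the gates, the item goes back to you with its log instead of looping indefinitely.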

In Your AI Assistant

A git worktree is a separate working directory linked to the same repository. Each worktree has its own checked-out branch and its own working files but shares the same repository history. For parallel AI development, this means each agent gets its own isolated environment. Changes in one worktree do not touch the files in another until you explicitly merge. Your AI coding assistant can orchestrate worktrees, dispatching work to isolated environments while you stay in one conversation.
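As a rough illustration, creating one worktree per work item could look like the sketch below. The `git worktree add -b <branch> <path>` command is standard git. The function names, the `fix/issue-N` branch naming, and the `../worktrees` directory are assumptions made for this example.

```python
import subprocess
from pathlib import Path


def worktree_command(issue_id: int, base_dir: str = "../worktrees") -> list[str]:
    """Build the git command that gives one agent its own branch and directory."""
    branch = f"fix/issue-{issue_id}"  # hypothetical naming scheme
    path = Path(base_dir) / f"issue-{issue_id}"
    # `git worktree add -b <branch> <path>` creates the branch and checks it
    # out into a new directory linked to the same repository.
    return ["git", "worktree", "add", "-b", branch, path.as_posix()]


def create_worktree(issue_id: int, repo_dir: str = ".") -> None:
    """Run the command inside an existing repository; raises if git fails."""
    subprocess.run(worktree_command(issue_id), cwd=repo_dir, check=True)
```

Calling `create_worktree(42)` from inside a repository would create `../worktrees/issue-42` on a new `fix/issue-42` branch. Once the work is merged, `git worktree remove <path>` cleans the directory up.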

The practical starting point is smaller than the full vision. You do not need all five steps on day one. Start by having your CI pipeline create issues on failure. That alone saves you the triage step. Then have your AI assistant pick up those issues. Then add quality gates to verify the fixes. Each layer you add reduces how much of the loop requires your direct involvement.
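The first layer, turning a failed check into an issue, can be sketched as a function that builds the issue content. This is an illustrative sketch: `CheckFailure`, `build_issue`, and the payload fields are hypothetical, and how the payload is submitted depends on your issue tracker.

```python
from dataclasses import dataclass


@dataclass
class CheckFailure:
    check: str   # which check failed, e.g. "pytest" or "lint"
    target: str  # test name or file the failure points at
    commit: str  # commit the check ran against
    output: str  # raw failure output from the CI log


def build_issue(failure: CheckFailure, max_output_lines: int = 40) -> dict[str, object]:
    """Turn one failure into an issue payload with enough context to act on."""
    lines = failure.output.strip().splitlines()
    # Keep only the tail of the log, which usually holds the actual error.
    excerpt = "\n".join(lines[-max_output_lines:])
    body = (
        f"Check: {failure.check}\n"
        f"Target: {failure.target}\n"
        f"Commit: {failure.commit}\n\n"
        f"Failure output (up to {max_output_lines} trailing lines):\n{excerpt}\n"
    )
    return {
        "title": f"[{failure.check}] {failure.target} failed",
        "body": body,
        "labels": ["auto-created", failure.check],
    }
```

Including the check name, target, and commit in every issue means whoever picks it up, human or agent, can reproduce the failure without digging through CI logs.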

The Ratchet Effect

Here is why this pipeline gets better over time instead of just maintaining the status quo.

Every failure that gets fixed becomes a permanent regression guard. The test or eval that caught the original problem stays in the suite. If a future change re-introduces the same issue, the check catches it again. As long as that check stays in the suite and must pass before merge, the system cannot regress past a fixed failure.

| | Before the Pipeline | With the Pipeline |
| --- | --- | --- |
| When a check fails | You read the log, diagnose, tell AI to fix it | The pipeline creates a work item and an agent picks it up |
| When a fix lands | You manually verify it worked | Quality gates verify automatically before merge |
| When coverage grows | More checks = more manual triage for you | More checks = more problems caught and resolved automatically |
| Over time | Your workload grows with the system | The system gets strictly better; your workload stays stable |

This is the ratchet effect: every fixed failure adds a regression guard, every new eval scenario expands coverage, and the system can only move forward. You are not just fixing bugs. You are building a system that accumulates correctness.
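The ratchet only holds if existing checks are never silently dropped. One way to make that explicit, assuming you can list check identifiers (for example, test names) for the base branch and for the proposed change, is a guard like the sketch below. The function names and example identifiers are hypothetical.

```python
def removed_guards(before: set[str], after: set[str]) -> set[str]:
    """Checks that existed before a change but are missing after it."""
    return before - after


def ratchet_holds(before: set[str], after: set[str]) -> bool:
    """True when the change keeps every existing check (it may add new ones)."""
    return not removed_guards(before, after)


# Example: a change that adds a guard passes; one that drops a guard does not.
base = {"test_login", "test_summary_accuracy"}
assert ratchet_holds(base, base | {"test_empty_input"})
assert removed_guards(base, {"test_login"}) == {"test_summary_accuracy"}
```

Run as a quality gate, a check like this turns "the suite only grows" from a convention into something the pipeline enforces.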

Try It

You do not need the full pipeline to start thinking like a Director. Pick one part and design it:

Ask your AI coding assistant:

I want to design a quality pipeline for my project.

Right now, when a test or check fails, I manually diagnose and fix it.
I want to move toward a system where failures automatically become
trackable work items with enough context to act on.

Given my project, help me:
1. Identify which checks I already have (tests, linting, etc.)
2. Design what an auto-created work item would look like for each
   type of failure (what context would it need?)
3. Suggest the first step I could implement today

Do I need all of this to get started?

No. The full pipeline (auto-created work items, agent-driven fixes, quality gates, ratcheting coverage) is the destination, not the starting point. Start with one piece: better failure messages that include enough context to act on. Then add issue creation. Then add agent pickup. Build incrementally, and let each layer prove its value before adding the next.

Key Insight

The self-running system connects detection to resolution: CI failures become work items, agents pick them up in isolated worktrees, quality gates verify the fixes, and every resolved failure becomes a permanent regression guard. This is the ratchet effect: the system gets strictly better over time because it can never regress past a fixed failure. Your role shifts from sequencing individual fixes to designing the pipeline that handles them.