Starter guide · 2026 · Claude Code · Codex · Agents

Loop engineering, explained.

Stop being the person who prompts the AI. Design the system that prompts it for you. This is the practical version of the idea everyone is quoting and almost nobody can explain.

Fig. 0 — The shape of every loop. The decision in the middle is the agent's, not yours.

Fig. 01 — The Shift

The leverage point moved

For two years, working with a coding agent meant: write a prompt, read the result, write the next prompt. You were the loop. You held the tool the whole time.

That phase is ending. The new job is to build a small system that finds the work, hands it to the agent, checks the result, records what happened, and decides the next move. You design it once. It prompts the agent from then on.

“You should be designing loops that prompt your agents.”

Peter Steinberger, creator of OpenClaw

“My job is to write loops.”

Boris Cherny, creator of Claude Code

Plain definition: a loop is a small system that keeps an agent working without you prompting it every 30 seconds. It gives the agent a goal, checks what it produced, decides if it is done, and if not, prompts it again.
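The definition above can be sketched in a few lines. This is a minimal illustration, not how any particular tool implements it: `run_loop`, `agent`, and `is_done` are hypothetical names, and the agent is just a callable that takes a prompt and returns text.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class LoopResult:
    done: bool
    iterations: int
    last_output: str


def run_loop(goal: str,
             agent: Callable[[str], str],
             is_done: Callable[[str], bool],
             max_iterations: int = 5) -> LoopResult:
    """Prompt the agent, check the output, and re-prompt until done or out of attempts."""
    prompt = goal
    output = ""
    for i in range(1, max_iterations + 1):
        output = agent(prompt)          # hand the work to the agent
        if is_done(output):             # check what it produced
            return LoopResult(True, i, output)
        # Not done: prompt again, carrying the rejected attempt forward.
        prompt = f"{goal}\n\nPrevious attempt was not accepted:\n{output}\nTry again."
    return LoopResult(False, max_iterations, output)
```

The `max_iterations` cap is there on purpose. Even a toy loop should not be able to run forever.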

The skeptic line is “it's just a cron job with a hat on.” Half right. The schedule is cron. The difference is the decision in the middle: a model looks at the current state and chooses the next action. Cron runs a fixed script. A loop runs judgment.
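To make the contrast concrete, here is a small sketch under stated assumptions. `cron_tick`, `loop_tick`, and `decide` are hypothetical names. In a real loop, `decide` would be a model call that reads the state. Here it is any function that returns the name of an action.

```python
from typing import Callable, Dict

Action = Callable[[], str]


def cron_tick(script: Action) -> str:
    """Cron: the same fixed script runs on every tick, whatever the state is."""
    return script()


def loop_tick(state: Dict[str, object],
              decide: Callable[[Dict[str, object]], str],
              actions: Dict[str, Action]) -> str:
    """Loop: a decision step reads the state and picks the next action by name."""
    choice = decide(state)
    action = actions.get(choice)
    if action is None:
        # The decision named something the loop is not allowed to do.
        return f"escalate: unknown action {choice!r}"
    return action()
```

Restricting the decision to a fixed menu of actions is one way to keep judgment in the middle without giving it unlimited reach.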

Fig. 02 — The Lineage

How we got here, in 60 seconds

People talk past each other because “loop” hides several different things. Here is the ladder.

Stage 1 · The Ralph loop

Keep going

A dead-simple loop that feeds the agent the same task over and over so it doesn't stop after one answer. Crude, but it meant you stopped typing “continue” every five minutes.

Stage 2 · Goal loops

Keep going until DONE is true

Instead of “keep going,” you define what done means: all tests pass, lint is clean. A separate model checks completion, so the agent that wrote the code isn't the one grading it. This is /goal in Claude Code and Codex.
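One way to picture an objective "done" is a set of named checks that must all pass. The sketch below is an assumption about shape, not the behavior of /goal itself: `goal_met` and the check names are hypothetical, and each check is a callable that returns True or False.

```python
from typing import Callable, Dict, List, Tuple

Check = Callable[[], bool]


def goal_met(checks: Dict[str, Check]) -> Tuple[bool, List[str]]:
    """Return (done, names of failing checks). Done only if every check passes."""
    if not checks:
        # An empty goal would be trivially "done", which defeats the point.
        raise ValueError("a goal needs at least one objective check")
    failing = [name for name, check in checks.items() if not check()]
    return (not failing, failing)
```

Returning the failing names gives the next prompt something specific to work on instead of a bare "not done."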

Stage 3 · Orchestration loops (now)

Loops that run other agents

A loop that wakes on a schedule, checks GitHub, opens an isolated worktree, sends one agent to build and another to review, runs the tests, opens the PR, and writes down what happened so tomorrow's run resumes instead of restarting. This is what Boris and Peter actually mean.
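The pipeline above can be sketched as one pass over the available work. Everything here is hypothetical: `Steps` and `orchestrate` are illustration names, and the GitHub check, worktree build, review, tests, and PR creation are injected callables rather than real integrations.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Steps:
    find_work: Callable[[], List[str]]    # e.g. list open issues
    build: Callable[[str], str]           # maker agent: returns a diff
    review: Callable[[str], bool]         # checker agent: accepts or rejects the diff
    run_tests: Callable[[str], bool]      # automated gate
    open_pr: Callable[[str, str], str]    # returns a PR reference


def orchestrate(steps: Steps, log: List[str]) -> List[str]:
    """Run one scheduled pass and record what happened to every work item."""
    opened = []
    for item in steps.find_work():
        diff = steps.build(item)
        if not steps.review(diff):
            log.append(f"{item}: rejected by reviewer")
            continue
        if not steps.run_tests(diff):
            log.append(f"{item}: tests failed")
            continue
        pr = steps.open_pr(item, diff)
        log.append(f"{item}: opened {pr}")
        opened.append(pr)
    return opened
```

The `log` list stands in for the written record. Persisting it somewhere durable is what lets tomorrow's run resume.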

Fig. 03 — The Test

Should you even build one?

Honest answer: most people don't need a loop yet. Loops earn their cost under four conditions. Miss one and the loop costs more than it returns. Check yourself:

The task repeats

At least weekly. A loop amortizes its setup across many runs. A one-time job is faster with a good prompt.

Verification is automated

A test suite, type checker, linter, or build can fail the work without you in the room. No gate, no loop.

Your token budget can absorb waste

Loops re-read context, retry, and explore. That burns tokens whether or not the run ships anything.

The agent has real tools

Logs, a reproduction environment, the ability to run the code it writes. Without that, it iterates blind.
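The four conditions work as an all-or-nothing checklist. A minimal sketch, with `Readiness` and `verdict` as hypothetical names and the verdict wording invented for illustration:

```python
from dataclasses import dataclass


@dataclass
class Readiness:
    task_repeats: bool
    verification_automated: bool
    budget_absorbs_waste: bool
    agent_has_tools: bool


def verdict(r: Readiness) -> str:
    """All four conditions must hold; otherwise name what is missing."""
    missing = [name for name, ok in vars(r).items() if not ok]
    if not missing:
        return "build the loop"
    return "not yet: missing " + ", ".join(missing)
```

For example, `verdict(Readiness(True, False, True, True))` reports that automated verification is the gap to close first.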

One more hard rule: every loop needs a hard stop. A max iteration count, a token budget, or a time limit. Without one, the loop runs until you notice the bill.
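A hard stop is cheap to write. The sketch below combines all three limits in one object, assuming the loop can report how many tokens each iteration used. `HardStop` and its fields are hypothetical names.

```python
import time
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class HardStop:
    max_iterations: int
    max_tokens: int
    max_seconds: float
    iterations: int = 0
    tokens: int = 0
    started: float = field(default_factory=time.monotonic)

    def record(self, tokens_used: int) -> None:
        """Call once after every iteration."""
        self.iterations += 1
        self.tokens += tokens_used

    def reason_to_stop(self) -> Optional[str]:
        """Return why the loop must stop, or None if it may continue."""
        if self.iterations >= self.max_iterations:
            return "max iterations reached"
        if self.tokens >= self.max_tokens:
            return "token budget spent"
        if time.monotonic() - self.started >= self.max_seconds:
            return "time limit reached"
        return None
```

Check `reason_to_stop()` before each new prompt, and write the reason into the run record so a human can see why the loop quit.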

Fig. 04 — The Building Blocks

The five pieces, plus memory

Claude Code and Codex both ship all five now. The names differ slightly; the capability is the same. A real loop uses all of them.

01 / AUTOMATIONS

The heartbeat

Scheduled runs that fire on a cadence or an event. /loop re-runs on a timer. /goal keeps going until a condition you wrote is actually true. This is what makes a loop a loop instead of one run you did once.

02 / WORKTREES

Parallel without chaos

A separate working directory on its own branch. Two agents editing the same files is the failure mode. Worktrees mean one agent's edits literally cannot touch the other's checkout.
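Under the hood this is plain `git worktree add`. A small sketch of driving it from a script follows. The `loop/<task>` branch naming and the sibling-directory layout are assumptions made for illustration, and both function names are hypothetical.

```python
import subprocess
from pathlib import Path
from typing import List


def worktree_add_command(repo: Path, task_id: str) -> List[str]:
    """Build the `git worktree add` command for one agent's isolated checkout."""
    branch = f"loop/{task_id}"                       # assumed naming scheme
    path = repo.parent / f"{repo.name}-{task_id}"    # sibling directory next to the repo
    return ["git", "-C", str(repo), "worktree", "add", "-b", branch, str(path)]


def create_worktree(repo: Path, task_id: str) -> Path:
    """Create the worktree and return its path. Raises if git reports an error."""
    cmd = worktree_add_command(repo, task_id)
    subprocess.run(cmd, check=True)
    return Path(cmd[-1])
```

Each agent then gets its own path as its working directory, so two agents never share a checkout.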

03 / SKILLS

Project knowledge, written once

A SKILL.md holding your conventions, build steps, and “we don't do it like this” rules. Without skills, the loop re-derives your project from zero every cycle. With them, it compounds.

04 / CONNECTORS

Hands on your real tools

Built on MCP. The difference between an agent that says “here's the fix” and a loop that opens the PR, updates the ticket, and pings Slack when CI goes green.

05 / SUB-AGENTS

Maker vs. checker

The single most useful structure in a loop. The model that wrote the code is too nice when grading its own homework. A second agent with different instructions catches what the first one talked itself into.
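The key design choice is what crosses the boundary between the two agents. In this sketch, with the hypothetical names `Attempt` and `make_and_check`, the checker receives the task and the diff but never the maker's explanation.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Attempt:
    diff: str
    reasoning: str  # the maker's own explanation; never shown to the checker


def make_and_check(task: str,
                   maker: Callable[[str], Attempt],
                   checker: Callable[[str, str], bool]) -> bool:
    """Run the maker, then let a separate checker judge only the task and the diff."""
    attempt = maker(task)
    # Deliberately drop attempt.reasoning so the checker cannot be talked into agreeing.
    return checker(task, attempt.diff)
```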

+1 / MEMORY (THE STATE FILE)

The agent forgets. The repo doesn't.

A markdown file or board that lives outside the conversation and holds what's done, what's in progress, and what was learned. Sounds too dumb to matter. It's the spine of every working loop: tomorrow's run resumes instead of restarting.
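A state file needs very little machinery. The sketch below appends one line per run and reads the file back at the start of the next run. The file layout and the names `record_run` and `resume_context` are assumptions for illustration.

```python
from pathlib import Path

HEADER = "# Loop state\n\n## Runs\n"


def record_run(state_file: Path, date: str, found: str, done: str) -> None:
    """Append one run record so the next run can resume instead of restarting."""
    if not state_file.exists():
        state_file.write_text(HEADER, encoding="utf-8")
    with state_file.open("a", encoding="utf-8") as f:
        f.write(f"- {date} · {found} · {done}\n")


def resume_context(state_file: Path) -> str:
    """Return the saved state to prepend to the next prompt (empty on the first run)."""
    return state_file.read_text(encoding="utf-8") if state_file.exists() else ""
```

Because the file lives in the repo rather than in the conversation, it survives context resets and can be reviewed like any other change.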

Fig. 05 — Build It

Your first loop: the minimum viable version

Don't start with a swarm. Start with four parts, in this order. Skipping ahead is how loops fail.

Get one manual run reliable

Prompt the task by hand until the agent does it well, start to finish. If it can't do it once with you watching, a loop just automates the failure.

Turn it into a skill

Write the context, rules, and steps into a SKILL.md so the loop never re-derives them.

Add a gate and a state file

One automated check that can fail bad work (tests, build, lint). One STATE.md the loop updates after each run.
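A gate can be as simple as "every command exits with status 0." The sketch below assumes your checks are runnable as shell commands. `gate` is a hypothetical helper, and the commands you pass in are your own.

```python
import subprocess
from typing import List, Sequence


def gate(commands: Sequence[List[str]], timeout_s: float = 600) -> bool:
    """Run each check in order; pass only if every command exits with status 0."""
    for cmd in commands:
        try:
            result = subprocess.run(cmd, capture_output=True, timeout=timeout_s)
        except (subprocess.TimeoutExpired, FileNotFoundError):
            # A hung or missing check counts as a failure, never as a pass.
            return False
        if result.returncode != 0:
            return False
    return True
```

A call might look like `gate([["pytest", "-q"], ["ruff", "check", "."]])`, assuming those tools are what your project uses. The important property is that the answer is pass or fail, with nothing in between.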

Wrap it in a loop, then schedule it

Use /goal with an objective stop condition, then put it on a cadence with /loop or a scheduled task.

Copy-paste starter · Claude Code

# Babysitter loop (Boris Cherny's own starter pattern)
/loop babysit all my PRs. Auto-fix build issues, and when comments
come in, use a worktree agent to fix them.

# Goal loop with an objective stop condition
/goal All tests in test/auth pass and lint is clean.
Scan src/auth for failures, propose fixes in a worktree,
open a draft PR when the goal condition holds.
Stop after 3 failed attempts.

Copy-paste starter · STATE.md template

# Loop state · ci-triage

## Last run
date · what was found · what was done

## In progress
- branch-name · current status

## Escalated to a human
- the things the loop could not handle

## Lessons learned
- write rules here so the next run doesn't repeat mistakes

## Stop conditions met
- when and how the goal was verified

The metric that matters: cost per accepted change. Not tokens spent, not tasks attempted. If you're rejecting more than half of what the loop ships, you're doing the review work the loop was meant to save you from.
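The arithmetic is simple enough to track in a spreadsheet, but here it is as a sketch. Both function names are hypothetical, and "cost" can be dollars or tokens as long as you stay consistent.

```python
def cost_per_accepted_change(total_cost: float, accepted: int) -> float:
    """Total spend divided by the number of changes you actually merged."""
    if accepted <= 0:
        return float("inf")  # spend with nothing accepted
    return total_cost / accepted


def rejection_rate(shipped: int, accepted: int) -> float:
    """Share of what the loop shipped that you turned down."""
    if shipped <= 0:
        return 0.0
    return (shipped - accepted) / shipped
```

A rejection rate above 0.5 is the "more than half" warning sign described above.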

Fig. 06 — Pick the Right Job

Good first loops vs. loops to avoid

The shape of a good first loop: repetitive, machine-checkable, low blast radius. Anything where “done” is a judgment call still needs a human in the chair.

Good first loops

CI failure triage: nightly scan, classify causes, draft fixes for the easy ones
Dependency bump PRs: weekly scan, test compatibility, open PRs
Lint-and-fix passes on every PR
Flaky test reproduction until a theory survives
Issue-to-PR drafts on code with strong tests

Keep a human in the chair

Architecture rewrites
Auth or payments code
Production deploys
Vague product work
Anything where “done” is an opinion

Fig. 07 — Failure Modes

How loops quietly burn money

An agent that can't verify its own work isn't autonomous. It's an expensive way to create slop while you sleep. These are the patterns that kill loops in practice:

The maker grades its own homework

One agent writes and verifies. It's always “A+.”

Fix: separate verifier sub-agent with no exposure to the maker's reasoning.

The soft “done”

“Done when it looks good” never holds. A second agent asked to “review” with no objective signal is just a second optimist.

Fix: a gate that returns pass or fail. Tests, build, types, lint.

No hard stops

The loop runs until a rate limit stops it or you notice the invoice.

Fix: max iterations + no-progress detection + a token budget cap.
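No-progress detection is the least obvious of the three. One simple approach, sketched here under the assumption that each failed iteration can be reduced to a short failure signature such as the failing test name, is to stop when the same signature repeats several times in a row. `NoProgressDetector` is a hypothetical name.

```python
from collections import deque
from typing import Deque


class NoProgressDetector:
    """Flag a stall when the same failure signature repeats `patience` times in a row."""

    def __init__(self, patience: int = 3) -> None:
        self.patience = patience
        self.recent: Deque[str] = deque(maxlen=patience)

    def stalled(self, failure_signature: str) -> bool:
        """Record the latest failure and report whether the loop is going in circles."""
        self.recent.append(failure_signature)
        window_full = len(self.recent) == self.patience
        return window_full and len(set(self.recent)) == 1
```

This catches the loop that keeps retrying the same broken fix. It does not catch a loop that fails in a new way every time, which is what the iteration and token caps are for.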

Comprehension debt

The faster the loop ships code you didn't write, the bigger the gap between what exists and what you understand. The bill that hurts isn't the token bill. It's the day you debug a system nobody has read.

Fix: read the diffs. Spot-check the gate. Keep the loop off judgment calls.

Two people can build the exact same loop and get opposite results. One uses it to move faster on work they understand deeply. The other uses it to avoid understanding the work at all. The loop doesn't know the difference. You do.

Build the loop. Stay the engineer.
