Loop engineering, explained.
Stop being the person who prompts the AI. Design the system that prompts it for you. This is the practical version of the idea everyone is quoting and almost nobody can explain.
The leverage point moved
For two years, working with a coding agent meant: write a prompt, read the result, write the next prompt. You were the loop. You held the tool the whole time.
That phase is ending. The new job is to build a small system that finds the work, hands it to the agent, checks the result, records what happened, and decides the next move. You design it once. It prompts the agent from then on.
“You should be designing loops that prompt your agents.”
Peter Steinberger, creator of OpenClaw
“My job is to write loops.”
Boris Cherny, creator of Claude Code
Plain definition: a loop is a small system that keeps an agent working without you prompting it every 30 seconds. It gives the agent a goal, checks what it produced, decides if it is done, and if not, prompts it again.
The skeptic line is “it's just a cron job with a hat on.” Half right. The schedule is cron. The difference is the decision in the middle: a model looks at the current state and chooses the next action. Cron runs a fixed script. A loop runs judgment.
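The plain definition above can be sketched in a few lines. This is a minimal illustration, not how Claude Code or Codex implement it: `run_loop`, `agent`, and `is_done` are hypothetical names, and the agent here is a stand-in function rather than a real model call.

```python
from typing import Callable


def run_loop(
    goal: str,
    agent: Callable[[str], str],
    is_done: Callable[[str], bool],
    max_iterations: int = 5,
) -> tuple[bool, int]:
    """Prompt the agent until is_done accepts its output or the cap is hit."""
    prompt = goal
    for attempt in range(1, max_iterations + 1):
        output = agent(prompt)
        if is_done(output):
            return True, attempt
        # Not done: prompt again and include what the last attempt produced.
        prompt = f"{goal}\n\nPrevious attempt was rejected:\n{output}"
    return False, max_iterations


def make_fake_agent() -> Callable[[str], str]:
    """Stand-in agent for the demo: it only succeeds on its third call."""
    calls = {"n": 0}

    def fake_agent(prompt: str) -> str:
        calls["n"] += 1
        return "tests pass" if calls["n"] >= 3 else "tests fail"

    return fake_agent


done, attempts = run_loop(
    "Fix the failing auth tests",
    make_fake_agent(),
    lambda output: output == "tests pass",
)
print(done, attempts)  # True 3
```

The `is_done` callable is where the decision lives. In a real loop it would be a test run or a second model, not a string comparison.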
How we got here, in 60 seconds
People talk past each other because “loop” hides several different things. Here is the ladder.
Keep going
A dead-simple loop that feeds the agent the same task over and over so it doesn't stop after one answer. Crude, but it meant you stopped typing “continue” every five minutes.
Keep going until DONE is true
Instead of “keep going,” you define what done means: all tests pass, lint is clean. A separate model checks completion, so the agent that wrote the code isn't the one grading it. This is /goal in Claude Code and Codex.
Loops that run other agents
A loop that wakes on a schedule, checks GitHub, opens an isolated worktree, sends one agent to build and another to review, runs the tests, opens the PR, and writes down what happened so tomorrow's run resumes instead of restarting. This is what Boris and Peter actually mean.
Should you even build one?
Honest answer: most people don't need a loop yet. Loops earn their cost under four conditions. Miss one and the loop costs more than it returns.
One more hard rule: every loop needs a hard stop. A max iteration count, a token budget, or a time limit. Without one, the loop runs until you notice the bill.
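One way to enforce the hard-stop rule is a small budget object the loop consults before every run. This is a sketch under stated assumptions: the `HardStop` class and its field names are hypothetical, and the token counts would come from whatever usage figures your agent reports.

```python
import time
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class HardStop:
    """Tracks the three limits: iterations, tokens, and wall-clock time."""
    max_iterations: int
    max_tokens: int
    max_seconds: float
    iterations: int = 0
    tokens: int = 0
    started: float = field(default_factory=time.monotonic)

    def record(self, tokens_used: int) -> None:
        """Call once after every agent run."""
        self.iterations += 1
        self.tokens += tokens_used

    def reason_to_stop(self) -> Optional[str]:
        """Return why the loop must stop, or None if it may continue."""
        if self.iterations >= self.max_iterations:
            return "max iterations reached"
        if self.tokens >= self.max_tokens:
            return "token budget spent"
        if time.monotonic() - self.started >= self.max_seconds:
            return "time limit reached"
        return None


stop = HardStop(max_iterations=10, max_tokens=10_000, max_seconds=600)
while stop.reason_to_stop() is None:
    stop.record(tokens_used=4_000)  # pretend each run cost 4,000 tokens
print(stop.iterations, stop.reason_to_stop())  # 3 token budget spent
```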
The five pieces, plus memory
Claude Code and Codex both ship all five now. The names differ slightly, the capability is the same. A real loop uses all of them.
The heartbeat
Scheduled runs that fire on a cadence or an event. /loop re-runs on a timer. /goal keeps going until a condition you wrote is actually true. This is what makes a loop a loop instead of one run you did once.
Parallel without chaos
A separate working directory on its own branch. Two agents editing the same files is the failure mode. Worktrees mean one agent's edits literally cannot touch the other's checkout.
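For illustration, a loop could create one worktree per task with `git worktree add`. The branch prefix `loop/` and the sibling-directory layout below are assumptions made for this sketch, not conventions from Claude Code or Codex.

```python
import subprocess
from pathlib import Path


def worktree_add_command(repo: Path, task_id: str) -> list[str]:
    """Build the git command that creates a new branch in its own directory."""
    branch = f"loop/{task_id}"                       # hypothetical naming scheme
    path = repo.parent / f"{repo.name}-{task_id}"    # sibling directory of the repo
    return ["git", "-C", str(repo), "worktree", "add", "-b", branch, str(path)]


def create_worktree(repo: Path, task_id: str) -> None:
    """Run the command; raises CalledProcessError if git refuses."""
    subprocess.run(worktree_add_command(repo, task_id), check=True)


# Only prints the command, so the example runs without a repository.
print(worktree_add_command(Path("/repos/app"), "fix-123"))
```

Each agent is then started with its own worktree path as its working directory, so its edits stay in that checkout.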
Project knowledge, written once
A SKILL.md holding your conventions, build steps, and “we don't do it like this” rules. Without skills, the loop re-derives your project from zero every cycle. With them, it compounds.
Hands on your real tools
Connections to your real tools, built on MCP. They are the difference between an agent that says “here's the fix” and a loop that opens the PR, updates the ticket, and pings Slack when CI goes green.
Maker vs. checker
The single most useful structure in a loop. The model that wrote the code is too nice grading its own homework. A second agent with different instructions catches what the first one talked itself into.
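A sketch of the maker/checker split, assuming both agents are plain callables. `make_and_check`, `Review`, and the stub agents are hypothetical names for illustration. The point is the data flow: the checker receives the task and the diff, never the maker's reasoning.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Review:
    approved: bool
    notes: str


def make_and_check(
    task: str,
    maker: Callable[[str, str], str],
    checker: Callable[[str, str], Review],
    max_rounds: int = 3,
) -> Optional[str]:
    """Return an approved diff, or None if the checker never approves."""
    feedback = ""
    for _ in range(max_rounds):
        diff = maker(task, feedback)
        review = checker(task, diff)  # checker sees only the task and the diff
        if review.approved:
            return diff
        feedback = review.notes       # the maker only gets the checker's notes
    return None


# Stub agents for the demo.
def stub_maker(task: str, feedback: str) -> str:
    return "patch: adds tests" if feedback else "patch: no tests"


def stub_checker(task: str, diff: str) -> Review:
    ok = "adds tests" in diff
    return Review(approved=ok, notes="" if ok else "Add tests for the change.")


print(make_and_check("Fix login bug", stub_maker, stub_checker))
```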
The agent forgets. The repo doesn't.
A markdown file or board that lives outside the conversation and holds what's done, what's in progress, and what was learned. Sounds too dumb to matter. It's the spine of every working loop: tomorrow's run resumes instead of restarting.
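A minimal sketch of that memory, assuming a single markdown file that each run appends to and the next run reads first. The function names and the line format are illustrative choices, not a required layout.

```python
import tempfile
from datetime import date
from pathlib import Path


def record_run(state_file: Path, found: str, done: str, lesson: str = "") -> None:
    """Append one run's outcome so the next run can resume from it."""
    lines = [f"- {date.today().isoformat()} · found: {found} · done: {done}"]
    if lesson:
        lines.append(f"  lesson: {lesson}")
    with state_file.open("a", encoding="utf-8") as handle:
        handle.write("\n".join(lines) + "\n")


def load_state(state_file: Path) -> str:
    """Read everything earlier runs wrote; empty string on the first run."""
    if not state_file.exists():
        return ""
    return state_file.read_text(encoding="utf-8")


# Demo in a temporary directory so nothing is left behind.
with tempfile.TemporaryDirectory() as tmp:
    demo_file = Path(tmp) / "STATE.md"
    record_run(demo_file, found="2 flaky tests", done="opened draft PR",
               lesson="retry network tests once")
    record_run(demo_file, found="nothing new", done="no action")
    print(load_state(demo_file))
```

The loaded text would be placed at the top of the next run's prompt, which is what lets the run resume instead of restart.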
Your first loop: the minimum viable version
Don't start with a swarm. Start with four parts, in this order. Skipping ahead is how loops fail.
Get one manual run reliable
Prompt the task by hand until the agent does it well, start to finish. If it can't do it once with you watching, a loop just automates the failure.
Turn it into a skill
Write the context, rules, and steps into a SKILL.md so the loop never re-derives them.
Add a gate and a state file
One automated check that can fail bad work (tests, build, lint). One STATE.md the loop updates after each run.
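A gate can be as small as a function that runs one command and reduces it to pass or fail. In this sketch the `gate` helper is hypothetical, and the demo uses the Python interpreter as a stand-in for the real check. In a project the command would be your test, build, or lint invocation.

```python
import subprocess
import sys


def gate(command: list[str], timeout_seconds: int = 600) -> bool:
    """Return True only if the command exits with status 0 in time."""
    try:
        result = subprocess.run(command, capture_output=True, timeout=timeout_seconds)
    except (subprocess.TimeoutExpired, FileNotFoundError):
        # A hung or missing check counts as a failure, never as a pass.
        return False
    return result.returncode == 0


# Stand-in commands: one that succeeds, one that fails.
passing = gate([sys.executable, "-c", "raise SystemExit(0)"])
failing = gate([sys.executable, "-c", "raise SystemExit(1)"])
print(passing, failing)  # True False
```

Treating timeouts and missing tools as failures keeps the gate from passing work it never actually checked.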
Wrap it in a loop, then schedule it
Use /goal with an objective stop condition, then put it on a cadence with /loop or a scheduled task.
# Babysitter loop (Boris Cherny's own starter pattern)
/loop babysit all my PRs. Auto-fix build issues, and when comments come in, use a worktree agent to fix them.

# Goal loop with an objective stop condition
/goal All tests in test/auth pass and lint is clean. Scan src/auth for failures, propose fixes in a worktree, open a draft PR when the goal condition holds. Stop after 3 failed attempts.
# Loop state · ci-triage

## Last run
date · what was found · what was done

## In progress
- branch-name · current status

## Escalated to a human
- the things the loop could not handle

## Lessons learned
- write rules here so the next run doesn't repeat mistakes

## Stop conditions met
- when and how the goal was verified
The metric that matters: cost per accepted change. Not tokens spent, not tasks attempted. If you're rejecting more than half of what the loop ships, you're doing the review work the loop was meant to save you from.
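The arithmetic is simple enough to track in a few lines. The figures below are made-up numbers for illustration, not measurements from any real loop.

```python
from typing import Optional


def cost_per_accepted_change(total_cost: float, accepted: int) -> Optional[float]:
    """Total spend divided by changes you actually merged; None if none were."""
    if accepted == 0:
        return None
    return total_cost / accepted


def rejection_rate(accepted: int, rejected: int) -> float:
    """Share of shipped changes that you turned down."""
    shipped = accepted + rejected
    return rejected / shipped if shipped else 0.0


# Hypothetical week: 42.00 spent, 12 changes accepted, 9 rejected.
cost = cost_per_accepted_change(42.00, accepted=12)
rate = rejection_rate(accepted=12, rejected=9)
print(cost)        # 3.5
print(rate > 0.5)  # False: under the "more than half rejected" warning line
```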
Good first loops vs. loops to avoid
The shape of a good first loop: repetitive, machine-checkable, low blast radius. Anything where “done” is a judgment call still needs a human in the chair.
Good first loops
- CI failure triage: nightly scan, classify causes, draft fixes for the easy ones
- Dependency bump PRs: weekly scan, test compatibility, open PRs
- Lint-and-fix passes on every PR
- Flaky test reproduction until a theory survives
- Issue-to-PR drafts on code with strong tests
Keep a human in the chair
- Architecture rewrites
- Auth or payments code
- Production deploys
- Vague product work
- Anything where “done” is an opinion
How loops quietly burn money
An agent that can't verify its own work isn't autonomous. It's an expensive way to create slop while you sleep. These are the patterns that kill loops in practice:
The maker grades its own homework
One agent writes and verifies. It's always “A+.”
Fix: separate verifier sub-agent with no exposure to the maker's reasoning.
The soft “done”
“Done when it looks good” never holds. A second agent asked to “review” with no objective signal is just a second optimist.
Fix: a gate that returns pass or fail. Tests, build, types, lint.
No hard stops
The loop runs until it hits a rate limit or you notice the invoice.
Fix: max iterations + no-progress detection + a token budget cap.
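No-progress detection is the least obvious of those three fixes. One possible sketch, assuming the loop records a failing-test count after each run: stop when the most recent runs never beat the best earlier result. The `no_progress` function and the `patience` parameter are hypothetical names.

```python
def no_progress(failure_counts: list[int], patience: int = 3) -> bool:
    """True when the last `patience` runs did not improve on earlier runs."""
    if len(failure_counts) <= patience:
        return False  # not enough history to judge yet
    best_before = min(failure_counts[:-patience])
    best_recent = min(failure_counts[-patience:])
    return best_recent >= best_before


print(no_progress([10, 7, 7, 8, 7]))  # True: stuck at 7 failures
print(no_progress([10, 7, 5, 4, 3]))  # False: still improving
```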
Comprehension debt
The faster the loop ships code you didn't write, the bigger the gap between what exists and what you understand. The bill that hurts isn't the token bill. It's the day you debug a system nobody has read.
Fix: read the diffs. Spot-check the gate. Keep the loop off judgment calls.
Two people can build the exact same loop and get opposite results. One uses it to move faster on work they understand deeply. The other uses it to avoid understanding the work at all. The loop doesn't know the difference. You do.
Build the loop. Stay the engineer.