In June 2026, Addy Osmani named the thing a lot of us had drifted into without a word for it: loop engineering. The idea is that you stop prompting agents by hand and instead "design the system that does it" for you, a recurring cycle where you define a goal and the agent iterates until it is done [1]. Boris Cherny, who built Claude Code, put it flatly: "I don't prompt Claude anymore. I have loops running that prompt Claude" [1]. Peter Steinberger said the same from the trenches: "you shouldn't be prompting coding agents anymore, you should be designing loops that prompt your agents" [1].
The frame is right, and the ladder behind it is real: prompt engineering, then context engineering, then harness engineering, then loop engineering, each wrapping the last [1][2]. What I want to push back on is the takeaway a lot of solo devs are drawing from it, which is "prompting is over, the words stopped mattering." That is a misread, and an expensive one. A loop does not fix a weak prompt. It runs that prompt on a schedule, unattended, over and over, and it removes the one thing that used to catch a bad turn: you, in the chair, reading the output. Loop engineering does not retire prompt management. It is the strongest argument for it yet.
What loop engineering actually is
I am not here to dispute the altitude gain, because it is genuine. Each rung of the ladder wraps the one below it. Prompt engineering is the words you send. Context engineering is everything the model sees around those words. Harness engineering is the environment the agent runs in. Loop engineering is the cycle on top: the trigger, the iteration, and the stop condition that drive an agent toward a goal without you steering each turn [1][2].
Osmani's own components make the shape concrete: scheduled automations, git worktrees to isolate parallel agents, Skills stored as SKILL.md conventions, MCP connectors to external tools, sub-agents that split the maker from the checker, and external memory so state survives between runs [1][2]. This is a real discipline and it is where serious agentic work is going. None of that is in question here.
What is in question is what happens to the prompt when you climb this ladder. And the answer, hidden in plain sight in that component list, is that the prompt does not disappear. It gets buried, repeated, and left alone.
The loop engineering ladder: four stacked layers, from prompt engineering at the base (the words), to context engineering (everything the model sees), to harness engineering (the environment the agent runs in), to loop engineering at the top (the recurring cycle that drives the agent). A marker on the top layer reads: this is where you left the chair, so the prompt at the base now runs unattended
Your prompts can improve. Promptimizer rewrites and auto-tests them for you.
The misread: "stop prompting" is not "prompts stopped mattering"
Read the quotes again carefully. Cherny and Steinberger are talking about who does the prompting, not about whether the prompt content matters. "I have loops running that prompt Claude" still has something prompting Claude. The prompting moved from your fingers to a system. The words themselves did not evaporate; they got promoted into the loop's payload, where they run more often and with less supervision than ever.
Osmani's component list gives it away twice. Skills stored as SKILL.md are codified prompts: the reusable conventions and instructions the agent carries into every run. And "keep the maker away from the checker," the sub-agent split he recommends, is a verification step, which is to say it is an admission that the loop's outputs need to be judged by something other than the loop that produced them [1]. A loop is scaffolding built around prompts. The prompt is still the atom the loop repeats. Loop engineering changes who pulls the trigger, not whether the round in the chamber is any good.
A loop is an amplifier
Interactive prompting had a safety net you probably never counted as one: you read each answer and adjusted the next turn. If a prompt was a little vague, you caught the drift on turn two and corrected it. That net is not a feature of your prompt. It is a feature of you being present.
A loop deletes that net on purpose. Autonomy is the entire point. So a prompt that was "good enough" because you were steering is now running with nobody steering. Osmani names the risk directly: unattended loops make unattended mistakes, and drift is the failure mode he keeps returning to [1][2]. This is the amplifier property. A rough edge you shrugged off when you were correcting it by hand does not stay a rough edge across dozens of unattended iterations; it compounds, and the loop keeps going because nothing in it knows the output got worse. The loop scales whatever the prompt is, including its flaws, and it scales them while you are not looking.
That reframes the whole exercise. The higher you climb the ladder, the more each unattended repetition rides on the quality of the box at the very bottom. You do not get to move up and stop caring about the base. You move up and the base carries more weight per run, not less.
Three things a loop demands of the prompts inside it
The techniques you're reading about work. Test your prompts now with Prompt Score and see your score in real time.
If the loop is going to run your prompts unattended, those prompts have to earn the trust the loop extends them. Three requirements follow directly, and all three are prompt management, not loop design.
1. Versioned. You will edit the prompts and skills inside a loop; that is normal. But when a loop's output quality drops, the first question is "which version of the prompt did it run," and if the answer is "whatever was in the file at the time," you cannot debug it and you cannot roll back. A prompt running unattended in a loop without a version history is a regression you will never trace to its cause.
2. Scored up front. The loop's whole promise is that you are not watching every turn, which means quality cannot be a thing you check after the fact. It has to be measured before you hand the prompt to the loop. Osmani's maker-checker split is exactly this instinct: something other than the generator has to judge the output. A score against your real inputs is that check, front-loaded, so the loop starts from a prompt you already know holds up instead of one you are hoping holds up.
3. Portable. Loops swap models and split work across sub-agents, often routing the cheap high-volume steps to a smaller model and escalating the hard ones. A prompt welded to one vendor's format quietly breaks the moment the loop routes that step somewhere else. The same portability discipline that matters when you manage prompts for agents matters double inside a loop, because the loop, not you, decides which backend runs the next turn.
A matrix titled the three demands a loop makes on the prompts inside it. Row one, versioned: know which prompt version the loop ran, so a regression is traceable and reversible. Row two, scored up front: quality is measured before the loop runs unattended, because you are not watching each turn. Row three, portable: the prompt survives when the loop swaps models or routes a step to a sub-agent. Bottom line: the loop is new, the prompt discipline underneath it is not
What broke when I moved a task into a loop
I took a recurring task I had been doing by hand and wrapped it in a loop: a trigger, an agent, a verifier, a stop rule. The inner prompt was one I had used interactively for weeks and considered solid. Left to run unattended, it drifted. An edge case I would have spotted and corrected on the spot in a live session slipped through, and because nothing in the loop knew that output was wrong, the loop carried the mistake forward and kept iterating on top of it.
My first instinct was to fix the loop: tighter stop rules, a stronger verifier sub-agent, more guard-rails around the cycle. That was the wrong layer. The loop was doing exactly what I told it to. The problem was the atom it was repeating, a prompt that had always been "fine" only because I was silently patching it every turn. The actual fix was upstream: pin the prompt to a version, score it against the edge cases that had been biting me, and only then let the loop run it unattended. Once the base was solid, the loop was solid. I had been trying to engineer around a weak prompt instead of fixing it, which is the exact trap Osmani warns about when he says to build the loop "like someone who intends to stay the engineer" [1].
A two-panel before-and-after diagram. Left panel, the wrong fix: a weak prompt sits at the center of a loop, and the engineer keeps adding stop rules, verifiers, and guard-rails around the outside while the drift continues. Right panel, the right fix: the same prompt is versioned and scored against edge cases before the loop runs it, and the loop around it stays simple and holds
The signal
Loop engineering is the right altitude for agentic work, and it is not a fad; the field is genuinely moving from typing prompts to designing the systems that type them. But "design the loop" was never supposed to mean "neglect the atom the loop runs." The ladder does not free you from the base. It stacks weight on it, because every rung you add is another layer of automation repeating that base prompt without you in the room.
Osmani's closing line is the whole discipline in one sentence: build the loop, but build it like someone who intends to stay the engineer [1]. Staying the engineer means owning the prompt the loop repeats. It means versioning it so a regression is traceable, scoring it so you trust it before the loop does, and keeping it portable so the loop can route it anywhere. The loop is the new part. The discipline underneath it is the oldest part of this whole stack, and it just became load-bearing.
Keep My Prompts lets you keep one canonical, versioned copy of every prompt and skill, score it on six quality criteria against your own inputs, and compare it across models, so the prompts your loops run unattended are ones you have already confirmed hold up. Free to start, no credit card required.