Review Loops Are the New Bottleneck

Teams do not become agent-native by making agents faster; they do it by shrinking repeated human review loops.

May 2026 · 4 min read

The agent-native bottleneck is no longer raw implementation speed. It is the review loop that forms after the patch exists. Teams can make agents generate code faster, but that does not help if every diff enters the same slow queue: stale reviewer context, repeated comments, patch churn, and one more fix after one more fix.

The scene is familiar. A PR lands before lunch. The diff is plausible. CI is green. A reviewer opens it between meetings and leaves the same comment for the third time this week: "Please keep this logic in the server action." The agent patches it, but now the type changes ripple into a test helper. Another reviewer arrives later without the earlier context and asks why the helper changed. A Slack ping follows. The PR is not blocked by coding anymore. It is blocked by shared understanding.

That is the new constraint. Agents compress the time between intent and first draft. They do not automatically compress the time between first draft and accepted change. In many teams, the second interval now dominates. The code appears quickly, but the organization still reviews as if code were scarce and reviewer attention were abundant.

Review loops get expensive because they repeat information at the wrong time. A lead explains an architectural rule in a comment instead of encoding it in the task or tooling. A security concern appears after the patch instead of before the run. A test command is discovered from tribal memory. The agent responds to each comment locally, but the system never learns the rule globally. The same pattern comes back in the next PR.

This is why "make the agent faster" is an incomplete strategy. Faster generation can produce more review inventory. If the review lane does not change, the team gets more diffs waiting for human context. Reviewers feel busier, not more leveraged. Agents become a source of interruption because every run creates a new question: is this patch good, or just fast?

The important metric is not time to first diff. It is time to accepted diff. That metric includes the first patch, the comments, the agent's response, the CI reruns, the reviewer's context reload, and the final merge decision. A team that optimizes only the first step can celebrate a ten-minute implementation while ignoring the two days of review drift that follow.
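
One way to make that metric concrete is to compute both intervals from the same timeline. The sketch below is a minimal illustration, not a description of any particular tool: the `PullRequestTimeline` record and its field names are assumptions, and the timestamps are made up to mirror a ten-minute implementation followed by two days of review.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class PullRequestTimeline:
    """Hypothetical record; field names are illustrative, not from a real API."""
    task_started: datetime  # when the agent run began
    first_diff: datetime    # when the first patch was pushed
    merged: datetime        # when the change was accepted


def time_to_first_diff(pr: PullRequestTimeline) -> timedelta:
    return pr.first_diff - pr.task_started


def time_to_accepted_diff(pr: PullRequestTimeline) -> timedelta:
    return pr.merged - pr.task_started


def review_share(pr: PullRequestTimeline) -> float:
    """Fraction of total elapsed time spent after the first diff existed."""
    total = time_to_accepted_diff(pr)
    if total <= timedelta(0):
        return 0.0
    return (pr.merged - pr.first_diff) / total


# Made-up timestamps: ten minutes to a first diff, two more days to merge.
pr = PullRequestTimeline(
    task_started=datetime(2026, 5, 4, 9, 0),
    first_diff=datetime(2026, 5, 4, 9, 10),
    merged=datetime(2026, 5, 6, 9, 10),
)
print(time_to_first_diff(pr), time_to_accepted_diff(pr), round(review_share(pr), 3))
```

With these illustrative numbers, nearly all of the elapsed time sits after the first diff, which is the interval a faster agent does not touch.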

Patch churn is the clearest signal. When a PR changes shape three or four times, ask what information arrived late. Was the acceptance criterion incomplete? Was there an unstated code ownership rule? Did the reviewer expect a smaller blast radius? Did the agent lack the right command? Did the test failure expose a real behavior gap, or did it expose missing setup?
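
Counting churn can be mechanical even when diagnosing it is not. A rough sketch, assuming a PR's history can be flattened into a chronological list of `"push"` and `"comment"` events. That event format and the threshold of three rounds are both assumptions chosen for illustration.

```python
from typing import Iterable


def count_churn_rounds(events: Iterable[str]) -> int:
    """Count pushes that follow at least one review comment.

    `events` is a chronological sequence of "push" and "comment" strings
    (a hypothetical format). The initial push is not counted, because no
    comment precedes it.
    """
    rounds = 0
    pending_comments = False
    for event in events:
        if event == "comment":
            pending_comments = True
        elif event == "push" and pending_comments:
            rounds += 1
            pending_comments = False
    return rounds


def needs_churn_review(events: Iterable[str], threshold: int = 3) -> bool:
    """Flag a PR whose shape changed `threshold` or more times after review."""
    return count_churn_rounds(events) >= threshold


# Illustrative history: one initial push, then three comment-driven pushes.
history = ["push", "comment", "push", "comment", "comment", "push", "comment", "push"]
print(count_churn_rounds(history), needs_churn_review(history))
```

A flagged PR is a prompt to ask the questions above, not a verdict. The count says how often the patch moved, not why.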

Some churn is healthy. Review should catch issues. But repeated churn around the same categories is process debt. If every third agent PR gets a comment about where logic belongs, that rule should move into the task template, linting, examples, or an agent instruction. If every UI patch misses loading state, the workflow should require that state before review. If every database change triggers a security question, the run should include the policy before code is written.

Reviewer context is another hidden cost. Humans are good at judgment, but context reload is expensive. A reviewer who sees a diff twelve hours after the original Slack thread has to reconstruct the why, the constraints, and the intended proof. If the PR description only says "updates onboarding," the reviewer becomes the missing spec engine. That is not a good use of senior attention.

A better loop carries context forward. The task states the goal and boundaries. The agent records what it changed and how it verified the change. The PR description names tradeoffs, skipped paths, and commands run. CI provides evidence instead of noise. Review comments become decisions that can be reused, not disposable instructions trapped in a single thread.
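
As a sketch of what carrying context forward might look like in practice, the PR description can be treated as a small structured record rather than free text. Everything here is an assumption made for illustration: the `PullRequestContext` type, the section names, and the example values.

```python
from dataclasses import dataclass, field
from typing import List

# Section titles paired with attribute names. Both are illustrative
# assumptions, not a standard PR format.
SECTIONS = [
    ("Boundaries", "boundaries"),
    ("Changes", "changes"),
    ("Commands run", "commands_run"),
    ("Tradeoffs", "tradeoffs"),
    ("Skipped paths", "skipped_paths"),
]


@dataclass
class PullRequestContext:
    goal: str
    boundaries: List[str] = field(default_factory=list)
    changes: List[str] = field(default_factory=list)
    commands_run: List[str] = field(default_factory=list)
    tradeoffs: List[str] = field(default_factory=list)
    skipped_paths: List[str] = field(default_factory=list)


def missing_sections(ctx: PullRequestContext) -> List[str]:
    """Return the titles of sections the author left empty."""
    missing = [] if ctx.goal.strip() else ["Goal"]
    missing += [title for title, attr in SECTIONS if not getattr(ctx, attr)]
    return missing


def render_description(ctx: PullRequestContext) -> str:
    """Render the context as plain text for a PR description."""
    lines = [f"Goal: {ctx.goal}"]
    for title, attr in SECTIONS:
        lines.append(f"{title}:")
        items = getattr(ctx, attr) or ["(none recorded)"]
        lines.extend(f"- {item}" for item in items)
    return "\n".join(lines)


# Hypothetical example values.
ctx = PullRequestContext(
    goal="Update onboarding flow copy",
    changes=["Reworded the welcome step"],
    commands_run=["npm test"],
)
print(missing_sections(ctx))
print(render_description(ctx))
```

Empty sections are reported rather than silently dropped, so "no tradeoffs" becomes something the author states instead of something the reviewer has to infer.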

This is the difference between supervision and loop design. Supervision says a human must inspect every move because the agent might drift. Loop design says the system should make drift visible early and make repeated corrections reusable. The reviewer still owns judgment, but the review no longer has to rediscover the same operating rules on every patch.

The practical move is to classify review comments. For a week, collect the comments that appear on agent-produced PRs. Mark which ones are one-off judgment calls and which ones are repeated system instructions. The repeated comments are not just feedback. They are missing infrastructure. They belong in templates, tests, scripts, examples, policy files, or better task shaping.
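
The sorting step itself is simple once a human has tagged each comment with a category. A minimal sketch, assuming comments arrive as `(pr_id, category)` pairs. The PR identifiers, the category names, and the cutoff of two distinct PRs are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def classify_comments(
    labeled_comments: Iterable[Tuple[str, str]], min_prs: int = 2
) -> Dict[str, List[str]]:
    """Split comment categories into repeated instructions and one-offs.

    A category counts as repeated when it appears on at least `min_prs`
    distinct PRs. Several comments on the same PR count once.
    """
    prs_by_category = defaultdict(set)
    for pr_id, category in labeled_comments:
        prs_by_category[category].add(pr_id)

    repeated = sorted(c for c, prs in prs_by_category.items() if len(prs) >= min_prs)
    one_off = sorted(c for c, prs in prs_by_category.items() if len(prs) < min_prs)
    return {"repeated": repeated, "one_off": one_off}


# Hypothetical week of hand-labeled comments on agent-produced PRs.
week = [
    ("pr-a", "logic-placement"),
    ("pr-b", "logic-placement"),
    ("pr-b", "missing-loading-state"),
    ("pr-c", "logic-placement"),
    ("pr-c", "naming-judgment-call"),
    ("pr-d", "missing-loading-state"),
]
print(classify_comments(week))
```

The labeling is still human work. The code only makes the repetition visible so the repeated categories can be moved into templates, tests, or instructions.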

Then shorten the loop around those categories. If reviewers keep asking for narrower diffs, add an explicit blast-radius section to the task. If they keep asking for test evidence, make the agent include commands and results in the PR. If they keep correcting architectural placement, give the agent a map of where decisions live. If stale context keeps causing questions, require the PR to preserve the decision trail.
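
An explicit blast-radius section can also be checked before a reviewer ever opens the diff. The sketch below assumes the task lists allowed paths as glob patterns and that the list of changed files is available. The paths and patterns are invented for the example.

```python
from fnmatch import fnmatchcase
from typing import Iterable, List


def files_outside_blast_radius(
    changed_files: Iterable[str], allowed_patterns: Iterable[str]
) -> List[str]:
    """Return changed files that match none of the task's allowed patterns.

    Uses fnmatch-style globs, where `*` also matches `/`, so
    "app/onboarding/*" covers nested directories as well.
    """
    patterns = list(allowed_patterns)
    return [
        path
        for path in changed_files
        if not any(fnmatchcase(path, pattern) for pattern in patterns)
    ]


# Hypothetical task scope and diff.
allowed = ["app/onboarding/*", "tests/onboarding/*"]
changed = [
    "app/onboarding/page.tsx",
    "app/onboarding/steps/welcome.tsx",
    "tests/helpers/auth.ts",
]
print(files_outside_blast_radius(changed, allowed))
```

A file outside the declared scope is not necessarily wrong. Surfacing it in the PR description lets the agent explain the ripple before a reviewer has to ask why the helper changed.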

Agent-native teams are not teams where humans disappear from review. They are teams where human review gets more valuable because it stops repeating preventable comments. Senior engineers should spend their attention on product judgment, risk, architecture, and tradeoffs. They should not spend it asking, for the third time this week, that code be moved out of the client component.

The bottleneck moved. The first draft is cheap now. Acceptance is the work. If you want agents to create leverage, build the loop that turns repeated review comments into reusable system memory. That is where faster agents become faster teams.
