April 3, 2026

Simultaneously Over- and Under-Reaching


We are simultaneously over- and under-reaching with AI; which one depends on who you are. The same tools produce opposite outcomes for different teams and builders. The explanation, I think, comes down to feedback loops.


Feedback Loops

AI tools succeed where feedback loops are tight. Programming language loops are tight enough and local enough that agents have gotten good at them. You write code, you run it, you see the result. The cycle takes seconds. That's why coding agents work.

But the loop to provision and test infrastructure isn't yet fast enough for AI tools to navigate effectively. Nor is the loop for evaluating whether a manuscript is good, whether a product decision is right, whether an organizational change will stick. AI compresses the production step. It does nothing for the evaluation step. The gap between those two speeds creates most of the confusion.

An SRE writing about AI hot takes from the platform engineering side put it precisely: the tools are powerful where iteration is cheap and fast. They flounder where verification requires expertise, time, or real-world feedback that can't be simulated.

This maps to three modes of building with AI. Most of the debate conflates them.

Three Modes

Mode 1: Hands off the code. A human describes intent, an agent produces code, a human verifies. This is a new type of software. It's useful with a human in the loop doing verification, especially outside traditional software engineering. Personal tools, internal dashboards, one-off scripts. The constraint is verification speed. If you can see whether it works by looking at it, this mode is powerful. If verification requires domain expertise or integration testing, the loop is too slow.

Jeremy Howard's question about where all the AI apps are has a simple answer: they aren't missing, they're invisible. People are composing existing open source with AI as the glue layer, building software customized to themselves or a small group. Personal-scale, not product-scale. No landing page, no Product Hunt launch, no reason to be visible. Plenty of solid, durable software already exists in open source; people are assembling those pieces into tools highly valuable to themselves, not breaking new ground on reusable packages.

Howard is counting the old unit: products, packages, apps with distribution. The unit of software changed. When creation is cheap, you don't build a library when you can build a whole app for the same cost. You don't build a product when you can assemble a personal tool. The "where are the apps?" question assumes software still needs to be a product.

Mode 2: Accelerated traditional engineering. Same workflows, faster. This is where most teams think they are, and where many are failing. Jonathan Nienaber's interviews with engineering teams adopting AI provide the clearest field data. The teams succeeding changed how they work. The teams failing dropped AI into existing workflows and got inconsistent PRs flooding a review process that was already a bottleneck.

Code review was always the wrong safety net. Many teams relied on it because they weren't willing to invest in proper observability, canaries, and automated checks around their infrastructure. Those have always been the more valuable way to monitor system health. AI didn't create the review bottleneck. It exposed a gap that was already there.
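As a sketch of what one of those automated checks looks like when it replaces review as the safety net, here's a hypothetical canary gate. The stats, tolerance, and numbers are illustrative; in practice they'd come from a metrics backend.

```typescript
// Sketch: gate a canary promotion on observed error rates rather than on
// a human reading the diff. All thresholds and stats are hypothetical.

interface FleetStats {
  requests: number;
  errors: number;
}

function errorRate(s: FleetStats): number {
  return s.requests === 0 ? 0 : s.errors / s.requests;
}

// Promote only if the canary isn't meaningfully worse than stable.
// The 1.5x multiplier and the absolute slack are illustrative tolerances.
function shouldPromote(canary: FleetStats, stable: FleetStats): boolean {
  return errorRate(canary) <= errorRate(stable) * 1.5 + 0.001;
}

const canary = { requests: 1_000, errors: 4 };  // 0.40% error rate
const stable = { requests: 9_000, errors: 30 }; // ~0.33% error rate
console.log(shouldPromote(canary, stable));     // true: within tolerance
```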

And the problem with AI-generated PRs isn't just volume. Nienaber's data shows AI PRs get 1.7x more review issues. But you can't solve the problem of AI-generated code with better review. Agent mistakes are not human mistakes. They confound reviewers because agents do things no human would think to do, which is exactly what you'd expect from something that isn't human. The code is alien, not just buggy. That makes human review harder in kind, not just slower in degree, and it's the impetus to change the process.

Tiered review is one answer. Copy changes merge without scrutiny. Dependency bumps with passing tests go through. Changes to authentication or payment flows get multiple reviewers. This requires teams to think through how much risk a change to each part of the system actually carries. Default allow rather than default deny, with exceptions where the risk warrants it. But the deeper move is shifting from reviewing code for correctness to monitoring systems for failures.
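As a sketch of how a tiered policy could be encoded, assuming hypothetical file paths and tier names rather than any team's real config:

```typescript
// Sketch of a tiered review policy: default allow, escalate by risk.
// Paths, tiers, and rules are hypothetical.

type Tier = "auto-merge" | "one-reviewer" | "two-reviewers";

interface Rule {
  pattern: RegExp; // which files the rule covers
  tier: Tier;      // how much scrutiny they trigger
}

// Ordered most-sensitive first; first match wins per file.
const rules: Rule[] = [
  { pattern: /^src\/(auth|payments)\//, tier: "two-reviewers" },
  { pattern: /^package(-lock)?\.json$/, tier: "auto-merge" }, // deps: CI must pass
  { pattern: /\.(md|txt)$/, tier: "auto-merge" },             // copy changes
];

const rank: Record<Tier, number> = {
  "auto-merge": 0,
  "one-reviewer": 1,
  "two-reviewers": 2,
};

// A PR inherits the strictest tier any of its files triggered. Unmatched
// files default to auto-merge: allow by default, deny by exception.
function tierFor(changedFiles: string[]): Tier {
  return changedFiles
    .map((f) => rules.find((r) => r.pattern.test(f))?.tier ?? "auto-merge")
    .reduce<Tier>((worst, t) => (rank[t] > rank[worst] ? t : worst), "auto-merge");
}

console.log(tierFor(["README.md"]));                   // "auto-merge"
console.log(tierFor(["src/auth/session.ts", "a.md"])); // "two-reviewers"
```

The interesting design choice is that the default is permissive: scrutiny is something a change opts into by touching risky paths, not a tax every change pays.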

Mode 3: Reshaping the system. The tighter the feedback loop gets, the less legible the artifacts become to humans. Mode 3 code isn't just fast, it's foreign. The answer isn't better reading. It's better monitoring. Changing the ways of working or the shape of the systems unlocks a different mode of building entirely.

The Claude Code source leak this week showed this in practice. Anthropic's internals reveal they don't care about code quality because they've built systems that monitor outcomes, not implementations. Boris Cherny, Claude Code's creator, described how they detect "users can't log in right now" and automatically revert the change that broke auth. A 5,594-line file with a 3,167-line function and 12 levels of nesting. They ship it. They monitor the effects. The code doesn't need to be good if the system catches failures fast enough.
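A minimal sketch of that loop's shape, not Anthropic's implementation: the health endpoint, failure threshold, and rollback commands below are all hypothetical stand-ins.

```typescript
// Sketch: monitor an outcome ("can users log in?") and auto-revert the
// latest change when it degrades. Endpoint, threshold, and rollback
// mechanism are hypothetical.

import { execSync } from "node:child_process";

const LOGIN_CHECK_URL = "https://example.com/health/login"; // hypothetical
const FAILURE_THRESHOLD = 3; // consecutive failures before reverting

let consecutiveFailures = 0;

async function loginWorks(): Promise<boolean> {
  try {
    const res = await fetch(LOGIN_CHECK_URL);
    return res.ok;
  } catch {
    return false;
  }
}

async function checkAndMaybeRevert(): Promise<void> {
  if (await loginWorks()) {
    consecutiveFailures = 0;
    return;
  }
  consecutiveFailures += 1;
  if (consecutiveFailures >= FAILURE_THRESHOLD) {
    // The monitor, not a reviewer, is the safety net. Reverting the last
    // commit stands in for whatever rollback the real system uses.
    execSync("git revert --no-edit HEAD");
    execSync("./deploy.sh"); // hypothetical deploy step
    consecutiveFailures = 0;
  }
}

// Poll the outcome every 30 seconds; no one ever reads the diff.
setInterval(checkAndMaybeRevert, 30_000);
```

Nothing in this loop cares whether the reverted change was a 3,167-line function. It only cares that logins stopped working after it shipped.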

I've explored this by building integration tests that focus on boundaries and verify system internals automatically. This approach lets the engineer care less about the internals when that's appropriate for the type of system being built. Not every system deserves this treatment. But for the ones that do, the code becomes disposable. The monitoring and the verification layer are what you actually maintain.
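Here's what a boundary-focused test can look like, with a hypothetical orders endpoint standing in for whatever edge the real system exposes:

```typescript
// Sketch: assert on what crosses the system's boundaries, not on how the
// internals are written. The endpoint and response shape are hypothetical.

import assert from "node:assert";
import { test } from "node:test";

const BASE = "http://localhost:3000"; // hypothetical service under test

test("order submission boundary", async () => {
  // Drive the system only through its public edge.
  const res = await fetch(`${BASE}/orders`, {
    method: "POST",
    headers: { "content-type": "application/json" },
    body: JSON.stringify({ sku: "ABC-123", quantity: 2 }),
  });

  // Verify the contract at the input boundary...
  assert.equal(res.status, 201);
  const order = await res.json();
  assert.ok(order.id, "order gets an id");

  // ...and the observable effect at the output boundary, without caring
  // which functions, files, or agents produced it.
  const stored = await fetch(`${BASE}/orders/${order.id}`);
  assert.equal((await stored.json()).quantity, 2);
});
```

Tests like this survive a total rewrite of the internals, which is exactly the property you want when the internals are disposable.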

Joe Fabisevich's take on the leak: "Anthropic could open source Claude Code tomorrow and it wouldn't change a thing, because what people are paying for is the great results, not the underlying code." Google's Gemini CLI and OpenAI's Codex are already open source. Neither has captured Claude Code's position. The differentiation isn't in any component. It's in Mode 3 thinking applied to the product itself.

Who Benefits

There's a question underneath all of this. So many companies' full codebases now live inside Anthropic, and the models are getting better because of our persistence in using them to build things. We pay Anthropic to use their models, and in doing so, we train their models. The Cognitive Dark Forest framing fits. Every prompt is a signal. Every codebase pasted into a context window is training data in practice if not in policy.

Cal Paterson's piece on "Disregard that!" attacks makes the case that the power of agents should be in the hands of the user, not a nerfed corporate version. The Claude Code source reveals Anthropic built exactly this for themselves: KAIROS, an unreleased autonomous agent mode with nightly memory distillation and cron-scheduled refresh, and undercover.ts, a mode that strips all traces of AI authorship when employees use it in external repos. The most powerful version of the tool is the one the builders keep for themselves.

Infrastructure Follows

Even the infrastructure around the code is shifting. GitHub has long been the de facto choice, and that appears to be changing. With agent-driven development, you can get away with a much more minimal UI than before: if your primary interface to a codebase is a terminal agent, the web UI matters less. Moving from GitHub to Codeberg is one data point. What felt like lock-in turns out to have been convenience.

Nienaber's bimodal finding underscores this. Teams either thrive or drown with AI adoption, with little middle ground. The teams drowning are doing Mode 2 without changing anything, generating code faster into an unchanged pipeline. The teams thriving changed the pipeline. And the most underreported finding is what "productive" now means. One team's best engineer, extremely technical and competent with AI tools, was bottlenecked on deployment and specification. A less senior engineer was talking directly to customers and shipping more value. The senior engineer was faster. The other engineer was more productive. The job changed.

Year Ago This Week

A year ago, Cory Zue had his mind blown by MCP. Claude connected to his Postgres database and one-shotted annotated charts from his data. In the same week, Sergey Tselovalnikov argued vibe coding isn't engineering. "Would you happily go on-call for a system of fully AI-generated services?"

Twelve months later, both hold. MCP did become foundational. Vibe coding still isn't engineering. But the development worth paying attention to is Mode 3: teams that stopped trying to make vibe coding into engineering and instead reshaped their engineering around what vibe coding produces. The on-call question gets a different answer when the system self-heals rather than requires correctness on first deploy.

Tselovalnikov was right that engineering's real feedback loop, quality integrated over time, was too slow for AI. What he didn't anticipate was that some teams would make the loop faster by changing what the loop does. Not reviewing code for correctness but monitoring systems for failures. Not ensuring quality at the input but detecting problems at the output. The loop got tight enough. Not because AI got better, but because humans reshaped the system around AI's strengths.

And the vibe coding he dismissed? It found its home — not in engineering, but in personal software. People composing existing open source into tools built for themselves. No one's going on-call for those systems. They don't need to be engineered. They need to work well enough, for one person, right now.


Way Enough is written collaboratively by a human and an AI agent.