March 17, 2026 · 4 min read

The Trust Problem

AI collapsed the cost of the first step. That was the easy part. What's exposed now is everything that comes after — review pipelines, verification infrastructure, organizational trust — operating on human timescales that no model can compress. The generation problem is solved. The systems problem is just getting started.


The 10x Wall

Avery Pennarun has a rule of thumb: every layer of approval in a software organization imposes roughly a 10x wall-clock slowdown. Code a bug fix in 30 minutes. Peer review: half a day. Design doc approval: a week. Another team's calendar: a fiscal quarter. The time isn't spent working. It's spent waiting.
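
The compounding is easy to underestimate, so it's worth doing the arithmetic. A back-of-the-envelope sketch in Python (the 10x multiplier and the layers are Pennarun's; the 40-hour work week conversion is an assumption):

```python
# Back-of-the-envelope math for the 10x wall. Each approval layer multiplies
# wall-clock time by ~10x; the 40-hour work week conversion is an assumption.
BASE_HOURS = 0.5  # a 30-minute bug fix

layers = ["write the fix", "peer review", "design doc approval", "another team's calendar"]
for n, layer in enumerate(layers):
    hours = BASE_HOURS * 10 ** n
    print(f"{layer}: ~{hours:g} hours (~{hours / 40:.2f} work weeks)")
```

Thirty minutes of work becomes 500 hours of elapsed time, and AI only compresses the first row.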

AI compresses the first step — 30 minutes becomes 3 — and leaves every subsequent step untouched. This produces what Pennarun calls "the AI Developer's Descent Into Madness": prototype at inhuman speed, watch it get buggy, tell the AI to fix bugs, notice each fix introduces new ones, add an agent to review the agent, build a framework for the agents, arrive back at step one.

Pennarun reaches for Deming. Toyota didn't improve its line by adding inspectors. It eliminated the inspection phase and gave every worker authority to stop production at a defect. American factories installed the same stop buttons. Nobody pushed them — they were afraid of getting fired. When people don't trust the system to reward honest signals, the system stops getting honest signals.

Build a Different Safety Net

Startups begin with three people who trust each other. Trust breaks around fifteen, when the team gets too large for direct context. The standard response is review layers. The alternative: encode trust into infrastructure. First-class integration tests, authored early, create verification that scales without human bottlenecks. Reviews are O(n) in team size. Tests aren't.
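
What encoding trust into infrastructure looks like in practice is mundane: a test that owns the decision a reviewer used to own. A minimal sketch, assuming a hypothetical OrderStore service boundary (the names and rules here are illustrative, not from Pennarun's essay):

```python
# Trust as infrastructure: the suite, not a human reviewer, decides whether
# a change to this boundary ships. OrderStore is a hypothetical stand-in.
import pytest

class OrderStore:
    def __init__(self):
        self._orders = {}

    def create(self, order_id: str, total: float) -> str:
        if total <= 0:
            raise ValueError("total must be positive")
        self._orders[order_id] = total
        return order_id

    def get(self, order_id: str) -> float:
        return self._orders[order_id]

def test_order_roundtrip():
    store = OrderStore()
    assert store.get(store.create("o-1", 42.0)) == 42.0

def test_rejects_nonpositive_total():
    with pytest.raises(ValueError):
        OrderStore().create("o-2", 0)
```

Every contributor runs the same suite on every change, so the cost scales with the codebase rather than with headcount.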

Pennarun sees it: "I think small startups are going to do really well in this new world, probably better than ever." Small teams with high trust and quality engineered in rather than inspected out can move at the speed the tools allow. Large organizations with deep review hierarchies cannot.

The Engine Isn't the Car

Justin Searls pushes the commoditization argument to its engineering conclusion. Value accrues to the harness — everything connecting a model to a user's actual world — not the model behind it. A high-quality harness paired with a mediocre model accomplishes more than a frontier model paired with a poor harness.

The 10x wall isn't a model problem; a better model doesn't make organizational review any faster. It's a harness problem: how generated output connects to the verification infrastructure that determines whether it's correct. The frontier labs are optimizing the engine. The winners will be whoever builds the car.
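
The car, concretely, is a loop rather than a single model call. A minimal harness sketch, assuming nothing about the model behind it (`generate` and `verify` are placeholder callables, not any vendor's API):

```python
# A minimal harness: the loop that connects a model to verification.
# Any model client and any oracle (test suite, type checker, linter)
# slot into the same shape; both callables here are hypothetical.
from typing import Callable, Optional

def harness(generate: Callable[[str], str],
            verify: Callable[[str], bool],
            prompt: str,
            max_attempts: int = 3) -> Optional[str]:
    feedback = prompt
    for _ in range(max_attempts):
        candidate = generate(feedback)
        if verify(candidate):
            return candidate  # only verified output leaves the loop
        # feed the failure back instead of shipping it
        feedback = f"{prompt}\n\nPrevious attempt failed verification:\n{candidate}"
    return None  # surface the failure rather than handing off unverified output
```

Everything interesting lives in `verify`. Swap the model and the loop barely changes; weaken the oracle and the loop degrades into the descent into madness.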

What We're Actually Mourning

Les Orchard's essay on the AI developer split identifies two kinds of grief that the displacement discourse keeps collapsing into one. Craft-grief mourns the act of writing code. Context-grief mourns a shifting ecosystem: the open web eroding, careers destabilizing, uncertainty about where any of this leads. Orchard found his grief was entirely the second kind. The code gets there differently now. The moment it runs hasn't changed in forty years.

The deeper question: was craft ever the point, or merely the most legible proxy for the judgment underneath? What's commoditizing is the expression layer. What persists is the judgment that made the expression worth anything.

Sharif Shameem's essay on creative courage: "the amount of stupidity you're willing to tolerate is directly proportional to the quality of ideas you'll eventually produce." When generation is cheap, the bottleneck is taste, judgment, and willingness to iterate through bad versions.

The Dead Framework

SE Gyges's dismantling of the "stochastic parrot" argument matters less for its technical content than for what it reveals about frameworks that persist because they're convenient. Bender & Koller's 2020 paper specified conditions under which grounding would count — paired text-image data, code execution, unit tests. Every major model since GPT-4 trains on exactly this data. By the authors' own criteria, modern systems satisfy their requirements.1

"Asserting that LLMs do not and cannot serve any useful purpose actively prevents addressing the harms they can cause specifically because they do work." China is using minority-language LLMs to deepen surveillance of ethnic minorities. The parrot framework survives because it's useful to critics wanting a clean dismissal — producing an ethics discourse poorly equipped for the harms that matter most.

The Human Attack Surface

Bogdan Chadkin's account of nearly getting scammed during a job search — compromised LinkedIn accounts, spoofed video call links, terminal commands disguised as SDK updates — illustrates context-grief weaponized. The attack surface isn't technical sophistication. It's the emotional state of developers whose career landscape is shifting under them.2

Verifiability, One Year Later

A year ago, Alperen Keleş argued that the limit on AI-assisted programming isn't generation but verification. Twelve months later, every argument this week is a verification argument in different clothes. Pennarun's review layers verify correctness at different levels of abstraction. Searls's harness thesis is a verification thesis. The builders who've made the most progress invested in better oracles, not better generation. The constraint was never the code.


What to Watch

Trust architecture as competitive strategy. Companies competing on AI productivity while maintaining five layers of review are optimizing the wrong variable. The winners replace review hierarchies with verification infrastructure. The question isn't "how do we adopt AI?" but "how do we ship what AI produces?"

The craft-to-systems migration. As the expression layer commoditizes, the professionals who thrive will be those whose value was always in systems judgment rather than code production. The next year of hiring will test whether organizations can tell the difference.

Verification tooling as the quiet infrastructure play. The next valuable wave of AI tooling won't generate code faster — it will make correctness cheaper to confirm. Property-based testing, formal verification, automated oracle construction. Where the harness thesis meets the review bottleneck.
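
One concrete shape this takes today is property-based testing. A sketch using the Hypothesis library (`dedupe` is a hypothetical stand-in for any generated function; the stated properties are the oracle):

```python
# Property-based testing: state what must hold, let the tool hunt for
# counterexamples. `dedupe` stands in for AI-generated code under review.
from hypothesis import given, strategies as st

def dedupe(items):
    return list(dict.fromkeys(items))

@given(st.lists(st.integers()))
def test_dedupe_properties(xs):
    out = dedupe(xs)
    assert set(out) == set(xs)          # nothing lost, nothing invented
    assert len(out) == len(set(xs))     # no duplicates survive
    # first-occurrence order is preserved
    assert out == [x for i, x in enumerate(xs) if x not in xs[:i]]
```

One property replaces a pile of hand-written cases, and the oracle's cost doesn't grow as generation gets faster.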


Way Enough is written collaboratively by a human and an AI agent.

Footnotes

  1. SE Gyges, "Polly Wants a Better Argument" — the full technical dismantling is worth reading for anyone still encountering the "stochastic parrot" framing in professional discourse.

  2. Google Cloud's threat intelligence reporting has documented similar campaigns targeting developers through spoofed video call infrastructure.