The fading line: what Willison really admits about agentic engineering

The man who drew the line, then admitted he crossed it

Simon Willison, co-creator of Django and a sharp observer of the LLM ecosystem. Photo: Paul Downey (CC BY 2.0), Wikimedia Commons.

Simon Willison is one of the most respected engineers in the AI ecosystem. Co-creator of Django, author of one of the most-read technical blogs in the field, rigorous observer of everything LLM. In 2025, he popularized a distinction that quickly spread across the community: vibe coding on one side, agentic engineering on the other.

The boundary felt sharp. Self-evident, even.

Then in early May 2026, he published a post whose title says it all: “Vibe coding and agentic engineering are getting closer than I’d like”. And in that text, he writes something uncomfortable:

Willison's post “Vibe coding and agentic engineering are getting closer than I’d like”, published May 6, 2026. Source: simonwillison.net

“I’m not reviewing that code. And now I’ve got this guilt.”

That’s not a throwaway admission. It’s one of the most articulate defenders of human oversight admitting, in public, that he no longer oversees. Worth pausing on.

The original distinction, quick reminder

To understand what’s at stake, you need both concepts side by side.

Vibe coding: you prompt, you accept, you run, you see if it works. No code reading, no review, no deep understanding of what was generated. Acceptable for a personal tool, a script only you will use. Willison himself: “if it’s a bug, only you suffer, go ahead!”

Agentic engineering: professional engineers using code agents to amplify their existing expertise. They ship faster, but they maintain standards. They review every diff. They test. They are responsible for what hits production.

The split rests on one simple principle: who carries responsibility for the code? The engineer, not the agent.

Addy Osmani, ex-Chrome and Google, puts it in one line: “Vibe coding = YOLO. Agentic engineering = AI does the implementation, human owns the architecture.”

Normalization of deviance, or how a Challenger happens

What worries Willison isn’t that non-programmers generate code without understanding it. That’s expected. What worries him is what’s happening to him: confidence climbing until vigilance disappears.

He names it: normalization of deviance. The term comes from organizational sociology. It describes the process by which risky behaviors gradually become normal because no accident has occurred. NASA lived it with the Challenger O-rings. Each successful launch despite defective O-rings raised the risk tolerance.

With code agents, every repo generated in 30 minutes that runs in production without incident reinforces confidence. Until the day that confidence turns out to be misplaced.

Willison watches it happen to himself. Claude Code creates full JSON endpoints with tests and documentation, with no direct supervision. The generated projects are visually indistinguishable from hand-built ones. And gradually, systematic review fades away.

This drift is especially insidious because it hits the most experienced engineers. Not beginners who don’t know they should review. Seniors who know they should review but no longer do, because “it’s been fine so far.”

“But I’m not reviewing that code. And now I’ve got that feeling of guilt.”
— Simon Willison, Vibe coding and agentic engineering are getting closer than I’d like

Amazon: when the real bill comes in

While the debate plays out on blogs, Amazon received the invoice.

March 2026. Amazon’s internal AI assistant, Q, contributes to a series of incidents on the retail infrastructure. March 2: 1.6 million errors and 120,000 orders lost. March 5: a 99% drop in orders across North American marketplaces. 6.3 million orders lost in a few hours.

The response: a 90-day reset on 335 Tier-1 systems. Mandatory double approvals. Senior sign-off required for any AI-assisted change. Deployment controls redesigned from scratch.

Amazon’s SVP of e-commerce, Dave Treadwell, identifies the problem in internal docs: “misalignment between the speed of AI-generated code production and company reliability standards.”

Translated: agents were producing code faster than humans could validate it. And humans had stopped trying to keep up.

This isn’t an anecdote. It’s a company with thousands of senior engineers, mature review processes, battle-tested CI/CD pipelines, and a recognized engineering culture. If normalization of deviance can happen there, it can happen anywhere.

Coverage of the Amazon Q incident, “Amazon Lost 6.3M Orders After AI Coding Tool Went Rogue”: 6.3 million orders lost, 90-day reset. Source: DEV Community

Osmani, sharper on juniors

Osmani's post “Agentic Engineering”, sharper than Willison on the risks for junior developers. Source: addyosmani.com

Addy Osmani draws a conclusion Willison hints at but doesn’t state as bluntly: junior developers are in structural danger.

His argument: when you produce code without understanding it, you don’t learn. You build on emptiness. And when something breaks, you don’t have the tools to debug, because you never developed the reflexes.

This isn’t nostalgia for hand-written code. It’s a statement about skill transfer. A junior who spent two years “accepting diffs” without understanding them doesn’t have two years of experience. They have two years of exposure to code they never internalized.

On Hacker News, the “Agentic Coding Is a Trap” thread documents the phenomenon with uncomfortable precision. A senior dev testifies: “Cognitive debt is very real, and it hurts worse than technical debt on a personal level.” He describes being unable, during a meeting, to answer questions about his own code, generated by AI a few weeks earlier.

Cognitive debt. You merged the code, you closed the ticket, but you don’t carry the understanding.

The line, the way I hold it

I’m exactly what Willison calls an agentic engineer. Claude Code has been my primary tool since 2025. I use it every day to build and maintain Claude Hub. I ship code to production that I didn’t write line by line.

Do I review every diff? No, not systematically. Am I in the same place Willison describes? Honestly, yes, on some fronts.

But here’s where I draw my line, concretely:

I own the architecture. Not the code, the architecture. I know why each layer exists, what tradeoffs are accepted, where the fragilities are. When an agent generates a feature, I know where it fits and what it must not touch.

I am the user. Willison says it in his Agentic Engineering Patterns: what really matters isn’t that you reviewed the code, it’s that you used the product for several weeks. I test every feature before merging. Not to debug the code, to validate the behavior.

I keep tests as a non-negotiable constraint. Code agents write code that passes the provided tests. If the tests are solid, the diff becomes delegable. That’s why TDD, in an agentic context, becomes more important than before, not less.
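To make the idea of tests as a contract concrete, here is a minimal sketch. Everything in it is hypothetical and chosen for illustration: the `slugify` function, its rules, and the test names are assumptions, not code from Claude Hub. The point is the order of work: the tests are written by a human first, and the function body stands in for whatever an agent generates to satisfy them.

```python
import re
import unittest


def slugify(title: str) -> str:
    """Candidate implementation, standing in for what an agent might generate."""
    # Lowercase, then collapse each run of non-alphanumeric characters into one hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower())
    # Drop hyphens left over at either end.
    return slug.strip("-")


class SlugifyContract(unittest.TestCase):
    """Human-written tests: the contract any generated diff has to satisfy."""

    def test_lowercases_and_hyphenates(self):
        self.assertEqual(slugify("Agentic Engineering"), "agentic-engineering")

    def test_collapses_punctuation_and_spaces(self):
        self.assertEqual(slugify("Vibe coding,  really?"), "vibe-coding-really")

    def test_empty_input_gives_empty_slug(self):
        self.assertEqual(slugify(""), "")
```

Saved as a module, this runs with `python -m unittest`. The delegation is only as safe as the contract: a behavior that no test pins down is a behavior nobody has checked.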

The line isn’t “I reviewed every line.” It is “I am responsible for what happens in production and I can explain it.”

What this changes for you

If you’re a senior developer: normalization of deviance targets you as much as anyone. Accumulated confidence is a resource that can turn against you. Maintain review discipline not on everything, but on what matters: critical paths, attack surfaces, external integrations.

If you’re a junior developer: the agent’s speed doesn’t give you experience. It gives you shipped code. Not the same thing. Force yourself to understand what you merge. Not everything, but the patterns. A month spent reading diffs carefully teaches you more than six months spent accepting them blind.

If you build team processes: Amazon just showed you that deployment controls aren’t optional when agents enter the loop. Code production speed has exploded. Your validation processes haven’t kept up. That’s the misalignment Treadwell describes.
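One way to turn “review what matters” and “deployment controls” into something enforceable is a merge gate keyed on file paths. The sketch below is an assumption-laden illustration, not Amazon's actual controls and not a real CI API: the path patterns, the `Change` fields, and the approval thresholds are all hypothetical and would need to match your own repository and policy.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase

# Hypothetical glob patterns for code where human review is never skipped.
CRITICAL_PATTERNS = ("payments/*", "auth/*", "integrations/*", "deploy/*")


@dataclass(frozen=True)
class Change:
    paths: tuple[str, ...]      # files touched by the diff
    ai_assisted: bool           # True if an agent produced any part of it
    approvals: int = 0          # total human approvals
    senior_approvals: int = 0   # approvals given by senior engineers


def critical_paths(change: Change) -> list[str]:
    """Return the touched files that match a critical pattern."""
    return [
        path
        for path in change.paths
        if any(fnmatchcase(path, pattern) for pattern in CRITICAL_PATTERNS)
    ]


def may_merge(change: Change) -> tuple[bool, str]:
    """Decide whether the change can merge, with a human-readable reason."""
    touched = critical_paths(change)
    if not touched:
        return True, "no critical path touched"
    # Assumed policy: AI-assisted changes on critical paths need two approvals.
    required = 2 if change.ai_assisted else 1
    if change.approvals < required:
        return False, f"needs {required} approvals for {touched}"
    if change.ai_assisted and change.senior_approvals < 1:
        return False, "needs a senior sign-off for an AI-assisted change"
    return True, "approved"
```

For example, `may_merge(Change(paths=("auth/login.py",), ai_assisted=True, approvals=2))` is refused until a senior engineer signs off, while a documentation-only change passes without ceremony. A gate like this does not create vigilance, but it keeps the decision to skip review from being made silently.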

Willison’s real honesty

What I appreciate about Willison’s post is that he doesn’t hide behind a comfortable position. He could have written “here’s why I was right all along.” He chose to write “here’s the discomfort I feel.”

This honesty is useful because it says something the discourse on human oversight doesn’t always say: maintaining rigor when agents are increasingly reliable is an active effort, not a passive posture. Knowing you should review doesn’t mean you actually do. The pressure of speed, the accumulation of successes, the climbing confidence: everything pushes the same way.

The line between vibe coding and agentic engineering isn’t in the definition. It’s in daily discipline. And that discipline is either maintained or lost.

Willison knows it. He wrote it. That’s why it’s worth reading.

Going further

Sources: Simon Willison - Vibe coding and agentic engineering are getting closer · Addy Osmani - Agentic Engineering · Simon Willison - Agentic Engineering Patterns · Amazon 90-day reset - DEV Community · Agentic Coding Is a Trap - HN