TL;DR
A May 2026 Google whitepaper argues that software engineering is shifting from writing code toward expressing intent and verifying machine-generated work. The paper, as summarized by Thorsten Meyer AI, says the model itself may account for only about 10% of agent behavior, with tooling, context, tests and oversight carrying the rest.
A new Google whitepaper by Addy Osmani, Shubham Saboo and Sokratis Kartakis argues that AI-assisted software development is being reshaped by the verification, tooling and human judgment around coding agents, not by model upgrades alone. The claim matters because teams now rely on AI for a growing share of new code.
The paper, titled The New SDLC With Vibe Coding, says the main shift in software engineering is from writing code directly to expressing intent and letting machines produce working software. According to figures cited in the paper, 85% of professional developers were regularly using AI coding agents as of early 2026, 51% used them daily, and about 41% of all new code was AI-generated.
The whitepaper draws a line between casual “vibe coding” and what it calls agentic engineering. In the paper’s framing, vibe coding means loose prompts, limited review and a reliance on whether the output appears to work. Agentic engineering means formal specifications, automated tests, evals, CI gates, tool controls and human review of architecture and risk.
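To make the contrast concrete, here is a minimal Python sketch of a merge gate in the agentic-engineering style. It is an illustration under stated assumptions, not anything the whitepaper specifies: the `ChangeReport` fields, the `gate` function and the 0.9 eval threshold are all hypothetical. The gate collects every reason a change should be blocked rather than trusting that the output "appears to work".

```python
from dataclasses import dataclass


@dataclass
class ChangeReport:
    """Hypothetical summary of the checks run on one AI-generated change."""
    has_spec: bool            # a written specification exists for the change
    tests_passed: bool        # the deterministic test suite passed
    eval_pass_rate: float     # fraction of agent eval cases that passed (0.0 to 1.0)
    touches_risky_area: bool  # e.g. auth, payments or data migrations
    human_approved: bool      # a reviewer signed off on architecture and risk


def gate(report: ChangeReport, min_eval_pass_rate: float = 0.9) -> list[str]:
    """Return the reasons a change is blocked; an empty list means it may merge."""
    reasons = []
    if not report.has_spec:
        reasons.append("missing specification")
    if not report.tests_passed:
        reasons.append("tests failed")
    if report.eval_pass_rate < min_eval_pass_rate:
        reasons.append(
            f"eval pass rate {report.eval_pass_rate:.2f} below {min_eval_pass_rate:.2f}"
        )
    if report.touches_risky_area and not report.human_approved:
        reasons.append("risky change needs human review")
    return reasons


# A low-risk change with passing tests and evals clears the gate.
print(gate(ChangeReport(True, True, 0.95, False, False)))
# A risky change without human sign-off is blocked.
print(gate(ChangeReport(True, True, 0.95, True, False)))
```

Returning a list of reasons instead of a single boolean lets a CI job report every failed check at once.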
Its most pointed claim is that the model is only about 10% of an agent system, while the surrounding harness accounts for about 90% of behavior. The paper cites benchmark and experiment results in support of that view, including a Terminal Bench 2.0 case in which an agent reportedly moved from outside the top 30 to the top five by changing the harness while keeping the same model.
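What "harness" means in practice can be sketched as a generate, test and repair loop around a model call. This Python sketch is an assumption for illustration, not the paper's design: `run_harness`, the `max_repairs` budget and the toy model are hypothetical. It shows how the same model can do better or worse depending on the context, feedback and limits that the surrounding code supplies.

```python
from typing import Callable, Optional

# Hypothetical interfaces: a "model" maps a prompt to candidate code, and a test
# runner maps code to a list of failure messages (empty means every test passed).
Model = Callable[[str], str]
TestRunner = Callable[[str], list[str]]


def run_harness(model: Model, task: str, context: str, run_tests: TestRunner,
                max_repairs: int = 3) -> tuple[Optional[str], int]:
    """Generate code, test it and feed failures back until the tests pass.

    Returns (code, repair_cycles), or (None, max_repairs) if the budget runs out.
    """
    prompt = f"Context:\n{context}\n\nTask:\n{task}"
    for cycle in range(max_repairs + 1):
        code = model(prompt)
        failures = run_tests(code)
        if not failures:
            return code, cycle
        # The harness, not the model, decides what feedback the next attempt sees.
        prompt += "\n\nPrevious attempt failed:\n" + "\n".join(failures)
    return None, max_repairs


# Toy usage: a fake "model" that only gets the answer right after seeing feedback.
def fake_model(prompt: str) -> str:
    if "failed" in prompt:
        return "def add(a, b):\n    return a + b"
    return "def add(a, b):\n    return a - b"


def add_tests(code: str) -> list[str]:
    # exec() keeps the toy self-contained; a real harness would run tests in a sandbox.
    namespace: dict = {}
    exec(code, namespace)
    return [] if namespace["add"](2, 3) == 5 else ["add(2, 3) returned the wrong value"]


print(run_harness(fake_model, "Write add(a, b).", "Python 3 project", add_tests))
```

Here the fake model succeeds on the first repair cycle only because the harness fed the test failure back into the prompt.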
The model is only 10%
A Google whitepaper argues that software's biggest shift is from writing code to expressing intent. Its sharpest claim: the model you obsess over is the smallest part of the system, and the scaffolding around it does the real work.
The clearest map yet of how serious AI development works, and mostly tool-agnostic. But it is a Google funnel: the concepts are neutral, while the on-ramps point to Gemini, Jules and the ADK. If the harness is 90% of the system and you own it, both your moat and your costs live there. So own your scaffolding, route across models, and remember that AI amplifies whatever engineering culture it lands in.
Verification Becomes The Cost Center
The argument matters for engineering leaders because it shifts spending and management attention away from model selection alone. If the paper’s claim is right, teams that buy better models but underinvest in tests, context management, tool permissions, sandboxes and observability may see limited gains and higher maintenance costs.
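Tool permissions are one of the cheaper harness investments. The following is a minimal Python sketch of a default-deny tool policy, assuming an agent that requests tools by name; the tool names and the `authorize` function are hypothetical and not drawn from the paper or any specific agent framework.

```python
# Hypothetical policy: tools an agent may call freely, tools that need a human,
# and an implicit deny for everything else.
ALLOWED_TOOLS = frozenset({"read_file", "search_code", "run_tests"})
NEEDS_APPROVAL = frozenset({"write_file", "run_shell"})


def authorize(tool: str, human_approved: bool = False) -> bool:
    """Return True if the agent may invoke `tool` under this policy."""
    if tool in ALLOWED_TOOLS:
        return True
    if tool in NEEDS_APPROVAL:
        return human_approved
    return False  # default deny: unlisted tools are never allowed


for tool in ("run_tests", "run_shell", "deploy_prod"):
    print(tool, authorize(tool))
```

Denying by default means a newly added or misspelled tool is blocked until someone deliberately classifies it.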
Thorsten Meyer AI’s analysis of the paper frames the issue as an economics problem: low upfront process investment can look cheap but may lead to repeated fix loops, security remediation and harder maintenance. The same analysis says disciplined agentic engineering has higher upfront costs but can lower the cost per feature when specifications, evals and routing are in place.
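The economics argument reduces to simple amortization arithmetic. This Python sketch assumes a cost model in which upfront process spend is spread evenly across features and each fix loop has a fixed cost. The function names and all input numbers are hypothetical and purely illustrative; they are not taken from the paper or the analysis.

```python
def cost_per_feature(upfront: float, features: int, gen_cost: float,
                     avg_repair_cycles: float, cost_per_cycle: float) -> float:
    """Amortized cost of one feature: process setup spread over all features,
    plus generation, plus the expected cost of fix loops."""
    return upfront / features + gen_cost + avg_repair_cycles * cost_per_cycle


def breakeven_features(upfront: float, cycles_saved: float, cost_per_cycle: float) -> float:
    """Number of features after which the upfront investment pays for itself,
    assuming generation cost is the same in both workflows."""
    return upfront / (cycles_saved * cost_per_cycle)


# Purely illustrative inputs (not from the paper or the analysis).
loose = cost_per_feature(upfront=0, features=20, gen_cost=10,
                         avg_repair_cycles=6, cost_per_cycle=50)
disciplined = cost_per_feature(upfront=2000, features=20, gen_cost=10,
                               avg_repair_cycles=1, cost_per_cycle=50)
print(loose, disciplined)               # 310.0 160.0
print(breakeven_features(2000, 5, 50))  # 8.0
```

Under these made-up inputs, the disciplined workflow costs more for the first few features and less from the eighth onward. That is the shape of the trade-off the analysis describes, not a measured result.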
Vibe Coding Gets Narrower
The paper responds to the broad use of “vibe coding,” a phrase popularized by Andrej Karpathy in February 2025 to describe accepting AI-generated code through feel and repeated prompting. The term has since been used loosely across many AI-assisted workflows.
Google’s paper treats vibe coding as one end of a spectrum rather than the whole category. At the other end, it places agentic engineering, where AI generates code inside a controlled process with tests for deterministic behavior and evals for less predictable agent decisions.
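The split between tests and evals can be illustrated in a few lines. In this Python sketch, which is a hedged illustration rather than the paper's method, a deterministic function gets one exact assertion, while a nondeterministic agent is scored by running each case several times and reporting a pass rate. `slugify`, `run_eval`, the grader and the flaky toy agent are all hypothetical.

```python
import random
from typing import Callable


# Deterministic code: the same input always gives the same output,
# so a single exact assertion is a sufficient test.
def slugify(title: str) -> str:
    return "-".join(title.lower().split())


def test_slugify() -> None:
    assert slugify("The New SDLC With Vibe Coding") == "the-new-sdlc-with-vibe-coding"


# Nondeterministic agent decisions: run each case several times and
# report a pass rate instead of a single pass/fail.
def run_eval(agent: Callable[[str], str],
             cases: list[tuple[str, Callable[[str], bool]]],
             trials: int = 5) -> float:
    passed = total = 0
    for prompt, grader in cases:
        for _ in range(trials):
            total += 1
            passed += grader(agent(prompt))
    return passed / total


# Toy stand-in for an LLM-backed agent that usually, but not always, picks the safe tool.
rng = random.Random(0)


def flaky_agent(prompt: str) -> str:
    return "search_code" if rng.random() < 0.8 else "delete_files"


cases = [("Find where the config is loaded.", lambda out: out == "search_code")]
test_slugify()
print(f"pass rate: {run_eval(flaky_agent, cases, trials=20):.2f}")
```

A team would then compare the pass rate against a threshold it chooses, rather than expecting every single run to pass.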
Thorsten Meyer AI's analysis also notes a commercial angle: although the ideas are broadly applicable, the paper's on-ramps point toward Google's Gemini, Jules and Agent Development Kit (ADK) ecosystem.
“generation is solved; verification, judgment, and direction are the new craft”
— Osmani, Saboo and Kartakis, in the Google whitepaper
Methods Still Need Scrutiny
The adoption figures and benchmark examples are attributed to the whitepaper and its cited sources, including METR and LangChain, but the analysis does not lay out the full methodology behind every number. It is also not yet clear how broadly the 10% model and 90% harness framing applies across different products, teams, languages and regulated environments.
The paper’s commercial implications are also open to interpretation. The concepts may be tool-agnostic, but the analysis says Google’s examples and suggested paths point toward its own AI developer stack.
Teams Test The Harness Thesis
The next test is whether software teams change how they evaluate AI coding systems. The paper points toward more investment in specifications, automated tests, evals, observability, context engineering and model routing, rather than waiting for a single model upgrade to fix workflow problems.
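Model routing, one of the investments listed above, can be as simple as sending each task to the cheapest model trusted with it. This Python sketch assumes tasks arrive with a difficulty tier. The model names, tiers and costs are hypothetical placeholders and do not describe any real product or the paper's recommendations.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class ModelOption:
    name: str             # hypothetical model identifier
    max_difficulty: int   # hardest task tier (1 to 3) this model is trusted with
    cost_per_call: float  # illustrative relative cost


# Hypothetical catalog; in practice tiers and costs would come from your own evals.
CATALOG = (
    ModelOption("small-model", max_difficulty=1, cost_per_call=1.0),
    ModelOption("medium-model", max_difficulty=2, cost_per_call=4.0),
    ModelOption("large-model", max_difficulty=3, cost_per_call=15.0),
)


def route(difficulty: int, catalog: Sequence[ModelOption] = CATALOG) -> ModelOption:
    """Pick the cheapest model trusted with a task of this difficulty."""
    eligible = [m for m in catalog if m.max_difficulty >= difficulty]
    if not eligible:
        raise ValueError(f"no model is trusted with difficulty {difficulty}")
    return min(eligible, key=lambda m: m.cost_per_call)


print(route(1).name, route(3).name)  # small-model large-model
```

Because the catalog is data rather than code, swapping in a new model means editing one entry, which fits the advice to route across models instead of waiting on a single upgrade.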
For readers running engineering teams, the practical milestone is measurable: whether AI coding agents can improve first-pass success, reduce repair cycles and pass production checks without creating hidden debt.
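The first two of those milestones can be computed from ordinary task logs. The sketch below assumes a hypothetical `TaskRecord` schema that records repair cycles and whether production checks passed. Hidden debt is not directly measurable this way and is left out.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class TaskRecord:
    """One agent task, as a team might log it (hypothetical schema)."""
    repair_cycles: int              # 0 means the first attempt passed
    passed_production_checks: bool


def summarize(records: list[TaskRecord]) -> dict[str, float]:
    if not records:
        raise ValueError("no records to summarize")
    n = len(records)
    return {
        "first_pass_success": sum(r.repair_cycles == 0 for r in records) / n,
        "mean_repair_cycles": mean(r.repair_cycles for r in records),
        "production_pass_rate": sum(r.passed_production_checks for r in records) / n,
    }


log = [TaskRecord(0, True), TaskRecord(2, True), TaskRecord(1, False), TaskRecord(0, True)]
print(summarize(log))
```

Tracking these numbers before and after a harness change, with the model held fixed, is one way a team could test the 10%/90% claim on its own workloads.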
Key Questions
What is the actual news here?
Google has published a May 2026 whitepaper arguing that AI-driven software development should be judged by the full system around the model, including prompts, tools, tests, evals and human oversight.
Does the paper say models no longer matter?
No. The paper’s claim is that models matter, but they are only one part of a working agent system. It argues that the surrounding harness has a larger effect on real-world behavior.
What is the difference between vibe coding and agentic engineering?
In the paper’s framing, vibe coding relies on loose prompts and surface-level checks. Agentic engineering uses formal specs, automated tests, evals, CI gates and human review before AI-generated code reaches production.
Why should developers care about the 10% claim?
If the claim holds, teams may get more value by improving tests, context, tools and review systems than by switching models alone. It also means engineering culture and process shape AI output.
What remains unconfirmed?
The whitepaper's broad direction is clear, but how well the 10%/90% split holds across different teams and workloads has yet to be tested beyond the cited examples.
Source: Thorsten Meyer AI