If the Speed Limit Is 70 MPH, Do You Need a Car That Goes 200?

Jun 24, 2026 · Last updated on Jun 24, 2026

Author: Matt Gould

A car that does 200 mph is an astonishing piece of engineering. It is also, for almost everyone, completely irrelevant. The speed limit is 70. The car in your driveway that tops out at 120 will never once be asked to prove it. The extra 80 mph of capability is real, it is impressive, and it does nothing for your commute.

I think the AI market is starting to discover its own version of this. For the past several years, the industry has operated on a simple and largely correct assumption: the smartest model wins. Each generation was materially better than the last — GPT-3 to GPT-4 to the o-series, Claude Sonnet to Opus, each Gemini release — and "better" translated cleanly into better business outcomes. So companies reached for the most capable model available, and they were right to.

But I think a different dynamic is beginning to emerge. We're seeing signs that for a large and growing share of real-world workloads, the question is no longer "which model is smartest?" It's "which model is smart enough?" And once you're asking that question, the car that goes 200 starts to look like a strange thing to pay for.

To be clear about what I am not arguing: frontier models are not slowing down, and they are not about to be replaced. The argument is narrower and, I think, more interesting. It's that the purchasing logic is shifting — that the metric businesses optimize for is quietly moving from "highest benchmark score" to "cheapest model that clears the bar."

Intelligence has thresholds

Here's the thing that I don't think the benchmark race fully accounts for: intelligence, for most business tasks, is not a linear input. It has thresholds.

Below a certain level of capability, a model is useless for a task — it hallucinates, it misclassifies, it can't follow the instructions. Above that level, it becomes useful. And well above it, the additional intelligence stops mattering, because the task simply doesn't have any more value to extract.

Think about the kinds of work that actually dominate enterprise AI spend: customer support triage, summarization, classification, information extraction, internal knowledge search, routine coding. For most of these, going from 60% to 90% accuracy is transformative — it's the difference between a tool you can't deploy and one you can. Going from 95% to 97% is often a rounding error in business terms. You've already cleared the speed limit. The road doesn't care that you have more to give.

This is the part of the analogy that I keep coming back to. The frontier labs are competing to build faster and faster cars. But most of the driving happens at 70.

Businesses don't buy intelligence — they buy outcomes

There's a subtle category error baked into how we talk about model selection. We rank models by how smart they are, as if intelligence were the product. But no business actually buys intelligence. They buy outcomes: a resolved support ticket, a correctly extracted invoice field, a merged pull request.

If you take that seriously, the metric that should govern model choice isn't the benchmark score at all. It's something closer to cost per successful task — how much do I spend to get one unit of the outcome I actually want?

And once you frame it that way, a cheaper model that succeeds 92% of the time can easily beat a frontier model that succeeds 96% of the time on cost per successful task, if the frontier model costs ten or twenty times as much per call. A four-point accuracy gap cannot offset a tenfold price gap unless failures are themselves very expensive. The academic framing for this is starting to appear too; the recent "cost-of-pass" work formalizes exactly this idea, measuring models by the expected cost to produce a correct answer rather than by raw capability. I think that framing, or something like it, will become the dominant way enterprises evaluate models within a year or two.
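To make the arithmetic concrete, here is a minimal sketch of a cost-per-successful-task calculation. The per-call prices and the `failure_cost` parameter are illustrative assumptions, not real vendor pricing, and the formula assumes each attempt succeeds independently and that you retry until one does. It is not taken from the cost-of-pass work itself.

```python
def cost_per_success(cost_per_call: float, success_rate: float,
                     failure_cost: float = 0.0) -> float:
    """Expected spend to obtain one successful outcome.

    Assumes independent attempts retried until success, so the expected
    number of attempts is 1 / success_rate and the expected number of
    failures is (1 - success_rate) / success_rate.
    failure_cost is a hypothetical downstream cost per failed attempt
    (for example, a human having to review it).
    """
    if not 0 < success_rate <= 1:
        raise ValueError("success_rate must be in (0, 1]")
    expected_cost_per_attempt = cost_per_call + (1 - success_rate) * failure_cost
    return expected_cost_per_attempt / success_rate


# Hypothetical prices: the frontier model costs 10x as much per call.
cheap = cost_per_success(cost_per_call=0.01, success_rate=0.92)
frontier = cost_per_success(cost_per_call=0.10, success_rate=0.96)

print(f"cheap model:    ${cheap:.4f} per successful task")
print(f"frontier model: ${frontier:.4f} per successful task")
print(f"frontier / cheap: {frontier / cheap:.1f}x")
```

Raising `failure_cost` narrows the gap, which is the case where paying frontier rates can still be the right call.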

Software history rhymes here

If this pattern feels familiar, it should. We've watched it play out over and over in technology.

Proprietary Unix was, for a long time, genuinely better than Linux. Linux won the data center anyway, because it was good enough and free. Oracle was the gold standard for databases; open-source Postgres and MySQL now run an enormous share of the world's applications. Specialized, expensive hardware gave way to commodity servers. Private data centers gave way to AWS. In each case, the best technology won early — and the good-enough technology won at scale.

The lesson isn't that the premium option was bad. It's that "best" and "most widely adopted" are different competitions, decided by different buyers at different points in a market's maturity. AI looks like it may be approaching the inflection point where those two competitions start to diverge.

The signal is getting louder

What makes me think this is happening now, rather than as a someday-prediction, is that the behavior is showing up in the products themselves — and the most telling moves are coming from the companies with the most to lose.

Microsoft is the clearest example, because it builds frontier-class products and is now deliberately shipping a cheaper, smaller model underneath one of them. At Build in early June it introduced MAI-Code-1-Flash, a lightweight, agentic coding model built directly into GitHub Copilot and VS Code. The pitch is not that it's the smartest coding model in the world — it isn't, and Microsoft doesn't claim it is. It's a small model (roughly 5 billion active parameters) tuned to plan and reason through everyday coding tasks at high speed and low cost, with Microsoft positioning it as comparable in class to Claude Haiku but cheaper to run. That is the threshold argument stated as a product decision: for the bulk of what an in-IDE coding assistant does, you don't need the 200-mph engine, so Microsoft built one tuned for the speed limit. And the strategic motive is the same cost pressure everyone is feeling — the company has been candid that agentic tools chain together many model calls per task, which is precisely the workload where paying frontier rates on every call stops making sense.

Cursor is a second data point, and a sharper one because it built the cheaper model itself, from scratch, rather than tuning someone else's. Its in-house Composer 2.5, released last month, makes the same trade as Microsoft's — Cursor is refreshingly honest that GPT-5.5 still leads the hardest benchmarks. What Composer 2.5 offers is parity with frontier models on everyday coding tasks at roughly one-tenth the cost per token, and Cursor made it the default. Independent testing put it third on a public coding-agent index — behind only the maxed-out variants of Opus and GPT-5.5, which cost something like 10 to 60 times more per task. That's the threshold logic made concrete: clear the bar, then optimize for cost.

The third example is the one I find most conceptually interesting, because it doesn't even require a single good-enough model. On June 12, OpenRouter launched Fusion, which fans a prompt out to a panel of models in parallel and synthesizes their answers into one. On Perplexity's DRACO deep-research benchmark, a panel of three budget models (Gemini 3 Flash, Kimi K2.6, and DeepSeek V4 Pro) scored 64.7%. That beat solo GPT-5.5 (60.0%) and solo Claude Opus 4.8 (58.8%) outright, and it landed within a single point of the top frontier model at roughly half the cost. Three cars that each top out at 90 mph, coordinated well, got there faster than one car that does 200.
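The fan-out-and-synthesize shape can be sketched in a few lines. This is not OpenRouter's implementation: nothing here describes how Fusion actually combines answers, so the majority vote below is a stand-in, and the three panel members are hypothetical stubs rather than real model calls.

```python
import asyncio
from collections import Counter
from typing import Awaitable, Callable

# A panel member is any async function from prompt to answer (an assumption).
PanelModel = Callable[[str], Awaitable[str]]


async def ask_panel(prompt: str, panel: list[PanelModel]) -> list[str]:
    """Send the same prompt to every panel member concurrently."""
    return list(await asyncio.gather(*(model(prompt) for model in panel)))


def synthesize(answers: list[str]) -> str:
    """Stand-in synthesizer: return the most common answer."""
    return Counter(answers).most_common(1)[0][0]


# Hypothetical stubs standing in for three budget models.
async def model_a(prompt: str) -> str:
    return "refund"


async def model_b(prompt: str) -> str:
    return "refund"


async def model_c(prompt: str) -> str:
    return "escalate"


async def main() -> str:
    answers = await ask_panel("Classify this ticket", [model_a, model_b, model_c])
    return synthesize(answers)


result = asyncio.run(main())
print(result)
```

A real synthesizer would more likely hand the candidate answers to another model to merge; the vote is only the simplest thing that shows the structure.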

The future is routing, not a single model

Put those three examples together and a pattern emerges that I think is the real story: companies are going to stop picking a model and start routing work between models.

The architecture is the same one we already know from the rest of computing infrastructure. You don't serve every web request from the origin server — you put a cache and a CDN in front of it, and only fall through to the expensive layer when you have to. Model routing is the same idea. Send the work to a cheap model first. Measure confidence. Escalate to a frontier model only when the cheap one isn't sure, or when the task is genuinely hard.
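As a rough sketch of that cascade, assuming each model returns an answer together with some confidence estimate: the `Answer` type, the `route` function, the 0.8 threshold, and both stub models are hypothetical, and how confidence is actually measured is left open.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Answer:
    text: str
    confidence: float  # 0..1, estimated however the caller sees fit


Model = Callable[[str], Answer]


def route(prompt: str, cheap: Model, frontier: Model,
          threshold: float = 0.8) -> tuple[Answer, str]:
    """Try the cheap model first; escalate only if it is not confident."""
    first = cheap(prompt)
    if first.confidence >= threshold:
        return first, "cheap"
    return frontier(prompt), "frontier"


# Hypothetical stubs: the cheap model is "unsure" about long prompts.
def cheap_stub(prompt: str) -> Answer:
    confident = len(prompt.split()) <= 8
    return Answer(text=f"[cheap] {prompt}", confidence=0.95 if confident else 0.4)


def frontier_stub(prompt: str) -> Answer:
    return Answer(text=f"[frontier] {prompt}", confidence=0.99)


prompts = [
    "Reset my password",
    "Reconcile these three conflicting invoices and explain every discrepancy in detail",
]
for p in prompts:
    answer, tier = route(p, cheap_stub, frontier_stub)
    print(tier, "->", answer.text)
```

Note that an escalated request pays for both calls, so the threshold is itself a cost lever: set it too high and the cheap tier becomes pure overhead.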

OpenRouter's own "advisor" tool is an almost literal version of this — a fast, cheap model does the routine work and consults a stronger model mid-generation only when it matters. Microsoft's tiered approach is the same instinct at the product level. This is what "cost per successful task" looks like once it becomes infrastructure: a layered system where frontier intelligence is a resource you spend deliberately, not a default you pay for on every call.

Frontier models still matter — a lot

I want to resist the lazy version of this argument, because it's wrong. "Cheap models will replace frontier models" is not what's happening and not what I'm claiming.

Frontier models remain essential for reasons that don't show up in a cost-per-task spreadsheet. They handle the genuinely hard reasoning that the cheap models still fail. They push the frontier forward, which is what creates the headroom that later gets commoditized — today's good-enough model is good enough partly because it was trained on, and benchmarked against, yesterday's frontier. And they set the market's expectations for what "good" even means. In the routing world I just described, the frontier model is the escalation target. You still need the car that goes 200; you just stop using it for the school run.

The shift, then, isn't from frontier to cheap. It's from one model for everything to the right model for each task, with the quiet but important consequence that "the right model" turns out to be the cheap one far more often than the benchmark leaderboards would suggest.

So, do you need the car that goes 200?

A year ago, the most important question in AI was who could build the smartest model. I think that question is getting less interesting. That's not because intelligence stopped mattering, but because for most of the work that pays the bills, we've started clearing the threshold past which more of it no longer changes the outcome.

The biggest AI story of the next year may not be who builds the smartest model.

It may be discovering which models are smart enough.

Because once the speed limit is 70 mph, very few people need a car that goes 200.