The Step Down | Tommy Long

Everyone talks about the step change that hit agentic coding at the end of 2025. Opus 4.5 landed, the whole thing tipped from toy to tool overnight, and the consensus settled quickly: the models had finally got smart enough. The step was real, but I don't buy that story. It wasn't a step up in intelligence, it was a step down in price.

[Image: launch-day testimonials for Claude Opus 4.5. Windsurf's CEO says Opus has always been the real SOTA but was cost-prohibitive and is now cheap enough to be your go-to model; Notion says its surprisingly low cost is why they could finally make Opus available.]

The clearest statement of what actually happened came from Windsurf's CEO on launch day, not from a critic months later. Opus models have always been the real SOTA, he said, they'd just been cost-prohibitive. And 4.5 was now at a price point where it could be your go-to model for most tasks. Both halves of that matter: the capability was already there, and the only thing standing between it and everyday use was the bill. Notion said the same thing in fewer words. The first time they could put Opus in front of users at all was when it got cheap, not when it got clever.

Here are the numbers behind the quotes. Opus 4.5 shipped at $5/$25 per million tokens, a third of Opus 4.1's $15/$75 in and out. And Anthropic did something they'd never done with an Opus release. They removed the Opus-specific caps in Claude Code and told Max users they'd now get "roughly the same number of Opus tokens as you previously had with Sonnet". That line is the whole story in one sentence: they weren't selling you a smarter model, they were handing you the model you already wanted at the budget you used to spend on the cheap one.
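To make the price cut concrete, here is a minimal sketch of what one agentic session would cost at each price point. The per-million-token prices are the ones quoted above; the session size (2M input tokens, 200k output tokens) is a hypothetical workload picked only for illustration, not a measurement.

```python
# Prices in USD per million tokens, as quoted above: (input, output).
PRICES = {
    "opus-4.1": (15.00, 75.00),
    "opus-4.5": (5.00, 25.00),
}


def session_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of one session for the given token counts."""
    in_price, out_price = PRICES[model]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000


# ASSUMPTION: a hypothetical agentic session, not real usage data.
INPUT_TOKENS = 2_000_000
OUTPUT_TOKENS = 200_000

old = session_cost("opus-4.1", INPUT_TOKENS, OUTPUT_TOKENS)
new = session_cost("opus-4.5", INPUT_TOKENS, OUTPUT_TOKENS)
print(f"Opus 4.1: ${old:.2f}  Opus 4.5: ${new:.2f}  ratio: {old / new:.1f}x")
```

Because both the input and output prices fell by the same factor, the ratio stays at 3x whatever mix of input and output tokens you assume. Only the absolute dollar figures depend on the made-up session size.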

OpenAI's models were already cheaper, but they were behind Opus on the hard problems and so not worth leaving in the loop. What 4.5 proved is that the prize is frontier intelligence priced to run all day, and OpenAI has been migrating onto that ground ever since, to the point where GPT-5.5 now lands at a near-identical price and comparable intelligence to Opus. The fight has stopped being about who's smartest and become about who's cheap enough to live in the loop.

Here's the bit that story gets wrong. Nobody moved from Opus 4.1 to Opus 4.5 and went "wow, so much better". 4.1 was already brilliant. People didn't use it because every agentic loop burns tokens by the bucket, and at $75 output you rationed it for the hard problems and ran Sonnet for everything else. What happened in late 2025 wasn't an Opus upgrade at all, it was a Sonnet-to-Opus migration. My own org moved ~100 engineers off Sonnet 4.5 and onto Opus 4.5, and the jump everyone felt - this is so much better - was completely real. It just wasn't the new model being cleverer than the old one, it was finally being allowed to run the good model all day.

Which points at something more general. In agentic work there's always exactly one model that matters: the best one you can afford to leave running in a loop, the daily driver. The frontier sets the ceiling but the daily driver is set by price, because the loop is where the tokens go: the reading, the retries, the subagents and the verification passes. A model you ration isn't your daily driver however clever it is, it's the consultant you call in for the gnarly bug. Since Sonnet 3.5 the daily driver had been whatever Sonnet was and Opus the consultant, and all Opus 4.5 really did was drop the consultant down into the daily-driver slot without moving the ceiling much.
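The daily-driver rule can be sketched as a tiny selection function: of the models whose all-day loop cost fits the budget, take the most capable. Everything here beyond the Opus prices quoted earlier is an assumption for illustration: the `rank` values are a made-up ordinal, the Sonnet 4.5 price of $3/$15 is not stated in this post, and the daily token volume and budget are hypothetical numbers chosen to show the mechanism.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Model:
    name: str
    rank: int         # ASSUMPTION: hypothetical ordinal, higher = more capable
    in_price: float   # USD per million input tokens
    out_price: float  # USD per million output tokens


def daily_cost(m: Model, in_tokens: int, out_tokens: int) -> float:
    """USD cost of running the whole day's loop on model m."""
    return (in_tokens * m.in_price + out_tokens * m.out_price) / 1_000_000


def daily_driver(models, in_tokens, out_tokens, budget) -> Optional[Model]:
    """Most capable model whose daily loop cost fits the budget, else None."""
    affordable = [m for m in models if daily_cost(m, in_tokens, out_tokens) <= budget]
    return max(affordable, key=lambda m: m.rank, default=None)


SONNET = Model("sonnet-4.5", rank=1, in_price=3.0, out_price=15.0)  # assumed price
OPUS_41 = Model("opus-4.1", rank=2, in_price=15.0, out_price=75.0)
OPUS_45 = Model("opus-4.5", rank=2, in_price=5.0, out_price=25.0)

# ASSUMPTION: hypothetical daily loop volume and per-engineer daily budget.
IN_TOKENS, OUT_TOKENS, BUDGET = 10_000_000, 1_000_000, 100.0

before = daily_driver([SONNET, OPUS_41], IN_TOKENS, OUT_TOKENS, BUDGET)
after = daily_driver([SONNET, OPUS_45], IN_TOKENS, OUT_TOKENS, BUDGET)
print(before.name, "->", after.name)
```

Under these assumed numbers the answer flips from Sonnet to Opus with no change in rank at the top, which is the point of the paragraph above: the ceiling stayed put and the affordable set moved.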

There's a bigger lesson buried in here than which model to run, and it's about pricing as a product lever. Anthropic's revenue went from around $9bn in 2025 to a projected $47bn this year, call it five-fold. They didn't get there by making the model five times better, they got there by making the one they already had a third of the price and taking the cap off. That's the strange shape of selling intelligence: the ceiling on revenue was never how clever the model was, it was how much of it people could afford to run, so the lever that moved the money was price and packaging, not capability. You drop the unit price and the loop swallows the difference and then some.

That's also why I'm sceptical about the next one. Anthropic's next model, Mythos, reportedly scores 93.9% on SWE-bench Verified, up from Opus 4.8's 88.6%. A five-point bump, the kind we've had every couple of releases without paying a penny more. What's different this time is the price: Mythos reportedly costs around five times as much. The same crowd will see 93.9% and call it the next step change, but I don't think it will be one, at least not at launch, because at 5x you don't put it in the loop, you ration it, exactly the way we all rationed Opus 4.1. The benchmark will be the headline while adoption quietly routes around it to whatever's cheap enough to run all day, so the real event won't be Mythos itself but the distilled Mythos-class model that lands a few months later at daily-driver pricing - same pattern, one tier up. Watch the price collapse, not the launch.

The one thing that could break the pattern is us. The reason Opus 4.1 felt expensive is that we were comparing it to zero: do I really need to spend this? Anyone who's now watched a model do their job for six months compares it to a salary instead, and against a salary, 5x Opus pricing is a rounding error. So maybe the rationing reflex doesn't come back this time and we drop Mythos straight into the loop, and honestly I don't know which way it goes. But I'd still back the daily driver tracking price, because the loop economics don't care how much you respect the model. They care what it costs per million tokens, a hundred times a day.

The discourse will keep score on benchmarks because the benchmark is the legible number, the thing you can screenshot. The thing that actually moved the world was a line on a pricing page. Same as it ever was - the step change wasn't the model getting smarter, it was you finally being able to afford the daily driver you'd wanted all along.