On June 30, 2026, Anthropic shipped Claude Sonnet 5 and made it the default model for Free and Pro on claude.ai, live the same day in Claude Code, the Claude API, Cursor, VS Code and GitHub Copilot [1][2]. If you use any of those, you are already running it, whether or not you chose to. Two things changed under the hood that touch every prompt you send: a new tokenizer that turns the same input into roughly 1.0 to 1.35x more tokens than Sonnet 4.6, and adaptive thinking that is always on with effort defaulting to high on the API and in Claude Code [1][2].
Anthropic set the introductory price ($2 per million input tokens, $10 per million output tokens) so the transition is "roughly cost-neutral" through August 31; on September 1 it steps up to $3 and $15 [1]. That intro window is the whole opportunity here: you have until September 1 to adjust before the extra tokens and the higher rate land on your bill at the same time. Here are the five things I checked in my own library the day the default flipped.
The default switched, and that is the story
The usual model-launch question is "should I migrate." That question is already answered for you. If you write in Claude Code, pair with Cursor or Copilot, or hit the API without pinning a model string, your prompts moved to Sonnet 5 on June 30. There was no migration project, no cutover checklist, no A/B window. The model under your prompts changed while you were doing something else.
And it is a genuinely strong model, which is exactly why nobody is going to roll back. It lands close to Opus 4.8 for a fraction of the price: 63.2% on SWE-bench Pro against Opus 4.8's 69.2%, 81.2% on OSWorld-Verified against 83.4%, and it effectively ties Opus on the GDPval-AA v2 knowledge-work benchmark at 1,618 Elo to Opus 4.8's 1,615 [2]. A few points behind the flagship on hard coding, level with it on knowledge work, at a Sonnet price. There is no argument for fighting this default.
So the job is not "evaluate the switch." It is "you already switched, now adjust the prompts to the model you are actually running." Two shifts drive every adjustment below.
Figure: Claude Sonnet 5 stat cards. New default for Free and Pro plans and live in Claude Code, Cursor, VS Code and Copilot; up to 1.35x more tokens than Sonnet 4.6 for the same input; introductory pricing of $2 input and $10 output per million tokens, stepping up to $3 and $15 on September 1, 2026.
The two shifts under the hood
A new tokenizer. The same prompt and the same response now count as roughly 1.0 to 1.35x more tokens than on Sonnet 4.6, depending on content type [1][2]. Rate-card parity does not mean spend parity: at identical list prices, more tokens per task means a higher real cost per task. Anthropic offset this with the intro discount, so through August 31 the average transition is roughly cost-neutral. On September 1 the rate returns to $3/$15 and the extra tokens are no longer offset by a lower rate.
Adaptive thinking, always on, effort high by default. Sonnet 5 scales its own deliberation, and the default effort is high on the API and in Claude Code [2]. You are no longer opting into reasoning per call; you are opting out of it when you do not want it. On a high-effort default, every call, including the trivial ones, can spend more output tokens thinking than you intended.
Figure: Two-panel diagram of what changed under Claude Sonnet 5. Left panel, the tokenizer: the same input counts as up to 1.35x more tokens, and the intro discount that offsets it expires September 1. Right panel, adaptive thinking: always on with effort defaulting to high, so trivial calls spend output tokens reasoning unless effort is set down explicitly.
Five things to adjust
Figure: Checklist of the five adjustments. Re-measure tokens on your real prompts, recheck cache breakpoints and context budgets, set effort low on high-volume trivial calls, cut redundant reasoning scaffolding, and plan for the September 1 price step from $2 and $10 to $3 and $15 per million tokens.
1. Re-measure token counts on your real prompts, not the rate card
The "cost-neutral" claim is an average across content types. Your prompts are not the average. Code, tables, and non-English text tend to sit toward the 1.35x end of the range, so a prompt-heavy or document-heavy workload can be measurably more expensive per task even at the intro rate. Run your actual prompts through both models and compare the token counts and the dollar totals, not the published rates. This is the same lesson the Opus 4.7 tokenizer change taught last time: the rate card is not the bill.
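A minimal sketch of that comparison, assuming you have already recorded token counts for the same prompt on both models (for example from the usage figures each response reports). The `Measurement` type, the function names, and the sample counts are hypothetical and for illustration only. The $2/$10 rate is the one quoted above; the $3/$15 rate for the previous model is an assumption to replace with whatever you were actually paying.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Measurement:
    """Token counts observed for one prompt on one model (hypothetical type)."""
    input_tokens: int
    output_tokens: int


def cost_usd(m: Measurement, input_rate: float, output_rate: float) -> float:
    """Dollar cost of one run; rates are in dollars per million tokens."""
    return (m.input_tokens * input_rate + m.output_tokens * output_rate) / 1_000_000


def compare(old: Measurement, new: Measurement,
            old_rates: tuple[float, float],
            new_rates: tuple[float, float]) -> dict[str, float]:
    """Return token and cost ratios (new / old) for the same prompt."""
    old_cost = cost_usd(old, *old_rates)
    new_cost = cost_usd(new, *new_rates)
    return {
        "input_token_ratio": new.input_tokens / old.input_tokens,
        "output_token_ratio": new.output_tokens / old.output_tokens,
        "old_cost_usd": old_cost,
        "new_cost_usd": new_cost,
        "cost_ratio": new_cost / old_cost,
    }


# Illustrative numbers only, not measurements.
report = compare(
    old=Measurement(input_tokens=12_000, output_tokens=800),
    new=Measurement(input_tokens=16_200, output_tokens=2_600),
    old_rates=(3.0, 15.0),  # assumed rate for the previous model
    new_rates=(2.0, 10.0),  # introductory rate quoted in this article
)
for name, value in report.items():
    print(f"{name}: {value:.4f}")
```

Reading the input and output ratios separately is the point of the design: input growth comes from the tokenizer alone, while output growth can also include thinking tokens, and the two call for different fixes.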
2. Recheck cache breakpoints and context budgets
A new tokenizer moves where your tokens fall, which moves two things you probably hard-coded against the old counts. Cache breakpoints that lined up on a stable prefix under Sonnet 4.6 can shift, quietly lowering your hit rate. And a prompt you sized to fit a context budget with headroom can now run closer to the edge or over it, because the same input tokenizes larger. Re-measure both against the new counts rather than assuming your old margins still hold.
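Until you have re-measured, a worst-case estimate shows which prompts are at risk. The sketch below assumes the 1.35x upper bound quoted above; the 200,000-token window, the 16,000-token output reserve, and the helper names are hypothetical placeholders, not published limits.

```python
import math


def headroom(prompt_tokens: int, context_limit: int, reserved_output: int) -> int:
    """Tokens left once the prompt and the reserved output budget are counted."""
    return context_limit - reserved_output - prompt_tokens


def worst_case_tokens(old_tokens: int, multiplier: float = 1.35) -> int:
    """Upper-bound estimate for a count measured under the old tokenizer."""
    return math.ceil(old_tokens * multiplier)


# Illustrative numbers only: a hypothetical context window and output reserve.
CONTEXT_LIMIT = 200_000
RESERVED_OUTPUT = 16_000
old_prompt_tokens = 150_000

before = headroom(old_prompt_tokens, CONTEXT_LIMIT, RESERVED_OUTPUT)
after = headroom(worst_case_tokens(old_prompt_tokens), CONTEXT_LIMIT, RESERVED_OUTPUT)

print(f"headroom under old counts: {before}")
print(f"worst-case headroom under new counts: {after}")
if after < 0:
    print("prompt no longer fits in the worst case; re-measure and trim")
```

The same check applies to cache breakpoints: any prefix length you recorded under the old tokenizer is an input to re-measure, not a constant.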
3. Set effort low on high-volume trivial calls
With effort defaulting to high, a classification, extraction, or formatting prompt you run thousands of times a day is now paying for deliberation it does not need, in output tokens billed at $10 and soon $15 per million. Set effort explicitly low on the high-volume, low-judgment calls and reserve high for the work that actually benefits from it. This is the thinking-default trap I flagged when Gemini 3.5 Flash moved its default: a silent default change is the classic cause of a workload that got slower and pricier for no visible reason.
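One way to make the effort choice explicit is a small routing table kept next to your prompts, sketched below. The task names, the table, and both helpers are hypothetical, and the call volume and per-call thinking tokens are illustrative rather than measured. How the chosen value is passed to the API depends on the client you use and is deliberately not shown.

```python
from typing import Literal

Effort = Literal["low", "high"]

# Hypothetical routing table: task kinds that need little judgment get "low".
EFFORT_BY_TASK: dict[str, Effort] = {
    "classification": "low",
    "extraction": "low",
    "formatting": "low",
    "code_review": "high",
}


def effort_for(task: str, default: Effort = "high") -> Effort:
    """Pick an effort level per task kind; unknown tasks keep the default."""
    return EFFORT_BY_TASK.get(task, default)


def daily_thinking_cost_usd(calls_per_day: int,
                            thinking_tokens_per_call: int,
                            output_rate: float) -> float:
    """Dollars per day spent on thinking tokens at a per-million output rate."""
    return calls_per_day * thinking_tokens_per_call * output_rate / 1_000_000


# Illustrative volume: 50,000 calls a day, 300 thinking tokens each (assumed).
for rate in (10.0, 15.0):
    cost = daily_thinking_cost_usd(50_000, 300, rate)
    print(f"thinking cost at ${rate:.0f}/M output: ${cost:.2f} per day")

print(effort_for("classification"), effort_for("architecture_design"))
```

Defaulting unknown tasks to high mirrors the API default described above, so adding the table changes behavior only for the tasks you have deliberately listed.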
4. Delete the now-redundant reasoning scaffolding
If your prompts carry a "think step by step" block, a "reason carefully before answering" preamble, or a hand-built self-check loop, that scaffolding is now doing a job the model does natively and always. It is dead weight that also double-spends: you pay for the native thinking and then pay again for the scaffolding that tries to induce it. This is the compensation-scaffolding cut I made for Fable 5, and it applies cleanly here: cut the manual reasoning prompts, keep the contracts (schemas, null slots, guard-rails).
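To find candidates for that cut across a prompt library, a simple scan is enough to start. The sketch below flags lines for manual review rather than deleting them; the pattern list and the sample prompt are hypothetical and should be extended with the phrasing your own prompts use.

```python
import re

# Hypothetical patterns for hand-written reasoning scaffolding.
SCAFFOLD_PATTERNS = [
    r"think step by step",
    r"reason carefully before answering",
    r"double-check your (answer|work)",
]

_SCAFFOLD_RE = re.compile("|".join(SCAFFOLD_PATTERNS), re.IGNORECASE)


def find_scaffolding(prompt: str) -> list[str]:
    """Return the lines of `prompt` that match a scaffolding pattern."""
    return [line for line in prompt.splitlines() if _SCAFFOLD_RE.search(line)]


# Sample prompt for illustration: two scaffolding lines and one contract line.
prompt = """Classify the ticket into one of: billing, bug, other.
Think step by step.
Reason carefully before answering.
Return JSON matching {"label": string, "reason": string or null}."""

for line in find_scaffolding(prompt):
    print(f"review for removal: {line}")
```

The last line of the sample prompt mentions a reason field but is not flagged, which is the intended behavior: the schema is a contract, not scaffolding.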
5. Plan for September 1 before it arrives
The intro window hides a compounding effect. Right now the lower rate offsets the extra tokens. On September 1 the rate returns to $3/$15 while the token inflation stays, so your bill can move for two reasons at once, and neither is a change you made. Decide now: accept the increase with eyes open, route the high-volume work to a cheaper tier, or lock a budget. The worst version is discovering the step in October from an invoice.
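The size of the step can be projected from token volumes you are already generating on Sonnet 5. A minimal sketch, using the two rate cards quoted in this article; the monthly volumes and the function name are hypothetical.

```python
def monthly_cost_usd(input_tokens: int, output_tokens: int,
                     input_rate: float, output_rate: float) -> float:
    """Monthly spend in dollars; rates are in dollars per million tokens."""
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000


INTRO_RATES = (2.0, 10.0)      # through August 31, per this article
STANDARD_RATES = (3.0, 15.0)   # from September 1, per this article

# Illustrative monthly volumes measured on the new model (assumed figures).
monthly_input = 400_000_000
monthly_output = 60_000_000

now = monthly_cost_usd(monthly_input, monthly_output, *INTRO_RATES)
later = monthly_cost_usd(monthly_input, monthly_output, *STANDARD_RATES)

print(f"intro-rate month: ${now:,.2f}")
print(f"standard-rate month: ${later:,.2f}")
print(f"increase: {100 * (later / now - 1):.0f}%")
```

Because input and output rates both rise by the same factor (2 to 3, 10 to 15), the step is 50% on the same token volume whatever your input/output mix. Any tokenizer or thinking overhead you have not yet trimmed is carried into that higher bill.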
What I checked, and the one that surprised me
I ran my own library through this the day the default flipped. The cost-neutral claim mostly held: on chat-style prompts the token bump was small and the intro rate absorbed it. Two things did not behave. A document-analysis prompt landed near the 1.35x end and got measurably pricier per run even at the intro price, which told me exactly which workloads to watch before September. And a high-volume classification prompt was suddenly slower and costlier because it was reasoning at high effort over a task that needed none; setting effort low put it back where it belonged.
The one that surprised me was not about cost. A prompt I had sized carefully to fit the Sonnet 4.6 context window, with what I thought was comfortable headroom, started truncating on long inputs. Nothing about my prompt changed. The same input simply tokenizes larger on Sonnet 5, so the headroom I had measured was gone. That is the whole lesson in one bug: when the tokenizer moves, every number you derived from token counts is stale, including the ones you are not thinking about.
If you take one operational step from this, make it the re-measure. Keep one canonical version of each prompt, run it against the same inputs on the model you were on and the model you are on now, and read the actual token counts. A default switch is invisible until the bill or the latency moves, and by then you are debugging in production instead of adjusting on purpose.
The signal
Sonnet 5 is a real upgrade, close to Opus for the money, and it is not going anywhere as your default. That is precisely why the move is not to admire the benchmarks and leave your prompts alone. The moment the model under your prompts changes, every assumption baked into those prompts (token budgets, cache boundaries, effort levels, reasoning scaffolds) is worth one measured pass. You did not choose this switch, but you can choose to adjust to it before September turns two silent changes into one visible bill.
Keep My Prompts lets you keep one canonical version of each prompt, score it on six quality criteria, and compare the same prompt across two models on your own inputs, so a default switch becomes a measured adjustment instead of a surprise. Free to start, no credit card required.