Prompt Engineering

Claude Sonnet 5 Became Your Default Overnight: 5 Things to Adjust Before September's Price Jump (July 2026)

Published on July 2, 2026 · 8 min read

On June 30, 2026, Anthropic shipped Claude Sonnet 5 and made it the default model for Free and Pro on claude.ai. It went live the same day in Claude Code, the Claude API, Cursor, VS Code and GitHub Copilot [1][2]. If you use any of those, you are already running it, whether or not you chose to. Two things changed under the hood that touch every prompt you send: a new tokenizer that turns the same input into roughly 1.0 to 1.35x as many tokens as Sonnet 4.6 did, and adaptive thinking that is always on, with effort defaulting to high on the API and in Claude Code [1][2].

Anthropic set the introductory price ($2 per million input tokens, $10 per million output tokens) so the transition is "roughly cost-neutral" through August 31, then it steps up to $3 and $15 [1]. That intro window is the whole opportunity here: you have until September 1 to adjust before the extra tokens and the higher rate land on your bill at the same time. Here are the five things I checked in my own library the day the default flipped.

The default switched, and that is the story

The usual model-launch question is "should I migrate." That question is already answered for you. If you write in Claude Code, pair with Cursor or Copilot, or hit the API without pinning a model string, your prompts moved to Sonnet 5 on June 30. There was no migration project, no cutover checklist, no A/B window. The model under your prompts changed while you were doing something else.

And it is a genuinely strong model, which is exactly why nobody is going to roll back. It lands close to Opus 4.8 for a fraction of the price: 63.2% on SWE-bench Pro against Opus 4.8's 69.2%, 81.2% on OSWorld-Verified against 83.4%, and it effectively ties Opus on the GDPval-AA v2 knowledge-work benchmark at 1,618 Elo to Opus 4.8's 1,615 [2]. A few points behind the flagship on hard coding, level with it on knowledge work, at a Sonnet price. There is no argument for fighting this default.

So the job is not "evaluate the switch." It is "you already switched, now adjust the prompts to the model you are actually running." Two shifts drive every adjustment below.

[Figure: stat cards for Claude Sonnet 5. New default for Free and Pro plans and live in Claude Code, Cursor, VS Code and Copilot; a new tokenizer turns the same input into up to 1.35x as many tokens as Sonnet 4.6; introductory pricing of $2 input and $10 output per million tokens steps up to $3 and $15 on September 1, 2026.]

The two shifts under the hood

A new tokenizer. The same prompt and the same response now count as roughly 1.0 to 1.35x as many tokens as on Sonnet 4.6, depending on content type [1][2]. Rate-card parity does not mean spend parity: identical list prices with more tokens per task mean your real cost per task can rise. Anthropic offset this with the intro discount, so through August 31 the average transition is roughly cost-neutral. On September 1 the rate returns to $3 / $15 and the extra tokens are no longer offset by a lower rate.
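To make the arithmetic concrete, here is a minimal sketch of the per-task cost under the three situations described above. The Sonnet 5 rates are the ones quoted in this article; the Sonnet 4.6 rate of $3 / $15 is my assumption (inferred from the rate "returning" to that level), and the token counts and the `task_cost` helper are purely illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rate:
    """Price in USD per million tokens."""
    input_per_m: float
    output_per_m: float


SONNET_46 = Rate(3.0, 15.0)          # ASSUMPTION: not stated in the article
SONNET_5_INTRO = Rate(2.0, 10.0)     # quoted intro price, through August 31
SONNET_5_STANDARD = Rate(3.0, 15.0)  # quoted price from September 1


def task_cost(input_tokens, output_tokens, rate, multiplier=1.0):
    """USD cost of one task; the tokenizer multiplier scales both prompt and response."""
    scaled_in = input_tokens * multiplier
    scaled_out = output_tokens * multiplier
    return (scaled_in * rate.input_per_m + scaled_out * rate.output_per_m) / 1_000_000


# Illustrative task: 10,000 input tokens and 1,000 output tokens as counted on Sonnet 4.6.
before = task_cost(10_000, 1_000, SONNET_46)
intro_worst = task_cost(10_000, 1_000, SONNET_5_INTRO, multiplier=1.35)
standard_worst = task_cost(10_000, 1_000, SONNET_5_STANDARD, multiplier=1.35)

print(f"Sonnet 4.6 (assumed rate):     ${before:.5f}")
print(f"Sonnet 5 intro, 1.35x tokens:  ${intro_worst:.5f}")
print(f"Sonnet 5 Sept 1, 1.35x tokens: ${standard_worst:.5f}")
```

This models tokenizer inflation only. Thinking tokens from the high-effort default land on the output side and are not included here, so a measured bill can come out higher than these figures.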


Adaptive thinking, always on, effort high by default. Sonnet 5 scales its own deliberation, and the default effort is high on the API and in Claude Code [2]. You are no longer opting into reasoning per call; you are opting out of it when you do not want it. On a high-effort default, every call, including the trivial ones, can spend more output tokens thinking than you intended.

[Figure: two-panel diagram of what changed under Claude Sonnet 5. Left panel, the tokenizer: the same input now counts as up to 1.35x as many tokens, so rate-card parity is not spend parity, and the intro discount that offsets it expires September 1. Right panel, adaptive thinking: it is always on with effort defaulting to high, so trivial calls spend output tokens reasoning unless you set effort down explicitly.]

Five things to adjust

1. Re-measure token counts on your real prompts, not the rate card

The "cost-neutral" claim is an average across content types. Your prompts are not the average. Code, tables, and non-English text tend to sit toward the 1.35x end of the range, so a prompt-heavy or document-heavy workload can be measurably more expensive per task even at the intro rate. Run your actual prompts through both models and compare the token counts and the dollar totals, not the published rates. This is the same lesson the Opus 4.7 tokenizer change taught last time: the rate card is not the bill.
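One way to structure that comparison is a small harness that takes two token-counting functions and ranks your prompts by how much they inflated. This is a sketch, not a specific vendor integration: `compare_token_counts` is a hypothetical helper, and the two counters are stand-ins you would replace with whatever token-counting call your client library exposes for each model.

```python
from typing import Callable, Dict, List


def compare_token_counts(
    prompts: Dict[str, str],
    count_old: Callable[[str], int],
    count_new: Callable[[str], int],
) -> List[dict]:
    """Count every prompt with both counters and sort by inflation, worst first."""
    rows = []
    for name, text in prompts.items():
        old = count_old(text)
        new = count_new(text)
        if old <= 0:
            raise ValueError(f"old counter returned {old} tokens for prompt {name!r}")
        rows.append({
            "prompt": name,
            "old_tokens": old,
            "new_tokens": new,
            "multiplier": new / old,
        })
    return sorted(rows, key=lambda row: row["multiplier"], reverse=True)


# Stand-in counters for demonstration only; they do NOT model any real tokenizer.
def fake_old_counter(text: str) -> int:
    return len(text)


def fake_new_counter(text: str) -> int:
    return len(text) + text.count("|")  # pretend table delimiters cost extra


library = {"chat": "hello there", "table": "a|b|c|d"}
for row in compare_token_counts(library, fake_old_counter, fake_new_counter):
    print(f'{row["prompt"]:<8} {row["old_tokens"]:>5} -> {row["new_tokens"]:>5}  x{row["multiplier"]:.2f}')
```

Sorting by multiplier puts the workloads to watch at the top of the list, which is usually more useful than an average across the whole library.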

2. Recheck cache breakpoints and context budgets

A new tokenizer moves where your tokens fall, which moves two things you probably hard-coded against the old counts. Cache breakpoints that lined up on a stable prefix under Sonnet 4.6 can shift, quietly lowering your hit rate. And a prompt you sized to fit a context budget with headroom can now run closer to the edge or over it, because the same input tokenizes larger. Re-measure both against the new counts rather than assuming your old margins still hold.
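For the context-budget half of this check, a rough screen is to take the counts you measured under the old tokenizer, inflate them by the worst-case 1.35x, and see which prompts would no longer fit. The sketch below assumes hypothetical numbers for the context window and the reserved output budget; it is a pre-filter for deciding what to re-measure first, not a substitute for measuring.

```python
import math


def headroom(prompt_tokens: int, context_window: int, reserved_output: int) -> int:
    """Tokens left over after the prompt and the reserved output budget."""
    return context_window - reserved_output - prompt_tokens


def stale_budgets(old_counts: dict, context_window: int, reserved_output: int,
                  worst_multiplier: float = 1.35) -> list:
    """Names of prompts that fit at the old count but overflow at the worst-case multiplier."""
    flagged = []
    for name, old_tokens in old_counts.items():
        worst_tokens = math.ceil(old_tokens * worst_multiplier)
        fits_before = headroom(old_tokens, context_window, reserved_output) >= 0
        fits_after = headroom(worst_tokens, context_window, reserved_output) >= 0
        if fits_before and not fits_after:
            flagged.append(name)
    return flagged


# Hypothetical window and counts, chosen only to illustrate the check.
measured_on_old_model = {"chat": 5_000, "long_doc": 80_000}
print(stale_budgets(measured_on_old_model, context_window=100_000, reserved_output=8_000))
```

Anything this flags had headroom that depended on the old tokenizer. Cache breakpoints need the same treatment, but they depend on where your stable prefix ends, so they have to be checked against real counts rather than a multiplier.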


3. Set effort down on high-volume, trivial calls

With effort defaulting to high, a classification, extraction, or formatting prompt you run thousands of times a day is now paying for deliberation it does not need, in output tokens billed at $10 and soon $15 per million. Set effort explicitly low on the high-volume, low-judgment calls and reserve high for the work that actually benefits from it. This is the thinking-default trap I flagged when Gemini 3.5 Flash moved its default: a silent default change is the classic cause of a workload that got slower and pricier for no visible reason.
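The simplest way to keep this from depending on memory is to decide effort in one place, keyed by task type, rather than per call site. The sketch below shows only the routing decision. The task names and the `effort_for` helper are hypothetical, and how the chosen level is actually passed to the model depends on your client, which I am not showing here.

```python
# Task kinds named in this section as not needing deliberation.
LOW_JUDGMENT_TASKS = {"classification", "extraction", "formatting"}


def effort_for(task_kind: str, default: str = "high") -> str:
    """Return "low" for high-volume, low-judgment work, otherwise the default effort."""
    return "low" if task_kind in LOW_JUDGMENT_TASKS else default


for kind in ("classification", "code_review", "formatting"):
    print(f"{kind:<15} -> effort {effort_for(kind)}")
```

Keeping the mapping explicit also gives you a single place to review when a default changes again.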

4. Delete the now-redundant reasoning scaffolding

If your prompts carry a "think step by step" block, a "reason carefully before answering" preamble, or a hand-built self-check loop, that scaffolding is now doing a job the model does natively and always. It is dead weight that also double-spends: you pay for the native thinking and then pay again for the scaffolding that tries to induce it. This is the compensation-scaffolding cut I made for Fable 5, and it applies cleanly here: cut the manual reasoning prompts, keep the contracts (schemas, null slots, guard-rails).
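Finding that scaffolding across a library is a search problem, so a small scan helps. This sketch looks for the two phrases quoted above. The pattern list and the `find_scaffolding` helper are my own illustration, and you would extend the patterns with whatever wording your own prompts use. It only reports matches; deciding what to cut is still a manual call.

```python
import re

# Phrases quoted in this section; extend with your own library's wording.
SCAFFOLD_PATTERNS = [
    r"think\s+step[\s-]+by[\s-]+step",
    r"reason\s+carefully\s+before\s+answering",
]
_COMPILED = [re.compile(pattern, re.IGNORECASE) for pattern in SCAFFOLD_PATTERNS]


def find_scaffolding(prompt: str) -> list:
    """Return (line_number, matched_text) for every scaffolding phrase in the prompt."""
    hits = []
    for line_number, line in enumerate(prompt.splitlines(), start=1):
        for pattern in _COMPILED:
            for match in pattern.finditer(line):
                hits.append((line_number, match.group(0)))
    return hits


example = (
    "You are a support-ticket classifier.\n"
    "Think step by step.\n"
    "Return JSON matching the schema."
)
print(find_scaffolding(example))
```

Note that the schema instruction on the last line is a contract, not scaffolding, and the scan leaves it alone.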

5. Plan for September 1 before it arrives

The intro window hides a compounding effect. Right now the lower rate offsets the extra tokens. On September 1 the rate returns to $3 / $15 while the token inflation stays, so your bill can move for two reasons at once, and neither is a change you made. Decide now: accept the increase with eyes open, route the high-volume work to a cheaper tier, or lock a budget. The worst version is discovering the step in October from an invoice.
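To see the step before the invoice does, you can project each workload's monthly spend at both rates from token counts measured on Sonnet 5. This is a rough sketch: the rates are the ones quoted in this article, while the workload figures, the budget, and the `september_step` helper are hypothetical.

```python
INTRO_RATE = (2.0, 10.0)     # USD per million input / output tokens, through August 31
STANDARD_RATE = (3.0, 15.0)  # USD per million input / output tokens, from September 1


def monthly_spend(calls, input_tokens, output_tokens, rate):
    """Projected monthly USD for a workload with fixed per-call token counts."""
    input_rate, output_rate = rate
    per_call = (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000
    return calls * per_call


def september_step(workloads, budget_usd):
    """For each workload, report spend at both rates and whether it exceeds the budget after the step."""
    report = []
    for w in workloads:
        intro = monthly_spend(w["calls"], w["input_tokens"], w["output_tokens"], INTRO_RATE)
        standard = monthly_spend(w["calls"], w["input_tokens"], w["output_tokens"], STANDARD_RATE)
        report.append({
            "name": w["name"],
            "intro_usd": intro,
            "standard_usd": standard,
            "over_budget": standard > budget_usd,
        })
    return report


# Hypothetical workload; token counts should be ones measured on Sonnet 5.
workloads = [{"name": "classifier", "calls": 1_000_000, "input_tokens": 500, "output_tokens": 50}]
for row in september_step(workloads, budget_usd=2_000):
    print(row)
```

Because both rates rise by the same factor, any workload measured on Sonnet 5 today costs 1.5x as much after September 1 at unchanged volume, which makes the budget check a quick one.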

What I checked, and the one that surprised me

I ran my own library through this the day the default flipped. The cost-neutral claim mostly held: on chat-style prompts the token bump was small and the intro rate absorbed it. Two things did not behave. A document-analysis prompt landed near the 1.35x end and got measurably pricier per run even at the intro price, which told me exactly which workloads to watch before September. And a high-volume classification prompt was suddenly slower and costlier because it was reasoning at high effort over a task that needed none; setting effort low put it back where it belonged.

The one that surprised me was not about cost. A prompt I had sized carefully to fit the Sonnet 4.6 context window, with what I thought was comfortable headroom, started truncating on long inputs. Nothing about my prompt changed. The same input simply tokenizes larger on Sonnet 5, so the headroom I had measured was gone. That is the whole lesson in one bug: when the tokenizer moves, every number you derived from token counts is stale, including the ones you are not thinking about.

If you take one operational step from this, make it the re-measure. Keep one canonical version of each prompt, run it against the same inputs on the model you were on and the model you are on now, and read the actual token counts. A default switch is invisible until the bill or the latency moves, and by then you are debugging in production instead of adjusting on purpose.

The signal

Sonnet 5 is a real upgrade, close to Opus for the money, and it is not going anywhere as your default. That is precisely why the move is not to admire the benchmarks and leave your prompts alone. The moment the model under your prompts changes, every assumption baked into those prompts (token budgets, cache boundaries, effort levels, reasoning scaffolds) is worth one measured pass. You did not choose this switch, but you can choose to adjust to it before September turns two silent changes into one visible bill.

Keep My Prompts lets you keep one canonical version of each prompt, score it on six quality criteria, and compare the same prompt across two models on your own inputs, so a default switch becomes a measured adjustment instead of a surprise. Free to start, no credit card required.

References

[1] Introducing Claude Sonnet 5, Anthropic, June 30, 2026. https://www.anthropic.com/news/claude-sonnet-5

[2] Claude Sonnet 5: Benchmarks, Pricing & How It Compares, Codersera, 2026. https://codersera.com/blog/claude-sonnet-5-launch-guide-2026/

[3] Claude Sonnet 5 Ships as Anthropic Default: Agentic Performance Closes Opus Gap, Tech Times, July 1, 2026. https://www.techtimes.com/articles/319409/20260701/claude-sonnet-5-ships-anthropic-default-agentic-performance-closes-opus-gap.htm

