Claude Opus 4.7 Prompting Guide: What Changed and How to Adapt Your Prompts
On April 16, 2026, Anthropic released Claude Opus 4.7. Within 48 hours, developers on Reddit, Hacker News, and Discord reported the same confusing pattern: their long-standing prompts, carefully tuned for Opus 4.6, started producing slightly different results. Agents made fewer tool calls. Responses felt more direct, sometimes too terse. Some requests returned a plain 400 error.
This is not a regression. Opus 4.7 is genuinely more capable: it scores 87.6% on SWE-bench Verified, up from 80.8% on Opus 4.6, and 64.3% on SWE-bench Pro, up from 53.4% [1]. The new xhigh effort level at 100k tokens already beats Opus 4.6 max at 200k tokens [1]. Vision resolution jumped from 1568px to 2576px [2].
But Opus 4.7 is also a breaking release. Some API parameters are gone. Default behaviors changed. Prompt scaffolding that helped 4.6 now works against 4.7. If you maintain a production prompt library, you need to know exactly what to change, what to remove, and what to add.
This guide walks through every material change and translates each into concrete prompt edits. It is written for anyone who maintains more than a handful of prompts: developers, prompt engineers, AI product teams, and power users.
1. The Three Breaking API Changes
These will return a 400 error if ignored. Fix them first.
1.1 Extended Thinking Budgets Are Gone
In Opus 4.6, you could set thinking: {"type": "enabled", "budget_tokens": 32000} to control how much the model deliberated. That parameter no longer exists. Opus 4.7 supports only adaptive thinking, and Anthropic's internal evaluations show adaptive reliably outperforms the old extended thinking approach [2].
Note that adaptive thinking is off by default in 4.7. If your prompts relied on deep reasoning, you must enable it explicitly.
1.2 Temperature, top_p, and top_k Are Rejected
Any non-default value for temperature, top_p, or top_k returns a 400 error in 4.7. The safest migration is to remove these parameters entirely and use prompting to shape the model's behavior [2].
If you relied on temperature=0 for determinism, accept that it never guaranteed identical outputs anyway. For structured outputs, shift the responsibility to the prompt: specify the exact format, use JSON schemas where possible, and add verification instructions.
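A minimal migration sketch covering both breaking changes above. The model names, the "adaptive" thinking type, and the rejected-parameter list follow this guide's description of the 4.7 API, so treat the payload shapes as assumptions rather than verified documentation:

```python
# Sketch: rewrite a 4.6-era request payload for the 4.7 API surface
# described in this guide. Payload shapes are assumptions from the text.

def migrate_request(old: dict) -> dict:
    """Drop rejected sampling params and replace budget-based thinking."""
    new = dict(old)
    # 1.2: temperature, top_p, and top_k now return a 400 error -- remove them.
    for param in ("temperature", "top_p", "top_k"):
        new.pop(param, None)
    # 1.1: extended thinking budgets are gone; only adaptive thinking remains.
    if new.get("thinking", {}).get("type") == "enabled":
        new["thinking"] = {"type": "adaptive"}
    return new

old_request = {
    "model": "claude-opus-4-6",
    "max_tokens": 8000,
    "temperature": 0.2,
    "thinking": {"type": "enabled", "budget_tokens": 32000},
}
new_request = migrate_request(old_request) | {"model": "claude-opus-4-7"}
```

Running the migration before every request (or once over a stored prompt library) turns a hard 400 failure into a silent, reviewable change.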
1.3 Thinking Content Is Hidden by Default
Opus 4.7 streams thinking blocks but leaves the thinking field empty unless you opt in. If your product shows reasoning progress to users, this change manifests as a long silent pause before output begins.
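If you need that reasoning progress back, a hypothetical request sketch for opting in follows. The `display: "summarized"` field is taken from this guide's migration checklist, not from verified documentation, so treat the exact field placement as an assumption:

```python
# Hypothetical sketch: opting back into visible thinking summaries.
# The "display" field is an assumption drawn from this guide's checklist.
request = {
    "model": "claude-opus-4-7",
    "max_tokens": 8000,
    "thinking": {"type": "adaptive", "display": "summarized"},
    "messages": [{"role": "user", "content": "Plan a database migration."}],
}
```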
2. The Effort Parameter
The effort parameter is where Opus 4.7 hides its most important prompting decision. Get it wrong and you either waste tokens or undercut the model's intelligence.
Effort level | Best for | Trade-off
low | Simple extraction, classification, short transforms | Fastest, cheapest, least reasoning
medium | Standard conversational tasks, structured generation | Balanced speed, cost, and reasoning depth
Anthropic's official guidance: start with xhigh for coding and agentic use cases, and use a minimum of high for most intelligence-sensitive tasks [2]. In Claude Code, the default was raised to xhigh for all plans [1].
Practical rule: if your prompt uses phrases like "think step by step", "plan before acting", or "reason carefully before responding", raise the effort level instead. These scaffolding instructions were compensating for a reasoning gap that Opus 4.7 closes natively at higher effort.
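One way to operationalize that rule is to route each task class to an effort level instead of embedding reasoning scaffolding in the prompt. The mapping below is illustrative (the task names are assumptions), but the levels follow the guidance above: xhigh for coding and agentic work, high as the floor for intelligence-sensitive tasks:

```python
# Illustrative routing table: pick effort per task class rather than
# adding "think step by step" to the prompt. Task names are assumptions.
EFFORT_BY_TASK = {
    "extraction": "low",
    "classification": "low",
    "chat": "medium",
    "structured_generation": "medium",
    "coding": "xhigh",
    "agentic": "xhigh",
}

def output_config_for(task: str) -> dict:
    # Default to "high", the recommended minimum for intelligence-sensitive work.
    return {"effort": EFFORT_BY_TASK.get(task, "high")}
```

A router like this also gives you one place to adjust cost across an entire prompt library when a new model changes the effort trade-offs.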
3. Behavior Changes That Require Prompt Edits
These are not breaking changes but they will change your outputs. Many of them mean you can delete prompt scaffolding that was helping 4.6 and now hurts 4.7.
3.1 More Literal Instruction Following
Opus 4.7 will not silently generalize an instruction from one item to another, and it will not infer requests you did not make [2]. This is the single largest behavioral shift.
What to remove: vague hedges like "try to", "if possible", "you might want to". These worked in 4.6 because the model would interpret them generously. In 4.7 they weaken your instructions.
Before:
"Try to extract all email addresses from the document if possible."
After:
"Extract every email address in the document. Return a JSON array. If none exist, return an empty array."
3.2 Response Length Adapts to Task Complexity
Opus 4.7 calibrates verbosity to perceived task difficulty rather than defaulting to a fixed length [2]. Simple questions get shorter answers, complex ones get longer.
What to remove: blanket length instructions like "be concise" on simple questions or "be thorough" on complex ones. The model now calibrates length automatically.
What to keep: explicit format constraints. "Return a 3-sentence summary" still works and overrides automatic calibration.
3.3 Fewer Tool Calls by Default
At medium and lower effort, Opus 4.7 reasons more and calls fewer tools. Raising effort increases tool usage [2]. If you have an agent that stopped using certain tools after the upgrade, the fix is usually to raise the effort, not to rewrite the tool descriptions.
Prompt-level fix: if you need the model to always use a specific tool for a class of tasks, make this a hard rule in the system prompt: "For any calculation involving more than 2 variables, you MUST use the calculator tool."
3.4 More Direct, Opinionated Tone
Opus 4.7 drops the warmer, validation-forward style of 4.6. Fewer emoji. Less "Great question!" scaffolding. More direct answers [2].
If your product needs a warmer tone (customer support, coaching, wellness), specify it explicitly in the system prompt. Do not assume the model defaults to warmth.
Example addition:
"Respond with a warm, supportive tone. Use encouraging language. Acknowledge the user's effort before suggesting improvements."
3.5 More Frequent Progress Updates
Opus 4.7 naturally provides more regular updates during long agentic traces. If you added scaffolding to force interim status messages (e.g., "After each tool call, tell me what you did"), remove it [2]. You are now paying tokens for behavior the model already exhibits.
3.6 Fewer Subagents Spawned
If you use sub-agent patterns, 4.7 spawns fewer by default. This is steerable: if your workflow depends on parallelization, state it explicitly. "For research tasks, delegate sub-queries to parallel sub-agents when the queries are independent."
[Figure: Claude Opus 4.7 behavior changes mapped to prompt edits: what to remove, what to add, what to keep]
4. Task Budgets: A New Cost Lever (Beta)
Task budgets give Claude an advisory token budget across a full agentic loop: thinking, tool calls, tool results, and final output. The model sees a running countdown and prioritizes accordingly [2].
import anthropic

client = anthropic.Anthropic()

response = client.beta.messages.create(
    model="claude-opus-4-7",
    max_tokens=128000,
    output_config={
        "effort": "high",
        # Advisory budget across the whole agentic loop, not a hard cap.
        "task_budget": {"type": "tokens", "total": 128000},
    },
    betas=["task-budgets-2026-03-13"],
    messages=[{"role": "user", "content": "Review the codebase and propose a refactor plan."}],
)
When to use: cost-sensitive workloads where you want the model to self-moderate (customer-facing agents, batch processing, high-volume pipelines).
When to skip: open-ended agentic tasks where quality matters more than speed. The minimum value is 20k tokens; restrictive budgets may cause the model to complete tasks less thoroughly or refuse them entirely [2].
task_budget is a suggestion the model is aware of. max_tokens is a hard cap the model does not see. Use both for different purposes: task_budget for self-moderation, max_tokens as a ceiling.
5. Vision: You Can Zoom In Deeper
Opus 4.7 raises maximum image resolution from 1568px (1.15 megapixels) to 2576px (3.75 megapixels). The model's coordinates now map 1:1 to actual pixels, so there is no scale-factor math for bounding boxes [2].
Prompt implications:
You can now reference fine details that were invisible to 4.6. "Read the timestamp in the bottom-right corner of the screenshot" works reliably.
For computer-use agents and document understanding, skip the downsampling step. Send full-resolution screenshots.
High-resolution images cost more tokens. If your task does not need fine detail, downsample to reduce costs.
The improvements extend to low-level perception (pointing, measuring, counting) and image localization. Tasks like "count the number of buttons in this UI", "mark the coordinates of each error message", or "measure the padding between these two elements" now produce significantly better results.
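A sketch of sending a full-resolution screenshot. The image content-block shape (`base64` source with a `media_type`) follows Anthropic's standard Messages API format; the claim that no downsampling step is needed comes from this guide:

```python
import base64

# Sketch: build a user message carrying a full-resolution PNG screenshot
# plus a fine-detail question. Content-block shape per the Messages API.
def screenshot_message(png_bytes: bytes, question: str) -> dict:
    return {
        "role": "user",
        "content": [
            {
                "type": "image",
                "source": {
                    "type": "base64",
                    "media_type": "image/png",
                    "data": base64.b64encode(png_bytes).decode("ascii"),
                },
            },
            {"type": "text", "text": question},
        ],
    }

msg = screenshot_message(b"fake-png-bytes",
                         "Read the timestamp in the bottom-right corner.")
```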
6. The New Tokenizer: Budget for 35% More Tokens
Claude Opus 4.7 uses a new tokenizer that may use 1x to 1.35x as many tokens as 4.6 when processing the same text [2]. This is not a bug; the new tokenizer contributes to the model's improved performance.
Practical actions:
Update your max_tokens parameters to give additional headroom (20-35% more)
Adjust compaction triggers and context window monitoring thresholds
Budget for higher costs on the same workload, at least initially
Re-measure token counts using the Opus 4.7-specific count_tokens endpoint before relying on old estimates
The 1M context window remains available at standard API pricing with no long-context premium, so the window itself is not the constraint. Your max_tokens ceiling and cost budget are.
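A minimal helper for the headroom adjustment, assuming the 1x-1.35x multiplier reported above. Treat the exact factor as a starting point and replace it with your own measured ratio from the count_tokens endpoint:

```python
# Minimal sketch: scale a 4.6-era max_tokens budget for the new tokenizer.
# The default 1.35 headroom factor is the upper bound this guide reports.
def rebudget(old_max_tokens: int, headroom: float = 1.35) -> int:
    # Rounded to the nearest token; adjust downward once you have
    # measured your own workload's actual expansion ratio.
    return round(old_max_tokens * headroom)
```

For example, a prompt that previously ran comfortably with max_tokens=8000 would get a 10800-token ceiling under the default factor.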
[Figure: Claude Opus 4.7 migration checklist: API changes, prompt scaffolding to remove, new features to leverage]
7. A Practical Migration Checklist
Work through this list in order for every production prompt that targets Opus 4.7.
API layer (required):
Remove extended thinking budgets; use thinking: {"type": "adaptive"}
Remove temperature, top_p, top_k parameters
Add output_config: {"effort": ...} and pick the right level
Decide whether to opt into visible thinking with display: "summarized"
Prompt layer:
Remove "think step by step" when raising effort instead
Add explicit tone directives if you need warmth
Add explicit tool-use rules if critical tools are being skipped
Add explicit sub-agent directives if you rely on parallelization
Testing (do not skip):
Run A/B tests comparing 4.6 and 4.7 on representative tasks
Measure token usage; expect 1-1.35x increase
Measure quality on your own evals, not just benchmarks
Document which prompt version targets which model
8. Managing Prompts Across Model Generations
The Opus 4.7 release surfaces a problem that every serious AI team already knows: prompts are model-specific artifacts. A prompt tuned for 4.6 does not behave identically on 4.7, just as a GPT-5.4 prompt does not translate to Gemini 3.1 Pro without adjustment.
If you maintain prompts across model versions, you need:
Version control so you can track which prompt version works with which model, and roll back if a model upgrade degrades results.
Side-by-side testing to compare the same prompt on multiple models before switching.
Scoring and regression detection to catch drift early rather than after it affects users.
A central library so your team shares optimized prompts instead of rediscovering fixes independently.
This is the core problem Keep My Prompts solves. Every prompt is versioned with notes on the target model, effort level, and observed quality. The Prompt Score evaluates prompts on six criteria (specificity, context, structure, constraints, role, output format) so you can catch weak prompts before they reach production. The Promptimizer rewrites prompts to score higher, and the quality gate rejects variants that do not improve on the original.
For the Opus 4.7 migration specifically, you can version your 4.6 prompts, create 4.7 variants with the scaffolding removed, test both side by side, and ship the winner. No credit card required to start.
9. The Release Signal
Every model release tells you something about where the frontier is heading. Opus 4.7 tells us three things.
First, prompt scaffolding is on the way out. As models get better at reasoning natively, the verbose instructions we added to compensate become noise. More literal instruction following means prompts get shorter, not longer.
Second, effort is the new temperature. Granular control over reasoning depth (xhigh, high, medium, low) replaces sampling-parameter tuning. Your optimization surface moved from statistical knobs to semantic ones.
Third, vision and memory are catching up to text. High-resolution image support and better file-system-based memory signal that the next frontier for prompting is multimodal orchestration and stateful long-horizon workflows.
If your prompt library still treats every model the same, now is the moment to start versioning, testing, and organizing. The teams who build the discipline first will ship faster every time a new model drops.
Keep My Prompts helps you version prompts per model, score them on 6 quality criteria, and track what works as the frontier moves. Free to start, no credit card required.