
TLDR
Anthropic released Claude Opus 4.7 on April 16, 2026, as a direct upgrade to Opus 4.6. It delivers major gains on the hardest coding tasks, introduces a new xhigh effort level, supports images up to 2,576 pixels on the long edge (roughly 3.75 megapixels), and ships with automatic cybersecurity safeguards. Pricing stays flat at $5 per million input tokens and $25 per million output tokens. Early testers at Cursor, Replit, Vercel, Notion, Devin, Harvey, Databricks, and Warp report double-digit benchmark jumps, stronger instruction following, better long-horizon autonomy, and a more opinionated model that pushes back instead of agreeing reflexively.
Key Takeaways
- Direct upgrade from Opus 4.6 at the same price point, available via API as claude-opus-4-7, plus Amazon Bedrock, Google Cloud Vertex AI, and Microsoft Foundry.
- New xhigh effort level slots between high and max, giving developers finer control over the reasoning-versus-latency tradeoff.
- Vision gets a real jump: images up to 2,576 pixels on the long edge, more than 3x prior Claude models. XBOW reported 98.5% visual acuity versus 54.5% for Opus 4.6.
- Coding benchmarks up across the board: Cursor saw 70% on CursorBench versus 58% for 4.6, Rakuten-SWE-Bench resolved 3x more production tasks, and GitHub measured a 13% lift on their 93-task benchmark.
- Long-horizon autonomy is a headline theme. Devin says Opus 4.7 works coherently for hours. Genspark highlights loop resistance and the highest quality-per-tool-call ratio they have measured.
- Instruction following is substantially tighter, which means old prompts written for loose-interpretation models may now behave unexpectedly. Re-tune prompts and harnesses.
- Better memory across file-system-based workflows, reducing the need for up-front context in multi-session work.
- Tokenizer changed: the same input can now map to 1.0 to 1.35x as many tokens, depending on content type. Opus 4.7 also thinks more at higher effort levels, so output token counts rise too.
- Cybersecurity safeguards automatically detect and block prohibited or high-risk cyber requests. Legitimate security researchers can apply to the new Cyber Verification Program.
- Claude Code gets /ultrareview, a dedicated review session that catches bugs and design issues. Pro and Max users get three free ultrareviews. Auto mode is extended to Max users.
- State-of-the-art on GDPval-AA, a third-party evaluation of economically valuable knowledge work spanning finance, legal, and other domains.
- Not the most capable overall model. That distinction still goes to Claude Mythos Preview, which also remains the best-aligned model Anthropic has trained.
Detailed Summary
What Claude Opus 4.7 Actually Is
Claude Opus 4.7 is Anthropic’s latest generally available frontier model, positioned as a targeted upgrade to Opus 4.6 rather than a ground-up new generation. The focus is squarely on advanced software engineering, long-running agentic workflows, and higher-fidelity vision. Anthropic describes it as handling complex, long-running tasks with rigor and consistency, paying precise attention to instructions, and devising ways to verify its own outputs before reporting back.
The positioning matters. Claude Mythos Preview, announced alongside Project Glasswing, remains the most powerful and best-aligned model Anthropic has trained. Opus 4.7 is the first release after Mythos Preview and serves a dual purpose: give developers a concrete upgrade today, and stress-test new cybersecurity safeguards on a less capable model before Anthropic attempts a broader release of Mythos-class systems.
Coding and Agentic Performance
The early-access testimonials read like a highlight reel of the agentic coding ecosystem. Cursor saw CursorBench scores jump from 58% on Opus 4.6 to over 70% on Opus 4.7. Rakuten measured 3x more resolved production tasks on Rakuten-SWE-Bench with double-digit gains in code quality and test quality. GitHub measured a 13% lift on a 93-task coding benchmark including four tasks that neither Opus 4.6 nor Sonnet 4.6 could solve. Notion observed a 14% improvement over Opus 4.6 at fewer tokens and a third of the tool errors, calling it the first model to pass their implicit-need tests.
Devin emphasized sustained autonomy, saying the model works coherently for hours and pushes through hard problems rather than giving up. Warp reported that Opus 4.7 passed Terminal Bench tasks prior Claude models had failed, including a tricky concurrency bug Opus 4.6 could not crack. Vercel highlighted a behavior they had not seen before: the model actually does proofs on systems code before starting work, and is noticeably more honest about its own limits.
A recurring theme across testimonials is that Opus 4.7 pushes back. Replit’s president said it feels like a better coworker because it challenges technical decisions instead of agreeing by default. Augment Code noted it brings a more opinionated perspective rather than simply agreeing with the user. For anyone building real engineering workflows, that pushback behavior is arguably more valuable than raw benchmark deltas.
Vision: The Quiet Breakthrough
The vision upgrade may be the most underappreciated change. Opus 4.7 now accepts images up to 2,576 pixels on the long edge, roughly 3.75 megapixels, which is more than three times the previous Claude limit. This is a model-level change, not an API parameter, so every image sent to Claude is processed at higher fidelity automatically.
XBOW, which builds autonomous penetration testing agents that rely heavily on computer use, reported the most dramatic single number in the entire announcement: 98.5% on their visual acuity benchmark versus 54.5% for Opus 4.6. They described their single biggest Opus pain point as effectively disappearing, unlocking an entire class of work where they could not previously use Claude. Solve Intelligence reported major improvements in multimodal understanding for life sciences patent workflows, from reading chemical structures to interpreting complex technical diagrams.
In practice this unlocks computer-use agents that read dense screenshots, data extraction from complex diagrams, and any workflow that depends on pixel-accurate references.
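Because the new ceiling is expressed as a long-edge pixel count, it is easy to check ahead of time whether an image will be resized. A minimal sketch in pure Python (the 2,576-pixel figure comes from the announcement; the helper itself is ours):

```python
# Compute whether an image fits Opus 4.7's long-edge limit, and the
# scaled dimensions if it does not. Pure arithmetic; no image library needed.
OPUS_47_LONG_EDGE = 2576  # announced limit, up from prior Claude models

def fit_to_long_edge(width: int, height: int, limit: int = OPUS_47_LONG_EDGE):
    """Return (width, height) scaled so the long edge is at most `limit`."""
    long_edge = max(width, height)
    if long_edge <= limit:
        return width, height  # already within the limit; processed as-is
    scale = limit / long_edge
    return round(width * scale), round(height * scale)

# A 4000x3000 photo exceeds the limit and would be downscaled:
print(fit_to_long_edge(4000, 3000))  # -> (2576, 1932)
# A 1920x1080 screenshot fits unchanged:
print(fit_to_long_edge(1920, 1080))  # -> (1920, 1080)
```

Since the higher fidelity is automatic at the model level, a check like this is only useful for predicting quality, not for gating requests.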
The New xhigh Effort Level
Opus 4.7 introduces an xhigh (extra high) effort level that sits between high and max. This gives developers a new middle gear for the reasoning-versus-latency tradeoff on hard problems. In Claude Code, Anthropic raised the default effort level to xhigh across all plans. For coding and agentic use cases, Anthropic recommends starting with high or xhigh effort rather than defaulting to medium.
Alongside effort controls, the Claude Platform is getting task budgets in public beta, letting developers guide Claude’s token spend so it can prioritize work across longer runs. This matters because Opus 4.7 thinks more at higher effort levels, particularly on later turns in agentic settings.
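What a request combining these controls might look like, as a hedged sketch only: the `effort` and `task_budget_tokens` field names are illustrative assumptions, not confirmed API parameters, so check the API reference for the real shape.

```python
# Sketch of a request body using an effort control and a task budget.
# CAUTION: "effort" and "task_budget_tokens" are illustrative field names,
# not confirmed Claude API parameters.
def build_request(prompt: str, effort: str = "xhigh", budget_tokens=None) -> dict:
    # xhigh is the new level that sits between high and max
    allowed = {"low", "medium", "high", "xhigh", "max"}
    if effort not in allowed:
        raise ValueError(f"unknown effort level: {effort}")
    body = {
        "model": "claude-opus-4-7",
        "effort": effort,  # hypothetical parameter name
        "messages": [{"role": "user", "content": prompt}],
    }
    if budget_tokens is not None:
        body["task_budget_tokens"] = budget_tokens  # hypothetical beta field
    return body

req = build_request("Refactor the auth module", effort="xhigh", budget_tokens=200_000)
```

Per Anthropic's guidance, coding and agentic workloads should start at high or xhigh rather than medium.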
Token Usage Changes You Need to Plan For
Two token-related changes affect migration. First, Opus 4.7 uses an updated tokenizer that improves how the model processes text; the tradeoff is that the same input can map to 1.0 to 1.35x as many tokens, depending on content type. Second, Opus 4.7 thinks more at higher effort levels, which means more output tokens on hard problems.
Anthropic’s own internal coding evaluation shows the net effect is favorable when measured against quality delivered per token, but the recommendation is to measure the difference on real traffic rather than assume. Token usage can be controlled via the effort parameter, task budgets, or simply prompting the model to be more concise. Anthropic published a migration guide with tuning advice.
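The cleanest way to follow that advice is to measure on your own traffic. A sketch under stated assumptions: `count_tokens` below is a stand-in for whatever token-counting facility your SDK exposes, not a specific API call.

```python
# Estimate the tokenizer delta on real traffic: count the same inputs under
# both model versions and report the ratio. `count_tokens` is a stand-in for
# your SDK's token-counting facility.
def tokenizer_ratio(samples, count_tokens) -> float:
    old_total = sum(count_tokens(s, model="claude-opus-4-6") for s in samples)
    new_total = sum(count_tokens(s, model="claude-opus-4-7") for s in samples)
    return new_total / old_total

# Demo with a fake counter that inflates 4.7 counts by 20%, i.e. inside the
# announced 1.0-1.35x range:
def fake_count(text, model):
    base = len(text.split())
    return round(base * 1.2) if model == "claude-opus-4-7" else base

print(tokenizer_ratio(["a b c d e f g h i j"], fake_count))  # -> 1.2
```

Run this over a representative sample of production prompts before committing to a budget, since the ratio varies by content type.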
Claude Code Updates: /ultrareview and Auto Mode
Claude Code gets two meaningful additions. The new /ultrareview slash command produces a dedicated review session that reads through changes and flags bugs and design issues that a careful reviewer would catch. Pro and Max users get three free ultrareviews to try it out.
Auto mode, a permissions option where Claude makes decisions on behalf of the user so longer tasks run with fewer interruptions, has been extended from Pro to Max users. The pitch is that auto mode is safer than skipping all permissions while still enabling long autonomous runs.
Cybersecurity Safeguards and the Cyber Verification Program
Opus 4.7 ships with safeguards that automatically detect and block requests indicating prohibited or high-risk cybersecurity uses. During training, Anthropic also experimented with differentially reducing cyber capabilities, meaning Opus 4.7's cyber ceiling is intentionally lower than Mythos Preview's.
For legitimate users, Anthropic launched a Cyber Verification Program for security professionals doing vulnerability research, penetration testing, and red-teaming. Real-world data from these safeguards will inform how Anthropic eventually releases Mythos-class models more broadly.
Safety and Alignment
Opus 4.7 shows a similar safety profile to Opus 4.6 overall. Honesty and resistance to prompt injection attacks improved. Some measures slipped modestly, notably a tendency to give overly detailed harm-reduction advice on controlled substances. Anthropic’s alignment assessment concluded the model is largely well-aligned and trustworthy, though not fully ideal. Mythos Preview still holds the crown as the best-aligned model according to Anthropic’s evaluations. The full Claude Opus 4.7 System Card has the complete breakdown.
Real-World Work Beyond Code
Opus 4.7 posts a state-of-the-art score on the Finance Agent evaluation and on GDPval-AA, a third-party evaluation of economically valuable knowledge work spanning finance, legal, and other domains. Harvey reported 90.9% on BigLaw Bench at high effort with noticeably smarter handling of ambiguous document editing tasks, including correctly distinguishing assignment provisions from change-of-control provisions. Databricks measured 21% fewer errors than Opus 4.6 on OfficeQA Pro document reasoning. Vercel went so far as to call it the best model in the world for building dashboards and data-rich interfaces.
Pricing and Availability
Pricing holds at $5 per million input tokens and $25 per million output tokens. Opus 4.7 is live today across all Claude products, the Claude API as claude-opus-4-7, Amazon Bedrock, Google Cloud Vertex AI, and Microsoft Foundry.
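At those rates, per-request cost is simple arithmetic. A quick helper using the list prices from the announcement (no cache or batch discounts modeled):

```python
# Per-request cost at Opus 4.7 list prices:
# $5 per million input tokens, $25 per million output tokens.
INPUT_PER_M = 5.00
OUTPUT_PER_M = 25.00

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one request at list prices."""
    return (input_tokens / 1e6) * INPUT_PER_M + (output_tokens / 1e6) * OUTPUT_PER_M

# A long agentic turn: 200k tokens in, 30k tokens out
print(round(request_cost(200_000, 30_000), 2))  # -> 1.75
```

Keep in mind that flat pricing does not mean flat bills: the tokenizer change can raise input counts up to 1.35x, and higher effort levels raise output counts, so cost per task should be re-measured rather than carried over from 4.6.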
Thoughts
The most interesting thing about this release is not the benchmark deltas, which are strong but expected for a point-release. It is the behavioral shift. When a dozen independent companies describe the same model as opinionated, willing to push back, self-verifying, and honest about its limits, that is a different product category than “next version, slightly better.” That is a model optimized for being a collaborator rather than an autocomplete.
For solo builders running long agentic sessions, the loop resistance and long-horizon autonomy claims are the ones worth taking seriously. Genspark’s framing is sharp: a model that loops indefinitely on 1 in 18 queries wastes compute and blocks users. If Opus 4.7 genuinely closes that failure mode, the economics of overnight autonomous runs change meaningfully.
The vision jump is the sleeper feature. Support for 3.75-megapixel images, plus the XBOW acuity number, suggests computer-use agents are about to get a lot more reliable at reading actual screens. Anyone building browser agents, automated QA, or visual data extraction pipelines should retest their stacks this week.
The instruction-following tightening is a real gotcha. Prompts written against Opus 4.6’s looser interpretation habits may produce surprising results when the model now takes every word literally. Teams with production prompt libraries should budget time for re-tuning rather than expecting a drop-in swap.
Finally, the strategic framing around Mythos Preview is worth noting. Anthropic is explicitly using Opus 4.7 as a safeguards testbed for eventually releasing more capable cyber-capable systems. That is an honest acknowledgment that capability and deployment readiness are separate problems, and it sets a template for how frontier releases may work going forward.