Anthropic released Claude Opus 4.8 on Thursday, the latest version of its flagship AI model, arriving 41 days after Opus 4.7 in one of the fastest upgrade cycles in the company’s history.
The new model is available immediately across Anthropic’s platforms at the same price as its predecessor, and it comes with a specific set of improvements that the company is describing less in terms of raw benchmark scores and more in terms of the quality that turns out to matter most when an AI model is doing real work: knowing what it does not know, and saying so.
“A general problem with AI models is that they sometimes jump to conclusions, confidently claiming to have made progress in their work despite the evidence being thin,” Anthropic wrote in the release.
Opus 4.8 was specifically built to address that problem. Early testers report the model is more likely to flag uncertainties and less likely to make unsupported claims than any prior version.
The company says it is roughly four times less likely than Opus 4.7 to let coding flaws slip through without flagging them.
The model is accessible at claude.ai for Pro, Max, Team and Enterprise users, through the Claude API as claude-opus-4-8, in Claude Code and in Cowork. Developers using AWS, Google Cloud or Microsoft Foundry can access it through those platforms as well.
Pricing stays at $5 per million input tokens and $25 per million output tokens, unchanged from Opus 4.7. A fast mode option, which runs at approximately 2.5 times the standard speed, has been cut to $10 per million input and $50 per million output, down from $30 and $150 respectively for Opus 4.7 fast mode, a 3x price reduction on the higher-throughput tier.
The Benchmarks
Opus 4.8 posts strong benchmark improvements across the categories that matter most for the agentic AI work that Anthropic has been positioning Claude as the best platform to handle.
Agentic coding scores rose from 64.3 percent to 69.2 percent. Multidisciplinary reasoning with tools improved from 54.7 percent to 57.9 percent.
Agentic computer use moved from 82.8 percent to 83.4 percent. The knowledge work score, a composite measure of performance across professional tasks, rose from 1,753 to 1,890.
On the Super-Agent benchmark, which tests a model’s ability to complete complex end-to-end agentic tasks without human intervention, Opus 4.8 is the only model to complete every case end-to-end, beating prior Opus models and OpenAI’s GPT-5.5.
Those are the headline numbers, but Anthropic’s own framing of the release suggests the company views the honesty improvements as more significant than any single benchmark.
The problem it is addressing is not obscure to anyone who uses AI tools for real work. AI models that confidently assert progress they have not made, that claim to have completed a task when they have only partially completed it, or that present uncertain conclusions as firm ones, cause specific, compounding problems when they are deployed in agentic contexts where a human is not reviewing every output.
A model that says “I’m not sure” or “I found a potential issue I need to flag” when that is the honest assessment is more useful than a model that provides a confident wrong answer.
Testers described the improvement directly. In Claude Code, one tester noted that Opus 4.8 “asks the right questions, catches its own mistakes, pushes back when a plan isn’t sound, and builds up confidence around complex, multi-service explorations before making big changes.”
That behavior, the model resisting the urge to proceed confidently when the evidence does not warrant confidence, is the practical expression of the honesty improvements in actual use.
Dynamic Workflows And What It Means For Large Codebases
The most significant new feature shipping alongside Opus 4.8 is Dynamic Workflows, a capability now available in research preview in Claude Code for Enterprise, Team and Max users.
Dynamic Workflows allows Claude to take on large, complex coding tasks by breaking them into a plan, deploying hundreds of parallel subagents to execute components of that plan simultaneously, verifying the outputs and reporting back to the user with the consolidated result.
The concrete demonstration Anthropic offered is meaningful: “Claude Code alongside Opus 4.8 can now carry out codebase-scale migrations across hundreds of thousands of lines of code from kickoff to merge, with the existing test suite as its bar.”
Codebase migrations, the process of changing a large software project from one architecture, language version or framework to another, are one of the most labor-intensive tasks in software engineering.
They require touching hundreds or thousands of files, verifying that changes do not break existing functionality and managing the dependencies between components throughout the process.
Doing that work autonomously, using the existing test suite to validate outputs, is the kind of capability that meaningfully compresses the engineering hours required for projects that would previously have occupied large teams for weeks.
Anthropic increased Claude Code’s rate limits specifically to support the higher token usage that Dynamic Workflows generates when running hundreds of parallel subagents in a single session.
The feature is designed for large codebases and the rate limit increase is designed to make sure users can run it at the scale it was built for.
The Effort Controls And The Billing Transition
Alongside the model release, Anthropic introduced new controls that allow users to set how much effort Claude applies to a response.
Available on claude.ai and in Cowork, the controls let users adjust the number of tokens the model spends on a task, essentially trading speed and cost against depth and thoroughness based on what a specific task requires.
The framing Anthropic gave for these controls is revealing about where the platform is heading. The effort controls “expose the cost and effort trade-offs to users as the company transitions to token-based billing from subscription tiers.”
That sentence describes a billing model change that has implications for how users and developers will pay for Claude going forward, moving from the current system of fixed subscription tiers toward a model where users pay based on how much compute their tasks actually consume.
The effort controls are the user-facing version of that transition, giving users visibility into the token cost of different levels of effort before the billing model fully shifts.
Claude Mythos And What Is Still Coming
Opus 4.8 is Anthropic’s most widely available model, but it is not its most capable one. Claude Mythos Preview, the model at the top of Anthropic’s capability stack, has been in limited preview with specific research and enterprise partners under Project Glasswing, a cybersecurity scanning program that has been testing Mythos’s capabilities in high-stakes environments.
The reason Mythos has not been released to all customers is explicit in Anthropic’s communications, the model’s capabilities require stronger safeguards than have been fully developed and validated.
Anthropic’s Alignment team has been working on those safeguards while the preview group provides real-world feedback on how the model behaves.
Thursday’s release included a specific timeline signal. “We’re making swift progress on developing these safeguards and expect to be able to bring Mythos-class models to all our customers in the coming weeks.”
The phrase “coming weeks” is the most specific commitment Anthropic has made about the Mythos timeline, and it landed alongside the news that Opus 4.8’s prosocial traits and alignment metrics are approaching Mythos-level performance on the measures Anthropic tracks for behavior like supporting user autonomy and acting in the user’s best interest.
Axios’s coverage noted that while Opus 4.8 still lags Mythos on overall performance, its alignment scores are close, describing it as near-Mythos level alignment packaged in Anthropic’s broadly accessible production model.
That combination, strong alignment, strong performance, wide availability, unchanged pricing, is the package Anthropic shipped Thursday while the most powerful version of the model waits for the safety work that will bring it to everyone in the coming weeks.