Claude Opus 4.8, officially released. Same pricing. Improved Honesty, a leap in agentic capabilities, and Effort Control.

On Thursday, May 28, 2026, Anthropic officially released the new version of its flagship LLM, "Claude Opus 4.8." While keeping pricing unchanged from the previous-generation Opus 4.7 at $5 (approximately ¥775) per million input tokens and $25 (approximately ¥3,875) per million output tokens, it significantly improved its scores, reaching 69.2% on SWE-Bench Pro (4.7 scored 64.3%) and 1,890 points on GDPval (4.7 scored 1,753 points). New features include improved Honesty—"the probability of leaving code defects unaddressed is roughly one-quarter that of the previous generation"—as well as "Effort Contr

The Weight of a "Minor Version" That Arrived on a 41-Day Cycle

In its official blog post "Introducing Claude Opus 4.8," Anthropic rolled out Opus 4.8 on May 28, just 41 days after Opus 4.7 (released April 17, 2026). This is a pace that clearly exceeds the "months-long" update cadence the company has traditionally followed. TechCrunch reported it as "a much faster upgrade cycle than normal for Anthropic," while Axios added that a general release of the unannounced higher-tier model "Mythos" is "in the coming weeks."

Multiple outlets point to two factors behind this pace: a three-way race against OpenAI's GPT-5.5 and Google's Gemini 3.1 Pro, and a race to IPO within the year, coming right after Anthropic raised $30 billion (about ¥4.65 trillion) in a Series G in February 2026 at a post-money valuation of $380 billion (about ¥58.9 trillion). Yahoo Finance ran the headline "IPO race with OpenAI heats up," positioning the Opus 4.8 release as proof of product strength amid this competition.

From an engineer's perspective, this "minor-number" release shipped quickly under the API identifier claude-opus-4-8, and SDK-level constants such as Model.ClaudeOpus4_8 (C#), anthropic.ModelClaudeOpus4_8 (Go), and Model.CLAUDE_OPUS_4_8 (Java) were added immediately. In other words, existing code written against Opus 4.7 is designed to keep working after simply swapping the model ID, which brings the migration cost as close to zero as possible. This reflects Anthropic's strategy of "billing itself as a minor version, but shipping with a major-grade posture."

Benchmark: +4.9pt over the previous generation in agentic coding, but the reality is it still loses on Terminal-Bench

The most noteworthy metric is the score on "SWE-Bench Pro," which measures agentic coding capability. According to a table compiled by OfficeChai of the official figures, Opus 4.8 scored 69.2%, Opus 4.7 scored 64.3%, OpenAI GPT-5.5 scored 58.6%, and Google Gemini 3.1 Pro scored 54.2%—meaning Opus 4.8 secured a lead of more than 10 points over its competitors on SWE-Bench Pro.

On OSWorld-Verified, which measures agentic computer operation, it scored 83.4% (4.7 was 82.8%, GPT-5.5 was 78.7%, Gemini 3.1 Pro was 76.2%). On GDPval, developed by OpenAI to measure knowledge-work performance, it scored 1,890 points (4.7 was 1,753 points, GPT-5.5 was 1,769 points). Together these put it clearly ahead of both competitors in practical capability within agentic contexts, although the gain over Opus 4.7 on OSWorld-Verified is only 0.6 points. On the tool-use version of "Humanity's Last Exam," which tests multi-domain reasoning, it scored 57.9% (4.7 was 54.7%), and on the no-tool version, the published result was 49.8%. It also recorded several "firsts": 53.9% on agentic financial analysis (Finance Agent v2), 84% on the browser-agent evaluation Online-Mind2Web, an end-to-end completion of all cases on what Anthropic's official blog calls the "Super-Agent benchmark," and the first-ever score above 10% on the "all-pass standard" of a legal-agent benchmark.

That said, there are also figures here that Silicon Valley engineers should examine closely. On Terminal-Bench 2.1 (autonomous coding in the terminal), GPT-5.5 leads with 78.2% against Opus 4.8's 74.6%. In other words, if you isolate just "autonomous tasks that are completed within the shell," there are still areas where OpenAI holds the advantage. Opus 4.8 wins on overall capability, but for the kind of agent operations completed within a CLI, a full commitment to GPT-5.5 is also worth considering—that is the honest read. Niko Grupen, Harvey's applied research head, quoted in Inc. magazine, commented that "it recorded the highest score ever on our internal legal-agent benchmark," and the view that Opus 4.8 stands a cut above for enterprise use cases requiring long-context reasoning is becoming established.

Honesty: The "Code Defect Pass-Through Rate" Cut to One Quarter

The most widely reported aspect of Opus 4.8 is its improvement in "Honesty." According to Anthropic's official blog and reporting from cryptobriefing, Opus 4.8 is "around four times less likely" to let flaws in its own code pass without flagging them, compared to Opus 4.7. Tom's Guide put it in a headline as being "far less likely to 'fake' answers," while Inc. magazine called it "its most honest model yet."

The essence of this improvement lies not in mere "factual accuracy" but in greater metacognitive precision. To borrow Anthropic's official phrasing, Opus 4.8 is "more likely to flag uncertainties about its work" and "less likely to make unsupported claims." From an engineer's perspective, what this means is that in code review, "before stamping an LGTM, it is now more likely to self-check whether it is harboring any oversights."

If you have used Claude models up through Opus 4.7, you have surely had the experience where "I asked Claude to 'review the entire PR and point out any problems,' and it confidently came back with 'no problems,' only for it to fail in CI." With Opus 4.8, you can expect this type of "oversight born of overconfidence" to be substantially reduced. As a practical tip, it's worth temporarily removing the defensive instructions you used to add (things like "Do not miss anything. List every suspicious spot.") and observing the raw response. Because the model has now internalized what used to require "prompt hacks that induce self-skepticism" in the previous generation, the relative usefulness of those hacks should have diminished. In alignment evaluations as well, Anthropic explains that "the incidence of misaligned behavior has dropped substantially, reaching a level on par with the unreleased model Mythos."

Effort Control — Control the "depth of thinking" in five levels with a single model

Alongside Opus 4.8, the biggest operational change for engineers is the formalization of the "Effort" parameter. According to Anthropic's official API documentation (platform.claude.com/docs/en/build-with-claude/effort), Effort is a parameter with five levels—low/medium/high (default)/xhigh/max—that controls "the amount of tokens Claude spends generating a response." It was partially introduced in Opus 4.7 as well, but with Opus 4.8 the recommended guidance was spelled out in the official documentation.

Breaking down the official guidance, low is for "short, well-scoped tasks" and subagent use cases; medium is for "decent results while keeping costs down"; high is the default for "complex reasoning, difficult coding, and agentic tasks"; xhigh is the "recommended starting point for coding and agentic work" as well as for handling "tasks that run longer than 30 minutes" and "budgets on the scale of millions of tokens"; and max is reserved only for "frontier-level problems." Anthropic itself explicitly notes that max carries the risk of "falling into overthinking and degrading quality on structured output," so it is no silver bullet.

As an implementation tip, when calling via curl you nest effort: "xhigh" inside output_config:

curl https://api.anthropic.com/v1/messages \
  -H "x-api-key: $ANTHROPIC_API_KEY" \
  -H "anthropic-version: 2023-06-01" \
  -H "content-type: application/json" \
  -d '{
    "model": "claude-opus-4-8",
    "max_tokens": 65536,
    "messages": [{"role":"user","content":"…"}],
    "output_config": {"effort": "xhigh"}
  }'

Anthropic's official guidance is emphatic: "When running with xhigh or max, always allocate a large max_tokens. Start at 64k tokens and tune as needed." The reason is that when subagents or tool calls chain together, a small max_tokens cuts the agent off in the middle of its reasoning. The budget_tokens parameter favored in Opus 4.6 has been deprecated, and in Opus 4.7/4.8 the canonical approach is the combination of adaptive thinking (thinking: {type: "adaptive"}) and effort. Note that in Opus 4.8 the manual form thinking: {type: "enabled", budget_tokens: N} is unsupported and returns a 400 error, so a migration that leaves existing budget settings in place will start failing at request time.
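The rewrite described above can be sketched in Python as a pure dictionary transformation. This is a minimal sketch under stated assumptions: migrate_request is a hypothetical helper (not part of any Anthropic SDK), it sends nothing over the network, and the field names thinking, output_config.effort, and max_tokens are taken from the description in this article. The 65,536 floor simply mirrors the curl example.

```python
import copy

# Effort levels for which the article recommends a large max_tokens.
LARGE_EFFORTS = ("xhigh", "max")
# Assumption: "start at 64k" is read as 65,536, matching the curl example.
MIN_MAX_TOKENS = 65536


def migrate_request(payload, effort="high", model="claude-opus-4-8"):
    """Return a migrated copy of a request dict; the input is left untouched.

    Hypothetical helper for illustration only.
    """
    migrated = copy.deepcopy(payload)
    migrated["model"] = model

    # Replace the manual budget form with adaptive thinking.
    thinking = migrated.get("thinking")
    if isinstance(thinking, dict) and thinking.get("type") == "enabled":
        migrated["thinking"] = {"type": "adaptive"}

    # Keep an effort level that is already set; otherwise apply the default.
    output_config = migrated.setdefault("output_config", {})
    output_config.setdefault("effort", effort)

    # Raise max_tokens to the recommended floor for xhigh/max.
    if output_config["effort"] in LARGE_EFFORTS:
        migrated["max_tokens"] = max(migrated.get("max_tokens", 0), MIN_MAX_TOKENS)
    return migrated


legacy = {
    "model": "claude-opus-4-7",
    "max_tokens": 4096,
    "thinking": {"type": "enabled", "budget_tokens": 10000},
    "messages": [{"role": "user", "content": "Review this PR."}],
}
print(migrate_request(legacy, effort="xhigh"))
```

Returning a copy rather than mutating in place makes it easy to diff the old and new payloads during a regression run.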

In claude.ai and Cowork (the team-oriented experience from the former Anthropic Console lineage), an Effort selection UI was also added next to the model selector. The default is high, and you can raise it to extra (corresponding to the API's xhigh) or max. The official recommendation is that "extra is for difficult tasks and long-running asynchronous workflows." Another important point is the official explanation that, compared to Opus 4.7's default, Opus 4.8's default high delivers "better performance at the same token count."

Dynamic Workflows — Running Hundreds of Subagents in a Single Session

The "Dynamic Workflows" feature introduced in Claude Code is being treated as a research preview, and has been released for the Enterprise, Team, and Max plans. According to Anthropic's official explanation, this is a capability whereby a large model such as Opus "plans, executes, and verifies hundreds of parallel sub-agents within a single session." Specifically, Claude Code is said to be able to carry out "codebase-scale migrations, from kickoff to merge, working across hundreds of thousands of lines of code while using the existing test suite as a benchmark."

What makes this design interesting from an engineer's perspective is the architecture in which each sub-agent runs in "an independent context window" and sends back "only the relevant information" to the main orchestrator. This is a classic Map-Reduce-style LLM orchestration, and it means that the implementation pattern of not polluting the main orchestrator's context is now being provided as a primitive on the API side.
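To make the Map-Reduce shape concrete, here is a minimal Python sketch. It is not Anthropic's implementation: run_subagent is a hypothetical stand-in that makes no model calls, and the use of ThreadPoolExecutor is an assumption chosen only to show the pattern. Each worker keeps its working notes private and hands back a small summary, so the orchestrator's state stays small.

```python
from concurrent.futures import ThreadPoolExecutor


def run_subagent(task):
    """Hypothetical stand-in for one sub-agent; no model is called.

    The private_context list plays the role of the sub-agent's own context
    window: it never leaves this function. Only a short summary is returned.
    """
    private_context = [f"read {task}", f"edited {task}", f"ran tests for {task}"]
    return {"task": task, "status": "ok", "steps": len(private_context)}


def orchestrate(tasks, max_workers=8):
    """Fan tasks out to workers (map), then aggregate the summaries (reduce)."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        summaries = list(pool.map(run_subagent, tasks))
    failed = [s["task"] for s in summaries if s["status"] != "ok"]
    return {"total": len(summaries), "failed": failed}


print(orchestrate(["src/a.py", "src/b.py", "src/c.py"]))
```

The design point is the return value of run_subagent: however much work happens inside, the orchestrator only ever sees a few fields per task.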

The practical use cases being reported include the kind of work that would normally require "creating hundreds of PRs under human supervision" — for example, "a whole-codebase migration from React 17 to 19," "the exhaustive addition of type annotations in Python," and "a bulk rewrite from an internal DSL to a GraphQL schema." Up through the Opus 4.7 era, the caller had to write "the logic to decompose a massive task," but Opus 4.8 + Dynamic Workflows has Claude take on both the decomposition and the verification.

For tech engineers in Silicon Valley, there are two important observations here. First, the existence of Dynamic Workflows backs up the reason Opus 4.8 recommends "starting at 64k" for max_tokens. Since aggregating the sub-agents' results alone consumes tens of thousands of tokens, having the main orchestrator's max_tokens at 16k is a non-starter. Second, this clearly lays out the path by which Anthropic realizes its ambition of "turning Claude into a contractor for codebase refactoring and migration" — not through a tool, but through a combined model-plus-runtime approach. It makes for a development experience with a stronger "autonomous agent" character, distinct from IDE-layer wrappers like GitHub Copilot or Cursor.

The power of the Messages API — system entries can now be placed "inside the message array"

The Messages API changes shipped alongside Opus 4.8 may look unassuming, but they meaningfully reshape the developer experience. Until now, the system prompt could only be specified at the very beginning of an API request, but starting with Opus 4.8, you can "interleave system entries within the messages array." According to Anthropic's official explanation, this enables a workflow where you can "update instructions to Claude mid-task—without breaking the prompt cache, and without having to route through a user turn."

What does this mean from an engineering standpoint? Previously, if you wanted to "add or remove permissions," "swap out environment variables," or "toggle tools on and off" in the middle of a long-running autonomous agent execution, your only options were to regenerate with a new system prompt or to doctor a user turn. The former breaks the prompt cache, causing billing and latency to spike; the latter pollutes the conversation log and makes debugging difficult.

With the combination of Opus 4.8 and the new Messages API, a flow such as "the initial system prompt grants read-only permissions → once the verification phase finishes, a mid-task system entry is added to grant write permissions → write permissions are revoked after completion" becomes implementable without breaking the prompt cache. The correct reading is that access control and capability toggling for long-running agents are now supported as an API primitive. This change carries especially significant operational impact for teams that provide dynamic tooling over an MCP (Model Context Protocol) server.
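The read-only, then write, then revoke flow can be sketched as list operations on the messages array. This is a sketch under a loud assumption: the article does not give the exact wire format of a mid-task system entry, so the {"role": "system", "content": ...} shape used here is hypothetical, as is the helper name with_system_entry. What the sketch does show is why the approach is cache-friendly: every earlier entry stays byte-for-byte identical, and new instructions are only appended.

```python
def with_system_entry(messages, instruction):
    """Return a new list: existing messages unchanged, plus one system entry.

    Assumption: the entry shape {"role": "system", ...} is illustrative only.
    """
    return list(messages) + [{"role": "system", "content": instruction}]


# Phase 1: the conversation so far, run under a read-only system prompt.
history = [
    {"role": "user", "content": "Audit the repository and propose fixes."},
    {"role": "assistant", "content": "Audit complete. Three fixes proposed."},
]

# Phase 2: verification is done, so grant write access mid-task.
granted = with_system_entry(history, "Verification passed. Write access is granted.")

# Phase 3: after the work completes, revoke write access again.
after_work = granted + [{"role": "assistant", "content": "Fixes applied."}]
revoked = with_system_entry(after_work, "Work complete. Write access is revoked.")

# The original prefix is untouched at every step, which is what lets a
# prefix-based prompt cache keep hitting.
print(revoked[: len(history)] == history)
```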

Fast Mode — What Does 2.5x Speed at One-Third the Price of the Previous Generation Mean?

According to Anthropic's official public pricing, Opus 4.8's "Fast Mode" costs $10 (about ¥1,550) per million input tokens and $50 (about ¥7,750) per million output tokens. As both Axios and TechCrunch explicitly state, this offers 2.5x the throughput at 2x the price of standard mode. 9to5Mac notes that "Fast Mode in the Opus 4.6 era carried a 6x premium over standard." In other words, the price of speed was a 6x premium through the previous generation and is only 2x with Opus 4.8, which is why it is described as "3 times cheaper."
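The arithmetic behind these figures is easy to check in a few lines of Python. The prices are the ones quoted in this article; the function names and the 200,000-input / 50,000-output request size are hypothetical, chosen only to make the numbers concrete.

```python
# USD per million tokens (input, output), as quoted in this article.
PRICE_PER_MILLION = {
    "standard": (5.00, 25.00),
    "fast": (10.00, 50.00),
}


def request_cost(mode, input_tokens, output_tokens):
    """Cost in USD of one request in the given mode."""
    input_price, output_price = PRICE_PER_MILLION[mode]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


def throughput_per_dollar(speedup, price_multiple):
    """Relative throughput per dollar versus standard mode (standard = 1.0)."""
    return speedup / price_multiple


# Hypothetical request: 200,000 input tokens and 50,000 output tokens.
print(request_cost("standard", 200_000, 50_000))  # 2.25
print(request_cost("fast", 200_000, 50_000))      # 4.5

# 2.5x throughput at 2x price: 1.25x throughput per dollar.
print(throughput_per_dollar(2.5, 2.0))            # 1.25

# Premium over standard fell from 6x to 2x: "3 times cheaper".
print(6.0 / 2.0)                                  # 3.0
```

Note that "3 times cheaper" compares premiums over standard mode, not absolute prices against standard mode itself.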

In an article written by cryptobriefing before the official release, a skeptical analysis was offered, stating that "this is an unconfirmed rumor as of publication, and a shift from 6x to 2x would be a radical pricing strategy change." However, as of the official release on May 28, multiple primary outlets (Anthropic's official channels, TechCrunch, Axios, and 9to5Mac) reported this figure in agreement, so it can be regarded as confirmed information. Anthropic's official blog itself directly states that "Fast mode … is now three times cheaper than it was for previous models."

The interpretation from a Silicon Valley perspective is as follows. Fast Mode belongs in "user-interactive workflows with strict latency requirements": for example, inline completion within an IDE, chat UIs for end users, and API-gateway-style use cases that demand low latency. Conversely, scenarios where cost matters more than speed (autonomous agents running in nightly batches, long-running codebase migrations, and document generation) should run on standard mode. The structure by which Anthropic holds pricing steady in "standard" while carving out the "value of speed" as a separately billed axis through Fast Mode is a clever design that prompts the calling side to optimize by use case.

Anthropic's "Adoption Game" Signaled by Holding Prices Steady

Releasing Opus 4.8 at the same price as Opus 4.7 is a clear message to the enterprise-adoption segment. Yahoo Finance wrote that "customizable effort settings help users manage token consumption," while Axios analyzed that it "reflects growing customer demand for cost-effective AI solutions."

What is interesting here is Anthropic's strategy of effectively lowering the per-unit cost not by "reducing the price per token," but by offering "a model that delivers better results for the same token spend at the same price per token." The description in the official Opus 4.8 blog ("coding tasks, this effort level spends a similar number of tokens as Opus 4.7's default, but with better performance") captures the essence of this. In a token-billed SaaS business, "raising quality while keeping the headline price flat" is the most effective form of a price cut.

On the business front, a SaaStr report as of February 2026 noted that Anthropic's annualized revenue (ARR) had reached $14 billion (about ¥2.17 trillion). This represents a 14-fold growth in just 14 months from roughly $1 billion as of December 2024. It ranked first on the CNBC Disruptor 50 2026, and as of May, a Bloomberg-affiliated leak circulated stating that it was "in talks to raise at least $30 billion (about ¥4.65 trillion) at a pre-money valuation exceeding $900 billion (about ¥139.5 trillion)" (per Sacra's tally). Keeping the price of Opus 4.8 flat is best read as a move to "lower the barriers to adoption" in order to sustain this growth trajectory.

Comparing How Each Outlet Framed the Story

Surveying the coverage of Opus 4.8, it's striking how clearly each outlet's angle differs. TechCrunch anchored its piece on the "Dynamic Workflows tool," framing it as "a competitive move following recent releases like OpenAI's Codex and Google's Gemini Flash." Axios emphasized the model's relationship to the unreleased Mythos model, offering a roadmap-oriented perspective: "Opus 4.8 falls short of Mythos, but a general release of a Mythos-class model is expected within weeks." Yahoo Finance adopted an "IPO race" frame, casting the story as a display of product strength amid the public-offering competition with OpenAI.

Tom's Guide and 9to5Mac, writing for consumers and Mac developers, emphasized experiential improvements—"more honest" and "fewer hallucinations." Inc. magazine built its piece around the message of "the most honest model," citing Harvey's adoption case from a business-user standpoint. cryptobriefing published both a skeptical article just before the official release and an explanatory piece afterward; it had taken a particularly cautious stance on the abrupt change in Fast Mode's pricing structure, but corrected this to confirmed information on the day of release.

Geeky Gadgets, at the leak stage, circulated the unconfirmed claim that "a tokenizer update could increase token consumption by about 30%." No primary source published after the official release clearly addresses this point: Anthropic's official blog makes no reference to a tokenizer change, and the API SDK diffs show no changes to the user-side token-counting API. As of the time of writing, no independent primary source corroborates the 30%-increase claim, so it is reasonable to treat the Geeky Gadgets leak as "unverified."

In the Japanese-language sphere, as of the time of writing (2026-05-29), there are still few in-depth features from major newspapers, and we are at the stage of translating English-language primary sources. Outlets like the Nikkei and Toyo Keizai Online are expected to dig in earnest only a few days from now.

What Silicon Valley Tech Engineers Should Be Doing Right Now (Practical Tips)

First, if you're moving an existing codebase to Opus 4.8, it works simply by swapping the model ID from claude-opus-4-7 to claude-opus-4-8. However, any place that explicitly specifies thinking: {type: "enabled", budget_tokens: N} will return a 400 error, so you'll need to rewrite it to the combination of thinking: {type: "adaptive"} plus output_config.effort. Teams carrying older code with budget_tokens scattered throughout should flush them out with a bulk grep before running regression tests.
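For teams that prefer a script to a one-off grep, the same search can be sketched in Python. This is only an illustration: find_budget_tokens is a hypothetical helper, the list of file suffixes is an assumption to adjust for your own stack, and the match is a plain text search that does not parse code.

```python
import re
from pathlib import Path

# Plain-text match; it will also hit comments and strings, which is
# usually what you want when auditing before a migration.
PATTERN = re.compile(r"budget_tokens")


def find_budget_tokens(root, suffixes=(".py", ".ts", ".go", ".java", ".cs")):
    """Return (path, line_number, line) for every match under root.

    Hypothetical helper; the suffix list is an assumption.
    """
    hits = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file() or path.suffix not in suffixes:
            continue
        text = path.read_text(encoding="utf-8", errors="ignore")
        for lineno, line in enumerate(text.splitlines(), start=1):
            if PATTERN.search(line):
                hits.append((str(path), lineno, line.strip()))
    return hits
```

Calling find_budget_tokens(".") from the repository root and printing each tuple gives a checklist of call sites to rewrite before the regression run.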

Next, the operational design of effort settings. If you broadly classify production workloads, my practical guideline is: "user-interactive (chat, completion, conversational interfaces)" gets medium or low, "code review and code generation" gets high or xhigh, and "nightly batch jobs, codebase migrations, and complex financial analysis" get xhigh or max. Anthropic's official warning that "max causes overthinking in structured output" is important—carelessly choosing max in situations like strict JSON-schema output will actually degrade quality.
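This guideline can be encoded as a small lookup with one guard rail. It is a sketch of the author's rule of thumb, not an official policy: the workload labels, the choice of the lower or higher option in each pair, and the helper name choose_effort are all assumptions. The one rule taken from the official warning is that max is stepped down when the output must be strictly structured.

```python
VALID_EFFORTS = ("low", "medium", "high", "xhigh", "max")

# Assumed defaults per workload class, following the guideline above.
DEFAULT_EFFORT = {
    "interactive": "medium",   # chat, completion, conversational interfaces
    "code_review": "high",     # code review and code generation
    "batch": "xhigh",          # nightly jobs, migrations, financial analysis
}


def choose_effort(workload, requested=None, structured_output=False):
    """Pick an effort level; an explicit request wins over the default.

    Hypothetical helper. Unknown workloads fall back to "high".
    """
    effort = requested or DEFAULT_EFFORT.get(workload, "high")
    if effort not in VALID_EFFORTS:
        raise ValueError(f"unknown effort level: {effort}")
    # Guard rail: avoid max for strict structured output (overthinking risk).
    if structured_output and effort == "max":
        effort = "xhigh"
    return effort


print(choose_effort("interactive"))
print(choose_effort("batch", requested="max", structured_output=True))
```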

When using xhigh/max, starting max_tokens at 64k as officially recommended is the safe bet. In Anthropic's Go SDK you specify the effort level as anthropic.OutputConfigEffortXhigh, and in the Python SDK as output_config={"effort": "xhigh"}. When using it with the streaming API, the thinking phase grows longer, so be mindful of frontend timeout settings (especially HTTP/2 keep-alive and API gateway request timeouts, such as a 30-second default).

If you want to try Dynamic Workflows, I strongly recommend starting your migration work with "a repository that has a solid test suite." Just as Anthropic itself writes "existing test suites as a benchmark," tests become the ground truth for quality assurance. If you run a massive migration on a codebase with thin test coverage, there's a risk that subagents will mass-produce "code that runs but is semantically wrong."

The new Messages API feature (mid-task system entry) shows its true value when used for dynamically toggling tool permissions, adding context during long-running jobs, and swapping prompts in A/B tests. Its essential value is that it doesn't break the prompt cache; the pattern of throwing a long system prompt first to get it cached, then adding differential instructions later via a mid-task system entry, is bound to become the new best practice.

Finally, how to use Fast Mode selectively. The most cost-efficient approach is to choose Fast Mode only for production paths with end-user latency requirements, and to fix internal tools and batch processing to standard mode. Within the same product, the realistic approach is to run a two-track operation—"claude-opus-4-8 + Fast Mode for user-facing, claude-opus-4-8 standard mode for internal use"—and route between them at the API gateway layer.
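A gateway-layer router for this two-track setup can be as small as a path check. The sketch below is hypothetical throughout: the path prefixes, the Route type, and route_for are invented for illustration, and because this article does not cover how Fast Mode is actually requested on the API, the function returns only a mode label for the gateway to act on.

```python
from dataclasses import dataclass

MODEL = "claude-opus-4-8"

# Assumed path prefixes for traffic that has end-user latency requirements.
USER_FACING_PREFIXES = ("/chat", "/complete")


@dataclass(frozen=True)
class Route:
    model: str
    mode: str  # "fast" or "standard"; a label, not an API parameter


def route_for(path):
    """Send user-facing paths to Fast Mode and everything else to standard."""
    mode = "fast" if path.startswith(USER_FACING_PREFIXES) else "standard"
    return Route(model=MODEL, mode=mode)


print(route_for("/chat/stream"))
print(route_for("/internal/nightly-migration"))
```

Defaulting to standard mode means a newly added internal endpoint never pays the Fast Mode premium by accident.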

Future Outlook — Mythos and Beyond

As Anthropic itself mentions in the official Opus 4.8 blog post, an even higher-tier, unreleased model called "Mythos" is waiting in the wings, positioned above Opus 4.8. For now it is being offered only to limited partners for cybersecurity use under the name "Project Glasswing," but it was announced that "once the cybersecurity safeguards have been fully developed, we expect to make it available to general customers within a few weeks." Axios explicitly states that "Opus 4.8 still underperforms compared to Mythos," confirming that the existence of a higher-tier model is established fact.

From an engineer's perspective, the realistic outlook is that when Mythos arrives on the standard API, "the latency and cost structure of apps built on Opus 4.8 will need to be reevaluated." Mythos could be deployed with a design such as 5–10× the cost of standard mode, restricted to xhigh/max only, or limited to agent-oriented operation—and in any case, a phase is coming where operational architectures that separate "workloads running stably on Opus 4.8" from "novel problems that only Mythos can solve" will be put to the test.

In addition, on the competitive front, OpenAI's GPT-5.6 (slated for June 2026 based on leaked information) and Google's next-generation Gemini are expected to be rolled out in succession. It is all but certain that "Opus 4.8 vs. GPT-5.6" comparison articles will become the main battleground for tech media from June onward, and at that point "what you can build / have built with Opus 4.8" will translate directly into competitive strength for both Silicon Valley startups and enterprises.

Opus 4.8 is a release with an extremely low barrier to business adoption, combining all three of "unchanged pricing, improved capability, and expanded developer primitives." For Silicon Valley engineers, it is hard to find a reason not to start getting hands-on right now.