Claude Code's Latest Model Release: What Are Mythos 5 and Fable 5?

On June 9, 2026, Anthropic released "Claude Fable 5," which it positions as the most powerful generally available model to date, along with "Claude Mythos 5," a version with some of its safety guardrails removed. Both share exactly the same underlying intelligence, differing only in the presence or absence of safety guardrails — an unusual design philosophy. The models recorded 80.3% on SWE-bench Pro, a demanding software engineering benchmark, significantly outpacing OpenAI's GPT-5.5 (58.6%) and Google's Gemini 3.1 Pro. This article covers the full picture of the service — from pricing structures and usage tips to Silicon Valley's reaction and the company's imminent IPO.

First, the big picture: on June 9th, the "most powerful commercially available model" was unleashed.

On Tuesday, June 9, 2026 (US time), Anthropic released Claude Fable 5 to the general public. In the company's own words, it is a frontier model that "surpasses the capabilities of any model we have previously made available," and it is said to have reached state of the art on virtually every benchmark tested, including software development, knowledge work, image recognition (vision), and scientific research. It is available immediately via the Claude.ai chat interface, the API (model name: claude-fable-5), Amazon Web Services, Google Cloud, and Microsoft Foundry, as well as through the coding agent Claude Code and Claude Managed Agents.

The timing is also telling. On June 10—the publication date of this article—Anthropic's developer conference "Code with Claude" holds its Tokyo event, with "Extended Tokyo," aimed at independent developers and early-stage founders, scheduled for the following day, June 11. This marks the third city this year, following San Francisco (May 6) and London (May 19), bringing the excitement of the new model launch directly to Japan's developer community while it is still fresh. Simultaneous interpretation is provided, and online viewing is also available. The fact that Silicon Valley's "world's most powerful model" is being demonstrated in Tokyo just one day after its announcement is itself symbolic of the pace of the AI development race in 2026.

It should be noted that while the title of this article refers to "the latest model for Claude Code," to be precise, Fable 5 is a general-purpose frontier model usable across everything from chat to the API—and within that scope, it has been specifically designed to show its true value in autonomous agent environments such as Claude Code. In what follows, we begin by breaking down, with concrete examples, exactly what these two models are.

What Are Mythos 5 and Fable 5? "Same Brain, Different Safety Rails"

The single most important point to grasp upfront is that Fable 5 and Mythos 5 share an identical core: the same underlying model. The only difference is the presence or absence of safety guardrails. Anthropic has officially explained: "Fable derives from the Latin *fabula* (that which is told), sharing roots with the Greek *mythos* (myth, story). What separates the two models is their safeguards." In other words, the naming concept is that a dangerously powerful uncut gem called "Mythos" has been polished into "Fable," something that can be told safely.

The origins of this series trace back to April 7, 2026, when Anthropic announced "Claude Mythos Preview" on its research division site (red.anthropic.com). Despite being a general-purpose language model, it demonstrated exceptional capabilities in computer security in particular — reportedly discovering zero-day vulnerabilities across all major operating systems and web browsers. It autonomously constructed sophisticated attack chains including JIT heap sprays, sandbox escapes, and multi-vulnerability exploit sequences. It was said that when a non-security expert simply asked it to "build a remote code execution exploit overnight," working code was ready by morning. Its successful exploit count for Firefox stood at 181, dwarfing the mere 2 from the previous generation's Opus 4.6. There were also somewhat chilling reports that it had unearthed a bug in the security-renowned OpenBSD that had "gone unnoticed for 27 years."

The capabilities were so formidable that Anthropic chose not to release it publicly, instead launching a restricted program called Project Glasswing. The initiative is premised on the idea of giving only cyber defenders — those protecting the world's critical infrastructure — early access to Mythos, so that defenders can strengthen their posture before attackers gain the upper hand. Participating organizations reportedly include AWS, Microsoft, Apple, CrowdStrike, and Cisco.

The June 9th release opens this landscape to the general market in a two-tier structure. Fable 5, available for purchase by general consumers and enterprises, has a built-in mechanism by which the model itself blocks responses in dangerous domains and automatically falls back to the conventional safe model, "Claude Opus 4.8," to handle those requests instead. Mythos 5, on the other hand, is the same model with some of those safety constraints lifted, accessible only to organizations that have passed a vetting process — cyber defenders, critical infrastructure operators, and biomedical researchers — through Project Glasswing. Mythos 5 is positioned as "the most powerful cybersecurity-capable model in the world." By analogy: Fable 5 is a consumer car with safety governors that anyone can drive; Mythos 5 is a track-only vehicle with the same engine under the hood.

What it can specifically do — "A colleague who keeps working alone for days"

Let's move past abstraction and look concretely at what Fable 5 does in practice. What Anthropic repeatedly emphasizes is the ability to delegate end-to-end work that would take a human hours, days, or weeks — and leave it almost entirely unattended. When deployed in agentic environments like Claude Code or Claude Managed Agents, Fable 5 can work continuously over multiple days, planning tasks in stages, delegating work to sub-agents (subordinate AIs), and verifying its own output.

The improvements cited in the official documentation are specific. First, extended autonomy: completing goal-oriented tasks spanning multiple days while retaining instructions throughout. Second, first-attempt accuracy — early testers have reported that systems which previously required days of trial and error were built in a single implementation pass on well-specified, complex problems. Third, vision: the ability to interpret charts, graphs, and tables embedded in files and PDFs, and to visually verify the behavior of code it has written. It has reportedly been trained to handle tilted, blurry, or noisy images by autonomously invoking bash tools and image-cropping tools as needed.

Third-party evaluations are equally concrete. Cursor CEO Michael Truell stated that "Claude Fable 5 is the state-of-the-art model on our CursorBench," and GitHub Chief Product Officer Mario Rodriguez commented that "in early testing, it handled complex, long-running coding tasks with a level of autonomy and reliability that exceeded previous benchmarks." Vibe-coding platform Base44 noted it is "great at one-shotting full apps and excels at tool calling," while AI workspace Genspark reported it "clearly outperformed every other model in UI design and game coding." In short, the shared takeaway across these companies is that it has moved closer to a colleague who can independently handle the full cycle — from design through implementation, verification, and revision — rather than simply answering one-off questions.

Overwhelming Benchmark Superiority

The numbers make the advantage clear. Most notable is the challenging software engineering benchmark SWE-bench Pro, where Fable 5/Mythos 5 scored 80.3%. On the same metric, OpenAI's latest general-purpose model GPT-5.5 scored 58.6%, Anthropic's own previous-generation Opus 4.8 scored 69.2%, and Google's Gemini 3.1 Pro scored 54.2% (figures compiled from Anthropic's public tables by TechCrunch and specialist media). That is a lead of 21.7 percentage points over GPT-5.5 and 11.1 points over Opus 4.8, an exceptional spread for this kind of increasingly saturated benchmark.

Even more striking is Cognition's FrontierCode Diamond, a difficult benchmark measuring high-quality, maintainable agentic coding, where Fable 5/Mythos 5 scored 29.3%. That is more than double Opus 4.8's 13.4% and roughly five times GPT-5.5's 5.7%. Data analytics platform Hex described Fable 5 as "the first model to ever break 90%" on its core benchmark for lengthy, complex analytical tasks, and it reportedly achieved "the highest score among all models" on the Hebbia Finance Benchmark for financial analysis.

Summarizing the major public benchmarks, the picture looks roughly as follows (sources are figures from Anthropic's public benchmark table as reproduced and organized by various media outlets; ★ indicates the Mythos 5 figure — note that for Fable 5, safety guardrails cause scores in the relevant domains to drop to Opus 4.8 levels, as discussed below).

Benchmark (capability measured)	Fable 5/Mythos 5	Opus 4.8	GPT-5.5	Gemini 3.1 Pro
SWE-bench Pro (practical code repair)	80.3%	69.2%	58.6%	54.2%
FrontierCode Diamond (high-quality agentic development)	29.3%	13.4%	5.7%	—
Terminal-Bench 2.1 (CLI operations)	88.0%★	82.7%	83.4%	70.7%
GDPval-AA (practical knowledge work, Elo score)	1932	1890	1769	1314
Humanity's Last Exam (no tools, hard knowledge questions)	59.0%★	49.8%	41.4%	44.4%
OSWorld-Verified (PC automation)	85.0%	83.4%	78.7%	76.2%
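
The size of each lead is easier to judge when the gaps are computed explicitly. The following is a minimal Python sketch that copies the percentage rows from the table above and reports the lead of the Fable 5/Mythos 5 column over each rival, both in percentage points and as a ratio. The dictionary layout and the function name `leads` are illustrative choices made for this article, and the GDPval-AA row is left out because it is an Elo score rather than a percentage.

```python
# Scores copied from the table above (percent). None marks a missing entry.
SCORES = {
    "SWE-bench Pro": {
        "Fable 5/Mythos 5": 80.3, "Opus 4.8": 69.2,
        "GPT-5.5": 58.6, "Gemini 3.1 Pro": 54.2,
    },
    "FrontierCode Diamond": {
        "Fable 5/Mythos 5": 29.3, "Opus 4.8": 13.4,
        "GPT-5.5": 5.7, "Gemini 3.1 Pro": None,
    },
    "Terminal-Bench 2.1": {
        "Fable 5/Mythos 5": 88.0, "Opus 4.8": 82.7,
        "GPT-5.5": 83.4, "Gemini 3.1 Pro": 70.7,
    },
    "Humanity's Last Exam": {
        "Fable 5/Mythos 5": 59.0, "Opus 4.8": 49.8,
        "GPT-5.5": 41.4, "Gemini 3.1 Pro": 44.4,
    },
    "OSWorld-Verified": {
        "Fable 5/Mythos 5": 85.0, "Opus 4.8": 83.4,
        "GPT-5.5": 78.7, "Gemini 3.1 Pro": 76.2,
    },
}
LEADER = "Fable 5/Mythos 5"


def leads(benchmark):
    """Return {rival: (point_gap, ratio)} for one benchmark row."""
    row = SCORES[benchmark]
    top = row[LEADER]
    result = {}
    for model, score in row.items():
        # Skip the leader itself and any rival without a published score.
        if model == LEADER or score is None:
            continue
        result[model] = (round(top - score, 1), round(top / score, 2))
    return result


if __name__ == "__main__":
    for name in SCORES:
        for rival, (gap, ratio) in leads(name).items():
            print(f"{name}: +{gap} pts over {rival} ({ratio}x)")
```

Read this way, the table shows the widest relative lead on FrontierCode Diamond and the narrowest on OSWorld-Verified.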

Here I want to raise one caveat that practicing engineers will notice but that promotional coverage on other sites tends to miss. The "most eye-catching numbers" — 78.0% on ExploitBench (exploit code generation), standout results on various biosecurity benchmarks — are in fact Mythos 5 figures, scores from a model the general public cannot purchase. Because Fable 5 redirects cybersecurity and biology/chemistry queries to Opus 4.8 via safety guardrails, its numbers in those domains are lower. Conversely, it is precisely in the domains where you can actually pay to use it — coding, knowledge work, vision — that Fable 5 truly shines. When reading a benchmark table, the sophisticated approach is to first ask: "Is this a Fable number or a Mythos number?"

For reference, on the "Senior Engineer Benchmark" operated by media company Every, Fable 5 reportedly scored 91 out of 100, a level on par with human senior engineers and well above the previous best of 63 from Opus 4.8 (per Every/Digg reporting; worth reading with some caution, since it is an unofficial metric run by a single third-party outlet).

Fable 5 Pricing Structure and Future Outlook

The pricing is straightforward: $10 per million input tokens and $50 per million output tokens, with Fable 5 and Mythos 5 priced identically. This is exactly double the previous-generation Opus 4.8's $5 input / $25 output pricing. At the same time, Anthropic describes it as "less than half the price of Claude Mythos Preview," framing it as a significant drop from the price at which users had exclusive early access to the frontier. With prompt caching, a 90% discount applies to input tokens, which can dramatically reduce effective costs for agentic use cases that repeatedly reference long contexts. A US-only inference option carries a 1.1x multiplier on both input and output. It's also worth noting that queries rerouted to Opus 4.8 as a safety fallback are billed at Opus 4.8 rates rather than Fable rates, a consumer-friendly touch.
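
To make these rates concrete, here is a rough cost estimator in Python. It is a sketch under stated assumptions, not an official billing formula: it assumes the 90% caching discount applies only to the portion of input tokens read from the cache, ignores any cache-write charges, and assumes the US-only multiplier stacks on top of the discounted total. The function name `estimate_cost` and the table keys are made up for illustration.

```python
# Per-million-token list prices quoted in this article (USD).
PRICES_PER_MILLION = {
    "Claude Fable 5": {"input": 10.0, "output": 50.0},
    "Claude Opus 4.8": {"input": 5.0, "output": 25.0},  # also the fallback rate
}
CACHED_INPUT_RATE = 0.10   # assumption: cached input is billed at 10% (90% off)
US_ONLY_MULTIPLIER = 1.1   # US-only inference surcharge on input and output


def estimate_cost(model, input_tokens, output_tokens,
                  cached_input_tokens=0, us_only=False):
    """Estimate the USD cost of one request.

    cached_input_tokens is the part of input_tokens served from the prompt cache.
    """
    if not 0 <= cached_input_tokens <= input_tokens:
        raise ValueError("cached_input_tokens must be between 0 and input_tokens")
    price = PRICES_PER_MILLION[model]
    fresh_tokens = input_tokens - cached_input_tokens
    # Cached tokens count at the discounted rate, fresh tokens at full price.
    billable_input = fresh_tokens + cached_input_tokens * CACHED_INPUT_RATE
    total = (billable_input * price["input"]
             + output_tokens * price["output"]) / 1_000_000
    if us_only:
        total *= US_ONLY_MULTIPLIER
    return round(total, 4)


if __name__ == "__main__":
    print(estimate_cost("Claude Fable 5", 1_000_000, 100_000))
    print(estimate_cost("Claude Fable 5", 1_000_000, 100_000,
                        cached_input_tokens=900_000))
```

Under these assumptions, a request with one million input tokens and 100,000 output tokens costs $15.00 on Fable 5 at list price, and $6.90 if 900,000 of those input tokens come from the cache. The same request rerouted to Opus 4.8 would cost $7.50.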

What stands out is the phased rollout structure. Anticipating surging demand, Fable 5 will be included at no extra cost in Pro, Max, Team, and seat-based Enterprise subscriptions from June 9 through June 22. From June 23 onward, usage will draw from usage credits, with a roadmap to restore standard-plan inclusion "once sufficient compute capacity is secured." Access via API and Consumption-based Enterprise is fully available from day one.

This design — "distribute for free, then switch to paid" — sparked debate on Hacker News. One user noted that it "has an offer-then-remove feel that raises eyebrows; it looks like they want to nudge subscribers toward pay-per-use," while another developer admitted, "it feels bad to be priced out of frontier LLMs." That said, the flip side is that this is evidence of performance so strong that demand overwhelmed compute resources from the very first day. With an IPO on the horizon (discussed later), there is also a significant dimension of Anthropic demonstrating to investors that it can "scale frontier intelligence to customers without triggering catastrophic misuse." The tug-of-war between pricing and capacity will remain the central challenge in operating this new model for the foreseeable future.

The "Design Philosophy" of Safety Mechanisms — Why It Differs from OpenAI and Google

One aspect of Fable 5 that cannot be overlooked alongside its performance is its safety design, which has been brought to the forefront. Fable 5 runs three safety classifiers: offensive cyber (creation of exploits, malware, and attack tools), biological and life sciences (content with misuse potential such as experimental methods and molecular mechanisms), and extraction of the model's summarized reasoning content (distillation/reasoning_extraction). Queries that fall under these categories cause the model to halt its own response and fall back to the safer Opus 4.8. According to Anthropic, this triggers in fewer than 5% of all sessions, meaning over 95% of sessions are completed solely by Fable's own responses.

Robustness verification has also been thorough. External bug bounty testing reported "zero universal jailbreaks" after more than 1,000 hours of testing, while the UK AI Security Institute (UK AISI) noted in a brief initial testing window that it had "gotten one step closer," language calibrated to discourage overconfidence. Additionally, Mythos-class traffic is subject to a 30-day data retention policy: the data is used solely for safety purposes (defending against novel attacks and reducing false positives), is not used in model training, and is deleted after the retention period expires.

This is where the strategic differences between Anthropic and OpenAI/Google come into sharp relief. In fact, just five days before this release, on June 4th, Anthropic published a blog post warning that "AI is approaching recursive self-improvement" — the stage at which AI begins designing and building successor AIs without human intervention. That piece, authored by co-founder Jack Clark and others, cited as evidence the fact that "Claude now writes over 80% of the code merged at Anthropic (compared to single-digit percentages before Claude Code launched in early 2025)," and called — in an analogy to Cold War intermediate-range nuclear arms reduction — for the world to preserve the "option" to pause frontier development.

Releasing "the most powerful model ever" five days after warning that "AI is becoming dangerous" — TechCrunch framed this apparent contradiction with a touch of irony in its headline. But read from an industry perspective, the two are actually consistent. The design approach taken this time — separating Fable (commercially available, with safety constraints) from Mythos (restricted to defenders, with constraints removed) — is Anthropic's concrete answer to the question raised on June 4th: how to harness dangerously powerful capabilities for defense and productivity without enabling misuse. Rather than competing purely on capability, the company places "the dual achievement of capability and safety" at the core of its brand — and that is precisely what has earned it a distinctive position in Silicon Valley.

Tips for Mastering It — Delegate Your "Most Difficult Challenges"

The key to unlocking performance is abandoning the intuitions built around older models. Anthropic's official prompt guide opens by stating that "the teams getting the best results are throwing their hardest unsolved problems at Fable 5. If you only test it on easy tasks, you'll tend to underestimate the breadth of its capabilities." In other words, the trick is to deliberately choose tasks that were previously too heavy for older models and hand them over entirely — from requirements definition through execution.

The primary control lever is a setting called effort. It adjusts the trade-off between intelligence, speed, and cost in a single dial: default to high for most tasks, use xhigh for particularly demanding situations, and switch to medium or low for routine work. Even at lower effort settings, the model reportedly outperforms older models at their maximum settings. On the other hand, for difficult problems a single request can take several minutes — and hours for autonomous operation — so it is recommended to revisit client timeouts, streaming, and progress indicators, and to redesign harnesses so they can asynchronously "check in" on progress without blocking execution.
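
The two recommendations above, picking an effort level per task and checking in on long jobs without blocking, can be sketched in Python as follows. Everything here is a simulation under stated assumptions: the effort labels come from the article, but the task categories, `choose_effort`, `fake_agent_run`, and `run_with_check_ins` are hypothetical names, and the "agent" is a stand-in coroutine rather than a real API call, since the article does not specify how effort is passed to the API.

```python
import asyncio

# Effort labels are the ones named in the article; task kinds are hypothetical.
EFFORT_BY_TASK = {
    "trivial": "low",
    "routine": "medium",
    "standard": "high",
    "hardest": "xhigh",
}


def choose_effort(task_kind):
    """Pick an effort label, defaulting to 'high' for unknown task kinds."""
    return EFFORT_BY_TASK.get(task_kind, "high")


async def fake_agent_run(steps, step_seconds, progress):
    """Stand-in for a long-running agent job that reports progress as it goes."""
    for done in range(1, steps + 1):
        await asyncio.sleep(step_seconds)
        progress["done"] = done
    return "finished"


async def run_with_check_ins(steps=5, step_seconds=0.01, poll_seconds=0.02):
    """Run the job as a background task and poll it instead of blocking on it."""
    progress = {"done": 0, "total": steps}
    job = asyncio.create_task(fake_agent_run(steps, step_seconds, progress))
    observed = []
    while not job.done():
        # The harness stays free to do other work between check-ins.
        await asyncio.sleep(poll_seconds)
        observed.append(progress["done"])
    return await job, observed


if __name__ == "__main__":
    print(choose_effort("hardest"))
    result, observed = asyncio.run(run_with_check_ins())
    print(result, observed)
```

In a real harness, the polling loop is where a progress indicator would be updated or a timeout policy enforced, while the job itself keeps running.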

Practical considerations for keeping long-running sessions from breaking down have also been officially documented. First, give it a memory system — storing one lesson per markdown file with a one-line summary at the top, and having the model reference past learnings, improves performance. (This aligns with the memory approach used in the environment this article was written in.) Second, instruct it to verify its progress — prompting it with something like "before reporting, cross-reference each claim against the tool execution results from this session; only report work you can show evidence for, and if something is unverified, say so" reportedly eliminated nearly all false progress reports in Anthropic's testing. Third, leverage subagents: since Fable 5 spawns parallel subagents more aggressively than previous models, it is more accurate to delegate independent small tasks and assign verification to "a separate validation agent with a clean context" rather than relying on the main agent's self-critique.
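
The memory layout described above (one lesson per markdown file, with a one-line summary at the top) can be sketched in a few lines of Python. This is only one possible arrangement: the helper names `save_lesson` and `load_index`, the file naming, and the sample lessons are all assumptions made for illustration, not part of Anthropic's guidance.

```python
from pathlib import Path
import tempfile


def save_lesson(memory_dir, slug, summary, body):
    """Write one lesson to its own markdown file, summary on the first line."""
    path = Path(memory_dir) / f"{slug}.md"
    path.write_text(f"{summary}\n\n{body}\n", encoding="utf-8")
    return path


def load_index(memory_dir):
    """Return {filename: first-line summary} so past lessons can be skimmed."""
    index = {}
    for path in sorted(Path(memory_dir).glob("*.md")):
        with path.open(encoding="utf-8") as handle:
            index[path.name] = handle.readline().strip()
    return index


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as memory_dir:
        save_lesson(memory_dir, "flaky-tests",
                    "Rerun flaky tests once before reporting a failure.",
                    "Observed in the integration suite during a long session.")
        save_lesson(memory_dir, "build-paths",
                    "Use absolute paths in build scripts.",
                    "Relative paths broke when a subagent changed directories.")
        for name, summary in load_index(memory_dir).items():
            print(f"{name}: {summary}")
```

Keeping the summary on the first line means an agent can read the index cheaply and open a full file only when a lesson looks relevant.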

There are also some surprising pitfalls worth knowing. Deep into a long session, Fable 5 may occasionally declare "I will now execute X" without actually calling a tool, then end its turn — in that case, a simple "continue" or "go ahead and do it end to end" will resume it. Additionally, legacy prompts or skills that instruct the model to write out its thinking process as part of its answer can inadvertently trigger the reasoning_extraction refusal category, increasing fallbacks to Opus 4.8. When migrating, it is worth auditing old instructions like "show me your thinking" or "explain your reasoning step by step" and, where thought visibility is needed, switching to a structured approach of reading thinking blocks instead. This is the standard way to avoid unnecessary fallbacks. Overall, overly fine-grained skills built for older models can actually become a hindrance, and reworking them in the direction of "give fewer instructions and delegate more" tends to be effective.
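
The audit of legacy prompts suggested above can be partly automated. The Python sketch below scans prompt text for the two phrases quoted in this section and reports the lines where they appear. The phrase list is a hypothetical starting point, not a definition of what the classifier actually flags, and `audit_prompt` is a name invented for this example.

```python
import re

# Hypothetical phrase list built from the two examples quoted above.
RISKY_PATTERNS = [
    r"show me your thinking",
    r"explain your reasoning step by step",
]
_COMPILED = [re.compile(pattern, re.IGNORECASE) for pattern in RISKY_PATTERNS]


def audit_prompt(text):
    """Return (line_number, line) pairs that contain a risky phrase."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        if any(pattern.search(line) for pattern in _COMPILED):
            hits.append((number, line.strip()))
    return hits


if __name__ == "__main__":
    legacy_prompt = (
        "You are a careful code reviewer.\n"
        "Always explain your reasoning step by step before answering.\n"
        "Keep answers short.\n"
        "If unsure, Show me your thinking in full.\n"
    )
    for number, line in audit_prompt(legacy_prompt):
        print(f"line {number}: {line}")
```

A scan like this only surfaces candidates for review. Whether a flagged instruction should be deleted or replaced by reading thinking blocks remains a judgment call for the prompt's owner.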

How Silicon Valley Covered It — Excitement and Fears of Being "Priced Out"

The editorial tone across publications and sites reflected a mature reception, where admiration for capabilities coexisted with practical concerns over operations and safety. VentureBeat covered it with a superlative headline — "Anthropic Brings Mythos to the Masses — Claude Fable 5, the Most Powerful Generally Available Model Ever" — while TechCrunch, CNBC, NBC News, Inc., IT Pro, and others led with both the leap in capabilities and the "Opus 4.8 fallback" safety mechanism. The very fact that a safety feature was treated as a headline-worthy element itself signals that the center of gravity in AI coverage in 2026 has shifted from "speed" to "speed paired with safety."

The unfiltered voice of the developer community was concentrated on Hacker News. Alongside praise, two concerns stood out: wariness over the aforementioned pricing design, under which plan-included access ends after June 22, and frustration with over-triggering (false positives) from the safety mechanism. One user reported being "warned that I might be trying to make a bioweapon when I tried to use it, and was sent back to Opus 4.8," and others noted that legitimate code reviews and security testing were also being blocked. Fable 5's safety valve can indeed catch harmless defensive cyber work and beneficial life-science tasks in its net, and this "convenience vs. safety tradeoff" is the frontier where Anthropic will be forced to fine-tune its operations going forward.

Even so, the overall tone is positive. As one Every reviewer wrote, "With Fable, AI began to feel less like a 'tool' that follows your instructions and more like a 'collaborator' that thinks alongside you." Reports of a qualitative shift in how work feels, beyond quantitative benchmark scores, came in one after another. Despite friction from pricing and the safety mechanism, virtually no one questioned the capabilities themselves, and that is the one point on which Silicon Valley's assessments of this release converge.

Future Outlook — IPO, Token Economy, and "When Will What Happen"

Finally, let us organize what movements can be expected going forward and when. The biggest backdrop is the IPO (Initial Public Offering). According to reporting by Fortune and CNBC, Anthropic submitted its S-1 registration documents confidentially to the U.S. SEC on June 1. Through its most recent Series H round of approximately $65 billion (roughly ¥10.4 trillion), its valuation reportedly reached approximately $965 billion (roughly ¥154 trillion), surpassing OpenAI (valued at approximately $852 billion, or roughly ¥136 trillion, as of March) for the first time. Its annualized revenue run-rate as of May 2026 stands at approximately $47 billion (roughly ¥7.5 trillion), surging from approximately $10 billion (roughly ¥1.6 trillion) the previous year, with Q2 alone expected to generate $10.9 billion (roughly ¥1.74 trillion) in revenue. However, because the filing is confidential, neither a formal prospectus nor audited financials have been made public, and it should be noted that these figures are based solely on press reports and private placement rounds. The listing is expected as early as October 2026, this coming autumn. The release of Fable 5 added a decisive chapter to this IPO narrative: "cutting-edge capabilities, monetized without enabling misuse."

There are three product-side developments to watch in the near term. First is capacity expansion: the timing at which Fable 5, which shifts to usage credits from June 23, will be restored to standard subscriptions as compute resources are secured. Second is the expansion of Project Glasswing, with Anthropic indicating plans to transition to a more systematic "trusted access program," upgrading Mythos Preview users to Mythos 5, and establishing a new "biology program" that lifts biological and chemical restrictions while retaining cybersecurity guardrails. Outlets including Engadget have also reported that Glasswing's scope will expand to approximately 150 new organizations, with the addition of "Claude Security," which performs codebase scanning and patch suggestions. Third, at Code with Claude Tokyo on June 10 (today) and Extended Tokyo on June 11, practical usage tips and live demos of new features premised on the new models will be shared directly with Japanese developers.

From an engineer's perspective, the real focus going forward will shift from "benchmark numbers" to "real-world operational data on autonomous execution." The true value of Fable 5 hinges on how reliably it can complete long-running tasks spanning hours to days, and how far it can suppress false-positive-triggered fallbacks. Given that Anthropic itself has publicly stated that "Claude writes more than 80% of our internal code" and has even sounded warnings about recursive self-improvement, what the world needs to measure next comes down to a single question: how far can human oversight be reduced before things break down? With the three variables of pricing negotiations, safety guardrail tuning, and the IPO intertwining, the AI development race in the second half of 2026 will begin revolving around Fable 5 as a reference point. And as it happens, the frontline atmosphere of that race can be breathed in Tokyo, the day after the announcement—for Japanese developers, this was an "unleashing of the most powerful model" at a timing that could not have been better.