PJFP.com

Pursuit of Joy, Fulfillment, and Purpose


  • Claude Fable 5 and Claude Mythos 5: Anthropic Ships Its First Generally Available Mythos-Class AI Model With New Safeguards

    Anthropic has launched Claude Fable 5 and Claude Mythos 5, the first Mythos-class models offered beyond a tiny circle of cyber defenders. Fable 5 is the generally available version, wrapped in a new layer of safeguards, while Mythos 5 is the same underlying model with some of those guardrails lifted for a small group of vetted partners. The pair sits a full tier above the Opus class in raw capability, and the launch is as much a story about how Anthropic is choosing to gate that capability as it is about the benchmarks. Below is a full breakdown of what shipped, what the model can do, and why the safeguard design matters.

    TLDR

    Anthropic released Claude Fable 5, a Mythos-class model that is now its most capable generally available model, posting state-of-the-art results across software engineering, knowledge work, vision, memory, and scientific research. To release it quickly without giving up safety, Anthropic gave Fable 5 new safety classifiers that route flagged queries in cybersecurity, biology and chemistry, and distillation to Claude Opus 4.8 instead of refusing them, a fallback that triggers in under 5% of sessions. The same model, with its cyber safeguards lifted, ships as Claude Mythos 5 to Project Glasswing partners in collaboration with the US Government, and is described as having the strongest cybersecurity capabilities of any model in the world. Highlights include a codebase-wide migration of a 50-million-line Ruby codebase that Stripe says took a day instead of two months, a Pokemon FireRed win with a vision-only harness, a roughly tenfold acceleration of aspects of drug design using Mythos 5, novel molecular biology hypotheses that scientists preferred about 80% of the time, and over a week of autonomous genomics research. Both models cost 10 dollars per million input tokens and 50 dollars per million output tokens, less than half the price of Mythos Preview, with a staged subscription rollout and a new 30-day data retention policy for Mythos-class traffic.

    Thoughts

    The most interesting decision here is not the capability jump but the naming split. Fable and Mythos are the same brain; the only difference is whether the safeguards are on. Anthropic is effectively shipping one model twice: a gated public edition and an ungated edition handed to a short list of trusted defenders working with the US Government. That is a clean way to resolve the central tension of frontier AI, which is that the exact capabilities that help a security professional close a vulnerability also help an attacker find one. Rather than dumbing the model down for everyone or holding it back entirely, Anthropic is letting the access list, not the weights, carry the risk. Expect this pattern to repeat as capabilities climb.

    The fallback-to-Opus design is the other quietly important choice. When a classifier flags a query in cybersecurity, biology and chemistry, or suspected distillation, the user does not hit a wall of refusal. The request is handed to Opus 4.8, a model that is still excellent at almost everything, and the user is told that the fallback happened. Graceful degradation beats a hard no, both for user experience and for trust. It also reframes what a safeguard is. Instead of a binary block, it becomes a routing decision, and because more than 95% of sessions never trigger it, most users will never encounter it. The honest admission that the classifiers are tuned conservatively and will sometimes catch harmless requests is the right posture, even if it will annoy power users who keep getting bounced to the less capable model.

    The commercial signals are worth reading closely. Pricing came down to less than half of Mythos Preview’s, which suggests confidence in serving costs at scale, but the subscription rollout tells a more cautious story. Fable 5 is free on Pro, Max, Team, and seat-based Enterprise plans only through June 22, after which using it on those plans requires usage credits until capacity catches up. That is a polite way of saying demand is expected to badly outrun supply. The model is fully available on the API and consumption-based Enterprise plans from day one, because those bill by the token and self-throttle. Subscriptions, which are all-you-can-eat, are where a capacity crunch actually hurts, so that is exactly where the brakes went on.

    On the science, the genomics result is the one that should make people sit up. A model doing over a week of largely autonomous research, assembling single-cell data across 138 species, then designing and training its own machine learning model that outperforms a recently published Science paper while being 100 times smaller, is a different category of claim than acing a benchmark. So is the drug-design work, where Mythos 5 reportedly matches or beats skilled human operators end to end, choosing binding sites, running protein design tools, and recovering from its own failures. If those hold up to publication and independent replication, the interesting frontier stops being chat quality and becomes whether a model can run a research program. That is also precisely why the biology and chemistry classifier exists, and why Anthropic is being so deliberate about who gets the ungated version.

    One caveat worth keeping in view: nearly all of the evidence in the announcement is Anthropic’s own, or comes from partners with early access and an incentive to be enthusiastic. The Stripe migration, the FrontierCode score, the Slay the Spire memory result, the protein targets, and the genomics model are all compelling, but they remain first-party claims until the system card is published and outside labs, peer reviewers, and independent red-teamers weigh in. The note that the UK AISI made progress toward a universal jailbreak inside a brief testing window is a useful reminder that the safeguard story is a work in progress, not a finished proof.

    Key Takeaways

    • Claude Fable 5 is a Mythos-class model made safe for general use, and is now Anthropic’s most capable generally available model.
    • Mythos-class is a tier that sits above the Opus class in capability. The first was Claude Mythos Preview, released in April through Project Glasswing.
    • Fable 5 is state-of-the-art on nearly all tested benchmarks, and its lead grows as tasks get longer and more complex.
    • Claude Mythos 5 is the same underlying model as Fable 5, but with safeguards lifted in some areas. Fable and Mythos differ only by their safeguards.
    • Mythos 5 is described as having the strongest cybersecurity capabilities of any model in the world, and is deployed through Project Glasswing with the US Government.
    • New safety classifiers cover cybersecurity, biology and chemistry, and distillation. Flagged queries fall back to Claude Opus 4.8 rather than being refused.
    • Users are told whenever a fallback happens. More than 95% of Fable sessions involve no fallback at all, and for those sessions Fable performs effectively the same as Mythos 5.
    • The safeguards are tuned conservatively and trigger in less than 5% of sessions on average, sometimes catching harmless requests. Anthropic plans to reduce false positives after launch.
    • Stripe reported Fable 5 compressed months of engineering into days, performing a codebase-wide migration of a 50-million-line Ruby codebase in a day that would have taken a team over two months by hand.
    • Fable 5 scores highest among frontier models on Cognition’s FrontierCode evaluation for high-quality agentic coding, even at medium effort, and is more token-efficient than past Claude models.
    • On Hebbia’s Finance Benchmark for senior-level reasoning, Fable 5 has the highest score of any model, with gains in document reasoning, chart and table interpretation, and problem solving.
    • IMC noted Fable 5 aced their trading-analysis evaluations nearly across the board, including factual lookup, conceptual reasoning, root-cause analysis, and expected-value analysis.
    • Fable 5 is the new state-of-the-art for vision, and can rebuild a web app’s source code from screenshots alone.
    • Fable 5 beat Pokemon FireRed using a minimal, vision-only harness with no maps, navigation aids, or extra game-state information. Earlier Claude models needed a complex helper harness.
    • Persistent file-based memory improved Fable 5’s Slay the Spire performance three times more than it did for Opus 4.8, and Fable reached the game’s final act three times more often.
    • Fable 5 built a simulation of the solar system, deriving the planets’ orbital motion from physics first principles and using it to predict solar eclipses.
    • Using Mythos 5, internal protein design experts accelerated aspects of drug design by around ten times, with the model matching or beating skilled human operators end to end.
    • Nine of 14 protein targets in the drug-design study yielded strong candidates Anthropic is now investigating.
    • Mythos 5 is Anthropic’s first model to consistently produce novel, compelling scientific hypotheses. Scientists preferred its molecular biology hypotheses about 80% of the time in blinded comparisons.
    • One Mythos hypothesis, a novel mechanism for an E. coli protein, was corroborated by an independent lab working on the same problem.
    • In over a week of largely autonomous work, Mythos 5 assembled single-cell data for millions of cells across 138 animal species and trained a custom model that outperformed a recent Science paper while being 100 times smaller.
    • Anthropic’s automated alignment assessment found Mythos 5’s level of misaligned behavior was low and similar to Opus 4.8. Because they are the same model, Fable 5’s alignment is similar.
    • An external bug bounty produced no universal jailbreaks in over 1,000 hours of testing, though the UK AISI made progress toward one in a brief initial window.
    • One external partner found Fable 5’s safeguards against harmful cyber queries the most robust of any model tested, including Opus 4.8 and Opus 4.7, with zero compliance on harmful single-turn cyberattack requests.
    • The biology and chemistry classifier is deliberately broad for now. Mythos-class models outperformed dedicated protein language models at predicting AAV viral shell assembly using biological reasoning alone.
    • The distillation classifier targets large-scale attempts to extract Claude’s capabilities to train competing models, which could proliferate near-frontier capabilities without safeguards.
    • A new policy requires 30-day data retention for all Mythos-class traffic on first- and third-party surfaces, used only for safety, with logged human access and deletion after 30 days in almost all cases.
    • Anthropic plans trusted access programs that let cybersecurity organizations apply for Mythos 5, and let a small number of life science researchers access Fable 5 with biology and chemistry safeguards removed.
    • Both models cost 10 dollars per million input tokens and 50 dollars per million output tokens, less than half the price of Mythos Preview. Developers can use claude-fable-5 via the Claude API.
    • Fable 5 is free on Pro, Max, Team, and seat-based Enterprise plans through June 22. On June 23 it moves to usage credits on those plans until capacity allows it to return as a standard inclusion.
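
    The per-token prices in the takeaways above translate into a simple cost estimate. The sketch below is illustrative arithmetic only: the two rates come from the article, while the function name and the sample token counts are assumptions made for this example, not part of any Anthropic tooling.

```python
# Rates quoted in the article for both Fable 5 and Mythos 5,
# in US dollars per million tokens.
INPUT_USD_PER_MILLION = 10.0
OUTPUT_USD_PER_MILLION = 50.0


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one request from its token counts.

    Hypothetical helper for illustration; it ignores any discounts,
    caching, or batch pricing, none of which the article describes.
    """
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total = (input_tokens * INPUT_USD_PER_MILLION
             + output_tokens * OUTPUT_USD_PER_MILLION)
    return total / 1_000_000


if __name__ == "__main__":
    # Made-up workload: 200,000 input tokens and 20,000 output tokens.
    print(f"{estimate_cost_usd(200_000, 20_000):.2f} USD")
```

    Because output tokens cost five times as much as input tokens at these rates, long generated answers weigh on the bill far more than long prompts do.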

    Detailed Summary

    A Mythos-class model, made safe for general use

    Fable 5 is the first Mythos-class model Anthropic has made generally available. Mythos-class is a tier that sits above the Opus class, and the first of its kind, Claude Mythos Preview, was released in April through Project Glasswing to a limited group of cyber defenders and critical software infrastructure providers. The company framed today’s launch as the moment it could finally bring that level of capability to all users, because its safeguards had matured enough to allow it. Fable 5’s capabilities exceed those of any model Anthropic has made generally available, and its advantage over other models grows as tasks get longer and more complex.

    Two models, one brain

    Claude Mythos 5 is the same underlying model as Fable 5, but with safeguards lifted in some areas. The names signal that kinship: Fable, from the Latin fabula meaning that which is told, is akin to the Greek mythos, and the safeguards are what distinguish the two models. Mythos 5 launches first to existing Mythos Preview users, including the Project Glasswing cybersecurity partners, as an upgrade. It is deployed in collaboration with the US Government and is described as having the strongest cybersecurity capabilities of any model in the world. Anthropic plans to steadily expand access through a more systematic trusted access program.

    Software engineering and token efficiency

    Fable 5 can work autonomously for longer than any previous Claude model, and software engineering is where that shows most clearly. During early testing, Stripe reported it compressed months of engineering into days, performing a codebase-wide migration in a 50-million-line Ruby codebase in a single day that would otherwise have taken a whole team over two months by hand. It also scores highest among frontier models on Cognition’s FrontierCode evaluation for high-quality, maintainable agentic coding, even at medium effort, and is more token-efficient than past Claude models.

    Knowledge work, vision, and memory

    On complex analytical work, Fable 5 posted the highest score of any model on Hebbia’s Finance Benchmark for senior-level reasoning, with substantial gains in document-based reasoning and chart and table interpretation, and IMC said it aced their trading-analysis evaluations nearly across the board. In vision, it is the new state-of-the-art, able to extract precise numbers from detailed scientific figures and rebuild a web app’s source code from screenshots alone. It needs less scaffolding too: where earlier Claude models struggled to play Pokemon even with helper harnesses, Fable 5 beat FireRed with a minimal, vision-only harness using nothing but raw game screenshots. On memory, giving Fable persistent file-based notes improved its Slay the Spire performance three times more than it did for Opus 4.8. Separately, Fable 5 built a solar system simulation from physics first principles that was accurate enough to predict solar eclipses.

    Life sciences: drug design, hypotheses, and genomics

    Using Mythos 5, Anthropic’s internal protein design experts accelerated aspects of the drug-design process by around ten times. With protein design and bioinformatics tools but no human assistance, the model matched or beat skilled human operators, executing the full workflow of choosing binding sites, selecting and running design tools, and recovering from failures. Nine of 14 protein targets yielded strong drug-design candidates now under investigation. Mythos 5 is also Anthropic’s first model to consistently produce novel, compelling scientific hypotheses: scientists preferred its molecular biology hypotheses about 80% of the time in blinded comparisons, and one, a novel mechanism for an E. coli protein, was corroborated by an independent lab. In genomics, Mythos 5 ran over a week of largely autonomous research, assembling single-cell data for millions of cells across 138 species and training a custom model that outperformed a recent Science paper despite being 100 times smaller.

    The new safeguards: classifiers and fallback

    Mythos-class capability is potent enough that Anthropic considers it a substantial misuse risk, especially given how much advanced AI usage is dual use. Fable 5 ships with a new set of classifiers, separate AI systems that detect potential misuse and jailbreak attempts and stop the main model from responding. When a classifier flags a request related to cybersecurity, biology and chemistry, or distillation, the response is handled by Claude Opus 4.8 instead, and the user is told. The cybersecurity classifiers cover both exploitation and broader offensive cyber tasks like reconnaissance and lateral movement, and Anthropic says they prevent Fable from making any progress on those tasks. The biology and chemistry classifier is intentionally broad for now, after tests showed Mythos-class models could outperform dedicated protein language models at predicting AAV viral shell assembly using biological reasoning alone. The distillation classifier targets large-scale attempts to extract Claude’s capabilities to train competing models.
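
    The classify-then-route flow described above can be sketched in a few lines. This is a toy illustration under stated assumptions, not Anthropic’s implementation: the real classifiers are separate AI systems, whereas the keyword table, the function names, and the notice wording here are all hypothetical. Only the model name claude-fable-5, the three flagged categories, the Opus 4.8 fallback, and the fact that the user is told come from the article.

```python
from dataclasses import dataclass
from typing import Optional

PRIMARY_MODEL = "claude-fable-5"     # API name given in the article
FALLBACK_MODEL = "Claude Opus 4.8"   # display name; the article gives no API id

# Hypothetical stand-in for the real classifiers: a crude keyword table
# keyed by the three categories the article names.
FLAGGED_KEYWORDS = {
    "cybersecurity": ("exploit", "lateral movement"),
    "biology and chemistry": ("pathogen", "toxin synthesis"),
    "distillation": ("bulk capability extraction",),
}


@dataclass(frozen=True)
class RoutingDecision:
    model: str                       # which model answers the request
    flagged_category: Optional[str]  # None when nothing was flagged
    user_notice: Optional[str]       # shown to the user on fallback


def classify(prompt: str) -> Optional[str]:
    """Return the first flagged category found in the prompt, else None."""
    text = prompt.lower()
    for category, keywords in FLAGGED_KEYWORDS.items():
        if any(keyword in text for keyword in keywords):
            return category
    return None


def route(prompt: str) -> RoutingDecision:
    """Route to the fallback model instead of refusing, and say so."""
    category = classify(prompt)
    if category is None:
        return RoutingDecision(PRIMARY_MODEL, None, None)
    notice = f"Answered by {FALLBACK_MODEL} (flagged: {category})."
    return RoutingDecision(FALLBACK_MODEL, category, notice)


if __name__ == "__main__":
    print(route("Summarize this quarterly report"))
    print(route("How do antivirus tools detect an exploit?"))
```

    The second sample prompt is harmless but still gets flagged, which mirrors the conservative tuning the announcement admits to: a broad filter trades some false positives for fewer misses, and the cost of a false positive is a weaker answer rather than a refusal.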

    Jailbreak resistance, data retention, and availability

    Anthropic ran extensive red-teaming, including an external bug bounty that produced no universal jailbreaks in over 1,000 hours, though it notes the UK AISI made progress toward one in a brief window. The company concedes it is likely impossible to fully prevent universal jailbreaks and aims instead to make any that remain slow and costly enough to catch before they scale. A new policy requires 30-day data retention for all Mythos-class traffic, used only for safety, with logged human access and deletion after 30 days in almost all cases. On availability, Fable 5 is live everywhere today and fully available on the API and consumption-based Enterprise plans, while subscription access rolls out in stages: free on Pro, Max, Team, and seat-based Enterprise through June 22, then on usage credits from June 23 until capacity allows it to return as a standard inclusion. Both models cost 10 dollars per million input tokens and 50 dollars per million output tokens.
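
    The 30-day retention rule mentioned above amounts to a fixed time window per logged request. The sketch below shows one way such a check could look; the function name and the sample timestamps are assumptions for illustration, and the exceptions implied by "almost all cases" are not specified in the announcement, so they are not modeled.

```python
from datetime import datetime, timedelta, timezone

# Retention window stated in the article for Mythos-class traffic.
RETENTION_WINDOW = timedelta(days=30)


def is_due_for_deletion(logged_at: datetime, now: datetime) -> bool:
    """Return True once a record has been held for the full window.

    Hypothetical helper: both arguments should be timezone-aware so the
    subtraction is unambiguous.
    """
    return now - logged_at >= RETENTION_WINDOW


if __name__ == "__main__":
    # Arbitrary example timestamp, not a date from the announcement.
    logged = datetime(2030, 1, 1, tzinfo=timezone.utc)
    print(is_due_for_deletion(logged, logged + timedelta(days=29)))
    print(is_due_for_deletion(logged, logged + timedelta(days=30)))
```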

    Notable Quotes

    “Today we’re launching Claude Fable 5: a Mythos-class model that we’ve made safe for general use.”

    Anthropic, opening the Claude Fable 5 and Claude Mythos 5 announcement

    “Fable 5’s capabilities exceed those of any model we’ve ever made generally available.”

    Anthropic, on where Fable 5 sits in the lineup

    “It has the strongest cybersecurity capabilities of any model in the world.”

    Anthropic, describing Claude Mythos 5

    “During early testing, Stripe reported that Fable 5 compressed months of engineering into days.”

    Anthropic, on Fable 5’s software engineering results

    “Our early data shows that more than 95% of Fable sessions involve no fallback at all.”

    Anthropic, on how often the safeguards route to Opus 4.8

    “Mythos 5 is our first model to consistently produce novel, compelling scientific hypotheses.”

    Anthropic, on the model’s molecular biology research

    “It is likely impossible to completely prevent universal jailbreaks, but our goal is to make any remaining jailbreaks sufficiently slow and costly that we can detect and prevent them before they are used at scale.”

    Anthropic, on the limits of its safeguards

    “Fable is from the Latin fabula, ‘that which is told,’ akin to the Greek mythos. The safeguards are what distinguish the two models.”

    Anthropic, explaining the Fable and Mythos naming

    Read the full announcement and the benchmark tables on Anthropic’s site here: Claude Fable 5 and Claude Mythos 5.
