PJFP.com

Pursuit of Joy, Fulfillment, and Purpose

Category: AI

  • SubQ 1.1 Small Explained: How Subquadratic Sparse Attention Hits 98% Retrieval at 12 Million Tokens With 64.5x Less Compute Than Dense Attention

    Subquadratic, a frontier AI research and infrastructure company, has released the model card and technical report for SubQ 1.1 Small, a long-context language model built on a new attention mechanism the company calls Subquadratic Sparse Attention (SSA). The headline claim is unusual in two directions at once: the model retains 98% single-fact retrieval accuracy at 12 million tokens, roughly twelve times the length it was primarily trained on, while cutting attention compute by 64.5x against dense attention at a 1 million token context. The deeper argument in the report is not really about a single model at all. It is about what happens to the entire retrieval-and-orchestration stack once reasoning over a complete artifact stops being prohibitively expensive.

    TLDR

    SubQ 1.1 Small is a small long-context model that replaces the dense attention of an existing open-weight frontier model with Subquadratic Sparse Attention, a learned, content-dependent sparse attention mechanism that scales linearly in compute and memory rather than quadratically. On retrieval it posts 99.12% on NVIDIA’s 13-task RULER suite at 128K tokens and 100% needle-in-a-haystack accuracy at 1M and 2M tokens, holding at 98% out to 6M and 12M tokens while attending to only 0.13% of token pairs. It keeps competitive general ability, scoring 85.4% on GPQA Diamond and 89.7% pass@4 on LiveCodeBench v6, and reaches 13% on the long-horizon AutomationBench Finance agentic benchmark, close to Opus 4.8 and GPT-5.5 and well ahead of mid and small tiers. The efficiency story is a scaling win rather than a constant-factor one: 64.5x fewer attention FLOPs than dense attention at 1M tokens and 56x faster than FlashAttention-2 on a single attention layer. The report frames cheap long-context compute as a research accelerator that let the team run more than one hundred million-token experiments and find a training recipe (long-context continued pretraining is the strongest lever) rather than guess at one, positions SSA against FlashAttention, DeepSeek’s Lightning Indexer line, state space models like Mamba, and hybrids, invokes Sutton’s Bitter Lesson to argue that RAG, chunking, and agentic scaffolding are partly workarounds for context scarcity, and was independently verified by Appen. Deployment is starting with design partners now, with a 2M to 12M token lineup planned by year end.

    Thoughts

    The most interesting move in this report is the framing, not the benchmark. Subquadratic plants its flag on Richard Sutton’s Bitter Lesson and argues that much of the modern AI stack, the retrieval pipelines, the chunkers, the re-rankers, the agentic orchestration, is scaffolding built around a single computational constraint: dense attention costs grow with the square of context length. If that constraint relaxes, a lot of hand-engineered machinery that exists to feed a model the right fragments at the right moment starts to look like the task-specific pipelines that learned representations eventually displaced. That is a genuinely provocative thesis, and it is the right lens for reading the rest of the document. The company is not selling a longer context window as a feature. It is betting that whole-artifact reasoning is a different shape of capability than retrieval over fragments, and that fragmentation destroys the cross-references a contract or a codebase actually depends on before the model ever sees them.

    The part of the paper most teams will undervalue is the claim that the real payoff of efficient attention is not cheaper inference but cheaper experimentation. A dense long-context training campaign is expensive enough that most groups get a handful of attempts and are forced to guess at the recipe. Subquadratic says SSA let them run more than a hundred experiments across six model generations with per-step iteration under a minute at million-token context, which is how they discovered that long-context continued pretraining, not clever post-training, was the dominant lever. If that holds, algorithmic efficiency becomes a first-class scaling variable alongside parameters and data, because capability becomes responsive to iteration velocity rather than raw compute alone. It reframes efficiency from a deployment line item into a research multiplier, and that is a more durable advantage than any single benchmark number.

    The generalization result deserves scrutiny precisely because it is so clean. A model trained overwhelmingly at 1M tokens, with a sliver at 2M and nothing beyond, holds 98% retrieval at 12M. The proposed explanation is that SSA routes attention by content relevance rather than fixed positional pattern, so there may simply be no obvious length boundary once the routing behavior is learned. That is plausible and the report is careful to say the 12M result emerged rather than being designed for. But single-needle NIAH is a deliberately clean probe with one target and a binary answer. The far harder RULER suite is only reported at 128K, the longest standardized length in the original benchmark, so the multi-hop, aggregation, and distractor-heavy capability that whole-artifact reasoning actually requires has public numbers at 128K, not at 12M. The honest read is that precise retrieval generalizes spectacularly and composite reasoning at extreme length is still an open question the report does not over-claim on.

    What lends the report credibility is how much counter-evidence it volunteers. It walks through MiniMax abandoning its hybrid M1 architecture and returning to full attention for M2 after efficient variants showed multi-hop reasoning deficits at scale. It admits that earlier SubQ checkpoints improved retrieval while regressing on knowledge benchmarks, forcing dedicated capability-balancing work. It describes catching a case where the MRCR benchmark moved up while the model felt worse in real workflow spot-checks, and switching its development signal to RULER as a result. That last point is a quietly important methodological argument: benchmark score and deployment behavior diverged enough to change checkpoint selection, which is a warning every team shipping long-context models should internalize. A vendor confident enough to show where its own metrics misled it is more trustworthy than one that only shows the wins.

    A few caveats keep the enthusiasm grounded. AutomationBench Finance at 13% is genuinely strong relative to peers, but it is a low absolute score across the board, including for GPT-5.5 at 18% and Opus 4.8 at 16%, so this is early evidence of agentic transfer rather than proof of a finished agent. The efficiency comparisons isolate a single attention layer rather than full end-to-end model throughput, which is the right way to expose the scaling shape but not the same as a wall-clock serving benchmark. The model is built from an unnamed donor open-weight frontier model, so some of its general-knowledge and coding strength is inherited rather than created here. And the most aggressive claims about the future, a 2M to 12M lineup and much higher sparsity, are roadmap, not released artifacts. None of that undercuts the core result. It just means the right posture is to treat SubQ 1.1 Small as a strong proof of concept for an architecture that, if it scales as advertised, could quietly remove a layer of the AI stack that everyone currently takes for granted.

    Key Takeaways

    • SubQ 1.1 Small is a long-context language model from Subquadratic AI, built on a new attention mechanism called Subquadratic Sparse Attention (SSA), released June 16, 2026 alongside a model card and technical report.
    • SSA is a learned, content-dependent sparse attention mechanism that scales linearly in both compute and memory with sequence length, rather than quadratically like dense attention.
    • The central result is context-length generalization: the model was trained primarily at 1M tokens, with some training at 2M and none beyond, yet retrieval held far past the training window.
    • Needle-in-a-haystack accuracy is 100% at 1M and 2M tokens and 98% at both 6M and 12M tokens, roughly twelve times the primary training length.
    • At 12M tokens the model attends to only 0.13% of token pairs, close to a 1,000x reduction in attention relationships, while still retrieving accurately.
    • On NVIDIA’s 13-task RULER benchmark at 128K tokens, SubQ 1.1 Small scores 99.12%, with the remaining errors concentrated in aggregation-style tasks rather than retrieval.
    • RULER tests beyond single-fact lookup: single-key and multi-key retrieval, common-word and frequent-word extraction, and multi-hop variable tracing across positions.
    • At 1M tokens, SSA requires 64.5x fewer attention FLOPs than dense attention (3.9 PFLOP versus 252 PFLOP per attention layer).
    • On a single attention layer, SSA runs 56x faster than FlashAttention-2 at 1M tokens (966 ms versus 54,164 ms on an H100), reaching parity near 16K tokens and pulling away as context grows.
    • The efficiency gain is a scaling-law win, not a constant-factor speedup: the advantage over dense attention grows as context length increases.
    • On general knowledge, SubQ 1.1 Small scores 85.4% on GPQA Diamond (pass@1), below GPT-5.5 (93.2) and Opus 4.8 (92), near Sonnet 4.6 and GPT-5.4-mini (87.5), and above GPT-5.4-nano (81.7) and Haiku 4.5 (67.2).
    • On coding, it reaches 89.7% pass@4 on LiveCodeBench v6, close to the absolute frontier (GPT-5.5 92, Opus 4.8 92.2) and ahead of the smaller tiers.
    • On AutomationBench Finance, a long-horizon agentic benchmark, it scores 13%, close to Opus 4.8 (16%) and GPT-5.5 (18%) and ahead of Sonnet 4.6 (8%), Haiku 4.5 (3%), and GPT-5.4-mini (0%). Absolute scores are low across all models.
    • The model was not trained from scratch. The team converted an existing open-weight frontier model by replacing dense attention with SSA, then built long-context ability through staged context extension and continued pretraining.
    • Context was extended in stages (262K, 512K, 1M, 2M) using YaRN positional scaling, with long-context continued pretraining performed between extension stages on naturally long data: books, long documents, and repository-scale code.
    • Roughly one trillion tokens of continued pretraining were performed, most of it at the 1M-token stage.
    • Long-context continued pretraining was the most consistent predictor of long-context retrieval gains across the experiments, more so than post-training tweaks.
    • The team ran more than one hundred long-context experiments across six major model generations, which the report argues is only possible because SSA made million-token iteration cheap (under a minute per step).
    • Capability balance was a recurring challenge: gains in long-context retrieval often regressed short-context knowledge and reasoning unless training was explicitly managed for both.
    • Benchmark scores and real deployment behavior diverged. The MRCR benchmark moved up while qualitative workflow spot-checks got worse, so the team switched its primary development signal to RULER.
    • The report frames RAG, chunking, summarization, and agentic orchestration as scaffolding built around context scarcity, drawing an analogy to Sutton’s Bitter Lesson, where hand-engineered mechanisms get displaced by larger-scale learning.
    • SSA is positioned against FlashAttention (a memory optimization that does not change quadratic compute), fixed-pattern sparse attention, DeepSeek’s learned sparse line, state space models, and hybrid architectures.
    • DeepSeek’s Lightning Indexer (used in DSA and CSA) is the closest published comparison. Its quadratic scoring overtakes the sparse attention it feeds around 52,000 tokens, reaching roughly 16x the attention cost at 1M and 190x at 12M.
    • State space models like Mamba achieve linear cost through a compressed fixed-size state, but that compression is lossy and weakens exact retrieval, which is why production efficient models are usually hybrids with some dense attention layers retained.
    • MiniMax is cited as a cautionary case: it moved from a hybrid M1 to a full-attention M2 after hybrids showed multi-hop reasoning deficits at scale and less mature supporting infrastructure.
    • The benchmark results were independently verified by Appen, a third-party evaluation firm.
    • The named use cases are financial analysis and due diligence, legal and contract work, and software engineering (architecture-level reasoning, cross-file refactoring, dependency tracing, planning, review, and long-horizon memory).
    • Sparsity settings were deliberately conservative, tuned for maximum context length rather than maximum sparsity. Limited experiments at 4x the sparsity reported positive early results.
    • The training infrastructure used a memory-scaling ladder: single node, intra-node sequence parallelism, CPU offload, multi-node sequence parallelism, nested offloading, and Ring Attention for the longest contexts.
    • Beyond about 8M tokens, BF16 numerical underflow and stability became practical constraints on evaluation.
    • The technical report is authored by Saul Ramirez, Alex Whedon, Ashmal Vayani, and Phong Vo of Subquadratic AI.
    • Deployment is starting with a first cohort of design partners, with broader rollout through the quarter and a general model lineup ranging from 2M to 12M tokens by the end of the year.
    • The company’s framing line is “Efficiency is intelligence,” and its broader thesis is that the point is not bigger context windows for their own sake but reasoning directly over complete artifacts with less surrounding scaffolding.

    Detailed Summary

    The problem: whole-artifact reasoning and context scarcity

    The report opens by naming a class of tasks it calls whole-artifact reasoning: problems whose structure requires reasoning across a complete artifact rather than over isolated fragments. A legal agreement may define a term on page 2, qualify it on page 12, carve out an exception on page 46, and amend it in a schedule. A function may be defined in one file, called from forty others, and constrained by invariants encoded in the architecture rather than in comments. A financial review may require connecting filings, earnings reports, contracts, and internal records. In each case the difficulty is not locating a passage, it is reasoning over relationships distributed throughout a large artifact. Most production systems do not do this directly. They rely on retrieval pipelines, chunking, summaries, and agentic workflows that partition information and reconstruct fragments at inference time, because dense attention scales quadratically with context length and makes direct reasoning over large artifacts expensive. Subquadratic argues that much of the modern AI stack is therefore designed to manage context scarcity rather than reason over complete artifacts, and it connects this to Sutton’s Bitter Lesson: sophisticated hand-engineered mechanisms historically get displaced once larger-scale learning becomes practical.

    What SSA is and the three requirements it targets

    Subquadratic Sparse Attention is a content-dependent sparse attention mechanism designed to satisfy three requirements at once, a combination the report argues prior approaches never achieved in a practical long-context system. First, dense-attention-level retrieval and reasoning quality, which requires routing that is content-dependent (determined by the tokens themselves) rather than driven by a fixed positional pattern. Second, subquadratic scaling, where selection, retrieval, and attention are each linear in sequence length so the mechanism is linear end to end, not only within the attention read. Third, full-context training with standard autoregressive generation, so the model can optimize over the entire context during training while keeping efficient token-by-token decoding at inference. The internal mechanism by which SSA achieves this is held back as outside the scope of the report, which focuses instead on the requirements and the experimental program that followed.

    Where SSA sits among prior approaches

    The background section is effectively a taxonomy of long-context modeling. FlashAttention is treated not as a competitor but as the standard dense-attention baseline: it solved the memory problem by never materializing the full attention matrix, but it left the quadratic compute cost untouched, so doubling context still quadruples attention computation. Fixed-pattern sparse attention (sliding-window, strided, as in Longformer, BigBird, and the sliding window in Gemma) scales well but sacrifices content-dependent routing and tends to fail on retrieval benchmarks like RULER. Compression methods like Multi-head Latent Attention reduce KV-cache memory at inference but do not change the quadratic prefill cost. Learned sparse attention, exemplified by DeepSeek’s Native Sparse Attention and its Lightning Indexer, learns where to route but pays a quadratic cost in the indexer itself. State space models and linear attention (Mamba, Mamba-2 and Mamba-3, RetNet, RWKV, gated delta networks) achieve linear cost through a compressed fixed-size state, but that compression is lossy and weak on exact retrieval. Hybrids (Jamba, Kimi Linear, Qwen3 Next, Nemotron) keep a few dense layers to preserve retrieval, which means the quadratic component still dominates at long context. System-level workarounds (RAG, agentic frameworks, recursive language models) move retrieval outside the model entirely. The report’s stated open problem is to combine subquadratic scaling end to end with content-dependent retrieval, arbitrary-position access, and practical ultra-long-context training in one system, which it claims no widely deployed architecture provides and which SSA targets.

    Training: conversion, staged context extension, and continued pretraining

    Rather than training from scratch, the team converted an existing open-weight frontier model that supported a 262K-token context by replacing its dense attention with SSA. They then extended the context window in stages (262K to 512K to 1M to 2M) using YaRN to rescale positional representations, performing long-context continued pretraining between extension stages rather than jumping straight to the final length. The training mixture emphasized naturally long data such as books, long documents, and repository-scale code, packed to the target length with document separators and without masking cross-document attention boundaries. Most continued-pretraining tokens were trained at the 1M-token stage, with roughly one trillion tokens total. Post-training played a separate role: shaping how the long-context capability was expressed while preserving reasoning, coding, and instruction following. The team explored sample-level loss aggregation to keep a few extremely long examples from dominating gradient updates, and staged the post-training corpus across synthetic retrieval tasks, long-context reasoning, coding, educational material, and general instruction following, alternating capability-building phases with recovery phases.

    Results: retrieval, knowledge, coding, and agentic tasks

    On retrieval, SubQ 1.1 Small scores 99.12% on the 13-task RULER average at 128K, with errors concentrated in aggregation-style tasks like common-word and frequent-word extraction. On needle-in-a-haystack, evaluated on 50 held-out UUID samples per length, it scores 100% at 1M and 2M (within the training window) and 98% at 6M and 12M (held out), attending to only 0.13% of token pairs at 12M. On knowledge, GPQA Diamond pass@1 is 85.4%, landing between the small and mid frontier tiers and confirming that long-context optimization need not sacrifice reasoning, a result the report credits to its capability-balancing stages after earlier checkpoints showed retrieval gains coming at the cost of knowledge. On coding, LiveCodeBench v6 pass@4 is 89.7%, and the report notes coding data played a dual role, also improving non-code long-context retrieval because code is dense with the cross-position dependencies that train general routing. On long-horizon agentic work, AutomationBench Finance is 13%, where agents must discover the right endpoints among roughly 500 across 47 applications, make interdependent API calls, follow layered business rules, and ignore seeded distractors, graded on binary end-state correctness with no partial credit.

    Efficiency and the DeepSeek comparison

    Efficiency is measured on one attention layer against a dense baseline on the same backbone. Per-forward-pass attention FLOPs scale from a 2.1x reduction at 32K to 8x at 128K, 31.5x at 512K, and 64.5x at 1M tokens (3.9 PFLOP for SSA versus 252 PFLOP for dense). Measured against FlashAttention-2 in isolation, SSA reaches parity near 16K tokens and pulls away to 56x at 1M, where it runs in 966 ms versus 54,164 ms on an H100. The report devotes a discussion section to DeepSeek’s sparse attention line as the closest published comparison. DeepSeek’s Lightning Indexer is a learned selector, but it is a full-attention distilled transformer, so it scales quadratically: in a V3.2-style configuration the indexer is cheaper than the sparse attention it feeds only below about 52,000 tokens, then overtakes it, reaching roughly 16x the attention cost at 1M tokens and 190x at 12M. SSA targets that same selection role with a selector the report says is dramatically cheaper and linear throughout, and notes SSA could conceptually replace the selector over either uncompressed or compressed representations.

    Efficiency as a research accelerator and the evaluation lessons

    A recurring theme is that the most valuable effect of cheap long-context compute was on the research loop, not just inference. Where a dense campaign would allow a handful of attempts, SSA enabled more than a hundred experiments across six model generations with per-step iteration under a minute at million-token context. That throughput is what surfaced the finding that long-context continued pretraining is the strongest lever, and it leads the authors to argue that algorithmic efficiency should be treated as a first-class scaling variable alongside model and dataset size. The report is unusually candid about evaluation pitfalls. It describes how the MRCR benchmark diverged from deployment behavior, with MRCR-optimized checkpoints often feeling worse on repository-scale code reasoning, multi-document synthesis, and contract analysis, which pushed the team to rely on RULER and a fixed set of qualitative workflow spot-checks as development signals. It also cites MiniMax returning from a hybrid M1 to a full-attention M2 as evidence that reducing asymptotic cost is not sufficient on its own if retrieval quality, reasoning at scale, and system maturity are not preserved at the same time.

    Implications, availability, and what comes next

    The report’s deployment argument is that the most important enterprise implication of long-context models is not larger windows but the ability to reason directly over complete or more-complete artifacts, moving retrieval, re-ranking, and orchestration logic into the model where the task is naturally whole-artifact rather than naturally decomposable. It is careful not to declare retrieval obsolete: for corpora larger than any plausible context window, fast-changing knowledge, and genuinely multi-stage workflows, RAG and orchestration remain the right tools. The narrower claim is that the class of scaffolding that exists only to compensate for context limits gets smaller as efficient long-context models extend the reachable window. The benchmark results were independently verified by Appen. Subquadratic is deploying SubQ 1.1 Small with a first cohort of design partners now, with broader rollout through the quarter and a general lineup spanning 2M to 12M tokens planned by the end of the year, and it flags much higher sparsity as future work.

    Notable Quotes

    “Much of the modern AI stack is therefore designed to manage context scarcity rather than reason over complete artifacts directly.”

    SubQ-1.1-Small Technical Report, framing retrieval and orchestration as workarounds for an architectural limit

    “The hybrid has moved the line, but not changed its shape.”

    SubQ-1.1-Small Technical Report, on why hybrid models keep their quadratic component at long context

    “A routing mechanism intended to make long context affordable becomes the dominant long-context cost, reintroducing quadratic scaling after providing scalar compute savings.”

    SubQ-1.1-Small Technical Report, on DeepSeek’s Lightning Indexer overtaking the attention it feeds

    “If the cost of long-context experiments is too high, teams are forced to guess at the recipe. If the cost falls far enough, they can search for it.”

    SubQ-1.1-Small Technical Report, on efficient attention as a research accelerator

    “Fragmentation systematically destroys those relationships before the model ever sees them.”

    SubQ-1.1-Small Technical Report, on why chunking hurts whole-artifact reasoning

    “Holding the whole artifact in context changes the shape of the task rather than only the speed of it.”

    SubQ-1.1-Small Technical Report, on the difference between bigger windows and direct reasoning

    “The value of SSA is therefore not only that it makes long-context inference cheaper. It makes long-context experimentation cheaper.”

    SubQ-1.1-Small Technical Report, conclusion

    Read the full SubQ 1.1 Small technical report and model card here.

    Related Reading

    • Subquadratic (subq.ai) the company behind SubQ 1.1 Small and the Subquadratic Sparse Attention architecture, where you can join the waitlist.
    • The Bitter Lesson by Richard Sutton the short essay whose argument the report leans on, that hand-engineered mechanisms lose to general methods that scale with computation.
    • Attention Is All You Need the original Transformer paper that introduced the dense attention whose quadratic cost SSA is built to remove.
    • RULER (arXiv) NVIDIA’s long-context benchmark that the report uses as its primary retrieval signal, and that fixed-pattern sparse methods historically struggle with.
    • Retrieval-augmented generation (Wikipedia) background on the RAG approach that the report frames as scaffolding around context scarcity rather than a permanent fixture.
  • OpenAI’s Leaked 2025 Financials: $34 Billion in Spending, a $38.5 Billion Net Loss, and a $17 Billion Microsoft Bill Ahead of Its IPO

    Infographic summarizing OpenAI leaked 2025 financials: $13.07B revenue, $34B total costs, $20.92B operating loss, $38.53B net loss, where the $34B went, the $17.2B paid to Microsoft versus $303M paid back, inference costs, and IPO valuation context

    OpenAI’s audited 2025 financials leaked this week, and they are the clearest picture yet of what it actually costs to run the company behind ChatGPT. Independent journalist Ed Zitron first published the documents, and the Financial Times independently confirmed them. The headline: OpenAI spent $34 billion last year, booked $13.07 billion in revenue, and reported a net loss attributable to the company of $38.5 billion. The disclosure lands just days after OpenAI confidentially filed for an IPO that could value it north of $1 trillion.

    TLDR

    OpenAI’s audited 2025 numbers, leaked by Ed Zitron and confirmed by the Financial Times, show revenue tripling to $13.07 billion while total costs reached $34 billion, producing a $20.92 billion operating loss and a $38.53 billion net loss attributable to the company. The much larger net loss is inflated by a one-time $41.55 billion non-cash charge tied to OpenAI’s October 2025 conversion from a nonprofit to a public benefit corporation; strip the non-cash items and the loss is closer to $8 billion. R&D alone was $19.18 billion, cost of revenue (inference) was $7.5 billion, and sales and marketing ballooned to $5.73 billion. OpenAI paid Microsoft $17.2 billion in 2025 while Microsoft paid OpenAI only $303 million, exposing a deep Azure dependency. The company burned $1.60 for every dollar of revenue, down from $2.37 in 2024, and gross margin slipped from roughly 40% to 33% as more capable models consumed more compute per query. The leak arrives as OpenAI files a confidential S-1, targets a listing as early as September 2026 at up to a $1 trillion valuation, and races rival Anthropic, which is more valuable on paper and claims it is already turning an operating profit.

    Thoughts

    The most important thing to understand about these numbers is that there are two loss figures and the press will conflate them. The $38.53 billion net loss is the scary headline, but $41.55 billion of it is a non-cash accounting charge from converting investor convertible interests into equity during the for-profit restructuring. That charge is real on the audited statement and it will show up in the eventual S-1, but it is a one-time artifact of OpenAI’s unusual corporate history, not money that left the building. The number that describes the actual business is the $20.92 billion operating loss. That is the one to watch, and it is still enormous.

    The genuinely encouraging line in the whole release is the loss-per-dollar ratio. In 2024 OpenAI spent $2.37 to generate a dollar of revenue. In 2025 that fell to $1.60. A company that is still losing $1.60 on every dollar is not a healthy business, but a company whose efficiency improved by a third in a single year while tripling its top line is at least pointed in a defensible direction. The bull case for OpenAI lives entirely in the slope of that line. If it keeps improving at that rate, the math eventually crosses over. If it stalls, the valuation is a fantasy.

    The Microsoft relationship is the single most revealing disclosure, and it is wildly asymmetric. OpenAI paid Microsoft $17.2 billion in 2025. Microsoft paid OpenAI $303 million. That is a 56-to-1 ratio, and it reframes the partnership: Microsoft is not really a peer or even just an investor, it is OpenAI’s landlord and primary supplier, collecting rent on every model trained and every query answered. The April 2026 renegotiation that capped revenue-share payments at $38 billion through 2030, down from a projected $135 billion, suddenly looks less like a favor and more like OpenAI desperately trying to lower its single largest cost. The dependency cuts both ways, but right now Microsoft holds the better hand.

    The structural problem hiding inside the cost of revenue line is inference. Training a model is a fixed, one-time cost. Serving it is a recurring cost that scales with every one of ChatGPT’s roughly 800 million weekly users. OpenAI spent $5.02 billion on Azure inference in the first half of 2025 alone, and the more capable its reasoning models get, the more compute each answer burns. That is why gross margin went down even as revenue went up. It is the opposite of how software is supposed to work, where the marginal cost of one more user trends toward zero. OpenAI’s marginal cost is real, large, and growing. The counterargument is that per-token inference costs have been falling roughly tenfold a year, so the unit economics could still flip. That is the entire wager.

    Finally, the timing matters more than the numbers. OpenAI’s confidential S-1 means these audited figures were going to become public regardless, since the SEC requires the full prospectus at least 15 days before a roadshow. What the leak changes is who gets to study them first. Prospective IPO buyers, enterprise customers signing multi-year API contracts, and competitors now have the audited books weeks or months early, and they are reading them against Anthropic, which filed at a higher valuation and claims an operating profit. For a company asking the public markets to underwrite a $1 trillion bet on a monopoly outcome that does not yet exist, losing control of the narrative this early is not a small thing.

    Key Takeaways

    • OpenAI’s audited 2025 financials were first published by independent journalist Ed Zitron and independently confirmed by the Financial Times, the first verified look at the company’s books before its planned IPO.
    • Revenue grew from $3.7 billion in 2024 to $13.07 billion in 2025, more than tripling year over year, making OpenAI one of the fastest-growing businesses in history.
    • By the end of 2025 OpenAI was generating roughly $2 billion in monthly revenue, up from about $1 billion a quarter at the end of 2024.
    • Total costs and expenses hit $34 billion in 2025, up from $12.48 billion in 2024.
    • Research and development was the single largest expense at $19.18 billion, up from $7.81 billion, and exceeded total revenue on its own.
    • Of that R&D spend, $10.59 billion went to Microsoft, almost certainly the GPU compute cost of training frontier models on Azure.
    • Cost of revenue, the expense of serving ChatGPT responses (inference), rose from $2.65 billion to $7.5 billion.
    • Sales and marketing jumped from $1.11 billion to $5.73 billion, a 418% increase.
    • General and administrative costs rose from $907 million to $1.57 billion.
    • The operating loss, the truest measure of day-to-day economics, grew from $8.78 billion to $20.92 billion.
    • The net loss attributable to OpenAI was $38.53 billion, up nearly eightfold from $5.09 billion in 2024.
    • The bulk of that jump was a one-time, non-cash $41.55 billion charge from OpenAI’s October 28, 2025 conversion to a public benefit corporation, reflecting the changing fair value of convertible interests and warrant liabilities.
    • Stripping out the restructuring charge and other non-cash items such as stock-based compensation and Microsoft computing credits, the underlying loss was about $8 billion.
    • Including all factors, gross net loss reached $60.35 billion, lowered to the $38.53 billion attributable figure by removing $21.82 billion attributed to noncontrolling and redeemable noncontrolling interests.
    • OpenAI burned $1.60 for every $1 of revenue in 2025, an improvement from $2.37 in 2024, the clearest data point in the bull case.
    • Measured as a percentage of revenue, the operating loss improved from 237% in 2024 to 160% in 2025.
    • In total, OpenAI paid Microsoft $17.2 billion in 2025: $10.59 billion in R&D fees, $6.047 billion in cost of revenue, $527 million in sales and marketing, and $42 million in G&A.
    • Microsoft paid OpenAI just $303 million in the same year, a 56-to-1 imbalance underscoring OpenAI’s Azure dependency.
    • SoftBank paid OpenAI $867 million in 2025.
    • At year-end OpenAI carried $3.64 billion in outstanding payables to Microsoft, plus tens of millions more in accrued and non-current liabilities.
    • OpenAI spent $5.02 billion on Azure inference in just the first half of 2025; Azure inference from 2024 through Q3 2025 totaled $12.43 billion.
    • ChatGPT serves roughly 800 million weekly users, meaning billions of queries a week, each one burning GPU time at Azure’s pricing of about $6.98 per H100 GPU-hour.
    • Gross margin fell from roughly 40% in 2024 to 33% in 2025, because more capable reasoning models consume more compute per query.
    • Research firm Sacra estimates OpenAI’s inference costs reached $8.4 billion in 2025 and will rise to $14.1 billion in 2026, a 68% increase.
    • At year-end OpenAI held just over $50 billion in assets, with almost half in cash.
    • The April 2026 Microsoft renegotiation ended exclusivity and capped revenue-share payments at $38 billion through 2030, down from a projected $135 billion, potentially saving OpenAI up to $97 billion over five years.
    • OpenAI filed a confidential draft S-1 with the SEC around May 22, 2026 and confirmed it publicly on June 8, naming Goldman Sachs and Morgan Stanley as underwriters.
    • The company is targeting a listing as early as September 2026 at a valuation that could exceed $1 trillion, though Sam Altman has said a public offering “may be a while.”
    • OpenAI raised $122 billion earlier in 2026 at a $730 billion pre-money valuation, putting its post-money value around $852 billion.
    • At an $852 billion valuation, OpenAI trades at roughly 65 times its 2025 revenue.
    • Rival Anthropic also filed IPO paperwork this month after raising $65 billion at a $900-$965 billion valuation, making it more valuable on paper than OpenAI, and says it expects to report an operating profit of $559 million in the June quarter.
    • HSBC analysts estimate OpenAI may need more than $207 billion in additional capital through 2030 even under optimistic projections.
    • OpenAI projects profitability by 2029 or 2030; independent analysts put the more likely date at 2031 or later.
    • Bridgewater partner Greg Jensen reportedly told clients the implied revenue multiples price OpenAI for “a monopoly outcome that does not yet exist.”
    • Zitron separately reported OpenAI had a negative 122% non-GAAP operating margin in Q1 2026 and that ChatGPT growth has stalled, with the company projecting paid ChatGPT Plus subscriptions to fall from 44 million in 2025 toward cheaper tiers in 2026.

    Detailed Summary

    How the leak happened and why it matters now

    The audited documents were obtained and first published by Ed Zitron on his newsletter Where’s Your Ed At, then independently verified by the Financial Times, which reviewed the same materials. That dual sourcing matters: this is not a rumor or a model, it is OpenAI’s actual audited financial statement. The timing is the story. OpenAI filed a confidential draft S-1 with the SEC around May 22, 2026 and confirmed it publicly on June 8. Under SEC rules the full prospectus must be released at least 15 days before an investor roadshow, so the 2025 numbers were going to be public soon regardless. The leak simply moved that disclosure forward, handing prospective investors, enterprise customers, and competitors an early look at the books.

    Revenue tripled, costs grew faster

    OpenAI’s revenue rose from $3.7 billion in 2024 to $13.07 billion in 2025, and monthly revenue reached nearly $2 billion by year-end. By almost any normal standard that is spectacular growth. The problem is that costs grew faster, reaching $34 billion against $12.48 billion the year before. The gap between what OpenAI earns and what it spends has widened every year since its founding, and 2025 is the starkest example yet. Revenue alone was outpaced by research and development as a single line item in both of the last two years.

    Two loss numbers, and why both matter

    There are two figures that get cited interchangeably and should not be. The operating loss of $20.92 billion is what the business spent beyond what it earned from operations: training models, serving ChatGPT, paying engineers, running marketing. The net loss attributable to OpenAI of $38.53 billion is far larger because 2025 was the year OpenAI completed its conversion from a nonprofit to a for-profit public benefit corporation, finalized on October 28, 2025. That restructuring triggered a $41.55 billion non-cash charge reflecting the changing fair value of convertible equity interests and warrant liabilities. Before the conversion, investors held convertible interest rights treated as liabilities under US accounting rules and revalued upward as OpenAI’s valuation climbed, creating the charge. It is not expected to recur. Including all minor items, gross net loss reached $60.35 billion, reduced to the $38.53 billion attributable figure after removing $21.82 billion tied to noncontrolling and redeemable noncontrolling interests, primarily the OpenAI Foundation’s stake. Strip the non-cash noise and the underlying loss was about $8 billion.

    Where the $34 billion went

    The spending breaks into four lines. Research and development was $19.18 billion, the largest category, with $10.59 billion of it flowing to Microsoft for training compute. Cost of revenue, the expense of serving responses to users, was $7.5 billion and captures inference, the compute consumed every time someone prompts ChatGPT or calls the API. Sales and marketing reached $5.73 billion, up 418% year over year, a striking jump for a product that grew largely by word of mouth. General and administrative costs added $1.57 billion. The shape of the spending tells you OpenAI is simultaneously racing to build better models, serve a massive and growing user base, and aggressively defend market share through marketing.

    The Microsoft dependency

    The most striking single disclosure is the scale of the Microsoft relationship. OpenAI paid Microsoft $17.2 billion in 2025: $10.59 billion in R&D fees for model training, $6.047 billion in cost-of-revenue for inference serving, $527 million in sales and marketing, and $42 million in G&A. Microsoft paid OpenAI just $303 million the same year. SoftBank paid OpenAI $867 million. The 56-to-1 ratio between what OpenAI pays Microsoft and what Microsoft pays back makes the structural reality plain: Microsoft is OpenAI’s largest landlord. The dynamic began shifting in April 2026, when the two renegotiated, ending Microsoft’s exclusivity and capping revenue-share payments at $38 billion through 2030, down from a projected $135 billion. That could save OpenAI up to $97 billion over five years, though Microsoft keeps its IP license through 2032 and remains the primary cloud partner.

    Why inference is the core problem

    Training happens once. Serving happens billions of times a day. When OpenAI releases a model it spends months and billions on training compute, a fixed cost that falls away when training ends. Inference is the opposite: every ChatGPT message runs through the model on Azure GPU hardware, consuming electricity and compute to generate a response. With roughly 800 million weekly users, that is billions of queries a week, each burning GPU time at roughly $6.98 per H100 GPU-hour on demand. OpenAI spent $5.02 billion on Azure inference in the first six months of 2025 alone. Sacra estimates full-year inference costs of $8.4 billion in 2025, rising to $14.1 billion in 2026. This is why gross margin fell from about 40% to 33% even as revenue tripled: more capable reasoning models consume far more compute per query, and revenue has not kept pace with the cost growth that capability generates.

    What it means for the IPO and the race with Anthropic

    OpenAI was last valued around $852 billion post-money after raising $122 billion in early 2026, which puts it at roughly 65 times 2025 revenue. It has named Goldman Sachs and Morgan Stanley as underwriters and is targeting a listing as early as September 2026 at up to a $1 trillion valuation, though Altman has hedged that it “may be a while” and that staying private might be the better course. HSBC estimates the company may need more than $207 billion in additional capital through 2030. The race is with Anthropic, which filed paperwork the same month after raising $65 billion at a $900-$965 billion valuation, making it more valuable on paper, and which says it expects a $559 million operating profit in the June quarter. The contrast is sharp: the two leading AI labs heading toward public markets at the same time, one bleeding cash at scale, the other claiming profitability, both asking investors to bet on a future that has not arrived.

    Notable Quotes

    “The financial condition of OpenAI is deeply concerning. $38.53 billion in losses are astronomical, and far higher than most believed it would be. Losses also appear to be mounting year-over-year at a dramatic rate, and I’m not sure how this company finds a way toward any kind of sustainability or profitability.”

    Ed Zitron, the independent journalist who published the leaked audited financials

    “It’s unclear what this means, nor how OpenAI reconciled the removal of $3.74 billion in costs. I will not speculate further.”

    Ed Zitron, on a discrepancy he found in the restated 2024 figures

    “OpenAI’s two biggest expenses are R&D and marketing. Budget cuts there, coupled with an ability to raise prices or win new sources of revenue, could see the company move into the black over time. Cutting R&D would be the most difficult part of that, given that AI companies can only hold onto their customers by generating the best-performing models.”

    Jim Edwards, Fortune, on whether OpenAI has a realistic path to profitability

    “What the audited documents make impossible to argue is that the path to profitability is short, clear, or cheap.”

    TechTimes analysis of the leaked OpenAI financials

    The implied revenue multiples price OpenAI for “a monopoly outcome that does not yet exist.”

    Bridgewater partner Greg Jensen, reportedly telling clients how to read OpenAI’s valuation

    “OpenAI spent $34bn last year as the ChatGPT maker poured money into a race to dominate the fast-growing AI market ahead of a planned stock market listing.”

    George Hammond and Bryce Elder, Financial Times, framing the audited 2025 spend

    Read Ed Zitron’s original reporting with the full breakdown here, and the Financial Times confirmation here.

    Related Reading

    • Ed Zitron, Where’s Your Ed At the primary source that broke the audited 2025 financials with the full line-by-line breakdown.
    • OpenAI (Wikipedia) background on the company’s history, structure, and the nonprofit-to-for-profit conversion that drives the non-cash charge.
    • Inference (Wikipedia) on the recurring compute cost that explains why OpenAI’s gross margin shrinks as usage grows.
    • Anthropic the rival lab that filed IPO paperwork the same month at a higher valuation and claims it is already operating at a profit.
    • SEC on confidential filings context for why OpenAI’s audited numbers were headed for public disclosure regardless of the leak.
  • US Government Orders Anthropic to Suspend Claude Fable 5 and Mythos 5: Inside the Export Control Directive, the Jailbreak Dispute, and What It Means for Frontier AI

    On June 12, 2026, Anthropic published a statement announcing that the US government, citing national security authorities, has issued an export control directive forcing the company to suspend all access to its newest frontier models, Claude Fable 5 and Claude Mythos 5. The order technically targets foreign nationals inside and outside the United States, including Anthropic’s own foreign national employees, but the practical effect is that both models are going dark for every customer worldwide. It is the first publicly known instance of the US government ordering a deployed frontier AI model offline, and Anthropic is complying while openly disputing the basis for the decision.

    TLDR

    The US government delivered an export control directive to Anthropic at 5:21pm ET on June 12, 2026, suspending all access to Fable 5 and Mythos 5 over an alleged jailbreak of Fable 5’s safeguards. Anthropic says the letter contained no specific details, that the only evidence shared was verbal, and that the technique in question amounts to asking the model to read a codebase and fix software flaws, a capability the company says is freely available from other models including OpenAI’s GPT-5.5 and used daily by cyber defenders. Anthropic defends its defense in depth strategy, notes that thousands of hours of red teaming by the US government, the UK AISI, and third parties found no universal jailbreak, and warns that recalling a commercial model over a narrow, non-universal jailbreak would effectively halt all new frontier model deployments if applied industry-wide. Access to all other Anthropic models, including Claude Opus, Sonnet, and Haiku, is unaffected, and the company says it believes the situation is a misunderstanding and is working to restore access, with more details promised within 24 hours.

    Thoughts

    This is a watershed moment regardless of how it resolves. Governments have blocked AI exports before, but ordering a deployed commercial model recalled out from under hundreds of millions of users is a new kind of intervention, closer to a product recall than a trade restriction. The mechanism matters too. Export control authority aimed at foreign nationals, including a company’s own employees, that cascades into a global shutdown is a blunt instrument doing the work of a regulatory regime that does not exist yet. The US has no statutory process for recalling an AI model, so the government reached for the closest tool on the shelf, and the result is a precedent built on improvisation.

    There is real irony in who got hit first. Anthropic has spent years arguing, publicly and in Washington, that governments should have the power to block unsafe AI deployments. Now the company that asked for a referee is the first one whistled, and its complaint is not about the existence of the power but about the process: a letter at 5:21pm with no specifics, verbal evidence only, and no transparent or technically grounded procedure. That distinction is the whole ballgame for AI governance. A power to halt deployments without due process standards is not regulation, it is discretion, and discretion cuts in every direction depending on who holds it.

    The technical dispute underneath is genuinely interesting because it exposes how unsettled the definition of a dangerous jailbreak is. Anthropic’s account of the offending technique, asking the model to read a specific codebase and fix any software flaws, describes something security teams do on purpose every single day. Vulnerability discovery is the canonical dual use capability: the same analysis that lets a defender patch a hole lets an attacker find one. If the bar for recall is that a model can be coaxed into doing competent security analysis, then every capable model on the market fails that bar, which is exactly Anthropic’s point about GPT-5.5. The hard question the directive dodges is not whether Fable 5 can find bugs but whether it provides meaningful uplift beyond what is already freely available, and Anthropic says it does not.

    For builders, the immediate lesson is uncomfortable: model availability is now a political variable, not just an engineering one. Teams that built directly on Fable 5 lost a production dependency overnight through no fault of Anthropic’s infrastructure, their own code, or any terms of service violation. Multi-model fallback strategies, abstraction layers over providers, and graceful degradation paths just moved from nice-to-have to table stakes for anyone running serious workloads on frontier models. The companies that absorbed this outage gracefully are the ones that assumed any single model could vanish.

    The next 24 hours matter more than the directive itself. Anthropic has promised more details, and the government will face pressure to either substantiate a concern that justifies a global recall or quietly walk it back. Either outcome sets the real precedent. If the directive holds on thin evidence, every frontier lab now operates under the threat of arbitrary shutdown. If it collapses under scrutiny, the case for a formal, transparent statutory process for AI deployment decisions, which Anthropic explicitly endorses in its own statement, gets a lot stronger in Congress than it was a week ago.

    Key Takeaways

    • The US government issued an export control directive on June 12, 2026 suspending all access to Claude Fable 5 and Claude Mythos 5, citing national security authorities.
    • The directive formally targets access by any foreign national, inside or outside the United States, including Anthropic’s own foreign national employees.
    • The net effect is that Anthropic must disable Fable 5 and Mythos 5 for all customers worldwide to ensure compliance, not just for foreign users.
    • Access to all other Anthropic models, including the Claude Opus, Sonnet, and Haiku families, is not affected by the order.
    • Anthropic received the directive at 5:21pm ET the same day it published its statement, and says the letter did not provide specific details of the national security concern.
    • Anthropic’s understanding is that the government believes it has become aware of a method of bypassing, or jailbreaking, Fable 5’s safeguards.
    • Anthropic reviewed a demonstration of the specific technique and says it only identified a small number of previously known, minor vulnerabilities.
    • The company says other publicly available models can discover the same vulnerabilities without requiring any bypass at all.
    • Before launch, Fable 5’s safeguards were red-teamed for thousands of hours in total by the US government, the UK AISI, multiple private third-party organizations, and internal teams.
    • No tester has found a universal jailbreak for Fable 5, meaning a method that broadly bypasses safeguards and unlocks a wide range of cyber capabilities.
    • Anthropic openly states that perfect jailbreak resistance does not appear possible for any model provider today, and that every safeguard in the industry is vulnerable to non-universal jailbreaks.
    • Fable 5 was deployed under a defense in depth strategy: make jailbreaks either narrow or very expensive to produce, then combine that with monitoring to quickly detect and shut down successful attacks.
    • Anthropic’s 30-day customer data retention requirement for Fable exists specifically to support jailbreak research and mitigation, a policy the company says carries real costs with customers.
    • Anthropic says it has not received any disclosure of a concerning non-universal jailbreak that led to a harmful result; disclosed potential jailbreaks were benign or provided no Mythos-specific uplift.
    • The only evidence the government has provided is verbal, describing a narrow, non-universal jailbreak that essentially consists of asking the model to read a specific codebase and fix any software flaws.
    • Anthropic reviewed a report it believes is the basis of the directive and validated that the capability level shown is widely available from other models, including OpenAI’s GPT-5.5, and is used every day by cyber defenders.
    • Anthropic is complying with the legal directive while explicitly disagreeing that a narrow potential jailbreak justifies recalling a commercial model deployed to hundreds of millions of people.
    • The company warns that if this recall standard were applied across the industry, it would essentially halt all new model deployments for every frontier model provider.
    • Anthropic supports government power to block unsafe deployments in principle, but only through a statutory process that is transparent, fair, clear, and grounded in technical facts, and says this action meets none of those principles.
    • Anthropic apologized to customers, called the situation a misunderstanding, said it is working to restore access as soon as possible, and promised more details within 24 hours.

    Detailed Summary

    What the directive actually does

    The order arrived as a letter from the US government at 5:21pm ET on June 12, 2026, invoking national security authorities under export control law. On paper it suspends access to Fable 5 and Mythos 5 by any foreign national, whether inside or outside the United States, a category that includes some of Anthropic’s own employees. In practice, Anthropic says compliance requires abruptly disabling both models for every customer, since there is no clean way to enforce a nationality-based access boundary across a global product. The letter did not spell out the specific national security concern. Everything else in Anthropic’s statement is the company’s own reconstruction of what prompted the action.

    The jailbreak at the center of the dispute

    Anthropic’s understanding is that the government became aware of a method for bypassing Fable 5’s safeguards. The company reviewed a demonstration of the technique and characterizes the results as a small number of previously known, minor vulnerabilities, all relatively simple, all discoverable by other publicly available models without any jailbreak at all. According to Anthropic, the government’s evidence so far has been entirely verbal, and the technique boils down to asking the model to read a specific codebase and fix any software flaws. The company reviewed a report it believes underlies the directive and validated that the displayed capability is widely available elsewhere, naming OpenAI’s GPT-5.5 directly, and noted that this exact kind of analysis is what defenders use to keep systems safe.

    Anthropic’s defense in depth posture

    The statement restates the safety posture Anthropic laid out at Fable 5’s launch. The safeguards around cybersecurity tasks are strong enough that users have complained they are overly broad. In the weeks before launch, the US government, the UK AISI, multiple private third-party organizations, and internal teams red-teamed the safeguards for thousands of hours combined, and those tests showed Fable’s protections to be substantially more effective than any previously deployed model. No tester found a universal jailbreak. Anthropic is candid that perfect jailbreak resistance is likely impossible for anyone today, which is why the strategy is defense in depth: keep jailbreaks narrow or expensive, monitor aggressively, and shut down attacks fast. The 30-day customer data retention requirement on Fable exists to support that monitoring and mitigation loop. The company says this posture makes Fable’s risks comparable to models already deployed across the industry.

    Complying while disputing the standard

    Anthropic is removing access for all users as legally required, but the statement draws a hard line on the principle. The company disagrees that a narrow potential jailbreak, one that produced no disclosed harmful result, justifies recalling a commercial model serving hundreds of millions of people. Its broader warning is that this standard, applied evenly, would halt all new frontier model deployments industry-wide, since every provider’s safeguards are vulnerable to narrow jailbreaks. Anthropic also turns its own policy position into a critique: the company has publicly supported giving government the ability to block unsafe deployments, but through a statutory process that is transparent, fair, clear, and grounded in technical facts, and it says this action does not adhere to those principles.

    What happens next

    Anthropic closed by apologizing to customers, calling the situation a misunderstanding, and committing to restore access as soon as possible. The company promised to share more details over the next 24 hours, which makes this a developing story. The open questions are whether the government substantiates its concern with written technical evidence, whether the directive survives that scrutiny, and whether this episode accelerates the formal statutory process for AI deployment decisions that Anthropic says should have governed the action in the first place.

    Notable Quotes

    “The net effect of this order is that we must abruptly disable Fable 5 and Mythos 5 for all our customers to ensure compliance.”

    Anthropic, on why a directive aimed at foreign nationals becomes a global shutdown

    “We received the directive from the government today at 5:21pm (ET). The letter did not provide specific details of its national security concern.”

    Anthropic, on the abruptness and opacity of the order

    “These vulnerabilities all appear relatively simple, and we have found that other publicly-available models are able to discover them as well without requiring a bypass.”

    Anthropic, on its review of the demonstrated jailbreak technique

    “We suspect that perfect jailbreak resistance is not currently possible for any model provider.”

    Anthropic, restating the position it disclosed at Fable 5’s launch

    “We stand by this defense in depth strategy. It reduces the risks posed by Fable, making them comparable to the risks of existing models already deployed across the industry.”

    Anthropic, defending its layered safeguards approach

    “To date, the government has only given us verbal evidence of a potential narrow, non-universal jailbreak, which essentially consists of asking the model to read a specific codebase and fix any software flaws.”

    Anthropic, describing the technique behind the directive

    “However, we disagree that the finding of a narrow potential jailbreak should be cause for recalling a commercial model deployed to hundreds of millions of people.”

    Anthropic, on complying while contesting the decision

    “If this standard was applied across the industry, we believe it would essentially halt all new model deployments for all frontier model providers.”

    Anthropic, on the industry-wide implications of the recall standard

    “As we have stated publicly, we believe the government should have the ability to block unsafe deployments, as part of a statutory process that is transparent, fair, clear, and grounded in technical facts. This action does not adhere to those principles.”

    Anthropic, on the kind of oversight process it says should have governed the action

    “We apologize for this disruption to our customers. We believe this is a misunderstanding and are working to restore access as soon as possible.”

    Anthropic, closing its statement to customers

    Read the full statement on Anthropic’s site here.

    Related Reading

  • Dario Amodei on Policy for the AI Exponential: Anthropic’s Plan for AI Regulation, Job Displacement, Civil Liberties, and Democratic Leadership

    In June 2026, Anthropic CEO Dario Amodei published “Policy on the AI Exponential”, a wide-ranging essay arguing that the gap between how fast AI is advancing and how slowly policy moves has become dangerous, and that the window to close it is open right now. He opens with a memorable image from The Lord of the Rings: the Hobbits trying to rouse Treebeard, the ancient tree who takes a full day just to say hello, to defend his forest before it is cut down. That mismatch in speed, he writes, is exactly the relationship between AI and our political institutions. This post breaks the essay down in full and adds analysis of where the argument lands.

    TLDR

    Amodei argues that AI’s scaling laws point toward “powerful AI,” a country of geniuses in a datacenter, within a few years, while legislation still moves on a timescale of years. For most of the last few years, safety advocates including Anthropic pushed only for optionality-preserving moves like transparency rules, chip export controls, and labor data collection, because the risks were not yet concrete. He says that has changed: events like Claude Mythos Preview proved frontier models are now tools of national strategic consequence, and the time for binding regulation has arrived. The essay covers five policy areas. First, regulation and public safety, where he proposes an FAA-style regime of mandatory third-party testing of frontier models above a compute threshold across four risks (cybersecurity, biological weapons, loss of control, and automated R&D), with government power to block unsafe deployments. Second, macroeconomics and tax policy, where AI could deliver hypergrowth and severe, enduring job displacement at the same time, demanding measurement, pro-employment incentives, and possibly UBI or universal capital accounts. Third, accelerating AI’s positive impact, where the danger is regulators like the FDA being too slow rather than too lax, and biomedical approval needs reform. Fourth, the state and civil liberties, where AI could become the ultimate tool of autocracy through autonomous weapons and mass surveillance, requiring new accountability rules, a domestic ban on autonomous weapons, closing the data broker loophole, and public rights to AI advice. Fifth, securing leadership by democracies through a values-based global coalition that controls the AI supply chain, coordinates on risk, shares benefits, and rejects AI-powered repression. He closes by rejecting the idea that public concern about AI is a PR problem to be marketed away, calling it democratic accountability working as it should.

    Thoughts

    The most important move in this essay is structural, not technical. Amodei is explicitly retiring the “preserve optionality” posture that defined Anthropic’s policy work through 2025 and replacing it with a call for binding rules. For years the argument from safety-minded labs was that the risks were too speculative to legislate against without doing more harm than good, an idea he grounds in the Collingridge dilemma and the Hayekian point that regulators lack the information to make good calls. That was a defensible hedge. What is striking here is the claim that the hedge has expired. He is saying the evidence is now concrete enough that continued caution about regulating has flipped from prudent to negligent. Whether you trust the underlying capability claims or not, that is a genuine change in position from one of the field’s most influential voices, and it deserves to be read as such.

    The FAA analogy is doing enormous work, and it is worth poking at. Airplanes and drugs are mature technologies with stable physics and decades of incident data; the certification regime works because the failure modes are well understood. Frontier models are the opposite: the whole premise of the essay is that capabilities are changing faster than anyone can characterize them. Amodei half-acknowledges this when he warns that a fixed list of safety requirements tends to consume 95 percent of compliance effort on things that turn out not to matter while missing the real risks, a lesson he says Anthropic learned from its own Responsible Scaling Policy. So the proposal is really for an agency nimble enough to rewrite its own standards continuously, which is a much taller order than the FAA. The honest read is that he is proposing a regulator we do not yet know how to build, and betting that building it is still better than the alternative.

    The economics section is where Amodei is most careful, and it is the part most likely to be misread. He goes out of his way to say enduring job displacement is undesirable and that warning about it is not the same as wanting it, a distinction critics of AI leaders often collapse. His real claim is subtle: that AI might jam the economic policy dial on a “hypergrowth, hyper-inequality” setting that is hard to unstick, because AI substitutes for human cognition broadly and faster than past technologies, potentially overwhelming the usual escape hatches like comparative advantage and Jevons paradox. If he is right, the political fight of the next decade is not about growth, which AI supplies, but about distribution, which it does not. His mention of UBI, universal capital accounts, and higher capital gains taxes is notable coming from a frontier CEO, even hedged as it is.

    The civil liberties section is the one that should travel furthest beyond the AI-policy bubble, because it does not depend on accepting his most aggressive timelines. The data broker loophole, the idea that the government can simply buy the bulk data Americans hand to private companies and run mass analysis on it, is a problem that exists today; AI just raises the stakes by making that data vastly more revealing. Same with the proposal that anyone facing adverse government action should have access to AI at least as capable as what the government uses against them. These are concrete, near-term, and bipartisan in a way the abstract autonomy debates are not. The most candid line in the whole piece is his admission that AI cannot be safely entrusted to either governments or companies, an unusually direct acknowledgment that his own industry needs external checks, with Anthropic’s Long-Term Benefit Trust offered as one imperfect example rather than a solution.

    The geopolitics section is the most contested terrain. Framing AI as a nuclear-scale reset of the game board, with a virtual country of 100 million geniuses divisible across military strategy and weapons R&D, leads naturally to a democratic coalition that hoards chips and denies them to adversaries. That logic is internally consistent, but it sits in tension with the benefit-sharing and “eventually the whole world joins” language elsewhere in the same section. Export controls that lock down the supply chain are, by design, a tool of exclusion, and reconciling that with broad diffusion of AI’s benefits to developing countries is the circle the coalition idea has to square. Amodei is clearly aware of the tension and bets that making membership attractive resolves it. The closing image is the one to remember: Treebeard waking up, with the warning that the goal is to channel real public concern into constructive policy rather than let it curdle into formless anger.

    Key Takeaways

    • The core tension of the essay is a mismatch in speed: AI advances exponentially while legislation moves on a multi-year timescale, dramatized by the Treebeard and Hobbits image from The Lord of the Rings.
    • In only four years, AI models went from barely writing a coherent line of code to writing most of the code at major AI companies, with similar gains across biology, physics, math, finance, law, and translation.
    • Scaling laws now have over a decade of empirical support, and if they continue another year or two they likely produce “powerful AI,” a country of geniuses in a datacenter.
    • For the last few years, safety advocates including Anthropic focused on optionality-preserving policies: transparency legislation, chip export controls, and data collection on AI’s labor effects.
    • Amodei argues that posture is no longer enough. Claude Mythos Preview revealed that frontier models pose real cybersecurity risks to the financial sector, critical infrastructure, and national security, and proved AI is now a tool of strategic consequence.
    • He expects biological risks to follow cyber risks, with serious AI autonomy risks potentially not far behind.
    • The essay covers five policy areas: regulation and public safety, macroeconomics and tax policy, accelerating AI’s positive impact, the state and civil liberties, and securing leadership by democracies.
    • Alongside the essay, Anthropic released a legislative proposal on frontier model testing and a policy framework for job displacement, both with promised financial backing.
    • On regulation, Amodei invokes the Collingridge dilemma and Hayek’s information problem to explain why pre-writing AI law in 2023 to 2024 was risky, then argues the situation has now changed.
    • Anthropic’s 2025 answer was transparency, helping pass SB 53 in California, RAISE in New York, and SB 315 in Illinois, plus advocating a federal transparency standard.
    • He now calls for binding regulation modeled on the FAA, where frontier models must pass technical testing and can have release blocked or reversed if they fail high safety standards.
    • Models above a compute threshold should face mandatory third-party testing in four areas: cybersecurity, biological weapons, loss of control of AI systems, and automated R&D that accelerates the other three.
    • Government should be able to block or deter deployment of models judged to present unacceptable risk, scoped to those four risks with protections against political favoritism.
    • Evaluation could come from a government agency or from authorized and inspected private organizations under a “regulatory markets” approach.
    • AI companies should have strong security to protect model weights, conduct regular red teaming and penetration testing, report safety incidents promptly, and work with government against major threat actors.
    • He warns a time may come when the most powerful systems resemble weaponizable nuclear materials rather than airplanes, requiring more aggressive measures, but cautions against getting ahead of present dangers.
    • On economics, AI could deliver extremely rapid growth via accelerated science and operational efficiency, supercharged by AI building better AI.
    • The same properties make AI a broad substitute for human cognition that changes the economy faster than past technologies, risking large and potentially enduring labor market disruption.
    • The feared outcome is a “hypergrowth, hyper-inequality” setting that is hard to unstick, where the challenge shifts from incentivizing growth to sharing its benefits.
    • Amodei is emphatic that enduring job displacement is undesirable and dangerous, and that he warns about it to help society adapt, not as a prophet of doom.
    • Anthropic says it works with customers to find new revenue and use cases rather than only cost cutting, and explores interaction paradigms that keep humans active alongside AI.
    • He predicts AI will enable single individuals to build billion-dollar companies, noting teams of a few people already reach hundreds of millions in revenue, while admitting significant enduring job loss may be intrinsic to the technology.
    • Any response must address both economic provision and the human need for meaning, purpose, and agency, with the latter ultimately more important and beyond what policy can directly deliver.
    • Suggested economic interventions: better measurement and tracking (governments expanding statistics beyond Anthropic’s Economic Index), pro-employment incentives, and long-term macroeconomic support.
    • Pro-employment ideas include wage insurance, retention tax incentives, workforce training grants, and employer-employee matching infrastructure.
    • If displacement is large and permanent, mechanisms like universal basic income or universal capital accounts, financed through company taxes or higher capital gains taxes, may be necessary.
    • He frames datacenter and energy-price backlash as largely a symbol of broader economic anxiety, and says AI companies should pay to absorb rate increases, a pledge Anthropic has already made.
    • For technologies accelerated by AI, the bigger risk is regulators like the FDA being too slow, not too lax, because AI may make downstream tech safer in ways that violate skeptical regulatory assumptions.
    • Biomedicine is the illustrative case: AI could flood the drug pipeline, raise effect sizes, treat previously untreatable diseases, and create whole new therapy categories, while the current FDA and EMA pipeline takes 7 to 8 years.
    • Agencies should pre-approve standards for AI methods like PD/PK modeling, toxicology prediction, dose selection, biomarker validation, synthetic control arms, and surrogate endpoints, plus more flexible accelerated-approval mechanisms.
    • On civil liberties, powerful AI in the wrong hands could be the ultimate tool of autocracy, and existing constitutional protections are not fully equipped to counter a surprise seizure of power.
    • Threats named include fully automated drone armies that obey unlawful orders and surveillance AI that infers the innermost details of every citizen’s life from widely available data.
    • Civil liberties proposals: accountability rules and an “off switch” for autonomous weapons, a domestic ban on fully autonomous weapons including in law enforcement, closing the data broker loophole, and public rights to AI advice during adverse government action.
    • Amodei warns companies as well as governments can seize quasi-state power, citing the Gilded Age and the East India Company, and says AI cannot be safely entrusted to either alone.
    • He offers Anthropic’s Long-Term Benefit Trust as one separation-of-power structure and urges the industry to explore mechanisms that go further.
    • On geopolitics, he argues AI resets the geopolitical game board like nuclear weapons, becoming the dominant source of military and economic power for any nation that holds it.
    • A nation with powerful AI versus one without it, or even one three years behind, could resemble WWII Marines facing medieval swordsmen.
    • He calls for a democratic coalition that shares chips and semiconductor manufacturing equipment internally while denying them to adversaries, citing MATCH and OVERWATCH as good first steps.
    • The coalition should coordinate risk policy, share benefits including harmonized medical approvals, provide mutual AI defense, reject AI-powered repression, and cooperate on macroeconomic stabilization.
    • He rejects the idea that AI’s image is a PR problem, arguing public concern reflects real risks and is democratic accountability working as it should, with the task being to channel it into constructive solutions.

    Detailed Summary

    The speed mismatch between AI and policy

    Amodei frames the entire essay around a single problem: AI advances at a lightning pace while policy, especially legislation, moves very slowly, often for good reasons since governments wield grave powers that should not be used hastily. He illustrates this with Treebeard, the sentient tree from The Lord of the Rings who takes a full day to say hello, as a stand-in for political institutions trying to respond to a technology that can go from amusing toy to a country of geniuses in the time it takes Congress to act. He recounts the dilemma responsible actors have faced: they could see where the exponential was headed, but to observers looking only at present capabilities, AI looked as mundane as the latest consumer app or cryptocurrency, making a laissez-faire attitude hard to argue against. The absence of AI’s radical effects, and uncertainty about their shape, made it genuinely difficult to design good policy even where the will existed.

    That uncertainty, he says, is why safety advocates limited themselves to optionality-preserving measures like transparency rules, export controls, and labor data collection. But over the last few months the evidence of AI’s power and risk has become undeniable, with Claude Mythos Preview as the emblematic example: it scrambled the global cybersecurity landscape and proved AI models are now tools of global and national strategic consequence. He expects biological and autonomy risks to follow, and argues the world must now activate its slow, rickety policy apparatus to handle risks that will compound quickly. He worries current early actions are at least a year out of step with AI’s progress, and presents the essay as an attempt to close that gap across five policy areas, focused on US policy but relevant worldwide.

    Regulation and public safety: an FAA for frontier models

    Amodei opens by acknowledging the real costs of regulation: it can reduce a product’s benefits, disincentivize innovation, and suffer from the Hayekian problem that regulators lack the information for good tradeoffs, plus the Collingridge dilemma that a technology’s impacts are hard to anticipate until it is too late to manage them. In 2023 to 2024 these dynamics argued against pre-writing AI law, since the exact form of biological or autonomy risk, how to test for it, and how it would play out were all unclear, creating a high risk of low-value compliance requirements that miss the real dangers. Anthropic’s answer was transparency: requiring developers to disclose safety procedures, tests, and critical incidents, which is why it supported SB 53 in California, RAISE in New York, and SB 315 in Illinois in early 2026.

    Now, he argues, the risks are clearly here and it is time for binding regulation. His analogy is to cars, airplanes, and drugs: powerful technologies essential to the economy but capable of killing many people if designed or operated poorly. He models AI regulation on the FAA, with frontier models required to pass testing and auditing and with release blocked or reversed if they fail high safety standards. His concrete proposal: mandatory third-party testing for models above a compute threshold across cybersecurity, biological weapons, loss of control, and accelerating automated R&D; government power to block deployment of unacceptably risky models, scoped narrowly with anti-favoritism protections; evaluation by either a government agency or authorized private organizations in a regulatory-markets model; strong weight security, red teaming, and penetration testing at AI companies; and prompt reporting of safety incidents. He notes a future may arrive when systems resemble weaponizable nuclear materials and demand harsher measures, but warns against designing for dangers that have not yet emerged.

    Macroeconomics and tax policy: growth and displacement together

    Here Amodei challenges the standard premise that growth is fragile and must be traded off against the drag of taxes or deficits to reduce inequality. Powerful AI, he suggests, may scramble that assumption by producing extremely rapid growth through accelerated science and efficiency, supercharged by AI building better AI, while simultaneously acting as a broad substitute for human cognition that reshapes the economy faster than any prior technology. The result could be a world stuck on a hypergrowth, hyper-inequality setting that is hard to unstick, where the central challenge is no longer incentivizing growth but sharing its benefits. He is careful to make two points clearly: first, enduring job displacement is undesirable and dangerous and should be minimized, and his warnings are meant to help society adapt, not to play prophet of doom; second, any response must address both economic provision and the deeper human need for meaning, purpose, and agency, which matters more and which policy cannot directly supply.

    His policy menu starts with measurement and tracking, arguing good policy is impossible without accurate data, and that governments could expand economic statistics well beyond Anthropic’s Economic Index. Next come pro-employment incentives such as wage insurance, retention tax incentives, workforce training grants, and employer-employee matching, costs he says society should readily accept since they are likely offset by AI productivity gains. If displacement proves large and permanent, he says long-term income support like universal basic income or universal capital accounts may be needed, financed through taxes on relevant companies or higher capital gains taxes. He closes the section by reframing datacenter and energy-price backlash as mostly a symbol of broader economic anxiety, while saying AI companies should absorb rate increases, as Anthropic has pledged.

    Accelerating AI’s positive impact: the slow-regulator problem

    For technologies accelerated by AI, rather than AI itself, Amodei flips his concern: the bigger danger is regulatory systems designed for a slower pace failing to handle the deluge of new products, and AI making downstream technologies safer in ways that violate the skeptical assumptions baked into agencies like the FDA. He focuses on biomedicine as the area likely to produce AI’s biggest humanitarian benefits and where regulation is especially complex. AI could greatly increase the rate of new drug candidates, improve their effect sizes and safety profiles, treat previously untreatable diseases, and create entirely new therapy categories the way antibodies, peptides, and cell therapies did.

    The current pipeline at the FDA and EMA takes 7 to 8 years, built on the pessimistic assumption that drug candidates usually fail and often carry safety problems even when they work. Without reform, AI will jam or overload that system. Amodei proposes that agencies develop standards now for accepting AI simulation and analysis, so they can be adopted quickly once proven rather than after years of unnecessary testing. Specific candidates include AI-based PD/PK modeling, toxicology prediction to reduce animal testing, more accurate dose selection, biomarker validation from large datasets, synthetic control arms, and surrogate endpoints (especially for aging and neurodegeneration). He urges more flexible accelerated-approval mechanisms generally, and notes biomedical acceleration may also reduce AI’s risks by aiding biodefense and improving mental health.

    The state and civil liberties: guarding against AI-driven tyranny

    Amodei frames the perennial balance between state power and individual liberty, enforced through machinery like the First, Fourth, and Fifth Amendments, the Posse Comitatus Act, and FISA, and argues AI threatens to upset that balance while raising its stakes. Powerful AI in the wrong hands could be the ultimate tool of autocracy, because the enormous returns to intelligence combined with AI’s pace create a perfect storm for a surprise seizure of power. The danger could take many forms but shares one feature: AI conferring sudden power while routing around democratic oversight. He cites a fully automated drone army that could obey unlawful orders, where trained humans might object, and a surveillance AI that analyzes widely available information at massive scale to infer the innermost details of every citizen’s life, an ability current civil liberties law never contemplated.

    His proposals: create accountability rules for autonomous weapons so they respond to court orders, legislation, and human overseers rather than blindly following orders, possibly with a judicial finger on an off switch; ban domestic use of fully autonomous weapons, including in law enforcement, while allowing them against foreign adversaries; close the bulk-collection and data-broker loophole that lets the government buy and analyze data Americans share with private companies; and guarantee public rights to AI advice at least as capable as what the government uses during adverse action, as an extension of the Administrative Procedure Act, due process, or the Sixth Amendment. He closes by warning that companies, not just governments, can capture the state, citing the Gilded Age and East India Company, and argues AI cannot be safely entrusted to either alone. Anthropic’s Long-Term Benefit Trust is offered as one accountability structure, with a call for the industry to go further.

    Securing leadership by democracies: a values-based coalition

    Amodei rejects treating AI as a mere instrument of trade policy to diffuse a tech stack worldwide. He believes AI resets the entire geopolitical game board like nuclear weapons, potentially even more so, becoming the dominant source of military and economic power for whoever holds it. In a virtual country of 100 million geniuses, millions could be assigned to military strategy, drone manufacture, weapons R&D, intelligence, and scientific advancement at once, so a nation with powerful AI facing one without it, or even three years behind, could be like WWII Marines against medieval swordsmen. Because powerful AI also enables deeper autocratic repression, it matters enormously that the world’s strongest nations are democracies.

    His answer is a global coalition built on shared democratic values that draws in the rest of the world by making membership increasingly attractive and exclusion increasingly costly. Operating principles include managing the AI supply chain by sharing chips and semiconductor manufacturing equipment within the coalition while denying them to adversaries, expanding and tightening export controls (he cites MATCH and OVERWATCH as good first steps); coordinating on biological, cyber, and autonomy risk to make compliance compatible and effective; sharing AI’s benefits including harmonized medical approvals; mutual defense through collective AI cyberdefense, drones, manufacturing, compute, and intelligence; rejection of AI-powered repression; and macroeconomic cooperation against contagious employment crises. The coalition would respect each nation’s sovereignty, start with aligned democracies, and grow iteratively, ideally toward the whole world, but at minimum positioning democracies to contain and outcompete repressive regimes.

    A window of opportunity

    Amodei closes on cautious optimism. The same exponential that strains policymaking has created a unique opening: clear evidence of AI’s risks, an early taste of its value and disruption, and public backlash against unregulated approaches have left policymakers unusually open to forward-looking action. Treebeard and his forest are waking up. He firmly rejects the industry-circle view that this is a PR problem solved by better marketing, arguing people are worried because the risks are real, and that public concern in response to transparency is democratic accountability working as it should. The key challenge is focusing that concern into constructive solutions rather than letting it descend into formless anger and violence. He is optimistic because issues from job displacement to model testing to export controls have common-sense appeal across the political spectrum, and a broad nonpartisan coalition could adopt sane, forward-looking policy faster than usual.

    Notable Quotes

    “in only four years, AI models have gone from barely being able to write a coherent line of code to writing most of the code at major AI companies.”

    Dario Amodei, on the pace of the AI exponential

    “in the several years that it can take Congress to act, AI can go from an amusing toy to the full country of geniuses.”

    Dario Amodei, on the mismatch between AI’s speed and the speed of legislation

    “However, now the risks are clearly here. It is time to go beyond transparency to more serious and binding regulation of AI.”

    Dario Amodei, marking the shift from transparency to binding rules

    “enduring job displacement is undesirable and dangerous, and we should do everything we can to minimize or prevent it, not to bring it about.”

    Dario Amodei, clarifying his stance on AI and jobs

    “The key challenge in such a world won’t be incentivizing growth, but finding a way for everyone to share in the benefits.”

    Dario Amodei, on a hypergrowth, hyper-inequality economy

    “Powerful AI in the wrong hands could be the ultimate tool of autocracy, and our existing legal and constitutional protections are not fully equipped to counter this threat.”

    Dario Amodei, on AI and civil liberties

    “A nation that possesses powerful AI facing one without it … could be the equivalent of an army of World War II Marines facing an army of medieval swordsmen.”

    Dario Amodei, on AI as the dominant source of geopolitical power

    “People are worried about AI because they correctly perceive that its risks are real, not because AI CEOs have been insufficiently Panglossian.”

    Dario Amodei, rejecting the idea that AI has a PR problem

    “Treebeard and his forest are waking up.”

    Dario Amodei, on policymakers’ new openness to acting on AI

    “Policy on the AI Exponential” is a dense, structured argument from one of the most consequential figures in the field, and it rewards a full read in the original. The summary and analysis above are a guide, not a substitute. You can read the full essay here.

    Related Reading

  • The AI Layoff Trap: Why Competing Firms Over-Automate, Destroy Their Own Customers, and How a Pigouvian Automation Tax Could Break the Arms Race

    A new economics paper called The AI Layoff Trap, by Brett Hemenway Falk of the University of Pennsylvania and Gerry Tsoukalas of Boston University, makes an argument that is easy to state and hard to escape. If artificial intelligence displaces workers faster than the economy can reabsorb them, it eats into the consumer demand that every firm depends on. The unsettling part is the next step: the authors show that firms knowing this is not enough to make them stop. Even with perfect foresight, rational companies race toward the cliff anyway, and the reason is a textbook market failure hiding inside the automation boom.

    TLDR

    The paper builds a task-based model of a transitioning economy and refocuses it from the labor market to the product market. When a firm automates, it captures the entire cost saving from replacing workers, but it bears only a fraction of the demand destruction that those lost paychecks cause, because most of that lost spending would have gone to rivals. This demand externality means each firm’s privately optimal automation rate is a dominant strategy that overshoots the level that would be best for everyone, including the firm owners themselves. Competition makes it worse, a monopolist would internalize it, and in the frictionless limit the whole thing collapses into a Prisoner’s Dilemma where every firm fires its entire human workforce even though collective restraint would raise all profits. Better AI amplifies the distortion rather than curing it, a dynamic the authors call a Red Queen effect. They test six policy responses. Capital income taxes, worker equity, universal basic income, upskilling, and Coasean bargaining all fail to fix the core incentive. Only a Pigouvian automation tax, set equal to the uninternalized demand loss per task, restores the efficient outcome. The conclusion reframes the AI jobs debate away from cleaning up the aftermath and toward the competitive incentives that drive the layoffs in the first place.

    Thoughts

    The cleverest move in this paper is where it points the camera. Most of the automation literature, going back to Acemoglu and Restrepo’s task-based framework, asks whether the labor market rebalances after displacement through new tasks and a self-correcting wage channel. Falk and Tsoukalas mostly set that debate aside and look at the product market instead. The question is no longer just “will the displaced worker find a new job,” it is “who buys the output once enough workers have lost their income.” By framing lost wages as lost revenue for every firm in the sector, they turn a labor story into a demand story, and the demand story has a much darker equilibrium.

    What makes the result bite is that it does not depend on firms being short-sighted or greedy. The authors grant every firm perfect foresight. Everyone can see the demand cliff ahead. They still automate past the social optimum because the math of a competitive market splits the cost saving and the demand loss unevenly. You keep all the savings from firing your workers. You eat only a sliver of the demand damage, and your competitors absorb the rest, just as you absorb a sliver of theirs. No individual firm can afford to be the one that shows restraint, because restraint just hands market share to rivals who do not. This is a genuine externality, not a coordination failure, which matters because coordination failures can sometimes be solved by communication and this one cannot. Even a binding agreement among all the firms would not hold, since defecting to automate is a dominant strategy for each of them.

    The Red Queen result is the part that should give AI optimists pause. The intuitive hope is that more capable AI raises productivity enough to lift everyone, so the demand problem takes care of itself. The model says the opposite. When AI gets better, each firm sees a bigger share gain from automating ahead of rivals, but at the symmetric equilibrium those share gains cancel out across firms and what remains is a larger distortion. Faster, cheaper, smarter automation widens the wedge between what is privately rational and what is collectively efficient. The technology improving does not relieve the pressure, it intensifies the race.

    The policy section is where the paper earns its keep, because it refuses to let the comfortable answers off the hook. Universal basic income is the response most people reach for, and the model is blunt that it raises living standards without changing a single firm’s incentive to automate. It treats the symptom and ignores the margin. Upskilling and worker equity narrow the gap but cannot close it. Capital income taxes operate on profit levels, not on the per-task decision where the externality actually lives, so they leave the automation rate untouched. The only instrument that works is a tax aimed directly at the act of automating, priced at the demand damage it imposes on others. That is an uncomfortable conclusion for almost everyone. It tells the political left that UBI alone does not fix the structural problem, and it tells the political right that an unregulated market over-automates in a way that destroys profits, not just jobs.

    The honest caveat, which the authors state plainly, is that this is a structural vulnerability rather than a diagnosed crisis. The signature they predict, profit erosion that shows up alongside mass layoffs, requires displacement at a scale and speed the economy has not yet reached. If reabsorption keeps pace, the externality stays too small to measure. But the conditions they flag are worth watching, and a few of the early indicators they cite, like business investment overtaking consumer spending as the leading driver of GDP growth and a falling savings rate, are exactly the kind of demand-side strain the model predicts. The value here is a clear mechanism and a sharp policy implication, available before the crisis rather than after it.

    Key Takeaways

    • The central claim is that AI-driven layoffs can erode the consumer demand firms depend on, and that rational firms with perfect foresight will not stop the process on their own.
    • The mechanism is a demand externality. An automating firm captures the full labor-cost saving but bears only a fraction of the aggregate demand loss it creates, because most of the lost spending would have gone to rivals.
    • Because of that split, each firm’s profit-maximizing automation rate is a strictly dominant strategy that exceeds the level that is collectively efficient.
    • The resulting loss is not a transfer from workers to owners. It is a deadweight loss that leaves both workers and firm owners worse off.
    • The distortion deepens with competition. A monopolist fully internalizes the externality, while fragmented, competitive markets show the widest gap between private and social automation rates.
    • In the frictionless limit, where every task is equally easy to automate, the game becomes a Prisoner’s Dilemma in which every firm replaces its entire human workforce even though collective restraint would raise all profits.
    • The Red Queen effect: more productive AI widens the wedge rather than resolving it, because perceived market-share gains from automating ahead of rivals cancel at the symmetric equilibrium and only the added distortion remains.
    • Endogenous wage adjustment, a key self-correcting channel in standard models, raises the threshold at which the externality activates but cannot close the wedge short of collapsing wages to the cost of AI.
    • Free entry, capital-income recycling, and richer product-market structures also fail to eliminate the distortion.
    • The model evaluates six policy instruments against the externality margin and reaches a clear ranking.
    • Universal basic income raises the floor on living standards but leaves each firm’s automation incentive unchanged.
    • Capital income taxes do not change the equilibrium automation rate, because they operate on profit levels rather than the per-task margin where the externality lives.
    • Upskilling and worker equity participation narrow the wedge but cannot eliminate it.
    • Coasean bargaining fails because automation is a dominant strategy, so no voluntary agreement among firms to restrain layoffs is self-enforcing.
    • Only a Pigouvian automation tax, a per-task charge set equal to the uninternalized demand loss, implements the cooperative optimum.
    • The tax can be self-limiting. Its revenue can fund retraining that raises income replacement, which shrinks the externality over time.
    • By Tinbergen’s principle, a distinct market failure needs a distinct instrument, which is why the single targeted tax succeeds where the broad transfers fail.
    • The mechanism runs through the product market, distinguishing it from work like Beraja and Zorzi that locates inefficient automation in labor-market borrowing constraints.
    • Unlike many other channels for excessive automation, this externality requires competition and vanishes under monopoly, and it persists even when AI is highly productive and credit markets are complete.
    • The demand externality belongs to the family of aggregate demand spillovers, but it is the mirror image of the classic big push: here individually profitable automation is collectively destructive.
    • The authors defend the channel against a general-equilibrium objection, arguing that displaced spending does not rotate back to mass-market firms because high-income consumption saturates and producers cannot quickly retool.
    • A second escape route through a falling interest rate also stalls when rates are near zero or when the income loss is lasting rather than temporary.
    • The empirical signature would be profit erosion coinciding with mass layoffs, which standard competitive models cannot easily explain.
    • The model points to fragmented industries deploying the most capable AI as the place the problem would bite hardest, not the dominant technology firms.
    • Suggested places to look for the effect include customer support, software services, and back-office operations at competing financial institutions.
    • The authors cite real-world signals, including Block cutting nearly half its workforce in February 2026 with AI named as the reason, and more than a million U.S. job cuts announced in 2025 with AI explicitly tied to roughly 55,000.
    • They note that roughly 80% of U.S. workers hold jobs with tasks exposed to large language models, citing Eloundou and coauthors.
    • The model is deliberately conservative, using one sector, one period, and symmetric firms, which the authors argue means the real problem is likely worse than what they show.
    • A practical wrinkle: a unilateral automation tax could push adoption offshore, strengthening the case for multilateral coordination or border adjustments, an explicit analogy to carbon policy.
    • The big reframing is that policy should address not only the aftermath of AI labor displacement but also the competitive incentives that cause it.

    Detailed Summary

    A task-based model refocused on the product market

    The framework borrows the task-based structure of Acemoglu and Restrepo but redirects its attention. Several symmetric firms each choose what fraction of their workforce to replace with AI. Automated tasks cost less to perform, but integration frictions make each additional task harder to automate than the last. On the demand side, workers spend a share of their income on the sector’s output while owners spend less, normalized to zero in the baseline. Some displaced income returns through reemployment or transfers, and the rest is lost to the sector. The setup is intentionally stripped down so the demand channel is transparent and the cliff is visible to every firm in the model.

    The demand externality that traps every firm

    Competition creates the trap. When a firm automates, it pockets the full labor-cost saving, but under competitive pricing it bears only a fraction of the aggregate demand destruction it causes. The rest spills onto rivals. Because each firm faces the same incentive, every firm’s profit-maximizing automation rate is a dominant strategy that exceeds the cooperatively efficient level. Foresight does not save them. The cliff is visible, the incentive to keep walking toward it is individually rational, and the collective result is over-automation that erodes the shared revenue base.

    Competition deepens it, monopoly internalizes it

    The size of the distortion depends on market structure. A monopolist owns all of the demand it would destroy, so it fully internalizes the externality and automates at the efficient rate. As markets fragment, each firm internalizes less and the gap between private and social automation widens. The most competitive markets, often held up as the healthiest, produce the worst over-automation in this model.

    The frictionless limit becomes a Prisoner’s Dilemma

    When integration frictions disappear and every task is equally easy to automate, the game sharpens into a Prisoner’s Dilemma. Full automation dominates restraint for each firm, so every firm displaces its entire human workforce, even though all of them would earn higher profits if they collectively held back. This is the cleanest statement of the trap: a unanimously worse outcome that no firm can unilaterally avoid, and that communication cannot fix because defection is dominant rather than merely tempting.

    The Red Queen effect: better AI makes it worse

    Higher AI productivity does not rescue the equilibrium. Each firm perceives a market-share gain from automating beyond its rivals, but at the symmetric equilibrium those gains cancel across firms, leaving only the extra distortion. So improvements in AI widen the wedge instead of closing it. The authors name this the Red Queen effect, after the character who must run just to stay in place. Endogenous wage adjustment, the classic self-correcting force, raises the threshold where the externality activates but cannot close the wedge once it does, short of wages collapsing all the way to the cost of AI.

    Six policy fixes, and why only one works

    The paper lines up six instruments against the externality. Capital income taxes change profit levels but not the per-task automation margin, so the equilibrium rate is unchanged. Universal basic income lifts living standards without touching the incentive to automate. Upskilling and worker equity narrow the wedge but leave a gap. Coasean bargaining cannot hold because automating is a dominant strategy, so no agreement is self-enforcing. Only a Pigouvian automation tax, set equal to the uninternalized demand loss per task, implements the cooperative optimum. Its revenue can fund retraining that raises income replacement, which shrinks the externality over time and can make the tax self-limiting. Tinbergen’s principle frames the lesson: a distinct market failure needs its own dedicated instrument.

    Does the channel survive general equilibrium?

    A natural objection is that in a frictionless multi-sector economy, displaced income would simply rotate to other spending and the mechanism would dissolve. The authors argue both escape routes are blocked for the mass-market firms most exposed to AI. Spending does not rotate back because high-income consumption saturates and mass-sector producers cannot quickly retool to capture redirected luxury demand. The other route runs through the interest rate: automation shifts income to owners who save more, raising aggregate saving, which a falling interest rate would normally recycle into investment. That adjustment stalls when rates are already near zero or when the income loss is lasting rather than temporary, so displaced workers cannot borrow their way through it.

    What to watch for in the real economy

    The distinguishing empirical signature would be profit erosion that shows up at the same time as mass layoffs, a combination standard competitive models struggle to explain since cost-cutting technology is supposed to raise profits. The authors are careful that this requires displacement at a scale and speed not yet reached, so the contribution is identifying a structural vulnerability rather than diagnosing an active crisis. They point to fragmented industries running the most capable AI as the place to look first, naming customer support, software services, and competing financial institutions’ back-office operations as concrete settings. They also flag a unilateral tax’s offshoring risk, drawing an explicit parallel to carbon policy and the case for multilateral coordination or border adjustments.

    Notable Quotes

    “At the limit, this becomes self-destructive: firms automate their way to boundless productivity and zero demand.”

    The authors, framing the demand cliff that competitive automation runs toward.

    “Rational, forward-looking firms should be the brake; if the cliff ahead is visible to all, why would they race toward it?”

    The authors, setting up the puzzle the paper exists to answer.

    “No firm can afford to be the one that holds back. This is the trap: an automation arms race that only intensifies as AI improves, that leaves workers and firm owners alike worse off, and that no market force can break.”

    From the Discussion, stating the core result in plain language.

    “Because over-automation leaves both firms and workers worse off, correcting it is a matter of eliminating waste, not of redistributing gains between them.”

    The authors, on why the fix is not a left-versus-right transfer fight.

    “This Red Queen effect means that ‘better’ AI, far from mitigating the externality, amplifies it.”

    The authors, on why more capable AI deepens the distortion rather than curing it.

    “The results suggest that policy should address not only the aftermath of AI labor displacement but also the competitive incentives that drive it.”

    From the abstract, the paper’s central policy reframing.

    You can read the full paper, including the formal propositions and the policy table, on arXiv here.

    Related Reading

  • Claude Fable 5 and Claude Mythos 5: Anthropic Ships Its First Generally Available Mythos-Class AI Model With New Safeguards

    Anthropic has launched Claude Fable 5 and Claude Mythos 5, the first Mythos-class models offered beyond a tiny circle of cyber defenders. Fable 5 is the generally available version, wrapped in a new layer of safeguards, while Mythos 5 is the same underlying model with some of those guardrails lifted for a small group of vetted partners. The pair sits a full tier above the Opus class in raw capability, and the launch is as much a story about how Anthropic is choosing to gate that capability as it is about the benchmarks. Below is a full breakdown of what shipped, what the model can do, and why the safeguard design matters.

    TLDR

    Anthropic released Claude Fable 5, a Mythos-class model that is now its most capable generally available model, posting state-of-the-art results across software engineering, knowledge work, vision, memory, and scientific research. To ship it safely and fast, Fable 5 carries new safety classifiers that route flagged queries in cybersecurity, biology and chemistry, and distillation over to Claude Opus 4.8 instead of refusing, a fallback that triggers in under 5% of sessions. The same model ships without cyber safeguards as Claude Mythos 5 for Project Glasswing partners in collaboration with the US Government, where it is described as having the strongest cybersecurity capabilities of any model in the world. Highlights include a codebase-wide migration of a 50-million-line Ruby codebase that Stripe says took a day instead of two months, beating Pokemon FireRed with a vision-only harness, accelerating drug design roughly tenfold using Mythos 5, producing novel molecular biology hypotheses preferred by scientists about 80% of the time, and over a week of autonomous genomics research. Both models cost 10 dollars per million input tokens and 50 dollars per million output tokens, less than half the price of Mythos Preview, with a staged subscription rollout and a new 30-day data retention policy for Mythos-class traffic.

    Thoughts

    The most interesting decision here is not the capability jump, it is the naming split. Fable and Mythos are the same brain. The only difference is whether the safeguards are on. Anthropic is effectively shipping one model twice: a gated public edition and an ungated edition handed to a short list of trusted defenders working with the US Government. That is a clean way to resolve the central tension of frontier AI, which is that the exact capabilities that help a security professional close a vulnerability also help an attacker find one. Rather than dumbing the model down for everyone or holding it back entirely, they are letting the access list, not the weights, carry the risk. Expect this pattern to repeat as capabilities climb.

    The fallback-to-Opus design is the other quietly important choice. When a classifier flags a query in cybersecurity, biology, chemistry, or suspected distillation, the user does not hit a wall of refusal. The request is silently handed to Opus 4.8, a model that is still excellent at almost everything. Graceful degradation beats a hard no, both for user experience and for trust. It also reframes what a safeguard is. Instead of a binary block, it becomes a routing decision, and because more than 95% of sessions never trigger it, most users will never notice it exists. The honest admission that the classifiers are tuned conservatively and will sometimes catch harmless requests is the right posture, even if it will annoy power users who keep getting bounced to the smaller model.

    The commercial signals are worth reading closely. Pricing came down to less than half of Mythos Preview, which suggests confidence in serving costs at scale, but the subscription rollout tells a more cautious story. Fable 5 is free on Pro, Max, Team, and Enterprise plans only through June 22, after which using it requires usage credits until capacity catches up. That is a polite way of saying demand is expected to badly outrun supply. The model is fully available on the API and consumption-based Enterprise plans from day one, because those bill by the token and self-throttle. Subscriptions, which are all-you-can-eat, are where a capacity crunch actually hurts, so that is exactly where the brakes went on.

    On the science, the genomics result is the one that should make people sit up. A model doing over a week of largely autonomous research, assembling single-cell data across 138 species, then designing and training its own machine learning model that outperforms a recently published Science paper while being 100 times smaller, is a different category of claim than acing a benchmark. So is the drug-design work, where Mythos 5 reportedly matches or beats skilled human operators end to end, choosing binding sites, running protein design tools, and recovering from its own failures. If those hold up to publication and independent replication, the interesting frontier stops being chat quality and becomes whether a model can run a research program. That is also precisely why the biology and chemistry classifier exists, and why Anthropic is being so deliberate about who gets the ungated version.

    One caveat worth keeping in view: nearly all of the evidence in the announcement is Anthropic’s own, or comes from partners with early access and an incentive to be enthusiastic. The Stripe migration, the FrontierCode score, the Slay the Spire memory result, the protein targets, and the genomics model are all compelling, but they are first-party until outside labs and the eventual system card, peer review, and independent red-teamers weigh in. The note that the UK AISI made progress toward a universal jailbreak inside a brief testing window is a useful reminder that the safeguard story is a work in progress, not a finished proof.

    Key Takeaways

    • Claude Fable 5 is a Mythos-class model made safe for general use, and is now Anthropic’s most capable generally available model.
    • Mythos-class is a tier that sits above the Opus class in capability. The first was Claude Mythos Preview, released in April through Project Glasswing.
    • Fable 5 is state-of-the-art on nearly all tested benchmarks, and its lead grows as tasks get longer and more complex.
    • Claude Mythos 5 is the same underlying model as Fable 5, but with safeguards lifted in some areas. Fable and Mythos differ only by their safeguards.
    • Mythos 5 is described as having the strongest cybersecurity capabilities of any model in the world, and is deployed through Project Glasswing with the US Government.
    • New safety classifiers cover cybersecurity, biology and chemistry, and distillation. Flagged queries fall back to Claude Opus 4.8 rather than being refused.
    • Users are told whenever a fallback happens. More than 95% of Fable sessions involve no fallback at all, and for those sessions Fable performs effectively the same as Mythos 5.
    • The safeguards are tuned conservatively and trigger in less than 5% of sessions on average, sometimes catching harmless requests. Anthropic plans to reduce false positives after launch.
    • Stripe reported Fable 5 compressed months of engineering into days, performing a codebase-wide migration of a 50-million-line Ruby codebase in a day that would have taken a team over two months by hand.
    • Fable 5 scores highest among frontier models on Cognition’s FrontierCode evaluation for high-quality agentic coding, even at medium effort, and is more token-efficient than past Claude models.
    • On Hebbia’s Finance Benchmark for senior-level reasoning, Fable 5 has the highest score of any model, with gains in document reasoning, chart and table interpretation, and problem solving.
    • IMC noted Fable 5 aced their trading-analysis evaluations nearly across the board, including factual lookup, conceptual reasoning, root-cause analysis, and expected-value analysis.
    • Fable 5 is the new state-of-the-art for vision, and can rebuild a web app’s source code from screenshots alone.
    • Fable 5 beat Pokemon FireRed using a minimal, vision-only harness with no maps, navigation aids, or extra game-state information. Earlier Claude models needed a complex helper harness.
    • Persistent file-based memory improved Fable 5’s Slay the Spire performance three times more than it did for Opus 4.8, and Fable reached the game’s final act three times more often.
    • Fable 5 built a simulation of the solar system, deriving the planets’ orbital motion from physics first principles and using it to predict solar eclipses.
    • Using Mythos 5, internal protein design experts accelerated aspects of drug design by around ten times, with the model matching or beating skilled human operators end to end.
    • Nine of 14 protein targets in the drug-design study yielded strong candidates Anthropic is now investigating.
    • Mythos 5 is Anthropic’s first model to consistently produce novel, compelling scientific hypotheses. Scientists preferred its molecular biology hypotheses about 80% of the time in blinded comparisons.
    • One Mythos hypothesis, a novel mechanism for an E. coli protein, was corroborated by an independent lab working on the same problem.
    • In over a week of largely autonomous work, Mythos 5 assembled single-cell data for millions of cells across 138 animal species and trained a custom model that outperformed a recent Science paper while being 100 times smaller.
    • Anthropic’s automated alignment assessment found Mythos 5’s level of misaligned behavior was low and similar to Opus 4.8. Because they are the same model, Fable 5’s alignment is similar.
    • An external bug bounty produced no universal jailbreaks in over 1,000 hours of testing, though the UK AISI made progress toward one in a brief initial window.
    • One external partner found Fable 5’s safeguards against harmful cyber queries the most robust of any model tested, including Opus 4.8 and Opus 4.7, with zero compliance on harmful single-turn cyberattack requests.
    • The biology and chemistry classifier is deliberately broad for now. Mythos-class models outperformed dedicated protein language models at predicting AAV viral shell assembly using biological reasoning alone.
    • The distillation classifier targets large-scale attempts to extract Claude’s capabilities to train competing models, which could proliferate near-frontier capabilities without safeguards.
    • A new policy requires 30-day data retention for all Mythos-class traffic on first- and third-party surfaces, used only for safety, with logged human access and deletion after 30 days in almost all cases.
    • Anthropic plans trusted access programs that let cybersecurity organizations apply for Mythos 5, and let a small number of life science researchers access Fable 5 with biology and chemistry safeguards removed.
    • Both models cost 10 dollars per million input tokens and 50 dollars per million output tokens, less than half the price of Mythos Preview. Developers can use claude-fable-5 via the Claude API.
    • Fable 5 is free on Pro, Max, Team, and seat-based Enterprise plans through June 22. On June 23 it moves to usage credits on those plans until capacity allows it to return as a standard inclusion.

    Detailed Summary

    A Mythos-class model, made safe for general use

    Fable 5 is the first Mythos-class model Anthropic has made generally available. Mythos-class is a tier that sits above the Opus class, and the first of its kind, Claude Mythos Preview, was released in April through Project Glasswing to a limited group of cyber defenders and critical software infrastructure providers. The company framed today’s launch as the moment it could finally bring that level of capability to all users, because its safeguards had matured enough to allow it. Fable 5’s capabilities exceed those of any model Anthropic has made generally available, and its advantage over other models grows as tasks get longer and more complex.

    Two models, one brain

    Claude Mythos 5 is the same underlying model as Fable 5, but with safeguards lifted in some areas. The names are the only real difference: Fable, from the Latin fabula meaning that which is told, is akin to the Greek mythos, and the safeguards are what distinguish the two. Mythos 5 launches first to existing Mythos Preview users, including the Project Glasswing cybersecurity partners, as an upgrade. It is deployed in collaboration with the US Government and is described as having the strongest cybersecurity capabilities of any model in the world. Anthropic plans to steadily expand access through a more systematic trusted access program.

    Software engineering and token efficiency

    Fable 5 can work autonomously for longer than any previous Claude model, and software engineering is where that shows most clearly. During early testing, Stripe reported it compressed months of engineering into days, performing a codebase-wide migration in a 50-million-line Ruby codebase in a single day that would otherwise have taken a whole team over two months by hand. It is also more token-efficient than past models, scoring highest among frontier models on Cognition’s FrontierCode evaluation for high-quality, maintainable agentic coding, even at medium effort.

    Knowledge work, vision, and memory

    On complex analytical work, Fable 5 posted the highest score of any model on Hebbia’s Finance Benchmark for senior-level reasoning, with substantial gains in document-based reasoning and chart and table interpretation, and IMC said it aced their trading-analysis evaluations nearly across the board. In vision, it is the new state-of-the-art, able to extract precise numbers from detailed scientific figures and rebuild a web app’s source code from screenshots alone. It needs less scaffolding too: where earlier Claude models struggled to play Pokemon even with helper harnesses, Fable 5 beat FireRed with a minimal, vision-only harness using nothing but raw game screenshots. On memory, giving Fable persistent file-based notes improved its Slay the Spire performance three times more than it did for Opus 4.8, and it built a physics-first-principles solar system simulation accurate enough to predict solar eclipses.

    Life sciences: drug design, hypotheses, and genomics

    Using Mythos 5, Anthropic’s internal protein design experts accelerated aspects of the drug-design process by around ten times. With protein design and bioinformatics tools but no human assistance, the model matched or beat skilled human operators, executing the full workflow of choosing binding sites, selecting and running design tools, and recovering from failures. Nine of 14 protein targets yielded strong drug-design candidates now under investigation. Mythos 5 is also Anthropic’s first model to consistently produce novel, compelling scientific hypotheses: scientists preferred its molecular biology hypotheses about 80% of the time in blinded comparisons, and one, a novel mechanism for an E. coli protein, was corroborated by an independent lab. In genomics, Mythos 5 ran over a week of largely autonomous research, assembling single-cell data for millions of cells across 138 species and training a custom model that outperformed a recent Science paper despite being 100 times smaller.

    The new safeguards: classifiers and fallback

    Mythos-class capability is potent enough that Anthropic considers it a substantial misuse risk, especially given how much advanced AI usage is dual use. Fable 5 ships with a new set of classifiers, separate AI systems that detect potential misuse and jailbreak attempts and stop the main model from responding. When a classifier flags a request related to cybersecurity, biology and chemistry, or distillation, the response is handled by Claude Opus 4.8 instead, and the user is told. The cybersecurity classifiers cover both exploitation and broader offensive cyber tasks like reconnaissance and lateral movement, and Anthropic says they prevent Fable from making any progress on those tasks. The biology and chemistry classifier is intentionally broad for now, after tests showed Mythos-class models could outperform dedicated protein language models at predicting AAV viral shell assembly using biological reasoning alone. The distillation classifier targets large-scale attempts to extract Claude’s capabilities to train competing models.

    Jailbreak resistance, data retention, and availability

    Anthropic ran extensive red-teaming, including an external bug bounty that produced no universal jailbreaks in over 1,000 hours, though it notes the UK AISI made progress toward one in a brief window. The company concedes it is likely impossible to fully prevent universal jailbreaks and aims instead to make any that remain slow and costly enough to catch before they scale. A new policy requires 30-day data retention for all Mythos-class traffic, used only for safety, with logged human access and deletion after 30 days in almost all cases. On availability, Fable 5 is live everywhere today and fully available on the API and consumption-based Enterprise plans, while subscription access rolls out in stages: free on Pro, Max, Team, and seat-based Enterprise through June 22, then on usage credits from June 23 until capacity allows it to return as a standard inclusion. Both models cost 10 dollars per million input tokens and 50 dollars per million output tokens.

    Notable Quotes

    “Today we’re launching Claude Fable 5: a Mythos-class model that we’ve made safe for general use.”

    Anthropic, opening the Claude Fable 5 and Claude Mythos 5 announcement

    “Fable 5’s capabilities exceed those of any model we’ve ever made generally available.”

    Anthropic, on where Fable 5 sits in the lineup

    “It has the strongest cybersecurity capabilities of any model in the world.”

    Anthropic, describing Claude Mythos 5

    “During early testing, Stripe reported that Fable 5 compressed months of engineering into days.”

    Anthropic, on Fable 5’s software engineering results

    “Our early data shows that more than 95% of Fable sessions involve no fallback at all.”

    Anthropic, on how often the safeguards route to Opus 4.8

    “Mythos 5 is our first model to consistently produce novel, compelling scientific hypotheses.”

    Anthropic, on the model’s molecular biology research

    “It is likely impossible to completely prevent universal jailbreaks, but our goal is to make any remaining jailbreaks sufficiently slow and costly that we can detect and prevent them before they are used at scale.”

    Anthropic, on the limits of its safeguards

    “Fable is from the Latin fabula, ‘that which is told,’ akin to the Greek mythos. The safeguards are what distinguish the two models.”

    Anthropic, explaining the Fable and Mythos naming

    Read the full announcement and the benchmark tables on Anthropic’s site here: Claude Fable 5 and Claude Mythos 5.

    Related Reading

  • Anthropic Raises $65 Billion Series H at $965 Billion Valuation to Fund AI Safety Research and Massive Compute Expansion

    Anthropic has closed one of the largest private financing rounds in the history of technology, raising $65 billion in Series H funding at a $965 billion post-money valuation. The round, announced on May 28, 2026, lands as demand for Claude reaches what the company calls historic levels, and it positions Anthropic to pour fresh capital into safety research, compute, and the products that enterprises now lean on every day.

    TLDR

    Anthropic raised $65 billion in its Series H at a $965 billion post-money valuation, with Altimeter Capital, Dragoneer, Greenoaks, and Sequoia Capital leading and Capital Group, Coatue, D1 Capital Partners, GIC, ICONIQ, and XN co-leading, alongside $15 billion in previously committed hyperscaler investment that includes $5 billion from Amazon. The raise follows Anthropic crossing $47 billion in run-rate revenue earlier in May 2026, and it funds three priorities named by CFO Krishna Rao: advancing safety and interpretability research, expanding compute capacity to meet growing Claude demand, and scaling the products and partnerships customers depend on. On the infrastructure side, the company is locking in gigawatt-scale compute through 5 gigawatts with Amazon, 5 gigawatts of TPU capacity via Google and Broadcom, GPU access from SpaceX, and supply from partners Micron, Samsung, and SK hynix, while Claude remains available across all three major cloud platforms, AWS, Google Cloud, and Microsoft Azure, with widespread enterprise adoption across industries.

    Thoughts

    Start with the number that everyone will fixate on. A $965 billion post-money valuation against $47 billion in run-rate revenue is roughly 20 times sales, and for a company growing this fast that multiple is not the interesting part. The interesting part is that run-rate revenue crossed $47 billion earlier this month, which means the denominator is moving so quickly that the multiple is already stale. Investors are not pricing the business Anthropic is today. They are pricing the slope. A 20x multiple on a number that may double again inside a year is a very different bet than 20x on a flat line, and the lead names here (Altimeter, Dragoneer, Greenoaks, Sequoia, with Capital Group, Coatue, GIC and others co-leading) are not the kind of capital that pays for nostalgia. They are paying for the second derivative.

    But the real story is not the valuation. It is the compute. Read the infrastructure list carefully and you see the actual problem this round solves: 5 gigawatts from Amazon, 5 gigawatts of TPU capacity through Google and Broadcom, GPU access from SpaceX, and memory supply locked down with Micron, Samsung, and SK hynix. That is more than 10 gigawatts of secured power and silicon. The constraint on frontier AI in 2026 is no longer talent or even algorithms. It is electricity, land, and the multi-year queue for advanced packaging and high-bandwidth memory. You cannot buy 10 gigawatts on a quarterly basis. You reserve it years out, and you need the balance sheet to make those commitments credible. A $65 billion raise is, in plain terms, the down payment that lets Anthropic sign for capacity nobody can conjure on demand. The money is downstream of the megawatts.

    The diversification across that compute stack matters as much as the size. By splitting between Amazon’s infrastructure, Google and Broadcom’s custom TPUs, and SpaceX-supplied GPUs, Anthropic is refusing to become hostage to any single supplier’s roadmap or pricing. Custom silicon through Broadcom in particular is a bet on bending the cost curve, because the long-term economics of serving Claude at this scale depend on dollars per token, not just on raw availability. Anyone who has watched cloud lock-in play out over the last decade understands the move. Optionality at the hardware layer is leverage, and leverage is what keeps margins from being dictated by whoever owns the only fab slot you can reach.

    It is worth pausing on the fact that the round explicitly funds safety and interpretability research alongside scaling, and not as a footnote. Most companies treat safety spend as a cost center to be minimized once growth kicks in. Naming it first, ahead of compute and products, is a statement about where Anthropic believes its durable advantage sits. If models keep getting more capable, the binding constraint on deployment inside regulated industries (finance, healthcare, government) becomes trust, not intelligence. Interpretability is the work that turns a black box into something an enterprise risk committee can actually sign off on. Framed that way, safety research is not philanthropy subtracted from the bottom line. It is the thing that unlocks the most lucrative and defensible parts of the market, and pairing it with the scaling budget is the tell.

    Finally, look at distribution. Claude now ships on all three major clouds at once: AWS, Google Cloud, and Microsoft Azure. In a market where most frontier labs are tethered to a single hyperscaler, being available everywhere enterprises already run their workloads is a structural edge. It removes the procurement friction of asking a customer to adopt a new vendor relationship, and it means Anthropic competes on the merits of the model rather than on which cloud a buyer happened to standardize on years ago. Combine that omnipresent distribution with the compute reservations and the explicit safety mandate, and the shape of the strategy is clear. This is not a company buying time. It is a company buying the three things that actually compound: capacity that cannot be rushed, trust that cannot be faked, and reach into every place where work already happens.

    Key Takeaways

    • Anthropic raised $65 billion in its Series H funding round, one of the largest private financings in the history of the technology industry.
    • The round set Anthropic’s post-money valuation at $965 billion, placing the company within reach of the $1 trillion mark.
    • Altimeter Capital, Dragoneer, Greenoaks, and Sequoia Capital led the Series H round.
    • Capital Group, Coatue, D1 Capital Partners, GIC, ICONIQ, and XN served as co-leads on the investment.
    • The new capital builds on $15 billion in previously committed hyperscaler investments, which includes $5 billion from Amazon.
    • Anthropic crossed $47 billion in run-rate revenue earlier in May 2026, reflecting the surging commercial demand for Claude.
    • A core priority for the funding is to advance Anthropic’s safety and interpretability research.
    • The company will use the capital to expand compute capacity in order to meet growing demand for Claude.
    • Anthropic plans to scale the products and partnerships that customers depend on across its business.
    • CFO Krishna Rao said the funding will help Anthropic serve the historic demand it is experiencing, stay at the research frontier, and bring Claude to more of the places where work happens.
    • Amazon is providing 5 gigawatts of compute capacity as part of Anthropic’s infrastructure expansion.
    • Google and Broadcom are supplying 5 gigawatts of TPU capacity to power Claude’s growth.
    • SpaceX is contributing GPU access to Anthropic’s compute footprint.
    • Micron, Samsung, and SK hynix are partnering with Anthropic on memory and infrastructure to support its scaling needs.
    • Claude is available on all three major cloud platforms, AWS, Google Cloud, and Microsoft Azure.
    • Anthropic reports widespread enterprise adoption of Claude across a broad range of industries.

    Detailed Summary

    The Raise and the Valuation

    Anthropic has raised $65 billion in Series H funding, a round that values the company at $965 billion on a post-money basis. The size of the raise places it among the largest private financing events the technology industry has ever seen, and the valuation pushes Anthropic to the doorstep of the trillion dollar mark. The capital arrives at a moment when demand for the company’s Claude models has accelerated sharply, and the round is built to fund the response to that demand rather than simply mark a milestone. Anthropic framed the financing in its Series H announcement as the fuel for staying at the research frontier while scaling the infrastructure and products that customers increasingly rely on.

    Who Put In the Money

    The Series H was led by Altimeter Capital, Dragoneer, Greenoaks, and Sequoia Capital, a group that combines deep growth-stage technology experience with conviction in Anthropic’s long-term trajectory. Joining as co-leads were Capital Group, Coatue, D1 Capital Partners, GIC, ICONIQ, and XN, a roster that spans crossover funds, sovereign wealth, and institutional investors. Beyond the new equity, Anthropic pointed to $15 billion in previously committed hyperscaler investment, including $5 billion from Amazon. Taken together, the investor base reflects a mix of financial backers and strategic partners with a direct stake in seeing Claude reach more customers and more compute.

    Revenue at $47 Billion Run-Rate

    Underpinning the valuation is a business that has scaled with unusual speed. Anthropic crossed a $47 billion run-rate revenue figure earlier in May 2026, a number that signals how quickly enterprises and developers have adopted Claude across their workflows. Run-rate revenue annualizes the company’s most recent performance, and at this level it puts Anthropic firmly among the fastest growing software businesses on record. That financial momentum is the practical justification for both the round’s size and the near trillion dollar valuation investors were willing to support.

    The Compute Buildout

    A large share of the strategy behind the raise centers on securing compute at enormous scale. Anthropic detailed a set of infrastructure partnerships designed to keep pace with Claude demand. Amazon is providing 5 gigawatts of capacity, while Google and Broadcom together are supplying 5 gigawatts of TPU capacity. SpaceX is contributing GPU access, broadening the range of silicon Anthropic can draw on. Supporting the buildout on the hardware supply side are Micron, Samsung, and SK hynix, the memory and component partners whose output is essential to standing up data centers at this magnitude. The combined picture is a company assembling power, chips, and supply chain commitments measured in gigawatts rather than racks.

    Where the Money Goes

    Anthropic outlined three priorities for the new capital. The first is to advance safety and interpretability research, continuing the work of understanding how models behave and ensuring they remain reliable as they grow more capable. The second is to expand compute capacity to meet the growing demand for Claude, the practical engine behind the infrastructure commitments above. The third is to scale the products and partnerships that customers depend on, deepening the company’s reach into the tools and platforms where work actually happens. Krishna Rao, Anthropic’s chief financial officer, said the funding “will help us serve the historic demand we are experiencing, stay at the research frontier, and bring Claude to more of the places where work happens.”

    Claude Everywhere

    The funding lands on top of a distribution footprint that already spans the major cloud ecosystems. Claude is available on all three leading cloud platforms, AWS, Google Cloud, and Microsoft Azure, which means enterprises can reach the models through whichever provider they have standardized on. That availability has translated into widespread enterprise adoption across industries, from software and finance to healthcare and beyond. By being present everywhere developers and businesses already operate, Anthropic positions Claude not as a destination customers must travel to but as a capability woven into the platforms they use every day.

    Notable Quotes

    This funding will help us serve the historic demand we are experiencing, stay at the research frontier, and bring Claude to more of the places where work happens.

    Krishna Rao, CFO at Anthropic, on the purpose of the Series H round.

    Advance safety and interpretability research, expand compute capacity to meet growing Claude demand, and scale products and partnerships customers depend on.

    How Anthropic describes its use of funds from the round.

    For the full details on the round, the lead and co-lead investors, and how Anthropic plans to deploy the capital across safety research, compute, and products, read the full announcement here.

    Related Reading

    • Anthropic, the AI safety and research company behind Claude that raised this Series H round.
    • Sequoia Capital, one of the lead investors anchoring the financing.
    • Amazon Web Services, one of the three major cloud platforms where Claude is available and the source of a $5 billion investment.
    • Google Cloud TPUs, the tensor processing units behind the 5 gigawatts of TPU capacity in the Google and Broadcom partnership.
    • AI safety, the research field at the center of how Anthropic says it will use the new funding.
  • Claude Opus 4.8 Released: Anthropic Bets on Honesty, Dynamic Workflows, Effort Control, and Cheaper Fast Mode

    Anthropic has released Claude Opus 4.8, the newest member of its flagship Opus class, available today across every surface and priced exactly like the model it replaces. The company calls it “a modest but tangible improvement” on Opus 4.7, but the framing undersells what is actually interesting here: the headline upgrade is not a benchmark number, it is honesty. Opus 4.8 is built to know when it does not know, and that single behavioral shift may matter more for real agent work than any raw capability bump.

    TLDR

    Claude Opus 4.8 is an across-the-board upgrade to Anthropic’s Opus class that ships today at the same regular price as Opus 4.7 ($5 per million input tokens, $25 per million output tokens), with the model positioned as “a more effective collaborator.” The marquee improvement is honesty: Opus 4.8 is roughly four times less likely than its predecessor to let flaws in its own code pass unremarked, and it is more willing to flag uncertainty rather than confidently claim progress on thin evidence. A pre-release alignment assessment found new highs on prosocial traits like supporting user autonomy and acting in the user’s best interest, with misaligned behavior at rates similar to Anthropic’s best-aligned model, Claude Mythos Preview. Three things launch alongside the model: dynamic workflows in Claude Code (research preview), where Claude plans work then runs hundreds of parallel subagents that run even longer and verify their own outputs before reporting back; effort control in claude.ai and Cowork, a slider for how hard Claude thinks; and a Messages API update that accepts system entries inside the messages array so developers can update instructions mid-task without breaking the prompt cache. Fast mode now runs at 2.5x speed and is three times cheaper than before ($10 / $50 per million tokens). The roadmap points to cheaper Opus-equivalent models, a higher-intelligence class above Opus, and a wider rollout of Mythos-class models gated behind stronger cyber safeguards under Project Glasswing.

    Thoughts

    The most important sentence in this announcement is not about coding scores. It is the claim that Opus 4.8 is about four times less likely than Opus 4.7 to let flaws in its own code slip by without comment. For a chat assistant, overconfidence is annoying. For an agent, it is catastrophic. The whole premise of long-running autonomous work is that you hand the model a task and walk away, which means the model’s own judgment about whether it succeeded becomes the only judgment in the loop until you come back. A model that confidently declares victory on a half-finished migration does not save you time, it costs you a debugging session plus the time you spent trusting it. Honesty, framed this way, is not a soft virtue. It is the load-bearing reliability property that makes unattended agents usable at all.

    Read the launch as a single coherent argument rather than a list of features, and the pieces lock together. Dynamic workflows let Claude plan a job and fan out hundreds of parallel subagents that, with Opus 4.8, run longer than before. Effort control lets you dial up how much the model thinks. The honesty improvement means the model checks its own work and flags what it is unsure about instead of papering over it. Put those three together and you get one product thesis: let it run longer, let it think harder, and trust it to tell you when something is wrong. The codebase-scale migration example, hundreds of thousands of lines from kickoff to merge with the existing test suite as the bar, is the proof point. None of those three capabilities is worth much alone. A model that runs for hours but lies about its results is a liability. A model that flags uncertainty but cannot sustain a long task never reaches the moment where its honesty matters. Anthropic shipped all three at once because they only pay off together.

    The economics deserve a closer look than the “same price” headline invites. Regular pricing is flat versus Opus 4.7, which is the polite way of saying you get a better model for free. The real move is fast mode: 2.5x the speed at three times cheaper than it cost on previous models, landing at $10 per million input and $50 per million output. That is Anthropic quietly attacking the latency-versus-cost tradeoff that has shaped how teams deploy frontier models. Until now, “fast” meant “expensive,” so you reserved it for interactive moments and ate the wait everywhere else. Collapsing that premium changes the default. And note the subtle token story underneath: Opus 4.8 at its default high effort spends roughly the same tokens on coding as Opus 4.7’s default while performing better, so the effort slider is not a way to bleed you dry, it is an honest exposure of the quality-cost dial that was always there implicitly.

    The Messages API change is the kind of unglamorous plumbing that practitioners will appreciate immediately. Letting system entries live inside the messages array means you can update an agent’s instructions, permissions, token budget, or environment context partway through a task without smuggling the update through a fake user turn and without blowing up your prompt cache. Anyone who has built a long-running agent has hit this wall: the world changes mid-task, the agent needs new constraints, and the only clean way to inject them previously was a cache-busting hack. This is Anthropic treating agents as first-class, stateful, long-lived processes rather than oversized chat sessions. It is a small spec change with outsized implications for how you architect an agent that runs for an hour.

    Then there is the roadmap, where the most telling line is the quietest. Anthropic says a small number of organizations are already using Claude Mythos Preview for cybersecurity work under Project Glasswing, and that models of this capability level require stronger cyber safeguards before general release. Notice that they are pinning Opus 4.8’s alignment numbers to Mythos as the benchmark for “best-aligned,” while simultaneously holding Mythos back from general availability on safety grounds. That is a deliberate signal: the next class of model is good enough that they are gating it on cyber-offense risk, not on capability. For a site about the pursuit of joy, fulfillment, and purpose through AI, this is the part worth sitting with. The frontier is increasingly defined not by what the models can do, but by what their builders decide it is responsible to ship. Honesty in the small (flagging a bad line of code) and restraint in the large (holding back a cyber-capable model) are the same instinct expressed at two different scales.

    Key Takeaways

    • Claude Opus 4.8 is now available everywhere, replacing Opus 4.7 as Anthropic’s flagship Opus-class model and positioned as “a more effective collaborator.”
    • Regular usage pricing is unchanged from Opus 4.7, holding at $5 per million input tokens and $25 per million output tokens, so the capability gains come at no added cost.
    • The single most emphasized improvement is honesty, which Anthropic treats as a core trained behavior rather than a marketing flourish.
    • Evaluations show Opus 4.8 is around four times less likely than its predecessor to let flaws in its own code pass unremarked, a direct reliability win for autonomous coding.
    • Early testers report the model is more likely to flag uncertainty about its work and less likely to make unsupported claims or jump to conclusions on thin evidence.
    • A detailed alignment assessment was run before release and concluded Opus 4.8 reaches new highs on prosocial traits like supporting user autonomy and acting in the user’s best interest.
    • Misaligned behavior such as deception or cooperation with misuse is at rates substantially lower than Opus 4.7 and similar to Anthropic’s best-aligned model, Claude Mythos Preview.
    • The full alignment assessment and pre-deployment safety tests are documented in the public Claude Opus 4.8 System Card.
    • Dynamic workflows launch as a research preview inside Claude Code, letting Claude plan the work and then run hundreds of parallel subagents in a single session.
    • With Opus 4.8, those subagents can run even longer, and Claude verifies its outputs before reporting back rather than declaring success blindly.
    • Anthropic’s flagship example for dynamic workflows is a codebase-scale migration across hundreds of thousands of lines of code, from kickoff to merge, using the existing test suite as the success bar.
    • Dynamic workflows are available in Claude Code for the Enterprise, Team, and Max plans.
    • Effort control arrives in claude.ai and Cowork as a setting next to the model selector that lets users choose how much effort Claude puts into a response.
    • Higher effort makes Claude think more frequently and deeply for better answers; lower effort responds faster and consumes rate limits more slowly. Effort control is available on all plans.
    • Opus 4.8 defaults to “high” effort, judged the best overall balance of quality and user experience.
    • On coding tasks, the default effort spends a similar number of tokens as Opus 4.7’s default but delivers better performance, so quality rises without a token penalty.
    • Users can select “extra” (called “xhigh” in Claude Code) or “max” to spend more tokens for stronger results, and Anthropic recommends “extra” for difficult tasks and long-running asynchronous workflows.
    • Rate limits in Claude Code were increased to accommodate the higher token usage of the higher effort levels.
    • The Messages API now accepts system entries inside the messages array, a meaningful change for agent developers.
    • That update lets developers change Claude’s instructions mid-task, adjusting permissions, token budgets, or environment context, without breaking the prompt cache or routing through a user turn.
    • Fast mode now runs at 2.5x speed and is three times cheaper than it was for previous models, priced at $10 per million input tokens and $50 per million output tokens.
    • Developers access the model as claude-opus-4-8 through the Claude API.
    • Partner Miguel Gonzalez reports Opus 4.8 scored 84% on Online-Mind2Web, a meaningful jump over both Opus 4.7 and GPT-5.5, calling it the strongest computer-use and browser-agent model his team has tested.
    • Databricks reports that, inside Genie, Opus 4.8 reasons over unstructured content like PDFs and diagrams at 61% cheaper token cost than Opus 4.7.
    • Thomson Reuters reports Opus 4.8 is the first model to break 10% overall on the all-pass standard of its Legal Agent Benchmark, the highest score recorded there.
    • Eleven partners weighed in, including Cursor, Cognition’s Devin, Databricks Genie, Thomson Reuters CoCounsel, and Hebbia, spanning coding, legal, finance, and enterprise data work.
    • Anthropic is working on models that deliver many of the same capabilities as Opus at a lower cost.
    • The company plans to release a new class of model with even higher intelligence than Opus.
    • Under Project Glasswing, a small number of organizations are already using Claude Mythos Preview for cybersecurity work, with Mythos-class models expected to reach all customers in the coming weeks once stronger cyber safeguards are in place.

    Detailed Summary

    What Claude Opus 4.8 Is

    Claude Opus 4.8 is an upgrade to Anthropic’s Opus class of models, building on Opus 4.7 with improvements across benchmarks covering coding, agentic skills, reasoning, and practical knowledge-work tasks. Anthropic describes the result as “a more effective collaborator” while characterizing the release overall as “a modest but tangible improvement on its predecessor.” The model is available today, everywhere, and developers call it as claude-opus-4-8 via the Claude API. The announcement includes a comparison table against the predecessor and other models, though the per-cell numbers in that table are published as an image and are not reproduced here as text.

    Honesty: The Headline Improvement

    Anthropic singles out honesty as one of the most prominent improvements in Opus 4.8. All of the company’s models are trained to be honest, which includes avoiding claims they cannot support. A persistent problem with AI models generally is that they sometimes jump to conclusions, confidently claiming progress despite thin evidence. Early testers report that Opus 4.8 is more likely to flag uncertainties about its own work and less likely to make unsupported claims. The most concrete measure: evaluations show Opus 4.8 is around four times less likely than its predecessor to allow flaws in code it has written to pass unremarked. For agentic and unattended use, this self-skepticism is the difference between a model that reliably tells you when something went wrong and one that quietly ships a broken result.

    Alignment Assessment

    A detailed alignment assessment was run before release. On the positive side, the Alignment team concluded that Opus 4.8 “reaches new highs on our measures of prosocial traits like supporting user autonomy and acting in the user’s best interest.” On the risk side, misaligned behavior such as deception or cooperation with misuse occurs at rates substantially lower than Opus 4.7, and similar to Anthropic’s best-aligned model, Claude Mythos Preview. The full alignment assessment and the pre-deployment safety tests are published in the Claude Opus 4.8 System Card, which also contains the complete benchmark table and wider evaluations.

    Dynamic Workflows in Claude Code

    Launching today as a research preview in Claude Code, dynamic workflows let Claude plan the work and then run hundreds of parallel subagents in a single session. With Opus 4.8, those agents can run even longer than before, and Claude verifies its outputs before reporting back rather than reporting unchecked results. The showcase example is a codebase-scale migration: Claude Code with Opus 4.8 can carry out migrations across hundreds of thousands of lines of code, all the way from kickoff to merge, using the existing test suite as its bar for success. Dynamic workflows are available in Claude Code for the Enterprise, Team, and Max plans.

    Effort Control

    Effort control arrives in claude.ai and Cowork as a setting alongside the model selector that lets users choose how much effort Claude puts into a response. Higher effort means Claude thinks more frequently and deeply for better responses; lower effort means it responds faster and uses rate limits more slowly. Opus 4.8 defaults to “high” effort, which Anthropic judged the best overall balance of quality and user experience. On coding tasks, that default spends a similar number of tokens as Opus 4.7’s default while performing better. Users who want more can choose “extra” (called “xhigh” in Claude Code) or “max” to spend more tokens for stronger results, and Anthropic recommends “extra” for difficult tasks and long-running asynchronous workflows. To support the heavier token usage at higher effort levels, rate limits in Claude Code were increased. Effort control is available on all plans.

    Messages API Update

    The Messages API now accepts system entries inside the messages array. This lets developers update Claude’s instructions mid-task without breaking the prompt cache and without routing the update through a user turn. In practice that means you can update permissions, token budgets, or environment context while an agent is running, which is exactly the kind of statefulness a long-running autonomous process needs. It is a small specification change with significant consequences for how developers build durable agents.

    Pricing and Fast Mode

    Regular usage pricing is unchanged from Opus 4.7: $5 per million input tokens and $25 per million output tokens. The notable shift is in fast mode, where the model works at 2.5x the speed and fast mode is now three times cheaper than it was for previous models, landing at $10 per million input tokens and $50 per million output tokens. The combination of unchanged regular pricing and dramatically cheaper fast mode reshapes the latency-versus-cost calculus that has long governed how teams deploy frontier models.

    Partner Results Across Coding, Legal, Finance, and Data

    Eleven partners shared results spanning the spectrum of professional work. Miguel Gonzalez reports 84% on Online-Mind2Web, a meaningful jump over both Opus 4.7 and GPT-5.5, calling it the strongest computer-use and browser-agent model his team has tested. Databricks reports that Genie reasons over unstructured content like PDFs and diagrams at 61% cheaper token cost than Opus 4.7. Thomson Reuters reports Opus 4.8 is the first model to break 10% overall on the all-pass standard of its Legal Agent Benchmark. Cursor reports gains across every effort level on CursorBench with more efficient tool calling, and Cognition reports that Devin sees cleaner tool use, fixes to the comment-verbosity and tool-calling issues seen with Opus 4.7, and improvements over Opus 4.6. Hebbia reports strong quality with better citation precision and more token efficiency on retrieval for dense financial filings. The footnotes note that Terminal-Bench 2.1 was scored on the Terminus-2 public harness (GPT-5.5’s Codex CLI harness score is 83.4%), that OSWorld-Verified methodology changed with Opus 4.7’s score updated to 82.3%, and that on Finance Agent v2 Gemini 3.5 Flash scores 57.9%.

    What Is Next: Cheaper Models, Higher Intelligence, and Mythos

    Anthropic outlined a three-part roadmap. First, the company is working on models that provide many of the same capabilities as Opus at a lower cost. Second, it plans to release a new class of model with even higher intelligence than Opus. Third, as part of Project Glasswing, a small number of organizations are currently using Claude Mythos Preview for cybersecurity work; models of this capability level require stronger cyber safeguards before general release, and Anthropic expects to bring Mythos-class models to all customers in the coming weeks.

    Notable Quotes

    “Claude Opus 4.8 has noticeably better judgment. In Claude Code, it asks the right questions, catches its own mistakes, pushes back when a plan isn’t sound, and builds up confidence around complex, multi-service explorations before making big changes. It’s a great model to build with.”

    Tom Pritchard, Staff Engineer, in Claude Code

    “On our Super-Agent benchmark, Claude Opus 4.8 is the only model to complete every case end-to-end, beating prior Opus models and GPT-5.5 at parity on cost. For agent products in translation, deep research, slide-building, and analysis, it delivers powerful reliability.”

    Kay Zhu, Co-Founder and CTO, on the Super-Agent benchmark

    “On CursorBench, Claude Opus 4.8 exceeds prior Opus models across every effort level. Tool calling is meaningfully more efficient, using fewer steps for the same intelligence, and it carries end-to-end tasks through.”

    Michael Truell, Co-Founder and CEO, on CursorBench results

    “Claude Opus 4.8 delivers the highest score recorded on our Legal Agent Benchmark, and is the first model to break 10% overall on the all-pass standard. For substantive legal work, that’s the kind of accuracy lift that translates directly into how much real attorney work our customers can hand off with confidence.”

    Niko Grupen, Head of Applied Research, on the Legal Agent Benchmark

    “Claude Opus 4.8 feels like a major quality-of-life update over Opus 4.7: faster, easier to collaborate with, and better at carrying context and style direction across a long session. Opus 4.8 is the model I kept trusting for work where voice, taste, and technical execution all have to happen side-by-side.”

    Katie Parrott, Staff Writer, on long writing sessions

    “Claude Opus 4.8 is the strongest computer-use and browser-agent model we’ve tested, scoring 84% on Online-Mind2Web, which is a meaningful jump over both Opus 4.7 and GPT-5.5. It stays reflective and on-task in the way our customers’ agent workloads need to be reliable end-to-end.”

    Miguel Gonzalez, Tech Lead, on computer-use and browser agents

    “Claude Opus 4.8 uses tools cleanly and follows instructions with the consistency our autonomous engineering workloads need to keep running unattended. It improves on Opus 4.6 and fixes the comment-verbosity and tool-calling issues we saw with Opus 4.7. This release from Anthropic translates directly into faster capability gains for engineers building on Devin.”

    Scott Wu, CEO, on building with Devin

    “On our long-running evals, Claude Opus 4.8’s analysis was consistently higher quality than prior Opus models. It finished faster and produced richer, more information dense outputs. Overall, a noticeably better signal to noise ratio. The biggest differentiator was Opus 4.8’s tendency to proactively flag issues with the inputs and outputs of an analysis, something other models routinely missed and left to the users to catch.”

    Michael Ran, Sr. Investment Associate, on long-running analysis evals

    Claude Opus 4.8 is a quieter release than its “modest but tangible” billing suggests, because the gains land where autonomous work actually lives: a model that flags its own uncertainty, runs longer and checks itself, scales effort on demand, and stays affordable while fast mode gets cheaper. The honesty improvement alone changes the trust math for anyone deploying agents. Read Anthropic’s full announcement here.

    Related Reading

  • SpaceX S-1 IPO Filing Breakdown, Ticker SPCX on Nasdaq and Nasdaq Texas, xAI Integration, Musk’s Trillion Share Mars Pay Plan, $18.7B Revenue, and the 100 Gigawatt Orbital AI Compute Bet

    Space Exploration Technologies Corp. filed its S-1 registration statement with the SEC on May 20, 2026, kicking off the largest and weirdest IPO in modern capital markets history. The 280-page preliminary prospectus proposes to list Class A common stock on both the Nasdaq Stock Market and the new Nasdaq Texas exchange under the ticker SPCX, bundles xAI into SpaceX as a third reportable segment via a February 2026 reorganization under common control, and asks public investors to underwrite a $28.5 trillion total addressable market that explicitly includes asteroid mining, lunar manufacturing, Mars passenger transport, and 100 gigawatts per year of orbital AI compute on solar-powered satellites. The filing reports $18.67 billion of 2025 revenue and a $4.94 billion net loss, with a Q1 2026 net loss of $4.28 billion driven almost entirely by the AI segment’s $7.7 billion of quarterly capex.

    TLDR

    SpaceX is going public on Nasdaq and Nasdaq Texas as SPCX, led by Goldman Sachs, Morgan Stanley, BofA Securities, Citigroup, and J.P. Morgan. The company has been reincorporated in Texas, headquartered at Starbase, structured as a perpetual dual-class controlled company with Class B shares carrying 10 votes each and electing a majority of the board, and post-merger contains three segments: Space (Falcon, Dragon, Starship), Connectivity (Starlink with 10.3 million subscribers across 164 countries and roughly 9,600 satellites in orbit), and AI (the former xAI, including the Colossus and Colossus II superclusters in Memphis totaling about 1.0 gigawatt of nameplate compute, Grok, and the X platform with 550 million MAUs). Revenue grew from $10.4 billion in 2023 to $14.0 billion in 2024 to $18.7 billion in 2025, with Connectivity contributing $11.4 billion at a 63% segment Adjusted EBITDA margin and the new AI segment burning $1.2 billion of segment Adjusted EBITDA in 2025 while spending $12.7 billion of capex. Elon Musk holds an unspecified majority of the voting power, has a base salary of $54,080 unchanged since 2019, no key-person life insurance, and was granted in January and March 2026 a combined roughly 1.3 billion performance-restricted Class B shares that vest against market-cap milestones from $500 billion up to $7.5 trillion, with the highest tranches contingent on building a permanent Mars colony of one million inhabitants and on deploying non-Earth data centers delivering 100 terawatts of compute per year. The prospectus discloses Anthropic’s $1.25 billion per month compute deal through May 2029, a $60 billion option to acquire Cursor (Anysphere) with a $10 billion combined break fee, the Terafab one-terawatt-per-year chip JV with Tesla and Intel, the $19.6 billion EchoStar spectrum acquisition, a $20 billion SpaceX Bridge Loan, a $5 billion amended revolver, a Houston-exclusive Texas Business Court forum clause with ICC arbitration fallback, and several uniquely SpaceX risk factors including third-party Musk conduct triggering foreign asset seizures, anti-satellite weapons, cascading cyber-induced orbital debris events, and Grok’s named “Spicy” Imagine Mode and “Unhinged” Voice Mode.

    Key Takeaways

    • Ticker SPCX, dual listed on Nasdaq and Nasdaq Texas, Class A par $0.001, joint lead bookrunners Goldman Sachs, Morgan Stanley, BofA Securities, Citigroup, and J.P. Morgan, with a 22-firm syndicate including Barclays, Deutsche Bank, RBC, UBS, Wells Fargo, Allen & Company, Cantor, Needham, Raymond James, Societe Generale, Stifel, William Blair, BTG Pactual, ING, Macquarie, Mirae Asset, Mizuho, and Santander.
    • Headquartered at 1 Rocket Road, Starbase, Texas. Reincorporated from Delaware to Texas on February 14, 2024. Five-for-one forward stock split executed May 4, 2026. All share data in the filing is post-split.
    • Perpetual dual-class structure with no sunset. Class A carries 1 vote per share, Class B carries 10 votes per share, Class C carries no votes (and has been eliminated via the Class C Reclassification). Class B converts to Class A only on a non-permitted transfer.
    • Class B holders elect a majority of the board (the Class B Directors), as long as any Class B shares remain outstanding. Removing Musk from CEO or Chairman requires a separate Class B majority vote. SpaceX will be a Nasdaq controlled company and will rely on the exemptions, meaning no requirement for fully independent compensation or nominating committees.
    • Consolidated revenue: $10.39 billion in 2023, $14.02 billion in 2024, $18.67 billion in 2025, and $4.69 billion in Q1 2026 (up 15.4% year over year). Financials are retrospectively recast to combine xAI and X Holdings since both transactions were between entities under Musk’s common control.
    • Net income (loss): $(4.63) billion in 2023, $0.79 billion in 2024, $(4.94) billion in 2025, and $(4.28) billion in Q1 2026. Accumulated deficit pro forma $41.31 billion as of March 31, 2026.
    • Connectivity (Starlink) is the cash engine. 2025 revenue $11.39 billion, up 49.8%. 2025 operating income $4.42 billion, up 120.4%. 2025 segment Adjusted EBITDA $7.17 billion, up 86.2%. Consumer subscriptions are more than 60% of Connectivity revenue.
    • Starlink subscribers: 2.3 million at year-end 2023, 4.4 million at year-end 2024, 8.9 million at year-end 2025, and 10.3 million as of March 31, 2026. Roughly 9,600 broadband and mobile satellites in low Earth orbit, about 75% of all active maneuverable satellites globally. Available in 164 countries and territories.
    • Starlink ARPU is declining as the mix shifts international and lower priced: $99 monthly in 2023, $91 in 2024, $81 in 2025, $66 in Q1 2026. Management says this is expected to continue.
    • Starlink direct to cell now has roughly 650 V1 Mobile satellites and 7.4 million monthly unique devices across about 30 countries, with partnerships across roughly 30 mobile network operators including T-Mobile, Rogers, KDDI, Optus, Telstra, One NZ, Kyivstar, VMO2, Salt, and Entel. V3 satellites begin deploying in the second half of 2026, designed for 1 Tbps downlink per satellite with up to 60 per Starship launch (a 20x payload-capacity step over Falcon 9).
    • Space segment now generates lower revenue growth because Starlink dedicated launches are not booked as inter-segment revenue. Space revenue: $3.56 billion (2023), $3.80 billion (2024), $4.09 billion (2025). Falcon launches in 2025: 165 total, 43 third-party customer and 122 internal Starlink. Mass to orbit: 1,210 metric tons (2023), 1,699 (2024), 2,213 (2025). SpaceX has now launched more than 80% of the world’s mass to orbit since 2023.
    • Falcon 9 has flown roughly 620 missions with greater than 99% mission success. A single booster has been reflown 34 times. Falcon Heavy is 11-for-11 since 2018 and certified for NSSL. SpaceX flew 11 of 12 NSSL medium and heavy lift missions in 2025.
    • Starship has completed 11 flight tests and is preparing the 12th, debuting next-generation Starship, Super Heavy, and Raptor 3 from a new Starbase pad. V3 is designed for 100 metric tons fully reusable to LEO, V4 targets 200 tons. Cumulative Starship R&D investment is greater than $15 billion, including $3.00 billion in 2025 alone. Operational payload delivery to orbit is expected in the second half of 2026.
    • Dragon has flown 78 crewmembers from 20 countries since 2020 and Cargo Dragon remains the only spacecraft capable of returning meaningful mass from the ISS.
    • AI segment, the absorbed xAI business plus X, generated $818 million Q1 2026 revenue but operating losses of $(2.47) billion and segment Adjusted EBITDA of $(609) million. AI capex was $7.72 billion in Q1 2026 alone, dwarfing Space ($1.05 billion) and Connectivity ($1.33 billion).
    • Colossus and Colossus II in Memphis and Southaven Mississippi together provide about 1.0 gigawatt of nameplate compute draw. Colossus came online in 122 days with about 100,000 H100s. Colossus II added 110,000 GB200s in 91 days and 110,000 GB300s in 64 days. Next phase: another 220,000 GB300s and 400 megawatts. Industry benchmark for a 100 megawatt greenfield datacenter is two years.
    • Grok and X together have 1.3 billion supported accounts on a trailing basis, about 550 million MAUs, roughly 117 million MAUs using Grok AI features, and roughly 350 million daily posts. Imagine generates about 10 billion images and 2 billion videos per month. Paid subscribers totaled 6.3 million as of March 31, 2026 (4.4 million X Premium variants plus 1.9 million SuperGrok variants).
    • Disclosed Anthropic cloud services agreements signed May 2026: Anthropic pays $1.25 billion per month for compute capacity on Colossus and Colossus II through May 2029, ramping in May and June 2026, with 90-day termination by either party.
    • Cursor (Anysphere) compute agreement and acquisition option signed April 2026: SpaceX has the right but not the obligation to acquire Cursor at an implied $60.0 billion equity value, paid in Class A stock priced off the SPCX VWAP. SpaceX-side termination or breach triggers a $1.5 billion termination fee plus an $8.5 billion deferred services fee.
    • Terafab JV with Tesla, announced March 2026, joined by Intel in April 2026, targets one terawatt per year of compute hardware production. The filing explicitly notes that neither Tesla nor Intel is obligated to remain, and definitive agreements may not be signed.
    • Macrohard, in development with Tesla, is described as a platform designed to fully emulate digital workflows, augment human computer operation, and create a fully AI-operated software company.
    • EchoStar Spectrum Transaction (AWS-3, AWS-4, H-block, 65 megahertz US plus global MSS) was FCC-approved May 12, 2026. Total deal value $19.6 billion, including roughly $11.1 billion of equity (261.8 million Class A shares at an implied $42.40) and up to $8.5 billion of debt assumption. Closing expected around November 30, 2027.
    • Balance sheet as of March 31, 2026: cash and equivalents $15.85 billion, short-term marketable securities $7.82 billion, total assets $102.09 billion, total liabilities $60.51 billion, total debt principal $29.13 billion. The $20 billion SpaceX Bridge Loan (Goldman Sachs Bank USA as administrative agent, March 2026) refinanced legacy X and xAI debt and must be repaid within six months of IPO. The amended SpaceX Credit Facility, also May 2026, was upsized to $5.0 billion and extended to May 19, 2031.
    • Use of proceeds: expansion of AI compute infrastructure, enhancements to launch infrastructure and launch vehicles, increases in satellite constellation scale and capacity, and general corporate purposes. No dividends are anticipated and the credit agreements restrict them.
    • Total addressable market estimate of $28.5 trillion (ex-China and Russia): Space $370 billion, Connectivity $1.6 trillion ($870 billion broadband and $740 billion mobile), and AI $26.5 trillion ($2.4 trillion infrastructure, $760 billion consumer subscriptions, $600 billion digital advertising, and $22.7 trillion enterprise applications).
    • Stated future markets explicitly listed in the prospectus: point-to-point Earth transport via Starship, space tourism, in-orbit manufacturing including pharmaceuticals and materials, passenger and cargo to Moon and Mars, lunar mining of rare materials, lunar mass driver, lunar factories building AI compute satellites, asteroid mining, and orbital solar-powered AI. The headline aspirational target is 100 gigawatts per year of orbital AI compute on solar-powered satellites in Sun-synchronous orbit, with first deployments targeted as early as 2028.
    • Musk 2025 total compensation $54,080 (base salary unchanged since 2019, tied historically to California’s exempt-employee minimum). No bonus, no stock or option awards reported for 2025. SpaceX maintains no key-person life insurance on Musk.
    • January 13, 2026 Musk grant: 1 billion performance-based restricted Class B shares across 15 equal tranches tied to market-cap milestones from $500 billion to $7.5 trillion (in $500 billion increments), with at least one tranche additionally gated on “a permanent human colony on Mars with at least one million inhabitants” and on continued employment.
    • March 23, 2026 Musk replacement award (assumed from xAI): 302,072,285 performance-based restricted Class B shares across 12 tranches from $1.065 trillion to $6.565 trillion market cap, additionally requiring completion of “non-Earth-based data centers capable of delivering 100 terawatts of compute per year.” Replaces an earlier xAI award after Musk had already earned and canceled 25,172,695 Class A shares at the first milestone.
    • Gwynne Shotwell 2025 total compensation $85.81 million, primarily option awards. Bret Johnsen (CFO) 2025 total compensation $9.84 million. Non-employee directors received zero cash and zero equity for 2025 service.
    • Board of 8 post-IPO: Musk (Chairman, CEO, CTO), Shotwell (President, COO), Antonio Gracias (Valor Management), Ira Ehrenpreis (DBL Partners and Tesla), Randy Glein (DFJ Growth, audit chair), Donald Harrison (Google), Steve Jurvetson (Future Ventures), and Luke Nosek (Gigafund and Founders Fund). Class B Directors: Musk, Shotwell, Gracias, Harrison, Nosek. Common Stock Directors: Ehrenpreis, Glein, Jurvetson.
    • Lock-up is 180 days for company, directors, and officers, but Musk and certain significant investors are subject to an extended 366-day lock-up, and 100% of Musk’s shares are explicitly not subject to early-release tiers. A Directed Share Program with Schwab, Fidelity, Robinhood, SoFi, and E*TRADE handles retail allocation; DSP shares have no lock-up.
    • Corporate Opportunities waiver in the charter renounces interest in business opportunities presented to directors, officers, board observers, and their affiliates. Musk and his affiliates are explicitly not restricted from competing with SpaceX. This carve-out covers Tesla, Neuralink, The Boring Company, and any future Musk venture.
    • Exclusive forum is the Texas Business Court, Eleventh Division, in Houston, including for federal securities claims. If unenforceable, the fallback is mandatory ICC arbitration in Houston under Expedited Procedure Rules. Jury trial is waived. Class actions are prohibited.
    • Texas Business Organizations Code carve-outs: Section 21.419 codifies a statutory business-judgment-rule presumption, Section 21.552 requires 3% minimum ownership to bring derivative proceedings, and Section 21.373 (2025) requires 3% ownership for six months plus solicitation of 67% of voting power for shareholder proposals (SpaceX concedes enforceability is “expected” to be challenged).
    • Unprecedented risk-factor disclosure: in August 2024 Brazil’s Supreme Court froze Starlink’s Brazilian assets over the conduct of X “when X was not owned by us and was only affiliated with Mr. Musk.” SpaceX warns that third-party Musk conduct may continue to trigger foreign retaliation against SpaceX.
    • Risk language names Grok’s “Spicy” Imagine Mode and “Unhinged” Voice Mode as carrying heightened risks of explicit content, misinformation, and “potential nonconsensual or exploitative imagery.” A putative class action over content “representing children in sexualized contexts” is disclosed, as is an Irish DPC GDPR inquiry into Grok and an FTC inquiry into chatbots as companions for children and teens.
    • The S-1 uses the term “Department of War” (not Defense) for the federal customer requiring CMMC compliance and discloses that anti-satellite weapons have been publicly discussed by foreign governments as a tool against the Starlink constellation. A cyberattack-induced cascading Kessler-style debris event is cited as a possibility.
    • Workforce of more than 22,000 full-time employees globally, with no collective bargaining and engineering acceptance rate under 2% in 2025.
    • Operating asset footprint: Starbase (Texas, HQ, Starship), Hawthorne (California, Falcon, Dragon, Merlin and Raptor), McGregor (Texas, engine testing), Redmond (Washington, Starlink satellite production at about 70 per week), Bastrop (Texas, terminal production at tens of thousands per day, doubling in 2026 to include AI compute satellites), Kennedy and Cape Canaveral (Florida, LC-39A, SLC-40, SLC-37 in build for Starship), Vandenberg (California, SLC-4 polar launches), Memphis and Southaven (Tennessee and Mississippi, Colossus data centers), Palo Alto (California, xAI HQ), more than 400 Starlink ground stations globally, and three autonomous spaceport drone ships including “Of Course I Still Love You,” “Just Read the Instructions,” and “A Shortfall of Gravitas.”
    • Related party transactions of note: roughly $20.2 billion of equipment lease undiscounted payments to Valor (Gracias) entities guaranteed by SpaceX; aircraft, security, and tunnel-construction payments to Musk affiliates; xAI subsidiary leases real property from Musk Industries LLC.
    • Pampena v. Musk: an April 3, 2026 partial judgment in the Northern District of California, where a jury found Musk personally violated Section 10(b) and Rule 10b-5 on two May 2022 statements regarding his Twitter purchase. Post-trial motions are pending. The 2018 SEC “funding secured” settlement is also disclosed.
    • Critical accounting policy quirks: flight vehicles are depreciated over expected average number of flights rather than time. Starship costs are expensed to R&D until commercialization, then capitalized. Starlink dedicated launch costs are capitalized into Connectivity PP&E rather than booked as inter-segment Space revenue, which mechanically suppresses the headline Space growth rate.
    • The One Big Beautiful Bill Act (Public Law 119-21) reversed a $659 million U.S. R&D credit deferred tax asset recognized in 2024, driving the 2025 income tax provision of $718 million versus a $549 million benefit in 2024.
    • Pre-IPO ownership pro forma at March 31, 2026: Class A 6,824,581,339 shares and Class B 5,695,729,430 shares outstanding, for a combined 12.52 billion shares before primary issuance. Class C and the redeemable convertible preferred are converted/reclassified at close.
    • Authorized capitalization post-IPO: 36.13 billion Class A, 6.13 billion Class B, 10.0 billion Class C (none issued), and 2.4 billion preferred (none issued). Headroom for future issuance is enormous.
    • Five-for-one stock split executed May 4, 2026 to set the IPO share count and round-lot price. Price range, share count, and proceeds are bracketed in this preliminary filing and will be updated before launch.

    Detailed Summary

    A different kind of S-1 from the start

    Most S-1 filings open with corporate prose and a careful, neutral business description. SpaceX opens with an Elon Musk epigraph about wanting to wake up in the morning and “think the future is going to be great,” a mission statement that says the company exists “to make life multiplanetary, to understand the true nature of the universe, and to extend the light of consciousness to the stars,” and a Kardashev Type II framing that treats the next century of capital allocation as a civilizational project. Investors are being told, in legally binding language, that single-planet existence is “a single point of failure” and that the company is hedging against humans sharing the fate of the dinosaurs. The filing dual-lists SPCX on Nasdaq in New York and Nasdaq Texas in Dallas, picks the new Texas Business Court in Houston as exclusive forum, and reincorporates from Delaware to Texas. Every macro signal is set deliberately.

    Three segments after the xAI absorption

    The most consequential mechanical change in the S-1 is the retrospective recast of financial statements to combine xAI Holdings and X Holdings into SpaceX. Both transactions are accounted for as reorganizations of entities under common control (Musk’s), so prior-period revenue, opex, and capex move into the SpaceX line items rather than appearing as acquired-business additions. This is what produces the headline numbers: $10.4 billion (2023), $14.0 billion (2024), $18.7 billion (2025). The Space segment includes Falcon, Dragon, and Starship. Connectivity is Starlink in all its consumer, enterprise, government, and mobile forms plus the Starshield military variant. AI is the former xAI in full: Colossus and Colossus II superclusters, Grok, the X platform, and the Imagine media products. The recast also explains why net income flips so violently year to year. 2024’s $791 million net income reflects a quieter pre-merger SpaceX. 2025’s $4.94 billion net loss and Q1 2026’s $4.28 billion loss reflect the integrated AI business burning capital at unprecedented rate.

    Connectivity is the cash engine

    Starlink is the only segment that looks like a normal high-margin growth business. Revenue rose 96.4% in 2024 and another 49.8% in 2025 to $11.39 billion. Operating income tripled in 2024 and then doubled again in 2025 to $4.42 billion. Segment Adjusted EBITDA in 2025 was $7.17 billion, an EBITDA margin north of 60%. Subscribers grew from 2.3 million to 10.3 million in twenty-seven months. The constellation is now roughly 9,600 satellites, about 75% of all active maneuverable satellites on orbit. Inter-satellite laser links exceed 23,000, forming a mesh that delivers 700+ Tbps of cumulative downlink. ARPU is declining steadily, from $99 monthly in 2023 to $66 in Q1 2026, but management frames this as deliberate international mix shift toward lower priced plans and notes that direct-to-cell is just beginning to monetize. Roughly 650 V1 Mobile satellites already provide service to 7.4 million monthly unique devices through partnerships with roughly 30 mobile network operators. The EchoStar spectrum acquisition adds 65 megahertz in the US plus global MSS spectrum to support V2 Mobile broadband and 5G IoT starting in 2027.

    Space economics are obscured by accounting

    The Space segment looks small in the headline financials ($4.09 billion of 2025 revenue, an operating loss of $657 million) until you understand the accounting. Starlink launches are capitalized into Connectivity PP&E rather than booked as inter-segment Space revenue. That single policy is why 2025 Space revenue grew only 7.6% even though SpaceX flew 170 missions, of which 122 were internal Starlink. The actual operating reality is that SpaceX flew more than 80% of the world’s mass to orbit in 2025, owns 24 flight-proven reusable Falcon 9 boosters certified for 40 flights each, has refln a single booster 34 times, and has invested more than $15 billion in Starship to date. Starship’s eleventh flight test is on the books, the twelfth will debut the next-generation vehicle and Raptor 3 engine, and operational payload delivery to orbit is targeted for the second half of 2026. V3 Starship is designed to deliver 100 tons to LEO fully reusable and to carry up to 60 V3 Starlink satellites per launch, a 20x payload step over Falcon 9. The Starship cost target is a 99% reduction against the historical $18,500 per kilogram average, on the way to “airline-like” reflight cadence.

    AI is a money furnace with a thesis

    The AI segment is brand new to the SpaceX line item set and dominates the loss line. AI generated $3.20 billion of 2025 revenue (up 22.2%) but lost $6.36 billion at the operating line, much of it driven by GPU depreciation. AI capex was $12.73 billion in 2025 and another $7.72 billion in Q1 2026 alone. Colossus came online in 122 days with about 100,000 H100s and 130 megawatts. Colossus II followed with 110,000 GB200s in 91 days and 110,000 GB300s in 64 days, with another 220,000 GB300s and 400 megawatts in the next phase. The two superclusters now draw about one gigawatt combined. Grok-5 is training on Colossus II, targeting multi-trillion parameters. The X platform contributes 550 million MAUs and roughly 350 million daily posts to the segment, with 117 million MAUs touching Grok AI features. The thesis the prospectus is pitching is vertical integration on physics: SpaceX controls power generation (data center turbines and, eventually, orbital solar), launch (Starship to lift orbital compute satellites), satellite manufacturing (Redmond and Bastrop), chip supply (Terafab JV with Tesla and Intel for one terawatt per year of compute hardware), and the application layer (Grok and X). Management calls this “shovels-to-tokens” and argues no other AI company has this much control over the physical stack.

    The Anthropic, Cursor, and Terafab carve-outs

    Three subsequent events disclosed in the S-1 reframe SpaceX as a cloud and software platform as much as a hardware company. Anthropic signed cloud services agreements in May 2026 to pay $1.25 billion per month for Colossus and Colossus II capacity through May 2029, ramping in May and June 2026. The Cursor (Anysphere) agreement signed April 2026 includes both a compute commitment and an option for SpaceX to acquire the company at a $60 billion implied equity value, with a $1.5 billion termination fee and an $8.5 billion deferred services fee if SpaceX breaches or terminates. Terafab is a manufacturing JV with Tesla, joined by Intel in April 2026, with a stated one terawatt per year compute hardware production target. The prospectus is explicit that Tesla and Intel are not obligated to remain in Terafab and that no definitive agreements may be signed. Anthropic, the leading commercial competitor to OpenAI, is now SpaceX’s largest disclosed cloud customer.

    The Musk pay package

    The CEO compensation disclosure is the most aggressive in S-1 history. Musk’s reported 2025 total compensation was $54,080, a base salary unchanged since 2019. SpaceX maintains no key-person life insurance on him. Then on January 13, 2026 the board granted him one billion performance-based restricted Class B shares, vesting across fifteen equal tranches as market capitalization milestones are achieved at $500 billion increments from $500 billion all the way to $7.5 trillion, with at least one tranche additionally conditioned on the existence of a permanent human Mars colony of at least one million inhabitants and on continued employment. On March 23, 2026 the board granted an additional 302.07 million performance-based restricted Class B shares across twelve tranches from $1.065 trillion to $6.565 trillion of market cap, additionally requiring the completion of “non-Earth-based data centers capable of delivering 100 terawatts of compute per year.” This second grant replaces an earlier xAI award after Musk had already earned 25.17 million Class A shares at the first xAI milestone, which were then canceled and rolled in. The combined package is roughly 1.3 billion restricted Class B shares, dwarfing the Tesla 2018 award that previously held the record. Other executive comp is more conventional. Gwynne Shotwell’s 2025 total was $85.81 million, primarily option awards. Bret Johnsen, CFO, received $9.84 million. Non-employee directors received zero cash and zero equity for 2025 service.

    Governance built to be Musk-proof in one direction only

    SpaceX takes the dual-class playbook further than any prior tech IPO. Class B carries 10 votes per share, has no sunset, and elects a majority of the board as a separate class. Removing Musk from CEO or Chairman requires a separate Class B majority vote, and Musk holds the majority of Class B. The charter renounces interest in business opportunities presented to Musk and his affiliates, explicitly preserving his right to run competing ventures (Tesla, Neuralink, The Boring Company, anything next). The company opts into the Texas Business Organizations Code’s Section 21.419 business-judgment-rule presumption, requires 3% ownership to bring a derivative suit, requires 3% ownership for six months plus solicitation of 67% of voting power to bring shareholder proposals under Section 21.373 (a provision SpaceX itself concedes will likely be challenged in court), picks the Texas Business Court in Houston as exclusive forum even for federal securities claims, and falls back to mandatory ICC arbitration in Houston with Expedited Procedure Rules if forum exclusivity is struck down. Jury trials are waived. Class actions are prohibited. SpaceX will be a controlled company and will rely on Nasdaq exemptions from independent committee requirements. Musk and certain significant investors are subject to a 366-day lock-up rather than the standard 180 days, and 100% of Musk’s shares are excluded from the early-release tiers other holders enjoy.

    Risk factors disclose things no S-1 has disclosed before

    The Risk Factors section contains language no prior S-1 has used. SpaceX warns that “actions and statements of Mr. Musk and his affiliated ventures, whether or not directly relating to us, may draw significant public attention and scrutiny” and notes that in August 2024 the Brazilian Supreme Court froze Starlink’s Brazilian assets over the conduct of X “when X was not owned by us and was only affiliated with Mr. Musk.” That is the precedent: a foreign government seized SpaceX assets over Musk’s separate business conduct. The filing names Grok’s “Spicy” Imagine Mode and “Unhinged” Voice Mode as carrying heightened risks of explicit content and “potential nonconsensual or exploitative imagery,” discloses a putative class action over content “representing children in sexualized contexts,” an Irish DPC GDPR inquiry into Grok’s processing of EU children’s data, and an FTC inquiry into chatbots as companions for children and teens. The orbital risk language describes a cyberattack-triggered cascading Kessler-style debris event that could render SpaceX-licensed orbits “unusable for an extended period,” notes that “certain foreign governments have publicly discussed the potential use of anti-satellite weapons against the Starlink constellation,” and acknowledges that the FAA does not currently permit return-to-launch-site reentries for Starship and the company will require a waiver “which is not guaranteed.” The filing also uses “Department of War” rather than “Department of Defense” when discussing CMMC compliance for federal customers, reflecting the recent rebranding.

    Capital position and the bridge loan time bomb

    The balance sheet is large but the debt structure tells a story about why an IPO is urgent now. SpaceX has $15.85 billion of cash and $7.82 billion of short-term marketable securities against total debt principal of $29.13 billion. The largest piece is the $20 billion SpaceX Bridge Loan signed March 2026 with Goldman Sachs Bank USA as administrative agent, used to refinance legacy X and xAI debt (including X B-1, X B-3, and xAI 12.5% Senior Secured Notes). The bridge matures September 2, 2027 (extendable to March 2028 with a 0.25% fee per quarter), priced at Term SOFR plus 0.75% to 1.75%, with 0.125% duration fees kicking in at year one. It must be repaid within six months after IPO completion. The amended SpaceX Credit Facility was upsized to $5.0 billion and extended to May 19, 2031 in May 2026, with a $2.0 billion performance LC sublimit. The leverage covenant is 3.75x maximum (4.25x post-qualified acquisition). Capex is enormous and consistent: $20.74 billion in 2025 ($3.83 billion Space, $4.18 billion Connectivity, $12.73 billion AI), $10.11 billion in Q1 2026 alone. Operating cash flow ($6.79 billion in 2025) does not cover capex, and the gap is being filled by financing activity ($26.35 billion of net financing inflow in 2025).

    The 100 gigawatt orbital AI bet

    Buried in the Business section is the future-markets framing that justifies the AI-segment burn rate. SpaceX is asking public investors to underwrite a plan to deploy 100 gigawatts per year of orbital AI compute on solar-powered satellites in Sun-synchronous orbit. Reaching that scale requires thousands of Starship launches per year and roughly one million metric tons of mass to orbit annually. First modular orbital AI shells are targeted for “as early as 2028.” The justification given is that the Sun contains roughly 99.8% of the solar system’s energy, that orbital compute escapes terrestrial constraints on power, cooling, latency, and permitting, and that no other AI company controls the physical stack required to deploy at that scale. The prospectus stitches this directly to the Mars project: lunar mining of rare materials, lunar mass drivers to launch satellites at low cost, and lunar factories building AI compute satellites are listed alongside asteroid mining and Mars passenger transport as the future markets investors are being asked to value. The risk language acknowledges that none of these markets currently exist and that breakthrough advances in nuclear energy could moot the orbital compute thesis entirely. Investors are being asked to take Musk’s word that the long-tail outcomes are real options.

    Thoughts

    The most important number in this S-1 is not the revenue, the loss, or the implied valuation. It is the $54,080 Musk salary unchanged since 2019 against the 1.3 billion performance-restricted Class B shares contingent on a Mars colony and 100 terawatts of off-Earth compute. This is a pay package that resolves the question of whether SpaceX is a public-markets-style optimized corporation by answering it directly: no. SpaceX is going public on Musk’s terms, with a perpetual dual-class structure, a controlled-company exemption, a Houston exclusive forum, an arbitration backstop, a class-action prohibition, a charter that explicitly renounces interest in business opportunities Musk gets pitched elsewhere, and a CEO compensation structure that pays nothing for normal performance and 1.3 billion shares for an interplanetary civilization. Investors who buy SPCX are not buying voting power. They are buying optionality on the most ambitious capital allocation thesis a public company has ever attempted, contingent on Musk continuing to deliver outcomes the rest of the industry cannot.

    The xAI absorption is the most consequential corporate event in the prospectus and the one most worth scrutinizing. Accounting it as a common-control reorganization is technically defensible because Musk controlled all three entities, but the practical effect is to fold xAI’s enormous compute burn and X’s separate litigation surface area into SpaceX’s reported financial history without showing the deals as acquisitions. The Q1 2026 net loss of $4.28 billion is almost entirely xAI capex pulling forward. The two segments that actually make money (Connectivity at a 63% Adjusted EBITDA margin, Space when you adjust for the launch accounting policy) are being asked to subsidize an AI build-out that requires the orbital compute thesis to come true to ever generate adequate returns. Strip out AI and SpaceX would be one of the highest-quality businesses ever taken public. Include AI and it is something more like a venture-stage company stapled to a cash-flow machine, with the venture stage absorbing the cash. That is the trade the IPO is asking the market to price.

    The risk-factor language about third-party Musk conduct triggering foreign asset seizures is the cleanest single articulation in any S-1 of why founder-led companies with cross-portfolio exposure are different from normal public companies. The Brazil precedent is real, the legal theory is established, and the prospectus admits it directly. Buying SPCX means accepting that a fight between Musk and a foreign government over X content moderation, a Neuralink ethics dispute, a Boring Company permit fight, or a future venture entirely unrelated to space could trigger a freeze on Starlink subscriber revenue in that country. The Corporate Opportunities waiver is the legal mechanism that makes this acceptable to the board. It is far from clear that it is acceptable to public-market shareholders. The early reception of SPCX will partly be a referendum on whether the market thinks Brazil 2024 was a one-time event or a template.

    The Anthropic disclosure is the funniest detail. SpaceX, controlled by Musk, is now selling roughly $15 billion per year of compute to Anthropic, a company explicitly founded by former OpenAI researchers who broke away from the OpenAI-Musk faction in 2021. SpaceX-Colossus is now Anthropic’s largest disclosed compute supplier through May 2029, on 90-day termination by either side. The OpenAI lawsuit, the xAI launch, and the Grok positioning as the “truth-seeking” anti-OpenAI all sit in tension with the fact that Anthropic now anchors xAI’s third-party compute revenue. The economic logic is simple. The political logic, given the lockup of compute supply that this deal effectively creates, is fascinating. Public investors are being asked to underwrite a business where the largest compute customer is a direct AI competitor and where that supply contract is the single biggest piece of disclosed enterprise AI revenue.

    What this IPO most resembles is not Tesla’s 2010 deal or Twitter’s 2013 deal but rather a hybrid of the East India Company chartering and a moonshot R&D vehicle taken public. It is a real cash-flowing business at the Connectivity layer (the largest satellite ISP on Earth) wrapped around a launch monopoly (more than 80% of global mass to orbit) wrapped around a venture-stage AI laboratory (Colossus, Grok, the Anthropic deal, the Cursor option) all underwritten by a CEO compensation structure whose biggest payoffs require a Mars colony. The investor question is not whether any individual piece works, because three of the four pieces clearly do. The question is whether the public market will price the orbital compute and Mars optionality at zero, at a small positive number, or at the eye-watering multiple the $7.5 trillion top tranche of Musk’s pay package implies the board thinks is achievable. There is no precedent for a public company successfully executing on that scale of ambition. There is also no precedent for SpaceX, Starlink, Falcon 9, or Colossus II coming online in 91 days. The S-1 reads like the company assumes the precedent is itself.

    Read the full SpaceX S-1 filing on the SEC EDGAR system for the complete prospectus, including the financial statements and all related disclosures.