PJFP.com

Pursuit of Joy, Fulfillment, and Purpose

Tag: AI infrastructure

  • OpenAI and Broadcom Unveil Jalapeño, a Custom LLM Inference Chip to Cut Compute Costs and Reduce Nvidia Dependence

    OpenAI and Broadcom pulled the wrapper off Jalapeño on Wednesday, June 24, 2026, a custom silicon accelerator that OpenAI is calling its first “Intelligence Processor” and its first real move into designing the hardware underneath its own models. Broadcom President and CEO Hock Tan and President Charlie Kawwas physically handed the wafer to OpenAI CEO Sam Altman and President and Co-Founder Greg Brockman, a staged moment meant to signal that the ChatGPT maker is no longer just a models-and-products company but is now reaching all the way down to the chip. Jalapeño is purpose-built for large language model inference, the compute-intensive job of actually serving answers to users rather than training the model in the first place, and OpenAI plans to deploy it at gigawatt scale by the end of 2026 as the first step in a multi-generation platform built with Broadcom and Canadian electronics manufacturer Celestica. You can read the announcement straight from the source in OpenAI’s official post.

    TLDR

    OpenAI and Broadcom unveiled Jalapeño, OpenAI’s first custom AI chip, an ASIC designed from a blank slate specifically for LLM inference rather than training, manufactured by TSMC and integrated into server systems by Celestica that only OpenAI will use. OpenAI claims the chip went from initial design to manufacturing tape-out in just nine months, what it calls the fastest ASIC development cycle ever in high-performance advanced semiconductors, accelerated in part by using its own AI models to design the silicon. Engineering samples are already running ML workloads in the lab, including GPT-5.3-Codex-Spark, and OpenAI says early testing shows performance per watt “substantially better” than current state-of-the-art, a self-reported and not yet independently verified claim with a full technical report promised in the coming months. Broadcom CEO Hock Tan told Reuters the chip matches Nvidia’s Blackwell and Google’s TPUs, framing the launch as part of a flywheel where OpenAI owns the full stack from chip to model to product. The chip slots into a broader infrastructure strategy targeting 10 gigawatts of custom accelerator capacity between 2026 and 2029 with deployments alongside Microsoft and other partners, and The Decoder reported Microsoft is expected to buy 40 percent of the chips, a guarantee Broadcom reportedly demanded to secure the first phase. The move is widely read as OpenAI diversifying away from Nvidia, continuing a procurement spree that already includes AWS Trainium, AMD, and Cerebras, as inference quietly becomes the company’s real cost center.

    Thoughts

    The single most important word in this announcement is “inference,” and it is the word doing the heavy lifting. Training a frontier model is a capital expense that happens in bursts. Inference is the bill that arrives every single day, forever, scaling linearly with usage. Every ChatGPT reply, every Codex task, every API call, every agent step is an inference event, and as OpenAI’s product surface explodes that recurring cost is the thing that actually threatens the unit economics. A custom chip aimed squarely at inference is therefore not a vanity project or a research flex. It is OpenAI attacking the largest variable cost in its business at the root, trying to bend its cost-per-token curve below what it pays renting Nvidia GPUs. If Jalapeño lands anywhere near its claims, the payoff is not faster benchmarks, it is gross margin.

    The performance-per-watt claim, though, deserves the most skeptical reading in the room. OpenAI says Jalapeño will deliver performance per watt “substantially better” than current state-of-the-art, but it has not finalized the numbers, has not said which chips it tested against, on what tasks, or under what conditions, and the full technical report is somewhere in the indefinite “coming months.” These are self-reported figures from a company with an enormous interest in convincing the market it has a credible alternative to Nvidia. Hock Tan’s line that the chip is “as good as” Blackwell and Google’s TPUs is a CEO talking his own book in an interview, not a measured result. The honest posture is to treat the figures as marketing until the technical report lands. A chip running engineering samples in a lab at target frequency is real progress, but it is a very long way from a chip that holds those numbers across a production fleet under messy real-world load.

    OpenAI left the most revealing detail out of its own press release: the report, via The Decoder, that Broadcom demanded Microsoft guarantee it will buy 40 percent of the chips to secure the first phase. That single sentence tells you who is actually carrying the risk. Building gigawatt-scale custom silicon is brutally capital-intensive, and Broadcom is not willing to commit manufacturing capacity on the strength of OpenAI’s demand alone. It wants a balance sheet behind the order, and Microsoft, OpenAI’s largest backer, is the balance sheet. That detail quietly reframes the whole “OpenAI owns the stack” narrative. OpenAI may design the chip, but the deployment is underwritten by Microsoft’s purchasing commitment, which means Microsoft also gets leverage and supply security out of an OpenAI-branded part. Ownership of the design is not the same as ownership of the risk.

    The flywheel framing is genuinely interesting and probably the most defensible strategic claim OpenAI is making. OpenAI says it used its own models to accelerate parts of the chip design and optimization, compressing a normally multi-year ASIC cycle into nine months. If that is even partly true, it is a meaningful loop: the models help design the chips, the chips run the models more cheaply, the cheaper models drive more usage and revenue, and the revenue funds the next chip. That is a compounding advantage that is hard for a pure hardware vendor to replicate and hard for a pure software lab to replicate. The catch is that nine months from design to tape-out is a claim about speed, not about whether the resulting chip is actually competitive in volume. Fast tape-out and great silicon are different achievements, and the industry has seen plenty of chips that taped out quickly and underwhelmed in production.

    Strip away the “Intelligence Processor” branding and this is a playbook we have already watched run three times. Google built TPUs, Amazon built Trainium and Inferentia, Meta built MTIA, and all of them turned to Broadcom or Marvell for the design IP that is hard to replicate in-house. OpenAI is doing the same thing with the same partner, just later and louder. The diversification arc is unmistakable: OpenAI was one of the biggest Nvidia GPU buyers on earth, and in the span of a year it has signed deals for AWS Trainium, AMD accelerators, and Cerebras inference hardware, and now its own custom ASIC. Nvidia is not in trouble, demand still vastly outstrips supply, but the era where the largest AI labs were captive single-vendor customers is clearly ending. The most intriguing wildcard is OpenAI’s own line that Jalapeño is “designed with flexibility to work with all LLMs.” That is not how you describe a chip you intend to keep entirely to yourself. It hints, however faintly, at an OpenAI that could one day rent out inference infrastructure the way it now rents models, which would put it in direct competition with the very cloud providers it currently depends on.

    Key Takeaways

    • OpenAI and Broadcom unveiled Jalapeño on Wednesday, June 24, 2026, OpenAI’s first custom AI chip and its first piece of in-house silicon after years focused on models and products.
    • The chip is branded an “Intelligence Processor” and described as the first AI accelerator in a multi-generation compute platform the two companies are building together.
    • Jalapeño is purpose-built for large language model inference, the compute-intensive work of generating responses and serving answers to users, and explicitly not for training.
    • Inference is OpenAI’s recurring cost center: every ChatGPT conversation, coding request, image generation, and agent action relies on it, making it one of the highest ongoing costs in the business.
    • Broadcom President and CEO Hock Tan and President Charlie Kawwas physically delivered the first wafer to OpenAI CEO Sam Altman and President Greg Brockman.
    • OpenAI designed the chip from scratch around its understanding of LLM fundamentals, informed by its roadmap of models, kernels, serving systems, and product needs.
    • Jalapeño is described as a blank-slate design for modern LLM inference, not a general-purpose accelerator adapted from earlier AI workloads.
    • The chip is shaped by the systems OpenAI runs daily across ChatGPT, Codex, the API, and future agentic products, while also being designed to work with current and future LLMs across the industry.
    • The stated performance goal is to combine the throughput of today’s leading AI accelerators with latency closer to the fastest specialized inference systems, suiting it for interactive LLM products at scale.
    • OpenAI frames this as its full-stack advantage: it designs frontier models, builds products on top of them, and now designs the chip architecture, kernels, memory systems, networking, scheduling, and deployment systems underneath.
    • OpenAI claims Jalapeño went from initial design to manufacturing tape-out in just nine months.
    • The companies call it what they believe to be the fastest ASIC development cycle ever achieved in high-performance advanced semiconductors, against a backdrop of typically multi-year timelines.
    • OpenAI used its own AI models to accelerate parts of the chip design and optimization process, which it credits for the speed.
    • OpenAI frames the result as a flywheel: the same models served to users help improve the infrastructure that runs future models, lowering compute cost across the industry.
    • Engineering samples of Jalapeño are already running ML workloads in the lab at production target frequency and power.
    • Among the workloads running on the samples is OpenAI’s GPT-5.3-Codex-Spark model.
    • GPT-5.3-Codex-Spark currently runs on Cerebras hardware, which also specializes in inference, per The Decoder.
    • OpenAI says early testing shows Jalapeño will deliver performance per watt “substantially better” than current state-of-the-art hardware.
    • That performance-per-watt claim is self-reported and lacks independent verification; OpenAI has not said which chips it tested against, on what tasks, or under what conditions.
    • OpenAI says it is still measuring final performance and has promised a detailed technical report in the coming months.
    • The architecture reduces data movement and balances compute, memory, and networking resources to push realized utilization much closer to theoretical peak performance.
    • Jalapeño is an ASIC, which experts say is less flexible than Nvidia’s GPU but less expensive and tailorable to specific AI tasks.
    • Broadcom contributes silicon implementation and networking technologies, including its Tomahawk networking silicon, to bring the platform to large-scale production.
    • Canadian electronics manufacturer Celestica provides board, rack, and system integration expertise and will build the server systems.
    • The chips are manufactured by Taiwan’s TSMC, the world’s leading advanced semiconductor foundry, after OpenAI sent over the design.
    • Both the chips and the Celestica-built server systems will be used only by OpenAI, not sold to outside customers.
    • OpenAI plans to deploy Jalapeño at gigawatt scale by the end of 2026, with expansion in the years ahead, as the first step in a multi-generation plan.
    • Hock Tan said gigawatt-scale data center deployment will happen with Microsoft and other partners beginning in 2026.
    • The Decoder reported Microsoft is expected to buy 40 percent of the chips, with Broadcom reportedly demanding Microsoft guarantee that share to secure the first phase.
    • Broadcom CEO Hock Tan told Reuters that Jalapeño is as good as Nvidia’s Blackwell chips and the TPUs designed by Alphabet’s Google.
    • In October 2025, after 18 months of working together, OpenAI and Broadcom went public with plans to develop and deploy racks of OpenAI-designed chips starting late this year; CNBC framed the unveiling as coming eight months after that deal.
    • The prior OpenAI-Broadcom plan ultimately aimed at 10 gigawatts of custom AI accelerator capacity, with deployments expected between 2026 and 2029.
    • Estimates suggest OpenAI’s broader infrastructure plans could eventually involve around 26 gigawatts of computing capacity across custom chips, Nvidia hardware, and other accelerators.
    • OpenAI has been one of the biggest buyers of Nvidia’s GPUs since kickstarting the generative AI boom in 2022, but explosive demand has pushed it to seek other sources of advanced silicon.
    • Earlier in 2026 OpenAI struck a deal with Amazon Web Services that includes use of AWS Trainium chips, and has also signed agreements with AMD and with Cerebras, which held its IPO in May.
    • The move is widely characterized as OpenAI diversifying away from and reducing dependence on Nvidia while creating an alternative to its GPUs.
    • OpenAI’s stated goals with the chip are to reduce costs, improve energy efficiency, secure long-term computing supply, and gain more control over the infrastructure powering its services.
    • Broadcom shares climbed about 2 percent following the announcement, are up roughly 10 percent year-to-date in 2026, and have multiplied almost sevenfold since the end of 2022.
    • To build in-house chips, Meta, Amazon, and Google have turned to firms like Broadcom and Marvell for design services and IP that are hard to replicate internally; Reuters first reported OpenAI was exploring its own chip in 2023, and sources told Reuters in April 2026 that Anthropic is weighing its own AI chip.
    • Broadcom’s margin on custom AI chips is currently lower than on products like networking switches due to AI-driven high-bandwidth memory demand; Tan said SK Hynix and Samsung Electronics supply Broadcom with memory chips.

    Detailed Summary

    A blank-slate chip built only for inference

    Jalapeño is OpenAI’s first so-called Intelligence Processor, and the company is emphatic that it is not a repurposed general-purpose accelerator. It was designed from a blank slate specifically for modern large language model inference, the job of crunching data to answer a user’s query rather than the separate, bursty work of training a model. OpenAI says it designed the chip from scratch around its own deep understanding of LLM fundamentals, informed by its roadmap of models, kernels, serving systems, and product needs, drawing on the systems it runs every day across ChatGPT, Codex, the API, and future agentic products. The stated objective is to fuse the raw power and throughput of today’s leading AI accelerators with latency closer to the fastest specialized inference systems, which would make Jalapeño particularly well suited to interactive products used at scale. Notably, OpenAI also says the chip is designed with flexibility to work with all LLMs across the industry, not only its own, a claim that sits a little oddly next to its plan to keep the hardware entirely in-house.

    The full-stack flywheel and AI designing its own silicon

    OpenAI is selling Jalapeño as proof of a full-stack advantage. The argument is that because OpenAI now develops frontier models, builds products on top of them, and designs the infrastructure underneath them, including chip architecture, kernels, memory systems, networking, scheduling, deployment systems, and the product experience, every layer can be optimized around the same goal of making its models faster, more reliable, and cheaper. OpenAI describes this as a flywheel: better infrastructure drives compute efficiency, which enables better training and serving, which powers more capable models, which become better products, which drive more usage and revenue, which funds the next generation of infrastructure. The most striking piece of that loop is that OpenAI used its own AI models to accelerate parts of the chip’s design and optimization. The company’s framing is direct: if AI can help engineers design better chips faster, it can lower the cost of compute across the industry. That self-referential loop is the part of the announcement that is genuinely novel rather than a rerun of an existing hyperscaler playbook.

    Nine-month tape-out and the partner stack

    OpenAI claims it took roughly nine months to go from initial design to manufacturing tape-out, and calls this what it believes to be the fastest ASIC development cycle ever achieved in high-performance advanced semiconductors, against an industry norm measured in years. It credits deep software-hardware co-development, Broadcom’s silicon implementation expertise, and the use of its own models to compress the schedule. The work is split across a clear partner stack: OpenAI provides the architecture and AI-specific requirements, Broadcom contributes silicon implementation and networking technology, including its Tomahawk networking silicon, and Celestica handles boards, racks, and system integration, building the actual server systems. Once the design was complete, OpenAI sent it to TSMC in Taiwan, the world’s leading advanced foundry, for manufacturing. Crucially, both the chips and the systems built around them are for OpenAI’s exclusive use; they are not products being sold to outside customers.

    Performance claims that nobody can check yet

    OpenAI says early testing shows Jalapeño will deliver performance per watt substantially better than current state-of-the-art hardware, with an architecture that reduces data movement and balances compute, memory, and networking to push realized utilization much closer to theoretical peak. Hardware program lead Richard Ho said the team optimized around the kernels, memory movement, networking, and serving patterns that matter most for frontier models, and that the chip will execute key workloads close to the hardware’s theoretical limits. He told Reuters it will be performant on what he thinks will be all kinds of future LLM iterations. The important caveat is that none of this is verifiable. OpenAI is still measuring final performance, has not finalized the numbers, and has not disclosed which chips it benchmarked against, on what tasks, or under what conditions, with the technical report only promised in the coming months. As The Decoder put it bluntly, these are self-reported numbers, unverifiable for now, that should not be taken at face value. Broadcom CEO Hock Tan’s separate claim to Reuters that the chip is as good as Nvidia’s Blackwell and Google’s TPUs is similarly an unverified assertion from an interested party.

    Gigawatts, Microsoft’s 40 percent, and who carries the risk

    Jalapeño is the opening move in a much larger infrastructure buildout. Initial deployment is targeted for the end of 2026 at gigawatt scale, expanding over multiple generations. Tan said the gigawatt-scale data centers will come online with Microsoft and other partners beginning in 2026. The deal traces back to October 2025, when, after 18 months of collaboration, OpenAI and Broadcom went public with plans to deploy racks of OpenAI-designed chips, ultimately aiming for 10 gigawatts of custom accelerator capacity with deployments expected between 2026 and 2029. Broader estimates put OpenAI’s total infrastructure ambition at around 26 gigawatts across custom chips, Nvidia hardware, and other accelerators. The detail that cuts through the optimism comes from The Decoder: Microsoft is expected to buy 40 percent of the chips, and Broadcom reportedly demanded that Microsoft guarantee that purchase to secure the first phase. That guarantee shows that the financial risk of this buildout is not OpenAI’s alone; it rests heavily on its largest backer’s balance sheet.

    The Nvidia diversification arc and Broadcom’s windfall

    Jalapeño is the clearest signal yet of OpenAI loosening its dependence on Nvidia. OpenAI has been one of the biggest buyers of Nvidia GPUs since it kickstarted the generative AI boom in 2022, but demand has exploded past what any single vendor can supply. Within 2026 alone, OpenAI has struck a deal with AWS that includes Trainium chips, signed agreements with AMD and with Cerebras, which held its IPO in May, and now rolled out its own ASIC. The pattern mirrors what Meta, Amazon, and Google already did, all of them leaning on firms like Broadcom and Marvell for design IP that is hard to build in-house, and Anthropic is reportedly weighing the same move, per sources who spoke to Reuters in April 2026. Broadcom is the obvious beneficiary, with shares up about 2 percent on the news, up roughly 10 percent in 2026, and up nearly sevenfold since the end of 2022. Even so, Tan noted that the AI-driven surge in high-bandwidth memory demand makes Broadcom’s margin on custom AI chips lower than on products like networking switches, with SK Hynix and Samsung Electronics supplying the memory.

    Notable Quotes

    “The world is moving to a compute-powered economy.”

    Greg Brockman, President and Co-Founder of OpenAI, framing the launch as a broad economic shift

    “Jalapeño is part of our long-term full-stack infrastructure strategy to make compute more abundant, resulting in AI which is faster, more reliable, more affordable for people and businesses, and can be used to solve more important problems. By designing more of the stack ourselves, we can serve more intelligence with greater efficiency and keep pushing advanced AI toward broader access.”

    Greg Brockman, President and Co-Founder of OpenAI, on the full-stack rationale for building its own chip

    “Jalapeño was designed from the ground up for LLM inference using detailed insights from our close collaboration with OpenAI researchers.”

    Richard Ho, who leads OpenAI’s hardware program, describing the chip as purpose-built rather than adapted

    “We optimized the architecture around the kernels, memory movement, networking, and serving patterns that matter most for frontier AI models. Based on early testing, Jalapeño will efficiently execute our most important workloads close to the hardware’s theoretical limits.”

    Richard Ho, who leads OpenAI’s hardware program, on the architecture’s optimization targets and early performance

    “It will be performant on, we think, all kind of future iterations of LLMs.”

    Richard Ho, OpenAI hardware chief, to Reuters on the chip’s forward compatibility with future models

    “Our collaboration with OpenAI represents a fundamental commitment to scaling the physical infrastructure required for the next decade of AI.”

    Hock Tan, President and CEO, Broadcom, on the scale of the infrastructure commitment

    “This is just the beginning of a multi-generation roadmap. By co-developing our industry-leading silicon directly with OpenAI, we are enabling the deployment of gigawatt scale data centers with Microsoft and other partners beginning in 2026.”

    Hock Tan, President and CEO, Broadcom, on the multi-generation plan and 2026 gigawatt-scale deployment with Microsoft

    “The goal is to combine the power and throughput of today’s leading AI accelerators with latency closer to the fastest specialized inference systems, making Jalapeño well suited for interactive LLM products at scale.”

    OpenAI, in the press release, stating the performance objective for the chip

    “These are self-reported numbers that haven’t been finalized. Take them with a grain of salt.”

    Maximilian Schreiner, The Decoder, on the unverified performance-per-watt claim

    Jalapeño is a real chip running real workloads in a lab, but the gap between an engineering sample and a profitable production fleet is exactly where this story will be decided over the next year, and the most important numbers, the performance-per-watt figures that justify the whole effort, remain self-reported and unverified until OpenAI publishes its technical report. Read OpenAI’s full announcement here.

    Related Reading

    • OpenAI, the chip’s designer and the primary source of the announcement and quotes.
    • Broadcom, the co-developer providing silicon implementation and Tomahawk networking.
    • Celestica, which builds the boards, racks, and server systems around the Jalapeño chip.
    • ASIC (application-specific integrated circuit), what Jalapeño is, a custom chip built for one task unlike a general-purpose GPU.
    • Nvidia Blackwell, the Nvidia architecture Broadcom’s CEO claims Jalapeño matches.
  • SubQ 1.1 Small Explained: How Subquadratic Sparse Attention Hits 98% Retrieval at 12 Million Tokens With 64.5x Less Compute Than Dense Attention

    Subquadratic, a frontier AI research and infrastructure company, has released the model card and technical report for SubQ 1.1 Small, a long-context language model built on a new attention mechanism the company calls Subquadratic Sparse Attention (SSA). The headline claim is unusual in two directions at once: the model retains 98% single-fact retrieval accuracy at 12 million tokens, roughly twelve times the length it was primarily trained on, while cutting attention compute by 64.5x against dense attention at a 1 million token context. The deeper argument in the report is not really about a single model at all. It is about what happens to the entire retrieval-and-orchestration stack once reasoning over a complete artifact stops being prohibitively expensive.

    TLDR

    SubQ 1.1 Small is a small long-context model that replaces the dense attention of an existing open-weight frontier model with Subquadratic Sparse Attention, a learned, content-dependent sparse attention mechanism that scales linearly in compute and memory rather than quadratically. On retrieval it posts 99.12% on NVIDIA’s 13-task RULER suite at 128K tokens and 100% needle-in-a-haystack accuracy at 1M and 2M tokens, holding at 98% out to 6M and 12M tokens while attending to only 0.13% of token pairs. It keeps competitive general ability, scoring 85.4% on GPQA Diamond and 89.7% pass@4 on LiveCodeBench v6, and reaches 13% on the long-horizon AutomationBench Finance agentic benchmark, close to Opus 4.8 and GPT-5.5 and well ahead of mid and small tiers. The efficiency story is a scaling win rather than a constant-factor one: 64.5x fewer attention FLOPs than dense attention at 1M tokens and 56x faster than FlashAttention-2 on a single attention layer. The report frames cheap long-context compute as a research accelerator that let the team run more than one hundred million-token experiments and find a training recipe (long-context continued pretraining is the strongest lever) rather than guess at one, positions SSA against FlashAttention, DeepSeek’s Lightning Indexer line, state space models like Mamba, and hybrids, invokes Sutton’s Bitter Lesson to argue that RAG, chunking, and agentic scaffolding are partly workarounds for context scarcity, and was independently verified by Appen. Deployment is starting with design partners now, with a 2M to 12M token lineup planned by year end.

    Thoughts

    The most interesting move in this report is the framing, not the benchmark. Subquadratic plants its flag on Richard Sutton’s Bitter Lesson and argues that much of the modern AI stack, the retrieval pipelines, the chunkers, the re-rankers, the agentic orchestration, is scaffolding built around a single computational constraint: dense attention costs grow with the square of context length. If that constraint relaxes, a lot of hand-engineered machinery that exists to feed a model the right fragments at the right moment starts to look like the task-specific pipelines that learned representations eventually displaced. That is a genuinely provocative thesis, and it is the right lens for reading the rest of the document. The company is not selling a longer context window as a feature. It is betting that whole-artifact reasoning is a different shape of capability than retrieval over fragments, and that fragmentation destroys the cross-references a contract or a codebase actually depends on before the model ever sees them.

    The part of the paper most teams will undervalue is the claim that the real payoff of efficient attention is not cheaper inference but cheaper experimentation. A dense long-context training campaign is expensive enough that most groups get a handful of attempts and are forced to guess at the recipe. Subquadratic says SSA let them run more than a hundred experiments across six model generations with per-step iteration under a minute at million-token context, which is how they discovered that long-context continued pretraining, not clever post-training, was the dominant lever. If that holds, algorithmic efficiency becomes a first-class scaling variable alongside parameters and data, because capability becomes responsive to iteration velocity rather than raw compute alone. It reframes efficiency from a deployment line item into a research multiplier, and that is a more durable advantage than any single benchmark number.

    The generalization result deserves scrutiny precisely because it is so clean. A model trained overwhelmingly at 1M tokens, with a sliver at 2M and nothing beyond, holds 98% retrieval at 12M. The proposed explanation is that SSA routes attention by content relevance rather than fixed positional pattern, so there may simply be no obvious length boundary once the routing behavior is learned. That is plausible and the report is careful to say the 12M result emerged rather than being designed for. But single-needle NIAH is a deliberately clean probe with one target and a binary answer. The far harder RULER suite is only reported at 128K, the longest standardized length in the original benchmark, so the multi-hop, aggregation, and distractor-heavy capability that whole-artifact reasoning actually requires has public numbers at 128K, not at 12M. The honest read is that precise retrieval generalizes spectacularly and composite reasoning at extreme length is still an open question the report does not over-claim on.

    What lends the report credibility is how much counter-evidence it volunteers. It walks through MiniMax abandoning its hybrid M1 architecture and returning to full attention for M2 after efficient variants showed multi-hop reasoning deficits at scale. It admits that earlier SubQ checkpoints improved retrieval while regressing on knowledge benchmarks, forcing dedicated capability-balancing work. It describes catching a case where the MRCR benchmark moved up while the model felt worse in real workflow spot-checks, and switching its development signal to RULER as a result. That last point is a quietly important methodological argument: benchmark score and deployment behavior diverged enough to change checkpoint selection, which is a warning every team shipping long-context models should internalize. A vendor confident enough to show where its own metrics misled it is more trustworthy than one that only shows the wins.

    A few caveats keep the enthusiasm grounded. AutomationBench Finance at 13% is genuinely strong relative to peers, but it is a low absolute score across the board, including for GPT-5.5 at 18% and Opus 4.8 at 16%, so this is early evidence of agentic transfer rather than proof of a finished agent. The efficiency comparisons isolate a single attention layer rather than full end-to-end model throughput, which is the right way to expose the scaling shape but not the same as a wall-clock serving benchmark. The model is built from an unnamed donor open-weight frontier model, so some of its general-knowledge and coding strength is inherited rather than created here. And the most aggressive claims about the future, a 2M to 12M lineup and much higher sparsity, are roadmap, not released artifacts. None of that undercuts the core result. It just means the right posture is to treat SubQ 1.1 Small as a strong proof of concept for an architecture that, if it scales as advertised, could quietly remove a layer of the AI stack that everyone currently takes for granted.

    Key Takeaways

    • SubQ 1.1 Small is a long-context language model from Subquadratic AI, built on a new attention mechanism called Subquadratic Sparse Attention (SSA), released June 16, 2026 alongside a model card and technical report.
    • SSA is a learned, content-dependent sparse attention mechanism that scales linearly in both compute and memory with sequence length, rather than quadratically like dense attention.
    • The central result is context-length generalization: the model was trained primarily at 1M tokens, with some training at 2M and none beyond, yet retrieval held far past the training window.
    • Needle-in-a-haystack accuracy is 100% at 1M and 2M tokens and 98% at both 6M and 12M tokens, roughly twelve times the primary training length.
    • At 12M tokens the model attends to only 0.13% of token pairs, close to a 1,000x reduction in attention relationships, while still retrieving accurately.
    • On NVIDIA’s 13-task RULER benchmark at 128K tokens, SubQ 1.1 Small scores 99.12%, with the remaining errors concentrated in aggregation-style tasks rather than retrieval.
    • RULER tests beyond single-fact lookup: single-key and multi-key retrieval, common-word and frequent-word extraction, and multi-hop variable tracing across positions.
    • At 1M tokens, SSA requires 64.5x fewer attention FLOPs than dense attention (3.9 PFLOP versus 252 PFLOP per attention layer).
    • On a single attention layer, SSA runs 56x faster than FlashAttention-2 at 1M tokens (966 ms versus 54,164 ms on an H100), reaching parity near 16K tokens and pulling away as context grows.
    • The efficiency gain is a scaling-law win, not a constant-factor speedup: the advantage over dense attention grows as context length increases.
    • On general knowledge, SubQ 1.1 Small scores 85.4% on GPQA Diamond (pass@1), below GPT-5.5 (93.2) and Opus 4.8 (92), near Sonnet 4.6 and GPT-5.4-mini (87.5), and above GPT-5.4-nano (81.7) and Haiku 4.5 (67.2).
    • On coding, it reaches 89.7% pass@4 on LiveCodeBench v6, close to the absolute frontier (GPT-5.5 92, Opus 4.8 92.2) and ahead of the smaller tiers.
    • On AutomationBench Finance, a long-horizon agentic benchmark, it scores 13%, close to Opus 4.8 (16%) and GPT-5.5 (18%) and ahead of Sonnet 4.6 (8%), Haiku 4.5 (3%), and GPT-5.4-mini (0%). Absolute scores are low across all models.
    • The model was not trained from scratch. The team converted an existing open-weight frontier model by replacing dense attention with SSA, then built long-context ability through staged context extension and continued pretraining.
    • Context was extended in stages (262K, 512K, 1M, 2M) using YaRN positional scaling, with long-context continued pretraining performed between extension stages on naturally long data: books, long documents, and repository-scale code.
    • Roughly one trillion tokens of continued pretraining were performed, most of it at the 1M-token stage.
    • Long-context continued pretraining was the most consistent predictor of long-context retrieval gains across the experiments, more so than post-training tweaks.
    • The team ran more than one hundred long-context experiments across six major model generations, which the report argues is only possible because SSA made million-token iteration cheap (under a minute per step).
    • Capability balance was a recurring challenge: gains in long-context retrieval often regressed short-context knowledge and reasoning unless training was explicitly managed for both.
    • Benchmark scores and real deployment behavior diverged. The MRCR benchmark moved up while qualitative workflow spot-checks got worse, so the team switched its primary development signal to RULER.
    • The report frames RAG, chunking, summarization, and agentic orchestration as scaffolding built around context scarcity, drawing an analogy to Sutton’s Bitter Lesson, where hand-engineered mechanisms get displaced by larger-scale learning.
    • SSA is positioned against FlashAttention (a memory optimization that does not change quadratic compute), fixed-pattern sparse attention, DeepSeek’s learned sparse line, state space models, and hybrid architectures.
    • DeepSeek’s Lightning Indexer (used in DSA and CSA) is the closest published comparison. Its quadratic scoring overtakes the sparse attention it feeds around 52,000 tokens, reaching roughly 16x the attention cost at 1M and 190x at 12M.
    • State space models like Mamba achieve linear cost through a compressed fixed-size state, but that compression is lossy and weakens exact retrieval, which is why production efficient models are usually hybrids with some dense attention layers retained.
    • MiniMax is cited as a cautionary case: it moved from a hybrid M1 to a full-attention M2 after hybrids showed multi-hop reasoning deficits at scale and less mature supporting infrastructure.
    • The benchmark results were independently verified by Appen, a third-party evaluation firm.
    • The named use cases are financial analysis and due diligence, legal and contract work, and software engineering (architecture-level reasoning, cross-file refactoring, dependency tracing, planning, review, and long-horizon memory).
    • Sparsity settings were deliberately conservative, tuned for maximum context length rather than maximum sparsity. Limited experiments at 4x the sparsity reported positive early results.
    • The training infrastructure used a memory-scaling ladder: single node, intra-node sequence parallelism, CPU offload, multi-node sequence parallelism, nested offloading, and Ring Attention for the longest contexts.
    • Beyond about 8M tokens, BF16 numerical underflow and stability became practical constraints on evaluation.
    • The technical report is authored by Saul Ramirez, Alex Whedon, Ashmal Vayani, and Phong Vo of Subquadratic AI.
    • Deployment is starting with a first cohort of design partners, with broader rollout through the quarter and a general model lineup ranging from 2M to 12M tokens by the end of the year.
    • The company’s framing line is “Efficiency is intelligence,” and its broader thesis is that the point is not bigger context windows for their own sake but reasoning directly over complete artifacts with less surrounding scaffolding.

    Detailed Summary

    The problem: whole-artifact reasoning and context scarcity

    The report opens by naming a class of tasks it calls whole-artifact reasoning: problems whose structure requires reasoning across a complete artifact rather than over isolated fragments. A legal agreement may define a term on page 2, qualify it on page 12, carve out an exception on page 46, and amend it in a schedule. A function may be defined in one file, called from forty others, and constrained by invariants encoded in the architecture rather than in comments. A financial review may require connecting filings, earnings reports, contracts, and internal records. In each case the difficulty is not locating a passage, it is reasoning over relationships distributed throughout a large artifact. Most production systems do not do this directly. They rely on retrieval pipelines, chunking, summaries, and agentic workflows that partition information and reconstruct fragments at inference time, because dense attention scales quadratically with context length and makes direct reasoning over large artifacts expensive. Subquadratic argues that much of the modern AI stack is therefore designed to manage context scarcity rather than reason over complete artifacts, and it connects this to Sutton’s Bitter Lesson: sophisticated hand-engineered mechanisms historically get displaced once larger-scale learning becomes practical.

    What SSA is and the three requirements it targets

    Subquadratic Sparse Attention is a content-dependent sparse attention mechanism designed to satisfy three requirements at once, a combination the report argues prior approaches never achieved in a practical long-context system. First, dense-attention-level retrieval and reasoning quality, which requires routing that is content-dependent (determined by the tokens themselves) rather than driven by a fixed positional pattern. Second, subquadratic scaling, where selection, retrieval, and attention are each linear in sequence length so the mechanism is linear end to end, not only within the attention read. Third, full-context training with standard autoregressive generation, so the model can optimize over the entire context during training while keeping efficient token-by-token decoding at inference. The internal mechanism by which SSA achieves this is held back as outside the scope of the report, which focuses instead on the requirements and the experimental program that followed.

    Where SSA sits among prior approaches

    The background section is effectively a taxonomy of long-context modeling. FlashAttention is treated not as a competitor but as the standard dense-attention baseline: it solved the memory problem by never materializing the full attention matrix, but it left the quadratic compute cost untouched, so doubling context still quadruples attention computation. Fixed-pattern sparse attention (sliding-window, strided, as in Longformer, BigBird, and the sliding window in Gemma) scales well but sacrifices content-dependent routing and tends to fail on retrieval benchmarks like RULER. Compression methods like Multi-head Latent Attention reduce KV-cache memory at inference but do not change the quadratic prefill cost. Learned sparse attention, exemplified by DeepSeek’s Native Sparse Attention and its Lightning Indexer, learns where to route but pays a quadratic cost in the indexer itself. State space models and linear attention (Mamba, Mamba-2 and Mamba-3, RetNet, RWKV, gated delta networks) achieve linear cost through a compressed fixed-size state, but that compression is lossy and weak on exact retrieval. Hybrids (Jamba, Kimi Linear, Qwen3 Next, Nemotron) keep a few dense layers to preserve retrieval, which means the quadratic component still dominates at long context. System-level workarounds (RAG, agentic frameworks, recursive language models) move retrieval outside the model entirely. The report’s stated open problem is to combine subquadratic scaling end to end with content-dependent retrieval, arbitrary-position access, and practical ultra-long-context training in one system, which it claims no widely deployed architecture provides and which SSA targets.

    Training: conversion, staged context extension, and continued pretraining

    Rather than training from scratch, the team converted an existing open-weight frontier model that supported a 262K-token context by replacing its dense attention with SSA. They then extended the context window in stages (262K to 512K to 1M to 2M) using YaRN to rescale positional representations, performing long-context continued pretraining between extension stages rather than jumping straight to the final length. The training mixture emphasized naturally long data such as books, long documents, and repository-scale code, packed to the target length with document separators and without masking cross-document attention boundaries. Most continued-pretraining tokens were trained at the 1M-token stage, with roughly one trillion tokens total. Post-training played a separate role: shaping how the long-context capability was expressed while preserving reasoning, coding, and instruction following. The team explored sample-level loss aggregation to keep a few extremely long examples from dominating gradient updates, and staged the post-training corpus across synthetic retrieval tasks, long-context reasoning, coding, educational material, and general instruction following, alternating capability-building phases with recovery phases.

    Results: retrieval, knowledge, coding, and agentic tasks

    On retrieval, SubQ 1.1 Small scores 99.12% on the 13-task RULER average at 128K, with errors concentrated in aggregation-style tasks like common-word and frequent-word extraction. On needle-in-a-haystack, evaluated on 50 held-out UUID samples per length, it scores 100% at 1M and 2M (within the training window) and 98% at 6M and 12M (held out), attending to only 0.13% of token pairs at 12M. On knowledge, GPQA Diamond pass@1 is 85.4%, landing between the small and mid frontier tiers and confirming that long-context optimization need not sacrifice reasoning, a result the report credits to its capability-balancing stages after earlier checkpoints showed retrieval gains coming at the cost of knowledge. On coding, LiveCodeBench v6 pass@4 is 89.7%, and the report notes coding data played a dual role, also improving non-code long-context retrieval because code is dense with the cross-position dependencies that train general routing. On long-horizon agentic work, AutomationBench Finance is 13%, where agents must discover the right endpoints among roughly 500 across 47 applications, make interdependent API calls, follow layered business rules, and ignore seeded distractors, graded on binary end-state correctness with no partial credit.

    Efficiency and the DeepSeek comparison

    Efficiency is measured on one attention layer against a dense baseline on the same backbone. Per-forward-pass attention FLOPs scale from a 2.1x reduction at 32K to 8x at 128K, 31.5x at 512K, and 64.5x at 1M tokens (3.9 PFLOP for SSA versus 252 PFLOP for dense). Measured against FlashAttention-2 in isolation, SSA reaches parity near 16K tokens and pulls away to 56x at 1M, where it runs in 966 ms versus 54,164 ms on an H100. The report devotes a discussion section to DeepSeek’s sparse attention line as the closest published comparison. DeepSeek’s Lightning Indexer is a learned selector, but it is a full-attention distilled transformer, so it scales quadratically: in a V3.2-style configuration the indexer is cheaper than the sparse attention it feeds only below about 52,000 tokens, then overtakes it, reaching roughly 16x the attention cost at 1M tokens and 190x at 12M. SSA targets that same selection role with a selector the report says is dramatically cheaper and linear throughout, and notes SSA could conceptually replace the selector over either uncompressed or compressed representations.

    Efficiency as a research accelerator and the evaluation lessons

    A recurring theme is that the most valuable effect of cheap long-context compute was on the research loop, not just inference. Where a dense campaign would allow a handful of attempts, SSA enabled more than a hundred experiments across six model generations with per-step iteration under a minute at million-token context. That throughput is what surfaced the finding that long-context continued pretraining is the strongest lever, and it leads the authors to argue that algorithmic efficiency should be treated as a first-class scaling variable alongside model and dataset size. The report is unusually candid about evaluation pitfalls. It describes how the MRCR benchmark diverged from deployment behavior, with MRCR-optimized checkpoints often feeling worse on repository-scale code reasoning, multi-document synthesis, and contract analysis, which pushed the team to rely on RULER and a fixed set of qualitative workflow spot-checks as development signals. It also cites MiniMax returning from a hybrid M1 to a full-attention M2 as evidence that reducing asymptotic cost is not sufficient on its own if retrieval quality, reasoning at scale, and system maturity are not preserved at the same time.

    Implications, availability, and what comes next

    The report’s deployment argument is that the most important enterprise implication of long-context models is not larger windows but the ability to reason directly over complete or more-complete artifacts, moving retrieval, re-ranking, and orchestration logic into the model where the task is naturally whole-artifact rather than naturally decomposable. It is careful not to declare retrieval obsolete: for corpora larger than any plausible context window, fast-changing knowledge, and genuinely multi-stage workflows, RAG and orchestration remain the right tools. The narrower claim is that the class of scaffolding that exists only to compensate for context limits gets smaller as efficient long-context models extend the reachable window. The benchmark results were independently verified by Appen. Subquadratic is deploying SubQ 1.1 Small with a first cohort of design partners now, with broader rollout through the quarter and a general lineup spanning 2M to 12M tokens planned by the end of the year, and it flags much higher sparsity as future work.

    Notable Quotes

    “Much of the modern AI stack is therefore designed to manage context scarcity rather than reason over complete artifacts directly.”

    SubQ-1.1-Small Technical Report, framing retrieval and orchestration as workarounds for an architectural limit

    “The hybrid has moved the line, but not changed its shape.”

    SubQ-1.1-Small Technical Report, on why hybrid models keep their quadratic component at long context

    “A routing mechanism intended to make long context affordable becomes the dominant long-context cost, reintroducing quadratic scaling after providing scalar compute savings.”

    SubQ-1.1-Small Technical Report, on DeepSeek’s Lightning Indexer overtaking the attention it feeds

    “If the cost of long-context experiments is too high, teams are forced to guess at the recipe. If the cost falls far enough, they can search for it.”

    SubQ-1.1-Small Technical Report, on efficient attention as a research accelerator

    “Fragmentation systematically destroys those relationships before the model ever sees them.”

    SubQ-1.1-Small Technical Report, on why chunking hurts whole-artifact reasoning

    “Holding the whole artifact in context changes the shape of the task rather than only the speed of it.”

    SubQ-1.1-Small Technical Report, on the difference between bigger windows and direct reasoning

    “The value of SSA is therefore not only that it makes long-context inference cheaper. It makes long-context experimentation cheaper.”

    SubQ-1.1-Small Technical Report, conclusion

    Read the full SubQ 1.1 Small technical report and model card here.

    Related Reading

    • Subquadratic (subq.ai) the company behind SubQ 1.1 Small and the Subquadratic Sparse Attention architecture, where you can join the waitlist.
    • The Bitter Lesson by Richard Sutton the short essay whose argument the report leans on, that hand-engineered mechanisms lose to general methods that scale with computation.
    • Attention Is All You Need the original Transformer paper that introduced the dense attention whose quadratic cost SSA is built to remove.
    • RULER (arXiv) NVIDIA’s long-context benchmark that the report uses as its primary retrieval signal, and that fixed-pattern sparse methods historically struggle with.
    • Retrieval-augmented generation (Wikipedia) background on the RAG approach that the report frames as scaffolding around context scarcity rather than a permanent fixture.
  • OpenAI’s Leaked 2025 Financials: $34 Billion in Spending, a $38.5 Billion Net Loss, and a $17 Billion Microsoft Bill Ahead of Its IPO

    Infographic summarizing OpenAI leaked 2025 financials: $13.07B revenue, $34B total costs, $20.92B operating loss, $38.53B net loss, where the $34B went, the $17.2B paid to Microsoft versus $303M paid back, inference costs, and IPO valuation context

    OpenAI’s audited 2025 financials leaked this week, and they are the clearest picture yet of what it actually costs to run the company behind ChatGPT. Independent journalist Ed Zitron first published the documents, and the Financial Times independently confirmed them. The headline: OpenAI spent $34 billion last year, booked $13.07 billion in revenue, and reported a net loss attributable to the company of $38.5 billion. The disclosure lands just days after OpenAI confidentially filed for an IPO that could value it north of $1 trillion.

    TLDR

    OpenAI’s audited 2025 numbers, leaked by Ed Zitron and confirmed by the Financial Times, show revenue tripling to $13.07 billion while total costs reached $34 billion, producing a $20.92 billion operating loss and a $38.53 billion net loss attributable to the company. The much larger net loss is inflated by a one-time $41.55 billion non-cash charge tied to OpenAI’s October 2025 conversion from a nonprofit to a public benefit corporation; strip the non-cash items and the loss is closer to $8 billion. R&D alone was $19.18 billion, cost of revenue (inference) was $7.5 billion, and sales and marketing ballooned to $5.73 billion. OpenAI paid Microsoft $17.2 billion in 2025 while Microsoft paid OpenAI only $303 million, exposing a deep Azure dependency. The company burned $1.60 for every dollar of revenue, down from $2.37 in 2024, and gross margin slipped from roughly 40% to 33% as more capable models consumed more compute per query. The leak arrives as OpenAI files a confidential S-1, targets a listing as early as September 2026 at up to a $1 trillion valuation, and races rival Anthropic, which is more valuable on paper and claims it is already turning an operating profit.

    Thoughts

    The most important thing to understand about these numbers is that there are two loss figures and the press will conflate them. The $38.53 billion net loss is the scary headline, but $41.55 billion of it is a non-cash accounting charge from converting investor convertible interests into equity during the for-profit restructuring. That charge is real on the audited statement and it will show up in the eventual S-1, but it is a one-time artifact of OpenAI’s unusual corporate history, not money that left the building. The number that describes the actual business is the $20.92 billion operating loss. That is the one to watch, and it is still enormous.

    The genuinely encouraging line in the whole release is the loss-per-dollar ratio. In 2024 OpenAI spent $2.37 to generate a dollar of revenue. In 2025 that fell to $1.60. A company that is still losing $1.60 on every dollar is not a healthy business, but a company whose efficiency improved by a third in a single year while tripling its top line is at least pointed in a defensible direction. The bull case for OpenAI lives entirely in the slope of that line. If it keeps improving at that rate, the math eventually crosses over. If it stalls, the valuation is a fantasy.

    The Microsoft relationship is the single most revealing disclosure, and it is wildly asymmetric. OpenAI paid Microsoft $17.2 billion in 2025. Microsoft paid OpenAI $303 million. That is a 56-to-1 ratio, and it reframes the partnership: Microsoft is not really a peer or even just an investor, it is OpenAI’s landlord and primary supplier, collecting rent on every model trained and every query answered. The April 2026 renegotiation that capped revenue-share payments at $38 billion through 2030, down from a projected $135 billion, suddenly looks less like a favor and more like OpenAI desperately trying to lower its single largest cost. The dependency cuts both ways, but right now Microsoft holds the better hand.

    The structural problem hiding inside the cost of revenue line is inference. Training a model is a fixed, one-time cost. Serving it is a recurring cost that scales with every one of ChatGPT’s roughly 800 million weekly users. OpenAI spent $5.02 billion on Azure inference in the first half of 2025 alone, and the more capable its reasoning models get, the more compute each answer burns. That is why gross margin went down even as revenue went up. It is the opposite of how software is supposed to work, where the marginal cost of one more user trends toward zero. OpenAI’s marginal cost is real, large, and growing. The counterargument is that per-token inference costs have been falling roughly tenfold a year, so the unit economics could still flip. That is the entire wager.

    Finally, the timing matters more than the numbers. OpenAI’s confidential S-1 means these audited figures were going to become public regardless, since the SEC requires the full prospectus at least 15 days before a roadshow. What the leak changes is who gets to study them first. Prospective IPO buyers, enterprise customers signing multi-year API contracts, and competitors now have the audited books weeks or months early, and they are reading them against Anthropic, which filed at a higher valuation and claims an operating profit. For a company asking the public markets to underwrite a $1 trillion bet on a monopoly outcome that does not yet exist, losing control of the narrative this early is not a small thing.

    Key Takeaways

    • OpenAI’s audited 2025 financials were first published by independent journalist Ed Zitron and independently confirmed by the Financial Times, the first verified look at the company’s books before its planned IPO.
    • Revenue grew from $3.7 billion in 2024 to $13.07 billion in 2025, more than tripling year over year, making OpenAI one of the fastest-growing businesses in history.
    • By the end of 2025 OpenAI was generating roughly $2 billion in monthly revenue, up from about $1 billion a quarter at the end of 2024.
    • Total costs and expenses hit $34 billion in 2025, up from $12.48 billion in 2024.
    • Research and development was the single largest expense at $19.18 billion, up from $7.81 billion, and exceeded total revenue on its own.
    • Of that R&D spend, $10.59 billion went to Microsoft, almost certainly the GPU compute cost of training frontier models on Azure.
    • Cost of revenue, the expense of serving ChatGPT responses (inference), rose from $2.65 billion to $7.5 billion.
    • Sales and marketing jumped from $1.11 billion to $5.73 billion, a 418% increase.
    • General and administrative costs rose from $907 million to $1.57 billion.
    • The operating loss, the truest measure of day-to-day economics, grew from $8.78 billion to $20.92 billion.
    • The net loss attributable to OpenAI was $38.53 billion, up nearly eightfold from $5.09 billion in 2024.
    • The bulk of that jump was a one-time, non-cash $41.55 billion charge from OpenAI’s October 28, 2025 conversion to a public benefit corporation, reflecting the changing fair value of convertible interests and warrant liabilities.
    • Stripping out the restructuring charge and other non-cash items such as stock-based compensation and Microsoft computing credits, the underlying loss was about $8 billion.
    • Including all factors, gross net loss reached $60.35 billion, lowered to the $38.53 billion attributable figure by removing $21.82 billion attributed to noncontrolling and redeemable noncontrolling interests.
    • OpenAI burned $1.60 for every $1 of revenue in 2025, an improvement from $2.37 in 2024, the clearest data point in the bull case.
    • Measured as a percentage of revenue, the operating loss improved from 237% in 2024 to 160% in 2025.
    • In total, OpenAI paid Microsoft $17.2 billion in 2025: $10.59 billion in R&D fees, $6.047 billion in cost of revenue, $527 million in sales and marketing, and $42 million in G&A.
    • Microsoft paid OpenAI just $303 million in the same year, a 56-to-1 imbalance underscoring OpenAI’s Azure dependency.
    • SoftBank paid OpenAI $867 million in 2025.
    • At year-end OpenAI carried $3.64 billion in outstanding payables to Microsoft, plus tens of millions more in accrued and non-current liabilities.
    • OpenAI spent $5.02 billion on Azure inference in just the first half of 2025; Azure inference from 2024 through Q3 2025 totaled $12.43 billion.
    • ChatGPT serves roughly 800 million weekly users, meaning billions of queries a week, each one burning GPU time at Azure’s pricing of about $6.98 per H100 GPU-hour.
    • Gross margin fell from roughly 40% in 2024 to 33% in 2025, because more capable reasoning models consume more compute per query.
    • Research firm Sacra estimates OpenAI’s inference costs reached $8.4 billion in 2025 and will rise to $14.1 billion in 2026, a 68% increase.
    • At year-end OpenAI held just over $50 billion in assets, with almost half in cash.
    • The April 2026 Microsoft renegotiation ended exclusivity and capped revenue-share payments at $38 billion through 2030, down from a projected $135 billion, potentially saving OpenAI up to $97 billion over five years.
    • OpenAI filed a confidential draft S-1 with the SEC around May 22, 2026 and confirmed it publicly on June 8, naming Goldman Sachs and Morgan Stanley as underwriters.
    • The company is targeting a listing as early as September 2026 at a valuation that could exceed $1 trillion, though Sam Altman has said a public offering “may be a while.”
    • OpenAI raised $122 billion earlier in 2026 at a $730 billion pre-money valuation, putting its post-money value around $852 billion.
    • At an $852 billion valuation, OpenAI trades at roughly 65 times its 2025 revenue.
    • Rival Anthropic also filed IPO paperwork this month after raising $65 billion at a $900-$965 billion valuation, making it more valuable on paper than OpenAI, and says it expects to report an operating profit of $559 million in the June quarter.
    • HSBC analysts estimate OpenAI may need more than $207 billion in additional capital through 2030 even under optimistic projections.
    • OpenAI projects profitability by 2029 or 2030; independent analysts put the more likely date at 2031 or later.
    • Bridgewater partner Greg Jensen reportedly told clients the implied revenue multiples price OpenAI for “a monopoly outcome that does not yet exist.”
    • Zitron separately reported OpenAI had a negative 122% non-GAAP operating margin in Q1 2026 and that ChatGPT growth has stalled, with the company projecting paid ChatGPT Plus subscriptions to fall from 44 million in 2025 toward cheaper tiers in 2026.

    Detailed Summary

    How the leak happened and why it matters now

    The audited documents were obtained and first published by Ed Zitron on his newsletter Where’s Your Ed At, then independently verified by the Financial Times, which reviewed the same materials. That dual sourcing matters: this is not a rumor or a model, it is OpenAI’s actual audited financial statement. The timing is the story. OpenAI filed a confidential draft S-1 with the SEC around May 22, 2026 and confirmed it publicly on June 8. Under SEC rules the full prospectus must be released at least 15 days before an investor roadshow, so the 2025 numbers were going to be public soon regardless. The leak simply moved that disclosure forward, handing prospective investors, enterprise customers, and competitors an early look at the books.

    Revenue tripled, costs grew faster

    OpenAI’s revenue rose from $3.7 billion in 2024 to $13.07 billion in 2025, and monthly revenue reached nearly $2 billion by year-end. By almost any normal standard that is spectacular growth. The problem is that costs grew faster, reaching $34 billion against $12.48 billion the year before. The gap between what OpenAI earns and what it spends has widened every year since its founding, and 2025 is the starkest example yet. Revenue alone was outpaced by research and development as a single line item in both of the last two years.

    Two loss numbers, and why both matter

    There are two figures that get cited interchangeably and should not be. The operating loss of $20.92 billion is what the business spent beyond what it earned from operations: training models, serving ChatGPT, paying engineers, running marketing. The net loss attributable to OpenAI of $38.53 billion is far larger because 2025 was the year OpenAI completed its conversion from a nonprofit to a for-profit public benefit corporation, finalized on October 28, 2025. That restructuring triggered a $41.55 billion non-cash charge reflecting the changing fair value of convertible equity interests and warrant liabilities. Before the conversion, investors held convertible interest rights treated as liabilities under US accounting rules and revalued upward as OpenAI’s valuation climbed, creating the charge. It is not expected to recur. Including all minor items, gross net loss reached $60.35 billion, reduced to the $38.53 billion attributable figure after removing $21.82 billion tied to noncontrolling and redeemable noncontrolling interests, primarily the OpenAI Foundation’s stake. Strip the non-cash noise and the underlying loss was about $8 billion.

    Where the $34 billion went

    The spending breaks into four lines. Research and development was $19.18 billion, the largest category, with $10.59 billion of it flowing to Microsoft for training compute. Cost of revenue, the expense of serving responses to users, was $7.5 billion and captures inference, the compute consumed every time someone prompts ChatGPT or calls the API. Sales and marketing reached $5.73 billion, up 418% year over year, a striking jump for a product that grew largely by word of mouth. General and administrative costs added $1.57 billion. The shape of the spending tells you OpenAI is simultaneously racing to build better models, serve a massive and growing user base, and aggressively defend market share through marketing.

    The Microsoft dependency

    The most striking single disclosure is the scale of the Microsoft relationship. OpenAI paid Microsoft $17.2 billion in 2025: $10.59 billion in R&D fees for model training, $6.047 billion in cost-of-revenue for inference serving, $527 million in sales and marketing, and $42 million in G&A. Microsoft paid OpenAI just $303 million the same year. SoftBank paid OpenAI $867 million. The 56-to-1 ratio between what OpenAI pays Microsoft and what Microsoft pays back makes the structural reality plain: Microsoft is OpenAI’s largest landlord. The dynamic began shifting in April 2026, when the two renegotiated, ending Microsoft’s exclusivity and capping revenue-share payments at $38 billion through 2030, down from a projected $135 billion. That could save OpenAI up to $97 billion over five years, though Microsoft keeps its IP license through 2032 and remains the primary cloud partner.

    Why inference is the core problem

    Training happens once. Serving happens billions of times a day. When OpenAI releases a model it spends months and billions on training compute, a fixed cost that falls away when training ends. Inference is the opposite: every ChatGPT message runs through the model on Azure GPU hardware, consuming electricity and compute to generate a response. With roughly 800 million weekly users, that is billions of queries a week, each burning GPU time at roughly $6.98 per H100 GPU-hour on demand. OpenAI spent $5.02 billion on Azure inference in the first six months of 2025 alone. Sacra estimates full-year inference costs of $8.4 billion in 2025, rising to $14.1 billion in 2026. This is why gross margin fell from about 40% to 33% even as revenue tripled: more capable reasoning models consume far more compute per query, and revenue has not kept pace with the cost growth that capability generates.

    What it means for the IPO and the race with Anthropic

    OpenAI was last valued around $852 billion post-money after raising $122 billion in early 2026, which puts it at roughly 65 times 2025 revenue. It has named Goldman Sachs and Morgan Stanley as underwriters and is targeting a listing as early as September 2026 at up to a $1 trillion valuation, though Altman has hedged that it “may be a while” and that staying private might be the better course. HSBC estimates the company may need more than $207 billion in additional capital through 2030. The race is with Anthropic, which filed paperwork the same month after raising $65 billion at a $900-$965 billion valuation, making it more valuable on paper, and which says it expects a $559 million operating profit in the June quarter. The contrast is sharp: the two leading AI labs heading toward public markets at the same time, one bleeding cash at scale, the other claiming profitability, both asking investors to bet on a future that has not arrived.

    Notable Quotes

    “The financial condition of OpenAI is deeply concerning. $38.53 billion in losses are astronomical, and far higher than most believed it would be. Losses also appear to be mounting year-over-year at a dramatic rate, and I’m not sure how this company finds a way toward any kind of sustainability or profitability.”

    Ed Zitron, the independent journalist who published the leaked audited financials

    “It’s unclear what this means, nor how OpenAI reconciled the removal of $3.74 billion in costs. I will not speculate further.”

    Ed Zitron, on a discrepancy he found in the restated 2024 figures

    “OpenAI’s two biggest expenses are R&D and marketing. Budget cuts there, coupled with an ability to raise prices or win new sources of revenue, could see the company move into the black over time. Cutting R&D would be the most difficult part of that, given that AI companies can only hold onto their customers by generating the best-performing models.”

    Jim Edwards, Fortune, on whether OpenAI has a realistic path to profitability

    “What the audited documents make impossible to argue is that the path to profitability is short, clear, or cheap.”

    TechTimes analysis of the leaked OpenAI financials

    The implied revenue multiples price OpenAI for “a monopoly outcome that does not yet exist.”

    Bridgewater partner Greg Jensen, reportedly telling clients how to read OpenAI’s valuation

    “OpenAI spent $34bn last year as the ChatGPT maker poured money into a race to dominate the fast-growing AI market ahead of a planned stock market listing.”

    George Hammond and Bryce Elder, Financial Times, framing the audited 2025 spend

    Read Ed Zitron’s original reporting with the full breakdown here, and the Financial Times confirmation here.

    Related Reading

    • Ed Zitron, Where’s Your Ed At the primary source that broke the audited 2025 financials with the full line-by-line breakdown.
    • OpenAI (Wikipedia) background on the company’s history, structure, and the nonprofit-to-for-profit conversion that drives the non-cash charge.
    • Inference (Wikipedia) on the recurring compute cost that explains why OpenAI’s gross margin shrinks as usage grows.
    • Anthropic the rival lab that filed IPO paperwork the same month at a higher valuation and claims it is already operating at a profit.
    • SEC on confidential filings context for why OpenAI’s audited numbers were headed for public disclosure regardless of the leak.
  • Benedict Evans on the Economics of AI Usage, Why Foundation Models May Become Commodities, and What Comes Next for SaaS

    Benedict Evans returns to the a16z podcast to update the thesis behind his widely read “AI eats the world” presentation, and the picture he paints is less about hype and more about hard economics. In this conversation he works through what has actually played out in the last year, why agentic coding became the one use case with real product market fit, and why he keeps arguing that foundation models may end up as commodities while the value moves somewhere else entirely. You can watch the full conversation here.

    TLDW

    Benedict Evans argues that the AI moment looks a lot like the early internet, the early PC era, and the rollout of mobile data, which means it is exciting, genuinely transformative, and almost impossible to predict use case by use case. Agentic coding is the only field with clear product market fit right now, with revenue run rates exploding from roughly nine billion to forty seven billion, while consumers still use chatbots weekly rather than daily. His central claim is that foundation models show no obvious network effect or sustainable differentiation, the chatbot is a limited v1 interface, and the model labs cannot build every application, so the value will likely move up the stack the way it did with chips, ISPs, and mobile networks rather than staying with the model providers. He covers the brutal supply and demand disequilibrium driving today’s token pricing and ten thousand dollar surprise bills, the financial gravity problem of hyperscalers spending over half their revenue on capex, the Jevons paradox and consumer surplus that may compete away productivity gains, the way the important questions move out of San Francisco and into industries like law, consulting, finance, and advertising, and the distinction between automating tasks and changing jobs. His closing image is an IBM ad from the 1950s promising “150 extra engineers,” a reminder that every platform shift feels unprecedented and that in twenty years we will simply say of course computers do that.

    Thoughts

    The most useful thing Evans does here is refuse to collapse uncertainty into a clean prediction, and then explain exactly why that refusal is the correct posture rather than a cop out. He distinguishes between the parts where he will commit to a view, that foundation models are probably not a product and the chatbot is probably not the right interface, and the parts where there are simply too many open paths to call. That discipline is rare in AI commentary, where the incentive is to sound certain. The commodity argument is not “models are worthless.” It is a chain of reasoning: there is no visible network effect, no durable differentiation beyond willingness to spend, no lock in comparable to Windows or iOS, and a likely structure of three to six well funded competitors plus open source and edge models all selling the same thing. Ask where price discipline comes from in that picture and the honest answer is that it probably does not, which is how you get a commodity even when demand is effectively infinite.

    The mobile data analogy is the load bearing comparison and it deserves to be taken seriously. Mobile data traffic rose something like fifteen hundred to two thousand times over fifteen years, the networks built an extraordinary piece of global infrastructure, everyone came to depend on it, and yet the operators captured almost none of the value because all the interesting stuff got built on top by someone else. Telco stocks were flat for two decades. If that is the template, then the trillion dollars of capex flowing into AI infrastructure can be both a worthwhile investment and a terrible place to expect outsized equity returns, because building the road is not the same as owning the traffic. The counterpoint Evans keeps fairly on the table is the operating system path, where Windows and iOS did capture value, but he notes they had levers and network effects that LLMs do not appear to have.

    His framing of where the questions live is the part most people in tech underweight. Once a technology works, the interesting questions stop being technology questions. Netflix is not a tech company in the sense that matters, because its real decisions are Los Angeles decisions about shows, talent, and sports, not San Francisco decisions about infrastructure. By the same logic, what AI means for a law firm is mostly a question for people who understand what associates actually do and what clients are actually paying for, not for model researchers. This is why the “the model will just do the whole thing” story keeps running aground. Most valuable software does not solve a problem the customer already knew they had. It often takes years to convince an industry that a problem even exists, and an LLM prompt does not surface latent problems that no one has articulated.

    The economic plumbing he describes is where the near term risk actually sits. We are in extreme disequilibrium, where twenty dollars a month can buy ten thousand dollars of tokens on one side and a weekend of experimentation can produce a ten thousand dollar bill on the other, exactly the pattern mobile data went through around 2009 and 2010. That gets resolved with the boring machinery of caps, throttling, and pricing tiers, not with magic. Layered on top is the financial gravity problem: Microsoft, Meta, and Google heading toward spending more than half of revenue on capex, with roughly seven hundred billion dollars of guidance across the big players, against a hard ceiling because there is not ten trillion dollars a year available to spend. And even when the productivity gains are real, the Jevons paradox and consumer surplus suggest much of the benefit gets competed away. If a discounted cash flow model used to take a week and now takes ten seconds, you do fifty of them and charge the client the same, which is great for clients and unremarkable for margins.

    The honest takeaway for builders is that the answer to “what does this do to software” is more software, probably one or two orders of magnitude more, just as SaaS itself produced an explosion rather than a consolidation. The SaaS apocalypse is real in the sense that some meaningful percentage of existing companies get wiped out, and unknowable in the sense that no one can yet say which ones, which is why thoughtful investors are reluctant to be long software in the dark. For anyone pursuing a more deliberate, purposeful relationship with technology, the closing note is the one to keep: every one of these shifts felt singular and world ending and world making at the time, it reshaped work and put people out of jobs and created things we love, and then it quietly became invisible. The goal is to stay clear eyed about which of those buckets a given change lands in rather than getting swept up in the noise of what someone said at a party yesterday.

    Key Takeaways

    • Agentic coding shifted from “kind of useful” to “really changing everything” at the start of the year, and it is the single field with unambiguous product market fit, where customers are pulling it out of your hands.
    • Coding working first was foreseeable in hindsight: software developers were the ones messing with the tools, and the first thing people do with a new kind of computer is build more computing, just as the first thing people did with PCs was make computers.
    • Anthropic, with less capital raised, chose to focus on coding and got it working, while OpenAI cycled through a more everything all at once strategy before narrowing in.
    • The intense focus on coding comes bundled with a supply crunch, a capacity crunch, and a price and capex imbalance that defines the current moment.
    • Most of the fundamental questions from two or three years ago still have no answers: whether there will be a winner in models, whether models capture value up the stack, how much they can do, and whether consumers will use this daily rather than weekly.
    • There is a wide gap between Valley insiders running clusters of Mac Studios all day and the roughly forty percent of people who say AI is “kind of useful, I used it last week for something.”
    • Outside tech, companies are adopting AI as one at a time point solutions for specific back office processes, like a commodities company using LLMs for better cash flow forecasting, not as a general purpose assistant.
    • Adoption always compounds on prior platforms: you could not have nine hundred million weekly active users in the Netscape era because there were not nine hundred million PCs on the planet.
    • Early in any platform shift almost nothing works smoothly, from sound cards and floppy disks with TCP/IP to computers that froze and lost your work, and AI is at that stage now.
    • Today’s token pricing crunch mirrors the mobile data shock of 2009 to 2010, where flat rate plans collided with surging usage and networks had to realign price with marginal cost through caps, fair use, and throttling.
    • Mobile data traffic rose roughly fifteen hundred to two thousand times in fifteen years, mobile networks earn around a trillion dollars and spend about two hundred billion a year on capex, yet their stocks have been flat for twenty years because all the value moved up the stack.
    • The central LLM question is whether the model can do the whole thing or whether you need hundreds of applications built on top, the same way you needed apps on Windows and iOS.
    • Evans sees no network effect and no sustainable differentiation between models beyond willingness to spend money, which points toward commodity infrastructure sold near marginal cost.
    • Chip companies, ISPs, and mobile operators did not capture the value; Windows and iOS did, but only because they had levers to move up the stack and real network effects, which models lack.
    • A useful comparison is semiconductors, where each generation gets more expensive and the field narrows to fewer players, suggesting three to six frontier model makers spending somewhere between two hundred billion and two trillion dollars a year.
    • Enterprises do not standardize on a model the way they once thought about AWS; the cloud and the model get abstracted away, so customers do not even know which one their SaaS product runs on.
    • Demand for tokens being effectively infinite does not prevent a price equilibrium, exactly as infinite demand for mobile bits still produced murderous price wars between commodity carriers.
    • History teaches that something will happen but rarely what; the smartest people in tech wrongly predicted Android would crush the iPhone on open versus closed grounds.
    • One characteristic of tech is that the moment you understand how something works is the moment to move on, which is why Evans stopped updating his Apple spreadsheet years ago.
    • The people who are good at using a tool are usually not the people who are good at designing what the tool should be, which is why model labs cannot build every skill or vertical application.
    • Claude skills and similar templates resemble file new in Excel: useful starting points that users eventually outgrow, raising the question of who builds the real software.
    • The questions increasingly move out of technology and into specific industries; what AI means for law, consulting, advertising, or accounting is partly an AI question and partly a deep domain question.
    • Netflix is not a tech company in the way that matters, because its real questions are media industry questions about shows, talent, and sports, not infrastructure; the same logic now applies across industries facing AI.
    • AI differs from prior platform shifts because the physical limits are unknown; in 1995 you knew PCs cost three thousand dollars and broadband could not reach everyone overnight, but no one knows how cheap, fast, or capable models will get.
    • Evans offers four buttons to press on any use case: is it just price elasticity and the Jevons paradox, does it remove a cost barrier to entry, does it unlock a new business model, or does it make something previously impossible now possible like trains over horses or Spotify over CDs.
    • Advertising and e-commerce are a standout opportunity because today’s systems know a SKU and a metadata field but not what a product actually is or why people buy it, and LLMs could change that level of understanding.
    • The valuable shift is not doing the old thing more, like more spreadsheets or better email, but doing genuinely new things, such as asking an LLM how to change prices to improve churn using all your call recordings, CRM flows, and product telemetry.
    • Enterprise software today splits into three buckets: big horizontal systems like SAP and Workday, three to four hundred vertical SaaS apps plus a thousand internal apps, and a fuzzy improvised middle of Excel, email, and shared files, with AI arriving as a new option across all three.
    • A core design tension is where to put the probabilistic software that can make mistakes versus the deterministic database that cannot, and whether the LLM sits at the top or the bottom of the stack; the answer is probably both depending on the task.
    • The net effect on software is way more software, since SaaS itself produced one to two orders of magnitude more software and all software companies exist to solve problems created by other software companies.
    • The SaaS apocalypse is real but unknowable: some percentage of SaaS companies get wiped out, but no one knows which, so you should not derate the whole sector fifty percent and many investors are wary of being long software for now.
    • Much of what an organization does is implicit, undocumented, and not in the training data, which is exactly the value McKinsey, Bain, and BCG provide by getting license to map how a company really works.
    • The real decisions are usually exception handling: the question is always what you cannot automate and what still requires human judgment about cases that were never written down.
    • Distinguish tasks from jobs: accountants spend almost none of their time the way they did fifty years ago, yet to the client the job looks the same.
    • LLMs excel where you want the average, the answer anyone would give, and struggle where you specifically do not want the average and cannot fully explain why you did it differently.
    • There is a financial gravity ceiling: Microsoft, Meta, and Google are on track to spend over fifty percent of revenue on capex versus fifteen to twenty percent for capital intensive telecoms, with seven hundred billion in guidance this year and no path to ten trillion.
    • Hyperscalers face an existential FOMO trap: returns look positive now, but they cannot let rivals build the future of compute without participating, even as the CFO asks how much participation is enough.
    • Token maxing will face a reckoning as the disequilibrium resolves, but measuring ROI is hard because most reported benefits so far, like better analytics, support, and productivity, are tough to put a financial value on.
    • Consumer surplus means many gains get competed away: if analysis that took a week now takes a day, you do five times more analysis and charge the same, the way investment banks did with spreadsheets.
    • Evans closes with a 1950s IBM ad promising “150 extra engineers,” a reminder that every fundamental technology change feels unprecedented, and that in twenty years AI will simply be invisible magic we take for granted.

    Detailed Summary

    What changed in the last year

    Evans frames the past year as a narrowing of focus. A year and a half after the first version of his presentation, the field has developed a much clearer sense of diverging product strategies and competitive tension that goes beyond simply building a bigger model with more compute. The dominant shift is that agentic coding started genuinely working, and the entire industry narrowed in on it because it has absolute product market fit, the kind where customers pull the product out of your hands. That success arrives alongside the supply crunch, capacity constraints, and price imbalance that now define the moment. At the same time, the charts keep climbing, models keep getting bigger, capex keeps growing, and usage keeps growing, while the deep questions from a few years ago remain unanswered.

    Why coding worked first

    That coding led was predictable at a naive level: the people experimenting with the tools were software developers, and they naturally tried to make software development work. Evans compares the moment to the internet around 1997 and 1998, and also to PCs in the late seventies and early eighties, when the technology was exciting but it was not clear what it was for and it did not quite work yet. The first thing people did with PCs was make computers, and since LLMs are in a sense computers, the first thing people are doing with them is making more compute. What was harder to foresee was the precise timing of the shift, the moment when agentic coding flipped from useful to transformative at the start of this year.

    Jobs, juniors, and what we have not learned

    On the question of what this means for engineers and team structure, Evans is blunt that we have learned almost nothing yet, because this did not even work six months ago and everyone is scrambling to interpret it. The pricing crunch alone means it will take a couple of years to settle. The newly concrete questions include whether you still hire junior people and what they would do, and why you were hiring juniors in the first place, whether to do the work itself or to develop people. Because software development now genuinely automates a class of work that used to be done by people, those questions have moved from theoretical to real, but no one can responsibly claim to know what a software team or a software career looks like in three years.

    OpenAI, Anthropic, and the strategy split

    Evans dryly notes the drama around the model labs, including the disruption of a senior leadership medical leave at OpenAI. In the latter part of last year, OpenAI’s question was essentially what to build on top of the models, an everything all at once approach that looked almost like asking the model for fifteen ideas and then doing all of them. Anthropic, with less capital raised, instead committed to coding and got it working, whether by deliberate strategy or by stumbling into it. The result is that software development plus a few other fields are where things genuinely work, surrounded by a large population of people excited around the edges and corporations quietly automating specific back office processes. He cites a commodities company that wants LLMs for better cash flow forecasting across many small producers, a very different thing from asking a chatbot to summarize your meetings.

    The mobile data analogy and value capture

    The richest section is the comparison to mobile. Adoption always compounds on prior platforms, so AI inherits a far larger installed base than the internet or mobile did at their starts. Early on, nothing works smoothly, and Evans recalls the era of buying a three hundred dollar sound card or wrestling a floppy disk of TCP/IP into a machine. The pricing dynamics directly echo mobile data around 2009 and 2010, when flat rate plans met exploding usage and ten thousand dollar bills, forcing networks to realign price with marginal cost. Crucially, mobile data traffic then rose fifteen hundred to two thousand times, the networks built extraordinary global infrastructure with around a trillion dollars of revenue and two hundred billion in annual capex, and yet their stocks stayed flat for twenty years because all the cool stuff and all the value got built and captured by someone else higher up the stack. Chip companies, ISPs, and mobile operators did not capture value; Windows and iOS did, but they had levers and network effects that models do not appear to share.

    The case that models become commodities

    Evans lays out the building blocks of his commodity thesis. First, there is no clear way to build a model that is sustainably and fundamentally better than everyone else’s, with no visible network effect and no strategic lever comparable to what Instagram, YouTube, or Google search enjoy. Differences in emphasis and taste exist, but not durable competitive moats beyond spending. Second, the chatbot is a weird, limited v1 interface that works well for some tasks and people but requires tooling, the right data, configuration, control, and thoughtful design for most real jobs, and the people good at a job are rarely the people good at designing the tool for it. Third, the labs cannot build every application any more than Microsoft or Apple could build every Windows or iPhone app. Enterprises do not standardize on a model the way they never standardized on a visible cloud provider, because it gets abstracted away. Taken together, that points to low level infrastructure sold by perhaps half a dozen competitors plus open source and edge, with no obvious source of price discipline, which is the definition of a commodity even when demand is infinite.

    The questions move out of technology

    One of the next big questions is when models become good enough that you no longer need the largest, fastest, most expensive model, and can use an older model, an open source model, or one running on device where compute is effectively free to the developer. But the deeper shift is that the important questions move out of technology and into industries. Drawing on his own essays “content isn’t king” and “Netflix isn’t a tech company,” Evans argues that Netflix’s real decisions are Los Angeles media questions, not San Francisco infrastructure questions, and San Francisco does not even know what the right questions are. By the same logic, what AI means for a law firm is mostly a question for people who understand law firms, what generative video means for Hollywood is a question Ben Affleck can answer better than he can, and the questions become half AI and half something else.

    Four buttons and the new things AI unlocks

    To reason about impact, Evans offers four buttons. Is a use case just price elasticity, the Jevons paradox of doing the same thing for less or more for the same money. Does it remove a cost that was a barrier to entry, like a newspaper’s printing press. Does it unlock something in your business model. Or does it make something previously impossible now possible, the way steam engines made trains possible regardless of how many horses you bought, or Spotify turned fifteen dollars a month into all the music there is. He stresses that the same broad change can mean wildly different things by industry, just as the internet devastated newspapers but barely touched movie studios. His favorite tractable example is advertising and e-commerce, a trillion dollar advertising market against twenty five trillion in retail, where today’s systems know a SKU and a metadata field and that people who bought one thing bought another, but do not know what a product is or why people buy it. An LLM could in principle understand the product, recommend ten coats at different prices with pros and cons, or look at your Instagram and suggest a winter coat that changes your look but not too much, which would have been science fiction three years ago.

    More software, the SaaS apocalypse, and tasks versus jobs

    For software specifically, Evans expects more competition, cheaper and quicker building, and new categories that were impossible before, all under an uncertain new margin structure where outcome based pricing is hard because most software work cannot be tied cleanly to profit and loss. He frames enterprise software as three buckets, big horizontal systems, hundreds of vertical and internal apps, and a fuzzy improvised middle of Excel and email, with AI arriving as another option across all of them. The deeper design tension is where to place probabilistic software that can make mistakes versus deterministic systems that cannot, and whether the LLM sits at the top or bottom of the stack, with the answer being both depending on the task. The net result is way more software, since SaaS itself produced orders of magnitude more software and software exists to solve problems created by other software. That fuels the SaaS apocalypse anxiety: some companies clearly get wiped out, but since no one knows which, you should not derate the whole sector, even as many investors stay cautious about being long software.

    Implicit knowledge, exception handling, and where the average fails

    Much of what organizations do is implicit, undocumented, and absent from any training data, which is precisely the value of strategy consultancies that get license to map how a company really works versus how it is supposed to work. The real decisions tend to be exception handling, the cases that require human judgment because they were never written down or do not look like before. Evans separates tasks from jobs, noting accountants do almost nothing the way they did fifty years ago while the client still buys the same thing. And he offers a sharp test: LLMs are excellent where you want the average, the answer anyone would give, and weak where you specifically do not want the average and cannot fully articulate why you did it differently.

    Capex, financial gravity, and the ROI question

    On spending, Evans describes a financial gravity problem. Microsoft, Meta, and Google are on line to spend over half their revenue on capex this year, against fifteen to twenty percent for capital intensive telecoms, with roughly seven hundred billion in guidance across the big players, a sum comparable to all of telecom or oil and gas. They cannot sustainably leap to one and a half trillion next year because the money is not there, so the curve must eventually taper. The hyperscalers are caught in an existential FOMO trap: returns look positive now, but they cannot sit out what might be the future of compute without risking becoming the next stranded incumbent, even as the CFO asks how much is enough. On token maxing, he expects a reckoning as the disequilibrium resolves, but measuring ROI is genuinely hard because most reported benefits so far are soft and hard to value, and consumer surplus means much of the gain gets competed away, the way faster spreadsheets simply meant more analysis at the same price.

    Closing image

    Evans ends with an IBM advertisement from the early 1950s showing a sea of engineers holding slide rules, with the tagline that an IBM electronic calculator gives you 150 extra engineers, exactly the pitch behind countless modern startup decks. We move through these fundamental technology waves every ten or fifteen or twenty years, each one feeling completely unlike anything before, and AI is amazing and transformative in the same way mobile, the internet, and PCs were. The base case is that it will produce wonderful things, ruin some livelihoods, put people out of work, and eventually become invisible. His one line description of where it all ends up is that it will be magic, and in twenty years we will simply say of course computers do that, the way an hour of crash free streaming HD video over Wi-Fi already feels unremarkable.

    Notable Quotes

    “Agentic coding went from being kind of useful to really changing everything.”

    Benedict Evans, on the pivotal shift at the start of the year

    “We are in this extreme scarcity. We can’t spend $10 trillion a year on AI infrastructure cuz there isn’t $10 trillion a year there to spend on it.”

    Benedict Evans, on the hard ceiling of AI capex

    “I don’t think foundation models are a product. I don’t think a chatbot is a product. I think the value will be further up.”

    Benedict Evans, stating the core of his thesis

    “They built this amazing piece of global incredibly sophisticated very expensive global infrastructure with enormous growth in use, and they didn’t make any money from it because all the value moved up stack.”

    Benedict Evans, on the mobile network analogy

    “The moment that you understand something and you know how it works and what’s going to happen is the moment you should move on to something else.”

    Benedict Evans, on how to pay attention in tech

    “These are all Los Angeles questions. These are not San Francisco questions. No one in San Francisco even knows what the right questions are.”

    Benedict Evans, on why Netflix is not a tech company

    “The important stuff is not doing the old thing but more. It’s doing something new that you couldn’t have done with the old thing.”

    Benedict Evans, on where the real value of a new technology shows up

    “All software companies exist to solve problems created by other software companies.”

    Benedict Evans, on why AI produces more software, not less

    “It’s going to be magic, and in 20 years time we’ll just say, well, of course that’s how it is. Computers have always done that.”

    Benedict Evans, on how the whole shift ends up

    This is a dense, clear eyed conversation that rewards a full listen, especially if you are trying to think past the hype cycle about where AI value actually lands. Watch the full conversation here, and check out the “AI eats the world” presentation referenced throughout.

    Related Reading

    • Benedict Evans’ website home of the “AI eats the world” presentation and his newsletter referenced throughout the conversation.
    • Andreessen Horowitz (a16z) the venture firm whose podcast hosted this discussion and where Evans was formerly a partner.
    • Jevons paradox (Wikipedia) background on the price elasticity idea Evans uses to explain how cheaper AI may lead to more usage rather than savings.
    • Stratechery by Ben Thompson the analysis Evans cites on software as a designed workflow versus a process that grows out of how a business runs.
    • The Pursuit of Purpose a PJFP look at finding direction and meaning in work as automation reshapes careers and industries.
  • Thomas Laffont of Coatue on the $4 Trillion AI IPO Wave: SpaceX, Anthropic, OpenAI, and Why the New Unicorn Economy Is Healthier

    Thomas Laffont, co-founder of the $55 billion hedge fund Coatue Management, made his All-In Podcast premiere with a data-dense walk through what he calls a once-in-a-generation moment for the unicorn economy. In front of Chamath Palihapitiya, Jason Calacanis, David Sacks, and David Friedberg, he argued that a roughly $4 trillion wave of private value is about to hit the public markets, led by SpaceX, Anthropic, and OpenAI, and that the new AI-driven unicorn economy is actually healthier than the one that came before it. You can watch the full presentation and Q&A on YouTube.

    TLDW

    Laffont presents Coatue’s slide deck on the state of the unicorn economy and argues it has rebalanced after the excesses of 2021. The average unicorn is up about 70 percent since September 2024, AI keeps taking a bigger share of all fundraising, and the model has shifted from many small unicorns to fewer companies each raising far more, with funding per unicorn up roughly 5x since 2021. He introduces a “Magnificent 8” private index (SpaceX, Stripe, Anthropic, Databricks, Revolut, ByteDance, Anduril, and more) worth nearly $4 trillion that has crushed the public Mag 7, then shows that exits are finally thawing as SpaceX heads to an IPO in weeks and Anthropic confidentially files its S1. He lays out Coatue’s “CODE” framework for why SpaceX gets more valuable the more it launches, a counterintuitive finding that the odds of a 10x actually rise as companies get bigger (31 percent for $100 billion-plus centicorns), the explosive revenue ramp of OpenAI and Anthropic past Workday, ServiceNow, Adobe, Salesforce, and now the hyperscalers, a three-pillar map of where AI revenue comes from (consumer, ads, enterprise), and the AI memory thesis. The Q&A with Chamath and Calacanis digs into the power law, K-shaped outcomes, whether these valuations are disconnected from reality, the public market as the great antiseptic, and what happens when trillions in private value finally recycles back through GPs and LPs.

    Thoughts

    The most useful idea in the talk is not the $4 trillion headline, it is the cohort-health chart. Laffont splits unicorns into eras and shows that the pre-2021 cohort was healthy, roughly 80 percent had raised again or exited 20 quarters after minting, while the giant 2021 ZIRP cohort of 479 companies is stuck with under 20 percent doing either. That single comparison reframes the whole AI boom. The bullish read is that the 2024 AI cohort is small, concentrated, and cash-generative, so it looks more like the healthy pre-ZIRP group than the 2021 hangover. The bearish read is that we are watching the same movie with bigger numbers, and the test only comes when these companies face public markets. Laffont is honest that we do not yet know which cohort the AI class resembles, and that intellectual humility is what makes the deck credible rather than promotional.

    The SpaceX “CODE” framework is the sharpest analytical move of the presentation. Most people would assume a launch business gets cheaper per launch as it scales. Laffont shows the opposite, the market pays more per launch as cadence rises, and explains it as a phase change in business quality: from one-time government launch revenue, to a single recurring-revenue constellation, to multiple constellations, to a platform with optional upside in space data centers, the moon, and Mars. It is a clean way to think about any company that climbs from a project business to a platform business, and it applies far beyond rockets. The lesson for investors is that valuation can rationally expand even as unit economics look like they should compress, because the nature of the revenue underneath is changing.

    The counterintuitive 10x odds finding deserves more attention than it got in the room. Conventional wisdom says the bigger you are, the harder it is to grow, so a $100 billion company should be less likely to 10x than a $10 billion one. Coatue’s data says the reverse: centicorns have a 31 percent shot at a 10x, far higher than the 8 percent a unicorn has at becoming a decacorn. Laffont’s explanation is a filtering mechanism, every step up validates a compounding advantage and durability of earnings, so survivors are increasingly the kind of business that keeps compounding. This is essentially a quantitative restatement of quality investing, and it is the intellectual backbone of the LP strategy the besties tease out, just buy whoever reaches $100 billion and hold.

    Where the argument gets genuinely contested is valuation, and the panel does not let it slide. The pushback that “these are not fake companies” is true and important, OpenAI and Anthropic are growing faster than any software company in history, and Anthropic reportedly had a profitable month. But growth and reality do not settle the question of price when you are paying 50 to 100 times revenue for trillion-dollar private companies, as Bill Ackman pointed out earlier in the day. Laffont’s answer is the most grounded thing he says all session: the public market is the great antiseptic, it will not care about anyone’s slide deck, and he wants to see these names withstand short sellers and skeptics. That is the right posture. The deck is a thesis, not a verdict, and the verdict arrives roughly six months and one day after the IPOs, once passive flows and supply have washed through.

    The closing thread, that almost every sector is being transformed at once and we still do not have superintelligence, is the part worth sitting with. The risk in a presentation this bullish is treating the trend as destiny. The value is in the framing tools Laffont hands you, cohort health, phase-change business quality, the filtering odds, the three revenue pillars, and the antiseptic of public scrutiny. Use those to interrogate each name rather than to buy the index on faith, and the talk earns its premiere billing.

    Key Takeaways

    • Coatue Management is one of the most successful hedge funds of the last two decades with about $55 billion under management, and is raising roughly another billion dollars specifically to invest in AI.
    • The unicorn economy is up about 70 percent on average since September 2024, and the public market has made a similar move up over the same period.
    • The unicorn economy’s share of the NASDAQ rose significantly after 2015 but has plateaued in recent years, reflecting strong performance from public companies.
    • AI keeps increasing its wallet share of all venture fundraising, multiple years in a row now.
    • The composition of funding has changed. The unicorn “factory” peaked in the ZIRP era of 2021 and has normalized at a much lower level since.
    • Funding per unicorn has increased roughly 5x since 2021. There are fewer unicorns, and each one is raising more.
    • Cohort health, pre-ZIRP group: of about 73 unicorns, 20 quarters after minting roughly 80 percent had either raised a new round or exited, which is healthy.
    • Cohort health, 2021 group: of about 479 unicorns, 20 quarters in, fewer than 20 percent had exited or raised again. Far larger cohort, far worse outcomes.
    • The open question is which cohort the new 2024 AI cohort will resemble.
    • Funding is concentrating: the top 10 companies capture a large share, and it is a small number of AI companies, not all of them, with Anthropic and OpenAI raising massive rounds.
    • Laffont proposes a “Magnificent 8” private index: SpaceX, Stripe, Anthropic, Databricks, Revolut, ByteDance, Anduril, and more, spanning internet, AI, fintech, and space tech.
    • That private index represents almost $4 trillion of value and has crushed the traditional public Mag 7, with almost every name outperforming.
    • Exits are thawing. 2026 is on a good trend for cash returned versus consumed, not quite 2021 levels, with half a year still to go.
    • That trend does not yet include three imminent liquidity events: SpaceX (IPO expected in weeks) and Anthropic (confidentially filed its S1), whose combined value could exceed the prior decade of exits combined.
    • The ecosystem is far more balanced than when Laffont first presented at the 2024 All-In Summit, when it was consuming much more cash than it returned.
    • OpenAI and Anthropic revenue growth is unlike anything previously seen. Starting from January 2025, they passed Workday, then ServiceNow, then Adobe, then Salesforce, and are now bigger than Google Cloud and Azure.
    • On current forecasts, that revenue could pass AWS by the end of the year and exceed all of Microsoft by 2028.
    • Hyperscalers are not sitting still. The largest companies in the world are funding the disruption, investing unprecedented sums to enable the ChatGPT moment.
    • The SpaceX “CODE” framework: the number one driver correlated to SpaceX’s valuation is cadence of launches, and valuation per launch rises as launches increase.
    • Why per-launch value rises: business quality improves through phases, pre-constellation (one-time government revenue), initial ramp (one recurring-revenue constellation), scale (multiple constellations), and platform (space data centers, moon and Mars optionality).
    • Anthropic in particular is scaling like no company seen across the PC, internet, or mobile eras.
    • Counterintuitive 10x odds: a unicorn has about an 8 percent chance of becoming a decacorn, a decacorn has 8 to 13 percent odds of reaching $100 billion, but a centicorn ($100 billion-plus) has a 31 percent chance of a 10x.
    • Value creation has accelerated. It typically takes years to go from $500 billion to $1 trillion in market cap, yet recently three companies did it in one year and two did it in a matter of weeks.
    • Cerebras is the counterexample of slow success: years of dark periods and no new capital developing its technology, then a massive OpenAI contract that quintupled the company’s value ahead of its IPO.
    • Semiconductors are on a generational run, with the sector dramatically outperforming the index since the 2024 All-In Summit.
    • AI memory thesis: the more an AI system knows about you, the more useful it is, so memory per user could quintuple, which helps explain recent moves in memory companies.
    • Where the revenue is: the AI ecosystem is roughly $140 billion today, about $300 billion this year, and is expected to double in 2027.
    • Three revenue pillars: consumer (subscribers times ARPU), ads (about a quarter of Meta and Google ads are AI-enabled today, heading toward 100 percent and roughly $150 billion), and enterprise (tools like Claude Code and Codex inside businesses).
    • Disruption is hitting every sector: software, telco (Starlink-powered global phone calls), semis, energy (data centers reshaping Pennsylvania’s grid), auto (Ferrari’s electric and autonomous stumble), and consumer (GLP-1s reshaping food, alcohol, and wellness).
    • Final takeaways: the new unicorn economy is healthier thanks to AI, winners are compounding faster so the cost of not owning a winner is higher than ever, disruption is everywhere, and we do not even have superintelligence yet.
    • In the Q&A, both Anthropic and OpenAI publicly say they want to be public, and big outcomes now look likely to become liquid within roughly a 12-month window.
    • The valuation pushback: these are not fake companies, they generate substantial revenue at scale and grow faster than anything before, and Anthropic reportedly even had a profitable month.
    • The public market is framed as the great equalizer and antiseptic, but with passive buying the true price discovery may not land on day one, more like six months and a day after listing.
    • A floated LP strategy: wait for whoever reaches $100 billion and concentrate capital there as the least brittle, quickest-return bet, tempered by the warning that valuations are disconnecting from any historical metric (50x to 100x revenue).
    • An open risk: with so much capital, OpenAI and Anthropic could rationally start a price war, the way ride-sharing and food-delivery players once did, though heavy infrastructure spend complicates it.

    Detailed Summary

    The unicorn economy has rebalanced after 2021

    Laffont opens by reframing a market many assume is frothy. The average unicorn is up about 70 percent since September 2024, and the public market has tracked a similar climb, so private and public value are moving together rather than diverging. The unicorn economy’s share of the NASDAQ rose sharply after 2015 and then plateaued, which he reads as a sign of how strong public companies have become. Underneath the headline, the structure of funding has changed. The 2021 ZIRP era was a unicorn factory that minted enormous numbers of companies, and that machine has since normalized to a much lower level. The result is a barbell: fewer new unicorns, but each raising far more, with funding per unicorn up roughly 5x since 2021. AI sits at the center of this, taking a steadily larger share of all venture dollars for several years running.

    Cohort health is the real story

    The deck’s most important slide measures the health of the ecosystem by cohort. The pre-ZIRP cohort, about 73 unicorns, looks healthy: 20 quarters after becoming unicorns, roughly 80 percent had either raised a new round or exited. The 2021 cohort tells the opposite story. It is enormous, about 479 unicorns, and 20 quarters in, fewer than 20 percent had raised again or exited. That contrast sets up the central question of the talk. A new 2024 cohort of AI companies is forming, and no one yet knows whether it will resemble the healthy pre-ZIRP group or the bloated, stuck 2021 group. Laffont’s framing leans optimistic because the AI cohort is small and concentrated, but he is careful not to declare the answer.

    The Magnificent 8 and a $4 trillion private index

    Funding is not just flowing to AI, it is flowing to a handful of AI names, with the top 10 capturing a large share and Anthropic and OpenAI raising the biggest rounds. From this concentration Laffont builds a private index he half-jokingly calls the Magnificent 8, a number he expects to shrink as companies go public. The members span sectors: SpaceX, Stripe, Anthropic, Databricks, Revolut, ByteDance, and Anduril, covering internet, AI, fintech, and space tech. He says he would be comfortable owning that index for the next decade-plus. Collectively it represents almost $4 trillion of value and has outperformed the public Mag 7, with nearly every constituent beating that benchmark.

    Exits are thawing and a wall of liquidity is coming

    One of Laffont’s recurring concerns at past summits has been balance: the unicorn economy is great at consuming cash, but a healthy ecosystem must also return it. On that score 2026 is trending well, not quite 2021, but solid with half a year left. Crucially, that figure does not yet include three imminent events. SpaceX is expected to go public within weeks, and Anthropic confidentially filed its S1 the day of the talk. Adding those up, just a few companies could deliver more liquidity than the prior ten years combined. The takeaway is that the ecosystem that was dangerously out of balance in 2024 is now meaningfully more balanced, and improving.

    The revenue ramp past the hyperscalers

    The growth rates of OpenAI and Anthropic, Laffont argues, are unlike anything previously seen. Charting from January 2025, the leading AI labs passed Workday, then ServiceNow, then Adobe by year end, then Salesforce by January, and are now bigger than Google Cloud and Azure. On forecast, that revenue could surpass AWS by the end of the year and exceed all of Microsoft by 2028. He stresses that the hyperscalers are not passive bystanders, they are actively funding the disruption, pouring unprecedented capital into enabling the change that began with the ChatGPT moment.

    The SpaceX CODE framework

    Laffont devotes real time to how Coatue thinks about SpaceX. The single factor most correlated with SpaceX’s valuation is cadence of launches, which is intuitive for a launch business. The surprise is that valuation per launch has risen rather than fallen as cadence climbed. His explanation, the CODE framework, is that the quality of the business model improves the more SpaceX launches. In phase one, pre-constellation, you are simply proving rockets, with a few government customers and lumpy, unpredictable one-time revenue. In the initial ramp you stand up a constellation, which is an end market and a recurring-revenue business that grows with every satellite and subscriber. At scale you operate multiple constellations, and Laffont expects companies, governments, and militaries to want to own their own. Ultimately it becomes a platform, with new businesses layered on top, from space data centers to the optionality of the moon and Mars.

    Counterintuitive odds and the speed of value creation

    Coatue bucketed companies and asked the odds of a 10x within each. A unicorn has roughly an 8 percent chance of becoming a decacorn. A decacorn has 8 to 13 percent odds of reaching $100 billion. But a centicorn, $100 billion or more, has a 31 percent chance of a 10x, counting both public and private companies. The bigger you are, the better your odds, which inverts intuition. Laffont pairs this with the sheer speed of recent value creation. Going from $500 billion to $1 trillion in market cap normally takes years, yet three companies did it in a single year and two did it in a matter of weeks. He also offers Cerebras as the patient counterexample, a chip company that endured years of dark periods and no new capital before a massive OpenAI contract quintupled its value ahead of IPO, part of a broader generational run for semiconductors.

    AI memory and where the revenue actually comes from

    A throughline from the day’s other speakers is that the more an AI knows about you, the more useful it is, from your restaurant preferences to your work context. Laffont turns that into a thesis: memory per user could quintuple based on what these systems require, which helps explain recent moves in memory companies. He then tackles the most contested question, where is the revenue. He sizes the AI ecosystem at about $140 billion today, roughly $300 billion this year, and doubling in 2027, built on three pillars. Consumer is subscribers times ARPU. Ads are the pillar people forget, with about a quarter of Meta and Google ads already AI-enabled and penetration heading toward 100 percent, a roughly $150 billion opportunity. Enterprise is the breakthrough category, exemplified by tools like Claude Code and Codex operating inside businesses.

    Every sector is being transformed at once

    What makes this era different, Laffont says, is that nearly every sector is being transformed simultaneously. Software is obvious, but look at telco, where he believes Starlink will soon power a device that lets you make a phone call anywhere on earth, attacking the global telco and broadband profit pool with a better product. Compute is driving massive change in semis, data centers are reshaping the energy equation in places like Pennsylvania, and the auto business is being upended, as Ferrari’s stumble introducing electric and autonomous technology showed. In consumer, GLP-1 drugs are profoundly changing consumption of food and alcohol and the broader focus on wellness. His takeaways close the loop: the new unicorn economy is healthier thanks to AI, winners are compounding faster so the cost of missing them is higher than ever, disruption is everywhere, and superintelligence has not even arrived yet.

    The Q&A: power law, valuation, and the public market test

    Chamath and Jason Calacanis press Laffont on what this means for allocators. The recurring theme is the power law and K-shaped outcomes, with gains consolidating into a small number of companies. The positive side, Laffont notes, is that outcomes are enormous and increasingly liquid within a 12-month window, and both Anthropic and OpenAI say they want to be public. The hard part is valuation. The besties cite Bill Ackman’s framing that investors are making venture bets on trillion-dollar companies at 50 to 100 times revenue. Laffont’s pushback is that these are not fake companies, they generate substantial revenue at scale and grow faster than anything before, and Anthropic reportedly had a profitable month. But he embraces the discipline ahead: the public market is the great antiseptic and will not care about anyone’s presentation, though with heavy passive buying, true price discovery may take roughly six months and a day rather than landing on day one. Asked whether the compounding is a market inefficiency or survivor bias, he declines to over-read a small sample, noting that Anthropic before Claude Code was a completely different company than after. The conversation closes on what happens when trillions recycle from GPs to LPs, the case for simply owning whoever crosses $100 billion, the risk of everyone crowding into three names, and the possibility of an eventual OpenAI versus Anthropic price war.

    Notable Quotes

    “So we have fewer unicorns that are each raising more.”

    Thomas Laffont, summarizing how funding per unicorn has risen roughly 5x since 2021

    “The reason is that the quality of SpaceX’s business model increases the more you launch.”

    Thomas Laffont, explaining the CODE framework and why valuation per launch rises with cadence

    “The winners are compounding faster than ever, which means the costs of not being in a winner are higher than ever.”

    Thomas Laffont, on the central risk of a power-law market

    “And by the way, we don’t even have super intelligence yet.”

    Thomas Laffont, closing his takeaways on how early the transformation still is

    “These are companies generating substantial revenue at scale that are growing faster than anything we’ve ever seen.”

    Thomas Laffont, pushing back on the idea that AI valuations rest on fake companies

    “It will be the great antiseptic. It will not care about my presentation.”

    Thomas Laffont, on the public market as the ultimate test for SpaceX, OpenAI, and Anthropic

    “Anthropic pre-cloud code was a completely different company than post cloud code.”

    Thomas Laffont, on why he won’t over-read a small sample of hyper-compounders

    “The power law rules our lives. All the great gains are being consolidated into small numbers of companies.”

    An All-In host, framing the Q&A on concentration in private markets

    This is a curated set of highlights. To hear the full presentation, the slide walkthrough, and the complete Q&A with Chamath and Jason Calacanis, watch the full conversation here.

    Related Reading

    • Coatue Management. Primary source for Thomas Laffont’s firm and the technology investing strategy behind the deck.
    • The All-In Podcast. The show and summit where Laffont made this premiere presentation.
    • Power law (Wikipedia). Background on the distribution Laffont and the hosts say governs venture and public-market returns.
    • The Magnificent Seven (Wikipedia). The public-market benchmark Laffont’s private “Magnificent 8” index is measured against.
    • Cerebras Systems. The AI chipmaker Laffont cites as the slow-grind IPO that was eventually transformed by a major OpenAI contract.
  • Anthropic Raises $65 Billion Series H at $965 Billion Valuation to Fund AI Safety Research and Massive Compute Expansion

    Anthropic has closed one of the largest private financing rounds in the history of technology, raising $65 billion in Series H funding at a $965 billion post-money valuation. The round, announced on May 28, 2026, lands as demand for Claude reaches what the company calls historic levels, and it positions Anthropic to pour fresh capital into safety research, compute, and the products that enterprises now lean on every day.

    TLDR

    Anthropic raised $65 billion in its Series H at a $965 billion post-money valuation, with Altimeter Capital, Dragoneer, Greenoaks, and Sequoia Capital leading and Capital Group, Coatue, D1 Capital Partners, GIC, ICONIQ, and XN co-leading, alongside $15 billion in previously committed hyperscaler investment that includes $5 billion from Amazon. The raise follows Anthropic crossing $47 billion in run-rate revenue earlier in May 2026, and it funds three priorities named by CFO Krishna Rao: advancing safety and interpretability research, expanding compute capacity to meet growing Claude demand, and scaling the products and partnerships customers depend on. On the infrastructure side, the company is locking in gigawatt-scale compute through 5 gigawatts with Amazon, 5 gigawatts of TPU capacity via Google and Broadcom, GPU access from SpaceX, and supply from partners Micron, Samsung, and SK hynix, while Claude remains available across all three major cloud platforms, AWS, Google Cloud, and Microsoft Azure, with widespread enterprise adoption across industries.

    Thoughts

    Start with the number that everyone will fixate on. A $965 billion post-money valuation against $47 billion in run-rate revenue is roughly 20 times sales, and for a company growing this fast that multiple is not the interesting part. The interesting part is that run-rate revenue crossed $47 billion earlier this month, which means the denominator is moving so quickly that the multiple is already stale. Investors are not pricing the business Anthropic is today. They are pricing the slope. A 20x multiple on a number that may double again inside a year is a very different bet than 20x on a flat line, and the lead names here (Altimeter, Dragoneer, Greenoaks, Sequoia, with Capital Group, Coatue, GIC and others co-leading) are not the kind of capital that pays for nostalgia. They are paying for the second derivative.

    But the real story is not the valuation. It is the compute. Read the infrastructure list carefully and you see the actual problem this round solves: 5 gigawatts from Amazon, 5 gigawatts of TPU capacity through Google and Broadcom, GPU access from SpaceX, and memory supply locked down with Micron, Samsung, and SK hynix. That is more than 10 gigawatts of secured power and silicon. The constraint on frontier AI in 2026 is no longer talent or even algorithms. It is electricity, land, and the multi-year queue for advanced packaging and high-bandwidth memory. You cannot buy 10 gigawatts on a quarterly basis. You reserve it years out, and you need the balance sheet to make those commitments credible. A $65 billion raise is, in plain terms, the down payment that lets Anthropic sign for capacity nobody can conjure on demand. The money is downstream of the megawatts.

    The diversification across that compute stack matters as much as the size. By splitting between Amazon’s infrastructure, Google and Broadcom’s custom TPUs, and SpaceX-supplied GPUs, Anthropic is refusing to become hostage to any single supplier’s roadmap or pricing. Custom silicon through Broadcom in particular is a bet on bending the cost curve, because the long-term economics of serving Claude at this scale depend on dollars per token, not just on raw availability. Anyone who has watched cloud lock-in play out over the last decade understands the move. Optionality at the hardware layer is leverage, and leverage is what keeps margins from being dictated by whoever owns the only fab slot you can reach.

    It is worth pausing on the fact that the round explicitly funds safety and interpretability research alongside scaling, and not as a footnote. Most companies treat safety spend as a cost center to be minimized once growth kicks in. Naming it first, ahead of compute and products, is a statement about where Anthropic believes its durable advantage sits. If models keep getting more capable, the binding constraint on deployment inside regulated industries (finance, healthcare, government) becomes trust, not intelligence. Interpretability is the work that turns a black box into something an enterprise risk committee can actually sign off on. Framed that way, safety research is not philanthropy subtracted from the bottom line. It is the thing that unlocks the most lucrative and defensible parts of the market, and pairing it with the scaling budget is the tell.

    Finally, look at distribution. Claude now ships on all three major clouds at once: AWS, Google Cloud, and Microsoft Azure. In a market where most frontier labs are tethered to a single hyperscaler, being available everywhere enterprises already run their workloads is a structural edge. It removes the procurement friction of asking a customer to adopt a new vendor relationship, and it means Anthropic competes on the merits of the model rather than on which cloud a buyer happened to standardize on years ago. Combine that omnipresent distribution with the compute reservations and the explicit safety mandate, and the shape of the strategy is clear. This is not a company buying time. It is a company buying the three things that actually compound: capacity that cannot be rushed, trust that cannot be faked, and reach into every place where work already happens.

    Key Takeaways

    • Anthropic raised $65 billion in its Series H funding round, one of the largest private financings in the history of the technology industry.
    • The round set Anthropic’s post-money valuation at $965 billion, placing the company within reach of the $1 trillion mark.
    • Altimeter Capital, Dragoneer, Greenoaks, and Sequoia Capital led the Series H round.
    • Capital Group, Coatue, D1 Capital Partners, GIC, ICONIQ, and XN served as co-leads on the investment.
    • The new capital builds on $15 billion in previously committed hyperscaler investments, which includes $5 billion from Amazon.
    • Anthropic crossed $47 billion in run-rate revenue earlier in May 2026, reflecting the surging commercial demand for Claude.
    • A core priority for the funding is to advance Anthropic’s safety and interpretability research.
    • The company will use the capital to expand compute capacity in order to meet growing demand for Claude.
    • Anthropic plans to scale the products and partnerships that customers depend on across its business.
    • CFO Krishna Rao said the funding will help Anthropic serve the historic demand it is experiencing, stay at the research frontier, and bring Claude to more of the places where work happens.
    • Amazon is providing 5 gigawatts of compute capacity as part of Anthropic’s infrastructure expansion.
    • Google and Broadcom are supplying 5 gigawatts of TPU capacity to power Claude’s growth.
    • SpaceX is contributing GPU access to Anthropic’s compute footprint.
    • Micron, Samsung, and SK hynix are partnering with Anthropic on memory and infrastructure to support its scaling needs.
    • Claude is available on all three major cloud platforms, AWS, Google Cloud, and Microsoft Azure.
    • Anthropic reports widespread enterprise adoption of Claude across a broad range of industries.

    Detailed Summary

    The Raise and the Valuation

    Anthropic has raised $65 billion in Series H funding, a round that values the company at $965 billion on a post-money basis. The size of the raise places it among the largest private financing events the technology industry has ever seen, and the valuation pushes Anthropic to the doorstep of the trillion dollar mark. The capital arrives at a moment when demand for the company’s Claude models has accelerated sharply, and the round is built to fund the response to that demand rather than simply mark a milestone. Anthropic framed the financing in its Series H announcement as the fuel for staying at the research frontier while scaling the infrastructure and products that customers increasingly rely on.

    Who Put In the Money

    The Series H was led by Altimeter Capital, Dragoneer, Greenoaks, and Sequoia Capital, a group that combines deep growth-stage technology experience with conviction in Anthropic’s long-term trajectory. Joining as co-leads were Capital Group, Coatue, D1 Capital Partners, GIC, ICONIQ, and XN, a roster that spans crossover funds, sovereign wealth, and institutional investors. Beyond the new equity, Anthropic pointed to $15 billion in previously committed hyperscaler investment, including $5 billion from Amazon. Taken together, the investor base reflects a mix of financial backers and strategic partners with a direct stake in seeing Claude reach more customers and more compute.

    Revenue at $47 Billion Run-Rate

    Underpinning the valuation is a business that has scaled with unusual speed. Anthropic crossed a $47 billion run-rate revenue figure earlier in May 2026, a number that signals how quickly enterprises and developers have adopted Claude across their workflows. Run-rate revenue annualizes the company’s most recent performance, and at this level it puts Anthropic firmly among the fastest growing software businesses on record. That financial momentum is the practical justification for both the round’s size and the near trillion dollar valuation investors were willing to support.

    The Compute Buildout

    A large share of the strategy behind the raise centers on securing compute at enormous scale. Anthropic detailed a set of infrastructure partnerships designed to keep pace with Claude demand. Amazon is providing 5 gigawatts of capacity, while Google and Broadcom together are supplying 5 gigawatts of TPU capacity. SpaceX is contributing GPU access, broadening the range of silicon Anthropic can draw on. Supporting the buildout on the hardware supply side are Micron, Samsung, and SK hynix, the memory and component partners whose output is essential to standing up data centers at this magnitude. The combined picture is a company assembling power, chips, and supply chain commitments measured in gigawatts rather than racks.

    Where the Money Goes

    Anthropic outlined three priorities for the new capital. The first is to advance safety and interpretability research, continuing the work of understanding how models behave and ensuring they remain reliable as they grow more capable. The second is to expand compute capacity to meet the growing demand for Claude, the practical engine behind the infrastructure commitments above. The third is to scale the products and partnerships that customers depend on, deepening the company’s reach into the tools and platforms where work actually happens. Krishna Rao, Anthropic’s chief financial officer, said the funding “will help us serve the historic demand we are experiencing, stay at the research frontier, and bring Claude to more of the places where work happens.”

    Claude Everywhere

    The funding lands on top of a distribution footprint that already spans the major cloud ecosystems. Claude is available on all three leading cloud platforms, AWS, Google Cloud, and Microsoft Azure, which means enterprises can reach the models through whichever provider they have standardized on. That availability has translated into widespread enterprise adoption across industries, from software and finance to healthcare and beyond. By being present everywhere developers and businesses already operate, Anthropic positions Claude not as a destination customers must travel to but as a capability woven into the platforms they use every day.

    Notable Quotes

    This funding will help us serve the historic demand we are experiencing, stay at the research frontier, and bring Claude to more of the places where work happens.

    Krishna Rao, CFO at Anthropic, on the purpose of the Series H round.

    Advance safety and interpretability research, expand compute capacity to meet growing Claude demand, and scale products and partnerships customers depend on.

    How Anthropic describes its use of funds from the round.

    For the full details on the round, the lead and co-lead investors, and how Anthropic plans to deploy the capital across safety research, compute, and products, read the full announcement here.

    Related Reading

    • Anthropic, the AI safety and research company behind Claude that raised this Series H round.
    • Sequoia Capital, one of the lead investors anchoring the financing.
    • Amazon Web Services, one of the three major cloud platforms where Claude is available and the source of a $5 billion investment.
    • Google Cloud TPUs, the tensor processing units behind the 5 gigawatts of TPU capacity in the Google and Broadcom partnership.
    • AI safety, the research field at the center of how Anthropic says it will use the new funding.
  • Raoul Pal: Why the Crypto Bull Run Is Just Starting, the AI Economic Singularity, and Why You Should Never Sell Bitcoin

    Macro investor and Real Vision co-founder Raoul Pal returned to the When Shift Happens podcast for episode 173 to argue that the recent crypto drawdown is a nasty correction inside a much larger bull market, not the end of the cycle. Across an hour and a half he ties together the AI capital race, the coming economic singularity, why layer one blockchains are a kind of universal basic equity, and the deceptively simple discipline that actually compounds wealth: buy, hold, and almost never sell.

    TLDW

    Pal frames everything through what he calls the universal code, the conversion of units of energy into units of intelligence, and says the global race to fund AI is so large that no government or company can stop feeding it capital. That liquidity, plus relentless currency debasement, is the engine under both the AI stocks going vertical and the crypto market that has lagged them. He calls the Bitcoin slide from 126K toward 60K a normal correction in a bull market, says liquidity is now reaccelerating, and argues smart contract layer ones (Ethereum, Solana, Sui) are the best risk-adjusted bet because the entire financial system and a coming swarm of AI agents will run on those rails, giving crypto an effectively infinite total addressable market. He explains why he added Zcash as a Bitcoin-with-privacy and quantum-proof trade, lays out his plan to launch an NFT fund built around grail digital art and NFT-backed lending, and makes a data-backed case that buying oversold dips and never selling beats trying to trade cycles. The conversation closes on a 70/30 bullish framework for 2026 and 2027 and a reflection on kindness.

    Thoughts

    The strongest idea in this conversation is not a price target, it is a reframe. Pal keeps pulling the camera back from “what will Bitcoin do this quarter” to “what is the organizing principle of the entire economy right now,” and his answer is the funneling of all available capital into anything that produces intelligence. Once you accept that frame, the buy-the-dip behavior in both AI equities and crypto stops looking like mania and starts looking like a rational response to a one-way game. The part worth sitting with is his game-theory claim that neither the US nor China can stop, and that even a spectacular failure like an OpenAI blowup would simply trigger an instant asset auction rather than a collapse, because no single player can be allowed to win outright. Whether or not that is fully true, it is a genuinely different mental model than the recession-and-bust cycle most investors carry around.

    His layer-one thesis is the most actionable takeaway and also the most quietly radical. The pitch is that for the first time ordinary people can own a piece of the core infrastructure that the machine economy will be built on, the way you never got to own a slice of TCP/IP or the open web. He calls this universal basic equity and treats it as humanity’s pension plan. The honest tension he admits is that the racy returns may not be in the boring base layer at all, and that the truly investable winners of this era, the private stablecoin companies, are largely closed off to retail. So the layer-one trade is partly a consolation prize for the fact that the best businesses are unreachable. That is a more candid admission than most crypto bulls will make.

    The behavioral core of the episode is the most useful for a normal reader, and it is almost embarrassingly simple. Pal has been in markets for 35 years and says he does not know a single person who reliably buys bottoms and sells tops, including the legends, who he points out made most of their money on management fees rather than heroic trades. His prescription is to add only when the asset is one to two standard deviations oversold on its long-term log trend, otherwise do nothing, and to treat patience as an action rather than inaction. The line that does the most work is “the market owes you nothing.” It quietly dismantles the entitlement that drives people to overtrade, chase, and burn emotional energy on a strategy that the data says underperforms simply holding.

    Where a reader should keep some skepticism is the certainty. Pal assigns the bull case a 70 percent probability and the bear case 30, but the bear case he sketches (Middle East war reignites, inflation forces tightening, liquidity gets starved, the intelligence buildout slows) is not a minor footnote, it is the whole structure failing at once. The thesis also leans hard on the assumption that AI agents will become massive on-chain economic actors, which is plausible but still mostly forward-looking rather than observed. The value here is the framework, not the forecast. If you take one thing, take the energy-into-intelligence lens and the standard-deviation discipline, and hold the specific tickers and timelines loosely.

    Key Takeaways

    • Pal’s central frame is the universal code: the universe, and now the economy, continuously converts units of energy into units of intelligence, and capital flows to whatever produces the most intelligence.
    • The AI buildout is a race of nations and corporations that nobody can exit. Game theory means neither the US nor China can stop, because the other side would gain a decisive advantage.
    • Even a catastrophic AI failure would not break the trend. If OpenAI ran out of money, its assets would be auctioned instantly to multiple buyers so no single company could double its compute and win the whole game.
    • The economic singularity is the point where institutions and the way we measure the economy can no longer keep up with the speed of technology, made worse when AI and robots are added to the population as economic actors.
    • AI is the first real-world example of Reed’s law, the exponential of the exponential, where most past technology followed the slower Metcalfe’s law log channel.
    • By around 2028, roughly five to six years after AI went mainstream, AI will have produced more words than all of humanity has produced in sum total since the Gutenberg press.
    • The current run is funded by cash flow, not debt. Unlike the late-1990s tech boom, the buildout is paid for out of the earnings of the most cash-generative firms in history.
    • Chips and energy are the binding constraints. Companies report being booked out three years and beyond, and xAI is reportedly handing older data centers to Anthropic because no one can get enough compute.
    • Pal expects the Fed to run a Greenspan-style playbook, cut rates and then get out of the way, letting a productivity miracle grow the economy faster than the debt pile so debt to GDP falls.
    • Bitcoin falling from 126K toward 60K is a nasty correction in a bull market, not a bear market. Pal has seen many 50 percent Bitcoin drawdowns since 2013, and altcoins always fall further on the risk curve.
    • The 2025 to 2026 correction has been choppy and slow rather than the fast V-shape of 2021, which is part of why sentiment feels so bad.
    • Crypto lagged because liquidity is finite. The government shutdown withdrew liquidity, which hits crypto with about a three-month lag, while AI capex and Chinese gold buying sucked capital away.
    • Liquidity is now reaccelerating in the US, China, and globally, which Pal sees as the reason the worst is likely over for crypto.
    • The birth of economic agents in late 2024 gives crypto an effectively infinite total addressable market, since agents will be economic actors that hold treasuries, make payments, and transact on-chain.
    • Smart contract layer ones are Pal’s preferred bet. He compares the structure to operating systems and cloud, where value concentrates into three to five major players plus a few specialists.
    • He calls owning layer ones universal basic equity and humanity’s pension plan, the chance to own the rails the agentic economy will run on, something the internet never offered retail.
    • Discounted cash flow analysis is the wrong tool for valuing a blockchain. The whole purpose of the network is to be the cheapest, fastest, and most programmable, so high fees are a bug, not a strength.
    • Pal measures layer ones by intelligence density: number of developers, programmability, speed to finality, applications per user, and the ratio of stablecoins to total value locked as stored energy.
    • Only three tokens maintained economic density when the market fell 80 percent: Ethereum, Solana, and Sui. ETH is the safe Microsoft-like choice, Solana is faster and cheaper, Sui is earlier but extremely fast and programmable.
    • Pal added Zcash in the correction as a Bitcoin-with-privacy trade. The left-curve case is simple privacy value, the right-curve case is that it is also quantum-proof and a hedge against AI-enabled state surveillance.
    • He admits he did not execute the Zcash buy well, kept meaning to add more while traveling, and watched it run up 50 percent. He treats it as a small position, not a portfolio overhaul.
    • On Hyperliquid he is complimentary but uninvested, because he does not trade, use perps, or use leverage, and he expects Robinhood and Coinbase to compete hard for that niche.
    • DeFi is better suited to machines than humans. Agents may not even need front ends or websites, just low-friction access to swap across multiple stablecoins and currencies instantly.
    • DeFi is not dead despite mega-hacks. Pal argues hacks force better products, and notes that banks quietly absorb theft losses too, so the answer is to build more secure systems.
    • The entire financial system is moving to blockchain rails because they are the most efficient way to operate, a prediction Pal first made in 2014 before smart contracts existed.
    • Pal is launching an NFT fund focused on grail assets (one-of-one alien CryptoPunks, top artists) trading from roughly 600K to tens of millions, plus a convex middle tier of artists with social consensus.
    • He names artists like Dies with the most likes (whom he compares to a Hunter S. Thompson of art) and Kim Asendorf, whose work uses tokens at the pixel level.
    • The fund will also lend against NFTs for yields around 15 percent or more, acquiring assets cheaply if borrowers default and recycling yield into emerging artists.
    • His real estate analogy: a smaller NFT in a great collection is like a modest apartment in a billionaire neighborhood, while grails are the 20 million dollar penthouses that actually compound.
    • Bitcoin is partly an AI proxy because global savings should rise as AI lifts economic growth, and Bitcoin targets a share of those savings as a digital store of value.
    • The core mindset shift: if you know where the world is going and roughly where market cap is heading on the log trend, you would never sell, you would only ever accumulate.
    • Selling well is nearly impossible. Even if you take profit at two standard deviations overbought, adding it back at the bottom is something almost no one actually manages.
    • The people who made the most money in crypto are the ones who did not trade it. Pal cites holders who profited by doing essentially nothing while active traders lost their edge.
    • Pal’s discipline requires roughly two to three actions every five years: add when one to two standard deviations oversold, optionally trim when two standard deviations overbought, otherwise nothing.
    • By his standard deviation measure, Bitcoin and crypto are as cheap as they have been in their long-term uptrend versus the NASDAQ, which he reads as a signal to allocate more to crypto.
    • Fear and greed sat below 10 for the longest stretch in the index’s history during this correction, hitting its lowest reading ever, a classic oversold extreme.
    • His 2026 to 2027 bull case stacks stablecoin explosion, the Clarity Act getting signed, rising global liquidity, debt rollovers forcing money printing, a strong business cycle, AI agents, and a cheap entry point. He puts it at roughly 70/30 to the upside.

    Detailed Summary

    Two economies and the money illusion

    The conversation opens loosely with travel, stablecoin spending, and a riff on why people agonize over a 75 dollar airport breakfast but happily lose money on an NFT that drops 80 percent. Pal’s explanation is that we live in two economies at once. The crypto and tech economy can grow 50 to 150 percent in a good year, while the real economy grows around 2 percent. Money earned in the fast economy does not feel real, which is why people spend and speculate so freely with it. This sets up the rest of the episode, where Pal treats the fast economy as the place serious capital is being forced to go.

    The AI capital race nobody can stop

    Asked why the stock market only seems to go up, Pal gives two reasons: liquidity expansion and the most extraordinary capital event in human history, the funneling of all capital into intelligence. He frames it as a race of nations, corporations, and individuals that cannot be slowed because of game theory. No superpower can let another reach AGI alone, only the US and China can afford the race, and neither can stop without ceding the advantage. He even games out an OpenAI bankruptcy and concludes the US would instantly auction the assets across many buyers rather than let one firm double its compute and win, which is why he calls the whole thing too big to fail. The practical conclusion is blunt: buy the dip, because the structure forces capital to keep flowing.

    The economic singularity, Reed’s law, and electricity through sand

    Pal defines the economic singularity as the moment when institutions and our economic measurements can no longer cope with the speed of technology, especially once AI and robots count as population. He explains that almost all past technology adoption followed Metcalfe’s law, a log channel visible in the charts of Google, Facebook, and the NASDAQ, but AI is the first observed example of Reed’s law, the exponential of the exponential. To make it concrete he cites ARK research showing AI will, by roughly 2028, have produced more words per year than all of humanity, and notes Anthropic expected 10x growth and got 80x in a quarter. He marvels that we are putting electricity through silicon, the second most common element on Earth, and producing intelligence six orders of magnitude faster than a human neuron.

    Why crypto lagged and why the worst is over

    Pal explains the crypto underperformance mechanically. There is only so much liquidity, the government shutdown withdrew it, and that hits crypto with roughly a three-month lag, landing right in the middle of the October drawdown. At the same time, the AI buildout and Chinese gold buying pulled capital toward the longest-duration assets, leaving SaaS and crypto with nearly identical charts as they got left behind. His read for 2026 is that liquidity is now reaccelerating across the US, China, and the world, so there is nothing to worry about yet. The Bitcoin move from 126K toward 60K is, in his framing, a normal correction, comparable in length to the roughly six-month 2021 pullback that resolved into new highs.

    Layer ones as universal basic equity

    The heart of the investment thesis is that smart contract layer ones will accrue a growing share of crypto value as the investable infrastructure layer. Pal argues the entire financial system plus a coming swarm of AI agents will use these rails, giving crypto an infinite total addressable market. Like operating systems and cloud, value will concentrate into three to five chains plus specialists. He measures them by intelligence density rather than discounted cash flow, since the point of the network is to be cheapest and fastest. By his analysis only Ethereum, Solana, and Sui held economic density through an 80 percent drawdown. ETH wins on developers, security, and Lindy effects (the Microsoft you do not get fired for owning), Solana is faster and cheaper, and Sui is earlier but offers a different order of magnitude on speed, finality, and programmability. He frames owning a basket of four or five as humanity’s pension plan.

    Zcash, privacy, and the quantum hedge

    Pal reveals he added Zcash during the correction, alongside buying more Sui. He had said in December he would wait for it to pull back, and he did, though he admits he did not buy enough as it ran up 50 percent. His left-curve case is that privacy has real value and people will understand it more, making it essentially Bitcoin with privacy that could plausibly reach 5 to 10 percent of Bitcoin’s value. His right-curve case is that it is also quantum-proof and a hedge against governments wielding AI-enabled control over people. He dismisses the mid-curve worry that it will be banned, noting that the ban fear has shadowed crypto his entire career and never materialized.

    Agents, DeFi, and financial rails

    Pal argues the biggest future users of DeFi and crypto payments will be AI agents, whose scale is effectively infinite. Setting up agents himself, he keeps hitting walls that require small payments, and sees agents making endless micro-payments plus larger transactions, holding treasuries across multiple stablecoins and currencies, and rebalancing through DeFi instantly without any human involved. DeFi, he says, is actually better suited to machines than people, and may not even need front ends. On the wave of mega-hacks he is unbothered, arguing they force better products, that banks quietly absorb theft too, and that the financial system always migrates to the most efficient rails because that is how you make more money. He first predicted blockchain would become the financial industry’s infrastructure rail back in 2014.

    The NFT fund and grail digital art

    Pal is launching an NFT fund because so many people told him they want exposure but do not know how. The fund targets grail assets, the scarce one-of-one pieces with proven social consensus that trade from around 600K into the tens of millions, plus a convex middle tier of artists who have long-term proven value and could be wildly re-rated. He names Dies with the most likes, an Indiana artist cataloging the decline of middle America whom he likens to Hunter S. Thompson, and German artist Kim Asendorf, whose 3D works are built from individually tokenized pixels. The math of convexity is the draw: an artist re-rating from 20 to 200 ETH while ETH itself multiplies could compound into a 100x. The fund will also lend against NFTs for yields above 15 percent, acquiring assets cheaply on default and recycling yield into emerging artists, and will build a club connecting investors to artists. His real estate framing reassures smaller holders: owning a lesser piece in a top collection is like a modest flat in a billionaire neighborhood.

    Never sell, and the math of patience

    The behavioral spine of the episode is Pal’s argument that buying, holding, and accumulating beats trading cycles. He has built a Real Vision indicator that signals a buy when an asset is one to two standard deviations oversold on its log regression channel, and says it compounds at a stupid rate. The problem with selling is deciding how much and then having the discipline to buy it back at the bottom, which almost no one does. In 35 years he says he has never met anyone who reliably buys bottoms and sells tops, and notes the trading legends made most of their money on management fees. The people who made the most in crypto are the ones who did nothing. He reframes holding as patience, an active stance, and ties it back to the universal code: buying Bitcoin and doing nothing is the most energy-efficient trade you can make, while overtrading burns mental and emotional energy for a worse outcome. His advice to those tempted by AI’s vertical charts is to go play with AI and just hold your Bitcoin.

    The 2026 to 2027 outlook

    Pal closes the macro case by stacking the bull factors: a massive stablecoin expansion over the next 24 months, the Clarity Act getting signed and freeing builders, rising global liquidity, trillions in interest payments that force more money printing, a strong business cycle recycling earnings into speculative assets, the arrival of AI agents, and a cheap entry point with fear and greed at historic lows. He even floats a permanent resolution of Middle East conflict as part of the upside. The bear case is the mirror image: war reignites, inflation runs hotter, tightening starves capital, and the intelligence buildout slows. He puts the odds at roughly 70 percent bullish, 30 percent bearish, and says he does not see the bear case yet. The episode ends on a personal note about kindness, with Pal unable to name a single kindest act because, he says, everything is made of kindness.

    Notable Quotes

    “We’re going through the most extraordinary time in human history. Nothing else matters. This whole funneling of all capital into intelligence is the biggest race that’s ever happened.”

    Raoul Pal, on why capital keeps flooding into AI

    “The game is so big that nobody will stop.”

    Raoul Pal, on the game theory of the US and China AI race

    “This is how amazing it is. We’re putting electricity through sand and creating intelligence.”

    Raoul Pal, on silicon and the universal code

    “It’s a nasty correction in a bull market. I’ve been in crypto since 2013. I’ve seen many corrections, non-bear markets of 50% in Bitcoin.”

    Raoul Pal, on Bitcoin falling from 126K toward 60K

    “The market owes you nothing. You would just have to be better at doing a job.”

    Raoul Pal, on the entitlement that ruins crypto investors

    “This is humanity’s pension plan. We get to invest in the infrastructure rails of which all the agentic economy will run.”

    Raoul Pal, on owning layer one blockchains

    “The people who’ve made the most money out of crypto are the people who don’t trade it.”

    Raoul Pal, on why holding beats trading

    “Your job is to be a mercenary for your own capital. You want to make the most money over time.”

    Raoul Pal, on why no one has to stay loyal to crypto

    “Bitcoin and crypto is as cheap as it has been in its long-term uptrend versus NASDAQ.”

    Raoul Pal, on the relative value signal he watches

    This is a compressed look at a wide-ranging conversation. Watch the full episode on When Shift Happens here for Pal’s complete reasoning, the charts he references, and the back-and-forth that the summary above leaves out.

    Related Reading

    • Real Vision the financial media platform Raoul Pal co-founded, where his Global Macro Investor research and exponential age thesis live.
    • Metcalfe’s law (Wikipedia) the network-value relationship Pal uses to model the log regression channel for crypto.
    • Reed’s law (Wikipedia) background on the exponential-of-the-exponential growth Pal says AI is the first real-world example of.
    • Technological singularity (Wikipedia) context for the economic singularity Pal argues is now only about four years away.
    • Zcash the privacy coin Pal added in the correction as a Bitcoin-with-privacy and quantum-proof trade.
  • Gavin Baker on Orbital Compute, TSMC, Frontier AI Models, Anthropic’s Vertical Take Off, and the Coming Wafer Shortage

    Gavin Baker, founder and CIO of Atreides Management, returns to Patrick O’Shaughnessy’s Invest Like the Best for his sixth appearance. He calls the current AI moment the most extraordinary moment in the history of capitalism, walks through what Anthropic’s vertical takeoff in revenue actually means, lays out why orbital compute is closer than skeptics believe, dissects the TSMC bottleneck that may be the only thing standing between today’s market and a full-on AI bubble, and rates every hyperscaler on how they have positioned for a world where frontier model providers may stop selling API access altogether.

    TLDW

    Anthropic added eleven billion dollars of ARR in a single month, which is roughly the combined business of Palantir, Snowflake, and Databricks built over a decade. That is the setup. From there Gavin Baker covers the March and April selloff, the contrarian read that a closed Strait of Hormuz was actually bullish for American manufacturing competitiveness, why Anthropic and OpenAI multiples may be misleadingly cheap on an unconstrained run rate basis, why Elon Musk’s discipline on SpaceX valuation created a superpower of permanent access to capital, the practical engineering case for orbital compute as racks in space rather than Pentagon sized space stations, why TSMC’s capacity discipline is the single most important variable in whether the AI cycle becomes a bubble, what Terafab in Texas changes, why the Pareto frontier of AI models has flipped from Google dominance to Anthropic and OpenAI dominance in nine months, the shift from all you can eat AI subscriptions to usage based pricing and what that means for revenue scaling, Richard Sutton’s bitter lesson as the largest risk to the AI trade, why frontier tokens still capture an overwhelming share of economic value, the role of continual learning as the third great open question, why most new chip startups should not try to build a better GPU, why Cerebras did something different and hard, why disaggregated inference may extend GPU useful lives to ten or fifteen years and rescue the private credit industry, why being in the token path is the new venture filter, the new prisoner’s dilemma around releasing frontier models via API, an honest rating of Google, Meta, Amazon, and Microsoft, why personal safety is becoming a real AI era risk, and why he remains an AI optimist maximalist who believes this could be the next Pax Americana.

    Key Takeaways

    • Anthropic added eleven billion dollars of ARR in one month, more than the combined businesses of Palantir, Snowflake, and Databricks built across a decade. There is no precedent for this in the history of capitalism.
    • The SaaS and cloud revolution created between five and ten trillion dollars of value over twenty years. AI is replaying that compression on a timeline measured in months.
    • The March selloff was a drawdown driven by disagreement with price action, not invalidated thesis. That is the kind of drawdown an investor can lean into.
    • Deep Seek Monday in January 2025 was a similar setup. By the day of the selloff, AWS Asia GPU prices had already doubled, GPU availability had fallen, and it was obvious reasoning models would be vastly more compute hungry at inference. The market priced the opposite.
    • The Strait of Hormuz closing was actually positive for America. US natural gas (the primary input into US electricity, which feeds AI) fell twenty percent on Bloomberg while Asian and European natural gas doubled or tripled. American manufacturing competitiveness improved overnight.
    • The US is now the world’s largest producer and exporter of oil and gas. The economy is dramatically less energy intensive than in the 1970s. The shortage trauma comparison does not hold.
    • Tech as a sector traded as cheaply versus the rest of the market in early April as at any point in the last ten years, into the single most bullish moment for AI fundamentals on record.
    • Anthropic is dramatically more capital efficient than OpenAI, having burned roughly eighty percent less to reach a similar revenue scale. They have very different structural returns on invested capital.
    • Anthropic at roughly nine hundred billion for fifty billion of ARR (growing a thousand percent) is striking. Adjusted for compute constraint, the unconstrained run rate could be one hundred fifty to two hundred billion, putting the implied multiple closer to five times.
    • Claude Opus generates roughly seventy percent fewer tokens for the same question than previously, with token quantity tied to answer quality. Subscribers on flat-fee plans are getting a lobotomized model.
    • Elon Musk’s superpower is twenty years of making investors money. He never pushes valuation. SpaceX compounded low thirty percent per year for a decade because Musk treats fair pricing as a sacred covenant.
    • Capitalism will solve the watts shortage. The current bottleneck has shifted from chips and energy to zoning and political approval. Many capex decisions are paused until after the US midterms.
    • The watts shortage probably begins to alleviate in 2027 and 2028. Orbital compute solves it longer term.
    • Orbital compute is not Pentagon sized data centers in space. It is racks in space. A Blackwell rack is three thousand pounds, eight feet tall, four feet deep, three feet wide. SpaceX has shown a satellite roughly that size.
    • The satellites operate in sun synchronous orbit so solar wings (around five hundred feet per side) always face the sun and the radiator on the dark side always points to deep space.
    • Starlink V3 satellites already run at around twenty kilowatts. A Blackwell rack runs at one hundred kilowatts. SpaceX engineers express genuine confidence they have already solved cooling and radiator design at these scales.
    • Racks in space are connected with lasers traveling through vacuum, the same lasers already on every Starlink. SpaceX operates the world’s largest satellite fleet and, via xAI Colossus, the world’s largest data center on Earth.
    • Inference will move to orbit. Training will stay on Earth for a long time. Terrestrial data centers remain valuable for the rest of an investor’s career.
    • The wafer bottleneck is structural and political. TSMC is essentially Taiwan’s GDP, water, and electricity. The leaders see themselves as inheritors of Morris Chang’s sacred legacy and they do not behave like a Western public company.
    • Jensen Huang has never had a contract with TSMC. The relationship is run on handshakes and the assumption that things will be fair over time.
    • If TSMC did everything Jensen wanted, Nvidia could be selling two to three trillion dollars of GPUs in 2026 and 2027. TSMC’s discipline is the single largest factor preventing a true AI bubble.
    • Historically, foundational technologies always get a bubble. Railroads, canals, the internet. The current AI buildout is overwhelmingly funded out of operating cash flow, GPUs are running at one hundred percent utilization, and that is fundamentally different from the year 2000 fiber overbuild.
    • If one of Intel or Samsung Foundry catches up at the leading node, the other will follow, and TSMC’s discipline collapses. Watch TSMC capacity decisions to predict a bubble.
    • Terafab, the SpaceX and Tesla joint venture to build the world’s largest fab in America, has a partnership with Intel that grants access to fifty years of institutional foundry knowledge. The A teams at ASML, KLA, Lam Research, and Applied Materials will follow Elon’s reputation in hardware engineering.
    • The hiring playbook for Terafab includes building Taiwan Town, Japan Town, and Korea Town next to the fab. Recruit the engineers and import their families, their restaurants, and their staff.
    • Frontier tokens still capture an overwhelming share of all economic value created at the model layer. This is surprising and is one of the three big open questions for AI investing.
    • The Pareto frontier of intelligence versus cost has flipped. Nine months ago Google’s TPU dominated every point on the frontier. Today Anthropic and OpenAI dominate, with Grok 4.3 on the frontier and Gemini 3.1 hanging on.
    • Google’s conservative TPU V8 design (partly an attempt to reduce dependence on Broadcom and Nvidia) is the leading explanation for the loss of per token cost leadership.
    • AI pricing is shifting from all you can eat to usage based, mirroring the cellular and long distance industries. Cellular stopped being a great growth industry when it went all you can eat. AI just made the opposite move.
    • OpenAI and Anthropic together could exceed two hundred billion in ARR this year if compute keeps coming online and frontier token pricing holds.
    • The two hundred fifty dollar a month consumer AI plan is no longer enough to evaluate frontier capability. Enterprise plans with usage based billing are required because rate limits are now severe.
    • The three biggest open questions for AI investors are: violation of the bitter lesson via ASI or human ingenuity, whether frontier tokens keep commanding their premium, and when continual learning arrives.
    • Today’s continual learning is crude reinforcement learning during mid training on verifiable tasks. True continual learning means weights updating dynamically, like a human who learns the first time they touch fire.
    • Trying to build a better GPU is a losing strategy. Jensen will copy any one to three percent share design. Startups should target one percent share, do something different, and make it hard enough that Nvidia cannot fast follow.
    • Disaggregated inference (separating prefill and decode) opens new design canvases. Prefill is memory capacity bound. Decode is memory bandwidth bound. Each can be optimized independently.
    • Cerebras did something different and hard with wafer scale computing. Three generations of chips and real grit to get there.
    • Disaggregation of inference may stretch GPU useful lives to ten or fifteen years, dropping financing costs from low sevens to five or six percent, mathematically lowering the cost of the AI buildout and likely saving the private credit industry from its SaaS loan exposure.
    • Sellers of shortage outperform buyers of shortage. But owning the largest installed base of what is currently in shortage (hyperscaler CPU fleets, for example) is also a strong position.
    • Most of the economic value at the application layer of AI has been destroyed, not created. The exceptions are companies in the token path or in niches small enough that frontier labs ignore them.
    • Coding may be the shortest path to ASI. If you can write code, you can write code that does anything. Cursor, Cognition, and Anthropic correctly focused on it.
    • Jensen could probably get close to the frontier with his own Nemotron family of models whenever he wants. The fact that he chooses not to is a strategic decision about not commoditizing his customers.
    • The new prisoner’s dilemma in AI is whether frontier labs release their best model via API. If everyone agrees not to, Chinese open source falls behind. If anyone defects, the defector pulls ahead on revenue and resources, forcing everyone else to defect.
    • Google still owns the largest compute installed base. Without TPU’s prior cost advantage, this matters more. YouTube data has real value in a world of robotics. GCP is going crazy.
    • Meta deserves credit for becoming AI first internally faster than any other internet giant. Musa, their first MSL model, is impressively close to the Pareto frontier.
    • Amazon is strong because of Trainium and robotics driven retail P&L efficiency. Nova is better than it gets credit for.
    • Microsoft flinched on capex in early 2025 and lost position. Satya Nadella’s current decision to use Microsoft compute for Microsoft products rather than reselling to OpenAI is a courageous and probably correct call, even at the cost of an eight hundred dollar stock price.
    • The hyperscalers most engaged with startups are Amazon and Nvidia by a mile, followed by Google. Broadcom is the favorite ASIC partner. AMD, Microsoft, and Meta have minimal startup engagement and that will cost them as the best teams are now at startups.
    • Personal safety in an AI era requires a family or company safe word that cannot be socially engineered. Deepfake voice and video extortion at the speed of FaceTime is already feasible.
    • Ukraine is winning largely on the back of having the best battlefield AI outside America and Israel. Adversaries are starting to internalize what AI dominance means geopolitically.
    • An optimistic read is that this becomes a new Pax Americana, the way the post 1945 American nuclear monopoly was used to rebuild Germany and Japan rather than dominate.
    • AI cured a friend’s daughter’s rare disease by spinning up a research effort that identified a market drug capable of impacting her condition. That is the upside that keeps Gavin an AI optimist maximalist.

    Detailed Summary

    The most extraordinary moment in the history of capitalism

    Gavin’s framing of the current moment is unusually direct. Anthropic added eleven billion dollars of annual recurring revenue in a single month. The three highest profile SaaS companies of the last decade plus, Palantir, Snowflake, and Databricks, took a decade and tens of thousands of employees collectively to build the combined business that Anthropic added in thirty days. He has been investing through every major tech cycle and says there is no historical analog. Not the dotcom era, not the cloud transition, not mobile. This is its own thing.

    The market response, then, was peculiar. The NASDAQ sold off into the single most bullish moment for AI fundamentals on record. Tech traded at roughly its widest discount versus the rest of the market in a decade. Investors who said they wished they had bought into AI during 2022, during COVID, or during Deep Seek Monday got the same valuation setup again in early April, this time with an even clearer inflection.

    Why the Strait of Hormuz closing was secretly bullish for America

    One reason the macro fear in March may have been mispriced is that the same geopolitical event that drove the selloff was, in practice, a relative benefit to the United States. American natural gas, the input into American electricity, which is the input into American AI training and inference, fell roughly twenty percent. Asian and European natural gas prices doubled or tripled. The US emerged with sharply improved relative manufacturing competitiveness, which is exactly what the current administration cares about.

    The 1970s comparison does not hold. The US economy is dramatically less energy intensive, it is now the world’s largest producer and largest exporter of oil and gas, and there are no shortages, only price moves. That backdrop made it easier for disciplined investors to stay focused on AI fundamentals through the volatility.

    Anthropic and OpenAI valuations on an unconstrained run rate

    Anthropic at roughly nine hundred billion for fifty billion of ARR sounds rich until you adjust for the fact that the company is severely compute constrained. Gavin estimates that, unconstrained, Anthropic might be at one hundred fifty to two hundred billion in run rate revenue, putting the implied multiple closer to five times. He also points out that Claude Opus now generates roughly seventy percent fewer tokens for the same question than it used to. Token quantity correlates with answer quality, and Anthropic is rate limiting and shrinking outputs to ration capacity across its user base.

    Anthropic and OpenAI are also structurally very different. Anthropic has burned around eighty percent less cash than OpenAI to reach a comparable revenue scale. That implies very different long term returns on invested capital, though OpenAI has done a better job locking in compute and Sarah Friar is one of the most exceptional CFOs Gavin has worked with.

    Why neither lab is raising at a three trillion dollar valuation

    The answer Gavin gives is that both labs are deliberately leaving valuation on the table the way Elon has done for two decades. SpaceX compounded at low thirty percent annually for a decade because Elon never pushed price. The result is a permanent superpower of access to capital. Investors trust him because they have made money with him for twenty years. That is a moat that compounds with every round.

    Anthropic could probably raise at a one hundred percent premium to its rumored latest mark. They are choosing not to. In an uncertain world (Ukraine, Russia, Iran, Taiwan), preserving the ability to raise more capital later at fair prices is more valuable than maximizing this round.

    Watts and wafers, the two real constraints

    Capitalism is solving the watts problem. The leading PE infrastructure investors now say zoning and political approval, not chips or energy, are the gating factors. Companies are deferring big capex announcements until after the US midterms. Turbine capacity is being doubled at the manufacturers. Companies like Boom Aerospace are repurposing jet engines for grid use. Watts probably ease meaningfully in 2027 and 2028 and then orbital compute does the rest.

    Wafers are the harder problem because they live in Taiwan, run on handshakes, and depend on a corporate culture that does not respond to public market incentives. TSMC is essentially the GDP, water consumption, and electricity consumption of Taiwan. Its leadership treats the company as the legacy of Morris Chang. The Silicon Shield doctrine is real and internal.

    Orbital compute as racks in space

    The biggest mental update Gavin asks listeners to make is to stop picturing data centers in space as Pentagon sized space stations. A Blackwell rack is three thousand pounds and roughly the size of a refrigerator. SpaceX has shown a concept satellite of about that size. Solar wings extend five hundred feet to each side and the radiator extends hundreds of feet behind, both possible because the orbit is sun synchronous and the orientation is fixed relative to the sun.

    SpaceX engineers Gavin has spoken to at Starbase express genuine confidence that they have solved cooling at these power levels. They have. Starlink V3 satellites already operate at twenty kilowatts. A Blackwell rack is one hundred kilowatts. The same company operates the world’s largest satellite fleet and the world’s largest data center on Earth via xAI Colossus. The racks are connected to each other with lasers traveling through vacuum, technology already deployed in every Starlink. The naysayers, Gavin observes, are armchair skeptics and Larry Ellison’s response (he is out there landing rockets, no one else is) is the right frame.

    Terafab in Texas and the threat to TSMC’s discipline

    Terafab, the SpaceX and Tesla joint venture, intends to be the largest fab in the world. The partnership with Intel grants access to fifty years of foundry institutional knowledge, allowing Terafab to start three to five quarters behind the leading node rather than fifteen years behind. The A teams at the semicap equipment companies (ASML, KLA, Lam Research, Applied Materials) will follow Elon’s reputation in hardware engineering the same way they followed TSMC twenty years ago when Intel stumbled.

    The talent strategy is the part most observers underestimate. Recruit the best engineers globally, then import their families, their restaurants, their staff. Build Taiwan Town, Japan Town, and Korea Town next to the fab. Optimize the human experience for the people whose work matters. Intel and Samsung do not think that way.

    Bubble watch and the year 2000 comparison

    Every foundational technology in modern history has had a bubble. Railroads, canals, the internet. Carlota Perez documented why. Markets correctly identify the importance, diversity of opinion collapses, supply gets ahead of demand, the bubble crashes. The current cycle has two important differences. The buildout is overwhelmingly funded out of operating cash flow, not debt. Every GPU is running at one hundred percent utilization, while at the peak of the fiber bubble ninety nine percent of fiber was unused.

    TSMC discipline is the single largest reason a bubble has not formed. If Jensen could buy everything TSMC could theoretically make, Nvidia could sell two to three trillion dollars of GPUs in 2026 and 2027. At some point that becomes more than the market can absorb. If Intel or Samsung Foundry catches up at the leading node, the other will too. TSMC’s pricing discipline collapses and the bubble starts.

    The Pareto frontier and the loss of Google’s cost advantage

    The most important chart in AI is the Pareto frontier of model intelligence versus per token cost. Nine months ago, Google’s TPU based models dominated every point on it. OpenAI, Anthropic, and xAI sat inside the frontier. Today the frontier is dominated by Anthropic and OpenAI, with Grok 4.3 on the frontier and Gemini 3.1 hanging on by subsidization more than economics. The most likely cause is Google’s conservative TPU V8 design, an attempt to reduce dependence on Broadcom and Nvidia that sacrificed per token economics.

    The bitter lesson, frontier tokens, and continual learning

    Three open questions dominate AI investing. The first is whether Richard Sutton’s bitter lesson (more compute beats human algorithmic cleverness) gets violated by ASI itself optimizing for efficiency. Closer observers of AI are more skeptical of a violation. Gavin thinks ASI’s first move will be to make itself more efficient and more resourced, which is technically a temporary violation.

    The second is whether frontier tokens keep capturing the overwhelming share of economic value at the model layer. Today they do, surprisingly. Gemini 3.1 Pro was mindblowing nine months ago and is intolerable today. The third is when continual learning arrives. Today’s models need a million fire touches to learn what a human learns from one. True continual learning would mean dynamic weight updates in real time and would produce a fast takeoff.

    From all you can eat to usage based AI pricing

    AI is shifting from flat fee plans to usage based pricing. The historical analogy is cellular and long distance. Both stopped being great growth industries when they went all you can eat. AI just made the opposite move. The consequence is that flat fee subscribers, even on premium consumer plans, get a rate limited and token throttled version of the frontier model. Enterprise plans with usage based billing are now required to evaluate true capability. Gavin thinks the combination of new compute coming online and usage based pricing is what gets OpenAI and Anthropic past two hundred billion in combined ARR this year.

    Chip startups, prefill decode disaggregation, and Cerebras

    Trying to build a better GPU is the wrong move. The four scaled players (Nvidia, AMD, Trainium, TPU) have copy capability for any one to three percent share design that looks attractive. The good news for startups is that disaggregated inference (separating prefill and decode) opens a richer design canvas. Prefill is memory capacity bound. Decode is memory bandwidth bound. Each can be optimized independently. Andrew Fox’s analogy is a British naval ship of the eighteenth century. Prefill is loading the cannon. Decode is firing it.

    Cerebras is the model. Wafer scale computing is genuinely different and genuinely hard. It took three generations of chips to get right. Andrew Feldman and his team had the grit to keep going through chip one being a failure. The design has a high ratio of on chip compute and memory relative to shoreline IO, which is why Cerebras is now experimenting with putting an optical wafer on top of the compute wafer to solve scale out.

    GPU useful lives and the rescue of private credit

    One of the strongest claims in the conversation is that disaggregated inference will stretch GPU useful lives to ten or fifteen years. The skeptical narrative (GPUs are obsolete in two years, companies are cooking their depreciation books) is wrong. You can put a Cerebras system or Groq LPU in front of older Hopper or Ampere parts, use them only for prefill, and run them until they physically melt. Private credit, which is in pain from SaaS loans and which underwrote GPU loans on three to four year lives, may be saved by this.

    If GPU financing rates can come down from low sevens to five or six percent, the mathematics of the AI buildout improves materially. That is a structural tailwind that compounds for years.

    The application layer, the token path, and a new prisoner’s dilemma

    Trillions of dollars of value have been destroyed at the application layer, not created. Cursor and Cognition are the rare scaled exceptions, and they got there by focusing on coding very early. As Amjad Masad noted, coding is plausibly the shortest path to ASI because a coding agent can write itself into any new domain. Jamin Ball’s frame is that the new venture filter is whether the company is in the token path. Data Bricks is. Most application layer startups are not.

    Jensen could probably get close to the frontier with Nemotron whenever he wants, and the strategic question of whether to do that is a new prisoner’s dilemma. If every frontier lab agrees not to release best models via API, Chinese open source falls steadily behind. If anyone defects, the defector gains revenue and resources, and everyone else has to defect. The same dynamic exists between TSMC, Intel, and Samsung. If Nvidia or AMD ever truly used an alternative foundry, that foundry would catch up rapidly.

    Rating the hyperscalers

    Google has the largest compute installed base, the YouTube data that matters in a robotics world, and a search business that prints. Their loss of TPU cost leadership is the surprise of the year. If Google IO in five days does not produce a leapfrog model, the Nvidia centric narrative gets even stronger.

    Meta deserves real credit. Zuckerberg made Meta AI first internally faster than any other internet giant, paid up for the talent contracts when no one else would, and shipped Musa as a first model from MSL that is close to the Pareto frontier. Amazon is well positioned on Trainium, robotics in retail, and a Nova model line that is better than it gets credit for. Microsoft flinched on capex in early 2025 and lost position. Satya Nadella’s current decision to use Microsoft compute for Copilot rather than reselling to OpenAI is courageous and probably correct, even at the cost of stock price.

    The most interesting cross hyperscaler metric is startup engagement. Nvidia and Amazon engage deeply with startups. Google is next. Broadcom is the favored ASIC partner. AMD, Microsoft, and Meta have minimal startup engagement, which Gavin believes will cost them as the best teams now sit at startups.

    Personal safety, geopolitics, and the Pax Americana case

    The closing section turns darker. Personal safety in an AI era requires a family or company safe word that cannot be socially engineered. Deepfake voice and video extortion via something that looks exactly like your child calling on FaceTime is already feasible. Political violence against AI leaders is a real concern. Geopolitically, Ukraine is winning largely because it has the best battlefield AI outside America and Israel. How adversaries respond to that asymmetry is the next great variable.

    Gavin’s optimistic frame is the Pax Americana. After 1945 the US had a nuclear monopoly and could have controlled the world. Instead it rebuilt Germany and Japan, both of which became the most reliable American allies for the next eighty years. If AI dominance plays out similarly, this is a generationally positive story rather than a destabilizing one. The personal anecdote that closes the conversation is a friend whose daughter was diagnosed with a rare genetic condition. He spun up agents, identified a drug already on the market that addresses her mutation, and her life is immeasurably different because of AI. That is the upside.

    Thoughts

    The Anthropic eleven billion in a month framing is the kind of stat that resets priors. The right way to interpret it is not as a one off but as a measure of how fast value can compound when the underlying technology improves on a curve steeper than the ability of the rest of the economy to absorb it. The skeptical question is whether that ARR is durable or whether it is heavily tied to a customer base of other AI companies that are themselves on a single venture funded year of runway. The bullish answer is that frontier coding, frontier research, and frontier enterprise tasks are not going to stop being valuable, and Anthropic is the best at all three. Both can be true. The number is still extraordinary.

    The argument that TSMC discipline is the only thing preventing a bubble is the analytically tightest part of the conversation. The implied trade is to watch TSMC capacity additions like a hawk and to be more, not less, cautious if Intel Foundry or Samsung Foundry ever announce real share at the leading node. The Terafab thesis is more speculative but more interesting. If Elon’s talent recruiting playbook works and the Intel partnership gives Terafab a real seat at the table within five years, the geometry of the global semiconductor industry shifts in a way that is bullish for American manufacturing, bullish for power and water infrastructure in Texas, and ambiguous for TSMC itself.

    The Pareto frontier discussion deserves more attention than it usually gets. Pricing leadership in AI is not a vanity metric. It determines who can subsidize free tier usage, who can absorb compute shortages, who can ship cheaper enterprise plans, and ultimately whose model becomes the default for any given workload. Google losing per token leadership in nine months is one of the most under analyzed events in the sector and it explains a lot about why Anthropic and OpenAI are growing the way they are. If Google IO does not produce a leapfrog model, the implied verdict on TPU V8 design choices gets a lot harsher.

    The application layer destruction point is worth sitting with. Founders building on top of frontier models are competing in a world where the model itself moves faster than any moat they can build, where the model lab can absorb their niche if it gets interesting, and where the only protection is either deep token path integration or a niche so small the lab does not bother. That is a much harsher venture environment than the early SaaS era. The compensating opportunity is that one human can now run a hundred agents, so the ceiling on what a small team can build is correspondingly higher. The bet is that productivity per founder rises faster than competitive pressure from the labs. We will find out.

    The orbital compute pitch is the section that will polarize listeners. The naive read is that this is science fiction. The closer read is that every component (sun synchronous orbit, laser interconnect, twenty kilowatt satellite buses, ten thousand satellite manufacturing cadence, full rocket reusability) already exists. The remaining engineering problems are repair, maintenance, and radiator scale, all of which are real but tractable on a five to ten year horizon. The strategic implication is that the political and zoning ceiling on terrestrial data centers becomes less binding if orbital compute is a credible alternative for inference workloads. The investor implication is that being short the watts and cooling complex on a five year horizon is a real trade, not a meme.

    Watch the full conversation here.

  • Jensen Huang at Stanford CS153 Frontier Systems on Co-Design, Agentic Computing, Vera Rubin, Open Models, and the Million-X Decade That Reshaped AI Infrastructure

    https://www.youtube.com/watch?v=tsQB0n0YV3k

    NVIDIA CEO Jensen Huang returned to Stanford for the CS153 Frontier Systems class (the room nicknamed itself “AI Coachella”) to lay out, in raw form, how he thinks about the computer being reinvented for the first time in over sixty years. Across roughly seventy minutes of student questions he walks through the codesign philosophy that gave NVIDIA a million-x decade, the architectural through-line from Hopper to Grace Blackwell to Vera Rubin to Feynman, the case for open source foundation models, the realities of tokens per watt and MFU, energy demand running a thousand times higher, the China and export-control debate, and his own biggest strategic mistakes. Watch the full conversation on YouTube.

    TLDW

    Huang argues every layer of computing has changed: the programming model, the system architecture, the deployment pattern, the economics. Co-design across CPUs, GPUs, networking, storage, switches and compilers gave NVIDIA roughly a million-x speed-up over ten years versus the ten-x Moore’s Law era, and that headroom is what let researchers say “just train on the whole internet.” Hopper was built for pre-training, Grace Blackwell NVLink72 for inference and reasoning (50x over Hopper in two years), Vera Rubin is built for agents that load long memory, call tools and need a low-latency single-threaded CPU bolted directly to the GPU, and Feynman extends that to swarms of agents that spawn sub-agents. Open weights matter because safety, sovereignty (230-plus languages no one else will fund) and domain models for biology, autonomy, robotics and climate need a foundation that NVIDIA is willing to seed. Compute is not really the scarce resource (Huang says place the order and the chips ship), the broken thing is institutional budgeting that can’t put a billion dollars into a shared university supercomputer. Energy demand is heading a thousand times higher and this is finally the moment market forces alone will fund sustainable generation. On geopolitics he rejects the GPUs-as-atomic-bombs framing and warns America will end up like its telecom industry if it cedes two thirds of the world. On career he advises seeking suffering on purpose. On strategy he says observe, reason from first principles, build a mental model, work backwards, minimize opportunity cost, maximize optionality.

    Key Takeaways

    • The computing model has been substantially unchanged since the IBM System 360, sixty-plus years ago. Huang’s first computer architecture book was the System 360 manual. AI is the first true reinvention.
    • Old computing was pre-recorded retrieval. New computing is generated, contextually aware and continuous. Cloud was on-demand. Agentic systems run continuously.
    • Codesign is NVIDIA’s central thesis. Inherited from the Hennessy and Patterson RISC era at Stanford, extended across CPUs, GPUs, networking, switches, storage, compilers and frameworks all optimized together.
    • The result of full-stack codesign: roughly 1,000,000x faster compute over ten years, versus a generous 10x to 100x for Moore’s Law in the same period. Dennard scaling effectively ended a decade ago.
    • That million-x speed-up is what unlocked “train on all of the internet” as a realistic AI strategy.
    • After GPT, Huang says it was obvious thinking was next. Reasoning is just generating tokens consumed internally, then using tools is generating tokens consumed externally. Agentic systems followed predictably.
    • Education needs AI baked into the curriculum, not just taught as a subject. Pre-recorded textbooks cannot keep pace with knowledge being generated in real time.
    • Huang says he cannot learn anymore without AI. He has the AI read the paper, then read every related paper, then become a dedicated researcher he can interrogate.
    • Mead and Conway and the first-principles methodology of semiconductor design are still worth learning even though most of the scaling tricks have been exhausted.
    • NVIDIA itself is one of the largest consumers of Anthropic and OpenAI tokens in the world. One hundred percent of NVIDIA engineers are now agentically supported. Huang recommends Claude and similar tools by name and says open-source downloads will not match the integrated product harness.
    • NVIDIA still invests heavily in open foundation models because language and intelligence represent the codification of human knowledge. Five pillars: Nemotron (language), BioNeMo (biology), Alphamayo (autonomous vehicles), Groot (humanoid robotics) and a climate science model (mesoscale multiphysics).
    • Sovereign language models matter. Roughly 230 world languages will never be a top priority for a commercial frontier lab. Nemotron is near-frontier and fully fine-tunable so any country can adapt it.
    • Safety and security require open weights. You cannot defend against or audit a black box. Transparent systems let researchers interrogate models and let defenders deploy swarms.
    • The future of cyber defense is not bigger-model-versus-bigger-model. It is trillions of cheap fast small models like Nemotron Nano surrounding the threat.
    • Domain models fuse language priors with world models. Alphamayo learned to drive safely on a few million miles instead of billions because it can reason like a human about the road.
    • MFU (Model Flops Utilization) is a misleading metric. Huang says he wants low MFU, because that means he over-provisioned every resource and never gets pinned by Amdahl’s law during a spike.
    • The xAI Memphis cluster running at 11 percent MFU is not necessarily a failure mode. In disaggregated prefill plus decode inference you can deliver very high tokens per watt with very low MFU.
    • The right metric is performance, ultimately tokens per watt as a proxy for intelligence per watt, and even that needs adjustment because not all tokens are equal. Coding tokens are worth more than other tokens.
    • Hopper was designed for pre-training. NVIDIA chose to build multi-billion-dollar systems when the largest existing scientific supercomputer cost $350 million, with no proven customer base. It worked.
    • Grace Blackwell NVLink72 was designed for inference, especially the high-memory-bandwidth decode phase. It is the world’s first rack-scale computer and delivered a 50x speed-up over Hopper in two years, against an expected 2x from Moore’s Law.
    • Vera Rubin is designed for agents. Long-term memory wired into storage and into the GPU fabric, working memory, heavy tool use, and Vera, a CPU optimized for low-latency multi-core single-threaded code so a multi-billion-dollar GPU system does not stall waiting on a slow tool call.
    • Feynman is being shaped for swarms of agents with sub-agents and sub-sub-agents, a recursive software topology that demands a new compute pattern.
    • Tokens per watt improved 50x in one generation. Compounding energy efficiency is the lever NVIDIA controls directly.
    • Total compute energy demand is heading roughly a thousand times higher than today, possibly two orders of magnitude beyond that. Huang says he would not be surprised if the estimate is low.
    • For the first time in history, market forces alone are enough to fund solar, nuclear and grid upgrades. Government subsidies are no longer required to make sustainable energy investment rational.
    • Copper interconnect is becoming a bottleneck. Photonics is moving from optional to structural inside racks and across them.
    • Comparing NVIDIA GPUs to atomic bombs, Huang says, is a stupid analogy. A billion people use NVIDIA GPUs. He advocates them to his family. He does not advocate atomic bombs to anyone.
    • If the United States cedes two thirds of the global market to competitors on policy grounds, the American technology industry will end up like American telecommunications, which was policied out of existence.
    • Huang directly rejects AI doom-by-singularity narratives. It is not true that we have no idea how these systems work. It is not true that the technology becomes infinitely powerful in a nanosecond. He calls the rhetoric irresponsible and harmful to the field students are about to enter.
    • On Stanford specifically: if the university president places an order, NVIDIA will deliver the chips. The bottleneck is that no university department has a billion-dollar compute budget because budgeting is fragmented across grants. Stanford’s $40 billion endowment is more than enough to fix that.
    • “It’s Stanford’s fault” is meant as empowerment. If something is your fault, you can solve it.
    • Career advice: do not optimize purely for passion. Most people do not yet know what they love. Pick the job in front of you and do it as well as possible. Even as CEO, Huang says, 90 percent of the work is hard and he suffers through it.
    • Suffering on purpose builds the muscle of resilience. When the company, the team or the family needs you to be tough, that muscle has to already exist.
    • NVIDIA’s first generation of products was technically wrong in nearly every dimension: curved surfaces instead of triangles, no Z-buffer, forward instead of inverse texture mapping, no floating point. The strategic recovery, not the technology, taught Huang the lessons that have lasted decades.
    • The biggest clean strategic mistake Huang names is the move into mobile chips (Tegra). It grew to a billion dollars then went to zero when Qualcomm’s modem dominance shut NVIDIA out of the 3G to 4G transition. The recovery into automotive and robotics (the Thor chip is the great great great grandson of that mobile lineage) was real, but Huang refuses to rationalize the original choice.
    • Forecasting framework: observe, reason from first principles, ask “so what” and “what next” until you have a mental model of the future, place your company inside that model, then work backwards while minimizing opportunity cost and maximizing optionality.
    • Best part of the CEO job: living at the intersection of vision, strategy and execution surrounded by people capable enough to make ambitious visions real. Worst part: the responsibility for everyone who joined the spaceship, especially in the near-death moments NVIDIA had four or five times early on.
    • Underrated insider note: Huang’s first apple pie with cheese, first hot fudge sandwich and first milkshake all happened at Denny’s. The Superbird, the fried chicken and a custom Superbird-style ham and cheese with tomato and mustard are his order.

    Detailed Summary

    Computing reinvented from the ground up

    Huang frames the moment as the first true rewrite of the computer in sixty-plus years. From the IBM System 360 forward, the mental model of writing code, running code, taking a computer to market and reasoning about applications stayed roughly constant. AI changes the programming model itself. Software is no longer a compiled binary running deterministically on a CPU. It is a neural network running on a GPU producing generated, contextual, real-time output. That cascades into how companies are organized, what tools developers use, what the network and storage stack look like, and what an application is even allowed to do. Robo-taxis, he notes, are an application no one would have attempted before deep learning unlocked perception.

    Codesign and the million-x decade

    Codesign is the philosophical center of the talk. Huang traces it to the RISC work of John Hennessy at Stanford, where simpler instruction sets won by being co-designed with the compiler rather than maximally optimized in isolation. NVIDIA extends the principle across every layer simultaneously: GPU architecture, CPU architecture, NVLink and NVSwitch fabrics, photonic interconnects, networking silicon, storage paths, CUDA libraries, frameworks and ultimately the model design. The numbers Huang gives are arresting. Moore’s Law in its prime delivered roughly 100x per decade. By the time Dennard scaling broke, real-world gains had compressed to roughly 10x. NVIDIA’s codesigned stack delivered between 100,000x and 1,000,000x over the same ten-year window. That non-linear speed-up is, in Huang’s telling, the precondition for modern AI: it is what allowed researchers to stop curating training sets and just feed the entire internet to the model.

    Education has to fuse first principles with AI tools

    Asked how curriculum should evolve, Huang argues AI must be integrated into the learning process, not just taught about. He recalls Hennessy writing his textbook by hand a chapter a week while Huang was a student, and says pre-recorded textbooks cannot keep up with the rate at which AI generates new knowledge. He describes his own learning workflow: hand the paper to an AI, then have it read the entire surrounding literature, then treat the AI as a dedicated researcher who can be interrogated. At the same time he defends the classics. Mead and Conway are still the foundation. Most modern semiconductor scaling tricks have been exhausted, but knowing where the field came from sharpens judgment when designing what comes next.

    Open source and the five domain pillars

    Huang gives one of the most detailed public accounts of why NVIDIA invests so heavily in open foundation models even while being a top customer of closed labs. He recommends Claude and OpenAI by name for production coding work, and says 100 percent of NVIDIA engineers are now agentically supported. The open-weights case rests on three legs. First, language is the codification of intelligence, and there are at least 230 languages that no commercial lab will ever prioritize. Nemotron is built near frontier and released so any country or community can fine-tune it. Second, the same representation-learning approach has to be replicated in domains where the data is not internet text, so NVIDIA seeded BioNeMo for biology, Alphamayo for autonomy, Groot for humanoid robotics and a climate model for mesoscale multiphysics. The economics of those fields would never produce a foundation model on their own. Third, safety and security require transparency. A black box cannot be defended or audited, and the future of cyber defense is not bigger-model-versus-bigger-model but swarms of cheap fast small models like Nemotron Nano surrounding the threat.

    MFU is the wrong metric, tokens per watt is closer

    A student raises the leaked memo that the xAI Memphis cluster is running at 11 percent Model Flops Utilization. Huang flips the framing. He says he would rather be at low MFU all the time, because that means he over-provisioned flops, memory bandwidth, memory capacity and network capacity. Bottlenecks shift constantly, so over-provisioning across every dimension is what lets the system absorb a spike without getting pinned by Amdahl’s law. In disaggregated inference, where prefill and decode are physically separated and decode is bandwidth-bound rather than flop-bound, NVLink72 can deliver extremely high tokens per watt while reporting very low MFU. Huang argues the right framing is performance, and ultimately tokens per watt as a rough proxy for intelligence per watt, adjusted for the fact that not all tokens are equal. A coding token is worth more than a generic token.

    Hopper, Grace Blackwell NVLink72, Vera Rubin, Feynman

    Huang gives the clearest public framing of NVIDIA’s roadmap as a sequence of architectural answers to evolving compute patterns. Hopper was built for pre-training, at a moment when NVIDIA chose to build multi-billion-dollar machines while the largest scientific supercomputer in the world cost $350 million and the marketplace for such systems was, on paper, zero. Grace Blackwell NVLink72 was the answer to inference and reasoning: a rack-scale computer that ganged 72 GPUs together because decode needs aggregate memory bandwidth far beyond a single chip. The generation-over-generation speed-up was 50x in two years, twenty-five times what Moore’s Law would have delivered. Vera Rubin is being built explicitly for agents. Agents load long-term memory from storage that has to be wired directly into the GPU fabric, they use working memory, they call tools that run on a CPU, and they wait. So the CPU has to be Vera, optimized for low-latency single-threaded code, because the multi-billion-dollar GPU system cannot afford to idle waiting on a slow tool call. Feynman extends the pattern to swarms of agents with sub-agents and sub-sub-agents, a recursive software topology that will demand its own compute pattern.

    Energy demand and the grid

    Huang’s energy projection is one of the most aggressive numbers in the talk. NVIDIA can compound tokens per watt by 50x per generation through codesign, but the total compute demand is heading roughly a thousand times higher, and Huang says he would not be surprised if the real figure is one or two orders of magnitude beyond that. The reason is structural: future computing is generative and continuous, not pre-recorded and on-demand. The good news, he argues, is that this is the best moment in the history of humanity to invest in sustainable generation. Market forces alone are now sufficient to fund solar, nuclear and grid upgrades. Government subsidies are no longer required to make the math work.

    Adversarial countries, export controls and the telecom warning

    This is the segment where Huang is visibly fired up. He attacks the GPUs-as-atomic-bombs framing on its face. NVIDIA GPUs power medical imaging, video games and soy sauce delivery. A billion people use them. He advocates them to his family. The analogy collapses at the first comparison. He attacks the second framing, that American companies should not compete abroad because they will lose anyway, as a self-fulfilling defeat. Competition makes the company better. The third framing, that depriving the rest of the world of general-purpose computing benefits the United States, also fails on first principles: it benefits one or two American companies at the cost of an entire industry. The cautionary parallel is telecommunications. The United States once had a leading position in telecom fundamental technology and policied itself out of it. Huang’s worry, voiced explicitly to a room of CS students, is that they will graduate into a shell of a computer industry if the same path is repeated.

    AI doom and rational optimism

    In the same arc Huang rejects the science-fiction framing of AI as a singularity that arrives suddenly on a Wednesday at 7pm and ends civilization. He calls those claims irresponsible, says they are not true, and points out that the people advancing them are believed by audiences who then make policy on that basis. It is not true that no one understands how these systems work. It is not true that intelligence becomes infinitely powerful instantaneously. It is not true that there is no defense. His framing, which the host echoes as “rational optimism,” is that the goal is to create a future where people care about computers because the technology students are learning is worth mastering.

    Stanford’s compute problem is Stanford’s fault

    A student presses on the scarcity of compute for independent researchers, startups and universities inside the United States. Huang’s answer is sharp: there is no shortage. Place the order and the chips will arrive. The actual broken thing is institutional. University grants are fragmented across departments. No researcher can raise enough on a single grant to fund a billion-dollar shared cluster, and no one shares. He compares it to showing up at the grocery store demanding a billion dollars of tomatoes today. The solution is planning, aggregation and a campus-scale supercomputer, the way Stanford once built the linear accelerator. The endowment is $40 billion. Pulling a billion off it, contracting cloud capacity and giving every student and researcher AI supercomputer access is, in Huang’s view, obviously doable. When he says “it is Stanford’s fault” the host laughs, but Huang clarifies: if it is your fault you have the power to fix it.

    Career, suffering and resilience

    Asked how a CS student should spend the next few years, Huang pushes back on the standard “follow your passion” advice. Most people do not know what they love yet, because no one knows what they do not know. The bar of demanding joy from every working day is too high. Whatever the job is, do it as well as you can. Even as CEO of NVIDIA he says he genuinely loves about 10 percent of his work. The other 90 percent is hard and he suffers through it. He recommends suffering on purpose, because resilience is a muscle that only builds under load, and when the company, the team or the family needs that muscle, it has to already exist. Earlier in his life that meant cleaning toilets and busing tables at Denny’s. He does it today running a multi-trillion-dollar company.

    The biggest mistakes

    Huang separates technical mistakes from strategic mistakes. NVIDIA’s first generation of products was technically wrong in almost every way: curved surfaces instead of triangles, no Z-buffer, forward instead of inverse texture mapping, no floating point inside. The company wasted two and a half years. But the strategic genius of the recovery, the reading of the market, the conservation of resources and the reapplication of talent, is what taught him strategy. The clean strategic mistake he names is mobile. NVIDIA’s Tegra line grew to a billion dollars of revenue and then collapsed to zero when Qualcomm’s modem dominance locked NVIDIA out of the 3G to 4G transition. Huang explicitly refuses the comforting rationalization that the Tegra effort fed the Thor automotive chip (“Thor is the great great great grandson”). The original decision, he says, was a waste of time. The lesson is to think one or two clicks further about whether a market is structurally winnable before committing the company.

    Forecasting under fog of war

    The final substantive exchange is on forecasting. Huang’s method has four steps. Observe what is actually happening (AlexNet crushing two decades of computer vision research in one shot, GPT producing reasoning by token generation). Reason from first principles about why it works. Ask “so what” and “what next” recursively until a mental model of the future emerges. Place the company inside that future and work backwards. Crucially, expect to be partly wrong. Some outcomes will absolutely happen, some will likely happen, some might happen, and the strategy has to be robust across that distribution. The real cost of any strategic choice is the opportunity cost of the alternatives you did not take, so the discipline is to minimize that cost and maximize optionality while letting the journey itself pay for the journey.

    Thoughts

    The most useful thing in this conversation is the explicit architectural mapping of compute patterns to chip generations. Hopper for pre-training. Grace Blackwell NVLink72 for inference, because decode is bandwidth-bound and a single chip cannot supply it. Vera Rubin for agents, because tool calls stall multi-billion-dollar GPU systems and so the CPU has to be optimized for low-latency single-threaded code. Feynman for swarms. That sequence is not marketing. It is a falsifiable thesis about where the bottleneck moves next, and every other infrastructure company should be measuring themselves against it. If Huang is right that swarms of sub-agents are the next dominant pattern, then the design pressure shifts from raw flops to fabric topology, memory hierarchy and storage-to-GPU latency. That has implications for everyone downstream, including the hyperscalers building competing accelerators.

    The MFU section is the most intellectually generous moment in the talk. The instinct in the AI ops community has been to chase MFU as if it were a virtue. Huang argues, persuasively, that low MFU is consistent with high tokens per watt in a disaggregated inference setup, and that bottlenecks rotate fast enough that over-provisioning every resource is the rational design. That reframing matters because it changes what “scarce” means. Compute is not scarce in the way the discourse treats it. What is scarce is a coherent system designed end-to-end. The xAI 11 percent number, in that frame, is not embarrassing. It is the natural reading of a workload that is mostly decode.

    The Stanford segment is the part most likely to be quoted out of context. “It’s Stanford’s fault” is a deliberately provocative line, but the underlying claim is correct and load-bearing. Compute is not gated by NVIDIA refusing to ship chips. It is gated by the fact that fragmented grant funding cannot aggregate into the billion-dollar order that NVIDIA can fulfill. The implication is that universities and national labs need a structural change in how they pool capital for compute, and that the current model of every researcher buying a handful of cards is genuinely obsolete. Huang’s nudge about pulling a billion off the endowment is concrete enough to be acted on, and other major research universities should read this segment as a direct prompt.

    The geopolitical segment is the highest-stakes one. The telecommunications comparison is correct as a historical pattern, and Huang is one of the very few executives in a position to deliver that warning credibly. The unresolved tension is that the argument applies symmetrically. If American AI dominance is built by selling globally, that includes selling into adversarial states, and the policy question is where the line falls. Huang does not answer that question. He attacks the framing that lets the question be answered badly. That is a meaningful contribution to the discourse even if it does not resolve the underlying tradeoff.

    The career advice section is the part the social-media clips will mishandle. “Seek suffering” reads as macho when extracted. In context it is a specific operational claim about how resilience compounds, and it is paired with the Tegra story where Huang himself paid the price of not thinking one more click ahead. That kind of self-implication is rare in CEO talks, and it is the reason the talk is worth listening to in full rather than only reading the recap.

    Watch the full Stanford CS153 Frontier Systems conversation with Jensen Huang here.

  • Krishna Rao on Anthropic Going From 9 Billion to 30 Billion ARR in One Quarter and the Compute Strategy Powering Claude

    Krishna Rao, Chief Financial Officer of Anthropic, sat down with Patrick O’Shaughnessy on Invest Like the Best for one of the most detailed public looks yet at the operating engine behind Claude. He covers how Anthropic compounded from $9 billion of run rate revenue at the start of the year to north of $30 billion by the end of Q1, why he spends 30 to 40 percent of his time on compute, the playbook for buying gigawatts of AI infrastructure across Trainium, TPU, and GPU platforms, how Anthropic prices its models, why returns to frontier intelligence keep climbing, and what the Mythos release tells us about the cyber capabilities of the next generation of Claude.

    TLDW

    Anthropic is running the most compute fungible frontier lab in the world, with active deployments across AWS Trainium, Google TPU, and Nvidia GPU, and an internal orchestration layer that lets a chip serve inference in the morning and run reinforcement learning the same evening. Krishna Rao explains the cone of uncertainty that governs gigawatt scale compute procurement, the floor Anthropic refuses to drop below on model development compute, the Jevons paradox unlock from cutting Opus pricing, the 500 percent annualized net dollar retention from enterprise customers, the layer cake of long term deals with Google, Broadcom, Amazon, and the recent xAI Colossus tie up in Memphis, the phased release of the Mythos model in response to spiking cyber capabilities, the internal use of Claude Code to produce statutory financial statements and run a Monthly Financial Review skill, and why the team believes scaling laws are alive and well. The interview also covers fundraising history through Series D and Series E, the $75 billion already raised plus another $50 billion coming, talent density beating talent mass during the Meta poaching wave, and Rao’s belief that biotech and drug discovery represent the most exciting frontier for AI.

    Key Takeaways

    • Anthropic entered the year with about $9 billion of run rate revenue and ended the first quarter with north of $30 billion of run rate revenue, a more than 3x leap driven by model intelligence gains and the products built around them.
    • Compute is described as the lifeblood of the company, the canvas everything else is built on, and the most consequential class of decisions Rao makes. Buy too much and you go bankrupt. Buy too little and you cannot serve customers or stay at the frontier.
    • Rao spends 30 to 40 percent of his time on compute, even today, and the leadership team meets repeatedly on both procurement and ongoing compute allocation.
    • Anthropic is the only frontier language lab actively using all three major chip platforms in production: AWS Trainium, Google TPU, and Nvidia GPU. It is also the only major model available on all three clouds.
    • Flexibility is the central design principle. Anthropic builds flexibility into the deals themselves, into the orchestration layer that maps workloads to chips, and into compilers built from the chip level up.
    • The cone of uncertainty frames procurement. Small differences in weekly or monthly growth compound into wildly different two year outcomes, so the team plans across a range of scenarios rather than a single point estimate, and ranges toward the upper end while protecting downside.
    • Compute allocation across the company sits in three buckets: model development and research, internal employee acceleration, and external customer serving. A non negotiable floor protects model development even when customer demand is tight.
    • Anthropic estimates that if it cut off internal employee use of its own models, the freed compute could serve billions of dollars of additional revenue. It chooses not to, because internal use compounds into better future models.
    • Intelligence is multi dimensional, not a single IQ score. Anthropic measures real world capability through customer feedback, long horizon task performance, tool use, computer use, and speed at agentic tasks, not just leaderboard benchmarks that have largely saturated.
    • Each Opus generation, 4 to 4.5 to 4.6 to 4.7, delivers both capability improvements and an efficiency multiplier on token processing. New models often serve customers at a fraction of the prior cost while doing more.
    • Reinforcement learning is described as inference inside a sandbox with a reward function, so model efficiency gains directly improve internal RL throughput. The flywheel is tightly coupled.
    • Over 90 percent of code at Anthropic is now written by Claude Code, and a large share of Claude Code itself is written by Claude Code.
    • Anthropic shipped roughly 30 distinct product and feature releases in January and the pace has accelerated since.
    • Scaling laws, in Anthropic’s internal data, are alive and well. The team holds itself to a skeptical scientific standard and still does not see them slowing down.
    • Anthropic recently signed a 5 gigawatt deal with Google and Broadcom for TPUs starting in 2027, plus an Amazon Trainium agreement for up to 5 gigawatts, totaling more than $100 billion in commitments. A significant portion lands this year and next year.
    • A new partnership for capacity at the xAI Colossus facility in Memphis was announced just before the interview, aimed at expanding consumer and prosumer capacity.
    • Pricing has been remarkably stable across Haiku, Sonnet, and Opus. The biggest deliberate change was lowering Opus pricing, which produced a textbook Jevons paradox: consumption rose far faster than the price drop, and the new Opus 4.6 and 4.7 slot in at the same price point.
    • Mythos is the first model Anthropic chose to release in a phased way because of a sharp spike in cyber capability. In an open source codebase where a prior model found 22 security vulnerabilities, Mythos found roughly 250.
    • The Mythos release framework focuses on defensive use first, expands access over time, and is presented as a template for future capability spikes.
    • Anthropic now sells to 9 of the Fortune 10 and reports net dollar retention above 500 percent on an annualized basis. These are not pilots. Rao describes signing two double digit million dollar commitments during a 20 minute Uber ride to the studio.
    • The platform strategy is mostly horizontal. Anthropic will go vertical with offerings like Claude for Financial Services, Claude for Life Sciences, and Claude Security where it can demonstrate the model’s capabilities, but expects most application value to accrue to customers building on top.
    • Investors raised over $75 billion in equity since Rao joined, with another $50 billion in commitments tied to the Amazon and Google deals. Capital intensity is real, but the raises fund the upper end of the cone of uncertainty more than they fund current losses.
    • The Series E close coincided with the day the DeepSeek news broke, forcing investors to reassess their AI thesis in real time. Anthropic closed the round anyway.
    • Inside finance, Claude now produces statutory financial statements for every Anthropic legal entity, with a human checker. A library of more than 70 finance specific skills underpins workflows.
    • A custom Monthly Financial Review skill produces a 90 to 95 percent ready monthly close report, so leadership discussion shifts from reconciling numbers to debating implications.
    • An internal real time analytics platform called Anthrop Stats compresses weekly insight cycles from hours to about 30 minutes.
    • The biggest token user inside Anthropic’s finance team is the head of tax, focused on tax policy engines and workflow automation. The most senior people, not the youngest, are leading internal adoption.
    • Talent density beats talent mass. When Meta and others ran aggressive offer waves, Anthropic lost two people while peer labs lost dozens.
    • All seven Anthropic co founders remain at the company, as does most of the first 20 to 30 employees, which Rao credits to a collaborative, transparent, debate friendly culture and a real culture interview that can veto otherwise top tier candidates.
    • Dario Amodei holds an open all hands every two weeks, writes a short prepared document, and takes unscripted questions from anyone at the company.
    • AI safety investments in interpretability and alignment have a commercial side effect. Looking inside the model helps Anthropic build better models, and enterprises selling sensitive workloads want to trust the lab they hand customer data to.
    • Anthropic explicitly identifies as America first in its approach to model development, and engages closely with the US administration on capability releases such as Mythos.
    • The longer term product vision is the virtual collaborator: an agent with organizational context, access to the company’s tools, persistent memory, and the ability to work on ideas, not just tasks, over long horizons.
    • CoWork, Anthropic’s extension of the Claude Code paradigm into general knowledge work, is being adopted faster than Claude Code itself when indexed to the same point in its launch curve.
    • Anthropic’s product teams ship daily, with a fleet of agents working across the company on specific tasks. Everyone effectively becomes a manager of agents.
    • The dominant downside risks to Anthropic’s high end forecast are slower customer diffusion of model capability into real workflows, scaling laws flattening unexpectedly, and Anthropic losing its position at the frontier.
    • Rao is most excited about biotech and healthcare outcomes, especially the prospect that AI could push drug discovery and lab throughput up 10x or 100x, turning currently incurable diagnoses into treatable ones within a patient’s lifetime.

    Detailed Summary

    Compute as Lifeblood and the Cone of Uncertainty

    Rao opens with the claim that compute is the most important resource at Anthropic, and the most consequential decision class in the company. You cannot buy a gigawatt of compute next week. You have to anticipate demand a year or two in advance, and the cost of being wrong in either direction is high. Buy too much and the unit economics collapse. Buy too little and you cannot serve customers or stay at the frontier, which are described as the same failure mode. To navigate this, the team uses a cone of uncertainty rather than point estimates. Small differences in weekly growth compound into vastly different two year outcomes, and Anthropic tries to position itself toward the upper end of that cone while preserving optionality. Rao notes he has had to consciously break a lifetime of linear thinking and force himself into exponential models.

    Three Chip Platforms, One Orchestration Layer

    Anthropic uses Amazon’s Trainium, Google’s TPUs, and Nvidia’s GPUs fungibly. That was not free. Adopting TPUs at scale started around the third TPU generation, when outside observers thought it was a strange choice. Anthropic invested years into compilers and orchestration so workloads can flow across chips by generation and by job type. The team works deeply with Annapurna Labs at AWS to influence Trainium roadmaps because Anthropic stresses these chips harder than almost anyone. The result is what Rao believes is the most efficient utilization of compute across any frontier lab, with a dollar of compute going further inside Anthropic than anywhere else.

    Three Buckets and the Model Development Floor

    Compute gets allocated across model development, internal acceleration of employees, and customer serving. The conversations are collaborative rather than zero sum, but there is a hard floor on model development that the company refuses to cross even if it makes customer demand harder to serve in the short term. The thesis is simple. The returns to frontier intelligence are extremely high, especially in enterprise, so cutting model investment to chase near term revenue is a bad trade. Internal employee use is also explicitly protected. Rao notes that diverting that internal usage to external customers would unlock billions of additional revenue today, but the compounding benefit of accelerating researchers and engineers outweighs that.

    Intelligence Is Multi Dimensional

    Rao pushes back hard on the IQ framing of model progress. Benchmarks saturate quickly, and the real signal comes from how customers actually use the models. Anthropic looks at long horizon task completion, tool use, computer use, and time to result on agentic tasks. Two equally capable agents who differ only in speed produce dramatically different value, because the faster one compounds into more attempts and more outcomes. Frontier model leaps are also fuel efficient. The sedan to sports car analogy breaks down because each Opus generation, 4 to 4.5 to 4.6 to 4.7, delivers a step up in capability and a multiplier on per token efficiency.

    From 9 Billion to 30 Billion ARR in One Quarter

    The headline number for the quarter is a leap from about $9 billion of run rate revenue to over $30 billion, accomplished without onboarding a corresponding step up in compute, because new compute lands on ramps locked in 12 months prior. Rao attributes the leap to model capability gains, products that surface that intelligence in usable form factors, and an enterprise customer base that pulls more workloads onto Claude as each generation unlocks new use cases. Coding started the wave with Sonnet 3.5 and 3.6, and the same pattern is now playing out elsewhere in the economy.

    Recursive Self Improvement and Talent Density

    Over 90 percent of Anthropic’s code is now written by Claude Code, including most of Claude Code itself. Rao describes this as a structural reason to keep allocating internal compute to employees even when external demand is hungry. Recursive self improvement is not happening through models that need no humans. It is happening through researchers who set direction and use frontier models to compress months of work into days. Talent density beats talent mass. When Meta and other labs went after Anthropic researchers with very large packages, Anthropic lost two people while peer labs lost dozens.

    Procurement Strategy and the Layer Cake

    Compute lands as a layer cake. Last month Anthropic signed a 5 gigawatt TPU deal with Google and Broadcom starting in 2027, alongside an Amazon Trainium agreement for up to 5 gigawatts. The total is north of $100 billion in commitments. A new tie up with xAI’s Colossus facility in Memphis was announced just before the interview, intended for nearer term capacity to support consumer and prosumer growth. Anthropic evaluates near term and long term compute deals against the same set of variables: price, duration, location, chip type, and how efficiently the team can run it. The relationships are deeper than procurement. The hyperscalers are also distribution channels for the model.

    Platform First, Selective Vertical Bets

    Rao describes Anthropic as a platform first business, with most expected value accruing to customers building on the platform. The team will only go vertical when it can either demonstrate capabilities that are skating to where the puck is going, like Claude Code did before the models could fully support it, or when it wants to set a template for an industry vertical, as with Claude for Financial Services, Claude for Life Sciences, and Claude Security. He acknowledges that surprise capability jumps make customers anxious about the platform competing with them, and frames Anthropic’s mitigation as deeper partnerships, early access programs, and an emphasis on accelerating customer building rather than disintermediating it.

    Pricing, Jevons Paradox, and Return on Compute

    Pricing across Haiku, Sonnet, and Opus has been stable. The notable exception is Opus, which Anthropic deliberately repriced lower when launching Opus 4.5 because Opus class problems were being squeezed into Sonnet workloads. Efficiency gains made it possible to serve Opus profitably at the new level. The consumption response was a classic Jevons paradox, with usage rising far more than the price reduction would have predicted, and Opus 4.6 then slotted in at the same price with a capability bump. Margins are not framed as a per token markup. Compute is fungible across model development, internal acceleration, and customer serving, so Anthropic measures return on the entire compute envelope rather than software style variable cost per call.

    Fundraising, DeepSeek, and Capital Intensity

    Rao joined while Anthropic was closing its Series D, mid frontier model launch and during the FTX share liquidation. Investors initially questioned whether Anthropic needed a frontier model, whether AI safety and a real business could coexist, and why the sales team was so small. The Series E closed the same day the DeepSeek news broke, with markets violently re pricing AI in real time. Since Rao joined, Anthropic has raised over $75 billion, with another $50 billion tied to the Amazon and Google compute deals. The reason for the size of the raises is the cone of uncertainty, not current losses. Returns on compute today are described as robust.

    Mythos, Cyber Capability, and Phased Releases

    The Mythos release marks the first time Anthropic shipped a model under a deliberately phased rollout because of a specific capability spike. Cyber is the dimension that spiked. Where a prior model found 22 vulnerabilities in an open source codebase, Mythos found roughly 250. The defensive applications, automatically patching massive codebases, are genuinely valuable, but the offensive risk is real enough that Anthropic chose to release to a smaller group first and expand access over time. Rao positions this as a template for future capability spikes, not a permanent restriction. He also describes the relationship with the US administration as cooperative, including the Department of War interaction, with Anthropic supporting a regulatory framework that does not strangle innovation but takes responsibility seriously.

    Claude Inside Finance

    Anthropic’s finance team is one of the strongest internal case studies. Statutory financial statements for every legal entity are produced by Claude, with a human reviewer. A skill library of more than 70 finance specific skills underpins a Monthly Financial Review skill that drafts the monthly close at 90 to 95 percent ready, so leadership meetings shift from explaining the numbers to discussing what to do about them. An internal analytics platform called Anthrop Stats compresses weekly insight cycles from hours to 30 minutes. The biggest internal token user in finance is the head of tax, building policy engines, which Rao highlights as evidence that adoption is driven by the most senior people, not just younger engineers.

    Culture, Co Founders, and the Race to the Top

    Seven co founders should not, on paper, work as a leadership group. Rao argues it works because the culture was set early around collaboration, intellectual honesty, transparency, and humility. The culture interview is a real veto, not a checkbox. Dario Amodei runs an all hands every two weeks with a short written piece followed by unscripted questions, and decisions, once made, get clean alignment rather than residual politics. Anthropic frames its approach as a race to the top, where being a model for how to build the technology responsibly is itself a recruiting and retention advantage.

    The Virtual Collaborator and the Frontier Ahead

    The product vision Rao describes is the virtual collaborator. Not just a smarter chatbot, but an agent with organizational context, access to the company’s tools, memory, and the ability to work on ideas over long horizons. Coding was the first domain to feel this, but CoWork, Anthropic’s extension of the Claude Code pattern into general knowledge work, is being adopted faster than Claude Code was at the same age. Product development inside Anthropic already looks different. Teams ship daily, with fleets of agents working across the company, and individual humans increasingly act as managers of those fleets.

    Downside Risks and What Excites Him Most

    The three risks Rao names if asked to do a premortem on a softer year are slower customer diffusion of model capability into real workflows, scaling laws unexpectedly flattening, and Anthropic losing its frontier position to competitors. None of these are observed today, but he is unwilling to claim them with certainty. On the upside, he is most excited about biotech and healthcare. Lab throughput rising 10x or 100x, paired with AI assisted clinical workflows, could turn currently incurable diagnoses into treatable ones within a patient’s lifetime. That is the outcome he wants the technology to chase.

    Thoughts

    The most consequential structural point in this interview is the framing of compute as a single fungible resource pool measured by return on the entire envelope, not as a variable cost per inference call. That accounting shift, if you accept it, breaks most of the bear cases about AI lab unit economics. The bear argument almost always assumes that a token served to a customer is the only thing the chip did that day. Rao’s version is that the same fleet trains models in the morning, runs reinforcement learning at lunch, serves customers in the afternoon, and accelerates internal engineers in the evening. If even half of that is real, the right comparison is total compute spend versus total enterprise value created by the platform, and on that ratio Anthropic looks structurally strong rather than weak.

    The Jevons paradox on Opus pricing is the most actionable insight for anyone running an AI product. Most teams default to either chasing premium pricing on the newest model or undercutting to chase volume. Anthropic did something more disciplined: it left Sonnet and Haiku alone, dropped Opus when efficiency gains made it serveable, and watched aggregate usage rise faster than the price cut. The lesson is that frontier model pricing is not really a price problem. It is a capability access problem, and elasticity around the right tier is much higher than the standard SaaS playbook implies.

    The Mythos cyber jump deserves more attention than it has gotten. Going from 22 to 250 vulnerabilities found in the same codebase is the kind of capability discontinuity that genuinely changes the regulatory calculus. Anthropic is signaling that it can identify these discontinuities ahead of release and choose a deployment shape that respects them. Whether peer labs adopt similar discipline is the open question. Anthropic’s race to the top framing assumes they will be forced to. The competitive market may say otherwise.

    The hiring data point is the most underrated investor signal. Two departures while peer labs lost dozens, during the most aggressive talent war in tech history, is not a culture poster. It is a structural advantage that compounds every time another lab tries to buy its way to the frontier. Money can be matched. Conviction in the mission, transparent leadership, and a culture interview that can veto otherwise stellar candidates cannot. If you believe scaling laws hold, talent retention at this density is one of the few moats that actually scales with capital.

    Finally, the most interesting personal admission is that Krishna Rao, a finance leader trained at Blackstone and Cedar, is openly telling investors that linear thinking is the failure mode he had to break out of. The companies that pattern match this moment to prior technology waves are mispricing it, in both directions. The cone of uncertainty Anthropic uses internally is the right metaphor for everyone else too. If you are forecasting AI as if it is cloud in 2010, you are almost certainly wrong, and the magnitude of the error is much larger than it would be in any prior era.

    Watch the full conversation with Krishna Rao on Invest Like the Best here.