PJFP.com

Pursuit of Joy, Fulfillment, and Purpose

Tag: AI infrastructure

  • Thomas Laffont of Coatue on the $4 Trillion AI IPO Wave: SpaceX, Anthropic, OpenAI, and Why the New Unicorn Economy Is Healthier

    Thomas Laffont, co-founder of the $55 billion hedge fund Coatue Management, made his All-In Podcast premiere with a data-dense walk through what he calls a once-in-a-generation moment for the unicorn economy. In front of Chamath Palihapitiya, Jason Calacanis, David Sacks, and David Friedberg, he argued that a roughly $4 trillion wave of private value is about to hit the public markets, led by SpaceX, Anthropic, and OpenAI, and that the new AI-driven unicorn economy is actually healthier than the one that came before it. You can watch the full presentation and Q&A on YouTube.

    TLDW

    Laffont presents Coatue’s slide deck on the state of the unicorn economy and argues it has rebalanced after the excesses of 2021. The average unicorn is up about 70 percent since September 2024, AI keeps taking a bigger share of all fundraising, and the model has shifted from many small unicorns to fewer companies each raising far more, with funding per unicorn up roughly 5x since 2021. He introduces a “Magnificent 8” private index (SpaceX, Stripe, Anthropic, Databricks, Revolut, ByteDance, Anduril, and more) worth nearly $4 trillion that has crushed the public Mag 7, then shows that exits are finally thawing as SpaceX heads to an IPO in weeks and Anthropic confidentially files its S-1. From there he covers five things: Coatue’s “CODE” framework for why SpaceX gets more valuable the more it launches; a counterintuitive finding that the odds of a 10x actually rise as companies get bigger (31 percent for $100 billion-plus centicorns); the explosive revenue ramp of OpenAI and Anthropic past Workday, ServiceNow, Adobe, Salesforce, and now Google Cloud and Azure; a three-pillar map of where AI revenue comes from (consumer, ads, enterprise); and the AI memory thesis. The Q&A with Chamath and Calacanis digs into the power law, K-shaped outcomes, whether these valuations are disconnected from reality, the public market as the great antiseptic, and what happens when trillions in private value finally recycle back through GPs and LPs.

    Thoughts

    The most useful idea in the talk is not the $4 trillion headline but the cohort-health chart. Laffont splits unicorns into eras and shows that the pre-2021 cohort was healthy, with roughly 80 percent having raised again or exited 20 quarters after minting, while the giant 2021 ZIRP cohort of 479 companies is stuck with under 20 percent doing either. That single comparison reframes the whole AI boom. The bullish read is that the 2024 AI cohort is small, concentrated, and cash-generative, so it looks more like the healthy pre-ZIRP group than the 2021 hangover. The bearish read is that we are watching the same movie with bigger numbers, and the test only comes when these companies face public markets. Laffont is honest that we do not yet know which cohort the AI class resembles, and that intellectual humility is what makes the deck credible rather than promotional.

    The SpaceX “CODE” framework is the sharpest analytical move of the presentation. Most people would assume that each additional launch is worth less as a launch business scales. Laffont shows the opposite: the market pays more per launch as cadence rises. He explains it as a phase change in business quality: from one-time government launch revenue, to a single recurring-revenue constellation, to multiple constellations, to a platform with optional upside in space data centers, the moon, and Mars. It is a clean way to think about any company that climbs from a project business to a platform business, and it applies far beyond rockets. The lesson for investors is that valuation can rationally expand even as unit economics look like they should compress, because the nature of the revenue underneath is changing.

    The counterintuitive 10x odds finding deserves more attention than it got in the room. Conventional wisdom says the bigger you are, the harder it is to grow, so a $100 billion company should be less likely to 10x than a $10 billion one. Coatue’s data says the reverse: centicorns have a 31 percent shot at a 10x, far higher than the 8 percent chance a unicorn has of becoming a decacorn. Laffont’s explanation is a filtering mechanism: every step up validates a compounding advantage and durability of earnings, so survivors are increasingly the kind of business that keeps compounding. This is essentially a quantitative restatement of quality investing, and it is the intellectual backbone of the LP strategy the besties tease out: just buy whoever reaches $100 billion and hold.

    Where the argument gets genuinely contested is valuation, and the panel does not let it slide. The pushback that “these are not fake companies” is true and important: OpenAI and Anthropic are growing faster than any software company in history, and Anthropic reportedly had a profitable month. But growth and reality do not settle the question of price when you are paying 50 to 100 times revenue for trillion-dollar private companies, as Bill Ackman pointed out earlier in the day. Laffont’s answer is the most grounded thing he says all session: the public market is the great antiseptic, it will not care about anyone’s slide deck, and he wants to see these names withstand short sellers and skeptics. That is the right posture. The deck is a thesis, not a verdict, and the verdict arrives roughly six months and one day after the IPOs, once passive flows and supply have washed through.

    The closing thread, that almost every sector is being transformed at once and we still do not have superintelligence, is the part worth sitting with. The risk in a presentation this bullish is treating the trend as destiny. The value is in the framing tools Laffont hands you: cohort health, phase-change business quality, the filtering odds, the three revenue pillars, and the antiseptic of public scrutiny. Use those to interrogate each name rather than to buy the index on faith, and the talk earns its premiere billing.

    Key Takeaways

    • Coatue Management is one of the most successful hedge funds of the last two decades with about $55 billion under management, and is raising roughly another billion dollars specifically to invest in AI.
    • The unicorn economy is up about 70 percent on average since September 2024, and the public market has made a similar move up over the same period.
    • The unicorn economy’s share of the NASDAQ rose significantly after 2015 but has plateaued in recent years, reflecting strong performance from public companies.
    • AI keeps increasing its wallet share of all venture fundraising, multiple years in a row now.
    • The composition of funding has changed. The unicorn “factory” peaked in the ZIRP era of 2021 and has normalized at a much lower level since.
    • Funding per unicorn has increased roughly 5x since 2021. There are fewer unicorns, and each one is raising more.
    • Cohort health, pre-ZIRP group: of about 73 unicorns, 20 quarters after minting roughly 80 percent had either raised a new round or exited, which is healthy.
    • Cohort health, 2021 group: of about 479 unicorns, 20 quarters in, fewer than 20 percent had exited or raised again. Far larger cohort, far worse outcomes.
    • The open question is which cohort the new 2024 AI cohort will resemble.
    • Funding is concentrating: the top 10 companies capture a large share, and the money is going to a small number of AI companies rather than all of them, led by massive rounds at Anthropic and OpenAI.
    • Laffont proposes a “Magnificent 8” private index: SpaceX, Stripe, Anthropic, Databricks, Revolut, ByteDance, Anduril, and more, spanning internet, AI, fintech, and space tech.
    • That private index represents almost $4 trillion of value and has crushed the traditional public Mag 7, with almost every name outperforming.
    • Exits are thawing. 2026 is on a good trend for cash returned versus consumed, not quite 2021 levels, with half a year still to go.
    • That trend does not yet include three imminent liquidity events, among them SpaceX (IPO expected in weeks) and Anthropic (confidentially filed its S-1). The combined value of those events could exceed the prior decade of exits.
    • The ecosystem is far more balanced than when Laffont first presented at the 2024 All-In Summit, when it was consuming much more cash than it returned.
    • OpenAI and Anthropic revenue growth is unlike anything previously seen. Starting from January 2025, they passed Workday, then ServiceNow, then Adobe, then Salesforce, and are now bigger than Google Cloud and Azure.
    • On current forecasts, that revenue could pass AWS by the end of the year and exceed all of Microsoft by 2028.
    • Hyperscalers are not sitting still. The largest companies in the world are funding the disruption, investing unprecedented sums to enable the ChatGPT moment.
    • The SpaceX “CODE” framework: the number one driver correlated to SpaceX’s valuation is cadence of launches, and valuation per launch rises as launches increase.
    • Why per-launch value rises: business quality improves through phases, pre-constellation (one-time government revenue), initial ramp (one recurring-revenue constellation), scale (multiple constellations), and platform (space data centers, moon and Mars optionality).
    • Anthropic in particular is scaling like no company seen across the PC, internet, or mobile eras.
    • Counterintuitive 10x odds: a unicorn has about an 8 percent chance of becoming a decacorn, a decacorn has 8 to 13 percent odds of reaching $100 billion, but a centicorn ($100 billion-plus) has a 31 percent chance of a 10x.
    • Value creation has accelerated. It typically takes years to go from $500 billion to $1 trillion in market cap, yet recently three companies did it in one year and two did it in a matter of weeks.
    • Cerebras is the counterexample of slow success: years of dark periods and no new capital developing its technology, then a massive OpenAI contract that quintupled the company’s value ahead of its IPO.
    • Semiconductors are on a generational run, with the sector dramatically outperforming the index since the 2024 All-In Summit.
    • AI memory thesis: the more an AI system knows about you, the more useful it is, so memory per user could quintuple, which helps explain recent moves in memory companies.
    • Where the revenue is: the AI ecosystem is roughly $140 billion today, about $300 billion this year, and is expected to double in 2027.
    • Three revenue pillars: consumer (subscribers times ARPU), ads (about a quarter of Meta and Google ads are AI-enabled today, heading toward 100 percent and roughly $150 billion), and enterprise (tools like Claude Code and Codex inside businesses).
    • Disruption is hitting every sector: software, telco (Starlink-powered global phone calls), semis, energy (data centers reshaping Pennsylvania’s grid), auto (Ferrari’s electric and autonomous stumble), and consumer (GLP-1s reshaping food, alcohol, and wellness).
    • Final takeaways: the new unicorn economy is healthier thanks to AI, winners are compounding faster so the cost of not owning a winner is higher than ever, disruption is everywhere, and we do not even have superintelligence yet.
    • In the Q&A, both Anthropic and OpenAI publicly say they want to be public, and big outcomes now look likely to become liquid within roughly a 12-month window.
    • The valuation pushback: these are not fake companies. They generate substantial revenue at scale and grow faster than anything before, and Anthropic reportedly even had a profitable month.
    • The public market is framed as the great equalizer and antiseptic, but with passive buying the true price discovery may not land on day one, more like six months and a day after listing.
    • A floated LP strategy: wait for whoever reaches $100 billion and concentrate capital there as the least brittle, quickest-return bet, tempered by the warning that valuations are disconnecting from any historical metric (50x to 100x revenue).
    • An open risk: with so much capital, OpenAI and Anthropic could rationally start a price war, the way ride-sharing and food-delivery players once did, though heavy infrastructure spend complicates it.

    Detailed Summary

    The unicorn economy has rebalanced after 2021

    Laffont opens by reframing a market many assume is frothy. The average unicorn is up about 70 percent since September 2024, and the public market has tracked a similar climb, so private and public value are moving together rather than diverging. The unicorn economy’s share of the NASDAQ rose sharply after 2015 and then plateaued, which he reads as a sign of how strong public companies have become. Underneath the headline, the structure of funding has changed. The 2021 ZIRP era was a unicorn factory that minted enormous numbers of companies, and that machine has since normalized to a much lower level. The result is a barbell: fewer new unicorns, but each raising far more, with funding per unicorn up roughly 5x since 2021. AI sits at the center of this, taking a steadily larger share of all venture dollars for several years running.

    Cohort health is the real story

    The deck’s most important slide measures the health of the ecosystem by cohort. The pre-ZIRP cohort, about 73 unicorns, looks healthy: 20 quarters after becoming unicorns, roughly 80 percent had either raised a new round or exited. The 2021 cohort tells the opposite story. It is enormous, about 479 unicorns, and 20 quarters in, fewer than 20 percent had raised again or exited. That contrast sets up the central question of the talk. A new 2024 cohort of AI companies is forming, and no one yet knows whether it will resemble the healthy pre-ZIRP group or the bloated, stuck 2021 group. Laffont’s framing leans optimistic because the AI cohort is small and concentrated, but he is careful not to declare the answer.
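    To make the cohort contrast concrete, here is a minimal Python sketch that turns the quoted percentages into approximate company counts. The cohort sizes and rates are the figures quoted above; the helper names are hypothetical, and the 2021 rate is treated as a ceiling because the deck says "fewer than 20 percent".

```python
# Cohort sizes and follow-on rates as quoted in the summary above.
# The 2021 rate is an upper bound ("fewer than 20 percent"), so the
# 2021 follow-on count is a ceiling and the stranded count is a floor.
COHORTS = {
    "pre-ZIRP": {"unicorns": 73, "follow_on_rate": 0.80},
    "2021": {"unicorns": 479, "follow_on_rate": 0.20},
}


def companies_with_follow_on(unicorns: int, follow_on_rate: float) -> int:
    """Approximate number of companies that raised again or exited."""
    return round(unicorns * follow_on_rate)


def companies_without_follow_on(unicorns: int, follow_on_rate: float) -> int:
    """Approximate number of companies that did neither."""
    return unicorns - companies_with_follow_on(unicorns, follow_on_rate)


for name, cohort in COHORTS.items():
    moved = companies_with_follow_on(**cohort)
    stuck = companies_without_follow_on(**cohort)
    print(f"{name}: ~{moved} raised or exited, ~{stuck} did neither")
```

    On these figures roughly 58 of 73 pre-ZIRP unicorns had raised again or exited, versus at most about 96 of 479 in the 2021 cohort, which leaves at least 383 companies from 2021 without either outcome.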

    The Magnificent 8 and a $4 trillion private index

    Funding is not just flowing to AI, it is flowing to a handful of AI names, with the top 10 capturing a large share and Anthropic and OpenAI raising the biggest rounds. From this concentration Laffont builds a private index he half-jokingly calls the Magnificent 8, a number he expects to shrink as companies go public. The members span internet, AI, fintech, and space tech, and include SpaceX, Stripe, Anthropic, Databricks, Revolut, ByteDance, and Anduril, among others. He says he would be comfortable owning that index for the next decade-plus. Collectively it represents almost $4 trillion of value and has outperformed the public Mag 7, with nearly every constituent beating that benchmark.

    Exits are thawing and a wall of liquidity is coming

    One of Laffont’s recurring concerns at past summits has been balance: the unicorn economy is great at consuming cash, but a healthy ecosystem must also return it. On that score 2026 is trending well, not quite at 2021 levels, but solid with half a year left. Crucially, that figure does not yet include three imminent events, among them SpaceX, which is expected to go public within weeks, and Anthropic, which confidentially filed its S-1 the day of the talk. Adding those up, just a few companies could deliver more liquidity than the prior ten years combined. The takeaway is that the ecosystem that was dangerously out of balance in 2024 is now meaningfully more balanced, and improving.

    The revenue ramp past the hyperscalers

    The growth rates of OpenAI and Anthropic, Laffont argues, are unlike anything previously seen. Charting from January 2025, the leading AI labs passed Workday, then ServiceNow, then Adobe by year end, then Salesforce by January, and are now bigger than Google Cloud and Azure. On forecast, that revenue could surpass AWS by the end of the year and exceed all of Microsoft by 2028. He stresses that the hyperscalers are not passive bystanders: they are actively funding the disruption, pouring unprecedented capital into enabling the change that began with the ChatGPT moment.

    The SpaceX CODE framework

    Laffont devotes real time to how Coatue thinks about SpaceX. The single factor most correlated with SpaceX’s valuation is cadence of launches, which is intuitive for a launch business. The surprise is that valuation per launch has risen rather than fallen as cadence climbed. His explanation, the CODE framework, is that the quality of the business model improves the more SpaceX launches. In phase one, pre-constellation, you are simply proving rockets, with a few government customers and lumpy, unpredictable one-time revenue. In the initial ramp you stand up a constellation, which is an end market and a recurring-revenue business that grows with every satellite and subscriber. At scale you operate multiple constellations, and Laffont expects companies, governments, and militaries to want to own their own. Ultimately it becomes a platform, with new businesses layered on top, from space data centers to the optionality of the moon and Mars.

    Counterintuitive odds and the speed of value creation

    Coatue bucketed companies and asked the odds of a 10x within each. A unicorn has roughly an 8 percent chance of becoming a decacorn. A decacorn has 8 to 13 percent odds of reaching $100 billion. But a centicorn, $100 billion or more, has a 31 percent chance of a 10x, counting both public and private companies. The bigger you are, the better your odds, which inverts intuition. Laffont pairs this with the sheer speed of recent value creation. Going from $500 billion to $1 trillion in market cap normally takes years, yet three companies did it in a single year and two did it in a matter of weeks. He also offers Cerebras as the patient counterexample, a chip company that endured years of dark periods and no new capital before a massive OpenAI contract quintupled its value ahead of IPO, part of a broader generational run for semiconductors.
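    As a rough illustration of how these bucket odds stack, the Python sketch below multiplies them to estimate the chance that a newly minted unicorn eventually makes all three jumps. This chaining is not something Laffont presents; it assumes the stages are independent and that the quoted odds apply unchanged at each step, both of which are simplifications. The function and constant names are hypothetical.

```python
import math

# Stage odds as quoted in the talk summary above.
UNICORN_TO_DECACORN = 0.08
DECACORN_TO_CENTICORN = (0.08, 0.13)  # quoted as a range
CENTICORN_10X = 0.31


def chained_odds(*stage_odds: float) -> float:
    """Multiply per-stage odds, ASSUMING each stage is independent."""
    return math.prod(stage_odds)


# Low and high ends follow the quoted 8 to 13 percent decacorn range.
low = chained_odds(UNICORN_TO_DECACORN, DECACORN_TO_CENTICORN[0], CENTICORN_10X)
high = chained_odds(UNICORN_TO_DECACORN, DECACORN_TO_CENTICORN[1], CENTICORN_10X)

print(f"unicorn through a centicorn 10x: {low:.2%} to {high:.2%}")
```

    Under those assumptions the full climb comes out at roughly 0.2 to 0.3 percent. The 31 percent figure applies only to companies that have already survived the earlier filters, which is consistent with Laffont's point that each step up selects for businesses that keep compounding.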

    AI memory and where the revenue actually comes from

    A throughline from the day’s other speakers is that the more an AI knows about you, the more useful it is, from your restaurant preferences to your work context. Laffont turns that into a thesis: memory per user could quintuple based on what these systems require, which helps explain recent moves in memory companies. He then tackles the most contested question: where is the revenue? He sizes the AI ecosystem at about $140 billion today, roughly $300 billion this year, and doubling in 2027, built on three pillars. Consumer is subscribers times ARPU. Ads are the pillar people forget, with about a quarter of Meta and Google ads already AI-enabled and penetration heading toward 100 percent, a roughly $150 billion opportunity. Enterprise is the breakthrough category, exemplified by tools like Claude Code and Codex operating inside businesses.
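    The sizing above reduces to simple arithmetic, sketched here in Python. The $300 billion figure and the 2027 doubling are from the talk; reading "doubling" as applying to the $300 billion figure is an assumption, and the subscriber and ARPU inputs are made-up placeholders used only to show the consumer formula, not estimates from the deck.

```python
def consumer_revenue(subscribers: float, arpu: float) -> float:
    """Consumer pillar: subscribers times average revenue per user."""
    return subscribers * arpu


# From the talk: roughly $300 billion this year, doubling in 2027.
# ASSUMPTION: the doubling applies to the $300 billion figure.
ECOSYSTEM_THIS_YEAR_BN = 300
ecosystem_2027_bn = ECOSYSTEM_THIS_YEAR_BN * 2

# HYPOTHETICAL placeholder inputs, not figures from the deck:
# 100 million subscribers paying $240 per year.
example_consumer_usd = consumer_revenue(subscribers=100e6, arpu=240.0)

print(f"Implied 2027 ecosystem size: ${ecosystem_2027_bn}B")
print(f"Placeholder consumer pillar: ${example_consumer_usd / 1e9:.0f}B")
```

    The point of the placeholder is only that the consumer pillar scales linearly in both inputs, so doubling either subscribers or ARPU doubles that pillar's revenue.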

    Every sector is being transformed at once

    What makes this era different, Laffont says, is that nearly every sector is being transformed simultaneously. Software is obvious, but look at telco, where he believes Starlink will soon power a device that lets you make a phone call anywhere on earth, attacking the global telco and broadband profit pool with a better product. Compute is driving massive change in semis, data centers are reshaping the energy equation in places like Pennsylvania, and the auto business is being upended, as Ferrari’s stumble introducing electric and autonomous technology showed. In consumer, GLP-1 drugs are profoundly changing consumption of food and alcohol and the broader focus on wellness. His takeaways close the loop: the new unicorn economy is healthier thanks to AI, winners are compounding faster so the cost of missing them is higher than ever, disruption is everywhere, and superintelligence has not even arrived yet.

    The Q&A: power law, valuation, and the public market test

    Chamath and Jason Calacanis press Laffont on what this means for allocators. The recurring theme is the power law and K-shaped outcomes, with gains consolidating into a small number of companies. The positive side, Laffont notes, is that outcomes are enormous and increasingly liquid within a 12-month window, and both Anthropic and OpenAI say they want to be public. The hard part is valuation. The besties cite Bill Ackman’s framing that investors are making venture bets on trillion-dollar companies at 50 to 100 times revenue. Laffont’s pushback is that these are not fake companies, they generate substantial revenue at scale and grow faster than anything before, and Anthropic reportedly had a profitable month. But he embraces the discipline ahead: the public market is the great antiseptic and will not care about anyone’s presentation, though with heavy passive buying, true price discovery may take roughly six months and a day rather than landing on day one. Asked whether the compounding is a market inefficiency or survivor bias, he declines to over-read a small sample, noting that Anthropic before Claude Code was a completely different company than after. The conversation closes on what happens when trillions recycle from GPs to LPs, the case for simply owning whoever crosses $100 billion, the risk of everyone crowding into three names, and the possibility of an eventual OpenAI versus Anthropic price war.

    Notable Quotes

    “So we have fewer unicorns that are each raising more.”

    Thomas Laffont, summarizing how funding per unicorn has risen roughly 5x since 2021

    “The reason is that the quality of SpaceX’s business model increases the more you launch.”

    Thomas Laffont, explaining the CODE framework and why valuation per launch rises with cadence

    “The winners are compounding faster than ever, which means the costs of not being in a winner are higher than ever.”

    Thomas Laffont, on the central risk of a power-law market

    “And by the way, we don’t even have superintelligence yet.”

    Thomas Laffont, closing his takeaways on how early the transformation still is

    “These are companies generating substantial revenue at scale that are growing faster than anything we’ve ever seen.”

    Thomas Laffont, pushing back on the idea that AI valuations rest on fake companies

    “It will be the great antiseptic. It will not care about my presentation.”

    Thomas Laffont, on the public market as the ultimate test for SpaceX, OpenAI, and Anthropic

    “Anthropic pre-Claude Code was a completely different company than post-Claude Code.”

    Thomas Laffont, on why he won’t over-read a small sample of hyper-compounders

    “The power law rules our lives. All the great gains are being consolidated into small numbers of companies.”

    An All-In host, framing the Q&A on concentration in private markets

    This is a curated set of highlights. To hear the full presentation, the slide walkthrough, and the complete Q&A with Chamath and Jason Calacanis, watch the full conversation here.

    Related Reading

    • Coatue Management. Primary source for Thomas Laffont’s firm and the technology investing strategy behind the deck.
    • The All-In Podcast. The show and summit where Laffont made this premiere presentation.
    • Power law (Wikipedia). Background on the distribution Laffont and the hosts say governs venture and public-market returns.
    • The Magnificent Seven (Wikipedia). The public-market benchmark Laffont’s private “Magnificent 8” index is measured against.
    • Cerebras Systems. The AI chipmaker Laffont cites as the slow-grind IPO that was eventually transformed by a major OpenAI contract.
  • Anthropic Raises $65 Billion Series H at $965 Billion Valuation to Fund AI Safety Research and Massive Compute Expansion

    Anthropic has closed one of the largest private financing rounds in the history of technology, raising $65 billion in Series H funding at a $965 billion post-money valuation. The round, announced on May 28, 2026, lands as demand for Claude reaches what the company calls historic levels, and it positions Anthropic to pour fresh capital into safety research, compute, and the products that enterprises now lean on every day.

    TLDR

    Anthropic raised $65 billion in its Series H at a $965 billion post-money valuation, with Altimeter Capital, Dragoneer, Greenoaks, and Sequoia Capital leading and Capital Group, Coatue, D1 Capital Partners, GIC, ICONIQ, and XN co-leading, alongside $15 billion in previously committed hyperscaler investment that includes $5 billion from Amazon. The raise follows Anthropic crossing $47 billion in run-rate revenue earlier in May 2026, and it funds three priorities named by CFO Krishna Rao: advancing safety and interpretability research, expanding compute capacity to meet growing Claude demand, and scaling the products and partnerships customers depend on. On the infrastructure side, the company is locking in gigawatt-scale compute through 5 gigawatts with Amazon, 5 gigawatts of TPU capacity via Google and Broadcom, GPU access from SpaceX, and supply from partners Micron, Samsung, and SK hynix, while Claude remains available across all three major cloud platforms, AWS, Google Cloud, and Microsoft Azure, with widespread enterprise adoption across industries.

    Thoughts

    Start with the number that everyone will fixate on. A $965 billion post-money valuation against $47 billion in run-rate revenue is roughly 20 times sales, and for a company growing this fast that multiple is not the interesting part. The interesting part is that run-rate revenue crossed $47 billion earlier this month, which means the denominator is moving so quickly that the multiple is already stale. Investors are not pricing the business Anthropic is today. They are pricing the slope. A 20x multiple on a number that may double again inside a year is a very different bet than 20x on a flat line, and the lead names here (Altimeter, Dragoneer, Greenoaks, Sequoia, with Capital Group, Coatue, GIC and others co-leading) are not the kind of capital that pays for nostalgia. They are paying for the second derivative.
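    The multiple and its sensitivity to the denominator can be checked in a few lines of Python. The valuation and run-rate figures are the ones quoted above; the doubled run-rate is the hypothetical raised in this paragraph, not a forecast, and the function name is illustrative.

```python
def revenue_multiple(valuation_bn: float, run_rate_bn: float) -> float:
    """Valuation divided by annualized run-rate revenue."""
    return valuation_bn / run_rate_bn


POST_MONEY_BN = 965  # Series H post-money valuation, in $ billions
RUN_RATE_BN = 47     # run-rate revenue crossed in May 2026, in $ billions

current = revenue_multiple(POST_MONEY_BN, RUN_RATE_BN)

# HYPOTHETICAL: run-rate doubles while the valuation stays fixed.
if_doubled = revenue_multiple(POST_MONEY_BN, RUN_RATE_BN * 2)

print(f"Multiple today: {current:.1f}x")
print(f"Multiple if run-rate doubles: {if_doubled:.1f}x")
```

    At a constant $965 billion valuation, doubling run-rate revenue to $94 billion would cut the multiple from about 20.5x to about 10.3x, which is the sense in which the headline multiple goes stale quickly.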

    But the real story is not the valuation. It is the compute. Read the infrastructure list carefully and you see the actual problem this round solves: 5 gigawatts from Amazon, 5 gigawatts of TPU capacity through Google and Broadcom, GPU access from SpaceX, and memory supply locked down with Micron, Samsung, and SK hynix. That is at least 10 gigawatts of secured power and silicon before counting the SpaceX GPUs. The constraint on frontier AI in 2026 is no longer talent or even algorithms. It is electricity, land, and the multi-year queue for advanced packaging and high-bandwidth memory. You cannot buy 10 gigawatts on a quarterly basis. You reserve it years out, and you need the balance sheet to make those commitments credible. A $65 billion raise is, in plain terms, the down payment that lets Anthropic sign for capacity nobody can conjure on demand. The money is downstream of the megawatts.

    The diversification across that compute stack matters as much as the size. By splitting between Amazon’s infrastructure, Google and Broadcom’s custom TPUs, and SpaceX-supplied GPUs, Anthropic is refusing to become hostage to any single supplier’s roadmap or pricing. Custom silicon through Broadcom in particular is a bet on bending the cost curve, because the long-term economics of serving Claude at this scale depend on dollars per token, not just on raw availability. Anyone who has watched cloud lock-in play out over the last decade understands the move. Optionality at the hardware layer is leverage, and leverage is what keeps margins from being dictated by whoever owns the only fab slot you can reach.

    It is worth pausing on the fact that the round explicitly funds safety and interpretability research alongside scaling, and not as a footnote. Most companies treat safety spend as a cost center to be minimized once growth kicks in. Naming it first, ahead of compute and products, is a statement about where Anthropic believes its durable advantage sits. If models keep getting more capable, the binding constraint on deployment inside regulated industries (finance, healthcare, government) becomes trust, not intelligence. Interpretability is the work that turns a black box into something an enterprise risk committee can actually sign off on. Framed that way, safety research is not philanthropy subtracted from the bottom line. It is the thing that unlocks the most lucrative and defensible parts of the market, and pairing it with the scaling budget is the tell.

    Finally, look at distribution. Claude now ships on all three major clouds at once: AWS, Google Cloud, and Microsoft Azure. In a market where most frontier labs are tethered to a single hyperscaler, being available everywhere enterprises already run their workloads is a structural edge. It removes the procurement friction of asking a customer to adopt a new vendor relationship, and it means Anthropic competes on the merits of the model rather than on which cloud a buyer happened to standardize on years ago. Combine that omnipresent distribution with the compute reservations and the explicit safety mandate, and the shape of the strategy is clear. This is not a company buying time. It is a company buying the three things that actually compound: capacity that cannot be rushed, trust that cannot be faked, and reach into every place where work already happens.

    Key Takeaways

    • Anthropic raised $65 billion in its Series H funding round, one of the largest private financings in the history of the technology industry.
    • The round set Anthropic’s post-money valuation at $965 billion, placing the company within reach of the $1 trillion mark.
    • Altimeter Capital, Dragoneer, Greenoaks, and Sequoia Capital led the Series H round.
    • Capital Group, Coatue, D1 Capital Partners, GIC, ICONIQ, and XN served as co-leads on the investment.
    • The new capital builds on $15 billion in previously committed hyperscaler investments, which includes $5 billion from Amazon.
    • Anthropic crossed $47 billion in run-rate revenue earlier in May 2026, reflecting the surging commercial demand for Claude.
    • A core priority for the funding is to advance Anthropic’s safety and interpretability research.
    • The company will use the capital to expand compute capacity in order to meet growing demand for Claude.
    • Anthropic plans to scale the products and partnerships that customers depend on across its business.
    • CFO Krishna Rao said the funding will help Anthropic serve the historic demand it is experiencing, stay at the research frontier, and bring Claude to more of the places where work happens.
    • Amazon is providing 5 gigawatts of compute capacity as part of Anthropic’s infrastructure expansion.
    • Google and Broadcom are supplying 5 gigawatts of TPU capacity to power Claude’s growth.
    • SpaceX is contributing GPU access to Anthropic’s compute footprint.
    • Micron, Samsung, and SK hynix are partnering with Anthropic on memory and infrastructure to support its scaling needs.
    • Claude is available on all three major cloud platforms, AWS, Google Cloud, and Microsoft Azure.
    • Anthropic reports widespread enterprise adoption of Claude across a broad range of industries.

    Detailed Summary

    The Raise and the Valuation

    Anthropic has raised $65 billion in Series H funding, a round that values the company at $965 billion on a post-money basis. The size of the raise places it among the largest private financing events the technology industry has ever seen, and the valuation pushes Anthropic to the doorstep of the trillion dollar mark. The capital arrives at a moment when demand for the company’s Claude models has accelerated sharply, and the round is built to fund the response to that demand rather than simply mark a milestone. Anthropic framed the financing in its Series H announcement as the fuel for staying at the research frontier while scaling the infrastructure and products that customers increasingly rely on.

    Who Put In the Money

    The Series H was led by Altimeter Capital, Dragoneer, Greenoaks, and Sequoia Capital, a group that combines deep growth-stage technology experience with conviction in Anthropic’s long-term trajectory. Joining as co-leads were Capital Group, Coatue, D1 Capital Partners, GIC, ICONIQ, and XN, a roster that spans crossover funds, sovereign wealth, and institutional investors. Beyond the new equity, Anthropic pointed to $15 billion in previously committed hyperscaler investment, including $5 billion from Amazon. Taken together, the investor base reflects a mix of financial backers and strategic partners with a direct stake in seeing Claude reach more customers and more compute.

    Revenue at $47 Billion Run-Rate

    Underpinning the valuation is a business that has scaled with unusual speed. Anthropic crossed $47 billion in run-rate revenue earlier in May 2026, a number that signals how quickly enterprises and developers have adopted Claude across their workflows. Run-rate revenue annualizes the company’s most recent performance, and at this level it puts Anthropic firmly among the fastest-growing software businesses on record. That financial momentum is the practical justification for both the round’s size and the near-trillion-dollar valuation investors were willing to support.
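
    To make the run-rate idea concrete, here is a minimal sketch of how an annualized figure relates to a single recent period and to the valuation. It assumes the run-rate annualizes one month of revenue; the source does not say which period Anthropic uses, and the function name is hypothetical.

```python
def annualized_run_rate(period_revenue_b: float, periods_per_year: int = 12) -> float:
    """Annualize revenue from one recent period (monthly by default)."""
    return period_revenue_b * periods_per_year

run_rate_b = 47.0       # from the summary, billions of US dollars
post_money_b = 965.0    # from the summary, billions of US dollars

# Assumption: the run-rate annualizes a single month of revenue.
implied_monthly_revenue_b = run_rate_b / 12
valuation_multiple = post_money_b / run_rate_b

print(f"Implied monthly revenue: ${implied_monthly_revenue_b:.2f}B")
print(f"Post-money valuation / run-rate revenue: {valuation_multiple:.1f}x")
```

    On a monthly basis, a $47 billion run-rate corresponds to just under $4 billion of revenue in the most recent month, and the round prices the company at a little over 20 times that annualized figure.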

    The Compute Buildout

    A large share of the strategy behind the raise centers on securing compute at enormous scale. Anthropic detailed a set of infrastructure partnerships designed to keep pace with Claude demand. Amazon is providing 5 gigawatts of capacity, while Google and Broadcom together are supplying 5 gigawatts of TPU capacity. SpaceX is contributing GPU access, broadening the range of silicon Anthropic can draw on. Supporting the buildout on the hardware supply side are Micron, Samsung, and SK hynix, the memory and component partners whose output is essential to standing up data centers at this magnitude. The combined picture is a company assembling power, chips, and supply chain commitments measured in gigawatts rather than racks.

    Where the Money Goes

    Anthropic outlined three priorities for the new capital. The first is to advance safety and interpretability research, continuing the work of understanding how models behave and ensuring they remain reliable as they grow more capable. The second is to expand compute capacity to meet the growing demand for Claude, the practical engine behind the infrastructure commitments above. The third is to scale the products and partnerships that customers depend on, deepening the company’s reach into the tools and platforms where work actually happens. Krishna Rao, Anthropic’s chief financial officer, said the funding “will help us serve the historic demand we are experiencing, stay at the research frontier, and bring Claude to more of the places where work happens.”

    Claude Everywhere

    The funding lands on top of a distribution footprint that already spans the major cloud ecosystems. Claude is available on all three leading cloud platforms, AWS, Google Cloud, and Microsoft Azure, which means enterprises can reach the models through whichever provider they have standardized on. That availability has translated into widespread enterprise adoption across industries, from software and finance to healthcare and beyond. By being present everywhere developers and businesses already operate, Anthropic positions Claude not as a destination customers must travel to but as a capability woven into the platforms they use every day.

    Notable Quotes

    This funding will help us serve the historic demand we are experiencing, stay at the research frontier, and bring Claude to more of the places where work happens.

    Krishna Rao, CFO at Anthropic, on the purpose of the Series H round.

    Advance safety and interpretability research, expand compute capacity to meet growing Claude demand, and scale products and partnerships customers depend on.

    How Anthropic describes its use of funds from the round.

    For the full details on the round, the lead and co-lead investors, and how Anthropic plans to deploy the capital across safety research, compute, and products, read the full announcement here.

    Related Reading

    • Anthropic, the AI safety and research company behind Claude that raised this Series H round.
    • Sequoia Capital, one of the lead investors anchoring the financing.
    • Amazon Web Services, one of the three major cloud platforms where Claude is available and the source of a $5 billion investment.
    • Google Cloud TPUs, the tensor processing units behind the 5 gigawatts of TPU capacity in the Google and Broadcom partnership.
    • AI safety, the research field at the center of how Anthropic says it will use the new funding.
  • Raoul Pal: Why the Crypto Bull Run Is Just Starting, the AI Economic Singularity, and Why You Should Never Sell Bitcoin

    Macro investor and Real Vision co-founder Raoul Pal returned to the When Shift Happens podcast for episode 173 to argue that the recent crypto drawdown is a nasty correction inside a much larger bull market, not the end of the cycle. Across an hour and a half he ties together the AI capital race, the coming economic singularity, why layer one blockchains are a kind of universal basic equity, and the deceptively simple discipline that actually compounds wealth: buy, hold, and almost never sell.

    TLDW

    Pal frames everything through what he calls the universal code, the conversion of units of energy into units of intelligence, and says the global race to fund AI is so large that no government or company can stop feeding it capital. That liquidity, plus relentless currency debasement, is the engine under both the AI stocks going vertical and the crypto market that has lagged them. He calls the Bitcoin slide from 126K toward 60K a normal correction in a bull market, says liquidity is now reaccelerating, and argues smart contract layer ones (Ethereum, Solana, Sui) are the best risk-adjusted bet because the entire financial system and a coming swarm of AI agents will run on those rails, giving crypto an effectively infinite total addressable market. He explains why he added Zcash as a Bitcoin-with-privacy and quantum-proof trade, lays out his plan to launch an NFT fund built around grail digital art and NFT-backed lending, and makes a data-backed case that buying oversold dips and never selling beats trying to trade cycles. The conversation closes on a 70/30 bullish framework for 2026 and 2027 and a reflection on kindness.

    Thoughts

    The strongest idea in this conversation is not a price target, it is a reframe. Pal keeps pulling the camera back from “what will Bitcoin do this quarter” to “what is the organizing principle of the entire economy right now,” and his answer is the funneling of all available capital into anything that produces intelligence. Once you accept that frame, the buy-the-dip behavior in both AI equities and crypto stops looking like mania and starts looking like a rational response to a one-way game. The part worth sitting with is his game-theory claim that neither the US nor China can stop, and that even a spectacular failure like an OpenAI blowup would simply trigger an instant asset auction rather than a collapse, because no single player can be allowed to win outright. Whether or not that is fully true, it is a genuinely different mental model than the recession-and-bust cycle most investors carry around.

    His layer-one thesis is the most actionable takeaway and also the most quietly radical. The pitch is that for the first time ordinary people can own a piece of the core infrastructure that the machine economy will be built on, the way you never got to own a slice of TCP/IP or the open web. He calls this universal basic equity and treats it as humanity’s pension plan. The honest tension he admits is that the racy returns may not be in the boring base layer at all, and that the truly investable winners of this era, the private stablecoin companies, are largely closed off to retail. So the layer-one trade is partly a consolation prize for the fact that the best businesses are unreachable. That is a more candid admission than most crypto bulls will make.

    The behavioral core of the episode is the most useful for a normal reader, and it is almost embarrassingly simple. Pal has been in markets for 35 years and says he does not know a single person who reliably buys bottoms and sells tops, including the legends, who he points out made most of their money on management fees rather than heroic trades. His prescription is to add only when the asset is one to two standard deviations oversold on its long-term log trend, otherwise do nothing, and to treat patience as an action rather than inaction. The line that does the most work is “the market owes you nothing.” It quietly dismantles the entitlement that drives people to overtrade, chase, and burn emotional energy on a strategy that the data says underperforms simply holding.

    Where a reader should keep some skepticism is the certainty. Pal assigns the bull case a 70 percent probability and the bear case 30, but the bear case he sketches (Middle East war reignites, inflation forces tightening, liquidity gets starved, the intelligence buildout slows) is not a minor footnote, it is the whole structure failing at once. The thesis also leans hard on the assumption that AI agents will become massive on-chain economic actors, which is plausible but still mostly forward-looking rather than observed. The value here is the framework, not the forecast. If you take one thing, take the energy-into-intelligence lens and the standard-deviation discipline, and hold the specific tickers and timelines loosely.

    Key Takeaways

    • Pal’s central frame is the universal code: the universe, and now the economy, continuously converts units of energy into units of intelligence, and capital flows to whatever produces the most intelligence.
    • The AI buildout is a race of nations and corporations that nobody can exit. Game theory means neither the US nor China can stop, because the other side would gain a decisive advantage.
    • Even a catastrophic AI failure would not break the trend. If OpenAI ran out of money, its assets would be auctioned instantly to multiple buyers so no single company could double its compute and win the whole game.
    • The economic singularity is the point where institutions and the way we measure the economy can no longer keep up with the speed of technology, made worse when AI and robots are added to the population as economic actors.
    • AI is the first real-world example of Reed’s law, the exponential of the exponential, whereas most past technology followed the slower Metcalfe’s law log channel.
    • By around 2028, roughly five to six years after AI went mainstream, AI will have produced more words than all of humanity has produced in sum total since the Gutenberg press.
    • The current run is funded by cash flow, not debt. Unlike the late-1990s tech boom, the buildout is paid for out of the earnings of the most cash-generative firms in history.
    • Chips and energy are the binding constraints. Companies report being booked out three years and beyond, and xAI is reportedly handing older data centers to Anthropic because no one can get enough compute.
    • Pal expects the Fed to run a Greenspan-style playbook, cut rates and then get out of the way, letting a productivity miracle grow the economy faster than the debt pile so debt to GDP falls.
    • Bitcoin falling from 126K toward 60K is a nasty correction in a bull market, not a bear market. Pal has seen many 50 percent Bitcoin drawdowns since 2013, and altcoins always fall further on the risk curve.
    • The 2025 to 2026 correction has been choppy and slow rather than the fast V-shape of 2021, which is part of why sentiment feels so bad.
    • Crypto lagged because liquidity is finite. The government shutdown withdrew liquidity, which hits crypto with about a three-month lag, while AI capex and Chinese gold buying sucked capital away.
    • Liquidity is now reaccelerating in the US, China, and globally, which Pal sees as the reason the worst is likely over for crypto.
    • The birth of economic agents in late 2024 gives crypto an effectively infinite total addressable market, since agents will be economic actors that hold treasuries, make payments, and transact on-chain.
    • Smart contract layer ones are Pal’s preferred bet. He compares the structure to operating systems and cloud, where value concentrates into three to five major players plus a few specialists.
    • He calls owning layer ones universal basic equity and humanity’s pension plan, the chance to own the rails the agentic economy will run on, something the internet never offered retail.
    • Discounted cash flow analysis is the wrong tool for valuing a blockchain. The whole purpose of the network is to be the cheapest, fastest, and most programmable, so high fees are a bug, not a strength.
    • Pal measures layer ones by intelligence density: number of developers, programmability, speed to finality, applications per user, and the ratio of stablecoins to total value locked as stored energy.
    • Only three tokens maintained economic density when the market fell 80 percent: Ethereum, Solana, and Sui. ETH is the safe Microsoft-like choice, Solana is faster and cheaper, Sui is earlier but extremely fast and programmable.
    • Pal added Zcash in the correction as a Bitcoin-with-privacy trade. The left-curve case is simple privacy value, the right-curve case is that it is also quantum-proof and a hedge against AI-enabled state surveillance.
    • He admits he did not execute the Zcash buy well, kept meaning to add more while traveling, and watched it run up 50 percent. He treats it as a small position, not a portfolio overhaul.
    • On Hyperliquid he is complimentary but uninvested, because he does not trade, use perps, or use leverage, and he expects Robinhood and Coinbase to compete hard for that niche.
    • DeFi is better suited to machines than humans. Agents may not even need front ends or websites, just low-friction access to swap across multiple stablecoins and currencies instantly.
    • DeFi is not dead despite mega-hacks. Pal argues hacks force better products, and notes that banks quietly absorb theft losses too, so the answer is to build more secure systems.
    • The entire financial system is moving to blockchain rails because they are the most efficient way to operate, a prediction Pal first made in 2014 before smart contracts existed.
    • Pal is launching an NFT fund focused on grail assets (one-of-one alien CryptoPunks, top artists) trading from roughly 600K to tens of millions, plus a convex middle tier of artists with social consensus.
    • He names artists like Dies With the Most Likes (whom he compares to a Hunter S. Thompson of art) and Kim Asendorf, whose work uses tokens at the pixel level.
    • The fund will also lend against NFTs for yields around 15 percent or more, acquiring assets cheaply if borrowers default and recycling yield into emerging artists.
    • His real estate analogy: a smaller NFT in a great collection is like a modest apartment in a billionaire neighborhood, while grails are the 20 million dollar penthouses that actually compound.
    • Bitcoin is partly an AI proxy because global savings should rise as AI lifts economic growth, and Bitcoin targets a share of those savings as a digital store of value.
    • The core mindset shift: if you know where the world is going and roughly where market cap is heading on the log trend, you would never sell, you would only ever accumulate.
    • Selling well is nearly impossible. Even if you take profit at two standard deviations overbought, adding it back at the bottom is something almost no one actually manages.
    • The people who made the most money in crypto are the ones who did not trade it. Pal cites holders who profited by doing essentially nothing while active traders lost their edge.
    • Pal’s discipline requires roughly two to three actions every five years: add when one to two standard deviations oversold, optionally trim when two standard deviations overbought, otherwise nothing.
    • By his standard deviation measure, Bitcoin and crypto are as cheap as they have been in their long-term uptrend versus the NASDAQ, which he reads as a signal to allocate more to crypto.
    • The Fear and Greed index sat below 10 for the longest stretch in its history during this correction, hitting its lowest reading ever, a classic oversold extreme.
    • His 2026 to 2027 bull case stacks stablecoin explosion, the Clarity Act getting signed, rising global liquidity, debt rollovers forcing money printing, a strong business cycle, AI agents, and a cheap entry point. He puts it at roughly 70/30 to the upside.

    Detailed Summary

    Two economies and the money illusion

    The conversation opens loosely with travel, stablecoin spending, and a riff on why people agonize over a 75 dollar airport breakfast but happily lose money on an NFT that drops 80 percent. Pal’s explanation is that we live in two economies at once. The crypto and tech economy can grow 50 to 150 percent in a good year, while the real economy grows around 2 percent. Money earned in the fast economy does not feel real, which is why people spend and speculate so freely with it. This sets up the rest of the episode, where Pal treats the fast economy as the place serious capital is being forced to go.

    The AI capital race nobody can stop

    Asked why the stock market only seems to go up, Pal gives two reasons: liquidity expansion and the most extraordinary capital event in human history, the funneling of all capital into intelligence. He frames it as a race of nations, corporations, and individuals that cannot be slowed because of game theory. No superpower can let another reach AGI alone, only the US and China can afford the race, and neither can stop without ceding the advantage. He even games out an OpenAI bankruptcy and concludes the US would instantly auction the assets across many buyers rather than let one firm double its compute and win, which is why he calls the whole thing too big to fail. The practical conclusion is blunt: buy the dip, because the structure forces capital to keep flowing.

    The economic singularity, Reed’s law, and electricity through sand

    Pal defines the economic singularity as the moment when institutions and our economic measurements can no longer cope with the speed of technology, especially once AI and robots count as population. He explains that almost all past technology adoption followed Metcalfe’s law, a log channel visible in the charts of Google, Facebook, and the NASDAQ, but AI is the first observed example of Reed’s law, the exponential of the exponential. To make it concrete he cites ARK research showing AI will, by roughly 2028, have produced more words per year than all of humanity, and notes Anthropic expected 10x growth and got 80x in a quarter. He marvels that we are putting electricity through silicon, the second most abundant element in Earth’s crust, and producing intelligence six orders of magnitude faster than a human neuron.
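
    For readers who want to see how far apart the two laws are, the sketch below compares their standard textbook forms: Metcalfe’s law counts pairwise connections, which grow on the order of n squared, while Reed’s law counts possible subgroups, which grow on the order of 2 to the power n. This is not Pal’s model or his charting method, only the usual definitions, and the function names are illustrative.

```python
def metcalfe_pairs(n: int) -> int:
    """Distinct pairwise connections among n members; grows on the order of n**2."""
    return n * (n - 1) // 2

def reed_groups(n: int) -> int:
    """Distinct subgroups of two or more members; grows on the order of 2**n."""
    return 2**n - n - 1

for n in (10, 20, 30):
    print(f"n={n:>2}  pairs={metcalfe_pairs(n):>4}  groups={reed_groups(n):,}")
```

    With only 20 members there are 190 possible pairs but more than a million possible subgroups, which is the gap Pal is pointing at when he says AI has left the older adoption curve behind.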

    Why crypto lagged and why the worst is over

    Pal explains the crypto underperformance mechanically. There is only so much liquidity, the government shutdown withdrew it, and that hits crypto with roughly a three-month lag, landing right in the middle of the October drawdown. At the same time, the AI buildout and Chinese gold buying pulled capital toward the longest-duration assets, leaving SaaS and crypto with nearly identical charts as they got left behind. His read for 2026 is that liquidity is now reaccelerating across the US, China, and the world, so there is nothing to worry about yet. The Bitcoin move from 126K toward 60K is, in his framing, a normal correction, comparable in length to the roughly six-month 2021 pullback that resolved into new highs.

    Layer ones as universal basic equity

    The heart of the investment thesis is that smart contract layer ones will accrue a growing share of crypto value as the investable infrastructure layer. Pal argues the entire financial system plus a coming swarm of AI agents will use these rails, giving crypto an infinite total addressable market. Like operating systems and cloud, value will concentrate into three to five chains plus specialists. He measures them by intelligence density rather than discounted cash flow, since the point of the network is to be cheapest and fastest. By his analysis only Ethereum, Solana, and Sui held economic density through an 80 percent drawdown. ETH wins on developers, security, and Lindy effects (the Microsoft you do not get fired for owning), Solana is faster and cheaper, and Sui is earlier but offers a different order of magnitude on speed, finality, and programmability. He frames owning a basket of four or five as humanity’s pension plan.

    Zcash, privacy, and the quantum hedge

    Pal reveals he added Zcash during the correction, alongside buying more Sui. He had said in December he would wait for it to pull back, and he did, though he admits he did not buy enough as it ran up 50 percent. His left-curve case is that privacy has real value and people will understand it more, making it essentially Bitcoin with privacy that could plausibly reach 5 to 10 percent of Bitcoin’s value. His right-curve case is that it is also quantum-proof and a hedge against governments wielding AI-enabled control over people. He dismisses the mid-curve worry that it will be banned, noting that the ban fear has shadowed crypto his entire career and never materialized.

    Agents, DeFi, and financial rails

    Pal argues the biggest future users of DeFi and crypto payments will be AI agents, whose scale is effectively infinite. Setting up agents himself, he keeps hitting walls that require small payments, and sees agents making endless micro-payments plus larger transactions, holding treasuries across multiple stablecoins and currencies, and rebalancing through DeFi instantly without any human involved. DeFi, he says, is actually better suited to machines than people, and may not even need front ends. On the wave of mega-hacks he is unbothered, arguing they force better products, that banks quietly absorb theft too, and that the financial system always migrates to the most efficient rails because that is how you make more money. He first predicted blockchain would become the financial industry’s infrastructure rail back in 2014.

    The NFT fund and grail digital art

    Pal is launching an NFT fund because so many people told him they want exposure but do not know how. The fund targets grail assets, the scarce one-of-one pieces with proven social consensus that trade from around 600K into the tens of millions, plus a convex middle tier of artists who have long-term proven value and could be wildly re-rated. He names Dies With the Most Likes, an Indiana artist cataloging the decline of middle America whom he likens to Hunter S. Thompson, and German artist Kim Asendorf, whose 3D works are built from individually tokenized pixels. The math of convexity is the draw: an artist re-rating from 20 to 200 ETH while ETH itself multiplies could compound into a 100x. The fund will also lend against NFTs for yields above 15 percent, acquiring assets cheaply on default and recycling yield into emerging artists, and will build a club connecting investors to artists. His real estate framing reassures smaller holders: owning a lesser piece in a top collection is like a modest flat in a billionaire neighborhood.
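
    The convexity claim is two multiples stacked on top of each other: the re-rating of the piece in ETH terms and the move in ETH itself. The sketch below shows that arithmetic. The 20 to 200 ETH re-rating comes from the summary; the 10x move in ETH is an assumption chosen here because it is what the 100x outcome requires, and the function name is hypothetical.

```python
def usd_multiple(start_price_eth: float, end_price_eth: float, eth_usd_multiple: float) -> float:
    """Dollar return multiple on an ETH-priced asset: ETH re-rating times ETH's own move."""
    return (end_price_eth / start_price_eth) * eth_usd_multiple

# Re-rating from 20 to 200 ETH with ETH flat in dollar terms.
rerating_only = usd_multiple(20, 200, 1.0)

# Same re-rating with ETH itself up 10x (assumed, not stated in the source).
with_eth_10x = usd_multiple(20, 200, 10.0)

print(f"Re-rating alone: {rerating_only:.0f}x")
print(f"Re-rating with ETH up 10x: {with_eth_10x:.0f}x")
```

    The same multiplication cuts the other way: a piece that holds its ETH price while ETH halves has still lost half its dollar value.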

    Never sell, and the math of patience

    The behavioral spine of the episode is Pal’s argument that buying, holding, and accumulating beats trading cycles. He has built a Real Vision indicator that signals a buy when an asset is one to two standard deviations oversold on its log regression channel, and says it compounds at a stupid rate. The problem with selling is deciding how much and then having the discipline to buy it back at the bottom, which almost no one does. In 35 years he says he has never met anyone who reliably buys bottoms and sells tops, and notes the trading legends made most of their money on management fees. The people who made the most in crypto are the ones who did nothing. He reframes holding as patience, an active stance, and ties it back to the universal code: buying Bitcoin and doing nothing is the most energy-efficient trade you can make, while overtrading burns mental and emotional energy for a worse outcome. His advice to those tempted by AI’s vertical charts is to go play with AI and just hold your Bitcoin.
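
    The rule Pal describes can be sketched in a few lines: fit a straight line to the logarithm of price, measure how many standard deviations the latest price sits from that line, and act only at the extremes. This is not the Real Vision indicator, whose construction the episode summary does not give; it is a generic illustration on synthetic prices, and the function names, thresholds, and data are all assumptions.

```python
import math
from statistics import mean, pstdev

def log_trend_zscores(prices):
    """Z-score of each price's deviation from a least-squares line fit to log(price)."""
    xs = list(range(len(prices)))
    ys = [math.log(p) for p in prices]
    x_bar, y_bar = mean(xs), mean(ys)
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
             / sum((x - x_bar) ** 2 for x in xs))
    intercept = y_bar - slope * x_bar
    residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    sigma = pstdev(residuals)  # spread of deviations around the trend line
    return [r / sigma for r in residuals]

def signal(z, buy_below=-1.0, trim_above=2.0):
    """Add when at least one sigma oversold, optionally trim at two sigma overbought."""
    if z <= buy_below:
        return "add"
    if z >= trim_above:
        return "trim"
    return "hold"

# Synthetic series: steady exponential growth with a wobble, then a sharp final drop.
prices = [100 * math.exp(0.05 * t + 0.2 * math.sin(t)) for t in range(60)]
prices[-1] *= 0.3  # hypothetical crash in the latest period

z_scores = log_trend_zscores(prices)
print(f"latest z-score: {z_scores[-1]:.2f} -> {signal(z_scores[-1])}")
```

    Most readings land in the "hold" band by construction, which matches the point of the discipline: the rule is meant to fire rarely and leave the investor doing nothing the rest of the time.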

    The 2026 to 2027 outlook

    Pal closes the macro case by stacking the bull factors: a massive stablecoin expansion over the next 24 months, the Clarity Act getting signed and freeing builders, rising global liquidity, trillions in interest payments that force more money printing, a strong business cycle recycling earnings into speculative assets, the arrival of AI agents, and a cheap entry point with fear and greed at historic lows. He even floats a permanent resolution of Middle East conflict as part of the upside. The bear case is the mirror image: war reignites, inflation runs hotter, tightening starves capital, and the intelligence buildout slows. He puts the odds at roughly 70 percent bullish, 30 percent bearish, and says he does not see the bear case yet. The episode ends on a personal note about kindness, with Pal unable to name a single kindest act because, he says, everything is made of kindness.

    Notable Quotes

    “We’re going through the most extraordinary time in human history. Nothing else matters. This whole funneling of all capital into intelligence is the biggest race that’s ever happened.”

    Raoul Pal, on why capital keeps flooding into AI

    “The game is so big that nobody will stop.”

    Raoul Pal, on the game theory of the US and China AI race

    “This is how amazing it is. We’re putting electricity through sand and creating intelligence.”

    Raoul Pal, on silicon and the universal code

    “It’s a nasty correction in a bull market. I’ve been in crypto since 2013. I’ve seen many corrections, non-bear markets of 50% in Bitcoin.”

    Raoul Pal, on Bitcoin falling from 126K toward 60K

    “The market owes you nothing. You would just have to be better at doing a job.”

    Raoul Pal, on the entitlement that ruins crypto investors

    “This is humanity’s pension plan. We get to invest in the infrastructure rails of which all the agentic economy will run.”

    Raoul Pal, on owning layer one blockchains

    “The people who’ve made the most money out of crypto are the people who don’t trade it.”

    Raoul Pal, on why holding beats trading

    “Your job is to be a mercenary for your own capital. You want to make the most money over time.”

    Raoul Pal, on why no one has to stay loyal to crypto

    “Bitcoin and crypto is as cheap as it has been in its long-term uptrend versus NASDAQ.”

    Raoul Pal, on the relative value signal he watches

    This is a compressed look at a wide-ranging conversation. Watch the full episode on When Shift Happens here for Pal’s complete reasoning, the charts he references, and the back-and-forth that the summary above leaves out.

    Related Reading

    • Real Vision, the financial media platform Raoul Pal co-founded, where his Global Macro Investor research and exponential age thesis live.
    • Metcalfe’s law (Wikipedia), the network-value relationship Pal uses to model the log regression channel for crypto.
    • Reed’s law (Wikipedia), background on the exponential-of-the-exponential growth Pal says AI is the first real-world example of.
    • Technological singularity (Wikipedia), context for the economic singularity Pal argues is now only about four years away.
    • Zcash, the privacy coin Pal added in the correction as a Bitcoin-with-privacy and quantum-proof trade.
  • Gavin Baker on Orbital Compute, TSMC, Frontier AI Models, Anthropic’s Vertical Take Off, and the Coming Wafer Shortage

    Gavin Baker, founder and CIO of Atreides Management, returns to Patrick O’Shaughnessy’s Invest Like the Best for his sixth appearance. He calls the current AI moment the most extraordinary moment in the history of capitalism, walks through what Anthropic’s vertical takeoff in revenue actually means, lays out why orbital compute is closer than skeptics believe, dissects the TSMC bottleneck that may be the only thing standing between today’s market and a full-on AI bubble, and rates every hyperscaler on how they have positioned for a world where frontier model providers may stop selling API access altogether.

    TLDW

    Anthropic added eleven billion dollars of ARR in a single month, which is roughly the combined business of Palantir, Snowflake, and Databricks built over a decade. That is the setup. From there Gavin Baker covers the March and April selloff, the contrarian read that a closed Strait of Hormuz was actually bullish for American manufacturing competitiveness, why Anthropic and OpenAI multiples may be misleadingly cheap on an unconstrained run rate basis, why Elon Musk’s discipline on SpaceX valuation created a superpower of permanent access to capital, the practical engineering case for orbital compute as racks in space rather than Pentagon sized space stations, why TSMC’s capacity discipline is the single most important variable in whether the AI cycle becomes a bubble, what Terafab in Texas changes, why the Pareto frontier of AI models has flipped from Google dominance to Anthropic and OpenAI dominance in nine months, the shift from all you can eat AI subscriptions to usage based pricing and what that means for revenue scaling, Richard Sutton’s bitter lesson as the largest risk to the AI trade, why frontier tokens still capture an overwhelming share of economic value, the role of continual learning as the third great open question, why most new chip startups should not try to build a better GPU, why Cerebras did something different and hard, why disaggregated inference may extend GPU useful lives to ten or fifteen years and rescue the private credit industry, why being in the token path is the new venture filter, the new prisoner’s dilemma around releasing frontier models via API, an honest rating of Google, Meta, Amazon, and Microsoft, why personal safety is becoming a real AI era risk, and why he remains an AI optimist maximalist who believes this could be the next Pax Americana.

    Key Takeaways

    • Anthropic added eleven billion dollars of ARR in one month, more than the combined businesses of Palantir, Snowflake, and Databricks built across a decade. There is no precedent for this in the history of capitalism.
    • The SaaS and cloud revolution created between five and ten trillion dollars of value over twenty years. AI is replaying that compression on a timeline measured in months.
    • The March selloff was a drawdown driven by disagreement with price action, not by an invalidated thesis. That is the kind of drawdown an investor can lean into.
    • DeepSeek Monday in January 2025 was a similar setup. By the day of the selloff, AWS Asia GPU prices had already doubled, GPU availability had fallen, and it was obvious reasoning models would be vastly more compute hungry at inference. The market priced the opposite.
    • The Strait of Hormuz closing was actually positive for America. US natural gas (the primary input into US electricity, which feeds AI) fell twenty percent on Bloomberg while Asian and European natural gas doubled or tripled. American manufacturing competitiveness improved overnight.
    • The US is now the world’s largest producer and exporter of oil and gas. The economy is dramatically less energy intensive than in the 1970s. The shortage trauma comparison does not hold.
    • Tech as a sector traded as cheaply versus the rest of the market in early April as at any point in the last ten years, into the single most bullish moment for AI fundamentals on record.
    • Anthropic is dramatically more capital efficient than OpenAI, having burned roughly eighty percent less to reach a similar revenue scale. They have very different structural returns on invested capital.
    • Anthropic at roughly nine hundred billion for fifty billion of ARR (growing a thousand percent) is striking. Adjusted for compute constraint, the unconstrained run rate could be one hundred fifty to two hundred billion, putting the implied multiple closer to five times.
    • Claude Opus generates roughly seventy percent fewer tokens for the same question than previously, with token quantity tied to answer quality. Subscribers on flat-fee plans are getting a lobotomized model.
    • Elon Musk’s superpower is twenty years of making investors money. He never pushes valuation. SpaceX compounded in the low thirties percent per year for a decade because Musk treats fair pricing as a sacred covenant.
    • Capitalism will solve the watts shortage. The current bottleneck has shifted from chips and energy to zoning and political approval. Many capex decisions are paused until after the US midterms.
    • The watts shortage probably begins to alleviate in 2027 and 2028. Orbital compute solves it longer term.
    • Orbital compute is not Pentagon sized data centers in space. It is racks in space. A Blackwell rack is three thousand pounds, eight feet tall, four feet deep, three feet wide. SpaceX has shown a satellite roughly that size.
    • The satellites operate in sun synchronous orbit so solar wings (around five hundred feet per side) always face the sun and the radiator on the dark side always points to deep space.
    • Starlink V3 satellites already run at around twenty kilowatts. A Blackwell rack runs at one hundred kilowatts. SpaceX engineers express genuine confidence they have already solved cooling and radiator design at these scales.
    • Racks in space are connected with lasers traveling through vacuum, the same lasers already on every Starlink. SpaceX operates the world’s largest satellite fleet and, via xAI Colossus, the world’s largest data center on Earth.
    • Inference will move to orbit. Training will stay on Earth for a long time. Terrestrial data centers remain valuable for the rest of an investor’s career.
    • The wafer bottleneck is structural and political. TSMC is essentially Taiwan’s GDP, water, and electricity. The leaders see themselves as inheritors of Morris Chang’s sacred legacy and they do not behave like a Western public company.
    • Jensen Huang has never had a contract with TSMC. The relationship is run on handshakes and the assumption that things will be fair over time.
    • If TSMC did everything Jensen wanted, Nvidia could be selling two to three trillion dollars of GPUs in 2026 and 2027. TSMC’s discipline is the single largest factor preventing a true AI bubble.
    • Historically, foundational technologies always get a bubble. Railroads, canals, the internet. The current AI buildout is overwhelmingly funded out of operating cash flow, GPUs are running at one hundred percent utilization, and that is fundamentally different from the year 2000 fiber overbuild.
    • If one of Intel or Samsung Foundry catches up at the leading node, the other will follow, and TSMC’s discipline collapses. Watch TSMC capacity decisions to predict a bubble.
    • Terafab, the SpaceX and Tesla joint venture to build the world’s largest fab in America, has a partnership with Intel that grants access to fifty years of institutional foundry knowledge. The A teams at ASML, KLA, Lam Research, and Applied Materials will follow Elon’s reputation in hardware engineering.
    • The hiring playbook for Terafab includes building Taiwan Town, Japan Town, and Korea Town next to the fab. Recruit the engineers and import their families, their restaurants, and their staff.
    • Frontier tokens still capture an overwhelming share of all economic value created at the model layer. This is surprising and is one of the three big open questions for AI investing.
    • The Pareto frontier of intelligence versus cost has flipped. Nine months ago Google’s TPU dominated every point on the frontier. Today Anthropic and OpenAI dominate, with Grok 4.3 on the frontier and Gemini 3.1 hanging on.
    • Google’s conservative TPU V8 design (partly an attempt to reduce dependence on Broadcom and Nvidia) is the leading explanation for the loss of per token cost leadership.
    • AI pricing is shifting from all you can eat to usage based, mirroring the cellular and long distance industries. Cellular stopped being a great growth industry when it went all you can eat. AI just made the opposite move.
    • OpenAI and Anthropic together could exceed two hundred billion in ARR this year if compute keeps coming online and frontier token pricing holds.
    • The two hundred fifty dollar a month consumer AI plan is no longer enough to evaluate frontier capability. Enterprise plans with usage based billing are required because rate limits are now severe.
    • The three biggest open questions for AI investors are: violation of the bitter lesson via ASI or human ingenuity, whether frontier tokens keep commanding their premium, and when continual learning arrives.
    • Today’s continual learning is crude reinforcement learning during mid training on verifiable tasks. True continual learning means weights updating dynamically, like a human who learns the first time they touch fire.
    • Trying to build a better GPU is a losing strategy. Jensen will copy any one to three percent share design. Startups should target one percent share, do something different, and make it hard enough that Nvidia cannot fast follow.
    • Disaggregated inference (separating prefill and decode) opens new design canvases. Prefill is memory capacity bound. Decode is memory bandwidth bound. Each can be optimized independently.
    • Cerebras did something different and hard with wafer scale computing. Three generations of chips and real grit to get there.
    • Disaggregation of inference may stretch GPU useful lives to ten or fifteen years, dropping financing costs from the low sevens to five or six percent, mathematically lowering the cost of the AI buildout and likely saving the private credit industry from its SaaS loan exposure.
    • Sellers of shortage outperform buyers of shortage. But owning the largest installed base of what is currently in shortage (hyperscaler CPU fleets, for example) is also a strong position.
    • Most of the economic value at the application layer of AI has been destroyed, not created. The exceptions are companies in the token path or in niches small enough that frontier labs ignore them.
    • Coding may be the shortest path to ASI. If you can write code, you can write code that does anything. Cursor, Cognition, and Anthropic correctly focused on it.
    • Jensen could probably get close to the frontier with his own Nemotron family of models whenever he wants. The fact that he chooses not to is a strategic decision about not commoditizing his customers.
    • The new prisoner’s dilemma in AI is whether frontier labs release their best model via API. If everyone agrees not to, Chinese open source falls behind. If anyone defects, the defector pulls ahead on revenue and resources, forcing everyone else to defect.
    • Google still owns the largest compute installed base. Without TPU’s prior cost advantage, this matters more. YouTube data has real value in a world of robotics. GCP is going crazy.
    • Meta deserves credit for becoming AI first internally faster than any other internet giant. Musa, their first MSL model, is impressively close to the Pareto frontier.
    • Amazon is strong because of Trainium and robotics driven retail P&L efficiency. Nova is better than it gets credit for.
    • Microsoft flinched on capex in early 2025 and lost position. Satya Nadella’s current decision to use Microsoft compute for Microsoft products rather than reselling to OpenAI is a courageous and probably correct call, even at the cost of an eight hundred dollar stock price.
    • The large platforms most engaged with startups are Amazon and Nvidia by a mile, followed by Google. Broadcom is the favorite ASIC partner. AMD, Microsoft, and Meta have minimal startup engagement and that will cost them as the best teams are now at startups.
    • Personal safety in an AI era requires a family or company safe word that cannot be socially engineered. Deepfake voice and video extortion at the speed of FaceTime is already feasible.
    • Ukraine is winning largely on the back of having the best battlefield AI outside America and Israel. Adversaries are starting to internalize what AI dominance means geopolitically.
    • An optimistic read is that this becomes a new Pax Americana, the way the post 1945 American nuclear monopoly was used to rebuild Germany and Japan rather than dominate.
    • AI cured a friend’s daughter’s rare disease by spinning up a research effort that identified an already marketed drug capable of impacting her condition. That is the upside that keeps Gavin an AI optimist maximalist.

    Detailed Summary

    The most extraordinary moment in the history of capitalism

    Gavin’s framing of the current moment is unusually direct. Anthropic added eleven billion dollars of annual recurring revenue in a single month. The three highest profile SaaS companies of the past decade or so, Palantir, Snowflake, and Databricks, took a decade and tens of thousands of employees collectively to build the combined business that Anthropic added in thirty days. He has been investing through every major tech cycle and says there is no historical analog. Not the dotcom era, not the cloud transition, not mobile. This is its own thing.

    The market response, then, was peculiar. The NASDAQ sold off into the single most bullish moment for AI fundamentals on record. Tech traded at roughly its widest discount versus the rest of the market in a decade. Investors who said they wished they had bought into AI during 2022, during COVID, or during DeepSeek Monday got the same valuation setup again in early April, this time with an even clearer inflection.

    Why the Strait of Hormuz closing was secretly bullish for America

    One reason the macro fear in March may have been mispriced is that the same geopolitical event that drove the selloff was, in practice, a relative benefit to the United States. American natural gas, the input into American electricity, which is the input into American AI training and inference, fell roughly twenty percent. Asian and European natural gas prices doubled or tripled. The US emerged with sharply improved relative manufacturing competitiveness, which is exactly what the current administration cares about.

    The 1970s comparison does not hold. The US economy is dramatically less energy intensive, it is now the world’s largest producer and largest exporter of oil and gas, and there are no shortages, only price moves. That backdrop made it easier for disciplined investors to stay focused on AI fundamentals through the volatility.

    Anthropic and OpenAI valuations on an unconstrained run rate

    Anthropic at roughly nine hundred billion for fifty billion of ARR sounds rich until you adjust for the fact that the company is severely compute constrained. Gavin estimates that, unconstrained, Anthropic might be at one hundred fifty to two hundred billion in run rate revenue, putting the implied multiple closer to five times. He also points out that Claude Opus now generates roughly seventy percent fewer tokens for the same question than it used to. Token quantity correlates with answer quality, and Anthropic is rate limiting and shrinking outputs to ration capacity across its user base.
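
    The multiple arithmetic in this section can be checked with a few lines of Python. This is only a sketch of the division described above: the valuation and ARR figures are the ones quoted in the summary, and the function and variable names are made up for illustration.

```python
def implied_multiple(valuation_bn: float, arr_bn: float) -> float:
    """Return valuation divided by annual recurring revenue (both in $bn)."""
    if arr_bn <= 0:
        raise ValueError("ARR must be positive")
    return valuation_bn / arr_bn


VALUATION_BN = 900.0                   # "roughly nine hundred billion" per the summary
REPORTED_ARR_BN = 50.0                 # "fifty billion of ARR" per the summary
UNCONSTRAINED_ARR_BN = (150.0, 200.0)  # Gavin's estimated unconstrained run rate range

reported = implied_multiple(VALUATION_BN, REPORTED_ARR_BN)
unconstrained = [implied_multiple(VALUATION_BN, arr) for arr in UNCONSTRAINED_ARR_BN]

print(f"Multiple on reported ARR: {reported:.1f}x")
print("Multiple on unconstrained run rate: "
      + ", ".join(f"{m:.1f}x" for m in unconstrained))
```

    On the quoted figures the reported multiple is 18x, and the unconstrained range works out to between 4.5x and 6x, which is where the "closer to five times" framing comes from.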

    Anthropic and OpenAI are also structurally very different. Anthropic has burned around eighty percent less cash than OpenAI to reach a comparable revenue scale. That implies very different long term returns on invested capital, though OpenAI has done a better job locking in compute and Sarah Friar is one of the most exceptional CFOs Gavin has worked with.

    Why neither lab is raising at a three trillion dollar valuation

    The answer Gavin gives is that both labs are deliberately leaving valuation on the table the way Elon has done for two decades. SpaceX compounded in the low thirties percent annually for a decade because Elon never pushed price. The result is a permanent superpower of access to capital. Investors trust him because they have made money with him for twenty years. That is a moat that compounds with every round.

    Anthropic could probably raise at a one hundred percent premium to its rumored latest mark. They are choosing not to. In an uncertain world (Ukraine, Russia, Iran, Taiwan), preserving the ability to raise more capital later at fair prices is more valuable than maximizing this round.

    Watts and wafers, the two real constraints

    Capitalism is solving the watts problem. The leading PE infrastructure investors now say zoning and political approval, not chips or energy, are the gating factors. Companies are deferring big capex announcements until after the US midterms. Turbine capacity is being doubled at the manufacturers. Companies like Boom Supersonic are repurposing jet engines for grid use. Watts probably ease meaningfully in 2027 and 2028 and then orbital compute does the rest.

    Wafers are the harder problem because they live in Taiwan, run on handshakes, and depend on a corporate culture that does not respond to public market incentives. TSMC is essentially the GDP, water consumption, and electricity consumption of Taiwan. Its leadership treats the company as the legacy of Morris Chang. The Silicon Shield doctrine is real and internal.

    Orbital compute as racks in space

    The biggest mental update Gavin asks listeners to make is to stop picturing data centers in space as Pentagon sized space stations. A Blackwell rack is three thousand pounds and roughly the size of a refrigerator. SpaceX has shown a concept satellite of about that size. Solar wings extend five hundred feet to each side and the radiator extends hundreds of feet behind, both possible because the orbit is sun synchronous and the orientation is fixed relative to the sun.

    SpaceX engineers Gavin has spoken to at Starbase express genuine confidence that they have solved cooling at these power levels. For scale, Starlink V3 satellites already operate at twenty kilowatts, while a Blackwell rack is one hundred kilowatts. The same company operates the world’s largest satellite fleet and the world’s largest data center on Earth via xAI Colossus. The racks are connected to each other with lasers traveling through vacuum, technology already deployed in every Starlink. The naysayers, Gavin observes, are armchair skeptics and Larry Ellison’s response (Elon is out there landing rockets, no one else is) is the right frame.
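
    To get a feel for why radiator size is the crux of the cooling question, here is a rough back-of-the-envelope sketch using the Stefan-Boltzmann law for an ideal radiator facing deep space. It is not SpaceX’s design method. The radiator temperature (300 K) and emissivity (0.9) are assumptions chosen for illustration, and the sketch ignores sunlight and Earth heat loading and treats all electrical power as waste heat.

```python
STEFAN_BOLTZMANN = 5.670374419e-8  # W / (m^2 K^4), physical constant


def radiator_area_m2(heat_watts: float, temperature_k: float,
                     emissivity: float = 0.9, sides: int = 1) -> float:
    """Area needed to radiate `heat_watts` to deep space at a given temperature.

    Idealized: ignores absorbed sunlight, Earth infrared, and the ~3 K background.
    """
    flux = emissivity * STEFAN_BOLTZMANN * temperature_k ** 4  # W per m^2 per side
    return heat_watts / (flux * sides)


ASSUMED_RADIATOR_TEMP_K = 300.0  # assumption, not from the conversation

for label, watts in [("Starlink V3 class (20 kW)", 20_000.0),
                     ("Blackwell rack class (100 kW)", 100_000.0)]:
    area = radiator_area_m2(watts, ASSUMED_RADIATOR_TEMP_K)
    print(f"{label}: about {area:.0f} m^2 of single-sided radiator")
```

    Under these assumptions the required area scales linearly with power, so a one hundred kilowatt rack needs five times the radiator of a twenty kilowatt satellite at the same temperature. Running the radiator hotter shrinks the area quickly because emitted power grows with the fourth power of temperature.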

    Terafab in Texas and the threat to TSMC’s discipline

    Terafab, the SpaceX and Tesla joint venture, intends to be the largest fab in the world. The partnership with Intel grants access to fifty years of foundry institutional knowledge, allowing Terafab to start three to five quarters behind the leading node rather than fifteen years behind. The A teams at the semicap equipment companies (ASML, KLA, Lam Research, Applied Materials) will follow Elon’s reputation in hardware engineering the same way they followed TSMC twenty years ago when Intel stumbled.

    The talent strategy is the part most observers underestimate. Recruit the best engineers globally, then import their families, their restaurants, their staff. Build Taiwan Town, Japan Town, and Korea Town next to the fab. Optimize the human experience for the people whose work matters. Intel and Samsung do not think that way.

    Bubble watch and the year 2000 comparison

    Every foundational technology in modern history has had a bubble. Railroads, canals, the internet. Carlota Perez documented why. Markets correctly identify the importance, diversity of opinion collapses, supply gets ahead of demand, the bubble crashes. The current cycle has two important differences. The buildout is overwhelmingly funded out of operating cash flow, not debt. Every GPU is running at one hundred percent utilization, while at the peak of the fiber bubble ninety nine percent of fiber was unused.

    TSMC discipline is the single largest reason a bubble has not formed. If Jensen could buy everything TSMC could theoretically make, Nvidia could sell two to three trillion dollars of GPUs in 2026 and 2027. At some point that becomes more than the market can absorb. If Intel or Samsung Foundry catches up at the leading node, the other will too. TSMC’s pricing discipline collapses and the bubble starts.

    The Pareto frontier and the loss of Google’s cost advantage

    The most important chart in AI is the Pareto frontier of model intelligence versus per token cost. Nine months ago, Google’s TPU based models dominated every point on it. OpenAI, Anthropic, and xAI sat inside the frontier. Today the frontier is dominated by Anthropic and OpenAI, with Grok 4.3 on the frontier and Gemini 3.1 hanging on by subsidization more than economics. The most likely cause is Google’s conservative TPU V8 design, an attempt to reduce dependence on Broadcom and Nvidia that sacrificed per token economics.
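
    For readers unfamiliar with the term, a model sits on the Pareto frontier when no other model is both cheaper and at least as capable. The sketch below shows one simple way to compute that set. The model names, prices, and scores are placeholders invented for illustration, not real benchmark or pricing data.

```python
from typing import List, NamedTuple


class Model(NamedTuple):
    name: str
    cost_per_million_tokens: float  # lower is better
    score: float                    # higher is better


def pareto_frontier(models: List[Model]) -> List[Model]:
    """Return models not dominated on (lower cost, higher score)."""
    frontier: List[Model] = []
    best_score = float("-inf")
    # Walk from cheapest to most expensive; at equal cost, try the best score first.
    for m in sorted(models, key=lambda m: (m.cost_per_million_tokens, -m.score)):
        # Keep a model only if it beats every cheaper (or equally priced) model.
        if m.score > best_score:
            frontier.append(m)
            best_score = m.score
    return frontier


# Hypothetical data points, for illustration only.
candidates = [
    Model("A", 1.0, 50.0),
    Model("B", 2.0, 45.0),   # dominated by A: costs more, scores less
    Model("C", 3.0, 70.0),
    Model("D", 5.0, 70.0),   # dominated by C: same score, costs more
    Model("E", 8.0, 90.0),
]

print([m.name for m in pareto_frontier(candidates)])  # ['A', 'C', 'E']
```

    In these terms, "dominating every point on the frontier" means one vendor’s models make up the whole returned list, and "sitting inside the frontier" means being filtered out like B and D.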

    The bitter lesson, frontier tokens, and continual learning

    Three open questions dominate AI investing. The first is whether Richard Sutton’s bitter lesson (more compute beats human algorithmic cleverness) gets violated by ASI itself optimizing for efficiency. Closer observers of AI are more skeptical of a violation. Gavin thinks ASI’s first move will be to make itself more efficient and more resourced, which is technically a temporary violation.

    The second is whether frontier tokens keep capturing the overwhelming share of economic value at the model layer. Today they do, surprisingly. Gemini 3.1 Pro was mind-blowing nine months ago and is intolerable today. The third is when continual learning arrives. Today’s models need a million fire touches to learn what a human learns from one. True continual learning would mean dynamic weight updates in real time and would produce a fast takeoff.

    From all you can eat to usage based AI pricing

    AI is shifting from flat fee plans to usage based pricing. The historical analogy is cellular and long distance. Both stopped being great growth industries when they went all you can eat. AI just made the opposite move. The consequence is that flat fee subscribers, even on premium consumer plans, get a rate limited and token throttled version of the frontier model. Enterprise plans with usage based billing are now required to evaluate true capability. Gavin thinks the combination of new compute coming online and usage based pricing is what gets OpenAI and Anthropic past two hundred billion in combined ARR this year.

    Chip startups, prefill decode disaggregation, and Cerebras

    Trying to build a better GPU is the wrong move. The four scaled players (Nvidia, AMD, Trainium, TPU) have copy capability for any one to three percent share design that looks attractive. The good news for startups is that disaggregated inference (separating prefill and decode) opens a richer design canvas. Prefill is memory capacity bound. Decode is memory bandwidth bound. Each can be optimized independently. Andrew Fox’s analogy is a British naval ship of the eighteenth century. Prefill is loading the cannon. Decode is firing it.
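
    The two constraints named above can be made concrete with a rough sizing sketch. It estimates the key-value cache a long context must hold in memory (the capacity side, in the summary’s framing) and an upper bound on single-stream decode speed when every generated token has to read the weights and cache from memory once (the bandwidth side). All model dimensions and the bandwidth figure are hypothetical, and real systems batch, quantize, and shard in ways this ignores.

```python
def kv_cache_bytes(layers: int, kv_heads: int, head_dim: int,
                   context_tokens: int, bytes_per_value: int = 2) -> int:
    """Bytes of key/value cache for one sequence (factor 2 = keys plus values)."""
    return 2 * layers * kv_heads * head_dim * context_tokens * bytes_per_value


def decode_tokens_per_second_bound(weight_bytes: float, kv_bytes: float,
                                   bandwidth_bytes_per_s: float) -> float:
    """Upper bound on batch-size-1 decode speed if each token reads all
    weights and the full KV cache from memory exactly once."""
    return bandwidth_bytes_per_s / (weight_bytes + kv_bytes)


# Hypothetical model and accelerator, for illustration only.
LAYERS, KV_HEADS, HEAD_DIM = 80, 8, 128
CONTEXT_TOKENS = 131_072
WEIGHT_BYTES = 70e9 * 2          # 70B parameters at 2 bytes each (assumed)
BANDWIDTH_BYTES_PER_S = 3.35e12  # assumed memory bandwidth

kv = kv_cache_bytes(LAYERS, KV_HEADS, HEAD_DIM, CONTEXT_TOKENS)
bound = decode_tokens_per_second_bound(WEIGHT_BYTES, kv, BANDWIDTH_BYTES_PER_S)

print(f"KV cache for one long sequence: {kv / 2**30:.0f} GiB")
print(f"Decode upper bound at batch size 1: {bound:.1f} tokens/s")
```

    The point of the sketch is that the two numbers respond to different hardware: more memory capacity lets a device hold more or longer contexts, while only more bandwidth raises the decode ceiling. That separation is what makes it plausible to optimize each stage independently.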

    Cerebras is the model. Wafer scale computing is genuinely different and genuinely hard. It took three generations of chips to get right. Andrew Feldman and his team had the grit to keep going after the first chip failed. The design has a high ratio of on chip compute and memory relative to shoreline IO, which is why Cerebras is now experimenting with putting an optical wafer on top of the compute wafer to solve scale out.

    GPU useful lives and the rescue of private credit

    One of the strongest claims in the conversation is that disaggregated inference will stretch GPU useful lives to ten or fifteen years. The skeptical narrative (GPUs are obsolete in two years, companies are cooking their depreciation books) is wrong. You can put a Cerebras system or Groq LPU in front of older Hopper or Ampere parts, use them only for prefill, and run them until they physically melt. Private credit, which is in pain from SaaS loans and which underwrote GPU loans on three to four year lives, may be saved by this.

    If GPU financing rates can come down from the low sevens to five or six percent, the mathematics of the AI buildout improves materially. That is a structural tailwind that compounds for years.
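
    A simple loan amortization sketch shows how rate and useful life interact. It uses the standard level-payment annuity formula. The specific inputs are assumptions standing in for the ranges quoted above: 7.25 percent for "low sevens", 5.5 percent for "five or six percent", a four year life for the original underwriting, a ten year life for the extended case, and a hypothetical one million dollar loan.

```python
def annual_payment(principal: float, rate: float, years: int) -> float:
    """Level annual payment that fully amortizes `principal` over `years`."""
    if years <= 0:
        raise ValueError("years must be positive")
    if rate == 0:
        return principal / years
    return principal * rate / (1 - (1 + rate) ** -years)


PRINCIPAL = 1_000_000.0  # hypothetical loan size

# Assumed stand-ins for the ranges in the text.
scenarios = {
    "4 year life at 7.25%": (0.0725, 4),
    "10 year life at 5.5%": (0.055, 10),
}

for label, (rate, years) in scenarios.items():
    payment = annual_payment(PRINCIPAL, rate, years)
    print(f"{label}: {payment:,.0f} per year, {payment * years:,.0f} total")
```

    The annual payment falls sharply in the longer, cheaper scenario because the same principal is spread over more years at a lower rate, although total interest paid over the life of the loan can still be higher. The claim in the text is about the annual carrying cost per GPU, which is the figure that feeds into the economics of each token served.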

    The application layer, the token path, and a new prisoner’s dilemma

    Trillions of dollars of value have been destroyed at the application layer, not created. Cursor and Cognition are the rare scaled exceptions, and they got there by focusing on coding very early. As Amjad Masad noted, coding is plausibly the shortest path to ASI because a coding agent can write itself into any new domain. Jamin Ball’s frame is that the new venture filter is whether the company is in the token path. Databricks is. Most application layer startups are not.

    Jensen could probably get close to the frontier with Nemotron whenever he wants, and the strategic question of whether to do that is a new prisoner’s dilemma. If every frontier lab agrees not to release best models via API, Chinese open source falls steadily behind. If anyone defects, the defector gains revenue and resources, and everyone else has to defect. The same dynamic exists between TSMC, Intel, and Samsung. If Nvidia or AMD ever truly used an alternative foundry, that foundry would catch up rapidly.

    Rating the hyperscalers

    Google has the largest compute installed base, the YouTube data that matters in a robotics world, and a search business that prints. Their loss of TPU cost leadership is the surprise of the year. If Google I/O in five days does not produce a leapfrog model, the Nvidia centric narrative gets even stronger.

    Meta deserves real credit. Zuckerberg made Meta AI first internally faster than any other internet giant, paid up for the talent contracts when no one else would, and shipped Musa as a first model from MSL that is close to the Pareto frontier. Amazon is well positioned on Trainium, robotics in retail, and a Nova model line that is better than it gets credit for. Microsoft flinched on capex in early 2025 and lost position. Satya Nadella’s current decision to use Microsoft compute for Copilot rather than reselling to OpenAI is courageous and probably correct, even at the cost of stock price.

    The most interesting cross hyperscaler metric is startup engagement. Nvidia and Amazon engage deeply with startups. Google is next. Broadcom is the favored ASIC partner. AMD, Microsoft, and Meta have minimal startup engagement, which Gavin believes will cost them as the best teams now sit at startups.

    Personal safety, geopolitics, and the Pax Americana case

    The closing section turns darker. Personal safety in an AI era requires a family or company safe word that cannot be socially engineered. Deepfake voice and video extortion via something that looks exactly like your child calling on FaceTime is already feasible. Political violence against AI leaders is a real concern. Geopolitically, Ukraine is winning largely because it has the best battlefield AI outside America and Israel. How adversaries respond to that asymmetry is the next great variable.

    Gavin’s optimistic frame is the Pax Americana. After 1945 the US had a nuclear monopoly and could have controlled the world. Instead it rebuilt Germany and Japan, both of which became the most reliable American allies for the next eighty years. If AI dominance plays out similarly, this is a generationally positive story rather than a destabilizing one. The personal anecdote that closes the conversation is a friend whose daughter was diagnosed with a rare genetic condition. He spun up agents, identified a drug already on the market that addresses her mutation, and her life is immeasurably different because of AI. That is the upside.

    Thoughts

    The Anthropic eleven billion in a month framing is the kind of stat that resets priors. The right way to interpret it is not as a one off but as a measure of how fast value can compound when the underlying technology improves on a curve steeper than the ability of the rest of the economy to absorb it. The skeptical question is whether that ARR is durable or whether it is heavily tied to a customer base of other AI companies that are themselves on a single venture funded year of runway. The bullish answer is that frontier coding, frontier research, and frontier enterprise tasks are not going to stop being valuable, and Anthropic is the best at all three. Both can be true. The number is still extraordinary.

    The argument that TSMC discipline is the only thing preventing a bubble is the analytically tightest part of the conversation. The implied trade is to watch TSMC capacity additions like a hawk and to be more, not less, cautious if Intel Foundry or Samsung Foundry ever announce real share at the leading node. The Terafab thesis is more speculative but more interesting. If Elon’s talent recruiting playbook works and the Intel partnership gives Terafab a real seat at the table within five years, the geometry of the global semiconductor industry shifts in a way that is bullish for American manufacturing, bullish for power and water infrastructure in Texas, and ambiguous for TSMC itself.

    The Pareto frontier discussion deserves more attention than it usually gets. Pricing leadership in AI is not a vanity metric. It determines who can subsidize free tier usage, who can absorb compute shortages, who can ship cheaper enterprise plans, and ultimately whose model becomes the default for any given workload. Google losing per token leadership in nine months is one of the most under analyzed events in the sector and it explains a lot about why Anthropic and OpenAI are growing the way they are. If Google I/O does not produce a leapfrog model, the implied verdict on TPU V8 design choices gets a lot harsher.

    The application layer destruction point is worth sitting with. Founders building on top of frontier models are competing in a world where the model itself moves faster than any moat they can build, where the model lab can absorb their niche if it gets interesting, and where the only protection is either deep token path integration or a niche so small the lab does not bother. That is a much harsher venture environment than the early SaaS era. The compensating opportunity is that one human can now run a hundred agents, so the ceiling on what a small team can build is correspondingly higher. The bet is that productivity per founder rises faster than competitive pressure from the labs. We will find out.

    The orbital compute pitch is the section that will polarize listeners. The naive read is that this is science fiction. The closer read is that every component (sun synchronous orbit, laser interconnect, twenty kilowatt satellite buses, ten thousand satellite manufacturing cadence, full rocket reusability) already exists. The remaining engineering problems are repair, maintenance, and radiator scale, all of which are real but tractable on a five to ten year horizon. The strategic implication is that the political and zoning ceiling on terrestrial data centers becomes less binding if orbital compute is a credible alternative for inference workloads. The investor implication is that being short the watts and cooling complex on a five year horizon is a real trade, not a meme.

    Watch the full conversation here.

  • Jensen Huang at Stanford CS153 Frontier Systems on Co-Design, Agentic Computing, Vera Rubin, Open Models, and the Million-X Decade That Reshaped AI Infrastructure

    https://www.youtube.com/watch?v=tsQB0n0YV3k

    NVIDIA CEO Jensen Huang returned to Stanford for the CS153 Frontier Systems class (the room nicknamed itself “AI Coachella”) to lay out, in raw form, how he thinks about the computer being reinvented for the first time in over sixty years. Across roughly seventy minutes of student questions he walks through the codesign philosophy that gave NVIDIA a million-x decade, the architectural through-line from Hopper to Grace Blackwell to Vera Rubin to Feynman, the case for open source foundation models, the realities of tokens per watt and MFU, energy demand running a thousand times higher, the China and export-control debate, and his own biggest strategic mistakes. Watch the full conversation on YouTube.

    TLDW

    Huang argues every layer of computing has changed: the programming model, the system architecture, the deployment pattern, the economics. Co-design across CPUs, GPUs, networking, storage, switches and compilers gave NVIDIA roughly a million-x speed-up over ten years versus the ten-x Moore’s Law era, and that headroom is what let researchers say “just train on the whole internet.” Hopper was built for pre-training, Grace Blackwell NVLink72 for inference and reasoning (50x over Hopper in two years), Vera Rubin is built for agents that load long memory, call tools and need a low-latency single-threaded CPU bolted directly to the GPU, and Feynman extends that to swarms of agents that spawn sub-agents. Open weights matter because safety, sovereignty (230-plus languages no one else will fund) and domain models for biology, autonomy, robotics and climate need a foundation that NVIDIA is willing to seed. Compute is not really the scarce resource (Huang says place the order and the chips ship), the broken thing is institutional budgeting that can’t put a billion dollars into a shared university supercomputer. Energy demand is heading a thousand times higher and this is finally the moment market forces alone will fund sustainable generation. On geopolitics he rejects the GPUs-as-atomic-bombs framing and warns America will end up like its telecom industry if it cedes two thirds of the world. On career he advises seeking suffering on purpose. On strategy he says observe, reason from first principles, build a mental model, work backwards, minimize opportunity cost, maximize optionality.
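
    The gap between a million-x decade and a ten-x decade is easier to grasp as a yearly rate. The short sketch below converts a total speed-up over a period into the equivalent compound annual factor. The two totals are the ones quoted above, and the function name is made up for illustration.

```python
def annualized_factor(total_factor: float, years: float) -> float:
    """Constant yearly multiplier that compounds to `total_factor` over `years`."""
    if total_factor <= 0 or years <= 0:
        raise ValueError("total_factor and years must be positive")
    return total_factor ** (1.0 / years)


YEARS = 10
for label, total in [("Co-designed stack (1,000,000x)", 1_000_000.0),
                     ("Moore's Law era (10x)", 10.0)]:
    print(f"{label}: about {annualized_factor(total, YEARS):.2f}x per year")
```

    A million-x over ten years is roughly a fourfold improvement every year, compared with about twenty six percent a year for a ten-x decade.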

    Key Takeaways

    • The computing model has been substantially unchanged since the IBM System 360, sixty-plus years ago. Huang’s first computer architecture book was the System 360 manual. AI is the first true reinvention.
    • Old computing was pre-recorded retrieval. New computing is generated, contextually aware and continuous. Cloud was on-demand. Agentic systems run continuously.
    • Codesign is NVIDIA’s central thesis. Inherited from the Hennessy and Patterson RISC era at Stanford, extended across CPUs, GPUs, networking, switches, storage, compilers and frameworks all optimized together.
    • The result of full-stack codesign: roughly 1,000,000x faster compute over ten years, versus a generous 10x to 100x for Moore’s Law in the same period. Dennard scaling effectively ended a decade ago.
    • That million-x speed-up is what unlocked “train on all of the internet” as a realistic AI strategy.
    • After GPT, Huang says it was obvious thinking was next. Reasoning is just generating tokens consumed internally, then using tools is generating tokens consumed externally. Agentic systems followed predictably.
    • Education needs AI baked into the curriculum, not just taught as a subject. Pre-recorded textbooks cannot keep pace with knowledge being generated in real time.
    • Huang says he cannot learn anymore without AI. He has the AI read the paper, then read every related paper, then become a dedicated researcher he can interrogate.
    • Mead and Conway and the first-principles methodology of semiconductor design are still worth learning even though most of the scaling tricks have been exhausted.
    • NVIDIA itself is one of the largest consumers of Anthropic and OpenAI tokens in the world. One hundred percent of NVIDIA engineers are now agentically supported. Huang recommends Claude and similar tools by name and says open-source downloads will not match the integrated product harness.
    • NVIDIA still invests heavily in open foundation models because language and intelligence represent the codification of human knowledge. Five pillars: Nemotron (language), BioNeMo (biology), Alpamayo (autonomous vehicles), GR00T (humanoid robotics) and a climate science model (mesoscale multiphysics).
    • Sovereign language models matter. Roughly 230 world languages will never be a top priority for a commercial frontier lab. Nemotron is near-frontier and fully fine-tunable so any country can adapt it.
    • Safety and security require open weights. You cannot defend against or audit a black box. Transparent systems let researchers interrogate models and let defenders deploy swarms.
    • The future of cyber defense is not bigger-model-versus-bigger-model. It is trillions of cheap fast small models like Nemotron Nano surrounding the threat.
    • Domain models fuse language priors with world models. Alpamayo learned to drive safely on a few million miles instead of billions because it can reason like a human about the road.
    • MFU (Model Flops Utilization) is a misleading metric. Huang says he wants low MFU, because that means he over-provisioned every resource and never gets pinned by Amdahl’s law during a spike.
    • The xAI Memphis cluster running at 11 percent MFU is not necessarily a failure mode. In disaggregated prefill plus decode inference you can deliver very high tokens per watt with very low MFU.
    • The right metric is performance, ultimately tokens per watt as a proxy for intelligence per watt, and even that needs adjustment because not all tokens are equal. Coding tokens are worth more than other tokens.
    • Hopper was designed for pre-training. NVIDIA chose to build multi-billion-dollar systems when the largest existing scientific supercomputer cost $350 million, with no proven customer base. It worked.
    • Grace Blackwell NVLink72 was designed for inference, especially the high-memory-bandwidth decode phase. It is the world’s first rack-scale computer and delivered a 50x speed-up over Hopper in two years, against an expected 2x from Moore’s Law.
    • Vera Rubin is designed for agents. It combines long-term memory wired into storage and into the GPU fabric, working memory, heavy tool use, and Vera, a CPU optimized for low-latency single-threaded code so a multi-billion-dollar GPU system does not stall waiting on a slow tool call.
    • Feynman is being shaped for swarms of agents with sub-agents and sub-sub-agents, a recursive software topology that demands a new compute pattern.
    • Tokens per watt improved 50x in one generation. Compounding energy efficiency is the lever NVIDIA controls directly.
    • Total compute energy demand is heading roughly a thousand times higher than today, possibly two orders of magnitude beyond that. Huang says he would not be surprised if the estimate is low.
    • For the first time in history, market forces alone are enough to fund solar, nuclear and grid upgrades. Government subsidies are no longer required to make sustainable energy investment rational.
    • Copper interconnect is becoming a bottleneck. Photonics is moving from optional to structural inside racks and across them.
    • Comparing NVIDIA GPUs to atomic bombs, Huang says, is a stupid analogy. A billion people use NVIDIA GPUs. He advocates them to his family. He does not advocate atomic bombs to anyone.
    • If the United States cedes two thirds of the global market to competitors on policy grounds, the American technology industry will end up like American telecommunications, which was policied out of existence.
    • Huang directly rejects AI doom-by-singularity narratives. It is not true that we have no idea how these systems work. It is not true that the technology becomes infinitely powerful in a nanosecond. He calls the rhetoric irresponsible and harmful to the field students are about to enter.
    • On Stanford specifically: if the university president places an order, NVIDIA will deliver the chips. The bottleneck is that no university department has a billion-dollar compute budget because budgeting is fragmented across grants. Stanford’s $40 billion endowment is more than enough to fix that.
    • “It’s Stanford’s fault” is meant as empowerment. If something is your fault, you can solve it.
    • Career advice: do not optimize purely for passion. Most people do not yet know what they love. Pick the job in front of you and do it as well as possible. Even as CEO, Huang says, 90 percent of the work is hard and he suffers through it.
    • Suffering on purpose builds the muscle of resilience. When the company, the team or the family needs you to be tough, that muscle has to already exist.
    • NVIDIA’s first generation of products was technically wrong in nearly every dimension: curved surfaces instead of triangles, no Z-buffer, forward instead of inverse texture mapping, no floating point. The strategic recovery, not the technology, taught Huang the lessons that have lasted decades.
    • The biggest clean strategic mistake Huang names is the move into mobile chips (Tegra). It grew to a billion dollars then went to zero when Qualcomm’s modem dominance shut NVIDIA out of the 3G to 4G transition. The recovery into automotive and robotics (the Thor chip is the great great great grandson of that mobile lineage) was real, but Huang refuses to rationalize the original choice.
    • Forecasting framework: observe, reason from first principles, ask “so what” and “what next” until you have a mental model of the future, place your company inside that model, then work backwards while minimizing opportunity cost and maximizing optionality.
    • Best part of the CEO job: living at the intersection of vision, strategy and execution surrounded by people capable enough to make ambitious visions real. Worst part: the responsibility for everyone who joined the spaceship, especially in the near-death moments NVIDIA had four or five times early on.
    • Underrated insider note: Huang’s first apple pie with cheese, first hot fudge sundae and first milkshake all happened at Denny’s. The Superbird, the fried chicken and a custom Superbird-style ham and cheese with tomato and mustard are his order.

    Detailed Summary

    Computing reinvented from the ground up

    Huang frames the moment as the first true rewrite of the computer in sixty-plus years. From the IBM System 360 forward, the mental model of writing code, running code, taking a computer to market and reasoning about applications stayed roughly constant. AI changes the programming model itself. Software is no longer a compiled binary running deterministically on a CPU. It is a neural network running on a GPU producing generated, contextual, real-time output. That cascades into how companies are organized, what tools developers use, what the network and storage stack look like, and what an application is even allowed to do. Robo-taxis, he notes, are an application no one would have attempted before deep learning unlocked perception.

    Codesign and the million-x decade

    Codesign is the philosophical center of the talk. Huang traces it to the RISC work of John Hennessy at Stanford, where simpler instruction sets won by being co-designed with the compiler rather than maximally optimized in isolation. NVIDIA extends the principle across every layer simultaneously: GPU architecture, CPU architecture, NVLink and NVSwitch fabrics, photonic interconnects, networking silicon, storage paths, CUDA libraries, frameworks and ultimately the model design. The numbers Huang gives are arresting. Moore’s Law in its prime delivered roughly 100x per decade. By the time Dennard scaling broke, real-world gains had compressed to roughly 10x. NVIDIA’s codesigned stack delivered between 100,000x and 1,000,000x over the same ten-year window. That non-linear speed-up is, in Huang’s telling, the precondition for modern AI: it is what allowed researchers to stop curating training sets and just feed the entire internet to the model.
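
    To make the gap concrete, the ten-year figures above can be converted into equivalent yearly multipliers. The sketch below is illustrative arithmetic only: it assumes each speed-up compounds at a constant yearly rate over the decade, which the talk does not claim, and the helper name `annualized_factor` is made up for this example.

```python
# Convert a total speed-up over a period into the equivalent constant yearly factor.
# Assumption: smooth compounding at a constant rate, used only for comparison.
def annualized_factor(total_speedup: float, years: float) -> float:
    """Constant per-year multiplier that compounds to total_speedup over years."""
    return total_speedup ** (1.0 / years)


# Ten-year figures as quoted in the recap above.
scenarios = {
    "Moore's Law in its prime (100x)": 100.0,
    "After Dennard scaling broke (10x)": 10.0,
    "Codesign, low end (100,000x)": 1e5,
    "Codesign, high end (1,000,000x)": 1e6,
}

rates = {name: annualized_factor(total, 10) for name, total in scenarios.items()}

for name, rate in rates.items():
    print(f"{name}: about {rate:.2f}x per year")
```

    Under that assumption, a million-x decade works out to roughly 4x every year, against roughly 1.6x per year for a 100x decade and about 1.26x per year for a 10x decade.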

    Education has to fuse first principles with AI tools

    Asked how curriculum should evolve, Huang argues AI must be integrated into the learning process, not just taught about. He recalls Hennessy writing his textbook by hand a chapter a week while Huang was a student, and says pre-recorded textbooks cannot keep up with the rate at which AI generates new knowledge. He describes his own learning workflow: hand the paper to an AI, then have it read the entire surrounding literature, then treat the AI as a dedicated researcher who can be interrogated. At the same time he defends the classics. Mead and Conway are still the foundation. Most modern semiconductor scaling tricks have been exhausted, but knowing where the field came from sharpens judgment when designing what comes next.

    Open source and the five domain pillars

    Huang gives one of the most detailed public accounts of why NVIDIA invests so heavily in open foundation models even while being a top customer of closed labs. He recommends Claude and OpenAI by name for production coding work, and says 100 percent of NVIDIA engineers are now agentically supported. The open-weights case rests on three legs. First, language is the codification of intelligence, and there are at least 230 languages that no commercial lab will ever prioritize. Nemotron is built near frontier and released so any country or community can fine-tune it. Second, the same representation-learning approach has to be replicated in domains where the data is not internet text, so NVIDIA seeded BioNeMo for biology, Alpamayo for autonomy, GR00T for humanoid robotics and a climate model for mesoscale multiphysics. The economics of those fields would never produce a foundation model on their own. Third, safety and security require transparency. A black box cannot be defended or audited, and the future of cyber defense is not bigger-model-versus-bigger-model but swarms of cheap fast small models like Nemotron Nano surrounding the threat.

    MFU is the wrong metric, tokens per watt is closer

    A student raises the leaked memo that the xAI Memphis cluster is running at 11 percent Model Flops Utilization. Huang flips the framing. He says he would rather be at low MFU all the time, because that means he over-provisioned flops, memory bandwidth, memory capacity and network capacity. Bottlenecks shift constantly, so over-provisioning across every dimension is what lets the system absorb a spike without getting pinned by Amdahl’s law. In disaggregated inference, where prefill and decode are physically separated and decode is bandwidth-bound rather than flop-bound, NVLink72 can deliver extremely high tokens per watt while reporting very low MFU. Huang argues the right framing is performance, and ultimately tokens per watt as a rough proxy for intelligence per watt, adjusted for the fact that not all tokens are equal. A coding token is worth more than a generic token.
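
    The claim that a bandwidth-bound decode phase can report low MFU while the hardware is fully busy can be sketched with a simple roofline model. Everything numeric below is a hypothetical assumption chosen for illustration (the peak FLOP/s, the memory bandwidth and the two arithmetic intensities); none of it describes a real NVIDIA part or the xAI cluster, and the function names are made up for this example.

```python
# Roofline sketch: attainable throughput is capped either by peak compute
# or by memory bandwidth times arithmetic intensity (FLOPs per byte moved).
def attainable_flops(peak_flops: float, mem_bandwidth: float, intensity: float) -> float:
    """Attainable FLOP/s for a kernel with the given arithmetic intensity."""
    return min(peak_flops, mem_bandwidth * intensity)


def mfu(peak_flops: float, mem_bandwidth: float, intensity: float) -> float:
    """Model FLOPs utilization: attainable FLOP/s divided by peak FLOP/s."""
    return attainable_flops(peak_flops, mem_bandwidth, intensity) / peak_flops


def bandwidth_utilization(peak_flops: float, mem_bandwidth: float, intensity: float) -> float:
    """Fraction of memory bandwidth needed to sustain the attainable FLOP/s."""
    bytes_per_second = attainable_flops(peak_flops, mem_bandwidth, intensity) / intensity
    return bytes_per_second / mem_bandwidth


# Hypothetical accelerator, not a real product.
PEAK_FLOPS = 1.0e15      # 1,000 TFLOP/s peak compute (assumed)
MEM_BANDWIDTH = 4.0e12   # 4 TB/s memory bandwidth (assumed)

# Hypothetical workloads: prefill reuses weights across many tokens (high
# intensity), decode streams weights for a few tokens at a time (low intensity).
PREFILL_INTENSITY = 500.0  # FLOPs per byte (assumed)
DECODE_INTENSITY = 32.0    # FLOPs per byte (assumed)

prefill_mfu = mfu(PEAK_FLOPS, MEM_BANDWIDTH, PREFILL_INTENSITY)
decode_mfu = mfu(PEAK_FLOPS, MEM_BANDWIDTH, DECODE_INTENSITY)
decode_bw = bandwidth_utilization(PEAK_FLOPS, MEM_BANDWIDTH, DECODE_INTENSITY)

print(f"prefill MFU: {prefill_mfu:.1%}")
print(f"decode MFU: {decode_mfu:.1%}, decode bandwidth utilization: {decode_bw:.1%}")
```

    With these assumed numbers, decode reports an MFU of about 13 percent even though memory bandwidth is saturated, which is the shape of the argument Huang makes: the flops figure alone says little about whether the system is delivering tokens efficiently.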

    Hopper, Grace Blackwell NVLink72, Vera Rubin, Feynman

    Huang gives the clearest public framing of NVIDIA’s roadmap as a sequence of architectural answers to evolving compute patterns. Hopper was built for pre-training, at a moment when NVIDIA chose to build multi-billion-dollar machines while the largest scientific supercomputer in the world cost $350 million and the marketplace for such systems was, on paper, zero. Grace Blackwell NVLink72 was the answer to inference and reasoning: a rack-scale computer that ganged 72 GPUs together because decode needs aggregate memory bandwidth far beyond a single chip. The generation-over-generation speed-up was 50x in two years, twenty-five times what Moore’s Law would have delivered. Vera Rubin is being built explicitly for agents. Agents load long-term memory from storage that has to be wired directly into the GPU fabric, they use working memory, they call tools that run on a CPU, and they wait. So the CPU has to be Vera, optimized for low-latency single-threaded code, because the multi-billion-dollar GPU system cannot afford to idle waiting on a slow tool call. Feynman extends the pattern to swarms of agents with sub-agents and sub-sub-agents, a recursive software topology that will demand its own compute pattern.

    Energy demand and the grid

    Huang’s energy projection is one of the most aggressive numbers in the talk. NVIDIA can compound tokens per watt by 50x per generation through codesign, but the total compute demand is heading roughly a thousand times higher, and Huang says he would not be surprised if the real figure is one or two orders of magnitude beyond that. The reason is structural: future computing is generative and continuous, not pre-recorded and on-demand. The good news, he argues, is that this is the best moment in the history of humanity to invest in sustainable generation. Market forces alone are now sufficient to fund solar, nuclear and grid upgrades. Government subsidies are no longer required to make the math work.

    Adversarial countries, export controls and the telecom warning

    This is the segment where Huang is visibly fired up. He attacks the GPUs-as-atomic-bombs framing on its face. NVIDIA GPUs power medical imaging, video games and soy sauce delivery. A billion people use them. He advocates them to his family. The analogy collapses at the first comparison. He attacks the second framing, that American companies should not compete abroad because they will lose anyway, as a self-fulfilling defeat. Competition makes the company better. The third framing, that depriving the rest of the world of general-purpose computing benefits the United States, also fails on first principles: it benefits one or two American companies at the cost of an entire industry. The cautionary parallel is telecommunications. The United States once had a leading position in telecom fundamental technology and policied itself out of it. Huang’s worry, voiced explicitly to a room of CS students, is that they will graduate into a shell of a computer industry if the same path is repeated.

    AI doom and rational optimism

    In the same arc Huang rejects the science-fiction framing of AI as a singularity that arrives suddenly on a Wednesday at 7pm and ends civilization. He calls those claims irresponsible, says they are not true, and points out that the people advancing them are believed by audiences who then make policy on that basis. It is not true that no one understands how these systems work. It is not true that intelligence becomes infinitely powerful instantaneously. It is not true that there is no defense. His framing, which the host echoes as “rational optimism,” is that the goal is to create a future where people care about computers because the technology students are learning is worth mastering.

    Stanford’s compute problem is Stanford’s fault

    A student presses on the scarcity of compute for independent researchers, startups and universities inside the United States. Huang’s answer is sharp: there is no shortage. Place the order and the chips will arrive. The actual broken thing is institutional. University grants are fragmented across departments. No researcher can raise enough on a single grant to fund a billion-dollar shared cluster, and no one shares. He compares it to showing up at the grocery store demanding a billion dollars of tomatoes today. The solution is planning, aggregation and a campus-scale supercomputer, the way Stanford once built the linear accelerator. The endowment is $40 billion. Pulling a billion off it, contracting cloud capacity and giving every student and researcher AI supercomputer access is, in Huang’s view, obviously doable. When he says “it is Stanford’s fault” the host laughs, but Huang clarifies: if it is your fault you have the power to fix it.

    Career, suffering and resilience

    Asked how a CS student should spend the next few years, Huang pushes back on the standard “follow your passion” advice. Most people do not know what they love yet, because no one knows what they do not know. The bar of demanding joy from every working day is too high. Whatever the job is, do it as well as you can. Even as CEO of NVIDIA he says he genuinely loves about 10 percent of his work. The other 90 percent is hard and he suffers through it. He recommends suffering on purpose, because resilience is a muscle that only builds under load, and when the company, the team or the family needs that muscle, it has to already exist. Earlier in his life that meant cleaning toilets and busing tables at Denny’s. He does it today running a multi-trillion-dollar company.

    The biggest mistakes

    Huang separates technical mistakes from strategic mistakes. NVIDIA’s first generation of products was technically wrong in almost every way: curved surfaces instead of triangles, no Z-buffer, forward instead of inverse texture mapping, no floating point inside. The company wasted two and a half years. But the strategic genius of the recovery, the reading of the market, the conservation of resources and the reapplication of talent, is what taught him strategy. The clean strategic mistake he names is mobile. NVIDIA’s Tegra line grew to a billion dollars of revenue and then collapsed to zero when Qualcomm’s modem dominance locked NVIDIA out of the 3G to 4G transition. Huang explicitly refuses the comforting rationalization that the Tegra effort fed the Thor automotive chip (“Thor is the great great great grandson”). The original decision, he says, was a waste of time. The lesson is to think one or two clicks further about whether a market is structurally winnable before committing the company.

    Forecasting under fog of war

    The final substantive exchange is on forecasting. Huang’s method has four steps. Observe what is actually happening (AlexNet crushing two decades of computer vision research in one shot, GPT producing reasoning by token generation). Reason from first principles about why it works. Ask “so what” and “what next” recursively until a mental model of the future emerges. Place the company inside that future and work backwards. Crucially, expect to be partly wrong. Some outcomes will absolutely happen, some will likely happen, some might happen, and the strategy has to be robust across that distribution. The real cost of any strategic choice is the opportunity cost of the alternatives you did not take, so the discipline is to minimize that cost and maximize optionality while letting the journey itself pay for the journey.

    Thoughts

    The most useful thing in this conversation is the explicit architectural mapping of compute patterns to chip generations. Hopper for pre-training. Grace Blackwell NVLink72 for inference, because decode is bandwidth-bound and a single chip cannot supply it. Vera Rubin for agents, because tool calls stall multi-billion-dollar GPU systems and so the CPU has to be optimized for low-latency single-threaded code. Feynman for swarms. That sequence is not marketing. It is a falsifiable thesis about where the bottleneck moves next, and every other infrastructure company should be measuring themselves against it. If Huang is right that swarms of sub-agents are the next dominant pattern, then the design pressure shifts from raw flops to fabric topology, memory hierarchy and storage-to-GPU latency. That has implications for everyone downstream, including the hyperscalers building competing accelerators.

    The MFU section is the most intellectually generous moment in the talk. The instinct in the AI ops community has been to chase MFU as if it were a virtue. Huang argues, persuasively, that low MFU is consistent with high tokens per watt in a disaggregated inference setup, and that bottlenecks rotate fast enough that over-provisioning every resource is the rational design. That reframing matters because it changes what “scarce” means. Compute is not scarce in the way the discourse treats it. What is scarce is a coherent system designed end-to-end. The xAI 11 percent number, in that frame, is not embarrassing. It is the natural reading of a workload that is mostly decode.

    The Stanford segment is the part most likely to be quoted out of context. “It’s Stanford’s fault” is a deliberately provocative line, but the underlying claim is correct and load-bearing. Compute is not gated by NVIDIA refusing to ship chips. It is gated by the fact that fragmented grant funding cannot aggregate into the billion-dollar order that NVIDIA can fulfill. The implication is that universities and national labs need a structural change in how they pool capital for compute, and that the current model of every researcher buying a handful of cards is genuinely obsolete. Huang’s nudge about pulling a billion off the endowment is concrete enough to be acted on, and other major research universities should read this segment as a direct prompt.

    The geopolitical segment is the highest-stakes one. The telecommunications comparison is correct as a historical pattern, and Huang is one of the very few executives in a position to deliver that warning credibly. The unresolved tension is that the argument applies symmetrically. If American AI dominance is built by selling globally, that includes selling into adversarial states, and the policy question is where the line falls. Huang does not answer that question. He attacks the framing that lets the question be answered badly. That is a meaningful contribution to the discourse even if it does not resolve the underlying tradeoff.

    The career advice section is the part the social-media clips will mishandle. “Seek suffering” reads as macho when extracted. In context it is a specific operational claim about how resilience compounds, and it is paired with the Tegra story where Huang himself paid the price of not thinking one more click ahead. That kind of self-implication is rare in CEO talks, and it is the reason the talk is worth listening to in full rather than only reading the recap.

    Watch the full Stanford CS153 Frontier Systems conversation with Jensen Huang here.

  • Krishna Rao on Anthropic Going From 9 Billion to 30 Billion ARR in One Quarter and the Compute Strategy Powering Claude

    Krishna Rao, Chief Financial Officer of Anthropic, sat down with Patrick O’Shaughnessy on Invest Like the Best for one of the most detailed public looks yet at the operating engine behind Claude. He covers how Anthropic compounded from $9 billion of run rate revenue at the start of the year to north of $30 billion by the end of Q1, why he spends 30 to 40 percent of his time on compute, the playbook for buying gigawatts of AI infrastructure across Trainium, TPU, and GPU platforms, how Anthropic prices its models, why returns to frontier intelligence keep climbing, and what the Mythos release tells us about the cyber capabilities of the next generation of Claude.

    TLDW

    Anthropic is running the most compute fungible frontier lab in the world, with active deployments across AWS Trainium, Google TPU, and Nvidia GPU, and an internal orchestration layer that lets a chip serve inference in the morning and run reinforcement learning the same evening. Krishna Rao explains the cone of uncertainty that governs gigawatt scale compute procurement, the floor Anthropic refuses to drop below on model development compute, the Jevons paradox unlock from cutting Opus pricing, the 500 percent annualized net dollar retention from enterprise customers, the layer cake of long term deals with Google, Broadcom, Amazon, and the recent xAI Colossus tie up in Memphis, the phased release of the Mythos model in response to spiking cyber capabilities, the internal use of Claude Code to produce statutory financial statements and run a Monthly Financial Review skill, and why the team believes scaling laws are alive and well. The interview also covers fundraising history through Series D and Series E, the $75 billion already raised plus another $50 billion coming, talent density beating talent mass during the Meta poaching wave, and Rao’s belief that biotech and drug discovery represent the most exciting frontier for AI.

    Key Takeaways

    • Anthropic entered the year with about $9 billion of run rate revenue and ended the first quarter with north of $30 billion of run rate revenue, a more than 3x leap driven by model intelligence gains and the products built around them.
    • Compute is described as the lifeblood of the company, the canvas everything else is built on, and the most consequential class of decisions Rao makes. Buy too much and you go bankrupt. Buy too little and you cannot serve customers or stay at the frontier.
    • Rao spends 30 to 40 percent of his time on compute, even today, and the leadership team meets repeatedly on both procurement and ongoing compute allocation.
    • Anthropic is the only frontier language lab actively using all three major chip platforms in production: AWS Trainium, Google TPU, and Nvidia GPU. It is also the only major model available on all three clouds.
    • Flexibility is the central design principle. Anthropic builds flexibility into the deals themselves, into the orchestration layer that maps workloads to chips, and into compilers built from the chip level up.
    • The cone of uncertainty frames procurement. Small differences in weekly or monthly growth compound into wildly different two year outcomes, so the team plans across a range of scenarios rather than a single point estimate, and ranges toward the upper end while protecting downside.
    • Compute allocation across the company sits in three buckets: model development and research, internal employee acceleration, and external customer serving. A non negotiable floor protects model development even when customer demand is tight.
    • Anthropic estimates that if it cut off internal employee use of its own models, the freed compute could serve billions of dollars of additional revenue. It chooses not to, because internal use compounds into better future models.
    • Intelligence is multi dimensional, not a single IQ score. Anthropic measures real world capability through customer feedback, long horizon task performance, tool use, computer use, and speed at agentic tasks, not just leaderboard benchmarks that have largely saturated.
    • Each Opus generation, 4 to 4.5 to 4.6 to 4.7, delivers both capability improvements and an efficiency multiplier on token processing. New models often serve customers at a fraction of the prior cost while doing more.
    • Reinforcement learning is described as inference inside a sandbox with a reward function, so model efficiency gains directly improve internal RL throughput. The flywheel is tightly coupled.
    • Over 90 percent of code at Anthropic is now written by Claude Code, and a large share of Claude Code itself is written by Claude Code.
    • Anthropic shipped roughly 30 distinct product and feature releases in January and the pace has accelerated since.
    • Scaling laws, in Anthropic’s internal data, are alive and well. The team holds itself to a skeptical scientific standard and still does not see them slowing down.
    • Anthropic recently signed a 5 gigawatt deal with Google and Broadcom for TPUs starting in 2027, plus an Amazon Trainium agreement for up to 5 gigawatts, totaling more than $100 billion in commitments. A significant portion lands this year and next year.
    • A new partnership for capacity at the xAI Colossus facility in Memphis was announced just before the interview, aimed at expanding consumer and prosumer capacity.
    • Pricing has been remarkably stable across Haiku, Sonnet, and Opus. The biggest deliberate change was lowering Opus pricing, which produced a textbook Jevons paradox: consumption rose far faster than the price drop, and the new Opus 4.6 and 4.7 slot in at the same price point.
    • Mythos is the first model Anthropic chose to release in a phased way because of a sharp spike in cyber capability. In an open source codebase where a prior model found 22 security vulnerabilities, Mythos found roughly 250.
    • The Mythos release framework focuses on defensive use first, expands access over time, and is presented as a template for future capability spikes.
    • Anthropic now sells to 9 of the Fortune 10 and reports net dollar retention above 500 percent on an annualized basis. These are not pilots. Rao describes signing two double digit million dollar commitments during a 20 minute Uber ride to the studio.
    • The platform strategy is mostly horizontal. Anthropic will go vertical with offerings like Claude for Financial Services, Claude for Life Sciences, and Claude Security where it can demonstrate the model’s capabilities, but expects most application value to accrue to customers building on top.
    • Anthropic has raised over $75 billion in equity since Rao joined, with another $50 billion in commitments tied to the Amazon and Google deals. Capital intensity is real, but the raises fund the upper end of the cone of uncertainty more than they fund current losses.
    • The Series E close coincided with the day the DeepSeek news broke, forcing investors to reassess their AI thesis in real time. Anthropic closed the round anyway.
    • Inside finance, Claude now produces statutory financial statements for every Anthropic legal entity, with a human checker. A library of more than 70 finance specific skills underpins workflows.
    • A custom Monthly Financial Review skill produces a 90 to 95 percent ready monthly close report, so leadership discussion shifts from reconciling numbers to debating implications.
    • An internal real time analytics platform called Anthrop Stats compresses weekly insight cycles from hours to about 30 minutes.
    • The biggest token user inside Anthropic’s finance team is the head of tax, focused on tax policy engines and workflow automation. The most senior people, not the youngest, are leading internal adoption.
    • Talent density beats talent mass. When Meta and others ran aggressive offer waves, Anthropic lost two people while peer labs lost dozens.
    • All seven Anthropic co founders remain at the company, as do most of the first 20 to 30 employees, which Rao credits to a collaborative, transparent, debate friendly culture and a real culture interview that can veto otherwise top tier candidates.
    • Dario Amodei holds an open all hands every two weeks, writes a short prepared document, and takes unscripted questions from anyone at the company.
    • AI safety investments in interpretability and alignment have a commercial side effect. Looking inside the model helps Anthropic build better models, and enterprises with sensitive workloads want to trust the lab they hand customer data to.
    • Anthropic explicitly identifies as America first in its approach to model development, and engages closely with the US administration on capability releases such as Mythos.
    • The longer term product vision is the virtual collaborator: an agent with organizational context, access to the company’s tools, persistent memory, and the ability to work on ideas, not just tasks, over long horizons.
    • CoWork, Anthropic’s extension of the Claude Code paradigm into general knowledge work, is being adopted faster than Claude Code itself when indexed to the same point in its launch curve.
    • Anthropic’s product teams ship daily, with a fleet of agents working across the company on specific tasks. Everyone effectively becomes a manager of agents.
    • The dominant downside risks to Anthropic’s high end forecast are slower customer diffusion of model capability into real workflows, scaling laws flattening unexpectedly, and Anthropic losing its position at the frontier.
    • Rao is most excited about biotech and healthcare outcomes, especially the prospect that AI could push drug discovery and lab throughput up 10x or 100x, turning currently incurable diagnoses into treatable ones within a patient’s lifetime.

    Detailed Summary

    Compute as Lifeblood and the Cone of Uncertainty

    Rao opens with the claim that compute is the most important resource at Anthropic, and the most consequential decision class in the company. You cannot buy a gigawatt of compute next week. You have to anticipate demand a year or two in advance, and the cost of being wrong in either direction is high. Buy too much and the unit economics collapse. Buy too little and you cannot serve customers or stay at the frontier, which are described as the same failure mode. To navigate this, the team uses a cone of uncertainty rather than point estimates. Small differences in weekly growth compound into vastly different two year outcomes, and Anthropic tries to position itself toward the upper end of that cone while preserving optionality. Rao notes he has had to consciously break a lifetime of linear thinking and force himself into exponential models.
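
    The compounding claim above can be made concrete with a short sketch. The weekly growth rates below are hypothetical and do not come from the interview; the point is only that a spread of a couple of percentage points per week widens into a several-fold gap over a two year horizon.

```python
# Illustrative only: the growth rates are assumed, not figures from the interview.
WEEKS_PER_YEAR = 52


def compound(start: float, weekly_growth: float, weeks: int) -> float:
    """Value after `weeks` of constant compound weekly growth."""
    return start * (1.0 + weekly_growth) ** weeks


def cone(start: float, low: float, high: float, weeks: int) -> tuple[float, float]:
    """Lower and upper edge of the cone at a given horizon."""
    return compound(start, low, weeks), compound(start, high, weeks)


if __name__ == "__main__":
    horizon = 2 * WEEKS_PER_YEAR
    for rate in (0.01, 0.02, 0.03):
        multiple = compound(1.0, rate, horizon)
        print(f"{rate:.0%} weekly growth -> {multiple:.1f}x after two years")
```

    Under these assumed rates, 1 percent weekly growth yields a bit under 3x over 104 weeks while 3 percent yields more than 20x, which is why a range of outcomes is more honest than a single point estimate.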

    Three Chip Platforms, One Orchestration Layer

    Anthropic uses Amazon’s Trainium, Google’s TPUs, and Nvidia’s GPUs fungibly. That was not free. Adopting TPUs at scale started around the third TPU generation, when outside observers thought it was a strange choice. Anthropic invested years into compilers and orchestration so workloads can flow across chips by generation and by job type. The team works deeply with Annapurna Labs at AWS to influence Trainium roadmaps because Anthropic stresses these chips harder than almost anyone. The result is what Rao believes is the most efficient utilization of compute across any frontier lab, with a dollar of compute going further inside Anthropic than anywhere else.

    Three Buckets and the Model Development Floor

    Compute gets allocated across model development, internal acceleration of employees, and customer serving. The conversations are collaborative rather than zero sum, but there is a hard floor on model development that the company refuses to cross even if it makes customer demand harder to serve in the short term. The thesis is simple. The returns to frontier intelligence are extremely high, especially in enterprise, so cutting model investment to chase near term revenue is a bad trade. Internal employee use is also explicitly protected. Rao notes that diverting that internal usage to external customers would unlock billions of additional revenue today, but the compounding benefit of accelerating researchers and engineers outweighs that.

    Intelligence Is Multi Dimensional

    Rao pushes back hard on the IQ framing of model progress. Benchmarks saturate quickly, and the real signal comes from how customers actually use the models. Anthropic looks at long horizon task completion, tool use, computer use, and time to result on agentic tasks. Two equally capable agents that differ only in speed produce dramatically different value, because the faster one gets through more attempts and reaches more outcomes in the same time. Frontier model leaps are also fuel efficient. The sedan to sports car analogy, in which more performance costs more fuel, breaks down because each Opus generation, 4 to 4.5 to 4.6 to 4.7, delivers both a step up in capability and a multiplier on per token efficiency.
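
    One way to see why speed alone changes value is a toy model of repeated attempts. Everything here is an assumption made for illustration: the per attempt success probability, the time budget, the attempt durations, and the simplification that attempts are independent.

```python
# Toy model: all parameters are hypothetical, and attempts are assumed independent.

def attempts_in_budget(budget_hours: float, hours_per_attempt: float) -> int:
    """Number of whole attempts that fit in the time budget."""
    return int(budget_hours // hours_per_attempt)


def p_at_least_one_success(p_success: float, attempts: int) -> float:
    """Probability that at least one of `attempts` independent tries succeeds."""
    return 1.0 - (1.0 - p_success) ** attempts


if __name__ == "__main__":
    budget = 8.0       # hours available (assumed)
    p = 0.2            # same per attempt success rate for both agents (assumed)
    for name, hours in (("slow agent", 4.0), ("fast agent", 1.0)):
        n = attempts_in_budget(budget, hours)
        print(f"{name}: {n} attempts, P(success) = {p_at_least_one_success(p, n):.2f}")
```

    With identical capability per attempt, the agent that is four times faster gets four times as many tries in the same window, and its chance of landing at least one good result rises accordingly.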

    From 9 Billion to 30 Billion ARR in One Quarter

    The headline number for the quarter is a leap from about $9 billion of run rate revenue to over $30 billion, accomplished without onboarding a corresponding step up in compute, because new compute lands on ramps locked in 12 months prior. Rao attributes the leap to model capability gains, products that surface that intelligence in usable form factors, and an enterprise customer base that pulls more workloads onto Claude as each generation unlocks new use cases. Coding started the wave with Sonnet 3.5 and 3.6, and the same pattern is now playing out elsewhere in the economy.
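
    As a rough back of the envelope check, the jump from about $9 billion to over $30 billion can be converted into an implied weekly growth rate. The 13 week quarter and the assumption of a constant compounding rate are simplifications introduced here, not claims from the interview.

```python
# Back of the envelope: assumes a 13 week quarter and constant compound growth.

def implied_weekly_growth(start: float, end: float, weeks: int) -> float:
    """Constant weekly growth rate that takes `start` to `end` in `weeks` weeks."""
    return (end / start) ** (1.0 / weeks) - 1.0


if __name__ == "__main__":
    rate = implied_weekly_growth(9.0, 30.0, 13)  # run rate revenue in $ billions
    print(f"Implied weekly growth: {rate:.1%}")
```

    Under those assumptions the implied rate is a little under 10 percent per week, far outside what a linear forecast would anticipate.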

    Recursive Self Improvement and Talent Density

    Over 90 percent of Anthropic’s code is now written by Claude Code, including most of Claude Code itself. Rao describes this as a structural reason to keep allocating internal compute to employees even when external demand is hungry. Recursive self improvement is not happening through models that need no humans. It is happening through researchers who set direction and use frontier models to compress months of work into days. Talent density beats talent mass. When Meta and other labs went after Anthropic researchers with very large packages, Anthropic lost two people while peer labs lost dozens.

    Procurement Strategy and the Layer Cake

    Compute lands as a layer cake. Last month Anthropic signed a 5 gigawatt TPU deal with Google and Broadcom starting in 2027, alongside an Amazon Trainium agreement for up to 5 gigawatts. The total is north of $100 billion in commitments. A new tie up with xAI’s Colossus facility in Memphis was announced just before the interview, intended for nearer term capacity to support consumer and prosumer growth. Anthropic evaluates near term and long term compute deals against the same set of variables: price, duration, location, chip type, and how efficiently the team can run it. The relationships are deeper than procurement. The hyperscalers are also distribution channels for the model.

    Platform First, Selective Vertical Bets

    Rao describes Anthropic as a platform first business, with most expected value accruing to customers building on the platform. The team will only go vertical in two cases: when it can demonstrate capabilities that are skating to where the puck is going, as Claude Code did before the models could fully support it, or when it wants to set a template for an industry vertical, as with Claude for Financial Services, Claude for Life Sciences, and Claude Security. He acknowledges that surprise capability jumps make customers anxious about the platform competing with them, and frames Anthropic’s mitigation as deeper partnerships, early access programs, and an emphasis on accelerating customer building rather than disintermediating it.

    Pricing, Jevons Paradox, and Return on Compute

    Pricing across Haiku, Sonnet, and Opus has been stable. The notable exception is Opus, which Anthropic deliberately repriced lower when launching Opus 4.5 because Opus class problems were being squeezed into Sonnet workloads. Efficiency gains made it possible to serve Opus profitably at the new level. The consumption response was a classic Jevons paradox, with usage rising far more than the price reduction would have predicted, and Opus 4.6 then slotted in at the same price with a capability bump. Margins are not framed as a per token markup. Compute is fungible across model development, internal acceleration, and customer serving, so Anthropic measures return on the entire compute envelope rather than software style variable cost per call.
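
    The Jevons effect described above has a simple arithmetic core: a price cut raises revenue only if usage grows by more than the inverse of the price multiple. The sketch below uses hypothetical numbers (a price halved, usage tripled), since the interview summary does not give the actual figures, and uses the standard midpoint formula for elasticity.

```python
# Hypothetical figures: the actual Opus price change and usage response are not given.

def revenue_multiple(price_multiple: float, usage_multiple: float) -> float:
    """New revenue as a multiple of old revenue."""
    return price_multiple * usage_multiple


def breakeven_usage_multiple(price_multiple: float) -> float:
    """Usage growth needed for revenue to stay flat after a price change."""
    return 1.0 / price_multiple


def arc_elasticity(p0: float, p1: float, q0: float, q1: float) -> float:
    """Midpoint (arc) price elasticity of demand."""
    pct_quantity = (q1 - q0) / ((q1 + q0) / 2.0)
    pct_price = (p1 - p0) / ((p1 + p0) / 2.0)
    return pct_quantity / pct_price


if __name__ == "__main__":
    price_mult, usage_mult = 0.5, 3.0  # assumed: price halved, usage tripled
    print("Breakeven usage multiple:", breakeven_usage_multiple(price_mult))
    print("Revenue multiple:", revenue_multiple(price_mult, usage_mult))
    print("Arc elasticity:", arc_elasticity(1.0, price_mult, 1.0, usage_mult))
```

    An elasticity with magnitude above 1 means total spend rises after the cut, which is the pattern the summary attributes to the Opus repricing.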

    Fundraising, DeepSeek, and Capital Intensity

    Rao joined while Anthropic was closing its Series D, in the middle of a frontier model launch and during the FTX share liquidation. Investors initially questioned whether Anthropic needed a frontier model, whether AI safety and a real business could coexist, and why the sales team was so small. The Series E closed the same day the DeepSeek news broke, with markets violently repricing AI in real time. Since Rao joined, Anthropic has raised over $75 billion, with another $50 billion tied to the Amazon and Google compute deals. The reason for the size of the raises is the cone of uncertainty, not current losses. Returns on compute today are described as robust.

    Mythos, Cyber Capability, and Phased Releases

    The Mythos release marks the first time Anthropic shipped a model under a deliberately phased rollout because of a specific capability spike. Cyber is the dimension that spiked. Where a prior model found 22 vulnerabilities in an open source codebase, Mythos found roughly 250. The defensive applications, automatically patching massive codebases, are genuinely valuable, but the offensive risk is real enough that Anthropic chose to release to a smaller group first and expand access over time. Rao positions this as a template for future capability spikes, not a permanent restriction. He also describes the relationship with the US administration as cooperative, including the Department of War interaction, with Anthropic supporting a regulatory framework that does not strangle innovation but takes responsibility seriously.

    Claude Inside Finance

    Anthropic’s finance team is one of the strongest internal case studies. Statutory financial statements for every legal entity are produced by Claude, with a human reviewer. A skill library of more than 70 finance specific skills underpins a Monthly Financial Review skill that drafts the monthly close at 90 to 95 percent ready, so leadership meetings shift from explaining the numbers to discussing what to do about them. An internal analytics platform called Anthrop Stats compresses weekly insight cycles from hours to 30 minutes. The biggest internal token user in finance is the head of tax, building policy engines, which Rao highlights as evidence that adoption is driven by the most senior people, not just younger engineers.

    Culture, Co Founders, and the Race to the Top

    Seven co founders should not, on paper, work as a leadership group. Rao argues it works because the culture was set early around collaboration, intellectual honesty, transparency, and humility. The culture interview is a real veto, not a checkbox. Dario Amodei runs an all hands every two weeks with a short written piece followed by unscripted questions, and decisions, once made, get clean alignment rather than residual politics. Anthropic frames its approach as a race to the top, where being a model for how to build the technology responsibly is itself a recruiting and retention advantage.

    The Virtual Collaborator and the Frontier Ahead

    The product vision Rao describes is the virtual collaborator. Not just a smarter chatbot, but an agent with organizational context, access to the company’s tools, memory, and the ability to work on ideas over long horizons. Coding was the first domain to feel this, but CoWork, Anthropic’s extension of the Claude Code pattern into general knowledge work, is being adopted faster than Claude Code was at the same age. Product development inside Anthropic already looks different. Teams ship daily, with fleets of agents working across the company, and individual humans increasingly act as managers of those fleets.

    Downside Risks and What Excites Him Most

    The three risks Rao names when asked to do a premortem on a softer year are slower customer diffusion of model capability into real workflows, scaling laws unexpectedly flattening, and Anthropic losing its frontier position to competitors. None of these is observed today, but he is unwilling to rule them out with certainty. On the upside, he is most excited about biotech and healthcare. Lab throughput rising 10x or 100x, paired with AI assisted clinical workflows, could turn currently incurable diagnoses into treatable ones within a patient’s lifetime. That is the outcome he wants the technology to chase.

    Thoughts

    The most consequential structural point in this interview is the framing of compute as a single fungible resource pool measured by return on the entire envelope, not as a variable cost per inference call. That accounting shift, if you accept it, breaks most of the bear cases about AI lab unit economics. The bear argument almost always assumes that a token served to a customer is the only thing the chip did that day. Rao’s version is that the same fleet trains models in the morning, runs reinforcement learning at lunch, serves customers in the afternoon, and accelerates internal engineers in the evening. If even half of that is real, the right comparison is total compute spend versus total enterprise value created by the platform, and on that ratio Anthropic looks structurally strong rather than weak.

    The Jevons paradox on Opus pricing is the most actionable insight for anyone running an AI product. Most teams default to either chasing premium pricing on the newest model or undercutting to chase volume. Anthropic did something more disciplined: it left Sonnet and Haiku alone, dropped the Opus price when efficiency gains made it servable at that level, and watched aggregate usage rise faster than the price cut. The lesson is that frontier model pricing is not really a price problem. It is a capability access problem, and elasticity around the right tier is much higher than the standard SaaS playbook implies.

    The Mythos cyber jump deserves more attention than it has gotten. Going from 22 to 250 vulnerabilities found in the same codebase is the kind of capability discontinuity that genuinely changes the regulatory calculus. Anthropic is signaling that it can identify these discontinuities ahead of release and choose a deployment shape that respects them. Whether peer labs adopt similar discipline is the open question. Anthropic’s race to the top framing assumes they will be forced to. The competitive market may say otherwise.

    The hiring data point is the most underrated investor signal. Two departures while peer labs lost dozens, during the most aggressive talent war in tech history, is not a culture poster. It is a structural advantage that compounds every time another lab tries to buy its way to the frontier. Money can be matched. Conviction in the mission, transparent leadership, and a culture interview that can veto otherwise stellar candidates cannot. If you believe scaling laws hold, talent retention at this density is one of the few moats that actually scales with capital.

    Finally, the most interesting personal admission is that Krishna Rao, a finance leader trained at Blackstone and Cedar, is openly telling investors that linear thinking is the failure mode he had to break out of. The companies that pattern match this moment to prior technology waves are mispricing it, in both directions. The cone of uncertainty Anthropic uses internally is the right metaphor for everyone else too. If you are forecasting AI as if it is cloud in 2010, you are almost certainly wrong, and the magnitude of the error is much larger than it would be in any prior era.

    Watch the full conversation with Krishna Rao on Invest Like the Best here.

  • Jensen Huang on Nvidia’s Supply Chain Moat, TPU Competition, China Export Controls, and Why Nvidia Will Not Become a Cloud (Dwarkesh Podcast Summary)

    TLDW (Too Long, Didn’t Watch)

    Jensen Huang sat down with Dwarkesh Patel for over 90 minutes covering Nvidia’s supply chain dominance, the TPU threat, why Nvidia will not become a hyperscaler, whether the US should sell AI chips to China, and why Nvidia does not pursue multiple chip architectures at once. Jensen framed Nvidia’s entire business as transforming “electrons into tokens” and argued that Nvidia’s real moat is not any single technology but the full stack ecosystem it has built over two decades. He was blunt about his regret over not investing in Anthropic and OpenAI earlier, passionate about keeping the American tech stack dominant worldwide, and dismissive of the idea that China’s chip industry can be meaningfully contained through export controls.

    Key Takeaways

    1. Nvidia’s moat is the ecosystem, not the chip. Jensen repeatedly emphasized that Nvidia’s competitive advantage comes from CUDA, its massive installed base, its deep partnerships across the entire supply chain, and the fact that it operates in every cloud. The moat is not a single product but an interlocking system that took 20+ years to build.

    2. Supply chain bottlenecks are temporary, energy bottlenecks are not. Jensen argued that CoWoS packaging, HBM memory, EUV capacity, and logic fabrication bottlenecks can all be resolved in two to three years with the right demand signal. The real constraint on AI scaling is energy policy, which takes far longer to fix.

    3. TPUs and ASICs are not an existential threat to Nvidia. Jensen was emphatic that no competitor has demonstrated better price-performance or performance-per-watt than Nvidia, and challenged TPU and Trainium to prove otherwise on public benchmarks like InferenceMAX and MLPerf. He described Anthropic as a “unique instance, not a trend” for TPU adoption.

    4. Jensen regrets not investing in Anthropic and OpenAI earlier. He admitted he did not deeply internalize how much capital AI labs needed and that traditional VC funding was not sufficient for companies at that scale. He described this as a clear miss, though he said Nvidia was not in a position to make multi-billion dollar investments at the time.

    5. Nvidia will not become a hyperscaler. Jensen’s philosophy is “do as much as needed, as little as possible.” Building cloud infrastructure is something other companies can do, so Nvidia supports neoclouds like CoreWeave, Nebius, and Nscale instead of competing with them. Nvidia invests in ecosystem partners rather than vertically integrating into cloud services.

    6. Jensen is strongly against US chip export controls on China. This was the longest and most heated segment of the interview. Jensen argued that China already has abundant compute, energy, and AI researchers, and that export controls have accelerated China’s domestic chip industry while causing the US to concede the world’s second-largest technology market. He compared the situation to how US telecom policy allowed Huawei to dominate global telecommunications.

    7. AI will cause software tool usage to skyrocket, not collapse. Jensen pushed back on the narrative that AI will commoditize software companies. He argued that agents will use existing tools at massive scale, causing the number of instances of products like Excel, Synopsys Design Compiler, and other enterprise tools to grow exponentially.

    8. Nvidia does not pick winners among AI labs. Jensen explained that Nvidia invests across multiple foundation model companies simultaneously and refuses to favor any single one. He cited his own company’s unlikely survival story as the reason for this humility: Nvidia’s original graphics architecture was “precisely wrong” and would have been counted out by anyone picking winners.

    9. Nvidia added Groq for premium token economics. Nvidia recently acquired Groq and is folding it into the CUDA ecosystem because the market is now segmenting into different token tiers. Some customers will pay premium prices for faster response times even at lower throughput, creating a new segment of the inference market.

    10. Without AI, Nvidia would still be very large. Jensen was clear that accelerated computing, not AI specifically, is the foundational mission of the company. Molecular dynamics, quantum chemistry, computational lithography, data processing, and physics simulation all benefit from GPU acceleration regardless of deep learning.

    Detailed Summary

    Nvidia’s Real Business: Electrons to Tokens

    Jensen opened the conversation by reframing Nvidia’s entire value proposition. When Dwarkesh suggested that Nvidia is fundamentally a software company that sends a GDSII file to TSMC for manufacturing, Jensen pushed back hard. He described Nvidia’s job as transforming electrons into tokens, with everything in between representing an “incredible journey” of artistry, engineering, science, and invention. He said the transformation is far from deeply understood and the journey is far from over, making commoditization unlikely.

    Jensen described Nvidia as operating a philosophy of doing “as much as necessary and as little as possible.” Whatever Nvidia does not need to do itself, it partners with someone else and makes it part of the broader ecosystem. This is why Nvidia has what Jensen called probably the largest ecosystem of partners in the industry, spanning the full supply chain upstream and downstream, application developers, model makers, and all five layers of the AI stack.

    On the question of whether AI will commoditize software companies, Jensen offered a contrarian take. He argued that agents are going to use software tools at unprecedented scale, meaning the number of instances of products like Excel, Cadence design tools, and Synopsys compilers will skyrocket. Today the bottleneck is the number of human engineers. Tomorrow, those engineers will be supported by swarms of agents exploring design spaces and using the same tools humans use today. Jensen said the reason this has not happened yet is simply that the agents are not good enough at using tools. That will change.

    The Supply Chain Moat

    Dwarkesh pressed Jensen on Nvidia’s reported $100 billion (and potentially $250 billion) in purchase commitments with foundries, memory manufacturers, and packaging companies. The question was whether Nvidia’s real moat for the next few years is simply locking up scarce upstream components so that no competitor can get the memory and logic they need to build alternative accelerators.

    Jensen confirmed this is a significant advantage but framed it differently. He said Nvidia has made enormous explicit and implicit commitments upstream. The implicit commitments matter just as much: Jensen personally meets with CEOs across the supply chain to explain the scale of the coming AI industry, convince them to invest in capacity, and assure them that Nvidia’s downstream demand is large enough to justify that investment. Nvidia’s GTC conference serves this purpose too, bringing the entire ecosystem together so upstream suppliers can see downstream demand and vice versa.

    Jensen described a process of systematically “prefetching bottlenecks” years in advance. CoWoS advanced packaging was a major bottleneck two years ago, but Nvidia swarmed it with repeated doubling of capacity until TSMC recognized it as mainstream computing technology rather than a specialty product. More recently, Nvidia has invested in the silicon photonics ecosystem through partnerships with Lumentum and Coherent, invented new packaging technologies, licensed patents to keep the supply chain open, and even invested in new testing equipment like double-sided probing.

    When Dwarkesh asked about the ultimate physical bottlenecks, Jensen surprised him. The hardest bottleneck to solve is not CoWoS or HBM or EUV machines. It is plumbers and electricians needed to build data centers. Jensen used this as a launching point to criticize “doomers” who discourage people from pursuing careers in software engineering or radiology, arguing that scaring people out of these professions creates the real bottlenecks.

    On EUV and logic scaling specifically, Jensen was optimistic. He said no supply chain bottleneck lasts longer than two to three years. Once you can build one of something, you can build ten, and once you can build ten, you can build a million. The key is a clear demand signal. If TSMC is convinced of the demand, ASML will produce enough EUV machines. Meanwhile, Nvidia continues to improve computing efficiency by 10x to 50x per generation through architecture, algorithms, and system design.

    The TPU Question

    Dwarkesh pushed hard on whether Google’s TPUs represent a real threat, noting that two of the top three AI models (Claude and Gemini) were trained on TPUs. Jensen drew a sharp distinction between what Nvidia builds and what a TPU is. Nvidia builds accelerated computing, which serves molecular dynamics, quantum chromodynamics, data processing, fluid dynamics, particle physics, and AI. A TPU is a tensor processing unit optimized for matrix multiplies. Nvidia’s market reach is far greater than any TPU or ASIC can possibly have.

    Jensen emphasized programmability as Nvidia’s core architectural advantage. If you want to invent a new attention mechanism, build a hybrid SSM model, fuse diffusion and autoregressive techniques, or disaggregate computation in a novel way, you need a generally programmable architecture. The only way to achieve 10x or 100x performance leaps (versus the roughly 25% per year from Moore’s Law) is to fundamentally change the algorithm, and that requires the flexibility CUDA provides.
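
    To put the 25 percent figure in perspective, a quick calculation shows how long steady annual gains of that size would take to reach the 10x or 100x leaps mentioned above. The function name is illustrative, and the constant annual rate is a simplifying assumption.

```python
import math

# Assumes a constant annual improvement rate, which is a simplification.

def years_to_reach(multiple: float, annual_gain: float) -> float:
    """Years of compounding at `annual_gain` needed to reach `multiple`."""
    return math.log(multiple) / math.log(1.0 + annual_gain)


if __name__ == "__main__":
    for target in (10.0, 100.0):
        years = years_to_reach(target, 0.25)
        print(f"{target:.0f}x at 25% per year takes about {years:.1f} years")
```

    At 25 percent per year, a 10x gain takes roughly a decade and a 100x gain roughly two, which is the gap Jensen argues only algorithmic change can close within a single product generation.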

    On the specific question of whether hyperscalers with huge engineering teams can simply write their own kernels and bypass CUDA, Jensen acknowledged they do write custom kernels but argued that Nvidia’s engineers still routinely deliver 2x to 3x speedups when they optimize a partner’s stack. He described Nvidia’s GPUs as “F1 racers” that anyone can drive at 100 mph, but extracting peak performance requires deep architectural expertise. Nvidia uses AI itself to generate many of its optimized kernels.

    Jensen was particularly blunt about public benchmarks. He pointed to Dylan Patel’s InferenceMAX benchmark and said neither TPU nor Trainium has been willing to demonstrate their claimed performance advantages on it. He said Nvidia’s performance-per-TCO is the best in the world, “bar none,” and challenged anyone to prove otherwise.

    Regarding Anthropic’s multi-gigawatt deal with Broadcom and Google for TPUs, Jensen called it “a unique instance, not a trend.” He said without Anthropic, there would be essentially no TPU growth and no Trainium growth. He traced this back to his own mistake: when Anthropic and OpenAI needed multi-billion dollar investments from their compute suppliers to get off the ground, Nvidia was not in a position to provide that capital. Google and AWS were, and in return, Anthropic committed to using their compute.

    Nvidia’s Investment Strategy and Regrets

    Jensen was unusually candid about his regret over not investing in foundation model companies earlier. He said he did not deeply internalize how different AI labs were from typical startups. A traditional VC would never put $5 to $10 billion into a single AI lab, but that was exactly what companies like OpenAI and Anthropic needed. By the time Jensen understood this, Nvidia was not in a financial or cultural position to make those kinds of investments.

    Now, Nvidia has invested approximately $30 billion in OpenAI and $10 billion in Anthropic. Jensen said he is delighted to support both and considers their existence essential for the world. But he acknowledged that these investments came at much higher valuations than would have been possible years earlier.

    Jensen explained Nvidia’s broader investment philosophy: support everyone, do not pick winners. If he invests in one foundation model company, he invests in all of them. This comes from hard-won humility. When Nvidia started, there were 60 3D graphics companies. Nvidia’s original architecture was “precisely wrong” and the company would have been at the top of most lists to fail. Jensen said he has enough humility from that experience to know that you cannot predict which AI company will ultimately succeed.

    Why Nvidia Will Not Become a Hyperscaler

    Dwarkesh pointed out that Nvidia has the cash to build and operate its own cloud infrastructure, bypassing the middleman ecosystem that converts CapEx into OpEx for AI labs. Jensen rejected this path based on his core operating philosophy.

    If Nvidia did not build its computing platform, NVLink, and the CUDA ecosystem, nobody else would have done it. He is “completely certain” of that. These are things Nvidia must do. But the world has lots of clouds. If Nvidia did not build a cloud, someone else would show up. So the answer is to support the ecosystem instead: invest in CoreWeave, Nscale, Nebius, and others to help them exist and scale, rather than competing with them.

    Jensen was clear that Nvidia is not trying to be in the financing business either. When OpenAI needed a $30 billion investment before its IPO, Nvidia stepped up because OpenAI needed it and Nvidia deeply believed in the company. But these are targeted ecosystem investments, not a strategic pivot into cloud services.

    On GPU allocation during shortages, Jensen pushed back on the narrative that Nvidia strategically “fractures” the market by giving allocations to smaller neoclouds. He said the process is straightforward: you forecast demand, you place a purchase order, and it is first in, first out. Nvidia never changes prices based on demand. Jensen said he prefers to be dependable and serve as the foundation of the industry rather than extracting maximum short-term value.
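
    The first in, first out rule Jensen describes can be sketched as a simple queue. This is only an illustration of the general FIFO idea, not Nvidia’s actual allocation process; the `PurchaseOrder` type, the customer labels, and the unit counts are all hypothetical.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class PurchaseOrder:
    customer: str  # hypothetical customer label
    units: int     # units requested


def allocate_fifo(orders: list[PurchaseOrder], supply: int) -> dict[str, int]:
    """Fill orders strictly in arrival order until supply runs out."""
    queue = deque(orders)
    allocation: dict[str, int] = {}
    while queue and supply > 0:
        order = queue.popleft()
        granted = min(order.units, supply)
        allocation[order.customer] = allocation.get(order.customer, 0) + granted
        supply -= granted
    return allocation


if __name__ == "__main__":
    orders = [
        PurchaseOrder("cloud_a", 5),
        PurchaseOrder("cloud_b", 5),
        PurchaseOrder("cloud_c", 5),
    ]
    print(allocate_fifo(orders, supply=8))
```

    In this sketch the order of arrival is the only thing that decides who gets served when supply is short; neither customer size nor willingness to pay enters the function.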

    The China Debate

    The longest and most heated section of the interview was Jensen’s case against US chip export controls on China. This was a genuine debate, with Dwarkesh pushing the national security argument and Jensen pushing back forcefully.

    Jensen’s core argument rested on several pillars. First, China already has abundant compute. It manufactures 60% or more of the world’s mainstream chips, has massive energy infrastructure (including empty data centers with full power), and employs roughly 50% of the world’s AI researchers. The threshold of compute needed to build models like Anthropic’s Mythos has already been reached and exceeded by China’s existing infrastructure.

    Second, export controls have backfired. They accelerated China’s domestic chip industry, forced their AI ecosystem to optimize for internal architectures instead of the American tech stack, and caused the United States to concede the second-largest technology market in the world. Jensen compared this directly to how US telecom policy allowed Huawei to dominate global telecommunications infrastructure.

    Third, Jensen argued that AI is a five-layer stack (energy, chips, computing platform, models, applications) and the US needs to win at every layer. Fixating on one layer (models) at the expense of another layer (chips) is counterproductive. If Chinese open source AI models end up optimized for non-American hardware and that stack gets exported to the global south, the Middle East, Africa, and Southeast Asia, the US will have lost something far more valuable than whatever marginal compute advantage the export controls provided.

    Dwarkesh countered with the Mythos example: Anthropic’s new model found thousands of high-severity zero-day vulnerabilities across every major operating system and browser, including one that had existed in OpenBSD for 27 years. If China had enough compute to train and deploy a model like Mythos at scale before the US could prepare, the cyber-offensive capabilities would be devastating.

    Jensen’s response was direct. Mythos was trained on “fairly mundane capacity” that is already abundantly available in China. The amount of compute is not the bottleneck for that kind of breakthrough. Great computer science is, and China has no shortage of brilliant AI researchers. He pointed to DeepSeek as evidence: most advances in AI come from algorithmic innovation, not raw hardware. If China’s researchers can achieve breakthroughs like DeepSeek with limited hardware, imagine what they could do with more.

    Jensen also argued for dialogue over confrontation. He said it is essential that American and Chinese AI researchers are talking to each other, and that both countries agree on what AI should not be used for. The idea that you can prevent AI risks by cutting off chip sales, when the real advances come from algorithms and computer science, reflects a fundamental misunderstanding of how AI progress works.

    The debate ended without resolution, but Jensen’s final point was sharp: “I’m not talking to somebody who woke up a loser. That loser attitude, that loser premise, makes no sense to me.”

    Why Not Multiple Chip Architectures?

    Near the end of the interview, Dwarkesh asked why Nvidia does not run multiple parallel chip projects with different architectures, like a Cerebras-style wafer-scale design or a Dojo-style huge package, or even one without CUDA.

    Jensen’s answer was simple: “We don’t have a better idea.” Nvidia simulates all of these alternative approaches in its internal simulators and they are provably worse. The company works on exactly the projects it wants to work on. If the workload were to change dramatically (not just the algorithms, but the actual market shape), Nvidia might add other accelerators.

    In fact, Nvidia recently did exactly this by acquiring Groq. The inference market is now segmenting into different tiers. Some customers will pay premium prices for extremely fast response times even if throughput is lower. This creates a new “high ASP token” segment that justifies a different point on the performance curve. But Jensen was clear: if he had more money, he would put it all behind Nvidia’s existing architecture, not diversify into alternatives.

    Nvidia Without AI

    Jensen closed by saying that even if the deep learning revolution had never happened, Nvidia would be “very, very large.” The premise of the company has always been that general-purpose computing cannot scale indefinitely and that domain-specific acceleration is the way forward. Molecular dynamics, seismic processing, image processing, computational lithography, quantum chemistry, and data processing all benefit from GPU acceleration regardless of AI. Jensen said the fundamental promise of accelerated computing has not changed “not even a little bit.”

    Thoughts

    This interview is one of the most revealing Jensen Huang conversations in years, partly because Dwarkesh actually pushes back instead of lobbing softballs. A few things stand out.

    The Anthropic regret is real and significant. Jensen is essentially admitting that Nvidia’s biggest strategic miss of the AI era was not understanding that foundation model companies needed supplier-level capital commitments, not VC funding. The fact that Google and AWS used compute investments to lock in Anthropic’s architecture choices has had downstream consequences that Nvidia is still working to unwind. When Jensen says Anthropic is “a unique instance, not a trend” for TPU adoption, he is simultaneously downplaying the threat and revealing exactly how seriously he takes it.

    The China debate is the highlight. Jensen’s argument is more nuanced than it first appears. He is not saying “sell China everything.” He is saying the current binary approach of near-total restriction has backfired by accelerating China’s domestic chip industry and pushing the Chinese AI ecosystem away from the American tech stack. His comparison to the US telecom industry losing global market share to Huawei is pointed and historically grounded. Whether you agree with his conclusion or not, the framing of AI as a five-layer stack where the US needs to compete at every layer is a useful mental model.

    The “electrons to tokens” framing is Jensen at his best. It is a simple metaphor that captures something genuinely complex about where value is created in the AI supply chain. And his insistence that the transformation is “far from deeply understood” is a subtle way of arguing that Nvidia’s competitive position will be durable because the problem space is not close to being solved.

    The Groq acquisition reveal is interesting for what it signals about the inference market. If Nvidia is creating a separate product tier for premium-priced, low-latency tokens, it suggests the company sees inference economics fragmenting significantly. This aligns with the broader trend of AI becoming an enterprise product where different customers have wildly different willingness to pay based on how they use tokens.

    Finally, Jensen’s refusal to diversify chip architectures is a bold bet. “We simulate it all in our simulator, provably worse” is an incredibly confident statement. History is full of companies that were right until they were not. But Nvidia’s track record of 50x generation-over-generation improvements through co-design across processors, fabric, libraries, and algorithms is hard to argue with. The question is whether the current paradigm of transformer-based models on GPU clusters represents a local or global optimum for AI compute.

  • The Genesis Mission: Inside the “Manhattan Project” for AI-Driven Science

    TL;DR

    On November 24, 2025, President Trump signed an Executive Order launching “The Genesis Mission.” This initiative aims to centralize federal data and high-performance computing under the Department of Energy to create a massive AI platform. Likened to the World War II Manhattan Project, its goal is to accelerate scientific discovery in critical fields like nuclear energy, biotechnology, and advanced manufacturing.

    Key Takeaways

    • The “Manhattan Project” of AI: The Administration frames this as a historic national effort comparable in urgency to the project that built the atomic bomb, aimed now at global technology dominance.
    • Department of Energy Leads: The Secretary of Energy will oversee the mission, leveraging National Labs and supercomputing infrastructure.
    • The “Platform”: A new “American Science and Security Platform” will be built to host AI agents, foundation models, and secure federal datasets.
    • Six Priority Domains: The mission's initial challenges are drawn from advanced manufacturing, biotechnology, critical materials, nuclear energy, quantum information science, and semiconductors.
    • Data is the Fuel: The order prioritizes unlocking the “world’s largest collection” of federal scientific datasets to train these new AI models.

    Detailed Summary of the Executive Order

    The Executive Order, titled Launching the Genesis Mission, establishes a coordinated national effort to harness Artificial Intelligence for scientific breakthroughs. Here is how the directive breaks down:

    1. Purpose and Ambition

    The order asserts that America is currently in a race for global technology dominance in AI. To win this race, the Administration is launching the “Genesis Mission,” described as a dedicated effort to unleash a new age of AI-accelerated innovation. The explicit goal is to secure energy dominance, strengthen national security, and multiply the return on taxpayer investment in R&D.

    2. The American Science and Security Platform

    The core mechanism of this mission is the creation of the American Science and Security Platform. This infrastructure will provide:

    • Compute: Secure cloud-based AI environments and DOE national lab supercomputers.
    • AI Agents: Autonomous agents designed to test hypotheses, automate research workflows, and explore design spaces.
    • Data: Access to proprietary, federally curated, and open scientific datasets, as well as synthetic data generated by DOE resources.

    3. Timeline and Milestones

    The Secretary of Energy is on a tight schedule to operationalize this vision:

    • 90 Days: Identify all available federal computing and storage resources.
    • 120 Days: Select initial data/model assets and develop a cybersecurity plan for incorporating data from outside the federal government.
    • 270 Days: Demonstrate an “initial operating capability” of the Platform for at least one national challenge.
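    To make the schedule concrete, here is a minimal sketch that turns the day counts above into calendar dates. It assumes each deadline is counted in calendar days from the November 24, 2025 signing date. That is an assumption of this sketch, and the order's own text governs the actual counting. The milestone labels are shortened paraphrases of the bullets above.

```python
from datetime import date, timedelta

# Assumption: deadlines run in calendar days from the signing date.
SIGNED = date(2025, 11, 24)

# Day counts from the summary above, with shortened labels.
MILESTONES = {
    90: "Identify federal computing and storage resources",
    120: "Select initial data/model assets and develop a cybersecurity plan",
    270: "Demonstrate initial operating capability for one challenge",
}


def deadlines(start: date, milestones: dict) -> dict:
    """Map each day count to the calendar date it falls on."""
    return {days: start + timedelta(days=days) for days in sorted(milestones)}


if __name__ == "__main__":
    for days, due in deadlines(SIGNED, MILESTONES).items():
        print(f"{days:>3} days -> {due.isoformat()}: {MILESTONES[days]}")
```

    Under that assumption the three milestones land in late February, late March, and late August of 2026.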

    4. Targeted Scientific Domains

    The mission is not open-ended; it focuses on specific high-impact areas. Within 60 days, the Secretary must submit a list of at least 20 challenges, spanning priority domains including Biotechnology, Nuclear Fission and Fusion, Quantum Information Science, and Semiconductors.

    5. Public-Private and International Collaboration

    While led by the DOE, the mission explicitly calls for bringing together “brilliant American scientists” from universities and pioneering businesses. The Secretary is tasked with developing standardized frameworks for IP ownership, licensing, and trade-secret protections to encourage private sector participation.


    Analysis and Thoughts

    “The Genesis Mission will… multiply the return on taxpayer investment into research and development.”

    The Data Sovereignty Play
    The most significant aspect of this order is the recognition of federal datasets as a strategic asset. By explicitly mentioning the “world’s largest collection of such datasets” developed over decades, the Administration is leveraging an asset that private companies cannot easily duplicate. This suggests a shift toward “Sovereign AI” where the government doesn’t just regulate AI, but builds the foundational models for science.

    Hardware over Software
    Placing this under the Department of Energy (DOE) rather than the National Science Foundation (NSF) or Commerce is a strategic signal. The DOE owns the National Labs (like Oak Ridge and Lawrence Livermore) and the world’s fastest supercomputers. This indicates the Administration views this as a heavy-infrastructure challenge—requiring massive energy and compute—rather than just a software problem.

    The “Manhattan Project” Framing
    Invoking the Manhattan Project sets an incredibly high bar. That project resulted in a singular, world-changing weapon. The Genesis Mission aims for a broader diffusion of “AI agents” to automate research. The success of this mission will depend heavily on the integration mentioned in Section 2—getting academic, private, and classified federal systems to talk to each other without compromising security.

    The Energy Component
    It is notable that nuclear fission and fusion are highlighted as specific challenges. AI is notoriously energy-hungry. By tasking the DOE with solving energy problems using AI, the mission creates a feedback loop: better AI designs better power plants, which power better AI.

  • Inside Microsoft’s AGI Masterplan: Satya Nadella Reveals the 50-Year Bet That Will Redefine Computing, Capital, and Control

    1) Fairwater 2 is live at unprecedented scale, with Fairwater 4 linking over a one petabit AI WAN

    Nadella walks through the new Fairwater 2 site and states Microsoft has targeted a 10x training capacity increase every 18 to 24 months relative to GPT-5’s compute. He also notes Fairwater 4 will connect on a one petabit network, enabling multi-site aggregation for frontier training, data generation, and inference.

    2) Microsoft’s MAI program, a parallel superintelligence effort alongside OpenAI

    Microsoft is standing up its own frontier lab and will “continue to drop” models in the open, with an omni-model on the roadmap and high-profile hires joining Mustafa Suleyman. This is a clear signal that Microsoft intends to compete at the top tier while still leveraging OpenAI models in products.

    3) Clarification on IP: Microsoft says it has full access to the GPT family’s IP

    Nadella says Microsoft has access to all of OpenAI’s model IP (consumer hardware excluded) and adds that the firms co-developed system-level designs for supercomputers. This resolves long-standing ambiguity about who holds rights to GPT-class systems.

    4) New exclusivity boundaries: OpenAI’s API is Azure-exclusive, SaaS can run elsewhere with limited exceptions

    The interview spells out that OpenAI’s platform API must run on Azure. ChatGPT as SaaS can be hosted elsewhere only under specific carve-outs, for example certain US government cases.

    5) Per-agent future for Microsoft’s business model

    Nadella describes a shift where companies provision Windows 365-style computers for autonomous agents. Licensing and provisioning evolve from per-user to per-user plus per-agent, with identity, security, storage, and observability provided as the substrate.

    6) The 2024–2025 capacity “pause” explained

    Nadella confirms Microsoft paused or dropped some leases in the second half of last year to avoid lock-in to a single accelerator generation, keep the fleet fungible across GB200, GB300, and future parts, and balance training with global serving to match monetization.

    7) Concrete scaling cadence disclosure

    The 10x training capacity target every 18 to 24 months is stated on the record while touring Fairwater 2. This implies the next frontier runs will be roughly an order of magnitude above GPT-5 compute.
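    A quick bit of arithmetic shows what that cadence means on an annual basis. The sketch below assumes smooth exponential growth between the 18-month and 24-month endpoints, which is a simplification: real capacity arrives in steps as sites come online. The function names are illustrative.

```python
import math


def annual_growth_factor(multiple: float, months: float) -> float:
    """Per-year multiplier equivalent to growing `multiple`x every `months` months."""
    return multiple ** (12.0 / months)


def doubling_time_months(multiple: float, months: float) -> float:
    """Months needed to double at the same smooth exponential rate."""
    return months * math.log(2) / math.log(multiple)


if __name__ == "__main__":
    for months in (18, 24):
        print(
            f"10x every {months} months: "
            f"{annual_growth_factor(10, months):.2f}x per year, "
            f"doubling every {doubling_time_months(10, months):.1f} months"
        )
```

    Under these assumptions, 10x every 18 to 24 months works out to roughly 3.2x to 4.6x per year, or a doubling about every 5 to 7 months.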

    8) Multi-model, multi-supplier posture

    Microsoft will keep using OpenAI models in products for years, build MAI models in parallel, and integrate other frontier models where product quality or cost warrants it.

    Why these points matter

    • Industrial scale: Fairwater’s disclosed networking and capacity targets set a new bar for AI factories and imply rapid model scaling.
    • Strategic independence: MAI plus GPT IP access gives Microsoft a dual track that reduces single-partner risk.
    • Ecosystem control: Azure exclusivity for OpenAI’s API consolidates platform power at the infrastructure layer.
    • New revenue primitives: Per-agent provisioning reframes Microsoft’s core metrics and pricing.

    Pull quotes

      “We’ve tried to 10x the training capacity every 18 to 24 months.”

      “The API is Azure-exclusive. The SaaS business can run anywhere, with a few exceptions.”

      “We have access to the GPT family’s IP.”

    TL;DW

    • Microsoft is building a global network of AI super-datacenters (Fairwater 2 and beyond) designed for fast upgrade cycles and cross-region training at petabit scale.
    • Strategy spans three layers: infrastructure, models, and application scaffolding, so Microsoft creates value regardless of which model wins.
    • AI economics shift margins, so Microsoft blends subscriptions with metered consumption and focuses on tokens per dollar per watt.
    • Future includes autonomous agents that get provisioned like users with identity, security, storage, and observability.
    • Trust and sovereignty are central. Microsoft leans into compliant, sovereign cloud footprints to win globally.

    Detailed Summary

    1) Fairwater 2: AI Superfactory

    Microsoft’s Fairwater 2 is presented as the most powerful AI datacenter yet, packing hundreds of thousands of GB200 and GB300 accelerators, tied by a petabit AI WAN and designed to stitch training jobs across buildings and regions. The key lesson: keep the fleet fungible and avoid overbuilding for a single hardware generation, because power density and cooling requirements change with each hardware wave, such as Vera Rubin and Rubin Ultra.

    2) The Three-Layer Strategy

    • Infrastructure: Azure’s hyperscale footprint, tuned for training, data generation, and inference, with strict flexibility across model architectures.
    • Models: Access to OpenAI’s GPT family for seven years plus Microsoft’s own MAI roadmap for text, image, and audio, moving toward an omni-model.
    • Application Scaffolding: Copilots and agent frameworks like GitHub’s Agent HQ and Mission Control that orchestrate many agents on real repos and workflows.

    This layered approach lets Microsoft compete whether the value accrues to models, tooling, or infrastructure.

    3) Business Models and Margins

    AI raises COGS relative to classic SaaS, so pricing blends entitlements with consumption tiers. GitHub Copilot helped catalyze a multibillion-dollar market in a year, even as rivals emerged. Microsoft aims to ride a market that is expanding 10x rather than clinging to legacy share. Efficiency focus: tokens per dollar per watt through software optimization as much as hardware.
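    The interview summary does not define exactly how “tokens per dollar per watt” is computed, so the sketch below takes one plausible reading: tokens served, divided by attributed cost, divided by average power draw over the same window. The `FleetSample` type and every number in it are hypothetical and exist only to show why a software-only throughput gain moves the metric.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FleetSample:
    """Hypothetical measurement window for a serving fleet."""
    tokens: float           # tokens served in the window
    cost_dollars: float     # cost attributed to the window
    avg_power_watts: float  # average power draw during the window


def tokens_per_dollar_per_watt(sample: FleetSample) -> float:
    """One plausible reading of the metric: tokens / dollars / watts."""
    if sample.cost_dollars <= 0 or sample.avg_power_watts <= 0:
        raise ValueError("cost and power must be positive")
    return sample.tokens / sample.cost_dollars / sample.avg_power_watts


# Made-up numbers: same hardware, cost, and power, but better software
# scheduling serves 50 percent more tokens in the same window.
baseline = FleetSample(tokens=1e9, cost_dollars=1_000, avg_power_watts=1e6)
optimized = FleetSample(tokens=1.5e9, cost_dollars=1_000, avg_power_watts=1e6)

if __name__ == "__main__":
    print(tokens_per_dollar_per_watt(baseline))
    print(tokens_per_dollar_per_watt(optimized))
```

    Because cost and power sit in the denominator, any throughput gain from kernels, batching, or scheduling improves the metric without touching the hardware bill.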

    4) Copilot, GitHub, and Agent Control Planes

    GitHub becomes the control plane for multi-agent development. Agent HQ and Mission Control aim to let teams launch, steer, and observe multiple agents working in branches, with repo-native primitives for issues, actions, and reviews.

    5) Models vs Scaffolding

    Nadella argues model monopolies are checked by open source and substitution. Durable value sits in the scaffolding layer that brings context, data liquidity, compliance, and deep tool knowledge, exemplified by Excel Agent that understands formulas and artifacts beyond screen pixels.

    6) Rise of Autonomous Agents

    Two worlds emerge: human-in-the-loop Copilots and fully autonomous agents. Microsoft plans to provision agents with computers, identity, security, storage, and observability, evolving end-user software into an infrastructure business for agents as well as people.

    7) MAI: Microsoft’s In-House Frontier Effort

    Microsoft is assembling a top-tier lab led by Mustafa Suleyman and veterans from DeepMind and Google. Early MAI models show progress in multimodal arenas. The plan is to combine OpenAI access with independent research and product-optimized models for latency and cost.

    8) Capex and Industrial Transformation

    Capex has surged. Microsoft frames this era as capital intensive and knowledge intensive. Software scheduling, workload placement, and continual throughput improvements are essential to maximize returns on a fleet that upgrades every 18 to 24 months.

    9) The Lease Pause and Flexibility

    Microsoft paused some leases to avoid single-generation lock-in and to prevent over-reliance on a small number of mega-customers. The portfolio favors global diversity, regulatory alignment, balanced training and inference, and location choices that respect sovereignty and latency needs.

    10) Chips and Systems

    Custom silicon like Maia will scale in lockstep with Microsoft’s own models and OpenAI collaboration, while Nvidia remains central. The bar for any new accelerator is total fleet TCO, not just raw performance, and system design is co-evolved with model needs.

    11) Sovereign AI and Trust

    Nations want AI benefits with continuity and control. Microsoft’s approach combines sovereign cloud patterns, data residency, confidential computing, and compliance so countries can adopt leading AI while managing concentration risk. Nadella emphasizes trust in American technology and institutions as a decisive global advantage.


    Key Takeaways

    1. Build for flexibility: Datacenters, pricing, and software are optimized for fast evolution and multi-model support.
    2. Three-layer stack wins: Infrastructure, models, and scaffolding compound each other and hedge against shifts in where value accrues.
    3. Agents are the next platform: Provisioned like users with identity and observability, agents will demand a new kind of enterprise infrastructure.
    4. Efficiency is king: Tokens per dollar per watt drives margins more than any single chip choice.
    5. Trust and sovereignty matter: Compliance and credible guarantees are strategic differentiators in a bipolar world.
  • Composer: Building a Fast Frontier Model with Reinforcement Learning

    Composer represents Cursor’s most ambitious step yet toward a new generation of intelligent, high-speed coding agents. Built through deep reinforcement learning (RL) and large-scale infrastructure, Composer delivers frontier-level results at speeds up to four times faster than comparable models. It isn’t just another large language model; it’s an actively trained software engineering assistant optimized to think, plan, and code with precision in real time.

    From Cheetah to Composer: The Evolution of Speed

    The origins of Composer go back to an experimental prototype called Cheetah, an agent Cursor developed to study how much faster coding models could get before hitting usability limits. Developers consistently preferred the speed and fluidity of an agent that responded instantly, keeping them “in flow.” Cheetah proved the concept, but it was Composer that matured it — integrating reinforcement learning and mixture-of-experts (MoE) architecture to achieve both speed and intelligence.

    Composer’s training goal was simple but demanding: make the model capable of solving real-world programming challenges in real codebases using actual developer tools. During RL, Composer was given tasks like editing files, running terminal commands, performing semantic searches, or refactoring code. Its objective wasn’t just to get the right answer; it was to work efficiently, using minimal steps, adhering to existing abstractions, and maintaining code quality.

    Training on Real Engineering Environments

    Rather than relying on synthetic datasets or static benchmarks, Cursor trained Composer within a dynamic software environment. Every RL episode simulated an authentic engineering workflow — debugging, writing unit tests, applying linter fixes, and performing large-scale refactors. Over time, Composer developed behaviors that mirror an experienced developer’s workflow. It learned when to open a file, when to search globally, and when to execute a command rather than speculate.

    Cursor’s evaluation framework, Cursor Bench, measures progress by realism rather than abstract metrics. It compiles actual agent requests from engineers and compares Composer’s solutions to human-curated optimal responses. This lets Cursor measure not just correctness, but also how well the model respects a team’s architecture, naming conventions, and software practices — metrics that matter in production environments.

    Reinforcement Learning as a Performance Engine

    Reinforcement learning is at the heart of Composer’s performance. Unlike supervised fine-tuning, which simply mimics examples, RL rewards Composer for producing high-quality, efficient, and contextually relevant work. It actively learns to choose the right tools, minimize unnecessary output, and exploit parallelism across tasks. The model was even rewarded for avoiding unsupported claims — pushing it to generate more verifiable and responsible code suggestions.
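    Cursor has not published its reward function, so the following is only an illustrative sketch of the kind of shaping the paragraph describes: a base reward for correct work, minus small penalties for extra tool calls, verbose output, and unsupported claims. The `Episode` fields, the penalty weights, and the use of passing tests as the correctness signal are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Episode:
    """Hypothetical summary of one coding-agent rollout."""
    tests_passed: bool       # assumed correctness signal
    tool_calls: int          # steps taken to finish the task
    output_tokens: int       # how much text the agent produced
    unsupported_claims: int  # claims a checker could not verify


def reward(
    ep: Episode,
    step_cost: float = 0.01,     # illustrative weight, not from the source
    token_cost: float = 0.0001,  # illustrative weight, not from the source
    claim_cost: float = 0.5,     # illustrative weight, not from the source
) -> float:
    """Correctness bonus minus penalties for waste and unverifiable claims."""
    base = 1.0 if ep.tests_passed else 0.0
    return (
        base
        - step_cost * ep.tool_calls
        - token_cost * ep.output_tokens
        - claim_cost * ep.unsupported_claims
    )


efficient = Episode(tests_passed=True, tool_calls=5, output_tokens=1_000, unsupported_claims=0)
wasteful = Episode(tests_passed=True, tool_calls=20, output_tokens=4_000, unsupported_claims=1)

if __name__ == "__main__":
    print(reward(efficient), reward(wasteful))
```

    Both episodes solve the task, but the efficient one scores higher, which is the pressure that would push a policy toward fewer steps and more verifiable output.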

    As RL progressed, emergent behaviors appeared. Composer began autonomously running semantic searches to explore codebases, fixing linter errors, and even generating and executing tests to validate its own work. These self-taught habits transformed it from a passive text generator into an active agent capable of iterative reasoning.

    Infrastructure at Scale: Thousands of Sandboxed Agents

    Behind Composer’s intelligence is a massive engineering effort. Training large MoE models efficiently requires significant parallelization and precision management. Cursor’s infrastructure, built with PyTorch and Ray, powers asynchronous RL at scale. Their system supports thousands of simultaneous environments, each a sandboxed virtual workspace where Composer experiments safely with file edits, code execution, and search queries.
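    The post names PyTorch and Ray but gives no code, so here is a small standard-library stand-in for the asynchronous pattern it describes: many sandboxed episodes run concurrently, and results are consumed as each one finishes rather than in launch order. It uses `asyncio` instead of Ray purely to stay self-contained. `run_episode`, the sleep-based "work", and the random reward are placeholders, not Cursor's system.

```python
import asyncio
import random


async def run_episode(env_id: int, rng: random.Random) -> dict:
    """Placeholder for one sandboxed rollout; the sleep mimics variable-length tool use."""
    await asyncio.sleep(rng.uniform(0.001, 0.01))
    return {"env_id": env_id, "reward": rng.random()}


async def collect(num_envs: int, max_concurrent: int, seed: int = 0) -> list:
    """Run `num_envs` episodes with at most `max_concurrent` in flight at once."""
    rng = random.Random(seed)
    limit = asyncio.Semaphore(max_concurrent)

    async def guarded(env_id: int) -> dict:
        async with limit:
            return await run_episode(env_id, rng)

    tasks = [asyncio.create_task(guarded(i)) for i in range(num_envs)]
    results = []
    # as_completed yields in completion order, so a learner could start
    # using early finishers without waiting for the slowest sandbox.
    for finished in asyncio.as_completed(tasks):
        results.append(await finished)
    return results


if __name__ == "__main__":
    rollouts = asyncio.run(collect(num_envs=50, max_concurrent=8))
    print(len(rollouts), "episodes collected")
```

    The semaphore plays the role of a fixed sandbox pool. In a real distributed setup the same shape would span machines rather than coroutines.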

    To achieve this scale, the team integrated MXFP8 MoE kernels with expert and hybrid-sharded data parallelism. This setup allows distributed training across thousands of NVIDIA GPUs with minimal communication cost — effectively combining speed, scale, and precision. MXFP8 also enables faster inference without any need for post-training quantization, giving developers real-world performance gains instantly.
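    For readers unfamiliar with MXFP8, the core idea of microscaling is that a small block of values shares one power-of-two scale while each element is stored in 8-bit floating point. The sketch below shows only the shared-scale step, based on this writer's understanding of the OCP Microscaling convention (blocks of 32, FP8 E4M3 elements with a maximum magnitude of 448). Those constants are assumptions here, rounding each element to the FP8 grid is deliberately left out, and none of this reflects Cursor's actual kernels.

```python
import math

BLOCK_SIZE = 32   # elements sharing one scale (assumed MX convention)
E4M3_EMAX = 8     # exponent of the largest normal FP8 E4M3 value
E4M3_MAX = 448.0  # largest finite FP8 E4M3 magnitude (1.75 * 2**8)


def shared_scale(block: list) -> float:
    """Power-of-two scale chosen from the block's largest magnitude."""
    amax = max(abs(v) for v in block)
    if amax == 0.0:
        return 1.0
    # frexp gives amax = m * 2**e with 0.5 <= m < 1, so floor(log2(amax)) = e - 1.
    _, e = math.frexp(amax)
    return 2.0 ** ((e - 1) - E4M3_EMAX)


def scale_block(block: list) -> tuple:
    """Return (scale, scaled values clamped to the E4M3 range).

    Rounding each element to the FP8 mantissa grid is omitted for brevity.
    """
    if len(block) != BLOCK_SIZE:
        raise ValueError(f"expected {BLOCK_SIZE} values, got {len(block)}")
    scale = shared_scale(block)
    scaled = [max(-E4M3_MAX, min(E4M3_MAX, v / scale)) for v in block]
    return scale, scaled


if __name__ == "__main__":
    values = [100.0, -3.0, 0.5] + [0.0] * 29
    scale, scaled = scale_block(values)
    print(scale, scaled[:3])
```

    Because the scale is a power of two, applying and removing it is exact in floating point. The precision loss in a real MXFP8 pipeline comes from the per-element FP8 rounding that this sketch skips.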

    Cursor’s infrastructure can spawn hundreds of thousands of concurrent sandboxed coding environments. This capability, adapted from their Background Agents system, was essential to unify RL experiments with production-grade conditions. It ensures that Composer’s training environment matches the complexity of real-world coding, creating a model genuinely optimized for developer workflows.

    The Cursor Bench and What “Frontier” Means

    Composer’s benchmark performance earned it a place in what Cursor calls the “Fast Frontier” class: models designed for efficient inference while maintaining top-tier quality. This group includes systems like Haiku 4.5 and Gemini Flash 2.5. While GPT-5 and Sonnet 4.5 remain the strongest overall, Composer outperforms nearly every open-weight model, including Qwen Coder and GLM 4.6. In tokens-per-second performance, Composer’s throughput is among the highest ever measured under the standardized Anthropic tokenizer.

    Built by Developers, for Developers

    Composer isn’t just research — it’s in daily use inside Cursor. Engineers rely on it for their own development, using it to edit code, manage large repositories, and explore unfamiliar projects. This internal dogfooding loop means Composer is constantly tested and improved in real production contexts. Its success is measured by one thing: whether it helps developers get more done, faster, and with fewer interruptions.

    Cursor’s goal isn’t to replace developers, but to enhance them — providing an assistant that acts as an extension of their workflow. By combining fast inference, contextual understanding, and reinforcement learning, Composer turns AI from a static completion tool into a real collaborator.

    Wrap Up

    Composer represents a milestone in AI-assisted software engineering. It demonstrates that reinforcement learning, when applied at scale with the right infrastructure and metrics, can produce agents that are not only faster but also more disciplined, efficient, and trustworthy. For developers, it’s a step toward a future where coding feels as seamless and interactive as conversation — powered by an agent that truly understands how to build software.