PJFP.com

Pursuit of Joy, Fulfillment, and Purpose

Tag: foundation models

  • Jensen Huang at Stanford CS153 Frontier Systems on Co-Design, Agentic Computing, Vera Rubin, Open Models, and the Million-X Decade That Reshaped AI Infrastructure

    https://www.youtube.com/watch?v=tsQB0n0YV3k

    NVIDIA CEO Jensen Huang returned to Stanford for the CS153 Frontier Systems class (the room nicknamed itself “AI Coachella”) to lay out, in raw form, how he thinks about the computer being reinvented for the first time in over sixty years. Across roughly seventy minutes of student questions he walks through the codesign philosophy that gave NVIDIA a million-x decade, the architectural through-line from Hopper to Grace Blackwell to Vera Rubin to Feynman, the case for open source foundation models, the realities of tokens per watt and MFU, energy demand running a thousand times higher, the China and export-control debate, and his own biggest strategic mistakes. Watch the full conversation on YouTube.

    TLDW

    Huang argues every layer of computing has changed: the programming model, the system architecture, the deployment pattern, the economics. Codesign across CPUs, GPUs, networking, storage, switches and compilers gave NVIDIA roughly a million-x speed-up over ten years, versus roughly ten-x from Moore’s Law over the same period, and that headroom is what let researchers say “just train on the whole internet.” Hopper was built for pre-training, Grace Blackwell NVLink72 for inference and reasoning (50x over Hopper in two years), Vera Rubin is built for agents that load long-term memory, call tools and need a low-latency single-threaded CPU bolted directly to the GPU, and Feynman extends that to swarms of agents that spawn sub-agents. Open weights matter because safety, sovereignty (230-plus languages no one else will fund) and domain models for biology, autonomy, robotics and climate need a foundation that NVIDIA is willing to seed. Compute is not really the scarce resource (Huang says place the order and the chips ship); the broken thing is institutional budgeting that can’t put a billion dollars into a shared university supercomputer. Energy demand is heading a thousand times higher, and this is finally the moment when market forces alone will fund sustainable generation. On geopolitics he rejects the GPUs-as-atomic-bombs framing and warns that America’s technology industry will end up like its telecom industry if it cedes two thirds of the world market. On career he advises seeking suffering on purpose. On strategy he says observe, reason from first principles, build a mental model, work backwards, minimize opportunity cost, maximize optionality.

    Key Takeaways

    • The computing model has been substantially unchanged since the IBM System 360, sixty-plus years ago. Huang’s first computer architecture book was the System 360 manual. AI is the first true reinvention.
    • Old computing was pre-recorded retrieval. New computing is generated, contextually aware and continuous. Cloud was on-demand. Agentic systems run continuously.
    • Codesign is NVIDIA’s central thesis. The principle is inherited from the Hennessy and Patterson RISC era (Hennessy’s work was at Stanford) and extended across CPUs, GPUs, networking, switches, storage, compilers and frameworks, all optimized together.
    • The result of full-stack codesign: roughly 1,000,000x faster compute over ten years, versus a generous 10x to 100x for Moore’s Law in the same period. Dennard scaling effectively ended a decade ago.
    • That million-x speed-up is what unlocked “train on all of the internet” as a realistic AI strategy.
    • After GPT, Huang says it was obvious thinking was next. Reasoning is just generating tokens that are consumed internally; tool use is generating tokens that are consumed externally. Agentic systems followed predictably.
    • Education needs AI baked into the curriculum, not just taught as a subject. Pre-recorded textbooks cannot keep pace with knowledge being generated in real time.
    • Huang says he cannot learn anymore without AI. He has the AI read the paper, then read every related paper, then become a dedicated researcher he can interrogate.
    • Mead and Conway and the first-principles methodology of semiconductor design are still worth learning even though most of the scaling tricks have been exhausted.
    • NVIDIA itself is one of the largest consumers of Anthropic and OpenAI tokens in the world. One hundred percent of NVIDIA engineers are now agentically supported. Huang recommends Claude and similar tools by name and says open-source downloads will not match the integrated product harness.
    • NVIDIA still invests heavily in open foundation models because language and intelligence represent the codification of human knowledge. Five pillars: Nemotron (language), BioNeMo (biology), Alpamayo (autonomous vehicles), GR00T (humanoid robotics) and a climate science model (mesoscale multiphysics).
    • Sovereign language models matter. Roughly 230 world languages will never be a top priority for a commercial frontier lab. Nemotron is near-frontier and fully fine-tunable so any country can adapt it.
    • Safety and security require open weights. You cannot defend against or audit a black box. Transparent systems let researchers interrogate models and let defenders deploy swarms.
    • The future of cyber defense is not bigger-model-versus-bigger-model. It is trillions of cheap fast small models like Nemotron Nano surrounding the threat.
    • Domain models fuse language priors with world models. Alpamayo learned to drive safely on a few million miles instead of billions because it can reason like a human about the road.
    • MFU (Model Flops Utilization) is a misleading metric. Huang says he wants low MFU, because that means he over-provisioned every resource and never gets pinned by Amdahl’s law during a spike.
    • The xAI Memphis cluster running at 11 percent MFU is not necessarily a failure mode. In disaggregated prefill plus decode inference you can deliver very high tokens per watt with very low MFU.
    • The right metric is performance, ultimately tokens per watt as a proxy for intelligence per watt, and even that needs adjustment because not all tokens are equal. Coding tokens are worth more than other tokens.
    • Hopper was designed for pre-training. NVIDIA chose to build multi-billion-dollar systems when the largest existing scientific supercomputer cost $350 million, with no proven customer base. It worked.
    • Grace Blackwell NVLink72 was designed for inference, especially the high-memory-bandwidth decode phase. It is the world’s first rack-scale computer and delivered a 50x speed-up over Hopper in two years, against an expected 2x from Moore’s Law.
    • Vera Rubin is designed for agents. It combines long-term memory wired into storage and into the GPU fabric, working memory, heavy tool use, and Vera, a multi-core CPU optimized for low-latency single-threaded code, so that a multi-billion-dollar GPU system does not stall waiting on a slow tool call.
    • Feynman is being shaped for swarms of agents with sub-agents and sub-sub-agents, a recursive software topology that demands a new compute pattern.
    • Tokens per watt improved 50x in one generation. Compounding energy efficiency is the lever NVIDIA controls directly.
    • Total compute energy demand is heading roughly a thousand times higher than today, possibly one or two orders of magnitude beyond that. Huang says he would not be surprised if the estimate is low.
    • For the first time in history, market forces alone are enough to fund solar, nuclear and grid upgrades. Government subsidies are no longer required to make sustainable energy investment rational.
    • Copper interconnect is becoming a bottleneck. Photonics is moving from optional to structural inside racks and across them.
    • Comparing NVIDIA GPUs to atomic bombs, Huang says, is a stupid analogy. A billion people use NVIDIA GPUs. He advocates them to his family. He does not advocate atomic bombs to anyone.
    • If the United States cedes two thirds of the global market to competitors on policy grounds, the American technology industry will end up like American telecommunications, which was policied out of existence.
    • Huang directly rejects AI doom-by-singularity narratives. It is not true that we have no idea how these systems work. It is not true that the technology becomes infinitely powerful in a nanosecond. He calls the rhetoric irresponsible and harmful to the field students are about to enter.
    • On Stanford specifically: if the university president places an order, NVIDIA will deliver the chips. The bottleneck is that no university department has a billion-dollar compute budget because budgeting is fragmented across grants. Stanford’s $40 billion endowment is more than enough to fix that.
    • “It’s Stanford’s fault” is meant as empowerment. If something is your fault, you can solve it.
    • Career advice: do not optimize purely for passion. Most people do not yet know what they love. Pick the job in front of you and do it as well as possible. Even as CEO, Huang says, 90 percent of the work is hard and he suffers through it.
    • Suffering on purpose builds the muscle of resilience. When the company, the team or the family needs you to be tough, that muscle has to already exist.
    • NVIDIA’s first generation of products was technically wrong in nearly every dimension: curved surfaces instead of triangles, no Z-buffer, forward instead of inverse texture mapping, no floating point. The strategic recovery, not the technology, taught Huang the lessons that have lasted decades.
    • The biggest clean strategic mistake Huang names is the move into mobile chips (Tegra). It grew to a billion dollars then went to zero when Qualcomm’s modem dominance shut NVIDIA out of the 3G to 4G transition. The recovery into automotive and robotics (the Thor chip is the great great great grandson of that mobile lineage) was real, but Huang refuses to rationalize the original choice.
    • Forecasting framework: observe, reason from first principles, ask “so what” and “what next” until you have a mental model of the future, place your company inside that model, then work backwards while minimizing opportunity cost and maximizing optionality.
    • Best part of the CEO job: living at the intersection of vision, strategy and execution surrounded by people capable enough to make ambitious visions real. Worst part: the responsibility for everyone who joined the spaceship, especially in the near-death moments NVIDIA had four or five times early on.
    • Underrated insider note: Huang’s first apple pie with cheese, first hot fudge sandwich and first milkshake all happened at Denny’s. The Superbird, the fried chicken and a custom Superbird-style ham and cheese with tomato and mustard are his order.

    Detailed Summary

    Computing reinvented from the ground up

    Huang frames the moment as the first true rewrite of the computer in sixty-plus years. From the IBM System 360 forward, the mental model of writing code, running code, taking a computer to market and reasoning about applications stayed roughly constant. AI changes the programming model itself. Software is no longer a compiled binary running deterministically on a CPU. It is a neural network running on a GPU producing generated, contextual, real-time output. That cascades into how companies are organized, what tools developers use, what the network and storage stack look like, and what an application is even allowed to do. Robo-taxis, he notes, are an application no one would have attempted before deep learning unlocked perception.

    Codesign and the million-x decade

    Codesign is the philosophical center of the talk. Huang traces it to the RISC work of John Hennessy at Stanford, where simpler instruction sets won by being co-designed with the compiler rather than maximally optimized in isolation. NVIDIA extends the principle across every layer simultaneously: GPU architecture, CPU architecture, NVLink and NVSwitch fabrics, photonic interconnects, networking silicon, storage paths, CUDA libraries, frameworks and ultimately the model design. The numbers Huang gives are arresting. Moore’s Law in its prime delivered roughly 100x per decade. By the time Dennard scaling broke, real-world gains had compressed to roughly 10x. NVIDIA’s codesigned stack delivered between 100,000x and 1,000,000x over the same ten-year window. That non-linear speed-up is, in Huang’s telling, the precondition for modern AI: it is what allowed researchers to stop curating training sets and just feed the entire internet to the model.
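    To make the gap between those decade-scale figures concrete, here is a minimal sketch that converts each quoted ten-year gain into an equivalent yearly multiplier and doubling time. The function names are illustrative, and the only inputs are the figures quoted above; nothing here comes from NVIDIA.

```python
import math


def per_year_factor(total_gain: float, years: float) -> float:
    """Constant yearly multiplier that compounds to total_gain over `years`."""
    return total_gain ** (1.0 / years)


def doubling_time_years(total_gain: float, years: float) -> float:
    """Years per doubling implied by total_gain achieved over `years`."""
    return years / math.log2(total_gain)


# Ten-year gains as quoted in the summary above.
scenarios = {
    "Moore's Law in its prime (100x)": 1e2,
    "after Dennard scaling broke (10x)": 1e1,
    "codesign, low end (100,000x)": 1e5,
    "codesign, high end (1,000,000x)": 1e6,
}

for name, gain in scenarios.items():
    print(
        f"{name}: {per_year_factor(gain, 10):.2f}x per year, "
        f"doubling every {doubling_time_years(gain, 10):.2f} years"
    )
```

    On these numbers, 100x per decade works out to roughly 1.58x per year (a doubling about every eighteen months), while 1,000,000x per decade is roughly 3.98x per year, or a doubling about every six months.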

    Education has to fuse first principles with AI tools

    Asked how curriculum should evolve, Huang argues AI must be integrated into the learning process, not just taught about. He recalls Hennessy writing his textbook by hand a chapter a week while Huang was a student, and says pre-recorded textbooks cannot keep up with the rate at which AI generates new knowledge. He describes his own learning workflow: hand the paper to an AI, then have it read the entire surrounding literature, then treat the AI as a dedicated researcher who can be interrogated. At the same time he defends the classics. Mead and Conway are still the foundation. Most modern semiconductor scaling tricks have been exhausted, but knowing where the field came from sharpens judgment when designing what comes next.

    Open source and the five domain pillars

    Huang gives one of the most detailed public accounts of why NVIDIA invests so heavily in open foundation models even while being a top customer of closed labs. He recommends Claude and OpenAI by name for production coding work, and says 100 percent of NVIDIA engineers are now agentically supported. The open-weights case rests on three legs. First, language is the codification of intelligence, and there are at least 230 languages that no commercial lab will ever prioritize. Nemotron is built near frontier and released so any country or community can fine-tune it. Second, the same representation-learning approach has to be replicated in domains where the data is not internet text, so NVIDIA seeded BioNeMo for biology, Alpamayo for autonomy, GR00T for humanoid robotics and a climate model for mesoscale multiphysics. The economics of those fields would never produce a foundation model on their own. Third, safety and security require transparency. A black box cannot be defended or audited, and the future of cyber defense is not bigger-model-versus-bigger-model but swarms of cheap fast small models like Nemotron Nano surrounding the threat.

    MFU is the wrong metric, tokens per watt is closer

    A student raises the leaked memo that the xAI Memphis cluster is running at 11 percent Model Flops Utilization. Huang flips the framing. He says he would rather be at low MFU all the time, because that means he over-provisioned flops, memory bandwidth, memory capacity and network capacity. Bottlenecks shift constantly, so over-provisioning across every dimension is what lets the system absorb a spike without getting pinned by Amdahl’s law. In disaggregated inference, where prefill and decode are physically separated and decode is bandwidth-bound rather than flop-bound, NVLink72 can deliver extremely high tokens per watt while reporting very low MFU. Huang argues the right framing is performance, and ultimately tokens per watt as a rough proxy for intelligence per watt, adjusted for the fact that not all tokens are equal. A coding token is worth more than a generic token.
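    The claim that bandwidth-bound decode reports low MFU can be illustrated with a simple roofline-style estimate. This is a sketch under stated assumptions, not a model of any real cluster: it uses the common approximation of about 2 FLOPs per parameter per generated token, assumes the weights are read from memory once per decode step, ignores KV-cache traffic and networking, and does not model power at all. The model size, peak throughput and bandwidth below are hypothetical round numbers.

```python
def decode_step(params, bytes_per_param, batch, peak_flops, mem_bw):
    """Roofline-style estimate for one decode step of a dense model.

    Returns (mfu, tokens_per_second, limiting_resource).
    """
    flops = 2.0 * params * batch             # ~2 FLOPs per parameter per token
    bytes_moved = params * bytes_per_param   # weights streamed once per step
    t_compute = flops / peak_flops           # time if purely compute-limited
    t_memory = bytes_moved / mem_bw          # time if purely bandwidth-limited
    step_time = max(t_compute, t_memory)     # the slower resource sets the pace
    mfu = flops / (step_time * peak_flops)   # achieved FLOP/s over peak FLOP/s
    bound = "memory" if t_memory > t_compute else "compute"
    return mfu, batch / step_time, bound


# Hypothetical accelerator and model, chosen only for round arithmetic.
PARAMS = 70e9          # 70B-parameter dense model (assumption)
BYTES_PER_PARAM = 1    # 8-bit weights (assumption)
PEAK_FLOPS = 1e15      # 1 PFLOP/s peak (assumption)
MEM_BW = 4e12          # 4 TB/s memory bandwidth (assumption)

for batch in (1, 16, 64, 256):
    mfu, tps, bound = decode_step(PARAMS, BYTES_PER_PARAM, batch, PEAK_FLOPS, MEM_BW)
    print(f"batch={batch:4d}  MFU={mfu:6.1%}  tokens/s={tps:8.0f}  bound={bound}")
```

    Under these assumptions, small-batch decode sits below 1 percent MFU even though the memory system is fully busy, and MFU only climbs as batching shifts the bottleneck toward compute. That is the sense in which a low MFU figure on its own says little about whether a decode-heavy system is well used.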

    Hopper, Grace Blackwell NVLink72, Vera Rubin, Feynman

    Huang gives the clearest public framing of NVIDIA’s roadmap as a sequence of architectural answers to evolving compute patterns. Hopper was built for pre-training, at a moment when NVIDIA chose to build multi-billion-dollar machines while the largest scientific supercomputer in the world cost $350 million and the marketplace for such systems was, on paper, zero. Grace Blackwell NVLink72 was the answer to inference and reasoning: a rack-scale computer that ganged 72 GPUs together because decode needs aggregate memory bandwidth far beyond a single chip. The generation-over-generation speed-up was 50x in two years, twenty-five times what Moore’s Law would have delivered. Vera Rubin is being built explicitly for agents. Agents load long-term memory from storage that has to be wired directly into the GPU fabric, they use working memory, they call tools that run on a CPU, and they wait. So the CPU has to be Vera, optimized for low-latency single-threaded code, because the multi-billion-dollar GPU system cannot afford to idle waiting on a slow tool call. Feynman extends the pattern to swarms of agents with sub-agents and sub-sub-agents, a recursive software topology that will demand its own compute pattern.
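    The argument for a fast CPU next to the GPU is an Amdahl's law argument: once the GPU portion of an agent step gets much faster, the unaccelerated tool call dominates. The sketch below applies the standard Amdahl formula to a hypothetical agent step. The 50x factor is the generational speed-up quoted above; the split between GPU time and tool-call time is an assumption made purely for illustration.

```python
def amdahl_speedup(accelerated_fraction: float, factor: float) -> float:
    """Overall speed-up when only `accelerated_fraction` of the time gets `factor`x faster."""
    return 1.0 / ((1.0 - accelerated_fraction) + accelerated_fraction / factor)


def gpu_idle_fraction(accelerated_fraction: float, factor: float) -> float:
    """Share of the new, shorter step spent waiting on the unaccelerated part."""
    serial = 1.0 - accelerated_fraction
    return serial / (serial + accelerated_fraction / factor)


GPU_SPEEDUP = 50.0  # generational figure quoted in the text

# Hypothetical shares of an agent step originally spent generating tokens on the GPU;
# the remainder is time spent in CPU-side tool calls.
for gpu_share in (0.90, 0.99):
    print(
        f"GPU share {gpu_share:.0%}: overall speed-up "
        f"{amdahl_speedup(gpu_share, GPU_SPEEDUP):.1f}x, "
        f"GPU waiting {gpu_idle_fraction(gpu_share, GPU_SPEEDUP):.0%} of the time"
    )
```

    With 10 percent of the original step spent in tool calls, a 50x faster GPU yields only about an 8.5x faster step, and the GPU then waits roughly 85 percent of the time. Shrinking tool-call latency is the only way to recover the rest, which is the role the text assigns to Vera.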

    Energy demand and the grid

    Huang’s energy projection is one of the most aggressive numbers in the talk. NVIDIA can compound tokens per watt by 50x per generation through codesign, but the total compute demand is heading roughly a thousand times higher, and Huang says he would not be surprised if the real figure is one or two orders of magnitude beyond that. The reason is structural: future computing is generative and continuous, not pre-recorded and on-demand. The good news, he argues, is that this is the best moment in the history of humanity to invest in sustainable generation. Market forces alone are now sufficient to fund solar, nuclear and grid upgrades. Government subsidies are no longer required to make the math work.

    Adversarial countries, export controls and the telecom warning

    This is the segment where Huang is visibly fired up. He attacks the GPUs-as-atomic-bombs framing on its face. NVIDIA GPUs power medical imaging, video games and soy sauce delivery. A billion people use them. He advocates them to his family. The analogy collapses at the first comparison. He attacks the second framing, that American companies should not compete abroad because they will lose anyway, as a self-fulfilling defeat. Competition makes the company better. The third framing, that depriving the rest of the world of general-purpose computing benefits the United States, also fails on first principles: it benefits one or two American companies at the cost of an entire industry. The cautionary parallel is telecommunications. The United States once had a leading position in telecom fundamental technology and policied itself out of it. Huang’s worry, voiced explicitly to a room of CS students, is that they will graduate into a shell of a computer industry if the same path is repeated.

    AI doom and rational optimism

    In the same arc Huang rejects the science-fiction framing of AI as a singularity that arrives suddenly on a Wednesday at 7pm and ends civilization. He calls those claims irresponsible, says they are not true, and points out that the people advancing them are believed by audiences who then make policy on that basis. It is not true that no one understands how these systems work. It is not true that intelligence becomes infinitely powerful instantaneously. It is not true that there is no defense. His framing, which the host echoes as “rational optimism,” is that the goal is to create a future where people care about computers because the technology students are learning is worth mastering.

    Stanford’s compute problem is Stanford’s fault

    A student presses on the scarcity of compute for independent researchers, startups and universities inside the United States. Huang’s answer is sharp: there is no shortage. Place the order and the chips will arrive. The actual broken thing is institutional. University grants are fragmented across departments. No researcher can raise enough on a single grant to fund a billion-dollar shared cluster, and no one shares. He compares it to showing up at the grocery store demanding a billion dollars of tomatoes today. The solution is planning, aggregation and a campus-scale supercomputer, the way Stanford once built the linear accelerator. The endowment is $40 billion. Pulling a billion off it, contracting cloud capacity and giving every student and researcher AI supercomputer access is, in Huang’s view, obviously doable. When he says “it is Stanford’s fault” the host laughs, but Huang clarifies: if it is your fault you have the power to fix it.

    Career, suffering and resilience

    Asked how a CS student should spend the next few years, Huang pushes back on the standard “follow your passion” advice. Most people do not know what they love yet, because no one knows what they do not know. The bar of demanding joy from every working day is too high. Whatever the job is, do it as well as you can. Even as CEO of NVIDIA he says he genuinely loves about 10 percent of his work. The other 90 percent is hard and he suffers through it. He recommends suffering on purpose, because resilience is a muscle that only builds under load, and when the company, the team or the family needs that muscle, it has to already exist. Earlier in his life that meant cleaning toilets and busing tables at Denny’s. He does it today running a multi-trillion-dollar company.

    The biggest mistakes

    Huang separates technical mistakes from strategic mistakes. NVIDIA’s first generation of products was technically wrong in almost every way: curved surfaces instead of triangles, no Z-buffer, forward instead of inverse texture mapping, no floating point inside. The company wasted two and a half years. But the strategic genius of the recovery, the reading of the market, the conservation of resources and the reapplication of talent, is what taught him strategy. The clean strategic mistake he names is mobile. NVIDIA’s Tegra line grew to a billion dollars of revenue and then collapsed to zero when Qualcomm’s modem dominance locked NVIDIA out of the 3G to 4G transition. Huang explicitly refuses the comforting rationalization that the Tegra effort fed the Thor automotive chip (“Thor is the great great great grandson”). The original decision, he says, was a waste of time. The lesson is to think one or two clicks further about whether a market is structurally winnable before committing the company.

    Forecasting under fog of war

    The final substantive exchange is on forecasting. Huang’s method has four steps. Observe what is actually happening (AlexNet crushing two decades of computer vision research in one shot, GPT producing reasoning by token generation). Reason from first principles about why it works. Ask “so what” and “what next” recursively until a mental model of the future emerges. Place the company inside that future and work backwards. Crucially, expect to be partly wrong. Some outcomes will absolutely happen, some will likely happen, some might happen, and the strategy has to be robust across that distribution. The real cost of any strategic choice is the opportunity cost of the alternatives you did not take, so the discipline is to minimize that cost and maximize optionality while letting the journey itself pay for the journey.

    Thoughts

    The most useful thing in this conversation is the explicit architectural mapping of compute patterns to chip generations. Hopper for pre-training. Grace Blackwell NVLink72 for inference, because decode is bandwidth-bound and a single chip cannot supply it. Vera Rubin for agents, because tool calls stall multi-billion-dollar GPU systems and so the CPU has to be optimized for low-latency single-threaded code. Feynman for swarms. That sequence is not marketing. It is a falsifiable thesis about where the bottleneck moves next, and every other infrastructure company should be measuring themselves against it. If Huang is right that swarms of sub-agents are the next dominant pattern, then the design pressure shifts from raw flops to fabric topology, memory hierarchy and storage-to-GPU latency. That has implications for everyone downstream, including the hyperscalers building competing accelerators.

    The MFU section is the most intellectually generous moment in the talk. The instinct in the AI ops community has been to chase MFU as if it were a virtue. Huang argues, persuasively, that low MFU is consistent with high tokens per watt in a disaggregated inference setup, and that bottlenecks rotate fast enough that over-provisioning every resource is the rational design. That reframing matters because it changes what “scarce” means. Compute is not scarce in the way the discourse treats it. What is scarce is a coherent system designed end-to-end. The xAI 11 percent number, in that frame, is not embarrassing. It is the natural reading of a workload that is mostly decode.

    The Stanford segment is the part most likely to be quoted out of context. “It’s Stanford’s fault” is a deliberately provocative line, but the underlying claim is correct and load-bearing. Compute is not gated by NVIDIA refusing to ship chips. It is gated by the fact that fragmented grant funding cannot aggregate into the billion-dollar order that NVIDIA can fulfill. The implication is that universities and national labs need a structural change in how they pool capital for compute, and that the current model of every researcher buying a handful of cards is genuinely obsolete. Huang’s nudge about pulling a billion off the endowment is concrete enough to be acted on, and other major research universities should read this segment as a direct prompt.

    The geopolitical segment is the highest-stakes one. The telecommunications comparison is correct as a historical pattern, and Huang is one of the very few executives in a position to deliver that warning credibly. The unresolved tension is that the argument applies symmetrically. If American AI dominance is built by selling globally, that includes selling into adversarial states, and the policy question is where the line falls. Huang does not answer that question. He attacks the framing that lets the question be answered badly. That is a meaningful contribution to the discourse even if it does not resolve the underlying tradeoff.

    The career advice section is the part the social-media clips will mishandle. “Seek suffering” reads as macho when extracted. In context it is a specific operational claim about how resilience compounds, and it is paired with the Tegra story where Huang himself paid the price of not thinking one more click ahead. That kind of self-implication is rare in CEO talks, and it is the reason the talk is worth listening to in full rather than only reading the recap.

    Watch the full Stanford CS153 Frontier Systems conversation with Jensen Huang here.

  • Elad Gil on the AI Frontier: Compute Constraints, the Personal IPO, and Why Most AI Founders Should Sell in the Next 12 to 18 Months

    Elad Gil sat down with Tim Ferriss for a wide-ranging conversation that pairs almost perfectly with his recent Substack post, Random thoughts while gazing at the misty AI Frontier. Together, the podcast and the post lay out the cleanest framework I have seen for what is actually happening in AI right now: a Korean memory bottleneck capping every lab, a class-wide personal IPO across the research community, the fastest revenue ramps in capitalist history, and a brutal dot-com-style culling that most founders do not yet want to admit is coming. Below is a complete breakdown.

    TLDW (Too Long, Didn’t Watch)

    Elad Gil argues that AI is producing the fastest revenue ramps in capitalist history while setting up the same brutal power law that wiped out 99 percent of dot-com companies. OpenAI and Anthropic each sit at roughly 0.1 percent of US GDP today, and AI overall is on a path to roughly 1 percent of GDP run rate by the end of 2026, which is insanely fast by any historical standard. The current ceiling on capabilities is not chips but Korean high-bandwidth memory, and that constraint will likely keep all major labs roughly comparable in capability through 2028. Talent has just experienced a class-wide personal IPO via Meta-led bidding, with packages running tens to hundreds of millions of dollars per researcher. Most AI companies should consider exiting in the next 12 to 18 months while the tide is high. Right now consensus is correct. Save the contrarianism for later.

    Key Takeaways

    • OpenAI and Anthropic are each at roughly 0.1 percent of US GDP. With US GDP near 30 trillion dollars and each lab at a roughly 30 billion dollar revenue run rate, AI has gone from essentially zero to 0.25 to 0.5 percent of GDP in just a few years. If the labs hit 100 billion in run rate by year end 2026 (which many expect), AI hits 1 percent of GDP run rate inside a single year.
    • The AI personal IPO is real. Somewhere between 50 and a few hundred AI researchers across multiple companies just experienced a class-wide IPO event driven by Meta-led bidding, with top packages reportedly running tens to hundreds of millions of dollars per person. The closest historical analog is early crypto holders around 2017.
    • The bottleneck is Korean memory, not Nvidia chips. High-bandwidth memory from SK Hynix, Samsung, Micron, and others is the binding constraint, and it is expected to hold for roughly two years. After that, power and data center buildout become the next walls.
    • No lab can pull dramatically ahead before 2028. Because every lab is compute constrained on the same input, OpenAI, Anthropic, Google, xAI, and Meta should remain roughly comparable in capability through that window, absent an algorithmic breakthrough that stays inside one lab.
    • Compute is the new currency. Token budgets now define what an engineer can accomplish, what a company can spend, and what business models are viable. Some companies (neoclouds, Cursor) are effectively inference providers disguised as tools.
    • The dot com base rate is the AI base rate. Around 1,500 to 2,000 companies went public in the late 1990s internet cycle. A dozen or two survived. AI will likely look the same.
    • Most AI founders should consider selling in the next 12 to 18 months. If you are not in the durable handful, this is your value maximizing window. A handful of companies (OpenAI, Anthropic) should never sell.
    • Buyers are bigger than ever. One percent of a 3 trillion dollar market cap is 30 billion dollars. That math makes massive AI acquisitions trivial for hyperscalers, vertical incumbents, and adjacent giants.
    • Underrated exit path: merger of equals. Two private AI competitors destroying each other on price should consider just merging. PayPal (then Confinity) and X.com did exactly this in 2000.
    • 91 percent of global AI private market cap sits in a 10 by 10 mile square. If you want to do AI, move to the Bay Area. Remote work for cluster industries is BS.
    • Want money? Ask for advice. Want advice? Ask for money. The inverse also works: offering useful advice frequently leads to inbound investment opportunities.
    • AI is selling units of labor, not software. The shift is from selling seats and tools to selling cognitive output. This is why Harvey can win in legal, where decades of legal SaaS failed.
    • AI eats closed loops first. Tasks that can be turned into testable closed loop systems (code, AI research) get automated fastest. Map jobs on a 2×2 of closed loop tightness vs economic value to see where AI hits soonest.
    • Headcount will flatten at later stage companies. Multiple late stage CEOs told Elad they will not do big AI layoffs but will simply stop growing headcount even as revenue grows 30 to 100 percent. Hidden layoffs are also hitting outsourcing firms in India and the Philippines first.
    • The Slop Age could be the golden era of AI plus humanity. AI produces useful slop at volume, humans desloppify it, leverage is high, and the work is fun. This window may close as AI gets superhuman.
    • Market first, team second (90 percent of the time). Great teams die in bad markets. The exception is when you meet someone truly exceptional at the very earliest stage.
    • The one belief framework. If your investment memo needs three core beliefs to be true, it is too complicated. Coinbase was an index on crypto. Stripe was an index on e-commerce. That was the entire memo.
    • The four year vest is a relic. It exists because in the 1970s companies actually went public in four years. Today the private window has stretched to 20 years and venture has eaten what used to be public market growth investing.
    • Boards are in-laws. You cannot fire investor board members. Take a worse price for a better board member, because as Naval Ravikant said, valuation is temporary, control is forever.
    • Right now, consensus is correct. Save the contrarianism. The smart move is to just buy more AI exposure rather than try to outsmart the obvious.
    • Distribution wins more than founders admit. Google paid hundreds of millions to push the toolbar. Facebook bought ads on people’s own names in Europe. TikTok spent billions on user acquisition. Allbirds (yes, the shoe company) just raised a convert to build a GPU farm.
    • Anti-AI sentiment will get worse before it gets better. Maine banned new data centers. There has been violence directed at AI leaders. Expect more political and activist backlash, especially as AI is blamed for harms it has not yet caused while its benefits are mismeasured.
    • Use AI as a cold reader. Elad uploads photos of founders to AI models with cold reading prompts and reports surprisingly accurate personality assessments based on micro features.

    Detailed Summary

    The Numbers Are Insane and Mostly Underappreciated

    The most stunning data point in either source is the GDP math. US GDP is roughly 30 trillion dollars. OpenAI and Anthropic are each rumored to be at roughly 30 billion dollars in revenue run rate, putting each one at 0.1 percent of US GDP. Add cloud AI revenue and the picture gets stranger: AI has grown from essentially zero to between 0.25 and 0.5 percent of GDP in only a few years. If the labs hit 100 billion in run rate by year end 2026, AI will be at roughly 1 percent of GDP run rate inside a single year. There is no historical analog for that pace. Elad notes that productivity gains from AI may end up mismeasured the way internet productivity was undercounted in the 2000s, which would have downstream consequences for regulation: AI gets blamed for the bad (job losses) and credited for none of the good (new jobs, education gains, healthcare improvements). His half joking aside is that the real ASI test may be the ability to actually measure AI’s economic impact.
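    The GDP shares above are simple division, and a quick sketch makes the arithmetic easy to check. The inputs are the rough, rumored figures from the text, not official data, and reading "the labs hit 100 billion" as 100 billion each for two labs is an assumption made here for illustration.

```python
# Back-of-envelope check of the GDP shares quoted above.
# All inputs are the rough figures from the text, not official statistics.
US_GDP = 30e12  # roughly 30 trillion dollars


def share_of_gdp(revenue_run_rate: float, gdp: float = US_GDP) -> float:
    """Return a revenue run rate as a percentage of GDP."""
    return 100.0 * revenue_run_rate / gdp


per_lab_today = share_of_gdp(30e9)           # one lab at ~30B run rate
two_labs_today = share_of_gdp(2 * 30e9)      # OpenAI plus Anthropic combined
# Assumption: "100 billion in run rate" means 100B for each of the two labs.
two_labs_at_100b = share_of_gdp(2 * 100e9)

print(f"one lab today:      {per_lab_today:.2f}% of GDP")
print(f"two labs today:     {two_labs_today:.2f}% of GDP")
print(f"two labs at 100B:   {two_labs_at_100b:.2f}% of GDP")
```

    Under that reading the two labs alone land at about two thirds of a percent, so the "roughly 1 percent" figure only works once cloud AI revenue is counted on top.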

    The AI Personal IPO

    The most underdiscussed phenomenon in AI right now, according to Elad, is what he calls a class wide personal IPO. When a company IPOs, a subset of employees become wealthy, lose focus, and either start companies, get into politics, fund passion projects, or check out. Meta started aggressively bidding for AI talent. Other major labs had to match. The result was 50 to a few hundred researchers, scattered across multiple labs, suddenly receiving compensation in the tens to hundreds of millions of dollars range. The only historical analog Elad can think of is early crypto holders around 2017. Some chunk of these newly wealthy researchers will redirect attention to AI for science, side projects, or quiet quitting. The aggregate field stays mission aligned, but the distribution of attention has shifted.

    The Korean Memory Bottleneck

    Every major AI lab today is building giant Nvidia clusters paired with high bandwidth memory primarily from Korean fabs and a few other suppliers. They run massive amounts of data through these clusters for months, and the output is, almost absurdly, a single flat file containing what amounts to a compressed version of human knowledge plus reasoning. Right now, the binding constraint on this whole stack is HBM from SK Hynix, Samsung, Micron, and others. Korean memory fab capacity has lagged the capacity of every other piece of the system. Elad estimates this constraint persists for roughly two years. After that, the next walls are likely data center construction and power. The strategic implication is enormous. While memory constrains everyone, no single lab can buy 10x the compute of its rivals, so capabilities should stay roughly comparable across the major labs. Once that constraint lifts, possibly around 2028, one player could theoretically pull dramatically ahead, especially if AI assisted AI research closes a self improvement loop inside one lab.

    Compute Is the New Currency

    The blog post sharpens a framing that runs throughout the podcast: compute, denominated in tokens, is now a unit of economic value. Token budgets define what an engineer can accomplish, what a company can spend, and what business models work. Some companies are effectively inference providers wearing tool costumes. Neoclouds are the cleanest example. Cursor is another, subsidizing inference as a user acquisition strategy. The most absurd recent example: Allbirds, the shoe company, raised a convertible to build a GPU farm. Whether this becomes the AI version of MicroStrategy’s Bitcoin trade or a cautionary tale, it tells you where the cost of capital believes the next decade is going.

    The Dot Com Survival Math

    Elad walks through the brutal arithmetic that AI founders should be internalizing. In the late 1990s and early 2000s, somewhere between 1,500 and 2,000 internet companies went public. Of those, roughly a dozen or two survived in any meaningful form. Every cycle has looked like this: automotive in the early 1900s, SaaS, mobile, crypto. There is no reason AI will be different. Most current AI companies, including those ramping revenue today, will see the market, competition, and adoption turn on them. The question every AI founder should be asking is whether they are in the durable handful or not.
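    To put the survival math in percentage terms, here is a minimal sketch. It assumes "a dozen or two" means 12 to 24 survivors and uses the 1,500 to 2,000 IPO range from the text; both ranges are loose estimates, so the output is a rough band rather than a measured rate.

```python
# Rough survival-rate band for dot com era IPOs, using the ranges in the text.
# Assumption: "a dozen or two" survivors is read as 12 to 24.
ipos_low, ipos_high = 1500, 2000
survivors_low, survivors_high = 12, 24

# Worst case: fewest survivors out of the most IPOs.
lowest_rate = 100 * survivors_low / ipos_high
# Best case: most survivors out of the fewest IPOs.
highest_rate = 100 * survivors_high / ipos_low

print(f"survival rate: {lowest_rate:.1f}% to {highest_rate:.1f}%")
```

    Even the generous end of that band is under two percent, which is the base rate a founder is implicitly betting against.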

    Most AI Companies Should Consider Exiting in the Next 12 to 18 Months

    This is the most actionable and most uncomfortable take in either source. While the tide is rising, every AI company looks unstoppable. Whether they actually are, in a 10 year frame, is a separate question. Founders running successful AI companies should take a cold honest look at whether the next 12 to 18 months is their value maximizing window. Companies typically have a 6 to 12 month peak before some headwind hits, often visible in the second derivative of growth. The best signal that you should sell is when growth rate is starting to plateau and you can see why. A handful of companies (OpenAI, Anthropic, the durable winners) should never exit. Many others should, while everything is still on the upswing.

    What Makes an AI Company Durable

    Elad lays out four lenses for evaluating durability at the application layer:

    1. Does your product get dramatically better when the underlying model gets better, in a way that keeps customers loyal?
    2. How deep and broad is the product? Are you building multiple integrated products embedded in actual workflows?
    3. Are you embedded in real change management at the customer? AI adoption is mostly a workflow change problem, not a tech problem. Workflow embedding is durable.
    4. Are you capturing and using proprietary data in a way that creates a system of record? Data moats are often overstated, but sometimes real.

    At the lab layer, Elad believes OpenAI, Anthropic, and Google are durable absent disaster. He predicted three years ago that the foundation model market would settle into an oligopoly aligned with cloud, and that prediction has roughly held.

    Selling Work, Not Software

    The deepest structural insight in the conversation is that generative AI is shifting what software companies sell. The old model was selling seats, tools, and SaaS subscriptions. The new model is selling units of cognitive labor. Zendesk sold seats to support reps. Decagon and Sierra sell agentic support output. Harvey can win in legal even though selling to law firms was historically considered terrible business, because Harvey is not selling tools, it is augmenting lawyer output. This shift opens markets that were previously closed and dramatically grows tech TAMs. It is also why founder limited theories of entrepreneurship currently understate how many opportunities exist.

    AI Eats Closed Loops First

    One of the cleanest mental models in the blog post is the closed loop framework. AI automates first what can be turned into a testable closed loop. Code is the canonical example: outputs can be tested, errors detected, models can iterate. AI research is similar. Both have tight feedback loops and high economic value, which puts them at the top of the AI impact ranking. Map jobs on a 2×2 of closed loop tightness vs economic value and you can see where AI hits soonest. The interesting forward question is which jobs become more closed loop next. Data collection and labeling will keep growing in every field as a result.
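    The 2×2 can be sketched as a small scoring exercise. Everything numeric below is hypothetical: the 0 to 1 scores, the 0.5 threshold, and the use of a simple product to order jobs are illustrative choices made here, not anything Elad specifies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Job:
    name: str
    loop_tightness: float  # 0.0 (no testable feedback) to 1.0 (fully machine checkable)
    economic_value: float  # 0.0 (low) to 1.0 (high)


def quadrant(job: Job, threshold: float = 0.5) -> str:
    """Place a job in one of the four cells of the 2x2."""
    loop = "tight loop" if job.loop_tightness >= threshold else "loose loop"
    value = "high value" if job.economic_value >= threshold else "low value"
    return f"{loop}, {value}"


def rank_by_exposure(jobs: list[Job]) -> list[Job]:
    """Order jobs by a crude exposure score: tightness times value."""
    return sorted(jobs, key=lambda j: j.loop_tightness * j.economic_value, reverse=True)


# Hypothetical scores, chosen only to illustrate the mapping.
jobs = [
    Job("software engineering", 0.9, 0.9),
    Job("AI research", 0.8, 0.9),
    Job("hypothetical role A", 0.2, 0.8),
    Job("hypothetical role B", 0.7, 0.2),
]

for job in rank_by_exposure(jobs):
    print(f"{job.name}: {quadrant(job)}")
```

    The useful part is less the ranking than the scoring step itself, which forces an explicit answer to how testable a job's output really is.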

    The Harness Matters More Than People Think

    For coding tools and increasingly for enterprise applications, what Elad calls the harness, the wrapper of UX, prompting, workflow integration, and brand around the underlying model, is becoming sticky. It is not just which model you call. It is the environment built around it. Cursor and Windsurf demonstrate this in coding. The interesting open questions are what the harness looks like for sales AI, for AI architects, for analyst workflows. Those gaps leave room for startups even as model capabilities converge.

    Hidden Layoffs and the Developing World

    Most announced AI driven layoffs are probably just COVID era overhiring corrections wrapped in a more flattering narrative. But real AI driven labor displacement is happening, and it is hitting outsourcing firms first. That means countries like India and the Philippines, where many outsourced services jobs sit, are likely to be the most impacted earliest. Several developing economies built their growth ladders on services exports. If AI takes those jobs first, the migration and economic patterns of the next decade may shift in ways nobody is yet planning for.

    The Flat Company

    Multiple late stage CEOs told Elad they will not announce big AI layoffs. Instead, they will simply stop growing headcount. If revenue grows 30 to 100 percent, headcount stays flat or shrinks via attrition. Existing employees become dramatically more productive. The very best people who can leverage AI will see compensation inflate. Sales and some growth engineering keep hiring. Almost everything else flatlines. This is mostly a later stage and public company phenomenon. True early stage startups should still scale aggressively after product market fit, just with more leverage per person.

    Exit Options for AI Founders

    Elad lays out four exit categories. First, the labs and hyperscalers themselves: Apple, Amazon, Google, Microsoft, Meta. Second, vertical incumbents like Thomson Reuters for legal or healthcare giants for clinical AI. Third, the underrated category of merger of equals between two private AI competitors who are currently destroying each other on price. PayPal and X.com did this in 2000. Uber and Lyft reportedly almost did. Fourth, large adjacent tech companies: Oracle, Samsung, Tesla, SpaceX, Snowflake, Databricks, Stripe, Coinbase. The market cap math has changed in a way that makes acquisition trivial. One percent of a three trillion dollar market cap is 30 billion dollars, which means a hyperscaler can do massive acquisitions almost casually.

    Geographic Concentration Is Extreme

    Elad’s team analyzed where private market cap aggregates. Historically half of global tech private market cap sat in the US, with half of that in the Bay Area. With AI, 91 percent of global AI private market cap is in a single 10 by 10 mile square in the Bay Area. New York is a distant second and then it falls off a cliff. For defense tech, the cluster is Southern California (SpaceX, Anduril, El Segundo, Irvine). Fintech and crypto skew toward New York. The remote everywhere advice is, Elad says, just BS for anyone trying to break into an industry cluster.

    How Elad Got Into His Best Deals

    Stripe started with Elad cold emailing Patrick Collison after selling an API company to Twitter. A couple of walks later, Patrick texted that he was raising and Elad was in. Airbnb came from helping the founders raise their Series A and being asked at the end if he wanted to invest. Anduril came from noticing that Google had shut down Project Maven and asking if anyone was building defense tech, then meeting Trae Stephens at a Founders Fund lunch. Perplexity came from Aravind Srinivas cold messaging him on LinkedIn while still at OpenAI. Across all of these, the pattern is the same: be in the cluster, be helpful, be talking publicly about technology nobody else is talking about, and be useful to founders before any money is on the table.

    The One Belief Framework

    Investors love complicated 50 page memos. Elad believes the actual decision usually collapses into a single core belief. Coinbase: this is an index on crypto, and crypto will keep growing. Stripe: this is an index on e-commerce, and e-commerce will keep growing. Anduril: AI plus drones plus a cost plus model will be important for defense. If your thesis needs three things to be true, it is probably not going to work. If it needs nothing, you have no thesis.

    Boards as In-Laws

    Elad emphasizes that founders should treat board composition like one of the most important hiring decisions of the company. You cannot fire an investor board member. They have contractual rights. So if you are going to be stuck with someone for a decade, take a worse valuation for a better human. Reid Hoffman’s framing is that the best board member is a co-founder you could not have otherwise hired. Naval Ravikant’s framing is that valuation is temporary but control is forever. Elad recommends writing a job spec for every board seat.

    The Slop Age as a Golden Era

    One of the warmest takes in the blog post is the framing of the current moment as the Slop Age, and the suggestion that this might actually be the golden era of AI plus humanity. Before the last few years, AI was inaccessible and narrow. Eventually AI may become superhuman at most tasks. Today, AI produces useful slop at volume, which means humans are still needed to desloppify the slop, but the leverage on time and ambition is real. That makes the work fun. If AI displaces people or starts doing more interesting work, this golden moment fades. Elad also notes the obvious counter, that the era of human generated internet slop preceded the AI slop era. AGI may end the slop age, or alternately may be the thing that finally cleans up all the prior waves of human slop.

    Anti-AI Regulation and Violence Will Increase

    This is one of the more sobering threads in the blog post. Real world AI driven labor displacement has been small so far, but anti-AI sentiment is already strong and growing. Maine just banned new data centers. There has been actual violence directed at AI leaders, including a recent attack on Sam Altman. Elad’s view is that AI leaders should work harder on optimistic public framing, real political lobbying, and reining in the doom narrative coming from inside the field. Otherwise the regulatory and activist backlash will get much worse, and likely on the basis of mismeasured impacts.

    Right Now Consensus Is Correct

    The headline contrarian take from the episode is that contrarianism right now is wrong. There are moments in time when betting against the crowd pays. This is not one of them. The smart bet is just buying more AI exposure. Trying to find the clever angle, the overlooked hardware play, the secret macro thesis, is overthinking it. Save the contrarian moves for later in the cycle.

    Distribution Almost Always Matters

    Elad pushes back on the founder mythology that great products win on their own. Google paid hundreds of millions of dollars in the early 2000s to distribute its toolbar through every popular app installer on the internet. Facebook bought search ads against people’s own names in European markets to seed network liquidity. TikTok spent billions on user acquisition before its algorithm could lock people in. Snowflake spent enormous sums on enterprise sales and channel partnerships. Sometimes the best product wins. Often the company with the best distribution wins. Founders should plan for both.

    AI as a Cold Reader and a Research Partner

    Two of the more practical AI workflows Elad describes: First, uploading photos of founders to AI models with cold reading prompts that ask the model to identify micro features (crow’s feet from genuine smiling, brow patterns, posture cues) and infer personality traits, sense of humor, and likely social behavior. He reports the outputs are surprisingly specific. Second, running deep dives across multiple models in parallel (Claude, ChatGPT, Gemini), asking each for primary sources, summary tables, and cross checked data. He recently used this approach to investigate the rise in autism and ADHD diagnoses, concluding that diagnostic criteria shifts and school incentives drive most of it, and noting that maternal age has a stronger statistical association with autism than paternal age, despite paternal age getting all the public discourse.

    The First Ever 10 Year Plan

    For someone who has been compounding aggressively for two decades, Elad has somehow never written a 10 year plan until now. He knows it will not play out as written. The point is that the act of imagining a decade out shifts what you choose to do in the near term. He explicitly rejects the “AGI in two years, therefore plans are pointless” framing as defeatist. There will be interesting things to do regardless of how the AGI timeline plays out.

    Thoughts

    This is one of the more useful AI investor conversations of 2026, mostly because Elad is willing to put numbers and timelines on things that are usually left vague. Pairing the podcast with the underlying Substack post is the right move because the post is where the GDP math, the closed loop framework, and the Slop Age framing actually live. The podcast is where Elad explains how he thinks rather than just what he thinks.

    The 12 to 18 month sell window framing is the most actionable single idea in either source, and probably the most uncomfortable for AI founders sitting on multi billion dollar paper valuations. The math is unforgiving. A dozen winners out of thousands. If you are honest with yourself about whether you are in the dozen, you know what to do.

    The Korean memory bottleneck framing explains a lot of current behavior. The talent wars make more sense once you accept that compute is not going to be the differentiator for two years, so people become the only remaining lever. The convergence of capabilities across OpenAI, Anthropic, Google, and xAI starts to look less like coincidence and more like the structural inevitability of a supply constrained input. The 2028 inflection date is the one to watch.

    Compute as currency is the cleanest reframing in the blog post. Once you start pricing companies in tokens rather than dollars, everything from Cursor’s economics to Allbirds raising a convert to build a GPU farm becomes legible. The interesting question is whether this is a permanent unit of denomination or a transitional one that fades when inference costs collapse.

    The software to labor argument is the structural framing that I think will hold up the longest. Once you internalize that we are not selling seats anymore but selling cognitive output, every vertical that was previously locked behind ugly procurement and IT inertia opens up. Harvey is the proof of concept. There will be 30 more Harveys across every white collar profession.

    The closed loop framework is the cleanest predictor of which jobs get hit hardest and soonest. If you want to know whether your role is exposed, the questions to ask are whether outputs can be machine evaluated, how tight the feedback loop is, and how high the economic value is. The intersection is where AI lands first.

    The geographic concentration data is genuinely shocking. 91 percent of global AI private market cap in a 10 by 10 mile area is the kind of statistic that should make everyone outside that square think very carefully about what game they are playing.

    The Slop Age framing is the most emotionally honest moment in the post. We are in a window where humans still meaningfully add value on top of AI output. That window is finite. Enjoy it.

    The anti-AI backlash thread is the one I think most people in the industry are still underweighting. Maine banning new data centers is a leading indicator, not a one off. The fact that the impacts are likely to be mismeasured by official statistics makes the political dynamics worse, not better. AI will get blamed for harms it did not cause and credited for none of the gains. If the field’s leaders do not start communicating better and lobbying smarter, the regulatory environment in 2028 will be much worse than in 2026.

    Finally, Elad’s first ever 10 year plan stands out as the most quietly important moment in the episode. The implicit message is that even people who have been compounding aggressively for two decades benefit from forcing a longer time horizon onto their thinking. Most plans fail. The act of planning still changes what you do today.

    Read the original Elad Gil post here: Random thoughts while gazing at the misty AI Frontier. Find Elad on X at @eladgil, on his Substack at blog.eladgil.com, and on his website at eladgil.com. Tim Ferriss publishes the full episode at tim.blog/podcast.

  • Jensen Huang on Nvidia’s Supply Chain Moat, TPU Competition, China Export Controls, and Why Nvidia Will Not Become a Cloud (Dwarkesh Podcast Summary)

    TLDW (Too Long, Didn’t Watch)

    Jensen Huang sat down with Dwarkesh Patel for over 90 minutes covering Nvidia’s supply chain dominance, the TPU threat, why Nvidia will not become a hyperscaler, whether the US should sell AI chips to China, and why Nvidia does not pursue multiple chip architectures at once. Jensen framed Nvidia’s entire business as transforming “electrons into tokens” and argued that Nvidia’s real moat is not any single technology but the full stack ecosystem it has built over two decades. He was blunt about his regret over not investing in Anthropic and OpenAI earlier, passionate about keeping the American tech stack dominant worldwide, and dismissive of the idea that China’s chip industry can be meaningfully contained through export controls.

    Key Takeaways

    1. Nvidia’s moat is the ecosystem, not the chip. Jensen repeatedly emphasized that Nvidia’s competitive advantage comes from CUDA, its massive installed base, its deep partnerships across the entire supply chain, and the fact that it operates in every cloud. The moat is not a single product but an interlocking system that took 20+ years to build.

    2. Supply chain bottlenecks are temporary, energy bottlenecks are not. Jensen argued that CoWoS packaging, HBM memory, EUV capacity, and logic fabrication bottlenecks can all be resolved in two to three years with the right demand signal. The real constraint on AI scaling is energy policy, which takes far longer to fix.

    3. TPUs and ASICs are not an existential threat to Nvidia. Jensen was emphatic that no competitor has demonstrated better price-performance or performance-per-watt than Nvidia, and challenged TPU and Trainium to prove otherwise on public benchmarks like InferenceMAX and MLPerf. He described Anthropic as a “unique instance, not a trend” for TPU adoption.

    4. Jensen regrets not investing in Anthropic and OpenAI earlier. He admitted he did not deeply internalize how much capital AI labs needed and that traditional VC funding was not sufficient for companies at that scale. He described this as a clear miss, though he said Nvidia was not in a position to make multi-billion dollar investments at the time.

    5. Nvidia will not become a hyperscaler. Jensen’s philosophy is “do as much as needed, as little as possible.” Building cloud infrastructure is something other companies can do, so Nvidia supports neoclouds like CoreWeave, Nebius, and Nscale instead of competing with them. Nvidia invests in ecosystem partners rather than vertically integrating into cloud services.

    6. Jensen is strongly against US chip export controls on China. This was the longest and most heated segment of the interview. Jensen argued that China already has abundant compute, energy, and AI researchers, and that export controls have accelerated China’s domestic chip industry while causing the US to concede the world’s second-largest technology market. He compared the situation to how US telecom policy allowed Huawei to dominate global telecommunications.

    7. AI will cause software tool usage to skyrocket, not collapse. Jensen pushed back on the narrative that AI will commoditize software companies. He argued that agents will use existing tools at massive scale, causing the number of instances of products like Excel, Synopsys Design Compiler, and other enterprise tools to grow exponentially.

    8. Nvidia does not pick winners among AI labs. Jensen explained that Nvidia invests across multiple foundation model companies simultaneously and refuses to favor any single one. He cited his own company’s unlikely survival story as the reason for this humility: Nvidia’s original graphics architecture was “precisely wrong” and would have been counted out by anyone picking winners.

    9. Nvidia added Groq for premium token economics. Nvidia recently acquired Groq and is folding it into the CUDA ecosystem because the market is now segmenting into different token tiers. Some customers will pay premium prices for faster response times even at lower throughput, creating a new segment of the inference market.

    10. Without AI, Nvidia would still be very large. Jensen was clear that accelerated computing, not AI specifically, is the foundational mission of the company. Molecular dynamics, quantum chemistry, computational lithography, data processing, and physics simulation all benefit from GPU acceleration regardless of deep learning.

    Detailed Summary

    Nvidia’s Real Business: Electrons to Tokens

    Jensen opened the conversation by reframing Nvidia’s entire value proposition. When Dwarkesh suggested that Nvidia is fundamentally a software company that sends a GDSII file to TSMC for manufacturing, Jensen pushed back hard. He described Nvidia’s job as transforming electrons into tokens, with everything in between representing an “incredible journey” of artistry, engineering, science, and invention. He said the transformation is far from deeply understood and the journey is far from over, making commoditization unlikely.

    Jensen described Nvidia as operating a philosophy of doing “as much as necessary and as little as possible.” Whatever Nvidia does not need to do itself, it partners with someone else and makes it part of the broader ecosystem. This is why Nvidia has what Jensen called probably the largest ecosystem of partners in the industry, spanning the full supply chain upstream and downstream, application developers, model makers, and all five layers of the AI stack.

    On the question of whether AI will commoditize software companies, Jensen offered a contrarian take. He argued that agents are going to use software tools at unprecedented scale, meaning the number of instances of products like Excel, Cadence design tools, and Synopsys compilers will skyrocket. Today the bottleneck is the number of human engineers. Tomorrow, those engineers will be supported by swarms of agents exploring design spaces and using the same tools humans use today. Jensen said the reason this has not happened yet is simply that the agents are not good enough at using tools. That will change.

    The Supply Chain Moat

    Dwarkesh pressed Jensen on Nvidia’s reported $100 billion (and potentially $250 billion) in purchase commitments with foundries, memory manufacturers, and packaging companies. The question was whether Nvidia’s real moat for the next few years is simply locking up scarce upstream components so that no competitor can get the memory and logic they need to build alternative accelerators.

    Jensen confirmed this is a significant advantage but framed it differently. He said Nvidia has made enormous explicit and implicit commitments upstream. The implicit commitments matter just as much: Jensen personally meets with CEOs across the supply chain to explain the scale of the coming AI industry, convince them to invest in capacity, and assure them that Nvidia’s downstream demand is large enough to justify that investment. Nvidia’s GTC conference serves this purpose too, bringing the entire ecosystem together so upstream suppliers can see downstream demand and vice versa.

    Jensen described a process of systematically “prefetching bottlenecks” years in advance. CoWoS advanced packaging was a major bottleneck two years ago, but Nvidia swarmed it with repeated doubling of capacity until TSMC recognized it as mainstream computing technology rather than a specialty product. More recently, Nvidia has invested in the silicon photonics ecosystem through partnerships with Lumentum and Coherent, invented new packaging technologies, licensed patents to keep the supply chain open, and even invested in new testing equipment like double-sided probing.

    When Dwarkesh asked about the ultimate physical bottlenecks, Jensen surprised him. The hardest bottleneck to solve is not CoWoS or HBM or EUV machines. It is plumbers and electricians needed to build data centers. Jensen used this as a launching point to criticize “doomers” who discourage people from pursuing careers in software engineering or radiology, arguing that scaring people out of these professions creates the real bottlenecks.

    On EUV and logic scaling specifically, Jensen was optimistic. He said no supply chain bottleneck lasts longer than two to three years. Once you can build one of something, you can build ten, and once you can build ten, you can build a million. The key is a clear demand signal. If TSMC is convinced of the demand, ASML will produce enough EUV machines. Meanwhile, Nvidia continues to improve computing efficiency by 10x to 50x per generation through architecture, algorithms, and system design.

    The TPU Question

    Dwarkesh pushed hard on whether Google’s TPUs represent a real threat, noting that two of the top three AI models (Claude and Gemini) were trained on TPUs. Jensen drew a sharp distinction between what Nvidia builds and what a TPU is. Nvidia builds accelerated computing, which serves molecular dynamics, quantum chromodynamics, data processing, fluid dynamics, particle physics, and AI. A TPU is a tensor processing unit optimized for matrix multiplies. Nvidia’s market reach is far greater than any TPU or ASIC can possibly have.

    Jensen emphasized programmability as Nvidia’s core architectural advantage. If you want to invent a new attention mechanism, build a hybrid SSM model, fuse diffusion and autoregressive techniques, or disaggregate computation in a novel way, you need a generally programmable architecture. The only way to achieve 10x or 100x performance leaps (versus the roughly 25% per year from Moore’s Law) is to fundamentally change the algorithm, and that requires the flexibility CUDA provides.
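    The gap between 25% per year and a 10x or 100x leap is easier to feel with compounding written out. This is a minimal sketch that takes the 25% annual figure from the text at face value and assumes it compounds steadily, which is a simplification.

```python
import math


def years_to_multiple(multiple: float, annual_gain: float) -> float:
    """Years of steady compounding needed to reach a given speed-up multiple."""
    return math.log(multiple) / math.log(1.0 + annual_gain)


# Assumption: the "roughly 25% per year" figure compounds at a constant rate.
MOORE_RATE = 0.25

for target in (10, 100):
    years = years_to_multiple(target, MOORE_RATE)
    print(f"{target}x at 25% per year: about {years:.1f} years")
```

    At that rate a 10x gain takes a little over ten years and 100x takes more than twenty, which is why Jensen ties step changes to algorithmic change rather than process scaling.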

    On the specific question of whether hyperscalers with huge engineering teams can simply write their own kernels and bypass CUDA, Jensen acknowledged they do write custom kernels but argued that Nvidia’s engineers still routinely deliver 2x to 3x speedups when they optimize a partner’s stack. He described Nvidia’s GPUs as “F1 racers” that anyone can drive at 100 mph, but extracting peak performance requires deep architectural expertise. Nvidia uses AI itself to generate many of its optimized kernels.

    Jensen was particularly blunt about public benchmarks. He pointed to Dylan Patel’s InferenceMAX benchmark and said neither the TPU nor Trainium camp has been willing to demonstrate its claimed performance advantages on it. He said Nvidia’s performance-per-TCO is the best in the world, “bar none,” and challenged anyone to prove otherwise.

    Regarding Anthropic’s multi-gigawatt deal with Broadcom and Google for TPUs, Jensen called it “a unique instance, not a trend.” He said without Anthropic, there would be essentially no TPU growth and no Trainium growth. He traced this back to his own mistake: when Anthropic and OpenAI needed multi-billion dollar investments from their compute suppliers to get off the ground, Nvidia was not in a position to provide that capital. Google and AWS were, and in return, Anthropic committed to using their compute.

    Nvidia’s Investment Strategy and Regrets

    Jensen was unusually candid about his regret over not investing in foundation model companies earlier. He said he did not deeply internalize how different AI labs were from typical startups. A traditional VC would never put $5 to $10 billion into a single AI lab, but that was exactly what companies like OpenAI and Anthropic needed. By the time Jensen understood this, Nvidia was not in a financial or cultural position to make those kinds of investments.

    Now, Nvidia has invested approximately $30 billion in OpenAI and $10 billion in Anthropic. Jensen said he is delighted to support both and considers their existence essential for the world. But he acknowledged that these investments came at much higher valuations than would have been possible years earlier.

    Jensen explained Nvidia’s broader investment philosophy: support everyone, do not pick winners. If he invests in one foundation model company, he invests in all of them. This comes from hard-won humility. When Nvidia started, there were 60 3D graphics companies. Nvidia’s original architecture was “precisely wrong” and the company would have been at the top of most lists to fail. Jensen said he has enough humility from that experience to know that you cannot predict which AI company will ultimately succeed.

    Why Nvidia Will Not Become a Hyperscaler

    Dwarkesh pointed out that Nvidia has the cash to build and operate its own cloud infrastructure, bypassing the middleman ecosystem that converts CapEx into OpEx for AI labs. Jensen rejected this path based on his core operating philosophy.

    If Nvidia had not built its computing platform, NVLink, and the CUDA ecosystem, nobody else would have. He is “completely certain” of that. These are things Nvidia must do. But the world has lots of clouds. If Nvidia did not build a cloud, someone else would show up. So the answer is to support the ecosystem instead: invest in CoreWeave, Nscale, Nebius, and others to help them exist and scale, rather than competing with them.

    Jensen was clear that Nvidia is not trying to be in the financing business either. When OpenAI needed a $30 billion investment before its IPO, Nvidia stepped up because OpenAI needed it and Nvidia deeply believed in the company. But these are targeted ecosystem investments, not a strategic pivot into cloud services.

    On GPU allocation during shortages, Jensen pushed back on the narrative that Nvidia strategically “fractures” the market by giving allocations to smaller neoclouds. He said the process is straightforward: you forecast demand, you place a purchase order, and it is first in, first out. Nvidia never changes prices based on demand. Jensen said he prefers to be dependable and serve as the foundation of the industry rather than extracting maximum short-term value.
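
    The first-in, first-out process described above can be sketched as a simple queue. This is an illustration of FIFO allocation in general, not Nvidia's actual system: the `PurchaseOrder` type, the customer names, the unit counts, and the choice to partially fill the last order are all hypothetical.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class PurchaseOrder:
    customer: str  # hypothetical customer label
    units: int     # number of units requested


def allocate_fifo(orders, supply):
    """Fill orders strictly in arrival order until supply runs out.

    Returns a list of (customer, units_shipped) tuples. The last order
    served may be partially filled; that is an assumption of this sketch.
    """
    queue = deque(orders)
    shipments = []
    while queue and supply > 0:
        po = queue.popleft()             # oldest order first
        shipped = min(po.units, supply)  # never ship more than remains
        shipments.append((po.customer, shipped))
        supply -= shipped
    return shipments


orders = [
    PurchaseOrder("cloud_a", 500),
    PurchaseOrder("neocloud_b", 200),
    PurchaseOrder("lab_c", 400),
]
print(allocate_fifo(orders, supply=800))
```

    Note that order size and customer identity play no role in the ordering; only arrival position does, which is the point Jensen was making about allocation.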

    The China Debate

    The longest and most heated section of the interview was Jensen’s case against US chip export controls on China. This was a genuine debate, with Dwarkesh pushing the national security argument and Jensen pushing back forcefully.

    Jensen’s core argument rested on several pillars. First, China already has abundant compute. They manufacture 60% or more of the world’s mainstream chips, have massive energy infrastructure (including empty data centers with full power), and employ roughly 50% of the world’s AI researchers. The threshold of compute needed to build models like Anthropic’s Mythos has already been reached and exceeded by China’s existing infrastructure.

    Second, export controls have backfired. They accelerated China’s domestic chip industry, forced their AI ecosystem to optimize for internal architectures instead of the American tech stack, and caused the United States to concede the second-largest technology market in the world. Jensen compared this directly to how US telecom policy allowed Huawei to dominate global telecommunications infrastructure.

    Third, Jensen argued that AI is a five-layer stack (energy, chips, computing platform, models, applications) and the US needs to win at every layer. Fixating on one layer (models) at the expense of another layer (chips) is counterproductive. If Chinese open source AI models end up optimized for non-American hardware and that stack gets exported to the global south, the Middle East, Africa, and Southeast Asia, the US will have lost something far more valuable than whatever marginal compute advantage the export controls provided.

    Dwarkesh countered with the Mythos example: Anthropic’s new model found thousands of high-severity zero-day vulnerabilities across every major operating system and browser, including one that had existed in OpenBSD for 27 years. If China had enough compute to train and deploy a model like Mythos at scale before the US could prepare, the cyber-offensive capabilities would be devastating.

    Jensen’s response was direct. Mythos was trained on “fairly mundane capacity” that is already abundantly available in China. The amount of compute is not the bottleneck for that kind of breakthrough. Great computer science is, and China has no shortage of brilliant AI researchers. He pointed to DeepSeek as evidence: most advances in AI come from algorithmic innovation, not raw hardware. If China’s researchers can achieve breakthroughs like DeepSeek with limited hardware, imagine what they could do with more.

    Jensen also argued for dialogue over confrontation. He said it is essential that American and Chinese AI researchers are talking to each other, and that both countries agree on what AI should not be used for. The idea that you can prevent AI risks by cutting off chip sales, when the real advances come from algorithms and computer science, reflects a fundamental misunderstanding of how AI progress works.

    The debate ended without resolution, but Jensen’s final point was sharp: “I’m not talking to somebody who woke up a loser. That loser attitude, that loser premise, makes no sense to me.”

    Why Not Multiple Chip Architectures?

    Near the end of the interview, Dwarkesh asked why Nvidia does not run multiple parallel chip projects with different architectures, like a Cerebras-style wafer-scale design or a Dojo-style huge package, or even one without CUDA.

    Jensen’s answer was simple: “We don’t have a better idea.” Nvidia simulates all of these alternative approaches in its internal simulators and they are provably worse. The company works on exactly the projects it wants to work on. If the workload were to change dramatically (not just the algorithms, but the actual market shape), Nvidia might add other accelerators.

    In fact, Nvidia recently did exactly this by acquiring Groq. The inference market is now segmenting into different tiers. Some customers will pay premium prices for extremely fast response times even if throughput is lower. This creates a new “high ASP token” segment that justifies a different point on the performance curve. But Jensen was clear: if he had more money, he would put it all behind Nvidia’s existing architecture, not diversify into alternatives.
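
    A rough way to see why a “high ASP token” tier can make sense is to compare revenue per hour for a throughput-optimized tier and a latency-optimized tier. Every number below is made up for illustration, and the function name is hypothetical; the interview gave no prices or throughput figures.

```python
def revenue_per_hour(tokens_per_second: float, usd_per_million_tokens: float) -> float:
    """Hourly revenue for a system serving tokens at a given rate and price."""
    tokens_per_hour = tokens_per_second * 3600
    return tokens_per_hour / 1_000_000 * usd_per_million_tokens


# Hypothetical tiers: the low-latency tier serves fewer tokens per second
# but charges more per token.
throughput_tier = revenue_per_hour(tokens_per_second=10_000, usd_per_million_tokens=2.0)
low_latency_tier = revenue_per_hour(tokens_per_second=2_500, usd_per_million_tokens=10.0)

print(f"throughput tier:  ${throughput_tier:.2f} per hour")
print(f"low-latency tier: ${low_latency_tier:.2f} per hour")
```

    With these invented figures, the low-latency tier earns more per hour despite a quarter of the throughput, because its price per token is five times higher. Whether real prices and hardware costs work out that way is exactly the market question the paragraph above raises.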

    Nvidia Without AI

    Jensen closed by saying that even if the deep learning revolution had never happened, Nvidia would be “very, very large.” The premise of the company has always been that general-purpose computing cannot scale indefinitely and that domain-specific acceleration is the way forward. Molecular dynamics, seismic processing, image processing, computational lithography, quantum chemistry, and data processing all benefit from GPU acceleration regardless of AI. Jensen said the fundamental promise of accelerated computing has not changed, “not even a little bit.”

    Thoughts

    This interview is one of the most revealing Jensen Huang conversations in years, partly because Dwarkesh actually pushes back instead of lobbing softballs. A few things stand out.

    The Anthropic regret is real and significant. Jensen is essentially admitting that Nvidia’s biggest strategic miss of the AI era was not understanding that foundation model companies needed supplier-level capital commitments, not VC funding. The fact that Google and AWS used compute investments to lock in Anthropic’s architecture choices has had downstream consequences that Nvidia is still working to unwind. When Jensen says Anthropic is “a unique instance, not a trend” for TPU adoption, he is simultaneously downplaying the threat and revealing exactly how seriously he takes it.

    The China debate is the highlight. Jensen’s argument is more nuanced than it first appears. He is not saying “sell China everything.” He is saying the current binary approach of near-total restriction has backfired by accelerating China’s domestic chip industry and pushing the Chinese AI ecosystem away from the American tech stack. His comparison to the US telecom industry losing global market share to Huawei is pointed and historically grounded. Whether you agree with his conclusion or not, the framing of AI as a five-layer stack where the US needs to compete at every layer is a useful mental model.

    The “electrons to tokens” framing is Jensen at his best. It is a simple metaphor that captures something genuinely complex about where value is created in the AI supply chain. And his insistence that the transformation is “far from deeply understood” is a subtle way of arguing that Nvidia’s competitive position will be durable because the problem space is not close to being solved.

    The Groq acquisition reveal is interesting for what it signals about the inference market. If Nvidia is creating a separate product tier for premium-priced, low-latency tokens, it suggests the company sees inference economics fragmenting significantly. This aligns with the broader trend of AI becoming an enterprise product where different customers have wildly different willingness to pay based on how they use tokens.

    Finally, Jensen’s refusal to diversify chip architectures is a bold bet. “We simulate it all in our simulator, provably worse” is an incredibly confident statement. History is full of companies that were right until they were not. But Nvidia’s track record of 10x to 50x generation-over-generation improvements through co-design across processors, fabric, libraries, and algorithms is hard to argue with. The question is whether the current paradigm of transformer-based models on GPU clusters represents a local or global optimum for AI compute.