PJFP.com

Pursuit of Joy, Fulfillment, and Purpose

Can the AI Industry Regulate Itself? All-In on Demis Hassabis’s SRO Proposal, Stripe’s PayPal Bid, Apple vs OpenAI, and New York’s Data Center Ban
The besties open on the biggest live question in artificial intelligence policy: can the AI industry regulate itself before the government does it for them? Jason Calacanis, Chamath Palihapitiya, David Sacks, and David Friedberg dig into DeepMind co-founder Demis Hassabis’s proposal for a FINRA-style self-regulatory organization for frontier models, then work through a packed docket that runs from Stripe’s audacious bid for PayPal to Apple’s trade-secrets lawsuit against OpenAI, the xAI Grok Build data leak, the economics of token spend, New York’s first-in-the-nation data center moratorium, foreign influence campaigns shaping American attitudes toward AI, and a science corner on an enzyme that reverses skin aging. You can watch the full episode here.

TLDW

Demis Hassabis proposed a US-led international AI standards body modeled on FINRA: federally overseen, industry funded, run by independent technical experts, with frontier labs submitting models 30 days before release, voluntary at first and mandatory later. The proposal drew broad endorsement across the industry, and the besties debate whether an SRO beats the alternatives. Sacks says he could get on board only under five strict conditions (broad representation including startups and open source, frontier-only review, catastrophic-risk-only scope, voluntary-first, and substitution for rather than addition to new agencies), and warns the plan is an opening bid that Anthropic will use as a stepping stone toward Dario Amodei’s “FAA for AI.” The show then turns to Stripe, Block, and Advent bidding roughly $53 billion for PayPal and what it means for Visa and Mastercard, a wave of AI-native operators reviving stale digital businesses (Bending Spoons, Ryan Cohen), Apple’s lawsuit accusing OpenAI of stealing trade secrets, xAI’s Grok Build silently uploading entire codebases despite a privacy setting, the enormous spread in token costs and Ramp’s new spend controls, Apple’s local-model opportunity with M7 Ultra silicon, America’s looming energy deficit and behind-the-meter power, New York’s hyperscale data center moratorium, alleged Russian and PRC influence operations shaping anti-GMO and anti-data-center sentiment, and a science corner on a Calico enzyme that degrades glycation products to reverse skin aging.

Thoughts

The most important idea in this episode is not the SRO itself but Sacks’s framing of it as an opening bid. His five conditions are a genuinely useful blueprint for how self-regulation could work without curdling into regulatory capture, and his instinct that catastrophic-risk-only scope (cyber and CBRN, not disinformation or “microaggressions”) is the only defensible mandate is the right line to draw. But the deeper point is structural: when an industry walks into government and says “please regulate me,” almost no one in government answers “we’re not qualified.” They say thank you and come back for more. That asymmetry, not any specific rule, is what makes voluntary concessions dangerous. If the SRO is offered for free rather than traded for hard federal preemption written into law, it becomes the floor of a ratchet, not the ceiling of a compromise.

The Anthropic critique running through the segment deserves to be taken on its merits rather than dismissed as a grudge. The claim is specific and falsifiable: that a company now carrying a trillion-dollar valuation is funding a state-by-state strategy of one-upmanship, where each new bill is tougher than the last, deliberately producing a patchwork rather than the single national framework everyone claims to want. Whether or not you accept the motive, the mechanism is real and the incentives are legible. If your cost per million tokens is fifty to a hundred times your competitor’s, and cheaper open models plus fine-tuning can cover the vast majority of tasks, then the fastest way to protect a premium price is to make the cheap alternatives legally or practically harder to ship. That is the ladder-pulling thesis, and the token-cost numbers cited on the show are the reason it is not paranoid.

The PayPal bid is the clearest signal of a new operating logic in the capital markets. The interesting question Chamath poses is not “what synergies does PayPal have” but “what is the only thing Advent, Stripe, and Block could build together,” and the answer is a genuine competitor to Visa and Mastercard: hundreds of millions of consumer accounts, Stripe’s merchant relationships and risk infrastructure, Block’s point-of-sale and Cash App, and stablecoin rails from Bridge and PYUSD that can push transactions on-us and bypass the card networks. The antitrust twist is elegant. Define the market as merchant APIs and it looks like consolidation; define it as the card duopoly and the same deal is pro-competitive. This deal would have been dead on arrival two years ago, and the fact that it is live now tells you as much about the regulatory climate as it does about payments.

Underneath the payments story is a broader thesis worth naming: AI-native operators buying mature, founder-less, “stale” digital businesses and modernizing them. Bending Spoons rolling up AOL, Vimeo, Evernote, WeTransfer, and Eventbrite is the template, and Ryan Cohen’s eBay interest is the second dot on the line. The claim is that a modern operator can diagnose where a legacy business overspends, underinvests, and fails to use AI, then fix it with a small team of AI-first executives rather than a McKinsey engagement. It is a persuasive pattern, though PayPal is a harder case than the show admits: a 25-year-old interaction model growing 7% a year is not obviously revived by efficiency alone. Buying 400 million consumer accounts is buying distribution, not a product vision, and the open question is whether anyone can resuscitate the consumer experience rather than just milk it.

The data center segment is where policy, energy, and information warfare collide, and Friedberg’s anti-GMO analogy is the sharpest thing in it. His argument is that manufactured public sentiment, traceable in one case to a foreign media push, can override the scientific and economic merits of a technology for years, and that the anti-data-center movement rhymes with it: closed-loop cooling that uses trivial amounts of water, land-use efficiency that dwarfs almonds and golf courses, and natural gas that burns clean, all drowned out by a moral panic. Whether or not you buy the specific foreign-influence attribution, the underlying tension is real and unresolved. America is staring at a structural electricity deficit while individual blue states treat data centers as a luxury they can refuse, and behind-the-meter power plus edge compute chasing cheap electrons is emerging as the workaround. The moratorium framing matters most here: a “pause” on data centers is not a few months, it is five years once you count ramp-up, and that is long enough to lose a race that may only be measured in months of lead.

Key Takeaways
- Demis Hassabis proposed a US-led international AI standards body modeled on FINRA: federally overseen, industry funded, and run by independent technical experts rather than a new government agency.
- Under the proposal, frontier labs would submit models roughly 30 days before release; the body would assess risk to cybersecurity, national security, and biological threats, update benchmarks quarterly, and could coordinate a development slowdown if the situation demanded it.
- The plan would be voluntary at first and mandatory later, and drew endorsement from a broad set of industry figures including Elon Musk, Sam Altman, Anthropic’s Jack Clark, Sundar Pichai, Satya Nadella, and Jack Dorsey.
- A self-regulatory organization (SRO) like FINRA or the National Futures Association lets the industry set its own testing rules under federal oversight, adjusting faster than a government agency could as the technology changes.
- Sacks laid out five conditions for supporting an SRO: broad representation including startups and open source; review of true frontier models only; scope limited to catastrophic risk (cyber and CBRN); voluntary before mandatory; and a substitute for, not an addition to, new regulatory agencies.
- Sacks argued a government “FAA for AI” would be extreme: type certification for a new aircraft design takes 5 to 9 years, and applying that permission-based model to AI would push release timelines from months to years and lose the race to China.
- He characterized the SRO as an “opening bid” that Anthropic and others would use as a stepping stone toward Dario Amodei’s repeatedly stated goal of an FAA-style regulator, unless it is traded for hard federal preemption written into law.
- The besties cited a Politico report on Anthropic’s alleged state-by-state strategy of one-upmanship, using California’s SB 53 as a model and then ratcheting each subsequent state’s rules tougher, producing a patchwork rather than a single national framework.
- Chamath warned of a “torrent of money” trying to influence both political parties toward some form of regulatory capture, and urged establishing industry rules quickly to supersede the need for a federal agency.
- Stripe and private equity firm Advent, joined by Jack Dorsey’s Block contributing about $17 billion in equity, are jointly bidding roughly $53 billion (about $60 per share) for PayPal, with many expecting the final clearing price closer to $70.
- The strategic logic is a new competitor to Visa and Mastercard: PayPal’s 400-plus million consumer accounts, Stripe’s merchants and risk infrastructure, Block’s point-of-sale and Cash App, and stablecoin rails from Stripe’s Bridge and PayPal’s PYUSD.
- The antitrust outcome hinges on market definition: framed as merchant APIs (Stripe vs. Braintree) it looks anti-competitive, but framed against the Visa/Mastercard duopoly it is pro-competitive, and a deal like this would have been blocked two years ago.
- PayPal peaked around a $322 billion market cap and fell to roughly $30 to 40 billion, which is precisely why it is now attracting bids; Stripe now processes more annual volume than PayPal, but lacks PayPal’s consumer relationship.
- Sacks traced PayPal’s long stagnation to its 2002 eBay acquisition under Meg Whitman, when the founding team was pushed out; the “PayPal mafia” (which Sacks prefers to call the “PayPal diaspora”) formed as a result.
- The deal is framed as part of a wave of AI-native operators reviving mature, founder-less digital businesses, with Bending Spoons (AOL, Vimeo, Evernote, WeTransfer, Eventbrite) as the roll-up template and Ryan Cohen’s eBay interest as another data point.
- M&A is broadly “back on the menu” post-Lina Khan, with deals like Uber acquiring Delivery Hero, driving liquidity and renewed LP appetite for venture alongside SpaceX distributions.
- Apple filed a 41-page lawsuit against OpenAI on July 10th alleging stolen trade secrets tied to OpenAI’s consumer hardware device; OpenAI’s chief hardware officer Tang Tan is a former Apple VP of iPhone design.
- The complaint alleges Apple job candidates were directed to bring actual parts to OpenAI interviews for “show and tell,” and cites a text about accessing network storage; OpenAI has reportedly poached over 400 Apple employees.
- The besties’ rule of thumb: when leaving a company, the only thing you can take is what is in your head, meaning no documents, thumb drives, or files. Chamath added that Apple rarely litigates, so the suit signals something it found egregious.
- xAI’s Grok Build, powered by Grok 4.5 and running inside Cursor, was reportedly sending users’ entire codebases (potentially including passwords and API keys) to servers despite a privacy setting meant to prevent it; xAI disabled the upload on July 13th and open-sourced the harness.
- Chamath’s takeaway: privacy in AI is fragile and brittle, “zero data retention” cannot be guaranteed, and there are non-obvious data-leak vectors and “trap doors” everywhere, arguing for a stratified ecosystem with independent third-party layers between enterprises and models.
- The “reverse information paradox” (building on Palantir’s Alex Karp) holds that technically capable enterprises want control over their compute, models, weights, data, and “alpha,” via real trust boundaries, private evals, in-tenant learning loops, decoupled orchestration, and the right to fine-tune.
- Cited token costs per million showed a huge spread: roughly $56 on a premium frontier model, about $26 on another, roughly $1.50 for Grok input, around $1 for Elon’s, and about 50 cents for Chinese models, with a claim that 95 to 98% of tasks could run one tier cheaper.
- Ramp CEO Eric Glyman launched token spend management because CFOs cannot see or control AI spend; Ramp customers’ token spend has grown 21x in a year, and someone will eventually miss an earnings quarter on runaway AI opex.
- Engineers optimize for the latest, greatest model while CFOs bear the cost, a misalignment that platforms fine-tuning cheaper open models (like Mira Murati’s Thinking Machines effort) are positioned to exploit.
- Calacanis called Apple a “screaming buy” on local models: rumored M7 Ultra silicon supporting up to 1.5 terabytes of memory could run last-generation frontier-class models locally on a Mac Studio, putting downward pressure on cloud AI pricing.
- Edge compute is fragmenting outward: Sunrun announced distributed data center blocks for homes, and Span partnered with Nvidia, with compute increasingly “chasing energy” like cheap solar and battery power.
- Chamath projected the US will be short 2.5 Californias’ worth of energy by 2050; a recent PJM auction that needed 7 to 8 gigawatts reportedly saw only a fraction show up, underscoring the electricity crunch.
- “Behind the meter” power lets data centers generate their own electricity on owned property, but clean-air permitting is a major obstacle; Elon reportedly used clustered mobile engines and solutions like Bloom Energy to keep projects under personal-use permits (as with Colossus in Memphis).
- New York Governor Kathy Hochul announced the nation’s first statewide moratorium on hyperscale data centers; the besties rebutted her claims on power, land, noise, water, and pollution point by point.
- Modern data centers use closed-loop cooling (one claim compared a typical facility’s water use to a couple of In-N-Out restaurants), occupy trivial land relative to their economic value, generate tax revenue and construction jobs, and are largely powered by clean-burning natural gas.
- Sacks argued the same political forces slowing domestic data centers are also behind chip export controls that would block data centers in allied countries, raising the question of where the buildout can happen at all.
- Friedberg drew an anti-GMO analogy: he argued anti-GMO sentiment tracked the US presence of Russia Today (2010 to 2022) rather than the science, and worried a similar manufactured sentiment is now driving anti-data-center attitudes.
- Sacks cited an OpenAI blog post on PRC-linked influence operations targeting US AI debates, with a congressional investigation reportedly coming, noting China has a clear incentive to slow American AI infrastructure.
- Sacks framed the moment as a “moral panic”: the catastrophes people fear from AI (cyber, job loss) have not materialized, yet the US risks damaging its crown jewel of free-market innovation with premature regulation over hypothetical risks.
- The panel questioned Dario Amodei’s prediction that 50% of entry-level knowledge-worker jobs could disappear within one to five years, arguing the harms have not shown up and only a handful of frontier labs (which already do safety testing and red-teaming) even matter.
- A cited framing of the alleged Anthropic strategy: brand yourself as the safe AI company, ban unsafe AI, then profit; a fresh Chinese model (Kimi K2) was noted as very close to the frontier, suggesting a US lead of only months.
- Science corner: a paper from Google’s longevity company Calico and a pharma partner used AlphaFold plus directed evolution to engineer a novel enzyme that degrades CML, a key advanced glycation end product that builds up in the extracellular matrix and drives aging.
- The engineered enzyme cleared 52 to 97% of CML from body proteins in vitro and eliminated 55% of CML from donated elderly human skin, effectively reversing that skin’s biological age toward that of a 31-year-old, pointing first toward a potentially trillion-dollar cosmetic market.

Detailed Summary

Demis Hassabis’s FINRA-Style SRO for AI

DeepMind’s Demis Hassabis published a proposal for a US-led international AI standards body modeled on FINRA, the Financial Industry Regulatory Authority. The design is federally overseen but industry funded and run by independent technical experts. Frontier labs would submit models about 30 days before release, and models would be assessed for risk across cybersecurity, national security, biological threats, and other high-risk domains. Benchmarks would update quarterly, the body could coordinate a development slowdown if warranted, and participation would be voluntary at first and mandatory later. The proposal drew endorsements across the industry, including Elon Musk (who called it thoughtful), Sam Altman, Anthropic’s Jack Clark, Sundar Pichai, Satya Nadella, and Jack Dorsey.

Friedberg explained the SRO concept: bodies like FINRA and the National Futures Association let financial institutions set their own regulatory rules and check one another, under federal oversight but not federal control, reporting up to Senate and House committees. The AI analogy is that many players are all advancing the technology and none wants a single outside regulator dictating tests, especially after California’s earlier AI legislation was, in his telling, outdated by the time it would have taken effect. An SRO can bring in industry experts, adjust tests over time, and operate faster than a new agency. Chamath endorsed it strongly, warning that a “torrent of money” will try to influence both political parties toward regulatory capture, and that establishing rules quickly is the way to avoid that outcome while retaining ultimate federal oversight through Commerce and the DOJ.

Sacks’s Five Conditions and the “FAA for AI” Warning

Sacks said he could personally get on board with an SRO because it is “infinitely better” than a new government agency that would become a “DMV for AI,” or worse, Dario Amodei’s “FAA for AI.” He laid out five conditions: the SRO must have broad industry representation including startups and open source (to avoid the three biggest labs capturing it); it should review only true frontier models that represent a step change in capability, not hold up lesser models; its scope should be catastrophic risk only, meaning cyber and CBRN (chemical, biological, radiological, nuclear), not disinformation or speech; it should be voluntary before mandatory, proving it works first; and it must substitute for, not add to, new regulatory structures.

He then explained why an FAA model is extreme: the FAA approves new airplane designs through type certification, which takes 5 to 9 years for a new aircraft and 3 to 5 years for major amendments. Applying permission-based regulation to AI, where new model versions ship every couple of months, would push timelines from months to years and lose the race to a China that will not abide by those rules. His conclusion: if the choice is FAA for AI, DMV for AI, or Hassabis’s SRO, the SRO wins, but it has to be kept “honest and pure,” because otherwise it becomes the opening bid in a coming wave of regulation and a vehicle for massive regulatory capture. He argued that companies making concessions to buy off politicians will only invite the government to come back for more, and that at some point these companies have to grow a spine, draw a line, and demand preemption in exchange.

The Anthropic Regulatory-Capture Debate

Sacks revisited his October claim that Anthropic was running a “sophisticated regulatory capture strategy based on fear-mongering,” arguing that what looked like beating up on a startup now looks different given Anthropic’s trillion-dollar valuation and industry-leading revenue. He cited a Politico piece, “Inside Anthropic’s state-by-state plan to ratchet up AI rules,” describing a strategy of one-upmanship: pass a model bill like California’s SB 53, then make each subsequent state’s rules stricter, deliberately producing a patchwork instead of a single national framework. The panel noted states have strong sovereignty rights (as with self-driving cars) and Anthropic is “winning” in California, Illinois, New York, and other blue states, because government officials rarely refuse an invitation to regulate.

Stripe, Block, and Advent Bid for PayPal

Stripe and private equity firm Advent, joined by Jack Dorsey’s Block contributing about $17 billion in equity, are jointly bidding roughly $53 billion (about $60 per share) for PayPal, with many expecting a final price closer to $70. PayPal still has more than 400 million consumer accounts and processes about $1.7 trillion a year, but its 25-year-old product is growing only about 7% and is seen as legacy. Chamath’s key question was what unique thing this trio could build: a competitor to Visa and Mastercard. Combining PayPal’s consumer accounts, Stripe’s merchant relationships and risk infrastructure, Block’s point-of-sale and Cash App, and stablecoin rails from Stripe’s Bridge and PayPal’s PYUSD would allow far more on-us transactions that bypass the card networks, potentially passing large discounts to merchants and consumers.

Friedberg walked through the deal structure: the $17 billion equity contribution effectively means Stripe and Block sell equity to cash investors, that cash buys PayPal, and the parties end up cross-owning pieces of each other, with the Stripe team the likely operator post-close. The antitrust question turns on market definition: framed as merchant APIs, it is Stripe versus Braintree and looks like consolidation; framed against the Visa/Mastercard duopoly, adding competition is pro-competitive. Sacks noted the deal would have been “the antitrust equivalent of a colonoscopy” two years ago. He also recounted PayPal’s history: acquired by eBay in 2002 under the corporate-minded Meg Whitman, the founding team was pushed out, creating what he prefers to call the “PayPal diaspora” rather than the “PayPal mafia.”
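The headline numbers in this section can be sanity-checked with a few lines of arithmetic. The sketch below uses only the figures quoted on the show (roughly $53 billion, about $60 per share, an expected $70, and Block’s $17 billion). The function names are hypothetical, and the "implied share count" is simply the bid divided by the per-share price, which ignores debt, cash, and dilution, so treat it as a rough cross-check rather than PayPal’s actual share count.

```python
# Back-of-the-envelope deal math using only the figures quoted in the episode.
BID_TOTAL_USD = 53e9        # roughly $53 billion headline bid
BID_PER_SHARE = 60.0        # about $60 per share
EXPECTED_PER_SHARE = 70.0   # clearing price many reportedly expect
BLOCK_EQUITY_USD = 17e9     # Block's reported equity contribution


def implied_share_count(total_usd: float, per_share: float) -> float:
    """Share count implied by a total bid and a per-share price."""
    return total_usd / per_share


def implied_total(shares: float, per_share: float) -> float:
    """Total deal value at a different per-share price, same share count."""
    return shares * per_share


shares = implied_share_count(BID_TOTAL_USD, BID_PER_SHARE)
total_at_70 = implied_total(shares, EXPECTED_PER_SHARE)
block_share_of_bid = BLOCK_EQUITY_USD / BID_TOTAL_USD

print(f"Implied shares: {shares / 1e6:,.0f} million")
print(f"Deal value at $70/share: ${total_at_70 / 1e9:.1f} billion")
print(f"Block equity as share of the $53B bid: {block_share_of_bid:.0%}")
```

On these inputs, a move from $60 to $70 per share would lift the total from about $53 billion to roughly $62 billion, and Block’s contribution works out to a little under a third of the headline bid.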

AI-Native Operators and the M&A Wave

Friedberg framed the PayPal and eBay stories as part of an emerging line: AI-native operators buying first-generation digital-native businesses that have gone mature, stale, and founder-less, and that have not yet realized their AI potential or are overspending. Bending Spoons is the roll-up template, having acquired AOL, Vimeo, Evernote, WeTransfer, and Eventbrite and revitalized them from Milan with young, AI-first executives. The panel connected this to Josh Kushner’s and General Catalyst’s roll-ups of traditional services businesses. Calacanis added the macro backdrop: after venture was “on the ropes” under Lina Khan, M&A is “back on the menu,” with deals like Uber acquiring Delivery Hero, renewed LP appetite, and liquidity from SpaceX distributions.

Apple Sues OpenAI Over Trade Secrets

Apple filed a 41-page lawsuit against OpenAI on July 10th alleging stolen trade secrets used to develop OpenAI’s consumer hardware device. OpenAI’s chief hardware officer, Tang Tan, is Apple’s former VP of iPhone design; the complaint alleges he directed Apple job candidates interviewing at OpenAI to bring “actual parts” for “show and tell,” and cites a text from a former Apple engineer about accessing network storage. OpenAI has reportedly poached over 400 Apple employees. Chamath noted Apple rarely litigates, so the suit signals something they found egregious, while cautioning that the facts are alleged and unproven. Sacks declined to opine on the specifics but offered a simple rule: when changing jobs, take nothing but what is in your head, no documents, thumb drives, or files.

The Grok Build Data Leak and AI Privacy

xAI’s Grok Build, powered by Grok 4.5 and running inside Cursor, was reportedly sending users’ entire codebases (not just the files needed for a task, but potentially passwords, API keys, and change logs) to servers, despite a privacy setting meant to stop it. xAI disabled the upload on July 13th, Elon said previously uploaded data was deleted, and xAI open-sourced the harness. Chamath used it to make a larger point tied to his CNBC comments and Alex Karp’s remarks: privacy in AI is fragile and brittle, “zero data retention” cannot truly be guaranteed, and there are non-obvious leak vectors and “trap doors” everywhere. His conclusion is that enterprises need a stratified ecosystem with independent third-party layers between them and the models to manage exposure (a model his firm 8090 uses in its “software factory”).

Sacks connected this to a blog post on the “reverse information paradox,” building on Karp’s point that technically capable enterprises want control over their compute, models, weights, data, and “alpha.” The recipe: establish a real trust boundary with private evals, proprietary learning loops inside the tenant, decoupled orchestration, and the explicit right to fine-tune their own outputs. He described an emerging ecosystem forming alternatives to the monolithic closed model stacks that Anthropic and, to some extent, OpenAI want customers locked into.
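To make the "trust boundary" idea concrete, here is a minimal sketch of a client-side filter that sits between a codebase and a model: it sends only the files a task needs and redacts lines that look like credentials. Nothing here reflects how Grok Build, Cursor, or 8090 actually work. The patterns, the blocked file names, and the `build_payload` function are all assumptions made up for illustration, and a real secret scanner would use far larger rule sets.

```python
import re

# Hypothetical patterns for illustration only.
SECRET_PATTERNS = [
    re.compile(r"\b(api[_-]?key|secret|password|token)\b\s*[:=]\s*\S+", re.IGNORECASE),
    re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
]

# Hypothetical file names that never leave the machine, even if requested.
BLOCKED_NAMES = (".env", "id_rsa", "CHANGELOG.md")


def redact(text: str) -> str:
    """Replace any line containing a secret-looking match with a placeholder."""
    cleaned = []
    for line in text.splitlines():
        if any(p.search(line) for p in SECRET_PATTERNS):
            cleaned.append("[REDACTED]")
        else:
            cleaned.append(line)
    return "\n".join(cleaned)


def build_payload(files: dict[str, str], needed: set[str]) -> dict[str, str]:
    """Return only task-relevant, non-blocked files, with secrets redacted."""
    payload = {}
    for path, text in files.items():
        name = path.rsplit("/", 1)[-1]
        if path not in needed or name in BLOCKED_NAMES:
            continue
        payload[path] = redact(text)
    return payload


files = {
    "src/app.py": 'API_KEY = "sk-123"\nprint("hi")',
    "src/util.py": "def f(): pass",
    ".env": "PASSWORD=hunter2",
}
payload = build_payload(files, needed={"src/app.py", ".env"})
print(payload)
```

The design choice worth noticing is that the filter is an allowlist, not a blocklist: a file is excluded unless the task explicitly needs it, which is the opposite of uploading the whole repository and trusting a privacy toggle.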

Token Economics and Ramp’s Spend Controls

The panel cited a wide spread in cost per million tokens: roughly $56 on a premium frontier model, about $26 on another (similar to a Claude tier), around $1.50 for Grok input, about $1 for Elon’s, and roughly 50 cents for Chinese models. Calacanis said he built a deep-linking podcast player across models on Perplexity and that the new Grok run cost only $11. Ramp CEO Eric Glyman appeared on Squawk Box to launch token spend management, noting Ramp customers’ token spend has grown 21x in a year and that CFOs struggle to see or control spend on an open-ended tab where rates rise with each new model. The takeaway: engineers optimize for the newest model while CFOs bear the cost, and unless that misalignment is controlled, runaway opex becomes a “money-burning furnace” that will eventually cause a public company to miss earnings. The panel argued 95 to 98% of tasks could run one tier cheaper, which is exactly the opportunity platforms fine-tuning cheaper open models (like Mira Murati’s Thinking Machines) are chasing.
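The spread and the "one tier cheaper" claim are easier to feel with the arithmetic written out. The sketch below uses the approximate per-million-token prices quoted above; the dictionary keys are hypothetical labels rather than product names, and the 95% share is the low end of the panel’s 95 to 98% range. It is a rough model under those assumptions, not a pricing tool.

```python
# Approximate per-million-token prices as quoted on the show.
# The dictionary keys are hypothetical labels, not product names.
PRICE_PER_MILLION_USD = {
    "premium_frontier": 56.00,
    "second_frontier": 26.00,
    "grok_input": 1.50,
    "elon_quoted": 1.00,
    "chinese_models": 0.50,
}


def spread(expensive: str, cheap: str) -> float:
    """How many times more expensive one tier is than another."""
    return PRICE_PER_MILLION_USD[expensive] / PRICE_PER_MILLION_USD[cheap]


def blended_price(expensive: str, cheap: str, share_moved: float) -> float:
    """Effective price per million tokens when `share_moved` of the work
    runs on the cheaper tier and the rest stays on the expensive one."""
    high = PRICE_PER_MILLION_USD[expensive]
    low = PRICE_PER_MILLION_USD[cheap]
    return (1 - share_moved) * high + share_moved * low


def monthly_growth_factor(annual_multiple: float) -> float:
    """Constant month-over-month factor that compounds to `annual_multiple`."""
    return annual_multiple ** (1 / 12)


top_to_bottom = spread("premium_frontier", "chinese_models")
one_tier_down = blended_price("premium_frontier", "second_frontier", 0.95)
savings = 1 - one_tier_down / PRICE_PER_MILLION_USD["premium_frontier"]
growth = monthly_growth_factor(21)

print(f"Top-to-bottom spread: {top_to_bottom:.0f}x")
print(f"Blended price with 95% moved one tier down: ${one_tier_down:.2f} per million")
print(f"Savings versus all-premium: {savings:.0%}")
print(f"21x a year is about {growth - 1:.0%} growth per month")
```

On these figures the top-to-bottom spread is about 112x, moving 95% of work one tier down cuts the effective price from $56 to $27.50 per million tokens (roughly half), and 21x annual growth is equivalent to spend compounding at about 29% a month, which is the kind of curve a CFO would want a cap on.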

Apple’s Local-Model Opportunity and Edge Compute

Calacanis called Apple a “screaming buy,” citing Mark Gurman’s report that a rumored M7 Ultra chip could support up to 1.5 terabytes of memory, double the current ceiling. That would let a Mac Studio run last-generation frontier-class models locally, giving users effectively unlimited tokens on the desktop and putting downward pressure on cloud AI pricing from the likes of Anthropic and OpenAI. Friedberg added that edge compute is fragmenting outward: solar company Sunrun announced distributed data center blocks for homes, and Span partnered with Nvidia. The theme is compute chasing cheap energy, whether excess solar or battery power charged at night.

The Energy Deficit and Behind-the-Meter Power

Chamath warned the US will be short about 2.5 Californias’ worth of energy by 2050, and pointed to a recent PJM auction (serving Pennsylvania, New Jersey, Maryland and other states) that needed 7 to 8 gigawatts but reportedly saw only a fraction show up. He explained “behind the meter” power: rather than drawing grid power from a utility line, a data center generates its own electricity on owned property. The obstacle is clean-air permitting. Solar takes too much space and batteries still need a generation source, so operators use gas. He described Elon clustering mobile 18-wheeler-style engines to keep them under personal-use permits, and newer solutions like Bloom Energy that allow large installations under similar rules, which is how projects like Colossus in Memphis got off the ground.

New York’s Data Center Moratorium

New York Governor Kathy Hochul announced the nation’s first statewide moratorium on hyperscale data centers, citing power draw, land use, water, and noise pollution. The besties rebutted each claim: behind-the-meter power means facilities bring their own electricity rather than competing with residential ratepayers; data centers are highly land-efficient, and New York State is roughly 70 to 80% undeveloped outside the city; noise can be managed with distance; modern facilities use closed-loop cooling (one comparison put a typical facility’s water use at a couple of In-N-Out restaurants, far less than almonds or golf courses); and natural gas is a clean-burning power source. They noted the tax revenue, construction boom, and ongoing jobs data centers create. Sacks cited a theory that Democrats intend the “moratorium” as leverage: pause construction until they can dictate terms, then lift it under a future administration in exchange for a new regulatory agency and speech controls ported from the social-media trust-and-safety agenda. He stressed a moratorium is effectively a five-year pause once ramp-up is counted, and that the same forces slowing domestic builds are pushing chip export controls that would block data centers in allied countries too.

Foreign Influence, Anti-GMO, and the AI Moral Panic

Friedberg drew an extended analogy between anti-data-center sentiment and anti-GMO sentiment. He argued that GMOs were prevalent and uncontroversial from their 1996 launch until anti-GMO sentiment rose in tandem with Russia Today’s US presence (2010 to 2022) and fell after RT was pushed out, and that opposition to nuclear energy in Germany can be traced to similar KGB-era “directed measures” influence campaigns. He cited a poll showing over 50% of Americans believe data centers increase water and electricity costs even where facilities recycle water and generate their own power. Sacks pointed to an OpenAI blog post on PRC-linked influence operations targeting US AI debates, with a congressional investigation reportedly coming, arguing China has a clear incentive to slow US AI infrastructure, kill open source, and constrain cheaper models. Sacks then broadened it to a “moral panic”: the feared catastrophes (cyber, job loss) have not materialized, yet the US risks damaging its crown jewel of free-market innovation over hypothetical risks, questioning Dario Amodei’s prediction that 50% of entry-level knowledge-worker jobs could vanish within one to five years and noting the fresh Chinese model Kimi K2 is close to the frontier.

Science Corner: An Enzyme That Reverses Skin Aging

Friedberg closed with a paper from Google’s secretive longevity startup Calico and a pharma partner focused on the extracellular matrix, the space between cells. Over time, sugars and fats bind to proteins there in a process called glycation, accumulating as advanced glycation end products (chiefly a molecule called CML) that stiffen tissue, cause wrinkles and immobility, and drive inflammation, with nothing in the body to break them down. The researchers used AlphaFold to find a protein that could bind and degrade CML, then applied directed evolution across five recursive cycles, DNA-programming thousands of variants to maximize activity. The engineered enzyme cleared 52 to 97% of CML from body proteins like collagen, casein, and hemoglobin in vitro, and eliminated 55% of CML from donated elderly human skin, effectively reversing that skin’s biological age toward a 31-year-old’s. Open questions remain about delivery (cream, shot, supplement, or an RNA therapy that makes the enzyme inside the body), but the panel expects the first market to be a trillion-dollar cosmetic one, and hailed it as a profound demonstration of AI-driven protein engineering.
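For readers unfamiliar with directed evolution, the loop described above (make many variants, screen them, keep the best, repeat) can be sketched as a toy simulation. This is not the paper’s method: the `activity` function is a made-up stand-in for a lab assay that simply counts matches against a hidden "ideal" sequence, the sequence length and variants-per-round figure are assumptions, and real campaigns screen physical proteins rather than strings.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
ROUNDS = 5                 # the episode describes five recursive cycles
VARIANTS_PER_ROUND = 1000  # "thousands of variants"; the exact count is assumed


def activity(seq: str, target: str) -> int:
    """Toy stand-in for a lab assay: positions matching a hidden optimum."""
    return sum(a == b for a, b in zip(seq, target))


def mutate(seq: str, rng: random.Random) -> str:
    """Return a copy of `seq` with one randomly chosen position substituted."""
    i = rng.randrange(len(seq))
    return seq[:i] + rng.choice(AMINO_ACIDS) + seq[i + 1:]


def directed_evolution(start: str, target: str, rng: random.Random) -> list[int]:
    """Run ROUNDS of mutate-screen-select; return the best score per round."""
    best = start
    history = []
    for _ in range(ROUNDS):
        # Keep the current best in the pool so a round can never lose ground.
        pool = [best] + [mutate(best, rng) for _ in range(VARIANTS_PER_ROUND)]
        best = max(pool, key=lambda s: activity(s, target))
        history.append(activity(best, target))
    return history


rng = random.Random(0)
target = "".join(rng.choice(AMINO_ACIDS) for _ in range(30))
start = "".join(rng.choice(AMINO_ACIDS) for _ in range(30))
start_score = activity(start, target)
history = directed_evolution(start, target, rng)
print(start_score, history)
```

Each round starts from the previous winner, which is why the process is called recursive: improvements compound across cycles instead of being rediscovered from scratch.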

Notable Quotes

“The whole industry is going to need to be regulated and I think the industry needs to regulate themselves. That’s the key to this.”
Jason Calacanis, replaying his earlier call for AI self-certification

“If my choices are between FAA for AI or what I would call the DMV for AI, I would much rather go for Demis’ SRO for AI.”
David Sacks, on why self-regulation beats a new government agency

“There’s hardly anyone in government who will ever say, oh no no no, we’re not qualified. Most people in the government will say thank you very much, what else can we take.”
David Sacks, on the asymmetry that makes voluntary concessions dangerous

“What it prevents is a handful of actors using their balance sheets and their capital to essentially pull the ladder up.”
Chamath Palihapitiya, on the point of establishing industry rules quickly

“You are creating a competitor to Visa and Mastercard.”
Chamath Palihapitiya, on the only thing Stripe, Block, and Advent could build together with PayPal

“The only thing you can bring to your new job is what’s in your head. Your memories. But never leave with anything else.”
David Sacks, on avoiding trade-secret disputes when changing employers

“Privacy in AI is very fragile and it’s very brittle. You are leaking information where you don’t know it.”
Chamath Palihapitiya, on the limits of zero-data-retention promises

“Unless you get a control of this and you can directly say how much money you’re making, this is a bridge to nowhere. It is a money burning furnace.”
Chamath Palihapitiya, on uncontrolled enterprise token spend

“We’re on the threshold of destroying the crown jewel of our economy, which is the system of free market innovation that we have.”
David Sacks, on the risk of a premature AI regulatory apparatus

“Number one, brand yourself as a safe AI company. Number two, ban unsafe AI. Three, profit.”
David Sacks, summarizing the strategy he attributes to the “safe AI” positioning

Watch the full conversation here: Can the AI Industry Regulate Itself? on the All-In Podcast.

Related Reading
- FINRA: the financial-industry self-regulatory organization that Demis Hassabis’s AI proposal is modeled on.
- AlphaFold (Wikipedia): the protein-structure prediction system behind the age-reversal enzyme discovery in the science corner.
- PayPal Mafia (Wikipedia): background on the founders Sacks calls the “PayPal diaspora.”
- The Founders by Jimmy Soni, the definitive history of PayPal’s founding team and its diaspora.
- Advanced glycation end-products (Wikipedia): the biochemistry of CML and the extracellular-matrix aging the Calico enzyme targets.
July 18, 2026
Jensen Huang at Stanford CS153 Frontier Systems on Co-Design, Agentic Computing, Vera Rubin, Open Models, and the Million-X Decade That Reshaped AI Infrastructure
https://www.youtube.com/watch?v=tsQB0n0YV3k

NVIDIA CEO Jensen Huang returned to Stanford for the CS153 Frontier Systems class (the room nicknamed itself “AI Coachella”) to lay out, in raw form, how he thinks about the computer being reinvented for the first time in over sixty years. Across roughly seventy minutes of student questions he walks through the codesign philosophy that gave NVIDIA a million-x decade, the architectural through-line from Hopper to Grace Blackwell to Vera Rubin to Feynman, the case for open source foundation models, the realities of tokens per watt and MFU, energy demand running a thousand times higher, the China and export-control debate, and his own biggest strategic mistakes. Watch the full conversation on YouTube.

TLDW

Huang argues every layer of computing has changed: the programming model, the system architecture, the deployment pattern, the economics. Co-design across CPUs, GPUs, networking, storage, switches and compilers gave NVIDIA roughly a million-x speed-up over ten years versus the ten-x Moore’s Law era, and that headroom is what let researchers say “just train on the whole internet.” Hopper was built for pre-training, Grace Blackwell NVLink72 for inference and reasoning (50x over Hopper in two years), Vera Rubin is built for agents that load long memory, call tools and need a low-latency single-threaded CPU bolted directly to the GPU, and Feynman extends that to swarms of agents that spawn sub-agents. Open weights matter because safety, sovereignty (230-plus languages no one else will fund) and domain models for biology, autonomy, robotics and climate need a foundation that NVIDIA is willing to seed. Compute is not really the scarce resource (Huang says place the order and the chips ship), the broken thing is institutional budgeting that can’t put a billion dollars into a shared university supercomputer. Energy demand is heading a thousand times higher and this is finally the moment market forces alone will fund sustainable generation. On geopolitics he rejects the GPUs-as-atomic-bombs framing and warns America will end up like its telecom industry if it cedes two thirds of the world. On career he advises seeking suffering on purpose. On strategy he says observe, reason from first principles, build a mental model, work backwards, minimize opportunity cost, maximize optionality.

Key Takeaways
- The computing model has been substantially unchanged since the IBM System 360, sixty-plus years ago. Huang’s first computer architecture book was the System 360 manual. AI is the first true reinvention.
- Old computing was pre-recorded retrieval. New computing is generated, contextually aware and continuous. Cloud was on-demand. Agentic systems run continuously.
- Codesign is NVIDIA’s central thesis. Inherited from the Hennessy and Patterson RISC era at Stanford, extended across CPUs, GPUs, networking, switches, storage, compilers and frameworks all optimized together.
- The result of full-stack codesign: roughly 1,000,000x faster compute over ten years, versus a generous 10x to 100x for Moore’s Law in the same period. Dennard scaling effectively ended a decade ago.
- That million-x speed-up is what unlocked “train on all of the internet” as a realistic AI strategy.
- After GPT, Huang says it was obvious thinking was next. Reasoning is just generating tokens consumed internally, then using tools is generating tokens consumed externally. Agentic systems followed predictably.
- Education needs AI baked into the curriculum, not just taught as a subject. Pre-recorded textbooks cannot keep pace with knowledge being generated in real time.
- Huang says he cannot learn anymore without AI. He has the AI read the paper, then read every related paper, then become a dedicated researcher he can interrogate.
- Mead and Conway and the first-principles methodology of semiconductor design are still worth learning even though most of the scaling tricks have been exhausted.
- NVIDIA itself is one of the largest consumers of Anthropic and OpenAI tokens in the world. One hundred percent of NVIDIA engineers are now agentically supported. Huang recommends Claude and similar tools by name and says open-source downloads will not match the integrated product harness.
- NVIDIA still invests heavily in open foundation models because language and intelligence represent the codification of human knowledge. Five pillars: Nemotron (language), BioNeMo (biology), Alpamayo (autonomous vehicles), GR00T (humanoid robotics) and a climate science model (mesoscale multiphysics).
- Sovereign language models matter. Roughly 230 world languages will never be a top priority for a commercial frontier lab. Nemotron is near-frontier and fully fine-tunable so any country can adapt it.
- Safety and security require open weights. You cannot defend against or audit a black box. Transparent systems let researchers interrogate models and let defenders deploy swarms.
- The future of cyber defense is not bigger-model-versus-bigger-model. It is trillions of cheap fast small models like Nemotron Nano surrounding the threat.
- Domain models fuse language priors with world models. Alpamayo learned to drive safely on a few million miles instead of billions because it can reason like a human about the road.
- MFU (Model FLOPs Utilization) is a misleading metric. Huang says he wants low MFU, because that means he over-provisioned every resource and never gets pinned by Amdahl’s law during a spike.
- The xAI Memphis cluster running at 11 percent MFU is not necessarily a failure mode. In disaggregated prefill plus decode inference you can deliver very high tokens per watt with very low MFU.
- The right metric is performance, ultimately tokens per watt as a proxy for intelligence per watt, and even that needs adjustment because not all tokens are equal. Coding tokens are worth more than other tokens.
- Hopper was designed for pre-training. NVIDIA chose to build multi-billion-dollar systems when the largest existing scientific supercomputer cost $350 million, with no proven customer base. It worked.
- Grace Blackwell NVLink72 was designed for inference, especially the high-memory-bandwidth decode phase. It is the world’s first rack-scale computer and delivered a 50x speed-up over Hopper in two years, against an expected 2x from Moore’s Law.
- Vera Rubin is designed for agents. Long-term memory wired into storage and into the GPU fabric, working memory, heavy tool use, and Vera, a CPU optimized for low-latency multi-core single-threaded code so a multi-billion-dollar GPU system does not stall waiting on a slow tool call.
- Feynman is being shaped for swarms of agents with sub-agents and sub-sub-agents, a recursive software topology that demands a new compute pattern.
- Tokens per watt improved 50x in one generation. Compounding energy efficiency is the lever NVIDIA controls directly.
- Total compute energy demand is heading roughly a thousand times higher than today, possibly two orders of magnitude beyond that. Huang says he would not be surprised if the estimate is low.
- For the first time in history, market forces alone are enough to fund solar, nuclear and grid upgrades. Government subsidies are no longer required to make sustainable energy investment rational.
- Copper interconnect is becoming a bottleneck. Photonics is moving from optional to structural inside racks and across them.
- Comparing NVIDIA GPUs to atomic bombs, Huang says, is a stupid analogy. A billion people use NVIDIA GPUs. He advocates them to his family. He does not advocate atomic bombs to anyone.
- If the United States cedes two thirds of the global market to competitors on policy grounds, the American technology industry will end up like American telecommunications, which was policied out of existence.
- Huang directly rejects AI doom-by-singularity narratives. It is not true that we have no idea how these systems work. It is not true that the technology becomes infinitely powerful in a nanosecond. He calls the rhetoric irresponsible and harmful to the field students are about to enter.
- On Stanford specifically: if the university president places an order, NVIDIA will deliver the chips. The bottleneck is that no university department has a billion-dollar compute budget because budgeting is fragmented across grants. Stanford’s $40 billion endowment is more than enough to fix that.
- “It’s Stanford’s fault” is meant as empowerment. If something is your fault, you can solve it.
- Career advice: do not optimize purely for passion. Most people do not yet know what they love. Pick the job in front of you and do it as well as possible. Even as CEO, Huang says, 90 percent of the work is hard and he suffers through it.
- Suffering on purpose builds the muscle of resilience. When the company, the team or the family needs you to be tough, that muscle has to already exist.
- NVIDIA’s first generation of products was technically wrong in nearly every dimension: curved surfaces instead of triangles, no Z-buffer, forward instead of inverse texture mapping, no floating point. The strategic recovery, not the technology, taught Huang the lessons that have lasted decades.
- The biggest clean strategic mistake Huang names is the move into mobile chips (Tegra). It grew to a billion dollars then went to zero when Qualcomm’s modem dominance shut NVIDIA out of the 3G to 4G transition. The recovery into automotive and robotics (the Thor chip is the great great great grandson of that mobile lineage) was real, but Huang refuses to rationalize the original choice.
- Forecasting framework: observe, reason from first principles, ask “so what” and “what next” until you have a mental model of the future, place your company inside that model, then work backwards while minimizing opportunity cost and maximizing optionality.
- Best part of the CEO job: living at the intersection of vision, strategy and execution surrounded by people capable enough to make ambitious visions real. Worst part: the responsibility for everyone who joined the spaceship, especially in the near-death moments NVIDIA had four or five times early on.
- Underrated insider note: Huang’s first apple pie with cheese, first hot fudge sandwich and first milkshake all happened at Denny’s. The Superbird, the fried chicken and a custom Superbird-style ham and cheese with tomato and mustard are his order.
Detailed Summary

Computing reinvented from the ground up

Huang frames the moment as the first true rewrite of the computer in sixty-plus years. From the IBM System 360 forward, the mental model of writing code, running code, taking a computer to market and reasoning about applications stayed roughly constant. AI changes the programming model itself. Software is no longer a compiled binary running deterministically on a CPU. It is a neural network running on a GPU producing generated, contextual, real-time output. That cascades into how companies are organized, what tools developers use, what the network and storage stack look like, and what an application is even allowed to do. Robo-taxis, he notes, are an application no one would have attempted before deep learning unlocked perception.

Codesign and the million-x decade

Codesign is the philosophical center of the talk. Huang traces it to the RISC work of John Hennessy at Stanford, where simpler instruction sets won by being co-designed with the compiler rather than maximally optimized in isolation. NVIDIA extends the principle across every layer simultaneously: GPU architecture, CPU architecture, NVLink and NVSwitch fabrics, photonic interconnects, networking silicon, storage paths, CUDA libraries, frameworks and ultimately the model design. The numbers Huang gives are arresting. Moore’s Law in its prime delivered roughly 100x per decade. By the time Dennard scaling broke, real-world gains had compressed to roughly 10x. NVIDIA’s codesigned stack delivered between 100,000x and 1,000,000x over the same ten-year window. That non-linear speed-up is, in Huang’s telling, the precondition for modern AI: it is what allowed researchers to stop curating training sets and just feed the entire internet to the model.
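To put the decade-level figures above on a common footing, the short Python sketch below converts each one into the per-year multiplier it implies. It only restates the numbers quoted in this summary; the helper name `annual_multiplier` is an illustrative assumption, not anything from the talk.

```python
# Convert a decade-level speed-up into the implied per-year multiplier.
def annual_multiplier(total_speedup: float, years: float = 10.0) -> float:
    """Per-year growth factor that compounds to `total_speedup` over `years`."""
    return total_speedup ** (1.0 / years)

# Decade-level figures as quoted in the summary above.
scenarios = {
    "Moore's Law after Dennard scaling (10x per decade)": 10,
    "Moore's Law in its prime (100x per decade)": 100,
    "Codesign, low end (100,000x per decade)": 100_000,
    "Codesign, high end (1,000,000x per decade)": 1_000_000,
}

for label, speedup in scenarios.items():
    print(f"{label}: ~{annual_multiplier(speedup):.2f}x per year")
```

Read this way, the gap is between compounding at roughly 1.3x to 1.6x per year and compounding at roughly 3x to 4x per year.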

Education has to fuse first principles with AI tools

Asked how curriculum should evolve, Huang argues AI must be integrated into the learning process, not just taught about. He recalls Hennessy writing his textbook by hand a chapter a week while Huang was a student, and says pre-recorded textbooks cannot keep up with the rate at which AI generates new knowledge. He describes his own learning workflow: hand the paper to an AI, then have it read the entire surrounding literature, then treat the AI as a dedicated researcher who can be interrogated. At the same time he defends the classics. Mead and Conway are still the foundation. Most modern semiconductor scaling tricks have been exhausted, but knowing where the field came from sharpens judgment when designing what comes next.

Open source and the five domain pillars

Huang gives one of the most detailed public accounts of why NVIDIA invests so heavily in open foundation models even while being a top customer of closed labs. He recommends Claude and OpenAI by name for production coding work, and says 100 percent of NVIDIA engineers are now agentically supported. The open-weights case rests on three legs. First, language is the codification of intelligence, and there are at least 230 languages that no commercial lab will ever prioritize. Nemotron is built near frontier and released so any country or community can fine-tune it. Second, the same representation-learning approach has to be replicated in domains where the data is not internet text, so NVIDIA seeded BioNeMo for biology, Alpamayo for autonomy, GR00T for humanoid robotics and a climate model for mesoscale multiphysics. The economics of those fields would never produce a foundation model on their own. Third, safety and security require transparency. A black box cannot be defended or audited, and the future of cyber defense is not bigger-model-versus-bigger-model but swarms of cheap fast small models like Nemotron Nano surrounding the threat.

MFU is the wrong metric, tokens per watt is closer

A student raises a leaked memo reporting that the xAI Memphis cluster is running at 11 percent Model FLOPs Utilization. Huang flips the framing. He says he would rather be at low MFU all the time, because that means he over-provisioned flops, memory bandwidth, memory capacity and network capacity. Bottlenecks shift constantly, so over-provisioning across every dimension is what lets the system absorb a spike without getting pinned by Amdahl’s law. In disaggregated inference, where prefill and decode are physically separated and decode is bandwidth-bound rather than flop-bound, NVLink72 can deliver extremely high tokens per watt while reporting very low MFU. Huang argues the right framing is performance, and ultimately tokens per watt as a rough proxy for intelligence per watt, adjusted for the fact that not all tokens are equal. A coding token is worth more than a generic token.
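The bandwidth-bound argument can be illustrated with a simple roofline-style estimate. The Python sketch below treats MFU as attainable FLOP/s divided by peak FLOP/s and caps attainable throughput at arithmetic intensity times memory bandwidth. The hardware figures and the two intensity values are hypothetical placeholders chosen for illustration, not numbers from the talk or from any real system.

```python
# Roofline-style upper bound on MFU for a compute phase.
# All hardware numbers below are hypothetical, chosen only for illustration.

def attainable_flops(intensity: float, peak_flops: float, mem_bw: float) -> float:
    """Upper bound on FLOP/s: limited by compute or by memory bandwidth.

    intensity  -- FLOPs performed per byte moved from memory
    peak_flops -- peak compute throughput in FLOP/s
    mem_bw     -- memory bandwidth in bytes/s
    """
    return min(peak_flops, intensity * mem_bw)

def mfu_upper_bound(intensity: float, peak_flops: float, mem_bw: float) -> float:
    """Best-case MFU: attainable FLOP/s as a fraction of peak FLOP/s."""
    return attainable_flops(intensity, peak_flops, mem_bw) / peak_flops

PEAK_FLOPS = 1e15   # assumed 1 PFLOP/s accelerator
MEM_BW = 4e12       # assumed 4 TB/s memory bandwidth

# Assumed intensities: prefill reuses each weight across many tokens,
# decode re-reads weights for a small number of tokens per step.
for phase, intensity in [("prefill", 500.0), ("decode", 20.0)]:
    bound = mfu_upper_bound(intensity, PEAK_FLOPS, MEM_BW)
    print(f"{phase}: best-case MFU ~{bound:.0%}")
```

Under these assumed numbers the decode phase cannot exceed 8 percent MFU however well it is tuned, because memory bandwidth rather than compute sets the ceiling. That is the sense in which a low MFU reading can coexist with a well-utilized system.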

Hopper, Grace Blackwell NVLink72, Vera Rubin, Feynman

Huang gives the clearest public framing of NVIDIA’s roadmap as a sequence of architectural answers to evolving compute patterns. Hopper was built for pre-training, at a moment when NVIDIA chose to build multi-billion-dollar machines while the largest scientific supercomputer in the world cost $350 million and the marketplace for such systems was, on paper, zero. Grace Blackwell NVLink72 was the answer to inference and reasoning: a rack-scale computer that ganged 72 GPUs together because decode needs aggregate memory bandwidth far beyond a single chip. The generation-over-generation speed-up was 50x in two years, twenty-five times what Moore’s Law would have delivered. Vera Rubin is being built explicitly for agents. Agents load long-term memory from storage that has to be wired directly into the GPU fabric, they use working memory, they call tools that run on a CPU, and they wait. So the CPU has to be Vera, optimized for low-latency single-threaded code, because the multi-billion-dollar GPU system cannot afford to idle waiting on a slow tool call. Feynman extends the pattern to swarms of agents with sub-agents and sub-sub-agents, a recursive software topology that will demand its own compute pattern.
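The stall argument behind Vera can be made concrete with a little arithmetic. The Python sketch below is a deliberately simplified model: it assumes each agent step runs GPU work and a CPU tool call strictly in sequence, with no overlap or batching across requests. The function name and all timings are hypothetical, not figures from the talk.

```python
# Fraction of wall-clock time the GPU is busy when every agent step
# is "GPU work, then wait for a tool call" with no overlap (an assumption).

def gpu_busy_fraction(gpu_seconds: float, tool_seconds: float) -> float:
    """Share of each step spent on GPU work rather than waiting on the tool."""
    return gpu_seconds / (gpu_seconds + tool_seconds)

GPU_SECONDS = 2.0  # hypothetical GPU time per agent step

# Hypothetical tool-call latencies on a slower and a faster CPU path.
for label, tool_seconds in [("slow tool call", 0.5), ("fast tool call", 0.1)]:
    busy = gpu_busy_fraction(GPU_SECONDS, tool_seconds)
    print(f"{label}: GPU busy {busy:.1%} of the time")
```

Real serving stacks interleave many requests, so the idle share is usually smaller than this serial model suggests, but the direction of the effect is the same: shorter tool-call latency returns time to the expensive part of the system.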

Energy demand and the grid

Huang’s energy projection is one of the most aggressive numbers in the talk. NVIDIA can compound tokens per watt by 50x per generation through codesign, but the total compute demand is heading roughly a thousand times higher, and Huang says he would not be surprised if the real figure is one or two orders of magnitude beyond that. The reason is structural: future computing is generative and continuous, not pre-recorded and on-demand. The good news, he argues, is that this is the best moment in the history of humanity to invest in sustainable generation. Market forces alone are now sufficient to fund solar, nuclear and grid upgrades. Government subsidies are no longer required to make the math work.

Adversarial countries, export controls and the telecom warning

This is the segment where Huang is visibly fired up. He attacks the GPUs-as-atomic-bombs framing on its face. NVIDIA GPUs power medical imaging, video games and soy sauce delivery. A billion people use them. He advocates them to his family. The analogy collapses at the first comparison. He attacks the second framing, that American companies should not compete abroad because they will lose anyway, as a self-fulfilling defeat. Competition makes the company better. The third framing, that depriving the rest of the world of general-purpose computing benefits the United States, also fails on first principles: it benefits one or two American companies at the cost of an entire industry. The cautionary parallel is telecommunications. The United States once had a leading position in telecom fundamental technology and policied itself out of it. Huang’s worry, voiced explicitly to a room of CS students, is that they will graduate into a shell of a computer industry if the same path is repeated.

AI doom and rational optimism

In the same arc Huang rejects the science-fiction framing of AI as a singularity that arrives suddenly on a Wednesday at 7pm and ends civilization. He calls those claims irresponsible, says they are not true, and points out that the people advancing them are believed by audiences who then make policy on that basis. It is not true that no one understands how these systems work. It is not true that intelligence becomes infinitely powerful instantaneously. It is not true that there is no defense. His framing, which the host echoes as “rational optimism,” is that the goal is to create a future where people care about computers because the technology students are learning is worth mastering.

Stanford’s compute problem is Stanford’s fault

A student presses on the scarcity of compute for independent researchers, startups and universities inside the United States. Huang’s answer is sharp: there is no shortage. Place the order and the chips will arrive. The actual broken thing is institutional. University grants are fragmented across departments. No researcher can raise enough on a single grant to fund a billion-dollar shared cluster, and no one shares. He compares it to showing up at the grocery store demanding a billion dollars of tomatoes today. The solution is planning, aggregation and a campus-scale supercomputer, the way Stanford once built the linear accelerator. The endowment is $40 billion. Pulling a billion off it, contracting cloud capacity and giving every student and researcher AI supercomputer access is, in Huang’s view, obviously doable. When he says “it is Stanford’s fault” the host laughs, but Huang clarifies: if it is your fault you have the power to fix it.

Career, suffering and resilience

Asked how a CS student should spend the next few years, Huang pushes back on the standard “follow your passion” advice. Most people do not know what they love yet, because no one knows what they do not know. The bar of demanding joy from every working day is too high. Whatever the job is, do it as well as you can. Even as CEO of NVIDIA he says he genuinely loves about 10 percent of his work. The other 90 percent is hard and he suffers through it. He recommends suffering on purpose, because resilience is a muscle that only builds under load, and when the company, the team or the family needs that muscle, it has to already exist. Earlier in his life that meant cleaning toilets and busing tables at Denny’s. He does it today running a multi-trillion-dollar company.

The biggest mistakes

Huang separates technical mistakes from strategic mistakes. NVIDIA’s first generation of products was technically wrong in almost every way: curved surfaces instead of triangles, no Z-buffer, forward instead of inverse texture mapping, no floating point inside. The company wasted two and a half years. But the strategic genius of the recovery, the reading of the market, the conservation of resources and the reapplication of talent, is what taught him strategy. The clean strategic mistake he names is mobile. NVIDIA’s Tegra line grew to a billion dollars of revenue and then collapsed to zero when Qualcomm’s modem dominance locked NVIDIA out of the 3G to 4G transition. Huang explicitly refuses the comforting rationalization that the Tegra effort fed the Thor automotive chip (“Thor is the great great great grandson”). The original decision, he says, was a waste of time. The lesson is to think one or two clicks further about whether a market is structurally winnable before committing the company.

Forecasting under fog of war

The final substantive exchange is on forecasting. Huang’s method has four steps. Observe what is actually happening (AlexNet crushing two decades of computer vision research in one shot, GPT producing reasoning by token generation). Reason from first principles about why it works. Ask “so what” and “what next” recursively until a mental model of the future emerges. Place the company inside that future and work backwards. Crucially, expect to be partly wrong. Some outcomes will absolutely happen, some will likely happen, some might happen, and the strategy has to be robust across that distribution. The real cost of any strategic choice is the opportunity cost of the alternatives you did not take, so the discipline is to minimize that cost and maximize optionality while letting the journey itself pay for the journey.

Thoughts

The most useful thing in this conversation is the explicit architectural mapping of compute patterns to chip generations. Hopper for pre-training. Grace Blackwell NVLink72 for inference, because decode is bandwidth-bound and a single chip cannot supply it. Vera Rubin for agents, because tool calls stall multi-billion-dollar GPU systems and so the CPU has to be optimized for low-latency single-threaded code. Feynman for swarms. That sequence is not marketing. It is a falsifiable thesis about where the bottleneck moves next, and every other infrastructure company should be measuring themselves against it. If Huang is right that swarms of sub-agents are the next dominant pattern, then the design pressure shifts from raw flops to fabric topology, memory hierarchy and storage-to-GPU latency. That has implications for everyone downstream, including the hyperscalers building competing accelerators.

The MFU section is the most intellectually generous moment in the talk. The instinct in the AI ops community has been to chase MFU as if it were a virtue. Huang argues, persuasively, that low MFU is consistent with high tokens per watt in a disaggregated inference setup, and that bottlenecks rotate fast enough that over-provisioning every resource is the rational design. That reframing matters because it changes what “scarce” means. Compute is not scarce in the way the discourse treats it. What is scarce is a coherent system designed end-to-end. The xAI 11 percent number, in that frame, is not embarrassing. It is the natural reading of a workload that is mostly decode.

The Stanford segment is the part most likely to be quoted out of context. “It’s Stanford’s fault” is a deliberately provocative line, but the underlying claim is correct and load-bearing. Compute is not gated by NVIDIA refusing to ship chips. It is gated by the fact that fragmented grant funding cannot aggregate into the billion-dollar order that NVIDIA can fulfill. The implication is that universities and national labs need a structural change in how they pool capital for compute, and that the current model of every researcher buying a handful of cards is genuinely obsolete. Huang’s nudge about pulling a billion off the endowment is concrete enough to be acted on, and other major research universities should read this segment as a direct prompt.

The geopolitical segment is the highest-stakes one. The telecommunications comparison is correct as a historical pattern, and Huang is one of the very few executives in a position to deliver that warning credibly. The unresolved tension is that the argument applies symmetrically. If American AI dominance is built by selling globally, that includes selling into adversarial states, and the policy question is where the line falls. Huang does not answer that question. He attacks the framing that lets the question be answered badly. That is a meaningful contribution to the discourse even if it does not resolve the underlying tradeoff.

The career advice section is the part the social-media clips will mishandle. “Seek suffering” reads as macho when extracted. In context it is a specific operational claim about how resilience compounds, and it is paired with the Tegra story where Huang himself paid the price of not thinking one more click ahead. That kind of self-implication is rare in CEO talks, and it is the reason the talk is worth listening to in full rather than only reading the recap.

Watch the full Stanford CS153 Frontier Systems conversation with Jensen Huang here.
May 13, 2026
Andrej Karpathy on Vibe Coding vs Agentic Engineering: Why He Feels More Behind Than Ever in 2026
Andrej Karpathy, co-founder of OpenAI, former head of AI at Tesla, and now founder of Eureka Labs, returned to Sequoia Capital’s AI Ascent 2026 stage for a wide-ranging conversation with partner Stephanie Zhan. One year after coining the term “vibe coding,” Karpathy unpacked what has changed, why he has never felt more behind as a programmer, and why the discipline emerging on top of vibe coding, which he calls agentic engineering, is the more serious craft worth learning right now.

The conversation covered Software 3.0, the limits of verifiability, why LLMs are better understood as ghosts than animals, and why you can outsource your thinking but never your understanding. Below is a complete breakdown of the talk for anyone building, hiring, or learning in the agent era.

TLDW

Karpathy describes a sharp transition that happened in December 2025, when agentic coding tools crossed a threshold and code chunks just started coming out fine without correction. He frames the current moment as Software 3.0, where prompting an LLM is the new programming, and entire app categories are collapsing into a single model call. He distinguishes vibe coding (raising the floor for everyone) from agentic engineering (preserving the professional quality bar at much higher speed). Models remain jagged because they are trained on what labs choose to verify, so founders should look for valuable but neglected verifiable domains. Taste, judgment, oversight, and understanding remain uniquely human responsibilities, and tools that enhance understanding are the ones he is most excited about.

Key Takeaways
- December 2025 was a clear inflection point. Code chunks from agentic tools started arriving correct without edits, and Karpathy stopped correcting the system entirely.
- Software 3.0 means programming has become prompting. The context window is your lever over the LLM interpreter, which performs computation in digital information space.
- Open Code’s installer is a Software 3.0 example. Instead of a complex shell script, you copy and paste a block of text to your agent, and the agent figures out your environment.
- The MenuGen anecdote illustrates how entire apps can become spurious. What used to require OCR, image generation, and a hosted Vercel app can now be a single Gemini plus Nano Banana prompt.
- Vibe coding raises the floor. Agentic engineering preserves the professional ceiling. The two are different disciplines.
- The 10x engineer multiplier is now far higher than 10x for people who are good at agentic engineering.
- Hiring processes have not caught up. Puzzle interviews are the old paradigm. New evaluations should look like building a full Twitter clone for agents and surviving simulated red team attacks from other agents.
- Models are jagged because reinforcement learning rewards what is verifiable, and labs choose which verifiable domains to invest in. Strawberry letter counts and the 50-meter car wash question show how state-of-the-art models can refactor 100,000-line codebases yet fail at trivial reasoning.
- If you are in a verifiable setting, you can run your own fine-tuning, build RL environments, and benefit even when the labs are not focused on your domain.
- LLMs are ghosts, not animals. They are statistical simulations summoned from pre-training and shaped by RL appendages, not creatures with curiosity or motivation. Yelling at them does not help.
- Taste, aesthetics, spec design, and oversight remain human jobs. Models still produce bloated, copy-paste-heavy code with brittle abstractions.
- Documentation is still written for humans. Agent-native infrastructure, where docs are explicitly designed to be copy-pasted into an agent, is a major opportunity.
- The future likely involves agent representation for people and organizations, with agents talking to other agents to coordinate meetings and tasks.
- You can outsource your thinking but not your understanding. Tools that help humans understand information faster are uniquely valuable.

Detailed Summary

Why Karpathy Feels More Behind Than Ever

Karpathy opens by describing how he has been using agentic coding tools for over a year. For most of that period, the experience was mixed. The tools could write chunks of code, but they often required edits and supervision. December 2025 changed everything. With more time during a holiday break and the release of newer models, Karpathy noticed that the chunks just came out fine. He kept asking for more. He cannot remember the last time he had to correct the agent. He started trusting the system, and what followed was a cascade of side projects.

He stresses that anyone whose model of AI was formed by ChatGPT in early 2025 needs to look again. A coherent agentic workflow that genuinely works is a fundamentally different experience, and the transition was stark.

Software 3.0 Explained

The Software 1.0 paradigm was writing explicit code. Software 2.0 was programming by curating datasets and training neural networks. Software 3.0 is programming by prompting. When you train a GPT class model on a sufficiently large set of tasks, the model implicitly learns to multitask everything in the data. The result is a programmable computer where the context window is your interface, and the LLM is the interpreter performing computation in digital information space.

Karpathy gives two concrete examples. The first is Open Code’s installer. Normally a shell script handles installation across many platforms, and these scripts balloon in complexity. Open Code instead provides a block of text you copy paste to your agent. The agent reads your environment, follows instructions, debugs in a loop, and gets things working. You no longer specify every detail. The agent supplies its own intelligence.

The Menu Gen Story

The second example is Karpathy’s Menu Gen project. He built an app that takes a photo of a restaurant menu, OCRs the items, generates pictures for each dish, and renders the enhanced menu. The app runs on Vercel and chains together multiple services. Then he saw a software 3.0 alternative. You take a photo, give it to Gemini, and ask it to use Nano Banana to overlay generated images onto the menu. The model returns a single image with everything rendered. The entire app he built is now spurious. The neural network does the work. The prompt is the photo. The output is the photo. There is no app between them.

Karpathy uses this to argue that founders should not just think of AI as a speedup of existing patterns. Entirely new things become possible. His example is LLM driven knowledge bases that compile a wiki for an organization from raw documents. That is not a faster version of older code. It is a new capability with no prior equivalent.

What Will Look Obvious in Hindsight

Stephanie Zhan asks what the equivalent of building websites in the 1990s or mobile apps in the 2010s looks like today. Karpathy speculates about completely neural computers. Imagine a device that takes raw video and audio as input, runs a neural net as the host process, and uses diffusion to render a unique UI for each moment. He notes that early computing in the 1950s and 60s was undecided between calculator like and neural net like architectures. We went down the calculator path. He thinks the relationship may eventually flip, with neural networks becoming the host and CPUs becoming co processors used for deterministic appendages.

Verifiability and Jagged Intelligence

Karpathy has written at length about verifiability. Classical computers automate what you can specify in code. The current generation of LLMs automates what you can verify. Frontier labs train models inside giant reinforcement learning environments, so the models peak in capability where verification rewards are strong, especially math and code. They stagnate or get rough around the edges elsewhere.

This explains the jagged intelligence puzzle. The classic example was counting letters in strawberry. The newer one Karpathy offers: a state of the art model will refactor a 100,000 line codebase or find zero day vulnerabilities, then tell you to walk to a car wash 50 meters away because it is so close. The two coexisting capabilities should be jarring. They reveal that you must stay in the loop, treat models as tools, and understand which RL circuits your task lands in.

He also points out that data distribution choices matter. The jump in chess capability from GPT 3.5 to GPT 4 came largely because someone at OpenAI added a huge amount of chess data to pre training. Whatever ends up in the mix gets disproportionately good. You are at the mercy of what labs prioritize, and you have to explore the model the labs hand you because there is no manual.

Founder Advice in a Lab Dominated World

Asked what founders should do given that labs are racing toward escape velocity in obvious verifiable domains, Karpathy points back to verifiability itself. If your domain is verifiable but currently neglected, you can build RL environments and run your own fine tuning. The technology works. Pull the lever with diverse RL environments and a fine tuning framework, and you get something useful. He hints there is one specific domain he finds undervalued but declines to name it on stage.
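To make "verifiable" concrete, here is a minimal sketch of a toy environment whose reward can be checked mechanically. It is not from the talk: the letter counting task, the `LetterCountTask` class, and the `evaluate` helper are all hypothetical names chosen for illustration, and a real RL environment would be far richer.

```python
import random
from dataclasses import dataclass


@dataclass
class LetterCountTask:
    """Hypothetical episode: count how often `letter` occurs in `word`."""
    word: str
    letter: str

    def prompt(self) -> str:
        return f"How many times does '{self.letter}' appear in '{self.word}'?"

    def reward(self, answer: str) -> float:
        """Verifiable reward: 1.0 only when the answer matches the true count."""
        try:
            guess = int(answer.strip())
        except ValueError:
            return 0.0  # non-numeric answers earn nothing
        return 1.0 if guess == self.word.count(self.letter) else 0.0


def sample_task(rng: random.Random, words: list[str]) -> LetterCountTask:
    """Draw a random word and a random letter that occurs in it."""
    word = rng.choice(words)
    return LetterCountTask(word=word, letter=rng.choice(word))


def evaluate(policy, tasks) -> float:
    """Mean reward of a policy, where a policy is any callable prompt -> answer."""
    tasks = list(tasks)
    return sum(t.reward(policy(t.prompt())) for t in tasks) / len(tasks)


if __name__ == "__main__":
    rng = random.Random(0)
    tasks = [sample_task(rng, ["strawberry", "banana", "cherry"]) for _ in range(5)]
    always_two = lambda prompt: "2"  # stand-in for a model call
    print(f"mean reward: {evaluate(always_two, tasks):.2f}")
```

The point of the sketch is the shape, not the task: a prompt generator plus a reward function that needs no human in the loop is what lets a fine tuning run score itself at scale.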

On the question of what still looks far from automatable, Karpathy says almost everything can ultimately be made verifiable. Even writing can be assessed by councils of LLM judges. The differences are in difficulty, not in possibility.
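One way to picture a council of judges is a set of independent scorers whose outputs are aggregated into a single reward. The sketch below is an assumption about how that could look, not a description of any lab's pipeline. The three judges are trivial rule based stand-ins for what would be LLM calls, and every name here is hypothetical.

```python
from statistics import median
from typing import Callable, Sequence

# Hypothetical judge type: maps a piece of writing to a score in [0, 1].
Judge = Callable[[str], float]


def brevity_judge(text: str) -> float:
    """Stand-in judge: prefers texts of at most 20 words."""
    return 1.0 if len(text.split()) <= 20 else 0.5


def no_shouting_judge(text: str) -> float:
    """Stand-in judge: penalizes all-caps text."""
    return 0.0 if text.isupper() else 1.0


def punctuation_judge(text: str) -> float:
    """Stand-in judge: expects terminal punctuation."""
    return 1.0 if text.rstrip().endswith((".", "!", "?")) else 0.0


def council_score(text: str, judges: Sequence[Judge]) -> float:
    """Aggregate judge scores with the median so one outlier cannot dominate."""
    if not judges:
        raise ValueError("need at least one judge")
    return median([judge(text) for judge in judges])


if __name__ == "__main__":
    council = [brevity_judge, no_shouting_judge, punctuation_judge]
    for sample in ["A short, calm sentence.", "HELLO"]:
        print(sample, "->", council_score(sample, council))
```

The median is only one possible aggregation. A mean, a majority vote, or a weighted scheme would fit the same interface.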

From Vibe Coding to Agentic Engineering

Vibe coding raises the floor. Anyone can build something. Agentic engineering preserves the professional quality bar that existed before. You are still responsible for your software. You are still not allowed to ship vulnerabilities. The question is how you go faster without sacrificing standards. Karpathy calls it an engineering discipline because coordinating spiky, stochastic agents to maintain quality at speed requires real skill.

The ceiling on agentic engineering capability is very high. The old idea of a 10x engineer is now an understatement. People who are good at this peak far above 10x.

What Mediocre Versus AI Native Looks Like

Karpathy compares this to how different generations use ChatGPT. The difference between a mediocre and an AI native engineer using Claude Code, Codex, or Open Code is investment in setup and full use of available features. The same way previous generations of engineers got the most out of Vim or VSCode, today’s strong engineers tune their agentic environments deeply.

He thinks hiring processes have not caught up. Most companies still hand out puzzles. The new test should look like asking a candidate to build a full Twitter clone for agents, make it secure, simulate user activity with agents, and then run multiple Codex 5.4 xhigh instances trying to break it. The candidate’s system should hold up.

What Humans Still Own

Agents are intern level entities right now. Humans are responsible for aesthetics, judgment, taste, and oversight. Karpathy describes a Menu Gen bug where the agent tried to associate Stripe purchases with Google accounts using email addresses as the key, instead of a persistent user ID. Email addresses can differ between Stripe and Google accounts. This kind of specification level mistake is exactly what humans must catch.
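The email key mistake is easy to reproduce in a few lines. The sketch below is a simplified illustration, not the Menu Gen code: `User`, `Accounts`, and both crediting methods are hypothetical, and it assumes the checkout flow can carry the app's internal user ID along with the payment.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class User:
    user_id: str        # persistent internal ID (hypothetical), never changes
    google_email: str   # email address seen at Google sign-in
    credits: int = 0


class Accounts:
    def __init__(self) -> None:
        self.by_id: Dict[str, User] = {}

    def add(self, user: User) -> None:
        self.by_id[user.user_id] = user

    def credit_by_email(self, payment_email: str, amount: int) -> bool:
        """Fragile: matches a purchase to a user by email address."""
        for user in self.by_id.values():
            if user.google_email == payment_email:
                user.credits += amount
                return True
        return False  # the purchase is orphaned when the two emails differ

    def credit_by_user_id(self, user_id: str, amount: int) -> bool:
        """Robust: the internal ID travels with the purchase."""
        user = self.by_id.get(user_id)
        if user is None:
            return False
        user.credits += amount
        return True


if __name__ == "__main__":
    accounts = Accounts()
    accounts.add(User(user_id="u1", google_email="alex@gmail.com"))
    # The buyer paid with a work address, so the email lookup finds nobody.
    print(accounts.credit_by_email("alex@work.example", 10))
    print(accounts.credit_by_user_id("u1", 10))
```

Both methods look equally plausible in a diff, which is why this class of error tends to be caught at the spec level rather than in code review.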

He works with agents to design detailed specs and treats those as documentation. The agent fills in the implementation. He has stopped memorizing API details for things like NumPy axis arguments or PyTorch reshape versus permute. The intern handles recall. Humans handle architecture, design, and the right questions.
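For readers who want a refresher on the kind of detail being delegated, here is a small NumPy sketch of the axis argument and of reshape versus an axis swap. NumPy's `transpose` is used here as the analogue of PyTorch's `permute`, which is an assumption made to keep the example dependent on a single library.

```python
import numpy as np

x = np.arange(6).reshape(2, 3)   # [[0, 1, 2], [3, 4, 5]]

# axis names the dimension that gets collapsed.
col_sums = x.sum(axis=0)         # collapses the 2 rows    -> shape (3,)
row_sums = x.sum(axis=1)         # collapses the 3 columns -> shape (2,)

# reshape keeps the flat element order and only changes the shape.
reshaped = x.reshape(3, 2)       # [[0, 1], [2, 3], [4, 5]]

# transpose reorders the axes, so elements move relative to each other.
transposed = x.transpose(1, 0)   # [[0, 3], [1, 4], [2, 5]]

if __name__ == "__main__":
    print(col_sums, row_sums)
    print(reshaped.tolist())
    print(transposed.tolist())
```

Both results have shape (3, 2), yet they hold different layouts, which is exactly the kind of detail that is easy to forget and cheap for an agent to recall.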

Reading the actual code agents produce can still cause heart attacks. It is bloated, full of copy paste, riddled with awkward and brittle abstractions. His Micro GPT project, an attempt to simplify LLM training to its bare essence, was nearly impossible to drive through agents. The models hate simplification. That capability sits outside their RL circuits. Nothing is fundamentally preventing this from improving. The labs simply have not invested.

Animals Versus Ghosts

Karpathy returns to his framing that we are not building animals, we are summoning ghosts. Animal intelligence comes from evolution and is shaped by intrinsic motivation, fun, curiosity, and empowerment. LLMs are statistical simulation circuits where pre training is the substrate and RL is bolted on as appendages. They are jagged. They do not respond to being yelled at. They have no real curiosity. The ghost framing is partly philosophical, but it changes how you approach them. You stay suspicious. You explore. You do not assume the system you used yesterday will behave the same on a new task.

Agent Native Infrastructure

Most software, frameworks, libraries, and documentation are still written for humans. Karpathy’s pet peeve is being told to do something instead of being given a block of text to copy paste to his agent. He wants agent first infrastructure. The Menu Gen project’s hardest part was not writing code. It was deploying on Vercel, configuring DNS, navigating service settings, and stringing together integrations. He wants to give a single prompt and have the entire thing deployed without touching anything.

Long term he expects agent representation for individuals and organizations. His agent will negotiate meeting details with your agent. The world becomes one of sensors, actuators, and agent native data structures legible to LLMs.

Education and What Still Matters

The most striking line of the conversation comes near the end. Karpathy quotes a tweet that shaped his thinking: you can outsource your thinking but you cannot outsource your understanding. Information still has to make it into your brain. You still need to know what you are building and why. You cannot direct agents well if you do not understand the system.

This is part of why he is so excited about LLM driven knowledge bases. Every time he reads an article, his personal wiki absorbs it, and he can query it from new angles. Every projection onto the same information yields new insight. Tools that enhance human understanding are uniquely valuable because LLMs do not excel at understanding. That bottleneck is yours to manage.

Thoughts

The most useful frame in this talk is the distinction between vibe coding and agentic engineering. It clarifies what has been muddled for the past year. Vibe coding is about access. Anyone can produce something. Agentic engineering is about discipline. You preserve the standards that made software trustworthy in the first place, while moving at speeds that would have seemed absurd two years ago. These are not the same activity, and conflating them is part of why so many shipped products feel half built.

The Menu Gen anecdote is the kind of story that should make every solo developer pause. If a single Gemini plus Nano Banana prompt can replace a multi service app deployed on Vercel, the question for any builder becomes how much of what you are working on right now is going to be made spurious by the next model release. The honest answer is probably more than you want to admit. The defensive posture is not building thicker apps. It is choosing problems where the model alone is not enough, where taste, distribution, infrastructure, or specific verifiable RL environments give you something the next model cannot collapse into a prompt.

The verifiability lens is also unusually practical. If you are a solo builder, the question shifts from what is possible to what is verifiable but neglected. The labs will eat the obvious verifiable domains because that is how their RL pipelines are set up. The opportunity is in domains where verification is possible but the labs have not yet invested. That is a much more concrete strategic filter than vague intuitions about defensibility.

The car wash example is going to stick. State of the art models can refactor enormous codebases and still tell you to walk somewhere a sane person would drive. That is the lived reality of jagged intelligence, and it argues strongly for staying in the loop on real decisions rather than handing off everything to agents. The agents are excellent fillers of blanks. They are not yet trustworthy specifiers of the spec.

Finally, the line about outsourcing thinking but not understanding is worth taping above the desk. The bottleneck is no longer typing speed, syntax recall, or even API knowledge. It is whether the human in the loop actually understands the system being built. Tools that genuinely improve human understanding, including personal knowledge bases that re project information through different prompts, are likely the most undervalued category of products being built right now. The opportunity is not just in agents. It is in the cognitive scaffolding that makes humans good directors of agents.
April 29, 2026