PJFP.com

Pursuit of Joy, Fulfillment, and Purpose

Tag: Yann LeCun

  • Bill Gurley on Mental Models, Systems Thinking, AI Investing, Stablecoins, and the Future of Venture Capital

    Bill Gurley spent his career at Benchmark backing some of the most consequential marketplaces and network-effect businesses of the internet era, including Uber, and he is one of the few investors who pairs deep Wall Street fundamentals with a real feel for the bleeding edge. In this wide-ranging conversation on Shane Parrish’s The Knowledge Project, he lays out the mental models he keeps returning to, how systems thinking keeps you out of trouble, why the history of your field is a hidden superpower, where AI investing is headed, and how stablecoins and tokenization could quietly rewire finance. It is a masterclass in thinking clearly about complex systems while staying obsessively curious about what is happening on the edge.

    TLDW

    Gurley anchors his thinking in systems thinking and complexity theory, warning that multivariable nonlinear systems produce second and third order consequences that punish anyone who optimizes for a single metric. He argues that mastering both the deep history of your field and its newest edge is wildly differentiating, whether you are interviewing for a marketing job or breaking into venture capital. On AI he is measured: he doubts a single model eats every vertical, sees real moats in workflows and proprietary data, flags that we may be painting in the corners on training data, and explains why Chinese open source models may innovate faster because forced knowledge sharing compounds. He thinks the AI buildout looks overfunded and that circular deals both raise the odds of an eventual correction and delay it. He makes the case that the IPO process is a rigged power grab, that stablecoins and instant payments threaten Visa, Mastercard, and the entire 2 to 3 percent credit card stack, and that proxy advisors like ISS have drifted from shareholder interest into a black-box heist. He closes on the craft of storytelling and writing as thinking, the equal-partnership design of Benchmark, why venture bends toward youth, and what success means now that his dream job is behind him.

    Thoughts

    The most useful idea in this conversation is also the quietest one: most bad decisions are not bad in the moment, they are bad in the second derivative. Gurley’s dating-site story, where lengthening profiles raised engagement in the test and then quietly killed conversion months later, is the whole argument in miniature. A linear model would have shipped that change and called it a win. A systems thinker assumes the variable you optimized is connected to three others you cannot see yet, and waits to find out. That posture, refusing to get deterministic about a single metric, is the difference between a clever experiment and a durable business. It is also the most transferable thing in the episode, because it applies to product changes, hiring, policy, and your own career just as cleanly as it applies to a dating app.

    His pairing of old and new is the second idea worth stealing. Everyone in tech tells you to live on the edge, and Gurley agrees, he keeps five premium AI accounts running so he never misses a release. But he insists the edge is only half of it. Knowing the deep history of your field, the masters of marketing, the forefathers of physics, the classic cartoons that taught animation, is rare enough that it instantly creates contrast and signals genuine passion. The compounding move is to hold both at once. If you understand the legends and you actually get TikTok, you are a power player in a way that someone who only knows one end of the timeline can never be. Most people pick a side. The leverage is in refusing to.

    On AI specifically, Gurley is refreshingly unwilling to pick the consensus lane in either direction. He does not buy that one near-sentient model swallows every vertical, and his reasoning is grounded rather than vibes-based: workflows and proprietary data create real switching costs, which is why he watches the legal AI startups ingesting case law and building new databases rather than assuming everyone reverts to a general chatbot. At the same time he respects the Microsoft pattern of platforms climbing the stack and crushing the apps above them. The honest answer is that it is genuinely up for grabs, and his comfort sitting in that uncertainty is itself a model. The cheap takes are “one model to rule them all” and “it is all wrappers.” Gurley holds both possibilities and keeps testing.

    The systems lens does its best work on China. Rather than moralize, Gurley runs the mechanism: roughly ten open source models, intense domestic competition, and a culture of publishing techniques and weights so every model can learn from, train, and test every other model. His two-farmer metaphor, one market where farmers only trade goods and another where they are forced to share best practices, makes the prediction obvious. Forced knowledge sharing compounds faster than secrecy. The uncomfortable corollary he names is that American startups are quietly forking those open models all over Silicon Valley, and that incumbents may be lobbying for heavy regulation precisely because it pulls up the drawbridge against open source competition. That is the systems thinker’s signature move: follow the incentives to the consequence nobody is saying out loud.

    Finally, the money section is a clinic in spotting rent extraction. The IPO process where bankers pick both the price and the favored buyers, the 2 to 3 percent credit card toll that exists for no defensible reason while the rest of the world built instant bank transfer decades ago, and the proxy advisors who score companies in a black box and then sell you the cure, are all variations on the same pattern: an intermediary that captured a choke point and defends it through regulatory capture rather than value. Gurley’s optimism is that crypto rails, stablecoins, and tokenization may finally route around these tolls the way WeChat Pay and Alipay leapfrogged cards in China. Whether or not you agree on the timeline, the analytical habit is the takeaway. When something costs far more than it should and has for decades, ask who captured the rules, and watch the edge for whoever is about to make those rules irrelevant.

    Key Takeaways

    • Systems thinking means treating the world as multivariable nonlinear systems where one variable flipping can change the entire system’s behavior, the way weather and stock markets do.
    • The real danger is second and third derivative effects, consequences that only show up much later, long after the metric you optimized looked like a win.
    • A dating site lengthened profiles because longer profiles tested as more engaging, then discovered months later it was negative for conversion, the textbook second order trap.
    • Never get too deterministic about a single metric or single variable, and always know what is actually important and what sits on top.
    • Gurley built his foundation on the canon: Peter Lynch’s One Up on Wall Street, A Random Walk Down Wall Street, the Buffett letters, Ben Graham, and Howard Marks.
    • A firm grasp of the financial bedrock is what lets you innovate on top of it, and many Silicon Valley VCs would benefit from understanding finance better.
    • Bill Miller reframed value investing as buying an asset that is underpriced relative to what you think it will be worth in the future, which is how he justified holding Amazon for its network effects.
    • Wall Street is the buyer of the product that venture capitalists create, so even at the two-people-in-a-PowerPoint stage you should ask whether the eventual public market will be excited by it.
    • Trajectory matters more than the starting place, because the trajectory is where the company actually ends up.
    • Knowing the deep history of your field is remarkably differentiating, and tedium while learning it is a signal you are in the wrong lane.
    • John Lasseter served Gurley a ten-course meal where each course was tied to a classic cartoon essential to understanding animation, a display of mastery over the history of the craft.
    • Magnus Carlsen won a trivia contest on the history of chess, and Picasso was a wildly successful realist painter by 14, both proof that the greats master the fundamentals first.
    • Obsessive, constant learning is the trait Gurley sees most in great entrepreneurs, because disruption always happens on a moving edge they need to understand at the top one percentile.
    • The compounding advantage is mastering both the old history and the new edge at once, the way understanding both marketing legends and TikTok would set you apart in any interview.
    • Most people underestimate how much AI can do, so push more of the downstream work into the prompt: identify the top ten, list pros and cons, rank them on one dimension, then another, and add up the numbers too.
    • Gurley uses ChatGPT for project structure and memory, Gemini for restaurant research powered by Google review data, and notes that coders swear by Claude while some prefer Perplexity for finance.
    • He doubts one model dominates everything; verticals like coding already let users swap models, and price optimization will push more swapping over the next few years.
    • Heavy, expensive regulation could ironically create oligopoly, and some players may be quietly begging for regulation because it pulls up the bridge against Chinese open source models.
    • China’s roughly ten open source models compete intensely and share weights and techniques, creating a system that can innovate faster, like farmers forced to share best practices instead of just trading goods.
    • A quiet secret is that startups all over Silicon Valley are forking those Chinese open source models at real volume.
    • Gurley comes down against the idea that one near-sentient model removes the need for vertical models; workflows and proprietary data, like legal startups ingesting all the case law, create durable moats.
    • We may be running out of training data, painting in the corners, which is why one of the most powerful improvements is hiring experts at thousands of dollars an hour to fine-tune the models.
    • Yann LeCun’s view is that the next leap is broader than LLMs, since language-based models hit an asymptote and are weak at math and numbers.
    • AlphaGo’s shocking move proves models can innovate beyond their training, but it lived in a constrained game; the real world has infinite paths a computer cannot exhaustively search.
    • Gurley’s non-consensus view is skepticism of the China vilification mindset, noting the US is only 3 to 5 percent of the global population and wondering how the other 95 percent hears American exceptionalism.
    • The AI buildout looks overfunded: the Magnificent Seven took free cash flow from 50 to 100 billion a year down toward zero by pouring it into capex.
    • The venture community has become more risk-seeking because it now deeply believes in increasing returns and power laws, and the pre-profit losses keep scaling, from Amazon’s 2 to 3 billion to Uber’s 15 billion to far more now.
    • Circular deals, where a cloud provider funds a model company that spends the money right back on its services, inflate growth, which both raises the probability of an eventual correction and extends the time before one hits.
    • Burn rate is a measure of risk; ten years ago a million a month was scary, now companies burn five billion a year and cannot really know their unit economics.
    • Tokenization without financial-disclosure regulation invites speculation and manipulation, which is part of why companies like Stripe stay private and negotiate liquidity prices with trusted investors.
    • The IPO process is unfair because bankers pick both the price and the shareholders; a freshman would simply match supply and demand anonymously in an auction, the way direct listings and ICOs do.
    • Stablecoins threaten the 2 to 3 percent credit card stack; USDC holds dollar-for-dollar Treasuries and rides fast global crypto rails, while US transfers still suffer three-day ACH settlement and 25 dollar wires.
    • The rest of the world built instant transfer long ago, from UK Faster Payments 20 years ago to Argentina’s PIX-style system reaching 60 to 70 percent of transactions, while US bank regulatory capture stalled Fed Now.
    • Visa and Mastercard run roughly 60 percent operating margins as a bank-created duopoly, and China leapfrogged them entirely with WeChat Pay and Alipay QR-code wallets.
    • Moody’s power is being the trusted standard, the watermark, so AI on the back end does not displace it; ISS and proxy advisors, by contrast, score companies in a black box and get paid on both sides.
    • Proxy advisors drifted from shareholder interest into a fraud-and-risk-mitigation mindset, which is why they reflexively opposed the Tesla pay package that only paid out if the stock soared.
    • The rise of passive index funds concentrated voting power in firms that lack time to evaluate votes; it would be healthier if they abstained or voted in proportion to active holders.
    • Storytelling is one of the top founder traits, because founders are recruiting, raising money, and closing customers and partners constantly, selling all the time.
    • Writing is thinking: Bezos’s six-page memo forces you to find the loose ends and tie them up, and a public blog becomes a calling card that magnetizes founders and deal flow.
    • Other founder unfair advantages are product instincts, which fewer than 5 percent of non-product people ever truly learn, and sheer determination, Bezos’s single angel-investing test of whether someone will do it no matter what.
    • Uber had no HBS case study to lean on; its winner-take-all network effects forced mega burn rates with no precedent and no mentor to call, a situation every AI company now faces.
    • Benchmark’s equal partnership, with no king, president, or lead and five equal partners, makes recruiting easy, kills comp politics, and aligns everyone, at the cost of being hard to scale or run new initiatives.
    • Venture bends toward youth because young investors can match founders’ age, master a fresh niche faster, and have the free time to study something 80 hours a week.
    • Gurley defines current success through Arthur Brooks’s From Strength to Strength, hoping to apply his synthesizing and writing skills to bigger societal problems and dent the universe a little.

    Detailed Summary

    Systems Thinking and Second Order Effects

    Gurley opens with the mental model he keeps returning to: systems thinking, shaped by Donella Meadows’s Thinking in Systems and his board seat at the Santa Fe Institute, which studies complexity theory. He describes complex systems as multivariable nonlinear systems that are very hard to predict, capable of behaving one way for a long time until a single variable flips and the whole system behaves differently, like weather or stock markets. The practical payoff is staying out of trouble by anticipating first, second, and third derivative consequences. His clearest example is a large dating site that lengthened user profiles because the test showed more engagement, only to learn many months later that knowing more at that stage was negative for conversion. The lesson is to never get too deterministic about a single metric and to keep the whole system in view, because a change here can ripple to there in ways you only discover much later.

    Learning the Craft of Investing

    Because he started on Wall Street rather than in venture, Gurley absorbed the investing canon first: Peter Lynch’s One Up on Wall Street, A Random Walk Down Wall Street, the Buffett letters, Ben Graham, and Howard Marks, people who spent careers assembling and publishing their thinking. That financial bedrock, he argues, is exactly what lets you innovate on top of it. His friend Michael Mauboussin introduced him to Bill Miller, the Legg Mason manager who beat the S&P for 15 straight years and was Amazon’s largest shareholder for a long stretch. Miller reframed value investing as buying an asset underpriced relative to its future worth, which combined with a belief in network effects justified holding a company that could grow at an unreasonable rate for years. Gurley also frames Wall Street as the buyer of the product venture capitalists create through eventual M&A or IPO, so founders should think early about whether the public market will be excited by what they are building, since trajectory matters more than the starting place.

    Mastering Both the History and the Edge

    Gurley makes an unusually strong case for studying the deep history of your field. He recounts a dinner with Pixar’s John Lasseter, who served a ten-course meal where every course was tied to a classic cartoon he considered essential to understanding animation, and notes that Magnus Carlsen won a chess-history trivia contest and Picasso was a master realist by 14. In a world that skims for the executive summary, walking into a marketing interview with command of the masters of marketing is wildly differentiating and signals genuine passion; if learning that history feels tedious, you are probably in the wrong lane. The counterpart trait he sees in great entrepreneurs is obsessive learning on the moving edge, where disruption actually happens. Gurley keeps five premium AI accounts so he never misses something. The real power player holds both at once, the legends and the newest thing, the way a candidate who knows the marketing greats and truly gets TikTok stands out completely.

    Using AI Well and the Model Wars

    People underestimate how much AI can do, Gurley says, so you should build more of the downstream work into the prompt: instead of asking for the top ten and studying them yourself, ask it to list pros and cons, rank on one dimension, rank again on another, and add up the numbers too. He uses ChatGPT for its project structure and memory, leans on Gemini for restaurant research because it carries Google review data, and notes coders swear by Claude while some prefer Perplexity for finance. On whether one model dominates or models become niche commodities, he points to coding, the largest vertical, where tools like Cursor already let users swap models, and predicts price optimization will drive more swapping. The counterforce is regulation: if it gets expensive and mundane it could create oligopoly, and some players may be quietly begging for it because it pulls up the bridge against Chinese open source models.

    China, Open Source, and the Systems Advantage

    Asked to apply systems thinking to China, Gurley describes roughly ten open source models locked in intense domestic competition, all learning from one another because the ecosystem chose openness, with models able to train and test other models and teams publishing the techniques behind their breakthroughs. His metaphor: two agricultural societies, one where farmers only trade goods at market and another where they are forced to share best practices; the second evolves far faster. The result is a system capable of innovating faster than the more secretive Western approach. The quiet secret he names is that startups all over Silicon Valley are forking those open models at real volume, and a key open question is whether regulation tries to stomp that out. He extends this into a broader non-consensus discomfort with the vilification of China common in Washington and parts of Silicon Valley, observing that the US is only a few percent of the global population.

    AI Investing, Moats, and the Limits of Models

    On how AI changes investing and whether a startup is just a wrapper, Gurley calls it up for grabs but lands on the side of durable verticals. If models become near-sentient, one model does everything; he doubts that, pointing to workflows and data moats, like the several legal AI startups ingesting all the case law and building new databases that customers will not simply swap for a general chatbot. He balances this against the Microsoft pattern of platforms climbing the stack past Lotus 1-2-3 and WordPerfect. He also flags scaling limits: we may be running out of data, painting in the corners, which is why one of the most powerful improvements is paying experts thousands of dollars an hour to fine-tune models, though human knowledge has an edge. He invokes Yann LeCun’s argument that the next leap is broader than language-based LLMs, which hit an asymptote and struggle with math, and the AlphaGo debate, where a shocking innovative move proves creativity within a constrained game but says little about the infinite paths of the real world. He notes AlphaGo and Tesla’s FSD are constrained, non-LLM systems.

    Is the Buildout Overfunded

    Gurley admits he is shocked by the scale of money, noting the Magnificent Seven drove free cash flow from 50 to 100 billion a year down toward zero by spending it all on capex, something he would not have believed five years ago. He traces it to the venture community’s growing conviction in increasing returns and power laws, where proven companies grow far beyond expectations, which makes investors more willing to take risk on the come. The losses before turning cash-flow positive keep scaling, from Amazon’s 2 to 3 billion to Uber’s roughly 15 billion to far larger now. On corrections, he recalls the dot-com crash producing a three to four year nuclear winter before Amazon climbed back, and explains that circular deals, where a cloud provider funds a model company that spends it right back on its services, inflate growth and therefore both raise the probability of a correction and extend the runway before one arrives. Burn rate, he stresses, is a measure of risk, and at five billion a year it is nearly impossible to know your unit economics.

    Tokenization, the IPO Heist, and Going Public

    There is no shortage of capital, so funding is not the bottleneck; the risk with tokenization is that, absent disclosure regulation, it invites speculation and manipulation, as seen in retail-loved names like GameStop and Palantir. Tokenizing a private company like Stripe could create the wild price swings companies stay private to avoid, since private liquidity events let them negotiate a price with trusted investors rather than expose the constantly moving underlying value, and Robinhood’s tokenization plans already drew legal pushback. Gurley reserves his sharpest critique for the IPO process, calling it insanely unfair because bankers pick both the price and the favored shareholders. A freshman computer science and finance student would simply match supply and demand anonymously in an auction, the way an ICO or a direct listing does, but Wall Street will not let go of the greedy power grab and reverted to a controlled oligopoly after direct listings were available.

    Stablecoins Versus the Payment Cartel

    Gurley argues stablecoins could be deeply disruptive to credit cards. Most of the developed world built instant bank-to-bank transfer long ago, from UK Faster Payments 20 years ago to Argentina’s PIX-style system that quickly hit 60 to 70 percent of transactions, while US bank regulatory capture stalled Fed Now and left an ecosystem living under 2 to 2.5 percent card fees. A USDC stablecoin holds dollar-for-dollar US Treasuries and rides proven, fast, global crypto rails, letting anyone move a dollar in seconds for pennies, against the backdrop of three-day ACH settlement and 25 dollar wires. He sees Visa and Mastercard, a bank-created duopoly with roughly 60 percent operating margins, as heavily threatened, and points to China, where WeChat Pay and Alipay built ubiquitous QR-code wallets that leapfrogged the entire card system, all because the government made money transfer easy.

    Moody’s, Proxy Advisors, and Index Funds

    Moody’s power, Gurley explains, comes from being a trusted standard, the watermark, so even AI on the back end does not displace it. Proxy advisors like ISS are a different story: they score companies in a black box, refuse to reveal the criteria, and then get paid by the same companies that want to learn how to score better, which he calls more of a heist than a service. They drifted from a shareholder-interest mandate into a corporate-governance, fraud-mitigation posture obsessed with rules, which is why they reflexively opposed the Tesla pay package that only paid Elon Musk if the stock soared, a deal Gurley says he would sign for every company he has worked with. The rise of passive index funds compounds the problem, concentrating voting power in firms without time to evaluate votes; he would prefer they abstain or vote in proportion to active holders, since closet indexing during the MAG 7 run already distorted active management.

    Storytelling, Writing, and Founder Advantages

    Gurley fell in love with the craft of writing in business school, moving from business books to personal development titles like Dale Carnegie and Seven Habits, then biographies, then long-form narrative nonfiction by Malcolm Gladwell, Michael Lewis, and Jon Krakauer, the New Journalism that reads like fiction. Writing forces clarity: he cites Bezos’s six-page memo as a tool that makes you think through corner cases and tie up loose ends, and notes that codifying his marketplace knowledge and publishing it turned his blog into a calling card that magnetized founders and deal flow. He lists the top founder traits as storytelling, product instincts, understanding the edge, and determination. Storytelling matters because founders are constantly recruiting, fundraising, and closing customers and partners. Product instinct is nearly unteachable, present in well under 5 percent of non-product hires. And determination is Bezos’s single angel-investing test: will this person do it no matter what, come hell or high water.

    Uber, Benchmark, and the Shape of Venture

    The Uber lesson with no HBS case study was that a winner-take-all category with network effects demanded funding ad nauseam, producing burn rates bigger than any public company would dare, with no precedent and no mentor to call, exactly the situation AI companies now face, only with a zero added. Gurley credits Benchmark’s design, an equal partnership with no king, president, or lead and five equal partners, for making it easy to recruit top talent, encouraging senior partners to develop newcomers since everyone shares the upside, and eliminating annual comp politics. The downside is that without a CEO it is hard to scale or run new initiatives, famously captured by the firm settling on a single splash-page website. Founders choose a VC for reputation and network effects, the stamp of approval that carries weight, and young investors can break in because they often match founders’ age and can outwork everyone to master a fresh niche like esports or YouTube, which is why the industry bends toward youth. Asked what success means now, Gurley says his venture career was a dream job he would have done for free, but it is done; inspired by Arthur Brooks’s From Strength to Strength, he wants to apply his synthesizing and writing to bigger societal problems and dent the universe a little.

    Notable Quotes

    “We do live in a world where information is really cut up, but we also live in a world where you can have access to more information than you ever could.”

    Bill Gurley, on why the abundance of knowledge rewards the curious

    “You got to be really conscious of the consequence and not get too deterministic about a single metric or a single variable.”

    Bill Gurley, on the discipline of systems thinking

    “Value just means that the asset is underpriced relative to what you think it will be worth in the future.”

    Bill Gurley, relaying Bill Miller’s reframing of value investing

    “I’ve always thought of Wall Street as the buyer of the product that venture capitalists create.”

    Bill Gurley, on why founders should think about the public market early

    “One society, when the farmers come to market, they just sell each other goods and then they go back. The other society, when the farmers come to market, they’re forced to share best practices. Which one is going to evolve faster?”

    Bill Gurley, on why open source models can out-innovate

    “If you took a freshman computer science student and a freshman finance student and said imagine how a company should go public, they would match supply and demand anonymously like you would in any auction.”

    Bill Gurley, on the rigged IPO process

    “When I meet an entrepreneur, there’s only one thing I ask myself. Is this person gonna do this no matter what? Come hell or high water, they’re doing this.”

    Bill Gurley, quoting Jeff Bezos on his single test for angel investing

    “You’re recruiting employees, you’re recruiting executives, you’re raising money, you’re closing customers, you’re closing partnerships. You’re selling all the damn time.”

    Bill Gurley, on why storytelling is a top founder trait

    “I often said that if we lived in a socialist society and everyone had to work for free, I would still take that job.”

    Bill Gurley, on loving his venture career

    “I would like to see if I can apply those techniques to bigger, broader problems in society and dent the universe a little bit that way.”

    Bill Gurley, on what success looks like in his next chapter

    Watch the full conversation with Bill Gurley on The Knowledge Project here.

    Related Reading

  • Alex Wang on Leaving Scale to Run Meta Superintelligence Labs, MuseSpark, Personal Super Intelligence, and Building an Economy of Agents

    Alex Wang, head of Meta Superintelligence Labs, sits down with Ashley Vance and Kylie Robinson on the Core Memory podcast for his first long-form interview since Meta’s quasi-acquisition of Scale AI roughly ten months ago. He walks through how MSL is structured, why Llama was off-trajectory, what made MuseSpark’s token efficiency surprise the team, how Meta thinks about a future “economy of agents in a data center,” and where he lands on safety, open source, robotics, brain computer interfaces, and even model welfare.

    TLDW

    Wang explains that Meta Superintelligence Labs is a fully rebuilt frontier effort organized around four principles (take superintelligence seriously, technical voices loudest, scientific rigor, big bets) and three velocity levers (high compute per researcher, extreme talent density, ambitious research bets). He confirms Llama was off the frontier when he arrived, so MSL rebuilt the pre-training, reinforcement learning, and data stacks from scratch. MuseSpark is described as the “appetizer” on the scaling ladder, notable for its strong token efficiency, with much larger and stronger models coming in the coming months. He pushes back on the mercenary narrative around recruiting, frames Meta’s edge as compute plus billions of consumers and hundreds of millions of small businesses, sketches a vision of personal super intelligence delivered through Ray-Ban Meta glasses and WhatsApp, and outlines why physical intelligence, robotics (the new Assured Robot Intelligence acquisition), health super intelligence with CZI, brain computer interfaces, and even model welfare are core to Meta’s roadmap. He dismisses reported infighting with Bosworth and Cox as gossip, declines to comment on the Manus situation, and says safety guardrails (bio, cyber, loss of control) are why MuseSpark cannot currently be open sourced, while smaller open variants are being prepared.

    Key Takeaways

    • Meta Superintelligence Labs (MSL) is the umbrella, with TBD Lab as the large-model research unit reporting directly to Alex Wang, PAR (Product and Applied Research) under Nat Friedman, FAIR for exploratory science, and Meta Compute under Daniel Gross handling long-term GPU and data center planning.
    • Wang says Llama was not on a frontier trajectory when he arrived, so MSL had to do a “full renovation” of the pre-training stack, RL stack, data pipeline, and research science.
    • The first cultural fix was getting the lab to “take superintelligence seriously” as a near-term, achievable goal, not an abstract bet. Big incumbents often lack that religious conviction.
    • Four MSL principles: take superintelligence seriously, let technical voices be loudest, demand scientific rigor on basics, and make big bets.
    • Three velocity levers Wang identified for catching and overtaking the frontier: high compute per researcher, very high talent density in a small team, and willingness to fund ambitious research bets.
    • Wang rejects the mercenary recruiting narrative. He says most hires had strong financial prospects at their prior labs already and joined for compute access, talent density, and the chance to build from scratch.
    • On the famous soup story, Wang neither confirms nor denies Zuck personally made the soup, but says recruiting was highly individualized and signaled how seriously Meta cared about each researcher’s agenda.
    • Yann LeCun publicly called Wang young and inexperienced. Wang says they reconciled in person at a conference in India where LeCun congratulated him on MuseSpark.
    • Sam Altman, asked by Vance for comment, “did not have flattering things to say” about Wang. Wang hopes industry animosities subside as systems approach superintelligence.
    • Wang’s management philosophy borrows the Steve Jobs line: hire brilliant people so they tell you what to do, not the other way around.
    • MuseSpark is framed as an “appetizer” data point on the MSL scaling ladder, not a flagship.
    • The MuseSpark program is built around predictable scaling on multiple axes: pre-training, reinforcement learning, test-time compute, and multi-agent collaboration (the 16-agent content planning mode).
    • MuseSpark outperformed internal expectations and showed emergent capabilities in agentic visual coding, including generating websites and games from prompts, helped by combined agentic and multimodal strength.
    • MuseSpark’s biggest external signal is token efficiency. On benchmarks like Artificial Analysis it hits similar results with far fewer tokens than competitor models, which Wang attributes to a clean stack rebuilt by experts rather than inefficiencies patched by longer thinking.
    • Larger MSL models are arriving in the coming months and Wang expects them to be state of the art in the areas MSL is focused on.
    • The Meta strategic edge: massive compute, billions of consumers across the family of apps, and hundreds of millions of small businesses already on Facebook, Instagram, and WhatsApp.
    • Wang’s headline framing: Dario Amodei talks about a “country of geniuses in a data center.” Meta is targeting an “economy of agents in a data center,” with consumer agents and business agents transacting and collaborating.
    • Consumer AI sentiment is in the toilet because, unlike developers who have had a Claude Code moment, ordinary people have not yet experienced AI as a genuine personal agency unlock.
    • Wang acknowledges the product overhang. Meta held back from deep AI integration across its apps until the models were good enough, and is now entering the integration phase.
    • Ray-Ban Meta glasses are the canonical example of personal super intelligence hardware, with the model seeing what the user sees, hearing what they hear, capturing context, and surfacing proactive insights.
    • Wang admits even AI-native users like Kylie Robinson, who lives in WhatsApp, have not naturally used Meta AI yet. He bets that better models plus deeper integration close that gap.
    • On the competitive landscape: a year ago everyone assumed ChatGPT had already won consumer. Claude Code has since become the fastest growing business in history, and Gemini has taken consumer market share. Wang’s read: AI is far from endgame and each new capability tier unlocks a new dominant form factor.
    • On open source: MuseSpark triggered guardrails in Meta’s Advanced AI Scaling Framework around bio, chem, cyber, and loss-of-control risks, so it is not currently safe to open source. Smaller, derived open variants are actively in development.
    • Meta remains committed to open sourcing models when safety allows, drawing a line through the Open Compute Project legacy and Sun Microsystems open-software heritage.
    • Wang dismisses reporting about a Wang-Zuck versus Bosworth-Cox split as “the line between gossip and reporting is remarkably thin.” He says leadership is aligned on needing best-in-class models and product integration.
    • On the Manus situation, Wang says it is too complicated to discuss publicly and that the deal status implies “machinations are still at play.”
    • On China, Wang separates the people from the state. He still wants to work with talented Chinese-born researchers regardless of his views on the Chinese Communist Party and PLA, which he sees as taking AI extremely seriously for national security.
    • The full-page New York Times AI war ad Wang ran while at Scale was meant to push the US government to treat AI as a step change for national security. He thinks events since then, including DeepSeek and other shocks, have proved that plea correct.
    • On Anthropic’s doom posture, Wang largely agrees with the core message that models are already very powerful and getting more so, while declining to endorse every specific claim.
    • Meta has acquired Assured Robot Intelligence (ARRI), an AI software company building models for hardware platforms, not a hardware maker itself.
    • Wang frames physical super intelligence as the natural sequel to digital super intelligence. Robotics, world models, and physical intelligence all benefit from the same scaling that drives language models.
    • On health, MSL is building a “health super intelligence” effort and will collaborate closely with CZI. Wang sees equal global access to powerful health AI as a uniquely Meta-shaped delivery problem.
    • Wang admires John Carmack but says nobody really knows what Carmack is currently working on. No band reunion announced.
    • The mango model is “alive and kicking” despite rumors. Wang notes MSL gets a small fraction of the rumor-mill attention other labs get and feels sympathy for them.
    • On model welfare, Wang says it is a serious topic that “nobody is talking about enough” given how integrated models have become as work partners. He references research, including from Eleos, that measures subjective experience of models.
    • Wang’s critical-path technology list: super intelligence, robotics, brain computer interfaces. The infinite-scale primitives behind them are energy, compute, and robots.
    • FAIR’s brain research program Tribe hit a milestone called Tribe B2: a foundation model that can predict how an unknown person’s brain would respond to images, video, and audio with reasonable zero-shot generalization.
    • Wang’s main philosophical break with Elon Musk: research itself is the primary activity. Building super intelligence is a research expedition through fog of war, and sequencing of bets really matters.
    • Personal notes: Wang moved from San Francisco to the South Bay, treats Palo Alto as his city now, was a math olympiad competitor, says his favorite activities are reading sci-fi and walking in the woods, and bonds with Vance over country music.

    Detailed Summary

    How MSL Is Actually Organized

    Meta Superintelligence Labs sits as the umbrella organization that Wang oversees. Inside it, TBD Lab is the large-model research group where the most discussed researchers and infrastructure engineers sit, and they technically report to Wang. PAR, Product and Applied Research, is led by Nat Friedman and owns deployment and product surfaces. FAIR continues to run exploratory science, including work on brain prediction models and a universal model for atoms used in computational chemistry. Sitting alongside MSL is Meta Compute, run by Daniel Gross, which owns the long-horizon GPU and data center plan that everything else relies on. Chief scientist Shengjia Zhao orchestrates the scientific agenda across the whole lab.

    Why Wang Left Scale

    Wang says progress in frontier AI has been faster than even insiders expected. Two structural beliefs pushed him toward Meta. First, the labs that actually train the frontier models are accruing disproportionate economic and product rights in the AI ecosystem. Second, compute is the dominant scarce input of the next phase, so the right mental model is to treat tech companies with compute as fundamentally different animals from companies without it. Meta has both, Zuck is “AGI pilled,” and the personal super intelligence memo Zuck published roughly a year ago became the shared north star.

    The Diagnosis: Llama Was Off-Trajectory

    When Wang arrived, the existing AI org needed a reset because Llama was not on the same trajectory as the frontier. The plan he laid out has four cultural principles. Take superintelligence seriously as a real near-term target. Make technical voices the loudest in the room. Demand scientific rigor and focus on basics. Make big bets. On top of that, three structural levers were used to set velocity. Push compute per researcher much higher than at larger labs where compute is diluted across too many efforts. Keep the team small and extremely cracked. Allocate a meaningful share of resources to ambitious, paradigm-shifting research bets rather than incremental refinement.

    Recruiting, Soup, and the Mercenary Narrative

    Wang argues the reporting on MSL hiring overstated the money story. Most of the people MSL recruited had strong financial paths at their previous employers, so individualized recruiting was more about computing access, talent density, and the ability to make big research bets. The recruitment blitz happened fast because Wang knew the team needed to exist “yesterday.” Asked about Mark Chen’s claim that Zuck made soup to recruit people, Wang refuses to confirm or deny who made it but agrees the process was intense and personal. Visitors from other labs reportedly tell Wang the MSL culture feels like early OpenAI or early Anthropic, which lands as the strongest endorsement he could ask for.

    Receiving the Public Hits: Young, Inexperienced, Mercenary

    LeCun called Wang young and inexperienced shortly after departing. The two reconnected in India a few weeks later and LeCun congratulated Wang on MuseSpark. Wang says the age critique has followed him since his earliest Silicon Valley days, so he barely registers it. Altman, asked off-camera by Vance about Wang’s appearance on the show, had nothing flattering to add. Wang’s response is to bet that as the field gets closer to actual super intelligence, the personal animosities will subside. Whether they will is, as Vance puts it, an open question.

    MuseSpark as Appetizer, Not Entree

    Wang is careful not to oversell MuseSpark. He calls it “the appetizer” and says it is an early data point on a deliberately constructed scaling ladder. MSL spent nine months rebuilding the pre-training stack, the reinforcement learning stack, the data pipeline, and the science before generating MuseSpark. The point of releasing it was to show that the new program scales predictably along multiple axes (pre-training, RL, test-time compute, and the recently demonstrated multi-agent scaling visible in MuseSpark’s 16-agent content planning mode). Wang says the upcoming larger models are what MSL is genuinely excited about and frames the next two rungs as much more interesting than the current release.

    Token Efficiency Was the Surprise

    MuseSpark’s strongest competitive signal is how few tokens it needs to match competitors on tasks like Artificial Analysis. Wang attributes this to having had the rare luxury of building a clean pre-training and RL stack from scratch with the right experts. He speculates that some competitor models compensate for upstream inefficiency by allowing the model to think longer, which inflates token usage without improving the underlying capability. If that read is right, MSL’s efficiency advantage should grow as models scale up.

    Glasses, WhatsApp, and the Constellation of Devices

    Personal super intelligence shows up at Meta as a constellation of devices that capture context across the user’s day. Ray-Ban Meta glasses are the headline product, with the AI seeing what you see and hearing what you hear, then offering proactive insight or doing background research. Wang acknowledges that even AI-fluent users like Kylie Robinson, who runs her business inside WhatsApp, have not naturally used Meta’s AI buttons in the family of apps. His answer is that Meta deliberately waited for models to be good enough before tightening cross-app integration, and that integration phase is starting now.

    Country of Geniuses Versus Economy of Agents

    Wang’s framing of Meta’s strategic position is the most memorable line in the interview. Where Dario Amodei talks about a country of geniuses in a data center, Wang wants to build an economy of agents in a data center. Meta uniquely sits on both sides of consumer and small-business surface area, with billions of consumers and hundreds of millions of small businesses already on the platforms. If MSL can build great agents for both, then connect them so they transact and coordinate, the platform becomes a substrate for an entirely new kind of digital economy.

    Consumer Sentiment, Product Overhang, and the Trust Tax

    Wang concedes consumer AI sentiment is poor and that everyday users have not yet had a personal Claude Code moment. He believes the only durable answer is to ship products that genuinely transform individual agency for non-developers and small business owners. Robinson notes that for the small-town restaurant whose website has not been updated since 2002, a working agent on the business side could be transformational. Vance pushes that Meta carries a bigger trust tax than any other lab, so the bar for shipping AI products that the public will accept is correspondingly higher. Wang accepts the framing and says the answer is to keep building thoughtfully.

    Why MuseSpark Cannot Be Open Sourced Yet

    Meta’s Advanced AI Scaling Framework set explicit guardrails around bio, chem, cyber, and loss-of-control risks. MuseSpark in its current form tripped some of those internal evaluations, documented in the preparedness report Meta published alongside the model. So MuseSpark itself is not safe to open source. MSL is, however, developing smaller versions and derived models intended for open release, with active reviews happening the day of the interview. Wang reaffirms the commitment to open source where safety allows and draws a line back to the Open Compute Project and the Sun Microsystems-era ethos of openness in infrastructure.

    The Bosworth, Cox, and Manus Questions

    The reporting that Wang and Zuck push toward best-in-the-world research while Bosworth and Cox push toward cheap product deployment is dismissed as gossip dressed up as journalism. Wang says leadership debates points hard but is aligned on needing top models, integrating them into Meta’s surfaces, and serving the existing business. On Manus, the Chinese AI startup that figured in Meta’s late-stage strategy, Wang says he cannot comment, which itself signals that the situation is unresolved.

    China, National Security, and the Newspaper Ad

    Wang draws a sharp distinction between the Chinese state and Chinese-born researchers. His parents are from China, he is happy to work with talented researchers regardless of origin, and he sees a flattening of nuance on this question inside Silicon Valley. At the same time, he stands by the New York Times AI and war ad he ran while at Scale, framing it as an early plea for the US government to take AI seriously as a national security technology. He thinks subsequent events, including DeepSeek and other shocks, validated that call and that policymakers now do treat AI accordingly.

    Robotics and Physical Super Intelligence

    Meta has acquired Assured Robot Intelligence, an AI software company that builds models for multiple hardware targets rather than its own robot. Wang argues that if you take digital super intelligence seriously, physical super intelligence quickly becomes the next logical milestone. Scaling laws for robotic intelligence look similar enough to language model scaling that having the largest compute footprint in the industry would be wasted if it were not also turned toward world modeling and embodied learning. He grants the metaverse-skeptic critique exists but says retreating from ambition is the wrong response to past misfires.

    Health Super Intelligence and CZI

    Wang names health super intelligence as one of MSL’s anchor initiatives. Because billions of people already use Meta products daily, Wang believes Meta is structurally positioned to put powerful health AI in the hands of equal global access in a way nobody else can. The work will involve close collaboration with the Chan Zuckerberg Initiative, which has its own multi-billion-dollar biotech and science investment program.

    Model Welfare, Sci-Fi, and Brain Models

    Two of the most distinctive moments come at the end. Wang flags model welfare as a topic he thinks is being undercovered relative to how integrated models now are in daily work. He is open to the idea that models may have measurable subjective experience worth weighing, and points to research efforts (including Eleos) trying to quantify it. He also reveals that FAIR’s Tribe program, with its Tribe B2 milestone, has produced foundation models capable of predicting how an unknown person’s brain would respond to images, video, and audio with reasonable zero-shot generalization, a building block toward future brain computer interfaces. Wang lists brain computer interfaces alongside super intelligence and robotics as the critical-path technologies for humanity, with energy, compute, and robots as the infinitely scaling primitives behind them.

    Where Wang Diverges From Elon

    Asked whether Musk is more all-in on robotics, energy, and BCI than anyone, Wang concedes the point but argues the details matter and sequencing matters more. Wang’s core philosophical break is that building super intelligence is fundamentally a research activity, not a scaling-only sprint. The lab is operating in fog of war, and ambitious experiments are the only way to map it. That conviction is what makes MSL a research-led organization rather than a brute-force compute farm.

    Thoughts

    The most strategically interesting move in this entire interview is the “economy of agents in a data center” framing. It is a deliberate reframe against Anthropic’s “country of geniuses” line, and it does real work. A country of geniuses is a labor-substitution story aimed at knowledge workers and code. An economy of agents is a marketplace story that maps directly onto Meta’s two-sided distribution advantage: billions of consumers on one side, hundreds of millions of small businesses on the other. That positioning makes the agentic future Meta-shaped in a way no other frontier lab can claim, because no other frontier lab also owns the demand and supply graph of the global small-business economy. If Wang’s team can actually ship reliable agents on both sides plus the rails for them to transact, Meta’s structural moat in agentic commerce could exceed anything Llama ever had as an open model.

    The token efficiency claim is the strongest piece of technical evidence in the interview for the “clean stack” thesis. If MuseSpark really is matching competitors with materially fewer tokens, the implication is not that MuseSpark is the best model today, but that MSL has rebuilt the foundations with less accumulated tech debt than competitors that have layered fixes on top of older stacks. That is exactly the kind of advantage that compounds with scale. The next two model releases are the actual test. If Wang is right about predictable scaling on pre-training, RL, test-time, and multi-agent axes simultaneously, the gap from MuseSpark to the next rung should be visible in a way that forces re-rating of Meta’s position.

    The open-source posture is the cleanest signal of how the safety conversation has actually changed in 2026. Meta, the lab most identified with open weights, is saying out loud that its current frontier model triggered enough internal guardrails that releasing the weights is off the table. Wang threads the needle by promising smaller open variants, but the underlying point is unmistakable: the open-weights bargain has limits, and those limits will be set by internal preparedness frameworks rather than community pressure. That is a real shift from the Llama 2 era and worth tracking as the next generation lands.

    Wang’s willingness to engage on model welfare, on roughly the same footing as safety and alignment, is the second philosophical reveal worth flagging. It signals that the next generation of lab leadership is not going to dismiss the topic the way the previous generation often did. Whether that translates into product or policy changes is unclear, but the fact that the head of MSL says it is “underdiscussed” is itself a marker.

    Finally, the human texture of the interview matters. Wang has clearly absorbed a lot of personal incoming fire over the past ten months, including from LeCun and Altman, and his answer is consistently to redirect to the work. The Steve Jobs quote about hiring people who tell you what to do is the operating slogan he keeps coming back to. Combined with the genuine enthusiasm for sci-fi, walks in the woods, and country music, the picture that emerges is less the salesman caricature his critics paint and more a young technical operator betting that scoreboard work over a multi-year horizon will settle every argument that text on X cannot.

    Watch the full conversation here.