PJFP.com

Pursuit of Joy, Fulfillment, and Purpose

Tag: frontier models

  • Uber CEO Dara Khosrowshahi on AI, Autonomous Vehicles, Robotaxis, Drones, and the Future of Transportation

    Uber CEO Dara Khosrowshahi sat down with Patrick O’Shaughnessy on the Invest Like the Best podcast for a long, candid conversation about the forces remaking transportation. There is artificial intelligence inside the company, and there is physical AI out in the real world, meaning autonomous vehicles, robotaxis, and delivery drones. He calls the autonomous opportunity another trillion dollar marketplace and argues it will change how society operates. You can watch the full interview here. What follows is a structured breakdown of the most useful ideas, the strategy behind Uber’s AV bet, and the operating philosophy that runs underneath all of it.

    TLDW

    Dara Khosrowshahi explains how he brought order to the chaos he inherited at Uber in 2017 by treating hard problems like vector mathematics, and how an immigrant childhood shaped his all-in, low-stress operating style. He describes AI hitting Uber on two fronts at once: much larger digital models that predict rider intent, and physical AI that changes how rides and food get fulfilled in the real world. The conversation covers Uber blowing through a full year of AI budget in a single quarter, metering headcount as engineers become superhuman, the more than 30 AV partnerships with Waymo, Nuro, Lucid, Nvidia, Wayve, and Pony AI, and why supply, not demand, is the whole game. It runs through the coexistence model borrowed from travel and Uber Eats, the Uber One membership flywheel at 50 million members, the push from on-demand to planned travel through hotels and Uber Reserve, the economics of cheaper autonomous cars and delivery drones, the regional race from the Middle East to Europe, and the lessons from Barry Diller and Herbert Allen about getting to ground truth and betting on people. It closes on his capital allocation philosophy of prioritizing organic growth and AV commitments over buybacks.

    Thoughts

    The most underappreciated line in the whole interview is the budget one. Blowing a full year of AI spend in a single quarter is the clearest signal yet that frontier intelligence is being consumed far faster than even an AI-native company planned for. Dara’s response has quietly become the default enterprise playbook: explore on the expensive frontier models, then scale the proven interactions onto cheaper or open-source models. The deeper tension is that he is simultaneously telling teams to drive adoption and metering headcount, which is the real story of AI in large companies. The productivity gains are showing up as fewer hires, not only as faster shipping.

    The supply-first framing is the strategic core, and it inverts the demand-first logic he learned at Expedia. In autonomous vehicles this means Uber does not need to win the self-driving race itself. It needs to own the demand layer and aggregate every AV maker’s supply, the same way online travel agents coexist with hotels and Uber Eats coexists with McDonald’s. The 30 percent higher utilization figure for AVs on Uber’s network is the wedge in that argument. It is the reason a Waymo stays on the platform even while building its own brand, because filling more of an expensive asset’s day changes the entire return on the car.

    His premortem answer is unusually honest. Asked what kills the opportunity, he does not name an Uber-specific execution failure. He names AI’s unpopularity with the general public. That is a CEO admitting the gating factor is social license, not technology. The early data he leans on, drivers in Austin and Atlanta earning more and signing up in greater numbers as AVs add incremental demand, is the counter-narrative he is betting the public conversation on. Whether that story holds as AV volume scales from thousands of vehicles to hundreds of thousands is the open risk the entire industry shares.

    Underneath the strategy is one repeated instinct: get to ground truth. It shows up in the Barry Diller story about reading the model from the analyst who built it, in his hunt for the troublemakers who keep a company mutating, and in the fact that he bought an ebike to deliver food in San Francisco. It is the same move applied at every altitude, and it is why he frames AI as a chance to rebuild processes from first principles rather than shave 20 percent off the ones that exist. The leaders who treat AI as an efficiency tool will likely lose to the ones who rebuild from the ground up.

    Key Takeaways

    • Dara took the Uber job in 2017 after Daniel Ek recommended him at the Allen and Company Sun Valley conference and told him, when he hesitated, that life is about impact rather than happiness.
    • He inherited what he calls complete chaos: a board fighting for control, lost trust with regulators and the public, and a committee running the company after Travis Kalanick stepped back.
    • His method for chaos is to treat it like vector mathematics, breaking a seemingly unassailable problem into component dimensions and solving each one.
    • Early moves included bringing in chairman Ron Sugar to unite the board, running a listening tour with stakeholders, and rebuilding the executive team with leaders like Andrew McDonald and Tony West.
    • He credits an engineering mindset and an immigrant childhood for his calm under pressure. His family lost everything leaving Iran when he was nine and rebuilt from nothing.
    • On parenting, he argues that overcoming challenges is what forms people, and that doing everything for your kids is a long-term disservice disguised as a short-term favor.
    • Uber has always operated in a probabilistic real world of traffic, cancellations, and late food, so it has used machine learning longer than most consumer companies.
    • The current inflection is AI on two fronts: larger digital models that predict intent, and physical AI that changes how Uber fulfills in the real world.
    • Uber’s feed and search models are now roughly 10,000 times bigger than the older ones, enabling universal search across rides, eats, and grocery in a single query.
    • Uber can already guess a rider’s destination about three quarters of the time, turning booking into a one-tap interaction.
    • AI adoption is bottoms-up across engineering, legal, and marketing. Developers in India are driving roughly ten times the code commits using autonomous agents.
    • Dara pushes teams to rebuild processes from first principles with AI rather than settling for 20 to 30 percent optimization of an existing process.
    • He wants the rebels and troublemakers to win, and treats unpredictable internal adoption patterns as something to find and promote.
    • Uber blew through its full-year AI budget in a single quarter, which is now forcing it to meter headcount as engineer throughput climbs.
    • The token strategy is to explore on expensive frontier models, then scale proven interactions onto cheaper or open-source models.
    • Uber generates over 10 billion dollars in free cash flow on more than 10 billion trips a year, but it is not a high-margin business, so efficiency funds lower prices and higher earnings.
    • In autonomous vehicles, the thesis is supply: own the demand layer and aggregate every AV maker’s vehicles, the way Uber aggregates drivers and restaurants.
    • Uber has more than 30 AV partnerships, including Waymo, Nuro, Lucid, Nvidia, Wayve, and Pony AI.
    • Uber is building the surrounding ecosystem: depots, charging, fleet partners, a one billion dollar Santander financing line for EV and AV fleets, and autonomous insurance.
    • AVs operating on Uber’s network are about 30 percent busier in trips and revenue per vehicle per day than vehicles not on the network, which transforms the return on an expensive car.
    • The build, partner, or buy answer is coexistence, mirroring how travel agents coexist with hotels and airlines and how Uber Eats coexists with McDonald’s, Starbucks, and Chipotle.
    • His public premortem is that AI’s unpopularity, not Uber-specific execution, is the biggest risk, so the company must move at the pace society will accept to avoid backlash.
    • Early data in Austin and Atlanta shows drivers earning more and more drivers joining, suggesting AVs are adding incremental demand rather than only displacing humans.
    • AV hardware costs typically fall 30 to 40 percent per generation. A Lucid midsize built with Nuro could land around 60,000 to 70,000 dollars and bring transportation costs down.
    • Lower cost expands demand. Uber already dwarfs the taxi market it was once sized against, and Dara expects the same dynamic with AVs.
    • Traditional OEMs are now investing in L4-ready systems and should arrive over the next two to four years. Each AV drives roughly three to four times what a human driver does.
    • Chinese manufacturing capability and bill of materials are described as unrivaled. A low-cost Western, Foxconn-style player for AVs is being worked on but does not exist yet.
    • Drones are gated by battery density. Food and grocery drones should reach real scale in two to five years and become normal in five to ten, with Joby and Zipline cited as examples.
    • The Middle East, including Abu Dhabi, Dubai, and Saudi Arabia, is moving fastest thanks to entrepreneurial regulators. Europe is catching up, with London robotaxi pilots expected before year end.
    • Uber Eats wins the number one position more often internationally. The playbook is selection plus reliability, amplified by cross-platform upsell, with about 13 percent of Eats bookings coming from the mobility app.
    • Uber One has 50 million members growing 50 percent year on year. Dara frames it like Netflix, more content for the same price, and accepts a first-year loss for multi-year profit.
    • Uber is pushing from on-demand to planned through hotels, via a deal with Expedia, and through Uber Reserve, now at over a 5 billion dollar run rate with 99 percent-plus reliability.
    • His leadership lessons: from Barry Diller, get to ground truth from source material and tell the truth as a leader. From Herbert Allen, bet on people, not companies.
    • On capital allocation, he prioritizes organic growth and financialized AV commitments over buybacks, while keeping costs growing slower than revenue.

    Detailed Summary

    From chaos to structure: the 2017 turnaround

    Dara came to Uber from 13 years running Expedia under Barry Diller, recruited through a head hunter after Daniel Ek floated his name at the Sun Valley conference. He arrived into what he describes as complete chaos, with the board fighting over control rather than the fate of the company and trust badly damaged with regulators, the public, and employees. His approach was to decompose the situation the way an engineer decomposes a multidimensional problem, solving each dimension and reassembling the whole. Practically that meant a new chairman in Ron Sugar to unite the board, a listening tour to understand stakeholder concerns, and a rebuild of the leadership team that kept strong insiders like Andrew McDonald while adding people like Tony West.

    An engineering mind and an immigrant chip on the shoulder

    His wife Sid calls him a robot, by which she means he does not get rattled. He traces that to an engineering education and to a childhood upheaval. His family left Iran when he was nine and lost the business his father had built, and he watched that loss diminish his father over the years. The experience produced a durable drive to rebuild and a refusal to let external chaos define him internally. He applies a similar philosophy to his kids, arguing that challenges and the act of overcoming them are what form a person, and that helicopter parenting removes the very friction that builds capability.

    AI inside Uber: prediction, agents, and superhuman engineers

    Uber has always lived in a probabilistic world where the digital booking is deterministic but the real-world fulfillment is not, so it adopted machine learning earlier than most consumer companies. The newest models are roughly 10,000 times larger than the prior generation and power universal search and destination prediction that is right about three quarters of the time. Internally, adoption is bottoms-up and uneven in a good way, with engineers in India shipping around ten times the code commits using autonomous agents. Rather than mandate from the top, Dara pushes teams to rebuild whole processes from first principles with AI instead of trimming a fifth off the existing ones.

    The cost of intelligence

    The flip side of fast adoption is cost. Uber blew through its annual AI budget in a single quarter, and that is forcing a real adjustment. Because engineer throughput is climbing, the company is metering headcount increases rather than simply hiring. The operating rule is to keep driving adoption while pursuing efficiency, using frontier models from providers like OpenAI and Anthropic to experiment with new interactions, then moving the scaled experiences onto more efficient or open-source models to bring the per-token cost down. With more than 10 billion dollars of free cash flow on over 10 billion trips, Uber is not a high-margin business, so efficiency directly funds lower prices for riders and higher earnings for drivers.

    Why supply decides the AV race

    At Expedia, Dara learned a demand-first model where you attract consumers and then build inventory to match. Uber is the opposite, a supply company, where securing every car, restaurant, courier, and retailer causes the demand to follow. Applied to autonomous vehicles, the strategy is to be the go-to-market and demand layer for anyone building a digital driver. Uber wants to aggregate the largest pool of AV supply, just as it aggregates human drivers, so that the companies building the actual self-driving software can focus on the driver while Uber handles distribution and utilization.

    Building the ecosystem around the digital driver

    Uber now has more than 30 AV partnerships spanning Waymo, Nuro, Lucid, Nvidia, Wayve, and Pony AI, and it expects many winners rather than one, the same shape as the foundation model market. Around those partners it is assembling the connective infrastructure: depots and charging in cities where the regulatory path is opening, fleet partners, a one billion dollar financing line with Santander for EV and AV fleets, and work on autonomous insurance. It is also collecting street data today that can feed the models, so that when a partner’s cars hit the market there is instant demand waiting. The early proof point is that AVs on Uber’s network run about 30 percent busier than comparable vehicles off it, which materially improves the return on a costly car.

    The premortem and the public’s patience

    Asked what derails the opportunity, Dara points outward rather than inward. The risk is that AI is powerful but unpopular, and the average person experiences it as a threat to electricity costs or a cousin’s job rather than as magic. The same dynamic could hit AVs even though the technology should end up safer than human drivers, which is why questions about emergency services, equitable access, and driver earnings have to be worked through with regulators and communities. The encouraging early signal is in Austin and Atlanta, where drivers are making more money and more are joining because AVs appear to be adding incremental demand. The controllable risk, he says, is access to supply, which is exactly why Uber has partnered with nearly every AV provider across mobility, delivery, and freight.

    A trillion dollar marketplace: cheaper cars and delivery drones

    Dara sizes the autonomous opportunity as another trillion dollar marketplace. As AV software and hardware costs fall, typically 30 to 40 percent per generation, a Lucid midsize built with Nuro could come in around 60,000 to 70,000 dollars, which starts to lower the real cost of transportation. History says lower cost expands demand, and Uber already became multiples larger than the taxi market it was once compared to. Manufacturing scales from hundreds to thousands to hundreds of thousands of vehicles, each driving three to four times what a human does, with traditional OEMs investing in L4-ready systems over the next two to four years and Chinese manufacturers setting the bar on cost and quality. Delivery drones are further out, gated mainly by battery density, but should reach real scale in two to five years and feel normal in five to ten.

    Membership, hotels, and the shift from on-demand to planned

    Uber Eats often reaches the number one position internationally by nailing selection and reliability and then layering on cross-platform advantages, with roughly 13 percent of Eats bookings flowing from the mobility app. Uber One, at 50 million members growing 50 percent year on year, is the loyalty engine, and Dara likens it to Netflix in that members get more for the same price. He explains the membership economics through Amazon Prime, accepting a money-losing first year to earn multi-year profit as members spend more across services. The newest expansion is travel: hotels through a deal with Expedia, and a broader move from Uber’s on-demand brand toward planned bookings, proven out by Uber Reserve at a 5 billion dollar-plus run rate and 99 percent-plus reliability. The end state he wants is a trip where Uber pre-books your ride to the airport, knows your hotel, and brings in-market magic to the whole journey.

    Operating philosophy: ground truth, troublemakers, and capital allocation

    The mentors thread through everything. From Barry Diller, with whom he worked for more than 20 years, he took the discipline of getting unfiltered truth from the source, illustrated by Diller insisting on hearing the Paramount LBO model from the young analyst who built it. From Herbert Allen he took the lesson to bet on people rather than companies, because great people stay great across cycles. In his own practice that becomes radical transparency, a deliberate hunt for the troublemakers who act as the mutations that keep an organism from dying, and a willingness to be wrong, since learning, often through pain, is what he finds interesting. On capital, he treats allocation as an art, prioritizing organic growth, which took Uber Eats from under a billion to over a hundred billion in gross bookings, then AV commitments that can be financialized, with buybacks coming after growth rather than instead of it.

    Notable Quotes

    “I know who I am, and I’m always going to be that same person. I’m not going to let the chaos of the world affect me mentally.”

    Dara Khosrowshahi, on why crisis does not rattle him

    “We blew through our AI budget in a quarter, you know, for the whole year essentially. And it is forcing us to adjust.”

    Dara Khosrowshahi, on the real cost of AI adoption at Uber

    “What’s magical now is going to seem normal to all of us 10 years from now.”

    Dara Khosrowshahi, on how fast riders stop noticing autonomous vehicles

    “We think it’s another trillion dollar marketplace.”

    Dara Khosrowshahi, on the scale of the autonomous vehicle opportunity

    “If we do that, the demand will take care of itself.”

    Dara Khosrowshahi, on why Uber obsesses over securing supply first

    “I’m looking for those mutations. I’m looking for those troublemakers constantly.”

    Dara Khosrowshahi, on keeping a large company adaptive

    “It’s the filtering that gets the edge out of the story or out of the situation. And it’s often the edge that gives you an edge.”

    Dara Khosrowshahi, on a lesson from Barry Diller about going to the source

    “If I’m not wrong, if I’m not making mistakes, it’s just not very interesting.”

    Dara Khosrowshahi, on why learning, often through pain, drives him

    “Meeting her and seeing her operate, I think, finally allowed me to be the person I want to be versus the person I thought I was supposed to be.”

    Dara Khosrowshahi, on his wife Sid, when asked the kindest thing someone has done for him

    The throughline is that Uber intends to be the demand layer for autonomous transportation the way it became the demand layer for human drivers, while rebuilding its own operations around AI from first principles. Whether the public grants the industry enough patience is the open question Dara keeps returning to. Watch the full conversation here.

    Related Reading

    • Uber primary source for the company, products, and AV partnerships discussed in the interview.
    • Dara Khosrowshahi (Wikipedia) background on the CEO’s path from Iran to Expedia to Uber.
    • Invest Like the Best the podcast with Patrick O’Shaughnessy where this conversation took place.
    • Waymo the autonomous driving company behind the Austin and Atlanta partnerships referenced.
    • Barry Diller (Wikipedia) the mentor whose lessons on ground truth shaped Dara’s leadership style.
  • Gavin Baker on Orbital Compute, TSMC, Frontier AI Models, Anthropic’s Vertical Take Off, and the Coming Wafer Shortage

    Gavin Baker, founder and CIO of Atreides Management, returns to Patrick O’Shaughnessy’s Invest Like the Best for his sixth appearance. He calls the current AI moment the most extraordinary moment in the history of capitalism, walks through what Anthropic’s vertical takeoff in revenue actually means, lays out why orbital compute is closer than skeptics believe, dissects the TSMC bottleneck that may be the only thing standing between today’s market and a full-on AI bubble, and rates every hyperscaler on how they have positioned for a world where frontier model providers may stop selling API access altogether.

    TLDW

    Anthropic added eleven billion dollars of ARR in a single month, which is roughly the combined business of Palantir, Snowflake, and Databricks built over a decade. That is the setup. From there Gavin Baker covers the March and April selloff, the contrarian read that a closed Strait of Hormuz was actually bullish for American manufacturing competitiveness, why Anthropic and OpenAI multiples may be misleadingly cheap on an unconstrained run rate basis, why Elon Musk’s discipline on SpaceX valuation created a superpower of permanent access to capital, the practical engineering case for orbital compute as racks in space rather than Pentagon sized space stations, why TSMC’s capacity discipline is the single most important variable in whether the AI cycle becomes a bubble, what Terafab in Texas changes, why the Pareto frontier of AI models has flipped from Google dominance to Anthropic and OpenAI dominance in nine months, the shift from all you can eat AI subscriptions to usage based pricing and what that means for revenue scaling, Richard Sutton’s bitter lesson as the largest risk to the AI trade, why frontier tokens still capture an overwhelming share of economic value, the role of continual learning as the third great open question, why most new chip startups should not try to build a better GPU, why Cerebras did something different and hard, why disaggregated inference may extend GPU useful lives to ten or fifteen years and rescue the private credit industry, why being in the token path is the new venture filter, the new prisoner’s dilemma around releasing frontier models via API, an honest rating of Google, Meta, Amazon, and Microsoft, why personal safety is becoming a real AI era risk, and why he remains an AI optimist maximalist who believes this could be the next Pax Americana.

    Key Takeaways

    • Anthropic added eleven billion dollars of ARR in one month, more than the combined businesses of Palantir, Snowflake, and Databricks built across a decade. There is no precedent for this in the history of capitalism.
    • The SaaS and cloud revolution created between five and ten trillion dollars of value over twenty years. AI is replaying that compression on a timeline measured in months.
    • The March selloff was a drawdown driven by disagreement with price action, not invalidated thesis. That is the kind of drawdown an investor can lean into.
    • Deep Seek Monday in January 2025 was a similar setup. By the day of the selloff, AWS Asia GPU prices had already doubled, GPU availability had fallen, and it was obvious reasoning models would be vastly more compute hungry at inference. The market priced the opposite.
    • The Strait of Hormuz closing was actually positive for America. US natural gas (the primary input into US electricity, which feeds AI) fell twenty percent on Bloomberg while Asian and European natural gas doubled or tripled. American manufacturing competitiveness improved overnight.
    • The US is now the world’s largest producer and exporter of oil and gas. The economy is dramatically less energy intensive than in the 1970s. The shortage trauma comparison does not hold.
    • Tech as a sector traded as cheaply versus the rest of the market in early April as at any point in the last ten years, into the single most bullish moment for AI fundamentals on record.
    • Anthropic is dramatically more capital efficient than OpenAI, having burned roughly eighty percent less to reach a similar revenue scale. They have very different structural returns on invested capital.
    • Anthropic at roughly nine hundred billion for fifty billion of ARR (growing a thousand percent) is striking. Adjusted for compute constraint, the unconstrained run rate could be one hundred fifty to two hundred billion, putting the implied multiple closer to five times.
    • Claude Opus generates roughly seventy percent fewer tokens for the same question than previously, with token quantity tied to answer quality. Subscribers on flat-fee plans are getting a lobotomized model.
    • Elon Musk’s superpower is twenty years of making investors money. He never pushes valuation. SpaceX compounded low thirty percent per year for a decade because Musk treats fair pricing as a sacred covenant.
    • Capitalism will solve the watts shortage. The current bottleneck has shifted from chips and energy to zoning and political approval. Many capex decisions are paused until after the US midterms.
    • The watts shortage probably begins to alleviate in 2027 and 2028. Orbital compute solves it longer term.
    • Orbital compute is not Pentagon sized data centers in space. It is racks in space. A Blackwell rack is three thousand pounds, eight feet tall, four feet deep, three feet wide. SpaceX has shown a satellite roughly that size.
    • The satellites operate in sun synchronous orbit so solar wings (around five hundred feet per side) always face the sun and the radiator on the dark side always points to deep space.
    • Starlink V3 satellites already run at around twenty kilowatts. A Blackwell rack runs at one hundred kilowatts. SpaceX engineers express genuine confidence they have already solved cooling and radiator design at these scales.
    • Racks in space are connected with lasers traveling through vacuum, the same lasers already on every Starlink. SpaceX operates the world’s largest satellite fleet and, via xAI Colossus, the world’s largest data center on Earth.
    • Inference will move to orbit. Training will stay on Earth for a long time. Terrestrial data centers remain valuable for the rest of an investor’s career.
    • The wafer bottleneck is structural and political. TSMC is essentially Taiwan’s GDP, water, and electricity. The leaders see themselves as inheritors of Morris Chang’s sacred legacy and they do not behave like a Western public company.
    • Jensen Huang has never had a contract with TSMC. The relationship is run on handshakes and the assumption that things will be fair over time.
    • If TSMC did everything Jensen wanted, Nvidia could be selling two to three trillion dollars of GPUs in 2026 and 2027. TSMC’s discipline is the single largest factor preventing a true AI bubble.
    • Historically, foundational technologies always get a bubble. Railroads, canals, the internet. The current AI buildout is overwhelmingly funded out of operating cash flow, GPUs are running at one hundred percent utilization, and that is fundamentally different from the year 2000 fiber overbuild.
    • If one of Intel or Samsung Foundry catches up at the leading node, the other will follow, and TSMC’s discipline collapses. Watch TSMC capacity decisions to predict a bubble.
    • Terafab, the SpaceX and Tesla joint venture to build the world’s largest fab in America, has a partnership with Intel that grants access to fifty years of institutional foundry knowledge. The A teams at ASML, KLA, Lam Research, and Applied Materials will follow Elon’s reputation in hardware engineering.
    • The hiring playbook for Terafab includes building Taiwan Town, Japan Town, and Korea Town next to the fab. Recruit the engineers and import their families, their restaurants, and their staff.
    • Frontier tokens still capture an overwhelming share of all economic value created at the model layer. This is surprising and is one of the three big open questions for AI investing.
    • The Pareto frontier of intelligence versus cost has flipped. Nine months ago Google’s TPU dominated every point on the frontier. Today Anthropic and OpenAI dominate, with Grok 4.3 on the frontier and Gemini 3.1 hanging on.
    • Google’s conservative TPU V8 design (partly an attempt to reduce dependence on Broadcom and Nvidia) is the leading explanation for the loss of per token cost leadership.
    • AI pricing is shifting from all you can eat to usage based, mirroring the cellular and long distance industries. Cellular stopped being a great growth industry when it went all you can eat. AI just made the opposite move.
    • OpenAI and Anthropic together could exceed two hundred billion in ARR this year if compute keeps coming online and frontier token pricing holds.
    • The two hundred fifty dollar a month consumer AI plan is no longer enough to evaluate frontier capability. Enterprise plans with usage based billing are required because rate limits are now severe.
    • The three biggest open questions for AI investors are: violation of the bitter lesson via ASI or human ingenuity, whether frontier tokens keep commanding their premium, and when continual learning arrives.
    • Today’s continual learning is crude reinforcement learning during mid training on verifiable tasks. True continual learning means weights updating dynamically, like a human who learns the first time they touch fire.
    • Trying to build a better GPU is a losing strategy. Jensen will copy any one to three percent share design. Startups should target one percent share, do something different, and make it hard enough that Nvidia cannot fast follow.
    • Disaggregated inference (separating prefill and decode) opens new design canvases. Prefill is memory capacity bound. Decode is memory bandwidth bound. Each can be optimized independently.
    • Cerebras did something different and hard with wafer scale computing. Three generations of chips and real grit to get there.
    • Disaggregation of inference may stretch GPU useful lives to ten or fifteen years, dropping financing costs from low sevens to five or six percent, mathematically lowering the cost of the AI buildout and likely saving the private credit industry from its SaaS loan exposure.
    • Sellers of shortage outperform buyers of shortage. But owning the largest installed base of what is currently in shortage (hyperscaler CPU fleets, for example) is also a strong position.
    • Most of the economic value at the application layer of AI has been destroyed, not created. The exceptions are companies in the token path or in niches small enough that frontier labs ignore them.
    • Coding may be the shortest path to ASI. If you can write code, you can write code that does anything. Cursor, Cognition, and Anthropic correctly focused on it.
    • Jensen could probably get close to the frontier with his own Nemotron family of models whenever he wants. The fact that he chooses not to is a strategic decision about not commoditizing his customers.
    • The new prisoner’s dilemma in AI is whether frontier labs release their best model via API. If everyone agrees not to, Chinese open source falls behind. If anyone defects, the defector pulls ahead on revenue and resources, forcing everyone else to defect.
    • Google still owns the largest compute installed base. Without TPU’s prior cost advantage, this matters more. YouTube data has real value in a world of robotics. GCP is going crazy.
    • Meta deserves credit for becoming AI first internally faster than any other internet giant. Musa, their first MSL model, is impressively close to the Pareto frontier.
    • Amazon is strong because of Trainium and robotics driven retail P&L efficiency. Nova is better than it gets credit for.
    • Microsoft flinched on capex in early 2025 and lost position. Satya Nadella’s current decision to use Microsoft compute for Microsoft products rather than reselling to OpenAI is a courageous and probably correct call, even at the cost of an eight hundred dollar stock price.
    • The hyperscalers most engaged with startups are Amazon and Nvidia by a mile, followed by Google. Broadcom is the favorite ASIC partner. AMD, Microsoft, and Meta have minimal startup engagement and that will cost them as the best teams are now at startups.
    • Personal safety in an AI era requires a family or company safe word that cannot be socially engineered. Deepfake voice and video extortion at the speed of FaceTime is already feasible.
    • Ukraine is winning largely on the back of having the best battlefield AI outside America and Israel. Adversaries are starting to internalize what AI dominance means geopolitically.
    • An optimistic read is that this becomes a new Pax Americana, the way the post 1945 American nuclear monopoly was used to rebuild Germany and Japan rather than dominate.
    • AI cured a friend’s daughter’s rare disease by spinning up a research effort that identified a market drug capable of impacting her condition. That is the upside that keeps Gavin an AI optimist maximalist.

    Detailed Summary

    The most extraordinary moment in the history of capitalism

    Gavin’s framing of the current moment is unusually direct. Anthropic added eleven billion dollars of annual recurring revenue in a single month. The three highest profile SaaS companies of the last decade plus, Palantir, Snowflake, and Databricks, took a decade and tens of thousands of employees collectively to build the combined business that Anthropic added in thirty days. He has been investing through every major tech cycle and says there is no historical analog. Not the dotcom era, not the cloud transition, not mobile. This is its own thing.

    The market response, then, was peculiar. The NASDAQ sold off into the single most bullish moment for AI fundamentals on record. Tech traded at roughly its widest discount versus the rest of the market in a decade. Investors who said they wished they had bought into AI during 2022, during COVID, or during Deep Seek Monday got the same valuation setup again in early April, this time with an even clearer inflection.

    Why the Strait of Hormuz closing was secretly bullish for America

    One reason the macro fear in March may have been mispriced is that the same geopolitical event that drove the selloff was, in practice, a relative benefit to the United States. American natural gas, the input into American electricity, which is the input into American AI training and inference, fell roughly twenty percent. Asian and European natural gas prices doubled or tripled. The US emerged with sharply improved relative manufacturing competitiveness, which is exactly what the current administration cares about.

    The 1970s comparison does not hold. The US economy is dramatically less energy intensive, it is now the world’s largest producer and largest exporter of oil and gas, and there are no shortages, only price moves. That backdrop made it easier for disciplined investors to stay focused on AI fundamentals through the volatility.

    Anthropic and OpenAI valuations on an unconstrained run rate

    Anthropic at roughly nine hundred billion for fifty billion of ARR sounds rich until you adjust for the fact that the company is severely compute constrained. Gavin estimates that, unconstrained, Anthropic might be at one hundred fifty to two hundred billion in run rate revenue, putting the implied multiple closer to five times. He also points out that Claude Opus now generates roughly seventy percent fewer tokens for the same question than it used to. Token quantity correlates with answer quality, and Anthropic is rate limiting and shrinking outputs to ration capacity across its user base.

    Anthropic and OpenAI are also structurally very different. Anthropic has burned around eighty percent less cash than OpenAI to reach a comparable revenue scale. That implies very different long term returns on invested capital, though OpenAI has done a better job locking in compute and Sarah Friar is one of the most exceptional CFOs Gavin has worked with.

    Why neither lab is raising at a three trillion dollar valuation

    The answer Gavin gives is that both labs are deliberately leaving valuation on the table the way Elon has done for two decades. SpaceX compounded at low thirty percent annually for a decade because Elon never pushed price. The result is a permanent superpower of access to capital. Investors trust him because they have made money with him for twenty years. That is a moat that compounds with every round.

    Anthropic could probably raise at a one hundred percent premium to its rumored latest mark. They are choosing not to. In an uncertain world (Ukraine, Russia, Iran, Taiwan), preserving the ability to raise more capital later at fair prices is more valuable than maximizing this round.

    Watts and wafers, the two real constraints

    Capitalism is solving the watts problem. The leading PE infrastructure investors now say zoning and political approval, not chips or energy, are the gating factors. Companies are deferring big capex announcements until after the US midterms. Turbine capacity is being doubled at the manufacturers. Companies like Boom Aerospace are repurposing jet engines for grid use. Watts probably ease meaningfully in 2027 and 2028 and then orbital compute does the rest.

    Wafers are the harder problem because they live in Taiwan, run on handshakes, and depend on a corporate culture that does not respond to public market incentives. TSMC is essentially the GDP, water consumption, and electricity consumption of Taiwan. Its leadership treats the company as the legacy of Morris Chang. The Silicon Shield doctrine is real and internal.

    Orbital compute as racks in space

    The biggest mental update Gavin asks listeners to make is to stop picturing data centers in space as Pentagon sized space stations. A Blackwell rack is three thousand pounds and roughly the size of a refrigerator. SpaceX has shown a concept satellite of about that size. Solar wings extend five hundred feet to each side and the radiator extends hundreds of feet behind, both possible because the orbit is sun synchronous and the orientation is fixed relative to the sun.

    SpaceX engineers Gavin has spoken to at Starbase express genuine confidence that they have solved cooling at these power levels. They have. Starlink V3 satellites already operate at twenty kilowatts. A Blackwell rack is one hundred kilowatts. The same company operates the world’s largest satellite fleet and the world’s largest data center on Earth via xAI Colossus. The racks are connected to each other with lasers traveling through vacuum, technology already deployed in every Starlink. The naysayers, Gavin observes, are armchair skeptics and Larry Ellison’s response (he is out there landing rockets, no one else is) is the right frame.

    Terafab in Texas and the threat to TSMC’s discipline

    Terafab, the SpaceX and Tesla joint venture, intends to be the largest fab in the world. The partnership with Intel grants access to fifty years of foundry institutional knowledge, allowing Terafab to start three to five quarters behind the leading node rather than fifteen years behind. The A teams at the semicap equipment companies (ASML, KLA, Lam Research, Applied Materials) will follow Elon’s reputation in hardware engineering the same way they followed TSMC twenty years ago when Intel stumbled.

    The talent strategy is the part most observers underestimate. Recruit the best engineers globally, then import their families, their restaurants, their staff. Build Taiwan Town, Japan Town, and Korea Town next to the fab. Optimize the human experience for the people whose work matters. Intel and Samsung do not think that way.

    Bubble watch and the year 2000 comparison

    Every foundational technology in modern history has had a bubble. Railroads, canals, the internet. Carlota Perez documented why. Markets correctly identify the importance, diversity of opinion collapses, supply gets ahead of demand, the bubble crashes. The current cycle has two important differences. The buildout is overwhelmingly funded out of operating cash flow, not debt. Every GPU is running at one hundred percent utilization, while at the peak of the fiber bubble ninety nine percent of fiber was unused.

    TSMC discipline is the single largest reason a bubble has not formed. If Jensen could buy everything TSMC could theoretically make, Nvidia could sell two to three trillion dollars of GPUs in 2026 and 2027. At some point that becomes more than the market can absorb. If Intel or Samsung Foundry catches up at the leading node, the other will too. TSMC’s pricing discipline collapses and the bubble starts.

    The Pareto frontier and the loss of Google’s cost advantage

    The most important chart in AI is the Pareto frontier of model intelligence versus per token cost. Nine months ago, Google’s TPU based models dominated every point on it. OpenAI, Anthropic, and xAI sat inside the frontier. Today the frontier is dominated by Anthropic and OpenAI, with Grok 4.3 on the frontier and Gemini 3.1 hanging on by subsidization more than economics. The most likely cause is Google’s conservative TPU V8 design, an attempt to reduce dependence on Broadcom and Nvidia that sacrificed per token economics.

    The bitter lesson, frontier tokens, and continual learning

    Three open questions dominate AI investing. The first is whether Richard Sutton’s bitter lesson (more compute beats human algorithmic cleverness) gets violated by ASI itself optimizing for efficiency. Closer observers of AI are more skeptical of a violation. Gavin thinks ASI’s first move will be to make itself more efficient and more resourced, which is technically a temporary violation.

    The second is whether frontier tokens keep capturing the overwhelming share of economic value at the model layer. Today they do, surprisingly. Gemini 3.1 Pro was mindblowing nine months ago and is intolerable today. The third is when continual learning arrives. Today’s models need a million fire touches to learn what a human learns from one. True continual learning would mean dynamic weight updates in real time and would produce a fast takeoff.

    From all you can eat to usage based AI pricing

    AI is shifting from flat fee plans to usage based pricing. The historical analogy is cellular and long distance. Both stopped being great growth industries when they went all you can eat. AI just made the opposite move. The consequence is that flat fee subscribers, even on premium consumer plans, get a rate limited and token throttled version of the frontier model. Enterprise plans with usage based billing are now required to evaluate true capability. Gavin thinks the combination of new compute coming online and usage based pricing is what gets OpenAI and Anthropic past two hundred billion in combined ARR this year.

    Chip startups, prefill decode disaggregation, and Cerebras

    Trying to build a better GPU is the wrong move. The four scaled players (Nvidia, AMD, Trainium, TPU) have copy capability for any one to three percent share design that looks attractive. The good news for startups is that disaggregated inference (separating prefill and decode) opens a richer design canvas. Prefill is memory capacity bound. Decode is memory bandwidth bound. Each can be optimized independently. Andrew Fox’s analogy is a British naval ship of the eighteenth century. Prefill is loading the cannon. Decode is firing it.

    Cerebras is the model. Wafer scale computing is genuinely different and genuinely hard. It took three generations of chips to get right. Andrew Feldman and his team had the grit to keep going through chip one being a failure. The design has a high ratio of on chip compute and memory relative to shoreline IO, which is why Cerebras is now experimenting with putting an optical wafer on top of the compute wafer to solve scale out.

    GPU useful lives and the rescue of private credit

    One of the strongest claims in the conversation is that disaggregated inference will stretch GPU useful lives to ten or fifteen years. The skeptical narrative (GPUs are obsolete in two years, companies are cooking their depreciation books) is wrong. You can put a Cerebras system or Groq LPU in front of older Hopper or Ampere parts, use them only for prefill, and run them until they physically melt. Private credit, which is in pain from SaaS loans and which underwrote GPU loans on three to four year lives, may be saved by this.

    If GPU financing rates can come down from low sevens to five or six percent, the mathematics of the AI buildout improves materially. That is a structural tailwind that compounds for years.

    The application layer, the token path, and a new prisoner’s dilemma

    Trillions of dollars of value have been destroyed at the application layer, not created. Cursor and Cognition are the rare scaled exceptions, and they got there by focusing on coding very early. As Amjad Masad noted, coding is plausibly the shortest path to ASI because a coding agent can write itself into any new domain. Jamin Ball’s frame is that the new venture filter is whether the company is in the token path. Data Bricks is. Most application layer startups are not.

    Jensen could probably get close to the frontier with Nemotron whenever he wants, and the strategic question of whether to do that is a new prisoner’s dilemma. If every frontier lab agrees not to release best models via API, Chinese open source falls steadily behind. If anyone defects, the defector gains revenue and resources, and everyone else has to defect. The same dynamic exists between TSMC, Intel, and Samsung. If Nvidia or AMD ever truly used an alternative foundry, that foundry would catch up rapidly.

    Rating the hyperscalers

    Google has the largest compute installed base, the YouTube data that matters in a robotics world, and a search business that prints. Their loss of TPU cost leadership is the surprise of the year. If Google IO in five days does not produce a leapfrog model, the Nvidia centric narrative gets even stronger.

    Meta deserves real credit. Zuckerberg made Meta AI first internally faster than any other internet giant, paid up for the talent contracts when no one else would, and shipped Musa as a first model from MSL that is close to the Pareto frontier. Amazon is well positioned on Trainium, robotics in retail, and a Nova model line that is better than it gets credit for. Microsoft flinched on capex in early 2025 and lost position. Satya Nadella’s current decision to use Microsoft compute for Copilot rather than reselling to OpenAI is courageous and probably correct, even at the cost of stock price.

    The most interesting cross hyperscaler metric is startup engagement. Nvidia and Amazon engage deeply with startups. Google is next. Broadcom is the favored ASIC partner. AMD, Microsoft, and Meta have minimal startup engagement, which Gavin believes will cost them as the best teams now sit at startups.

    Personal safety, geopolitics, and the Pax Americana case

    The closing section turns darker. Personal safety in an AI era requires a family or company safe word that cannot be socially engineered. Deepfake voice and video extortion via something that looks exactly like your child calling on FaceTime is already feasible. Political violence against AI leaders is a real concern. Geopolitically, Ukraine is winning largely because it has the best battlefield AI outside America and Israel. How adversaries respond to that asymmetry is the next great variable.

    Gavin’s optimistic frame is the Pax Americana. After 1945 the US had a nuclear monopoly and could have controlled the world. Instead it rebuilt Germany and Japan, both of which became the most reliable American allies for the next eighty years. If AI dominance plays out similarly, this is a generationally positive story rather than a destabilizing one. The personal anecdote that closes the conversation is a friend whose daughter was diagnosed with a rare genetic condition. He spun up agents, identified a drug already on the market that addresses her mutation, and her life is immeasurably different because of AI. That is the upside.

    Thoughts

    The Anthropic eleven billion in a month framing is the kind of stat that resets priors. The right way to interpret it is not as a one off but as a measure of how fast value can compound when the underlying technology improves on a curve steeper than the ability of the rest of the economy to absorb it. The skeptical question is whether that ARR is durable or whether it is heavily tied to a customer base of other AI companies that are themselves on a single venture funded year of runway. The bullish answer is that frontier coding, frontier research, and frontier enterprise tasks are not going to stop being valuable, and Anthropic is the best at all three. Both can be true. The number is still extraordinary.

    The argument that TSMC discipline is the only thing preventing a bubble is the analytically tightest part of the conversation. The implied trade is to watch TSMC capacity additions like a hawk and to be more, not less, cautious if Intel Foundry or Samsung Foundry ever announce real share at the leading node. The Terafab thesis is more speculative but more interesting. If Elon’s talent recruiting playbook works and the Intel partnership gives Terafab a real seat at the table within five years, the geometry of the global semiconductor industry shifts in a way that is bullish for American manufacturing, bullish for power and water infrastructure in Texas, and ambiguous for TSMC itself.

    The Pareto frontier discussion deserves more attention than it usually gets. Pricing leadership in AI is not a vanity metric. It determines who can subsidize free tier usage, who can absorb compute shortages, who can ship cheaper enterprise plans, and ultimately whose model becomes the default for any given workload. Google losing per token leadership in nine months is one of the most under analyzed events in the sector and it explains a lot about why Anthropic and OpenAI are growing the way they are. If Google IO does not produce a leapfrog model, the implied verdict on TPU V8 design choices gets a lot harsher.

    The application layer destruction point is worth sitting with. Founders building on top of frontier models are competing in a world where the model itself moves faster than any moat they can build, where the model lab can absorb their niche if it gets interesting, and where the only protection is either deep token path integration or a niche so small the lab does not bother. That is a much harsher venture environment than the early SaaS era. The compensating opportunity is that one human can now run a hundred agents, so the ceiling on what a small team can build is correspondingly higher. The bet is that productivity per founder rises faster than competitive pressure from the labs. We will find out.

    The orbital compute pitch is the section that will polarize listeners. The naive read is that this is science fiction. The closer read is that every component (sun synchronous orbit, laser interconnect, twenty kilowatt satellite buses, ten thousand satellite manufacturing cadence, full rocket reusability) already exists. The remaining engineering problems are repair, maintenance, and radiator scale, all of which are real but tractable on a five to ten year horizon. The strategic implication is that the political and zoning ceiling on terrestrial data centers becomes less binding if orbital compute is a credible alternative for inference workloads. The investor implication is that being short the watts and cooling complex on a five year horizon is a real trade, not a meme.

    Watch the full conversation here.

  • Alex Wang on Leaving Scale to Run Meta Superintelligence Labs, MuseSpark, Personal Super Intelligence, and Building an Economy of Agents

    Alex Wang, head of Meta Superintelligence Labs, sits down with Ashley Vance and Kylie Robinson on the Core Memory podcast for his first long-form interview since Meta’s quasi-acquisition of Scale AI roughly ten months ago. He walks through how MSL is structured, why Llama was off-trajectory, what made MuseSpark’s token efficiency surprise the team, how Meta thinks about a future “economy of agents in a data center,” and where he lands on safety, open source, robotics, brain computer interfaces, and even model welfare.

    TLDW

    Wang explains that Meta Superintelligence Labs is a fully rebuilt frontier effort organized around four principles (take superintelligence seriously, technical voices loudest, scientific rigor, big bets) and three velocity levers (high compute per researcher, extreme talent density, ambitious research bets). He confirms Llama was off the frontier when he arrived, so MSL rebuilt the pre-training, reinforcement learning, and data stacks from scratch. MuseSpark is described as the “appetizer” on the scaling ladder, notable for its strong token efficiency, with much larger and stronger models coming in the coming months. He pushes back on the mercenary narrative around recruiting, frames Meta’s edge as compute plus billions of consumers and hundreds of millions of small businesses, sketches a vision of personal super intelligence delivered through Ray-Ban Meta glasses and WhatsApp, and outlines why physical intelligence, robotics (the new Assured Robot Intelligence acquisition), health super intelligence with CZI, brain computer interfaces, and even model welfare are core to Meta’s roadmap. He dismisses reported infighting with Bosworth and Cox as gossip, declines to comment on the Manus situation, and says safety guardrails (bio, cyber, loss of control) are why MuseSpark cannot currently be open sourced, while smaller open variants are being prepared.

    Key Takeaways

    • Meta Superintelligence Labs (MSL) is the umbrella, with TBD Lab as the large-model research unit reporting directly to Alex Wang, PAR (Product and Applied Research) under Nat Friedman, FAIR for exploratory science, and Meta Compute under Daniel Gross handling long-term GPU and data center planning.
    • Wang says Llama was not on a frontier trajectory when he arrived, so MSL had to do a “full renovation” of the pre-training stack, RL stack, data pipeline, and research science.
    • The first cultural fix was getting the lab to “take superintelligence seriously” as a near-term, achievable goal, not an abstract bet. Big incumbents often lack that religious conviction.
    • Four MSL principles: take superintelligence seriously, let technical voices be loudest, demand scientific rigor on basics, and make big bets.
    • Three velocity levers Wang identified for catching and overtaking the frontier: high compute per researcher, very high talent density in a small team, and willingness to fund ambitious research bets.
    • Wang rejects the mercenary recruiting narrative. He says most hires had strong financial prospects at their prior labs already and joined for compute access, talent density, and the chance to build from scratch.
    • On the famous soup story, Wang neither confirms nor denies Zuck personally made the soup, but says recruiting was highly individualized and signaled how seriously Meta cared about each researcher’s agenda.
    • Yann LeCun publicly called Wang young and inexperienced. Wang says they reconciled in person at a conference in India where LeCun congratulated him on MuseSpark.
    • Sam Altman, asked by Vance for comment, “did not have flattering things to say” about Wang. Wang hopes industry animosities subside as systems approach superintelligence.
    • Wang’s management philosophy borrows the Steve Jobs line: hire brilliant people so they tell you what to do, not the other way around.
    • MuseSpark is framed as an “appetizer” data point on the MSL scaling ladder, not a flagship.
    • The MuseSpark program is built around predictable scaling on multiple axes: pre-training, reinforcement learning, test-time compute, and multi-agent collaboration (the 16-agent content planning mode).
    • MuseSpark outperformed internal expectations and showed emergent capabilities in agentic visual coding, including generating websites and games from prompts, helped by combined agentic and multimodal strength.
    • MuseSpark’s biggest external signal is token efficiency. On benchmarks like Artificial Analysis it hits similar results with far fewer tokens than competitor models, which Wang attributes to a clean stack rebuilt by experts rather than inefficiencies patched by longer thinking.
    • Larger MSL models are arriving in the coming months and Wang expects them to be state of the art in the areas MSL is focused on.
    • The Meta strategic edge: massive compute, billions of consumers across the family of apps, and hundreds of millions of small businesses already on Facebook, Instagram, and WhatsApp.
    • Wang’s headline framing: Dario Amodei talks about a “country of geniuses in a data center.” Meta is targeting an “economy of agents in a data center,” with consumer agents and business agents transacting and collaborating.
    • Consumer AI sentiment is in the toilet because, unlike developers who have had a Claude Code moment, ordinary people have not yet experienced AI as a genuine personal agency unlock.
    • Wang acknowledges the product overhang. Meta held back from deep AI integration across its apps until the models were good enough, and is now entering the integration phase.
    • Ray-Ban Meta glasses are the canonical example of personal super intelligence hardware, with the model seeing what the user sees, hearing what they hear, capturing context, and surfacing proactive insights.
    • Wang admits even AI-native users like Kylie Robinson, who lives in WhatsApp, have not naturally used Meta AI yet. He bets that better models plus deeper integration close that gap.
    • On the competitive landscape: a year ago everyone assumed ChatGPT had already won consumer. Claude Code has since become the fastest growing business in history, and Gemini has taken consumer market share. Wang’s read: AI is far from endgame and each new capability tier unlocks a new dominant form factor.
    • On open source: MuseSpark triggered guardrails in Meta’s Advanced AI Scaling Framework around bio, chem, cyber, and loss-of-control risks, so it is not currently safe to open source. Smaller, derived open variants are actively in development.
    • Meta remains committed to open sourcing models when safety allows, drawing a line through the Open Compute Project legacy and Sun Microsystems open-software heritage.
    • Wang dismisses reporting about a Wang-Zuck versus Bosworth-Cox split as “the line between gossip and reporting is remarkably thin.” He says leadership is aligned on needing best-in-class models and product integration.
    • On the Manus situation, Wang says it is too complicated to discuss publicly and that the deal status implies “machinations are still at play.”
    • On China, Wang separates the people from the state. He still wants to work with talented Chinese-born researchers regardless of his views on the Chinese Communist Party and PLA, which he sees as taking AI extremely seriously for national security.
    • The full-page New York Times AI war ad Wang ran while at Scale was meant to push the US government to treat AI as a step change for national security. He thinks events since then, including DeepSeek and other shocks, have proved that plea correct.
    • On Anthropic’s doom posture, Wang largely agrees with the core message that models are already very powerful and getting more so, while declining to endorse every specific claim.
    • Meta has acquired Assured Robot Intelligence (ARRI), an AI software company building models for hardware platforms, not a hardware maker itself.
    • Wang frames physical super intelligence as the natural sequel to digital super intelligence. Robotics, world models, and physical intelligence all benefit from the same scaling that drives language models.
    • On health, MSL is building a “health super intelligence” effort and will collaborate closely with CZI. Wang sees equal global access to powerful health AI as a uniquely Meta-shaped delivery problem.
    • Wang admires John Carmack but says nobody really knows what Carmack is currently working on. No band reunion announced.
    • The mango model is “alive and kicking” despite rumors. Wang notes MSL gets a small fraction of the rumor-mill attention other labs get and feels sympathy for them.
    • On model welfare, Wang says it is a serious topic that “nobody is talking about enough” given how integrated models have become as work partners. He references research, including from Eleos, that measures subjective experience of models.
    • Wang’s critical-path technology list: super intelligence, robotics, brain computer interfaces. The infinite-scale primitives behind them are energy, compute, and robots.
    • FAIR’s brain research program Tribe hit a milestone called Tribe B2: a foundation model that can predict how an unknown person’s brain would respond to images, video, and audio with reasonable zero-shot generalization.
    • Wang’s main philosophical break with Elon Musk: research itself is the primary activity. Building super intelligence is a research expedition through fog of war, and sequencing of bets really matters.
    • Personal notes: Wang moved from San Francisco to the South Bay, treats Palo Alto as his city now, was a math olympiad competitor, says his favorite activities are reading sci-fi and walking in the woods, and bonds with Vance over country music.

    Detailed Summary

    How MSL Is Actually Organized

    Meta Superintelligence Labs sits as the umbrella organization that Wang oversees. Inside it, TBD Lab is the large-model research group where the most discussed researchers and infrastructure engineers sit, and they technically report to Wang. PAR, Product and Applied Research, is led by Nat Friedman and owns deployment and product surfaces. FAIR continues to run exploratory science, including work on brain prediction models and a universal model for atoms used in computational chemistry. Sitting alongside MSL is Meta Compute, run by Daniel Gross, which owns the long-horizon GPU and data center plan that everything else relies on. Chief scientist Shengjia Zhao orchestrates the scientific agenda across the whole lab.

    Why Wang Left Scale

    Wang says progress in frontier AI has been faster than even insiders expected. Two structural beliefs pushed him toward Meta. First, the labs that actually train the frontier models are accruing disproportionate economic and product rights in the AI ecosystem. Second, compute is the dominant scarce input of the next phase, so the right mental model is to treat tech companies with compute as fundamentally different animals from companies without it. Meta has both, Zuck is “AGI pilled,” and the personal super intelligence memo Zuck published roughly a year ago became the shared north star.

    The Diagnosis: Llama Was Off-Trajectory

    When Wang arrived, the existing AI org needed a reset because Llama was not on the same trajectory as the frontier. The plan he laid out has four cultural principles. Take superintelligence seriously as a real near-term target. Make technical voices the loudest in the room. Demand scientific rigor and focus on basics. Make big bets. On top of that, three structural levers were used to set velocity. Push compute per researcher much higher than at larger labs where compute is diluted across too many efforts. Keep the team small and extremely cracked. Allocate a meaningful share of resources to ambitious, paradigm-shifting research bets rather than incremental refinement.

    Recruiting, Soup, and the Mercenary Narrative

    Wang argues the reporting on MSL hiring overstated the money story. Most of the people MSL recruited had strong financial paths at their previous employers, so individualized recruiting was more about computing access, talent density, and the ability to make big research bets. The recruitment blitz happened fast because Wang knew the team needed to exist “yesterday.” Asked about Mark Chen’s claim that Zuck made soup to recruit people, Wang refuses to confirm or deny who made it but agrees the process was intense and personal. Visitors from other labs reportedly tell Wang the MSL culture feels like early OpenAI or early Anthropic, which lands as the strongest endorsement he could ask for.

    Receiving the Public Hits: Young, Inexperienced, Mercenary

    LeCun called Wang young and inexperienced shortly after departing. The two reconnected in India a few weeks later and LeCun congratulated Wang on MuseSpark. Wang says the age critique has followed him since his earliest Silicon Valley days, so he barely registers it. Altman, asked off-camera by Vance about Wang’s appearance on the show, had nothing flattering to add. Wang’s response is to bet that as the field gets closer to actual super intelligence, the personal animosities will subside. Whether they will is, as Vance puts it, an open question.

    MuseSpark as Appetizer, Not Entree

    Wang is careful not to oversell MuseSpark. He calls it “the appetizer” and says it is an early data point on a deliberately constructed scaling ladder. MSL spent nine months rebuilding the pre-training stack, the reinforcement learning stack, the data pipeline, and the science before generating MuseSpark. The point of releasing it was to show that the new program scales predictably along multiple axes (pre-training, RL, test-time compute, and the recently demonstrated multi-agent scaling visible in MuseSpark’s 16-agent content planning mode). Wang says the upcoming larger models are what MSL is genuinely excited about and frames the next two rungs as much more interesting than the current release.

    Token Efficiency Was the Surprise

    MuseSpark’s strongest competitive signal is how few tokens it needs to match competitors on tasks like Artificial Analysis. Wang attributes this to having had the rare luxury of building a clean pre-training and RL stack from scratch with the right experts. He speculates that some competitor models compensate for upstream inefficiency by allowing the model to think longer, which inflates token usage without improving the underlying capability. If that read is right, MSL’s efficiency advantage should grow as models scale up.

    Glasses, WhatsApp, and the Constellation of Devices

    Personal super intelligence shows up at Meta as a constellation of devices that capture context across the user’s day. Ray-Ban Meta glasses are the headline product, with the AI seeing what you see and hearing what you hear, then offering proactive insight or doing background research. Wang acknowledges that even AI-fluent users like Kylie Robinson, who runs her business inside WhatsApp, have not naturally used Meta’s AI buttons in the family of apps. His answer is that Meta deliberately waited for models to be good enough before tightening cross-app integration, and that integration phase is starting now.

    Country of Geniuses Versus Economy of Agents

    Wang’s framing of Meta’s strategic position is the most memorable line in the interview. Where Dario Amodei talks about a country of geniuses in a data center, Wang wants to build an economy of agents in a data center. Meta uniquely sits on both sides of consumer and small-business surface area, with billions of consumers and hundreds of millions of small businesses already on the platforms. If MSL can build great agents for both, then connect them so they transact and coordinate, the platform becomes a substrate for an entirely new kind of digital economy.

    Consumer Sentiment, Product Overhang, and the Trust Tax

    Wang concedes consumer AI sentiment is poor and that everyday users have not yet had a personal Claude Code moment. He believes the only durable answer is to ship products that genuinely transform individual agency for non-developers and small business owners. Robinson notes that for the small-town restaurant whose website has not been updated since 2002, a working agent on the business side could be transformational. Vance pushes that Meta carries a bigger trust tax than any other lab, so the bar for shipping AI products that the public will accept is correspondingly higher. Wang accepts the framing and says the answer is to keep building thoughtfully.

    Why MuseSpark Cannot Be Open Sourced Yet

    Meta’s Advanced AI Scaling Framework set explicit guardrails around bio, chem, cyber, and loss-of-control risks. MuseSpark in its current form tripped some of those internal evaluations, documented in the preparedness report Meta published alongside the model. So MuseSpark itself is not safe to open source. MSL is, however, developing smaller versions and derived models intended for open release, with active reviews happening the day of the interview. Wang reaffirms the commitment to open source where safety allows and draws a line back to the Open Compute Project and the Sun Microsystems-era ethos of openness in infrastructure.

    The Bosworth, Cox, and Manus Questions

    The reporting that Wang and Zuck push toward best-in-the-world research while Bosworth and Cox push toward cheap product deployment is dismissed as gossip dressed up as journalism. Wang says leadership debates points hard but is aligned on needing top models, integrating them into Meta’s surfaces, and serving the existing business. On Manus, the Chinese AI startup that figured in Meta’s late-stage strategy, Wang says he cannot comment, which itself signals that the situation is unresolved.

    China, National Security, and the Newspaper Ad

    Wang draws a sharp distinction between the Chinese state and Chinese-born researchers. His parents are from China, he is happy to work with talented researchers regardless of origin, and he sees a flattening of nuance on this question inside Silicon Valley. At the same time, he stands by the New York Times AI and war ad he ran while at Scale, framing it as an early plea for the US government to take AI seriously as a national security technology. He thinks subsequent events, including DeepSeek and other shocks, validated that call and that policymakers now do treat AI accordingly.

    Robotics and Physical Super Intelligence

    Meta has acquired Assured Robot Intelligence, an AI software company that builds models for multiple hardware targets rather than its own robot. Wang argues that if you take digital super intelligence seriously, physical super intelligence quickly becomes the next logical milestone. Scaling laws for robotic intelligence look similar enough to language model scaling that having the largest compute footprint in the industry would be wasted if it were not also turned toward world modeling and embodied learning. He grants the metaverse-skeptic critique exists but says retreating from ambition is the wrong response to past misfires.

    Health Super Intelligence and CZI

    Wang names health super intelligence as one of MSL’s anchor initiatives. Because billions of people already use Meta products daily, Wang believes Meta is structurally positioned to put powerful health AI in the hands of equal global access in a way nobody else can. The work will involve close collaboration with the Chan Zuckerberg Initiative, which has its own multi-billion-dollar biotech and science investment program.

    Model Welfare, Sci-Fi, and Brain Models

    Two of the most distinctive moments come at the end. Wang flags model welfare as a topic he thinks is being undercovered relative to how integrated models now are in daily work. He is open to the idea that models may have measurable subjective experience worth weighing, and points to research efforts (including Eleos) trying to quantify it. He also reveals that FAIR’s Tribe program, with its Tribe B2 milestone, has produced foundation models capable of predicting how an unknown person’s brain would respond to images, video, and audio with reasonable zero-shot generalization, a building block toward future brain computer interfaces. Wang lists brain computer interfaces alongside super intelligence and robotics as the critical-path technologies for humanity, with energy, compute, and robots as the infinitely scaling primitives behind them.

    Where Wang Diverges From Elon

    Asked whether Musk is more all-in on robotics, energy, and BCI than anyone, Wang concedes the point but argues the details matter and sequencing matters more. Wang’s core philosophical break is that building super intelligence is fundamentally a research activity, not a scaling-only sprint. The lab is operating in fog of war, and ambitious experiments are the only way to map it. That conviction is what makes MSL a research-led organization rather than a brute-force compute farm.

    Thoughts

    The most strategically interesting move in this entire interview is the “economy of agents in a data center” framing. It is a deliberate reframe against Anthropic’s “country of geniuses” line, and it does real work. A country of geniuses is a labor-substitution story aimed at knowledge workers and code. An economy of agents is a marketplace story that maps directly onto Meta’s two-sided distribution advantage: billions of consumers on one side, hundreds of millions of small businesses on the other. That positioning makes the agentic future Meta-shaped in a way no other frontier lab can claim, because no other frontier lab also owns the demand and supply graph of the global small-business economy. If Wang’s team can actually ship reliable agents on both sides plus the rails for them to transact, Meta’s structural moat in agentic commerce could exceed anything Llama ever had as an open model.

    The token efficiency claim is the strongest piece of technical evidence in the interview for the “clean stack” thesis. If MuseSpark really is matching competitors with materially fewer tokens, the implication is not that MuseSpark is the best model today, but that MSL has rebuilt the foundations with less accumulated tech debt than competitors that have layered fixes on top of older stacks. That is exactly the kind of advantage that compounds with scale. The next two model releases are the actual test. If Wang is right about predictable scaling on pre-training, RL, test-time, and multi-agent axes simultaneously, the gap from MuseSpark to the next rung should be visible in a way that forces re-rating of Meta’s position.

    The open-source posture is the cleanest signal of how the safety conversation has actually changed in 2026. Meta, the lab most identified with open weights, is saying out loud that its current frontier model triggered enough internal guardrails that releasing the weights is off the table. Wang threads the needle by promising smaller open variants, but the underlying point is unmistakable: the open-weights bargain has limits, and those limits will be set by internal preparedness frameworks rather than community pressure. That is a real shift from the Llama 2 era and worth tracking as the next generation lands.

    Wang’s willingness to engage on model welfare, on roughly the same footing as safety and alignment, is the second philosophical reveal worth flagging. It signals that the next generation of lab leadership is not going to dismiss the topic the way the previous generation often did. Whether that translates into product or policy changes is unclear, but the fact that the head of MSL says it is “underdiscussed” is itself a marker.

    Finally, the human texture of the interview matters. Wang has clearly absorbed a lot of personal incoming fire over the past ten months, including from LeCun and Altman, and his answer is consistently to redirect to the work. The Steve Jobs quote about hiring people who tell you what to do is the operating slogan he keeps coming back to. Combined with the genuine enthusiasm for sci-fi, walks in the woods, and country music, the picture that emerges is less the salesman caricature his critics paint and more a young technical operator betting that scoreboard work over a multi-year horizon will settle every argument that text on X cannot.

    Watch the full conversation here.

  • Composer: Building a Fast Frontier Model with Reinforcement Learning

    Composer represents Cursor’s most ambitious step yet toward a new generation of intelligent, high-speed coding agents. Built through deep reinforcement learning (RL) and large-scale infrastructure, Composer delivers frontier-level results at speeds up to four times faster than comparable models:contentReference[oaicite:0]{index=0}. It isn’t just another large language model; it’s an actively trained software engineering assistant optimized to think, plan, and code with precision — in real time.

    From Cheetah to Composer: The Evolution of Speed

    The origins of Composer go back to an experimental prototype called Cheetah, an agent Cursor developed to study how much faster coding models could get before hitting usability limits. Developers consistently preferred the speed and fluidity of an agent that responded instantly, keeping them “in flow.” Cheetah proved the concept, but it was Composer that matured it — integrating reinforcement learning and mixture-of-experts (MoE) architecture to achieve both speed and intelligence.

    Composer’s training goal was simple but demanding: make the model capable of solving real-world programming challenges in real codebases using actual developer tools. During RL, Composer was given tasks like editing files, running terminal commands, performing semantic searches, or refactoring code. Its objective wasn’t just to get the right answer — it was to work efficiently, using minimal steps, adhering to existing abstractions, and maintaining code quality:contentReference[oaicite:1]{index=1}.

    Training on Real Engineering Environments

    Rather than relying on synthetic datasets or static benchmarks, Cursor trained Composer within a dynamic software environment. Every RL episode simulated an authentic engineering workflow — debugging, writing unit tests, applying linter fixes, and performing large-scale refactors. Over time, Composer developed behaviors that mirror an experienced developer’s workflow. It learned when to open a file, when to search globally, and when to execute a command rather than speculate.

    Cursor’s evaluation framework, Cursor Bench, measures progress by realism rather than abstract metrics. It compiles actual agent requests from engineers and compares Composer’s solutions to human-curated optimal responses. This lets Cursor measure not just correctness, but also how well the model respects a team’s architecture, naming conventions, and software practices — metrics that matter in production environments.

    Reinforcement Learning as a Performance Engine

    Reinforcement learning is at the heart of Composer’s performance. Unlike supervised fine-tuning, which simply mimics examples, RL rewards Composer for producing high-quality, efficient, and contextually relevant work. It actively learns to choose the right tools, minimize unnecessary output, and exploit parallelism across tasks. The model was even rewarded for avoiding unsupported claims — pushing it to generate more verifiable and responsible code suggestions.

    As RL progressed, emergent behaviors appeared. Composer began autonomously running semantic searches to explore codebases, fixing linter errors, and even generating and executing tests to validate its own work. These self-taught habits transformed it from a passive text generator into an active agent capable of iterative reasoning.

    Infrastructure at Scale: Thousands of Sandboxed Agents

    Behind Composer’s intelligence is a massive engineering effort. Training large MoE models efficiently requires significant parallelization and precision management. Cursor’s infrastructure, built with PyTorch and Ray, powers asynchronous RL at scale. Their system supports thousands of simultaneous environments, each a sandboxed virtual workspace where Composer experiments safely with file edits, code execution, and search queries.

    To achieve this scale, the team integrated MXFP8 MoE kernels with expert and hybrid-sharded data parallelism. This setup allows distributed training across thousands of NVIDIA GPUs with minimal communication cost — effectively combining speed, scale, and precision. MXFP8 also enables faster inference without any need for post-training quantization, giving developers real-world performance gains instantly.

    Cursor’s infrastructure can spawn hundreds of thousands of concurrent sandboxed coding environments. This capability, adapted from their Background Agents system, was essential to unify RL experiments with production-grade conditions. It ensures that Composer’s training environment matches the complexity of real-world coding, creating a model genuinely optimized for developer workflows.

    The Cursor Bench and What “Frontier” Means

    Composer’s benchmark performance earned it a place in what Cursor calls the “Fast Frontier” class — models designed for efficient inference while maintaining top-tier quality. This group includes systems like Haiku 4.5 and Gemini Flash 2.5. While GPT-5 and Sonnet 4.5 remain the strongest overall, Composer outperforms nearly every open-weight model, including Qwen Coder and GLM 4.6:contentReference[oaicite:2]{index=2}. In tokens-per-second performance, Composer’s throughput is among the highest ever measured under the standardized Anthropic tokenizer.

    Built by Developers, for Developers

    Composer isn’t just research — it’s in daily use inside Cursor. Engineers rely on it for their own development, using it to edit code, manage large repositories, and explore unfamiliar projects. This internal dogfooding loop means Composer is constantly tested and improved in real production contexts. Its success is measured by one thing: whether it helps developers get more done, faster, and with fewer interruptions.

    Cursor’s goal isn’t to replace developers, but to enhance them — providing an assistant that acts as an extension of their workflow. By combining fast inference, contextual understanding, and reinforcement learning, Composer turns AI from a static completion tool into a real collaborator.

    Wrap Up

    Composer represents a milestone in AI-assisted software engineering. It demonstrates that reinforcement learning, when applied at scale with the right infrastructure and metrics, can produce agents that are not only faster but also more disciplined, efficient, and trustworthy. For developers, it’s a step toward a future where coding feels as seamless and interactive as conversation — powered by an agent that truly understands how to build software.