PJFP.com

Pursuit of Joy, Fulfillment, and Purpose

Tag: talent density

  • Paul Graham in Stockholm on Why Founders Should Go to Silicon Valley and How Sweden Can Become the Silicon Valley of Europe

    Paul Graham, the Y Combinator co-founder whose essays have shaped how a generation of founders thinks about startups, took the stage in Stockholm to answer two questions at once. Should you, as an ambitious founder, go to Silicon Valley? And what should Sweden do to thrive as a startup hub? His surprising thesis is that both questions have the same answer. Watch the full talk on YouTube.

    TLDW

    Graham argues that talent in any high-intensity field concentrates in one geographic center, the way painting clustered in 1870s Paris, math in Göttingen around 1900, and movies in 1950s Hollywood. For startups today, that center is Silicon Valley. Founders should go, at least for a while, because the talent pool is both bigger and better, because serendipitous meetings outperform planned ones, because investors decide faster, because moving abroad paradoxically earns more respect from investors at home, and because measuring yourself against known greats like Brian Chesky, Sam Altman, or Max Levchin clears away the fog at the summit and shows you the work required to get there. The most subtle benefit is cultural. Silicon Valley has a 60-year-old pay-it-forward custom in which people help strangers for no reason, a habit Graham traces to a place where nobodies become billionaires faster than anywhere else. The pivot to Sweden is that the best way to help Stockholm become a startup hub is for Swedish founders to go to Silicon Valley, ideally through YC, and then come back, importing money, skills, and Valley culture. Yes, returning founders are only half as likely to become unicorns as those who stay, but selection bias and the valuation gap explain most of that, and half a unicorn is still extraordinary. The job of Silicon Valley of Europe is unclaimed. Mountain View was a backwater in 1955 too. Critical mass is invisible until it is reached.

    Key Takeaways

    • Whenever humans work intensely on something, one place in the world becomes its center. Painting in 1870 was Paris. Math in 1900 was Göttingen. Movies in 1950 were Hollywood. Startups today are Silicon Valley.
    • Every ambitious person working in those eras faced the same decision founders face now. The right answer is the same one it has always been. Yes, go. You can come back, but you should at least go.
    • National borders do not change the basic logic of moving from a village to a capital city. The reasoning that says move to where your peers are does not even know the dotted line on the map is there.
    • At the great center, the talent pool expands in two dimensions at once. The people are better and there are more of them, and they cluster, producing an intoxicating concentration of ability.
    • Serendipitous meetings are mysteriously, enormously valuable. Biographies of people who do great things are full of chance encounters that change everything.
    • Graham offers three candidate explanations for why unplanned meetings beat planned ones. There are simply more of them, so outliers are statistically unplanned. Planned meetings may be too conservative because they require a stated reason in advance. Unplanned conversations let you bail in the first few sentences, so the ones that continue are pre-filtered for fit.
    • For ambitious people there is nothing better than serendipitous meetings with other people working on the same hard thing. Big centers produce more of them.
    • Things move faster in big centers because better people are more confident and more decisive, and because peers compete with and egg each other on. Ideas get acted on rather than half held.
    • Investors in Silicon Valley decide dramatically faster than European investors. They are more confident and they face stiff competition, so they cannot sit on a good opportunity without losing it.
    • This produces a counterintuitive rule. The more right an investor is about a deal, the less time they can wait, because everyone else who meets the same founder is going to invest too.
    • Yuri Sagalov is the canonical example. He invested in Max Levchin instantly because he knew anyone else who met Max would invest. Speed is the rational response to a crowded, high quality market.
    • Valley investors grumble that valuations are too high and decisions too rushed, yet they outperform European investors empirically. The complaining is just noise.
    • Moving abroad earns you more respect from investors back home. Jesus said no one is a prophet in their own country, and local investors implicitly assume local startups are second rate everywhere, not just in Sweden.
    • Leaving inverts that rule and lifts you in local investors' estimation. Sometimes the mere announcement that you got into Y Combinator is enough. Investors who ignored you for months suddenly trip over themselves to write checks.
    • The Dropbox story illustrates this perfectly. A big Boston VC firm spent a year offering Drew Houston encouragement and advice but no money. The moment Sequoia in Silicon Valley got interested, that same firm faxed Drew a term sheet with a blank valuation. Drew went with Sequoia anyway and in 2018 Dropbox became the first YC company to go public.
    • The biggest advantage of moving to a great center is not what it does for you but what it does to you. A big fish in a small pond cannot tell how big it actually is.
    • In a big pond you can measure yourself against known giants. Surprisingly often the news is good. You see Brian Chesky or Sam Altman or Max Levchin and realize they are not a different species. You could do what they did if you worked that hard.
    • The key word is hard. Seeing a giant up close also calibrates the cost. It is not just I could be like that. It is I could be like that if I worked as hard as that.
    • Graham offers a Mount Olympus metaphor. Moving to the mountain clears away the fog at the top. The summit is right there, quite high but no longer impossibly high. Ambitious people need a high but definite threshold.
    • The most surprising thing about Silicon Valley to outsiders is that people help you for no reason. A founder who recently moved from England said every conversation seemed to end with what can I do to help you.
    • This is not politeness. English people are far more polite than Americans on average. The helpfulness is a different cultural artifact specific to the Valley.
    • Graham traces the origin to economics. Silicon Valley is the place where nobodies become billionaires faster than anywhere else, so being nice to nobodies has historically paid off. If the helping behavior was ever calculated, the calculation is gone now. The custom is 60 years old and has become reflex.
    • Ron Conway is the purest expression of the pattern. All he does is help people. He does not track whether they are portfolio companies. He does not remember most of the favors. That untracked, indiscriminate helpfulness lets him operate at a much larger scale.
    • When many people behave this way at once, the conservation law for favors breaks down. There are just more favors. The pie grows.
    • Moving to the Valley changes you. One of the strangest effects is that it makes you more helpful to other people.
    • The answer to how Sweden should thrive as a startup hub is buried inside the answer to whether founders should go. Go to Silicon Valley for a bit and then come back.
    • That move helps Sweden in three concrete ways. The average quality of Swedish startups goes up. Returning founders bring Silicon Valley money back with them. And they import Silicon Valley culture, which has spent decades evolving to be optimal for startups.
    • Silicon Valley culture is more compatible with Swedish culture than people realize. Sweden has the tall poppy problem (which it should drop anyway) but shares the high trust trait that makes the Valley work.
    • Historical precedent backs this. In the 1800s Sweden literally gave mathematicians fellowships conditional on leaving the country to study math abroad. Boycotting Göttingen in the name of building Swedish math would have been absurd.
    • YC is the optimal way to do the go for a bit and come back move. It is a deliberately engineered super valley within the Valley, concentrating density of founders, helpfulness, and investor speed into four to six months.
    • If the Swedish government designed a program to give Swedish founders concentrated Silicon Valley exposure, they could not do better than YC, and it costs them nothing because Silicon Valley investors fund it. They do not even have to license it. They just call the API.
    • YC data shows founders who go home are only about half as likely to become unicorns as those who stay. Three reasons not to be discouraged. First, selection bias. The most confident and determined founders are the ones willing to relocate, so the data is measuring those traits as much as Valley effects.
    • Second, the metric is valuation, not company performance. Bay Area startups simply raise at higher multiples for the same business.
    • Third, even half as well is still very good. If you would have been a Valley billionaire and end up with 500 million dollars instead, the practical difference is zero. In Swedish kronor you are still a billionaire.
    • Money is not everything anyway. Once you have kids, where they grow up becomes the dominant question. That is an argument for returning home that has nothing to do with startups.
    • The most exciting upside is that Stockholm could become the Silicon Valley of Europe. The job is unclaimed. Nobody has a confident answer to where the European tech center is.
    • Geographic size is not the constraint people think it is. Mountain View was a backwater in 1955 when Shockley Semiconductor was founded there, and it stayed the geographic center of Silicon Valley until 2012 when activity shifted to San Francisco.
    • The two ingredients required are a place founders want to live and a critical mass of them. Stockholm clearly clears the first bar. The second is impossible to measure until you hit it, at which point it tips quickly.
    • Stockholm may be closer than it looks. Critical mass is the kind of threshold that is invisible until it has already been passed.

    Detailed Summary

    Why Centers Exist and Why You Have to Go There

    Graham opens with a historical pattern. Whenever a field gets pursued intensely, one place becomes its center. Painting in 1870 was Paris. Math in 1900 was Göttingen. Movies in 1950 were Hollywood. For startups now it is Silicon Valley. The question every ambitious person in those eras asked, should I go, has had the same correct answer for thousands of years. Yes. You can come back, but at minimum you should go. The logic does not change at national borders. If a villager interested in startups would obviously move to their country’s capital, the same reasoning applies when the capital sits across a dotted line on a map.

    What you get at the center is a talent pool that expands in two dimensions at once. The people are better, and there are more of them, and they cluster, producing a density of ability that Graham describes as intoxicating. Every YC batch dinner, he says, feels the way the Stockholm room felt during his talk.

    The Mystery of Serendipitous Meetings

    One specific benefit of density is serendipitous meetings, and Graham admits he does not fully understand why unplanned encounters outperform planned ones so dramatically. Biographies of accomplished people are dense with chance meetings that redirected entire lives. He offers three possible explanations. Maybe there are simply more unplanned meetings, so statistically the outliers will mostly be unplanned. Maybe planned meetings are too conservative because they require a stated reason in advance, which lops off the upside the same way deliberate startup idea hunts lop off the best ideas. Maybe unplanned conversations have built-in selection. You can decide in the first few sentences whether to continue, so the surviving conversations are pre-filtered for fit. Whatever the mechanism, big centers produce more of these high value encounters, and that alone is worth the move.

    Speed and the Investor Asymmetry

    Things move faster in big centers because better people are more confident and more decisive. They egg each other on. Ideas get acted on instead of half held. Graham notes that in villages around the world there are people who half had every famous idea and never moved on it, and now resent the founder who did.

    The starkest example is investor speed. Silicon Valley investors decide dramatically faster than European ones, partly because they are better and more confident and partly because competition forces it. An investor who correctly identifies a great opportunity faces a counterintuitive rule. The more right they are, the less time they can wait, because every other investor who meets that founder will reach the same conclusion. Yuri Sagalov is the canonical case. He invested in Max Levchin immediately on meeting him because he knew anyone else would do the same. Valley investors complain that valuations are too high and decisions too rushed, but they empirically outperform European investors anyway. The grumbling is noise.

    The Prophet at Home Effect

    An underrated benefit of leaving for the center is that it raises your standing at home. Graham quotes the line about no prophet in their own country and notes that investors outside Silicon Valley implicitly assume local startups are second rate. It is not a Swedish problem. It is universal. Leaving inverts the rule. Local investors automatically rate you higher because you have been somewhere they consider serious. Sometimes the mere announcement that you got into Y Combinator triggers the inversion. The Dropbox story is the cleanest illustration. A big Boston VC firm spent a year giving Drew Houston encouragement and advice but no money. The moment Sequoia in Silicon Valley took an interest, that same firm faxed Drew a term sheet with a blank valuation, willing to invest at any price. Drew went with Sequoia. Dropbox went public in 2018 as the first YC IPO.

    Big Pond, Visible Summit

    The deepest benefit of relocating is not what the center does for you but what it does to you. A big fish in a small pond cannot tell how big it actually is. A big fish in a big pond can. You can stand next to Brian Chesky or Sam Altman or, as the Stockholm audience just had, Max Levchin, and recognize that they are not a different species. You could do what they did, if you worked that hard. The catch, Graham emphasizes twice, is the if. Seeing a giant up close calibrates both the achievability of the summit and the cost of reaching it.

    He offers a Mount Olympus image. Moving to the mountain clears away the fog at the top. The summit is right there, quite high but no longer impossibly high. Ambitious people need a high but definite threshold. Visibility transforms a vague aspiration into a clear, hard, finite target.

    The Pay It Forward Culture

    The most surprising thing about Silicon Valley to outsiders is that people help you for no reason. The phrase sounds normal in the Valley and strange everywhere else, the way clean streets feel normal in Sweden but require explanation elsewhere. Graham asked a founder who recently moved from England what surprised him most. The answer was the helpfulness. Every conversation ended with what can I do to help you. The English founder noted that this was not English politeness, which is a different thing and arguably more pronounced.

    Graham traces the origin to economics. Silicon Valley is where nobodies become billionaires faster than anywhere else. Someone with a taste for being nice to nobodies, the kind of person who pets the nobody on the head rather than kicking it aside, was always going to end up with powerful friends in that environment. Whether the original behavior was calculated or not, it is reflexive now. The custom is 60 years old. Ron Conway is the purest expression. He helps everyone, does not track favors, does not remember most of them, and as a result operates at a scale that ledger keeping makes impossible. When many people behave that way at once, the conservation law for favors breaks down. The pie expands. Graham notes that moving to the Valley will change you in this same way, almost involuntarily.

    The Sweden Answer Is Inside the Founder Answer

    The pivot of the talk is that both questions have the same answer. The way Stockholm thrives as a startup hub is for Swedish founders to go to Silicon Valley and come back. That move helps Sweden in three concrete ways. The average quality of Swedish startups rises. Returning founders bring Valley money back with them. And they import Valley culture, which has been optimized over decades for startups and which is more compatible with Swedish culture than people assume. Sweden has the tall poppy dynamic, which it should drop anyway, but shares the high trust trait that the Valley runs on.

    The historical analogy is direct. In the late 1800s the Swedish government gave mathematicians fellowships conditional on leaving the country to study abroad. Boycotting Göttingen to develop Swedish math would have been self-defeating. The same logic applies to startups now.

    YC as the Optimal Vehicle

    Graham acknowledges he is talking his own book and says it anyway because he thinks it is true. The optimal way to go for a bit and come back is YC. YC is a deliberately engineered super valley inside the Valley, concentrating founder density, helpfulness, and investor speed into a four to six month container. If the Swedish government designed such a program from scratch it would look like YC, and YC costs the government nothing because Silicon Valley investors fund it. There is no licensing process. Founders just call the API.

    The Half As Many Unicorns Caveat

    The honest data point. Founders who go home after YC are only about half as likely to become unicorns as those who stay. Graham offers three reasons not to be discouraged. First, selection bias. The most confident and determined founders are also the ones willing to relocate, so the data is partly measuring those traits rather than the effect of geography. Second, the metric is valuation, not company performance. Bay Area companies simply raise at higher multiples. Third, half is still very good. A 500 million dollar company instead of a 1 billion dollar one is no real difference in practice, and in Swedish kronor you still cross the billionaire threshold.

    Money is not everything anyway. Once you have kids, where they grow up becomes the dominant decision, and that question has nothing to do with valuations.

    The Silicon Valley of Europe Is an Open Position

    Graham ends with the most ambitious frame. If Sweden transplants enough Valley culture, Stockholm could become the Silicon Valley of Europe. The job is unclaimed. There is no confident answer to where the European startup center is, the way nobody asks where the Silicon Valley of America is because the answer is obvious. Geographic size is a weaker constraint than people think. Mountain View was a backwater in 1955 when Shockley Semiconductor was founded there, and it remained the geographic center of Silicon Valley until activity shifted to San Francisco in 2012. The only real requirements are a place founders want to live and a critical mass of founders. Stockholm clearly clears the first bar. The second is impossible to measure until it is hit, and then it tips fast. Graham closes by suggesting Stockholm may already be closer than it looks.

    Thoughts

    The most useful idea in this talk is the inversion at the heart of it. Most advice about startup geography frames the choice as a tradeoff between leaving and staying, with leaving optimized for the founder and staying optimized for the country. Graham collapses the two. The country wins more when founders leave and come back than when founders stay out of loyalty. The brain drain framing assumes a fixed pool of talent that can only be in one place. The brain circulation framing, which is what Graham is actually describing, assumes that exposure compounds. A founder who has spent six months absorbing Valley density brings back something a founder who stayed home never had. The Swedish math fellowships from the 1800s are the deepest evidence here. A government that wanted strong domestic mathematicians did not try to build a wall around them. It paid them to leave.

    The serendipity argument is the part of the talk that should make planners uncomfortable, because it is essentially an admission that the highest leverage activity in a startup career cannot be scheduled. The three theories Graham offers are not mutually exclusive and the cumulative force of them is that any environment optimized for planned, calendared interaction is by definition lopping off its own upside. This has obvious implications beyond geography. Remote first cultures, calendar tetris, gated office access, and the whole apparatus that converts random encounters into booked meetings are all working against the mechanism Graham is describing. Whether that tradeoff is worth it for any given company is a separate question, but it is at minimum a tradeoff, not a free win.

    The pay it forward story is also more economically grounded than it usually gets credit for. Graham is careful to note that the helping behavior may have originated as a calculated bet on being kind to potential future billionaires, then ossified into reflex once enough generations practiced it. That is a more honest origin story than the usual quasi spiritual version. It also implies the culture can be transplanted, but only by recreating the conditions that originally produced it. You cannot just declare a pay it forward culture and have one. You need a place where nobodies actually do become billionaires often enough that helping them rationally pays off, then run that loop for 60 years. Most cities trying to engineer their way into being startup hubs skip past this part and wonder why the culture does not stick.

    Finally, the Mountain View in 1955 line is the underrated punch of the talk. People who write off their own city as too small or too peripheral to become anything usually have an idealized image of the current center as a place that was always obviously special. It was not. Shockley Semiconductor went into a strip of orchards. Whatever Stockholm or anywhere else looks like today, it looks more impressive than Mountain View did the year Silicon Valley was born.

    Watch the full Paul Graham talk from Stockholm on YouTube.

  • Alex Wang on Leaving Scale to Run Meta Superintelligence Labs, MuseSpark, Personal Super Intelligence, and Building an Economy of Agents

    Alex Wang, head of Meta Superintelligence Labs, sits down with Ashlee Vance and Kylie Robinson on the Core Memory podcast for his first long-form interview since Meta’s quasi-acquisition of Scale AI roughly ten months ago. He walks through how MSL is structured, why Llama was off-trajectory, what made MuseSpark’s token efficiency surprise the team, how Meta thinks about a future “economy of agents in a data center,” and where he lands on safety, open source, robotics, brain computer interfaces, and even model welfare.

    TLDW

    Wang explains that Meta Superintelligence Labs is a fully rebuilt frontier effort organized around four principles (take superintelligence seriously, technical voices loudest, scientific rigor, big bets) and three velocity levers (high compute per researcher, extreme talent density, ambitious research bets). He confirms Llama was off the frontier when he arrived, so MSL rebuilt the pre-training, reinforcement learning, and data stacks from scratch. MuseSpark is described as the “appetizer” on the scaling ladder, notable for its strong token efficiency, with much larger and stronger models arriving in the coming months. He pushes back on the mercenary narrative around recruiting, frames Meta’s edge as compute plus billions of consumers and hundreds of millions of small businesses, sketches a vision of personal super intelligence delivered through Ray-Ban Meta glasses and WhatsApp, and outlines why physical intelligence, robotics (the new Assured Robot Intelligence acquisition), health super intelligence with CZI, brain computer interfaces, and even model welfare are core to Meta’s roadmap. He dismisses reported infighting with Bosworth and Cox as gossip, declines to comment on the Manus situation, and says safety guardrails (bio, cyber, loss of control) are why MuseSpark cannot currently be open sourced, while smaller open variants are being prepared.

    Key Takeaways

    • Meta Superintelligence Labs (MSL) is the umbrella, with TBD Lab as the large-model research unit reporting directly to Alex Wang, PAR (Product and Applied Research) under Nat Friedman, FAIR for exploratory science, and Meta Compute under Daniel Gross handling long-term GPU and data center planning.
    • Wang says Llama was not on a frontier trajectory when he arrived, so MSL had to do a “full renovation” of the pre-training stack, RL stack, data pipeline, and research science.
    • The first cultural fix was getting the lab to “take superintelligence seriously” as a near-term, achievable goal, not an abstract bet. Big incumbents often lack that religious conviction.
    • Four MSL principles: take superintelligence seriously, let technical voices be loudest, demand scientific rigor on basics, and make big bets.
    • Three velocity levers Wang identified for catching and overtaking the frontier: high compute per researcher, very high talent density in a small team, and willingness to fund ambitious research bets.
    • Wang rejects the mercenary recruiting narrative. He says most hires had strong financial prospects at their prior labs already and joined for compute access, talent density, and the chance to build from scratch.
    • On the famous soup story, Wang neither confirms nor denies Zuck personally made the soup, but says recruiting was highly individualized and signaled how seriously Meta cared about each researcher’s agenda.
    • Yann LeCun publicly called Wang young and inexperienced. Wang says they reconciled in person at a conference in India where LeCun congratulated him on MuseSpark.
    • Sam Altman, asked by Vance for comment, “did not have flattering things to say” about Wang. Wang hopes industry animosities subside as systems approach superintelligence.
    • Wang’s management philosophy borrows the Steve Jobs line: hire brilliant people so they tell you what to do, not the other way around.
    • MuseSpark is framed as an “appetizer” data point on the MSL scaling ladder, not a flagship.
    • The MuseSpark program is built around predictable scaling on multiple axes: pre-training, reinforcement learning, test-time compute, and multi-agent collaboration (the 16-agent content planning mode).
    • MuseSpark outperformed internal expectations and showed emergent capabilities in agentic visual coding, including generating websites and games from prompts, helped by combined agentic and multimodal strength.
    • MuseSpark’s biggest external signal is token efficiency. On benchmarks like Artificial Analysis it hits similar results with far fewer tokens than competitor models, which Wang attributes to a clean stack rebuilt by experts rather than inefficiencies patched by longer thinking.
    • Larger MSL models are arriving in the coming months and Wang expects them to be state of the art in the areas MSL is focused on.
    • The Meta strategic edge: massive compute, billions of consumers across the family of apps, and hundreds of millions of small businesses already on Facebook, Instagram, and WhatsApp.
    • Wang’s headline framing: Dario Amodei talks about a “country of geniuses in a data center.” Meta is targeting an “economy of agents in a data center,” with consumer agents and business agents transacting and collaborating.
    • Consumer AI sentiment is in the toilet because, unlike developers who have had a Claude Code moment, ordinary people have not yet experienced AI as a genuine personal agency unlock.
    • Wang acknowledges the product overhang. Meta held back from deep AI integration across its apps until the models were good enough, and is now entering the integration phase.
    • Ray-Ban Meta glasses are the canonical example of personal super intelligence hardware, with the model seeing what the user sees, hearing what they hear, capturing context, and surfacing proactive insights.
    • Wang admits even AI-native users like Kylie Robinson, who lives in WhatsApp, have not naturally used Meta AI yet. He bets that better models plus deeper integration close that gap.
    • On the competitive landscape: a year ago everyone assumed ChatGPT had already won consumer. Claude Code has since become the fastest growing business in history, and Gemini has taken consumer market share. Wang’s read: AI is far from endgame and each new capability tier unlocks a new dominant form factor.
    • On open source: MuseSpark triggered guardrails in Meta’s Advanced AI Scaling Framework around bio, chem, cyber, and loss-of-control risks, so it is not currently safe to open source. Smaller, derived open variants are actively in development.
    • Meta remains committed to open sourcing models when safety allows, drawing a line through the Open Compute Project legacy and Sun Microsystems open-software heritage.
    • Wang dismisses reporting about a Wang-Zuck versus Bosworth-Cox split as “the line between gossip and reporting is remarkably thin.” He says leadership is aligned on needing best-in-class models and product integration.
    • On the Manus situation, Wang says it is too complicated to discuss publicly and that the deal status implies “machinations are still at play.”
    • On China, Wang separates the people from the state. He still wants to work with talented Chinese-born researchers regardless of his views on the Chinese Communist Party and PLA, which he sees as taking AI extremely seriously for national security.
    • The full-page New York Times AI war ad Wang ran while at Scale was meant to push the US government to treat AI as a step change for national security. He thinks events since then, including DeepSeek and other shocks, have proved that plea correct.
    • On Anthropic’s doom posture, Wang largely agrees with the core message that models are already very powerful and getting more so, while declining to endorse every specific claim.
    • Meta has acquired Assured Robot Intelligence (ARRI), an AI software company building models for hardware platforms, not a hardware maker itself.
    • Wang frames physical super intelligence as the natural sequel to digital super intelligence. Robotics, world models, and physical intelligence all benefit from the same scaling that drives language models.
    • On health, MSL is building a “health super intelligence” effort and will collaborate closely with CZI. Wang sees equal global access to powerful health AI as a uniquely Meta-shaped delivery problem.
    • Wang admires John Carmack but says nobody really knows what Carmack is currently working on. No band reunion announced.
    • The mango model is “alive and kicking” despite rumors. Wang notes MSL gets a small fraction of the rumor-mill attention other labs get and feels sympathy for them.
    • On model welfare, Wang says it is a serious topic that “nobody is talking about enough” given how integrated models have become as work partners. He references research, including from Eleos, that measures subjective experience of models.
    • Wang’s critical-path technology list: super intelligence, robotics, brain computer interfaces. The infinite-scale primitives behind them are energy, compute, and robots.
    • FAIR’s brain research program Tribe hit a milestone called Tribe B2: a foundation model that can predict how an unknown person’s brain would respond to images, video, and audio with reasonable zero-shot generalization.
    • Wang’s main philosophical break with Elon Musk: research itself is the primary activity. Building super intelligence is a research expedition through fog of war, and sequencing of bets really matters.
    • Personal notes: Wang moved from San Francisco to the South Bay, treats Palo Alto as his city now, was a math olympiad competitor, says his favorite activities are reading sci-fi and walking in the woods, and bonds with Vance over country music.

    Detailed Summary

    How MSL Is Actually Organized

    Meta Superintelligence Labs sits as the umbrella organization that Wang oversees. Inside it, TBD Lab is the large-model research group where the most discussed researchers and infrastructure engineers sit, and they technically report to Wang. PAR, Product and Applied Research, is led by Nat Friedman and owns deployment and product surfaces. FAIR continues to run exploratory science, including work on brain prediction models and a universal model for atoms used in computational chemistry. Sitting alongside MSL is Meta Compute, run by Daniel Gross, which owns the long-horizon GPU and data center plan that everything else relies on. Chief scientist Shengjia Zhao orchestrates the scientific agenda across the whole lab.

    Why Wang Left Scale

    Wang says progress in frontier AI has been faster than even insiders expected. Two structural beliefs pushed him toward Meta. First, the labs that actually train the frontier models are accruing disproportionate economic and product rights in the AI ecosystem. Second, compute is the dominant scarce input of the next phase, so the right mental model is to treat tech companies with compute as fundamentally different animals from companies without it. Meta has both, Zuck is “AGI pilled,” and the personal super intelligence memo Zuck published roughly a year ago became the shared north star.

    The Diagnosis: Llama Was Off-Trajectory

    When Wang arrived, the existing AI org needed a reset because Llama was not on the same trajectory as the frontier. The plan he laid out has four cultural principles. Take superintelligence seriously as a real near-term target. Make technical voices the loudest in the room. Demand scientific rigor and focus on basics. Make big bets. On top of that, three structural levers set velocity. Push compute per researcher much higher than at larger labs, where compute is diluted across too many efforts. Keep the team small and extremely cracked. Allocate a meaningful share of resources to ambitious, paradigm-shifting research bets rather than incremental refinement.

    Recruiting, Soup, and the Mercenary Narrative

    Wang argues the reporting on MSL hiring overstated the money story. Most of the people MSL recruited had strong financial paths at their previous employers, so individualized recruiting was more about compute access, talent density, and the ability to make big research bets. The recruitment blitz happened fast because Wang knew the team needed to exist “yesterday.” Asked about Mark Chen’s claim that Zuck made soup to recruit people, Wang refuses to confirm or deny who made it but agrees the process was intense and personal. Visitors from other labs reportedly tell Wang the MSL culture feels like early OpenAI or early Anthropic, which lands as the strongest endorsement he could ask for.

    Receiving the Public Hits: Young, Inexperienced, Mercenary

    LeCun called Wang young and inexperienced shortly after his own departure from Meta. The two reconnected in India a few weeks later and LeCun congratulated Wang on MuseSpark. Wang says the age critique has followed him since his earliest Silicon Valley days, so he barely registers it. Altman, asked off-camera by Vance about Wang’s appearance on the show, had nothing flattering to add. Wang’s response is to bet that as the field gets closer to actual super intelligence, the personal animosities will subside. Whether they will is, as Vance puts it, an open question.

    MuseSpark as Appetizer, Not Entree

    Wang is careful not to oversell MuseSpark. He calls it “the appetizer” and says it is an early data point on a deliberately constructed scaling ladder. MSL spent nine months rebuilding the pre-training stack, the reinforcement learning stack, the data pipeline, and the science before generating MuseSpark. The point of releasing it was to show that the new program scales predictably along multiple axes (pre-training, RL, test-time compute, and the recently demonstrated multi-agent scaling visible in MuseSpark’s 16-agent content planning mode). Wang says the upcoming larger models are what MSL is genuinely excited about and frames the next two rungs as much more interesting than the current release.

    Token Efficiency Was the Surprise

    MuseSpark’s strongest competitive signal is how few tokens it needs to match competitors on benchmarks like Artificial Analysis. Wang attributes this to having had the rare luxury of building a clean pre-training and RL stack from scratch with the right experts. He speculates that some competitor models compensate for upstream inefficiency by allowing the model to think longer, which inflates token usage without improving the underlying capability. If that read is right, MSL’s efficiency advantage should grow as models scale up.
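
    The arithmetic behind this claim is simple and worth making explicit. Below is a minimal sketch with hypothetical token counts and a made-up shared per-token price (none of these figures appear in the interview): at equal accuracy, the effective cost of a task is tokens times price, so a model that compensates for upstream inefficiency by thinking longer pays for every extra token.

```python
# Illustrative only: hypothetical token counts and a made-up shared price.
# At equal task accuracy, effective cost = tokens used * per-token price,
# so "thinking longer" to match an efficient model shows up directly as cost.

PRICE_PER_MILLION_TOKENS = 15.0  # hypothetical, same for both models

def task_cost(tokens_used: int) -> float:
    """Dollar cost of one task at the shared hypothetical price."""
    return tokens_used / 1_000_000 * PRICE_PER_MILLION_TOKENS

efficient = task_cost(20_000)  # model that solves the task directly
verbose = task_cost(80_000)    # model that pads with longer thinking
print(efficient, verbose, verbose / efficient)  # same answer, 4x the cost
```

    If the efficiency gap persists as both models scale, the cost spread widens with every generation, which is the compounding advantage Wang is pointing at.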

    Glasses, WhatsApp, and the Constellation of Devices

    Personal super intelligence shows up at Meta as a constellation of devices that capture context across the user’s day. Ray-Ban Meta glasses are the headline product, with the AI seeing what you see and hearing what you hear, then offering proactive insight or doing background research. Wang acknowledges that even AI-fluent users like Kylie Robinson, who runs her business inside WhatsApp, have not naturally used Meta’s AI buttons in the family of apps. His answer is that Meta deliberately waited for models to be good enough before tightening cross-app integration, and that integration phase is starting now.

    Country of Geniuses Versus Economy of Agents

    Wang’s framing of Meta’s strategic position is the most memorable line in the interview. Where Dario Amodei talks about a country of geniuses in a data center, Wang wants to build an economy of agents in a data center. Meta uniquely sits on both sides of consumer and small-business surface area, with billions of consumers and hundreds of millions of small businesses already on the platforms. If MSL can build great agents for both, then connect them so they transact and coordinate, the platform becomes a substrate for an entirely new kind of digital economy.

    Consumer Sentiment, Product Overhang, and the Trust Tax

    Wang concedes consumer AI sentiment is poor and that everyday users have not yet had a personal Claude Code moment. He believes the only durable answer is to ship products that genuinely transform individual agency for non-developers and small business owners. Robinson notes that for the small-town restaurant whose website has not been updated since 2002, a working agent on the business side could be transformational. Vance pushes that Meta carries a bigger trust tax than any other lab, so the bar for shipping AI products that the public will accept is correspondingly higher. Wang accepts the framing and says the answer is to keep building thoughtfully.

    Why MuseSpark Cannot Be Open Sourced Yet

    Meta’s Advanced AI Scaling Framework set explicit guardrails around bio, chem, cyber, and loss-of-control risks. MuseSpark in its current form tripped some of those internal evaluations, documented in the preparedness report Meta published alongside the model. So MuseSpark itself is not safe to open source. MSL is, however, developing smaller versions and derived models intended for open release, with active reviews happening the day of the interview. Wang reaffirms the commitment to open source where safety allows and draws a line back to the Open Compute Project and the Sun Microsystems-era ethos of openness in infrastructure.

    The Bosworth, Cox, and Manus Questions

    The reporting that Wang and Zuck push toward best-in-the-world research while Bosworth and Cox push toward cheap product deployment is dismissed as gossip dressed up as journalism. Wang says leadership debates points hard but is aligned on needing top models, integrating them into Meta’s surfaces, and serving the existing business. On Manus, the Chinese AI startup that figured in Meta’s late-stage strategy, Wang says he cannot comment, which itself signals that the situation is unresolved.

    China, National Security, and the Newspaper Ad

    Wang draws a sharp distinction between the Chinese state and Chinese-born researchers. His parents are from China, he is happy to work with talented researchers regardless of origin, and he sees a flattening of nuance on this question inside Silicon Valley. At the same time, he stands by the New York Times AI and war ad he ran while at Scale, framing it as an early plea for the US government to take AI seriously as a national security technology. He thinks subsequent events, including DeepSeek and other shocks, validated that call and that policymakers now do treat AI accordingly.

    Robotics and Physical Super Intelligence

    Meta has acquired Assured Robot Intelligence, an AI software company that builds models for multiple hardware targets rather than its own robot. Wang argues that if you take digital super intelligence seriously, physical super intelligence quickly becomes the next logical milestone. Scaling laws for robotic intelligence look similar enough to language model scaling that having the largest compute footprint in the industry would be wasted if it were not also turned toward world modeling and embodied learning. He grants the metaverse-skeptic critique exists but says retreating from ambition is the wrong response to past misfires.

    Health Super Intelligence and CZI

    Wang names health super intelligence as one of MSL’s anchor initiatives. Because billions of people already use Meta products daily, Wang believes Meta is structurally positioned to put powerful health AI in the hands of billions with equal global access in a way nobody else can. The work will involve close collaboration with the Chan Zuckerberg Initiative, which has its own multi-billion-dollar biotech and science investment program.

    Model Welfare, Sci-Fi, and Brain Models

    Two of the most distinctive moments come at the end. Wang flags model welfare as a topic he thinks is being undercovered relative to how integrated models now are in daily work. He is open to the idea that models may have measurable subjective experience worth weighing, and points to research efforts (including Eleos) trying to quantify it. He also reveals that FAIR’s Tribe program, with its Tribe B2 milestone, has produced foundation models capable of predicting how an unknown person’s brain would respond to images, video, and audio with reasonable zero-shot generalization, a building block toward future brain computer interfaces. Wang lists brain computer interfaces alongside super intelligence and robotics as the critical-path technologies for humanity, with energy, compute, and robots as the infinitely scaling primitives behind them.

    Where Wang Diverges From Elon

    Asked whether Musk is more all-in on robotics, energy, and BCI than anyone, Wang concedes the point but argues the details matter and sequencing matters more. Wang’s core philosophical break is that building super intelligence is fundamentally a research activity, not a scaling-only sprint. The lab is operating in fog of war, and ambitious experiments are the only way to map it. That conviction is what makes MSL a research-led organization rather than a brute-force compute farm.

    Thoughts

    The most strategically interesting move in this entire interview is the “economy of agents in a data center” framing. It is a deliberate reframe against Anthropic’s “country of geniuses” line, and it does real work. A country of geniuses is a labor-substitution story aimed at knowledge workers and code. An economy of agents is a marketplace story that maps directly onto Meta’s two-sided distribution advantage: billions of consumers on one side, hundreds of millions of small businesses on the other. That positioning makes the agentic future Meta-shaped in a way no other frontier lab can claim, because no other frontier lab also owns the demand and supply graph of the global small-business economy. If Wang’s team can actually ship reliable agents on both sides plus the rails for them to transact, Meta’s structural moat in agentic commerce could exceed anything Llama ever had as an open model.

    The token efficiency claim is the strongest piece of technical evidence in the interview for the “clean stack” thesis. If MuseSpark really is matching competitors with materially fewer tokens, the implication is not that MuseSpark is the best model today, but that MSL has rebuilt the foundations with less accumulated tech debt than competitors that have layered fixes on top of older stacks. That is exactly the kind of advantage that compounds with scale. The next two model releases are the actual test. If Wang is right about predictable scaling on pre-training, RL, test-time, and multi-agent axes simultaneously, the gap from MuseSpark to the next rung should be visible in a way that forces re-rating of Meta’s position.

    The open-source posture is the cleanest signal of how the safety conversation has actually changed in 2026. Meta, the lab most identified with open weights, is saying out loud that its current frontier model triggered enough internal guardrails that releasing the weights is off the table. Wang threads the needle by promising smaller open variants, but the underlying point is unmistakable: the open-weights bargain has limits, and those limits will be set by internal preparedness frameworks rather than community pressure. That is a real shift from the Llama 2 era and worth tracking as the next generation lands.

    Wang’s willingness to engage on model welfare, on roughly the same footing as safety and alignment, is the second philosophical reveal worth flagging. It signals that the next generation of lab leadership is not going to dismiss the topic the way the previous generation often did. Whether that translates into product or policy changes is unclear, but the fact that the head of MSL says it is “underdiscussed” is itself a marker.

    Finally, the human texture of the interview matters. Wang has clearly absorbed a lot of personal incoming fire over the past ten months, including from LeCun and Altman, and his answer is consistently to redirect to the work. The Steve Jobs quote about hiring people who tell you what to do is the operating slogan he keeps coming back to. Combined with the genuine enthusiasm for sci-fi, walks in the woods, and country music, the picture that emerges is less the salesman caricature his critics paint and more a young technical operator betting that scoreboard work over a multi-year horizon will settle every argument that text on X cannot.

    Watch the full conversation here.

  • Krishna Rao on Anthropic Going From 9 Billion to 30 Billion ARR in One Quarter and the Compute Strategy Powering Claude

    Krishna Rao, Chief Financial Officer of Anthropic, sat down with Patrick O’Shaughnessy on Invest Like the Best for one of the most detailed public looks yet at the operating engine behind Claude. He covers how Anthropic compounded from $9 billion of run rate revenue at the start of the year to north of $30 billion by the end of Q1, why he spends 30 to 40 percent of his time on compute, the playbook for buying gigawatts of AI infrastructure across Trainium, TPU, and GPU platforms, how Anthropic prices its models, why returns to frontier intelligence keep climbing, and what the Mythos release tells us about the cyber capabilities of the next generation of Claude.

    TLDW

    Anthropic is running the most compute fungible frontier lab in the world, with active deployments across AWS Trainium, Google TPU, and Nvidia GPU, and an internal orchestration layer that lets a chip serve inference in the morning and run reinforcement learning the same evening. Krishna Rao explains the cone of uncertainty that governs gigawatt scale compute procurement, the floor Anthropic refuses to drop below on model development compute, the Jevons paradox unlock from cutting Opus pricing, the 500 percent annualized net dollar retention from enterprise customers, the layer cake of long term deals with Google, Broadcom, Amazon, and the recent xAI Colossus tie up in Memphis, the phased release of the Mythos model in response to spiking cyber capabilities, the internal use of Claude Code to produce statutory financial statements and run a Monthly Financial Review skill, and why the team believes scaling laws are alive and well. The interview also covers fundraising history through Series D and Series E, the $75 billion already raised plus another $50 billion coming, talent density beating talent mass during the Meta poaching wave, and Rao’s belief that biotech and drug discovery represent the most exciting frontier for AI.

    Key Takeaways

    • Anthropic entered the year with about $9 billion of run rate revenue and ended the first quarter with north of $30 billion of run rate revenue, a more than 3x leap driven by model intelligence gains and the products built around them.
    • Compute is described as the lifeblood of the company, the canvas everything else is built on, and the most consequential class of decisions Rao makes. Buy too much and you go bankrupt. Buy too little and you cannot serve customers or stay at the frontier.
    • Rao spends 30 to 40 percent of his time on compute, even today, and the leadership team meets repeatedly on both procurement and ongoing compute allocation.
    • Anthropic is the only frontier lab actively using all three major chip platforms in production: AWS Trainium, Google TPU, and Nvidia GPU. Claude is also the only major model available on all three clouds.
    • Flexibility is the central design principle. Anthropic builds flexibility into the deals themselves, into the orchestration layer that maps workloads to chips, and into compilers built from the chip level up.
    • The cone of uncertainty frames procurement. Small differences in weekly or monthly growth compound into wildly different two year outcomes, so the team plans across a range of scenarios rather than a single point estimate, and ranges toward the upper end while protecting downside.
    • Compute allocation across the company sits in three buckets: model development and research, internal employee acceleration, and external customer serving. A non negotiable floor protects model development even when customer demand is tight.
    • Anthropic estimates that if it cut off internal employee use of its own models, the freed compute could serve billions of dollars of additional revenue. It chooses not to, because internal use compounds into better future models.
    • Intelligence is multi dimensional, not a single IQ score. Anthropic measures real world capability through customer feedback, long horizon task performance, tool use, computer use, and speed at agentic tasks, not just leaderboard benchmarks that have largely saturated.
    • Each Opus generation, 4 to 4.5 to 4.6 to 4.7, delivers both capability improvements and an efficiency multiplier on token processing. New models often serve customers at a fraction of the prior cost while doing more.
    • Reinforcement learning is described as inference inside a sandbox with a reward function, so model efficiency gains directly improve internal RL throughput. The flywheel is tightly coupled.
    • Over 90 percent of code at Anthropic is now written by Claude Code, and a large share of Claude Code itself is written by Claude Code.
    • Anthropic shipped roughly 30 distinct product and feature releases in January and the pace has accelerated since.
    • Scaling laws, in Anthropic’s internal data, are alive and well. The team holds itself to a skeptical scientific standard and still does not see them slowing down.
    • Anthropic recently signed a 5 gigawatt deal with Google and Broadcom for TPUs starting in 2027, plus an Amazon Trainium agreement for up to 5 gigawatts, totaling more than $100 billion in commitments. A significant portion lands this year and next year.
    • A new partnership for capacity at the xAI Colossus facility in Memphis was announced just before the interview, aimed at expanding consumer and prosumer capacity.
    • Pricing has been remarkably stable across Haiku, Sonnet, and Opus. The biggest deliberate change was lowering Opus pricing, which produced a textbook Jevons paradox: consumption rose far faster than the price drop, and the new Opus 4.6 and 4.7 slot in at the same price point.
    • Mythos is the first model Anthropic chose to release in a phased way because of a sharp spike in cyber capability. In an open source codebase where a prior model found 22 security vulnerabilities, Mythos found roughly 250.
    • The Mythos release framework focuses on defensive use first, expands access over time, and is presented as a template for future capability spikes.
    • Anthropic now sells to 9 of the Fortune 10 and reports net dollar retention above 500 percent on an annualized basis. These are not pilots. Rao describes signing two double digit million dollar commitments during a 20 minute Uber ride to the studio.
    • The platform strategy is mostly horizontal. Anthropic will go vertical with offerings like Claude for Financial Services, Claude for Life Sciences, and Claude Security where it can demonstrate the model’s capabilities, but expects most application value to accrue to customers building on top.
    • Anthropic has raised over $75 billion in equity since Rao joined, with another $50 billion in commitments tied to the Amazon and Google deals. Capital intensity is real, but the raises fund the upper end of the cone of uncertainty more than they fund current losses.
    • The Series E close coincided with the day the DeepSeek news broke, forcing investors to reassess their AI thesis in real time. Anthropic closed the round anyway.
    • Inside finance, Claude now produces statutory financial statements for every Anthropic legal entity, with a human checker. A library of more than 70 finance specific skills underpins workflows.
    • A custom Monthly Financial Review skill produces a 90 to 95 percent ready monthly close report, so leadership discussion shifts from reconciling numbers to debating implications.
    • An internal real time analytics platform called Anthrop Stats compresses weekly insight cycles from hours to about 30 minutes.
    • The biggest token user inside Anthropic’s finance team is the head of tax, focused on tax policy engines and workflow automation. The most senior people, not the youngest, are leading internal adoption.
    • Talent density beats talent mass. When Meta and others ran aggressive offer waves, Anthropic lost two people while peer labs lost dozens.
    • All seven Anthropic co founders remain at the company, as do most of the first 20 to 30 employees, which Rao credits to a collaborative, transparent, debate friendly culture and a real culture interview that can veto otherwise top tier candidates.
    • Dario Amodei holds an open all hands every two weeks, writes a short prepared document, and takes unscripted questions from anyone at the company.
    • AI safety investments in interpretability and alignment have a commercial side effect. Looking inside the model helps Anthropic build better models, and enterprises selling sensitive workloads want to trust the lab they hand customer data to.
    • Anthropic explicitly identifies as America first in its approach to model development, and engages closely with the US administration on capability releases such as Mythos.
    • The longer term product vision is the virtual collaborator: an agent with organizational context, access to the company’s tools, persistent memory, and the ability to work on ideas, not just tasks, over long horizons.
    • CoWork, Anthropic’s extension of the Claude Code paradigm into general knowledge work, is being adopted faster than Claude Code itself when indexed to the same point in its launch curve.
    • Anthropic’s product teams ship daily, with a fleet of agents working across the company on specific tasks. Everyone effectively becomes a manager of agents.
    • The dominant downside risks to Anthropic’s high end forecast are slower customer diffusion of model capability into real workflows, scaling laws flattening unexpectedly, and Anthropic losing its position at the frontier.
    • Rao is most excited about biotech and healthcare outcomes, especially the prospect that AI could push drug discovery and lab throughput up 10x or 100x, turning currently incurable diagnoses into treatable ones within a patient’s lifetime.
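
    One quantitative claim in the list above deserves unpacking: net dollar retention above 500 percent. Here is a minimal sketch of how NDR is conventionally measured, with hypothetical cohort figures (the interview does not break out the underlying numbers): take the customers you already had a year ago and divide what they pay now by what they paid then.

```python
# Hypothetical cohort revenue; a sketch of the standard NDR calculation.
# NDR looks at one fixed cohort at two points in time, so it isolates
# expansion and churn from new-customer growth.

def net_dollar_retention(cohort_start: float, cohort_now: float) -> float:
    """Revenue from last year's cohort today, over the same cohort a year ago."""
    return cohort_now / cohort_start

# 500% annualized NDR means last year's customers now spend 5x what they did,
# before counting a single new logo.
print(net_dollar_retention(10_000_000, 50_000_000))  # -> 5.0
```

    Anything above 1.0 means the installed base grows on its own; 5.0 means expansion within existing accounts is itself the dominant growth engine.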

    Detailed Summary

    Compute as Lifeblood and the Cone of Uncertainty

    Rao opens with the claim that compute is the most important resource at Anthropic, and the most consequential decision class in the company. You cannot buy a gigawatt of compute next week. You have to anticipate demand a year or two in advance, and the cost of being wrong in either direction is high. Buy too much and the unit economics collapse. Buy too little and you cannot serve customers or stay at the frontier, which are described as the same failure mode. To navigate this, the team uses a cone of uncertainty rather than point estimates. Small differences in weekly growth compound into vastly different two year outcomes, and Anthropic tries to position itself toward the upper end of that cone while preserving optionality. Rao notes he has had to consciously break a lifetime of linear thinking and force himself into exponential models.
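
    The compounding at the heart of the cone is easy to sketch. The weekly growth rates below are hypothetical, not Anthropic's, but they show why a small disagreement about weekly growth becomes an order-of-magnitude disagreement about two-year compute demand.

```python
# Hypothetical weekly growth rates, chosen only to illustrate compounding.
# A two-year procurement horizon is roughly 104 weeks.

def compound(weekly_growth: float, weeks: int = 104) -> float:
    """Multiple on starting demand after steady weekly growth."""
    return (1 + weekly_growth) ** weeks

for rate in (0.01, 0.02, 0.03):
    print(f"{rate:.0%}/week -> {compound(rate):5.1f}x demand in two years")
```

    Steady growth of 1, 2, and 3 percent per week compounds to roughly 2.8x, 7.8x, and 21.6x over two years, which is exactly why Rao plans across a range of scenarios rather than a single point estimate.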

    Three Chip Platforms, One Orchestration Layer

    Anthropic uses Amazon’s Trainium, Google’s TPUs, and Nvidia’s GPUs fungibly. That was not free. Adopting TPUs at scale started around the third TPU generation, when outside observers thought it was a strange choice. Anthropic invested years into compilers and orchestration so workloads can flow across chips by generation and by job type. The team works deeply with Annapurna Labs at AWS to influence Trainium roadmaps because Anthropic stresses these chips harder than almost anyone. The result is what Rao believes is the most efficient utilization of compute across any frontier lab, with a dollar of compute going further inside Anthropic than anywhere else.
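
    To make "fungible" concrete, here is a toy sketch. Nothing in it reflects Anthropic's actual scheduler; it only illustrates the end state the compiler investment buys, where a job's class no longer constrains which chip family runs it.

```python
# Toy illustration only, not Anthropic's real orchestration layer.
# Once compilers abstract the hardware away, inference, RL, and training
# jobs can be placed on any accelerator in a mixed fleet.

FLEET = ["trainium-0", "tpu-0", "gpu-0"]  # hypothetical mixed pool

def schedule(jobs: list[str]) -> list[tuple[str, str]]:
    """Place each job on the next chip round-robin, ignoring chip type."""
    return [(job, FLEET[i % len(FLEET)]) for i, job in enumerate(jobs)]

# The same pool absorbs inference in the morning and RL in the evening.
print(schedule(["inference-am", "inference-am", "rl-pm", "rl-pm"]))
```

    The heavy lifting in practice is the compiler and orchestration work described above; the payoff is this degree of freedom in placement.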

    Three Buckets and the Model Development Floor

    Compute gets allocated across model development, internal acceleration of employees, and customer serving. The conversations are collaborative rather than zero sum, but there is a hard floor on model development that the company refuses to cross even if it makes customer demand harder to serve in the short term. The thesis is simple. The returns to frontier intelligence are extremely high, especially in enterprise, so cutting model investment to chase near term revenue is a bad trade. Internal employee use is also explicitly protected. Rao notes that diverting that internal usage to external customers would unlock billions of additional revenue today, but the compounding benefit of accelerating researchers and engineers outweighs that.

    Intelligence Is Multi Dimensional

    Rao pushes back hard on the IQ framing of model progress. Benchmarks saturate quickly, and the real signal comes from how customers actually use the models. Anthropic looks at long horizon task completion, tool use, computer use, and time to result on agentic tasks. Two equally capable agents that differ only in speed produce dramatically different value, because the faster one compounds into more attempts and more outcomes. Frontier model leaps also improve fuel efficiency: the sedan to sports car analogy breaks down because each Opus generation, 4 to 4.5 to 4.6 to 4.7, delivers both a step up in capability and a multiplier on per token efficiency.

    From 9 Billion to 30 Billion ARR in One Quarter

    The headline number for the quarter is a leap from about $9 billion of run rate revenue to over $30 billion, achieved without a corresponding step up in compute, because new capacity lands on ramps locked in 12 months earlier. Rao attributes the leap to model capability gains, products that surface that intelligence in usable form factors, and an enterprise customer base that pulls more workloads onto Claude as each generation unlocks new use cases. Coding started the wave with Sonnet 3.5 and 3.6, and the same pattern is now playing out elsewhere in the economy.

    Recursive Self Improvement and Talent Density

    Over 90 percent of Anthropic’s code is now written by Claude Code, including most of Claude Code itself. Rao describes this as a structural reason to keep allocating internal compute to employees even when external demand is hungry. Recursive self improvement is not happening through models that need no humans. It is happening through researchers who set direction and use frontier models to compress months of work into days. Talent density beats talent mass. When Meta and other labs went after Anthropic researchers with very large packages, Anthropic lost two people while peer labs lost dozens.

    Procurement Strategy and the Layer Cake

    Compute lands as a layer cake. Last month Anthropic signed a 5 gigawatt TPU deal with Google and Broadcom starting in 2027, alongside an Amazon Trainium agreement for up to 5 gigawatts. The total is north of $100 billion in commitments. A new tie up with xAI’s Colossus facility in Memphis was announced just before the interview, intended for nearer term capacity to support consumer and prosumer growth. Anthropic evaluates near term and long term compute deals against the same set of variables: price, duration, location, chip type, and how efficiently the team can run it. The relationships are deeper than procurement. The hyperscalers are also distribution channels for the model.

    Platform First, Selective Vertical Bets

    Rao describes Anthropic as a platform first business, with most expected value accruing to customers building on the platform. The team will only go vertical when it can either demonstrate capabilities that are skating to where the puck is going, like Claude Code did before the models could fully support it, or when it wants to set a template for an industry vertical, as with Claude for Financial Services, Claude for Life Sciences, and Claude Security. He acknowledges that surprise capability jumps make customers anxious about the platform competing with them, and frames Anthropic’s mitigation as deeper partnerships, early access programs, and an emphasis on accelerating customer building rather than disintermediating it.

    Pricing, Jevons Paradox, and Return on Compute

    Pricing across Haiku, Sonnet, and Opus has been stable. The notable exception is Opus, which Anthropic deliberately repriced lower when launching Opus 4.5 because Opus class problems were being squeezed into Sonnet workloads. Efficiency gains made it possible to serve Opus profitably at the new level. The consumption response was a classic Jevons paradox, with usage rising far more than the price reduction would have predicted, and Opus 4.6 then slotted in at the same price with a capability bump. Margins are not framed as a per token markup. Compute is fungible across model development, internal acceleration, and customer serving, so Anthropic measures return on the entire compute envelope rather than software style variable cost per call.
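
    The breakeven condition behind that repricing decision is worth writing down. The numbers below are hypothetical (the interview quantifies neither the cut nor the usage response): a price cut grows revenue only if consumption rises by more than the price falls.

```python
# Hypothetical figures; the interview quantifies neither the Opus price cut
# nor the usage response. A cut of `price_cut` needs usage growth above
# price_cut / (1 - price_cut) just to hold revenue flat.

def revenue_change(price_cut: float, usage_growth: float) -> float:
    """Fractional revenue change from a price cut plus a usage response."""
    return (1 - price_cut) * (1 + usage_growth) - 1

print(revenue_change(0.50, 1.00))  # 0.0 -> a 50% cut needs 2x usage to break even
print(revenue_change(0.50, 3.00))  # 1.0 -> at 4x usage, revenue doubles
```

    "Usage rising far more than the price reduction would have predicted" means Anthropic landed well above that breakeven line, which is the Jevons outcome Rao describes.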

    Fundraising, DeepSeek, and Capital Intensity

    Rao joined while Anthropic was closing its Series D, in the middle of a frontier model launch and during the FTX share liquidation. Investors initially questioned whether Anthropic needed a frontier model, whether AI safety and a real business could coexist, and why the sales team was so small. The Series E closed the same day the DeepSeek news broke, with markets violently repricing AI in real time. Since Rao joined, Anthropic has raised over $75 billion, with another $50 billion tied to the Amazon and Google compute deals. The reason for the size of the raises is the cone of uncertainty, not current losses; returns on compute today are described as robust.

    Mythos, Cyber Capability, and Phased Releases

    The Mythos release marks the first time Anthropic shipped a model under a deliberately phased rollout because of a specific capability spike. Cyber is the dimension that spiked. Where a prior model found 22 vulnerabilities in an open source codebase, Mythos found roughly 250. The defensive applications, automatically patching massive codebases, are genuinely valuable, but the offensive risk is real enough that Anthropic chose to release to a smaller group first and expand access over time. Rao positions this as a template for future capability spikes, not a permanent restriction. He also describes the relationship with the US administration as cooperative, including the Department of War interaction, with Anthropic supporting a regulatory framework that does not strangle innovation but takes responsibility seriously.

    Claude Inside Finance

    Anthropic’s finance team is one of the strongest internal case studies. Statutory financial statements for every legal entity are produced by Claude, with a human reviewer. A skill library of more than 70 finance specific skills underpins a Monthly Financial Review skill that drafts the monthly close at 90 to 95 percent ready, so leadership meetings shift from explaining the numbers to discussing what to do about them. An internal analytics platform called Anthrop Stats compresses weekly insight cycles from hours to 30 minutes. The biggest internal token user in finance is the head of tax, building policy engines, which Rao highlights as evidence that adoption is driven by the most senior people, not just younger engineers.

    Culture, Co-Founders, and the Race to the Top

    Seven co-founders should not, on paper, work as a leadership group. Rao argues it works because the culture was set early around collaboration, intellectual honesty, transparency, and humility. The culture interview is a real veto, not a checkbox. Dario Amodei runs an all-hands every two weeks, with a short written piece followed by unscripted questions, and decisions, once made, get clean alignment rather than residual politics. Anthropic frames its approach as a race to the top, where being a model for how to build the technology responsibly is itself a recruiting and retention advantage.

    The Virtual Collaborator and the Frontier Ahead

    The product vision Rao describes is the virtual collaborator: not just a smarter chatbot, but an agent with organizational context, access to the company’s tools, memory, and the ability to work on ideas over long horizons. Coding was the first domain to feel this, but CoWork, Anthropic’s extension of the Claude Code pattern into general knowledge work, is being adopted faster than Claude Code was at the same age. Product development inside Anthropic already looks different. Teams ship daily, with fleets of agents working across the company, and individual humans increasingly act as managers of those fleets.

    Downside Risks and What Excites Him Most

    Asked to run a premortem on a softer year, Rao names three risks: slower customer diffusion of model capability into real workflows, scaling laws unexpectedly flattening, and Anthropic losing its frontier position to competitors. None of these is observed today, but he will not rule them out. On the upside, he is most excited about biotech and healthcare: lab throughput rising 10x or 100x, paired with AI-assisted clinical workflows, could turn currently incurable diagnoses into treatable ones within a patient’s lifetime. That is the outcome he wants the technology to chase.

    Thoughts

    The most consequential structural point in this interview is the framing of compute as a single fungible resource pool measured by return on the entire envelope, not as a variable cost per inference call. That accounting shift, if you accept it, breaks most of the bear cases about AI lab unit economics. The bear argument almost always assumes that a token served to a customer is the only thing the chip did that day. Rao’s version is that the same fleet trains models in the morning, runs reinforcement learning at lunch, serves customers in the afternoon, and accelerates internal engineers in the evening. If even half of that is real, the right comparison is total compute spend versus total enterprise value created by the platform, and on that ratio Anthropic looks structurally strong rather than weak.
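    The accounting shift is easy to see with a toy model. The sketch below contrasts the two frames with entirely hypothetical numbers: the per-token view charges the whole fleet's cost against customer inference alone, while the envelope view credits every use of the same chips against the same spend.

    ```python
    # Toy contrast of per-token-margin vs whole-envelope accounting.
    # All figures are invented for illustration; the interview gives
    # no actual cost or value numbers.

    fleet_cost = 100.0  # total compute spend for the period (assumed units)

    # Value attributed to each use of the same fleet (assumed):
    value = {
        "customer_inference": 60.0,  # tokens served to customers
        "model_training":     50.0,  # training and RL for future models
        "internal_accel":     20.0,  # engineer productivity from agents
    }

    # Per-token view: only customer inference offsets the fleet cost.
    per_token_margin = (value["customer_inference"] - fleet_cost) / fleet_cost

    # Envelope view: all uses of the fleet offset the same cost.
    envelope_return = (sum(value.values()) - fleet_cost) / fleet_cost

    print(f"per-token view: {per_token_margin:+.0%}")  # reads as a loss
    print(f"envelope view:  {envelope_return:+.0%}")   # same fleet, positive return
    ```

    Under these assumptions the identical fleet looks 40 percent underwater in the per-token frame and 30 percent positive in the envelope frame, which is exactly why the choice of denominator decides whether the bear case holds.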

    The Jevons paradox on Opus pricing is the most actionable insight for anyone running an AI product. Most teams default to either chasing premium pricing on the newest model or undercutting to chase volume. Anthropic did something more disciplined: it left Sonnet and Haiku alone, dropped the Opus price when efficiency gains made the model servable at a profit, and watched aggregate usage rise faster than the price cut. The lesson is that frontier model pricing is not really a price problem. It is a capability access problem, and elasticity around the right tier is much higher than the standard SaaS playbook implies.

    The Mythos cyber jump deserves more attention than it has gotten. Going from 22 to 250 vulnerabilities found in the same codebase is the kind of capability discontinuity that genuinely changes the regulatory calculus. Anthropic is signaling that it can identify these discontinuities ahead of release and choose a deployment shape that respects them. Whether peer labs adopt similar discipline is the open question. Anthropic’s race to the top framing assumes they will be forced to. The competitive market may say otherwise.

    The hiring data point is the most underrated investor signal. Two departures while peer labs lost dozens, during the most aggressive talent war in tech history, is not a culture poster. It is a structural advantage that compounds every time another lab tries to buy its way to the frontier. Money can be matched. Conviction in the mission, transparent leadership, and a culture interview that can veto otherwise stellar candidates cannot. If you believe scaling laws hold, talent retention at this density is one of the few moats that actually scales with capital.

    Finally, the most interesting personal admission is that Krishna Rao, a finance leader trained at Blackstone and Cedar, openly tells investors that linear thinking is the failure mode he had to break out of. The companies that pattern-match this moment to prior technology waves are mispricing it, in both directions. The cone of uncertainty Anthropic uses internally is the right metaphor for everyone else too. If you are forecasting AI as if it is cloud in 2010, you are almost certainly wrong, and the magnitude of the error is much larger than it would be in any prior era.

    Watch the full conversation with Krishna Rao on Invest Like the Best here.