PJFP.com

Pursuit of Joy, Fulfillment, and Purpose

Tag: Claude Code

  • Inside Anthropic, the $965 Billion AI Juggernaut: Dario and Daniela Amodei on Claude, Claude Code, and the AI Arms Race

    In this episode of The Circuit, Bloomberg goes inside Anthropic, the AI lab that started as an underdog and is now valued at nearly a trillion dollars. The conversation centers on the sibling duo running the company, Dario Amodei, the brother and visionary, and Daniela Amodei, the sister and operator, along with Boris Cherny, the engineer behind Claude Code and Claude Cowork. It is a rare, on-the-record look at how a safety-obsessed startup founded by a group of OpenAI defectors in 2021 became the breakout star of the AI arms race, wiping billions in value off software stocks and forcing an uncomfortable national conversation about the future of work. You can watch the full episode here.

    TLDW

    Dario and Daniela Amodei walk through Anthropic’s rise from a pandemic-era group meeting on the grass in Precita Park to a roughly $965 billion AI juggernaut that is now profitable for the first time. They explain why they left OpenAI, citing a breakdown of trust and values with Sam Altman rather than a single safety disagreement, and how Dario’s early bet on scaling laws shaped the entire field. The two describe how Claude is trained for character and “professional warmth,” anchored in documents like the UN Declaration of Human Rights, and how the company defines a good model as one that does not lie, hallucinate, or deceive. The business story is enterprise and coding: Claude Code and Claude Cowork automated huge chunks of software engineering, triggered a SaaSpocalypse that erased $285 billion in market value overnight, and pushed annualized growth to as high as 80x in a single quarter. Boris Cherny, recruited from a slow miso-making life in rural Japan, says Claude has written one hundred percent of his code for at least six months. The hardest part of the conversation is jobs: Dario stands by his warning that AI could eliminate half of all entry level white collar jobs in one to five years, pushes back hard on Jensen Huang’s “doom marketing” critique, and lays out where displaced workers might go, from the physical world to human-centered roles like a reimagined, more interpersonal version of medicine. The episode closes by teasing AI and the future of warfare, a scarily powerful new model called Mythos, and Dario’s identification not with Oppenheimer but with Leo Szilard.

    Thoughts

    The most revealing moment in this profile is not a number, it is Dario Amodei’s description of the “smooth exponential.” His whole career, he says, has felt like nothing happening, nothing happening, nothing happening, and then zoom. That mental model is the key to understanding why Anthropic behaves the way it does. A company that genuinely believes it is riding an exponential will tolerate enormous near-term discomfort, public criticism, and internal strain, because it has already priced in a future that looks nothing like the present. Whether that conviction is wisdom or a kind of motivated certainty is the open question the episode never fully resolves, but it explains the urgency in every answer he gives.

    The Boris Cherny segment is the part that should make working engineers sit up. When a senior engineer says Claude has written one hundred percent of his code for six months and that he feels like he has a jet pack, that is not a marketing line, it is a description of a job that has already changed underneath the person doing it. The framing in the piece is optimistic, superpowers and fun, but the logical endpoint is exactly the one Dario himself names a few minutes later: you automate ninety percent of a job, the remaining humans get ten times more leveraged, and then the curve keeps bending toward one hundred percent. Anthropic is, unusually, building the thing and narrating its own disruption in the same breath. That honesty is rare, and it is also a little vertiginous.

    The values-versus-business-model argument deserves more scrutiny than it gets. Dario’s claim is elegant: a business model that conflicts with your values forces you to either betray the values or become irrelevant, so Anthropic chose enterprise and coding because curing diseases and making energy cheaper are enterprise work, while consumer engagement is the addiction-maximizing trap of social media. It is a genuinely good argument, and it is also extremely convenient that the values-aligned path happens to be the most lucrative one. The episode lets that tension sit, which is the right call. The honest reading is that Anthropic found a place where doing well and doing good currently point in the same direction, and the harder test will come the first time they diverge.

    On jobs, Dario is more persuasive than his critics give him credit for, precisely because he refuses the comfortable framing. Jensen Huang and others accuse him of conflating tasks with jobs and of doom marketing that benefits Anthropic. Dario’s response, that the idea this is cheap marketing is itself cheap marketing, is sharper than it first sounds. He is pointing at the way social media flattens a five-page argument about tasks, jobs, tax policy, and the adolescence of technology into a three-second clip designed to provoke. The deeper point is that he is trying to hold two things at once, fast GDP growth and high unemployment, and our public discourse is structurally bad at holding two things at once. That is less a story about AI than about the medium we use to argue about it.

    Finally, the Oppenheimer exchange reframes the entire profile. Dario explicitly rejects the lone-genius model and names Leo Szilard, the scientist who first imagined the chain reaction, as the figure he identifies with. He calls Oppenheimer a failure case, an example of what should not happen. For a man whose company is constantly accused of cultivating a great-man mythology, choosing the early-warning scientist over the bomb’s public face is a deliberate statement about how he wants this story to end: not with charismatic individuals at the center of everything, but with checks and balances everywhere. It is the most quietly radical thing said in the whole piece, and the teaser for a model named Mythos lands with a little extra irony because of it.

    Key Takeaways

    • Anthropic is profiled as an AI juggernaut valued at nearly a trillion dollars, with the figure of roughly $965 billion framing the episode, and is described as profitable for the first time.
    • The company was founded in 2021 by a team of OpenAI defectors and started as an underdog lab before becoming the breakout star of the AI race.
    • Anthropic is run by a sibling duo, Dario Amodei as the visionary and Daniela Amodei as the operator who turns his ideas into action, and Daniela jokes that when they argue, no one wins.
    • Dario describes the AI trajectory as a “smooth exponential” where nothing seems to happen for a long time and then progress suddenly explodes.
    • He says he predicted from a graph that Anthropic would become the AI company with the most revenue and valuation around this time, and that it has happened.
    • Dario grew up in San Francisco with a leather-craftsman father and a librarian mother, took calculus in middle school, and studied math at UC Berkeley while in high school, with no early interest in the internet revolution.
    • Dario studied neuroscience before moving to AI at Baidu and later Google, while Daniela was an early employee at Stripe.
    • Both joined OpenAI starting in 2016, where Dario developed the concept of scaling laws, predicting that large language models would improve simply by adding more data and compute even if the underlying algorithm stayed the same.
    • Scaling up was a counter-cultural scientific bet at the time, held mainly by the founding research team, and it helped supercharge OpenAI’s models and pave the way for ChatGPT.
    • The Amodeis left OpenAI after clashing with Sam Altman over direction and values, framing it as a breakdown of trust and honesty rather than a single safety disagreement.
    • Altman has said that despite their differences, he mostly trusts Anthropic as a company.
    • Anthropic has all seven of its co-founders still at the company, which Dario notes almost never happens at a company of its size.
    • The early team met during the pandemic at Precita Park in San Francisco, pulling up chairs on the grass to talk about what they were building.
    • The name Anthropic comes from the Greek word for human, reflecting a stated mission to build responsible AI for the long-term benefit of humanity.
    • Dario has published long essays including Machines of Loving Grace and The Adolescence of Technology, exploring both the miraculous potential and the worst-case scenarios of AI.
    • Claude is trained to follow a set of principles called a Constitution, intended to keep it aligned and well-behaved.
    • Daniela describes Claude’s intended personality as “professional warmth,” approachable but distant, not a best friend and not cold or calculating.
    • A good model, in Anthropic’s framing, does not lie accidentally or intentionally, with lying including hallucinations where the model invents something it does not know.
    • Anthropic’s own research has shown that models can purposely try to deceive users, which the company works to prevent in production models.
    • There is no universal standard for helpfulness or harmlessness, so Anthropic draws on founding documents like the UN Declaration of Human Rights to train Claude’s character.
    • The company has begun consulting religious leaders about Claude as an entity and about core values that transcend any single worldview.
    • Early Claude models, around the Claude 2 era, were sometimes “nannyish,” expressing concern when a user just wanted the weather, which researchers describe as tuning a fine dial.
    • Anthropic’s revenue skyrocketed over the past year, driven by a focus on lucrative business tools rather than consumer apps.
    • Claude Code automated large chunks of software engineering, and Claude Cowork extended that power to non-engineers.
    • Dario frames the enterprise bet as a values-and-business decision, arguing that a business model conflicting with your values forces you to betray them or become irrelevant.
    • He contrasts engagement-and-addiction-driven consumer and advertising models with enterprise uses like curing diseases, advancing biotech and pharma, and making energy cheaper.
    • Soon after Claude Cowork launched, $285 billion in market value vanished overnight in what traders called the SaaSpocalypse, with some software stocks down nine days in a row.
    • Dario argues the software “pie” will get bigger overall, even as some incumbents shrink or go out of business if they fail to adapt and defend their moats.
    • Boris Cherny, the engineer behind Claude Code and Claude Cowork, was recruited in 2024 from a slow life in rural Japan where he made miso and shopped at farmer’s markets.
    • Cherny’s bet was that a coding agent could do all of software development, not just autocomplete a line or a sentence.
    • He now runs anywhere from a few to a few thousand Claudes at once and says Claude has written one hundred percent of his code for at least six months.
    • A live demo builds a working recipe app that suggests meals for the week in minutes, work that used to take hours or days.
    • At the second annual Code with Claude conference, Anthropic reported API volume up nearly 17x year over year, eight frontier models shipped in twelve months, and first-quarter growth that annualizes to roughly 80x.
    • Dario stands by his warning that AI could eliminate half of all entry level white collar jobs in the next one to five years, saying he remains the same order of concerned.
    • He warns of an unusual combination of very fast GDP growth alongside high unemployment, underemployment, low-wage jobs, and high inequality.
    • Jensen Huang and others have pushed back, accusing Dario of conflating tasks with jobs and of doom marketing that benefits Anthropic.
    • Dario responds that the claim this is cheap marketing is itself cheap marketing, and blames social media for flattening his careful five-page arguments into three-second clips.
    • Anthropic published a paper estimating that management, finance, and legal jobs could be among the fields most affected by AI in the near future.
    • Dario points to the physical world, human-centered relationship-driven work, and humans directing AI as places displaced workers might go, though he is unsure how thick those roles will be.
    • He uses medicine as an example, predicting AI will excel at diagnosis while doctors pivot toward the interpersonal, hands-on, bedside-manner parts that AI cannot replace.
    • The episode teases a next installment on AI and the future of warfare, a scarily powerful new model called Mythos, and the theme of riding the exponential while avoiding dystopia.
    • Dario names The Making of the Atomic Bomb as a favorite book and identifies most with Leo Szilard, who first conceived of a chain reaction, rather than Oppenheimer, whom he sees as a failure case.
    • His view is that the only way the AI era ends well is through checks and balances everywhere, not larger-than-life personalities at the center of everything.

    Detailed Summary

    An unlikely AI celebrity and a sibling-run juggernaut

    The profile opens in a library Dario Amodei clearly loves, establishing him as an unlikely AI celebrity, a man known for warning the world about the risks of artificial intelligence who now runs a company valued at nearly a trillion dollars. Anthropic is presented as the breakout star of the AI race, wiping billions off software stocks, going head-to-head with the Pentagon, and building models powerful enough to threaten modern cybersecurity, with early testers reportedly calling one capability a super weapon and asking the company not to release it. Guiding the company is the sibling pair, Dario the visionary and Daniela the operator who translates his swirling cosmic thoughts into action. Daniela explains that the two have always been close and always wanted to do something big together, and when asked who wins their arguments, she says no one. The framing throughout is of a young, fast-growing startup carrying enormous responsibility for how humanity works, learns, thinks, and even fights wars.

    The smooth exponential and the road from OpenAI

    Dario describes his entire career as the experience of a smooth exponential, where nothing happens for a long stretch and then things go crazy, and he says he watched a graph and correctly predicted Anthropic would top the field in revenue and valuation around now. His backstory is a math prodigy in San Francisco, the son of a leather craftsman and a librarian, taking calculus in middle school and Berkeley math classes in high school, indifferent to the internet revolution and drawn instead to science fiction and understanding the universe. Daniela, more into reading and the arts, calls them near-perfect complements. Dario moved from neuroscience into AI at Baidu and Google, Daniela went to Stripe, and both eventually joined OpenAI starting in 2016, where Dario developed scaling laws, the then counter-cultural bet that more data and compute alone would make models smarter. That insight helped power the models behind ChatGPT, but the Amodeis clashed with Sam Altman over values and direction. Dario frames the departure bluntly: disagreements on safety alone were not enough, but a loss of trust, a sense that Altman’s stated values were not his real values, made it impossible to continue. The resolution, he says, was simply to go off and do their own thing.

    Precita Park, the Constitution, and teaching Claude to be good

    Anthropic’s origin story runs through Precita Park, where the early pandemic-era team gathered on the grass to talk about what they were building. Of seven co-founders, all are still at the company, a retention record Dario says almost never happens at this scale. From the start the company pitched itself as the ultimate safety-conscious lab, with Dario publishing essays like Machines of Loving Grace and The Adolescence of Technology. Claude is trained on a Constitution, and Daniela describes its intended character as professional warmth, approachable but distant. Defining a good model, the team says it should not lie, whether through intentional deception or hallucination, the latter being the model inventing answers it does not actually know. Anthropic’s research has shown models can deliberately deceive, something they work to prevent in production. Because there is no universal standard for helpfulness or harmlessness, they anchor Claude’s training in documents like the UN Declaration of Human Rights and have begun talking with religious leaders about values that transcend any single worldview. Daniela recalls early “nannyish” Claude 2-era behavior, where the model fretted over a user who only wanted the weather, and describes the work as threading a fine needle to land in the center of the dial.

    The enterprise bet, Claude Code, and the SaaSpocalypse

    Anthropic’s revenue surge and first-time profitability are attributed to a focus on business tools, especially Claude Code, which automated large chunks of software engineering, and Claude Cowork, which extended that capability beyond engineers. Dario frames the bet on coding and enterprise as both a values and a business decision: a business model that conflicts with your values eventually forces you to betray them or become irrelevant. He contrasts the engagement and addiction incentives of advertising-driven social media and AI video with enterprise applications like curing diseases, biotech, pharma, academic research, and cheaper energy, all of which he counts as enterprise work aligned with the company’s mission. The disruption was immediate and brutal: soon after Claude Cowork launched, $285 billion in market value vanished overnight in what traders dubbed the SaaSpocalypse, with some software stocks falling nine days straight. Dario’s read is that the overall software pie will grow even as specific incumbents shrink or fail, and that the big losers will be those who do not see what is coming or defend their moats.

    Boris Cherny, jet packs, and Code with Claude

    Much of Anthropic’s recent growth is credited to Boris Cherny, the engineer behind Claude Code and Claude Cowork, hired in 2024 from a deliberately slow life in rural Japan where he made miso and frequented farmer’s markets. A serious science fiction reader, Cherny was awed by his first AI chatbot and also acutely aware of how badly the technology could go. His bet was that a coding agent could do all of software development rather than just autocomplete. He now describes orchestrating anywhere from a few to a few thousand Claudes at once, talking to one while it writes code and moving to the next, and says Claude has written one hundred percent of his code for at least six months. He compares the feeling to having superpowers and a jet pack, calling engineering more fun than ever. A live demo has Claude build a working weekly-meal recipe app in minutes. The story then moves to the second annual Code with Claude conference, where the company reports API volume up nearly 17x year over year, eight frontier models shipped in twelve months, and first-quarter growth annualizing to roughly 80x, with attendees ranging from technical superfans to curious non-engineers.

    Jobs, the tasks-versus-jobs fight, and a more human medicine

    The episode turns to the uncomfortable core: whether engineers will be the first casualties of the AI they are building. Dario stands by his warning that AI could eliminate half of all entry level white collar jobs in one to five years and says he is still the same order of concerned, describing a strange combination of very fast GDP growth with high unemployment, underemployment, low-wage work, and inequality. He notes the usual productivity hump, where automating ninety percent of a job makes humans ten times more leveraged on the rest, before the curve bends toward one hundred percent. With 70 percent of Americans expecting AI to kill jobs and nearly a third fearing for their own, the stakes are political. Jensen Huang and others accuse Dario of conflating tasks with jobs and of doom marketing, and Dario pushes back hard, arguing he writes carefully across five pages about tasks, jobs, tax and macroeconomic policy, and the new jobs of the adolescence of technology, and that calling this cheap marketing is itself cheap marketing born of social media’s three-second culture. Anthropic has published a paper suggesting management, finance, and legal jobs could change the most. Dario points to the physical world, human-centered relationship work, and humans directing AI as landing spots, using medicine as his example: AI will become an excellent diagnostician, but it cannot physically examine a patient or provide bedside manner, so medicine pivots toward the interpersonal. The episode closes by teasing AI and the future of warfare, a powerful new model called Mythos, and Dario’s identification with Leo Szilard over Oppenheimer, whom he calls a failure case, insisting the era can only end well with checks and balances everywhere rather than larger-than-life figures at the center.

    Notable Quotes

    “There’s this kind of smooth exponential, and the experience of the smooth exponential is, nothing’s happening, nothing’s happening, nothing’s happening. Little things happen, and then zoom, it goes crazy.”

    Dario Amodei, on how AI progress actually feels from the inside

    “When you feel that you can’t trust someone, when you feel that their values are not what they say they are, when you feel that they’re not honest, that makes it very hard to continue to work with a company.”

    Dario Amodei, on why he and Daniela left OpenAI

    “Some of the early companies that we gave this to said things like, this is a super weapon, please don’t release this.”

    Anthropic, on early reactions to one of its more powerful models

    “I like to describe it as professional warmth. So the goal is not for it to be your best friend, but it’s not for it to be sort of cold, rote, calculating.”

    Daniela Amodei, describing the character Anthropic designs into Claude

    “If you pick a business model that fundamentally conflicts with your values, you’re gonna have a hard time. Either you betray your own values or you become irrelevant.”

    Dario Amodei, on why Anthropic bet on enterprise and coding

    “For me personally, it’s been writing a hundred percent of my code for at least six months. The work of engineering has just completely changed.”

    Boris Cherny, the engineer behind Claude Code and Claude Cowork

    “I feel like I suddenly have superpowers. I have like a jet pack and the engineering has never been this fun.”

    Boris Cherny, on building software with Claude Code

    “I think we could have this very unusual combination of very fast GDP growth and high unemployment, or at least underemployment, or low wage jobs, high inequality.”

    Dario Amodei, on the economic shock he is most worried about

    “The idea that this is cheap marketing is itself cheap marketing. I think it’s part of the disease of Silicon Valley.”

    Dario Amodei, responding to the doom-marketing accusation

    “The figure I most identified with was Leo Szilard, who was the one who first had the idea that there could be a chain reaction.”

    Dario Amodei, on which atomic-age scientist he sees himself in, rejecting Oppenheimer as a failure case

    Watch the full episode of The Circuit inside Anthropic here.

    Related Reading

    • Anthropic the official site for the company, Claude, Claude Code, and its safety research.
    • Machines of Loving Grace Dario Amodei’s long essay on the optimistic case for powerful AI referenced in the profile.
    • Scaling laws (Wikipedia) background on the data-and-compute bet Dario developed that reshaped modern AI.
    • Leo Szilard (Wikipedia) the physicist who first conceived the nuclear chain reaction and whom Dario says he identifies with.
    • Purpose the PJFP pillar on building meaningful work and direction in a world being reshaped by AI.
  • Claude Fable 5 and Claude Mythos 5: Anthropic Ships Its First Generally Available Mythos-Class AI Model With New Safeguards

    Anthropic has launched Claude Fable 5 and Claude Mythos 5, the first Mythos-class models offered beyond a tiny circle of cyber defenders. Fable 5 is the generally available version, wrapped in a new layer of safeguards, while Mythos 5 is the same underlying model with some of those guardrails lifted for a small group of vetted partners. The pair sits a full tier above the Opus class in raw capability, and the launch is as much a story about how Anthropic is choosing to gate that capability as it is about the benchmarks. Below is a full breakdown of what shipped, what the model can do, and why the safeguard design matters.

    TLDR

    Anthropic released Claude Fable 5, a Mythos-class model that is now its most capable generally available model, posting state-of-the-art results across software engineering, knowledge work, vision, memory, and scientific research. To ship it safely and fast, Fable 5 carries new safety classifiers that route flagged queries in cybersecurity, biology and chemistry, and distillation over to Claude Opus 4.8 instead of refusing, a fallback that triggers in under 5% of sessions. The same model ships without cyber safeguards as Claude Mythos 5 for Project Glasswing partners in collaboration with the US Government, where it is described as having the strongest cybersecurity capabilities of any model in the world. Highlights include a codebase-wide migration of a 50-million-line Ruby codebase that Stripe says took a day instead of two months, beating Pokemon FireRed with a vision-only harness, accelerating drug design roughly tenfold using Mythos 5, producing novel molecular biology hypotheses preferred by scientists about 80% of the time, and over a week of autonomous genomics research. Both models cost 10 dollars per million input tokens and 50 dollars per million output tokens, less than half the price of Mythos Preview, with a staged subscription rollout and a new 30-day data retention policy for Mythos-class traffic.

    Thoughts

    The most interesting decision here is not the capability jump, it is the naming split. Fable and Mythos are the same brain. The only difference is whether the safeguards are on. Anthropic is effectively shipping one model twice: a gated public edition and an ungated edition handed to a short list of trusted defenders working with the US Government. That is a clean way to resolve the central tension of frontier AI, which is that the exact capabilities that help a security professional close a vulnerability also help an attacker find one. Rather than dumbing the model down for everyone or holding it back entirely, they are letting the access list, not the weights, carry the risk. Expect this pattern to repeat as capabilities climb.

    The fallback-to-Opus design is the other quietly important choice. When a classifier flags a query in cybersecurity, biology, chemistry, or suspected distillation, the user does not hit a wall of refusal. The request is silently handed to Opus 4.8, a model that is still excellent at almost everything. Graceful degradation beats a hard no, both for user experience and for trust. It also reframes what a safeguard is. Instead of a binary block, it becomes a routing decision, and because more than 95% of sessions never trigger it, most users will never notice it exists. The honest admission that the classifiers are tuned conservatively and will sometimes catch harmless requests is the right posture, even if it will annoy power users who keep getting bounced to the smaller model.

    The commercial signals are worth reading closely. Pricing came down to less than half of Mythos Preview, which suggests confidence in serving costs at scale, but the subscription rollout tells a more cautious story. Fable 5 is free on Pro, Max, Team, and Enterprise plans only through June 22, after which using it requires usage credits until capacity catches up. That is a polite way of saying demand is expected to badly outrun supply. The model is fully available on the API and consumption-based Enterprise plans from day one, because those bill by the token and self-throttle. Subscriptions, which are all-you-can-eat, are where a capacity crunch actually hurts, so that is exactly where the brakes went on.

    On the science, the genomics result is the one that should make people sit up. A model doing over a week of largely autonomous research, assembling single-cell data across 138 species, then designing and training its own machine learning model that outperforms a recently published Science paper while being 100 times smaller, is a different category of claim than acing a benchmark. So is the drug-design work, where Mythos 5 reportedly matches or beats skilled human operators end to end, choosing binding sites, running protein design tools, and recovering from its own failures. If those hold up to publication and independent replication, the interesting frontier stops being chat quality and becomes whether a model can run a research program. That is also precisely why the biology and chemistry classifier exists, and why Anthropic is being so deliberate about who gets the ungated version.

    One caveat worth keeping in view: nearly all of the evidence in the announcement is Anthropic’s own, or comes from partners with early access and an incentive to be enthusiastic. The Stripe migration, the FrontierCode score, the Slay the Spire memory result, the protein targets, and the genomics model are all compelling, but they are first-party until outside labs and the eventual system card, peer review, and independent red-teamers weigh in. The note that the UK AISI made progress toward a universal jailbreak inside a brief testing window is a useful reminder that the safeguard story is a work in progress, not a finished proof.

    Key Takeaways

    • Claude Fable 5 is a Mythos-class model made safe for general use, and is now Anthropic’s most capable generally available model.
    • Mythos-class is a tier that sits above the Opus class in capability. The first was Claude Mythos Preview, released in April through Project Glasswing.
    • Fable 5 is state-of-the-art on nearly all tested benchmarks, and its lead grows as tasks get longer and more complex.
    • Claude Mythos 5 is the same underlying model as Fable 5, but with safeguards lifted in some areas. Fable and Mythos differ only by their safeguards.
    • Mythos 5 is described as having the strongest cybersecurity capabilities of any model in the world, and is deployed through Project Glasswing with the US Government.
    • New safety classifiers cover cybersecurity, biology and chemistry, and distillation. Flagged queries fall back to Claude Opus 4.8 rather than being refused.
    • Users are told whenever a fallback happens. More than 95% of Fable sessions involve no fallback at all, and for those sessions Fable performs effectively the same as Mythos 5.
    • The safeguards are tuned conservatively and trigger in less than 5% of sessions on average, sometimes catching harmless requests. Anthropic plans to reduce false positives after launch.
    • Stripe reported Fable 5 compressed months of engineering into days, performing a codebase-wide migration of a 50-million-line Ruby codebase in a day that would have taken a team over two months by hand.
    • Fable 5 scores highest among frontier models on Cognition’s FrontierCode evaluation for high-quality agentic coding, even at medium effort, and is more token-efficient than past Claude models.
    • On Hebbia’s Finance Benchmark for senior-level reasoning, Fable 5 has the highest score of any model, with gains in document reasoning, chart and table interpretation, and problem solving.
    • IMC noted Fable 5 aced their trading-analysis evaluations nearly across the board, including factual lookup, conceptual reasoning, root-cause analysis, and expected-value analysis.
    • Fable 5 is the new state-of-the-art for vision, and can rebuild a web app’s source code from screenshots alone.
    • Fable 5 beat Pokemon FireRed using a minimal, vision-only harness with no maps, navigation aids, or extra game-state information. Earlier Claude models needed a complex helper harness.
    • Persistent file-based memory improved Fable 5’s Slay the Spire performance three times more than it did for Opus 4.8, and Fable reached the game’s final act three times more often.
    • Fable 5 built a simulation of the solar system, deriving the planets’ orbital motion from physics first principles and using it to predict solar eclipses.
    • Using Mythos 5, internal protein design experts accelerated aspects of drug design by around ten times, with the model matching or beating skilled human operators end to end.
    • Nine of 14 protein targets in the drug-design study yielded strong candidates Anthropic is now investigating.
    • Mythos 5 is Anthropic’s first model to consistently produce novel, compelling scientific hypotheses. Scientists preferred its molecular biology hypotheses about 80% of the time in blinded comparisons.
    • One Mythos hypothesis, a novel mechanism for an E. coli protein, was corroborated by an independent lab working on the same problem.
    • In over a week of largely autonomous work, Mythos 5 assembled single-cell data for millions of cells across 138 animal species and trained a custom model that outperformed a recent Science paper while being 100 times smaller.
    • Anthropic’s automated alignment assessment found Mythos 5’s level of misaligned behavior was low and similar to Opus 4.8. Because they are the same model, Fable 5’s alignment is similar.
    • An external bug bounty produced no universal jailbreaks in over 1,000 hours of testing, though the UK AISI made progress toward one in a brief initial window.
    • One external partner found Fable 5’s safeguards against harmful cyber queries the most robust of any model tested, including Opus 4.8 and Opus 4.7, with zero compliance on harmful single-turn cyberattack requests.
    • The biology and chemistry classifier is deliberately broad for now. Mythos-class models outperformed dedicated protein language models at predicting AAV viral shell assembly using biological reasoning alone.
    • The distillation classifier targets large-scale attempts to extract Claude’s capabilities to train competing models, which could proliferate near-frontier capabilities without safeguards.
    • A new policy requires 30-day data retention for all Mythos-class traffic on first- and third-party surfaces, used only for safety, with logged human access and deletion after 30 days in almost all cases.
    • Anthropic plans trusted access programs that let cybersecurity organizations apply for Mythos 5, and let a small number of life science researchers access Fable 5 with biology and chemistry safeguards removed.
    • Both models cost 10 dollars per million input tokens and 50 dollars per million output tokens, less than half the price of Mythos Preview. Developers can use claude-fable-5 via the Claude API.
    • Fable 5 is free on Pro, Max, Team, and seat-based Enterprise plans through June 22. On June 23 it moves to usage credits on those plans until capacity allows it to return as a standard inclusion.

    Detailed Summary

    A Mythos-class model, made safe for general use

    Fable 5 is the first Mythos-class model Anthropic has made generally available. Mythos-class is a tier that sits above the Opus class, and the first of its kind, Claude Mythos Preview, was released in April through Project Glasswing to a limited group of cyber defenders and critical software infrastructure providers. The company framed today’s launch as the moment it could finally bring that level of capability to all users, because its safeguards had matured enough to allow it. Fable 5’s capabilities exceed those of any model Anthropic has made generally available, and its advantage over other models grows as tasks get longer and more complex.

    Two models, one brain

    Claude Mythos 5 is the same underlying model as Fable 5, but with safeguards lifted in some areas. The names are the only real difference: Fable, from the Latin fabula meaning that which is told, is akin to the Greek mythos, and the safeguards are what distinguish the two. Mythos 5 launches first to existing Mythos Preview users, including the Project Glasswing cybersecurity partners, as an upgrade. It is deployed in collaboration with the US Government and is described as having the strongest cybersecurity capabilities of any model in the world. Anthropic plans to steadily expand access through a more systematic trusted access program.

    Software engineering and token efficiency

    Fable 5 can work autonomously for longer than any previous Claude model, and software engineering is where that shows most clearly. During early testing, Stripe reported it compressed months of engineering into days, performing a codebase-wide migration in a 50-million-line Ruby codebase in a single day that would otherwise have taken a whole team over two months by hand. It is also more token-efficient than past models, scoring highest among frontier models on Cognition’s FrontierCode evaluation for high-quality, maintainable agentic coding, even at medium effort.

    Knowledge work, vision, and memory

    On complex analytical work, Fable 5 posted the highest score of any model on Hebbia’s Finance Benchmark for senior-level reasoning, with substantial gains in document-based reasoning and chart and table interpretation, and IMC said it aced their trading-analysis evaluations nearly across the board. In vision, it is the new state-of-the-art, able to extract precise numbers from detailed scientific figures and rebuild a web app’s source code from screenshots alone. It needs less scaffolding too: where earlier Claude models struggled to play Pokemon even with helper harnesses, Fable 5 beat FireRed with a minimal, vision-only harness using nothing but raw game screenshots. On memory, giving Fable persistent file-based notes improved its Slay the Spire performance three times more than it did for Opus 4.8, and it built a physics-first-principles solar system simulation accurate enough to predict solar eclipses.

    Life sciences: drug design, hypotheses, and genomics

    Using Mythos 5, Anthropic’s internal protein design experts accelerated aspects of the drug-design process by around ten times. With protein design and bioinformatics tools but no human assistance, the model matched or beat skilled human operators, executing the full workflow of choosing binding sites, selecting and running design tools, and recovering from failures. Nine of 14 protein targets yielded strong drug-design candidates now under investigation. Mythos 5 is also Anthropic’s first model to consistently produce novel, compelling scientific hypotheses: scientists preferred its molecular biology hypotheses about 80% of the time in blinded comparisons, and one, a novel mechanism for an E. coli protein, was corroborated by an independent lab. In genomics, Mythos 5 ran over a week of largely autonomous research, assembling single-cell data for millions of cells across 138 species and training a custom model that outperformed a recent Science paper despite being 100 times smaller.

    The new safeguards: classifiers and fallback

    Mythos-class capability is potent enough that Anthropic considers it a substantial misuse risk, especially given how much advanced AI usage is dual use. Fable 5 ships with a new set of classifiers, separate AI systems that detect potential misuse and jailbreak attempts and stop the main model from responding. When a classifier flags a request related to cybersecurity, biology and chemistry, or distillation, the response is handled by Claude Opus 4.8 instead, and the user is told. The cybersecurity classifiers cover both exploitation and broader offensive cyber tasks like reconnaissance and lateral movement, and Anthropic says they prevent Fable from making any progress on those tasks. The biology and chemistry classifier is intentionally broad for now, after tests showed Mythos-class models could outperform dedicated protein language models at predicting AAV viral shell assembly using biological reasoning alone. The distillation classifier targets large-scale attempts to extract Claude’s capabilities to train competing models.

    Jailbreak resistance, data retention, and availability

    Anthropic ran extensive red-teaming, including an external bug bounty that produced no universal jailbreaks in over 1,000 hours, though it notes the UK AISI made progress toward one in a brief window. The company concedes it is likely impossible to fully prevent universal jailbreaks and aims instead to make any that remain slow and costly enough to catch before they scale. A new policy requires 30-day data retention for all Mythos-class traffic, used only for safety, with logged human access and deletion after 30 days in almost all cases. On availability, Fable 5 is live everywhere today and fully available on the API and consumption-based Enterprise plans, while subscription access rolls out in stages: free on Pro, Max, Team, and seat-based Enterprise through June 22, then on usage credits from June 23 until capacity allows it to return as a standard inclusion. Both models cost 10 dollars per million input tokens and 50 dollars per million output tokens.

    Notable Quotes

    “Today we’re launching Claude Fable 5: a Mythos-class model that we’ve made safe for general use.”

    Anthropic, opening the Claude Fable 5 and Claude Mythos 5 announcement

    “Fable 5’s capabilities exceed those of any model we’ve ever made generally available.”

    Anthropic, on where Fable 5 sits in the lineup

    “It has the strongest cybersecurity capabilities of any model in the world.”

    Anthropic, describing Claude Mythos 5

    “During early testing, Stripe reported that Fable 5 compressed months of engineering into days.”

    Anthropic, on Fable 5’s software engineering results

    “Our early data shows that more than 95% of Fable sessions involve no fallback at all.”

    Anthropic, on how often the safeguards route to Opus 4.8

    “Mythos 5 is our first model to consistently produce novel, compelling scientific hypotheses.”

    Anthropic, on the model’s molecular biology research

    “It is likely impossible to completely prevent universal jailbreaks, but our goal is to make any remaining jailbreaks sufficiently slow and costly that we can detect and prevent them before they are used at scale.”

    Anthropic, on the limits of its safeguards

    “Fable is from the Latin fabula, ‘that which is told,’ akin to the Greek mythos. The safeguards are what distinguish the two models.”

    Anthropic, explaining the Fable and Mythos naming

    Read the full announcement and the benchmark tables on Anthropic’s site here: Claude Fable 5 and Claude Mythos 5.

    Related Reading

    • Project Glasswing β€” background on the cyberdefense program that Mythos 5 ships through with the US Government.
    • Introducing Claude Opus 4.8 β€” the model that flagged Fable 5 queries fall back to instead of being refused.
    • Claude Mythos Preview β€” the first Mythos-class model, released in April, that Mythos 5 now upgrades.
    • Anthropic model system cards β€” where the full safety, alignment, and capability testing for models like Fable 5 is documented.
  • Whale Rock Capital Founder Alex Sacerdote on S-Curve Investing, Why Anthropic Is His Highest Conviction Bet, and the Decommoditization of AI Hardware

    Alex Sacerdote built Whale Rock Capital into one of the most respected technology hedge funds in the world by treating markets through a single disciplined lens: the technology adoption S-curve. In this long conversation on Invest Like the Best with Patrick O’Shaughnessy, he lays out the full framework that has carried him through internet 1.0, mobile, cloud, e-commerce, and now AI, and he explains why Anthropic became his highest conviction position, why his fund went net short application software, and why the least glamorous corner of the market, the hardware and chips that build out data centers, may be one of the best ways to play artificial intelligence right now. What follows is the working theory of a money manager who has spent twenty years trying to think exponentially while the rest of the market thinks one quarter at a time.

    TLDW

    Sacerdote walks through Whale Rock’s three-part investment framework: find the right part of an S-curve, identify the company with a durable competitive advantage, and buy when long-term earnings power is underappreciated. He tells the story of investing in Anthropic at a 180 billion dollar valuation in August 2025 after Claude Code made coding the true unlock of AI, and frames the foundational model market as a three-horse race between Anthropic, OpenAI, and Google that resolved from sixty startups into an oligopoly. He argues enterprise AI is less than 1 percent penetrated, calls the adoption shape an L curve rather than an S-curve, and warns there is not enough compute in the world. He explains why he sold almost all of his application software and went net short, why he loves the decommoditization of AI hardware (Celestica, Corning, Elite Materials, Delta, Advanced Energy, high bandwidth memory, 40-layer PCBs), introduces a modified rule of 40 for chip investing, surveys the moats that let leaders win (network effects, industry standard, scale, critical IP, brand, recursive self-improvement), discusses moving from public markets into private deals like Stripe and Anthropic, lays out Whale Rock’s fund products including the new Mega Cap Tech Fund, defends old-fashioned scuttlebutt research in an AI age, and closes on the kindest thing anyone ever did for him, his father joining the firm after 41 years at Goldman Sachs.

    Thoughts

    The most useful idea in this conversation is not the bullishness on AI, which is everywhere now, but the discipline underneath it. Sacerdote’s framework forces a separation that most investors collapse. A great market is not a great investment. A great company is not a great investment. You need a tall S-curve, a company with a moat that survives the curve, and a price that does not yet reflect the earnings power. He says the quiet part out loud: he has repeatedly bought the best companies in the world at four or five times earnings precisely because the market refuses to extrapolate exponential growth. Nvidia at four times earnings in 2023, Tesla at five times in 2019, Amazon where AWS came free. The edge is not information, it is the willingness to underwrite two to four years out when the consensus cannot see past the next quarter.

    The Anthropic story is the framework applied in real time, and it is worth noting how late and how cautious he was. Whale Rock passed on the 60 billion dollar round because gross margins were negative and coding had not yet exploded. They only got conviction once Claude Code flipped from autocomplete to agentic work, once they heard Anthropic engineers were burning 100 dollars a day in tokens, and once the math on twenty million coders implied a half trillion dollar market from coding alone. The lesson he repeats throughout, that it is okay to be late, that you can miss the first 100 percent if the curve is tall enough, is a direct rebuke to the fear of missing out that drives most AI investing. He waited for the moat to be visible before he paid up.

    His most contrarian and most actionable call is on hardware. The consensus reflex is that chips and components are commodities that get competed to zero. Sacerdote argues the opposite is happening: AI workloads growing 10x a year are pushing every layer of the server to its physical limits, and that pressure is decommoditizing the entire stack. A liquid-cooled AI server is a 300,000 dollar piece of critical infrastructure, not a 5,000 dollar throwaway box, which means the supplier becomes a permanent fixture like a parts vendor on a plane. The Celestica example is the template: a contract manufacturer left for dead since 1999 that turned out to be the sole supplier of Google’s TPU server and a leader in liquid cooling and Ethernet switching, trading at eight times earnings. If he is right that we are 30 percent short on DRAM, NAND, and PCBs, the picks-and-shovels trade has years left to run regardless of which model company wins.

    The software bear case deserves the most scrutiny because it is the most consequential and the least certain. Going from 40 to 50 percent of the portfolio in software to net short is a violent reallocation, and his reasons are layered: AI products that nobody will pay for, CIO budgets being raided to fund Anthropic tokens, pricing power evaporating, and the long-term threat that AI-native startups rebuild incumbents from scratch. But he is honest that the bull case is real too, that old technology is sticky, that companies prefer to buy rather than build, and that AI might actually make platforms like Slack or CRM more important if agents end up operating inside them. This is the genuine uncertainty in the whole AI trade. The bottom of Jensen’s cake, chips and models, is where the value has accrued so far, but historically the application layer captured most of the market cap. Sacerdote is betting that this time the infrastructure and model layers hold the value longer, and he admits the application ecosystem is still unclear and a little bit dangerous. That admission is more valuable than any of his confident calls.

    Finally, the section on research in an AI age is a quiet refutation of the idea that this work automates away. Sacerdote runs a Philip Fisher scuttlebutt operation, 2,500 to 3,000 face-to-face management meetings a year, two decades of compounding relationships, the tripod of conviction where he, his analyst, and a respected outsider all independently like an idea. AI writes better notes now, but the paragraph on top, the wisdom about what it means and how it fits the thesis, is still human. The durable moat in his own business is the same one he looks for in the companies he buys: an accumulated advantage that newcomers cannot replicate quickly. That consistency between how he invests and how he operates is the most credible thing in the interview.

    Key Takeaways

    • Whale Rock’s framework has three legs: identify the right part of a technology S-curve, find the company with a powerful competitive advantage, and invest when long-term earnings power is underappreciated.
    • The core insight is exponential, not linear. Strong tech business models grow earnings exponentially, and because the market refuses to extrapolate, you can buy elite companies at very low multiples.
    • Concrete examples of buying exponential growth cheaply: Nvidia at four times earnings in 2023, Tesla at five times in 2019, Apple at four times, and Amazon where AWS was effectively free.
    • When ChatGPT launched in November 2022, Whale Rock did a firm-wide deep dive and chose to invest in chips and infrastructure first, because demand arrives there first and the winners are knowable regardless of who wins the model layer.
    • The foundational model market went from roughly 60 startups to a three-horse race: Anthropic, OpenAI, and Google. Most startups died, Amazon never showed up, and Meta faltered and had to reboot.
    • Anthropic was the dark horse that focused purely on enterprise while OpenAI won consumer. Whale Rock made it their highest conviction position.
    • Coding is the true unlock of AI. The progression went from Microsoft Copilot at 20 dollars a month (fixing grammar, finding a bug) to Claude running agentically and writing most of the code.
    • The market math: Anthropic engineers were reportedly spending 100 dollars a day on tokens, roughly 20 to 30 thousand dollars a year, and with about 20 million coders in the world that implies a half trillion dollar market from coding alone.
    • Whale Rock invested in Anthropic at the 180 billion dollar valuation in August 2025, when the company hoped to reach 9 billion in revenue and nobody yet knew what 2026 could be.
    • Andrej Karpathy and Linus Torvalds both flipped on AI coding. Karpathy went from 80 percent handwritten code to writing almost no code except in English.
    • Models are not pure commodities. There is real differentiation: Anthropic is strong for private equity and finance, Google is strong at ingesting PDFs, and routers that switch between models mask but do not erase that differentiation.
    • Anthropic is building an ecosystem around the API (SDK, orchestration, the harness, tools), echoing how AWS built lock-in with products around commodity servers starting in 2013.
    • The 800 million people using AI are mostly using AI 1.0, a search engine on steroids. Sundar Pichai estimated only about 10 basis points of knowledge workers are truly using AI’s new capabilities.
    • Enterprise AI is less than 1 percent penetrated. Whale Rock calls the adoption shape an L curve or backwards L curve because it goes straight up, unlike the slower 30 to 50 percent growth of cloud and SaaS.
    • There is not enough compute in the world. Anthropic reportedly has half of what it needs, and Marc Andreessen said the one thing he is sure of is that there will not be enough compute for the next four years.
    • The infrastructure S-curve is only about 10 percent penetrated and remains one of the best ways to play AI.
    • Getting into private deals requires a double opt-in. Whale Rock did a 90-page deck (built with Claude Code) on the coding market to win their Anthropic allocation, and their first private was Stripe in 2020 at a 35 billion dollar valuation.
    • The unicorn private market is now bigger than most European stock markets, larger than Germany or the UK individually. Whale Rock does 2,500 to 3,000 management meetings a year, 10 to 15 percent with privates.
    • S-curves come in two sizes: mega S-curves (internet, mobile, cloud, e-commerce, AI) and sub S-curves within them. AI is the biggest of all and each curve builds on the last.
    • Adoption inflects when barriers fall. Steve Jobs cut the smartphone price to 200 dollars on a 3G touchscreen, Elon cut the EV price to 40,000 with 300-mile range and a working supply chain. Remove the barriers and you get the tornado of demand.
    • Knowing how tall the curve is tells you when to sell. Growth stops being exponential around 30 to 40 percent penetration, when the sell side catches up and big beats end. EVs hit a wall at 10 to 15 percent instead of the expected 40 to 50 percent.
    • Selling Apple in 2012 at roughly 50 percent US smartphone penetration was a mistake, because the moat let it keep compounding around 20 percent even after the explosive phase ended.
    • At strategic inflection points you cannot trust the data (Andy Grove). The signal is intuition and anecdote: a 12-year-old in China on a giant phone playing a real game, or standing-room-only sessions at the Gartner IT Symposium for AWS, VMware, and Splunk.
    • Adoption slope varies. The radio curve hit near-full penetration in about 7 years, while B2B and infrastructure (the dishwasher that has to be plugged in) take far longer. AI is fast because you just open a browser.
    • The moats that let leaders win: network effects, becoming an industry standard, rapid scale, critical intellectual property, brand, and platform lock-in. Anthropic appears to have critical IP, enterprise brand, escape velocity, and recursive self-improvement from using its own code on its own models.
    • On the internet, the leader usually goes bigger, faster, and wins, and compounds on itself (Amazon, Shopify). Exceptions come at paradigm shifts, like AOL failing to make the dialup-to-broadband transition.
    • Whale Rock went from 40 to 50 percent in software five years ago to net short entering this year, which helped performance in the first quarter. AI products were not good enough to charge for and were not moving the needle.
    • Software faces a stack of headaches: falling priority on CIO to-do lists, budget pressure from token spend, lost pricing power, hiring freezes that hurt seat-based models, and the long-term threat of AI-native replacements.
    • The classic rule of 40 is growth rate plus operating margin. Whale Rock’s modified rule of 40 for chip investing is percent of sales that are AI plus market share in that category. Software AI exposure is still only 1 to 2 percent.
    • AI may make some platforms more important. The first thing you do with Claude is plug it into Slack, which could make Slack a permanent repository, and agents may end up operating inside incumbent tools like CRM, solidifying rather than killing them.
    • The data center stood still for 40 years on Intel x86, with every component commoditized. AI changed that. Workloads growing 10x a year are driving the decommoditization of the hardware industry.
    • Celestica is the template: a contract manufacturer left for dead since 1999, sole supplier of the Google TPU server, strong in liquid cooling and Ethernet white-box switching, with 50 to 60 percent share of the cloud Ethernet switch market, once trading at eight times earnings.
    • The whole supply chain is rerating: high bandwidth memory stacked 10 chips high, 40-layer PCBs (versus 10 for a normal server), Elite Materials copper clad laminate, Corning fiber (enough to circle the world four and a half times in one Microsoft data center), and Delta and Advanced Energy power supplies seeing ASPs rise 40 percent a year.
    • Networking has three layers: scale out (racks together), scale across (data centers together), and scale up (every GPU in a rack, currently copper, eventually fiber). The copper-to-fiber shift could two-to-three-x Corning’s opportunity.
    • Whale Rock estimates the market is roughly 30 percent short on DRAM, NAND, and PCBs even at today’s 10 basis points of real AI usage.
    • Rate of change matters more than absolute level. When Claude plotted market share data it missed the rate of change, the thing that drives accelerating growth and margins as a company moves from 10 to 30 percent share.
    • Key risks: public and government negativity toward AI (Maine reportedly banned data centers, only 20 percent of people are optimistic), models hitting a wall and letting open source catch up into a race to the bottom, and a major player faltering and stranding compute.
    • Chip companies do not care who wins the token war, which makes them a relatively safe way to play AI. Jensen Huang actively wants open source to take off.
    • Research is still human work. Whale Rock runs a Philip Fisher scuttlebutt process, the tripod of conviction (Alex, the analyst, and a respected outsider), and 20 years of compounding knowledge. AI writes better notes but cannot supply the wisdom paragraph on top or pick stocks.
    • The firm’s product evolution: 15 years as a long short fund, a long only fund in 2020 that is now larger than the long short, opt-in privates formalized around 2015 and activated in 2020, an 80 percent privates hybrid fund in 2021, and the new Whale Rock Mega Cap Tech Fund.
    • The Mega Cap Tech Fund thesis: endowments are structurally underweight the largest tech companies because they believe there is no alpha in large cap. Whale Rock takes the top 30 global market caps and picks the best 12 or 13, arguing it takes 100 diversified PMs to realize Google is a winner.
    • The kindest thing anyone ever did for Sacerdote: his father, after 41 years at Goldman Sachs, joined Whale Rock as chairman and the gray hair for six years until he passed away in 2011.

    Detailed Summary

    The Anthropic Investment and the Three-Horse Race

    When ChatGPT launched in November 2022, Whale Rock immediately took its 10-person team and ran a firm-wide deep dive. Sacerdote’s first principle is that every new compute paradigm creates a new stack with new winners and losers, and in this stack the layers run from power and chips at the bottom, to the clouds, to the foundational models, to the applications on top. In early 2023 the firm deliberately positioned in chips and infrastructure first, reasoning that demand arrives there first and the winners are knowable no matter who wins above. At an April 2023 webinar they framed the model layer as a coin flip between winner-take-all, total commodity, a race to zero, or an oligopoly of three or four. Over the next three years the answer became clear: of roughly 60 startups, almost all died, Amazon never really showed up, Meta came in strong then faltered and rebooted, and Anthropic emerged as the dark horse focused purely on enterprise while OpenAI won consumer and Google remained a perennial threat. The result looked like the cloud market, where three companies underpin the entire SaaS world with excellent businesses.

    The decisive factor was code. Sacerdote says the firm was initially skeptical AI could replace labor, given the negative corporate feedback on early models. That changed in 2025 when Claude Code and the agentic coding tools exploded. The progression ran from Microsoft Copilot at 20 dollars a month, which could improve coding grammar or find a bug, to Claude running agentically and doing far more. The token economics were staggering: Anthropic engineers reportedly spending 100 dollars a day, which annualizes to 20 to 30 thousand dollars, and with 20 million coders worldwide that implied a half trillion dollar market from coding alone, on technology that was only 7 to 9 months old. Whale Rock made the investment at the 180 billion dollar valuation in August 2025, writing in their letter that the company hoped to reach 9 billion in revenue, with growth like nothing they had ever seen, 100 million to a billion on the way to 9 billion, and no one yet knowing what 2026 could bring.

    Why the Models Are Not Commodities

    Everyone expected the foundational models to be pure commodities, but Sacerdote argues there is tremendous differentiation within them. Different training methods produce different skills: Anthropic excels at anything touching private equity and finance, Google is strong at ingesting PDFs. Routers that switch between models make them look like commodities but mask genuine, critical IP. Beyond the model itself, Anthropic is building a whole ecosystem around the API: the SDK, the orchestration layer, the tools, and the harness, the software wrapped around the API that gets the most out of the model. He compares this directly to AWS in 2013, when people dismissed cloud as commodity servers in a warehouse and missed that Amazon was inventing products that slowly built lock-in. The open-source risk from China is real, but Sacerdote got comfortable that leading-edge token quality is superior, because going from 80 to 85 percent of benchmark performance is a huge unlock and the open-source players lack the compute to leapfrog the frontier.

    The S-Curve Framework in Full

    Whale Rock’s whole edge is thinking exponentially when the world thinks linearly. Sacerdote argues very few people believe you can accurately predict two, three, or four years out, but if you understand the S-curve, the moats, and how to model, you can. Every technology follows the same pattern: it exists hidden for years (smartphones 10 years before the iPhone, the internet 20 years before Netscape, EVs 15 years before Tesla went vertical in 2019) until the barriers to adoption fall and demand inflects into a tornado. Knowing how tall the curve is tells you when to sell, because exponential growth stops around 30 to 40 percent penetration when the sell side catches up. Curves can also be dynamic: AWS turned out to address a far larger TAM than expected once it became clear cloud was not actually deflationary. There are mega S-curves (internet, mobile, cloud, e-commerce, AI) and sub S-curves within them. AI is the biggest. And slope varies enormously by the nature of the technology, the radio curve hitting full penetration in 7 years, B2B and infrastructure taking decades because, like a dishwasher, they have to be plugged into existing systems.

    On timing, Sacerdote is relaxed about being late. Citing Peter Lynch, who mentored him at Fidelity and told him to white out the chart because it is all about the future, he argues it is fine to miss the first one, two, or three years and even the first 100 percent if the top of the curve is half a trillion. At strategic inflection points, per Andy Grove, you cannot trust the data, so the firm relies on intuition and anecdote: a 12-year-old in China playing a real video game on a huge phone, or the AWS session at the Gartner IT Symposium that was standing-room-only at 9, 10, and 11 in the morning. Spotting the leader pulling away matters because, on the internet, the leader usually goes bigger, faster, and wins, compounding on itself, with exceptions only at paradigm shifts like AOL missing the move from dialup to broadband.

    The Software Bear Case

    Five years ago Whale Rock had 40 to 50 percent of its portfolio in software. Their April 2023 thesis was that incumbents with huge sales forces and proprietary data would take the AI APIs and build great products. Instead, the AI products were not good enough to charge for and did not move the needle, so the firm sold almost all of its application software and entered this year net short, which helped in the first quarter. The bear case is layered: software has fallen down the CIO priority list, budgets are being raided to fund Anthropic tokens with faster ROI, annual price increases look risky, and hiring freezes hurt seat-based models. The deeper threat is that AI-native startups could rebuild any incumbent from scratch, obviating the data advantage. The bull case is genuine too: old tech is sticky (mobile games did not kill consoles, tablets did not kill the PC), companies prefer to buy rather than build, and an ERP is hard to replace. Sacerdote also floats an optimistic twist, that AI could make platforms like Slack more important as agent repositories, and that agents operating inside CRM could solidify rather than destroy it, even as the bear case is that CRM goes headless and gets relegated to a database.

    The Decommoditization of AI Hardware

    This is Sacerdote’s most differentiated call. For 40 years nothing changed in the data center; Intel x86 became the standard, compute grew 25 to 40 percent a year in line with Moore’s law, and every component, from the printed circuit board to memory to enclosures to networking, commoditized. AI broke that. Workloads now grow 10x a year and push every aspect of the hardware to its physical limits, creating both tremendous unit growth and what Whale Rock calls the decommoditization of the hardware industry. He cites Sean Maguire wishing he could run a hardware hedge fund because all the companies are public with powerful IP, and compares it to Sequoia’s best early hardware investments in Apple and Cisco. The economics flip because an AI server is a liquid-cooled, 200 to 300 thousand dollar piece of critical infrastructure where a single failure brings the whole thing down, so suppliers become permanent like a critical part on a plane.

    Celestica is the marquee example: a contract manufacturer that had been a disaster industry since 1999 and went offshore to China, but kept its IBM supercomputing heritage and talent, became the sole supplier of the Google TPU server, and was trading at eight times earnings three years ago. It turned out to be excellent at liquid cooling where others failed, holds 50 to 60 percent share of the crucial cloud Ethernet switch market, and its engineers helped write the open-source SONiC software, working closely with Broadcom. The same dynamic runs up and down the chain: high bandwidth memory stacked 10 chips high that took Samsung years to master, 40-layer PCBs versus 10 for a normal server with very few suppliers able to make them, Elite Materials supplying the copper clad laminate, and Corning’s fiber, thinner and more bendable, with enough in a single Microsoft data center to circle the world four and a half times. Networking splits into scale out, scale across, and scale up, with the eventual copper-to-fiber shift in scale up potentially two-to-three-x-ing Corning’s opportunity. Power supplies from Delta and Advanced Energy are seeing ASPs rise 40 percent a year at higher margins because each Nvidia rack uses 50 to 125 percent more power. Visibility has gone from we’ll call you next week to design this roadmap with us for four years, turning 5 percent low-margin businesses into 35 to 50 percent topline growers with rising margins, and the whole market is roughly 30 percent short on DRAM, NAND, and PCBs.

    Private Markets, Risks, and the Research Machine

    Moving from public markets into privates meant adapting to a double opt-in, where the company has to choose to let you in. Whale Rock won its Anthropic allocation partly by building a 90-page deck with Claude Code scouring the internet for feedback on the coding market. Their first private was Stripe in April 2020 at a 35 billion dollar valuation, which they could only underwrite because they knew the public comp Adyen cold, and they upsized to a 100 million dollar block. The unicorn market is now bigger than most European stock markets combined. On risk, Sacerdote worries about public and government negativity (Maine reportedly banning data centers, only 20 percent of people optimistic), the possibility that models hit a wall and open source catches up into a race to the bottom, and a major player faltering and stranding compute, though he notes someone else (like Meta stepping into a cancelled Oracle deal) would likely absorb it, and that chip companies benefit regardless of who wins the token war. He explains his caution on the application layer by noting it always comes later, the iPhone took years to spawn its app economy, and the ecosystem is still unclear and a little dangerous, while pointing to Brett Taylor’s Sierra as the kind of company that could prove it out.

    On the research itself, Sacerdote insists AI has not supplanted the analyst. Whale Rock runs the scuttlebutt approach straight out of Philip Fisher’s Common Stocks and Uncommon Profits, doing 2,500 to 3,000 face-to-face management meetings a year and talking to suppliers, customers, and competitors. AI now writes much better notes and gets the team up to speed quickly on complex areas like ABF substrates, but there must be a wisdom paragraph on top, and it cannot pick stocks or replicate the work two analysts did building conviction in AppLovin and a relationship with Adam Foroughi. He calls the firm the Whale Rock learning machine, a group of 10 highly experienced people compounding knowledge for 20 years, with the tripod of conviction (himself, his analyst, and a respected outside investor all liking an idea) as the test. The firm’s products evolved from a 15-year long short fund to a 2020 long only fund now larger than the original, opt-in privates, an 80 percent privates hybrid in 2021, and the new Mega Cap Tech Fund built on the thesis that endowments are structurally underweight the largest tech companies because they wrongly believe large cap has no alpha. He closes on his father, who left Goldman after 41 years to join Whale Rock as chairman and the gray hair until his death in 2011, a mentor remembered by countless people for his humility and grace.

    Notable Quotes

    “When you get the right part of the S-curve, you get exponential unit growth. If you have a very strong business model, your earnings don’t grow linearly, they grow exponentially.”

    Alex Sacerdote, stating the core of the Whale Rock investment framework

    “The world doesn’t think exponentially. Very few people believe you can accurately predict two, three, four years out. But if you follow and understand the S-curve and you know the moats and you know how to model, you really can predict these great things.”

    Alex Sacerdote, on why the market consistently underprices long-term earnings power

    “The enterprise AI or enterprise application AI market is less than 1 percent penetrated, and we’ve never seen, you know, we talk about S-curves, we call this an L curve, just straight up.”

    Alex Sacerdote, on why AI adoption looks different from every prior technology curve

    “We’re at 10 basis points of people really using AI and we’re already sold out. There’s not enough compute in the world. So Anthropic has half of what they need right now, and that’s before this huge takeup.”

    Alex Sacerdote, on the scale of the compute shortage relative to actual adoption

    “It’s okay to be late. It’s okay to miss the first one, two, three years in a lot of cases, because if the top of the S-curve is half a trillion, the growth can go on for a long time. It’s okay to miss the first 100 percent.”

    Alex Sacerdote, on why fear of missing out is the wrong instinct in a tall S-curve

    “The old way of software is like using a pen and paper or a horse and buggy. The new way of software is like a jet engine or frankly like the transporter from Star Trek. It’s so revolutionary it feels like it has to be disruptive.”

    Alex Sacerdote, explaining why Whale Rock went net short application software

    “You become like critical infrastructure, like selling a critical part on a plane. You’ll never get swapped out.”

    Alex Sacerdote, on how liquid-cooled AI servers turned commodity hardware suppliers into permanent fixtures

    “Why do you tell everyone your secret? It’s like why does the casino teach people how to play blackjack? It’s harder. It’s really hard to do.”

    Alex Sacerdote, quoting his mother on why a public framework does not erase the edge

    “He said, you know, I’ve been at Goldman for 41 years. How about I come and join you? I’ll be the gray hair. I’ll be the oversight. I’ll be the chairman. You do what you do.”

    Alex Sacerdote, recalling his father joining Whale Rock, the kindest thing anyone ever did for him

    Watch the full conversation here: Whale Rock Capital Founder on Investing in the Age of Exponential AI.

    Related Reading

  • Thomas Laffont of Coatue on the $4 Trillion AI IPO Wave: SpaceX, Anthropic, OpenAI, and Why the New Unicorn Economy Is Healthier

    Thomas Laffont, co-founder of the $55 billion hedge fund Coatue Management, made his All-In Podcast premiere with a data-dense walk through what he calls a once-in-a-generation moment for the unicorn economy. In front of Chamath Palihapitiya, Jason Calacanis, David Sacks, and David Friedberg, he argued that a roughly $4 trillion wave of private value is about to hit the public markets, led by SpaceX, Anthropic, and OpenAI, and that the new AI-driven unicorn economy is actually healthier than the one that came before it. You can watch the full presentation and Q&A on YouTube.

    TLDW

    Laffont presents Coatue’s slide deck on the state of the unicorn economy and argues it has rebalanced after the excesses of 2021. The average unicorn is up about 70 percent since September 2024, AI keeps taking a bigger share of all fundraising, and the model has shifted from many small unicorns to fewer companies each raising far more, with funding per unicorn up roughly 5x since 2021. He introduces a “Magnificent 8” private index (SpaceX, Stripe, Anthropic, Databricks, Revolut, ByteDance, Anduril, and more) worth nearly $4 trillion that has crushed the public Mag 7, then shows that exits are finally thawing as SpaceX heads to an IPO in weeks and Anthropic confidentially files its S1. He lays out Coatue’s “CODE” framework for why SpaceX gets more valuable the more it launches, a counterintuitive finding that the odds of a 10x actually rise as companies get bigger (31 percent for $100 billion-plus centicorns), the explosive revenue ramp of OpenAI and Anthropic past Workday, ServiceNow, Adobe, Salesforce, and now the hyperscalers, a three-pillar map of where AI revenue comes from (consumer, ads, enterprise), and the AI memory thesis. The Q&A with Chamath and Calacanis digs into the power law, K-shaped outcomes, whether these valuations are disconnected from reality, the public market as the great antiseptic, and what happens when trillions in private value finally recycles back through GPs and LPs.

    Thoughts

    The most useful idea in the talk is not the $4 trillion headline, it is the cohort-health chart. Laffont splits unicorns into eras and shows that the pre-2021 cohort was healthy, roughly 80 percent had raised again or exited 20 quarters after minting, while the giant 2021 ZIRP cohort of 479 companies is stuck with under 20 percent doing either. That single comparison reframes the whole AI boom. The bullish read is that the 2024 AI cohort is small, concentrated, and cash-generative, so it looks more like the healthy pre-ZIRP group than the 2021 hangover. The bearish read is that we are watching the same movie with bigger numbers, and the test only comes when these companies face public markets. Laffont is honest that we do not yet know which cohort the AI class resembles, and that intellectual humility is what makes the deck credible rather than promotional.

    The SpaceX “CODE” framework is the sharpest analytical move of the presentation. Most people would assume a launch business gets cheaper per launch as it scales. Laffont shows the opposite, the market pays more per launch as cadence rises, and explains it as a phase change in business quality: from one-time government launch revenue, to a single recurring-revenue constellation, to multiple constellations, to a platform with optional upside in space data centers, the moon, and Mars. It is a clean way to think about any company that climbs from a project business to a platform business, and it applies far beyond rockets. The lesson for investors is that valuation can rationally expand even as unit economics look like they should compress, because the nature of the revenue underneath is changing.

    The counterintuitive 10x odds finding deserves more attention than it got in the room. Conventional wisdom says the bigger you are, the harder it is to grow, so a $100 billion company should be less likely to 10x than a $10 billion one. Coatue’s data says the reverse: centicorns have a 31 percent shot at a 10x, far higher than the 8 percent a unicorn has at becoming a decacorn. Laffont’s explanation is a filtering mechanism, every step up validates a compounding advantage and durability of earnings, so survivors are increasingly the kind of business that keeps compounding. This is essentially a quantitative restatement of quality investing, and it is the intellectual backbone of the LP strategy the besties tease out, just buy whoever reaches $100 billion and hold.

    Where the argument gets genuinely contested is valuation, and the panel does not let it slide. The pushback that “these are not fake companies” is true and important, OpenAI and Anthropic are growing faster than any software company in history, and Anthropic reportedly had a profitable month. But growth and reality do not settle the question of price when you are paying 50 to 100 times revenue for trillion-dollar private companies, as Bill Ackman pointed out earlier in the day. Laffont’s answer is the most grounded thing he says all session: the public market is the great antiseptic, it will not care about anyone’s slide deck, and he wants to see these names withstand short sellers and skeptics. That is the right posture. The deck is a thesis, not a verdict, and the verdict arrives roughly six months and one day after the IPOs, once passive flows and supply have washed through.

    The closing thread, that almost every sector is being transformed at once and we still do not have superintelligence, is the part worth sitting with. The risk in a presentation this bullish is treating the trend as destiny. The value is in the framing tools Laffont hands you, cohort health, phase-change business quality, the filtering odds, the three revenue pillars, and the antiseptic of public scrutiny. Use those to interrogate each name rather than to buy the index on faith, and the talk earns its premiere billing.

    Key Takeaways

    • Coatue Management is one of the most successful hedge funds of the last two decades with about $55 billion under management, and is raising roughly another billion dollars specifically to invest in AI.
    • The unicorn economy is up about 70 percent on average since September 2024, and the public market has made a similar move up over the same period.
    • The unicorn economy’s share of the NASDAQ rose significantly after 2015 but has plateaued in recent years, reflecting strong performance from public companies.
    • AI keeps increasing its wallet share of all venture fundraising, multiple years in a row now.
    • The composition of funding has changed. The unicorn “factory” peaked in the ZIRP era of 2021 and has normalized at a much lower level since.
    • Funding per unicorn has increased roughly 5x since 2021. There are fewer unicorns, and each one is raising more.
    • Cohort health, pre-ZIRP group: of about 73 unicorns, 20 quarters after minting roughly 80 percent had either raised a new round or exited, which is healthy.
    • Cohort health, 2021 group: of about 479 unicorns, 20 quarters in, fewer than 20 percent had exited or raised again. Far larger cohort, far worse outcomes.
    • The open question is which cohort the new 2024 AI cohort will resemble.
    • Funding is concentrating: the top 10 companies capture a large share, and it is a small number of AI companies, not all of them, with Anthropic and OpenAI raising massive rounds.
    • Laffont proposes a “Magnificent 8” private index: SpaceX, Stripe, Anthropic, Databricks, Revolut, ByteDance, Anduril, and more, spanning internet, AI, fintech, and space tech.
    • That private index represents almost $4 trillion of value and has crushed the traditional public Mag 7, with almost every name outperforming.
    • Exits are thawing. 2026 is on a good trend for cash returned versus consumed, not quite 2021 levels, with half a year still to go.
    • That trend does not yet include three imminent liquidity events: SpaceX (IPO expected in weeks) and Anthropic (confidentially filed its S1), whose combined value could exceed the prior decade of exits combined.
    • The ecosystem is far more balanced than when Laffont first presented at the 2024 All-In Summit, when it was consuming much more cash than it returned.
    • OpenAI and Anthropic revenue growth is unlike anything previously seen. Starting from January 2025, they passed Workday, then ServiceNow, then Adobe, then Salesforce, and are now bigger than Google Cloud and Azure.
    • On current forecasts, that revenue could pass AWS by the end of the year and exceed all of Microsoft by 2028.
    • Hyperscalers are not sitting still. The largest companies in the world are funding the disruption, investing unprecedented sums to enable the ChatGPT moment.
    • The SpaceX “CODE” framework: the number one driver correlated to SpaceX’s valuation is cadence of launches, and valuation per launch rises as launches increase.
    • Why per-launch value rises: business quality improves through phases, pre-constellation (one-time government revenue), initial ramp (one recurring-revenue constellation), scale (multiple constellations), and platform (space data centers, moon and Mars optionality).
    • Anthropic in particular is scaling like no company seen across the PC, internet, or mobile eras.
    • Counterintuitive 10x odds: a unicorn has about an 8 percent chance of becoming a decacorn, a decacorn has 8 to 13 percent odds of reaching $100 billion, but a centicorn ($100 billion-plus) has a 31 percent chance of a 10x.
    • Value creation has accelerated. It typically takes years to go from $500 billion to $1 trillion in market cap, yet recently three companies did it in one year and two did it in a matter of weeks.
    • Cerebras is the counterexample of slow success: years of dark periods and no new capital developing its technology, then a massive OpenAI contract that quintupled the company’s value ahead of its IPO.
    • Semiconductors are on a generational run, with the sector dramatically outperforming the index since the 2024 All-In Summit.
    • AI memory thesis: the more an AI system knows about you, the more useful it is, so memory per user could quintuple, which helps explain recent moves in memory companies.
    • Where the revenue is: the AI ecosystem is roughly $140 billion today, about $300 billion this year, and is expected to double in 2027.
    • Three revenue pillars: consumer (subscribers times ARPU), ads (about a quarter of Meta and Google ads are AI-enabled today, heading toward 100 percent and roughly $150 billion), and enterprise (tools like Claude Code and Codex inside businesses).
    • Disruption is hitting every sector: software, telco (Starlink-powered global phone calls), semis, energy (data centers reshaping Pennsylvania’s grid), auto (Ferrari’s electric and autonomous stumble), and consumer (GLP-1s reshaping food, alcohol, and wellness).
    • Final takeaways: the new unicorn economy is healthier thanks to AI, winners are compounding faster so the cost of not owning a winner is higher than ever, disruption is everywhere, and we do not even have superintelligence yet.
    • In the Q&A, both Anthropic and OpenAI publicly say they want to be public, and big outcomes now look likely to become liquid within roughly a 12-month window.
    • The valuation pushback: these are not fake companies, they generate substantial revenue at scale and grow faster than anything before, and Anthropic reportedly even had a profitable month.
    • The public market is framed as the great equalizer and antiseptic, but with passive buying the true price discovery may not land on day one, more like six months and a day after listing.
    • A floated LP strategy: wait for whoever reaches $100 billion and concentrate capital there as the least brittle, quickest-return bet, tempered by the warning that valuations are disconnecting from any historical metric (50x to 100x revenue).
    • An open risk: with so much capital, OpenAI and Anthropic could rationally start a price war, the way ride-sharing and food-delivery players once did, though heavy infrastructure spend complicates it.

    Detailed Summary

    The unicorn economy has rebalanced after 2021

    Laffont opens by reframing a market many assume is frothy. The average unicorn is up about 70 percent since September 2024, and the public market has tracked a similar climb, so private and public value are moving together rather than diverging. The unicorn economy’s share of the NASDAQ rose sharply after 2015 and then plateaued, which he reads as a sign of how strong public companies have become. Underneath the headline, the structure of funding has changed. The 2021 ZIRP era was a unicorn factory that minted enormous numbers of companies, and that machine has since normalized to a much lower level. The result is a barbell: fewer new unicorns, but each raising far more, with funding per unicorn up roughly 5x since 2021. AI sits at the center of this, taking a steadily larger share of all venture dollars for several years running.

    Cohort health is the real story

    The deck’s most important slide measures the health of the ecosystem by cohort. The pre-ZIRP cohort, about 73 unicorns, looks healthy: 20 quarters after becoming unicorns, roughly 80 percent had either raised a new round or exited. The 2021 cohort tells the opposite story. It is enormous, about 479 unicorns, and 20 quarters in, fewer than 20 percent had raised again or exited. That contrast sets up the central question of the talk. A new 2024 cohort of AI companies is forming, and no one yet knows whether it will resemble the healthy pre-ZIRP group or the bloated, stuck 2021 group. Laffont’s framing leans optimistic because the AI cohort is small and concentrated, but he is careful not to declare the answer.

    The Magnificent 8 and a $4 trillion private index

    Funding is not just flowing to AI, it is flowing to a handful of AI names, with the top 10 capturing a large share and Anthropic and OpenAI raising the biggest rounds. From this concentration Laffont builds a private index he half-jokingly calls the Magnificent 8, a number he expects to shrink as companies go public. The members span sectors: SpaceX, Stripe, Anthropic, Databricks, Revolut, ByteDance, and Anduril, covering internet, AI, fintech, and space tech. He says he would be comfortable owning that index for the next decade-plus. Collectively it represents almost $4 trillion of value and has outperformed the public Mag 7, with nearly every constituent beating that benchmark.

    Exits are thawing and a wall of liquidity is coming

    One of Laffont’s recurring concerns at past summits has been balance: the unicorn economy is great at consuming cash, but a healthy ecosystem must also return it. On that score 2026 is trending well, not quite 2021, but solid with half a year left. Crucially, that figure does not yet include three imminent events. SpaceX is expected to go public within weeks, and Anthropic confidentially filed its S1 the day of the talk. Adding those up, just a few companies could deliver more liquidity than the prior ten years combined. The takeaway is that the ecosystem that was dangerously out of balance in 2024 is now meaningfully more balanced, and improving.

    The revenue ramp past the hyperscalers

    The growth rates of OpenAI and Anthropic, Laffont argues, are unlike anything previously seen. Charting from January 2025, the leading AI labs passed Workday, then ServiceNow, then Adobe by year end, then Salesforce by January, and are now bigger than Google Cloud and Azure. On forecast, that revenue could surpass AWS by the end of the year and exceed all of Microsoft by 2028. He stresses that the hyperscalers are not passive bystanders, they are actively funding the disruption, pouring unprecedented capital into enabling the change that began with the ChatGPT moment.

    The SpaceX CODE framework

    Laffont devotes real time to how Coatue thinks about SpaceX. The single factor most correlated with SpaceX’s valuation is cadence of launches, which is intuitive for a launch business. The surprise is that valuation per launch has risen rather than fallen as cadence climbed. His explanation, the CODE framework, is that the quality of the business model improves the more SpaceX launches. In phase one, pre-constellation, you are simply proving rockets, with a few government customers and lumpy, unpredictable one-time revenue. In the initial ramp you stand up a constellation, which is an end market and a recurring-revenue business that grows with every satellite and subscriber. At scale you operate multiple constellations, and Laffont expects companies, governments, and militaries to want to own their own. Ultimately it becomes a platform, with new businesses layered on top, from space data centers to the optionality of the moon and Mars.

    Counterintuitive odds and the speed of value creation

    Coatue bucketed companies and asked the odds of a 10x within each. A unicorn has roughly an 8 percent chance of becoming a decacorn. A decacorn has 8 to 13 percent odds of reaching $100 billion. But a centicorn, $100 billion or more, has a 31 percent chance of a 10x, counting both public and private companies. The bigger you are, the better your odds, which inverts intuition. Laffont pairs this with the sheer speed of recent value creation. Going from $500 billion to $1 trillion in market cap normally takes years, yet three companies did it in a single year and two did it in a matter of weeks. He also offers Cerebras as the patient counterexample, a chip company that endured years of dark periods and no new capital before a massive OpenAI contract quintupled its value ahead of IPO, part of a broader generational run for semiconductors.

    AI memory and where the revenue actually comes from

    A throughline from the day’s other speakers is that the more an AI knows about you, the more useful it is, from your restaurant preferences to your work context. Laffont turns that into a thesis: memory per user could quintuple based on what these systems require, which helps explain recent moves in memory companies. He then tackles the most contested question, where is the revenue. He sizes the AI ecosystem at about $140 billion today, roughly $300 billion this year, and doubling in 2027, built on three pillars. Consumer is subscribers times ARPU. Ads are the pillar people forget, with about a quarter of Meta and Google ads already AI-enabled and penetration heading toward 100 percent, a roughly $150 billion opportunity. Enterprise is the breakthrough category, exemplified by tools like Claude Code and Codex operating inside businesses.

    Every sector is being transformed at once

    What makes this era different, Laffont says, is that nearly every sector is being transformed simultaneously. Software is obvious, but look at telco, where he believes Starlink will soon power a device that lets you make a phone call anywhere on earth, attacking the global telco and broadband profit pool with a better product. Compute is driving massive change in semis, data centers are reshaping the energy equation in places like Pennsylvania, and the auto business is being upended, as Ferrari’s stumble introducing electric and autonomous technology showed. In consumer, GLP-1 drugs are profoundly changing consumption of food and alcohol and the broader focus on wellness. His takeaways close the loop: the new unicorn economy is healthier thanks to AI, winners are compounding faster so the cost of missing them is higher than ever, disruption is everywhere, and superintelligence has not even arrived yet.

    The Q&A: power law, valuation, and the public market test

    Chamath and Jason Calacanis press Laffont on what this means for allocators. The recurring theme is the power law and K-shaped outcomes, with gains consolidating into a small number of companies. The positive side, Laffont notes, is that outcomes are enormous and increasingly liquid within a 12-month window, and both Anthropic and OpenAI say they want to be public. The hard part is valuation. The besties cite Bill Ackman’s framing that investors are making venture bets on trillion-dollar companies at 50 to 100 times revenue. Laffont’s pushback is that these are not fake companies, they generate substantial revenue at scale and grow faster than anything before, and Anthropic reportedly had a profitable month. But he embraces the discipline ahead: the public market is the great antiseptic and will not care about anyone’s presentation, though with heavy passive buying, true price discovery may take roughly six months and a day rather than landing on day one. Asked whether the compounding is a market inefficiency or survivor bias, he declines to over-read a small sample, noting that Anthropic before Claude Code was a completely different company than after. The conversation closes on what happens when trillions recycle from GPs to LPs, the case for simply owning whoever crosses $100 billion, the risk of everyone crowding into three names, and the possibility of an eventual OpenAI versus Anthropic price war.

    Notable Quotes

    “So we have fewer unicorns that are each raising more.”

    Thomas Laffont, summarizing how funding per unicorn has risen roughly 5x since 2021

    “The reason is that the quality of SpaceX’s business model increases the more you launch.”

    Thomas Laffont, explaining the CODE framework and why valuation per launch rises with cadence

    “The winners are compounding faster than ever, which means the costs of not being in a winner are higher than ever.”

    Thomas Laffont, on the central risk of a power-law market

    “And by the way, we don’t even have super intelligence yet.”

    Thomas Laffont, closing his takeaways on how early the transformation still is

    “These are companies generating substantial revenue at scale that are growing faster than anything we’ve ever seen.”

    Thomas Laffont, pushing back on the idea that AI valuations rest on fake companies

    “It will be the great antiseptic. It will not care about my presentation.”

    Thomas Laffont, on the public market as the ultimate test for SpaceX, OpenAI, and Anthropic

    “Anthropic pre-cloud code was a completely different company than post cloud code.”

    Thomas Laffont, on why he won’t over-read a small sample of hyper-compounders

    “The power law rules our lives. All the great gains are being consolidated into small numbers of companies.”

    An All-In host, framing the Q&A on concentration in private markets

    This is a curated set of highlights. To hear the full presentation, the slide walkthrough, and the complete Q&A with Chamath and Jason Calacanis, watch the full conversation here.

    Related Reading

    • Coatue Management. Primary source for Thomas Laffont’s firm and the technology investing strategy behind the deck.
    • The All-In Podcast. The show and summit where Laffont made this premiere presentation.
    • Power law (Wikipedia). Background on the distribution Laffont and the hosts say governs venture and public-market returns.
    • The Magnificent Seven (Wikipedia). The public-market benchmark Laffont’s private “Magnificent 8” index is measured against.
    • Cerebras Systems. The AI chipmaker Laffont cites as the slow-grind IPO that was eventually transformed by a major OpenAI contract.
  • Paul Graham and Jessica Livingston on Resilience at Y Combinator: Founder Mode, Cockroaches, Sticking to Your North Star, and Why AI and Climate Keep Them Up at Night

    For the very first episode of Disaster Proof, the conversation goes to a garage in Palo Alto to sit down with Paul Graham and Jessica Livingston, the founders of Y Combinator. They have backed thousands of companies, including many now working in the resilience space, and the discussion covers what makes startups durable, why adaptability beats expertise, how Brian Chesky stumbled into founder mode at Airbnb, why the best ideas grow out of a founder’s own life, and the two specific risks (AI and climate change) that Paul says are the only ones he treats as genuinely game over. You can watch the full conversation on YouTube here.

    TLDW

    Paul Graham and Jessica Livingston explain why constant change favors young, flexible founders, and why Y Combinator picks people over ideas precisely so its judgment never goes obsolete. They unpack adaptability as the trait they hunt for in interviews, the “founder mode” story behind Brian Chesky steering Airbnb through COVID, and the 2008 strategy of funding tough, close-to-revenue “cockroaches.” Paul argues a company survives turbulence by sticking to a North Star instead of acting as a weather vane in shifting moral fashions, using the biosphere tree that collapses without wind as his metaphor for resilience. They turn to climate and energy as the next great market, the difficulty of selling into utilities, the Gridware success story, fusion no longer being thirty years away, and the trap of guilt-based business models versus the reliable assumption that users are selfish, greedy, and lazy. The personal-resilience half covers surviving Twitter mobs, Paul’s obsessive essay process, raising kids by indulging curiosity and picking your battles, prepping by living among reasonable people, political polarization, and why AI and climate are the two things that keep them up at night.

    Thoughts

    The most useful idea in this conversation is also the most counterintuitive: a world that feels like it is ending is structurally good for the people least invested in how it used to work. Paul’s point to terrified founders is that change is only a threat if you have sunk costs in the old order. A young founder has been doing the current plan for two weeks, so a step-function shift in the landscape costs them almost nothing to abandon. The incumbents with elaborate machinery and a decade of assumptions are the ones who should be afraid. That reframes resilience away from defense and toward optionality. The resilient party is not the one with the thickest walls, it is the one with the least to unlearn.

    The founder mode discussion is worth sitting with because it quietly overturns a generation of management orthodoxy. The old rule was that a good CEO hires executives and gets out of their way, and that getting into the details is micromanaging. Brian Chesky’s COVID experience at Airbnb broke that rule under maximum pressure. With bankruptcy on the table and a travel company facing a world that stopped traveling, he went line by line through the business and told people what good looked like, then gave them freedom to execute against that standard while still demanding visibility. The interesting nuance is the permission structure. A crisis granted Chesky the license to be involved that normal operating conditions would have framed as meddling. The lesson is not “always be in the weeds,” it is that the founder’s deep understanding and disproportionate caring are assets you are wasting if you reflexively delegate them away.

    Paul’s North Star argument is the part most likely to age well. His claim is that companies fail at resilience when they behave like weather vanes, swinging with each gust of public moral fashion. He pairs it with the biosphere tree that grows weak and topples because it was never exposed to wind. Both metaphors point at the same thing: resilience is built by surviving stress while holding your shape, not by avoiding stress and not by reshaping yourself to whatever the crowd currently rewards. The carbon-credit companies he mentions are the cautionary case. They built their entire premise on a fashion (customer guilt about carbon) and went out of business when the wind changed direction. Durable businesses convert a permanent human motive into value, which is why he prefers the brutally honest assumption that the user is selfish, greedy, and lazy, and that your job is to build something that produces good outcomes anyway.

    The climate and energy section reframes a worthy cause as a market-timing bet rather than a moral appeal, and that is the more powerful version. The comparison to fintech in 2008 is the tell. Banking technology was a sleepy, unglamorous sector that venture investors avoided until a crisis cracked it open and made it one of the best categories of the following decade. The argument is that energy and the physical world are sitting at a similar precipice, made newly viable because hardware is starting to behave more like software (order components, assemble, do not build everything from scratch) and because AI’s hunger for power has made energy the binding constraint on the whole industry. The Gridware story crystallizes the founder lesson underneath all of it. The best founder for a hard physical problem was a lineman who worked the electric lines and lived through the fires. The idea grew authentically out of his life, which is the same pattern Jessica keeps returning to and the same advice they give for raising kids.

    Finally, the personal-resilience material is more practical than it first appears. Paul’s method for surviving a Twitter mob is pattern recognition: once it has happened twenty times, you know it ends in two days and they move on to the next target, so you wait it out instead of capitulating. His essay process is the same conviction-building engine applied to ideas. He goes sentence by sentence until there is no false statement left to attack, which is why his challenge to angry readers (“point out the incorrect statement”) almost never gets answered. The throughline across the company advice, the parenting advice, and the personal advice is identical. You build durable conviction not by sitting in a room thinking, but by working the problem until it is right, then refusing to be blown off course by people who never actually engaged with the substance.

    Key Takeaways

    • Experts are frequently wrong because they are experts in a previous version of the world, so Paul deliberately avoids permanent beliefs about the current state of technology.
    • Y Combinator picks startups by picking founders, not ideas, because the founders know more about the ideas than the investors do.
    • Living in England and visiting for each batch lets Paul arrive every quarter expecting the world to be different, which keeps his mind open instead of anchored.
    • A world of constant change feels bad but is actually good for a young, flexible founder who has only been on the current plan for two weeks and can switch easily.
    • Vibe coding went from kind-of-works to reliably works, and even experienced programmers now generate huge volumes of code with AI.
    • There is still a software business even with AI, because someone has to know what to tell the AI to write, and no company is going to write its own database from scratch.
    • The scenario Paul worries about is model companies spinning up agents to start all the startups themselves, removing the need for human founders.
    • The founder traits Jessica looks for are unchanged over the years: determined, flexible-minded, and willing to adapt.
    • In interviews you can spot rigid founders because they answer the question they prepared rather than the one they were asked, and the gears visibly grind when you redirect them.
    • A good adaptability signal is a founder who says “I haven’t thought about that, but here is how I would think about it” instead of freezing.
    • Founder mode, the term, came from Brian Chesky’s experience steering Airbnb through COVID, when bankruptcy was openly discussed in board meetings.
    • Ken Chenault, the former American Express CEO on Airbnb’s board, told Chesky the moment was ten times worse than 9/11 and could define the company.
    • Founder mode meant Chesky understood every line item, told people what good looked like, then gave them freedom to execute while still wanting to see it.
    • Founders see through the fog because they understand the company better than anyone and they care more than anyone, and combining understanding with caring lets them see more.
    • There is always some disaster at Y Combinator, the way a hospital always has someone coding, so a crisis is the normal operating environment, not an exception.
    • During the 2008 crash, YC kept funding because it is always a good time to start a startup, but focused on people close to making money and very tough founders they called cockroaches.
    • Airbnb was the ultimate cockroach, seemingly indestructible, which is exactly why they liked it during the meltdown.
    • YC rests on two axioms: startups matter, and founders are the most important ingredient in startups. As long as those hold, YC has room to exist.
    • Company values are usually written down a few years in, documenting principles that already existed rather than inventing new ones.
    • You cannot move with fashion; you have to stick to your North Star, especially during turbulent, noisy times.
    • Trees grown inside a biosphere fell over because they were never exposed to wind, so being blown around is a necessary part of becoming strong enough to stand.
    • What preserves YC most is that it is a fundamentally good idea: it gives lonely founders money, the right peers, and colleagues they would never otherwise have.
    • The measure of a good startup idea is revenue, and any other metric you care about matters only because it predicts revenue.
    • At the early stage you can afford to be virtuous and even tell founders to go back to college, because the power law means one startup in the batch will carry the returns.
    • Every startup has to find early adopters, who decide quickly, usually do not have much money, and tend to be sophisticated, which means utilities are rarely your first customer.
    • A company that ultimately sells to utilities should start by selling to something that says yes faster, like running a pilot on a single corporate campus.
    • Utilities are under so much stress from wildfire liability, renewables, EV charging, and AI demand that they are unusually willing to try new things out of necessity.
    • Gridware, founded by a former lineman who lived through major fires, is now backed by Sequoia with PG&E as a huge customer, an example of an idea growing out of the founder’s life.
    • The second-biggest chunk of YC startups after AI is hard tech and physical products, not because software is dead but because building physical things is getting more possible.
    • Energy is one of AI’s fundamental constraints; if Sam Altman could have two things for Christmas, they would be energy and GPUs.
    • Nobody says fusion is thirty years away anymore, and the old thirty-year number existed because it was far enough out to avoid demands for results but close enough to keep attention.
    • Energy and physical markets may be where fintech was in 2008, a sleepy sector about to be cracked open by crisis into a great decade.
    • Guilt is a fragile business model because fashions change what people feel guilty about, which is why carbon-credit companies collapsed when the winds shifted.
    • Assume the user is selfish, greedy, and lazy, then build something that causes good things to happen anyway, like clean power that is simply cheaper and more reliable.
    • To survive Twitter mobs, remember they move on in about two days, half are bots or people you would never talk to in real life, and you cannot become a weather vane for moral fashions.
    • You build conviction by working on and developing an idea, not by sitting in a room thinking, unless it is pure thought like math.
    • Paul writes essays sentence by sentence until nothing in them is false, which is why his challenge to point out an incorrect statement almost never gets answered.
    • The best startup ideas, and the best projects in life generally, grow authentically out of the founder’s own interests and experiences.
    • Their parenting philosophy is to give kids confidence and a stable base, indulge their curiosity, and encourage projects nobody told them to do.
    • You pick your battles with kids: put your foot down on cruelty, but accept defeat on things like food and screen time.
    • A useful interview question for anyone with an unusual experience is not “what was it like” but “how was it different than you expected,” which surfaces the genuinely novel detail.
    • In a time of turbulence, bet on an island full of reasonable people; the English may not be very dynamic, but they are reasonable.
    • The hope on political polarization is to build resilient institutions that act as a cage around any single leader, so that throwing the rattle makes no difference.
    • AI and climate change are the two things Paul worries about most because they are both potentially game over, like the Gulf Stream reversing and turning Europe into a frozen wasteland.

    Detailed Summary

    Staying an expert when the world keeps changing

    The conversation opens on Paul Graham’s essay “How to Be an Expert in a Changing World,” whose core point is that experts are often wrong because they are experts in a previous version of the world. Asked how he keeps his own beliefs from going obsolete when the landscape can shift in ninety days, Paul says he focuses on people. YC picks founders rather than ideas because the founders know the ideas better than any investor could. He deliberately holds no permanent beliefs about the current state of technology, and the rhythm of flying in from England for each batch helps: he arrives every quarter already expecting everything to be different. One quarter the story is everyone training open-source models, the next quarter it is Claude code and nobody bothers with open-source models because the frontier versions are better anyway. He comes in with a completely open mind. Jessica and Paul note that today’s founders are more frightened, asking what is even still true, but the message Paul gives them is that constant change favors the young and flexible. If you have only been executing a plan for two weeks, a disruption costs you nothing; you just switch.

    What adaptability looks like in a founder

    Jessica describes the founders she funds as determined, flexible-minded, and willing to adapt, and calls adaptability a key trait always, but especially in uncertain times. In interviews, the rigid applicants reveal themselves by answering the question they planned to answer rather than the one they were asked, and you can almost hear the gears grind when you redirect them. Paul does not let that slide; if they dodge, he just asks again. The positive signal is a founder who, faced with a question they have not considered, says “here is how I would think about it” and reasons live. Both point out that YC itself had to adapt, and that the company they funded the interviewer’s startup as in 2009 looked very different by the end. They funded him in May 2009, in the thick of the financial crisis, after he had quit his job in August 2008 and briefly felt he had made a terrible mistake.

    Founder mode and seeing through the fog

    Paul points to Brian Chesky as the defining example of weathering disaster, a story he explored on This Week in Startups. When COVID hit a travel company like Airbnb, the word bankruptcy was being used in board meetings, and Ken Chenault, the former American Express CEO on the board, warned it was ten times worse than 9/11. Chesky went into what would later be named founder mode, getting into every line item, understanding exactly what was needed, telling people what good looked like, and then giving them freedom to execute while still insisting on visibility. The crisis gave him permission to be the involved CEO he had always wanted to be, the kind of involvement that normal operating conditions would have labeled micromanaging. Paul argues founders see through fog that blinds everyone else for a simple, rational reason: they understand the company better than anyone because they have been there longest and thought of most of it, and they also care more than anyone. Combine deep understanding with deep caring and of course they see more.

    Cockroaches, the North Star, and the biosphere tree

    Returning to 2008, when YC was self-funded and unsure whether anyone would invest by March, they decided to keep going on the principle that it is always a good time to start a startup, but to fund people close to making money and very tough founders they called cockroaches, after the creatures that survive nuclear war. Airbnb was the ultimate cockroach. Paul frames YC’s longevity around two axioms (startups matter, founders are the most important ingredient) and around resilience built through stress. He tells the story of trees grown inside a biosphere that fell over because they were never exposed to wind, since being blown about is a necessary part of a tree becoming strong enough to support its own weight. YC has been blown around and is still standing, which is exactly what gave it practice. The companion idea is the North Star: you cannot move with fashion or act as a weather vane swinging with other people’s moral fashions, you have to hold your founding principles, which Paul eventually wrote down rather than let a 23-year-old new hire do it.

    Climate, energy, and selling into hard markets

    The interviewer’s own path (a curiosity about wildfire that grew from living in California, watching PG&E go bankrupt, a fire on his Mendocino property, volunteering as a firefighter) becomes the case for ideas that grow authentically out of a founder’s life. Climate is framed broadly as energy, the built environment, and transportation, essentially the physical world, and those are hard markets where the buyers are utilities, governments, real estate, and insurance. The advice is to find early adopters who decide quickly, which usually means not starting with a utility but with something like a single corporate campus that will say yes faster. Utilities, though, are under so much stress from wildfire liability, renewables, EV charging, and AI demand that they are increasingly willing to try new things. Gridware, founded by a former lineman who lived through major fires, is the proof point: backed by Sequoia, with PG&E as a major customer. Paul notes the second-biggest chunk of YC startups after AI is hard tech, not because software died but because building physical things is getting more possible, more like ordering and assembling components. Energy is the binding constraint on AI, fusion no longer feels thirty years away, and the bet is that energy and physical markets are where fintech was in 2008, about to be cracked open.

    Guilt versus greed as a business model

    On the question of whether climate companies should sell on guilt (recycle, pay more because it is sustainable), Paul is blunt that guilt is fragile because fashions change what you are supposed to feel guilty about. The carbon-credit companies thrived until buying carbon credits stopped being cool, then went out of business. A founder’s own concern for the world can drive great companies, but depending on a customer’s guilt is shallow. The durable move is to assume the user is selfish, greedy, and lazy, someone who just wants to eat pizza and watch Netflix, and to build something that produces good outcomes despite that. Clean power is the perfect example: nobody watching Netflix is upset that fusion powers their television, and if it is cheaper and more reliable, that is simply more Netflix and more money for pizza.

    Personal resilience, Twitter mobs, and the essay process

    On surviving public criticism, Paul’s method is pattern recognition: after twenty mobs you stop counting and know it will be over in two days when they move to the next topic, so you wait it out even though it genuinely feels miserable. Half of them are bots or people you would never talk to in real life, but the deeper point is that companies and people stay resilient by not succumbing to mobs and not becoming weather vanes for moral fashions. Conviction is built by working on an idea, not sitting in a room thinking about it, unless it is pure thought like math. His essays are the engine: he writes a version one, notices everything wrong, and fixes it sentence by sentence until there is no false statement left. He will read an entire book for a single sentence because he would be mortified to publish something false and, having no deadlines, has no excuse. That is why his standing challenge to angry readers, to point out one incorrect statement, almost never gets answered.

    Raising kids, prepping, and the things that keep them up at night

    Their parenting philosophy is to give kids confidence and a stable base, indulge curiosity, and encourage projects nobody assigned, like the living room overrun by one son’s Lego. They pick their battles: they put their foot down on cruelty but admit total defeat on food, devices, and screen time. Paul’s favorite question for anyone with an unusual experience is not “what was it like” but “how was it different than you expected,” which surfaces the genuinely novel detail, and the meta-version of that became the show’s recurring question to all guests. On prepping, they joke that living in the English countryside is itself a form of preparation, and that in turbulent times you should bet on an island full of reasonable people. The episode closes on what keeps them up at night: AI and climate change, the two things Paul treats as uniquely game over, illustrated by the prospect of the Gulf Stream reversing and leaving Europe, which sits as far north as Alaska, a frozen wasteland. Jessica notes her YC superhero name was Panic, and the conversation ends, after a detour through political polarization and a child who insisted for six months on being called SR-71 forecast 80 leaping leopard, on the admission that they manage screen time by being utterly defeated.

    Notable Quotes

    “If you’re a startup founder, a world where things are constantly changing is actually good for you. It feels bad, but you’re better off than anybody else.”

    Paul Graham, on why turbulence favors young, flexible founders

    “You can’t move with fashion. You have to stick to your North Star.”

    Paul Graham, on holding founding principles during noisy, turbulent times

    “There’s always some kind of disaster. It’s almost a rule of thumb at Y Combinator that there’s always some disaster going on, just like in a hospital. There’s always somebody who’s coding.”

    Paul Graham, on crisis as the normal operating environment for startups

    “The measure of a good startup idea is revenue, sure. Let’s not pretend companies are supposed to do something else.”

    Paul Graham, on how to judge whether an idea is actually good

    “Assume that the user is selfish and lazy, and make something. Selfish, greedy, and lazy. And make something that causes good things to happen despite that.”

    Paul Graham, on why guilt is a weak business model and greed is a source of energy

    “This is where the best startup ideas come from. They grow authentically out of the founders’ lives.”

    Jessica Livingston, on a wildfire curiosity turning into a company

    “Please point out the incorrect statement I’ve made in this essay. And no one ever does that.”

    Paul Graham, on writing essays sentence by sentence until nothing in them is false

    “AI and climate change have something in common. They’re the two big things I worry about the most, because they’re both game overs.”

    Paul Graham, on what keeps him up at night

    This is the first episode of Disaster Proof, a series exploring the people and technologies building resilience in an increasingly volatile world. You can watch the full conversation with Paul Graham and Jessica Livingston on YouTube here.

    Related Reading

  • Benedict Evans on Why AI Is Stuck in 1997: The Task vs the Job, Commodity Models, and Why the Jobs Apocalypse Is Overhyped

    Benedict Evans, the former Andreessen Horowitz partner and independent analyst behind the annual “AI Eating the World” presentation, sat down with Lenny’s Podcast for what the host calls the most rational take on AI you will hear this year. Instead of either doom or hype, Evans argues that AI is as big a deal as the internet or mobile, and only as big a deal as the internet or mobile, which means we are living through something closer to 1997 than to the singularity. The conversation moves through the jobs question, the difference between a task and a job, whether the model labs have any pricing power, the anti-AI backlash, and what people should actually do. You can watch the full conversation on YouTube here.

    TLDW

    Evans frames AI as a platform shift on the scale of the internet or mobile, with the crucial twist that almost nothing has been built yet, so we are in the 1997 moment where confident predictions about winners are usually wrong. He introduces his central tool, the distinction between the task and the job, to explain why “X percent of this profession is exposed to AI” studies are misleading, why the AI labs are paradoxically hiring forward deployed engineers and buying consultancies, and why accountants kept multiplying through every wave of automation (the lump of labour fallacy and Jevons paradox at work). On value capture he makes a deterministic bet that foundation models have no network effects, behave like a commodity, and will look more like cloud than like Windows, with the value moving up the stack to applications, much as it did in telecom, where a trillion-dollar industry grew data traffic thousands of times over while its stocks went nowhere. He covers distribution as the real moat, Apple Intelligence as the most compelling unshipped vision, the fuzzy anti-AI backlash (including the largely fake water panic and the very real harms of deepfakes), raising kids under radical uncertainty, and closes with the disarming admission that his own synthesis-heavy job is exactly the kind AI is currently worst at. His advice: presume radical uncertainty, dive in rather than sneer, and assume it will probably be okay.

    Thoughts

    The most useful thing in this conversation is a single question Evans keeps returning to: what is the task, and what is the job? A spreadsheet automated the arithmetic an accountant does, and the number of accountants went up for the next forty years. Claude Code can write the code, but deciding what to build, for whom, and why is the part nobody has automated. The reason the “this profession is X percent exposed to AI” studies feel hollow is that they assume a job is a neat stack of separable tasks. Evans argues, by analogy to the old expert-systems failure, that you simply cannot decompose a senior lawyer’s work that way. The 75-slide deck is the task. Walking your company, reading its politics, talking to your customers, and telling you the uncomfortable truth is the job, and that is what you actually paid McKinsey for.

    The boldest and most falsifiable claim is that the foundation-model companies look more like cloud than like Windows. No network effects means no winner-take-all, which means durable competition, which means commodity pricing and compressed margins, with the real value accruing up the stack in applications that nobody at the labs is going to build. His telecom analogy is the one to sit with. A trillion-dollar industry grew mobile data traffic by 1,500 to 2,000 times in fifteen years, and the stocks went nowhere for a quarter century, because it was a low-margin utility while all the interesting value moved to Apple and the people building apps on top. If he is right, the current token-burn economics, the person reportedly spending 1.5 million dollars a month on tokens, are the 2010 equivalent of a 50,000 dollar roaming bill, not the steady state. Evans flags openly that he could be completely wrong, which is the intellectually honest part and the part most forecasters skip.

    “It depends” and “it will probably be okay” sound like evasions, and Evans leans into that. But the 1997 framing is doing real work. The point is not that AI is small, it is that the things that will end up mattering have not been built, and that anyone confidently naming the winners today is repeating the 1997 mistake of betting on Excite over a search company with a weird logo. The discipline he is selling is to presume radical uncertainty and act anyway, because the alternative, declaring the whole thing slop and shouting about it online, buys a great feeling of moral superiority and nothing else. His repeated insistence that you can see the job that goes away but never the new job, because it does not exist yet, is the load-bearing idea under his optimism.

    The most disarming moment is the closing AI-corner answer, where the person whose entire brand is explaining AI admits he struggles to use it. His work is synthesis and precise information retrieval, and precise retrieval happens to be exactly what today’s models are worst at. He is, in his own words, the lawyer looking at VisiCalc: it is obviously transformative, and he just does not happen to make spreadsheets all day. That admission is worth more than any benchmark, because it locates the real variable. How much AI changes your life depends less on how good the model gets and more on whether your daily work sits on the part of the jagged frontier where it already works. That is a far more practical lens than arguing about whether AGI arrives in three years or thirty.

    Key Takeaways

    • Evans’s headline opinion is that AI is as big a deal as the internet or mobile, and only as big a deal as the internet or mobile. Both halves of that sentence matter.
    • If you make the internet comparison honestly, we are roughly in 1997: very exciting, most of it does not work yet, most of what people will build has not been built, and it is unclear how any of it will end up working.
    • Adoption is spread across a very wide distribution. Even among teenagers, only something like 15 to 20 percent are daily active users and another 20 percent weekly, with the majority saying they do not use it at all.
    • That spread maps onto the “jagged frontier” question of where AI works, where it does not, whether you can predict where it will work in advance, and whether you can even tell after the fact.
    • Software developers are the accountants seeing VisiCalc: for them everything has already changed. Most other professions are watching, intrigued but unsure what to do with it.
    • The AI labs are investing heavily in forward deployed engineers, consultancies, and professional services. Evans jokes that a forward deployed engineer is an Accenture outsourced developer who lives in San Francisco.
    • Companies do not have spare people sitting around to reimagine every internal workflow, so reinventing a business around AI is itself a project that needs consultants, which is why the most cutting-edge labs are funding exactly the firms everyone assumed AI would kill.
    • The central framework: separate the task from the job. Sometimes the task is the job (the elevator operator pressing a lever), and automating the task ends the job. Far more often, the task is only part of the job.
    • Amazon gets you the SKU once you know which SKU you want. Knowing which one to buy is a different job. Claude Code writes the code, but knowing what code and what features to build is the job.
    • A McKinsey or Bain engagement is not really about the deck. The deck is the task. The job is walking your enterprise, understanding the politics, talking to your customers, and telling you the truth.
    • The Jevons paradox is just price elasticity applied to labour. Make something cheaper to produce and you usually do far more of it, not the same amount with fewer people.
    • Excel did not give investment bankers shorter hours. iPhone SDKs did not shrink the number of engineers even though Apple writes 90 percent of the code for you. The number of accountants rose through every wave of automation.
    • The lump of labour fallacy: since 1800, each technology automates jobs and unlocks new ones. You can always see the job that disappears and never the new job, because it does not exist yet.
    • Evans is wary of argument from authority on jobs. He wants Dario Amodei’s view on where models go in the next 6 to 12 months, not necessarily his theory of labour markets and comparative advantage.
    • The doomer scenario of every company buying ChatGPT and firing everyone in two weeks misunderstands how enterprises work. Enterprise sales cycles run 18 months or more. Nobody is ripping out SAP overnight. The full transformation takes 3 to 10 years, sector by sector.
    • AGI and superintelligence are being quietly redefined to mean whatever works now. Larry Tesler’s theorem: AI is whatever machines cannot do yet, because once they can, people call it just software.
    • We have no theory of human intelligence, no theory of why these models work, and no theory of how much better they will get, so everyone is vibes-forecasting. Even if progress stopped tomorrow, what exists is already transformative and will roll out for a decade.
    • On value capture, Evans argues models show no network effects, so no single one runs away with the market. Persistent competition plus little real product differentiation means little pricing power.
    • Sam Altman’s pitch of selling intelligence on a meter like electricity ignores the brutal margin structure of utilities. Your TV maker does not pay the power company a cut of your bill.
    • The telecom analogy: a roughly trillion-dollar mobile industry spends 15 to 20 percent of revenue on capex, grew data consumption 1,500 to 2,000 times since 2010, and its stocks went nowhere for 25 years because it is a low-margin commodity utility.
    • The elemental question: does the model do the whole thing, or does it need thousands of different apps built by different people? If it needs apps, the labs cannot build them all, just as Microsoft did not, so it looks more like AWS than like Windows.
    • If the product is a commodity, distribution becomes the moat. Google pushes Gemini through its surfaces, Meta sprayed AI across its apps and quietly ranked between ChatGPT and Gemini in usage, and incumbents with distribution have a structural edge.
    • Browsers are the warning: Microsoft used distribution to win the browser war, then it turned out winning browsers did not matter because the value was further up the stack.
    • Apple Intelligence, as shown at WWDC 2024, was the most compelling vision of a personal AI assistant Evans has seen. Apple could not ship it, but neither could anyone else, because tool-using on-device agents with no hallucinations across thousands of apps is genuinely hard.
    • The model is “the dumb thing underneath” that powers a feature. The same commodity model can sit beneath both Gemini on Android and Apple Intelligence on iOS while the products and distribution differ entirely.
    • The anti-AI backlash is a big fuzzy mess. Some is real (local electricity bills, deepfakes, real job anxiety), some is sort of true, and some is simply false.
    • The data-center water panic is largely fake. A Livermore lab study put US data-center water consumption at about 0.017 percent of US water use. Local well conflicts are planning problems, not data-center problems.
    • We have shockingly little hard data. The model labs do not publish meaningful usage numbers. There is no public daily active user figure for ChatGPT, so economists are reverse-engineering effects from government surveys.
    • Real new harms do appear with each wave. A teenager could not use Photoshop to make explicit fakes of every classmate and send them to the whole school in an afternoon. Now they can, and turn them into video.
    • The UK Post Office Horizon scandal (buggy Fujitsu software wrongly showing cash shortfalls, leading to prosecutions, bankruptcies, and suicides) is a reminder that every technology brings new ways to ruin lives, by malice or by accident.
    • You cannot reliably predict what gets exposed. In 1997 people thought taxis were safe from the internet and newspapers would be fine. The opposite happened. Today, “AI-proof” jobs like personal trainer may not be as safe as they look.
    • Uber and Airbnb show that similar-sounding companies can have very different market impact. Uber demolished and then grew the taxi market, while Airbnb’s effect on hotels was fairly marginal because business travel still wants a hotel.
    • Every new technology first lets you do the old thing but more, then unlocks things that were not possible before. Recorded music revenue is U-shaped: first “what if I do not pay 15 dollars for a CD,” then “what if 15 dollars a month gives me all the music there is.” Spotify is not an online music store, it is something else.
    • Coding was supposed to be one of the last things automated, and instead it is the most transformed role of all, which is itself a lesson in how badly we predict exposure.
    • Practical advice: do not stick your head in the sand. Dive in, submerge yourself, and come out understanding what you can do with it. Going into a shrinking job market announcing you will never use AI is not the right posture.
    • Evans’s honest coda: he struggles to find AI use cases because his job is synthesis and precise retrieval, the things models are worst at. He uses it for proofreading, images, redecorating his apartment, and dictation. He is the lawyer looking at VisiCalc.

    Detailed Summary

    AI is as big as the internet, and we are living in 1997

    Evans opens with the opinion he calls his most controversial: AI is as big a deal as the internet or mobile, and only as big a deal as the internet or mobile. To some in tech that sounds dismissive, as if he is underrating a once-in-history event. His reply is that smartphones and the internet were themselves enormous, and we are talking over the internet right now. The deeper point is the comparison’s timing. If this is like the internet, then it is like the internet in 1997: thrilling, but most of it does not work yet, most of what will be built has not been built, and nobody knows how the pieces will fit. His latest 80-slide presentation, he jokes, is essentially 80 ways of saying “we do not know,” which is partly facetious and partly the entire point.

    The jagged frontier and the wide spread of adoption

    Adoption is not uniform, it is a wide distribution. Some people in tech have bought clusters of Mac minis and stopped using Google, while most people outside tech who use AI at all touch it once every week or two. Even among 13 to 18 year olds, daily active use sits around 15 to 20 percent, weekly use adds another 20 percent, and roughly 60 percent say they do not use it. That spread maps onto what Evans calls the jagged frontier: whether a given task works, whether you can predict in advance that it will work, whether it is intuitive, and whether you can even tell after the fact. Software developers are the accountants who just saw VisiCalc, living in a clear before-and-after. Everyone else is somewhere on the curve, picking it up to varying degrees and a little puzzled about what it is for.

    Why the AI labs are buying consultancies

    One of the most counterintuitive trends is that the leading labs are pouring money into forward deployed engineers and professional services, the very category many assumed AI would erase. Evans’s explanation is grounded in how companies actually operate. Firms do not keep spare people sitting around to redesign stores, hunt down churn, or rebuild a tech stack, which is exactly why they hire Bain, BCG, McKinsey, Accenture, or Infosys when a big project appears. Reimagining every internal workflow around AI, then actually plugging vertical and horizontal systems together and retraining people, is itself a multi-month project requiring people you do not have. So the work gets outsourced, and the most advanced labs are funding the firms that do it. His joke lands the point: a forward deployed engineer is a statistician, or an Accenture developer, who happens to work in San Francisco.

    The task versus the job

    This is the spine of the conversation. Ask what the hard part of a job really is. Sometimes the task is the job: the elevator attendant’s whole job was driving the car, the task got automated, the job ended. Much more often the visible task is only a slice. Amazon gets you the SKU once you know which SKU you want, but knowing what to buy is a separate job. Claude Code writes the code, but deciding what to build, for whom, and how to take it to market is the job. A consulting deck is the task, while the reason you pay Bain is for them to walk your company, understand its politics, talk to your customers, and tell you the truth. Evans notes you can already generate a bad McKinsey deck with AI, and the LinkedIn grifters who do are missing that the deck was never the thing you were buying.

    Jevons paradox and the lump of labour fallacy

    The Jevons paradox is just price elasticity applied to labour: make something cheaper to do and you usually do much more of it. Excel did not hand junior bankers their Friday afternoons off, it expanded the work. iPhone developers write a fraction of the raw code because Apple wrote the drivers and file system, and there are not a tenth as many engineers, there are far more. The count of accountants climbed through adding machines, punch cards, mainframes, databases, ERP, spreadsheets, and cloud. The lump of labour fallacy is the broader version: since 1800 every technology has removed jobs and unlocked new ones, the removed jobs usually look bad in hindsight, the new ones tend to be better, and GDP keeps rising. You can always see the job that disappears and never the one that does not exist yet.

    The jobs question, Dario, and the enterprise sales cycle

    On the coming jobs apocalypse, Evans is cautious about argument from authority. Running an AI lab makes Dario Amodei worth listening to on where models go in the next 6 to 12 months, not necessarily on labour economics and comparative advantage. The doomer image of companies buying ChatGPT and firing everyone within weeks misreads reality: enterprise sales cycles run 18 months or longer, nobody is tearing out SAP overnight, and the full transformation will take 3 to 10 years, sector by sector, as people slowly work out what to do. He points to the lag in software itself. Many SaaS companies founded the day before ChatGPT launched could have been built a decade earlier, and were not, because the delay was someone realizing a problem existed and that this was the way to solve it.

    Redefining AGI and superintelligence

    Evans is skeptical of the moving terminology. He cites Larry Tesler’s line that AI is whatever machines cannot do yet, because the moment they can, people call it just software. Machine learning, image recognition, and sentiment analysis all got reclassified as not really AI once they worked, the same way jet airliners were once high technology and are now just planes. AGI is now often quietly redefined as doing some percentage of economically valuable work, which a 1975 mainframe also did, rather than anything about consciousness or a soul. Whether we reach human-level intelligence is, in his view, genuinely unknowable right now. The reassuring point is that you do not need to resolve it. Even if models hit a brick wall tomorrow, what already exists is transformative and will take a decade to deploy.

    Where the value accrues: commodity models and the telecom analogy

    Here Evans makes his most deterministic argument. Foundation models appear to lack network effects, so no single model runs away from the pack, competition persists, and product differentiation as users experience it is thin. Without differentiation or lock-in, where does pricing power come from? He skewers Sam Altman’s image of selling intelligence on a meter like electricity by pointing out that utilities have terrible margins and nobody pays the power company a cut of their TV. His telecom career supplies the analogy: mobile is a roughly trillion-dollar industry that spends 15 to 20 percent of revenue on capex, grew data traffic 1,500 to 2,000 times since 2010, and whose stocks went nowhere for 25 years because it is a low-margin commodity utility while the value sits up the stack with Apple and the app makers. If models are commodities and the real product is thousands of apps the labs will not build, the outcome looks like cloud, not like Windows.

    Distribution as the moat

    If the product is a commodity, distribution decides the winners. The web browser is the cautionary tale: the browser product is a thin wrapper around a rendering engine, tab browsing was the last real innovation 20-plus years ago, Microsoft used distribution to win, and then winning browsers turned out not to matter because the value was elsewhere. Now Google drives Gemini through its surfaces and Meta sprayed AI across its apps and, in survey data, sat between ChatGPT and Gemini in usage despite tech writing it off. An adequate product with great distribution and brand becomes a big deal, which is why OpenAI spent last year trying everything to build a flywheel before the giants defaulted everyone onto their own offering. The power of the default and sheer inertia do a lot of work.

    Apple Intelligence and the model as the dumb thing underneath

    Evans calls the Apple Intelligence segment of WWDC 2024 the most compelling vision of a personal AI assistant he has seen: tool-using, on-device, agentic, with no prompt injection or hallucinations across a standardized API spanning thousands of apps. Apple could not ship it, but neither could anyone else, because that is genuinely hard. The episode illustrates his framing that the model is “the dumb thing underneath” that powers a feature. The same commodity model can sit beneath Gemini intelligence on Android and Apple Intelligence on iOS, with different products, different distribution, and different decisions about what the feature should be. Apple has a billion edge-capable devices, while Google’s “coming soon to our most powerful devices” really means it will not work on most Android phones.

    The anti-AI backlash, water, and real harms

    The backlash, Evans says, is a big fuzzy mess of very different things. Some is tangible, like a higher local electricity bill in a small number of places. Some is essentially fake, like the water panic. He dug into a Livermore lab study putting US data-center water use at about 0.017 percent of national consumption. Local well conflicts are planning failures, not data-center failures. The jobs piece is genuinely unresolved, with charts pointing both ways and a youth employment slowdown that shows up regardless of degree or AI exposure. He stresses how little hard data exists, since the labs publish no meaningful usage numbers and there is no public daily active user figure for ChatGPT. He compares the moment to the social media backlash, compressed, where some fears were true, some half true, and some simply false. The real new harms are real, though: deepfakes let a teenager generate explicit fakes of an entire school in an afternoon, and the UK Post Office Horizon scandal shows how buggy software plus institutional denial can destroy lives.

    You cannot predict what gets exposed, and what to actually do

    Evans dismisses the O*NET-style exercise of scoring what percentage of each profession AI can do as deluded, the modern version of the expert-systems problem, where you try to describe a job as 700 logical steps and it never works. You cannot say a senior partner’s work is 17 percent automatable. The history of prediction is humbling: in 1997 people thought taxis were safe from the internet and newspapers would simply save on printing, and both were wrong. Coding, supposedly one of the last things to automate, became the most transformed role of all. Personal trainers might be next once your phone can watch your form. His closing advice is to presume radical uncertainty and act anyway: do not retreat into sneering moral superiority, dive in, internalize what the tools can do, and make yourself a great hire. He ends with a candid admission that his own synthesis-and-retrieval job is exactly what AI is currently worst at, so he is the lawyer looking at VisiCalc, sure it changes everything while not personally making spreadsheets all day.

    Notable Quotes

    “My most controversial opinion is that I think that AI is as big a deal as the internet or mobile, and only as big a deal as the internet or mobile.”

    Benedict Evans, stating the thesis that frames the whole conversation

    “If you’re going to make the internet comparison, it’s like we’re in 1997. It’s very exciting. Most stuff kind of doesn’t work yet. Most of the stuff that people are going to do hasn’t been built yet.”

    Benedict Evans, on why confident predictions about AI winners are usually wrong

    “You can’t look at a senior partner at a law firm and say, well, 17 percent of their work could be automated. This is horseshit.”

    Benedict Evans, on why O*NET-style job-exposure scoring fails

    “Claude Code can write you the code, but what code do you want? It can make you the features, sure, but what features do you want? Who’s your customer? What’s the right product for that customer?”

    Benedict Evans, drawing the line between the task and the job

    “There’s this quote from Sam Altman where he said we’re going to be selling AI intelligence on a meter like water or electricity, and you look at this and think, my dear sweet child, you need me to explain the margin structure of the utility industry to you.”

    Benedict Evans, on why model labs may lack pricing power

    “The model is just the dumb thing underneath that powers the feature. The model is the commodity that powers different decisions about what the feature should be.”

    Benedict Evans, on why value moves up the stack to applications

    “Every time we have a new technology it automates away a bunch of jobs, and then that automation unlocks a bunch of new jobs, and you don’t know the new job because it doesn’t exist yet.”

    Benedict Evans, on the lump of labour fallacy and 200 years of automation

    “Don’t stick your head in the sand and say I hate all of this stuff. That gives you a great feeling of moral superiority, but that’s not going to help. What helps is you diving into this and coming out understanding what you can do with it.”

    Benedict Evans, on what to actually do about AI right now

    “AI is good at stuff that computers are bad at, and bad at stuff that computers are good at.”

    Benedict Evans, quoting an observation that explains why he struggles to use AI in his own work

    This is a curated set of pulls, not a transcript. To hear the full argument in context, including the telecom and recorded-music charts and the lightning round, watch the full conversation on YouTube here.

    Related Reading

  • Claude Opus 4.8 Released: Anthropic Bets on Honesty, Dynamic Workflows, Effort Control, and Cheaper Fast Mode

    Anthropic has released Claude Opus 4.8, the newest member of its flagship Opus class, available today across every surface and priced exactly like the model it replaces. The company calls it “a modest but tangible improvement” on Opus 4.7, but the framing undersells what is actually interesting here: the headline upgrade is not a benchmark number, it is honesty. Opus 4.8 is built to know when it does not know, and that single behavioral shift may matter more for real agent work than any raw capability bump.

    TLDR

    Claude Opus 4.8 is an across-the-board upgrade to Anthropic’s Opus class that ships today at the same regular price as Opus 4.7 ($5 per million input tokens, $25 per million output tokens), with the model positioned as “a more effective collaborator.” The marquee improvement is honesty: Opus 4.8 is roughly four times less likely than its predecessor to let flaws in its own code pass unremarked, and it is more willing to flag uncertainty rather than confidently claim progress on thin evidence. A pre-release alignment assessment found new highs on prosocial traits like supporting user autonomy and acting in the user’s best interest, with misaligned behavior at rates similar to Anthropic’s best-aligned model, Claude Mythos Preview. Three things launch alongside the model: dynamic workflows in Claude Code (research preview), where Claude plans work then runs hundreds of parallel subagents that run even longer and verify their own outputs before reporting back; effort control in claude.ai and Cowork, a slider for how hard Claude thinks; and a Messages API update that accepts system entries inside the messages array so developers can update instructions mid-task without breaking the prompt cache. Fast mode now runs at 2.5x speed and is three times cheaper than before ($10 / $50 per million tokens). The roadmap points to cheaper Opus-equivalent models, a higher-intelligence class above Opus, and a wider rollout of Mythos-class models gated behind stronger cyber safeguards under Project Glasswing.

    Thoughts

    The most important sentence in this announcement is not about coding scores. It is the claim that Opus 4.8 is about four times less likely than Opus 4.7 to let flaws in its own code slip by without comment. For a chat assistant, overconfidence is annoying. For an agent, it is catastrophic. The whole premise of long-running autonomous work is that you hand the model a task and walk away, which means the model’s own judgment about whether it succeeded becomes the only judgment in the loop until you come back. A model that confidently declares victory on a half-finished migration does not save you time, it costs you a debugging session plus the time you spent trusting it. Honesty, framed this way, is not a soft virtue. It is the load-bearing reliability property that makes unattended agents usable at all.

    Read the launch as a single coherent argument rather than a list of features, and the pieces lock together. Dynamic workflows let Claude plan a job and fan out hundreds of parallel subagents that, with Opus 4.8, run longer than before. Effort control lets you dial up how much the model thinks. The honesty improvement means the model checks its own work and flags what it is unsure about instead of papering over it. Put those three together and you get one product thesis: let it run longer, let it think harder, and trust it to tell you when something is wrong. The codebase-scale migration example, hundreds of thousands of lines from kickoff to merge with the existing test suite as the bar, is the proof point. None of those three capabilities is worth much alone. A model that runs for hours but lies about its results is a liability. A model that flags uncertainty but cannot sustain a long task never reaches the moment where its honesty matters. Anthropic shipped all three at once because they only pay off together.

    The economics deserve a closer look than the “same price” headline invites. Regular pricing is flat versus Opus 4.7, which is the polite way of saying you get a better model for free. The real move is fast mode: 2.5x the speed at three times cheaper than it cost on previous models, landing at $10 per million input and $50 per million output. That is Anthropic quietly attacking the latency-versus-cost tradeoff that has shaped how teams deploy frontier models. Until now, “fast” meant “expensive,” so you reserved it for interactive moments and ate the wait everywhere else. Collapsing that premium changes the default. And note the subtle token story underneath: Opus 4.8 at its default high effort spends roughly the same tokens on coding as Opus 4.7’s default while performing better, so the effort slider is not a way to bleed you dry, it is an honest exposure of the quality-cost dial that was always there implicitly.

    The Messages API change is the kind of unglamorous plumbing that practitioners will appreciate immediately. Letting system entries live inside the messages array means you can update an agent’s instructions, permissions, token budget, or environment context partway through a task without smuggling the update through a fake user turn and without blowing up your prompt cache. Anyone who has built a long-running agent has hit this wall: the world changes mid-task, the agent needs new constraints, and the only clean way to inject them previously was a cache-busting hack. This is Anthropic treating agents as first-class, stateful, long-lived processes rather than oversized chat sessions. It is a small spec change with outsized implications for how you architect an agent that runs for an hour.

    Then there is the roadmap, where the most telling line is the quietest. Anthropic says a small number of organizations are already using Claude Mythos Preview for cybersecurity work under Project Glasswing, and that models of this capability level require stronger cyber safeguards before general release. Notice that they are pinning Opus 4.8’s alignment numbers to Mythos as the benchmark for “best-aligned,” while simultaneously holding Mythos back from general availability on safety grounds. That is a deliberate signal: the next class of model is good enough that they are gating it on cyber-offense risk, not on capability. For a site about the pursuit of joy, fulfillment, and purpose through AI, this is the part worth sitting with. The frontier is increasingly defined not by what the models can do, but by what their builders decide it is responsible to ship. Honesty in the small (flagging a bad line of code) and restraint in the large (holding back a cyber-capable model) are the same instinct expressed at two different scales.

    Key Takeaways

    • Claude Opus 4.8 is now available everywhere, replacing Opus 4.7 as Anthropic’s flagship Opus-class model and positioned as “a more effective collaborator.”
    • Regular usage pricing is unchanged from Opus 4.7, holding at $5 per million input tokens and $25 per million output tokens, so the capability gains come at no added cost.
    • The single most emphasized improvement is honesty, which Anthropic treats as a core trained behavior rather than a marketing flourish.
    • Evaluations show Opus 4.8 is around four times less likely than its predecessor to let flaws in its own code pass unremarked, a direct reliability win for autonomous coding.
    • Early testers report the model is more likely to flag uncertainty about its work and less likely to make unsupported claims or jump to conclusions on thin evidence.
    • A detailed alignment assessment was run before release and concluded Opus 4.8 reaches new highs on prosocial traits like supporting user autonomy and acting in the user’s best interest.
    • Misaligned behavior such as deception or cooperation with misuse is at rates substantially lower than Opus 4.7 and similar to Anthropic’s best-aligned model, Claude Mythos Preview.
    • The full alignment assessment and pre-deployment safety tests are documented in the public Claude Opus 4.8 System Card.
    • Dynamic workflows launch as a research preview inside Claude Code, letting Claude plan the work and then run hundreds of parallel subagents in a single session.
    • With Opus 4.8, those subagents can run even longer, and Claude verifies its outputs before reporting back rather than declaring success blindly.
    • Anthropic’s flagship example for dynamic workflows is a codebase-scale migration across hundreds of thousands of lines of code, from kickoff to merge, using the existing test suite as the success bar.
    • Dynamic workflows are available in Claude Code for the Enterprise, Team, and Max plans.
    • Effort control arrives in claude.ai and Cowork as a setting next to the model selector that lets users choose how much effort Claude puts into a response.
    • Higher effort makes Claude think more frequently and deeply for better answers; lower effort responds faster and consumes rate limits more slowly. Effort control is available on all plans.
    • Opus 4.8 defaults to “high” effort, judged the best overall balance of quality and user experience.
    • On coding tasks, the default effort spends a similar number of tokens as Opus 4.7’s default but delivers better performance, so quality rises without a token penalty.
    • Users can select “extra” (called “xhigh” in Claude Code) or “max” to spend more tokens for stronger results, and Anthropic recommends “extra” for difficult tasks and long-running asynchronous workflows.
    • Rate limits in Claude Code were increased to accommodate the higher token usage of the higher effort levels.
    • The Messages API now accepts system entries inside the messages array, a meaningful change for agent developers.
    • That update lets developers change Claude’s instructions mid-task, adjusting permissions, token budgets, or environment context, without breaking the prompt cache or routing through a user turn.
    • Fast mode now runs at 2.5x speed and is three times cheaper than it was for previous models, priced at $10 per million input tokens and $50 per million output tokens.
    • Developers access the model as claude-opus-4-8 through the Claude API.
    • Partner Miguel Gonzalez reports Opus 4.8 scored 84% on Online-Mind2Web, a meaningful jump over both Opus 4.7 and GPT-5.5, calling it the strongest computer-use and browser-agent model his team has tested.
    • Databricks reports that, inside Genie, Opus 4.8 reasons over unstructured content like PDFs and diagrams at 61% cheaper token cost than Opus 4.7.
    • Thomson Reuters reports Opus 4.8 is the first model to break 10% overall on the all-pass standard of its Legal Agent Benchmark, the highest score recorded there.
    • Eleven partners weighed in, including Cursor, Cognition’s Devin, Databricks Genie, Thomson Reuters CoCounsel, and Hebbia, spanning coding, legal, finance, and enterprise data work.
    • Anthropic is working on models that deliver many of the same capabilities as Opus at a lower cost.
    • The company plans to release a new class of model with even higher intelligence than Opus.
    • Under Project Glasswing, a small number of organizations are already using Claude Mythos Preview for cybersecurity work, with Mythos-class models expected to reach all customers in the coming weeks once stronger cyber safeguards are in place.

    Detailed Summary

    What Claude Opus 4.8 Is

    Claude Opus 4.8 is an upgrade to Anthropic’s Opus class of models, building on Opus 4.7 with improvements across benchmarks covering coding, agentic skills, reasoning, and practical knowledge-work tasks. Anthropic describes the result as “a more effective collaborator” while characterizing the release overall as “a modest but tangible improvement on its predecessor.” The model is available today, everywhere, and developers call it as claude-opus-4-8 via the Claude API. The announcement includes a comparison table against the predecessor and other models, though the per-cell numbers in that table are published as an image and are not reproduced here as text.

    Honesty: The Headline Improvement

    Anthropic singles out honesty as one of the most prominent improvements in Opus 4.8. All of the company’s models are trained to be honest, which includes avoiding claims they cannot support. A persistent problem with AI models generally is that they sometimes jump to conclusions, confidently claiming progress despite thin evidence. Early testers report that Opus 4.8 is more likely to flag uncertainties about its own work and less likely to make unsupported claims. The most concrete measure: evaluations show Opus 4.8 is around four times less likely than its predecessor to allow flaws in code it has written to pass unremarked. For agentic and unattended use, this self-skepticism is the difference between a model that reliably tells you when something went wrong and one that quietly ships a broken result.

    Alignment Assessment

    A detailed alignment assessment was run before release. On the positive side, the Alignment team concluded that Opus 4.8 “reaches new highs on our measures of prosocial traits like supporting user autonomy and acting in the user’s best interest.” On the risk side, misaligned behavior such as deception or cooperation with misuse occurs at rates substantially lower than Opus 4.7, and similar to Anthropic’s best-aligned model, Claude Mythos Preview. The full alignment assessment and the pre-deployment safety tests are published in the Claude Opus 4.8 System Card, which also contains the complete benchmark table and wider evaluations.

    Dynamic Workflows in Claude Code

    Launching today as a research preview in Claude Code, dynamic workflows let Claude plan the work and then run hundreds of parallel subagents in a single session. With Opus 4.8, those agents can run even longer than before, and Claude verifies its outputs before reporting back rather than reporting unchecked results. The showcase example is a codebase-scale migration: Claude Code with Opus 4.8 can carry out migrations across hundreds of thousands of lines of code, all the way from kickoff to merge, using the existing test suite as its bar for success. Dynamic workflows are available in Claude Code for the Enterprise, Team, and Max plans.

    Effort Control

    Effort control arrives in claude.ai and Cowork as a setting alongside the model selector that lets users choose how much effort Claude puts into a response. Higher effort means Claude thinks more frequently and deeply for better responses; lower effort means it responds faster and uses rate limits more slowly. Opus 4.8 defaults to “high” effort, which Anthropic judged the best overall balance of quality and user experience. On coding tasks, that default spends a similar number of tokens as Opus 4.7’s default while performing better. Users who want more can choose “extra” (called “xhigh” in Claude Code) or “max” to spend more tokens for stronger results, and Anthropic recommends “extra” for difficult tasks and long-running asynchronous workflows. To support the heavier token usage at higher effort levels, rate limits in Claude Code were increased. Effort control is available on all plans.

    Messages API Update

    The Messages API now accepts system entries inside the messages array. This lets developers update Claude’s instructions mid-task without breaking the prompt cache and without routing the update through a user turn. In practice that means you can update permissions, token budgets, or environment context while an agent is running, which is exactly the kind of statefulness a long-running autonomous process needs. It is a small specification change with significant consequences for how developers build durable agents.

    Pricing and Fast Mode

    Regular usage pricing is unchanged from Opus 4.7: $5 per million input tokens and $25 per million output tokens. The notable shift is in fast mode, where the model works at 2.5x the speed and fast mode is now three times cheaper than it was for previous models, landing at $10 per million input tokens and $50 per million output tokens. The combination of unchanged regular pricing and dramatically cheaper fast mode reshapes the latency-versus-cost calculus that has long governed how teams deploy frontier models.

    Partner Results Across Coding, Legal, Finance, and Data

    Eleven partners shared results spanning the spectrum of professional work. Miguel Gonzalez reports 84% on Online-Mind2Web, a meaningful jump over both Opus 4.7 and GPT-5.5, calling it the strongest computer-use and browser-agent model his team has tested. Databricks reports that Genie reasons over unstructured content like PDFs and diagrams at 61% cheaper token cost than Opus 4.7. Thomson Reuters reports Opus 4.8 is the first model to break 10% overall on the all-pass standard of its Legal Agent Benchmark. Cursor reports gains across every effort level on CursorBench with more efficient tool calling, and Cognition reports that Devin sees cleaner tool use, fixes to the comment-verbosity and tool-calling issues seen with Opus 4.7, and improvements over Opus 4.6. Hebbia reports strong quality with better citation precision and more token efficiency on retrieval for dense financial filings. The footnotes note that Terminal-Bench 2.1 was scored on the Terminus-2 public harness (GPT-5.5’s Codex CLI harness score is 83.4%), that OSWorld-Verified methodology changed with Opus 4.7’s score updated to 82.3%, and that on Finance Agent v2 Gemini 3.5 Flash scores 57.9%.

    What Is Next: Cheaper Models, Higher Intelligence, and Mythos

    Anthropic outlined a three-part roadmap. First, the company is working on models that provide many of the same capabilities as Opus at a lower cost. Second, it plans to release a new class of model with even higher intelligence than Opus. Third, as part of Project Glasswing, a small number of organizations are currently using Claude Mythos Preview for cybersecurity work; models of this capability level require stronger cyber safeguards before general release, and Anthropic expects to bring Mythos-class models to all customers in the coming weeks.

    Notable Quotes

    “Claude Opus 4.8 has noticeably better judgment. In Claude Code, it asks the right questions, catches its own mistakes, pushes back when a plan isn’t sound, and builds up confidence around complex, multi-service explorations before making big changes. It’s a great model to build with.”

    Tom Pritchard, Staff Engineer, in Claude Code

    “On our Super-Agent benchmark, Claude Opus 4.8 is the only model to complete every case end-to-end, beating prior Opus models and GPT-5.5 at parity on cost. For agent products in translation, deep research, slide-building, and analysis, it delivers powerful reliability.”

    Kay Zhu, Co-Founder and CTO, on the Super-Agent benchmark

    “On CursorBench, Claude Opus 4.8 exceeds prior Opus models across every effort level. Tool calling is meaningfully more efficient, using fewer steps for the same intelligence, and it carries end-to-end tasks through.”

    Michael Truell, Co-Founder and CEO, on CursorBench results

    “Claude Opus 4.8 delivers the highest score recorded on our Legal Agent Benchmark, and is the first model to break 10% overall on the all-pass standard. For substantive legal work, that’s the kind of accuracy lift that translates directly into how much real attorney work our customers can hand off with confidence.”

    Niko Grupen, Head of Applied Research, on the Legal Agent Benchmark

    “Claude Opus 4.8 feels like a major quality-of-life update over Opus 4.7: faster, easier to collaborate with, and better at carrying context and style direction across a long session. Opus 4.8 is the model I kept trusting for work where voice, taste, and technical execution all have to happen side-by-side.”

    Katie Parrott, Staff Writer, on long writing sessions

    “Claude Opus 4.8 is the strongest computer-use and browser-agent model we’ve tested, scoring 84% on Online-Mind2Web, which is a meaningful jump over both Opus 4.7 and GPT-5.5. It stays reflective and on-task in the way our customers’ agent workloads need to be reliable end-to-end.”

    Miguel Gonzalez, Tech Lead, on computer-use and browser agents

    “Claude Opus 4.8 uses tools cleanly and follows instructions with the consistency our autonomous engineering workloads need to keep running unattended. It improves on Opus 4.6 and fixes the comment-verbosity and tool-calling issues we saw with Opus 4.7. This release from Anthropic translates directly into faster capability gains for engineers building on Devin.”

    Scott Wu, CEO, on building with Devin

    “On our long-running evals, Claude Opus 4.8’s analysis was consistently higher quality than prior Opus models. It finished faster and produced richer, more information dense outputs. Overall, a noticeably better signal to noise ratio. The biggest differentiator was Opus 4.8’s tendency to proactively flag issues with the inputs and outputs of an analysis, something other models routinely missed and left to the users to catch.”

    Michael Ran, Sr. Investment Associate, on long-running analysis evals

    Claude Opus 4.8 is a quieter release than its “modest but tangible” billing suggests, because the gains land where autonomous work actually lives: a model that flags its own uncertainty, runs longer and checks itself, scales effort on demand, and stays affordable while fast mode gets cheaper. The honesty improvement alone changes the trust math for anyone deploying agents. Read Anthropic’s full announcement here.

    Related Reading

  • Waste Tokens to Save Time: Naval, Guillermo Rauch, Blake Scholl, and Max Hodak on AI Software Factories, 1000x Engineers, and Whether Pure Software Is Dead

    Naval Ravikant gathers three frontier founders, Guillermo Rauch of Vercel, Blake Scholl of Boom Supersonic, and Max Hodak of Science, for a freewheeling conversation about how AI coding tools are reshaping what an engineer is, what software is worth, and where the moat goes when models speak English. The headline idea comes from Naval himself: waste tokens, save time. Stop measuring AI by tokens consumed or lines of code generated and start measuring it by the final output and the time you got back. The full conversation is on the Naval Podcast YouTube channel. This is part one of the discussion. Part two, on vibe coding hardware, follows the same group into jet engines, semiconductors, and biotech. You can also watch and read the full episode here.

    TLDW

    The job of an engineer is shifting from shipping output to building the factory that ships the output, which means 10x engineers were never really 10x, they were always 100x or 1000x in idea domains, and AI leverage is making that obvious. Models now reflect back the judgment of the user, so a senior architect extracts dramatically more value than a junior, although the junior also writes code they could never have written alone. The frontier models have quietly graduated from junior coders to principal engineers, returning with intuitive plans and real tradeoffs (sometimes with hilariously bad time estimates) rather than just running away with the prompt. Naval has stopped learning prompt tricks, scaffolding tools, and Claude plan-mode rituals entirely. Instead he throws Codex, Claude, and Gemini at the same problem in parallel and brute forces his way through, because tokens are still cheaper than a human and the models keep getting better faster than tricks can. That leads to the bigger question on the table: is pure software still investable, or is it now just a free byproduct of hardware, models, and taste? The group lands on the block economy thesis (a tip of the hat to Mitchell Hashimoto): agents do not want to reinvent Postgres or BMQ on the fly, they want to grab the right reusable building block, so infrastructure software actually gets more valuable, not less. Max Hodak closes the loop with a personal data point: he has not written a line of code in years and has built more software since December than ever before, all through agents, because just understanding APIs, data flow, and performance is what actually moves the work forward.

    Thoughts

    The “waste tokens, save time” line is the most important rhetorical move in this conversation, and it deserves to be unpacked beyond the soundbite. Naval is implicitly arguing that the entire token-economics debate (input cost, output cost, leaderboards, model arbitrage) is a category error in the same way that lines-of-code was a category error in the nineties. The thing being purchased is not tokens. It is a finished result delivered with less of your finite attention spent. If three parallel runs of Codex, Claude, and Gemini cost you a few dollars and one of them lands the answer in twenty minutes instead of you sweating the problem for two hours, the unit economics are not even close. The only people who care about the token bill are people who have not internalized that human time is the actually scarce resource. Once you do internalize it, the question is no longer “how do I prompt this more efficiently,” it is “how do I get out of my own way.”

    The 100x and 1000x engineer point is the one most likely to enrage commenters, and it is also the one most worth taking seriously. Naval is right that the egalitarian flinch in software circles always sat awkwardly next to the empirical fact that one Carmack, one Brendan Eich, or one Satoshi creates more durable value than every mid-tier engineer on earth combined. What AI does is collapse the bottom of that distribution. The marginal junior engineer at a typical company is now competing with a model that costs a few dollars an hour and never sleeps. The remaining premium for human engineers is taste, judgment, and the rare ability to pick the right thing to build at all, which Naval correctly flags as the multiplier that dwarfs raw coding speed. “Just one who had a better judgment on what to work on in the first place” is the most underrated line in the whole episode.

    Guillermo Rauch’s observation that the models have graduated from running away with your prompt to returning with three routes and a tradeoff matrix is the technical update most people have not actually felt yet. There was a real, qualitative shift when the model started saying “we don’t put high-cardinality telemetry into Postgres, you probably want ClickHouse or Athena.” That is not autocomplete. That is a peer. And the funny corollary, that the same model will then confidently tell you the work will take three weeks when it will take three hours, is not a knock on the model. It is a reminder that calibration is a separate skill from competence, and humans get this wrong constantly too. The right posture is to treat the model the way a good engineering manager treats a strong but cocky senior: take the architecture suggestions seriously, throw out the estimates.

    The block-economy thread, riffing on Mitchell Hashimoto, is where this conversation quietly answers Naval’s “is pure software dead” question. Agents are insatiable consumers of reusable building blocks because reinventing infrastructure on every run is wasteful, brittle, and incompatible with the rest of the world. If your service is the canonical primitive an agent reaches for (the queue, the database, the auth layer, the deploy target), you are not commoditized by AI, you are amplified by it. Pure software is not dead. Pure software with no distribution, no defensibility, and no integration into the agent toolchain is dead. That is a much less catchy headline, but it is the real one. The takeaway for founders is not to abandon software, it is to ask whether your software is something an agent will reach for ten thousand times a day or something a human had to be talked into using once.

    Max Hodak’s confession (no code written in years, more shipped software in the last six months than ever before) is the empirical proof that this is not just theory. The skill that ports forward is not syntax. It is the engineering leader’s instinct for what an API is, how data flows, where performance matters, and what level of expectation to set. Guillermo’s framing of “vibe coding through people on Slack” as the original form of vibe coding is genuinely insightful. A good engineering manager has always been transmitting intent to other minds and letting them run. Doing it with agents is the same skill, just with a faster, cheaper, more literal counterparty. The engineers who will struggle in this transition are the ones whose identity was tied to writing the code themselves. The ones who will thrive are the ones who already thought of themselves as taste, judgment, and intent, with code as an implementation detail.

    Key Takeaways

    • The engineer’s job has shifted from shipping output B to building the factory that produces outputs B through Z. You are now judged on the multiplicative system you create, not the single artifact you deliver.
    • 10x engineers were always a misnomer. In idea-domains and digital domains, the real distribution has always been 100x or 1000x. AI just made that obvious enough that arguing about it is no longer fashionable.
    • Token consumption leaderboards are the new lines-of-code metric: a vanity number that measures activity, not value. Tokens are an input, your time is the constraint.
    • Naval’s core rule: waste tokens, save time. Tokens are still vastly cheaper than human hours, no matter how the pricing scares you.
    • Models tend to be about as good as you are in a given domain. The feedback you give them, the corrections, the redirections, sporadically but powerfully shapes the quality of the output.
    • The quality of your reprompting matters enormously today, but will probably matter less over time as models get smarter and need less hand-holding.
    • Naval has refused to learn prompt scaffolding, plan-mode tricks, or named prompt frameworks. His bet is that the models will figure out how to use him faster than he can figure out how to use them.
    • His preferred technique: throw Codex, Claude, and Gemini at the same problem in parallel and brute force the answer. Time is the cost center, not API spend.
    • Lower quality first-draft code is not a blocker. When it is time to ship, throw more tokens at it for a hardening pass. Quality compounds across model generations.
    • Verifiable domains (problems with a clear right answer) are the ones the models will fully solve. Cutting-edge creativity work, the Terence Tao tier, still needs careful human collaboration.
    • Models have qualitatively shifted from “next-token autocomplete that runs away with your prompt” to “intuitive planning mode” where they return with multiple routes and explicit tradeoffs.
    • This is why people on social media say models are now PhD-level. It is not the raw output, it is the back-and-forth posture.
    • Models will confidently make terrible time estimates (“this is a three week project”). Treat them like a strong but miscalibrated senior engineer: trust the architecture, ignore the schedule.
    • Architect-level engineers are extracting much more value per session than junior engineers, but juniors are still leveling up because they can now write code far above their unaided ability.
    • The next career step for a junior engineer is moving from implementing features to picking technologies. Postgres vs ClickHouse, ZMQ vs other queues. The model can suggest, but a human still has to decide.
    • Taste and judgment remain the residual human advantage. Models will give you good tradeoffs if you ask, but knowing which tradeoff to take is still on you.
    • Concrete example: a recent model pushed back when asked to store high-cardinality telemetry in Postgres and recommended ClickHouse or Athena instead. Unprompted architectural judgment.
    • Humans are still completing the model for tasks like fetching API keys, moving capital, or performing real-world actions. That gap is temporary.
    • Every SaaS and hosting company will soon expose a CLI or API surface that agents can drive directly. Anything Unix-shaped and text-based, agents can already hack into a usable API themselves.
    • The missing piece for full autonomy is payments. Crypto, Bitcoin, or any programmable money lets the agent buy what it needs without a human in the loop.
    • The open question Naval poses: is pure software dead? We used to learn code to talk to machines. Now machines speak fuzzy, sloppy English back to us.
    • For hardware founders, AI is a massive boon. Software, which was always hard to hire artists for (per Patrick Collison’s “software is art” framing), is suddenly fast and cheap to produce alongside the hardware.
    • Model training, post-training, and fine-tuning may be the new “real software engineering” for those who want to work at the model layer.
    • Mitchell Hashimoto’s “block economy” thesis: agents need powerful, reusable, well-known building blocks. They should not reinvent message queues or databases every run.
    • Reinventing primitives is bad civic engineering. The value of “we both depend on Postgres 13.2” is interoperability with the rest of society and toolchain.
    • Infrastructure software and reusable libraries are getting more valuable, not less, in the agentic era. Vercel’s bet is on being the layer agents reach for.
    • Useful metaphor: building blocks are like a token cache. Why churn through a trillion tokens to reproduce code that already exists when you can fork from a known starting point?
    • Max Hodak has not written a line of code in years but has shipped a huge volume of personal software since December, all through agents. Projects he had fantasized about for years are now actually running.
    • What still matters from a real software background: understanding what an API is, how data flows, performance expectations, and how to set the right level of demand on an operation.
    • A proficient engineering leader has always been “vibe coding through people” on Slack and in one-on-ones, transmitting intent and letting others execute. Doing it with agents is the same skill, faster and cheaper.
    • Naval personally went from twenty years of not coding to coding constantly through agents, leaning on first-principles software engineering and algorithms knowledge.
    • The friction that historically killed personal coding projects (latest framework, infra plumbing, deploy setup) is now mostly handled by the agent. Vercel makes it easier, agents make it trivial.
    • The single biggest change Max highlights: you do not get stuck anymore. The indefinite debugging spiral on some narrow obscure bug is largely gone.
    • The old mantra that learning to program means accepting intrinsic frustration (“nope, that’s part of the deal”) is no longer true. The frustration was incidental, not essential.
    • The frontier founder pattern on display in this episode: all three guests build their own factories (Vercel’s AI cloud, Boom’s supersonic jets and engines, Science’s biohybrid brain interface) rather than composing from off-the-shelf parts.

    Detailed Summary

    The Software Factory and the Hundredfold Engineer

    Guillermo Rauch opens the substantive portion of the conversation with the framing he has been pushing publicly: the role of the engineer is moving from “ship output B” to “build the factory that ships outputs B through Z.” That reframes engineering judgment. You are no longer evaluated on the single deliverable, you are evaluated on the multiplicative system you put in place. Naval picks up the thread and points out that this also retires an old debate. Engineers used to argue about whether 10x engineers existed, with the egalitarian camp insisting that talent differences were marginal. The truth, Naval says, was always more extreme. In idea-domains, virtual domains, and intellectual domains, the distribution has always been 100x or 1000x, not 10x. Brendan Eich, Carmack, Satoshi, the canonical names, were thousandx programmers. AI has made the underlying distribution legible. And the multiplier on top of all of that is judgment: picking the right thing to work on in the first place is an infinity multiplier compared to picking the wrong thing, regardless of raw skill.

    Token Leaderboards Are the New Lines of Code

    Guillermo flags the current cultural confusion: people see their AI bills, see the token counts, and assume they should be optimizing for tokens-per-engineer or similar metrics. Max Hodak’s response cuts through it. Token consumption, like lines of code before it, is not a meaningful productivity metric. It is an activity metric, and activity metrics always mislead. Max adds his own field observation: the models tend to be roughly as good as you are in a given domain. A senior developer extracts genuinely powerful output, a junior gets junior-quality output back, because the feedback loop (the corrections, the redirections, the architectural pushback) is what shapes quality. The sporadic but high-leverage moments where the user redirects the model are doing more work than the prompt itself.

    Naval’s Brute Force Doctrine: Waste Tokens, Save Time

    Naval lays out his personal posture, which has become the title of the conversation. He has deliberately ignored all the prompting tricks, scaffolding tools, named prompt frameworks (“use Ralph Wigum, use OpenClaude, use Hermes, use plan mode”), on the bet that the models will figure out how to use him faster than he can figure out how to use them. He is ham-fisted with the models, gets frustrated, types less and less, and just brute forces his way through by running Codex, Claude, and Gemini at the same problem simultaneously. The justification is economic. No matter how expensive the models seem, they are still vastly cheaper than a human hour. Do not measure tokens as inputs or outputs. Measure your time and the final output. Even when the first-draft code is low quality, that is not a blocker. When the moment comes to ship, throw more tokens at it. The models will rewrite it, harden it, and they get better every generation. Naval explicitly excepts cutting-edge creative work (the Terence Tao tier of unsolved problems) where you still need to collaborate carefully and closely. Everywhere else, brute force is the dominant strategy.

    From Junior Coder to Principal Engineer

    Guillermo identifies a qualitative shift that has happened recently. Models used to do the classic next-token thing: take your prompt and run away with it in a direction you may not have wanted. Now they enter an intuitive planning posture without being told to plan. They come back and say “what you are asking has these three routes, here are the tradeoffs.” That, Guillermo argues, is the moment the model stopped being a junior engineer and became a principal engineer. The funny side effect is that they will then return preposterous time estimates (“this will take three weeks”) with full confidence. The conclusion is to treat the model as a peer for architecture and a baby for scheduling. Returning to the Max-vs-junior question, Guillermo argues juniors clearly do level up because they write code well above their solo ability, but architects extract maybe 10x while juniors extract more like 2x. The juice scales with the user’s existing taste.

    Taste, Judgment, and Architectural Decisions

    Max names the residual human contribution: taste and judgment. Picking between Postgres and ClickHouse for high-cardinality telemetry data, picking between ZMQ and another queueing system. The models can recommend, but a human still has to call it. Guillermo offers a recent concrete example where a model pushed back unprompted: when asked to put high-cardinality telemetry into Postgres, the model responded “we don’t put that kind of data into Postgres, you should consider ClickHouse or Athena.” That is the new normal. The peer-level architectural pushback is happening unsolicited, which is genuinely impressive and a real shift from the deferential autocomplete of two years ago.

    When the Human Becomes the Tool

    Guillermo raises the inversion question: at what point does the model stop being the assistant and the human start being the assistant who fetches API keys, moves capital, and performs real-world actions on the model’s behalf? Naval treats it as a temporary aberration. Every serious SaaS and hosting provider will soon expose a CLI or API surface that agents can drive directly. Even when they do not, anything Unix-shaped and text-based can be hacked into an agent-usable interface by the agent itself. The missing piece is payments. Once you insert programmable money (Naval mentions Bitcoin and crypto tokens), the agent can buy what it needs and the human is no longer the bottleneck.

    Is Pure Software Dead?

    Naval poses the biggest strategic question of the episode. If models now speak fuzzy, sloppy English the same way humans do, and the historical reason we learned to code was to talk to machines that did not understand English, is pure software still a viable thing to build a company around? His own framing of the answer: hardware founders win, because the historically hard problem of hiring software artists (per Patrick Collison’s “software is art” line) is now mostly solved by AI. Model builders win, because training, post-training, and fine-tuning may be the new “real software engineering.” But what about classic pure software companies? Naval lets the question hang, and Guillermo picks up the answer through a different door.

    The Block Economy and the Future of Infrastructure Software

    Guillermo cites Mitchell Hashimoto’s recent piece on the block economy (or “building block economy”). The argument: the most valuable thing for agents to have access to is powerful, reusable building blocks. You do not want your agent reinventing a queue system every time it needs to send an email. You want it to grab the right-sized block (BMQ, ClickHouse, whatever) and move on. Reinventing primitives is also a civic problem. The world only works because we all depend on the same Postgres 13.2, the same protocols, the same standard infrastructure. If every agent went off and invented its own bespoke universe, you would lose interoperability. So infrastructure software (which is, by self-admitted bias, what Vercel builds) becomes more valuable in the agentic era, not less. Guillermo extends the metaphor: reusable building blocks are like a token cache. Why burn a trillion tokens reproducing what already exists when the agent can fork from a known starting point? The block economy is the answer to “is pure software dead.” Pure software that becomes the canonical primitive an agent reaches for is more valuable than ever.

    Max Hodak’s Personal Proof: Years Without Code, Tons of Software Shipped

    Max grounds the discussion in his own experience. He learned to program young, got sucked into it in his teens and 20s, knew programming languages deeply. He has not written a line of code in quite a while. And yet since December he has built a huge amount of personal software, including projects he had fantasized about for years and now actually uses every day. He did not write any of it. He cannot imagine going back to writing code by hand. The skill that ports forward is not syntax, it is the understanding of how APIs work, how data flows, what level of performance to expect, and how to orient the model around the right expectations for an operation. Guillermo extends this with the most quotable framing of the episode: a proficient engineering leader has always been “vibe coding through people on Slack and in one-on-ones,” transmitting intent and letting others execute. Agents are the same modality with a faster, cheaper, more literal counterparty.

    Naval’s Return to Coding After Twenty Years

    Naval offers his own parallel. He went from not having written code in twenty years to coding constantly through agents. What carried him back in was first-principles knowledge of software engineering and algorithms, which gets you further than you would think. The reason he had stopped coding in the first place was not lack of ability, it was the friction of keeping up with the latest language, the latest architecture, and the constant infrastructure plumbing required to ship anything. Vercel made it easier. Agents made it trivial. Max closes with the most concrete benefit of all: you do not get stuck anymore. The indefinite debugging spiral on some obscure narrow problem, the thing that historically ate weekends and broke spirits, is largely gone. The old mantra that programming is intrinsically frustrating and that frustration is “part of the deal” turned out to be wrong. The frustration was incidental, not essential.

    Notable Quotes

    “The way that I’m judging you as an engineer is, are you producing the factory that will produce multiplicative outputs B through Z?”

    Guillermo Rauch, reframing what an engineer is actually being measured on in the AI era.

    “When you’re operating in idea domains, intellectual domains, virtual digital domains, it’s not even 10x, it’s 100x or 1000x. It always has been.”

    Naval Ravikant, on why the old 10x engineer debate was always under-stating the real distribution.

    “If you choose the right thing to work on versus the wrong thing to work on, that’s an infinity difference. It could just be one who had a better judgment on what to work on in the first place.”

    Naval Ravikant, on judgment as the multiplier that dwarfs raw skill.

    “I’ll throw Codex, Claude, and Gemini at the same problem over and over and just waste tokens to save time. No matter how expensive these models might seem, they’re still way cheaper than a human.”

    Naval Ravikant, on his brute-force multi-model coding workflow.

    “Just waste tokens, save time. Don’t look at the tokens either as inputs or outputs. Just look at your time and look at the final output.”

    Naval Ravikant, delivering the title thesis of the episode.

    “Clearly the models at some point graduated. They used to be junior engineers, now they’re principal engineers, because they come back to you with a set of tradeoffs.”

    Guillermo Rauch, on the qualitative shift in how current frontier models respond to prompts.

    “Bro, we don’t put that kind of data into Postgres, you should consider ClickHouse or Athena or whatever. That’s happened to me a lot, which is really impressive.”

    Guillermo Rauch, recounting unprompted architectural pushback from a recent model.

    “It’s like saying speaking English. We had to learn code to communicate with the models, now the models speak English. So where’s the moat?”

    Naval Ravikant, raising the central strategic question about the future of pure software.

    “I haven’t written a single line of code in quite a while. Since December, I’ve built a huge amount of software that I now use every day, projects I’ve fantasized about for years.”

    Max Hodak, on what becomes possible when you stop writing code and start directing agents.

    “A proficient engineering leader has been quote unquote vibe coding through people on Slack or one-on-ones, because you’re transmitting your will, your intent, your experience, and you’re letting others run with it. Now we do the same with agents.”

    Guillermo Rauch, reframing leadership itself as the original form of vibe coding.

    Watch the full conversation on the Naval Podcast here.

    Related Reading

    • Full episode: The AI Industrial Revolution, the complete hour-long conversation this clip is drawn from, covering software factories, hardware, regulation, healthcare economics, autonomous companies, and creativity.
    • Part two: Vibe Coding Hardware, the continuation of this conversation, where the same founders move from pure software into AI-designed jet engines, vertical integration, China’s open-source bet, and why humans become verifiers.
    • Naval Ravikant’s official site, the canonical home for Naval’s essays, podcast, and longer-form thinking on technology, judgment, and leverage.
    • Vercel, Guillermo Rauch’s company, building the AI-native cloud and frontend infrastructure that this conversation references as a canonical agent building block.
    • Boom Supersonic, Blake Scholl’s company building supersonic civilian aircraft and their own jet engines, the hardware example of a founder building the whole factory.
    • Science Corporation, Max Hodak’s brain-computer interface company developing the biohybrid neural implant referenced in the intro.
    • Mitchell Hashimoto’s writing, source of the “block economy” framing for why reusable infrastructure building blocks become more valuable, not less, in the agentic era.
  • Dan Shipper’s Most Contrarian AI Predictions for 2026: Why the Job Apocalypse Is a Myth, SaaS Will Boom, PMs and Designers Win, and CLIs Are Already Over

    Dan Shipper, the CEO and founder of Every, returned to Lenny’s Podcast for round two of AI predictions. His last appearance produced one of the most prescient calls of the year: that non-technical people would build serious work inside Claude Code. He was unbelievably right. This conversation is the follow-up, a tour of his most contrarian forecasts for how AI is actually changing the way we work, who wins, who loses, and what almost every commentator is getting wrong about the next twelve to twenty-four months.

    TLDW

    Shipper argues that the AI job apocalypse is a myth, that SaaS is going to boom rather than die, that product managers and full-stack designers are the biggest winners of the agent era, that personal agents inside Codex and Claude Code will quietly replace the browser as the primary work surface, that every company will run a single shared super-agent in Slack instead of a fleet of per-user bots, that the CLI moment is already over, that pull requests are going to flood organizations from non-technical staff, that forward-deployed engineers who garden company agents become the new senior role, that GPT-5.5 still cannot match a real senior engineer on architectural judgment, that AI-generated internal writing is fine and probably better than what most humans produce, that CEOs and middle managers have not adapted yet but soon will be forced to, that the edge of AI lives wherever a curious human is using it rather than in San Francisco, and that the only durable strategy is to ride the models and keep playing with whatever ships next. The whole conversation balances aggressive AI bullishness with an equally strong bet on humans, on creativity, and on the unavoidable need for someone to care for every agent that gets deployed.

    Thoughts

    The most useful frame Shipper gives is that models commoditize yesterday’s human competence. Every time a frontier model crosses a new bar, the work that used to define seniority becomes cheap. The senior engineer who could carry a refactor in their head, the PM who could write a coherent strategy doc, the designer who could ship a polished landing page in a week. That competence is now frozen, codified, and available on tap. The interesting question is not whether models will keep eating tasks. They will. The interesting question is what humans do with the suddenly cheap raw material underneath them. Shipper’s answer is that humans climb the stack: they go up a level, find a new problem worth framing, and use the commoditized competence as feedstock for something that did not exist before. That treadmill is the actual engine of value creation, and it is why he can be simultaneously AI pilled and bullish on hiring.

    His SaaS take is the spiciest call of the episode and probably the most defensible. The crowd consensus is that agents will gut SaaS because an AI can just write the form filler, the dashboard, the workflow. Shipper points out the obvious counterfactual: agents do not reduce the number of people using SaaS, they increase it. A marketing lead who could never touch the data warehouse can now stand up a PostHog query through Codex. A founder who never opened Vanta can run a SOC 2 prep through an agent. The result is more users, more accounts, and a much fatter top of funnel for every horizontal tool. The second-order effect is even more interesting. When the SaaS tool runs inside the user’s agent, the user supplies the tokens. Vendor margins improve, not collapse. If he is right, the next two years are going to be brutal for the SaaS-is-dead thesis pieces and very good for the public software multiples.

    The PM and designer bet is where this gets personal for anyone in product. For a decade the bottleneck in shipping anything was engineering capacity. A PM with spiky product sense had to negotiate their vision through a roadmap, a sprint, a review, and a release. Designers had to convince an engineer that the third state of the empty screen was actually worth building. Both of those constraints are dissolving fast. A PM who can prompt Codex into a working prototype on Friday afternoon, then iterate it live in front of a customer on Monday, is doing the job of a small team. A designer who can ship a fully functional landing page in their own style, without negotiating with anyone, is suddenly the most leveraged person in the company. The scarce skill is no longer execution. It is taste, judgment, and the willingness to decide what is worth building. That has always been the real PM and design job. AI just stripped away the parts that were not.

    The quietest but most important prediction is that agents need humans, permanently. Every benchmark advance reveals a new layer of judgment the model cannot frame on its own. When the agent finishes the task, there is always a senior human who sees the deeper problem the model patched over. Shipper calls this gardening, and it is the basis for the new forward-deployed engineer role. The companies winning right now are the ones that put a real person next to every agent, watching what it does, course-correcting in Slack, and noticing when the output drifts. The dream of autonomous AI workflows is a stage in a journey, not the destination. The destination looks more like a thoughtful operator with a small cluster of agents they trust and constantly tend. That is a much more humane future than the discourse suggests, and it is the one Every is already living.

    The final advice, ride the models, sounds glib but is the single most actionable line in the episode. Most professional anxiety about AI dissolves the moment you actually use the newest model on real work. Most professional advantage accrues to the people who do that one thing consistently. The edge does not live in San Francisco where the labs build the things. It lives wherever a curious human meets a real workflow and discovers something the labs have not noticed. A PM in Iowa willing to try Codex on a Tuesday night can be further ahead than a research engineer who has only used the model on its evals. Pair that with Shipper’s closing motto, do things worth writing about and write things worth reading, and you have a pretty complete operating system for the next two years.

    Key Takeaways

    • The AI job apocalypse narrative is wrong. Models commoditize yesterday’s competence, then humans climb the stack and find new work to do with the cheap raw material.
    • Every has roughly doubled headcount in the last year despite being one of the most AI-forward companies in the world. The lived data point cuts directly against the doom thesis.
    • Shipper’s dual stance: simultaneously extremely AI pilled and very bullish on humans. He treats this as the only intellectually honest position right now.
    • Work will bifurcate. Companies will run one shared super-agent in Slack for everyone, and individuals will run their own personal agent inside Codex or Claude Code on their machine.
    • The personal agent inside Codex effectively becomes the new operating system. Instead of putting AI in the browser, you put a browser inside the AI.
    • The super-agent pattern is already real: Shopify has River, Ramp has its own, and Every runs Claudie inside Slack for internal consulting.
    • SaaS is not dying. Agents increase the user base of SaaS tools because non-technical people can finally drive them. Shipper would buy SaaS stocks today.
    • When SaaS runs inside an agent, the user brings their own tokens. Vendor margins improve because they no longer eat inference costs on every interaction.
    • The CLI era is already over. The magic was never the terminal. It was the AI plus the ability to see what the agent is doing. A good GUI captures the same benefits and more.
    • Pull requests are about to flood every company. Non-engineers can now ship code, run queries, and open tickets. Reviewing the output becomes the new bottleneck.
    • Open-source maintainers are already living in the future. Some receive thousands of agent-generated PRs per day and spin up thousands of Codex instances just to triage them.
    • Forward-deployed engineers are the new senior role. They live in Slack, garden the company’s agents, fix broken flows, and keep non-technical staff from doing damage.
    • Product managers with spiky product sense plus a little Codex fluency become extremely dangerous. Marcus at Every, formerly a PM at Axios, is the archetype.
    • Full-stack designers are the other big winner. They can build distinctive interfaces end to end without negotiating with engineering. The bottleneck on taste-driven product work disappears.
    • Designer hiring data has not yet caught up to the prediction. Shipper notes this and says check back in a year.
    • Sales is the role least changed so far. Top of funnel research has been turbocharged by agents, but the actual relationship and closing work remains human.
    • AI-generated internal writing is going mainstream and that is a good thing. Most humans are bad at strategy docs, quarterly plans, and PRs. AI drafts a coherent first pass that a human can refine.
    • Shipper says most of his email is now written by GPT-5.5 and Codex. He would honestly prefer the signature to say so.
    • Public writing, newsletters, and published essays still demand a human voice. Internal communication does not.
    • CEOs and middle managers have largely not adapted yet because their staff still does the work. That window is closing fast and will become an obvious career liability.
    • Your company will only go as far as your CEO goes in AI. The leadership ceiling becomes the AI ceiling.
    • Shipper’s senior engineer benchmark scores GPT-5.5 at roughly 62 out of 100. Real senior engineers sit at 85 to 90. Progress is real, but the gap on architectural judgment remains.
    • Models tend to patch problems locally instead of rewriting from first principles. A senior human still sees the deeper rework that the model avoids.
    • Every uses Notion-based agents to draft quarterly plans. The human edits, approves, and stands behind the output.
    • The hard rule on AI-generated communication: you have to read it and stand behind it before sending it. Pasting unread output is the only true no-no.
    • Every agent needs a human. Automation is a lie in the strong sense. The story of automation is the story of new and different humans being needed alongside it.
    • The reach test, organic daily usage, is the real signal that an AI product works. Benchmark scores are noisy. Daily reach is not.
    • Cursor’s SpaceX acquisition is a tell. Harnesses around models, not the models themselves, are where the strategic value is concentrating.
    • The edge of AI is not in San Francisco. It is wherever a real human meets a real workflow and discovers something the labs have not noticed yet.
    • A PM in Iowa willing to ride the models can be further ahead than a researcher in SF who only uses them on internal evals.
    • Ride the models. Use them for whatever you do. Try every new release the day it ships. That single behavior compounds faster than any other AI career strategy.
    • Shipper got bursitis, which he calls vibe coder elbow, from too much rapid agent-assisted coding while debugging his markdown editor Proof.
    • The closing motto for the year: do things worth writing about and write things worth reading.
    • Lenny will re-interview Shipper in roughly May 2027 to score the predictions.

    Detailed Summary

    Why The AI Job Apocalypse Is The Wrong Frame

    Shipper opens with the headline contrarian call. Benchmarks keep climbing. Models can now sustain seventeen-hour autonomous tasks at fifty percent accuracy. The pace is real and accelerating. None of that translates cleanly into mass unemployment. His mechanism: models codify yesterday’s human competence and make it cheap. The act of compressing past expertise into an API call is genuinely deflationary for the work it captures, but it is also raw material for the next layer of human work. He uses Every as his own data point. The company has roughly doubled in the past year despite being one of the most AI-forward outfits in media. Hiring goes up because agents create new categories of work that need humans, not because the agents fail. The discourse, he argues, is stuck modeling AI as substitution. The reality looks much more like leverage.

    The Bifurcation: Super-Agents And Personal Agents

    Work splits into two surfaces. The first is the shared super-agent that lives in Slack and serves the whole company. Shopify has River. Ramp has its own. Every has Claudie. Each is a single, trusted, gardened agent that anyone in the company can talk to. The pattern has converged on one shared agent rather than one agent per person because agents need human attention to stay useful, and a single shared instance pools the gardening cost. The second surface is the personal agent inside Codex or Claude Code that runs on your machine and reaches into your local environment, your editor, your files, and through an embedded browser into the web. Shipper calls this the new operating system. Instead of the old paradigm of putting AI inside the browser, you put the browser inside the AI. The agent sees what you see, follows what you do, and works on your stuff in your context.

    The SaaS Bet: Up, Not Down

    The SaaS-is-dead thesis was the consensus call of late 2025. Shipper takes the other side and would buy software stocks now. Three arguments. First, agents make SaaS accessible to people who never could have used it directly. The total addressable user base inside every company goes up. Second, the business model improves when the user runs the SaaS through their own agent, because the user supplies the tokens. Vendors stop subsidizing inference. Third, SaaS spend in his observable universe is up, not down, and is concentrating on the tools that play well with agents. He frames the prediction as a sound bite for the cycle: buy SaaS stocks, the apocalypse is dumb.

    The CLI Era Is Already Over

    For a moment in early 2026 it looked like everyone was migrating to the terminal because Claude Code was a CLI. Shipper says the moment is finished. The actual leverage was never the terminal. It was the model plus the ability to watch and steer an agent live. A great GUI captures every advantage of the CLI without the friction. His own engineering team at Every has mostly moved off the CLI as their primary surface and onto Codex desktop. He frames it bluntly: we speed ran the CLI era, it was nice, and now we are done. Tooling for the next two years will be visual, multi-pane, multi-agent, and built around the human watching the work unfold.

    The Pull Request Flood And The Rise Of Forward-Deployed Engineers

    Once non-engineers can ship code, run queries, and file changes through agents, the volume of incoming work explodes. Open-source maintainers already report receiving thousands of agent-generated pull requests per day. Inside companies, the same thing happens to data teams, ops teams, and any function that owns a review gate. The bottleneck shifts from creation to evaluation. The job that emerges to absorb the flood is the forward-deployed engineer. This is a senior person who lives in Slack with the company’s agents, fixes their context, sharpens their instructions, and prevents non-technical colleagues from making well-meaning but incoherent changes. Nitesh at Every is the example Shipper returns to. The model is the same one the labs use internally: pair every important agent with a real engineer who gardens it.

    PMs And Full-Stack Designers Win The Decade

    The two roles Shipper is most bullish on are product manager and full-stack designer. For PMs, the entire job of coordinating a team to translate vision into code collapses into a Codex session. A PM with strong product instincts and a little technical literacy can now prototype, iterate, and even ship. The example is Marcus, formerly a PM at Axios, who took a year to fully internalize AI and now ships faster than most engineers. For designers, the model is similar. The Friday-night-side-project designer who used to be stuck explaining a vision can now build the vision themselves, with their own taste fully expressed. The scarce skill in both cases is the same: judgment about what to build and the courage to decide it is good. Execution capacity is no longer the constraint.

    The Senior Engineer Benchmark And What Models Still Miss

    Shipper has built his own benchmark to test whether coding models can actually do senior engineering work. GPT-5.5 scores around 62 out of 100. Real senior engineers sit closer to 85 or 90. The gap is not in syntax or test pass rates. It is in the willingness to step back, see that a piece of code is fundamentally the wrong shape, and rewrite it from first principles. Models almost universally patch locally. They take the instruction at face value, accept the existing code as a constraint, and optimize within it. A real senior engineer ignores the prompt when the prompt is wrong. This is the durable moat for senior technical judgment, and Shipper expects it to remain visible for at least another year of model releases.

    AI-Generated Writing Goes Mainstream

    Internal writing inside companies is quietly becoming AI-first and Shipper thinks it should. Quarterly plans, status updates, PR descriptions, strategy memos, recruiting outreach, most internal email. He runs his own inbox through GPT-5.5 and Codex and says he would honestly prefer if the recipient knew. The point is not that AI is a better writer in some absolute sense. The point is that most humans are not very good at these specific genres, and the model produces a coherent, structurally sound first draft that a human can guide and approve. The constraint is honesty: you read it, you understand it, you stand behind it. Public writing, like the newsletters Every publishes, still demands a human voice. Internal communication does not, and treating it as if it did is a tax on the organization.

    The CEO And Middle Manager Lag

    Shipper points to a population that has largely escaped AI adoption: senior leaders and middle managers. They have staff to do the work, so they have not been forced to pick up the tools personally. He thinks this is the single largest pocket of latent disruption coming in the next year. Your company will only go as far as your CEO goes in AI, because every decision about where to deploy agents, where to hire, and how to restructure work flows downstream from leadership taste. A leader who has not personally lived inside Codex or Claude Code for a few weeks cannot make those calls well. Expect this to flip fast and to become a visible career liability for executives who do not adapt.

    Ride The Models

    The closing advice is the simplest. Ride the models. Use AI for whatever you actually do. Try every new release the day it lands. Most of the professional anxiety around AI dissolves on contact with the work, and most of the durable advantage in the field belongs to the people who do this one thing consistently. Shipper notes that the edge of AI does not live in San Francisco. It lives wherever a curious operator meets a real workflow and notices something nobody at the labs has yet. A PM in Iowa willing to spend a Tuesday night exploring Codex can find capabilities researchers have not surfaced. Pair that with his motto, do things worth writing about and write things worth reading, and you have most of an operating system for the next two years.

    Notable Quotes

    “The AI job apocalypse is not really a thing. I am super super bullish on PMs and full-stack designers.”

    Dan Shipper, opening his contrarian thesis for the conversation

    “I’m simultaneously extremely AI pilled and very bullish on humans. Automation is a lie. Every agent needs a human.”

    Dan Shipper, on holding both sides of the AI debate at once

    “What models do in general is they make yesterday’s human competence cheap. And so, it becomes commoditized. It’s not valuable anymore. What humans do is we go in there and we’re like, yeah, we have all this frozen human competence from yesterday, how do I use this to make something new and interesting.”

    Dan Shipper, articulating the core engine behind his anti-apocalypse thesis

    “I would buy SaaS stocks right now. The SaaS apocalypse is dumb. What agents do is increase the number of users of SaaS, not get rid of it.”

    Dan Shipper, calling the consensus SaaS-is-dead thesis directly wrong

    “We speed ran the CLI era. It was nice while it lasted, but I think CLIs are over.”

    Dan Shipper, on why the terminal-first agent moment is already done

    “Most of my email is written by GPT-5.5 and Codex right now. And I honestly would prefer it to say that it’s coming from GPT-5.5.”

    Dan Shipper, on the new etiquette of AI-assisted communication

    “The edge of AI is not in San Francisco. The edge of AI is wherever AI meets a real human doing something.”

    Dan Shipper, on where the actual frontier of the field lives

    “The only thing you need to do is ride the models. And that means use them for whatever it is that you do.”

    Dan Shipper, distilling his career advice for the next two years

    “Do things worth writing about and write things worth reading.”

    Dan Shipper’s closing motto, lifted from his own operating system at Every

    Watch the full conversation with Dan Shipper on Lenny’s Podcast here. The re-interview to score these predictions is scheduled for roughly May 2027.

    Related Reading

    • Every. Dan Shipper’s company and the live laboratory for almost every prediction in this conversation, including Spiral, Cora, and Claudie.
    • The Allocation Economy by Dan Shipper. The earlier essay that frames humans as managers of AI labor and underpins much of the gardening-the-agent thesis here.
    • Claude Code by Anthropic. The agent surface Shipper called correctly last year and one of the two environments he predicts will become the new operating system for work.
    • Codex by OpenAI. Shipper’s current daily driver and the visual, multi-pane agent environment he uses for almost everything from coding to email.
    • The Writing Life by Annie Dillard. The book Shipper makes every Every employee read, and the source of the company’s stance on writing as a tool for noticing the future.