PJFP.com

Pursuit of Joy, Fulfillment, and Purpose

  • Shopify CEO Tobi Lütke: AI Is the Perfect Scapegoat for Layoffs, Canada Has Trump Derangement Syndrome, and 50% of Shopify Code Is Now AI-Generated

    TLDW

    Shopify CEO Tobi Lütke sat down with Harry Stebbings on 20VC for one of the most candid and controversial conversations of his career. Lütke argues that the current wave of mass layoffs has nothing to do with AI and everything to do with pandemic-era overhiring, but AI will be blamed because it cannot fight back. He blasts Canada for its “Trump Derangement Syndrome,” calls the climate cult “one of the most evil things wrought on the population,” reveals that over 50% of Shopify’s code is now AI-generated, and says many of his best engineers have not written a line of code since December 2025, when Claude Opus changed everything. He also introduces River, an AI engineer at Shopify that named itself, and explains why he believes context engineering will be the dominant role of the next five years.

    Key Takeaways

    • AI is not causing layoffs, COVID overhiring is. Lütke is blunt: “What you see right now is not AI layoffs. Those are just the companies that are really slow that overhired just like everyone else.” AI will get blamed for everything because it is the perfect Girardian scapegoat that cannot fight back.
    • Over 50% of Shopify’s code is now AI-generated and “converting to much higher numbers.” Many of Shopify’s best engineers have not written code this year. December 2025 and the release of Claude Opus changed everything.
    • Senior engineers became more valuable, not less. Lütke initially thought new grads with no priors would dominate the AI native era. He was wrong. Senior engineers steer agents better because steering is the new programming, and reps matter more than ever.
    • Context engineering will become the dominant role within 5 years. A new product builder role is emerging that subsumes engineering, design, and product management, focused on coordinating intelligent actors (humans and AI) to ship products.
    • “River” is Shopify’s AI engineer that named itself. Built first, then asked what name it wanted. River lives in Slack, ships engineering work, and learns publicly because it is steered through public Slack channels.
    • Builders are “eights” on the Enneagram and companies actively conspire against them. Eights call out nonsense, refuse fancy dressing, and are dangerous to colleagues’ careers. They rarely get promoted, often leave, and start companies. Shopify is “remarkably high on eights” because Lütke seeks them out.
    • Canada has “Trump Derangement Syndrome.” Over 60% of Canadians believe the United States is a bigger threat than Russia or China. Lütke calls this “stunning” and wrong. Canada’s only winning strategy historically has been “winning by helping America win.”
    • Canada should be the richest country on Earth. It has every resource the world needs for the next 20 years. Lütke wants pipelines built, industry built, refining done domestically, and an end to exporting raw resources to have other countries make end products.
    • Be deeply suspicious of “non-profit.” Lütke argues opting out of the only fitness function that has ever pulled people out of poverty (markets) and refusing to disclose your actual fitness function is a red flag. Non-profits replace merit with pull.
    • The climate cult is blocking civilization. Lütke called it “one of the most evil things wrought on the population” and pointed to anti-nuclear green parties and frog protection laws blocking factories as examples of policy capture.
    • The Chinese AI threat is real but misunderstood. The bigger concern is that if Western governments restrict children from using AI, kids will simply download Chinese open-weight models, train on collectivist worldviews, and stop ever writing high school essays about Tiananmen Square.
    • Markets are the most democratic system that exists. Every dollar spent is a vote. Capital allocation by hundreds of millions of consumers is more democratic than any election.
    • Friedrich List and the Prussian school over Adam Smith. Lütke prefers a model where governments define excellent games with positive externalities, then completely get out of the way and let competition do the rest.
    • Shopify’s biggest mistake was going into physical logistics right before AI got really good. Lütke initially defended the decision based on what he knew at the time, but later admitted he was probably just wrong.
    • Lütke does not look at the stock price. It has been at least 23 days since he last checked. He runs Shopify on product instincts, not market signals.
    • Great leaders must be exothermic. A CEO is a heat source for the company. Lütke prefers “temperature” to “chaos” because chaos has too negative a connotation.
    • Don’t go to university for university’s sake. Get a degree from somewhere hard to get into so you are surrounded by people who also fought to get in. Better yet, join a small company where you can actually be of value.
    • Entrepreneurship is the most AI-safe AND most AI-benefiting job. Lütke sees a coming golden age of entrepreneurship where priors no longer matter and AI co-founders eliminate the need to grow up around business.
    • “You can just do things” is the rallying cry Lütke wants to ingrain in the world. Action causes information. The cost of trying is lower than ever.
    • The demonization of wealth in America is misdirected. No one gets to a billion dollars by stealing. Builders create products that people vote for with their money, the most democratic act in any economy.

    Detailed Summary

    Harry Stebbings opens by asking Tobi Lütke whether entrepreneurs are motivated by fear of losing or hunger to win. Lütke says he is still figuring out his own answer, but argues that both extremes lead to short-term thinking. The real unlock is taking a long perspective, because compound advantages only accrue when you are willing to wait.

    Builders Are “Eights” and Companies Conspire Against Them

    Lütke explains the Enneagram personality framework and identifies himself as an “eight,” the type that refuses to accept that any organization’s output is acceptable just because it is dressed up nicely. Eights call out nonsense, are dangerous to careers around them, rarely get promoted in professionally managed companies, and often leave to start their own businesses. Shopify deliberately overweights eights in its hiring. Lütke also says people who build companies are “fundamentally crazy people” and that the public image of leadership comes from movies, not reality. He never wanted to be CEO but realized you cannot run a product-driven company without controlling the company itself, because product needs and company needs only converge on a three-year horizon.

    The Luxury of Long-Term Thinking as a Public Company

    Stebbings asks if a public company can really afford long-term thinking. Lütke says being a trusted public company is the best position to be in; the chasm to cross is from trusted private to untrusted public, which is why so many founders refuse to IPO. Shopify went public 11 years ago at a $1.67 billion valuation when revenues were a fraction of today’s. The valuation is now roughly 100x higher. Lütke walks through the IPO mechanics: investment bankers serve the buy side, not the company, and he priced his offering above range because he knew where his growth would come from. The stock’s first trade came in about $10 higher, which he calls a “good performance” but a teaching moment about market price discovery.

    AI Is the Perfect Scapegoat for Mass Layoffs

    This is where the conversation gets explosive. Lütke says Shopify employs about 7,500 to 8,000 people today and his real hope is to have the same number in five years, but at 100x productivity. He argues that the layoffs sweeping the tech industry have nothing to do with AI. They are the result of pandemic-era overhiring catching up to slow-moving companies. But AI will get blamed for everything because it is the perfect Girardian scapegoat. It cannot defend itself, it has no PR team, and an entire industry of doomers is already trained to point at it. Lütke says his own industry has been “gaslighting everyone into AI fear” and science fiction did the same for 60 years before that.

    His own use of AI is what he calls utopian. Tasks that used to be hard are easy. Most jobs, he argues, are not actually good jobs to begin with. Being a human task queue is not a great job. Great jobs involve agency and creation. As AI gets cheaper, purchasing power explodes, and people will get options to do things on weekends that are vastly more productive than their day jobs ever were.

    Markets Are the Most Democratic Mechanism Ever Invented

    Lütke pivots into a long defense of capitalism as the most democratic system in existence. Every dollar spent is a vote, far more frequent and more granular than any election. He uses Elon Musk and Tesla as examples: he owns a Model Y, did not touch the steering wheel that morning, and uses Starlink in the back to work on long drives. He posts on X and gets replies from Japan in real time. He calls Musk a “one-man engine” who has captured a tiny percentage of the value he created. He extends this to Shopify itself: he owns 6% of the company, which means 94% is owned by other people who all made money, and roughly 10 million people work in the broader Shopify ecosystem on customer fulfillment, web design, customer service, and more.

    Why “Non-Profit” Should Make You Suspicious

    Lütke targets the charity-industrial complex. He argues that non-profits opt out of the only mechanism humanity has ever invented to lift people out of poverty (markets), and they fail to articulate what their actual fitness function is. The result is that “merit of organization is replaced with pull of individuals.” Smooth talkers, not builders, end up running these institutions. He acknowledges Carnegie’s libraries and a few exceptions but believes the ratio of charity dollars to good outcomes is dramatically off. He is far more enthusiastic about funders like MacKenzie Scott who give in unrestricted ways, and even more enthusiastic about Jensen Huang and Bloom Energy as compute and infrastructure investments that compound into civilizational gains.

    The Prussian School of Economics

    Asked about government intervention, Lütke pledges allegiance to Friedrich List and the Prussian school of political economy over Adam Smith and Lassalle. The job of government is to define excellent games where positive externalities accrue to society, then completely get out of the way. He calls the outsourcing of violence to governments “one of the most inspiring things humanity has ever done” because it created the conditions for personal property. But governments are extremely bad at doing things directly. The moment a government runs grocery stores, it costs 10x more, and entrepreneurs have to be enlisted to repair the damage.

    Canada’s Trump Derangement Syndrome

    Stebbings asks if Lütke is proud of Canadian Prime Minister Mark Carney for standing up to Trump. Lütke is unequivocal: no. He says Carney is “not a credible witness to the reality on the ground.” Canadians, he argues, are “massively overfit to niceness,” which leads to “unkind lies” and lying by omission. Over 60% of Canadians now believe the United States is a bigger threat than Russia or China, which Lütke calls “stunning” and clearly wrong. Canada is a small economy attached to a hegemon, and the only winning strategy in its history has been winning by helping America win.

    That said, he agrees with Carney on diversifying the economy, getting closer to Europe, and engaging Asia. But he wants Canada to also “build the [expletive] out of pipelines, build the [expletive] out of our industry, and start refining the stuff ourselves.” Canada has every resource the world needs for the next 20 years and the most educated workforce on Earth. The only obstacle is political will. Canada’s commercial story has been the same since the beaver pelt era: extract resources, ship them abroad, let other countries make end products. Canada Goose, Lululemon, Shopify, Miller Lite. That is the short list of products Canada actually makes.

    The Real Chinese Threat

    Lütke says the Chinese AI threat is both underestimated and overestimated. The bigger threat, he argues, is government overreach. If Western governments start dictating which AI models children can use, kids will simply download Chinese open-weight models. He notes that Chinese models, especially when prompted in Chinese, exhibit a clearly collectivist worldview. The risk is that an entire generation of students writes essays through models trained never to mention Tiananmen Square. He frames the broader political battle as collectivism versus individualism and says everything else is smoke screening.

    Fixing Europe and the Climate Cult

    Asked what he would do as president of Europe, Lütke begins by saying you have to “get rid of the climate cult.” He calls it “one of the most evil things wrought on the population,” citing green parties whose founding myth is that nuclear power is bad, and infrastructure projects blocked because of one frog breeding in one creek. He argues that very few people have the capability to truly build, and they need both enablement and accountability from the village. Beyond that, he wants Europe to follow the Prussian playbook: build excellent games, build infrastructure, and use the resulting wealth to sculpt the economy you want.

    Shopify’s Biggest Mistake

    Lütke says his biggest public mistake was Shopify’s full push into physical logistics and warehousing right before AI capabilities exploded. Initially he defended the decision as correct based on the information available at the time, but later admitted he probably just got it wrong. The hardest part was that real people lost their jobs when Shopify exited.

    Great Leaders Are a Heat Source

    Lütke previously talked about CEOs injecting “chaos” into organizations. He now prefers “temperature.” Heat is atoms jiggling. Great leaders must be exothermic, providing energy that flows through the organization. He says he hasn’t checked Shopify’s stock price in at least 23 days. Most public company CEOs are obsessed with their stock. Lütke runs on product instincts.

    Senior Engineers Don’t Write Code Anymore

    Lütke admits he was wrong about new grads having an AI native advantage. Some are exceptional (he hired a 13-year-old intern from Waterloo whose mother accompanies him to classes), but on the whole, senior engineers steer agents better than juniors do because they have done more reps. Programming is not gone. Programming has become higher level. Engineers massively underestimate how important steering is. Steering is just programming at a higher altitude.

    The Role That Will Dominate in 5 Years

    Lütke says context engineering, a term he had a hand in popularizing, will become a standard role within five years. It will likely subsume parts of product, design, and engineering management. The best AI programmers right now, surprisingly, are people from engineering management because they have been prompting intelligent agents (humans) for years. Good communicators are good thinkers because communication is distillation.

    River, the AI Engineer That Named Itself

    Shopify built an AI engineer that lives in Slack. They built it first, then asked it what name it wanted. The AI chose “River” because Shopify’s monolithic repository is called “world” and rivers shape worlds. River does an enormous amount of Shopify’s engineering, taking instructions through public Slack channels so that the entire company can learn from how others steer it.

    Over 50% of Shopify’s Code Is AI-Generated

    The number is “a fair deal over 50%” and “converting to much higher.” Many of Shopify’s best engineers have not written code this year, with the inflection point being December 2025 and the release of Claude Opus. Lütke himself still writes code occasionally, especially at the data structure layer, where he applies what he calls a “German school” of engineering: figure out how data persists on disk, then build everything else on top. Once that is right, the rest can be vibe-coded by AI.
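
    As an illustration of that order of operations, here is a minimal sketch (the schema and helper are invented for this example, not anything Shopify has published): hand-design the persistent layer first, then treat everything above it as cheap to regenerate.

    ```python
    import sqlite3

    # The on-disk representation is designed deliberately and changes rarely.
    conn = sqlite3.connect("shop.db")
    conn.execute("""
        CREATE TABLE IF NOT EXISTS orders (
            id          INTEGER PRIMARY KEY,
            customer    TEXT    NOT NULL,
            total_cents INTEGER NOT NULL,  -- store money as integers, not floats
            created_at  TEXT    DEFAULT CURRENT_TIMESTAMP
        )
    """)
    conn.commit()

    # Everything above the storage layer is the part that, in Lütke's telling,
    # can be regenerated freely by AI once the data model is right.
    def record_order(customer: str, total_cents: int) -> int:
        cur = conn.execute(
            "INSERT INTO orders (customer, total_cents) VALUES (?, ?)",
            (customer, total_cents),
        )
        conn.commit()
        return cur.lastrowid
    ```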

    Should His Kids Go to University?

    Lütke says he would not push his kids to attend university for its own sake. The value of a hard-to-enter program is being surrounded by people who also fought to get in. Better still: get into the room with people who are obsessed with the topic you care about. He thinks joining a small startup where you can actually be of value is often a superior path. He addresses nepotism directly. His instinct is that nepotism is bad. The gold standard is double-blind merit. But double-blind merit barely exists anywhere, and intersectional academic hiring criteria in Canada are arguably worse than nepotism.

    Final Reflections

    Lütke ends with what he calls the best advice he knows: “You can just do things.” The system exists to push everyone toward acceptable outcomes, but if you know what a good outcome looks like, you can step out of the system and try. Action causes information. The cost is lower than ever. The only constraint is that the experiment cannot have victims.

    He also addresses the demonization of wealth. No one gets to a billion dollars by stealing. Builders create products people vote for, the most democratic act there is. Buying from a local shop is voting for the welfare and future of local shops. Constructive criticism is itself something someone has to build, and Lütke welcomes it. Lazy criticism, hot takes, and bad faith arguments are corrosive and should be held in contempt.

    He is bullish on AI as a counterweight to information warfare. A council of AI models trained in different countries (Chinese, German, French, American) could fact-check claims with multiple perspectives. The “@grok is this true” reflex on X is, he says, a primordial version of this. The information asymmetry that has favored bad faith actors for decades is about to flip.

    Thoughts

    This interview is a window into the operating philosophy of one of the most successful technical founders alive, and it is far more provocative than most of his public appearances. The headline claim, that AI is a scapegoat for layoffs caused by pandemic overhiring, deserves to be repeated until it sinks in. Every CEO who lays people off and then writes a memo about “AI driven efficiency” is taking advantage of a narrative that AI cannot push back against. The math is plain: if you doubled your headcount in 2021 and 2022 and now you are firing 15%, you still employ 1.7x your pre-pandemic number (2 × 0.85 = 1.7). You are not net displaced by AI. You are correcting a hiring mistake.

    The 50% AI-generated code statistic is the bigger story. Shopify is not a small company. 8,000 employees and $7 billion in revenue is enterprise scale. If a company that mature has crossed the 50% threshold and is “converting to much higher numbers,” the implication for the broader software industry is enormous. The senior engineer compounding observation is also subtle and important. If steering is the new programming, then the senior pool is more valuable, not less, and the pipeline problem for junior developers gets harder to solve. Companies that underinvested in junior training during ZIRP will face an experience cliff in five years.

    Lütke’s Canadian commentary will offend many readers in his home country, which seems to be exactly the point. The “lying by omission” critique of Canadian niceness is sharp and accurate. That over 60% of Canadians view the US as a bigger threat than Russia or China is a genuinely remarkable statistic, with implications for trade policy, capital flows, and immigration. Whether or not you agree with his political read, his prescription is unambiguous and pro-growth: build pipelines, refine resources domestically, stop being content as a feedstock economy.

    The non-profit critique deserves more public debate. The fitness function point, that markets reveal preferences and non-profits opt out of preference revelation while not disclosing what they optimize for, is a sharp economic argument. The pull versus merit observation about who ends up running large foundations rings true to anyone who has worked adjacent to the philanthropic sector.

    The introduction of River as an AI engineer that named itself is a small detail that signals where this is going. AI agents are going from tools to teammates with identities, channels, and reputations. The fact that River shapes the “world” repository is poetic, and the public Slack steering pattern is a real innovation in how organizations can scale agentic AI without creating siloed knowledge.

    Lütke’s “you can just do things” rallying cry is ultimately what ties the entire interview together. Whether he is talking about Canada, Europe, AI engineers, or his own kids, the through line is the same: action causes information, the cost of trying is lower than ever, and the only people who will benefit from the next decade are the ones who refuse to wait for permission. This is the most useful piece of philosophy in the entire conversation, and it applies far beyond entrepreneurship.

  • Subquadratic (SubQ) Explained: The First Fully Sub-Quadratic LLM with a 12M-Token Context Window, 50x Cost Reduction, and a Post-Transformer Architecture

    Subquadratic, the AI infrastructure company behind subq.ai, just emerged from stealth with a $29M seed round and a claim that should make every AI engineer pay attention: they have built the first large language model whose compute scales linearly, not quadratically, with context length. The result is SubQ, a frontier model with a 12 million token context window, roughly 50x lower cost than leading frontier models at 1M tokens, and benchmark numbers that put it ahead of Gemini 3.1 Pro, Claude Opus 4.6/4.7, and GPT-5.4/5.5 on key long-context tasks. This is a deep, opinionated breakdown of everything Subquadratic has published so far, who is behind it, why a sub-quadratic architecture matters, and what changes for developers, agents, and enterprise AI if the numbers hold up.

    TLDR

    Subquadratic is a Miami-based frontier AI lab that launched on May 5, 2026 with $29M in seed funding and a new LLM called SubQ. SubQ is the first fully sub-quadratic LLM, meaning attention compute grows linearly with context length instead of quadratically. The model offers a 12M token context window, around 150 tokens per second, roughly one-fifth the cost of leading frontier models, 95% accuracy on RULER 128K, 92% accuracy at the full 12M tokens, and the company is targeting 100M tokens by Q4 2026. Two products are launching in private beta: SubQ API (OpenAI-compatible, streaming, tool use) and SubQ Code (a CLI coding agent that plugs into Claude Code, Codex, and Cursor to load entire repositories into a single context window).

    Key Takeaways

    • SubQ is the first fully sub-quadratic LLM, with attention compute scaling at O(n) instead of the transformer’s O(n²).
    • The context window is 12 million tokens, enough to fit the entire Python 3.13 standard library (around 5.1M tokens) or roughly 1,050 React pull requests (around 7.5M tokens) in a single prompt.
    • At 12M tokens, SubQ reduces attention compute by almost 1,000x compared to other frontier models.
    • Pricing benchmarks: 95% accuracy on RULER 128K at $8 of compute, versus 94% accuracy at roughly $2,600 on Claude Opus, a 260x to 300x cost reduction.
    • Speed: about 150 tokens per second.
    • Cost: roughly 1/5 of other leading LLMs at 1M tokens, more than 50x cheaper according to launch coverage.
    • Two products in private beta: SubQ API (12M token window, streaming, tool use, OpenAI-compatible endpoints) and SubQ Code (one-line install CLI for coding agents, ~25% lower bills, 10x faster exploration, auto-redirects expensive model turns).
    • SubQ Code integrates with Claude Code, Codex, and Cursor, positioning Subquadratic as the long-context infrastructure layer beneath existing agent workflows rather than a competing chat product.
    • Architecture: a fully sub-quadratic sparse-attention design that learns which token relationships actually matter and skips the rest, redesigned from first principles.
    • Funding: $29M seed led by investors including Javier Villamizar (former SoftBank Vision Fund partner) and Justin Mateen (Tinder co-founder, JAM Fund), alongside early investors in Anthropic, OpenAI, Stripe, and Brex.
    • Founders: Justin Dangel (CEO, five-time founder) and Alex Whedon (CTO, ex-Meta engineer, former Head of Generative AI at TribeAI). Research team includes PhDs from Meta, Google, Oxford, Cambridge, and BYU.
    • Headcount is 11 to 50, headquartered in Miami, Florida, with active hiring for API engineering, developer advocacy, product design, sales, and people operations.
    • Tagline and thesis: “Efficiency is Intelligence.” The company argues that quadratic attention has been the real ceiling on AI applications, and breaking it unlocks workloads that were previously cost-prohibitive or architecturally impossible.

    Detailed Summary

    What is Subquadratic and what is SubQ?

    Subquadratic is a frontier AI research and infrastructure company. Its public homepage is intentionally minimal, with the single line “Efficiency is Intelligence.” and a contact email at [email protected]. The full product story lives on the launch demo site, where the company introduces SubQ as the first model built specifically for long-context tasks. The pitch is direct: SubQ is a sub-quadratic LLM built for 12M-token reasoning, allowing agents to work across full repositories, long histories, and persistent state without quality loss.

    Three numbers dominate the marketing copy. Context: 12M token reasoning. Speed: 150 tokens per second. Cost: one-fifth of other leading LLMs. Those three numbers, taken together, are why this launch matters. Until now, you could optimize for only one of the three at a time. SubQ claims to push all three at once because the underlying architecture changed, not because the company applied better quantization or smarter caching on top of a transformer.

    The architecture: why “sub-quadratic” is the whole story

    Standard transformers, the architecture behind ChatGPT, Claude, Gemini, and almost everything else, use dense self-attention. Every token compares itself to every other token, which means compute scales as O(n²) in the context length n. Double the context, quadruple the compute. That single property is the reason context windows are usually capped at 128K tokens for open models and around 1M tokens for the most aggressive frontier offerings, and it is the reason most production AI systems lean on retrieval-augmented generation, chunking, agentic retrieval, and prompt engineering tricks to dodge the cost curve entirely.
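
    To make the scaling difference concrete, here is a toy comparison. The per-token budget k for the linear scheme is an invented constant (Subquadratic has published no such parameter, and its “almost 1,000x” figure implies different constants), so only the shape of the gap matters here:

    ```python
    def dense_attention_pairs(n: int) -> int:
        """Dense self-attention: every token attends to every other token."""
        return n * n

    def linear_attention_pairs(n: int, k: int = 1024) -> int:
        """Idealized linear scheme: each token attends to ~k selected tokens."""
        return n * k

    for n in (128_000, 1_000_000, 12_000_000):
        ratio = dense_attention_pairs(n) / linear_attention_pairs(n)
        print(f"n = {n:>10,}  dense/linear ≈ {ratio:,.0f}x")

    # n =    128,000  dense/linear ≈ 125x
    # n =  1,000,000  dense/linear ≈ 977x
    # n = 12,000,000  dense/linear ≈ 11,719x
    ```

    Whatever the real constants turn out to be, the ratio grows with n, which is the whole point of changing the scaling class rather than the hardware.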

    SubQ is built on a fully sub-quadratic sparse-attention architecture, redesigned from first principles. The argument from co-founder and CEO Justin Dangel is that LLMs waste compute by processing every possible token-to-token relationship when only a small fraction of those relationships actually matter for the task. SubQ learns to find and focus only on those relevant relationships, which is what brings the scaling behavior down from O(n²) to O(n). At 12M tokens, this design cuts attention compute by almost 1,000x compared to other frontier models. The research community has been chasing this for years through linear attention, state space models, Mamba, and various sparse attention variants. According to Subquadratic, the unsolved problem was never the idea, it was building a sub-quadratic architecture that did not sacrifice frontier-level accuracy. That is where the team spent its time.
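
    Subquadratic has not published the architecture, so code can only gesture at the idea. The toy below shows the general shape of learned sparse attention (keep only the strongest query-key relationships). Note that it still materializes the full score matrix, so it saves nothing by itself; real sub-quadratic designs select the relevant relationships without ever computing all n² scores:

    ```python
    import numpy as np

    def topk_sparse_attention(q, k, v, top_k=8):
        """Toy sparse attention: each query position attends only to its
        top_k highest-scoring key positions. q, k, v: (seq_len, dim)."""
        scores = q @ k.T / np.sqrt(q.shape[-1])  # (seq, seq) -- dense here,
                                                 # which real systems must avoid
        idx = np.argpartition(scores, -top_k, axis=-1)[:, -top_k:]
        out = np.zeros_like(q)
        for i, cols in enumerate(idx):
            s = scores[i, cols]
            w = np.exp(s - s.max())
            w /= w.sum()                         # softmax over the kept keys only
            out[i] = w @ v[cols]
        return out

    rng = np.random.default_rng(0)
    x = rng.standard_normal((64, 32))
    print(topk_sparse_attention(x, x, x).shape)  # (64, 32)
    ```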

    The benchmarks

    Subquadratic published a benchmark table comparing a SubQ 1M-Preview against Gemini 3.1 Pro, Claude Opus 4.6, Claude Opus 4.7, GPT-5.4, and GPT-5.5 across SWE-Bench Verified (real-world software engineering), RULER at 128K (long-context accuracy across 13 tests), and MRCR v2 8-needle at 1M (multi-round coreference resolution).

    • SWE-Bench Verified: SubQ scores 81.8%, ahead of Gemini 3.1 Pro at 80.6% and Opus 4.6 at 80.8%, with Opus 4.7 leading at 87.6%.
    • RULER at 128K: SubQ scores 95.0%, narrowly ahead of Opus 4.6 at 94.8% (internally evaluated). Other vendors did not report this benchmark.
    • MRCR v2 8-needle, 1M: SubQ scores 65.9%, behind Opus 4.6 at 78.3% and GPT-5.5 at 74.0%, but well ahead of GPT-5.4 at 36.6%, Opus 4.7 at 32.2%, and Gemini 3.1 Pro at 26.3%.
    • The launch blog post adds that on RULER 128K, SubQ scored 97% accuracy at $8 of compute, versus 94% on Claude Opus at roughly $2,600. That is a cost reduction of about 260x at superior accuracy.
    • On MRCR v2 specifically, the launch post lists SubQ at 83, Claude Opus at 78, GPT-5.4 at 39, and Gemini 3.1 Pro at 23.
    • At the full 12M token context, SubQ hits 92% on RULER while other frontier models reportedly break down well before reaching their stated 1M-token limit.
    • Subquadratic notes the SubQ results are third-party validated and a full technical report is forthcoming.

    The story these numbers tell is consistent: SubQ is competitive on traditional benchmarks like SWE-Bench, decisively better on long-context retrieval where compute economics dominate, and dramatically cheaper to run when the workload actually exercises a long context.

    The two products: SubQ API and SubQ Code

    SubQ ships in two flavors. The first is SubQ API, the full-context API for developers and enterprise teams. It exposes the 12M token context window, supports streaming and tool use, and uses OpenAI-compatible endpoints so existing client libraries and orchestration code can be repointed with minimal change. The product positioning is to process full repositories and pipeline states in a single API call at linear cost, rather than chunking inputs and stitching results.
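
    Subquadratic has said only that the API is OpenAI-compatible with streaming and tool use, so the base URL, model id, and key variable in this sketch are assumptions. The point is that existing OpenAI-client code should repoint with a couple of changed lines:

    ```python
    import os
    from openai import OpenAI

    # Hypothetical endpoint and model id; the real values may differ.
    client = OpenAI(
        base_url="https://api.subq.ai/v1",
        api_key=os.environ["SUBQ_API_KEY"],
    )

    stream = client.chat.completions.create(
        model="subq-1m-preview",
        messages=[{
            "role": "user",
            "content": "Here is an entire repository: ... Find the bug in the auth flow.",
        }],
        stream=True,
    )
    for chunk in stream:
        delta = chunk.choices[0].delta.content
        if delta:
            print(delta, end="", flush=True)
    ```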

    The second is SubQ Code, a long-context layer designed specifically for coding agents. Instead of competing with Claude Code, Codex, or Cursor, SubQ Code plugs into them. It maps codebases, gathers context, and answers token-heavy questions faster than the host agent’s default model. According to Subquadratic, the integration delivers roughly 25% lower bills and around 10x faster exploration, auto-redirects the most expensive model turns to SubQ, and installs in a single line. The design implication is that agent builders do not have to switch ecosystems to benefit from a 12M token window. They keep their preferred agent and offload the heavy long-context work to SubQ.
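
    Subquadratic has not described how the redirect works, but the economic logic is easy to sketch. The threshold and model names below are invented for illustration:

    ```python
    def route_turn(prompt_tokens: int, threshold: int = 50_000) -> str:
        """Send token-heavy turns to the cheap long-context model and leave
        everything else with the host agent's default frontier model."""
        return "subq-long-context" if prompt_tokens > threshold else "frontier-default"

    assert route_turn(2_000_000) == "subq-long-context"  # whole-repo question
    assert route_turn(3_000) == "frontier-default"       # small edit request
    ```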

    Both products are in private beta. Access is gated behind an early-access request form where applicants choose SubQ Code, SubQ API, or both, and provide context about their workload.

    What 12M tokens actually unlocks

    Subquadratic illustrates the size of the context window with two concrete examples. The entire Python 3.13 standard library is roughly 5.1M tokens, well under the limit. Six months of React pull requests, around 1,050 PRs against the React codebase, comes in at around 7.5M tokens, also under the limit with room to spare. At this scale, the standard pattern of curating which files or chunks the model gets to see goes away. The model just sees everything.
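
    A rough way to sanity-check numbers like these yourself; tiktoken's cl100k_base is a stand-in tokenizer here, since SubQ's actual tokenizer is not public and exact counts will differ:

    ```python
    from pathlib import Path
    import tiktoken

    enc = tiktoken.get_encoding("cl100k_base")  # stand-in tokenizer

    def repo_tokens(root: str, suffixes=(".py", ".md", ".txt")) -> int:
        """Approximate token count of all matching files under root."""
        return sum(
            len(enc.encode(p.read_text(errors="ignore")))
            for p in Path(root).rglob("*")
            if p.is_file() and p.suffix in suffixes
        )

    n = repo_tokens("cpython/Lib")  # e.g. a local CPython checkout
    print(f"~{n:,} tokens; fits in a 12M window: {n <= 12_000_000}")
    ```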

    The downstream implications are significant. RAG pipelines, embedding stores, chunking heuristics, and multi-agent coordination layers exist primarily to compensate for short context windows and quadratic compute. If a model can ingest the whole corpus in one pass at linear cost, large parts of that workaround stack become optional. Long-running agents can preserve full state instead of summarizing it. Coding agents can reason about a refactor across an entire repository without juggling tool calls. Document-heavy workflows in legal, finance, and research can run on the source material directly. And once Subquadratic hits its 100M token target by Q4 2026, the design space shifts again toward applications that depend on persistent state and long time horizons.

    The economic argument

    Subquadratic’s framing is that cost has become the binding constraint on AI deployment, not capability. Many ideas never reach production because the unit economics do not work out. Quadratic attention is the structural reason for that. By breaking the scaling law, SubQ aims to make previously cost-prohibitive workloads viable at scale: high-volume inference, longer included context, and applications that rely on sustained interaction with the model. The 260x to 300x cost reduction reported on RULER 128K is the headline number that operationalizes this thesis.

    The team and the funding

    Subquadratic raised $29M in seed funding. Investors include Javier Villamizar, former partner at SoftBank Vision Fund, and Justin Mateen, co-founder of Tinder and founder of JAM Fund, alongside early investors in Anthropic, OpenAI, Stripe, and Brex. CEO Justin Dangel is a five-time founder with prior companies in health tech, insurance tech, and consumer goods. CTO Alex Whedon previously worked as a software engineer at Meta and led over 40 enterprise AI implementations as Head of Generative AI at TribeAI. The research team is built around PhDs and published researchers from Meta, Google, Oxford, Cambridge, and BYU. The company is headquartered in Miami, Florida, with a headcount in the 11 to 50 range.

    Public hiring lists show the company is staffing across API engineering, founding developer advocacy, principal full-stack engineering, technical copywriting, account executive roles for enterprise sales, senior product design for the Voice AI and API surface, and head of people and talent operations. The Voice AI mention is notable because the public homepage at subq.ai still references a Speech-To-Text API as a current product, suggesting Subquadratic is operating across both speech and language with the same architectural thesis.

    The site itself

    The current public site at subq.ai is deliberately spartan. Visitors see only the company name, the line “Efficiency is Intelligence.”, and a contact email. The full marketing surface lives at the launch demo URL, which acts as the de facto homepage for the launch and links out to the early-access request flow, the “Introducing SubQ” blog post, the LinkedIn page, the X account, the Discord community, careers, press contact at [email protected], terms of use, privacy policy, cookies policy, and acceptable use policy. The structure makes sense for a private beta launch: keep the apex domain minimal, push announcement traffic to a dedicated launch site, and gate product access behind a form.

    Thoughts

    The interesting part of Subquadratic’s pitch is not the context window. It is the implicit claim that the entire workaround economy built around transformers (RAG vendors, vector databases, chunking middleware, agentic retrieval frameworks, context compression startups) was always a tax paid because of one architectural property: O(n²). If SubQ’s numbers hold up under independent scrutiny, a meaningful slice of that ecosystem becomes optional rather than mandatory. That has product, infrastructure, and venture implications that go well beyond a faster, cheaper LLM.

    The product strategy is also notably humble in a smart way. Subquadratic is not trying to win the consumer chat war against ChatGPT, Claude, or Gemini. SubQ Code is positioned as a layer underneath Claude Code, Codex, and Cursor, and the API is OpenAI-compatible. That is a classic infrastructure play: do not ask developers to abandon their tools, just route the expensive long-context turns to you. The “auto-redirects expensive model turns” framing is essentially a routing economic argument aimed at agent builders who already feel the pain of paying frontier prices for high-token requests.

    There are open questions worth holding lightly. The MRCR v2 numbers in the public benchmark table show SubQ behind Opus 4.6 and GPT-5.5, even as the launch post emphasizes a higher relative score. The cost comparisons rely on a specific compute basis that the upcoming technical report will need to spell out. And the gap between strong RULER scores at 128K and the 92% claim at 12M tokens is a long way to extrapolate without external replication. None of this is unusual for a launch, but it is the right place to apply pressure once the technical report drops.

    The bigger architectural bet is the one that should hold attention. If sub-quadratic attention done well genuinely matches frontier accuracy, then context length stops being a meaningful product axis and a generation of brittle infrastructure built around context limits gets reconsidered. Subquadratic is making the strongest public case so far that the post-transformer era starts with attention scaling, not parameter count. The next twelve months (the technical report, third-party benchmarks, and the first real production deployments through SubQ Code) will tell us whether this is the inflection point or another promising direction that does not quite cross the line. Either way, “Efficiency is Intelligence” is the right frame for where AI economics are heading, and Subquadratic is one of the few companies whose architecture is consistent with the slogan.

  • Brian Chesky on AI Founder Mode, the 11-Star Experience, and Reinventing Airbnb for the Age of AI

    Airbnb CEO Brian Chesky sits down with Patrick O’Shaughnessy on Invest Like The Best to talk about the next evolution of company building: AI Founder Mode. He covers the shift from founder to CEO, the lessons he learned from Steve Jobs through Hiroki Asai, why consumer AI is the next great frontier, and how he plans to change the atomic unit of Airbnb from a home to a person.

    TLDW

    Brian Chesky believes the next era of company building belongs to founders who refuse to delegate the soul of their company. He coined Founder Mode with Paul Graham after the pandemic forced him to take Airbnb back into his own hands. Now he is shaping what comes next: AI Founder Mode, where leaders work with on-demand context, fewer layers of management, asynchronous communication, and a new generation of hybrid manager-makers. He shares why most software companies have not been touched by AI yet, why consumer AI is about to explode, and why he is rebuilding Airbnb around people, not homes. The conversation also touches on the 11-Star Experience exercise, the power of small teams, why recruiting is the most important job a CEO has, and why every adult is still an artist underneath.

    Key Takeaways

    • Founder Mode is not micromanagement, it is having a steering wheel. Chesky woke up in 2019 feeling like the car had no steering wheel. After the pandemic, he reviewed every detail for two to three years before delegating again. Start hands-on and give ground grudgingly, not the other way around.
    • AI Founder Mode is even more intense. With AI, leaders can be in significantly more of the details because almost everything is on demand. Expect fewer layers of management, mostly asynchronous work, and the death of the pure people manager.
    • Two types of leaders will not survive AI: pure people managers who only do one-on-ones, and rigid people who refuse to evolve. Everyone needs to be a hybrid manager and individual contributor who can still touch the work.
    • Manage people through the work, not through meetings. Frank Lloyd Wright did it. Jony Ive does it. You are not anyone’s therapist.
    • Consumer AI is the next great prize. 159 of the last 175 Y Combinator companies were enterprise. Almost every app on your home screen has not changed since AI arrived. That changes in the next 12 to 24 months.
    • Why consumer AI is hard. No proven business model, mature distribution, trend-chasing investor culture, and the simple fact that consumer is more hits-driven and requires excellence in design, marketing, culture, and press, not just technology and sales.
    • Project Hawaii is the new operating model. A 10-to-12-person Navy SEAL-style team, hands-on coaching from the CEO, crawl-walk-run-fly. The first project added roughly $200 million in year one and $400 to $500 million in year two.
    • Make the problem as small as possible. Airbnb spent 16 years failing to launch a second hit because it kept trying to scale globally on day one. Now: pilot in one city, expand to 10, then industrialize.
    • It is better to have 100 people love you than a million people sort of like you. Paul Buchheit shipped Gmail only after 100 Googlers loved it. The sample size of intense love is enough to predict mass adoption.
    • The 11-Star Experience is an imagination exercise. Push to absurdity (Elon takes you to space) so a 6- or 7-star experience suddenly seems normal. The gap between 5 and 6 stars is the gap between you and your competitor.
    • Simplicity is distillation, not subtraction. Hiroki Asai, Steve Jobs’s longtime creative director, taught Chesky that great design distills something to its essence. First principles is a design term too.
    • The score takes care of itself. Bill Walsh and John Wooden both taught that you do not focus on winning, you focus on making every input perfect. Wooden spent his first hour with new players teaching them how to put on socks.
    • Industrial design is the original product management. There are no PMs in industrial design. The designer is the PM, working alongside engineers and program managers to design through user journeys.
    • Recruiting is the CEO’s number one job. The more time you spend recruiting, the less time you spend managing, because great people self-manage. Build pipelines, not searches. Start with results, work backwards to people.
    • Co-hire the top 200 people, not just the executive team. Most CEOs hire executives and let them hire their teams. Chesky considers that fatal because most executives cannot hire well without help.
    • Bodybuilding is a metaphor for leadership. If you can change your body, you can change your life. Progressive overload, 1 percent a day, is how compounding works. Start with biology before therapy.
    • Founder-led companies build the deepest moats. Disney is still selling Walt’s playbook 60 years after he died. Apple is still selling Steve’s iPhone. The longer founders stay in founder mode, the more the company can endure when they leave.
    • Software is hyper fast fashion. Hardware ages well. Buildings get patina. Software always looks dated 10 years later. What endures is the community, the brand, the principles, the mission, and the network effect.
    • Apps are dying. Agents are coming. Chesky says we should let go of our attachment to apps because they are not what the future looks like.
    • Airbnb’s atomic unit is changing from a home to a person. Chesky wants to build the most authenticated identity on the internet, the richest preference library, a real-world social graph, and a membership program. Then expand to 50 to 70 verticals on top of that identity.
    • AI shifts attention from consumption to creation. Social media gave you a paintbrush only for opinions. AI gives everyone a real paintbrush and canvas. We are heading into a creative renaissance.
    • Founders are expeditionaries, not visionaries. They put one foot in front of the other and call it a vision later.
    • Detach from accolades. Chesky describes adulation as a cup with a hole in the bottom. Status is a drug. The path to durable creative work is doing it because you love it, the way Walt Disney, Da Vinci, Van Gogh, and Steve Jobs did until the very end.
    • The kindest gift is belief. The best way to activate a person’s potential is to see something in them they do not yet see in themselves.

    Detailed Summary

    From Industrial Design to the CEO Chair

    Chesky studied industrial design at the Rhode Island School of Design. He chose it on instinct after a department head told him industrial designers design everything from a toothbrush to a spaceship. He grew up enchanted by the Reebok Pump, the Game Boy, the Nintendo, and eventually by the late 1990s golden age of Apple. Raymond Loewy, the man who designed Air Force One and an enormous catalog of mid-century consumer products, became a touchstone, but Jony Ive was the real hero.

    What he loved about industrial design was that it is technical, commercial, and empathetic. A building can win an architecture award and never be leased. A piece of industrial design that does not sell is a failure. So you have to think about manufacturing, distribution, marketing, and most importantly, user journeys. There are no product managers in industrial design. The designer is the PM. That training, he says, prepared him directly for the role of CEO.

    The Pandemic and the Birth of Founder Mode

    Chesky says no one is born a good CEO. People are born good founders. The job of CEO is counterintuitive in almost every direction. Founders are taught to learn by doing, but a CEO who learns by trial and error wastes years unwinding the empires of misfit hires.

    By 2019 he was running a 7,000 person company he no longer recognized. He felt he was driving a car without a steering wheel. He had a dream that he had left Airbnb for ten years and come back to find it had become a giant political bureaucracy. Then he realized he had been there the whole time. The pandemic hit and Airbnb lost 80 percent of its business in eight weeks. He shifted from peacetime to wartime, took control of every detail, worked 100-hour weeks, and reviewed everything for two to three years.

    The vision was never to micromanage forever. The vision was: I need to know what is going on before I can empower anyone. Hire people, audit their work, and only then give ground grudgingly. Most founders do the opposite, which is why they end up with executives building empires they later have to dismantle.

    AI Founder Mode

    Chesky says AI Founder Mode will be even more intense than Founder Mode because nearly everything will be on demand. He used to live in 35 hours of meetings a week to gather information, the same way Steve Jobs ran Apple. He held weekly, biweekly, monthly, and quarterly group reviews with the full chain of command in one room, anyone could speak, and he made the final call after listening last.

    In the AI era, that culture shifts from meetings to asynchronous work. He expects fewer layers of management. He cites the Catholic Church as a 2,000-year-old institution with only four layers and asks why most companies need seven, eight, or nine. Pure people managers will not survive. Every manager will have to be a hybrid IC, an engineer who still codes, a lawyer who still reads case law, a designer who still designs. You manage through the work, not through one-on-ones.

    He is also bullish that AI tooling will become consumer-grade simple very soon. The current tools, including Claude Code and Cowork, are not yet intuitive to the average person, but the economic incentive will force that to change.

    Why Consumer AI Is the Next Great Frontier

    Chesky points out that 159 of the last 175 Y Combinator companies were enterprise. Almost every consumer app on your phone, including Airbnb, has not fundamentally changed since the arrival of AI. He gives four reasons: investors feared ChatGPT would kill consumer companies; consumer AI has no proven business model because subscriptions hit a local max against free Claude and Gemini, ads are off the table for most labs, and e-commerce has been shut down via third-party app removals; distribution is mature; and Silicon Valley culture, while branded as rebellious, is in practice trend-following.

    The deeper reason is simply that consumer is harder. It is hits-driven, requires great design, marketing, culture, press, and you cannot easily start by selling to your dorm-mates the way enterprise YC startups sell to other YC startups. The prize is bigger. The risk is bigger. He predicts a consumer AI renaissance over the next 12 to 24 months.

    Project Hawaii and the Magic of Small Teams

    Inside Airbnb, Chesky tested a new operating model called Project Hawaii. He took 10 to 12 people (designers, engineers, product, and data scientists), treated them like a startup inside the company, and pointed them at one problem: improving the guest funnel. The system is crawl, walk, run, fly. First fix bugs, then add features, then re-imagine flows, then completely reinvent.

    The first team delivered roughly $200 million of incremental revenue in year one and $400 to $500 million the next year, eventually contributing more than 600 basis points of conversion improvement on a base of $134 billion in gross sales. Then they took the same system to pricing, then to other problems, then to launching new businesses like Services and Experiences.

    The guiding lesson: make the problem as small as possible. Airbnb launched in one city, New York. Uber in San Francisco. DoorDash in Palo Alto. When Chesky launched Services and Experiences in 100 cities at once last year, it did not work. The fix was to dominate one city, expand to 10, then industrialize. Peter Thiel said it cleanly: better to have a monopoly of a tiny market than a small share of a big market.

    Underneath that is a Paul Buchheit insight Chesky calls the best advice he ever got. It is better to have 100 people love you than a million people sort of like you. Buchheit refused to ship Gmail until 100 Googlers loved it, and that took two years. Once 100 people loved it, 100 million people did.

    The Hiroki Asai Lessons: Simplicity and Craft

    Hiroki Asai, Steve Jobs’s quietly legendary creative director, taught Chesky two principles. The first is that simplicity is not removing things, simplicity is distillation, understanding something so deeply that you can express its essence. Steve Jobs called design the fundamental soul of a man-made creation that reveals itself through subsequent layers. Elon Musk’s first principles thinking is the same idea applied to physics.

    The second is craft. How you do anything is how you do everything. Chesky cites Bill Walsh’s The Score Takes Care of Itself and John Wooden’s first hour with UCLA players, an hour spent teaching them how to put on their socks. Walsh said the way you tucked your jersey was one of 10,000 details that decided whether you won. The lesson is to focus on getting every input right. The output follows.

    The 11-Star Experience

    The 11-Star Experience is one of Chesky’s most copied frameworks. Most Airbnb stays get five stars because anything else means something went wrong. So Chesky asked: what would six stars look like? Your favorite wine on the table, fruit, snacks, a handwritten card. Seven stars? A limousine at the airport and the surfboard waiting for you because they know you surf. Eight stars? An elephant and a parade in your honor. Nine stars? The Beatles arrive in 1964 with 5,000 screaming fans. Ten stars? Elon Musk takes you to space.

    The point is the absurdity. By imagining the impossible, six and seven star experiences stop seeming crazy. The gap between five and six stars is the gap between you and your competitor. If you can industrialize a sixth star, you may have product-market fit. The exercise also restarts your imagination, which Patrick noted has atrophied for many people in the era of consumption-only social media.

    AI as a Canvas for Creativity

    Chesky frames AI as the ultimate platform shift, the ultimate creative expression, and possibly the greatest invention in human history. Social media made us mostly consumers and gave creators only opinion-shaped tools. AI gives everyone a paintbrush. He believes far more people are creative than we recognize because most have never had craftsmanship or tools to express what is in their heads. Pablo Picasso said all children are born artists; the problem is to remain one as you grow up. Chesky thinks every adult is still an artist underneath.

    The Next Chapter of Airbnb

    Chesky describes four phases of the CEO journey: get to product-market fit, scale to hyper-growth, become a real profitable public company, and finally reinvent. Airbnb’s stock has been flat because the core idea is saturating. He is now squarely in phase four, with three priorities.

    First, change the atomic unit from a home to a person. He wants Airbnb to build the most authenticated identity on the internet, the richest preference library, a real-world social graph, and a membership program. Proof of personhood, he says, will be enormously valuable in the AI age. Second, industrialize the new-business engine to support 50 to 70 verticals (homes, experiences, services, eventually flights, and more) all built on top of that personal atomic unit. Third, navigate the AI transition without breaking the existing business or the livelihoods of hosts. He is also exploring sandbox apps that imagine a radically different Airbnb, the answer to “what is after Airbnb?”

    What Endures in the Age of AI

    Chesky is direct that software does not endure. Look at any software from 10 years ago and it looks dated. Hardware ages better. Buildings develop patina. Paris endures. So if you want to build something lasting, you cannot bet on the app. You have to bet on the community, the brand, the mission, the principles, the identity, and the network effect. Apps are going away, replaced by agents. Founders attached to apps need to let go.

    Founder-Led Moats: Disney and the Ham Sandwich Paradox

    Chesky reconciles Warren Buffett’s “buy a company a ham sandwich could run” with the venture capital truth that a founder’s ceiling is the company’s ceiling. The reconciliation is Disney. Most people cannot name a Paramount, Warner Brothers, Universal, or MGM film off the top of their head, but everyone can name Disney films. Walt Disney was a founder in founder mode for so long that he created enough IP and momentum that the company has been running on his playbook for 60 years after his death. Apple is similar with Steve Jobs and the iPhone.

    The counterintuitive lesson: if you want a company to last 100 years, do not delegate early to make it independent of you. Stay in founder mode for as long as possible so you can institutionalize the magic deeply enough that it endures after you. Tech is the industry of change, so founder mode matters even more there than in chocolate or insurance.

    Bodybuilding as Leadership Training

    Chesky was a 135-pound late bloomer who told his friends he would compete at the national level in bodybuilding by 19. He did. Two lessons came out of it. First, if you can change your body, you can change your life. Start with biology before therapy. Second, you cannot get in shape in one day. Progressive overload, discipline, consistency, and roughly 1 percent a day compound into massive gains (1.01^365 is roughly 37x over a year). The visible feedback loop in bodybuilding taught him to break invisible problems (like the quality of a leadership team) into observable, measurable proxies (like the quality of the room at a twice-yearly roadmap review of the top 100 people).

    Recruiting as the CEO’s Number One Job

    Sam Altman told a 27-year-old Chesky he would spend 50 percent of his time on hiring. Chesky did not, and considers that his biggest mistake. He now starts and ends every day with his recruiter and spends two to three hours a day on hiring. The more time you spend recruiting, the less time you have to spend managing because great people self-manage.

    His system is pipeline recruiting, not search recruiting. He never starts with a search firm. He constantly meets the best people in their fields, asks each one to introduce him to the next two or three best, and builds a rolling rolodex. He starts with results, finds an ad he loves, and works backwards to the team that made it. He builds little mafias of top talent inside the company. He is the co-hiring manager for the top 200 people at Airbnb, not just executives, because most executives cannot hire well without help.

    Activating Talent and the Power of Belief

    You cannot teach motivation. You can only give people a problem and see if they have agency. The way to activate someone, Chesky says, is to show them potential they cannot yet see in themselves. He cites John Wooden, who said the secret to coaching was that he saw potential in players they did not see in themselves. People will climb mountains for that.

    The kindest gift anyone gave Chesky, he says, was belief. A high school art teacher named Miss Williams told his parents he was going to be a famous artist. He never became one, but the belief gave him the confidence to choose art school and to choose to be happy. Michael Seibel and the Justin.tv founders believed in him. Paul Graham made an exception to fund a non-engineer with what he thought was a bad idea. His co-founders Joe and Nate believed in him when he had no business being a CEO. The biggest gift you can give back, he says, is belief in others.

    Detaching from the Scoreboard

    Chesky describes adulation as a cup with a hole in the bottom. Status keeps draining out and you keep needing more to feel the same. The day Airbnb went public at a $100 billion valuation should have been one of the best days of his life. The next morning he put on sweatpants for a Zoom meeting and felt nothing. That triggered a re-evaluation. He stopped seeking accolades and started focusing on intrinsic work. He cites Rick Rubin: an artist is an artist when they make for themselves. He cites President Obama, who told him to focus on what you want to do, not who you want to be.

    His four heroes are Leonardo da Vinci, Vincent Van Gogh, Walt Disney, and Steve Jobs. All four were working until the last week or day of their lives. Da Vinci carried the Mona Lisa with him until he died. Van Gogh sold one painting in his life. Disney was imagining theme parks in the ceiling tiles of his hospital room. Chesky says his motivation is the motivation of an artist. He calls being a CEO of a public company at his scale “almost a glitch in the system” that gave him one of the largest design canvases in human history.

    Thoughts

    What stands out about this conversation is how clearly Chesky has decoupled identity from outcome. He frames himself first as a designer, second as a CEO, and considers the resources he commands as a kind of accidental fortune for an industrial designer to be sitting on. That self-image is what lets him talk about disrupting Airbnb, killing the app paradigm, and changing the atomic unit of the company without flinching. Most public-company CEOs cannot afford that posture.

    The framework worth stealing is Project Hawaii. The pattern of taking a 10-person elite team, putting them under direct CEO coaching, and running them through crawl-walk-run-fly is a near-universal answer to the problem of innovation inside a large company. It works because it removes abstraction layers, creates direct contact with reality, and gives the founder a way to teach muscle memory before delegating. Anyone running a team of any size can borrow the pattern: pick one problem, staff it small, work with it weekly, then let go gradually. The golf-instructor analogy of teaching muscle memory before bad habits set in might be the most important management metaphor of the year.

    His prediction about consumer AI is the most economically interesting part of the talk. The fact that 159 of 175 recent YC companies are enterprise is a startling concentration. If he is right that the next 12 to 24 months bring a consumer renaissance, the opening is enormous. The hard part is what he names directly: there is no proven business model for consumer AI yet. Subscriptions cap out against free incumbents, ads are off-limits for the labs, and e-commerce has been throttled. Solving the business model is probably more valuable than building the next great consumer interface.

    The deeper philosophical thread, that AI is the transition from consumption to creation, is one that anyone building tools for makers should hold close. The 11-Star Experience also reads differently in the AI era. It used to be a thought exercise constrained by what you could plausibly build. AI compresses the gap between imagination and execution to minutes, sometimes seconds. The question is no longer “what is the most absurd version of this experience?” but “which six and seven star experiences can I now industrialize that were unthinkable a year ago?” The exercise has become operational.

    Finally, the meta-lesson on founder-led moats is worth taking seriously. The instinct in venture capital and at most public-company boards is to professionalize early. Chesky’s argument is the opposite: the longer the founder stays in founder mode, the deeper the IP and the longer the company endures after they leave. Disney is the proof. Apple is the proof. Whether Airbnb will be is the open question, and it is the question Chesky is using AI Founder Mode to answer.

  • Howard Marks on Why Most Investors Lose, the AI Bubble, India, and the Hunt for the $10 Bill Nobody Picked Up

    TLDW

    Howard Marks, co-founder of Oaktree Capital and the author of the memos every serious investor reads first, sat down with Nikhil Kamath for a wide-ranging conversation on his 50+ year career, the philosophy of Mujo (the inevitability of change), why he chose bonds over stocks, the difference between drifting down the river and seeing it, where we sit in the current cycle, AI as both threat and opportunity, why active management lost to indexation, and why the only way to outperform in a world full of smart, motivated, computer-literate competitors is “superior insight.” His core message: investing is a puzzle that cannot be solved by formula, and the only edge that lasts is being more right than the other person, more often, with the discipline to stay calm when everyone else is panicking or partying.

    Key Takeaways

    • Mujo is the operating system. Marks took Japanese literature at Wharton and walked away with one idea that shaped his whole career: change is inevitable, unpredictable, and uncontrollable. You cannot predict the future, but you can prepare for it.
    • Cycles are excesses and corrections, not ups and downs. The S&P 500 has averaged about 10% per year for 100 years, but it is almost never between 8% and 12% in any given year. The norm is not the average. Greed and fear push the pendulum past equilibrium every time.
    • The recovery is two years older. When asked where we are in the cycle, Marks notes the bull market continued from April 2024 through January 2026, so by definition we are deeper into the cycle, with a recovery distorted by the unique man-made COVID recession.
    • Drifting versus seeing the river. Marks describes the first 35 years of his career (roughly age 14 to 49) as drifting. Starting Oaktree in 1995 was the first truly intentional decision he made. Entrepreneurship forced proactivity on him.
    • Why bonds over equities. The contractual, predictable nature of debt suited his conservative temperament (his parents were adults during the Depression). He did not move to bonds by choice; in 1978 a boss reassigned him, just in time for the birth of the high-yield bond market.
    • Distressed debt is the bigger story. Bruce Karsh joined in 1987 and has run roughly $70 billion in distressed debt since 1988, with profits making up well over 90% of the cumulative profit and loss.
    • Excess return is getting paid more than the risk warrants. If the market thinks a borrower has a 5% default probability and you correctly conclude it is 2%, you collect interest priced for 5% risk while taking 2% risk. That gap is the alpha.
    • Oaktree’s default rate is about a third of the market. Over 40 years, roughly 3.6% to 3.7% of high-yield bonds default each year. Oaktree’s rate is roughly one-third of that, achieved through process discipline, institutional memory, and analysts who stay analysts for life.
    • If you are starting a career today, understand AI. Marks says the investor who will make the most money over the next 10 years is the one who best understands AI and its capabilities, whether they bet for or against it.
    • AI is excellent at pattern matching, but cannot create new patterns. Can AI pick the Amazon out of five business plans? The Steve Jobs out of five CEOs? Marks bets no. Most humans cannot either, which means there is still a role for exceptional people.
    • Indexation won because active management lost. Passive did not become dominant because it is brilliant. It dominated because most active managers failed and charged high fees for the privilege.
    • Bad times create openings for active managers, but most cannot take them. Panic drives prices down, but the same panic prevents most investors from buying. Wally Deemer: when the time comes to buy, you will not want to.
    • The job is simple but not easy. Find the best managers, the best companies, the best ideas. Charlie Munger told Marks: anyone who thinks it is easy is stupid.
    • Where is the $10 bill nobody picked up? Marks thinks it is around AI, but only for those with insight above the average. If you are average and you crowd into AI, you get average results in a bull case and worse in a bear case.
    • Quantitative information about the present cannot produce alpha. Andrew Marks (Howard’s son) pointed this out to his father during the COVID lockdown. Everyone has the same data. Outperformance has to come from somewhere else.
    • Buffett’s edge was reading Moody’s Manuals when nobody else would. The pre-internet research process favored those willing to do tedious work alone. The format of the edge changes; the fact that edge requires doing what others will not, does not.
    • You cannot coach height. Marks can tell you that second-level thinking, contrarian insight, and the ability to evolve at 80 are essential. He cannot tell you how to acquire any of them.
    • India: Marks declines to opine. He has deployed roughly $4 billion in India but refuses to claim expertise on the Indian stock market or recommend a sector.
    • History rhymes. Marks credits Mark Twain. The lessons that repeat are lessons of human nature, which changes incredibly slowly.
    • Investing is a puzzle, not dentistry. Quoting Taleb, Marks observes that engineers and dentists succeed by repeating the right answer. Investors face a problem with no certain solution. If you need to be right every time, do not become an investor.

    Detailed Summary

    From Queens to Wharton: The Accidental Investor

    Howard Marks grew up in Queens, New York, in a middle-class family. Neither of his parents went to college, but his father was an intelligent accountant. Marks discovered accounting in high school, fell in love with its orderliness, and chose Wharton because he was told it was the best undergraduate business school in America. Wharton required a class in the literature of a foreign country and a non-business minor. For reasons he no longer remembers, Marks chose Japanese studies, then took Japanese civilization and Japanese art. He calls it the most important academic decision of his life because of one concept he encountered: Mujo.

    Mujo, Independence of Events, and Why You Cannot Predict

    Mujo, the turning of the wheel of the law, teaches that change is inevitable, unpredictable, and uncontrollable, and that humans must accommodate it rather than try to control it. Marks pairs this with his deep belief in the independence of events: ten heads in a row do not change the odds on flip eleven. Roughly 20 years ago he wrote a memo titled “You Can’t Predict. You Can Prepare.” A portfolio cannot be optimized for both extreme upside and extreme downside, but it can be built to perform respectably across many possible futures, if you suboptimize for the middle of the probability distribution.

    Why Cycles Exist

    If GDP averages 2% growth, why is it never simply 2%? Marks’s answer is excesses and corrections. Optimism leads producers to overbuild and consumers to overspend, growth runs above trend, then satiation and oversupply pull it back below trend. The S&P 500 averages 10% per year over a century, but the return in any given year is almost never between 8% and 12%. The norm is not the average because human beings are not average; they are alternately greedy and fearful.
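    Marks’s observation is easy to illustrate. Assuming annual S&P returns are roughly normal with a 10% mean and about 18% volatility (the volatility figure is my assumption, not Marks’s), a year landing inside the 8 to 12 percent band is roughly a one-in-eleven event:

    ```python
    import math

    def normal_cdf(x, mu, sigma):
        """Normal CDF via the error function."""
        return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))

    mu, sigma = 0.10, 0.18   # assumed mean and volatility of annual returns
    lo, hi = 0.08, 0.12      # the "average" band

    p = normal_cdf(hi, mu, sigma) - normal_cdf(lo, mu, sigma)
    print(f"P(annual return lands in the 8-12% band): {p:.0%}")  # ~9%
    ```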

    Where Are We Now?

    Two years ago Marks told the Norwegian Sovereign Wealth Fund’s Nicolai Tangen that we were near the middle of the cycle. Two years later, the bull market in stocks continued through January 2026, so by simple math the recovery is older. The COVID recession was a man-made anomaly: one quarter of negative growth followed by the best quarter in history, triggered by a deliberate global shutdown rather than by accumulated excess. That distorts every traditional cycle metric.

    Drifting Versus Seeing the River

    One of the most personal moments in the conversation is Marks’s confession that he drifted for the first 35 years of his career. He did not pick his career, his first job, or his transition from equities to bonds in any deliberate way. Other people pushed him; he said yes. The first proactive decision of his life was co-founding Oaktree in 1995 at age 49, and even that came largely because his wife and his partner Bruce Karsh pushed him into it. Once he had to lead, he had to be intentional. Leadership cannot be passive.

    The Bond Decision

    Marks did not choose bonds; bonds chose him. In May 1978 his boss at Citibank moved him to the bond department to start a convertible fund. Three months later another phone call asked him to figure out something called high-yield bonds being run by a guy in California named Milken. Marks said yes both times. He arrived at the front of the line for high-yield in 1978 and has been there for 48 years.

    The conservative temperament fit. Marks’s parents were adults during the Depression, so he grew up hearing “don’t put all your eggs in one basket” and “save for a rainy day.” Bonds offered contractual, predictable returns. The phrase “junk bonds” was a bias that made the asset class cheaply available to anyone willing to do the analytical work.

    Distressed Debt and Excess Return

    When Bruce Karsh joined him in 1987, the team launched what Marks believes was the first distressed debt fund from a mainstream institution. Karsh has managed about $70 billion since 1988, with well over 90% of the total being profit. The core skill is predicting default probability better than the market. If consensus prices a borrower at a 5% default risk and you correctly assess 2%, the interest you receive is overpaid relative to the actual risk. Marks calls this “excess return” and credits Mike Milken with the foundational insight: lend to borrowers others will not, demand interest beyond what the actual risk warrants, and the math works.
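    For concreteness, a minimal sketch of that arithmetic, assuming a 60% loss-given-default (my assumption; Marks quotes only the default probabilities):

    ```python
    loss_given_default = 0.60   # assumed severity; not an Oaktree number

    def expected_credit_loss(p_default, lgd=loss_given_default):
        """Annual expected loss: default probability times loss severity."""
        return p_default * lgd

    spread_received = expected_credit_loss(0.05)  # market prices 5% default risk
    losses_expected = expected_credit_loss(0.02)  # your analysis says 2%

    print(f"Spread received: {spread_received:.2%}")                    # 3.00%
    print(f"Losses expected: {losses_expected:.2%}")                    # 1.20%
    print(f"Excess return:   {spread_received - losses_expected:.2%}")  # 1.80%
    ```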

    Over 40 years, roughly 3.6% to 3.7% of high-yield bonds default annually on average. Oaktree’s default rate has been roughly one-third of that. Marks credits institutional culture (analysts who stay analysts for life), psychological stability in volatile periods, and a process that forces every analyst to ask the same eight questions of every company every time. In equity research, you can buy a stock for great management without examining the product, or for a great product without examining the management. In Oaktree’s bond process, you cover every base every time.

    Beginning a Career Today: The AI Question

    Asked what he would do today, Marks says the front of the line is AI. The investor who will succeed most over the next decade is the one who best understands AI, whether they bet for or against it. He notes that he was shocked by his own experience using Claude, but adds that he has not fired a single person and does not intend to.

    His view: AI excels at extracting patterns from history and applying them with discipline and without psychological wobble. But investing also requires creating new patterns. Can AI sit with five business plans and identify the future Amazon? Can it sit with five CEOs and pick Steve Jobs? Marks bets not. Then he adds the killer line: most humans cannot either. Which means the role for exceptional humans survives, but the bar gets higher.

    Why Indexation Won

    When Marks went to graduate school at the University of Chicago in 1968, his professor pointed out that most mutual funds underperformed the S&P after fees. Index funds did not exist yet; Jack Bogle launched the first one in 1976. Today, most equity mutual fund capital is passive. Marks’s controversial take: indexation did not win because it is great. It won because active management was so bad and so expensive. Even at equal fees, if active decisions are inferior, passive wins.

    Bad times create openings for active managers because panic drives prices down, but the same panic prevents most people from buying. Marks quotes the old trader Wally Deemer: when the time comes to buy, you will not want to. The advantage of an AI nudge that says “this is one of those moments, get your ass in gear and buy something” might genuinely add value, because it removes the emotion.

    Second-Level Thinking and Why You Cannot Coach It

    Marks’s first book, The Most Important Thing, has 21 chapters, each titled “The Most Important Thing Is…” Each one is different because so many things matter. The chapter on second-level thinking came to him spontaneously while writing a sample chapter for Columbia University Press. The argument is simple: if you think like everyone else, you act like everyone else, and you get the same results. To outperform, you must deviate from the herd and be more right than the herd. Different is not enough. Different and better is the bar.

    Can AI become a contrarian thinker? You can prompt Claude to give you only non-consensus answers, but the catch is that consensus is often close to right because the people building consensus are intelligent, educated, computer-literate, and motivated. Forcing non-consensus often forces wrong. The real edge is being non-consensus AND correct, which is a much narrower target.

    The $10 Bill That Nobody Has Picked Up

    Marks references the joke about the efficient market hypothesis: there is no $10 bill on the sidewalk because if there were, somebody would have already picked it up. He then concedes that the bill is probably around AI today, but only for those whose insight rises above the average. If you are average and you crowd into AI, you go along with the tide if it works and get crushed if it does not. Quoting Garrison Keillor’s Lake Wobegon, “where all the children are above average,” Marks notes that the math does not allow it. Most investors will not be above average, and acknowledging that is the first step toward becoming one of the few who are.

    Learning From Andrew, Buffett, and Onion-Skin Manuals

    Marks lived with his son Andrew during COVID and wrote a memo about it called “Something of Value” in January 2021. Andrew’s most important contribution was a near-revelation: readily available quantitative information about the present cannot be the source of investment alpha because everyone has it. Buffett’s edge in the 1950s was reading Moody’s Manuals (giant books printed on onion-skin paper with tiny type and zero narrative) when nobody else would. The medium changes; the principle that edge requires doing what others will not, does not.

    India

    Kamath asks Marks directly about India. Marks has deployed roughly $4 billion there but politely declines to claim any expertise on the Indian stock market or recommend a sector. He cautions Kamath about taking advice from people who do not know what they are talking about, and includes himself in that category on the question of India. The honesty is striking and is itself an investment lesson.

    History Rhymes, and Final Advice

    Marks reads Andrew Ross Sorkin’s 1929 and references it in an upcoming memo on private credit. He likes Mark Twain’s reputed line that history does not repeat but it rhymes, and Napoleon’s line that history is written by the winners of tomorrow. The lessons that rhyme are lessons of human nature, which evolves incredibly slowly. Fight or flight from the watering hole still drives behavior in financial markets.

    His final advice: investing is a puzzle, not engineering. A civil engineer calculates steel and concrete, builds the bridge, and the bridge stands. Every time. A dentist fills the cavity correctly and it stays filled. Every time. If you need that kind of reliability in your work, become a dentist. Investing is the act of positioning capital for a future that cannot be predicted accurately. You will be wrong sometimes. If something in your makeup cannot tolerate being wrong sometimes, do not become an investor. The puzzle has no final solution, which is exactly what makes it endlessly interesting.

    Thoughts

    The most useful thing Marks does in this conversation is admit, repeatedly and without ego, what he does not know. He does not know whether AI models differ in real intelligence. He does not know which sector in India to bet on. He does not know how to teach second-level thinking. He drifted for 35 years and only began making intentional decisions at 49. This honesty is the inverse of every guru selling certainty, and it is the actual content of the lesson he is trying to convey: epistemic humility is the precondition for superior insight, because you cannot acquire what you already think you have.

    The deepest insight in the conversation might be the one Andrew Marks (Howard’s son) gave his father during COVID: readily available quantitative information about the present cannot produce alpha because everyone has it. This is devastating in the AI era. If everyone is asking the same large language model the same question, the answers converge, and convergence is consensus, and consensus does not pay. The arms race for proprietary data, novel framings, and unconventional questions is the only thing that can break the convergence.

    Marks’s framing of cycles as excesses and corrections rather than ups and downs is genuinely useful. It reframes volatility from something to fear into something to expect, and reframes the question from “where are we going?” to “how far past trend have we already gone?” The 8 to 12 percent observation about the S&P (that the average return is almost never the actual return) is the kind of fact that should be taught in every introductory finance class but is almost never mentioned.

    The most contrarian claim in the conversation is the one about indexation: that it won because active was bad, not because passive is great. This is a useful inversion. Most defenders of passive investing argue from efficient market theory; Marks argues from the empirical failure of active managers. The implication is that if you can find the small population of active managers who genuinely outperform, the indexation argument falls apart for that subset. Most cannot. The hardest job in investing is the meta-job of identifying the few who can.

    The exchange about AI as a contrarian engine is one of the most clarifying short discussions of AI’s investment limits I have read. Different from consensus is easy. Different and better is the actual goal. Forcing different gets you wrong more often than right because consensus, built by smart, motivated, educated competitors, is usually close to correct. This is why “use AI to find non-consensus ideas” is a worse strategy than it sounds.

    Finally, the Buffett-Moody’s-Manual story is the most quietly profound moment in the interview. The edge in 1955 was the willingness to read tiny type on onion-skin paper alone in an office in Omaha when no one else would. The edge in 2026 is whatever the modern equivalent of that is, and the only honest answer is: nobody knows yet, which is precisely why finding it is worth so much money.

  • Inside Figure: Brett Adcock’s $39 Billion Bet on Humanoid Robots, Helix AI, and the Race to Physical AGI

    Figure is the $39 billion humanoid robotics company most likely to put a general-purpose robot in a commercial workforce, and possibly your living room, before the end of the decade. In a rare two-part sit-down on Sourcery with Molly O’Shea, Founder and CEO Brett Adcock opened every door of the company’s San Jose campus, walked through the manufacturing line, demoed Helix 2 cleaning a living room with no teleoperation, and laid out the plan to scale from thousands of robots in 2026 to a million units a year. He also explained why he ended the OpenAI partnership, why he believes humanoids will reach AGI before any other form factor, and why Figure 04 will be the company’s “iPhone 1 moment.”

    TLDW

    Brett Adcock founded Figure in 2022, self-funded it through a million-a-month burn rate in the first four months, and 15x’d the valuation to $39 billion in 18 months on roughly $2 billion raised from Jeff Bezos, Microsoft, Nvidia, Amazon, and originally OpenAI. The company designs every part in-house, from motors and batteries to the Helix vision-language-action neural network running onboard each robot. Figure deployed humanoids on a BMW assembly line for six months in 2025, hit record production in March 2026, plans to triple that by May, and is targeting a million units per year. Adcock argues that humanoid robotics is an intelligence problem, not a manufacturing problem, that under half of global GDP is human labor (a market measured in tens of trillions of dollars), and that physical interaction data may be the missing ingredient to true artificial general intelligence.

    Key Takeaways

    • Figure is valued at $39 billion after raising nearly $2 billion. Adcock 15x’d the valuation in 18 months and believes the eventual revenue opportunity is in the tens of trillions because roughly half of global GDP is human labor.
    • The bottleneck is intelligence, not manufacturing. Figure already has the parts, the supply chain, and the capacity. The hard part is making robots that run autonomously at human-level performance for 7 to 10 hours a day with zero human intervention.
    • Figure designs almost everything in-house. Motors, rotors, stators, sensors, kinematics, joints, batteries, more than 100 PCBs. Adcock claims no other humanoid group designs more parts than Figure.
    • The OpenAI breakup was about model quality. OpenAI led Figure’s Series B and brought in Microsoft. After a year of collaboration, Adcock says Figure’s internal robot-learning team was running circles around OpenAI on humanoid AI, so he ended the partnership.
    • Helix is Figure’s onboard vision-language-action model. It runs on GPUs in the robot’s torso, ingests camera pixels a few hundred times per second, and outputs joint positions for all ~40 motors. It works without internet connectivity. Helix 2 launched a couple of months ago.
    • Robots have more body positions than atoms in the universe. With 40 motors each capable of 360 degrees of rotation, the state space is 360 to the power of 40, which is why Figure abandoned hand-coded controls in favor of neural networks about a year ago.
    • The “Never Fall” protocol is real. A project called Vulcan uses reinforcement learning to keep the robot upright even after losing a knee, ankle, or hip mid-task. The company demoed a robot hobbling on a velocity-locked knee.
    • Figure 03 is the current production robot. It costs roughly 90% less than Figure 02, comes in under $100K per unit, has soft-wrapped foam shoulders, swappable fabric clothing, a high-top sneaker design, and inductive wireless charging at 2 kW through the feet (4 to 5 hours of runtime per 1 hour of charge).
    • Figure 04 is being teased as the “iPhone 1 moment.” Adcock says the jump from Figure 03 to Figure 04 will be the largest generational improvement they have ever made, far bigger than 1 to 2 or 2 to 3.
    • BMW deployed Figure robots for six months in 2025. The robots helped build a BMW X3 in the body shop. Adcock owns the first humanoid-built X3 personally and describes the deployment as the inflection point that led to Helix 2.
    • Home robots will lease for around $400 to $600 a month. Comparable to a car lease. The robot docks itself in a 2-by-2-foot wireless charging station and runs laundry, dishes, and tidying tasks autonomously.
    • Data is the biggest blocker. Figure has roughly 1 million hours of pre-training and mid-training data plus thousands of hours of post-training data. They also pay people in spandex bodysuits to do joint-level human movement capture.
    • Adcock runs three companies simultaneously. Figure (humanoids), Cover (terahertz weapons-detection imaging spun out from NASA Jet Propulsion Lab for K-12 schools), and Hark (an AI lab building personalized AI models and devices, out of stealth two weeks ago).
    • Physical AGI is the explicit goal. Adcock argues that real-world interaction data, learning by touching the world and observing the consequences, is the missing piece for true AGI, and that humanoids may reach it before chatbots do.
    • Security is paranoid by design. A drone was caught hovering outside Figure’s office at one point. They tented the windows, restrict phones in certain areas, and treat industrial CAD and software as high-value IP.

    Detailed Summary

    The Company in Context

    Figure is less than four years old. Adcock founded it in 2022 after stepping away from Archer Aviation, the eVTOL aircraft company he took public. He self-funded Figure to a million dollars a month in burn within four months, hired a 40-person team in four to five months, and pursued a vertically integrated strategy from day one. The thesis is simple. Roughly half of global GDP is human labor. Wages paid to commercial workers run into the tens of trillions of dollars annually. If you can build a humanoid that does general-purpose human work reliably, the resulting business compounds into one of the largest companies in history.

    The campus in San Jose has four buildings: corporate headquarters with 250 to 300 engineers, BotQ (the manufacturing facility), the Grid (a 24/7 robot stress-test environment that runs holidays and weekends), and a design studio that opened to cameras for the first time. Total headcount is around 500. The company has raised close to $2 billion across rounds, with capital from Jeff Bezos, Microsoft, Nvidia, and Amazon. The valuation jumped 15x to $39 billion in 18 months.

    Why Humanoid Robotics Is an Intelligence Problem

    The core technical insight: a humanoid has roughly 40 motors, each capable of full 360-degree rotation, which produces a state space of 360 to the power of 40. That number is larger than the count of atoms in the observable universe. You cannot write hand-coded control logic for that. Figure pivoted entirely from classical controls to neural networks about a year ago, and the team has built what Adcock claims is the best humanoid neural-network controller in the world.
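    The arithmetic holds up. Treating each motor as having 360 discrete positions (a simplification of the framing in the interview):

    ```python
    import math

    motors, positions_per_motor = 40, 360

    log10_states = motors * math.log10(positions_per_motor)
    print(f"Body configurations: ~10^{log10_states:.0f}")  # ~10^102
    print("Atoms in the observable universe: ~10^80")
    ```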

    Helix is a vision-language-action model that runs onboard each robot. It accepts a natural-language prompt like “clean the living room,” reasons through the scene from camera input, and outputs joint commands a few hundred times per second. Inference happens locally on GPUs inside the torso, so the robot keeps working with no internet connection. Helix 2 launched a few months ago following lessons learned from the BMW deployment, and Figure has roughly a million hours of base training data plus thousands of hours of post-training data driving it.
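    Helix’s internals are not public, but the description maps onto a standard vision-language-action control loop. A minimal sketch under that reading; every function here is a hypothetical stand-in, not Figure’s actual API:

    ```python
    import time

    CONTROL_HZ = 200   # "a few hundred times per second," per the description
    NUM_JOINTS = 40    # roughly 40 motors on the robot

    def read_cameras():
        return b""  # placeholder for a camera frame

    def vla_policy(frame, instruction):
        """Stand-in for an onboard VLA model mapping (pixels, task) -> joints."""
        return [0.0] * NUM_JOINTS

    def send_joint_targets(targets):
        pass  # would write position setpoints to the motor controllers

    def control_loop(instruction="clean the living room", steps=1000):
        period = 1.0 / CONTROL_HZ
        for _ in range(steps):
            start = time.monotonic()
            targets = vla_policy(read_cameras(), instruction)
            send_joint_targets(targets)
            # Sleep off the rest of the cycle; inference is local, so there
            # is no network round-trip inside the loop.
            time.sleep(max(0.0, period - (time.monotonic() - start)))

    control_loop()
    ```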

    The OpenAI Partnership and Breakup

    OpenAI led Figure’s Series B alongside Microsoft. The two teams collaborated for roughly a year on running language models on humanoids. Adcock says he got to know Sam Altman and the team well, but over time it became clear that Figure’s internal robot-learning engineers (most with over a decade of experience in the field) were outpacing OpenAI on testing, model training, and integration with humanoid hardware. Adcock also implies OpenAI was getting interested in robotics itself, which created a strategic conflict. He ended the partnership. He is candid about being wrong on the original strategic logic for letting them invest in the first place.

    BotQ: The Humanoid Factory

    BotQ is the assembly facility where Figure 03 robots are born. Lines build heads, batteries, arms, legs, and hands separately. Each subsystem goes through end-of-line testing before integration. Heads contain camera systems, IMU, thermal sensors, Wi-Fi, 5G, Bluetooth, and lights, and are flashed with firmware and calibrated on the line. The 2.25 kilowatt-hour battery pack is custom-designed with a structural enclosure, polyurethane potting, and an internally engineered thermal-runaway venting system. The requirement is that no flame ever exits the pack. Figure has never had a robot catch fire.

    March 2026 was the company’s record production month, with more robots built than in the entire prior history of the company combined. Adcock plans to triple that by May. After assembly, robots run a multi-hour “burn-in” in dedicated bays where the robot self-checks for loose cables, comm errors, or bad parts. They wear vests during gantry-supported wakeup. Once they pass, they walk themselves over to headquarters.

    The Grid and the Never-Fall Protocol

    The Grid runs robots 24/7 at higher operational intensity than any client site. It is the last line of defense before software ships. A dedicated team called Never Fall predicts every plausible fault and engineers around it. The Vulcan project takes this further: using reinforcement learning in simulation, robots learn to survive losing a knee, ankle, or hip mid-task. In the demo, a robot’s left knee was velocity-locked (simulating a lost actuator), and the robot continued hobbling around without falling. A backup robot can be summoned to take over the work.

    The Home Robot Demo

    Figure 03 demoed tidying a living room in a home environment built into the campus. The robot was given the prompt “clean the living room” and reasoned through the task autonomously: clearing cups, putting away toys, wiping the table. There was a brief sassy spray during the cleaning sequence. Adcock was emphatic that this is not teleoperated despite persistent online rumors. Helix 2 runs entirely onboard, no human in the loop.

    The product plan for the home is a leasing model in the $400 to $600 per month range, comparable to a car lease. The dock is roughly 2 feet by 2 feet and plugs into a standard wall outlet. Charging happens inductively through the feet at 2 kilowatts, giving roughly 4 to 5 hours of runtime per 1 hour of charge. Figure is not selling to homes yet but plans to soon.
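    The quoted figures are internally consistent with the 2.25 kWh pack described in the BotQ section. A quick check:

    ```python
    pack_kwh = 2.25       # battery capacity (from the BotQ tour)
    charge_kw = 2.0       # inductive charging power through the feet
    runtime_hours = 4.5   # midpoint of the quoted 4 to 5 hours

    print(f"Full charge: {pack_kwh / charge_kw:.1f} h")                # ~1.1 h
    print(f"Implied average draw: {pack_kwh / runtime_hours:.2f} kW")  # ~0.50 kW
    ```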

    The Three Generations (and the Fourth)

    Figure 01 was a “cyberpunk” first-generation robot built for speed of iteration, costing hundreds of thousands of dollars per unit. Most parts were CNC-machined to Formula 1 precision. It walked within a year of company founding, which Adcock claims is among the fastest humanoid walking timelines in history. It had a tendon-driven hand (motors in the forearm) which Figure abandoned early. Because the wrist motors were too far along to redesign, the team raided foot motors and stuffed them in the forearm, producing the so-called Frankenstein forearm where the wrist bent halfway up the arm. Adcock was sure people would notice. In three years, no one ever asked.

    Figure 02 moved the battery from a backpack into the torso, doubled the battery, tripled the compute, added new cameras, and used an exoskeleton load-bearing structure inspired by aircraft skin design. Roughly 50 units were built. It was retired about a month before filming.

    Figure 03, the current production model, is roughly 90% cheaper than Figure 02 and slimmer in profile. It has soft foam-wrapped shoulders, swappable fabric clothing (with a zipper down the back), high-top sneakers, and the latest-generation hand with camera-based tactile sensors. The aesthetic was deliberately moved away from “too roboty.” Figure 03 was the first humanoid robot at the White House (greeting guests at an event with the First Lady).

    Figure 04 is in late-stage detailed design. Adcock describes it as the company’s “iPhone 1 moment,” a much larger generational leap than any prior version, with substantial cost reduction, easier manufacturing, easier home setup, and changes Adcock says are too sensitive to discuss publicly.

    Hands and the Path to Physical AGI

    Figure recently teased a high-degree-of-freedom hand with as many joints as a human hand. Adcock argues this is essential not just for dextrous manipulation but for passive learning from humans at scale. If humans can move their hands in arbitrary ways, the robot needs to be able to map onto those movements at test time. He believes the path to AGI in physical embodiment runs through the hands.

    Adcock’s broader claim is that physical interaction data, learning what happens when you touch, push, lift, or drop something, is the missing ingredient that current frontier language models lack. Most human intelligence is built through trial and error in the physical world. If that is true, humanoids may close the gap to AGI before pure software systems do.

    Brett Adcock’s Other Companies

    Cover is a school weapons-detection company spun out of NASA’s Jet Propulsion Lab. It uses terahertz imaging radar (originally developed for the Iraq and Afghanistan wars to find bomb vests at standoff distance) to detect concealed weapons in clothing or backpacks from 5 to 20 meters away, far further than airport scanners. Adcock bought the IP outright two years ago, and Caltech holds a small minority interest. The team is largely former JPL engineers based in Pasadena. Beta deployments to schools are planned by end of year, with 130,000 K-12 schools as the addressable market. Adcock self-funds it.

    Hark is an AI lab Adcock started seven or eight months ago and unveiled two weeks before the interview. It has 50 employees and is building next-generation personalized AI models alongside new AI hardware (the thesis being that 20-year-old form factors like phones and laptops are the wrong interface for AI).

    Operating Philosophy

    Adcock works from the engineering bullpen, not a corner office. He cut the “annual golf trip” category of relationships out of his life five years ago to make space for family and three companies. He goes home for dinner and bedtime with his kids and returns to the office after. He cites Steve Jobs and Jeff Bezos (a Figure investor) as influences and frames his work ethic as wanting to play “11 out of 10.” He maintains tight physical and digital security: a drone was once caught surveilling the office through a window, after which the team tented the glass.

    Risks

    Adcock is direct that the odds of full success are low. The risk list is long: manufacturing at unprecedented rates, robots running fully autonomously without human intervention (which no one has demonstrated), AI policies that generalize across every environment, hardware reliability, low unit cost, consumer demand. He frames his job as a daily funnel of the most pernicious problems in the company.

    He does not see capital or the $39B valuation as the binding constraint. If the robots work, he projects revenue measured in tens of trillions of dollars and points out that tech companies trade at 10 to 20 times revenue.

    Thoughts

    The most interesting structural claim Adcock makes is that humanoid robotics is an intelligence problem, not a manufacturing problem. That is a strong statement about where the difficulty actually lives. If the bottleneck were industrial (parts, supply chain, factory throughput), the dominant strategy would be to wait for incumbents like Foxconn or BYD to enter and underprice everyone. If the bottleneck is intelligence, the dominant strategy is exactly what Figure is doing: integrate vertically, control the hardware, generate proprietary training data, and run a tight feedback loop between deployments and model updates. The BMW deployment producing the lessons that became Helix 2 is the cleanest illustration of that loop in action.

    The 360-to-the-40th state space framing is a useful reminder of why neural networks won this domain. Anything you cannot enumerate, you must learn. The pivot from classical controls to neural networks about a year ago is probably the single highest-leverage decision in the company’s history, and it tracks with the broader collapse of hand-coded systems across robotics, autonomy, and even compilers.

    The OpenAI breakup is more interesting than it first appears. Adcock’s story is not “they were bad,” it is “we got better than them, faster.” That is consistent with a recurring pattern in AI right now: vertically integrated application companies, where the model is the product, are starting to outpace general-purpose model providers on their own narrow domains. If physical AGI does happen first in embodiment, that pattern will look prophetic in retrospect.

    The home leasing model at $400 to $600 per month is the part most people will underestimate. That price point is not luxury. It is roughly the cost of a modest car payment, less than full-time childcare, less than a cleaning service plus a dog walker plus laundry pickup. If the robot can actually do laundry, dishes, and tidying every day with no failures, the consumer math gets aggressive fast. The bottleneck is reliability per hour, not willingness to pay.

    The skeptic’s case is also worth holding in mind. “Working” in a curated demo home is not the same as working in 100,000 messy real homes with cats, kids, weird furniture, and unpredictable lighting. Generalization is exactly the problem Adcock concedes is unsolved. The Vulcan demo (hobbling on a velocity-locked knee) is impressive, but a single failure mode handled is a long way from “never fall” across the full distribution of real-world conditions. The phrase “we want to be able to” appears repeatedly in Adcock’s roadmap, and it is doing a lot of work.

    Still, the velocity is real. Record manufacturing in March, tripling by May, four buildings, 500 employees, vertically integrated parts, a custom battery line, BMW deployment, White House appearance, Time cover, Helix 2 in production, Figure 04 in detailed design. The competitive landscape (Tesla Optimus, 1X, Apptronik, Unitree, and several Chinese entrants) is going to determine whether Figure stays “a few years ahead” of everyone, as Adcock claims, or whether the gap collapses. But if humanoids actually work, this is one of the very few companies positioned to capture the upside, and Adcock has been operating the playbook for almost four years.

    The most underrated detail in the whole tour: Figure 04 is being described internally as the iPhone 1. Figure 03 is the BlackBerry. If that framing holds up, the next 12 to 24 months are when this market gets defined.

  • Elad Gil on the AI Frontier: Compute Constraints, the Personal IPO, and Why Most AI Founders Should Sell in the Next 12 to 18 Months

    Elad Gil sat down with Tim Ferriss for a wide-ranging conversation that pairs almost perfectly with his recent Substack post, “Random thoughts while gazing at the misty AI Frontier.” Together, the podcast and the post lay out the cleanest framework I have seen for what is actually happening in AI right now: a Korean memory bottleneck capping every lab, a class wide personal IPO across the research community, the fastest revenue ramps in capitalist history, and a brutal dot com style culling that most founders do not yet want to admit is coming. Below is a complete breakdown.

    TLDW (Too Long, Didn’t Watch)

    Elad Gil argues that AI is producing the fastest revenue ramps in capitalist history while setting up the same brutal power law that wiped out 99 percent of dot com companies. OpenAI and Anthropic each sit at roughly 0.1 percent of US GDP today, on a path to 1 percent of GDP run rate by end of 2026, which is insanely fast by any historical standard. The current ceiling on capabilities is not chips but Korean high bandwidth memory, and that constraint will likely hold all major labs roughly comparable in capability through 2028. Talent has just experienced a class wide personal IPO via Meta led bidding, with packages running tens to hundreds of millions per researcher. Most AI companies should consider exiting in the next 12 to 18 months while the tide is high. Right now consensus is correct. Save the contrarianism for later.

    Key Takeaways

    • OpenAI and Anthropic are each at roughly 0.1 percent of US GDP. With US GDP near 30 trillion dollars and each lab at a roughly 30 billion dollar revenue run rate, AI has gone from essentially zero to 0.25 to 0.5 percent of GDP in just a few years. If the labs hit 100 billion in run rate by year end 2026 (which many expect), AI hits 1 percent of GDP run rate inside a single year.
    • The AI personal IPO is real. 50 to a few hundred AI researchers across multiple companies just experienced a class wide IPO event due to Meta led bidding, with top packages reportedly tens to hundreds of millions per person. The closest historical analog is early crypto holders around 2017.
    • The bottleneck is Korean memory, not Nvidia chips. High bandwidth memory from Hynix, Samsung, Micron, and others is the binding constraint. Expected to hold roughly two years. After that, power and data center buildout become the next walls.
    • No lab can pull dramatically ahead before 2028. Because every lab is compute constrained on the same input, OpenAI, Anthropic, Google, xAI, and Meta should remain roughly comparable in capability through that window, absent an algorithmic breakthrough that stays inside one lab.
    • Compute is the new currency. Token budgets now define what an engineer can accomplish, what a company can spend, and what business models are viable. Some companies (neoclouds, Cursor) are effectively inference providers disguised as tools.
    • The dot com base rate is the AI base rate. Around 1,500 to 2,000 companies went public in the late 1990s internet cycle. A dozen or two survived. AI will likely look the same.
    • Most AI founders should consider selling in the next 12 to 18 months. If you are not in the durable handful, this is your value maximizing window. A handful of companies (OpenAI, Anthropic) should never sell.
    • Buyers are bigger than ever. One percent of a 3 trillion dollar market cap is 30 billion dollars. That math makes massive AI acquisitions trivial for hyperscalers, vertical incumbents, and adjacent giants.
    • Underrated exit path: merger of equals. Two private AI competitors destroying each other on price should consider just merging. PayPal and X.com did exactly this in 2000.
    • 91 percent of global AI private market cap sits in a 10 by 10 mile square. If you want to do AI, move to the Bay Area. Remote work for cluster industries is BS.
    • Want money? Ask for advice. Want advice? Ask for money. The inverse also works: offering useful advice frequently leads to inbound investment opportunities.
    • AI is selling units of labor, not software. The shift is from selling seats and tools to selling cognitive output. This is why Harvey can win in legal, where decades of legal SaaS failed.
    • AI eats closed loops first. Tasks that can be turned into testable closed loop systems (code, AI research) get automated fastest. Map jobs on a 2×2 of closed loop tightness vs economic value to see where AI hits soonest.
    • Headcount will flatten at later stage companies. Multiple late stage CEOs told Elad they will not do big AI layoffs but will simply stop growing headcount even as revenue grows 30 to 100 percent. Hidden layoffs are also hitting outsourcing firms in India and the Philippines first.
    • The Slop Age could be the golden era of AI plus humanity. AI produces useful slop at volume, humans desloppify it, leverage is high, and the work is fun. This window may close as AI gets superhuman.
    • Market first, team second (90 percent of the time). Great teams die in bad markets. The exception is when you meet someone truly exceptional at the very earliest stage.
    • The one belief framework. If your investment memo needs three core beliefs to be true, it is too complicated. Coinbase was an index on crypto. Stripe was an index on e-commerce. That was the entire memo.
    • The four year vest is a relic. It exists because in the 1970s companies actually went public in four years. Today the private window has stretched to 20 years and venture has eaten what used to be public market growth investing.
    • Boards are in-laws. You cannot fire investor board members. Take a worse price for a better board member, because as Naval Ravikant said, valuation is temporary, control is forever.
    • Right now, consensus is correct. Save the contrarianism. The smart move is to just buy more AI exposure rather than try to outsmart the obvious.
    • Distribution wins more than founders admit. Google paid hundreds of millions to push the toolbar. Facebook bought ads on people’s own names in Europe. TikTok spent billions on user acquisition. Allbirds (yes, the shoe company) just raised a convert to build a GPU farm.
    • Anti-AI sentiment will get worse before it gets better. Maine banned new data centers. There has been violence directed at AI leaders. Expect more political and activist backlash, especially as AI is blamed for harms it has not yet caused while its benefits are mismeasured.
    • Use AI as a cold reader. Elad uploads photos of founders to AI models with cold reading prompts and reports surprisingly accurate personality assessments based on micro features.

    Detailed Summary

    The Numbers Are Insane and Mostly Underappreciated

    The most stunning data point in either source is the GDP math. US GDP is roughly 30 trillion dollars. OpenAI and Anthropic are each rumored to be at roughly 30 billion dollars in revenue run rate, putting each one at 0.1 percent of US GDP. Add cloud AI revenue and the picture gets stranger: AI has grown from essentially zero to between 0.25 and 0.5 percent of GDP in only a few years. If the labs hit 100 billion in run rate by year end 2026, AI will be at roughly 1 percent of GDP run rate inside a single year. There is no historical analog for that pace. Elad notes that productivity gains from AI may end up mismeasured the way internet productivity was undercounted in the 2000s, which would have downstream consequences for regulation: AI gets blamed for the bad (job losses) and credited for none of the good (new jobs, education gains, healthcare improvements). His half-joking aside is that the real ASI test may be the ability to actually measure AI’s economic impact.
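    The run-rate math is simple to reproduce. The figures below are the rumored numbers quoted in the conversation, not audited financials:

    ```python
    us_gdp = 30e12         # ~$30 trillion
    lab_run_rate = 30e9    # ~$30 billion each for OpenAI and Anthropic

    print(f"Each lab today: {lab_run_rate / us_gdp:.2%} of GDP")     # 0.10%

    scenario = 100e9       # the year-end 2026 run-rate scenario per lab
    print(f"Two labs at $100B: {2 * scenario / us_gdp:.2%} of GDP")  # ~0.67%
    # Cloud AI revenue on top is what carries the total toward 1 percent.
    ```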

    The AI Personal IPO

    The most underdiscussed phenomenon in AI right now, according to Elad, is what he calls a class wide personal IPO. When a company IPOs, a subset of employees become wealthy, lose focus, and either start companies, get into politics, fund passion projects, or check out. Meta started aggressively bidding for AI talent. Other major labs had to match. The result was 50 to a few hundred researchers, scattered across multiple labs, suddenly receiving compensation in the tens to hundreds of millions of dollars range. The only historical analog Elad can think of is early crypto holders around 2017. Some chunk of these newly wealthy researchers will redirect attention to AI for science, side projects, or quiet quitting. The aggregate field stays mission aligned, but the distribution of attention has shifted.

    The Korean Memory Bottleneck

    Every major AI lab today is building giant Nvidia clusters paired with high bandwidth memory primarily from Korean fabs and a few other suppliers. They run massive amounts of data through these clusters for months, and the output is, almost absurdly, a single flat file containing what amounts to a compressed version of human knowledge plus reasoning. Right now, the binding constraint on this whole stack is HBM memory from Hynix, Samsung, Micron, and others. Korean memory fab capacity has been below the capacity of every other piece of the system. Elad estimates this constraint persists for roughly two years. After that, the next walls are likely data center construction and power. The strategic implication is enormous. While memory constrains everyone, no single lab can buy 10x the compute of its rivals, so capabilities should stay roughly comparable across the major labs. Once that constraint lifts, possibly around 2028, one player could theoretically pull dramatically ahead, especially if AI assisted AI research closes a self improvement loop inside one lab.

    Compute Is the New Currency

    The blog post sharpens a framing that runs throughout the podcast: compute, denominated in tokens, is now a unit of economic value. Token budgets define what an engineer can accomplish, what a company can spend, and what business models work. Some companies are effectively inference providers wearing tool costumes. Neoclouds are the cleanest example. Cursor is another, subsidizing inference as a user acquisition strategy. The most absurd recent example: Allbirds, the shoe company, raised a convertible to build a GPU farm. Whether this becomes the AI version of MicroStrategy’s Bitcoin trade or a cautionary tale, it tells you where the cost of capital believes the next decade is going.

    The Dot Com Survival Math

    Elad walks through the brutal arithmetic that AI founders should be internalizing. In the late 1990s and early 2000s, somewhere between 1,500 and 2,000 internet companies went public. Of those, roughly a dozen or two survived in any meaningful form. Every cycle has looked like this: automotive in the early 1900s, SaaS, mobile, crypto. There is no reason AI will be different. Most current AI companies, including those ramping revenue today, will see the market, competition, and adoption turn on them. The question every AI founder should be asking is whether they are in the durable handful or not.

    Most AI Companies Should Consider Exiting in the Next 12 to 18 Months

    This is the most actionable and most uncomfortable take in either source. While the tide is rising, every AI company looks unstoppable. Whether they actually are, in a 10 year frame, is a separate question. Founders running successful AI companies should take a cold, honest look at whether the next 12 to 18 months is their value maximizing window. Companies typically have a 6 to 12 month peak before some headwind hits, often visible in the second derivative of growth. The best signal that you should sell is when growth rate is starting to plateau and you can see why. A handful of companies (OpenAI, Anthropic, the durable winners) should never exit. Many others should, while everything is still on the upswing.

    What Makes an AI Company Durable

    Elad lays out four lenses for evaluating durability at the application layer:

    1. Does your product get dramatically better when the underlying model gets better, in a way that keeps customers loyal?
    2. How deep and broad is the product? Are you building multiple integrated products embedded in actual workflows?
    3. Are you embedded in real change management at the customer? AI adoption is mostly a workflow change problem, not a tech problem. Workflow embedding is durable.
    4. Are you capturing and using proprietary data in a way that creates a system of record? Data moats are often overstated, but sometimes real.

    At the lab layer, Elad believes OpenAI, Anthropic, and Google are durable absent disaster. He predicted three years ago that the foundation model market would settle into an oligopoly aligned with cloud, and that prediction has roughly held.

    Selling Work, Not Software

    The deepest structural insight in the conversation is that generative AI is shifting what software companies sell. The old model was selling seats, tools, and SaaS subscriptions. The new model is selling units of cognitive labor. Zendesk sold seats to support reps. Decagon and Sierra sell agentic support output. Harvey can win in legal even though selling to law firms was historically considered terrible business, because Harvey is not selling tools, it is augmenting lawyer output. This shift opens markets that were previously closed and dramatically grows tech TAMs. It is also why founder-limited theories of entrepreneurship currently understate how many opportunities exist.

    AI Eats Closed Loops First

    One of the cleanest mental models in the blog post is the closed loop framework. AI automates first what can be turned into a testable closed loop. Code is the canonical example: outputs can be tested, errors detected, models can iterate. AI research is similar. Both have tight feedback loops and high economic value, which puts them at the top of the AI impact ranking. Map jobs on a 2×2 of closed loop tightness vs economic value and you can see where AI hits soonest. The interesting forward question is which jobs become more closed loop next. Data collection and labeling will keep growing in every field as a result.
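    As a toy version of that 2×2, with invented scores (only the framework itself comes from the post):

    ```python
    jobs = {
        "software engineering": (0.9, 0.9),  # (loop tightness, economic value)
        "AI research":          (0.8, 0.9),
        "customer support":     (0.7, 0.5),
        "sales":                (0.4, 0.8),
        "elder care":           (0.2, 0.6),
    }

    # Rank by the product: tight feedback loops on valuable work get hit first.
    for job, (loop, value) in sorted(jobs.items(),
                                     key=lambda kv: kv[1][0] * kv[1][1],
                                     reverse=True):
        print(f"{job:22s} exposure ~ {loop * value:.2f}")
    ```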

    The Harness Matters More Than People Think

    For coding tools and increasingly for enterprise applications, what Elad calls the harness (the wrapper of UX, prompting, workflow integration, and brand around the underlying model) is becoming sticky. It is not just which model you call. It is the environment built around it. Cursor and Windsurf demonstrate this in coding. The interesting open questions are what the harness looks like for sales AI, for AI architects, and for analyst workflows. Those gaps leave room for startups even as model capabilities converge.

    Hidden Layoffs and the Developing World

    Most announced AI driven layoffs are probably just COVID era overhiring corrections wrapped in a more flattering narrative. But real AI driven labor displacement is happening, and it is hitting outsourcing firms first. That means countries like India and the Philippines, where many outsourced services jobs sit, are likely to be the most impacted earliest. Several developing economies built their growth ladders on services exports. If AI takes those jobs first, the migration and economic patterns of the next decade may shift in ways nobody is yet planning for.

    The Flat Company

    Multiple late stage CEOs told Elad they will not announce big AI layoffs. Instead, they will simply stop growing headcount. If revenue grows 30 to 100 percent, headcount stays flat or shrinks via attrition. Existing employees become dramatically more productive. The very best people who can leverage AI will see compensation inflate. Sales and some growth engineering keep hiring. Almost everything else flatlines. This is mostly a later stage and public company phenomenon. True early stage startups should still scale aggressively after product market fit, just with more leverage per person.

    Exit Options for AI Founders

    Elad lays out four exit categories. First, the labs and hyperscalers themselves: Apple, Amazon, Google, Microsoft, Meta. Second, vertical incumbents like Thomson Reuters for legal or healthcare giants for clinical AI. Third, the underrated category of a merger of equals between two private AI competitors who are currently destroying each other on price. PayPal and X.com did this in 2000. Uber and Lyft reportedly almost did. Fourth, large adjacent tech companies: Oracle, Samsung, Tesla, SpaceX, Snowflake, Databricks, Stripe, Coinbase. The market cap math has changed in a way that makes acquisition trivial. One percent of a three trillion dollar market cap is 30 billion dollars, which means a hyperscaler can do massive acquisitions almost casually.

    Geographic Concentration Is Extreme

    Elad’s team analyzed where private market cap aggregates. Historically half of global tech private market cap sat in the US, with half of that in the Bay Area. With AI, 91 percent of global AI private market cap is in a single 10 by 10 mile square in the Bay Area. New York is a distant second and then it falls off a cliff. For defense tech, the cluster is Southern California (SpaceX, Anduril, El Segundo, Irvine). Fintech and crypto skew toward New York. The remote everywhere advice is, Elad says, just BS for anyone trying to break into an industry cluster.

    How Elad Got Into His Best Deals

    Stripe started with Elad cold emailing Patrick Collison after selling an API company to Twitter. A couple of walks later, Patrick texted that he was raising and Elad was in. Airbnb came from helping the founders raise their Series A and being asked at the end if he wanted to invest. Anduril came from noticing that Google had shut down Project Maven and asking if anyone was building defense tech, then meeting Trae Stephens at a Founders Fund lunch. Perplexity came from Aravind Srinivas cold messaging him on LinkedIn while still at OpenAI. Across all of these, the pattern is the same: be in the cluster, be helpful, be talking publicly about technology nobody else is talking about, and be useful to founders before any money is on the table.

    The One Belief Framework

    Investors love complicated 50 page memos. Elad believes the actual decision usually collapses into a single core belief. Coinbase: this is an index on crypto, and crypto will keep growing. Stripe: this is an index on e-commerce, and e-commerce will keep growing. Anduril: AI plus drones plus a cost plus model will be important for defense. If your thesis needs three things to be true, it is probably not going to work. If it needs nothing, you have no thesis.

    Boards as In-Laws

    Elad emphasizes that founders should treat board composition like one of the most important hiring decisions of the company. You cannot fire an investor board member. They have contractual rights. So if you are going to be stuck with someone for a decade, take a worse valuation for a better human. Reid Hoffman’s framing is that the best board member is a co-founder you could not have otherwise hired. Naval Ravikant’s framing is that valuation is temporary but control is forever. Elad recommends writing a job spec for every board seat.

    The Slop Age as a Golden Era

    One of the warmest takes in the blog post is the framing of the current moment as the Slop Age, and the suggestion that this might actually be the golden era of AI plus humanity. Before the last few years, AI was inaccessible and narrow. Eventually AI may become superhuman at most tasks. Today, AI produces useful slop at volume, which means humans are still needed to desloppify the slop, but the leverage on time and ambition is real. That makes the work fun. If AI displaces people or starts doing more interesting work, this golden moment fades. Elad also notes the obvious counter, that the era of human generated internet slop preceded the AI slop era. AGI may end the slop age, or alternately may be the thing that finally cleans up all the prior waves of human slop.

    Anti-AI Regulation and Violence Will Increase

    This is one of the more sobering threads in the blog post. Real world AI driven labor displacement has been small so far, but anti-AI sentiment is already strong and growing. Maine just banned new data centers. There has been actual violence directed at AI leaders, including a recent attack on Sam Altman. Elad’s view is that AI leaders should work harder on optimistic public framing, real political lobbying, and reining in the doom narrative coming from inside the field. Otherwise the regulatory and activist backlash will get much worse, and likely on the basis of mismeasured impacts.

    Right Now Consensus Is Correct

    The headline contrarian take from the episode is that contrarianism right now is wrong. There are moments in time when betting against the crowd pays. This is not one of them. The smart bet is just buying more AI exposure. Trying to find the clever angle, the underlooked hardware play, the secret macro thesis, is overthinking it. Save the contrarian moves for later in the cycle.

    Distribution Almost Always Matters

    Elad pushes back on the founder mythology that great products win on their own. Google paid hundreds of millions of dollars in the early 2000s to distribute its toolbar through every popular app installer on the internet. Facebook bought search ads against people’s own names in European markets to seed network liquidity. TikTok spent billions on user acquisition before its algorithm could lock people in. Snowflake spent enormous sums on enterprise sales and channel partnerships. Sometimes the best product wins. Often the company with the best distribution wins. Founders should plan for both.

    AI as a Cold Reader and a Research Partner

    Two of the more practical AI workflows Elad describes: First, uploading photos of founders to AI models with cold reading prompts that ask the model to identify micro features (crow’s feet from genuine smiling, brow patterns, posture cues) and infer personality traits, sense of humor, and likely social behavior. He reports the outputs are surprisingly specific. Second, running deep dives across multiple models in parallel (Claude, ChatGPT, Gemini), asking each for primary sources, summary tables, and cross checked data. He recently used this approach to investigate the rise in autism and ADHD diagnoses, concluding that diagnostic criteria shifts and school incentives drive most of it, and noting that maternal age has a stronger statistical association with autism than paternal age, despite paternal age getting all the public discourse.
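
    For anyone who wants to try the parallel deep dive, here is a minimal sketch, assuming the official openai, anthropic, and google-generativeai Python SDKs with API keys already configured in the environment; the model names are placeholders, not Elad’s actual setup:

    ```python
    # Fan the same research prompt out to three models and collect the answers.
    # Cross-checking the responses against each other stays a manual step.
    from concurrent.futures import ThreadPoolExecutor

    PROMPT = ("Deep dive: why have autism and ADHD diagnoses risen? "
              "Cite primary sources, include a summary table, flag weak data.")

    def ask_openai(prompt):
        from openai import OpenAI
        r = OpenAI().chat.completions.create(
            model="gpt-4o",  # placeholder model name
            messages=[{"role": "user", "content": prompt}])
        return "openai", r.choices[0].message.content

    def ask_anthropic(prompt):
        import anthropic
        r = anthropic.Anthropic().messages.create(
            model="claude-sonnet-4-20250514", max_tokens=4096,  # placeholder
            messages=[{"role": "user", "content": prompt}])
        return "anthropic", r.content[0].text

    def ask_gemini(prompt):
        import google.generativeai as genai
        r = genai.GenerativeModel("gemini-1.5-pro").generate_content(prompt)  # placeholder
        return "gemini", r.text

    with ThreadPoolExecutor(max_workers=3) as pool:
        for name, answer in pool.map(lambda ask: ask(PROMPT),
                                     [ask_openai, ask_anthropic, ask_gemini]):
            print(f"--- {name} ---\n{answer[:400]}\n")
    ```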

    The First Ever 10 Year Plan

    For someone who has been compounding aggressively for two decades, Elad has somehow never written a 10 year plan until now. He knows it will not play out as written. The point is that the act of imagining a decade out shifts what you choose to do in the near term. He explicitly rejects the “AGI in two years, therefore plans are pointless” framing as defeatist. There will be interesting things to do regardless of how the AGI timeline plays out.

    Thoughts

    This is one of the more useful AI investor conversations of 2026, mostly because Elad is willing to put numbers and timelines on things that are usually left vague. Pairing the podcast with the underlying Substack post is the right move because the post is where the GDP math, the closed loop framework, and the Slop Age framing actually live. The podcast is where Elad explains how he thinks rather than just what he thinks.

    The 12 to 18 month sell window framing is the most actionable single idea in either source, and probably the most uncomfortable for AI founders sitting on multi billion dollar paper valuations. The math is unforgiving. A dozen winners out of thousands. If you are honest with yourself about whether you are in the dozen, you know what to do.

    The Korean memory bottleneck framing explains a lot of current behavior. The talent wars make more sense once you accept that compute is not going to be the differentiator for two years, so people become the only remaining lever. The convergence of capabilities across OpenAI, Anthropic, Google, and xAI starts to look less like coincidence and more like the structural inevitability of a supply constrained input. The 2028 inflection date is the one to watch.

    Compute as currency is the cleanest reframing in the blog post. Once you start pricing companies in tokens rather than dollars, everything from Cursor’s economics to Allbirds raising a convert to build a GPU farm becomes legible. The interesting question is whether this is a permanent unit of denomination or a transitional one that fades when inference costs collapse.

    The software to labor argument is the structural framing that I think will hold up the longest. Once you internalize that we are not selling seats anymore but selling cognitive output, every vertical that was previously locked behind ugly procurement and IT inertia opens up. Harvey is the proof of concept. There will be 30 more Harveys across every white collar profession.

    The closed loop framework is the cleanest predictor of which jobs get hit hardest and soonest. If you want to know whether your role is exposed, the questions to ask are whether outputs can be machine evaluated, how tight the feedback loop is, and how high the economic value is. The intersection is where AI lands first.

    The geographic concentration data is genuinely shocking. 91 percent of global AI private market cap in a 10 by 10 mile area is the kind of statistic that should make everyone outside that square think very carefully about what game they are playing.

    The Slop Age framing is the most emotionally honest moment in the post. We are in a window where humans still meaningfully add value on top of AI output. That window is finite. Enjoy it.

    The anti-AI backlash thread is the one I think most people in the industry are still underweighting. Maine banning new data centers is a leading indicator, not a one off. The fact that the impacts are likely to be mismeasured by official statistics makes the political dynamics worse, not better. AI will get blamed for harms it did not cause and credited for none of the gains. If the field’s leaders do not start communicating better and lobbying smarter, the regulatory environment in 2028 will be much worse than in 2026.

    Finally, Elad’s first ever 10 year plan stands out as the most quietly important moment in the episode. The implicit message is that even people who have been compounding aggressively for two decades benefit from forcing a longer time horizon onto their thinking. Most plans fail. The act of planning still changes what you do today.

    Read the original Elad Gil post here: Random thoughts while gazing at the misty AI Frontier. Find Elad on X at @eladgil, on his Substack at blog.eladgil.com, and on his website at eladgil.com. Tim Ferriss publishes the full episode at tim.blog/podcast.

  • How GPT-5, Claude, and Gemini Are Actually Trained and Served: The Real Math Behind Frontier AI Infrastructure

    Reiner Pope, CEO of MatX and former TPU architect at Google, sat down with Dwarkesh Patel for a different kind of episode: a chalk-and-blackboard lecture on how frontier LLMs like GPT-5, Claude, and Gemini are actually trained and served. With nothing but a handful of equations and public API prices, Reiner reverse engineers an astonishing amount of what the labs are doing. If you have ever wondered why Fast Mode costs more, why context length stalls around 200k tokens, why models seem 100x over-trained, or why hyperscalers are pouring half a trillion dollars into memory, this is the most lucid explanation on the internet.

    TLDW

    Frontier LLM economics come down to two simple budgets: compute time and memory time. Once you write the rooflines on a blackboard, almost everything else falls out of them. Optimal batch size is roughly 300 times your sparsity ratio (around 2,000 to 3,000 tokens for a DeepSeek-style model). A new batch “train” departs every 20 milliseconds because that is how long it takes to read HBM end to end. Mixture of experts strongly favors staying inside a single rack, which is why scale-up domains went from 8 GPUs (Hopper) to 72 (Blackwell) to 500-plus (Rubin). Pipeline parallelism solves weight capacity but does nothing for KV cache, and adds painful per-hop latency, which is why Ilya famously said pipelining is not wise. Because of reinforcement learning and inference economics, frontier models are roughly 100x over-trained versus Chinchilla optimal, and a well-tuned model should output roughly as many tokens during deployment as went into its pre-training corpus. API prices leak the rest: Gemini’s 50% premium above 200k tokens reveals where KV memory time crosses weight memory time, prefill being 5x cheaper than decode confirms decode is memory bandwidth bound, and cache hit pricing tiers map directly to HBM, DDR, flash, and (yes) spinning disk. The lecture closes on a beautiful detour about the convergent evolution of neural nets and cryptographic ciphers.

    Key Takeaways

    • Two equations explain almost everything. A roofline analysis comparing compute time to memory fetch time predicts cost, latency, and architectural choices with shocking accuracy.
    • Optimal batch size is about 300 times sparsity. For a DeepSeek model that activates 32 of 256 experts, that lands around 2,000 to 3,000 tokens per batch. Real deployments go a bit higher to leave headroom.
    • The 20 millisecond train. A new batch departs every 20ms because that is how long it takes to read all of HBM once. Worst-case queue latency is roughly 40ms.
    • Fast Mode is just smaller batches. Pay 6x more, get 2.5x faster decode by amortizing weights over fewer users. There is a hard latency floor at the HBM read time.
    • Slow Mode would not save much. Once you are past the optimal batch size, the cost-per-token plateau is dominated by compute, not weight fetches. You cannot meaningfully amortize KV cache because it is unique per sequence.
    • One rack is the natural MoE unit. Expert parallelism wants all-to-all communication, which strongly favors the scale-up network (NVLink) over the scale-out network (roughly 8x slower).
    • Bigger scale-up domains drove model scaling. The jump from 8 (Hopper) to 72 (Blackwell) to 500-plus (Rubin) GPUs per rack increased aggregate memory bandwidth by 8x, which is why trillion-plus parameter models only became viable recently.
    • Pipeline parallelism is overrated for inference. It saves on weight memory capacity but does nothing for KV cache memory. It also adds milliseconds of latency per hop in decode.
    • Why Ilya said pipelining is not wise. Architectural constraints (cross-layer residuals like in Kimi) and the inability to amortize weight loads across micro-batches make pipelining a hassle in training too.
    • The memory wall is real and paradoxical. Hyperscalers reportedly spend 50% of CapEx on memory, yet racks have far more HBM than a trillion-parameter model needs. The capacity is there for KV cache and batch size, not for weights.
    • Frontier models are roughly 100x over-trained vs Chinchilla. When you minimize total cost across pre-training plus RL plus inference, smaller models trained on more data win.
    • Each model should output roughly all human knowledge. If you equalize pre-training and inference compute, the total tokens served by a model during its lifetime should approximate its training corpus. Roughly 150 trillion in, 150 trillion out.
    • API pricing reveals architecture. Gemini’s 50% premium above 200k context, the 5x decode-vs-prefill ratio, and cache duration tiers all leak detailed information about KV size, memory bottlenecks, and storage hierarchy.
    • KV cache is roughly 2KB per token. Solving Gemini’s pricing equation gives a plausible 1.6 to 2 kilobytes per token at 100B active parameters and 200k context.
    • Decode is memory bandwidth bound, prefill is compute bound. The 5x price gap is direct evidence.
    • Cache pricing maps to memory tiers. The 5-minute and 1-hour cache durations probably correspond to flash and spinning disk drain times respectively. LLM serving uses spinning disk.
    • Context length is stuck near 200k. Memory bandwidth, not compute, is the binding constraint. Sparse attention gives a square-root improvement but is not infinite.
    • Cryptography and neural nets are mathematical cousins. Both rely on jumbling information across inputs. Feistel ciphers led directly to RevNets (reversible neural networks). Adversarial attacks mirror the cipher avalanche property.

    Detailed Summary

    The Roofline: Compute Time vs Memory Time

    Reiner starts with the simplest possible model of LLM inference. The time to do a forward pass is bounded below by the maximum of compute time and memory fetch time. Compute time is the batch size times active parameters divided by FLOPs. Memory time is total parameters divided by memory bandwidth, plus a KV cache term that scales with batch size and context length. From these two equations, almost every economic and architectural fact about modern LLMs can be derived.

    Plotting cost per token against batch size gives a clean picture: at low batch you pay enormous overhead because you cannot amortize the weight fetches, and at high batch you hit a compute floor. There is a sweet spot where memory bandwidth time equals compute time. That sweet spot is what Fast Mode and Slow Mode are tuning around.
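
    A minimal numerical sketch of those two equations reproduces that curve. All figures below are round, assumed values chosen to match the talk’s ratios, not any lab’s real numbers:

    ```python
    # Decode roofline with round assumed numbers: ~1 TB of fp8 weights served
    # by 5e13 B/s of aggregate bandwidth (a 20 ms weight read), the ~300
    # FLOPs-per-byte hardware ratio, 8x MoE sparsity, ~2 KB/token of KV at
    # 100k context.
    HBM_BW   = 5e13                    # aggregate bytes/s
    FLOPS    = 300 * HBM_BW            # the stable hardware ratio from the talk
    P_TOTAL  = 1e12                    # total parameter bytes
    P_ACTIVE = P_TOTAL / 8             # active parameters per token
    KV_TOK   = 2e3                     # KV cache bytes per token
    CONTEXT  = 100_000

    def step_time(batch):
        compute = batch * P_ACTIVE / FLOPS            # talk's simplified compute term
        weights = P_TOTAL / HBM_BW                    # every weight read once: 20 ms
        kv      = batch * CONTEXT * KV_TOK / HBM_BW   # per-sequence, never amortized
        return max(compute, weights + kv)             # bound by the slower side

    for batch in (64, 512, 2400, 9600):
        t = step_time(batch)
        print(f"batch {batch:5d}: {t * 1e3:5.1f} ms/step, {t / batch * 1e6:7.1f} us/token")
    ```

    The steep left side of the curve is where Fast Mode lives (each token pays a large share of the weight read); the flat right side is why Slow Mode cannot discount much.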

    Why Fast Mode Costs More: The Batch Trade-Off

    When Claude Code or Codex offers Fast Mode at 6x the price for 2.5x the speed, what is really happening is that they are running you at a smaller batch size. Smaller batch means weight loads are amortized over fewer users, so cost per token goes up. But latency goes down because each forward pass touches less data. There is a hard floor on latency because you have to read every byte of HBM at least once per token, and that takes about 20 milliseconds on Blackwell-class hardware. There is also a soft ceiling on Slow Mode savings because the unamortizable parts (KV cache fetches, compute) eventually dominate.

    The 20 Millisecond Train

    HBM capacity divided by HBM bandwidth lands consistently around 20 milliseconds across generations of Nvidia hardware. That is the natural cadence at which a frontier model can run a forward pass over all its weights. Reiner uses a memorable analogy: a train departs every 20 milliseconds. Any users whose requests are ready board the train. If the train is full, they wait. If it is empty, it leaves anyway. This is why you do not need millions of concurrent users to saturate a model’s batch. You only need enough to fill a 2,000-token train every 20ms.
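
    The cadence is easy to sanity check against rough public spec-sheet figures. The capacities and bandwidths below are approximate list values, used only for illustration:

    ```python
    # Full-HBM read time = capacity / bandwidth (approximate list specs).
    specs = {
        "A100 40GB":  (40e9, 1.6e12),
        "H100 80GB":  (80e9, 3.35e12),
        "B200 192GB": (192e9, 8.0e12),
    }
    for gpu, (capacity, bandwidth) in specs.items():
        print(f"{gpu}: {capacity / bandwidth * 1e3:.0f} ms per full HBM read")
    # ~25 ms, ~24 ms, ~24 ms: the cadence barely moves across generations
    ```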

    Why Optimal Batch Size Is About 300 Times Sparsity

    Setting compute time equal to weight fetch time and rearranging gives a beautiful result: batch size needs to be greater than (FLOPs / memory bandwidth) times (total params / active params). The hardware ratio is a dimensionless 300 on most GPUs and has stayed remarkably stable from A100 through Hopper, Blackwell, and Rubin. The model term is just the sparsity ratio. For DeepSeek with 32 of 256 experts active, that is 8. So optimal batch is around 2,400 tokens. Real deployments push this to 3x to leave headroom for non-ideal efficiency. At 64 trains per second, that is roughly 128,000 tokens per second per replica, or about 1/1000 of Gemini’s reported global throughput.
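
    Plugging the talk’s numbers into that inequality as a quick check (the talk’s 64 trains per second implies a slightly faster cadence than the 20 ms read, which closes most of the gap to its ~128k figure):

    ```python
    HW_RATIO = 300                     # FLOPs per byte of bandwidth, ~stable across GPUs
    total_experts, active_experts = 256, 32
    sparsity = total_experts / active_experts        # DeepSeek-style: 8x

    b_star = HW_RATIO * sparsity                     # ~2,400: compute = weight-fetch time
    b_deployed = 3 * b_star                          # pad ~3x for non-ideal efficiency
    throughput = b_star / 0.020                      # one train per ~20 ms HBM read
    print(b_star, b_deployed, throughput)            # 2400, 7200, 120000 tokens/s:
                                                     # same order as the ~128k/s quoted
    ```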

    Mixture of Experts Wants to Live Inside a Rack

    MoE all-to-all routing means every token can be sent to any expert on any GPU. The communication pattern strongly prefers the fast scale-up network (NVLink) inside a rack to the slower scale-out network between racks. Scale-out is roughly 8x slower in bandwidth. This is why one rack ends up being the natural unit for an expert layer, and why Nvidia’s progression from 8 GPUs per rack (Hopper) to 72 (Blackwell) to 500-plus (Rubin) has been such a big deal for model size scaling.

    Reiner walks through the physical constraints: cable density, bend radius, weight, power, cooling. Modern racks are pushing every dimension to the limit. Stuffing more GPUs into the scale-up domain is genuinely a hardware engineering problem.

    Pipeline Parallelism: Why Ilya Said It Is Not Wise

    Pipelining splits model layers across racks. It is the natural way to scale beyond the scale-up domain for very large models. But it has problems. In inference, pipelining does not save runtime, it only saves memory capacity per rack, which already is not the binding constraint because trillion-parameter models only need a terabyte and racks have 10x that. In training, pipelining creates the famous bubble (idle GPU time at the start and end of each pipeline pass) and forces micro-batching, which kills your ability to amortize weight loads across the global batch.

    There is also an architectural cost. Models like Kimi use cross-layer residual connections where attention attends to layers a few back, and pipelining makes those patterns very hard to implement cleanly. Ilya’s quip “as we now know, pipelining is not wise” captures all of this.

    The Memory Wall Paradox

    Industry analysts report that hyperscalers are spending 50% of CapEx on memory this year, while smartphones and laptops are seeing 30% volume drops because there is not enough HBM and DDR to go around. Yet a Blackwell rack already has tens of terabytes of HBM, far more than a trillion-parameter model needs. The reason is that all that extra capacity goes to KV cache, batch size, and longer context. The bandwidth, not the capacity, is what matters most for weight loading. This also implies that hardware could be designed with less HBM per GPU if you commit to pipelining the weights, which is a real architectural option for a chip startup like MatX.

    Reinforcement Learning and the 100x Over-Training of Frontier Models

    Chinchilla scaling laws say a model with N active parameters should be trained on roughly 20N tokens for compute-optimal training. But frontier labs do not just minimize training cost. They minimize training plus inference cost across the model’s deployment lifetime. With reinforcement learning added to the mix, the cost equation has three terms: pre-training (6 times active params times tokens), RL (somewhere between 2 and 6 times active params times RL tokens, with a 30% efficiency penalty for decode-heavy rollouts), and inference (2 times active params times inference tokens).

    If you assume those three roughly equalize at the optimum (a heuristic that holds for many cost curves), you get a clean conclusion: the data going into pre-training should be roughly equal to the data going into RL, which should be roughly equal to the tokens served at inference. With 100 billion active parameters and roughly 150 trillion training tokens, that is about 75x past Chinchilla optimal. Reiner rounds it to 100x. This is the most concrete first-principles argument for why frontier models are so deeply over-trained, and it implies that as inference traffic grows, models should keep getting smaller and longer-trained.
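
    As a back-of-envelope check on that balancing act (the coefficients are FLOPs per active parameter per token, and all inputs are the talk’s round figures):

    ```python
    # Three budget terms, in FLOPs: pre-training ~6*N*D, RL ~2-6*N*D_rl,
    # inference ~2*N*D_inf. The talk's heuristic: at the optimum they equalize.
    N_ACTIVE   = 100e9        # active parameters (talk's round figure)
    D_PRETRAIN = 150e12       # pre-training tokens, roughly all public text

    pretrain_flops = 6 * N_ACTIVE * D_PRETRAIN            # forward + backward
    d_inference   = pretrain_flops / (2 * N_ACTIVE)       # tokens served at equal budget
    chinchilla    = 20 * N_ACTIVE                         # compute-optimal token count

    print(f"over-trained by ~{D_PRETRAIN / chinchilla:.0f}x")   # ~75x, rounded up to 100x
    print(f"lifetime output ~{d_inference / 1e12:.0f}T tokens") # 450T: same order as corpus
    ```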

    Each Model Should Output All of Human Knowledge

    The most jaw-dropping consequence: if you equalize pre-training and inference compute, then the total tokens generated by a model across its deployment lifetime should approximate the size of its training corpus. GPT-5, served to hundreds of millions of users for two months, will collectively output something on the order of 150 trillion tokens. That is roughly the sum of human knowledge in textual form. Each frontier model is, in this sense, a one-shot universal author of a corpus the size of its source material.

    API Prices Leak Architecture

    This is where the lecture gets really fun. Gemini 3.1 charges 50% more for context above 200k tokens. Setting memory time equal to compute time at exactly 200k context and solving for KV cache size gives roughly 1.6 to 2 kilobytes per token, which is plausible for a model with 8 KV heads, dense attention, and head dimension of 128.
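
    Here is a sketch of that solve, with assumed round numbers standing in for the quantities the labs do not disclose (the batch size carries over from the roofline section):

    ```python
    P_BYTES   = 1e12        # ~1T total parameters at fp8 (assumed, not disclosed)
    BATCH     = 2400        # ~300 x sparsity, from the roofline section
    CROSSOVER = 200_000     # context length where Gemini's price steps up 50%

    # At the breakpoint, per-step KV reads (batch * context * kv_bytes/token)
    # start to rival the batch-amortized weight read (P_BYTES). Set them equal:
    kv_per_token = P_BYTES / (BATCH * CROSSOVER)
    print(f"~{kv_per_token:,.0f} bytes of KV per token")   # ~2,083: the ~2 KB figure
    ```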

    The 5x premium for output (decode) tokens versus input (prefill) tokens is direct evidence that decode is severely memory bandwidth bound and prefill is compute bound. Prefill processes many tokens per weight load, so it amortizes memory cost over the whole sequence. Decode processes one token per weight load, so it pays full memory cost every time.

    Cache hits priced at one tenth of cache misses tell you that storing the KV cache in HBM (or DDR or flash) is much cheaper than recomputing it from scratch. The two cache duration tiers (5 minutes and 1 hour) probably correspond to memory tiers whose drain times match those durations: flash for the 5-minute tier, spinning disk for the 1-hour tier. Yes, spinning disk is in the modern LLM serving stack, despite being decades-old technology.

    Why Context Length Has Plateaued at 200k

    Context lengths shot up from 8k to roughly 200k during the GPT-3 to GPT-4 era and have stayed roughly flat for the past two years. Reiner argues this is the natural balance point where memory bandwidth cost crosses compute cost. Going to a million tokens is expensive. Going to 100 million tokens (which Dario has hinted is needed for true continual learning via in-context learning) is essentially impossible without either a memory technology breakthrough or a much more aggressive sparse attention scheme. Sparse attention helps with a square-root improvement, but it is not unlimited. Going too sparse trades off too much quality.

    Cryptography Meets Neural Nets

    The episode ends with a lovely intellectual detour. Cryptographic protocols and transformer architectures both rely on jumbling information across all inputs. They are doing inverse versions of the same operation: ciphers take structured input and produce randomness, while neural nets take noisy input and extract structure. Both fields use differentiation as their primary attack vector (differential cryptanalysis on ciphers, gradient descent on neural nets). Adversarial attacks on image classifiers exploit exactly the avalanche property that good ciphers are designed for.

    The most concrete crossover: Feistel ciphers, which let you build invertible functions out of non-invertible ones, were ported into deep learning as RevNets (reversible networks) in 2017. RevNets let you run the entire network backwards during the backward pass, eliminating the need to store activations and dramatically reducing training memory footprint. It is the opposite trade-off of KV caching: spending compute to save memory rather than spending memory to save compute.
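
    A minimal PyTorch sketch of that Feistel-to-RevNet pattern (illustrative, not the original RevNet code): F and G can be arbitrary non-invertible networks, yet the block as a whole inverts exactly, so activations can be recomputed during the backward pass instead of stored:

    ```python
    # Reversible block in the Feistel pattern: split activations in two,
    # mix each half with a function of the other.
    import torch
    import torch.nn as nn

    class ReversibleBlock(nn.Module):
        def __init__(self, dim):
            super().__init__()
            mlp = lambda: nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))
            self.F, self.G = mlp(), mlp()

        def forward(self, x1, x2):
            y1 = x1 + self.F(x2)
            y2 = x2 + self.G(y1)
            return y1, y2

        def inverse(self, y1, y2):          # recompute inputs instead of storing them
            x2 = y2 - self.G(y1)
            x1 = y1 - self.F(x2)
            return x1, x2

    block = ReversibleBlock(64)
    x1, x2 = torch.randn(8, 64), torch.randn(8, 64)
    y1, y2 = block(x1, x2)
    r1, r2 = block.inverse(y1, y2)
    print(torch.allclose(x1, r1, atol=1e-5), torch.allclose(x2, r2, atol=1e-5))  # True True
    ```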

    Thoughts

    The most striking thing about this episode is how much can be deduced from a few equations and the public API price sheets of the major labs. The labs treat their architectures as trade secrets, but the moment they price tokens to be close to cost (which competition forces them to do), the prices themselves leak the underlying ratios. Anyone with a pen and paper can reverse engineer the KV cache size, the memory tier hierarchy, and the compute-vs-memory bottleneck profile of a frontier model. There is a lesson here for builders: in competitive markets, the prices tell you almost everything.

    The 100x over-training result has interesting implications for what comes next. If the optimal balance shifts further toward inference (as adoption keeps growing), models should get smaller and longer-trained. That is good news for serving costs and bad news for training-compute-as-moat. The biggest determinant of model quality might increasingly be data quality and RL environment design, not raw pre-training compute. This squares with what is visible publicly: the leading labs are investing heavily in RL infrastructure, evaluations, and synthetic data pipelines.

    The memory wall is the most underrated infrastructure story in AI. Most people think of compute as the bottleneck, but Reiner makes it clear that memory bandwidth is what actually limits context length, which limits how agentic a model can be in practice. If you cannot get to 100 million token contexts, you probably cannot have an AI agent that has been working with you for a month and remembers everything. Either some sparse attention scheme has to give us cheap effective context length, or we need a memory hardware breakthrough, or we have to invent some form of continual learning that does not rely on context windows. None of those paths are obviously easy, and the fact that context length has been flat for two years despite enormous investment suggests we are stuck against a real wall.

    The cryptography parallel is the kind of cross-disciplinary insight that does not show up enough in AI discourse. Treating neural networks as a kind of differentiable cipher reframes a lot of the architecture choices (residual connections, layer normalization, attention) as deliberate efforts to make the function smooth and invertible enough to learn, in contrast to ciphers, which are deliberately designed to resist exactly that. Adversarial robustness research probably has a lot more to learn from cryptanalysis than it currently does.

    Finally, the format itself is a win. Most AI podcasts are conversational, which is great for personality but bad for technical depth. A blackboard lecture with an interlocutor who asks naive questions at the right moments is a much higher bandwidth medium. More of this, please.

  • Andrej Karpathy on Vibe Coding vs Agentic Engineering: Why He Feels More Behind Than Ever in 2026

    Andrej Karpathy, co-founder of OpenAI, former head of AI at Tesla, and now founder of Eureka Labs, returned to Sequoia Capital’s AI Ascent 2026 stage for a wide-ranging conversation with partner Stephanie Zhan. One year after coining the term “vibe coding,” Karpathy unpacked what has changed, why he has never felt more behind as a programmer, and why the discipline emerging on top of vibe coding, which he calls agentic engineering, is the more serious craft worth learning right now.

    The conversation covered Software 3.0, the limits of verifiability, why LLMs are better understood as ghosts than animals, and why you can outsource your thinking but never your understanding. Below is a complete breakdown of the talk for anyone building, hiring, or learning in the agent era.

    TLDW

    Karpathy describes a sharp transition that happened in December 2025, when agentic coding tools crossed a threshold and code chunks just started coming out fine without correction. He frames the current moment as Software 3.0, where prompting an LLM is the new programming, and entire app categories are collapsing into a single model call. He distinguishes vibe coding (raising the floor for everyone) from agentic engineering (preserving the professional quality bar at much higher speed). Models remain jagged because they are trained on what labs choose to verify, so founders should look for valuable but neglected verifiable domains. Taste, judgment, oversight, and understanding remain uniquely human responsibilities, and tools that enhance understanding are the ones he is most excited about.

    Key Takeaways

    • December 2025 was a clear inflection point. Code chunks from agentic tools started arriving correct without edits, and Karpathy stopped correcting the system entirely.
    • Software 3.0 means programming has become prompting. The context window is your lever over the LLM interpreter, which performs computation in digital information space.
    • Open Code’s installer is a Software 3.0 example. Instead of a complex shell script, you copy paste a block of text to your agent, and the agent figures out your environment.
    • The Menu Gen anecdote illustrates how entire apps can become spurious. What used to require OCR, image generation, and a hosted Vercel app can now be a single Gemini plus Nano Banana prompt.
    • Vibe coding raises the floor. Agentic engineering preserves the professional ceiling. The two are different disciplines.
    • The 10x engineer multiplier is now far higher than 10x for people who are good at agentic engineering.
    • Hiring processes have not caught up. Puzzle interviews are the old paradigm. New evaluations should look like building a full Twitter clone for agents and surviving simulated red team attacks from other agents.
    • Models are jagged because reinforcement learning rewards what is verifiable, and labs choose which verifiable domains to invest in. Strawberry letter counts and the 50 meter car wash question show how state-of-the-art models can refactor 100,000 line codebases yet fail at trivial reasoning.
    • If you are in a verifiable setting, you can run your own fine tuning, build RL environments, and benefit even when the labs are not focused on your domain.
    • LLMs are ghosts, not animals. They are statistical simulations summoned from pre-training and shaped by RL appendages, not creatures with curiosity or motivation. Yelling at them does not help.
    • Taste, aesthetics, spec design, and oversight remain human jobs. Models still produce bloated, copy paste heavy code with brittle abstractions.
    • Documentation is still written for humans. Agent native infrastructure, where docs are explicitly designed to be copy pasted into an agent, is a major opportunity.
    • The future likely involves agent representation for people and organizations, with agents talking to other agents to coordinate meetings and tasks.
    • You can outsource your thinking but not your understanding. Tools that help humans understand information faster are uniquely valuable.

    Detailed Summary

    Why Karpathy Feels More Behind Than Ever

    Karpathy opens by describing how he has been using agentic coding tools for over a year. For most of that period, the experience was mixed. The tools could write chunks of code, but they often required edits and supervision. December 2025 changed everything. With more time during a holiday break and the release of newer models, Karpathy noticed that the chunks just came out fine. He kept asking for more. He cannot remember the last time he had to correct the agent. He started trusting the system, and what followed was a cascade of side projects.

    He wants to stress that anyone whose model of AI was formed by ChatGPT in early 2025 needs to look again. The agentic coherent workflow that genuinely works is a fundamentally different experience, and the transition was stark.

    Software 3.0 Explained

    The Software 1.0 paradigm was writing explicit code. Software 2.0 was programming by curating datasets and training neural networks. Software 3.0 is programming by prompting. When you train a GPT class model on a sufficiently large set of tasks, the model implicitly learns to multitask everything in the data. The result is a programmable computer where the context window is your interface, and the LLM is the interpreter performing computation in digital information space.

    Karpathy gives two concrete examples. The first is Open Code’s installer. Normally a shell script handles installation across many platforms, and these scripts balloon in complexity. Open Code instead provides a block of text you copy paste to your agent. The agent reads your environment, follows instructions, debugs in a loop, and gets things working. You no longer specify every detail. The agent supplies its own intelligence.

    The Menu Gen Story

    The second example is Karpathy’s Menu Gen project. He built an app that takes a photo of a restaurant menu, OCRs the items, generates pictures for each dish, and renders the enhanced menu. The app runs on Vercel and chains together multiple services. Then he saw a Software 3.0 alternative. You take a photo, give it to Gemini, and ask it to use Nano Banana to overlay generated images onto the menu. The model returns a single image with everything rendered. The entire app he built is now spurious. The neural network does the work. The prompt is the photo. The output is the photo. There is no app between them.

    Karpathy uses this to argue that founders should not just think of AI as a speedup of existing patterns. Entirely new things become possible. His example is LLM driven knowledge bases that compile a wiki for an organization from raw documents. That is not a faster version of older code. It is a new capability with no prior equivalent.

    What Will Look Obvious in Hindsight

    Stephanie Zhan asks what the equivalent of building websites in the 1990s or mobile apps in the 2010s looks like today. Karpathy speculates about completely neural computers. Imagine a device that takes raw video and audio as input, runs a neural net as the host process, and uses diffusion to render a unique UI for each moment. He notes that early computing in the 1950s and 60s was undecided between calculator-like and neural-net-like architectures. We went down the calculator path. He thinks the relationship may eventually flip, with neural networks becoming the host and CPUs becoming coprocessors used for deterministic appendages.

    Verifiability and Jagged Intelligence

    Karpathy spent significant writing time on verifiability. Classical computers automate what you can specify in code. The current generation of LLMs automates what you can verify. Frontier labs train models inside giant reinforcement learning environments, so the models peak in capability where verification rewards are strong, especially math and code. They stagnate or get rough around the edges elsewhere.

    This explains the jagged intelligence puzzle. The classic example was counting letters in strawberry. The newer one Karpathy offers: a state-of-the-art model will refactor a 100,000 line codebase or find zero-day vulnerabilities, then tell you to walk to a car wash 50 meters away because it is so close. The two coexisting capabilities should be jarring. They reveal that you must stay in the loop, treat models as tools, and understand which RL circuits your task lands in.

    He also points out that data distribution choices matter. The jump in chess capability from GPT-3.5 to GPT-4 came largely because someone at OpenAI added a huge amount of chess data to pre-training. Whatever ends up in the mix gets disproportionately good. You are at the mercy of what labs prioritize, and you have to explore the model the labs hand you because there is no manual.

    Founder Advice in a Lab Dominated World

    Asked what founders should do given that labs are racing toward escape velocity in obvious verifiable domains, Karpathy points back to verifiability itself. If your domain is verifiable but currently neglected, you can build RL environments and run your own fine tuning. The technology works. Pull the lever with diverse RL environments and a fine tuning framework, and you get something useful. He hints there is one specific domain he finds undervalued but declines to name it on stage.

    On the question of what is automatable only from a distance, Karpathy says almost everything can ultimately be made verifiable. Even writing can be assessed by councils of LLM judges. The differences are in difficulty, not in possibility.
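
    To make the build-your-own-RL-environments advice concrete, here is a minimal sketch of a verifiable environment. All names are hypothetical; query_model stands in for whatever model endpoint you are fine-tuning against. The point is that the reward is a programmatic check, not a human label:

    ```python
    # Gym-style loop over a toy verifiable task: unit conversion, graded by an
    # exact programmatic check. Real domains swap in their own verifier
    # (compilers, test suites, simulators, structured-data validators).
    import re

    KM_TO_MILES = 0.621371

    class UnitConversionEnv:
        def __init__(self, distances_km=(5, 12, 42)):
            self.tasks = [(km, km * KM_TO_MILES) for km in distances_km]

        def rollout(self, query_model):
            for km, target in self.tasks:
                reply = query_model(f"Convert {km} km to miles. Reply with a number.")
                match = re.search(r"-?\d+(?:\.\d+)?", reply or "")
                reward = 1.0 if match and abs(float(match.group()) - target) < 0.01 else 0.0
                yield km, reward          # rewards feed the fine-tuning loop

    env = UnitConversionEnv()
    stub = lambda prompt: "3.11"          # stand-in model; wire up a real endpoint here
    print(list(env.rollout(stub)))        # [(5, 1.0), (12, 0.0), (42, 0.0)]
    ```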

    From Vibe Coding to Agentic Engineering

    Vibe coding raises the floor. Anyone can build something. Agentic engineering preserves the professional quality bar that existed before. You are still responsible for your software. You are still not allowed to ship vulnerabilities. The question is how you go faster without sacrificing standards. Karpathy calls it an engineering discipline because coordinating spiky, stochastic agents to maintain quality at speed requires real skill.

    The ceiling on agentic engineering capability is very high. The old idea of a 10x engineer is now an understatement. People who are good at this peak far above 10x.

    What Mediocre Versus AI Native Looks Like

    Karpathy compares this to how different generations use ChatGPT. The difference between a mediocre and an AI native engineer using Claude Code, Codex, or Open Code is investment in setup and full use of available features. The same way previous generations of engineers got the most out of Vim or VSCode, today’s strong engineers tune their agentic environments deeply.

    He thinks hiring processes have not caught up. Most companies still hand out puzzles. The new test should look like asking a candidate to build a full Twitter clone for agents, make it secure, simulate user activity with agents, and then run multiple Codex 5.4x high instances trying to break it. The candidate’s system should hold up.

    What Humans Still Own

    Agents are intern level entities right now. Humans are responsible for aesthetics, judgment, taste, and oversight. Karpathy describes a Menu Gen bug where the agent tried to associate Stripe purchases with Google accounts using email addresses as the key, instead of a persistent user ID. Email addresses can differ between Stripe and Google accounts. This kind of specification level mistake is exactly what humans must catch.
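
    A minimal sketch of that failure mode, with all names hypothetical: keying purchases on email silently loses data when the Stripe and Google emails differ, while a persistent internal user ID survives:

    ```python
    # Fragile: email is mutable and can differ between services.
    purchases_by_email = {"pat@gmail.com": ["pro_plan"]}       # keyed by Stripe email
    google_login_email = "pat@work-domain.com"                 # same person, different email
    print(purchases_by_email.get(google_login_email))          # None: purchase "lost"

    # Robust: mint one persistent internal ID at signup, map every external
    # identity (Google subject ID, Stripe customer ID) onto it, join on that.
    users = {"u_123": {"google_sub": "g-abc", "stripe_customer": "cus_xyz"}}
    purchases_by_user = {"u_123": ["pro_plan"]}

    def user_for_stripe_customer(customer_id):
        return next(uid for uid, ids in users.items()
                    if ids["stripe_customer"] == customer_id)

    print(purchases_by_user[user_for_stripe_customer("cus_xyz")])  # ['pro_plan']
    ```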

    He works with agents to design detailed specs and treats those as documentation. The agent fills in the implementation. He has stopped memorizing API details for things like NumPy axis arguments or PyTorch reshape versus permute. The intern handles recall. Humans handle architecture, design, and the right questions.

    Reading the actual code agents produce can still cause heart attacks. It is bloated, full of copy paste, riddled with awkward and brittle abstractions. His Micro GPT project, an attempt to simplify LLM training to its bare essence, was nearly impossible to drive through agents. The models hate simplification. That capability sits outside their RL circuits. Nothing is fundamentally preventing this from improving. The labs simply have not invested.

    Animals Versus Ghosts

    Karpathy returns to his framing that we are not building animals, we are summoning ghosts. Animal intelligence comes from evolution and is shaped by intrinsic motivation, fun, curiosity, and empowerment. LLMs are statistical simulation circuits where pre-training is the substrate and RL is bolted on as appendages. They are jagged. They do not respond to being yelled at. They have no real curiosity. The ghost framing is partly philosophical, but it changes how you approach them. You stay suspicious. You explore. You do not assume the system you used yesterday will behave the same on a new task.

    Agent Native Infrastructure

    Most software, frameworks, libraries, and documentation are still written for humans. Karpathy’s pet peeve is being told to do something instead of being given a block of text to copy paste to his agent. He wants agent first infrastructure. The Menu Gen project’s hardest part was not writing code. It was deploying on Vercel, configuring DNS, navigating service settings, and stringing together integrations. He wants to give a single prompt and have the entire thing deployed without touching anything.

    Long term he expects agent representation for individuals and organizations. His agent will negotiate meeting details with your agent. The world becomes one of sensors, actuators, and agent native data structures legible to LLMs.

    Education and What Still Matters

    The most striking line of the conversation comes near the end. Karpathy quotes a tweet that shaped his thinking: you can outsource your thinking but you cannot outsource your understanding. Information still has to make it into your brain. You still need to know what you are building and why. You cannot direct agents well if you do not understand the system.

    This is part of why he is so excited about LLM driven knowledge bases. Every time he reads an article, his personal wiki absorbs it, and he can query it from new angles. Every projection onto the same information yields new insight. Tools that enhance human understanding are uniquely valuable because LLMs do not excel at understanding. That bottleneck is yours to manage.

    Thoughts

    The most useful frame in this talk is the distinction between vibe coding and agentic engineering. It clarifies what has been muddled for the past year. Vibe coding is about access. Anyone can produce something. Agentic engineering is about discipline. You preserve the standards that made software trustworthy in the first place, while moving at speeds that would have seemed absurd two years ago. These are not the same activity, and conflating them is part of why so many shipped products feel half built.

    The Menu Gen anecdote is the kind of story that should make every solo developer pause. If a single Gemini plus Nano Banana prompt can replace a multi-service Vercel-deployed app, the question for any builder becomes how much of what you are working on right now is going to be made spurious by the next model release. The honest answer is probably more than you want to admit. The defensive posture is not building thicker apps. It is choosing problems where the model alone is not enough, where taste, distribution, infrastructure, or specific verifiable RL environments give you something the next model cannot collapse into a prompt.

    The verifiability lens is also unusually practical. If you are a solo builder, the question shifts from what is possible to what is verifiable but neglected. The labs will eat the obvious verifiable domains because that is how their RL pipelines are set up. The opportunity is in domains where verification is possible but the labs have not yet invested. That is a much more concrete strategic filter than vague intuitions about defensibility.

    The car wash example is going to stick. State of the art models can refactor enormous codebases and still tell you to walk somewhere a sane person would drive. That is the lived reality of jagged intelligence, and it argues strongly for staying in the loop on real decisions rather than handing off everything to agents. The agents are excellent fillers of blanks. They are not yet trustworthy specifiers of the spec.

    Finally, the line about outsourcing thinking but not understanding is worth taping above the desk. The bottleneck is no longer typing speed, syntax recall, or even API knowledge. It is whether the human in the loop actually understands the system being built. Tools that genuinely improve human understanding, including personal knowledge bases that re project information through different prompts, are likely the most undervalued category of products being built right now. The opportunity is not just in agents. It is in the cognitive scaffolding that makes humans good directors of agents.

  • Paul Tudor Jones on Macro Trading, Bitcoin, the AI Existential Threat, and Why the US Stock Market Is the Most Leveraged in History

    Legendary macro trader Paul Tudor Jones sat down with Patrick O’Shaughnessy on Invest Like the Best for a sweeping conversation that spans 50 years of trading, the 1980 silver collapse, the 1987 crash, his evolving admiration for Warren Buffett, his alarming view of AI safety, and a daily routine that starts at 2:30 AM. This is one of the most candid and useful conversations a working trader, investor, or builder can listen to right now.

    TLDW (Too Long, Didn’t Watch)

    Paul Tudor Jones believes the United States is sitting on the most leveraged equity market in history at 252% of GDP, dwarfing 1929 and 2000. He sees a sovereign debt bubble, a coming wave of IPO supply that could reverse a decade of buyback driven gains, and a dollar yen trade setting up as the next big macro opportunity. He calls Bitcoin the best inflation hedge that exists thanks to its finite supply, but flags real cyber and quantum tail risks. He apologizes publicly to Warren Buffett for years of doubting him and calls him the OG of compound interest. He thinks AI is being deployed without any meaningful safety regulation, that watermarking AI content should be mandated by law, and that humanity is sleepwalking into a tail risk that could cost hundreds of millions of lives. And he closes with a simple life formula: God, family, friends, fun, and service, with a daily intentional act of kindness as the secret to a meaningful life.

    Key Takeaways

    • The US equity market is at 252% of GDP, the highest in history. For context, 1929 peaked at 65%, 1987 around 85 to 90%, and 2000 around 170%. A standard mean reversion to long term PEs would be a 30 to 35% decline, which on this base would erase market cap equal to 80 to 90% of GDP.
    • We are in a sovereign debt bubble, not necessarily an equity bubble. But the country is over equitized, individual equity weightings are at all time highs, and private equity has more than doubled as a share of institutional portfolios since 2008.
    • IPO supply is about to flip the buyback math. Buybacks have been retiring roughly 2% of market cap per year for a decade. Contemplated IPOs in the next year could equal 5 to 6% of market cap, reversing a structural tailwind.
    • Hyperscaler capex will eat into tech cash flow, which is part of why tech has been dogging it and may continue to.
    • The buy and hold S&P 500 advice is dangerous at current valuations. Historically, buying the S&P at a PE of 22 has produced negative 10 year returns. Valuation matters even on long horizons.
    • Dollar yen is his current setup. The yen has been grossly undervalued for 24 months. Japan is the largest net international investment creditor, holding roughly $4.5 trillion mostly unhedged in dollars. The catalyst is a new Reagan or Thatcher style prime minister, whose arrival Paul thinks will trigger a sharp yen rally.
    • Bitcoin is the best inflation hedge in existence because it is finite and decentralized, more scarce than gold. The two real risks are kinetic conflict triggering cyber warfare and the eventual arrival of quantum computing.
    • Every major crash he has lived through had the same DNA: leverage, usually derivative driven. 1987 was 100% portfolio insurance. 1998 was Long Term Capital and derivatives. 2000 was an IPO supply unlock cascade. Today combines all three risks with sovereign debt fragility on top.
    • Trading is boxing, not chess. Most days you are jabbing and feeling out the market. A few times per cycle there is a real opening. Bitcoin in 2020 was a knockout. Two year rates in 2022 was a knockout. The job is to be ready when the opening appears.
    • Great traders are 70% born, not made. Paul polled his top risk takers and the consensus was nature dominates nurture. The traits: type A, hyper curious, loves competition, loves games, intuitive grasp of probability.
    • Liquidity is everything. His grandfather told him as a kid, “you are only worth what you can write a check for tomorrow.” He watched Bunker Hunt go from richest man on earth to virtually bankrupt in six weeks during the 1980 silver collapse. The lesson stuck.
    • Warren Buffett apology. Paul publicly recants decades of skepticism, calling Buffett a flipping genius who understood compound interest at age nine and the OG of compounding.
    • AI safety is a five-alarm fire. Paul attended a small conference with modelers from the four biggest model labs. The consensus answer to how AI safety gets resolved was, paraphrasing, when 50 to 100 million people die in an accident. He thinks this is insane.
    • Mandatory AI watermarking should be a campaign issue. He wants knowing violations made a felony after three offenses. He says deepfakes have already fooled serious people he knows twice this year.
    • The build, break, iterate model is fine for most technology and catastrophic for AI because the break in this case can be civilization scale. The Atomic Energy Commission was created 18 months after the bomb. We are three years into deployed AI with effectively zero regulation.
    • Daily routine for 50 years: wake at 6:15, work an hour, 45 minutes of hard cardio, screens for the open, meetings 10 to 12, lunch meeting, hour before close and hour after to plan the next day, walk with wife at 5, work, dinner, mindless TV, work 9:30 to 10:15, sleep, wake at 2:30 or 3 AM to watch the London open and do analytical work, then back to sleep until 6:15.
    • Information overload is now the bottleneck. He works harder today than 40 years ago because the volume of inputs has exploded. The challenge is preserving what he calls exquisite execution: buying when there is blood on the ground and selling at maximum elation.
    • Eli Tullis was his trading mentor. Tullis traded almost only cotton and was a master of executing at the apogee of fear and greed. The biggest lesson came after a catastrophic loss, when Tullis greeted his wife’s friends with a smile and total composure. When the going gets tough, the tough get going.
    • Robin Hood Foundation was born from a wrong call. Paul was convinced 1987 would trigger a depression. It did not. But the conviction launched what became one of the most influential anti poverty organizations in America.
    • Journalism 101 should be required at every college. Newspaper inverted pyramid writing taught him principal component analysis: lead with the most important fact, then the next, then the next. He says it is exactly how he ranks variables in a trade.
    • If you do not use it, you lose it. A Palm Beach doctor told him “you retire, you die” and it changed how he thinks about working into his 90s.
    • The principal components of a great life: God, family, friends, fun, service. Significance does not come from the trades. It comes from the people you loved and the people you served.
    • Kill them with kindness. One intentional act of kindness per day, repeated, rewires you. “I should” becomes “I am.” It is the closing message of the entire conversation.

    Detailed Summary

    The Kindest Thing: A Three Year Old Lost in a Vegetable Market

    Paul opens the conversation by insisting they reverse the usual order of the show and start with Patrick’s signature closing question: what is the kindest thing anyone has ever done for you. His earliest childhood memory is being separated from his mother around age two and a half at an outdoor produce market in Memphis in 1957. An elderly Black man took his hand, walked him up and down the aisles, and reunited him with his mother. When she tried to give him five dollars, a meaningful sum at the time, he refused, saying he knew she would do it for his child. That night Paul began adding the unnamed man to his prayer list. He repeated that prayer roughly four to five thousand times over the next twelve years.

    Decades later, watching Harry Reasoner interview Eugene Lang on 60 Minutes, Paul saw the photo negative of his own story: an older man, this time helping kids of color in Harlem, promising to put them through college if they finished high school. Paul called Lang the next day and was redirected to Bedford Stuyvesant, the highest crime neighborhood in New York at the time. He adopted a class, ran after school programs, hired tutors, dealt with kids being murdered and teen pregnancy, and learned by failing what poverty actually requires to defeat. That work seeded the Robin Hood Foundation in 1987 and one of the first charter schools in New York, the Bedford Stuyvesant Charter School of Excellence, which became the number one ranked elementary school out of 543 in NYC within five years.

    Aim High and Shoot Straight

    Paul tells the story of his commencement address at what is now Rhodes College in Memphis. He polled the audience to see who remembered their own commencement speakers. Almost no one did. So he ended his speech by pulling out a bow, nocking an arrow, telling the graduates “whatever you do, aim high and shoot straight,” and shooting an apple off a table. Memorable.

    Trading vs Investing: A 50 Year Career in the Trenches

    Paul started in 1976 when inflation was raging and assets routinely doubled and halved in a single year. He cut his teeth on the floor of the cotton exchange and the COMEX, watching Bunker Hunt accumulate roughly 200 million ounces of silver at an average cost of $3.12 and ride it to roughly $50 an ounce, becoming worth $11 billion at the peak. When the COMEX restricted silver to liquidation only, the price collapsed from $50 to under $10 in eight weeks. Hunt was virtually bankrupt. The searing lesson: never trust permanence in any asset, and always preserve liquidity.

    He contrasts his own life with Warren Buffett’s. Paul’s BVI Fund has run for 40 years with a negative 0.12 correlation to the S&P 500, meaning 100% of returns are alpha. He compares trading to playing right guard in the NFL for 50 years, fighting in the trenches every single day, while Buffett’s belief in America gave him a different kind of strength: the ability to ride out a 50% drawdown in 2008 to 2009 without flinching. After listening to the Acquired podcast on Berkshire Hathaway, Paul realized Buffett understood compound interest at age nine and sought out Benjamin Graham at 17. He calls himself an idiot for ever doubting him.

    The AI Existential Risk Argument

    Paul attended a small conference around 18 months ago with roughly 35 to 40 attendees, including one modeler from each of the four largest AI labs. When he asked them point blank how they expected AI safety to get resolved, the consensus answer was, paraphrasing, that meaningful action would only happen after a mass casualty event of 50 to 100 million people. He has been alarmed ever since.

    His core critique is structural. The build, break, iterate cycle has been the engine of human invention since the beginning. The problem is that AI is the first technology where the tail event of a break could be civilizational. He compares the regulatory response unfavorably to the atomic bomb: the Atomic Energy Commission was stood up 18 months after Hiroshima. We are three years into widely deployed AI with no real regulation, no public referendum, and no convening with adversaries like China.

    His specific policy ask is mandatory watermarking of AI generated content, with knowing violations made a felony after three offenses. He says deepfakes have already deceived people he trusts twice this year and that restoring trust in a basic shared reality is foundational to fixing American discourse. He also notes that a meaningful share of senior AI scientists openly envision a future of brain implanted humans with inalienable rights. He thinks most humans, given a vote, would reject that path. His point is that there has been no vote.

    The Nature of Trading: Boxing, Not Chess

    Trading, Paul says, is more like classic boxing than chess. You are jabbing, feeling out the opponent, looking for an opening. Most days you are gathering information and not doing much. A few times per cycle there is a real opening that you can land hard. He cites Bitcoin in 2020 and two year rates in 2022 as recent knockouts.

    The genesis of every big move, he argues, is one of three things: the market got carried away, an imbalance went on too long, or a central bank or government did something they should not have. Right now he thinks dollar-yen fits the pattern: the yen has been grossly undervalued for two years, Japan holds a net international investment position of roughly $4.5 trillion, mostly in unhedged dollar assets, and the catalyst has arrived in a new prime minister he compares to Reagan, Thatcher, or Trump in his second term.

    Bitcoin as the Best Inflation Hedge

    Paul reiterates Bitcoin as superior to gold as an inflation hedge. Gold supply grows roughly two percent a year. Bitcoin’s supply is capped. Decentralization adds defensibility. The honest caveats: any kinetic global conflict will trigger cyber warfare, and electronic assets sit on the front line. Quantum computing, if and when it arrives, could enable hacks of any bank or any digital store of value. He is not predicting either tomorrow but he is unwilling to ignore them.

    Are We in a Bubble? Look at the Numbers

    The headline statistic is jaw-dropping. Stock market capitalization to GDP is currently 252%. The 1929 peak was 65%. The 1987 peak was 85 to 90%. The 2000 peak was 170%. We have never been here before.

    Bear markets since 1970 have mean-reverted on roughly a ten year cadence. A reversion to a normalized PE from current levels would imply a 30 to 35% decline. On a 250% of GDP base, that is 80 to 90 points of GDP in evaporated wealth. Capital gains tax revenue would crater, the deficit would explode, and the bond market would suffer a self-reinforcing doom loop.
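
    To make the arithmetic concrete, here is a back-of-the-envelope sketch in Python. The 252% figure and the 30 to 35% reversion range are Paul’s numbers from the conversation; the computation itself is illustrative, not a forecast.

    ```python
    # Wealth erased by a mean reversion, expressed in points of GDP.
    # Inputs are the figures quoted in the conversation; this is not a forecast.

    market_cap_to_gdp = 2.52           # market cap is 252% of GDP today

    for decline in (0.30, 0.35):       # implied drop to a normalized PE
        gdp_points = market_cap_to_gdp * decline * 100
        print(f"{decline:.0%} decline -> ~{gdp_points:.0f} points of GDP erased")

    # 30% decline -> ~76 points of GDP erased
    # 35% decline -> ~88 points of GDP erased
    ```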

    Add to this the IPO unlock schedule. Contemplated IPOs over the next year may equal 5 to 6% of market cap. For a decade, buybacks have removed roughly 2% per year. The math is about to flip. Hyperscaler capex commitments will further eat into the cash flow that funded the buybacks. Private equity has gone from 7% of institutional portfolios in 2007 to 16% today. Real estate and infrastructure allocations have grown. The system is dramatically more illiquid and more leveraged than it was in 2008.

    Paul’s specific warning to anyone telling clients to just buy the S&P: at a starting PE of 22, history shows negative 10 year returns. Valuation always matters.

    A Day in the Life of PTJ

    The schedule is monastic. Up at 6:15. Work an hour. 45 minutes of hard cardio. At the screens for the open. Meetings from 10 to 12. Lunch meeting. Afternoon meeting. An hour before the close and an hour after to plan tomorrow and think about what is coming overnight in Tokyo and Hong Kong. Home around 5. An hour walking with his wife. Another hour of work. Dinner. Mindless TV. Work again from 9:30 to 10:15. Sleep. Wake at 2:30 or 3 AM to watch the London open for 30 to 45 minutes and do analytical work in the quiet. Back to sleep. Wake at 6:15. Repeat for 40 years.

    He says he works harder now than ever before because of information overload. The opportunity cost of every distraction is exquisite execution: buying when there is blood on the ground, selling at peak euphoria.

    Eli Tullis and Executing at Maximum Pain

    Paul’s mentor Eli Tullis traded almost exclusively cotton. The defining moment came after Tullis was annihilated when a long-awaited drought broke and cotton went limit down over a weekend. Paul watched in disbelief as Tullis welcomed his wife’s friends to a beautiful office for lunch with a smile, charm, and zero visible distress. The lesson, branded into Paul: when the going gets tough, the tough get going.

    Are Traders Born or Made?

    Paul polled four or five of his best risk takers at a Christmas dinner. The unanimous answer: roughly 70% nature. The traits that recur: type A personality, hyper curiosity, love of competition, obsession with games, intuitive grasp of probability theory. Paul says he effectively earned a degree in probability theory without ever taking a math course in it: he played chess, backgammon, Monopoly, and gin rummy, gambled in college, and has never stopped playing bridge with friends.

    Why Keep Trading?

    Three reasons. First, his Palm Beach doctor told him retirement equals death. If you do not use it, you lose it. Second, his father lived to 100 and Paul wants to remain mentally sharp through his 90s. Third, and most importantly, he wants to make an absolute pot of money so he can give it away. The pursuit of nobility, as he calls it.

    The Workless World

    Paul used to despair about a future where AI does so much that humans no longer need to work. So much human significance comes from work. He has become more optimistic recently, watching how athletes find significance in sport and how he finds significance in bridge games with friends. Humans, he argues, are absurdly adaptable. We may find significance in something as small as a single intentional act of kindness per day.

    Why Journalism 101 Should Be Required

    Paul’s father ran a tiny trade finance legal paper in Memphis. Paul grew up writing for it and taking journalism classes. He argues that newspaper inverted pyramid writing should be mandatory in every college, more important than business school. Conclusion first. First sentence carries the most important fact. Who, what, where, when, why, how. Each subsequent paragraph drops one notch in importance. This is just principal component analysis applied to communication. It is also exactly how Paul ranks variables in a trade. At any given moment, ten things might matter, but only one is the catalytic variable today. The discipline of the inverted pyramid is the discipline of trading.

    The Principal Components of a Great Life

    Asked to apply the same framework to life, Paul answers without hesitation: God, family, friends, fun, service. He says he has actually thought about his own funeral with anticipation, partly because of the songs he has chosen. At the end, he says, no one thinks about the 1987 crash or Bitcoin. They think about who they loved, who loved them, what kind of relationships they had, and what they did to leave a legacy of betterment for others. Legacy, he insists, means deeds, not words.

    Kill Them With Kindness

    The closing message comes from his mother. Some days you will wake up in a bad mood. Something on TV will make you angry. The temptation today is to demonize the other side. The antidote is intentional. One simple act of kindness per day, transmitted outward, repeated. Reps matter. “I should” becomes “I am.” Over time you become an organically kind person. Your outlook brightens. Multiply that across a country and the country changes.

    Thoughts

    The 252% market cap to GDP figure is the single most important number in the conversation. Most listeners will gloss over it. They should not. The structural argument Paul lays out is internally consistent and uncomfortably specific: an over-equitized country, a sovereign debt bubble, an IPO supply wave that flips a decade of buyback math, hyperscaler capex eating cash flow, private equity more than doubled as a portfolio share since 2007, and far less liquidity than 2008 to absorb a shock. None of these are predictions of an imminent crash. They are descriptions of the kindling.

    His Buffett apology is the kind of intellectual honesty that is rare in finance. Two operators with opposite styles can both be right for fifty years. Paul’s negative correlation to the S&P with 100% alpha and Buffett’s belief in America with patient compounding are not rival theories of investing. They are different jobs. Most retail investors are trying to do Buffett’s job with a trader’s emotional reflexes, which is why so few make it.

    The AI section is the part of the interview that should make builders pause. Paul is not an AI doomer in the online sense. He is a 50 year career risk manager applying the standard framework: what is the size of the tail, what is the regulatory containment, who has the kill switch. His answer is that the tail is potentially civilization scale, the containment is effectively zero, and there is no kill switch. The historical precedent he reaches for is not science fiction but the Atomic Energy Commission, stood up 18 months after Hiroshima. The contrast with our current trajectory is uncomfortable.

    The watermarking proposal is unusually concrete for a trader and unusually politically tractable for an AI safety policy. It does not require slowing capability research. It does not require international coordination as a precondition. It restores the basic epistemic substrate of public discourse: knowing what is human and what is not. Whether you think AI risks are overblown or underrated, watermarking is a Pareto improvement.

    For builders shipping software in the AI era, the meta lesson is that we are running the build, break, iterate playbook on a system whose break radius is no longer contained by the founders. That is a different kind of responsibility than the one most engineers have ever held. It does not have a clean answer yet. But the question is now visible.

    The kindness frame at the start and end is not throat clearing. It is the actual operating system Paul has run on for 70 years. The four to five thousand prayer reps for an unnamed man who held his hand in a Memphis vegetable market produced a pattern interrupt 25 years later that founded one of the most effective anti poverty organizations in the country. Compound interest applies to acts as much as to dollars. That is the through line of the entire conversation, and it is the thing most listeners will forget by tomorrow morning. They should not.

  • Claude Opus 4.7 Released: Anthropic’s New Coding Powerhouse With xhigh Effort Mode, 3.75MP Vision, and State-of-the-Art Agentic Performance

    TLDR

    Anthropic released Claude Opus 4.7 on April 16, 2026, as a direct upgrade to Opus 4.6. It delivers major gains on the hardest coding tasks, introduces a new xhigh effort level, supports images up to 2,576 pixels on the long edge (roughly 3.75 megapixels), and ships with automatic cybersecurity safeguards. Pricing stays flat at $5 per million input tokens and $25 per million output tokens. Early testers at Cursor, Replit, Vercel, Notion, Devin, Harvey, Databricks, and Warp report double-digit benchmark jumps, stronger instruction following, better long-horizon autonomy, and a more opinionated model that pushes back instead of agreeing reflexively.

    Key Takeaways

    • Direct upgrade from Opus 4.6 at the same price point, available via API as claude-opus-4-7, plus Amazon Bedrock, Google Cloud Vertex AI, and Microsoft Foundry.
    • New xhigh effort level slots between high and max, giving developers finer control over the reasoning-versus-latency tradeoff.
    • Vision gets a real jump: images up to 2,576 pixels on the long edge, more than 3x prior Claude models. XBOW reported 98.5% visual acuity versus 54.5% for Opus 4.6.
    • Coding benchmarks up across the board: Cursor saw 70% on CursorBench versus 58% for 4.6, Rakuten-SWE-Bench resolved 3x more production tasks, and GitHub measured a 13% lift on their 93-task benchmark.
    • Long-horizon autonomy is a headline theme. Devin says Opus 4.7 works coherently for hours. Genspark highlights loop resistance and the highest quality-per-tool-call ratio they have measured.
    • Instruction following is substantially tighter, which means old prompts written for loose-interpretation models may now behave unexpectedly. Re-tune prompts and harnesses.
    • Better memory across file-system-based workflows, reducing the need for up-front context in multi-session work.
    • Tokenizer changed: the same input can now map to 1.0 to 1.35x as many tokens. Opus 4.7 also thinks more at higher effort levels, so output token counts rise too.
    • Cybersecurity safeguards automatically detect and block prohibited or high-risk cyber requests. Legitimate security researchers can apply to the new Cyber Verification Program.
    • Claude Code gets /ultrareview, a dedicated review session that catches bugs and design issues. Pro and Max users get three free ultrareviews. Auto mode is extended to Max users.
    • State-of-the-art on GDPval-AA, a third-party evaluation of economically valuable knowledge work spanning finance, legal, and other domains.
    • Not the most capable overall model. That distinction still goes to Claude Mythos Preview, which also remains the best-aligned model Anthropic has trained.

    Detailed Summary

    What Claude Opus 4.7 Actually Is

    Claude Opus 4.7 is Anthropic’s latest generally available frontier model, positioned as a targeted upgrade to Opus 4.6 rather than a ground-up new generation. The focus is squarely on advanced software engineering, long-running agentic workflows, and higher-fidelity vision. Anthropic describes it as handling complex, long-running tasks with rigor and consistency, paying precise attention to instructions, and devising ways to verify its own outputs before reporting back.

    The positioning matters. Claude Mythos Preview, announced alongside Project Glasswing, remains the most powerful and best-aligned model Anthropic has trained. Opus 4.7 is the first release after Mythos Preview and serves a dual purpose: give developers a concrete upgrade today, and stress-test new cybersecurity safeguards on a less capable model before Anthropic attempts a broader release of Mythos-class systems.

    Coding and Agentic Performance

    The early-access testimonials read like a highlight reel of the agentic coding ecosystem. Cursor saw CursorBench scores jump from 58% on Opus 4.6 to over 70% on Opus 4.7. Rakuten measured 3x more resolved production tasks on Rakuten-SWE-Bench with double-digit gains in code quality and test quality. GitHub measured a 13% lift on a 93-task coding benchmark including four tasks that neither Opus 4.6 nor Sonnet 4.6 could solve. Notion observed a 14% improvement over Opus 4.6 at fewer tokens and a third of the tool errors, calling it the first model to pass their implicit-need tests.

    Devin emphasized sustained autonomy, saying the model works coherently for hours and pushes through hard problems rather than giving up. Warp reported that Opus 4.7 passed Terminal Bench tasks prior Claude models had failed, including a tricky concurrency bug Opus 4.6 could not crack. Vercel highlighted a behavior they had not seen before: the model actually does proofs on systems code before starting work, and is noticeably more honest about its own limits.

    A recurring theme across testimonials is that Opus 4.7 pushes back. Replit’s president said it feels like a better coworker because it challenges technical decisions instead of agreeing by default. Augment Code noted it brings a more opinionated perspective rather than simply agreeing with the user. For anyone building real engineering workflows, that pushback behavior is arguably more valuable than raw benchmark deltas.

    Vision: The Quiet Breakthrough

    The vision upgrade may be the most underappreciated change. Opus 4.7 now accepts images up to 2,576 pixels on the long edge, roughly 3.75 megapixels, which is more than three times the previous Claude limit. This is a model-level change, not an API parameter, so every image sent to Claude is processed at higher fidelity automatically.

    XBOW, which builds autonomous penetration testing agents that rely heavily on computer use, reported the most dramatic single number in the entire announcement: 98.5% on their visual acuity benchmark versus 54.5% for Opus 4.6. They described their single biggest Opus pain point as effectively disappearing, unlocking an entire class of work where they could not previously use Claude. Solve Intelligence reported major improvements in multimodal understanding for life sciences patent workflows, from reading chemical structures to interpreting complex technical diagrams.

    This unlocks computer-use agents reading dense screenshots, data extraction from complex diagrams, and any work requiring pixel-perfect references.
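
    For anyone retesting a vision pipeline, here is a minimal sketch using the standard Anthropic Python SDK and the claude-opus-4-7 model ID from the announcement. The 2,576-pixel ceiling comes from the release notes; the downscale guard and the file name are illustrative, and since the change is model-level the guard is purely defensive.

    ```python
    import base64
    import io

    from anthropic import Anthropic
    from PIL import Image

    MAX_LONG_EDGE = 2576  # Opus 4.7's new long-edge ceiling per the announcement


    def encode_screenshot(path: str) -> str:
        """Downscale only if the long edge exceeds the new limit, then base64-encode."""
        img = Image.open(path)
        long_edge = max(img.size)
        if long_edge > MAX_LONG_EDGE:
            scale = MAX_LONG_EDGE / long_edge
            img = img.resize((round(img.width * scale), round(img.height * scale)))
        buf = io.BytesIO()
        img.save(buf, format="PNG")
        return base64.standard_b64encode(buf.getvalue()).decode()


    client = Anthropic()  # reads ANTHROPIC_API_KEY from the environment
    response = client.messages.create(
        model="claude-opus-4-7",  # model ID from the announcement
        max_tokens=1024,
        messages=[{
            "role": "user",
            "content": [
                {"type": "image",
                 "source": {"type": "base64", "media_type": "image/png",
                            "data": encode_screenshot("dashboard.png")}},
                {"type": "text",
                 "text": "Read every label and value on this dashboard."},
            ],
        }],
    )
    print(response.content[0].text)
    ```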

    The New xhigh Effort Level

    Opus 4.7 introduces an xhigh (extra high) effort level that sits between high and max. This gives developers a new middle gear for the reasoning-versus-latency tradeoff on hard problems. In Claude Code, Anthropic raised the default effort level to xhigh across all plans. For coding and agentic use cases, Anthropic recommends starting with high or xhigh effort rather than defaulting to medium.

    Alongside effort controls, the Claude Platform is getting task budgets in public beta, letting developers guide Claude’s token spend so it can prioritize work across longer runs. This matters because Opus 4.7 thinks more at higher effort levels, particularly on later turns in agentic settings.
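
    A sketch of selecting the new gear from the API follows. The announcement names the levels but not the wire format, so treat the effort field below as an assumption to confirm against the migration guide; the Python SDK’s extra_body escape hatch simply forwards it verbatim.

    ```python
    from anthropic import Anthropic

    client = Anthropic()  # reads ANTHROPIC_API_KEY from the environment

    # ASSUMPTION: the exact request field for effort is not specified in the
    # announcement; "effort" is a guess passed via extra_body so the SDK
    # forwards it untouched. Confirm the real field name in the migration guide.
    response = client.messages.create(
        model="claude-opus-4-7",
        max_tokens=4096,
        messages=[{"role": "user",
                   "content": "Find the race condition in this worker pool."}],
        extra_body={"effort": "xhigh"},  # the new gear between high and max
    )
    print(response.content[0].text)
    ```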

    Token Usage Changes You Need to Plan For

    Two token-related changes affect migration. First, Opus 4.7 uses an updated tokenizer that improves how the model processes text, but the tradeoff is that the same input can map to 1.0 to 1.35x as many tokens depending on content type. Second, Opus 4.7 thinks more at higher effort levels, which means more output tokens on hard problems.

    Anthropic’s own internal coding evaluation shows the net effect is favorable when measured against quality delivered per token, but the recommendation is to measure the difference on real traffic rather than assume. Token usage can be controlled via the effort parameter, task budgets, or simply prompting the model to be more concise. Anthropic published a migration guide with tuning advice.
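
    In that spirit, here is a small sketch for measuring the tokenizer delta on your own traffic with the SDK’s count_tokens endpoint. The claude-opus-4-6 model ID is assumed by analogy with claude-opus-4-7, and the sample prompts are stand-ins for real production inputs.

    ```python
    import statistics

    from anthropic import Anthropic

    client = Anthropic()

    # ASSUMPTION: "claude-opus-4-6" follows the same ID convention as claude-opus-4-7.
    OLD_MODEL, NEW_MODEL = "claude-opus-4-6", "claude-opus-4-7"

    # Stand-ins; replace with a sample of real production prompts.
    sample_prompts = [
        "Summarize the attached incident report in five bullets.",
        "Refactor this handler to stream responses instead of buffering.",
    ]

    def input_tokens(model: str, prompt: str) -> int:
        """Count input tokens for a single-turn prompt against a given model."""
        result = client.messages.count_tokens(
            model=model,
            messages=[{"role": "user", "content": prompt}],
        )
        return result.input_tokens

    ratios = [input_tokens(NEW_MODEL, p) / input_tokens(OLD_MODEL, p)
              for p in sample_prompts]
    print(f"median tokenizer inflation: {statistics.median(ratios):.2f}x")  # expect 1.00-1.35x
    ```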

    Claude Code Updates: /ultrareview and Auto Mode

    Claude Code gets two meaningful additions. The new /ultrareview slash command produces a dedicated review session that reads through changes and flags bugs and design issues that a careful reviewer would catch. Pro and Max users get three free ultrareviews to try it out.

    Auto mode, a permissions option where Claude makes decisions on behalf of the user so longer tasks run with fewer interruptions, has been extended from Pro to Max users. The pitch is that auto mode is safer than skipping all permissions while still enabling long autonomous runs.

    Cybersecurity Safeguards and the Cyber Verification Program

    Opus 4.7 ships with safeguards that automatically detect and block requests indicating prohibited or high-risk cybersecurity uses. During training, Anthropic experimented with efforts to differentially reduce cyber capabilities, meaning Opus 4.7’s cyber ceiling is intentionally lower than Mythos Preview’s.

    For legitimate users, Anthropic launched a Cyber Verification Program for security professionals doing vulnerability research, penetration testing, and red-teaming. Real-world data from these safeguards will inform how Anthropic eventually releases Mythos-class models more broadly.

    Safety and Alignment

    Opus 4.7 shows a similar safety profile to Opus 4.6 overall. Honesty and resistance to prompt injection attacks improved. Some measures slipped modestly, notably a tendency to give overly detailed harm-reduction advice on controlled substances. Anthropic’s alignment assessment concluded the model is largely well-aligned and trustworthy, though not fully ideal. Mythos Preview still holds the crown as the best-aligned model according to Anthropic’s evaluations. The full Claude Opus 4.7 System Card has the complete breakdown.

    Real-World Work Beyond Code

    Opus 4.7 posts a state-of-the-art score on the Finance Agent evaluation and on GDPval-AA, a third-party evaluation of economically valuable knowledge work spanning finance, legal, and other domains. Harvey reported 90.9% on BigLaw Bench at high effort with noticeably smarter handling of ambiguous document editing tasks, including correctly distinguishing assignment provisions from change-of-control provisions. Databricks measured 21% fewer errors than Opus 4.6 on OfficeQA Pro document reasoning. Vercel went as far as calling it the best model in the world for building dashboards and data-rich interfaces.

    Pricing and Availability

    Pricing holds at $5 per million input tokens and $25 per million output tokens. Opus 4.7 is live today across all Claude products, the Claude API as claude-opus-4-7, Amazon Bedrock, Google Cloud Vertex AI, and Microsoft Foundry.
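
    For migration budgeting, a trivial cost sketch that combines the flat pricing with the token inflation discussed above. The traffic volumes are made up; only the per-token prices come from the announcement.

    ```python
    # Opus 4.7 pricing from the announcement, in dollars per million tokens.
    INPUT_PER_MTOK, OUTPUT_PER_MTOK = 5.00, 25.00

    def monthly_cost(input_tokens: int, output_tokens: int, inflation: float = 1.0) -> float:
        """Estimated monthly spend; `inflation` models the 1.0-1.35x token growth."""
        return (input_tokens * INPUT_PER_MTOK
                + output_tokens * OUTPUT_PER_MTOK) * inflation / 1_000_000

    # Hypothetical: 2B input and 300M output tokens per month, worst-case inflation.
    baseline = monthly_cost(2_000_000_000, 300_000_000)                  # $17,500
    worst    = monthly_cost(2_000_000_000, 300_000_000, inflation=1.35)  # $23,625
    print(f"${baseline:,.0f} -> ${worst:,.0f}")
    ```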

    Thoughts

    The most interesting thing about this release is not the benchmark deltas, which are strong but expected for a point-release. It is the behavioral shift. When a dozen independent companies describe the same model as opinionated, willing to push back, self-verifying, and honest about its limits, that is a different product category than “next version, slightly better.” That is a model optimized for being a collaborator rather than an autocomplete.

    For solo builders running long agentic sessions, the loop resistance and long-horizon autonomy claims are the ones worth taking seriously. Genspark’s framing is sharp: a model that loops indefinitely on 1 in 18 queries wastes compute and blocks users. If Opus 4.7 genuinely closes that failure mode, the economics of overnight autonomous runs change meaningfully.

    The vision jump is the sleeper feature. 3.75 megapixel support plus the XBOW acuity number suggests computer-use agents are about to get a lot more reliable at reading actual screens. Anyone building browser agents, automated QA, or visual data extraction pipelines should retest their stacks this week.

    The instruction-following tightening is a real gotcha. Prompts written against Opus 4.6’s looser interpretation habits may produce surprising results when the model now takes every word literally. Teams with production prompt libraries should budget time for re-tuning rather than expecting a drop-in swap.

    Finally, the strategic framing around Mythos Preview is worth noting. Anthropic is explicitly using Opus 4.7 as a safeguards testbed for eventually releasing more capable cyber-capable systems. That is an honest acknowledgment that capability and deployment readiness are separate problems, and it sets a template for how frontier releases may work going forward.