PJFP.com

Pursuit of Joy, Fulfillment, and Purpose

Tag: recursive self improvement

Inkling: Thinking Machines Lab Releases Its First Open-Weights Model, a 975B Multimodal Mixture-of-Experts With Controllable Thinking Effort That Can Fine-Tune Itself on Tinker
Today, we are introducing Inkling.

Inkling reasons efficiently across text, image, and audio modalities. We are making the full weights available. https://t.co/Ghebq5mG30

Available today for fine-tuning on Tinker. Play with it in the Inkling Playground. 🧵
— Thinking Machines (@thinkymachines) July 15, 2026

Thinking Machines Lab, the AI startup founded by former OpenAI CTO Mira Murati, has released Inkling, its first open-weights model trained from scratch. Inkling is a 975 billion parameter Mixture-of-Experts transformer (41B active) with a context window of up to 1 million tokens, native multimodal reasoning over text, images, and audio, and a dial for controllable thinking effort. The lab is explicit that Inkling is not the strongest model in the world. It is pitched as something arguably more useful: a broad, balanced, customizable foundation you can fine-tune on Tinker, with the full weights on Hugging Face. The announcement even includes a demo where Inkling fine-tunes itself and swaps in its own new weights.

TLDR

Thinking Machines Lab released Inkling, a 975B-total, 41B-active Mixture-of-Experts model pretrained on 45 trillion tokens of text, images, audio, and video, alongside a preview of Inkling-Small (276B total, 12B active). The release covers the model’s generalist benchmark profile across reasoning, agentic coding, tool use, vision, and audio; a controllable thinking effort setting that lets developers trade performance against tokens (matching Nemotron 3 Ultra on Terminal Bench 2.1 at roughly a third of the tokens); an encoder-free multimodal architecture using dMel spectrograms and hMLP image patches; a training recipe combining Muon and Adam with weight decay coupled to the square of the learning rate; RL scaled past 30 million rollouts with log-linearly improving reasoning and an emergent compression of the chain of thought; an epistemics push covering calibration, forecasting (where it beats GPT-5.5 and Claude Opus 4.8 on ForecastBench without search), abstention, and censorship resistance; the strongest FORTRESS adversarial safety score among compared open-weights models; a headline-grabbing demo of the model fine-tuning itself into a lipogram assistant via Tinker; and day-one availability on Tinker (at a 50% limited-time discount), Hugging Face, API partners including Together, Fireworks, Modal, Databricks, and Baseten, and inference engines including vLLM, SGLang, and llama.cpp.

Thoughts

The most striking thing about this launch is its honesty. Nearly every frontier release leads with a claim to be the best at something, and the fine print walks it back. Thinking Machines Lab says plainly that Inkling is not the strongest model available, open or closed, and then makes the case that “strongest” is the wrong axis for most real buyers. If you are going to run a model millions of times inside a product, what you care about is the cost curve, the adaptability, and whether you can shape it to your workflow. That framing conveniently matches their business (Tinker sells fine-tuning), but it also matches how production AI actually gets deployed, where cost and latency are binding constraints and a benchmark crown is trivia.

The self-fine-tuning demo deserves more attention than it will probably get. Asked to become a lipogram assistant that never uses the letter “e” (a behavior prompting alone cannot reliably produce), Inkling wrote its own training objective and scoring function, generated its own synthetic data, launched the run on Tinker, evaluated the result against its base self, and then staged a weight swap so the improved checkpoint took over the session. That is a closed loop of specify, train, evaluate, and self-update, packaged as a cute product demo. The loop is the primitive behind every serious conversation about recursive self-improvement, and here it is running as a marketing asset with a 27 minute wall clock. The gap between “toy objective” and “economically meaningful objective” is now a question of reward design, not plumbing.

Controllable thinking effort is the feature I expect developers to care about most. Instead of publishing a single score, TML publishes a curve: sweep the effort setting from 0.2 to 0.99 and watch performance trade against generated tokens. Inkling reportedly matches Nemotron 3 Ultra on Terminal Bench 2.1 while spending about a third of the tokens. Benchmarks reported as single points hide exactly this, and a model that reaches a target score cheaply beats a model that scores two points higher at triple the cost in any high-volume workload. Expect effort curves to become standard marketing for open models, the way context length became standard a couple of years ago.

The epistemics section is quietly the most differentiated part of the release. TML trained calibration directly, running RL against proper scoring rules on resolved real-world questions, and pairing a rubric grader with a claims grader that does agentic web search to verify each factual assertion. The result is a model that beats GPT-5.5 and Claude Opus 4.8 on ForecastBench without search and holds its own on Prophet Arena. A model that knows when to say “I don’t know” is more useful across messy real-world domains than one that confabulates confidently, and it is notable that a lab whose stated mission is extending human will and judgment treats calibrated uncertainty as a first-class training target rather than a safety afterthought. The censorship-resistance training, validated on Cognition’s Propaganda and Censorship Eval, extends the same idea: trustworthiness as a capability you train, not a policy you bolt on.

Finally, the open-weights safety tension is handled with unusual candor. Inkling posts the strongest adversarial FORTRESS score among the open models compared while keeping benign over-refusal low, and it was tested externally for CBRN, cyber, and loss-of-control capabilities. But everyone in this space knows fine-tuning can strip safety behavior from open weights, and TML ships a fine-tuning platform for this exact model. Their acknowledgment that they are actively studying how safety behavior survives fine-tuning on Tinker is the right thing to say, and it is also the open question that will define whether “safe open weights” is a coherent category at all.

Key Takeaways
- Inkling is Thinking Machines Lab’s first from-scratch, open-weights model: a Mixture-of-Experts transformer with 975B total parameters, 41B active, and a context window up to 1M tokens.
- It was pretrained on 45 trillion tokens spanning text, images, audio, and video, and reasons natively over text, images, and audio without separate encoders.
- A preview of Inkling-Small ships alongside it: a 276B-parameter MoE with just 12B active parameters that matches or beats its larger sibling on several benchmarks thanks to an improved pretraining recipe.
- TML explicitly positions Inkling as a base for customization rather than the strongest overall model, leaning on multimodality, efficient thinking, and Tinker fine-tuning as the differentiators.
- The launch demo shows Inkling fine-tuning itself: it wrote its own training objective and data, ran the job through the Tinker API, evaluated the result, and hot-swapped to its own new weights inside the OpenCode harness.
- The self-fine-tuning target was a lipogram assistant that never uses the letter “e,” a behavior chosen precisely because prompting alone cannot reliably achieve it; the full loop completed in about 27 minutes.
- Controllable thinking effort is a core feature: a setting swept from 0.2 to 0.99 traces a full performance-versus-tokens curve instead of a single benchmark point.
- On Terminal Bench 2.1, Inkling matches Nemotron 3 Ultra’s score at roughly one third of the generated tokens, the release’s flagship efficiency claim.
- Inkling was trained to run inside a variety of coding and agent harnesses, with tool sets and schemas randomized during training to reduce sensitivity to any particular harness.
- On Design Arena’s blinded human-evaluated Agentic Web Dev leaderboard, Inkling scores 1257, among the strongest open-weights models and tied with Claude Opus 4.6.
- Headline benchmark scores at effort 0.99 include SWEBench Verified 77.6%, SWEBench Pro Public 54.3%, Terminal Bench 2.1 63.8%, GPQA Diamond 87.2%, AIME 2026 97.1%, and HLE 29.7% text-only (46.0% with tools).
- Agentic and general scores include MCP Atlas 74.1%, Tau 3 Banking 23.7%, and BrowseComp 77.1% with context management.
- Vision results are strong for an open model: MMMU Pro 73.5%, CharXiv RQ 78.1%, rising to 82.0% when the model uses a Python tool for zooming and cropping during visual reasoning.
- Audio results place it among the strongest open-weights audio models: VoiceBench 91.4%, MMAU 77.2%, and Audio MC 56.6%, well ahead of Qwen3-Omni and Nemotron Nano-Omni on the last.
- The multimodal stack is encoder-free: audio enters as discrete dMel spectrograms and images as 40×40 pixel patches through a four-layer hMLP, both passed through a lightweight embedding layer and processed jointly with text tokens.
- The MoE design largely follows DeepSeek-V3: 256 routed experts plus 2 shared experts per layer, 6 routed experts active per token, with a sigmoid router and auxiliary-loss-free load balancing.
- Attention interleaves sliding-window and global layers at a 5:1 ratio with 8 KV heads, and uses a learned relative positional embedding instead of RoPE, which TML found extrapolates better to long sequences.
- Short convolutions are applied after the key and value projections and on the attention and MLP residual branch outputs, an unusual architectural touch aimed at efficiency and long-context performance.
- Training used a hybrid optimizer strategy, Muon for large matrix weights and Adam for everything else, with weight decay coupled to the square of the learning rate to keep weight magnitudes stable.
- Post-training was bootstrapped with a small SFT phase on synthetic data generated by open-weights models including Kimi K2.5, with the large majority of compute spent on large-scale RL.
- RL was scaled past 30 million rollouts across two long continuous runs, with reasoning performance on a held-out aggregate (AIME, HLE, GPQA, and others) improving log-linearly throughout.
- Effort control was trained by varying the system message and per-token cost across rollouts, teaching the model to modulate its own thinking budget.
- An emergent effect appeared during RL: the chain of thought compressed over training, dropping articles and connectives into a telegraphic style, driven purely by efficiency pressure rather than any targeted reward.
- Inkling was TML’s first major training effort and ran on NVIDIA GB300 NVL72 systems; the lab says future models will push compute scale further across pretraining and RL.
- Calibration was trained directly with RL against proper scoring rules on a large corpus of resolved real-world questions, treating well-placed confidence as a capability rather than a byproduct.
- On ForecastBench without search, Inkling’s Brier Index of 61.1 beats GPT-5.5 (59.1) and Claude Opus 4.8 (54.6), and it stays competitive with search enabled and on Prophet Arena.
- Instruction following was trained with two automated graders working together: a rubric grader scoring against a checklist and a claims grader that verifies each factual claim via agentic web search, improving helpfulness and reducing hallucination simultaneously.
- Abstention-aware rewards on short-form factual QA taught the model to answer when confident and hedge or decline when not, with some prompts explicitly forcing or forbidding hedging so the user’s preference wins.
- Inkling was trained to answer directly on topics subject to censorship, and Cognition’s Propaganda and Censorship Eval found strong censorship non-compliance.
- On FORTRESS, Inkling posts the strongest adversarial refusal score (78.0%) of any compared open-weights model while keeping benign compliance high (95.9%), and scores 98.6% on StrongREJECT.
- Safety testing covered CBRN, cyber, and loss-of-control capabilities plus human-AI threat vectors like sycophancy, vulnerable users, and manipulation, verified by commissioned external testers.
- Inkling is available for fine-tuning on Tinker today with 64K and 256K context options at a 50% limited-time discount, plus a free Inkling Playground chat interface in the Tinker console.
- Full weights are on Hugging Face, including an NVFP4 checkpoint for efficient inference on NVIDIA Blackwell, with API availability via Together, Fireworks, Modal, Databricks, and Baseten and inference support in SGLang, vLLM, TokenSpeed, and llama.cpp.
- TML frames Inkling as the first in a family and as the intended background reasoning model for its previously announced real-time interaction models system.

Detailed Summary

What Inkling Is and Why It Exists

Thinking Machines Lab frames its mission as building AI that extends human will and judgment, and Inkling as the logical next step after shipping the Tinker customization platform, previewing an interaction-focused AI system, and publishing research. Inkling is a Mixture-of-Experts transformer with 975B total and 41B active parameters, a context window up to 1M tokens, and pretraining on 45 trillion tokens of mixed text, image, audio, and video data. The lab is upfront that it is not the strongest model available. The pitch is breadth plus adaptability: a generalist trained across agentic, reasoning, coding, instruction-following, factuality, vision, and audio tasks rather than tuned to dominate one leaderboard, offered with full weights so people can make it their own. It launches with a preview sibling, Inkling-Small, at 276B total and 12B active parameters.

The Self-Fine-Tuning Demo

To demonstrate what customization means, TML asked Inkling to fine-tune itself. Running inside the OpenCode harness with access to Tinker, the model was told to become a lipogram assistant that never uses the letter “e.” Inkling drafted the plan, wrote an objective file with a scoring function (any response containing “e” scores zero), generated synthetic training data, launched a supervised fine-tuning run through the Tinker API, evaluated the checkpoint against its base self, and then staged a self-update so the supervisor relaunched the session on the new weights. The pipeline passed in about 27 minutes, and the updated model answered a test question about launching an LLM without a single “e.” It is a whimsical objective wrapped around a serious primitive: a model autonomously specifying, running, and adopting its own weight updates.
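The scoring rule described above is simple enough to sketch. This is a minimal illustration, not the objective file Inkling actually wrote: the article only says that a response containing “e” scores zero, so the 1.0 score for a clean response, the case-insensitive check, and the names `lipogram_score` and `pass_rate` are all assumptions.

```python
def lipogram_score(response: str, banned: str = "e") -> float:
    """Return 0.0 if the banned letter appears anywhere, else 1.0.

    The 1.0 for a clean response and the case-insensitive match are
    assumptions; the article only states that an "e" scores zero.
    """
    return 0.0 if banned.lower() in response.lower() else 1.0


def pass_rate(responses: list[str], banned: str = "e") -> float:
    """Fraction of responses that avoid the banned letter (0.0 if empty)."""
    if not responses:
        return 0.0
    return sum(lipogram_score(r, banned) for r in responses) / len(responses)


# Hypothetical outputs, used only to exercise the scorer.
samples = [
    "Pick a small task, train, and ship it.",  # no "e" anywhere
    "Launch the model after testing.",         # contains "e"
]
rate = pass_rate(samples)  # one of the two samples passes
```

A binary score like this is easy to verify but gives no partial credit, which is part of why it suits a fine-tuning objective better than a prompt: the checker is exact even when the behavior is hard to elicit.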

Agentic Coding and Tool Use

TML trained Inkling to operate inside many coding and agent harnesses, randomizing tool sets and schemas during training so the model does not overfit to one environment. The release showcases three demos: a one-shot job-application web app that then hosts an embedded browser-use agent operating its own interface; a nine-page, cohesively designed PDF food and travel journal produced from a single editorial prompt with web-verified details; and a server-authoritative multiplayer snake game refined over 40 iterations of feedback from GPT Codex acting as a reviewer. On benchmarks, Inkling posts 77.6% on SWEBench Verified, 54.3% on SWEBench Pro Public, and 63.8% on Terminal Bench 2.1, competitive within the open-weights field, and 1257 on Design Arena’s human-judged web dev leaderboard, in the same band as Claude Opus 4.6.

Controllable Thinking Effort

Rather than reporting a single operating point, TML sweeps Inkling’s effort setting from 0.2 to 0.99 and plots score against mean generated tokens on Terminal Bench 2.1, HLE, and IFBench, with competitors shown at their default settings. The headline result is efficiency: Inkling reaches Nemotron 3 Ultra’s Terminal Bench score at roughly a third of the tokens. The argument is that cost and latency are binding constraints in production, especially for interactive collaboration, so the full cost curve, not the peak score, is what developers should evaluate. Effort can be set from within the agent harness, and the ability was trained by varying system messages and per-token costs across RL rollouts.
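To make the “evaluate the curve, not the peak” argument concrete, here is a small sketch of how a developer might pick an operating point from an effort sweep. The curve values below are invented placeholders, not numbers from the announcement, and `EffortPoint` and `cheapest_point_reaching` are hypothetical names.

```python
from typing import NamedTuple, Optional


class EffortPoint(NamedTuple):
    effort: float       # effort setting used for the run
    mean_tokens: float  # mean generated tokens per task
    score: float        # benchmark score at that setting


def cheapest_point_reaching(
    curve: list[EffortPoint], target_score: float
) -> Optional[EffortPoint]:
    """Return the lowest-token point whose score meets the target, if any."""
    eligible = [p for p in curve if p.score >= target_score]
    if not eligible:
        return None
    return min(eligible, key=lambda p: p.mean_tokens)


# Invented sweep, for illustration only.
curve = [
    EffortPoint(0.2, 8_000, 40.0),
    EffortPoint(0.5, 15_000, 52.0),
    EffortPoint(0.8, 27_000, 58.0),
    EffortPoint(0.99, 45_000, 62.0),
]

choice = cheapest_point_reaching(curve, target_score=50.0)
```

With these placeholder numbers the function selects the 0.5 setting, since it is the cheapest point that clears a target of 50. A single reported score at maximum effort would hide that option entirely.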

Native Multimodality Without Encoders

Inkling is designed to serve as the background reasoning model for TML’s interaction models system, which requires real-time voice and vision collaboration. The multimodal components are trained from scratch with an encoder-free architecture: audio arrives as discrete dMel spectrograms and images as 40×40 pixel patches through a four-layer hMLP, both mapped through a lightweight embedding layer and processed jointly with text. The model transcribes speech, follows spoken instructions, reasons over long recordings, and answers questions about charts and diagrams, optionally using a Python tool to zoom and crop images mid-reasoning. Scores like 91.4% on VoiceBench and 82.0% on CharXiv RQ with Python place it among the strongest open-weights multimodal models, though still behind Gemini 3.1 Pro.
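For a rough sense of what 40×40 patches mean for sequence length, the sketch below counts patches for a given image size. It assumes the image is padded up to a multiple of the patch size and that each patch becomes one position in the sequence; the article does not say how Inkling handles resizing, padding, or patch-to-token mapping, so treat both as assumptions.

```python
import math

PATCH = 40  # patch side length in pixels, from the article


def patch_grid(height: int, width: int, patch: int = PATCH) -> tuple[int, int]:
    """Rows and columns of patches, assuming padding up to a full patch."""
    return math.ceil(height / patch), math.ceil(width / patch)


def patch_count(height: int, width: int, patch: int = PATCH) -> int:
    """Total number of patches covering the image."""
    rows, cols = patch_grid(height, width, patch)
    return rows * cols


# Raw values per patch, assuming three color channels (an assumption).
VALUES_PER_PATCH = PATCH * PATCH * 3

grid_720p = patch_grid(720, 1280)    # 720 / 40 = 18 rows, 1280 / 40 = 32 columns
count_720p = patch_count(720, 1280)  # 18 * 32 = 576 patches
```

Under those assumptions a 1280×720 frame costs a few hundred sequence positions, which helps explain why an optional zoom-and-crop tool is useful for fine chart detail.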

Epistemics: Calibration, Forecasting, and Censorship Resistance

TML groups calibration, instruction following, and censorship resistance under the banner of epistemics. Calibration was trained with RL against proper scoring rules on resolved real-world questions, and it shows: Inkling’s ForecastBench Brier Index of 61.1 without search beats GPT-5.5 and Claude Opus 4.8, and its Prophet Arena score sits close to the frontier. Instruction following used two complementary automated graders, a rubric checklist and a claims grader that verifies factual assertions through agentic web search, so recall-spraying to hack rubrics gets penalized by the factuality check. Targeted abstention-aware QA datasets taught the model to say “I don’t know” or give hedged best guesses when appropriate, while still complying when a user demands a forced guess. Finally, the model was trained to answer directly on censorship-prone topics, with Cognition’s Propaganda and Censorship Eval finding strong non-compliance with censorship patterns.
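The phrase “proper scoring rule” is doing real work in that paragraph, so a short sketch may help. The Brier score is the standard example: the mean squared error between forecast probabilities and binary outcomes, where lower is better. The Brier Index that ForecastBench reports is evidently a rescaled, higher-is-better variant whose exact formula is not given here, so this shows only the underlying score. The forecasts below are invented.

```python
def brier_score(forecasts: list[float], outcomes: list[int]) -> float:
    """Mean squared error between probabilities and 0/1 outcomes (lower is better)."""
    if len(forecasts) != len(outcomes):
        raise ValueError("forecasts and outcomes must have the same length")
    if not forecasts:
        raise ValueError("need at least one forecast")
    return sum((p - o) ** 2 for p, o in zip(forecasts, outcomes)) / len(forecasts)


# Invented example: four resolved yes/no questions.
outcomes = [1, 0, 1, 0]

hedged = [0.8, 0.2, 0.7, 0.3]         # right direction, honest uncertainty
overconfident = [1.0, 1.0, 1.0, 0.0]  # certain every time, wrong once
shrug = [0.5, 0.5, 0.5, 0.5]          # never commits

hedged_score = brier_score(hedged, outcomes)                # about 0.065
overconfident_score = brier_score(overconfident, outcomes)  # 0.25
shrug_score = brier_score(shrug, outcomes)                  # 0.25
```

One confidently wrong answer costs the overconfident forecaster as much as never committing at all, which is the pressure that makes a scoring rule like this usable as an RL reward for calibration.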

Safety for an Open-Weights Release

Inkling was trained to an internal behavioral spec across all modalities and then checked by commissioned external safety testers. Evaluations covered dangerous capabilities (CBRN, cyber, loss of control) and human-AI threat vectors including sycophancy, vulnerable users, and harmful manipulation. On FORTRESS, which pairs adversarial harmful requests with benign look-alikes, Inkling posts the strongest adversarial score among the compared open models (78.0%) without collapsing on the benign side (95.9%), and it scores 98.6% on StrongREJECT. TML acknowledges the open question hanging over every open-weights release: how safety behavior holds up under fine-tuning, which it says it is actively studying on Tinker.

Architecture and Training Recipe

The MoE layout follows DeepSeek-V3: 256 routed experts and 2 shared experts per layer with 6 routed experts active per token, a sigmoid-based router, and auxiliary-loss-free load balancing. Attention interleaves sliding-window and global layers 5:1 with 8 KV heads, and positions are encoded with a learned relative positional embedding that TML found outperforms and out-extrapolates RoPE. Short convolutions appear after the key and value projections and on residual branch outputs. Optimization was hybrid, Muon for large matrices and Adam elsewhere, with hyperparameter schedules drawn from the lab’s modular manifolds research and weight decay coupled to the square of the learning rate to keep weight norms stable. Post-training bootstrapped from a small SFT phase on synthetic data from open models including Kimi K2.5, then spent the bulk of compute on large-scale RL. Everything ran on NVIDIA GB300 NVL72 systems.
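The routing scheme in that paragraph can be sketched in a few lines. This is a toy version of sigmoid top-k routing with bias-based, auxiliary-loss-free load balancing in the style DeepSeek-V3 describes; the article only says Inkling’s layout follows that design, so the gate normalization, the bias step size, and the function names here are assumptions rather than Inkling’s actual implementation.

```python
import math
import random

NUM_ROUTED = 256  # routed experts per layer (from the article)
TOP_K = 6         # routed experts active per token (from the article)
# The 2 shared experts run for every token, so they need no routing.


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def route(logits: list[float], bias: list[float], top_k: int = TOP_K) -> dict[int, float]:
    """Pick top_k experts for one token and return {expert_index: gate_weight}.

    Selection ranks experts by sigmoid affinity plus a balancing bias;
    gate weights use the unbiased affinities, normalized over the chosen set.
    """
    affinity = [sigmoid(z) for z in logits]
    ranked = sorted(range(len(logits)), key=lambda i: affinity[i] + bias[i], reverse=True)
    chosen = ranked[:top_k]
    total = sum(affinity[i] for i in chosen)
    return {i: affinity[i] / total for i in chosen}


def update_bias(bias: list[float], load: list[int], step: float = 0.001) -> list[float]:
    """Lower the bias of overloaded experts and raise it for underloaded ones."""
    mean_load = sum(load) / len(load)
    new_bias = []
    for b, n in zip(bias, load):
        if n > mean_load:
            new_bias.append(b - step)
        elif n < mean_load:
            new_bias.append(b + step)
        else:
            new_bias.append(b)
    return new_bias


# Toy batch: random router logits stand in for a real layer's outputs.
rng = random.Random(0)
bias = [0.0] * NUM_ROUTED
load = [0] * NUM_ROUTED
for _ in range(1000):
    logits = [rng.gauss(0.0, 1.0) for _ in range(NUM_ROUTED)]
    for expert in route(logits, bias):
        load[expert] += 1
bias = update_bias(bias, load)
```

The design point is that the bias only affects which experts are selected, never how their outputs are weighted, so balancing the load does not add a competing term to the training loss.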

RL at Scale and the Emergent Compression of Thought

TML scaled asynchronous RL past 30 million rollouts across two long continuous runs, with performance on a held-out aggregate of reasoning evals improving log-linearly the whole way. Along the way an unplanned behavior emerged: the chain of thought became progressively more concise, shedding grammatical overhead into a telegraphic style (“We need to understand” becomes “We need determine”) while remaining comprehensible and leaving final answers unaffected. No reward targeted this; token efficiency pressure alone drove the compression, echoing an observation Cognition made while training SWE-1.7. It is a vivid example of optimization discovering its own shorthand.
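“Improving log-linearly” means the score rises by a roughly fixed amount each time the rollout count is multiplied by a constant. A minimal sketch of checking that claim is an ordinary least-squares fit of score against log10(rollouts). The data points below are invented to show the mechanics; TML’s actual curve is in the announcement, not here, and `fit_log_linear` is a hypothetical helper.

```python
import math


def fit_log_linear(rollouts: list[float], scores: list[float]) -> tuple[float, float]:
    """Least-squares fit of score = intercept + slope * log10(rollouts)."""
    xs = [math.log10(n) for n in rollouts]
    count = len(xs)
    mean_x = sum(xs) / count
    mean_y = sum(scores) / count
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, scores))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Invented checkpoints: (rollouts so far, held-out aggregate score).
rollouts = [1e5, 1e6, 1e7]
scores = [40.0, 50.0, 60.0]

intercept, slope = fit_log_linear(rollouts, scores)
# With this made-up data, each 10x increase in rollouts adds `slope` points.
```

The practical reading of a log-linear trend is that gains keep coming but each additional point costs multiplicatively more rollouts, which is consistent with the lab’s note that future models will push compute further.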

Inkling-Small

The preview of Inkling-Small is arguably the sleeper story: with 12B active parameters against Inkling’s 41B, it matches or exceeds the larger model on a surprising number of benchmarks, including GPQA Diamond (88.3% vs 87.2%), IFBench (83.4% vs 79.8%), and CharXiv RQ with Python (83.4% vs 82.0%). TML attributes this to pretraining data and recipe improvements made after the big model trained, with both models sharing the same post-training stack. The clearest gaps favoring big Inkling are factuality (SimpleQA 43.9% vs 20.9%), Terminal Bench, and Tau 3 Banking. Full weights for Inkling-Small will be released once testing finishes, and its cost and latency profile targets high-volume workloads like coding, LLM grading, and synthetic data generation.

Availability and the Ecosystem Play

Inkling is on Tinker today with 64K and 256K context options at a limited-time 50% discount, plus a free Inkling Playground chat interface with integrated web search in the Tinker console so developers can get a feel for the model before committing to a run. The cookbook gained native Inkling support and three new audio recipes, and a new tml-renderer handles chat templates, tool calls, reasoning content, and multimodal inputs. Deployment partnerships span Together, Fireworks, Modal, Databricks, and Baseten for APIs; RadixArk for SGLang and Miles; Inferact for vLLM; Lightseek for TokenSpeed; Unsloth for llama.cpp; and Hugging Face for transformers integration. Full weights are on Hugging Face in both the original checkpoint and an NVFP4 checkpoint for NVIDIA Blackwell inference.

Notable Quotes

“Our mission is to build AI that extends human will and judgment.”
Thinking Machines Lab, opening the Inkling announcement

The company’s north star, and the lens through which the whole release (customization, calibration, open weights) is framed.

“Inkling is not the strongest overall model available today, open or closed. Instead, a combination of qualities makes it a good open-weights base for customization: multimodal capabilities, efficient thinking, and availability on Tinker for fine-tuning.”
Thinking Machines Lab, positioning the release

A rare piece of launch-day honesty from a frontier lab, and the strategic thesis of the whole release.

“Picking the right base model to fine-tune is a qualitative judgment that combines measurable benchmarks with the unique feel of a model that comes from playing with it.”
Thinking Machines Lab, on why the Inkling Playground exists

An argument that vibes are data, from the lab that built a playground into a fine-tuning console.

“Cost and latency are often binding constraints in real-world applications, and low latency in particular is crucial for enabling collaboration and improvement through iteration.”
Thinking Machines Lab, on controllable thinking effort

The case for evaluating models on their full effort-versus-performance curve instead of a single benchmark point.

“A model that’s confident in every answer it gives, including when it’s missing info and confabulates, forces the user to double-check everything.”
Thinking Machines Lab, on why calibration was a training target

The clearest one-line justification for treating calibrated uncertainty as a capability rather than a nicety.

“Together, the two graders improve helpfulness and reduce hallucination at the same time, rather than trading one for the other.”
Thinking Machines Lab, on pairing a rubric grader with a web-searching claims grader

A neat solution to rubric hacking: verify every claim with agentic search so spraying plausible facts stops paying.

“Safety is crucial for open-weights models. We’re continuing to study safety behavior and capability uplift in customizable models, including how safety behavior is impacted by fine-tuning on Tinker.”
Thinking Machines Lab, on the open question of fine-tunable safety

The acknowledgment that safety trained into open weights must survive the very customization the product sells.

“Inkling is just the start: our first release in a model family we will continue to build on.”
Thinking Machines Lab, on the roadmap

Together with the GB300 compute note, a clear signal that larger and stronger family members are coming.

Read the full announcement, including the interactive demos, effort curves, and complete benchmark tables, on the Thinking Machines Lab blog.

Related Reading
- Thinking Machines Lab: the lab’s official site, with its research blog and the Tinker fine-tuning platform behind this release.
- Mira Murati (Wikipedia): background on the former OpenAI CTO who founded Thinking Machines Lab.
- Mixture of experts (Wikipedia): a primer on the sparse architecture that lets a 975B model run with only 41B active parameters.
- Brier score (Wikipedia): the proper scoring rule behind the ForecastBench and Prophet Arena calibration results discussed above.
- The launch announcement on X: the thread where Thinking Machines Lab introduced Inkling to the world.

July 15, 2026

The Next 3 Years of AI, According to Steve Jurvetson: Moore’s Law, Superintelligence Odds, Elon Musk’s Operating Principles, and Where the Legendary SpaceX and Tesla Investor Is Betting Next
Steve Jurvetson has spent 30 years funding the future before it was a category: an early check into SpaceX when space was not a venture sector, Tesla before electric cars were taken seriously, and now a portfolio spanning fusion, analog AI chips, and epigenetic editing at his firm Future Ventures. In this fireside chat he lays out what the next three years of AI actually look like, the three principles he has learned from working alongside Elon Musk for nearly three decades, the question he uses to separate missionary founders from opportunists, and why he thinks alignment of frontier AI systems may simply not be possible.

TLDW

Jurvetson argues the 130-year exponential in compute per dollar (Ray Kurzweil’s abstraction of Moore’s Law from his book The Age of Spiritual Machines) will keep running for at least three more years, carried by analog and custom AI silicon, and that this compounding is what makes startups and disruption possible at all. His gut says the next big leap will be “architecturally variant”: a new generation of labs going back to DeepMind’s founding premise of reinforcement learning, continuous learning, and novelty-seeking goal functions rather than bigger LLMs. He relays Anthropic co-founder Jack Clark’s 30 percent odds of superintelligence within a year but notes the crucial missing piece is that humans still set every goal. Adoption will be wildly uneven: anything made of atoms (cars, robots) switches over glacially, while creative work and white-collar categories like call centers (roughly 1 percent of US GDP) flip almost instantly. From Musk he draws three lessons: insane focus and saying no, maniacal attention to the cycle time of learning loops (Tesla gathers more AI training data every 4 days than Waymo has in its entire history), and being a magnet for talent by selling a grander mission. He explains Future Ventures’ current bets (fusion, free diagnostics via phone, slaughter-free meat, epigenetic editing, critical minerals, analog in-memory compute), tells solo founders their 30-day plan is to find a co-founder, predicts a turbulent transition to abundance, doubts Neuralink can keep pace with AI, dismisses Penrose’s quantum consciousness argument, and frames the post-work question with Man's Search for Meaning: humans need symbolic immortality, not just employment.

Thoughts

The most load-bearing claim in this conversation is not about scaling laws, it is about architecture. Jurvetson is telling you where the smart contrarian money is looking: away from ever-larger language models and back toward reinforcement learning agents with continuous learning and self-generated goals, the original DeepMind thesis that got shelved when LLMs took off. His framing of the open problem is unusually precise. The recursive self-improvement loops everyone is excited about are real, but every one of them is still human-directed. The goal-setting layer, what he calls the selection pressure of the evolutionary algorithm, is the “thin veneer of activity” AI does not yet do, and it happens to be the layer where superintelligence either does or does not arrive. That is a much sharper way to track AGI progress than benchmark scores: watch who cracks autonomous goal formation, not who tops a leaderboard.

Almost everything else Jurvetson says reduces to a single metric: the cycle time of the learning loop. It is his explanation for Musk’s edge (launch cadence, the Tesla fleet as a data-collection machine), his filter for which industries flip fast (bits iterate at machine speed, atoms are stuck with 11-to-12-year car replacement cycles and FDA timelines), and even his bear case on Neuralink, which he has invested in. Biology cannot iterate at synthetic speed, so the substrate that learns fastest wins. Once you see the pattern, it becomes a genuinely useful lens for evaluating any company, career, or technology: ask how fast the loop spins, not how impressive the current artifact is.

The aside that deserves the most attention is his flat statement that mechanistic interpretability will not bear fruit and that control and alignment of a cutting-edge system is not possible. His reasoning is structural, not rhetorical: anything produced by an iterative algorithm run billions of times (evolution, neural network training) is inherently inscrutable, and it will always be easier to build a new intelligence than to reverse engineer one you already made. He swaps “teenager” for “AI” whenever he thinks about control, which is funny until you notice he is one of the most connected investors in the Musk orbit saying the safety agenda rests on a false premise. Sitting that next to the 30 percent superintelligence odds he cites from Jack Clark produces an uncomfortable arithmetic that nobody on stage follows to its conclusion.

For builders, the practical gold is the 50-year question. Ask a founder what their business looks like in 50 years: the opportunist laughs at the question, the missionary is relieved someone finally asked. Paired with his other filters (if only two out of ten people think your idea is crazy it is not bold enough, and a good business is one that could not have been started three years ago), it doubles as a hiring screen and a self-diagnostic. And his 30-day plan for a solo founder is refreshingly unglamorous: do not build the MVP, do not pitch investors, go persuade one person to give up their job and join you. If you cannot recruit a co-founder, that is the market’s first answer about your idea.

Key Takeaways
- Jurvetson invested early in SpaceX and Tesla precisely because space and automotive were not venture categories at all; a software-centric systems engineering approach applied to a sleepy industry that has not changed in decades unlocks enormous value, and that playbook is now rippling through every industry.
- The Kurzweil curve plots 130 years of compute per dollar across five substrates (mechanical, relay, vacuum tube, discrete transistor, integrated circuit) and shows a 10,000-billion-billion-fold improvement (a factor of 10^22); Jurvetson calls it the most important thing ever graphed.
- Customers buy compute capacity and memory, not transistors, and both have been “on rails” for 130 years; the default prediction for the next three years is simply that the curve keeps going.
- When an incumbent declares Moore’s Law dead, it usually signals they are losing their business to someone new, as Intel was to Nvidia 15 years ago.
- Analog chips and customized AI silicon that do discrete matrix multiply-and-add extremely efficiently will carry the mantle of Moore’s Law over the next three years.
- Without exponential technological change there would be no startups: if business is predictable, the big get bigger and incumbents block new entrants; disruption is almost always computationally based.
- Over the next three years AI ripples through energy, agriculture, and construction: three enormous industries that are growing as a percentage of GDP and are the least digitized on the planet, with healthcare close behind.
- His gut says the next driver will be architecturally variant, possibly subsuming today’s models the way mixture of experts subsumes other architectures or massively parallel diffusion models reinterpret the transformer.
- A whole new generation of neural labs is returning to the founding premise of DeepMind: reinforcement learning with continuous learning, let loose on the internet’s data sets, hunting for the algorithm that bootstraps intelligence.
- The open question for these systems is the goal function: what plays the role of evolutionary selection pressure? Candidates include understanding the universe (the xAI mission) or a novelty-seeking algorithm that uses new discoveries as its measure of progress.
- Jack Clark, co-founder of Anthropic, gives roughly 30 percent odds that superintelligence arrives within a year; Jurvetson declines to put odds on it himself and admits “I do not know” is the honest answer.
- Today’s self-improving AI loops (automated verification, hyperparameter adjustment between training runs, AI-mediated experimentation) are real but still human-directed; goal setting remains the thin veneer AI does not do, and it may be the most important layer.
- Human intelligence was bootstrapped on top of reactive limbic systems and emotional centers with cortex layered on top; it is an open philosophical question whether AI systems need to recapitulate that functional specialization to take on purpose and meaning.
- Anything involving atoms switches over slowly: fully autonomous vehicles are inevitable (every car, train, and airplane), but people keep cars 11 to 12 years, so the physical swap-out cycle makes the transition feel glacial.
- Physical robotics faces the same constraint: making a billion robots takes time even with recursive manufacturing techniques.
- The domains that flip like wildfire are the ones we held as uniquely human: creative arts, moviemaking, and imagery came first, which Jurvetson finds somewhat shocking.
- Call centers represent roughly 1 percent of US GDP and can switch over almost entirely and almost instantly; white-collar work generally has no physical swap-out cycle to slow it down.
- People will increasingly prefer AI to human interactions when the AI is better: studies of physician bedside manner and customer service already show AIs doing a better job with emotional connection than humans.
- Musk principle one is an insane ability to focus: running many companies forces ruthless prioritization, and he says no to anything that is not mission-critical right now, including a Craig Venter brainstorm on terraforming Mars because “none of this stuff on Mars matters” until Starship flies.
- Musk principle two, the most important: maniacal focus on the cycle time of innovation, the core learning loop, whether launch cadence or fleet data; Tesla cameras gather more AI training data every 4 days than Waymo has collected in its entire history, because every vehicle collects data whether or not the customer paid for full self-driving.
- Musk principle three: being a magnet for talent, screening for mastery by drilling into engineering crises a candidate actually solved rather than leaning on credentials (which are often an albatross), and framing the company as something grander (sustainable energy, multi-planetary humanity, understanding the universe) so the best people want to join.
- Jurvetson filters founders with one question: what does your business look like in 50 years? Opportunists chuckle at the absurdity; missionaries are relieved and finally tell you what has been driving them all along. He passes on the ones who laugh.
- The best startups hold two things in tension simultaneously: an audacious 50-to-500-year vision and a concrete plan to iterate with real customers over the next three years, chaining backward from the future to what must be built now.
- The perpetual surprise of great companies is expanding option value: autonomous driving was nowhere in Tesla’s founding plan, and Starlink, direct-to-cell, and orbital data centers were not on SpaceX’s dance card even five years ago. Exploring the option space beats purposeful ten-year planning.
- Future Ventures invests in things unlike anything they have seen before yet adjacent to what they know, ideally companies that are literally one of a kind.
- Current bets include nuclear fusion and subcritical fusion that avoids NRC regulation, because energy is the third bottleneck for AI after talent and compute.
- Other 500-year-problem bets: free healthcare via a cell phone (all diagnostics as a free global service, probably launching outside the US to bypass FDA and insurance), slaughter-free meat via cellular agriculture and mycelium, and construction, where labor productivity has been flat for 30 years.
- Recent investments span epigenetic editing (the software of biology rather than the firmware of the genome, applied to crops, pesticides, and human health), critical minerals from deep sea mining to copper refining, and reshoring US industrial capacity.
- Three separate analog AI chip investments approach the same goal from different angles, including Mythic’s in-memory compute doing 8-bit multiplication in a single transistor, each chasing 100X and then another 100X reduction in power per calculation.
- The portfolio is roughly 40 percent life sciences and 60 percent IT, deliberately hunting the weird edge cases that fall through the cracks of traditional pharma VC: organ harvesting for transplant, a male birth control pill, dramatically improved IVF.
- Old industries with no new entrants are the best targets: the four largest tunnel boring companies competing with the Boring Company were all started in the 1800s.
- The 30-day plan for a single person with an idea: find a co-founder. Great startups tend to have a dynamic duo at the founding (Jobs and Wozniak, Sergey Brin and Larry Page, Larry Ellison and Bob Miner), and persuading one person to quit their job for your mission is the first real test of the idea.
- A founding pair with diverse backgrounds and mutual respect sets the culture for everyone hired afterward and creates cognitive diversity that ripples through the whole firm.
- Calibrate boldness by the crazy ratio: if 100 percent of people say your idea is crazy, take the feedback; nine out of ten is pretty good; if only two out of ten think it is crazy, it is not bold enough. Also ask whether the business could have been started three years ago; if yes, that is a bad sign.
- Co-founders most often meet at universities, one of the few places where people cross academic disciplines; breakthrough innovation happens at the interstices between formally discrete fields, and LLMs are exceptionally good at exactly that cross-domain translation, opening a fountainhead of idea discovery.
- Roughly 19 percent of global employment involves driving vehicles, and that work is going away, just more slowly than people imagine.
- Humans have a fundamental desire for symbolic immortality: contributing something that outlasts our brief time here, whether children, books, philanthropy, or companies. Accumulated cultural knowledge, not biology, is the primary vector of human evolutionary progress.
- There is no peaceful path from full employment to no employment: passing through 30, 40, 50 percent unemployment will be turbulent, and no politicians are taking a long-term perspective on it.
- On Neuralink (which he invested in): expanding the sensory periphery is very doable (higher data rates, restoring hearing and spinal function, seeing more wavelengths), but upgrading core intelligence requires reverse engineering an inscrutable iterated system, and biology’s FDA-and-wetware timescales cannot keep up with synthetic learning loops.
- Any product of an iterative algorithm run billions of times (evolution, neural networks, genetic programming) is inherently inscrutable; Jurvetson doubts mechanistic interpretability will bear fruit and does not think control or alignment of a cutting-edge AI system is possible, likening it to mind-controlling a teenager.
- On Penrose’s quantum consciousness argument: there is no clear mechanism and no evidence of quantum processes in the brain, and arguments that consciousness requires our specific substrate are uncompelling; machines may one day have consciousness, just not necessarily human consciousness, the same way computer memory is real memory without being human memory.

Detailed Summary

Betting on Sectors That Do Not Exist Yet

Asked what he saw in SpaceX that other investors missed, Jurvetson flips the question: there were almost no investors even considering space, just as automotive and nuclear energy were not venture sectors. The bet was on Elon Musk, whom he has known for 29 years and backed across all his companies (“and his cousins, too”), and on a thesis that has since crystallized: a software-centric systems engineering approach applied to a sleepy industry that has not changed in decades unlocks extraordinary value. Aerospace and automotive proved it, and the same conversion of industrial low-margin businesses into information businesses is now playing out across the economy.

The 130-Year Compute Curve and the Next 3 Years

Jurvetson polls the room on Kurzweil’s famous graph, first published around 1999, and finds only a quarter have seen what he calls the most important thing ever graphed: five successive technology substrates delivering a 10,000 billion billion X improvement in the computation a dollar buys, sustained over 130 years. Moore’s Law is just the most recent refraction of a longer, almost cosmological trend that transcends the dramas of individual companies. His baseline prediction for the next three years is that the curve keeps going, carried by analog chips and custom AI silicon optimized for matrix math, and he notes that when a company like Intel declares the end of Moore’s Law, it usually means they are losing to someone new, as they did to Nvidia. The deeper point: exponential technological change is the precondition for startups existing at all, because predictable business favors incumbents. AI is the most intense crucible of compute-centric innovation yet, and over the next three years it flows into energy, agriculture, construction, and healthcare, the largest and least digitized sectors.
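
To make the scale of that curve concrete, here is a back-of-the-envelope sketch in Python. It uses only the two figures quoted above (a 10,000 billion billion X gain over 130 years) and assumes the improvement was a smooth exponential, which the real curve is not. The function name `implied_growth` is ours, purely for illustration.

```python
import math


def implied_growth(total_improvement, years):
    """Return (doublings, years per doubling, average annual growth rate).

    Assumes a single smooth exponential across the whole period; this is a
    simplification for illustration, not a claim about the real curve.
    """
    if total_improvement <= 1 or years <= 0:
        raise ValueError("need an improvement factor > 1 and a positive span")
    doublings = math.log2(total_improvement)
    years_per_doubling = years / doublings
    annual_growth = total_improvement ** (1 / years) - 1
    return doublings, years_per_doubling, annual_growth


# Figures as quoted in the summary: 10,000 x billion x billion, over 130 years.
improvement = 10_000 * 1e9 * 1e9  # 1e22
doublings, years_per_doubling, annual_growth = implied_growth(improvement, 130)

print(f"doublings: {doublings:.1f}")
print(f"years per doubling: {years_per_doubling:.2f}")
print(f"average annual growth: {annual_growth:.1%}")
```

Under that smoothing assumption, the quoted figures work out to about 73 doublings, or roughly one doubling every 1.8 years, which is an average gain of about 48 percent a year in the computation a dollar buys.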

Architecturally Variant: The Return of Reinforcement Learning

Pressed on what technology drives the next wave (better LLMs, world models, robotics), Jurvetson shares a gut feeling he stresses he has not yet invested in: something architecturally variant that may subsume today’s models. He points to a new generation of neural labs returning to DeepMind’s founding premise, reinforcement learning, which was set aside when LLMs took off. The open design problem is the goal function: what is the multi-decade agentic drive, the selection pressure, the definition of success beyond reproductive fitness? He floats understanding the universe (the Grok and xAI framing) and novelty-seeking algorithms that treat new discoveries as progress. The question these labs chase is whether a single reinforcement learning algorithm with continuous learning, let loose on the internet’s data, could bootstrap intelligence. He adds a caution about today’s chatbots: we ascribe consciousness and meaning where there is none. “There’s no light on inside,” at least for now.

Superintelligence Odds and the Missing Goal-Setting Layer

On whether self-directed, goal-setting AI arrives within three years, Jurvetson cites Jack Clark of Anthropic giving 30 percent odds of superintelligence next year, which he finds fun mostly because at least someone put a stake in the ground. The recursive self-improvement debate is live, but he insists on a distinction: the huge improvements in the current self-improving loop (automated verification, hyperparameter tuning between runs, AI-mediated experimentation) are all still directed by humans. Goal setting remains human, and while that may be only a thin veneer of remaining activity, it is arguably the most important part, and nobody is sure how the transition happens. It may require recapitulating the brain’s functional specialization, the limbic-then-cortex layering that produced our bootstrapped consciousness. His honest answer: he does not know and does not even have odds, because three years out is genuinely hard to predict.

Atoms Move Slowly, Bits Sweep Like Wildfire

The gap between what the technology can do and how we use it is governed by physics and replacement cycles. Fully autonomous vehicles are, to him, obviously inevitable for everything that moves on Earth, yet cars stay on the road 11 to 12 years, so the switchover feels glacial; a billion robots likewise take time to manufacture. What flips fast is the world of bits, and strangely it started with what we considered most human: creative arts, movies, and images. White-collar work follows because there is no physical swap-out cycle: call centers, about 1 percent of US GDP, can convert almost overnight. And people will increasingly prefer the AI when it is better, showing more emotional understanding and better reading of the situation, something already visible in comparisons of physician bedside manner and customer service quality.
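
The swap-out arithmetic can be sketched in a few lines of Python. This is a toy model under loudly stated assumptions that are ours, not Jurvetson's: a constant fleet size, vehicles evenly spread across ages, each vehicle retired at a fixed lifetime (we use 11.5 years, the midpoint of the 11 to 12 years quoted above), and a fixed share of replacements being autonomous from day one. The function `fleet_share_converted` and its parameters are hypothetical.

```python
def fleet_share_converted(years_elapsed, vehicle_lifetime_years=11.5,
                          new_sales_share=1.0):
    """Fraction of the fleet that is autonomous after `years_elapsed`.

    Toy model: constant fleet, uniform age distribution, fixed lifetime.
    `new_sales_share` is the assumed share of replacement vehicles that
    are autonomous (1.0 means every new vehicle sold).
    """
    if vehicle_lifetime_years <= 0:
        raise ValueError("vehicle lifetime must be positive")
    if not 0.0 <= new_sales_share <= 1.0:
        raise ValueError("new_sales_share must be between 0 and 1")
    # Share of the fleet that has been retired and replaced so far.
    replaced = min(1.0, max(0.0, years_elapsed) / vehicle_lifetime_years)
    return new_sales_share * replaced


for years in (1, 3, 5, 11.5):
    share = fleet_share_converted(years)
    print(f"after {years} years: {share:.0%} of the fleet")
```

Even in the most generous case, where every new vehicle sold is autonomous starting today, this model converts only about a quarter of the fleet in three years. Nothing comparable applies to a call center, where the replacement is software rather than hardware.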

Three Principles from Working with Elon Musk

Jurvetson opens with humility (even Maye Musk cannot explain how Elon became Elon, and the books piling up on his bedside table may not have been written by humans), but offers three observations from close range. First, an insane ability to focus. Running multiple companies paradoxically helps: nobody questions Elon skipping a holiday party, and he says no to fascinating distractions, including Jurvetson’s attempt to connect him with Craig Venter to brainstorm terraforming Mars with gene sequencers. Musk’s answer: none of it matters until Starship flies. Second, and even more important, a maniacal focus on the cycle time of innovation: how fast the core learning loop runs, whether launch cadence or fleet learning. The Tesla data flywheel is the exemplar: every car collects training data whether or not the owner paid for FSD, so Tesla gathers more data every 4 days than Waymo has in its history. Third, a well-honed talent stack: pattern recognition that ignores credentials (often an albatross), drills candidates on the engineering crises they actually navigated to test for real mastery, and wraps the company in a mission grand enough (sustainable energy, multi-planetary life, understanding the universe) that the best people want in, which compounds because great people attract great people.

The 50-Year Question and Expanding Option Value

How do founders stay true to a mission when 99 percent of the world says it is too early? Jurvetson admits selection bias: for 30 years he has tried to back only people with a sincere, almost messianic mission rather than arbitrage-seeking opportunists. His filter is to ask what the business looks like in 50 years. Opportunists laugh (“I’ll be on my third startup by then”); the best founders are relieved to finally unload the dream they have been hiding because “colonizing Mars is an uninvestable proposition” as a day-one pitch. The best startups pair an audacious 50-to-500-year vision with a plausible path of customer iteration over the next three years, chaining backward from the future. What still surprises him is how the option value of frontier companies keeps expanding: autonomous driving was not in Tesla’s founding plan at all, and SpaceX kept unfolding from cheap launch to Starlink to direct-to-cell to orbital data centers, none of which was on the dance card five years ago. Exploring the light cone of possibilities beats designing a ten-year plan.

Where Future Ventures Is Betting Now

The firm looks for companies unlike anything it has seen before yet adjacent to familiar ground, targeting problems that will obviously be solved 500 years from now. In energy: multiple fusion investments plus subcritical fusion that sidesteps NRC regulation, because energy is the third bottleneck for AI after people and compute. In health: free diagnostic healthcare delivered by cell phone as a global free service, likely launched outside the US to bypass FDA and reimbursement. In food: slaughter-free meat via cellular agriculture and mycelium. In construction: still looking, after trying and failing a few times in an industry where labor productivity has been flat for 30 years. Recent themes include epigenetic editing (the software of biology rather than the firmware of the genome, spanning crop health, pesticides, herbicides, and human health), critical minerals and metals from deep sea mining to copper refining as part of reshoring, and three separate analog AI chip bets, including Mythic’s in-memory compute doing 8-bit multiplication in a single transistor, each chasing successive 100X reductions in power per calculation. The mix runs about 40 percent life sciences, 60 percent IT, with a taste for the weird edge: organs grown for transplant, a male birth control pill, radically improved IVF. His favorite hunting ground is old, crappy industries with no new entrants, like tunnel boring, where the Boring Company’s four largest competitors were founded in the 1800s.

Advice for Founders: Find Your Batman and Robin

His 30-day plan for a single person with an idea is not an MVP or a pitch deck: find a co-founder. Startups tend to be founded by dynamic duos (Jobs and Wozniak, Sergey Brin and Larry Page, Larry Ellison and the lesser-known Bob Miner), and a pair with diverse backgrounds and mutual respect creates a rapid iteration loop and sets the cultural template for every future hire. Persuading one person to quit their job for your crazy idea is the first proof the mission can recruit. On calibrating craziness: if literally everyone thinks the idea is crazy, take the feedback; nine out of ten is pretty good; only two out of ten means it is not bold enough, because obvious ideas get done by others. Ask whether the business could have been started three years ago; the right answer is no. Co-founders most often meet at universities, where students (unlike professors in their stovepipes) cross-pollinate between academic disciplines, and breakthrough innovation lives at those interstices. As an aside, he notes LLMs excel at exactly this translation between domains, opening a new fountainhead of idea discovery we are only beginning to tap.

When Machines Do Everything: Meaning, Abundance, and Turbulence

Asked the closing question (when machines do everything, what is the meaning of life?), Jurvetson starts with scale: roughly 19 percent of global employment is driving vehicles, and it is going away. But humans want meaningful work, driven by what he calls a fundamental desire for symbolic immortality: children, books, philanthropy, companies named after founders, all instantiations of the urge to contribute something that outlasts us. Translating the question into humanity’s mission statement, he lands where Yuri Milner and Musk do: to understand the universe and add to accumulated knowledge, because culture, not biology, is the primary vector of human evolutionary progress. If we could hyperspace-jump to Peter Diamandis-style abundance, where everything physical costs a dollar a pound and machines do all labor, we could all be philosopher kings and artists. But he refuses to end on false comfort: there is no visible peaceful path from full employment through 30, 40, 50 percent unemployment. That transition will be turbulent, and no politicians are taking a long-term view of it.

Neuralink, Inscrutable Systems, and the Alignment Heresy

In audience Q&A, Jurvetson confirms he invested in Neuralink (the idea traces to the neural lace of Iain M. Banks’ novel Surface Detail, which he recommends) but offers a contrarian view. Working from the periphery is very promising: restoring broken function, fixing spinal cords, expanding senses, higher-bandwidth communication. Upgrading core functionality, actually making someone smarter, is another matter. His reasoning comes from decades of watching complex systems: any artifact produced by an iterative algorithm run billions of times (evolution, neural networks, genetic programming, cellular automata) is inherently inscrutable. That is why he doubts mechanistic interpretability will bear fruit and flatly does not think control and alignment are possible for a cutting-edge AI system; he mentally swaps “teenager” for “AI” whenever the control question comes up. The same inscrutability applies to the brain: it will be easier to build a new intelligence than to reverse engineer one already made, and FDA cycles plus human biology cannot iterate at the speed of synthetic learning loops, so he lacks faith Neuralink keeps up with AI. Kurzweil’s uploading dream, he suggests, is a case of wanting something to be true within one’s lifetime.

Penrose, Quantum Brains, and Machine Consciousness

On Roger Penrose’s argument that consciousness depends on quantum processes and is therefore unreachable by AI, Jurvetson is respectful of the man and dismissive of the claim: there is no clear mechanism (a speculative lithium isotope coupling aside), and it amounts to wishful thinking. Generalizing, he finds all vitalist arguments that our substrate is uniquely necessary uncompelling; you could make a better case that carbon is special to life than that neurons are essential to consciousness. His favorite reframe swaps in the word memory: computers have memory that is nothing like holographic, gracefully degrading human memory, yet nobody debates whether computer memory is real. Machines may likewise develop a kind of consciousness that is real without being human consciousness. Declaring something impossible is a much higher-order proposition than admitting ignorance, so his position is: he does not know whether the current AI path leads to consciousness, but his gut says machines will get there one day, perhaps via evolution-like reinforcement learning approaches that recapitulate what biology already proved possible.

Notable Quotes

“I have this gut feeling that it’ll be something architecturally variant. It might subsume the models that we know now.”
Steve Jurvetson, on what drives the next three years of AI

“It’s almost cosmological. Like, why has humanity’s capacity to compute compounded for 130 years?”
Steve Jurvetson, on the Kurzweil abstraction of Moore’s Law

“If business is predictable, if there isn’t disruptive technological change, the big get bigger.”
Steve Jurvetson, on why exponential compute is the precondition for startups

“The Tesla cars today in their cameras gather for their AI training set more data every 4 days than Waymo has in its entire history.”
Steve Jurvetson, on the data flywheel behind Musk’s learning-loop obsession

“If it’s like only two people think it’s crazy, that’s bad because it’s clearly not bold enough. If it’s an obvious idea, other people will do it.”
Steve Jurvetson, on calibrating how crazy a startup idea should be

“Despite attempts at mechanistic interpretability in AI, I don’t think that’s going to bear fruit.”
Steve Jurvetson, on why iterated systems are inherently inscrutable

“It’d be easier to build a new intelligence than it is to reverse engineer one you’ve made.”
Steve Jurvetson, on why he doubts Neuralink can keep pace with AI

“I think all humans have a fundamental desire for symbolic immortality, this belief that we’ve contributed something to the world that transcends our brief time on this world.”
Steve Jurvetson, on the meaning of life when machines do everything

“It’s much higher order proposition to say something is impossible than to say I don’t know.”
Steve Jurvetson, on whether AI can ever be conscious

Watch the full conversation here: The Next 3 Years of AI: Lessons from Elon Musk’s First Investor.

Related Reading
- Steve Jurvetson (Wikipedia): background on the investor behind early bets on SpaceX, Tesla, and Hotmail.
- Future Ventures: the firm Jurvetson co-founded with Maryanna Saenko, primary source for the investment theses discussed on stage.
- Accelerating change (Wikipedia): the broader idea behind Kurzweil’s 130-year compute curve and the law of accelerating returns.
- Reinforcement learning (Wikipedia): the approach Jurvetson’s gut says produces the next breakthrough, a return to DeepMind’s founding premise.
- The Pursuit of Purpose: our guide to the meaning-of-life question Jurvetson closes the conversation on.

July 9, 2026
Dario Amodei on Policy for the AI Exponential: Anthropic’s Plan for AI Regulation, Job Displacement, Civil Liberties, and Democratic Leadership
Our Anthropic overlords deciding which prompts the peasants are allowed to use. pic.twitter.com/08YCSJcYSc
— Bojan Tunguz (@tunguz) June 10, 2026

In June 2026, Anthropic CEO Dario Amodei published “Policy on the AI Exponential”, a wide-ranging essay arguing that the gap between how fast AI is advancing and how slowly policy moves has become dangerous, and that the window to close it is open right now. He opens with a memorable image from The Lord of the Rings: the Hobbits trying to rouse Treebeard, the ancient tree who takes a full day just to say hello, to defend his forest before it is cut down. That mismatch in speed, he writes, is exactly the relationship between AI and our political institutions. This post breaks the essay down in full and adds analysis of where the argument lands.

TLDR

Amodei argues that AI’s scaling laws point toward “powerful AI,” a country of geniuses in a datacenter, within a few years, while legislation still moves on a timescale of years. For most of the last few years, safety advocates including Anthropic pushed only for optionality-preserving moves like transparency rules, chip export controls, and labor data collection, because the risks were not yet concrete. He says that has changed: events like Claude Mythos Preview proved frontier models are now tools of national strategic consequence, and the time for binding regulation has arrived. The essay covers five policy areas. First, regulation and public safety, where he proposes an FAA-style regime of mandatory third-party testing of frontier models above a compute threshold across four risks (cybersecurity, biological weapons, loss of control, and automated R&D), with government power to block unsafe deployments. Second, macroeconomics and tax policy, where AI could deliver hypergrowth and severe, enduring job displacement at the same time, demanding measurement, pro-employment incentives, and possibly UBI or universal capital accounts. Third, accelerating AI’s positive impact, where the danger is regulators like the FDA being too slow rather than too lax, and biomedical approval needs reform. Fourth, the state and civil liberties, where AI could become the ultimate tool of autocracy through autonomous weapons and mass surveillance, requiring new accountability rules, a domestic ban on autonomous weapons, closing the data broker loophole, and public rights to AI advice. Fifth, securing leadership by democracies through a values-based global coalition that controls the AI supply chain, coordinates on risk, shares benefits, and rejects AI-powered repression. He closes by rejecting the idea that public concern about AI is a PR problem to be marketed away, calling it democratic accountability working as it should.

Thoughts

The most important move in this essay is structural, not technical. Amodei is explicitly retiring the “preserve optionality” posture that defined Anthropic’s policy work through 2025 and replacing it with a call for binding rules. For years the argument from safety-minded labs was that the risks were too speculative to legislate against without doing more harm than good, an idea he grounds in the Collingridge dilemma and the Hayekian point that regulators lack the information to make good calls. That was a defensible hedge. What is striking here is the claim that the hedge has expired. He is saying the evidence is now concrete enough that continued caution about regulating has flipped from prudent to negligent. Whether you trust the underlying capability claims or not, that is a genuine change in position from one of the field’s most influential voices, and it deserves to be read as such.

The FAA analogy is doing enormous work, and it is worth poking at. Airplanes and drugs are mature technologies with stable physics and decades of incident data; the certification regime works because the failure modes are well understood. Frontier models are the opposite: the whole premise of the essay is that capabilities are changing faster than anyone can characterize them. Amodei half-acknowledges this when he warns that a fixed list of safety requirements tends to consume 95 percent of compliance effort on things that turn out not to matter while missing the real risks, a lesson he says Anthropic learned from its own Responsible Scaling Policy. So the proposal is really for an agency nimble enough to rewrite its own standards continuously, which is a much taller order than the FAA. The honest read is that he is proposing a regulator we do not yet know how to build, and betting that building it is still better than the alternative.

The economics section is where Amodei is most careful, and it is the part most likely to be misread. He goes out of his way to say enduring job displacement is undesirable and that warning about it is not the same as wanting it, a distinction critics of AI leaders often collapse. His real claim is subtle: that AI might jam the economic policy dial on a “hypergrowth, hyper-inequality” setting that is hard to unstick, because AI substitutes for human cognition broadly and faster than past technologies, potentially overwhelming the usual escape hatches like comparative advantage and Jevons paradox. If he is right, the political fight of the next decade is not about growth, which AI supplies, but about distribution, which it does not. His mention of UBI, universal capital accounts, and higher capital gains taxes is notable coming from a frontier CEO, even hedged as it is.

The civil liberties section is the one that should travel furthest beyond the AI-policy bubble, because it does not depend on accepting his most aggressive timelines. The data broker loophole, the idea that the government can simply buy the bulk data Americans hand to private companies and run mass analysis on it, is a problem that exists today; AI just raises the stakes by making that data vastly more revealing. Same with the proposal that anyone facing adverse government action should have access to AI at least as capable as what the government uses against them. These are concrete, near-term, and bipartisan in a way the abstract autonomy debates are not. The most candid line in the whole piece is his admission that AI cannot be safely entrusted to either governments or companies, an unusually direct acknowledgment that his own industry needs external checks, with Anthropic’s Long-Term Benefit Trust offered as one imperfect example rather than a solution.

The geopolitics section is the most contested terrain. Framing AI as a nuclear-scale reset of the game board, with a virtual country of 100 million geniuses divisible across military strategy and weapons R&D, leads naturally to a democratic coalition that hoards chips and denies them to adversaries. That logic is internally consistent, but it sits in tension with the benefit-sharing and “eventually the whole world joins” language elsewhere in the same section. Export controls that lock down the supply chain are, by design, a tool of exclusion, and reconciling that with broad diffusion of AI’s benefits to developing countries is the circle the coalition idea has to square. Amodei is clearly aware of the tension and bets that making membership attractive resolves it. The closing image is the one to remember: Treebeard waking up, with the warning that the goal is to channel real public concern into constructive policy rather than let it curdle into formless anger.

Key Takeaways
- The core tension of the essay is a mismatch in speed: AI advances exponentially while legislation moves on a multi-year timescale, dramatized by the Treebeard and Hobbits image from The Lord of the Rings.
- In only four years, AI models went from barely writing a coherent line of code to writing most of the code at major AI companies, with similar gains across biology, physics, math, finance, law, and translation.
- Scaling laws now have over a decade of empirical support, and if they continue another year or two they likely produce “powerful AI,” a country of geniuses in a datacenter.
- For the last few years, safety advocates including Anthropic focused on optionality-preserving policies: transparency legislation, chip export controls, and data collection on AI’s labor effects.
- Amodei argues that posture is no longer enough. Claude Mythos Preview revealed that frontier models pose real cybersecurity risks to the financial sector, critical infrastructure, and national security, and proved AI is now a tool of strategic consequence.
- He expects biological risks to follow cyber risks, with serious AI autonomy risks potentially not far behind.
- The essay covers five policy areas: regulation and public safety, macroeconomics and tax policy, accelerating AI’s positive impact, the state and civil liberties, and securing leadership by democracies.
- Alongside the essay, Anthropic released a legislative proposal on frontier model testing and a policy framework for job displacement, both with promised financial backing.
- On regulation, Amodei invokes the Collingridge dilemma and Hayek’s information problem to explain why pre-writing AI law in 2023 to 2024 was risky, then argues the situation has now changed.
- Anthropic’s 2025 answer was transparency, helping pass SB 53 in California, RAISE in New York, and SB 315 in Illinois, plus advocating a federal transparency standard.
- He now calls for binding regulation modeled on the FAA, where frontier models must pass technical testing and can have release blocked or reversed if they fail high safety standards.
- Models above a compute threshold should face mandatory third-party testing in four areas: cybersecurity, biological weapons, loss of control of AI systems, and automated R&D that accelerates the other three.
- Government should be able to block or deter deployment of models judged to present unacceptable risk, scoped to those four risks with protections against political favoritism.
- Evaluation could come from a government agency or from authorized and inspected private organizations under a “regulatory markets” approach.
- AI companies should have strong security to protect model weights, conduct regular red teaming and penetration testing, report safety incidents promptly, and work with government against major threat actors.
- He warns a time may come when the most powerful systems resemble weaponizable nuclear materials rather than airplanes, requiring more aggressive measures, but cautions against getting ahead of present dangers.
- On economics, AI could deliver extremely rapid growth via accelerated science and operational efficiency, supercharged by AI building better AI.
- The same properties make AI a broad substitute for human cognition that changes the economy faster than past technologies, risking large and potentially enduring labor market disruption.
- The feared outcome is a “hypergrowth, hyper-inequality” setting that is hard to unstick, where the challenge shifts from incentivizing growth to sharing its benefits.
- Amodei is emphatic that enduring job displacement is undesirable and dangerous, and that he warns about it to help society adapt, not as a prophet of doom.
- Anthropic says it works with customers to find new revenue and use cases rather than only cost cutting, and explores interaction paradigms that keep humans active alongside AI.
- He predicts AI will enable single individuals to build billion-dollar companies, noting teams of a few people already reach hundreds of millions in revenue, while admitting significant enduring job loss may be intrinsic to the technology.
- Any response must address both economic provision and the human need for meaning, purpose, and agency, with the latter ultimately more important and beyond what policy can directly deliver.
- Suggested economic interventions: better measurement and tracking (governments expanding statistics beyond Anthropic’s Economic Index), pro-employment incentives, and long-term macroeconomic support.
- Pro-employment ideas include wage insurance, retention tax incentives, workforce training grants, and employer-employee matching infrastructure.
- If displacement is large and permanent, mechanisms like universal basic income or universal capital accounts, financed through company taxes or higher capital gains taxes, may be necessary.
- He frames datacenter and energy-price backlash as largely a symbol of broader economic anxiety, and says AI companies should pay to absorb rate increases, a pledge Anthropic has already made.
- For technologies accelerated by AI, the bigger risk is regulators like the FDA being too slow, not too lax, because AI may make downstream tech safer in ways that violate skeptical regulatory assumptions.
- Biomedicine is the illustrative case: AI could flood the drug pipeline, raise effect sizes, treat previously untreatable diseases, and create whole new therapy categories, while the current FDA and EMA pipeline takes 7 to 8 years.
- Agencies should pre-approve standards for AI methods like PD/PK modeling, toxicology prediction, dose selection, biomarker validation, synthetic control arms, and surrogate endpoints, plus more flexible accelerated-approval mechanisms.
- On civil liberties, powerful AI in the wrong hands could be the ultimate tool of autocracy, and existing constitutional protections are not fully equipped to counter a surprise seizure of power.
- Threats named include fully automated drone armies that obey unlawful orders and surveillance AI that infers the innermost details of every citizen’s life from widely available data.
- Civil liberties proposals: accountability rules and an “off switch” for autonomous weapons, a domestic ban on fully autonomous weapons including in law enforcement, closing the data broker loophole, and public rights to AI advice during adverse government action.
- Amodei warns companies as well as governments can seize quasi-state power, citing the Gilded Age and the East India Company, and says AI cannot be safely entrusted to either alone.
- He offers Anthropic’s Long-Term Benefit Trust as one separation-of-power structure and urges the industry to explore mechanisms that go further.
- On geopolitics, he argues AI resets the geopolitical game board like nuclear weapons, becoming the dominant source of military and economic power for any nation that holds it.
- A nation with powerful AI versus one without it, or even one three years behind, could resemble WWII Marines facing medieval swordsmen.
- He calls for a democratic coalition that shares chips and semiconductor manufacturing equipment internally while denying them to adversaries, citing MATCH and OVERWATCH as good first steps.
- The coalition should coordinate risk policy, share benefits including harmonized medical approvals, provide mutual AI defense, reject AI-powered repression, and cooperate on macroeconomic stabilization.
- He rejects the idea that AI’s image is a PR problem, arguing public concern reflects real risks and is democratic accountability working as it should, with the task being to channel it into constructive solutions.

Detailed Summary

The speed mismatch between AI and policy

Amodei frames the entire essay around a single problem: AI advances at a lightning pace while policy, especially legislation, moves very slowly, often for good reasons since governments wield grave powers that should not be used hastily. He illustrates this with Treebeard, the sentient tree from The Lord of the Rings who takes a full day to say hello, as a stand-in for political institutions trying to respond to a technology that can go from amusing toy to a country of geniuses in the time it takes Congress to act. He recounts the dilemma responsible actors have faced: they could see where the exponential was headed, but to observers looking only at present capabilities, AI looked as mundane as the latest consumer app or cryptocurrency, making a laissez-faire attitude hard to argue against. The absence of AI’s radical effects, and uncertainty about their shape, made it genuinely difficult to design good policy even where the will existed.

That uncertainty, he says, is why safety advocates limited themselves to optionality-preserving measures like transparency rules, export controls, and labor data collection. But over the last few months the evidence of AI’s power and risk has become undeniable, with Claude Mythos Preview as the emblematic example: it scrambled the global cybersecurity landscape and proved AI models are now tools of global and national strategic consequence. He expects biological and autonomy risks to follow, and argues the world must now activate its slow, rickety policy apparatus to handle risks that will compound quickly. He worries current early actions are at least a year out of step with AI’s progress, and presents the essay as an attempt to close that gap across five policy areas, focused on US policy but relevant worldwide.

Regulation and public safety: an FAA for frontier models

Amodei opens by acknowledging the real costs of regulation: it can reduce a product’s benefits, disincentivize innovation, and suffer from the Hayekian problem that regulators lack the information for good tradeoffs, plus the Collingridge dilemma that a technology’s impacts are hard to anticipate until it is too late to manage them. In 2023 to 2024 these dynamics argued against pre-writing AI law, since the exact form of biological or autonomy risk, how to test for it, and how it would play out were all unclear, creating a high risk of low-value compliance requirements that miss the real dangers. Anthropic’s answer was transparency: requiring developers to disclose safety procedures, tests, and critical incidents, which is why it supported SB 53 in California, RAISE in New York, and SB 315 in Illinois in early 2026.

Now, he argues, the risks are clearly here and it is time for binding regulation. His analogy is to cars, airplanes, and drugs: powerful technologies essential to the economy but capable of killing many people if designed or operated poorly. He models AI regulation on the FAA, with frontier models required to pass testing and auditing and with release blocked or reversed if they fail high safety standards. His concrete proposal: mandatory third-party testing for models above a compute threshold across cybersecurity, biological weapons, loss of control of AI systems, and automated R&D that accelerates the other three; government power to block deployment of unacceptably risky models, scoped narrowly with anti-favoritism protections; evaluation by either a government agency or authorized private organizations in a regulatory-markets model; strong weight security, red teaming, and penetration testing at AI companies; and prompt reporting of safety incidents. He notes a future may arrive when systems resemble weaponizable nuclear materials and demand harsher measures, but warns against designing for dangers that have not yet emerged.

Macroeconomics and tax policy: growth and displacement together

Here Amodei challenges the standard premise that growth is fragile and must be traded off against the drag of taxes or deficits to reduce inequality. Powerful AI, he suggests, may scramble that assumption by producing extremely rapid growth through accelerated science and efficiency, supercharged by AI building better AI, while simultaneously acting as a broad substitute for human cognition that reshapes the economy faster than any prior technology. The result could be a world stuck on a hypergrowth, hyper-inequality setting that is hard to unstick, where the central challenge is no longer incentivizing growth but sharing its benefits. He is careful to make two points clearly: first, enduring job displacement is undesirable and dangerous and should be minimized, and his warnings are meant to help society adapt, not to play prophet of doom; second, any response must address both economic provision and the deeper human need for meaning, purpose, and agency, which matters more and which policy cannot directly supply.

His policy menu starts with measurement and tracking, arguing good policy is impossible without accurate data, and that governments could expand economic statistics well beyond Anthropic’s Economic Index. Next come pro-employment incentives such as wage insurance, retention tax incentives, workforce training grants, and employer-employee matching, costs he says society should readily accept since they are likely offset by AI productivity gains. If displacement proves large and permanent, he says long-term income support like universal basic income or universal capital accounts may be needed, financed through taxes on relevant companies or higher capital gains taxes. He closes the section by reframing datacenter and energy-price backlash as mostly a symbol of broader economic anxiety, while saying AI companies should absorb rate increases, as Anthropic has pledged.

Accelerating AI’s positive impact: the slow-regulator problem

For technologies accelerated by AI, rather than AI itself, Amodei flips his concern: the bigger danger is regulatory systems designed for a slower pace failing to handle the deluge of new products, and AI making downstream technologies safer in ways that violate the skeptical assumptions baked into agencies like the FDA. He focuses on biomedicine as the area likely to produce AI’s biggest humanitarian benefits and where regulation is especially complex. AI could greatly increase the rate of new drug candidates, improve their effect sizes and safety profiles, treat previously untreatable diseases, and create entirely new therapy categories the way antibodies, peptides, and cell therapies did.

The current pipeline at the FDA and EMA takes 7 to 8 years, built on the pessimistic assumption that drug candidates usually fail and often carry safety problems even when they work. Without reform, AI will jam or overload that system. Amodei proposes that agencies develop standards now for accepting AI simulation and analysis, so they can be adopted quickly once proven rather than after years of unnecessary testing. Specific candidates include AI-based PD/PK modeling, toxicology prediction to reduce animal testing, more accurate dose selection, biomarker validation from large datasets, synthetic control arms, and surrogate endpoints (especially for aging and neurodegeneration). He urges more flexible accelerated-approval mechanisms generally, and notes biomedical acceleration may also reduce AI’s risks by aiding biodefense and improving mental health.

The state and civil liberties: guarding against AI-driven tyranny

Amodei starts from the perennial balance between state power and individual liberty, enforced through machinery like the First, Fourth, and Fifth Amendments, the Posse Comitatus Act, and FISA, and argues AI threatens to upset that balance while raising its stakes. Powerful AI in the wrong hands could be the ultimate tool of autocracy, because the enormous returns to intelligence combined with AI’s pace create a perfect storm for a surprise seizure of power. The danger could take many forms but shares one feature: AI conferring sudden power while routing around democratic oversight. He cites a fully automated drone army that could obey unlawful orders, where trained humans might object, and a surveillance AI that analyzes widely available information at massive scale to infer the innermost details of every citizen’s life, an ability current civil liberties law never contemplated.

His proposals: create accountability rules for autonomous weapons so they respond to court orders, legislation, and human overseers rather than blindly following orders, possibly with a judicial finger on an off switch; ban domestic use of fully autonomous weapons, including in law enforcement, while allowing them against foreign adversaries; close the bulk-collection and data-broker loophole that lets the government buy and analyze data Americans share with private companies; and guarantee public rights to AI advice at least as capable as what the government uses during adverse action, as an extension of the Administrative Procedure Act, due process, or the Sixth Amendment. He closes by warning that companies, not just governments, can capture the state, citing the Gilded Age and the East India Company, and argues AI cannot be safely entrusted to either alone. Anthropic’s Long-Term Benefit Trust is offered as one accountability structure, with a call for the industry to go further.

Securing leadership by democracies: a values-based coalition

Amodei rejects treating AI as a mere instrument of trade policy to diffuse a tech stack worldwide. He believes AI resets the entire geopolitical game board like nuclear weapons, potentially even more so, becoming the dominant source of military and economic power for whoever holds it. In a virtual country of 100 million geniuses, millions could be assigned to military strategy, drone manufacture, weapons R&D, intelligence, and scientific advancement at once, so a nation with powerful AI facing one without it, or even one three years behind, could be like WWII Marines against medieval swordsmen. Because powerful AI also enables deeper autocratic repression, it matters enormously that the world’s strongest nations are democracies.

His answer is a global coalition built on shared democratic values that draws in the rest of the world by making membership increasingly attractive and exclusion increasingly costly. Operating principles include managing the AI supply chain by sharing chips and semiconductor manufacturing equipment within the coalition while denying them to adversaries, and by expanding and tightening export controls (he cites MATCH and OVERWATCH as good first steps); coordinating on biological, cyber, and autonomy risk to make compliance compatible and effective; sharing AI’s benefits including harmonized medical approvals; mutual defense through collective AI cyberdefense, drones, manufacturing, compute, and intelligence; rejection of AI-powered repression; and macroeconomic cooperation against contagious employment crises. The coalition would respect each nation’s sovereignty, start with aligned democracies, and grow iteratively, ideally toward the whole world, but at minimum positioning democracies to contain and outcompete repressive regimes.

A window of opportunity

Amodei closes on cautious optimism. The same exponential that strains policymaking has created a unique opening: clear evidence of AI’s risks, an early taste of its value and disruption, and public backlash against unregulated approaches have left policymakers unusually open to forward-looking action. Treebeard and his forest are waking up. He firmly rejects the industry-circle view that this is a PR problem solved by better marketing, arguing people are worried because the risks are real, and that public concern in response to transparency is democratic accountability working as it should. The key challenge is focusing that concern into constructive solutions rather than letting it descend into formless anger and violence. He is optimistic because issues from job displacement to model testing to export controls have common-sense appeal across the political spectrum, and a broad nonpartisan coalition could adopt sane, forward-looking policy faster than usual.

Notable Quotes

“in only four years, AI models have gone from barely being able to write a coherent line of code to writing most of the code at major AI companies.”
Dario Amodei, on the pace of the AI exponential

“in the several years that it can take Congress to act, AI can go from an amusing toy to the full country of geniuses.”
Dario Amodei, on the mismatch between AI’s speed and the speed of legislation

“However, now the risks are clearly here. It is time to go beyond transparency to more serious and binding regulation of AI.”
Dario Amodei, marking the shift from transparency to binding rules

“enduring job displacement is undesirable and dangerous, and we should do everything we can to minimize or prevent it, not to bring it about.”
Dario Amodei, clarifying his stance on AI and jobs

“The key challenge in such a world won’t be incentivizing growth, but finding a way for everyone to share in the benefits.”
Dario Amodei, on a hypergrowth, hyper-inequality economy

“Powerful AI in the wrong hands could be the ultimate tool of autocracy, and our existing legal and constitutional protections are not fully equipped to counter this threat.”
Dario Amodei, on AI and civil liberties

“A nation that possesses powerful AI facing one without it … could be the equivalent of an army of World War II Marines facing an army of medieval swordsmen.”
Dario Amodei, on AI as the dominant source of geopolitical power

“People are worried about AI because they correctly perceive that its risks are real, not because AI CEOs have been insufficiently Panglossian.”
Dario Amodei, rejecting the idea that AI has a PR problem

“Treebeard and his forest are waking up.”
Dario Amodei, on policymakers’ new openness to acting on AI

“Policy on the AI Exponential” is a dense, structured argument from one of the most consequential figures in the field, and it rewards a full read in the original. The summary and analysis above are a guide, not a substitute. You can read the full essay here.

Related Reading
- Policy on the AI Exponential (full essay): the original source for this post, in Dario Amodei’s own words.
- Anthropic: the AI safety company Amodei leads, which released the accompanying model-testing and job-displacement proposals.
- The Collingridge dilemma (Wikipedia): the idea that a technology’s impacts are hard to predict until it is too late to easily control them, central to the regulation section.
- Federal Aviation Administration (Wikipedia): the safety-certification model Amodei proposes adapting for frontier AI.
- Universal basic income (Wikipedia): one of the long-term support mechanisms raised for large-scale labor displacement.

June 10, 2026

Bill Gurley on Mental Models, Systems Thinking, AI Investing, Stablecoins, and the Future of Venture Capital
Bill Gurley spent his career at Benchmark backing some of the most consequential marketplaces and network-effect businesses of the internet era, including Uber, and he is one of the few investors who pairs deep Wall Street fundamentals with a real feel for the bleeding edge. In this wide-ranging conversation on Shane Parrish’s The Knowledge Project, he lays out the mental models he keeps returning to, how systems thinking keeps you out of trouble, why the history of your field is a hidden superpower, where AI investing is headed, and how stablecoins and tokenization could quietly rewire finance. It is a masterclass in thinking clearly about complex systems while staying obsessively curious about what is happening on the edge.

TLDW

Gurley anchors his thinking in systems thinking and complexity theory, warning that multivariable nonlinear systems produce second and third order consequences that punish anyone who optimizes for a single metric. He argues that mastering both the deep history of your field and its newest edge is wildly differentiating, whether you are interviewing for a marketing job or breaking into venture capital. On AI he is measured: he doubts a single model eats every vertical, sees real moats in workflows and proprietary data, flags that we may be painting in the corners on training data, and explains why Chinese open source models may innovate faster because forced knowledge sharing compounds. He thinks the AI buildout looks overfunded and that circular deals both raise the odds of an eventual correction and delay it. He makes the case that the IPO process is a rigged power grab, that stablecoins and instant payments threaten Visa, Mastercard, and the entire 2 to 3 percent credit card stack, and that proxy advisors like ISS have drifted from shareholder interest into a black-box heist. He closes on the craft of storytelling and writing as thinking, the equal-partnership design of Benchmark, why venture bends toward youth, and what success means now that his dream job is behind him.

Thoughts

The most useful idea in this conversation is also the quietest one: most bad decisions are not bad in the moment, they are bad in the second derivative. Gurley’s dating-site story, where lengthening profiles raised engagement in the test and then quietly killed conversion months later, is the whole argument in miniature. A linear model would have shipped that change and called it a win. A systems thinker assumes the variable you optimized is connected to three others you cannot see yet, and waits to find out. That posture, refusing to get deterministic about a single metric, is the difference between a clever experiment and a durable business. It is also the most transferable thing in the episode, because it applies to product changes, hiring, policy, and your own career just as cleanly as it applies to a dating app.

His pairing of old and new is the second idea worth stealing. Everyone in tech tells you to live on the edge, and Gurley agrees: he keeps five premium AI accounts running so he never misses a release. But he insists the edge is only half of it. Knowing the deep history of your field, the masters of marketing, the forefathers of physics, the classic cartoons that taught animation, is rare enough that it instantly creates contrast and signals genuine passion. The compounding move is to hold both at once. If you understand the legends and you actually get TikTok, you are a power player in a way that someone who only knows one end of the timeline can never be. Most people pick a side. The leverage is in refusing to.

On AI specifically, Gurley is refreshingly unwilling to pick the consensus lane in either direction. He does not buy that one near-sentient model swallows every vertical, and his reasoning is grounded rather than vibes-based: workflows and proprietary data create real switching costs, which is why he watches the legal AI startups ingesting case law and building new databases rather than assuming everyone reverts to a general chatbot. At the same time he respects the Microsoft pattern of platforms climbing the stack and crushing the apps above them. The honest answer is that it is genuinely up for grabs, and his comfort sitting in that uncertainty is itself a model. The cheap takes are “one model to rule them all” and “it is all wrappers.” Gurley holds both possibilities and keeps testing.

The systems lens does its best work on China. Rather than moralize, Gurley runs the mechanism: roughly ten open source models, intense domestic competition, and a culture of publishing techniques and weights so every model can learn from, train, and test every other model. His two-farmer metaphor, one market where farmers only trade goods and another where they are forced to share best practices, makes the prediction obvious. Forced knowledge sharing compounds faster than secrecy. The uncomfortable corollary he names is that American startups are quietly forking those open models all over Silicon Valley, and that incumbents may be lobbying for heavy regulation precisely because it pulls up the drawbridge against open source competition. That is the systems thinker’s signature move: follow the incentives to the consequence nobody is saying out loud.

Finally, the money section is a clinic in spotting rent extraction. The IPO process where bankers pick both the price and the favored buyers, the 2 to 3 percent credit card toll that exists for no defensible reason while the rest of the world built instant bank transfer decades ago, and the proxy advisors who score companies in a black box and then sell you the cure, are all variations on the same pattern: an intermediary that captured a choke point and defends it through regulatory capture rather than value. Gurley’s optimism is that crypto rails, stablecoins, and tokenization may finally route around these tolls the way WeChat Pay and Alipay leapfrogged cards in China. Whether or not you agree on the timeline, the analytical habit is the takeaway. When something costs far more than it should and has for decades, ask who captured the rules, and watch the edge for whoever is about to make those rules irrelevant.

Key Takeaways
- Systems thinking means treating the world as multivariable nonlinear systems where one variable flipping can change the entire system’s behavior, the way weather and stock markets do.
- The real danger is second and third derivative effects, consequences that only show up much later, long after the metric you optimized looked like a win.
- A dating site lengthened profiles because longer profiles tested as more engaging, then discovered months later it was negative for conversion, the textbook second order trap.
- Never get too deterministic about a single metric or single variable, and always know what is actually important and what sits on top.
- Gurley built his foundation on the canon: Peter Lynch’s One Up on Wall Street, A Random Walk Down Wall Street, the Buffett letters, Ben Graham, and Howard Marks.
- A firm grasp of the financial bedrock is what lets you innovate on top of it, and many Silicon Valley VCs would benefit from understanding finance better.
- Bill Miller reframed value investing as buying an asset that is underpriced relative to what you think it will be worth in the future, which is how he justified holding Amazon for its network effects.
- Wall Street is the buyer of the product that venture capitalists create, so even at the two-people-and-a-PowerPoint stage you should ask whether the eventual public market will be excited by it.
- Trajectory matters more than the starting place, because the trajectory determines where the company actually ends up.
- Knowing the deep history of your field is remarkably differentiating, and tedium while learning it is a signal you are in the wrong lane.
- John Lasseter served Gurley a ten-course meal where each course was tied to a classic cartoon essential to understanding animation, a display of mastery over the history of the craft.
- Magnus Carlsen won a trivia contest on the history of chess, and Picasso was a wildly successful realist painter by 14, both proof that the greats master the fundamentals first.
- Obsessive, constant learning is the trait Gurley sees most in great entrepreneurs, because disruption always happens on a moving edge they need to understand at a top-one-percent level.
- The compounding advantage is mastering both the old history and the new edge at once, the way understanding both marketing legends and TikTok would set you apart in any interview.
- Most people underestimate how much AI can do, so push more of the downstream work into the prompt: identify the top ten, list pros and cons, rank them on one dimension, then another, and add up the numbers too.
- Gurley uses ChatGPT for project structure and memory, Gemini for restaurant research powered by Google review data, and notes that coders swear by Claude while some prefer Perplexity for finance.
- He doubts one model dominates everything; verticals like coding already let users swap models, and price optimization will push more swapping over the next few years.
- Heavy, expensive regulation could ironically create oligopoly, and some players may be quietly begging for regulation because it pulls up the bridge against Chinese open source models.
- China’s roughly ten open source models compete intensely and share weights and techniques, creating a system that can innovate faster, like farmers forced to share best practices instead of just trading goods.
- A quiet secret is that startups all over Silicon Valley are forking those Chinese open source models at real volume.
- Gurley comes down against the idea that one near-sentient model removes the need for vertical models; workflows and proprietary data, like legal startups ingesting all the case law, create durable moats.
- We may be running out of training data, painting in the corners, which is why one of the most powerful improvements is hiring experts at thousands of dollars an hour to fine-tune the models.
- Yann LeCun’s view is that the next leap is broader than LLMs, since language-based models hit an asymptote and are weak at math and numbers.
- AlphaGo’s shocking move proves models can innovate beyond their training, but it lived in a constrained game; the real world has infinite paths a computer cannot exhaustively search.
- Gurley’s non-consensus view is skepticism of the China vilification mindset, noting the US is only 3 to 5 percent of the global population and wondering how the other 95 percent hears American exceptionalism.
- The AI buildout looks overfunded: the Magnificent Seven took free cash flow from a range of 50 to 100 billion dollars a year down toward zero by pouring it into capex.
- The venture community has become more risk-seeking because it now deeply believes in increasing returns and power laws, and the pre-profit losses keep scaling, from Amazon’s 2 to 3 billion to Uber’s 15 billion to far more now.
- Circular deals, where a cloud provider funds a model company that spends the money right back on its services, inflate growth, which both raises the probability of an eventual correction and extends the time before one hits.
- Burn rate is a measure of risk; ten years ago a million a month was scary, now companies burn five billion a year and cannot really know their unit economics.
- Tokenization without financial-disclosure regulation invites speculation and manipulation, which is part of why companies like Stripe stay private and negotiate liquidity prices with trusted investors.
- The IPO process is unfair because bankers pick both the price and the shareholders; a freshman would simply match supply and demand anonymously in an auction, the way direct listings and ICOs do.
- Stablecoins threaten the 2 to 3 percent credit card stack; USDC holds dollar-for-dollar Treasuries and rides fast global crypto rails, while US transfers still suffer three-day ACH settlement and 25 dollar wires.
- The rest of the world built instant transfer long ago, from UK Faster Payments 20 years ago to Argentina’s PIX-style system reaching 60 to 70 percent of transactions, while US bank regulatory capture stalled FedNow.
- Visa and Mastercard run roughly 60 percent operating margins as a bank-created duopoly, and China leapfrogged them entirely with WeChat Pay and Alipay QR-code wallets.
- Moody’s power is being the trusted standard, the watermark, so AI on the back end does not displace it; ISS and proxy advisors, by contrast, score companies in a black box and get paid on both sides.
- Proxy advisors drifted from shareholder interest into a fraud-and-risk-mitigation mindset, which is why they reflexively opposed the Tesla pay package that only paid out if the stock soared.
- The rise of passive index funds concentrated voting power in firms that lack time to evaluate votes; it would be healthier if they abstained or voted in proportion to active holders.
- Storytelling is one of the top founder traits, because founders are recruiting, raising money, and closing customers and partners constantly, selling all the time.
- Writing is thinking: Bezos’s six-page memo forces you to find the loose ends and tie them up, and a public blog becomes a calling card that magnetizes founders and deal flow.
- Other founder unfair advantages are product instincts, which fewer than 5 percent of non-product people ever truly learn, and sheer determination, Bezos’s single angel-investing test of whether someone will do it no matter what.
- Uber had no HBS case study to lean on; its winner-take-all network effects forced mega burn rates with no precedent and no mentor to call, a situation every AI company now faces.
- Benchmark’s equal partnership, with no king, president, or lead and five equal partners, makes recruiting easy, kills comp politics, and aligns everyone, at the cost of being hard to scale or run new initiatives.
- Venture bends toward youth because young investors can match founders’ age, master a fresh niche faster, and have the free time to study something 80 hours a week.
- Gurley defines current success through Arthur Brooks’s From Strength to Strength, hoping to apply his synthesizing and writing skills to bigger societal problems and dent the universe a little.
Detailed Summary

Systems Thinking and Second Order Effects

Gurley opens with the mental model he keeps returning to: systems thinking, shaped by Donella Meadows’s Thinking in Systems and his board seat at the Santa Fe Institute, which studies complexity theory. He describes complex systems as multivariable nonlinear systems that are very hard to predict, capable of behaving one way for a long time until a single variable flips and the whole system behaves differently, like weather or stock markets. The practical payoff is staying out of trouble by anticipating first, second, and third derivative consequences. His clearest example is a large dating site that lengthened user profiles because the test showed more engagement, only to learn many months later that knowing more at that stage was negative for conversion. The lesson is to never get too deterministic about a single metric and to keep the whole system in view, because a change here can ripple to there in ways you only discover much later.

Learning the Craft of Investing

Because he started on Wall Street rather than in venture, Gurley absorbed the investing canon first: Peter Lynch’s One Up on Wall Street, A Random Walk Down Wall Street, the Buffett letters, Ben Graham, and Howard Marks, people who spent careers assembling and publishing their thinking. That financial bedrock, he argues, is exactly what lets you innovate on top of it. His friend Michael Mauboussin introduced him to Bill Miller, the Legg Mason manager who beat the S&P for 15 straight years and was Amazon’s largest shareholder for a long stretch. Miller reframed value investing as buying an asset underpriced relative to its future worth, which, combined with a belief in network effects, justified holding a company that could grow at an unreasonable rate for years. Gurley also frames Wall Street as the buyer of the product venture capitalists create through eventual M&A or IPO, so founders should think early about whether the public market will be excited by what they are building, since trajectory matters more than the starting place.

Mastering Both the History and the Edge

Gurley makes an unusually strong case for studying the deep history of your field. He recounts a dinner with Pixar’s John Lasseter, who served a ten-course meal where every course was tied to a classic cartoon he considered essential to understanding animation, and notes that Magnus Carlsen won a chess-history trivia contest and Picasso was a master realist by 14. In a world that skims for the executive summary, walking into a marketing interview with command of the masters of marketing is wildly differentiating and signals genuine passion; if learning that history feels tedious, you are probably in the wrong lane. The counterpart trait he sees in great entrepreneurs is obsessive learning on the moving edge, where disruption actually happens. Gurley keeps five premium AI accounts so he never misses something. The real power player holds both at once, the legends and the newest thing, the way a candidate who knows the marketing greats and truly gets TikTok stands out completely.

Using AI Well and the Model Wars

People underestimate how much AI can do, Gurley says, so you should build more of the downstream work into the prompt: instead of asking for the top ten and studying them yourself, ask it to list pros and cons, rank on one dimension, rank again on another, and add up the numbers too. He uses ChatGPT for its project structure and memory, leans on Gemini for restaurant research because it carries Google review data, and notes coders swear by Claude while some prefer Perplexity for finance. On whether one model dominates or models become niche commodities, he points to coding, the largest vertical, where tools like Cursor already let users swap models, and predicts price optimization will drive more swapping. The counterforce is regulation: if it gets expensive and mundane it could create oligopoly, and some players may be quietly begging for it because it pulls up the bridge against Chinese open source models.

China, Open Source, and the Systems Advantage

Asked to apply systems thinking to China, Gurley describes roughly ten open source models locked in intense domestic competition, all learning from one another because the ecosystem chose openness, with models able to train and test other models and teams publishing the techniques behind their breakthroughs. His metaphor: two agricultural societies, one where farmers only trade goods at market and another where they are forced to share best practices; the second evolves far faster. The result is a system capable of innovating faster than the more secretive Western approach. The quiet secret he names is that startups all over Silicon Valley are forking those open models at real volume, and a key open question is whether regulation tries to stomp that out. He extends this into a broader non-consensus discomfort with the vilification of China common in Washington and parts of Silicon Valley, observing that the US is only a few percent of the global population.

AI Investing, Moats, and the Limits of Models

On how AI changes investing and whether a startup is just a wrapper, Gurley calls it up for grabs but lands on the side of durable verticals. If models become near-sentient, one model does everything; he doubts that, pointing to workflows and data moats, like the several legal AI startups ingesting all the case law and building new databases that customers will not simply swap for a general chatbot. He balances this against the Microsoft pattern of platforms climbing the stack past Lotus 1-2-3 and WordPerfect. He also flags scaling limits: we may be running out of data, painting in the corners, which is why one of the most powerful improvements is paying experts thousands of dollars an hour to fine-tune models, though human knowledge itself has an outer edge. He invokes Yann LeCun’s argument that the next leap is broader than language-based LLMs, which hit an asymptote and struggle with math, and the AlphaGo debate, where a shocking innovative move proves creativity within a constrained game but says little about the infinite paths of the real world. He notes AlphaGo and Tesla’s FSD are constrained, non-LLM systems.

Is the Buildout Overfunded

Gurley admits he is shocked by the scale of money, noting the Magnificent Seven drove free cash flow from 50 to 100 billion a year down toward zero by spending it all on capex, something he would not have believed five years ago. He traces it to the venture community’s growing conviction in increasing returns and power laws, where proven companies grow far beyond expectations, which makes investors more willing to take risk on the come. The losses before turning cash-flow positive keep scaling, from Amazon’s 2 to 3 billion to Uber’s roughly 15 billion to far larger now. On corrections, he recalls the dot-com crash producing a three to four year nuclear winter before Amazon climbed back, and explains that circular deals, where a cloud provider funds a model company that spends it right back on its services, inflate growth and therefore both raise the probability of a correction and extend the runway before one arrives. Burn rate, he stresses, is a measure of risk, and at five billion a year it is nearly impossible to know your unit economics.

Tokenization, the IPO Heist, and Going Public

There is no shortage of capital, so funding is not the bottleneck; the risk with tokenization is that, absent disclosure regulation, it invites speculation and manipulation, as seen in retail-loved names like GameStop and Palantir. Tokenizing a private company like Stripe could create the wild price swings companies stay private to avoid, since private liquidity events let them negotiate a price with trusted investors rather than expose the constantly moving underlying value, and Robinhood’s tokenization plans already drew legal pushback. Gurley reserves his sharpest critique for the IPO process, calling it insanely unfair because bankers pick both the price and the favored shareholders. A freshman computer science and finance student would simply match supply and demand anonymously in an auction, the way an ICO or a direct listing does, but Wall Street will not let go of the greedy power grab and reverted to a controlled oligopoly after direct listings were available.

Stablecoins Versus the Payment Cartel

Gurley argues stablecoins could be deeply disruptive to credit cards. Most of the developed world built instant bank-to-bank transfer long ago, from UK Faster Payments 20 years ago to Argentina’s PIX-style system that quickly hit 60 to 70 percent of transactions, while US bank regulatory capture stalled FedNow and left an ecosystem living under 2 to 2.5 percent card fees. A USDC stablecoin holds dollar-for-dollar US Treasuries and rides proven, fast, global crypto rails, letting anyone move a dollar in seconds for pennies, against the backdrop of three-day ACH settlement and 25 dollar wires. He sees Visa and Mastercard, a bank-created duopoly with roughly 60 percent operating margins, as heavily threatened, and points to China, where WeChat Pay and Alipay built ubiquitous QR-code wallets that leapfrogged the entire card system, all because the government made money transfer easy.
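
To make the fee gap concrete, here is a minimal sketch that compares the cost of one payment across the rails mentioned above. The card percentages and the 25 dollar wire fee come from the summary; the stablecoin fee of two cents is purely an illustrative assumption standing in for "pennies", and the function names are hypothetical. It also glosses over the fact that card fees fall on merchants while wire fees fall on senders.

```python
# Rough fee comparison for moving money over different rails.
CARD_FEE_RATES = (0.02, 0.025)   # 2 to 2.5 percent card fees, as cited above
WIRE_FEE_DOLLARS = 25.00         # flat wire fee, as cited above
STABLECOIN_FEE_DOLLARS = 0.02    # assumption: "pennies"; no exact figure is given


def card_fee_range(amount: float) -> tuple[float, float]:
    """Return the (low, high) card fee in dollars for a payment."""
    low_rate, high_rate = CARD_FEE_RATES
    return round(amount * low_rate, 2), round(amount * high_rate, 2)


def compare_rails(amount: float) -> dict[str, tuple[float, float]]:
    """Map each rail to its (low, high) fee in dollars for one payment."""
    return {
        "card": card_fee_range(amount),
        "wire": (WIRE_FEE_DOLLARS, WIRE_FEE_DOLLARS),
        "stablecoin (assumed)": (STABLECOIN_FEE_DOLLARS, STABLECOIN_FEE_DOLLARS),
    }


for payment in (50.0, 1_000.0, 20_000.0):
    print(f"Payment of ${payment:,.2f}")
    for rail, (low, high) in compare_rails(payment).items():
        print(f"  {rail:<22} ${low:,.2f} to ${high:,.2f}")
```

The point of the sketch is only the shape of the comparison: a percentage fee grows with the payment, while a flat fee of a few cents does not.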

Moody’s, Proxy Advisors, and Index Funds

Moody’s power, Gurley explains, comes from being a trusted standard, the watermark, so even AI on the back end does not displace it. Proxy advisors like ISS are a different story: they score companies in a black box, refuse to reveal the criteria, and then get paid by the same companies that want to learn how to score better, which he calls more of a heist than a service. They drifted from a shareholder-interest mandate into a corporate-governance, fraud-mitigation posture obsessed with rules, which is why they reflexively opposed the Tesla pay package that only paid Elon Musk if the stock soared, a deal Gurley says he would sign for every company he has worked with. The rise of passive index funds compounds the problem, concentrating voting power in firms without time to evaluate votes; he would prefer they abstain or vote in proportion to active holders, since closet indexing during the Magnificent Seven run already distorted active management.

Storytelling, Writing, and Founder Advantages

Gurley fell in love with the craft of writing in business school, moving from business books to personal development titles like Dale Carnegie and Seven Habits, then biographies, then long-form narrative nonfiction by Malcolm Gladwell, Michael Lewis, and Jon Krakauer, the New Journalism that reads like fiction. Writing forces clarity: he cites Bezos’s six-page memo as a tool that makes you think through corner cases and tie up loose ends, and notes that codifying his marketplace knowledge and publishing it turned his blog into a calling card that magnetized founders and deal flow. He lists the top founder traits as storytelling, product instincts, understanding the edge, and determination. Storytelling matters because founders are constantly recruiting, fundraising, and closing customers and partners. Product instinct is nearly unteachable, present in well under 5 percent of non-product hires. And determination is Bezos’s single angel-investing test: will this person do it no matter what, come hell or high water.

Uber, Benchmark, and the Shape of Venture

The Uber lesson with no HBS case study was that a winner-take-all category with network effects demanded funding ad nauseam, producing burn rates bigger than any public company would dare, with no precedent and no mentor to call, exactly the situation AI companies now face, only with a zero added. Gurley credits Benchmark’s design, an equal partnership with no king, president, or lead and five equal partners, for making it easy to recruit top talent, encouraging senior partners to develop newcomers since everyone shares the upside, and eliminating annual comp politics. The downside is that without a CEO it is hard to scale or run new initiatives, famously captured by the firm settling on a single splash-page website. Founders choose a VC for reputation and network effects, the stamp of approval that carries weight, and young investors can break in because they often match founders’ age and can outwork everyone to master a fresh niche like esports or YouTube, which is why the industry bends toward youth. Asked what success means now, Gurley says his venture career was a dream job he would have done for free, but it is done; inspired by Arthur Brooks’s From Strength to Strength, he wants to apply his synthesizing and writing to bigger societal problems and dent the universe a little.

Notable Quotes

“We do live in a world where information is really cut up, but we also live in a world where you can have access to more information than you ever could.”
Bill Gurley, on why the abundance of knowledge rewards the curious

“You got to be really conscious of the consequence and not get too deterministic about a single metric or a single variable.”
Bill Gurley, on the discipline of systems thinking

“Value just means that the asset is underpriced relative to what you think it will be worth in the future.”
Bill Gurley, relaying Bill Miller’s reframing of value investing

“I’ve always thought of Wall Street as the buyer of the product that venture capitalists create.”
Bill Gurley, on why founders should think about the public market early

“One society, when the farmers come to market, they just sell each other goods and then they go back. The other society, when the farmers come to market, they’re forced to share best practices. Which one is going to evolve faster?”
Bill Gurley, on why open source models can out-innovate

“If you took a freshman computer science student and a freshman finance student and said imagine how a company should go public, they would match supply and demand anonymously like you would in any auction.”
Bill Gurley, on the rigged IPO process

“When I meet an entrepreneur, there’s only one thing I ask myself. Is this person gonna do this no matter what? Come hell or high water, they’re doing this.”
Bill Gurley, quoting Jeff Bezos on his single test for angel investing

“You’re recruiting employees, you’re recruiting executives, you’re raising money, you’re closing customers, you’re closing partnerships. You’re selling all the damn time.”
Bill Gurley, on why storytelling is a top founder trait

“I often said that if we lived in a socialist society and everyone had to work for free, I would still take that job.”
Bill Gurley, on loving his venture career

“I would like to see if I can apply those techniques to bigger, broader problems in society and dent the universe a little bit that way.”
Bill Gurley, on what success looks like in his next chapter

Watch the full conversation with Bill Gurley on The Knowledge Project here.

Related Reading
- Bill Gurley (Wikipedia): background on the Benchmark general partner behind Uber, OpenTable, and Zillow.
- Santa Fe Institute: the complexity-theory research center whose systems thinking anchors Gurley’s worldview.
- Thinking in Systems by Donella Meadows: the book Gurley cites for learning to see multivariable nonlinear systems.
- From Strength to Strength by Arthur Brooks: the framework guiding how Gurley thinks about his next chapter.
- The Knowledge Project with Shane Parrish: the full interview this post draws from.
June 10, 2026
Whale Rock Capital Founder Alex Sacerdote on S-Curve Investing, Why Anthropic Is His Highest Conviction Bet, and the Decommoditization of AI Hardware
Alex Sacerdote built Whale Rock Capital into one of the most respected technology hedge funds in the world by treating markets through a single disciplined lens: the technology adoption S-curve. In this long conversation on Invest Like the Best with Patrick O’Shaughnessy, he lays out the full framework that has carried him through internet 1.0, mobile, cloud, e-commerce, and now AI, and he explains why Anthropic became his highest conviction position, why his fund went net short application software, and why the least glamorous corner of the market, the hardware and chips that build out data centers, may be one of the best ways to play artificial intelligence right now. What follows is the working theory of a money manager who has spent twenty years trying to think exponentially while the rest of the market thinks one quarter at a time.

TLDW

Sacerdote walks through Whale Rock’s three-part investment framework: find the right part of an S-curve, identify the company with a durable competitive advantage, and buy when long-term earnings power is underappreciated. He tells the story of investing in Anthropic at a 180 billion dollar valuation in August 2025 after Claude Code made coding the true unlock of AI, and frames the foundational model market as a three-horse race between Anthropic, OpenAI, and Google that resolved from sixty startups into an oligopoly. He argues enterprise AI is less than 1 percent penetrated, calls the adoption shape an L curve rather than an S-curve, and warns there is not enough compute in the world. He explains why he sold almost all of his application software and went net short, and why he loves the decommoditization of AI hardware (Celestica, Corning, Elite Materials, Delta, Advanced Energy, high bandwidth memory, 40-layer PCBs). He introduces a modified rule of 40 for chip investing and surveys the moats that let leaders win (network effects, industry standard, scale, critical IP, brand, recursive self-improvement). He also discusses moving from public markets into private deals like Stripe and Anthropic, lays out Whale Rock’s fund products including the new Mega Cap Tech Fund, and defends old-fashioned scuttlebutt research in an AI age. He closes on the kindest thing anyone ever did for him: his father joining the firm after 41 years at Goldman Sachs.

Thoughts

The most useful idea in this conversation is not the bullishness on AI, which is everywhere now, but the discipline underneath it. Sacerdote’s framework forces a separation that most investors collapse. A great market is not a great investment. A great company is not a great investment. You need a tall S-curve, a company with a moat that survives the curve, and a price that does not yet reflect the earnings power. He says the quiet part out loud: he has repeatedly bought the best companies in the world at four or five times earnings precisely because the market refuses to extrapolate exponential growth. Nvidia at four times earnings in 2023, Tesla at five times in 2019, Amazon where AWS came free. The edge is not information, it is the willingness to underwrite two to four years out when the consensus cannot see past the next quarter.
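
One common way to picture an S-curve is the logistic function. That choice is an assumption here, since the interview does not give a formula, and the parameter names below are hypothetical. Under that model, the growth rate relative to the current level falls in a straight line as penetration rises, which is one way to read the takeaway that growth stops being exponential around 30 to 40 percent penetration.

```python
import math


def penetration(t: float, midpoint: float = 0.0,
                steepness: float = 1.0, ceiling: float = 1.0) -> float:
    """Logistic adoption curve: share of the market adopted at time t."""
    return ceiling / (1.0 + math.exp(-steepness * (t - midpoint)))


def relative_growth(p: float, steepness: float = 1.0,
                    ceiling: float = 1.0) -> float:
    """Growth per unit time as a fraction of the current level.

    For a logistic curve dp/dt = r * p * (1 - p / K), so (dp/dt) / p
    equals r * (1 - p / K). Early on this is close to r (near-exponential
    growth); it shrinks steadily as penetration approaches the ceiling K.
    """
    return steepness * (1.0 - p / ceiling)


for share in (0.01, 0.10, 0.35, 0.50):
    pace = relative_growth(share)
    print(f"At {share:.0%} penetration, growth runs at {pace:.0%} of its early pace")
```

In this toy model the curve's height (the ceiling) matters as much as its slope: the same absolute adoption is a much smaller share of a taller curve, so growth stays near its early pace for longer.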

The Anthropic story is the framework applied in real time, and it is worth noting how late and how cautious he was. Whale Rock passed on the 60 billion dollar round because gross margins were negative and coding had not yet exploded. They only got conviction once Claude Code flipped from autocomplete to agentic work, once they heard Anthropic engineers were burning 100 dollars a day in tokens, and once the math on twenty million coders implied a half trillion dollar market from coding alone. The lesson he repeats throughout, that it is okay to be late, that you can miss the first 100 percent if the curve is tall enough, is a direct rebuke to the fear of missing out that drives most AI investing. He waited for the moat to be visible before he paid up.
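
The coding-market math can be reproduced as a back-of-the-envelope calculation. The sketch below uses the 100 dollars a day and roughly 20 million coders from the interview; the 250 working days a year is an assumption, chosen because it lands inside the 20 to 30 thousand dollars a year per engineer cited in the takeaways.

```python
# Back-of-the-envelope version of the coding-market estimate.
TOKEN_SPEND_PER_DAY = 100        # dollars per engineer per day (from the interview)
CODERS_WORLDWIDE = 20_000_000    # about 20 million coders (from the interview)
WORKING_DAYS_PER_YEAR = 250      # assumption: not stated in the interview


def annual_spend_per_coder(per_day: float, days: int) -> float:
    """Yearly token spend for one engineer, in dollars."""
    return per_day * days


def implied_market(per_day: float, days: int, coders: int) -> float:
    """Yearly token spend if every coder spent at that rate, in dollars."""
    return annual_spend_per_coder(per_day, days) * coders


per_coder = annual_spend_per_coder(TOKEN_SPEND_PER_DAY, WORKING_DAYS_PER_YEAR)
market = implied_market(TOKEN_SPEND_PER_DAY, WORKING_DAYS_PER_YEAR, CODERS_WORLDWIDE)

print(f"Per coder: ${per_coder:,.0f} a year")                  # $25,000
print(f"Implied market: ${market / 1e9:,.0f} billion a year")  # $500 billion
```

The estimate is an upper bound by construction: it assumes every coder in the world spends at the rate reported for Anthropic's own engineers.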

His most contrarian and most actionable call is on hardware. The consensus reflex is that chips and components are commodities that get competed to zero. Sacerdote argues the opposite is happening: AI workloads growing 10x a year are pushing every layer of the server to its physical limits, and that pressure is decommoditizing the entire stack. A liquid-cooled AI server is a 300,000 dollar piece of critical infrastructure, not a 5,000 dollar throwaway box, which means the supplier becomes a permanent fixture like a parts vendor on a plane. The Celestica example is the template: a contract manufacturer left for dead since 1999 that turned out to be the sole supplier of Google’s TPU server and a leader in liquid cooling and Ethernet switching, trading at eight times earnings. If he is right that we are 30 percent short on DRAM, NAND, and PCBs, the picks-and-shovels trade has years left to run regardless of which model company wins.

The software bear case deserves the most scrutiny because it is the most consequential and the least certain. Going from 40 to 50 percent of the portfolio in software to net short is a violent reallocation, and his reasons are layered: AI products that nobody will pay for, CIO budgets being raided to fund Anthropic tokens, pricing power evaporating, and the long-term threat that AI-native startups rebuild incumbents from scratch. But he is honest that the bull case is real too, that old technology is sticky, that companies prefer to buy rather than build, and that AI might actually make platforms like Slack or CRM more important if agents end up operating inside them. This is the genuine uncertainty in the whole AI trade. The bottom of Jensen’s cake, chips and models, is where the value has accrued so far, but historically the application layer captured most of the market cap. Sacerdote is betting that this time the infrastructure and model layers hold the value longer, and he admits the application ecosystem is still unclear and a little bit dangerous. That admission is more valuable than any of his confident calls.

Finally, the section on research in an AI age is a quiet refutation of the idea that this work automates away. Sacerdote runs a Philip Fisher scuttlebutt operation, 2,500 to 3,000 face-to-face management meetings a year, two decades of compounding relationships, the tripod of conviction where he, his analyst, and a respected outsider all independently like an idea. AI writes better notes now, but the paragraph on top, the wisdom about what it means and how it fits the thesis, is still human. The durable moat in his own business is the same one he looks for in the companies he buys: an accumulated advantage that newcomers cannot replicate quickly. That consistency between how he invests and how he operates is the most credible thing in the interview.

Key Takeaways
- Whale Rock’s framework has three legs: identify the right part of a technology S-curve, find the company with a powerful competitive advantage, and invest when long-term earnings power is underappreciated.
- The core insight is exponential, not linear. Strong tech business models grow earnings exponentially, and because the market refuses to extrapolate, you can buy elite companies at very low multiples.
- Concrete examples of buying exponential growth cheaply: Nvidia at four times earnings in 2023, Tesla at five times in 2019, Apple at four times, and Amazon where AWS was effectively free.
- When ChatGPT launched in November 2022, Whale Rock did a firm-wide deep dive and chose to invest in chips and infrastructure first, because demand arrives there first and the winners are knowable regardless of who wins the model layer.
- The foundational model market went from roughly 60 startups to a three-horse race: Anthropic, OpenAI, and Google. Most startups died, Amazon never showed up, and Meta faltered and had to reboot.
- Anthropic was the dark horse that focused purely on enterprise while OpenAI won consumer. Whale Rock made it their highest conviction position.
- Coding is the true unlock of AI. The progression went from Microsoft Copilot at 20 dollars a month (fixing grammar, finding a bug) to Claude running agentically and writing most of the code.
- The market math: Anthropic engineers were reportedly spending 100 dollars a day on tokens, roughly 20 to 30 thousand dollars a year, and with about 20 million coders in the world that implies a half trillion dollar market from coding alone.
- Whale Rock invested in Anthropic at the 180 billion dollar valuation in August 2025, when the company hoped to reach 9 billion in revenue and nobody yet knew what 2026 could be.
- Andrej Karpathy and Linus Torvalds both flipped on AI coding. Karpathy went from 80 percent handwritten code to writing almost no code except in English.
- Models are not pure commodities. There is real differentiation: Anthropic is strong for private equity and finance, Google is strong at ingesting PDFs, and routers that switch between models mask but do not erase that differentiation.
- Anthropic is building an ecosystem around the API (SDK, orchestration, the harness, tools), echoing how AWS built lock-in with products around commodity servers starting in 2013.
- The 800 million people using AI are mostly using AI 1.0, a search engine on steroids. Sundar Pichai estimated only about 10 basis points of knowledge workers are truly using AI’s new capabilities.
- Enterprise AI is less than 1 percent penetrated. Whale Rock calls the adoption shape an L curve or backwards L curve because it goes straight up, unlike the slower 30 to 50 percent growth of cloud and SaaS.
- There is not enough compute in the world. Anthropic reportedly has half of what it needs, and Marc Andreessen said the one thing he is sure of is that there will not be enough compute for the next four years.
- The infrastructure S-curve is only about 10 percent penetrated and remains one of the best ways to play AI.
- Getting into private deals requires a double opt-in. Whale Rock built a 90-page deck (made with Claude Code) on the coding market to win its Anthropic allocation, and its first private investment was Stripe in 2020 at a 35 billion dollar valuation.
- The unicorn private market is now bigger than most European stock markets, larger than Germany or the UK individually. Whale Rock does 2,500 to 3,000 management meetings a year, 10 to 15 percent with privates.
- S-curves come in two sizes: mega S-curves (internet, mobile, cloud, e-commerce, AI) and sub S-curves within them. AI is the biggest of all and each curve builds on the last.
- Adoption inflects when barriers fall. Steve Jobs cut the smartphone price to 200 dollars on a 3G touchscreen; Elon Musk cut the EV price to 40,000 dollars with 300-mile range and a working supply chain. Remove the barriers and you get the tornado of demand.
- Knowing how tall the curve is tells you when to sell. Growth stops being exponential around 30 to 40 percent penetration, when the sell side catches up and big beats end. EVs hit a wall at 10 to 15 percent instead of the expected 40 to 50 percent.
- Selling Apple in 2012 at roughly 50 percent US smartphone penetration was a mistake, because the moat let it keep compounding around 20 percent even after the explosive phase ended.
- At strategic inflection points you cannot trust the data (Andy Grove). The signal is intuition and anecdote: a 12-year-old in China on a giant phone playing a real game, or standing-room-only sessions at the Gartner IT Symposium for AWS, VMware, and Splunk.
- Adoption slope varies. The radio curve hit near-full penetration in about 7 years, while B2B and infrastructure (the dishwasher that has to be plugged in) take far longer. AI is fast because you just open a browser.
- The moats that let leaders win: network effects, becoming an industry standard, rapid scale, critical intellectual property, brand, and platform lock-in. Anthropic appears to have critical IP, enterprise brand, escape velocity, and recursive self-improvement from using its own code on its own models.
- On the internet, the leader usually goes bigger, faster, and wins, and compounds on itself (Amazon, Shopify). Exceptions come at paradigm shifts, like AOL failing to make the dialup-to-broadband transition.
- Whale Rock went from 40 to 50 percent in software five years ago to net short entering this year, which helped performance in the first quarter. AI products were not good enough to charge for and were not moving the needle.
- Software faces a stack of headaches: falling priority on CIO to-do lists, budget pressure from token spend, lost pricing power, hiring freezes that hurt seat-based models, and the long-term threat of AI-native replacements.
- The classic rule of 40 is growth rate plus operating margin. Whale Rock’s modified rule of 40 for chip investing is percent of sales that are AI plus market share in that category. Software AI exposure is still only 1 to 2 percent.
- AI may make some platforms more important. The first thing you do with Claude is plug it into Slack, which could make Slack a permanent repository, and agents may end up operating inside incumbent tools like CRM, solidifying rather than killing them.
- The data center stood still for 40 years on Intel x86, with every component commoditized. AI changed that. Workloads growing 10x a year are driving the decommoditization of the hardware industry.
- Celestica is the template: a contract manufacturer left for dead since 1999, sole supplier of the Google TPU server, strong in liquid cooling and Ethernet white-box switching, with 50 to 60 percent share of the cloud Ethernet switch market, once trading at eight times earnings.
- The whole supply chain is rerating: high bandwidth memory stacked 10 chips high, 40-layer PCBs (versus 10 for a normal server), Elite Materials copper clad laminate, Corning fiber (enough to circle the world four and a half times in one Microsoft data center), and Delta and Advanced Energy power supplies seeing ASPs rise 40 percent a year.
- Networking has three layers: scale out (racks together), scale across (data centers together), and scale up (every GPU in a rack, currently copper, eventually fiber). The copper-to-fiber shift could double or triple Corning’s opportunity.
- Whale Rock estimates the market is roughly 30 percent short on DRAM, NAND, and PCBs even at today’s 10 basis points of real AI usage.
- Rate of change matters more than absolute level. When Claude plotted market share data it missed the rate of change, the thing that drives accelerating growth and margins as a company moves from 10 to 30 percent share.
- Key risks: public and government negativity toward AI (Maine reportedly banned data centers, only 20 percent of people are optimistic), models hitting a wall and letting open source catch up into a race to the bottom, and a major player faltering and stranding compute.
- Chip companies do not care who wins the token war, which makes them a relatively safe way to play AI. Jensen Huang actively wants open source to take off.
- Research is still human work. Whale Rock runs a Philip Fisher scuttlebutt process, the tripod of conviction (Alex, the analyst, and a respected outsider), and 20 years of compounding knowledge. AI writes better notes but cannot supply the wisdom paragraph on top or pick stocks.
- The firm’s product evolution: 15 years as a long short fund, a long only fund in 2020 that is now larger than the long short, opt-in privates formalized around 2015 and activated in 2020, an 80 percent privates hybrid fund in 2021, and the new Whale Rock Mega Cap Tech Fund.
- The Mega Cap Tech Fund thesis: endowments are structurally underweight the largest tech companies because they believe there is no alpha in large cap. Whale Rock takes the top 30 global market caps and picks the best 12 or 13, arguing it takes 100 diversified PMs to realize Google is a winner.
- The kindest thing anyone ever did for Sacerdote: his father, after 41 years at Goldman Sachs, joined Whale Rock as chairman and the gray hair for six years until he passed away in 2011.
Detailed Summary

The Anthropic Investment and the Three-Horse Race

When ChatGPT launched in November 2022, Whale Rock immediately took its 10-person team and ran a firm-wide deep dive. Sacerdote’s first principle is that every new compute paradigm creates a new stack with new winners and losers, and in this stack the layers run from power and chips at the bottom, to the clouds, to the foundational models, to the applications on top. In early 2023 the firm deliberately positioned in chips and infrastructure first, reasoning that demand arrives there first and the winners are knowable no matter who wins above. At an April 2023 webinar they framed the model layer as a toss-up among four outcomes: winner-take-all, total commodity, a race to zero, or an oligopoly of three or four. Over the next three years the answer became clear: of roughly 60 startups, almost all died, Amazon never really showed up, Meta came in strong then faltered and rebooted, and Anthropic emerged as the dark horse focused purely on enterprise while OpenAI won consumer and Google remained a perennial threat. The result looked like the cloud market, where three companies underpin the entire SaaS world with excellent businesses.

The decisive factor was code. Sacerdote says the firm was initially skeptical AI could replace labor, given the negative corporate feedback on early models. That changed in 2025 when Claude Code and the agentic coding tools exploded. The progression ran from Microsoft Copilot at 20 dollars a month, which could improve coding grammar or find a bug, to Claude running agentically and doing far more. The token economics were staggering. Anthropic engineers were reportedly spending 100 dollars a day, which annualizes to 20 to 30 thousand dollars, and with 20 million coders worldwide that implied a half trillion dollar market from coding alone, on technology that was only 7 to 9 months old. Whale Rock made the investment at the 180 billion dollar valuation in August 2025, writing in their letter that the company hoped to reach 9 billion in revenue after growing from 100 million to a billion, a pace unlike anything they had ever seen, with no one yet knowing what 2026 could bring.
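The coding-market figure above is a simple multiplication, and a quick sketch makes the assumptions visible. The daily spend and coder count come from the text; the number of working days per year is an assumption chosen to bracket the quoted 20 to 30 thousand dollar annual range, and the function names are illustrative only.

```python
# Back-of-envelope check of the coding-market arithmetic quoted above.
DAILY_TOKEN_SPEND_USD = 100        # from the text
CODERS_WORLDWIDE = 20_000_000      # from the text
# Assumption: the text gives only the annualized range, not a day count.
WORKING_DAYS_SCENARIOS = (200, 250, 300)


def annual_spend_per_coder(daily_usd: int, days: int) -> int:
    """Annualized token spend for one engineer."""
    return daily_usd * days


def implied_market_usd(daily_usd: int, days: int, coders: int) -> int:
    """Total annual spend if every coder spent at the same rate."""
    return annual_spend_per_coder(daily_usd, days) * coders


for days in WORKING_DAYS_SCENARIOS:
    per_coder = annual_spend_per_coder(DAILY_TOKEN_SPEND_USD, days)
    market = implied_market_usd(DAILY_TOKEN_SPEND_USD, days, CODERS_WORLDWIDE)
    print(f"{days} days: ${per_coder:,} per coder, ${market / 1e9:,.0f}B market")
```

The half trillion dollar figure corresponds to the 250-day case, and it rests on every one of the 20 million coders spending at the rate reported for Anthropic's own engineers, which is the upper-bound framing the estimate depends on.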

Why the Models Are Not Commodities

Everyone expected the foundational models to be pure commodities, but Sacerdote argues there is tremendous differentiation within them. Different training methods produce different skills: Anthropic excels at anything touching private equity and finance, Google is strong at ingesting PDFs. Routers that switch between models make them look like commodities but mask genuine, critical IP. Beyond the model itself, Anthropic is building a whole ecosystem around the API: the SDK, the orchestration layer, the tools, and the harness, the software wrapped around the API that gets the most out of the model. He compares this directly to AWS in 2013, when people dismissed cloud as commodity servers in a warehouse and missed that Amazon was inventing products that slowly built lock-in. The open-source risk from China is real, but Sacerdote got comfortable that leading-edge token quality is superior, because going from 80 to 85 percent of benchmark performance is a huge unlock and the open-source players lack the compute to leapfrog the frontier.

The S-Curve Framework in Full

Whale Rock’s whole edge is thinking exponentially when the world thinks linearly. Sacerdote argues very few people believe you can accurately predict two, three, or four years out, but if you understand the S-curve, the moats, and how to model, you can. Every technology follows the same pattern: it exists hidden for years (smartphones 10 years before the iPhone, the internet 20 years before Netscape, EVs 15 years before Tesla went vertical in 2019) until the barriers to adoption fall and demand inflects into a tornado. Knowing how tall the curve is tells you when to sell, because exponential growth stops around 30 to 40 percent penetration when the sell side catches up. Curves can also be dynamic: AWS turned out to address a far larger TAM than expected once it became clear cloud was not actually deflationary. There are mega S-curves (internet, mobile, cloud, e-commerce, AI) and sub S-curves within them. AI is the biggest. And slope varies enormously with the nature of the technology: radio reached near-full penetration in about 7 years, while B2B and infrastructure take decades because, like a dishwasher, they have to be plugged into existing systems.
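One way to see why growth stops looking exponential well before the market is saturated is to model the S-curve as a logistic function. This is an assumption for illustration, since the conversation does not specify a functional form, and the parameters `k` and `t_mid` are hypothetical. Under a logistic curve, the growth rate relative to current adoption is `k * (1 - p)`, so it decays steadily as penetration `p` rises.

```python
import math


def penetration(t: float, k: float = 1.0, t_mid: float = 0.0) -> float:
    """Logistic S-curve: share of the eventual market adopted at time t.

    k (steepness) and t_mid (time of 50 percent penetration) are
    hypothetical parameters, not figures from the interview.
    """
    return 1.0 / (1.0 + math.exp(-k * (t - t_mid)))


def relative_growth(p: float, k: float = 1.0) -> float:
    """Growth rate as a fraction of current adoption: (dp/dt) / p = k * (1 - p)."""
    return k * (1.0 - p)


# With k = 1, relative_growth(p) is also the fraction of the early,
# near-exponential growth rate that remains at penetration p.
for p in (0.01, 0.10, 0.35, 0.50, 0.90):
    print(f"penetration {p:.0%}: {relative_growth(p):.0%} of the early growth rate")
```

In this simplified model roughly a third of the early growth rate is already gone by 35 percent penetration, which is consistent with the heuristic that the explosive phase ends around 30 to 40 percent. Real adoption curves need not be logistic, and the height of the curve (the eventual market size) is the harder input to estimate.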

On timing, Sacerdote is relaxed about being late. Citing Peter Lynch, who mentored him at Fidelity and told him to white out the chart because it is all about the future, he argues it is fine to miss the first one, two, or three years and even the first 100 percent if the top of the curve is half a trillion. At strategic inflection points, per Andy Grove, you cannot trust the data, so the firm relies on intuition and anecdote: a 12-year-old in China playing a real video game on a huge phone, or the AWS session at the Gartner IT Symposium that was standing-room-only at 9, 10, and 11 in the morning. Spotting the leader pulling away matters because, on the internet, the leader usually goes bigger, faster, and wins, compounding on itself, with exceptions only at paradigm shifts like AOL missing the move from dialup to broadband.

The Software Bear Case

Five years ago Whale Rock had 40 to 50 percent of its portfolio in software. Their April 2023 thesis was that incumbents with huge sales forces and proprietary data would take the AI APIs and build great products. Instead, the AI products were not good enough to charge for and did not move the needle, so the firm sold almost all of its application software and entered this year net short, which helped in the first quarter. The bear case is layered: software has fallen down the CIO priority list, budgets are being raided to fund Anthropic tokens with faster ROI, annual price increases look risky, and hiring freezes hurt seat-based models. The deeper threat is that AI-native startups could rebuild any incumbent from scratch, obviating the data advantage. The bull case is genuine too: old tech is sticky (mobile games did not kill consoles, tablets did not kill the PC), companies prefer to buy rather than build, and an ERP is hard to replace. Sacerdote also floats an optimistic twist, that AI could make platforms like Slack more important as agent repositories, and that agents operating inside CRM could solidify rather than destroy it, even as the bear case is that CRM goes headless and gets relegated to a database.

The Decommoditization of AI Hardware

This is Sacerdote’s most differentiated call. For 40 years nothing changed in the data center; Intel x86 became the standard, compute grew 25 to 40 percent a year in line with Moore’s law, and every component, from the printed circuit board to memory to enclosures to networking, commoditized. AI broke that. Workloads now grow 10x a year and push every aspect of the hardware to its physical limits, creating both tremendous unit growth and what Whale Rock calls the decommoditization of the hardware industry. He cites Shaun Maguire wishing he could run a hardware hedge fund because all the companies are public with powerful IP, and compares it to Sequoia’s best early hardware investments in Apple and Cisco. The economics flip because an AI server is a liquid-cooled, 200 to 300 thousand dollar piece of critical infrastructure where a single failure brings the whole thing down, so suppliers become permanent like a critical part on a plane.

Celestica is the marquee example: a contract manufacturer in an industry that had been a disaster since 1999 and moved offshore to China. Celestica kept its IBM supercomputing heritage and talent, became the sole supplier of the Google TPU server, and was trading at eight times earnings three years ago. It turned out to be excellent at liquid cooling where others failed, holds 50 to 60 percent share of the crucial cloud Ethernet switch market, and its engineers helped write the open-source SONiC software, working closely with Broadcom. The same dynamic runs up and down the chain: high bandwidth memory stacked 10 chips high that took Samsung years to master, 40-layer PCBs versus 10 for a normal server with very few suppliers able to make them, Elite Materials supplying the copper clad laminate, and Corning’s fiber, thinner and more bendable, with enough in a single Microsoft data center to circle the world four and a half times. Networking splits into scale out, scale across, and scale up, with the eventual copper-to-fiber shift in scale up potentially doubling or tripling Corning’s opportunity. Power supplies from Delta and Advanced Energy are seeing ASPs rise 40 percent a year at higher margins because each Nvidia rack uses 50 to 125 percent more power. Visibility has gone from “we’ll call you next week” to “design this roadmap with us for four years,” turning 5 percent low-margin businesses into 35 to 50 percent topline growers with rising margins, and the whole market is roughly 30 percent short on DRAM, NAND, and PCBs.

Private Markets, Risks, and the Research Machine

Moving from public markets into privates meant adapting to a double opt-in, where the company has to choose to let you in. Whale Rock won its Anthropic allocation partly by building a 90-page deck with Claude Code scouring the internet for feedback on the coding market. Their first private was Stripe in April 2020 at a 35 billion dollar valuation, which they could only underwrite because they knew the public comp Adyen cold, and they upsized to a 100 million dollar block. The unicorn market is now bigger than most European stock markets combined. On risk, Sacerdote worries about public and government negativity (Maine reportedly banning data centers, only 20 percent of people optimistic), the possibility that models hit a wall and open source catches up into a race to the bottom, and a major player faltering and stranding compute, though he notes someone else (like Meta stepping into a cancelled Oracle deal) would likely absorb it, and that chip companies benefit regardless of who wins the token war. He explains his caution on the application layer by noting that it always comes later: the iPhone took years to spawn its app economy, and the ecosystem is still unclear and a little dangerous. He points to Bret Taylor’s Sierra as the kind of company that could prove it out.

On the research itself, Sacerdote insists AI has not supplanted the analyst. Whale Rock runs the scuttlebutt approach straight out of Philip Fisher’s Common Stocks and Uncommon Profits, doing 2,500 to 3,000 face-to-face management meetings a year and talking to suppliers, customers, and competitors. AI now writes much better notes and gets the team up to speed quickly on complex areas like ABF substrates, but there must be a wisdom paragraph on top, and it cannot pick stocks or replicate the work two analysts did building conviction in AppLovin and a relationship with Adam Foroughi. He calls the firm the Whale Rock learning machine, a group of 10 highly experienced people compounding knowledge for 20 years, with the tripod of conviction (himself, his analyst, and a respected outside investor all liking an idea) as the test. The firm’s products evolved from a 15-year long short fund to a 2020 long only fund now larger than the original, opt-in privates, an 80 percent privates hybrid in 2021, and the new Mega Cap Tech Fund built on the thesis that endowments are structurally underweight the largest tech companies because they wrongly believe large cap has no alpha. He closes on his father, who left Goldman after 41 years to join Whale Rock as chairman and the gray hair until his death in 2011, a mentor remembered by countless people for his humility and grace.

Notable Quotes

“When you get the right part of the S-curve, you get exponential unit growth. If you have a very strong business model, your earnings don’t grow linearly, they grow exponentially.”
Alex Sacerdote, stating the core of the Whale Rock investment framework

“The world doesn’t think exponentially. Very few people believe you can accurately predict two, three, four years out. But if you follow and understand the S-curve and you know the moats and you know how to model, you really can predict these great things.”
Alex Sacerdote, on why the market consistently underprices long-term earnings power

“The enterprise AI or enterprise application AI market is less than 1 percent penetrated, and we’ve never seen, you know, we talk about S-curves, we call this an L curve, just straight up.”
Alex Sacerdote, on why AI adoption looks different from every prior technology curve

“We’re at 10 basis points of people really using AI and we’re already sold out. There’s not enough compute in the world. So Anthropic has half of what they need right now, and that’s before this huge takeup.”
Alex Sacerdote, on the scale of the compute shortage relative to actual adoption

“It’s okay to be late. It’s okay to miss the first one, two, three years in a lot of cases, because if the top of the S-curve is half a trillion, the growth can go on for a long time. It’s okay to miss the first 100 percent.”
Alex Sacerdote, on why fear of missing out is the wrong instinct in a tall S-curve

“The old way of software is like using a pen and paper or a horse and buggy. The new way of software is like a jet engine or frankly like the transporter from Star Trek. It’s so revolutionary it feels like it has to be disruptive.”
Alex Sacerdote, explaining why Whale Rock went net short application software

“You become like critical infrastructure, like selling a critical part on a plane. You’ll never get swapped out.”
Alex Sacerdote, on how liquid-cooled AI servers turned commodity hardware suppliers into permanent fixtures

“Why do you tell everyone your secret? It’s like why does the casino teach people how to play blackjack? It’s harder. It’s really hard to do.”
Alex Sacerdote, quoting his mother on why a public framework does not erase the edge

“He said, you know, I’ve been at Goldman for 41 years. How about I come and join you? I’ll be the gray hair. I’ll be the oversight. I’ll be the chairman. You do what you do.”
Alex Sacerdote, recalling his father joining Whale Rock, the kindest thing anyone ever did for him

Watch the full conversation here: Whale Rock Capital Founder on Investing in the Age of Exponential AI.

Related Reading
- Invest Like the Best (Colossus) — the podcast where Patrick O’Shaughnessy hosts this conversation and a deep archive of investor interviews.
- Technology adoption life cycle (Wikipedia) — the tinkerers-to-mainstream model that underpins the entire S-curve framework Sacerdote uses.
- Anthropic — the maker of Claude and Claude Code, Whale Rock’s highest conviction position and the center of this discussion.
- Common Stocks and Uncommon Profits by Philip Fisher — the 1950s classic whose scuttlebutt method still drives Whale Rock’s research process.
- Andy Grove (Wikipedia) — the Intel leader whose idea that you cannot trust the data at strategic inflection points anchors Sacerdote’s approach to timing.
June 9, 2026
Krishna Rao on Anthropic Going From 9 Billion to 30 Billion ARR in One Quarter and the Compute Strategy Powering Claude
Krishna Rao, Chief Financial Officer of Anthropic, sat down with Patrick O’Shaughnessy on Invest Like the Best for one of the most detailed public looks yet at the operating engine behind Claude. He covers how Anthropic compounded from $9 billion of run rate revenue at the start of the year to north of $30 billion by the end of Q1, why he spends 30 to 40 percent of his time on compute, the playbook for buying gigawatts of AI infrastructure across Trainium, TPU, and GPU platforms, how Anthropic prices its models, why returns to frontier intelligence keep climbing, and what the Mythos release tells us about the cyber capabilities of the next generation of Claude.

TLDW

Anthropic is running the most compute fungible frontier lab in the world, with active deployments across AWS Trainium, Google TPU, and Nvidia GPU, and an internal orchestration layer that lets a chip serve inference in the morning and run reinforcement learning the same evening. Krishna Rao explains the cone of uncertainty that governs gigawatt scale compute procurement, the floor Anthropic refuses to drop below on model development compute, the Jevons paradox unlock from cutting Opus pricing, the 500 percent annualized net dollar retention from enterprise customers, the layer cake of long term deals with Google, Broadcom, Amazon, and the recent xAI Colossus tie up in Memphis, the phased release of the Mythos model in response to spiking cyber capabilities, the internal use of Claude Code to produce statutory financial statements and run a Monthly Financial Review skill, and why the team believes scaling laws are alive and well. The interview also covers fundraising history through Series D and Series E, the $75 billion already raised plus another $50 billion coming, talent density beating talent mass during the Meta poaching wave, and Rao’s belief that biotech and drug discovery represent the most exciting frontier for AI.

Key Takeaways
- Anthropic entered the year with about $9 billion of run rate revenue and ended the first quarter with north of $30 billion of run rate revenue, a more than 3x leap driven by model intelligence gains and the products built around them.
- Compute is described as the lifeblood of the company, the canvas everything else is built on, and the most consequential class of decisions Rao makes. Buy too much and you go bankrupt. Buy too little and you cannot serve customers or stay at the frontier.
- Rao spends 30 to 40 percent of his time on compute, even today, and the leadership team meets repeatedly on both procurement and ongoing compute allocation.
- Anthropic is the only frontier language lab actively using all three major chip platforms in production: AWS Trainium, Google TPU, and Nvidia GPU. It is also the only major model available on all three clouds.
- Flexibility is the central design principle. Anthropic builds flexibility into the deals themselves, into the orchestration layer that maps workloads to chips, and into compilers built from the chip level up.
- The cone of uncertainty frames procurement. Small differences in weekly or monthly growth compound into wildly different two year outcomes, so the team plans across a range of scenarios rather than a single point estimate, and leans toward the upper end of the range while protecting the downside.
- Compute allocation across the company sits in three buckets: model development and research, internal employee acceleration, and external customer serving. A non negotiable floor protects model development even when customer demand is tight.
- Anthropic estimates that if it cut off internal employee use of its own models, the freed compute could serve billions of dollars of additional revenue. It chooses not to, because internal use compounds into better future models.
- Intelligence is multi dimensional, not a single IQ score. Anthropic measures real world capability through customer feedback, long horizon task performance, tool use, computer use, and speed at agentic tasks, not just leaderboard benchmarks that have largely saturated.
- Each Opus generation, 4 to 4.5 to 4.6 to 4.7, delivers both capability improvements and an efficiency multiplier on token processing. New models often serve customers at a fraction of the prior cost while doing more.
- Reinforcement learning is described as inference inside a sandbox with a reward function, so model efficiency gains directly improve internal RL throughput. The flywheel is tightly coupled.
- Over 90 percent of code at Anthropic is now written by Claude Code, and a large share of Claude Code itself is written by Claude Code.
- Anthropic shipped roughly 30 distinct product and feature releases in January and the pace has accelerated since.
- Scaling laws, in Anthropic’s internal data, are alive and well. The team holds itself to a skeptical scientific standard and still does not see them slowing down.
- Anthropic recently signed a 5 gigawatt deal with Google and Broadcom for TPUs starting in 2027, plus an Amazon Trainium agreement for up to 5 gigawatts, totaling more than $100 billion in commitments. A significant portion lands this year and next year.
- A new partnership for capacity at the xAI Colossus facility in Memphis was announced just before the interview, aimed at expanding consumer and prosumer capacity.
- Pricing has been remarkably stable across Haiku, Sonnet, and Opus. The biggest deliberate change was lowering Opus pricing, which produced a textbook Jevons paradox: consumption rose by far more than the price fell, and the new Opus 4.6 and 4.7 slot in at the same price point.
- Mythos is the first model Anthropic chose to release in a phased way because of a sharp spike in cyber capability. In an open source codebase where a prior model found 22 security vulnerabilities, Mythos found roughly 250.
- The Mythos release framework focuses on defensive use first, expands access over time, and is presented as a template for future capability spikes.
- Anthropic now sells to 9 of the Fortune 10 and reports net dollar retention above 500 percent on an annualized basis. These are not pilots. Rao describes signing two double digit million dollar commitments during a 20 minute Uber ride to the studio.
- The platform strategy is mostly horizontal. Anthropic will go vertical with offerings like Claude for Financial Services, Claude for Life Sciences, and Claude Security where it can demonstrate the model’s capabilities, but expects most application value to accrue to customers building on top.
- Anthropic has raised over $75 billion in equity since Rao joined, with another $50 billion in commitments tied to the Amazon and Google deals. Capital intensity is real, but the raises fund the upper end of the cone of uncertainty more than they fund current losses.
- The Series E close coincided with the day the DeepSeek news broke, forcing investors to reassess their AI thesis in real time. Anthropic closed the round anyway.
- Inside finance, Claude now produces statutory financial statements for every Anthropic legal entity, with a human checker. A library of more than 70 finance specific skills underpins workflows.
- A custom Monthly Financial Review skill produces a 90 to 95 percent ready monthly close report, so leadership discussion shifts from reconciling numbers to debating implications.
- An internal real time analytics platform called Anthrop Stats compresses weekly insight cycles from hours to about 30 minutes.
- The biggest token user inside Anthropic’s finance team is the head of tax, focused on tax policy engines and workflow automation. The most senior people, not the youngest, are leading internal adoption.
- Talent density beats talent mass. When Meta and others ran aggressive offer waves, Anthropic lost two people while peer labs lost dozens.
- All seven Anthropic co founders remain at the company, as do most of the first 20 to 30 employees, which Rao credits to a collaborative, transparent, debate friendly culture and a real culture interview that can veto otherwise top tier candidates.
- Dario Amodei holds an open all hands every two weeks, writes a short prepared document, and takes unscripted questions from anyone at the company.
- AI safety investments in interpretability and alignment have a commercial side effect. Looking inside the model helps Anthropic build better models, and enterprises running sensitive workloads want to trust the lab they hand customer data to.
- Anthropic explicitly identifies as America first in its approach to model development, and engages closely with the US administration on capability releases such as Mythos.
- The longer term product vision is the virtual collaborator: an agent with organizational context, access to the company’s tools, persistent memory, and the ability to work on ideas, not just tasks, over long horizons.
- CoWork, Anthropic’s extension of the Claude Code paradigm into general knowledge work, is being adopted faster than Claude Code itself when indexed to the same point in its launch curve.
- Anthropic’s product teams ship daily, with a fleet of agents working across the company on specific tasks. Everyone effectively becomes a manager of agents.
- The dominant downside risks to Anthropic’s high end forecast are slower customer diffusion of model capability into real workflows, scaling laws flattening unexpectedly, and Anthropic losing its position at the frontier.
- Rao is most excited about biotech and healthcare outcomes, especially the prospect that AI could push drug discovery and lab throughput up 10x or 100x, turning currently incurable diagnoses into treatable ones within a patient’s lifetime.
Detailed Summary

Compute as Lifeblood and the Cone of Uncertainty

Rao opens with the claim that compute is the most important resource at Anthropic, and the most consequential decision class in the company. You cannot buy a gigawatt of compute next week. You have to anticipate demand a year or two in advance, and the cost of being wrong in either direction is high. Buy too much and the unit economics collapse. Buy too little and you cannot serve customers or stay at the frontier, which are described as the same failure mode. To navigate this, the team uses a cone of uncertainty rather than point estimates. Small differences in weekly growth compound into vastly different two year outcomes, and Anthropic tries to position itself toward the upper end of that cone while preserving optionality. Rao notes he has had to consciously break a lifetime of linear thinking and force himself into exponential models.
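The compounding effect behind the cone of uncertainty is easy to reproduce. The sketch below is illustrative only: the weekly growth rates are hypothetical scenarios rather than Anthropic figures, and demand is normalized to 1.0 at the start.

```python
def compound(start: float, weekly_growth: float, weeks: int) -> float:
    """Value after compounding a constant weekly growth rate."""
    return start * (1.0 + weekly_growth) ** weeks


START_DEMAND = 1.0   # normalized starting demand
WEEKS = 104          # two years
# Hypothetical scenarios, chosen only to show how the cone widens.
SCENARIOS = {"low": 0.01, "mid": 0.02, "high": 0.03}

outcomes = {name: compound(START_DEMAND, g, WEEKS) for name, g in SCENARIOS.items()}

for name, multiple in outcomes.items():
    print(f"{name}: {SCENARIOS[name]:.0%} weekly -> {multiple:.1f}x after two years")

print(f"high / low spread: {outcomes['high'] / outcomes['low']:.1f}x")
```

A gap of two percentage points in weekly growth separates the low and high cases by a large multiple after two years, which is why a single point estimate is a poor basis for commitments that must be made a year or two ahead.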

Three Chip Platforms, One Orchestration Layer

Anthropic uses Amazon’s Trainium, Google’s TPUs, and Nvidia’s GPUs fungibly. That was not free. Adopting TPUs at scale started around the third TPU generation, when outside observers thought it was a strange choice. Anthropic invested years into compilers and orchestration so workloads can flow across chips by generation and by job type. The team works deeply with Annapurna Labs at AWS to influence Trainium roadmaps because Anthropic stresses these chips harder than almost anyone. The result is what Rao believes is the most efficient utilization of compute across any frontier lab, with a dollar of compute going further inside Anthropic than anywhere else.

Three Buckets and the Model Development Floor

Compute gets allocated across model development, internal acceleration of employees, and customer serving. The conversations are collaborative rather than zero sum, but there is a hard floor on model development that the company refuses to cross even if it makes customer demand harder to serve in the short term. The thesis is simple. The returns to frontier intelligence are extremely high, especially in enterprise, so cutting model investment to chase near term revenue is a bad trade. Internal employee use is also explicitly protected. Rao notes that diverting that internal usage to external customers would unlock billions of additional revenue today, but the compounding benefit of accelerating researchers and engineers outweighs that.

Intelligence Is Multi Dimensional

Rao pushes back hard on the IQ framing of model progress. Benchmarks saturate quickly, and the real signal comes from how customers actually use the models. Anthropic looks at long horizon task completion, tool use, computer use, and time to result on agentic tasks. Two equally capable agents that differ only in speed produce dramatically different value, because the faster one gets more attempts, and therefore more outcomes, in the same amount of time. Frontier model leaps are also fuel efficient. The sedan to sports car analogy breaks down because a sports car burns more fuel, whereas each Opus generation, 4 to 4.5 to 4.6 to 4.7, delivers both a step up in capability and a multiplier on per token efficiency.

From 9 Billion to 30 Billion ARR in One Quarter

The headline number for the quarter is a leap from about $9 billion of run rate revenue to over $30 billion, accomplished without onboarding a corresponding step up in compute, because new compute lands on ramps locked in 12 months prior. Rao attributes the leap to model capability gains, products that surface that intelligence in usable form factors, and an enterprise customer base that pulls more workloads onto Claude as each generation unlocks new use cases. Coding started the wave with Sonnet 3.5 and 3.6, and the same pattern is now playing out elsewhere in the economy.

Recursive Self Improvement and Talent Density

Over 90 percent of Anthropic’s code is now written by Claude Code, including most of Claude Code itself. Rao describes this as a structural reason to keep allocating internal compute to employees even when external demand is hungry. Recursive self improvement is not happening through models that need no humans. It is happening through researchers who set direction and use frontier models to compress months of work into days. Talent density beats talent mass. When Meta and other labs went after Anthropic researchers with very large packages, Anthropic lost two people while peer labs lost dozens.

Procurement Strategy and the Layer Cake

Compute lands as a layer cake. Last month Anthropic signed a 5 gigawatt TPU deal with Google and Broadcom starting in 2027, alongside an Amazon Trainium agreement for up to 5 gigawatts. The total is north of $100 billion in commitments. A new tie up with xAI’s Colossus facility in Memphis was announced just before the interview, intended for nearer term capacity to support consumer and prosumer growth. Anthropic evaluates near term and long term compute deals against the same set of variables: price, duration, location, chip type, and how efficiently the team can run it. The relationships are deeper than procurement. The hyperscalers are also distribution channels for the model.

Platform First, Selective Vertical Bets

Rao describes Anthropic as a platform first business, with most expected value accruing to customers building on the platform. The team will only go vertical when it can either demonstrate capabilities that are skating to where the puck is going, like Claude Code did before the models could fully support it, or when it wants to set a template for an industry vertical, as with Claude for Financial Services, Claude for Life Sciences, and Claude Security. He acknowledges that surprise capability jumps make customers anxious about the platform competing with them, and frames Anthropic’s mitigation as deeper partnerships, early access programs, and an emphasis on accelerating customer building rather than disintermediating it.

Pricing, Jevons Paradox, and Return on Compute

Pricing across Haiku, Sonnet, and Opus has been stable. The notable exception is Opus, which Anthropic deliberately repriced lower when launching Opus 4.5 because Opus class problems were being squeezed into Sonnet workloads. Efficiency gains made it possible to serve Opus profitably at the new level. The consumption response was a classic Jevons paradox, with usage rising far more than the price reduction would have predicted, and Opus 4.6 then slotted in at the same price with a capability bump. Margins are not framed as a per token markup. Compute is fungible across model development, internal acceleration, and customer serving, so Anthropic measures return on the entire compute envelope rather than software style variable cost per call.
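
The pricing logic can be sketched with a constant-elasticity demand curve. This is a textbook simplification, not Anthropic's model, and the elasticity values and the size of the price cut below are hypothetical. The point is only that when elasticity is above 1, a price cut raises total spend rather than lowering it.

```python
# Illustrative only: the elasticity values and the size of the price cut are
# assumptions, not Anthropic figures.

def usage_multiple(price_multiple: float, elasticity: float) -> float:
    """Constant-elasticity demand: usage scales as price ** (-elasticity)."""
    return price_multiple ** (-elasticity)

def spend_multiple(price_multiple: float, elasticity: float) -> float:
    """Total spend = price * usage, relative to the pre-cut baseline."""
    return price_multiple * usage_multiple(price_multiple, elasticity)

PRICE_CUT = 0.5  # hypothetical: the price is halved

for elasticity in (0.5, 1.0, 2.0):
    usage = usage_multiple(PRICE_CUT, elasticity)
    spend = spend_multiple(PRICE_CUT, elasticity)
    print(f"elasticity {elasticity}: usage x{usage:.2f}, total spend x{spend:.2f}")
```

With the price halved, an elasticity of 0.5 shrinks total spend, an elasticity of 1.0 leaves it flat, and an elasticity of 2.0 doubles it. The Opus response described above corresponds to the last regime.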

Fundraising, DeepSeek, and Capital Intensity

Rao joined while Anthropic was closing its Series D, in the middle of a frontier model launch and during the FTX share liquidation. Investors initially questioned whether Anthropic needed a frontier model, whether AI safety and a real business could coexist, and why the sales team was so small. The Series E closed the same day the DeepSeek news broke, with markets violently repricing AI in real time. Since Rao joined, Anthropic has raised over $75 billion, with another $50 billion tied to the Amazon and Google compute deals. The reason for the size of the raises is the cone of uncertainty, not current losses. Returns on compute today are described as robust.

Mythos, Cyber Capability, and Phased Releases

The Mythos release marks the first time Anthropic shipped a model under a deliberately phased rollout because of a specific capability spike. Cyber is the dimension that spiked. Where a prior model found 22 vulnerabilities in an open source codebase, Mythos found roughly 250. The defensive applications, automatically patching massive codebases, are genuinely valuable, but the offensive risk is real enough that Anthropic chose to release to a smaller group first and expand access over time. Rao positions this as a template for future capability spikes, not a permanent restriction. He also describes the relationship with the US administration as cooperative, including the Department of War interaction, with Anthropic supporting a regulatory framework that does not strangle innovation but takes responsibility seriously.

Claude Inside Finance

Anthropic’s finance team is one of the strongest internal case studies. Statutory financial statements for every legal entity are produced by Claude, with a human reviewer. A skill library of more than 70 finance specific skills underpins a Monthly Financial Review skill that drafts the monthly close at 90 to 95 percent ready, so leadership meetings shift from explaining the numbers to discussing what to do about them. An internal analytics platform called Anthrop Stats compresses weekly insight cycles from hours to 30 minutes. The biggest internal token user in finance is the head of tax, building policy engines, which Rao highlights as evidence that adoption is driven by the most senior people, not just younger engineers.

Culture, Co Founders, and the Race to the Top

Seven co founders should not, on paper, work as a leadership group. Rao argues it works because the culture was set early around collaboration, intellectual honesty, transparency, and humility. The culture interview is a real veto, not a checkbox. Dario Amodei runs an all hands every two weeks with a short written piece followed by unscripted questions, and decisions, once made, get clean alignment rather than residual politics. Anthropic frames its approach as a race to the top, where being a model for how to build the technology responsibly is itself a recruiting and retention advantage.

The Virtual Collaborator and the Frontier Ahead

The product vision Rao describes is the virtual collaborator. Not just a smarter chatbot, but an agent with organizational context, access to the company’s tools, memory, and the ability to work on ideas over long horizons. Coding was the first domain to feel this, but CoWork, Anthropic’s extension of the Claude Code pattern into general knowledge work, is being adopted faster than Claude Code was at the same age. Product development inside Anthropic already looks different. Teams ship daily, with fleets of agents working across the company, and individual humans increasingly act as managers of those fleets.

Downside Risks and What Excites Him Most

The three risks Rao names when asked to do a premortem on a softer year are slower customer diffusion of model capability into real workflows, scaling laws unexpectedly flattening, and Anthropic losing its frontier position to competitors. None of these are observed today, but he is unwilling to rule them out with certainty. On the upside, he is most excited about biotech and healthcare. Lab throughput rising 10x or 100x, paired with AI assisted clinical workflows, could turn currently incurable diagnoses into treatable ones within a patient’s lifetime. That is the outcome he wants the technology to chase.

Thoughts

The most consequential structural point in this interview is the framing of compute as a single fungible resource pool measured by return on the entire envelope, not as a variable cost per inference call. That accounting shift, if you accept it, breaks most of the bear cases about AI lab unit economics. The bear argument almost always assumes that a token served to a customer is the only thing the chip did that day. Rao’s version is that the same fleet trains models in the morning, runs reinforcement learning at lunch, serves customers in the afternoon, and accelerates internal engineers in the evening. If even half of that is real, the right comparison is total compute spend versus total enterprise value created by the platform, and on that ratio Anthropic looks structurally strong rather than weak.

The Jevons paradox on Opus pricing is the most actionable insight for anyone running an AI product. Most teams default to either chasing premium pricing on the newest model or undercutting to chase volume. Anthropic did something more disciplined: it left Sonnet and Haiku alone, dropped Opus when efficiency gains made it serveable, and watched aggregate usage rise faster than the price cut. The lesson is that frontier model pricing is not really a price problem. It is a capability access problem, and elasticity around the right tier is much higher than the standard SaaS playbook implies.

The Mythos cyber jump deserves more attention than it has gotten. Going from 22 to 250 vulnerabilities found in the same codebase is the kind of capability discontinuity that genuinely changes the regulatory calculus. Anthropic is signaling that it can identify these discontinuities ahead of release and choose a deployment shape that respects them. Whether peer labs adopt similar discipline is the open question. Anthropic’s race to the top framing assumes they will be forced to. The competitive market may say otherwise.

The hiring data point is the most underrated investor signal. Two departures while peer labs lost dozens, during the most aggressive talent war in tech history, is not a culture poster. It is a structural advantage that compounds every time another lab tries to buy its way to the frontier. Money can be matched. Conviction in the mission, transparent leadership, and a culture interview that can veto otherwise stellar candidates cannot. If you believe scaling laws hold, talent retention at this density is one of the few moats that actually scales with capital.

Finally, the most interesting personal admission is that Krishna Rao, a finance leader trained at Blackstone and Cedar, is openly telling investors that linear thinking is the failure mode he had to break out of. The companies that pattern match this moment to prior technology waves are mispricing it, in both directions. The cone of uncertainty Anthropic uses internally is the right metaphor for everyone else too. If you are forecasting AI as if it is cloud in 2010, you are almost certainly wrong, and the magnitude of the error is much larger than it would be in any prior era.

Watch the full conversation with Krishna Rao on Invest Like the Best here.
May 13, 2026
Andrej Karpathy on AutoResearch, AI Agents, and Why He Stopped Writing Code: Full Breakdown of His 2026 No Priors Interview

TL;DW

Andrej Karpathy sat down with Sarah Guo on the No Priors podcast (March 2026) and delivered one of the most information-dense conversations about the current state of AI agents, autonomous research, and the future of software engineering. The core thesis: since December 2025, Karpathy has essentially stopped writing code by hand. He now “expresses his will” to AI agents for 16 hours a day, and he believes we are entering a “loopy era” where autonomous systems can run experiments, train models, and optimize hyperparameters without a human in the loop. His project AutoResearch proved this works by finding improvements to a model he had already hand-tuned with two decades of experience behind him. The conversation also covers the death of bespoke apps, the future of education, open vs. closed source models, robotics, job market impacts, and why Karpathy chose to stay independent from frontier labs.

Key Takeaways

1. The December 2025 Shift Was Real and Dramatic

Karpathy describes a hard flip that happened in December 2025 where he went from writing 80% of his own code to writing essentially none of it. He says the average software engineer’s default workflow has been “completely different” since that month. He calls this state “AI psychosis” and says he feels anxious whenever he is not at the forefront of what is possible with these tools.

2. AutoResearch: Agents That Do AI Research Autonomously

AutoResearch is Karpathy’s project where an AI agent is given an objective metric (like validation loss), a codebase, and boundaries for what it can change. It then loops autonomously, running experiments, tweaking hyperparameters, modifying architectures, and committing improvements without any human in the loop. When Karpathy ran it overnight on a model he had already carefully tuned by hand over years, it found optimizations he had missed, including forgotten weight decay on value embeddings and insufficiently tuned Adam betas.

3. The Name of the Game Is Removing Yourself as the Bottleneck

Karpathy frames the current era as a shift from optimizing your own productivity to maximizing your “token throughput.” The goal is to arrange tasks so that agents can run autonomously for extended periods. You are no longer the worker. You are the orchestrator, and every minute you spend in the loop is a minute the system is held back.

4. Mastery Now Means Managing Multiple Agents in Parallel

The vision of mastery is not writing better code. It is managing teams of agents simultaneously. Karpathy references Peter Steinberger’s workflow of having 10+ Codex agents running in parallel across different repos, each taking about 20 minutes per task. You move in “macro actions” over your codebase, delegating entire features rather than writing individual functions.

5. Personality and Soul Matter in Coding Agents

Karpathy praises Claude’s personality, saying it feels like a teammate who gets excited about what you are building. He contrasts this with Codex, which he calls “very dry” and disengaged. He specifically highlights that Claude’s praise feels earned because it does not react equally to half-baked ideas and genuinely good ones. He credits Peter Steinberger (OpenClaw) with innovating on the “soul” of an agent through careful prompt design, memory systems, and a unified WhatsApp interface.

6. Apps Are Dead. APIs and Agents Are the Future.

Karpathy built “Dobby the Elf Claw,” a home automation agent that controls his Sonos, lights, HVAC, shades, pool, spa, and security cameras through natural language over WhatsApp. He did this by having agents scan his local network, reverse-engineer device APIs, and build a unified dashboard. His conclusion: most consumer apps should not exist. Everything should be API endpoints that agents can call on behalf of users. The “customer” of software is increasingly the agent, not the human.

7. AutoResearch Could Become a Distributed Computing Project

Karpathy envisions an “AutoResearch at Home” model inspired by SETI@home and Folding@home. Because it is expensive to find code optimizations but cheap to verify them (just run the training and check the metric), untrusted compute nodes on the internet could contribute experimental results. He draws an analogy to blockchain: instead of blocks you have commits, instead of proof of work you have expensive experimentation, and instead of monetary reward you have leaderboard placement. He speculates that a global swarm of agents could potentially outperform frontier labs.

8. Education Is Being Redirected Through Agents

Karpathy describes his MicroGPT project, a 200-line distillation of LLM training to its bare essence. He says he started to create a video walkthrough but realized that is no longer the right format. Instead, he now “explains things to agents,” and the agents can then explain them to individual humans in their own language, at their own pace, with infinite patience. He envisions education shifting to “skills” (structured curricula for agents) rather than lectures or guides for humans directly.

9. The Jaggedness Problem Is Still Real

Karpathy describes current AI agents as simultaneously feeling like a “brilliant PhD student who has been a systems programmer their entire life” and a 10-year-old. He calls this “jaggedness,” and it stems from reinforcement learning only optimizing for verifiable domains. Models can move mountains on agentic coding tasks but still tell the same bad joke they told four years ago (“Why don’t scientists trust atoms? Because they make everything up.”). Things outside the RL reward loop remain stuck.

10. Open Source Is Healthy and Necessary, Even If Behind

Karpathy estimates open source models are now roughly 6 to 8 months behind closed frontier models, down from 18 months and narrowing. He draws a parallel to Linux: the industry has a structural need for a common, open platform. He is “by default very suspicious” of centralization and wants more labs, more voices in the room, and an “ensemble” approach to AI governance. He thinks it is healthy that open source exists slightly behind the frontier, eating through basic use cases while closed models handle “Nobel Prize kind of work.”

11. Digital Transformation Will Massively Outpace Physical Robotics

Karpathy predicts a clear ordering: first, a massive wave of “unhobbling” in the digital space where everything gets rewired and made 100x more efficient. Then, activity moves to the interface between digital and physical (sensors, cameras, lab equipment). Finally, the physical world itself transforms, but on a much longer timeline because “atoms are a million times harder than bits.” He notes that robotics requires enormous capital expenditure and conviction, and most self-driving startups from 10 years ago did not survive long term.

12. Why Karpathy Stays Independent From Frontier Labs

Karpathy gives a nuanced answer about why he is not working at a frontier lab. He says employees at these labs cannot be fully independent voices because of financial incentives and social pressure. He describes this as a fundamental misalignment: the people building the most consequential technology are also the ones who benefit most from it financially. He values being “more aligned with humanity” outside the labs, though he acknowledges his judgment will inevitably drift as he loses visibility into what is happening at the frontier.

Detailed Summary

The AI Psychosis and the End of Hand-Written Code

The conversation opens with Karpathy describing what he calls a state of perpetual “AI psychosis.” Since December 2025, he has not typed a line of code. The shift was not gradual. It was a hard flip from doing 80% of his own coding to doing almost none. He compares the anxiety of unused agent capacity to the old PhD feeling of watching idle GPUs. Except now, the scarce resource is not compute. It is tokens, and you feel the pressure to maximize your token throughput at all times.

He describes the modern workflow: you have multiple coding agents (Claude Code, Codex, or similar harnesses) running simultaneously across different repositories. Each agent takes about 20 minutes on a well-scoped task. You delegate entire features, review the output, and move on. The job is no longer typing. It is orchestration. And when it does not work, the overwhelming feeling is that it is a “skill issue,” not a capability limitation.
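
The orchestration pattern described above can be sketched with Python's `asyncio`. Everything here is a hypothetical stand-in: `run_agent` just sleeps instead of calling a real coding agent, and the repo names, tasks, and durations are made up for illustration.

```python
import asyncio

# Hypothetical stand-in for dispatching a task to a coding agent. A real
# harness would call out to an agent process or API; here we just sleep.
async def run_agent(repo: str, task: str, seconds: float) -> dict:
    await asyncio.sleep(seconds)
    return {"repo": repo, "task": task, "status": "ready for review"}

async def orchestrate(tasks):
    # Launch every agent at once; total wall time is about the slowest
    # task, not the sum of all tasks. Results come back in input order.
    return await asyncio.gather(*(run_agent(*t) for t in tasks))

# Made-up repos and tasks, with tiny durations so the sketch runs instantly.
tasks = [
    ("repo-a", "add pagination to the list endpoint", 0.03),
    ("repo-b", "migrate config loading to TOML", 0.02),
    ("repo-c", "write regression tests for the parser", 0.01),
]
results = asyncio.run(orchestrate(tasks))
for r in results:
    print(f"{r['repo']}: {r['task']} -> {r['status']}")
```

The human's role in this sketch is everything outside `run_agent`: choosing well-scoped tasks and reviewing what comes back.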

Karpathy says most people, even his own parents, do not fully grasp how dramatic this shift has been. The default workflow of any software engineer sitting at a desk today is fundamentally different from what it was six months ago.

AutoResearch: Closing the Loop on AI Research

The centerpiece of the conversation is AutoResearch, Karpathy’s project for fully autonomous AI research. The setup is deceptively simple: give an agent an objective metric (like validation loss on a language model), a codebase to modify, and boundaries for what it can change. Then let it loop. It generates hypotheses, runs experiments, evaluates results, and commits improvements. No human in the loop.
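
As a rough illustration of that loop, here is a minimal Python sketch. It is not Karpathy's implementation. The `validation_loss` function is a toy stand-in for a real training run, the random perturbation in `propose` stands in for an agent generating a hypothesis, and the knob names, bounds, and target values are all assumptions chosen for the example.

```python
import random

# Hypothetical stand-in for a real training run: returns a "validation loss"
# for a config. In a real setup this would train a model and evaluate it.
def validation_loss(config: dict) -> float:
    return (config["weight_decay"] - 0.1) ** 2 + (config["beta2"] - 0.95) ** 2

# Boundaries: the only knobs the loop may change, with allowed ranges.
BOUNDS = {"weight_decay": (0.0, 0.5), "beta2": (0.8, 0.999)}

def propose(config: dict, rng: random.Random) -> dict:
    """Perturb one allowed knob, clamped to its bounds (a stand-in for an
    agent generating a hypothesis)."""
    key = rng.choice(sorted(BOUNDS))
    low, high = BOUNDS[key]
    step = rng.uniform(-0.05, 0.05)
    candidate = dict(config)
    candidate[key] = min(high, max(low, config[key] + step))
    return candidate

def auto_research(config: dict, steps: int, seed: int = 0):
    rng = random.Random(seed)
    best, best_loss = dict(config), validation_loss(config)
    history = [best_loss]
    for _ in range(steps):
        candidate = propose(best, rng)
        loss = validation_loss(candidate)
        if loss < best_loss:  # keep ("commit") only strict improvements
            best, best_loss = candidate, loss
        history.append(best_loss)
    return best, best_loss, history

start = {"weight_decay": 0.0, "beta2": 0.999}
best, best_loss, history = auto_research(start, steps=200)
print(f"start loss {history[0]:.4f} -> best loss {best_loss:.4f} with {best}")
```

The three ingredients from the description map directly onto the code: the metric is `validation_loss`, the boundaries are `BOUNDS`, and the commit rule is the strict `loss < best_loss` check.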

Karpathy was surprised it worked as well as it did. He had already hand-tuned his NanoGPT-derived training setup over years using his two decades of experience. When he let AutoResearch run overnight, it found improvements he had missed. The weight decay on value embeddings was forgotten. The Adam optimizer betas were not sufficiently tuned. These are the kinds of things that interact with each other in complex ways that a human researcher might not systematically explore.

The deeper insight is structural: everything around frontier-level intelligence is about extrapolation and scaling laws. You do massive exploration on smaller models and then extrapolate to larger scales. AutoResearch is perfectly suited for this because the experimentation is expensive but the verification is cheap. Did the validation loss go down? Yes or no.

Karpathy envisions this scaling beyond a single machine. His “AutoResearch at Home” concept borrows from distributed computing projects like Folding@home. Because verification is cheap but search is expensive, you can accept contributions from untrusted workers across the internet. He draws a blockchain analogy: commits instead of blocks, experimentation as proof of work, leaderboard placement as reward. A global swarm of agents contributing compute could, in theory, rival frontier labs that have massive but centralized resources.
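
The "cheap to verify" property is what makes untrusted contributions workable, and it can be sketched in a few lines of Python. This is an illustrative toy, not a description of any real system: `evaluate` stands in for re-running a single training job, and the `Submission` type, worker names, and tolerance are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Submission:
    worker: str
    weight_decay: float
    claimed_loss: float

# Hypothetical stand-in for re-running one training job to check a claim.
def evaluate(weight_decay: float) -> float:
    return (weight_decay - 0.1) ** 2

def accept(sub: Submission, best_loss: float, tol: float = 1e-6) -> bool:
    """Accept only if the claim reproduces and beats the current best."""
    measured = evaluate(sub.weight_decay)
    reproduces = abs(measured - sub.claimed_loss) <= tol
    return reproduces and measured < best_loss

def run_leaderboard(submissions, initial_best: float):
    best_loss, leaderboard = initial_best, []
    for sub in submissions:
        if accept(sub, best_loss):
            best_loss = evaluate(sub.weight_decay)
            leaderboard.append(sub.worker)
    return best_loss, leaderboard

submissions = [
    Submission("honest-a", 0.3, evaluate(0.3)),    # real improvement
    Submission("cheater", 0.4, 0.0),               # claim does not reproduce
    Submission("honest-b", 0.15, evaluate(0.15)),  # better still
    Submission("stale", 0.3, evaluate(0.3)),       # reproduces, but not better
]
best_loss, leaderboard = run_leaderboard(submissions, initial_best=evaluate(0.5))
print(leaderboard)  # ['honest-a', 'honest-b']
```

The coordinator never trusts a claimed number. It pays for one re-run per submission, while each worker may have paid for many failed experiments to find the one it submits.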

The Claw Paradigm and the Death of Apps

Karpathy introduces the concept of the “claw,” a persistent, looping agent that operates in its own sandbox, has sophisticated memory, and works on your behalf even when you are not watching. This goes beyond a single chat session with an AI. A claw has persistence, autonomy, and the ability to interact with external systems.

His personal example is “Dobby the Elf Claw,” a home automation agent that controls his entire smart home through WhatsApp. The agent scanned his local network, found his Sonos speakers, reverse-engineered the API, and started playing music in three prompts. It did the same for his lights, HVAC, shades, pool, spa, and security cameras (using a Qwen vision model for change detection on camera feeds).

The broader point is that this renders most consumer apps unnecessary. Why maintain six different smart home apps when a single agent can call all the APIs directly? Karpathy argues the industry needs to reconfigure around the idea that the customer is increasingly the agent, not the human. Everything should be exposed API endpoints. The intelligence layer (the LLM) is the glue that ties it all together.

He predicts this will become table stakes within a few years. Today it requires vibe coding and direct agent interaction. Soon, even open source models will handle this trivially. The barrier will come down until every person has a claw managing their digital life through natural language.

Model Jaggedness and the Limits of Reinforcement Learning

One of the most technically interesting sections covers what Karpathy calls “jaggedness.” Current AI models are simultaneously superhuman at verifiable tasks (coding, math, structured reasoning) and surprisingly mediocre at anything outside the RL reward loop. His go-to example: ask any frontier model to tell you a joke, and you will get the same one from four years ago. “Why don’t scientists trust atoms? Because they make everything up.” The models have improved enormously, but joke quality has not budged because it is not being optimized.

This jaggedness creates an uncanny valley in interaction. Karpathy describes the experience as talking to someone who is simultaneously a brilliant PhD systems programmer and a 10-year-old. Humans have some variance in ability across domains, but nothing like this. The implication is that the narrative of “general intelligence improving across all domains for free as models get smarter” is not fully accurate. There are blind spots, and they cluster around anything that lacks objective evaluation criteria.

He and Sarah Guo discuss whether this should lead to model “speciation,” where specialized models are fine-tuned for specific domains rather than one monolithic model trying to be good at everything. Karpathy thinks speciation makes sense in theory (like the diversity of brains in the animal kingdom) but says the science of fine-tuning without losing capabilities is still underdeveloped. The labs are still pursuing monocultures.

Open Source, Centralization, and Power Balance

Karpathy, a long-time open source advocate, estimates the gap between closed and open source models has narrowed from 18 months to roughly 6 to 8 months. He draws a direct parallel to Linux: despite closed alternatives like Windows and macOS, the industry structurally needs a common open platform. By his estimate, Linux runs on 60%+ of computers because businesses need a shared foundation they feel safe using.

The challenge for open source AI is capital expenditure. Training frontier models is astronomically expensive, and that is where the comparison to Linux breaks down somewhat. But Karpathy argues the current dynamic is actually healthy: frontier labs push the bleeding edge with closed models, open source follows 6 to 8 months behind, and that trailing capability is still enormously powerful for the vast majority of use cases.

He expresses deep skepticism about centralization, citing his Eastern European background and the historical track record of concentrated power. He wants more labs, more independent voices, and an “ensemble” approach to decision-making about AI’s future. He worries about the current trend of further consolidation even among the top labs.

The Job Market: Digital Unhobbling and the Jevons Paradox

Karpathy recently published an analysis of Bureau of Labor Statistics jobs data, color-coded by which professions primarily manipulate digital information versus physical matter. His thesis: digital professions will be transformed first and fastest because bits are infinitely easier to manipulate than atoms. He calls this “unhobbling,” the release of a massive overhang of digital work that humans simply did not have enough thinking cycles to process.

On whether this means fewer software engineering jobs, Karpathy is cautiously optimistic. He invokes the Jevons Paradox: when something becomes cheaper, demand often increases so much that total consumption goes up. The canonical example is ATMs and bank tellers. ATMs were supposed to replace tellers, but they made bank branches cheaper to operate, leading to more branches and more tellers (at least until 2010). Similarly, if AI makes software dramatically cheaper, the demand for software could explode because it was previously constrained by scarcity and cost.

He emphasizes that the physical world will lag behind significantly. Robotics requires enormous capital, conviction, and time. Most self-driving startups from a decade ago failed. The interesting opportunities in the near term are at the interface between digital and physical: sensors feeding data to AI systems, actuators executing AI decisions in the real world, and new markets for information (he imagines prediction markets where agents pay for real-time photos from conflict zones).

Education in the Age of Agents

Karpathy’s MicroGPT project distills the entire LLM training process into 200 lines of Python. He started making an explanatory video but stopped, realizing the format is obsolete. If the code is already that simple, anyone can ask an agent to explain it in whatever way they need: different languages, different skill levels, infinite patience, multiple approaches. The teacher’s job is no longer to explain. It is to create the thing that is worth explaining, and then let agents handle the last mile of education.

He envisions a future where education shifts from “guides and lectures for humans” to “skills and curricula for agents.” A skill is a set of instructions that tells an agent how to teach something, what progression to follow, what to emphasize. The human educator becomes a curriculum designer for AI tutors. Documentation shifts from HTML for humans to markdown for agents.

His punchline: “The things that agents can do, they can probably do better than you, or very soon. The things that agents cannot do is your job now.” For MicroGPT, the 200-line distillation is his unique contribution. Everything else, the explanation, the teaching, the Q&A, is better handled by agents.

Why Not Return to a Frontier Lab?

The conversation closes with a nuanced discussion about why Karpathy remains independent. He identifies several tensions. First, financial alignment: employees at frontier labs have enormous financial incentives tied to the success of transformative (and potentially disruptive) technology. This creates a conflict of interest when it comes to honest public discourse. Second, social pressure: even without arm-twisting, there are things you cannot say and things the organization wants you to say. You cannot be a fully free agent. Third, impact: he believes his most impactful contributions may come from an “ecosystem level” role rather than being one of many researchers inside a lab.

However, he acknowledges a real cost. Being outside frontier labs means his judgment will inevitably drift. These systems are opaque, and understanding how they actually work under the hood requires being inside. He floats the idea of periodic stints at frontier labs, going back and forth between inside and outside roles to maintain both independence and technical grounding.

Thoughts

This is one of the most honest and technically grounded conversations about the current state of AI I have heard in 2026. A few things stand out.

The AutoResearch concept is genuinely important. Not because autonomous hyperparameter tuning is new, but because Karpathy is framing the entire problem correctly: the goal is not to build better tools for researchers. It is to remove researchers from the loop entirely. The fact that an overnight run found optimizations that a world-class researcher missed after years of manual tuning is a powerful data point. And the distributed computing vision (AutoResearch at Home) could be the most consequential idea in the entire conversation if someone builds it well.

The “death of apps” framing deserves more attention. Karpathy’s Dobby example is not a toy demo. It is a preview of how every consumer software company’s business model gets disrupted. If agents can reverse-engineer APIs and unify disparate systems through natural language, the entire app ecosystem becomes a commodity layer beneath an intelligence layer. The companies that survive will be the ones that embrace API-first design and accept that their “user” is increasingly an LLM.

The jaggedness observation is underappreciated. The fact that models can autonomously improve training code but cannot tell a new joke should be deeply uncomfortable for anyone claiming we are on a smooth path to AGI. It suggests that current scaling and RL approaches produce narrow excellence, not general intelligence. The joke example is funny, but the underlying point is serious: we are building systems with alien capability profiles that do not match any human intuition about what “smart” means.

Finally, Karpathy’s decision to stay independent is itself an important signal. When one of the most capable AI researchers in the world says he feels “more aligned with humanity” outside of frontier labs, that should be taken seriously. His point about financial incentives and social pressure creating misalignment is not abstract. It is structural. And his proposed solution of rotating between inside and outside roles is pragmatic and worth consideration for the entire field.

March 20, 2026