PJFP.com

Pursuit of Joy, Fulfillment, and Purpose

Tag: coding agents

OpenCode CEO Jay V on 20x Growth in 6 Months: 13 Million Users, 7 Trillion Tokens a Day, the Anthropic Block That Backfired, and the 16-Year Road to Overnight Success
In this episode of Y Combinator’s Lightcone podcast, Jay V, founder and CEO of OpenCode, the open-source coding agent that works with any model, walks through one of the wildest growth stories in developer tools: 650,000 monthly active users in January to roughly 13 million by June, 7 trillion tokens processed per day, and a business that went from zero to a $40 million revenue run rate in about eight months. He also tells the part almost nobody knows: the company behind this “overnight success” is a 16-year-old legal entity that applied to Y Combinator nine times before getting in.

TLDW

Jay V explains how OpenCode grew 20x in six months to around 13 million monthly active users and 4.6 million weekly actives, processing 7 trillion tokens a day (more than OpenRouter’s entire volume), with an inference business annualizing near $40 million plus 160,000 subscribers worth another $18 million. The inflection point came when Anthropic started blocking Claude Code subscriptions inside OpenCode by rejecting requests whose system prompt contained the words “open code,” which backfired by equating the two products and sending curious users flooding in, shortly after which OpenAI’s Codex officially supported OpenCode. The conversation covers OpenCode’s public usage data (DeepSeek Flash dominating token volume despite GLM hype), a global user base led by China at 17% with heavy usage in Indonesia, Brazil, and Vietnam, Fortune 500 companies discovering thousands of employees already using the tool, the shift from ad-based CAC to token-based CAC, the flat 24-hour GPU utilization curve that comes from serving the whole planet, the “betting the field” marketplace thesis on model commoditization, and the founder’s 16-year, nine-application journey from a Waterloo dorm through SST, OpenNext, and selling coffee over SSH to finally catching lightning.

Thoughts

The Anthropic block is the most instructive growth story in the episode, because it is a perfect modern Streisand effect. Anthropic had a defensible reason to stop subsidized Claude Code subscriptions from flowing through a third-party harness, but the implementation (rejecting any request whose system prompt literally contained “open code”) turned a quiet policy decision into a public endorsement. As Jay puts it, the block placed OpenCode on the same pedestal as Claude Code in the minds of developers who had never heard of it. The hosts’ Instacart comparison is apt: when Amazon bought Whole Foods, the “death of Instacart” meme drove every grocer in America into Instacart’s arms. Incumbents keep learning this lesson the hard way. You cannot block a product without simultaneously advertising that it matters.

The deeper story is geographic. Silicon Valley talks about coding agents as if the $200-per-month power user is the market, and Jay’s data says the opposite. China alone is 17% of OpenCode’s usage, with Indonesia, Brazil, and Vietnam each carrying meaningful share, places where a frontier subscription costs more than rent. OpenCode’s $10 Go plan, running DeepSeek and GLM instead of Sonnet and Opus, is how most of the world’s developers will actually have their first coding-agent moment. There is also a hard operational edge hiding in that distribution: because the East works while the West sleeps, OpenCode’s GPU utilization runs a nearly flat 24-hour cycle, which quietly improves unit economics in a way no single-market competitor can match. Serving the whole planet is not just a mission statement. It is a margin strategy.

OpenCode’s neutrality is turning into one of the most valuable datasets in AI. Because the product is a harness over every model rather than a storefront for one lab, opencode.ai/data shows what developers actually run when they are spending their own money, and it routinely contradicts the Twitter narrative. GLM was supposedly eating DeepSeek’s lunch; the token-volume charts show DeepSeek Flash dipping and then bouncing right back. Users are not loyal; they are rational. They ride frontier limits until they hit caps, then switch to models cheap and fast enough to finish the day’s work. That behavioral reality, boring cost optimization rather than fandom, is what the model market actually looks like once the marketing fog clears, and only a neutral aggregator gets to see it.

The business model inversion deserves more attention than it usually gets. In the last era, customer acquisition cost meant ads. In this one, it means tokens: the free tier is the marketing budget, spent on giving people the aha moment, and the payoff comes when a fraction of those users become whales paying per token, where OpenCode’s volume discounts become margin. This is the same funnel Anthropic and OpenAI run, except the frontier labs subsidize with investor billions while OpenCode rides the falling cost curve of open-weight models. The enterprise motion follows the same bottoms-up physics: no procurement dance, just inbound emails saying thousands of our employees are already using you, please sign the security questionnaire. That is the purest product-market-fit signal that exists.

And then there is the 16-year overnight success. Same legal entity since 2010, same two founders from a Waterloo dorm room, nine YC applications and four interviews before acceptance in 2021, years of living with parents and running out of money, a serverless framework, a coffee shop that ran over SSH. Every “dead end” turns out to have been training: the consumer company taught metrics discipline, SST taught open source and building in public, the terminal storefront taught terminal-UI craft that made OpenCode instantly credible with the Neovim crowd. The hosts land the right conclusion: lightning did strike, but the founders spent a decade positioning the bottle. In an industry currently obsessed with six-month-old unicorns, this episode is a useful reminder that most of them are carrying more history than the headline suggests.

Key Takeaways
- OpenCode ended June 2026 at roughly 13 million monthly active users and 4.6 million weekly actives, close to Codex’s numbers, a 20x increase from about 650,000 monthly actives at the start of the year.
- The platform now processes around 7 trillion tokens per day, more than OpenRouter’s total of roughly 6 trillion, up from about 300 billion per day at the beginning of the year.
- The pay-per-token inference business, launched around late September 2025, annualizes to $31-33 million on June data and $38-40 million on the most recent week, roughly eight months from zero.
- The subscription product launched in late February has grown to about 160,000 monthly subscribers, roughly $18 million in annualized revenue on top of inference.
- A Codex lead engineer publicly noted that about 5% of all Codex subscribers use OpenCode as their main harness, and OpenAI officially supports Codex subscriptions inside OpenCode.
- In the first week of January, Anthropic began blocking Claude Code subscriptions in OpenCode by rejecting any request whose system prompt contained the words “open code.”
- Jay concedes the block made business sense (Anthropic subsidizes that usage) but says it inadvertently equated OpenCode with Claude Code and drove waves of new users to investigate the product.
- The hosts compare it to Amazon buying Whole Foods: the “death of Instacart” meme drove every grocer in America to sign with Instacart, fueling its growth instead of killing it.
- The founding premise is that most people in the world still have not experienced the magic of a coding agent, and frontier per-token prices put that moment out of reach for much of the globe.
- When OpenCode launched in June 2025 the pitch was using your Claude Code subscription in a better terminal UI; by August and September the first credible open-source models (GLM, Kimi, MiniMax) arrived, roughly six months behind the frontier.
- February 2026 marked the first four-week span in OpenCode’s data where users ran Gemini more than the Anthropic models (Sonnet plus Opus combined), which convinced the team the non-Anthropic models were ready for real work and triggered the subscription launch.
- OpenCode publishes its usage data at opencode.ai/data, covering the Go plan where $10 a month buys access to open-source models.
- DeepSeek Flash leads token volume per day, with the two DeepSeek models plus GLM as the top three, despite social media chatter suggesting GLM had overtaken DeepSeek.
- By unique users the top models run DeepSeek Flash at about 38,000, DeepSeek Pro at 31,000, and GLM 5.2 near 30,000.
- A key usage pattern: as users approach daily or weekly limits on premium models, they switch to very cheap models like DeepSeek Flash to finish their work, extending how much coding-agent time their budget buys.
- Speed matters too: some open models are hosted with far higher tokens-per-second than alternatives, making the agent feel near real time, and users perceive quality niches, like GLM 5.2 being better at front-end design.
- China is OpenCode’s largest market at 17% of usage, which the hosts note may make it the only YC company in history with meaningful usage in China, partly because Chinese developers want to run Chinese models and OpenCode gives them that choice.
- Developing countries are huge: Indonesia at 4% of traffic, Brazil at 5%, plus Vietnam and similar markets where a $200-a-month Claude Code subscription is prohibitively expensive.
- The US, which the team was not even targeting with the Go plan, is growing strongly anyway, which Jay reads as a broader vibe shift toward token budgeting even among Americans.
- Large US companies with effectively unlimited token budgets also adopted OpenCode early because they did not want to be locked into a specific model or harness.
- Dozens of forward-leaning Fortune 500 companies have significant OpenCode footprints, often discovered when the company itself emails saying thousands of employees are already using it.
- Enterprise inbound has inverted the old SaaS procurement dance: companies beg OpenCode to fill out security questionnaires so they can officially use a product their engineers already adopted.
- Enterprise pull comes in four flavors: officially blessing developer usage, extending the tool to non-technical employees, embedding the agent loop inside their own products, and managing token spend by routing teams to cheaper models.
- One enterprise asked for deep visibility into exactly what every employee does with the tool, which the team flagged as a should-we-even-build-this question.
- Ramp built a Slack bot running OpenCode’s embeddable server (the agent loop that works behind the UI) before OpenCode had built anything similar internally, publishing a blog post about it in December.
- OpenCode is architected as a two-part product: the terminal UI you interact with, and a separately embeddable server that runs the agent loop and calls the LLM.
- The new CAC is tokens, not ads: the free tier exists to give people the magic moment, the subscription converts them to real work, and whales paying per token feed directly into margin via OpenCode’s volume discounts on inference.
- The episode references Dylan Patel’s podcast claim that Anthropic reached roughly $50 billion annualized revenue at around 70% margin in Q2, proof that the subsidize-then-harvest funnel can cross into profitability.
- Global usage produces a nearly flat 24-hour GPU utilization curve (the East works while the West sleeps), improving unit economics versus competitors serving one region.
- Jay describes OpenCode as a marketplace that showcases model diversity: competition among labs benefits consumers, while vendor lock-in mostly benefits vendor margins.
- OpenCode is now the largest customer by token volume for most open-source model labs, making the relationship symbiotic: the strategy is not picking a winning lab but betting the whole field.
- Every bump in OpenCode’s monthly actives traces back to a corresponding release in the open-source model market, making its growth a proxy for open-model progress.
- The name OpenCode was deliberate positioning: when a market has one or two dominant players, the rest coalesces around an open alternative, and whoever occupies that position first is very hard to displace.
- To support 70+ models and providers at launch, the team built models.dev, an open-source database of models and providers that Jay calls probably the best such dataset in the world.
- The origin moment: when Claude Code appeared in February 2025, the team (Neovim users unimpressed by its terminal UI) decided to build a coding agent that met the standard of modern terminal tools, credibility that resonated instantly with the core developer audience.
- The team had form here: co-founder Dax had built terminal.shop, a complete storefront for buying coffee over SSH, the kind of eccentric-taste project the hosts argue pulls founders toward outlier outcomes.
- The company is one 16-year-old legal entity, incorporated in 2010, founded by Jay and his college roommate Frank after a Waterloo co-op term convinced Jay he never wanted a normal job.
- Jay applied to YC nine times between 2016 and 2021 with four interviews before getting in, with his first interview dating back to the era when Paul Graham ran them and an Airbnb founder was hanging around the waiting room.
- The 2021 YC idea was a serverless platform, Heroku for AWS, which became SST, the team’s first big open-source project and the on-ramp to building in public.
- Building in public became core identity after co-founder Dax observed that if all your code is public and you work in public, staying silent about it is a disservice to the product; the community now follows the company like a reality TV show.
- Jay credits survival to stubbornness, visible forward progress, and cheap burn (living with parents after running out of money), while warning founders: don’t try this at home.
- The hosts’ framing of the whole arc: it took ten years of grinding to get to zero-to-$30-million in eight months, and catching lightning in a bottle requires positioning the bottle correctly first.

Detailed Summary

The Numbers: 20x in Six Months

OpenCode began the year around 650,000 monthly active users and ended June near 13 million, with 4.6 million weekly actives that put it in the same conversation as OpenAI’s Codex. Token throughput grew from roughly 300 billion per day to 7 trillion, a volume larger than all of OpenRouter. The money followed two tracks: a pay-per-token inference business launched in the fall that annualizes near $40 million on recent weeks, and a subscription product launched in late February that reached 160,000 monthly subscribers and about $18 million annualized. Codex officially supporting OpenCode, with around 5% of Codex subscribers choosing it as their harness, added a second frontier on-ramp right as the Anthropic controversy peaked.
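The multiples in this section follow directly from the quoted figures, and a short check makes the arithmetic explicit. The sketch below uses only the approximate numbers stated above; the variable names are illustrative, and the per-subscriber figure is an implied average, not a number given in the episode.

```python
# Figures quoted in the summary above (approximate, as stated in the recap).
mau_start = 650_000            # monthly actives in January
mau_end = 13_000_000           # monthly actives at the end of June
tokens_start = 300e9           # tokens per day at the start of the year
tokens_end = 7e12              # tokens per day now
subscribers = 160_000          # monthly subscribers
subscription_arr = 18_000_000  # annualized subscription revenue, USD

mau_multiple = mau_end / mau_start
token_multiple = tokens_end / tokens_start
# Implied average revenue per subscriber per month.
arpu_monthly = subscription_arr / 12 / subscribers

print(f"MAU growth: {mau_multiple:.1f}x")
print(f"Token growth: {token_multiple:.1f}x")
print(f"Implied monthly revenue per subscriber: ${arpu_monthly:.2f}")
```

The monthly-active multiple comes out at 20x, token throughput grew slightly faster at a little over 23x, and the implied revenue per subscriber lands just under $10 a month, which is consistent with the $10 Go plan described elsewhere in the summary.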

The Anthropic Block That Backfired

Using a Claude Code subscription inside OpenCode was one of the most common usage patterns until Anthropic moved to stop it in early January, rejecting requests whose system prompt mentioned “open code.” Jay is gracious about the logic (Anthropic subsidizes subscription usage and wants it inside its own product) but the effect was the opposite of containment. The block put the scrappy open-source harness on the same pedestal as the category leader, told every developer who had not tried it that it was worth investigating, and kicked off the year’s 20x run. The hosts draw the Instacart parallel: a supposed death blow that functioned as the best marketing campaign the company never paid for.

A Global User Base the Valley Doesn’t See

The product premise is that the coding-agent aha moment is a once-a-generation experience most of the world cannot afford at frontier prices. The Go plan ($10 a month for open-source models) was built for that global audience, and the geography shows it: China leads at 17%, with Indonesia at 4%, Brazil at 5%, and Vietnam prominent, markets where $200 a month is simply not a consumer price point. Two surprises followed. Chinese developers use OpenCode partly to run their own country’s models, which no US-locked product lets them do. And the US, never the target for Go, is growing fast anyway, which Jay reads as the token-budgeting vibe shift reaching even the throw-money-at-it crowd, helped by moments like GLM 5.2’s popularity making the plan the easiest way to try it.

What the Usage Data Really Shows

OpenCode publishes per-model usage at opencode.ai/data, and because every data point is an actual end user rather than aggregated API traffic, it is arguably the cleanest picture of what working engineers really run. DeepSeek Flash dominates token volume, the two DeepSeeks plus GLM hold the top three, and the market-share graph shows DeepSeek dipping when GLM launched and then bouncing back, contradicting the Twitter narrative of a GLM takeover. By unique users, Flash leads at 38,000 with DeepSeek Pro at 31,000 and GLM 5.2 near 30,000. The behavioral driver is pragmatic: cheap, fast models let users keep working after they hit premium limits, hosted speeds make some models feel real time, and perceived niches (GLM for front-end design) steer specific workloads.

Enterprises Arriving Through the Back Door

Before the open-model wave, companies adopted OpenCode to avoid lock-in to any single model or harness. Now dozens of forward-thinking Fortune 500 companies have significant footprints, and the procurement process has inverted: instead of sales outreach, OpenCode receives DMs saying a few thousand employees are already using the product, please sign the security questionnaire, and often, please don’t tell anyone. Once inside, enterprises pull in predictable directions: extend access to non-technical staff, embed the agent loop in their own products, and manage token spend by restricting expensive frontier models to teams that need them. Ramp exemplified the embedding path, running a Slack bot on OpenCode’s server component before OpenCode itself had tried it. One request, total visibility into employee activity, raised the harder question of what the company is willing to build.

Token Economics: CAC Is Now Paid in Tokens

The episode’s sharpest business insight is that customer acquisition cost has migrated from ads to tokens. Becoming skilled enough with coding agents to justify heavy spend is itself expensive, a chasm most individuals and companies cannot cross unaided. Anthropic and OpenAI solve this by subsidizing subscriptions until a percentage of users become whales, and per Dylan Patel’s numbers cited in the episode, that funnel has carried Anthropic to roughly $50 billion annualized at 70% margins. OpenCode runs the same funnel without frontier-scale subsidies: the free tier delivers the magic moment, the $10 plan makes real work affordable on open models, and whales paying per token convert OpenCode’s volume discounts into margin. The flat 24-hour GPU utilization curve from serving every timezone compounds the advantage.
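Why a flat utilization curve helps unit economics can be shown with a toy model. The sketch below is an illustration under loud assumptions: each region's demand is a made-up cosine that peaks once a day, regions are equally weighted, and the peak hours are hypothetical. None of these shapes or weights come from OpenCode's data. The point is only that capacity must be sized for the peak, so a lower peak-to-mean ratio means less idle hardware.

```python
import math

HOURS = range(24)

def regional_demand(hour, peak_hour):
    """Hypothetical diurnal demand: mean 1.0, peaking at 2.0 at peak_hour (UTC)."""
    return 1.0 + math.cos(2 * math.pi * (hour - peak_hour) / 24)

def peak_to_mean(peak_hours):
    """Peak-to-mean ratio of aggregate demand across equally weighted regions."""
    totals = [sum(regional_demand(h, p) for p in peak_hours) for h in HOURS]
    return max(totals) / (sum(totals) / len(totals))

# Assumed peak hours, chosen for illustration only.
single_region = peak_to_mean([14])         # one market, one afternoon peak
three_regions = peak_to_mean([6, 14, 22])  # peaks spaced eight hours apart

print(f"single region peak/mean: {single_region:.2f}")
print(f"three regions peak/mean: {three_regions:.2f}")
```

In this idealized setup the single-region operator has to provision for twice its average load, while three evenly spaced regions cancel each other's peaks and the ratio drops to about 1. Real traffic is lumpier and unevenly weighted (the summary cites China at 17%, Brazil at 5%, and Indonesia at 4%), so the actual curve would be flatter than one market's but not perfectly flat.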

Betting the Field: The Marketplace Thesis

Jay frames OpenCode as a marketplace where users pick models by attribute and cost, which keeps labs honest and passes competitive gains to consumers instead of vendor margins. Every bump in OpenCode’s growth traces to a release in the open-model market, so the company is explicitly not picking a winning lab; it is betting the field. That bet has made OpenCode the largest customer by token volume for most open-source model labs, a symbiosis where each side needs the other. On commoditization, Jay’s view is nuanced: the intelligence market is so large that labs will carve defensible niches along the quality-cost-performance axes, the way DeepSeek deliberately owns the cost corner. The positioning strategy has deep roots: as with the team’s earlier OpenNext project, when a market has two dominant players, the rest coalesces around an open alternative, and OpenCode raced to become that default, building models.dev along the way just to support 70+ providers at launch.

Sixteen Years to Overnight Success

The backstory reframes everything. Jay started the company after a discouraging Waterloo co-op term in 2006-2007, incorporated with college roommate Frank in 2010, and spent the next decade shipping products that did “reasonably well” while applying to YC nine times across 2016-2021, with four interviews, all as the same legal entity, the same founders, and a rotating cast of ideas. His first YC interview was with Paul Graham, in a waiting room shared with an Airbnb founder. Acceptance finally came in 2021 with the serverless platform that became SST, the team’s gateway into open source and building in public, a practice pushed by YC’s Dalton and crystallized by co-founder Dax’s observation that public code deserves public storytelling. When Claude Code landed in February 2025, the team’s terminal-UI taste (honed on projects as eccentric as coffee-over-SSH) told them exactly what to build. The hosts close on the honest version of the lightning-in-a-bottle myth: ten years of grinding taught the team consumer metrics, open source, marketing, and positioning, so when the strike came, the bottle was already in place.

Notable Quotes

“Most people in the world still haven’t experienced the magic of a coding agent.”
Jay V, on the founding premise of OpenCode

“You really know you have product market fit when like enterprises are bugging you to sign the security agreement so they can use your product.”
Lightcone host, on OpenCode’s inverted enterprise sales motion

“It’s not that we’re picking a winner in terms of a model lab. We’re just betting the field. We just think the rest of the field is going to do well.”
Jay V, on OpenCode’s strategy toward the model market

“With these open-source models, we’re the largest customer for most of them.”
Jay V, on OpenCode’s token volume relative to open-model labs

“When you’ve got a dominant or in this case two dominant players in the market, the rest of the market coalesces around an open alternative. And picking that position ends up being really valuable because if you pick it, it’s very hard for somebody else to displace you.”
Jay V, on the deliberate positioning behind the OpenCode name

“This is just an unprecedented market, like the market for intelligence has not existed before, everybody should be thinking in a positive-sum grow-the-pie mentality.”
Lightcone host, on why labs should welcome OpenCode’s growth

“Look, you know, all your code is public. You work basically in public. If you don’t talk about it publicly, you’re probably doing yourself a disservice and your product a disservice.”
Jay V, recounting co-founder Dax’s case for building in public

“It was really more a journey that took 10 years to get to 0 to 30 million in 8 months.”
Lightcone host, reframing the overnight-success narrative

“To catch the lightning in the bottle, you actually like have to sort of position the bottle correctly and be ready for it and know what to do with it.”
Lightcone host, closing the episode on preparation meeting luck

Watch the full conversation here.

Related Reading
- OpenCode: the open-source coding agent discussed throughout the episode, including its public usage data.
- models.dev: the open-source database of AI models and providers the team built to support 70+ providers at launch.
- SST: the serverless framework that got the company into YC and established its open-source, build-in-public roots.
- Terminal: the coffee-over-SSH storefront that proved the team’s terminal-UI chops before OpenCode existed.
- Y Combinator: the accelerator behind the Lightcone podcast, which Jay applied to nine times before getting in.

July 24, 2026

Jensen Huang at Stanford CS153 Frontier Systems on Co-Design, Agentic Computing, Vera Rubin, Open Models, and the Million-X Decade That Reshaped AI Infrastructure
https://www.youtube.com/watch?v=tsQB0n0YV3k

NVIDIA CEO Jensen Huang returned to Stanford for the CS153 Frontier Systems class (the room nicknamed itself “AI Coachella”) to lay out, in raw form, how he thinks about the computer being reinvented for the first time in over sixty years. Across roughly seventy minutes of student questions he walks through the codesign philosophy that gave NVIDIA a million-x decade, the architectural through-line from Hopper to Grace Blackwell to Vera Rubin to Feynman, the case for open source foundation models, the realities of tokens per watt and MFU, energy demand running a thousand times higher, the China and export-control debate, and his own biggest strategic mistakes. Watch the full conversation on YouTube.

TLDW

Huang argues every layer of computing has changed: the programming model, the system architecture, the deployment pattern, the economics. Co-design across CPUs, GPUs, networking, storage, switches and compilers gave NVIDIA roughly a million-x speed-up over ten years versus the ten-x Moore’s Law era, and that headroom is what let researchers say “just train on the whole internet.” Hopper was built for pre-training, Grace Blackwell NVLink72 for inference and reasoning (50x over Hopper in two years), Vera Rubin is built for agents that load long memory, call tools and need a low-latency single-threaded CPU bolted directly to the GPU, and Feynman extends that to swarms of agents that spawn sub-agents. Open weights matter because safety, sovereignty (230-plus languages no one else will fund) and domain models for biology, autonomy, robotics and climate need a foundation that NVIDIA is willing to seed. Compute is not really the scarce resource (Huang says place the order and the chips ship), the broken thing is institutional budgeting that can’t put a billion dollars into a shared university supercomputer. Energy demand is heading a thousand times higher and this is finally the moment market forces alone will fund sustainable generation. On geopolitics he rejects the GPUs-as-atomic-bombs framing and warns America will end up like its telecom industry if it cedes two thirds of the world. On career he advises seeking suffering on purpose. On strategy he says observe, reason from first principles, build a mental model, work backwards, minimize opportunity cost, maximize optionality.
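To compare the headline multiples above on a common footing, it helps to convert each into an equivalent yearly rate. The sketch below assumes smooth compounding, which the talk does not claim; the function name is illustrative, and the inputs are the rounded figures quoted in this summary.

```python
def annualized_factor(total_speedup, years):
    """Constant yearly multiplier that compounds to total_speedup over `years`."""
    return total_speedup ** (1 / years)

codesign = annualized_factor(1_000_000, 10)  # million-x over ten years
moore = annualized_factor(10, 10)            # ten-x over the same decade
blackwell = annualized_factor(50, 2)         # 50x over Hopper in two years

print(f"co-design: {codesign:.2f}x per year")
print(f"Moore's Law era: {moore:.2f}x per year")
print(f"Hopper to Grace Blackwell: {blackwell:.2f}x per year")
```

Under that assumption the million-x decade works out to roughly 4x per year, the ten-x Moore's Law era to about 1.26x per year, and the Hopper to Grace Blackwell jump to about 7x per year.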

Key Takeaways
- The computing model has been substantially unchanged since the IBM System/360, sixty-plus years ago. Huang’s first computer architecture book was the System/360 manual. AI is the first true reinvention.
- Old computing was pre-recorded retrieval. New computing is generated, contextually aware and continuous. Cloud was on-demand. Agentic systems run continuously.
- Codesign is NVIDIA’s central thesis. Inherited from the Hennessy and Patterson RISC era at Stanford, extended across CPUs, GPUs, networking, switches, storage, compilers and frameworks all optimized together.
- The result of full-stack codesign: roughly 1,000,000x faster compute over ten years, versus a generous 10x to 100x for Moore’s Law in the same period. Dennard scaling effectively ended a decade ago.
- That million-x speed-up is what unlocked “train on all of the internet” as a realistic AI strategy.
- After GPT, Huang says it was obvious thinking was next. Reasoning is just generating tokens consumed internally, then using tools is generating tokens consumed externally. Agentic systems followed predictably.
- Education needs AI baked into the curriculum, not just taught as a subject. Pre-recorded textbooks cannot keep pace with knowledge being generated in real time.
- Huang says he cannot learn anymore without AI. He has the AI read the paper, then read every related paper, then become a dedicated researcher he can interrogate.
- Mead and Conway and the first-principles methodology of semiconductor design are still worth learning even though most of the scaling tricks have been exhausted.
- NVIDIA itself is one of the largest consumers of Anthropic and OpenAI tokens in the world. One hundred percent of NVIDIA engineers are now agentically supported. Huang recommends Claude and similar tools by name and says open-source downloads will not match the integrated product harness.
- NVIDIA still invests heavily in open foundation models because language and intelligence represent the codification of human knowledge. Five pillars: Nemotron (language), BioNeMo (biology), Alpamayo (autonomous vehicles), GR00T (humanoid robotics) and a climate science model (mesoscale multiphysics).
- Sovereign language models matter. Roughly 230 world languages will never be a top priority for a commercial frontier lab. Nemotron is near-frontier and fully fine-tunable so any country can adapt it.
- Safety and security require open weights. You cannot defend against or audit a black box. Transparent systems let researchers interrogate models and let defenders deploy swarms.
- The future of cyber defense is not bigger-model-versus-bigger-model. It is trillions of cheap fast small models like Nemotron Nano surrounding the threat.
- Domain models fuse language priors with world models. Alpamayo learned to drive safely on a few million miles instead of billions because it can reason like a human about the road.
- MFU (Model FLOPs Utilization) is a misleading metric. Huang says he wants low MFU, because that means he over-provisioned every resource and never gets pinned by Amdahl’s law during a spike.
- The xAI Memphis cluster running at 11 percent MFU is not necessarily a failure mode. In disaggregated prefill plus decode inference you can deliver very high tokens per watt with very low MFU.
- The right metric is performance, ultimately tokens per watt as a proxy for intelligence per watt, and even that needs adjustment because not all tokens are equal. Coding tokens are worth more than other tokens.
- Hopper was designed for pre-training. NVIDIA chose to build multi-billion-dollar systems when the largest existing scientific supercomputer cost $350 million, with no proven customer base. It worked.
- Grace Blackwell NVLink72 was designed for inference, especially the high-memory-bandwidth decode phase. It is the world’s first rack-scale computer and delivered a 50x speed-up over Hopper in two years, against an expected 2x from Moore’s Law.
- Vera Rubin is designed for agents. Long-term memory wired into storage and into the GPU fabric, working memory, heavy tool use, and Vera, a CPU optimized for low-latency multi-core single-threaded code so a multi-billion-dollar GPU system does not stall waiting on a slow tool call.
- Feynman is being shaped for swarms of agents with sub-agents and sub-sub-agents, a recursive software topology that demands a new compute pattern.
- Tokens per watt improved 50x in one generation. Compounding energy efficiency is the lever NVIDIA controls directly.
- Total compute energy demand is heading roughly a thousand times higher than today, possibly two orders of magnitude beyond that. Huang says he would not be surprised if the estimate is low.
- For the first time in history, market forces alone are enough to fund solar, nuclear and grid upgrades. Government subsidies are no longer required to make sustainable energy investment rational.
- Copper interconnect is becoming a bottleneck. Photonics is moving from optional to structural inside racks and across them.
- Comparing NVIDIA GPUs to atomic bombs, Huang says, is a stupid analogy. A billion people use NVIDIA GPUs. He advocates them to his family. He does not advocate atomic bombs to anyone.
- If the United States cedes two thirds of the global market to competitors on policy grounds, the American technology industry will end up like American telecommunications, which was policied out of existence.
- Huang directly rejects AI doom-by-singularity narratives. It is not true that we have no idea how these systems work. It is not true that the technology becomes infinitely powerful in a nanosecond. He calls the rhetoric irresponsible and harmful to the field students are about to enter.
- On Stanford specifically: if the university president places an order, NVIDIA will deliver the chips. The bottleneck is that no university department has a billion-dollar compute budget because budgeting is fragmented across grants. Stanford’s $40 billion endowment is more than enough to fix that.
- “It’s Stanford’s fault” is meant as empowerment. If something is your fault, you can solve it.
- Career advice: do not optimize purely for passion. Most people do not yet know what they love. Pick the job in front of you and do it as well as possible. Even as CEO, Huang says, 90 percent of the work is hard and he suffers through it.
- Suffering on purpose builds the muscle of resilience. When the company, the team or the family needs you to be tough, that muscle has to already exist.
- NVIDIA’s first generation of products was technically wrong in nearly every dimension: curved surfaces instead of triangles, no Z-buffer, forward instead of inverse texture mapping, no floating point. The strategic recovery, not the technology, taught Huang the lessons that have lasted decades.
- The biggest clean strategic mistake Huang names is the move into mobile chips (Tegra). The business grew to a billion dollars in revenue, then went to zero when Qualcomm’s modem dominance shut NVIDIA out of the 3G to 4G transition. The recovery into automotive and robotics (the Thor chip is the great great great grandson of that mobile lineage) was real, but Huang refuses to rationalize the original choice.
- Forecasting framework: observe, reason from first principles, ask “so what” and “what next” until you have a mental model of the future, place your company inside that model, then work backwards while minimizing opportunity cost and maximizing optionality.
- Best part of the CEO job: living at the intersection of vision, strategy and execution surrounded by people capable enough to make ambitious visions real. Worst part: the responsibility for everyone who joined the spaceship, especially in the near-death moments NVIDIA had four or five times early on.
- Underrated insider note: Huang’s first apple pie with cheese, first hot fudge sandwich and first milkshake all happened at Denny’s. The Superbird, the fried chicken and a custom Superbird-style ham and cheese with tomato and mustard are his order.

Detailed Summary

Computing reinvented from the ground up

Huang frames the moment as the first true rewrite of the computer in sixty-plus years. From the IBM System 360 forward, the mental model of writing code, running code, taking a computer to market and reasoning about applications stayed roughly constant. AI changes the programming model itself. Software is no longer a compiled binary running deterministically on a CPU. It is a neural network running on a GPU producing generated, contextual, real-time output. That cascades into how companies are organized, what tools developers use, what the network and storage stack look like, and what an application is even allowed to do. Robo-taxis, he notes, are an application no one would have attempted before deep learning unlocked perception.

Codesign and the million-x decade

Codesign is the philosophical center of the talk. Huang traces it to the RISC work of John Hennessy at Stanford, where simpler instruction sets won by being co-designed with the compiler rather than maximally optimized in isolation. NVIDIA extends the principle across every layer simultaneously: GPU architecture, CPU architecture, NVLink and NVSwitch fabrics, photonic interconnects, networking silicon, storage paths, CUDA libraries, frameworks and ultimately the model design. The numbers Huang gives are arresting. Moore’s Law in its prime delivered roughly 100x per decade. By the time Dennard scaling broke, real-world gains had compressed to roughly 10x. NVIDIA’s codesigned stack delivered between 100,000x and 1,000,000x over the same ten-year window. That non-linear speed-up is, in Huang’s telling, the precondition for modern AI: it is what allowed researchers to stop curating training sets and just feed the entire internet to the model.
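As a rough arithmetic check on those figures, the per-decade multipliers can be converted into implied yearly growth rates. This is only a sketch: the function name is illustrative, and the inputs are the round numbers quoted above rather than measured data.

```python
def annual_factor(per_decade_gain: float, years: int = 10) -> float:
    """Constant yearly multiplier that compounds to per_decade_gain over `years`."""
    return per_decade_gain ** (1.0 / years)

# Round figures as quoted in the talk summary.
gains = {
    "Moore's Law in its prime (100x)": 100,
    "After Dennard scaling broke (10x)": 10,
    "Codesigned stack, low end (100,000x)": 100_000,
    "Codesigned stack, high end (1,000,000x)": 1_000_000,
}

for label, gain in gains.items():
    print(f"{label}: about {annual_factor(gain):.2f}x per year")
```

Under these round numbers the codesigned stack works out to roughly 3.2x to 4x per year, against about 1.6x per year for Moore’s Law in its prime and about 1.26x once gains had compressed to 10x per decade.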

Education has to fuse first principles with AI tools

Asked how curriculum should evolve, Huang argues AI must be integrated into the learning process, not just taught about. He recalls Hennessy writing his textbook by hand a chapter a week while Huang was a student, and says pre-recorded textbooks cannot keep up with the rate at which AI generates new knowledge. He describes his own learning workflow: hand the paper to an AI, then have it read the entire surrounding literature, then treat the AI as a dedicated researcher who can be interrogated. At the same time he defends the classics. Mead and Conway are still the foundation. Most modern semiconductor scaling tricks have been exhausted, but knowing where the field came from sharpens judgment when designing what comes next.

Open source and the five domain pillars

Huang gives one of the most detailed public accounts of why NVIDIA invests so heavily in open foundation models even while being a top customer of closed labs. He recommends Claude and OpenAI by name for production coding work, and says 100 percent of NVIDIA engineers are now agentically supported. The open-weights case rests on three legs. First, language is the codification of intelligence, and there are at least 230 languages that no commercial lab will ever prioritize. Nemotron is built near frontier and released so any country or community can fine-tune it. Second, the same representation-learning approach has to be replicated in domains where the data is not internet text, so NVIDIA seeded BioNeMo for biology, Alpamayo for autonomy, GR00T for humanoid robotics and a climate model for mesoscale multiphysics. The economics of those fields would never produce a foundation model on their own. Third, safety and security require transparency. A black box cannot be defended or audited, and the future of cyber defense is not bigger-model-versus-bigger-model but swarms of cheap fast small models like Nemotron Nano surrounding the threat.

MFU is the wrong metric, tokens per watt is closer

A student raises a leaked memo claiming that the xAI Memphis cluster is running at 11 percent Model FLOPs Utilization (MFU). Huang flips the framing. He says he would rather be at low MFU all the time, because that means he over-provisioned FLOPs, memory bandwidth, memory capacity and network capacity. Bottlenecks shift constantly, so over-provisioning across every dimension is what lets the system absorb a spike without getting pinned by Amdahl’s law. In disaggregated inference, where prefill and decode are physically separated and decode is bandwidth-bound rather than FLOP-bound, NVLink72 can deliver extremely high tokens per watt while reporting very low MFU. Huang argues the right framing is performance, and ultimately tokens per watt as a rough proxy for intelligence per watt, adjusted for the fact that not all tokens are equal. A coding token is worth more than a generic token.
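To see why a bandwidth-bound decode workload reports low MFU, here is a back-of-envelope sketch. It assumes the common approximation of about 2 FLOPs per parameter per generated token for a forward pass, and it ignores KV-cache traffic, attention cost and interconnect overhead. The accelerator and model numbers are hypothetical, picked only to keep the arithmetic readable; none of them come from the talk.

```python
def mfu(tokens_per_second: float, flops_per_token: float, peak_flops: float) -> float:
    """Model FLOPs Utilization: useful model FLOPs per second over hardware peak."""
    return tokens_per_second * flops_per_token / peak_flops

def decode_tokens_per_second(param_count: float, bytes_per_param: float,
                             mem_bandwidth: float, batch_size: int) -> float:
    """Bandwidth-bound upper limit: each decode step streams every weight once
    and produces one token per sequence in the batch."""
    step_time = param_count * bytes_per_param / mem_bandwidth
    return batch_size / step_time

# Hypothetical model and accelerator (assumptions, not vendor specs).
PARAMS = 70e9                  # 70B parameters
BYTES_PER_PARAM = 2            # 16-bit weights
PEAK_FLOPS = 1e15              # 1 PFLOP/s peak compute
MEM_BANDWIDTH = 4e12           # 4 TB/s memory bandwidth
FLOPS_PER_TOKEN = 2 * PARAMS   # forward-pass approximation

for batch in (1, 8, 64):
    tps = decode_tokens_per_second(PARAMS, BYTES_PER_PARAM, MEM_BANDWIDTH, batch)
    print(f"batch {batch}: {tps:.1f} tokens/s, MFU {mfu(tps, FLOPS_PER_TOKEN, PEAK_FLOPS):.2%}")
```

With these made-up numbers, MFU is about 0.4 percent at batch size 1 and about 26 percent at batch size 64, even though memory bandwidth is saturated the whole time. That is the sense in which low MFU and an efficiently used system can coexist.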

Hopper, Grace Blackwell NVLink72, Vera Rubin, Feynman

Huang gives the clearest public framing of NVIDIA’s roadmap as a sequence of architectural answers to evolving compute patterns. Hopper was built for pre-training, at a moment when NVIDIA chose to build multi-billion-dollar machines while the largest scientific supercomputer in the world cost $350 million and the marketplace for such systems was, on paper, zero. Grace Blackwell NVLink72 was the answer to inference and reasoning: a rack-scale computer that ganged 72 GPUs together because decode needs aggregate memory bandwidth far beyond a single chip. The generation-over-generation speed-up was 50x in two years, twenty-five times what Moore’s Law would have delivered. Vera Rubin is being built explicitly for agents. Agents load long-term memory from storage that has to be wired directly into the GPU fabric, they use working memory, they call tools that run on a CPU, and they wait. So the CPU has to be Vera, optimized for low-latency single-threaded code, because the multi-billion-dollar GPU system cannot afford to idle waiting on a slow tool call. Feynman extends the pattern to swarms of agents with sub-agents and sub-sub-agents, a recursive software topology that will demand its own compute pattern.
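The tool-call stall can be illustrated with a toy model. It assumes a single agent whose steps alternate strictly between a GPU generation phase and a blocking tool call on the CPU, with no overlap and no other requests to fill the gap. The timings are hypothetical.

```python
def gpu_busy_fraction(gpu_seconds_per_step: float, tool_seconds_per_step: float) -> float:
    """Share of wall-clock time the GPU is working when each agent step is
    a generation phase followed by a blocking tool call on the CPU."""
    return gpu_seconds_per_step / (gpu_seconds_per_step + tool_seconds_per_step)

# Hypothetical timings for one agent step (assumptions for illustration).
GPU_SECONDS = 2.0
for tool_seconds in (4.0, 2.0, 0.5):
    busy = gpu_busy_fraction(GPU_SECONDS, tool_seconds)
    print(f"tool call {tool_seconds:.1f}s -> GPU busy {busy:.0%}")
```

In this simplified picture, cutting the tool call from 4 seconds to 0.5 seconds lifts GPU busy time from 33 percent to 80 percent without touching the GPU at all, which is the argument for a CPU tuned for single-threaded latency. Real systems hide some of this by interleaving many agents, so treat the numbers as directional only.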

Energy demand and the grid

Huang’s energy projection is one of the most aggressive numbers in the talk. NVIDIA can compound tokens per watt by 50x per generation through codesign, but the total compute demand is heading roughly a thousand times higher, and Huang says he would not be surprised if the real figure is one or two orders of magnitude beyond that. The reason is structural: future computing is generative and continuous, not pre-recorded and on-demand. The good news, he argues, is that this is the best moment in the history of humanity to invest in sustainable generation. Market forces alone are now sufficient to fund solar, nuclear and grid upgrades. Government subsidies are no longer required to make the math work.

Adversarial countries, export controls and the telecom warning

This is the segment where Huang is visibly fired up. He attacks the GPUs-as-atomic-bombs framing on its face. NVIDIA GPUs power medical imaging, video games and soy sauce delivery. A billion people use them. He advocates them to his family. The analogy collapses at the first comparison. He attacks the second framing, that American companies should not compete abroad because they will lose anyway, as a self-fulfilling defeat. Competition makes the company better. The third framing, that depriving the rest of the world of general-purpose computing benefits the United States, also fails on first principles: it benefits one or two American companies at the cost of an entire industry. The cautionary parallel is telecommunications. The United States once had a leading position in telecom fundamental technology and policied itself out of it. Huang’s worry, voiced explicitly to a room of CS students, is that they will graduate into a shell of a computer industry if the same path is repeated.

AI doom and rational optimism

In the same arc Huang rejects the science-fiction framing of AI as a singularity that arrives suddenly on a Wednesday at 7pm and ends civilization. He calls those claims irresponsible, says they are not true, and points out that the people advancing them are believed by audiences who then make policy on that basis. It is not true that no one understands how these systems work. It is not true that intelligence becomes infinitely powerful instantaneously. It is not true that there is no defense. His framing, which the host echoes as “rational optimism,” is that the goal is to create a future where people care about computers because the technology students are learning is worth mastering.

Stanford’s compute problem is Stanford’s fault

A student presses on the scarcity of compute for independent researchers, startups and universities inside the United States. Huang’s answer is sharp: there is no shortage. Place the order and the chips will arrive. The actual broken thing is institutional. University grants are fragmented across departments. No researcher can raise enough on a single grant to fund a billion-dollar shared cluster, and no one shares. He compares it to showing up at the grocery store demanding a billion dollars of tomatoes today. The solution is planning, aggregation and a campus-scale supercomputer, the way Stanford once built the linear accelerator. The endowment is $40 billion. Pulling a billion off it, contracting cloud capacity and giving every student and researcher AI supercomputer access is, in Huang’s view, obviously doable. When he says “it is Stanford’s fault” the host laughs, but Huang clarifies: if it is your fault you have the power to fix it.

Career, suffering and resilience

Asked how a CS student should spend the next few years, Huang pushes back on the standard “follow your passion” advice. Most people do not know what they love yet, because no one knows what they do not know. The bar of demanding joy from every working day is too high. Whatever the job is, do it as well as you can. Even as CEO of NVIDIA he says he genuinely loves about 10 percent of his work. The other 90 percent is hard and he suffers through it. He recommends suffering on purpose, because resilience is a muscle that only builds under load, and when the company, the team or the family needs that muscle, it has to already exist. Earlier in his life that meant cleaning toilets and busing tables at Denny’s. He does it today running a multi-trillion-dollar company.

The biggest mistakes

Huang separates technical mistakes from strategic mistakes. NVIDIA’s first generation of products was technically wrong in almost every way: curved surfaces instead of triangles, no Z-buffer, forward instead of inverse texture mapping, no floating point inside. The company wasted two and a half years. But the strategic genius of the recovery, the reading of the market, the conservation of resources and the reapplication of talent, is what taught him strategy. The clean strategic mistake he names is mobile. NVIDIA’s Tegra line grew to a billion dollars of revenue and then collapsed to zero when Qualcomm’s modem dominance locked NVIDIA out of the 3G to 4G transition. Huang explicitly refuses the comforting rationalization that the Tegra effort fed the Thor automotive chip (“Thor is the great great great grandson”). The original decision, he says, was a waste of time. The lesson is to think one or two clicks further about whether a market is structurally winnable before committing the company.

Forecasting under fog of war

The final substantive exchange is on forecasting. Huang’s method has four steps. Observe what is actually happening (AlexNet crushing two decades of computer vision research in one shot, GPT producing reasoning by token generation). Reason from first principles about why it works. Ask “so what” and “what next” recursively until a mental model of the future emerges. Place the company inside that future and work backwards. Crucially, expect to be partly wrong. Some outcomes will absolutely happen, some will likely happen, some might happen, and the strategy has to be robust across that distribution. The real cost of any strategic choice is the opportunity cost of the alternatives you did not take, so the discipline is to minimize that cost and maximize optionality while letting the journey itself pay for the journey.

Thoughts

The most useful thing in this conversation is the explicit architectural mapping of compute patterns to chip generations. Hopper for pre-training. Grace Blackwell NVLink72 for inference, because decode is bandwidth-bound and a single chip cannot supply it. Vera Rubin for agents, because tool calls stall multi-billion-dollar GPU systems and so the CPU has to be optimized for low-latency single-threaded code. Feynman for swarms. That sequence is not marketing. It is a falsifiable thesis about where the bottleneck moves next, and every other infrastructure company should be measuring itself against it. If Huang is right that swarms of sub-agents are the next dominant pattern, then the design pressure shifts from raw FLOPs to fabric topology, memory hierarchy and storage-to-GPU latency. That has implications for everyone downstream, including the hyperscalers building competing accelerators.

The MFU section is the most intellectually generous moment in the talk. The instinct in the AI ops community has been to chase MFU as if it were a virtue. Huang argues, persuasively, that low MFU is consistent with high tokens per watt in a disaggregated inference setup, and that bottlenecks rotate fast enough that over-provisioning every resource is the rational design. That reframing matters because it changes what “scarce” means. Compute is not scarce in the way the discourse treats it. What is scarce is a coherent system designed end-to-end. The xAI 11 percent number, in that frame, is not embarrassing. It is the natural reading of a workload that is mostly decode.

The Stanford segment is the part most likely to be quoted out of context. “It’s Stanford’s fault” is a deliberately provocative line, but the underlying claim is correct and load-bearing. Compute is not gated by NVIDIA refusing to ship chips. It is gated by the fact that fragmented grant funding cannot aggregate into the billion-dollar order that NVIDIA can fulfill. The implication is that universities and national labs need a structural change in how they pool capital for compute, and that the current model of every researcher buying a handful of cards is genuinely obsolete. Huang’s nudge about pulling a billion off the endowment is concrete enough to be acted on, and other major research universities should read this segment as a direct prompt.

The geopolitical segment is the highest-stakes one. The telecommunications comparison is correct as a historical pattern, and Huang is one of the very few executives in a position to deliver that warning credibly. The unresolved tension is that the argument applies symmetrically. If American AI dominance is built by selling globally, that includes selling into adversarial states, and the policy question is where the line falls. Huang does not answer that question. He attacks the framing that lets the question be answered badly. That is a meaningful contribution to the discourse even if it does not resolve the underlying tradeoff.

The career advice section is the part the social-media clips will mishandle. “Seek suffering” reads as macho when extracted. In context it is a specific operational claim about how resilience compounds, and it is paired with the Tegra story where Huang himself paid the price of not thinking one more click ahead. That kind of self-implication is rare in CEO talks, and it is the reason the talk is worth listening to in full rather than only reading the recap.

Watch the full Stanford CS153 Frontier Systems conversation with Jensen Huang here.
May 13, 2026
Andrej Karpathy on AutoResearch, AI Agents, and Why He Stopped Writing Code: Full Breakdown of His 2026 No Priors Interview

TL;DW

Andrej Karpathy sat down with Sarah Guo on the No Priors podcast (March 2026) and delivered one of the most information-dense conversations about the current state of AI agents, autonomous research, and the future of software engineering. The core thesis: since December 2025, Karpathy has essentially stopped writing code by hand. He now “expresses his will” to AI agents for 16 hours a day, and he believes we are entering a “loopy era” where autonomous systems can run experiments, train models, and optimize hyperparameters without a human in the loop. His project AutoResearch proved this works by finding improvements to a model he had already hand-tuned over two decades of experience. The conversation also covers the death of bespoke apps, the future of education, open vs. closed source models, robotics, job market impacts, and why Karpathy chose to stay independent from frontier labs.

Key Takeaways

1. The December 2025 Shift Was Real and Dramatic

Karpathy describes a hard flip that happened in December 2025 where he went from writing 80% of his own code to writing essentially none of it. He says the average software engineer’s default workflow has been “completely different” since that month. He calls this state “AI psychosis” and says he feels anxious whenever he is not at the forefront of what is possible with these tools.

2. AutoResearch: Agents That Do AI Research Autonomously

AutoResearch is Karpathy’s project where an AI agent is given an objective metric (like validation loss), a codebase, and boundaries for what it can change. It then loops autonomously, running experiments, tweaking hyperparameters, modifying architectures, and committing improvements without any human in the loop. When Karpathy ran it overnight on a model he had already carefully tuned by hand over years, it found optimizations he had missed, including forgotten weight decay on value embeddings and insufficiently tuned Adam betas.

3. The Name of the Game Is Removing Yourself as the Bottleneck

Karpathy frames the current era as a shift from optimizing your own productivity to maximizing your “token throughput.” The goal is to arrange tasks so that agents can run autonomously for extended periods. You are no longer the worker. You are the orchestrator, and every minute you spend in the loop is a minute the system is held back.

4. Mastery Now Means Managing Multiple Agents in Parallel

The vision of mastery is not writing better code. It is managing teams of agents simultaneously. Karpathy references Peter Steinberger’s workflow of having 10+ Codex agents running in parallel across different repos, each taking about 20 minutes per task. You move in “macro actions” over your codebase, delegating entire features rather than writing individual functions.

5. Personality and Soul Matter in Coding Agents

Karpathy praises Claude’s personality, saying it feels like a teammate who gets excited about what you are building. He contrasts this with Codex, which he calls “very dry” and disengaged. He specifically highlights that Claude’s praise feels earned because it does not react equally to half-baked ideas and genuinely good ones. He credits Peter (OpenClaw) with innovating on the “soul” of an agent through careful prompt design, memory systems, and a unified WhatsApp interface.

6. Apps Are Dead. APIs and Agents Are the Future.

Karpathy built “Dobby the Elf Claw,” a home automation agent that controls his Sonos, lights, HVAC, shades, pool, spa, and security cameras through natural language over WhatsApp. He did this by having agents scan his local network, reverse-engineer device APIs, and build a unified dashboard. His conclusion: most consumer apps should not exist. Everything should be API endpoints that agents can call on behalf of users. The “customer” of software is increasingly the agent, not the human.

7. AutoResearch Could Become a Distributed Computing Project

Karpathy envisions an “AutoResearch at Home” model inspired by SETI@home and Folding@home. Because it is expensive to find code optimizations but cheap to verify them (just run the training and check the metric), untrusted compute nodes on the internet could contribute experimental results. He draws an analogy to blockchain: instead of blocks you have commits, instead of proof of work you have expensive experimentation, and instead of monetary reward you have leaderboard placement. He speculates that a global swarm of agents could potentially outperform frontier labs.

8. Education Is Being Redirected Through Agents

Karpathy describes his MicroGPT project, a 200-line distillation of LLM training to its bare essence. He says he started to create a video walkthrough but realized that is no longer the right format. Instead, he now “explains things to agents,” and the agents can then explain them to individual humans in their own language, at their own pace, with infinite patience. He envisions education shifting to “skills” (structured curricula for agents) rather than lectures or guides for humans directly.

9. The Jaggedness Problem Is Still Real

Karpathy describes current AI agents as simultaneously feeling like a “brilliant PhD student who has been a systems programmer their entire life” and a 10-year-old. He calls this “jaggedness,” and it stems from reinforcement learning only optimizing for verifiable domains. Models can move mountains on agentic coding tasks but still tell the same bad joke they told four years ago (“Why don’t scientists trust atoms? Because they make everything up.”). Things outside the RL reward loop remain stuck.

10. Open Source Is Healthy and Necessary, Even If Behind

Karpathy estimates open source models are now roughly 6 to 8 months behind closed frontier models, down from 18 months and narrowing. He draws a parallel to Linux: the industry has a structural need for a common, open platform. He is “by default very suspicious” of centralization and wants more labs, more voices in the room, and an “ensemble” approach to AI governance. He thinks it is healthy that open source exists slightly behind the frontier, eating through basic use cases while closed models handle “Nobel Prize kind of work.”

11. Digital Transformation Will Massively Outpace Physical Robotics

Karpathy predicts a clear ordering: first, a massive wave of “unhobbling” in the digital space where everything gets rewired and made 100x more efficient. Then, activity moves to the interface between digital and physical (sensors, cameras, lab equipment). Finally, the physical world itself transforms, but on a much longer timeline because “atoms are a million times harder than bits.” He notes that robotics requires enormous capital expenditure and conviction, and most self-driving startups from 10 years ago did not survive long term.

12. Why Karpathy Stays Independent From Frontier Labs

Karpathy gives a nuanced answer about why he is not working at a frontier lab. He says employees at these labs cannot be fully independent voices because of financial incentives and social pressure. He describes this as a fundamental misalignment: the people building the most consequential technology are also the ones who benefit most from it financially. He values being “more aligned with humanity” outside the labs, though he acknowledges his judgment will inevitably drift as he loses visibility into what is happening at the frontier.

Detailed Summary

The AI Psychosis and the End of Hand-Written Code

The conversation opens with Karpathy describing what he calls a state of perpetual “AI psychosis.” Since December 2025, he has not typed a line of code. The shift was not gradual. It was a hard flip from doing 80% of his own coding to doing almost none. He compares the anxiety of unused agent capacity to the old PhD feeling of watching idle GPUs. Except now, the scarce resource is not compute. It is tokens, and you feel the pressure to maximize your token throughput at all times.

He describes the modern workflow: you have multiple coding agents (Claude Code, Codex, or similar harnesses) running simultaneously across different repositories. Each agent takes about 20 minutes on a well-scoped task. You delegate entire features, review the output, and move on. The job is no longer typing. It is orchestration. And when it does not work, the overwhelming feeling is that it is a “skill issue,” not a capability limitation.
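The orchestration pattern described above can be sketched with Python’s asyncio. This is a minimal illustration, not Karpathy’s setup: `run_agent` is a hypothetical stand-in that only sleeps, where a real version would call out to a coding-agent harness, and the task names are invented.

```python
import asyncio

async def run_agent(name: str, task: str, minutes: float) -> dict:
    """Hypothetical stand-in for a coding agent; a real harness call would go here."""
    await asyncio.sleep(minutes / 1000)  # simulate the work, scaled down to milliseconds
    return {"agent": name, "task": task, "status": "ready for review"}

async def orchestrate(tasks: list[str]) -> list[dict]:
    """Launch one agent per task concurrently and collect results in task order."""
    jobs = [run_agent(f"agent-{i}", task, minutes=20) for i, task in enumerate(tasks)]
    return await asyncio.gather(*jobs)

features = ["add login page", "fix flaky test", "migrate config loader"]
for result in asyncio.run(orchestrate(features)):
    print(result)
```

The human’s role in this sketch is the part that is not in the code: choosing well-scoped tasks to put in `features` and reviewing what comes back.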

Karpathy says most people, even his own parents, do not fully grasp how dramatic this shift has been. The default workflow of any software engineer sitting at a desk today is fundamentally different from what it was six months ago.

AutoResearch: Closing the Loop on AI Research

The centerpiece of the conversation is AutoResearch, Karpathy’s project for fully autonomous AI research. The setup is deceptively simple: give an agent an objective metric (like validation loss on a language model), a codebase to modify, and boundaries for what it can change. Then let it loop. It generates hypotheses, runs experiments, evaluates results, and commits improvements. No human in the loop.

Karpathy was surprised it worked as well as it did. He had already hand-tuned his NanoGPT-derived training setup over years using his two decades of experience. When he let AutoResearch run overnight, it found improvements he had missed. The weight decay on value embeddings was forgotten. The Adam optimizer betas were not sufficiently tuned. These are the kinds of things that interact with each other in complex ways that a human researcher might not systematically explore.

The deeper insight is structural: everything around frontier-level intelligence is about extrapolation and scaling laws. You do massive exploration on smaller models and then extrapolate to larger scales. AutoResearch is perfectly suited for this because the experimentation is expensive but the verification is cheap. Did the validation loss go down? Yes or no.
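The shape of that loop can be sketched in a few lines. This is a toy under loud assumptions: random perturbation stands in for an LLM agent proposing code changes, a simple quadratic stands in for a real training run, and the hyperparameter names merely echo the examples mentioned above. The ranges and optimum are invented.

```python
import random

# Boundaries the agent may not cross (hypothetical hyperparameters and ranges).
BOUNDS = {"weight_decay": (0.0, 0.2), "adam_beta2": (0.9, 0.999)}

def validation_loss(config: dict) -> float:
    """Toy stand-in for a training run; a real loop would train and evaluate a model."""
    return (config["weight_decay"] - 0.1) ** 2 + (config["adam_beta2"] - 0.95) ** 2

def propose(config: dict, rng: random.Random) -> dict:
    """Perturb one hyperparameter, clipped to its allowed range."""
    name = rng.choice(sorted(BOUNDS))
    low, high = BOUNDS[name]
    step = rng.uniform(-0.1, 0.1) * (high - low)
    candidate = dict(config)
    candidate[name] = min(high, max(low, config[name] + step))
    return candidate

def auto_research(start: dict, iterations: int, seed: int = 0):
    """Propose, evaluate, and keep ("commit") a change only if the metric improves."""
    rng = random.Random(seed)
    best, best_loss = dict(start), validation_loss(start)
    history = [best_loss]
    for _ in range(iterations):
        candidate = propose(best, rng)
        loss = validation_loss(candidate)
        if loss < best_loss:  # cheap verification: did the loss go down?
            best, best_loss = candidate, loss
            history.append(loss)
    return best, best_loss, history

start = {"weight_decay": 0.0, "adam_beta2": 0.999}
best, best_loss, history = auto_research(start, iterations=200)
print(best, best_loss, f"{len(history) - 1} commits")
```

The asymmetry is visible in the structure: every iteration pays for a full `validation_loss` call (the expensive experiment), while the accept-or-reject decision is a single comparison.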

Karpathy envisions this scaling beyond a single machine. His “AutoResearch at Home” concept borrows from distributed computing projects like Folding@home. Because verification is cheap but search is expensive, you can accept contributions from untrusted workers across the internet. He draws a blockchain analogy: commits instead of blocks, experimentation as proof of work, leaderboard placement as reward. A global swarm of agents contributing compute could, in theory, rival frontier labs that have massive but centralized resources.
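A minimal sketch of the trust model, assuming a coordinator that re-runs every submitted result before accepting it. The `Submission` type, the toy `evaluate` function and the worker names are all hypothetical; the source describes the idea, not an implementation.

```python
from dataclasses import dataclass

@dataclass
class Submission:
    worker: str
    config: dict
    claimed_loss: float

def evaluate(config: dict) -> float:
    """Toy stand-in for re-running training and measuring the metric."""
    return (config["lr"] - 0.3) ** 2

def verify_and_merge(submissions, best_loss, tolerance=1e-9):
    """Re-evaluate every claim; accept only improvements that reproduce."""
    leaderboard = {}
    for sub in submissions:
        actual = evaluate(sub.config)
        if abs(actual - sub.claimed_loss) > tolerance:
            continue  # the claim does not reproduce, so it is ignored
        if actual < best_loss:
            best_loss = actual
            leaderboard[sub.worker] = leaderboard.get(sub.worker, 0) + 1
    return best_loss, leaderboard

submissions = [
    Submission("alice", {"lr": 0.5}, claimed_loss=0.04),
    Submission("mallory", {"lr": 0.9}, claimed_loss=0.0),   # false claim
    Submission("bob", {"lr": 0.35}, claimed_loss=0.0025),
]
best_loss, leaderboard = verify_and_merge(submissions, best_loss=0.09)
print(best_loss, leaderboard)
```

Here the false claim from "mallory" is dropped because it fails to reproduce, while the two honest improvements are accepted in order and credited on the leaderboard. The coordinator never has to trust a worker’s number, only its own re-run.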

The Claw Paradigm and the Death of Apps

Karpathy introduces the concept of the “claw,” a persistent, looping agent that operates in its own sandbox, has sophisticated memory, and works on your behalf even when you are not watching. This goes beyond a single chat session with an AI. A claw has persistence, autonomy, and the ability to interact with external systems.

His personal example is “Dobby the Elf Claw,” a home automation agent that controls his entire smart home through WhatsApp. The agent scanned his local network, found his Sonos speakers, reverse-engineered the API, and started playing music in three prompts. It did the same for his lights, HVAC, shades, pool, spa, and security cameras (using a Qwen vision model for change detection on camera feeds).

The broader point is that this renders most consumer apps unnecessary. Why maintain six different smart home apps when a single agent can call all the APIs directly? Karpathy argues the industry needs to reconfigure around the idea that the customer is increasingly the agent, not the human. Everything should be exposed API endpoints. The intelligence layer (the LLM) is the glue that ties it all together.
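The "everything is an API endpoint" idea can be sketched as a small tool registry that an agent dispatches into. The two device functions below are hypothetical placeholders that only return strings; real versions would wrap each vendor’s API, and the request format is an assumption about what an LLM might emit, not a specific product’s schema.

```python
from typing import Callable, Dict

# Hypothetical device endpoints (placeholders, not real vendor APIs).
def set_lights(room: str, on: bool) -> str:
    return f"lights in {room} turned {'on' if on else 'off'}"

def play_music(room: str, playlist: str) -> str:
    return f"playing {playlist} in {room}"

TOOLS: Dict[str, Callable[..., str]] = {
    "set_lights": set_lights,
    "play_music": play_music,
}

def call_tool(request: dict) -> str:
    """Dispatch a structured tool call, as an LLM might emit, to the matching endpoint."""
    name = request["tool"]
    if name not in TOOLS:
        raise ValueError(f"unknown tool: {name}")
    return TOOLS[name](**request["arguments"])

print(call_tool({"tool": "play_music", "arguments": {"room": "study", "playlist": "jazz"}}))
```

Adding a new device in this sketch means registering one more function in `TOOLS`, with no new app or user interface, which is the reconfiguration the paragraph above argues for.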

He predicts this will become table stakes within a few years. Today it requires vibe coding and direct agent interaction. Soon, even open source models will handle this trivially. The barrier will come down until every person has a claw managing their digital life through natural language.

Model Jaggedness and the Limits of Reinforcement Learning

One of the most technically interesting sections covers what Karpathy calls “jaggedness.” Current AI models are simultaneously superhuman at verifiable tasks (coding, math, structured reasoning) and surprisingly mediocre at anything outside the RL reward loop. His go-to example: ask any frontier model to tell you a joke, and you will get the same one from four years ago. “Why don’t scientists trust atoms? Because they make everything up.” The models have improved enormously, but joke quality has not budged because it is not being optimized.

This jaggedness creates an uncanny valley in interaction. Karpathy describes the experience as talking to someone who is simultaneously a brilliant PhD systems programmer and a 10-year-old. Humans have some variance in ability across domains, but nothing like this. The implication is that the narrative of “general intelligence improving across all domains for free as models get smarter” is not fully accurate. There are blind spots, and they cluster around anything that lacks objective evaluation criteria.

He and Sarah Guo discuss whether this should lead to model “speciation,” where specialized models are fine-tuned for specific domains rather than one monolithic model trying to be good at everything. Karpathy thinks speciation makes sense in theory (like the diversity of brains in the animal kingdom) but says the science of fine-tuning without losing capabilities is still underdeveloped. The labs are still pursuing monocultures.

Open Source, Centralization, and Power Balance

Karpathy, a long-time open source advocate, estimates the gap between closed and open source models has narrowed from 18 months to roughly 6 to 8 months. He draws a direct parallel to Linux: despite closed alternatives like Windows and macOS, the industry structurally needs a common open platform. Linux runs on 60%+ of computers because businesses need a shared foundation they feel safe using.

The challenge for open source AI is capital expenditure. Training frontier models is astronomically expensive, and that is where the comparison to Linux breaks down somewhat. But Karpathy argues the current dynamic is actually healthy: frontier labs push the bleeding edge with closed models, open source follows 6 to 8 months behind, and that trailing capability is still enormously powerful for the vast majority of use cases.

He expresses deep skepticism about centralization, citing his Eastern European background and the historical track record of concentrated power. He wants more labs, more independent voices, and an “ensemble” approach to decision-making about AI’s future. He worries about the current trend of further consolidation even among the top labs.

The Job Market: Digital Unhobbling and the Jevons Paradox

Karpathy recently published an analysis of Bureau of Labor Statistics jobs data, color-coded by which professions primarily manipulate digital information versus physical matter. His thesis: digital professions will be transformed first and fastest because bits are infinitely easier to manipulate than atoms. He calls this “unhobbling,” the release of a massive overhang of digital work that humans simply did not have enough thinking cycles to process.

On whether this means fewer software engineering jobs, Karpathy is cautiously optimistic. He invokes the Jevons Paradox: when something becomes cheaper, demand often increases so much that total consumption goes up. The example he cites is ATMs and bank tellers. ATMs were supposed to replace tellers, but they made bank branches cheaper to operate, leading to more branches and more tellers (at least until 2010). Similarly, if AI makes software dramatically cheaper, the demand for software could explode because it was previously constrained by scarcity and cost.
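To make the Jevons argument concrete, here is a minimal sketch in Python. It is a toy model and not something from the conversation: it assumes a constant-elasticity demand curve, and the function names, the scale of 100, and the price drop from 1.0 to 0.1 are all illustrative assumptions. The point it shows is that whether total spending on software rises or falls after a price cut depends on how elastic demand is.

```python
def quantity_demanded(price: float, elasticity: float, scale: float = 100.0) -> float:
    """Toy constant-elasticity demand curve: Q = scale * price ** (-elasticity)."""
    return scale * price ** (-elasticity)


def total_spend(price: float, elasticity: float, scale: float = 100.0) -> float:
    """Total spending at a given price: price times quantity demanded."""
    return price * quantity_demanded(price, elasticity, scale)


if __name__ == "__main__":
    old_price, new_price = 1.0, 0.1  # hypothetical 10x drop in the cost of software

    # Compare an inelastic market (0.5) with an elastic one (1.5).
    for elasticity in (0.5, 1.5):
        before = total_spend(old_price, elasticity)
        after = total_spend(new_price, elasticity)
        direction = "rises" if after > before else "falls"
        print(f"elasticity={elasticity}: total spend {direction} "
              f"({before:.1f} -> {after:.1f})")
```

In this model, quantity demanded always goes up when the price falls, but total spending only goes up when elasticity is greater than 1. Karpathy's optimism amounts to a bet that demand for software sits in that elastic regime.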

He emphasizes that the physical world will lag behind significantly. Robotics requires enormous capital, conviction, and time. Most self-driving startups from a decade ago failed. The interesting opportunities in the near term are at the interface between digital and physical: sensors feeding data to AI systems, actuators executing AI decisions in the real world, and new markets for information (he imagines prediction markets where agents pay for real-time photos from conflict zones).

Education in the Age of Agents

Karpathy’s MicroGPT project distills the entire LLM training process into 200 lines of Python. He started making an explanatory video but stopped, realizing the format is obsolete. If the code is already that simple, anyone can ask an agent to explain it in whatever way they need: different languages, different skill levels, infinite patience, multiple approaches. The teacher’s job is no longer to explain. It is to create the thing that is worth explaining, and then let agents handle the last mile of education.

He envisions a future where education shifts from “guides and lectures for humans” to “skills and curricula for agents.” A skill is a set of instructions that tells an agent how to teach something, what progression to follow, what to emphasize. The human educator becomes a curriculum designer for AI tutors. Documentation shifts from HTML for humans to markdown for agents.

His punchline: “The things that agents can do, they can probably do better than you, or very soon. The things that agents cannot do is your job now.” For MicroGPT, the 200-line distillation is his unique contribution. Everything else, the explanation, the teaching, the Q&A, is better handled by agents.

Why Not Return to a Frontier Lab?

The conversation closes with a nuanced discussion about why Karpathy remains independent. He identifies several tensions. First, financial alignment: employees at frontier labs have enormous financial incentives tied to the success of transformative (and potentially disruptive) technology. This creates a conflict of interest when it comes to honest public discourse. Second, social pressure: even without arm-twisting, there are things you cannot say and things the organization wants you to say. You cannot be a fully free agent. Third, impact: he believes his most impactful contributions may come from an “ecosystem level” role rather than being one of many researchers inside a lab.

However, he acknowledges a real cost. Being outside frontier labs means his judgment will inevitably drift. These systems are opaque, and understanding how they actually work under the hood requires being inside. He floats the idea of periodic stints at frontier labs, going back and forth between inside and outside roles to maintain both independence and technical grounding.

Thoughts

This is one of the most honest and technically grounded conversations about the current state of AI I have heard in 2026. A few things stand out.

The AutoResearch concept is genuinely important. Not because autonomous hyperparameter tuning is new, but because Karpathy is framing the entire problem correctly: the goal is not to build better tools for researchers. It is to remove researchers from the loop entirely. The fact that an overnight run found optimizations that a world-class researcher missed after years of manual tuning is a powerful data point. And the distributed computing vision (AutoResearch at Home) could be the most consequential idea in the entire conversation if someone builds it well.

The “death of apps” framing deserves more attention. Karpathy’s Dobby example is not a toy demo. It is a preview of how every consumer software company’s business model gets disrupted. If agents can reverse-engineer APIs and unify disparate systems through natural language, the entire app ecosystem becomes a commodity layer beneath an intelligence layer. The companies that survive will be the ones that embrace API-first design and accept that their “user” is increasingly an LLM.

The jaggedness observation is underappreciated. The fact that models can autonomously improve training code but cannot tell a new joke should be deeply uncomfortable for anyone claiming we are on a smooth path to AGI. It suggests that current scaling and RL approaches produce narrow excellence, not general intelligence. The joke example is funny, but the underlying point is serious: we are building systems with alien capability profiles that do not match any human intuition about what “smart” means.

Finally, Karpathy’s decision to stay independent is itself an important signal. When one of the most capable AI researchers in the world says he feels “more aligned with humanity” outside of frontier labs, that should be taken seriously. His point about financial incentives and social pressure creating misalignment is not abstract. It is structural. And his proposed solution of rotating between inside and outside roles is pragmatic and worth consideration for the entire field.

March 20, 2026