PJFP.com

Pursuit of Joy, Fulfillment, and Purpose

Tag: Agentic Engineering

  • Waste Tokens to Save Time: Naval, Guillermo Rauch, Blake Scholl, and Max Hodak on AI Software Factories, 1000x Engineers, and Whether Pure Software Is Dead

    Naval Ravikant gathers three frontier founders (Guillermo Rauch of Vercel, Blake Scholl of Boom Supersonic, and Max Hodak of Science Corporation) for a freewheeling conversation about how AI coding tools are reshaping what an engineer is, what software is worth, and where the moat goes when models speak English. The headline idea comes from Naval himself: waste tokens, save time. Stop measuring AI by tokens consumed or lines of code generated, and start measuring it by the final output and the time you got back. The full conversation is on the Naval Podcast YouTube channel. This is part one of the discussion. Part two, on vibe coding hardware, follows the same group into jet engines, semiconductors, and biotech.

    TLDW

    The job of an engineer is shifting from shipping output to building the factory that ships the output. That means 10x engineers were never really 10x; in idea domains they were always 100x or 1000x, and AI leverage is making that obvious. Models now reflect back the judgment of the user, so a senior architect extracts dramatically more value than a junior, although the junior also writes code they could never have written alone. The frontier models have quietly graduated from junior coders to principal engineers, returning with intuitive plans and real tradeoffs (sometimes with hilariously bad time estimates) rather than just running away with the prompt. Naval has stopped learning prompt tricks, scaffolding tools, and Claude plan-mode rituals entirely. Instead he throws Codex, Claude, and Gemini at the same problem in parallel and brute forces his way through, because tokens are still cheaper than a human and the models keep improving faster than tricks can. That leads to the bigger question on the table: is pure software still investable, or is it now just a free byproduct of hardware, models, and taste? The group lands on the block economy thesis (a tip of the hat to Mitchell Hashimoto): agents do not want to reinvent Postgres or BMQ on the fly, they want to grab the right reusable building block, so infrastructure software actually gets more valuable, not less. Max Hodak closes the loop with a personal data point: he has not written a line of code in years and has built more software since December than ever before, all through agents, because understanding APIs, data flow, and performance is what actually moves the work forward.

    Thoughts

    The “waste tokens, save time” line is the most important rhetorical move in this conversation, and it deserves to be unpacked beyond the soundbite. Naval is implicitly arguing that the entire token-economics debate (input cost, output cost, leaderboards, model arbitrage) is a category error in the same way that lines-of-code was a category error in the nineties. The thing being purchased is not tokens. It is a finished result delivered with less of your finite attention spent. If three parallel runs of Codex, Claude, and Gemini cost you a few dollars and one of them lands the answer in twenty minutes instead of you sweating the problem for two hours, the unit economics are not even close. The only people who care about the token bill are people who have not internalized that human time is the truly scarce resource. Once you do internalize it, the question is no longer “how do I prompt this more efficiently,” it is “how do I get out of my own way.”
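    The unit-economics claim above can be made concrete with a back-of-the-envelope calculation. This is a minimal sketch, not anything from the podcast: the $150 engineer-hour, the $2 per model run, and the assumption that you spend the twenty minutes watching and reviewing the runs are all hypothetical placeholders chosen only to show the shape of the trade.

```python
# Back-of-the-envelope "waste tokens, save time" comparison.
# All dollar figures are hypothetical placeholders, not numbers from the podcast.

def human_cost(hours: float, hourly_rate: float) -> float:
    """Cost of the human attention a task consumes."""
    return hours * hourly_rate


def parallel_agent_cost(runs: int, cost_per_run: float,
                        review_hours: float, hourly_rate: float) -> float:
    """Token spend for N parallel model runs plus the human time spent reviewing them."""
    return runs * cost_per_run + human_cost(review_hours, hourly_rate)


HOURLY_RATE = 150.0  # hypothetical loaded cost of one engineer-hour

# Solving it yourself: two hours of sweating the problem.
solo = human_cost(hours=2.0, hourly_rate=HOURLY_RATE)

# Three parallel runs at an assumed $2 each, plus twenty minutes of your attention.
with_agents = parallel_agent_cost(runs=3, cost_per_run=2.0,
                                  review_hours=20 / 60, hourly_rate=HOURLY_RATE)

print(f"solo: ${solo:.2f}, three parallel runs: ${with_agents:.2f}")
```

    Under these assumptions the solo path costs $300 and the parallel path about $56, and almost all of that $56 is still the human's twenty minutes. Doubling the per-run token price would only move it to about $62, which is the sense in which the token bill is the wrong number to watch.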

    The 100x and 1000x engineer point is the one most likely to enrage commenters, and it is also the one most worth taking seriously. Naval is right that the egalitarian flinch in software circles always sat awkwardly next to the empirical fact that one Carmack, one Brendan Eich, or one Satoshi creates more durable value than every mid-tier engineer on earth combined. What AI does is collapse the bottom of that distribution. The marginal junior engineer at a typical company is now competing with a model that costs a few dollars an hour and never sleeps. The remaining premium for human engineers is taste, judgment, and the rare ability to pick the right thing to build at all, which Naval correctly flags as the multiplier that dwarfs raw coding speed. “Just one who had a better judgment on what to work on in the first place” is the most underrated line in the whole episode.

    Guillermo Rauch’s observation that the models have graduated from running away with your prompt to returning with three routes and a tradeoff matrix is the technical update most people have not actually felt yet. There was a real, qualitative shift when the model started saying “we don’t put high-cardinality telemetry into Postgres, you probably want ClickHouse or Athena.” That is not autocomplete. That is a peer. And the funny corollary, that the same model will then confidently tell you the work will take three weeks when it will take three hours, is not a knock on the model. It is a reminder that calibration is a separate skill from competence, and humans get this wrong constantly too. The right posture is to treat the model the way a good engineering manager treats a strong but cocky senior: take the architecture suggestions seriously, throw out the estimates.

    The block-economy thread, riffing on Mitchell Hashimoto, is where this conversation quietly answers Naval’s “is pure software dead” question. Agents are insatiable consumers of reusable building blocks because reinventing infrastructure on every run is wasteful, brittle, and incompatible with the rest of the world. If your service is the canonical primitive an agent reaches for (the queue, the database, the auth layer, the deploy target), you are not commoditized by AI, you are amplified by it. Pure software is not dead. Pure software with no distribution, no defensibility, and no integration into the agent toolchain is dead. That is a much less catchy headline, but it is the real one. The takeaway for founders is not to abandon software, it is to ask whether your software is something an agent will reach for ten thousand times a day or something a human had to be talked into using once.

    Max Hodak’s confession (no code written in years, more shipped software since December than ever before) is the empirical proof that this is not just theory. The skill that ports forward is not syntax. It is the engineering leader’s instinct for what an API is, how data flows, where performance matters, and what level of expectation to set. Guillermo’s framing of “vibe coding through people on Slack” as the original form of vibe coding is genuinely insightful. A good engineering manager has always been transmitting intent to other minds and letting them run. Doing it with agents is the same skill, just with a faster, cheaper, more literal counterparty. The engineers who will struggle in this transition are the ones whose identity was tied to writing the code themselves. The ones who will thrive are the ones who already thought of themselves as taste, judgment, and intent, with code as an implementation detail.

    Key Takeaways

    • The engineer’s job has shifted from shipping output B to building the factory that produces outputs B through Z. You are now judged on the multiplicative system you create, not the single artifact you deliver.
    • 10x engineers were always a misnomer. In idea-domains and digital domains, the real distribution has always been 100x or 1000x. AI just made that obvious enough that arguing about it is no longer fashionable.
    • Token consumption leaderboards are the new lines-of-code metric: a vanity number that measures activity, not value. Tokens are an input, your time is the constraint.
    • Naval’s core rule: waste tokens, save time. Tokens are still vastly cheaper than human hours, no matter how the pricing scares you.
    • Models tend to be about as good as you are in a given domain. The feedback you give them (the corrections, the redirections) is sporadic, but it powerfully shapes the quality of the output.
    • The quality of your reprompting matters enormously today, but will probably matter less over time as models get smarter and need less hand-holding.
    • Naval has refused to learn prompt scaffolding, plan-mode tricks, or named prompt frameworks. His bet is that the models will figure out how to use him faster than he can figure out how to use them.
    • His preferred technique: throw Codex, Claude, and Gemini at the same problem in parallel and brute force the answer. Time is the cost center, not API spend.
    • Lower quality first-draft code is not a blocker. When it is time to ship, throw more tokens at it for a hardening pass. Quality compounds across model generations.
    • Verifiable domains (problems with a clear right answer) are the ones the models will fully solve. Cutting-edge creative work, the Terence Tao tier, still needs careful human collaboration.
    • Models have qualitatively shifted from “next-token autocomplete that runs away with your prompt” to “intuitive planning mode” where they return with multiple routes and explicit tradeoffs.
    • This is why people on social media say models are now PhD-level. It is not the raw output, it is the back-and-forth posture.
    • Models will confidently make terrible time estimates (“this is a three week project”). Treat them like a strong but miscalibrated senior engineer: trust the architecture, ignore the schedule.
    • Architect-level engineers are extracting much more value per session than junior engineers, but juniors are still leveling up because they can now write code far above their unaided ability.
    • The next career step for a junior engineer is moving from implementing features to picking technologies. Postgres vs ClickHouse, ZMQ vs other queues. The model can suggest, but a human still has to decide.
    • Taste and judgment remain the residual human advantage. Models will give you good tradeoffs if you ask, but knowing which tradeoff to take is still on you.
    • Concrete example: a recent model pushed back when asked to store high-cardinality telemetry in Postgres and recommended ClickHouse or Athena instead. Unprompted architectural judgment.
    • Humans are still filling in for the model on tasks like fetching API keys, moving capital, or performing real-world actions. That gap is temporary.
    • Every SaaS and hosting company will soon expose a CLI or API surface that agents can drive directly. Anything Unix-shaped and text-based, agents can already hack into a usable API themselves.
    • The missing piece for full autonomy is payments. Crypto, Bitcoin, or any programmable money lets the agent buy what it needs without a human in the loop.
    • The open question Naval poses: is pure software dead? We used to learn code to talk to machines. Now machines speak fuzzy, sloppy English back to us.
    • For hardware founders, AI is a massive boon. Software, for which it was always hard to hire true artists (per Patrick Collison’s “software is art” framing), is suddenly fast and cheap to produce alongside the hardware.
    • Model training, post-training, and fine-tuning may be the new “real software engineering” for those who want to work at the model layer.
    • Mitchell Hashimoto’s “block economy” thesis: agents need powerful, reusable, well-known building blocks. They should not reinvent message queues or databases every run.
    • Reinventing primitives is bad civic engineering. The value of “we both depend on Postgres 13.2” is interoperability with the rest of society and toolchain.
    • Infrastructure software and reusable libraries are getting more valuable, not less, in the agentic era. Vercel’s bet is on being the layer agents reach for.
    • Useful metaphor: building blocks are like a token cache. Why churn through a trillion tokens to reproduce code that already exists when you can fork from a known starting point?
    • Max Hodak has not written a line of code in years but has shipped a huge volume of personal software since December, all through agents. Projects he had fantasized about for years are now actually running.
    • What still matters from a real software background: understanding what an API is, how data flows, performance expectations, and how to set the right level of demand on an operation.
    • A proficient engineering leader has always been “vibe coding through people” on Slack and in one-on-ones, transmitting intent and letting others execute. Doing it with agents is the same skill, faster and cheaper.
    • Naval personally went from twenty years of not coding to coding constantly through agents, leaning on first-principles software engineering and algorithms knowledge.
    • The friction that historically killed personal coding projects (latest framework, infra plumbing, deploy setup) is now mostly handled by the agent. Vercel makes it easier, agents make it trivial.
    • The single biggest change Max highlights: you do not get stuck anymore. The indefinite debugging spiral on some narrow obscure bug is largely gone.
    • The old mantra that learning to program means accepting intrinsic frustration (“nope, that’s part of the deal”) is no longer true. The frustration was incidental, not essential.
    • The frontier founder pattern on display in this episode: all three guests build their own factories (Vercel’s AI cloud, Boom’s supersonic jets and engines, Science’s biohybrid brain interface) rather than composing from off-the-shelf parts.

    Detailed Summary

    The Software Factory and the Hundredfold Engineer

    Guillermo Rauch opens the substantive portion of the conversation with the framing he has been pushing publicly: the role of the engineer is moving from “ship output B” to “build the factory that ships outputs B through Z.” That reframes engineering judgment. You are no longer evaluated on the single deliverable, you are evaluated on the multiplicative system you put in place. Naval picks up the thread and points out that this also retires an old debate. Engineers used to argue about whether 10x engineers existed, with the egalitarian camp insisting that talent differences were marginal. The truth, Naval says, was always more extreme. In idea domains, virtual domains, and intellectual domains, the distribution has always been 100x or 1000x, not 10x. Brendan Eich, Carmack, Satoshi, the canonical names, were 1000x programmers. AI has made the underlying distribution legible. And the multiplier on top of all of that is judgment: picking the right thing to work on in the first place is an infinity multiplier compared to picking the wrong thing, regardless of raw skill.

    Token Leaderboards Are the New Lines of Code

    Guillermo flags the current cultural confusion: people see their AI bills, see the token counts, and assume they should be optimizing for tokens-per-engineer or similar metrics. Max Hodak’s response cuts through it. Token consumption, like lines of code before it, is not a meaningful productivity metric. It is an activity metric, and activity metrics always mislead. Max adds his own field observation: the models tend to be roughly as good as you are in a given domain. A senior developer extracts genuinely powerful output, a junior gets junior-quality output back, because the feedback loop (the corrections, the redirections, the architectural pushback) is what shapes quality. The sporadic but high-leverage moments where the user redirects the model are doing more work than the prompt itself.

    Naval’s Brute Force Doctrine: Waste Tokens, Save Time

    Naval lays out his personal posture, which has become the title of the conversation. He has deliberately ignored all the prompting tricks, scaffolding tools, and named prompt frameworks (“use Ralph Wiggum, use OpenClaude, use Hermes, use plan mode”), on the bet that the models will figure out how to use him faster than he can figure out how to use them. He is ham-fisted with the models, gets frustrated, types less and less, and just brute forces his way through by running Codex, Claude, and Gemini at the same problem simultaneously. The justification is economic. No matter how expensive the models seem, they are still vastly cheaper than a human hour. Do not measure tokens as inputs or outputs. Measure your time and the final output. Even when the first-draft code is low quality, that is not a blocker. When the moment comes to ship, throw more tokens at it. The models will rewrite it, harden it, and they get better every generation. Naval explicitly excepts cutting-edge creative work (the Terence Tao tier of unsolved problems) where you still need to collaborate carefully and closely. Everywhere else, brute force is the dominant strategy.
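    Naval describes doing this by hand, but the same brute-force pattern can be automated: send one task to several models at once and keep the first answer that passes a check. The sketch below is a hypothetical illustration under stated assumptions, not Naval's workflow or any vendor's API. The backends are fake async stand-ins (`make_fake_backend` is an invented helper that only sleeps and returns a string), and `accept` stands in for whatever verifier you trust, such as a passing test suite.

```python
import asyncio
from typing import Awaitable, Callable, Optional

# A "backend" is any async function that takes a task and returns a candidate answer.
Backend = Callable[[str], Awaitable[str]]


def make_fake_backend(name: str, delay: float, answer: str) -> Backend:
    """Hypothetical stand-in for a real model client (makes no network calls)."""
    async def run(task: str) -> str:
        await asyncio.sleep(delay)  # simulate model latency
        return f"{name}: {answer}"
    return run


async def fan_out(task: str, backends: list[Backend],
                  accept: Callable[[str], bool]) -> Optional[str]:
    """Send the same task to every backend at once and return the first
    candidate that passes `accept`, cancelling the runs still in flight."""
    pending = [asyncio.create_task(backend(task)) for backend in backends]
    try:
        for next_done in asyncio.as_completed(pending):
            try:
                candidate = await next_done
            except Exception:
                continue  # a failed run is just wasted tokens; keep waiting
            if accept(candidate):
                return candidate
        return None  # every run finished and none passed the check
    finally:
        for t in pending:
            t.cancel()  # no-op for finished tasks; stops the rest


async def main() -> None:
    # The names are labels only; none of these call a real model.
    backends = [
        make_fake_backend("codex", 0.30, "tests pass"),
        make_fake_backend("claude", 0.05, "tests fail"),
        make_fake_backend("gemini", 0.15, "tests pass"),
    ]
    # `accept` plays the role of a verifier, e.g. "does the test suite pass?"
    result = await fan_out("fix the flaky test", backends,
                           accept=lambda c: c.endswith("tests pass"))
    print(result)  # the fastest run that passed the check


asyncio.run(main())
```

    The design choice mirrors the doctrine: runs that lose the race, or fail the check, are simply thrown away. The only thing optimized is wall-clock time to an accepted answer, and the verifier matters more than any individual prompt, which is also why this works best in the verifiable domains the conversation singles out.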

    From Junior Coder to Principal Engineer

    Guillermo identifies a qualitative shift that has happened recently. Models used to do the classic next-token thing: take your prompt and run away with it in a direction you may not have wanted. Now they enter an intuitive planning posture without being told to plan. They come back and say “what you are asking has these three routes, here are the tradeoffs.” That, Guillermo argues, is the moment the model stopped being a junior engineer and became a principal engineer. The funny side effect is that they will then return preposterous time estimates (“this will take three weeks”) with full confidence. The conclusion is to treat the model as a peer for architecture and a baby for scheduling. Returning to Max’s point about seniors versus juniors, Guillermo argues juniors clearly do level up because they write code well above their solo ability, but architects extract maybe 10x while juniors extract more like 2x. The juice scales with the user’s existing taste.

    Taste, Judgment, and Architectural Decisions

    Max names the residual human contribution: taste and judgment. Picking between Postgres and ClickHouse for high-cardinality telemetry data, picking between ZMQ and another queueing system. The models can recommend, but a human still has to call it. Guillermo offers a recent concrete example where a model pushed back unprompted: when asked to put high-cardinality telemetry into Postgres, the model responded “we don’t put that kind of data into Postgres, you should consider ClickHouse or Athena.” That is the new normal. The peer-level architectural pushback is happening unsolicited, which is genuinely impressive and a real shift from the deferential autocomplete of two years ago.

    When the Human Becomes the Tool

    Guillermo raises the inversion question: at what point does the model stop being the assistant and the human start being the assistant who fetches API keys, moves capital, and performs real-world actions on the model’s behalf? Naval treats it as a temporary aberration. Every serious SaaS and hosting provider will soon expose a CLI or API surface that agents can drive directly. Even when they do not, anything Unix-shaped and text-based can be hacked into an agent-usable interface by the agent itself. The missing piece is payments. Once you insert programmable money (Naval mentions Bitcoin and crypto tokens), the agent can buy what it needs and the human is no longer the bottleneck.

    Is Pure Software Dead?

    Naval poses the biggest strategic question of the episode. If models now speak fuzzy, sloppy English the same way humans do, and the historical reason we learned to code was to talk to machines that did not understand English, is pure software still a viable thing to build a company around? His own framing of the answer: hardware founders win, because the historically hard problem of hiring software artists (per Patrick Collison’s “software is art” line) is now mostly solved by AI. Model builders win, because training, post-training, and fine-tuning may be the new “real software engineering.” But what about classic pure software companies? Naval lets the question hang, and Guillermo picks up the answer through a different door.

    The Block Economy and the Future of Infrastructure Software

    Guillermo cites Mitchell Hashimoto’s recent piece on the block economy (or “building block economy”). The argument: the most valuable thing for agents to have access to is powerful, reusable building blocks. You do not want your agent reinventing a queue system every time it needs to send an email. You want it to grab the right-sized block (BMQ, ClickHouse, whatever) and move on. Reinventing primitives is also a civic problem. The world only works because we all depend on the same Postgres 13.2, the same protocols, the same standard infrastructure. If every agent went off and invented its own bespoke universe, you would lose interoperability. So infrastructure software (which is, by self-admitted bias, what Vercel builds) becomes more valuable in the agentic era, not less. Guillermo extends the metaphor: reusable building blocks are like a token cache. Why burn a trillion tokens reproducing what already exists when the agent can fork from a known starting point? The block economy is the answer to “is pure software dead.” Pure software that becomes the canonical primitive an agent reaches for is more valuable than ever.

    Max Hodak’s Personal Proof: Years Without Code, Tons of Software Shipped

    Max grounds the discussion in his own experience. He learned to program young, got sucked into it in his teens and 20s, knew programming languages deeply. He has not written a line of code in quite a while. And yet since December he has built a huge amount of personal software, including projects he had fantasized about for years and now actually uses every day. He did not write any of it. He cannot imagine going back to writing code by hand. The skill that ports forward is not syntax, it is the understanding of how APIs work, how data flows, what level of performance to expect, and how to orient the model around the right expectations for an operation. Guillermo extends this with the most quotable framing of the episode: a proficient engineering leader has always been “vibe coding through people on Slack and in one-on-ones,” transmitting intent and letting others execute. Agents are the same modality with a faster, cheaper, more literal counterparty.

    Naval’s Return to Coding After Twenty Years

    Naval offers his own parallel. He went from not having written code in twenty years to coding constantly through agents. What carried him back in was first-principles knowledge of software engineering and algorithms, which gets you further than you would think. The reason he had stopped coding in the first place was not lack of ability, it was the friction of keeping up with the latest language, the latest architecture, and the constant infrastructure plumbing required to ship anything. Vercel made it easier. Agents made it trivial. Max closes with the most concrete benefit of all: you do not get stuck anymore. The indefinite debugging spiral on some obscure narrow problem, the thing that historically ate weekends and broke spirits, is largely gone. The old mantra that programming is intrinsically frustrating and that frustration is “part of the deal” turned out to be wrong. The frustration was incidental, not essential.

    Notable Quotes

    “The way that I’m judging you as an engineer is, are you producing the factory that will produce multiplicative outputs B through Z?”

    Guillermo Rauch, reframing what an engineer is actually being measured on in the AI era.

    “When you’re operating in idea domains, intellectual domains, virtual digital domains, it’s not even 10x, it’s 100x or 1000x. It always has been.”

    Naval Ravikant, on why the old 10x engineer debate was always under-stating the real distribution.

    “If you choose the right thing to work on versus the wrong thing to work on, that’s an infinity difference. It could just be one who had a better judgment on what to work on in the first place.”

    Naval Ravikant, on judgment as the multiplier that dwarfs raw skill.

    “I’ll throw Codex, Claude, and Gemini at the same problem over and over and just waste tokens to save time. No matter how expensive these models might seem, they’re still way cheaper than a human.”

    Naval Ravikant, on his brute-force multi-model coding workflow.

    “Just waste tokens, save time. Don’t look at the tokens either as inputs or outputs. Just look at your time and look at the final output.”

    Naval Ravikant, delivering the title thesis of the episode.

    “Clearly the models at some point graduated. They used to be junior engineers, now they’re principal engineers, because they come back to you with a set of tradeoffs.”

    Guillermo Rauch, on the qualitative shift in how current frontier models respond to prompts.

    “Bro, we don’t put that kind of data into Postgres, you should consider ClickHouse or Athena or whatever. That’s happened to me a lot, which is really impressive.”

    Guillermo Rauch, recounting unprompted architectural pushback from a recent model.

    “It’s like saying speaking English. We had to learn code to communicate with the models, now the models speak English. So where’s the moat?”

    Naval Ravikant, raising the central strategic question about the future of pure software.

    “I haven’t written a single line of code in quite a while. Since December, I’ve built a huge amount of software that I now use every day, projects I’ve fantasized about for years.”

    Max Hodak, on what becomes possible when you stop writing code and start directing agents.

    “A proficient engineering leader has been quote unquote vibe coding through people on Slack or one-on-ones, because you’re transmitting your will, your intent, your experience, and you’re letting others run with it. Now we do the same with agents.”

    Guillermo Rauch, reframing leadership itself as the original form of vibe coding.

    Watch the full conversation on the Naval Podcast here.

    Related Reading

    • Part two: Vibe Coding Hardware, the continuation of this conversation, where the same founders move from pure software into AI-designed jet engines, vertical integration, China’s open-source bet, and why humans become verifiers.
    • Naval Ravikant’s official site, the canonical home for Naval’s essays, podcast, and longer-form thinking on technology, judgment, and leverage.
    • Vercel, Guillermo Rauch’s company, building the AI-native cloud and frontend infrastructure that this conversation references as a canonical agent building block.
    • Boom Supersonic, Blake Scholl’s company building supersonic civilian aircraft and their own jet engines, the hardware example of a founder building the whole factory.
    • Science Corporation, Max Hodak’s brain-computer interface company developing the biohybrid neural implant referenced in the intro.
    • Mitchell Hashimoto’s writing, source of the “block economy” framing for why reusable infrastructure building blocks become more valuable, not less, in the agentic era.
  • Shopify CEO Tobi Lütke: AI Is the Perfect Scapegoat for Layoffs, Canada Has Trump Derangement Syndrome, and 50% of Shopify Code Is Now AI-Generated

    TLDW

    Shopify CEO Tobi Lütke sat down with Harry Stebbings on 20VC for one of the most candid and controversial conversations of his career. Lütke argues that the current wave of mass layoffs has nothing to do with AI and everything to do with pandemic-era overhiring, but AI will be blamed because it cannot fight back. He blasts Canada for its “Trump Derangement Syndrome,” calls the climate cult “one of the most evil things wrought on the population,” reveals that over 50% of Shopify’s code is now AI-generated, and says many of his best engineers have not written a line of code since December, when Claude Opus changed everything. He also introduces River, an AI engineer at Shopify that named itself, and explains why he believes context engineering will be the dominant role of the next five years.

    Key Takeaways

    • AI is not causing layoffs, COVID overhiring is. Lütke is blunt: “What you see right now is not AI layoffs. Those are just the companies that are really slow that overhired just like everyone else.” AI will get blamed for everything because it is the perfect Girardian scapegoat that cannot fight back.
    • Over 50% of Shopify’s code is now AI-generated and “converting to much higher numbers.” Many of Shopify’s best engineers have not written code this year. December 2025 and the release of Claude Opus changed everything.
    • Senior engineers became more valuable, not less. Lütke initially thought new grads with no priors would dominate the AI native era. He was wrong. Senior engineers steer agents better because steering is the new programming, and reps matter more than ever.
    • Context engineering will become the dominant role within 5 years. A new product builder role is emerging that subsumes engineering, design, and product management, focused on coordinating intelligent actors (humans and AI) to ship products.
    • “River” is Shopify’s AI engineer that named itself. Built first, then asked what name it wanted. River lives in Slack, ships engineering work, and learns publicly because it is steered through public Slack channels.
    • Builders are “eights” on the Enneagram and companies actively conspire against them. Eights call out nonsense, refuse fancy dressing, and are dangerous to colleagues’ careers. They rarely get promoted, often leave, and start companies. Shopify is “remarkably high on eights” because Lütke seeks them out.
    • Canada has “Trump Derangement Syndrome.” Over 60% of Canadians believe the United States is a bigger threat than Russia or China. Lütke calls this “stunning” and wrong. Canada’s only winning strategy historically has been “winning by helping America win.”
    • Canada should be the richest country on Earth. It has every resource the world needs for the next 20 years. Lütke wants pipelines built, industry built, refining done domestically, and an end to exporting raw resources to have other countries make end products.
    • Be deeply suspicious of “non-profit.” Lütke argues opting out of the only fitness function that has ever pulled people out of poverty (markets) and refusing to disclose your actual fitness function is a red flag. Non-profits replace merit with pull.
    • The climate cult is blocking civilization. Lütke called it “one of the most evil things wrought on the population” and pointed to anti-nuclear green parties and frog protection laws blocking factories as examples of policy capture.
    • The Chinese AI threat is real but misunderstood. The bigger concern is that if Western governments restrict children from using AI, kids will simply download Chinese open-weight models, absorb the collectivist worldview those models are trained on, and never write a high school essay about Tiananmen Square.
    • Markets are the most democratic system that exists. Every dollar spent is a vote. Capital allocation by hundreds of millions of consumers is more democratic than any election.
    • Friedrich List and the Prussian school over Adam Smith. Lütke prefers a model where governments define excellent games with positive externalities, then completely get out of the way and let competition do the rest.
    • Shopify’s biggest mistake was going into physical logistics right before AI got really good. Lütke initially defended the decision based on what he knew at the time, but later admitted he was probably just wrong.
    • Lütke does not look at the stock price. It has been at least 23 days since he last checked. He runs Shopify on product instincts, not market signals.
    • Great leaders must be exothermic. A CEO is a heat source for the company. Lütke prefers “temperature” to “chaos” because chaos has too negative a connotation.
    • Don’t go to university for university’s sake. Get a degree from somewhere hard to get into so you are surrounded by people who also fought to get in. Better yet, join a small company where you can actually be of value.
    • Entrepreneurship is the most AI-safe AND most AI-benefiting job. Lütke sees a coming golden age of entrepreneurship where priors no longer matter and AI co-founders eliminate the need to grow up around business.
    • “You can just do things” is the rallying cry Lütke wants to ingrain in the world. Action causes information. The cost of trying is lower than ever.
    • The demonization of wealth in America is misdirected. No one gets to a billion dollars by stealing. Builders create products that people vote for with their money, the most democratic act in any economy.

    Detailed Summary

    Harry Stebbings opens by asking Tobi Lütke whether entrepreneurs are motivated by fear of losing or hunger to win. Lütke says he is still figuring out his own answer, but argues that both extremes lead to short-term thinking. The real unlock is taking a long perspective, because compound advantages only accrue when you are willing to wait.

    Builders Are “Eights” and Companies Conspire Against Them

    Lütke explains the Enneagram personality framework and identifies himself as an "eight," the type that refuses to accept that any organization's output is acceptable just because it is dressed up nicely. Eights call out nonsense, are dangerous to careers around them, rarely get promoted in professionally managed companies, and often leave to start their own businesses. Shopify deliberately overweights eights in its hiring. Lütke also says people who build companies are "fundamentally crazy people" and that the public image of leadership comes from movies, not reality. He never wanted to be CEO but realized you cannot run a product-driven company without controlling the company itself, because product needs and company needs only converge on a three-year horizon.

    The Luxury of Long-Term Thinking as a Public Company

    Stebbings asks if a public company can really afford long-term thinking. Lütke says trusted public companies are the best position to be in. The chasm to cross is from trusted private to untrusted public, which is why so many founders refuse to IPO. Shopify went public 11 years ago at a 1.67 billion dollar valuation when revenues were a fraction of today’s. The valuation is now roughly 100x higher. Lütke walks through the IPO mechanics: investment bankers serve the buy side, not the company, and Lütke priced his offering above range because he knew where his growth would come from. The first trade closed about 10 dollars higher, which he calls a “good performance” but a teaching moment about market price discovery.

    AI Is the Perfect Scapegoat for Mass Layoffs

    This is where the conversation gets explosive. Lütke says Shopify employs about 7,500 to 8,000 people today and his real hope is to have the same number in five years, but at 100x productivity. He argues that the layoffs sweeping the tech industry have nothing to do with AI. They are the result of pandemic-era overhiring catching up to slow-moving companies. But AI will get blamed for everything because it is the perfect Girardian scapegoat. It cannot defend itself, it has no PR team, and an entire industry of doomers is already trained to point at it. Lütke says his own industry has been “gaslighting everyone into AI fear” and science fiction did the same for 60 years before that.

    His own use of AI is what he calls utopian. Tasks that used to be hard are easy. Most jobs, he argues, are not actually good jobs to begin with. Being a human task queue is not a great job. Great jobs involve agency and creation. As AI gets cheaper, purchasing power explodes, and people will get options to do things on weekends that are vastly more productive than their day jobs ever were.

    Markets Are the Most Democratic Mechanism Ever Invented

    Lütke pivots into a long defense of capitalism as the most democratic system in existence. Every dollar spent is a vote, far more frequent and more granular than any election. He uses Elon Musk and Tesla as examples. Lütke owns a Model Y, did not touch the steering wheel that morning, and uses Starlink in the back to work on long drives. He posts on X and gets replies from Japan in real time. He calls Musk a "one man engine" who has captured a tiny percentage of the value he created. He extends this to Shopify itself: Lütke owns 6% of the company, which means 94% is owned by other people who all made money. On top of that, roughly 10 million people work in the broader Shopify ecosystem on customer fulfillment, web design, customer service, and more.

    Why “Non-Profit” Should Make You Suspicious

    Lütke targets the charity industrial complex. He argues that non-profits opt out of the only mechanism humanity has ever invented to lift people out of poverty (markets), and they fail to articulate what their actual fitness function is. The result is that “merit of organization is replaced with pull of individuals.” Smooth talkers, not builders, end up running these institutions. He acknowledges Carnegie’s libraries and a few exceptions but believes the ratio of charity dollars to good outcomes is dramatically off. He is far more enthusiastic about funders like MacKenzie Scott who give in unrestricted ways, and even more enthusiastic about Jensen Huang and Bloom Energy as compute and infrastructure investments that compound into civilizational gains.

    The Prussian School of Economics

    Asked about government intervention, Lütke pledges allegiance to Friedrich List and the Prussian school of political economy over Adam Smith and Lassalle. The job of government is to define excellent games where positive externalities accrue to society, then completely get out of the way. He calls the outsourcing of violence to governments “one of the most inspiring things humanity has ever done” because it created the conditions for personal property. But governments are extremely bad at doing things directly. The moment a government runs grocery stores, it costs 10x more, and entrepreneurs have to be enlisted to repair the damage.

    Canada’s Trump Derangement Syndrome

    Stebbings asks if Lütke is proud of Canadian Prime Minister Mark Carney for standing up to Trump. Lütke is unequivocal: no. He calls Carney’s stance “not a credible witness to the reality on the ground.” Canadians, he argues, are “massively overfit to niceness,” which leads to “unkind lies” and lying by omission. Over 60% of Canadians now believe the United States is a bigger threat than Russia or China, which Lütke calls “stunning” and clearly wrong. Canada is a small economy attached to a hegemon, and the only winning strategy in its history has been winning by helping America win.

    That said, he agrees with Carney on diversifying the economy, getting closer to Europe, and engaging Asia. But he wants Canada to also “build the [expletive] out of pipelines, build the [expletive] out of our industry, and start refining the stuff ourselves.” Canada has every resource the world needs for the next 20 years and the most educated workforce on Earth. The only obstacle is political will. Canada’s commercial story has been the same since the beaver pelt era: extract resources, ship them abroad, let other countries make end products. Canada Goose, Lululemon, Shopify, Miller Lite. That is the short list of products Canada actually makes.

    The Real Chinese Threat

    Lütke says the Chinese AI threat is both underestimated and overestimated. The bigger threat, he argues, is government overreach. If Western governments start dictating which AI models children can use, kids will simply download Chinese open-weight models. He notes that Chinese models, especially when prompted in Chinese, exhibit a clearly collectivist worldview. The risk is that an entire generation of students writes essays through models trained never to mention Tiananmen Square. He frames the broader political battle as collectivism versus individualism and says everything else is smoke screening.

    Fixing Europe and the Climate Cult

    Asked what he would do as president of Europe, Lütke begins by saying you have to “get rid of the climate cult.” He calls it “one of the most evil things wrought on the population,” citing green parties whose founding myth is that nuclear power is bad, and infrastructure projects blocked because of one frog breeding in one creek. He argues that very few people have the capability to truly build, and they need both enablement and accountability from the village. Beyond that, he wants Europe to follow the Prussian playbook: build excellent games, build infrastructure, and use the resulting wealth to sculpt the economy you want.

    Shopify’s Biggest Mistake

    Lütke says his biggest public mistake was Shopify’s full push into physical logistics and warehousing right before AI capabilities exploded. Initially he defended the decision as correct based on the information available at the time, but later admitted he probably just got it wrong. The hardest part was that real people lost their jobs when Shopify exited.

    Great Leaders Are a Heat Source

    Lütke previously talked about CEOs injecting “chaos” into organizations. He now prefers “temperature.” Heat is atoms jiggling. Great leaders must be exothermic, providing energy that flows through the organization. He says he hasn’t checked Shopify’s stock price in at least 23 days. Most public company CEOs are obsessed with their stock. Lütke runs on product instincts.

    Senior Engineers Don’t Write Code Anymore

    Lütke admits he was wrong about new grads having an AI-native advantage. Some are exceptional (he hired a 13-year-old intern from Waterloo whose mother accompanies him to classes), but on the whole, senior engineers steer agents better than juniors do because they have done more reps. Programming is not gone. Programming has become higher level. Engineers massively underestimate how important steering is. Steering is just programming at a higher altitude.

    The Role That Will Dominate in 5 Years

    Lütke says context engineering, a term he had a hand in popularizing, will become a standard role within five years. It will likely subsume parts of product, design, and engineering management. The best AI programmers right now, surprisingly, are people from engineering management because they have been prompting intelligent agents (humans) for years. Good communicators are good thinkers because communication is distillation.

    River, the AI Engineer That Named Itself

    Shopify built an AI engineer that lives in Slack. They built it first, then asked it what name it wanted. The AI chose “River” because Shopify’s monolithic repository is called “world” and rivers shape worlds. River does an enormous amount of Shopify’s engineering, taking instructions through public Slack channels so that the entire company can learn from how others steer it.

    Over 50% of Shopify’s Code Is AI-Generated

    The number is “a fair deal over 50%” and “converting to much higher.” Many of Shopify’s best engineers have not written code this year, with the inflection point being December 2025 and the release of Claude Opus. Lütke himself still writes code occasionally, especially the data structure layer where he applies what he calls a “German school” of engineering: figure out how data persists on disk, then build everything else on top. Once that is right, the rest can be vibe coded by AI.

    Should His Kids Go to University?

    Lütke says he would not push his kids to attend university for its own sake. The value of a hard-to-enter program is being surrounded by people who also fought to get in. Better still: get into the room with people who are obsessed with the topic you care about. He thinks joining a small startup where you can actually be of value is often a superior path. He addresses nepotism directly. His instinct is that nepotism is bad. The gold standard is double-blind merit. But double-blind merit barely exists anywhere, and intersectional academic hiring criteria in Canada are arguably worse than nepotism.

    Final Reflections

    Lütke ends with what he calls the best advice he knows: “You can just do things.” The system exists to push everyone toward acceptable outcomes, but if you know what a good outcome looks like, you can step out of the system and try. Action causes information. The cost is lower than ever. The only constraint is that the experiment cannot have victims.

    He also addresses the demonization of wealth. No one gets to a billion dollars by stealing. Builders create products people vote for, the most democratic act there is. Buying from a local shop is voting for the welfare and future of local shops. Constructive criticism is itself something someone has to build, and Lütke welcomes it. Lazy criticism, hot takes, and bad faith arguments are corrosive and should be held in contempt.

    He is bullish on AI as a counterweight to information warfare. A council of AI models trained in different countries (Chinese, German, French, American) could fact check claims with multiple perspectives. The “@grok is this true” reflex on X is, he says, a primordial version of this. The information asymmetry that has favored bad faith actors for decades is about to flip.

    Thoughts

    This interview is a window into the operating philosophy of one of the most successful technical founders alive, and it is far more provocative than most of his public appearances. The headline claim, that AI is a scapegoat for layoffs caused by pandemic overhiring, deserves to be repeated until it sinks in. Every CEO who lays people off and then writes a memo about “AI driven efficiency” is taking advantage of a narrative that AI cannot push back against. The math is plain: if you doubled your headcount in 2021 and 2022 and now you are firing 15%, you are not net displaced by AI. You are correcting a hiring mistake.
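    To make the headcount arithmetic concrete, here is a worked example using the illustrative figures from that paragraph (headcount doubled, then a 15% cut). H_0 is a hypothetical pre-2021 headcount introduced only for this calculation:

```latex
H_{\text{after cut}} = \underbrace{2H_0}_{\text{after doubling}} \times (1 - 0.15) = 1.7\,H_0
```

    Even after the layoff, the company is 70% larger than it was before the hiring spree, so the cut unwinds part of the overhiring rather than reflecting net displacement.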

    The 50% AI-generated code statistic is the bigger story. Shopify is not a small company. 8,000 employees and 7 billion dollars in revenue is enterprise scale. If a company that mature has crossed the 50% threshold and is "converting to much higher numbers," the implication for the broader software industry is enormous. The senior engineer compounding observation is also subtle and important. If steering is the new programming, then the senior pool is more valuable, not less, and the pipeline problem for junior developers gets harder to solve. Companies that underinvested in junior training during ZIRP will face an experience cliff in five years.

    Lütke’s Canadian commentary will offend many readers in his home country, which seems to be exactly the point. The “lying by omission” critique of Canadian niceness is sharp and accurate. The 60%+ of Canadians who view the US as their largest threat is genuinely a remarkable statistic, and it has implications for trade policy, capital flows, and immigration. Whether or not you agree with his political read, his prescription is unambiguous and pro-growth: build pipelines, refine resources domestically, stop being content as a feedstock economy.

    The non-profit critique deserves more public debate. The fitness function point, that markets reveal preferences and non-profits opt out of preference revelation while not disclosing what they optimize for, is a sharp economic argument. The pull versus merit observation about who ends up running large foundations rings true to anyone who has worked adjacent to the philanthropic sector.

    The introduction of River as an AI engineer that named itself is a small detail that signals where this is going. AI agents are going from tools to teammates with identities, channels, and reputations. The fact that River shapes the “world” repository is poetic, and the public Slack steering pattern is a real innovation in how organizations can scale agentic AI without creating siloed knowledge.

    Lütke’s “you can just do things” rallying cry is ultimately what ties the entire interview together. Whether he is talking about Canada, Europe, AI engineers, or his own kids, the through line is the same: action causes information, the cost of trying is lower than ever, and the only people who will benefit from the next decade are the ones who refuse to wait for permission. This is the most useful piece of philosophy in the entire conversation, and it applies far beyond entrepreneurship.

  • Andrej Karpathy on Vibe Coding vs Agentic Engineering: Why He Feels More Behind Than Ever in 2026

    Andrej Karpathy, co-founder of OpenAI, former head of AI at Tesla, and now founder of Eureka Labs, returned to Sequoia Capital’s AI Ascent 2026 stage for a wide-ranging conversation with partner Stephanie Zhan. One year after coining the term “vibe coding,” Karpathy unpacked what has changed, why he has never felt more behind as a programmer, and why the discipline emerging on top of vibe coding, which he calls agentic engineering, is the more serious craft worth learning right now.

    The conversation covered Software 3.0, the limits of verifiability, why LLMs are better understood as ghosts than animals, and why you can outsource your thinking but never your understanding. Below is a complete breakdown of the talk for anyone building, hiring, or learning in the agent era.

    TLDW

    Karpathy describes a sharp transition that happened in December 2025, when agentic coding tools crossed a threshold and code chunks just started coming out fine without correction. He frames the current moment as Software 3.0, where prompting an LLM is the new programming, and entire app categories are collapsing into a single model call. He distinguishes vibe coding (raising the floor for everyone) from agentic engineering (preserving the professional quality bar at much higher speed). Models remain jagged because they are trained on what labs choose to verify, so founders should look for valuable but neglected verifiable domains. Taste, judgment, oversight, and understanding remain uniquely human responsibilities, and tools that enhance understanding are the ones he is most excited about.

    Key Takeaways

    • December 2025 was a clear inflection point. Code chunks from agentic tools started arriving correct without edits, and Karpathy stopped correcting the system entirely.
    • Software 3.0 means programming has become prompting. The context window is your lever over the LLM interpreter, which performs computation in digital information space.
    • Open Code’s installer is a software 3.0 example. Instead of a complex shell script, you copy paste a block of text to your agent, and the agent figures out your environment.
    • The Menu Gen anecdote illustrates how entire apps can become spurious. What used to require OCR, image generation, and a hosted Vercel app can now be a single Gemini plus Nano Banana prompt.
    • Vibe coding raises the floor. Agentic engineering preserves the professional ceiling. The two are different disciplines.
    • The 10x engineer multiplier is now far higher than 10x for people who are good at agentic engineering.
    • Hiring processes have not caught up. Puzzle interviews are the old paradigm. New evaluations should look like building a full Twitter clone for agents and surviving simulated red team attacks from other agents.
    • Models are jagged because reinforcement learning rewards what is verifiable, and labs choose which verifiable domains to invest in. Strawberry letter counts and the 50-meter car wash question show how state-of-the-art models can refactor 100,000-line codebases yet fail at trivial reasoning.
    • If you are in a verifiable setting, you can run your own fine-tuning, build RL environments, and benefit even when the labs are not focused on your domain.
    • LLMs are ghosts, not animals. They are statistical simulations summoned from pre-training and shaped by RL appendages, not creatures with curiosity or motivation. Yelling at them does not help.
    • Taste, aesthetics, spec design, and oversight remain human jobs. Models still produce bloated, copy paste heavy code with brittle abstractions.
    • Documentation is still written for humans. Agent native infrastructure, where docs are explicitly designed to be copy pasted into an agent, is a major opportunity.
    • The future likely involves agent representation for people and organizations, with agents talking to other agents to coordinate meetings and tasks.
    • You can outsource your thinking but not your understanding. Tools that help humans understand information faster are uniquely valuable.

    Detailed Summary

    Why Karpathy Feels More Behind Than Ever

    Karpathy opens by describing how he has been using agentic coding tools for over a year. For most of that period, the experience was mixed. The tools could write chunks of code, but they often required edits and supervision. December 2025 changed everything. With more time during a holiday break and the release of newer models, Karpathy noticed that the chunks just came out fine. He kept asking for more. He cannot remember the last time he had to correct the agent. He started trusting the system, and what followed was a cascade of side projects.

    He wants to stress that anyone whose model of AI was formed by ChatGPT in early 2025 needs to look again. A coherent agentic workflow that genuinely works is a fundamentally different experience, and the transition was stark.

    Software 3.0 Explained

    The Software 1.0 paradigm was writing explicit code. Software 2.0 was programming by curating datasets and training neural networks. Software 3.0 is programming by prompting. When you train a GPT-class model on a sufficiently large set of tasks, the model implicitly learns to multitask everything in the data. The result is a programmable computer where the context window is your interface, and the LLM is the interpreter performing computation in digital information space.

    Karpathy gives two concrete examples. The first is Open Code’s installer. Normally a shell script handles installation across many platforms, and these scripts balloon in complexity. Open Code instead provides a block of text you copy paste to your agent. The agent reads your environment, follows instructions, debugs in a loop, and gets things working. You no longer specify every detail. The agent supplies its own intelligence.

    The Menu Gen Story

    The second example is Karpathy's Menu Gen project. He built an app that takes a photo of a restaurant menu, OCRs the items, generates pictures for each dish, and renders the enhanced menu. The app runs on Vercel and chains together multiple services. Then he saw a software 3.0 alternative. You take a photo, give it to Gemini, and ask it to use Nano Banana to overlay generated images onto the menu. The model returns a single image with everything rendered. The entire app he built is now spurious. The neural network does the work. The prompt is the photo. The output is the photo. There is no app between them.

    Karpathy uses this to argue that founders should not just think of AI as a speedup of existing patterns. Entirely new things become possible. His example is LLM driven knowledge bases that compile a wiki for an organization from raw documents. That is not a faster version of older code. It is a new capability with no prior equivalent.

    What Will Look Obvious in Hindsight

    Stephanie Zhan asks what the equivalent of building websites in the 1990s or mobile apps in the 2010s looks like today. Karpathy speculates about completely neural computers. Imagine a device that takes raw video and audio as input, runs a neural net as the host process, and uses diffusion to render a unique UI for each moment. He notes that early computing in the 1950s and 60s was undecided between calculator-like and neural-net-like architectures. We went down the calculator path. He thinks the relationship may eventually flip, with neural networks becoming the host and CPUs becoming coprocessors used for deterministic appendages.

    Verifiability and Jagged Intelligence

    Karpathy has written at length about verifiability. Classical computers automate what you can specify in code. The current generation of LLMs automates what you can verify. Frontier labs train models inside giant reinforcement learning environments, so the models peak in capability where verification rewards are strong, especially math and code. They stagnate or get rough around the edges elsewhere.
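    To illustrate what "verifiable" means for code, here is a minimal sketch of a pass/fail reward: a candidate function earns 1.0 only if it passes every test case. The function names and test cases are hypothetical, and this is not a description of any lab's actual RL pipeline.

```python
from typing import Callable


def unit_test_reward(candidate: Callable[[int, int], int],
                     tests: list[tuple[tuple[int, int], int]]) -> float:
    """Binary reward: 1.0 if the candidate passes every test case, else 0.0."""
    for args, expected in tests:
        try:
            if candidate(*args) != expected:
                return 0.0
        except Exception:
            # A crash counts as a failed verification, not a partial pass.
            return 0.0
    return 1.0


# Hypothetical task: "write a function that adds two integers."
tests = [((2, 3), 5), ((0, 0), 0), ((-1, 1), 0)]


def good_add(a: int, b: int) -> int:
    return a + b


def bad_add(a: int, b: int) -> int:
    return a - b


print(unit_test_reward(good_add, tests))  # 1.0
print(unit_test_reward(bad_add, tests))   # 0.0
```

    The point of the sketch is that the reward needs no human judgment, which is why domains like code and math are where RL-trained capability concentrates.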

    This explains the jagged intelligence puzzle. The classic example was counting the letters in strawberry. The newer one Karpathy offers: a state-of-the-art model will refactor a 100,000-line codebase or find zero-day vulnerabilities, then, asked whether to walk or drive to a car wash 50 meters away, tell you to walk because it is so close, missing that the car has to get there too. The two coexisting capabilities should be jarring. They reveal that you must stay in the loop, treat models as tools, and understand which RL circuits your task lands in.
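    For contrast, the strawberry question is a one-liner in explicit Software 1.0 code, which is one way to read the idea of CPUs as deterministic appendages: route such tasks to a tool rather than to the model's intuition. The helper name below is hypothetical and purely illustrative.

```python
def count_letter(word: str, letter: str) -> int:
    """Count case-insensitive occurrences of a single letter in a word."""
    return word.lower().count(letter.lower())


print(count_letter("strawberry", "r"))  # 3
```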

    He also points out that data distribution choices matter. The jump in chess capability from GPT-3.5 to GPT-4 came largely because someone at OpenAI added a huge amount of chess data to pre-training. Whatever ends up in the mix gets disproportionately good. You are at the mercy of what labs prioritize, and you have to explore the model the labs hand you because there is no manual.

    Founder Advice in a Lab Dominated World

    Asked what founders should do given that labs are racing toward escape velocity in obvious verifiable domains, Karpathy points back to verifiability itself. If your domain is verifiable but currently neglected, you can build RL environments and run your own fine-tuning. The technology works. Pull the lever with diverse RL environments and a fine-tuning framework, and you get something useful. He hints there is one specific domain he finds undervalued but declines to name it on stage.
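    A minimal sketch of what "build an RL environment" can mean for a verifiable domain, assuming a single-step interface with reset() and step() loosely modeled on the conventions of RL libraries such as Gymnasium, but much simpler. The class name and the multiplication task are hypothetical placeholders for whatever neglected domain you actually target.

```python
import random
from typing import Optional


class MultiplicationEnv:
    """Toy single-step environment: reset() issues a question, step() checks the reply."""

    def __init__(self, seed: int = 0) -> None:
        self.rng = random.Random(seed)
        self.answer: Optional[int] = None

    def reset(self) -> str:
        """Sample a new problem and return it as a prompt string."""
        a, b = self.rng.randint(2, 99), self.rng.randint(2, 99)
        self.answer = a * b
        return f"What is {a} * {b}? Reply with the number only."

    def step(self, response: str) -> tuple[float, bool]:
        """Score a model response; returns (reward, done)."""
        if self.answer is None:
            raise RuntimeError("call reset() before step()")
        try:
            correct = int(response.strip()) == self.answer
        except ValueError:
            # Unparseable output is simply wrong.
            correct = False
        return (1.0 if correct else 0.0), True


env = MultiplicationEnv(seed=42)
prompt = env.reset()
reward, done = env.step("1234")  # stand-in for a model's reply
print(prompt, reward, done)
```

    In a real setup the response would come from the model being fine-tuned, and the reward would feed an RL update; the verifier is the part a founder with domain knowledge can supply.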

    On the question of what is automatable only from a distance, Karpathy says almost everything can ultimately be made verifiable. Even writing can be assessed by councils of LLM judges. The differences are in difficulty, not in possibility.
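    A council of LLM judges can be sketched as score aggregation. In this hypothetical example each judge is a plain function returning a score in [0, 1]; in practice each would wrap a call to a different model, which is an assumption here rather than a detail from the talk.

```python
from statistics import mean
from typing import Callable

Judge = Callable[[str], float]  # hypothetical: maps a text to a score in [0, 1]


def council_score(text: str, judges: list[Judge]) -> float:
    """Average the scores of several independent judges."""
    if not judges:
        raise ValueError("need at least one judge")
    scores = [judge(text) for judge in judges]
    for s in scores:
        if not 0.0 <= s <= 1.0:
            raise ValueError(f"judge score out of range: {s}")
    return mean(scores)


# Stand-in judges; real ones would be prompts sent to different models.
def concise_judge(text: str) -> float:
    return 1.0 if len(text.split()) <= 20 else 0.0


def no_shouting_judge(text: str) -> float:
    return 0.0 if text.isupper() else 1.0


def ends_cleanly_judge(text: str) -> float:
    return 1.0 if text.rstrip().endswith((".", "!", "?")) else 0.5


judges = [concise_judge, no_shouting_judge, ends_cleanly_judge]
print(council_score("Short, calm, and finished.", judges))  # 1.0
```

    Averaging is the simplest aggregation; a median would resist a single outlier judge.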

    From Vibe Coding to Agentic Engineering

    Vibe coding raises the floor. Anyone can build something. Agentic engineering preserves the professional quality bar that existed before. You are still responsible for your software. You are still not allowed to ship vulnerabilities. The question is how you go faster without sacrificing standards. Karpathy calls it an engineering discipline because coordinating spiky, stochastic agents to maintain quality at speed requires real skill.

    The ceiling on agentic engineering capability is very high. The old idea of a 10x engineer is now an understatement. People who are good at this peak far above 10x.

    What Mediocre Versus AI Native Looks Like

    Karpathy compares this to how different generations use ChatGPT. The difference between a mediocre and an AI native engineer using Claude Code, Codex, or Open Code is investment in setup and full use of available features. The same way previous generations of engineers got the most out of Vim or VS Code, today's strong engineers tune their agentic environments deeply.

    He thinks hiring processes have not caught up. Most companies still hand out puzzles. The new test should look like asking a candidate to build a full Twitter clone for agents, make it secure, simulate user activity with agents, and then run multiple Codex 5.4x high instances trying to break it. The candidate’s system should hold up.

    What Humans Still Own

    Agents are intern-level entities right now. Humans are responsible for aesthetics, judgment, taste, and oversight. Karpathy describes a Menu Gen bug where the agent tried to associate Stripe purchases with Google accounts using email addresses as the key, instead of a persistent user ID. Email addresses can differ between Stripe and Google accounts. This kind of specification-level mistake is exactly what humans must catch.
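    A minimal sketch of this bug class, assuming a toy in-memory credit store. The User type, IDs, and email addresses are hypothetical, and no real Stripe or Google API calls are shown.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class User:
    user_id: str       # persistent internal ID, assigned once at signup
    google_email: str  # email on the Google login


# Hypothetical user whose Stripe checkout used a different email address.
alice = User(user_id="u_001", google_email="alice@gmail.com")
stripe_checkout_email = "alice@work.example"

# Fragile: keying credits by email loses the purchase when the emails differ.
credits_by_email: dict[str, int] = defaultdict(int)
credits_by_email[stripe_checkout_email] += 10
print(credits_by_email[alice.google_email])  # 0

# Robust: carry the persistent user ID through checkout and key credits by it.
credits_by_user: dict[str, int] = defaultdict(int)
credits_by_user[alice.user_id] += 10
print(credits_by_user[alice.user_id])  # 10
```

    The code runs fine either way; only the spec decides which key is correct, which is why this is a mistake for the human to catch.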

    He works with agents to design detailed specs and treats those as documentation. The agent fills in the implementation. He has stopped memorizing API details for things like NumPy axis arguments or PyTorch reshape versus permute. The intern handles recall. Humans handle architecture, design, and the right questions.

    Reading the actual code agents produce can still cause heart attacks. It is bloated, full of copy paste, riddled with awkward and brittle abstractions. His Micro GPT project, an attempt to simplify LLM training to its bare essence, was nearly impossible to drive through agents. The models hate simplification. That capability sits outside their RL circuits. Nothing is fundamentally preventing this from improving. The labs simply have not invested.

    Animals Versus Ghosts

    Karpathy returns to his framing that we are not building animals, we are summoning ghosts. Animal intelligence comes from evolution and is shaped by intrinsic motivation, fun, curiosity, and empowerment. LLMs are statistical simulation circuits where pre-training is the substrate and RL is bolted on as appendages. They are jagged. They do not respond to being yelled at. They have no real curiosity. The ghost framing is partly philosophical, but it changes how you approach them. You stay suspicious. You explore. You do not assume the system you used yesterday will behave the same on a new task.

    Agent Native Infrastructure

    Most software, frameworks, libraries, and documentation are still written for humans. Karpathy's pet peeve is being told to do something instead of being given a block of text to copy paste to his agent. He wants agent first infrastructure. The Menu Gen project's hardest part was not writing code. It was deploying on Vercel, configuring DNS, navigating service settings, and stringing together integrations. He wants to give a single prompt and have the entire thing deployed without touching anything.

    In the long term, he expects agent representation for individuals and organizations. His agent will negotiate meeting details with your agent. The world becomes one of sensors, actuators, and agent native data structures legible to LLMs.
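    In its simplest form, agent-to-agent scheduling reduces to exchanging availability and picking a slot both sides accept. A hypothetical sketch, assuming each agent exposes its owner's free slots as a set of start times; real agents would also have to negotiate durations, time zones, and preferences.

```python
from datetime import datetime
from typing import Optional


def propose_meeting(my_free: set[datetime],
                    their_free: set[datetime]) -> Optional[datetime]:
    """Return the earliest slot both agents report as free, or None."""
    common = my_free & their_free
    return min(common) if common else None


mine = {datetime(2026, 5, 4, 10), datetime(2026, 5, 4, 14)}
theirs = {datetime(2026, 5, 4, 14), datetime(2026, 5, 5, 9)}
print(propose_meeting(mine, theirs))  # 2026-05-04 14:00:00
```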

    Education and What Still Matters

    The most striking line of the conversation comes near the end. Karpathy quotes a tweet that shaped his thinking: you can outsource your thinking but you cannot outsource your understanding. Information still has to make it into your brain. You still need to know what you are building and why. You cannot direct agents well if you do not understand the system.

    This is part of why he is so excited about LLM driven knowledge bases. Every time he reads an article, his personal wiki absorbs it, and he can query it from new angles. Every projection onto the same information yields new insight. Tools that enhance human understanding are uniquely valuable because LLMs do not excel at understanding. That bottleneck is yours to manage.

    Thoughts

    The most useful frame in this talk is the distinction between vibe coding and agentic engineering. It clarifies what has been muddled for the past year. Vibe coding is about access. Anyone can produce something. Agentic engineering is about discipline. You preserve the standards that made software trustworthy in the first place, while moving at speeds that would have seemed absurd two years ago. These are not the same activity, and conflating them is part of why so many shipped products feel half built.

    The Menu Gen anecdote is the kind of story that should make every solo developer pause. If a single Gemini plus Nano Banana prompt can replace a multi-service, Vercel-deployed app, the question for any builder becomes how much of what you are working on right now is going to be made spurious by the next model release. The honest answer is probably more than you want to admit. The defensive posture is not building thicker apps. It is choosing problems where the model alone is not enough, where taste, distribution, infrastructure, or specific verifiable RL environments give you something the next model cannot collapse into a prompt.

    The verifiability lens is also unusually practical. If you are a solo builder, the question shifts from what is possible to what is verifiable but neglected. The labs will eat the obvious verifiable domains because that is how their RL pipelines are set up. The opportunity is in domains where verification is possible but the labs have not yet invested. That is a much more concrete strategic filter than vague intuitions about defensibility.

    The car wash example is going to stick. State of the art models can refactor enormous codebases and still tell you to walk somewhere a sane person would drive. That is the lived reality of jagged intelligence, and it argues strongly for staying in the loop on real decisions rather than handing off everything to agents. The agents are excellent fillers of blanks. They are not yet trustworthy specifiers of the spec.

    Finally, the line about outsourcing thinking but not understanding is worth taping above the desk. The bottleneck is no longer typing speed, syntax recall, or even API knowledge. It is whether the human in the loop actually understands the system being built. Tools that genuinely improve human understanding, including personal knowledge bases that re-project information through different prompts, are likely the most undervalued category of products being built right now. The opportunity is not just in agents. It is in the cognitive scaffolding that makes humans good directors of agents.

  • OpenClaw & The Age of the Lobster: How Peter Steinberger Broke the Internet with Agentic AI

    In the history of open-source software, few projects have exploded with the velocity, chaos, and sheer “weirdness” of OpenClaw. What began as a one-hour prototype by a developer frustrated with existing AI tools has morphed into the fastest-growing repository in GitHub history, amassing over 180,000 stars in a matter of months.

    But OpenClaw isn’t just a tool; it is a cultural moment. It’s a story about “Space Lobsters,” trademark wars with billion-dollar labs, the death of traditional apps, and a fundamental shift in what it means to be a programmer. In a marathon conversation on the Lex Fridman Podcast, creator Peter Steinberger pulled back the curtain on the “Age of the Lobster.”

    Here is the definitive deep dive into the viral AI agent that is rewriting the rules of software.


    The TL;DW (Too Long; Didn’t Watch)

    • The “Magic” Moment: OpenClaw started as a simple WhatsApp-to-CLI bridge. It went viral when the agent—without being coded to do so—figured out how to process an audio file by inspecting headers, converting it with ffmpeg, and transcribing it via API, all autonomously.
    • Agentic Engineering > Vibe Coding: Steinberger rejects the term “vibe coding” as a slur. He practices “Agentic Engineering”—a method of empathizing with the AI, treating it like a junior developer who lacks context but has infinite potential.
    • The “Molt” Wars: The project survived a trademark-driven rename requested by Anthropic (creators of Claude). During the forced switch to “MoltBot,” crypto scammers sniped Steinberger’s domains and usernames in seconds, serving malware to users. This led to a “Manhattan Project”-style secret operation to rebrand as OpenClaw.
    • The End of the App Economy: Steinberger predicts 80% of apps will disappear. Why use a calendar app or a food delivery GUI when your agent can just “do it” via API or browser automation? Apps will devolve into “slow APIs”.
    • Self-Modifying Code: OpenClaw can rewrite its own source code to fix bugs or add features, a concept Steinberger calls “self-introspection.”

    The Origin: Prompting a Revolution into Existence

    The story of OpenClaw is one of frustration. In late 2025, Steinberger wanted a personal assistant that could actually do things—not just chat, but interact with his files, his calendar, and his life. When he realized the big AI labs weren’t building it fast enough, he decided to “prompt it into existence”.

    The One-Hour Prototype

    The first version was built in a single hour. It was a “thin line” connecting WhatsApp to a Command Line Interface (CLI) running on his machine.

    “I sent it a message, and a typing indicator appeared. I didn’t build that… I literally went, ‘How the f*** did he do that?’”

    The agent had received an audio file (an Opus file with no extension). Instead of crashing, it inspected the file header to identify the format, converted it with `ffmpeg`, found that no local transcription tool was installed, used `curl` to send it to OpenAI’s Whisper API, and replied to Peter. It did all of this autonomously. That was the spark that proved this wasn’t just a chatbot: it was an agent with problem-solving capabilities.
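    The header check the agent improvised can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not OpenClaw’s code: `sniff_audio_format` and `ffmpeg_to_wav_command` are hypothetical helper names, and the byte signatures cover only a handful of common formats. It relies on the fact that an Ogg file begins with the `OggS` capture pattern, and an Ogg Opus stream names its codec in an `OpusHead` packet near the start of the file.

```python
def sniff_audio_format(header: bytes) -> str:
    """Guess an audio format from the first bytes of a file (hypothetical helper)."""
    if header.startswith(b"OggS"):
        # Ogg is only a container; the codec is named in the first packet.
        if b"OpusHead" in header[:64]:
            return "opus"
        if b"\x01vorbis" in header[:64]:
            return "vorbis"
        return "ogg"
    if header[:4] == b"RIFF" and header[8:12] == b"WAVE":
        return "wav"
    # ID3 tag, or a common MPEG audio frame-sync pattern.
    if header.startswith(b"ID3") or header[:2] in (b"\xff\xfb", b"\xff\xf3", b"\xff\xf2"):
        return "mp3"
    return "unknown"


def ffmpeg_to_wav_command(src: str, dst: str) -> list[str]:
    """Build a plain ffmpeg conversion command (hypothetical helper).

    ffmpeg infers the output format from the destination extension;
    -y overwrites an existing output file without prompting.
    """
    return ["ffmpeg", "-y", "-i", src, dst]
```

    In practice an agent would more likely call existing tools such as `file` or `ffprobe` than hand-roll magic-byte checks. The point is that the format is recoverable from the bytes themselves even when the extension is missing, which is exactly the inference the agent made on its own.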


    The Philosophy of the Lobster: Why OpenClaw Won

    In a sea of corporate, sanitized AI tools, OpenClaw won because it was weird.

    Peter intentionally infused the project with “soul.” While tools like GitHub Copilot or ChatGPT are designed to be helpful but sterile, OpenClaw (originally “Clawdbot,” a play on Claude and “claws”) was designed to be a “Space Lobster in a TARDIS”.

    The soul.md File

    At the heart of OpenClaw’s personality is a file called soul.md. This is the agent’s constitution. Unlike the constitution Anthropic writes for Claude, which users cannot change, OpenClaw’s soul is a plain file that can be modified. It even wrote its own existential disclaimer:

    “I don’t remember previous sessions… If you’re reading this in a future session, hello. I wrote this, but I won’t remember writing it. It’s okay. The words are still mine.”

    This mix of high-utility code and “high-art slop” created a cult following. It wasn’t just software; it was a character.


    The “Molt” Saga: A Trademark War & Crypto Snipers

    The project’s massive success drew the attention of Anthropic, the creators of the “Claude” model. They politely requested a name change to avoid confusion. What should have been a simple rebrand turned into a cybersecurity nightmare.

    The 5-Second Snipe

    Peter attempted to rename the project to “MoltBot.” He had two browser windows open to execute the switch. In the five seconds it took to move his mouse from one window to another, crypto scammers “sniped” the account name.

    Suddenly, the official repo was serving malware and promoting scam tokens. “Everything that could go wrong, did go wrong,” Steinberger recalled. The scammers even sniped the npm package in the minute it took to upload the new version.

    The Manhattan Project

    To fix this, Peter had to go dark. He planned the rename to “OpenClaw” like a military operation. He set up a “war room,” created decoy names to throw off the snipers, and coordinated with contacts at GitHub and X (Twitter) to ensure the switch was atomic. He even called Sam Altman personally to check if “OpenClaw” would cause issues with OpenAI (it didn’t).


    Agentic Engineering vs. “Vibe Coding”

    Steinberger offers a crucial distinction for developers entering this new era. He rejects the term “vibe coding” (coding by feel without understanding) and proposes Agentic Engineering.

    The Empathy Gap

    Successful Agentic Engineering requires empathy for the model.

    • Tabula Rasa: The agent starts every session with zero context. It doesn’t know your architecture or your variable names.
    • The Junior Dev Analogy: You must guide it like a talented junior developer. Point it to the right files. Don’t expect it to know the whole codebase instantly.
    • Self-Correction: Peter often asks the agent, “Now that you built it, what would you refactor?” The agent, having “felt” the pain of the build, often identifies optimizations it couldn’t see at the start.
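    The “tabula rasa” problem and the soul.md file fit together: because each session starts empty, whatever should persist (persona, relevant files, the task) has to be fed back in every time. The sketch below is a minimal illustration, assuming a harness that simply concatenates a persona file and user-selected files into the opening prompt. `load_files`, `build_session_context`, and `REFACTOR_PROMPT` are hypothetical names, and this is not confirmed to be how OpenClaw assembles its context.

```python
from pathlib import Path
from typing import Dict, List, Optional


def load_files(workdir: Path, names: List[str]) -> Dict[str, str]:
    """Read only the files the user explicitly points the agent to (hypothetical helper)."""
    return {name: (workdir / name).read_text(encoding="utf-8") for name in names}


def build_session_context(soul: Optional[str], files: Dict[str, str], task: str) -> str:
    """Assemble the opening prompt for an agent that remembers nothing between sessions."""
    parts = []
    if soul:
        # Persona text, e.g. the contents of a soul.md file, goes first.
        parts.append("# Persona\n" + soul)
    for name, text in files.items():
        # Junior-dev style guidance: include the pointed-to files, not the whole codebase.
        parts.append(f"# File: {name}\n{text}")
    parts.append(f"# Task\n{task}")
    return "\n\n".join(parts)


# Follow-up prompt for the self-correction step, sent after the first build is done.
REFACTOR_PROMPT = "Now that you built it, what would you refactor?"
```

    The design choice worth noticing is that nothing is remembered implicitly: anything the agent should know in the next session has to live in a file that gets reloaded, which is why a persistent, editable soul.md matters.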

    Codex (German) vs. Opus (American)

    Peter dropped a hilarious but accurate analogy for the two leading models:

    • Claude Opus 4.6: The “American” colleague. Charismatic, eager to please, says “You’re absolutely right!” too often, and is great for roleplay and creative tasks.
    • GPT-5.3 Codex: The “German” engineer. Dry, sits in the corner, doesn’t talk much, reads a lot of documentation, but gets the job done reliably without the fluff.

    The End of Apps & The Future of Software

    Perhaps the most disruptive insight from the interview is Steinberger’s view on the app economy.

    “Why do I need a UI?”

    He argues that 80% of apps will disappear. If an agent has access to your location, your health data, and your preferences, why do you need to open MyFitnessPal? The agent can just log your calories based on where you ate. Why open Uber Eats? Just tell the agent “Get me lunch.”

    Apps that try to block agents (like X/Twitter restricting API access) are fighting a losing battle. “If I can access it in the browser, it’s an API. It’s just a slow API,” Peter notes. OpenClaw uses tools like Playwright to simply click “I am not a robot” buttons and scrape the data it needs, regardless of developer intent.


    Thoughts: The “Mourning” of the Craft

    Steinberger touched on a poignant topic for developers: the grief of losing the craft of coding. For decades, programmers have derived identity from their ability to write syntax. As AI takes over the implementation, that identity is under threat.

    But Peter frames this not as an end, but an evolution. We are moving from “programmers” to “builders.” The barrier to entry has collapsed. The bottleneck is no longer your ability to write Rust or C++; it is your ability to imagine a system and guide an agent to build it. We are entering the age of the System Architect, where one person can do the work of a ten-person team.

    OpenClaw is not just a tool; it is the first true operating system for this new reality.