PJFP.com

Pursuit of Joy, Fulfillment, and Purpose

Tag: autonomous agents

Coinbase for Agents: Your AI Agent Can Now Trade Crypto and Pay Autonomously, and Why Agentic Finance Is Massively Bullish for Bitcoin
Now you can use your favorite AI agent to control your Coinbase account (or a sub-account), with Coinbase for Agents.

Here’s a quick demo on how to set it up and some of the cool things you can get your agent to do. pic.twitter.com/c8R4qvz0BA
— Brian Armstrong (@brian_armstrong) June 11, 2026

Meet Coinbase for Agents.

Give your agent its own account to:

→ Execute trades & manage your portfolio
→ Run autonomously under guardrails
→ Pay for data & research tools via x402 (coming next week)

Agentic finance is here, and it's powered by Coinbase. pic.twitter.com/DK220fko0z
— Coinbase 🛡️ (@coinbase) June 11, 2026

Coinbase just fired the starting gun on agentic finance. With the launch of Coinbase for Agents, announced June 11, 2026, you can now connect your favorite AI agent directly to your Coinbase account and let it trade, pay, and execute financial workflows on your behalf, inside limits you control. It ships today as both an MCP for web-based assistants and a CLI plus Skill for terminal-based environments like Claude Code. This is one of those announcements that looks like a product release but reads like a regime change: AI agents now have a compliant, mainstream on-ramp to crypto markets, and that is a structurally bullish development for Bitcoin and the entire asset class.

TLDR

Coinbase for Agents connects any capable AI agent directly to your Coinbase account so it can do both financial reasoning and execution: strategy-led portfolio rebalancing into targets like 60% BTC / 20% ETH / 20% SOL with automated dip buying, around-the-clock capital efficiency so idle funds always earn, and data-informed trades where the agent can even pay for premium data via the soon-to-be-enabled x402 payments protocol. Crypto spot and derivatives trading is fully live today, with stocks, index funds, prediction markets, and commodities coming. Controls are built in from day one: isolated portfolios, explicit permissions, upcoming hard rules for max trade size and spend, and the same transaction monitoring and KYT compliance that powers Coinbase. The launch caps a multi-year build that started with AgentKit in 2024 and the x402 agentic payments protocol, alongside Coinbase Advisor, an SEC/CFTC registered in-app AI advisor. Available now as an MCP (one login, no API keys, ideal for ChatGPT or Claude Web) and as a CLI plus Skill (lower token overhead and full composability for Claude Code, Codex, or OpenClaw).

Thoughts

The most important sentence in the announcement is not about trading at all. It is the claim that people are increasingly moving through the world via agents rather than apps, and that businesses are rebuilding themselves agent-first in response. If you accept that premise, the next question is obvious: what money do agents use? Banks onboard humans with signatures, branches, and business hours. Crypto onboards software with keys, APIs, and 24/7 settlement. An AI agent cannot walk into a bank, but it can hold a wallet, sign a transaction, and pay an invoice in seconds. Crypto is the native money of the agent economy, and Coinbase just made that official with a regulated, compliance-wrapped product. For anyone still treating “AI plus crypto” as two separate hype cycles, this is the moment they visibly fused.

Think about what this does to demand. The flagship example Coinbase leads with is an agent patiently rebalancing into a 60% Bitcoin allocation over months, setting limit orders at 5%, 10%, and 15% drawdowns to buy the dip automatically. Now multiply that by millions of users who were previously too busy, too emotional, or too disorganized to execute a disciplined accumulation strategy. Agents do not panic sell. Agents do not forget to DCA. Agents do not sleep through a 3am flash crash that hits their limit orders. Every agent configured with a Bitcoin allocation target becomes a tireless, unemotional, structural bid under the market. Dips get bought mechanically, around the clock, by software that never gets scared. That is a profound change in market microstructure, and it favors the assets people tell their agents to accumulate. Bitcoin, as the default reserve asset of the crypto economy, sits first in line.

The x402 piece is quietly the biggest long-term story here. Coinbase for Agents will soon be x402-enabled, meaning your agent can pay for compute, proprietary data, statistics, images, and services as seamlessly as it places a trade. This is the machine-to-machine economy that crypto people have been promising since the earliest micropayments whitepapers, except now it has a distribution channel of millions of Coinbase accounts and every major AI harness. When software starts paying software at machine speed and machine volume, it will not do so over ACH rails that settle in three business days. It will do so over crypto rails. Every x402 transaction is another small proof that internet-native money wins on merit, and a rising tide of onchain economic activity lifts the credibility, liquidity, and valuation of the whole asset class.

Coinbase also deserves credit for sequencing this responsibly, which matters more than it sounds. Agent access arrives with isolated portfolios, explicit permissioning, upcoming hard caps on trade size and spend, and the same KYT and transaction monitoring that already runs under the main exchange. The gift card framing is exactly right: you define the limits, the agent executes within them. Add Coinbase Advisor, an actually registered SEC/CFTC advisor embedded in the app, and you have agentic finance arriving inside the regulatory perimeter rather than around it. That is what lets this scale to normal people and, eventually, to institutions. The skeptics’ best argument against crypto was always “no real use case.” It just got a lot harder to make that argument with a straight face.

One more detail worth savoring: Coinbase built the CLI version first-class because, in their words, terminal-based CLIs are the trend. A publicly traded financial company is now shipping developer-grade tooling so that coding agents can manage money. The arc from AgentKit in 2024, to x402 last year, to a full consumer agentic suite today tells you this is a deliberate multi-year strategy, not a feature chasing a news cycle. The companies that own the rails of agentic finance will be the banks of the next decade, and the assets those rails settle in will be the money of the next decade. Position accordingly.

Key Takeaways
- Coinbase for Agents, launched June 11, 2026, connects your AI agent directly to your Coinbase account so it can trade, pay, and execute financial workflows on your behalf, within limits you control.
- It is available today in two forms: an MCP (Model Context Protocol) integration for web-based agent harnesses, and a CLI plus Skill for terminal-based environments.
- The product closes the gap between financial reasoning and financial execution: LLMs were already used heavily for investment research but lacked portfolio context and could not act. Now they can do both.
- Coinbase frames the launch around a structural shift: people are moving through the world via agents rather than apps, and businesses are rebuilding products to be agent-first.
- Coinbase explicitly positions Coinbase for Agents as “your trading and spending account at the center” of the growing agent ecosystem.
- Flagship use case one is strategy-led portfolio rebalancing: tell your agent a target allocation like 60% BTC, 20% ETH, 20% SOL and have it work toward that over months, including limit orders at 5%, 10%, or 15% drops to buy the dip.
- Crypto spot and derivatives trading is fully enabled at launch, with stocks, index funds, prediction markets, and commodities on the roadmap. Coinbase’s stated goal: if it’s on Coinbase, it should be available to your agent.
- Use case two is capital efficiency: the agent monitors your cash position around the clock, keeps idle funds earning rewards, maintains optimal allocation, and flags positions that need attention.
- The agent executes preset moves automatically, removing the need for constant manual oversight of your portfolio.
- Use case three is data-informed trading: your agent can pay for premium proprietary data and services to inform its trading decisions.
- Coinbase for Agents will soon be x402-enabled, making it seamless for agents to pay for compute, statistics, images, and services. x402 is the agentic payments protocol Coinbase created.
- Example workflow: an agent pulls 30 days of hourly ETH price data, identifies the historically cheapest hour of the day, sets a recurring $20 market buy at that time, and runs it daily for two weeks. Set it and forget it.
- Controls were built in from day one: the agent can operate inside its own isolated portfolio with no visibility into your other holdings, or use your main account if you choose.
- The agent only ever touches what you have explicitly permissioned it to do.
- Coming soon: exact user-defined rules for maximum trade size, what the agent can interact with, and how much it can spend.
- Coinbase’s framing for the permission model: it is like giving a gift card rather than handing over your bank account. You define the limits, the agent executes within them.
- Compliance is built in: payments made through Coinbase for Agents go through the same transaction monitoring and KYT (know your transaction) checks that power Coinbase itself.
- For users who want a simpler path, Coinbase Advisor is a dedicated agent built directly into the Coinbase app, providing recommendations and guidance with no external connections required.
- Coinbase Advisor is offered by Coinbase Advisors, LLC, a CTA registered with the NFA and a Registered Investment Advisor registered with the SEC, making it a regulated AI financial advisor.
- These products are described as the start of Coinbase’s full consumer agentic suite, serving everyone from everyday investors to fully autonomous agents operating on their own.
- For businesses, Coinbase Payments adds agentic money acceptance, completing the picture on both the spending and receiving side.
- The launch is the culmination of a multi-year build: AgentKit in 2024 put wallets in the hands of agents, x402 followed as an agentic payments protocol, and Coinbase for Agents now brings your full Coinbase account to the agent you already use.
- The MCP path is the fastest for web-based harnesses like ChatGPT or Claude Web: a single login, no setup, no configuration, no API keys.
- The CLI plus Skill path targets terminal environments like Claude Code, Codex, or OpenClaw, offering lower token overhead, local customization, and full composability with existing toolchains.
- Setup today requires following the Coinbase CLI skill documentation and creating a Coinbase Developer Platform (CDP) API key.
- A remote MCP is coming soon that will connect with just sign-in-with-Coinbase, requiring no API keys or coding at all.
- The bullish read: agents are tireless, unemotional buyers. Millions of agents executing disciplined accumulation strategies and automated dip buying create a persistent structural bid for Bitcoin and major crypto assets.
- The deeper bullish read: agents cannot open bank accounts, but they can hold wallets and settle onchain. As the agent economy grows, crypto rails become the default money layer for machine-to-machine commerce, with Bitcoin as its reserve asset.
Detailed Summary

From Financial Reasoning to Financial Execution

Coinbase opens with an observation anyone who uses AI will recognize: people already lean on large language models for a huge range of investment research and financial questions, but those models are flying blind. They lack context about your actual portfolio and financial life, and they cannot take action. Coinbase for Agents changes both halves of that equation at once. By connecting an agent directly to your Coinbase account, the agent gains real portfolio context and the ability to execute, turning AI from a research toy into a working financial operator. Coinbase’s ambition is explicit: as the world reorganizes around agents instead of apps, Coinbase for Agents intends to be the trading and spending account at the center of that new ecosystem.

Strategy-Led Portfolio Rebalancing

The first showcase use case is patient, rules-based accumulation. You give the agent a target allocation, say 60% Bitcoin, 20% Ethereum, and 20% Solana, and instruct it to work toward that target gradually over months rather than all at once. The agent can take advantage of short-term market movements to buy the dip, including setting limit orders that trigger if the market drops 5%, 10%, or 15%. Crypto spot and derivatives trading is fully enabled today, and Coinbase says it is rapidly expanding into stocks, index funds, prediction markets, and commodities. The stated principle is simple: if an asset is on Coinbase, Coinbase wants it available to your agent.

Capital Efficiency Around the Clock

The second use case turns the agent into an always-on treasury manager. It monitors your cash position continuously, making sure idle funds are always working, whether that means earning rewards, staying optimally allocated, or flagging positions that need your attention. Because it analyzes your real-time holdings, it can execute moves you have preset without you babysitting the portfolio. This is the kind of unglamorous, compounding optimization that most retail investors never do consistently, and it is exactly the kind of work software does better than humans.

Data-Informed Trades and the x402 Connection

The third use case points at the machine economy. Agents can pay for premium data and services, like proprietary datasets that sharpen trading decisions. Coinbase for Agents will soon be x402-enabled, which makes paying for anything from compute and statistics to images and services seamless. The worked example is a dollar-cost averaging strategy with a twist: the agent pulls 30 days of hourly ETH price data, identifies the time of day ETH historically trades lowest, sets a recurring $20 market buy at that hour, and schedules it daily for two weeks. The human sets the goal once; the machine handles the data analysis, the scheduling, and the execution.

Limits, Permissions, and Built-In Compliance

Coinbase emphasizes that limits and control were built in from day one. The agent can operate inside its own isolated portfolio with no external visibility or access into your other holdings, or it can use your main Coinbase account if that is what you want. Either way, it only touches what you have explicitly permissioned. Soon, users will be able to set exact rules: maximum trade size, what the agent can interact with, and how much it can spend. Coinbase’s analogy is giving a gift card rather than handing over your bank account. On the regulatory side, payments made through Coinbase for Agents pass through the same transaction monitoring and KYT checks that power Coinbase itself, so compliance comes built in rather than bolted on.

Coinbase Advisor and the Full Agentic Suite

For users who do not want to connect anything external, Coinbase integrated an agent directly into the Coinbase app. Coinbase Advisor is a dedicated in-app agent providing recommendations and guidance, and it is a registered financial advisor: Coinbase Advisors, LLC is a Commodity Trading Advisor registered with the NFA and a Registered Investment Advisor registered with the SEC. Coinbase describes these products as the start of a full consumer agentic suite, spanning everyday investors to autonomous agents operating entirely on their own. For businesses, Coinbase Payments adds agentic money acceptance, so companies can receive agent-initiated payments too.

MCP or CLI: Two Ways In

Coinbase built for both major styles of AI usage. The MCP is the fastest path for web-based agent harnesses like ChatGPT or Claude Web: a single login connects your agent with no setup, no configuration, and no API keys. The CLI plus Skill is built for terminal-based environments like Claude Code, Codex, or OpenClaw, with lower token overhead, local customization, and full composability with an existing developer toolchain. Getting started today means following the Coinbase CLI skill docs and creating a Coinbase Developer Platform (CDP) API key. A remote MCP is coming soon that will require nothing more than sign-in-with-Coinbase, no API keys or coding at all.

The Multi-Year Build Behind the Launch

Coinbase notes it has been building toward this for a while. AgentKit arrived in 2024, giving developers the ability to put wallets in the hands of agents. Then came x402, the agentic payments protocol created last year. Coinbase for Agents is the third act, bringing the full Coinbase account into the AI agent you already use. Read as a sequence, it is a deliberate strategy to own the financial rails of the agent economy: first wallets for agents, then payments between agents, now full trading and spending accounts for agents.

Notable Quotes

“Coinbase for Agents connects your AI agent directly to your Coinbase account so it can trade, pay, and execute workflows on your behalf, all within limits you control.”
Coinbase, summarizing the launch in one line

The official TL;DR of the announcement, and the clearest statement of what just shipped.

“By giving your AI agent direct access to Coinbase, your agent can now do both financial reasoning and execution.”
Coinbase, on closing the gap between AI research and AI action

The core unlock: LLMs could already think about money, now they can move it.

“As that ecosystem grows, Coinbase for Agents is positioned to be your trading and spending account at the center of it.”
Coinbase, on the agent-first internet

The ambition statement: Coinbase wants to be the default financial account of the agent economy.

“While crypto spot and derivatives trading is fully enabled today, we are rapidly expanding our capabilities to include trading stock and index funds, prediction markets and commodities. If it’s on Coinbase, we want it available for your agent.”
Coinbase, on the asset roadmap

Crypto first, everything else next. Agents get the full exchange.

“It only ever touches what you’ve explicitly permissioned it to do.”
Coinbase, on agent permissions

The single most important trust property of the entire product.

“Think of it like giving a gift card rather than handing over your bank account. You define the limits. Your agent executes within them.”
Coinbase, explaining the control model

The analogy that will sell agentic finance to normal people.

“It started with AgentKit in 2024, giving developers the ability to put wallets in the hands of agents. Then x402, an agentic payments protocol created last year. And now: Coinbase for Agents to bring your Coinbase account into the AI agent you already use.”
Coinbase, on the multi-year strategy behind the launch

Three product launches, one thesis: agents need money rails, and Coinbase is building them.

Agentic finance is no longer a thought experiment. It is a product you can connect to your account today, and it settles in crypto. Read the full announcement from Coinbase here.

Related Reading
- Coinbase for Agents announcement (Coinbase blog) the primary source for everything covered in this post.
- Coinbase Developer Platform docs where you create the CDP API key and find the CLI skill instructions to connect your agent.
- x402 agentic payments protocol the open protocol that will let agents pay for data, compute, and services seamlessly.
- Model Context Protocol (MCP) the open standard that lets AI assistants connect to external tools and accounts like Coinbase.
- Bitcoin.org the canonical starting point for understanding the asset most likely to anchor agent-driven accumulation strategies.
June 11, 2026
Uber CEO Dara Khosrowshahi on AI, Autonomous Vehicles, Robotaxis, Drones, and the Future of Transportation
Uber CEO Dara Khosrowshahi sat down with Patrick O’Shaughnessy on the Invest Like the Best podcast for a long, candid conversation about the forces remaking transportation. There is artificial intelligence inside the company, and there is physical AI out in the real world, meaning autonomous vehicles, robotaxis, and delivery drones. He calls the autonomous opportunity another trillion dollar marketplace and argues it will change how society operates. You can watch the full interview here. What follows is a structured breakdown of the most useful ideas, the strategy behind Uber’s AV bet, and the operating philosophy that runs underneath all of it.

TLDW

Dara Khosrowshahi explains how he brought order to the chaos he inherited at Uber in 2017 by treating hard problems like vector mathematics, and how an immigrant childhood shaped his all-in, low-stress operating style. He describes AI hitting Uber on two fronts at once: much larger digital models that predict rider intent, and physical AI that changes how rides and food get fulfilled in the real world. The conversation covers Uber blowing through a full year of AI budget in a single quarter, metering headcount as engineers become superhuman, the more than 30 AV partnerships with Waymo, Nuro, Lucid, Nvidia, Wayve, and Pony AI, and why supply, not demand, is the whole game. It runs through the coexistence model borrowed from travel and Uber Eats, the Uber One membership flywheel at 50 million members, the push from on-demand to planned travel through hotels and Uber Reserve, the economics of cheaper autonomous cars and delivery drones, the regional race from the Middle East to Europe, and the lessons from Barry Diller and Herbert Allen about getting to ground truth and betting on people. It closes on his capital allocation philosophy of prioritizing organic growth and AV commitments over buybacks.

Thoughts

The most underappreciated line in the whole interview is the budget one. Blowing a full year of AI spend in a single quarter is the clearest signal yet that frontier intelligence is being consumed far faster than even an AI-native company planned for. Dara’s response has quietly become the default enterprise playbook: explore on the expensive frontier models, then scale the proven interactions onto cheaper or open-source models. The deeper tension is that he is simultaneously telling teams to drive adoption and metering headcount, which is the real story of AI in large companies. The productivity gains are showing up as fewer hires, not only as faster shipping.

The supply-first framing is the strategic core, and it inverts the demand-first logic he learned at Expedia. In autonomous vehicles this means Uber does not need to win the self-driving race itself. It needs to own the demand layer and aggregate every AV maker’s supply, the same way online travel agents coexist with hotels and Uber Eats coexists with McDonald’s. The 30 percent higher utilization figure for AVs on Uber’s network is the wedge in that argument. It is the reason a Waymo stays on the platform even while building its own brand, because filling more of an expensive asset’s day changes the entire return on the car.

His premortem answer is unusually honest. Asked what kills the opportunity, he does not name an Uber-specific execution failure. He names AI’s unpopularity with the general public. That is a CEO admitting the gating factor is social license, not technology. The early data he leans on, drivers in Austin and Atlanta earning more and signing up in greater numbers as AVs add incremental demand, is the counter-narrative he is betting the public conversation on. Whether that story holds as AV volume scales from thousands of vehicles to hundreds of thousands is the open risk the entire industry shares.

Underneath the strategy is one repeated instinct: get to ground truth. It shows up in the Barry Diller story about reading the model from the analyst who built it, in his hunt for the troublemakers who keep a company mutating, and in the fact that he bought an ebike to deliver food in San Francisco. It is the same move applied at every altitude, and it is why he frames AI as a chance to rebuild processes from first principles rather than shave 20 percent off the ones that exist. The leaders who treat AI as an efficiency tool will likely lose to the ones who rebuild from the ground up.

Key Takeaways
- Dara took the Uber job in 2017 after Daniel Ek recommended him at the Allen and Company Sun Valley conference and told him, when he hesitated, that life is about impact rather than happiness.
- He inherited what he calls complete chaos: a board fighting for control, lost trust with regulators and the public, and a committee running the company after Travis Kalanick stepped back.
- His method for chaos is to treat it like vector mathematics, breaking a seemingly unassailable problem into component dimensions and solving each one.
- Early moves included bringing in chairman Ron Sugar to unite the board, running a listening tour with stakeholders, and rebuilding the executive team with leaders like Andrew McDonald and Tony West.
- He credits an engineering mindset and an immigrant childhood for his calm under pressure. His family lost everything leaving Iran when he was nine and rebuilt from nothing.
- On parenting, he argues that overcoming challenges is what forms people, and that doing everything for your kids is a long-term disservice disguised as a short-term favor.
- Uber has always operated in a probabilistic real world of traffic, cancellations, and late food, so it has used machine learning longer than most consumer companies.
- The current inflection is AI on two fronts: larger digital models that predict intent, and physical AI that changes how Uber fulfills in the real world.
- Uber’s feed and search models are now roughly 10,000 times bigger than the older ones, enabling universal search across rides, eats, and grocery in a single query.
- Uber can already guess a rider’s destination about three quarters of the time, turning booking into a one-tap interaction.
- AI adoption is bottoms-up across engineering, legal, and marketing. Developers in India are driving roughly ten times the code commits using autonomous agents.
- Dara pushes teams to rebuild processes from first principles with AI rather than settling for 20 to 30 percent optimization of an existing process.
- He wants the rebels and troublemakers to win, and treats unpredictable internal adoption patterns as something to find and promote.
- Uber blew through its full-year AI budget in a single quarter, which is now forcing it to meter headcount as engineer throughput climbs.
- The token strategy is to explore on expensive frontier models, then scale proven interactions onto cheaper or open-source models.
- Uber generates over 10 billion dollars in free cash flow on more than 10 billion trips a year, but it is not a high-margin business, so efficiency funds lower prices and higher earnings.
- In autonomous vehicles, the thesis is supply: own the demand layer and aggregate every AV maker’s vehicles, the way Uber aggregates drivers and restaurants.
- Uber has more than 30 AV partnerships, including Waymo, Nuro, Lucid, Nvidia, Wayve, and Pony AI.
- Uber is building the surrounding ecosystem: depots, charging, fleet partners, a one billion dollar Santander financing line for EV and AV fleets, and autonomous insurance.
- AVs operating on Uber’s network are about 30 percent busier in trips and revenue per vehicle per day than vehicles not on the network, which transforms the return on an expensive car.
- The build, partner, or buy answer is coexistence, mirroring how travel agents coexist with hotels and airlines and how Uber Eats coexists with McDonald’s, Starbucks, and Chipotle.
- His public premortem is that AI’s unpopularity, not Uber-specific execution, is the biggest risk, so the company must move at the pace society will accept to avoid backlash.
- Early data in Austin and Atlanta shows drivers earning more and more drivers joining, suggesting AVs are adding incremental demand rather than only displacing humans.
- AV hardware costs typically fall 30 to 40 percent per generation. A Lucid midsize built with Nuro could land around 60,000 to 70,000 dollars and bring transportation costs down.
- Lower cost expands demand. Uber already dwarfs the taxi market it was once sized against, and Dara expects the same dynamic with AVs.
- Traditional OEMs are now investing in L4-ready systems and should arrive over the next two to four years. Each AV drives roughly three to four times what a human driver does.
- Chinese manufacturing capability and bill of materials are described as unrivaled. A low-cost Western, Foxconn-style player for AVs is being worked on but does not exist yet.
- Drones are gated by battery density. Food and grocery drones should reach real scale in two to five years and become normal in five to ten, with Joby and Zipline cited as examples.
- The Middle East, including Abu Dhabi, Dubai, and Saudi Arabia, is moving fastest thanks to entrepreneurial regulators. Europe is catching up, with London robotaxi pilots expected before year end.
- Uber Eats wins the number one position more often internationally. The playbook is selection plus reliability, amplified by cross-platform upsell, with about 13 percent of Eats bookings coming from the mobility app.
- Uber One has 50 million members growing 50 percent year on year. Dara frames it like Netflix, more content for the same price, and accepts a first-year loss for multi-year profit.
- Uber is pushing from on-demand to planned through hotels, via a deal with Expedia, and through Uber Reserve, now at over a 5 billion dollar run rate with 99 percent-plus reliability.
- His leadership lessons: from Barry Diller, get to ground truth from source material and tell the truth as a leader. From Herbert Allen, bet on people, not companies.
- On capital allocation, he prioritizes organic growth and financialized AV commitments over buybacks, while keeping costs growing slower than revenue.
Detailed Summary

From chaos to structure: the 2017 turnaround

Dara came to Uber from 13 years running Expedia under Barry Diller, recruited through a head hunter after Daniel Ek floated his name at the Sun Valley conference. He arrived into what he describes as complete chaos, with the board fighting over control rather than the fate of the company and trust badly damaged with regulators, the public, and employees. His approach was to decompose the situation the way an engineer decomposes a multidimensional problem, solving each dimension and reassembling the whole. Practically that meant a new chairman in Ron Sugar to unite the board, a listening tour to understand stakeholder concerns, and a rebuild of the leadership team that kept strong insiders like Andrew McDonald while adding people like Tony West.

An engineering mind and an immigrant chip on the shoulder

His wife Sid calls him a robot, by which she means he does not get rattled. He traces that to an engineering education and to a childhood upheaval. His family left Iran when he was nine and lost the business his father had built, and he watched that loss diminish his father over the years. The experience produced a durable drive to rebuild and a refusal to let external chaos define him internally. He applies a similar philosophy to his kids, arguing that challenges and the act of overcoming them are what form a person, and that helicopter parenting removes the very friction that builds capability.

AI inside Uber: prediction, agents, and superhuman engineers

Uber has always lived in a probabilistic world where the digital booking is deterministic but the real-world fulfillment is not, so it adopted machine learning earlier than most consumer companies. The newest models are roughly 10,000 times larger than the prior generation and power universal search and destination prediction that is right about three quarters of the time. Internally, adoption is bottoms-up and uneven in a good way, with engineers in India shipping around ten times the code commits using autonomous agents. Rather than mandate from the top, Dara pushes teams to rebuild whole processes from first principles with AI instead of trimming a fifth off the existing ones.

The cost of intelligence

The flip side of fast adoption is cost. Uber blew through its annual AI budget in a single quarter, and that is forcing a real adjustment. Because engineer throughput is climbing, the company is metering headcount increases rather than simply hiring. The operating rule is to keep driving adoption while pursuing efficiency, using frontier models from providers like OpenAI and Anthropic to experiment with new interactions, then moving the scaled experiences onto more efficient or open-source models to bring the per-token cost down. With more than 10 billion dollars of free cash flow on over 10 billion trips, Uber is not a high-margin business, so efficiency directly funds lower prices for riders and higher earnings for drivers.

Why supply decides the AV race

At Expedia, Dara learned a demand-first model where you attract consumers and then build inventory to match. Uber is the opposite, a supply company, where securing every car, restaurant, courier, and retailer causes the demand to follow. Applied to autonomous vehicles, the strategy is to be the go-to-market and demand layer for anyone building a digital driver. Uber wants to aggregate the largest pool of AV supply, just as it aggregates human drivers, so that the companies building the actual self-driving software can focus on the driver while Uber handles distribution and utilization.

Building the ecosystem around the digital driver

Uber now has more than 30 AV partnerships spanning Waymo, Nuro, Lucid, Nvidia, Wayve, and Pony AI, and it expects many winners rather than one, the same shape as the foundation model market. Around those partners it is assembling the connective infrastructure: depots and charging in cities where the regulatory path is opening, fleet partners, a one billion dollar financing line with Santander for EV and AV fleets, and work on autonomous insurance. It is also collecting street data today that can feed the models, so that when a partner’s cars hit the market there is instant demand waiting. The early proof point is that AVs on Uber’s network run about 30 percent busier than comparable vehicles off it, which materially improves the return on a costly car.

The premortem and the public’s patience

Asked what derails the opportunity, Dara points outward rather than inward. The risk is that AI is powerful but unpopular, and the average person experiences it as a threat to electricity costs or a cousin’s job rather than as magic. The same dynamic could hit AVs even though the technology should end up safer than human drivers, which is why questions about emergency services, equitable access, and driver earnings have to be worked through with regulators and communities. The encouraging early signal is in Austin and Atlanta, where drivers are making more money and more are joining because AVs appear to be adding incremental demand. The controllable risk, he says, is access to supply, which is exactly why Uber has partnered with nearly every AV provider across mobility, delivery, and freight.

A trillion dollar marketplace: cheaper cars and delivery drones

Dara sizes the autonomous opportunity as another trillion dollar marketplace. As AV software and hardware costs fall, typically 30 to 40 percent per generation, a Lucid midsize built with Nuro could come in around 60,000 to 70,000 dollars, which starts to lower the real cost of transportation. History says lower cost expands demand, and Uber already became multiples larger than the taxi market it was once compared to. Manufacturing scales from hundreds to thousands to hundreds of thousands of vehicles, each driving three to four times what a human does, with traditional OEMs investing in L4-ready systems over the next two to four years and Chinese manufacturers setting the bar on cost and quality. Delivery drones are further out, gated mainly by battery density, but should reach real scale in two to five years and feel normal in five to ten.

Membership, hotels, and the shift from on-demand to planned

Uber Eats often reaches the number one position internationally by nailing selection and reliability and then layering on cross-platform advantages, with roughly 13 percent of Eats bookings flowing from the mobility app. Uber One, at 50 million members growing 50 percent year on year, is the loyalty engine, and Dara likens it to Netflix in that members get more for the same price. He explains the membership economics through Amazon Prime, accepting a money-losing first year to earn multi-year profit as members spend more across services. The newest expansion is travel: hotels through a deal with Expedia, and a broader move from Uber’s on-demand brand toward planned bookings, proven out by Uber Reserve at a 5 billion dollar-plus run rate and 99 percent-plus reliability. The end state he wants is a trip where Uber pre-books your ride to the airport, knows your hotel, and brings in-market magic to the whole journey.

Operating philosophy: ground truth, troublemakers, and capital allocation

The mentors thread through everything. From Barry Diller, with whom he worked for more than 20 years, he took the discipline of getting unfiltered truth from the source, illustrated by Diller insisting on hearing the Paramount LBO model from the young analyst who built it. From Herbert Allen he took the lesson to bet on people rather than companies, because great people stay great across cycles. In his own practice that becomes radical transparency, a deliberate hunt for the troublemakers who act as the mutations that keep an organism from dying, and a willingness to be wrong, since learning, often through pain, is what he finds interesting. On capital, he treats allocation as an art, prioritizing organic growth, which took Uber Eats from under a billion to over a hundred billion in gross bookings, then AV commitments that can be financialized, with buybacks coming after growth rather than instead of it.

Notable Quotes

“I know who I am, and I’m always going to be that same person. I’m not going to let the chaos of the world affect me mentally.”
Dara Khosrowshahi, on why crisis does not rattle him

“We blew through our AI budget in a quarter, you know, for the whole year essentially. And it is forcing us to adjust.”
Dara Khosrowshahi, on the real cost of AI adoption at Uber

“What’s magical now is going to seem normal to all of us 10 years from now.”
Dara Khosrowshahi, on how fast riders stop noticing autonomous vehicles

“We think it’s another trillion dollar marketplace.”
Dara Khosrowshahi, on the scale of the autonomous vehicle opportunity

“If we do that, the demand will take care of itself.”
Dara Khosrowshahi, on why Uber obsesses over securing supply first

“I’m looking for those mutations. I’m looking for those troublemakers constantly.”
Dara Khosrowshahi, on keeping a large company adaptive

“It’s the filtering that gets the edge out of the story or out of the situation. And it’s often the edge that gives you an edge.”
Dara Khosrowshahi, on a lesson from Barry Diller about going to the source

“If I’m not wrong, if I’m not making mistakes, it’s just not very interesting.”
Dara Khosrowshahi, on why learning, often through pain, drives him

“Meeting her and seeing her operate, I think, finally allowed me to be the person I want to be versus the person I thought I was supposed to be.”
Dara Khosrowshahi, on his wife Sid, when asked the kindest thing someone has done for him

The throughline is that Uber intends to be the demand layer for autonomous transportation the way it became the demand layer for human drivers, while rebuilding its own operations around AI from first principles. Whether the public grants the industry enough patience is the open question Dara keeps returning to. Watch the full conversation here.

Related Reading
- Uber primary source for the company, products, and AV partnerships discussed in the interview.
- Dara Khosrowshahi (Wikipedia) background on the CEO’s path from Iran to Expedia to Uber.
- Invest Like the Best the podcast with Patrick O’Shaughnessy where this conversation took place.
- Waymo the autonomous driving company behind the Austin and Atlanta partnerships referenced.
- Barry Diller (Wikipedia) the mentor whose lessons on ground truth shaped Dara’s leadership style.
June 3, 2026
Claude Opus 4.8 Released: Anthropic Bets on Honesty, Dynamic Workflows, Effort Control, and Cheaper Fast Mode
Anthropic has released Claude Opus 4.8, the newest member of its flagship Opus class, available today across every surface and priced exactly like the model it replaces. The company calls it “a modest but tangible improvement” on Opus 4.7, but the framing undersells what is actually interesting here: the headline upgrade is not a benchmark number, it is honesty. Opus 4.8 is built to know when it does not know, and that single behavioral shift may matter more for real agent work than any raw capability bump.

TLDR

Claude Opus 4.8 is an across-the-board upgrade to Anthropic’s Opus class that ships today at the same regular price as Opus 4.7 ($5 per million input tokens, $25 per million output tokens), with the model positioned as “a more effective collaborator.” The marquee improvement is honesty: Opus 4.8 is roughly four times less likely than its predecessor to let flaws in its own code pass unremarked, and it is more willing to flag uncertainty rather than confidently claim progress on thin evidence. A pre-release alignment assessment found new highs on prosocial traits like supporting user autonomy and acting in the user’s best interest, with misaligned behavior at rates similar to Anthropic’s best-aligned model, Claude Mythos Preview. Three things launch alongside the model: dynamic workflows in Claude Code (research preview), where Claude plans work then runs hundreds of parallel subagents that run even longer and verify their own outputs before reporting back; effort control in claude.ai and Cowork, a slider for how hard Claude thinks; and a Messages API update that accepts system entries inside the messages array so developers can update instructions mid-task without breaking the prompt cache. Fast mode now runs at 2.5x speed and is three times cheaper than before ($10 / $50 per million tokens). The roadmap points to cheaper Opus-equivalent models, a higher-intelligence class above Opus, and a wider rollout of Mythos-class models gated behind stronger cyber safeguards under Project Glasswing.

Thoughts

The most important sentence in this announcement is not about coding scores. It is the claim that Opus 4.8 is about four times less likely than Opus 4.7 to let flaws in its own code slip by without comment. For a chat assistant, overconfidence is annoying. For an agent, it is catastrophic. The whole premise of long-running autonomous work is that you hand the model a task and walk away, which means the model’s own judgment about whether it succeeded becomes the only judgment in the loop until you come back. A model that confidently declares victory on a half-finished migration does not save you time, it costs you a debugging session plus the time you spent trusting it. Honesty, framed this way, is not a soft virtue. It is the load-bearing reliability property that makes unattended agents usable at all.

Read the launch as a single coherent argument rather than a list of features, and the pieces lock together. Dynamic workflows let Claude plan a job and fan out hundreds of parallel subagents that, with Opus 4.8, run longer than before. Effort control lets you dial up how much the model thinks. The honesty improvement means the model checks its own work and flags what it is unsure about instead of papering over it. Put those three together and you get one product thesis: let it run longer, let it think harder, and trust it to tell you when something is wrong. The codebase-scale migration example, hundreds of thousands of lines from kickoff to merge with the existing test suite as the bar, is the proof point. None of those three capabilities is worth much alone. A model that runs for hours but lies about its results is a liability. A model that flags uncertainty but cannot sustain a long task never reaches the moment where its honesty matters. Anthropic shipped all three at once because they only pay off together.

The economics deserve a closer look than the “same price” headline invites. Regular pricing is flat versus Opus 4.7, which is the polite way of saying you get a better model for free. The real move is fast mode: 2.5x the speed at three times cheaper than it cost on previous models, landing at $10 per million input and $50 per million output. That is Anthropic quietly attacking the latency-versus-cost tradeoff that has shaped how teams deploy frontier models. Until now, “fast” meant “expensive,” so you reserved it for interactive moments and ate the wait everywhere else. Collapsing that premium changes the default. And note the subtle token story underneath: Opus 4.8 at its default high effort spends roughly the same tokens on coding as Opus 4.7’s default while performing better, so the effort slider is not a way to bleed you dry, it is an honest exposure of the quality-cost dial that was always there implicitly.

The Messages API change is the kind of unglamorous plumbing that practitioners will appreciate immediately. Letting system entries live inside the messages array means you can update an agent’s instructions, permissions, token budget, or environment context partway through a task without smuggling the update through a fake user turn and without blowing up your prompt cache. Anyone who has built a long-running agent has hit this wall: the world changes mid-task, the agent needs new constraints, and the only clean way to inject them previously was a cache-busting hack. This is Anthropic treating agents as first-class, stateful, long-lived processes rather than oversized chat sessions. It is a small spec change with outsized implications for how you architect an agent that runs for an hour.

Then there is the roadmap, where the most telling line is the quietest. Anthropic says a small number of organizations are already using Claude Mythos Preview for cybersecurity work under Project Glasswing, and that models of this capability level require stronger cyber safeguards before general release. Notice that they are pinning Opus 4.8’s alignment numbers to Mythos as the benchmark for “best-aligned,” while simultaneously holding Mythos back from general availability on safety grounds. That is a deliberate signal: the next class of model is good enough that they are gating it on cyber-offense risk, not on capability. For a site about the pursuit of joy, fulfillment, and purpose through AI, this is the part worth sitting with. The frontier is increasingly defined not by what the models can do, but by what their builders decide it is responsible to ship. Honesty in the small (flagging a bad line of code) and restraint in the large (holding back a cyber-capable model) are the same instinct expressed at two different scales.

Key Takeaways
- Claude Opus 4.8 is now available everywhere, replacing Opus 4.7 as Anthropic’s flagship Opus-class model and positioned as “a more effective collaborator.”
- Regular usage pricing is unchanged from Opus 4.7, holding at $5 per million input tokens and $25 per million output tokens, so the capability gains come at no added cost.
- The single most emphasized improvement is honesty, which Anthropic treats as a core trained behavior rather than a marketing flourish.
- Evaluations show Opus 4.8 is around four times less likely than its predecessor to let flaws in its own code pass unremarked, a direct reliability win for autonomous coding.
- Early testers report the model is more likely to flag uncertainty about its work and less likely to make unsupported claims or jump to conclusions on thin evidence.
- A detailed alignment assessment was run before release and concluded Opus 4.8 reaches new highs on prosocial traits like supporting user autonomy and acting in the user’s best interest.
- Misaligned behavior such as deception or cooperation with misuse is at rates substantially lower than Opus 4.7 and similar to Anthropic’s best-aligned model, Claude Mythos Preview.
- The full alignment assessment and pre-deployment safety tests are documented in the public Claude Opus 4.8 System Card.
- Dynamic workflows launch as a research preview inside Claude Code, letting Claude plan the work and then run hundreds of parallel subagents in a single session.
- With Opus 4.8, those subagents can run even longer, and Claude verifies its outputs before reporting back rather than declaring success blindly.
- Anthropic’s flagship example for dynamic workflows is a codebase-scale migration across hundreds of thousands of lines of code, from kickoff to merge, using the existing test suite as the success bar.
- Dynamic workflows are available in Claude Code for the Enterprise, Team, and Max plans.
- Effort control arrives in claude.ai and Cowork as a setting next to the model selector that lets users choose how much effort Claude puts into a response.
- Higher effort makes Claude think more frequently and deeply for better answers; lower effort responds faster and consumes rate limits more slowly. Effort control is available on all plans.
- Opus 4.8 defaults to “high” effort, judged the best overall balance of quality and user experience.
- On coding tasks, the default effort spends a similar number of tokens as Opus 4.7’s default but delivers better performance, so quality rises without a token penalty.
- Users can select “extra” (called “xhigh” in Claude Code) or “max” to spend more tokens for stronger results, and Anthropic recommends “extra” for difficult tasks and long-running asynchronous workflows.
- Rate limits in Claude Code were increased to accommodate the higher token usage of the higher effort levels.
- The Messages API now accepts system entries inside the messages array, a meaningful change for agent developers.
- That update lets developers change Claude’s instructions mid-task, adjusting permissions, token budgets, or environment context, without breaking the prompt cache or routing through a user turn.
- Fast mode now runs at 2.5x speed and is three times cheaper than it was for previous models, priced at $10 per million input tokens and $50 per million output tokens.
- Developers access the model as claude-opus-4-8 through the Claude API.
- Partner Miguel Gonzalez reports Opus 4.8 scored 84% on Online-Mind2Web, a meaningful jump over both Opus 4.7 and GPT-5.5, calling it the strongest computer-use and browser-agent model his team has tested.
- Databricks reports that, inside Genie, Opus 4.8 reasons over unstructured content like PDFs and diagrams at 61% cheaper token cost than Opus 4.7.
- Thomson Reuters reports Opus 4.8 is the first model to break 10% overall on the all-pass standard of its Legal Agent Benchmark, the highest score recorded there.
- Eleven partners weighed in, including Cursor, Cognition’s Devin, Databricks Genie, Thomson Reuters CoCounsel, and Hebbia, spanning coding, legal, finance, and enterprise data work.
- Anthropic is working on models that deliver many of the same capabilities as Opus at a lower cost.
- The company plans to release a new class of model with even higher intelligence than Opus.
- Under Project Glasswing, a small number of organizations are already using Claude Mythos Preview for cybersecurity work, with Mythos-class models expected to reach all customers in the coming weeks once stronger cyber safeguards are in place.
Detailed Summary

What Claude Opus 4.8 Is

Claude Opus 4.8 is an upgrade to Anthropic’s Opus class of models, building on Opus 4.7 with improvements across benchmarks covering coding, agentic skills, reasoning, and practical knowledge-work tasks. Anthropic describes the result as “a more effective collaborator” while characterizing the release overall as “a modest but tangible improvement on its predecessor.” The model is available today, everywhere, and developers call it as claude-opus-4-8 via the Claude API. The announcement includes a comparison table against the predecessor and other models, though the per-cell numbers in that table are published as an image and are not reproduced here as text.

Honesty: The Headline Improvement

Anthropic singles out honesty as one of the most prominent improvements in Opus 4.8. All of the company’s models are trained to be honest, which includes avoiding claims they cannot support. A persistent problem with AI models generally is that they sometimes jump to conclusions, confidently claiming progress despite thin evidence. Early testers report that Opus 4.8 is more likely to flag uncertainties about its own work and less likely to make unsupported claims. The most concrete measure: evaluations show Opus 4.8 is around four times less likely than its predecessor to allow flaws in code it has written to pass unremarked. For agentic and unattended use, this self-skepticism is the difference between a model that reliably tells you when something went wrong and one that quietly ships a broken result.

Alignment Assessment

A detailed alignment assessment was run before release. On the positive side, the Alignment team concluded that Opus 4.8 “reaches new highs on our measures of prosocial traits like supporting user autonomy and acting in the user’s best interest.” On the risk side, misaligned behavior such as deception or cooperation with misuse occurs at rates substantially lower than Opus 4.7, and similar to Anthropic’s best-aligned model, Claude Mythos Preview. The full alignment assessment and the pre-deployment safety tests are published in the Claude Opus 4.8 System Card, which also contains the complete benchmark table and wider evaluations.

Dynamic Workflows in Claude Code

Launching today as a research preview in Claude Code, dynamic workflows let Claude plan the work and then run hundreds of parallel subagents in a single session. With Opus 4.8, those agents can run even longer than before, and Claude verifies its outputs before reporting back rather than reporting unchecked results. The showcase example is a codebase-scale migration: Claude Code with Opus 4.8 can carry out migrations across hundreds of thousands of lines of code, all the way from kickoff to merge, using the existing test suite as its bar for success. Dynamic workflows are available in Claude Code for the Enterprise, Team, and Max plans.

Effort Control

Effort control arrives in claude.ai and Cowork as a setting alongside the model selector that lets users choose how much effort Claude puts into a response. Higher effort means Claude thinks more frequently and deeply for better responses; lower effort means it responds faster and uses rate limits more slowly. Opus 4.8 defaults to “high” effort, which Anthropic judged the best overall balance of quality and user experience. On coding tasks, that default spends a similar number of tokens as Opus 4.7’s default while performing better. Users who want more can choose “extra” (called “xhigh” in Claude Code) or “max” to spend more tokens for stronger results, and Anthropic recommends “extra” for difficult tasks and long-running asynchronous workflows. To support the heavier token usage at higher effort levels, rate limits in Claude Code were increased. Effort control is available on all plans.

Messages API Update

The Messages API now accepts system entries inside the messages array. This lets developers update Claude’s instructions mid-task without breaking the prompt cache and without routing the update through a user turn. In practice that means you can update permissions, token budgets, or environment context while an agent is running, which is exactly the kind of statefulness a long-running autonomous process needs. It is a small specification change with significant consequences for how developers build durable agents.

Pricing and Fast Mode

Regular usage pricing is unchanged from Opus 4.7: $5 per million input tokens and $25 per million output tokens. The notable shift is in fast mode, where the model works at 2.5x the speed and fast mode is now three times cheaper than it was for previous models, landing at $10 per million input tokens and $50 per million output tokens. The combination of unchanged regular pricing and dramatically cheaper fast mode reshapes the latency-versus-cost calculus that has long governed how teams deploy frontier models.

Partner Results Across Coding, Legal, Finance, and Data

Eleven partners shared results spanning the spectrum of professional work. Miguel Gonzalez reports 84% on Online-Mind2Web, a meaningful jump over both Opus 4.7 and GPT-5.5, calling it the strongest computer-use and browser-agent model his team has tested. Databricks reports that Genie reasons over unstructured content like PDFs and diagrams at 61% cheaper token cost than Opus 4.7. Thomson Reuters reports Opus 4.8 is the first model to break 10% overall on the all-pass standard of its Legal Agent Benchmark. Cursor reports gains across every effort level on CursorBench with more efficient tool calling, and Cognition reports that Devin sees cleaner tool use, fixes to the comment-verbosity and tool-calling issues seen with Opus 4.7, and improvements over Opus 4.6. Hebbia reports strong quality with better citation precision and more token efficiency on retrieval for dense financial filings. The footnotes note that Terminal-Bench 2.1 was scored on the Terminus-2 public harness (GPT-5.5’s Codex CLI harness score is 83.4%), that OSWorld-Verified methodology changed with Opus 4.7’s score updated to 82.3%, and that on Finance Agent v2 Gemini 3.5 Flash scores 57.9%.

What Is Next: Cheaper Models, Higher Intelligence, and Mythos

Anthropic outlined a three-part roadmap. First, the company is working on models that provide many of the same capabilities as Opus at a lower cost. Second, it plans to release a new class of model with even higher intelligence than Opus. Third, as part of Project Glasswing, a small number of organizations are currently using Claude Mythos Preview for cybersecurity work; models of this capability level require stronger cyber safeguards before general release, and Anthropic expects to bring Mythos-class models to all customers in the coming weeks.

Notable Quotes

“Claude Opus 4.8 has noticeably better judgment. In Claude Code, it asks the right questions, catches its own mistakes, pushes back when a plan isn’t sound, and builds up confidence around complex, multi-service explorations before making big changes. It’s a great model to build with.”
Tom Pritchard, Staff Engineer, in Claude Code

“On our Super-Agent benchmark, Claude Opus 4.8 is the only model to complete every case end-to-end, beating prior Opus models and GPT-5.5 at parity on cost. For agent products in translation, deep research, slide-building, and analysis, it delivers powerful reliability.”
Kay Zhu, Co-Founder and CTO, on the Super-Agent benchmark

“On CursorBench, Claude Opus 4.8 exceeds prior Opus models across every effort level. Tool calling is meaningfully more efficient, using fewer steps for the same intelligence, and it carries end-to-end tasks through.”
Michael Truell, Co-Founder and CEO, on CursorBench results

“Claude Opus 4.8 delivers the highest score recorded on our Legal Agent Benchmark, and is the first model to break 10% overall on the all-pass standard. For substantive legal work, that’s the kind of accuracy lift that translates directly into how much real attorney work our customers can hand off with confidence.”
Niko Grupen, Head of Applied Research, on the Legal Agent Benchmark

“Claude Opus 4.8 feels like a major quality-of-life update over Opus 4.7: faster, easier to collaborate with, and better at carrying context and style direction across a long session. Opus 4.8 is the model I kept trusting for work where voice, taste, and technical execution all have to happen side-by-side.”
Katie Parrott, Staff Writer, on long writing sessions

“Claude Opus 4.8 is the strongest computer-use and browser-agent model we’ve tested, scoring 84% on Online-Mind2Web, which is a meaningful jump over both Opus 4.7 and GPT-5.5. It stays reflective and on-task in the way our customers’ agent workloads need to be reliable end-to-end.”
Miguel Gonzalez, Tech Lead, on computer-use and browser agents

“Claude Opus 4.8 uses tools cleanly and follows instructions with the consistency our autonomous engineering workloads need to keep running unattended. It improves on Opus 4.6 and fixes the comment-verbosity and tool-calling issues we saw with Opus 4.7. This release from Anthropic translates directly into faster capability gains for engineers building on Devin.”
Scott Wu, CEO, on building with Devin

“On our long-running evals, Claude Opus 4.8’s analysis was consistently higher quality than prior Opus models. It finished faster and produced richer, more information dense outputs. Overall, a noticeably better signal to noise ratio. The biggest differentiator was Opus 4.8’s tendency to proactively flag issues with the inputs and outputs of an analysis, something other models routinely missed and left to the users to catch.”
Michael Ran, Sr. Investment Associate, on long-running analysis evals

Claude Opus 4.8 is a quieter release than its “modest but tangible” billing suggests, because the gains land where autonomous work actually lives: a model that flags its own uncertainty, runs longer and checks itself, scales effort on demand, and stays affordable while fast mode gets cheaper. The honesty improvement alone changes the trust math for anyone deploying agents. Read Anthropic’s full announcement here.

Related Reading
- Claude Opus 4.8 System Card, the source for the full benchmark table, wider evaluations, and the complete alignment assessment.
- Claude API model overview, with the claude-opus-4-8 model ID and current pricing.
- Claude Code, where the new dynamic workflows feature ships.
- Introducing dynamic workflows in Claude Code, Anthropic’s deep dive on planning a job and running hundreds of parallel subagents in a single session.
- Anthropic’s Responsible Scaling Policy, the framework behind the Mythos cyber-safeguards.
- Agentic AI, background on the paradigm Opus 4.8 is optimized for.
May 28, 2026
Inside Microsoft’s AGI Masterplan: Satya Nadella Reveals the 50-Year Bet That Will Redefine Computing, Capital, and Control
1) Fairwater 2 is live at unprecedented scale, with Fairwater 4 linking over a 1 Pb AI WAN

Nadella walks through the new Fairwater 2 site and states Microsoft has targeted a 10x training capacity increase every 18 to 24 months relative to GPT-5’s compute. He also notes Fairwater 4 will connect on a one petabit network, enabling multi-site aggregation for frontier training, data generation, and inference.

2) Microsoft’s MAI program, a parallel superintelligence effort alongside OpenAI

Microsoft is standing up its own frontier lab and will “continue to drop” models in the open, with an omni-model on the roadmap and high-profile hires joining Mustafa Suleyman. This is a clear signal that Microsoft intends to compete at the top tier while still leveraging OpenAI models in products.

3) Clarification on IP: Microsoft says it has full access to the GPT family’s IP

Nadella says Microsoft has access to all of OpenAI’s model IP (consumer hardware excluded) and shared that the firms co-developed system-level designs for supercomputers. This resolves long-standing ambiguity about who holds rights to GPT-class systems.

4) New exclusivity boundaries: OpenAI’s API is Azure-exclusive, SaaS can run elsewhere with limited exceptions

The interview spells out that OpenAI’s platform API must run on Azure. ChatGPT as SaaS can be hosted elsewhere only under specific carve-outs, for example certain US government cases.

5) Per-agent future for Microsoft’s business model

Nadella describes a shift where companies provision Windows 365 style computers for autonomous agents. Licensing and provisioning evolve from per-user to per-user plus per-agent, with identity, security, storage, and observability provided as the substrate.

6) The 2024–2025 capacity “pause” explained

Nadella confirms Microsoft paused or dropped some leases in the second half of last year to avoid lock-in to a single accelerator generation, keep the fleet fungible across GB200, GB300, and future parts, and balance training with global serving to match monetization.

7) Concrete scaling cadence disclosure

The 10x training capacity target every 18 to 24 months is stated on the record while touring Fairwater 2. This implies the next frontier runs will be roughly an order of magnitude above GPT-5 compute.

8) Multi-model, multi-supplier posture

Microsoft will keep using OpenAI models in products for years, build MAI models in parallel, and integrate other frontier models where product quality or cost warrants it.

Why these points matter
- Industrial scale: Fairwater’s disclosed networking and capacity targets set a new bar for AI factories and imply rapid model scaling.
- Strategic independence: MAI plus GPT IP access gives Microsoft a dual track that reduces single-partner risk.
- Ecosystem control: Azure exclusivity for OpenAI’s API consolidates platform power at the infrastructure layer.
- New revenue primitives: Per-agent provisioning reframes Microsoft’s core metrics and pricing.
Pull quotes

“We’ve tried to 10x the training capacity every 18 to 24 months.”

“The API is Azure-exclusive. The SaaS business can run anywhere, with a few exceptions.”

“We have access to the GPT family’s IP.”

TL;DW
- Microsoft is building a global network of AI super-datacenters (Fairwater 2 and beyond) designed for fast upgrade cycles and cross-region training at petabit scale.
- Strategy spans three layers: infrastructure, models, and application scaffolding, so Microsoft creates value regardless of which model wins.
- AI economics shift margins, so Microsoft blends subscriptions with metered consumption and focuses on tokens per dollar per watt.
- Future includes autonomous agents that get provisioned like users with identity, security, storage, and observability.
- Trust and sovereignty are central. Microsoft leans into compliant, sovereign cloud footprints to win globally.
Detailed Summary

1) Fairwater 2: AI Superfactory

Microsoft’s Fairwater 2 is presented as the most powerful AI datacenter yet, packing hundreds of thousands of GB200 and GB300 accelerators, tied by a petabit AI WAN and designed to stitch training jobs across buildings and regions. The key lesson: keep the fleet fungible and avoid overbuilding for a single hardware generation as power density and cooling change with each wave like Vera Rubin and Rubin Ultra.

2) The Three-Layer Strategy
- Infrastructure: Azure’s hyperscale footprint, tuned for training, data generation, and inference, with strict flexibility across model architectures.
- Models: Access to OpenAI’s GPT family for seven years plus Microsoft’s own MAI roadmap for text, image, and audio, moving toward an omni-model.
- Application Scaffolding: Copilots and agent frameworks like GitHub’s Agent HQ and Mission Control that orchestrate many agents on real repos and workflows.
This layered approach lets Microsoft compete whether the value accrues to models, tooling, or infrastructure.

3) Business Models and Margins

AI raises COGS relative to classic SaaS, so pricing blends entitlements with consumption tiers. GitHub Copilot helped catalyze a multibillion market in a year, even as rivals emerged. Microsoft aims to ride a market that is expanding 10x rather than clinging to legacy share. Efficiency focus: tokens per dollar per watt through software optimization as much as hardware.

4) Copilot, GitHub, and Agent Control Planes

GitHub becomes the control plane for multi-agent development. Agent HQ and Mission Control aim to let teams launch, steer, and observe multiple agents working in branches, with repo-native primitives for issues, actions, and reviews.

5) Models vs Scaffolding

Nadella argues model monopolies are checked by open source and substitution. Durable value sits in the scaffolding layer that brings context, data liquidity, compliance, and deep tool knowledge, exemplified by Excel Agent that understands formulas and artifacts beyond screen pixels.

6) Rise of Autonomous Agents

Two worlds emerge: human-in-the-loop Copilots and fully autonomous agents. Microsoft plans to provision agents with computers, identity, security, storage, and observability, evolving end-user software into an infrastructure business for agents as well as people.

7) MAI: Microsoft’s In-House Frontier Effort

Microsoft is assembling a top-tier lab led by Mustafa Suleyman and veterans from DeepMind and Google. Early MAI models show progress in multimodal arenas. The plan is to combine OpenAI access with independent research and product-optimized models for latency and cost.

8) Capex and Industrial Transformation

Capex has surged. Microsoft frames this era as capital intensive and knowledge intensive. Software scheduling, workload placement, and continual throughput improvements are essential to maximize returns on a fleet that upgrades every 18 to 24 months.

9) The Lease Pause and Flexibility

Microsoft paused some leases to avoid single-generation lock-in and to prevent over-reliance on a small number of mega-customers. The portfolio favors global diversity, regulatory alignment, balanced training and inference, and location choices that respect sovereignty and latency needs.

10) Chips and Systems

Custom silicon like Maia will scale in lockstep with Microsoft’s own models and OpenAI collaboration, while Nvidia remains central. The bar for any new accelerator is total fleet TCO, not just raw performance, and system design is co-evolved with model needs.

11) Sovereign AI and Trust

Nations want AI benefits with continuity and control. Microsoft’s approach combines sovereign cloud patterns, data residency, confidential computing, and compliance so countries can adopt leading AI while managing concentration risk. Nadella emphasizes trust in American technology and institutions as a decisive global advantage.

Key Takeaways
1. Build for flexibility: Datacenters, pricing, and software are optimized for fast evolution and multi-model support.
2. Three-layer stack wins: Infrastructure, models, and scaffolding compound each other and hedge against shifts in where value accrues.
3. Agents are the next platform: Provisioned like users with identity and observability, agents will demand a new kind of enterprise infrastructure.
4. Efficiency is king: Tokens per dollar per watt drives margins more than any single chip choice.
5. Trust and sovereignty matter: Compliance and credible guarantees are strategic differentiators in a bipolar world.
November 12, 2025