Anthropic passes OpenAI: Opus 4.8 and the bet on agents that work longer

  • AI

 

Anthropic passes OpenAI: Opus 4.8 and the bet on agents that work longer

Siyah zeminde birbirine bağlı parlayan küplerden oluşan bir ağ

On Thursday, May 28, Anthropic did two things in the same news cycle. It shipped Claude Opus 4.8, and it announced a $65 billion raise at a $965 billion valuation that put it ahead of OpenAI for the first time. The valuation got the headlines. The model is the part worth reading closely.

$965B
POST-MONEY VALUATION
$47B
RUN-RATE REVENUE THIS MONTH
FEWER UNREMARKED CODE FLAWS
3
MAJOR CLOUDS, FIRST FRONTIER MODEL

I run an AI company called TaoAILab. The valuation number is not something I can act on. The model release is. So I read the Opus 4.8 announcement first, and the funding announcement second, and I think most builders should do the same. The money follows the capability, and the capability is where this week actually moved.

Here is the through-line. Anthropic is selling an agent that stays on task longer, checks its own work more honestly, and runs in parallel with copies of itself. That is a different product category from a smarter chatbot, and it explains the revenue better than the valuation does.

1. What shipped on May 28

Two announcements, same day. Claude Opus 4.8, an upgrade to Anthropic's top model class. And a Series H round: $65 billion raised at a $965 billion post-money valuation, announced Thursday.

The valuation crossed a line that matters for the narrative. Anthropic's last private mark was $380 billion in February. OpenAI's was $852 billion in March. The new $965 billion figure puts Anthropic ahead of OpenAI in the private market for the first time. The round was led by Altimeter, Dragoneer, Greenoaks, and Sequoia, with a long list of institutions behind them, and about $15 billion of it came from previously committed hyperscaler money, including the $5 billion Amazon announced in April.

That is the context. Now the part I care about.

2. Opus 4.8 is a story about autonomy, not IQ

The release framing matters. Opus 4.8 is reported to have "sharper judgement, more honesty about its progress," and the ability to "work independently for longer than its predecessors"9to5MacAnthropic's own page makes the same point, calling the model sharper in its judgement and more honest about its progress. Notice what is missing from that framing. There is no claim about being smarter in a single answer. The claim is about duration and self-honesty.

Anthropic backs that with two concrete capabilities. Anthropic says the model can carry out "codebase-scale migrations across hundreds of thousands of lines of code from kickoff to merge," and that it can work with "hundreds of parallel subagents in a single session." A migration from kickoff to merge is not a chat reply. It is a multi-hour job with hundreds of decision points, and the model is being sold on its ability to hold the thread across all of them.

The self-honesty claim has a number attached. Anthropic states Opus 4.8 is "around four times less likely than its predecessor to allow flaws in code it has written to pass unremarked." For anyone who has watched an agent confidently ship a broken function, that line is more interesting than any benchmark. An agent that flags its own mistakes is an agent you can leave running.

The benchmarks moved in the same direction. According to the model's published scorecard, agentic coding rose from 64.3% on Opus 4.7 to 69.2% on Opus 4.8, and multidisciplinary reasoning with tools went from 54.7% to 57.9%(9to5Mac, reporting Anthropic's figures).Anthropic itself calls the jump "a modest but tangible improvement." I appreciate the honesty in that phrasing. The interesting gains sit in the agentic columns rather than the raw reasoning ones.

Pricing stayed flat at $5 per million input tokens and $25 per million output, the same as Opus 4.7. Fast mode is described as running at 2.5 times the speed while costing three times less than before. New effort settings, "extra" and "max," let you spend more tokens for harder problems. None of that is glamorous. All of it points at the same buyer: someone running long agent jobs who cares about the cost of a completed task more than the cost of a single message.

3. The rise runs through code, not chat

Karanlık ekranda aşağı akan yeşil kod karakterleri, Matrix tarzı

The single most useful framing I read this week came from the reporting around the raise: Anthropic's growth is running through code, not chat. The run-rate revenue crossed $47 billion earlier this month, up from roughly $10 billion a year ago, and the engine behind that jump is Claude Code and the developer platform rather than a consumer assistant.

This is worth sitting with. OpenAI's brand is ChatGPT, a consumer product with hundreds of millions of users. Anthropic does not have an equivalent consumer hit, and it just passed OpenAI in valuation anyway. The lesson is that selling autonomous work to developers and enterprises can be a bigger business than selling conversation to everyone.

Anthropic's CFO put the product line plainly, saying the company works "to make tools like Claude Code and Cowork more helpful, more powerful, and more adaptable." Those are agent tools rather than chat windows. The revenue mix tells you which product the market is actually paying for.

There is a distribution fact underneath this too. Anthropic now says Claude is "the first frontier model available on all three of the world's largest cloud platforms: Amazon Web Services, Google Cloud, and Microsoft Azure." If you are an enterprise already committed to any one of those clouds, Claude is now a first-party option. That removes a procurement step, and procurement steps are where enterprise AI deals usually die.

4. What the $965B actually signals

I am wary of valuation as a metric. It measures what investors will pay, rather than what a product does. But the shape of this round signals something real.

The valuation more than doubled from $380 billion to $965 billion in roughly three months. Investors are not paying that premium for a chatbot. They are paying it for a revenue curve that went from about $10 billion to a $47 billion run rate in a year, on the back of agent products. The valuation is a bet that the agent revenue keeps compounding.

The hyperscaler money matters more than the headline number. Amazon's $5 billion, folded into this round, is not a passive investment. It is a customer and an infrastructure partner buying deeper into a model it also resells. When your cloud provider is also your investor and your distribution channel, the relationship is sticky in a way a pure financial round is not.

Anthropic says the money will go to "safety and interpretability research," more compute, and scaling products and partnerships. The compute line is the one to watch. Frontier models are gated by compute, and a $65 billion raise is, in large part, a compute purchase order.

5. Where I would push back

I want to be honest about the soft spots, because the week's coverage was mostly cheerleading.

The valuation-to-revenue gap is large. A $965 billion valuation on a $47 billion run rate is a roughly 20-times multiple on annualized revenue, and that revenue is itself growing fast enough that the real forward multiple is lower. Still, it prices in years of continued hypergrowth. If the agent revenue curve bends even slightly, the number looks stretched.

"Modest but tangible" cuts both ways. Anthropic's own description of the Opus 4.8 gains is restrained. The agentic coding jump from 64.3% to 69.2% is real and useful, but it is iteration rather than a step change. The era of dramatic version-over-version leaps may be settling into steady incremental improvement, which changes how you should plan around model upgrades.

Concentration risk runs both ways with the hyperscalers. Being the first frontier model on all three clouds is a strength. Being financially entangled with Amazon, and dependent on the same few providers for compute, is a dependency. The cloud partners are customers, investors, and suppliers at once. That is an advantage for Anthropic today and a constraint if priorities ever diverge.

6. Three things I am watching

Whether "work longer" holds up outside the demo. The codebase-migration and parallel-subagent claims are strong. The test is whether they survive contact with messy production repositories rather than curated benchmarks. The first independent reports of Opus 4.8 running multi-hour jobs unattended will tell us more than the scorecard does.

The self-honesty number in the wild. "Four times less likely to let its own flaws pass" is the most consequential claim in the release, if it is true in practice. An agent that reliably catches its own errors is the difference between supervised and unsupervised deployment. I will be looking for builders who can confirm or puncture that figure.

OpenAI's response. OpenAI is reportedly preparing an IPO and still leads on consumer reach. Being passed in private valuation is the kind of thing that provokes a counter. The next major OpenAI release will be read, fairly or not, as an answer to this week.

TAO AI LAB TAO AI LAB Perspective

At TAO AI LAB we build voice agents and agentic workflows, making predictions based on big data. So a model sold on "working independently for longer" lands directly on the problem we work on every day. The hardest part of a production agent is not the clever single answer. It is staying coherent across a long task without drifting, looping, or quietly shipping something wrong.

Three implications for builders, from where I sit.

First, the product question is shifting from "how smart per reply" to "how long can it run unattended." If Opus 4.8's self-honesty claim holds, the supervision cost of an agent drops, and the supervision cost is most of what makes agents expensive to operate. We design our systems assuming an agent will make mistakes, so a model that catches more of its own mistakes changes our error-handling budget directly.

Second, "code, not chat" is a signal worth internalizing. The money sits in autonomous work more than in conversation. For a voice agent that means the value is in completing the booking, resolving the request, and updating the system of record, more than in sounding pleasant. We try to keep that distinction at the center of what we build.

Third, Claude being native on all three major clouds matters for small labs as much as for enterprises. It means the model we prototype against is the same one a future enterprise client can adopt inside their existing cloud contract. That continuity from prototype to deployment is rare, and it lowers the risk of betting on a provider.

Three signals:

  • Optimize for the cost of a completed task, rather than the cost of a token. Opus 4.8's flat pricing and cheaper fast mode reward agents that finish jobs. If your system is measured per message, you are measuring the wrong thing for an agent economy.
  • Build for self-checking agents. The "four times fewer unremarked flaws" claim points at where models are heading. Architect your agents to expect and use self-verification, so you inherit the gain as models improve.
  • Treat model choice as a routing decision tied to your cloud. With Claude native on AWS, Azure, and Google Cloud, provider lock-in matters less than it did. Design so you can route to the right model where your data already lives.

Frequently Asked Questions

What is Claude Opus 4.8?

It is Anthropic's top-tier model, released May 28, 2026, positioned around longer autonomous work. Anthropic describes it as having sharper judgement and the ability to work independently for longer, and says it is around four times less likely than Opus 4.7 to let flaws in its own code pass unremarked.

How much did Anthropic raise, and at what valuation?

Anthropic raised $65 billion in a Series H round at a $965 billion post-money valuation, announced May 28, 2026. That figure put it ahead of OpenAI's $852 billion March valuation in the private market for the first time.

Why is Anthropic's growth described as "code, not chat"?

Its run-rate revenue crossed $47 billion this month, driven mainly by Claude Code and its developer platform rather than a consumer chatbot. The agent and coding products, rather than conversation, are the revenue engine.

Is Claude available on AWS, Azure, and Google Cloud?

Yes. Anthropic says Claude is the first frontier model available on all three of the largest cloud platforms, which lets enterprises adopt it inside their existing cloud contracts.

How big are the Opus 4.8 benchmark gains?

Agentic coding rose from 64.3% to 69.2% and multidisciplinary reasoning with tools from 54.7% to 57.9% versus Opus 4.7. Anthropic calls the upgrade "a modest but tangible improvement," with the larger gains in agentic tasks.

Your turn

Here is the question I keep coming back to. For your own work, what matters more right now: a model that gives a smarter single answer, or one that can run a long task unattended without drifting? Those are not the same model, and the answer says a lot about what you are actually building.

And if you have run Opus 4.8 on a real job: did the "works longer" and "catches its own flaws" claims hold up, or fall apart? Real reports from real repositories are worth more than any benchmark.

Drop your thoughts in the comments. I read and reply to all of them.

Sources:

Leave A Comment