Claude's compute crunch
The lab that built the most wanted model did not build the capacity to serve it to the masses
ON APRIL 16TH, Anthropic released Claude Opus 4.7 with a conspicuous denial. The company's blog post confirmed what users had been saying for weeks: Claude Opus 4.6 had, through a series of mostly undocumented changes, quietly become less capable on the hardest tasks. What Anthropic insisted was untrue was the explanation. The model had not been scaled back to redirect compute toward Mythos, its more powerful successor, or toward the ballooning subscriber base the company had acquired over the preceding six weeks. Opus 4.7, it argued, was simply better. The word that appeared in the release only as the thing being denied was the one users had been typing into Reddit threads since mid-March: compute.
That word is now the single most important one in the company's story. Anthropic began 2026 with roughly $9 billion in annualized revenue. By the end of March it had reached $30 billion, a figure the company publicly confirmed on April 6th — a 3.3-fold increase in a quarter, and a rate of ARR growth without close precedent at this scale. The proximate cause is well known. In late February the Pentagon blacklisted Anthropic as a "supply-chain risk" after the company refused to let Claude be used for mass domestic surveillance or fully autonomous weapons; OpenAI signed the contract Anthropic had refused, hours later; and an American consumer base that had spent two years treating the two labs as interchangeable suddenly decided they were not. ChatGPT uninstalls spiked 295 percent in a day, Claude topped the US App Store for the first time in its history, and by the end of March the company was serving roughly 18.9 million professional users — a base that had not existed in November.
Token resistance
But every one of those users arrived carrying a bill. Inference — the compute used to generate responses from a trained model — is the part of the AI stack that does not scale with cleverness. It scales with customers. And the customers Anthropic gained from the QuitGPT wave, a wave the company did not solicit and could not refuse, arrived on infrastructure it had not sized for them. On March 26th, Anthropic tightened peak-hour session limits during weekday afternoons and quietly reduced the default "effort" level for Claude Code, two changes that together produced the performance regression its heaviest users had been flagging for weeks. The usage limits on the $200-a-month Max tier could reportedly be exhausted in twenty minutes of active coding. Stella Laurenzo, a senior director in AMD's AI Group, published a quantitative analysis of nearly 18,000 thinking blocks across almost 7,000 Claude Code sessions. "Claude has regressed to the point it cannot be trusted to perform complex engineering," she wrote in the GitHub issue that accompanied her data. OpenAI's revenue chief, in a memo that surfaced in CNBC reporting, described Anthropic's compute posture as a "strategic misstep" that left the company "operating on a meaningfully smaller curve" than its rivals. Anthropic publicly said nothing of the kind; in the changelog, it said everything of the kind.
Opus 4.7's "adaptive thinking" is the version of that admission the company is willing to put on a product page. In Opus 4.6, developers could request that the model think for a fixed budget of tokens before answering — a lever that translated directly into inference cost per query. In 4.7, that lever is gone. The fixed-budget mode is deprecated. Thinking is off by default; callers who want it must opt in, and when thinking does happen, the reasoning content itself is now omitted from the response unless a separate flag is set. Anthropic describes the change as a latency improvement, which is true in the narrow sense that hidden work appears to take less of it. The broader framing is that the model has been given the intelligence to decide when to think and how hard, which is also true as far as it goes. The practical effect is that Anthropic, not the customer, now decides how many tokens get spent on each request, and the customer cannot see how many were spent until the monthly bill arrives. "Adaptive" is the correct word. It is adaptive to Anthropic's compute constraints.
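The economics of that shift can be sketched in a few lines. This is an illustration only: the price and the per-query token counts below are invented placeholders, not Anthropic's, and the two functions stand in for billing regimes, not for any real API.

```python
# Illustrative sketch: placeholder price, invented token counts.
PRICE_PER_1K_TOKENS = 0.015  # hypothetical dollars per thousand thinking tokens


def cost_fixed_budget(budget_tokens: int) -> float:
    """Opus 4.6 regime: the caller caps thinking, so worst-case
    spend per query is known before the request is sent."""
    return budget_tokens / 1000 * PRICE_PER_1K_TOKENS


def cost_adaptive(tokens_model_chose: int) -> float:
    """Opus 4.7 regime: the model picks its own effort; the caller
    learns the spend only when the bill arrives."""
    return tokens_model_chose / 1000 * PRICE_PER_1K_TOKENS


# Under a fixed 4,000-token budget, four queries cost at most:
worst_case = 4 * cost_fixed_budget(4000)

# Under adaptive thinking, each query costs whatever the model chose
# to spend; four invented per-query token counts:
adaptive_spend = sum(cost_adaptive(t) for t in (900, 12_000, 300, 7_500))
```

The point is not the numbers but who sets them: in the first regime the ceiling is the caller's choice; in the second it is the model's, and therefore Anthropic's.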
The release notes for Opus 4.7 supply a catalog of changes written in the language of user benefit that read more plausibly in the language of unit economics. Response length now "calibrates to perceived task complexity rather than defaulting to a fixed verbosity," which is to say Claude now says less by default. "Fewer tool calls by default, using reasoning more" — fewer round-trips to external systems, at the cost of reasoning tokens the company now controls. "More direct, opinionated tone with less validation-forward phrasing and fewer emoji" — fewer filler tokens. The new tokenizer, by Anthropic's own admission, consumes roughly 1 to 1.35 times as many tokens per unit of text as the old one — an increase of up to 35 percent in input costs that the company is quietly asking its API customers to absorb. Taken together, these changes read as a systematic effort to reduce serving cost per query, which is precisely what a compute-constrained company would do. The alternative would be to announce a capacity shortfall no frontier lab wants on record twice in the same quarter.
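The tokenizer change alone is easy to quantify. Assume a prompt the old tokenizer encoded in 10,000 tokens and a placeholder input price (the dollar figure below is invented, not Anthropic's); the worst end of the disclosed 1-to-1.35x range then works out as follows:

```python
# The 1.0-1.35x range is the one Anthropic disclosed; the price is a placeholder.
OLD_TOKENS = 10_000
SWING_HIGH = 1.35
PRICE_PER_1M_INPUT = 15.0  # hypothetical dollars per million input tokens


def input_cost(tokens: float, price_per_1m: float) -> float:
    """Dollar cost of a request's input tokens at a given per-million rate."""
    return tokens / 1_000_000 * price_per_1m


baseline = input_cost(OLD_TOKENS, PRICE_PER_1M_INPUT)            # old tokenizer
worst = input_cost(OLD_TOKENS * SWING_HIGH, PRICE_PER_1M_INPUT)  # new, worst case
extra_share = worst / baseline - 1                               # the 35% swing
```

Same text, same price per token; the customer pays up to 35 percent more because the text is now counted as more tokens.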
The ASIC gamble
The rest of the story unfolds on the hardware side, and it was made unusually explicit last week by the person best positioned to tell it. On April 15th, Nvidia's Jensen Huang sat with the podcaster Dwarkesh Patel for ninety minutes, mostly on autopilot. But pressed on why a non-trivial volume of frontier AI workload now runs on Google's TPUs and Amazon's Trainium chips rather than on his own, Huang was more candid than the occasion required. "Without Anthropic, why would there be any TPU growth at all? It's 100 percent Anthropic. Without Anthropic, why would there be Trainium growth at all? It's 100 percent Anthropic." The entire non-Nvidia training market, in Huang's telling, is one company, and he went on to explain why. Nvidia, a few years ago, had been unable to make the multi-billion-dollar equity investment that Anthropic needed to underwrite its first tranche of training compute. Google and Amazon could, and did. The offtake, as Huang put it, followed the capital. He called this his "miss" and said he does not intend to repeat it.
That miss has now become Anthropic's most distinctive feature. Claude today runs in production across three distinct silicon architectures — Nvidia GPUs, Google TPUs, and AWS Trainium — serving customer inference from the same underlying model on all three. The achievement is real: moving the same model across heterogeneous hardware requires a layer of compiler and kernel work that most labs do not bother with because they do not need to. The bulk of Anthropic's live serving capacity runs through AWS Project Rainier, the Indiana-based Trainium2 cluster that went operational in late 2025 and now houses more than one million custom chips dedicated to Anthropic workloads.
The comparison with peers is instructive. OpenAI runs primarily on Nvidia today. AMD's older MI300X silicon is already active in Azure OpenAI production for legacy GPT models, Microsoft's in-house Maia 200 accelerators serve a meaningful share of GPT-5.2 inference, and a pipeline of newer deployments — 6 gigawatts of AMD MI450, 10 gigawatts of bespoke Broadcom accelerators (codenamed "Titan"), 2 gigawatts of AWS Trainium3/4 and 750 megawatts of Cerebras inference capacity — is ramping toward scale through 2026 and 2027. xAI runs primarily on Nvidia, with a 200,000-GPU "Colossus" cluster in Memphis and a smaller AMD Instinct footprint. Meta, too, runs primarily on Nvidia: its MTIA 300 is already in production for Facebook and Instagram ranking and recommendation, AMD's MI300X is active for Llama inference, and both a Google TPU rental agreement and a multi-gigawatt, Broadcom-designed 2nm MTIA roadmap through 2029 are ramping toward deployment. Google, the other notable exception, runs its frontier Gemini models on its own Broadcom-manufactured TPUs by default because it co-designed them.
Anthropic is the only frontier lab running the same customer-facing model across three silicon stacks. Meta runs three architectures too, but spread across distinct workloads — training on Nvidia and AMD, ranking and recommendation on MTIA — not a single foundation model. The other labs are still ramping their second and third stacks toward deployment; AWS engineers, speaking to TechCrunch in late March about Trainium, said they "haven't had much chance to work with OpenAI yet."
What that silicon diversity buys Anthropic in resilience, it does not buy in scale. By Vector's tally of counterparties able to run customer inference today, Anthropic has roughly $71 billion of contracted serving capacity in the ground: Microsoft Azure ($30 billion), Google TPUs ($30 billion, described by Anthropic only as "tens of billions"), and AWS Project Rainier ($11 billion of Amazon's own capex, with no Anthropic-side headline dollar figure disclosed). OpenAI has roughly $272 billion live — Microsoft Azure ($250 billion) plus CoreWeave ($22 billion) — a ratio of roughly four to one. Layer in contracted capacity still being built, and the gap widens. Anthropic has announced a further $50 billion with Fluidstack for custom data centres in Texas and New York, the first of which comes online later this year. OpenAI has announced $300 billion with Oracle for 4.5 gigawatts of Stargate capacity from 2027, a $138 billion AWS expansion that includes 2 gigawatts of Trainium scaling through 2027, and a $10 billion Cerebras deal for 750 megawatts of inference capacity phased through 2028. Sum it all, and OpenAI's $720 billion of announced serving capacity exceeds Anthropic's $121 billion by a factor of roughly six.
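Those ratios can be reproduced directly from the dollar figures in the text (all in billions; the per-deal numbers are as reported, the sums and ratios plain arithmetic):

```python
# Live serving capacity, $bn, as itemised in the text.
anthropic_live = {"Microsoft Azure": 30, "Google TPUs": 30, "AWS Project Rainier": 11}
openai_live = {"Microsoft Azure": 250, "CoreWeave": 22}

# Announced but not yet built, $bn.
anthropic_announced = {"Fluidstack": 50}
openai_announced = {"Oracle Stargate": 300, "AWS expansion": 138, "Cerebras": 10}

live_a = sum(anthropic_live.values())                 # 71
live_o = sum(openai_live.values())                    # 272
total_a = live_a + sum(anthropic_announced.values())  # 121
total_o = live_o + sum(openai_announced.values())     # 720

live_ratio = live_o / live_a     # ~3.8, the "roughly four to one"
total_ratio = total_o / total_a  # ~6.0, the "factor of roughly six"
```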
These figures do not include the chip-partnership commitments that sit one layer deeper in OpenAI's stack — up to ten gigawatts of bespoke Broadcom accelerators, six gigawatts of AMD Instinct, and ten gigawatts of Nvidia systems tied to a matching equity investment. Those commitments, worth somewhere between $100 billion and $200 billion for Broadcom alone by analyst estimates and another $175 billion in announced AMD and Nvidia dollars, will eventually need to be deployed inside data centres someone is paying Oracle or AWS or Microsoft to operate. For Anthropic, no such secondary layer exists. What the company has contracted is what the company will run.
The financing structure is the thing Huang was describing when he said Nvidia had learned its lesson. OpenAI is not underwriting its $720 billion. Its counterparties are, in exchange for equity and offtake. Oracle's original $300 billion Stargate commitment has already lost a 600-megawatt Abilene expansion; Stargate UK was paused on April 9th and Stargate Norway was abandoned on April 15th and handed to Microsoft. In February, Sam Altman quietly reset OpenAI's 2030 infrastructure spend guidance from $1.4 trillion to roughly $600 billion. Even the halved version dwarfs what Anthropic can command.
Demand curse
Broadcom is slated to supply Anthropic with 3.5 gigawatts of Google-designed TPU capacity starting in 2027, a timeline that analysts cited by InfoWorld attribute to the same high-bandwidth memory (HBM) and CoWoS packaging constraints that bound Nvidia's output. The theoretical advantage of running on someone else's silicon — more chips, more vendors, more flexibility — vanishes in the quarter that matters, because the bottleneck is upstream of the chipmaker. Every subscriber the Pentagon fight sent to Claude was a subscriber whose inference had to fit inside the roughly $71 billion of serving capacity Anthropic actually controls today, not the $121 billion total it has on the books or the notional capacity that will come online in 2027.
There is a plausible counterargument to all of this, and on one important metric it holds up. Vector's arithmetic puts Anthropic at nearly five times OpenAI's ratio of ARR per dollar of live serving capacity — a superlative Anthropic has not claimed for itself, but one the numbers support. Thirty billion dollars of ARR on $71 billion of committed live inference infrastructure is, by any reasonable accounting, materially more efficient than a rival generating $24 billion on $272 billion. That efficiency is also the trap. Subscription pricing, which offered flat monthly revenue per user even as the heaviest Claude Code accounts burned through tokens that cost multiples of the subscription, had quietly subsidised the compute-intensive workflows that made Claude beloved in the first place. Efficient inference was not an unalloyed good. It was the mechanism by which Claude could be used enough, by a small cadre of power users, to become unaffordable.
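That efficiency claim is checkable with the same figures ($bn, from the text):

```python
# ARR and live serving capacity, $bn, as reported above.
anthropic_arr, anthropic_live_capacity = 30.0, 71.0
openai_arr, openai_live_capacity = 24.0, 272.0

# ARR generated per dollar of live serving capacity.
eff_anthropic = anthropic_arr / anthropic_live_capacity  # ~0.42
eff_openai = openai_arr / openai_live_capacity           # ~0.09

advantage = eff_anthropic / eff_openai                   # ~4.8x
```

The ratio comes out a shade under five.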
The company's response to that unaffordability — weekly caps, hour-restricted sessions, adaptive thinking, the aggressive policing of third-party tools that bypassed Claude Code's rate limits — is recognisable to anyone who watched the early cloud vendors ration capacity during the 2010s. It is also recognisable in a different register, as the beginning of a conversation about what kind of product Claude is. For the two years of its commercial life, Claude had been sold on the premise that the company with the best alignment research would produce the most useful model. That premise is still true. What has changed is that the premise did not price the downstream cost of being chosen by everyone who believed it. Anthropic's public position since the Pentagon blacklist has been that the fight was about values rather than market share. It was. The fight was also, in its unintended consequences, about the unit economics of inference. Those economics do not care whose values anyone holds.
Mythos versus arithmetic
The interesting question now is what version of Anthropic emerges at the other end of this squeeze. Mythos, the successor model the company released in restricted preview on April 7th, is larger, more capable, and — by every indication — substantially more expensive to serve. It is not being broadly released, officially because of its cybersecurity implications, which the company's own Project Glasswing has documented in a 244-page system card. It is also not being broadly released because Anthropic has no realistic path to running it at consumer scale until its Fluidstack build and its next TPU tranche come online. Opus 4.7, by contrast, is smaller, shorter by default, more literal, and less willing to spend tokens unprompted. The release notes describe it in the language of user benefit. In the language of compute, it is a model engineered to be cheaper to serve to the customers Anthropic now has, while the models engineered for the customers it wants are held back for a year.
Dario Amodei himself sketched the forward curve on Dwarkesh Patel's podcast on February 14th, describing an industry trajectory in which global AI compute grows from roughly 10 to 15 gigawatts in 2026 to something like 300 gigawatts in 2029, and placing Anthropic's own target at roughly 10 gigawatts by 2027 or 2028 — a fraction of the 30-plus gigawatts OpenAI has publicly targeted. If the Broadcom-manufactured TPUs ship on schedule and the Fluidstack data centres finish without incident, the squeeze eases and the inference tax becomes, in retrospect, a difficult quarter. If either slips, the company enters 2027 having to explain why the most valuable AI lab in America kept running out of capacity on the product its customers actually wanted to buy. Huang, for his part, was asked on the Dwarkesh podcast whether other frontier labs might yet migrate to ASICs the way Anthropic has. He answered that there is "only one Anthropic," by which he meant it is hard to find another company whose existential need for compute so far exceeded its ability to raise conventional venture capital that two hyperscalers were prepared to underwrite its entire training run. He is probably right. He may also be describing a company whose distinctiveness is about to become, for several uncomfortable quarters, its most visible liability.