Inference splinters the silicon stack
Two-thirds of AI compute now runs inference, a workload increasingly served by silicon Nvidia did not design
QUALCOMM, the largest maker of smartphone processors, climbed more than 15% in late trading on April 29th after telling investors that a "top hyperscaler" would begin using its silicon later this year. The buyer was not named; the news was that there was one at all. For two decades the firm has flirted with the data center, selling Centriq server chips before killing them and pursuing, then abandoning, an Arm-server pact with Microsoft along the way. What has changed is not Qualcomm but the workload.
The headline numbers were mixed. Sales of $10.6 billion in the quarter ending March 29th were down 3% from a year earlier, and profit excluding items came in at $2.65 a share, a beat. Guidance for the current quarter — $9.2 billion to $10 billion in revenue and $2.10 to $2.30 in earnings per share — fell short of analysts' expectations, the consequence of a memory crunch that has forced smartphone makers to cut volumes. To soften the blow, Qualcomm announced a stock-repurchase program of up to $20 billion. The shares rose anyway, on a single line in the earnings call from Cristiano Amon, the chief executive: that the new hyperscaler customer was committed to multiple generations of a custom ASIC, the kind of chip Broadcom and Marvell currently sell to Google, Meta and Amazon.
Specific gravity
But the deal reads better as a barometer than as a Qualcomm story. The products themselves are the AI200 and AI250, inference accelerators announced last October and packaged into rack-scale systems. The first named customer was Humain, an AI startup backed by Saudi Arabia's sovereign-wealth machinery; the unnamed hyperscaler is the second. An analyst day in June will detail the roadmap.