Hardware / T-2026-2588

Custom AI ASICs hit an inflection point: 27.8% of server shipments in 2026

Custom AI ASIC shipments are projected to grow 44.6% year-over-year in 2026, nearly triple the 16.1% growth rate for merchant GPUs.

Broadcom reported $8.4 billion in AI semiconductor revenue for Q1 FY2026, a 106% year-over-year increase, with a $73 billion AI backlog.

Google claims the total cost of ownership per Ironwood TPU chip is roughly 44% lower than that of a GB200 server, from its own procurement perspective.

Custom AI ASICs are projected to account for 27.8% of AI server shipments in 2026, and Broadcom alone carries a $73 billion AI backlog. The shift marks an inflection point for inference compute.

Tessera Newsroom · 7 min read · May 31, 2026

Source: The custom AI ASIC state of play (May 2026), Tom's Hardware (tomshardware.com)

The custom AI ASIC market has crossed an inflection point. ASIC-based AI server shipments are projected to reach 27.8% of the market in 2026, the highest share since 2023, with custom ASIC shipments growing 44.6% year-over-year — nearly triple the 16.1% growth rate projected for merchant GPUs. That is the headline from a comprehensive Tom’s Hardware survey of the custom silicon landscape published May 21.

The driving factor is not training but inference. Deloitte projects inference workloads will account for two-thirds of all AI compute this year. Custom ASICs are designed for exactly that: specific model architectures, specific batch sizes, specific latency targets. Nvidia still holds roughly 70% of the AI chip market, but that share is projected to erode as every major hyperscaler now designs its own silicon and deploys it at scale.

Broadcom is the invisible giant of this shift. The company reported $8.4 billion in AI semiconductor revenue for Q1 FY2026 (ending February 2026), a 106% year-over-year increase, and guided to $10.7 billion in Q2. CEO Hock Tan told investors Broadcom has “line of sight to achieve AI revenue from chips in excess of $100 billion in 2027,” backed by a disclosed $73 billion AI backlog. Broadcom has confirmed six major XPU customers, among them Google (its longest-standing partner, with seven generations of co-designed TPUs since 2014), OpenAI (which signed a multi-year collaboration in October 2025 for 10 gigawatts of custom accelerators), Meta, ByteDance, and Fujitsu. Analysts have identified Apple and Arm/SoftBank as potential future engagements.

Broadcom’s technical edge is its 3.5D XDSiP platform, using face-to-face 3D stacking via TSMC’s SoIC process combined with 2.5D CoWoS integration. The platform enables packages exceeding 6,000 mm² of silicon with up to 12 HBM stacks, far beyond the roughly 2,500 mm² limit of conventional 2.5D designs. In February, Broadcom announced it had begun shipping the industry’s first 2nm compute SoC built on this platform, integrating four N2 compute dies, one I/O die, and six HBM modules.

Google’s TPU program remains the most mature custom effort. The TPU v7, codenamed Ironwood and announced at Cloud Next in April 2025, delivers 4,614 FP8 TFLOPS per chip with 192 GB of HBM3E memory at 7.37 TB/s bandwidth. Each chip is manufactured on TSMC’s N3P process in a dual-chiplet design co-developed with Broadcom and MediaTek. The 9,216-chip superpod configuration delivers 42.5 FP8 exaflops with 1.77 PB of aggregate HBM.
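
The superpod totals follow directly from the per-chip numbers. Below is a minimal Python sketch of that arithmetic. It assumes decimal units (1 EFLOPS = 1,000,000 TFLOPS and 1 PB = 1,000,000 GB) and identical chips across the pod, and the helper name is illustrative only:

```python
# Hypothetical helper: scale per-chip specs to a pod of identical chips.
# Units assumed decimal: 1 EFLOPS = 1e6 TFLOPS, 1 PB = 1e6 GB.

def superpod_totals(chips: int, tflops_per_chip: float, hbm_gb_per_chip: float) -> tuple[float, float]:
    """Return (aggregate FP8 exaflops, aggregate HBM in petabytes)."""
    exaflops = chips * tflops_per_chip / 1e6
    petabytes = chips * hbm_gb_per_chip / 1e6
    return exaflops, petabytes


# Ironwood figures as quoted in the article.
exaflops, petabytes = superpod_totals(chips=9_216, tflops_per_chip=4_614, hbm_gb_per_chip=192)
print(f"Ironwood superpod: {exaflops:.2f} FP8 EFLOPS, {petabytes:.2f} PB HBM")
```

With the quoted inputs, it returns about 42.52 EFLOPS and 1.77 PB, which matches the published 42.5 exaflops and 1.77 PB.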

Per chip, Ironwood’s 4,614 TFLOPS sits close to Blackwell’s approximately 5,000 FP8 TFLOPS. But SemiAnalysis estimates that TPUs achieve higher sustained model FLOP utilization (roughly 90% for transformers versus 70% to 80% for GPUs), which narrows or erases the real-world performance gap. Google claims the total cost of ownership per Ironwood chip is roughly 44% lower than that of a GB200 server from its own procurement perspective.
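
The utilization argument comes down to multiplication: sustained throughput is roughly peak FLOPS times model FLOP utilization (MFU). The back-of-envelope sketch below assumes that the SemiAnalysis MFU estimates and the peak figures above can be multiplied directly. In practice, utilization varies with model, batch size, and parallelism strategy.

```python
# Sustained throughput ~= peak FP8 TFLOPS x model FLOP utilization (MFU).
# Peak and MFU values are the estimates quoted above; multiplying them
# directly is a simplifying assumption for illustration.

def effective_tflops(peak_tflops: float, mfu: float) -> float:
    """Sustained throughput delivered to the model, in TFLOPS."""
    if not 0.0 <= mfu <= 1.0:
        raise ValueError("MFU must be a fraction between 0 and 1")
    return peak_tflops * mfu


ironwood = effective_tflops(4_614, 0.90)
blackwell_low = effective_tflops(5_000, 0.70)
blackwell_high = effective_tflops(5_000, 0.80)

print(f"Ironwood:  {ironwood:,.0f} TFLOPS sustained")
print(f"Blackwell: {blackwell_low:,.0f} to {blackwell_high:,.0f} TFLOPS sustained")
```

On these assumptions, Ironwood sustains about 4,150 TFLOPS against 3,500 to 4,000 for Blackwell. That is how a lower peak can still come out ahead.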

That TCO advantage is driving adoption beyond Google’s own services. In October 2025, Anthropic committed to up to one million TPUs, the largest deal in Google Cloud history. Meta entered talks for multi-billion-dollar TPU deployments in February 2026. The previous-generation TPU v6e Trillium remains widely available on Google Cloud at $2.70 per chip-hour on demand. According to Google’s own benchmarks, it delivers roughly four times better price-performance than H100 instances for LLM workloads.

Amazon’s Trainium program is matching Google’s pace. Trainium3, which went generally available at re:Invent in December 2025, is AWS’s first 3nm chip. Each Trainium3 delivers 2.517 PFLOPS FP8 with 144GB HBM3E at 4.9 TB/s bandwidth, roughly double the compute and 1.5 times the memory of its predecessor. The new Trn3 UltraServer packs 144 chips delivering 362 FP8 petaflops with 20.7 TB of memory, a 4.4 times improvement over Trn2 UltraServers.
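
As with Ironwood, the UltraServer figures are the per-chip specs multiplied out. Below is a quick check in Python. It assumes decimal units (1 TB = 1,000 GB) and that the 20.7 TB figure counts only the chips' HBM:

```python
# Scale per-chip Trainium3 figures (as quoted above) to a 144-chip Trn3 UltraServer.
# Decimal units assumed: 1 TB = 1,000 GB.

CHIPS = 144
FP8_PFLOPS_PER_CHIP = 2.517
HBM_GB_PER_CHIP = 144

server_pflops = CHIPS * FP8_PFLOPS_PER_CHIP
server_memory_tb = CHIPS * HBM_GB_PER_CHIP / 1_000

print(f"Trn3 UltraServer: {server_pflops:.1f} FP8 PFLOPS, {server_memory_tb:.1f} TB HBM")
```

The results, about 362.4 PFLOPS and 20.7 TB, line up with the 362 petaflops and 20.7 TB quoted for the Trn3 UltraServer.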

AWS CEO Matt Garman said at re:Invent 2025 that the company had “already deployed more than 1 million Trainium processors” and was selling them as fast as production allowed. CEO Andy Jassy called it “already a multibillion-dollar business.” The Project Rainier facility in Indiana, an $11 billion, 2.2 GW campus, had roughly 500,000 Trainium2 chips running for Anthropic by October 2025. AWS also confirmed an OpenAI deal to supply 2 GW of Trainium computing capacity.

Trainium4 was announced in December 2025 for late 2026 or early 2027 availability, promising three times FP8 performance, six times FP4 throughput, and four times memory bandwidth over Trainium3, with an estimated 288 GB of memory. One notable feature: support for Nvidia NVLink Fusion, enabling hybrid clusters that mix Trainium and Nvidia GPUs.
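
Applying the announced multipliers to the Trainium3 specs above gives a rough sense of Trainium4. This sketch is back-of-envelope only. The outputs follow from the stated ratios and are not published Trainium4 specifications. The six-times FP4 claim is left out because no Trainium3 FP4 figure is given here.

```python
# Apply the announced Trainium4-over-Trainium3 multipliers to the Trainium3
# per-chip figures quoted above. Results are implied values, not official specs.

trn3 = {"fp8_pflops": 2.517, "hbm_tb_per_s": 4.9, "hbm_gb": 144}
multipliers = {"fp8_pflops": 3.0, "hbm_tb_per_s": 4.0}

trn4_implied = {key: trn3[key] * factor for key, factor in multipliers.items()}
trn4_implied["hbm_gb"] = 288  # estimated capacity quoted in the text, not a multiplier

for key, value in trn4_implied.items():
    print(f"{key}: {value:g} (Trainium3: {trn3[key]:g}, ratio {value / trn3[key]:.1f}x)")
```

That works out to roughly 7.55 PFLOPS FP8 and 19.6 TB/s per chip, with memory capacity doubling from 144 GB to the estimated 288 GB.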

Meta disclosed one of the most ambitious custom chip roadmaps in March, unveiling four new MTIA generations (300 through 500) for deployment through 2027, in addition to the already-shipping MTIA 100 and 200. The company has deployed hundreds of thousands of MTIA chips for inference across Facebook and Instagram. The MTIA 400 delivers 6 PFLOPS FP8 and 18 PFLOPS MX4 with 288GB HBM at 9.2 TB/s bandwidth in a 1,200W envelope. The MTIA 500, scheduled for 2027 mass deployment, scales to 10 PFLOPS FP8 and 30 PFLOPS MX4 with up to 512GB HBM at 27.6 TB/s in a 2x2 chiplet configuration, consuming 1,700W.
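
Comparing the two quoted MTIA generations shows where the gains are concentrated. The short sketch below computes MTIA 500 over MTIA 400 ratios. It treats the "up to 512GB" figure as the MTIA 500 capacity, which is an assumption. The bandwidth ratio does not depend on the unit, as long as both figures use the same one.

```python
# MTIA 500 vs MTIA 400 scaling, using the figures quoted above.
# "Up to 512GB" is treated as the MTIA 500 capacity (an assumption).

mtia_400 = {"fp8_pflops": 6, "mx4_pflops": 18, "hbm_gb": 288, "hbm_bandwidth": 9.2, "power_w": 1_200}
mtia_500 = {"fp8_pflops": 10, "mx4_pflops": 30, "hbm_gb": 512, "hbm_bandwidth": 27.6, "power_w": 1_700}

# Generation-over-generation multiple for each metric.
ratios = {key: mtia_500[key] / mtia_400[key] for key in mtia_400}

for key, ratio in ratios.items():
    print(f"{key:>13}: {ratio:.2f}x")
```

Memory bandwidth triples, while FP8 compute grows about 1.67 times and power about 1.42 times. That pattern fits the common observation that large-model inference, especially token generation, tends to be limited by memory bandwidth.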

Meta has been explicit that MTIA is not a replacement for Nvidia GPUs. The company expanded its Nvidia partnership in February for “millions of AI chips,” including Grace Blackwell and future Vera Rubin platforms, in a deal reportedly worth tens of billions of dollars. Custom silicon handles optimized inference at massive scale; Nvidia handles frontier model training. With $115 billion to $135 billion in 2026 capex guidance, Meta is buying everything it can from both sources.

Microsoft’s Maia 200, deployed in January, is manufactured on TSMC 3nm with over 140 billion transistors. The chip delivers more than 10 PFLOPS FP4 and 5 PFLOPS FP8 with 216GB HBM3E at 7 TB/s bandwidth in a 750W envelope. Microsoft claims it offers 30% better performance per dollar than the best hardware in its existing fleet and calls it “the most performant first-party silicon from any hyperscaler.” Maia 200 currently serves GPT-5.2 models for OpenAI and powers Microsoft 365 Copilot workloads from its Des Moines data center.
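
Power envelopes allow a rough efficiency comparison across the chips in this article that have both an FP8 figure and a wattage. The minimal sketch below is approximate for two reasons: vendors define rated power differently, and Maia 200's "5 PFLOPS FP8" (stated as a floor) is treated as exactly 5.

```python
# FP8 throughput per watt of rated power, from the figures quoted in this article.
# Rated-power definitions differ by vendor, so this is only a rough comparison.

chips = {
    "MTIA 400": (6_000, 1_200),   # (FP8 TFLOPS, rated watts)
    "MTIA 500": (10_000, 1_700),
    "Maia 200": (5_000, 750),     # "more than 5 PFLOPS" treated as exactly 5
}


def tflops_per_watt(tflops: float, watts: float) -> float:
    """FP8 TFLOPS delivered per watt of rated power."""
    return tflops / watts


# Print chips from most to least efficient.
for name, (tflops, watts) in sorted(chips.items(), key=lambda kv: tflops_per_watt(*kv[1]), reverse=True):
    print(f"{name}: {tflops_per_watt(tflops, watts):.2f} FP8 TFLOPS/W")
```

On this measure, Maia 200 (about 6.67 TFLOPS/W) ranks ahead of MTIA 500 (about 5.88) and MTIA 400 (5.00).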

The path to Maia 200 was not smooth. The original Maia 100, built on TSMC 5nm, was reportedly designed more for image processing than generative AI and never powered production AI services at scale. Maia 200 was delayed roughly six months due to design changes requested by OpenAI that caused simulation instability, plus chip team turnover. CEO Satya Nadella has emphasized that Microsoft will continue purchasing Nvidia and AMD chips alongside Maia.

Tesla’s Dojo project met a different fate. Despite years of development and an innovative D1 chip (TSMC 7nm, 50 billion transistors, 362 TFLOPS BF16, with a unique 354-core mesh architecture), Tesla disbanded the Dojo team in August 2025. Lead architect Peter Bannon departed, and roughly 20 engineers left to found DensityAI. Elon Musk explained that “once it became clear that all paths converged to AI6, I had to shut down Dojo.” Tesla is now focusing on AI5 and AI6 inference chips, with AI6 backed by a $16.5 billion Samsung fabrication deal, while relying on Nvidia hardware for current training needs.

TSMC is the indispensable enabler across all these efforts. The foundry generated $122.4 billion in 2025 revenue, up 36% year-over-year, and forecasts a 60% compound annual growth rate for AI chip revenue through 2029. Its CoWoS advanced packaging capacity is scaling from roughly 65,000-75,000 wafers per month in 2025 to a target of 120,000-130,000 wafers per month in 2026. Capital expenditure of up to $56 billion is planned for the year. The 2nm node entered mass production in late 2025, with capacity fully booked.
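
The growth figures compound quickly. This sketch takes 2025 as the base year for the "through 2029" forecast, which is an assumption because the base year is not stated. It also uses range midpoints for CoWoS capacity.

```python
# Compound growth implied by a 60% CAGR, plus the CoWoS monthly-capacity
# step-up using range midpoints. Base year 2025 is an assumption.

def compound_multiple(cagr: float, years: int) -> float:
    """Total growth multiple after `years` of compounding at rate `cagr`."""
    return (1 + cagr) ** years


ai_revenue_multiple = compound_multiple(0.60, 2029 - 2025)

cowos_2025 = (65_000 + 75_000) / 2    # wafers per month, midpoint of 2025 range
cowos_2026 = (120_000 + 130_000) / 2  # wafers per month, midpoint of 2026 target
cowos_step_up = cowos_2026 / cowos_2025

print(f"60% CAGR over 4 years: {ai_revenue_multiple:.2f}x")
print(f"CoWoS capacity, 2025 to 2026 target: {cowos_step_up:.2f}x")
```

A 60% CAGR over four years is about a 6.55-fold increase. The CoWoS midpoints imply capacity growing roughly 1.79 times in a single year.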

Nvidia has secured roughly 60% of CoWoS allocation (about 595,000 wafers), Broadcom about 15% (about 150,000 wafers), and AMD approximately 11% (about 105,000 wafers). Every custom ASIC in this article depends on CoWoS packaging, or its CoWoS-L variant, for HBM integration.
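
The share and wafer figures can be cross-checked by dividing each customer's wafers by its share. Here is a minimal sketch that uses only the numbers above:

```python
# Back out the total CoWoS wafer pool implied by each customer's quoted
# share and wafer count, and the share left for everyone else.

allocations = {
    "Nvidia": (0.60, 595_000),    # (share of CoWoS allocation, wafers)
    "Broadcom": (0.15, 150_000),
    "AMD": (0.11, 105_000),
}

implied_totals = {name: wafers / share for name, (share, wafers) in allocations.items()}
remaining_share = 1 - sum(share for share, _ in allocations.values())

for name, total in implied_totals.items():
    print(f"{name}: implies ~{total:,.0f} wafers in total")
print(f"Share left for all other customers: {remaining_share:.0%}")
```

Each customer implies a total pool of roughly 955,000 to 1,000,000 wafers, so the figures are broadly consistent with one another. About 14% of capacity remains for all other customers.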

What this means for AI builders is straightforward: the compute landscape is diversifying faster than many expected. A year ago, the conventional wisdom held that Nvidia’s CUDA moat and software ecosystem would keep custom ASICs as niche players. That is no longer the case. Google’s TPU v7 is estimated to achieve higher sustained model FLOP utilization than Blackwell. Amazon has deployed over a million Trainium processors. Broadcom says it has line of sight to more than $100 billion in AI chip revenue in 2027.

The open question is whether the software ecosystem can keep pace. Custom ASICs require custom compilers, custom runtime optimizations, custom model partitions. Google’s TPU software stack is mature after seven generations. Amazon’s Neuron SDK is still catching up. Meta’s MTIA software is entirely internal. For AI builders, the practical implication is that portability across hardware backends is becoming a competitive advantage — and that the era of “one GPU architecture to rule them all” is ending.
