What Is a TPU?
The TPU (Tensor Processing Unit) is a proprietary ASIC (application-specific integrated circuit) that Google designed to accelerate its own neural network inference and training. Its architecture strips away features of general-purpose GPUs that are unrelated to that goal, such as variable pipelines and ray tracing, and goes all-in on matrix multiplication (MatMul) and reduction operations. The first-generation TPU was deployed internally in 2015, and CEO Sundar Pichai first publicly revealed its existence at Google I/O in 2016. Since then, Google has extended the TPU to training by equipping the v2 with HBM, introduced liquid cooling in the v3, established a 3D torus fabric using optical circuit switches (OCS) in the v4 and v5, and pursued both large-scale training and high-speed inference with the sixth-generation "Trillium" and seventh-generation "Ironwood."
The design is characterized by a systolic array called the matrix unit (MXU), ultra-wide-bandwidth memory provided by HBM, and a "scale-up fabric" that treats an entire pod as a single logical machine. Whereas Nvidia GPUs bundle individual nodes together via NVLink and InfiniBand, the TPU embraces a philosophy of enlarging a hardware-coherent shared memory space so that a single job can be housed in its entirety. Its biggest difference from other companies' ASICs is that it operates as one with Google's own software stack, including JAX and Pathways. Dylan Patel of SemiAnalysis describes this as "an advantage at the system-architecture level rather than the microarchitecture level," and positions it as the source of Google Cloud's structural total-cost-of-ownership advantage over Microsoft Azure and Amazon EC2.
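To make the systolic-array idea concrete, here is a minimal Python sketch of a generic output-stationary array, in which each processing element (PE) accumulates one output value as operands arrive in a skewed wavefront. It is a textbook-style illustration only: the function name `systolic_matmul` and the chosen dataflow are assumptions made for this example and say nothing about how Google's MXU is actually wired.

```python
def systolic_matmul(a, b):
    """Simulate C = A x B on a hypothetical output-stationary systolic array.

    PE (i, j) holds the accumulator for C[i][j]. Row i of A enters from the
    left delayed by i cycles and column j of B enters from the top delayed by
    j cycles, so at cycle t the PE sees the operand pair with index k = t - i - j.
    """
    m, k_dim, n = len(a), len(b), len(b[0])
    acc = [[0] * n for _ in range(m)]
    # The last PE (m-1, n-1) consumes its final operand pair at cycle
    # (m - 1) + (n - 1) + (k_dim - 1), so this many cycles are needed in total.
    cycles = m + n + k_dim - 2
    for t in range(cycles):
        for i in range(m):
            for j in range(n):
                k = t - i - j
                if 0 <= k < k_dim:  # operands have reached this PE
                    acc[i][j] += a[i][k] * b[k][j]
    return acc, cycles


if __name__ == "__main__":
    result, cycles = systolic_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]])
    print(result, cycles)
```

The point of the sketch is that every PE performs one multiply-accumulate per cycle using data handed over by its neighbors, which is why this kind of array can keep arithmetic units busy without each one fetching operands from memory.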
The Impact of TPU 8t and TPU 8i — The "Inflection Point" Arriving at the 8th Generation
The biggest talking point of the 8th generation is that Google has, for the first time, split the TPU line into two chips. The training-focused "TPU 8t" (internal codename Sunfish), whose design is led by Broadcom, features a superpod composed of 9,600 chips with 2 petabytes of shared HBM and 121 ExaFLOPs (FP4), raising training price-performance by up to 2.8x versus Ironwood. Meanwhile, the "TPU 8i" (codename Zebrafish), which targets inference and inference-time reasoning and is designed by new partner MediaTek, packs 288GB of HBM and 384MB of on-chip SRAM (3x the previous generation) into a pod composed of 1,152 chips, improving inference price-performance by 80% versus Ironwood. Both are said to still trail Nvidia's Vera Rubin R200 and AMD's MI455X in absolute per-chip compute performance by a margin of roughly 3 to 1, but Google argues that at the pod level, and ultimately at the data center level, they can match or beat them on total cost of ownership and throughput.
The core of the shock lies in three points. First, Google has effectively abandoned the concept of a "general-purpose AI chip." HyperFRAME Research characterized this as "an implicit confession that the load profiles of pretraining and mass-parallel agentic inference have diverged too far," noting that Google has steered toward specialization rather than hybrid optimization. Second, Broadcom's monopoly has collapsed with MediaTek's entry, and a group of analysts led by Bank of America's Vivek Arya estimates that this will push the ASP per TPU upward from the previous $5,000–$6,000 (about ¥770,000–¥930,000) to $12,000–$15,000 (about ¥1.86M–¥2.32M). Third, Anthropic has been positioned as the largest customer, using up to 1 million chips, while Meta, Apple (for Siri inference), Citadel Securities, the 17 U.S. Department of Energy national laboratories, and even OpenAI have begun securing TPU capacity. The simultaneous progression of these three movements (specialization, dual sourcing, and expanded external sales) is what elevates Cloud Next 2026 from a "mere annual event" into a "structural inflection point in the AI infrastructure market."
Technical Deep Dive — Boardfly and Innovations in Fabric Design
The TPU 8t inherits the conventional 3D torus while introducing FP4-native arithmetic and TPUDirect RDMA. It delivers 12.6 FP4 PFLOPs per chip, fed from 216GB of HBM3e at 6,528GB/s of bandwidth. Of particular note are the upgrade of the ICI (Inter-Chip Interconnect) to 19.2Tbps and the 10x IO acceleration via storage-attached TPUDirect Storage, which underpin Google's claim of shortening the training cycle for a single job from months to weeks. Furthermore, at the fabric layer, the new-generation "Virgo Network" connects more than 134,000 TPU 8t chips with 47 petabits per second of bidirectional bisection bandwidth, and combined with Pathways, the design enables the construction of a single training cluster on the scale of one million chips. The ability to sustain a "goodput" of 97% utilization in combination with optical circuit switches (OCS) is also highly valuable for foundation model development, which demands long-duration continuous training.
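As a back-of-the-envelope check, the per-chip figures quoted here can be multiplied out to the superpod totals cited earlier (9,600 chips, about 121 FP4 ExaFLOPs, about 2 petabytes of HBM). The Python sketch below does that arithmetic and also derives a roofline-style "ridge point," the number of FP4 operations a kernel would need to perform per byte read from HBM before compute rather than memory becomes the limit. The function names are hypothetical, decimal units (1 PB = 1,000,000 GB) are an assumption, and the ridge point is a derived estimate, not a figure Google has published.

```python
# Figures quoted in the article for TPU 8t.
CHIPS_PER_SUPERPOD = 9_600
FP4_PFLOPS_PER_CHIP = 12.6
HBM_GB_PER_CHIP = 216
HBM_BANDWIDTH_GB_PER_S = 6_528


def superpod_totals(chips, pflops_per_chip, hbm_gb_per_chip):
    """Return (ExaFLOPs, petabytes of HBM), assuming decimal unit prefixes."""
    exaflops = chips * pflops_per_chip / 1_000      # 1 EFLOP = 1,000 PFLOPs
    hbm_pb = chips * hbm_gb_per_chip / 1_000_000    # 1 PB = 1,000,000 GB
    return exaflops, hbm_pb


def ridge_point(pflops_per_chip, bandwidth_gb_per_s):
    """FLOPs per HBM byte at which peak compute and peak bandwidth balance."""
    return (pflops_per_chip * 1e15) / (bandwidth_gb_per_s * 1e9)


if __name__ == "__main__":
    exaflops, hbm_pb = superpod_totals(
        CHIPS_PER_SUPERPOD, FP4_PFLOPS_PER_CHIP, HBM_GB_PER_CHIP
    )
    print(f"{exaflops:.2f} EFLOPs, {hbm_pb:.2f} PB of HBM")
    print(f"{ridge_point(FP4_PFLOPS_PER_CHIP, HBM_BANDWIDTH_GB_PER_S):.0f} FLOPs per byte")
```

Under these assumptions the totals come to roughly 121 ExaFLOPs and 2.07 petabytes, consistent with the rounded superpod figures, and the ridge point lands at roughly 1,900 FP4 operations per byte, which suggests how heavily such a chip depends on data reuse in on-chip memory.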
The design of the TPU 8i goes even further. The largest structural change is the abandonment of the 3D torus in favor of a new topology called "Boardfly," inspired by high-radix research from 2008. At a domain scale of 1,024 chips, the farthest communication took 16 hops on the 3D torus but takes 7 hops on Boardfly, a 56% reduction in network diameter. This carries decisive significance for workloads requiring unpredictable all-to-all communication, such as Mixture-of-Experts models and inference-time reasoning (chain-of-thought). In addition, by entirely removing Ironwood's SparseCore block and replacing it with a newly established Collectives Acceleration Engine (CAE) on the core chiplet die, Google has reduced the latency of on-chip collectives in autoregressive decoding by up to a factor of five. Patrick Moorhead has described this as "the right bet for the agentic era, optimizing for latency rather than bandwidth." Moreover, both chips adopt Google's proprietary Arm-based "Axion" as the host CPU and, combined with fourth-generation liquid cooling, raise the per-rack thermal density while doubling performance per watt compared to the previous generation. The manufacturing node is said to be TSMC's 2nm-class process, but Google has not officially confirmed this, and some believe it belongs to the TSMC N3 series, so this point should be treated with caution.
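The hop counts can be sanity-checked with a few lines of Python. In a torus, each dimension is a ring with wraparound links, so the farthest node along a ring of size d is d // 2 hops away and the network diameter is the sum of those values across dimensions. The sketch below assumes an 8 x 8 x 16 arrangement, which is one way to factor 1,024 chips and happens to reproduce the 16-hop figure; the article does not state the actual torus shape, and the 7-hop Boardfly figure is simply taken as quoted.

```python
import math


def torus_diameter(dims):
    """Worst-case hop count in a torus: each ring of size d adds d // 2 hops."""
    return sum(d // 2 for d in dims)


def relative_reduction(old_hops, new_hops):
    """Fractional reduction in hop count when moving from old to new."""
    return (old_hops - new_hops) / old_hops


ASSUMED_TORUS_DIMS = (8, 8, 16)  # assumption: one factorization of 1,024 chips
BOARDFLY_HOPS = 7                # figure quoted in the article

if __name__ == "__main__":
    assert math.prod(ASSUMED_TORUS_DIMS) == 1024
    torus_hops = torus_diameter(ASSUMED_TORUS_DIMS)
    print(torus_hops)                                     # 16
    print(relative_reduction(torus_hops, BOARDFLY_HOPS))  # 0.5625
```

Other factorizations give different diameters (a 4 x 16 x 16 torus, for example, would be 18 hops), so the 56% figure should be read as specific to whatever configuration Google used for the comparison.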
Silicon Valley VCs' Take — "The Knife Has Been Drawn"
Major Silicon Valley VCs are receiving the TPU 8t / 8i announcement as an event that accelerates "the transition from a future where Nvidia takes 99% of the market to one where it takes 80%." Under the "Theory of Well" thesis led by Andreessen Horowitz partner Anjney Midha, the most durable value in the AI stack accumulates not in applications but in the "well," the infrastructure layer that controls chokepoints. a16z raised a total of $15 billion (approximately ¥2.3 trillion) in 2025 and announced it would allocate $1.7 billion (approximately ¥260 billion) of that to AI infrastructure. A recent memo from the firm lays out the view that "Google pushing its proprietary TPUs, Amazon pushing Trainium / Inferentia, and Microsoft pushing Maia is a war to defend their well positions to the death, and startups should not charge head-on into it." In other words, a16z is reading the arrival of the TPU 8t / 8i as a signal to reconfirm where its own portfolio should not be placing bets.
Sequoia Capital and Founders Fund have refrained from official comment, but industry media coverage reports that both firms are shifting their investment decisions on foundation model companies such as Anthropic, xAI, Cohere, and Mistral toward a framework that depends heavily on "accessible compute capacity and its pricing curve." On April 24, 2026, Anthropic received up to $40 billion (approximately ¥6.2 trillion) in additional investment and 5 gigawatts of TPU capacity from Google, bringing its post-money valuation to $350 billion (approximately ¥54 trillion). Immediately afterward, it also signed a 5GW agreement with AWS, securing a combined 10GW of compute capacity, and the unrealized gains from the round Sequoia led in 2025 are expanding rapidly. Within the $3.5 billion (approximately ¥540 billion) AI fund Kleiner Perkins announced in March 2026, moves to explore participation opportunities in new neoclouds centered on TPU 8t (such as the Blackstone-Google joint venture) have also been reported.
The most symbolic move came on May 19, 2026, when Blackstone announced that it had committed $5 billion (approximately ¥780 billion) in equity to a joint venture with Google and would bring a 500MW TPU-based data center online in 2027. Strictly speaking, this is a private equity move rather than a VC one, but it was also the moment when the Silicon Valley VC community recognized that "for the first time, a TPU-based neocloud has emerged as a counterweight in a world that had been dominated entirely by Nvidia-based neoclouds." Multiple VC partners have said anonymously that "with the TPU 8t / 8i announcement, the era of seriously conducting due diligence on alternatives to Nvidia has finally arrived," and the announcement is serving as a catalyst that reinforces the Silicon Valley VC investment theme of "the decentralization of compute access."
Reporting Stance of Each Newspaper and Website
In an article dated April 22, Bloomberg's Ian King positioned the TPU 8t / 8i as "the most serious challenge yet to Nvidia's stronghold," covering the 5GW contract with Anthropic and the Blackstone JV announcement as a package, and summarized that "Wall Street understood for the first time that the AI chip race is no longer a one-horse race." Reuters took a more cautious tone, emphasizing the fact that Google itself still offers Nvidia GPU instances (Vera Rubin NVL72) on the same Virgo fabric, cautioning that it is "complementary, not a complete replacement." The Wall Street Journal focused on the division-of-labor structure between Broadcom and MediaTek, reporting that the Wall Street average price target for Broadcom stock had been raised to $478 (approx. ¥74,000), and that Morgan Stanley's Brian Nowak raised his price target on April 23 from $235 (approx. ¥36,000) to $258 (approx. ¥40,000).
The tone of technology-specialist media is somewhat different. Tom's Hardware presented detailed tables under the frame that "while the chip alone is inferior to Nvidia, the total cost of ownership at scale flips the equation," and SemiAnalysis's Dylan Patel also wrote in his newsletter that "microarchitecture is only a small part of the true cost of AI infrastructure; system architecture and deployment flexibility are the real essence." Stratechery's Ben Thompson published an exclusive interview with Google Cloud CEO Thomas Kurian, evaluating that "ten years of accumulation in which Google has honed itself as its own top user (customer zero) have finally crystallized into a product that can be sold externally." Meanwhile, Patrick Moorhead of Moor Insights & Strategy framed it as "TPU is not 'confronting' Nvidia, but competing at the system level like Apple Silicon," presenting a cautious view that definitive judgments should be avoided until peer-reviewed third-party benchmarks (MLPerf, InferenceMax) are released.
In Japan, Nikkei Shimbun, ASCII, HelenTech, GIGAZINE, AI Research Institute, and others uniformly took up the structural point of "separate chips for training and inference" along with the figures "2.8x / 80% vs. Ironwood" and "2x per watt," with ASCII running Google's claim that "frontier model development is shortened from months to weeks" directly as a headline. GIGAZINE emphasized "2x performance per watt," suggesting that energy constraints will become the next axis of competition. AI Revolution Inc. and others reinforced the division-of-labor structure of Broadcom = Sunfish, MediaTek = Zebrafish, and the perspective that this will generate new tightness around TSMC CoWoS production capacity.
Customers and the Demand Curve — "Our Own Servers Aren't Reaching Our In-House Researchers"
The demand curve for the 8th-generation TPU is unlike anything seen in previous generations. The largest customer, Anthropic, has secured up to 1 million chips and 5 GW of compute capacity in a new contract with Google, with the total expected to reach 10 GW when its additional contract with AWS is included. Anthropic CFO Krishna Rao has publicly stated, "We are targeting annual revenue of $30 billion (approximately ¥4.6 trillion) by 2027," and the 8th-generation TPU is being counted on as the backing for that goal. In February 2026, Meta signed a multibillion-dollar, multiyear contract with Google, and reports indicate that it will secure 500,000 to 800,000 chips by 2027. Apple has adopted TPUs for the Gemini-based backend of Siri and is expected to spend roughly $1 billion (approximately ¥155 billion) annually. Citadel Securities has adopted TPUs for its quantitative research software, and 17 U.S. Department of Energy national laboratories are building a scientific AI platform called "AI Co-Scientist" on TPUs. Recent reports indicate that even OpenAI has begun securing some TPU capacity.
As evidence of excess demand, TheNextWeb has reported that "Google has given Anthropic priority access even to TPUs intended for in-house researchers, resulting in a situation where its internal Research team is stuck waiting in line for TPUs." Bank of America, taking into account the expansion of external sales and the full-scale rollout of Gemini 3, projects that Broadcom's AI semiconductor revenue could more than double year-over-year for full-year 2026 and aim for the $100 billion (approximately ¥15.5 trillion) range in 2027. Big Tech's total AI infrastructure investment for 2026 is estimated to exceed $800 billion (approximately ¥124 trillion), and a structural shift is beginning in which a certain proportion of that allocation moves from Nvidia GPUs to custom ASICs such as TPU, Trainium, and Maia.
The Standoff with Nvidia: How Jensen Huang Pushed Back
Nvidia CEO Jensen Huang, when asked about the 8th-generation TPU on Dwarkesh Patel's podcast, pushed back, saying, "Anthropic is a special case, not a trend. If you take Anthropic out of the picture, where is the source of TPU growth? It's 100% reliant on Anthropic." Huang also repeatedly goaded Google and Amazon, saying, "They should put up results on public benchmarks like MLPerf and InferenceMax," and stated, "No one has demonstrated a platform that beats Nvidia on performance per total cost of ownership." Among analysts, however, estimates from IDC and Bernstein that run counter to Huang's bullish remarks are gaining traction. They suggest that Nvidia's inference market share could fall from over 90% today to 20–30% by 2028, and the threat of custom ASICs in the inference market has entered a phase that can no longer be ignored.
That said, Google itself has not declared "all-out war on Nvidia." At Cloud Next 2026, it was revealed that Nvidia Vera Rubin NVL72 instances would be sold alongside TPUs on the same Virgo fabric, and Google Cloud CEO Thomas Kurian emphasized, "Expanding customer choice is the top priority, and Nvidia remains an important partner." Among Silicon Valley VCs, the dominant framing has become "not a binary choice between Nvidia and TPU, but a multi-accelerator era in which the optimal silicon is selected for each workload." The fact that Google is not selling TPUs outright to external buyers and is keeping access primarily via Google Cloud is also being interpreted as a signal that "there is no intent to destroy Nvidia's channel economics."
Points to Watch Over the Next 12 to 18 Months
The first point to watch is the precise timing of the general availability (GA) scheduled for the second half of 2026. Google has only stated "second half of 2026," and the timing could shift forward or backward depending on the ramp-up of TSMC's 2nm CoWoS production line, which holds the key to mass production capacity. While Morgan Stanley predicts that "MediaTek's Zebrafish will enter mass production in the second half of 2026 as scheduled," HyperFRAME Research notes that "full deployment will be in the latter half of 2027, when TSMC 2nm enters full-scale mass production." It is reasonable to view the gap between the two as the difference between a beta release and a full-scale GW-class deployment.
The second point to watch is the MLPerf v5.0 and InferenceMax rounds to be held in June–July 2026. As Huang has repeatedly demanded, the focus will be on whether Google publishes third-party benchmark results for the TPU 8t / 8i for the first time. If it does, the current argument that "it falls short of Nvidia in absolute performance but wins on cost efficiency" will be quantified. In parallel, the actual inference cost and throughput measured on the TPU 8i in connection with the release of Anthropic Claude 5 / Gemini 3 Pro have become the chief focus of the media and investors.
The third point to watch is the interim progress toward the 2027 launch of the first phase (500MW) of the Blackstone-Google JV, and the emergence of second and third TPU neoclouds following in its wake. Many Silicon Valley VCs are unearthing "TPU-based neoclouds" as a new investment theme, and attention is focused on whether operators will emerge who can replicate, on the TPU side, the rapid growth that CoreWeave and Lambda Labs have enjoyed on the Nvidia side. Furthermore, multiple sources have whispered that in the fall of 2026, third and fourth mega-customers other than Anthropic and Meta (such as OpenAI, Microsoft, or xAI) may publicly announce inference contracts based on the TPU 8i.
Finally, as a long-term point to watch, there is the announcement of "TPU 9" or an equivalent next-generation chip during 2027. Broadcom has a long-term contract with Google through 2031 and is expected to continue with design and supply, while MediaTek is also said to be securing production capacity equivalent to 120,000–150,000 CoWoS wafers in stages by 2027. Big Tech's 2026 AI capex of $800 billion (approximately ¥124 trillion) is partially underpinned by "the purchasing power of the TPU 8 generation," and in 2027 this is likely to expand into the territory of more than $1 trillion (more than ¥155 trillion). The true assessment of the 8th-generation TPU will be rendered from late 2026 through the first half of 2027, when it goes head-to-head with Nvidia's Vera Rubin Ultra generation, and this will be the milestone Silicon Valley VCs should watch most closely going forward.