NVIDIA GPU vs Google TPU

The two massive pillars supporting the infrastructure of the AI industry, NVIDIA's GPUs and Google's TPUs, have entered a new phase of competition heading into 2026. NVIDIA's data center revenue reached $115.2 billion (approximately ¥17.28 trillion) in fiscal year 2025 (ending January 2025), with shipments of the Blackwell generation (B200/GB200) ramping up in earnest. The company holds an estimated 70–95% share of the AI training accelerator market, making it the overwhelmingly dominant force. Meanwhile, Google's TPU (Tensor Processing Unit), having evolved over a decade since the v1 announcement in 2016, has now reached its sixth generation, "Trillium." Google claims that TPU v5e delivers a 50% reduction in training costs and up to 2.5x improvement in inference cost efficiency compared to equivalent GPU instances.

AI startups such as Anthropic (Claude), Character.AI, Cohere, and Midjourney have taken note of the TPU's cost advantages and adopted it, while OpenAI, Meta, and xAI maintain a strategy of relying exclusively on NVIDIA GPUs. Sequoia Capital, in its "AI's $600B Question" report, flagged the risks of over-investment in GPUs, while a16z has positioned reducing dependence on NVIDIA as a key challenge for its portfolio companies.

Jim Keller (CEO of Tenstorrent, designer of AMD Zen and Apple A-series chips) has thrown down the gauntlet, asserting that "NVIDIA's moat is not as deep as people think," while David Patterson (Professor Emeritus at UC Berkeley, inventor of RISC, and Google Distinguished Engineer) argues for the structural superiority of domain-specific architectures exemplified by the TPU. Morgan Stanley sees NVIDIA retaining its advantage in the short term, while Goldman Sachs predicts that "custom chips will expand their market share over the medium term."
This article provides a comprehensive examination of the historical background and technical characteristics of GPUs and TPUs, cost-performance comparisons, the investment theses of Silicon Valley VCs, the views of prominent figures, and the trends that lie ahead.

The Origins of the GPU — From Gaming to AI Dominance

NVIDIA's path to becoming synonymous with AI semiconductors was shaped by the vision of one entrepreneur and several historic turning points.

In 1993, Jensen Huang, Chris Malachowsky, and Curtis Priem founded NVIDIA in Santa Clara, California. The company's initial business was graphics chips for PC gaming. NVIDIA launched the GeForce 256 in 1999 and coined the term "GPU (Graphics Processing Unit)." In the early 2000s, the company engaged in fierce market share battles with ATI (later AMD) in the gaming GPU market.

The first turning point came in 2006 with the release of CUDA (Compute Unified Device Architecture). CUDA provided a programming model that allowed the thousands of cores in GPUs—originally dedicated to rendering graphics—to be used for general-purpose parallel computation. By enabling GPU parallelism through C-like code, it attracted researchers in scientific computing and physics simulation. At that point, no one imagined CUDA would become the "moat" of the AI industry.

The second turning point was 2012's "AlexNet moment." Alex Krizhevsky, Ilya Sutskever, and Geoffrey Hinton dominated the ImageNet competition with "AlexNet," a convolutional neural network trained on two GTX 580 GPUs. This achievement—dramatically reducing image recognition error rates from 26% to 16%—became the starting point of the "deep learning revolution." Jensen Huang later described this moment as the "Big Bang" and made the decision to make AI the core future business of NVIDIA.

From that point on, NVIDIA rapidly evolved its data center GPUs. The 2017 Volta-generation Tesla V100 was the first to feature Tensor Cores dedicated to AI computation, achieving significant speedups through mixed-precision arithmetic (FP16/FP32). The 2020 Ampere-generation A100 achieved 312 TFLOPS (dense FP16/BF16 on Tensor Cores; the TF32 figure is half that without sparsity), and combined with surging AI demand during the COVID-19 pandemic, drove rapid growth in data center revenue. The 2022 Hopper-generation H100 featured the Transformer Engine with FP8 support and became the "standard" for training large GPT-class models. In 2024, NVIDIA announced the Blackwell-generation B200/GB200, a 208-billion-transistor chip that integrates two dies in a single package and achieves 20 PFLOPS with FP4 support. The GB200 NVL72 (a 72-GPU liquid-cooled rack) claims a 30x improvement in inference performance over the previous generation.

NVIDIA's fiscal year 2025 (ending January 2025) revenue reached $130.5 billion, up 114% year over year. Of that, the data center business accounted for $115.2 billion, representing 88% of total revenue. Its market capitalization has exceeded $3 trillion, making it one of the most valuable companies in the world.

The Origins of TPU — Why Google Built Its Own Chip

Google's motivation for developing the TPU was not pure technological ambition, but economic necessity.

In the early 2010s, the use of deep learning within Google expanded rapidly. Neural networks were embedded into every service — speech recognition, Google Translate, search ranking, YouTube recommendations. Google's internal estimates suggested that "if all users used voice search for just three minutes a day, the company would need to double its data center capacity." Continuing to purchase large quantities of NVIDIA GPUs was not sustainable, either in cost or supply.

Google's answer to this challenge was the Domain-Specific Architecture (DSA) — a proprietary chip designed specifically for neural network computation. Led by Jeff Dean (then head of Google Brain) and David Patterson (Professor Emeritus at UC Berkeley, inventor of RISC, and Google Distinguished Engineer from 2016), TPU v1 began operating internally at Google in 2015.

TPU v1 was an inference-only 8-bit integer arithmetic chip with a performance of 92 TOPS (INT8). In March 2016, when DeepMind's AlphaGo defeated Lee Sedol, TPU v1 was used for inference, bringing the chip to the world's attention.
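To make the idea of 8-bit integer inference concrete, the following is a minimal Python sketch of symmetric INT8 quantization applied to a dot product. It is not a description of TPU v1's actual datapath; the function names (`quantize_int8`, `int8_dot`), the per-vector max-abs scaling scheme, and the sample values are assumptions chosen for illustration.

```python
def quantize_int8(values):
    """Symmetric quantization: map floats to integer codes in [-127, 127].

    Returns the integer codes and the scale that maps them back to floats.
    The per-vector max-abs scale is an illustrative assumption.
    """
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    codes = [max(-127, min(127, round(v / scale))) for v in values]
    return codes, scale


def int8_dot(x, w):
    """Dot product computed on integer codes, rescaled to a float at the end."""
    qx, sx = quantize_int8(x)
    qw, sw = quantize_int8(w)
    # Integer multiply-accumulate; real hardware would use a wider accumulator.
    acc = sum(a * b for a, b in zip(qx, qw))
    return acc * sx * sw


if __name__ == "__main__":
    activations = [0.6, -1.0, 0.25]   # hypothetical sample values
    weights = [2.0, 0.5, -1.5]        # hypothetical sample values
    exact = sum(a * b for a, b in zip(activations, weights))
    approx = int8_dot(activations, weights)
    print(f"float result: {exact:.4f}, int8 result: {approx:.4f}")
```

The integer result differs slightly from the floating-point one, which is the precision given up in exchange for cheaper arithmetic.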

The design philosophy of the TPU is fundamentally different from that of the GPU. While GPUs aim for general-purpose parallel computation, TPUs adopt a Systolic Array structure optimized for matrix multiplication (GEMM). By maximizing data reuse, they achieve higher computational efficiency per watt. Additionally, the BFloat16 (Brain Float 16) format, which Google introduced ahead of the industry, significantly improves throughput at a slight cost to precision. BFloat16 was later adopted by NVIDIA GPUs (A100 and later) and Intel CPUs, becoming an industry standard.
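The data-reuse idea behind a systolic array can be illustrated with a small cycle-by-cycle simulation. The sketch below models an output-stationary array, one of several possible dataflows, chosen here for readability; it is not claimed to match the TPU's actual dataflow. The function name `systolic_matmul` and the register layout are assumptions for illustration.

```python
def systolic_matmul(A, B):
    """Multiply A (n x k) by B (k x m) on a simulated n x m grid of cells.

    Each cell (i, j) keeps one running sum. Values of A enter from the left
    edge and move right; values of B enter from the top edge and move down.
    Row i and column j are delayed by i and j cycles so that matching
    operands meet in the right cell at the right time.
    """
    n, k, m = len(A), len(B), len(B[0])
    acc = [[0] * m for _ in range(n)]      # stationary partial sums
    a_reg = [[0] * m for _ in range(n)]    # A values travelling right
    b_reg = [[0] * m for _ in range(n)]    # B values travelling down

    for t in range(n + m + k - 2):
        # Pass operands to the neighbouring cell (far edge first).
        for i in range(n):
            for j in range(m - 1, 0, -1):
                a_reg[i][j] = a_reg[i][j - 1]
        for j in range(m):
            for i in range(n - 1, 0, -1):
                b_reg[i][j] = b_reg[i - 1][j]
        # Inject skewed inputs at the left and top edges.
        for i in range(n):
            s = t - i
            a_reg[i][0] = A[i][s] if 0 <= s < k else 0
        for j in range(m):
            s = t - j
            b_reg[0][j] = B[s][j] if 0 <= s < k else 0
        # Every cell performs one multiply-accumulate per cycle.
        for i in range(n):
            for j in range(m):
                acc[i][j] += a_reg[i][j] * b_reg[i][j]
    return acc


if __name__ == "__main__":
    print(systolic_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))
```

Each element of A and B is read at the edge once and then handed from cell to cell, so it takes part in several multiply-accumulates without being fetched from memory again.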

The TPU has steadily advanced through generations. The v2 in 2017 added training support and HBM and became publicly available on Google Cloud. The v3 in 2018 introduced liquid cooling. The v4 in 2021 added SparseCore and optical circuit switching (OCS), with a 4,096-chip Pod delivering over 1 exaFLOPS. In 2023, the v5e (cost-efficiency focused) and v5p (performance focused, 8,960-chip Pod) were introduced. Then in 2024, the sixth-generation "Trillium" was announced, delivering 4.7x the peak compute performance per chip of the v5e and a 67% improvement in energy efficiency.

Technical Strengths and Weaknesses: Generality vs. Specialized Efficiency

When we organize the technical characteristics of GPUs and TPUs, the differences in their design philosophies become clear.

The strengths of NVIDIA GPUs are, first, versatility. They can handle all parallel computing workloads: not just AI training and inference, but also scientific computing, rendering, simulation, and cryptocurrency mining. Second is the scale of the CUDA ecosystem: an estimated 4 million or more CUDA developers; optimized libraries such as cuDNN, TensorRT, NCCL, and Triton; and first-class CUDA support in all major frameworks (PyTorch, TensorFlow, and JAX). This software foundation, built over more than 15 years, "cannot be replicated overnight" (Jensen Huang). Third, high-bandwidth inter-GPU communication via NVLink/NVSwitch (bidirectional 900 GB/s on the H100), combined with the integration of InfiniBand networking through the Mellanox acquisition (2019, $6.9 billion), achieves end-to-end optimization from chip to cluster.

On the other hand, the weaknesses of GPUs are equally clear. Power consumption reaches 700W for the H100 and over 1,000W for the B200, leading to enormous data center power and cooling costs. The price of a single H100 is approximately $25,000–$40,000, and a DGX H100 system (8 GPUs) exceeds $200,000. From 2023 to 2024, severe supply shortages resulted in lead times stretching to 6–12 months. And while dependence on CUDA serves as a "moat," it is simultaneously a "lock-in." The cost of migrating to other hardware is extremely high, and AMD's ROCm has yet to catch up to CUDA's maturity.

The strengths of TPUs lie above all in their cost-performance ratio (detailed in the next section). Their matrix-operation-specialized design yields high performance per watt. Trillium achieved a 67% improvement in energy efficiency compared to v5e. Direct chip-to-chip connectivity via ICI (Inter-Chip Interconnect) delivers low latency and high bandwidth comparable to NVLink, with Pod configurations of thousands of chips already proven at scale. Affinity with Google's JAX framework is extremely high, and Gemini training is conducted using the JAX + TPU combination.

The weaknesses of TPUs are that they are limited to Google Cloud (no on-premises purchase option), their ecosystem is smaller than CUDA's (PyTorch's TPU support tends to lag behind its CUDA counterpart), and there is a learning curve for TPU-specific optimizations (data pipeline design, sharding strategies).

In MLPerf benchmarks (organized by MLCommons, the industry standard for AI performance), NVIDIA records the highest performance in nearly all categories with Blackwell, while Google TPU v5p also achieves top-tier results in multiple categories. However, MLPerf ranks systems on raw performance and does not measure cost efficiency. The cost-performance ratio, the TPU's greatest strength, is structurally not reflected in MLPerf.

TPU Cost-Performance Ratio: A Noteworthy Structural Advantage

In the GPU vs TPU debate, the most overlooked yet most critical point is cost-performance ratio.

Google has consistently emphasized cost advantages with each generation of TPU announcements. At the TPU v5e launch (August 2023), it claimed up to 2x the training performance per dollar of v4 (roughly half the training cost) and up to 2.5x the inference performance per dollar; at the v5p launch (December 2023), "superior cost-performance ratio versus H100 for large-scale model training"; and at the Trillium launch (2024), a 4.7x increase in peak compute performance per chip over v5e.

While direct cloud pricing comparisons vary by configuration and region, rough estimates reveal the following picture. On Google Cloud, TPU v5e costs approximately $1.20/hour per chip (on-demand), dropping to around $0.50/hour with a 3-year commitment. By contrast, H100s on the same Google Cloud (A3 instances) run about $3.90/GPU/hour. AWS H100s (p5 instances) cost approximately $12.29/GPU/hour, while GPU clouds such as CoreWeave and Lambda run roughly $2.00–$2.50/hour.
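The hourly figures above can be put side by side with a few lines of Python. This is a rough sketch, not a benchmark: the prices are the approximate ones quoted in the text, the $2.25 entry is simply the midpoint of the quoted $2.00–$2.50 range, and the names `HOURLY_PRICE_USD`, `price_ratios`, and `cost_per_unit_of_work` are assumptions made for illustration. A single TPU v5e chip and a single H100 are not equivalent units of compute, so the raw per-chip ratios should not be read as cost-performance ratios.

```python
# Approximate hourly list prices quoted in the text (USD per chip or GPU).
HOURLY_PRICE_USD = {
    "TPU v5e, Google Cloud on-demand": 1.20,
    "TPU v5e, Google Cloud 3-year commitment": 0.50,
    "H100, Google Cloud A3": 3.90,
    "H100, AWS p5": 12.29,
    "H100, GPU clouds (midpoint of 2.00-2.50)": 2.25,
}


def price_ratios(prices, baseline):
    """Express every hourly price as a multiple of the baseline entry."""
    base = prices[baseline]
    return {name: price / base for name, price in prices.items()}


def cost_per_unit_of_work(hourly_price, relative_throughput):
    """Hourly price divided by throughput relative to a reference chip.

    relative_throughput is workload-specific and must be measured;
    any value passed in here is a hypothetical input, not a published figure.
    """
    if relative_throughput <= 0:
        raise ValueError("relative_throughput must be positive")
    return hourly_price / relative_throughput


if __name__ == "__main__":
    ratios = price_ratios(HOURLY_PRICE_USD, "TPU v5e, Google Cloud on-demand")
    for name, ratio in ratios.items():
        print(f"{name}: {ratio:.2f}x the on-demand TPU v5e price")
```

A fair comparison divides each price by the throughput actually measured on the workload in question, which is what `cost_per_unit_of_work` is meant to express.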

For LLM training cost comparisons, training a model at the scale of LLaMA 2 70B on a 2,048-H100 configuration (assuming AWS/Azure) costs approximately $2–3 million. Google claims that an equivalent TPU v5p configuration achieves 30–50% cost reduction, putting it at roughly $1–2 million. On a per-token inference cost basis, Google claims TPU v5e delivers up to 2.5x better cost efficiency compared to H100.

There are three structural reasons behind this cost advantage. First, TPUs' domain-specific design delivers better matrix operation efficiency per watt than GPUs — the efficiency gained by sacrificing generality is reflected in cost. Second, Google vertically integrates TPU design, manufacturing (outsourced to TSMC), and operations, eliminating the margins that arise when purchasing NVIDIA GPUs as a third party. Google's internal TPU costs are likely even lower than what external customers pay. Third, Google's data centers achieve world-class energy efficiency at around PUE 1.1, keeping power and cooling costs low.

That said, there are important caveats to cost comparisons. These include the difficulty of direct comparison (due to differing cloud pricing structures), the impact of optimization level (comparisons are only fair when code is optimized for each respective platform), and hidden costs (data transfer fees, engineering time, and the learning curve of migrating to TPUs). Additionally, since TPUs are exclusive to Google Cloud, they are not an option for companies pursuing multi-cloud strategies or on-premises operations.

Given the rapid escalation of AI training costs, from GPT-3 (estimated $4.6 million, 2020) to GPT-4 (estimated over $100 million, 2023) to next-generation models (estimated $500 million–$1 billion), differences in cost-performance ratio translate into tens to hundreds of millions of dollars in impact. This creates a compelling economic incentive to choose TPUs, particularly for startups that prioritize capital efficiency.

Corporate Infrastructure Choices: Why They Diverge

The choice of infrastructure for AI development varies greatly depending on a company's strategy, partnerships, and technical background.

OpenAI has a strategic partnership with Microsoft Azure, and the training of GPT-4 and GPT-4o is conducted on NVIDIA GPUs (an estimated tens of thousands to 100,000 H100s) on Azure. CEO Sam Altman has stated that "in the long term, a diverse range of AI-optimized chips will be needed," and in early 2024 Bloomberg reported that he envisioned raising $5–7 trillion to manufacture proprietary AI chips. Although this vision was never realized, it demonstrates a deep sense of urgency around GPU supply.

Meta has made its strategy of exclusively using NVIDIA GPUs clear. Mark Zuckerberg announced that Meta would secure approximately 350,000 H100s by the end of 2024, and LLaMA 3.1 405B was trained on an estimated 16,000+ H100s. The company is developing a custom inference chip called MTIA (with v2 delivering a 3x improvement in inference performance), but training remains centered on NVIDIA GPUs. For Meta, which champions open-source principles, compatibility with the CUDA ecosystem and PyTorch is the primary reason for choosing GPUs.

xAI (Elon Musk) goes even further. It has built "Colossus," one of the world's largest single GPU clusters, in Memphis, Tennessee, operating 100,000 H100s. Musk publicly declares that "GPUs are the new gold" and that "companies that can't secure enough GPUs cannot participate in the AI race." Tesla developed a proprietary AI chip called Dojo (D1), but the company ultimately increased its investment in NVIDIA GPUs sharply and effectively scaled back the Dojo project in 2024, a symbolic case illustrating the difficulty of developing proprietary chips.

On the other hand, the number of startups choosing TPUs is steadily increasing. Anthropic, backed by over $2 billion in investment from Google (2023), trains Claude on Google Cloud TPUs, while also adopting a hybrid strategy that combines GPU/Trainium on AWS through a $4 billion investment from Amazon. Character.AI (founded by Google Brain alumni Noam Shazeer and Daniel De Freitas) handles millions of daily user conversations using TPU v4/v5e, selecting them primarily for cost efficiency in large-scale inference. Cohere uses both TPUs and GPUs to enable multi-cloud support. Midjourney initially used Google Cloud TPUs to train its image generation models.

Google/DeepMind itself is naturally TPU-centric. Gemini was trained on TPU v5p, PaLM 2 on TPU v4 Pods, and AlphaFold was also run on TPUs. However, Google Cloud also offers NVIDIA H100/A100s to customers, signaling a "provide options" stance. The majority of AI inference workloads within Google—Search, YouTube, Gmail, Google Translate, and Gemini—are said to run on TPUs.

Silicon Valley VC Perspective — The Sustainability of NVIDIA's Dominance and Alternative Scenarios

Silicon Valley VCs view the GPU vs. TPU debate not as a "chip performance comparison" but as a "structural risk in the AI industry."

Sequoia Capital highlighted in its mid-2024 report "AI's $600B Question" that NVIDIA's GPU revenues exceeded $50 billion while actual revenues of AI companies fell far short, suggesting that GPU/compute investment may be excessive and underscoring the importance of cost optimization through alternatives (TPUs, custom chips).

a16z (Andreessen Horowitz)'s Martin Casado and Matt Bornstein, in their 2023 analysis "Who Owns the Generative AI Platform?", noted that "AI startups' gross margins are lower than traditional SaaS companies due to GPU costs." a16z views the AI infrastructure layer (GPU/TPU) as a "tax" dominated by NVIDIA and Google, arguing that the greatest VC opportunity lies in the application layer, while keeping a close eye on "NVIDIA dependency risk" and the rise of custom silicon. Matt Bornstein predicts "2026 will be the year of AI agents," but also notes that optimizing underlying costs will be a matter of life or death for startups.

VC investment behavior reflects this awareness. As an "alternative" to NVIDIA's dominance, large investments are flowing into the following AI chip startups: Cerebras Systems (cumulative funding ~$700M / ~¥105B, wafer-scale chip WSE-3), Groq (cumulative funding ~$640M / ~¥96B, inference-specialized LPU), SambaNova Systems (cumulative funding ~$1.1B / ~¥165B, RDU), Tenstorrent (cumulative funding ~$300M / ~¥45B, RISC-V based under Jim Keller), and Etched (cumulative funding ~$120M / ~¥18B, Transformer-dedicated ASIC "Sohu").

The VC community's common understanding is organized along three time horizons. In the short term (1–3 years), NVIDIA's dominance is unshakeable — the CUDA moat is deep, and the Blackwell/Rubin generational updates are rapid. In the medium term (3–5 years), custom silicon (including TPUs) will expand its share, particularly in the inference market. In the long term (5+ years), a heterogeneous environment (mixed GPU + TPU + custom ASIC) is expected to become the standard.

Goldman Sachs, in its report "AI Infrastructure: The Next $1 Trillion Opportunity" (2024), named NVIDIA the near-term winner while positioning Google TPU and AWS Trainium as "the most viable alternatives." Morgan Stanley analyzed that "NVIDIA's moat lies not in hardware but in the CUDA ecosystem," and Bernstein Research's Stacy Rasgon, one of the most prominent analysts covering NVIDIA, maintained that "NVIDIA's competitive advantage will persist for the next several years," while also noting that in the long run, the rise of ASICs and custom chips could pressure gross margins.

Claims of Notable Figures — The GPU Camp vs. The TPU Camp

The GPU vs. TPU debate divides even prominent figures in Silicon Valley.

Jensen Huang (NVIDIA CEO) consistently argues that the versatility of GPUs provides a long-term advantage. "Chips specialized for specific workloads may be temporarily efficient, but AI models evolve rapidly. A versatile GPU platform is more advantageous in the long run." On CUDA, he states: "An install base of millions is an ecosystem built over more than 15 years; it cannot be replicated overnight." At GTC 2024, he declared, "The next industrial revolution has begun." NVIDIA's roadmap announces annual generational updates (Blackwell → Rubin, paired with the Vera CPU → Feynman), accelerating from the traditional two-year cycle.

David Patterson (UC Berkeley Professor Emeritus, Google Distinguished Engineer) is the most powerful advocate on the TPU side. Known in semiconductor design history as the inventor of RISC and RAID, he argued for TPU's superiority in his 2020 paper "A Domain-Specific Supercomputer for Training Deep Neural Networks," and in 2023 co-authored an ISCA paper with Jeff Dean revealing architectural details of TPU v4. He contends that "domain-specific architectures are orders of magnitude more efficient than general-purpose processors."

Jeff Dean (Google Chief Scientist) is the driving force behind TPU development. "The design philosophy of the TPU is to leverage the essential nature of neural network computation — maximizing throughput even at the cost of some precision," he says. As a believer in scaling laws, he frames the TPU as "a tool for economically realizing the scaling that is key to improving AI performance."

Yann LeCun (Meta Chief AI Scientist, NYU Professor) supports GPUs but brings a unique perspective. Meta's large-scale AI research (including the LLaMA series) is conducted entirely on NVIDIA GPUs. While he acknowledges that "general-purpose GPUs are evolving too fast for ASICs to keep up," he also recognizes the long-term importance of domain-specific chips. As an open-source advocate, he is concerned about excessive dependence on any single vendor.

Jim Keller (Tenstorrent CEO, designer of AMD Zen / Apple A-series / Tesla's self-driving chip) directly challenges NVIDIA. "NVIDIA's moat is not as deep as people think. If a good alternative emerges, migration will happen." He promotes an open RISC-V-based architecture and flatly states that "the GPU+CUDA model is not optimal."

Elon Musk has reached his conclusion in practice. While developing Tesla's proprietary AI chip Dojo, he ultimately purchased 100,000 NVIDIA H100s for xAI. His words — "GPUs are the new gold" — most succinctly capture the reality of NVIDIA's dominance.

Andrew Ng (Stanford Professor, Coursera co-founder) takes a pragmatic middle ground. As a pioneer of early GPU-based deep learning research, he states: "What you build matters more than which chip you use. That said, at this point in time, the GPU+CUDA ecosystem offers the highest productivity."

GPU vs TPU by the Numbers — Market Data and Investment Trends

The numbers in the AI semiconductor market reflect both NVIDIA's overwhelming dominance and the rise of forces challenging it.

NVIDIA's data center revenue expanded roughly eightfold in just two years: from $15 billion in fiscal year 2023 (ending January 2023), to $47.5 billion in fiscal year 2024, and $115.2 billion in fiscal year 2025. The company holds an estimated 70–95% share of the AI training accelerator market. a16z has described this revenue scale as a "tax on the AI industry."
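The growth figures above can be checked with simple arithmetic. The sketch below uses only the revenue numbers quoted in this article; the names `DATA_CENTER_REVENUE_BUSD`, `growth_multiples`, and `overall_multiple` are assumptions chosen for illustration.

```python
# NVIDIA data center revenue in billions of USD, as quoted in the text.
DATA_CENTER_REVENUE_BUSD = {"FY2023": 15.0, "FY2024": 47.5, "FY2025": 115.2}
TOTAL_REVENUE_FY2025_BUSD = 130.5  # total company revenue quoted earlier


def growth_multiples(series):
    """Year-over-year multiple for each consecutive pair of fiscal years."""
    years = list(series)
    return {
        f"{prev}->{cur}": series[cur] / series[prev]
        for prev, cur in zip(years, years[1:])
    }


def overall_multiple(series):
    """Multiple between the first and last entries of the series."""
    values = list(series.values())
    return values[-1] / values[0]


if __name__ == "__main__":
    for span, multiple in growth_multiples(DATA_CENTER_REVENUE_BUSD).items():
        print(f"{span}: {multiple:.2f}x")
    print(f"two-year multiple: {overall_multiple(DATA_CENTER_REVENUE_BUSD):.2f}x")
    share = DATA_CENTER_REVENUE_BUSD["FY2025"] / TOTAL_REVENUE_FY2025_BUSD
    print(f"data center share of FY2025 revenue: {share:.1%}")
```

The two-year multiple comes out a little under eight, consistent with the "roughly eightfold" description.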

AMD is closing the gap with its MI300X, targeting approximately $5 billion in AI accelerator sales for 2024. However, that figure is less than one-tenth of NVIDIA's scale, leaving AMD with only around a 5–15% market share.

Direct revenue figures for Google Cloud TPUs are not publicly disclosed. Alphabet reported approximately $43 billion in total Google Cloud revenue for full-year 2024 (up ~31% year-over-year), and the segment posted an operating profit. The number of companies using TPUs is said to exceed several hundred, though internal Google usage is overwhelmingly larger: the majority of inference workloads for Search, YouTube, Gmail, Google Translate, and Gemini run on TPUs.

Multiple research firms forecast the overall AI semiconductor market will reach approximately $70–80 billion in 2024 and $300–400 billion by 2030, representing annual growth of 20–30%.
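The relationship between the market-size endpoints and the annual growth rate can be sketched with the standard compound annual growth rate formula. The helper names `cagr` and `implied_growth_range`, and the choice to treat 2024 to 2030 as six compounding periods, are assumptions for illustration. Under these endpoints the implied rate comes out at roughly 25% to 34% per year, which overlaps the upper part of the 20–30% range quoted above.

```python
def cagr(start_value, end_value, years):
    """Compound annual growth rate between two values over `years` periods."""
    if start_value <= 0 or end_value <= 0 or years <= 0:
        raise ValueError("values and years must be positive")
    return (end_value / start_value) ** (1 / years) - 1


# Market-size endpoints quoted in the text, in billions of USD.
START_2024_BUSD = (70, 80)
END_2030_BUSD = (300, 400)
YEARS = 2030 - 2024


def implied_growth_range():
    """Lowest and highest CAGR across all combinations of the endpoints."""
    rates = [cagr(s, e, YEARS) for s in START_2024_BUSD for e in END_2030_BUSD]
    return min(rates), max(rates)


if __name__ == "__main__":
    low, high = implied_growth_range()
    print(f"implied annual growth: {low:.1%} to {high:.1%}")
```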

Capital expenditure by cloud providers is also surging. Sundar Pichai (CEO of Google/Alphabet) announced a capex plan of approximately $75 billion per year. Microsoft and Amazon are planning investments of similar scale. NVIDIA is the biggest beneficiary of this "AI infrastructure arms race," but investment in custom chip development by each company is also accelerating.

Soaring AI training costs are making cost efficiency increasingly critical. Training cost estimates have risen from approximately $4.6 million for GPT-3 (2020) to over $100 million for GPT-4 (2023), with next-generation models projected at $500 million to $1 billion. At that scale, the 30–50% cost reduction that TPUs' cost advantage can deliver translates to a difference of $150 million to $500 million.
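The savings range in the last sentence follows directly from multiplying the projected cost range by the reduction range. A minimal sketch, using only the figures quoted above; the function name `savings_range_usd` is an assumption for illustration, and integer arithmetic is used to avoid floating-point rounding on dollar amounts.

```python
def savings_range_usd(cost_low, cost_high, reduction_low_pct, reduction_high_pct):
    """Smallest and largest savings implied by a cost range and a reduction range.

    Costs are whole US dollars; reductions are whole percentages.
    """
    smallest = cost_low * reduction_low_pct // 100
    largest = cost_high * reduction_high_pct // 100
    return smallest, largest


if __name__ == "__main__":
    # Next-generation training cost estimate: $500 million to $1 billion.
    # Claimed TPU cost reduction: 30% to 50%.
    low, high = savings_range_usd(500_000_000, 1_000_000_000, 30, 50)
    print(f"savings between ${low:,} and ${high:,}")
```

The lower bound pairs the cheapest training run with the smallest reduction, and the upper bound pairs the most expensive run with the largest reduction.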

The Custom Silicon Wave: A Third Path Beyond GPUs and TPUs

In addition to the GPU vs. TPU binary, a third wave — "custom silicon" — is gaining momentum.

Amazon/AWS has deployed Trainium 2 (2024) to reduce dependence on NVIDIA. A large-scale Trainium cluster called "Project Rainier" is under construction for training Anthropic's next-generation models. The inference-optimized Inferentia 2 is also being rolled out.

Microsoft announced its first AI-dedicated chip, Maia 100, in November 2023. Combined with the Arm-based CPU Cobalt, it is being deployed for Azure, though its scale remains limited for now, with the NVIDIA partnership continuing as the primary axis.

Meta achieved a 3x improvement in inference performance with MTIA v2. However, training remains centered on NVIDIA GPUs, with MTIA focused specifically on cost optimization for inference.

Apple runs on-device AI inference on its own chips via Apple Silicon (M-series), but uses NVIDIA GPUs for data center training.

Alongside these moves, startup challengers continue to emerge: Cerebras (wafer-scale chips), Groq (inference-dedicated LPU, ultra-low latency), Tenstorrent (RISC-V based, led by Jim Keller), and Etched (Transformer-dedicated ASIC) — each taking a different approach to challenge NVIDIA's stronghold.

The AI Index Report 2024 from Stanford HAI (the Stanford Institute for Human-Centered Artificial Intelligence) warns that compute costs are becoming a bottleneck for AI research, and that the disparity in GPU/TPU access is impeding the "democratization of AI research."

Future Trends — Toward a Heterogeneous Future

The GPU vs TPU competition will most likely converge toward a heterogeneous environment (a mix of diverse chips) rather than ending with one side winning outright.

NVIDIA's roadmap is accelerating. The company has declared a shift from its traditional two-year cycle to a one-year cycle: Blackwell (2024–2025) → Rubin (2026, HBM4, next-gen NVLink, paired with the Vera CPU) → Feynman (2028). Beyond raw per-chip performance gains, the trend is toward an integrated platform encompassing NVLink, NVSwitch, Spectrum-X Ethernet, and software (NIM, NeMo).

Google continues its generational updates as well. The successor to Trillium (v6) is expected on an 18–24 month cadence. Integration with its proprietary CPU "Axion" (Arm-based, announced 2024) is also advancing, sketching out an "AI hypercomputer" vision combining TPU + GPU + CPU. Inference optimization is a particularly critical theme for large-scale Gemini deployments.

On the software side, efforts to improve portability across chips are accelerating. Standardization of ML compilers such as MLIR and OpenXLA is progressing, and Triton (developed by OpenAI/Meta) is exploring expansion to non-GPU backends. As these technologies mature, the barrier of CUDA lock-in will gradually diminish.

Synthesizing analyst forecasts, NVIDIA is expected to maintain a 60–80% share of the training market through 2025–2027, while its share of the inference market falls to 50–60%. Between 2028 and 2030, custom chips (TPU, Trainium, and various ASICs) could reach 30–40% of the training market. The inference market, being highly cost-sensitive, is the segment where TPU/custom chip penetration is expected to advance most rapidly.

If Jensen Huang's vision of "every company becoming an AI factory" comes to fruition, those factories will be equipped not with NVIDIA GPUs alone, but with a diverse mix of Google TPUs, AWS Trainium, and custom ASICs from various vendors. The real question is not who wins "GPU vs TPU." Rather, an era has arrived in which each company selects the optimal chip based on its workload, scale, and cost structure.

Impact on the Industry

First, NVIDIA's GPU dominance will not be shaken in the short term, but the cost structure known as the "NVIDIA tax" could constrain the growth of the entire AI industry. The figure of $115.2 billion in data center revenue (FY2025) illustrates the magnitude of costs that the AI industry pays to the "factories of computation." The "gap between GPU investment and revenue" highlighted by Sequoia Capital is generating structural pressure toward cost-optimization alternatives: TPUs, Trainium, and custom ASICs.

Second, the cost-performance ratio of TPUs represents an advantage that cannot be overlooked, particularly for AI startups that prioritize capital efficiency. A 30–50% reduction in training costs translates to a difference of hundreds of millions of dollars at the scale of next-generation models (estimated training costs of $500M–$1B). The fact that companies such as Anthropic, Character.AI, and Cohere have chosen TPUs signals that the cost advantage has moved from the "theoretical" phase into the "practical" phase.

Third, the CUDA ecosystem is simultaneously NVIDIA's greatest strength and a bottleneck for the entire AI industry. A developer base of over 4 million makes migration costs extremely high; however, with the evolution of cross-chip compiler technologies such as MLIR, OpenXLA, and Triton, this barrier is expected to decline over the medium term. Whether Jim Keller's observation that "NVIDIA's moat is not as deep as people think" becomes reality depends on the maturity of these software technologies.

Fourth, the AI semiconductor market is transitioning away from a GPU-vs-TPU binary toward a heterogeneous environment of diverse, co-existing chips. With the addition of Amazon Trainium, Microsoft Maia, Meta MTIA, and challenges from startups such as Cerebras, Groq, Tenstorrent, and Etched, enterprises are being compelled to choose chips based on workload, scale, and cost structure. NVIDIA GPU dominance will continue for the time being in the training market, but TPU and custom chip penetration is advancing most rapidly in the inference market.

References: NVIDIA FY2025 Annual Report & Earnings (Jan 2025), NVIDIA GTC 2024 Keynote (Jensen Huang), Google Cloud Next 2024 (Trillium/TPU v6 announcement), Google ISCA 2023 TPU v4 Paper (Jeff Dean, David Patterson et al.), Sequoia Capital "AI's $600B Question" (2024), a16z "Who Owns the Generative AI Platform?" (Martin Casado, Matt Bornstein, 2023), Goldman Sachs "AI Infrastructure: The Next $1 Trillion Opportunity" (2024), Morgan Stanley NVIDIA Coverage Reports, Bernstein Research (Stacy Rasgon) Semiconductor Analysis, Stanford HAI AI Index Report 2024, MLCommons MLPerf Training v4.0 Results (2024), Google Cloud TPU Pricing & Documentation, AWS P5/Trainium Pricing, Azure ND H100 Pricing, David Patterson "A Domain-Specific Supercomputer for Training Deep Neural Networks" (Communications of the ACM, 2020), Anthropic-Google Cloud Partnership Announcement (2023), Character.AI TPU Infrastructure Reports, Elon Musk xAI Colossus Announcements, Sam Altman AI Chip Fundraising Reports (Bloomberg, 2024), Jim Keller Tenstorrent Interviews & RISC-V Vision, Yann LeCun AI Hardware Commentary, Andrew Ng GPU-based DL Research, Cerebras/Groq/SambaNova/Etched Funding Rounds (TechCrunch, The Information), Google Axion CPU Announcement (2024), NVIDIA Rubin/Vera Roadmap (GTC 2024), The Information "NVIDIA Tax" Coverage, IEEE Spectrum TPU Architecture Analysis, Nikkei Cross Tech NVIDIA/AI Semiconductor Feature