Proceedings of the 2025 International Conference on Electronics, Electrical and Grid Technology (ICEEGT 2025)

The Basic Architecture of TPU and Analysis of Optimization Scenarios

Authors
Zeyou Zhu1, *
1Shanghai New Oriental International Curriculum Center, Shanghai, 200040, China
*Corresponding author. Email: donnyzhu0430@Outlook.com
Corresponding Author
Zeyou Zhu
Available Online 18 February 2026.
DOI
10.2991/978-94-6463-986-5_48
Keywords
TPU ASIC; Systolic Array; Energy Efficiency; Algorithm-Hardware
Abstract

Over the past decade, transformer models have ballooned from 110 M to 540 B parameters, rendering Central Processing Unit (CPU)/Graphics Processing Unit (GPU) baselines infeasible for both training and latency-critical inference. Google’s Tensor Processing Unit (TPU) program, spanning four silicon generations from the 28 nm, 40 W TPU v1 to the 7 nm, 175 W TPU v4, has emerged as the first large-scale deployment of domain-specific ASICs purpose-built for dense and semi-structured tensor arithmetic. This paper provides a holistic, quantitative study of the TPU ecosystem. It dissects the deterministic, compiler-co-designed micro-architecture of TPU v3, highlighting how systolic arrays, scheduling, and compiler-managed scratchpads eliminate cache-miss variability and sustain over 90 % multiply-accumulate (MAC) utilization. It then presents scenario analyses of four production workloads. Finally, it identifies the memory wall, dynamic-shape compilation stalls, and ecosystem breadth as the primary challenges, and maps emerging optimizations (weight-streaming, compiler-ahead sharding via JAX/pjit, optical-circuit switching, and the Pathways runtime) that together promise to extend TPU leadership.
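
As a rough illustration of the systolic-array idea mentioned in the abstract, the following Python sketch simulates a small array cycle by cycle and reports how many processing elements (PEs) do useful work. It is not taken from the paper: the function name `systolic_matmul`, the output-stationary dataflow, and the one-cycle-per-hop skew are simplifying assumptions chosen for readability, and real TPU hardware may differ in all of them.

```python
def systolic_matmul(a, b):
    """Simulate an output-stationary systolic array computing a @ b.

    Assumptions (illustrative only, not the paper's design):
    - PE (i, j) holds the accumulator for output element c[i][j].
    - Row i of `a` enters from the left delayed by i cycles, and
      column j of `b` enters from the top delayed by j cycles, so
      PE (i, j) sees operand pair k at cycle t = i + j + k.
    """
    m, k_dim, n = len(a), len(b), len(b[0])
    acc = [[0] * n for _ in range(m)]

    # The last operand pair reaches the far-corner PE at cycle m+n+k_dim-3,
    # so the whole product needs m + n + k_dim - 2 cycles.
    total_cycles = m + n + k_dim - 2
    busy = 0  # number of (PE, cycle) slots that performed a MAC

    for t in range(total_cycles):
        for i in range(m):
            for j in range(n):
                k = t - i - j  # which operand pair arrives at this PE now
                if 0 <= k < k_dim:
                    acc[i][j] += a[i][k] * b[k][j]
                    busy += 1

    utilization = busy / (m * n * total_cycles)
    return acc, total_cycles, utilization


if __name__ == "__main__":
    c, cycles, util = systolic_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]])
    print(c, cycles, util)
```

In this toy model the utilization works out to K / (M + N + K - 2), so a 2 x 2 array fed only two operand pairs is half idle, while the same array fed a stream of 100 pairs is busy about 98 % of the time. That is one simplified way to see why long, statically scheduled operand streams help keep MAC units occupied; it says nothing about the specific utilization figures reported in the paper.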

Copyright
© 2026 The Author(s)
Open Access
Open Access This chapter is licensed under the terms of the Creative Commons Attribution-NonCommercial 4.0 International License (http://creativecommons.org/licenses/by-nc/4.0/), which permits any noncommercial use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

Volume Title
Proceedings of the 2025 International Conference on Electronics, Electrical and Grid Technology (ICEEGT 2025)
Series
Advances in Engineering Research
Publication Date
18 February 2026
ISBN
978-94-6463-986-5
ISSN
2352-5401

Cite this article

TY  - CONF
AU  - Zeyou Zhu
PY  - 2026
DA  - 2026/02/18
TI  - The Basic Architecture of TPU and Analysis of Optimization Scenarios
BT  - Proceedings of the 2025 International Conference on Electronics, Electrical and Grid Technology (ICEEGT 2025)
PB  - Atlantis Press
SP  - 465
EP  - 472
SN  - 2352-5401
UR  - https://doi.org/10.2991/978-94-6463-986-5_48
DO  - 10.2991/978-94-6463-986-5_48
ID  - Zhu2026
ER  -