Quantization Methods for Transformer-Based Models on Edge Devices
- DOI
- 10.2991/978-94-6463-986-5_70
- Keywords
- Quantization; Large Language Models; Edge Devices
- Abstract
With the rapid development and broad adoption of Transformer-based models such as ChatGPT and Gemini, and the growing popularity of edge devices such as mobile phones, the demand for deploying Transformer-based models on edge devices has become increasingly urgent. However, the original Transformer-based models have enormous numbers of parameters and high computation and storage costs, making direct deployment on edge devices nearly impossible. To tackle this problem, quantization, an efficient and energy-saving model compression approach, has emerged and gradually become the mainstream method for compressing models on edge devices. This article reviews various state-of-the-art quantization methods, focusing on three key areas: image recognition, large language models, and image generation. We analyze their applicability, application requirements, and performance across different tasks, and compare them through typical experiments and case studies. We further identify shortcomings of existing methods in terms of accuracy preservation, model robustness, and hardware adaptability, and propose potential directions for improvement. These findings aim to contribute to a deeper understanding of on-device model quantization and to provide a reference for more efficient and universal large-scale model deployment.
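To make the idea of quantization concrete, the following is a minimal sketch of uniform symmetric weight quantization to int8, the simplest form of the techniques surveyed here. The function names and the per-tensor scaling scheme are illustrative assumptions, not a method taken from this paper:

```python
import numpy as np

def quantize_int8(w: np.ndarray):
    """Uniform symmetric per-tensor quantization of float weights to int8.

    The largest absolute weight is mapped to 127, so every weight is
    represented by an 8-bit integer plus one shared float scale.
    (Illustrative sketch; real schemes add per-channel scales, zero
    points, calibration, etc.)
    """
    scale = float(np.max(np.abs(w))) / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover approximate float weights from int8 values and the scale."""
    return q.astype(np.float32) * scale

# Example: quantize a small random weight matrix and measure the error.
rng = np.random.default_rng(0)
w = rng.standard_normal((4, 4)).astype(np.float32)
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)
max_err = float(np.max(np.abs(w - w_hat)))
```

The int8 tensor occupies a quarter of the float32 storage, and the rounding error is bounded by half the scale step, which is why low-bit quantization can preserve accuracy when the weight range is well behaved.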
- Copyright
- © 2026 The Author(s)
- Open Access
- Open Access This chapter is licensed under the terms of the Creative Commons Attribution-NonCommercial 4.0 International License (http://creativecommons.org/licenses/by-nc/4.0/), which permits any noncommercial use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.
Cite this article
TY - CONF
AU - Zhixiang Zeng
PY - 2026
DA - 2026/02/18
TI - Quantization Methods for Transformer-Based Models on Edge Devices
BT - Proceedings of the 2025 International Conference on Electronics, Electrical and Grid Technology (ICEEGT 2025)
PB - Atlantis Press
SP - 680
EP - 691
SN - 2352-5401
UR - https://doi.org/10.2991/978-94-6463-986-5_70
DO - 10.2991/978-94-6463-986-5_70
ID - Zeng2026
ER -