Proceedings of the 2025 International Conference on Electronics, Electrical and Grid Technology (ICEEGT 2025)

Quantization Methods for Transformer-Based Models on Edge Devices

Authors
Zhixiang Zeng1, *
1School of Advanced Manufacturing, Guangdong University of Technology, Guangzhou, 510006, China
*Corresponding author. Email: zengzhixiang1@mails.gdut.edu.cn
Available Online 18 February 2026.
DOI
10.2991/978-94-6463-986-5_70
Keywords
Quantization; Large Language Models; Edge Devices
Abstract

With the rapid development and broad adoption of Transformer-based models such as ChatGPT and Gemini, and the growing popularity of edge devices such as mobile phones, the demand for deploying Transformer-based models on edge devices is becoming increasingly urgent. However, the original Transformer-based models contain enormous numbers of parameters and incur high computation and storage costs, making direct deployment on edge devices nearly impossible. To address this problem, quantization, an efficient and energy-saving model compression approach, has emerged and gradually become the mainstream method for compressing models on edge devices. This article reviews various state-of-the-art quantization methods, focusing on three key areas: image recognition, large language models, and image generation. We analyze their applicability, application requirements, and performance across different tasks, and compare them through representative experiments and case studies. We further identify shortcomings of existing methods in terms of accuracy preservation, model robustness, and hardware adaptability, and propose potential directions for improvement. These findings aim to contribute to a deeper understanding of on-device model quantization and to provide a reference for more efficient and universal large-scale model deployment.

Copyright
© 2026 The Author(s)
Open Access
Open Access This chapter is licensed under the terms of the Creative Commons Attribution-NonCommercial 4.0 International License (http://creativecommons.org/licenses/by-nc/4.0/), which permits any noncommercial use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

Volume Title
Proceedings of the 2025 International Conference on Electronics, Electrical and Grid Technology (ICEEGT 2025)
Series
Advances in Engineering Research
Publication Date
18 February 2026
ISBN
978-94-6463-986-5
ISSN
2352-5401

Cite this article

TY  - CONF
AU  - Zhixiang Zeng
PY  - 2026
DA  - 2026/02/18
TI  - Quantization Methods for Transformer-Based Models on Edge Devices
BT  - Proceedings of the 2025 International Conference on Electronics, Electrical and Grid Technology (ICEEGT 2025)
PB  - Atlantis Press
SP  - 680
EP  - 691
SN  - 2352-5401
UR  - https://doi.org/10.2991/978-94-6463-986-5_70
DO  - 10.2991/978-94-6463-986-5_70
ID  - Zeng2026
ER  -