Quantization Methods for Transformer-Based Models on Edge Devices
- DOI
- 10.2991/978-94-6463-986-5_70
- Keywords
- Quantization; Large Language Models; Edge Devices
- Abstract
With the rapid development and broad adoption of Transformer-based models such as ChatGPT and Gemini, and the growing popularity of edge devices such as mobile phones, the demand for deploying Transformer-based models on edge devices has become increasingly urgent. However, the original Transformer-based models have enormous numbers of parameters and high computation and storage costs, making direct deployment on edge devices nearly impossible. To tackle this problem, quantization, an efficient and energy-saving model compression approach, has emerged and gradually become the mainstream method for compressing models on edge devices. This article reviews various state-of-the-art quantization methods, focusing on three key areas: image recognition, large language models, and image generation. We analyze their applicability, application requirements, and performance across different tasks, and compare them through typical experiments and case studies. We further identify shortcomings of existing methods in terms of accuracy preservation, model robustness, and hardware adaptability, and propose potential directions for improvement. These findings aim to contribute to a deeper understanding of on-device model quantization and to provide a reference for more efficient and universal large-scale model deployment.
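To make the idea of quantization concrete, the following is a minimal sketch of uniform symmetric weight quantization to int8, the simplest form of the techniques surveyed here. The function names and the per-tensor scaling scheme are illustrative assumptions, not a method taken from this paper:

```python
import numpy as np

def quantize_int8(w: np.ndarray):
    """Uniform symmetric per-tensor quantization of float weights to int8.

    The largest absolute weight is mapped to 127, so every weight is
    represented by an 8-bit integer plus one shared float scale.
    (Illustrative sketch; real schemes add per-channel scales, zero
    points, calibration, etc.)
    """
    scale = float(np.max(np.abs(w))) / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover approximate float weights from int8 values and the scale."""
    return q.astype(np.float32) * scale

# Example: quantize a small random weight matrix and measure the error.
rng = np.random.default_rng(0)
w = rng.standard_normal((4, 4)).astype(np.float32)
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)
max_err = float(np.max(np.abs(w - w_hat)))
```

The int8 tensor occupies a quarter of the float32 storage, and the rounding error is bounded by half the scale step, which is why low-bit quantization can preserve accuracy when the weight range is well behaved.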
- Copyright
- © 2026 The Author(s)
- Open Access
- Open Access This chapter is licensed under the terms of the Creative Commons Attribution-NonCommercial 4.0 International License (http://creativecommons.org/licenses/by-nc/4.0/), which permits any noncommercial use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.
Cite this article
TY - CONF
AU - Zhixiang Zeng
PY - 2026
DA - 2026/02/18
TI - Quantization Methods for Transformer-Based Models on Edge Devices
BT - Proceedings of the 2025 International Conference on Electronics, Electrical and Grid Technology (ICEEGT 2025)
PB - Atlantis Press
SP - 680
EP - 691
SN - 2352-5401
UR - https://doi.org/10.2991/978-94-6463-986-5_70
DO - 10.2991/978-94-6463-986-5_70
ID - Zeng2026
ER -