Proceedings of the International Workshop on Advances in Deep Learning for Image Analysis and Computer Vision (IWADIC 2025)

Adapter-Fusion: A Practical, Parameter-Efficient Framework for Composable Control in Text-to-Image Diffusion

Authors
Yunzhong Zheng1, *
1College of Art and Science, New York University, New York, 10003, USA
*Corresponding author. Email: Zyz5678@outlook.com
Available Online 24 April 2026.
DOI
10.2991/978-94-6239-648-7_92
Keywords
Diffusion Models; Controllable Generation; Parameter-Efficient Fine-Tuning (PEFT); LoRA; Multi-Control Composition
Abstract

The surge of text-to-image diffusion models marks an innovative step in the development of generative artificial intelligence, yet a lack of precise control remains a critical constraint when these models are deployed in production. Existing methods introduce single control modalities, and naïvely combining multiple adapters can cause "signal interference," in which the effects of different adapters degrade one another and worsen the result. This paper introduces Adapter-Fusion, a novel framework that aims to achieve both high-accuracy generation and computational efficiency. Adapter-Fusion adopts a frozen-backbone philosophy: it incorporates Control-LoRA, which governs spatial structure, and IP-Adapter, which governs stylistic content, without altering their pretrained weights. The central innovation is the "Composer" module, which deploys a gated LoRA-switching mechanism that predicts block-wise gating coefficients and routes the resulting signals into the layers of the U-Net, with the goal of decoupling spatial- and temporal-domain signals. Validation uses a synthetic dataset constructed specifically to isolate the interactions between different controls. Adapter-Fusion obtains a superior balance of precision, attaining high CLIP-I and CLIP-T scores while keeping RMSE robust, and it generates images at relatively high speed on consumer-level hardware, surpassing the guidance baselines. Adapter-Fusion is thus a practical solution for multi-modal control.
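As a rough illustration of the gated LoRA-switching idea described in the abstract, the sketch below shows one way a lightweight "Composer" network could predict per-block gating coefficients that scale low-rank adapter deltas added on top of a frozen layer. This is a minimal PyTorch sketch under stated assumptions: the class names (`LoRADelta`, `Composer`, `GatedBlock`), the two-branch spatial/style split, the MLP shape, and the sigmoid gating are all hypothetical choices for illustration, not the paper's actual architecture.

```python
import torch
import torch.nn as nn

class LoRADelta(nn.Module):
    """Low-rank update B(A(x)) applied alongside a frozen layer."""
    def __init__(self, dim_in, dim_out, rank=4):
        super().__init__()
        self.A = nn.Linear(dim_in, rank, bias=False)
        self.B = nn.Linear(rank, dim_out, bias=False)
        nn.init.zeros_(self.B.weight)  # start as a no-op, standard for LoRA

    def forward(self, x):
        return self.B(self.A(x))

class Composer(nn.Module):
    """Hypothetical gating network: maps a conditioning vector to
    per-block, per-adapter coefficients in (0, 1)."""
    def __init__(self, cond_dim, num_blocks, num_adapters=2):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(cond_dim, 128), nn.SiLU(),
            nn.Linear(128, num_blocks * num_adapters),
        )
        self.num_blocks = num_blocks
        self.num_adapters = num_adapters

    def forward(self, cond):
        gates = torch.sigmoid(self.mlp(cond))
        return gates.view(-1, self.num_blocks, self.num_adapters)

class GatedBlock(nn.Module):
    """A frozen linear layer plus gated spatial/style LoRA branches."""
    def __init__(self, frozen_linear, rank=4):
        super().__init__()
        self.frozen = frozen_linear
        for p in self.frozen.parameters():
            p.requires_grad_(False)  # backbone stays untouched
        d_in, d_out = frozen_linear.in_features, frozen_linear.out_features
        self.spatial_lora = LoRADelta(d_in, d_out, rank)  # e.g. Control-LoRA role
        self.style_lora = LoRADelta(d_in, d_out, rank)    # e.g. IP-Adapter role

    def forward(self, x, g_spatial, g_style):
        # Gating lets the Composer attenuate one branch where the
        # two control signals would otherwise interfere.
        return (self.frozen(x)
                + g_spatial * self.spatial_lora(x)
                + g_style * self.style_lora(x))
```

In this sketch, setting a gate to zero exactly recovers the frozen backbone's output for that branch, which is one plausible way to realize the paper's stated goal of composing adapters without mutual degradation.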

Copyright
© 2026 The Author(s)
Open Access
Open Access This chapter is licensed under the terms of the Creative Commons Attribution-NonCommercial 4.0 International License (http://creativecommons.org/licenses/by-nc/4.0/), which permits any noncommercial use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.


Volume Title
Proceedings of the International Workshop on Advances in Deep Learning for Image Analysis and Computer Vision (IWADIC 2025)
Series
Advances in Computer Science Research
Publication Date
24 April 2026
ISBN
978-94-6239-648-7
ISSN
2352-538X
DOI
10.2991/978-94-6239-648-7_92

Cite this article

TY  - CONF
AU  - Yunzhong Zheng
PY  - 2026
DA  - 2026/04/24
TI  - Adapter-Fusion: A Practical, Parameter-Efficient Framework for Composable Control in Text-to-Image Diffusion
BT  - Proceedings of the International Workshop on Advances in Deep Learning for Image Analysis and Computer Vision (IWADIC 2025)
PB  - Atlantis Press
SP  - 853
EP  - 861
SN  - 2352-538X
UR  - https://doi.org/10.2991/978-94-6239-648-7_92
DO  - 10.2991/978-94-6239-648-7_92
ID  - Zheng2026
ER  -