Proceedings of the 2nd International Conference on Educational Development and Social Sciences (EDSS 2025)

Enhancing Music Generation with Text Descriptions: a Hybrid Approach

Authors
Chunyu Wang¹,*
¹Aquinas International Academy, Xinxiang, 453000, China
*Corresponding author. Email: 15793066214@163.com
Corresponding Author
Chunyu Wang
Available Online 15 May 2025.
DOI
10.2991/978-2-38476-400-6_82
Keywords
Music Generation; Text-to-Music; Deep Learning; Transformer
Abstract

Recent advances in deep learning have led to significant progress in music generation. However, existing methods often lack fine-grained control over the generated output, which limits their expressiveness and diversity. This paper presents a novel approach for generating full songs (vocals and accompaniment) from detailed text descriptions. We propose a hybrid model that combines the strengths of transformer and diffusion models to generate music that is both coherent and high-quality. Our method lets users specify nuanced details about the desired music, such as mood, genre, instrumentation, and rhythmic patterns, so that the output closely aligns with their creative vision. We evaluate our approach on a diverse dataset of text descriptions and show that it generates expressive, diverse music and surpasses existing methods in controllability and fidelity.
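
The abstract lists mood, genre, instrumentation, and rhythmic patterns as details a user can specify. As a purely illustrative sketch (not the paper's implementation), such attributes could be collected in a small structure and serialized into a single text description before being passed to a text-conditioned model. The class name `MusicDescription`, its fields, and the `key: value | ...` layout are all assumptions made for this example; the abstract does not state how descriptions are formatted.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class MusicDescription:
    """Hypothetical container for the attributes named in the abstract."""
    mood: str
    genre: str
    instrumentation: Sequence[str] = ()
    rhythm: str = ""

    def to_prompt(self) -> str:
        """Serialize the non-empty attributes into one text description.

        The 'key: value | key: value' layout is an assumption for this
        sketch, not a format taken from the paper.
        """
        fields = [("mood", self.mood), ("genre", self.genre)]
        if self.instrumentation:
            fields.append(("instrumentation", ", ".join(self.instrumentation)))
        if self.rhythm:
            fields.append(("rhythm", self.rhythm))
        return " | ".join(f"{key}: {value}" for key, value in fields)


desc = MusicDescription(
    mood="melancholic",
    genre="jazz",
    instrumentation=["piano", "upright bass"],
    rhythm="slow swing",
)
print(desc.to_prompt())
# mood: melancholic | genre: jazz | instrumentation: piano, upright bass | rhythm: slow swing
```

Keeping the attributes as separate fields rather than free text makes it straightforward to vary one attribute at a time, which is one plausible way to probe the kind of controllability the abstract refers to.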

Copyright
© 2025 The Author(s)
Open Access
Open Access This chapter is licensed under the terms of the Creative Commons Attribution-NonCommercial 4.0 International License (http://creativecommons.org/licenses/by-nc/4.0/), which permits any noncommercial use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.


Volume Title
Proceedings of the 2nd International Conference on Educational Development and Social Sciences (EDSS 2025)
Series
Advances in Social Science, Education and Humanities Research
Publication Date
15 May 2025
ISBN
978-2-38476-400-6
ISSN
2352-5398
DOI
10.2991/978-2-38476-400-6_82

Cite this article

TY  - CONF
AU  - Chunyu Wang
PY  - 2025
DA  - 2025/05/15
TI  - Enhancing Music Generation with Text Descriptions: a Hybrid Approach
BT  - Proceedings of the 2nd International Conference on Educational Development and Social Sciences (EDSS 2025)
PB  - Atlantis Press
SP  - 710
EP  - 716
SN  - 2352-5398
UR  - https://doi.org/10.2991/978-2-38476-400-6_82
DO  - 10.2991/978-2-38476-400-6_82
ID  - Wang2025
ER  -