Enhancing Music Generation with Text Descriptions: a Hybrid Approach

Chunyu Wang

doi:10.2991/978-2-38476-400-6_82

Authors

Chunyu Wang¹,*

¹Aquinas International Academy, Xinxiang, 453000, China

*Corresponding author. Email: 15793066214@163.com

Available Online 15 May 2025.

DOI: 10.2991/978-2-38476-400-6_82
Keywords: Music Generation; Text-to-Music; Deep Learning; Transformer
Abstract: Recent advances in deep learning have led to significant progress in music generation. However, existing methods often lack fine-grained control over the generated output, limiting their expressiveness and diversity. This paper presents a novel approach for generating full songs (vocals and accompaniment) from detailed text descriptions. We propose a hybrid model that combines the strengths of transformer and diffusion models to generate music that is both coherent and high-quality. Our method allows users to specify nuanced details about the desired music, such as mood, genre, instrumentation, and rhythmic patterns, enabling the creation of music that closely aligns with their creative vision. We evaluate our approach on a diverse dataset of text descriptions and demonstrate its ability to generate expressive and diverse music that surpasses existing methods in terms of controllability and fidelity.
Copyright: © 2025 The Author(s)
Open Access: This chapter is licensed under the terms of the Creative Commons Attribution-NonCommercial 4.0 International License (http://creativecommons.org/licenses/by-nc/4.0/), which permits any noncommercial use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

Volume Title: Proceedings of the 2nd International Conference on Educational Development and Social Sciences (EDSS 2025)
Series: Advances in Social Science, Education and Humanities Research
Publication Date: 15 May 2025
ISBN: 978-2-38476-400-6
ISSN: 2352-5398

Cite this article

TY  - CONF
AU  - Chunyu Wang
PY  - 2025
DA  - 2025/05/15
TI  - Enhancing Music Generation with Text Descriptions: a Hybrid Approach
BT  - Proceedings of the 2nd International Conference on Educational Development and Social Sciences (EDSS 2025)
PB  - Atlantis Press
SP  - 710
EP  - 716
SN  - 2352-5398
UR  - https://doi.org/10.2991/978-2-38476-400-6_82
DO  - 10.2991/978-2-38476-400-6_82
ID  - Wang2025
ER  -
