Enhancing Music Generation with Text Descriptions: a Hybrid Approach
- DOI
- 10.2991/978-2-38476-400-6_82
- Keywords
- Music Generation; Text-to-Music; Deep Learning; Transformer
- Abstract
Recent advances in deep learning have led to significant progress in music generation. However, existing methods often lack fine-grained control over the generated output, limiting their expressiveness and diversity. This paper presents a novel approach for generating full songs (vocals and accompaniment) from detailed text descriptions. We propose a hybrid model that combines the strengths of transformer and diffusion models to generate music that is both coherent and high-quality. Our method allows users to specify nuanced details about the desired music, such as mood, genre, instrumentation, and rhythmic patterns, enabling the creation of music that closely aligns with their creative vision. We evaluate our approach on a diverse dataset of text descriptions and demonstrate its ability to generate expressive and diverse music that surpasses existing methods in terms of controllability and fidelity.
- Copyright
- © 2025 The Author(s)
- Open Access
- Open Access This chapter is licensed under the terms of the Creative Commons Attribution-NonCommercial 4.0 International License (http://creativecommons.org/licenses/by-nc/4.0/), which permits any noncommercial use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.
Cite this article
TY - CONF
AU - Chunyu Wang
PY - 2025
DA - 2025/05/15
TI - Enhancing Music Generation with Text Descriptions: a Hybrid Approach
BT - Proceedings of the 2nd International Conference on Educational Development and Social Sciences (EDSS 2025)
PB - Atlantis Press
SP - 710
EP - 716
SN - 2352-5398
UR - https://doi.org/10.2991/978-2-38476-400-6_82
DO - 10.2991/978-2-38476-400-6_82
ID - Wang2025
ER -