Hybrid conditional VAE-CLSTM model for generating synchronized music from humming
- DOI
- 10.2991/978-94-6463-738-0_3
- Abstract
Many aspiring musicians face challenges in creating their own music due to the high cost of equipment and the complexity of learning music theory. These barriers make it difficult for beginners to express their creative ideas, leading to frustration and disappointment. To address this issue, we propose a system that transforms simple humming into complete musical compositions, bridging the gap between artistic intention and the means to create music.
The system uses the CREPE pitch-estimation model to extract key features from a user's hum recorded in .wav format: the estimated frequencies, their timestamps, and a confidence level for each estimate. These features capture the essence of the user's musical idea. Two deep learning models then build on this melody. A CVAE (Conditional Variational Autoencoder) adds style to the music by inferring musical characteristics conditioned on the user's input, while a CLSTM (Convolutional Long Short-Term Memory network) extends the music's duration and maintains its flow, producing a complete musical sequence.
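The abstract does not say how CREPE's frame-level output is turned into a melody that the CVAE and CLSTM can consume. The following is a minimal sketch of one common post-processing step, assuming the frames arrive as parallel lists of times (seconds), frequencies (Hz), and confidences in [0, 1]. Low-confidence frames are treated as silence, each remaining frequency is rounded to the nearest MIDI note, and runs of identical notes are merged into note events. The `NoteEvent` type, the function names, and both thresholds are hypothetical illustrations, not values or code from the paper.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple


@dataclass
class NoteEvent:
    """A sustained note recovered from consecutive pitch frames (hypothetical type)."""
    midi: int     # MIDI note number (60 = middle C, 69 = A4 = 440 Hz)
    start: float  # onset time in seconds
    end: float    # offset time in seconds


def hz_to_midi(freq_hz: float) -> float:
    """Convert a frequency in Hz to a fractional MIDI note number."""
    return 69.0 + 12.0 * math.log2(freq_hz / 440.0)


def _emit(notes: List[NoteEvent], run: Tuple[int, float, float],
          hop: float, min_duration: float) -> None:
    """Append a finished run of frames as a note if it is long enough."""
    midi, start, last_frame = run
    end = last_frame + hop  # the last frame still lasts one hop
    if end - start >= min_duration:
        notes.append(NoteEvent(midi, start, end))


def frames_to_notes(
    times: Sequence[float],
    freqs: Sequence[float],
    confidences: Sequence[float],
    min_confidence: float = 0.5,  # assumed threshold, not from the paper
    min_duration: float = 0.05,   # assumed: drop blips shorter than 50 ms
) -> List[NoteEvent]:
    """Group confident frames with the same rounded pitch into note events."""
    if not (len(times) == len(freqs) == len(confidences)):
        raise ValueError("times, freqs and confidences must have equal length")
    hop = times[1] - times[0] if len(times) > 1 else 0.0

    notes: List[NoteEvent] = []
    run: Optional[Tuple[int, float, float]] = None  # (midi, start, last frame time)
    for t, f, c in zip(times, freqs, confidences):
        # Low-confidence or non-positive frequencies are treated as silence.
        pitch = round(hz_to_midi(f)) if c >= min_confidence and f > 0 else None
        if run is not None and pitch == run[0]:
            run = (run[0], run[1], t)  # same note continues
            continue
        if run is not None:
            _emit(notes, run, hop, min_duration)  # pitch changed or went silent
        run = (pitch, t, t) if pitch is not None else None
    if run is not None:
        _emit(notes, run, hop, min_duration)
    return notes


if __name__ == "__main__":
    # Synthetic 10 ms frames: A4, a low-confidence gap, then B4.
    times = [i * 0.01 for i in range(22)]
    freqs = [440.0] * 10 + [300.0] * 2 + [493.88] * 10
    confs = [0.9] * 10 + [0.1] * 2 + [0.9] * 10
    notes = frames_to_notes(times, freqs, confs)
    print([(n.midi, round(n.start, 2), round(n.end, 2)) for n in notes])
    # → [(69, 0.0, 0.1), (71, 0.12, 0.22)]
```

With the `crepe` Python package, the three lists could come from `crepe.predict(audio, sr)`, which returns time, frequency, confidence, and activation arrays. Rounding to the nearest semitone is a deliberate simplification. It snaps slightly out-of-tune humming onto the scale, but it can split one sung note into several when the pitch wobbles across a semitone boundary, so a real system might smooth the pitch track first (for example with CREPE's `viterbi=True` option) before grouping.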
This system enables users, regardless of their musical background or access to expensive equipment, to turn their hums into polished musical pieces. It democratizes music creation, making it easier, more affordable, and more accessible to anyone with a passion for music. By simplifying composition, the system opens creative possibilities to people who might otherwise lack the tools or knowledge to make music.
- Copyright
- © 2025 The Author(s)
- Open Access
- This chapter is licensed under the terms of the Creative Commons Attribution-NonCommercial 4.0 International License (http://creativecommons.org/licenses/by-nc/4.0/), which permits any noncommercial use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.
Cite this article
TY  - CONF
AU  - N. Sandeep Chaitanya
AU  - Chinthoju Rohith
AU  - Dunna Shiva Prasad
AU  - Golla Durgesh Yadav
AU  - Natta Rishitha
PY  - 2025
DA  - 2025/06/22
TI  - Hybrid conditional VAE-CLSTM model for generating synchronized music from humming
BT  - Proceedings of the International Conference on Advances and Applications in Artificial Intelligence (ICAAAI 2025)
PB  - Atlantis Press
SP  - 17
EP  - 31
SN  - 1951-6851
UR  - https://doi.org/10.2991/978-94-6463-738-0_3
DO  - 10.2991/978-94-6463-738-0_3
ID  - Chaitanya2025
ER  -