Fine Tuning Based End-to-End Indian English Speech Synthesis System
- DOI
- 10.2991/978-94-6463-740-3_2
- Keywords
- Indian English; end-to-end; fine tuning; Tacotron2; Waveglow; FastSpeech2; speech synthesis
- Abstract
Natural-sounding speech synthesis systems based on end-to-end models have been built for Spanish, American English, and Chinese. However, little work has been done on end-to-end text-to-speech (TTS) synthesis for Indian languages, and the lack of good training data has been a challenge in the past. In this work, we use approximately eight hours of training data to build an Indian English TTS system with human-like quality. Starting from Tacotron2, FastSpeech2, WaveGlow, and Parallel WaveGAN checkpoints pre-trained on American English, we continue training them on this data through fine-tuning. To the best of the authors' knowledge, this is the highest-quality TTS system for Indian English reported so far. Our experiments yield a mean opinion score (MOS) of 4.35 ± 0.14 with the Tacotron2 model and 4.12 ± 0.17 with the FastSpeech2 model.
- Copyright
- © 2025 The Author(s)
- Open Access
- Open Access This chapter is licensed under the terms of the Creative Commons Attribution-NonCommercial 4.0 International License (http://creativecommons.org/licenses/by-nc/4.0/), which permits any noncommercial use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.
Cite this article
TY - CONF
AU - Manisha Gupta
AU - Amita Dev
AU - Poonam Bansal
PY - 2025
DA - 2025/06/25
TI - Fine Tuning Based End-to-End Indian English Speech Synthesis System
BT - Proceedings of the 6th International Conference on Deep Learning, Artificial Intelligence and Robotics (ICDLAIR 2024)
PB - Atlantis Press
SP - 3
EP - 16
SN - 1951-6851
UR - https://doi.org/10.2991/978-94-6463-740-3_2
DO - 10.2991/978-94-6463-740-3_2
ID - Gupta2025
ER -