A Novel Approach for Enhancing Child Speech Synthesis Using LIESS Algorithm

N. Danapaquiame; M. Shanmugam; Arokiaraj Christian St. Hubert; M. Aishwariya Lakshmi

doi:10.2991/978-94-6239-616-6_76

Authors

N. Danapaquiame¹, M. Shanmugam²,*, Arokiaraj Christian St. Hubert³, M. Aishwariya Lakshmi⁴

¹Head of Department, Department of Computer Science and Engineering, Sri Manakula Vinayagar Engineering College, Puducherry, India

²Associate Professor, Department of Computer Science and Engineering, Sri Manakula Vinayagar Engineering College, Puducherry, India

³Assistant Professor, Department of Computer Science and Engineering, Sri Manakula Vinayagar Engineering College, Puducherry, India

⁴UG Student, Department of Computer Science and Engineering, Sri Manakula Vinayagar Engineering College, Puducherry, India

*Corresponding author. Email: shanmugam.muthalu@gmail.com

Corresponding Author

M. Shanmugam

Available Online 31 March 2026.

DOI: 10.2991/978-94-6239-616-6_76
Keywords: LIESS; ASR; HiFi-GAN; WER; LLM
Abstract: Voice assistants have become an integral part of everyday life, yet child-specific voice assistants with natural and expressive child-like speech remain limited. Existing solutions often fall short in naturalness, clarity, and efficiency, which makes interactions less engaging and less familiar for young users. This project introduces the LLM-Infused Expressive Speech Synthesis (LIESS) algorithm, which integrates Large Language Models (LLMs) with diffusion-based Text-to-Speech (TTS) techniques to enhance child speech synthesis. By using diffusion models for parallel spectrogram generation and HiFi-GAN with FastPitch for refined waveform synthesis, the proposed system generates natural and expressive child-like voices. The workflow comprises real-time speech capture through a Gradio UI, speech-to-text conversion using Deepgram ASR, and emotion-aware response generation through the Gemini API. Together, these three stages are called the Voice-to-Language Engine (VTLE). To check the clarity and intelligibility of the output, the synthesized speech undergoes Word Error Rate (WER) evaluation, which assesses speech recognition accuracy and linguistic precision. The project aims to make child-specific AI voice assistants more engaging, interactive, and accessible. It targets speech applications in education, entertainment, and assistive technologies, bridging the gap between young users and AI-driven voice interaction.
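The abstract names Word Error Rate (WER) as the intelligibility check but does not give the procedure. The sketch below shows the standard WER definition, (substitutions + deletions + insertions) divided by the number of reference words, computed with a word-level edit distance. It is not the authors' evaluation code: the function name `word_error_rate`, the lowercasing and whitespace tokenization, and the idea that the hypothesis comes from transcribing the synthesized audio with an ASR system are all assumptions made for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Standard WER: word-level edit distance divided by reference length.

    Assumption: text is normalized only by lowercasing and splitting on
    whitespace; a real evaluation may also strip punctuation.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so
    # far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion (reference word missing)
                curr[j - 1] + 1,     # insertion (extra hypothesis word)
                prev[j - 1] + cost,  # substitution or exact match
            )
        prev = curr
    return prev[-1] / len(ref)
```

In use, `reference` would be the text the system was asked to speak and `hypothesis` the transcript recovered from the synthesized audio. A lower value means the speech was recognized more accurately, and values above 1.0 are possible when the hypothesis contains many inserted words.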
Open Access: This chapter is licensed under the terms of the Creative Commons Attribution-NonCommercial 4.0 International License (http://creativecommons.org/licenses/by-nc/4.0/), which permits any noncommercial use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

Volume Title: Proceedings of the International Conference on Artificial Intelligence and Secure Data Analytics (ICAISDA 2025)
Series: Advances in Intelligent Systems Research
Publication Date: 31 March 2026
ISBN: 978-94-6239-616-6
ISSN: 1951-6851

Cite this article

TY  - CONF
AU  - N. Danapaquiame
AU  - M. Shanmugam
AU  - Arokiaraj Christian St. Hubert
AU  - M. Aishwariya Lakshmi
PY  - 2026
DA  - 2026/03/31
TI  - A Novel Approach for Enhancing Child Speech Synthesis Using LIESS Algorithm
BT  - Proceedings of the International Conference on Artificial Intelligence and Secure Data Analytics (ICAISDA 2025)
PB  - Atlantis Press
SP  - 1037
EP  - 1056
SN  - 1951-6851
UR  - https://doi.org/10.2991/978-94-6239-616-6_76
DO  - 10.2991/978-94-6239-616-6_76
ID  - Danapaquiame2026
ER  -