Proceedings of International Conference on Applied Innovation in IT  ·  2026/03/31  ·  Vol. 14  ·  Issue 1  ·  pp. 59–68
Whisper Speech Recognition Model for Pronunciation Improvement for Autistic Patients
Ghadeer Alaa Azhr, Zaid Abdi Alkareem Alyasseri and Ali Hilal Ali
Autism Spectrum Disorder (ASD) is a neurodevelopmental condition that significantly affects speech, communication, and social interaction. Early intervention is essential, yet most existing speech training systems are limited to English, use very restricted vocabularies, and are not adapted for Arabic-speaking children. This study proposes an AI-based pronunciation training system designed specifically for Arabic-speaking children with ASD. The system integrates Text-to-Speech (TTS) for generating clear reference pronunciations and Whisper-based Automatic Speech Recognition (ASR) for transcribing and evaluating the child’s speech. Due to the lack of publicly available Arabic ASD speech datasets, synthetic data augmentation was used to improve robustness. The system evaluates pronunciation using two main metrics: Accuracy (exact match) and Similarity (normalized edit distance), enabling more flexible and encouraging feedback. A test set of 50 Modern Standard Arabic words was used for evaluation. Results showed an overall word accuracy of 76.5%, similarity of 85.2%, Word Error Rate of 23.5%, Character Error Rate of 14.8%, and Mean Opinion Score of 4.2/5. The findings indicate that the proposed system can reliably detect near-correct pronunciations and provide positive reinforcement even when strict accuracy is low. This suggests its potential as a supportive tool for incremental speech development in children with ASD, especially in Arabic-speaking environments.
Autism, Pronunciation Training, Arabic Speech, Generative AI, Whisper, ASR.
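The abstract names two feedback metrics, Accuracy (exact match) and Similarity (normalized edit distance), without giving their formulas. The following is a minimal sketch of one plausible reading, not the authors' implementation. It assumes Similarity is one minus the character-level Levenshtein distance divided by the longer string's length, and that Arabic diacritics are stripped before comparison. The function names, the normalization step, and the 0.7 feedback threshold are all hypothetical.

```python
import re

# Assumption: short-vowel marks (U+064B..U+0652) and tatweel (U+0640) are
# ignored, since ASR output is usually unvocalized. Not stated in the source.
_DIACRITICS = re.compile("[\u064B-\u0652\u0640]")


def normalize(text: str) -> str:
    """Strip diacritics and surrounding whitespace before comparison."""
    return _DIACRITICS.sub("", text).strip()


def edit_distance(a: str, b: str) -> int:
    """Character-level Levenshtein distance, computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def is_exact(target: str, heard: str) -> bool:
    """Accuracy for one word: the transcript matches the target exactly."""
    return normalize(target) == normalize(heard)


def similarity(target: str, heard: str) -> float:
    """One minus the edit distance normalized by the longer string's length."""
    t, h = normalize(target), normalize(heard)
    longest = max(len(t), len(h))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(t, h) / longest


def feedback(target: str, heard: str, threshold: float = 0.7) -> str:
    """Hypothetical three-level feedback; the threshold is illustrative."""
    if is_exact(target, heard):
        return "correct"
    if similarity(target, heard) >= threshold:
        return "close"
    return "try_again"
```

Under these assumptions, a transcript that drops one letter of a four-letter target is not an exact match but still scores a similarity of 0.75, so the child would receive the "close" response rather than a failure. This illustrates how a similarity score can support encouraging feedback when strict accuracy is low. Note that a Character Error Rate is conventionally normalized by the reference length rather than the longer string, so the two measures coincide only when the transcript is no longer than the target.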

Proceedings of the International Conference on Applied Innovations in IT by Anhalt University of Applied Sciences is licensed under CC BY-SA 4.0  ·  This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License

ISSN 2199-8876

Published by ICAIIT in cooperation with Anhalt University of Applied Sciences.

© 2026 ICAIIT — International Conference on Applied Innovations in IT. Anhalt University of Applied Sciences, Köthen, Germany.