A Hybrid Spiking-Attention Transformer Model for Robust and Efficient Speech Emotion Recognition on Multi-Dataset Benchmarks

Proceedings of International Conference on Applied Innovation in IT
2025/08/29, Volume 13, Issue 4, pp. 135-141

Samah Abbas Ali and Jamal Mustafa Abbas

Abstract: This study introduces a hybrid model for Speech Emotion Recognition (SER) that combines Spiking Neural Networks (SNNs), temporal attention, and Transformer encoders. SER improves human-computer interaction by enabling intelligent systems to recognize emotions from speech. Unlike traditional methods, which typically rely on shallow classifiers and manually engineered features, our deep learning approach exploits the energy efficiency of SNNs, the selective focus of temporal attention, and the long-range temporal modeling of Transformer architectures. We evaluated the model on a multi-dataset corpus comprising TESS, SAVEE, RAVDESS, and CREMA-D. The model achieved a consistent accuracy of 98% across all emotion classes. These results demonstrate the model's effectiveness and highlight its potential for real-time, resource-limited environments. The hybrid approach also surpasses existing state-of-the-art SER techniques and offers a reliable foundation for real-world affective computing applications.

Keywords: Speech Emotion Recognition (SER), Spiking Neural Networks (SNN), Temporal Attention, Transformer Encoders, Deep Learning, TESS, SAVEE, RAVDESS, CREMA-D.

DOI: 10.25673/122078

Proceedings of the International Conference on Applied Innovations in IT by Anhalt University of Applied Sciences is licensed under CC BY-SA 4.0

ISSN 2199-8876
Publisher: Edition Hochschule Anhalt