Proceedings of International Conference on Applied Innovation in IT  ·  2026/04/22  ·  Vol. 14  ·  Issue 2  ·  pp. 155–162
LLM-based Synthetic Ground Truth Generation for Audio-Based Emotion Classification via In-Context Learning
Qing Huang, Pooja Pol and Jianing Zhang
Understanding human states and interaction dynamics is a core goal of human-computer interaction (HCI). As interaction paradigms become more immersive, virtual reality (VR) has emerged as a powerful platform for studying collaborative work. In such settings, evaluating team collaboration states, including team performance and team resilience, requires continuous and reliable inference of latent team-level cognitive and affective states from multi-modal sensor data, such as speech signals. However, generating ground truth labels for these latent states remains challenging due to sensor-induced noise, contextual variability, and sparse expert annotations. Traditional self-reporting approaches provide only static and delayed measurements and are therefore insufficient for capturing dynamic team processes reflected in continuous speech data. In this work, we propose a large language model (LLM)-driven, agentic inference workflow for automated emotion-related synthetic ground truth generation from streaming speech data in multi-user VR environments. Leveraging the generalization capabilities of LLMs, we use In-Context Learning (ICL) with few-shot demonstrations of paired audio-based samples and their corresponding transcriptions. ICL tends to achieve task adaptation comparable to model fine-tuning while circumventing the computational overhead of parameter updates. To construct informative and robust in-context prompts, we adopt a retrieval-based selection strategy that dynamically identifies relevant audio demonstrations based on similarity in the acoustic feature space. Specifically, retrieval is guided by prosodic descriptors derived from speech signals, whereas transcript-based semantic information is incorporated directly within the prompt and interpreted by the LLM through its reasoning capabilities. This modality-aware strategy promotes consistent affective labeling and improved generalization across previously unseen speech segments. 
Evaluated on multi-player VR audio recordings, our methodology demonstrates potential as a scalable, data-efficient component for data-driven team-based decision support. By integrating acoustic similarity-based retrieval with LLM-based semantic reasoning, this work contributes to emerging interdisciplinary methodologies at the intersection of scientific machine learning, multi-modal systems, and AI-driven decision-making.
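The retrieval-based demonstration selection described above can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that each speech segment has already been reduced to a small vector of prosodic descriptors scaled to a comparable range, that similarity in the acoustic feature space is measured by cosine similarity, and that only transcripts and labels are placed in the prompt (the audio samples themselves are left out). The names `Demo`, `retrieve_demos`, and `build_prompt`, the feature values, and the emotion labels are all hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Demo:
    """One labeled demonstration (hypothetical structure)."""
    features: list[float]  # assumed prosodic descriptors, e.g. pitch, energy, speech rate
    transcript: str
    label: str


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat an all-zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)


def retrieve_demos(query_features, pool, k=2):
    """Return the k demonstrations acoustically closest to the query segment."""
    ranked = sorted(
        pool,
        key=lambda d: cosine_similarity(query_features, d.features),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(demos, query_transcript):
    """Assemble a few-shot prompt: retrieved demonstrations, then the unlabeled query."""
    parts = ["Label the emotion of the final utterance."]
    for d in demos:
        parts.append(f"Transcript: {d.transcript}\nEmotion: {d.label}")
    parts.append(f"Transcript: {query_transcript}\nEmotion:")
    return "\n\n".join(parts)


# Toy demonstration pool with made-up feature values and labels.
pool = [
    Demo([0.9, 0.8, 0.7], "We did it, great job everyone!", "joy"),
    Demo([0.2, 0.1, 0.3], "I guess we lost again.", "sadness"),
    Demo([0.8, 0.9, 0.6], "Hurry, the timer is running out!", "fear"),
]

query_features = [0.85, 0.85, 0.65]
selected = retrieve_demos(query_features, pool, k=2)
prompt = build_prompt(selected, "Quick, grab the key before it closes!")
print(prompt)
```

In this sketch the acoustic features decide only which demonstrations are shown, while the transcript text is left for the LLM to interpret, mirroring the split between prosody-guided retrieval and transcript-based reasoning described in the abstract.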
Keywords: Large Language Models (LLMs), In-Context Learning (ICL), Synthetic Ground Truth, Affective Computing, Data-Driven Decision Support, Virtual Reality (VR).
© 2026 ICAIIT · Anhalt University of Applied Sciences ISSN 2198-8005 (online)

Proceedings of the International Conference on Applied Innovations in IT by Anhalt University of Applied Sciences is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License (CC BY-SA 4.0).