Please use this identifier to cite or link to this item: http://hdl.handle.net/10119/18836

Title: Survey on bimodal speech emotion recognition from acoustic and linguistic information fusion
Authors: Atmaja, Bagus Tris
Sasou, Akira
Akagi, Masato
Keywords: Speech emotion recognition
Affective computing
Audiotextual information
Bimodal fusion
Information fusion
Issue Date: 2022-03-26
Publisher: Elsevier
Journal name: Speech Communication
Volume: 140
Start page: 11
End page: 28
DOI: 10.1016/j.specom.2022.03.002
Abstract: Speech emotion recognition (SER) is traditionally performed using acoustic information alone. Acoustic features, commonly extracted per frame, are mapped to emotion labels using classifiers such as support vector machines in classical machine learning or multi-layer perceptrons in deep learning. Previous research has shown that acoustic-only SER suffers from several issues, most notably low performance. Speech, however, carries linguistic information as well as acoustic information. Linguistic features can be extracted from text transcribed by an automatic speech recognition system, and fusing acoustic and linguistic information could improve SER performance. This paper surveys work on bimodal emotion recognition that fuses acoustic and linguistic information. Five components of bimodal SER are reviewed: emotion models, datasets, features, classifiers, and fusion methods. Major findings, including state-of-the-art results and their methods on commonly used datasets, are presented to give insight into current research and to help future work surpass these results. Finally, the survey identifies remaining issues in bimodal SER research as directions for future work.
Rights: Copyright (C)2022, The Author(s). Published by Elsevier B.V. This work is licensed under a Creative Commons Attribution 4.0 International License (CC BY). [http://creativecommons.org/licenses/by/4.0/] Bagus Tris Atmaja, Akira Sasou, Masato Akagi, Speech Communication 140, 2022, 11-28, https://doi.org/10.1016/j.specom.2022.03.002
URI: http://hdl.handle.net/10119/18836
Material Type: publisher
Appears in Collections: b10-1. 雑誌掲載論文 (Journal Articles)

Files in This Item:

File: M-AKAGI-I-2.pdf
Size: 2436 Kb
Format: Adobe PDF
