JAIST Repository: Survey on bimodal speech emotion recognition from acoustic and linguistic information fusion

Please use this identifier to cite or link to this item: https://hdl.handle.net/10119/18836

Title:	Survey on bimodal speech emotion recognition from acoustic and linguistic information fusion
Authors:	Atmaja, Bagus Tris; Sasou, Akira; Akagi, Masato
Keywords:	Speech emotion recognition; Affective computing; Audiotextual information; Bimodal fusion; Information fusion
Issue date:	2022-03-26
Publisher:	Elsevier
Journal:	Speech Communication
Volume:	140
Start page:	11
End page:	28
DOI:	10.1016/j.specom.2022.03.002
Abstract:	Speech emotion recognition (SER) is traditionally performed using acoustic information alone. Acoustic features, commonly extracted per frame, are mapped to emotion labels using classifiers such as support vector machines in classical machine learning or multi-layer perceptrons in deep learning. Previous research has shown that acoustic-only SER suffers from many issues, chiefly low performance. However, speech carries linguistic information as well as acoustic information. Linguistic features can be extracted from text transcribed by an automatic speech recognition system. Fusing acoustic and linguistic information could improve SER performance. This paper presents a survey of work on bimodal emotion recognition that fuses acoustic and linguistic information. Five components of bimodal SER are reviewed: emotion models, datasets, features, classifiers, and fusion methods. Major findings, including state-of-the-art results on commonly used datasets and the methods behind them, are also presented to give insight into current research and to help future work surpass these results. Finally, this survey identifies the remaining issues in bimodal SER research as directions for future research.
Rights:	Copyright (C)2022, The Author(s). Published by Elsevier B.V. This work is licensed under a Creative Commons Attribution 4.0 International License (CC BY). [http://creativecommons.org/licenses/by/4.0/] Bagus Tris Atmaja, Akira Sasou, Masato Akagi, Speech Communication 140, 2022, 11-28, https://doi.org/10.1016/j.specom.2022.03.002
URI:	https://hdl.handle.net/10119/18836
Resource type:	publisher
Appears in collections:	b10-1. Journal Articles
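
The abstract names feature pooling and fusion as core steps of bimodal SER without showing them. The following is a minimal sketch, not the authors' method: it assumes frame-level acoustic features are mean-pooled per utterance, and it contrasts feature-level (early) fusion with decision-level (late) fusion. The function names, the three-class label set, and all numbers are hypothetical and chosen only for illustration.

```python
from statistics import fmean

# Hypothetical label set; the survey covers several emotion models.
EMOTIONS = ["happy", "sad", "neutral"]


def pool_frames(frames):
    """Average frame-level acoustic features into one utterance-level vector."""
    return [fmean(column) for column in zip(*frames)]


def early_fusion(acoustic, linguistic):
    """Feature-level fusion: concatenate the two feature vectors."""
    return list(acoustic) + list(linguistic)


def late_fusion(p_acoustic, p_linguistic, w_acoustic=0.5):
    """Decision-level fusion: weighted average of per-class probabilities."""
    fused = [
        w_acoustic * a + (1.0 - w_acoustic) * l
        for a, l in zip(p_acoustic, p_linguistic)
    ]
    total = sum(fused)
    return [p / total for p in fused]  # renormalise so the scores sum to 1


# Toy inputs (made up): 2 frames x 2 acoustic features, 3 linguistic features.
acoustic_vec = pool_frames([[0.2, 0.4], [0.6, 0.8]])
linguistic_vec = [1.0, 0.0, 0.5]
fused_features = early_fusion(acoustic_vec, linguistic_vec)

# Toy class probabilities from two separate (hypothetical) classifiers.
fused_probs = late_fusion([0.7, 0.2, 0.1], [0.3, 0.6, 0.1])
label = EMOTIONS[fused_probs.index(max(fused_probs))]
```

In early fusion a single classifier would be trained on `fused_features`; in late fusion each modality keeps its own classifier and only the outputs are combined, with `w_acoustic` controlling how much the acoustic model is trusted.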

Files in this item:

File	Description	Size	Format
M-AKAGI-I-2.pdf		2436Kb	Adobe PDF

All items stored in this system are protected by copyright.
