JAIST Repository: Two-stage dimensional emotion recognition by fusing predictions of acoustic and text networks using SVM

Please use the following identifier to cite this item: https://hdl.handle.net/10119/18116

Title:	Two-stage dimensional emotion recognition by fusing predictions of acoustic and text networks using SVM
Authors:	Atmaja, Bagus Tris; Akagi, Masato
Keywords:	Automatic speech emotion recognition; Affective computing; Late fusion; Bimodal fusion; Dimensional emotion
Issue date:	2020-11-19
Publisher:	Elsevier
Journal:	Speech Communication
Volume:	126
Start page:	9
End page:	21
DOI:	10.1016/j.specom.2020.11.003
Abstract:	Automatic speech emotion recognition (SER) by a computer is a critical component for more natural human-machine interaction. As in human-human interaction, the capability to perceive emotion correctly is essential to taking further steps in a particular situation. One issue in SER is whether it is necessary to combine acoustic features with other data such as facial expressions, text, and motion capture. This research proposes to combine acoustic and text information by applying a late-fusion approach consisting of two steps. First, acoustic and text features are used to train separate deep learning systems. Second, the predictions from these deep learning systems are fed into a support vector machine (SVM) to predict the final regression score. Furthermore, the task in this research is dimensional emotion modeling, because it can enable deeper analysis of affective states. Experimental results show that this two-stage, late-fusion approach obtains higher performance than any one-stage processing, with a linear correlation between one-stage and two-stage performance. This late-fusion approach improves on previous early-fusion results as measured by concordance correlation coefficient (CCC) scores.
Rights:	Copyright (C)2020, Elsevier. Licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International license (CC BY-NC-ND 4.0). [http://creativecommons.org/licenses/by-nc-nd/4.0/] NOTICE: This is the author’s version of a work accepted for publication by Elsevier. Changes resulting from the publishing process, including peer review, editing, corrections, structural formatting and other quality control mechanisms, may not be reflected in this document. Changes may have been made to this work since it was submitted for publication. A definitive version was subsequently published in Bagus Tris Atmaja, Masato Akagi, and Peter Birkholz, Speech Communication, 126, 2020, 9-21, https://doi.org/10.1016/j.specom.2020.11.003
URI:	https://hdl.handle.net/10119/18116
Resource type:	author
Appears in collections:	b10-1. Journal Articles
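
The two-stage late fusion described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the stage-1 network outputs are replaced by synthetic noisy predictions for a single emotion dimension, the function names `ccc` and `fuse_with_svr` are hypothetical, and the SVR settings (RBF kernel, `C=1.0`, `epsilon=0.01`) are arbitrary assumptions. It relies on NumPy and scikit-learn.

```python
import numpy as np
from sklearn.svm import SVR


def ccc(y_true, y_pred):
    """Concordance correlation coefficient between two 1-D arrays."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    mean_t, mean_p = y_true.mean(), y_pred.mean()
    # Population covariance, matching the population variances used below.
    cov = np.mean((y_true - mean_t) * (y_pred - mean_p))
    return 2.0 * cov / (y_true.var() + y_pred.var() + (mean_t - mean_p) ** 2)


def fuse_with_svr(acoustic_dev, text_dev, labels_dev, acoustic_test, text_test):
    """Stage 2: fit an SVR on stage-1 predictions, then fuse held-out predictions."""
    # Each row holds the two unimodal predictions for one utterance.
    x_dev = np.column_stack([acoustic_dev, text_dev])
    x_test = np.column_stack([acoustic_test, text_test])
    model = SVR(kernel="rbf", C=1.0, epsilon=0.01)  # assumed hyperparameters
    model.fit(x_dev, labels_dev)
    return model.predict(x_test)


# Synthetic stand-ins for stage-1 outputs (assumption: the real inputs would be
# predictions of the trained acoustic and text networks).
rng = np.random.default_rng(0)
n_dev, n_test = 300, 100
labels = rng.uniform(-1.0, 1.0, size=n_dev + n_test)
acoustic_pred = labels + rng.normal(0.0, 0.4, size=labels.shape)
text_pred = labels + rng.normal(0.0, 0.4, size=labels.shape)

fused = fuse_with_svr(
    acoustic_pred[:n_dev], text_pred[:n_dev], labels[:n_dev],
    acoustic_pred[n_dev:], text_pred[n_dev:],
)

print("CCC acoustic only:", ccc(labels[n_dev:], acoustic_pred[n_dev:]))
print("CCC text only:    ", ccc(labels[n_dev:], text_pred[n_dev:]))
print("CCC fused (SVR):  ", ccc(labels[n_dev:], fused))
```

The SVR is fitted on predictions for one subset and evaluated on another so that the fused score is not measured on the data used to fit the fusion model. For several emotion dimensions, one such regressor per dimension would be a natural extension of this sketch.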

Files in this item:

File	Description	Size	Format
specom126.pdf		538 KB	Adobe PDF
