JAIST Repository

Please use this identifier to cite or link to this item: http://hdl.handle.net/10119/16289

Full metadata record

DC Field    Value    Language
contributor.author    Atmaja, Bagus Tris    en_US
contributor.author    Akagi, Masato    en_US
date.accessioned    2020-06-03T01:03:43Z    -
date.available    2020-06-03T01:03:43Z    -
date.issued    2020-05-27    en_US
identifier.uri    http://hdl.handle.net/10119/16289    -
description.abstract    The majority of research in speech emotion recognition (SER) is conducted to recognize emotion categories. However, recognizing dimensional emotion attributes is also important, and it has several advantages over categorical emotion recognition. For this research, we investigate dimensional SER using both speech features and word embeddings. The concatenation network joins acoustic networks and text networks from bimodal features. We demonstrate that these bimodal features, both extracted from speech, improve the performance of dimensional SER over unimodal SER using either acoustic features or word embeddings. A significant improvement on the valence dimension is contributed by the addition of word embeddings to the SER system, while the arousal and dominance dimensions are also improved. We propose a multitask learning (MTL) approach for the prediction of all emotional attributes. This MTL maximizes the concordance correlation between predicted emotion degrees and true emotion labels simultaneously. The findings suggest that the use of MTL with two parameters is better than the other evaluated methods in representing the interrelation of emotional attributes. In unimodal results, speech features attain higher performance on arousal and dominance, while word embeddings are better for predicting valence. The overall evaluation uses the concordance correlation coefficient score of the three emotional attributes. We also discuss some differences between categorical and dimensional emotion results from psychological and engineering perspectives.    en_US
format.extent    744243 bytes    -
format.mimetype    application/pdf    -
language.iso    en    en_US
publisher    Cambridge University Press    en_US
rights    SIP (2020), vol. 9, e17, page 1 of 12. (c) The Author(s), 2020. Published by Cambridge University Press in association with Asia Pacific Signal and Information Processing Association. This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted re-use, distribution, and reproduction in any medium, provided the original work is properly cited. doi:10.1017/ATSIP.2020.14    en_US
subject    Speech emotion recognition    en_US
subject    Multitask learning    en_US
subject    Feature fusion    en_US
subject    Dimensional emotion    en_US
subject    Affective computing    en_US
title    Dimensional speech emotion recognition from speech features and word embeddings by using multitask learning    en_US
type.nii    Journal Article    en_US
identifier.niiissn    2048-7703    en_US
identifier.jtitle    APSIPA Transactions on Signal and Information Processing    en_US
identifier.volume    9    en_US
identifier.spage    e17    en_US
relation.doi    10.1017/ATSIP.2020.14    en_US
rights.textversion    publisher    en_US
language.iso639-2    eng    en_US
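The abstract describes a multitask loss that maximizes the concordance correlation coefficient (CCC) between predicted and true degrees of valence, arousal, and dominance simultaneously. A minimal numpy sketch of such a loss is given below; the function names and the alpha/beta weighting scheme are illustrative assumptions, not the authors' exact formulation.

```python
import numpy as np

def ccc(x, y):
    """Concordance correlation coefficient between two 1-D arrays."""
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return 2.0 * cov / (vx + vy + (mx - my) ** 2)

def mtl_ccc_loss(pred, true, alpha=0.5, beta=0.5):
    """Multitask loss summing (1 - CCC) over the three emotional attributes.

    alpha and beta are hypothetical weighting parameters for illustration;
    the paper's tuned values and exact weighting are not reproduced here.
    Minimizing this loss maximizes the CCC of each attribute jointly.
    """
    l_valence = 1.0 - ccc(pred["valence"], true["valence"])
    l_arousal = 1.0 - ccc(pred["arousal"], true["arousal"])
    l_dominance = 1.0 - ccc(pred["dominance"], true["dominance"])
    return l_valence + alpha * l_arousal + beta * l_dominance
```

For a perfect prediction the CCC of each attribute is 1, so the loss is 0; in practice the loss would be minimized by gradient descent over the bimodal network's parameters.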
Appears in Collections: b10-1. Journal Articles

Files in This Item:

File    Description    Size    Format
SIP_Atmaja.pdf        726Kb    Adobe PDF

All items in this repository are protected by copyright.

Contact: Library Information Section, Research Promotion Division, Japan Advanced Institute of Science and Technology (JAIST) (ir-sys[at]ml.jaist.ac.jp)