JAIST Repository: Speech Emotion and Naturalness Recognitions With Multitask and Single-Task Learnings



Please use this identifier to cite or link to this item: http://hdl.handle.net/10119/18103

Title:	Speech Emotion and Naturalness Recognitions With Multitask and Single-Task Learnings
Authors:	Atmaja, Bagus Tris; Sasou, Akira; Akagi, Masato
Keywords:	Speech emotion recognition; speech naturalness recognition; multitask learning; affective computing; speech processing
Issue Date:	2022-07-07
Publisher:	Institute of Electrical and Electronics Engineers (IEEE)
Journal:	IEEE Access
Volume:	10
Start page:	72381
End page:	72387
DOI:	10.1109/ACCESS.2022.3189481
Abstract:	This paper evaluates speech emotion and naturalness recognition using deep learning models with multitask and single-task learning approaches. The emotion model covers the valence, arousal, and dominance attributes, known collectively as dimensional emotion. The naturalness ratings are labeled on a five-point scale, in the same way as dimensional emotion. Multitask learning predicts dimensional emotion (as the main task) and naturalness scores (as an auxiliary task) simultaneously. Single-task learning predicts either dimensional emotion (valence, arousal, and dominance) or the naturalness score independently. The multitask learning results improve on previous single-task learning studies for both dimensional emotion recognition and naturalness prediction. Within this study, however, single-task learning still outperforms multitask learning for naturalness recognition. Scatter plots of the emotion and naturalness predictions against the true labels in multitask learning expose a limitation of the model: it fails to predict low and extremely high scores. The low score for naturalness prediction in this study is possibly due to the small number of unnatural speech samples, since the MSP-IMPROV dataset promotes the naturalness of speech. The finding that jointly predicting naturalness with emotion improves emotion recognition performance may be incorporated into emotion recognition models in future work.
Rights:	Bagus Tris Atmaja, Akira Sasou, Masato Akagi, IEEE Access, 10, 2022, pp. 72381-72387. DOI: 10.1109/ACCESS.2022.3189481. This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License. For more information, see https://creativecommons.org/licenses/by-nc-nd/4.0/
URI:	http://hdl.handle.net/10119/18103
Type:	publisher
Appears in Collections:	b10-1. Journal Articles

Files in This Item:

File	Description	Size	Format
IEEEAccess2022.pdf		8209 kB	Adobe PDF

All items stored in this system are protected by copyright.
