
Please use this identifier to cite or link to this item: http://hdl.handle.net/10119/16212

Title: Speech Emotion Recognition Using 3D Convolutions and Attention-Based Sliding Recurrent Networks With Auditory Front-Ends
Authors: Peng, Zhichao
Li, Xingfeng
Zhu, Zhi
Unoki, Masashi
Dang, Jianwu
Akagi, Masato
Keywords: Auditory front-ends
3D convolutions
joint spectral-temporal representations
attention-based sliding recurrent networks
speech emotion recognition
Issue Date: 2020-01-20
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Journal name: IEEE Access
Volume: 8
Start page: 16560
End page: 16572
DOI: 10.1109/ACCESS.2020.2967791
Abstract: Emotion information in speech can help robots understand a speaker's intentions in natural human-robot interaction. The human auditory system easily tracks the temporal dynamics of emotion by perceiving the intensity and fundamental frequency of speech, and it focuses on salient emotional regions. Combining auditory and attention mechanisms may therefore be an effective approach to speech emotion recognition. Some previous studies used auditory-based static features to identify emotion while ignoring emotion dynamics; others used attention models to capture salient emotional regions while ignoring cognitive continuity. To fully exploit both mechanisms, we first investigate temporal modulation cues from auditory front-ends and then propose a joint deep learning model that combines 3D convolutions with attention-based sliding recurrent neural networks (ASRNNs) for emotion recognition. Experiments on the IEMOCAP and MSP-IMPROV datasets indicate that the proposed method effectively recognizes speech emotions from temporal modulation cues. A subjective evaluation shows that the attention patterns of the model are largely consistent with human behavior in recognizing emotions.
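The abstract describes the architecture only at a high level: 3D convolutions over a joint spectral-temporal (temporal modulation) representation from an auditory front-end, followed by an attention-based sliding recurrent network. The following PyTorch-style sketch illustrates one plausible reading of that pipeline; it is not the authors' implementation, and all input shapes, layer sizes, window/hop lengths, and the four-class output are illustrative assumptions.

# Minimal sketch (assumptions only): 3D convolutions over a joint
# spectral-temporal representation, then a sliding GRU with frame-level
# attention, pooled into an utterance-level emotion prediction.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ASRNNSketch(nn.Module):
    def __init__(self, num_classes=4, hidden=128, win=20, hop=10):
        super().__init__()
        self.win, self.hop = win, hop  # sliding-window length and hop (frames)
        # 3D convolutions over (modulation, spectral, temporal) axes
        self.conv = nn.Sequential(
            nn.Conv3d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool3d((2, 2, 1)),
            nn.Conv3d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool3d((1, 1, None)),  # collapse all but the temporal axis
        )
        self.rnn = nn.GRU(32, hidden, batch_first=True, bidirectional=True)
        self.att = nn.Linear(2 * hidden, 1)   # frame-level attention scores
        self.cls = nn.Linear(2 * hidden, num_classes)

    def forward(self, x):
        # x: (batch, 1, modulation_bins, spectral_bands, frames) - assumed layout
        feats = self.conv(x)                                  # (batch, 32, 1, 1, frames)
        feats = feats.squeeze(2).squeeze(2).transpose(1, 2)   # (batch, frames, 32)

        # Slide a window along time; summarize each segment with attention
        segments = []
        for start in range(0, feats.size(1) - self.win + 1, self.hop):
            seg = feats[:, start:start + self.win]            # (batch, win, 32)
            out, _ = self.rnn(seg)                            # (batch, win, 2*hidden)
            alpha = F.softmax(self.att(out), dim=1)           # attention weights
            segments.append((alpha * out).sum(dim=1))         # weighted segment summary
        utterance = torch.stack(segments, dim=1).mean(dim=1)  # pool over segments
        return self.cls(utterance)


if __name__ == "__main__":
    model = ASRNNSketch()
    dummy = torch.randn(2, 1, 8, 32, 100)  # (batch, channel, mod, spec, frames)
    print(model(dummy).shape)              # torch.Size([2, 4])

The attention weights (alpha) computed per sliding segment are what a subjective evaluation like the one mentioned in the abstract would compare against human judgments of salient emotional regions.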
Rights: Zhichao Peng, Xingfeng Li, Zhi Zhu, Masashi Unoki, Jianwu Dang, and Masato Akagi, IEEE Access, 8, 2020, pp.16560-16572. DOI:10.1109/ACCESS.2020.2967791. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see http://creativecommons.org/licenses/by/4.0/
URI: http://hdl.handle.net/10119/16212
Material Type: publisher
Appears in Collections: b10-1. 雑誌掲載論文 (Journal Articles)

Files in This Item:

File: 3065.pdf
Size: 5315Kb
Format: Adobe PDF
