Please use this identifier to cite or link to this item: http://hdl.handle.net/10119/16212

Title: Speech Emotion Recognition Using 3D Convolutions and Attention-Based Sliding Recurrent Networks With Auditory Front-Ends
Authors: Peng, Zhichao
Li, Xingfeng
Zhu, Zhi
Unoki, Masashi
Dang, Jianwu
Akagi, Masato
Keywords: Auditory front-ends
3D convolutions
joint spectral-temporal representations
attention-based sliding recurrent networks
speech emotion recognition
Issue Date: 2020-01-20
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Journal name: IEEE Access
Volume: 8
Start page: 16560
End page: 16572
DOI: 10.1109/ACCESS.2020.2967791
Abstract: Emotion information from speech can effectively help robots understand a speaker’s intentions in natural human-robot interaction. The human auditory system can easily track the temporal dynamics of emotion by perceiving the intensity and fundamental frequency of speech, and can focus on salient emotion regions. Speech emotion recognition that combines an auditory mechanism with an attention mechanism may therefore be an effective approach. Some previous studies used auditory-based static features to identify emotion while ignoring emotion dynamics. Other studies used attention models to capture the salient regions of emotion while ignoring cognitive continuity. To fully utilize both the auditory and attention mechanisms, we first investigate temporal modulation cues from auditory front-ends and then propose a joint deep learning model that combines 3D convolutions and attention-based sliding recurrent neural networks (ASRNNs) for emotion recognition. Our experiments on the IEMOCAP and MSP-IMPROV datasets indicate that the proposed method can effectively recognize emotions in speech from temporal modulation cues. The subjective evaluation shows that the attention patterns of the attention model are largely consistent with human behavior in recognizing emotions.
Rights: Zhichao Peng, Xingfeng Li, Zhi Zhu, Masashi Unoki, Jianwu Dang, and Masato Akagi, IEEE Access, 8, 2020, pp.16560-16572. DOI:10.1109/ACCESS.2020.2967791. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see http://creativecommons.org/licenses/by/4.0/
URI: http://hdl.handle.net/10119/16212
Material Type: publisher
Appears in Collections: b10-1. Journal Articles

Files in This Item:

File: 3065.pdf
Size: 5315 kB
Format: Adobe PDF
