JAIST Repository: Auditory-Inspired End-to-End Speech Emotion Recognition Using 3D Convolutional Recurrent Neural Networks Based on Spectral-Temporal Representation

トップページ| 北陸先端科学技術大学院大学| 附属図書館

一覧

コミュニティ
& コレクション
タイトル
著者
日付
学位論文
リサーチレポート・テクニカルメモランダム

登録利用者:

登録者ページ
利用者(E-people)

当システムについて

JAIST Repository >
b. 情報科学研究科・情報科学系 >
b11. 会議発表論文・発表資料等 >
b11-1. 会議発表論文・発表資料 >

このアイテムの引用には次の識別子を使用してください: http://hdl.handle.net/10119/15481

タイトル:	Auditory-Inspired End-to-End Speech Emotion Recognition Using 3D Convolutional Recurrent Neural Networks Based on Spectral-Temporal Representation
著者:	Peng, Zhichao Zhu, Zhi Unoki, Masashi Dang, Jianwu Akagi, Masato
キーワード:	temporal modulation three-dimensional convolutional recurrent neural networks spectral-temporal representation speech emotion recognition
発行日:	2018-07-26
出版者:	Institute of Electrical and Electronics Engineers (IEEE)
誌名:	2018 IEEE International Conference on Multimedia and Expo (ICME)
開始ページ:	1
終了ページ:	6
DOI:	10.1109/ICME.2018.8486564
抄録:	The human auditory system has far superior emotion recognition abilities compared with recent speech emotion recognition systems, so research has focused on designing emotion recognition systems by mimicking the human auditory system. Psychoacoustic and physiological studies indicate that the human auditory system decomposes speech signals into acoustic and modulation frequency components, and further extracts temporal modulation cues. Speech emotional states are perceived from temporal modulation cues using the spectral and temporal receptive field of the neuron. This paper proposes an emotion recognition system in an end-to-end manner using three-dimensional convolutional recurrent neural networks (3D-CRNNs) based on temporal modulation cues. Temporal modulation cues contain four-dimensional spectral-temporal (ST) integration representations directly as the input of 3D-CRNNs. The convolutional layer is used to extract high-level multiscale ST representations, and the recurrent layer is used to extract long-term dependency for emotion recognition. The proposed method was verified on the IEMOCAP database. The results show that our proposed method can exceed the recognition accuracy compared to that of the state-of-the-art systems.
Rights:	This is the author's version of the work. Copyright (C) 2018 IEEE. 2018 IEEE International Conference on Multimedia and Expo (ICME), 2018, DOI:10.1109/ICME.2018.8486564. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.
URI:	http://hdl.handle.net/10119/15481
資料タイプ:	author
出現コレクション:	b11-1. 会議発表論文・発表資料 (Conference Papers)

このアイテムのファイル:

ファイル	記述	サイズ	形式
2812.pdf		1149Kb	Adobe PDF	見る/開く

当システムに保管されているアイテムはすべて著作権により保護されています。

お問合せ先 : 北陸先端科学技術大学院大学　研究推進課図書館情報係 (ir-sys[at]ml.jaist.ac.jp)