JAIST Repository >
Please use this identifier to cite or link to this item:
http://hdl.handle.net/10119/15512
Title: A Three-Layer Emotion Perception Model for Valence and Arousal-Based Detection from Multilingual Speech
Authors: Li, Xingfeng; Akagi, Masato
Keywords: emotion recognition; emotion dimension; three-layer model; prosodic feature; spectrogram; glottal waveform
Issue Date: 2018
Publisher: International Speech Communication Association
Journal: Proc. Interspeech 2018
Start page: 3643
End page: 3647
DOI: 10.21437/Interspeech.2018-1820
Abstract: Automated emotion detection from speech has recently shifted from monolingual to multilingual tasks, aiming at human-like interaction in real-life settings where a system can handle more than a single input language. However, most monolingual emotion detection systems are difficult to generalize to multiple languages because their optimal feature sets differ from one language to another. Our study proposes a framework to design, implement, and validate an emotion detection system using multiple corpora. A continuous dimensional space of valence and arousal is first used to describe the emotions. A three-layer model incorporating fuzzy inference systems is then used to estimate these two dimensions. Speech features derived from prosodic, spectral, and glottal waveform cues are examined and selected to capture emotional content. The new system outperformed the existing state-of-the-art system, yielding a smaller mean absolute error and a higher correlation between estimates and human evaluations. Moreover, results for speaker-independent validation are comparable to those of human evaluators.
URI: http://hdl.handle.net/10119/15512
Material type: publisher
Appears in Collections: b11-1. Conference Papers
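The abstract describes estimating valence and arousal with a three-layer model built on fuzzy inference systems. As a rough illustration of the fuzzy inference idea only (not the paper's actual model, features, or rule base, which are hypothetical here), a zero-order Sugeno-style estimator mapping one normalized acoustic feature score to one emotion dimension could be sketched as:

```python
# Illustrative sketch: a zero-order Sugeno fuzzy inference step mapping a
# normalized acoustic feature score in [0, 1] to one emotion dimension
# (e.g. arousal in [-1, 1]). Rule constants and breakpoints are made up.

def tri(x, a, b, c):
    """Triangular membership function peaking at b over the interval [a, c]."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x < b else (c - x) / (c - b)

def fuzzy_estimate(x):
    """Three rules (low/mid/high feature level -> low/neutral/high dimension
    value), combined by a firing-strength-weighted average (defuzzification)."""
    rules = [
        (tri(x, -0.1, 0.0, 0.5), -1.0),  # low feature level  -> low arousal
        (tri(x,  0.0, 0.5, 1.0),  0.0),  # mid feature level  -> neutral
        (tri(x,  0.5, 1.0, 1.1),  1.0),  # high feature level -> high arousal
    ]
    num = sum(w * v for w, v in rules)
    den = sum(w for w, _ in rules)
    return num / den if den > 0 else 0.0

# Evaluation as in the abstract: mean absolute error between estimates and
# (here, invented) human ratings on the same dimension.
features = [0.1, 0.5, 0.9]
ratings = [-0.8, 0.0, 0.8]
mae = sum(abs(fuzzy_estimate(f) - r) for f, r in zip(features, ratings)) / len(ratings)
```

In the paper's actual three-layer scheme, such inference stages link acoustic features to intermediate perceptual descriptors and then to the valence/arousal dimensions; this sketch collapses that to a single feature-to-dimension mapping for brevity.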
Files in This Item:
File | Description | Size | Format
InterSpeech2018_Xingfeng.pdf | | 576Kb | Adobe PDF
All items in this repository are protected by copyright.