JAIST Repository: Improving speech emotion dimensions estimation using a three-layer model of human perception

Please use the following identifier to cite this item: https://hdl.handle.net/10119/11935

Title:	Improving speech emotion dimensions estimation using a three-layer model of human perception
Authors:	Elbarougy, Reda; Akagi, Masato
Keywords:	Emotion dimensions; Automatic speech emotion recognition; Multi-layer model; Fuzzy Inference Systems (FIS)
Issue date:	2014
Publisher:	Acoustical Society of Japan
Journal:	Acoustical Science and Technology
Volume:	35
Issue:	2
Start page:	86
End page:	98
DOI:	10.1250/ast.35.86
Abstract:	Most previous studies using the dimensional approach focused mainly on the direct relationship between acoustic features and emotion dimensions (valence, activation, and dominance). However, the acoustic features that correlate with the valence dimension are few, and the correlations are weak. As a result, the valence dimension has been particularly difficult to predict. The purpose of this research is to construct a speech emotion recognition system that can precisely estimate values of emotion dimensions, especially valence. This paper proposes a three-layer model to improve the estimation of emotion dimension values from acoustic features. The proposed model consists of three layers: emotion dimensions in the top layer, semantic primitives in the middle layer, and acoustic features in the bottom layer. First, a top-down acoustic feature selection method based on this model was used to select the most relevant acoustic features for each emotion dimension. Then, a bottom-up method was used to estimate values of emotion dimensions from acoustic features: one fuzzy inference system (FIS) estimates the degree of each semantic primitive from acoustic features, and another FIS estimates values of emotion dimensions from the estimated degrees of semantic primitives. The experimental results reveal that the emotion recognition system built on the proposed three-layer model outperforms the conventional system.
Rights:	Copyright (C) 2014 Acoustical Society of Japan. Reda Elbarougy and Masato Akagi, Acoustical Science and Technology, 35(2), 2014, 86-98. http://dx.doi.org/10.1250/ast.35.86
URI:	https://hdl.handle.net/10119/11935
Resource type:	author
Appears in collections:	b10-1. Journal Articles

Files in this item:

File	Description	Size	Format
AST2014_Reda.pdf		360Kb	Adobe PDF

All items stored in this system are protected by copyright.
