Please use this identifier to cite or link to this item: http://hdl.handle.net/10119/11563

Title: Cross-lingual Speech Emotion Recognition System Based on a Three-Layer Model for Human Perception
Authors: Elbarougy, Reda; Akagi, Masato
Issue Date: 2013-10
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Magazine name: 2013 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA)
Start page: 1
End page: 10
DOI: 10.1109/APSIPA.2013.6694137
Abstract: The purpose of this study is to investigate whether the emotion dimensions valence, activation, and dominance can be estimated cross-lingually. Most previous studies of automatic speech emotion recognition detected emotional states within a single language. However, to develop a generalized emotion recognition system, performance must be analyzed in both mono-lingual and cross-lingual settings. The ultimate goal of this study is to build a bilingual emotion recognition system that can estimate emotion dimensions for one language using a system trained on another language. We first propose a novel acoustic feature selection method based on a human perception model. The proposed model consists of three layers: emotion dimensions in the top layer, semantic primitives in the middle layer, and acoustic features in the bottom layer. The experimental results reveal that the proposed method is effective for selecting acoustic features that represent emotion dimensions across two different databases, one in Japanese and the other in German. The acoustic features common to the two databases are then used as the input to the cross-lingual emotion recognition system. The proposed cross-lingual system based on the three-layer model performs just as well as the two separate mono-lingual systems in estimating emotion dimension values.
Rights: This is the author's version of the work. Copyright (C) 2013 IEEE. 2013 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA), 2013, 1-10. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.
URI: http://hdl.handle.net/10119/11563
Material Type: author
Appears in Collections: b11-1. Conference Papers and Presentation Materials (Conference Papers)

Files in This Item:

File: Cross-lingual_Elbarougy_final.pdf
Size: 403 Kb
Format: Adobe PDF
