JAIST Repository >
School of Information Science >
Articles >
Journal Articles >

Please use this identifier to cite or link to this item: http://hdl.handle.net/10119/17071

Title: Improving multilingual speech emotion recognition by combining acoustic features in a three-layer model
Authors: Li, Xingfeng
Akagi, Masato
Keywords: Multilingual emotion recognition
Human emotional perception
Emotional space
Three-layer model
Issue Date: 2019-04-03
Publisher: Elsevier
Magazine name: Speech Communication
Volume: 110
Start page: 1
End page: 12
DOI: 10.1016/j.specom.2019.04.004
Abstract: This study presents a scheme for multilingual speech emotion recognition. Determining the emotion of speech in general relies upon specific training data, and a different target speaker or language may present significant challenges. In this regard, we first explore 215 acoustic features from emotional speech. Second, we carry out speaker normalization and feature selection to develop a shared standard acoustic parameter set for multiple languages. Third, we use a three-layer model composed of acoustic features, semantic primitives, and emotion dimensions to map acoustics into emotion dimensions. Finally, we classify the continuous emotion dimensional values into basic categories by using the logistic model trees. The proposed approach was tested on Japanese, German, Chinese, and English emotional speech corpora. The recognition performance was examined and enhanced by cross-speaker and cross-corpus evaluation, and stressed the fact that our strategy is particularly suited for the task of multilingual emotion recognition even with a different speaker or language. The experimental results were found to be reasonably comparable with those of monolingual emotion recognizers as a reference.
Rights: Copyright (C)2019, Elsevier. Licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International license (CC BY-NC-ND 4.0). [http://creativecommons.org/licenses/by-nc-nd/4.0/] NOTICE: This is the author’s version of a work accepted for publication by Elsevier. Changes resulting from the publishing process, including peer review, editing, corrections, structural formatting and other quality control mechanisms, may not be reflected in this document. Changes may have been made to this work since it was submitted for publication. A definitive version was subsequently published in Xingfeng Li and Masato Akagi, Speech Communication, 110, 2019, 1-12, http://dx.doi.org/10.1016/j.specom.2019.04.004
URI: http://hdl.handle.net/10119/17071
Material Type: author
Appears in Collections:b10-1. 雑誌掲載論文 (Journal Articles)

Files in This Item:

File Description SizeFormat
2925.pdf271KbAdobe PDFView/Open

All items in DSpace are protected by copyright, with all rights reserved.

 


Contact : Library Information Section, JAIST (ir-sys[at]ml.jaist.ac.jp)