JAIST Repository

Please use this identifier to cite or link to this item: http://hdl.handle.net/10119/12208

Title: Toward relaying emotional state for speech-to-speech translator: Estimation of emotional state for synthesizing speech with emotion
Authors: Akagi, Masato
Elbarougy, Reda
Issue date: 2014-07
Publisher: International Institute of Acoustics and Vibration (IIAV)
Journal title: Proceedings of the 21st International Congress on Sound and Vibration (ICSV21)
Start page: 1
End page: 8
Abstract: Most previous studies on Speech-to-Speech Translation (S2ST) focused on processing linguistic content by directly translating the spoken utterance from the source language to the target language, without taking into account paralinguistic and non-linguistic information, such as the emotional state conveyed by the speaker. However, for clear communication, it is important to capture and transmit the emotional state from the source language to the target language. In order to synthesize the target speech with the emotional state conveyed at the source, a speech emotion recognition system is required to detect the emotional state of the source language. The S2ST system should allow the source and target languages to be used interchangeably, i.e., it should be able to detect the emotional state of the source regardless of the language used. This paper proposes a Bilingual Speech Emotion Recognition (BSER) system for detecting the emotional state of the source language in the S2ST system. In natural speech, humans can detect emotional states regardless of the language used. This study therefore demonstrates the feasibility of constructing a global BSER system that can recognize universal emotions. The paper introduces a three-layer model: emotion dimensions in the top layer, semantic primitives in the middle layer, and acoustic features in the bottom layer. The experimental results reveal that the proposed system precisely estimates emotion dimensions cross-lingually, working with Japanese and German. Most importantly, using the proposed normalization method for acoustic features, we found that emotion recognition is language-independent. The system can therefore be extended to estimate the emotional state conveyed in the source language of an S2ST system for several language pairs.
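As a rough illustration of the three-layer estimation described in the abstract, the sketch below maps acoustic features to semantic primitives and then to emotion dimensions, with per-language z-score normalization of the acoustic features. This is a minimal sketch, not the authors' implementation: the regressor choice (ridge regression), the exact normalization scheme, and all names are assumptions for illustration only.

```python
# Minimal sketch of the three-layer idea (acoustic features ->
# semantic primitives -> emotion dimensions). Regressor choice,
# normalization scheme, and names are assumptions, not the paper's method.
import numpy as np
from sklearn.linear_model import Ridge


def normalize_per_language(X, languages):
    """Z-score each acoustic feature within each language group so that
    feature scales become comparable across languages (assumed scheme)."""
    X = np.asarray(X, dtype=float)
    languages = np.asarray(languages)
    Xn = np.empty_like(X)
    for lang in np.unique(languages):
        idx = languages == lang
        mu = X[idx].mean(axis=0)
        sigma = X[idx].std(axis=0) + 1e-8  # avoid division by zero
        Xn[idx] = (X[idx] - mu) / sigma
    return Xn


class ThreeLayerEstimator:
    """Bottom layer: acoustic features; middle layer: semantic primitives
    (e.g. 'bright', 'heavy'); top layer: emotion dimensions
    (e.g. valence and arousal)."""

    def __init__(self):
        self.acoustic_to_primitives = Ridge()
        self.primitives_to_dimensions = Ridge()

    def fit(self, X_acoustic, Y_primitives, Y_dimensions):
        # Train the top-layer mapping on the primitives predicted by the
        # bottom layer so that prediction behaves consistently at test time.
        self.acoustic_to_primitives.fit(X_acoustic, Y_primitives)
        P = self.acoustic_to_primitives.predict(X_acoustic)
        self.primitives_to_dimensions.fit(P, Y_dimensions)
        return self

    def predict(self, X_acoustic):
        P = self.acoustic_to_primitives.predict(X_acoustic)
        return self.primitives_to_dimensions.predict(P)
```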
Rights: Copyright (C) 2014 International Institute of Acoustics and Vibration (IIAV). Masato Akagi and Reda Elbarougy, Proceedings of the 21st International Congress on Sound and Vibration (ICSV21), 2014, pp.1-8. This paper is based on one first published in the proceedings of the 21st International Congress on Sound and Vibration, July 2014, and is published here with permission of the International Institute of Acoustics and Vibration (IIAV).
URI: http://hdl.handle.net/10119/12208
Material type: publisher
Appears in collections: b11-1. Conference Papers and Presentations (Conference Papers)

Files in this item:

File                     Description    Size     Format
ICSV2014_Akagi_Reda.pdf                 437 Kb   Adobe PDF

All items in this repository are protected by copyright.

Contact: Japan Advanced Institute of Science and Technology, Research Promotion Division, Library Information Section