Please use this identifier to cite or link to this item:
http://hdl.handle.net/10119/14744
Title: | Emotional speech synthesis system based on a three-layered model using a dimensional approach |
Authors: | Xue, Yawen; Hamada, Yasuhiro; Akagi, Masato |
Issue Date: | 2015-12-19 |
Publisher: | Institute of Electrical and Electronics Engineers (IEEE) |
Published in: | 2015 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA) |
Start page: | 505 |
End page: | 514 |
DOI: | 10.1109/APSIPA.2015.7415323 |
Abstract: | This paper proposes an emotional speech synthesis system based on a three-layered model using a dimensional approach. Most previous studies on emotional speech synthesis using the dimensional approach focused only on the relationship between acoustic features and emotion dimensions (valence and activation). However, people do not perceive emotion directly from acoustic features. Hence, the acoustic features have been particularly difficult to predict, and the affectiveness of the synthesized sound is far from that intended. The ultimate goal of this research is to improve the accuracy of acoustic feature estimation and of the modification rules in order to synthesize affective speech closer to that intended in the dimensional emotion space. The proposed system is composed of three layers: acoustic features, semantic primitives, and emotion dimensions. A fuzzy inference system (FIS) is used to connect the three layers. The acoustic features related to each semantic primitive are selected for synthesizing the emotional speech. On the basis of morphing rules, the estimated acoustic features can be applied to synthesize emotional speech. Listening tests were carried out to verify whether the synthesized speech gives the intended impression in the dimensional emotion space. Results show that the accuracy of the estimated acoustic features is raised and that the modification rules work well for the synthesized speech, so the proposed method improves the quality of the synthesized speech. |
Rights: | Copyright (C) 2015 APSIPA. This material is posted here with permission of APSIPA. Yawen Xue, Yasuhiro Hamada and Masato Akagi, 2015 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA), 2015, 505-514. http://dx.doi.org/10.1109/APSIPA.2015.7415323 |
URI: | http://hdl.handle.net/10119/14744 |
Material type: | author |
Appears in Collections: | b11-1. Conference Papers and Presentation Materials (Conference Papers) |
Files in This Item:
File | Description | Size | Format |
APSIPA2015_Xue.pdf | | 1383Kb | Adobe PDF | View/Open |
All items stored in this system are protected by copyright.