JAIST Repository: Voice Conversion to Emotional Speech based on Three-layered Model in Dimensional Approach and Parameterization of Dynamic Features in Prosody

Please use this identifier to cite or link to this item: http://hdl.handle.net/10119/14281

Title:	Voice Conversion to Emotional Speech based on Three-layered Model in Dimensional Approach and Parameterization of Dynamic Features in Prosody
Authors:	Xue, Yawen; Hamada, Yasuhiro; Akagi, Masato
Issue Date:	2016
Publisher:	Institute of Electrical and Electronics Engineers (IEEE)
Conference:	2016 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA)
Start Page:	1
End Page:	6
DOI:	10.1109/APSIPA.2016.7820690
Abstract:	This paper proposes a system that converts neutral speech into emotional speech with controlled emotional intensity. Most previous research on emotional speech synthesis used statistical or concatenative methods that synthesize emotions in categorical states such as joy, anger, and sadness. While humans sometimes strengthen or weaken their emotional states and intensity in daily life, categorical synthesized emotional speech is not sufficient to describe these phenomena precisely. A dimensional approach, which represents an emotion as a point in a dimensional space, can express emotions with continuous intensity. Employing the dimensional approach to describe emotion, we construct a three-layered model to estimate the displacement of the acoustic features of the target emotional speech from those of the source (neutral) speech, and propose a rule-based conversion method that modifies the acoustic features of the source (neutral) speech to synthesize the target emotional speech. To convert the source speech freely and easily, we introduce two methods to parameterize dynamic features in prosody: the Fujisaki model for the F0 contour and a target prediction model for the power envelope. Evaluation results show that subjects perceived the intended emotions with a satisfactory order of emotional intensity and satisfactory naturalness. This indicates that the system can not only synthesize emotional speech in categories but also control the order of emotional intensity in the dimensional space, even within the same emotion category.
Rights:	This is the author's version of the work. Copyright (C) 2016 IEEE. 2016 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA), 2016, 1-6. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.
URI:	http://hdl.handle.net/10119/14281
Item Type:	Author's version
Appears in Collections:	b11-1. Conference Papers and Presentation Materials (Conference Papers)
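The abstract names the Fujisaki model as the parameterization of the F0 contour. As a rough illustration, here is a minimal sketch of the standard textbook form of that model: ln F0(t) is the baseline ln Fb plus phrase components (impulse responses Gp of a critically damped second-order system) plus accent components (differences of step responses Ga, ceiled at γ). The constants α = 3.0 /s, β = 20.0 /s, and γ = 0.9 are commonly used default values assumed here, not values taken from the paper; the class names, function names, and example command values are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class PhraseCommand:
    onset: float      # T0: time of the phrase impulse (s)
    amplitude: float  # Ap: magnitude of the phrase command


@dataclass
class AccentCommand:
    onset: float      # T1: start of the accent step (s)
    offset: float     # T2: end of the accent step (s)
    amplitude: float  # Aa: magnitude of the accent command


def phrase_response(t: float, alpha: float = 3.0) -> float:
    """Gp(t) = alpha^2 * t * exp(-alpha * t) for t >= 0, else 0.

    alpha = 3.0 /s is an assumed typical value, not taken from the paper.
    """
    if t < 0.0:
        return 0.0
    return alpha * alpha * t * math.exp(-alpha * t)


def accent_response(t: float, beta: float = 20.0, gamma: float = 0.9) -> float:
    """Ga(t) = min(1 - (1 + beta * t) * exp(-beta * t), gamma) for t >= 0, else 0.

    beta = 20.0 /s and gamma = 0.9 are assumed typical values.
    """
    if t < 0.0:
        return 0.0
    return min(1.0 - (1.0 + beta * t) * math.exp(-beta * t), gamma)


def fujisaki_f0(
    times: Sequence[float],
    fb: float,
    phrases: List[PhraseCommand],
    accents: List[AccentCommand],
) -> List[float]:
    """Return F0 in Hz at each time instant from Fujisaki command parameters."""
    contour = []
    for t in times:
        log_f0 = math.log(fb)  # baseline frequency Fb on the log scale
        for p in phrases:
            log_f0 += p.amplitude * phrase_response(t - p.onset)
        for a in accents:
            # An accent command is a step from T1 to T2, i.e. Ga(t-T1) - Ga(t-T2).
            log_f0 += a.amplitude * (
                accent_response(t - a.onset) - accent_response(t - a.offset)
            )
        contour.append(math.exp(log_f0))
    return contour


# Hypothetical usage: 2 s sampled every 10 ms, one phrase and one accent command.
times = [i * 0.01 for i in range(200)]
f0 = fujisaki_f0(
    times,
    fb=80.0,
    phrases=[PhraseCommand(onset=0.0, amplitude=0.5)],
    accents=[AccentCommand(onset=0.3, offset=0.6, amplitude=0.4)],
)
```

Under this kind of parameterization, a whole contour is described by a handful of values (Fb, the Ap and Aa amplitudes, and their timings), so a conversion rule can edit those values rather than the raw F0 samples. How the paper's rules map emotions to these parameters is not described in this record.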

Files in This Item:

File	Description	Size	Format
2230.pdf		753 KB	Adobe PDF

All items in this system are protected by copyright.

Contact: Japan Advanced Institute of Science and Technology, Research Promotion Section, Library Information Unit (ir-sys[at]ml.jaist.ac.jp)