JAIST Repository
Please use this identifier to cite or link to this item:
http://hdl.handle.net/10119/9980
Title: | Phoneme-based Spectral Voice Conversion Using Temporal Decomposition and Gaussian Mixture Model |
Authors: | Nguyen, Binh Phu; Akagi, Masato |
Keywords: | spectral voice conversion; temporal decomposition; Gaussian mixture model (GMM) |
Issue Date: | 2008-06 |
Publisher: | Institute of Electrical and Electronics Engineers (IEEE) |
Citation: | Second International Conference on Communications and Electronics, 2008 (ICCE 2008) |
Start Page: | 224 |
End Page: | 229 |
Abstract: | In state-of-the-art voice conversion systems, GMM-based methods are regarded as among the best. However, the quality of converted speech is still far from natural. There are three main reasons for this degradation: (i) modeling the distribution of acoustic features in voice conversion often uses unstable frames, which degrades the precision of the GMM parameters; (ii) the transformation function may generate discontinuous features if frames are processed independently; (iii) an over-smoothing effect occurs in each converted frame. This paper presents a new spectral voice conversion method that addresses the first two drawbacks of standard spectral modification methods: insufficient precision of the GMM parameters and insufficient smoothness of the converted spectra between frames. A speech analysis technique called temporal decomposition (TD), which decomposes speech into event targets and event functions, is used to model the spectral evolution effectively. To improve the estimation of the GMM parameters, we use phoneme-based features of event targets as spectral vectors in the training procedure, taking into account the relations between spectral parameters within each phoneme and avoiding the use of spectral parameters in transition parts. To enhance the continuity of the speech spectra, we convert only the event targets, instead of converting source features to target features frame by frame; the smoothness of the converted speech is then ensured by the shape of the event functions. Experimental results show that the proposed method improves both the speech quality and the speaker individuality of the converted speech. |
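The GMM mapping function underlying this family of methods (applied in the paper to event targets rather than individual frames) can be sketched as follows. This is a minimal 1-D, two-component toy, not the paper's system; all joint-GMM parameter values are hypothetical and chosen purely for illustration:

```python
# Sketch of the standard GMM-based spectral mapping E[y|x]:
# a posterior-weighted sum of per-component linear regressions.
# All parameters below are hypothetical toy values, not trained ones.
import numpy as np

weights = np.array([0.5, 0.5])   # component weights
mu_x = np.array([0.0, 4.0])      # source means per component
var_x = np.array([1.0, 1.0])     # source variances per component
mu_y = np.array([1.0, 6.0])      # target means per component
cov_yx = np.array([0.8, 0.8])    # source-target cross-covariances

def convert(x):
    """Map a source feature x to the converted target estimate
    E[y|x] = sum_i p(i|x) * (mu_y_i + cov_yx_i / var_x_i * (x - mu_x_i))."""
    # Component likelihoods of x under each source Gaussian
    lik = weights * np.exp(-0.5 * (x - mu_x) ** 2 / var_x) / np.sqrt(2 * np.pi * var_x)
    post = lik / lik.sum()  # posterior p(i|x)
    return float(np.sum(post * (mu_y + cov_yx / var_x * (x - mu_x))))

# A source value near a component mean maps close to that component's target mean
print(convert(0.0))
print(convert(4.0))
```

Converting only a handful of event targets with such a function, and letting the event functions interpolate between them, is what gives the paper's approach its inter-frame smoothness, in contrast to applying the mapping independently to every frame.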
Rights: | Copyright (C) 2008 IEEE. Reprinted from Second International Conference on Communications and Electronics, 2008 (ICCE 2008), 2008, 224-229. This material is posted here with permission of the IEEE. Such permission of the IEEE does not in any way imply IEEE endorsement of any of JAIST's products or services. Internal or personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution must be obtained from the IEEE by writing to pubs-permissions@ieee.org. By choosing to view this document, you agree to all provisions of the copyright laws protecting it. |
URI: | http://hdl.handle.net/10119/9980 |
Material Type: | publisher
Appears in Collections: | b11-1. Conference Papers
Files in This Item:
File | Description | Size | Format
ICCE2008_BA.pdf | | 327Kb | Adobe PDF
All items in this repository are protected by copyright, with all rights reserved.