Please use this identifier to cite or link to this item:
http://hdl.handle.net/10119/15379
Title: | Nonparallel Dictionary-Based Voice Conversion Using Variational Autoencoder with Modulation-Spectrum-Constrained Training |
Authors: | Ho, Tuan Vu; Akagi, Masato |
Issue date: | 2018-07-25 |
Publisher: | 信号処理学会 |
Journal: | Journal of Signal Processing |
Volume: | 22 |
Issue: | 4 |
Start page: | 189 |
End page: | 192 |
DOI: | 10.2299/jsp.22.189 |
Abstract: | In this paper, we present a nonparallel voice conversion (VC) approach that does not require parallel data or linguistic labeling for the training process. Dictionary-based voice conversion is a class of methods aiming to decompose speech into separate factors for manipulation. Non-negative matrix factorization (NMF) is the most common method to decompose an input spectrum into a weighted linear combination of a set comprising a dictionary (basis) and weights. However, the requirement for parallel training data in this method causes several problems: 1) practical usability is limited when parallel data are not available, and 2) the additional error from the alignment process degrades the output speech quality. To alleviate these problems, we present a dictionary-based VC approach that incorporates a variational autoencoder (VAE) to decompose an input speech spectrum into a speaker dictionary and weights without parallel training data. According to evaluation results, the proposed method achieves better speech naturalness while retaining the same speaker similarity as NMF-based VC, even though unaligned data are used. |
Rights: | Copyright (C) 2018 信号処理学会. Tuan Vu Ho, Masato Akagi, Journal of Signal Processing, 22(4), 2018, 189-192. http://dx.doi.org/10.2299/jsp.22.189 |
URI: | http://hdl.handle.net/10119/15379 |
Resource type: | publisher |
Appears in collections: | b10-1. 雑誌掲載論文 (Journal Articles) |
Files in this item:
File | Description | Size | Format |
2813.pdf | | 716Kb | Adobe PDF |
All items stored in this system are protected by copyright.