Please use the following identifier to cite this item: http://hdl.handle.net/10119/15379

Title: Nonparallel Dictionary-Based Voice Conversion Using Variational Autoencoder with Modulation-Spectrum-Constrained Training
Authors: Ho, Tuan Vu
Akagi, Masato
Issue date: 2018-07-25
Publisher: 信号処理学会
Journal title: Journal of Signal Processing
Volume: 22
Issue: 4
Start page: 189
End page: 192
DOI: 10.2299/jsp.22.189
Abstract: In this paper, we present a nonparallel voice conversion (VC) approach that requires neither parallel data nor linguistic labeling for training. Dictionary-based voice conversion is a class of methods that decompose speech into separate factors for manipulation. Non-negative matrix factorization (NMF) is the most common such method; it decomposes an input spectrum into a weighted linear combination of dictionary (basis) vectors. However, its requirement for parallel training data causes two problems: 1) limited practical usability when parallel data are not available, and 2) degraded output speech quality due to the additional error introduced by the alignment process. To alleviate these problems, we present a dictionary-based VC approach that incorporates a variational autoencoder (VAE) to decompose an input speech spectrum into a speaker dictionary and weights without parallel training data. According to the evaluation results, the proposed method achieves better speech naturalness while retaining the same speaker similarity as NMF-based VC, even though unaligned data are used.
Rights: Copyright (C) 2018 信号処理学会. Tuan Vu Ho, Masato Akagi, Journal of Signal Processing, 22(4), 2018, 189-192. http://dx.doi.org/10.2299/jsp.22.189
URI: http://hdl.handle.net/10119/15379
Resource type: publisher
Appears in collections: b10-1. Journal Articles

Files in this item:

File        Description  Size   Format
2813.pdf                 716Kb  Adobe PDF

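All items stored in this system are protected by copyright.

Contact: Japan Advanced Institute of Science and Technology (北陸先端科学技術大学院大学), Research Promotion Section, Library Information Unit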