
Please use this identifier to cite or link to this item: http://hdl.handle.net/10119/15379

Title: Nonparallel Dictionary-Based Voice Conversion Using Variational Autoencoder with Modulation-Spectrum-Constrained Training
Authors: Ho, Tuan Vu; Akagi, Masato
Issue Date: 2018-07-25
Publisher: 信号処理学会
Journal Title: Journal of Signal Processing
Volume: 22
Issue: 4
Start Page: 189
End Page: 192
DOI: 10.2299/jsp.22.189
Abstract: In this paper, we present a nonparallel voice conversion (VC) approach that does not require parallel data or linguistic labeling for the training process. Dictionary-based voice conversion is a class of methods aiming to decompose speech into separate factors for manipulation. Non-negative matrix factorization (NMF) is the most common method to decompose an input spectrum into a weighted linear combination of a set comprising a dictionary (basis) and weights. However, the requirement for parallel training data in this method causes several problems: 1) limited practical usability when parallel data are not available, 2) the additional error from the alignment process degrades the output speech quality. To alleviate these problems, we present a dictionary-based VC approach by incorporating a variational autoencoder (VAE) to decompose an input speech spectrum into a speaker dictionary and weights without parallel training data. According to evaluation results, the proposed method achieves better speech naturalness while retaining the same speaker similarity as NMF-based VC even though unaligned data is used.
Rights: Copyright (C) 2018 信号処理学会. Tuan Vu Ho, Masato Akagi, Journal of Signal Processing, 22(4), 2018, 189-192. http://dx.doi.org/10.2299/jsp.22.189
URI: http://hdl.handle.net/10119/15379
Resource Type: publisher
Appears in Collections: b10-1. 雑誌掲載論文 (Journal Articles)
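
Illustrative note on the abstract: the NMF decomposition it mentions, an input spectrum written as a dictionary (basis) times weights, can be sketched as below. This is a minimal sketch and not the authors' implementation. The abstract does not say which NMF variant is used, so the standard multiplicative update rule for the Euclidean cost is assumed here. The toy matrix V, the rank, and all function names are hypothetical and chosen only for illustration.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    cols_b = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols_b] for row in a]


def transpose(m):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*m)]


def frobenius_error(v, w, h):
    """Frobenius norm of V - W H, i.e. the reconstruction error."""
    approx = matmul(w, h)
    return sum((x - y) ** 2
               for row_v, row_a in zip(v, approx)
               for x, y in zip(row_v, row_a)) ** 0.5


def nmf(v, rank, iterations=200, seed=0, eps=1e-9):
    """Factor a non-negative V (bins x frames) into a dictionary W
    (bins x rank) and weights H (rank x frames) so that V ~ W H."""
    rng = random.Random(seed)
    n_bins, n_frames = len(v), len(v[0])
    w = [[rng.random() + eps for _ in range(rank)] for _ in range(n_bins)]
    h = [[rng.random() + eps for _ in range(n_frames)] for _ in range(rank)]
    for _ in range(iterations):
        # Weights update: H <- H * (W^T V) / (W^T W H), element-wise.
        wt = transpose(w)
        num = matmul(wt, v)
        den = matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(n_frames)]
             for i in range(rank)]
        # Dictionary update: W <- W * (V H^T) / (W H H^T), element-wise.
        ht = transpose(h)
        num = matmul(v, ht)
        den = matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(rank)]
             for i in range(n_bins)]
    return w, h


# Toy "spectrogram": 3 frequency bins x 4 frames (made-up values).
V = [[1.0, 2.0, 0.0, 1.0],
     [0.0, 1.0, 3.0, 1.0],
     [1.0, 3.0, 3.0, 2.0]]

W, H = nmf(V, rank=2)
print("reconstruction error:", frobenius_error(V, W, H))
```

Each column of W plays the role of one dictionary entry (a spectral basis), and each column of H gives the non-negative weights that combine those entries to approximate one frame of V.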


Files in this item:
File: 2813.pdf
Size: 716Kb
Format: Adobe PDF



Contact: 北陸先端科学技術大学院大学 研究推進課図書館情報係