Please use this identifier to cite or link to this item:
http://hdl.handle.net/10119/16656
Title: | Non-parallel Voice Conversion with Controllable Speaker Individuality using Variational Autoencoder |
Authors: | Ho, Tuan Vu; Akagi, Masato |
Keywords: | Voice conversion; speaker embedding; voice characteristics control; variational autoencoder; non-parallel data |
Issue date: | 2019-11-19 |
Publisher: | Institute of Electrical and Electronics Engineers (IEEE) |
Publication title: | 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC) |
Start page: | 106 |
End page: | 111 |
DOI: | 10.1109/APSIPAASC47483.2019.9023264 |
Abstract: | We propose a flexible non-parallel voice conversion (VC) system that can both perform speaker adaptation and control speaker individuality. The proposed VC framework addresses the inability of conventional VC models to arbitrarily modify voice characteristics in the converted waveform. To achieve this goal, we use a speaker embedding realized by a Variational Autoencoder (VAE), instead of one-hot encoded vectors, to represent and modify the target voice's characteristics. No parallel training data, linguistic labels, or time alignment procedure is required to train our system. After training on a multi-speaker speech database, the proposed VC system can adapt an arbitrary source speaker to any target speaker using only one sample from the target speaker. The speaker individuality of the converted speech can be controlled by modifying the speaker embedding vectors, resulting in a fictitious speaker individuality. The experimental results showed that our proposed system is similar to conventional non-parallel VAE-based VC and better than the parallel Gaussian Mixture Model (GMM) in both perceived speech naturalness and speaker similarity, even when our system uses only one sample from the target speaker. Moreover, our proposed system can convert a source voice to a fictitious target voice with a perceived speech naturalness of 3.1 MOS. |
Rights: | This is the author's version of the work. Copyright (C) 2019 IEEE. 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), 2019, pp.106-111. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works. |
URI: | http://hdl.handle.net/10119/16656 |
Material type: | author |
Appears in collections: | b11-1. Conference Papers and Presentation Materials (Conference Papers) |
Files in this item:
File | Description | Size | Format |
APSIPA_2019_106.pdf | | 1162Kb | Adobe PDF | View/Open |
All items stored in this system are protected by copyright.