Please use this identifier to cite this item:
http://hdl.handle.net/10119/18157
Title: | Vector-quantized Variational Autoencoder for Phase-aware Speech Enhancement |
Authors: | Ho, Tuan Vu; Nguyen, Quoc Huy; Akagi, Masato; Unoki, Masashi |
Keywords: | Speech enhancement; vector-quantized variational autoencoder; complex Wiener filter; noise reduction |
Issue Date: | 2022-09 |
Publisher: | International Speech Communication Association |
Journal: | Proc. Interspeech 2022 |
Start Page: | 176 |
End Page: | 180 |
DOI: | 10.21437/Interspeech.2022-443 |
Abstract: | Speech-enhancement methods based on the complex ideal ratio mask (cIRM) have achieved promising results. These methods often use a deep neural network to jointly estimate the real and imaginary components of the cIRM, which is defined in the complex domain. However, because the cIRM is unbounded, it is difficult to train a neural network to estimate it effectively. To alleviate this problem, this paper proposes a phase-aware speech-enhancement method that estimates the magnitude and phase of a complex adaptive Wiener filter. In this method, a noise-robust vector-quantized variational autoencoder estimates the magnitude of the Wiener filter using the Itakura-Saito divergence in the time-frequency domain, while a convolutional recurrent network estimates the phase of the Wiener filter using a scale-invariant signal-to-noise ratio constraint in the time domain. The proposed method was evaluated on the open Voice Bank+DEMAND dataset to allow direct comparison with other speech-enhancement methods. It achieved a Perceptual Evaluation of Speech Quality (PESQ) score of 2.85 and a Short-Time Objective Intelligibility (STOI) score of 0.94, which are better than those of the state-of-the-art cIRM-estimation method from the 2020 Deep Noise Suppression Challenge. |
Rights: | Copyright (C) 2022 International Speech Communication Association. Tuan Vu Ho, Quoc Huy Nguyen, Masato Akagi, Masashi Unoki, Proc. Interspeech 2022, 2022, pp. 176-180. doi: 10.21437/Interspeech.2022-443 |
URI: | http://hdl.handle.net/10119/18157 |
Type: | publisher |
Appears in Collections: | b11-1. Conference Papers and Presentation Materials |
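The abstract contrasts the unbounded cIRM with a Wiener filter expressed as a magnitude and a phase, and names two training criteria: the Itakura-Saito divergence and the scale-invariant signal-to-noise ratio (SI-SNR). The following is a minimal plain-Python sketch of those quantities on toy values, not the authors' implementation. It rests on several assumptions: the VQ-VAE and the convolutional recurrent network are not modeled, so the magnitude and phase are passed in directly; the classic Wiener gain S/(S+N) stands in for the paper's adaptive filter magnitude; the Itakura-Saito divergence is computed between a reference and an estimated power spectrum; and every function name is hypothetical.

```python
import cmath
import math


def cirm(clean, noisy):
    """Complex ideal ratio mask S / Y per time-frequency bin (hypothetical helper).

    Its magnitude has no upper bound: it grows without limit as |Y| -> 0.
    """
    return [s / y for s, y in zip(clean, noisy)]


def wiener_magnitude(speech_power, noise_power, eps=1e-12):
    """Classic Wiener gain S / (S + N), always in [0, 1] (assumed stand-in)."""
    return [s / (s + n + eps) for s, n in zip(speech_power, noise_power)]


def apply_polar_mask(noisy, magnitude, phase):
    """Enhance noisy STFT bins with a complex mask magnitude * exp(j * phase)."""
    return [m * cmath.exp(1j * p) * y for y, m, p in zip(noisy, magnitude, phase)]


def itakura_saito(ref_power, est_power, eps=1e-12):
    """Itakura-Saito divergence D_IS(ref || est) summed over bins.

    Zero when the two power spectra match; not symmetric in its arguments.
    """
    total = 0.0
    for p, q in zip(ref_power, est_power):
        ratio = (p + eps) / (q + eps)
        total += ratio - math.log(ratio) - 1.0
    return total


def si_snr(estimate, target, eps=1e-12):
    """Scale-invariant signal-to-noise ratio (dB) between two time-domain signals."""
    # Remove the means so a DC offset does not affect the score.
    est_mean = sum(estimate) / len(estimate)
    tgt_mean = sum(target) / len(target)
    est = [x - est_mean for x in estimate]
    tgt = [x - tgt_mean for x in target]
    # Project the estimate onto the target: this component counts as "signal".
    scale = sum(e * t for e, t in zip(est, tgt)) / (sum(t * t for t in tgt) + eps)
    s_target = [scale * t for t in tgt]
    # Whatever is left after the projection counts as "noise".
    e_noise = [e - s for e, s in zip(est, s_target)]
    signal_energy = sum(s * s for s in s_target)
    noise_energy = sum(n * n for n in e_noise)
    return 10.0 * math.log10((signal_energy + eps) / (noise_energy + eps))


if __name__ == "__main__":
    # A near-silent noisy bin makes the cIRM magnitude very large.
    print(abs(cirm([1 + 0j], [1e-3 + 0j])[0]))
    # The Wiener-style magnitude stays within [0, 1] whatever the powers are.
    print(wiener_magnitude([4.0, 1e6], [1.0, 1e-3]))
    # Rescaling the estimate leaves SI-SNR (essentially) unchanged.
    target = [1.0, -1.0, 2.0, -2.0]
    estimate = [1.1, -0.9, 2.0, -2.2]
    print(si_snr(estimate, target), si_snr([3.0 * x for x in estimate], target))
```

Splitting the mask into a bounded magnitude and a unit-modulus phase factor keeps the magnitude target in a fixed range, which addresses the training difficulty the abstract attributes to the unbounded cIRM. SI-SNR ignores the overall gain of the estimate, so it scores the shape of the waveform rather than its loudness.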
Files in this item:
File | Description | Size | Format |
ho22_interspeech.pdf | | 2122 KB | Adobe PDF |
All items in this system are protected by copyright.