Please use this identifier to cite or link to this item:
http://hdl.handle.net/10119/18157
Title: | Vector-quantized Variational Autoencoder for Phase-aware Speech Enhancement |
Authors: | Ho, Tuan Vu; Nguyen, Quoc Huy; Akagi, Masato; Unoki, Masashi |
Keywords: | Speech enhancement; vector-quantized variational autoencoder; complex Wiener filter; noise reduction |
Issue Date: | 2022-09 |
Publisher: | International Speech Communication Association |
Magazine name: | Proc. InterSpeech 2022 |
Start page: | 176 |
End page: | 180 |
DOI: | 10.21437/Interspeech.2022-443 |
Abstract: | Speech-enhancement methods based on the complex ideal ratio mask (cIRM) have achieved promising results. These methods often deploy a deep neural network to jointly estimate the real and imaginary components of the cIRM defined in the complex domain. However, the unbounded range of the cIRM makes it difficult to train a neural network effectively. To alleviate this problem, this paper proposes a phase-aware speech-enhancement method that estimates the magnitude and phase of a complex adaptive Wiener filter. With this method, a noise-robust vector-quantized variational autoencoder estimates the magnitude of the Wiener filter by using the Itakura-Saito divergence in the time-frequency domain, while the phase of the Wiener filter is estimated with a convolutional recurrent network under a scale-invariant signal-to-noise-ratio constraint in the time domain. The proposed method was evaluated on the open Voice Bank+DEMAND dataset to provide a direct comparison with other speech-enhancement methods and achieved a Perceptual Evaluation of Speech Quality score of 2.85 and a Short-Time Objective Intelligibility score of 0.94, which are better than those of the state-of-the-art method based on cIRM estimation during the 2020 Deep Noise Challenge. |
Rights: | Copyright (C) 2022 International Speech Communication Association. Tuan Vu Ho, Quoc Huy Nguyen, Masato Akagi, Masashi Unoki, Proc. InterSpeech2022, 2022, pp.176-180. doi: 10.21437/Interspeech.2022-443 |
URI: | http://hdl.handle.net/10119/18157 |
Material Type: | publisher |
Appears in Collections: | b11-1. Conference Papers and Presentation Materials (Conference Papers) |
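The abstract names three quantities: a complex filter described by a magnitude and a phase, the Itakura-Saito divergence in the time-frequency domain, and the scale-invariant signal-to-noise ratio (SI-SNR) in the time domain. The following is a minimal NumPy sketch of the standard textbook definitions of these quantities, not the authors' implementation. The function names, the `eps` stabilizer, and the mean reduction are assumptions made for illustration.

```python
import numpy as np


def apply_complex_filter(noisy_stft, magnitude, phase):
    """Apply a complex time-frequency filter given as magnitude and phase.

    Hypothetical helper: the filter is magnitude * exp(j * phase), multiplied
    element-wise with the noisy STFT.
    """
    return noisy_stft * (magnitude * np.exp(1j * phase))


def itakura_saito(power, power_hat, eps=1e-8):
    """Mean Itakura-Saito divergence between two power spectrograms.

    D_IS = ratio - log(ratio) - 1, with ratio = power / power_hat.
    The divergence is zero when the spectra match and positive otherwise.
    """
    ratio = (power + eps) / (power_hat + eps)
    return float(np.mean(ratio - np.log(ratio) - 1.0))


def si_snr(estimate, reference, eps=1e-8):
    """Scale-invariant SNR in dB between two 1-D time-domain signals."""
    estimate = estimate - np.mean(estimate)
    reference = reference - np.mean(reference)
    # Project the estimate onto the reference to get the target component.
    scale = np.dot(estimate, reference) / (np.dot(reference, reference) + eps)
    target = scale * reference
    noise = estimate - target
    return float(10.0 * np.log10(np.dot(target, target) / (np.dot(noise, noise) + eps)))
```

Because the magnitude and the phase are separate arguments, each can be bounded independently (a non-negative magnitude, a phase that wraps modulo 2π), in contrast to the unbounded real and imaginary parts of the cIRM mentioned in the abstract.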
Files in This Item:
File | Description | Size | Format |
ho22_interspeech.pdf | | 2122Kb | Adobe PDF |