
Please use this identifier to cite or link to this item: http://hdl.handle.net/10119/18157

Title: Vector-quantized Variational Autoencoder for Phase-aware Speech Enhancement
Authors: Ho, Tuan Vu
Nguyen, Quoc Huy
Akagi, Masato
Unoki, Masashi
Keywords: Speech enhancement
vector-quantized variational autoencoder
complex Wiener filter
noise reduction
Issue Date: 2022-09
Publisher: International Speech Communication Association
Magazine name: Proc. InterSpeech 2022
Start page: 176
End page: 180
DOI: 10.21437/Interspeech.2022-443
Abstract: Speech-enhancement methods based on the complex ideal ratio mask (cIRM) have achieved promising results. These methods often deploy a deep neural network to jointly estimate the real and imaginary components of the cIRM defined in the complex domain. However, the unbounded property of the cIRM poses difficulties when it comes to effectively training a neural network. To alleviate this problem, this paper proposes a phase-aware speech-enhancement method through estimating the magnitude and phase of a complex adaptive Wiener filter. With this method, a noise-robust vector-quantized variational autoencoder is used for estimating the magnitude of the Wiener filter by using the Itakura-Saito divergence in the time-frequency domain, while the phase of the Wiener filter is estimated using a convolutional recurrent network using the scale-invariant signal-to-noise-ratio constraint in the time domain. The proposed method was evaluated on the open Voice Bank+DEMAND dataset to provide a direct comparison with other speech-enhancement methods and achieved a Perceptual Evaluation of Speech Quality score of 2.85 and a Short-Time Objective Intelligibility score of 0.94, which is better than the state-of-the-art method based on cIRM estimation during the 2020 Deep Noise Challenge.
Rights: Copyright (C) 2022 International Speech Communication Association. Tuan Vu Ho, Quoc Huy Nguyen, Masato Akagi, Masashi Unoki, Proc. InterSpeech2022, 2022, pp.176-180. doi: 10.21437/Interspeech.2022-443
URI: http://hdl.handle.net/10119/18157
Material Type: publisher
Appears in Collections: b11-1. Conference Papers and Presentation Materials (Conference Papers)
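
Note on the abstract: the record names three building blocks, a complex filter described by a magnitude and a phase, the Itakura-Saito divergence in the time-frequency domain, and the scale-invariant signal-to-noise ratio in the time domain. The sketch below illustrates only the textbook definitions of these quantities in plain Python. It is not the authors' implementation: the function names, the argument order of the divergence, the mean removal in the SI-SNR, and the `eps` stabilizer are all assumptions made for illustration, and the networks that would produce the magnitude and phase are not shown.

```python
import cmath
import math


def apply_complex_filter(noisy_bin, magnitude, phase):
    """Scale and rotate one complex STFT bin by a filter given in polar form.

    Hypothetical helper: the filter value is magnitude * exp(1j * phase).
    """
    return noisy_bin * cmath.rect(magnitude, phase)


def itakura_saito(target_power, estimate_power, eps=1e-12):
    """Itakura-Saito divergence summed over paired power-spectrum values.

    Uses the standard form p/q - log(p/q) - 1; which argument plays the
    role of the target in the paper is an assumption here.
    """
    total = 0.0
    for p, q in zip(target_power, estimate_power):
        ratio = (p + eps) / (q + eps)
        total += ratio - math.log(ratio) - 1.0
    return total


def si_snr_db(estimate, target, eps=1e-12):
    """Scale-invariant SNR in dB between two equal-length waveforms."""
    # Remove the mean of each signal (a common convention, assumed here).
    est_mean = sum(estimate) / len(estimate)
    tgt_mean = sum(target) / len(target)
    est = [e - est_mean for e in estimate]
    tgt = [t - tgt_mean for t in target]

    # Project the estimate onto the target to isolate the "signal" part.
    dot = sum(e * t for e, t in zip(est, tgt))
    target_energy = sum(t * t for t in tgt)
    scale = dot / (target_energy + eps)
    projection = [scale * t for t in tgt]

    # Whatever is left after the projection counts as "noise".
    residual = [e - p for e, p in zip(est, projection)]
    signal_energy = sum(p * p for p in projection)
    noise_energy = sum(r * r for r in residual)
    return 10.0 * math.log10((signal_energy + eps) / (noise_energy + eps))
```

In a training setup along the lines the abstract describes, the negative of `si_snr_db` would typically serve as a time-domain loss, while `itakura_saito` would compare power spectra; how the two are combined is not stated in this record.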

Files in This Item:

File                  Description  Size    Format
ho22_interspeech.pdf               2122Kb  Adobe PDF

All items in DSpace are protected by copyright, with all rights reserved.


Contact : Library Information Section, Japan Advanced Institute of Science and Technology