JAIST Repository: Vector-quantized Variational Autoencoder for Phase-aware Speech Enhancement

Please use this identifier to cite or link to this item: http://hdl.handle.net/10119/18157

Title:	Vector-quantized Variational Autoencoder for Phase-aware Speech Enhancement
Authors:	Ho, Tuan Vu; Nguyen, Quoc Huy; Akagi, Masato; Unoki, Masashi
Keywords:	Speech enhancement; vector-quantized variational autoencoder; complex Wiener filter; noise reduction
Issue Date:	2022-09
Publisher:	International Speech Communication Association
Journal:	Proc. InterSpeech 2022
Start page:	176
End page:	180
DOI:	10.21437/Interspeech.2022-443
Abstract:	Speech-enhancement methods based on the complex ideal ratio mask (cIRM) have achieved promising results. These methods often deploy a deep neural network to jointly estimate the real and imaginary components of the cIRM, which is defined in the complex domain. However, the cIRM is unbounded, which makes it difficult to train a neural network effectively. To alleviate this problem, this paper proposes a phase-aware speech-enhancement method that estimates the magnitude and phase of a complex adaptive Wiener filter. In this method, a noise-robust vector-quantized variational autoencoder estimates the magnitude of the Wiener filter using the Itakura-Saito divergence in the time-frequency domain, while a convolutional recurrent network estimates the phase of the Wiener filter under a scale-invariant signal-to-noise-ratio constraint in the time domain. The proposed method was evaluated on the open Voice Bank+DEMAND dataset to provide a direct comparison with other speech-enhancement methods. It achieved a Perceptual Evaluation of Speech Quality score of 2.85 and a Short-Time Objective Intelligibility score of 0.94, which is better than the state-of-the-art method based on cIRM estimation during the 2020 Deep Noise Challenge.
Rights:	Copyright (C) 2022 International Speech Communication Association. Tuan Vu Ho, Quoc Huy Nguyen, Masato Akagi, Masashi Unoki, Proc. InterSpeech2022, 2022, pp.176-180. doi: 10.21437/Interspeech.2022-443
URI:	http://hdl.handle.net/10119/18157
Type:	publisher
Appears in Collections:	b11-1. Conference Papers
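
The abstract names three quantities without defining them: a complex filter built from a magnitude and a phase, the Itakura-Saito divergence, and the scale-invariant signal-to-noise ratio. The sketch below shows their textbook definitions only, assuming plain Python lists and scalars; the function names are hypothetical, and nothing here reproduces the networks, loss weighting, or training setup of the paper.

```python
import cmath
import math


def apply_complex_gain(noisy_bin, magnitude, phase):
    """Scale and rotate one complex STFT bin by the gain magnitude * exp(j * phase)."""
    return noisy_bin * cmath.rect(magnitude, phase)


def itakura_saito(reference, estimate, eps=1e-12):
    """Itakura-Saito divergence between two positive power spectra.

    Sums P/Q - log(P/Q) - 1 over bins, with P the reference and Q the estimate.
    """
    total = 0.0
    for p, q in zip(reference, estimate):
        ratio = (p + eps) / (q + eps)
        total += ratio - math.log(ratio) - 1.0
    return total


def si_snr(estimate, target, eps=1e-12):
    """Scale-invariant SNR in dB between two equal-length time-domain signals."""
    n = len(target)
    est_mean = sum(estimate) / n
    tgt_mean = sum(target) / n
    est = [e - est_mean for e in estimate]
    tgt = [t - tgt_mean for t in target]
    # Project the estimate onto the target; the residual is treated as noise.
    scale = sum(e * t for e, t in zip(est, tgt)) / (sum(t * t for t in tgt) + eps)
    projection = [scale * t for t in tgt]
    projection_energy = sum(p * p for p in projection)
    noise_energy = sum((e - p) ** 2 for e, p in zip(est, projection))
    return 10.0 * math.log10((projection_energy + eps) / (noise_energy + eps))
```

Because the gain is parameterized by a magnitude and a phase rather than by unbounded real and imaginary parts, the magnitude can be kept in a bounded range, which is the motivation the abstract gives for moving away from direct cIRM estimation.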

Files in This Item:

File	Description	Size	Format
ho22_interspeech.pdf		2122Kb	Adobe PDF	View/Open

All items stored in this repository are protected by copyright.

Contact: Japan Advanced Institute of Science and Technology, Research Promotion Division, Library Information Section (ir-sys[at]ml.jaist.ac.jp)