JAIST Repository: A Two-Stage Phase-Aware Approach for Monaural Multi-Talker Speech Separation


Please use this identifier to cite or link to this item: http://hdl.handle.net/10119/16673

Title:	A Two-Stage Phase-Aware Approach for Monaural Multi-Talker Speech Separation
Authors:	Yin, Lu; Li, Junfeng; Yan, Yonghong; Akagi, Masato
Keywords:	speech separation; phase recovery; amplitude estimation; deep learning; mask estimation
Issue Date:	2020-07-01
Publisher:	The Institute of Electronics, Information and Communication Engineers (IEICE)
Journal:	IEICE Transactions on Information and Systems
Volume:	E103-D
Issue:	7
Start Page:	1732
End Page:	1743
DOI:	10.1587/transinf.2019EDP7259
Abstract:	Simultaneous utterances degrade speech understanding for both hearing-impaired listeners and automatic speech recognition systems. Recently, deep neural networks have dramatically improved speech separation performance. However, most previous works estimate only the speech magnitude and reuse the mixture phase for speech reconstruction, which has become a critical limitation for separation performance. This study proposes a two-stage phase-aware approach for multi-talker speech separation that jointly recovers the magnitude and the phase. For phase recovery, the Multiple Input Spectrogram Inversion (MISI) algorithm is used because of its effectiveness and simplicity. The study implements the MISI algorithm on the basis of masks and shows that the ideal amplitude mask (IAM) is the optimal mask for mask-based MISI phase recovery, which introduces less phase distortion. To compensate for phase recovery errors and minimize signal distortion, an advanced mask is proposed for magnitude estimation. The IAM and the proposed mask are estimated at different stages to recover the phase and the magnitude, respectively. Two neural network frameworks are evaluated for magnitude estimation in the second stage, demonstrating the effectiveness and flexibility of the proposed approach. The experimental results demonstrate that the proposed approach significantly reduces the distortions of the separated speech.
Rights:	Copyright (C)2020 IEICE. Lu Yin, Junfeng Li, Yonghong Yan, and Masato Akagi, IEICE Transactions Information and Systems, E103-D(7), 2020, pp.1732-1743. https://www.ieice.org/jpn/trans_online/
URI:	http://hdl.handle.net/10119/16673
Resource Type:	publisher
Appears in Collections:	b10-1. Journal Articles
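
The abstract above describes mask-based MISI phase recovery driven by the ideal amplitude mask (IAM). The following is a minimal sketch of that idea, not the authors' implementation. It assumes the common definition of the IAM as the ratio of source magnitude to mixture magnitude, uses oracle masks where the paper uses network estimates, and omits the second-stage magnitude mask entirely. The function names, the 256-sample square-root Hann window, and the iteration count are illustrative assumptions.

```python
import numpy as np

N_FFT = 256                  # frame length (assumed, not from the paper)
HOP = N_FFT // 2             # 50% overlap
# Periodic sqrt-Hann window: its square overlap-adds to 1 at 50% overlap.
WIN = np.sqrt(0.5 - 0.5 * np.cos(2 * np.pi * np.arange(N_FFT) / N_FFT))


def stft(x):
    """Return a (frames, bins) complex spectrogram of a 1-D signal."""
    padded = np.pad(x, (HOP, HOP + (-len(x)) % HOP))
    n_frames = (len(padded) - N_FFT) // HOP + 1
    frames = np.stack([padded[i * HOP:i * HOP + N_FFT] for i in range(n_frames)])
    return np.fft.rfft(frames * WIN, axis=1)


def istft(spec, length):
    """Overlap-add inverse of stft(), trimmed to `length` samples."""
    frames = np.fft.irfft(spec, n=N_FFT, axis=1) * WIN
    out = np.zeros(HOP * (len(frames) - 1) + N_FFT)
    for i, frame in enumerate(frames):
        out[i * HOP:i * HOP + N_FFT] += frame
    return out[HOP:HOP + length]


def ideal_amplitude_mask(source, mixture, eps=1e-8):
    """IAM: source magnitude divided by mixture magnitude, per bin."""
    return np.abs(stft(source)) / (np.abs(stft(mixture)) + eps)


def misi(mixture, magnitudes, n_iter=5):
    """Recover one waveform per magnitude estimate with MISI iterations."""
    n = len(mixture)
    # Initialise every source with the mixture phase.
    phase = np.angle(stft(mixture))
    estimates = [istft(mag * np.exp(1j * phase), n) for mag in magnitudes]
    for _ in range(n_iter):
        # Share the mixture-consistency error equally among the sources.
        residual = (mixture - sum(estimates)) / len(estimates)
        # Keep each magnitude fixed and update only the phase.
        estimates = [
            istft(mag * np.exp(1j * np.angle(stft(est + residual))), n)
            for mag, est in zip(magnitudes, estimates)
        ]
    return estimates


# Toy two-source example with oracle masks (synthetic signals, not speech).
rng = np.random.default_rng(0)
t = np.arange(4000) / 8000.0
s1 = np.sin(2 * np.pi * 440 * t)
s2 = 0.5 * rng.standard_normal(t.size)
mix = s1 + s2
mix_mag = np.abs(stft(mix))
mags = [ideal_amplitude_mask(s, mix) * mix_mag for s in (s1, s2)]
est = misi(mix, mags, n_iter=5)
```

In a real separation system the masks would come from a trained network rather than from the clean sources, and the estimated magnitudes would then be refined by a second stage as the abstract outlines.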

Files in This Item:

File	Description	Size	Format
IEICE_E103D_1732.pdf		1100Kb	Adobe PDF

All items stored in this system are protected by copyright.
