JAIST Repository >
b. 情報科学研究科・情報科学系 >
b10. 学術雑誌論文等 >
b10-1. 雑誌掲載論文 >
このアイテムの引用には次の識別子を使用してください:
http://hdl.handle.net/10119/7750
|
タイトル: | Comparative evaluation of modulation-transfer-function-based blind restoration of sub-band power envelopes of speech as a front-end processor for automatic speech recognition systems |
著者: | Lu, Xugang Unoki, Masashi Akagi, Masato |
キーワード: | Power envelope restoration Speech recognition Modulation transfer function Power envelope inverse filtering |
発行日: | 2008 |
出版者: | Acoustical Society of Japan(日本音響学会) |
誌名: | Acoustical Science and Technology |
巻: | 29 |
号: | 6 |
開始ページ: | 351 |
終了ページ: | 361 |
DOI: | 10.1250/ast.29.351 |
抄録: | To reduce speech degradation in reverberant environments, we previously proposed a modulation-transfer-function (MTF)-based method of speech dereverberation. By considering the temporal modulation properties of speech, and the exponential decay properties of the power envelope of the impulse response of room acoustics, we obtained the following MTF relation: the sub-band power envelope of reverberant speech that can be represented as a convolution between the sub-band power envelope of clean speech and the power envelope of the impulse response of room acoustics. On the basis of the MTF relation, inverse MTF filtering can be applied to restoring the power envelopes of reverberant speech. Therefore, the impulse response of the room acoustics in this restoration dose not need to be measured at any time since we model the power envelope of the impulse response as an exponential decay function. We have tested how effective this method is as a front-end for automatic speech recognition (ASR) systems in artificial and real reverberant environments. Reverberant speech signals were created by simply convoluting clean speech (AURORA-2J database) with the artificially produced or real impulse responses of room acoustics. A method based on the auditory power spectrum was used as a baseline for comparison. Compared with the baseline, the proposed method for artificial reverberant environments produced a 35.67% relative improvement in the error reduction rate (on average, for reverberation times from 0.2 to 2.0 s), and for real reverberant environments (43 reverberant impulse responses), it produced a 25.78% relative improvement in the error reduction rate. The results demonstrate that our new approach can improve the robustness of speech-recognition systems in reverberant environments, and it performs better than conventional methods. |
Rights: | Copyright (C)2008 日本音響学会, Xugang Lu, Masashi Unoki and Masato Akagi, Acoustical Science and Technology, 29(6), 2008, 351-361. |
URI: | http://hdl.handle.net/10119/7750 |
資料タイプ: | publisher |
出現コレクション: | b10-1. 雑誌掲載論文 (Journal Articles)
|
このアイテムのファイル:
ファイル |
記述 |
サイズ | 形式 |
AST_LUA08.pdf | | 502Kb | Adobe PDF | 見る/開く |
|
当システムに保管されているアイテムはすべて著作権により保護されています。
|