JAIST Repository: A Study on Restoration of Bone-Conducted Speech with MTF-Based and LP-Based Models

Use the following identifier to cite this item: https://hdl.handle.net/10119/4015

Title:	A Study on Restoration of Bone-Conducted Speech with MTF-Based and LP-Based Models
Authors:	Thang, Tat Vu; Kimura, Kenji; Unoki, Masashi; Akagi, Masato
Keywords:	bone-conducted (BC) speech; air-conducted (AC) speech; modulation transfer function (MTF); linear prediction (LP); speech intelligibility
Issue date:	2006
Publisher:	Research Institute of Signal Processing, Japan (信号処理学会)
Journal:	Journal of Signal Processing (信号処理)
Volume:	10
Issue:	6
Start page:	407
End page:	417
Abstract:	In extremely noisy environments, bone-conducted speech seems to be more advantageous than ordinary noisy speech (i.e., noisy air-conducted speech) because it is stable against surrounding noise. However, the sound quality of bone-conducted speech is very low, and restoring it is a challenging new topic in speech signal processing. We describe two types of restoration models: one based on the modulation transfer function (MTF) and the other based on linear prediction (LP). The MTF-based model is expected to yield a restored signal with higher intelligibility, while the LP-based model is expected to yield a signal that is not only more intelligible to human listeners but also enables automatic speech recognition (ASR) systems to perform better. To evaluate how well these models improve voice quality, we compared them with two previously proposed models using one subjective and three objective measures. The mean opinion score (MOS) and log-spectrum distortion (LSD) were used to evaluate improvements in intelligibility, which matters for human listeners. Distances based on LP coefficients and on mel-frequency cepstral coefficients (MFCCs) were used to evaluate improvements in cepstral distance, which matters for ASR systems. The results showed that both the MTF-based and LP-based models improve intelligibility more than the previous models, and that the LP-based model produces the best results for both human listeners and ASR systems.
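The abstract names log-spectrum distortion (LSD) as one of the objective measures. As a rough illustration only, the sketch below computes a commonly used frame-averaged LSD in dB between a reference (e.g., air-conducted) and a test (e.g., restored) sequence of power spectra. The paper's exact definition (frequency range, framing, normalization) is not given on this page, so the function name `log_spectral_distortion` and the `eps` floor are assumptions for illustration, not the authors' implementation.

```python
import math
from typing import Sequence


def log_spectral_distortion(
    ref_frames: Sequence[Sequence[float]],
    test_frames: Sequence[Sequence[float]],
    eps: float = 1e-12,  # hypothetical floor to avoid log10(0); not from the paper
) -> float:
    """Frame-averaged log-spectral distortion in dB (illustrative sketch).

    Each frame is a list of power-spectrum values P(k) over the same bins.
    Per frame: sqrt(mean_k (10*log10 P_ref(k) - 10*log10 P_test(k))^2),
    then the per-frame values are averaged over all frames.
    """
    if not ref_frames or len(ref_frames) != len(test_frames):
        raise ValueError("need the same, non-zero number of frames")
    total = 0.0
    for ref, test in zip(ref_frames, test_frames):
        if not ref or len(ref) != len(test):
            raise ValueError("frames must have the same, non-zero number of bins")
        sq_sum = 0.0
        for p_ref, p_test in zip(ref, test):
            # Difference of the two spectra on a dB scale for this bin
            diff_db = 10.0 * math.log10(p_ref + eps) - 10.0 * math.log10(p_test + eps)
            sq_sum += diff_db ** 2
        # RMS dB difference for this frame
        total += math.sqrt(sq_sum / len(ref))
    return total / len(ref_frames)


if __name__ == "__main__":
    # A test spectrum that is 10x the reference power in every bin differs
    # by 10 dB everywhere, so the LSD is about 10 dB.
    print(log_spectral_distortion([[1.0, 1.0]], [[10.0, 10.0]]))
```

A lower LSD means the restored spectrum is closer to the reference on a log (dB) scale, which is why it is used as a proxy for perceived quality rather than raw spectral error.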
Rights:	Research Institute of Signal Processing, Japan (信号処理学会), Thang Tat Vu, Kenji Kimura, Masashi Unoki and Masato Akagi, Journal of Signal Processing (信号処理), 10(6), 2006, 407-417.
Resource type:	Article
URI:	https://hdl.handle.net/10119/4015
Resource type:	publisher
Appears in collections:	b10-1. Journal Articles

Files in this item:

File	Description	Size	Format
62-3.pdf		6594 KB	Adobe PDF
