Title: LP-based method of blind restoration to improve intelligibility of bone-conducted speech
Authors: Thang, Tat Vu
Unoki, Masashi
Akagi, Masato
Issue date: 2007-10-05
Volume: IS-RR-2007-011
Abstract: Bone-conducted (BC) speech can be used instead of air-conducted (AC) speech in extremely noisy environments. However, its intelligibility is degraded by transmission through bone conduction. The voice quality and intelligibility of BC speech therefore need to be improved blindly in actual speech communication, which is a challenging new topic in the field of speech signal processing. In a previous study, we proposed a linear prediction (LP) based model that restores BC speech to improve voice quality. While other methods, such as the long-term Fourier transform, need numerous AC speech parameters to restore BC speech, our model demonstrated the ability to restore BC speech blindly by predicting AC-LP coefficients from BC-LP coefficients. We improved the previous model by (1) extending long-term processing to frame-basis processing, (2) using line spectral frequency (LSF) coefficients as the LP representation, and (3) using a recurrent neural network to predict parameters. We evaluated the improved model against other methods to determine whether it could adequately improve the voice quality and intelligibility of BC speech, using objective measures (LSD, MCD, and LCD) and a subjective measure, a Japanese-word intelligibility test (JWIT). The experimental results showed that our newly proposed models (LSF and LSF-SRN) yielded significant improvements. The LSF model significantly improved both the voice quality and the intelligibility of BC speech. Our proposed LSF-SRN model demonstrated the capability to improve the intelligibility of BC speech even under blind restoration.
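
As an illustration of the LSF representation mentioned in the abstract, the following is a minimal sketch of the textbook conversion between LP coefficients and line spectral frequencies. It is not the authors' implementation: the function names `lpc_to_lsf` and `lsf_to_lpc` are hypothetical, NumPy is assumed, and the inverse assumes an even LP order.

```python
import numpy as np

def lpc_to_lsf(a, eps=1e-8):
    """Convert LP coefficients [1, a1, ..., ap] to p LSFs in radians (ascending)."""
    a = np.asarray(a, dtype=float)
    # Sum and difference polynomials:
    #   P(z) = A(z) + z^-(p+1) A(1/z),  Q(z) = A(z) - z^-(p+1) A(1/z)
    padded = np.concatenate([a, [0.0]])
    flipped = np.concatenate([[0.0], a[::-1]])
    angles = np.concatenate([
        np.angle(np.roots(padded + flipped)),
        np.angle(np.roots(padded - flipped)),
    ])
    # Keep one angle per conjugate pair; drop the trivial roots at z = +1 and z = -1.
    return np.sort(angles[(angles > eps) & (angles < np.pi - eps)])

def lsf_to_lpc(lsf):
    """Rebuild [1, a1, ..., ap] from ascending LSFs (assumes even order p)."""
    lsf = np.asarray(lsf, dtype=float)
    p_poly = np.array([1.0, 1.0])    # trivial root of P(z) at z = -1
    q_poly = np.array([1.0, -1.0])   # trivial root of Q(z) at z = +1
    # LSFs interlace: 1st, 3rd, ... belong to P(z); 2nd, 4th, ... belong to Q(z).
    for w in lsf[0::2]:
        p_poly = np.convolve(p_poly, [1.0, -2.0 * np.cos(w), 1.0])
    for w in lsf[1::2]:
        q_poly = np.convolve(q_poly, [1.0, -2.0 * np.cos(w), 1.0])
    # A(z) = (P(z) + Q(z)) / 2; the final coefficient cancels to zero.
    return (0.5 * (p_poly + q_poly))[:-1]

# Example: a second-order LP filter with poles at 0.9 * exp(+/- j*pi/4).
a = np.array([1.0, -2.0 * 0.9 * np.cos(np.pi / 4), 0.81])
lsf = lpc_to_lsf(a)
a_back = lsf_to_lpc(lsf)
```

For a stable (minimum-phase) LP filter, the LSFs are ordered and bounded in (0, π), which is one common reason they are preferred over raw LP coefficients as prediction targets.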
