Please use this identifier to cite or link to this item: http://hdl.handle.net/10119/18719

Title: Increasing speech intelligibility and naturalness in noise based on concepts of modulation spectrum and modulation transfer function
Authors: Ngo, Thuanvan
Kubo, Rieko
Akagi, Masato
Keywords: Modulation transfer function
modulation spectrum
Issue Date: 2021-10-01
Publisher: Elsevier
Journal name: Speech Communication
Volume: 135
Start page: 11
End page: 24
DOI: 10.1016/j.specom.2021.09.004
Abstract: This study focuses on identifying effective features for controlling speech to increase speech intelligibility under adverse conditions. Previous approaches either cancel noise throughout speech presentation or preprocess speech by controlling its intensity and/or spectra. Among them, a method based on modulation transfer function theory, which inverts the environmental effects to anticipate attenuation of the speech modulation spectrum, shows excellent potential because it derives intelligibility enhancement against environmental smearing systematically and explicitly. However, strictly following the inverse modulation transfer function is risky and inefficient: important speech features can be damaged, and boosting all smeared regions costs a large amount of energy. This study takes a different approach: analyzing how modulation spectra smeared by the environment relate to intelligibility, in order to extract features that are effective to modify. First, we conduct listening tests for intelligibility in noise with different types of enhanced speech. Next, we extract the acoustic and modulation frequency components of the noise-smeared modulation spectra that show high correlation with intelligibility scores. Finally, we examine the intelligibility benefits of modifying these components by performing listening tests. The results show that modifying these components increases intelligibility by up to 18%, which demonstrates that our concept is valid.
Rights: Copyright (C)2021, Elsevier. Licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International license (CC BY-NC-ND 4.0). [http://creativecommons.org/licenses/by-nc-nd/4.0/] NOTICE: This is the author's version of a work accepted for publication by Elsevier. Thuanvan Ngo, Rieko Kubo, Masato Akagi, Speech Communication 135, 2021, 11-24, https://doi.org/10.1016/j.specom.2021.09.004
URI: http://hdl.handle.net/10119/18719
Material Type: author
Appears in Collections: b10-1. Journal Articles (雑誌掲載論文)

Files in This Item:

File: M-AKAGI-I-1115.pdf
Size: 3082 Kb
Format: Adobe PDF
