Please use this identifier to cite or link to this item:
http://hdl.handle.net/10119/18719
Title: | Increasing speech intelligibility and naturalness in noise based on concepts of modulation spectrum and modulation transfer function |
Authors: | Ngo, Thuanvan Kubo, Rieko Akagi, Masato |
Keywords: | Modulation transfer function modulation spectrum intelligibility |
Issue Date: | 2021-10-01 |
Publisher: | Elsevier |
Journal name: | Speech Communication |
Volume: | 135 |
Start page: | 11 |
End page: | 24 |
DOI: | 10.1016/j.specom.2021.09.004 |
Abstract: | This study focuses on identifying effective features for controlling speech to increase speech intelligibility under adverse conditions. Previous approaches either cancel noise throughout speech presentation or preprocess speech by controlling its intensity and/or spectra. Among them, a method based on modulation transfer function theory, which inverts the environmental effects to anticipate attenuation of the speech modulation spectrum, shows excellent potential due to its systematic and explicit derivation of intelligibility enhancement against environmental smears. However, strictly following the inverse modulation transfer function is dangerous and inefficient, as important speech features can be damaged and boosting all smeared regions costs a great deal of energy. This study takes a different approach: analyzing how modulation spectra smeared by the environment relate to intelligibility in order to extract effective features to modify. First, we conduct listening tests for intelligibility in noise with different types of enhanced speech. Next, we extract the acoustic and modulation frequency components of the noise-smeared modulation spectra that show high correlation with intelligibility scores. Finally, we examine the intelligibility benefits of modifying these components by performing listening tests. The results show that modifying these components effectively increases intelligibility by up to 18%, which demonstrates that our concept is valid. |
Rights: | Copyright (C)2021, Elsevier. Licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International license (CC BY-NC-ND 4.0). [http://creativecommons.org/licenses/by-nc-nd/4.0/] NOTICE: This is the author's version of a work accepted for publication by Elsevier. Thuanvan Ngo, Rieko Kubo, Masato Akagi, Speech Communication 135, 2021, 11-24, https://doi.org/10.1016/j.specom.2021.09.004 |
URI: | http://hdl.handle.net/10119/18719 |
Material Type: | author |
Appears in Collections: | b10-1. 雑誌掲載論文 (Journal Articles) |
Files in This Item: | M-AKAGI-I-1115.pdf (3082Kb, Adobe PDF) |