Please use this identifier to cite or link to this item:
http://hdl.handle.net/10119/18158
Title: | Data Augmentation Using McAdams-Coefficient-Based Speaker Anonymization for Fake Audio Detection |
Authors: | Li, Kai; Li, Sheng; Lu, Xugang; Akagi, Masato; Liu, Meng; Zhang, Lin; Zeng, Chang; Wang, Longbiao; Dang, Jianwu; Unoki, Masashi |
Keywords: | fake audio detection; data augmentation; McAdams coefficients; speaker anonymization |
Issue Date: | 2022-09 |
Publisher: | International Speech Communication Association |
Magazine name: | Proc. InterSpeech 2022 |
Start page: | 664 |
End page: | 668 |
DOI: | 10.21437/Interspeech.2022-10088 |
Abstract: | Fake audio detection (FAD) is a technique to distinguish synthetic speech from natural speech. In most FAD systems, removing irrelevant features from acoustic speech while keeping only robust discriminative features is essential. Intuitively, speaker information entangled in acoustic speech should be suppressed for the FAD task. Particularly in a deep neural network (DNN)-based FAD system, the learning system may learn speaker information from the training dataset and then fail to generalize well to a testing dataset. In this paper, we propose to use the speaker anonymization (SA) technique to suppress speaker information in acoustic speech before inputting it into a DNN-based FAD system. We adopted the McAdams-coefficient-based SA (MC-SA) algorithm, with the expectation that the entangled speaker information will not be involved in the DNN-based FAD learning. Based on this idea, we implemented a light convolutional neural network bidirectional long short-term memory (LCNN-BLSTM)-based FAD system and conducted experiments on the Audio Deep Synthesis Detection Challenge (ADD2022) datasets. The results showed that removing the speaker information from acoustic speech gave a 17.66% relative performance improvement in the first track of ADD2022. |
Rights: | Copyright (C) 2022 International Speech Communication Association. Kai Li, Sheng Li, Xugang Lu, Masato Akagi, Meng Liu, Lin Zhang, Chang Zeng, Longbiao Wang, Jianwu Dang, Masashi Unoki, Proc. InterSpeech2022, 2022, pp.664-668. doi:10.21437/Interspeech.2022-10088 |
URI: | http://hdl.handle.net/10119/18158 |
Material Type: | publisher |
Appears in Collections: | b11-1. 会議発表論文・発表資料 (Conference Papers) |
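The abstract names the McAdams-coefficient-based SA (MC-SA) algorithm without describing it. As a rough illustration only, the sketch below shows the pole-angle step as it is commonly described for McAdams-based anonymization: each complex pole of a frame's LPC filter keeps its radius while its angle φ is mapped to φ^α. This record does not give the details, so the function names `shift_pole` and `anonymize_poles`, the default α = 0.8, and the treatment of real poles are all assumptions, not the authors' implementation. LPC analysis and resynthesis are left out.

```python
import cmath


def shift_pole(pole: complex, alpha: float) -> complex:
    """Map a pole's angle phi to phi**alpha, keeping its radius (assumed MC-SA step)."""
    if pole.imag == 0:
        # Assumption: purely real poles are left untouched.
        return pole
    radius, angle = cmath.polar(pole)  # angle is in radians, within [-pi, pi]
    new_angle = abs(angle) ** alpha
    # Keep the sign so that conjugate pairs stay conjugate.
    return cmath.rect(radius, new_angle if angle > 0 else -new_angle)


def anonymize_poles(poles, alpha=0.8):
    """Apply the angle shift to every pole of one frame (alpha=0.8 is an assumed default)."""
    return [shift_pole(p, alpha) for p in poles]


if __name__ == "__main__":
    # A hypothetical conjugate pole pair plus one real pole.
    frame_poles = [cmath.rect(0.9, 0.5), cmath.rect(0.9, -0.5), 0.7 + 0j]
    for before, after in zip(frame_poles, anonymize_poles(frame_poles)):
        print(before, "->", after)
```

With α = 1 the poles are unchanged. For α well above 1, a shifted angle can exceed π and wrap around, which this sketch does not guard against.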
Files in This Item:
File | Description | Size | Format |
li22o_interspeech.pdf | | 295Kb | Adobe PDF |
All items in DSpace are protected by copyright, with all rights reserved.