JAIST Repository: Data Augmentation Using McAdams-Coefficient-Based Speaker Anonymization for Fake Audio Detection

Please use this identifier to cite this item: http://hdl.handle.net/10119/18158

Title:	Data Augmentation Using McAdams-Coefficient-Based Speaker Anonymization for Fake Audio Detection
Authors:	Li, Kai; Li, Sheng; Lu, Xugang; Akagi, Masato; Liu, Meng; Zhang, Lin; Zeng, Chang; Wang, Longbiao; Dang, Jianwu; Unoki, Masashi
Keywords:	fake audio detection; data augmentation; McAdams coefficients; speaker anonymization
Issue Date:	2022-09
Publisher:	International Speech Communication Association
Journal Title:	Proc. InterSpeech 2022
Start Page:	664
End Page:	668
DOI:	10.21437/Interspeech.2022-10088
Abstract:	Fake audio detection (FAD) is a technique to distinguish synthetic speech from natural speech. In most FAD systems, removing irrelevant features from acoustic speech while keeping only robust discriminative features is essential. Intuitively, speaker information entangled in acoustic speech should be suppressed for the FAD task. Particularly in a deep neural network (DNN)-based FAD system, the learning system may learn speaker information from a training dataset and then fail to generalize well to a testing dataset. In this paper, we propose to use the speaker anonymization (SA) technique to suppress speaker information in acoustic speech before inputting it into a DNN-based FAD system. We adopted the McAdams-coefficient-based SA (MC-SA) algorithm, with the expectation that the entangled speaker information would not be involved in the DNN-based FAD learning. Based on this idea, we implemented a light convolutional neural network bidirectional long short-term memory (LCNN-BLSTM)-based FAD system and conducted experiments on the Audio Deep Synthesis Detection Challenge (ADD2022) datasets. The results showed that removing the speaker information from acoustic speech yielded a relative performance improvement of 17.66% in the first track of ADD2022.
Rights:	Copyright (C) 2022 International Speech Communication Association. Kai Li, Sheng Li, Xugang Lu, Masato Akagi, Meng Liu, Lin Zhang, Chang Zeng, Longbiao Wang, Jianwu Dang, Masashi Unoki, Proc. InterSpeech2022, 2022, pp.664-668. doi:10.21437/Interspeech.2022-10088
URI:	http://hdl.handle.net/10119/18158
Type:	publisher
Appears in Collections:	b11-1. Conference Papers and Presentation Materials (Conference Papers)
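
The abstract names the MC-SA algorithm but does not describe its steps. As the McAdams-coefficient approach is commonly described, each speech frame is modeled with linear predictive coding (LPC), and every pole with a non-zero imaginary part has its angle φ replaced by φ^α, where α is the McAdams coefficient; real poles are left unchanged. The following is a minimal sketch of that pole-shifting step only, under those assumptions. The function name `mcadams_shift_poles` and the default `alpha=0.8` are hypothetical choices for illustration and are not taken from the paper; LPC analysis, framing, and resynthesis are omitted.

```python
import numpy as np


def mcadams_shift_poles(lpc_coeffs, alpha=0.8):
    """Raise the angle of each complex LPC pole to the power alpha.

    lpc_coeffs: denominator coefficients [a0, a1, ..., ap] of an all-pole filter.
    alpha: McAdams coefficient (hypothetical default, not from the paper).
    Returns a real coefficient array of the same length.
    """
    a = np.asarray(lpc_coeffs, dtype=float)
    # Poles are the roots of the LPC polynomial; cast to complex so that
    # shifted values can be written back without dropping imaginary parts.
    poles = np.roots(a).astype(complex)

    # Only poles off the real axis are modified.
    is_complex = np.abs(poles.imag) > 1e-12
    magnitude = np.abs(poles[is_complex])
    angle = np.angle(poles[is_complex])

    # Apply |phi| ** alpha and restore the sign so conjugate pairs stay paired.
    new_angle = np.sign(angle) * np.abs(angle) ** alpha
    poles[is_complex] = magnitude * np.exp(1j * new_angle)

    # Rebuild the polynomial; residual imaginary parts are rounding error.
    return a[0] * np.real(np.poly(poles))


# Usage: a two-pole resonator with pole radius 0.9 and pole angle 0.5 rad.
radius, theta = 0.9, 0.5
original = [1.0, -2.0 * radius * np.cos(theta), radius ** 2]
shifted = mcadams_shift_poles(original, alpha=0.8)
```

In this sketch the pole radius is preserved, so only the resonance frequencies move; setting `alpha=1.0` returns the original coefficients.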

Files in This Item:

File	Description	Size	Format
li22o_interspeech.pdf		295Kb	Adobe PDF

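All items stored in this system are protected by copyright.

Contact: Japan Advanced Institute of Science and Technology, Research Promotion Division, Library Information Section (ir-sys[at]ml.jaist.ac.jp)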