
Interpretable Deepfake Voice Detection: A Hybrid Deep-Learning Model and Explanation Evaluation

With the unprecedented advancement of Generative Artificial Intelligence (GenAI), the threat of voice scams using synthetic voices has become a serious concern across various sectors. Recent efforts have focused on identifying fake voices through handcrafted features, deep learning models, and hybrid approaches. However, most existing methods lack explainability, rendering their predictions non-transparent to users. This paper proposes a novel, interpretable, and transparent method for fake voice identification by introducing a hybrid deep learning model that leverages multiple extracted features. The hybrid model consists of two main components: the first addresses heterogeneous feature spaces by employing deep convolutional sub-models tailored to individual features, while the second, the terminus model, takes as input the concatenated representations from the final layers of each sub-model. The terminus model follows a typical multi-layer perceptron architecture, enabling effective integration and classification of the diverse feature representations. To enhance interpretability, we decompose the model’s decisions using Local Interpretable Model-agnostic Explanations (LIME), taking advantage of the identical feature representation before the concatenation layers to address challenges related to multidimensional feature representations. To evaluate the features and assess the quality of the generated explanations, we propose two metrics: importance and trust. Extensive experiments are conducted on the In-the-Wild dataset, which is designed to test the generalization capability of synthetic audio detection methods. The experimental results demonstrate that our approach achieves performance comparable to benchmark methods. Furthermore, the results based on our proposed metrics indicate that certain perceptible features show promise for generating explanations that are meaningful to general users.
For reproducibility, the source code for these experiments is available in the following repository: https://github.com/jacoblarock/fake_voices_xai
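The two-component architecture described in the abstract (per-feature convolutional sub-models, a concatenation layer, and an MLP terminus) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the class names `FeatureSubModel` and `HybridFakeVoiceDetector`, the channel counts, and all layer sizes are hypothetical, written here in PyTorch.

```python
import torch
import torch.nn as nn


class FeatureSubModel(nn.Module):
    """Convolutional sub-model for one feature type (hypothetical shapes).

    Each sub-model maps its own feature space to an embedding of identical
    size, mirroring the identical representation before concatenation that
    the paper exploits for LIME.
    """

    def __init__(self, in_channels: int, embed_dim: int = 32):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv1d(in_channels, 16, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),  # collapse the time axis to length 1
        )
        self.proj = nn.Linear(16, embed_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, time) -> (batch, embed_dim)
        h = self.conv(x).squeeze(-1)
        return self.proj(h)


class HybridFakeVoiceDetector(nn.Module):
    """Per-feature sub-models -> concatenation -> MLP terminus."""

    def __init__(self, feature_channels: list[int], embed_dim: int = 32):
        super().__init__()
        self.sub_models = nn.ModuleList(
            FeatureSubModel(c, embed_dim) for c in feature_channels
        )
        self.terminus = nn.Sequential(
            nn.Linear(embed_dim * len(feature_channels), 64),
            nn.ReLU(),
            nn.Linear(64, 1),  # single real-vs-fake logit
        )

    def forward(self, features: list[torch.Tensor]) -> torch.Tensor:
        embeds = [m(x) for m, x in zip(self.sub_models, features)]
        return self.terminus(torch.cat(embeds, dim=-1))


# Example with three hypothetical feature streams (channel counts are made up):
model = HybridFakeVoiceDetector(feature_channels=[13, 40, 4])
feats = [torch.randn(2, c, 100) for c in (13, 40, 4)]
logits = model(feats)  # shape: (2, 1)
```

Because every sub-model emits an embedding of the same size, a model-agnostic explainer such as LIME can perturb and attribute the concatenated vector uniformly across feature types, which is the property the paper leverages.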

Metadata
Document Type: Conference Object
Language: English
Authors: Jacob LaRock, Md Shajalal, Gunnar Stevens
Parent Title (English): Biecek, Nowaczyk et al. (Eds.): Joint Proceedings of the xAI 2025 Late-breaking Work, Demos and Doctoral Consortium co-located with the 3rd World Conference on eXplainable Artificial Intelligence (xAI 2025), Istanbul, Turkey, July 9-11, 2025
Number of pages: 8
First Page: 97
Last Page: 104
URN: urn:nbn:de:hbz:1044-opus-91928
URL: https://ceur-ws.org/Vol-4017/#paper_13
Publisher: RWTH Aachen
Place of publication: Aachen, Germany
Publishing Institution: Hochschule Bonn-Rhein-Sieg
Date of first publication: 2025/08/27
Copyright: © 2025 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0)
Funding: This research was funded by the AntiScam project (Defense against communication fraud), supported by BMBF Germany, grant reference 16KIS2214
Tags: DeepFake Detection; Explainable AI (XAI); Explanation Evaluation; Fake Voice Detection; Hybrid Model; Metrics
Departments, institutes and facilities: Fachbereich Wirtschaftswissenschaften
Institut für Verbraucherinformatik (IVI)
Projects: AntiScam - Joint project: Defense against conversational scams to protect consumers' digital identity (DE/BMFTR/16KIS2214)
Dewey Decimal Classification (DDC): 0 Computer science, information & general works / 00 Computer science, knowledge & systems / 005 Computer programming, programs & data
Entry in this database: 2025/09/05
Licence: Creative Commons - CC BY - Attribution 4.0 International