[20250303 Yi Zhu] Generalizing Audio Deepfake Detection via Style-Linguistics Alignment Pretraining
Information on this media
Audio deepfake detection (ADD) is crucial to combat the misuse of speech synthesized by generative AI models. Existing ADD models struggle to generalize to unseen attacks, with a large performance discrepancy between in-domain and out-of-domain data. In this work, we introduce a new ADD model that explicitly exploits the Style-LInguistics Mismatch (SLIM) in fake speech to separate it from real speech. SLIM first employs self-supervised pretraining on real samples only to learn the style-linguistics dependency in the real class. The learned features are then combined with standard pretrained acoustic features (e.g., Wav2vec) to train a classifier on the real and fake classes. When the feature encoders are frozen, SLIM outperforms benchmark methods on out-of-domain datasets while achieving competitive results on in-domain data. The features learned by SLIM allow us to quantify the (mis)match between style and linguistic content in a sample, hence facilitating an explanation of the model's decision.
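To make the two-stage recipe described in the abstract concrete, below is a minimal, hypothetical sketch of how such a pipeline could be wired up in PyTorch. All names (SmallEncoder, SlimStyleClassifier, alignment_loss, mismatch_score), the cosine-similarity alignment objective, and the feature dimensions are illustrative assumptions for exposition, not the authors' actual SLIM implementation.

```python
# Illustrative sketch of a SLIM-style two-stage pipeline (assumptions, not the
# paper's exact architecture): stage 1 aligns style and linguistic embeddings
# of real speech only; stage 2 freezes the encoders and trains a real/fake
# classifier on the aligned features plus pooled acoustic (e.g., Wav2vec) features.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SmallEncoder(nn.Module):
    """Placeholder encoder mapping frame-level features to one utterance embedding."""
    def __init__(self, in_dim: int, emb_dim: int = 256):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim, emb_dim), nn.ReLU(),
                                 nn.Linear(emb_dim, emb_dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, time, in_dim) -> frame-wise projection -> mean pool over time
        return self.net(x).mean(dim=1)


def alignment_loss(style_emb: torch.Tensor, ling_emb: torch.Tensor) -> torch.Tensor:
    """Stage 1: pull style and linguistic embeddings of *real* speech together."""
    return 1.0 - F.cosine_similarity(style_emb, ling_emb, dim=-1).mean()


def mismatch_score(style_emb: torch.Tensor, ling_emb: torch.Tensor) -> torch.Tensor:
    """Per-sample style-linguistics mismatch, usable to explain a decision."""
    return 1.0 - F.cosine_similarity(style_emb, ling_emb, dim=-1)


class SlimStyleClassifier(nn.Module):
    """Stage 2: frozen encoders; classifier head on SLIM + acoustic features."""
    def __init__(self, style_enc, ling_enc, acoustic_dim: int, emb_dim: int = 256):
        super().__init__()
        self.style_enc, self.ling_enc = style_enc, ling_enc
        for p in list(self.style_enc.parameters()) + list(self.ling_enc.parameters()):
            p.requires_grad = False  # encoders stay frozen in stage 2
        self.head = nn.Linear(2 * emb_dim + acoustic_dim, 2)  # real vs fake logits

    def forward(self, style_feats, ling_feats, acoustic_feats):
        s = self.style_enc(style_feats)
        l = self.ling_enc(ling_feats)
        a = acoustic_feats.mean(dim=1)  # pooled pretrained acoustic features
        return self.head(torch.cat([s, l, a], dim=-1))


if __name__ == "__main__":
    # Toy shapes only; real inputs would come from SSL front-ends, not random tensors.
    style_enc, ling_enc = SmallEncoder(in_dim=80), SmallEncoder(in_dim=80)

    # Stage 1: self-supervised alignment pretraining on real samples only.
    real_style, real_ling = torch.randn(4, 100, 80), torch.randn(4, 100, 80)
    loss = alignment_loss(style_enc(real_style), ling_enc(real_ling))

    # Stage 2: binary real/fake classification with frozen encoders.
    model = SlimStyleClassifier(style_enc, ling_enc, acoustic_dim=768)
    logits = model(real_style, real_ling, torch.randn(4, 100, 768))

    # Mismatch score offers a per-sample explanation signal (higher = more mismatch).
    scores = mismatch_score(style_enc(real_style), ling_enc(real_ling))
    print(loss.item(), logits.shape, scores.shape)
```

In this sketch, the explanation signal comes directly from the cosine distance between the style and linguistic embeddings, mirroring the abstract's claim that the learned features quantify the (mis)match between style and linguistic content in a sample.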