Safeguarding Voices via Adversarial Examples: Defense and Way Forward in the Era of GenAI
Recent advances in generative AI are bringing paradigm shifts to society. Using contemporary AI-based voice synthesizers, it is now practical to produce speech that vividly mimics a specific person. While these technologies are designed to improve lives, they also pose significant risks of misuse, potentially harming voice actors' livelihoods and enabling financial scams. In recognition of such threats, existing strategies primarily focus on detecting synthetic speech. Complementary to these defenses, we propose AntiFake, a proactive approach that hinders unauthorized speech synthesis. AntiFake works by adding minor perturbations to speech samples so that an attacker's synthesis attempts produce audio that does not sound like the target speaker. To attain an optimal balance between sample quality, protection strength, and system usability, we propose adversarial optimization over this three-way trade-off, guided by minimal user input. With this work, we make an initial step toward actively protecting our voices and highlight the ongoing need for robust and sustainable defenses in this evolving landscape.
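To make the core idea concrete, the sketch below shows a generic adversarial-perturbation loop of the kind the abstract describes: a small, bounded perturbation is optimized so that a speaker encoder no longer maps the protected sample to the original speaker's voice. This is a minimal illustration, not the authors' implementation; `speaker_encoder` is assumed to be any differentiable model mapping audio to a speaker embedding, and the parameters (`epsilon`, `steps`, `lr`) are placeholders for the quality/strength/usability trade-off the paper optimizes.

```python
import torch
import torch.nn.functional as F


def protect_voice(waveform, speaker_encoder, steps=200, epsilon=0.005, lr=1e-3):
    """Illustrative adversarial-perturbation loop (not the AntiFake code).

    waveform:        1-D float tensor holding the speech sample to protect
    speaker_encoder: differentiable model mapping audio -> speaker embedding
    epsilon:         L-infinity bound that caps audible distortion (sample quality)
    steps, lr:       optimization budget (protection strength vs. runtime)
    """
    # Embedding of the unmodified voice, used as the "push away from" target.
    original_embedding = speaker_encoder(waveform.unsqueeze(0)).detach()

    # The perturbation is the only trainable parameter.
    delta = torch.zeros_like(waveform, requires_grad=True)
    optimizer = torch.optim.Adam([delta], lr=lr)

    for _ in range(steps):
        optimizer.zero_grad()
        protected = waveform + delta
        embedding = speaker_encoder(protected.unsqueeze(0))

        # Minimize similarity to the original speaker so that cloned speech
        # derived from the protected sample no longer resembles the target.
        loss = F.cosine_similarity(embedding, original_embedding, dim=-1).mean()
        loss.backward()
        optimizer.step()

        # Keep the perturbation small so the sample still sounds natural.
        with torch.no_grad():
            delta.clamp_(-epsilon, epsilon)

    return (waveform + delta).detach()
```

In this toy version, `epsilon` stands in for the sample-quality constraint and the loss stands in for protection strength; the actual system balances these trade-offs, together with usability, through guided adversarial optimization with minimal user input.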