Voice Conversion and Spoofing Attack on Speaker Verification Systems
Haizhou Li
Institute for Infocomm Research (I²R), Singapore
Acknowledgements: Zhizheng Wu, Eng Siong Chng, NTU Singapore

Outline
• Introduction
• Speaker verification
• Voice conversion and spoofing attack
• Anti-spoofing attack
• Future research
APSIPA ASC 2013

Introduction
• Authentication: to decide "Who you are" based on "What you have" and "What you know"
• Biometrics: to verify the identity of a living person based on behavioral and physiological characteristics

Introduction
[Figure: a speaker says "This is Jay, verify me!"; the system answers "Yes, Jay" or "No, you are not"]
Mode
• Text-Dependent
• Text-Independent (Language-Independent)

Speaker Recognition Spoofing Attack
A spoofing attack uses a falsified voice as the system input.
[Figure: a speaker says "This is Jay, verify me!"; the system answers "Yes, Jay" or "No, you are not". Attack types shown: impersonation, playback, TTS, voice conversion]

Introduction
Summary of spoofing attack techniques
Spoofing technique | Accessibility (practicality) | Effectiveness (risk), text-independent | Effectiveness (risk), text-dependent
Impersonation | Low | Low/unknown | Low/unknown
Playback | High | High | Low (prompted text) to high (fixed phrase)
Speech synthesis | Medium to High | High | High
Voice conversion | Medium to High | High | High

Speaker Verification
[Diagram: speech decomposed into Prosody, Content and Timbre, with related technologies]
• Speech to Singing Synthesis
• Expressive Speech Synthesis
(behavioral characteristics)
• Speaker Recognition
• Text-to-Speech
• Voice Conversion
• Speech-to-Text
• Voice Impersonation
(physiological characteristics)

Speaker Verification
Tomi Kinnunen and Haizhou Li, "An Overview of Text-Independent Speaker Recognition: from Features to Supervectors", Speech Communication 52(1): 12-40, January 2010

Speaker Verification
Evaluation Metrics
– Equal Error Rate (EER): the operating point at which the false alarm rate equals the miss detection rate
– Four categories of trial decisions in speaker verification
Trial | Decision: Accept | Decision: Reject
Genuine | Correct acceptance | Miss detection
Impostor | False alarm (FAR) | Correct rejection
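The EER definition can be made concrete with a small threshold sweep. This is a minimal sketch under stated assumptions, not the evaluation code used in the talk: the function names (`error_rates`, `equal_error_rate`) and the scores are made up for illustration, a trial is accepted when its score is at or above the threshold, and the EER is approximated at the observed score where the two error rates are closest.

```python
def error_rates(genuine, impostor, threshold):
    """Return (false alarm rate, miss rate); scores >= threshold are accepted."""
    far = sum(s >= threshold for s in impostor) / len(impostor)
    miss = sum(s < threshold for s in genuine) / len(genuine)
    return far, miss


def equal_error_rate(genuine, impostor):
    """Approximate the EER by sweeping the threshold over all observed scores."""
    best = None
    for t in sorted(set(genuine) | set(impostor)):
        far, miss = error_rates(genuine, impostor, t)
        gap = abs(far - miss)
        # Keep the threshold where false alarm and miss rates are closest.
        if best is None or gap < best[0]:
            best = (gap, (far + miss) / 2, t)
    _, eer, threshold = best
    return eer, threshold


# Hypothetical scores, for illustration only.
genuine_scores = [2.0, 3.0, 4.0, 5.0]
impostor_scores = [0.0, 1.0, 2.5, 3.5]

eer, eer_threshold = equal_error_rate(genuine_scores, impostor_scores)
print(f"EER = {eer:.2f} at threshold {eer_threshold}")
```

With real systems the sweep is usually done on thousands of trials and the crossing point is interpolated; the coarse sweep here is only meant to show what "false alarm equals miss detection" means.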

Speaker Verification
Some Observations
• Most systems use short-term spectral features (MFCC, LPCC) instead of segmental features (pitch contour, energy flow)
– Systems are sensitive to spectral features rather than prosodic features
– Prosody could become a feature when detecting spoofing
• Most systems are sensitive to channels and noise
– Same speaker, different channels/noises
– Different speakers, same channel/noise
• All systems assume natural voice (genuine human voice) as input

Voice Conversion
[Figure: speech decomposed into Prosody, Content and Timbre; voice conversion turns "Hello world" in the source speaker's voice into "Hello world" in the target speaker's voice]
Yannis Stylianou, "Voice transformation: a survey", ICASSP 2009.

Voice Conversion
System Diagram
[Diagram, training: parallel data (source and target speakers speak the same utterances) → parameterization → speech alignment → conversion function]
[Diagram, conversion: source speaker "Hello world" → parameterization → conversion function → synthesis filter → target speaker "Hello world"]
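The speech alignment step pairs up frames of the source and target utterances so that a conversion function can be trained on matched frames. The slides do not say which alignment method was used; dynamic time warping (DTW) is a common choice for parallel data and is assumed here. The function name `dtw_path` and the toy one-dimensional "features" are hypothetical.

```python
def dtw_path(source, target):
    """Align two feature sequences with DTW; return (source_idx, target_idx) pairs."""
    def dist(a, b):
        # Euclidean distance between two feature vectors.
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

    n, m = len(source), len(target)
    inf = float("inf")
    # cost[i][j] = best accumulated cost aligning source[:i] with target[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = dist(source[i - 1], target[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])

    # Backtrack from the last frame pair to the first.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        steps = [
            (cost[i - 1][j - 1], i - 1, j - 1),
            (cost[i - 1][j], i - 1, j),
            (cost[i][j - 1], i, j - 1),
        ]
        _, i, j = min(steps)
    path.reverse()
    return path


# Toy sequences: the target holds its first frame one step longer.
source_frames = [[0.0], [1.0], [2.0]]
target_frames = [[0.0], [0.0], [1.0], [2.0]]
alignment = dtw_path(source_frames, target_frames)
print(alignment)
```

Each aligned pair would then serve as one training example (source frame, target frame) for the conversion function.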

Voice Conversion
• Voice conversion demo
– Using 10 utterances (around 30 seconds of speech) to train the mapping function
– Only the timbre is transformed; the prosody is kept
[Audio samples: source, target and converted speech for male-to-male and male-to-female conversion]

Voice Conversion Spoofing Attack
• Four categories of trial decisions in speaker verification
Trial | Decision: Accept | Decision: Reject
Genuine | Correct acceptance | Miss detection
Impostor | False alarm (FAR) | Correct rejection
• Spoofing attacks increase the false alarm rate, and thus increase the equal error rate
• They move the impostor score distribution towards that of genuine speakers
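The effect of shifting impostor scores can be illustrated at a fixed decision threshold. The sketch below is only an illustration: the scores, the threshold and the constant shift of 50 are invented, and a real voice conversion attack does not shift every score by the same amount.

```python
def false_alarm_rate(impostor_scores, threshold):
    """Fraction of impostor trials accepted (score >= threshold)."""
    return sum(s >= threshold for s in impostor_scores) / len(impostor_scores)


threshold = 0.0  # hypothetical decision threshold, fixed before the attack

# Hypothetical impostor scores without spoofing.
impostor = [-120.0, -90.0, -60.0, -30.0, 10.0]

# Assumed effect of the attack: every impostor score moves up by 50,
# towards the genuine score distribution.
impostor_vc = [s + 50.0 for s in impostor]

far_before = false_alarm_rate(impostor, threshold)
far_after = false_alarm_rate(impostor_vc, threshold)
print(f"FAR before: {far_before:.2f}, FAR after: {far_after:.2f}")
```

Because the threshold stays where it was tuned on ordinary impostors, every converted trial that crosses it becomes an additional false alarm.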

Voice Conversion Spoofing Attack
• Dataset design (a subset of the NIST SRE 2006 core task)
• An extreme dataset in which all impostors are voice-converted
 | Standard speaker verification | Spoofing attack
Unique speakers | 504 | 504
Genuine trials | 3,978 | 3,978
Impostor trials | 2,782 | 0
Impostor trials (via VC) | 0 | 2,782
Tomi Kinnunen, Zhizheng Wu, Kong Aik Lee, Filip Sedlak, Eng Siong Chng, Haizhou Li, "Vulnerability of Speaker Verification Systems Against Voice Conversion Spoofing Attacks: the Case of Telephone Speech", ICASSP 2012.

Voice Conversion Spoofing Attack
• Score distributions before and after spoofing attack
[Figure: histograms of number of trials versus recognizer score for genuine, impostor and impostor-via-VC trials, with the decision threshold marked; annotation: "More false acceptance!"]
Tomi Kinnunen, Zhizheng Wu, Kong Aik Lee, Filip Sedlak, Eng Siong Chng, Haizhou Li, "Vulnerability of Speaker Verification Systems Against Voice Conversion Spoofing Attacks: the Case of Telephone Speech", ICASSP 2012.

Voice Conversion Spoofing Attack
A summary of spoofing attack studies (mostly text-independent tests)
EER and FAR increase considerably under spoofing attack!
Anthony Larcher and Haizhou Li, "The RSR2015 Speech Corpus", IEEE SLTC Newsletter, May 2012

Voice Conversion Spoofing Attack
• EER and FAR increase as the number of training utterances for voice conversion increases
• Text-dependent test on the RSR2015 database
# of training utterances for VC | Male EER (%) | Male FAR (%) | Female EER (%) | Female FAR (%)
Baseline | 2.92 | 2.92 | 2.39 | 2.39
VC 2 utterances | 3.90 | 4.80 | 1.78 | 1.06
VC 5 utterances | 5.07 | 9.17 | 2.51 | 2.64
VC 10 utterances | 7.04 | 16.20 | 2.82 | 3.77
VC 20 utterances | 8.30 | 21.87 | 3.12 | 4.68

Anti-spoofing attack
• A more accurate speaker verification system alone is never good enough
– JFA, PLDA, i-vector
• Synthetic speech detection
– the absence of natural speech phase [1]
– the use of F0 statistics to detect spoofing attacks [3]
– synthetic speech generated according to the specific algorithm [2] provokes lower variation in frame-level log-likelihood values than natural speech
• Countermeasures are specific to one type of synthetic speech and are therefore easily overcome by other voice conversion techniques
[1] Z. Wu, T. Kinnunen, E. S. Chng, H. Li, and E. Ambikairajah, "A study on spoofing attack in state-of-the-art speaker verification: the telephone speech case," in Signal & Information Processing Association Annual Summit and Conference (APSIPA ASC), 2012 Asia-Pacific. IEEE, 2012, pp. 1-5.
[2] T. Satoh, T. Masuko, T. Kobayashi, and K. Tokuda, "A robust speaker verification system against imposture using an HMM-based speech synthesis system," in Proc. Eurospeech, 2001.
[3] A. Ogihara, H. Unno, and A. Shiozaki, "Discrimination method of synthetic speech using pitch frequency against synthetic speech falsification," IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences, vol. 88, no. 1, pp. 280-286, Jan. 2005.
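The log-likelihood cue attributed to [2] above can be sketched as a simple variance test. This is not the detector from [2]; it is a minimal illustration that assumes per-frame log-likelihoods are already available from some speaker model. The function name `looks_synthetic`, the threshold value and the numbers are all hypothetical.

```python
from statistics import pvariance


def looks_synthetic(frame_loglik, variance_threshold):
    """Flag an utterance whose frame-level log-likelihoods vary too little."""
    return pvariance(frame_loglik) < variance_threshold


# Hypothetical frame-level log-likelihoods, for illustration only.
natural = [-52.0, -61.0, -47.0, -66.0, -55.0]    # wider spread
synthetic = [-55.0, -56.0, -54.0, -55.5, -55.0]  # unusually flat

VARIANCE_THRESHOLD = 5.0  # assumed value; would be tuned on development data

natural_flag = looks_synthetic(natural, VARIANCE_THRESHOLD)
synthetic_flag = looks_synthetic(synthetic, VARIANCE_THRESHOLD)
print(natural_flag, synthetic_flag)
```

As the last bullet warns, a rule like this is tied to one synthesis method: a conversion technique that keeps natural frame-to-frame variation would pass it.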

Anti-spoofing attack
• Artifacts are introduced during the analysis-synthesis process
[Diagram: Source → Analysis → Transformation function → Synthesis → Target, annotated "Artifact is introduced!" and "Artifact is also introduced here!"]
Zhizheng Wu, Eng Siong Chng, Haizhou Li, "Detecting Converted Speech and Natural Speech for anti-Spoofing Attack in Speaker Recognition", Interspeech 2012

Anti-spoofing attack
• Artifacts are introduced during the analysis-synthesis process
[Diagram: Source → Analysis → Synthesis → Target, annotated "Learn the artifacts!"]
Zhizheng Wu, Eng Siong Chng, Haizhou Li, "Detecting Converted Speech and Natural Speech for anti-Spoofing Attack in Speaker Recognition", Interspeech 2012

Anti-spoofing attack
• Natural speech vs copy-synthesis speech
[Audio samples #1 to #5: natural and synthetic versions]
