SLIDE 1

Voice Conversion and Anti-spoofing of Speaker Verification

Haizhou Li

Acknowledgement:

Zhizheng Wu, Tomi Kinnunen, Nicholas Evans, Junichi Yamagishi, Xiaohai Tian

1

SLIDE 2

Agenda

  • Spoofing Attacks
  • Voice Conversion
  • Artifacts
  • ASVspoof 2015

2

SLIDE 3

Agenda

  • Spoofing Attacks
  • Voice Conversion
  • Artifacts
  • ASVspoof 2015

3

SLIDE 4

Speaker Verification

(Diagram: a claimant says "This is John!"; the verification system answers "Yes, John!" or "Reject!")

4

SLIDE 5

Speaker Verification

(Diagram: the same verification loop under spoofing attacks: impersonation, replay, speech synthesis, and voice conversion all target the claim "This is John!")

5

SLIDE 6

Spoofing attack  | Accessibility  | Effectiveness (risk)              | Countermeasure availability
                 |                | Text-independent | Text-dependent |
Impersonation    | Low            | Low/unknown      | Low/unknown    | N.A.
Replay           | High           | Low              | Low to high    | Low
Speech synthesis | Medium to high | High             | High           | Medium
Voice conversion | Medium to high | High             | High           | Medium

  • Z. Wu, N. Evans, T. Kinnunen, J. Yamagishi, F. Alegre, and H. Li, “Spoofing and countermeasures for speaker verification: a survey,” Speech Communication, vol. 66, pp. 130–153, 2015.

Spoofing Attacks

6

SLIDE 7

7

(Spoofing-attack comparison table repeated from Slide 6.)

  • Y. Lau, D. Tran, and M. Wagner, “Testing voice mimicry with the YOHO speaker verification corpus,” in Knowledge-Based Intelligent Information and Engineering Systems, Springer, 2005, pp. 907–907.
  • J. Mariethoz and S. Bengio, “Can a professional imitator fool a GMM-based speaker verification system?” IDIAP Research Report (No. Idiap-RR-61-2005), 2005.
  • R. G. Hautamaki, T. Kinnunen, V. Hautamaki, T. Leino, and A.-M. Laukkanen, “I-vectors meet imitators: on vulnerability of speaker verification systems against voice mimicry,” in Interspeech 2013

Impersonation

SLIDE 8

(Spoofing-attack comparison table repeated from Slide 6.)

Zhizheng Wu, Sheng Gao, Eng Siong Chng, Haizhou Li, "A study on replay attack and anti-spoofing for text-dependent speaker verification", Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC) 2014.

Replay

8

SLIDE 9

9

Traits of Replay

  • J. Villalba and E. Lleida, “Preventing replay attack on speaker verification systems,” IEEE ICCST 2011
  • L. Cuccovillo, P. Aichroth, “Open-set microphone classification via blind channel analysis,” ICASSP 2016
SLIDE 10

(Spectrograms, 0–8000 Hz over roughly 3 seconds: genuine speech vs. replay speech.)

10

  • 1. A. Wang, “An industrial strength audio search algorithm,” in Proc. Int. Symposium on Music Information Retrieval (ISMIR), 2003, pp. 7–13.
  • 2. Zhizheng Wu, Sheng Gao, Eng Siong Chng, Haizhou Li, "A study on replay attack and anti-spoofing for text-dependent speaker verification", Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC) 2014.

Audio Fingerprinting
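The landmark idea behind audio fingerprinting [1] can be sketched in a few lines: keep only the strongest spectral peaks per frame and hash pairs of nearby peaks. This is a toy illustration; the frame length, peak count, and fan-out below are invented parameters, not Wang's production settings:

```python
import numpy as np

def fingerprint(signal, frame_len=256, hop=128, n_peaks=3, fan_out=5):
    """Toy landmark fingerprint: keep the strongest spectral peaks per
    frame, then hash (f1, f2, time-delta) for pairs of nearby peaks."""
    frames = np.array([signal[i:i + frame_len] * np.hanning(frame_len)
                       for i in range(0, len(signal) - frame_len, hop)])
    mags = np.abs(np.fft.rfft(frames, axis=1))
    # per-frame peak picking: indices of the n_peaks largest bins
    peaks = [(t, int(f)) for t, row in enumerate(mags)
             for f in np.argsort(row)[-n_peaks:]]
    # pair each peak with peaks up to fan_out frames ahead
    hashes = set()
    for i, (t1, f1) in enumerate(peaks):
        for t2, f2 in peaks[i + 1:]:
            if 0 < t2 - t1 <= fan_out:
                hashes.add((f1, f2, t2 - t1))
    return hashes

# the hash set depends only on peak positions, so a level change
# (e.g. a replayed copy at lower volume) leaves it untouched
sr = 8000
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 440 * t) + 0.5 * np.sin(2 * np.pi * 1200 * t)
```

Matching a test utterance's hash set against the enrolment recordings flags verbatim replays, which is the countermeasure direction studied in [2].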

SLIDE 11

(Spoofing-attack comparison table repeated from Slide 6.)

11

  • Z. Wu, N. Evans, T. Kinnunen, J. Yamagishi, F. Alegre, and H. Li, “Spoofing and countermeasures for speaker verification: a survey,” Speech Communication, vol. 66, pp. 130–153, 2015.

Spoofing Attacks

SLIDE 12

Tomi Kinnunen and Haizhou Li, “An Overview of Text-Independent Speaker Recognition: from Features to Supervectors”, Speech Communication 52(1): 12--40, January 2010

Speaker Verification: Robust Features

12

  • Modeling the human voice production system
  • Modeling the peripheral auditory system

SLIDE 13

More Robust = More Vulnerable

(Diagram: a synthetic speech detector screens the claim "This is John!" before speaker verification; detected synthetic input is rejected, genuine input proceeds to a "Yes, John!" decision.)

13

SLIDE 14

Agenda

14

  • Spoofing Attacks
  • Voice Conversion
  • Artifacts
  • ASVspoof 2015
SLIDE 15

15

Voice Conversion: Vocoder

Source Speaker → Analysis → Feature conversion → Synthesis → Target Speaker

SLIDE 16

16

Vocoder: Analysis - Synthesis

Source Speaker → Analysis → Feature conversion → Synthesis → Target Speaker

Analysis–synthesis path (no conversion): source → Analysis → Synthesis → target

SLIDE 17

  • Sinusoidal vocoders
    – Harmonic plus noise model (HNM) vocoder
    – Harmonic and stochastic vocoder
    – Adaptive harmonic vocoder
  • Source-filter model
    – Linear predictive vocoder
    – Mel-generalised cepstral vocoder
    – STRAIGHT
    – Glottal vocoder

Vocoder

17

SLIDE 18

18

Vocoder: Copy Synthesis

Source → Analysis → Synthesis → Target

  • Z. Wu, X. Xiao, E.S. Chng, H. Li, “Synthetic Speech Detection Using Temporal Modulation Feature”, ICASSP 2013

Feature  | EER (%)
MFCC     | 10.98
MGDCC    | 1.25
MGDCC+PM | 0.89

SLIDE 19

19

Voice Conversion: Feature Conversion

Source Speaker → Analysis → Feature conversion → Synthesis → Target Speaker

SLIDE 20

20

Speaker A Speaker B

  • Z. Wu, Spectral Mapping for Voice Conversion, Ph.D Thesis, Nanyang Technological University, 2015

Differences between Speakers

SLIDE 21

21

Training Conversion

Basics of Voice Conversion

SLIDE 22

22

  • Z. Wu et al, Tutorial Notes, APSIPA ASC 2015

Chronological Map of Voice Conversion

SLIDE 23

23

  • Z. Wu, Spectral mapping for voice conversion, Ph.D Thesis, Nanyang Technological University, 2015
  • Masanobu Abe, Satoshi Nakamura, Kiyohiro Shikano, and Hisao Kuwabara, "Voice conversion through vector quantization," ICASSP 1988

Voice Conversion: Codebook Mapping

source target
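Codebook mapping can be sketched with plain k-means: quantize source frames into a codebook, pair each source code vector with the mean of its aligned target frames, and convert by table lookup. A toy 2-D sketch; the data, codebook size, and linear source-target relation are all invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
# toy "parallel data": target frames are a scaled, shifted copy of source
src = rng.normal(size=(200, 2))
tgt = src * 1.5 + 2.0 + 0.05 * rng.normal(size=(200, 2))

# train a source codebook with plain k-means
K = 8
codes = src[rng.choice(len(src), K, replace=False)]
for _ in range(20):
    assign = np.argmin(((src[:, None] - codes[None]) ** 2).sum(-1), axis=1)
    codes = np.array([src[assign == k].mean(0) if np.any(assign == k)
                      else codes[k] for k in range(K)])
# pair each source code vector with the mean of its aligned target frames
tgt_codes = np.array([tgt[assign == k].mean(0) if np.any(assign == k)
                      else tgt.mean(0) for k in range(K)])

def convert(frame):
    """Conversion = nearest source code vector -> its paired target code."""
    k = np.argmin(((codes - frame) ** 2).sum(-1))
    return tgt_codes[k]
```

The hard one-of-K mapping is what produces audible discontinuities between frames, a limitation that later soft (GMM-based) mappings address.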

SLIDE 24

24

  • Alexander Kain, and Michael W. Macon. "Spectral voice conversion for text-to-speech synthesis." ICASSP 1998

Voice Conversion: Joint Density GMM

source target
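The joint-density GMM fits a mixture to stacked source-target vectors z = [x; y] and converts each frame with the conditional mean E[y|x]. A minimal sketch on synthetic 2-D data; scikit-learn's GaussianMixture stands in for the paper's EM training, and the toy linear source-target relation is invented:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# toy parallel features: target is a noisy linear map of the source
X = rng.normal(size=(500, 2))
Y = X @ np.array([[1.2, 0.3], [-0.1, 0.8]]) + 1.0 + 0.05 * rng.normal(size=(500, 2))

# fit a GMM on the joint vectors z = [x, y]
gmm = GaussianMixture(n_components=4, covariance_type='full',
                      random_state=0).fit(np.hstack([X, Y]))
d = X.shape[1]

def convert(x):
    """MMSE mapping: E[y|x] = sum_k p(k|x) (mu_y_k + Syx_k Sxx_k^-1 (x - mu_x_k))."""
    mu_x, mu_y = gmm.means_[:, :d], gmm.means_[:, d:]
    Sxx = gmm.covariances_[:, :d, :d]
    Syx = gmm.covariances_[:, d:, :d]
    # component responsibilities given x alone (marginal GMM over x)
    logp = np.array([np.log(gmm.weights_[k])
                     - 0.5 * np.log(np.linalg.det(Sxx[k]))
                     - 0.5 * (x - mu_x[k]) @ np.linalg.solve(Sxx[k], x - mu_x[k])
                     for k in range(gmm.n_components)])
    w = np.exp(logp - logp.max())
    w /= w.sum()
    return sum(w[k] * (mu_y[k] + Syx[k] @ np.linalg.solve(Sxx[k], x - mu_x[k]))
               for k in range(gmm.n_components))
```

Because every output frame is a weighted average of per-component regressions, the converted trajectory is over-smoothed, one of the artifacts revisited later in the deck.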

SLIDE 25

Source spectrum Target spectrum

25

  • Daniel Erro, Asunción Moreno, and Antonio Bonafonte, "Voice conversion based on weighted frequency warping," IEEE Transactions on Audio, Speech, and Language Processing, vol. 18, no. 5, pp. 922–931, 2010.
  • Xiaohai Tian, Zhizheng Wu, Siu Wa Lee, Nguyen Quy Hy, Eng Siong Chng, Minghui Dong, "Sparse representation for frequency warping based voice conversion", ICASSP 2015

Voice Conversion: Frequency Warping

Partially reuses the source speaker's spectrum information

SLIDE 26

26

  • Thierry Dutoit, Andre Holzapfel, Matthieu Jottrand, Alexis Moinet, J. M. Perez, and Yannis Stylianou, "Towards a voice conversion system based on frame selection," ICASSP 2007.
  • Zhizheng Wu, Tuomas Virtanen, Tomi Kinnunen, Eng Siong Chng, Haizhou Li, "Exemplar-based unit selection for voice conversion utilizing temporal information", Interspeech 2013

Voice Conversion: Frame/Unit Selection

SLIDE 27

(Diagram: lattices of candidate units for the phone sequence "# dh ax c ae t s ae t", with several candidate segments per phone.)

Unit Selection Synthesis

  • Source symbol → target segment cost: suitability of a unit for the target
  • Target segment → target segment cost: acoustic continuity of two adjacent units
  • Z. Wu et al, Tutorial Notes, APSIPA ASC 2015
SLIDE 28

28

Subjective Analysis Objective Analysis “Spoofing Analysis”

  • 1. Spectral distortion
  • 2. Temporal (magnitude/phase) discontinuity
  • 3. Spectro-temporal artifacts
  • 4. Pitch pattern
  • 5. ASVspoof 2015 ?

Evaluation of Synthetic Voice

SLIDE 29

Agenda

29

  • Spoofing Attacks
  • Voice Conversion
  • Artifacts
  • ASVspoof 2015
SLIDE 30

Magnitude

  • Short-time Fourier transform (STFT)
  • Smoothing effect (local vs global optimization)
  • Temporal magnitude discontinuity

Phase

  • Minimum phase vocoding
  • Phase distortion
  • Temporal phase discontinuity

30

Artifacts

… that are common to synthetic speech … that are different from natural speech

SLIDE 31
  • Time-Frequency resolution
  • Spectral leakage
  • Windowing tradeoffs

31

Magnitude: STFT
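The time-frequency tradeoff is easy to demonstrate: with bin spacing sr/N, two tones 40 Hz apart merge under a short analysis window and separate under a long one. A small sketch; the tone frequencies and window lengths are arbitrary choices:

```python
import numpy as np

sr = 8000
t = np.arange(sr) / sr
# two tones only 40 Hz apart
x = np.sin(2 * np.pi * 1000 * t) + np.sin(2 * np.pi * 1040 * t)

def valley_depth(win_len):
    """Spectral magnitude midway between the tones, relative to the peak:
    near 0 when the window resolves them (sr/win_len well below 40 Hz),
    large when they merge into a single lobe."""
    frame = x[:win_len] * np.hanning(win_len)
    mag = np.abs(np.fft.rfft(frame))
    mid = int(round(1020 * win_len / sr))   # bin nearest the 1020 Hz midpoint
    return mag[mid] / mag.max()
```

Vocoders inherit this tradeoff: any fixed analysis window discards some spectro-temporal detail, and that loss is one source of synthesis artifacts.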

SLIDE 32

32

Hideki Kawahara, STRAIGHT, exploitation of the other aspect of VOCODER: Perceptually isomorphic decomposition of speech sounds, Acoust. Sci. & Tech. 27, 6 (2006)

Magnitude: Smoothing in Vocoder

SLIDE 33

33

Keiichi Tokuda, Yoshihiko Nankaku, Tomoki Toda, Heiga Zen, Junichi Yamagishi, and Keiichiro Oura “Speech Synthesis Based on Hidden Markov Models” Proceedings of The IEEE, 2013

Magnitude: Smoothing in Synthesized/Converted Speech

  • A. Kain and M. W. Macon, “Spectral voice conversion for text-to-speech synthesis,” in ICASSP 1998.

SLIDE 34

34

Tian Xiaohai

(Log-magnitude spectrograms: natural speech, copy-synthesis speech, and their absolute difference.)

  • X. Tian, Z. Wu, X. Xiao, E. S. Chng, H. Li, "Spoofing detection from a feature representation perspective", ICASSP 2016

Magnitude: Log Magnitude Spectrum

SLIDE 35

35

Magnitude: Pitch patterns in HMM-based synthesized speech

  • Akio Ogihara, Hitoshi Unno, and Akira Shiozaki, "Discrimination Method of Synthetic Speech Using Pitch Frequency against Synthetic Speech Falsification," IEICE Trans. Fundamentals, vol. E88-A, no. 1, January 2005
  • P. L. De Leon, B. Stewart, and J. Yamagishi, “Synthetic speech discrimination using pitch pattern statistics derived from image analysis,” in Proc. Interspeech, 2012.
  • R. D. McClanahan, B. Stewart, and P. L. De Leon, “Performance of i-vector speaker verification and the detection of synthetic speech,” in Proc. IEEE Int. Conf. on Acoustics, Speech, and Signal Processing (ICASSP), 2014.

SLIDE 36
  • Inter-Frame Difference of Log-Likelihood (IFDLL)

ΔLL(t) = | log p(o_t | λ) − log p(o_(t−1) | λ) |

  • Δ-Cepstrum and Δ2-Cepstrum

Magnitude: Difference of Log-Likelihood

  • Takayuki Satoh, Takashi Masuko, Takao Kobayashi, Keiichi Tokuda, “A Robust Speaker Verification System against Imposture Using an HMM-based Speech Synthesis System”, EUROSPEECH 2001
  • F. Soong and A. Rosenberg, “On the use of instantaneous and transitional spectral information in speaker recognition,” IEEE Transactions on Acoustics, Speech and Signal Processing, vol. 36, no. 6, pp. 871–879, Jun 1988
  • Md. Sahidullah, Tomi Kinnunen, Cemal Hanilçi, "A comparison of features for synthetic speech detection", Interspeech 2015, pp. 2087–2091.

36

SLIDE 37

37

  • Zhizheng Wu, Xiong Xiao, Eng Siong Chng, Haizhou Li, "Synthetic speech detection using temporal modulation feature", ICASSP 2013.
  • S. Ganapathy, S. H. Mallidi, and H. Hermansky, “Robust feature extraction using modulation filtering of autoregressive models,” IEEE/ACM T-ASLP, vol. 22, no. 8, pp. 1285–1295, Aug. 2014.

Magnitude: Temporal Modulation Feature
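The temporal modulation idea: take a spectrogram, then analyze each frequency band's trajectory over time with a second FFT; the smoothing in synthetic speech suppresses fast modulations. This is an illustrative sketch only, not the exact recipe of the ICASSP 2013 paper; the frame sizes and 64-frame modulation window are invented:

```python
import numpy as np

def modulation_spectrum(x, frame=256, hop=128, n_frames=64):
    """Magnitude spectrogram -> log compression -> per-band mean removal
    -> FFT along the *time* axis of each frequency band."""
    frames = np.array([x[i:i + frame] * np.hanning(frame)
                       for i in range(0, len(x) - frame, hop)])
    spec = np.log(np.abs(np.fft.rfft(frames, axis=1)) + 1e-9)  # (T, F)
    spec -= spec.mean(axis=0)                  # remove per-band DC
    # modulation spectrum of the first n_frames trajectory samples
    return np.abs(np.fft.rfft(spec[:n_frames], axis=0))

mod = modulation_spectrum(np.sin(2 * np.pi * 440 * np.arange(16000) / 8000))
```

Each column of `mod` describes how one frequency band fluctuates over time, which is exactly the information a frame-by-frame magnitude feature throws away.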

SLIDE 38
  • Wrapping
  • Discontinuity
  • Distortion

38

Phase

Oppenheim, Schafer & Buck, Discrete time digital signal processing, 2nd Edition, Prentice Hall

SLIDE 39

Instantaneous Frequency

  • L. D. Alsteris and K. K. Paliwal, “Short-time phase spectrum in speech processing: A review and some experimental results,” Digital Signal Processing, 2007.

For a signal x(t) = a(t) e^(jφ(t)), the instantaneous frequency is the time-derivative of the phase:

f(t) = (1 / 2π) · dφ(t)/dt

39
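This definition can be checked numerically: for a linear chirp, the phase difference of consecutive analytic-signal samples recovers f(t). A minimal sketch with arbitrary chirp parameters:

```python
import numpy as np

sr = 8000
t = np.arange(sr) / sr
f0, rate = 500.0, 1000.0                      # linear chirp: f(t) = f0 + rate * t
phi = 2 * np.pi * (f0 * t + 0.5 * rate * t ** 2)
z = np.exp(1j * phi)                          # analytic signal a(t) e^{j phi(t)}, a(t)=1

# f(t) = (1/2pi) dphi/dt, estimated from consecutive phase differences;
# np.angle of the conjugate product handles phase wrapping automatically
inst_f = np.angle(z[1:] * np.conj(z[:-1])) * sr / (2 * np.pi)
```

The estimate tracks the chirp from about 500 Hz at the start to about 1500 Hz at the end.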

SLIDE 40

Group Delay Function: the negative frequency-derivative of the phase, τ(ω) = −dφ(ω)/dω

  • B. Yegnanarayana and H. A. Murthy, “Significance of group delay functions in spectrum estimation,” IEEE Transactions on Signal Processing, 1992
  • Leigh D. Alsteris and Kuldip K. Paliwal, “Evaluation of the modified group delay feature for isolated word recognition”, Int. Symposium on Signal Processing and Its Applications, 2005

40
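Group delay is usually computed without explicit phase unwrapping, via the textbook identity τ(ω) = (X_R Y_R + X_I Y_I) / |X|², where Y is the transform of n·x[n]. A small sketch:

```python
import numpy as np

def group_delay(x, n_fft=512):
    """tau(w) = -d(phase)/dw computed as Re(X conj(Y)) / |X|^2,
    with Y the FFT of n*x[n]; no phase unwrapping needed."""
    X = np.fft.rfft(x, n_fft)
    Y = np.fft.rfft(np.arange(len(x)) * x, n_fft)
    return (X.real * Y.real + X.imag * Y.imag) / (np.abs(X) ** 2 + 1e-12)

# sanity check: a pure delay of k samples has group delay k at every frequency
k = 5
x = np.zeros(64)
x[k] = 1.0
tau = group_delay(x)
```

The |X|² denominator explodes near spectral zeros; the modified group delay feature tames it, which is one reason MGDCC-style features work well in the Slide 18 comparison.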

SLIDE 41

High-dimensional features: phase & magnitude

41

Xiong Xiao, Xiaohai Tian, Steven Du, Haihua Xu, Eng Siong Chng, Haizhou Li, "Spoofing Speech Detection Using High Dimensional Magnitude and Phase Features: the NTU Approach for ASVspoof 2015 Challenge", Interspeech 2015

ASVspoof 2015: System D (NTU)

SLIDE 42
  • Spoofing Attacks
  • Voice Conversion
  • Artifacts
  • ASVspoof 2015

ASVspoof 2015: Speaker verification spoofing and countermeasures challenge

Organisers:
  • Zhizheng Wu, University of Edinburgh, UK
  • Tomi Kinnunen, University of Eastern Finland, Finland
  • Nicholas Evans, EURECOM, France
  • Junichi Yamagishi, University of Edinburgh, UK

42

SLIDE 43

# utterances | Training            | Development         | Evaluation          | Algorithm           | Vocoder
             | (10 male/15 female) | (15 male/20 female) | (20 male/26 female) |                     |
Genuine      | 3750                | 3497                | 9404                | None                | None
S1           | 2525                | 9975                | 18400               | VC: Frame-selection | STRAIGHT
S2           | 2525                | 9975                | 18400               | VC: Slope-shifting  | STRAIGHT
S3           | 2525                | 9975                | 18400               | SS: HMM             | STRAIGHT
S4           | 2525                | 9975                | 18400               | SS: HMM             | STRAIGHT
S5           | 2525                | 9975                | 18400               | VC: GMM             | MLSA
S6           | –                   | –                   | 18400               | VC: GMM             | STRAIGHT
S7           | –                   | –                   | 18400               | VC: GMM             | STRAIGHT
S8           | –                   | –                   | 18400               | VC: Tensor          | STRAIGHT
S9           | –                   | –                   | 18400               | VC: KPLS            | STRAIGHT
S10          | –                   | –                   | 18400               | SS: Unit-selection  | None

Voice Conversion Algorithms

43

SLIDE 44

44

16 Primary Submission Results (EER)

Zhizheng Wu, Tomi Kinnunen, Nicholas Evans and Junichi Yamagishi, "ASVspoof 2015: the first automatic speaker verification spoofing and countermeasures challenge", IEEE Signal Processing Society Speech and Language Technical Committee Newsletter (SLTC Newsletter), 20 November 2015

EERs on the unknown attacks were about four times higher than on the known attacks

SLIDE 45

Team | Average (all) | Average (without S10) | S10
A    | 1.211         | 0.402                 | 8.490
B    | 1.965         | 0.008                 | 19.571
C    | 2.528         | 0.076                 | 24.601
D    | 2.617         | 0.003                 | 26.142
E    | 2.694         | 0.060                 | 26.393
F    | 3.218         | 0.400                 | 28.581
G    | 3.326         | 0.360                 | 30.021
H    | 3.726         | 0.021                 | 37.068
I    | 3.898         | 0.703                 | 32.651
J    | 4.097         | 0.029                 | 40.708
K    | 4.547         | 0.203                 | 43.638
L    | 6.719         | 3.478                 | 35.890
M    | 14.391        | 12.482                | 31.574
N    | 14.568        | 11.299                | 43.991
O    | 18.826        | 16.304                | 41.519
P    | 21.518        | 18.786                | 46.102

45

Look at S10
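The EERs in these tables come from sweeping a decision threshold until the false-acceptance rate on spoofed trials equals the miss rate on genuine trials. A minimal computation on made-up score lists:

```python
import numpy as np

def eer(genuine, spoof):
    """Equal error rate (%): find the threshold where the false-acceptance
    rate on spoofed trials meets the miss rate on genuine trials."""
    thresholds = np.sort(np.concatenate([genuine, spoof]))
    far = np.array([(spoof >= th).mean() for th in thresholds])    # spoofed accepted
    frr = np.array([(genuine < th).mean() for th in thresholds])   # genuine rejected
    i = np.argmin(np.abs(far - frr))
    return 100.0 * (far[i] + frr[i]) / 2
```

With well-separated score distributions the EER approaches 0%; the large S10 column above shows how badly the distributions overlap for the unseen unit-selection attack.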

SLIDE 46

S1–S5 (appear in train/dev/eval sets):
  • S1: VC - frame selection
  • S2: VC - slope shifting
  • S3: TTS - HTS with 20 adaptation sentences
  • S4: TTS - HTS with 40 adaptation sentences
  • S5: VC - Festvox (http://festvox.org/)

S6–S10 (appear only in the eval set):
  • S6: VC - ML-GMM with GV enhancement
  • S7: VC - similar to S6 but using LSP features
  • S8: VC - tensor (eigenvoice)-based approach
  • S9: VC - nonlinear regression (KPLS)
  • S10: TTS - MARY TTS unit selection (http://mary.dfki.de/)

Listen to S10

46

SLIDE 47

47

  • X. Tian, Z. Wu, X. Xiao, E. S. Chng, H. Li, "Spoofing detection from a feature representation perspective", ICASSP 2016

Four Features, One-Hidden-Layer Neural Network

SLIDE 48

LMS, Temporal CNN over 100 frames

(Diagram: score distributions for natural speech and S1–S10 from a log-magnitude-spectrum (LMS) system trained on S1–S5; the network outputs spoofed: 0, natural: 1, and the unseen S10 scores fall close to the natural cluster.)

SLIDE 49

49

Fourier Transform vs Auditory Transform

  • Q. Li and Y. Huang, “An auditory-based feature extraction algorithm for robust speaker identification under mismatched conditions”, IEEE Transactions on Audio, Speech and Language Processing, vol. 19, no. 6, pp. 1791–1801, 2011
  • Massimiliano Todisco, Héctor Delgado and Nicholas Evans, “A New Feature for Automatic Speaker Verification Anti-Spoofing: Constant Q Cepstral Coefficients”, Odyssey 2016

SLIDE 50
  • In STFT, the time and frequency resolutions are constant.
  • CQT employs a variable time/frequency resolution:
    – greater time resolution for higher frequencies
    – greater frequency resolution for lower frequencies

A comparison of the time-frequency resolution of the STFT (top) and CQT (bottom).*

  • Massimiliano Todisco, Héctor Delgado and Nicholas Evans, “A New Feature for Automatic Speaker Verification Anti-Spoofing: Constant Q Cepstral Coefficients”, Odyssey 2016
  • E. Ahissar, S. Nagarajan, M. Ahissar, A. Protopapas, H. Mahncke, and M. M. Merzenich, “Speech comprehension is correlated with temporal response patterns recorded from auditory cortex,” Proc. Natl. Acad. Sci., vol. 98, no. 23, pp. 13367–13372, 2001

Constant Q Cepstral Coefficient

Centre frequencies are geometrically spaced, f_k = f_1 · 2^((k−1)/B), where B is the number of bins per octave, and the quality factor Q = f_k / δf_k is constant across all bins.

*Courtesy of Nick Evans

SLIDE 51

Constant Q Cepstral Coefficients (CQCC)

Block diagram of CQCC feature extraction:

speech signal → Constant-Q Transform → Power spectrum → LOG → Uniform resampling → DCT → CQCC

Front-end: CQCC-A (19+0th second derivative coefficients). Back-end: two GMMs (512 components, EM training), one for human speech and one for spoofed speech. Comparison of results (EER [%]) on the ASVspoof 2015 database. A Matlab implementation of CQCC extraction can be downloaded from http://audio.eurecom.fr/content/software

*Courtesy of Nick Evans
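The block diagram can be sketched end to end: a naive constant-Q transform (one geometrically spaced complex kernel per bin), log power, uniform resampling of the geometric frequency axis, then a DCT. Every parameter below is illustrative; the Matlab implementation from EURECOM is the authoritative version:

```python
import numpy as np

def cqcc_sketch(x, sr, fmin=20.0, bins_per_octave=12, n_cep=13):
    """Toy CQCC pipeline on one analysis window: constant-Q power
    spectrum -> log -> uniform resampling -> DCT. Illustration only."""
    Q = 1.0 / (2 ** (1.0 / bins_per_octave) - 1)          # constant Q factor
    n_bins = int(bins_per_octave * np.log2((sr / 2) / fmin))
    freqs = fmin * 2 ** (np.arange(n_bins) / bins_per_octave)
    power = np.empty(n_bins)
    for k, fk in enumerate(freqs):
        n_k = int(np.ceil(Q * sr / fk))                   # window shrinks as f grows
        kern = np.hanning(n_k) * np.exp(-2j * np.pi * fk * np.arange(n_k) / sr)
        power[k] = np.abs(np.dot(x[:n_k], kern) / n_k) ** 2
    logp = np.log(power + 1e-12)
    # uniform resampling: geometric frequency bins -> linear grid before the DCT
    lin = np.interp(np.linspace(freqs[0], freqs[-1], n_bins), freqs, logp)
    # DCT-II computed directly (small n_bins, fine for a sketch)
    n = np.arange(n_bins)
    return np.array([np.sum(lin * np.cos(np.pi * q * (2 * n + 1) / (2 * n_bins)))
                     for q in range(n_cep)])
```

The per-bin kernel length `Q * sr / f_k` is what gives the CQT its auditory-like resolution: long windows (fine frequency) at low frequencies, short windows (fine time) at high frequencies.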

SLIDE 52
  • Most systems assume natural speech inputs
  • More robust = more vulnerable
  • Better speech perceptual quality ≠ fewer artifacts*
  • Machines (frame-by-frame) and humans (spectro-temporal) listen in different ways**
  • Features are more important than classifiers

Spoofing: Challenges and Opportunities

52

* K. K. Paliwal, et al., “Comparative Evaluation of Speech Enhancement Methods for Robust Automatic Speech Recognition,” Int. Conf. Signal Processing and Communication Systems (ICSPCS), Gold Coast, Australia, Dec. 2010

** Duc Hoang Ha Nguyen, Xiong Xiao, Eng Siong Chng, Haizhou Li, “Feature Adaptation Using Linear Spectro-Temporal Transform for Robust Speech Recognition”, IEEE/ACM Trans. Audio, Speech & Language Processing 24(6): 1006-1019 (2016)

SLIDE 53

Thank You!

53