Integrated Presentation Attack Detection and Automatic Speaker - PDF document

Integrated Presentation Attack Detection and Automatic Speaker Verification: Common Features and Gaussian Back-end Fusion

Massimiliano Todisco 1, Héctor Delgado 1, Kong Aik Lee 2, Md Sahidullah 3, Nicholas Evans 1, Tomi Kinnunen 4 and Junichi Yamagishi 5,6

1 Department of Digital Security, EURECOM, France
2 Data Science Research Laboratories, NEC Corporation, Japan
3 MULTISPEECH, Inria, France
4 School of Computing, University of Eastern Finland, Finland
5 Digital Content and Media Sciences Research Division, National Institute of Informatics, Japan
6 Centre of Speech Technology Research, University of Edinburgh, U.K.

{todisco,delgado,evans}@eurecom.fr, k-lee@ax.jp.nec.com, md.sahidullah@inria.fr, tkinnu@cs.uef.fi, jyamagis@nii.ac.jp

Abstract

The vulnerability of automatic speaker verification (ASV) systems to spoofing is widely acknowledged. Recent years have seen an intensification in research efforts to develop spoofing countermeasures, also known as presentation attack detection (PAD) systems. Much of this work has involved the exploration of features that discriminate reliably between bona fide and spoofed speech. While there are grounds to use different front-ends for ASV and PAD systems (they are different tasks), the use of a single front-end has obvious benefits, not least convenience and computational efficiency, especially when ASV and PAD are combined. This paper investigates the performance of a variety of different features used previously for both ASV and PAD and assesses their performance when combined for both tasks. The paper also presents a Gaussian back-end fusion approach to system combination. In contrast to cascaded architectures, it relies upon the modelling of the two-dimensional score distribution stemming from the combination of ASV and PAD in parallel. This approach to combination is shown to generalise particularly well across independent ASVspoof 2017 v2.0 development and evaluation datasets.

Index Terms: automatic speaker verification, spoofing, countermeasures, presentation attack detection

1. Introduction

Presentation attack detection (PAD) systems capable of detecting and deflecting so-called spoofing attacks, or presentation attacks (PAs) in ISO/IEC 30107 nomenclature (https://www.iso.org/standard/67381.html), leveled at automatic speaker verification (ASV) systems have been under development for a number of years. While ASV systems aim to verify the identity claimed by a speaker, PAD systems aim to verify the authenticity of the speech signal itself, namely whether it is bona fide speech or whether, instead, it is artificially created or somehow manipulated, i.e. spoofed.

Early PAD systems used features similar to those used for ASV. Since the two tasks are distinctly different, however, most efforts to develop effective PAD systems have focused on the design of new features tailored to discriminate between bona fide and spoofed speech. While the use of features designed specifically for PAD has been shown to give better performance than systems that use features designed for ASV, the use of different front-ends augments computational complexity. It can hence be convenient to use a single front-end. The use of such a single front-end avoids redundant processing and can also simplify the combination of ASV and PAD decisions. The search for features which perform well for a combined ASV and PAD task is the subject of this paper.

A second contribution relates to the manner in which ASV and PAD system scores can be combined. It extends previous work [1] which proposed cascade and parallel approaches to system combination and is similar in nature to the combination architecture reported in [2]. New to this paper is a two-dimensional score modelling technique which avoids the joint optimisation of separate ASV and PAD decision thresholds. The explicit modelling of target and impostor trial scores, encompassing genuine, bona fide trials in addition to both zero-effort and spoofed impostor trials, provides for greater flexibility in decision boundaries and hence more reliable decisions. The merits of these two contributions are assessed through experiments with the ASVspoof 2017 database of bona fide and spoofed speech signals and protocols for the assessment of combined ASV and PAD systems.

The remainder of the paper is organised as follows. Section 2 describes the different front-ends used in this work. The approach to system combination is presented in Section 3. Experiments are reported in Section 4 whereas results are reported in Section 5. Conclusions are presented in Section 6.

2. Front-end processing

This paper aims to determine a common front-end for both ASV and PAD tasks. While ASV calls for features that capture speaker-discriminant information, PAD systems rely on features that capture the tell-tale signs of spoofing. The study includes four different front-ends, each of which is described here.

Mel-frequency cepstral coefficients (MFCCs): MFCCs are used widely in speech and speaker recognition and have been explored extensively as features for spoofing detection [3]. MFCCs are usually derived from short-time Fourier transform (STFT) decompositions, the application of a perceptually motivated Mel-frequency scaled filterbank [4] and standard cepstral analysis.
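
The three MFCC stages named above (STFT, Mel-scaled filterbank, cepstral analysis) can be illustrated with a minimal NumPy sketch. It is not the configuration used in the paper: the function names, the frame length, hop size, FFT size, number of filters and number of coefficients are all assumptions chosen for illustration, as are the Hamming window and the common 2595 log10(1 + f/700) approximation of the Mel scale.

```python
import numpy as np

def hz_to_mel(f):
    # One common Mel-scale approximation (an assumption, not taken from the paper).
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters whose edges are equally spaced on the Mel scale."""
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0),
                             n_filters + 2)
    hz_points = mel_to_hz(mel_points)
    freqs = np.linspace(0.0, sample_rate / 2.0, n_fft // 2 + 1)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(n_filters):
        left, centre, right = hz_points[i], hz_points[i + 1], hz_points[i + 2]
        rising = (freqs - left) / (centre - left)
        falling = (right - freqs) / (right - centre)
        fbank[i] = np.maximum(0.0, np.minimum(rising, falling))
    return fbank

def mfcc(signal, sample_rate=16000, frame_len=400, hop=160,
         n_fft=512, n_filters=26, n_ceps=13):
    """Return an (n_frames, n_ceps) array of MFCCs (illustrative parameters)."""
    # 1) STFT: slice into overlapping frames, window, take the power spectrum.
    n_frames = 1 + (len(signal) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hamming(frame_len)
    power = np.abs(np.fft.rfft(frames, n=n_fft, axis=1)) ** 2
    # 2) Mel-scaled filterbank energies, then log compression.
    mel_energies = power @ mel_filterbank(n_filters, n_fft, sample_rate).T
    log_mel = np.log(mel_energies + 1e-10)
    # 3) Cepstral analysis: unnormalised DCT-II over the filterbank axis.
    n = np.arange(n_filters)
    k = np.arange(n_ceps)
    basis = np.cos(np.pi * k[:, None] * (2 * n[None, :] + 1) / (2 * n_filters))
    return log_mel @ basis.T
```

Calling `mfcc(x)` on a one-dimensional array `x` of at least `frame_len` samples yields one row of coefficients per frame. Practical front-ends typically add steps such as pre-emphasis, delta coefficients and normalisation, which this sketch omits.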

