Odyssey 2016: The Speaker and Language Recognition Workshop, June 21-24, 2016, Bilbao, Spain. A new feature for automatic speaker verification anti-spoofing: constant Q cepstral coefficients. Massimiliano Todisco, Héctor Delgado and Nicholas Evans

SLIDE 1

A new feature for automatic speaker verification anti-spoofing: constant Q cepstral coefficients

Odyssey 2016

The Speaker and Language Recognition Workshop

June 21-24, 2016, Bilbao, Spain

The OCTAVE project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 647850.

Massimiliano Todisco, Héctor Delgado and Nicholas Evans
Department of Digital Security, EURECOM, Sophia Antipolis, France

SLIDE 2

Introduction

Spoofing is a manipulation of a biometric system by a fraudulent user
Automatic speaker verification is vulnerable to spoofing
The spoofing algorithm cannot be known in advance
Need for generalised countermeasures
A new feature based on the constant Q transform is proposed

SLIDE 3

From Fourier to constant Q

The Fourier transform may lack frequency resolution
In the STFT, the time and frequency resolutions are constant
The constant Q transform (CQT) is an alternative which reflects human perception more closely
The CQT employs a variable time/frequency resolution:
§ greater time resolution for higher frequencies
§ greater frequency resolution for lower frequencies
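The variable resolution can be illustrated with a minimal sketch of how CQT bins are usually laid out. It assumes the standard textbook formulation (geometrically spaced centre frequencies, one shared Q factor, window length inversely proportional to frequency). The function name `cqt_bins` and all parameter values are hypothetical and are not taken from the slides.

```python
import numpy as np

def cqt_bins(f_min, n_bins, bins_per_octave, fs):
    """Centre frequencies, Q factor and window lengths for a CQT (illustrative)."""
    k = np.arange(n_bins)
    # Geometric spacing: every octave holds the same number of bins.
    freqs = f_min * 2.0 ** (k / bins_per_octave)
    # Q = centre frequency / bandwidth, identical for every bin.
    q = 1.0 / (2.0 ** (1.0 / bins_per_octave) - 1.0)
    # Window length in samples: long at low frequencies, short at high ones.
    win_len = np.ceil(q * fs / freqs).astype(int)
    return freqs, q, win_len

# Assumed example values: 8 octaves above 20 Hz, 12 bins per octave, 16 kHz audio.
freqs, q, win_len = cqt_bins(f_min=20.0, n_bins=96, bins_per_octave=12, fs=16000)
print(win_len[0], win_len[-1])  # longest and shortest analysis windows
```

Long windows at low frequencies give fine frequency resolution, while short windows at high frequencies give fine time resolution, which is the trade-off listed above.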

SLIDE 4

Constant Q Cepstral Coefficients (CQCC)

Block diagram of CQCC feature extraction

Combining CQT (in place of STFT) with traditional cepstral analysis
Issue: discrete cosine transform (DCT) cannot be directly applied

§ CQT and DCT have different scales (geometric vs linear)
§ Geometric DCT bases are no longer orthogonal

Solution: uniformly resample the non-uniform frequency scale of the CQT to a linear frequency scale

speech signal → Constant-Q Transform → Power spectrum → LOG → Uniform resampling → DCT → CQCC
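The last two stages of the pipeline (uniform resampling followed by the DCT) can be sketched for a single frame as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: plain linear interpolation (`np.interp`) stands in for the resampling method, which the slide does not specify, and the function name `cqcc_from_log_power` and its parameters are hypothetical.

```python
import numpy as np

def cqcc_from_log_power(log_power, freqs, n_linear, n_coeffs):
    """Cepstral coefficients from one frame of a log power CQT spectrum.

    log_power : log power values at the geometrically spaced `freqs`
    n_linear  : number of points on the uniform (linear) frequency grid
    n_coeffs  : number of cepstral coefficients to keep (including the 0th)
    """
    # Uniform resampling: move from the geometric grid to a linear grid.
    # Assumption: linear interpolation as a stand-in for the real resampler.
    lin_freqs = np.linspace(freqs[0], freqs[-1], n_linear)
    lin_log_power = np.interp(lin_freqs, freqs, log_power)
    # Unnormalised DCT-II, valid now that the samples are uniformly spaced.
    l = np.arange(n_linear)
    p = np.arange(n_coeffs)[:, None]
    basis = np.cos(np.pi * p * (l + 0.5) / n_linear)
    return basis @ lin_log_power

# Assumed example: 96 geometric bins starting at 20 Hz, random log spectrum.
freqs = 20.0 * 2.0 ** (np.arange(96) / 12)
frame = np.random.default_rng(0).normal(size=96)
coeffs = cqcc_from_log_power(frame, freqs, n_linear=128, n_coeffs=20)
```

Applying the cosine basis directly to the geometric grid would break the orthogonality that the DCT relies on, which is why the resampling step comes first.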

SLIDE 5

Experiments & Results on ASVspoof 2015 Database

Comparison of results (EER [%]) on the ASVspoof 2015 database

Front-end: CQCC-A (second-derivative coefficients, 19 + 0th)
Back-end: 2 GMMs (512 components, EM training), one for human speech and one for spoofed speech
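A two-model back-end of this kind is commonly scored with a log-likelihood ratio between the human and spoofed models. The sketch below assumes that scoring rule and diagonal covariances, neither of which is stated on the slide. The models are passed in as already-trained parameters, and the names `gmm_avg_loglik` and `llr_score` are hypothetical.

```python
import numpy as np

def gmm_avg_loglik(X, weights, means, variances):
    """Average per-frame log-likelihood of frames X (T x D) under a diagonal GMM.

    weights : (M,) mixture weights, means : (M, D), variances : (M, D)
    """
    X = np.asarray(X, dtype=float)
    diff = X[:, None, :] - means[None, :, :]                      # T x M x D
    log_comp = -0.5 * (np.sum(diff ** 2 / variances, axis=2)
                       + np.sum(np.log(2.0 * np.pi * variances), axis=1))
    log_joint = log_comp + np.log(weights)                        # T x M
    # Log-sum-exp over mixture components for numerical stability.
    m = log_joint.max(axis=1, keepdims=True)
    log_px = m[:, 0] + np.log(np.exp(log_joint - m).sum(axis=1))
    return log_px.mean()

def llr_score(X, human_gmm, spoof_gmm):
    """Positive score: frames fit the human model better than the spoof model."""
    return gmm_avg_loglik(X, *human_gmm) - gmm_avg_loglik(X, *spoof_gmm)

# Toy one-component, one-dimensional models (assumed values, for illustration).
human = (np.array([1.0]), np.array([[0.0]]), np.array([[1.0]]))
spoof = (np.array([1.0]), np.array([[5.0]]), np.array([[1.0]]))
score = llr_score(np.array([[0.1], [-0.2]]), human, spoof)
```

In the setup described above, each model would have 512 components trained with EM on CQCC features. The toy models here only show the scoring arithmetic.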

Known attacks: all the systems deliver excellent error rates
Unknown attacks: CQCC features give best performance

§ Attack S10 (unit selection synthesis): EER = 1.065% → 87% relative improvement

Best spoofing detection performance (72% relative improvement) reported to date
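For readers unfamiliar with the two metrics quoted above, the following sketch shows one simple way to approximate an equal error rate from detection scores and to compute a relative improvement between two EERs. It assumes higher scores mean "more likely genuine" and uses a coarse threshold sweep rather than any official evaluation tool. Both function names are hypothetical, and the example scores are made up.

```python
import numpy as np

def equal_error_rate(genuine_scores, spoof_scores):
    """Approximate EER in percent via a sweep over all observed scores."""
    genuine = np.asarray(genuine_scores, dtype=float)
    spoof = np.asarray(spoof_scores, dtype=float)
    thresholds = np.sort(np.concatenate([genuine, spoof]))
    # False rejection rate: genuine trials scored below the threshold.
    frr = np.array([(genuine < t).mean() for t in thresholds])
    # False acceptance rate: spoofed trials scored at or above the threshold.
    far = np.array([(spoof >= t).mean() for t in thresholds])
    # EER is where the two error rates are (closest to) equal.
    i = np.argmin(np.abs(far - frr))
    return 100.0 * (far[i] + frr[i]) / 2.0

def relative_improvement(eer_reference, eer_new):
    """Relative EER reduction in percent with respect to a reference system."""
    return 100.0 * (eer_reference - eer_new) / eer_reference

# Made-up scores, only to show the call pattern.
eer = equal_error_rate([2.1, 3.0, 1.7, 0.4], [-1.0, 0.5, -0.3, -2.2])
```

A relative improvement of 87% therefore means the new EER is 13% of the reference EER for the same attack.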

SLIDE 6

Poster place
