Speaker Verification using i-Vectors (CAST-Förderpreis)



slide-1
SLIDE 1

Speaker Verification using i-Vectors

CAST-Förderpreis IT-Sicherheit 2014

Andreas Nautsch

Hochschule Darmstadt, atip GmbH, CASED, da/sec Security Research Group

Darmstadt, 20.11.2014


Andreas Nautsch Speaker Verification using i-Vectors / Darmstadt, 20.11.2014 1/22

slide-2
SLIDE 2

Outline

◮ Motivation with research questions
◮ Speaker verification and i-vectors
◮ Research and development
◮ Conclusion and future perspectives

slide-3
SLIDE 3

Motivation Biometric IT-security & forensic applications

◮ Authentication and recognition by voice
◮ Advantages over knowledge-/token-based approaches:

  ◮ Cannot be forgotten
  ◮ Cannot be incorporated

◮ Application fields and scenarios, e.g.:

  ◮ Mobile device authentication: random PINs, short duration
  ◮ Call-center user validation: free speech, variant duration
  ◮ Suspect tracking: various contents & signal qualities

[Figure: verification pipeline: voice reference and voice probe each pass feature extraction, the comparison yields a score, the decision is accept or reject]

slide-4
SLIDE 4

Motivation Assessment of Speaker Recognition

◮ Known technology: modeling acoustic features

  ✦ Very accurate due to detailed modeling
  ✦ Fast processing on short duration scenarios
  ✪ High computational effort on text-independent scenarios

◮ State-of-the-Art: identity vector (i-vector) features, 2011

  ✦ Fully text- & language-independent
  ✦ Fast computation & scoring independent of duration
  ✪ Unknown behavior in commercial voice biometric scenarios

slide-5
SLIDE 5

Motivation Voice biometrics: speech duration & sample completeness

◮ Sound unit (phoneme) distribution by duration

From [T. Hasan et al., 2013]

◮ Text-independent case: content varies from sample to sample

  ◮ Long duration ⇒ stable distribution
  ◮ Short duration ⇒ insufficient data

slide-6
SLIDE 6

Motivation Baseline speaker recognition approach

◮ Hidden-Markov-Models (HMMs)

  ◮ State-based model
  ◮ States can represent articulation phases, phonemes, ...

[Figure: two-state HMM with states 1 and 2 and transition probabilities p00, p10, p11, p12, p22]

◮ Detailed, but extensive computation of optimal path

[Figure: state sequences over states 1 and 2 for the speaker model and the impostor model]
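The cost of the optimal-path search can be illustrated with a minimal Viterbi sketch in Python with NumPy. This is not the HMM system evaluated in the talk: the function name `viterbi`, the two-state toy model and all probabilities are assumptions made up for the example.

```python
import numpy as np

def viterbi(log_init, log_trans, log_emit):
    """Return the most likely state path and its log-probability.

    log_init:  (S,)   log initial state probabilities
    log_trans: (S, S) log transition probabilities, row = from, column = to
    log_emit:  (T, S) log emission likelihood of each frame under each state
    """
    n_frames, n_states = log_emit.shape
    delta = np.empty((n_frames, n_states))
    backptr = np.zeros((n_frames, n_states), dtype=int)
    delta[0] = log_init + log_emit[0]
    for t in range(1, n_frames):
        # cand[i, j]: best score ending in state i at t-1, then moving to j
        cand = delta[t - 1][:, None] + log_trans
        backptr[t] = np.argmax(cand, axis=0)
        delta[t] = cand[backptr[t], np.arange(n_states)] + log_emit[t]
    # Trace the best path backwards from the best final state
    path = np.empty(n_frames, dtype=int)
    path[-1] = int(np.argmax(delta[-1]))
    for t in range(n_frames - 1, 0, -1):
        path[t - 1] = backptr[t, path[t]]
    return path, float(delta[-1, path[-1]])

# Hypothetical two-state model and four frames of emission likelihoods
log_init = np.log(np.array([0.9, 0.1]))
log_trans = np.log(np.array([[0.7, 0.3],
                             [0.1, 0.9]]))
log_emit = np.log(np.array([[0.8, 0.2],
                            [0.6, 0.4],
                            [0.2, 0.8],
                            [0.1, 0.9]]))
path, log_prob = viterbi(log_init, log_trans, log_emit)
```

Each frame compares every pair of states, so the search costs on the order of T·S² operations for T frames and S states, and a verification needs it for both the speaker and the impostor model.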

slide-7
SLIDE 7

Motivation Research questions

  1. Can the i-vector approach be extended to short-duration scenarios with applicable performance?
  2. Do i-vector systems deliver new information to HMM systems?
  3. Can duration-dependent performance mismatches be compensated?

slide-8
SLIDE 8

Speaker Verification Speech processing

  1. Raw speech signal as air pressure changes
  2. Frequency analysis: spectral representation
  3. Short-time acoustic features, e.g. Mel-Frequency Cepstral Coefficients (MFCCs)

[Figure: three panels over time: air pressure, frequency, MFCCs]
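The three processing steps can be sketched in plain NumPy. This is a simplified illustration, not the feature extraction used in the talk: the 25 ms window, 10 ms step, 20 mel filters and 13 coefficients are common defaults chosen here as assumptions, and the function names are hypothetical.

```python
import numpy as np

def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters spaced evenly on the mel scale."""
    hz_to_mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    mel_to_hz = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    mel_points = np.linspace(0.0, hz_to_mel(sample_rate / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sample_rate).astype(int)
    bank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(n_filters):
        left, center, right = bins[i], bins[i + 1], bins[i + 2]
        for k in range(left, center):       # rising slope
            bank[i, k] = (k - left) / (center - left)
        for k in range(center, right):      # falling slope
            bank[i, k] = (right - k) / (right - center)
    return bank

def mfcc(signal, sample_rate, frame_len=0.025, frame_step=0.010,
         n_fft=512, n_filters=20, n_ceps=13):
    """Return an array of shape (n_frames, n_ceps)."""
    flen = int(round(frame_len * sample_rate))
    fstep = int(round(frame_step * sample_rate))
    n_frames = 1 + (len(signal) - flen) // fstep
    # Step 1: cut the raw signal into overlapping, windowed frames
    idx = np.arange(flen)[None, :] + fstep * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hamming(flen)
    # Step 2: spectral representation (power spectrum per frame)
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2 / n_fft
    # Step 3: log mel energies, decorrelated by a DCT-II
    bank = mel_filterbank(n_filters, n_fft, sample_rate)
    log_energy = np.log(power @ bank.T + 1e-10)
    n = np.arange(n_filters)
    dct = np.cos(np.pi * np.arange(n_ceps)[:, None] * (2 * n + 1) / (2.0 * n_filters))
    return log_energy @ dct.T

# One second of a synthetic 440 Hz tone at 8 kHz (stand-in for speech)
sample_rate = 8000
t = np.arange(sample_rate) / sample_rate
signal = np.sin(2 * np.pi * 440.0 * t)
features = mfcc(signal, sample_rate)
```

Production front-ends typically add pre-emphasis, liftering and delta features, which are left out here to keep the sketch short.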

slide-9
SLIDE 9

Speaker Verification Statistic model: i-vector extraction

  4. Gaussian modeling
  5. Speaker sub-spaces from the Universal Background Model (UBM)
  6. Identity vectors (i-vectors) as characteristic offset

Mapping by total variability matrix*

[Figure: two panels over the axes MFCC-1 and MFCC-2, showing Speaker A and Speaker B relative to the UBM]

* Estimated iteratively in order to optimize the model fit: UBM offset → i-vectors on development data
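The mapping can be sketched with the standard total variability model M = m + T w, where m is the UBM mean supervector, T the total variability matrix and w the i-vector. The Python sketch below computes the usual point estimate of w from Baum-Welch statistics; the function name, the array layout and the synthetic data are assumptions for illustration, and the training of T itself is not shown.

```python
import numpy as np

def extract_ivector(T, sigma, n_stats, f_stats):
    """Posterior mean of the latent factor w in M = m + T w.

    T:       (C*D, R) total variability matrix
    sigma:   (C*D,)   diagonal UBM covariances, stacked per component
    n_stats: (C,)     zeroth-order statistics (soft frame counts)
    f_stats: (C*D,)   first-order statistics, centred on the UBM means
    """
    n_comp = n_stats.shape[0]
    feat_dim = T.shape[0] // n_comp
    n_expanded = np.repeat(n_stats, feat_dim)   # one count per supervector row
    t_scaled = T / sigma[:, None]               # Sigma^-1 T
    # Posterior precision: I + T' Sigma^-1 N T
    precision = np.eye(T.shape[1]) + t_scaled.T @ (n_expanded[:, None] * T)
    # Posterior mean: precision^-1 T' Sigma^-1 F
    return np.linalg.solve(precision, t_scaled.T @ f_stats)

# Synthetic check: statistics generated from a known offset T @ w_true
rng = np.random.default_rng(0)
C, D, R = 4, 3, 2                               # hypothetical toy sizes
T = rng.standard_normal((C * D, R))
sigma = np.ones(C * D)
w_true = np.array([0.5, -1.0])
n_stats = np.full(C, 1e4)
f_stats = np.repeat(n_stats, D) * (T @ w_true)
w_hat = extract_ivector(T, sigma, n_stats, f_stats)
```

With many frames the estimate approaches the true offset; with few frames the identity term pulls it towards zero, which is one way to picture why short samples give less reliable i-vectors.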

slide-10
SLIDE 10

Speaker Verification How to imagine i-vectors?

Speaker separation

◮ Relevant parameters

  ◮ UBM size: detail of the acoustic space
  ◮ # iterations: adaptation depth of total variability
  ◮ # characteristic factors: i-vector dimension

slide-11
SLIDE 11

Research & Development Extending the baseline, exploring new technologies

◮ Research on new technologies

  ◮ Experimental evaluations on i-vectors
  ◮ Commercial and academic scenarios
  ◮ Participation in international research evaluation

◮ Developing more robust approaches

  ◮ Extending state-of-the-art i-vector score normalization
  ◮ Implementation of speaker verification framework in Matlab according to ISO/IEC IS 19795-1 (Information technology – Biometric performance testing and reporting – Part 1: Principles and framework) ⇒ reproducible research

slide-12
SLIDE 12

Research & Development Commercial scenario: experimental set-up

◮ Aiming at research questions

  1. i-vectors on short duration scenarios?
  2. New information to HMMs by i-vectors?

◮ Short but fixed-duration scenario

  ◮ In-house database: 3 – 5 German digits
  ◮ Text-independent: random sequences
  ◮ 362 / 56 / 300 subjects (development, calibration, evaluation)
  ◮ 30 – 34 reference / 2 probe samples

≈ 200,000 comparisons

slide-13
SLIDE 13

Research & Development Examining i-vector parameters

[Figure: four panels, one per UBM size (128, 256, 512, 1024), each plotting EER (in %) over the number of iterations (1, 2, 5, 10) and the number of factors (200, 300, 400, 600)]

Equal Error Rate (EER): % impostor match = % genuine non-match → lower error = better
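The EER definition can be made concrete with a small Python sketch. It assumes that higher scores mean a better match and that a comparison is accepted when its score reaches the threshold; the function names and the toy scores are made up for the example.

```python
import numpy as np

def error_rates(genuine, impostor, threshold):
    """FMR and FNMR when scores >= threshold are accepted."""
    fmr = float(np.mean(np.asarray(impostor) >= threshold))   # impostors accepted
    fnmr = float(np.mean(np.asarray(genuine) < threshold))    # genuines rejected
    return fmr, fnmr

def equal_error_rate(genuine, impostor):
    """Approximate EER: sweep all observed scores, take the closest FMR/FNMR pair."""
    thresholds = np.unique(np.concatenate([genuine, impostor]))
    rates = [error_rates(genuine, impostor, t) for t in thresholds]
    fmr, fnmr = min(rates, key=lambda r: abs(r[0] - r[1]))
    return (fmr + fnmr) / 2.0

# Hypothetical comparison scores
genuine = [0.9, 0.8, 0.7, 0.4]
impostor = [0.6, 0.3, 0.2, 0.1]
eer = equal_error_rate(genuine, impostor)
```

Sweeping the same thresholds and plotting FMR against FNMR gives the points of a Detection Error Tradeoff curve.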

slide-14
SLIDE 14

Research & Development Performance analysis

[Figure: the four EER panels for UBM sizes 128, 256, 512 and 1024 (EER in % over iterations and factors), next to a Detection Error Tradeoff diagram of False Match Rate (FMR in %) against False Non-Match Rate (FNMR in %) for i-vector-128, i-vector-256 and i-vector-512, with the 30-FNMs marker]

Equal Error Rate (EER): % impostor match = % genuine non-match → lower error = better

slide-15
SLIDE 15

Research & Development Information analysis: cross-entropy of system fusion

[Figure: three panels of normalized entropy over Bayesian thresholds η, for HMM, i-vector-256 and HMM+i-vector-256]

Optimizing min Entropy (η ≈ 4.6 = odds 1:100)
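The relation η ≈ 4.6 = odds 1:100 can be checked with a short Python calculation. It assumes that η is a threshold on log-likelihood-ratio scores with equal error costs, so that η is the negative log of the prior odds; the function name is hypothetical.

```python
import math

def bayes_threshold(target_odds):
    """Log-likelihood-ratio threshold for prior odds target:non-target.

    Assumes equal costs for false matches and false non-matches.
    """
    return -math.log(target_odds)

# Prior odds of 1:100 for a genuine comparison
eta = bayes_threshold(1 / 100)
```

Under these assumptions a score has to exceed ln(100) ≈ 4.6 before accepting becomes the better decision.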

slide-16
SLIDE 16

Research & Development Performance gains by fusion

[Figure: three panels of normalized entropy over Bayesian thresholds η, for HMM, i-vector-256 and HMM+i-vector-256]

Optimizing min Entropy (η ≈ 4.6 = odds 1:100)

[Figure: DET diagram of FMR (in %) against FNMR (in %) for HMM, i-vector-256 and HMM+i-vector-256, with the 30-FNMs marker]

slide-17
SLIDE 17

Research & Development Academic scenario: experimental set-up

◮ Aiming at research question

  • 3. Compensation of duration mismatches on i-vectors?

◮ Variable duration scenario

  ◮ Data of 2013 – 2014 NIST i-vector Machine Learning challenge
  ◮ Text-independent, multi-lingual scenario
  ◮ 4,781 / 1,306 subjects (development, evaluation)
  ◮ 5 reference samples / 9,634 probe i-vectors

≈ 12,000,000 comparisons

slide-18
SLIDE 18

Research & Development Analysis of a variant duration scenario

◮ 10 offline 5-fold cross-validations
◮ NIST baseline system

2048-component UBM, 60 MFCCs, 600-dim i-vectors

◮ Adaptive Symmetric (AS) score-normalization

$$S' = \frac{1}{2}\left(\frac{S - \mu_{\text{reference}}}{\sigma_{\text{reference}}} + \frac{S - \mu_{\text{probe}}}{\sigma_{\text{probe}}}\right)$$

with each mean and standard deviation taken from the top-100 scores of comparisons to the dev-set

[Figure: EER (in %) over probe duration (5s, 10s, 20s, 40s, full) for Baseline and AS-norm, together with the values over all durations (Baseline, all; AS-norm, all)]
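The AS score-normalization can be sketched in Python as follows. The function name `as_norm` and the toy cohort scores are assumptions, as is the use of the population standard deviation; only the formula and the top-100 selection come from the slide.

```python
import numpy as np

def as_norm(score, ref_cohort_scores, probe_cohort_scores, top_n=100):
    """Adaptive symmetric normalisation of one raw comparison score.

    ref_cohort_scores:   scores of the reference against the dev-set
    probe_cohort_scores: scores of the probe against the dev-set
    Only the top_n highest cohort scores on each side enter the statistics.
    """
    def top_stats(cohort_scores):
        top = np.sort(np.asarray(cohort_scores, dtype=float))[-top_n:]
        return top.mean(), top.std()

    mu_ref, sigma_ref = top_stats(ref_cohort_scores)
    mu_probe, sigma_probe = top_stats(probe_cohort_scores)
    return 0.5 * ((score - mu_ref) / sigma_ref + (score - mu_probe) / sigma_probe)

# Toy example with a top-4 selection instead of top-100
ref_cohort = [0.0, 1.0, 2.0, 3.0, 4.0]
probe_cohort = [1.0, 1.0, 3.0, 3.0, -5.0]
normalised = as_norm(5.0, ref_cohort, probe_cohort, top_n=4)
```

Restricting the statistics to the highest cohort scores makes the normalisation adaptive: each reference and probe is compared only against the dev-set samples most similar to it.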

slide-19
SLIDE 19

Research & Development Proposing duration-invariant AS score-normalization

◮ Observation: independent i-vector factors ⇒ poor comparisons

Student t-test between dev-set duration groups:

dev-set   full   40s   20s   10s   5s
5s        141    91    66    44
10s       230    140   70
20s       246    118
40s       180
full

◮ Idea:

  ◮ On probes: dev-set duration > 60s
  ◮ On references: dev-set duration ≈ probe duration

System     EER (in %)   all    5s     10s    20s    40s    full
Baseline   2.56         0.89   0.95   0.93   0.92   0.89   0.86
AS-norm    2.49         0.17   1.18   0.41   0.18   0.08   0.05
dAS-norm   2.06         0.10   0.35   0.20   0.11   0.07   0.07
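One possible reading of the idea is a duration-dependent choice of dev-set subsets before the normalisation statistics are computed. The Python sketch below only selects the subsets; the function name, the 25 % tolerance used for "≈ probe duration" and the example durations are assumptions, not values from the talk.

```python
import numpy as np

def das_cohorts(dev_durations, probe_duration, long_min=60.0, rel_tol=0.25):
    """Index sets of dev-set i-vectors for the two normalisation terms.

    Returns (probe_side, reference_side):
      probe_side:     dev-set samples longer than long_min seconds
      reference_side: dev-set samples whose duration is close to the probe's
    rel_tol is a hypothetical tolerance for "close".
    """
    dev_durations = np.asarray(dev_durations, dtype=float)
    probe_side = np.flatnonzero(dev_durations > long_min)
    close = np.abs(dev_durations - probe_duration) <= rel_tol * probe_duration
    reference_side = np.flatnonzero(close)
    return probe_side, reference_side

# Hypothetical dev-set durations in seconds and a 5 s probe
dev_durations = [4.0, 6.0, 11.0, 38.0, 75.0, 120.0]
probe_side, reference_side = das_cohorts(dev_durations, probe_duration=5.0)
```

The selected index sets would then replace the full dev-set when the means and standard deviations of the AS score-normalization are estimated.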

slide-20
SLIDE 20

Conclusion and perspectives Conclusion and future perspectives

◮ Research questions positively confirmed

  1. i-vectors are applicable to short duration scenarios
  2. Relative cross-entropy gain: 48% to HMM baseline
  3. Performance mismatch compensation: 89% to baseline

◮ Examining probabilistic i-vector comparators, e.g. Probabilistic Linear Discriminant Analysis (PLDA)
◮ Analyse effects on other speech signal features

slide-21
SLIDE 21

Conclusion and perspectives Community impact

◮ XI IAPR/IEEE International Summer School for Advanced Studies on Biometrics: Biometrics for Forensics, Security and beyond, Alghero, Italy, June 9 – 13, 2014

Presentation to well-established biometric researchers

◮ ISCA Odyssey 2014: The Speaker and Language Recognition Workshop, Joensuu, Finland, June 16 – 19, 2014

A. Nautsch, C. Rathgeb, C. Busch, H. Reininger, and K. Kasper: Towards Duration Invariance of i-Vector-based Adaptive Score Normalization, ISCA Odyssey, 2014.

slide-22
SLIDE 22

Andreas Nautsch, M.Sc.

Doctoral Researcher | Research Area: Secure Services CASED

Mornewegstr. 32
64293 Darmstadt/Germany
andreas.nautsch@cased.de
Phone: +49 6151 16-75182
Fax: +49 6151 16-4321
www.cased.de

Acknowledgment: atip GmbH, associated company during B.Sc. and M.Sc. studies, for funding and education.

slide-23
SLIDE 23

System parameters & metrics

◮ Variable parameters

  ◮ UBM size
  ◮ # i-vector factors
  ◮ # training iterations of total variability matrix

◮ Performance metrics

  ◮ Equal Error Rate (EER)
  ◮ False Match Rate (FMR)
  ◮ False Non-Match Rate (FNMR)
  ◮ Score cross-entropy

[Figure: Detection Error Tradeoff diagram of FMR (in %) against FNMR (in %) for one system, with the 30-FMs and 30-FNMs markers]

[Figure: normalized entropy over Bayesian thresholds η, showing H_norm^default, H_norm^min and H_norm^tot, with the 30-FMs, 30-FNMs and NIST η markers]

slide-24
SLIDE 24

Real-time evaluation

Table: Enroll/Verify HMM ⇔ i-vector

System         Enroll    Verify
HMM            206.2%    5.5%
i-vector-128   6.1%      3.2%
i-vector-256   9.9%      3.1%
i-vector-512   16.2%     3.3%

[Figure: four panels of ×RT (in %) over UBM size (128, 256, 512, 1024) and number of factors (200, 300, 400, 600): Baum-Welch statistics, T matrix estimation, Enrollments, Verifications]
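The percentages can be read as real-time factors, assuming that "×RT in %" denotes processing time relative to the duration of the processed audio. A minimal Python sketch with made-up timings:

```python
def real_time_factor_percent(processing_seconds, audio_seconds):
    """Processing time as a percentage of the audio duration."""
    return 100.0 * processing_seconds / audio_seconds

# Hypothetical timing: 0.31 s of computation for 10 s of speech
rtf = real_time_factor_percent(processing_seconds=0.31, audio_seconds=10.0)
```

Under this reading, a value below 100% means the step runs faster than real time, and a value above 100%, as for HMM enrolment, means it takes longer than the audio itself.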

slide-25
SLIDE 25

Illustration of total variability matrix

[Figure: heat map of the total variability matrix, UBM offset vector elements (up to about 2,000) against i-vector elements (up to about 300), colour scale from −5 to 5]

slide-26
SLIDE 26

Analyzing i-vector processing steps

◮ State-of-the-Art: spherical space projection

[Figure: raw i-vectors → spherical i-vectors. From [N. Dehak et al., 2011]]

◮ On short-duration scenario (calibration set)

[Figure: Equal Error Rate EER (in %) over UBM size (64, 128, 256, 512, 1024, 2048) for raw i-vectors and spherical i-vectors, with a 5% EER cut-off]
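The spherical projection can be pictured as scaling every i-vector to unit length, after which a cosine comparison reduces to a dot product. The Python sketch below assumes exactly that; any centring or whitening before the projection is left out, and the function names and toy vectors are hypothetical.

```python
import numpy as np

def to_unit_sphere(ivectors):
    """Scale each row to unit Euclidean length."""
    ivectors = np.atleast_2d(np.asarray(ivectors, dtype=float))
    norms = np.linalg.norm(ivectors, axis=1, keepdims=True)
    return ivectors / norms

def cosine_score(reference, probe):
    """Cosine similarity; on unit-length vectors this is a plain dot product."""
    ref = to_unit_sphere(reference)[0]
    prb = to_unit_sphere(probe)[0]
    return float(ref @ prb)

# Two toy "i-vectors" with two factors each
raw = np.array([[3.0, 4.0],
                [0.0, 2.0]])
spherical = to_unit_sphere(raw)
score = cosine_score(raw[0], raw[1])
```

After the projection only the direction of an i-vector carries information, so differences in vector length no longer affect the comparison.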

slide-27
SLIDE 27

Duration-invariant Adaptive Symmetric score normalization

[Figure: the development set i-vectors are split into two subsets, Λ_subset with d_probe == d_Λ and Λ_>60, which feed the subset z-norm and the subset t-norm for a probe i-vector with a given duration]

slide-28
SLIDE 28

Probabilistic Linear Discriminant Analysis (PLDA)


slide-29
SLIDE 29

Evaluation framework in Matlab

[Figure: block diagram with the components Data Capture (biometric characteristics, presentation, sensor, sample), Signal Processing (Voice Activity Detection, VAD samples, feature extraction, features, UBM, GMMs, template creation), Enrolment database (reference), Comparator (reference, probe, similarity score), Score normalisation (comparison scores, normalised scores), Score fusion/calibration (fused, calibrated scores), Decision (identity claim, threshold, verification outcome) and Evaluation (metrics & plots), covering enrolment and verification]

Figure: Framework design according to ISO/IEC 19795-1

slide-30
SLIDE 30

Evaluation framework in Matlab

[Figure: the framework components (Voice Activity Detection, Signal Processing, feature extraction, UBM, template creation, Comparator, similarity score, Score normalisation, Score fusion/calibration, Decision with threshold and verification outcome, Evaluation with metrics & plots) annotated with the toolboxes and formats used: atip VAD, HTK, Matlab & rastamat, Matlab & Joint Factor Analysis Matlab Demo, BOSARIS toolkit, Matlab, Matlab & HDF5, Raw-PCM files]

Figure: Framework & Toolboxes