SLIDE 1

THE UNIVERSITY OF BIRMINGHAM

Speaker Verification Under Additive Noise Conditions With Non-stationary SNR Using PMC

Michael L P Wong & Martin J Russell

SLIDE 2

References

  • M.J. Gales and S.J. Young, “Robust Continuous Speech Recognition Using Parallel Model Combination,” IEEE Transactions on Speech and Audio Processing, Vol. 4, No. 5, pp. 352-359, September 1996.
  • T. Matsui, T. Kanno and S. Furui, “Speaker Recognition Using HMM Composition in Noisy Environments,” Computer Speech and Language, Vol. 10, pp. 107-116, 1996.
  • O. Bellot, D. Matrouf, T. Merlin and J.-F. Bonastre, “Additive and Convolutive Noises Compensation for Speaker Recognition,” Proceedings of ICSLP 2000, Beijing, China, 2000.

SLIDE 3

Task Definition

  • Clean verification speech: Good
  • Noise-contaminated verification speech with non-stationary SNR: Bad

SLIDE 4

Preview of Results

  • Clean speech models tested on non-stationary SNR phrases
    – Speech noise: 38.55% EER
    – Operations room noise: 34.78% EER
  • Performance of compensated models
    – Speech noise: 19.92% EER
    – Operations room noise: 18.84% EER

SLIDE 5

Structure of Presentation

  • Stage One
    – Evaluation of PMC on speaker verification tasks: stationary SNR conditions
  • Stage Two
    – Recognition of unknown SNR conditions
  • Stage Three
    – Modelling the dynamics of SNR in noise-contaminated verification phrases

SLIDE 6

Problem Formulation

  • Text-dependent speaker verification
  • Deployment in dynamic real-world environments
  • Model-based approach
  • Ultimately, a multi-noise, multi-SNR scenario

SLIDE 7

Evaluation Using PMC

  • Successful in improving the performance of ASR systems
  • Based on work by Mark Gales
  • Evaluate use of PMC in text-dependent speaker verification tasks

SLIDE 8

Performance of PMC in ASR Experiments

Reference : Gales

[Figure: recognition accuracy (0% to 100%) against signal to noise ratio (18dB down to –6dB) for un-compensated and compensated models]

SLIDE 9

Design Criteria

  • Additive noises considered
  • Scaling to be performed on noises
  • Compensate only for static parameters

$$\mu^{l}_{S \otimes N} = \log\left(\exp(\mu^{l}_{S}) + g \exp(\mu^{l}_{N})\right)$$

$$\Sigma^{l}_{S \otimes N} = \log\left(\exp(\Sigma^{l}_{S}) + g \exp(\Sigma^{l}_{N})\right)$$
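The log-domain combination on this slide can be illustrated numerically. The following is a minimal sketch, assuming the speech and noise means are already expressed in the log-spectral domain (the superscript l) and that the gain g scales the noise term, in line with the "scaling to be performed on noises" bullet. The function name `log_add` and the example values are hypothetical and are not taken from the presentation.

```python
import math

def log_add(mu_speech_log, mu_noise_log, g=1.0):
    """Combine per-channel log-domain means as log(exp(mu_S) + g * exp(mu_N)).

    Assumption: g multiplies the noise term; inputs are log-spectral means.
    """
    return [
        math.log(math.exp(s) + g * math.exp(n))
        for s, n in zip(mu_speech_log, mu_noise_log)
    ]

# Hypothetical three-channel example.
mu_clean = [2.0, 1.0, 0.5]
mu_noise = [0.0, 0.0, 0.0]

# A small gain leaves the speech mean almost unchanged;
# a large gain pulls the combined mean towards the scaled noise.
mu_high_snr = log_add(mu_clean, mu_noise, g=0.05)
mu_low_snr = log_add(mu_clean, mu_noise, g=4.0)
```

Because the combination is applied independently to each channel, the same function can be reused for each mixture component of a model.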

SLIDE 10

Implementation

  • Selection of databases
  • Preparation of data
  • System Structure
  • Scoring Procedures

SLIDE 11

Selection of Databases

  • YOHO speaker verification database
    – Standard database, so performance comparisons are available
  • TIMIT database
    – Used for the initialisation of isolated phone models prior to YOHO training
  • NOISEX-92 noise database
    – Selection of repetitive noise sources; two are reported in this paper: speech noise and operations room noise

SLIDE 12

Preparation of Data

  • Scaling of both enrolment and verification data
  • Measurement of verification speech power
    – Silence periods ignored [ref 7, ITU-T Rec.]
  • Mixing of speech and noise from –18dB to +18dB at 6dB intervals. Retain the multiplication factor, g, and take an average
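The mixing step can be sketched as follows. This is a minimal illustration under stated assumptions: g is taken to be an amplitude factor applied to the noise before it is added to the speech, and power is measured over all samples (the silence exclusion mentioned on the slide is not reproduced). The helper names and the synthetic signals are hypothetical.

```python
import math

def mean_power(samples):
    """Average power of a sampled signal."""
    return sum(x * x for x in samples) / len(samples)

def noise_gain_for_snr(speech, noise, snr_db):
    """Amplitude factor g such that 10*log10(P_speech / P_(g*noise)) equals snr_db."""
    p_s = mean_power(speech)
    p_n = mean_power(noise)
    return math.sqrt(p_s / (p_n * 10 ** (snr_db / 10)))

def mix(speech, noise, g):
    """Additive mixing of speech with scaled noise."""
    return [s + g * n for s, n in zip(speech, noise)]

# Synthetic stand-ins for a speech phrase and a noise segment.
speech = [math.sin(0.1 * t) for t in range(1000)]
noise = [0.5 * math.sin(1.7 * t + 0.3) for t in range(1000)]

# One gain per SNR condition, from -18 dB to +18 dB at 6 dB intervals.
gains = {snr: noise_gain_for_snr(speech, noise, snr) for snr in range(-18, 19, 6)}
noisy_0db = mix(speech, noise, gains[0])
```

Under this amplitude convention, each 6 dB drop in SNR roughly doubles g. Averaging g over many phrases, as the slide describes, would give one representative factor per SNR condition.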

SLIDE 13

System Structure

  • Front-end
    – 25ms, Hamming windowed, mel scale warped
    – 12 cepstral coefficients with 0th (energy) coefficient appended, 1st and 2nd order derivatives included
  • HTK software for both training and recognition
  • 3 state, 4 component tied-triphone speaker dependent models; 1 state, 4 component noise models
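The front-end figures above imply a frame length and a feature dimension that can be checked with a short calculation. This is a sketch only: the 8 kHz sampling rate is an assumption (it is not stated on the slide), and the function name `hamming` is hypothetical. The window formula itself is the standard Hamming definition.

```python
import math

SAMPLE_RATE_HZ = 8000   # assumption: sampling rate is not given on the slide
FRAME_MS = 25           # from the slide

def hamming(n_samples):
    """Standard Hamming window: 0.54 - 0.46 * cos(2*pi*n / (N - 1))."""
    return [
        0.54 - 0.46 * math.cos(2 * math.pi * n / (n_samples - 1))
        for n in range(n_samples)
    ]

frame_len = SAMPLE_RATE_HZ * FRAME_MS // 1000
window = hamming(frame_len)

# 12 cepstral coefficients plus the 0th coefficient,
# then first and second order derivatives of all of them.
n_static = 12 + 1
feature_dim = n_static * 3
```

With these assumptions each frame holds 200 samples and each feature vector has 39 dimensions.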

SLIDE 14

System Structure

  • Training
    – 96 phrases per speaker
    – 118 authorised speakers
    – 20 speakers for the General Speaker Model (GSM)
  • Recognition
    – 40 phrases used for both false rejection (FR) and false acceptance (FA) experiments

SLIDE 15

Scoring Procedures

  • Likelihood ratio test employed
  • Performance quoted in % EER

$$\frac{P(X \mid S)}{P(X \mid GSM)} \geq t$$
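The likelihood ratio test and the EER figure quoted throughout can be sketched as below. This is a minimal illustration, assuming scores are log-likelihood ratios, log P(X|S) minus log P(X|GSM), and that the EER is estimated by sweeping the threshold over the observed scores. The function names and the score lists are hypothetical, not results from the presentation.

```python
def error_rates(genuine, impostor, threshold):
    """False acceptance and false rejection rates at one threshold.

    A trial is accepted when its score is >= threshold.
    """
    frr = sum(s < threshold for s in genuine) / len(genuine)
    far = sum(s >= threshold for s in impostor) / len(impostor)
    return far, frr

def equal_error_rate(genuine, impostor):
    """Estimate the EER at the candidate threshold where FAR and FRR are closest."""
    best_gap, best_eer = None, None
    for t in sorted(set(genuine) | set(impostor)):
        far, frr = error_rates(genuine, impostor, t)
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            best_gap, best_eer = gap, (far + frr) / 2
    return best_eer

# Hypothetical log-likelihood-ratio scores.
genuine_scores = [2.1, 1.7, 0.9, 1.2, -0.2]
impostor_scores = [-1.5, -0.8, 0.1, -2.0, 1.0]
eer = equal_error_rate(genuine_scores, impostor_scores)
```

Raising the threshold t trades false acceptances for false rejections; the EER summarises the operating point where the two rates meet.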

SLIDE 16

Experiment Methodology

  • Establish baseline performance using clean speaker models and clean verification data
  • Evaluate performance of clean speaker models under multi-SNR verification data
  • Evaluate performance of PMC compensated speaker models under multi-SNR verification data

SLIDE 17

Un-compensated Models

[Figure: equal error rate (0% to 60%) against signal to noise ratio (18dB down to –18dB in 2dB steps) for operations room noise and speech noise]

Clean speech with clean models: 0.57% EER

SLIDE 18

Compensated Models

[Figure: equal error rate (0% to 60%) against signal to noise ratio (18dB down to –18dB in 6dB steps); series: Operations Room Noise, Speech Noise, Operations Room Noise (Std), Speech Noise (Std)]

SLIDE 19

Stage One Summary

  • Text-dependent SV task
  • HTK software used with modifications for PMC
  • YOHO, TIMIT and NOISEX-92 databases used
  • 7 SNR scenarios considered (–18dB to +18dB)

SLIDE 20

Stage One Summary

  • PMC improves SV performance
  • 2 additive noises considered
  • Static parameters compensated
  • Baseline used: clean models, clean/contaminated speech

SLIDE 21

Experimental Extension

  • We now have 7 SNR-specific PMC models
  • Can SNR-specific PMC models be used for other SNRs? How sensitive are they?
  • If yes, how well do they perform?

SLIDE 22

Evaluation of Non-ideal PMC Models

  • For each SNR-specific PMC model, perform the SV task on noise-contaminated verification phrases from –18dB to +18dB at 2dB intervals
  • Observe any degradation in performance from using non-ideal models
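The size of this evaluation grid follows directly from the two SNR ranges. The sketch below simply enumerates the model and test conditions described above; the variable names are hypothetical, and no verification scores are computed.

```python
# SNR-specific PMC models: -18 dB to +18 dB at 6 dB intervals.
model_snrs = list(range(18, -19, -6))

# Test conditions: -18 dB to +18 dB at 2 dB intervals.
test_snrs = list(range(18, -19, -2))

# Every model is evaluated on every test condition; a trial is "matched"
# when the test SNR equals the SNR the model was compensated for.
trials = [(m, t, m == t) for m in model_snrs for t in test_snrs]
matched = [(m, t) for m, t, is_matched in trials if is_matched]
mismatched = [(m, t) for m, t, is_matched in trials if not is_matched]
```

Only 7 of the resulting 133 model and test pairings are matched, so most of the grid measures how far each model tolerates an SNR it was not built for.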

SLIDE 23

Speech Noise Result

[Figure: Speech Noise. Equal error rate (0% to 60%) against signal to noise ratio (18dB down to –18dB in 2dB steps), one curve per SNR-specific PMC model. Legend, with the value shown in parentheses on the slide: 18dB (0.06645), 12dB (0.132584), 6dB (0.264541), 0dB (0.527828), –6dB (1.053155), –12dB (2.101321), –18dB (4.192687)]

SLIDE 24

Operations Room Noise Result

[Figure: Operations Room Noise. Equal error rate (0% to 60%) against signal to noise ratio (18dB down to –18dB in 2dB steps), one curve per SNR-specific PMC model. Legend, with the value shown in parentheses on the slide: 18dB (0.074531), 12dB (0.14871), 6dB (0.296715), 0dB (0.592023), –6dB (1.181242), –12dB (2.356887), –18dB (4.702608)]

SLIDE 25

Discussion

  • Allow the selection of SNR-specific PMC models based on which has the highest probability for a given observation
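Selecting the model with the highest probability for an observation amounts to an argmax over the SNR-specific models. The sketch below uses one-dimensional Gaussians as stand-ins for the PMC compensated HMMs, purely to keep the example self-contained; the model parameters, frame values and function names are all hypothetical.

```python
import math

def gaussian_loglik(frames, mean, var):
    """Log-likelihood of a sequence of scalar frames under one Gaussian."""
    return sum(
        -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)
        for x in frames
    )

def select_model(frames, models):
    """Return the SNR label of the best-scoring model, plus all scores."""
    scores = {
        snr: gaussian_loglik(frames, mean, var)
        for snr, (mean, var) in models.items()
    }
    return max(scores, key=scores.get), scores

# Toy stand-ins: SNR label -> (mean, variance).
models = {18: (0.0, 1.0), 0: (2.0, 1.0), -18: (4.0, 1.0)}
frames = [1.8, 2.3, 2.1, 1.9]
best_snr, scores = select_model(frames, models)
```

In a real system the per-model score would come from the HMM decoder rather than a single Gaussian, but the selection rule is the same.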

SLIDE 26

Automatic Model Selection

[Figure: equal error rate (0% to 60%) against signal to noise ratio (18dB down to –18dB in 2dB steps) for operations room noise and speech noise]

SLIDE 27

Stage Two Summary

  • Limiting the number of SNR-specific PMC models to 7 does not affect SV performance on unknown SNR
  • Better performance is achieved by automatic selection of models

SLIDE 28

Varying SNR Task

SLIDE 29

Modelling SNR Dynamics

  • Operating models in parallel assumes that SNR changes occur at model boundaries
  • Create one model from multiple models, with the SNR dynamics embedded within the transition probabilities

SLIDE 30

Implementation of a Composite HMM

  • Rows and columns correspond to different SNRs; the 1st row is the entry probability

          +18dB  +12dB  +6dB   0dB    –6dB   –12dB  –18dB
  Entry   0.3    0.2    0.1    0.1    0.1    0.1    0.1
  +18dB   0.4    0.1    0.1    0.1    0.1    0.1    0.1
  +12dB   0.1    0.4    0.1    0.1    0.1    0.1    0.1
  +6dB    0.1    0.1    0.4    0.1    0.1    0.1    0.1
  0dB     0.1    0.1    0.1    0.4    0.1    0.1    0.1
  –6dB    0.1    0.1    0.1    0.1    0.4    0.1    0.1
  –12dB   0.1    0.1    0.1    0.1    0.1    0.4    0.1
  –18dB   0.1    0.1    0.1    0.1    0.1    0.1    0.4
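The SNR transition structure on this slide can be written out and sanity-checked in a few lines. This sketch assumes the matrix reads as an entry row of 0.3, 0.2, 0.1, 0.1, 0.1, 0.1, 0.1 followed by a 7 by 7 block with 0.4 on the diagonal and 0.1 elsewhere, with SNR states ordered from +18dB to –18dB; that reading is recovered from the flattened slide text and is an assumption. Variable and function names are hypothetical.

```python
snr_levels = [18, 12, 6, 0, -6, -12, -18]

# Assumed entry probabilities, one per SNR state.
entry = [0.3, 0.2, 0.1, 0.1, 0.1, 0.1, 0.1]

# Assumed transitions: 0.4 to stay at the same SNR, 0.1 to move to any other.
n = len(snr_levels)
transitions = [[0.4 if i == j else 0.1 for j in range(n)] for i in range(n)]

def is_stochastic(row, tol=1e-9):
    """True when the row is a valid probability distribution."""
    return all(p >= 0 for p in row) and abs(sum(row) - 1.0) < tol

all_valid = is_stochastic(entry) and all(is_stochastic(r) for r in transitions)

# With a self-transition probability of 0.4, the expected number of
# consecutive frames spent in one SNR state is 1 / (1 - 0.4).
expected_dwell_frames = 1.0 / (1.0 - transitions[0][0])
```

Replacing these uniform values with probabilities estimated from observed SNR trajectories is the step the later slides describe as embedding the true SNR dynamics.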

SLIDE 31

Implementation of a Composite HMM

  • 3 dimensional model
  • 1 state noise model
  • 3 state speech model
  • 7 state SNR model

[Figure: three-dimensional model with axes SNR, Speech and Noise]
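The size of the composite state space follows from the three component models listed above. The sketch below enumerates it as a Cartesian product; the state labels are hypothetical and only the counts (1, 3 and 7) come from the slide.

```python
from itertools import product

noise_states = ["n1"]                        # 1 state noise model
speech_states = ["s1", "s2", "s3"]           # 3 state speech model
snr_states = [18, 12, 6, 0, -6, -12, -18]    # 7 state SNR model

# Each composite state is one (SNR, speech, noise) combination.
composite_states = list(product(snr_states, speech_states, noise_states))

# A flat index is convenient when the composite model is stored as one HMM.
state_index = {state: i for i, state in enumerate(composite_states)}
```

This gives 7 x 3 x 1 = 21 composite states per speech model, which is the cost of letting the SNR change inside a model rather than only at model boundaries.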

SLIDE 32

Expectations

  • Extracting the true SNR dynamics and embedding them into the transition probabilities will further improve performance [to be evaluated]

SLIDE 33

Varying SNR Task

SLIDE 34

Evaluation Using Non-stationary SNR Utterances

  • Clean speech models tested on non-stationary SNR phrases
    – Speech noise: 38.55% EER
    – Operations room noise: 34.78% EER
  • Performance of compensated models
    – Speech noise: 19.92% EER
    – Operations room noise: 18.84% EER

SLIDE 35

Stage Three Summary

  • Composite 3-D HMM created
  • SNR dynamics embedded into transition probabilities

  • Improvement in performance observed

SLIDE 36

Conclusion

  • PMC improves SV performance under both stationary and varying speech SNR
  • SNR dynamics can be embedded into the HMM structure, providing additional information

SLIDE 37

Work In Progress

  • Currently : known noise, unknown SNR
  • Ideally : unknown noise, unknown SNR
  • Tracking SNR transitions
  • Comparison with other robust methods
  • Establishing another baseline using matched recognition