Speaker Recognition and the ETSI Standard Distributed Speech Recognition Front-End



SLIDE 1

Speaker Recognition and the ETSI Standard Distributed Speech Recognition Front-End

Charles Broun, William Campbell
Motorola Labs, Human Interface Lab, Tempe, Arizona, USA

David Pearce, Holly Kelleher
Motorola Limited, Basingstoke, UK

SLIDE 2

Outline

  • Background
  • Speaker Verification

    – Embedded Process
    – Distributed Process

  • Distributed Speech Recognition (DSR)
  • Classifier
  • Experimental Setup
  • Results
  • Conclusion

SLIDE 3

Background

Motivation

  • Issues with Embedded Solutions

    – Mobile devices do not have the necessary memory or battery capacity
    – Updating software requires access to each device
    – Multiple devices may contain different speaker models

  • Potential Benefits of Distributed Solutions

    – Server supports computation and memory requirements
    – Software updates are handled in a single location
    – A single speaker model may support multiple mobile devices, enabling a ‘portable’ interface

  • Distributed Speech Recognition (DSR) Standard

    – This standard addresses the above issues for speech recognition
    – Can work on this standard be leveraged for speaker verification?

SLIDE 4

Speaker Verification

Embedded Process

  • Feature extractor and classifier typically combined into a proprietary solution
  • Can jointly optimize both components
  • Target system must support computation & memory requirements of both components

[Figure: Input Speech Data -> Feature Extractor -> Classifier (using Speaker Model) -> Score -> Compare to Threshold, T -> Accept (> T) / Reject (< T)]
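
The final comparison against the threshold T can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the function name `decide` and the example threshold are hypothetical, and because the slide only specifies the cases > T and < T, the sketch assumes a score exactly equal to T is rejected.

```python
def decide(score: float, threshold: float) -> str:
    """Return "accept" if the score exceeds the threshold, else "reject".

    Assumption: a score exactly equal to the threshold is rejected,
    since the slide only defines the > T and < T cases.
    """
    return "accept" if score > threshold else "reject"


if __name__ == "__main__":
    T = 0.5  # hypothetical threshold, chosen only for the demo
    for s in (0.72, 0.31):
        print(f"score={s:.2f} -> {decide(s, T)}")
```

Raising T trades fewer false acceptances for more false rejections, which is why results are often reported at the point where the two error rates are equal.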

SLIDE 5

Speaker Verification

Distributed Process

  • Feature extractor is standardized
  • Cannot jointly optimize both components
  • Client only supports computational & memory requirements of feature extractor
  • Server supports higher load of classifier

[Figure: Input Speech Data -> Feature Extractor -> Wireless Channel -> Classifier (using Speaker Model) -> Score -> Compare to Threshold, T -> Accept (> T) / Reject (< T)]

SLIDE 6

DSR

Background of DSR Standard

  • Motivation of Standard Front-End

    – Potential benefits of distributed solutions for speech recognition
    – Eliminates voice/vocoder channel mismatch

  • Activities

    – European Telecommunications Standards Institute (ETSI)
    – Aurora Working Group within ETSI
    – First standard published in February 2000

SLIDE 7

DSR

ETSI Standard DSR System Concept

  • Terminal front-end targeted to mobile devices
  • Features transmitted over a low-error data channel
  • Speech recognizer runs on high power server

[Figure: Terminal DSR Front-End (Parameterisation: Mel-Cepstrum -> Compression: Split VQ -> Frame Structure & Error Protection) -> Wireless Data Channel, 4.8 kbit/s -> Server DSR Back-End (Error Detection & Mitigation -> Decompression -> Recognition)]
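
The "Split VQ" compression stage can be illustrated with a toy example: the feature vector is split into small sub-vectors, and each sub-vector is replaced by the index of its nearest codebook entry. This is only a sketch. The codebooks below are made up and far smaller than anything a real front-end would use, and pairing adjacent features into two-element sub-vectors is an assumption made for illustration.

```python
from typing import List, Sequence, Tuple

Pair = Tuple[float, float]


def nearest(codebook: Sequence[Pair], v: Pair) -> int:
    """Index of the codebook entry closest to v (squared Euclidean distance)."""
    return min(
        range(len(codebook)),
        key=lambda i: (codebook[i][0] - v[0]) ** 2 + (codebook[i][1] - v[1]) ** 2,
    )


def split_vq_encode(x: Sequence[float], codebooks: Sequence[Sequence[Pair]]) -> List[int]:
    """Quantize each adjacent feature pair with its own codebook."""
    if len(x) != 2 * len(codebooks):
        raise ValueError("need exactly one codebook per feature pair")
    return [nearest(cb, (x[2 * j], x[2 * j + 1])) for j, cb in enumerate(codebooks)]


def split_vq_decode(indices: Sequence[int], codebooks: Sequence[Sequence[Pair]]) -> List[float]:
    """Rebuild an approximate feature vector from the codebook indices."""
    out: List[float] = []
    for idx, cb in zip(indices, codebooks):
        out.extend(cb[idx])
    return out


# Hypothetical toy codebooks (4 entries each, i.e. 2 bits per pair).
TOY_CODEBOOKS = [
    [(0.0, 0.0), (1.0, 1.0), (-1.0, 1.0), (1.0, -1.0)],
    [(0.0, 0.0), (0.5, 0.5), (-0.5, -0.5), (2.0, 0.0)],
]

if __name__ == "__main__":
    features = [0.9, 1.2, -0.4, -0.6]
    idx = split_vq_encode(features, TOY_CODEBOOKS)
    print("indices:", idx)
    print("decoded:", split_vq_decode(idx, TOY_CODEBOOKS))
```

Only the indices cross the channel; the server looks them up in the same codebooks to recover an approximation of the features.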

SLIDE 8

DSR

ETSI Standard DSR Front-End

  • Feature set consists of 12 mel-cepstrum coefficients, logE, and C0
  • Quantization supports a data rate of 4800 b/s
  • Error protection supports robustness to transmission errors

[Figure: Input Speech -> ADC -> Offcom -> Framing -> PE -> W -> FFT -> MF -> LOG -> DCT -> Feature Compression -> Bit Stream Formatting -> To Transmission Channel, with logE computed in a separate branch]

Abbreviations:

    ADC      Analog-to-digital conversion
    Offcom   Offset compensation
    PE       Pre-emphasis
    logE     Energy measure computation
    W        Windowing
    FFT      Fast Fourier transform
    MF       Mel-filtering
    LOG      Non-linear transform
    DCT      Discrete cosine transform
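
The 4800 b/s figure can be checked with a little arithmetic. The framing numbers below do not appear in these slides; they are assumptions recalled from the ETSI DSR front-end specification and should be verified against the standard: a 10 ms frame shift, 44 bits of quantizer indices per frame, a 4-bit CRC per pair of frames, and 48 bits of synchronisation and header data per 24-frame multiframe.

```python
# All constants are assumptions to be checked against the ETSI specification.
FRAME_SHIFT_MS = 10          # one feature frame every 10 ms
BITS_PER_FRAME = 44          # split-VQ indices for one frame
CRC_BITS_PER_PAIR = 4        # error-detection bits per pair of frames
FRAMES_PER_MULTIFRAME = 24   # frames grouped under one header
HEADER_BITS = 48             # sync sequence plus header per multiframe


def dsr_bit_rate() -> float:
    """Bits per second implied by the assumed multiframe layout."""
    pairs = FRAMES_PER_MULTIFRAME // 2
    payload_bits = pairs * (2 * BITS_PER_FRAME + CRC_BITS_PER_PAIR)
    total_bits = payload_bits + HEADER_BITS
    duration_ms = FRAMES_PER_MULTIFRAME * FRAME_SHIFT_MS
    return total_bits * 1000 / duration_ms


if __name__ == "__main__":
    print(f"{dsr_bit_rate():.0f} b/s")
```

Under these assumptions a multiframe carries 1152 bits in 240 ms, which works out to the 4800 b/s quoted on the slide.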

SLIDE 9

Classifier

Polynomial Classifier

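  • Compute the polynomial basis vector
  • Apply a polynomial discriminant function
  • Compute the score as the average across all frames

$$ d(\mathbf{x}, \mathbf{w}) = \mathbf{w}^t \mathbf{p}(\mathbf{x}) $$

$$ s = \frac{1}{M} \sum_{k=1}^{M} \mathbf{w}^t \mathbf{p}(\mathbf{x}_k) = \mathbf{w}^t \, \frac{1}{M} \sum_{k=1}^{M} \mathbf{p}(\mathbf{x}_k) $$

Given $\mathbf{x} = [\, x_1 \;\; x_2 \,]^t$ and $K = 2$:

$$ \mathbf{p}(\mathbf{x}) = [\, 1 \;\; x_1 \;\; x_2 \;\; x_1^2 \;\; x_1 x_2 \;\; x_2^2 \,]^t $$

[Figure: DSR Feature Vectors x_k -> Polynomial Basis Vector p(x) -> Discriminant Function d(x,w), using Speaker Model w -> Average -> Score s]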
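
The three classifier steps (basis expansion, discriminant, averaging over frames) can be sketched in a few lines. This is an illustrative sketch, not the authors' code: the function names are hypothetical, the ordering of monomials within the basis vector is an assumption chosen to match the two-dimensional example on the slide, and training of the speaker model w is not shown.

```python
from itertools import combinations_with_replacement
from math import prod
from typing import List, Sequence


def poly_basis(x: Sequence[float], degree: int) -> List[float]:
    """All monomials of the entries of x up to the given total degree.

    For x = [x1, x2] and degree 2 this yields
    [1, x1, x2, x1^2, x1*x2, x2^2].
    """
    terms: List[float] = []
    for d in range(degree + 1):
        for idx in combinations_with_replacement(range(len(x)), d):
            # The empty product (d == 0) is 1, the constant term.
            terms.append(prod(x[i] for i in idx))
    return terms


def discriminant(x: Sequence[float], w: Sequence[float], degree: int) -> float:
    """d(x, w) = w^t p(x)."""
    return sum(wi * pi for wi, pi in zip(w, poly_basis(x, degree)))


def score(frames: Sequence[Sequence[float]], w: Sequence[float], degree: int) -> float:
    """s = average of d(x_k, w) over all M frames."""
    return sum(discriminant(x, w, degree) for x in frames) / len(frames)


if __name__ == "__main__":
    print(poly_basis([2.0, 3.0], 2))
    # Size of the basis for 12 features and a 3rd order polynomial.
    print(len(poly_basis([0.0] * 12, 3)))
```

Because the score is linear in p(x), the server can average the basis vectors first and apply w once, which is the second form of the score equation.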

SLIDE 10

Experimental Setup

YOHO Database

  • 138 speakers
  • Enrollment

    – 4 sessions
    – 24 phrases
    – “23-45-56”

  • Testing

    – 10 sessions
    – 4 phrases
    – “45-23-56”

Speaker Verification System

  • Classifier: 3rd order polynomial
  • Features: 12 MFCCs from the DSR front-end
  • Channel: GSM bit-error masks
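
A bit-error mask is typically applied by XOR-ing it with the transmitted bit stream, so that every 1-bit in the mask flips the corresponding payload bit. The sketch below assumes that is how the GSM masks are used here, which the slides do not state; the function name and byte values are made up for the demo.

```python
def apply_error_mask(payload: bytes, mask: bytes) -> bytes:
    """Flip every payload bit whose position is set to 1 in the mask."""
    if len(payload) != len(mask):
        raise ValueError("payload and mask must have the same length")
    return bytes(p ^ m for p, m in zip(payload, mask))


if __name__ == "__main__":
    sent = bytes([0x0F, 0xF0])   # hypothetical encoded feature bytes
    mask = bytes([0x01, 0x80])   # hypothetical error pattern: two bit errors
    received = apply_error_mask(sent, mask)
    print(received.hex())
```

Applying the same mask a second time restores the original bytes, which makes this a convenient way to replay a recorded error pattern over any bit stream.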

SLIDE 11

Results

Performance

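Average Equal Error Rate (%) for a 1-Phrase Test

Rows are enrollment conditions and columns are verification conditions, each one of EP3, EP2, EP1, Error-Free, Unquantized.

    Verify:    EP3     EP2     EP1     Error-Free   Unquantized
               1.66    1.27    1.22    1.22         -
               1.70    1.30    1.26    1.26         -
               1.67    1.26    1.22    1.22         -
               1.67    1.26    1.22    1.22         -
    Unquantized enrollment and verification:        1.18

[The enrollment label (EP3, EP2, EP1, or Error-Free) for each of the first four rows is not recoverable from the extracted slide text.]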
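
For reference, an equal error rate can be estimated from two lists of scores by sweeping the threshold until the false-acceptance and false-rejection rates meet. The sketch below is one simple way to do this and is not necessarily the procedure used for the table above; the scores are made up, and how the rates were averaged across speakers is not shown in the slides.

```python
from typing import Sequence


def equal_error_rate(genuine: Sequence[float], impostor: Sequence[float]) -> float:
    """Approximate EER by sweeping the threshold over all observed scores.

    Decision rule: accept when score > threshold.
    Returns the mean of the false-acceptance rate (FAR) and false-rejection
    rate (FRR) at the threshold where the two are closest.
    """
    best_gap, best_eer = float("inf"), 1.0
    for t in sorted(set(genuine) | set(impostor)):
        far = sum(s > t for s in impostor) / len(impostor)
        frr = sum(s <= t for s in genuine) / len(genuine)
        gap = abs(far - frr)
        if gap < best_gap:
            best_gap, best_eer = gap, (far + frr) / 2
    return best_eer


if __name__ == "__main__":
    genuine_scores = [0.9, 0.8, 0.7, 0.4]    # hypothetical true-speaker trials
    impostor_scores = [0.6, 0.3, 0.2, 0.1]   # hypothetical impostor trials
    eer = equal_error_rate(genuine_scores, impostor_scores)
    print(f"EER = {100 * eer:.1f}%")
```

With only a handful of trials the estimate is coarse; with many trials the FAR and FRR curves cross much more precisely.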

SLIDE 12

Conclusion

Demonstrated that the ETSI Standard Distributed Speech Recognition Front-End is viable for speaker verification