Speaker Recognition and the ETSI Standard - PowerPoint PPT Presentation


Dec 14, 2023

Speaker Recognition and the ETSI Standard Distributed Speech Recognition Front-End: Charles Broun, David Pearce, William Campbell, Holly Kelleher, Motorola

SLIDE 1

Speaker Recognition and the ETSI Standard Distributed Speech Recognition Front-End

Charles Broun, William Campbell

Motorola Labs, Human Interface Lab, Tempe, Arizona, USA

David Pearce, Holly Kelleher

Motorola Limited, Basingstoke, UK

SLIDE 2

Outline

Background
Speaker Verification

– Embedded Process
– Distributed Process

Distributed Speech Recognition (DSR)
Classifier
Experimental Setup
Results
Conclusion

SLIDE 3

Background

Motivation

Issues with Embedded Solutions

Mobile devices do not have the necessary memory or battery capacity
Updating software requires access to each device
Multiple devices may contain different speaker models

Potential Benefits of Distributed Solutions

Server supports computation and memory requirements
Software updates are handled in a single location
A single speaker model may support multiple mobile devices, enabling a ‘portable’ interface

Distributed Speech Recognition (DSR) Standard

This standard addresses the above issues for speech recognition
Can work on this standard be leveraged for speaker verification?

SLIDE 4

Speaker Verification

Embedded Process

Feature extractor and classifier are typically combined into a proprietary solution
Can jointly optimize both components
Target system must support the computation & memory requirements of both components

[Figure: Input Speech Data → Feature Extractor → Classifier (Speaker Model) → Score; compare to threshold T: Accept if > T, Reject if < T]
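The embedded process is a composition of two stages followed by a threshold test. The Python sketch below expresses that structure with the feature extractor and classifier passed in as functions. The names `verify`, `extract_features` and `classify` are hypothetical, the speaker model is assumed to be captured inside `classify`, and a score exactly equal to T is treated as a rejection because the slide only specifies > T and < T.

```python
from typing import Callable, Sequence

FeatureSeq = Sequence[Sequence[float]]


def verify(
    speech: Sequence[float],
    extract_features: Callable[[Sequence[float]], FeatureSeq],
    classify: Callable[[FeatureSeq], float],
    threshold: float,
) -> bool:
    """Accept the identity claim if the classifier score exceeds T, otherwise reject."""
    score = classify(extract_features(speech))  # speaker model lives inside classify
    return score > threshold


# Usage with toy, hypothetical stand-ins: one feature per sample, score = mean feature.
accepted = verify(
    [1.0, 2.0, 3.0],
    lambda s: [[v] for v in s],
    lambda f: sum(v[0] for v in f) / len(f),
    threshold=1.5,
)  # True: score 2.0 > 1.5
```

In the embedded process both callables run on the device; in the distributed process, `extract_features` would run on the client and `classify` on the server, with the wireless channel in between.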

SLIDE 5

Speaker Verification

Distributed Process

Feature extractor is standardized
Cannot jointly optimize both components
Client needs to support only the computation & memory requirements of the feature extractor

Server supports the higher load of the classifier

[Figure: Input Speech Data → Feature Extractor → Wireless Channel → Classifier (Speaker Model) → Score; compare to threshold T: Accept if > T, Reject if < T]

SLIDE 6

DSR

Background of DSR Standard

Motivation of Standard Front-End

Potential benefits of distributed solutions for speech recognition
Eliminates voice/vocoder channel mismatch

Activities

European Telecommunications Standards Institute (ETSI)
Aurora Working Group within ETSI
First standard published in February 2000

SLIDE 7

DSR

ETSI Standard DSR System Concept

Terminal front-end targeted to mobile devices
Features transmitted over a low-error data channel
Speech recognizer runs on a high-power server

[Figure: Terminal DSR Front-End (Parameterisation: Mel-Cepstrum; Compression: Split VQ; Frame Structure & Error Protection) → Wireless Data Channel, 4.8 kbit/s → Server DSR Back-End (Error Detection & Mitigation; Decompression; Recognition)]
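The terminal compresses each feature vector with split vector quantization (Split VQ) before it crosses the 4.8 kbit/s channel. The Python sketch below illustrates the idea under stated assumptions: the 14 features are split into seven pairs, each pair is replaced by the index of its nearest codeword in its own codebook, and the bit cost per frame is the sum of the index widths. The pairing, the codebook sizes, and the randomly generated codebooks are hypothetical placeholders for the trained tables in the standard, and the plain squared-Euclidean distance is a simplification.

```python
import math

import numpy as np

# Hypothetical pairing of a 14-dim vector [c1..c12, c0, logE] into 7 sub-vectors.
PAIRS = [(0, 1), (2, 3), (4, 5), (6, 7), (8, 9), (10, 11), (12, 13)]


def make_codebooks(sizes, seed=0):
    """Random 2-D codebooks standing in for trained ones (assumption)."""
    rng = np.random.default_rng(seed)
    return [rng.normal(size=(n, 2)) for n in sizes]


def split_vq_encode(frame, codebooks):
    """One nearest-neighbour codebook index per sub-vector (squared Euclidean)."""
    frame = np.asarray(frame, dtype=float)
    indices = []
    for (i, j), cb in zip(PAIRS, codebooks):
        dist = np.sum((cb - frame[[i, j]]) ** 2, axis=1)
        indices.append(int(np.argmin(dist)))
    return indices


def split_vq_decode(indices, codebooks):
    """Rebuild the quantized vector from the transmitted indices."""
    frame = np.empty(2 * len(PAIRS))
    for (i, j), cb, idx in zip(PAIRS, codebooks, indices):
        frame[[i, j]] = cb[idx]
    return frame


def bits_per_frame(sizes):
    """Bits needed to send one index per codebook."""
    return sum(math.ceil(math.log2(n)) for n in sizes)
```

For example, with hypothetical sizes of 64 entries for six pairs and 256 for the last, each frame costs 6 × 6 + 8 = 44 bits before any framing or error-protection overhead.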

SLIDE 8

DSR

ETSI Standard DSR Front-End

Feature set consists of 12 mel-cepstral coefficients, logE, and c0
Quantization supports a data rate of 4800 b/s
Error protection supports robustness to transmission errors

[Figure: Input Speech → ADC → Offcom → Framing → PE → W → FFT → MF → LOG → DCT, plus logE → Feature Compression → Bit Stream Formatting → To Transmission Channel]

Abbreviations:
ADC: Analog-to-digital conversion
Offcom: Offset compensation
PE: Pre-emphasis
logE: Energy measure computation
W: Windowing
FFT: Fast Fourier transform
MF: Mel-filtering
LOG: Non-linear transform
DCT: Discrete cosine transform
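The front-end on this slide is a chain of standard signal-processing blocks. The Python/NumPy sketch below strings the same blocks together to show what each abbreviation does to a frame of speech. It is a simplified approximation, not a bit-exact implementation of the ETSI standard: the parameter values (8 kHz input, 25 ms frames every 10 ms, 256-point FFT, 23 mel channels from 64 Hz, pre-emphasis factor 0.97, offset-compensation factor 0.999, log floors of 1e-10) and the name `dsr_like_frontend` are assumptions, and the ADC, Feature Compression and Bit Stream Formatting blocks are omitted.

```python
import numpy as np

# Assumed 8 kHz configuration (illustrative, not bit-exact to the ETSI standard).
FS = 8000          # sampling rate, Hz
FRAME_LEN = 200    # 25 ms frames
FRAME_SHIFT = 80   # 10 ms frame shift
NFFT = 256         # FFT size
N_MEL = 23         # mel filterbank channels
N_CEP = 13         # cepstra c0..c12


def offset_compensation(x, alpha=0.999):
    """Offcom: notch filter y[n] = x[n] - x[n-1] + alpha * y[n-1] removes DC offset."""
    y = np.zeros_like(x)
    prev_x = prev_y = 0.0
    for n, xn in enumerate(x):
        prev_y = xn - prev_x + alpha * prev_y
        y[n] = prev_y
        prev_x = xn
    return y


def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(fmin=64.0):
    """MF: triangular filters with edges equally spaced on the mel scale."""
    edges = mel_to_hz(np.linspace(hz_to_mel(fmin), hz_to_mel(FS / 2), N_MEL + 2))
    bins = np.floor((NFFT + 1) * edges / FS).astype(int)
    fb = np.zeros((N_MEL, NFFT // 2 + 1))
    for i in range(N_MEL):
        left, centre, right = bins[i], bins[i + 1], bins[i + 2]
        for k in range(left, centre):
            fb[i, k] = (k - left) / (centre - left)
        for k in range(centre, right):
            fb[i, k] = (right - k) / (right - centre)
    return fb


def dsr_like_frontend(signal):
    """Return one 14-dim vector [c1..c12, c0, logE] per frame."""
    x = offset_compensation(np.asarray(signal, dtype=float))
    if len(x) < FRAME_LEN:
        return np.empty((0, N_CEP + 1))
    n_frames = 1 + (len(x) - FRAME_LEN) // FRAME_SHIFT
    window = np.hamming(FRAME_LEN)
    fb = mel_filterbank()
    # DCT: C_i = sum_j f_j * cos(pi * i * (j + 0.5) / N_MEL), i = 0..12
    dct = np.cos(np.pi * np.outer(np.arange(N_CEP), np.arange(N_MEL) + 0.5) / N_MEL)
    feats = []
    for t in range(n_frames):
        frame = x[t * FRAME_SHIFT : t * FRAME_SHIFT + FRAME_LEN]      # Framing
        log_e = np.log(max(np.sum(frame ** 2), 1e-10))               # logE, before PE
        frame = np.append(frame[0], frame[1:] - 0.97 * frame[:-1])   # PE
        mag = np.abs(np.fft.rfft(frame * window, NFFT))               # W + FFT (magnitude)
        log_mel = np.log(np.maximum(fb @ mag, 1e-10))                 # MF + LOG
        cep = dct @ log_mel                                           # DCT
        feats.append(np.concatenate([cep[1:], [cep[0], log_e]]))
    return np.array(feats)
```

For one second of audio at 8 kHz this yields 98 frames of 14 values, ordered c1..c12, c0, logE.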

SLIDE 9

Classifier

Polynomial Classifier

Compute the polynomial basis vector
Apply a polynomial discriminant function
Compute the score as the average across all frames

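$$d(\mathbf{x}, \mathbf{w}) = \mathbf{w}^t \mathbf{p}(\mathbf{x})$$

$$s = \frac{1}{M} \sum_{k=1}^{M} \mathbf{w}^t \mathbf{p}(\mathbf{x}_k) = \mathbf{w}^t \left( \frac{1}{M} \sum_{k=1}^{M} \mathbf{p}(\mathbf{x}_k) \right)$$

Given $\mathbf{x} = [x_1 \;\; x_2]^t$ and polynomial order $K = 2$: $\mathbf{p}(\mathbf{x}) = [1 \;\; x_1 \;\; x_2 \;\; x_1^2 \;\; x_1 x_2 \;\; x_2^2]^t$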

[Figure: DSR Feature Vectors x → Polynomial Basis Vector p(x) → Discriminant Function d(x, w) with Speaker Model w → Average (Σ) → Score s]
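The scoring steps on this slide translate directly into code. The Python/NumPy sketch below expands each feature vector into all monomials up to order K (constant term first, then degree 1, then degree 2, and so on, matching the ordering of the K = 2 example) and computes s as w^t times the average basis vector. The names `poly_basis` and `poly_score` are illustrative, and the speaker model w is taken as given, since training is not covered on the slides.

```python
from itertools import combinations_with_replacement

import numpy as np


def poly_basis(x, order):
    """p(x): all monomials of the entries of x up to total degree `order`, constant first."""
    x = np.asarray(x, dtype=float)
    terms = [1.0]
    for degree in range(1, order + 1):
        for idx in combinations_with_replacement(range(len(x)), degree):
            terms.append(float(np.prod(x[list(idx)])))
    return np.array(terms)


def poly_score(frames, w, order):
    """s = w^t * (1/M) * sum_k p(x_k): average the basis vectors, then one inner product."""
    mean_basis = np.mean([poly_basis(x, order) for x in frames], axis=0)
    return float(np.asarray(w, dtype=float) @ mean_basis)


# Usage: with x = [2, 3] and K = 2, p(x) = [1, 2, 3, 4, 6, 9].
example_basis = poly_basis([2.0, 3.0], 2)
```

Because the score is linear in w, averaging the basis vectors first and taking a single inner product gives the same result as averaging the per-frame discriminants. With 12 cepstral features and a 3rd-order polynomial, p(x) has C(15, 3) = 455 terms.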

SLIDE 10

Experimental Setup

YOHO Database

138 speakers
Enrollment

– 4 sessions
– 24 phrases per session
– Example phrase: “23-45-56”

Testing

– 10 sessions
– 4 phrases per session
– Example phrase: “45-23-56”

Speaker Verification System

Classifier: 3rd order polynomial
Features: 12 MFCCs from the DSR front-end
Channel: GSM bit-error masks
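The GSM bit-error masks corrupt the DSR bitstream the way a noisy radio channel would. The slides do not give the mask format, so the Python sketch below assumes a mask is a sequence of bits in which 1 marks a bit to corrupt, applied to the payload by XOR and repeated if it is shorter than the payload. The function name `apply_bit_error_mask` is hypothetical.

```python
import numpy as np


def apply_bit_error_mask(payload: bytes, mask_bits) -> bytes:
    """Flip each payload bit whose mask bit is 1 (XOR).

    Assumption: the mask is a 0/1 bit sequence, repeated if shorter than the payload.
    """
    bits = np.unpackbits(np.frombuffer(payload, dtype=np.uint8))
    mask = np.resize(np.asarray(mask_bits, dtype=np.uint8), bits.shape)
    return np.packbits(bits ^ mask).tobytes()


# Usage with a hypothetical mask that corrupts the first bit of every byte.
corrupted = apply_bit_error_mask(b"\x00\xff", [1, 0, 0, 0, 0, 0, 0, 0])
```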

SLIDE 11

Results

Performance

Average Equal Error Rate (%) for a 1-Phrase Test

Verify: EP3, EP2, EP1, Error-Free, Unquantized
Enroll: EP3, EP1, Error-Free, Unquantized

1.66 1.27 1.22 1.22
1.70 1.30 1.26 1.26
1.67 1.26 1.22 1.22
1.67 1.26 1.22 1.22
1.18
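The results are reported as equal error rates (EER), the operating point where the false acceptance rate (FAR) equals the false rejection rate (FRR). The Python sketch below shows one common way to estimate it from true-speaker and impostor scores: it sweeps the threshold T over the observed scores, accepts when score > T as on the verification slides, and returns the mean of FAR and FRR where they are closest. The function name and the tie handling are assumptions; the slides do not state how the averaged EERs were computed.

```python
import numpy as np


def equal_error_rate(genuine_scores, impostor_scores):
    """Estimate the EER by sweeping the threshold T over every observed score.

    A trial is accepted when score > T; a score equal to T is rejected (assumption).
    """
    genuine = np.asarray(genuine_scores, dtype=float)
    impostor = np.asarray(impostor_scores, dtype=float)
    best_gap, eer = np.inf, float("nan")
    for t in np.unique(np.concatenate([genuine, impostor])):
        frr = np.mean(genuine <= t)   # true speakers wrongly rejected
        far = np.mean(impostor > t)   # impostors wrongly accepted
        if abs(far - frr) < best_gap:
            best_gap, eer = abs(far - frr), (far + frr) / 2.0
    return float(eer)
```

Multiplying the returned fraction by 100 gives the percentage form used in the table.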

SLIDE 12

Conclusion

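Demonstrated that the ETSI Standard Distributed Speech Recognition Front-End is viable for speaker verification
Average 1-phrase EER on YOHO stayed between 1.18% and 1.70% across all enrollment/verification conditions tested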