Bayesian and Discriminative Speaker Adaptation - PowerPoint PPT Presentation


SLIDE 1

Bayesian and Discriminative Speaker Adaptation

Chih-Hsien Huang

Supervisor: Prof. Jen-Tzung Chien, National Cheng Kung University

SLIDE 2

Outline

  • INTRODUCTION
  • LARGE VOCABULARY CONTINUOUS SPEECH RECOGNITION
  • KEYPOINTS OF THIS TALK
  • BAYESIAN DURATION ADAPTATION
  • DISCRIMINATIVE LINEAR REGRESSION ADAPTATION
  • EXPERIMENTS
  • CONCLUSION AND FUTURE WORKS

SLIDE 3

INTRODUCTION

SLIDE 4

Why Is Speech Recognition Important?

  • Speech communication
    Speech communication is one of the basic and essential capabilities of human beings. Speech is the only way to exchange information without any tools.
  • Speech control
    Speech control is natural on mobile devices.
  • Automatic speech recognition
    Automatic speech recognition is important for broadcast news transcription. High-performance automatic speech recognition and summarization is desirable.

SLIDE 5

LARGE VOCABULARY CONTINUOUS SPEECH RECOGNITION

SLIDE 6

Elements of Speech Recognition

State-of-the-art speech recognizers are based on hidden Markov models (HMMs).

Parameter estimation is performed with the EM algorithm.

The decoding rule follows the MAP criterion.

The goal of a speech recognizer is to minimize the classification error.

SLIDE 7

Bayesian Decision Theory

Bayes rule:
$P(W \mid \mathbf{X}) = \dfrac{p(\mathbf{X} \mid W)\, P(W)}{p(\mathbf{X})}$

MAP decoding criterion:
$\hat{W} = \arg\max_{W}\, p(\mathbf{X} \mid W)\, P(W)$

SLIDE 8

Hidden Markov Models

Left-to-right HMM: [Figure: three-state left-to-right HMM with self-loop transitions $a_{11}, a_{22}, a_{33}$, forward transitions $a_{12}, a_{23}$, and output densities $b_1, b_2, b_3$.]

Parameters of an HMM:
  Initial probabilities $\pi = \{\pi_i\}$
  Transition probabilities $A = \{a_{ij}\}$
  Output probabilities $B = \{b_i(\cdot)\}$

Mixture of Gaussians:
$b_j(\mathbf{x}) = \sum_{m=1}^{M} c_{jm}\, N(\mathbf{x};\, \boldsymbol{\mu}_{jm}, \boldsymbol{\Sigma}_{jm})$
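As a hedged illustration of the mixture-of-Gaussians output density above (a minimal sketch, not code from the dissertation; diagonal covariances assumed), the per-state log likelihood can be computed as:

    import numpy as np

    def gmm_log_likelihood(x, weights, means, variances):
        """Log output probability log b_j(x) for one HMM state modeled as a
        mixture of diagonal-covariance Gaussians.
        x: (d,) observation; weights: (M,); means, variances: (M, d)."""
        d = x.shape[0]
        # Per-mixture log N(x; mu_m, Sigma_m) with diagonal Sigma_m
        log_norm = -0.5 * (d * np.log(2 * np.pi) + np.sum(np.log(variances), axis=1))
        log_exp = -0.5 * np.sum((x - means) ** 2 / variances, axis=1)
        log_components = np.log(weights) + log_norm + log_exp
        # Log-sum-exp over mixture components for numerical stability
        m = np.max(log_components)
        return m + np.log(np.sum(np.exp(log_components - m)))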

SLIDE 9

Large Vocabulary Continuous Speech Recognition

[Figure: LVCSR system diagram. The speech signal passes through feature extraction to produce feature vectors; the decoder combines hidden Markov models, n-gram language models, and a lexicon tree to produce the recognition results.]

SLIDE 10

Lexicon

  • Linear structure
  • Tree structure

[Figure: example lexicon tree over Mandarin subsyllables (Zhuyin), sharing common prefixes among words such as 辦公, 成功, 成長, and 成功大學.]

SLIDE 11

Search Algorithm

[Figure: trellis of states $1 \ldots J(k)$ of the $k$-th subsyllable over observation frames $1 \ldots T$, showing transitions within a subsyllable and transitions across subsyllables.]

Transitions within a subsyllable:
$Q(t, k, j) = b_{k,j}(\mathbf{x}_t) + \max_{j-1 \le j' \le j} Q(t-1, k, j')$

Transitions across subsyllables:
$Q(t, k, 1) = b_{k,1}(\mathbf{x}_t) + \max\big\{ \max_{1 \le k' \le K} Q(t-1, k', J(k')),\; Q(t-1, k, 1) \big\}$

SLIDE 12

Tree-Copy Search Concept

[Figure: tree-copy search. A copy of the lexicon tree (V trees) is spawned for each word history, with look-ahead probabilities such as P(·|從此), P(·|無從), P(·|台視), P(·|開始), P(·|了), P(·|我), P(·|在), P(·|在職), P(·|清華), and P(·|sil).]

Language model look-ahead, acoustic look-ahead, Q(word history, arc, state)

SLIDE 13

[Figure: time-synchronous decoding over words A, B, C and silence (sil); acoustic model scores accumulate within words and language model scores are applied at word boundaries along the time axis t.]

SLIDE 14

Search Algorithm

Proceed from left to right over time t.

Acoustic level: process states of lexical trees
  Initialization:
  $Q_v(t-1, s=0) = H(v; t-1), \qquad B_v(t-1, s=0) = t-1$
  Time alignment:
  $Q_v(t, s) = \max_{s'} \big\{ p(\mathbf{x}_t, s \mid s') \cdot Q_v(t-1, s') \big\}$
  Propagate back pointers $B_v(t, s)$
  Prune unlikely hypotheses

Word pair level: process word ends
  For each pair $(v, w)$:
  $H(w; t) = \max_{v} \big\{ p(w \mid v) \cdot Q_v(t, S_w) \big\}$
  Store best predecessor:
  $v_0(w; t) = \arg\max_{v} \big\{ p(w \mid v) \cdot Q_v(t, S_w) \big\}$
  Store best boundary:
  $\tau(w; t) = B_{v_0(w;t)}(t, S_w)$
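As a hedged sketch of the time-alignment step above (simplified to a single tree copy, log-domain scores, no pruning; not the dissertation's implementation):

    import numpy as np

    def viterbi_step(prev_scores, log_trans, log_obs):
        """One time-synchronous DP step over the states of a lexical tree.
        prev_scores: (S,) log scores Q(t-1, s'); log_trans: (S, S) log transition
        scores log p(s | s'); log_obs: (S,) log observation scores at time t.
        Returns the new scores Q(t, s) and the best predecessor of each state."""
        # Q(t, s) = log_obs[s] + max_{s'} [ Q(t-1, s') + log_trans[s', s] ]
        candidate = prev_scores[:, None] + log_trans      # shape (S', S)
        best_prev = np.argmax(candidate, axis=0)          # back pointers per state
        scores = log_obs + np.max(candidate, axis=0)
        return scores, best_prev

Back pointers propagated this way recover the best word boundary once a word-end state is reached at the word pair level.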

SLIDE 15

Mismatch Problem

Many mismatch sources exist between training and test data in real applications.

The most popular technique is to conduct speaker/environment adaptation:
  Maximum a posteriori (MAP)
  Speaker clustering
  Linear regression

SLIDE 16

[Figure: adaptation framework. A speech database is used to train speaker-independent acoustic models; adaptation data transforms them into speaker-adapted models for testing, bridging the training/testing mismatch.]

SLIDE 17

Keypoints of This Talk

Bayesian Duration Adaptation
  Parametric duration modeling: Gaussian, Poisson and gamma distributions
  Joint sequential learning of the acoustic model and the duration model
  QB estimates of Gaussian and Poisson duration models were formulated.
  The reproducible prior/posterior property was exploited.

SLIDE 18

Aggregate a Posteriori Linear Regression

Robustness
  The prior information of the regression matrix is considered.
  The relation between AAPLR and MAPLR was illustrated.

Discriminative adaptation
  The AAP criterion can be represented in the form of a minimum error rate.

Rapid adaptation
  AAPLR has a closed-form solution and is superior to traditional discriminative adaptation (MCELR).

SLIDE 19

BAYESIAN DURATION ADAPTATION

SLIDE 20

Background Knowledge

  • Speaking rate
    Speaking rate is one of the mismatch sources between training and testing.
    In a standard HMM, the state duration is represented by the transition probability.

Non-parametric approaches
  Ferguson explicitly modeled the duration; this requires too many parameters.

Parametric approaches
  Russell and Moore applied the Poisson distribution; Levinson applied the gamma distribution.

SLIDE 21

Parametric Duration Modeling

The HMM parameter set is extended with the state duration:
  Initial state probability, transition probability, observation density, and duration density.

Maximum likelihood criterion
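As a hedged restatement in standard notation (not necessarily the slide's exact symbols), the extended parameter set and the ML criterion can be written as

$\lambda = \{\pi_i,\; a_{ij},\; b_j(\cdot),\; d_j(\cdot)\}, \qquad \hat{\lambda}_{\mathrm{ML}} = \arg\max_{\lambda}\, p(\mathbf{X} \mid \lambda)$

where $d_j(\tau)$ denotes the duration density of state $j$.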

SLIDES 22-25

[Figure: relative frequency (%) vs. duration length; successive slides overlay fitted curves, with the final slide comparing the empirical distribution against geometric, Gaussian, Poisson, and gamma distributions.]

SLIDE 26

Parametric Duration Models

Duration models and their prior distributions
  • Gaussian distribution with Gaussian prior
  • Poisson distribution with gamma prior
  • Gamma distribution with Gaussian prior

Estimation criteria
  ML estimation, MAP estimation, QB estimation

SLIDE 27

ML Parameter Estimation

Auxiliary Q-function
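As a hedged sketch (standard EM form, not necessarily the dissertation's exact expression), the auxiliary function for an HMM extended with duration densities is

$Q(\lambda' \mid \lambda) = E\big[\log p(\mathbf{X}, S \mid \lambda') \mid \mathbf{X}, \lambda\big] = \sum_{S} p(S \mid \mathbf{X}, \lambda)\, \log p(\mathbf{X}, S \mid \lambda')$

where the duration densities contribute an additive term $\sum_{j}\sum_{\tau} P(\tau_j = \tau \mid \mathbf{X}, \lambda)\, \log d_j(\tau \mid \lambda')$, so the duration parameters can be re-estimated separately in the M-step.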

SLIDE 28

ML Estimation for Different Duration Parameters

  • Gaussian duration parameters

SLIDE 29

  • Poisson duration parameters
  • Gamma duration parameters
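As a hedged sketch of the standard ML results (not necessarily the dissertation's exact expressions), with $\gamma_j(\tau)$ the expected count of state $j$ lasting $\tau$ frames:

Gaussian: $\hat{\mu}_j = \dfrac{\sum_{\tau} \gamma_j(\tau)\,\tau}{\sum_{\tau} \gamma_j(\tau)}, \qquad \hat{\sigma}_j^2 = \dfrac{\sum_{\tau} \gamma_j(\tau)\,(\tau - \hat{\mu}_j)^2}{\sum_{\tau} \gamma_j(\tau)}$

Poisson: $\hat{\lambda}_j = \dfrac{\sum_{\tau} \gamma_j(\tau)\,\tau}{\sum_{\tau} \gamma_j(\tau)}$

Gamma (shape $\nu_j$, rate $\eta_j$): $\hat{\eta}_j = \hat{\nu}_j / \bar{\tau}_j$, with $\hat{\nu}_j$ having no closed form and solved numerically.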

SLIDE 30

Bayesian Learning of Duration Models

MAP batch learning
  • Risk function
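As a hedged sketch in standard form, the MAP estimate minimizes the Bayes risk under a zero-one loss,

$R(\hat{\lambda}) = \int L(\hat{\lambda}, \lambda)\, p(\lambda \mid \mathbf{X})\, d\lambda, \qquad \hat{\lambda}_{\mathrm{MAP}} = \arg\max_{\lambda}\, p(\mathbf{X} \mid \lambda)\, g(\lambda)$

where $g(\lambda)$ is the prior density of the duration parameters.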

SLIDE 31

QB Sequential Learning

  • Risk function
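As a hedged sketch of quasi-Bayes sequential learning in standard form: at epoch $n$, the current adaptation data $\mathbf{X}^{(n)}$ are combined with the prior carried over from epoch $n-1$,

$\hat{\lambda}^{(n)} = \arg\max_{\lambda}\, p(\mathbf{X}^{(n)} \mid \lambda)\, g(\lambda \mid \varphi^{(n-1)})$

and the hyperparameters are then updated to $\varphi^{(n)}$ so that the approximate posterior stays in the same family as the prior, which is the reproducible prior/posterior property exploited in this work.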

SLIDE 32

MAP Estimation for Gamma Duration Parameters

  • Gamma duration with Gaussian prior
  • M-step for the parameter η

SLIDE 33

  • For the parameter ν, no closed-form solution exists; Newton's algorithm can be applied.
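As a hedged illustration only (the ML case with observed durations, not the dissertation's MAP update; NumPy/SciPy assumed, and the helper name is hypothetical), Newton's algorithm for the gamma shape parameter ν solves $\log\nu - \psi(\nu) = \log\bar{\tau} - \overline{\log\tau}$:

    import numpy as np
    from scipy.special import digamma, polygamma

    def gamma_shape_newton(durations, num_iters=20):
        """Estimate the gamma shape parameter nu by Newton's method (ML case).
        Solves log(nu) - digamma(nu) = log(mean(tau)) - mean(log(tau))."""
        tau = np.asarray(durations, dtype=float)
        c = np.log(tau.mean()) - np.log(tau).mean()   # right-hand-side constant
        nu = 0.5 / c                                  # rough initial guess
        for _ in range(num_iters):
            f = np.log(nu) - digamma(nu) - c
            f_prime = 1.0 / nu - polygamma(1, nu)
            nu -= f / f_prime                         # Newton update
        eta = nu / tau.mean()                         # rate parameter has closed form
        return nu, eta

The rate parameter then follows in closed form, mirroring the closed-form M-step for η on the previous slide.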

SLIDE 34

QB Estimation for Gaussian Duration Parameters

Gaussian duration with Gaussian prior. The QB estimate is obtained by:
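A hedged sketch of the standard conjugate update behind this estimate: with a $N(\mu_0, \sigma_0^2)$ prior on the duration mean, known variance $\sigma^2$, and $n$ observed durations with sample mean $\bar{\tau}$,

$\hat{\mu}_{\mathrm{QB}} = \dfrac{n\,\sigma_0^2\,\bar{\tau} + \sigma^2\,\mu_0}{n\,\sigma_0^2 + \sigma^2}, \qquad \sigma_{\mathrm{post}}^2 = \dfrac{\sigma^2\,\sigma_0^2}{n\,\sigma_0^2 + \sigma^2}$

so the posterior is again Gaussian and its hyperparameters can be carried into the next adaptation epoch.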

SLIDE 35

QB Estimation for Poisson Duration Parameters

Poisson duration with gamma prior

E-step
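A hedged sketch of the standard gamma-Poisson conjugacy behind this update: with prior $\mathrm{Gamma}(\lambda \mid \alpha, \beta)$ on the Poisson rate and observed durations $\tau_1, \ldots, \tau_n$,

$p(\lambda \mid \tau_{1:n}) = \mathrm{Gamma}\big(\lambda \mid \alpha + \textstyle\sum_{i} \tau_i,\; \beta + n\big)$

so the posterior hyperparameters remain of gamma form and the posterior mode can serve as the QB point estimate.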

SLIDE 36

Updating Hyperparameters

  • Gamma hyperparameters:
  • Poisson parameters:

SLIDE 37

DISCRIMINATIVE LINEAR REGRESSION ADAPTATION

SLIDE 38

Estimation Criteria

Distribution estimation and discriminative training are the two categories of HMM parameter estimation approaches.

Distribution estimation
  • Maximum likelihood criterion
  • Maximum a posteriori criterion

Discriminative training
  • Minimum classification error (MCE) criterion
  • Maximum mutual information (MMI) criterion

SLIDE 39

Discriminative Training

  • MCE
    Misclassification measure:
    $d_m(\mathbf{X}; \lambda) = -g_m(\mathbf{X}; \lambda) + \log\Big[\dfrac{1}{M-1}\sum_{j \in \Omega_m} \exp[g_j(\mathbf{X}; \lambda)]\Big]$
    Loss function:
    $\ell(d_m(\mathbf{X}, \lambda)) = \dfrac{1}{1 + \exp(-d_m(\mathbf{X}, \lambda))}$
    GPD update:
    $\lambda^{(k+1)} = \lambda^{(k)} - \varepsilon_k\, U\, \nabla \ell(d_m(\mathbf{X}; \lambda^{(k)}))$

  • MMI
    Mutual information:
    $I(W_m, \mathbf{X}) = \log \dfrac{p(\mathbf{X}, W_m)}{p(\mathbf{X})\, P(W_m)} = \log p(\mathbf{X} \mid W_m) - \log \sum_{j=1}^{M} p(\mathbf{X} \mid W_j)\, P(W_j)$
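A hedged numerical sketch (not the dissertation's implementation) of the MCE misclassification measure and sigmoid loss computed from per-class discriminant scores:

    import numpy as np

    def mce_loss(scores, target):
        """scores: (M,) discriminant values g_j(X; lambda); target: correct class index.
        Returns the misclassification measure d and the sigmoid loss l(d)."""
        g_target = scores[target]
        competitors = np.delete(scores, target)
        # log of the average exponentiated competing scores (stable log-sum-exp)
        m = np.max(competitors)
        log_avg = m + np.log(np.mean(np.exp(competitors - m)))
        d = -g_target + log_avg
        loss = 1.0 / (1.0 + np.exp(-d))
        return d, loss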

SLIDE 40

Linear Regression Adaptation

[Figure: model transformation (e.g. MLLR, MAPLR) maps initial models to adapted models; Gaussians are grouped into regression classes (Class 1, Class 2).]

SLIDE 41

MLLR

Linear transformation:
$\hat{\boldsymbol{\mu}}_m = \mathbf{W}_{r(m)}\, \boldsymbol{\xi}_m$

MLLR criterion:
$\mathbf{W}_{\mathrm{ML}} = \arg\max_{\mathbf{W}}\, p(\mathbf{X} \mid \Lambda, \mathbf{W})$

Solution (i-th row of $\mathbf{W}_r$, diagonal covariances):
$\mathbf{w}_{ri}^{\mathrm{ML}} = \Big( \sum_{t}\sum_{m \in \Omega_r} \dfrac{\varsigma_t(m)}{\sigma_{mi}^2}\, x_{t,i}\, \boldsymbol{\xi}_m^{T} \Big)\Big( \sum_{t}\sum_{m \in \Omega_r} \dfrac{\varsigma_t(m)}{\sigma_{mi}^2}\, \boldsymbol{\xi}_m \boldsymbol{\xi}_m^{T} \Big)^{-1}$
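A hedged sketch (assumed array shapes, diagonal covariances; not the dissertation's code) of accumulating the statistics and solving for one row $\mathbf{w}_{ri}$ of the MLLR transform as in the solution above:

    import numpy as np

    def mllr_row(occupancies, frames, xis, variances_i, dim_i):
        """Solve one row w_ri of the MLLR transform (diagonal covariance case).
        occupancies: (T, M) occupation probabilities; frames: (T, d) observations;
        xis: (M, d+1) extended mean vectors; variances_i: (M,) i-th variances."""
        G = np.zeros((xis.shape[1], xis.shape[1]))   # accumulates weighted xi xi^T
        k = np.zeros(xis.shape[1])                   # accumulates weighted x_{t,i} xi^T
        for t in range(frames.shape[0]):
            for m in range(xis.shape[0]):
                w = occupancies[t, m] / variances_i[m]
                G += w * np.outer(xis[m], xis[m])
                k += w * frames[t, dim_i] * xis[m]
        return np.linalg.solve(G, k)                 # G is symmetric, so this gives k G^{-1}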

SLIDE 42

MAPLR

Prior of regression matrices:
$g(\mathbf{W}_r) \propto \prod_{i=1}^{d} |\Sigma_{ri}|^{-1/2} \exp\big( -\tfrac{1}{2}(\mathbf{w}_{ri} - \mathbf{m}_{ri})\, \Sigma_{ri}^{-1} (\mathbf{w}_{ri} - \mathbf{m}_{ri})^{T} \big)$

MAP criterion:
$\mathbf{W}_{\mathrm{MAP}} = \arg\max_{\mathbf{W}}\, p(\mathbf{W} \mid \mathbf{X}, \Lambda) = \arg\max_{\mathbf{W}}\, p(\mathbf{X} \mid \Lambda, \mathbf{W})\, g(\mathbf{W})$

Solution:
$\mathbf{w}_{ri}^{\mathrm{MAP}} = \Big( \mathbf{m}_{ri}\Sigma_{ri}^{-1} + \sum_{t}\sum_{m \in \Omega_r} \dfrac{\varsigma_t(m)}{\sigma_{mi}^2}\, x_{t,i}\, \boldsymbol{\xi}_m^{T} \Big)\Big( \Sigma_{ri}^{-1} + \sum_{t}\sum_{m \in \Omega_r} \dfrac{\varsigma_t(m)}{\sigma_{mi}^2}\, \boldsymbol{\xi}_m \boldsymbol{\xi}_m^{T} \Big)^{-1}$

SLIDE 43

MCELR & CMLLR

MCELR
  Discriminant function:
  $g_r(\mathbf{X}; \mathbf{W}_r) = \sum_{t}\sum_{m \in \Omega_r} \varsigma_t(m)\, \log p(\mathbf{x}_t \mid \lambda_m, \mathbf{W}_r)$
  Solution: iterative update of $\mathbf{w}_{ri}^{(k+1)}$ from $\mathbf{w}_{ri}^{(k)}$ with step size $\varepsilon^{(k)}$.

CMLLR
  Solution: row-wise update with smoothing constant $D_m = E \cdot \sum_{t} \varsigma_t(m)$.

SLIDE 44

AAP Criterion

Let
$p(\mathbf{X}_{m,n} \mid \lambda_m) = \prod_{t=1}^{T_n} p(\mathbf{x}_{m,n,t} \mid \lambda_m)$

Then
$J_{\mathrm{AAP}}(\Lambda) = \sum_{m=1}^{M}\sum_{n=1}^{N_m} p_{\mathrm{AAP}}(\lambda_m \mid \mathbf{X}_{m,n}) = \sum_{m=1}^{M}\sum_{n=1}^{N_m} \dfrac{p(\mathbf{X}_{m,n} \mid \lambda_m)\, P(\lambda_m)}{\sum_{j=1}^{M} p(\mathbf{X}_{m,n} \mid \lambda_j)\, P(\lambda_j)}$

Loss function
$J_{\mathrm{AAP}}(\Lambda) = \sum_{m=1}^{M}\sum_{n=1}^{N_m} \ell\big(d_{\mathrm{AAP}}(\mathbf{X}_{m,n}, \lambda_m)\big)$
$d_{\mathrm{AAP}}(\mathbf{X}_{m,n}, \lambda_m) = \log p(\mathbf{X}_{m,n} \mid \lambda_m)\, P(\lambda_m) - \log \sum_{j \in \Omega_m} p(\mathbf{X}_{m,n} \mid \lambda_j)\, P(\lambda_j)$
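A hedged numerical sketch of the AAP objective above: the sum over training tokens of the posterior probability of the correct class, computed from log-likelihoods and class priors (not the dissertation's code):

    import numpy as np

    def aap_objective(log_likelihoods, log_priors, labels):
        """log_likelihoods: (N, M) values log p(X_n | lambda_j); log_priors: (M,);
        labels: (N,) correct class of each token. Returns J_AAP."""
        scores = log_likelihoods + log_priors              # log [p(X|lambda_j) P(lambda_j)]
        # Posterior of each class per token via a numerically stable softmax
        m = scores.max(axis=1, keepdims=True)
        post = np.exp(scores - m)
        post /= post.sum(axis=1, keepdims=True)
        return post[np.arange(len(labels)), labels].sum()  # sum of correct-class posteriors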

SLIDE 45

AAPLR and MAPLR

AAPLR:
$\mathbf{W}_r^{\mathrm{AAP}} = \arg\max_{\mathbf{W}_r} J_{\mathrm{AAP}}(\mathbf{W}_r) = \arg\max_{\mathbf{W}_r} \Big[ \sum_{m=1}^{M}\sum_{n=1}^{N_m} \dfrac{p(X_{m,n} \mid \lambda_m, \mathbf{W}_r)\, P(\lambda_m)\, g(\mathbf{W}_r)}{p(X_{m,n})} \Big]$

MAPLR:
$J_{\mathrm{MAP}}(\mathbf{W}_r) = \sum_{m=1}^{M}\sum_{n=1}^{N_m} \log \dfrac{p(X_{m,n} \mid \lambda_m, \mathbf{W}_r)\, g(\mathbf{W}_r)}{p(X_{m,n})}$

AAP:
$J_{\mathrm{AAP}}(\mathbf{W}_r) = \sum_{m=1}^{M}\sum_{n=1}^{N_m} \dfrac{p(X_{m,n} \mid \lambda_m, \mathbf{W}_r)\, P(\lambda_m)\, g(\mathbf{W}_r)}{p(X_{m,n})}$

SLIDE 46

Arrangement of AAPLR Criterion

In the form of MCE:
$J_{\mathrm{AAP}}(\mathbf{W}_r) = \sum_{m=1}^{M}\sum_{n=1}^{N_m} \ell\big(d_{\mathrm{AAP}}(X_{m,n}, \mathbf{W}_r)\big)$

Misclassification measure:
$d_{\mathrm{AAP}}(X_{m,n}, \mathbf{W}_r) = g_m(X_{m,n}; \lambda_m, \mathbf{W}_r) - \log\Big\{\dfrac{1}{M-1}\sum_{j \in \Omega_r} \exp[g_j(X_{m,n}; \lambda_j, \mathbf{W}_r)]\Big\}$

where
$g_m(X_{m,n}; \lambda_m, \mathbf{W}_r) = \log\big\{ p(X_{m,n} \mid \lambda_m, \mathbf{W}_r)\, g(\mathbf{W}_r) \big\}$

Adopting a diagonal covariance matrix:
$p(\mathbf{x}_{m,n,t} \mid \boldsymbol{\mu}_m, \boldsymbol{\Sigma}_m, \mathbf{W}_r) = (2\pi)^{-d/2} \prod_{i=1}^{d} (\sigma_{mi}^{2})^{-1/2} \exp\Big[-\dfrac{1}{2}\sum_{i=1}^{d} \dfrac{(x_{m,n,t,i} - \mathbf{w}_{ri}\boldsymbol{\xi}_m)^2}{\sigma_{mi}^{2}}\Big]$

SLIDE 47

AAPLR Solution

A closed-form solution for each row $\mathbf{w}_{ri}$ is obtained in terms of the quantities
$\Psi_r(X_{m,n}) = 1 \Big/ \sum_{j \in \Omega_r} \exp[g_j(X_{m,n}; \lambda_j, \mathbf{W}_r)]$
$\Phi_j(X_{m,n}) = \exp[g_j(X_{m,n}; \lambda_j, \mathbf{W}_r)]$
$L_r(X_{m,n}) = \ell\big(d_{\mathrm{AAP}}(X_{m,n}, \mathbf{W}_r)\big)\,\big(1 - \ell(d_{\mathrm{AAP}}(X_{m,n}, \mathbf{W}_r))\big)$
together with the target and competing sufficient statistics and the prior terms $\mathbf{m}_{ri}$, $\Sigma_{ri}^{-1}$.

SLIDE 48

                            Joint Probability   Competing Hypothesis   Bayesian Learning   Closed-Form Solution
MLLR                        Product             No                     No                  Yes
MAPLR                       Product             No                     Yes                 Yes
MCELR [Wu and Huo, 2002]    Product             Yes                    No                  No
MCELR [He and Chou, 2003]   Product             Yes                    No                  Yes
CMLLR                       Product             Yes                    No                  Yes
MPELR                       Sum                 Yes                    No                  Yes
AAPLR                       Sum                 Yes                    Yes                 Yes

SLIDES 49-51

Discriminability of Distributions

[Figure: histograms of frequency of occurrences vs. log likelihood, comparing the target-model and competing-model score distributions under MLLR and under AAPLR.]

SLIDE 52

EXPERIMENTS

SLIDE 53

Database Settings

Training database
  Connected digits
    1000 utterances; 50 male and 50 female speakers
  TCC300 microphone speech database
    About 16 hours; 100 speakers for training

SLIDE 54

Testing database
  Car noisy speech database
    Speech recorded in a car at 50 km/h by 10 speakers
  TCC300 database
    20 speakers
  Broadcast news database
    Radio stations; Public Television Service News
    MATBN database; anchor speech

SLIDE 55

Feature Extraction

All utterances were sampled at 16 kHz with 16-bit resolution.

Feature representation
  12 MFCCs, 1 log energy, and their derivatives

Channel effect removal
  Cepstral mean subtraction (CMS) per utterance

SLIDE 56

Bayesian Duration Adaptation

SLIDE 57

Comparison of Speaking Rates and Baseline Recognition Performances

Speaking rate (syl/sec)
  Database   TCC300   Broadcast news
  Male       3.55     5.50
  Female     4.86     5.47

Baseline SER (%)
  Without duration model   38.2
  Gaussian duration        36.9
  Poisson duration         36.4
  Gamma duration           35.6

SLIDE 58

Evaluation of Different Duration Models

KL divergence measure:
$\int \big[\hat{d}_e(\tau) - \hat{d}(\tau)\big] \log \dfrac{\hat{d}_e(\tau)}{\hat{d}(\tau)}\, d\tau$
where $\hat{d}_e(\tau)$ is the empirical distribution and $\hat{d}(\tau)$ is the estimated parametric distribution.

Parametric distribution   Divergence
Gaussian                  0.243
Poisson                   0.185
Gamma                     0.134
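A hedged sketch of the divergence computation above for discrete duration lengths (symmetric form, with a small floor to avoid division by zero; not the dissertation's code):

    import numpy as np

    def symmetric_kl(empirical, fitted, eps=1e-12):
        """Symmetric KL divergence between an empirical duration histogram
        and a fitted parametric distribution over the same duration bins."""
        p = np.clip(np.asarray(empirical, dtype=float), eps, None)
        q = np.clip(np.asarray(fitted, dtype=float), eps, None)
        p /= p.sum()
        q /= q.sum()
        return float(np.sum((p - q) * np.log(p / q)))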

SLIDE 59

SER with/without MAP Adaptation

[Figure: syllable error rate (%) with MAP adaptation (N=30 and N=5 adaptation utterances) vs. no adaptation, comparing no duration model, Gaussian duration + Gaussian prior, Gaussian duration + gamma prior, Poisson duration + gamma prior, and gamma duration + Gaussian prior.]

SLIDES 60-62

Poisson Duration Modeling

[Figure: relative frequency (%) vs. duration length; the empirical distribution compared with the Poisson distribution and its sequential updates at the 3rd and 5th adaptation epochs.]

SLIDES 63-65

Gaussian & Poisson Duration Modeling

[Figure: relative frequency (%) vs. duration length; the empirical distribution compared with the Gaussian and Poisson distributions and their updates at the 5th adaptation epoch.]

SLIDE 66

[Figure: syllable error rate (%) vs. number of adaptation utterances (5 to 30), comparing the baseline, Gaussian duration + Gaussian prior, and Poisson duration + gamma prior.]

SLIDE 67

Evaluation of Computational Cost

                            Gaussian   Poisson   Gamma
Recognition time (ms/syl)   2.6        2.5       3.2
Adaptation time (ms/syl)    2.7        2.7       4.2

SLIDE 68

Discriminative Linear Regression Adaptation

SLIDE 69

Adaptation Strategies

Two-pass adaptation strategy
  Task adaptation
    200 utterances (30 min)
  Speaker adaptation
    60 utterances (14 min); one male and one female reporter
  Testing set
    40 utterances (9 min)

SLIDE 70

Supervised Adaptation

[Figure: syllable error rate (%) vs. number of adaptation utterances N (5 to 60) for supervised adaptation with R=4 regression classes, comparing MLLR, MAPLR, MCELR, CMLLR, and AAPLR.]

SLIDE 71

Computational Cost

[Figure: adaptation time (sec) vs. number of adaptation utterances N (5 to 60), comparing MLLR, MAPLR, MCELR, CMLLR, and AAPLR.]

SLIDE 72

CONCLUSIONS AND FUTURE WORKS

SLIDE 73

Conclusions – Duration Model

  • A joint Bayesian learning framework for HMM and duration parameters was proposed.
  • Gaussian, Poisson, and gamma densities for duration modeling were evaluated.
  • QB estimates for Gaussian and Poisson duration models were formulated.
  • The reproducible prior/posterior property was applied to establish the updating mechanism for the prior statistics.

SLIDE 74

Conclusions – AAPLR

  • The aggregate a posteriori linear regression algorithm was proposed for speaker adaptation.
    Broadcast news transcription was carried out to evaluate the performance improvement.
  • The AAP criterion was introduced to achieve model discriminability and rapid parameter estimation.
  • A closed-form solution to AAPLR was derived to achieve desirable adaptation performance.

SLIDE 75

Future Works

Duration modeling
  Alternative distributions, e.g. alpha-stable distributions, will be investigated.
  Application to higher acoustic levels, e.g. sub-syllable, syllable, and word.

Discriminative Linear Regression Adaptation
  Convergence problem
  Sum of probabilities vs. product of probabilities

SLIDE 76

References

  • Basics
    Chou, W. and Juang, B.-H., 2003. Pattern Recognition in Speech and Language Processing. CRC Press.
    DeGroot, M. H., 1970. Optimal Statistical Decisions. McGraw-Hill.
    Dempster, A. P., Laird, N. M. and Rubin, D. B., 1977. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society (B), vol. 39, pp. 1-38.
    Duda, R. O., Hart, P. E. and Stork, D. G., 2001. Pattern Classification. John Wiley & Sons, Inc.

  • Bayesian Duration Adaptation
    Chien, J.-T. and Huang, C.-H., 2003. Bayesian learning of speech duration models. IEEE Transactions on Speech and Audio Processing, vol. 11, no. 6, pp. 558-567.

  • Discriminative Linear Regression Adaptation
    Chien, J.-T. and Huang, C.-H., 2006. Aggregate a posteriori linear regression. IEEE Transactions on Audio, Speech, and Language Processing.