Adaptation Techniques for Acoustic Models – PowerPoint PPT Presentation

Jen-Wei Roger Kuo, Speech Lab, CSIE, NTNU – rogerkuo@csie.ntnu.edu.tw


SLIDE 1

Adaptation Techniques for Acoustic Models

Jen-Wei Roger Kuo
Speech Lab, CSIE, NTNU
rogerkuo@csie.ntnu.edu.tw
SLIDE 2

Presentation Outline

  • Background
  • Maximum a Posteriori (MAP)
  • Maximum Likelihood Linear Regression (MLLR)
  • MLLR: Another Approach
  • Constrained Maximum Likelihood Linear Regression (CMLLR)
  • Maximum a Posteriori Linear Regression (MAPLR)
  • Structural Maximum a Posteriori (SMAP)
  • Joint MAP and MLLR
  • Appendix – Matrix Calculus
SLIDE 3

Background
SLIDE 4

Reference

  • Adaptive Methods for Speech and Speaker Recognition – PSTL, Jean-Claude Junqua and Roland Kuhn
SLIDE 5

Background

  • Intra-Speaker Variability
SLIDE 6

Background

  • Inter-Speaker Variability
SLIDE 7

Background

  • Adaptive System
SLIDE 8

Background

  • Goal

– Produce a system with Speaker-Dependent (SD) performance given only a small amount of data from a new speaker.
SLIDE 9

Background

  • Main Speaker Adaptation Methods

– Bayesian
– Transformation-based
– Based on clustering and model selection
SLIDE 10

Background

  • Comparison of Methods

SLIDE 11

Background

  • Direct and indirect adaptation
SLIDE 12

Approaches

(diagram: taxonomy of adaptation approaches)

– Direct adaptation: Maximum Likelihood (ML, Baum-Welch training), Maximum a Posteriori (MAP), On-line / Quasi-Bayes (QB), Maximum Classification Error (MCE)
– Indirect adaptation (transformation): MLLR, CMLLR, MCELR, MAPLR-Mean, MAPLR-Covariance, SMAPLR, EMAPLR, QBLR
– Eigen approach: EigenVoice, Eigen-MLLR, Eigen-MAPLR
– Structural hierarchy: SMAP
SLIDE 13

Maximum a Posteriori
SLIDE 14

Maximum a Posteriori

  • Reference:

– A Study on Speaker Adaptation of the Parameters of Continuous Density Hidden Markov Models – SAP'91, C.-H. Lee
– Bayesian Adaptive Learning of the Parameters of Hidden Markov Model for Speech Recognition – TR'92, Q. Huo
– Maximum a Posteriori Estimation for Multivariate Gaussian Mixture Observations of Markov Chains – SAP'94, J.-L. Gauvain
– Bayesian Adaptive Learning of the Parameters of Hidden Markov Model for Speech Recognition – SAP'95, Q. Huo
– On Adaptive Decision Rules and Decision Parameter Adaptation for Automatic Speech Recognition – Proceedings of IEEE'00, C.-H. Lee
SLIDE 15

Maximum a Posteriori

  • Introduction
  • MAP estimate for Multinomial
    – Prior: Dirichlet
  • MAP estimate for DHMM
    – Prior: Dirichlet
  • MAP estimate for SCHMM
    – Prior: Dirichlet + normal-Wishart
  • MAP estimate for CDHMM
SLIDE 16

Introduction

$X$: observation data for adaptation; $\lambda$: current acoustic model; $\hat{\lambda}$: re-estimated acoustic model.

Maximum Likelihood Estimation (ML): $\hat{\lambda}_{ML} = \arg\max_{\lambda} p(X \mid \lambda)$

Maximum a Posteriori Estimation (MAP): $\hat{\lambda}_{MAP} = \arg\max_{\lambda} p(\lambda \mid X) = \arg\max_{\lambda} \frac{p(X \mid \lambda)\, p(\lambda)}{p(X)} = \arg\max_{\lambda} p(X \mid \lambda)\, p(\lambda)$
SLIDE 17

Introduction

To estimate the mean of a particular Gaussian:

Use MLE: $\hat{\mu}_{ML} = \frac{\sum_{t=1}^{T} \gamma_t x_t}{\sum_{t=1}^{T} \gamma_t}$

Use MAP: $\hat{\mu}_{MAP} = \frac{\tau \mu_{prior} + \sum_{t=1}^{T} \gamma_t x_t}{\tau + \sum_{t=1}^{T} \gamma_t}$

The prior is Gaussian; $\tau$ is the strength of faith in the prior: if it is big, movement away from the prior will be slow.
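As a concrete illustration, the two estimators above can be written in a few lines (a minimal sketch; the function and variable names are mine, not from the slides):

```python
import numpy as np

def ml_mean(x, gamma):
    """ML estimate of a Gaussian mean from occupancy-weighted frames."""
    return (gamma[:, None] * x).sum(axis=0) / gamma.sum()

def map_mean(x, gamma, mu_prior, tau):
    """MAP estimate of a Gaussian mean with prior strength tau.

    x:        (T, D) observation frames
    gamma:    (T,)   occupation probabilities gamma_t for this Gaussian
    mu_prior: (D,)   prior mean
    tau:      scalar prior strength; tau -> 0 recovers the ML estimate,
              large tau keeps the estimate close to the prior
    """
    num = tau * mu_prior + (gamma[:, None] * x).sum(axis=0)
    den = tau + gamma.sum()
    return num / den
```

With `tau = 0` the MAP estimate coincides with the ML estimate; as `tau` grows the estimate moves toward `mu_prior`, which is exactly the "faith in the prior" behaviour described on the slide.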
SLIDE 18

Multinomial

Let $w_k = p(x = e_k)$, $k = 1, \ldots, K$, be the probability of observing the $k$-th discrete event among a set of $K$ possible outcomes, with $\sum_{k=1}^{K} w_k = 1$. Then the probability of observing a sequence $X = x_1, x_2, \ldots, x_T$ of i.i.d. discrete observations follows a multinomial distribution:

$p(x_1, \ldots, x_T \mid w_1, \ldots, w_K) = \frac{T!}{n_1! \, n_2! \cdots n_K!} \prod_{k=1}^{K} w_k^{n_k}$

where $n_k = \sum_{t=1}^{T} 1(x_t = e_k)$ is the number of occurrences of the $k$-th event in the sequence and $1(\cdot)$ is the indicator function.
SLIDE 19

Multinomial

  • Many useful random variables used in speech

recognition and language processing, including N- grams, histograms, mixture gains, and discrete HMM probabilities, can be modeled this way.

( ) { }

= − − −

∝ = > Γ Γ + + Γ =

K k k K K k K K K K K K 1

k K

w w w p K k w w w w p w w

1 1 1 1 1 1 1 1 1 1 1

) , | ,..., ( So, eters. hyperparam

  • f

set the is , , 1 | where ... ) ( )... ( ) ... ( ) , | ,..., ( density. l multinomia a

  • f

parameters for the prior conjugate a is which density t a Dirichle as assumed be can , ,

  • f

density prior The

1

ν ν ν

ν ν ν ν ν ν ν ν ν L L L L

slide-20
SLIDE 20

Multinomial

So the posterior is

$p(w_1, \ldots, w_K \mid x_1, \ldots, x_T, \nu_1, \ldots, \nu_K) \propto \prod_{k=1}^{K} w_k^{n_k + \nu_k - 1}$

Take the log and apply a Lagrange multiplier $l$ for the constraint $\sum_k w_k = 1$:

$\log p = \Psi + \sum_{k=1}^{K} (n_k + \nu_k - 1) \log w_k + l \left(1 - \sum_{k=1}^{K} w_k\right)$

Differentiate w.r.t. $w_k$ and equate to zero:

$\frac{n_k + \nu_k - 1}{w_k} - l = 0 \;\Rightarrow\; w_k = \frac{n_k + \nu_k - 1}{l}$

Since $\sum_k w_k = 1$, $l = \sum_{k=1}^{K} (n_k + \nu_k - 1)$, and therefore

$\hat{w}_k = \frac{n_k + \nu_k - 1}{\sum_{j=1}^{K} (n_j + \nu_j - 1)}$
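The closed-form MAP estimate just derived is one line of code (an illustrative sketch; the names are mine, not from the slides):

```python
import numpy as np

def map_multinomial(counts, nu):
    """MAP estimate w_k = (n_k + nu_k - 1) / sum_j (n_j + nu_j - 1).

    counts: (K,) observed event counts n_k
    nu:     (K,) Dirichlet hyperparameters (nu_k > 1 for a proper mode)
    """
    a = counts + nu - 1.0
    return a / a.sum()
```

With a flat prior `nu = 1` this reduces to the ML relative frequencies; larger `nu_k` act as pseudo-counts that smooth the estimate toward the prior.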
SLIDE 21

Discrete HMM

Definition: Consider an $N$-state DHMM with parameter vector $\lambda = (\boldsymbol{\pi}, \mathbf{A}, \mathbf{B})$, where

– $\boldsymbol{\pi} = [\pi_i]$, $i = 1, \ldots, N$, is the initial state probability vector;
– $\mathbf{A} = [a_{ij}]$, $i, j = 1, \ldots, N$, is the transition probability matrix;
– $\mathbf{B} = [b_{jk}]$, $j = 1, \ldots, N$, $k = 1, \ldots, K$, with $b_{jk} = b_j(v_k)$ the probability of observing symbol $v_k$ in state $j$; the observation symbol set is denoted $V = \{v_1, \ldots, v_K\}$.

For an observation sequence $X = x_1, \ldots, x_T$ and a state sequence $S = s_1, \ldots, s_T$,

$p(X \mid \lambda) = \sum_{S} p(X, S \mid \lambda) = \sum_{S} \pi_{s_1} b_{s_1}(x_1) \prod_{t=2}^{T} a_{s_{t-1} s_t} b_{s_t}(x_t)$
SLIDE 22

Discrete HMM

Q-function: In order to maximize $\log p(X \mid \hat{\lambda})$, we only need to maximize

$Q(\hat{\lambda} \mid \lambda) = E\left[\log P(X, S \mid \hat{\lambda}) \mid X, \lambda\right] = \sum_{S} \frac{P(X, S \mid \lambda)}{P(X \mid \lambda)} \log P(X, S \mid \hat{\lambda})$

$= \sum_{S} \frac{P(X, S \mid \lambda)}{P(X \mid \lambda)} \left[\log \hat{\pi}_{s_1} + \sum_{t=2}^{T} \log \hat{a}_{s_{t-1} s_t} + \sum_{t=1}^{T} \log \hat{b}_{s_t}(x_t)\right]$
SLIDE 23

Discrete HMM

A simple example (8 paths in total): a 2-state model observes the symbol sequence $v_4, v_7, v_4$. Each state path $s = (s_1, s_2, s_3)$ contributes

$p(X, s \mid \lambda) = \pi_{s_1} b_{s_1,4} \, a_{s_1 s_2} b_{s_2,7} \, a_{s_2 s_3} b_{s_3,4}$

$\log p(X, s \mid \lambda) = \log \pi_{s_1} + \log b_{s_1,4} + \log a_{s_1 s_2} + \log b_{s_2,7} + \log a_{s_2 s_3} + \log b_{s_3,4}$

path 1, $s = (1,1,1)$: $\pi_1 b_{1,4} a_{11} b_{1,7} a_{11} b_{1,4}$
path 2, $s = (1,1,2)$: $\pi_1 b_{1,4} a_{11} b_{1,7} a_{12} b_{2,4}$
path 3, $s = (1,2,1)$: $\pi_1 b_{1,4} a_{12} b_{2,7} a_{21} b_{1,4}$
path 4, $s = (1,2,2)$: $\pi_1 b_{1,4} a_{12} b_{2,7} a_{22} b_{2,4}$
path 5, $s = (2,1,1)$: $\pi_2 b_{2,4} a_{21} b_{1,7} a_{11} b_{1,4}$
path 6, $s = (2,1,2)$: $\pi_2 b_{2,4} a_{21} b_{1,7} a_{12} b_{2,4}$
path 7, $s = (2,2,1)$: $\pi_2 b_{2,4} a_{22} b_{2,7} a_{21} b_{1,4}$
path 8, $s = (2,2,2)$: $\pi_2 b_{2,4} a_{22} b_{2,7} a_{22} b_{2,4}$
SLIDE 24

Discrete HMM

2 2 1 1 2 1

log ) 1 ( log ) 1 ( log 8 7 6 5 log 4 3 2 1 π γ π γ π π ⋅ + ⋅ = ⎥ ⎦ ⎤ ⎢ ⎣ ⎡ + + + + ⎥ ⎦ ⎤ ⎢ ⎣ ⎡ + + + all all ⎥ ⎦ ⎤ ⎢ ⎣ ⎡ ⎥ ⎦ ⎤ ⎢ ⎣ ⎡ + + + + ⎥ ⎦ ⎤ ⎢ ⎣ ⎡ + + + + ⎥ ⎦ ⎤ ⎢ ⎣ ⎡ ⎥ ⎦ ⎤ ⎢ ⎣ ⎡ + + + + ⎥ ⎦ ⎤ ⎢ ⎣ ⎡ + + +

22 21 12 11

log 8 4 8 7 log 7 3 6 5 log 6 2 4 3 log 5 1 2 1 a all all a all all a all all a all all

1 = i 2 = i 1 = j 2 = j 1 = t 2 = t

( )

paths all 8 7 6 5 4 3 2 1 + + + + + + + = all

A simple example :

slide-25
SLIDE 25

Discrete HMM

The Forward/Backward Procedure (trellis over states $s_1, s_2$ and times $t = 1, 2, 3$, with a product $\alpha_t(i)\beta_t(i)$ at each node):

$\gamma_t(i) = P(s_t = i \mid X, \lambda) = \frac{P(s_t = i, X \mid \lambda)}{P(X \mid \lambda)} = \frac{\alpha_t(i)\beta_t(i)}{\sum_{j=1}^{N} \alpha_t(j)\beta_t(j)}$

$\xi_t(i, j) = P(s_t = i, s_{t+1} = j \mid X, \lambda) = \frac{P(s_t = i, s_{t+1} = j, X \mid \lambda)}{P(X \mid \lambda)} = \frac{\alpha_t(i)\, a_{ij}\, b_j(x_{t+1})\, \beta_{t+1}(j)}{\sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_t(i)\, a_{ij}\, b_j(x_{t+1})\, \beta_{t+1}(j)}$
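Given forward and backward lattices, the two posteriors above can be computed directly (a sketch; `gammas_xis` and its argument layout are my own, not from the slides):

```python
import numpy as np

def gammas_xis(alpha, beta, A, B, obs):
    """State posteriors gamma_t(i) and pair posteriors xi_t(i,j)
    from precomputed forward (alpha) and backward (beta) lattices.

    alpha, beta: (T, N) forward/backward variables
    A:           (N, N) transition matrix
    B:           (N, K) discrete emission matrix
    obs:         (T,)   observed symbol indices
    """
    ab = alpha * beta                              # (T, N)
    gamma = ab / ab.sum(axis=1, keepdims=True)     # normalize by P(X | lambda)
    T, N = alpha.shape
    xi = np.empty((T - 1, N, N))
    for t in range(T - 1):
        m = alpha[t][:, None] * A * B[:, obs[t + 1]][None, :] * beta[t + 1][None, :]
        xi[t] = m / m.sum()
    return gamma, xi
```

A useful sanity check is the marginal consistency $\sum_j \xi_t(i,j) = \gamma_t(i)$, which holds whenever `alpha`/`beta` come from a proper forward-backward pass.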
SLIDE 26

Discrete HMM

Q-function:

$Q(\hat{\lambda} \mid \lambda) = \sum_{S} \frac{p(X, S \mid \lambda)}{p(X \mid \lambda)} \log p(X, S \mid \hat{\lambda})$

$= \sum_{i=1}^{N} \Pr(s_1 = i \mid X, \lambda) \log \hat{\pi}_i + \sum_{t=1}^{T-1} \sum_{i=1}^{N} \sum_{j=1}^{N} \Pr(s_t = i, s_{t+1} = j \mid X, \lambda) \log \hat{a}_{ij} + \sum_{t=1}^{T} \sum_{j=1}^{N} \sum_{k=1}^{K} \Pr(s_t = j, x_t = v_k \mid X, \lambda) \log \hat{b}_{jk}$

where $\Pr(s_1 = i \mid X, \lambda) = \gamma_1(i)$, $\Pr(s_t = i, s_{t+1} = j \mid X, \lambda) = \xi_t(i, j)$, and $\sum_{t=1}^{T} \Pr(s_t = j, x_t = v_k \mid X, \lambda) = \sum_{t=1}^{T} \gamma_t(j)\, 1(x_t = v_k)$.
SLIDE 27

Discrete HMM

R-function: For simplicity, prior independence of $\boldsymbol{\pi}$, $\mathbf{A}$, and $\mathbf{B}$ is assumed. The prior density for $\lambda = (\boldsymbol{\pi}, \mathbf{A}, \mathbf{B})$ then assumes the form of a product of Dirichlet densities:

$p(\lambda) = p(\boldsymbol{\pi}) \cdot p(\mathbf{A}) \cdot p(\mathbf{B}) \propto \prod_{i=1}^{N} \pi_i^{\eta_i - 1} \prod_{i=1}^{N} \prod_{j=1}^{N} a_{ij}^{\eta_{ij} - 1} \prod_{j=1}^{N} \prod_{k=1}^{K} b_{jk}^{\nu_{jk} - 1}$

with hyperparameters $\eta_i, \eta_{ij}, \nu_{jk} > 1$. Then

$\hat{\lambda}_{MAP} = \arg\max_{\lambda} \log p(\lambda \mid X) = \arg\max_{\lambda} \left[\log p(X \mid \lambda) + \log p(\lambda)\right]$

We define the auxiliary function

$R(\hat{\lambda} \mid \lambda) = Q(\hat{\lambda} \mid \lambda) + \log p(\hat{\lambda})$
SLIDE 28

Discrete HMM

$R(\hat{\lambda} \mid \lambda) = \Psi + \sum_{i=1}^{N} \left[\Pr(s_1 = i \mid X, \lambda) + \eta_i - 1\right] \log \hat{\pi}_i + \sum_{i=1}^{N} \sum_{j=1}^{N} \left[\sum_{t=1}^{T-1} \Pr(s_t = i, s_{t+1} = j \mid X, \lambda) + \eta_{ij} - 1\right] \log \hat{a}_{ij} + \sum_{j=1}^{N} \sum_{k=1}^{K} \left[\sum_{t: x_t = v_k} \Pr(s_t = j \mid X, \lambda) + \nu_{jk} - 1\right] \log \hat{b}_{jk} + \text{constant}$
SLIDE 29

Discrete HMM

So, we can obtain:

$\hat{\pi}_i = \frac{\Pr(s_1 = i \mid X, \lambda) + \eta_i - 1}{\sum_{i=1}^{N} \left[\Pr(s_1 = i \mid X, \lambda) + \eta_i - 1\right]}$

$\hat{a}_{ij} = \frac{\sum_{t=1}^{T-1} \Pr(s_t = i, s_{t+1} = j \mid X, \lambda) + \eta_{ij} - 1}{\sum_{j=1}^{N} \left[\sum_{t=1}^{T-1} \Pr(s_t = i, s_{t+1} = j \mid X, \lambda) + \eta_{ij} - 1\right]}$

$\hat{b}_{jk} = \frac{\sum_{t: x_t = v_k} \Pr(s_t = j \mid X, \lambda) + \nu_{jk} - 1}{\sum_{k=1}^{K} \left[\sum_{t: x_t = v_k} \Pr(s_t = j \mid X, \lambda) + \nu_{jk} - 1\right]}$
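The three re-estimation formulas can be sketched together (illustrative; the posterior-count matrix layout is my own assumption, not from the slides):

```python
import numpy as np

def map_dhmm(gamma1, xi_sum, emit_counts, eta_pi, eta_a, nu_b):
    """MAP re-estimates of (pi, A, B) from accumulated posterior counts.

    gamma1:      (N,)   Pr(s_1 = i | X, lambda)
    xi_sum:      (N, N) sum_t Pr(s_t = i, s_{t+1} = j | X, lambda)
    emit_counts: (N, K) sum over {t : x_t = v_k} of Pr(s_t = j | X, lambda)
    eta_pi (N,), eta_a (N, N), nu_b (N, K): Dirichlet hyperparameters (> 1)
    """
    p = gamma1 + eta_pi - 1.0
    a = xi_sum + eta_a - 1.0
    b = emit_counts + nu_b - 1.0
    return (p / p.sum(),
            a / a.sum(axis=1, keepdims=True),
            b / b.sum(axis=1, keepdims=True))
```

Setting all hyperparameters to 1 recovers the usual Baum-Welch (ML) re-estimates; hyperparameters greater than 1 pull each row toward the prior mode.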
SLIDE 30

Discrete HMM

  • How to choose the initial estimate for $\pi_i$, $a_{ij}$, and $b_{jk}$?

  • One reasonable choice of the initial estimate is the mode of the prior density:

$\pi_i^{(0)} = \frac{\eta_i - 1}{\sum_{p=1}^{N} (\eta_p - 1)}, \quad i = 1, \ldots, N$

$a_{ij}^{(0)} = \frac{\eta_{ij} - 1}{\sum_{p=1}^{N} (\eta_{ip} - 1)}, \quad i, j = 1, \ldots, N$

$b_{jk}^{(0)} = \frac{\nu_{jk} - 1}{\sum_{p=1}^{K} (\nu_{jp} - 1)}, \quad j = 1, \ldots, N, \; k = 1, \ldots, K$
SLIDE 31

Discrete HMM

  • What's the mode?

– If $\lambda_{mode}$ is the mode of the prior density, then $\lambda_{mode} = \arg\max_{\lambda} p(\lambda)$.

– So applying a Lagrange multiplier we can easily derive the above modes.

– Example:

$p(\pi_1, \ldots, \pi_N) \propto \prod_{i=1}^{N} \pi_i^{\eta_i - 1} \;\Rightarrow\; \log p(\pi_1, \ldots, \pi_N) = \Psi + \sum_{i=1}^{N} (\eta_i - 1) \log \pi_i + l \left(1 - \sum_{i=1}^{N} \pi_i\right)$

$\frac{\partial \log p}{\partial \pi_i} = \frac{\eta_i - 1}{\pi_i} - l = 0 \;\Rightarrow\; \pi_i = \frac{\eta_i - 1}{l}$

but $\sum_{p=1}^{N} \pi_p = 1$, so $l = \sum_{p=1}^{N} (\eta_p - 1)$ and $\pi_i = \frac{\eta_i - 1}{\sum_{p=1}^{N} (\eta_p - 1)}$
SLIDE 32

Discrete HMM

  • Another reasonable choice of the initial estimate is the mean of the prior density:

$\pi_i^{(0)} = \frac{\eta_i}{\sum_{p=1}^{N} \eta_p}, \quad a_{ij}^{(0)} = \frac{\eta_{ij}}{\sum_{p=1}^{N} \eta_{ip}}, \quad b_{jk}^{(0)} = \frac{\nu_{jk}}{\sum_{p=1}^{K} \nu_{jp}}$

for $i, j = 1, \ldots, N$ and $k = 1, \ldots, K$.

  • Both are some kind of summarization of the available information about the parameters before any data are observed.
SLIDE 33

SCHMM

Likelihood (Semi-Continuous HMM) ⇒ Prior: Dirichlet + normal-Wishart.

Let $\Lambda = \{\theta, \lambda^{(1)}, \ldots, \lambda^{(M)}\}$, where $M$ is the total number of HMMs and $\theta = \{(\mathbf{m}_k, \mathbf{r}_k) \mid k = 1, \ldots, K\}$ is the Gaussian codebook shared by all models. The likelihood of $X = x_1, \ldots, x_T$ is

$p(X \mid \lambda) = \sum_{S} \pi_{s_1} \left[\prod_{t=2}^{T} a_{s_{t-1} s_t}\right] \prod_{t=1}^{T} b_{s_t}(x_t)$, with $b_i(x_t) = \sum_{k=1}^{K} w_{ik} \, N(x_t \mid \mathbf{m}_k, \mathbf{r}_k)$

where $\lambda^{(m)} = \{\pi_i^{(m)}, a_{ij}^{(m)}, w_{ik}^{(m)}\}$, $i, j = 1, \ldots, N$ (state number), $k = 1, \ldots, K$ (mixture number), and

$N(x_t \mid \mathbf{m}_k, \mathbf{r}_k) = (2\pi)^{-D/2} |\mathbf{r}_k|^{1/2} e^{-\frac{1}{2}(x_t - \mathbf{m}_k)^T \mathbf{r}_k (x_t - \mathbf{m}_k)}$

($\mathbf{r}_k$ is a precision matrix).
SLIDE 34

SCHMM

∏ ∏ ∏ ∏ ∏ ∏ ∏ ∏

= − − − − − − − − − − = = = − = = − = =

∝ ⇒ − × > − > ∝ ⇒ − ⎥ ⎦ ⎤ ⎢ ⎣ ⎡ ⎥ ⎦ ⎤ ⎢ ⎣ ⎡ ⎥ ⎦ ⎤ ⎢ ⎣ ⎡ ∝ ⎥ ⎦ ⎤ ⎢ ⎣ ⎡ ⎥ ⎦ ⎤ ⎢ ⎣ ⎡ =

D d r m r kd k k k k k k k k k tr D k k k k k k ik K k N i N j ij N i N i i c m K k k k M m m

kd kd kd kd kd kd kd k k k k k T k k k k ik ij i

e e r g g D D D e e g g w a K g g g g

1 ) ( 2 2 2 / 1 ) ( 2 1 ) ( ) ( 2 2 1 1 1 1 1 1 1 1 1 1

2

) , ( gamma normal

  • f

product a as assumed is ) , ( then matrix precision diagonal a is If matrix defintite positive a is and D dimension

  • f

vector a is , , 1 | | ) , ( Wishart normal a as assumed is ) , ( then matrix precision full a is If ) ( where ) , ( ) ( ) ( : be to assumed is Λ for density prior The

β µ τ α τ α ν η η

τ α π r m r m r u µ r r m r m r λ r m λ Λ

r u µ m γ µ m

independent

slide-35
SLIDE 35

SCHMM

Let $X^{(m,n)}$ denote the $n$-th observation sequence, of length $T_{m,n}$, associated with model $m$; each model $m$ has $W_m$ observation sequences:

Model 1: $X^{(1,1)}, \ldots, X^{(1,W_1)}$
Model 2: $X^{(2,1)}, \ldots, X^{(2,W_2)}$
…
Model M: $X^{(M,1)}, \ldots, X^{(M,W_M)}$

The MAP estimates of $\Lambda$ can be obtained by

$\hat{\Lambda}_{MAP} = \arg\max_{\Lambda} \left[\prod_{m=1}^{M} \prod_{n=1}^{W_m} f(X^{(m,n)} \mid \lambda^{(m)})\right] g(\Lambda)$
SLIDE 36

SCHMM

Q-function: Define a Q-function as

$Q(\hat{\Lambda} \mid \Lambda) = \sum_{m=1}^{M} \sum_{n=1}^{W_m} E\left[\log f(X^{(m,n)}, S, L \mid \hat{\Lambda}) \mid X^{(m,n)}, \Lambda\right] = \sum_{m=1}^{M} \sum_{n=1}^{W_m} \sum_{S} \sum_{L} \frac{f(X^{(m,n)}, S, L \mid \Lambda)}{f(X^{(m,n)} \mid \Lambda)} \log f(X^{(m,n)}, S, L \mid \hat{\Lambda})$

where $L = l_1, \ldots, l_{T_{m,n}}$ is the mixture-component (label) sequence and

$f(X^{(m,n)}, S, L \mid \Lambda) = \pi_{s_1}^{(m)} \left[\prod_{t=2}^{T_{m,n}} a_{s_{t-1} s_t}^{(m)}\right] \prod_{t=1}^{T_{m,n}} w_{s_t l_t}^{(m)} N(x_t^{(m,n)} \mid \mathbf{m}_{l_t}, \mathbf{r}_{l_t})$
SLIDE 37

SCHMM

Q-function: The Q-function can be decomposed as

$Q(\hat{\Lambda} \mid \Lambda) = \sum_{m=1}^{M} \sum_{n=1}^{W_m} \left[\sum_{i=1}^{N} \gamma_1^{(m,n)}(i) \log \hat{\pi}_i^{(m)} + \sum_{t=1}^{T_{m,n}-1} \sum_{i=1}^{N} \sum_{j=1}^{N} \xi_t^{(m,n)}(i,j) \log \hat{a}_{ij}^{(m)} + \sum_{t=1}^{T_{m,n}} \sum_{i=1}^{N} \sum_{k=1}^{K} \xi_t^{(m,n)}(i,k) \log \hat{w}_{ik}^{(m)} + \sum_{t=1}^{T_{m,n}} \sum_{k=1}^{K} \xi_t^{(m,n)}(k) \log N(x_t^{(m,n)} \mid \hat{\mathbf{m}}_k, \hat{\mathbf{r}}_k)\right]$

where $\gamma_t^{(m,n)}(i) = \Pr(s_t = i \mid X^{(m,n)}, \lambda^{(m)})$, $\xi_t^{(m,n)}(i,j) = \Pr(s_t = i, s_{t+1} = j \mid X^{(m,n)}, \lambda^{(m)})$,

$\xi_t^{(m,n)}(i,k) = \Pr(s_t = i, l_t = k \mid X^{(m,n)}, \lambda^{(m)}) = \gamma_t^{(m,n)}(i) \frac{w_{ik}^{(m)} N(x_t^{(m,n)} \mid \mathbf{m}_k, \mathbf{r}_k)}{\sum_{k'=1}^{K} w_{ik'}^{(m)} N(x_t^{(m,n)} \mid \mathbf{m}_{k'}, \mathbf{r}_{k'})}$, and $\xi_t^{(m,n)}(k) = \sum_{i=1}^{N} \xi_t^{(m,n)}(i,k)$.
SLIDE 38

SCHMM

$R(\hat{\Lambda} \mid \Lambda) = Q(\hat{\Lambda} \mid \Lambda) + \log g(\hat{\Lambda})$

$= \sum_{m=1}^{M} \left\{\sum_{i=1}^{N} \left[\sum_{n=1}^{W_m} \gamma_1^{(m,n)}(i) + \eta_i^{(m)} - 1\right] \log \hat{\pi}_i^{(m)} + \sum_{i=1}^{N} \sum_{j=1}^{N} \left[\sum_{n=1}^{W_m} \sum_{t=1}^{T_{m,n}-1} \xi_t^{(m,n)}(i,j) + \eta_{ij}^{(m)} - 1\right] \log \hat{a}_{ij}^{(m)} + \sum_{i=1}^{N} \sum_{k=1}^{K} \left[\sum_{n=1}^{W_m} \sum_{t=1}^{T_{m,n}} \xi_t^{(m,n)}(i,k) + \nu_{ik}^{(m)} - 1\right] \log \hat{w}_{ik}^{(m)}\right\} + \sum_{k=1}^{K} \sum_{m=1}^{M} \sum_{n=1}^{W_m} \sum_{t=1}^{T_{m,n}} \xi_t^{(m,n)}(k) \log N(x_t^{(m,n)} \mid \hat{\mathbf{m}}_k, \hat{\mathbf{r}}_k) + \sum_{k=1}^{K} \log g(\hat{\mathbf{m}}_k, \hat{\mathbf{r}}_k) + \text{constant}$
SLIDE 39

SCHMM

  • Initial probability: Differentiating $R(\hat{\Lambda} \mid \Lambda)$ w.r.t. $\hat{\pi}_i^{(m)}$ (with a Lagrange multiplier for $\sum_i \hat{\pi}_i^{(m)} = 1$) and equating it to zero gives

$\hat{\pi}_i^{(m)} = \frac{\sum_{n=1}^{W_m} \gamma_1^{(m,n)}(i) + \eta_i^{(m)} - 1}{\sum_{j=1}^{N} \left[\sum_{n=1}^{W_m} \gamma_1^{(m,n)}(j) + \eta_j^{(m)} - 1\right]}$
SLIDE 40

SCHMM

  • Transition probability: Differentiating $R(\hat{\Lambda} \mid \Lambda)$ w.r.t. $\hat{a}_{ij}^{(m)}$ (with a Lagrange multiplier for $\sum_j \hat{a}_{ij}^{(m)} = 1$) and equating it to zero gives

$\hat{a}_{ij}^{(m)} = \frac{\sum_{n=1}^{W_m} \sum_{t=1}^{T_{m,n}-1} \xi_t^{(m,n)}(i,j) + \eta_{ij}^{(m)} - 1}{\sum_{j=1}^{N} \left[\sum_{n=1}^{W_m} \sum_{t=1}^{T_{m,n}-1} \xi_t^{(m,n)}(i,j) + \eta_{ij}^{(m)} - 1\right]}$
SLIDE 41

SCHMM

  • Mixture weight: Differentiating $R(\hat{\Lambda} \mid \Lambda)$ w.r.t. $\hat{w}_{ik}^{(m)}$ (with a Lagrange multiplier for $\sum_k \hat{w}_{ik}^{(m)} = 1$) and equating it to zero gives

$\hat{w}_{ik}^{(m)} = \frac{\sum_{n=1}^{W_m} \sum_{t=1}^{T_{m,n}} \xi_t^{(m,n)}(i,k) + \nu_{ik}^{(m)} - 1}{\sum_{k=1}^{K} \left[\sum_{n=1}^{W_m} \sum_{t=1}^{T_{m,n}} \xi_t^{(m,n)}(i,k) + \nu_{ik}^{(m)} - 1\right]}$
SLIDE 42

SCHMM

  • Differentiating $R(\hat{\Lambda} \mid \Lambda)$ w.r.t. $\mathbf{m}_k$ and equating it to zero:

$\sum_{m=1}^{M} \sum_{n=1}^{W_m} \sum_{t=1}^{T_{m,n}} \xi_t^{(m,n)}(k) \frac{\partial}{\partial \mathbf{m}_k} \log N(x_t^{(m,n)} \mid \mathbf{m}_k, \mathbf{r}_k) + \frac{\partial}{\partial \mathbf{m}_k} \log g(\mathbf{m}_k, \mathbf{r}_k) = 0 \quad (55)$

  • Differentiating $R(\hat{\Lambda} \mid \Lambda)$ w.r.t. $\mathbf{r}_k$ and equating it to zero:

$\sum_{m=1}^{M} \sum_{n=1}^{W_m} \sum_{t=1}^{T_{m,n}} \xi_t^{(m,n)}(k) \frac{\partial}{\partial \mathbf{r}_k} \log N(x_t^{(m,n)} \mid \mathbf{m}_k, \mathbf{r}_k) + \frac{\partial}{\partial \mathbf{r}_k} \log g(\mathbf{m}_k, \mathbf{r}_k) = 0 \quad (56)$
SLIDE 43

SCHMM

  • Full covariance matrix case:

$\frac{\partial}{\partial \mathbf{m}_k} \log N(x_t^{(m,n)} \mid \mathbf{m}_k, \mathbf{r}_k) = \frac{\partial}{\partial \mathbf{m}_k} \left[-\frac{1}{2}(x_t^{(m,n)} - \mathbf{m}_k)^T \mathbf{r}_k (x_t^{(m,n)} - \mathbf{m}_k)\right] = \mathbf{r}_k (x_t^{(m,n)} - \mathbf{m}_k)$

$\frac{\partial}{\partial \mathbf{m}_k} \log g(\mathbf{m}_k, \mathbf{r}_k) = \frac{\partial}{\partial \mathbf{m}_k} \left[-\frac{\tau_k}{2}(\mathbf{m}_k - \boldsymbol{\mu}_k)^T \mathbf{r}_k (\mathbf{m}_k - \boldsymbol{\mu}_k)\right] = -\tau_k \mathbf{r}_k (\mathbf{m}_k - \boldsymbol{\mu}_k)$
SLIDE 44

SCHMM

  • Full covariance matrix case: Substituting into (55),

$\sum_{m=1}^{M} \sum_{n=1}^{W_m} \sum_{t=1}^{T_{m,n}} \xi_t^{(m,n)}(k) \, \mathbf{r}_k (x_t^{(m,n)} - \mathbf{m}_k) - \tau_k \mathbf{r}_k (\mathbf{m}_k - \boldsymbol{\mu}_k) = 0$

$\Rightarrow \hat{\mathbf{m}}_k = \frac{\tau_k \boldsymbol{\mu}_k + \sum_{m=1}^{M} \sum_{n=1}^{W_m} \sum_{t=1}^{T_{m,n}} \xi_t^{(m,n)}(k) \, x_t^{(m,n)}}{\tau_k + \sum_{m=1}^{M} \sum_{n=1}^{W_m} \sum_{t=1}^{T_{m,n}} \xi_t^{(m,n)}(k)}$
SLIDE 45

SCHMM

  • Full covariance matrix case:

$\frac{\partial}{\partial \mathbf{r}_k} \log N(x_t^{(m,n)} \mid \mathbf{m}_k, \mathbf{r}_k) = \frac{\partial}{\partial \mathbf{r}_k} \left[\frac{1}{2} \log |\mathbf{r}_k| - \frac{1}{2}(x_t^{(m,n)} - \mathbf{m}_k)^T \mathbf{r}_k (x_t^{(m,n)} - \mathbf{m}_k)\right] = \frac{1}{2} \mathbf{r}_k^{-1} - \frac{1}{2}(x_t^{(m,n)} - \mathbf{m}_k)(x_t^{(m,n)} - \mathbf{m}_k)^T$
SLIDE 46

SCHMM

  • Full covariance matrix case:

$\frac{\partial}{\partial \mathbf{r}_k} \log g(\mathbf{m}_k, \mathbf{r}_k) = \frac{\partial}{\partial \mathbf{r}_k} \left[\frac{\alpha_k - D}{2} \log |\mathbf{r}_k| - \frac{\tau_k}{2}(\mathbf{m}_k - \boldsymbol{\mu}_k)^T \mathbf{r}_k (\mathbf{m}_k - \boldsymbol{\mu}_k) - \frac{1}{2}\mathrm{tr}(\mathbf{u}_k \mathbf{r}_k)\right]$

$= \frac{\alpha_k - D}{2} \mathbf{r}_k^{-1} - \frac{\tau_k}{2}(\mathbf{m}_k - \boldsymbol{\mu}_k)(\mathbf{m}_k - \boldsymbol{\mu}_k)^T - \frac{1}{2}\mathbf{u}_k$
SLIDE 47

SCHMM

  • Full covariance matrix case: Substituting into (56) and solving,

$\hat{\mathbf{r}}_k^{-1} = \frac{\mathbf{u}_k + \tau_k (\hat{\mathbf{m}}_k - \boldsymbol{\mu}_k)(\hat{\mathbf{m}}_k - \boldsymbol{\mu}_k)^T + \sum_{m=1}^{M} \sum_{n=1}^{W_m} \sum_{t=1}^{T_{m,n}} \xi_t^{(m,n)}(k)(x_t^{(m,n)} - \hat{\mathbf{m}}_k)(x_t^{(m,n)} - \hat{\mathbf{m}}_k)^T}{(\alpha_k - D) + \sum_{m=1}^{M} \sum_{n=1}^{W_m} \sum_{t=1}^{T_{m,n}} \xi_t^{(m,n)}(k)}$
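The mean update from (55) and the covariance update just derived can be sketched jointly (an illustrative sketch; frames from all models and sequences are assumed stacked into one array, and the names are mine, not from the slides):

```python
import numpy as np

def map_gauss_full(x, xi, mu0, tau, alpha, u):
    """MAP update of one shared Gaussian with full covariance.

    x:     (T, D) frames pooled over all models/sequences
    xi:    (T,)   occupancies xi_t(k) for this component
    mu0:   (D,)   normal-Wishart prior mean
    tau:   scalar prior strength; alpha: scalar degrees of freedom
    u:     (D, D) prior scatter matrix
    Returns (m_hat, cov_hat) where cov_hat = r_hat^{-1}.
    """
    T, D = x.shape
    occ = xi.sum()
    m_hat = (tau * mu0 + (xi[:, None] * x).sum(axis=0)) / (tau + occ)
    diff = x - m_hat
    # weighted scatter: sum_t xi_t (x_t - m_hat)(x_t - m_hat)^T
    scatter = (xi[:, None, None] * (diff[:, :, None] * diff[:, None, :])).sum(axis=0)
    dm = (m_hat - mu0)[:, None]
    cov_hat = (u + tau * (dm @ dm.T) + scatter) / (alpha - D + occ)
    return m_hat, cov_hat
```

With `tau -> 0`, `u = 0`, and `alpha = D` the update degenerates to the occupancy-weighted sample mean and scatter, i.e. the ML statistics.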
SLIDE 48

SCHMM

  • Full covariance matrix case: The initial estimate can be chosen as the mode of the prior PDF:

$\pi_i^{(m)}, a_{ij}^{(m)}, w_{ik}^{(m)}$ same as DHMM, and $\mathbf{m}_k = \boldsymbol{\mu}_k, \; \mathbf{r}_k = (\alpha_k - D)\, \mathbf{u}_k^{-1}$

  • And it can also be chosen as the mean of the prior PDF:

$\pi_i^{(m)}, a_{ij}^{(m)}, w_{ik}^{(m)}$ same as DHMM, and $\mathbf{m}_k = \boldsymbol{\mu}_k, \; \mathbf{r}_k = \alpha_k\, \mathbf{u}_k^{-1}$
SLIDE 49

SCHMM

  • Diagonal covariance matrix case: Then

$N(x_t^{(m,n)} \mid \mathbf{m}_k, \mathbf{r}_k) \propto \left[\prod_{d=1}^{D} r_{kd}^{1/2}\right] e^{-\frac{1}{2}\sum_{d=1}^{D} r_{kd}(x_{td}^{(m,n)} - m_{kd})^2}$

and

$g(\mathbf{m}_k, \mathbf{r}_k) \propto \prod_{d=1}^{D} r_{kd}^{\alpha_{kd} - 1/2} e^{-\frac{\tau_{kd} r_{kd}}{2}(m_{kd} - \mu_{kd})^2} e^{-\beta_{kd} r_{kd}}$

$\Rightarrow \log g(\mathbf{m}_k, \mathbf{r}_k) = \sum_{d=1}^{D} \left[\left(\alpha_{kd} - \frac{1}{2}\right) \log r_{kd} - \frac{\tau_{kd} r_{kd}}{2}(m_{kd} - \mu_{kd})^2 - \beta_{kd} r_{kd}\right] + C$
SLIDE 50

SCHMM

  • Diagonal covariance matrix case:

$\frac{\partial}{\partial m_{kd}} \log N(x_t^{(m,n)} \mid \mathbf{m}_k, \mathbf{r}_k) = r_{kd}(x_{td}^{(m,n)} - m_{kd})$

$\frac{\partial}{\partial m_{kd}} \log g(\mathbf{m}_k, \mathbf{r}_k) = -\tau_{kd} r_{kd}(m_{kd} - \mu_{kd})$
SLIDE 51

SCHMM

  • Diagonal covariance matrix case: Substituting into (55),

$\sum_{m=1}^{M} \sum_{n=1}^{W_m} \sum_{t=1}^{T_{m,n}} \xi_t^{(m,n)}(k) \, r_{kd}(x_{td}^{(m,n)} - m_{kd}) - \tau_{kd} r_{kd}(m_{kd} - \mu_{kd}) = 0$

$\Rightarrow \hat{m}_{kd} = \frac{\tau_{kd} \mu_{kd} + \sum_{m=1}^{M} \sum_{n=1}^{W_m} \sum_{t=1}^{T_{m,n}} \xi_t^{(m,n)}(k) \, x_{td}^{(m,n)}}{\tau_{kd} + \sum_{m=1}^{M} \sum_{n=1}^{W_m} \sum_{t=1}^{T_{m,n}} \xi_t^{(m,n)}(k)}$
SLIDE 52

SCHMM

  • Diagonal covariance matrix case:

$\frac{\partial}{\partial r_{kd}} \log N(x_t^{(m,n)} \mid \mathbf{m}_k, \mathbf{r}_k) = \frac{1}{2 r_{kd}} - \frac{1}{2}(x_{td}^{(m,n)} - m_{kd})^2$
SLIDE 53

SCHMM

  • Diagonal covariance matrix case:

$\frac{\partial}{\partial r_{kd}} \log g(\mathbf{m}_k, \mathbf{r}_k) = \frac{\alpha_{kd} - 1/2}{r_{kd}} - \frac{\tau_{kd}}{2}(m_{kd} - \mu_{kd})^2 - \beta_{kd}$
SLIDE 54

SCHMM

  • Diagonal covariance matrix case: Substituting into (56) and solving,

$\hat{r}_{kd}^{-1} = \frac{2\beta_{kd} + \tau_{kd}(\hat{m}_{kd} - \mu_{kd})^2 + \sum_{m=1}^{M} \sum_{n=1}^{W_m} \sum_{t=1}^{T_{m,n}} \xi_t^{(m,n)}(k)(x_{td}^{(m,n)} - \hat{m}_{kd})^2}{(2\alpha_{kd} - 1) + \sum_{m=1}^{M} \sum_{n=1}^{W_m} \sum_{t=1}^{T_{m,n}} \xi_t^{(m,n)}(k)}$
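The per-dimension diagonal updates can be sketched the same way (illustrative; hyperparameters are per-dimension arrays, and the names are mine, not from the slides):

```python
import numpy as np

def map_gauss_diag(x, xi, mu0, tau, alpha, beta):
    """MAP update of a diagonal-precision Gaussian, per dimension.

    x:   (T, D) frames pooled over all models/sequences
    xi:  (T,)   occupancies xi_t(k) for this component
    mu0, tau, alpha, beta: (D,) normal-gamma hyperparameters
    Returns (m_hat, r_hat) with r_hat the per-dimension precision.
    """
    occ = xi.sum()
    m_hat = (tau * mu0 + (xi[:, None] * x).sum(axis=0)) / (tau + occ)
    sq = (xi[:, None] * (x - m_hat) ** 2).sum(axis=0)
    r_hat = ((2.0 * alpha - 1.0) + occ) / (2.0 * beta + tau * (m_hat - mu0) ** 2 + sq)
    return m_hat, r_hat
```

For a non-informative setting (`tau = 0`, `beta = 0`, `alpha = 1/2`) the precision reduces to the inverse of the occupancy-weighted sample variance.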
SLIDE 55

SCHMM

  • Diagonal covariance matrix case: The initial estimate can be chosen as the mode of the prior PDF:

$\pi_i^{(m)}, a_{ij}^{(m)}, w_{ik}^{(m)}$ same as DHMM, and $m_{kd} = \mu_{kd}, \; r_{kd} = (\alpha_{kd} - 1/2)/\beta_{kd}$

  • And it can also be chosen as the mean of the prior PDF:

$\pi_i^{(m)}, a_{ij}^{(m)}, w_{ik}^{(m)}$ same as DHMM, and $m_{kd} = \mu_{kd}, \; r_{kd} = \alpha_{kd}/\beta_{kd}$
SLIDE 56

CDHMM

  • Continuous Density HMM case: Then

$b_i(x_t) = \sum_{k=1}^{K} w_{ik} N(x_t \mid \mathbf{m}_k, \mathbf{r}_k) \;\Rightarrow\; b_i(x_t) = \sum_{k=1}^{K} w_{ik} N(x_t \mid \mathbf{m}_{ik}, \mathbf{r}_{ik})$

where $N(x_t \mid \mathbf{m}_{ik}, \mathbf{r}_{ik}) = (2\pi)^{-D/2} |\mathbf{r}_{ik}|^{1/2} e^{-\frac{1}{2}(x_t - \mathbf{m}_{ik})^T \mathbf{r}_{ik} (x_t - \mathbf{m}_{ik})}$

i.e., the Gaussians are no longer tied across states: each state $i$ has its own components $\{\mathbf{m}_{ik}, \mathbf{r}_{ik}\}$.
SLIDE 57

CDHMM

  • In the Q-function,

$\sum_{m=1}^{M} \sum_{n=1}^{W_m} \sum_{t=1}^{T_{m,n}} \sum_{k=1}^{K} \xi_t^{(m,n)}(k) \log N(x_t^{(m,n)} \mid \mathbf{m}_k, \mathbf{r}_k) \;\Rightarrow\; \sum_{m=1}^{M} \sum_{n=1}^{W_m} \sum_{t=1}^{T_{m,n}} \sum_{i=1}^{N} \sum_{k=1}^{K} \xi_t^{(m,n)}(i,k) \log N(x_t^{(m,n)} \mid \mathbf{m}_{ik}, \mathbf{r}_{ik})$
SLIDE 58

CDHMM

  • In $\log p(\Lambda)$, $\prod_{k=1}^{K} g(\mathbf{m}_k, \mathbf{r}_k) \;\Rightarrow\; \prod_{i=1}^{N} \prod_{k=1}^{K} g(\mathbf{m}_{ik}, \mathbf{r}_{ik})$, where, for the full covariance case,

$g(\mathbf{m}_{ik}, \mathbf{r}_{ik}) \propto |\mathbf{r}_{ik}|^{(\alpha_{ik} - D)/2} e^{-\frac{\tau_{ik}}{2}(\mathbf{m}_{ik} - \boldsymbol{\mu}_{ik})^T \mathbf{r}_{ik} (\mathbf{m}_{ik} - \boldsymbol{\mu}_{ik})} e^{-\frac{1}{2}\mathrm{tr}(\mathbf{u}_{ik} \mathbf{r}_{ik})}$

and, for the diagonal covariance case,

$g(\mathbf{m}_{ik}, \mathbf{r}_{ik}) \propto \prod_{d=1}^{D} r_{ikd}^{\alpha_{ikd} - 1/2} e^{-\frac{\tau_{ikd} r_{ikd}}{2}(m_{ikd} - \mu_{ikd})^2} e^{-\beta_{ikd} r_{ikd}}$
SLIDE 59

Maximum Likelihood Linear Regression
SLIDE 60

MLLR

  • Linear transformation of the original speaker-independent (SI) model to maximize the likelihood of the adaptation data.

  • MLLR is multiplicative; MAP is additive.

  • MLLR is much less conservative than MAP – a few seconds of data may change the model dramatically.

Background
SLIDE 61

MLLR

Reference:

– Speaker Adaptation of HMMs Using Linear Regression – TR'94, Leggetter and Woodland
– Maximum Likelihood Linear Regression for Speaker Adaptation of Continuous Density Hidden Markov Models – CSL'95, Leggetter and Woodland
– MLLR: A Speaker Adaptation Technique for LVCSR – Hamaker
SLIDE 62

MLLR

  • The regression transform is first derived for the single Gaussian distribution per state, and later extended to the general case of Gaussian mixtures.

  • So, the p.d.f. for state $j$ is

$b_j(x) = \frac{1}{(2\pi)^{D/2} |\mathbf{C}_j|^{1/2}} e^{-\frac{1}{2}(x - \boldsymbol{\mu}_j)^T \mathbf{C}_j^{-1} (x - \boldsymbol{\mu}_j)}$

where $\boldsymbol{\mu}_j$ is the mean and $\mathbf{C}_j$ is the covariance matrix.

Single Gaussian Case
SLIDE 63

( )

) ( ) ( 2 1 2 / 1 2 / 1 1

1

| | ) 2 ( 1 ) ( So

  • ffsets

the ignore If regression the in

  • ffset

an include 1 If ) matrix ) 1 ( an ( transform linear the is , where is mean adapted the

  • f

estimate The regression the for term

  • ffset

the is where define we then mean, the is If

j j j T j j

e b D D

j D j D D ξ W x C ξ W x

C x b A W Wξ b Aµ µ ξ µ

− − −

= ⇒ = ⇒ = + × = = + = ⎥ ⎥ ⎥ ⎥ ⎦ ⎤ ⎢ ⎢ ⎢ ⎢ ⎣ ⎡ = ⎥ ⎥ ⎥ ⎦ ⎤ ⎢ ⎢ ⎢ ⎣ ⎡ = π ω ω ω µ µ ω µ µ M M

MLLR

Single Gaussian Case
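As a concrete sketch of the extended-mean notation above: the affine update $\hat{\boldsymbol{\mu}} = \mathbf{A}\boldsymbol{\mu} + \mathbf{b}$ collapses into a single product $\mathbf{W}\boldsymbol{\xi}$. All numbers below are hypothetical, purely for illustration:

```python
import numpy as np

# Hypothetical 3-dimensional mean and transform (illustrative only).
D = 3
rng = np.random.default_rng(0)
mu = rng.standard_normal(D)                 # original SI mean
A = np.eye(D) + 0.1 * rng.standard_normal((D, D))
b = rng.standard_normal(D)

xi = np.concatenate(([1.0], mu))            # extended mean, offset omega = 1
W = np.hstack([b[:, None], A])              # W = [b A], shape D x (D+1)

mu_adapted = W @ xi                         # identical to A @ mu + b
assert np.allclose(mu_adapted, A @ mu + b)
```

Packing the offset into the first column of W is what lets the later derivations estimate A and b jointly as one matrix.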

slide-64
SLIDE 64

MLLR

  • A more general approach is adopted in which the same transformation matrix is used for several distributions (a regression class).
  • If some of the distributions are not observed in the adaptation data, a transformation may still be applied: models are updated whether or not corresponding adaptation data was observed. Single Gaussian Case

slide-65
SLIDE 65

MLLR

  • MLLR estimates the regression matrices to maximize the likelihood of the adapted models generating the adaptation data.
  • The full and diagonal covariance cases will be discussed. Single Gaussian Case

slide-66
SLIDE 66

MLLR

Assume the adaptation data $\mathbf{X}$ is a series of $T$ observations:

$$\mathbf{X} = \mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_T$$

Denote the current set of model parameters by $\lambda$ and a re-estimated set of model parameters by $\bar{\lambda}$; $\boldsymbol{\xi}$ is the current extended mean and $\hat{\boldsymbol{\mu}}$ the re-estimated mean.

Single Gaussian Case

slide-67
SLIDE 67

MLLR

The total likelihood is

$$f(\mathbf{X}|\lambda) = \sum_{\mathbf{S}} f(\mathbf{X}, \mathbf{S}|\lambda)$$

where $f(\mathbf{X}, \mathbf{S}|\lambda)$ is the likelihood of generating $\mathbf{X}$ using the state sequence $\mathbf{S}$ given model $\lambda$.
The quantity $f(\mathbf{X}|\lambda)$ is the objective function to be maximized during adaptation.

Single Gaussian Case

slide-68
SLIDE 68

MLLR

We define the auxiliary function

$$Q(\lambda, \bar{\lambda}) = \sum_{\mathbf{S}} f(\mathbf{S}|\mathbf{X}, \lambda) \log f(\mathbf{X}, \mathbf{S}|\bar{\lambda})$$

Since only the transformations are re-estimated, only the output distributions $b_j(\mathbf{x}_t)$ are affected, so the auxiliary function can be written as

$$Q(\lambda, \bar{\lambda}) = \text{constant} + \sum_{j} \sum_{t=1}^{T} f(s_t = j|\mathbf{X}, \lambda) \log \bar{b}_j(\mathbf{x}_t)$$

Single Gaussian Case
slide-69
SLIDE 69

MLLR

We define

$$\gamma_j(t) = f(s_t = j|\mathbf{X}, \lambda)$$

So the Q-function can be rewritten as

$$Q(\lambda, \bar{\lambda}) = \text{constant} + \sum_{t=1}^{T} \gamma_j(t) \log \bar{b}_j(\mathbf{x}_t)$$

Single Gaussian Case

slide-70
SLIDE 70

MLLR

Expanding $\log \bar{b}_j(\mathbf{x}_t)$, the auxiliary function is

$$Q(\lambda, \bar{\lambda}) = \text{constant} - \frac{1}{2} \sum_{j=1}^{N} \sum_{t=1}^{T} \gamma_j(t) \left[ D\log(2\pi) + \log|\mathbf{C}_j| + h(\mathbf{x}_t, j) \right]$$

where

$$h(\mathbf{x}_t, j) = (\mathbf{x}_t - \mathbf{W}\boldsymbol{\xi}_j)^T \mathbf{C}_j^{-1} (\mathbf{x}_t - \mathbf{W}\boldsymbol{\xi}_j)$$

The differential of $Q(\lambda, \bar{\lambda})$ w.r.t. $\mathbf{W}$ is

$$\frac{\partial Q(\lambda, \bar{\lambda})}{\partial \mathbf{W}} = -\frac{1}{2} \frac{\partial}{\partial \mathbf{W}} \sum_{j=1}^{N} \sum_{t=1}^{T} \gamma_j(t) \left[ D\log(2\pi) + \log|\mathbf{C}_j| + h(\mathbf{x}_t, j) \right]$$

Single Gaussian Case

slide-71
SLIDE 71

MLLR

The differential of $h(\mathbf{x}_t, j)$ w.r.t. $\mathbf{W}$ is

$$\begin{aligned}
\frac{\partial h(\mathbf{x}_t, j)}{\partial \mathbf{W}} &= \frac{\partial}{\partial \mathbf{W}} (\mathbf{x}_t - \mathbf{W}\boldsymbol{\xi}_j)^T \mathbf{C}_j^{-1} (\mathbf{x}_t - \mathbf{W}\boldsymbol{\xi}_j) \\
&= \frac{\partial}{\partial \mathbf{W}} \left[ \mathbf{x}_t^T \mathbf{C}_j^{-1} \mathbf{x}_t - \mathbf{x}_t^T \mathbf{C}_j^{-1} \mathbf{W}\boldsymbol{\xi}_j - \boldsymbol{\xi}_j^T \mathbf{W}^T \mathbf{C}_j^{-1} \mathbf{x}_t + \boldsymbol{\xi}_j^T \mathbf{W}^T \mathbf{C}_j^{-1} \mathbf{W}\boldsymbol{\xi}_j \right] \\
&= -2\,\mathbf{C}_j^{-1} \mathbf{x}_t \boldsymbol{\xi}_j^T + 2\,\mathbf{C}_j^{-1} \mathbf{W} \boldsymbol{\xi}_j \boldsymbol{\xi}_j^T \\
&= -2\,\mathbf{C}_j^{-1} (\mathbf{x}_t - \mathbf{W}\boldsymbol{\xi}_j) \boldsymbol{\xi}_j^T
\end{aligned}$$

Single Gaussian Case

slide-72
SLIDE 72

MLLR

Then completing the differentiation and equating to zero:

$$\frac{\partial Q(\lambda, \bar{\lambda})}{\partial \mathbf{W}} = \sum_{t=1}^{T} \gamma_j(t)\, \mathbf{C}_j^{-1} (\mathbf{x}_t - \mathbf{W}\boldsymbol{\xi}_j) \boldsymbol{\xi}_j^T = 0$$

$$\Rightarrow \sum_{t=1}^{T} \gamma_j(t)\, \mathbf{C}_j^{-1} \mathbf{x}_t \boldsymbol{\xi}_j^T = \mathbf{C}_j^{-1} \mathbf{W} \left( \sum_{t=1}^{T} \gamma_j(t)\, \boldsymbol{\xi}_j \boldsymbol{\xi}_j^T \right)$$

$$\therefore \sum_{t=1}^{T} \gamma_j(t)\, \mathbf{x}_t \boldsymbol{\xi}_j^T = \mathbf{W} \sum_{t=1}^{T} \gamma_j(t)\, \boldsymbol{\xi}_j \boldsymbol{\xi}_j^T, \qquad \hat{\boldsymbol{\mu}}_j = \mathbf{W}\boldsymbol{\xi}_j$$

Single Gaussian Case
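The closed-form update above can be sketched numerically. This is a minimal toy, assuming several states share one transform and an identity covariance so that $\mathbf{C}^{-1}$ cancels from both sides; `W_true`, the state means, and the hard occupancies ($\gamma = 1$) are all hypothetical:

```python
import numpy as np

# Minimal sketch of W = (sum gamma x xi^T)(sum gamma xi xi^T)^{-1}
# for states sharing one transform and identity covariance.
rng = np.random.default_rng(1)
D = 2
W_true = np.array([[0.5, 1.2, 0.0],
                   [-0.3, 0.1, 0.8]])          # D x (D+1) transform

# Four SI state means, each extended with offset term omega = 1.
mus = [rng.standard_normal(D) for _ in range(4)]
xis = [np.concatenate(([1.0], m)) for m in mus]

Z = np.zeros((D, D + 1))                       # sum gamma x xi^T
G = np.zeros((D + 1, D + 1))                   # sum gamma xi xi^T
for xi in xis:
    for _ in range(50):                        # 50 frames per state
        x_t = W_true @ xi + 0.01 * rng.standard_normal(D)
        Z += np.outer(x_t, xi)                 # gamma = 1 (hard alignment)
        G += np.outer(xi, xi)

W_hat = Z @ np.linalg.inv(G)
assert np.allclose(W_hat, W_true, atol=0.05)
```

Note that with a single Gaussian the accumulator $\sum \boldsymbol{\xi}\boldsymbol{\xi}^T$ is rank one, which is why the sketch pools several states into one regression class before inverting.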

slide-73
SLIDE 73

MLLR

  • Regression Class Tree for MLLR

Tied Regression Matrices

slide-74
SLIDE 74

MLLR

Consider the r-th regression class $RC_r = \{s_1, \ldots, s_R\}$. If $\mathbf{W}_r$ is shared by the states in the regression class, then

$$\sum_{s \in RC_r} \sum_{t=1}^{T} \gamma_s(t)\, \mathbf{C}_s^{-1} \mathbf{x}_t \boldsymbol{\xi}_s^T = \sum_{s \in RC_r} \sum_{t=1}^{T} \gamma_s(t)\, \mathbf{C}_s^{-1} \mathbf{W}_r \boldsymbol{\xi}_s \boldsymbol{\xi}_s^T$$

Define

$$\mathbf{Z} = \sum_{s \in RC_r} \sum_{t=1}^{T} \gamma_s(t)\, \mathbf{C}_s^{-1} \mathbf{x}_t \boldsymbol{\xi}_s^T \quad (D \times (D+1))$$

$$\mathbf{V}^{(s)} = \sum_{t=1}^{T} \gamma_s(t)\, \mathbf{C}_s^{-1} \quad (D \times D), \qquad \mathbf{D}^{(s)} = \boldsymbol{\xi}_s \boldsymbol{\xi}_s^T \quad ((D+1) \times (D+1))$$

so that $\mathbf{Z} = \sum_{s \in RC_r} \mathbf{V}^{(s)} \mathbf{W}_r \mathbf{D}^{(s)}$.

Tied Regression Matrices

slide-75
SLIDE 75

MLLR

If the right-hand side is denoted by the $D \times (D+1)$ matrix $\mathbf{Y}$,

$$\mathbf{Y} = \sum_{s \in RC_r} \mathbf{V}^{(s)} \mathbf{W}_r \mathbf{D}^{(s)}$$

then element-wise

$$[\mathbf{Z}]_{ij} = [\mathbf{Y}]_{ij} = \sum_{s \in RC_r} \sum_{p=1}^{D} \sum_{q=1}^{D+1} [\mathbf{V}^{(s)}]_{ip} [\mathbf{W}_r]_{pq} [\mathbf{D}^{(s)}]_{qj}$$

Tied Regression Matrices

slide-76
SLIDE 76

MLLR

If the covariance matrix is diagonal, then $\mathbf{V}^{(s)}$ is diagonal and $\mathbf{D}^{(s)}$ is symmetric:

$$[\mathbf{V}^{(s)}]_{ip} = \begin{cases} [\mathbf{V}^{(s)}]_{ii} & p = i \\ 0 & p \neq i \end{cases}$$

$$\therefore [\mathbf{Z}]_{ij} = [\mathbf{Y}]_{ij} = \sum_{s \in RC_r} \sum_{q=1}^{D+1} [\mathbf{V}^{(s)}]_{ii} [\mathbf{W}_r]_{iq} [\mathbf{D}^{(s)}]_{qj} = \sum_{q=1}^{D+1} [\mathbf{W}_r]_{iq} [\mathbf{G}^{(i)}]_{qj}$$

where $\mathbf{G}^{(i)} = \sum_{s \in RC_r} [\mathbf{V}^{(s)}]_{ii}\, \mathbf{D}^{(s)}$.

Tied Regression Matrices

slide-77
SLIDE 77

MLLR

Then we can obtain row $i$ of $\mathbf{W}$ by solving the linear equations below:

$$\begin{cases}
[\mathbf{W}]_{i,1}[\mathbf{G}^{(i)}]_{1,1} + [\mathbf{W}]_{i,2}[\mathbf{G}^{(i)}]_{2,1} + \cdots + [\mathbf{W}]_{i,D+1}[\mathbf{G}^{(i)}]_{D+1,1} = [\mathbf{Z}]_{i,1} & (j = 1) \\
[\mathbf{W}]_{i,1}[\mathbf{G}^{(i)}]_{1,2} + [\mathbf{W}]_{i,2}[\mathbf{G}^{(i)}]_{2,2} + \cdots + [\mathbf{W}]_{i,D+1}[\mathbf{G}^{(i)}]_{D+1,2} = [\mathbf{Z}]_{i,2} & (j = 2) \\
\quad \vdots \\
[\mathbf{W}]_{i,1}[\mathbf{G}^{(i)}]_{1,D+1} + \cdots + [\mathbf{W}]_{i,D+1}[\mathbf{G}^{(i)}]_{D+1,D+1} = [\mathbf{Z}]_{i,D+1} & (j = D+1)
\end{cases}$$

Tied Regression Matrices
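The row-by-row solve for the diagonal-covariance case can be sketched as follows; the accumulators $\mathbf{V}^{(s)}$, $\mathbf{D}^{(s)}$ and the target transform below are hypothetical, not from any real recognizer:

```python
import numpy as np

# Row i of W solves  W_i G^(i) = Z_i,  with  G^(i) = sum_s [V^(s)]_ii D^(s),
# once covariances (and hence the V^(s)) are diagonal.
rng = np.random.default_rng(2)
D = 3
states = []
for _ in range(5):
    v = np.diag(rng.uniform(0.5, 2.0, D))          # V^(s): gamma-weighted C_s^{-1}
    xi = np.concatenate(([1.0], rng.standard_normal(D)))
    states.append((v, np.outer(xi, xi)))           # D^(s) = xi xi^T

W_true = rng.standard_normal((D, D + 1))
Z = sum(v @ W_true @ d for v, d in states)         # right-hand side it must match

W_hat = np.zeros_like(W_true)
for i in range(D):
    G_i = sum(v[i, i] * d for v, d in states)      # (D+1) x (D+1)
    W_hat[i] = np.linalg.solve(G_i.T, Z[i])        # solves W_i G^(i) = Z_i

assert np.allclose(W_hat, W_true)
```

Decoupling into D small (D+1)-dimensional systems is what makes the diagonal case cheap compared with the fully coupled full-covariance system.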

slide-78
SLIDE 78

MLLR

If the covariance matrix is still full, we could obtain $\mathbf{W}$ by solving the linear equations below, which now couple all entries of $\mathbf{W}$ through coefficient matrices $\mathbf{G}^{(i,j)}$:

$$\begin{cases}
\sum_{k,l} [\mathbf{G}^{(1,1)}]_{k,l} [\mathbf{W}]_{k,l} = [\mathbf{Z}]_{1,1} & (i = 1,\ j = 1) \\
\sum_{k,l} [\mathbf{G}^{(1,2)}]_{k,l} [\mathbf{W}]_{k,l} = [\mathbf{Z}]_{1,2} & (i = 1,\ j = 2) \\
\quad \vdots \\
\sum_{k,l} [\mathbf{G}^{(D,D+1)}]_{k,l} [\mathbf{W}]_{k,l} = [\mathbf{Z}]_{D,D+1} & (i = D,\ j = D+1)
\end{cases}$$

Tied Regression Matrices

slide-79
SLIDE 79

MLLR

  • Then the p.d.f. for state j would be

$$b_j(\mathbf{x}) = \sum_{k=1}^{K} w_{jk} \frac{1}{(2\pi)^{D/2}|\mathbf{C}_{jk}|^{1/2}} \, e^{-\frac{1}{2}(\mathbf{x}-\boldsymbol{\mu}_{jk})^T \mathbf{C}_{jk}^{-1} (\mathbf{x}-\boldsymbol{\mu}_{jk})}$$

where $\boldsymbol{\mu}_{jk}$ is the mean, $\mathbf{C}_{jk}$ is the covariance matrix and $w_{jk}$ is the mixture weight.

  • Then the likelihood is

$$f(\mathbf{X}|\lambda) = \sum_{\mathbf{S}} \sum_{\mathbf{L}} f(\mathbf{X}, \mathbf{S}, \mathbf{L}|\lambda)$$

where $\mathbf{S}$ is one possible state sequence and $\mathbf{L}$ is one possible mixture sequence.

Mixture Gaussian Case

slide-80
SLIDE 80

MLLR

  • Then the Q-function will be

$$Q(\lambda, \bar{\lambda}) = \sum_{\mathbf{S}} \sum_{\mathbf{L}} f(\mathbf{S}, \mathbf{L}|\mathbf{X}, \lambda) \log f(\mathbf{X}, \mathbf{S}, \mathbf{L}|\bar{\lambda})$$

Only considering the terms which depend on the regression transform:

$$Q(\lambda, \bar{\lambda}) = \sum_{t=1}^{T} \sum_{j} \sum_{k} \gamma_{jk}(t) \log \bar{b}_{jk}(\mathbf{x}_t), \qquad \gamma_{jk}(t) = f(s_t = j, l_t = k|\mathbf{X}, \lambda)$$

  • The derivation is the same as in the single Gaussian case; just substitute $\gamma_{jk}(t)$ for $\gamma_j(t)$.

Mixture Gaussian Case

slide-81
SLIDE 81

MLLR

  • If all the covariances of the distributions tied to the same transformation are the same (a special case of MLLR), then

$$\sum_{r=1}^{R} \sum_{t=1}^{T} \gamma_{s_r}(t)\, \mathbf{C}^{-1} \mathbf{x}_t \boldsymbol{\xi}_{s_r}^T = \sum_{r=1}^{R} \sum_{t=1}^{T} \gamma_{s_r}(t)\, \mathbf{C}^{-1} \mathbf{W} \boldsymbol{\xi}_{s_r} \boldsymbol{\xi}_{s_r}^T$$

can be rewritten as

$$\sum_{r=1}^{R} \sum_{t=1}^{T} \gamma_{s_r}(t)\, \mathbf{x}_t \boldsymbol{\xi}_{s_r}^T = \mathbf{W} \sum_{r=1}^{R} \sum_{t=1}^{T} \gamma_{s_r}(t)\, \boldsymbol{\xi}_{s_r} \boldsymbol{\xi}_{s_r}^T$$

Least Squares Regression

slide-82
SLIDE 82

MLLR

  • If each frame is assigned to exactly one distribution (Viterbi alignment), define

$$\gamma_{s_r}(t) = \delta(t, s_r) = \begin{cases} 1 & \text{if } \mathbf{x}_t \text{ is assigned to state distribution } s_r \\ 0 & \text{otherwise} \end{cases}$$

  • Then

$$\sum_{t=1}^{T} \mathbf{x}_t \boldsymbol{\xi}_{s(t)}^T = \mathbf{W} \sum_{t=1}^{T} \boldsymbol{\xi}_{s(t)} \boldsymbol{\xi}_{s(t)}^T$$

where $s(t)$ is the state distribution to which $\mathbf{x}_t$ is assigned.

Least Squares Regression

slide-83
SLIDE 83

MLLR

Define matrices $\mathbf{X}$ and $\mathbf{Y}$ as

$$\mathbf{X} = \left[ \boldsymbol{\xi}_{s(1)}, \boldsymbol{\xi}_{s(2)}, \ldots, \boldsymbol{\xi}_{s(T)} \right], \qquad \mathbf{Y} = \left[ \mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_T \right]$$

Then $\mathbf{Y}\mathbf{X}^T = \mathbf{W}\mathbf{X}\mathbf{X}^T$, so

$$\mathbf{W} = \mathbf{Y}\mathbf{X}^T \left( \mathbf{X}\mathbf{X}^T \right)^{-1}$$

Least Squares Regression
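A toy numerical check of the least-squares solution under a hard Viterbi alignment; the three state means, the transform and the frame noise are all hypothetical:

```python
import numpy as np

# Least-squares MLLR sketch: with Y = [x_1 ... x_T] and
# X = [xi_s(1) ... xi_s(T)], the shared transform is W = Y X^T (X X^T)^{-1}.
rng = np.random.default_rng(3)
D, T = 2, 300
W_true = np.array([[1.1, 0.9, -0.2],
                   [0.4, 0.3, 1.0]])

# Each frame hard-assigned to one of three extended state means.
mus = [rng.standard_normal(D) for _ in range(3)]
X = np.stack([np.concatenate(([1.0], mus[t % 3])) for t in range(T)], axis=1)
Y = W_true @ X + 0.01 * rng.standard_normal((D, T))

W_hat = (Y @ X.T) @ np.linalg.inv(X @ X.T)
assert np.allclose(W_hat, W_true, atol=0.05)
```

This is exactly ordinary least squares on (extended mean, observation) pairs, which is why the slide calls this special case Least Squares Regression.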

slide-84
SLIDE 84

MLLR

  • If the scaling portion of the regression

matrix is assumed to be diagonal, the computation can be vastly reduced.

$$\mathbf{W}^{(s)} = \begin{bmatrix} w_{1,1} & w_{1,2} & & & \\ w_{2,1} & & w_{2,3} & & \\ \vdots & & & \ddots & \\ w_{D,1} & & & & w_{D,D+1} \end{bmatrix} \quad (D \times (D+1))$$

with zeros elsewhere. It means that $\hat{\mu}_i = x_i + y_i \mu_i$: each dimension is adapted by an offset plus a scaling.

Single Variable Linear Regression

slide-85
SLIDE 85

MLLR

And define a matrix $\mathbf{D}_s$ whose i-th row carries the pair $(\omega, \mu_i)$ in its own column block,

$$\mathbf{D}_s = \begin{bmatrix} \omega & \mu_1 & & & & \\ & & \omega & \mu_2 & & \\ & & & & \ddots & \\ & & & & & \omega \;\; \mu_D \end{bmatrix}$$

so that $\mathbf{W}_s \boldsymbol{\xi}_s = \mathbf{D}_s \mathbf{w}_s$, where $\mathbf{w}_s$ stacks the free parameters of $\mathbf{W}^{(s)}$. Then

$$h(\mathbf{x}_t, s) = (\mathbf{x}_t - \mathbf{W}_s\boldsymbol{\xi}_s)^T \mathbf{C}_s^{-1} (\mathbf{x}_t - \mathbf{W}_s\boldsymbol{\xi}_s) = (\mathbf{x}_t - \mathbf{D}_s\mathbf{w}_s)^T \mathbf{C}_s^{-1} (\mathbf{x}_t - \mathbf{D}_s\mathbf{w}_s)$$

Single Variable Linear Regression

slide-86
SLIDE 86

MLLR

The differential of $Q$ w.r.t. $\mathbf{w}_s$, set to zero:

$$\frac{\partial Q}{\partial \mathbf{w}_s} = \sum_{t=1}^{T} \gamma_s(t)\, \mathbf{D}_s^T \mathbf{C}_s^{-1} (\mathbf{x}_t - \mathbf{D}_s\mathbf{w}_s) = 0$$

$$\Rightarrow \left( \sum_{t=1}^{T} \gamma_s(t)\, \mathbf{D}_s^T \mathbf{C}_s^{-1} \mathbf{D}_s \right) \mathbf{w}_s = \sum_{t=1}^{T} \gamma_s(t)\, \mathbf{D}_s^T \mathbf{C}_s^{-1} \mathbf{x}_t$$

$$\therefore \mathbf{w}_s = \left[ \sum_{t=1}^{T} \gamma_s(t)\, \mathbf{D}_s^T \mathbf{C}_s^{-1} \mathbf{D}_s \right]^{-1} \left[ \sum_{t=1}^{T} \gamma_s(t)\, \mathbf{D}_s^T \mathbf{C}_s^{-1} \mathbf{x}_t \right]$$

Single Variable Linear Regression

slide-87
SLIDE 87

MLLR

The extension to the tied regression matrix case:

$$\mathbf{w} = \left[ \sum_{r=1}^{R} \sum_{t=1}^{T} \gamma_{s_r}(t)\, \mathbf{D}_{s_r}^T \mathbf{C}_{s_r}^{-1} \mathbf{D}_{s_r} \right]^{-1} \left[ \sum_{r=1}^{R} \sum_{t=1}^{T} \gamma_{s_r}(t)\, \mathbf{D}_{s_r}^T \mathbf{C}_{s_r}^{-1} \mathbf{x}_t \right]$$

Single Variable Linear Regression

slide-88
SLIDE 88

MLLR

  • Two approaches were considered:

– 1. Based on broad phonetic classes.

  • Models which represent the same broad phonetic class were placed in the same regression class.

– 2. Based on clustering of mixture components.

  • The mixture components were compared using a likelihood measure, and similar components were placed in the same regression class.

  • The data-driven approach was found to be more appropriate for defining large numbers of classes.

Defining Regression Classes

slide-89
SLIDE 89

MLLR

Variance Adaptation – Reference:
– Variance Compensation Within the MLLR Framework for Robust Speech Recognition and Speaker Adaptation – ICSLP'96, Gales
– Mean and Variance Adaptation Within the MLLR Framework – CSL'96, Gales and Woodland
– MLLR: A Speaker Adaptation Technique for LVCSR – Hamaker

slide-90
SLIDE 90

MLLR

  • We apply the Cholesky decomposition to the inverse of the covariance matrix:

$$\mathbf{C}_s^{-1} = \mathbf{L}_s \mathbf{L}_s^T \qquad \therefore \mathbf{C}_s = \mathbf{L}_s^{-T} \mathbf{L}_s^{-1}$$

where $\mathbf{L}_s$ is a lower triangular matrix.

  • We can observe that

$$[\mathbf{C}_s^{-1}]_{ij} = \sum_{d=1}^{D} [\mathbf{L}_s]_{id} [\mathbf{L}_s]_{jd}$$

  • Now the covariance matrix is updated by

$$\hat{\mathbf{C}}_s = \mathbf{L}_s^{-T} \mathbf{H}_s \mathbf{L}_s^{-1}$$

where $\mathbf{H}_s$ is the linear transformation.

  • So

$$\hat{\mathbf{C}}_s^{-1} = \mathbf{L}_s \mathbf{H}_s^{-1} \mathbf{L}_s^T$$

Variance Adapted Single Gaussian Case
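A small numerical sketch of this factored variance update, using a hypothetical covariance and a hypothetical diagonal transform (NumPy's `cholesky` returns exactly the lower factor needed here):

```python
import numpy as np

# With C^{-1} = L L^T, the adapted inverse covariance is
# C_hat^{-1} = L H^{-1} L^T, i.e. C_hat = L^{-T} H L^{-1}.
rng = np.random.default_rng(4)
D = 3
A = rng.standard_normal((D, D))
C = A @ A.T + D * np.eye(D)                  # a valid covariance matrix
L = np.linalg.cholesky(np.linalg.inv(C))     # lower triangular, C^{-1} = L L^T
assert np.allclose(L @ L.T, np.linalg.inv(C))

H = np.diag(rng.uniform(0.5, 2.0, D))        # a simple diagonal transform
C_hat_inv = L @ np.linalg.inv(H) @ L.T
C_hat = np.linalg.inv(L).T @ H @ np.linalg.inv(L)
assert np.allclose(C_hat_inv @ C_hat, np.eye(D))
```

Keeping the Cholesky factor fixed and transforming only H guarantees the adapted matrix stays a valid (symmetric positive definite) covariance whenever H is.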

slide-91
SLIDE 91

MLLR

  • What does the transformation mean?

– Origin:

$$[\mathbf{C}_s^{-1}]_{ij} = \sum_{d=1}^{D} [\mathbf{L}_s]_{id} [\mathbf{L}_s]_{jd}$$

– New:

$$[\hat{\mathbf{C}}_s^{-1}]_{ij} = \sum_{k=1}^{D} \sum_{d=1}^{D} [\mathbf{L}_s]_{ik} [\mathbf{H}_s^{-1}]_{kd} [\mathbf{L}_s]_{jd}$$

(the i-th row of $\mathbf{L}_s$, the k-th row and d-th column of $\mathbf{H}_s^{-1}$, and the j-th column of $\mathbf{L}_s^T$).

Variance Adapted Single Gaussian Case

slide-92
SLIDE 92

MLLR

  • The auxiliary function can be obtained as

$$Q(\lambda, \bar{\lambda}) = \text{constant} - \frac{1}{2} \sum_{j=1}^{N} \sum_{t=1}^{T} \gamma_j(t) \left[ D\log(2\pi) + \log|\mathbf{C}_j| + \log|\mathbf{H}_j| + \left(\mathbf{L}_j^T(\mathbf{x}_t - \boldsymbol{\mu}_j)\right)^T \mathbf{H}_j^{-1} \left(\mathbf{L}_j^T(\mathbf{x}_t - \boldsymbol{\mu}_j)\right) \right]$$

using $\log|\hat{\mathbf{C}}_j| = \log|\mathbf{L}_j^{-T}\mathbf{H}_j\mathbf{L}_j^{-1}| = \log|\mathbf{C}_j| + \log|\mathbf{H}_j|$.

Variance Adapted Single Gaussian Case

slide-93
SLIDE 93

MLLR

  • Differentiate the Q-function w.r.t. $\mathbf{H}_j$ and set it to zero; then

$$\frac{\partial}{\partial \mathbf{H}_j} \sum_{t=1}^{T} \gamma_j(t) \left[ \log|\mathbf{H}_j| + \left(\mathbf{L}_j^T(\mathbf{x}_t - \boldsymbol{\mu}_j)\right)^T \mathbf{H}_j^{-1} \left(\mathbf{L}_j^T(\mathbf{x}_t - \boldsymbol{\mu}_j)\right) \right] = 0$$

$$\Rightarrow \mathbf{H}_j = \frac{\sum_{t=1}^{T} \gamma_j(t)\, \mathbf{L}_j^T (\mathbf{x}_t - \boldsymbol{\mu}_j)(\mathbf{x}_t - \boldsymbol{\mu}_j)^T \mathbf{L}_j}{\sum_{t=1}^{T} \gamma_j(t)}$$

Variance Adapted Single Gaussian Case

slide-94
SLIDE 94

MLLR

We can observe that $\mathbf{H}_j$ is symmetric:

$$\mathbf{H}_j^T = \left[ \frac{\sum_{t=1}^{T} \gamma_j(t)\, \mathbf{L}_j^T (\mathbf{x}_t - \boldsymbol{\mu}_j)(\mathbf{x}_t - \boldsymbol{\mu}_j)^T \mathbf{L}_j}{\sum_{t=1}^{T} \gamma_j(t)} \right]^T = \mathbf{H}_j$$

Variance Adapted Single Gaussian Case
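A quick sanity check of the closed-form H estimate for one state; the mean, covariance and frames below are synthetic, and frames are drawn from the SI model itself so H should come out near the identity:

```python
import numpy as np

# H = sum_t gamma(t) L^T (x_t - mu)(x_t - mu)^T L / sum_t gamma(t),
# which is symmetric by construction.
rng = np.random.default_rng(5)
D, T = 2, 500
mu = np.array([0.5, -1.0])
C = np.array([[1.0, 0.3], [0.3, 0.5]])
L = np.linalg.cholesky(np.linalg.inv(C))           # C^{-1} = L L^T

x = rng.multivariate_normal(mu, C, size=T).T       # columns are frames
gamma = np.ones(T)                                 # hard occupancies

diff = x - mu[:, None]
H = (L.T @ (diff * gamma) @ diff.T @ L) / gamma.sum()
assert np.allclose(H, H.T)                         # symmetric
assert np.allclose(H, np.eye(D), atol=0.3)         # near identity: no mismatch
```

When the adaptation data match the SI model, $\mathbf{L}^T \mathbf{C} \mathbf{L} = \mathbf{I}$, so H only deviates from the identity in proportion to the actual variance mismatch.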

slide-95
SLIDE 95

MLLR

If $\mathbf{H}_s$ is shared by states $\{s_1, \ldots, s_R\}$, then

$$\mathbf{H}_s = \frac{\sum_{r=1}^{R} \sum_{t=1}^{T} \gamma_{s_r}(t)\, \mathbf{L}_{s_r}^T (\mathbf{x}_t - \boldsymbol{\mu}_{s_r})(\mathbf{x}_t - \boldsymbol{\mu}_{s_r})^T \mathbf{L}_{s_r}}{\sum_{r=1}^{R} \sum_{t=1}^{T} \gamma_{s_r}(t)}$$

Variance Adapted Tied Regression Matrices Case

slide-96
SLIDE 96

MLLR another approach

slide-97
SLIDE 97

MLLR another approach

  • Reference:

– Speaker Adaptation Using Constrained Estimation of Gaussian Mixtures – SAP’95 Vassilios V. Digalakis

slide-98
SLIDE 98

Introduction

  • This approach is an extension of model-space MLLR where the covariances of the Gaussian components are constrained to share the same transforms as the means.
  • The transformed mean $\hat{\boldsymbol{\mu}}$ and variance $\hat{\boldsymbol{\Sigma}}$ are given as a function of the transform parameters:

$$\hat{\boldsymbol{\mu}} = \mathbf{A}\boldsymbol{\mu} + \mathbf{b}, \qquad \hat{\boldsymbol{\Sigma}} = \mathbf{A}\boldsymbol{\Sigma}\mathbf{A}^T$$
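A minimal sketch of the constrained update with hypothetical parameter values, checking only that the shared-transform pair produces a valid adapted Gaussian:

```python
import numpy as np

# Constrained transform: mean and covariance share the same A,
#   mu_hat = A mu + b,   Sigma_hat = A Sigma A^T.
mu = np.array([1.0, 2.0])
Sigma = np.array([[1.0, 0.2], [0.2, 0.8]])
A = np.array([[1.1, 0.0], [0.1, 0.9]])       # invertible, hypothetical
b = np.array([-0.5, 0.3])

mu_hat = A @ mu + b
Sigma_hat = A @ Sigma @ A.T

# A Sigma A^T stays symmetric positive definite for invertible A.
assert np.allclose(Sigma_hat, Sigma_hat.T)
assert np.all(np.linalg.eigvalsh(Sigma_hat) > 0)
```

Because both parameters use one A, this model-space update is equivalent to a single feature-space transform, which is the key computational attraction of the constrained formulation.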

slide-99
SLIDE 99

Single Gaussian Case

  • Assume the adaptation data $\mathbf{X}$ is a series of $T$ observations, $\mathbf{X} = \mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_T$.
  • For each state $s$, denote the initial model by $\lambda^{(s)} = (\boldsymbol{\mu}^{(s)}, \boldsymbol{\Sigma}^{(s)})$.
  • The current set of model parameters is obtained by applying the transformation $\mathbf{A}_s$:

$$\lambda^{(s)} = \left( \mathbf{A}_s\boldsymbol{\mu}^{(s)} + \mathbf{b}_s, \; \mathbf{A}_s\boldsymbol{\Sigma}^{(s)}\mathbf{A}_s^T \right)$$

slide-100
SLIDE 100

Single Gaussian Case

  • The re-estimated set of model parameters is obtained by applying the transformation $\bar{\mathbf{A}}_s$:

$$\bar{\lambda}^{(s)} = \left( \bar{\mathbf{A}}_s\boldsymbol{\mu}^{(s)} + \bar{\mathbf{b}}_s, \; \bar{\mathbf{A}}_s\boldsymbol{\Sigma}^{(s)}\bar{\mathbf{A}}_s^T \right)$$

  • We denote the parameter sets

$$\boldsymbol{\Lambda} = \{\boldsymbol{\mu}^{(1)}, \boldsymbol{\mu}^{(2)}, \ldots, \boldsymbol{\mu}^{(N)}, \boldsymbol{\Sigma}^{(1)}, \boldsymbol{\Sigma}^{(2)}, \ldots, \boldsymbol{\Sigma}^{(N)}\}, \qquad \boldsymbol{\eta} = \{\mathbf{A}_1, \mathbf{A}_2, \ldots, \mathbf{A}_N, \mathbf{b}_1, \mathbf{b}_2, \ldots, \mathbf{b}_N\}$$

where $N$ is the total state number.

slide-101
SLIDE 101

Single Gaussian Case

$$p(\mathbf{X}|\boldsymbol{\Lambda}, \boldsymbol{\eta}) = \sum_{\mathbf{S}} p(\mathbf{X}, \mathbf{S}|\boldsymbol{\Lambda}, \boldsymbol{\eta})$$

where $p(\mathbf{X}, \mathbf{S}|\boldsymbol{\Lambda}, \boldsymbol{\eta})$ is the likelihood of generating $\mathbf{X}$ using the state sequence $\mathbf{S}$ given model $\boldsymbol{\Lambda}$ and transformation $\boldsymbol{\eta}$.

$$\boldsymbol{\eta}_{MAP} = \arg\max_{\boldsymbol{\eta}} p(\boldsymbol{\eta}|\mathbf{X}, \boldsymbol{\Lambda}) = \arg\max_{\boldsymbol{\eta}} p(\mathbf{X}|\boldsymbol{\Lambda}, \boldsymbol{\eta}) \, g(\boldsymbol{\eta})$$

slide-102
SLIDE 102

Single Gaussian Case

  • We define the auxiliary function

$$Q(\boldsymbol{\eta}, \bar{\boldsymbol{\eta}}) = \sum_{\mathbf{S}} p(\mathbf{S}|\mathbf{X}, \boldsymbol{\Lambda}, \boldsymbol{\eta}) \log p(\mathbf{X}, \mathbf{S}|\boldsymbol{\Lambda}, \bar{\boldsymbol{\eta}})$$

  • Since only the transformations $\mathbf{A}_s, \mathbf{b}_s$ are re-estimated, only the output distributions $b_s(\mathbf{x}_t)$ are affected, so the auxiliary function can be written as

$$Q(\boldsymbol{\Lambda}, \bar{\boldsymbol{\Lambda}}) = \text{constant} + \sum_{j} \sum_{t=1}^{T} \gamma_j(t) \log \bar{b}_j(\mathbf{x}_t), \qquad \gamma_j(t) = p(s_t = j|\mathbf{X}, \boldsymbol{\Lambda})$$

slide-103
SLIDE 103

Single Gaussian Case

  • Where

$$\log \bar{b}_j(\mathbf{x}_t) = -\frac{D}{2}\log(2\pi) - \frac{1}{2}\log|\bar{\boldsymbol{\Sigma}}_j| - \frac{1}{2}(\mathbf{x}_t - \bar{\boldsymbol{\mu}}_j)^T \bar{\boldsymbol{\Sigma}}_j^{-1} (\mathbf{x}_t - \bar{\boldsymbol{\mu}}_j)$$

slide-104
SLIDE 104

Single Gaussian Case

  • Expanding $\log \bar{b}_j(\mathbf{x}_t)$, the auxiliary function is

$$Q(\boldsymbol{\eta}, \bar{\boldsymbol{\eta}}) = \text{constant} - \frac{1}{2} \sum_{j=1}^{N} \sum_{t=1}^{T} \gamma_j(t) \left[ D\log(2\pi) + \log|\mathbf{A}_j\boldsymbol{\Sigma}^{(j)}\mathbf{A}_j^T| + h(\mathbf{x}_t, j) \right]$$

where

$$h(\mathbf{x}_t, j) = (\mathbf{x}_t - \mathbf{A}_j\boldsymbol{\mu}^{(j)} - \mathbf{b}_j)^T \left( \mathbf{A}_j\boldsymbol{\Sigma}^{(j)}\mathbf{A}_j^T \right)^{-1} (\mathbf{x}_t - \mathbf{A}_j\boldsymbol{\mu}^{(j)} - \mathbf{b}_j)$$

Assume that $[\boldsymbol{\Sigma}_j]_{pq}$ is the p-th row, q-th column element of the matrix $\boldsymbol{\Sigma}_j$, and $[\mathbf{A}_j]_{pq}$ is the p-th row, q-th column element of the matrix $\mathbf{A}_j$.

slide-105
SLIDE 105

Single Gaussian Case

$$\nabla_{\mathbf{A}_j} Q(\boldsymbol{\eta}, \bar{\boldsymbol{\eta}}) = 0$$

$$\Rightarrow \sum_{t=1}^{T} \gamma_j(t) \left[ \nabla_{\mathbf{A}_j} \log|\mathbf{A}_j\boldsymbol{\Sigma}^{(j)}\mathbf{A}_j^T| + \nabla_{\mathbf{A}_j}\, h(\mathbf{x}_t, j) \right] = 0$$

with $h(\mathbf{x}_t, j) = (\mathbf{x}_t - \mathbf{A}_j\boldsymbol{\mu}^{(j)} - \mathbf{b}_j)^T (\mathbf{A}_j\boldsymbol{\Sigma}^{(j)}\mathbf{A}_j^T)^{-1} (\mathbf{x}_t - \mathbf{A}_j\boldsymbol{\mu}^{(j)} - \mathbf{b}_j)$.

slide-106
SLIDE 106

Single Gaussian Case

Writing $\mathbf{e}_t = \mathbf{x}_t - \mathbf{A}_j\boldsymbol{\mu}^{(j)} - \mathbf{b}_j$ and using $\nabla_{\mathbf{A}} \log|\mathbf{A}\boldsymbol{\Sigma}\mathbf{A}^T| = 2\mathbf{A}^{-T}$, the gradient condition becomes

$$\sum_{t=1}^{T} \gamma_j(t) \left[ \mathbf{A}_j^{-T} - \left(\mathbf{A}_j\boldsymbol{\Sigma}^{(j)}\mathbf{A}_j^T\right)^{-1} \mathbf{e}_t \boldsymbol{\mu}^{(j)T} - \left(\mathbf{A}_j\boldsymbol{\Sigma}^{(j)}\mathbf{A}_j^T\right)^{-1} \mathbf{e}_t \mathbf{e}_t^T \mathbf{A}_j^{-T} \right] = 0$$

slide-107
SLIDE 107

Single Gaussian Case

Multiplying on the left by $\mathbf{A}_j^T$ (using $\mathbf{A}_j^T (\mathbf{A}_j\boldsymbol{\Sigma}^{(j)}\mathbf{A}_j^T)^{-1} = \boldsymbol{\Sigma}^{(j)-1}\mathbf{A}_j^{-1}$, with $\mathbf{e}_t = \mathbf{x}_t - \mathbf{A}_j\boldsymbol{\mu}^{(j)} - \mathbf{b}_j$) gives

$$\sum_{t=1}^{T} \gamma_j(t) \left[ \mathbf{I} - \boldsymbol{\Sigma}^{(j)-1}\mathbf{A}_j^{-1} \mathbf{e}_t \boldsymbol{\mu}^{(j)T} - \boldsymbol{\Sigma}^{(j)-1}\mathbf{A}_j^{-1} \mathbf{e}_t \mathbf{e}_t^T \mathbf{A}_j^{-T} \right] = 0$$

slide-108
SLIDE 108

Single Gaussian Case

  • Since covariance is diagonal, means that

there is no correlation between dimensions. Hence, the transformation is assumed to be diagonal.

[ ] [ ] [ ]

[ ]

[ ] [ ]

[ ]

) ( ) )( ( ) ( ) ( 2 ) )( ( 2 2 ) ( symmetric is

1 ) ( 1 ) ( 1 1 ) ( 1 ) ( 1 ) ( 1 1 ) ( 1 ) (

= − + − − − = − + − − −

∑ ∑

= − − − = − − − − T t T j t j j T j t j t j j T j j T t T j t j j T j t j t j j T j j j

t t b x µ Σ b x b x A Σ A b x µ Σ b x b x A Σ A Σ γ γ Q

slide-109
SLIDE 109

Single Gaussian Case (ML)

With $\mathbf{A}_j = \text{diag}(a_1, \ldots, a_D)$, let (per dimension $p$, with $\sigma_p^2 = [\boldsymbol{\Sigma}^{(j)}]_{pp}$)

$$z_p = \sum_{t=1}^{T} \gamma_j(t)\, \sigma_p^{-2} \left([\mathbf{x}_t]_p - b_{j,p}\right)^2, \qquad d_p = \sum_{t=1}^{T} \gamma_j(t)\, \sigma_p^{-2}\, \mu^{(j)}_p \left([\mathbf{x}_t]_p - b_{j,p}\right), \qquad g = \sum_{t=1}^{T} \gamma_j(t)$$

Then each diagonal element $a_p$ satisfies the quadratic equation

$$g \, a_p^2 + d_p \, a_p - z_p = 0$$

whose positive root gives the ML estimate.

slide-110
SLIDE 110

Constrained Maximum Likelihood Linear Regression

slide-111
SLIDE 111

CMLLR

  • Reference:

– Constrained Maximum Likelihood Linear Regression for Speaker Adaptation – ICSLP’00 Mohamed Afify and Olivier Siohan

slide-112
SLIDE 112

CMLLR

slide-113
SLIDE 113

MAPLR

slide-114
SLIDE 114

MAPLR

  • Reference:

– Hidden Markov Model Adaptation Using Maximum a Posteriori Linear Regression – Workshop on Robust Methods for Speech Recognition in Adverse Conditions '99
– Maximum a Posteriori Linear Regression for Hidden Markov Model Adaptation – EuroSpeech'99, C. Chesta
– Maximum a Posterior Linear Regression with Elliptically Symmetric Matrix Variate Priors – EuroSpeech'99, W. Chou

slide-115
SLIDE 115

MAPLR

  • It is necessary to introduce some constraints on the possible values of the transformation parameters to avoid getting unreasonable estimates.
  • A Bayesian counterpart of the well-known MLLR adaptation is formulated based on MAP estimation.

slide-116
SLIDE 116

MAPLR

  • Let $\boldsymbol{\Lambda}$ be a set of SI hidden Markov models.
  • Some transformations $F_{\boldsymbol{\eta}}(\cdot)$, with $\boldsymbol{\eta} = (\mathbf{A}, \mathbf{b})$, are applied to various clusters of HMM parameters.
  • Denote by $\mathbf{X}$ some adaptation data. Then

$$\hat{\boldsymbol{\eta}}_{MAP} = \arg\max_{\boldsymbol{\eta}} p(\boldsymbol{\eta}|\mathbf{X}, \boldsymbol{\Lambda}) \propto \arg\max_{\boldsymbol{\eta}} p(\mathbf{X}|\boldsymbol{\eta}, \boldsymbol{\Lambda}) \, p(\boldsymbol{\eta})$$

  • Rather than carrying out the estimation using ML, we derive an estimate of $\boldsymbol{\eta}$ using MAP.

slide-117
SLIDE 117

MAPLR

  • In a given state $j$, the pdf of an observation vector $\mathbf{x}$ is modeled by a mixture of $K$ Gaussian distributions:

$$p(\mathbf{x}|s = j) = \sum_{k=1}^{K} w_{j,k} \, N(\mathbf{x} \,|\, \boldsymbol{\mu}_{j,k}, \mathbf{R}_{j,k})$$

where $N(\mathbf{x}|\boldsymbol{\mu}_{j,k}, \mathbf{R}_{j,k})$ is a Normal distribution of mean $\boldsymbol{\mu}_{j,k}$ and precision matrix $\mathbf{R}_{j,k}$:

$$N(\mathbf{x}|\boldsymbol{\mu}_{j,k}, \mathbf{R}_{j,k}) \propto |\mathbf{R}_{j,k}|^{\frac{1}{2}} \exp\left[ -\frac{1}{2} \operatorname{tr}\left( (\mathbf{x} - \boldsymbol{\mu}_{j,k})(\mathbf{x} - \boldsymbol{\mu}_{j,k})^T \mathbf{R}_{j,k} \right) \right]$$

slide-118
SLIDE 118

MAPLR

  • A mean vector $\boldsymbol{\mu}_{j,k} \in \mathbb{R}^D$ is adapted using an affine transformation

$$\hat{\boldsymbol{\mu}}_{j,k} = \mathbf{W}\boldsymbol{\xi}_{j,k}$$

  • where $\boldsymbol{\eta} = \{\mathbf{A}, \mathbf{b}\}$, $\mathbf{W} = (\mathbf{b}, \mathbf{A})$ and $\boldsymbol{\xi}_{j,k} = (1, \boldsymbol{\mu}_{j,k})$.
  • Clusters $c$ of mean vectors are also defined so that all mean vectors from the same cluster share the same transformation $\mathbf{W}_c$.
  • So the adapted Gaussian distribution becomes

$$N(\mathbf{x}_t \,|\, \mathbf{A}_c\boldsymbol{\mu}_{s_t,l_t} + \mathbf{b}_c, \mathbf{R}_{s_t,l_t}) \propto |\mathbf{R}_{s_t,l_t}|^{\frac{1}{2}} \exp\left[ -\frac{1}{2} \operatorname{tr}\left( (\mathbf{x}_t - \mathbf{A}_c\boldsymbol{\mu}_{s_t,l_t} - \mathbf{b}_c)(\mathbf{x}_t - \mathbf{A}_c\boldsymbol{\mu}_{s_t,l_t} - \mathbf{b}_c)^T \mathbf{R}_{s_t,l_t} \right) \right]$$
slide-119
SLIDE 119

MAPLR

  • How to select the prior density $p(\mathbf{W})$?

– Unlike MAP estimation of HMM parameters, no obvious conjugate prior densities can be found in our case.
– If the prior of the matrix is chosen as the product of Normal-Wishart densities, no closed-form solution can be obtained for the square transformation matrix.

slide-120
SLIDE 120

MAPLR

  • Let $\mathbf{X} = \{\mathbf{x}_t\}$ be an adaptation utterance used to derive $\mathbf{W}$.
  • Define an auxiliary function

$$Q(\boldsymbol{\eta}_c, \bar{\boldsymbol{\eta}}_c) = E\left[ \log p(\mathbf{X}, \mathbf{S}, \mathbf{L} \,|\, \boldsymbol{\Lambda}, \bar{\boldsymbol{\eta}}_c) \,\middle|\, \mathbf{X}, \boldsymbol{\Lambda}, \boldsymbol{\eta}_c \right] + \log p(\bar{\boldsymbol{\eta}}_c) = \sum_{\mathbf{S}} \sum_{\mathbf{L}} p(\mathbf{S}, \mathbf{L}|\mathbf{X}, \boldsymbol{\Lambda}, \boldsymbol{\eta}_c) \log p(\mathbf{X}, \mathbf{S}, \mathbf{L}|\boldsymbol{\Lambda}, \bar{\boldsymbol{\eta}}_c) + \log p(\bar{\boldsymbol{\eta}}_c)$$

where $\mathbf{S} = \{s_t\}$ is the state sequence and $\mathbf{L} = \{l_t\}$ the mixture sequence (the hidden data; $\mathbf{X}, \mathbf{S}, \mathbf{L}$ together form the complete data).

slide-121
SLIDE 121

MAPLR

$$\begin{aligned}
Q(\boldsymbol{\eta}_c, \bar{\boldsymbol{\eta}}_c) &= \sum_{\mathbf{S}} \sum_{\mathbf{L}} p(\mathbf{S}, \mathbf{L}|\mathbf{X}, \boldsymbol{\Lambda}, \boldsymbol{\eta}_c) \left[ \sum_t \log a_{s_{t-1}s_t} + \sum_t \log w_{s_t,l_t} + \sum_t \log N(\mathbf{x}_t|\boldsymbol{\mu}_{s_t,l_t}, \mathbf{R}_{s_t,l_t}, \bar{\boldsymbol{\eta}}_c) \right] + \log p(\bar{\boldsymbol{\eta}}_c) \\
&= -\frac{1}{2} \sum_{t=1}^{T} \sum_{j=1}^{N} \sum_{k=1}^{K} \gamma(j,k,t) \operatorname{tr}\left( (\mathbf{x}_t - \bar{\mathbf{A}}_c\boldsymbol{\mu}_{j,k} - \bar{\mathbf{b}}_c)(\mathbf{x}_t - \bar{\mathbf{A}}_c\boldsymbol{\mu}_{j,k} - \bar{\mathbf{b}}_c)^T \mathbf{R}_{j,k} \right) + \log p(\bar{\boldsymbol{\eta}}_c) + \Psi'
\end{aligned}$$

where $\gamma(j,k,t) = P(s_t = j, l_t = k \,|\, \mathbf{X}, \boldsymbol{\Lambda}, \boldsymbol{\eta}_c)$ and $\Psi'$ collects all terms independent of $\bar{\boldsymbol{\eta}}_c$.

slide-122
SLIDE 122

MAPLR

  • We assume that $p(\mathbf{W}_c) = p(\mathbf{A}_c, \mathbf{b}_c) = p(\mathbf{A}_c)\,p(\mathbf{b}_c)$, so we can maximize over the transformation and the bias individually.
  • We will differentiate the auxiliary function w.r.t. the bias first.
  • Assuming the distribution of the bias is modeled by a Gaussian with mean $\boldsymbol{\beta}$ and covariance matrix $\boldsymbol{\Xi}$:

$$p(\mathbf{b}_c) \propto |\boldsymbol{\Xi}|^{-\frac{1}{2}} \exp\left[ -\frac{1}{2} \operatorname{tr}\left( (\mathbf{b}_c - \boldsymbol{\beta})(\mathbf{b}_c - \boldsymbol{\beta})^T \boldsymbol{\Xi}^{-1} \right) \right]$$

slide-123
SLIDE 123

MAPLR

  • Differentiating w.r.t. $\mathbf{b}_c$ and equating to zero:

$$\frac{\partial Q(\boldsymbol{\eta}_c, \bar{\boldsymbol{\eta}}_c)}{\partial \mathbf{b}_c} = 0 \;\Rightarrow\; \sum_{t=1}^{T} \sum_{j=1}^{N} \sum_{k=1}^{K} \gamma(j,k,t)\, \mathbf{R}_{j,k} (\mathbf{x}_t - \mathbf{A}_c\boldsymbol{\mu}_{j,k} - \mathbf{b}_c) - \boldsymbol{\Xi}^{-1}(\mathbf{b}_c - \boldsymbol{\beta}) = 0$$

where $\gamma(j,k,t) = \Pr(s_t = j, l_t = k \,|\, \mathbf{X}, \boldsymbol{\Lambda}, \boldsymbol{\eta}_c)$.

slide-124
SLIDE 124

MAPLR

  • A closed-form solution can be easily obtained:

$$\mathbf{b}_c = \left[ \sum_{t=1}^{T} \sum_{j=1}^{N} \sum_{k=1}^{K} \gamma(j,k,t)\, \mathbf{R}_{j,k} + \boldsymbol{\Xi}^{-1} \right]^{-1} \left[ \sum_{t=1}^{T} \sum_{j=1}^{N} \sum_{k=1}^{K} \gamma(j,k,t)\, \mathbf{R}_{j,k} (\mathbf{x}_t - \mathbf{A}_c\boldsymbol{\mu}_{j,k}) + \boldsymbol{\Xi}^{-1}\boldsymbol{\beta} \right]$$
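A small numerical sketch of this MAP bias update for a single Gaussian ($j = k = 1$); the prior mean, prior precision and data are all hypothetical. With no data the estimate falls back to the prior mean $\boldsymbol{\beta}$; with plenty of data it approaches the ML residual mean:

```python
import numpy as np

# b = (sum gamma R + Xi^{-1})^{-1} (sum gamma R (x_t - A mu) + Xi^{-1} beta)
rng = np.random.default_rng(7)
D, T = 2, 1000
A = np.eye(D)
mu = np.zeros(D)
R = np.eye(D)                       # precision of the Gaussian
beta = np.array([5.0, -5.0])        # prior mean of the bias (hypothetical)
Xi_inv = 0.1 * np.eye(D)            # prior precision (inverse covariance)

b_true = np.array([1.0, 2.0])
x = b_true[:, None] + 0.1 * rng.standard_normal((D, T))   # gamma = 1 per frame

lhs = T * R + Xi_inv
rhs = R @ (x - (A @ mu)[:, None]).sum(axis=1) + Xi_inv @ beta
b_map = np.linalg.solve(lhs, rhs)

# With T = 1000 frames, the data dominate the weak prior.
assert np.allclose(b_map, b_true, atol=0.05)
```

The structure is the usual MAP interpolation: the prior precision $\boldsymbol{\Xi}^{-1}$ acts as pseudo-counts added to the data accumulator.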

slide-125
SLIDE 125

MAPLR

  • Second, we will differentiate the auxiliary function w.r.t. the transformation $\mathbf{A}_c$.
  • But how do we choose the prior distribution for the transformation?
  • Unlike MAP estimation of HMM parameters, no obvious conjugate prior densities can be found.
  • If we assume that the transformation is modeled by a Normal-Wishart density, we get an equation with no closed-form solution.
  • The derivation is shown in the following.

slide-126
SLIDE 126

MAPLR

  • Prior (a Wishart-like density):

$$p(\mathbf{A}_c) \propto |\mathbf{A}_c|^{\frac{\alpha-1}{2}} \, e^{-\frac{1}{2}\operatorname{tr}(\boldsymbol{\tau}\mathbf{A}_c)}$$

  • Differentiating $Q(\boldsymbol{\eta}_c, \bar{\boldsymbol{\eta}}_c)$ w.r.t. $\mathbf{A}_c$ and equating to zero gives

$$\sum_{t=1}^{T} \sum_{j=1}^{N} \sum_{k=1}^{K} \gamma(j,k,t)\, \mathbf{R}_{j,k} (\mathbf{x}_t - \mathbf{A}_c\boldsymbol{\mu}_{j,k} - \mathbf{b}_c)\boldsymbol{\mu}_{j,k}^T + \frac{\alpha-1}{2}\mathbf{A}_c^{-T} - \frac{1}{2}\boldsymbol{\tau}^T = 0$$

where $\gamma(j,k,t) = \Pr(s_t = j, l_t = k \,|\, \mathbf{X}, \boldsymbol{\Lambda}, \boldsymbol{\eta}_c)$.

slide-127
SLIDE 127

MAPLR

$$\Rightarrow \sum_{t,j,k} \gamma(j,k,t)\, \mathbf{R}_{j,k} \mathbf{A}_c \boldsymbol{\mu}_{j,k}\boldsymbol{\mu}_{j,k}^T - \sum_{t,j,k} \gamma(j,k,t)\, \mathbf{R}_{j,k} (\mathbf{x}_t - \mathbf{b}_c)\boldsymbol{\mu}_{j,k}^T - \frac{\alpha-1}{2}\mathbf{A}_c^{-T} + \frac{1}{2}\boldsymbol{\tau}^T = 0$$

This matrix equation involves both $\mathbf{A}_c$ and $\mathbf{A}_c^{-T}$: no closed-form solution can be obtained.

slide-128
SLIDE 128

MAPLR

  • Chou suggests to select $p(\mathbf{W})$ from a family of elliptically symmetric distributions.
  • Here, a special case of the elliptical distribution can be seen as a matrix version of a multivariate normal distribution:

$$p(\mathbf{W}) \propto |\boldsymbol{\Sigma}|^{-\frac{p+1}{2}} |\boldsymbol{\Phi}|^{-\frac{p}{2}} \exp\left[ -\frac{1}{2} \operatorname{tr}\left( \boldsymbol{\Sigma}^{-1}(\mathbf{W} - \mathbf{M})\boldsymbol{\Phi}^{-1}(\mathbf{W} - \mathbf{M})^T \right) \right]$$

where $\mathbf{W}, \mathbf{M} \in \mathbb{R}^{p \times (p+1)}$, $\boldsymbol{\Sigma} \in \mathbb{R}^{p \times p}$ ($\boldsymbol{\Sigma} \geq 0$) and $\boldsymbol{\Phi} \in \mathbb{R}^{(p+1) \times (p+1)}$ ($\boldsymbol{\Phi} \geq 0$).
) (W p

slide-129
SLIDE 129

MAPLR

  • So we can maximize the Q-function w.r.t. the transformation $\mathbf{A}$ and bias $\mathbf{b}$ simultaneously (i.e. with respect to $\mathbf{W}$).
  • The Q-function can be rewritten as

$$Q(\boldsymbol{\eta}_c, \bar{\boldsymbol{\eta}}_c) = -\frac{1}{2} \sum_{t=1}^{T} \sum_{j=1}^{N} \sum_{k=1}^{K} \gamma(j,k,t) \operatorname{tr}\left( (\mathbf{x}_t - \bar{\mathbf{W}}_c\boldsymbol{\xi}_{s_t,l_t})(\mathbf{x}_t - \bar{\mathbf{W}}_c\boldsymbol{\xi}_{s_t,l_t})^T \mathbf{R}_{s_t,l_t} \right) + \log p(\bar{\boldsymbol{\eta}}_c)$$

slide-130
SLIDE 130

MAPLR

  • Differentiate the Q-function w.r.t. $\mathbf{W}_c$ and set it to zero.
  • In the following we drop the subscript $c$; it should be clear that the summation is performed only on acoustic units belonging to the cluster $c$.

$$\frac{\partial Q(\boldsymbol{\eta}_c, \bar{\boldsymbol{\eta}}_c)}{\partial \mathbf{W}_c} = \frac{\partial}{\partial \mathbf{W}_c} \left\{ -\frac{1}{2} \sum_{t,j,k} \gamma(j,k,t) \operatorname{tr}\left( (\mathbf{x}_t - \mathbf{W}_c\boldsymbol{\xi}_{j,k})(\mathbf{x}_t - \mathbf{W}_c\boldsymbol{\xi}_{j,k})^T \mathbf{R}_{j,k} \right) - \frac{1}{2} \operatorname{tr}\left( \boldsymbol{\Sigma}^{-1}(\mathbf{W}_c - \mathbf{M})\boldsymbol{\Phi}^{-1}(\mathbf{W}_c - \mathbf{M})^T \right) \right\} = 0$$

slide-131
SLIDE 131

MAPLR

$$\Rightarrow \sum_{t,j,k} \gamma(j,k,t)\, \mathbf{R}_{j,k} (\mathbf{x}_t - \mathbf{W}\boldsymbol{\xi}_{j,k})\boldsymbol{\xi}_{j,k}^T - \boldsymbol{\Sigma}^{-1}(\mathbf{W} - \mathbf{M})\boldsymbol{\Phi}^{-1} = 0$$

$$\Rightarrow \underbrace{\sum_{t,j,k} \gamma(j,k,t)\, \mathbf{R}_{j,k} \mathbf{x}_t \boldsymbol{\xi}_{j,k}^T + \boldsymbol{\Sigma}^{-1}\mathbf{M}\boldsymbol{\Phi}^{-1}}_{\mathbf{Z}\; [D \times (D+1)]} = \underbrace{\sum_{t,j,k} \gamma(j,k,t)\, \mathbf{R}_{j,k} \mathbf{W} \boldsymbol{\xi}_{j,k}\boldsymbol{\xi}_{j,k}^T + \boldsymbol{\Sigma}^{-1}\mathbf{W}\boldsymbol{\Phi}^{-1}}_{\mathbf{H}\; [D \times (D+1)]}$$

slide-132
SLIDE 132

MAPLR

Element-wise, with $r^{(j,k)}_{n,d} = [\mathbf{R}_{j,k}]_{n,d}$, $\sigma_{n,l} = [\boldsymbol{\Sigma}^{-1}]_{n,l}$ and $\phi_{d,m} = [\boldsymbol{\Phi}^{-1}]_{d,m}$:

$$[\mathbf{Z}]_{n,m} = \sum_{t,j,k} \gamma(j,k,t) \sum_{d=1}^{D} r^{(j,k)}_{n,d} [\mathbf{x}_t]_d [\boldsymbol{\xi}_{j,k}]_m + \sum_{l=1}^{D} \sum_{d=1}^{D+1} \sigma_{n,l} [\mathbf{M}]_{l,d}\, \phi_{d,m}$$

$$[\mathbf{H}]_{n,m} = \sum_{t,j,k} \gamma(j,k,t) \sum_{l=1}^{D} \sum_{d=1}^{D+1} r^{(j,k)}_{n,l}\, w_{l,d}\, [\boldsymbol{\xi}_{j,k}]_d [\boldsymbol{\xi}_{j,k}]_m + \sum_{l=1}^{D} \sum_{d=1}^{D+1} \sigma_{n,l}\, w_{l,d}\, \phi_{d,m} = \sum_{l=1}^{D} \sum_{d=1}^{D+1} g^{(n,m)}_{l,d}\, w_{l,d}$$

slide-133
SLIDE 133

MAPLR

  • The $D \times (D+1)$ matrix $\mathbf{W}$ can be obtained by solving the system of linear equations

$$\sum_{l=1}^{D} \sum_{d=1}^{D+1} g^{(n,m)}_{l,d}\, w_{l,d} = [\mathbf{Z}]_{n,m}, \qquad n = 1, \ldots, D, \quad m = 1, \ldots, D+1$$

i.e. a $D(D+1) \times D(D+1)$ system coupling all entries of $\mathbf{W}$.
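The flattened linear system above can be sketched generically; the coefficient tensor `G` below is a hypothetical stand-in for the $g^{(n,m)}_{l,d}$ accumulators, chosen well-conditioned so the solve is exact:

```python
import numpy as np

# Recover the D x (D+1) transform W from  sum_{l,d} g_{l,d}^{(n,m)} w_{l,d} = Z_{n,m}
# by flattening W into a D(D+1)-vector and solving one linear system.
rng = np.random.default_rng(9)
D = 2
size = D * (D + 1)
W_true = rng.standard_normal((D, D + 1))
G = rng.standard_normal((size, size)) + 10 * np.eye(size)   # hypothetical, well-conditioned
Z = (G @ W_true.ravel()).reshape(D, D + 1)

W_hat = np.linalg.solve(G, Z.ravel()).reshape(D, D + 1)
assert np.allclose(W_hat, W_true)
```

In the diagonal case discussed next, G becomes block-diagonal and the same solve decomposes into D independent (D+1)-dimensional systems.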

slide-134
SLIDE 134

MAPLR

  • If we assume that $\mathbf{R}_{j,k}$ and $\boldsymbol{\Sigma}$ are diagonal, then $g^{(n,m)}_{l,d} = 0$ unless $l = n$:

$$[\mathbf{Z}]_{n,m} = \sum_{t,j,k} \gamma(j,k,t)\, r^{(j,k)}_{n,n} [\mathbf{x}_t]_n [\boldsymbol{\xi}_{j,k}]_m + \sum_{d=1}^{D+1} \sigma_{n,n} [\mathbf{M}]_{n,d}\, \phi_{d,m}$$

$$[\mathbf{H}]_{n,m} = \sum_{l=1}^{D+1} g^{(n,m)}_{n,l}\, w_{n,l}, \qquad g^{(n,m)}_{n,l} = \sum_{t,j,k} \gamma(j,k,t)\, r^{(j,k)}_{n,n} [\boldsymbol{\xi}_{j,k}]_l [\boldsymbol{\xi}_{j,k}]_m + \sigma_{n,n}\, \phi_{l,m}$$

  • So the matrix $\mathbf{W}$ can be obtained by solving $D$ systems of $D+1$ linear equations each.

slide-135
SLIDE 135

MAPLR

  • When n = 3, the 3rd system is

\sum_{l=1}^{D+1} g^{(l,1)}_{3} \, w_{3,l} = z_{3,1}
  \vdots
\sum_{l=1}^{D+1} g^{(l,D+1)}_{3} \, w_{3,l} = z_{3,D+1}

i.e., for each n-th system:

\sum_{l=1}^{D+1} g^{(l,m)}_{n} \, w_{n,l} = z_{n,m}, \quad m = 1, \ldots, D+1

slide-136
SLIDE 136

MAPLR

  • Hyper-parameter Estimation:

– Basic principle:

  • First, generate a set of N transformation matrices.
  • Then use the N matrices to derive an estimate of the hyper-parameters.

– Another way:

  • Select a set of training data for N different speakers and estimate the regression matrices for each speaker using an MLLR approach.
  • Drawback: some data is required to derive the hyper-parameters.

slide-137
SLIDE 137

MAPLR

  • Hyper-parameter Estimation:

– Another attractive technique:

  • Collect the mean vectors of the SI models associated with a given regression class.
  • Assign the mean vectors into N subsets S_i, i = 1, \ldots, N.
  • Each subset S_i can be further partitioned into 2 subsets L_i and R_i.
  • Each mean vector from L_i can be projected onto a mean vector from R_i by applying the transformation W_i.
  • W_i can be easily estimated using MLLR by assuming that L_i is a set of data vectors and R_i corresponds to the models.

slide-138
SLIDE 138

MAPLR

slide-139
SLIDE 139

MAPLR

  • Hyper-parameter Estimation:

– Once these N transformation matrices W_i have been obtained, we can use a maximum likelihood approach to derive M, \Sigma and \Phi:

M = \frac{1}{N} \sum_{i=1}^{N} W_i, \qquad \Sigma = \frac{1}{N} \sum_{i=1}^{N} (W_i - M)\,\Phi^{-1}\,(W_i - M)^{T}

\Phi can be chosen as the identity matrix.
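The two ML estimates above are direct to compute once the per-speaker MLLR transforms are available. A small sketch with \Phi fixed to the identity, as on the slide (the stacked input `Ws` is an assumed layout):

```python
import numpy as np

def maplr_hyperparams(Ws):
    """ML estimates of the matrix-prior hyper-parameters from N speaker
    transforms W_i, with Phi fixed to the identity:
        M     = (1/N) sum_i W_i
        Sigma = (1/N) sum_i (W_i - M) Phi^{-1} (W_i - M)^T
    Ws : (N, D, D+1) array of per-speaker MLLR matrices.
    """
    N = Ws.shape[0]
    M = Ws.mean(axis=0)
    diffs = Ws - M
    Sigma = sum(d @ d.T for d in diffs) / N  # Phi = I, so Phi^{-1} drops out
    return M, Sigma
```

With identical transforms for every speaker, M equals that transform and \Sigma is zero, which is a quick sanity check on the estimator.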

slide-140
SLIDE 140

MAPLR Variance Adaptation

slide-141
SLIDE 141

Reference

  • Maximum a Posterior Linear

Regression (MAPLR) Variance Adaptation for Continuous Density HMMs – EuroSpeech’03 Wu Chou and Xiaodong He

slide-142
SLIDE 142

MAPLR Variance Adaptation

\Sigma = B\,B^{T} (Choleski factorization), \qquad \hat{\Sigma}_{m,k} = B_{m,k}\,H_n\,B_{m,k}^{T}

ML criterion: \hat{H}_n = \arg\max_{H_n} p(O \mid \lambda, H_n)

MAP criterion: \hat{H}_n = \arg\max_{H_n} p(O \mid \lambda, H_n)\,p(H_n)

Here, H_n is assumed to be diagonal; let H_n = diag(h_n(1)^2, \ldots, h_n(D)^2) and h_n = [h_n(1), \ldots, h_n(D)]^{T}.

The prior is taken to be normal:

p(h_n) = (2\pi)^{-D/2}\,|\Sigma_h|^{-1/2}\exp\left[-\frac{1}{2}(h_n - \mu_h)^{T}\,\Sigma_h^{-1}\,(h_n - \mu_h)\right]

slide-143
SLIDE 143

MAPLR Variance Adaptation

Q(h_n) = \sum_{t=1}^{T}\sum_{m}\gamma_m(t)\,\log p(x_t \mid m, n) + \log p(h_n)

= -\frac{1}{2}\sum_{t=1}^{T}\sum_{m}\gamma_m(t)\sum_{d=1}^{D}\left[\log 2\pi + \log\left(h_n(d)^2\,\sigma_m(d)^2\right) + \frac{(x_t(d)-\mu_m(d))^2}{h_n(d)^2\,\sigma_m(d)^2}\right] + \log p(h_n)

Set \partial Q / \partial h_n(d) = 0 for each dimension d:

\frac{\partial Q}{\partial h_n(d)} = \sum_{t,m}\gamma_m(t)\left[\frac{(x_t(d)-\mu_m(d))^2}{h_n(d)^3\,\sigma_m(d)^2} - \frac{1}{h_n(d)}\right] - \frac{h_n(d)-\mu_h(d)}{\sigma_h(d)^2} = 0

slide-144
SLIDE 144

MAPLR Variance Adaptation

Multiplying through by h_n(d)^3 gives a quartic equation in h_n(d):

\frac{1}{\sigma_h(d)^2}\,h_n(d)^4 - \frac{\mu_h(d)}{\sigma_h(d)^2}\,h_n(d)^3 + \left(\sum_{t,m}\gamma_m(t)\right)h_n(d)^2 - \sum_{t,m}\gamma_m(t)\,\frac{(x_t(d)-\mu_m(d))^2}{\sigma_m(d)^2} = 0

which is solved independently for each dimension d.
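Per dimension, the stationary condition is a quartic in the scale h(d), so it can be handed to a polynomial root finder. A sketch; `gamma_sum` and `sq_sum` are assumed names for \sum\gamma and \sum\gamma(x-\mu)^2/\sigma^2, and taking the largest positive real root is a simplification (a full implementation would pick the root maximizing the auxiliary function):

```python
import numpy as np

def solve_h(gamma_sum, sq_sum, mu_h, sigma2_h):
    """Positive real root of the quartic
       (1/s2h) h^4 - (mu_h/s2h) h^3 + G h^2 - S = 0
    where G = gamma_sum and S = sq_sum (MAP variance-scale estimate)."""
    coeffs = [1.0 / sigma2_h, -mu_h / sigma2_h, gamma_sum, 0.0, -sq_sum]
    roots = np.roots(coeffs)
    real = roots[np.abs(roots.imag) < 1e-6].real
    return float(real[real > 0].max())

# With a nearly flat prior the MAP scale approaches the ML scale sqrt(S/G):
h = solve_h(4.0, 16.0, 1.0, 1e6)  # close to sqrt(16/4) = 2
```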

slide-145
SLIDE 145

SMAP

slide-146
SLIDE 146

Reference

  • Structural MAP Speaker Adaptation Using

Hierarchical Priors – ASRU’97 Koichi Shinoda and Chin-Hui Lee

  • Unsupervised Adaptation Using Structural

Bayes Approach – ICASSP’98 Koichi Shinoda and Chin-Hui Lee

  • A Structural Bayes Approach to Speaker

Adaptation – SAP’01 Koichi Shinoda and Chin-Hui Lee

slide-147
SLIDE 147

Tree Structure

  • Distance measure between two gaussian

components:

– The sum of the Kullback-Leibler divergence

d(m, n) = \int g_m(x)\log\frac{g_m(x)}{g_n(x)}\,dx + \int g_n(x)\log\frac{g_n(x)}{g_m(x)}\,dx

= \sum_{i}\left[\frac{\sigma_m(i)^2 + (\mu_m(i)-\mu_n(i))^2}{2\,\sigma_n(i)^2} + \frac{\sigma_n(i)^2 + (\mu_n(i)-\mu_m(i))^2}{2\,\sigma_m(i)^2} - 1\right]
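For diagonal-covariance Gaussians the symmetric divergence above has the closed form shown (the log terms of the two KL directions cancel). A minimal sketch:

```python
def sym_kl_diag(mu_m, var_m, mu_n, var_n):
    """Sum of the two KL divergences between diagonal Gaussians m and n,
    using the closed form from the slide (per-dimension means/variances)."""
    d = 0.0
    for i in range(len(mu_m)):
        diff2 = (mu_m[i] - mu_n[i]) ** 2
        d += (var_m[i] + diff2) / (2.0 * var_n[i]) \
           + (var_n[i] + diff2) / (2.0 * var_m[i]) - 1.0
    return d
```

Identical Gaussians give distance 0, and shifting one unit mean apart at unit variance gives exactly 1, which makes the measure easy to sanity-check.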

slide-148
SLIDE 148

Tree Structure

  • Approximate single gaussian for each node k

\mu_k(i) = \frac{1}{M_k}\sum_{m=1}^{M_k}\mu^{(k)}_m(i)

\sigma_k(i)^2 = \frac{1}{M_k}\sum_{m=1}^{M_k}\left[\sigma^{(k)}_m(i)^2 + \mu^{(k)}_m(i)^2\right] - \mu_k(i)^2 = E[x_i^2] - \left(E[x_i]\right)^2

where M_k is the mixture number in node k.
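The node pdf is just a moment match over the member components. A short sketch of the unweighted average used on the slide (mixture weights are deliberately ignored, as in the formula):

```python
def node_gaussian(means, variances):
    """Single diagonal Gaussian summarizing M_k mixture components:
    node mean = average of the means; node variance = E[x^2] - (E[x])^2
    computed from the per-component second moments."""
    M = len(means)
    D = len(means[0])
    mu_k = [sum(m[i] for m in means) / M for i in range(D)]
    var_k = [sum(variances[j][i] + means[j][i] ** 2 for j in range(M)) / M
             - mu_k[i] ** 2 for i in range(D)]
    return mu_k, var_k
```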

slide-149
SLIDE 149

Tree Structure

  • How to build a tree structure

– The topology of the tree is designed: the number of layers and the number of branches from a node in each layer are determined. – Set the root node to be node k and the set G to be the set Gnow. Calculate the node pdf for the root node.

slide-150
SLIDE 150

Tree Structure

  • How to build a tree structure

– If node k has no child nodes, stop clustering. Otherwise, give the initial pdf for each child node using the minimax method that is described as follows.

Here g_k(\cdot) is the node pdf for node k, g_{c^{(k)}_p}(\cdot) is the node pdf for child node c^{(k)}_p, p = 1, \ldots, P_k, and P_k is the number of child nodes of node k.

slide-151
SLIDE 151

Tree Structure

  • Minimax method:

– First: Choose among the set Gnow the mixture component \hat{m} that has the largest distance to g_k(\cdot), and set it as the node pdf for child c_1, i.e., g_{c_1}(\cdot) = g_{\hat{m}}(\cdot).

– Second: Choose mixture components for c_p successively from p = 2 to P, and set those to the node pdfs for the child nodes as follows:

\hat{m} = \arg\max_{m}\,\min_{1 \le i \le p-1} d(m, c_i), \qquad g_{c_p}(\cdot) = g_{\hat{m}}(\cdot)

slide-152
SLIDE 152

Tree Structure

  • Minimax method :

– Third: The node pdf for each child node c_p and the node pdf for k are interpolated, and the resulting pdfs are set to be the node pdfs for c_p as follows:

\mu'_{c_p}(i) = \alpha\,\mu_k(i) + (1-\alpha)\,\mu_{c_p}(i)

\sigma'_{c_p}(i)^2 = \alpha\left(\sigma_k(i)^2 + \mu_k(i)^2\right) + (1-\alpha)\left(\sigma_{c_p}(i)^2 + \mu_{c_p}(i)^2\right) - \mu'_{c_p}(i)^2

where 0 \le \alpha \le 1.

slide-153
SLIDE 153

Tree Structure

  • How to build a tree structure

– Repeat the following k-means procedure until the grand sum of distances converges.

  • A) For each mixture in Gnow, calculate the distance from it to each child node pdf, and assign each mixture to the nearest child node.
  • B) Recalculate the child node pdfs.
  • C) Calculate the sum of distances from each child node to each of its mixtures and then obtain the grand sum.

– Set each child node to be node k.
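The assign/re-estimate loop above can be sketched for diagonal Gaussians. All names here are illustrative; `d()` is the symmetric-KL distance from the earlier slide, and `refit` is the moment-matched node Gaussian:

```python
def d(g1, g2):
    # symmetric KL between diagonal Gaussians g = (mu, var)
    s = 0.0
    for i in range(len(g1[0])):
        diff2 = (g1[0][i] - g2[0][i]) ** 2
        s += (g1[1][i] + diff2) / (2 * g2[1][i]) \
           + (g2[1][i] + diff2) / (2 * g1[1][i]) - 1
    return s

def refit(members):
    # moment-matched single Gaussian over the assigned mixtures
    M, D = len(members), len(members[0][0])
    mu = [sum(g[0][i] for g in members) / M for i in range(D)]
    var = [sum(g[1][i] + g[0][i] ** 2 for g in members) / M - mu[i] ** 2
           for i in range(D)]
    return (mu, [max(v, 1e-6) for v in var])  # floor keeps pdfs valid

def kmeans_gaussians(mixtures, children, iters=10):
    """Assign each mixture to the nearest child pdf, recompute the child
    pdfs, and repeat; returns the final child pdfs and the assignment."""
    for _ in range(iters):
        assign = [min(range(len(children)), key=lambda p: d(g, children[p]))
                  for g in mixtures]
        children = [refit([g for g, a in zip(mixtures, assign) if a == p]
                          or [children[p]])  # keep empty clusters as-is
                    for p in range(len(children))]
    return children, assign
```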

slide-154
SLIDE 154

Summarization of Gaussian Distributions

  • Let X = \{x_1, \ldots, x_T\} denote a given set of adaptation data. Each sample vector x_t is transformed into a vector

y_{mt} = \Sigma_m^{-1/2}(x_t - \mu_m), \quad m = 1, \ldots, M, \; t = 1, \ldots, T

  • When there is no mismatch between the training and adaptation data, the pdf for y_{mt} is the standard normal distribution N(y \mid 0, I).
  • When there is a mismatch between them, the pdf for Y is different from N(y \mid 0, I). Here the pdf for Y is assumed to be N(y \mid \nu, \eta), where \nu \ne 0 and \eta \ne I.
slide-155
SLIDE 155

Summarization of Gaussian Distributions

  • Let G be the whole set of mixtures, which is divided into P subsets G^{(1)}, \ldots, G^{(P)}, where P is the total number of subsets.
  • One common normalized pdf h^{(p)}(y) = N(y \mid \nu^{(p)}, \eta^{(p)}) is shared by all the mixtures in each subset G^{(p)}: g^{(p)}_1(\cdot), \ldots, g^{(p)}_m(\cdot), \ldots, g^{(p)}_{M_p}(\cdot), where M_p is the number of mixtures in subset G^{(p)}.

slide-156
SLIDE 156

Summarization of Gaussian Distributions

  • The ML estimates of the parameters (\nu^{(p)}, \eta^{(p)}):

\tilde{\nu}^{(p)} = \frac{\sum_{m=1}^{M_p}\sum_{t=1}^{T}\gamma^{(p)}_{mt}\,y^{(p)}_{mt}}{\sum_{m=1}^{M_p}\sum_{t=1}^{T}\gamma^{(p)}_{mt}}, \qquad \tilde{\eta}^{(p)} = \frac{\sum_{m=1}^{M_p}\sum_{t=1}^{T}\gamma^{(p)}_{mt}\,(y^{(p)}_{mt}-\tilde{\nu}^{(p)})(y^{(p)}_{mt}-\tilde{\nu}^{(p)})^{T}}{\sum_{m=1}^{M_p}\sum_{t=1}^{T}\gamma^{(p)}_{mt}}

  • Then the HMM parameters are updated:

\tilde{\mu}^{(p)}_m = \mu^{(p)}_m + \Sigma^{(p)\,1/2}_m\,\tilde{\nu}^{(p)}, \qquad \tilde{\Sigma}^{(p)}_m = \Sigma^{(p)\,1/2}_m\,\tilde{\eta}^{(p)}\,\Sigma^{(p)\,1/2}_m
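For diagonal covariances, the normalize / pool / map-back cycle above reduces to elementwise operations. A vectorized sketch for one subset; the array shapes are assumed conventions, not a fixed API:

```python
import numpy as np

def smap_update(X, gamma, mu, var):
    """One shared-mismatch (SMAP) update for a subset of diagonal Gaussians.
    X : (T, D) adaptation frames; gamma : (M, T) occupancies;
    mu, var : (M, D) mixture means/variances.
    Normalizes y_mt = (x_t - mu_m)/sqrt(var_m), pools the ML estimates of
    (nu, eta), then maps back: mu' = mu + sqrt(var)*nu and var' = var*eta
    (the diagonal case of Sigma^{1/2} eta Sigma^{1/2})."""
    sd = np.sqrt(var)                                        # (M, D)
    Y = (X[None, :, :] - mu[:, None, :]) / sd[:, None, :]    # (M, T, D)
    tot = gamma.sum()
    nu = (gamma[:, :, None] * Y).sum(axis=(0, 1)) / tot
    eta = (gamma[:, :, None] * (Y - nu) ** 2).sum(axis=(0, 1)) / tot
    return mu + sd * nu, var * eta, nu, eta
```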

slide-157
SLIDE 157

Hierarchical Prior

  • We have developed the tree structure representation; each node k carries the ML estimates (\tilde{\nu}_k, \tilde{\eta}_k).
  • Now for each node k = 1, \ldots, K, with \hat{\lambda}_{k-1} given:

\hat{\lambda}_k = \arg\max_{\lambda_k} p(\lambda_k \mid Y, \hat{\lambda}_{k-1}) = \arg\max_{\lambda_k} p(Y \mid \lambda_k)\,p(\lambda_k \mid \hat{\lambda}_{k-1})

  • p(\lambda_k \mid \hat{\lambda}_{k-1}) is assumed to be a normal-Wishart density:

p(\nu_k, \eta_k \mid \hat{\lambda}_{k-1}) \propto |\eta_k|^{-1/2}\exp\left[-\frac{\tau_k}{2}(\nu_k - \hat{\nu}_{k-1})^{T}\eta_k^{-1}(\nu_k - \hat{\nu}_{k-1})\right]\exp\left[-\frac{1}{2}\,tr\!\left(\xi_k\,\eta_k^{-1}\right)\right]

slide-158
SLIDE 158

SMAP Adaptation Using Hierarchical Priors

slide-159
SLIDE 159

SMAP Adaptation Using Hierarchical Priors

Let the node sequence from the root to the leaf corresponding to the m-th mixture be \{N_1, \ldots, N_k, \ldots, N_K\}, where N_1 is the root and N_K is the leaf node directly attached to mixture m.

slide-160
SLIDE 160

SMAP Adaptation Using Hierarchical Priors

Since

p(\lambda_K \mid Y) = \int\cdots\int p(\lambda_1, \ldots, \lambda_K \mid Y)\,d\lambda_1\cdots d\lambda_{K-1}

with p(\lambda_k \mid \lambda_{k-1}, \ldots, \lambda_1, Y) = p(\lambda_k \mid \lambda_{k-1}, Y) for all k = 1, \ldots, K. Because it is difficult to maximize p(\lambda_K \mid Y) directly, a key step is to assume

p(\lambda_k \mid Y) = \int p(\lambda_k \mid \lambda_{k-1}, Y)\,p(\lambda_{k-1} \mid Y)\,d\lambda_{k-1} \approx p(\lambda_k \mid \hat{\lambda}_{k-1}, Y), \quad k = 1, \ldots, K-1

slide-161
SLIDE 161

SMAP Adaptation Using Hierarchical Priors

\because p(\lambda_k \mid Y) \approx p(\lambda_k \mid Y, \hat{\lambda}_{k-1})

\therefore \hat{\nu}_k = \frac{\Gamma_k\,\tilde{\nu}_k + \tau_k\,\hat{\nu}_{k-1}}{\Gamma_k + \tau_k}, \qquad \hat{\eta}_k \propto \xi_k + \Gamma_k\,\tilde{\eta}_k + \frac{\Gamma_k\,\tau_k}{\Gamma_k + \tau_k}(\tilde{\nu}_k - \hat{\nu}_{k-1})(\tilde{\nu}_k - \hat{\nu}_{k-1})^{T}

Unrolling the recursion down to the leaf,

\hat{\nu}_K = \sum_{k=1}^{K}\omega_k\,\tilde{\nu}_k, \qquad \omega_k = \frac{\Gamma_k}{\Gamma_k + \tau_k}\prod_{i=k+1}^{K}\frac{\tau_i}{\Gamma_i + \tau_i}

where \Gamma_k is the total occupation count accumulated at node k.
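The recursion above smooths each node's ML estimate toward its parent's MAP estimate. A 1-D sketch with a single shared prior weight tau (the per-node \tau_k of the slide is simplified to one constant here):

```python
def smap_propagate(path_stats, tau, nu_root=None):
    """Propagate MAP mean estimates down a root-to-leaf node path:
        nu_hat_k = (Gamma_k * nu_tilde_k + tau * nu_hat_{k-1}) / (Gamma_k + tau)
    path_stats : list of (Gamma_k, nu_tilde_k) pairs from root to leaf
    (scalar per-dimension values for clarity); tau is the prior weight."""
    nu_hat = nu_root if nu_root is not None else 0.0
    for Gamma_k, nu_tilde_k in path_stats:
        nu_hat = (Gamma_k * nu_tilde_k + tau * nu_hat) / (Gamma_k + tau)
    return nu_hat
```

With plentiful data at the leaf (large Gamma) the result approaches the leaf's ML estimate; with no data it falls back to the ancestors, which is exactly the intended structural-prior behavior.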

slide-162
SLIDE 162

Joint MAP and MAPLR

slide-163
SLIDE 163

Joint MAP and MAPLR

  • Reference:

– Speaker Adaptation Using Combined Transformation and Bayesian Methods – SAP'96 Vassilios V. Digalakis – Joint Maximum a Posteriori Adaptation of Transformation and HMM Parameters – ICASSP'00 and SAP'01 O. Siohan

slide-164
SLIDE 164

Joint MAP and MAPLR

  • MLLR is quite efficient when the amount of adaptation data is limited.
  • MAP has nice asymptotic properties.
  • We jointly optimize a direct and an indirect adaptation to take advantage of both approaches.
  • We use the MAP criterion

(\Lambda, \eta)_{MAP} = \arg\max_{\Lambda, \eta} p(X \mid \Lambda, \eta)\,p(\Lambda, \eta)

where p(X \mid \Lambda, \eta) denotes the likelihood function.

slide-165
SLIDE 165

Joint MAP and MAPLR

slide-166
SLIDE 166

Joint MAP and MAPLR

  • Denote

S = \{s_t\} : state sequence, \quad L = \{l_t\} : mixture sequence, \quad X = \{x_t\} : observation sequence

  • So the criterion can be rewritten as

(\Lambda, \eta)_{MAP} = \arg\max_{\Lambda, \eta} \sum_{S}\sum_{L} p(X, S, L \mid \Lambda, \eta)\,p(\Lambda, \eta)

  • Define the auxiliary function

Q(\Lambda, \eta \mid \bar{\Lambda}, \bar{\eta}) = E\left[\log p(X, S, L \mid \Lambda, \eta) \mid X, \bar{\Lambda}, \bar{\eta}\right] + \log p(\Lambda, \eta)

  • Assume that \Lambda and \eta are independent, p(\Lambda, \eta) = p(\Lambda)\,p(\eta); then

\therefore Q(\Lambda, \eta \mid \bar{\Lambda}, \bar{\eta}) = E\left[\log p(X, S, L \mid \Lambda, \eta) \mid X, \bar{\Lambda}, \bar{\eta}\right] + \log p(\Lambda) + \log p(\eta)

slide-167
SLIDE 167

Joint MAP and MAPLR

  • The M-step can then be sub-divided to jointly optimize \Lambda and \eta via an iterative approach, leading to the following M-step:

– 1. Initialization: \Lambda^{(0)} = \Lambda, \; i = 1
– 2. Step i:

  • Maximize the Q-function w.r.t. \eta given \Lambda^{(i-1)}:

\eta^{(i)} = \arg\max_{\eta} E\left[\log p(X, S, L \mid \Lambda^{(i-1)}, \eta) \mid X, \Lambda^{(i-1)}\right] + \log p(\eta)

  • Transform the model using \eta^{(i)}: \tilde{\Lambda}^{(i)} = F_{\eta^{(i)}}(\Lambda^{(i-1)})
  • Maximize the Q-function w.r.t. \Lambda:

\Lambda^{(i)} = \arg\max_{\Lambda} E\left[\log p(X, S, L \mid \Lambda) \mid X, \tilde{\Lambda}^{(i)}\right] + \log p(\Lambda \mid \tilde{\Lambda}^{(i)})

– 3. If a fixed point is not reached, go to step 2.
– 4. Termination: \Lambda = \Lambda^{(i)}, \; \eta = \eta^{(i)}

slide-168
SLIDE 168

Joint MAP and MAPLR

Step 1: \Lambda^{(0)} = \Lambda (initial model) \rightarrow find \eta^{(1)} \rightarrow \tilde{\Lambda}^{(1)} = F_{\eta^{(1)}}(\Lambda^{(0)}) (intermediate model) \rightarrow adapt \tilde{\Lambda}^{(1)} \rightarrow \Lambda^{(1)} (final model in step 1)

Step 2: \Lambda^{(1)} \rightarrow find \eta^{(2)} \rightarrow \tilde{\Lambda}^{(2)} = F_{\eta^{(2)}}(\Lambda^{(1)}) (intermediate model) \rightarrow adapt \tilde{\Lambda}^{(2)} \rightarrow \Lambda^{(2)} (final model in step 2)

Step 3: \Lambda^{(2)} \rightarrow find \eta^{(3)} \rightarrow \tilde{\Lambda}^{(3)} = F_{\eta^{(3)}}(\Lambda^{(2)}) (intermediate model) \rightarrow adapt \tilde{\Lambda}^{(3)} \rightarrow \Lambda^{(3)} (final model in step 3)

The "find \eta^{(i)}" stage is the MAPLR estimation; the "adapt" stage is the MAP estimation.
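The alternating structure can be sketched on a deliberately tiny toy: a single 1-D Gaussian mean (\Lambda) with a scalar bias "transform" (\eta), unit variance. All priors and precisions here are illustrative choices, not the paper's:

```python
def joint_map_mllr_1d(x, mu0, tau_b=1.0, tau_mu=1.0, iters=5):
    """Toy 1-D sketch of the alternating joint scheme. Per step i:
    (a) MAP-estimate the bias b given the current mean (zero-mean
    prior, precision tau_b); (b) form the intermediate model
    mu_tilde = mu + b; (c) MAP-estimate the mean with a prior
    centred at mu_tilde (precision tau_mu)."""
    T = len(x)
    xbar = sum(x) / T
    mu = mu0
    for _ in range(iters):
        b = T * (xbar - mu) / (T + tau_b)                   # "find eta"
        mu_tilde = mu + b                                   # intermediate model
        mu = (T * xbar + tau_mu * mu_tilde) / (T + tau_mu)  # "adapt" (MAP mean)
    return mu, b
```

On this toy the fixed point is the sample mean with a vanishing residual bias, so the loop visibly converges in a handful of iterations.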

slide-169
SLIDE 169

Joint MAP and MAPLR

  • MAP estimation of the transformation matrix (MAPLR estimation):

\hat{W} = \arg\max_{W} p(X \mid W, \Lambda)\,p(W)

  • MAP estimation of the HMM parameters (MAP estimation):

\hat{\tilde{\Lambda}} = \arg\max_{\tilde{\Lambda}} p(X \mid W, \tilde{\Lambda})\,p(\tilde{\Lambda})

slide-170
SLIDE 170

Joint MAP and MAPLR

Q(\tilde{\Lambda} \mid \Lambda) = \sum_{S}\sum_{L} p(S, L \mid X, \Lambda)\left[\sum_{t=1}^{T}\log a_{s_{t-1}s_t} + \sum_{t=1}^{T}\log w_{s_t,l_t} + \sum_{t=1}^{T}\log p(x_t \mid s_t, l_t, \tilde{\Lambda})\right] + \log p(W_c) + \Psi

= \sum_{t=1}^{T}\sum_{j=1}^{N}\sum_{k=1}^{K}\gamma_{j,k}(t)\,\log p(x_t \mid W_c\,\xi_{j,k}, R_{j,k}) + \log p(W_c) + \Psi'

= -\frac{1}{2}\sum_{t=1}^{T}\sum_{j=1}^{N}\sum_{k=1}^{K}\gamma_{j,k}(t)\,tr\left[(x_t - W_c\,\xi_{j,k})^{T}R_{j,k}^{-1}(x_t - W_c\,\xi_{j,k})\right] + \log p(W_c) + \Psi''

  • Differentiating w.r.t. W_c and setting it to zero, we can obtain the MAPLR estimate.

slide-171
SLIDE 171

Joint MAP and MAPLR

  • How to choose p(\tilde{\Lambda})

– It might happen that p(\tilde{\Lambda}) and p(\Lambda) no longer belong to the same family of distributions.
– We only consider the reestimation of the mean vector and assume that the conjugate prior density of \mu_{j,k} is a Normal distribution:

p(\mu_{j,k}) = p(\mu_{j,k} \mid m_{j,k}, \tau_{j,k}) \propto \exp\left[-\frac{1}{2}(\mu_{j,k} - m_{j,k})^{T}\,\tau_{j,k}\,(\mu_{j,k} - m_{j,k})\right]

where m_{j,k} and \tau_{j,k} are all hyperparameters: m_{j,k} is the mean and \tau_{j,k} is the precision matrix.
slide-172
SLIDE 172

Joint MAP and MAPLR

  • Then it is easy to show that the prior distribution of \tilde{\mu}_{j,k} is also a Normal distribution:

p(\tilde{\mu}_{j,k} \mid \tilde{m}_{j,k}, \tilde{\tau}_{j,k}) \propto \exp\left\{-\frac{1}{2}(\tilde{\mu}_{j,k} - \tilde{m}_{j,k})^{T}\,\tilde{\tau}_{j,k}\,(\tilde{\mu}_{j,k} - \tilde{m}_{j,k})\right\}

where \tilde{m}_{j,k} = A\,m_{j,k} + b and \tilde{\tau}_{j,k} = \left(A\,\tau_{j,k}^{-1}A^{T}\right)^{-1}.

slide-173
SLIDE 173

Joint MAP and MAPLR

Q(\Lambda \mid \tilde{\Lambda}) = \sum_{S}\sum_{L} p(S, L \mid X, \tilde{\Lambda})\left[\sum_{t=1}^{T}\log a_{s_{t-1}s_t} + \sum_{t=1}^{T}\log w_{s_t,l_t} + \sum_{t=1}^{T}\log p(x_t \mid \mu_{s_t,l_t}, R_{s_t,l_t})\right] + \log p(\mu_{j,k} \mid \tilde{m}_{j,k}, \tilde{\tau}_{j,k}) + \Psi

= \sum_{t=1}^{T}\sum_{j=1}^{N}\sum_{k=1}^{K}\gamma_{j,k}(t)\,\log p(x_t \mid \mu_{j,k}, R_{j,k}) - \frac{1}{2}(\mu_{j,k} - \tilde{m}_{j,k})^{T}\,\tilde{\tau}_{j,k}\,(\mu_{j,k} - \tilde{m}_{j,k}) + \Psi'

slide-174
SLIDE 174

Joint MAP and MAPLR

  • Differentiating w.r.t. \mu_{j,k} and setting it to zero, we can obtain the MAP estimate:

\hat{\mu}_{j,k} = \left(\tilde{\tau}_{j,k} + R_{j,k}^{-1}\sum_{t=1}^{T}\gamma_{j,k}(t)\right)^{-1}\left(\tilde{\tau}_{j,k}\,\tilde{m}_{j,k} + R_{j,k}^{-1}\sum_{t=1}^{T}\gamma_{j,k}(t)\,x_t\right)

With \tilde{m}_{j,k} = A\,m_{j,k} + b and \tilde{\tau}_{j,k} = (A\,\tau_{j,k}^{-1}A^{T})^{-1}:

\hat{\mu}_{j,k} = \left((A\,\tau_{j,k}^{-1}A^{T})^{-1} + R_{j,k}^{-1}\sum_{t=1}^{T}\gamma_{j,k}(t)\right)^{-1}\left((A\,\tau_{j,k}^{-1}A^{T})^{-1}(A\,m_{j,k} + b) + R_{j,k}^{-1}\sum_{t=1}^{T}\gamma_{j,k}(t)\,x_t\right)
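The closed-form combination above is a precision-weighted average of the transformed prior mean and the data. A compact numpy sketch of that single-Gaussian update (argument names are illustrative):

```python
import numpy as np

def map_mean(A, b, m, tau, R, gammas, xs):
    """MAP estimate of one Gaussian mean under the transformed prior:
        m_t  = A m + b,  tau_t = (A tau^{-1} A^T)^{-1}
        mu   = (tau_t + c R^{-1})^{-1} (tau_t m_t + R^{-1} sum_t gamma_t x_t)
    where c = sum_t gamma_t.
    Shapes: A (D,D); b, m (D,); tau, R (D,D); gammas list of scalars;
    xs list of (D,) observation vectors."""
    tau_t = np.linalg.inv(A @ np.linalg.inv(tau) @ A.T)
    m_t = A @ m + b
    Rinv = np.linalg.inv(R)
    c = sum(gammas)
    rhs = tau_t @ m_t + Rinv @ sum(g * x for g, x in zip(gammas, xs))
    return np.linalg.solve(tau_t + c * Rinv, rhs)
```

With identity A, tau and R, a zero prior mean, and one unit-weight observation, the estimate lands halfway between prior and data, the familiar shrinkage behavior of MAP.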

slide-175
SLIDE 175

Appendix

Matrix Calculus

slide-176
SLIDE 176

Appendix-Matrix Calculus(1)

  • Notation:

– a, b, x are vectors of dimension n; x_i is the i-th element of vector x.
– B, X are matrices of dimension n \times n; x_{ij} is the element in the i-th row, j-th column of matrix X.
– dy/dx is a vector whose i-th element is dy/dx_i.
– dx/dy is a vector whose i-th element is dx_i/dy.
– dy/dX is a matrix whose (i, j) element is dy/dx_{ij}.
– dY/dx is a matrix whose (i, j) element is dy_{ij}/dx.
– dY/dX is a matrix whose (i, j) element is dy_{ij}/dx_{ij}.

slide-177
SLIDE 177

Appendix-Matrix Calculus(2)

  • Property 1: \frac{d(a^{T}Xb)}{dX} = ab^{T}

– proof: a^{T}Xb = \sum_{i=1}^{n}\sum_{j=1}^{n} a_i\,x_{ij}\,b_j, so \frac{\partial(a^{T}Xb)}{\partial x_{kt}} = a_k\,b_t = (ab^{T})_{kt}.

  • Property 1 (Extension): \frac{d(a^{T}X^{T}b)}{dX} = ba^{T}

– proof: a^{T}X^{T}b = \sum_{i=1}^{n}\sum_{j=1}^{n} a_i\,x_{ji}\,b_j, so \frac{\partial(a^{T}X^{T}b)}{\partial x_{kt}} = a_t\,b_k = (ba^{T})_{kt}.
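Identities like these are easy to spot-check numerically with central finite differences, which is a useful habit when reconstructing gradient formulas:

```python
import numpy as np

def num_grad(f, X, eps=1e-6):
    """Numerical d f / d X, entry by entry (central differences)."""
    G = np.zeros_like(X)
    for i in range(X.shape[0]):
        for j in range(X.shape[1]):
            E = np.zeros_like(X)
            E[i, j] = eps
            G[i, j] = (f(X + E) - f(X - E)) / (2 * eps)
    return G

rng = np.random.default_rng(0)
a, b = rng.normal(size=3), rng.normal(size=3)
X = rng.normal(size=(3, 3))
# Property 1: d(a^T X b)/dX = a b^T
print(np.allclose(num_grad(lambda X: a @ X @ b, X), np.outer(a, b), atol=1e-5))  # True
# Extension: d(a^T X^T b)/dX = b a^T
print(np.allclose(num_grad(lambda X: a @ X.T @ b, X), np.outer(b, a), atol=1e-5))  # True
```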

slide-178
SLIDE 178

Appendix-Matrix Calculus(3)

  • Property 2: \frac{d(x^{T}Cx)}{dx} = (C + C^{T})x

– proof: x^{T}Cx = \sum_{i=1}^{n}\sum_{j=1}^{n} x_i\,c_{ij}\,x_j, so \frac{\partial(x^{T}Cx)}{\partial x_k} = \sum_{j=1}^{n} c_{kj}\,x_j + \sum_{i=1}^{n} x_i\,c_{ik} = (Cx)_k + (C^{T}x)_k; hence \frac{d(x^{T}Cx)}{dx} = (C + C^{T})x.

slide-179
SLIDE 179

Appendix-Matrix Calculus(4)

  • Property 3: \frac{d\,tr(BX^{T})}{dX} = B

– proof: tr(BX^{T}) = \sum_{i=1}^{n}(BX^{T})_{ii} = \sum_{i=1}^{n}\sum_{k=1}^{n} b_{ik}\,x_{ik}, so \frac{\partial\,tr(BX^{T})}{\partial x_{ij}} = b_{ij}.

slide-180
SLIDE 180

Appendix-Matrix Calculus(5)

  • Property 4: \frac{d\,\det(X)}{dX} = \det(X)\,(X^{-1})^{T}

– proof: \frac{d\,\det(X)}{dX} = [adj(X)]^{T}, and since X^{-1} = adj(X)/\det(X), we have [adj(X)]^{T} = \det(X)\,(X^{-1})^{T}.

slide-181
SLIDE 181

Appendix-Matrix Calculus(6)

  • Property 5: \frac{d(a^{T}X^{T}CXb)}{dX} = CXba^{T} + C^{T}Xab^{T}

– proof: let u = Xa and v = Xb, so a^{T}X^{T}CXb = u^{T}Cv. Taking the differential,

d(a^{T}X^{T}CXb) = a^{T}(dX)^{T}CXb + a^{T}X^{T}C(dX)b = tr\left[(dX)^{T}CXba^{T}\right] + tr\left[(dX)\,ba^{T}X^{T}C\right]

hence the gradient is CXba^{T} + (ba^{T}X^{T}C)^{T} = CXba^{T} + C^{T}Xab^{T}.

slide-182
SLIDE 182

Appendix-Matrix Calculus(7)

  • Property 6: x^{T}Ax = tr(Axx^{T})

– proof: x^{T}Ax = \sum_{j=1}^{n}\sum_{k=1}^{n} x_j\,a_{jk}\,x_k, and tr(Axx^{T}) = \sum_{i=1}^{n}(Axx^{T})_{ii} = \sum_{i=1}^{n}\sum_{k=1}^{n} a_{ik}\,x_k\,x_i; the two double sums are identical.
slide-183
SLIDE 183

Appendix-Matrix Calculus(8)

  • Some other properties:

– \frac{d(x^{T}a)}{dx} = \frac{d(a^{T}x)}{dx} = a
– \frac{d\,tr(XAXB)}{dX} = (AXB)^{T} + (BXA)^{T} = B^{T}X^{T}A^{T} + A^{T}X^{T}B^{T}
– \frac{d(a^{T}X^{-1}b)}{dX} = -X^{-T}ab^{T}X^{-T}
– \frac{d\,tr\left[(X^{T}CX)^{-1}A\right]}{dX} = -CX(X^{T}CX)^{-1}(A + A^{T})(X^{T}CX)^{-1} ??, where C is symmetric