Adaptation Techniques for Acoustic Models


Adaptation Techniques for Acoustic Models
Jen-Wei (Roger) Kuo, Speech Lab, CSIE, NTNU
rogerkuo@csie.ntnu.edu.tw


  1. Discrete HMM
A simple example with two states and eight possible state paths (path probabilities $p_1, \dots, p_8$, with $p_{\text{all}} = p_1 + p_2 + \cdots + p_8$). Grouping the paths by the state occupied at $t = 1$ gives
$$\left[\frac{p_1+p_2+p_3+p_4}{p_{\text{all}}}\right]\log \pi_1 + \left[\frac{p_5+p_6+p_7+p_8}{p_{\text{all}}}\right]\log \pi_2 = \gamma_1(1)\log \pi_1 + \gamma_1(2)\log \pi_2$$
and grouping them by the transitions taken at $t = 1$ and $t = 2$ gives
$$\left[\frac{p_1+p_2}{p_{\text{all}}}+\frac{p_1+p_5}{p_{\text{all}}}\right]\log a_{11} + \left[\frac{p_3+p_4}{p_{\text{all}}}+\frac{p_2+p_6}{p_{\text{all}}}\right]\log a_{12} + \left[\frac{p_5+p_6}{p_{\text{all}}}+\frac{p_3+p_7}{p_{\text{all}}}\right]\log a_{21} + \left[\frac{p_7+p_8}{p_{\text{all}}}+\frac{p_4+p_8}{p_{\text{all}}}\right]\log a_{22}$$

  2. Discrete HMM
The Forward/Backward Procedure
$$\gamma_t(i) = \frac{P(s_t=i, X \mid \lambda)}{P(X \mid \lambda)} = \frac{P(s_t=i, X \mid \lambda)}{\sum_{j=1}^{N} P(s_t=j, X \mid \lambda)} = \frac{\alpha_t(i)\,\beta_t(i)}{\sum_{j=1}^{N} \alpha_t(j)\,\beta_t(j)}$$

$$\xi_t(i,j) = \frac{P(s_t=i, s_{t+1}=j, X \mid \lambda)}{P(X \mid \lambda)} = \frac{P(s_t=i, s_{t+1}=j, X \mid \lambda)}{\sum_{i=1}^{N}\sum_{j=1}^{N} P(s_t=i, s_{t+1}=j, X \mid \lambda)} = \frac{\alpha_t(i)\, a_{ij}\, b_j(x_{t+1})\, \beta_{t+1}(j)}{\sum_{i=1}^{N}\sum_{j=1}^{N} \alpha_t(i)\, a_{ij}\, b_j(x_{t+1})\, \beta_{t+1}(j)}$$

(The original slide also shows a two-state trellis over times 1-3 with observations $x_1, x_2, x_3$ and the products $\alpha_t(i)\beta_t(i)$ at each node.)
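
As a concrete illustration of these posteriors, here is a minimal NumPy sketch (not from the slides) that forms $\gamma_t(i)$ and $\xi_t(i,j)$ from already-computed forward/backward variables of a discrete HMM; all function and variable names are illustrative.

```python
import numpy as np

def state_posteriors(alpha, beta, A, B, obs):
    """alpha, beta: (T, N) forward/backward variables; A: (N, N) transitions;
    B: (N, K) discrete emission probabilities; obs: length-T symbol indices."""
    T, N = alpha.shape
    gamma = alpha * beta
    gamma /= gamma.sum(axis=1, keepdims=True)            # gamma_t(i)

    xi = np.zeros((T - 1, N, N))
    for t in range(T - 1):
        # xi_t(i,j) proportional to alpha_t(i) * a_ij * b_j(x_{t+1}) * beta_{t+1}(j)
        num = alpha[t][:, None] * A * B[:, obs[t + 1]][None, :] * beta[t + 1][None, :]
        xi[t] = num / num.sum()
    return gamma, xi
```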

  3. Discrete HMM
Q-function:
$$Q(\lambda \mid \bar\lambda) = \sum_{S} \left[ \frac{p(X, S \mid \bar\lambda)}{\sum_{S} p(X, S \mid \bar\lambda)} \right] \log p(X, S \mid \lambda)$$
$$= \sum_{i=1}^{N} \underbrace{\Pr(s_1=i \mid X, \bar\lambda)}_{\gamma_1(i)} \log \pi_i + \sum_{i=1}^{N}\sum_{j=1}^{N} \left( \sum_{t=1}^{T-1} \underbrace{\Pr(s_t=i, s_{t+1}=j \mid X, \bar\lambda)}_{\xi_t(i,j)} \right) \log a_{ij} + \sum_{j=1}^{N}\sum_{k=1}^{K} \left( \sum_{t:\, x_t = v_k} \Pr(s_t=j, x_t = v_k \mid X, \bar\lambda) \right) \log b_{jk}$$
The ML re-estimate of the discrete output probability is
$$\bar b_{jk} = \frac{\sum_{t=1}^{T} \gamma_t(j)\, \mathbf{1}(x_t = v_k)}{\sum_{t=1}^{T} \gamma_t(j)}$$

  4. Discrete HMM
R-function:
For simplicity, prior independence of $\pi$, $A$ and $B$ is assumed. The prior density for $\lambda$ is then
$$p(\lambda) = p(\pi)\cdot p(A)\cdot p(B)$$
and their densities assume the form of Dirichlet distributions, so that
$$p(\lambda) = K_c \left[ \prod_{i=1}^{N} \pi_i^{\eta_i - 1} \right] \left[ \prod_{i=1}^{N}\prod_{j=1}^{N} a_{ij}^{\eta_{ij} - 1} \right] \left[ \prod_{i=1}^{N}\prod_{k=1}^{K} b_{ik}^{\nu_{ik} - 1} \right]$$
where $\eta_i$, $\eta_{ij}$ and $\nu_{ik} > 1$.
$$\lambda_{MAP} = \arg\max_{\lambda} \log p(\lambda \mid X) = \arg\max_{\lambda} \log \left[ p(X \mid \lambda)\, p(\lambda) \right] = \arg\max_{\lambda} \left[ \log p(X \mid \lambda) + \log p(\lambda) \right] = \arg\max_{\lambda} \left[ Q(\lambda \mid \bar\lambda) + \log p(\lambda) \right]$$
We define the auxiliary function $R(\lambda \mid \bar\lambda) = Q(\lambda \mid \bar\lambda) + \log p(\lambda)$.

  5. Discrete HMM
$$\therefore\ R(\lambda \mid \bar\lambda) = \Psi + (\text{constant})$$
where
$$\Psi = \sum_{i=1}^{N} \left[ \Pr(s_1=i \mid X, \bar\lambda) + \eta_i - 1 \right] \log \pi_i + \sum_{i=1}^{N}\sum_{j=1}^{N} \left[ \left( \sum_{t=1}^{T-1} \Pr(s_t=i, s_{t+1}=j \mid X, \bar\lambda) \right) + \eta_{ij} - 1 \right] \log a_{ij} + \sum_{j=1}^{N}\sum_{k=1}^{K} \left[ \left( \sum_{t:\, x_t = v_k} \Pr(s_t=j, x_t = v_k \mid X, \bar\lambda) \right) + \nu_{jk} - 1 \right] \log b_{jk}$$

  6. Discrete HMM
So, we can obtain
$$\bar\pi_i = \frac{\Pr(s_1=i \mid X, \bar\lambda) + \eta_i - 1}{\sum_{i=1}^{N} \left[ \Pr(s_1=i \mid X, \bar\lambda) + \eta_i - 1 \right]}$$

$$\bar a_{ij} = \frac{\left( \sum_{t=1}^{T-1} \Pr(s_t=i, s_{t+1}=j \mid X, \bar\lambda) \right) + \eta_{ij} - 1}{\sum_{j=1}^{N} \left[ \left( \sum_{t=1}^{T-1} \Pr(s_t=i, s_{t+1}=j \mid X, \bar\lambda) \right) + \eta_{ij} - 1 \right]}$$

$$\bar b_{jk} = \frac{\left( \sum_{t:\, x_t = v_k} \Pr(s_t=j, x_t = v_k \mid X, \bar\lambda) \right) + \nu_{jk} - 1}{\sum_{k=1}^{K} \left[ \left( \sum_{t:\, x_t = v_k} \Pr(s_t=j, x_t = v_k \mid X, \bar\lambda) \right) + \nu_{jk} - 1 \right]}$$

  7. Discrete HMM
• How to choose the initial estimate for $\pi_i$, $a_{ij}$ and $b_{jk}$?
• One reasonable choice of the initial estimate is the mode of the prior density:
$$\pi_i^{(0)} = \frac{\eta_i - 1}{\sum_{p=1}^{N} (\eta_p - 1)}, \quad i = 1, \dots, N$$
$$a_{ij}^{(0)} = \frac{\eta_{ij} - 1}{\sum_{p=1}^{N} (\eta_{ip} - 1)}, \quad i, j = 1, \dots, N$$
$$b_{jk}^{(0)} = \frac{\nu_{jk} - 1}{\sum_{p=1}^{K} (\nu_{jp} - 1)}, \quad j = 1, \dots, N \ \text{and}\ k = 1, \dots, K$$

  8. Discrete HMM
• What's the mode? If $\lambda_{mode}$ is the mode of the prior density, then $\lambda_{mode} = \arg\max_{\lambda} p(\lambda)$.
– Applying a Lagrange multiplier we can easily derive the above modes.
– Example:
$$p(\pi_1, \dots, \pi_N) \propto \prod_{i=1}^{N} \pi_i^{\eta_i - 1} \ \Rightarrow\ \log p(\pi_1, \dots, \pi_N) = \Psi + \sum_{i=1}^{N} (\eta_i - 1) \log \pi_i$$
$$\frac{\partial}{\partial \pi_i} \left[ \log p(\pi_1, \dots, \pi_N) + l \left( \sum_{p=1}^{N} \pi_p - 1 \right) \right] = (\eta_i - 1) \frac{1}{\pi_i} + l = 0$$
$$\Rightarrow\ \pi_i = \frac{\eta_i - 1}{-l}, \quad \text{but}\ \sum_{p=1}^{N} \pi_p = 1 \ \Rightarrow\ \sum_{p=1}^{N} \frac{\eta_p - 1}{-l} = 1 \ \Rightarrow\ -l = \sum_{p=1}^{N} (\eta_p - 1)$$
$$\therefore\ \pi_i = \frac{\eta_i - 1}{\sum_{p=1}^{N} (\eta_p - 1)}$$

  9. Discrete HMM
• Another reasonable choice of the initial estimate is the mean of the prior density:
$$\pi_i^{(0)} = \frac{\eta_i}{\sum_{p=1}^{N} \eta_p}, \quad i = 1, \dots, N \qquad a_{ij}^{(0)} = \frac{\eta_{ij}}{\sum_{p=1}^{N} \eta_{ip}}, \quad i, j = 1, \dots, N$$
$$b_{jk}^{(0)} = \frac{\nu_{jk}}{\sum_{p=1}^{K} \nu_{jp}}, \quad j = 1, \dots, N \ \text{and}\ k = 1, \dots, K$$
• Both are some kind of summarization of the available information about the parameters before any data are observed.
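
A minimal sketch (not from the slides) of the discrete-HMM MAP re-estimation of slides 6-9, assuming the posterior counts have already been accumulated with the forward/backward procedure and that the Dirichlet hyperparameters are given as NumPy arrays; all names are illustrative.

```python
import numpy as np

def map_reestimate(gamma1, trans_counts, emit_counts, eta_pi, eta_a, nu_b):
    """gamma1[i] = Pr(s_1=i | X, lambda); trans_counts[i, j] = sum_t xi_t(i, j);
    emit_counts[j, k] = sum_{t: x_t = v_k} gamma_t(j); hyperparameters all > 1."""
    pi = gamma1 + eta_pi - 1.0
    pi /= pi.sum()
    a = trans_counts + eta_a - 1.0
    a /= a.sum(axis=1, keepdims=True)
    b = emit_counts + nu_b - 1.0
    b /= b.sum(axis=1, keepdims=True)
    return pi, a, b

def prior_mode(eta_pi, eta_a, nu_b):
    """Mode of the Dirichlet priors: one reasonable initial estimate (slide 7)."""
    return ((eta_pi - 1.0) / (eta_pi - 1.0).sum(),
            (eta_a - 1.0) / (eta_a - 1.0).sum(axis=1, keepdims=True),
            (nu_b - 1.0) / (nu_b - 1.0).sum(axis=1, keepdims=True))
```

Using the prior mean instead of the mode amounts to dropping the "- 1.0" terms in prior_mode, mirroring slide 9.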

  10. SCHMM
Likelihood ⇒ Semi-Continuous HMM; Prior ⇒ Dirichlet + normal-Wishart.
Let
$$p(X \mid \Lambda) = \sum_{S} \pi_{s_1} b_{s_1}(x_1) \left[ \prod_{t=2}^{T} a_{s_{t-1} s_t} b_{s_t}(x_t) \right]$$
be the likelihood, where $X = \{x_1, \dots, x_T\}$ and $b_i(x_t) = \sum_{k=1}^{K} w_{ik} N(x_t \mid m_k, r_k)$.
$\Lambda = \{\lambda_1, \dots, \lambda_M, \theta_1, \dots, \theta_K\}$, where $M$ is the total number of HMMs,
$\lambda_m = \{\pi_i^{(m)}, a_{ij}^{(m)}, w_{ik}^{(m)} \mid i, j = 1, \dots, N\ (\text{state number}),\ k = 1, \dots, K\}$,
and $\theta_k = \{m_k, r_k\},\ k = 1, \dots, K\ (\text{mixture number})$,
where $N(x \mid m_k, r_k) = (2\pi)^{-D/2}\, |r_k|^{1/2}\, e^{-\frac{1}{2}(x - m_k)^T r_k (x - m_k)}$ and $r_k$ is a precision matrix.

  11. SCHMM
The prior density for $\Lambda$ is assumed to be (with independence between the factors):
$$g(\Lambda) = \left[ \prod_{m=1}^{M} g(\lambda_m) \right] \left[ \prod_{k=1}^{K} g(m_k, r_k) \right]$$
where
$$g(\lambda_m) \propto K_c \left[ \prod_{i=1}^{N} \pi_i^{\eta_i - 1} \right] \left[ \prod_{i=1}^{N}\prod_{j=1}^{N} a_{ij}^{\eta_{ij} - 1} \right] \left[ \prod_{i=1}^{N}\prod_{k=1}^{K} w_{ik}^{\nu_{ik} - 1} \right]$$
If $r_k$ is a full precision matrix, then $g(m_k, r_k)$ is assumed to be normal-Wishart:
$$g(m_k, r_k) \propto |r_k|^{\frac{\alpha_k - D}{2}}\, e^{-\frac{\tau_k}{2}(m_k - \mu_k)^T r_k (m_k - \mu_k)}\, e^{-\frac{1}{2}\mathrm{tr}(u_k r_k)}$$
where $\alpha_k > D - 1$, $\tau_k > 0$, $\mu_k$ is a vector of dimension $D$, and $u_k$ is a $D \times D$ positive definite matrix.
If $r_k$ is a diagonal precision matrix, then $g(m_k, r_k)$ is assumed to be a product of normal-gamma densities:
$$g(m_k, r_k) \propto \prod_{d=1}^{D} r_{kd}^{\alpha_{kd} - 1/2}\, e^{-\frac{\tau_{kd}}{2} r_{kd} (m_{kd} - \mu_{kd})^2}\, e^{-\beta_{kd} r_{kd}}$$

  12. SCHMM
Let $X^{(m,n)}$ denote the $n$-th observation sequence, of length $T^{(m,n)}$, associated with model $m$, and let each model $m$ have $W_m$ observation sequences. The MAP estimates of $\Lambda$ can be obtained by
$$\Lambda_{MAP} = \arg\max_{\Lambda} \left[ \prod_{m=1}^{M} \prod_{n=1}^{W_m} f(X^{(m,n)} \mid \lambda_m, \Theta) \right] g(\Lambda)$$
(Model 1 has sequences $X^{(1,1)}, \dots, X^{(1,W_1)}$; Model 2 has $X^{(2,1)}, \dots, X^{(2,W_2)}$; ...; Model $M$ has $X^{(M,1)}, \dots, X^{(M,W_M)}$.)

  13. SCHMM
Q-function:
Define a Q-function as
$$Q(\Lambda \mid \bar\Lambda) = \sum_{m=1}^{M} \sum_{n=1}^{W_m} E\left[ \log f(X^{(m,n)}, S^{(m,n)}, L^{(m,n)} \mid \Lambda) \,\middle|\, X^{(m,n)}, \bar\Lambda \right]$$
$$= \sum_{m=1}^{M} \sum_{n=1}^{W_m} \sum_{S^{(m,n)}} \sum_{L^{(m,n)}} f(S^{(m,n)}, L^{(m,n)} \mid X^{(m,n)}, \bar\Lambda)\, \log f(X^{(m,n)}, S^{(m,n)}, L^{(m,n)} \mid \Lambda)$$
$$= \sum_{m=1}^{M} \sum_{n=1}^{W_m} \sum_{S^{(m,n)}} \sum_{L^{(m,n)}} \frac{f(S^{(m,n)}, L^{(m,n)}, X^{(m,n)} \mid \bar\Lambda)}{f(X^{(m,n)} \mid \bar\Lambda)}\, \log f(X^{(m,n)}, S^{(m,n)}, L^{(m,n)} \mid \Lambda)$$
where
$$f(X^{(m,n)}, S^{(m,n)}, L^{(m,n)} \mid \Lambda) = \pi_{s_1} w_{s_1 l_1} N(x_1^{(m,n)} \mid m_{l_1}, r_{l_1}) \prod_{t=2}^{T^{(m,n)}} a_{s_{t-1} s_t} w_{s_t l_t} N(x_t^{(m,n)} \mid m_{l_t}, r_{l_t})$$

  14. SCHMM
Q-function:
$\therefore$ the Q-function can be decomposed into
$$Q(\Lambda \mid \bar\Lambda) = \sum_{m=1}^{M} \sum_{i=1}^{N} \left( \sum_{n=1}^{W_m} \gamma_1^{(m,n)}(i) \right) \log \pi_i^{(m)} + \sum_{m=1}^{M} \sum_{i=1}^{N} \sum_{j=1}^{N} \left( \sum_{n=1}^{W_m} \sum_{t=1}^{T^{(m,n)}} \gamma_t^{(m,n)}(i,j) \right) \log a_{ij}^{(m)}$$
$$+ \sum_{m=1}^{M} \sum_{i=1}^{N} \sum_{k=1}^{K} \left( \sum_{n=1}^{W_m} \sum_{t=1}^{T^{(m,n)}} \xi_t^{(m,n)}(i,k) \right) \log w_{ik}^{(m)} + \sum_{k=1}^{K} \sum_{m=1}^{M} \sum_{n=1}^{W_m} \sum_{t=1}^{T^{(m,n)}} \xi_t^{(m,n)}(k) \log N(x_t^{(m,n)} \mid m_k, r_k)$$
where
$\gamma_t^{(m,n)}(i,j) = \Pr(s_t^{(m,n)}=i, s_{t+1}^{(m,n)}=j \mid X^{(m,n)}, \bar\lambda_m)$,
$\gamma_t^{(m,n)}(i) = \Pr(s_t^{(m,n)}=i \mid X^{(m,n)}, \bar\lambda_m)$,
$\xi_t^{(m,n)}(i,k) = \Pr(s_t^{(m,n)}=i, l_t^{(m,n)}=k \mid X^{(m,n)}, \bar\lambda_m)$,
$\xi_t^{(m,n)}(k) = \Pr(l_t^{(m,n)}=k \mid X^{(m,n)}, \bar\lambda_m)$,
and
$$\xi_t^{(m,n)}(i,k) = \gamma_t^{(m,n)}(i) \cdot \frac{w_{ik}^{(m)} N(x_t^{(m,n)} \mid m_k, r_k)}{\sum_{k=1}^{K} w_{ik}^{(m)} N(x_t^{(m,n)} \mid m_k, r_k)}$$

  15. SCHMM
$$\log g(\Lambda) = \sum_{m=1}^{M} \sum_{i=1}^{N} (\eta_i^{(m)} - 1) \log \pi_i^{(m)} + \sum_{m=1}^{M} \sum_{i=1}^{N} \sum_{j=1}^{N} (\eta_{ij}^{(m)} - 1) \log a_{ij}^{(m)} + \sum_{m=1}^{M} \sum_{i=1}^{N} \sum_{k=1}^{K} (\nu_{ik}^{(m)} - 1) \log w_{ik}^{(m)} + \sum_{k=1}^{K} \log g(m_k, r_k) + \text{Constant}$$

$$R(\Lambda \mid \bar\Lambda) = Q(\Lambda \mid \bar\Lambda) + \log g(\Lambda) = \sum_{m=1}^{M} \sum_{i=1}^{N} \left[ \left( \sum_{n=1}^{W_m} \gamma_1^{(m,n)}(i) \right) + \eta_i^{(m)} - 1 \right] \log \pi_i^{(m)}$$
$$+ \sum_{m=1}^{M} \sum_{i=1}^{N} \sum_{j=1}^{N} \left[ \left( \sum_{n=1}^{W_m} \sum_{t=1}^{T^{(m,n)}} \gamma_t^{(m,n)}(i,j) \right) + \eta_{ij}^{(m)} - 1 \right] \log a_{ij}^{(m)}$$
$$+ \sum_{m=1}^{M} \sum_{i=1}^{N} \sum_{k=1}^{K} \left[ \left( \sum_{n=1}^{W_m} \sum_{t=1}^{T^{(m,n)}} \xi_t^{(m,n)}(i,k) \right) + \nu_{ik}^{(m)} - 1 \right] \log w_{ik}^{(m)}$$
$$+ \sum_{k=1}^{K} \sum_{m=1}^{M} \sum_{n=1}^{W_m} \sum_{t=1}^{T^{(m,n)}} \xi_t^{(m,n)}(k) \log N(x_t^{(m,n)} \mid m_k, r_k) + \sum_{k=1}^{K} \log g(m_k, r_k) + \text{Constant}$$

  16. SCHMM Initial probability
• Differentiating $R(\Lambda \mid \bar\Lambda)$ w.r.t. $\pi_i^{(m)}$ (with a Lagrange multiplier $l$ for the sum-to-one constraint) and equating it to zero:
$$\frac{\partial}{\partial \pi_i^{(m)}} \left\{ \sum_{m=1}^{M} \sum_{i=1}^{N} \left[ \left( \sum_{n=1}^{W_m} \gamma_1^{(m,n)}(i) \right) + \eta_i^{(m)} - 1 \right] \log \pi_i^{(m)} + l \sum_{j=1}^{N} \pi_j^{(m)} \right\} = 0$$
$$\Rightarrow\ \left[ \left( \sum_{n=1}^{W_m} \gamma_1^{(m,n)}(i) \right) + \eta_i^{(m)} - 1 \right] \frac{1}{\pi_i^{(m)}} + l = 0 \ \Rightarrow\ \pi_i^{(m)} = \frac{\left( \sum_{n=1}^{W_m} \gamma_1^{(m,n)}(i) \right) + \eta_i^{(m)} - 1}{-l}$$
$$\sum_{j=1}^{N} \pi_j^{(m)} = 1 \ \Rightarrow\ -l = \sum_{j=1}^{N} \left[ \left( \sum_{n=1}^{W_m} \gamma_1^{(m,n)}(j) \right) + \eta_j^{(m)} - 1 \right]$$
$$\therefore\ \bar\pi_i^{(m)} = \frac{\eta_i^{(m)} - 1 + \sum_{n=1}^{W_m} \gamma_1^{(m,n)}(i)}{\sum_{j=1}^{N} \left[ \eta_j^{(m)} - 1 + \sum_{n=1}^{W_m} \gamma_1^{(m,n)}(j) \right]}$$

  17. SCHMM Transition probability
• Differentiating $R(\Lambda \mid \bar\Lambda)$ w.r.t. $a_{ij}^{(m)}$ and equating it to zero (again with a Lagrange multiplier $l$ for $\sum_{j} a_{ij}^{(m)} = 1$), the same steps give
$$\left[ \left( \sum_{n=1}^{W_m} \sum_{t=1}^{T^{(m,n)}} \gamma_t^{(m,n)}(i,j) \right) + \eta_{ij}^{(m)} - 1 \right] \frac{1}{a_{ij}^{(m)}} + l = 0, \qquad -l = \sum_{j=1}^{N} \left[ \left( \sum_{n=1}^{W_m} \sum_{t=1}^{T^{(m,n)}} \gamma_t^{(m,n)}(i,j) \right) + \eta_{ij}^{(m)} - 1 \right]$$
$$\therefore\ \bar a_{ij}^{(m)} = \frac{\eta_{ij}^{(m)} - 1 + \sum_{n=1}^{W_m} \sum_{t=1}^{T^{(m,n)}} \gamma_t^{(m,n)}(i,j)}{\sum_{j=1}^{N} \left[ \eta_{ij}^{(m)} - 1 + \sum_{n=1}^{W_m} \sum_{t=1}^{T^{(m,n)}} \gamma_t^{(m,n)}(i,j) \right]}$$

  18. SCHMM Mixture weight
• Differentiating $R(\Lambda \mid \bar\Lambda)$ w.r.t. $w_{ik}^{(m)}$ and equating it to zero (with a Lagrange multiplier $l$ for $\sum_{k} w_{ik}^{(m)} = 1$):
$$\left[ \left( \sum_{n=1}^{W_m} \sum_{t=1}^{T^{(m,n)}} \xi_t^{(m,n)}(i,k) \right) + \nu_{ik}^{(m)} - 1 \right] \frac{1}{w_{ik}^{(m)}} + l = 0, \qquad -l = \sum_{k=1}^{K} \left[ \left( \sum_{n=1}^{W_m} \sum_{t=1}^{T^{(m,n)}} \xi_t^{(m,n)}(i,k) \right) + \nu_{ik}^{(m)} - 1 \right]$$
$$\therefore\ \bar w_{ik}^{(m)} = \frac{\nu_{ik}^{(m)} - 1 + \sum_{n=1}^{W_m} \sum_{t=1}^{T^{(m,n)}} \xi_t^{(m,n)}(i,k)}{\sum_{k=1}^{K} \left[ \nu_{ik}^{(m)} - 1 + \sum_{n=1}^{W_m} \sum_{t=1}^{T^{(m,n)}} \xi_t^{(m,n)}(i,k) \right]}$$

  19. SCHMM
• Differentiating $R(\Lambda \mid \bar\Lambda)$ w.r.t. $m_k$ and equating it to zero:
$$\sum_{m=1}^{M} \sum_{n=1}^{W_m} \sum_{t=1}^{T^{(m,n)}} \xi_t^{(m,n)}(k) \frac{\partial \log N(x_t^{(m,n)} \mid m_k, r_k)}{\partial m_k} + \frac{\partial \log g(m_k, r_k)}{\partial m_k} = 0 \qquad (55)$$
• Differentiating $R(\Lambda \mid \bar\Lambda)$ w.r.t. $r_k$ and equating it to zero:
$$\sum_{m=1}^{M} \sum_{n=1}^{W_m} \sum_{t=1}^{T^{(m,n)}} \xi_t^{(m,n)}(k) \frac{\partial \log N(x_t^{(m,n)} \mid m_k, r_k)}{\partial r_k} + \frac{\partial \log g(m_k, r_k)}{\partial r_k} = 0 \qquad (56)$$

  20. SCHMM Full Covariance
• Full covariance matrix case:
$$\frac{\partial \log N(x_t^{(m,n)} \mid m_k, r_k)}{\partial m_k} = \frac{\partial}{\partial m_k} \left[ -\frac{1}{2} (x_t^{(m,n)} - m_k)^T r_k (x_t^{(m,n)} - m_k) \right] = -\frac{1}{2} (r_k + r_k^T)(x_t^{(m,n)} - m_k)(-1) = r_k (x_t^{(m,n)} - m_k)$$
$$\frac{\partial \log g(m_k, r_k)}{\partial m_k} = \frac{1}{g(m_k, r_k)} \frac{\partial}{\partial m_k} \left[ |r_k|^{\frac{\alpha_k - D}{2}} e^{-\frac{\tau_k}{2}(m_k - \mu_k)^T r_k (m_k - \mu_k)} e^{-\frac{1}{2}\mathrm{tr}(u_k r_k)} \right] = -\tau_k r_k (m_k - \mu_k)$$

  21. SCHMM Full Covariance
• Full covariance matrix case:
$$\sum_{m=1}^{M} \sum_{n=1}^{W_m} \sum_{t=1}^{T^{(m,n)}} \xi_t^{(m,n)}(k) \left[ r_k (x_t^{(m,n)} - m_k) \right] - \tau_k r_k (m_k - \mu_k) = 0$$
$$\Rightarrow\ \left[ \sum_{m=1}^{M} \sum_{n=1}^{W_m} \sum_{t=1}^{T^{(m,n)}} \xi_t^{(m,n)}(k) + \tau_k \right] m_k = \tau_k \mu_k + \sum_{m=1}^{M} \sum_{n=1}^{W_m} \sum_{t=1}^{T^{(m,n)}} \xi_t^{(m,n)}(k)\, x_t^{(m,n)}$$
$$\therefore\ \bar m_k = \frac{\tau_k \mu_k + \sum_{m=1}^{M} \sum_{n=1}^{W_m} \sum_{t=1}^{T^{(m,n)}} \xi_t^{(m,n)}(k)\, x_t^{(m,n)}}{\tau_k + \sum_{m=1}^{M} \sum_{n=1}^{W_m} \sum_{t=1}^{T^{(m,n)}} \xi_t^{(m,n)}(k)}$$

  22. SCHMM Full Covariance
• Full covariance matrix case:
$$\frac{\partial \log N(x_t^{(m,n)} \mid m_k, r_k)}{\partial r_k} = \frac{\partial}{\partial r_k} \left[ \log |r_k|^{1/2} - \frac{1}{2} (x_t^{(m,n)} - m_k)^T r_k (x_t^{(m,n)} - m_k) \right]$$
$$= \frac{1}{2} |r_k|^{-1/2} |r_k|^{1/2}\, r_k^{-1} - \frac{1}{2} (x_t^{(m,n)} - m_k)(x_t^{(m,n)} - m_k)^T = \frac{1}{2} \left[ r_k^{-1} - (x_t^{(m,n)} - m_k)(x_t^{(m,n)} - m_k)^T \right]$$

  23. SCHMM Full Covariance
• Full covariance matrix case: writing $g(m_k, r_k)$ as the product of the three factors (1) $|r_k|^{\frac{\alpha_k - D}{2}}$, (2) $e^{-\frac{\tau_k}{2}(m_k - \mu_k)^T r_k (m_k - \mu_k)}$ and (3) $e^{-\frac{1}{2}\mathrm{tr}(u_k r_k)}$,
$$\frac{\partial \log g(m_k, r_k)}{\partial r_k} = \frac{1}{g(m_k, r_k)} \frac{\partial}{\partial r_k} \left[ (1) \times (2) \times (3) \right] = \frac{\alpha_k - D}{2}\, r_k^{-1} - \frac{\tau_k}{2} (m_k - \mu_k)(m_k - \mu_k)^T - \frac{1}{2} u_k$$

  24. SCHMM Full Covariance
• Full covariance matrix case:
$$\therefore\ \sum_{m=1}^{M} \sum_{n=1}^{W_m} \sum_{t=1}^{T^{(m,n)}} \xi_t^{(m,n)}(k) \left( \frac{1}{2} \left[ r_k^{-1} - (x_t^{(m,n)} - m_k)(x_t^{(m,n)} - m_k)^T \right] \right) + \frac{\alpha_k - D}{2}\, r_k^{-1} - \frac{\tau_k}{2} (m_k - \mu_k)(m_k - \mu_k)^T - \frac{1}{2} u_k = 0$$
$$\Rightarrow\ r_k^{-1} \left\{ \sum_{m=1}^{M} \sum_{n=1}^{W_m} \sum_{t=1}^{T^{(m,n)}} \xi_t^{(m,n)}(k) + \alpha_k - D \right\} = u_k + \tau_k (m_k - \mu_k)(m_k - \mu_k)^T + \sum_{m=1}^{M} \sum_{n=1}^{W_m} \sum_{t=1}^{T^{(m,n)}} \xi_t^{(m,n)}(k) (x_t^{(m,n)} - m_k)(x_t^{(m,n)} - m_k)^T$$
$$\Rightarrow\ \bar r_k^{-1} = \frac{u_k + \tau_k (\bar m_k - \mu_k)(\bar m_k - \mu_k)^T + \sum_{m=1}^{M} \sum_{n=1}^{W_m} \sum_{t=1}^{T^{(m,n)}} \xi_t^{(m,n)}(k) (x_t^{(m,n)} - \bar m_k)(x_t^{(m,n)} - \bar m_k)^T}{\sum_{m=1}^{M} \sum_{n=1}^{W_m} \sum_{t=1}^{T^{(m,n)}} \xi_t^{(m,n)}(k) + \alpha_k - D}$$
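
A minimal sketch (not from the slides) of the full-covariance MAP update of slides 21 and 24 for one shared Gaussian, assuming the frames and their occupancies $\xi_t(k)$ have been pooled over all models and utterances; mu0, tau, alpha, u are the normal-Wishart hyperparameters and all names are illustrative.

```python
import numpy as np

def map_full_cov_gaussian(xs, occ, mu0, tau, alpha, u):
    """xs: (T, D) pooled frames; occ: (T,) occupancies xi_t(k).
    Returns the MAP mean and covariance (inverse of the precision r_k)."""
    T, D = xs.shape
    n = occ.sum()
    mean = (tau * mu0 + occ @ xs) / (tau + n)                 # MAP mean m_k
    diff = xs - mean
    scatter = (occ[:, None] * diff).T @ diff                  # sum_t xi_t(k)(x-m)(x-m)^T
    dm = (mean - mu0)[:, None]
    cov = (u + tau * dm @ dm.T + scatter) / (alpha - D + n)   # r_k^{-1}
    return mean, cov
```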

  25. SCHMM Full Covariance
• The initial estimate can be chosen as the mode of the prior PDF:
$\pi_i^{(m)}, a_{ij}^{(m)}, w_{ik}^{(m)}$ (same as for the DHMM), and
$$m_k = \mu_k, \qquad r_k = (\alpha_k - D)\, u_k^{-1}$$
• And it can also be chosen as the mean of the prior PDF:
$\pi_i^{(m)}, a_{ij}^{(m)}, w_{ik}^{(m)}$ (same as for the DHMM), and
$$m_k = \mu_k, \qquad r_k = \alpha_k\, u_k^{-1}$$

  26. SCHMM Diagonal Covariance
• Diagonal covariance matrix case. Then
$$N(x_t^{(m,n)} \mid m_k, r_k) \propto \left( \prod_{d=1}^{D} r_{kd}^{1/2} \right) e^{-\frac{1}{2} \sum_{d=1}^{D} r_{kd} (x_{td}^{(m,n)} - m_{kd})^2}$$
and
$$g(m_k, r_k) \propto \prod_{d=1}^{D} r_{kd}^{\alpha_{kd} - 1/2}\, e^{-\frac{\tau_{kd}}{2} r_{kd} (m_{kd} - \mu_{kd})^2}\, e^{-\beta_{kd} r_{kd}}$$
$$\log g(m_k, r_k) = \sum_{d=1}^{D} \log \left[ r_{kd}^{\alpha_{kd} - 1/2}\, e^{-\frac{\tau_{kd}}{2} r_{kd} (m_{kd} - \mu_{kd})^2}\, e^{-\beta_{kd} r_{kd}} \right] + C$$

  27. SCHMM Diagonal Covariance
• Diagonal covariance matrix case:
$$\frac{\partial \log N(x_t^{(m,n)} \mid m_k, r_k)}{\partial m_{kd}} = \frac{1}{N(x_t^{(m,n)} \mid m_k, r_k)} \frac{\partial N(x_t^{(m,n)} \mid m_k, r_k)}{\partial m_{kd}} = r_{kd} (x_{td}^{(m,n)} - m_{kd})$$
$$\frac{\partial \log g(m_k, r_k)}{\partial m_{kd}} = \frac{1}{g(m_k, r_k)} \frac{\partial g(m_k, r_k)}{\partial m_{kd}} = -\tau_{kd} r_{kd} (m_{kd} - \mu_{kd})$$

  28. SCHMM Diagonal Covariance
• Diagonal covariance matrix case:
$$\therefore\ \sum_{m=1}^{M} \sum_{n=1}^{W_m} \sum_{t=1}^{T^{(m,n)}} \xi_t^{(m,n)}(k) \left[ r_{kd} (x_{td}^{(m,n)} - m_{kd}) \right] - \tau_{kd} r_{kd} (m_{kd} - \mu_{kd}) = 0$$
$$\Rightarrow\ \left[ \sum_{m=1}^{M} \sum_{n=1}^{W_m} \sum_{t=1}^{T^{(m,n)}} \xi_t^{(m,n)}(k) + \tau_{kd} \right] m_{kd} = \tau_{kd} \mu_{kd} + \sum_{m=1}^{M} \sum_{n=1}^{W_m} \sum_{t=1}^{T^{(m,n)}} \xi_t^{(m,n)}(k)\, x_{td}^{(m,n)}$$
$$\Rightarrow\ \bar m_{kd} = \frac{\tau_{kd} \mu_{kd} + \sum_{m=1}^{M} \sum_{n=1}^{W_m} \sum_{t=1}^{T^{(m,n)}} \xi_t^{(m,n)}(k)\, x_{td}^{(m,n)}}{\tau_{kd} + \sum_{m=1}^{M} \sum_{n=1}^{W_m} \sum_{t=1}^{T^{(m,n)}} \xi_t^{(m,n)}(k)}$$

  29. SCHMM Diagonal Covariance
• Diagonal covariance matrix case:
$$\frac{\partial \log N(x_t^{(m,n)} \mid m_k, r_k)}{\partial r_{kd}} = \frac{1}{N(x_t^{(m,n)} \mid m_k, r_k)} \frac{\partial N(x_t^{(m,n)} \mid m_k, r_k)}{\partial r_{kd}} = \frac{1}{2} \left[ r_{kd}^{-1} - (x_{td}^{(m,n)} - m_{kd})^2 \right]$$

  30. SCHMM Diagonal Covariance
• Diagonal covariance matrix case: writing $g(m_k, r_k)$ per dimension as the product of (1) $r_{kd}^{\alpha_{kd} - 1/2}$, (2) $e^{-\frac{\tau_{kd}}{2} r_{kd} (m_{kd} - \mu_{kd})^2}$ and (3) $e^{-\beta_{kd} r_{kd}}$,
$$\frac{\partial \log g(m_k, r_k)}{\partial r_{kd}} = \frac{1}{g(m_k, r_k)} \frac{\partial}{\partial r_{kd}} \left[ (1) \times (2) \times (3) \right] = (\alpha_{kd} - 1/2)\, r_{kd}^{-1} - \frac{\tau_{kd}}{2} (m_{kd} - \mu_{kd})^2 - \beta_{kd}$$

  31. SCHMM Diagonal Covariance
• Diagonal covariance matrix case:
$$\therefore\ \sum_{m=1}^{M} \sum_{n=1}^{W_m} \sum_{t=1}^{T^{(m,n)}} \xi_t^{(m,n)}(k) \left( \frac{1}{2} \left[ r_{kd}^{-1} - (x_{td}^{(m,n)} - m_{kd})^2 \right] \right) + (\alpha_{kd} - 1/2)\, r_{kd}^{-1} - \frac{\tau_{kd}}{2} (m_{kd} - \mu_{kd})^2 - \beta_{kd} = 0$$
$$\Rightarrow\ r_{kd}^{-1} \left[ \sum_{m=1}^{M} \sum_{n=1}^{W_m} \sum_{t=1}^{T^{(m,n)}} \xi_t^{(m,n)}(k) + (2\alpha_{kd} - 1) \right] = 2\beta_{kd} + \tau_{kd} (m_{kd} - \mu_{kd})^2 + \sum_{m=1}^{M} \sum_{n=1}^{W_m} \sum_{t=1}^{T^{(m,n)}} \xi_t^{(m,n)}(k) (x_{td}^{(m,n)} - m_{kd})^2$$
$$\Rightarrow\ \bar r_{kd}^{-1} = \frac{2\beta_{kd} + \tau_{kd} (\bar m_{kd} - \mu_{kd})^2 + \sum_{m=1}^{M} \sum_{n=1}^{W_m} \sum_{t=1}^{T^{(m,n)}} \xi_t^{(m,n)}(k) (x_{td}^{(m,n)} - \bar m_{kd})^2}{(2\alpha_{kd} - 1) + \sum_{m=1}^{M} \sum_{n=1}^{W_m} \sum_{t=1}^{T^{(m,n)}} \xi_t^{(m,n)}(k)}$$
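
The corresponding diagonal-covariance update of slides 28 and 31 can be sketched per dimension as follows (again not from the slides, with pooled frames/occupancies and per-dimension normal-gamma hyperparameters tau, alpha, beta; names are illustrative).

```python
import numpy as np

def map_diag_gaussian(xs, occ, mu0, tau, alpha, beta):
    """xs: (T, D) pooled frames; occ: (T,) occupancies xi_t(k);
    mu0, tau, alpha, beta: (D,) hyperparameters. Returns (mean, variance)."""
    n = occ.sum()
    mean = (tau * mu0 + occ @ xs) / (tau + n)                     # m_kd
    sq = occ @ (xs - mean) ** 2                                   # sum_t xi_t(k)(x_td - m_kd)^2
    var = (2.0 * beta + tau * (mean - mu0) ** 2 + sq) / (2.0 * alpha - 1.0 + n)
    return mean, var                                              # var = r_kd^{-1}
```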

  32. SCHMM Diagonal Covariance
• The initial estimate can be chosen as the mode of the prior PDF:
$\pi_i^{(m)}, a_{ij}^{(m)}, w_{ik}^{(m)}$ (same as for the DHMM), and
$$m_{kd} = \mu_{kd}, \qquad r_{kd} = \frac{\alpha_{kd} - 1/2}{\beta_{kd}}$$
• And it can also be chosen as the mean of the prior PDF:
$\pi_i^{(m)}, a_{ij}^{(m)}, w_{ik}^{(m)}$ (same as for the DHMM), and
$$m_{kd} = \mu_{kd}, \qquad r_{kd} = \frac{\alpha_{kd}}{\beta_{kd}}$$

  33. CDHMM
• Continuous Density HMM case: the shared codebook Gaussians become state-specific. Then
$$b_i(x_t) = \sum_{k=1}^{K} w_{ik} N(x_t \mid m_k, r_k) \quad\Rightarrow\quad b_i(x_t) = \sum_{k=1}^{K} w_{ik} N(x_t \mid m_{ik}, r_{ik})$$
and
$$N(x_t \mid m_k, r_k) = (2\pi)^{-D/2} |r_k|^{1/2} e^{-\frac{1}{2}(x_t - m_k)^T r_k (x_t - m_k)} \quad\Rightarrow\quad N(x_t \mid m_{ik}, r_{ik}) = (2\pi)^{-D/2} |r_{ik}|^{1/2} e^{-\frac{1}{2}(x_t - m_{ik})^T r_{ik} (x_t - m_{ik})}$$

  34. CDHMM
In the Q-function,
$$\sum_{k=1}^{K} \sum_{m=1}^{M} \sum_{n=1}^{W_m} \sum_{t=1}^{T^{(m,n)}} \xi_t^{(m,n)}(k) \log N(x_t^{(m,n)} \mid m_k, r_k) \quad\Rightarrow\quad \sum_{i=1}^{N} \sum_{k=1}^{K} \sum_{m=1}^{M} \sum_{n=1}^{W_m} \sum_{t=1}^{T^{(m,n)}} \xi_t^{(m,n)}(i,k) \log N(x_t^{(m,n)} \mid m_{ik}, r_{ik})$$

  35. CDHMM
In $\log g(\Lambda)$,
$$\sum_{k=1}^{K} \log g(m_k, r_k) \quad\Rightarrow\quad \sum_{i=1}^{N} \sum_{k=1}^{K} \log g(m_{ik}, r_{ik})$$
and, in the full covariance case,
$$g(m_k, r_k) \propto |r_k|^{\frac{\alpha_k - D}{2}} e^{-\frac{\tau_k}{2}(m_k - \mu_k)^T r_k (m_k - \mu_k)} e^{-\frac{1}{2}\mathrm{tr}(u_k r_k)} \quad\Rightarrow\quad g(m_{ik}, r_{ik}) \propto |r_{ik}|^{\frac{\alpha_{ik} - D}{2}} e^{-\frac{\tau_{ik}}{2}(m_{ik} - \mu_{ik})^T r_{ik} (m_{ik} - \mu_{ik})} e^{-\frac{1}{2}\mathrm{tr}(u_{ik} r_{ik})}$$
while, in the diagonal covariance case,
$$g(m_k, r_k) \propto \prod_{d=1}^{D} r_{kd}^{\alpha_{kd} - 1/2} e^{-\frac{\tau_{kd}}{2} r_{kd} (m_{kd} - \mu_{kd})^2} e^{-\beta_{kd} r_{kd}} \quad\Rightarrow\quad g(m_{ik}, r_{ik}) \propto \prod_{d=1}^{D} r_{ikd}^{\alpha_{ikd} - 1/2} e^{-\frac{\tau_{ikd}}{2} r_{ikd} (m_{ikd} - \mu_{ikd})^2} e^{-\beta_{ikd} r_{ikd}}$$

  36. Maximum Likelihood Linear Regression

  37. MLLR Background
• Linear transformation of the original speaker-independent (SI) model to maximize the likelihood of the adaptation data.
• MLLR is multiplicative; MAP is additive.
• MLLR is much less conservative than MAP – a few seconds of adaptation data may change the model dramatically.

  38. MLLR
Reference:
– Speaker Adaptation of HMMs Using Linear Regression – TR'94, Leggetter and Woodland
– Maximum Likelihood Linear Regression for Speaker Adaptation of Continuous Density Hidden Markov Models – CSL'95, Leggetter and Woodland
– MLLR: A Speaker Adaptation Technique for LVCSR – Hamaker

  39. MLLR Single Gaussian Case
• The regression transform is first derived for a single Gaussian distribution per state, and later extended to the general case of Gaussian mixtures.
• So the p.d.f. for state $j$ is
$$b_j(x) = \frac{1}{(2\pi)^{D/2} |C_j|^{1/2}} e^{-\frac{1}{2}(x - \mu_j)^T C_j^{-1} (x - \mu_j)}$$
where $\mu_j$ is the mean and $C_j$ is the covariance matrix.

  40. MLLR Single Gaussian Case
If $\mu$ is the mean, then we define the extended mean vector
$$\xi = \begin{bmatrix} \omega & \mu_1 & \cdots & \mu_D \end{bmatrix}^T$$
where $\omega$ is the offset term for the regression.
The estimate of the adapted mean is
$$\bar\mu = A\mu + b = W\xi$$
where $W = (A, b)$ is the $D \times (D+1)$ linear transform combining $A$ and $b$ (the column of $W$ that multiplies $\omega$ plays the role of the bias $b$).
If $\omega = 1$ ⇒ include an offset in the regression; if $\omega = 0$ ⇒ ignore the offset.
So
$$b_j(x) = \frac{1}{(2\pi)^{D/2} |C_j|^{1/2}} e^{-\frac{1}{2}(x - W_j\xi_j)^T C_j^{-1} (x - W_j\xi_j)}$$
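
A minimal sketch (not from the slides) of this extended-mean parameterisation: with $\xi = [\omega, \mu_1, \dots, \mu_D]^T$ and the bias placed in the first column of $W$, the product $W\xi$ reduces to $A\mu + \omega b$. Function and variable names are illustrative.

```python
import numpy as np

def adapt_mean(A, b, mu, omega=1.0):
    """A: (D, D); b, mu: (D,). Returns W @ xi = A @ mu + omega * b."""
    W = np.hstack([b[:, None], A])           # D x (D+1), bias in the first column
    xi = np.concatenate(([omega], mu))       # extended mean [omega, mu_1, ..., mu_D]
    return W @ xi
```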

  41. MLLR Single Gaussian Case
• A more general approach is adopted in which the same transformation matrix is used for several distributions ⇒ a regression class.
• If some of the distributions are not observed in the adaptation data, a transformation may still be applied ⇒ models are updated whether or not corresponding adaptation data were observed.

  42. MLLR Single Gaussian Case
• MLLR estimates the regression matrices so as to maximize the likelihood of the adapted models generating the adaptation data ⇒ maximize the likelihood to obtain the regression matrices.
• The full and diagonal covariance cases will be discussed.

  43. MLLR Single Gaussian Case
Assume the adaptation data $X$ is a series of $T$ observations:
$$X = x_1, x_2, \dots, x_T$$
Denote the current set of model parameters by $\lambda$ and a re-estimated set of model parameters by $\bar\lambda$.
$\xi$ ⇒ current extended mean; $\bar\mu$ ⇒ re-estimated mean.

  44. MLLR Single Gaussian Case
The total likelihood is
$$f(X \mid \lambda) = \sum_{S} f(X, S \mid \lambda)$$
where $f(X, S \mid \lambda)$ is the likelihood of generating $X$ using the state sequence $S$ given model $\lambda$.
The quantity $f(X \mid \bar\lambda)$ is the objective function to be maximized during adaptation.

  45. MLLR Single Gaussian Case
We define the auxiliary function
$$Q(\lambda \mid \bar\lambda) = \sum_{S} f(S \mid X, \lambda) \log f(X, S \mid \bar\lambda)$$
Since only the transformations $W_j$ are re-estimated, only the output distributions $b_j(x_t)$ are affected, so the auxiliary function can be written as
$$Q(\lambda \mid \bar\lambda) = \text{constant} + \sum_{S} \sum_{t=1}^{T} f(s_t = j \mid X, \lambda) \log \bar b_j(x_t)$$

  46. MLLR Single Gaussian Case
We define
$$\gamma_j(t) = f(s_t = j \mid X, \lambda)$$
So the Q-function can be rewritten as
$$Q(\lambda \mid \bar\lambda) = \text{constant} + \sum_{t=1}^{T} \gamma_j(t) \log \bar b_j(x_t)$$

  47. MLLR Single Gaussian Case
Expanding $\log \bar b_j(x_t)$, the auxiliary function is
$$Q(\lambda \mid \bar\lambda) = \text{constant} - \frac{1}{2} \sum_{j=1}^{N} \sum_{t=1}^{T} \gamma_j(t) \left[ D \log(2\pi) + \log |C_j| + h(x_t, j) \right]$$
where $h(x_t, j) = (x_t - W_j\xi_j)^T C_j^{-1} (x_t - W_j\xi_j)$.
The differential of $Q(\lambda \mid \bar\lambda)$ w.r.t. $W_s$ is
$$\frac{\partial Q(\lambda \mid \bar\lambda)}{\partial W_s} = -\frac{1}{2} \sum_{j=1}^{N} \sum_{t=1}^{T} \gamma_j(t) \frac{\partial}{\partial W_s} \left[ D \log(2\pi) + \log |C_j| + h(x_t, j) \right]$$

  48. MLLR Single Gaussian Case
The differential of $h(x_t, j)$ w.r.t. $W_j$ is
$$\frac{\partial h(x_t, j)}{\partial W_j} = \frac{\partial}{\partial W_j} (x_t - W_j\xi_j)^T C_j^{-1} (x_t - W_j\xi_j)$$
$$= \frac{\partial}{\partial W_j} \left[ x_t^T C_j^{-1} x_t - \xi_j^T W_j^T C_j^{-1} x_t - x_t^T C_j^{-1} W_j \xi_j + \xi_j^T W_j^T C_j^{-1} W_j \xi_j \right]$$
$$= -C_j^{-1} x_t \xi_j^T - C_j^{-1} x_t \xi_j^T + C_j^{-1} W_j \xi_j \xi_j^T + C_j^{-1} W_j \xi_j \xi_j^T \qquad (C_j^{-1}\ \text{symmetric})$$
$$= -2\, C_j^{-1} \left[ x_t - W_j \xi_j \right] \xi_j^T$$

  49. MLLR Single Gaussian Case
Then complete the differentiation and equate it to zero:
$$\frac{\partial Q(\lambda \mid \bar\lambda)}{\partial W_j} = \sum_{t=1}^{T} \gamma_j(t)\, C_j^{-1} \left[ x_t - W_j \xi_j \right] \xi_j^T = 0$$
$$\therefore\ \sum_{t=1}^{T} \gamma_j(t)\, C_j^{-1} x_t \xi_j^T = \sum_{t=1}^{T} \gamma_j(t)\, C_j^{-1} W_j \xi_j \xi_j^T$$
$$\Rightarrow\ C_j^{-1} \left( \sum_{t=1}^{T} \gamma_j(t)\, x_t \right) \xi_j^T = C_j^{-1} W_j \left( \sum_{t=1}^{T} \gamma_j(t) \right) \xi_j \xi_j^T$$
$$\Rightarrow\ \sum_{t=1}^{T} \gamma_j(t)\, x_t = \left( \sum_{t=1}^{T} \gamma_j(t) \right) W_j \xi_j \qquad \therefore\ \bar\mu_j = W_j \xi_j = \frac{\sum_{t=1}^{T} \gamma_j(t)\, x_t}{\sum_{t=1}^{T} \gamma_j(t)}$$

  50. MLLR Tied Regression Matrices • Regression Class Tree for MLLR

  51. MLLR Tied Regression Matrices
Consider the $s$-th regression class $RC^{(s)} = \{s_1, \dots, s_R\}$.
If $W_s$ is shared by the states in the regression class $RC^{(s)}$, then
$$\sum_{t=1}^{T} \sum_{r=1}^{R} \gamma_{s_r}(t)\, C_{s_r}^{-1} x_t \xi_{s_r}^T = \sum_{t=1}^{T} \sum_{r=1}^{R} \gamma_{s_r}(t)\, C_{s_r}^{-1} W_s \xi_{s_r} \xi_{s_r}^T$$
The left-hand side is the $D \times (D+1)$ matrix
$$Z = \sum_{r=1}^{R} C_{s_r}^{-1} \left( \sum_{t=1}^{T} \gamma_{s_r}(t)\, x_t \xi_{s_r}^T \right)$$
and the right-hand side can be written as
$$\sum_{r=1}^{R} V^{(r)} W_s D^{(r)}, \qquad \text{where}\ \ V^{(r)} = \sum_{t=1}^{T} \gamma_{s_r}(t)\, C_{s_r}^{-1}\ \ (D \times D) \quad\text{and}\quad D^{(r)} = \xi_{s_r} \xi_{s_r}^T\ \ ((D+1) \times (D+1))$$

  52. MLLR Tied Regression Matrices
If the right-hand side is denoted by the $D \times (D+1)$ matrix $Y$, then
$$Z = Y \ \Rightarrow\ [Z]_{ij} = [Y]_{ij}, \qquad \text{where}\ Y = \sum_{r=1}^{R} V^{(r)} W_s D^{(r)}$$
$$[V^{(r)} W_s]_{ij} = \sum_{k=1}^{D} [V^{(r)}]_{ik} [W_s]_{kj}$$
$$[Y]_{ij} = \sum_{r=1}^{R} \sum_{q=1}^{D+1} [V^{(r)} W_s]_{iq} [D^{(r)}]_{qj} = \sum_{r=1}^{R} \sum_{q=1}^{D+1} \sum_{p=1}^{D} [V^{(r)}]_{ip} [W_s]_{pq} [D^{(r)}]_{qj} = \sum_{p=1}^{D} \sum_{q=1}^{D+1} [W_s]_{pq} \left[ \sum_{r=1}^{R} [V^{(r)}]_{ip} [D^{(r)}]_{qj} \right]$$

  53. MLLR Tied Regression Matrices
If the covariance matrices are diagonal ⇒ $V^{(r)}$ is diagonal, and $D^{(r)}$ is symmetric, so
$$\sum_{r=1}^{R} [V^{(r)}]_{ip} [D^{(r)}]_{qj} = \begin{cases} \sum_{r=1}^{R} [V^{(r)}]_{ii} [D^{(r)}]_{jq} & i = p \\ 0 & i \neq p \end{cases}$$
$$\therefore\ [Z]_{ij} = [Y]_{ij} = \sum_{p=1}^{D} \sum_{q=1}^{D+1} [W_s]_{pq} \left[ G^{(i,j)} \right]_{pq} = \sum_{q=1}^{D+1} [W_s]_{iq} \left[ \sum_{r=1}^{R} [V^{(r)}]_{ii} [D^{(r)}]_{jq} \right] = \sum_{q=1}^{D+1} [W_s]_{iq} [G^{(i)}]_{jq}$$
where $G^{(i)} = \sum_{r=1}^{R} [V^{(r)}]_{ii}\, D^{(r)}$.

  54. MLLR Tied Regression Matrices
Then we can obtain a row $i$ of $[W_s]$ by solving the linear equations below:
$$\begin{cases} [G^{(i)}]_{1,1}[W_s]_{i,1} + [G^{(i)}]_{1,2}[W_s]_{i,2} + \cdots + [G^{(i)}]_{1,D+1}[W_s]_{i,D+1} = [Z]_{i,1} & (j = 1) \\ [G^{(i)}]_{2,1}[W_s]_{i,1} + [G^{(i)}]_{2,2}[W_s]_{i,2} + \cdots + [G^{(i)}]_{2,D+1}[W_s]_{i,D+1} = [Z]_{i,2} & (j = 2) \\ \quad \vdots \\ [G^{(i)}]_{D+1,1}[W_s]_{i,1} + [G^{(i)}]_{D+1,2}[W_s]_{i,2} + \cdots + [G^{(i)}]_{D+1,D+1}[W_s]_{i,D+1} = [Z]_{i,D+1} & (j = D+1) \end{cases}$$
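
A sketch (not the authors' implementation, roughly following Leggetter and Woodland's statistics above) of estimating the shared mean transform for one regression class with diagonal covariances: accumulate $Z$ and the per-row matrices $G^{(i)}$, then solve one linear system per row of $W_s$. The per-Gaussian inputs occ, acc, xi and var are assumed to have been collected beforehand; all names are illustrative.

```python
import numpy as np

def estimate_mllr_transform(occ, acc, xi, var):
    """occ[r] = sum_t gamma_r(t); acc[r] = sum_t gamma_r(t) x_t (R, D);
    xi[r] = extended mean [omega, mu_r] (R, D+1); var[r] = diagonal covariance (R, D).
    Returns the D x (D+1) transform W_s."""
    R, D = acc.shape
    Z = np.zeros((D, D + 1))
    G = np.zeros((D, D + 1, D + 1))                 # one G^(i) per output row
    for r in range(R):
        inv_var = 1.0 / var[r]                      # diagonal C_r^{-1}
        Z += (inv_var * acc[r])[:, None] * xi[r][None, :]
        D_r = np.outer(xi[r], xi[r])                # xi xi^T
        for i in range(D):
            G[i] += occ[r] * inv_var[i] * D_r       # V^(r)_ii * D^(r)
    return np.vstack([np.linalg.solve(G[i], Z[i]) for i in range(D)])

# Adapted mean of any Gaussian r in the class: W @ xi[r]
```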

  55. MLLR Tied Regression Matrices
If the covariance matrix is still full, we can obtain $[W_s]$ by solving the linear equations below:
$$\begin{cases} [G^{(1,1)}]_{1,1}[W_s]_{1,1} + \cdots + [G^{(1,1)}]_{D,D+1}[W_s]_{D,D+1} = [Z]_{1,1} & (i = 1,\ j = 1) \\ [G^{(1,2)}]_{1,1}[W_s]_{1,1} + \cdots + [G^{(1,2)}]_{D,D+1}[W_s]_{D,D+1} = [Z]_{1,2} & (i = 1,\ j = 2) \\ \quad \vdots \\ [G^{(D,D+1)}]_{1,1}[W_s]_{1,1} + \cdots + [G^{(D,D+1)}]_{D,D+1}[W_s]_{D,D+1} = [Z]_{D,D+1} & (i = D,\ j = D+1) \end{cases}$$

  56. MLLR Mixture Gaussian Case
• Then the p.d.f. for state $j$ would be
$$b_j(x) = \sum_{k=1}^{K} w_{jk} \frac{1}{(2\pi)^{D/2} |C_{jk}|^{1/2}} e^{-\frac{1}{2}(x - \mu_{jk})^T C_{jk}^{-1} (x - \mu_{jk})}$$
where $\mu_{jk}$ is the mean, $C_{jk}$ is the covariance matrix, and $w_{jk}$ is the mixture weight.
• Then the likelihood is
$$f(X \mid \lambda) = \sum_{S} \sum_{L} f(X, S, L \mid \lambda)$$
where $S$ is one possible state sequence and $L$ is one possible mixture-component sequence.

  57. MLLR Mixture Gaussian Case
• Then the Q-function will be
$$Q(\lambda \mid \bar\lambda) = \sum_{S} \sum_{L} f(S, L \mid X, \lambda) \log f(X, S, L \mid \bar\lambda)$$
Considering only the term that depends on the regression transform,
$$Q_b(\lambda \mid \bar\lambda) = \sum_{S} \sum_{L} \sum_{t=1}^{T} f(s_t = j, l_t = k \mid X, \lambda) \log \bar b_{jk}(x_t) = \sum_{t=1}^{T} \gamma_{jk}(t) \log \bar b_{jk}(x_t)$$
where $\gamma_{jk}(t) = f(s_t = j, l_t = k \mid X, \lambda)$.
• The derivation is the same as in the single Gaussian case; just substitute $\gamma_{jk}(t)$ for $\gamma_j(t)$.

  58. MLLR Least Squares Regression
• If all the covariances of the distributions tied to the same transformation are the same ⇒ a special case of MLLR.
• Then
$$\sum_{t=1}^{T} \sum_{r=1}^{R} \gamma_{s_r}(t)\, C_{s_r}^{-1} x_t \xi_{s_r}^T = \sum_{t=1}^{T} \sum_{r=1}^{R} \gamma_{s_r}(t)\, C_{s_r}^{-1} W_s \xi_{s_r} \xi_{s_r}^T$$
can be rewritten as
$$\sum_{t=1}^{T} \sum_{r=1}^{R} \gamma_{s_r}(t)\, x_t \xi_{s_r}^T = \sum_{t=1}^{T} \sum_{r=1}^{R} \gamma_{s_r}(t)\, W_s \xi_{s_r} \xi_{s_r}^T$$

  59. MLLR Least Squares Regression
• If each frame is assigned to exactly one distribution (Viterbi alignment),
$$\gamma_{s_r}(t) = \begin{cases} 1 & \text{if } x_t \text{ is assigned to state distribution } s_r \\ 0 & \text{otherwise} \end{cases}$$
• Then
$$\sum_{t=1}^{T} \delta_{RC^{(n)}, s_t}\, x_t \xi_{s_t}^T = \sum_{t=1}^{T} \delta_{RC^{(n)}, s_t}\, W_s \xi_{s_t} \xi_{s_t}^T, \qquad \text{where}\ \delta_{RC^{(n)}, s_t} = \begin{cases} 1 & s_t \in RC^{(n)} \\ 0 & \text{otherwise} \end{cases}$$

  60. MLLR Least Squares Regression
Define matrices $X$ and $Y$ as
$$X = \left[ \xi_{s_1}, \xi_{s_2}, \dots, \xi_{s_T} \right], \qquad Y = \left[ \delta_{RC^{(n)}, s_1} x_1,\ \delta_{RC^{(n)}, s_2} x_2,\ \dots,\ \delta_{RC^{(n)}, s_T} x_T \right]$$
then $W_s X X^T = Y X^T$, so
$$W_s = Y X^T \left( X X^T \right)^{-1}$$
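
A minimal sketch (not from the slides) of this least-squares special case, assuming each frame has already been Viterbi-assigned to one distribution of the regression class, so the delta terms are absorbed into the inputs; names are illustrative.

```python
import numpy as np

def lsr_transform(frames, ext_means):
    """frames: (T, D) assigned observations; ext_means: (T, D+1) extended means
    [omega, mu] of the distributions they were assigned to. Returns W_s (D, D+1)."""
    X = ext_means.T                                  # (D+1, T)
    Y = frames.T                                     # (D,   T)
    return Y @ X.T @ np.linalg.inv(X @ X.T)          # W_s = Y X^T (X X^T)^{-1}
```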

  61. MLLR Single Variable Linear Regression
• If the scaling portion of the regression matrix is assumed to be diagonal, the computation can be vastly reduced. It means that each adapted mean component is $\bar\mu_i = x_i + y_i\, \mu_i$ (an offset plus a scaling of the original component).
$$\therefore\ W_s = \begin{bmatrix} w_{1,1} & w_{1,2} & 0 & \cdots & 0 \\ w_{2,1} & 0 & w_{2,3} & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ w_{D,1} & 0 & 0 & \cdots & w_{D,D+1} \end{bmatrix}_{D \times (D+1)} \ \Rightarrow\ w_s = \begin{bmatrix} w_{1,1} \\ \vdots \\ w_{D,1} \\ w_{1,2} \\ w_{2,3} \\ \vdots \\ w_{D,D+1} \end{bmatrix}_{(2D) \times 1}$$

  62. MLLR Single Variable Linear Regression
And define a $D \times 2D$ matrix $D_s$:
$$D_s = \begin{bmatrix} \omega & 0 & \cdots & 0 & \mu_1 & 0 & \cdots & 0 \\ 0 & \omega & \cdots & 0 & 0 & \mu_2 & \cdots & 0 \\ \vdots & & \ddots & \vdots & \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & \omega & 0 & 0 & \cdots & \mu_D \end{bmatrix}$$
$$h(x_t, s) = (x_t - W_s\xi_s)^T C_s^{-1} (x_t - W_s\xi_s) = (x_t - D_s w_s)^T C_s^{-1} (x_t - D_s w_s)$$

  63. MLLR Single Variable Linear Regression
$$\frac{\partial h(x_t, s)}{\partial w_s} = \frac{\partial}{\partial w_s} (x_t - D_s w_s)^T C_s^{-1} (x_t - D_s w_s)$$
$$= \frac{\partial}{\partial w_s} \left[ x_t^T C_s^{-1} x_t - x_t^T C_s^{-1} D_s w_s - w_s^T D_s^T C_s^{-1} x_t + w_s^T D_s^T C_s^{-1} D_s w_s \right] = -2\, D_s^T C_s^{-1} (x_t - D_s w_s)$$
$$\therefore\ \frac{\partial Q(\lambda \mid \bar\lambda)}{\partial w_s} = \sum_{t=1}^{T} \gamma_s(t)\, D_s^T C_s^{-1} (x_t - D_s w_s) = 0$$
$$\Rightarrow\ D_s^T C_s^{-1} \sum_{t=1}^{T} \gamma_s(t)\, x_t = \left[ \sum_{t=1}^{T} \gamma_s(t) \right] D_s^T C_s^{-1} D_s\, w_s$$
$$\Rightarrow\ w_s = \left[ \sum_{t=1}^{T} \gamma_s(t)\, D_s^T C_s^{-1} D_s \right]^{-1} \left[ D_s^T C_s^{-1} \sum_{t=1}^{T} \gamma_s(t)\, x_t \right]$$

  64. MLLR Single Variable Linear Regression
The extension to the tied regression matrix case:
$$\sum_{r=1}^{R} \sum_{t=1}^{T} \gamma_{s_r}(t)\, D_{s_r}^T C_{s_r}^{-1} x_t = \sum_{r=1}^{R} \sum_{t=1}^{T} \gamma_{s_r}(t)\, D_{s_r}^T C_{s_r}^{-1} D_{s_r}\, w_s$$
$$\Rightarrow\ w_s = \left[ \sum_{r=1}^{R} \sum_{t=1}^{T} \gamma_{s_r}(t)\, D_{s_r}^T C_{s_r}^{-1} D_{s_r} \right]^{-1} \left[ \sum_{r=1}^{R} \sum_{t=1}^{T} \gamma_{s_r}(t)\, D_{s_r}^T C_{s_r}^{-1} x_t \right]$$
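
A sketch (not from the slides) of this single-variable variant for one regression class, assuming diagonal covariances, offset term $\omega = 1$, and the same per-Gaussian statistics as before; the result is the $2D$-vector $w_s$ holding the offsets followed by the scales. Names are illustrative.

```python
import numpy as np

def single_variable_mllr(occ, acc, mu, var, omega=1.0):
    """occ: (R,), acc: (R, D) = sum_t gamma_r(t) x_t, mu: (R, D), var: (R, D).
    Returns (offset, scale), each of shape (D,)."""
    R, D = mu.shape
    lhs = np.zeros((2 * D, 2 * D))
    rhs = np.zeros(2 * D)
    for r in range(R):
        D_r = np.hstack([omega * np.eye(D), np.diag(mu[r])])   # D x 2D
        C_inv = np.diag(1.0 / var[r])
        lhs += occ[r] * D_r.T @ C_inv @ D_r
        rhs += D_r.T @ C_inv @ acc[r]
    w = np.linalg.solve(lhs, rhs)
    return w[:D], w[D:]        # adapted mean for a class member: omega*offset + scale*mu
```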

  65. MLLR Defining Regression Classes
• Two approaches were considered:
– 1. Based on broad phonetic classes: models which represent the same broad phonetic class were placed in the same regression class.
– 2. Based on clustering of mixture components: the mixture components were compared using a likelihood measure and similar components were placed in the same regression class.
• The data-driven approach was found to be more appropriate for defining large numbers of classes.

  66. MLLR Variance Adapted
Reference:
– Variance Compensation Within the MLLR Framework for Robust Speech Recognition and Speaker Adaptation – ICSLP'96, Gales
– Mean and Variance Adaptation Within the MLLR Framework – CSL'96, Gales and Woodland
– MLLR: A Speaker Adaptation Technique for LVCSR – Hamaker

  67. MLLR Variance Adapted Single Gaussian Case
• We apply a Cholesky decomposition to the inverse of the covariance matrix:
$$C_s^{-1} = L_s L_s^T, \qquad \text{where}\ L_s\ \text{is a lower triangular matrix} \qquad \therefore\ C_s = L_s^{-T} L_s^{-1}$$
• We can observe that
$$[C_s^{-1}]_{ij} = \sum_{d=1}^{D} [L_s]_{id} [L_s]_{jd}$$
• Now the inverse of the covariance matrix is updated by
$$\bar C_s^{-1} = L_s H_s^{-1} L_s^T, \qquad \text{where}\ H_s\ \text{is the linear transformation}$$
• So
$$\bar C_s = L_s^{-T} H_s L_s^{-1}$$

  68. MLLR Variance Adapted Single Gaussian Case
• What does the transformation mean?
– Origin:
$$[C_s^{-1}]_{ij} = \sum_{d=1}^{D} [L_s]_{id} [L_s]_{jd}$$
– New:
$$[\bar C_s^{-1}]_{ij} = \sum_{d=1}^{D} \sum_{k=1}^{D} [L_s]_{ik} [H_s^{-1}]_{kd} [L_s]_{jd}$$
(The original slide illustrates this entry with the corresponding rows and columns of $L_s$ and $H_s$.)

  69. MLLR Variance Adapted Single Gaussian Case
• The auxiliary function $Q(\lambda \mid \bar\lambda)$ can then be written as
$$Q(\lambda \mid \bar\lambda) = \text{constant} - \frac{1}{2} \sum_{j=1}^{N} \sum_{t=1}^{T} \gamma_j(t) \left[ D \log(2\pi) + \log |\bar C_j| + (x_t - \mu_j)^T \bar C_j^{-1} (x_t - \mu_j) \right]$$
$$= \text{constant} - \frac{1}{2} \sum_{j=1}^{N} \sum_{t=1}^{T} \gamma_j(t) \left[ D \log(2\pi) + \log |L_j^{-T} H_j L_j^{-1}| + (x_t - \mu_j)^T L_j H_j^{-1} L_j^T (x_t - \mu_j) \right]$$
Since $|L_j^{-T}| \cdot |H_j| \cdot |L_j^{-1}| = |L_j^{-T} L_j^{-1}| \cdot |H_j| = |C_j| \cdot |H_j|$,
$$Q(\lambda \mid \bar\lambda) = \text{constant} - \frac{1}{2} \sum_{j=1}^{N} \sum_{t=1}^{T} \gamma_j(t) \left[ D \log(2\pi) + \log |C_j| + \log |H_j| + (L_j^T x_t - L_j^T \mu_j)^T H_j^{-1} (L_j^T x_t - L_j^T \mu_j) \right]$$

  70. MLLR Variance Adapted Single Gaussian Case
• Differentiate the Q-function w.r.t. $H_j$ and set it to zero; then
$$\frac{\partial}{\partial H_j} \sum_{t=1}^{T} \gamma_j(t) \left[ \log |H_j| + (L_j^T x_t - L_j^T \mu_j)^T H_j^{-1} (L_j^T x_t - L_j^T \mu_j) \right] = 0$$
$$\Rightarrow\ \sum_{t=1}^{T} \gamma_j(t) \left[ \frac{1}{|H_j|} |H_j|\, H_j^{-T} - H_j^{-T} (L_j^T x_t - L_j^T \mu_j)(L_j^T x_t - L_j^T \mu_j)^T H_j^{-T} \right] = 0$$
$$\Rightarrow\ \sum_{t=1}^{T} \gamma_j(t)\, H_j^{-T} = \sum_{t=1}^{T} \gamma_j(t)\, H_j^{-T} (L_j^T x_t - L_j^T \mu_j)(L_j^T x_t - L_j^T \mu_j)^T H_j^{-T}$$
$$\Rightarrow\ \sum_{t=1}^{T} \gamma_j(t)\, H_j^{T} = \sum_{t=1}^{T} \gamma_j(t)\, (L_j^T x_t - L_j^T \mu_j)(L_j^T x_t - L_j^T \mu_j)^T$$
$$\Rightarrow\ H_j^{T} = \frac{\sum_{t=1}^{T} \gamma_j(t)\, (L_j^T x_t - L_j^T \mu_j)(L_j^T x_t - L_j^T \mu_j)^T}{\sum_{t=1}^{T} \gamma_j(t)}$$

  71. MLLR Variance Adapted Single Gaussian Case
$$\therefore\ H_j^{T} = \frac{\sum_{t=1}^{T} \gamma_j(t)\, (L_j^T x_t - L_j^T \mu_j)(L_j^T x_t - L_j^T \mu_j)^T}{\sum_{t=1}^{T} \gamma_j(t)} = \frac{L_j^T \left[ \sum_{t=1}^{T} \gamma_j(t)\, (x_t - \mu_j)(x_t - \mu_j)^T \right] L_j}{\sum_{t=1}^{T} \gamma_j(t)}$$
We can observe that $H_j^T$ is symmetric, $\therefore\ H_j = H_j^T$.

  72. MLLR Variance Adapted Tied Regression Matrices Case
If $H_s$ is shared by $R$ states $\{s_1, \dots, s_R\}$, then
$$H_s = \frac{\sum_{r=1}^{R} L_{s_r}^T \left[ \sum_{t=1}^{T} \gamma_{s_r}(t)\, (x_t - \mu_{s_r})(x_t - \mu_{s_r})^T \right] L_{s_r}}{\sum_{r=1}^{R} \sum_{t=1}^{T} \gamma_{s_r}(t)}$$
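
A minimal sketch (not from the slides) of this shared variance-transform estimate, assuming the per-Gaussian occupancies, weighted scatters around the (already adapted) means, and Cholesky factors of the inverse covariances are available; names are illustrative.

```python
import numpy as np

def estimate_variance_transform(occ, scatter, L):
    """occ: (R,) occupancies; scatter[r]: (D, D) = sum_t gamma_r(t)(x_t - mu_r)(x_t - mu_r)^T;
    L[r]: (D, D) Cholesky factor with C_r^{-1} = L_r L_r^T. Returns H_s (D, D)."""
    num = sum(L[r].T @ scatter[r] @ L[r] for r in range(len(occ)))
    return num / occ.sum()

# The adapted covariance of each Gaussian is then C_new = L^{-T} H_s L^{-1},
# e.g. np.linalg.inv(L[r]).T @ H @ np.linalg.inv(L[r])
```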

  73. MLLR: Another Approach

  74. MLLR: Another Approach
• Reference:
– Speaker Adaptation Using Constrained Estimation of Gaussian Mixtures – SAP'95, Vassilios V. Digalakis

  75. Introduction
• This approach is an extension of model-space MLLR in which the covariances of the Gaussian components are constrained to share the same transform as the means.
• The transformed means and variances are given as a function of the transform parameters:
$$\bar\mu = A\mu + b, \qquad \bar\Sigma = A \Sigma A^T$$

  76. Single Gaussian Case
• Assume the adaptation data $X$ is a series of $T$ observations:
$$X = x_1, x_2, \dots, x_T$$
• For each state $s$:
– Denote the initial model by $\lambda_s = (\mu_s^{(0)}, \Sigma_s^{(0)})$.
– The current set of model parameters, obtained by applying the transformation $(A_s, b_s)$, is
$$\lambda_s = \left( A_s \mu_s^{(0)} + b_s,\ A_s \Sigma_s^{(0)} A_s^T \right)$$

  77. Single Gaussian Case
• The re-estimated set of model parameters, obtained by applying the transformation $(\bar A_s, \bar b_s)$, is
$$\bar\lambda_s = \left( \bar A_s \mu_s^{(0)} + \bar b_s,\ \bar A_s \Sigma_s^{(0)} \bar A_s^T \right)$$
• We denote the parameter sets
$$\Lambda = \{ \mu_1^{(0)}, \mu_2^{(0)}, \dots, \mu_{N_s}^{(0)}, \Sigma_1^{(0)}, \Sigma_2^{(0)}, \dots, \Sigma_{N_s}^{(0)} \}$$
$$\eta = \{ A_1, A_2, \dots, A_{N_s}, b_1, b_2, \dots, b_{N_s} \}$$
where $N_s$ is the total number of states.
