EN. 601.467/667 Introduction to Human Language Technology
Deep Learning II
Shinji Watanabe
Today's agenda
- Basics of (deep) neural networks
- How to integrate a DNN with an HMM
- Recurrent neural networks
- Language modeling
[Figure: DNN acoustic model — input: speech features (~50 to ~1000 dim); ~7 hidden layers, 2048 units each; output: HMM state or phoneme (30 to ~10,000 units, e.g., a, i, u, w, N)]
π! π‘! = π
4
*/'(
= π(π, πβ²)
*,'(&(
= π π, π0 β10
5
Sigmoid derivative:
∂σ(y)/∂y = σ(y)(1 − σ(y))
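As a quick sanity check of the sigmoid derivative, a minimal NumPy sketch (the value of y is illustrative) compares the analytic form σ(y)(1 − σ(y)) against a central finite difference:

```python
import numpy as np

def sigmoid(y):
    return 1.0 / (1.0 + np.exp(-y))

def sigmoid_grad(y):
    # analytic derivative: sigma(y) * (1 - sigma(y))
    s = sigmoid(y)
    return s * (1.0 - s)

# central finite difference should match the analytic derivative
y = 0.7
eps = 1e-6
numeric = (sigmoid(y + eps) - sigmoid(y - eps)) / (2 * eps)
print(abs(numeric - sigmoid_grad(y)) < 1e-8)
```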
Softmax derivative (logits u, posteriors P(k|u)):
∂P(k|u)/∂u_k = P(k|u)(1 − P(k|u))
∂P(j|u)/∂u_k = −P(j|u) P(k|u)   (j ≠ k)
Combined: ∂P(j|u)/∂u_k = P(j|u)(δ(j, k) − P(k|u)), where δ(j, k) is Kronecker's delta
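The combined formula says the softmax Jacobian is diag(p) − p pᵀ. A small NumPy sketch (logit values are illustrative) verifies this against finite differences:

```python
import numpy as np

def softmax(u):
    e = np.exp(u - u.max())  # subtract max for numerical stability
    return e / e.sum()

def softmax_jacobian(u):
    # dP(j|u)/du_k = P(j|u) * (delta(j,k) - P(k|u))  ->  diag(p) - outer(p, p)
    p = softmax(u)
    return np.diag(p) - np.outer(p, p)

u = np.array([0.5, -1.0, 2.0])
J = softmax_jacobian(u)
eps = 1e-6
for k in range(3):
    du = np.zeros(3); du[k] = eps
    numeric = (softmax(u + du) - softmax(u - du)) / (2 * eps)
    assert np.allclose(numeric, J[:, k], atol=1e-8)
print("Jacobian matches finite differences")
```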
All of these are built from differentiable operations: +, −, exp(), sin(), cos(), tan(), etc.
Input: π©! β β# Output: π‘! β {1, β¦ , πΎ} Linear transformation Sigmoid activation Softmax activation οΌ, β, exp , log , etc.
15
Input: π©! β β# Output: π‘! β {1, β¦ , πΎ} Linear transformation Softmax activation
16
Input: π©! β β# Output: π‘! β {1, β¦ , πΎ} Linear transformation Sigmoid activation Softmax activation Linear transformation
17
Input: π©! β β# Output: π‘! β {1, β¦ , πΎ} Linear transformation Sigmoid activation Softmax activation οΌ, β, exp , log , etc.
19
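The stack above can be sketched as a forward pass in NumPy (sizes and random weights are purely illustrative; a real model would be trained):

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def softmax(u):
    e = np.exp(u - u.max())
    return e / e.sum()

# toy sizes: D-dim input, H hidden units, K classes
D, H, K = 4, 8, 3
W1, b1 = rng.normal(size=(H, D)), np.zeros(H)   # linear transformation 1
W2, b2 = rng.normal(size=(K, H)), np.zeros(K)   # linear transformation 2

def forward(x):
    h = sigmoid(W1 @ x + b1)      # sigmoid activation
    return softmax(W2 @ h + b2)   # class posteriors P(k | x)

x = rng.normal(size=D)
p = forward(x)
print(p.sum())  # posteriors sum to 1
```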
= π(ππ¦ + π) where π = π, π
*/
= *L(M)
*M *(NO./) */
= π π§ 1 β π π§ = π ππ¦ + π 1 β π ππ¦ + π
*N
=
20
π§ = ππ¦ + π
and we just combine these derivatives
21
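The chain-rule result ∂z/∂w = σ(wy + b)(1 − σ(wy + b)) · y can be checked numerically (the scalar values are illustrative):

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def dz_dw(w, b, y):
    # chain rule: dz/dw = sigma(a) * (1 - sigma(a)) * y, with a = w*y + b
    a = w * y + b
    return sigmoid(a) * (1.0 - sigmoid(a)) * y

w, b, y = 0.3, -0.1, 1.5
eps = 1e-6
numeric = (sigmoid((w + eps) * y + b) - sigmoid((w - eps) * y + b)) / (2 * eps)
print(abs(numeric - dz_dw(w, b, y)) < 1e-8)
```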
Backpropagation applies this chain-rule processing layer by layer through the whole stack: softmax activation ← linear transformation ← sigmoid activation ← linear transformation.
Stochastic gradient descent (SGD)
Split the whole data (batch) into mini-batches.
Update the parameters with the gradient of the mini-batch loss:
Θ^(τ+1) = Θ^(τ) − η ∇_Θ L_mini(Θ^(τ))
(If the learning rate η is fixed too large, convergence can be degraded; η is therefore often decayed over time, and the decay factor becomes another hyperparameter.)
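A minimal mini-batch SGD loop on a toy least-squares problem illustrates the update rule and learning-rate decay (problem sizes, η, and the decay factor are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(0)

# toy least-squares problem: y = X @ theta_true + noise
X = rng.normal(size=(256, 5))
theta_true = rng.normal(size=5)
y = X @ theta_true + 0.01 * rng.normal(size=256)

theta = np.zeros(5)
eta = 0.1           # learning rate (hyperparameter)
batch_size = 32

for epoch in range(50):
    order = rng.permutation(len(X))  # shuffle, then split into mini-batches
    for start in range(0, len(X), batch_size):
        idx = order[start:start + batch_size]
        Xb, yb = X[idx], y[idx]
        grad = 2 * Xb.T @ (Xb @ theta - yb) / len(idx)  # gradient of mini-batch loss
        theta = theta - eta * grad                      # SGD update
    eta *= 0.99  # learning-rate decay (the decay factor is another hyperparameter)

print(np.allclose(theta, theta_true, atol=0.05))
```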
ASR pipeline: feature extraction → acoustic modeling (HMM) → lexicon → language modeling
Example output: "I want to go to Johns Hopkins campus"
Lexicon: "go to" → G OW T UW, "goes to" → G OW Z T UW; the language model disambiguates homophone hypotheses ("go to", "go two", "go too", "goes to", "goes two", "goes too").
[Figure: hybrid DNN-HMM acoustic model — input speech features: log mel filterbank + 11 context frames; ~7 hidden layers, 2048 units each; output: HMM states (30 to ~10,000 units, e.g., for a, i, u, w, N)]
No special handling of the time dependency at the input: stack the left and right contexts, and just throw it into the DNN!
(The DNN provides per-frame HMM-state posteriors, which are then used as state scores by an HMM during recognition.)
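Stacking the left and right context frames ("splicing") can be sketched as follows; the padding-by-repetition at the utterance edges is one common choice, assumed here for illustration:

```python
import numpy as np

def splice(features, context=5):
    """Stack each frame with its left/right context frames
    (context=5 gives 11 frames total), padding the edges by
    repeating the first/last frame."""
    T, D = features.shape
    padded = np.concatenate([np.repeat(features[:1], context, axis=0),
                             features,
                             np.repeat(features[-1:], context, axis=0)])
    return np.stack([padded[t:t + 2 * context + 1].reshape(-1)
                     for t in range(T)])

feats = np.random.default_rng(0).normal(size=(100, 40))  # 100 frames of 40-dim log mel
spliced = splice(feats)
print(spliced.shape)  # (100, 440): 40 dims x 11 context frames
```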
[Figure: bottleneck network — input speech features: log mel filterbank + 11 context frames; output: HMM states (1,000 to ~10,000 units, e.g., a, i, u, w, N); a narrow bottleneck layer sits in the middle of the hidden layers]
π‘V, π‘5, β¦ , sW π©V, π©5, β¦ , π©X
34
35
[Figure: recurrent neural network — at each step, input x_t and the previous output y_{t−1} feed into the network to produce y_t; unrolled over time as (x_1, y_1), (x_2, y_2), …, (x_6, y_6) with y_{t−1} fed back at every step]
A softmax activation at the output gives P(t_t | y_1, y_2, …, y_t).
𝐲_t = σ(W [𝐲_{t−1}; 𝐱_t])  (the previous output and the current input are concatenated and linearly transformed)
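The recurrence 𝐲_t = σ(W [𝐲_{t−1}; 𝐱_t]) can be sketched as a simple forward loop (sizes and random weights are illustrative, not trained):

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

D, H = 3, 4                      # input dim, output/state dim
W = rng.normal(size=(H, H + D)) * 0.5

def rnn_forward(xs):
    y = np.zeros(H)              # y_0: initial state
    outputs = []
    for x in xs:                 # unroll over time
        # y_t = sigmoid(W @ concat(y_{t-1}, x_t))
        y = sigmoid(W @ np.concatenate([y, x]))
        outputs.append(y)
    return np.stack(outputs)

xs = rng.normal(size=(6, D))     # x_1 ... x_6
ys = rnn_forward(xs)
print(ys.shape)  # one output per time step
```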
[Figure: LSTM block — inputs: x_t, previous output y_{t−1}, previous cell state c_{t−1}; outputs: y_t, c_t. x_t and y_{t−1} are concatenated and passed through four linear transformations (the operations with trainable parameters); the σ and tanh activations are operations without trainable parameters.]
Each flow of information is scaled by a gate function in [0, 1], obtained through the sigmoid activation:
- the forget gate decides how much of c_{t−1} is kept,
- the input gate decides how much new information (tanh of the transformed input) is added to the cell state,
- the output gate decides how much of tanh(c_t) is emitted as y_t.
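One step of the block above can be sketched in NumPy (the gate names f, i, g, o, the sizes, and the random weights are illustrative; biases are omitted for brevity):

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

# concat(y_{t-1}, x_t) goes through four linear transformations:
# forget gate f, input gate i, cell candidate g, output gate o
D, H = 3, 4
Wf, Wi, Wg, Wo = (rng.normal(size=(H, H + D)) * 0.5 for _ in range(4))

def lstm_step(x, y_prev, c_prev):
    z = np.concatenate([y_prev, x])
    f = sigmoid(Wf @ z)          # forget gate in [0, 1]
    i = sigmoid(Wi @ z)          # input gate in [0, 1]
    g = np.tanh(Wg @ z)          # new candidate information
    o = sigmoid(Wo @ z)          # output gate in [0, 1]
    c = f * c_prev + i * g       # gated update of the cell state
    y = o * np.tanh(c)           # gated output
    return y, c

y, c = np.zeros(H), np.zeros(H)
for x in rng.normal(size=(6, D)):   # run over a 6-frame sequence
    y, c = lstm_step(x, y, c)
print(y.shape, c.shape)
```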
[Figure: recurrent neural network language model — input word sequence w_0 = "<s>", w_1 = "I", w_2 = "want", w_3 = "to", w_4 = "go", w_5 = "to"; at each step the hidden state s_t (s_1, …, s_6) is updated and the network predicts the next word: "I", "want", "to", "go", "to", "my", …]
Recall the ASR pipeline: feature extraction → acoustic modeling (HMM) → lexicon → language modeling. For "I want to go to Johns Hopkins campus", the language model chooses among homophone hypotheses ("go to" G OW T UW vs. "go two", "go too", and "goes to" G OW Z T UW vs. "goes two", "goes too").
5 β π|π = 1, β¦ , πΎ), be a character sequence
56
61
Example context: "I want to …"
P(C | X) = ∏_k P(c_k | c_1:k−1, 𝐫_k)
𝐫_k = Σ_{t=1}^{T} a_{kt} 𝐡_t,  with Σ_t a_{kt} = 1
Through the attention weights (a_11, a_12, a_13, …, a_21, …), the context vector 𝐫_k for character c_k has an explicit dependency on the encoder states 𝐡_t.
We can represent an alignment problem as a differentiable function.
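The context-vector computation above can be sketched with dot-product scoring (the scoring function and all sizes/weights are illustrative assumptions; the weights are untrained):

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(u):
    e = np.exp(u - u.max())
    return e / e.sum()

# score each encoder state h_t against a decoder query q_k, normalize so the
# weights a_kt sum to 1, and form the context vector r_k as a weighted sum
T, H = 6, 4
hs = rng.normal(size=(T, H))     # encoder states h_1 ... h_T
q = rng.normal(size=H)           # decoder query for character c_k

scores = hs @ q                  # dot-product attention scores
a = softmax(scores)              # a_kt >= 0, sum_t a_kt = 1
r = a @ hs                       # context vector r_k

print(abs(a.sum() - 1.0) < 1e-12, r.shape)
```

Because every operation here (dot product, softmax, weighted sum) is differentiable, the alignment itself can be learned by backpropagation.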
[Figure: attention alignment between input and output — normal arrows mark high-probability alignments, dashed arrows mark low-probability alignments]