EN.601.467/667 Introduction to Human Language Technology: Deep Learning II




slide-1
SLIDE 1
  • EN. 601.467/667

Introduction to Human Language Technology Deep Learning II

Shinji Watanabe

1

slide-2
SLIDE 2

Today’s agenda

  • Basics of (deep) neural network
  • How to integrate DNN with HMM
  • Recurrent neural network
  • Language modeling
  • Attention based encoder-decoder
  • Machine translation or speech recognition

2

slide-3
SLIDE 3

Deep neural network

  • Very large combination of linear classifiers

3

[Figure: DNN acoustic model. Input: speech features (~50 to ~1000 dim); ~7 hidden layers, 2048 units each; output: HMM state or phoneme (30 to ~10,000 units, e.g., a, i, u, w, N), giving p(t_u = k)]

slide-4
SLIDE 4

Feed-forward neural networks

  • Affine transformation and non-linear activation function (sigmoid function)

  • Apply the above transformation L times
  • Softmax operation to get the probability distribution

4
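The three operations above (affine transformation, sigmoid applied L times, softmax at the end) can be sketched in a few lines of numpy. This is a minimal illustration; the layer sizes are toy values, not the ~2048-unit configuration from the slides.

```python
import numpy as np

def sigmoid(y):
    # Elementwise sigmoid: maps R to (0, 1)
    return 1.0 / (1.0 + np.exp(-y))

def softmax(y):
    # Numerically stable softmax: subtract the max before exponentiating
    e = np.exp(y - np.max(y))
    return e / e.sum()

def feed_forward(x, layers):
    # layers: list of (W, b) pairs; sigmoid after every layer but the last,
    # softmax on the final layer to get a probability distribution
    h = x
    for W, b in layers[:-1]:
        h = sigmoid(W @ h + b)
    W, b = layers[-1]
    return softmax(W @ h + b)

rng = np.random.default_rng(0)
dims = [40, 64, 64, 10]  # toy sizes for illustration only
layers = [(rng.standard_normal((dims[i + 1], dims[i])) * 0.1,
           np.zeros(dims[i + 1])) for i in range(len(dims) - 1)]
q = feed_forward(rng.standard_normal(40), layers)
```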

slide-5
SLIDE 5

Linear operation

  • Transforms a D^(l-1)-dimensional input into a D^(l)-dimensional output

g(h^(l-1)) = W^(l) h^(l-1) + b^(l)

  • W^(l) ∈ ℝ^(D^(l) Γ— D^(l-1)): linear transformation matrix
  • b^(l) ∈ ℝ^(D^(l)): bias vector
  • Derivatives
  • βˆ‚(Ξ£_k w_jk h_k + b_j)/βˆ‚b_j' = Ξ΅(j, j')
  • βˆ‚(Ξ£_k w_jk h_k + b_j)/βˆ‚w_j'k' = Ξ΅(j, j') h_k'

5

slide-6
SLIDE 6

DNN model size

  • Mainly determined by the number of dimensions (units) D per layer and the number of layers L

6

Which one makes the model larger? How much?

slide-7
SLIDE 7

Sigmoid function

  • Sigmoid function
  • Convert the domain from ℝ to [0, 1]
  • Elementwise sigmoid function:
  • No trainable parameter in general
  • Derivative

βˆ‚Ο„(y)/βˆ‚y = Ο„(y)(1 βˆ’ Ο„(y))

7
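The derivative identity above is easy to verify numerically with a central finite difference (the value y = 0.7 is arbitrary):

```python
import numpy as np

def sigmoid(y):
    return 1.0 / (1.0 + np.exp(-y))

# Check d tau(y)/dy = tau(y)(1 - tau(y)) against a finite difference
y = 0.7
eps = 1e-6
numeric = (sigmoid(y + eps) - sigmoid(y - eps)) / (2 * eps)
analytic = sigmoid(y) * (1.0 - sigmoid(y))
```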

slide-8
SLIDE 8

Sigmoid function cont’d

  • Ο„(y)
  • Ο„'(y) = Ο„(y)(1 βˆ’ Ο„(y))

8

slide-9
SLIDE 9

Softmax function

  • Softmax function
  • Convert the domain from ℝ^K to [0, 1]^K (make a multinomial dist. β†’ classification)
  • Satisfy the sum-to-one condition, i.e., Ξ£_{k=1}^{K} q(k|h) = 1
  • K = 2: sigmoid function
  • Derivative
  • For j = k: βˆ‚q(k|h)/βˆ‚h_j = q(k|h)(1 βˆ’ q(k|h))
  • For j β‰  k: βˆ‚q(k|h)/βˆ‚h_j = βˆ’q(j|h) q(k|h)
  • Or we can write βˆ‚q(k|h)/βˆ‚h_j = q(k|h)(Ξ΅(j, k) βˆ’ q(j|h)), where Ξ΅(j, k) is the Kronecker delta

9
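Both derivative cases collapse into the single Jacobian J = diag(q) βˆ’ q qα΅€, which a finite-difference check confirms (the input vector h here is arbitrary):

```python
import numpy as np

def softmax(h):
    e = np.exp(h - np.max(h))
    return e / e.sum()

h = np.array([0.5, -1.0, 2.0])
q = softmax(h)

# Analytic Jacobian: dq(k)/dh(j) = q(k)(eps(j, k) - q(j))
J = np.diag(q) - np.outer(q, q)

# Finite-difference check of one off-diagonal entry (j != k)
eps = 1e-6
j, k = 0, 2
hp, hm = h.copy(), h.copy()
hp[j] += eps
hm[j] -= eps
numeric = (softmax(hp)[k] - softmax(hm)[k]) / (2 * eps)
```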

slide-10
SLIDE 10

Why is it used for the probability distribution?

10

slide-11
SLIDE 11

Which functions/operations can we use, and which can we not?

  • Most elementary functions
  • +, βˆ’, Γ—, Γ·, log, exp, sin, cos, tan
  • Functions/operations whose derivative we cannot take, including some discrete operations
  • argmax q(X|P): the basic ASR decoding operation, but we cannot take its derivative…
  • Discretization

11

slide-12
SLIDE 12

Objective function design

  • We usually use the cross entropy as an objective function
  • Since the Viterbi sequence is a hard assignment, the summation over states is simplified

12
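The simplification is that a hard (Viterbi) state assignment makes the target one-hot, so the sum over states collapses to a single βˆ’log term. A tiny numeric illustration (the posterior values are made up):

```python
import numpy as np

def cross_entropy(q, target):
    # General cross entropy: -sum_k target[k] * log q[k]
    return -np.sum(target * np.log(q))

q = np.array([0.7, 0.2, 0.1])   # predicted state posterior (toy values)
viterbi_state = 0               # hard (Viterbi) state assignment
one_hot = np.eye(3)[viterbi_state]

# With a hard assignment the sum over states collapses to one term
full = cross_entropy(q, one_hot)
simplified = -np.log(q[viterbi_state])
```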

slide-13
SLIDE 13

Other objective functions

  • Square error

‖𝐒^ref βˆ’ 𝐒‖²

  • We could also use the p-norm, e.g., the L1 norm
  • Binary cross entropy
  • Again, this is a special case of the cross entropy when the number of classes is two

13

slide-14
SLIDE 14

Building blocks

14

[Figure: building blocks. Input: p_u ∈ ℝ^D; output: t_u ∈ {1, …, K}; blocks: linear transformation, sigmoid activation, softmax activation, +, βˆ’, exp, log, etc.]

slide-15
SLIDE 15

Building blocks

15

[Figure: building blocks. Input: p_u ∈ ℝ^D; output: t_u ∈ {1, …, K}; highlighted: linear transformation and softmax activation]

slide-16
SLIDE 16

Building blocks

16

[Figure: building blocks. Input: p_u ∈ ℝ^D; output: t_u ∈ {1, …, K}; highlighted: linear transformation, sigmoid activation, linear transformation, softmax activation]

slide-17
SLIDE 17

Building blocks

17

[Figure: building blocks. Input: p_u ∈ ℝ^D; output: t_u ∈ {1, …, K}; highlighted: elementary operations +, βˆ’, exp, log, etc.]

slide-18
SLIDE 18

How to optimize? Gradient descent and its variants

  • Take a derivative and update parameters with this derivative
  • Chain rule
  • Learning rate ρ

19

slide-19
SLIDE 19

Chain rule

  • Chain rule
  • For example
  • 𝑔 𝑕 πœ„

= 𝜏(𝑏𝑦 + 𝑐) where πœ„ = 𝑏, 𝑐

  • 𝑔 𝑦 = 𝜏 𝑦
  • 𝑕 𝑦 = 𝑏𝑦 + 𝑐
  • *I(J K )

*/

= *L(M)

*M *(NO./) */

= 𝜏 𝑧 1 βˆ’ 𝜏 𝑧 = 𝜏 𝑏𝑦 + 𝑐 1 βˆ’ 𝜏 𝑏𝑦 + 𝑐

  • *I(J K )

*N

=

20

𝑧 = 𝑏𝑦 + 𝑐
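A quick numerical check of the chain rule for Ο„(by + c), with respect to both the input y and the parameter b (the values of b, c, y are arbitrary):

```python
import numpy as np

def tau(y):
    return 1.0 / (1.0 + np.exp(-y))

b, c, y = 1.5, -0.3, 0.8
z = b * y + c

# Chain rule: d tau(by + c)/dy = tau(z)(1 - tau(z)) * b
analytic_dy = tau(z) * (1 - tau(z)) * b
# And w.r.t. the parameter b: d tau(by + c)/db = tau(z)(1 - tau(z)) * y
analytic_db = tau(z) * (1 - tau(z)) * y

eps = 1e-6
numeric_dy = (tau(b * (y + eps) + c) - tau(b * (y - eps) + c)) / (2 * eps)
numeric_db = (tau((b + eps) * y + c) - tau((b - eps) * y + c)) / (2 * eps)
```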

slide-20
SLIDE 20

Deep neural network: nested function

  • Chain rule to get a derivative recursively
  • Each transformation (affine, sigmoid, and softmax) has analytical derivatives, and we just combine these derivatives

  • We can obtain the derivative from the back propagation algorithm

21

[Figure: nested blocks — linear transformation β†’ sigmoid activation β†’ linear transformation β†’ softmax activation]

slide-21
SLIDE 21

Minibatch processing

  • Batch processing
  • Slow convergence
  • Effective computation
  • Online processing
  • Fast convergence
  • Very inefficient computation
  • Minibatch processing
  • Something between batch and online processing

22

[Figure: the whole data (batch) is split into minibatches; the parameters Θ are updated after each minibatch]
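The splitting scheme can be sketched as follows (shuffle once, then yield contiguous chunks; batch size 4 is arbitrary):

```python
import numpy as np

def minibatches(data, batch_size, rng):
    # Shuffle once per epoch, then yield contiguous chunks of the data
    idx = rng.permutation(len(data))
    for start in range(0, len(data), batch_size):
        yield data[idx[start:start + batch_size]]

rng = np.random.default_rng(0)
data = np.arange(10)
batches = list(minibatches(data, 4, rng))  # last batch may be smaller
```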

slide-22
SLIDE 22

How to set ρ?

Θ^(i+1) = Θ^(i) βˆ’ ρ Β· Ξ”_grad^(i)

  • Stochastic Gradient Descent (SGD)
  • Use a constant value (hyper-parameter)
  • Can have some heuristic tuning (e.g., ρ ← 0.5 Γ— ρ when the validation loss starts to degrade; the decay factor then becomes another hyperparameter)
  • Adam, AdaDelta, RMSProp, etc.
  • Adaptively update using current and previous gradient information: ρ(Ξ”_grad^(i), Ξ”_grad^(iβˆ’1), …)
  • Still have hyperparameters to balance current and previous gradient information

  • Choice of an appropriate optimizer and its hyperparameters is critical

23
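A minimal sketch of the SGD update with the halving heuristic from the slide; the validation losses and the toy objective ‖θ‖² are made up for illustration:

```python
import numpy as np

def sgd_step(theta, grad, lr):
    # theta_{i+1} = theta_i - lr * grad_i
    return theta - lr * grad

# Halve the learning rate whenever validation loss degrades;
# the decay factor 0.5 is itself a hyperparameter
lr = 0.1
theta = np.array([1.0, -2.0])
prev_val_loss = float("inf")
for val_loss in [0.9, 0.7, 0.8, 0.6]:   # made-up validation losses
    if val_loss > prev_val_loss:
        lr *= 0.5
    prev_val_loss = val_loss
    # gradient of the toy objective ||theta||^2 is 2 * theta
    theta = sgd_step(theta, grad=2 * theta, lr=lr)
```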

slide-23
SLIDE 23

Difficulties of training

  • Blue: accuracy of training data (higher is better)
  • Orange: accuracy of validation data (higher is better)

24

slide-24
SLIDE 24

Today’s agenda

  • Basics of (deep) neural network
  • How to integrate DNN with HMM
  • Recurrent neural network
  • Language modeling
  • Attention based encoder-decoder
  • Machine translation or speech recognition

25

slide-25
SLIDE 25

Speech recognition pipeline

Feature extraction

β€œI want to go to Johns Hopkins campus”

Acoustic modeling (HMM) β†’ Lexicon β†’ Language modeling

G OW T UW β€œgo to” β€œgo two” β€œgo too” β€œgoes to” β€œgoes two” β€œgoes too” G OW Z T UW

p(O|L)


p(L|W)


p(W)

slide-26
SLIDE 26

Feed-forward neural network for acoustic model

  • Basic problem π‘ž(𝑑*|𝐩*)
  • 𝑑4: HMM state or phoneme
  • 𝐩4: speech feature vector
  • 𝑒: data sample
  • Configurations
  • Input features
  • Context expansion
  • Output class
  • Softmax function
  • Training criterion
  • Number of layers
  • Number of hidden states
  • Type of non-linear activations

27

[Figure: DNN acoustic model. Input speech features: log mel filterbank + 11 context frames; ~7 hidden layers, 2048 units; output HMM state: 30 to ~10,000 units (a, i, u, w, N, …)]

slide-27
SLIDE 27

Input feature

  • GMM/HMM formulation
  • Lots of conditional independence assumptions and Markov assumptions
  • Many of our trials are about how to break these assumptions
  • In GMM, we always have to care about the correlation
  • Delta, linear discriminant analysis, semi-tied covariance
  • In DNN, we don’t have to care :)
  • We can simply concatenate the left and right contexts, and just throw it in!

28

slide-28
SLIDE 28

Output

  • Phoneme or HMM state ID is used
  • We need paired input/output data at each frame u
  • First use the Viterbi alignment to obtain the state sequence
  • Then, we get the input and output pairs
  • Treat acoustic modeling as a multiclass classification problem by predicting the HMM state ID given the observation
  • No constraint is considered at this stage (e.g., left-to-right, which is handled by an HMM during recognition)

29

slide-29
SLIDE 29

How to integrate DNN with HMM

  • Bottleneck feature
  • DNN/HMM hybrid

30

slide-30
SLIDE 30

Bottleneck feature

  • Train a DNN, but with one narrow (bottleneck) layer
  • Use its hidden state vector as features for the GMM/HMM
  • Nonlinear feature extraction with discriminative ability
  • Can combine with an existing GMM/HMM system

31

[Figure: bottleneck DNN. Input speech features: log mel filterbank + 11 context frames; bottleneck layer; output HMM state: 1,000 to ~10,000 units (a, i, u, w, N, …)]

slide-31
SLIDE 31

DNN/HMM hybrid

  • How to make it fit to the HMM framework?
  • Use the Bayes rule to convert the posterior to the likelihood
  • The state prior is obtained by maximum likelihood (unigram counts)
  • Need a modification of the Viterbi algorithm during recognition

32
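A minimal numpy sketch of the hybrid conversion, assuming a toy 3-state model: by the Bayes rule p(p|t) = q(t|p) p(p)/p(t), and since p(p) is constant over states, Viterbi decoding can use the "scaled likelihood" q(t|p)/p(t). The counts and posteriors below are made up.

```python
import numpy as np

# Hypothetical counts of each HMM state in the Viterbi-aligned training data
state_counts = np.array([500, 300, 200])
prior = state_counts / state_counts.sum()   # ML unigram prior p(t)

dnn_posterior = np.array([0.6, 0.3, 0.1])   # q(t | p) from the DNN

# Scaled log-likelihood used in place of the GMM likelihood during decoding
scaled_loglik = np.log(dnn_posterior) - np.log(prior)
best_state = int(np.argmax(scaled_loglik))
```

Note that dividing by the prior can reorder states relative to the raw posterior when the state distribution is skewed.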

slide-32
SLIDE 32

Today’s agenda

  • Basics of (deep) neural network
  • How to integrate DNN with HMM
  • Recurrent neural network
  • Language modeling
  • Attention based encoder-decoder
  • Machine translation or speech recognition

33

slide-33
SLIDE 33

Recurrent neural network

  • Basic problem
  • HMM state (or phoneme) or speech feature is a sequence

t_1, t_2, …, t_U or p_1, p_2, …, p_U

  • It’s better to consider context (e.g., previous inputs) to predict the probability

q(t_u | p_u) β†’ q(t_u | p_1, p_2, …, p_u)

  • Recurrent neural network (RNN) can handle such problems

34

slide-34
SLIDE 34

Recurrent neural network (Elman type)

  • Vanilla RNN: We ignore the bias term for simplicity

35

[Figure: Elman RNN with input x_t, output y_t, and recurrent input y_{t-1}]

y_t = Ο„(W [y_{t-1}; x_t])
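The Elman recurrence can be sketched as a loop that applies one shared weight matrix to the concatenation of the previous output and the current input (bias omitted, as on the slide; sizes are illustrative):

```python
import numpy as np

def sigmoid(y):
    return 1.0 / (1.0 + np.exp(-y))

def elman_rnn(xs, W, h0):
    # y_t = tau(W [y_{t-1}; x_t]): one shared weight matrix applied to the
    # concatenation of the previous hidden output and the current input
    h = h0
    outputs = []
    for x in xs:
        h = sigmoid(W @ np.concatenate([h, x]))
        outputs.append(h)
    return outputs

rng = np.random.default_rng(0)
hidden, dim, T = 8, 5, 6
W = rng.standard_normal((hidden, hidden + dim)) * 0.1
xs = [rng.standard_normal(dim) for _ in range(T)]
ys = elman_rnn(xs, W, np.zeros(hidden))
```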

slide-35
SLIDE 35

Recurrent neural network (Elman type)

  • Vanilla RNN

36

[Figure: RNN unrolled over time: inputs x_1 … x_6, outputs y_1 … y_6, with y_{t-1} fed back at each step]

slide-36
SLIDE 36

Recurrent neural network (Elman type)

  • Vanilla RNN

37

[Figure: unrolled RNN as before]

Possibly considers long-range effects (the longer the range, the weaker), but no future context

slide-37
SLIDE 37

Recurrent neural network (Elman type)

  • Vanilla RNN

38

[Figure: unrolled RNN as before]

We can compute the posterior distribution q(t_u | p_1, p_2, …, p_u)

Softmax activation: q(t_t | y_1, y_2, …, y_t)

slide-38
SLIDE 38

Bidirectional RNN

39

[Figure: backward RNN unrolled over time: inputs x_1 … x_6, with the recurrence running from the future toward the past]

We can compute the posterior distribution q(t_u | p_u, p_{u+1}, …, p_U)

slide-39
SLIDE 39

Bidirectional RNN

40

[Figure: bidirectional RNN: a forward and a backward unrolled RNN over x_1 … x_6]

We can compute the posterior distribution q(t_u | p_1, p_2, …, p_u, p_{u+1}, …, p_U)

z→_t = Ο„(W→ [z→_{t-1}; x_t]),  z←_t = Ο„(W← [z←_{t+1}; x_t])
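A bidirectional RNN can be sketched as two independent passes whose per-frame states are concatenated, so every frame sees the whole input sequence (weights and sizes below are illustrative):

```python
import numpy as np

def sigmoid(y):
    return 1.0 / (1.0 + np.exp(-y))

def rnn(xs, W, h0):
    h, hs = h0, []
    for x in xs:
        h = sigmoid(W @ np.concatenate([h, x]))
        hs.append(h)
    return hs

rng = np.random.default_rng(1)
hidden, dim, T = 4, 3, 5
Wf = rng.standard_normal((hidden, hidden + dim)) * 0.1  # forward weights
Wb = rng.standard_normal((hidden, hidden + dim)) * 0.1  # backward weights
xs = [rng.standard_normal(dim) for _ in range(T)]

fwd = rnn(xs, Wf, np.zeros(hidden))              # left-to-right pass
bwd = rnn(xs[::-1], Wb, np.zeros(hidden))[::-1]  # right-to-left pass
# Concatenate both directions at each frame: depends on x_1 .. x_T
states = [np.concatenate([f, b]) for f, b in zip(fwd, bwd)]
```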

slide-40
SLIDE 40

Long short-term memory RNN

  • Keep two states
  • Normal recurrent state: yt
  • Memory cell: ct

42

[Figure: LSTM block with input x_t, output y_t, cell state c_t / c_{t-1}, and hidden state y_t / y_{t-1}]

slide-41
SLIDE 41

LSTM block

43

[Figure: LSTM block internals — inputs y_t, z_{t-1}, d_{t-1}; outputs z_t, d_t; three Οƒ gates, two tanh, a concat, and four linear layers]

Operations w/o trainable parameters Operations w/ trainable parameters

slide-42
SLIDE 42

LSTM block

  • Cell state keeps the history information
  • 1. It will be forgotten
  • 2. New information from x_t will be added
  • 3. The cell information will be outputted as y_t
  • Each β€œwill be” is implemented by a gating function in [0, 1] through the sigmoid activation

44

[Figure: LSTM block internals as before]

slide-43
SLIDE 43

tanh and sigmoid activations

  • sigmoid
  • Converts the domain from ℝ to [0, 1]
  • Used as a gate (weights the state vector (information))
  • tanh
  • Converts the domain from ℝ to [βˆ’1, 1]
  • Allows negative and positive values

45

[Figure: sigmoid and tanh activation curves]


slide-49
SLIDE 49

LSTM block summary

  • 3 gating functions
  • Cell update
  • Hidden state update

51

[Figure: LSTM block internals as before]
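Putting the summary together, one LSTM step combines the three gates, the cell update, and the hidden state update. This is a minimal sketch (biases omitted, weights random and illustrative):

```python
import numpy as np

def sigmoid(y):
    return 1.0 / (1.0 + np.exp(-y))

def lstm_step(x, h_prev, c_prev, params):
    # Three sigmoid gates (forget, input, output) plus a tanh candidate,
    # each from its own linear layer over the concatenation [h_{t-1}; x_t]
    hx = np.concatenate([h_prev, x])
    Wf, Wi, Wo, Wg = params
    f = sigmoid(Wf @ hx)      # 1. how much of the cell is forgotten
    i = sigmoid(Wi @ hx)      # 2. how much new information is added
    o = sigmoid(Wo @ hx)      # 3. how much of the cell is outputted
    g = np.tanh(Wg @ hx)      # candidate cell update from x_t
    c = f * c_prev + i * g    # cell update: the cell keeps the history
    h = o * np.tanh(c)        # hidden state update
    return h, c

rng = np.random.default_rng(0)
hidden, dim = 4, 3
params = [rng.standard_normal((hidden, hidden + dim)) * 0.1 for _ in range(4)]
h, c = np.zeros(hidden), np.zeros(hidden)
for x in [rng.standard_normal(dim) for _ in range(5)]:
    h, c = lstm_step(x, h, c, params)
```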

slide-50
SLIDE 50

LSTM RNN

  • LSTM

52

[Figure: LSTM RNN unrolled over time: inputs x_1 … x_6, outputs y_1 … y_6]

Can keep a dependency on the initial information over long ranges, e.g., with the forget gate β‰ˆ 1 and the input gate β‰ˆ 0 at t > 1

slide-51
SLIDE 51

RNN can be used for both acoustic and language models

  • HMM/DNN acoustic model

q(t_u | p_1, p_2, …, p_u)

  • Language model

q(x_i | x_1, x_2, …, x_{i-1})

53

[Figure: RNN language model — input words β€œ<s>”, ”I”, β€œwant”, β€œto”, β€œgo”, β€œto”; predicted next words ”I”, β€œwant”, β€œto”, β€œgo”, β€œto”, β€œmy”]
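An RNN language model feeds the word history through the recurrence and reads off q(x_t | x_1, …, x_{t-1}) with a softmax over the vocabulary. This sketch adds an embedding matrix E and output matrix Wout that are illustrative (random, untrained), not part of the slide:

```python
import numpy as np

def sigmoid(y):
    return 1.0 / (1.0 + np.exp(-y))

def softmax(y):
    e = np.exp(y - np.max(y))
    return e / e.sum()

vocab = ["<s>", "I", "want", "to", "go", "my"]
V, hidden = len(vocab), 8
rng = np.random.default_rng(0)
E = rng.standard_normal((hidden, V)) * 0.1        # word embedding (toy)
W = rng.standard_normal((hidden, 2 * hidden)) * 0.1
Wout = rng.standard_normal((V, hidden)) * 0.1

# q(x_t | x_1, ..., x_{t-1}): feed the history one word at a time
h = np.zeros(hidden)
for word in ["<s>", "I", "want"]:
    x = E[:, vocab.index(word)]
    h = sigmoid(W @ np.concatenate([h, x]))
q_next = softmax(Wout @ h)   # distribution over the next word
```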

slide-52
SLIDE 52

Speech recognition pipeline

Feature extraction

β€œI want to go to Johns Hopkins campus”

Acoustic modeling (HMM) β†’ Lexicon β†’ Language modeling

G OW T UW β€œgo to” β€œgo two” β€œgo too” β€œgoes to” β€œgoes two” β€œgoes too” G OW Z T UW

p(O|L)


p(L|W)


p(W)

slide-53
SLIDE 53

Today’s agenda

  • Basics of (deep) neural network
  • How to integrate DNN with HMM
  • Recurrent neural network
  • Attention mechanism

55

slide-54
SLIDE 54

Attention based encoder-decoder

  • Let 𝐷 = (𝑑_π‘˜ ∈ 𝕍 | π‘˜ = 1, … , 𝐾) be a character sequence
  • 𝕍: set of characters
  • Let 𝑃 = (𝐩_𝑒 ∈ ℝ^𝐸 | 𝑒 = 1, … , π‘ˆ) be a sequence of 𝐸-dimensional feature vectors
  • Goal: 𝐷̂ = argmax_𝐷 π‘ž(𝐷 | 𝑃)
  • Problem: π‘ˆ and 𝐾 are different, so we cannot use a normal (fixed input/output size) neural network
  • Sequence-to-sequence modeling is a solution to this length mismatch

56
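As a concrete illustration of the mismatch above, here is a minimal sketch (untrained, random weights; all shapes and names are illustrative assumptions, not the lecture's actual model) in which an encoder folds π‘ˆ = 100 feature vectors into one fixed-size state and a decoder then unrolls 𝐾 = 5 character ids from it, so the output length is decoupled from the input length:

```python
import numpy as np

rng = np.random.default_rng(0)
U, K, d, V = 100, 5, 8, 30      # U input frames, K output characters, |vocab| = V

# Encoder: fold the U feature vectors p_1..p_U into one fixed-size state h
W_enc = 0.1 * rng.standard_normal((d, d))
P = rng.standard_normal((U, d))          # stand-in for the feature sequence
h = np.zeros(d)
for p in P:                              # simple RNN-style recursion
    h = np.tanh(W_enc @ h + p)

# Decoder: unroll K steps from that state, emitting one character id per step
W_dec = 0.1 * rng.standard_normal((d, d))
W_out = 0.1 * rng.standard_normal((V, d))
s, out = h, []
for _ in range(K):
    s = np.tanh(W_dec @ s)
    out.append(int(np.argmax(W_out @ s)))  # greedy character choice

print(len(out))  # K outputs, regardless of the U inputs
```

A trained model would additionally condition each decoder step on the previously emitted character (and, with attention, on the encoder states); the point here is only that 𝐾 need not equal π‘ˆ.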

slide-55
SLIDE 55

Alignment problem

61

(Figure: input feature frames aligned to the output words "I want to …")

slide-56
SLIDE 56

Attention mechanism

π‘ž(𝐷 | 𝑃) = ∏_π‘˜ π‘ž(𝑑_π‘˜ | 𝑑_{1:π‘˜βˆ’1}, 𝐰_π‘˜)

  • Obtain the context vector 𝐰_π‘˜ = βˆ‘_{𝑒=1}^{π‘ˆ} 𝑏_{π‘˜π‘’} 𝐒′_𝑒
  • Compute the assignment probability for each output π‘˜ from a neural network
  • 𝐛_π‘˜ = {𝑏_{π‘˜π‘’} | 𝑒 = 1, … , π‘ˆ} ∈ ℝ^π‘ˆ, with 0 < 𝑏_{π‘˜π‘’} < 1 and βˆ‘_{𝑒=1}^{π‘ˆ} 𝑏_{π‘˜π‘’} = 1
  • 𝑏_{π‘˜π‘’} is obtained by using a neural network

62

(Figure: attention weights 𝑏_{π‘˜π‘’} linking each encoder state 𝐒′_𝑒 to each output step; 𝐰_π‘˜ has an explicit dependency on character 𝑑_π‘˜)

We can represent the alignment problem as a differentiable function
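The context-vector computation can be sketched numerically. This is a minimal dot-product-attention sketch with random vectors (the dot-product scoring function and all shapes are assumptions; the slide only requires that the 𝑏_{π‘˜π‘’} come from a neural network and sum to 1 over 𝑒):

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax: maps scores to 0 < b_ku < 1 with sum_u b_ku = 1."""
    e = np.exp(x - x.max())
    return e / e.sum()

def attention_context(query, H):
    """query: (d,) decoder state at output step k; H: (U, d) encoder states h'_u.
    Returns the context vector w_k = sum_u b_ku h'_u and the weights b_k."""
    scores = H @ query        # (U,) unnormalized alignment scores (dot products)
    b = softmax(scores)       # assignment probabilities b_k
    w = b @ H                 # convex combination of encoder states
    return w, b

rng = np.random.default_rng(0)
H = rng.standard_normal((7, 4))   # U = 7 encoder states of dimension d = 4
q = rng.standard_normal(4)
w, b = attention_context(q, H)
print(b.sum())    # the weights form a probability distribution over u
print(w.shape)    # the context vector has the same dimension as one encoder state
```

Because softmax is differentiable, the whole alignment is differentiable and can be trained end-to-end with backpropagation.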

slide-57
SLIDE 57

63

Normal arrow: high probability. Dashed arrow: low probability.

slide-58
SLIDE 58

64

Normal arrow: high probability. Dashed arrow: low probability.

slide-59
SLIDE 59

65

Normal arrow: high probability. Dashed arrow: low probability.

slide-60
SLIDE 60

Summary of today’s talk

  • Basics of deep neural network
  • Input, output, function block, back propagation, optimization
  • Recurrent neural network
  • Now we can handle sequences (𝑑₁, 𝑑₂, …), (π‘₯₁, π‘₯β‚‚, …), (𝐩₁, 𝐩₂, …)
  • Integrate deep neural network for speech recognition
  • GMM/HMM β†’ DNN/HMM or RNN/HMM
  • RNN language model
  • Attention based encoder-decoder
  • Machine translation and speech recognition have different lengths of input and output.

  • Attention mechanism to deal with the different lengths

66