
  1. Automatic Speech Recognition (CS753). Lecture 9: Brief Introduction to Neural Networks. Instructor: Preethi Jyothi. Feb 2, 2017


  2. Final Project Landscape. Project topics include:
 • Tabla bol transcription
 • Voice-based music player
 • Sanskrit synthesis and recognition
 • Automatic tongue twister generator
 • InfoGAN for music
 • Music genre classification
 • Automatic authorised ASR
 • Speech synthesis & ASR for Indic languages
 • Keyword spotting for continuous speech
 • Singer identification
 • Audio synthesis using LSTMs
 • Speaker verification
 • Transcribing TED Talks
 • Swapping instruments in recordings
 • Ad detection in live radio streams
 • Emotion recognition from speech
 • End-to-end speech recognition
 • Nationality detection from speech accents
 • Audio-visual speech recognition
 • Bird call recognition
 • Programming with speech-based commands
 • Speaker adaptation

  3. Feed-forward Neural Network. [Diagram: input layer, hidden layer, and output layer of a feed-forward network.]

  4. Feed-forward Neural Network: Brain Metaphor. A single neuron applies an activation function g to a weighted sum of its inputs x_i with weights w_i: y = g(Σ_i w_i · x_i). Image from: https://upload.wikimedia.org/wikipedia/commons/1/10/Blausen_0657_MultipolarNeuron.png
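
A minimal NumPy sketch of this single-neuron computation (the input values, weights, and the choice of sigmoid as g are illustrative assumptions, not values from the lecture):

```python
import numpy as np

def sigmoid(z):
    # logistic activation: 1 / (1 + e^{-z})
    return 1.0 / (1.0 + np.exp(-z))

def neuron(x, w, g=sigmoid):
    # y = g(sum_i w_i * x_i)
    return g(np.dot(w, x))

x = np.array([0.5, -1.0, 2.0])   # example inputs (made up)
w = np.array([0.1, 0.4, -0.3])   # example weights (made up)
print(neuron(x, w))
```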

  5. Feed-forward Neural Network: Parameterized Model. [Diagram: inputs x_1, x_2 feed nodes 1 and 2; hidden nodes 3 and 4; output node 5 with activation a_5; edge weights w_13, w_14, w_23, w_24, w_35, w_45.]
 a_5 = g(w_35 · a_3 + w_45 · a_4)
     = g(w_35 · g(w_13 · a_1 + w_23 · a_2) + w_45 · g(w_14 · a_1 + w_24 · a_2))
 Parameters of the network: all the weights w_ij (and biases, not shown here).
 If x is a 2-dimensional vector and the layer above it is a 2-dimensional vector h, a fully-connected layer is associated with h = xW + b, where w_ij in W is the weight of the connection between the i-th neuron in the input row and the j-th neuron in the first hidden layer, and b is the bias vector.
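
A sketch of this small 2-2-1 network in NumPy, first node by node as on the slide and then with the hidden layer written as a fully-connected layer h = xW + b (all weight and input values are made up; sigmoid is assumed for g, and biases are set to zero):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Illustrative weights for the 2-2-1 network in the figure (values are made up)
w13, w23 = 0.2, -0.5   # into hidden node 3
w14, w24 = 0.7,  0.1   # into hidden node 4
w35, w45 = 0.3,  0.9   # into output node 5

a1, a2 = 1.0, -2.0     # inputs x_1, x_2 (illustrative)

# Node by node, exactly as on the slide
a3 = sigmoid(w13 * a1 + w23 * a2)
a4 = sigmoid(w14 * a1 + w24 * a2)
a5 = sigmoid(w35 * a3 + w45 * a4)

# The same hidden layer as a fully-connected layer h = xW + b (zero bias here)
x = np.array([a1, a2])
W = np.array([[w13, w14],
              [w23, w24]])   # W[i, j] connects input i to hidden node j
h = sigmoid(x @ W)           # equals [a3, a4]
print(a5, h)
```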

  6. Feed-forward Neural Network: Parameterized Model. [Same network diagram as on the previous slide.]
 a_5 = g(w_35 · a_3 + w_45 · a_4) = g(w_35 · g(w_13 · a_1 + w_23 · a_2) + w_45 · g(w_14 · a_1 + w_24 · a_2))
 The simplest neural network is the perceptron: Perceptron(x) = xW + b
 A 1-layer feedforward neural network has the form: MLP(x) = g(xW_1 + b_1) W_2 + b_2
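
A sketch of these two forms in NumPy (the input dimension, layer sizes, random weights, and the use of sigmoid for g are assumptions for illustration):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def perceptron(x, W, b):
    # Perceptron(x) = xW + b  (a single linear layer)
    return x @ W + b

def mlp1(x, W1, b1, W2, b2, g=sigmoid):
    # MLP(x) = g(xW1 + b1) W2 + b2  (one hidden layer)
    return g(x @ W1 + b1) @ W2 + b2

rng = np.random.default_rng(0)
x = rng.normal(size=(1, 4))                  # 4-dimensional input (illustrative)
W1, b1 = rng.normal(size=(4, 3)), np.zeros(3)
W2, b2 = rng.normal(size=(3, 2)), np.zeros(2)
print(perceptron(x, W1, b1))
print(mlp1(x, W1, b1, W2, b2))
```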

  7. Common Activation Functions (g). Sigmoid: σ(x) = 1/(1 + e^(-x)). [Plot: sigmoid output vs. x over the range -10 to 10.]

  8. Common Activation Functions (g). Sigmoid: σ(x) = 1/(1 + e^(-x)). Hyperbolic tangent (tanh): tanh(x) = (e^(2x) - 1)/(e^(2x) + 1). [Plot: sigmoid and tanh outputs vs. x.]

  9. Common Activation Functions (g). Sigmoid: σ(x) = 1/(1 + e^(-x)). Hyperbolic tangent (tanh): tanh(x) = (e^(2x) - 1)/(e^(2x) + 1). Rectified Linear Unit (ReLU): ReLU(x) = max(0, x). [Plot: sigmoid, tanh, and ReLU outputs vs. x.]
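
These three functions translate directly into code; a quick NumPy sketch (the evaluation points are arbitrary):

```python
import numpy as np

def sigmoid(x):
    # sigma(x) = 1 / (1 + e^{-x})
    return 1.0 / (1.0 + np.exp(-x))

def tanh(x):
    # tanh(x) = (e^{2x} - 1) / (e^{2x} + 1)
    return (np.exp(2 * x) - 1) / (np.exp(2 * x) + 1)

def relu(x):
    # ReLU(x) = max(0, x), applied elementwise
    return np.maximum(0, x)

x = np.linspace(-10, 10, 5)
print(sigmoid(x), tanh(x), relu(x), sep="\n")
```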

  10. Optimization Problem. To train a neural network, define a loss function L(y, ỹ):
 • a function of the true output y and the predicted output ỹ
 • L(y, ỹ) assigns a non-negative numerical score to the neural network's output ỹ
 • The parameters of the network are set to minimise L over the training examples (i.e. a sum of losses over different training samples)
 • L is typically minimised using a gradient-based method
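
As a concrete (assumed) example, a squared-error loss summed over the training set could be written as follows; the slide does not fix a particular loss, so this is only one possible choice:

```python
import numpy as np

def squared_error(y_true, y_pred):
    # L(y, y~) >= 0; zero only when the prediction matches the target
    return 0.5 * np.sum((np.asarray(y_true) - np.asarray(y_pred)) ** 2)

def total_loss(nn, theta, xs, ys):
    # Sum of per-example losses over the training set
    return sum(squared_error(y, nn(x, theta)) for x, y in zip(xs, ys))
```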

  11. Stochastic Gradient Descent (SGD)
 SGD Algorithm
 Inputs: Function NN(x; θ), training examples x_1 … x_n with outputs y_1 … y_n, and loss function L.
 do until stopping criterion:
     Pick a training example x_i, y_i
     Compute the loss L(NN(x_i; θ), y_i)
     Compute the gradient of L, ∇L, with respect to θ
     θ ← θ - η ∇L
 done
 Return: θ
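
A minimal sketch of this loop in NumPy, assuming a grad_loss(theta, x, y) function supplies the gradient (a stand-in for backpropagation, described below); the fixed epoch count as the stopping criterion and the learning rate value are illustrative choices:

```python
import numpy as np

def sgd(theta, xs, ys, grad_loss, eta=0.01, epochs=10):
    """Plain SGD: repeatedly pick an example and step against its gradient."""
    rng = np.random.default_rng(0)
    for _ in range(epochs):                     # stopping criterion: fixed epoch count
        for i in rng.permutation(len(xs)):      # pick training examples in random order
            g = grad_loss(theta, xs[i], ys[i])  # gradient of L(NN(x_i; theta), y_i) w.r.t. theta
            theta = theta - eta * g             # theta <- theta - eta * grad L
    return theta
```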

  12. Training a Neural Network
 Define the loss function to be minimised as a node L.
 Goal: Learn weights for the neural network which minimise L.
 Gradient Descent: Find ∂L/∂w for every weight w, and update it as w ← w - η ∂L/∂w.
 How do we efficiently compute ∂L/∂w for all w? Will compute ∂L/∂u for every node u in the network!
 ∂L/∂w = ∂L/∂u · ∂u/∂w, where u is the node which uses w.

  13. Training a Neural Network
 New goal: compute ∂L/∂u for every node u in the network.
 Simple algorithm: Backpropagation.
 Key fact: chain rule of differentiation. If L can be written as a function of variables v_1, …, v_n, which in turn depend (partially) on another variable u, then ∂L/∂u = Σ_i ∂L/∂v_i · ∂v_i/∂u

  14. Backpropagation
 If L can be written as a function of variables v_1, …, v_n, which in turn depend (partially) on another variable u, then ∂L/∂u = Σ_i ∂L/∂v_i · ∂v_i/∂u.
 [Diagram: loss node L at the top; a node u below it, with the nodes v in the layer above u denoted Γ(u).]
 Consider v_1, …, v_n as the layer above u, Γ(u). Then the chain rule gives ∂L/∂u = Σ_{v ∈ Γ(u)} ∂L/∂v · ∂v/∂u

  15. Backpropagation
 ∂L/∂u = Σ_{v ∈ Γ(u)} ∂L/∂v · ∂v/∂u
 Forward Pass: First compute all values of u given an input, in a forward pass. (The values of each node will be needed during backprop.)
 Backpropagation:
 • Base case: ∂L/∂L = 1
 • For each u (top to bottom):
     For each v ∈ Γ(u): inductively, have computed ∂L/∂v; directly compute ∂v/∂u
     Compute ∂L/∂u
 • Compute ∂L/∂w, where ∂L/∂w = ∂L/∂u · ∂u/∂w
 (Values computed in the forward pass may be needed.)
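
A sketch of a forward pass followed by backprop on the small 2-2-1 network from the earlier slides, using sigmoid activations and a squared-error loss (the loss choice and all numeric values are illustrative assumptions); a finite-difference check at the end confirms the chain-rule gradient:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, W1, W2):
    # Forward pass: keep every node's value; backprop will need them
    a = sigmoid(x @ W1)            # hidden layer activations (a3, a4)
    y_hat = sigmoid(a @ W2)        # output node (a5)
    return a, y_hat

def backprop(x, y, W1, W2):
    a, y_hat = forward(x, W1, W2)
    L = 0.5 * (y_hat - y) ** 2               # squared-error loss (assumed)
    # Top to bottom: dL/du for every node u, then dL/dw = dL/du * du/dw
    d_yhat = y_hat - y                        # dL/d(y_hat)
    d_z2 = d_yhat * y_hat * (1 - y_hat)       # through the output sigmoid
    dW2 = a * d_z2                            # dL/dW2
    d_a = W2 * d_z2                           # dL/da for each hidden node (Γ(a) has one node)
    d_z1 = d_a * a * (1 - a)                  # through the hidden sigmoids
    dW1 = np.outer(x, d_z1)                   # dL/dW1
    return L, dW1, dW2

# Illustrative values (made up)
x, y = np.array([1.0, -2.0]), 1.0
W1 = np.array([[0.2, 0.7], [-0.5, 0.1]])
W2 = np.array([0.3, 0.9])

L, dW1, dW2 = backprop(x, y, W1, W2)

# Sanity-check one weight against a finite difference
eps = 1e-6
W1p = W1.copy(); W1p[0, 0] += eps
_, y1 = forward(x, W1p, W2)
_, y0 = forward(x, W1, W2)
num = (0.5 * (y1 - y) ** 2 - 0.5 * (y0 - y) ** 2) / eps
print(dW1[0, 0], num)   # the two should be close
```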

  16. Neural Network Acoustic Models
 • Input layer takes a window of acoustic feature vectors
 • Output layer corresponds to classes (e.g. monophone labels, triphone states, etc.), i.e. phone posteriors
 Image adapted from: Dahl et al., "Context-Dependent Pre-Trained Deep Neural Networks for Large-Vocabulary Speech Recognition", TASL'12

  17. Neural Network Acoustic Models
 • Input layer takes a window of acoustic feature vectors
 • Hybrid NN/HMM systems: replace GMMs with outputs of NNs
 Image from: Dahl et al., "Context-Dependent Pre-Trained Deep Neural Networks for Large-Vocabulary Speech Recognition", TASL'12
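
The NN outputs here are posteriors P(state | acoustics), while HMM decoding expects likelihoods; a common way to bridge the two in hybrid systems is to divide the posteriors by the state priors to obtain scaled likelihoods. The slide does not spell this out, so treat the following as an assumed sketch:

```python
import numpy as np

def scaled_log_likelihoods(log_posteriors, log_priors):
    """Convert NN state posteriors into scaled likelihoods for HMM decoding.

    log_posteriors: (num_frames, num_states) log P(state | acoustic window)
    log_priors:     (num_states,)            log P(state), e.g. from training alignments
    Returns log [ P(state | x) / P(state) ], proportional to log P(x | state).
    """
    return log_posteriors - log_priors

# Illustrative numbers only
log_post = np.log(np.array([[0.7, 0.2, 0.1],
                            [0.1, 0.8, 0.1]]))
log_prior = np.log(np.array([0.5, 0.3, 0.2]))
print(scaled_log_likelihoods(log_post, log_prior))
```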
