CS480/680 Lecture 15: June 26, 2019 Deep Neural Networks [GBC]


  1. CS480/680 Lecture 15: June 26, 2019 Deep Neural Networks [GBC] Chap. 6, 7, 8

  2. Outline
• Deep Neural Networks
  – Gradient Vanishing
    • Rectified linear units
  – Overfitting
    • Dropout
• Breakthroughs
  – Acoustic modeling in speech recognition
  – Image recognition

  3. Deep Neural Networks
• Definition: neural network with many hidden layers
• Advantage: high expressivity
• Challenges:
  – How should we train a deep neural network?
  – How can we avoid overfitting?

  4. Expressiveness
• Neural networks with one hidden layer of sigmoid/hyperbolic units can approximate arbitrarily closely neural networks with several layers of sigmoid/hyperbolic units
• However, as we increase the number of layers, the number of units needed may decrease exponentially (with the number of layers)

  5. Example – Parity Function
• Single layer of hidden nodes:
  y = 1 if an odd number of the inputs x1, …, xn are on, y = −1 otherwise
  – One hidden threshold unit per odd subset of the inputs, i.e. on the order of 2^(n−1) hidden units for n inputs
(Figure: shallow network with a single hidden layer of threshold units, one per odd subset, over the n inputs.)

  6. Example – Parity Function
• 2n − 2 layers of hidden nodes:
  y = 1 if an odd number of the inputs x1, …, xn are on, y = −1 otherwise
  – Combine the inputs two at a time: each stage computes the parity of a pair ("odd subsets" of size 2) with a constant number of threshold units, so the total number of units grows only linearly with n at the cost of depth (see the sketch below)
(Figure: deep chain of small parity blocks over the inputs x1, x2, …, xn.)
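
A minimal code sketch of the chained construction above, under stated assumptions: it uses {0, 1} values instead of the slide's ±1 outputs, hand-set thresholds for a two-input XOR block, and chains n − 1 such blocks, so the number of units grows linearly with n while the depth also grows linearly.

```python
import numpy as np

def step(z):
    """Hard threshold unit: 1.0 if z > 0, else 0.0."""
    return (z > 0).astype(float)

def xor_block(a, b):
    """XOR of two {0,1} values with one small hidden layer of threshold units:
    h1 fires for (a OR b), h2 fires for (a AND b), output fires for h1 AND NOT h2."""
    h1 = step(a + b - 0.5)       # a OR b
    h2 = step(a + b - 1.5)       # a AND b
    return step(h1 - h2 - 0.5)   # (a OR b) AND NOT (a AND b)

def deep_parity(x):
    """Parity of an n-bit vector by chaining n - 1 XOR blocks:
    O(n) units in total, but the depth grows linearly with n."""
    acc = x[0]
    for bit in x[1:]:
        acc = xor_block(acc, bit)
    return acc  # 1.0 if an odd number of inputs are on, else 0.0

x = np.array([1.0, 0.0, 1.0, 1.0, 0.0])
print(deep_parity(x))  # 1.0, since three inputs are on (odd)
```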

  7. The power of depth (practice)
• Challenge: how to train deep NNs?

  8. Speech
• 2006 (Hinton et al.): first effective algorithm for deep NNs
  – layerwise training of Stacked Restricted Boltzmann Machines (SRBMs)
• 2009: Breakthrough in acoustic modeling
  – replace Gaussian Mixture Models by SRBMs
  – Improved speech recognition at Google, Microsoft, IBM
• 2013-today: recurrent neural nets (LSTM)
  – Google error rate: 23% (2013) → 8% (2015)
  – Microsoft error rate: 5.9% (Oct 17, 2016), same as human performance

  9. Image Classification
• ImageNet Large Scale Visual Recognition Challenge (ILSVRC) classification error (%):

    NEC (2010)            Features + SVMs                 28.2
    XRCE (2011)           Features + SVMs                 25.8
    AlexNet (2012)        Deep convolutional neural net   16.4
    ZF (2013)             Deep convolutional neural net   11.7
    VGG (2014)            Deep convolutional neural net    7.3
    GoogLeNet (2014)      Deep convolutional neural net    6.7
    ResNet (2015)         Deep convolutional neural net    3.57
    GoogLeNet-v4 (2016)   Deep convolutional neural net    3.07
    Human                                                  5.1

  – Network depth grew over the same period, from 8 layers (AlexNet) to 152 layers (ResNet)

  10. Vanishing Gradients
• Deep neural networks of sigmoid and hyperbolic units often suffer from vanishing gradients
(Figure: gradient magnitude across a deep network – large near the output, medium in the middle layers, small in the early layers.)

  11. Sigmoid and hyperbolic units
• Derivative is always less than 1
(Figure: sigmoid and hyperbolic tangent functions and their derivatives.)

  12. Simple Example
• Network: a chain of sigmoid units
  y = σ(w4 σ(w3 σ(w2 σ(w1 x))))
  with hidden units h1, h2, h3 and pre-activations a_k = w_k h_{k−1} (h0 = x)
• Common weight initialization in (−1, 1)
• Sigmoid function and its derivative always less than 1
• This leads to vanishing gradients:
  ∂y/∂w4 = σ'(a4) h3
  ∂y/∂w3 = σ'(a4) w4 σ'(a3) h2
  ∂y/∂w2 = σ'(a4) w4 σ'(a3) w3 σ'(a2) h1
  ∂y/∂w1 = σ'(a4) w4 σ'(a3) w3 σ'(a2) w2 σ'(a1) x
• As the product of factors less than 1 gets longer, the gradient vanishes
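
A minimal numeric sketch of the chain above, assuming a scalar input x = 1 and weights drawn uniformly from (−1, 1) as on the slide: the backward pass multiplies in one σ'(a)·w factor per extra layer, so the gradient with respect to w1 comes out orders of magnitude smaller than the gradient with respect to w4.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
w = rng.uniform(-1, 1, size=4)   # w1..w4 initialized in (-1, 1)
x = 1.0

# Forward pass through y = sigmoid(w4 * sigmoid(w3 * sigmoid(w2 * sigmoid(w1 * x))))
a, h = [], [x]
for wi in w:
    a.append(wi * h[-1])
    h.append(sigmoid(a[-1]))

# Backward pass: dy/dw_k = sigmoid'(a4) w4 ... sigmoid'(a_{k+1}) w_{k+1} sigmoid'(a_k) h_{k-1}
grads, upstream = [], 1.0
for k in reversed(range(4)):
    local = sigmoid(a[k]) * (1 - sigmoid(a[k]))  # sigmoid'(a_k) <= 0.25
    grads.append(upstream * local * h[k])        # dy/dw_{k+1} (0-indexed weight w[k])
    upstream = upstream * local * w[k]           # propagate through that weight
grads = grads[::-1]
print(grads)  # |dy/dw1| is orders of magnitude smaller than |dy/dw4|
```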

  13. Avoiding Vanishing Gradients
• Several popular solutions:
  – Pre-training
  – Rectified linear units and maxout units
  – Skip connections (see the sketch below)
  – Batch normalization
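
Of these remedies, skip connections are the easiest to illustrate in a few lines. Below is a minimal scalar sketch, assuming the simple residual form out = h + σ(w·h) (an illustration, not the slide's exact architecture): the identity path adds 1 to each layer's local derivative, so the product of derivatives across many layers no longer collapses toward zero.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def plain_layer(h, w):
    """Plain layer: out = sigmoid(w*h); local derivative sigmoid'(w*h)*w has magnitude < 0.25."""
    return sigmoid(w * h)

def residual_layer(h, w):
    """Layer with a skip connection: out = h + sigmoid(w*h);
    local derivative is 1 + sigmoid'(w*h)*w, kept near 1 by the identity path."""
    return h + sigmoid(w * h)

# Product of local derivatives across 20 layers (scalar illustration)
rng = np.random.default_rng(1)
ws = rng.uniform(-1, 1, size=20)
h_plain = h_res = 0.5
g_plain = g_res = 1.0
for w in ws:
    g_plain *= sigmoid(w * h_plain) * (1 - sigmoid(w * h_plain)) * w
    h_plain = plain_layer(h_plain, w)
    g_res *= 1 + sigmoid(w * h_res) * (1 - sigmoid(w * h_res)) * w
    h_res = residual_layer(h_res, w)
print(abs(g_plain), abs(g_res))  # the plain product is vanishingly small; the residual one is not
```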

  14. Rectified Linear Units
• Rectified linear: h(a) = max(0, a)
  – Gradient is 0 or 1
  – Sparse computation
• Soft version ("Softplus"): h(a) = log(1 + e^a)
(Figure: softplus and rectified linear activation curves.)
• Warning: softplus does not prevent gradient vanishing (gradient < 1)
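
A small sketch of the two activations and their derivatives (for illustration): the rectified linear gradient is exactly 0 or 1, while the softplus gradient is the sigmoid, always strictly below 1, which is the reason for the warning above.

```python
import numpy as np

def relu(a):
    """Rectified linear: h(a) = max(0, a); derivative is 0 or 1."""
    return np.maximum(0.0, a)

def relu_grad(a):
    return (a > 0).astype(float)

def softplus(a):
    """Soft version: h(a) = log(1 + e^a); its derivative is sigmoid(a) < 1,
    so softplus by itself does not prevent gradient vanishing."""
    return np.log1p(np.exp(a))

def softplus_grad(a):
    return 1.0 / (1.0 + np.exp(-a))   # sigmoid(a), strictly between 0 and 1

a = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(relu_grad(a))      # [0. 0. 0. 1. 1.]
print(softplus_grad(a))  # values in (0, 1), never reaching 1
```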

  15. Maxout Units
• Generalization of rectified linear units:
  h(x) = max( Σ_i w_i1 x_i , Σ_i w_i2 x_i , … , Σ_i w_ik x_i )
  – A max unit over k linear (identity-activation) units of the inputs x1, …, xd
(Figure: k identity units over the inputs feeding a single max unit.)
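
A minimal sketch of a maxout unit as reconstructed above (the weights and the choice of k = 2 pieces are assumptions for illustration): each piece is a linear, identity-activation unit, and the maxout output is their maximum; fixing one piece to the zero function recovers a rectified linear unit.

```python
import numpy as np

def maxout(x, W, b):
    """Maxout unit: h(x) = max_j (w_j . x + b_j) over k linear pieces.
    W has shape (k, d), b has shape (k,)."""
    return np.max(W @ x + b, axis=0)

x = np.array([1.0, -2.0])
W = np.array([[0.0, 0.0],    # piece 1: the constant 0 function
              [1.0, 1.0]])   # piece 2: x1 + x2
b = np.zeros(2)
print(maxout(x, W, b))  # max(0, x1 + x2) = 0.0, i.e. a ReLU applied to x1 + x2
```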

  16. Overfitting
• High expressivity increases the risk of overfitting
  – # of parameters is often larger than the amount of data
• Some solutions:
  – Regularization
  – Dropout
  – Data augmentation

  17. Dropout
• Idea: randomly "drop" some units from the network when training
• Training: at each iteration of gradient descent
  – Each input unit is dropped with probability p1 (e.g., 0.2)
  – Each hidden unit is dropped with probability p2 (e.g., 0.5)
• Prediction (testing):
  – Multiply each input unit by 1 − p1
  – Multiply each hidden unit by 1 − p2

  18. Dropout Algorithm
• Training (let ⨀ denote elementwise multiplication):
  Repeat
    – For each training example (x_n, y_n) do
      • Sample a mask m_n^(l) from Bernoulli(1 − p_l) for each layer 1 ≤ l ≤ L
      • Neural network with dropout applied:
        ŷ(x_n, m_n; W) = h_L(W_L (… h_2(W_2 (h_1(W_1 (x_n ⨀ m_n^(1))) ⨀ m_n^(2))) …))
      • Loss: Err(y_n, ŷ(x_n, m_n; W))
      • Update: w_ij ← w_ij − η ∂Loss/∂w_ij
    – End for
  Until convergence
• Prediction: ŷ(x_n; W) = h_L(W_L (… h_2(W_2 (h_1(W_1 x_n (1 − p1)) (1 − p2))) …))
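
A minimal NumPy sketch of the masking and scaling in slides 17-18 (the layer sizes, sigmoid activations, and toy input are assumptions for illustration; only the forward pass is shown, and the gradient update is omitted):

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Assumed toy shapes: 4 inputs, one hidden layer of 8 units, 1 output.
W1, W2 = rng.normal(size=(8, 4)), rng.normal(size=(1, 8))
p1, p2 = 0.2, 0.5                  # drop probabilities for inputs and hidden units

def forward_train(x):
    """Training: sample a Bernoulli keep-mask per unit and multiply it in elementwise."""
    m0 = rng.binomial(1, 1 - p1, size=x.shape)     # keep each input with prob. 1 - p1
    h1 = sigmoid(W1 @ (x * m0))
    m1 = rng.binomial(1, 1 - p2, size=h1.shape)    # keep each hidden unit with prob. 1 - p2
    return sigmoid(W2 @ (h1 * m1))

def forward_test(x):
    """Prediction: no sampling; scale each layer's output by its keep probability."""
    h1 = sigmoid(W1 @ (x * (1 - p1)))
    return sigmoid(W2 @ (h1 * (1 - p2)))

x = rng.normal(size=4)
print(forward_train(x), forward_test(x))
```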

  19. Intuition
• Dropout can be viewed as an approximate form of ensemble learning
• In each training iteration, a different subnetwork is trained
• At test time, these subnetworks are "merged" by averaging their weights

  20. Applications of Deep Neural Networks
• Speech recognition
• Image recognition
• Machine translation
• Control
• Any application of shallow neural networks

  21. Acoustic Modeling in Speech Recognition

  22. Acoustic Modeling in Speech Recognition

  23. Image Recognition
• Convolutional Neural Network
  – With rectified linear units and dropout
  – Data augmentation for transformation invariance (see the sketch below)
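
A small sketch of the kind of data augmentation referred to above (the flip probability and crop size are assumptions for illustration): each training image is randomly flipped and cropped so the network sees many transformed versions of the same example.

```python
import numpy as np

rng = np.random.default_rng(0)

def augment(image, crop=24):
    """Random horizontal flip plus a random crop.
    image: (H, W, C) array, e.g. 32x32x3; returns a (crop, crop, C) array."""
    if rng.random() < 0.5:
        image = image[:, ::-1, :]                 # horizontal flip
    h, w, _ = image.shape
    top = rng.integers(0, h - crop + 1)
    left = rng.integers(0, w - crop + 1)
    return image[top:top + crop, left:left + crop, :]

img = rng.random((32, 32, 3))
print(augment(img).shape)  # (24, 24, 3)
```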

  24. ImageNet Breakthrough
• Results: ILSVRC-2012
• From Krizhevsky, Sutskever, Hinton

  25. ImageNet Breakthrough
• From Krizhevsky, Sutskever, Hinton
