Understanding Convolutional Neural Networks

David Stutz | July 24th, 2014

  1. Understanding Convolutional Neural Networks. David Stutz, July 24th, 2014.

  2. Table of Contents:
     1 Motivation
     2 Neural Networks and Network Training: Multilayer Perceptrons; Network Training; Deep Learning
     3 Convolutional Networks
     4 Understanding Convolutional Networks: Deconvolutional Networks; Visualization
     5 Conclusion

  3. Table of Contents (section divider): Motivation.

  4. Motivation. Convolutional networks are specialized networks for applications in computer vision:
     ◮ they accept images as raw input (preserving spatial information),
     ◮ and they build up (learn) a hierarchy of features (no hand-crafted features necessary).
     Problem: the internal workings of convolutional networks are not well understood, which is an unsatisfactory state for evaluation and research! Idea: visualize the feature activations within the network.

  5. Table of Contents (section divider): Neural Networks and Network Training.

  6. Neural Networks and Network Training – Multilayer Perceptrons. A multilayer perceptron represents an adaptable model $y(\cdot, w)$ able to map $D$-dimensional input to $C$-dimensional output:
     $$y(\cdot, w) : \mathbb{R}^D \to \mathbb{R}^C, \quad x \mapsto y(x, w) = \begin{pmatrix} y_1(x, w) \\ \vdots \\ y_C(x, w) \end{pmatrix}. \tag{1}$$
     In general, an $(L+1)$-layer perceptron consists of $(L+1)$ layers, each layer $l$ computing linear combinations of the previous layer $(l-1)$ (or of the input).

  7. Neural Networks and Network Training – Multilayer Perceptrons. First layer: on input $x \in \mathbb{R}^D$, layer $l = 1$ computes a vector $y^{(1)} := (y^{(1)}_1, \dots, y^{(1)}_{m^{(1)}})$ where
     $$y^{(1)}_i = f\big(z^{(1)}_i\big) \quad \text{with} \quad z^{(1)}_i = \sum_{j=1}^{D} w^{(1)}_{i,j} x_j + w^{(1)}_{i,0}. \tag{2}$$
     The $i$-th component is called "unit $i$", $f$ is called the activation function, and the $w^{(1)}_{i,j}$ are adjustable weights.
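
To make equation (2) concrete, here is a minimal NumPy sketch of one fully connected layer; the names `layer_forward`, `W`, and `b` are illustrative choices, not from the slides:

```python
import numpy as np

def layer_forward(x, W, b, f):
    """Compute y = f(Wx + b) for one fully connected layer, cf. eq. (2).

    x: input vector of shape (D,)
    W: weight matrix of shape (m, D), with W[i, j] = w_{i,j}
    b: bias vector of shape (m,), with b[i] = w_{i,0}
    f: element-wise activation function
    """
    z = W @ x + b  # z_i = sum_j w_{i,j} x_j + w_{i,0}
    return f(z)    # apply the activation function unit-wise
```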

  8. Neural Networks and Network Training – Multilayer Perceptrons. First layer, continued: what does this mean? Layer $l = 1$ computes linear combinations of the input and applies a (non-linear) activation function. The first layer can thus be interpreted as a generalized linear model:
     $$y^{(1)}_i = f\Big( \big(w^{(1)}_i\big)^T x + w^{(1)}_{i,0} \Big). \tag{3}$$
     Idea: recursively apply $L$ additional layers to the output $y^{(1)}$ of the first layer.

  9. Neural Networks and Network Training – Multilayer Perceptrons. Further layers: in general, layer $l$ computes a vector $y^{(l)} := (y^{(l)}_1, \dots, y^{(l)}_{m^{(l)}})$ as follows:
     $$y^{(l)}_i = f\big(z^{(l)}_i\big) \quad \text{with} \quad z^{(l)}_i = \sum_{j=1}^{m^{(l-1)}} w^{(l)}_{i,j} y^{(l-1)}_j + w^{(l)}_{i,0}. \tag{4}$$
     Thus, layer $l$ computes linear combinations of the output of layer $(l-1)$ and applies an activation function.
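
Stacking equation (4) over all layers gives the full forward pass; a hedged sketch (names are again illustrative):

```python
import numpy as np

def mlp_forward(x, weights, biases, activations):
    """Forward pass of an (L+1)-layer perceptron, cf. eqs. (2), (4), (5).

    weights, biases, activations: lists of length L+1; entry l holds the
    weight matrix, bias vector, and activation function of layer l+1.
    """
    y = x
    for W, b, f in zip(weights, biases, activations):
        y = f(W @ y + b)  # layer l: linear combination of layer (l-1), then activation
    return y  # output of layer (L+1), i.e. y(x, w)
```

With the last entry of `activations` set to a softmax, the result can be read as posterior probabilities (see equation (8) below).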

  10. Neural Networks and Network Training – Multilayer Perceptrons. Output layer: layer $(L+1)$ is called the output layer because it computes the output of the multilayer perceptron:
     $$y(x, w) = \begin{pmatrix} y_1(x, w) \\ \vdots \\ y_C(x, w) \end{pmatrix} := \begin{pmatrix} y^{(L+1)}_1 \\ \vdots \\ y^{(L+1)}_C \end{pmatrix} \tag{5}$$
     where $C = m^{(L+1)}$ is the number of output dimensions.

  11. Neural Networks and Network Training – Multilayer Perceptrons. Network graph. [Figure: layered network graph connecting the input $x_1, \dots, x_D$ to the first-layer units $y^{(1)}_1, \dots, y^{(1)}_{m^{(1)}}$, through the $L$-th-layer units $y^{(L)}_1, \dots, y^{(L)}_{m^{(L)}}$, to the output units $y^{(L+1)}_1, \dots, y^{(L+1)}_C$.]

  12. Neural Networks and Network Training – Multilayer Perceptrons. Activation functions – notions: how to choose the activation function $f$ in each layer?
     ◮ Non-linear activation functions increase the expressive power: multilayer perceptrons with $L + 1 \ge 2$ layers are universal approximators [HSW89]!
     ◮ Depending on the application: for classification we may want to interpret the output as posterior probabilities,
     $$y_i(x, w) \overset{!}{=} p(c = i \mid x) \tag{6}$$
     where $c$ denotes the random variable for the class.

  13. Neural Networks and Network Training – Multilayer Perceptrons. Activation functions: usually the activation function is chosen to be the logistic sigmoid
     $$\sigma(z) = \frac{1}{1 + \exp(-z)}$$
     which is non-linear, monotonic, and differentiable. [Figure: plot of $\sigma(z)$ for $z \in [-2, 2]$, rising from 0 towards 1.]
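
A small NumPy sketch of the sigmoid; the rewrite in terms of $\exp(-|z|)$ is a standard numerical precaution, not something discussed on the slide:

```python
import numpy as np

def sigmoid(z):
    """Logistic sigmoid sigma(z) = 1 / (1 + exp(-z)), element-wise.

    exp(-|z|) lies in (0, 1], so neither branch can overflow:
    for z >= 0, sigma(z) = 1 / (1 + e); for z < 0, sigma(z) = e / (1 + e).
    """
    z = np.asarray(z, dtype=float)
    e = np.exp(-np.abs(z))
    return np.where(z >= 0, 1.0 / (1.0 + e), e / (1.0 + e))
```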

  14. Neural Networks and Network Training – Multilayer Perceptrons. Activation functions, continued: alternatively, the hyperbolic tangent
     $$\tanh(z) \tag{7}$$
     is used frequently. For classification with $C > 1$ classes, layer $(L+1)$ uses the softmax activation function:
     $$y^{(L+1)}_i = \sigma\big(z^{(L+1)}, i\big) = \frac{\exp\big(z^{(L+1)}_i\big)}{\sum_{k=1}^{C} \exp\big(z^{(L+1)}_k\big)}. \tag{8}$$
     Then, the output can be interpreted as posterior probabilities.
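
Equation (8) in NumPy; subtracting the maximum before exponentiating leaves the result unchanged (the common factor cancels) and is a standard stability trick, not part of the slide:

```python
import numpy as np

def softmax(z):
    """Softmax of eq. (8): y_i = exp(z_i) / sum_k exp(z_k)."""
    z = np.asarray(z, dtype=float)
    e = np.exp(z - z.max())  # shift-invariant: exp(z - c) / sum_k exp(z_k - c)
    return e / e.sum()
```

The resulting entries are positive and sum to one, which is what allows reading them as the posterior probabilities $p(c = i \mid x)$ of equation (6).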

  15. Neural Networks and Network Training – Network Training. Notions: by now, we have a general model $y(\cdot, w)$ depending on $W$ weights. Idea: learn the weights to perform
     ◮ regression,
     ◮ or classification.
     We focus on classification.

  16. Neural Networks and Network Training – Network Training. Training set: given a training set (with $C$ classes and targets $t_n$ in the $1$-of-$C$ coding scheme)
     $$T_S = \{ (x_n, t_n) : 1 \le n \le N \}, \tag{9}$$
     learn the mapping represented by $T_S$ by minimizing the squared error
     $$E(w) = \sum_{n=1}^{N} E_n(w) = \sum_{n=1}^{N} \sum_{i=1}^{C} \big( y_i(x_n, w) - t_{n,i} \big)^2 \tag{10}$$
     using iterative optimization.
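
A hedged sketch of equations (9) and (10); the helper names `one_of_C` and `squared_error` are made up for illustration:

```python
import numpy as np

def one_of_C(labels, C):
    """Encode integer class labels 0, ..., C-1 as 1-of-C target vectors t_n."""
    T = np.zeros((len(labels), C))
    T[np.arange(len(labels)), labels] = 1.0
    return T

def squared_error(Y, T):
    """Squared error of eq. (10): sum over samples n and classes i.

    Y: model outputs, shape (N, C), row n is y(x_n, w)
    T: targets in 1-of-C coding, shape (N, C)
    """
    return float(np.sum((Y - T) ** 2))
```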

  17. Neural Networks and Network Training – Network Training. Training protocols: we distinguish
     Stochastic training: a training sample $(x_n, t_n)$ is chosen at random, and the weights $w$ are updated to minimize $E_n(w)$.
     Batch and mini-batch training: a set $M \subseteq \{1, \dots, N\}$ of training samples is chosen, and the weights $w$ are updated based on the cumulative error $E_M(w) = \sum_{n \in M} E_n(w)$.
     Of course, online training is possible as well.
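
The protocols differ only in how many error terms $E_n$ enter one weight update. A minimal sketch, assuming plain gradient-descent updates (the update rule is not specified on this slide) and a placeholder `grad_En(w, n)` that returns the gradient of $E_n(w)$:

```python
import numpy as np

def train(w, grad_En, N, epochs=10, batch_size=1, lr=0.1, seed=0):
    """Iterative optimization: batch_size=1 gives stochastic training,
    batch_size=N batch training, anything in between mini-batch training."""
    rng = np.random.default_rng(seed)
    for _ in range(epochs):
        order = rng.permutation(N)  # visit the training samples in random order
        for start in range(0, N, batch_size):
            M = order[start:start + batch_size]
            g = sum(grad_En(w, n) for n in M)  # gradient of E_M(w) = sum_{n in M} E_n(w)
            w = w - lr * g                     # gradient-descent step
    return w
```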
