Understanding Convolutional Neural Networks
David Stutz | July 24th, 2014
Table of Contents

1 Motivation
2 Neural Networks and Network Training
  - Multilayer Perceptrons
  - Network Training
3 Convolutional Networks
  - Deep Learning
4 Understanding Convolutional Networks
  - Deconvolutional Networks
  - Visualization
5 Conclusion
Motivation

Convolutional networks are specialized networks for applications in computer vision:
◮ they accept images as raw input (preserving spatial information),
◮ and learn a hierarchy of features (no hand-crafted features necessary).

Problem: the internal workings of convolutional networks are not well understood ...
◮ An unsatisfactory state for evaluation and research!

Idea: visualize the feature activations within the network ...
Multilayer Perceptrons

A multilayer perceptron represents an adaptable model $y(\cdot, w)$ able to map $D$-dimensional input to $C$-dimensional output:

$$y(\cdot, w) : \mathbb{R}^D \to \mathbb{R}^C, \quad x \mapsto y(x, w) = (y_1(x, w), \ldots, y_C(x, w))^T. \quad (1)$$

In general, an $(L+1)$-layer perceptron consists of $(L+1)$ layers, each layer $l$ computing linear combinations of the previous layer $(l-1)$ (or of the input).
Multilayer Perceptrons – First Layer

On input $x \in \mathbb{R}^D$, layer $l = 1$ computes a vector $y^{(1)} := (y^{(1)}_1, \ldots, y^{(1)}_{m^{(1)}})$ where

$$y^{(1)}_i = f\left(z^{(1)}_i\right) \quad \text{with} \quad z^{(1)}_i = \sum_{j=1}^{D} w^{(1)}_{i,j} x_j + w^{(1)}_{i,0}. \quad (2)$$

The $i$-th component is called "unit $i$", $f$ is called the activation function, and the $w^{(1)}_{i,j}$ are adjustable weights.
Multilayer Perceptrons – First Layer

What does this mean? Layer $l = 1$ computes linear combinations of the input and applies a (non-linear) activation function ... The first layer can be interpreted as a generalized linear model:

$$y^{(1)}_i = f\left(\left(w^{(1)}_i\right)^T x + w^{(1)}_{i,0}\right). \quad (3)$$

Idea: recursively apply $L$ additional layers on the output $y^{(1)}$ of the first layer.
Multilayer Perceptrons – Further Layers

In general, layer $l$ computes a vector $y^{(l)} := (y^{(l)}_1, \ldots, y^{(l)}_{m^{(l)}})$ as follows:

$$y^{(l)}_i = f\left(z^{(l)}_i\right) \quad \text{with} \quad z^{(l)}_i = \sum_{j=1}^{m^{(l-1)}} w^{(l)}_{i,j} y^{(l-1)}_j + w^{(l)}_{i,0}. \quad (4)$$

Thus, layer $l$ computes linear combinations of layer $(l-1)$ and applies an activation function ...
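The layer computation in equation (4) can be sketched in NumPy as follows; the function and variable names are illustrative, not from the slides, and the logistic sigmoid stands in for a generic activation $f$:

```python
import numpy as np

def sigmoid(z):
    """Logistic sigmoid, a common choice for the activation f."""
    return 1.0 / (1.0 + np.exp(-z))

def layer_forward(y_prev, W, b, f=sigmoid):
    """Compute y^(l) = f(z^(l)) with z^(l) = W y^(l-1) + b, cf. equation (4).

    y_prev: output of layer (l-1), shape (m^(l-1),)
    W:      weights w_{i,j}^(l), shape (m^(l), m^(l-1))
    b:      biases w_{i,0}^(l), shape (m^(l),)
    """
    z = W @ y_prev + b
    return f(z)

# Example: a layer with 3 units reading a 2-dimensional input.
rng = np.random.default_rng(0)
x = np.array([1.0, -1.0])
W = rng.standard_normal((3, 2))
b = np.zeros(3)
y1 = layer_forward(x, W, b)
```

Stacking such calls, feeding each layer's output into the next, yields the full forward pass of the multilayer perceptron.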
Multilayer Perceptrons – Output Layer

Layer $(L+1)$ is called the output layer because it computes the output of the multilayer perceptron:

$$y(x, w) = y^{(L+1)} := \left(y^{(L+1)}_1, \ldots, y^{(L+1)}_C\right)^T = (y_1(x, w), \ldots, y_C(x, w))^T \quad (5)$$

where $C = m^{(L+1)}$ is the number of output dimensions.
Network Graph

[Figure: network graph of an $(L+1)$-layer perceptron, from the input units $x_1, \ldots, x_D$ through the hidden layers $y^{(1)}, \ldots, y^{(L)}$ to the output units $y^{(L+1)}_1, \ldots, y^{(L+1)}_C$.]
Activation Functions – Notions

How to choose the activation function $f$ in each layer?
◮ Non-linear activation functions increase the expressive power: multilayer perceptrons with $L + 1 \geq 2$ layers are universal approximators [HSW89]!
◮ Depending on the application: for classification we may want to interpret the output as posterior probabilities:

$$y_i(x, w) \overset{!}{=} p(c = i \mid x) \quad (6)$$

where $c$ denotes the random variable for the class.
Activation Functions

Usually the activation function is chosen to be the logistic sigmoid:

$$\sigma(z) = \frac{1}{1 + \exp(-z)}$$

which is non-linear, monotonic and differentiable.

[Figure: plot of $\sigma(z)$ for $z \in [-2, 2]$.]
Activation Functions

Alternatively, the hyperbolic tangent is used frequently:

$$\tanh(z). \quad (7)$$

For classification with $C > 1$ classes, layer $(L+1)$ uses the softmax activation function:

$$y^{(L+1)}_i = \sigma(z^{(L+1)}, i) = \frac{\exp\left(z^{(L+1)}_i\right)}{\sum_{k=1}^{C} \exp\left(z^{(L+1)}_k\right)}. \quad (8)$$

Then, the output can be interpreted as posterior probabilities.
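The softmax of equation (8) is a one-liner in NumPy; subtracting the maximum before exponentiating is a standard numerical-stability trick and does not change the result:

```python
import numpy as np

def softmax(z):
    """Softmax as in equation (8); shifting by max(z) avoids overflow."""
    e = np.exp(z - np.max(z))
    return e / np.sum(e)

# Example: three classes, largest logit gets the largest probability.
y = softmax(np.array([2.0, 1.0, 0.1]))
```

The entries of the result are positive and sum to one, so they can be read as posterior class probabilities.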
Network Training – Notions

By now we have a general model $y(\cdot, w)$ depending on $W$ weights. Idea: learn the weights to perform
◮ regression,
◮ or classification.

We focus on classification.
Network Training – Training Set

Given a training set

$$U_S = \{(x_n, t_n) : 1 \leq n \leq N\} \quad (9)$$

(for $C$ classes, the targets $t_n$ use the 1-of-$C$ coding scheme), learn the mapping represented by $U_S$ ... by minimizing the squared error

$$E(w) = \sum_{n=1}^{N} E_n(w) = \sum_{n=1}^{N} \sum_{i=1}^{C} (y_i(x_n, w) - t_{n,i})^2 \quad (10)$$

using iterative optimization.
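The squared error of equation (10) over a batch of outputs and 1-of-$C$ targets can be sketched as follows (the data values are illustrative):

```python
import numpy as np

def squared_error(Y, T):
    """Sum-of-squares error E(w) from equation (10).

    Y: network outputs y_i(x_n, w), shape (N, C)
    T: 1-of-C target vectors t_n, shape (N, C)
    """
    return float(np.sum((Y - T) ** 2))

# Two samples, three classes; targets use the 1-of-C coding scheme.
T = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0]])
Y = np.array([[0.8, 0.1, 0.1],
              [0.2, 0.7, 0.1]])
E = squared_error(Y, T)  # 0.06 + 0.14 = 0.2
```

Each row contributes the squared distance between the output vector and its target vector, matching the double sum over $n$ and $i$ in equation (10).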
Training Protocols

We distinguish ...

Stochastic Training: a training sample $(x_n, t_n)$ is chosen at random, and the weights $w$ are updated to minimize $E_n(w)$.

Batch and Mini-Batch Training: a set $M \subseteq \{1, \ldots, N\}$ of training samples is chosen, and the weights $w$ are updated based on the cumulative error $E_M(w) = \sum_{n \in M} E_n(w)$.

Of course, online training is possible as well.
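Mini-batch training as described above can be sketched on a toy linear model, where the gradient of $E_M(w)$ is available in closed form; the learning rate and data here are illustrative, not from the slides:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy data: N = 100 samples with linear targets plus small noise.
X = rng.standard_normal((100, 3))
w_true = np.array([1.0, -2.0, 0.5])
t = X @ w_true + 0.01 * rng.standard_normal(100)

w = np.zeros(3)
gamma = 0.05  # learning rate

for step in range(200):
    M = rng.choice(100, size=10, replace=False)   # mini-batch indices
    residual = X[M] @ w - t[M]
    grad = 2.0 * X[M].T @ residual                # gradient of E_M(w)
    w -= gamma * grad / len(M)                    # gradient-descent step
```

Setting the batch size to 1 recovers stochastic training; setting $M = \{1, \ldots, N\}$ recovers full batch training.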