

  1. TensorFlow Workshop 2018. Understanding Neural Networks, Part I: Artificial Neurons and Network Optimization. Nick Winovich, Department of Mathematics, Purdue University. July 2018.

  2. Outline. 1 Neural Networks: Artificial Neurons and Hidden Layers; Universal Approximation Theorem; Regularization and Batch Norm. 2 Network Optimization: Evaluating Network Performance; Stochastic Gradient Descent Algorithms; Backprop and Automatic Differentiation.

  5. Artificial Neural Networks. Neural networks are a class of simple, yet effective, “computing systems” with a diverse range of applications. In these systems, small computational units, or nodes, are arranged to form networks in which connectivity is leveraged to carry out complex calculations. References: Deep Learning by Goodfellow, Bengio, and Courville: http://www.deeplearningbook.org/ ; Convolutional Neural Networks for Visual Recognition at Stanford: http://cs231n.stanford.edu/

  6. Artificial Neurons. Weights are first used to scale the inputs; the results are summed with a bias term and passed through an activation function. (Diagram modified from a Stack Exchange post answered by Gonzalo Medina.)

  7. Formula and Vector Representation. The diagram from the previous slide can be interpreted as y = f(x_1 · w_1 + x_2 · w_2 + x_3 · w_3 + b), which can be conveniently represented in vector form as y = f(w^T x + b) by interpreting the neuron inputs and weights as column vectors.
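
The vector form above can be made concrete with a short example. The following is a minimal NumPy sketch of a single artificial neuron; the input values, weights, bias, and the ReLU activation are placeholder choices for illustration, not values taken from the slides.

```python
import numpy as np

def neuron(x, w, b, f):
    """Single artificial neuron: y = f(w^T x + b)."""
    return f(np.dot(w, x) + b)

# Example usage with arbitrary placeholder values
x = np.array([1.0, -2.0, 0.5])       # inputs x_1, x_2, x_3
w = np.array([0.3, 0.8, -0.5])       # weights w_1, w_2, w_3
b = 0.1                              # bias term
relu = lambda z: np.maximum(z, 0.0)  # activation function
y = neuron(x, w, b, relu)
```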

  8. Artificial Neurons: Multiple Outputs

  9. Matrix Representation. This corresponds to a pair of equations, one for each output: y_1 = f(w_1^T x + b_1) and y_2 = f(w_2^T x + b_2), which can be represented in matrix form by the system y = f(W x + b), where we assume the activation function has been vectorized.
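
A minimal NumPy sketch of this matrix form, with the activation applied elementwise; the layer sizes and the tanh activation below are assumptions made for illustration.

```python
import numpy as np

def dense_layer(x, W, b, f):
    """Fully-connected layer: y = f(W x + b), with f applied elementwise."""
    return f(W @ x + b)

# Example: 3 inputs mapped to 2 outputs (placeholder values)
x = np.array([1.0, -2.0, 0.5])
W = np.random.randn(2, 3)   # weight matrix, one row of weights per output
b = np.zeros(2)             # bias vector
y = dense_layer(x, W, b, np.tanh)
```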

  10. Fully-Connected Neural Layers. The resulting layers, referred to as fully-connected or dense, can be visualized as a collection of nodes connected by edges corresponding to the weights (biases and activations are typically omitted from the diagram).

  11. Floating Point Operation Count: Matrix-Vector Multiplication. For a weight matrix W ∈ R^{M×N} and input x ∈ R^N, computing W x requires Mult: MN operations to form the products w_{i1} · x_1, ..., w_{iN} · x_N in each of the M rows, and Add: M(N − 1) operations to sum those N products row-wise, yielding entries w_{i1} · x_1 + ... + w_{iN} · x_N for i = 1, ..., M.

  12. Floating Point Operation Count. So we see that when bias terms are omitted, the FLOPs required for a neural connection between N inputs and M outputs total 2MN − M, i.e. MN multiplies + M(N − 1) adds. When bias terms are included, an additional M addition operations are required, resulting in a total of 2MN FLOPs. Note: this omits the computation required for applying the activation function to the M values resulting from the linear operations. Depending on the activation function selected, this may or may not have a significant impact on the overall computational complexity.
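
As a quick sanity check of these counts, here is a small Python helper (the function name and interface are my own, not from the workshop) that tallies the multiplies and adds for a dense connection between N inputs and M outputs.

```python
def dense_layer_flops(n_inputs, m_outputs, include_bias=True):
    """Count FLOPs for one fully-connected layer (activation cost excluded)."""
    mults = m_outputs * n_inputs        # one multiply per weight entry
    adds = m_outputs * (n_inputs - 1)   # summing N products for each output
    if include_bias:
        adds += m_outputs               # one extra add per output for the bias
    return mults + adds

print(dense_layer_flops(3, 2))                      # 12 = 2*M*N with biases
print(dense_layer_flops(3, 2, include_bias=False))  # 10 = 2*M*N - M
```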

  13. Activation Functions. Activation functions are a fundamental component of neural network architectures; these functions are responsible for providing all of the network’s non-linear modeling capacity and for controlling the gradient flows that guide the training process. While activation functions play a fundamental role in all neural networks, it is still desirable to limit their computational demands (e.g. avoid defining them in terms of a Krylov subspace method...). In practice, activations such as rectified linear units (ReLUs), with the most trivial function and derivative definitions, often suffice.

  14. Activation Functions. Rectified Linear Unit (ReLU): f(x) = x for x ≥ 0, and f(x) = 0 for x < 0. SoftPlus Activation: f(x) = ln(1 + exp(x)).

  15. Activation Functions. Hyperbolic Tangent Unit: f(x) = tanh(x). Sigmoidal Unit: f(x) = 1 / (1 + exp(−x)).
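
A brief NumPy sketch of the four fixed activations shown on the last two slides, written for clarity rather than numerical robustness (a production softplus or sigmoid would typically guard against overflow in exp).

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def softplus(x):
    return np.log(1.0 + np.exp(x))

def tanh(x):
    return np.tanh(x)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))
```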

  16. Activation Functions (Parameterized). Exponential Linear Unit (ELU): f_α(x) = x for x ≥ 0, and f_α(x) = α · (e^x − 1) for x < 0. Leaky Rectified Linear Unit: f_α(x) = x for x ≥ 0, and f_α(x) = α · x for x < 0.

  17. Activation Functions (Learnable Parameters). Parameterized ReLU: f_β(x) = x for x ≥ 0, and f_β(x) = β · x for x < 0. Swish Units: f_β(x) = x / (1 + exp(−β · x)).
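
A NumPy sketch of these parameterized forms, with the fixed α and learnable β passed as ordinary arguments; the default values below are placeholders, not values from the slides.

```python
import numpy as np

def elu(x, alpha=1.0):
    return np.where(x >= 0.0, x, alpha * (np.exp(x) - 1.0))

def leaky_relu(x, alpha=0.01):
    return np.where(x >= 0.0, x, alpha * x)

def prelu(x, beta=0.25):
    # same form as the leaky ReLU, but beta is treated as a trainable parameter
    return np.where(x >= 0.0, x, beta * x)

def swish(x, beta=1.0):
    return x / (1.0 + np.exp(-beta * x))
```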

  18. Hidden Layers. Intermediate, or hidden, layers can be added between the input and output nodes to allow for additional non-linear processing. For example, we can first define a layer such as h = f_1(W_1 x + b_1) and construct a subsequent layer to produce the final output: y = f_2(W_2 h + b_2).
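
Composing two dense layers as above can be sketched directly in NumPy; the layer sizes and activations here are arbitrary placeholder choices.

```python
import numpy as np

def dense_layer(x, W, b, f):
    return f(W @ x + b)

relu = lambda z: np.maximum(z, 0.0)

# Placeholder shapes: 3 inputs, 4 hidden units, 2 outputs
x = np.random.randn(3)
W1, b1 = np.random.randn(4, 3), np.zeros(4)
W2, b2 = np.random.randn(2, 4), np.zeros(2)

h = dense_layer(x, W1, b1, relu)     # h = f1(W1 x + b1)
y = dense_layer(h, W2, b2, np.tanh)  # y = f2(W2 h + b2)
```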

  19. Hidden Layers

  20. Multiple Hidden Layers

  21. Multiple Hidden Layers. Multiple hidden layers can easily be defined in the same way: h_1 = f_1(W_1 x + b_1), h_2 = f_2(W_2 h_1 + b_2), y = f_3(W_3 h_2 + b_3). One of the challenges of working with additional layers is the need to determine the impact that earlier layers have on the final output. This will be necessary for tuning/optimizing the network parameters (i.e. weights and biases) to produce accurate predictions.
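
For deeper stacks, the same pattern is naturally expressed as a loop over (W, b, f) triples. A minimal sketch, with all shapes and activations chosen arbitrarily for illustration:

```python
import numpy as np

def forward(x, layers):
    """Apply a stack of dense layers; `layers` is a list of (W, b, f) triples."""
    h = x
    for W, b, f in layers:
        h = f(W @ h + b)
    return h

relu = lambda z: np.maximum(z, 0.0)
layers = [
    (np.random.randn(8, 3), np.zeros(8), relu),     # h1 = f1(W1 x  + b1)
    (np.random.randn(8, 8), np.zeros(8), relu),     # h2 = f2(W2 h1 + b2)
    (np.random.randn(2, 8), np.zeros(2), np.tanh),  # y  = f3(W3 h2 + b3)
]
y = forward(np.random.randn(3), layers)
```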

  22. Outline. 1 Neural Networks: Artificial Neurons and Hidden Layers; Universal Approximation Theorem; Regularization and Batch Norm. 2 Network Optimization: Evaluating Network Performance; Stochastic Gradient Descent Algorithms; Backprop and Automatic Differentiation.

  23. Universal Approximators: Cybenko (1989). Cybenko, G., 1989. Approximation by superpositions of a sigmoidal function. Mathematics of Control, Signals and Systems, 2(4), pp. 303-314. Basic idea of the result: let I_n denote the unit hypercube in R^n; the collection of functions which can be expressed in the form Σ_{i=1}^{N} α_i · σ(w_i^T x + b_i), x ∈ I_n, is dense in the space of continuous functions C(I_n) defined on I_n. That is, for all f ∈ C(I_n) and ε > 0 there exist constants N, α_i, w_i, b_i such that | f(x) − Σ_{i=1}^{N} α_i · σ(w_i^T x + b_i) | < ε for all x ∈ I_n.
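
To make the form Σ α_i · σ(w_i^T x + b_i) concrete in one dimension, here is a hedged NumPy illustration (not from the workshop, and not the constructive argument in Cybenko's proof): it draws random inner parameters w_i, b_i and fits only the outer coefficients α_i by least squares to approximate a sample continuous function on [0, 1].

```python
import numpy as np

sigma = lambda z: 1.0 / (1.0 + np.exp(-z))  # sigmoidal activation

# Target continuous function on the unit interval (arbitrary example)
f = lambda x: np.sin(2.0 * np.pi * x)
x = np.linspace(0.0, 1.0, 200)

# Random inner parameters w_i, b_i; fit the alpha_i by least squares
N = 50
rng = np.random.default_rng(0)
w = rng.normal(scale=10.0, size=N)
b = rng.normal(scale=10.0, size=N)
Phi = sigma(np.outer(x, w) + b)             # features sigma(w_i * x + b_i)
alpha, *_ = np.linalg.lstsq(Phi, f(x), rcond=None)

approx = Phi @ alpha
max_error = np.max(np.abs(f(x) - approx))   # sup-norm error on the sample grid
```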

  24. Universal Approximators: Hornik et al. / Funahashi. Hornik, K., Stinchcombe, M. and White, H., 1989. Multilayer feedforward networks are universal approximators. Neural Networks, 2(5), pp. 359-366. Funahashi, K.I., 1989. On the approximate realization of continuous mappings by neural networks. Neural Networks, 2(3), pp. 183-192. Summary of results: for any compact set K ⊂ R^n, multi-layer feedforward neural networks are dense in the space of continuous functions C(K) on K, with respect to the supremum norm, provided that the activation function used for the network layers is: continuous and increasing; non-constant and bounded.
