Understanding Neural Networks Part I: Artificial Neurons and Network Optimization

SLIDE 1

TensorFlow Workshop 2018

Understanding Neural Networks

Part I: Artificial Neurons and Network Optimization

Nick Winovich
Department of Mathematics, Purdue University

July 2018

SLIDE 2

Outline

1. Neural Networks
   - Artificial Neurons and Hidden Layers
   - Universal Approximation Theorem
   - Regularization and Batch Norm

2. Network Optimization
   - Evaluating Network Performance
   - Stochastic Gradient Descent Algorithms
   - Backprop and Automatic Differentiation

SLIDE 5

Artificial Neural Networks

Neural networks are a class of simple, yet effective, “computing systems” with a diverse range of applications. In these systems, small computational units, or nodes, are arranged to form networks in which connectivity is leveraged to carry out complex calculations.

Deep Learning by Goodfellow, Bengio, and Courville: http://www.deeplearningbook.org/
Convolutional Neural Networks for Visual Recognition at Stanford: http://cs231n.stanford.edu/

SLIDE 6

Artificial Neurons

Diagram modified from Stack Exchange post answered by Gonzalo Medina.

Weights are first used to scale the inputs; the results are summed with a bias term and passed through an activation function.

SLIDE 7

Formula and Vector Representation

The diagram from the previous slide can be interpreted as:

\[ y \;=\; f\left( x_1 \cdot w_1 + x_2 \cdot w_2 + x_3 \cdot w_3 + b \right) \]

which can be conveniently represented in vector form via:

\[ y \;=\; f\left( w^T x + b \right) \]

by interpreting the neuron inputs and weights as column vectors.
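For concreteness, a single neuron of this form can be evaluated in a few lines of NumPy (a minimal sketch; the input, weight, and bias values below are arbitrary placeholders, and tanh stands in for a generic activation f):

```python
import numpy as np

def neuron(x, w, b, f=np.tanh):
    """Single artificial neuron: y = f(w^T x + b)."""
    return f(np.dot(w, x) + b)

x = np.array([0.5, -1.2, 3.0])   # inputs x1, x2, x3
w = np.array([0.1,  0.4, -0.2])  # weights w1, w2, w3
b = 0.05                         # bias term
print(neuron(x, w, b))
```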

SLIDE 8

Artificial Neurons: Multiple Outputs

SLIDE 9

Matrix Representation

This corresponds to a pair of equations, one for each output:

\[ y_1 = f\left( w_1^T x + b_1 \right), \qquad y_2 = f\left( w_2^T x + b_2 \right) \]

which can be represented in matrix form by the system:

y = f ( W x + b )

where we assume the activation function has been vectorized.
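A hedged NumPy sketch of this matrix form (the weights, biases, and activation below are again arbitrary placeholders):

```python
import numpy as np

def dense_layer(x, W, b, f=np.tanh):
    """Fully-connected layer: y = f(W x + b), with f applied elementwise."""
    return f(W @ x + b)

W = np.array([[0.1,  0.4, -0.2],
              [0.3, -0.1,  0.5]])   # shape (M, N) = (2, 3): one row per output
b = np.array([0.05, -0.02])         # one bias per output
x = np.array([0.5, -1.2, 3.0])
print(dense_layer(x, W, b))         # two outputs y1, y2
```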

SLIDE 10

Fully-Connected Neural Layers

The resulting layers, referred to as fully-connected or dense, can be visualized as a collection of nodes connected by edges corresponding to weights (biases/activations are typically omitted).

SLIDE 11

Floating Point Operation Count

Matrix-Vector Multiplication

\[
\begin{pmatrix} w_{11} & \cdots & w_{1N} \\ \vdots & \ddots & \vdots \\ w_{M1} & \cdots & w_{MN} \end{pmatrix}
\begin{pmatrix} x_1 \\ \vdots \\ x_N \end{pmatrix}
\;\xrightarrow{\;\text{Mult: } MN\;}\;
\begin{pmatrix} w_{11} x_1 & \cdots & w_{1N} x_N \\ \vdots & \ddots & \vdots \\ w_{M1} x_1 & \cdots & w_{MN} x_N \end{pmatrix}
\;\xrightarrow{\;\text{Add: } M(N-1)\;}\;
\begin{pmatrix} w_{11} x_1 + \cdots + w_{1N} x_N \\ \vdots \\ w_{M1} x_1 + \cdots + w_{MN} x_N \end{pmatrix}
\]

SLIDE 12

Floating Point Operation Count

So we see that, when bias terms are omitted, the number of FLOPs required for a fully-connected layer mapping N inputs to M outputs is:

2MN − M  =  MN multiplies + M(N − 1) adds

When bias terms are included, an additional M addition operations are required, resulting in a total of 2 MN FLOPs. Note: This omits the computation required for applying the activation function to M values resulting from the linear operations. Depending on the activation function selected, this may or may not have a significant impact on the overall computational complexity.
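The count above translates directly into a small helper function (a sketch; as noted, it ignores the cost of the activation function):

```python
def dense_layer_flops(n_inputs, m_outputs, include_bias=True):
    """FLOPs for one fully-connected layer, excluding the activation function."""
    mults = m_outputs * n_inputs                 # M*N multiplications
    adds = m_outputs * (n_inputs - 1)            # M*(N-1) additions
    if include_bias:
        adds += m_outputs                        # +M additions for the bias terms
    return mults + adds

print(dense_layer_flops(3, 2))  # 2*M*N = 12 for N=3 inputs, M=2 outputs
```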

SLIDE 13

Activation Functions

Activation functions are a fundamental component of neural network architectures; these functions are responsible for:

  - Providing all of the network's non-linear modeling capacity
  - Controlling the gradient flows that guide the training process

While activation functions play a fundamental role in all neural networks, it is still desirable to limit their computational demands (e.g. avoid defining them in terms of a Krylov subspace method...). In practice, activations such as rectified linear units (ReLUs) with the most trivial function and derivative definitions often suffice.

SLIDE 14

Activation Functions

Rectified Linear Unit (ReLU):
\[ f(x) = \begin{cases} x, & x \ge 0 \\ 0, & x < 0 \end{cases} \]

SoftPlus Activation:
\[ f(x) = \ln\left( 1 + \exp(x) \right) \]
SLIDE 15

Activation Functions

Sigmoidal Unit:
\[ f(x) = \frac{1}{1 + \exp(-x)} \]

Hyperbolic Tangent Unit:
\[ f(x) = \tanh(x) \]
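These four activations can be written directly in NumPy (a simple reference sketch):

```python
import numpy as np

relu     = lambda x: np.maximum(x, 0.0)            # x for x >= 0, else 0
softplus = lambda x: np.log1p(np.exp(x))           # ln(1 + exp(x))
sigmoid  = lambda x: 1.0 / (1.0 + np.exp(-x))      # 1 / (1 + exp(-x))
tanh     = np.tanh                                 # hyperbolic tangent

x = np.linspace(-3.0, 3.0, 7)
print(relu(x), softplus(x), sigmoid(x), tanh(x), sep="\n")
```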

SLIDE 16

Activation Functions (Parameterized)

Exponential Linear Unit (ELU):
\[ f_\alpha(x) = \begin{cases} x, & x \ge 0 \\ \alpha \, (e^{x} - 1), & x < 0 \end{cases} \]

Leaky Rectified Linear Unit:
\[ f_\alpha(x) = \begin{cases} x, & x \ge 0 \\ \alpha \, x, & x < 0 \end{cases} \]

SLIDE 17

Activation Functions (Learnable Parameters)

Parameterized ReLU (learnable slope β):
\[ f_\beta(x) = \begin{cases} x, & x \ge 0 \\ \beta \, x, & x < 0 \end{cases} \]

Swish Units:
\[ f_\beta(x) = \frac{x}{1 + \exp(-\beta \, x)} \]
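The parameterized variants follow the same pattern; in a NumPy sketch, α is a fixed hyperparameter while β would be a trainable parameter (the default values below are arbitrary placeholders):

```python
import numpy as np

def elu(x, alpha=1.0):
    return np.where(x >= 0, x, alpha * (np.exp(x) - 1.0))

def leaky_relu(x, alpha=0.01):
    return np.where(x >= 0, x, alpha * x)

def prelu(x, beta):
    return np.where(x >= 0, x, beta * x)           # beta is learned during training

def swish(x, beta=1.0):
    return x / (1.0 + np.exp(-beta * x))           # x * sigmoid(beta * x)

x = np.linspace(-3.0, 3.0, 7)
print(elu(x), leaky_relu(x), prelu(x, beta=0.25), swish(x), sep="\n")
```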

SLIDE 18

Hidden Layers

Intermediate, or hidden, layers can be added between the input and output nodes to allow for additional non-linear processing.

For example, we can first define a layer such as:

h = f1 ( W1 x + b1 )

and construct a subsequent layer to produce the final output:

y = f2 ( W2 h + b2 )

SLIDE 19

Hidden Layers

SLIDE 20

Multiple Hidden Layers

SLIDE 21

Multiple Hidden Layers

Multiple hidden layers can easily be defined in the same way:

h1 = f1 ( W1 x + b1 )
h2 = f2 ( W2 h1 + b2 )
y  = f3 ( W3 h2 + b3 )

One of the challenges of working with additional layers is the need to determine the impact that earlier layers have on the final output. This will be necessary for tuning/optimizing network parameters (i.e. weights and biases) to produce accurate predictions.
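Stacking the layer computation from earlier gives a complete forward pass; a minimal sketch (the layer sizes, random weights, and tanh activations are arbitrary choices):

```python
import numpy as np

def dense(x, W, b, f=np.tanh):
    return f(W @ x + b)

def forward(x, params):
    """Forward pass through a stack of fully-connected layers."""
    h = x
    for W, b, f in params:
        h = dense(h, W, b, f)
    return h

rng = np.random.default_rng(0)
params = [(rng.normal(size=(8, 3)), np.zeros(8), np.tanh),      # x  -> h1
          (rng.normal(size=(8, 8)), np.zeros(8), np.tanh),      # h1 -> h2
          (rng.normal(size=(1, 8)), np.zeros(1), lambda z: z)]  # h2 -> y (linear output)
print(forward(np.array([0.5, -1.2, 3.0]), params))
```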

SLIDE 22

Outline

1. Neural Networks
   - Artificial Neurons and Hidden Layers
   - Universal Approximation Theorem
   - Regularization and Batch Norm

2. Network Optimization
   - Evaluating Network Performance
   - Stochastic Gradient Descent Algorithms
   - Backprop and Automatic Differentiation

SLIDE 23

Universal Approximators: Cybenko (1989)

Cybenko, G., 1989. Approximation by superpositions of a sigmoidal function. Mathematics of Control, Signals and Systems, 2(4), pp.303-314.

Basic Idea of Result: Let I_n denote the unit hypercube in R^n; the collection of functions which can be expressed in the form

\[ \sum_{i=1}^{N} \alpha_i \cdot \sigma\!\left( w_i^T x + b_i \right), \qquad \forall x \in I_n \]

is dense in the space of continuous functions C(I_n) defined on I_n: i.e. for all f ∈ C(I_n) and ε > 0 there exist constants N, α_i, w_i, b_i such that

\[ \left| \, f(x) \;-\; \sum_{i=1}^{N} \alpha_i \cdot \sigma\!\left( w_i^T x + b_i \right) \right| \;<\; \varepsilon \qquad \forall x \in I_n \]
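A small numerical illustration of this form (not Cybenko's argument): with the inner weights and biases drawn at random and only the outer coefficients α_i fit by least squares, a modest sum of sigmoids typically tracks a smooth target quite well. All specific values here are arbitrary choices.

```python
import numpy as np

sigma = lambda z: 1.0 / (1.0 + np.exp(-z))       # sigmoidal activation
target = lambda x: np.sin(2 * np.pi * x)         # continuous function on [0, 1]

rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 200)
N = 50
w = rng.normal(scale=10.0, size=N)               # random inner weights
b = rng.uniform(-10.0, 10.0, size=N)             # random inner biases

Phi = sigma(np.outer(x, w) + b)                  # columns: sigma(w_i * x + b_i)
alpha, *_ = np.linalg.lstsq(Phi, target(x), rcond=None)
print("max abs error:", np.abs(Phi @ alpha - target(x)).max())
```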

SLIDE 24

Universal Approximators: Hornik et al. / Funahashi

Hornik, K., Stinchcombe, M. and White, H., 1989. Multilayer feedforward networks are universal approximators. Neural Networks, 2(5), pp.359-366.
Funahashi, K.I., 1989. On the approximate realization of continuous mappings by neural networks. Neural Networks, 2(3), pp.183-192.

Summary of Results: For any compact set K ⊂ R^n, multi-layer feedforward neural networks are dense in the space of continuous functions C(K) on K, with respect to the supremum norm, provided that the activation function used for the network layers is:

  - Continuous and increasing
  - Non-constant and bounded

SLIDE 25

Universal Approximators: Leshno et al. (1992)

Leshno, M., Lin, V.Y., Pinkus, A. and Schocken, S., 1992. Multilayer feedforward networks with a non-polynomial activation function can approximate any function.

"A standard multilayer feedforward network with a locally bounded piecewise continuous activation function can approximate any continuous function to any degree of accuracy if and only if the network's activation function is not a polynomial." (Leshno et al.)

  - Here the notion of "approximation" is also defined in terms of the supremum norm, and the domains are assumed to be compact
  - The result does not hold without thresholds (i.e. bias terms)

SLIDE 26

Outline

1. Neural Networks
   - Artificial Neurons and Hidden Layers
   - Universal Approximation Theorem
   - Regularization and Batch Norm

2. Network Optimization
   - Evaluating Network Performance
   - Stochastic Gradient Descent Algorithms
   - Backprop and Automatic Differentiation

SLIDE 27

Overfitting

In some cases the network is capable of learning "too much" from the specific training data used; this phenomenon is referred to as overfitting and occurs when the model performs well on the training dataset, but does not generalize to accurate predictions on data which has not been seen during training. Consider, for example:

SLIDE 28

L1 and L2 Weight Regularization

One simple technique to help avoid overfitting is to add a penalty for network parameters with large L1 or L2 norms. This is similar to the underlying idea behind LASSO regression and can be loosely interpreted as a form of applying the principle of Ockham’s Razor: i.e. the simplest solution often turns out to be the correct solution.

  - L2 regularization is a fairly general regularization technique which places an emphasis on reducing the largest weights
  - L1 regularization helps to encourage sparsity in the network and improves performance when the problem has a sparse solution
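In code, these penalties are simply added to the data-fitting loss; a minimal sketch (the penalty coefficient in the comment is an arbitrary placeholder):

```python
import numpy as np

def regularization_penalty(weight_matrices, l1=0.0, l2=0.0):
    """Sum of L1 and L2 penalties over all weight matrices in the network."""
    return sum(l1 * np.abs(W).sum() + l2 * (W ** 2).sum() for W in weight_matrices)

# total_loss = data_loss + regularization_penalty([W1, W2, W3], l2=1e-4)
```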

SLIDE 29

Applying Dropout

Applying dropout to hidden network layers also helps to avoid overfitting. This technique consists of removing, or dropping, units/nodes randomly at each step of the training process.

  - A fixed drop rate p ∈ (0, 1) is specified prior to training, and nodes in the layer are dropped according to a collection of i.i.d. random Bernoulli samples drawn at each training step
  - Since all nodes will be used after training, the outputs of the remaining nodes are rescaled by a factor of 1/(1 − p) to ensure that the expected values during training and testing coincide

Loosely speaking, this can be thought of as a way to ensure that no individual node plays too large of a role in the final prediction.
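A sketch of this "inverted dropout" convention in NumPy, rescaling by 1/(1 − p) during training so that no change is needed at test time:

```python
import numpy as np

def dropout(h, p, training=True, rng=None):
    """Randomly zero units with probability p during training; rescale survivors."""
    if not training or p == 0.0:
        return h                           # test time: layer is unchanged
    rng = rng or np.random.default_rng()
    keep = rng.random(h.shape) >= p        # i.i.d. Bernoulli keep-mask
    return h * keep / (1.0 - p)            # expected value matches test time
```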

SLIDE 30

Example: Dropout with Rate = 0.25 [ Training ]

SLIDE 33

Example: Dropout with Rate = 0.25 [ Testing ]

SLIDE 34

Motivation for Batch Normalization

Ioffe, S. and Szegedy, C., 2015. Batch normalization: Accelerating deep network training by reducing internal covariate shift. arXiv preprint arXiv:1502.03167.

Internal Covariate Shift: As network parameters change during training, the distributions of the input values to each layer change.

  - Training could be more efficient if the layers were receiving inputs with a fixed distribution throughout the entire process
  - Achieving this using normalization requires a technique which is compatible with gradient-based optimization

SLIDE 35

Batch Normalization

The proposed batch normalization technique corresponds to first performing a normalization with respect to the batch statistics:

\[ \hat{x} \;=\; \frac{x - \mu_B}{\sqrt{\sigma_B^2 + \varepsilon}} \qquad \text{with} \quad \mu_B = \frac{1}{m} \sum_{x \in B} x , \qquad \sigma_B^2 = \frac{1}{m} \sum_{x \in B} (x - \mu_B)^2 \]

where m is a fixed batch size, and ε > 0 is included for numerical stability. A linear map with learnable parameters γ and β is then applied:

\[ y_i \;=\; \gamma \cdot \hat{x}_i + \beta \]

and the normalized values {y_i} are passed to the activation function to apply the non-linear transformation for the layer.
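A NumPy sketch of this training-time computation over a mini-batch of shape (m, features); the batch statistics are returned as well so they can be tracked, as described on the next slide:

```python
import numpy as np

def batch_norm_train(x_batch, gamma, beta, eps=1e-5):
    """Batch normalization in training mode, using the batch statistics."""
    mu_B = x_batch.mean(axis=0)                      # per-feature batch mean
    var_B = x_batch.var(axis=0)                      # per-feature batch variance
    x_hat = (x_batch - mu_B) / np.sqrt(var_B + eps)  # normalize
    return gamma * x_hat + beta, mu_B, var_B         # learnable scale/shift applied
```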

SLIDE 36

Batch Normalization after Training

After training, we need a way to freeze the model in place for making predictions. This is accomplished by specifying a fixed normalization rule for each layer; rather than use sample statistics from a specific batch, it is natural to incorporate the entire dataset:

\[ \hat{x} \;=\; \frac{x - \mu}{\sqrt{\sigma^2 + \varepsilon}} \]

where µ is the empirical mean E_D[x] and σ² is the variance Var_D[x] taken with respect to the complete training dataset D. These values can be tracked using moving averages during training to avoid direct computation and provide accurate estimates when parameter changes are small near the end of the training process.
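A sketch of the corresponding bookkeeping: exponential moving averages of the batch statistics are maintained during training, then frozen and reused at test time (the momentum value 0.9 is an arbitrary choice here, not a value prescribed by the slide):

```python
import numpy as np

def update_running_stats(running_mu, running_var, mu_B, var_B, momentum=0.9):
    """Track dataset-level statistics with exponential moving averages."""
    return (momentum * running_mu + (1 - momentum) * mu_B,
            momentum * running_var + (1 - momentum) * var_B)

def batch_norm_test(x, running_mu, running_var, gamma, beta, eps=1e-5):
    """Batch normalization at test time, using the frozen statistics."""
    x_hat = (x - running_mu) / np.sqrt(running_var + eps)
    return gamma * x_hat + beta
```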

SLIDE 37

Outline

1. Neural Networks
   - Artificial Neurons and Hidden Layers
   - Universal Approximation Theorem
   - Regularization and Batch Norm

2. Network Optimization
   - Evaluating Network Performance
   - Stochastic Gradient Descent Algorithms
   - Backprop and Automatic Differentiation

SLIDE 39

Evaluating a Network

Up until now, we have still not discussed how to quantify the performance of a neural network. The most common strategy for quantifying performance is to define loss functions for the network with “low loss” corresponding to “high performance”. In supervised training, where the true labels/solutions are known, the network loss function is typically composed of:

  - A primary loss term corresponding to a measure of how close the network predictions are to the true solutions
  - Auxiliary loss components, such as weight regularization penalties, designed to help guide the training process

Once the network performance is quantified, we can specify an optimization algorithm designed to minimize the network loss.

SLIDE 40

Loss Functions

Two of the most common applications of neural networks are regression (for predicting continuous properties/values) and classification (for predicting discrete properties or labels).

For regression, a standard loss is given by the mean squared error:

\[ \text{Loss} \;=\; \frac{1}{|I|} \sum_{i \in I} \left( \hat{y}_i - y_i \right)^2 \]

where I are the indices of the output data (e.g. pixels of an image).

For classification, softmax cross entropy can be used when labels are mutually exclusive (e.g. classifying a digit as "0", "1", "2", etc.); sigmoid cross entropy can be used when labels are not mutually exclusive (e.g. determining which objects are in an image).
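Reference sketches of two of these losses (the softmax version assumes one-hot labels and works from unnormalized logits):

```python
import numpy as np

def mean_squared_error(y_pred, y_true):
    """Mean squared error over all output entries (e.g. pixels of an image)."""
    return np.mean((y_pred - y_true) ** 2)

def softmax_cross_entropy(logits, one_hot_labels):
    """Cross entropy for mutually exclusive classes."""
    shifted = logits - logits.max(axis=-1, keepdims=True)        # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    return -np.mean((one_hot_labels * log_probs).sum(axis=-1))
```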

SLIDE 41

One Hot Encoding

It is also fundamentally important to consider how the data will be represented within the network. When classifying digits, for example, networks will typically perform extremely poorly if the labels are represented as a single number: "0.0", "1.0", "2.0", etc. To better distinguish the differences between e.g. "0", "1", and "2", it is useful to instead store the values using a one hot encoding:

"0" = [1, 0, 0, ...]    "1" = [0, 1, 0, ...]    "2" = [0, 0, 1, ...]

One hot encodings are also typically used for word prediction (by specifying a dictionary of possibilities) and character level predictions (by specifying the admissible character set).
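A minimal one-hot encoder (sketch):

```python
import numpy as np

def one_hot(labels, num_classes):
    """Convert integer class labels into one hot row vectors."""
    encoded = np.zeros((len(labels), num_classes))
    encoded[np.arange(len(labels)), labels] = 1.0
    return encoded

print(one_hot([0, 1, 2], num_classes=10))   # digit labels -> rows of length 10
```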

SLIDE 42

Data Preparation

In general, it is also good practice to first process/prepare the input values of a dataset before training. For example, if the input values are centered around 100 and all lie within the interval [99.9, 100.1], it is typically better to center and rescale these values beforehand:

\[ \hat{x} \;=\; \frac{x - \mu}{\sqrt{\sigma^2 + \varepsilon}} \qquad \text{with} \quad \mu = \frac{1}{|D|} \sum_{x' \in D} x' , \qquad \sigma^2 = \frac{1}{|D|} \sum_{x' \in D} (x' - \mu)^2 \]

The values of µ and σ² can then be saved, and predictions can be made on arbitrary inputs during testing by applying the above normalization before passing the test inputs to the network.
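A sketch of this preprocessing step, fitting the statistics once on the training inputs and reusing them for any later inputs:

```python
import numpy as np

def fit_normalizer(train_inputs, eps=1e-8):
    """Compute dataset statistics once; return a reusable normalization function."""
    mu = train_inputs.mean(axis=0)
    var = train_inputs.var(axis=0)
    return lambda x: (x - mu) / np.sqrt(var + eps)

# normalize = fit_normalizer(train_inputs)
# predictions = network(normalize(test_inputs))
```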

SLIDE 43

Outline

1. Neural Networks
   - Artificial Neurons and Hidden Layers
   - Universal Approximation Theorem
   - Regularization and Batch Norm

2. Network Optimization
   - Evaluating Network Performance
   - Stochastic Gradient Descent Algorithms
   - Backprop and Automatic Differentiation

SLIDE 44

Gradient Descent

Gradient descent provides a simple, iterative algorithm for finding local minima of a real-valued function F numerically. The main idea behind gradient descent is relatively straightforward: compute the gradient of the function that we want to minimize and take a step in the direction of steepest descent ( i.e. −∇F(x) ). The iteration step of the algorithm is defined in terms of a step size parameter α ( or by a decreasing sequence {αi} ) by setting:

xi+1 = xi − α · ∇F(xi)

Note: Convergence is only guaranteed under certain assumptions on the function F (e.g. convexity, Lipschitz continuity, etc.).
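A direct translation of the iteration into code (a sketch; the quadratic example, step size, and step count are arbitrary placeholders):

```python
import numpy as np

def gradient_descent(grad_F, x0, alpha=0.1, num_steps=100):
    """Gradient descent: x_{i+1} = x_i - alpha * grad F(x_i)."""
    x = np.asarray(x0, dtype=float)
    for _ in range(num_steps):
        x = x - alpha * grad_F(x)
    return x

# Minimize F(x) = ||x||^2, whose gradient is 2x; iterates approach the origin
print(gradient_descent(lambda x: 2.0 * x, x0=[3.0, -2.0]))
```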

SLIDE 45

Loss Functions for Large Datasets

In the context of neural networks, gradient descent appears to provide a reasonable approach for tuning network parameters. The initial weights and biases can be interpreted as a single vector θ0, and the iteration steps from the previous slide could, in theory, be used to identify the optimal parameters θ∗ for the model. The issue with this approach is that the function we are actually trying to minimize is defined in terms of the entire dataset D:

\[ F(\theta) \;=\; \frac{1}{|D|} \sum_{x \in D} f(x \,|\, \theta) \]

where f(x|θ) denotes the loss for a single example x when using the model parameters θ. So the standard algorithm would require computing the average loss at each step of the iterative scheme...

SLIDE 46

Stochastic Gradient Descent

Since computing the true gradient ∇F(θ) at every step is impractical for large datasets, we can instead try to approximate this gradient using a smaller, more manageable mini-batch of data:

\[ \nabla F(\theta) \;\approx\; \nabla \widetilde{F}_i(\theta) \;=\; \frac{1}{|B_i|} \sum_{x \in B_i} \nabla f(x \,|\, \theta) \]

where the batches {B_i} partition the dataset into smaller subsets (typically of equal size). The iteration step is then taken to be:

\[ \theta_{i+1} \;=\; \theta_i - \alpha \cdot \nabla \widetilde{F}_i(\theta_i) \]
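A sketch of the resulting training loop; it assumes grad_f(x, theta) returns the gradient of the single-example loss and that data is an array of training examples, and the batch size, step size, and epoch count are placeholders:

```python
import numpy as np

def sgd(grad_f, theta0, data, alpha=0.01, batch_size=32, num_epochs=10, seed=0):
    """Mini-batch SGD: average per-example gradients over each batch, then step."""
    rng = np.random.default_rng(seed)
    theta = np.asarray(theta0, dtype=float)
    for _ in range(num_epochs):
        order = rng.permutation(len(data))                   # reshuffle each epoch
        for start in range(0, len(data), batch_size):
            batch = data[order[start:start + batch_size]]
            grad = np.mean([grad_f(x, theta) for x in batch], axis=0)
            theta = theta - alpha * grad
    return theta
```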

SLIDE 47

Potential Obstacles

  - Fixed learning rates typically lead to suboptimal performance
  - Defining a learning rate schedule manually does not allow the algorithm to adapt to the particular problem in consideration
  - Different parameters often require different learning rates
  - Since the directions/magnitudes of previous updates are not taken into consideration, defining optimization policies on small batches of data may lead to a noisy, inefficient training process

SLIDE 48

Importance of Selecting the Correct Learning Rate

[Figure: optimization trajectories under a "High Learning Rate" vs. a "Low Learning Rate"]

SLIDE 49

Nesterov Momentum

Nesterov, Y.E., 1983. A method for solving the convex programming problem with convergence rate O(1/k^2). In Dokl. Akad. Nauk SSSR (Vol. 269, pp. 543-547).

One method for learning from previous steps is to incorporate momentum into the update policy. This can be done by setting:

vi = γ · vi−1 + α · ∇F(θi)
θi+1 = θi − vi

An accelerated form was introduced by Nesterov in 1983 which leverages the value of "looking ahead" before making updates:

vi = γ · vi−1 + α · ∇F(θi − γ · vi−1)
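One step of each update in code (a sketch; the γ and α defaults are arbitrary placeholders):

```python
def momentum_step(theta, v, grad_F, alpha=0.01, gamma=0.9):
    """Classical momentum update."""
    v = gamma * v + alpha * grad_F(theta)
    return theta - v, v

def nesterov_step(theta, v, grad_F, alpha=0.01, gamma=0.9):
    """Nesterov momentum: evaluate the gradient at the look-ahead point."""
    v = gamma * v + alpha * grad_F(theta - gamma * v)
    return theta - v, v
```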

SLIDE 50

AdaGrad and RMSProp

Duchi, J., Hazan, E. and Singer, Y., 2011. Adaptive subgradient methods for online learning and stochastic optimization. Journal of Machine Learning Research, 12(Jul), pp.2121-2159.
Hinton, G., Srivastava, N. and Swersky, K. Neural Networks for Machine Learning, Lecture 6a: Overview of mini-batch gradient descent.

  - AdaGrad defines parameter-specific updates which are normalized by the sum of the squares of previous gradients; this leads to a natural learning rate decay (often too much)
  - RMSProp keeps moving averages of the squared gradients for each parameter which are used to rescale updates
  - AdaDelta provides another method for rescaling updates, and many other variants (e.g. including momentum) exist ...
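For instance, a sketch of the RMSProp rescaling described above (the hyperparameter values are typical defaults, not values prescribed by the slide):

```python
import numpy as np

def rmsprop_step(theta, avg_sq_grad, grad, alpha=0.001, rho=0.9, eps=1e-8):
    """RMSProp: rescale updates by a moving average of squared gradients."""
    avg_sq_grad = rho * avg_sq_grad + (1.0 - rho) * grad ** 2
    theta = theta - alpha * grad / (np.sqrt(avg_sq_grad) + eps)
    return theta, avg_sq_grad
```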

SLIDE 51

Exponential Moving Averages

One common method for estimating an average incrementally is to keep an exponential moving average of the values. This method applies an exponential decay to terms in the average which places an emphasis on the most recent values; this allows the average to move, or correct itself, as the distribution of the values changes. To track the gradient gt = ∇F(θt−1) of the loss with respect to the parameters θ, we can define an average recursively by setting:

\[ m_0 = 0, \qquad m_t = \beta \cdot m_{t-1} + (1 - \beta) \cdot g_t \quad\Longrightarrow\quad m_t = (1 - \beta) \sum_{\tau=1}^{t} \beta^{\,t-\tau} \, g_\tau \]

where the parameter β is used to specify the exponential decay rate and is typically taken to be close to, but smaller than, 1.

SLIDE 52

The Adam Optimization Algorithm

Kingma, D.P. and Ba, J., 2014. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980.

The Adam optimizer, derived from "adaptive moment estimation", proposes keeping exponential moving averages of both the first moment g_t and the (uncentered) second moment g_t² of the gradient.

In addition, a bias correction is introduced to address the issue of arbitrarily initializing the exponential moving averages with zero. The first moment average m_t and second moment average v_t are defined with decay rates β_1 and β_2, respectively, and the bias correction procedure is defined by the rescaling:

\[ \hat{m}_t \;=\; \frac{m_t}{1 - \beta_1^{\,t}}, \qquad \hat{v}_t \;=\; \frac{v_t}{1 - \beta_2^{\,t}} \]
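Putting the pieces together, one Adam step looks roughly as follows (a sketch; the default hyperparameters are those suggested in the paper):

```python
import numpy as np

def adam_step(theta, m, v, grad, t, alpha=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update; t is the step index starting at 1."""
    m = beta1 * m + (1 - beta1) * grad           # first moment moving average
    v = beta2 * v + (1 - beta2) * grad ** 2      # second (uncentered) moment average
    m_hat = m / (1 - beta1 ** t)                 # bias corrections
    v_hat = v / (1 - beta2 ** t)
    theta = theta - alpha * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v
```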

SLIDE 53

The Adam Optimization Algorithm

SLIDE 54

Outline

1. Neural Networks
   - Artificial Neurons and Hidden Layers
   - Universal Approximation Theorem
   - Regularization and Batch Norm

2. Network Optimization
   - Evaluating Network Performance
   - Stochastic Gradient Descent Algorithms
   - Backprop and Automatic Differentiation

SLIDE 55

Backpropagation

Rumelhart, D.E., Hinton, G.E. and Williams, R.J., 1986. Learning representations by back-propagating errors. Nature, 323(6088), p.533.

While this theoretical framework for neural network optimization may seem complete, one fundamental question still remains: "How are the gradients of network parameters actually computed?" One approach, referred to as backpropagation, was proposed in 1986 which dealt with sigmoidal activations σ and defined the loss E in terms of predictions y_j and true values d_j via:

\[ E \;=\; \frac{1}{2} \sum_{j} \left( y_j - d_j \right)^2 \]

SLIDE 56

Backpropagation: Rumelhart, Hinton, and Williams

SLIDE 60

Backpropagation

Now that the error contribution associated with yi is known:

\[ \text{i.e.} \qquad \frac{\partial E}{\partial y_i} \;=\; \sum_{j} \frac{\partial E}{\partial x_j} \cdot w_{ji} \]

contributions from network parameters of the previous layer can be computed using the same methodology that was applied to y_j:

\[ \text{e.g.} \qquad \frac{\partial E}{\partial x_i} \;=\; \frac{\partial E}{\partial y_i} \cdot \frac{d y_i}{d x_i} \;=\; \frac{\partial E}{\partial y_i} \cdot \frac{d}{d x_i}\,\sigma(x_i) \]

In this way, gradient calculations for all network parameters can be computed by propagating back the error contributions from the parameters in subsequent layers which depend on their values.
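A hand-coded sketch of these formulas for a two-layer sigmoid network with the squared-error loss from the earlier slide (layer shapes and inputs are left to the caller; nothing here is the paper's original code):

```python
import numpy as np

sigma  = lambda z: 1.0 / (1.0 + np.exp(-z))
dsigma = lambda z: sigma(z) * (1.0 - sigma(z))

def backprop_two_layer(x, d, W1, b1, W2, b2):
    """Gradients of E = 0.5 * sum((y - d)^2) via backpropagation."""
    # Forward pass (keep the pre-activations x1, x2 for the backward pass)
    x1 = W1 @ x + b1;  h = sigma(x1)
    x2 = W2 @ h + b2;  y = sigma(x2)
    # Backward pass: propagate error contributions layer by layer
    dE_dy  = y - d                                  # dE/dy_j for the squared error
    dE_dx2 = dE_dy * dsigma(x2)                     # through the output activation
    dE_dW2 = np.outer(dE_dx2, h);  dE_db2 = dE_dx2
    dE_dh  = W2.T @ dE_dx2                          # sum_j dE/dx_j * w_ji
    dE_dx1 = dE_dh * dsigma(x1)
    dE_dW1 = np.outer(dE_dx1, x);  dE_db1 = dE_dx1
    return dE_dW1, dE_db1, dE_dW2, dE_db2
```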

SLIDE 61

Symbolic and Numeric Differentiation

Two commonly used methods for automating the process of computing derivatives are symbolic differentiation and numeric differentiation; however, both of these methods have severe practical limitations in the context of training neural networks.

  - Symbolic differentiation produces exact derivatives through direct manipulation of the mathematical expressions used to define functions; the resulting expressions can be lengthy and contain unnecessary computations, however, and are inefficient unless additional "expression simplification" steps are included
  - Numeric differentiation techniques are widely applicable and efficient; however, the resulting inexact gradient estimates can entirely undermine the training process for large networks

SLIDE 62

Automatic Differentiation

Automatic differentiation (AD) in "reverse mode" provides a generalization of backpropagation and gives us a way to carry out the required gradient calculations exactly and efficiently.

  - Computes derivatives using the underlying computational graph
  - Very efficient with respect to evaluation time
  - A trace of all elementary operations is stored on an evaluation "tape", or "Wengert list"; potentially large storage requirements (see the toy sketch below)
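A toy, hand-rolled illustration of the reverse-mode idea (this is not how TensorFlow implements it; real systems record the trace non-recursively and replay it in reverse topological order):

```python
class Var:
    """Each operation records its parents and local derivatives; backward()
    then propagates error signals back through the recorded computation."""
    def __init__(self, value, parents=()):
        self.value, self.parents, self.grad = value, parents, 0.0

    def __add__(self, other):
        return Var(self.value + other.value, [(self, 1.0), (other, 1.0)])

    def __mul__(self, other):
        return Var(self.value * other.value, [(self, other.value), (other, self.value)])

    def backward(self, seed=1.0):
        self.grad += seed
        for parent, local_grad in self.parents:
            parent.backward(seed * local_grad)

x, y = Var(2.0), Var(3.0)
z = x * y + x                 # z = x*y + x
z.backward()
print(x.grad, y.grad)         # dz/dx = y + 1 = 4.0, dz/dy = x = 2.0
```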

Baydin, A.G., Pearlmutter, B.A., Radul, A.A. and Siskind, J.M., 2015. Automatic differentiation in machine learning: a survey.
University of Washington CSE599W, Spring 2018 - Slides from "Lecture 4: Backpropagation and Automatic Differentiation": http://dlsys.cs.washington.edu/pdf/lecture4.pdf
