Introduction to Neural Networks (slides from L. Lazebnik, B. Hariharan)



SLIDE 1

Introduction to Neural Networks

Slides from L. Lazebnik, B. Hariharan

SLIDE 2

Outline

  • Perceptrons
  • Perceptron update rule
  • Multi-layer neural networks
  • Training method
  • Best practices for training classifiers
  • After that: convolutional neural networks
SLIDE 3

Recall: “Shallow” recognition pipeline

Image pixels → Feature representation → Trainable classifier → Class label

  • Hand-crafted feature representation
  • Off-the-shelf trainable classifier

SLIDE 4

“Deep” recognition pipeline

  • Learn a feature hierarchy from pixels to classifier
  • Each layer extracts features from the output of the previous layer
  • Train all layers jointly

Image pixels → Layer 1 → Layer 2 → Layer 3 → Simple classifier

SLIDE 5

Neural networks vs. SVMs (a.k.a. “deep” vs. “shallow” learning)

SLIDE 6

Linear classifiers revisited: Perceptron

Inputs x_1, …, x_D with weights w_1, …, w_D. Output: sgn(w · x + b)

Can incorporate bias as component of the weight vector by always including a feature with value set to 1
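The perceptron update rule named in the outline can be sketched as follows. This is an illustrative sketch, not the slides' code: the function names, learning rate, and epoch count are hypothetical choices, and the bias is folded in as a constant-1 feature as the slide suggests.

```python
def sgn(v):
    """Sign function with sgn(0) = 1, matching the output sgn(w . x + b)."""
    return 1 if v >= 0 else -1

def predict(w, x):
    xb = list(x) + [1.0]                      # bias as a constant-1 feature
    return sgn(sum(wi * xi for wi, xi in zip(w, xb)))

def train_perceptron(data, epochs=100, lr=1.0):
    """Perceptron update rule: on a misclassified example (x, z) with
    z in {-1, +1}, move the weights toward it: w <- w + lr * z * x."""
    w = [0.0] * (len(data[0][0]) + 1)         # +1 for the bias feature
    for _ in range(epochs):
        for x, z in data:
            if predict(w, x) != z:
                xb = list(x) + [1.0]
                w = [wi + lr * z * xi for wi, xi in zip(w, xb)]
    return w
```

For linearly separable data (e.g., logical AND with labels ±1), this converges to a separating hyperplane after finitely many updates.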

SLIDE 7

Loose inspiration: Human neurons

SLIDE 8

SLIDE 9

Multi-layer perceptrons

  • To make nonlinear classifiers out of perceptrons, build a multi-layer neural network!
  • This requires each perceptron to have a nonlinearity
SLIDE 10

Multi-layer perceptrons

  • To make nonlinear classifiers out of perceptrons, build a multi-layer neural network!
  • This requires each perceptron to have a nonlinearity
  • To be trainable, the nonlinearity should be differentiable

Sigmoid: g(t) = 1 / (1 + e⁻ᵗ)

Rectified linear unit (ReLU): g(t) = max(0, t)
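Both nonlinearities, together with the derivatives that make them trainable, are one-liners; a minimal sketch in plain Python:

```python
import math

def sigmoid(t):
    # g(t) = 1 / (1 + e^-t), squashes any input into (0, 1)
    return 1.0 / (1.0 + math.exp(-t))

def sigmoid_grad(t):
    # g'(t) = g(t) * (1 - g(t)), used by back-propagation
    s = sigmoid(t)
    return s * (1.0 - s)

def relu(t):
    # g(t) = max(0, t)
    return max(0.0, t)

def relu_grad(t):
    # derivative 1 for t > 0, 0 otherwise (a subgradient at t = 0)
    return 1.0 if t > 0 else 0.0
```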

SLIDE 11
Training of multi-layer networks

  • Find network weights to minimize the prediction loss between true and estimated labels of training examples:

    F(x) = Σᵢ m(yᵢ, zᵢ; x)

    where x denotes the network weights, yᵢ the i-th training input, and zᵢ its true label

  • Possible losses (for binary problems):
  • Quadratic loss: m(yᵢ, zᵢ; x) = (g_x(yᵢ) − zᵢ)²
  • Log likelihood loss: m(yᵢ, zᵢ; x) = −log Q_x(zᵢ | yᵢ)
  • Hinge loss: m(yᵢ, zᵢ; x) = max(0, 1 − zᵢ g_x(yᵢ))
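For a scalar prediction g and label z, the three losses can be written down directly. A small illustrative sketch (the function names are mine, not the slides'):

```python
import math

def quadratic_loss(g, z):
    # (g_x(y_i) - z_i)^2
    return (g - z) ** 2

def log_likelihood_loss(q_z):
    # -log Q_x(z_i | y_i), where q_z is the probability the network
    # assigns to the true label z_i
    return -math.log(q_z)

def hinge_loss(g, z):
    # max(0, 1 - z_i * g_x(y_i)), for labels z in {-1, +1}
    return max(0.0, 1.0 - z * g)
```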

SLIDE 12
Training of multi-layer networks

  • Find network weights to minimize the prediction loss between true and estimated labels of training examples:

    F(x) = Σᵢ m(yᵢ, zᵢ; x)

  • Update weights by gradient descent: x ← x − α ∂F/∂x, where α is the learning rate

[Figure: gradient descent on the loss surface over two weights w1, w2]

SLIDE 13
Training of multi-layer networks

  • Find network weights to minimize the prediction loss between true and estimated labels of training examples:

    F(x) = Σᵢ m(yᵢ, zᵢ; x)

  • Update weights by gradient descent: x ← x − α ∂F/∂x
  • Back-propagation: gradients are computed in the direction from the output to the input layers and combined using the chain rule
  • Stochastic gradient descent: compute the weight update w.r.t. one training example (or a small batch of examples) at a time; cycle through the training examples in random order over multiple epochs
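Putting these pieces together, here is a minimal sketch: a one-hidden-layer sigmoid network trained on XOR with back-propagation and stochastic gradient descent. The hidden size, learning rate, and epoch count are arbitrary illustrative choices, not values from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: XOR, which a single perceptron cannot represent
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
Z = np.array([0.0, 1.0, 1.0, 0.0])

H, lr = 8, 0.5                                 # hidden size, learning rate
W1 = rng.normal(0.0, 1.0, (2, H)); b1 = np.zeros(H)
W2 = rng.normal(0.0, 1.0, H);      b2 = 0.0

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def mse():
    out = sigmoid(sigmoid(X @ W1 + b1) @ W2 + b2)
    return float(np.mean((out - Z) ** 2))

initial_error = mse()
for epoch in range(2000):
    for i in rng.permutation(len(X)):          # SGD: one example at a time
        x, z = X[i], Z[i]
        # Forward pass
        h = sigmoid(x @ W1 + b1)
        y = sigmoid(h @ W2 + b2)
        # Backward pass: chain rule from output toward input (quadratic loss)
        dy = 2.0 * (y - z) * y * (1.0 - y)     # d loss / d (pre-sigmoid output)
        dW2, db2 = dy * h, dy
        dh = dy * W2 * h * (1.0 - h)           # gradient pushed back through W2
        dW1, db1 = np.outer(x, dh), dh
        # Gradient descent update: w <- w - lr * dF/dw
        W2 -= lr * dW2; b2 -= lr * db2
        W1 -= lr * dW1; b1 -= lr * db1
final_error = mse()
```

After training, the mean squared error should fall well below its initial value, even though the data is not linearly separable.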

SLIDE 14

Back-propagation

SLIDE 15

Network with a single hidden layer

  • Neural networks with at least one hidden layer are universal function approximators

SLIDE 16

Network with a single hidden layer

  • Hidden layer size and network capacity:

Source: http://cs231n.github.io/neural-networks-1/

SLIDE 17

Regularization

  • It is common to add a penalty (e.g., quadratic) on weight magnitudes to the objective function:

    F(x) = Σᵢ m(yᵢ, zᵢ; x) + μ‖x‖²

  • The quadratic penalty encourages the network to use all of its inputs “a little” rather than a few inputs “a lot”

Source: http://cs231n.github.io/neural-networks-1/
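One consequence of the quadratic penalty: its gradient is 2μx, so it enters the gradient-descent update as "weight decay", shrinking every weight slightly at each step. A sketch (function names are hypothetical):

```python
import numpy as np

def regularized_objective(data_loss_sum, w, mu):
    # F(x) = sum_i m(y_i, z_i; x) + mu * ||x||^2
    return data_loss_sum + mu * float(np.sum(w ** 2))

def weight_decay_step(w, grad_data, mu, lr):
    # d/dw of mu * ||w||^2 is 2 * mu * w, so beyond following the data
    # gradient, each step also shrinks the weights toward zero
    return w - lr * (grad_data + 2.0 * mu * w)
```

With a zero data gradient, each step multiplies the weights by (1 − 2·lr·μ), which is exactly the "use all inputs a little" pressure described above.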

SLIDE 18

Multi-Layer Network Demo

http://playground.tensorflow.org/

SLIDE 19

Dealing with multiple classes

  • If we need to classify inputs into C different classes, we put C units in the last layer to produce C one-vs.-others scores g_1, g_2, …, g_C
  • Apply the softmax function to convert these scores to probabilities:

    softmax(g_1, …, g_C) = ( exp(g_1) / Σ_k exp(g_k), …, exp(g_C) / Σ_k exp(g_k) )

  • If one of the inputs is much larger than the others, then the corresponding softmax value will be close to 1 and the others will be close to 0
  • Use log likelihood (cross-entropy) loss: m(yᵢ, zᵢ; x) = −log Q_x(zᵢ | yᵢ)
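The softmax and cross-entropy loss can be sketched directly from the formula above; subtracting the maximum score before exponentiating is a standard trick (an addition of mine, not from the slides) to avoid overflow for large scores:

```python
import math

def softmax(scores):
    m = max(scores)                       # subtract max for numerical stability
    exps = [math.exp(g - m) for g in scores]
    total = sum(exps)
    return [e / total for e in exps]      # probabilities summing to 1

def cross_entropy(scores, true_class):
    # -log of the softmax probability assigned to the true class
    return -math.log(softmax(scores)[true_class])
```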

SLIDE 20

Neural networks: Pros and cons

  • Pros
  • Flexible and general function approximation framework
  • Can build extremely powerful models by adding more layers
  • Cons
  • Hard to analyze theoretically (e.g., training is prone to local optima)
  • Huge amounts of training data and computing power may be required to get good performance
  • The space of implementation choices is huge (network architectures, parameters)

SLIDE 21

Best practices for training classifiers

  • Goal: obtain a classifier with good generalization, i.e., good performance on never-before-seen data
  • 1. Learn parameters on the training set
  • 2. Tune hyperparameters (implementation choices) on the held-out validation set
  • 3. Evaluate performance on the test set
  • Crucial: do not peek at the test set when iterating steps 1 and 2!
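The three-set protocol can be sketched as a single shuffle-and-carve split; the fractions, seed, and function name here are illustrative choices:

```python
import random

def split_dataset(examples, val_frac=0.2, test_frac=0.2, seed=0):
    """Shuffle once, then carve out held-out validation and test sets.
    The test set should be touched only once, after all tuning is done."""
    rng = random.Random(seed)
    idx = list(range(len(examples)))
    rng.shuffle(idx)
    n_test = int(len(examples) * test_frac)
    n_val = int(len(examples) * val_frac)
    test = [examples[i] for i in idx[:n_test]]
    val = [examples[i] for i in idx[n_test:n_test + n_val]]
    train = [examples[i] for i in idx[n_test + n_val:]]
    return train, val, test
```

Fixing the seed makes the split reproducible, so repeated runs of steps 1 and 2 always tune against the same validation set.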

SLIDE 22

What’s the big deal?

SLIDE 23

http://www.image-net.org/challenges/LSVRC/announcement-June-2-2015

SLIDE 24

Bias-variance tradeoff

  • Prediction error of learning algorithms has two main components:
  • Bias: error due to simplifying model assumptions
  • Variance: error due to randomness of the training set
  • The bias-variance tradeoff can be controlled by turning “knobs” that determine model complexity

[Figure: high bias, low variance vs. low bias, high variance]

Figure source

SLIDE 25

Underfitting and overfitting

  • Underfitting: training and test error are both high
  • Model does an equally poor job on the training and the test set
  • The model is too “simple” to represent the data, or the model is not trained well
  • Overfitting: training error is low but test error is high
  • Model fits irrelevant characteristics (noise) in the training data
  • Model is too complex, or the amount of training data is insufficient

[Figure: underfitting vs. good tradeoff vs. overfitting]

Figure source