SLIDE 1

Lecture 23: Final Exam Review

  • Dr. Chengjiang Long

Computer Vision Researcher at Kitware Inc. Adjunct Professor at RPI. Email: longc3@rpi.edu

SLIDE 2

Final Project Presentation Agenda on May 1st

No. | Start time | Duration | Project name | Authors
1 | 1:30pm | 00:10:00 | Handwritten digits recognition | Kimberly Oakes
2 | 1:40pm | 00:10:00 | Character Recognition | Xiangyang Mou, Tong Jian
3 | 1:50pm | 00:10:00 | Hand-drawn recognition | Deniz Koyuncu
4 | 2:00pm | 00:10:00 | Human Face Recognition | Chao-Ting Hsueh, Huaiyuan Chu, Yilin Zhu
5 | 2:10pm | 00:10:00 | Head Pose Estimation | Lisa Chen
6 | 2:20pm | 00:10:00 | Facial expressions | Cameron Mine
7 | 2:30pm | 00:10:00 | Kickstarter: succeed or fail? | Jeffrey Chen and Steven Sperazza
8 | 2:40pm | 00:10:00 | Tragedy of Titanic: can a person on board survive or not? | Ziyi Wang, Dewei Hu
9 | 2:50pm | 00:10:00 | Classifying groceries by image using CNN | Rui Li, Yan Wang
10 | 3:00pm | 00:10:00 | Neural Style Transfer for Video | Sarthak Chatterjee and Ashraful Islam
11 | 3:10pm | 00:10:00 | Feature selection | Zijun Cui

SLIDE 3

Guideline for the final project presentation

  • Briefly review the importance, the problem you solved, and the objective of your project - 1-3 slides (you can reuse some of your proposal slides).

  • The details of the solution you used in the project - at least 2 slides.

  • Experiment part - at least 3 slides.
  • detailed information about the data set.
  • data split.
  • details about feature extraction.
  • hyper-parameter selection.
  • comparison results.
  • discussion based on your experimental observations.
  • Conclusion and future work - 1 slide.
  • Share with the others the mistakes you encountered and the lessons you learned while completing the final project - 1 slide.

  • List the references. - 1 slide

8-10 min presentation, including Q&A. I recommend using as many informative figures as possible to share what you did with your classmates.

SLIDE 4

Guideline for the final project report (May 2)

  • Title
  • Abstract
  • Introduction (can include related work)
  • Related work (if not included in the introduction section, list it as a separate section)

  • Techniques
  • Experiments (describe the data set in detail, the data split, feature extraction, and hyper-parameter selection; show comparison results and discuss your experimental observations).

  • Conclusion and Future work.
  • References.
SLIDE 5

Pattern recognition design cycle

Collect data → Select features → choose a model: classifier model (train classifier, evaluate classifier), regression model (train regressor, evaluate regressor), or clustering model (train mixture model, evaluate clustering).

SLIDE 6

Pattern recognition design cycle

Collect data → Select features → Classifier model → Train classifier → Evaluate classifier. Classification models covered: ensemble classification models, Hidden Markov Models, multi-layer neural networks, Convolutional Neural Networks.

SLIDE 7

The Bagging Algorithm

  • B bootstrap samples
  • From which we derive:
  • The aggregate classifier becomes:
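The formulas on this slide did not survive extraction; a standard statement of bagging, with notation assumed rather than taken from the slide, is:

\[
D^{*b} \sim \text{Bootstrap}(D), \qquad \hat{f}^{*b} = \text{train}(D^{*b}), \qquad b = 1,\dots,B
\]
\[
\hat{f}_{\text{bag}}(x) = \operatorname*{arg\,max}_{y} \sum_{b=1}^{B} \mathbb{1}\!\left[\hat{f}^{*b}(x) = y\right]
\]

i.e., each bootstrap sample trains one classifier, and the aggregate classifier takes a majority vote.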
SLIDE 8

Example (1)

SLIDE 9

Example (2)

Testing set

SLIDE 10

Random Forest

SLIDE 11

Training and Information Gain

SLIDE 12

Classification Forest: Ensemble Model

SLIDE 13

AdaBoost

  • Constructing Dt

where Zt is a normalization constant
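The construction itself did not extract; the standard AdaBoost weight update (notation assumed) is:

\[
D_{t+1}(i) = \frac{D_t(i)\,\exp\!\big(-\alpha_t\, y_i\, h_t(x_i)\big)}{Z_t},
\qquad
Z_t = \sum_{i} D_t(i)\,\exp\!\big(-\alpha_t\, y_i\, h_t(x_i)\big),
\qquad
D_1(i) = \tfrac{1}{m}.
\]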

SLIDE 14

The AdaBoost Algorithm

SLIDE 15

The AdaBoost Algorithm

SLIDE 16

Analyzing training error

  • What αt to choose for hypothesis ht?
  • If each weak learner ht is slightly better than random

guessing (εt < 0.5), then the training error of AdaBoost decays exponentially fast in the number of rounds T.

SLIDE 17

What αt to choose for hypothesis ht?

  • For boolean target function, this is accomplished by

[Freund & Schapire ’97]:

  • We can tighten this bound greedily, by choosing αt and

ht on each iteration to minimize Zt.
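The formulas referenced here are not recoverable from the extraction; in the standard analysis (Freund & Schapire '97), the training error is bounded by the product of the normalizers, and the greedy choice of αt (with εt the weighted error of ht) is:

\[
\frac{1}{m}\sum_{i=1}^{m}\mathbb{1}\big[H(x_i)\neq y_i\big] \;\le\; \prod_{t=1}^{T} Z_t,
\qquad
\alpha_t = \tfrac{1}{2}\ln\!\frac{1-\varepsilon_t}{\varepsilon_t},
\qquad
Z_t = 2\sqrt{\varepsilon_t(1-\varepsilon_t)}.
\]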


SLIDE 19

Dumb classifiers made Smart

  • Training error of final classifier is bounded by:
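The bound itself did not extract; the standard statement, with γt = 1/2 − εt the edge of weak learner ht, is:

\[
\text{training error}(H) \;\le\; \prod_{t=1}^{T} 2\sqrt{\varepsilon_t(1-\varepsilon_t)}
\;\le\; \exp\!\Big(-2\sum_{t=1}^{T}\gamma_t^2\Big).
\]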
SLIDE 20

Hidden Markov Models

  • Parameters – stationary/homogeneous Markov model (independent of time t)
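The parameter definitions did not extract; a stationary HMM is conventionally parameterized as (notation assumed):

\[
\pi_i = p(q_1 = s_i), \qquad
a_{ij} = p(q_{t+1} = s_j \mid q_t = s_i), \qquad
b_j(o) = p(o_t = o \mid q_t = s_j),
\]

where the transition probabilities a_{ij} and emission probabilities b_j(o) do not depend on t.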

SLIDE 21

Three main problems in HMMs

SLIDE 22

HMM Algorithms

  • Evaluation

– What is the probability of the observed sequence? Forward Algorithm

  • Decoding

– What is the probability that the third roll was loaded given the observed sequence? Forward-Backward Algorithm
– What is the most likely die sequence given the observed sequence? Viterbi Algorithm

  • Learning

– Under what parameterization is the observed sequence most probable? Baum-Welch Algorithm (EM)
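As a concrete reference for the evaluation problem, here is a minimal forward-algorithm sketch in Python; the variable names and the fair/loaded-die numbers are illustrative assumptions, not values from the slides.

```python
import numpy as np

def forward(pi, A, B, obs):
    """Forward algorithm: returns p(obs | model).
    pi: (N,) initial state probabilities
    A:  (N, N) transition probabilities, A[i, j] = p(s_j | s_i)
    B:  (N, M) emission probabilities,  B[i, k] = p(o_k | s_i)
    obs: sequence of observation indices
    """
    alpha = pi * B[:, obs[0]]              # initialization: alpha_1(i) = pi_i * b_i(o_1)
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]      # induction step
    return alpha.sum()                     # termination: sum over final states

# Toy fair/loaded die model (numbers are illustrative)
pi = np.array([0.5, 0.5])                  # states: 0 = fair, 1 = loaded
A = np.array([[0.95, 0.05],
              [0.10, 0.90]])
B = np.vstack([np.full(6, 1 / 6),          # fair die
               [0.1, 0.1, 0.1, 0.1, 0.1, 0.5]])  # loaded die favors "6"
print(forward(pi, A, B, [5, 5, 0, 5]))     # die faces 1..6 encoded as indices 0..5
```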

SLIDE 23

Notations

SLIDE 24

Forward Probability

SLIDE 25

Forward Algorithm

SLIDE 26-30

Forward Algorithm Example (worked example carried across slides 26-30; these slides contain only figures)

SLIDE 31

Viterbi Algorithm

SLIDE 32

Viterbi Algorithm
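The algorithm itself is presented in figures that are not recoverable here; as a reference, a minimal Viterbi sketch in the same (assumed) notation as the forward-algorithm code above:

```python
import numpy as np

def viterbi(pi, A, B, obs):
    """Return the most likely state sequence for the observations (decoding)."""
    N, T = len(pi), len(obs)
    delta = np.log(pi) + np.log(B[:, obs[0]])     # log-probability of the best path per state
    psi = np.zeros((T, N), dtype=int)             # back-pointers
    for t in range(1, T):
        scores = delta[:, None] + np.log(A)       # scores[i, j]: best path ending in i, then i -> j
        psi[t] = scores.argmax(axis=0)
        delta = scores.max(axis=0) + np.log(B[:, obs[t]])
    path = [int(delta.argmax())]                  # backtrack from the best final state
    for t in range(T - 1, 0, -1):
        path.append(int(psi[t, path[-1]]))
    return path[::-1]
```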

SLIDE 33-42

Viterbi Algorithm Example (worked example carried across slides 33-42; these slides contain only figures)

SLIDE 43

Network structures

  • Feed-forward networks:

– single-layer perceptrons – multi-layer perceptrons

  • Feed-forward networks implement functions, have no

internal state

SLIDE 44

Feed-forward example

  • Feed-forward network = a parameterized family of

nonlinear functions:

  • Adjusting weights changes the function: do learning

this way!
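The worked example is in the slide figure; a common form of this example (assumed here, standard textbook notation) for a 2-input, 2-hidden-unit, 1-output network is:

\[
a_5 = g\big(W_{3,5}\, a_3 + W_{4,5}\, a_4\big)
    = g\big(W_{3,5}\, g(W_{1,3}x_1 + W_{2,3}x_2) + W_{4,5}\, g(W_{1,4}x_1 + W_{2,4}x_2)\big),
\]

i.e., the output is a nested composition of the activation function g, parameterized by the weights W, and changing the weights changes the function being computed.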

SLIDE 45

Backpropagation learning algorithm BP

  • Solution to credit assignment problem in MLP. Rumelhart,

Hinton and Williams (1986) (though actually invented earlier in a PhD thesis relating to economics)

  • BP has two phases:

Forward pass phase: computes the ‘functional signal’ by feed-forward propagation of the input pattern signals through the network.

Backward pass phase: computes the ‘error signal’ by propagating the error backwards through the network, starting at the output units (where the error is the difference between the actual and desired output values).

SLIDE 46

Conceptually: Forward Activity - Backward Error

Output node i, hidden node j, input node k. Link between hidden node j and output node i: Wji. Link between input node k and hidden node j: Wkj.

SLIDE 47

Back-propagation derivation

  • The squared error on a single example is defined as

where the sum is over the nodes in the output layer.
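Written out (notation assumed), the definition referenced above is:

\[
E = \tfrac{1}{2}\sum_{i}\big(y_i - a_i\big)^2,
\]

where a_i is the activation of output node i and y_i the corresponding target value.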

SLIDE 48

Back-propagation derivation contd.

SLIDE 49

Back-propagation Learning

  • Output layer: same as the single-layer perceptron

where

  • Hidden layer: back-propagation the error from the output

layer.

  • Update rules for the weights in the hidden layers.

(Most neuroscientists deny that back-propagation occurs in the brain)
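The update rules themselves did not extract; a standard form for activation function g with learning rate η (notation assumed, using the Wji/Wkj link naming from the earlier slide) is:

\[
\Delta_i = (y_i - a_i)\, g'(\mathrm{in}_i), \qquad
W_{j,i} \leftarrow W_{j,i} + \eta\, a_j\, \Delta_i \quad \text{(output layer)},
\]
\[
\Delta_j = g'(\mathrm{in}_j)\sum_{i} W_{j,i}\, \Delta_i, \qquad
W_{k,j} \leftarrow W_{k,j} + \eta\, a_k\, \Delta_j \quad \text{(hidden layer)}.
\]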

SLIDE 50

Learning Algorithm: Backpropagation

  • The pictures below illustrate how the signal propagates through the network. Symbols w(xm)n represent the weights of the connections between network input xm and neuron n in the input layer. Symbols on represent the output signal of neuron n.

SLIDE 51

Learning Algorithm: Backpropagation

SLIDE 52

Learning Algorithm: Backpropagation

SLIDE 53

Learning Algorithm: Backpropagation

  • Propagation of signals through the hidden layer.

Symbols wmn represent weights of connections between output of neuron m and input of neuron n in the next layer.

SLIDE 54

Learning Algorithm: Backpropagation

SLIDE 55

Learning Algorithm: Backpropagation

  • Propagation of signals through the output layer.
SLIDE 56

Learning Algorithm: Backpropagation

  • In the next step of the algorithm, the output signal of the network y is compared with the desired output value (the target), which is found in the training data set. The difference is called the error signal of the output-layer neuron.

SLIDE 57

Learning Algorithm: Backpropagation

  • The idea is to propagate the error signal (computed in a single step) back to all neurons whose output signals were inputs to the neuron being discussed.


SLIDE 59

Learning Algorithm: Backpropagation

  • The weight coefficients wmn used to propagate errors back are the same as those used when computing the output value; only the direction of data flow is reversed (signals are propagated from outputs to inputs, layer after layer). This technique is used for all network layers. If the propagated errors come from several neurons, they are added. The illustration is below:


SLIDE 62

Learning Algorithm: Backpropagation

  • Once the error signal for each neuron has been computed, the weight coefficients of each neuron's input connections may be modified.

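To tie the walkthrough on slides 50-63 together, here is a minimal single-hidden-layer backpropagation step in Python; the sigmoid activation, layer sizes, and learning rate are illustrative assumptions, not values from the slides.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
x = rng.normal(size=3)                     # one input pattern (3 features)
t = np.array([1.0])                        # desired output (target)
W1 = rng.normal(scale=0.1, size=(4, 3))    # input -> hidden weights
W2 = rng.normal(scale=0.1, size=(1, 4))    # hidden -> output weights
eta = 0.1                                  # learning rate

# Forward pass: propagate the "functional signal"
h = sigmoid(W1 @ x)                        # hidden activations
y = sigmoid(W2 @ h)                        # network output

# Backward pass: propagate the "error signal"
delta_out = (t - y) * y * (1 - y)              # output-layer error
delta_hid = (W2.T @ delta_out) * h * (1 - h)   # hidden-layer error (same weights, reversed flow)

# Weight updates once every error signal is known
W2 += eta * np.outer(delta_out, h)
W1 += eta * np.outer(delta_hid, x)
```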

SLIDE 64

Classification

  • Classification with a traditional pipeline: input → preprocessing for feature extraction (f1 … fn) → classification → output.
  • Classification with a convolutional neural network: input → feature extraction → shift and distortion invariance classification → output.

SLIDE 65

CNN’s Topology

Feature extraction layer = convolution layer (C); shift and distortion invariance layer = pooling layer (P); these layers produce feature maps.

SLIDE 66

Feature extraction

  • Shared weights: all neurons in a feature share the

same weights (but not the biases).

  • In this way all neurons detect the same feature at

different positions in the input image.

  • Reduce the number of free parameters.


SLIDE 67

Putting it all together

SLIDE 68

Intuition behind Deep Neural Nets

  • The final layer outputs a probability distribution of

categories.

SLIDE 69

Joint training architecture overview

SLIDE 70

Lots of pretrained ConvNets

  • Caffe models: https://github.com/BVLC/caffe/wiki/Model-Zoo
  • TensorFlow models:

https://github.com/tensorflow/models/tree/master/research/slim

  • PyTorch models: https://github.com/Cadene/pretrained-models.pytorch


SLIDE 71

Disadvantages

  • From a memory and capacity standpoint the CNN is

not much bigger than a regular two layer network.

  • At runtime the convolution operations are

computationally expensive and take up about 67% of the time.

  • CNNs are about 3X slower than their fully connected equivalents (size-wise).

SLIDE 72

Disadvantages

  • Convolution operation
  • 4 nested loops (2 loops over the input image & 2 loops over the kernel)
  • Small kernel size
  • makes the inner loops very inefficient, as they frequently JMP.
  • Cache-unfriendly memory access
  • Back-propagation requires both row-wise and column-wise access to the input and kernel images.
  • 2D images are represented in a row-wise (serialized) order.
  • Column-wise access to the data can result in a high rate of cache misses in the memory subsystem.
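As a concrete illustration of the 4-nested-loop structure described above, a minimal (illustrative, assumed) implementation in Python; production frameworks avoid exactly this form because of the cache behaviour noted on the slide.

```python
import numpy as np

def conv2d_naive(image, kernel):
    """Valid 2D cross-correlation with 4 nested loops:
    2 loops over output positions + 2 loops over the kernel."""
    H, W = image.shape
    kH, kW = kernel.shape
    out = np.zeros((H - kH + 1, W - kW + 1))
    for i in range(out.shape[0]):           # loop over output rows
        for j in range(out.shape[1]):       # loop over output cols
            acc = 0.0
            for u in range(kH):             # loop over kernel rows
                for v in range(kW):         # loop over kernel cols
                    acc += image[i + u, j + v] * kernel[u, v]
            out[i, j] = acc
    return out

img = np.arange(36, dtype=float).reshape(6, 6)
k = np.ones((3, 3)) / 9.0                   # 3x3 box filter
print(conv2d_naive(img, k).shape)           # (4, 4)
```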

SLIDE 73

Activation Functions

SReLU (Shift Rectified Linear Unit)

max(-1, x)

SLIDE 74

In practice

  • Use ReLU. Be careful with your learning rates
  • Try out Leaky ReLU / Maxout / ELU
  • Try out tanh but don’t expect much
  • Don’t use sigmoid
SLIDE 75

Mini-batch SGD

  • Loop:
  • 1. Sample a batch of data
  • 2. Forward prop it through the graph, get loss
  • 3. Backprop to calculate the gradients
  • 4. Update the parameters using the gradient
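A minimal sketch of this loop for a linear model with squared loss; the model, batch size, and learning rate are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
X, y = rng.normal(size=(1000, 5)), rng.normal(size=1000)     # toy dataset
w, lr, batch = np.zeros(5), 0.01, 32

for step in range(200):
    idx = rng.choice(len(X), size=batch, replace=False)      # 1. sample a batch of data
    Xb, yb = X[idx], y[idx]
    pred = Xb @ w                                            # 2. forward prop, get the loss
    loss = np.mean((pred - yb) ** 2)
    grad = 2 * Xb.T @ (pred - yb) / batch                    # 3. "backprop": gradient of the loss
    w -= lr * grad                                           # 4. update parameters using the gradient
```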
SLIDE 76

Overview of gradient descent optimization algorithms

Link: http://ruder.io/optimizing-gradient-descent/

SLIDE 77

Which Optimizer to Use?

  • If your input data is sparse, then you are likely to achieve the best results using one of the adaptive learning-rate methods.
  • RMSprop is an extension of Adagrad that deals with its radically diminishing learning rates. Adam, finally, adds bias-correction and momentum to RMSprop. Insofar, RMSprop, Adadelta, and Adam are very similar algorithms that do well in similar circumstances.
  • Experiments show that bias-correction helps Adam slightly outperform RMSprop towards the end of optimization as gradients become sparser. Insofar, Adam might be the best overall choice.
  • Interestingly, many recent papers use SGD without momentum and a simple learning rate annealing schedule. As has been shown, SGD usually manages to find a minimum, but it might take significantly longer than with some of the optimizers, is much more reliant on a robust initialization and annealing schedule, and may get stuck in saddle points rather than local minima.
  • If you care about fast convergence and train a deep or complex neural network, you should choose one of the adaptive learning-rate methods.

SLIDE 78

Learning rate

  • SGD, SGD+Momentum, Adagrad, RMSProp, Adam

all have learning rate as a hyperparameter.

SLIDE 79

L-BFGS

  • Usually works very well in full batch, deterministic

mode.

  • i.e. if you have a single, deterministic f(x) then L-BFGS will

probably work very nicely

  • Does not transfer very well to mini-batch setting.
  • Gives bad results. Adapting L-BFGS to large-scale,

stochastic setting is an active area of research

  • In practice:
  • Adam is a good default choice in most cases
  • If you can afford to do full batch updates then try out L-BFGS (and don't forget to disable all sources of noise)

SLIDE 80

Regularization: Dropout

  • “randomly set some neurons to zero in the forward

pass”

[Srivastava et al., 2014]

SLIDE 81

Regularization: Dropout

  • Wait a second… How could this possibly be a good

idea?

Another interpretation: Dropout is training a large ensemble of models (that share parameters). Each binary mask is one model and gets trained on only ~one datapoint.

SLIDE 82

At test time….

  • Ideally:
  • want to integrate out all the

noise

  • Monte Carlo approximation:
  • do many forward passes with

different dropout masks, average all predictions

SLIDE 83

At test time….

  • Can in fact do this with a single forward pass!

(approximately)

  • Leave all input neurons turned on (no dropout).

Q: Suppose that with all inputs present at test time the output of this neuron is x. What would its output be during training time, in expectation? (e.g. if p = 0.5)
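A minimal sketch of the idea in Python (layer sizes are illustrative; p denotes the keep probability, as in the slide's p = 0.5 example): a random binary mask zeroes activations during training, so the expected activation is p·x, and the single test-time forward pass compensates by scaling.

```python
import numpy as np

p = 0.5                                   # probability of keeping a unit
rng = np.random.default_rng(0)
h = rng.normal(size=100)                  # some layer's activations

# Training: random binary mask in the forward pass
mask = rng.random(h.shape) < p
h_train = h * mask                        # E[h_train] = p * h  (half of x when p = 0.5)

# Test: keep all units, scale by p so expectations match training
h_test = h * p

# ("Inverted dropout" instead divides by p at training time and leaves test-time untouched.)
```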

SLIDE 84

At test time….

  • Can in fact do this with a single forward pass!

(approximately)

  • Leave all input neurons turned on (no dropout).
SLIDE 86

Pattern recognition design cycle

Collect data → Select features → Regression model → Train regressor → Evaluate regressor. Regression models covered: linear regression, support vector regression, logistic regression, Convolutional Neural Networks.

SLIDE 87

Linear Regression

  • Given data with n-dimensional variables and one real-valued target variable, where

  • The objective: Find a function f that returns the best fit.
  • To find the best fit, we minimize the sum of squared

errors -> Least square estimation

  • The solution can be found by solving
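The closed-form solution referenced above did not survive extraction; for a design matrix X and target vector y (notation assumed) it is the normal equation:

\[
w^* = \operatorname*{arg\,min}_{w} \|Xw - y\|^2 = (X^\top X)^{-1} X^\top y.
\]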
SLIDE 88

Linear Regression

To avoid over-fitting, a regularization term can be introduced (minimizing the magnitude of w).
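In the same assumed notation, the L2-regularized (ridge) solution is:

\[
w^* = \operatorname*{arg\,min}_{w} \|Xw - y\|^2 + \lambda \|w\|^2 = (X^\top X + \lambda I)^{-1} X^\top y.
\]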

SLIDE 89

Support Vector Regression

  • Find a function, f(x), with at most ε-deviation from

the target y
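The objective did not extract; the standard linear ε-insensitive SVR primal (notation assumed) is:

\[
\min_{w,b}\; \tfrac{1}{2}\|w\|^2
\quad \text{s.t.} \quad
\big|y_i - (w^\top x_i + b)\big| \le \varepsilon \;\; \forall i,
\]

and the soft-margin version on the following slides adds slack variables ξ_i, ξ_i^* for points that violate the ε-tube.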

SLIDE 90

Support Vector Regression

SLIDE 91

Soft margin

SLIDE 92

Logistic Regression

SLIDE 93

Logistic Regression Objective Function

  • Can’t just use squared loss as in linear regression

– Using the logistic regression model results in a non-convex optimization

SLIDE 94

Deriving the Cost Function via Maximum Likelihood Estimation

SLIDE 95

Deriving the Cost Function via Maximum Likelihood Estimation

SLIDE 96

Regularized Logistic Regression

  • We can regularize logistic regression exactly as before
SLIDE 97

Another Interpretation

  • Equivalently, logistic regression assumes that
  • In other words, logistic regression assumes that the log-odds is a linear function of x
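Written out (notation assumed):

\[
\log\frac{p(y=1\mid x)}{p(y=0\mid x)} = \theta^\top x
\quad\Longleftrightarrow\quad
p(y=1\mid x) = \frac{1}{1+e^{-\theta^\top x}}.
\]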
SLIDE 98

DNN Regression

  • For a two-layer MLP:
  • The network weights are adjusted to minimize an output cost function
SLIDE 99

Idea #1: Localization as Regression

SLIDE 100

Simple Recipe for Classification + Localization

  • Step 2: Attach new fully-connected “regression head”

to the network

SLIDE 101

Simple Recipe for Classification + Localization

  • Step 3: Train the regression head only with SGD and

L2 loss

SLIDE 102

Pattern recognition design cycle

Collect data → Select features → Clustering model → Train mixture model → Evaluate clustering. Clustering models covered: K-means algorithm, hierarchical clustering algorithm, Gaussian mixture model.

SLIDE 103

Clustering evaluation

Clustering is hard to evaluate. In most applications, expert judgements are still the key.

SLIDE 104

Data Clustering - Formal Definition

  • Given a set of N unlabeled examples D = x1, x2, ..., xN in

a d-dimensional feature space, D is partitioned into a number of disjoint subsets Dj’s:

  • A partition is denoted by:

and the problem of data clustering is thus formulated as where f(·) is formulated according to a given criterion.
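The formulas did not extract; a standard way to write the definition (notation assumed) is:

\[
D = \bigcup_{j=1}^{K} D_j, \qquad D_j \cap D_{j'} = \emptyset \;\; (j \neq j'), \qquad
\mathcal{C}^* = \operatorname*{arg\,min}_{\mathcal{C}}\, f(\mathcal{C}),
\]

where C = {D_1, ..., D_K} denotes a partition and f(·) is the chosen clustering criterion.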

SLIDE 105

K-means

SLIDE 106

Pros and cons of K-means

Weaknesses:

  • The user needs to specify the value of K.
  • Applicable only when a mean is defined.
  • The algorithm is sensitive to the initial seeds.
  • The algorithm is sensitive to outliers.
  • Outliers are data points that are very far away from other data points; they could be errors in the data recording or special data points with very different values.
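A minimal K-means sketch in Python, which also makes the listed weaknesses concrete (K, the initial seeds, and the squared-Euclidean mean are all explicit choices; the toy data is illustrative):

```python
import numpy as np

def kmeans(X, K, iters=100, seed=0):
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), K, replace=False)]      # sensitive to these initial seeds
    for _ in range(iters):
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = d.argmin(axis=1)                           # assign each point to nearest center
        new = np.array([X[labels == k].mean(axis=0) if np.any(labels == k) else centers[k]
                        for k in range(K)])                 # recompute means (outliers pull them)
        if np.allclose(new, centers):
            break
        centers = new
    return labels, centers

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(loc=0, size=(50, 2)), rng.normal(loc=5, size=(50, 2))])
labels, centers = kmeans(X, K=2)
```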

SLIDE 107

Hierarchical Clustering

  • Up to now, considered “flat” clustering
  • For some data, hierarchical clustering is more

appropriate than “flat” clustering

  • Hierarchical clustering
SLIDE 108

Hierarchical Clustering:

  • Hierarchical cluster representations: Venn diagram; dendrogram (binary tree).

SLIDE 109

Hierarchical Clustering

  • Algorithms for hierarchical clustering can be divided

into two types:

  • 1. Agglomerative (bottom-up) procedures: start with n singleton clusters and form the hierarchy by merging the most similar clusters.
  • 2. Divisive (top-down) procedures: start with all samples in one cluster and form the hierarchy by splitting the “worst” clusters.

SLIDE 110

Divisive Hierarchical Clustering

  • Any “flat” algorithm which produces a fixed number of clusters can be used (e.g., set c = 2).

SLIDE 111

Agglomerative Hierarchical Clustering

  • Initialize with each example in a singleton cluster; while there is more than 1 cluster:
  • 1. find the 2 nearest clusters
  • 2. merge them
  • Four common ways to measure cluster distance (see the sketch below)
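The four distance measures are covered on the next slides (single, complete, average, and mean linkage); as a sketch, assuming SciPy is available, the whole procedure is:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

X = np.random.default_rng(0).normal(size=(30, 2))
Z = linkage(X, method="single")    # "single" = nearest neighbor; also "complete", "average", "centroid"
labels = fcluster(Z, t=3, criterion="maxclust")   # cut the dendrogram into 3 clusters
```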
SLIDE 112

Single Linkage or Nearest Neighbor

  • Agglomerative clustering with minimum distance
  • generates minimum spanning tree
  • encourages growth of elongated clusters
  • disadvantage: very sensitive to noise
SLIDE 113

Complete Linkage or Farthest Neighbor

  • Agglomerative clustering with maximum distance
  • Encourages compact clusters
  • Does not work well if elongated clusters present
SLIDE 114

Average and Mean Agglomerative Clustering

  • Agglomerative clustering is more robust under the

average or the mean cluster distance

  • Mean distance is cheaper to compute than the average

distance

  • Unfortunately, there is not much to say about

agglomerative clustering theoretically, but it does work reasonably well in practice

SLIDE 115

Agglomerative vs. Divisive

  • Agglomerative is faster to compute, in general
  • Divisive may be less “blind” to the global structure of the data
SLIDE 116

Mixture Density Model

  • Model data with density model.
  • To generate a sample from distribution p(x|θ)
  • first select j with probability p(cj)
  • then generate x according to probability law p(x|cj, θj)
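The density itself did not extract; the mixture model being described is:

\[
p(x\mid\theta) = \sum_{j=1}^{K} p(c_j)\, p(x\mid c_j, \theta_j), \qquad \sum_{j=1}^{K} p(c_j) = 1.
\]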
SLIDE 117

The General GMM Assumption

SLIDE 118

ML Estimation for Mixture Density

  • Can use Maximum Likelihood estimation for a mixture

density; need to estimate

  • As in the supervised case, form the log-likelihood function

SLIDE 119

Expectation Maximization Algorithm

  • EM is an algorithm for ML parameter estimation when the

data has missing values. It is used when

  • 1. data is incomplete (has missing values)
  • some features are missing for some samples due to data corruption, partial survey responses, etc.
  • This scenario is very useful
  • 2. Suppose data X is complete, but p(X|θ) is hard to optimize. Suppose further that by introducing certain hidden variables U, whose values are missing, it becomes easier to optimize the “complete” likelihood function p(X,U|θ). Then EM is useful.
  • This scenario is useful for mixture density estimation, and is the subject of our lecture today.
  • Notice that after we introduce artificial (hidden) variables U with missing values, case 2 is completely equivalent to case 1.

SLIDE 120

EM: Joint Likelihood

  • Let , and
  • The complete likelihood is
  • If we actually observed Z, the log likelihood ln[p(X,Z|θ)]

would be trivial to maximize with respect to θ and

  • The problem, of course, is that the values of Z are missing,

since we made it up (that is Z is hidden)

SLIDE 121

EM Algorithm

  • EM solution is to iterate
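The iteration formulas did not extract; for a Gaussian mixture (a standard instance, notation assumed) the two steps are:

E-step (responsibilities):
\[
\gamma_{nj} = \frac{\pi_j\, \mathcal{N}(x_n \mid \mu_j, \Sigma_j)}{\sum_{k} \pi_k\, \mathcal{N}(x_n \mid \mu_k, \Sigma_k)}.
\]

M-step (parameter updates), with N_j = Σ_n γ_nj:
\[
\mu_j = \frac{1}{N_j}\sum_n \gamma_{nj}\, x_n, \qquad
\Sigma_j = \frac{1}{N_j}\sum_n \gamma_{nj}\,(x_n-\mu_j)(x_n-\mu_j)^\top, \qquad
\pi_j = \frac{N_j}{N}.
\]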
SLIDE 122

EM Algorithm and K-means

  • k-means can be derived from EM algorithm
  • Setting mixing parameters equal for all classes,
  • If we let , then
  • so at the E step, for each current mean, we find all points closest

to it and form new clusters

  • at the M step, we compute the new means inside current clusters
slide-123
SLIDE 123
  • C. Long

Lecture 23 May 6, 2018 123