CSE 152: Computer Vision (Hao Su). Lecture 9: Convolutional Neural Network and Learning
Recap: Bias and Variance
- Bias – error caused because the model lacks the ability to represent the (complex) concept
- Variance – error caused because the learning algorithm overreacts to small changes (noise) in the training data

Total loss = Bias + Variance (+ noise)
Credit: Elements of Statistical Learning, Second edition
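As a quick illustration of this decomposition (not from the slides; the target function, noise level, and polynomial degrees below are arbitrary choices), bias and variance can be estimated empirically by refitting models of different capacity on many resampled training sets:

```python
# Illustration only: under/overfitting with polynomial regression (numpy).
# Target function, noise level, and degrees are arbitrary choices for this sketch.
import numpy as np

rng = np.random.default_rng(0)

def sample_data(n=20):
    x = rng.uniform(-1, 1, n)
    y = np.sin(3 * x) + rng.normal(0, 0.2, n)   # noisy observations of sin(3x)
    return x, y

x_test = np.linspace(-1, 1, 200)
y_true = np.sin(3 * x_test)

for degree in (1, 3, 9):                         # low, moderate, high capacity
    preds = []
    for _ in range(200):                         # many independent training sets
        x, y = sample_data()
        coeffs = np.polyfit(x, y, degree)
        preds.append(np.polyval(coeffs, x_test))
    preds = np.array(preds)
    bias2 = np.mean((preds.mean(axis=0) - y_true) ** 2)   # squared bias
    var = preds.var(axis=0).mean()                          # variance
    print(f"degree {degree}: bias^2 = {bias2:.3f}, variance = {var:.3f}")
```

Low-degree fits show high bias and low variance; high-degree fits show the opposite.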
Recap: Universality Theorem
Reference for the reason: http://neuralnetworksanddeeplearning.com/chap4.html
Any continuous function f : R^N → R^M can be realized by a network with one hidden layer (given enough hidden neurons)
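A minimal PyTorch sketch of the theorem in action (hidden width, learning rate, and iteration count are arbitrary illustrative choices): a single hidden layer fits a 1-D continuous function to low error.

```python
# Sketch: one hidden layer approximating a continuous 1-D function (here sin(x)).
import torch
import torch.nn as nn

x = torch.linspace(-3, 3, 256).unsqueeze(1)       # inputs in R^1
y = torch.sin(x)                                  # target continuous function

net = nn.Sequential(nn.Linear(1, 64), nn.Tanh(), nn.Linear(64, 1))
opt = torch.optim.Adam(net.parameters(), lr=1e-2)

for step in range(2000):
    opt.zero_grad()
    loss = nn.functional.mse_loss(net(x), y)
    loss.backward()
    opt.step()

print(loss.item())   # small: one hidden layer suffices given enough neurons
```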
Recap: Universality is Not Enough
- Neural networks have very high capacity (millions of parameters)
- By our basic knowledge of the bias-variance tradeoff, so many parameters should imply very low bias but very high variance; the test loss may not be small
- Many efforts in deep learning are about mitigating overfitting!
Address Overfitting for NN
- Use larger training data set
- Design better network architecture
Convolutional Neural Network
Images as input to neural networks
Convolutional Neural Networks
- CNN = a multi-layer neural network with
– Local connectivity:
- Neurons in a layer are only connected to a small region of the layer before it
– Shared weight parameters across spatial positions:
- Learning shift-invariant filter kernels
Image credit: A. Karpathy
Jia-Bin Huang and Derek Hoiem, UIUC
Perceptron:
This is convolution!
[Figure: grid of pixel intensities (an image patch) illustrating the filter response as a weighted sum over a local window]
Recap: Image filtering
[Filter kernel: 3 × 3 of all ones (box filter)]
Credit: S. Seitz
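To make local connectivity and weight sharing concrete, here is a naive convolution loop (an illustrative sketch, not from the slides): every output value is a weighted sum over a small input window, and the same kernel weights are reused at every spatial position. The box filter above is one such kernel.

```python
# Naive 2-D convolution (cross-correlation, as in CNN libraries), numpy.
import numpy as np

def conv2d(image, kernel):
    H, W = image.shape
    k = kernel.shape[0]                      # assume a square kernel, no padding
    out = np.zeros((H - k + 1, W - k + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            window = image[i:i + k, j:j + k]     # local connectivity: small region only
            out[i, j] = np.sum(window * kernel)  # weight sharing: same kernel everywhere
    return out

image = np.random.rand(8, 8)
box = np.ones((3, 3)) / 9.0                  # normalized 3x3 box filter
print(conv2d(image, box).shape)              # (6, 6)
```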
Stride = 3
2D spatial filters
k-D spatial filters
Convolutional layer: image → feature map, computed with learned weights; the feature map is the input to the next layer
Slide: Lazebnik

Dimensions of convolution
- Stride s
- Number of weights
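The arithmetic behind these slides, as a small sketch (the example numbers are illustrative): with input width W, kernel size K, padding P, and stride S, the output width is (W - K + 2P)/S + 1; a layer with C_in input channels and C_out kernels has K·K·C_in·C_out weights plus C_out biases.

```python
# Convolution dimension arithmetic (illustrative numbers).
def conv_output_size(w, k, p, s):
    return (w - k + 2 * p) // s + 1

def conv_num_weights(k, c_in, c_out):
    return k * k * c_in * c_out + c_out      # weights + one bias per output channel

# e.g. a 227x227x3 input through an 11x11 kernel, stride 4, no padding, 96 filters:
print(conv_output_size(227, 11, 0, 4))       # 55
print(conv_num_weights(11, 3, 96))           # 34944 parameters
```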
Convolutional Neural Networks
[Slides credit: Efstratios Gavves]
Local connectivity
Pooling operations
- Aggregate multiple values into a single value
- Invariance to small transformations
- Keep only most important information for next layer
- Reduces the size of the next layer
- Fewer parameters, faster computations
- Observe larger receptive field in next layer
- Hierarchically extract more abstract features
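A minimal PyTorch illustration of max pooling (the tensor sizes are arbitrary):

```python
import torch
import torch.nn as nn

x = torch.randn(1, 8, 32, 32)             # (batch, channels, height, width)
pool = nn.MaxPool2d(kernel_size=2, stride=2)
y = pool(x)
print(y.shape)                            # torch.Size([1, 8, 16, 16]): spatial size halved,
                                          # fewer values for the next layer, larger receptive field
```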
Yann LeCun’s MNIST CNN architecture
Layers
- Kernel sizes
- Strides
- # channels
- # kernels
- Max pooling
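A hedged PyTorch sketch of a LeNet-style network for 28 × 28 MNIST digits; exact kernel sizes and channel counts vary across descriptions of LeCun's architecture, so the numbers below are illustrative rather than the canonical LeNet-5.

```python
import torch.nn as nn

# LeNet-style CNN for 28x28 grayscale MNIST digits (illustrative channel/kernel choices).
lenet = nn.Sequential(
    nn.Conv2d(1, 6, kernel_size=5, padding=2),   # 28x28 -> 28x28, 6 channels
    nn.Tanh(),
    nn.MaxPool2d(2),                             # 28x28 -> 14x14
    nn.Conv2d(6, 16, kernel_size=5),             # 14x14 -> 10x10, 16 channels
    nn.Tanh(),
    nn.MaxPool2d(2),                             # 10x10 -> 5x5
    nn.Flatten(),
    nn.Linear(16 * 5 * 5, 120),
    nn.Tanh(),
    nn.Linear(120, 84),
    nn.Tanh(),
    nn.Linear(84, 10),                           # 10 digit classes
)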
AlexNet for ImageNet
AlexNet diagram (simplified)
[Krizhevsky et al. 2012]
- Input: 227 x 227 x 3
- Conv 1: 11 x 11 x 3, stride 4, 96 filters
- Max pool: 3 x 3, stride 2
- Conv 2: 5 x 5 x 48, stride 1, 256 filters
- Max pool: 3 x 3, stride 2
- Conv 3: 3 x 3 x 256, stride 1, 384 filters
- Conv 4: 3 x 3 x 192, stride 1, 384 filters
- Conv 5: 3 x 3 x 192, stride 1, 256 filters
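A hedged PyTorch sketch of the convolutional trunk listed above, written for a single device (the original network split channels across two GPUs, which is why per-GPU depths like 48 and 192 appear in the diagram); the fully connected layers are omitted.

```python
import torch
import torch.nn as nn

features = nn.Sequential(
    nn.Conv2d(3, 96, kernel_size=11, stride=4),            # Conv 1: 227 -> 55
    nn.ReLU(inplace=True),
    nn.MaxPool2d(kernel_size=3, stride=2),                  # 55 -> 27
    nn.Conv2d(96, 256, kernel_size=5, padding=2),           # Conv 2: 27 -> 27
    nn.ReLU(inplace=True),
    nn.MaxPool2d(kernel_size=3, stride=2),                  # 27 -> 13
    nn.Conv2d(256, 384, kernel_size=3, padding=1),          # Conv 3
    nn.ReLU(inplace=True),
    nn.Conv2d(384, 384, kernel_size=3, padding=1),          # Conv 4
    nn.ReLU(inplace=True),
    nn.Conv2d(384, 256, kernel_size=3, padding=1),          # Conv 5
    nn.ReLU(inplace=True),
    nn.MaxPool2d(kernel_size=3, stride=2),                  # 13 -> 6
)

print(features(torch.randn(1, 3, 227, 227)).shape)          # torch.Size([1, 256, 6, 6])
```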
Learning Neural Networks
Slide credit: Fei-Fei Li, Justin Johnson & Serena Yeung, Stanford CS231n (Lecture 2, April 5, 2018)
Practice I: Setting Hyperparameters
- Idea #1: Choose hyperparameters that work best on the data
- BAD: a big network always works perfectly on the training data
- Idea #2: Split data into train and test; choose hyperparameters that work best on test data
- BAD: no idea how the algorithm will perform on new data
- Idea #3: Split data into train, validation, and test; choose hyperparameters on val and evaluate on test (see the sketch below)
- Better!
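A minimal sketch of Idea #3 (the data, the toy logistic-regression `train_and_evaluate` helper, and the candidate learning rates are all placeholders for illustration):

```python
import numpy as np

# Toy stand-ins so the skeleton runs; replace with your real data and training code.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 10))
y = (X[:, 0] > 0).astype(int)

def train_and_evaluate(X_tr, y_tr, X_ev, y_ev, lr):
    """Toy stand-in: logistic regression trained by gradient descent, returns accuracy."""
    w = np.zeros(X_tr.shape[1])
    for _ in range(100):
        p = 1 / (1 + np.exp(-X_tr @ w))
        w -= lr * X_tr.T @ (p - y_tr) / len(y_tr)
    return np.mean(((X_ev @ w) > 0) == y_ev)

# Idea #3: split into train / val / test; tune on val, report on test exactly once.
idx = rng.permutation(len(X))
train_idx, val_idx, test_idx = idx[:700], idx[700:850], idx[850:]

best_lr, best_acc = None, -1.0
for lr in (1e-1, 1e-2, 1e-3):                     # candidate hyperparameters
    acc = train_and_evaluate(X[train_idx], y[train_idx], X[val_idx], y[val_idx], lr)
    if acc > best_acc:
        best_lr, best_acc = lr, acc

print("chosen lr:", best_lr,
      "test acc:", train_and_evaluate(X[train_idx], y[train_idx], X[test_idx], y[test_idx], best_lr))
```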
Practice II: Select Optimizer
Stochastic gradient descent
Gradient from the entire training set:
- For large training data, computing the full gradient takes a long time
- Leads to “slow learning”
- Instead, consider a mini-batch with m samples (see the sketch below)
- If the mini-batch is large enough, its statistical properties approximate those of the full dataset
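A toy mini-batch SGD loop (problem sizes, learning rate, and step count are illustrative): each update uses the gradient of a random batch of m samples instead of the full training set.

```python
import torch

# Mini-batch SGD on a toy least-squares problem.
n, d, m = 10_000, 20, 64                          # dataset size, dimension, mini-batch size
X = torch.randn(n, d)
y = X @ torch.randn(d) + 0.1 * torch.randn(n)

w = torch.zeros(d, requires_grad=True)
lr = 0.1
for step in range(500):
    batch = torch.randint(0, n, (m,))               # sample a mini-batch of m examples
    loss = ((X[batch] @ w - y[batch]) ** 2).mean()  # gradient of the batch, not the full set
    loss.backward()
    with torch.no_grad():
        w -= lr * w.grad
        w.grad.zero_()
print(loss.item())
```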
Stochastic gradient descent with momentum
- Build up velocity as a running mean of gradients
Many variations of using momentum
- In PyTorch, you can manually specify the
momentum of SGD
- Or, you can use other optimization algorithms with
“adaptive” momentum, e.g., ADAM
- ADAM: Adaptive Moment Estimation
- Empirically, ADAM usually converges faster, but SGD often finds local minima that generalize better
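In PyTorch this choice looks like the following (the model and learning rates are placeholders):

```python
import torch
import torch.nn as nn

model = nn.Linear(10, 2)                              # placeholder model

# SGD with manually specified momentum:
opt_sgd = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)

# ADAM (adaptive moment estimation); usually converges faster out of the box:
opt_adam = torch.optim.Adam(model.parameters(), lr=1e-3)
```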
Practice III: Data Augmentation
- Horizontal flips
- Random crops and scales
- Color jitter
- Can do a lot more: rotation, shear, non-rigid, motion blur, lens distortions, …
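A typical torchvision pipeline covering these augmentations (the specific parameter values are illustrative):

```python
from torchvision import transforms

# Random augmentations applied on the fly during training.
train_transform = transforms.Compose([
    transforms.RandomHorizontalFlip(p=0.5),                     # horizontal flips
    transforms.RandomResizedCrop(224, scale=(0.5, 1.0)),        # random crops and scales
    transforms.ColorJitter(brightness=0.4, contrast=0.4,
                           saturation=0.4, hue=0.1),            # color jitter
    transforms.RandomRotation(15),                              # and more: rotation, etc.
    transforms.ToTensor(),
])
```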
Exam
- Linear algebra, such as
– rank, null space, range, invertibility, eigendecomposition, SVD, pseudoinverse, basic matrix calculus
- Optimization:
– least squares, low-rank approximation, statistical interpretation of PCA
- Image formation
– diffuse/specular reflection, Lambertian lighting equation
- Filtering
– linear filters, filter vs. convolution, properties of filters, filter banks, usage of filters, median filter
- Statistics:
– bias, variance, bias-variance tradeoff, overfitting, underfitting
- Neural networks
– linear classifier, softmax, why a linear classifier is insufficient, activation functions, feed-forward pass, universality theorem, what back-propagation does, stochastic gradient descent, concepts in neural networks, why CNN, concepts in CNN, how to set hyperparameters, momentum in SGD, data augmentation