CSE 152: Computer Vision, Lecture 9: Convolutional Neural Network and Learning (Hao Su)


SLIDE 1

Lecture 9: Convolutional Neural Network and Learning

CSE 152: Computer Vision

Hao Su

SLIDE 2

Recap: Bias and Variance

  • Bias – error caused because the model lacks the ability to represent the (complex) concept
  • Variance – error caused because the learning algorithm overreacts to small changes (noise) in the training data

TotalLoss = Bias + Variance (+ noise)

SLIDE 3

Credit: Elements of Statistical Learning, Second edition

SLIDE 4

Recap: Universality Theorem

Reference for the reason: http://neuralnetworksanddeeplearning.com/chap4.html

Any continuous function f : R^N → R^M can be realized by a network with one hidden layer (given enough hidden neurons).
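As a concrete illustration (not from the slides; assuming PyTorch, which later slides reference), a one-hidden-layer network realizing a map from R^N to R^M looks like the sketch below, with the hidden width H playing the role of "enough hidden neurons":

```python
import torch.nn as nn

# Hypothetical sizes: N inputs, M outputs, H hidden neurons.
N, H, M = 4, 256, 2
one_hidden_layer_net = nn.Sequential(
    nn.Linear(N, H),   # affine map R^N -> R^H
    nn.ReLU(),         # elementwise nonlinearity (the classical theorem uses sigmoids)
    nn.Linear(H, M),   # affine map R^H -> R^M
)
```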

SLIDE 5

Recap: Universality is Not Enough

  • Neural networks have very high capacity (millions of parameters)
  • By our basic knowledge of the bias-variance tradeoff, so many parameters should imply very low bias and very high variance. The test loss may not be small.
  • Many efforts in deep learning are about mitigating overfitting!
SLIDE 6

Address Overfitting for NN

  • Use larger training data set
  • Design better network architecture
SLIDE 7

Address Overfitting for NN

  • Use larger training data set
  • Design better network architecture
SLIDE 8

Convolutional Neural Network

SLIDE 9

Images as input to neural networks

SLIDE 10

Images as input to neural networks

SLIDE 11

Images as input to neural networks

SLIDE 12

Convolutional Neural Networks

  • CNN = a multi-layer neural network with

– Local connectivity:

  • Neurons in a layer are only connected to a small region of the layer before it

– Share weight parameters across spatial positions:

  • Learning shift-invariant filter kernels

Image credit: A. Karpathy

Jia-Bin Huang and Derek Hoiem, UIUC
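A minimal sketch of these two properties in PyTorch (an illustration under assumed sizes, not code from the lecture): each output of a convolutional layer depends only on a small window of the input, and the same kernel weights are reused at every spatial position.

```python
import torch
import torch.nn as nn

conv = nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3, padding=1)
x = torch.randn(1, 3, 32, 32)      # one RGB image of hypothetical size 32x32
y = conv(x)                        # shape (1, 16, 32, 32)

# Local connectivity: each output value depends only on a 3x3 input window.
# Weight sharing: one set of 16 kernels of shape 3x3x3 is slid over every position.
print(conv.weight.shape)           # torch.Size([16, 3, 3, 3])
```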

SLIDE 13 (no text content)

SLIDE 14

Perceptron:

This is convolution!

SLIDE 15

Recap: Image filtering

[Figure: example pixel-intensity grid being filtered]

3×3 kernel (all ones):
1 1 1
1 1 1
1 1 1

Credit: S. Seitz
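A small NumPy sketch of this averaging (box) filter; the 1/9 normalization and the random test image are assumptions for illustration, since the slide shows only the kernel of ones.

```python
import numpy as np

image = np.random.randint(0, 100, size=(10, 10)).astype(float)  # stand-in for the slide's grid
kernel = np.ones((3, 3)) / 9.0                                   # 3x3 box filter

padded = np.pad(image, 1, mode='edge')     # replicate the border pixels
out = np.zeros_like(image)
for i in range(image.shape[0]):
    for j in range(image.shape[1]):
        # correlation; identical to convolution here because the kernel is symmetric
        out[i, j] = np.sum(padded[i:i + 3, j:j + 3] * kernel)
```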

SLIDES 16–31 (no text content)

SLIDES 32–36: Stride = 3

SLIDE 37

2D spatial filters

SLIDE 38

k-D spatial filters

SLIDE 39

Dimensions of convolution

[Figure labels: image, convolutional layer]

Slide: Lazebnik

SLIDE 40

Dimensions of convolution

[Figure labels: image, feature map, learned weights, convolutional layer]

Slide: Lazebnik

SLIDE 41

Dimensions of convolution

[Figure labels: image, feature map, learned weights, convolutional layer]

Slide: Lazebnik

SLIDE 42

Dimensions of convolution

[Figure labels: image, next layer, convolutional layer]

Slide: Lazebnik

SLIDE 43

Stride s

Dimensions of convolution
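The bookkeeping behind these slides, stated generically (a standard formula rather than a transcription of the figures): for input width W, kernel size K, padding P, and stride S, the output width is floor((W - K + 2P)/S) + 1.

```python
def conv_output_size(w, k, stride=1, pad=0):
    """Spatial output size of a convolution: floor((W - K + 2P) / S) + 1."""
    return (w - k + 2 * pad) // stride + 1

# Hypothetical example with stride 3, echoing the "Stride = 3" slides:
print(conv_output_size(7, 3, stride=3))   # -> 2
```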

SLIDE 44

Number of weights

SLIDE 45

Number of weights
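For reference (standard parameter counting, not transcribed from the hidden figure): a layer with C_in input channels and C_out filters of size K x K has K*K*C_in*C_out weights plus C_out biases, independent of the image size; that independence is the payoff of weight sharing.

```python
def conv_num_weights(k, c_in, c_out, bias=True):
    """Parameter count of a convolutional layer with KxK kernels."""
    return k * k * c_in * c_out + (c_out if bias else 0)

# Example using AlexNet's first layer from a later slide: 11x11 kernels, 3 -> 96 channels.
print(conv_num_weights(11, 3, 96))   # 11*11*3*96 + 96 = 34944
```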

SLIDE 46

Convolutional Neural Networks

[Slides credit: Efstratios Gavves]

SLIDE 47

Local connectivity

SLIDE 48

Pooling operations

  • Aggregate multiple values into a single value
SLIDE 49

Pooling operations

  • Aggregate multiple values into a single value
  • Invariance to small transformations
  • Keep only most important information for next layer
  • Reduces the size of the next layer
  • Fewer parameters, faster computations
  • Observe larger receptive field in next layer
  • Hierarchically extract more abstract features
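A minimal sketch of the most common choice, 2x2 max pooling with stride 2 (assuming PyTorch; the feature-map size is a placeholder), which keeps the largest value in each block and halves both spatial dimensions:

```python
import torch
import torch.nn as nn

pool = nn.MaxPool2d(kernel_size=2, stride=2)
x = torch.randn(1, 16, 28, 28)      # hypothetical feature map
print(pool(x).shape)                # torch.Size([1, 16, 14, 14])
```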
SLIDE 50

Yann LeCun’s MNIST CNN architecture

SLIDE 51

AlexNet for ImageNet

Layers
  • Kernel sizes
  • Strides
  • # channels
  • # kernels
  • Max pooling

SLIDE 52

AlexNet diagram (simplified) [Krizhevsky et al. 2012]

Input: 227 x 227 x 3
Conv 1: 11 x 11 x 3, stride 4, 96 filters
Max pool: 3 x 3, stride 2
Conv 2: 5 x 5 x 48, stride 1, 256 filters
Max pool: 3 x 3, stride 2
Conv 3: 3 x 3 x 256, stride 1, 384 filters
Conv 4: 3 x 3 x 192, stride 1, 384 filters
Conv 5: 3 x 3 x 192, stride 1, 256 filters
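A quick sanity check of the first layer's numbers (a sketch, not from the slides): with a 227x227 input, an 11x11 kernel, and stride 4, the output width is (227 - 11)/4 + 1 = 55, i.e., a 55x55x96 feature map.

```python
import torch
import torch.nn as nn

conv1 = nn.Conv2d(in_channels=3, out_channels=96, kernel_size=11, stride=4)
x = torch.randn(1, 3, 227, 227)     # dummy batch with one image
print(conv1(x).shape)               # torch.Size([1, 96, 55, 55])
```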

SLIDE 53 (no text content)

SLIDE 54

Learning Neural Networks

SLIDE 55

Practice I: Setting Hyperparameters

Slides: Fei-Fei Li, Justin Johnson & Serena Yeung (CS231n, Lecture 2, April 5, 2018)

SLIDE 56

Practice I: Setting Hyperparameters

Idea #1: Choose hyperparameters that work best on the data
[Your Dataset]

SLIDE 57

Practice I: Setting Hyperparameters

Idea #1: Choose hyperparameters that work best on the data
BAD: a big network always works perfectly on the training data
[Your Dataset]

SLIDE 58

Practice I: Setting Hyperparameters

Idea #1: Choose hyperparameters that work best on the data
BAD: a big network always works perfectly on the training data
Idea #2: Split data into train and test; choose hyperparameters that work best on the test data
[Your Dataset: train | test]

SLIDE 59

Practice I: Setting Hyperparameters

Idea #1: Choose hyperparameters that work best on the data
BAD: a big network always works perfectly on the training data
Idea #2: Split data into train and test; choose hyperparameters that work best on the test data
BAD: no idea how the algorithm will perform on new data
[Your Dataset: train | test]

SLIDE 60

Practice I: Setting Hyperparameters

Idea #1: Choose hyperparameters that work best on the data
BAD: a big network always works perfectly on the training data
Idea #2: Split data into train and test; choose hyperparameters that work best on the test data
BAD: no idea how the algorithm will perform on new data
Idea #3: Split data into train, val, and test; choose hyperparameters on val and evaluate on test
Better!
[Your Dataset: train | validation | test]

SLIDE 61

Practice II: Select Optimizer

SLIDE 62

Stochastic gradient descent

Gradient from the entire training set: ∇L(w) = (1/N) Σᵢ ∇ℓ(xᵢ, yᵢ; w)

  • For large training data, gradient computation takes a long time
  • This leads to “slow learning”
  • Instead, consider a mini-batch with m samples
  • If the sample size is large enough, its properties approximate those of the full dataset
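A generic sketch of the resulting update loop (assumed names w, grad_fn, lr; this is not the lecture's code): each step uses the gradient of the loss on a random mini-batch instead of the whole training set.

```python
import numpy as np

def sgd(w, X, y, grad_fn, lr=0.1, batch_size=32, epochs=10):
    """Mini-batch SGD: grad_fn(w, X_batch, y_batch) returns the mini-batch gradient."""
    n = X.shape[0]
    for _ in range(epochs):
        order = np.random.permutation(n)             # reshuffle every epoch
        for start in range(0, n, batch_size):
            batch = order[start:start + batch_size]
            g = grad_fn(w, X[batch], y[batch])       # gradient estimated from m samples
            w = w - lr * g                           # descend along the estimate
    return w
```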
SLIDE 63

Stochastic gradient descent

SLIDE 64

Stochastic gradient descent

SLIDE 65

Stochastic gradient descent

SLIDE 66

Stochastic gradient descent

Build up velocity as a running mean of gradients.
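In update form, the standard SGD-with-momentum rule is v ← mu·v − lr·grad, then w ← w + v (a sketch with typical values; the slide's exact notation may differ):

```python
def momentum_step(w, grad, v, mu=0.9, lr=0.01):
    """One SGD-with-momentum update; v is the running (velocity) average of gradients."""
    v = mu * v - lr * grad     # decay the old velocity, add the new gradient step
    return w + v, v
```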

SLIDE 67

Many variations of using momentum

  • In PyTorch, you can manually specify the momentum of SGD
  • Or, you can use other optimization algorithms with “adaptive” momentum, e.g., ADAM
  • ADAM: Adaptive Moment Estimation
  • Empirically, ADAM usually converges faster, but SGD gives local minima with better generalizability
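The corresponding PyTorch calls (these optimizer APIs are real; the tiny model and learning rates are placeholders):

```python
import torch
import torch.nn as nn

model = nn.Linear(10, 2)   # stand-in model; any nn.Module works

opt_sgd = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)   # manual momentum
opt_adam = torch.optim.Adam(model.parameters(), lr=1e-3)               # adaptive moments
```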

SLIDE 68

Practice III: Data Augmentation

SLIDE 69 (no text content)

SLIDE 70

Horizontal flips

SLIDE 71

Random crops and scales

SLIDE 72

Color jitter

SLIDE 73

Color jitter. Can do a lot more: rotation, shear, non-rigid deformation, motion blur, lens distortions, ….
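A minimal torchvision sketch covering the augmentations above (parameter values are placeholders, not the lecture's settings):

```python
from torchvision import transforms

augment = transforms.Compose([
    transforms.RandomHorizontalFlip(p=0.5),                                # horizontal flips
    transforms.RandomResizedCrop(224),                                     # random crops and scales
    transforms.ColorJitter(brightness=0.4, contrast=0.4, saturation=0.4),  # color jitter
    transforms.ToTensor(),
])
```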

SLIDE 74

Exam

  • Linear algebra: rank, null space, range, invertibility, eigendecomposition, SVD, pseudo-inverse, basic matrix calculus
  • Optimization: least squares, low-rank approximation, statistical interpretation of PCA
  • Image formation: diffuse/specular reflection, Lambertian lighting equation
  • Filtering: linear filters, filter vs. convolution, properties of filters, filter banks, usage of filters, median filter
  • Statistics: bias, variance, bias-variance tradeoff, overfitting, underfitting
  • Neural networks: linear classifier, softmax, why a linear classifier is insufficient, activation functions, feed-forward pass, universality theorem, what back-propagation does, stochastic gradient descent, concepts in neural networks, why CNN, concepts in CNN, how to set hyperparameters, momentum in SGD, data augmentation