CS 4803 / 7643: Deep Learning. Topics: Image Classification


SLIDE 1

CS 4803 / 7643: Deep Learning

Zsolt Kira Georgia Tech

Topics:

– Image Classification
– Supervised Learning view
– K-NN
– Linear Classifier

SLIDE 2

Last Time

  • High-level intro to what deep learning is
  • Fast brief of logistics

– Requirements: ML, math (linear algebra, calculus), programming (Python)
– Grades: 80% PS/HW, 20% Project, Piazza bonus
– Project: topic of your choosing (related to DL), groups of 3-4, with separate undergrad/grad groups
– 7 free late days
– 1-week re-grading period
– No cheating

  • PS0 out, due Tuesday 01/14 11:55pm

– Graded pass/fail
– Intended to be done on your own
– Don’t worry if rusty! It’s OK to need a refresher on various subjects to do it. Some of it (e.g. the last question) is more suitable for graduate students.
– If not registered, email staff for a Gradescope account

  • Look through slides on website for all details

(C) Dhruv Batra & Zsolt Kira 2

SLIDE 3

Current TAs

(C) Dhruv Batra & Zsolt Kira 3

  • New TAs: Zhuoran Yu, Manas Sahni, and (in process) Harish Kamath
  • Official office hours coming soon (TA and instructor)
  • For this & next week:
  • 11:30am-12:30pm Friday 01/09 (Zhuoran Yu)
  • 11:30am-12:30pm Monday (Patrick)
  • 11:30am-12:30pm Tuesday (Sameer)
  • 4-5pm Tuesday (Jiachen)
  • 1:30-2:30pm Wednesday (Anishi)
  • 11:30am-12:30pm Thursday (Rahul)

Rahul Duggal, 2nd year CS PhD student, http://www.rahulduggal.com/
Patrick Grady, 2nd year Robotics PhD student, https://www.linkedin.com/in/patrick-grady
Sameer Dharur, MS-CS student, https://www.linkedin.com/in/sameerdharur/
Jiachen Yang, 2nd year ML PhD student, https://www.cc.gatech.edu/~jyang462/
Yinquan Lu, 2nd year MSCSE student, https://www.cc.gatech.edu/~jyang462/
Anishi Mehta, MSCS student, https://www.linkedin.com/in/anishimehta

SLIDE 4

Registration/Access

  • Waitlist

– Still a large waitlist for grad, still adding some capacity

  • Canvas

– Anybody not have access?

  • Piazza

– 110+ people signed up. Please use that for questions.

(C) Dhruv Batra & Zsolt Kira 4

Website: http://www.cc.gatech.edu/classes/AY2020/cs7643_spring/
Piazza: https://piazza.com/gatech/spring2020/cs4803dl7643a/
Staff mailing list (personal questions): cs4803-7643-staff@lists.gatech.edu
Gradescope: https://www.gradescope.com/courses/78537

Canvas: https://gatech.instructure.com/courses/94450/

Course Access Code (Piazza): MWXKY8

SLIDE 5

Prep for HW1: Python+Numpy Tutorial

http://cs231n.github.io/python-numpy-tutorial/

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

SLIDE 6

Plan for Today

  • Reminder:

– What changed to enable DL

  • Some Problems with DL
  • Image Classification
  • Supervised Learning view
  • K-NN
  • (Beginning of) Linear Classifiers

(C) Dhruv Batra & Zsolt Kira 6

SLIDE 7

Reminder: What Deep Learning Is

(C) Dhruv Batra & Zsolt Kira 7

  • We will learn a complex non-linear hierarchical (compositional) function in an end-to-end manner

  • (Hierarchical) Compositionality

– Cascade of non-linear transformations
– Multiple layers of representations

  • End-to-End Learning

– Learning (goal-driven) representations
– Learning feature extraction

  • Distributed Representations

– No single neuron “encodes” everything
– Groups of neurons work together
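To make “cascade of non-linear transformations” concrete, here is a minimal numpy sketch (the shapes and random weights are made up purely for illustration): two layers, each a linear map followed by a non-linearity, composing into one end-to-end function.

    import numpy as np

    def relu(z):
        return np.maximum(0, z)  # elementwise non-linearity

    rng = np.random.default_rng(0)
    x = rng.standard_normal(3072)          # e.g. a flattened 32x32x3 image
    W1 = rng.standard_normal((100, 3072))  # layer 1: 3072 -> 100
    W2 = rng.standard_normal((10, 100))    # layer 2: 100 -> 10 class scores

    h = relu(W1 @ x)     # first layer of representation
    scores = W2 @ h      # composed, hierarchical function f(x) = W2 relu(W1 x)
    print(scores.shape)  # (10,)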

SLIDE 8

What Changed?

  • Few people saw this combination coming: gigantic growth in data and processing to enable depth and feature learning

– Combined with specialized hardware (GPUs) and open-source/distribution (arXiv, GitHub)

  • If the input features are poor, so will your result be
  • If your model is poor, so will your result be
  • If your optimizer is poor, so will your result be
  • Now we have methods for feature learning that work (after some finesse)

– Still have to guard against overfitting (very complex functions!)
– Still tune hyper-parameters
– Still design neural network architectures
– Lots of research to automate this too, e.g. via reinforcement learning!

SLIDE 9

Problems with Deep Learning

  • Problem#1: Non-Convex! Non-Convex! Non-Convex!

– Depth >= 3: most losses non-convex in parameters
– Theoretically, all bets are off
– Leads to stochasticity

  • different initializations → different local minima
  • Standard response #1

– “Yes, but all interesting learning problems are non-convex”
– For example, human learning

  • Order matters → wave hands → non-convexity
  • Standard response #2

– “Yes, but it often works!”

(C) Dhruv Batra & Zsolt Kira 9

SLIDE 10

Problems with Deep Learning

  • Problem#2: Lack of interpretability

– Hard to track down what’s failing
– Pipeline systems have “oracle” performances at each step
– In end-to-end systems, it’s hard to know why things are not working

(C) Dhruv Batra & Zsolt Kira 10

SLIDE 11

Problems with Deep Learning

  • Problem#2: Lack of interpretability

(C) Dhruv Batra & Zsolt Kira 11

End-to-End Pipeline [Fang et al. CVPR15] [Vinyals et al. CVPR15]

SLIDE 12

Problems with Deep Learning

  • Problem#2: Lack of interpretability

– Hard to track down what’s failing
– Pipeline systems have “oracle” performances at each step
– In end-to-end systems, it’s hard to know why things are not working

  • Standard response #1

– Tricks of the trade: visualize features, add losses at different layers, pre-train to avoid degenerate initializations…
– “We’re working on it”

  • Standard response #2

– “Yes, but it often works!”

(C) Dhruv Batra & Zsolt Kira 12

SLIDE 13

Problems with Deep Learning

  • Problem#3: Lack of easy reproducibility

– Direct consequence of stochasticity & non-convexity

  • Standard response #1

– It’s getting much better
– Standard toolkits/libraries/frameworks now available
– Caffe, Theano, (Py)Torch

  • Standard response #2

– “Yes, but it often works!”

(C) Dhruv Batra & Zsolt Kira 13

SLIDE 14

Yes it works, but how?

(C) Dhruv Batra & Zsolt Kira 14

SLIDE 15

Image Classification

SLIDE 16

Image Classification: A core task in Computer Vision

(assume a given set of discrete labels, e.g. {dog, cat, truck, plane, ...}; the example image’s label is “cat”)

This image by Nikita is licensed under CC-BY 2.0

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

SLIDE 17

This image by Nikita is licensed under CC-BY 2.0

The Problem: Semantic Gap

What the computer sees: an image is just a big grid of numbers between [0, 255], e.g. 800 x 600 x 3 (3 RGB channels)

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n
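A quick numpy sketch of this representation; the 800 x 600 size is the slide’s example, and the random values are stand-ins for real pixels:

    import numpy as np

    # An RGB image is just a height x width x 3 grid of integers in [0, 255].
    img = np.random.randint(0, 256, size=(600, 800, 3), dtype=np.uint8)
    print(img.shape)  # (600, 800, 3)
    print(img[0, 0])  # the three RGB values of the top-left pixel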

SLIDE 18

Challenges: Viewpoint variation

All pixels change when the camera moves!

This image by Nikita is licensed under CC-BY 2.0

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

SLIDE 19

Challenges: Illumination

All four images are CC0 1.0 public domain

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

SLIDE 20

Challenges: Deformation

Images by Umberto Salvagnin (two), Tom Thai, and sare bear, all licensed under CC-BY 2.0

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

SLIDE 21

Challenges: Occlusion

Two images are CC0 1.0 public domain; one image by jonsson is licensed under CC-BY 2.0

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

SLIDE 22

Challenges: Background Clutter

Both images are CC0 1.0 public domain

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

SLIDE 23

An image classifier


Unlike e.g. sorting a list of numbers, no obvious way to hard-code the algorithm for recognizing a cat, or other classes.

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

SLIDE 24

Attempts have been made


John Canny, “A Computational Approach to Edge Detection”, IEEE TPAMI 1986

(Figure: a hand-crafted pipeline: find edges → find corners → ?)

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

SLIDE 25

ML: A Data-Driven Approach


1. Collect a dataset of images and labels
2. Use Machine Learning to train a classifier
3. Evaluate the classifier on new images

Example training set

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

SLIDE 26

Supervised Learning

  • Input: x

(images, text, emails…)

  • Output: y

(spam or non-spam…)

  • (Unknown) Target Function

– f: X  Y (the “true” mapping / reality)

  • Data

– (x1,y1), (x2,y2), …, (xN,yN)

  • Model / Hypothesis Class

– h: X  Y – y = h(x) = sign(wTx)

  • Learning = Search in hypothesis space

– Find best h in model class.

(C) Dhruv Batra & Zsolt Kira 27
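A tiny numpy sketch of this view using the slide’s linear hypothesis class; the data is synthetic, and the crude random search over w is only to make “learning = search in hypothesis space” concrete:

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.standard_normal((100, 2))       # inputs x_1 ... x_N
    y = np.sign(X @ np.array([2.0, -1.0]))  # labels from an unknown "true" f

    def h(w, X):
        return np.sign(X @ w)               # hypothesis h: X -> Y

    # Learning = search the hypothesis space for the best w in the model class.
    candidates = rng.standard_normal((1000, 2))
    accuracies = [(h(w, X) == y).mean() for w in candidates]
    best_w = candidates[int(np.argmax(accuracies))]
    print(best_w, max(accuracies))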

SLIDE 27

Procedural View

  • Training Stage:

– Training Data {(x, y)} → f (Learning)

  • Testing Stage

– Test Data x → f(x) (Apply function, Evaluate error)

(C) Dhruv Batra & Zsolt Kira 28

SLIDE 28

Statistical Estimation View

  • Probabilities to the rescue:

– X and Y are random variables
– D = (x1,y1), (x2,y2), …, (xN,yN) ~ P(X,Y)

  • IID: Independent Identically Distributed

– Both training & testing data sampled IID from P(X,Y)
– Learn on training set
– Have some hope of generalizing to test set

(C) Dhruv Batra & Zsolt Kira 29

SLIDE 29

Error Decomposition

(C) Dhruv Batra & Zsolt Kira 30

(Figure: error decomposition diagram; “Reality” vs. the model class)

SLIDE 30

Error Decomposition

(C) Dhruv Batra & Zsolt Kira 31

Reality

(Figure: AlexNet as the model class: Input → 11x11 conv, 96 → Pool → 5x5 conv, 256 → Pool → 3x3 conv, 384 → 3x3 conv, 384 → 3x3 conv, 256 → Pool → FC 4096 → FC 4096 → FC 1000 → Softmax)

AlexNet

SLIDE 31

Error Decomposition

(C) Dhruv Batra & Zsolt Kira 32

Reality

(Figure: multi-class logistic regression as the model class: Input (HxWx3) → FC → Softmax)

Multi-class Logistic Regression

SLIDE 32

Error Decomposition

(C) Dhruv Batra & Zsolt Kira 33

Reality

(Figure: VGG19 as the model class: Input → stacked 3x3 conv layers (64, 128, 256, 512) with pooling between stacks → FC 4096 → FC 4096 → FC 1000 → Softmax)

VGG19

SLIDE 33

Error Decomposition

  • Approximation/Modeling Error

– You approximated reality with a model

  • Estimation Error

– You tried to learn the model with finite data

  • Optimization Error

– You were lazy and couldn’t/didn’t optimize to completion

  • Bayes Error

– Reality just sucks

(C) Dhruv Batra & Zsolt Kira 34

SLIDE 34

Guarantees

  • 20 years of research in Learning Theory
  • Oversimplified:
  • If you have:

– Enough training data D
– and H is not too complex
– then probably we can generalize to unseen test data

  • Note: Several ways to measure complexity

– Vapnik–Chervonenkis dimension
– Rademacher complexity

(C) Dhruv Batra & Zsolt Kira 35

SLIDE 35

Learning is hard!

(C) Dhruv Batra & Zsolt Kira 36

SLIDE 36

Learning is hard!

  • No assumptions = No learning

(C) Dhruv Batra & Zsolt Kira 37

SLIDE 37

First classifier: Nearest Neighbor

Memorize all data and labels
Predict the label of the most similar training image

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

SLIDE 38

Example Dataset: CIFAR10


Alex Krizhevsky, “Learning Multiple Layers of Features from Tiny Images”, Technical Report, 2009.

10 classes
50,000 training images
10,000 testing images

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

SLIDE 39

Nearest Neighbours


SLIDE 40

Nearest Neighbours

SLIDE 41

Instance/Memory-based Learning

Four things make a memory-based learner:

  • A distance metric
  • How many nearby neighbors to look at?
  • A weighting function (optional)
  • How to fit with the local points?

(C) Dhruv Batra & Zsolt Kira 42 Slide Credit: Carlos Guestrin

SLIDE 42

1-Nearest Neighbour

Four things make a memory-based learner:

  • A distance metric

– Euclidean (and others)

  • How many nearby neighbors to look at?

– 1

  • A weighting function (optional)

– unused

  • How to fit with the local points?

– Just predict the same output as the nearest neighbour.

(C) Dhruv Batra & Zsolt Kira 43 Slide Credit: Carlos Guestrin

SLIDE 43

k-Nearest Neighbour

Four things make a memory-based learner:

  • A distance metric

– Euclidean (and others)

  • How many nearby neighbors to look at?

– k

  • A weighting function (optional)

– unused

  • How to fit with the local points?

– Just predict the average output among the nearest neighbours.

(C) Dhruv Batra & Zsolt Kira 44 Slide Credit: Carlos Guestrin
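A compact numpy sketch of those four choices (Euclidean distance, k neighbours, no weighting, average of the neighbours’ outputs); the toy data is made up for illustration:

    import numpy as np

    def knn_predict(X_train, y_train, x, k=3):
        dists = np.sqrt(((X_train - x) ** 2).sum(axis=1))  # 1. distance metric: Euclidean
        nn_idx = np.argsort(dists)[:k]                     # 2. how many neighbours: k
        # 3. weighting function: unused (all neighbours count equally)
        return y_train[nn_idx].mean()                      # 4. fit: average their outputs

    X_train = np.array([[0.0], [1.0], [2.0], [3.0]])
    y_train = np.array([0.0, 0.0, 1.0, 1.0])
    print(knn_predict(X_train, y_train, np.array([1.8]), k=3))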

SLIDE 44

1-NN for Regression

(C) Dhruv Batra & Zsolt Kira 45

(Figure: 1-NN regression: the prediction at a query x is the y of the closest training datapoint. Figure Credit: Carlos Guestrin)

SLIDE 45

Distance Metric to compare images


L1 distance: d1(I1, I2) = Σ_p |I1^p − I2^p|

(take pixel-wise absolute value differences, then add them all up)

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n
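In numpy this distance is essentially one line; a minimal sketch with random stand-in “images” (cast to a wider integer type so the uint8 subtraction cannot wrap around):

    import numpy as np

    def l1_distance(im1, im2):
        # pixel-wise absolute value differences, then add them all up
        return np.abs(im1.astype(np.int32) - im2.astype(np.int32)).sum()

    a = np.random.randint(0, 256, (32, 32, 3), dtype=np.uint8)
    b = np.random.randint(0, 256, (32, 32, 3), dtype=np.uint8)
    print(l1_distance(a, b))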

SLIDE 46


Nearest Neighbor classifier

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

SLIDE 47


Nearest Neighbor classifier: memorize training data

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

SLIDE 48


Nearest Neighbor classifier: for each test image, find the closest train image and predict the label of the nearest image

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

SLIDE 49


Nearest Neighbor classifier. Q: With N examples, how fast are training and prediction?

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

SLIDE 50


Nearest Neighbor classifier. Q: With N examples, how fast are training and prediction? A: Train O(1), predict O(N)

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

SLIDE 51


Nearest Neighbor classifier. Q: With N examples, how fast are training and prediction? A: Train O(1), predict O(N). This is bad: we want classifiers that are fast at prediction; slow training is OK.

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n
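The lecture presents this classifier as a small class; a sketch along those lines with L1 distance and CIFAR-style flattened rows (variable names are mine):

    import numpy as np

    class NearestNeighbor:
        def train(self, X, y):
            # O(1): just memorize all data and labels
            self.X_train = X
            self.y_train = y

        def predict(self, X):
            # O(N) per test image: compare against every training image
            y_pred = np.empty(X.shape[0], dtype=self.y_train.dtype)
            for i in range(X.shape[0]):
                dists = np.abs(self.X_train - X[i]).sum(axis=1)  # L1 distances
                y_pred[i] = self.y_train[np.argmin(dists)]
            return y_pred

    # Usage with CIFAR10-shaped arrays (Xtr: 50000x3072, ytr: 50000):
    # nn = NearestNeighbor(); nn.train(Xtr, ytr); preds = nn.predict(Xte)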

SLIDE 52

What does this look like?


Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

SLIDE 53

Nearest Neighbour

  • Demo 1

– http://vision.stanford.edu/teaching/cs231n-demos/knn/

  • Demo 2

– http://www.cs.technion.ac.il/~rani/LocBoost/

(C) Dhruv Batra & Zsolt Kira 55

SLIDE 54

Parametric vs Non-Parametric Models

  • Does the capacity (size of hypothesis class) grow with the size of the training data?

– Yes = Non-Parametric Models
– No = Parametric Models

(C) Dhruv Batra & Zsolt Kira 56

SLIDE 55

K-Nearest Neighbors: Distance Metric


L1 (Manhattan) distance: d1(I1, I2) = Σ_p |I1^p − I2^p|
L2 (Euclidean) distance: d2(I1, I2) = sqrt(Σ_p (I1^p − I2^p)^2)

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

SLIDE 56

K-Nearest Neighbors: Distance Metric


(Figure: K = 1 decision boundaries under the L1 (Manhattan) distance vs. the L2 (Euclidean) distance)

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

SLIDE 57

K-Nearest Neighbors: Distance Metric

– L1 norm (absolute)
– L-infinity (max) norm
– Scaled Euclidean (L2)
– Mahalanobis

Slide by Andrew W. Moore

SLIDE 58

More Distance Metrics

Slide by Andrew W. Moore

Other Metrics…

  • Mahalanobis, Rank-based, Correlation-based (Stanfill+Waltz, Maes’ Ringo system…)

SLIDE 59

Hyperparameters


What is the best value of k to use? What is the best distance to use? These are hyperparameters: choices about the algorithm that we set rather than learn

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

SLIDE 60

Hyperparameters


What is the best value of k to use? What is the best distance to use? These are hyperparameters: choices about the algorithm that we set rather than learn

Very problem-dependent. Must try them all out and see what works best.

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

SLIDE 61

Hyperparameters


Idea #1: Choose hyperparameters that work best on the data

(Figure: Your Dataset)

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

SLIDE 62

Hyperparameters


Idea #1: Choose hyperparameters that work best on the data. BAD: K = 1 always works perfectly on training data

(Figure: Your Dataset)

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

SLIDE 63

Hyperparameters


Idea #1: Choose hyperparameters that work best on the data. BAD: K = 1 always works perfectly on training data
Idea #2: Split data into train and test, choose hyperparameters that work best on test data

(Figure: Your Dataset split into train | test)

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

SLIDE 64

Hyperparameters


Idea #1: Choose hyperparameters that work best on the data. BAD: K = 1 always works perfectly on training data
Idea #2: Split data into train and test, choose hyperparameters that work best on test data. BAD: No idea how algorithm will perform on new data

(Figure: Your Dataset split into train | test)

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

SLIDE 65

Hyperparameters


Idea #1: Choose hyperparameters that work best on the data. BAD: K = 1 always works perfectly on training data
Idea #2: Split data into train and test, choose hyperparameters that work best on test data. BAD: No idea how algorithm will perform on new data
Idea #3: Split data into train, val, and test; choose hyperparameters on val and evaluate on test. Better!

(Figure: Your Dataset split into train | validation | test)

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

SLIDE 66

Hyperparameters


Idea #4: Cross-Validation: Split data into folds, try each fold as validation and average the results

(Figure: Your Dataset split into fold 1 | fold 2 | fold 3 | fold 4 | fold 5 | test, with each fold taking a turn as validation)

Useful for small datasets, but not used too frequently in deep learning

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n
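A sketch of Idea #4 in numpy for choosing k in k-NN; the data is synthetic and the k-NN scorer is a simple stand-in:

    import numpy as np

    def knn_accuracy(k, X_tr, y_tr, X_val, y_val):
        # Stand-in scorer: L1-distance k-NN with a majority vote.
        correct = 0
        for x, y in zip(X_val, y_val):
            nn = y_tr[np.argsort(np.abs(X_tr - x).sum(axis=1))[:k]]
            correct += int(np.bincount(nn).argmax() == y)
        return correct / len(y_val)

    rng = np.random.default_rng(0)
    X = rng.integers(0, 256, (500, 64))  # 500 tiny fake "images"
    y = rng.integers(0, 10, 500)         # 10 fake class labels
    folds_X = np.array_split(X, 5)
    folds_y = np.array_split(y, 5)

    for k in (1, 3, 5, 7):
        accs = []
        for i in range(5):               # each fold takes a turn as validation
            X_tr = np.concatenate(folds_X[:i] + folds_X[i + 1:])
            y_tr = np.concatenate(folds_y[:i] + folds_y[i + 1:])
            accs.append(knn_accuracy(k, X_tr, y_tr, folds_X[i], folds_y[i]))
        print(k, np.mean(accs))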

SLIDE 67

Setting Hyperparameters


Example of 5-fold cross-validation for the value of k. Each point is a single outcome. The line goes through the mean; bars indicate standard deviation. (Seems that k ~= 7 works best for this data.)

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

SLIDE 68

Scene Completion [Hays & Efros, SIGGRAPH07]

(C) Dhruv Batra & Zsolt Kira 70

SLIDE 69

Hays and Efros, SIGGRAPH 2007

SLIDE 70

… 200 total

Hays and Efros, SIGGRAPH 2007

SLIDE 71

Context Matching

Hays and Efros, SIGGRAPH 2007

SLIDE 72

Graph cut + Poisson blending

Hays and Efros, SIGGRAPH 2007

SLIDE 73

Hays and Efros, SIGGRAPH 2007

SLIDE 74

Hays and Efros, SIGGRAPH 2007

SLIDE 75

Hays and Efros, SIGGRAPH 2007

SLIDE 76

Problems with Instance-Based Learning

  • Expensive

– No Learning: most real work done during testing
– For every test sample, must search through the entire dataset – very slow!
– Must use tricks like approximate nearest neighbour search

  • Doesn’t work well when there are a large number of irrelevant features

– Distances overwhelmed by noisy features

  • Curse of Dimensionality

– Distances become meaningless in high dimensions

(C) Dhruv Batra & Zsolt Kira 78

SLIDE 77


k-Nearest Neighbor on images never used.

  • Very slow at test time
  • Distance metrics on pixels are not informative

(all 3 images have same L2 distance to the one on the left)

(Figure panels: Original | Boxed | Shifted | Tinted)

Original image is CC0 public domain

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

SLIDE 78


k-Nearest Neighbor on images never used.

  • Curse of dimensionality

Dimensions = 1, Points = 4
Dimensions = 2, Points = 4^2
Dimensions = 3, Points = 4^3

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n
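The table is just exponential growth; a two-line check (4 points per axis is the slide’s example):

    # Points needed for the same density grow as 4^d with dimension d.
    for d in (1, 2, 3, 10):
        print(d, 4 ** d)  # 4, 16, 64, 1048576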

SLIDE 79

Curse of Dimensionality

(C) Dhruv Batra and Zsolt Kira 81

Figure Credit: Kevin Murphy

SLIDE 80

K-Nearest Neighbors: Summary

In image classification we start with a training set of images and labels, and must predict labels on the test set. The K-Nearest Neighbors classifier predicts labels based on the nearest training examples. Distance metric and K are hyperparameters. Choose hyperparameters using the validation set; only run on the test set once at the very end!


Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

SLIDE 81

This image is CC0 1.0 public domain

(Figure: a Neural Network assembled from Linear classifiers)

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

SLIDE 82

Visual Question Answering

(C) Dhruv Batra 84

(Figure: VQA pipeline: Image → Embedding (VGGNet: Convolution Layer + Non-Linearity, Pooling Layer, repeated, then Fully-Connected MLP; 4096-dim); Question “How many horses are in this image?” → Embedding (LSTM); both → Neural Network → Softmax over top K answers)
SLIDE 83

Recall CIFAR10: 50,000 training images (each 32x32x3), 10,000 test images

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

SLIDE 84

Parametric Approach

Image (array of 32x32x3 numbers, 3072 numbers total) → f(x,W) → 10 numbers giving class scores
W: parameters or weights

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

SLIDE 85

Parametric Approach: Linear Classifier

Image (array of 32x32x3 numbers, 3072 numbers total) → f(x,W) → 10 numbers giving class scores
W: parameters or weights
f(x,W) = Wx

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

SLIDE 86

Parametric Approach: Linear Classifier

Image (array of 32x32x3 numbers, 3072 numbers total) → f(x,W) → 10 numbers giving class scores
W: parameters or weights
f(x,W) = Wx, with shapes (10x1) = (10x3072)(3072x1)

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

SLIDE 87

Parametric Approach: Linear Classifier

Image (array of 32x32x3 numbers, 3072 numbers total) → f(x,W) → 10 numbers giving class scores
W: parameters or weights
f(x,W) = Wx + b, with shapes (10x1) = (10x3072)(3072x1) + (10x1)

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n
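A numpy sketch with exactly these shapes; the random W and b stand in for learned parameters:

    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.random(3072)        # flattened 32x32x3 image
    W = rng.random((10, 3072))  # one row of weights per class
    b = rng.random(10)          # one bias per class

    scores = W @ x + b          # f(x,W) = Wx + b
    print(scores.shape)         # (10,) class scores
    print(scores.argmax())      # index of the highest-scoring class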

SLIDE 88

Error Decomposition

(C) Dhruv Batra 90

Reality

(Figure: AlexNet as the model class: Input → 11x11 conv, 96 → Pool → 5x5 conv, 256 → Pool → 3x3 conv, 384 → 3x3 conv, 384 → 3x3 conv, 256 → Pool → FC 4096 → FC 4096 → FC 1000 → Softmax)

AlexNet

SLIDE 89

Error Decomposition

(C) Dhruv Batra 91

Reality

(Figure: multi-class logistic regression as the model class: Input (HxWx3) → FC → Softmax)

Multi-class Logistic Regression

SLIDE 90


Example with an image with 4 pixels, and 3 classes (cat/dog/ship)

Input image (2x2 pixels):
56 231
24 2

Stretch pixels into column: x = [56, 231, 24, 2]

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

SLIDE 91


Example with an image with 4 pixels, and 3 classes (cat/dog/ship)

Input image (2x2 pixels: 56, 231 / 24, 2), stretched into column: x = [56, 231, 24, 2]

W =
[ 0.2  -0.5   0.1   2.0 ]
[ 1.5   1.3   2.1   0.0 ]
[ 0.0   0.25  0.2  -0.3 ]

b = [1.1, 3.2, -1.2]

Wx + b = [-96.8, 437.9, 61.95] (Cat score, Dog score, Ship score)

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n
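The same computation in numpy, with W, x, and b copied from the slide:

    import numpy as np

    W = np.array([[0.2, -0.5,  0.1,  2.0],
                  [1.5,  1.3,  2.1,  0.0],
                  [0.0,  0.25, 0.2, -0.3]])
    x = np.array([56.0, 231.0, 24.0, 2.0])  # the 4 pixels, stretched into a column
    b = np.array([1.1, 3.2, -1.2])

    print(W @ x + b)  # cat, dog, ship scores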

SLIDE 92


Example with an image with 4 pixels, and 3 classes (cat/dog/ship)

Algebraic Viewpoint: f(x,W) = Wx

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

SLIDE 93


Example with an image with 4 pixels, and 3 classes (cat/dog/ship)

Algebraic Viewpoint: f(x,W) = Wx

Input image stretched into column: x = [56, 231, 24, 2]

W =
[ 0.2  -0.5   0.1   2.0 ]
[ 1.5   1.3   2.1   0.0 ]
[ 0.0   0.25  0.2  -0.3 ]

b = [1.1, 3.2, -1.2]

Score: [-96.8, 437.9, 61.95]

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

SLIDE 94

Interpreting a Linear Classifier


Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

SLIDE 95

Interpreting a Linear Classifier: Visual Viewpoint


Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

SLIDE 96

Interpreting a Linear Classifier: Geometric Viewpoint


f(x,W) = Wx + b

Array of 32x32x3 numbers (3072 numbers total)

Cat image by Nikita is licensed under CC-BY 2.0 Plot created using Wolfram Cloud

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

SLIDE 97

Hard cases for a linear classifier


Case 1: Class 1 = first and third quadrants; Class 2 = second and fourth quadrants
Case 2: Class 1 = 1 <= L2 norm <= 2; Class 2 = everything else
Case 3: Class 1 = three modes; Class 2 = everything else

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

SLIDE 98

Linear Classifier: Three Viewpoints


Algebraic Viewpoint: f(x,W) = Wx
Visual Viewpoint: one template per class
Geometric Viewpoint: hyperplanes cutting up space

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

SLIDE 99

Cat image by Nikita is licensed under CC-BY 2.0; Car image is CC0 1.0 public domain; Frog image is in the public domain

So far: Defined a (linear) score function

f(x,W) = Wx + b

Example class scores for 3 images, for some W: how can we tell whether this W is good or bad?

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

SLIDE 100
TODO:

1. Define a loss function that quantifies our unhappiness with the scores across the training data.
2. Come up with a way of efficiently finding the parameters that minimize the loss function. (optimization)

Cat image by Nikita is licensed under CC-BY 2.0; Car image is CC0 1.0 public domain; Frog image is in the public domain

So far: Defined a (linear) score function

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n