Lecture 3: Linear Classification (Fei-Fei Li & Andrej Karpathy)



SLIDE 1

Lecture 3: Linear Classification

SLIDE 2

Last time: Image Classification

[image: example photo, labeled “cat”]

assume a given set of discrete labels {dog, cat, truck, plane, ...}

SLIDE 3

k-Nearest Neighbor

[figure: training set images vs. test images]

SLIDE 4

Linear Classification

  • 1. Define a score function

[diagram: input image → score function → class scores]

SLIDE 5

Linear Classification

  • 1. Define a score function

f(x_i, W, b) = W x_i + b, where x_i is the data (image), W is the “weights”, b is the “bias vector”, and together W and b are the “parameters”; the output is the vector of class scores.

SLIDE 6

Linear Classification

  • 1. Define a score function

(assume the CIFAR-10 example, so 32 x 32 x 3 images and 10 classes)

f(x_i, W, b) = W x_i + b, where the data (image) x_i is [3072 x 1] and the output is the vector of class scores.

SLIDE 7

Linear Classification

  • 1. Define a score function

(assume the CIFAR-10 example, so 32 x 32 x 3 images and 10 classes)

f(x_i, W, b) = W x_i + b, where the weights W are [10 x 3072], the bias vector b is [10 x 1], the data (image) x_i is [3072 x 1], and the class scores are [10 x 1].
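To make these shapes concrete, here is a minimal numpy sketch of the score function; the random initialization and variable names are illustrative, not from the slides:

```python
import numpy as np

# CIFAR-10 setup: 32 x 32 x 3 images flattened to 3072-dim vectors, 10 classes.
D, K = 32 * 32 * 3, 10

W = np.random.randn(K, D) * 0.001   # weights, [10 x 3072]
b = np.zeros((K, 1))                # bias vector, [10 x 1]
x = np.random.randn(D, 1)           # one flattened image, [3072 x 1]

scores = W.dot(x) + b               # class scores, [10 x 1]
print(scores.shape)                 # (10, 1)
```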

SLIDE 8

Linear Classification

SLIDE 9

Interpreting a Linear Classifier

Question: what can a linear classifier do?

SLIDE 10

Interpreting a Linear Classifier

Example: classifiers trained on CIFAR-10:

SLIDE 11

Interpreting a Linear Classifier

SLIDE 12

Bias trick
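The slide's figure is not in the transcript; the trick itself is standard in the course notes: append a constant 1 to x_i and fold b into W as an extra column, so that f(x_i, W, b) = W x_i + b becomes f(x_i, W) = W x_i. A minimal numpy sketch under that assumption:

```python
import numpy as np

D, K = 3072, 10
W = np.random.randn(K, D) * 0.001   # weights, [10 x 3072]
b = np.random.randn(K, 1) * 0.001   # bias vector, [10 x 1]
x = np.random.randn(D, 1)           # one image, [3072 x 1]

# Bias trick: absorb b into W as an extra last column,
# and append a constant 1 to the data vector.
W_ext = np.hstack([W, b])           # [10 x 3073]
x_ext = np.vstack([x, [[1.0]]])     # [3073 x 1]

assert np.allclose(W.dot(x) + b, W_ext.dot(x_ext))
```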

SLIDE 13

So far:

We defined a (linear) score function: f(x_i, W) = W x_i, with the bias folded into W via the bias trick.

SLIDE 14

  • 2. Define a loss function (or cost function, or objective)

SLIDE 15

  • 2. Define a loss function (or cost function, or objective)
  • It maps (class scores, true label) → loss.

SLIDE 16

  • 2. Define a loss function (or cost function, or objective)
  • It maps (class scores, true label) → loss.

Example:

SLIDE 17

  • 2. Define a loss function (or cost function, or objective)
  • It maps (class scores, true label) → loss.

Example:

Question: if you were to assign a single number to how “unhappy” you are with these scores, what would you do?

SLIDE 18

  • 2. Define a loss function (or cost function, or objective)

One (of many ways) to do it: Multiclass SVM Loss

SLIDE 19

  • 2. Define a loss function (or cost function, or objective)

One (of many ways) to do it: Multiclass SVM Loss

(one possible generalization of the binary Support Vector Machine to multiple classes)

SLIDE 20

  • 2. Define a loss function (or cost function, or objective)

One (of many ways) to do it: Multiclass SVM Loss

L_i = Σ_{j ≠ y_i} max(0, s_j − s_{y_i} + Δ)

where L_i is the loss due to example i, the sum runs over all incorrect labels j, s_j − s_{y_i} is the difference between an incorrect class score and the correct class score, and Δ is the margin.
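A minimal numpy sketch of this loss for a single example (the function and variable names are illustrative):

```python
import numpy as np

def svm_loss_i(scores, y_i, delta=1.0):
    """Multiclass SVM loss for one example.

    scores: 1-D array of class scores s
    y_i:    index of the correct class
    delta:  the margin
    """
    margins = np.maximum(0, scores - scores[y_i] + delta)
    margins[y_i] = 0  # the sum excludes the correct class
    return margins.sum()

print(svm_loss_i(np.array([13.0, -7.0, 11.0]), y_i=0, delta=10.0))  # 8.0
```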

SLIDE 21


SLIDE 22

Example:

e.g. Δ = 10; loss = ?

SLIDE 23

Example:

e.g. Δ = 10
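The numbers on this slide are part of the image; a worked example consistent with Δ = 10 (scores assumed for illustration): with scores s = [13, −7, 11] and correct class y_i = 0, the loss is max(0, −7 − 13 + 10) + max(0, 11 − 13 + 10) = 0 + 8 = 8.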

SLIDE 24

SLIDE 25

There is a bug with the objective: the W that achieves zero loss is not unique. If some W gives zero loss, then so does 2W, since scaling the weights up only grows every score margin.

SLIDE 26

L2 Regularization

L = (1/N) Σ_i L_i + λ Σ_k Σ_l W_{k,l}²

where λ is the regularization strength (a hyperparameter).
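A sketch of this full objective over a whole dataset, reusing the per-example SVM loss idea from above (the hyperparameter values are illustrative):

```python
import numpy as np

def full_loss(W, X, y, lam=0.1, delta=1.0):
    """Mean multiclass SVM data loss plus L2 regularization.

    W: [K x D] weights, X: [D x N] examples as columns, y: [N] correct labels.
    lam is the regularization strength (lambda).
    """
    N = X.shape[1]
    scores = W.dot(X)                          # [K x N]
    correct = scores[y, np.arange(N)]          # [N] correct-class scores
    margins = np.maximum(0, scores - correct + delta)
    margins[y, np.arange(N)] = 0               # exclude the correct classes
    data_loss = margins.sum() / N
    reg_loss = lam * np.sum(W * W)             # L2 penalty
    return data_loss + reg_loss
```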

SLIDE 27

L2 regularization: motivation
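The slide's content is an image; the standard motivating example from the course notes (reconstructed here, so treat the numbers as an assumption): for input x = [1, 1, 1, 1], the weights w1 = [1, 0, 0, 0] and w2 = [0.25, 0.25, 0.25, 0.25] give the same score w·x = 1, but their L2 penalties are 1.0 and 0.25 respectively, so the regularizer prefers the more diffuse w2, which spreads its attention across all input dimensions.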

SLIDE 28

SLIDE 29

Do we have to cross-validate both Δ and λ? (In practice Δ can safely be fixed, e.g. Δ = 1.0, since its effect trades off against the overall magnitude of W, which λ already controls.)

SLIDE 30

SLIDE 31

So far…

  • 1. Score function
  • 2. Loss function

SLIDE 32

SLIDE 33

Softmax Classifier

score function is the same (the Softmax classifier is an extension of logistic regression to multiple classes)

SLIDE 34

Softmax Classifier

score function is the same

SLIDE 35

Softmax Classifier

score function is the same; the loss is

L_i = −log( e^{s_{y_i}} / Σ_j e^{s_j} )

where the ratio is the softmax function, i.e. we’re minimizing the negative log likelihood of the correct class.

SLIDE 36

Softmax Classifier

score function is the same; the softmax function e^{s_k} / Σ_j e^{s_j} turns the raw scores into normalized probabilities.

SLIDE 37

Softmax Classifier

score function is the same, i.e. we’re minimizing the negative log likelihood of the correct class.
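A minimal numpy sketch of the softmax cross-entropy loss for one example, with the usual max-subtraction for numerical stability (the function name is illustrative):

```python
import numpy as np

def softmax_loss_i(scores, y_i):
    """Negative log likelihood of the correct class under the softmax."""
    shifted = scores - np.max(scores)   # stability: avoid overflow in exp
    probs = np.exp(shifted) / np.sum(np.exp(shifted))
    return -np.log(probs[y_i])

print(softmax_loss_i(np.array([10.0, 9.0, 9.0]), y_i=0))  # ~0.551
```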

SLIDE 38

SLIDE 39

Softmax vs. SVM

  • Interpreting the probabilities from the Softmax

SLIDE 40

Softmax vs. SVM

  • Interpreting the probabilities from the Softmax

suppose the weights W were only half as large (we use a higher regularization strength)

SLIDE 41

Softmax vs. SVM

  • Interpreting the probabilities from the Softmax

suppose the weights W were only half as large:

SLIDE 42

Softmax vs. SVM

  • Interpreting the probabilities from the Softmax

suppose the weights W were only half as large. What happens in the limit, as the regularization strength goes to infinity?
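A quick numeric check of both questions (score values assumed for illustration): halving W halves all the scores, which makes the softmax probabilities more diffuse; as the regularization strength goes to infinity, the weights shrink toward zero and the probabilities approach the uniform distribution.

```python
import numpy as np

def softmax(s):
    e = np.exp(s - np.max(s))
    return e / e.sum()

s = np.array([3.0, 1.0, -2.0])
print(softmax(s))      # peaked:  approx. [0.876, 0.118, 0.006]
print(softmax(s / 2))  # diffuse: approx. [0.690, 0.254, 0.057]
print(softmax(0 * s))  # uniform: [0.333, 0.333, 0.333]
```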

SLIDE 43

Softmax vs. SVM

scores: [10, -2, 3], [10, 9, 9], [10, -100, -100] (assume the first class is correct; SVM margin Δ = 1)
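A worked comparison under those assumptions: the SVM loss is 0 for all three score vectors, because the correct class beats every other class by at least the margin, and the SVM does not care by how much beyond that. The softmax loss, by contrast, is never fully satisfied: approximately 0.0009 for [10, -2, 3], 0.55 for [10, 9, 9], and essentially 0 for [10, -100, -100].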

SLIDE 44

Softmax vs. SVM


SLIDE 45

Interactive Web Demo time...

http://vision.stanford.edu/teaching/cs231n/linear-classify-demo/

SLIDE 46

Summary

  • We introduced a parametric approach to image classification
  • We defined a score function (linear map)
  • We defined a loss function (SVM / Softmax)

One problem remains: How to find W, b?

SLIDE 47

Next class: Optimization, Backpropagation

Find the W, b that minimize the loss function.