SLIDE 1

CS4501: Introduction to Computer Vision

Softmax Classifier + Generalization

Various slides from previous courses by: D.A. Forsyth (Berkeley / UIUC), I. Kokkinos (Ecole Centrale / UCL), S. Lazebnik (UNC / UIUC), S. Seitz (MSR / Facebook / UW), J. Hays (Brown / Georgia Tech), A. Berg (Stony Brook / UNC), D. Samaras (Stony Brook), J. M. Frahm (UNC), V. Ordonez (UVA).

SLIDE 2
Last Class

  • Introduction to Machine Learning
  • Unsupervised Learning: Clustering (e.g. k-means clustering)
  • Supervised Learning: Classification (e.g. k-nearest neighbors)

SLIDE 3
Today's Class

  • Softmax Classifier (Linear Classifiers)
  • Generalization / Overfitting / Regularization
  • Global Features

SLIDE 4

Supervised Learning vs Unsupervised Learning

[Figure: images labeled cat, cat, cat, dog, dog, dog, bear, bear, bear, illustrating a mapping $x \rightarrow y$ from images to labels]

SLIDE 5

Supervised Learning vs Unsupervised Learning

[Same figure as Slide 4]

SLIDE 6

Supervised Learning vs Unsupervised Learning

[Figure: the same images; the labeled, supervised side is marked "Classification" and the unlabeled, unsupervised side is marked "Clustering"]

SLIDE 7

Supervised Learning – k-Nearest Neighbors

[Figure: a query image compared against training images labeled cat, cat, cat, dog, dog, dog, bear, bear, bear; its 3 nearest neighbors are cat, cat, dog]

k = 3

SLIDE 8

Supervised Learning – k-Nearest Neighbors

[Figure: another query image; its 3 nearest neighbors are bear, dog, dog]

k = 3
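To make the k = 3 vote concrete, here is a minimal sketch of k-nearest-neighbor prediction (not from the slides; the 2-D feature values and the query point are made up for illustration):

```python
import numpy as np
from collections import Counter

def knn_predict(x, train_X, train_y, k=3):
    # Euclidean distance from the query x to every training example
    dists = np.linalg.norm(train_X - x, axis=1)
    # Take the k closest training examples and let their labels vote
    nearest = np.argsort(dists)[:k]
    votes = Counter(train_y[i] for i in nearest)
    return votes.most_common(1)[0][0]   # e.g. cat, cat, dog -> "cat"

# Toy 2-D features standing in for image features (hypothetical values)
train_X = np.array([[1.0, 1.1], [0.9, 1.0], [1.2, 0.8],   # cats
                    [3.0, 3.1], [3.2, 2.9], [2.8, 3.0],   # dogs
                    [5.0, 5.1], [5.2, 4.9], [4.8, 5.0]])  # bears
train_y = np.array(["cat"] * 3 + ["dog"] * 3 + ["bear"] * 3)

print(knn_predict(np.array([1.1, 0.9]), train_X, train_y, k=3))  # -> cat
```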

SLIDE 9

Supervised Learning – k-Nearest Neighbors

  • How do we choose the right K?
  • How do we choose the right features?
  • How do we choose the right distance metric?
SLIDE 10

Supervised Learning – k-Nearest Neighbors

  • How do we choose the right K?
  • How do we choose the right features?
  • How do we choose the right distance metric?

Answer: just choose the combination that works best! BUT do not measure this on the test data. Instead, split the training data into a "training set" and a "validation set" (also called a "development set").
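As a sketch of that recipe (synthetic data; scikit-learn's KNeighborsClassifier used for brevity), pick the k that scores best on the held-out validation set, and only then evaluate once on the test set:

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))         # made-up image features
y = rng.integers(0, 3, size=100)      # 0 = cat, 1 = dog, 2 = bear

# Split the *training* data; the test set stays untouched until the very end
X_train, y_train = X[:70], y[:70]
X_val,   y_val   = X[70:90], y[70:90]   # validation / development set
X_test,  y_test  = X[90:],   y[90:]

best_k, best_acc = None, -1.0
for k in [1, 3, 5, 7]:
    model = KNeighborsClassifier(n_neighbors=k).fit(X_train, y_train)
    acc = model.score(X_val, y_val)     # accuracy on the validation set
    if acc > best_acc:
        best_k, best_acc = k, acc

# One final evaluation on the test set, using the chosen k
final = KNeighborsClassifier(n_neighbors=best_k).fit(X_train, y_train)
print(best_k, final.score(X_test, y_test))
```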

SLIDE 11

Supervised Learning - Classification

[Figure: labeled images (cat, cat, cat, dog, dog, dog, bear, bear, bear) as Training Data, alongside unlabeled images as Test Data]

SLIDE 12

Supervised Learning - Classification

[Figure: rows of training images labeled cat, cat, dog, bear, continuing with more examples (Training Data), alongside Test Data]

SLIDE 13

Supervised Learning - Classification

[Figure: training images labeled cat, cat, dog, bear]

Training Data

Each training image $i$ is represented by a feature vector $x_i$ and a label $y_i$, still to be filled in:

$x_1 = [\;\;], \quad y_1 = [\;\;]$
$x_2 = [\;\;], \quad y_2 = [\;\;]$
$x_3 = [\;\;], \quad y_3 = [\;\;]$
$x_4 = [\;\;], \quad y_4 = [\;\;]$
$\dots$

SLIDE 14

Supervised Learning - Classification

Training Data

inputs:
$x_1 = [x_{11}\ x_{12}\ x_{13}\ x_{14}]$
$x_2 = [x_{21}\ x_{22}\ x_{23}\ x_{24}]$
$x_3 = [x_{31}\ x_{32}\ x_{33}\ x_{34}]$
$x_4 = [x_{41}\ x_{42}\ x_{43}\ x_{44}]$
$\dots$

targets / labels / ground truth:
$y_1 = 1, \quad y_2 = 1, \quad y_3 = 2, \quad y_4 = 3$

predictions:
$\hat{y}_1 = 1, \quad \hat{y}_2 = 2, \quad \hat{y}_3 = 2, \quad \hat{y}_4 = 1$

We need to find a function that maps x to y for any of them:

$\hat{y}_i = f(x_i; \theta)$

How do we "learn" the parameters $\theta$ of this function? We choose the ones that make the following quantity small:

$\sum_{i=1}^{n} Cost(\hat{y}_i, y_i)$
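To ground the sum above, here is a minimal sketch using the slide's labels and predictions with an assumed 0/1 cost (the slides leave Cost unspecified at this point):

```python
# Assumed 0/1 cost: 0 if the prediction matches the target, 1 otherwise
def cost(y_hat, y):
    return 0 if y_hat == y else 1

targets     = [1, 1, 2, 3]   # y_1 .. y_4 from the slide
predictions = [1, 2, 2, 1]   # predicted labels from the slide

# Sum of Cost(prediction_i, target_i) over all training examples
total = sum(cost(p, t) for p, t in zip(predictions, targets))
print(total)  # -> 2 (examples 2 and 4 are misclassified)
```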

SLIDE 15

Supervised Learning – Softmax Classifier

Training Data

inputs:
$x_i = [x_{i1}\ x_{i2}\ x_{i3}\ x_{i4}]$ for $i = 1, \dots, n$

targets / labels / ground truth:
$y_1 = 1, \quad y_2 = 1, \quad y_3 = 2, \quad y_4 = 3$

SLIDE 16

Supervised Learning – Softmax Classifier

Training Data

inputs:
$x_i = [x_{i1}\ x_{i2}\ x_{i3}\ x_{i4}]$

targets / labels / ground truth (now one-hot encoded):
$y_1 = [1\ 0\ 0], \quad y_2 = [1\ 0\ 0], \quad y_3 = [0\ 1\ 0], \quad y_4 = [0\ 0\ 1]$

predictions:
$\hat{y}_1 = [0.85\ 0.10\ 0.05]$
$\hat{y}_2 = [0.40\ 0.45\ 0.05]$
$\hat{y}_3 = [0.20\ 0.70\ 0.10]$
$\hat{y}_4 = [0.40\ 0.25\ 0.35]$

SLIDE 17

Supervised Learning – Softmax Classifier

$y_1 = [1\ 0\ 0]$
$x_1 = [x_{11}\ x_{12}\ x_{13}\ x_{14}]$
$\hat{y}_1 = [f_c\ f_d\ f_b]$

Scores, one linear function per class (cat, dog, bear):

$g_c = w_{c1} x_{11} + w_{c2} x_{12} + w_{c3} x_{13} + w_{c4} x_{14} + b_c$
$g_d = w_{d1} x_{11} + w_{d2} x_{12} + w_{d3} x_{13} + w_{d4} x_{14} + b_d$
$g_b = w_{b1} x_{11} + w_{b2} x_{12} + w_{b3} x_{13} + w_{b4} x_{14} + b_b$

Softmax, turning the scores into probabilities that sum to 1:

$f_c = e^{g_c} / (e^{g_c} + e^{g_d} + e^{g_b})$
$f_d = e^{g_d} / (e^{g_c} + e^{g_d} + e^{g_b})$
$f_b = e^{g_b} / (e^{g_c} + e^{g_d} + e^{g_b})$
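A minimal NumPy sketch of these two steps (scores, then softmax); the weight values are hypothetical placeholders, and subtracting the maximum score is a standard numerical-stability trick not shown on the slide:

```python
import numpy as np

def softmax_classifier(x, W, b):
    # Scores: one linear function per class, g = W x + b  (W is 3x4, b is 3)
    g = W @ x + b
    # Softmax: exponentiate and normalize so the three outputs sum to 1
    e = np.exp(g - g.max())   # subtract the max for numerical stability
    return e / e.sum()

x1 = np.array([0.2, 0.5, 0.1, 0.9])       # a 4-dimensional feature vector
W  = np.array([[ 0.3, -0.1,  0.2,  0.5],  # row of weights for cat
               [-0.2,  0.4,  0.1, -0.3],  # row of weights for dog
               [ 0.1,  0.1, -0.4,  0.2]]) # row of weights for bear
b  = np.array([0.0, 0.1, -0.1])
print(softmax_classifier(x1, W, b))       # [f_c, f_d, f_b], sums to 1
```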

SLIDE 18

How do we find a good w and b?

$y_1 = [1\ 0\ 0]$
$x_1 = [x_{11}\ x_{12}\ x_{13}\ x_{14}]$
$\hat{y}_1 = [f_c(w, b)\ f_d(w, b)\ f_b(w, b)]$

We need to find w and b that minimize the following function L:

$L(w, b) = \sum_{i=1}^{n} \sum_{j=1}^{3} -y_{i,j} \log(\hat{y}_{i,j})$

Why? Because each $y_i$ is one-hot, only the term for the true class survives:

$L(w, b) = \sum_{i=1}^{n} -\log(\hat{y}_{i,label}) = \sum_{i=1}^{n} -\log f_{i,label}(w, b)$

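A minimal sketch of this loss using the predictions from Slide 16 (the one-hot targets are written as class indices 0, 0, 1, 2 for cat, cat, dog, bear):

```python
import numpy as np

def softmax_loss(probs, labels):
    # L(w, b) = sum over examples of -log(probability of the true class)
    n = len(labels)
    return -np.sum(np.log(probs[np.arange(n), labels]))

probs = np.array([[0.85, 0.10, 0.05],
                  [0.40, 0.45, 0.05],
                  [0.20, 0.70, 0.10],
                  [0.40, 0.25, 0.35]])
labels = np.array([0, 0, 1, 2])      # index of the 1 in each one-hot target
print(softmax_loss(probs, labels))   # confident correct rows contribute least
```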
SLIDE 19

How do we find a good w and b?

Problem statement: Find $w$ and $b$ such that $L(w, b)$ is minimal.

Solution from calculus: set

$\frac{\partial}{\partial w} L(w, b) = 0$ and solve for $w$,

$\frac{\partial}{\partial b} L(w, b) = 0$ and solve for $b$.

SLIDE 20

https://courses.lumenlearning.com/businesscalc1/chapter/reading-curve-sketching/

SLIDE 21

How do we find a good w and b?

Problem statement: Find $w$ and $b$ such that $L(w, b)$ is minimal.

Solution from calculus: set

$\frac{\partial}{\partial w} L(w, b) = 0$ and solve for $w$,

$\frac{\partial}{\partial b} L(w, b) = 0$ and solve for $b$.

SLIDE 22

Problems with this approach:

  • Some functions L(w, b) are very complicated, being compositions of many functions, so finding their analytical derivative is tedious.
  • Even if the function is simple to differentiate, it might not be easy to solve for w, e.g.

$\frac{\partial}{\partial w} L(w, b) = e^{w} + w = 0$

How do you find w in that equation?
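Assuming the (reconstructed) equation above is $e^w + w = 0$, there is indeed no closed-form solution; a numerical method such as bisection can find the root, which hints at the iterative approach on the next slide:

```python
import math

def dL_dw(w):
    return math.exp(w) + w   # the derivative we want to set to zero

# dL_dw(-1) < 0 and dL_dw(0) > 0, so a root lies in [-1, 0]; bisect it
lo, hi = -1.0, 0.0
for _ in range(50):
    mid = (lo + hi) / 2
    if dL_dw(mid) < 0:
        lo = mid
    else:
        hi = mid
print((lo + hi) / 2)   # ~ -0.5671, reachable only numerically
```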

SLIDE 23

Solution: Iterative Approach: Gradient Descent (GD)

[Figure: plot of $L(w)$ versus $w$, with the current point at w = 12]

  1. Start with a random value of w (e.g. w = 12)
  2. Compute the gradient (derivative) of L(w) at point w = 12 (e.g. dL/dw = 6)
  3. Recompute w as: w = w - lambda * (dL / dw)

SLIDE 24

Solution: Iterative Approach: Gradient Descent (GD)

[Figure: the same $L(w)$ curve after one update, now at w = 10]

  2. Compute the gradient (derivative) of L(w) at the current point
  3. Recompute w as: w = w - lambda * (dL / dw)
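The numbers on these two slides can be reproduced with a toy loss. Assuming, purely for illustration, $L(w) = w^2/4$ (so $dL/dw = w/2$, which gives $dL/dw = 6$ at $w = 12$) and a step size lambda = 1/3, the first update lands exactly at w = 10:

```python
def dL_dw(w):
    return w / 2          # derivative of the assumed toy loss L(w) = w**2 / 4

w, lam = 12.0, 1.0 / 3.0  # start at w = 12, as on the slide
for step in range(5):
    w = w - lam * dL_dw(w)   # w = w - lambda * (dL / dw)
    print(step, w)           # 10.0, 8.33, 6.94, ... sliding toward the minimum
```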

SLIDE 25

Gradient Descent (GD)

$L(w, b) = \sum_{i=1}^{n} -\log f_{i,label}(w, b)$

    λ = 0.01
    Initialize w and b randomly
    for e = 0 .. num_epochs do
        Compute: ∂L(w,b)/∂w and ∂L(w,b)/∂b
        Update w: w = w - λ ∂L(w,b)/∂w
        Update b: b = b - λ ∂L(w,b)/∂b
        Print: L(w,b)   // useful to see if the loss is becoming smaller or not
    end

Problem: every single update requires the gradient over all n training examples. Expensive!

SLIDE 26

Solution: (mini-batch) Stochastic Gradient Descent (SGD)

$l(w, b) = \sum_{i \in B} -\log f_{i,label}(w, b)$

    λ = 0.01
    Initialize w and b randomly
    for e = 0 .. num_epochs do
        for b = 0 .. num_batches do
            Compute: ∂l(w,b)/∂w and ∂l(w,b)/∂b
            Update w: w = w - λ ∂l(w,b)/∂w
            Update b: b = b - λ ∂l(w,b)/∂b
            Print: l(w,b)   // useful to see if the loss is becoming smaller or not
        end
    end

where B is a small set of training examples (a mini-batch).
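Here is a runnable NumPy sketch of that loop for the softmax classifier. The data is synthetic, and the gradient expression (predicted probabilities minus the one-hot targets, times the inputs) is the standard softmax cross-entropy gradient whose derivation the upcoming slides refer to:

```python
import numpy as np

def softmax(G):
    E = np.exp(G - G.max(axis=1, keepdims=True))
    return E / E.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
X = rng.normal(size=(90, 4))            # synthetic features, 4 per example
y = rng.integers(0, 3, size=90)         # synthetic labels: 0, 1, or 2

W, b_vec = np.zeros((4, 3)), np.zeros(3)
lam, batch_size = 0.01, 10              # learning rate and mini-batch size

for epoch in range(100):
    order = rng.permutation(len(X))
    for start in range(0, len(X), batch_size):
        B = order[start:start + batch_size]    # B: a small set of examples
        P = softmax(X[B] @ W + b_vec)          # predictions on the mini-batch
        P[np.arange(len(B)), y[B]] -= 1        # dl/dg = probabilities - one-hot
        W     -= lam * (X[B].T @ P) / len(B)   # w = w - lambda * dl/dw
        b_vec -= lam * P.sum(axis=0) / len(B)  # b = b - lambda * dl/db
```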

SLIDE 27

[Figure omitted. Source: Andrew Ng]

SLIDE 28

Three more things

  • How to compute the gradient
  • Regularization
  • Momentum updates
SLIDE 29

SGD Gradient for the Softmax Function

SLIDE 30

SGD Gradient for the Softmax Function

SLIDE 31

SGD Gradient for the Softmax Function

SLIDE 32

Supervised Learning – Softmax Classifier

Extract features:

$x_1 = [x_{11}\ x_{12}\ x_{13}\ x_{14}]$

Run features through the classifier:

$g_c = w_{c1} x_{11} + w_{c2} x_{12} + w_{c3} x_{13} + w_{c4} x_{14} + b_c$
$g_d = w_{d1} x_{11} + w_{d2} x_{12} + w_{d3} x_{13} + w_{d4} x_{14} + b_d$
$g_b = w_{b1} x_{11} + w_{b2} x_{12} + w_{b3} x_{13} + w_{b4} x_{14} + b_b$

$f_c = e^{g_c} / (e^{g_c} + e^{g_d} + e^{g_b})$
$f_d = e^{g_d} / (e^{g_c} + e^{g_d} + e^{g_b})$
$f_b = e^{g_b} / (e^{g_c} + e^{g_d} + e^{g_b})$

Get predictions:

$\hat{y}_1 = [f_c\ f_d\ f_b]$
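Putting the three steps together as code (hypothetical weights standing in for a trained model; the class order cat, dog, bear matches the slides):

```python
import numpy as np

classes = ["cat", "dog", "bear"]

def predict(x, W, b):
    g = W @ x + b                            # run features through classifier
    p = np.exp(g - g.max()); p /= p.sum()    # softmax probabilities
    return classes[int(np.argmax(p))]        # report the most probable class

x1 = np.array([0.2, 0.5, 0.1, 0.9])          # extracted features (made up)
W  = np.array([[ 0.9, -0.1,  0.2,  0.5],     # pretend these were learned
               [-0.2,  0.4,  0.1, -0.3],
               [ 0.1,  0.1, -0.4,  0.2]])
b  = np.zeros(3)
print(predict(x1, W, b))
```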

SLIDE 33

Supervised Machine Learning Steps

[Diagram: Training path: Training Images → Image Features → Training (using Training Labels) → Learned model. Testing path: Test Image → Image Features → Learned model → Prediction]

Slide credit: D. Hoiem

SLIDE 34

Generalization

  • Generalization refers to the ability to correctly classify never-before-seen examples
  • It can be controlled by turning "knobs" that affect the complexity of the model

Training set (labels known) vs. Test set (labels unknown)

SLIDE 35

Overfitting

$f$ is linear → $Loss(f)$ is high → Underfitting, High Bias
$f$ is cubic → $Loss(f)$ is low
$f$ is a polynomial of degree 9 → $Loss(f)$ is zero! → Overfitting, High Variance
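The three regimes can be reproduced with a small NumPy experiment (synthetic 1-D data; polynomial curve fitting stands in for the classifier):

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 10)
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.2, size=10)  # 10 noisy points

for degree in [1, 3, 9]:
    coeffs = np.polyfit(x, y, degree)                # fit a degree-d polynomial
    mse = np.mean((np.polyval(coeffs, x) - y) ** 2)  # training loss
    print(degree, round(mse, 6))
# degree 1: loss is high   -> underfitting (high bias)
# degree 3: loss is low
# degree 9: loss is ~zero  -> it threads every point: overfitting (high variance)
```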

SLIDE 36

Questions?
