SLIDE 1

CS4501: Introduction to Computer Vision

Max-Margin Classifier, Regularization, Generalization, Momentum, Regression, Multi-label Classification / Tagging

SLIDE 2

Previous Class

  • Softmax Classifier
  • Inference vs Training
  • Gradient Descent (GD)
  • Stochastic Gradient Descent (SGD)
  • Mini-batch Stochastic Gradient Descent (SGD)

SLIDE 3

Previous Class

  • Softmax Classifier
  • Inference vs Training
  • Gradient Descent (GD)
  • Stochastic Gradient Descent (SGD)
  • Mini-batch Stochastic Gradient Descent (SGD)
  • Generalization
  • Regularization / Momentum
  • Max-Margin Classifier
  • Regression / Tagging

SLIDE 4

(mini-batch) Stochastic Gradient Descent (SGD)

For the Softmax classifier, the mini-batch loss is:

    L(w, b) = Σ_{i ∈ B} −log f_{i, label(i)}(w, b)

η = 0.01
Initialize w and b randomly
for e = 0, num_epochs do
    for b = 0, num_batches do
        Compute: ∂L(w, b)/∂w and ∂L(w, b)/∂b
        Update w: w = w − η ∂L(w, b)/∂w
        Update b: b = b − η ∂L(w, b)/∂b
        Print: L(w, b)  // Useful to see if this is becoming smaller or not.
    end
end
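
A minimal NumPy sketch of this loop, assuming made-up toy data and a hypothetical softmax_loss_grad helper (neither is from the slides; the helper averages the loss over the batch rather than summing, so the step size does not depend on batch size):

    import numpy as np

    def softmax_loss_grad(W, b, X, y):
        # Mean cross-entropy loss and its gradients for a linear
        # softmax classifier on one mini-batch (hypothetical helper).
        scores = X @ W + b                            # (batch, classes)
        scores -= scores.max(axis=1, keepdims=True)   # numerical stability
        p = np.exp(scores)
        p /= p.sum(axis=1, keepdims=True)             # softmax probabilities
        n = X.shape[0]
        loss = -np.log(p[np.arange(n), y]).mean()     # -log f_{i, label(i)}
        dscores = p.copy()
        dscores[np.arange(n), y] -= 1.0               # p - one_hot(y)
        dscores /= n
        return loss, X.T @ dscores, dscores.sum(axis=0)

    rng = np.random.default_rng(0)
    X_train = rng.normal(size=(1000, 4))              # toy data, 4 features
    y_train = rng.integers(0, 3, size=1000)           # 3 classes
    W = rng.normal(scale=0.01, size=(4, 3))
    b = np.zeros(3)
    eta, batch_size = 0.01, 100

    for epoch in range(10):
        for start in range(0, len(X_train), batch_size):
            Xb = X_train[start:start + batch_size]
            yb = y_train[start:start + batch_size]
            loss, dW, db = softmax_loss_grad(W, b, Xb, yb)
            W -= eta * dW                             # w = w − η ∂L/∂w
            b -= eta * db                             # b = b − η ∂L/∂b
        print(f"epoch {epoch}: loss {loss:.4f}")      # should trend downward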

SLIDE 5

Supervised Learning – Softmax Classifier

Extract features:

    x_i = [x_i1  x_i2  x_i3  x_i4]

Run features through classifier:

    g_1 = w_11 x_i1 + w_12 x_i2 + w_13 x_i3 + w_14 x_i4 + b_1
    g_2 = w_21 x_i1 + w_22 x_i2 + w_23 x_i3 + w_24 x_i4 + b_2
    g_3 = w_31 x_i1 + w_32 x_i2 + w_33 x_i3 + w_34 x_i4 + b_3

    f_1 = e^{g_1} / (e^{g_1} + e^{g_2} + e^{g_3})
    f_2 = e^{g_2} / (e^{g_1} + e^{g_2} + e^{g_3})
    f_3 = e^{g_3} / (e^{g_1} + e^{g_2} + e^{g_3})

Get predictions:

    ŷ(x_i) = [f_1  f_2  f_3]
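
A compact sketch of this forward pass with NumPy (the feature values and weights below are made up):

    import numpy as np

    x_i = np.array([0.2, -1.0, 0.5, 1.5])             # 4 extracted features
    W = np.random.default_rng(1).normal(size=(3, 4))  # one row per class
    b = np.zeros(3)

    g = W @ x_i + b                   # class scores g_1, g_2, g_3
    g -= g.max()                      # stabilize exp; predictions unchanged
    f = np.exp(g) / np.exp(g).sum()   # softmax
    print(f, f.sum())                 # three probabilities summing to 1.0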

SLIDE 6

Linear Max-Margin Classifier

Training Data

inputs                              targets / labels / ground truth

x_1 = [x_11  x_12  x_13  x_14]      y_1 = [1 0 0]
x_2 = [x_21  x_22  x_23  x_24]      y_2 = [1 0 0]
x_3 = [x_31  x_32  x_33  x_34]      y_3 = [0 1 0]
x_4 = [x_41  x_42  x_43  x_44]      y_4 = [0 0 1]
...

predictions

ŷ(x_1) = [4.3  −1.3   1.1]
ŷ(x_2) = [3.3   3.5   1.1]
ŷ(x_3) = [0.5   5.6  −4.2]
ŷ(x_4) = [1.1  −5.3  −9.4]

SLIDE 7

Linear Max-Margin Classifier – Inference

    x_i = [x_i1  x_i2  x_i3  x_i4]        y_i = [1 0 0]

    ŷ(x_i) = [f_1  f_2  f_3], where

    f_1 = w_11 x_i1 + w_12 x_i2 + w_13 x_i3 + w_14 x_i4 + b_1
    f_2 = w_21 x_i1 + w_22 x_i2 + w_23 x_i3 + w_24 x_i4 + b_2
    f_3 = w_31 x_i1 + w_32 x_i2 + w_33 x_i3 + w_34 x_i4 + b_3

Unlike the Softmax classifier, the raw scores are used directly, with no exponential normalization.

SLIDE 8

Training: How do we find a good w and b?

    x_i = [x_i1  x_i2  x_i3  x_i4]        y_i = [1 0 0]
    ŷ(x_i) = [f_1(w, b)  f_2(w, b)  f_3(w, b)]

We need to find w and b that minimize the following:

    L(w, b) = Σ_{i=1}^{n} Σ_{j ≠ label_i} max(0, f_j(x_i) − f_{label_i}(x_i) + Δ)

Why might this be good compared to softmax?
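
A small NumPy sketch of this multi-class hinge loss, reusing two prediction rows from Slide 6 as toy scores and assuming Δ = 1:

    import numpy as np

    def multiclass_hinge_loss(scores, labels, delta=1.0):
        # sum_i sum_{j != label_i} max(0, f_j - f_label + delta)
        n = scores.shape[0]
        correct = scores[np.arange(n), labels][:, None]   # f_label per row
        margins = np.maximum(0.0, scores - correct + delta)
        margins[np.arange(n), labels] = 0.0               # skip j == label_i
        return margins.sum()

    scores = np.array([[4.3, -1.3, 1.1],     # ŷ(x_1), true class 0
                       [0.5,  5.6, -4.2]])   # ŷ(x_3), true class 1
    labels = np.array([0, 1])
    print(multiclass_hinge_loss(scores, labels))  # 0.0: all margins exceed Δ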

SLIDE 9

Regression vs Classification

Regression
  • Labels are continuous variables, e.g. distance.
  • Losses: distance-based losses, e.g. the sum of distances to the true values.
  • Evaluation: mean distances, correlation coefficients, etc.

Classification
  • Labels are discrete variables (1 out of K categories).
  • Losses: cross-entropy loss, margin losses, logistic regression (binary cross-entropy).
  • Evaluation: classification accuracy, etc.

SLIDE 10

Linear Regression – 1 output, 1 input

[Scatter plot of the training pairs (x_1, y_1), (x_2, y_2), …, (x_8, y_8) in the (x, y) plane]

SLIDE 11

Linear Regression – 1 output, 1 input

[Scatter plot of (x_1, y_1), …, (x_8, y_8)]

Model: ŷ = wx + b

SLIDE 12

Linear Regression – 1 output, 1 input

[Scatter plot of (x_1, y_1), …, (x_8, y_8), as on the previous slide]

Model: ŷ = wx + b

SLIDE 13

Linear Regression – 1 output, 1 input

[Scatter plot of (x_1, y_1), …, (x_8, y_8)]

Model: ŷ = wx + b
Loss:  L(w, b) = Σ_{i=1}^{8} (ŷ_i − y_i)²
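
A short sketch of fitting this model by least squares with NumPy (the eight data points are invented stand-ins for the plotted ones):

    import numpy as np

    x = np.linspace(0.0, 7.0, 8)
    y = 2.0 * x + 1.0 + np.random.default_rng(2).normal(scale=0.5, size=8)

    w, b = np.polyfit(x, y, deg=1)         # least-squares line fit
    loss = np.sum((w * x + b - y) ** 2)    # L(w, b) = Σ (ŷ_i − y_i)²
    print(f"w={w:.2f} b={b:.2f} loss={loss:.3f}")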

SLIDE 14

Quadratic Regression

[Scatter plot of (x_1, y_1), …, (x_8, y_8)]

Model: ŷ = w_1 x² + w_2 x + b
Loss:  L(w, b) = Σ_{i=1}^{8} (ŷ_i − y_i)²

SLIDE 15

n-polynomial Regression

[Scatter plot of (x_1, y_1), …, (x_8, y_8)]

Model: ŷ = w_n xⁿ + ⋯ + w_1 x + b
Loss:  L(w, b) = Σ_{i=1}^{8} (ŷ_i − y_i)²
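
A sketch of how the training loss behaves as the polynomial degree grows, on invented data. With 8 points, a degree-7 polynomial interpolates them all exactly, playing the role of the degree-9 fit on the next slide:

    import numpy as np

    x = np.linspace(0.0, 1.0, 8)
    y = np.sin(2 * np.pi * x) + np.random.default_rng(3).normal(scale=0.1, size=8)

    for deg in (1, 3, 7):
        coeffs = np.polyfit(x, y, deg=deg)              # w_n ... w_1, b
        loss = np.sum((np.polyval(coeffs, x) - y) ** 2)
        print(f"degree {deg}: training loss {loss:.2e}")  # shrinks toward 0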

SLIDE 16

Overfitting

[Three fits to the same data: f is linear – Loss(f) is high (underfitting, high bias); f is cubic – Loss(f) is low; f is a polynomial of degree 9 – Loss(f) is zero! (overfitting, high variance)]

Taken from Christopher Bishop’s Pattern Recognition and Machine Learning book.

SLIDE 17

Detecting Overfitting

  • Look at the values of the weights in the polynomial (see the sketch below).
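
A sketch of this check on the same invented data as above: the exactly-interpolating fit tends to have far larger coefficients than the smooth one.

    import numpy as np

    x = np.linspace(0.0, 1.0, 8)
    y = np.sin(2 * np.pi * x) + np.random.default_rng(3).normal(scale=0.1, size=8)

    for deg in (3, 7):
        coeffs = np.polyfit(x, y, deg=deg)
        # Overfit polynomials tend to have huge, oscillating weights.
        print(f"degree {deg}: max |w| = {np.abs(coeffs).max():.2e}")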

SLIDE 18

Recommended Reading

  • http://users.isr.ist.utl.pt/~wurmd/Livros/school/Bishop%20-%20Pattern%20Recognition%20And%20Machine%20Learning%20-%20Springer%20%202006.pdf

Print and read Chapter 1 (at minimum).

SLIDE 19

More…

  • Regularization
  • Momentum updates
SLIDE 20

Regularization

  • Large weights lead to large variance, i.e. the model fits the training data too strongly.
  • Solution: minimize the loss, but also try to keep the weight values small by minimizing:

    L(w, b) + λ Σ_j |w_j|²

SLIDE 21

Regularization

  • Large weights lead to large variance, i.e. the model fits the training data too strongly.
  • Solution: minimize the loss, but also try to keep the weight values small by minimizing:

    L(w, b) + λ Σ_j |w_j|²

The second term is the regularizer, e.g. the L2 regularizer.

SLIDE 22

SGD with Regularization (L2)

Regularized loss:  L(w, b) + λ Σ_j |w_j|²

η = 0.01
Initialize w and b randomly
for e = 0, num_epochs do
    for b = 0, num_batches do
        Compute: ∂L(w, b)/∂w and ∂L(w, b)/∂b
        Update w: w = w − η ∂L(w, b)/∂w − ηλw
        Update b: b = b − η ∂L(w, b)/∂b
        Print: L(w, b)  // Useful to see if this is becoming smaller or not.
    end
end
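
One update step of this loop in NumPy (the shapes and the placeholder gradients are made up; in the real loop they come from differentiating the loss on the batch):

    import numpy as np

    eta, lam = 0.01, 1e-4                       # learning rate η, strength λ
    W = np.random.default_rng(5).normal(size=(4, 3))
    b = np.zeros(3)
    dW, db = np.ones_like(W), np.ones_like(b)   # stand-in gradients

    W = W - eta * dW - eta * lam * W   # extra −ηλw term shrinks the weights
    b = b - eta * db                   # bias left unregularized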

SLIDE 23

Revisiting Another Problem with SGD

[Same SGD-with-regularization pseudocode as Slide 22]

These mini-batch gradients are only approximations to the true gradient of L(w, b).

SLIDE 24

Revisiting Another Problem with SGD

[Same pseudocode as Slide 22]

This could lead to “un-learning” what has been learned in some previous steps of training.

SLIDE 25

Solution: Momentum Updates

[Same pseudocode as Slide 22]

Keep track of previous gradients in an accumulator variable, and use a weighted average with the current gradient.

SLIDE 26

Solution: Momentum Updates

Keep track of previous gradients in an accumulator variable v, and use a weighted average with the current gradient.

Regularized loss:  L(w, b) + λ Σ_j |w_j|²

η = 0.01, γ = 0.9
global v
Initialize w and b randomly
for e = 0, num_epochs do
    for b = 0, num_batches do
        Compute: ∂L(w, b)/∂w
        Compute: v = γv + ∂L(w, b)/∂w + λw
        Update w: w = w − ηv
        Print: L(w, b)  // Useful to see if this is becoming smaller or not.
    end
end
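
A sketch of the momentum update in NumPy, with a stand-in gradient function (everything here is illustrative, not from the slides):

    import numpy as np

    eta, lam, gamma = 0.01, 1e-4, 0.9
    W = np.random.default_rng(6).normal(size=(4, 3))
    v = np.zeros_like(W)                  # accumulator of past gradients

    def fake_grad(W):
        # Stand-in for ∂L(w, b)/∂w computed on a mini-batch.
        return np.sign(W)

    for step in range(100):
        v = gamma * v + fake_grad(W) + lam * W   # weighted running average
        W = W - eta * v                          # w = w − ηv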

SLIDE 27

More on Momentum

  • https://distill.pub/2017/momentum/

SLIDE 28

SLIDE 29

Questions?