CS4501: Introduction to Computer Vision
Max-Margin Classifier, Regularization, Generalization, Momentum, Regression, Multi-label Classification / Tagging


  1. CS4501: Introduction to Computer Vision – Max-Margin Classifier, Regularization, Generalization, Momentum, Regression, Multi-label Classification / Tagging

  2. Previous Class
     • Softmax Classifier
     • Inference vs Training
     • Gradient Descent (GD)
     • Stochastic Gradient Descent (SGD)
     • mini-batch Stochastic Gradient Descent (SGD)

  3. Previous Class
     • Softmax Classifier
     • Inference vs Training
     • Gradient Descent (GD)
     • Stochastic Gradient Descent (SGD)
     • mini-batch Stochastic Gradient Descent (SGD)
     • Generalization
     • Regularization / Momentum
     • Max-Margin Classifier
     • Regression / Tagging

  4. (mini-batch) Stochastic Gradient Descent (SGD)
     For the Softmax Classifier, the mini-batch loss is L(w, b) = Σ_{i∈B} −log f_{i,label}(w, b), with learning rate α = 0.01.
         Initialize w and b randomly
         for e = 0 .. num_epochs do
             for t = 0 .. num_batches do
                 Compute ∂L(w, b)/∂w and ∂L(w, b)/∂b
                 Update w: w = w − α ∂L(w, b)/∂w
                 Update b: b = b − α ∂L(w, b)/∂b
                 Print L(w, b)   // useful to see if this is becoming smaller or not
             end
         end
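
     As a concrete illustration, here is a minimal NumPy sketch of this loop, not the course's reference code; the data arrays, their shapes, and the epoch/batch counts are assumptions for the example.

         import numpy as np

         rng = np.random.default_rng(0)
         X = rng.normal(size=(512, 4))            # assumed: 512 examples, 4 features
         Y = rng.integers(0, 3, size=512)         # assumed: integer labels for 3 classes

         K, D = 3, X.shape[1]
         w = rng.normal(scale=0.01, size=(K, D))  # initialize w and b randomly
         b = np.zeros(K)
         alpha, batch_size = 0.01, 32             # learning rate as on the slide

         for e in range(10):                              # num_epochs
             for t in range(0, len(X), batch_size):       # num_batches
                 xb, yb = X[t:t + batch_size], Y[t:t + batch_size]
                 g = xb @ w.T + b                         # class scores
                 g -= g.max(axis=1, keepdims=True)        # numerical stability
                 f = np.exp(g) / np.exp(g).sum(axis=1, keepdims=True)  # softmax
                 L = -np.log(f[np.arange(len(yb)), yb]).sum()
                 dg = f.copy()                            # dL/dg = f - one_hot(y)
                 dg[np.arange(len(yb)), yb] -= 1.0
                 w -= alpha * (dg.T @ xb)                 # w = w - alpha * dL/dw
                 b -= alpha * dg.sum(axis=0)              # b = b - alpha * dL/db
             print(L)  # useful to see if the loss is becoming smaller or not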

  5. Supervised Learning – Softmax Classifier
     Input: image x_i (with its ground-truth label y_i).
     Extract features: x_i = [x_i1  x_i2  x_i3  x_i4]
     Run features through the classifier:
         g_1 = w_11 x_i1 + w_12 x_i2 + w_13 x_i3 + w_14 x_i4 + b_1
         g_2 = w_21 x_i1 + w_22 x_i2 + w_23 x_i3 + w_24 x_i4 + b_2
         g_3 = w_31 x_i1 + w_32 x_i2 + w_33 x_i3 + w_34 x_i4 + b_3
     Get predictions:
         f_1 = e^{g_1} / (e^{g_1} + e^{g_2} + e^{g_3})
         f_2 = e^{g_2} / (e^{g_1} + e^{g_2} + e^{g_3})
         f_3 = e^{g_3} / (e^{g_1} + e^{g_2} + e^{g_3})
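
     A short sketch of this inference step, with hypothetical weights and a hypothetical 4-dimensional feature vector (none of these numbers come from the slides):

         import numpy as np

         def softmax_predict(x, w, b):
             # Scores g_c = w_c . x + b_c, one per class.
             g = w @ x + b
             g = g - g.max()              # subtract the max for numerical stability
             e = np.exp(g)
             return e / e.sum()           # f_c = e^{g_c} / (e^{g_1} + e^{g_2} + e^{g_3})

         x_i = np.array([1.0, 0.5, -0.2, 2.0])       # hypothetical features
         w = np.array([[ 0.2, -0.1, 0.0,  0.4],
                       [-0.3,  0.2, 0.1,  0.0],
                       [ 0.1,  0.1, 0.3, -0.2]])     # hypothetical weights, 3 classes
         b = np.zeros(3)
         print(softmax_predict(x_i, w, b))           # three probabilities that sum to 1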

  6. Linear Max-Margin Classifier – Training Data
     inputs x_i = [x_i1  x_i2  x_i3  x_i4],  predictions f(x_i),  ground-truth targets/labels y_i
         x_1:  f(x_1) = [4.3  -1.3   1.1],   y_1 = [1 0 0]
         x_2:  f(x_2) = [0.5   5.6  -4.2],   y_2 = [0 1 0]
         x_3:  f(x_3) = [3.3   3.5   1.1],   y_3 = [1 0 0]
         ...
         x_n:  f(x_n) = [1.1  -5.3  -9.4],   y_n = [0 0 1]

  7. Linear Max-Margin Classifier – Inference
     For input x_i = [x_i1  x_i2  x_i3  x_i4] (ground truth y_i = [1 0 0]), output the raw scores [g_1  g_2  g_3]:
         g_1 = w_11 x_i1 + w_12 x_i2 + w_13 x_i3 + w_14 x_i4 + b_1
         g_2 = w_21 x_i1 + w_22 x_i2 + w_23 x_i3 + w_24 x_i4 + b_2
         g_3 = w_31 x_i1 + w_32 x_i2 + w_33 x_i3 + w_34 x_i4 + b_3

  8. Training: How do we find a good w and b?
     For input x_i the classifier outputs scores [g_1(w, b)  g_2(w, b)  g_3(w, b)]; the ground truth is y_i = [1 0 0].
     We need to find the w and b that minimize the max-margin loss:
         L(w, b) = Σ_{i=1..n} Σ_{j ≠ label} max(0, g_j(x_i) − g_label(x_i) + Δ)
     Why might this be good compared to softmax?
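
     A small sketch of this loss, with Δ = 1.0 chosen arbitrarily and the score rows borrowed from the training-data slide:

         import numpy as np

         def max_margin_loss(G, labels, delta=1.0):
             # L = sum_i sum_{j != label_i} max(0, g_j - g_label + delta).
             # G: (n, K) matrix of scores; labels: (n,) integer class indices.
             n = len(labels)
             g_label = G[np.arange(n), labels][:, None]      # correct-class score per row
             margins = np.maximum(0.0, G - g_label + delta)  # hinge term per class
             margins[np.arange(n), labels] = 0.0             # drop the j == label terms
             return margins.sum()

         G = np.array([[4.3, -1.3,  1.1],
                       [0.5,  5.6, -4.2]])
         print(max_margin_loss(G, np.array([0, 1])))         # 0.0: every margin is satisfied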

  9. Regression vs Classification
     Regression:
     • Labels are continuous variables – e.g. distance.
     • Losses: distance-based losses, e.g. sum of distances to true values.
     • Evaluation: mean distances, correlation coefficients, etc.
     Classification:
     • Labels are discrete variables (1 out of K categories).
     • Losses: cross-entropy loss, margin losses, logistic regression (binary cross-entropy).
     • Evaluation: classification accuracy, etc.
     (A small numeric example of each loss type follows.)
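
     A minimal sketch contrasting the two loss types; all target and prediction values are made up for illustration:

         import numpy as np

         # Regression: a distance-based loss (sum of squared distances to true values).
         y_true = np.array([2.0, 3.5, 1.0])     # continuous targets, e.g. distances
         y_pred = np.array([2.1, 3.0, 1.2])
         regression_loss = np.sum((y_pred - y_true) ** 2)

         # Classification: cross-entropy against a discrete label (1 out of K = 3).
         p = np.array([0.7, 0.2, 0.1])          # predicted class probabilities
         label = 0                              # true class index
         classification_loss = -np.log(p[label])

         print(regression_loss, classification_loss)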

  10. Linear Regression – 1 output, 1 input
      [Figure: scatter plot of training pairs (x_1, y_1), ..., (x_8, y_8) in the x–y plane.]

  11. Linear Regression – 1 output, 1 input
      [Figure: the same scatter plot.]
      Model: ŷ = wx + b

  12. Linear Regression – 1 output, 1 input
      [Figure: the same scatter plot.]
      Model: ŷ = wx + b

  13. Linear Regression – 1 output, 1 input
      [Figure: the same scatter plot.]
      Model: ŷ = wx + b
      Loss:  L(w, b) = Σ_{i=1..n} (ŷ_i − y_i)²
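
      A minimal sketch of fitting this model, here by closed-form least squares rather than the SGD loop from earlier slides; the eight (x_i, y_i) pairs are invented stand-ins for the scatter plot:

          import numpy as np

          rng = np.random.default_rng(0)
          x = np.linspace(0.0, 7.0, 8)                       # assumed inputs
          y = 2.0 * x + 1.0 + rng.normal(scale=0.5, size=8)  # assumed noisy targets

          # Closed-form least squares for yhat = w*x + b:
          A = np.stack([x, np.ones_like(x)], axis=1)         # columns [x, 1]
          (w, b), *_ = np.linalg.lstsq(A, y, rcond=None)

          loss = np.sum((w * x + b - y) ** 2)                # L(w, b) = sum_i (yhat_i - y_i)^2
          print(w, b, loss)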

  14. Quadratic Regression
      [Figure: the same scatter plot.]
      Model: ŷ = w_2 x² + w_1 x + b
      Loss:  L(w, b) = Σ_{i=1..n} (ŷ_i − y_i)²

  15. n-Polynomial Regression
      [Figure: the same scatter plot.]
      Model: ŷ = w_n xⁿ + ⋯ + w_1 x + b
      Loss:  L(w, b) = Σ_{i=1..n} (ŷ_i − y_i)²
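
      A short sketch of fitting a degree-n polynomial to invented data; np.polyfit minimizes exactly this sum-of-squares loss:

          import numpy as np

          rng = np.random.default_rng(0)
          x = np.linspace(0.0, 1.0, 10)                               # assumed inputs
          y = np.sin(2 * np.pi * x) + rng.normal(scale=0.1, size=10)  # assumed targets

          n = 3                                   # polynomial degree
          w = np.polyfit(x, y, deg=n)             # returns [w_n, ..., w_1, b]
          yhat = np.polyval(w, x)
          print(np.sum((yhat - y) ** 2))          # L = sum_i (yhat_i - y_i)^2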

  16. Overfitting
      (Figure taken from Christopher Bishop's Pattern Recognition and Machine Learning book.)
      • f is linear: Loss(w) is high → underfitting, high bias.
      • f is cubic: Loss(w) is low.
      • f is a polynomial of degree 9: Loss(w) is zero! → overfitting, high variance.

  17. Detecting Overfitting
      • Look at the values of the weights in the polynomial: an overfit polynomial tends to have very large weights (see the sketch below).
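
      A sketch illustrating both the previous slide's loss pattern and this weight check, on the same invented data; degree 9 with 10 points drives the training loss to (numerically) zero while the weights blow up:

          import numpy as np

          rng = np.random.default_rng(0)
          x = np.linspace(0.0, 1.0, 10)
          y = np.sin(2 * np.pi * x) + rng.normal(scale=0.1, size=10)

          for n in (1, 3, 9):                     # linear, cubic, degree 9
              w = np.polyfit(x, y, deg=n)
              loss = np.sum((np.polyval(w, x) - y) ** 2)
              # Training loss shrinks with degree, but the degree-9 fit pays for it
              # with huge weight magnitudes -- the telltale sign of overfitting.
              print(n, loss, np.abs(w).max())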

  18. Recommended Reading
      • http://users.isr.ist.utl.pt/~wurmd/Livros/school/Bishop%20-%20Pattern%20Recognition%20And%20Machine%20Learning%20-%20Springer%20%202006.pdf
      • Print and read Chapter 1 (at minimum).

  19. More…
      • Regularization
      • Momentum updates

  20. Regularization
      • Large weights lead to large variance, i.e. the model fits the training data too strongly.
      • Solution: minimize the loss but also try to keep the weight values small, by minimizing instead:
            L(w, b) + λ Σ_k |w_k|²

  21. Regularization
      • Large weights lead to large variance, i.e. the model fits the training data too strongly.
      • Solution: minimize the loss but also try to keep the weight values small, by minimizing instead:
            L(w, b) + λ Σ_k |w_k|²
        where the second term is the regularizer, e.g. the L2 regularizer shown here.

  22. SGD with Regularization (L-2)
      Regularized objective: L(w, b) + λ Σ_k |w_k|²,  with α = 0.01
          Initialize w and b randomly
          for e = 0 .. num_epochs do
              for t = 0 .. num_batches do
                  Compute ∂L(w, b)/∂w and ∂L(w, b)/∂b
                  Update w: w = w − α ∂L(w, b)/∂w − αλw
                  Update b: b = b − α ∂L(w, b)/∂b
                  Print L(w, b)   // useful to see if this is becoming smaller or not
              end
          end
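
      To make the weight-decay term concrete, here is a minimal sketch of one such update step; the learning rate, λ, shapes, and gradient values are all assumptions. Note the gradient of λ Σ w² is strictly 2λw; like the slide, the sketch folds the factor of 2 into λ:

          import numpy as np

          def sgd_step_l2(w, b, dLdw, dLdb, alpha=0.01, lam=0.001):
              # The weight update includes the L2 term; the bias is not regularized.
              w = w - alpha * dLdw - alpha * lam * w
              b = b - alpha * dLdb
              return w, b

          # Hypothetical parameters and gradients for a 3-class, 4-feature model:
          w, b = np.ones((3, 4)), np.zeros(3)
          w, b = sgd_step_l2(w, b, dLdw=np.full((3, 4), 0.1), dLdb=np.full(3, 0.1))
          print(w[0, 0], b[0])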

  23. Revisiting Another Problem with SGD
      (Same training loop as the previous slide, with a callout on the gradient computation.)
      The mini-batch gradients ∂L(w, b)/∂w and ∂L(w, b)/∂b are only approximations to the true gradient of L(w, b) over the full training set.

  24. Revisiting Another Problem with SGD
      (Same training loop, continued.)
      These approximate gradient steps could lead to "un-learning" what has been learned in some previous steps of training.

  25. Solution: Momentum Updates
      (Same training loop as before.)
      Keep track of previous gradients in an accumulator variable, and use a weighted average with the current gradient.

  26. Solution: Momentum Updates
      Regularized objective: L(w, b) + λ Σ_k |w_k|²,  α = 0.01,  β = 0.9
          Initialize w and b randomly
          global v   // the accumulator
          for e = 0 .. num_epochs do
              for t = 0 .. num_batches do
                  Compute ∂L(w, b)/∂w
                  Compute v = βv + ∂L(w, b)/∂w + λw
                  Update w: w = w − αv
                  Print L(w, b)   // useful to see if this is becoming smaller or not
              end
          end
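
      A minimal sketch of this momentum update in isolation; β = 0.9 as on the slide, while λ's value and the toy gradient are assumptions:

          import numpy as np

          def momentum_step(w, v, dLdw, alpha=0.01, beta=0.9, lam=0.001):
              # v accumulates an exponentially weighted average of past gradients
              # (plus the regularization term); w then moves along v.
              v = beta * v + dLdw + lam * w       # v = beta*v + dL/dw + lam*w
              w = w - alpha * v                   # w = w - alpha*v
              return w, v

          w = np.ones(4)
          v = np.zeros_like(w)                    # the global accumulator
          for step in range(3):                   # a few steps with a fixed toy gradient
              w, v = momentum_step(w, v, dLdw=np.full(4, 0.1))
              print(step, w[0], v[0])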

  27. More on Momentum https://distill.pub/2017/momentum/

  28. Questions?
