CS4501: Introduction to Computer Vision
Max-Margin Classifier, Regularization, Generalization, Momentum, Regression, Multi-label Classification / Tagging

Previous Class
- Softmax Classifier
- Inference vs Training
- Gradient Descent (GD)
- Stochastic Gradient Descent (SGD)
- mini-batch Stochastic Gradient Descent (SGD)
Today's Class
- Generalization
- Regularization / Momentum
- Max-Margin Classifier
- Regression / Tagging
(mini-batch) Stochastic Gradient Descent (SGD)
    α = 0.01
    Initialize w and b randomly
    for e = 0, num_epochs do
        for b = 0, num_batches do
            Compute: ∂L(w,b)/∂w and ∂L(w,b)/∂b
            Update w: w = w − α ∂L(w,b)/∂w
            Update b: b = b − α ∂L(w,b)/∂b
            Print: L(w,b)   // Useful to see if this is becoming smaller or not.
        end for
    end for

For the Softmax Classifier, the loss on a mini-batch $B$ is

$L(w, b) = \sum_{i \in B} -\log f_{i,\mathrm{label}(i)}(w, b)$
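As a concrete sketch, the loop above might look as follows in NumPy. The dataset X (one row per example) and integer labels y are made up; num_epochs, batch_size, and the learning rate α = 0.01 mirror the pseudocode:

    import numpy as np

    def softmax(scores):
        # Subtract the row-wise max for numerical stability.
        e = np.exp(scores - scores.max(axis=1, keepdims=True))
        return e / e.sum(axis=1, keepdims=True)

    rng = np.random.default_rng(0)
    n, d, K = 90, 4, 3                       # examples, features, classes (made up)
    X = rng.normal(size=(n, d))
    y = rng.integers(0, K, size=n)

    alpha, num_epochs, batch_size = 0.01, 50, 10
    w = 0.01 * rng.normal(size=(d, K))       # initialize w and b randomly
    b = np.zeros(K)

    for e in range(num_epochs):
        for start in range(0, n, batch_size):
            xb, yb = X[start:start + batch_size], y[start:start + batch_size]
            f = softmax(xb @ w + b)                        # forward pass
            loss = -np.log(f[np.arange(len(yb)), yb]).sum()
            dscores = f.copy()
            dscores[np.arange(len(yb)), yb] -= 1           # gradient of -log f_label
            dw = xb.T @ dscores                            # ∂L/∂w
            db = dscores.sum(axis=0)                       # ∂L/∂b
            w -= alpha * dw                                # update w
            b -= alpha * db                                # update b
        print(loss)   # useful to see if this is becoming smaller or not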
Supervised Learning –Softmax Classifier
!" = [!"% !"& !"' !"(]
Extract features
$g_1 = w_{11}x_{i1} + w_{12}x_{i2} + w_{13}x_{i3} + w_{14}x_{i4} + b_1$
$g_2 = w_{21}x_{i1} + w_{22}x_{i2} + w_{23}x_{i3} + w_{24}x_{i4} + b_2$
$g_3 = w_{31}x_{i1} + w_{32}x_{i2} + w_{33}x_{i3} + w_{34}x_{i4} + b_3$

$f_1 = e^{g_1} / (e^{g_1} + e^{g_2} + e^{g_3})$
$f_2 = e^{g_2} / (e^{g_1} + e^{g_2} + e^{g_3})$
$f_3 = e^{g_3} / (e^{g_1} + e^{g_2} + e^{g_3})$
Run features through classifier
: ;" = [1
+
1
/
1
0]
Get predictions
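In NumPy this forward pass is a few lines. The weights and the feature vector below are made-up numbers just to show the shapes:

    import numpy as np

    W = np.array([[ 0.2, -0.5,  0.1,  2.0],    # w_1 (one row of weights per class)
                  [ 1.5,  1.3,  2.1,  0.0],    # w_2
                  [ 0.0,  0.3,  0.2, -0.3]])   # w_3
    b = np.array([1.1, 3.2, -1.2])             # one bias per class

    x_i = np.array([0.5, 1.0, -0.2, 0.7])      # extracted features

    g = W @ x_i + b                            # class scores g_1, g_2, g_3
    f = np.exp(g) / np.exp(g).sum()            # softmax: entries sum to 1
    print(f)                                   # predictions ŷ_i = [f_1 f_2 f_3]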
Linear Max-Margin Classifier
Training Data
inputs:
$x_1 = [x_{11}\; x_{12}\; x_{13}\; x_{14}]$
$x_2 = [x_{21}\; x_{22}\; x_{23}\; x_{24}]$
$x_3 = [x_{31}\; x_{32}\; x_{33}\; x_{34}]$
$x_4 = [x_{41}\; x_{42}\; x_{43}\; x_{44}]$
. . .

targets / labels / ground truth:
$y_1 = [1\; 0\; 0]$, $y_2 = [1\; 0\; 0]$, $y_3 = [0\; 1\; 0]$, $y_4 = [0\; 0\; 1]$

predictions:
$f(x_1) = [4.3\; -1.3\; 1.1]$, $f(x_2) = [3.3\; 3.5\; 1.1]$, $f(x_3) = [0.5\; 5.6\; -4.2]$, $f(x_4) = [1.1\; -5.3\; -9.4]$
Linear Max-Margin Classifier – Inference
$y_i = [1\; 0\; 0]$
$x_i = [x_{i1}\; x_{i2}\; x_{i3}\; x_{i4}]$
$f(x_i) = [g_1\; g_2\; g_3]$, where

$g_1 = w_{11}x_{i1} + w_{12}x_{i2} + w_{13}x_{i3} + w_{14}x_{i4} + b_1$
$g_2 = w_{21}x_{i1} + w_{22}x_{i2} + w_{23}x_{i3} + w_{24}x_{i4} + b_2$
$g_3 = w_{31}x_{i1} + w_{32}x_{i2} + w_{33}x_{i3} + w_{34}x_{i4} + b_3$
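At inference time no softmax is needed: exp is monotonic, so the predicted class is just the argmax of the raw scores. A sketch with made-up parameters:

    import numpy as np

    W = np.array([[ 0.2, -0.5,  0.1,  2.0],
                  [ 1.5,  1.3,  2.1,  0.0],
                  [ 0.0,  0.3,  0.2, -0.3]])   # made-up weights, one row per class
    b = np.array([1.1, 3.2, -1.2])
    x_i = np.array([0.5, 1.0, -0.2, 0.7])

    g = W @ x_i + b          # raw scores [g_1 g_2 g_3], no softmax needed
    print(g.argmax())        # predicted class index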
Training: How do we find a good w and b?
$y_i = [1\; 0\; 0]$
$x_i = [x_{i1}\; x_{i2}\; x_{i3}\; x_{i4}]$
$f(x_i) = [g_1(w,b)\; g_2(w,b)\; g_3(w,b)]$

We need to find w and b that minimize the following:

$L(w, b) = \sum_{i=1}^{n} \sum_{j \neq \mathrm{label}(i)} \max\big(0,\; f(x_i)_j - f(x_i)_{\mathrm{label}(i)} + \Delta\big)$

Why might this be good compared to softmax?
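A sketch of this loss in NumPy, evaluated on the predictions and targets from the earlier training-data slide (Δ = 1 is an assumed value):

    import numpy as np

    def max_margin_loss(scores, labels, delta=1.0):
        # sum over j ≠ label of max(0, f(x_i)_j - f(x_i)_label + Δ)
        n = scores.shape[0]
        correct = scores[np.arange(n), labels][:, None]
        margins = np.maximum(0.0, scores - correct + delta)
        margins[np.arange(n), labels] = 0.0    # skip the j == label term
        return margins.sum()

    scores = np.array([[4.3, -1.3,  1.1],      # f(x_1)
                       [3.3,  3.5,  1.1],      # f(x_2)
                       [0.5,  5.6, -4.2],      # f(x_3)
                       [1.1, -5.3, -9.4]])     # f(x_4)
    labels = np.array([0, 0, 1, 2])            # from the one-hot targets above
    print(max_margin_loss(scores, labels))

Unlike the softmax loss, this is exactly zero once every correct score beats every other score by at least Δ, so well-classified examples stop pushing the weights around.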
Regression vs Classification
Regression
- Labels are continuous variables, e.g. distance.
- Losses: distance-based losses, e.g. sum of distances to true values.
- Evaluation: mean distances, correlation coefficients, etc.

Classification
- Labels are discrete variables (1 out of K categories).
- Losses: cross-entropy loss, margin losses, logistic regression (binary cross-entropy).
- Evaluation: classification accuracy, etc.
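A minimal sketch contrasting the two loss families on made-up numbers:

    import numpy as np

    # Regression: distance-based loss on continuous targets (e.g. distances).
    y_true = np.array([2.0, 3.5, 1.2])
    y_pred = np.array([2.4, 3.1, 1.0])
    sum_sq = ((y_pred - y_true) ** 2).sum()    # sum of squared distances

    # Classification: cross-entropy on a 1-out-of-K target.
    f = np.array([0.7, 0.2, 0.1])              # predicted class probabilities
    label = 0                                  # ground-truth category
    xent = -np.log(f[label])                   # cross-entropy loss

    print(sum_sq, xent)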
Linear Regression – 1 output, 1 input

[Scatter plot of training points $(x_1, y_1), (x_2, y_2), \ldots, (x_8, y_8)$]

Model: $f(x) = wx + b$

Loss: $L(w, b) = \sum_{i=1}^{8} \big(f(x_i) - y_i\big)^2$
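This model and loss plug directly into the gradient-descent loop from earlier. A sketch on made-up points (the true line here is y = 2x + 1 plus noise):

    import numpy as np

    rng = np.random.default_rng(0)
    x = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5])   # made-up inputs
    y = 2.0 * x + 1.0 + rng.normal(scale=0.3, size=8)        # noisy targets

    w, b, alpha = 0.0, 0.0, 0.01
    for step in range(2000):
        f = w * x + b                     # model: f(x) = wx + b
        dw = (2 * (f - y) * x).sum()      # ∂L/∂w for L = Σ (f(x_i) - y_i)²
        db = (2 * (f - y)).sum()          # ∂L/∂b
        w -= alpha * dw
        b -= alpha * db
    print(w, b)   # should end up close to the true slope 2 and intercept 1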
Quadratic Regression
! "
("$, !$) ("', !') ("(, !() ("), !)) ("*, !*) ("+, !+) (",, !,) ("-, !-)
Model: . ! = 0$"' + 0'" + 2 Loss: 3 0, 2 = 4
56$ 56-
. !5 − !5 '
n-polynomial Regression
! "
("$, !$) ("', !') ("(, !() ("), !)) ("*, !*) ("+, !+) (",, !,) ("-, !-)
Model: . ! = 01"1 + ⋯ + 0$" + 4 Loss: 5 0, 4 = 6
78$ 78-
. !7 − !7 '
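np.polyfit minimizes exactly this squared loss for a degree-n polynomial, so the effect of n is easy to explore. The sine-plus-noise data here is made up, echoing Bishop's example on the next slide:

    import numpy as np

    rng = np.random.default_rng(1)
    x = np.linspace(0.0, 1.0, 10)
    y = np.sin(2 * np.pi * x) + rng.normal(scale=0.1, size=10)   # made-up data

    for n in (1, 3, 9):
        w = np.polyfit(x, y, n)                 # least-squares fit: w_n ... w_1, b
        loss = ((np.polyval(w, x) - y) ** 2).sum()
        print(n, loss)    # loss shrinks as n grows; at n = 9 it hits (near) zero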
Overfitting
!"## $ is high !"## $ is low !"## $ is zero! Overfitting Underfitting High Bias High Variance
% is linear % is cubic % is a polynomial of degree 9
Taken from Christopher Bishop's Pattern Recognition and Machine Learning book.
Detecting Overfitting
- Look at the values of the weights in the polynomial
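A sketch of that check on the same made-up data as the previous snippet: printing the largest weight per fit tends to show the degree-9 polynomial's weights exploding while the cubic's stay modest:

    import numpy as np

    rng = np.random.default_rng(1)
    x = np.linspace(0.0, 1.0, 10)
    y = np.sin(2 * np.pi * x) + rng.normal(scale=0.1, size=10)

    for n in (1, 3, 9):
        w = np.polyfit(x, y, n)
        print(n, np.abs(w).max())   # overfit models tend to have huge weights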
Recommended Reading
- http://users.isr.ist.utl.pt/~wurmd/Livros/school/Bishop%20-%20Pattern%20Recognition%20And%20Machine%20Learning%20-%20Springer%20%202006.pdf
Print and Read Chapter 1 (at minimum)
More …
- Regularization
- Momentum updates
Regularization
- Large weights lead to large variance, i.e. the model fits the training data too strongly.
- Solution: minimize the loss but also try to keep the weight values small by doing the following:

minimize $L(w, b) + \lambda \sum_j |w_j|^2$

The added term is the regularizer, e.g. the L2 regularizer.
SGD with Regularization (L-2)
! ", $ = ! ", $ + ' ∑) |")|+ , = 0.01 for e = 0, num_epochs do end Initialize w and b randomly 0!(", $)/0" 0!(", $)/0$ Compute: and Update w: Update b: " = " − , 0!(", $)/0" − ,'" $ = $ − , 0!(", $)/0$ − ,'" Print: !(", $) // Useful to see if this is becoming smaller or not. end for b = 0, num_batches do
Revisiting Another Problem with SGD
! ", $ = ! ", $ + ' ∑) |")|+ , = 0.01 for e = 0, num_epochs do end Initialize w and b randomly 0!(", $)/0" 0!(", $)/0$ Compute: and Update w: Update b: " = " − , 0!(", $)/0" − ,'" $ = $ − , 0!(", $)/0$ − ,'" Print: !(", $) // Useful to see if this is becoming smaller or not. end for b = 0, num_batches do These are only approximations to the true gradient with respect to 5(", $)
This could lead to "un-learning" what has been learned in some previous steps of training.
Solution: Momentum Updates
! ", $ = ! ", $ + ' ∑) |")|+ , = 0.01 for e = 0, num_epochs do end Initialize w and b randomly 0!(", $)/0" 0!(", $)/0$ Compute: and Update w: Update b: " = " − , 0!(", $)/0" − ,'" $ = $ − , 0!(", $)/0$ − ,'" Print: !(", $) // Useful to see if this is becoming smaller or not. end for b = 0, num_batches do Keep track of previous gradients in an accumulator variable! and use a weighted average with current gradient.
    α = 0.01, β = 0.9
    global v
    Initialize w and b randomly
    for e = 0, num_epochs do
        for b = 0, num_batches do
            Compute: ∂L(w,b)/∂w
            Compute: v = βv + ∂L(w,b)/∂w + λw
            Update w: w = w − αv
            Print: L(w,b)   // Useful to see if this is becoming smaller or not.
        end for
    end for
More on Momentum: https://distill.pub/2017/momentum/
Questions?