

  1. CS4501: Introduction to Computer Vision. Softmax Classifier + Generalization. Various slides from previous courses by: D.A. Forsyth (Berkeley / UIUC), I. Kokkinos (Ecole Centrale / UCL), S. Lazebnik (UNC / UIUC), S. Seitz (MSR / Facebook), J. Hays (Brown / Georgia Tech), A. Berg (Stony Brook / UNC), D. Samaras (Stony Brook), J. M. Frahm (UNC), V. Ordonez (UVA), Steve Seitz (UW).

  2. Last Class • Introduction to Machine Learning • Unsupervised Learning: Clustering (e.g. k-means clustering) • Supervised Learning: Classification (e.g. k-nearest neighbors)

  3. Today’s Class • Softmax Classifier (Linear Classifiers) • Generalization / Overfitting / Regularization • Global Features

  4. Supervised Learning vs Unsupervised Learning. Learn a mapping $x \rightarrow y$ from inputs $x$ (images) to labels $y$. [Figure: nine animal images labeled cat, dog, bear, dog, bear, dog, cat, cat, bear.]

  5. Supervised Learning vs Unsupervised Learning. [Figure: the same nine animal images; in the supervised setting the labels are given, in the unsupervised setting only the inputs $x$ are.]

  6. Supervised Learning vs Unsupervised Learning. With labels the task is Classification; without labels it is Clustering. [Figure: the nine animal images grouped once by class label and once by cluster.]

  7. Supervised Learning – k-Nearest Neighbors. With k = 3, the three nearest neighbors of the query point are cat, cat, dog, so the query is classified as cat. [Figure: labeled points (cat, dog, bear) in feature space around a query point.]

  8. Supervised Learning – k-Nearest Neighbors. For a second query point the three nearest neighbors are bear, dog, dog, so it is classified as dog. [Figure: the same feature space with a different query point.]

  9. Supervised Learning – k-Nearest Neighbors • How do we choose the right K? • How do we choose the right features? • How do we choose the right distance metric?

  10. Supervised Learning – k-Nearest Neighbors • How do we choose the right K? • How do we choose the right features? • How do we choose the right distance metric? Answer: just choose the combination that works best! BUT not on the test data. Instead, split the training data into a "Training set" and a "Validation set" (also called a "Development set"); see the sketch below.
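A minimal sketch of this recipe, assuming scikit-learn is available; the toy X and y arrays are made-up stand-ins, not course data:

```python
# Hypothetical sketch: picking k for k-NN on a held-out validation set.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X = np.random.rand(300, 4)            # toy stand-in features
y = np.random.randint(0, 3, 300)      # toy stand-in labels (3 classes)

# Split the labeled data into a training set and a validation (development) set.
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=0)

best_k, best_acc = None, 0.0
for k in [1, 3, 5, 7, 9]:
    knn = KNeighborsClassifier(n_neighbors=k).fit(X_train, y_train)
    acc = knn.score(X_val, y_val)     # accuracy on the validation set
    if acc > best_acc:
        best_k, best_acc = k, acc
# Only after committing to best_k would we ever touch the test data.
```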

  11. Supervised Learning – Classification. [Figure: Training Data, images with known labels (dog, bear, cat, dog, bear, cat, cat, bear, dog), next to Test Data, images with unknown labels.]

  12. Supervised Learning – Classification. [Figure: Training Data shown as (image, label) pairs: cat, dog, cat, ..., bear, next to unlabeled Test Data.]

  13. Supervised Learning – Classification. Training Data as (input, target) pairs: $x_1 = [\ldots]$, $y_1 = \text{cat}$; $x_2 = [\ldots]$, $y_2 = \text{dog}$; $x_3 = [\ldots]$, $y_3 = \text{cat}$; $\ldots$; $x_n = [\ldots]$, $y_n = \text{bear}$.

  14. Supervised Learning – Classification. Training Data: inputs $x_i = [x_{i1}\ x_{i2}\ x_{i3}\ x_{i4}]$ with targets / labels / ground truth $y_1 = 1$, $y_2 = 2$, $y_3 = 1$, $\ldots$, $y_n = 3$. We need to find a function that maps $x$ to $y$ for any of them, producing predictions $\hat{y}_i = f(x_i; \theta)$. How do we "learn" the parameters $\theta$ of this function? We choose the ones that make the following quantity small: $\sum_{i=1}^{n} \mathrm{Cost}(\hat{y}_i, y_i)$.

  15. Supervised Learning – Softmax Classifier. Training Data: inputs $x_1 = [x_{11}\ x_{12}\ x_{13}\ x_{14}]$, $x_2 = [x_{21}\ x_{22}\ x_{23}\ x_{24}]$, $x_3 = [x_{31}\ x_{32}\ x_{33}\ x_{34}]$, $\ldots$, $x_n = [x_{n1}\ x_{n2}\ x_{n3}\ x_{n4}]$ with targets / labels / ground truth $y_1 = 1$, $y_2 = 2$, $y_3 = 1$, $\ldots$, $y_n = 3$.

  16. Supervised Learning – Softmax Classifier. The targets become one-hot vectors and the classifier outputs probability vectors as predictions: $\hat{y}_1 = [0.85\ 0.10\ 0.05]$ vs. $y_1 = [1\ 0\ 0]$; $\hat{y}_2 = [0.20\ 0.70\ 0.10]$ vs. $y_2 = [0\ 1\ 0]$; $\hat{y}_3 = [0.40\ 0.45\ 0.05]$ vs. $y_3 = [1\ 0\ 0]$; $\ldots$; $\hat{y}_n = [0.40\ 0.25\ 0.35]$ vs. $y_n = [0\ 0\ 1]$.

  17. Supervised Learning – Softmax Classifier. For an input $x_i = [x_{i1}\ x_{i2}\ x_{i3}\ x_{i4}]$ with target $y_i = [1\ 0\ 0]$, the prediction is $\hat{y}_i = [f_c\ f_d\ f_b]$, computed from per-class linear scores
     $g_c = w_{c1} x_{i1} + w_{c2} x_{i2} + w_{c3} x_{i3} + w_{c4} x_{i4} + b_c$
     $g_d = w_{d1} x_{i1} + w_{d2} x_{i2} + w_{d3} x_{i3} + w_{d4} x_{i4} + b_d$
     $g_b = w_{b1} x_{i1} + w_{b2} x_{i2} + w_{b3} x_{i3} + w_{b4} x_{i4} + b_b$
     passed through the softmax:
     $f_c = e^{g_c} / (e^{g_c} + e^{g_d} + e^{g_b})$
     $f_d = e^{g_d} / (e^{g_c} + e^{g_d} + e^{g_b})$
     $f_b = e^{g_b} / (e^{g_c} + e^{g_d} + e^{g_b})$
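A minimal numpy sketch of this forward pass; the weights W, bias b, and input x below are made-up values, not from the slides:

```python
import numpy as np

W = np.random.randn(3, 4)              # one row of weights per class: cat, dog, bear
b = np.zeros(3)                        # one bias per class
x = np.array([0.2, 1.1, -0.3, 0.7])   # a single 4-dimensional feature vector

g = W @ x + b                          # class scores g_c, g_d, g_b
g = g - g.max()                        # subtract the max for numerical stability
y_hat = np.exp(g) / np.exp(g).sum()    # softmax: probabilities summing to 1
print(y_hat)                           # e.g. a [0.85 0.10 0.05]-style output
```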

  18. How do we find a good w and b? With $x_i = [x_{i1}\ x_{i2}\ x_{i3}\ x_{i4}]$, $y_i = [1\ 0\ 0]$, and $\hat{y}_i = [f_c(w, b)\ f_d(w, b)\ f_b(w, b)]$, we need to find $w$ and $b$ that minimize the following function $L$:
     $L(w, b) = \sum_{i=1}^{n} \sum_{j=1}^{3} -y_{ij} \log(\hat{y}_{ij}) = \sum_{i=1}^{n} -\log(\hat{y}_{i,\mathrm{label}_i}) = \sum_{i=1}^{n} -\log f_{i,\mathrm{label}_i}(w, b)$
     Why?
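Because each $y_i$ is one-hot, the inner sum picks out $-\log$ of the probability assigned to the true class. A short numeric check on the toy predictions from slide 16 (assuming numpy):

```python
import numpy as np

y_hat = np.array([[0.85, 0.10, 0.05],
                  [0.20, 0.70, 0.10],
                  [0.40, 0.45, 0.05]])
labels = np.array([0, 1, 0])           # true class index of each example

# One-hot targets zero out every term except the true class, so:
L = -np.log(y_hat[np.arange(len(labels)), labels]).sum()
print(L)   # -log(0.85) - log(0.70) - log(0.40) ≈ 1.435
```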

  19. How do we find a good w and b? Problem statement: find $w$ and $b$ such that $L(w, b)$ is minimal. Solution from calculus: set $\frac{\partial}{\partial w} L(w, b) = 0$ and solve for $w$; set $\frac{\partial}{\partial b} L(w, b) = 0$ and solve for $b$.

  20. https://courses.lumenlearning.com/businesscalc1/chapter/reading-curve-sketching/

  21. How do we find a good w and b? Problem statement: find $w$ and $b$ such that $L(w, b)$ is minimal. Solution from calculus: set $\frac{\partial}{\partial w} L(w, b) = 0$ and solve for $w$; set $\frac{\partial}{\partial b} L(w, b) = 0$ and solve for $b$.

  22. Problems with this approach: • Some functions L(w, b) are very complicated compositions of many functions, so finding their analytical derivative is tedious. • Even if the function is simple to differentiate, it might not be easy to solve for w, e.g. $\frac{\partial}{\partial w} L(w, b) = e^w + w = 0$. How do you find $w$ in that equation? (See the numerical sketch below.)
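There is in fact no closed-form solution for $e^w + w = 0$; a numerical root finder is one way out. A sketch assuming scipy is available (brentq needs an interval where the function changes sign):

```python
import numpy as np
from scipy.optimize import brentq

dL_dw = lambda w: np.exp(w) + w      # the derivative we want to zero out

# dL_dw(-2) < 0 and dL_dw(0) > 0, so a root lies in [-2, 0].
w_star = brentq(dL_dw, -2.0, 0.0)
print(w_star)                         # ≈ -0.5671 (the omega constant)
```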

  23. Solution: Iterative Approach: Gradient Descent (GD). 1. Start with a random value of w (e.g. w = 12). 2. Compute the gradient (derivative) of L(w) at the point w = 12 (e.g. dL/dw = 6). 3. Recompute w as: w = w − lambda · (dL/dw). [Plot: L(w) vs. w, with the current point at w = 12.]

  24. Solution: Iterative Approach: Gradient Descent (GD). 2. Compute the gradient (derivative) of L(w) at the current point. 3. Recompute w as: w = w − lambda · (dL/dw). [Plot: after one update the point has moved to w = 10.]

  25. Gradient Descent (GD). Problem: expensive!
     $\lambda = 0.01$
     $L(w, b) = \sum_{i=1}^{n} -\log f_{i,\mathrm{label}_i}(w, b)$
     Initialize w and b randomly
     for e = 0, num_epochs do
       Compute: $\partial L(w, b)/\partial w$ and $\partial L(w, b)/\partial b$
       Update w: $w = w - \lambda\, \partial L(w, b)/\partial w$
       Update b: $b = b - \lambda\, \partial L(w, b)/\partial b$
       Print: $L(w, b)$ // useful to see if this is becoming smaller or not
     end
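A runnable numpy sketch of this loop for the three-class softmax classifier above; X is an (n, 4) feature matrix and labels an (n,) array of class indices 0..2, both assumed given:

```python
import numpy as np

def softmax(G):
    G = G - G.max(axis=1, keepdims=True)        # numerical stability
    E = np.exp(G)
    return E / E.sum(axis=1, keepdims=True)

def train_gd(X, labels, num_epochs=100, lam=0.01):
    n, d = X.shape
    W, b = np.zeros((3, d)), np.zeros(3)
    Y = np.eye(3)[labels]                        # one-hot targets
    for e in range(num_epochs):
        Y_hat = softmax(X @ W.T + b)             # predictions for ALL n examples (expensive)
        # Standard softmax cross-entropy gradient: dL/dg = y_hat - y.
        dW = (Y_hat - Y).T @ X
        db = (Y_hat - Y).sum(axis=0)
        W, b = W - lam * dW, b - lam * db
        loss = -np.log(Y_hat[np.arange(n), labels]).sum()
        print(e, loss)                           # should be shrinking
    return W, b
```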

  26. Solution: (mini-batch) Stochastic Gradient Descent (SGD).
     $\lambda = 0.01$
     $L(w, b) = \sum_{i \in B} -\log f_{i,\mathrm{label}_i}(w, b)$, where B is a small set of training examples.
     Initialize w and b randomly
     for e = 0, num_epochs do
       for b = 0, num_batches do
         Compute: $\partial L(w, b)/\partial w$ and $\partial L(w, b)/\partial b$
         Update w: $w = w - \lambda\, \partial L(w, b)/\partial w$
         Update b: $b = b - \lambda\, \partial L(w, b)/\partial b$
         Print: $L(w, b)$ // useful to see if this is becoming smaller or not
       end
     end
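The mini-batch variant of the previous sketch: each update looks at only a small batch B, so steps are cheap. Again a hedged sketch under the same assumptions about X and labels:

```python
import numpy as np

def softmax(G):
    E = np.exp(G - G.max(axis=1, keepdims=True))
    return E / E.sum(axis=1, keepdims=True)

def train_sgd(X, labels, num_epochs=100, batch_size=32, lam=0.01):
    n, d = X.shape
    W, b = np.zeros((3, d)), np.zeros(3)
    for e in range(num_epochs):
        order = np.random.permutation(n)         # shuffle once per epoch
        for start in range(0, n, batch_size):
            B = order[start:start + batch_size]  # indices of this mini-batch
            Xb, Yb = X[B], np.eye(3)[labels[B]]
            Y_hat = softmax(Xb @ W.T + b)
            W -= lam * (Y_hat - Yb).T @ Xb       # gradient from batch B only
            b -= lam * (Y_hat - Yb).sum(axis=0)
    return W, b
```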

  27. Source: Andrew Ng

  28. Three more things • How to compute the gradient • Regularization • Momentum updates

  29. SGD Gradient for the Softmax Function

  30. SGD Gradient for the Softmax Function

  31. SGD Gradient for the Softmax Function
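The three slides above carry the derivation in images that did not survive extraction. For reference, the standard softmax + cross-entropy gradient they work toward (a well-known identity, not transcribed from the slides) is:

```latex
L = -\sum_j y_j \log f_j ,
\qquad f_j = \frac{e^{g_j}}{\sum_k e^{g_k}} ,
\qquad g_j = \sum_k w_{jk} x_k + b_j
% Differentiating L through the softmax collapses to a simple form:
\frac{\partial L}{\partial g_j} = f_j - y_j
% and, by the chain rule applied to the linear scores,
\frac{\partial L}{\partial w_{jk}} = (f_j - y_j)\, x_k ,
\qquad
\frac{\partial L}{\partial b_j} = f_j - y_j
```

This is exactly the `(Y_hat - Y)` term used in the GD and SGD sketches earlier.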

  32. Supervised Learning – Softmax Classifier. Extract features: $x_i = [x_{i1}\ x_{i2}\ x_{i3}\ x_{i4}]$. Run features through the classifier:
     $g_c = w_{c1} x_{i1} + w_{c2} x_{i2} + w_{c3} x_{i3} + w_{c4} x_{i4} + b_c$
     $g_d = w_{d1} x_{i1} + w_{d2} x_{i2} + w_{d3} x_{i3} + w_{d4} x_{i4} + b_d$
     $g_b = w_{b1} x_{i1} + w_{b2} x_{i2} + w_{b3} x_{i3} + w_{b4} x_{i4} + b_b$
     Get predictions $\hat{y}_i = [f_c\ f_d\ f_b]$:
     $f_c = e^{g_c} / (e^{g_c} + e^{g_d} + e^{g_b})$, $f_d = e^{g_d} / (e^{g_c} + e^{g_d} + e^{g_b})$, $f_b = e^{g_b} / (e^{g_c} + e^{g_d} + e^{g_b})$
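A test-time sketch of this three-step pipeline; extract_features is a hypothetical placeholder for whatever feature extractor is used (the slides do not pin one down), and W, b are assumed already trained:

```python
import numpy as np

CLASSES = ["cat", "dog", "bear"]

def extract_features(image):
    # Hypothetical stand-in: a real system would compute a global descriptor here.
    return np.asarray(image).reshape(-1)[:4]

def predict(image, W, b):
    x = extract_features(image)                  # 1. extract features
    g = W @ x + b                                # 2. run through the classifier
    f = np.exp(g - g.max()); f /= f.sum()        # 3. softmax probabilities
    return CLASSES[int(np.argmax(f))]            # most probable class
```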

  33. Supervised Machine Learning Steps. [Diagram: Training: Training Images → Image Features → Training (with Training Labels) → Learned model. Testing: Test Image → Image Features → Learned model → Prediction.] Slide credit: D. Hoiem

  34. Generalization • Generalization refers to the ability to correctly classify never-before-seen examples • It can be controlled by turning "knobs" that affect the complexity of the model. [Figure: Training set (labels known) and test set (labels unknown).]

  35. Overfitting. Fitting the same data with models of increasing complexity: $f$ is linear: $\mathrm{Loss}(f)$ is high (underfitting, high bias). $f$ is cubic: $\mathrm{Loss}(f)$ is low. $f$ is a polynomial of degree 9: $\mathrm{Loss}(f)$ is zero! (overfitting, high variance). A numeric sketch follows.
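An illustrative numpy sketch of this picture, assuming a noisy cubic as the made-up ground truth: with 10 points, a degree-9 polynomial interpolates them exactly and drives the training loss to (near) zero, even though it generalizes poorly.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(-1, 1, 10)
y = x**3 - x + rng.normal(0, 0.1, size=x.shape)  # noisy cubic ground truth

for degree in [1, 3, 9]:
    coeffs = np.polyfit(x, y, degree)                       # least-squares fit
    train_loss = np.mean((np.polyval(coeffs, x) - y) ** 2)  # loss on TRAINING points
    print(degree, train_loss)   # high for degree 1, low for 3, ~0 for 9 (overfit)
```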

  36. Questions?
