Lecture 14: Linear Classifiers
Justin Johnson, EECS 442 WI 2020, February 25, 2020
Administrative:
HW3 due Wednesday, March 4, 11:59pm.
TAs will not be checking Piazza over Spring Break. You are strongly encouraged to finish the assignment by Friday, February 28.
Example training set
Training (x_i, y_i): fit $w^* = \arg\min_w \sum_{i=1}^{N} (w^T x_i - y_i)^2 = \arg\min_w \|y - Xw\|_2^2$
Testing/Inference (x): given a new input, what's the prediction?
Objective: $\arg\min_w \|y - Xw\|_2^2 + \lambda \|w\|_2^2$ (Loss + Regularization, with $\lambda$ controlling the trade-off)
Loss alone: least-squares. Regularization alone: $w = 0$. Together: something sensible?
Objective: $\arg\min_w \|y - Xw\|_2^2 + \lambda \|w\|_2^2$ (Loss + Regularization trade-off)
$w$ is a parameter, since we optimize for it on the training set; $\lambda$ is a hyperparameter, since we choose it before fitting the training set.
Idea #1: Choose hyperparameters that work best on the data. BAD: $\lambda = 0$ always works best on training data.
Idea #2: Split data into train and test; choose hyperparameters that work best on test data. BAD: No idea how we will perform on new data.
[Diagram: Your Dataset split into train | test]
Idea #3: Split data into train, val, and test; choose hyperparameters on val and evaluate on test. Better!
[Diagram: Your Dataset split into train | validation | test]
Idea #4: Cross-Validation: Split data into folds, try each fold as validation and average the results.
[Diagram: Your Dataset split into fold 1 | fold 2 | fold 3 | fold 4 | fold 5 | test, with each fold taking a turn as the validation set]
Useful for small datasets, but (unfortunately) not used too frequently in deep learning
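As a concrete sketch (not from the slides): a minimal k-fold cross-validation loop for choosing $\lambda$, assuming the closed-form ridge fit $w^* = (X^T X + \lambda I)^{-1} X^T y$ used on the next slide. The fold count and candidate values are illustrative.

import numpy as np

def cross_validate_lambda(X, y, candidates=(0.01, 0.1, 1.0, 10.0), k=5):
    # Try each fold as the validation set and average the validation errors.
    N, d = X.shape
    folds = np.array_split(np.random.permutation(N), k)
    best_lam, best_err = None, np.inf
    for lam in candidates:
        errs = []
        for i in range(k):
            val = folds[i]
            train = np.concatenate([folds[j] for j in range(k) if j != i])
            w = np.linalg.solve(X[train].T @ X[train] + lam * np.eye(d),
                                X[train].T @ y[train])
            errs.append(np.mean((X[val] @ w - y[val]) ** 2))
        if np.mean(errs) < best_err:
            best_lam, best_err = lam, np.mean(errs)
    return best_lam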
Training / Validation / Test: Fit model parameters on the training set; find hyperparameters by testing on the validation set; evaluate on the entirely unseen test set.
Training set: use these data points to fit $w^* = (X^T X + \lambda I)^{-1} X^T y$.
Validation set: evaluate on these points for different $\lambda$, pick the best.
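A minimal numpy sketch of this recipe (the variable names are my own, and a linear solve is used instead of an explicit matrix inverse):

import numpy as np

def fit_ridge(X_train, y_train, lam):
    # Closed form: w* = (X^T X + lam * I)^{-1} X^T y, fit on the training set only.
    d = X_train.shape[1]
    return np.linalg.solve(X_train.T @ X_train + lam * np.eye(d), X_train.T @ y_train)

def choose_lambda(X_train, y_train, X_val, y_val, candidates=(0.01, 0.1, 1.0, 10.0)):
    # Evaluate each candidate lambda on the held-out validation points, pick the best.
    errors = {lam: np.mean((X_val @ fit_ridge(X_train, y_train, lam) - y_val) ** 2)
              for lam in candidates}
    return min(errors, key=errors.get)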
Start with simplest example: binary classification
Cat or not cat?
Actually: a feature vector representing the image
$x = [x_1, x_2, \dots, x_N]$
Rifkin, Yeo, Poggio. Regularized Least Squares Classification (http://cbcl.mit.edu/publications/ps/rlsc.pdf). 2003.
Redmon, Divvala, Girshick, Farhadi. You Only Look Once: Unified, Real-Time Object Detection. CVPR 2016.
Treat as regression: $x_i$ is the image feature; $y_i$ is 1 if it's a cat, 0 if it's not a cat. Minimize the least-squares loss.
Training (x_i, y_i): $\arg\min_w \sum_{i=1}^{N} (w^T x_i - y_i)^2$
Inference (x): predict cat if $w^T x$ exceeds a threshold (e.g., 0.5 for 0/1 labels).
Unprincipled in theory, but often effective in practice. The reverse (regression via discrete bins) is also common.
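A minimal sketch of this classification-via-regression recipe, assuming 0/1 labels as above; the 0.5 threshold and the small ridge term are my own choices, not from the slide.

import numpy as np

def train_cat_classifier(X, y, lam=0.1):
    # X: (N, d) image features; y: (N,) labels, 1 = cat, 0 = not cat.
    # Ordinary least squares with a small ridge term for numerical stability.
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

def is_cat(w, x, threshold=0.5):
    # Inference: threshold the regression output.
    return float(w @ x) > threshold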
If this: cat. If this: dog. If this: hippo.
Rule: if this, then cat. But a new test image is never exactly the same.
[Diagram: known images with labels, and a test image]
(1) Compute distance between feature vectors (2) find nearest (3) use label.
Training (xi,yi):
Inference (x):
bestDist, prediction = float('inf'), None
for i in range(N):
    d = dist(X[i], x)        # distance from test point x to training example i
    if d < bestDist:
        bestDist, prediction = d, y[i]   # keep the label of the closest point so far
Nearest neighbors in two dimensions. Points are training examples; colors give training labels. Background colors give the category a test point would be assigned.
[Figure: 2D scatter over axes x0 and x1, with a test point x]
The decision boundary is the boundary between two classification regions. Decision boundaries can be noisy and affected by outliers. How to smooth out decision boundaries? Use more neighbors!
Instead of copying label from nearest neighbor, take majority vote from K closest points
Using more neighbors helps smooth out decision boundaries.
Using more neighbors helps reduce the effect of outliers
When K > 1 there can be ties! Need to break them somehow
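A minimal k-nearest-neighbor sketch with majority voting; the tie-breaking rule here (prefer the tied class whose neighbor is closest) is just one reasonable choice.

import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x, k=3):
    # L2 distances from the test point x to every training example.
    dists = np.linalg.norm(X_train - x, axis=1)
    nearest = np.argsort(dists)[:k]               # indices of the k closest points
    votes = Counter(y_train[i] for i in nearest)
    best_count = max(votes.values())
    tied = {label for label, c in votes.items() if c == best_count}
    # Break ties by taking the tied class that appears first in distance order.
    for i in nearest:
        if y_train[i] in tied:
            return y_train[i]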
L1 (Manhattan) distance: $d(x, y) = \sum_i |x_i - y_i|$
L2 (Euclidean) distance: $d(x, y) = \left( \sum_i (x_i - y_i)^2 \right)^{1/2}$
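The two distances in numpy, for vectors x and y (a minimal sketch):

import numpy as np

def l1_distance(x, y):
    # Manhattan distance: sum of absolute coordinate differences.
    return np.sum(np.abs(x - y))

def l2_distance(x, y):
    # Euclidean distance: square root of the sum of squared differences.
    return np.sqrt(np.sum((x - y) ** 2))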
[Figure: nearest-neighbor (K = 1) decision regions under the L1 (Manhattan) distance vs. the L2 (Euclidean) distance]
What distance? What value for K? These are hyperparameters.
Training set: use these data points for lookup. Validation set: evaluate on these points for different K and distance metrics.
With enough training data, the nearest-neighbor classifier is guaranteed to be at most 2x worse than the best possible classifier. But the number of training samples needed grows exponentially with the dimension of the data: the opposite of what we want!
$w_0^T x$: big if cat
$w_1^T x$: big if dog
$w_2^T x$: big if hippo
[Diagram: cat, dog, and hippo weight vectors (rows of W) multiplied by the unrolled image pixel vector to produce a cat score, dog score, and hippo score]
Diagram by: Karpathy, Fei-Fei
The weight matrix is a collection of scoring functions, one per class. The prediction is a vector whose jth component is the "score" for the jth class.
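A minimal sketch of the scoring step for the cat/dog/hippo example; the numbers are illustrative stand-ins, not the values from the diagram, and no bias term is included.

import numpy as np

# One weight vector (one row of W) per class: cat, dog, hippo.
W = np.array([[ 0.2, -0.5,  0.1,  2.0 ],   # cat weight vector
              [ 1.5,  1.3,  2.1,  0.0 ],   # dog weight vector
              [ 0.0,  0.25, 0.2, -0.3 ]])  # hippo weight vector
x = np.array([56.0, 231.0, 24.0, 2.0])     # unrolled image pixels (feature vector)

scores = W @ x                # scores[j] is the "score" for the j-th class
predicted = scores.argmax()   # class with the highest score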
Diagram credit: Karpathy & Fei-Fei
Be aware: intuition from 2D doesn't always carry over to high dimensions.
CIFAR-10: 32x32x3 images, 10 classes. The feature vector is obtained by unrolling all pixels, and the classifier has one scoring function per class to recognize the 10 classes.
Decision rule is $w^T x$. If $w_i$ is big, then big values of $x_i$ are indicative of the class.
A loss function tells how good our current classifier is. Low loss = good classifier; high loss = bad classifier. (Also called: objective function, cost function.) The negative of a loss function is sometimes called a reward function, profit function, utility function, fitness function, etc.
Given a dataset $\{(x_i, y_i)\}_{i=1}^{N}$, the loss for a single example is $L_i(f(x_i, W), y_i)$ and the loss for the dataset is $L = \frac{1}{N} \sum_{i=1}^{N} L_i(f(x_i, W), y_i)$.
โThe score of the correct class should be higher than all the other scoresโ
[Plot of the hinge loss: loss vs. score for the correct class, with the highest score among the other classes and the margin marked]
Given an example $(x_i, y_i)$ ($x_i$ is the image, $y_i$ is the label), let $s = f(x_i, W)$ be the scores. Then the SVM loss has the form:
$L_i = \sum_{j \neq y_i} \max(0, s_j - s_{y_i} + 1)$
("Hinge Loss")
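The multiclass SVM loss above for a single example, as a minimal numpy sketch (margin fixed at 1, matching the formula); the sample scores reproduce the first worked example that follows.

import numpy as np

def svm_loss_single(s, y_i):
    # L_i = sum over j != y_i of max(0, s_j - s_{y_i} + 1)
    margins = np.maximum(0.0, s - s[y_i] + 1.0)   # hinge term for every class
    margins[y_i] = 0.0                            # the correct class does not contribute
    return margins.sum()

# Correct-class score 3.2, other scores 5.1 and -1.7 -> loss 2.9 (see below).
print(svm_loss_single(np.array([3.2, 5.1, -1.7]), 0))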
First example (correct-class score 3.2; other scores 5.1, -1.7): L = max(0, 5.1 - 3.2 + 1) + max(0, -1.7 - 3.2 + 1) = max(0, 2.9) + max(0, -3.9) = 2.9 + 0 = 2.9
Second example (correct-class score 4.9; other scores 1.3, 2.0): L = max(0, 1.3 - 4.9 + 1) + max(0, 2.0 - 4.9 + 1) = max(0, -2.6) + max(0, -1.9) = 0 + 0 = 0
Third example (correct-class score -3.1; other scores 2.2, 2.5): L = max(0, 2.2 - (-3.1) + 1) + max(0, 2.5 - (-3.1) + 1) = max(0, 6.3) + max(0, 6.6) = 6.3 + 6.6 = 12.9
Loss over the dataset is: L = (2.9 + 0.0 + 12.9) / 3 = 5.27
Q: What happens to the loss if the scores for the car image change a bit?
Q: What are the min and max possible loss?
Q: If all scores were random, what loss would we expect?
Q: What would happen if the sum were over all classes (including $j = y_i$)?
Q: What if the loss used mean instead of sum?
Q: What if we used this loss instead?
$L_i = \sum_{j \neq y_i} \max(0, s_j - s_{y_i} + 1)^2$
Want to interpret raw classifier scores as probabilities.
Classifier scores: $s = f(x_i, W)$ (unnormalized log-probabilities / logits).
Softmax function: $p_k = \frac{\exp(s_k)}{\sum_j \exp(s_j)}$
Probabilities must be >= 0: exponentiating ensures this (unnormalized probabilities). Probabilities must sum to 1: normalizing ensures this (probabilities).
Loss: $L_i = -\log(p_{y_i})$
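A minimal numpy sketch of this pipeline; subtracting the max score before exponentiating is a standard numerical-stability trick, not something stated on the slide.

import numpy as np

def softmax(s):
    # exp makes everything >= 0; dividing by the sum makes it sum to 1.
    e = np.exp(s - np.max(s))      # shift by max(s) to avoid overflow
    return e / e.sum()

def cross_entropy_loss(s, y_i):
    # L_i = -log(p_{y_i}) with p = softmax(s)
    return -np.log(softmax(s)[y_i])

scores = np.array([3.2, 5.1, -1.7])    # illustrative scores
probs = softmax(scores)                # interpretable as class probabilities
loss = cross_entropy_loss(scores, 0)   # loss if class 0 is the correct label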
Maximum Likelihood Estimation: choose weights to maximize the likelihood of the observed data (see EECS 445 or EECS 545).
Compare the predicted probabilities to the correct probabilities (probability 1 on the correct class, 0 on all others).
Kullback-Leibler Divergence: $D_{KL}(P \,\|\, Q) = \sum_y P(y) \log \frac{P(y)}{Q(y)}$
Cross-Entropy: $H(P, Q) = H(P) + D_{KL}(P \,\|\, Q)$
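A small numeric check of this decomposition (the distributions are illustrative; terms with P(y) = 0 are treated as 0):

import numpy as np

def entropy(p):
    p = p[p > 0]
    return -np.sum(p * np.log(p))

def kl_divergence(p, q):
    m = p > 0
    return np.sum(p[m] * np.log(p[m] / q[m]))

def cross_entropy(p, q):
    m = p > 0
    return -np.sum(p[m] * np.log(q[m]))

P = np.array([0.0, 1.0, 0.0])      # "correct probs": all mass on the true class
Q = np.array([0.10, 0.85, 0.05])   # predicted probabilities (illustrative)

# H(P, Q) = H(P) + D_KL(P || Q); with a one-hot P this reduces to -log Q[true class].
assert np.isclose(cross_entropy(P, Q), entropy(P) + kl_divergence(P, Q))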
Putting it all together: $L_i = -\log\left(\frac{\exp(s_{y_i})}{\sum_j \exp(s_j)}\right)$
Cross-entropy loss: $L_i = -\log\left(\frac{\exp(s_{y_i})}{\sum_j \exp(s_j)}\right)$ vs. SVM loss: $L_i = \sum_{j \neq y_i} \max(0, s_j - s_{y_i} + 1)$
Q: What is the cross-entropy loss? What is the SVM loss? (Consider an example where the correct class scores much higher than all other classes.)
A: Cross-entropy loss > 0; SVM loss = 0
Q: What happens to each loss if I slightly change the scores of the last datapoint?
A: Cross-entropy loss will change; SVM loss will stay the same.
Q: What happens to each loss if I double the score of the correct class from 10 to 20?
A: Cross-entropy loss will decrease; SVM loss will still be 0.
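A small numeric illustration of these last two answers; the three-class scores are illustrative, with the correct class at index 0 scoring 10 and then 20.

import numpy as np

def cross_entropy_loss(s, y_i):
    e = np.exp(s - np.max(s))
    return -np.log(e[y_i] / e.sum())

def svm_loss(s, y_i):
    m = np.maximum(0.0, s - s[y_i] + 1.0)
    m[y_i] = 0.0
    return m.sum()

s_before = np.array([10.0, -2.0, 3.0])   # correct class (index 0) scores 10
s_after  = np.array([20.0, -2.0, 3.0])   # double the correct-class score to 20

print(svm_loss(s_before, 0), svm_loss(s_after, 0))                      # 0.0 -> 0.0 (unchanged)
print(cross_entropy_loss(s_before, 0), cross_entropy_loss(s_after, 0))  # small -> even smaller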