

SLIDE 1

Regression

Tufts COMP 135: Introduction to Machine Learning
https://www.cs.tufts.edu/comp/135/2019s/

Many slides attributable to:
  • Prof. Mike Hughes
  • Erik Sudderth (UCI)
  • Finale Doshi-Velez (Harvard)
  • James, Witten, Hastie, Tibshirani (ISL/ESL books)
SLIDE 2

Logistics

  • HW0 due TONIGHT (Wed 1/23 at 11:59pm)
  • HW1 out later tonight, due a week from today
  • What you submit: PDF and zip
  • Next recitation is Mon 1/28
      • Multivariate Calculus review
      • The gory math behind linear regression

Mike Hughes - Tufts COMP 135 - Spring 2019
SLIDE 3

Regression Unit Objectives

  • 3 steps of a regression task
      • Training
      • Prediction
      • Evaluation
  • Metrics
  • Splitting data into train/valid/test
  • A “taste” of 3 Methods
      • Linear Regression
      • K-Nearest Neighbors
      • Decision Tree Regression

SLIDE 4

What will we learn?

[Diagram: overview of Supervised Learning, Unsupervised Learning, and Reinforcement Learning. A supervised task takes data–label pairs $\{x_n, y_n\}_{n=1}^N$ and a performance measure, and proceeds in three steps: Training, Prediction, Evaluation.]

SLIDE 5

Task: Regression

[Diagram: within Supervised Learning (vs. Unsupervised and Reinforcement Learning), the regression task maps features x to a response y, where y is a numeric variable, e.g. sales in $$.]

SLIDE 6

Regression Example: Uber

SLIDE 7

Regression Example: Uber

SLIDE 8

Regression Example: Uber

SLIDE 9

Try it!

What should happen here? What info did you use to make that guess?
SLIDE 10

Regression: Prediction Step

Goal: Predict response y well given features x

  • Input: “features”, “covariates”, “predictors”, “attributes”
    $x_i \triangleq [x_{i1}, x_{i2}, \ldots, x_{if}, \ldots, x_{iF}]$
    Entries can be real-valued, or other numeric types (e.g. integer, binary)
  • Output: “responses”, “labels”
    $\hat{y}(x_i) \in \mathbb{R}$
    A scalar value like 3.1 or -133.7
SLIDE 11

Regression: Prediction Step

>>> # Given: pretrained regression object `model`
>>> # Given: 2D array of features x_NF
>>> x_NF.shape
(N, F)
>>> yhat_N = model.predict(x_NF)
>>> yhat_N.shape
(N,)

SLIDE 12

Regression: Training Step

Goal: Given a labeled dataset, learn a function that can perform prediction well

  • Input: Pairs of features and labels/responses $\{x_n, y_n\}_{n=1}^N$
  • Output: Prediction function $\hat{y}(\cdot) : \mathbb{R}^F \to \mathbb{R}$

SLIDE 13

Regression: Training Step

>>> # Given: 2D array of features x_NF
>>> # Given: 1D array of responses/labels y_N
>>> x_NF.shape
(N, F)
>>> y_N.shape
(N,)
>>> model = RegressionModel()
>>> model.fit(x_NF, y_N)
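A concrete, runnable version of the snippets above, using scikit-learn's LinearRegression as a stand-in for the generic RegressionModel (the toy data is illustrative):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Toy data: N=4 examples, F=2 features, following y = 2*x1 + 3*x2 + 1 exactly
x_NF = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
y_N = 2.0 * x_NF[:, 0] + 3.0 * x_NF[:, 1] + 1.0

model = LinearRegression()   # stand-in for RegressionModel()
model.fit(x_NF, y_N)         # training step

yhat_N = model.predict(x_NF) # prediction step
print(yhat_N.shape)          # (4,), matching the promised (N,) shape
```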

SLIDE 14

Regression: Evaluation Step

Goal: Assess quality of predictions

  • Input: Pairs of predicted and “true” responses $\{\hat{y}(x_n), y_n\}_{n=1}^N$
  • Output: Scalar measure of error/quality
      • Measuring Error: lower is better
      • Measuring Quality: higher is better

SLIDE 15

Visualizing errors

SLIDE 16

Regression: Evaluation Metrics

  • mean squared error: $\frac{1}{N} \sum_{n=1}^N (y_n - \hat{y}_n)^2$
  • mean absolute error: $\frac{1}{N} \sum_{n=1}^N |y_n - \hat{y}_n|$
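Both metrics can be sketched directly in NumPy (scikit-learn also provides mean_squared_error and mean_absolute_error in sklearn.metrics):

```python
import numpy as np

def mean_squared_error(y_N, yhat_N):
    """Average squared residual; penalizes large errors heavily."""
    return np.mean((y_N - yhat_N) ** 2)

def mean_absolute_error(y_N, yhat_N):
    """Average absolute residual; less sensitive to outliers."""
    return np.mean(np.abs(y_N - yhat_N))

y_N = np.array([1.0, 2.0, 3.0])
yhat_N = np.array([1.0, 2.0, 6.0])       # one large error of 3.0
print(mean_squared_error(y_N, yhat_N))   # (0 + 0 + 9) / 3 = 3.0
print(mean_absolute_error(y_N, yhat_N))  # (0 + 0 + 3) / 3 = 1.0
```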

SLIDE 17

Discuss

  • Which error metric is more sensitive to outliers?
  • Which error metric is the easiest to take derivatives of?

SLIDE 18

Regression: Evaluation Metrics


https://scikit-learn.org/stable/modules/model_evaluation.html

SLIDE 19

How to model y given x?

SLIDE 20

Is the model constant?

SLIDE 21

Is the model linear?

SLIDE 22

Is the model polynomial?

SLIDE 23

Generalize: sample to population

SLIDE 24

Generalize: sample to population

SLIDE 25

Labeled dataset

[Table: columns x and y. Each row represents one example. Assume rows are arranged “uniformly at random” (order doesn’t matter).]

SLIDE 26

Split into train and test

[Diagram: the labeled table of (x, y) rows is partitioned into a train portion and a test portion.]
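One way to sketch this split, using scikit-learn's train_test_split (the 70/30 fraction and random_state are illustrative choices):

```python
import numpy as np
from sklearn.model_selection import train_test_split

x_NF = np.arange(20, dtype=float).reshape(10, 2)  # N=10 examples, F=2 features
y_N = x_NF[:, 0] + x_NF[:, 1]

# Hold out 30% of rows for testing; shuffling means row order doesn't matter
x_tr, x_te, y_tr, y_te = train_test_split(
    x_NF, y_N, test_size=0.3, random_state=0)

print(x_tr.shape, x_te.shape)  # (7, 2) (3, 2)
```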

SLIDE 27

Model Complexity vs Error

[Plot: error vs. model complexity, with underfitting and overfitting regimes labeled.]

SLIDE 28

How to fit best model?

Option 1: Fit on train, select on test
1) Fit each model to training data
2) Evaluate each model on test data
3) Select model with lowest test error

SLIDE 29

How to fit best model?

Option 1: Fit on train, select on test — Avoid!

Problems:
  • Fitting procedure used test data
  • Not a fair assessment of how the model will do on unseen data

SLIDE 30

How to fit best model?

Option 2: Fit on train, select on validation
1) Fit each model to training data
2) Evaluate each model on validation data
3) Select model with lowest validation error
4) Report error on test set

SLIDE 31

How to fit best model?

Option 2: Fit on train, select on validation
1) Fit each model to training data
2) Evaluate each model on validation data
3) Select model with lowest validation error
4) Report error on test set

Concerns:
  • Will train be too small?
  • Can we make better use of the data?
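The fit-on-train, select-on-validation protocol can be sketched as follows (the candidate K values, split fractions, and toy data are all illustrative):

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor
from sklearn.metrics import mean_squared_error

rng = np.random.RandomState(0)
x_NF = rng.uniform(-1, 1, size=(200, 1))
y_N = np.sin(3 * x_NF[:, 0]) + 0.1 * rng.randn(200)

# Split: 60% train, 20% validation, 20% test
x_tr, x_rest, y_tr, y_rest = train_test_split(x_NF, y_N, test_size=0.4, random_state=0)
x_va, x_te, y_va, y_te = train_test_split(x_rest, y_rest, test_size=0.5, random_state=0)

# 1-2) Fit each candidate model on train, evaluate on validation
candidates = {k: KNeighborsRegressor(n_neighbors=k).fit(x_tr, y_tr)
              for k in [1, 5, 25]}
val_err = {k: mean_squared_error(y_va, m.predict(x_va))
           for k, m in candidates.items()}

# 3) Select the model with lowest validation error
best_k = min(val_err, key=val_err.get)

# 4) Report error on the untouched test set
test_err = mean_squared_error(y_te, candidates[best_k].predict(x_te))
print(best_k, round(test_err, 3))
```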
SLIDE 32

Linear Regression

Parameters:
  weight vector $w = [w_1, w_2, \ldots, w_f, \ldots, w_F]$
  bias scalar $b$

Prediction:
  $\hat{y}(x_i) \triangleq \sum_{f=1}^F w_f x_{if} + b$

Training: find weights and bias that minimize error
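The prediction rule can be sketched in NumPy (the weight, bias, and input values are illustrative):

```python
import numpy as np

def predict(x_NF, w_F, b):
    """Linear regression prediction: yhat_i = sum_f w_f * x_if + b."""
    return x_NF @ w_F + b

w_F = np.array([2.0, -1.0])
b = 0.5
x_NF = np.array([[1.0, 1.0], [0.0, 3.0]])
print(predict(x_NF, w_F, b))  # 2*1 - 1*1 + 0.5 = 1.5; 2*0 - 1*3 + 0.5 = -2.5
```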

SLIDE 33

Sales vs. Ad Budgets

SLIDE 34

Linear Regression: Training

Optimization problem: “Least Squares”

$\min_{w,b} \sum_{n=1}^N \left( y_n - \hat{y}(x_n, w, b) \right)^2$

SLIDE 35

Linear Regression: Training

Optimization problem: “Least Squares”

$\min_{w,b} \sum_{n=1}^N \left( y_n - \hat{y}(x_n, w, b) \right)^2$

Exact formulas for the optimal values of w, b exist! With only one feature (F=1):

$w = \frac{\sum_{n=1}^N (x_n - \bar{x})(y_n - \bar{y})}{\sum_{n=1}^N (x_n - \bar{x})^2}$

$b = \bar{y} - w \bar{x}$

where $\bar{x} = \text{mean}(x_1, \ldots, x_N)$ and $\bar{y} = \text{mean}(y_1, \ldots, y_N)$.

We will derive these in next class
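A minimal NumPy sketch of the F=1 formulas (the toy data is illustrative; it follows y = 2x + 1 exactly, so the fit should recover w = 2, b = 1):

```python
import numpy as np

def fit_least_squares_1d(x_N, y_N):
    """Closed-form least squares for a single feature: returns (w, b)."""
    xbar, ybar = x_N.mean(), y_N.mean()
    w = np.sum((x_N - xbar) * (y_N - ybar)) / np.sum((x_N - xbar) ** 2)
    b = ybar - w * xbar
    return w, b

x_N = np.array([0.0, 1.0, 2.0, 3.0])
y_N = 2.0 * x_N + 1.0
w, b = fit_least_squares_1d(x_N, y_N)
print(w, b)  # 2.0 1.0
```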

SLIDE 36

Linear Regression: Training

Optimization problem: “Least Squares”

$\min_{w,b} \sum_{n=1}^N \left( y_n - \hat{y}(x_n, w, b) \right)^2$

Exact formulas for the optimal values of w, b exist! With many features (F >= 1):

$[w_1 \ldots w_F \; b]^T = (\tilde{X}^T \tilde{X})^{-1} \tilde{X}^T y$

$\tilde{X} = \begin{bmatrix} x_{11} & \ldots & x_{1F} & 1 \\ x_{21} & \ldots & x_{2F} & 1 \\ \vdots & & \vdots & \vdots \\ x_{N1} & \ldots & x_{NF} & 1 \end{bmatrix}$

We will derive these in next class
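A NumPy sketch of the closed form for general F (solving the normal equations with np.linalg.solve rather than forming the inverse explicitly; the toy data is illustrative):

```python
import numpy as np

def fit_least_squares(x_NF, y_N):
    """Solve (Xt^T Xt) theta = Xt^T y, where Xt appends a column of ones."""
    N = x_NF.shape[0]
    xtilde = np.hstack([x_NF, np.ones((N, 1))])  # Xt : shape (N, F+1)
    theta = np.linalg.solve(xtilde.T @ xtilde, xtilde.T @ y_N)
    return theta[:-1], theta[-1]  # (weight vector w, bias b)

# Toy data following y = 2*x1 + 3*x2 + 1 exactly
x_NF = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
y_N = 2.0 * x_NF[:, 0] + 3.0 * x_NF[:, 1] + 1.0
w_F, b = fit_least_squares(x_NF, y_N)
print(w_F, b)  # w close to [2, 3], b close to 1
```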

SLIDE 37

Nearest Neighbor Regression


Parameters: none

Prediction:
  • find “nearest” training vector to given input x
  • predict y value of this neighbor

Training: none needed (use training data as lookup table)
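A minimal 1-nearest-neighbor sketch in NumPy (Euclidean distance is an assumption here; other metrics are possible):

```python
import numpy as np

def predict_1nn(x_train_NF, y_train_N, x_F):
    """Return the y value of the single nearest training vector."""
    dists = np.sqrt(np.sum((x_train_NF - x_F) ** 2, axis=1))
    return y_train_N[np.argmin(dists)]

x_train_NF = np.array([[0.0], [1.0], [2.0]])
y_train_N = np.array([10.0, 20.0, 30.0])
print(predict_1nn(x_train_NF, y_train_N, np.array([0.9])))  # 20.0 (nearest is x=1)
```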

SLIDE 38

Distance metrics

  • Euclidean: $\text{dist}(x, x') = \sqrt{\sum_{f=1}^F (x_f - x'_f)^2}$
  • Manhattan: $\text{dist}(x, x') = \sum_{f=1}^F |x_f - x'_f|$
  • Many others are possible
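Both distance metrics sketched in NumPy (the example vectors are illustrative):

```python
import numpy as np

def euclidean_dist(x_F, xprime_F):
    """Square root of summed squared per-feature differences."""
    return np.sqrt(np.sum((x_F - xprime_F) ** 2))

def manhattan_dist(x_F, xprime_F):
    """Sum of absolute per-feature differences."""
    return np.sum(np.abs(x_F - xprime_F))

x_F = np.array([0.0, 0.0])
xprime_F = np.array([3.0, 4.0])
print(euclidean_dist(x_F, xprime_F))  # sqrt(9 + 16) = 5.0
print(manhattan_dist(x_F, xprime_F))  # 3 + 4 = 7.0
```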

SLIDE 39

Nearest Neighbor “Prediction functions” are piecewise constant

SLIDE 40

K nearest neighbor regression


Parameters: K (number of neighbors)

Prediction:
  • find K “nearest” training vectors to input x
  • predict average y of this neighborhood

Training: none needed (use training data as lookup table)
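A sketch using scikit-learn's KNeighborsRegressor (K=2 and the toy data are illustrative choices):

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

x_train_NF = np.array([[0.0], [1.0], [2.0], [3.0]])
y_train_N = np.array([10.0, 20.0, 30.0, 40.0])

model = KNeighborsRegressor(n_neighbors=2)  # K = 2
model.fit(x_train_NF, y_train_N)            # just stores the lookup table

# Query at x = 0.9: the 2 nearest training vectors are x=1 and x=0,
# so the prediction is the neighborhood average mean(20, 10) = 15
print(model.predict(np.array([[0.9]])))  # [15.]
```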

SLIDE 41

Error vs Model Complexity

Credit: Fig 2.4 ESL textbook
SLIDE 42

Salary prediction for Hitters data

SLIDE 43

SLIDE 44

Decision Tree Regression

SLIDE 45


Decision tree regression

Parameters:

  • at each internal node: x variable id and threshold
  • at each leaf: scalar y value to predict

Prediction assumption:

  • x space is divided into rectangular regions
  • y is similar within “region”

Training assumption:

  • minimize error on training set
  • often, use greedy heuristics
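A sketch with scikit-learn's DecisionTreeRegressor (max_depth=1 is an illustrative choice: one internal node with a feature threshold, and two leaves each predicting a scalar):

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Step-shaped toy data: y jumps from 0 to 10 at x = 2
x_NF = np.array([[0.0], [1.0], [2.0], [3.0]])
y_N = np.array([0.0, 0.0, 10.0, 10.0])

model = DecisionTreeRegressor(max_depth=1)  # a single split ("stump")
model.fit(x_NF, y_N)

# The learned tree thresholds near x = 1.5 and predicts the mean y per region
print(model.predict(np.array([[0.5], [2.5]])))  # leaf means: 0 and 10
```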
SLIDE 46

Ideal Training for Decision Tree

$\min_{R_1, \ldots, R_J} \sum_{j=1}^J \sum_{n : x_n \in R_j} (y_n - \hat{y}_{R_j})^2$

Search space is too big! Hard to solve exactly…

SLIDE 47

Greedy Top-Down Training

Given a big region, find the best binary split (feature j, threshold s) into two subregions:

$\min_{j,\, s,\, \hat{y}_{R_1},\, \hat{y}_{R_2}} \;\; \sum_{n : x_{nj} \le s} (y_n - \hat{y}_{R_1})^2 + \sum_{n : x_{nj} > s} (y_n - \hat{y}_{R_2})^2$

Stop when:
  • number of examples assigned to a leaf is too small
  • maximum depth is exceeded
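One greedy split search for a single feature can be sketched by brute force over candidate thresholds (best_split_1d is a hypothetical helper, not the lecture's implementation; x is assumed sorted):

```python
import numpy as np

def best_split_1d(x_N, y_N):
    """Find threshold s minimizing total squared error to the two region means."""
    best_err, best_s = np.inf, None
    for s in (x_N[:-1] + x_N[1:]) / 2.0:   # midpoints as candidate thresholds
        left, right = y_N[x_N <= s], y_N[x_N > s]
        err = (np.sum((left - left.mean()) ** 2)
               + np.sum((right - right.mean()) ** 2))
        if err < best_err:
            best_err, best_s = err, s
    return best_err, best_s

x_N = np.array([0.0, 1.0, 2.0, 3.0])
y_N = np.array([0.0, 0.0, 10.0, 10.0])
print(best_split_1d(x_N, y_N))  # best threshold 1.5 achieves zero error
```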

SLIDE 48

Greedy Tree for Hitters Data

SLIDE 49

Summary of Methods

Method                         | Function class flexibility      | Knobs to tune                                             | Interpret?
Linear Regression              | Linear                          | Include bias? Penalize weights (more next week)           | Inspect weights
Decision Tree Regression       | Axis-aligned piecewise constant | Max. depth, min. leaf size, goal criteria                 | Inspect tree
K Nearest Neighbors Regression | Piecewise constant              | Number of neighbors, distance metric, how neighbors vote  | Inspect neighbors

SLIDE 50

Discuss:

We are studying data with ~5 features. Which method’s predictions change most across versions of the feature representation?

  • Version A: [x1 x2 x3 x4 x5]
  • Version B: [10·x1 x2 x3 x4 x5]

SLIDE 51

Discuss:

We are studying data with ~3 features. Which method’s predictions change most across versions of the feature representation?

  • Version A: [x1 x2]
  • Version B: [x1 x2 noise noise noise]

SLIDE 52

Regression Unit Objectives

  • 3 steps of a regression task
      • Training
      • Prediction
      • Evaluation
  • Metrics
  • Splitting into train/valid/test
  • “Taste” of 3 Methods
      • Linear Regression
      • K-Nearest Neighbors
      • Decision Tree Regression
  • Chosen performance metric should be integrated at training
  • Mean squared error is “easy”, but not always the right thing to do