Regression

Tufts COMP 135: Introduction to Machine Learning (Spring 2019)
https://www.cs.tufts.edu/comp/135/2019s/
Prof. Mike Hughes

Many slides attributable to:
Erik Sudderth (UCI), Finale Doshi-Velez (Harvard),
James, Witten, Hastie, Tibshirani (ISL/ESL books)
Supervised Learning / Unsupervised Learning / Reinforcement Learning

Data: label pairs {x_n, y_n}_{n=1}^N
Task: map data x to label y
Stages: Training, Prediction, Evaluation (with a performance measure)
Supervised task: regression. The label y is a numeric variable, e.g. sales in dollars.
What should happen here? What info did you use to make that guess?
Goal: Predict response y well given features x
Features x: entries can be real-valued, or other numeric types (e.g. integer, binary).
Response y: a scalar value like 3.1 or -133.7.

Terminology: x is called "features", "covariates", "predictors", or "attributes"; y is called "responses" or "labels".
>>> # Given: pretrained regression object model
>>> # Given: 2D array of features x
>>> x_NF.shape
(N, F)
>>> yhat_N = model.predict(x_NF)
>>> yhat_N.shape
(N,)
Goal: Given a labeled dataset, learn a function that can perform prediction well
Training set: {(x_n, y_n)}_{n=1}^N
>>> # Given: 2D array of features x
>>> # Given: 1D array of responses/labels y
>>> y_N.shape
(N,)
>>> x_NF.shape
(N, F)
>>> model = RegressionModel()
>>> model.fit(x_NF, y_N)
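A minimal sketch of this fit/predict workflow, using scikit-learn's LinearRegression as one concrete stand-in for the generic RegressionModel placeholder (the data here is synthetic and illustrative):

```python
# Fit/predict workflow with a concrete regression model.
import numpy as np
from sklearn.linear_model import LinearRegression

N, F = 100, 3
rng = np.random.RandomState(0)
x_NF = rng.randn(N, F)                        # 2D array of features, shape (N, F)
y_N = x_NF @ np.array([1.0, -2.0, 0.5]) + 3   # 1D array of responses, shape (N,)

model = LinearRegression()
model.fit(x_NF, y_N)            # training: learn weights and bias from labeled data
yhat_N = model.predict(x_NF)    # prediction: one estimate per example, shape (N,)
```

Any scikit-learn regressor (trees, nearest neighbors, etc.) exposes this same fit/predict interface.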
Goal: Assess quality of predictions
Evaluation metrics:

Mean squared error:  MSE = (1/N) Σ_{n=1}^N (y_n − ŷ_n)²
Mean absolute error: MAE = (1/N) Σ_{n=1}^N |y_n − ŷ_n|

Which of these can we take derivatives of?
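Two standard regression error metrics, mean squared error (MSE) and mean absolute error (MAE), can be computed directly and checked against scikit-learn (array values here are made up for illustration):

```python
# MSE and MAE by hand, verified against scikit-learn's implementations.
import numpy as np
from sklearn.metrics import mean_squared_error, mean_absolute_error

y_N    = np.array([1.0, 2.0, 3.0])   # true responses
yhat_N = np.array([1.5, 2.0, 2.0])   # predictions

mse = np.mean((y_N - yhat_N) ** 2)   # smooth: differentiable everywhere
mae = np.mean(np.abs(y_N - yhat_N))  # not differentiable where the error is 0
```

MSE is the metric with derivatives everywhere, which is why it is the usual training objective.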
https://scikit-learn.org/stable/modules/model_evaluation.html
Each row represents one example. Assume rows are arranged "uniformly at random" (order doesn't matter).
Overfitting vs. Underfitting
Option 1: Fit on train, select on test
1) Fit each model to training data
2) Evaluate each model on test data
3) Select model with lowest test error
Option 1: Fit on train, select on test
1) Fit each model to training data
2) Evaluate each model on test data
3) Select model with lowest test error

Problem: once the test set is used for selection, its error no longer estimates performance on truly unseen data.
Option 2: Fit on train, select on validation
1) Fit each model to training data
2) Evaluate each model on validation data
3) Select model with lowest validation error
4) Report error on test set
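One way to carve a shuffled dataset into train/validation/test pieces is two calls to scikit-learn's train_test_split; the 60/20/20 fractions and variable names here are illustrative choices, not from the slides:

```python
# Three-way split via two successive train_test_split calls.
import numpy as np
from sklearn.model_selection import train_test_split

N, F = 100, 2
rng = np.random.RandomState(0)
x_NF = rng.randn(N, F)
y_N = rng.randn(N)

# First peel off 40% for validation+test, then halve that remainder.
x_tr, x_rest, y_tr, y_rest = train_test_split(
    x_NF, y_N, test_size=0.4, random_state=0)
x_va, x_te, y_va, y_te = train_test_split(
    x_rest, y_rest, test_size=0.5, random_state=0)
# Result: 60% train, 20% validation, 20% test
```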
Concerns with this approach?
Linear Regression

Parameters: weight vector w (one scalar weight per feature), bias scalar b
Prediction:

ŷ(x_i) = Σ_{f=1}^F w_f x_{if} + b

Training: find weights and bias that minimize error
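The prediction rule ŷ(x_i) = Σ_f w_f x_{if} + b is just a dot product plus a scalar; a tiny sketch with made-up numbers:

```python
# Linear regression prediction for a single example.
import numpy as np

w_F = np.array([2.0, -1.0, 0.5])   # weight vector, one entry per feature
b = 4.0                            # bias scalar
x_F = np.array([1.0, 3.0, 2.0])    # one example's F=3 features

yhat = np.dot(w_F, x_F) + b        # 2*1 + (-1)*3 + 0.5*2 + 4 = 4.0
```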
Optimization problem: "Least Squares"

min_{w,b} Σ_{n=1}^N (y_n − ŷ(x_n))²
Optimization problem: "Least Squares"

Exact formulas for the optimal values of w, b exist! With only one feature (F=1):

w = [ Σ_{n=1}^N (x_n − x̄)(y_n − ȳ) ] / [ Σ_{n=1}^N (x_n − x̄)² ]
b = ȳ − w x̄

where x̄ = mean(x_1, ..., x_N) and ȳ = mean(y_1, ..., y_N).

We will derive these in next class.
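The F=1 closed-form solution (ratio of centered covariance to variance, with b = ȳ − w x̄) can be checked numerically on a dataset that is exactly linear, y = 2x + 1 (the data values are illustrative):

```python
# Closed-form least squares for one feature.
import numpy as np

x_N = np.array([0.0, 1.0, 2.0, 3.0])
y_N = np.array([1.0, 3.0, 5.0, 7.0])   # exactly y = 2x + 1

xbar, ybar = x_N.mean(), y_N.mean()
w = np.sum((x_N - xbar) * (y_N - ybar)) / np.sum((x_N - xbar) ** 2)
b = ybar - w * xbar
# Recovers the true slope 2 and intercept 1.
```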
Optimization problem: "Least Squares"

Exact formulas for the optimal values of w, b exist! With many features (F >= 1), append a constant column of ones to the feature matrix:

X̃ = [ x_11 ... x_1F 1 ]
    [ x_21 ... x_2F 1 ]
    [  ...            ]
    [ x_N1 ... x_NF 1 ]

Then the stacked parameter vector θ = [w_1, ..., w_F, b] satisfies θ = (X̃ᵀ X̃)⁻¹ X̃ᵀ y (when X̃ᵀ X̃ is invertible).

We will derive these in next class.
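A sketch of multi-feature least squares using the ones-augmented matrix X̃; this uses numpy's lstsq (a numerically stable solver) rather than forming an explicit inverse, and the synthetic data is illustrative:

```python
# Least squares with many features via the augmented matrix [X, 1],
# which yields the weights and the bias in one solve.
import numpy as np

rng = np.random.RandomState(0)
N, F = 50, 3
x_NF = rng.randn(N, F)
true_w = np.array([1.0, -2.0, 0.5])
y_N = x_NF @ true_w + 3.0              # exactly linear, bias 3

xtilde = np.hstack([x_NF, np.ones((N, 1))])          # shape (N, F+1)
theta, *_ = np.linalg.lstsq(xtilde, y_N, rcond=None)
w_F, b = theta[:F], theta[F]
# Recovers true_w and the bias 3.0.
```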
Nearest Neighbor Regression

Parameters: none
Prediction: return the response of the nearest training example
Training: none needed (use training data as lookup table)
Distance metrics:

Euclidean: dist(x, x′) = √( Σ_{f=1}^F (x_f − x′_f)² )
Manhattan: dist(x, x′) = Σ_{f=1}^F |x_f − x′_f|
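Both distance formulas in a few lines of numpy, on a made-up pair of 2-feature vectors chosen so the answers are easy to verify (a 3-4-5 triangle):

```python
# Euclidean and Manhattan distances between two feature vectors.
import numpy as np

x  = np.array([0.0, 3.0])
x2 = np.array([4.0, 0.0])

euclidean = np.sqrt(np.sum((x - x2) ** 2))   # sqrt(16 + 9) = 5
manhattan = np.sum(np.abs(x - x2))           # 4 + 3 = 7
```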
K Nearest Neighbors Regression

Parameters: K, the number of neighbors
Prediction: combine (e.g. average) the responses of the K nearest training examples
Training: none needed (use training data as lookup table)
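A minimal from-scratch KNN regressor, assuming neighbors "vote" by simple averaging under Euclidean distance (the function name and toy data are illustrative):

```python
# K nearest neighbors regression: average the responses of the K
# training points closest to the query in Euclidean distance.
import numpy as np

def knn_predict(x_train_NF, y_train_N, x_query_F, K=3):
    dists = np.sqrt(np.sum((x_train_NF - x_query_F) ** 2, axis=1))
    nearest_ids = np.argsort(dists)[:K]
    return np.mean(y_train_N[nearest_ids])

x_train = np.array([[0.0], [1.0], [2.0], [10.0]])
y_train = np.array([0.0, 1.0, 2.0, 10.0])
# Query 1.1: its 3 nearest neighbors have responses 1, 2, 0 -> mean 1.0
pred = knn_predict(x_train, y_train, np.array([1.1]), K=3)
```

Note there is no training step: prediction just searches the stored training data.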
Credit: Fig 2.4, ESL textbook
Decision Tree Regression

Parameters:
Prediction assumption:
Training assumption:
Training cost over regions R_1, ..., R_J:

min_{R_1,...,R_J} Σ_{j=1}^J Σ_{n: x_n ∈ R_j} (y_n − ŷ_{R_j})²

Search space is too big! Hard to solve exactly…
Greedy alternative: given a big region, find the best binary split into two subregions by solving

min_{j, s, ŷ_{R1}, ŷ_{R2}} Σ_{n: x_{nj} < s} (y_n − ŷ_{R1})² + Σ_{n: x_{nj} ≥ s} (y_n − ŷ_{R2})²

Stop when:
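The greedy best-split step can be sketched by brute force in one dimension: try each candidate threshold, predict the mean response on each side, and keep the split with the lowest total squared error (function name and data are illustrative):

```python
# Exhaustive search for the best binary split of a 1D feature,
# the core step of greedy regression-tree training.
import numpy as np

def best_split(x_N, y_N):
    best_s, best_sse = None, np.inf
    for s in np.unique(x_N)[1:]:          # thresholds keeping both sides nonempty
        left, right = y_N[x_N < s], y_N[x_N >= s]
        sse = (np.sum((left - left.mean()) ** 2)
               + np.sum((right - right.mean()) ** 2))
        if sse < best_sse:
            best_s, best_sse = s, sse
    return best_s, best_sse

x_N = np.array([1.0, 2.0, 3.0, 10.0, 11.0, 12.0])
y_N = np.array([0.0, 0.0, 0.0, 5.0, 5.0, 5.0])
s, sse = best_split(x_N, y_N)   # splitting at s=10 separates the two groups exactly
```

With F features, the same search simply loops over features as well as thresholds.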
Method                         | Function class flexibility       | Knobs to tune                                             | Interpret?
Linear Regression              | Linear                           | Include bias? Penalize weights (more next week)           | Inspect weights
Decision Tree Regression       | Axis-aligned, piecewise constant | Goal criteria                                             | Inspect tree
K Nearest Neighbors Regression | Piecewise constant               | Number of neighbors, distance metric, how neighbors vote  | Inspect neighbors
We are studying data with ~5 features. Which method's predictions change most across versions of the feature representation?
We are studying data with ~3 features. Which method's predictions change most across versions of the feature representation?
be integrated at training
not always the right thing to do