CSE217 INTRODUCTION TO DATA SCIENCE
LECTURE 4: REGRESSION
Spring 2019
Marion Neumann
RECAP: DATA SCIENCE
…solving problems with data…
[Diagram: scientific or business problem; data problem; collect & understand data; clean & format data; use data to create solution]
…which step is most exciting?
Machine Learning
RECAP: ML
- data: anything you can measure or record
- model: specification of a (mathematical) relationship between different variables
- evaluation: how well does the model work?
…creating and using models that learn from data…
RECAP: ML WORKFLOW
- Training phase, test phase, and evaluation phase
→ turn to your neighbor
- by taking turns, explain what happens in the
  - training phase
  - test phase
  - evaluation phase
- carefully define what kinds of data are used in each phase
[Diagram labels: data, output, program, ground truth, performance measure]
PROPERTY SALES DATA
Goal: predict how much my house is worth
- features (input variables)
  - size (in sq. ft): numeric, categorical, or binary?
  - neighborhood: numeric, categorical, or binary?
  - # bedrooms: numeric, categorical, or binary?
  - # bathrooms: numeric, categorical, or binary?
  - pool: numeric, categorical, or binary?
  - age (in years): numeric, categorical, or binary?
  - renovated: numeric, categorical, or binary?
- house price = target variable: numeric, categorical, or binary?
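The three feature types can be told apart mechanically. The sketch below is a minimal illustration, assuming a few made-up records (the values and the column names `size_sqft`, `neighborhood`, `pool`, and `price` are hypothetical, not taken from the property sales data) and a hypothetical helper `feature_type`:

```python
# Made-up example records; values and column names are illustrative only.
houses = [
    {"size_sqft": 1500, "neighborhood": "A", "pool": False, "price": 250000.0},
    {"size_sqft": 2100, "neighborhood": "B", "pool": True, "price": 340000.0},
    {"size_sqft": 1800, "neighborhood": "C", "pool": False, "price": 295000.0},
]


def feature_type(values):
    """Classify a column as 'binary', 'numeric', or 'categorical'."""
    # bool must be checked first because bool is a subclass of int in Python.
    if all(isinstance(v, bool) for v in values):
        return "binary"
    if all(isinstance(v, (int, float)) for v in values):
        return "numeric"
    return "categorical"


# Map every column name to its inferred type.
types = {name: feature_type([h[name] for h in houses]) for name in houses[0]}
print(types)
```

Here the target column `price` is handled exactly like the input features; what makes it the target is only the role it plays in the prediction task.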
How can this data help?
PREDICTING HOUSE PRICES
- target (house price) is a real number
How much is my house worth? Look at Zillow!
LINEAR REGRESSION MODEL
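As a sketch, assuming a single input feature $x$ (for example, house size) and hypothetical parameter names $w$ (slope) and $b$ (intercept), a linear regression model predicts the target as:

```latex
\hat{y} = f(x) = w\,x + b
```

With several features $x_1, \dots, x_d$, the same idea extends to $\hat{y} = w_1 x_1 + \dots + w_d x_d + b$.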
TRAINING: MINIMIZE ERROR
PDSH p391 Linear Regression
math & statistics
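A minimal sketch of the training step, assuming a single feature and assuming the error being minimized is the sum of squared differences between predictions and targets (ordinary least squares). The function name `fit_line` and the data values are made up for illustration:

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Closed-form solution for one feature: slope = cov(x, y) / var(x).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up training data: house size in sq. ft and price in $1000s.
sizes = [1000, 1500, 2000, 2500]
prices = [200, 270, 340, 410]

slope, intercept = fit_line(sizes, prices)
print(slope, intercept)
```

These made-up points lie exactly on a line, so the fit recovers a slope of 0.14 and an intercept of 60, up to floating-point rounding.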
PREDICTION: USE MODEL
PDSH p391 Linear Regression
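Once the parameters are learned, prediction only evaluates the model on a new input. A minimal sketch, assuming a one-feature linear model with hypothetical parameter values (chosen for illustration, not fitted to real data):

```python
def predict(size_sqft, slope, intercept):
    """Evaluate a one-feature linear model on a new input."""
    return slope * size_sqft + intercept


# Hypothetical learned parameters: price in $1000s per sq. ft, plus an offset.
slope, intercept = 0.14, 60.0

# Predicted price (in $1000s) for a hypothetical 1800 sq. ft house.
estimate = predict(1800, slope, intercept)
print(estimate)
```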
HOW ABOUT MORE COMPLEX MODELS?
PDSH p393 Linear Regression
Error on training set: linear model >> quadratic >> 6th-order polynomial ← error is zero!
Is the model with zero (training) error the best?
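The question can be explored with a small experiment. The sketch below assumes seven made-up training points; with seven distinct inputs, the best-fitting degree-6 polynomial passes through every point, so Lagrange interpolation is used here as a stand-in for fitting it. All data values and function names are hypothetical:

```python
import math


def fit_line(xs, ys):
    """Least-squares slope and intercept for one feature."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def lagrange_predict(x, xs, ys):
    """Evaluate the polynomial that passes through all (xs, ys) at x."""
    total = 0.0
    for j, (xj, yj) in enumerate(zip(xs, ys)):
        basis = 1.0
        for m, xm in enumerate(xs):
            if m != j:
                basis *= (x - xm) / (xj - xm)
        total += yj * basis
    return total


def rmse(predictions, targets):
    """Root mean squared error."""
    n = len(targets)
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(predictions, targets)) / n)


# Made-up data: training targets follow 2x + 1 plus small fixed "noise".
train_x = [0, 1, 2, 3, 4, 5, 6]
train_y = [1.3, 2.6, 5.4, 6.8, 9.5, 10.7, 13.2]
test_x = [0.5, 2.5, 4.5, 6.5]
test_y = [2.0, 6.0, 10.0, 14.0]

slope, intercept = fit_line(train_x, train_y)
models = {
    "linear": lambda x: slope * x + intercept,
    "degree-6": lambda x: lagrange_predict(x, train_x, train_y),
}

# Store (training RMSE, test RMSE) per model.
results = {}
for name, model in models.items():
    train_err = rmse([model(x) for x in train_x], train_y)
    test_err = rmse([model(x) for x in test_x], test_y)
    results[name] = (train_err, test_err)
    print(f"{name}: train RMSE = {train_err:.3f}, test RMSE = {test_err:.3f}")
```

In this made-up example the degree-6 polynomial reaches zero training error but a much larger test error than the straight line, so zero training error alone does not make a model the best.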
EVALUATION FOR REGRESSION
- Training Error vs. Test Error
- Error measures:
- RMSE: root mean squared error
- MAE: mean absolute error
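$$\mathrm{RMSE}(\hat{\mathbf{y}}, \mathbf{y}_{\text{test}}) = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (\hat{y}_i - y_i)^2}$$
$$\mathrm{MAE}(\hat{\mathbf{y}}, \mathbf{y}_{\text{test}}) = \frac{1}{n} \sum_{i=1}^{n} |\hat{y}_i - y_i|$$
$\hat{\mathbf{y}} = f(X_{\text{test}})$: predictions for test data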
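Both error measures take only a few lines to compute. A minimal sketch in plain Python, where the function names `rmse` and `mae` and the example values are made up for illustration:

```python
import math


def rmse(predictions, targets):
    """Root mean squared error: square root of the mean squared difference."""
    n = len(targets)
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(predictions, targets)) / n)


def mae(predictions, targets):
    """Mean absolute error: mean of the absolute differences."""
    n = len(targets)
    return sum(abs(p - t) for p, t in zip(predictions, targets)) / n


# Made-up test targets and predictions (prices in $1000s).
y_test = [200.0, 300.0, 400.0]
y_pred = [210.0, 290.0, 430.0]

print("RMSE:", rmse(y_pred, y_test))
print("MAE:", mae(y_pred, y_test))
```

Because RMSE squares each difference before averaging, the single large miss (30) pulls it above the MAE on these values.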
MACHINE LEARNING WORKFLOW
- Training Phase, Test Phase, Evaluation Phase
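The three phases can be sketched end to end. This is a minimal illustration, assuming a one-feature least-squares model, a 70/30 split, and made-up data; the helper names and the split ratio are assumptions, not part of the lecture:

```python
import math
import random


def fit_line(xs, ys):
    """Least-squares slope and intercept for one feature."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def rmse(predictions, targets):
    """Root mean squared error."""
    n = len(targets)
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(predictions, targets)) / n)


# Made-up (size in sq. ft, price in $1000s) pairs, for illustration only.
data = [(900, 185), (1100, 215), (1300, 240), (1500, 272), (1700, 296),
        (1900, 330), (2100, 352), (2300, 385), (2500, 408), (2700, 440)]

# Shuffle with a fixed seed, then hold out 30% of the points as the test set.
shuffled = data[:]
random.Random(0).shuffle(shuffled)
split = len(shuffled) * 7 // 10
train, test = shuffled[:split], shuffled[split:]

# Training phase: learn the parameters from training inputs and targets.
slope, intercept = fit_line([x for x, _ in train], [y for _, y in train])

# Test phase: apply the learned model to test inputs only.
predictions = [slope * x + intercept for x, _ in test]

# Evaluation phase: compare predictions with the held-out ground truth.
test_rmse = rmse(predictions, [y for _, y in test])
print("test RMSE:", test_rmse)
```

The test targets are used only in the evaluation phase; the model never sees them during training.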
SUMMARY & READING
- Learning from Data requires a lot of math!
- Regression models are used to predict real valued targets.
- We need a test set to evaluate how well our model generalizes.
- DSFS
  - Ch11: ML (p142-144)
  - Ch14: Simple Linear Regression (p173-176)
- PDSH Ch5: ML – Linear Regression (p390-394)
- LINEAR REGRESSION BY HAND (understand the model)
  https://www.wired.com/2011/01/linear-regression-by-hand/
- SciKit Learn (use the model in practice)
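For using the model in practice, a minimal sketch with scikit-learn's `LinearRegression` could look as follows. The data values are made up for illustration, and the example assumes scikit-learn is installed:

```python
from sklearn.linear_model import LinearRegression

# Made-up training data: one feature (size in sq. ft) per row, price in $1000s.
X_train = [[1000], [1500], [2000], [2500]]
y_train = [200, 270, 340, 410]

# Training: fit the model parameters to the training data.
model = LinearRegression()
model.fit(X_train, y_train)

# Prediction: apply the fitted model to a new, hypothetical house.
X_new = [[1800]]
y_pred = model.predict(X_new)

print("slope:", model.coef_[0])
print("intercept:", model.intercept_)
print("predicted price:", y_pred[0])
```

Note that `fit` and `predict` expect a two-dimensional input (one row per house, one column per feature), even when there is only one feature.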