CSE217 INTRODUCTION TO DATA SCIENCE LECTURE 4: REGRESSION

SLIDE 1

CSE217 INTRODUCTION TO DATA SCIENCE

Spring 2019 Marion Neumann

LECTURE 4: REGRESSION

SLIDE 2

RECAP: DATA SCIENCE

…solving problems with data…

scientific or business problem → data problem → collect & understand data → clean & format data → use data to create solution

…which step is most exciting?

Machine Learning

SLIDE 3

RECAP: ML

  • data: anything you can measure or record
  • model: specification of a (mathematical) relationship between different variables
  • evaluation: how well does the model work?

…creating and using models that learn from data…

SLIDE 4

RECAP: ML WORKFLOW

  • Training phase, test phase, and evaluation phase

→ turn to your neighbor

  • by taking turns, explain what happens in the training phase, the test phase, and the evaluation phase
  • carefully define what kinds of data are used in each phase

[Diagram: data + output → program; program + data → output; output + ground truth → performance measure]

SLIDE 5

PROPERTY SALES DATA

Goal: predict how much my house is worth

  • features (input variables)
  • size (in sq. ft): numeric / categorical / binary
  • neighborhood: numeric / categorical / binary
  • # bedrooms: numeric / categorical / binary
  • # bathrooms: numeric / categorical / binary
  • pool: numeric / categorical / binary
  • age (in years): numeric / categorical / binary
  • renovated: numeric / categorical / binary
  • house price = target variable: numeric / categorical / binary

How can this data help?

SLIDE 6

PREDICTING HOUSE PRICES

  • target (house price) is a real number

How much is my house worth? Look at Zillow!

SLIDE 7

LINEAR REGRESSION MODEL
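
A linear regression model predicts the target as a weighted sum of the input features. As a sketch in assumed notation (the symbols $w_0$, $w_j$, $x_j$ and $d$ are illustrative and not taken from the slides), with one feature such as house size, and then with $d$ features:

```latex
\hat{y} = f(x) = w_0 + w_1 x
\qquad\text{and}\qquad
\hat{y} = f(x_1, \dots, x_d) = w_0 + \sum_{j=1}^{d} w_j x_j
```

Here $w_0$ is the intercept and each $w_j$ says how much the predicted price changes when feature $x_j$ increases by one unit.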

SLIDE 8

TRAINING: MINIMIZE ERROR

PDSH p391 Linear Regression

math & statistics
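
Training as error minimization can be sketched in a few lines of plain Python. This is a minimal illustration, assuming a one-feature model and the sum of squared errors as the error being minimized; the function names and the five data points are made up for the example and do not come from the slides or from PDSH.

```python
from statistics import mean


def fit_line(xs, ys):
    """Return (intercept, slope) that minimize the sum of squared errors."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


def sum_squared_error(intercept, slope, xs, ys):
    """Sum of squared differences between predictions and targets."""
    return sum((intercept + slope * x - y) ** 2 for x, y in zip(xs, ys))


# Made-up training data: size in 1000 sq. ft, price in $1000.
sizes = [1.0, 1.5, 2.0, 2.5, 3.0]
prices = [200.0, 290.0, 410.0, 500.0, 600.0]

w0, w1 = fit_line(sizes, prices)
best = sum_squared_error(w0, w1, sizes, prices)
print(f"intercept={w0:.1f}, slope={w1:.1f}, training SSE={best:.1f}")
```

Nudging either parameter away from the fitted values makes the training error larger, which is what "minimize error" means here.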

SLIDE 9

PREDICTION: USE MODEL

PDSH p391 Linear Regression
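
Once the parameters are fixed, prediction is just evaluating the model on new inputs. A minimal sketch, assuming a one-feature linear model; the parameter values and house sizes below are hypothetical and chosen only for illustration.

```python
def make_predictor(intercept, slope):
    """Return a function that applies a fitted one-feature linear model."""
    def predict(x):
        return intercept + slope * x
    return predict


# Hypothetical fitted parameters (price in $1000 per 1000 sq. ft).
predict_price = make_predictor(intercept=-4.0, slope=202.0)

# Sizes of houses that were not part of the training data (made up).
new_sizes = [1.2, 2.2]
predicted = [predict_price(s) for s in new_sizes]
print(predicted)
```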

SLIDE 10

HOW ABOUT MORE COMPLEX MODELS?

PDSH p393 Linear Regression

Error on training set: linear model >> quadratic >> 6th-order polynomial ← error is zero!

Is the model with zero (training) error the best?
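
The shrinking training error can be reproduced with a small experiment. This is a sketch under stated assumptions: seven made-up data points (so a 6th-order polynomial can pass through all of them), least squares solved via the normal equations, and exact `Fraction` arithmetic so that the zero shows up as a true zero rather than a rounding artifact. None of the names or numbers come from the slides.

```python
from fractions import Fraction


def fit_polynomial(xs, ys, degree):
    """Least-squares polynomial coefficients (lowest power first), solved exactly."""
    n = degree + 1
    # Normal equations A c = b with A[j][k] = sum(x^(j+k)) and b[j] = sum(y * x^j).
    a = [[Fraction(sum(x ** (j + k) for x in xs)) for k in range(n)] for j in range(n)]
    b = [Fraction(sum(y * x ** j for x, y in zip(xs, ys))) for j in range(n)]
    # Gaussian elimination; pivots are nonzero when there are at least n distinct xs.
    for col in range(n):
        for row in range(col + 1, n):
            factor = a[row][col] / a[col][col]
            for k in range(col, n):
                a[row][k] -= factor * a[col][k]
            b[row] -= factor * b[col]
    # Back substitution.
    coeffs = [Fraction(0)] * n
    for row in reversed(range(n)):
        rest = sum(a[row][k] * coeffs[k] for k in range(row + 1, n))
        coeffs[row] = (b[row] - rest) / a[row][row]
    return coeffs


def training_sse(xs, ys, degree):
    """Sum of squared errors of the fitted polynomial on its own training data."""
    coeffs = fit_polynomial(xs, ys, degree)
    predictions = [sum(c * x ** p for p, c in enumerate(coeffs)) for x in xs]
    return sum((p - y) ** 2 for p, y in zip(predictions, ys))


# Seven made-up training points.
xs = [0, 1, 2, 3, 4, 5, 6]
ys = [1, 3, 2, 5, 4, 8, 6]

errors = {d: training_sse(xs, ys, d) for d in (1, 2, 6)}
for d, e in errors.items():
    print(f"degree {d}: training SSE = {float(e):.3f}")
```

A zero training error only says that the curve passes through every training point; it says nothing yet about houses the model has not seen.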

SLIDE 11

EVALUATION FOR REGRESSION

  • Training Error vs. Test Error
  • Error measures:
  • RMSE: root mean squared error
  • MAE: mean absolute error

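$$\mathrm{RMSE}(\hat{y}, y_{test}) = \sqrt{\frac{1}{n} \sum_{i} (\hat{y}_i - y_i)^2}$$

$$\mathrm{MAE}(\hat{y}, y_{test}) = \frac{1}{n} \sum_{i} |\hat{y}_i - y_i|$$

$\hat{y} = f(X_{test})$: predictions for test data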
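
Both error measures are short to write out in plain Python. A minimal sketch, assuming predictions and ground-truth targets arrive as equal-length lists; the function names and the three example values are made up.

```python
import math


def rmse(y_pred, y_true):
    """Root mean squared error: square root of the average squared difference."""
    n = len(y_true)
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(y_pred, y_true)) / n)


def mae(y_pred, y_true):
    """Mean absolute error: average absolute difference."""
    n = len(y_true)
    return sum(abs(p - t) for p, t in zip(y_pred, y_true)) / n


# Made-up test targets and model predictions (price in $1000).
y_test = [300.0, 450.0, 520.0]
y_hat = [310.0, 430.0, 520.0]

print(f"RMSE = {rmse(y_hat, y_test):.2f}, MAE = {mae(y_hat, y_test):.2f}")
```

Because the differences are squared before averaging, RMSE weighs large errors more heavily than MAE and is never smaller than MAE on the same data.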

SLIDE 12

MACHINE LEARNING WORKFLOW

  • Training Phase, Test Phase, Evaluation Phase
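
The three phases can be sketched end to end in plain Python. This is an illustration under stated assumptions: synthetic data generated from a made-up rule (price ≈ 50 + 200 · size plus noise), a 30/10 split into training and test examples, a one-feature linear model, and RMSE as the performance measure. None of these choices are prescribed by the slides.

```python
import math
import random
from statistics import mean

random.seed(0)

# Made-up data: size in 1000 sq. ft, price in $1000.
sizes = [random.uniform(0.8, 3.5) for _ in range(40)]
prices = [50 + 200 * s + random.gauss(0, 20) for s in sizes]

# Hold out the last 10 examples; they are never used for training.
train_x, test_x = sizes[:30], sizes[30:]
train_y, test_y = prices[:30], prices[30:]

# Training phase: training inputs and their known outputs produce a model.
x_bar, y_bar = mean(train_x), mean(train_y)
slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(train_x, train_y))
         / sum((x - x_bar) ** 2 for x in train_x))
intercept = y_bar - slope * x_bar

# Test phase: the model and the test inputs produce predictions.
predictions = [intercept + slope * x for x in test_x]

# Evaluation phase: predictions and ground truth produce a performance measure.
test_rmse = math.sqrt(mean((p - y) ** 2 for p, y in zip(predictions, test_y)))
print(f"slope={slope:.1f}, intercept={intercept:.1f}, test RMSE={test_rmse:.1f}")
```

The test targets enter only in the evaluation phase, which is what makes the resulting error an estimate of how the model behaves on unseen data.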

SLIDE 13

SUMMARY & READING

  • Learning from Data requires a lot of math!
  • Regression models are used to predict real-valued targets.
  • We need a test set to evaluate how well our model generalizes.

  • DSFS Ch11: ML (p142-144)
  • DSFS Ch14: Simple Linear Regression (p173-176)
  • PDSH Ch5: ML – Linear Regression (p390-394)
  • LINEAR REGRESSION BY HAND: https://www.wired.com/2011/01/linear-regression-by-hand/

SciKit Learn

understand the model

use the model in practice