CSE217 INTRODUCTION TO DATA SCIENCE, LECTURE 4: REGRESSION

Spring 2019, Marion Neumann


  1. CSE217 INTRODUCTION TO DATA SCIENCE, LECTURE 4: REGRESSION. Spring 2019, Marion Neumann

  2. RECAP: DATA SCIENCE
     …solving problems with data…
     [Pipeline diagram, from a scientific or business problem to a solution; surviving labels: understand the problem; collect, clean & format data; use data to create a solution]
     …which step is most exciting? Machine Learning

  3. RECAP: ML
     …creating and using models that learn from data…
     • data: anything you can measure or record
     • model: specification of a (mathematical) relationship between different variables
     • evaluation: how well does the model work?

  4. RECAP: ML WORKFLOW
     • Training phase, test phase, and evaluation phase
     [Workflow diagram; surviving labels: data, program, output, ground truth, performance measure]
     → Turn to your neighbor:
     • By taking turns, explain what happens in the training phase, the test phase, and the evaluation phase.
     • Carefully define what kinds of data are used in each phase.

  5. PROPERTY SALES DATA
     Goal: predict how much my house is worth. How can this data help?
     For each variable, decide whether it is numeric, categorical, or binary.
     Features (input variables):
     • size (in sq. ft)
     • neighborhood
     • # bedrooms
     • # bathrooms
     • pool
     • age (in years)
     • renovated
     Target variable:
     • house price

  6. PREDICTING HOUSE PRICES
     • The target (house price) is a real number.
     How much is my house worth? Look at Zillow!

  7. LINEAR REGRESSION MODEL

  8. TRAINING: MINIMIZE ERROR (math & statistics). See PDSH p391, Linear Regression.
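
The slide's formulas did not survive extraction, so the following is only a minimal sketch of what "training by minimizing error" can look like for a single feature. It assumes the error being minimized is the sum of squared errors, for which the best line has a closed-form solution. The helper name `fit_line` and the toy size/price data are hypothetical and not taken from the lecture.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the line minimizing the sum of squared errors."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Least-squares slope: covariance of x and y divided by the variance of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    # The least-squares line always passes through (mean_x, mean_y).
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical training data: size in sq. ft -> price in $1000s.
sizes = [1000, 1500, 2000, 2500]
prices = [200, 290, 410, 500]

slope, intercept = fit_line(sizes, prices)
print(f"price = {slope:.3f} * size + {intercept:.1f}")
```

In practice a library such as scikit-learn would do this fit; the by-hand version is shown only to make the minimization concrete.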

  9. PREDICTION: USE MODEL. See PDSH p391, Linear Regression.
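
Once a linear model has been trained, prediction is just evaluating it on new inputs. The sketch below assumes a model with one weight per feature plus a bias term; the function name `predict` and all weight values are made up for illustration and are not the lecture's numbers.

```python
def predict(features, weights, bias):
    """Linear model: bias plus the weighted sum of the feature values."""
    return bias + sum(w * x for w, x in zip(weights, features))


# Hypothetical learned parameters (price in $1000s):
# 0.2 per sq. ft of size, 15 per bedroom, plus a base of 10.
weights = [0.2, 15.0]
bias = 10.0

# A house with 1800 sq. ft and 3 bedrooms.
price = predict([1800, 3], weights, bias)
print(f"predicted price: ${price:.0f}k")
```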

  10. HOW ABOUT MORE COMPLEX MODELS?
      Error on the training set: linear model >> quadratic >> 6th-order polynomial (← error is zero!)
      Is the model with zero (training) error the best?
      See PDSH p393, Linear Regression.
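
A small experiment can illustrate why zero training error is not the goal. The sketch assumes 7 training points, in which case a 6th-order polynomial can pass through every point exactly. Instead of a least-squares polynomial fit it uses Lagrange interpolation, which gives the same polynomial in this special case. The data are invented (roughly y = 2x + 1 plus noise), and all function names are hypothetical.

```python
import math


def fit_line(xs, ys):
    """Least-squares line through the points; returns (slope, intercept)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def interpolate(x, xs, ys):
    """Evaluate the degree len(xs)-1 polynomial through all (xs, ys) at x."""
    total = 0.0
    for k, (xk, yk) in enumerate(zip(xs, ys)):
        basis = 1.0
        for m, xm in enumerate(xs):
            if m != k:
                basis *= (x - xm) / (xk - xm)
        total += yk * basis
    return total


def rmse(preds, targets):
    """Root mean squared error between predictions and targets."""
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(preds, targets)) / len(targets))


# Invented data: 7 noisy training points and one held-out test point.
train_x = [0, 1, 2, 3, 4, 5, 6]
train_y = [1.3, 2.6, 5.4, 6.7, 9.3, 10.8, 13.2]
test_x, test_y = [7], [15.1]

slope, intercept = fit_line(train_x, train_y)


def linear(x):
    return slope * x + intercept


def poly(x):
    return interpolate(x, train_x, train_y)


linear_train = rmse([linear(x) for x in train_x], train_y)
poly_train = rmse([poly(x) for x in train_x], train_y)
linear_test = rmse([linear(x) for x in test_x], test_y)
poly_test = rmse([poly(x) for x in test_x], test_y)

print("training RMSE: linear", linear_train, "polynomial", poly_train)
print("test RMSE:     linear", linear_test, "polynomial", poly_test)
```

On this toy data the polynomial reproduces the training targets exactly but is far off on the held-out point, while the line has a small error on both. That is the behaviour the slide's question is pointing at.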

  11. EVALUATION FOR REGRESSION
      • Training error vs. test error: $\hat{y} = f(X^{\text{test}})$ are the predictions for test data
      • Error measures:
        • RMSE (root mean squared error): $\mathrm{RMSE}(\hat{y}, y^{\text{test}}) = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (\hat{y}_i - y_i)^2}$
        • MAE (mean absolute error): $\mathrm{MAE}(\hat{y}, y^{\text{test}}) = \frac{1}{n} \sum_{i=1}^{n} |\hat{y}_i - y_i|$
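
The two error measures named on this slide can be written out directly. This is a minimal sketch using the standard definitions of RMSE and MAE; the function names and the three example values are assumptions made for illustration.

```python
import math


def rmse(preds, targets):
    """Root mean squared error: square root of the mean of squared differences."""
    n = len(targets)
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(preds, targets)) / n)


def mae(preds, targets):
    """Mean absolute error: mean of the absolute differences."""
    n = len(targets)
    return sum(abs(p - t) for p, t in zip(preds, targets)) / n


# Made-up predictions and ground-truth targets for three test houses.
predictions = [3.0, 5.0, 2.0]
targets = [2.0, 5.0, 4.0]

rmse_value = rmse(predictions, targets)
mae_value = mae(predictions, targets)
print("RMSE:", rmse_value, "MAE:", mae_value)
```

Because the errors are squared before averaging, RMSE penalizes the single large miss (2.0) more heavily than MAE does, so RMSE is never smaller than MAE on the same data.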

  12. MACHINE LEARNING WORKFLOW
      • Training phase, test phase, evaluation phase
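
The three phases rely on keeping training data and test data separate. One possible way to hold out a test set is sketched below. The helper `split_data`, the 25% test fraction, the fixed seed, and the house records are all assumptions for illustration. This is not a library function, and the slide itself does not prescribe a split.

```python
import random


def split_data(rows, test_fraction=0.25, seed=0):
    """Shuffle a copy of the rows and split it into (train, test) lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Made-up (size in sq. ft, price in $1000s) records.
houses = [(1000, 200), (1500, 290), (2000, 410), (2500, 500),
          (1200, 240), (1800, 350), (2200, 450), (3000, 610)]

train, test = split_data(houses)
print(len(train), "training rows,", len(test), "test rows")
```

The model is fit on `train` only (training phase), produces predictions for the inputs in `test` (test phase), and those predictions are compared with the held-out prices using an error measure (evaluation phase).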

  13. SUMMARY & READING
      • Learning from data requires a lot of math!
      • Regression models are used to predict real-valued targets.
      • We need a test set to evaluate how well our model generalizes.
      Reading:
      • DSFS (understand the model)
        • Ch11: ML (p142-144)
        • Ch14: Simple Linear Regression (p173-176)
      • PDSH (use the model in practice, scikit-learn)
        • Ch5: ML – Linear Regression (p390-394)
      • LINEAR REGRESSION BY HAND: https://www.wired.com/2011/01/linear-regression-by-hand/
