
SLIDE 1

Welcome and Introduction

SUPERVISED LEARNING IN R: REGRESSION

Nina Zumel and John Mount

Data Scientists, Win Vector LLC

SLIDE 2

SUPERVISED LEARNING IN R: REGRESSION

What is Regression?

Regression: Predict a numerical outcome ("dependent variable") from a set of inputs ("independent variables").

Statistical Sense: Predicting the expected value of the outcome.

Casual Sense: Predicting a numerical outcome, rather than a discrete one.

SLIDE 3

SUPERVISED LEARNING IN R: REGRESSION

What is Regression?

How many units will we sell? (Regression)

Will this customer buy our product (yes/no)? (Classification)

What price will the customer pay for our product? (Regression)

SLIDE 4

SUPERVISED LEARNING IN R: REGRESSION

Example: Predict Temperature from Chirp Rate

SLIDE 5

SUPERVISED LEARNING IN R: REGRESSION

Predict Temperature from Chirp Rate

SLIDE 6

SUPERVISED LEARNING IN R: REGRESSION

Predict Temperature from Chirp Rate

SLIDE 7

SUPERVISED LEARNING IN R: REGRESSION

Regression from a Machine Learning Perspective

Scientific mindset: Modeling to understand the data generation process

Engineering mindset: Modeling to predict accurately

Machine Learning: Engineering mindset

SLIDE 8

Let's practice!

SUPERVISED LEARNING IN R: REGRESSION

SLIDE 9

Linear regression - the fundamental method

SUPERVISED LEARNING IN R: REGRESSION

Nina Zumel and John Mount

Win-Vector LLC

SLIDE 10

SUPERVISED LEARNING IN R: REGRESSION

Linear Regression

y = β₀ + β₁x₁ + β₂x₂ + ...

y is linearly related to each xᵢ

Each xᵢ contributes additively to y
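As a minimal arithmetic sketch of the additive form (the coefficients and inputs here are hypothetical, not from the course data), a prediction is just a weighted sum of the inputs plus an intercept:

```r
# Hypothetical coefficients and one observation's inputs:
b0 <- 2; b1 <- 0.5; b2 <- -1.5
x1 <- 4; x2 <- 1

# y = b0 + b1*x1 + b2*x2: each input contributes additively
y <- b0 + b1 * x1 + b2 * x2
y  # 2.5
```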

SLIDE 11

SUPERVISED LEARNING IN R: REGRESSION

Linear Regression in R: lm()

cmodel <- lm(temperature ~ chirps_per_sec, data = cricket)

formula: temperature ~ chirps_per_sec

data frame: cricket
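The cricket data frame is not reproduced in these slides, so as a stand-in sketch the same call pattern can be tried on R's built-in cars data (stopping distance as a function of speed):

```r
# Same pattern as cmodel, on a built-in dataset:
# formula: dist ~ speed, data frame: cars
model <- lm(dist ~ speed, data = cars)

coef(model)  # (Intercept) and the slope on speed
```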

SLIDE 12

SUPERVISED LEARNING IN R: REGRESSION

Formulas

fmla_1 <- temperature ~ chirps_per_sec
fmla_2 <- blood_pressure ~ age + weight

LHS: outcome
RHS: inputs; use + for multiple inputs

fmla_1 <- as.formula("temperature ~ chirps_per_sec")
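as.formula() is handy when the variable names are only known at run time; a small sketch (the outcome and input names here are just for illustration):

```r
# Build a formula from strings, e.g. when input names arrive as data:
outcome <- "blood_pressure"
vars <- c("age", "weight")
fmla <- as.formula(paste(outcome, "~", paste(vars, collapse = " + ")))
fmla  # blood_pressure ~ age + weight
```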

SLIDE 13

SUPERVISED LEARNING IN R: REGRESSION

Looking at the Model

y = β₀ + β₁x₁ + β₂x₂ + ...

cmodel

Call:
lm(formula = temperature ~ chirps_per_sec, data = cricket)

Coefficients:
   (Intercept)  chirps_per_sec
        25.232           3.291

SLIDE 14

SUPERVISED LEARNING IN R: REGRESSION

More Information about the Model

summary(cmodel)

Call:
lm(formula = fmla, data = cricket)

Residuals:
   Min     1Q Median     3Q    Max
-6.515 -1.971  0.490  2.807  5.001

Coefficients:
               Estimate Std. Error t value Pr(>|t|)
(Intercept)     25.2323    10.0601   2.508 0.026183 *
chirps_per_sec   3.2911     0.6012   5.475 0.000107 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 3.829 on 13 degrees of freedom
Multiple R-squared: 0.6975, Adjusted R-squared: 0.6742
F-statistic: 29.97 on 1 and 13 DF, p-value: 0.0001067

SLIDE 15

SUPERVISED LEARNING IN R: REGRESSION

More Information about the Model

broom::glance(cmodel) sigr::wrapFTest(cmodel)
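broom and sigr are add-on packages and may not be installed; the same diagnostics are also available from base R's summary() (shown here on the built-in cars data as a stand-in for the cricket model):

```r
m <- lm(dist ~ speed, data = cars)  # stand-in for cmodel
s <- summary(m)

s$r.squared    # what broom::glance() reports as r.squared
s$fstatistic   # F statistic with its degrees of freedom
```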

SLIDE 16

Let's practice!

SUPERVISED LEARNING IN R: REGRESSION

SLIDE 17

Predicting once you fit a model

SUPERVISED LEARNING IN R: REGRESSION

Nina Zumel and John Mount

Win-Vector LLC

SLIDE 18

SUPERVISED LEARNING IN R: REGRESSION

Predicting From the Training Data

cricket$prediction <- predict(cmodel)

predict() by default returns training data predictions
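A runnable sketch of the same call on the built-in cars data (the cricket data isn't included here):

```r
m <- lm(dist ~ speed, data = cars)   # stand-in for cmodel
cars$prediction <- predict(m)        # no newdata: predictions for the training rows

# One prediction per training row, equal to the model's fitted values
length(cars$prediction) == nrow(cars)
```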

SLIDE 19

SUPERVISED LEARNING IN R: REGRESSION

Looking at the Predictions

ggplot(cricket, aes(x = prediction, y = temperature)) +
  geom_point() +
  geom_abline(color = "darkblue") +
  ggtitle("temperature vs. linear model prediction")

SLIDE 20

SUPERVISED LEARNING IN R: REGRESSION

Predicting on New Data

newchirps <- data.frame(chirps_per_sec = 16.5)
newchirps$prediction <- predict(cmodel, newdata = newchirps)
newchirps
  chirps_per_sec prediction
1           16.5   79.53537
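The same pattern on the built-in cars data (a stand-in; the speed value 21 is just an illustrative new observation):

```r
m <- lm(dist ~ speed, data = cars)   # stand-in for cmodel
newcars <- data.frame(speed = 21)    # a new observation to score
newcars$prediction <- predict(m, newdata = newcars)
newcars
```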

SLIDE 21

Let's practice!

SUPERVISED LEARNING IN R: REGRESSION

SLIDE 22

Wrapping up linear regression

SUPERVISED LEARNING IN R: REGRESSION

Nina Zumel and John Mount

Win-Vector LLC

SLIDE 23

SUPERVISED LEARNING IN R: REGRESSION

Pros and Cons of Linear Regression

Pros:
Easy to fit and to apply
Concise
Less prone to overfitting

SLIDE 24

SUPERVISED LEARNING IN R: REGRESSION

Pros and Cons of Linear Regression

Pros:
Easy to fit and to apply
Concise
Less prone to overfitting
Interpretable

Call:
lm(formula = blood_pressure ~ age + weight, data = bloodpressure)

Coefficients:
(Intercept)          age       weight
    30.9941       0.8614       0.3349
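Interpretability means the coefficients read directly as effect sizes. Using the coefficients from the slide's blood pressure model, a small helper (hypothetical, for illustration) makes this concrete: raising age by one year changes the prediction by exactly the age coefficient.

```r
# Coefficients from the slide's blood pressure model:
b <- c(intercept = 30.9941, age = 0.8614, weight = 0.3349)
bp_hat <- function(age, weight) {
  b[["intercept"]] + b[["age"]] * age + b[["weight"]] * weight
}

# One extra year of age moves the prediction by the age coefficient:
bp_hat(51, 160) - bp_hat(50, 160)  # 0.8614
```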

SLIDE 25

SUPERVISED LEARNING IN R: REGRESSION

Pros and Cons of Linear Regression

Pros:
Easy to fit and to apply
Concise
Less prone to overfitting
Interpretable

Cons:
Can only express linear and additive relationships

SLIDE 26

SUPERVISED LEARNING IN R: REGRESSION

Collinearity

Collinearity -- when input variables are partially correlated.

Call:
lm(formula = blood_pressure ~ age + weight, data = bloodpressure)

Coefficients:
(Intercept)          age       weight
    30.9941       0.8614       0.3349

SLIDE 27

SUPERVISED LEARNING IN R: REGRESSION

Collinearity

Collinearity -- when variables are partially correlated.

Coefficients might change sign.

Call:
lm(formula = blood_pressure ~ age + weight, data = bloodpressure)

Coefficients:
(Intercept)          age       weight
    30.9941       0.8614       0.3349

SLIDE 28

SUPERVISED LEARNING IN R: REGRESSION

Collinearity

Collinearity -- when variables are partially correlated.

Coefficients might change sign.

High collinearity:
Coefficients (or standard errors) look too large
Model may be unstable

Call:
lm(formula = blood_pressure ~ age + weight, data = bloodpressure)

Coefficients:
(Intercept)          age       weight
    30.9941       0.8614       0.3349
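A small simulation (synthetic data, not from the course) showing the symptom: when two inputs are nearly identical, their individual standard errors blow up even though the overall fit is fine.

```r
set.seed(1)
x1 <- rnorm(100)
x2 <- x1 + rnorm(100, sd = 0.01)   # x2 is almost a copy of x1: high collinearity
y  <- 3 * x1 + rnorm(100)

collinear <- lm(y ~ x1 + x2)
single    <- lm(y ~ x1)

# Standard errors on x1 and x2 are inflated relative to the single-input fit:
summary(collinear)$coefficients[, "Std. Error"]
summary(single)$coefficients[, "Std. Error"]
```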

SLIDE 29

SUPERVISED LEARNING IN R: REGRESSION

Coming Next

Evaluating a regression model

Properly training a model

SLIDE 30

Let's practice!

SUPERVISED LEARNING IN R: REGRESSION