STAT 113 Simple Linear Regression Colin Reimer Dawson Oberlin - - PowerPoint PPT Presentation



slide-1
SLIDE 1

STAT 113 Simple Linear Regression

Colin Reimer Dawson

Oberlin College

Sept. 16, 2015
slide-2
SLIDE 2

Outline

◮ Prediction
◮ What’s a Good Prediction?
◮ Linear Prediction Equation
◮ Prediction Error
◮ Regression to the Mean

slide-3
SLIDE 3

Prediction

◮ Correlations give us a description of the relationship between two numeric variables.

◮ However, when two variables are related, we can go further and use knowledge of one to make predictions about the other.

◮ Examples:
  ◮ Use SAT scores to predict college GPA
  ◮ Use economic indicators to predict stock prices
  ◮ Use credit score to predict probability of default on a loan
  ◮ Use biomarkers to predict disease progression
  ◮ What else?

slide-4
SLIDE 4

What’s a Good Prediction?

Figure: Scatterplot of Y against X.

◮ Suppose I have this data.

◮ What would be a good prediction if I get a new X value of 12?

◮ What about an X value of 5.5?
slide-5
SLIDE 5

Modeling relationships with a function

◮ We can capture all of our predictions by writing the y variable as a function of the x variable.

◮ Examples:
  ◮ $f(x) = x^2$
  ◮ $f(x) = 1.6x + 20$
  ◮ $f(x) = 5\cos(2\pi x)$
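As a rough illustration of "a function as a set of predictions", the three example functions can be written in Python and evaluated at the new X values asked about earlier (5.5 and 12). The function names `square`, `line`, and `cosine_wave` are made up for this sketch.

```python
import math

def square(x):
    """f(x) = x^2"""
    return x ** 2

def line(x):
    """f(x) = 1.6x + 20"""
    return 1.6 * x + 20

def cosine_wave(x):
    """f(x) = 5 cos(2 pi x)"""
    return 5 * math.cos(2 * math.pi * x)

# Each function turns any x into a predicted y, including x values
# that never appeared in the data.
for f in (square, line, cosine_wave):
    predictions = {x: round(f(x), 2) for x in (5.5, 12)}
    print(f.__name__, predictions)
```

Whichever function we pick, it answers every "what if X were ...?" question at once; the open question is which function gives good answers.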

slide-6
SLIDE 6

What’s a Good Prediction?

Figure: Scatterplot of Y against X.

How about this function?

slide-7
SLIDE 7

What’s a Good Prediction?

Figure: Scatterplot of Y against X.

Or this?

slide-8
SLIDE 8

What’s a Good Prediction?

Figure: Scatterplot of Y against X.

◮ What about this?

◮ There’s a tradeoff between how well we can fit the data and how simple our model (i.e., prediction function) is.
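The tradeoff can be sketched with two extremes on a small made-up data set (these are not the data plotted in the slides). A hand-picked straight line uses two parameters and misses every point a little. A polynomial that passes through every point (built here by Lagrange interpolation, one possible choice of "wiggly" function) has zero error but needs as many parameters as there are points.

```python
# Made-up illustration data (NOT the data from the slides).
xs = [4.0, 6.0, 8.0, 10.0, 12.0, 14.0]
ys = [3.1, 5.9, 6.2, 9.0, 8.7, 11.5]

def simple_line(x):
    """Hypothetical line drawn by eye: intercept 0.5, slope 0.8."""
    return 0.5 + 0.8 * x

def wiggly(x_new):
    """Lagrange polynomial through every data point (degree n - 1)."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for j, xj in enumerate(xs):
            if j != i:
                term *= (x_new - xj) / (xi - xj)
        total += term
    return total

def squared_error(predict):
    """Sum of squared vertical gaps between the data and the predictions."""
    return sum((y - predict(x)) ** 2 for x, y in zip(xs, ys))

line_error = squared_error(simple_line)
wiggly_error = squared_error(wiggly)

print(f"straight line: 2 parameters, squared error {line_error:.2f}")
print(f"wiggly curve:  {len(xs)} parameters, squared error {wiggly_error:.2f}")
```

The wiggly curve wins on fit and loses on simplicity, which is the tension the slide describes.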

slide-9
SLIDE 9

What’s a Good Prediction?

Figure: Scatterplot of Y against X.

◮ Pretty much the simplest model we can have is a straight line.

◮ Two things determine what line we have:
  ◮ The intercept
  ◮ The slope

slide-10
SLIDE 10

Intercept Slope Form

◮ The intercept and slope are the parameters of our regression model.

◮ The general equation for a line is:

$$f(x) = a + bx$$

◮ In statistics notation, we write $\hat{y}$ (“y hat”) to represent a predicted (or fitted) value.

◮ Given a value $x_i$, we predict using:

$$\hat{y}_i = \hat{a} + \hat{b} x_i$$

slide-11
SLIDE 11

Hat Notation

Figure: Source: brownsharpie.com

slide-12
SLIDE 12

Systematic vs. Random

◮ We can split up each y value into two parts: a systematic (predictable) part and a “random” part.

◮ That is, we can write, for the y coordinate of the ith data point:

$$y_i = \hat{y}_i + \text{Error}_i$$
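A small sketch of this decomposition, using made-up data and a hypothetical line (intercept 0.5, slope 0.8) that is not taken from the slides. Each observed y is split into a fitted value and whatever is left over.

```python
# Made-up illustration data and a hypothetical candidate line.
xs = [4.0, 6.0, 8.0, 10.0, 12.0, 14.0]
ys = [3.1, 5.9, 6.2, 9.0, 8.7, 11.5]
a_hat, b_hat = 0.5, 0.8

# Systematic part: the value the line predicts for each x.
fitted = [a_hat + b_hat * x for x in xs]

# "Random" part: what the line fails to explain.
errors = [y - y_hat for y, y_hat in zip(ys, fitted)]

for x, y, y_hat, e in zip(xs, ys, fitted, errors):
    print(f"x={x:4.1f}  y={y:5.2f}  y_hat={y_hat:5.2f}  error={e:+.2f}")
```

Adding each error back to its fitted value recovers the original y, so nothing is lost by the split; it only separates what the line accounts for from what it does not.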

slide-13
SLIDE 13

What’s a Good Prediction?

Figure: Scatterplot of Y against X.

Every line will have a different set of errors associated with it.

slide-14
SLIDE 14

What’s a Good Prediction?

Figure: Scatterplot of Y against X.

Every line will have a different set of errors associated with it.

slide-15
SLIDE 15

What’s a Good Prediction?

Figure: Scatterplot of Y against X.

◮ Every line will have a different set of errors associated with it.

◮ Which is best?

◮ Intuitively, we want to minimize the overall “distance” between the line and the points.
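One way to make "overall distance" concrete is to add up the squared vertical errors for each candidate line. That choice is an assumption in this sketch, since the slides leave the distance measure informal. The data and the two candidate lines ("steep" and "shallow") are made up for illustration.

```python
# Made-up illustration data (NOT the data from the slides).
xs = [4.0, 6.0, 8.0, 10.0, 12.0, 14.0]
ys = [3.1, 5.9, 6.2, 9.0, 8.7, 11.5]

def total_squared_error(intercept, slope):
    """Sum of squared vertical gaps between each point and the line."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))

# Two hypothetical candidate lines, given as (intercept, slope).
candidates = {"steep": (0.5, 0.8), "shallow": (5.0, 0.25)}

scores = {name: total_squared_error(a, b) for name, (a, b) in candidates.items()}
best = min(scores, key=scores.get)

for name, score in scores.items():
    print(f"{name:8s} total squared error = {score:.2f}")
print("better candidate:", best)
```

Comparing candidates one pair at a time only ranks the lines we thought to try; finding the best line overall is the minimization problem.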

slide-16
SLIDE 16

The Prediction Equation

Prediction Function

$$\hat{y}_i = \hat{a} + \hat{b} x_i$$

Pick $\hat{a}$ and $\hat{b}$ that minimize the total distance between the line and the points. This is a calculus problem that the computer solves for us.
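If "total distance" is taken to mean the sum of squared errors (the usual least-squares criterion, assumed here because the slide does not name it), the calculus problem has a closed-form answer: $\hat{b} = \sum (x_i - \bar{x})(y_i - \bar{y}) / \sum (x_i - \bar{x})^2$ and $\hat{a} = \bar{y} - \hat{b}\bar{x}$. The sketch below applies those formulas to made-up data and checks that nudging the answer only makes the error larger.

```python
# Made-up illustration data (NOT the data from the slides).
xs = [4.0, 6.0, 8.0, 10.0, 12.0, 14.0]
ys = [3.1, 5.9, 6.2, 9.0, 8.7, 11.5]

def least_squares_line(xs, ys):
    """Return (intercept, slope) minimizing the sum of squared errors."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    return intercept, slope

def sse(intercept, slope):
    """Sum of squared errors for a given line on the data above."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))

a_hat, b_hat = least_squares_line(xs, ys)
best = sse(a_hat, b_hat)

# Small nudges to the intercept and slope, to compare against the fit.
nudged = [sse(a_hat + da, b_hat + db)
          for da in (-0.1, 0.1) for db in (-0.05, 0.05)]

print(f"a_hat = {a_hat:.2f}, b_hat = {b_hat:.2f}, squared error = {best:.3f}")
print("every nudged line is worse:", all(best < s for s in nudged))
```

In practice a statistics package does this step, but the formulas are short enough to verify by hand on a small data set.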

slide-17
SLIDE 17

Regression Example

Figure: Scatterplot of Child's Adult Height (in.) against Mid-parent Height (in.), with the regression line.

◮ The “father of regression”, Francis Galton, looked at parents’ and children’s heights.

◮ Here’s his data, with the associated regression line.

slide-18
SLIDE 18

Example: Batting Average in Successive Seasons

slide-19
SLIDE 19

Figure: Source: https://www.washingtonpost.com/opinions/why-our-childrens-future-no-longer-looks-so-bright/2011/10/14/gIQAofzlpL_story.html

slide-20
SLIDE 20

“This fall, Lafley will step down for the second time, and no one will be mentioning Steve Jobs’s legendary return to Apple. Lafley hasn’t been bad – he slimmed the company down, selling off parts and getting out of less profitable businesses – but there’s been no dramatic turnaround. ... In other words, he’s been just O.K. How could someone who, according to Fortune, was known as “an all-time C.E.O. hero” end up being just O.K.? Well, if commentators had looked at the track record of returning C.E.O.s – boomerang C.E.O.s, as they’re sometimes called – that’s precisely what they’d have predicted. A 2014 study found that profitability at companies run by boomerang C.E.O.s fell slightly, and an earlier study detected no significant difference in long-term performance between firms that reappointed a former C.E.O. and ones that hired someone new.”

Figure: Source: http://www.newyorker.com/magazine/2015/09/21/the-comeback-conundrum

slide-21
SLIDE 21

Regression to the Mean

◮ Many variables have a systematic and random part (e.g., “Skill” and “Luck”).

◮ If you had a really high score the first time, there’s a good chance you had high values for both.

◮ If you try again, you would expect your skill to carry over, but your luck will be average, on average; so your score would go down.

◮ Conversely, low scores are likely partly the result of bad luck, so they should go up as luck reverts to the mean.
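The skill-plus-luck story can be checked with a small simulation. Everything here is an assumption made for illustration: scores are modeled as skill plus luck, both drawn from a standard normal distribution, with skill fixed across two attempts and luck redrawn each time.

```python
import random

random.seed(113)  # arbitrary seed so the run is repeatable

n = 10_000
skill = [random.gauss(0, 1) for _ in range(n)]
first = [s + random.gauss(0, 1) for s in skill]   # score = skill + luck
second = [s + random.gauss(0, 1) for s in skill]  # same skill, fresh luck

def mean(values):
    return sum(values) / len(values)

ranked = sorted(first)
top_cutoff = ranked[int(0.9 * n)]     # roughly the top 10% on attempt one
bottom_cutoff = ranked[int(0.1 * n)]  # roughly the bottom 10% on attempt one

top = [i for i in range(n) if first[i] >= top_cutoff]
bottom = [i for i in range(n) if first[i] <= bottom_cutoff]

top_first = mean([first[i] for i in top])
top_second = mean([second[i] for i in top])
bottom_first = mean([first[i] for i in bottom])
bottom_second = mean([second[i] for i in bottom])

print(f"top group:    first {top_first:+.2f}, second {top_second:+.2f}")
print(f"bottom group: first {bottom_first:+.2f}, second {bottom_second:+.2f}")
```

Under these assumptions the top group's second score falls toward the overall average while staying above it, and the bottom group's second score rises toward the average while staying below it. Nobody's skill changed; only the luck was redrawn.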