STAT 113 Simple Linear Regression


  1. STAT 113 Simple Linear Regression Colin Reimer Dawson Oberlin College Sept. 16, 2015

  2. Outline ◮ Prediction ◮ What’s a Good Prediction? ◮ Linear Prediction Equation ◮ Prediction Error ◮ Regression to the Mean

  3. Prediction ◮ Correlations give us a description of the relationship between two numeric variables. ◮ However, when two variables are related, we can go further and use knowledge of one to make predictions about the other. ◮ Examples: ◮ Use SAT scores to predict college GPA ◮ Use economic indicators to predict stock prices ◮ Use credit score to predict probability of default on a loan ◮ Use biomarkers to predict disease progression ◮ What else?

  4. What’s a Good Prediction? (Figure: scatterplot of Y, roughly 4 to 11, against X, 4 to 14.) ◮ Suppose I have this data. ◮ What would be a good prediction if I get a new X value of 12? ◮ What about an X value of 5.5?

  5. Modeling relationships with a function ◮ We can capture all of our predictions by writing the y variable as a function of the x variable. ◮ Examples: ◮ $f(x) = x^2$ ◮ $f(x) = 1.6x + 20$ ◮ $f(x) = 5\cos(2\pi x)$
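
As a rough illustration of slide 5, each example function can be written as a small prediction rule that maps an x value to a predicted y. The Python function names below are made up for this sketch and do not appear in the slides.

```python
import math

# Hypothetical names for the three example functions on slide 5.
def quadratic(x):
    return x ** 2                          # f(x) = x^2

def line(x):
    return 1.6 * x + 20                    # f(x) = 1.6x + 20

def wave(x):
    return 5 * math.cos(2 * math.pi * x)   # f(x) = 5 cos(2*pi*x)

# Each function turns any x value into a single predicted y value.
for f in (quadratic, line, wave):
    print(f.__name__, f(12))
```

Whichever function we choose, the prediction for a new X value is just the function evaluated at that value.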

  6. What’s a Good Prediction? (Figure: scatterplot of Y against X.) How about this function?

  7. What’s a Good Prediction? (Figure: scatterplot of Y against X.) Or this?

  8. What’s a Good Prediction? (Figure: scatterplot of Y against X.) ◮ What about this? ◮ There’s a tradeoff between how well we can fit the data and how simple our model (i.e., prediction function) is.

  9. What’s a Good Prediction? (Figure: scatterplot of Y against X.) ◮ Pretty much the simplest model we can have is a straight line. ◮ Two things determine what line we have: ◮ The intercept ◮ The slope

  10. Intercept Slope Form ◮ The intercept and slope are the parameters of our regression model. ◮ The general equation for a line is: $f(x) = a + bx$ ◮ In statistics notation, we write $\hat{y}$ (“y hat”) to represent a predicted (or fitted) value. ◮ Given a value $x_i$, we predict using: $\hat{y}_i = \hat{a} + \hat{b} x_i$
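
A minimal sketch of the prediction rule on slide 10, assuming the intercept and slope estimates are already known. The function name `predict` and the parameter values are placeholders chosen for illustration; they are not estimated from any data in the slides.

```python
def predict(a_hat, b_hat, x):
    """Return the fitted value y-hat = a_hat + b_hat * x."""
    return a_hat + b_hat * x

# Illustrative parameter values only (assumed, not from the slides).
a_hat, b_hat = 3.0, 0.5

# Predictions for the two new X values asked about on slide 4.
for x in (5.5, 12):
    print(x, predict(a_hat, b_hat, x))
```

Changing the intercept shifts every prediction by the same amount, while changing the slope changes how much the prediction moves per unit of x.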

  11. Hat Notation Figure: Source: brownsharpie.com

  12. Systematic vs. Random ◮ We can split up each y value into two parts: a systematic (predictable) part and a “random” part. ◮ That is, we can write, for the y coordinate of the i-th data point: $y_i = \hat{y}_i + \text{Error}_i$
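
The split on slide 12 can be sketched in a few lines of Python. The data points and the candidate line here are invented purely for illustration; only the decomposition itself comes from the slide.

```python
# Made-up data and a made-up candidate line, for illustration only.
xs = [4, 6, 8, 10, 12]
ys = [5.0, 6.5, 7.0, 9.5, 10.0]
a_hat, b_hat = 2.0, 0.7

fitted = [a_hat + b_hat * x for x in xs]       # systematic part, y-hat
errors = [y - f for y, f in zip(ys, fitted)]   # "random" part, Error

# Every observed y is recovered as fitted value plus error.
for y, f, e in zip(ys, fitted, errors):
    print(f"y = {y:5.2f} = {f:5.2f} + ({e:+5.2f})")
```

A different choice of `a_hat` and `b_hat` leaves the observed y values unchanged but produces a different set of errors.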

  13. What’s a Good Prediction? (Figure: scatterplot of Y against X.) Every line will have a different set of errors associated with it.

  14. What’s a Good Prediction? (Figure: scatterplot of Y against X.) Every line will have a different set of errors associated with it.

  15. What’s a Good Prediction? (Figure: scatterplot of Y against X.) ◮ Every line will have a different set of errors associated with it. ◮ Which is best? ◮ Intuitively, we want to minimize the overall “distance” between the line and the points.

  16. The Prediction Equation. Prediction function: $\hat{y}_i = \hat{a} + \hat{b} x_i$. Pick $\hat{a}$ and $\hat{b}$ that minimize the total distance. This is a calculus problem that the computer solves for us.
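
The slide does not say how “total distance” is measured. A common choice, assumed here, is the sum of squared vertical errors (ordinary least squares), for which the calculus problem has a closed-form solution. The helper names `fit_line` and `sse` and the data are hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares intercept and slope (assumes squared-error distance)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b_hat = sxy / sxx                 # slope
    a_hat = y_bar - b_hat * x_bar     # intercept
    return a_hat, b_hat

def sse(a, b, xs, ys):
    """Sum of squared errors for the line y-hat = a + b * x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

# Made-up data, for illustration only.
xs = [4, 6, 8, 10, 12]
ys = [5.0, 6.5, 7.0, 9.5, 10.0]

a_hat, b_hat = fit_line(xs, ys)
print("intercept:", a_hat, "slope:", b_hat)
print("SSE at the fitted line:", sse(a_hat, b_hat, xs, ys))
print("SSE after nudging the slope:", sse(a_hat, b_hat + 0.1, xs, ys))
```

Nudging either parameter away from the fitted values can only increase the sum of squared errors, which is what “best” means under this assumption.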

  17. Regression Example (Figure: scatterplot of Child’s Adult Height (in.) against Mid-parent Height (in.), both axes running from 62 to 74, with the regression line.) ◮ The “father of regression”, Francis Galton, looked at parents’ and children’s heights. ◮ Here’s his data, with the associated regression line.

  18. Example: Batting Average in Successive Seasons
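
The batting-average example can be mimicked with a small simulation. This is only a sketch with simulated numbers, not real batting data: it assumes each player's observed average is a stable skill level plus independent season-to-season luck, and all parameter values are invented.

```python
import random

rng = random.Random(0)   # fixed seed so the run is repeatable

# Hypothetical model: observed average = stable skill + season-specific luck.
n = 10_000
skill = [rng.gauss(0.260, 0.020) for _ in range(n)]
season1 = [s + rng.gauss(0, 0.030) for s in skill]
season2 = [s + rng.gauss(0, 0.030) for s in skill]

def mean(values):
    return sum(values) / len(values)

# Select roughly the top 10% of players by their season-1 average.
cutoff = sorted(season1)[int(0.9 * n)]
top = [i for i in range(n) if season1[i] >= cutoff]

overall = mean(season1)
top_s1 = mean([season1[i] for i in top])
top_s2 = mean([season2[i] for i in top])

print(f"overall season-1 mean:        {overall:.3f}")
print(f"top group, season-1 mean:     {top_s1:.3f}")
print(f"same players, season-2 mean:  {top_s2:.3f}")
```

Under these assumptions, the season-1 leaders remain above average in season 2 but land closer to the overall mean, because part of their season-1 standing was luck that does not repeat.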

  19. Figure: Source: https://www.washingtonpost.com/opinions/why-our-childrens-future-no-longer-looks-so-bright/2011/10/14/gIQAofzlpL_story.html

  20. “This fall, Lafley will step down for the second time, and no one will be mentioning Steve Jobs’s legendary return to Apple. Lafley hasn’t been bad – he slimmed the company down, selling off parts and getting out of less profitable businesses – but there’s been no dramatic turnaround. ... In other words, he’s been just O.K. How could someone who, according to Fortune, was known as “an all-time C.E.O. hero” end up being just O.K.? Well, if commentators had looked at the track record of returning C.E.O.s – boomerang C.E.O.s, as they’re sometimes called – that’s precisely what they’d have predicted. A 2014 study found that profitability at companies run by boomerang C.E.O.s fell slightly, and an earlier study detected no significant difference in long-term performance between firms that reappointed a former C.E.O. and ones that hired someone new.” Figure: Source: http://www.newyorker.com/magazine/2015/09/21/the-comeback-conundrum
