Chapter 12: The Regression Line

SLIDE 1

Chapter 12: The Regression Line

We already know that the regression line goes through the point of ____________ and has slope = _________.

The equation for the line is Y = intercept + (slope) X, where slope = r × (SD of Y)/(SD of X) and intercept = aveY – (slope)(aveX).
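The slope and intercept formulas can be sketched in a few lines of Python. The function names `regression_line` and `predict` and the summary statistics below are hypothetical, chosen only for illustration; they do not come from the slides.

```python
def regression_line(ave_x, sd_x, ave_y, sd_y, r):
    """Return (slope, intercept) of the regression line for Y on X."""
    slope = r * sd_y / sd_x              # slope = r x (SD of Y) / (SD of X)
    intercept = ave_y - slope * ave_x    # intercept = aveY - (slope)(aveX)
    return slope, intercept


def predict(slope, intercept, x):
    """Put in X, get out the predicted Y."""
    return intercept + slope * x


# Hypothetical summary statistics (made up for this sketch).
slope, intercept = regression_line(ave_x=10, sd_x=2, ave_y=50, sd_y=5, r=0.5)

# The line passes through the point of averages: predicting at aveX gives aveY.
at_average = predict(slope, intercept, 10)
print(slope, intercept, at_average)
```

Whatever the five summary numbers are, predicting at the average of X returns the average of Y, which is a quick check that the intercept was computed correctly.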

SLIDE 2

Slope and intercept

SLIDE 3

The equation can be used to get a prediction by putting in the value for X and getting out the predicted value for Y. The regression equation: Y = intercept + (slope) X

Put in X Get out Y

SLIDE 4

Example 1:

Midterm: ave = 65, SD = 16
Final: ave = 60, SD = 10
r = 0.7

Find the equation of the regression line for estimating final exam score from midterm score. Estimate the final exam score for someone who got 50 on the midterm.
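One way to work Example 1, sketched in Python with the numbers from the slide (the variable names are ours). The slope comes out to about 0.44 final-exam points per midterm point, the intercept to about 31.6, and the estimate for a midterm score of 50 to about 53.4.

```python
# Summary statistics from Example 1.
ave_midterm, sd_midterm = 65, 16
ave_final, sd_final = 60, 10
r = 0.7

slope = r * sd_final / sd_midterm            # r x (SD of Y) / (SD of X)
intercept = ave_final - slope * ave_midterm  # aveY - (slope)(aveX)

# Put in X = 50, get out the estimated final exam score.
estimate = intercept + slope * 50

print(f"final = {intercept:.2f} + {slope:.4f} x midterm")
print(f"estimated final for midterm 50: {estimate:.1f}")
```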

SLIDE 5

Example 2: For the men aged 18-24 in the HANES sample, the relationship between height and systolic blood pressure can be summarized as follows:

Average height ≈ 70”, SD ≈ 3”
Average b.p. ≈ 124 mm, SD ≈ 14 mm
r = -0.2

Find the equation of the line for estimating blood pressure from height. Predict the blood pressure of a man who is 68” tall.
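A sketch of Example 2 that computes the prediction two ways: from the line itself, and by working in SD units (68” is two-thirds of an SD below average height, so the prediction sits r × two-thirds of an SD from average blood pressure, on the other side because r is negative). Both routes should agree at about 125.9 mm. Variable names are ours.

```python
# Summary statistics from Example 2.
ave_height, sd_height = 70.0, 3.0
ave_bp, sd_bp = 124.0, 14.0
r = -0.2
height = 68.0

# Route 1: the regression equation.
slope = r * sd_bp / sd_height            # about -0.93 mm per inch
intercept = ave_bp - slope * ave_height  # about 189.3 mm
from_line = intercept + slope * height

# Route 2: standard units. A man z SDs from average height is predicted
# to be r * z SDs from average blood pressure.
z = (height - ave_height) / sd_height
from_sd_units = ave_bp + r * z * sd_bp

print(f"b.p. = {intercept:.1f} + ({slope:.2f}) x height")
print(f"line: {from_line:.1f} mm, SD units: {from_sd_units:.1f} mm")
```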

SLIDE 6

Example 3: California men, aged 25-29 in 2005

Education (years): ave = 12.5, SD = 3
Income: ave = $30,000, SD = $24,000
r = 0.25

Find the equation of the line for estimating income from education. Estimate the income of a California man with 4 years of education.
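Example 3 worked as a short Python sketch, using only the numbers on the slide (variable names are ours). The arithmetic gives a slope of $2,000 per year of education and an intercept of $5,000.

```python
# Summary statistics from Example 3 (California men).
ave_educ, sd_educ = 12.5, 3
ave_income, sd_income = 30_000, 24_000
r = 0.25

slope = r * sd_income / sd_educ          # dollars per year of education
intercept = ave_income - slope * ave_educ

estimate = intercept + slope * 4         # a man with 4 years of education

print(f"income = {intercept:,.0f} + {slope:,.0f} x education")
print(f"estimated income at 4 years: ${estimate:,.0f}")
```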

SLIDE 7

California men

SLIDE 8

Example 4: California women, aged 25-29 in 2005

Education (years): ave = 13, SD = 3.4
Income: ave = $18,000, SD = $20,000
r = 0.37

Find the equation of the line for estimating income from education. Estimate the income of a California woman with 4 years of education.
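The same steps for Example 4, as a sketch with variable names of our choosing. Here the slope is about $2,176 per year and the intercept is negative, so the estimate at 4 years of education comes out below zero. Four years is roughly 2.6 SDs below the average of 13, which suggests the line is being pushed well past where most of the data lie.

```python
# Summary statistics from Example 4 (California women).
ave_educ, sd_educ = 13, 3.4
ave_income, sd_income = 18_000, 20_000
r = 0.37

slope = r * sd_income / sd_educ          # about 2,176 dollars per year
intercept = ave_income - slope * ave_educ

years = 4
estimate = intercept + slope * years

# How far from the average is the X value we plugged in, in SD units?
sds_from_average = (years - ave_educ) / sd_educ

print(f"income = {intercept:,.0f} + {slope:,.0f} x education")
print(f"estimate at {years} years: ${estimate:,.0f} "
      f"({sds_from_average:.1f} SDs from average education)")
```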

SLIDE 9

California women

SLIDE 10

Caution!

For an observational study, the regression line describes the data that you see, but it can NOT be relied on for predicting the results of INTERVENTIONS. In other words, we can not treat it as a causal relationship.

e.g. For California women, the slope says that ASSOCIATED with each year of education, there is an increase of $2,176 in income, on average. Going to school for an extra year will not necessarily CAUSE an increase of $2,176 in income.

Those who have a 4-year degree earn, on average, ___________ more than those who only completed high school, but getting a degree will not necessarily cause someone’s salary to increase. WHY NOT? CONFOUNDING FACTORS!
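The comparison between degree holders and high school graduates is just the slope applied over several years. A minimal sketch, assuming that a 4-year degree means 4 more years of education than high school (that assumption is ours, not stated on the slide):

```python
slope_per_year = 2176   # dollars per year of education, from the slide
extra_years = 4         # ASSUMPTION: a 4-year degree = 4 more years than high school

# Average difference ASSOCIATED with the extra years; not a causal effect.
difference = slope_per_year * extra_years
print(f"${difference:,}")  # $8,704
```

This number describes a difference between two groups in the data. It says nothing about what would happen to one person's income if that person were sent to school for four more years.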

SLIDE 11

Notes

We can’t rely on the slope to tell us how y will respond if the investigator changes x unless it is a controlled experiment. In an observational study there are too many confounding factors.

Sometimes the intercept will not make sense. For example, it might be negative when we would expect it to be zero or positive.

Never use regression to predict outside the range of your data!
SLIDE 12

“Least Squares”

  • Among all lines, the one with the smallest r.m.s. error is the regression line.
  • We call the regression line the “least squares regression line”.
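The least squares property can be checked numerically. The sketch below uses a small made-up data set (hypothetical, for illustration only), builds the regression line from the averages, SDs and r, and compares its r.m.s. error with that of nearby lines obtained by nudging the slope and intercept.

```python
from math import sqrt

# Hypothetical data, made up for this sketch.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [2.1, 2.9, 4.2, 4.8, 6.1, 6.9]


def ave(values):
    return sum(values) / len(values)


def sd(values):
    m = ave(values)
    return sqrt(ave([(v - m) ** 2 for v in values]))


def rms_error(slope, intercept):
    """r.m.s. size of the vertical distances from the points to the line."""
    return sqrt(ave([(y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys)]))


mx, my = ave(xs), ave(ys)
r = ave([(x - mx) * (y - my) for x, y in zip(xs, ys)]) / (sd(xs) * sd(ys))
slope = r * sd(ys) / sd(xs)
intercept = my - slope * mx
best = rms_error(slope, intercept)

# Try nearby lines; none should have a smaller r.m.s. error.
others = []
for d_slope in (-0.2, 0.0, 0.2):
    for d_int in (-0.5, 0.0, 0.5):
        others.append(rms_error(slope + d_slope, intercept + d_int))

print(f"regression line r.m.s. error: {best:.4f}")
print(f"smallest among nudged lines:  {min(others):.4f}")
```

Trying a handful of nearby lines is only a spot check, not a proof, but it shows what "smallest r.m.s. error" means in practice.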

SLIDE 13

A good regression example: Hooke’s Law

Hang a weight on a spring and measure the length of the spring. Hooke said the stretch is proportional to the load. Doubling the load doubles the stretch.

SLIDE 14

Hooke’s law

Slope = 0.05 cm per kg
Intercept = 439.01 cm

SLIDE 15

Hooke’s law:

length = mx + b, where x is the load

We found slope = 0.05 and intercept = 439.01, so

  • we estimate m by 0.05
  • we estimate b by 439.01
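With the estimates in hand, the fitted line can be used directly. A minimal sketch using the slope and intercept from the slide; the function names and the loads plugged in are ours, chosen for illustration.

```python
SLOPE_CM_PER_KG = 0.05   # estimate of m, from the slide
INTERCEPT_CM = 439.01    # estimate of b: predicted length with no load


def predicted_length(load_kg):
    """length = m * load + b"""
    return SLOPE_CM_PER_KG * load_kg + INTERCEPT_CM


def predicted_stretch(load_kg):
    """Stretch is the length beyond the no-load length."""
    return predicted_length(load_kg) - predicted_length(0)


# Doubling the load doubles the stretch (up to floating-point rounding).
print(f"{predicted_length(5):.2f} cm at 5 kg")
print(f"{predicted_stretch(2):.2f} cm stretch at 2 kg")
print(f"{predicted_stretch(4):.2f} cm stretch at 4 kg")
```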
SLIDE 16

A bad regression example

Measure the area and perimeter of these rectangles

SLIDE 17

The correlation between area and perimeter is r = 0.98! The scatter diagram:

SLIDE 18

Calculations show

r = 0.98
slope = 1.6
intercept = -10.51

area = -10.51 + 1.6(perimeter)?

RIDICULOUS! Regression will not FIND an appropriate model – you have to do the THINKING. Don’t substitute statistics for science!
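The same trap can be reproduced with any set of rectangles. The sketch below uses made-up rectangles (hypothetical; these are NOT the ones pictured on the slide, so the numbers will differ from r = 0.98, slope = 1.6, intercept = -10.51). The correlation is still very high, yet the fitted line predicts a negative area for the smallest rectangle, because area depends on width and height, not on perimeter alone.

```python
from math import sqrt

# Hypothetical rectangles as (width, height); made up for this sketch.
rectangles = [(1, 1), (1, 2), (2, 2), (2, 3), (3, 3), (3, 4), (4, 4), (4, 5)]
perimeters = [2 * (w + h) for w, h in rectangles]
areas = [w * h for w, h in rectangles]


def ave(values):
    return sum(values) / len(values)


def sd(values):
    m = ave(values)
    return sqrt(ave([(v - m) ** 2 for v in values]))


mean_p, mean_a = ave(perimeters), ave(areas)
r = ave([(p - mean_p) * (a - mean_a) for p, a in zip(perimeters, areas)]) / (
    sd(perimeters) * sd(areas)
)
slope = r * sd(areas) / sd(perimeters)
intercept = mean_a - slope * mean_p

# Predict the area of the smallest rectangle from its perimeter alone.
smallest_perimeter = min(perimeters)
predicted_area = intercept + slope * smallest_perimeter

print(f"r = {r:.2f}, slope = {slope:.2f}, intercept = {intercept:.2f}")
print(f"predicted area at perimeter {smallest_perimeter}: {predicted_area:.2f}")
```

A high r only says the points cluster near some line. It does not say a line is the right model for how area is determined.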