SLIDE 1

ST 430/514 Introduction to Regression Analysis/Statistics for Management and the Social Sciences II

Quadratic Models

We extended the additive model in two variables to the interaction model by adding a third term to the equation.

Similarly, we can extend the linear model in one variable to the quadratic model by adding a second term to the equation:

$$E(Y) = \beta_0 + \beta_1 x + \beta_2 x^2.$$

This is a special case of the two-variable model $E(Y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2$ with $x_1 = x$ and $x_2 = x^2$.
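The equivalence between the quadratic model and the two-variable model can be checked numerically. The following is a minimal sketch on simulated data; the data, the seed, and the object names (`quadFit`, `twoVarFit`) are assumptions made for illustration and do not come from the course files.

```r
# Simulated data for illustration only; not taken from the course files.
set.seed(1)
x <- runif(40, min = 0, max = 10)
y <- 2 + 3 * x - 0.25 * x^2 + rnorm(40, sd = 0.5)

# Quadratic model in one variable: I() protects ^ inside a formula.
quadFit <- lm(y ~ x + I(x^2))

# The same model written as a two-variable model with x1 = x and x2 = x^2.
x1 <- x
x2 <- x^2
twoVarFit <- lm(y ~ x1 + x2)

# The estimates agree; only the coefficient names differ.
coefQuad <- unname(coef(quadFit))
coefTwoVar <- unname(coef(twoVarFit))
print(cbind(coefQuad, coefTwoVar))
```

Writing the squared term as `I(x^2)` keeps the model in a single formula, so no extra column has to be added to the data frame.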

1 / 16 Multiple Linear Regression Quadratic Models

SLIDE 2

Example: immune system and exercise

x = maximal oxygen uptake (VO2 max, mL/(kg · min));
y = immunoglobulin level (IgG, mg/dL);
data for 30 subjects (AEROBIC.txt).

Get the data and plot them:

aerobic <- read.table("Text/Exercises&Examples/AEROBIC.txt", header = TRUE)
plot(aerobic[, c("MAXOXY", "IGG")])

Slight curvature suggests a linear model may not fit.

SLIDE 3

Check the linear model:

plot(lm(IGG ~ MAXOXY, aerobic))

The graph of residuals against fitted values shows definite curvature.

Fit and summarize the quadratic model:

aerobicLm <- lm(IGG ~ MAXOXY + I(MAXOXY^2), aerobic)
summary(aerobicLm)

SLIDE 4

Output

Call:
lm(formula = IGG ~ MAXOXY + I(MAXOXY^2), data = aerobic)

Residuals:
     Min       1Q   Median       3Q      Max
-185.375  -82.129    1.047   66.007  227.377

Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept) -1464.4042   411.4012  -3.560  0.00140 **
MAXOXY         88.3071    16.4735   5.361 1.16e-05 ***
I(MAXOXY^2)    -0.5362     0.1582  -3.390  0.00217 **
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 106.4 on 27 degrees of freedom
Multiple R-squared:  0.9377,  Adjusted R-squared:  0.9331
F-statistic: 203.2 on 2 and 27 DF,  p-value: < 2.2e-16

SLIDE 5

The quadratic term I(MAXOXY^2) is significant, so we reject the null hypothesis that the linear model is acceptable.

The coefficient of the quadratic term is negative, which is consistent with the concavity of the curve.

The other two t-ratios test irrelevant hypotheses, because the quadratic term is important.

Extrapolation: the fitted curve has a maximum at

$$\text{MAXOXY} = \frac{88.3071}{2 \times 0.5362} \approx 82$$

and declines for higher MAXOXY, which seems unlikely to represent the real relationship.
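The location of the maximum follows from setting the derivative of the fitted quadratic to zero. A short sketch of that arithmetic, using the coefficient estimates reported in the summary output; the variable names (`b1`, `b2`, `turningPoint`) are chosen here for illustration.

```r
# Coefficient estimates copied from the quadratic model's summary output.
b1 <- 88.3071    # coefficient of MAXOXY
b2 <- -0.5362    # coefficient of I(MAXOXY^2)

# Setting the derivative b1 + 2 * b2 * x to zero gives the turning point.
turningPoint <- -b1 / (2 * b2)

# A negative b2 makes the turning point a maximum rather than a minimum.
isMaximum <- b2 < 0
print(round(turningPoint, 1))
```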

SLIDE 6

An alternative analysis

The graph of IGG against log(MAXOXY) is more linear:

with(aerobic, plot(log(MAXOXY), IGG))
aerobicLm2 <- lm(IGG ~ log(MAXOXY), aerobic)
summary(aerobicLm2)
# Overlay both fitted curves on the original scale: quadratic in blue, logarithmic in red.
with(aerobic, plot(MAXOXY, IGG))
with(aerobic, lines(sort(MAXOXY), fitted(aerobicLm)[order(MAXOXY)], col = "blue"))
with(aerobic, lines(sort(MAXOXY), fitted(aerobicLm2)[order(MAXOXY)], col = "red"))

The fitted curve continues to increase indefinitely, but with diminishing slope.
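The contrast between the two fitted curves can be made concrete by comparing their slopes. The sketch below uses the coefficient estimates reported in the two summary outputs of this section; the grid of MAXOXY values and the helper names (`quadSlope`, `logSlope`) are hypothetical, chosen only to illustrate the behavior.

```r
# Coefficient estimates copied from the two summary outputs in this section.
quadCoef <- c(b0 = -1464.4042, b1 = 88.3071, b2 = -0.5362)
logCoef  <- c(b0 = -4885.71, b1 = 1653.38)

# Slope of each fitted curve with respect to MAXOXY.
quadSlope <- function(x) quadCoef[["b1"]] + 2 * quadCoef[["b2"]] * x
logSlope  <- function(x) logCoef[["b1"]] / x

# Hypothetical grid of MAXOXY values, chosen only for illustration.
grid <- c(40, 60, 80, 100)
slopes <- data.frame(MAXOXY = grid,
                     quadratic = quadSlope(grid),
                     logModel = logSlope(grid))
print(slopes)
```

The logarithmic slope stays positive and shrinks as MAXOXY grows, whereas the quadratic slope changes sign once MAXOXY passes the turning point.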

SLIDE 7

Output

Call:
lm(formula = IGG ~ log(MAXOXY), data = aerobic)

Residuals:
     Min       1Q   Median       3Q      Max
-165.455  -88.651   -2.395   55.756  218.934

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) -4885.71     324.33  -15.06 5.87e-15 ***
log(MAXOXY)  1653.38      83.07   19.90  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 107.6 on 28 degrees of freedom
Multiple R-squared:  0.934,  Adjusted R-squared:  0.9316
F-statistic: 396.1 on 1 and 28 DF,  p-value: < 2.2e-16

SLIDE 8

More Complex Models

Complete second-order model

When the first-order model

$$E(Y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2$$

is inadequate, the interaction model

$$E(Y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2$$

may be better, but sometimes a complete second-order model is needed:

$$E(Y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2 + \beta_4 x_1^2 + \beta_5 x_2^2$$
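In R's formula notation, the complete second-order model needs the interaction and both squared terms. A minimal sketch on simulated data follows; the data frame `sim`, its generating coefficients, and the seed are assumptions made for illustration only.

```r
# Simulated predictors and response for illustration only.
set.seed(2)
sim <- data.frame(x1 = runif(30, 1, 10), x2 = runif(30, 50, 250))
sim$y <- 1 + 0.5 * sim$x1 + 0.01 * sim$x2 + 0.007 * sim$x1 * sim$x2 +
  0.09 * sim$x1^2 + rnorm(30, sd = 0.4)

# x1 * x2 expands to x1 + x2 + x1:x2; I() adds the squared terms.
secondOrderFit <- lm(y ~ x1 * x2 + I(x1^2) + I(x2^2), data = sim)

# Six coefficients: intercept, two linear, two squared, one interaction.
termNames <- names(coef(secondOrderFit))
print(termNames)
```

Note that R lists the interaction term last, after the squared terms, so the order of the estimates differs from the order of the terms in the equation.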

SLIDE 9

Example: cost of shipping packages

Get the data and plot them:

express <- read.table("Text/Exercises&Examples/EXPRESS.txt", header = TRUE)
pairs(express)

Fit the complete second-order model and summarize it:

expressLm <- lm(Cost ~ Weight * Distance + I(Weight^2) + I(Distance^2), express)
summary(expressLm)
plot(expressLm)

SLIDE 10

Output

Call:
lm(formula = Cost ~ Weight * Distance + I(Weight^2) + I(Distance^2), data = express)

Residuals:
     Min       1Q   Median       3Q      Max
-0.86027 -0.19898 -0.00885  0.16531  0.94396

Coefficients:
                  Estimate Std. Error t value Pr(>|t|)
(Intercept)      8.270e-01  7.023e-01   1.178 0.258588
Weight          -6.091e-01  1.799e-01  -3.386 0.004436 **
Distance         4.021e-03  7.998e-03   0.503 0.622999
I(Weight^2)      8.975e-02  2.021e-02   4.442 0.000558 ***
I(Distance^2)    1.507e-05  2.243e-05   0.672 0.512657
Weight:Distance  7.327e-03  6.374e-04  11.495 1.62e-08 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.4428 on 14 degrees of freedom
Multiple R-squared:  0.9939,  Adjusted R-squared:  0.9918
F-statistic: 458.4 on 5 and 14 DF,  p-value: 5.371e-15

SLIDE 11

Qualitative Variables

A qualitative variable (or factor) is one that indicates membership of different categories.

E.g., a person’s gender = male or female: a qualitative variable with two levels, indicating membership of one of two categories.

E.g., package type = Fragile, Semifragile, or Durable: three levels, corresponding to three categories.

SLIDE 12

We code a qualitative variable using indicator (dummy) variables:

Choose one level to use as a base or reference level, say male or Durable.

For each other level, create a variable

$$x_j = \begin{cases} 1 & \text{if this item is in this category} \\ 0 & \text{otherwise.} \end{cases}$$

For gender, there is only one other category, so the only indicator variable is

$$x = \begin{cases} 1 & \text{for a female} \\ 0 & \text{for a male.} \end{cases}$$

SLIDE 13

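For packages, there are two other categories, so the indicator variables are

$$x_{\text{Fragile}} = \begin{cases} 1 & \text{for a Fragile package} \\ 0 & \text{otherwise,} \end{cases}$$

$$x_{\text{Semifragile}} = \begin{cases} 1 & \text{for a Semifragile package} \\ 0 & \text{otherwise.} \end{cases}$$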

For any item, at most one of the indicator variables is non-zero, indicating a non-base category; if they are all zero, the item belongs to the base category.
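R builds these indicator variables automatically when a factor appears in a model formula. The sketch below shows the resulting design matrix for a small, invented vector of package types; the vector `type` and the name `indicatorCount` are hypothetical, and the default treatment contrasts are assumed.

```r
# Hypothetical package types, invented for illustration.
type <- factor(c("Durable", "Fragile", "Semifragile", "Fragile", "Durable"),
               levels = c("Durable", "Fragile", "Semifragile"))

# The first factor level ("Durable") is the base level under R's default
# treatment contrasts, so it gets no indicator column.
design <- model.matrix(~ type)
print(design)

# Count of non-zero indicators per item (excluding the intercept column).
indicatorCount <- rowSums(design[, -1])
```

Each row of `design` has at most one non-zero indicator, and the rows for Durable packages have none.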

SLIDE 14

Example: shipment cost of packages, by type.

Get the data and plot them:

cargo <- read.table("Text/Exercises&Examples/CARGO.txt", header = TRUE)
plot(COST ~ CARGO, cargo)

Fit and summarize the model:

cargoLm <- lm(COST ~ CARGO, cargo)
summary(cargoLm)

SLIDE 15

Output

Call:
lm(formula = COST ~ CARGO, data = cargo)

Residuals:
  Min    1Q Median    3Q   Max
-2.20 -1.80  -1.00  1.05  4.24

Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept)      3.260      1.075   3.032   0.0104 *
CARGOFragile     9.740      1.521   6.405 3.38e-05 ***
CARGOSemiFrag    5.440      1.521   3.577   0.0038 **
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 2.404 on 12 degrees of freedom
Multiple R-squared:  0.7745,  Adjusted R-squared:  0.7369
F-statistic: 20.61 on 2 and 12 DF,  p-value: 0.0001315

SLIDE 16

Note that the intercept is the fitted value for CARGOFragile = 0 and CARGOSemiFrag = 0; that is, for Durable packages.

The coefficients of CARGOFragile and CARGOSemiFrag measure the differences between those categories and Durable.

The overall model F-test is the same as the analysis of variance test:

cargoAov <- aov(COST ~ CARGO, cargo)
summary(cargoAov)

Output

            Df Sum Sq Mean Sq F value   Pr(>F)
CARGO        2 238.25  119.13   20.61 0.000132 ***
Residuals   12  69.37    5.78
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
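Both claims, that the coefficients are differences from the base-level mean and that the overall F-test matches the analysis of variance, can be checked on any one-factor data set. The sketch below uses simulated costs rather than the CARGO data; the data frame `sim`, the group means used to generate it, and the seed are assumptions for illustration.

```r
# Simulated costs for three package types; not the CARGO data.
set.seed(3)
sim <- data.frame(
  type = factor(rep(c("Durable", "Fragile", "SemiFrag"), each = 5)),
  cost = c(rnorm(5, mean = 3), rnorm(5, mean = 13), rnorm(5, mean = 9))
)

fit <- lm(cost ~ type, data = sim)
groupMeans <- tapply(sim$cost, sim$type, mean)

# Intercept = mean of the base level; other coefficients = differences from it.
coefFromMeans <- c(groupMeans[["Durable"]],
                   groupMeans[["Fragile"]] - groupMeans[["Durable"]],
                   groupMeans[["SemiFrag"]] - groupMeans[["Durable"]])

# The overall F statistic from lm() matches the one-way ANOVA F value.
fFromLm <- unname(summary(fit)$fstatistic["value"])
fFromAov <- summary(aov(cost ~ type, data = sim))[[1]][["F value"]][1]
print(c(lm = fFromLm, aov = fFromAov))
```

Durable is the base level here only because R orders factor levels alphabetically by default; `relevel()` can select a different base.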
