coded variables


  1. ST 430/514 Introduction to Regression Analysis/Statistics for Management and the Social Sciences II
     Principles of Model Building: Coding Independent Variables
     Coded variables
     Some variables can be represented on different scales, e.g., temperature in degrees Celsius or Fahrenheit. Suppose some response $Y$ is modeled as a linear function of temperature: $E(Y) = \beta_0 + \beta_1 x$, with $x$ = temperature in degrees Fahrenheit.

  2. If $x^*$ = temperature in degrees Celsius, then $x = 32 + 1.8 x^*$. So
     $E(Y) = \beta_0 + \beta_1 (32 + 1.8 x^*) = (\beta_0 + 32 \beta_1) + (1.8 \beta_1) x^* = \beta_0^* + \beta_1^* x^*$,
     where $\beta_0^* = \beta_0 + 32 \beta_1$ and $\beta_1^* = 1.8 \beta_1$.
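The relation between the two sets of coefficients can be checked numerically. The sketch below uses simulated data (the variable names, sample size, and true coefficients are assumptions made for illustration, not part of the course material) and fits the same response on both temperature scales.

```r
# Simulated data (hypothetical): Y depends linearly on temperature in Fahrenheit.
set.seed(1)
temp_f <- runif(50, min = 60, max = 110)
y <- 5 + 0.3 * temp_f + rnorm(50)

# x = 32 + 1.8 x*  is equivalent to  x* = (x - 32) / 1.8
temp_c <- (temp_f - 32) / 1.8

b      <- coef(lm(y ~ temp_f))  # estimates of beta0, beta1
b_star <- coef(lm(y ~ temp_c))  # estimates of beta0*, beta1*

# The fitted coefficients satisfy the same relations as the model parameters.
check_intercept <- all.equal(unname(b_star[1]), unname(b[1] + 32 * b[2]))
check_slope     <- all.equal(unname(b_star[2]), unname(1.8 * b[2]))
print(c(check_intercept, check_slope))
```

Both checks hold because least squares fits are unchanged by a linear change of scale in a predictor; only the coefficients are re-expressed.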

  3. So if $Y$ is linearly related to $x$, then it is also linearly related to $x^*$, with different coefficients $\beta_0^*$ and $\beta_1^*$.
     We sometimes code variables to make an equation easier to interpret.
     When a variable takes only two distinct values, we often code them as $-1$ and $+1$. E.g., if $x$ is temperature with levels 80 °F and 100 °F, and $x^* = (x - 90)/10$, then $x^* = -1$ when $x = 80$, and $x^* = +1$ when $x = 100$.

  4. A variable with three levels can similarly be coded as $-1$, $0$, and $+1$, provided the three levels are equally spaced.
     The interpretation of the corresponding coefficient $\beta^*$ is, as always, the change in $E(Y)$ when $x^*$ changes by 1, with all other variables fixed.
     But with a variable coded like this, a change of 1 in $x^*$ means moving, say, from the midpoint value to the high value. The corresponding change in $E(Y)$ is often called the effect of the variable.
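The coding rule for two or three equally spaced levels amounts to subtracting the midpoint and dividing by half the range. A minimal sketch, using hypothetical temperature settings of 80, 90, and 100 °F:

```r
# Hypothetical design: three equally spaced temperature levels, each used twice.
x <- c(80, 90, 100, 80, 90, 100)

midpoint   <- (max(x) + min(x)) / 2  # 90
half_range <- (max(x) - min(x)) / 2  # 10

# Coded variable: low level -> -1, midpoint -> 0, high level -> +1
x_coded <- (x - midpoint) / half_range
print(x_coded)  # -1 0 1 -1 0 1
```

With only the two levels 80 and 100, the same formula reproduces $x^* = (x - 90)/10$.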

  5. When a variable takes more than two or three values, it is sometimes standardized: $x_i^* = u_i = \frac{x_i - \bar{x}}{s_x}$.
     All coefficients are then in the units of $Y$, so they can be compared numerically.
     If $Y$ is also standardized, the coefficients are dimensionless. These are called standardized regression coefficients, and are widely used in some fields.
     Despite what the text says, with modern algorithms standardization has no effect on computational errors.
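A short sketch of standardization, again on simulated data (all names and numbers here are illustrative assumptions). It shows that the slope on the standardized predictor is the original slope multiplied by $s_x$, and that standardizing $Y$ as well gives a dimensionless slope, which in simple regression equals the sample correlation.

```r
set.seed(2)
x <- rnorm(40, mean = 50, sd = 8)
y <- 10 + 2 * x + rnorm(40, sd = 5)

# Standardize by hand; as.numeric(scale(x)) gives the same values.
u <- (x - mean(x)) / sd(x)
v <- (y - mean(y)) / sd(y)

slope_raw <- unname(coef(lm(y ~ x))[2])  # units of Y per unit of x
slope_u   <- unname(coef(lm(y ~ u))[2])  # units of Y per standard deviation of x
slope_uv  <- unname(coef(lm(v ~ u))[2])  # dimensionless

check_rescaled <- all.equal(slope_u, slope_raw * sd(x))
check_corr     <- all.equal(slope_uv, cor(x, y))
print(c(check_rescaled, check_corr))
```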

  6. Models with One Qualitative Variable
     Recall: a qualitative variable with $l$ levels is represented by $(l - 1)$ indicator (or dummy) variables.
     For a chosen reference level, all the indicator variables are 0;
     for each other level, the corresponding indicator variable is 1, and the others are 0.
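R builds these indicator variables automatically from a factor. As a sketch, `model.matrix()` displays them for a small hypothetical factor with three levels (the five observations below are made up for illustration):

```r
# Hypothetical factor with l = 3 levels; Kansas is first alphabetically,
# so it becomes the reference level under R's default treatment coding.
state <- factor(c("Kansas", "Kentucky", "Texas", "Kansas", "Texas"))

X <- model.matrix(~ state)
print(X)

# One intercept column plus l - 1 = 2 indicator columns.
print(colnames(X))  # "(Intercept)" "stateKentucky" "stateTexas"
```

Rows for Kansas have both indicators equal to 0; rows for the other states have exactly one indicator equal to 1.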

  7. Example
     Per-user software maintenance cost, by state (sample of 10 users per state).
         path <- file.path("Text", "Exercises&Examples", "BIDMAINT.txt")
         # R >= 4.0 reads strings as character by default; plot() needs STATE as a factor.
         maint <- read.table(path, header = TRUE, stringsAsFactors = TRUE)
         plot(COST ~ STATE, maint)
         summary(lm(COST ~ STATE, maint))
     Output:
         Call:
         lm(formula = COST ~ STATE, data = maint)

         Residuals:
             Min      1Q  Median      3Q     Max
         -299.80  -95.83  -37.90  153.32  295.20

  8. Output (continued):
         Coefficients:
                       Estimate Std. Error t value Pr(>|t|)
         (Intercept)     279.60      53.43   5.233 1.63e-05 ***
         STATEKentucky    80.30      75.56   1.063   0.2973
         STATETexas      198.20      75.56   2.623   0.0141 *
         ---
         Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

         Residual standard error: 168.9 on 27 degrees of freedom
         Multiple R-squared:  0.205, Adjusted R-squared:  0.1462
         F-statistic: 3.482 on 2 and 27 DF,  p-value: 0.04515

  9. The fitted equation is $E(Y) = 279.6 + 80.3 x_1 + 198.2 x_2$, where:
     $x_1$ = indicator variable for Kentucky,
     $x_2$ = indicator variable for Texas.
     For Kansas, $x_1 = x_2 = 0$, so $E(Y) = 279.6$.
     That is, the “intercept” is actually the expected value for the reference state, Kansas.

  10. For Kentucky, $x_1 = 1$ and $x_2 = 0$, so $E(Y) = 279.6 + 80.3 = 359.9$.
      That is, the coefficient STATEKentucky is the difference between the expected value for Kentucky and the expected value for the reference state.
      Similarly, the coefficient STATETexas is the difference between the expected value for Texas and the expected value for the reference state.
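This interpretation can be verified by comparing the fitted coefficients with the group means. The sketch below uses simulated costs, not the BIDMAINT data; the data frame `d` and its columns are hypothetical.

```r
# Simulated stand-in for a one-factor data set (NOT the BIDMAINT data).
set.seed(5)
d <- data.frame(
  state = factor(rep(c("Kansas", "Kentucky", "Texas"), each = 10)),
  cost  = rnorm(30, mean = 300, sd = 100)
)

b <- coef(lm(cost ~ state, d))
m <- tapply(d$cost, d$state, mean)  # sample mean cost in each state

# Intercept = mean of the reference level; other coefficients = differences from it.
check_ref <- all.equal(unname(b["(Intercept)"]), unname(m["Kansas"]))
check_ky  <- all.equal(unname(b["stateKentucky"]), unname(m["Kentucky"] - m["Kansas"]))
check_tx  <- all.equal(unname(b["stateTexas"]), unname(m["Texas"] - m["Kansas"]))
print(c(check_ref, check_ky, check_tx))
```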

  11. In R, the default reference level is the first in alphabetic order. The default can be overridden using the factor() function.
      Often these differences themselves are of no special interest, and the focus is on testing whether there are any differences: $H_0: \beta_1 = \beta_2 = \cdots = \beta_{l-1} = 0$ (one coefficient for each of the $l - 1$ indicator variables).
      The value of the $F$-statistic is unaffected by the choice of reference level.
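A sketch of changing the reference level and confirming that the overall $F$-statistic does not change. The data are simulated (not the BIDMAINT data), and `relevel()` is used here as a convenient alternative to calling `factor()` with an explicit `levels` argument.

```r
# Simulated one-factor data (hypothetical values).
set.seed(3)
d <- data.frame(
  state = factor(rep(c("Kansas", "Kentucky", "Texas"), each = 10)),
  cost  = c(rnorm(10, 280, 150), rnorm(10, 360, 150), rnorm(10, 480, 150))
)

fit_default <- lm(cost ~ state, d)  # reference level: Kansas

# Same factor with Texas as the reference level; equivalent to
# factor(d$state, levels = c("Texas", "Kansas", "Kentucky")).
d$state_tx <- relevel(d$state, ref = "Texas")
fit_texas  <- lm(cost ~ state_tx, d)

f_default <- summary(fit_default)$fstatistic
f_texas   <- summary(fit_texas)$fstatistic

# The individual coefficients differ, but the F-statistic and its df agree.
same_f <- all.equal(unname(f_default), unname(f_texas))
print(same_f)
```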

  12. Two Qualitative Variables
      E.g., two brands of diesel engine and three types of fuel.
          path <- file.path("Text", "Exercises&Examples", "DIESEL.txt")
          # Read FUEL and BRAND as factors so that plot() draws boxplots (needed in R >= 4.0).
          diesel <- read.table(path, header = TRUE, stringsAsFactors = TRUE)
          par(mfrow = c(1, 2)); plot(PERFORM ~ FUEL + BRAND, diesel)
      Try the main-effects model (additive, no interaction):
          summary(aov(PERFORM ~ FUEL + BRAND, diesel))
      Alternative interaction model:
          summary(aov(PERFORM ~ FUEL * BRAND, diesel))

  13. Graph the interactions:
          with(diesel, interaction.plot(FUEL, BRAND, PERFORM))
          with(diesel, interaction.plot(BRAND, FUEL, PERFORM))
      Complicated story:
      For F1 and F2, effects are additive, with B1 performing better than B2;
      for F3, B2 performs better than B1.

  14. Three or More Qualitative Variables
      With a response y and independent variables a, b, c, ..., a model might contain:
      main effects: y ~ a + b + c + ... ;
      two-way interactions: y ~ a + b + c + a:b + a:c + b:c + ... ;
      higher-order interactions: y ~ a + b + c + a:b + a:c + b:c + a:b:c + ... .
      Often only main effects and low-order interactions are significant.
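R's formula shorthand generates these term lists. As a sketch, `terms()` can expand a formula without any data, which makes it easy to see what each shorthand stands for (the helper `term_labels` is a hypothetical convenience function defined here):

```r
# Hypothetical helper: list the terms that a model formula expands to.
term_labels <- function(f) attr(terms(f), "term.labels")

main_only <- term_labels(y ~ a + b + c)
two_way   <- term_labels(y ~ (a + b + c)^2)  # main effects + all two-way interactions
three_way <- term_labels(y ~ a * b * c)      # adds the three-way interaction

print(main_only)  # "a" "b" "c"
print(two_way)    # "a" "b" "c" "a:b" "a:c" "b:c"
print(three_way)  # "a" "b" "c" "a:b" "a:c" "b:c" "a:b:c"
```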

  15. To estimate the highest-order interactions, we need observations for all possible combinations of levels: a factorial design.
      E.g., 2 × 3 = 6 for the diesel engines.
      With several variables, all with at least 2 levels, the number of combinations can be large.
      Sometimes a carefully chosen fraction of all possible combinations is used: a fractional factorial design.
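The combinations in a full factorial design can be enumerated with `expand.grid()`. The sketch below lists the 2 × 3 diesel combinations and then counts the runs for seven two-level factors, a number of factors chosen only for illustration.

```r
# All combinations of fuel type and engine brand: a 3 x 2 full factorial.
design <- expand.grid(FUEL = c("F1", "F2", "F3"), BRAND = c("B1", "B2"))
print(design)
n_diesel <- nrow(design)  # 6

# Seven factors at two levels each (hypothetical): 2^7 combinations.
n_two_level <- nrow(expand.grid(rep(list(c(-1, 1)), 7)))  # 128
print(c(n_diesel, n_two_level))
```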

  16. Models with Both Quantitative and Qualitative Variables
      Example
      Diesel engine performance $Y$, as a function of:
      engine speed, $x_1$;
      fuel type, with levels $F_1$, $F_2$, and $F_3$; take $F_1$ as the reference level, and $x_2$ and $x_3$ as indicators for $F_2$ and $F_3$, respectively.

  17. Simple model, ignoring fuel type: second-order model in $x_1$:
      $E(Y) = \beta_0 + \beta_1 x_1 + \beta_2 x_1^2$.
      Additive model: include main effects of fuel type:
      $E(Y) = \beta_0 + \beta_1 x_1 + \beta_2 x_1^2 + \beta_3 x_2 + \beta_4 x_3$.
      Switching fuel from $F_1$ to $F_2$ adds $\beta_3$ to the performance $Y$, independently of engine speed $x_1$.
      Interaction model:
      $E(Y) = \beta_0 + \beta_1 x_1 + \beta_2 x_1^2 + \beta_3 x_2 + \beta_4 x_3 + \beta_5 x_1 x_2 + \beta_6 x_1 x_3 + \beta_7 x_1^2 x_2 + \beta_8 x_1^2 x_3$.
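These three models can be written as R formulas. The sketch below runs on simulated data; the column names SPEED (in thousands of rpm), FUEL, and PERFORM, and all numeric values, are assumptions for illustration rather than the contents of any course data file.

```r
# Simulated data (hypothetical): quadratic in speed, with a shift for fuel F2.
set.seed(4)
d <- data.frame(
  SPEED = runif(60, min = 1, max = 3),
  FUEL  = factor(rep(c("F1", "F2", "F3"), times = 20))  # F1 is the reference level
)
d$PERFORM <- 50 + 20 * d$SPEED - 4 * d$SPEED^2 + 3 * (d$FUEL == "F2") + rnorm(60)

# Simple second-order model in speed, ignoring fuel type.
fit_simple <- lm(PERFORM ~ SPEED + I(SPEED^2), d)
# Additive model: fuel shifts the curve up or down without changing its shape.
fit_add <- lm(PERFORM ~ SPEED + I(SPEED^2) + FUEL, d)
# Interaction model: each fuel type gets its own quadratic curve.
fit_int <- lm(PERFORM ~ (SPEED + I(SPEED^2)) * FUEL, d)

# Number of coefficients: 3, 5, and 9 (beta_0 through beta_8).
n_coef <- c(length(coef(fit_simple)), length(coef(fit_add)), length(coef(fit_int)))
print(n_coef)

# Nested F-tests comparing the three models.
print(anova(fit_simple, fit_add, fit_int))
```

Because the models are nested, `anova()` tests first whether the fuel main effects are needed and then whether the fuel-by-speed interaction terms add anything further.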
