
Chapter 2, 2.1: Inferences about β1 - PowerPoint PPT Presentation



  1. Chapter 2

  2. 2.1: Inferences about β1. Test of interest throughout regression: H0: β1 = 0 versus Ha: β1 ≠ 0. We need the sampling distribution of the estimator b1. Idea: if b1 can be written as a linear combination of the responses (which are independent and normally distributed), then from (A.40) we immediately have the probability (sampling) distribution of b1!

  3. Easy: b1 = Σ ki·Yi, where ki = (Xi − X̄) / Σ(Xi − X̄)². So immediately from (A.40): b1 ~ Normal(E{b1}, Var{b1}). But we still need to find E{b1} and Var{b1}.

  4. Fun facts about the ki. Fun Fact 1: Σ ki = 0. Fun Fact 2: Σ ki·Xi = 1. Fun Fact 3: Σ ki² = 1 / Σ(Xi − X̄)².

  5. Using the “fun facts”, find E{b1} and Var{b1}: E{b1} = β1 (so b1 is unbiased), and Var{b1} = σ² / Σ(Xi − X̄)².
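A minimal numeric check of the fun facts and of b1 = Σ ki·Yi, on simulated data (the X values and error standard deviation below are made up for illustration):

```python
import numpy as np

# Made-up predictor values and simulated responses (assumptions for the demo).
rng = np.random.default_rng(0)
X = np.array([80., 30., 50., 90., 70., 60., 120., 80., 100., 50.])
Y = 62.0 + 3.5 * X + rng.normal(0, 49, size=X.size)

k = (X - X.mean()) / np.sum((X - X.mean()) ** 2)

print(np.sum(k))               # Fun Fact 1: sum of k_i is 0
print(np.sum(k * X))           # Fun Fact 2: sum of k_i X_i is 1
print(np.sum(k ** 2))          # Fun Fact 3: sum of k_i^2 is 1 / sum (X_i - Xbar)^2
print(np.sum(k * Y))           # b1 written as a linear combination of the Y_i
print(np.polyfit(X, Y, 1)[0])  # least-squares slope, for comparison
```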

  6. Major Results: b1 ~ Normal(β1, σ²/Σ(Xi − X̄)²). So it follows from (A.59) that (b1 − β1)/s{b1} ~ t(n − 2), where s²{b1} = MSE/Σ(Xi − X̄)².
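A simulation sketch of this sampling distribution (the design points and true parameter values are assumptions): regenerate Y many times for a fixed X and compare the empirical mean and standard deviation of b1 to β1 and σ/√Σ(Xi − X̄)²:

```python
import numpy as np

rng = np.random.default_rng(3)
X = np.linspace(20, 120, 25)           # fixed design points (made up)
beta0, beta1, sigma = 62.0, 3.5, 48.0  # assumed true parameter values

slopes = []
for _ in range(20_000):
    Y = beta0 + beta1 * X + rng.normal(0, sigma, size=X.size)
    slopes.append(np.polyfit(X, Y, 1)[0])   # slope of each refit

ssx = np.sum((X - X.mean()) ** 2)
print(np.mean(slopes), beta1)                # E{b1} = beta1
print(np.std(slopes), sigma / np.sqrt(ssx))  # SD{b1} = sigma / sqrt(ssx)
```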

  7. [Figure: t density functions with various degrees of freedom (df); f(t) from 0.0 to 0.4 plotted against t from −4 to 4.]

  8. t distribution • Ratio of a standard normal to the square root of a chi-squared random variable divided by its degrees of freedom n: t = Z / √(χ²ₙ/n). • With n around 30 it is “close” to a standard normal, but not exactly so. • As n → ∞, it tends to a standard normal distribution. • With n = 1, we get a t with 1 degree of freedom, otherwise known as the _____ distribution.

  9. [Repeat of slide 8.]
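A quick simulation sketch of the ratio definition above: draw Z ~ N(0,1) and W ~ chi-squared(n), form Z/√(W/n), and compare sample quantiles to scipy's t distribution (n = 5 is an arbitrary choice for the demo):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n = 5                                # degrees of freedom (arbitrary for the demo)
z = rng.standard_normal(100_000)     # standard normal draws
w = rng.chisquare(n, size=100_000)   # chi-squared draws
ratio = z / np.sqrt(w / n)           # the ratio that defines t with n df

for q in (0.90, 0.95, 0.975):
    print(q, np.quantile(ratio, q), stats.t.ppf(q, df=n))  # should agree closely
```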

  10. Confidence interval for β1. From the sampling distribution of b1: P( −t(1 − α/2; n − 2) ≤ (b1 − β1)/s{b1} ≤ t(1 − α/2; n − 2) ) = 1 − α. Rearranging inside the brackets gives the result: b1 ± t(1 − α/2; n − 2)·s{b1}.

  11. Hypothesis tests for β1. Step 1: Null and alternative hypotheses: H0: β1 = 0 versus Ha: β1 ≠ 0. Step 2: Test statistic: t* = b1/s{b1}. Step 3: Critical region: reject H0 if |t*| > t(1 − α/2; n − 2) (or see the p-value).

  12. s = √MSE = √2384 = 48.82. s{b1} = √(2384/19800) = 0.3470. t* = 3.57/0.3470 = 10.29 = √105.8757. Note that 19800 = Σ(Xi − X̄)², the sample variance of X times 24.
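The slide's arithmetic, reproduced in a short sketch (inputs are the printed summaries: MSE = 2384, Σ(Xi − X̄)² = 19800, b1 = 3.570202, n = 25); it also adds the 95% confidence interval from slide 10:

```python
from scipy import stats

n, b1 = 25, 3.570202
mse, ssx = 2384.0, 19800.0             # MSE and sum of squared X deviations

s = mse ** 0.5                         # sqrt(2384) = 48.82
se_b1 = (mse / ssx) ** 0.5             # s{b1} = sqrt(2384/19800) = 0.3470
t_star = b1 / se_b1                    # 3.57 / 0.3470 = 10.29

t_crit = stats.t.ppf(0.975, df=n - 2)  # t(0.975; 23)
print(s, se_b1, t_star)
print(b1 - t_crit * se_b1, b1 + t_crit * se_b1)  # 95% CI for beta_1
```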

  13. 2.3: Considerations for Inferences on β0 and β1 • Non-normality of errors – a small departure is not a big deal – for very large n, inference is asymptotically okay • Interpretation of confidence coefficients – the predictor variables are fixed (not random) • The spacing of the Xi impacts the variance of b1 • Power is computable via the non-centrality parameter of a t distribution

  14. Predictions … and their uncertainties • Mean response at a given value of the predictor variable (with confidence limits) – For many future lots of size 80, what should the average number of hours be? • Future response at a given value of the predictor variable (with prediction limits) – A client dropped off an order of size 80; what do we expect the number of hours for this order to be?

  15. 2.4: Interval Estimation of E{Yh}. Point estimate for the mean at X = Xh: Ŷh = b0 + b1·Xh. For the interval estimate of the mean at X = Xh, we require the sampling distribution of Ŷh: 1. Distribution? 2. Mean?

  16. Variance: show Ŷh is a linear combination of the responses: Ŷh = b0 + b1·Xh, and since b0 = Ȳ − b1·X̄, we have Ŷh = Ȳ + b1(Xh − X̄).

  17. [Equation-only slide: derivation of Var{Ŷh}.]

  18. [Equation-only slide: derivation continued.]

  19. i.e., it is a good idea to show that b0 and b1 are linear combinations of the Yi’s.

  20. Results: Mean: E{Ŷh} = β0 + β1·Xh = E{Yh}. Variance: σ²{Ŷh} = σ²·[1/n + (Xh − X̄)²/Σ(Xi − X̄)²]. Estimated variance: s²{Ŷh} = MSE·[1/n + (Xh − X̄)²/Σ(Xi − X̄)²]. Result: Ŷh is normal with this mean and variance.

  21. Major Result. Estimated variance of the estimated mean response: s²{Ŷh} = MSE·[1/n + (Xh − X̄)²/Σ(Xi − X̄)²]. So: (Ŷh − E{Yh})/s{Ŷh} ~ t(n − 2).

  22. 1 − α confidence limits for E{Yh}: Ŷh ± t(1 − α/2; n − 2)·s{Ŷh}. Question: at which value of Xh is this confidence interval the smallest? (i.e., where is your estimation most precise?)

  23. Predicting a future observation at Xh. We will use Ŷh, and our prediction error will be: pred = Yh(new) − Ŷh. The variance of our prediction error is: σ²{pred} = σ²·[1 + 1/n + (Xh − X̄)²/Σ(Xi − X̄)²], estimated by: s²{pred} = MSE·[1 + 1/n + (Xh − X̄)²/Σ(Xi − X̄)²].

  24. 1 − α prediction limits for Yh(new): Ŷh ± t(1 − α/2; n − 2)·s{pred}, where s²{pred} = MSE·[1 + 1/n + (Xh − X̄)²/Σ(Xi − X̄)²].
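A sketch implementing both interval formulas for the lot-size fit shown later in the deck (b0, b1, MSE, and Σ(Xi − X̄)² come from that output; X̄ = 70 is the textbook value for these data and is an assumption here, and Xh = 80 echoes the "lots of size 80" question on slide 14). The mean interval is narrowest at Xh = X̄, which answers the question on slide 22:

```python
import numpy as np
from scipy import stats

n, b0, b1 = 25, 62.365859, 3.570202
mse, ssx, xbar = 2384.0, 19800.0, 70.0   # xbar = 70 assumed from the textbook data
xh = 80.0                                # illustrative point

yhat_h = b0 + b1 * xh
s2_mean = mse * (1 / n + (xh - xbar) ** 2 / ssx)      # s^2{Yhat_h}
s2_pred = mse * (1 + 1 / n + (xh - xbar) ** 2 / ssx)  # s^2{pred}

t_crit = stats.t.ppf(0.975, df=n - 2)
print("mean CI:", yhat_h - t_crit * np.sqrt(s2_mean), yhat_h + t_crit * np.sqrt(s2_mean))
print("pred PI:", yhat_h - t_crit * np.sqrt(s2_pred), yhat_h + t_crit * np.sqrt(s2_pred))
```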

  25. Example: GPA data for 2000 (CH01PR19)

  26. JMP Pro 11 Analysis

  27. Obtain the estimation interval and prediction interval at Xh = 25. And do all of the other Xh points while you’re at it…

  28. [Image-only slide: JMP interval output.]

  29. “Easy” with Fit Y by X

  30. Linear fit: Y hours = 62.365859 + 3.570202 × X lot size

  Summary of Fit
    RSquare                      0.821533
    RSquare Adj                  0.813774
    Root Mean Square Error       48.82331
    Mean of Response             312.28
    Observations (or Sum Wgts)   25

  Analysis of Variance
    Source     DF   Sum of Squares   Mean Square   F Ratio    Prob > F
    Model       1        252377.58        252378   105.8757   <.0001*
    Error      23         54825.46          2384
    C. Total   24        307203.04

  Parameter Estimates
    Term         Estimate    Std Error   t Ratio   Prob>|t|
    Intercept    62.365859   26.17743     2.38     0.0259*
    X lot size    3.570202    0.346972   10.29     <.0001*

  Margin notes: s = √2384 = 48.82; St.Dev.(b1) = √(2384/19800) = 0.3470; t = 3.57/0.3470 = 10.29 = √105.8757. Note that 19800 = Var(X) times 24.
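For readers without JMP, a hedged sketch of the same fit in Python; the file name "toluca.csv" and its column names are assumptions, not part of the original materials:

```python
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("toluca.csv")                    # assumed file with columns hours, lot_size
fit = smf.ols("hours ~ lot_size", data=df).fit()  # simple linear regression
print(fit.summary())                              # estimates, std errors, t ratios, F, R^2
```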

  31. • Lot size sample variance is 825. • 825 × 24 = 19,800 = Σ(Xi − X̄)².

  32. Variance 825 times 24 = 19,800

  33. Partitioning total sums of squares

  34. Variation “explained” by regression. Unexplained variation before regression: SSTO = Σ(Yi − Ȳ)². Unexplained variation after regression: SSE = Σ(Yi − Ŷi)². So what variation was explained by regression? SSR = SSTO − SSE. Amazing fact: SSR = Σ(Ŷi − Ȳ)².

  35. Aside: show SSTO = SSR + SSE. Write Yi − Ȳ = (Ŷi − Ȳ) + (Yi − Ŷi), square both sides, and sum over i. But the cross-product term Σ(Ŷi − Ȳ)(Yi − Ŷi) = ? It is 0 (a consequence of the normal equations). So: Σ(Yi − Ȳ)² = Σ(Ŷi − Ȳ)² + Σ(Yi − Ŷi)².
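A numeric sketch of the decomposition on simulated data (any least-squares fit would do); it also prints SSR/SSTO, which is the R² of slide 44:

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.uniform(10, 20, size=50)           # simulated predictor
Y = 5 + 2 * X + rng.normal(0, 3, size=50)  # simulated response

b1, b0 = np.polyfit(X, Y, 1)               # least-squares slope and intercept
yhat = b0 + b1 * X

ssto = np.sum((Y - Y.mean()) ** 2)
sse = np.sum((Y - yhat) ** 2)
ssr = np.sum((yhat - Y.mean()) ** 2)

print(ssto, ssr + sse)                         # equal up to rounding
print(np.sum((yhat - Y.mean()) * (Y - yhat)))  # cross-product term: ~0
print(ssr / ssto)                              # R^2 = SSR/SSTO
```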

  36. Mean Squares: MSR = SSR/1 and MSE = SSE/(n − 2), each sum of squares divided by its degrees of freedom.

  37. ANOVA Table:
    Source      SS     df      MS
    Regression  SSR    1       MSR = SSR/1
    Error       SSE    n − 2   MSE = SSE/(n − 2)
    Total       SSTO   n − 1

  38. F test. Test statistic: F* = MSR/MSE. Decision rule: reject H0: β1 = 0 if F* > F(1 − α; 1, n − 2).

  39. Result: F* = (t*)². See text page 71. t is a Normal over the square root of a scaled chi-squared, so if you square t you get ______.
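A quick check of F* = (t*)² with scipy, using the lot-size degrees of freedom (n − 2 = 23): the squared two-sided t cutoff equals the F(1, 23) cutoff, and 10.29² recovers the F ratio 105.88 from the output:

```python
from scipy import stats

df_err = 23                            # n - 2 for the lot-size example
t_crit = stats.t.ppf(0.975, df_err)    # two-sided 5% t cutoff
f_crit = stats.f.ppf(0.95, 1, df_err)  # 5% F cutoff with (1, 23) df
print(t_crit ** 2, f_crit)             # identical

print(10.29 ** 2)                      # about 105.88, the F ratio in the output
```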

  40. Example: GPA 2000 data. Linear fit: GPA = 2.1140493 + 0.0388271 × ACT

  Summary of Fit
    RSquare                      0.07262
    RSquare Adj                  0.064761
    Root Mean Square Error       0.623125
    Mean of Response             3.07405
    Observations (or Sum Wgts)   120

  Analysis of Variance
    Source     DF    Sum of Squares   Mean Square   F Ratio   Prob > F
    Model        1         3.587846       3.58785    9.2402   0.0029*
    Error      118        45.817608       0.38828
    C. Total   119        49.405454

  Parameter Estimates
    Term        Estimate    Std Error   t Ratio   Prob>|t|
    Intercept   2.1140493   0.320895    6.59      <.0001*
    ACT         0.0388271   0.012773    3.04      0.0029*

  F* = 9.2402. Critical value at the .05 level: F(0.95; 1, 118) ≈ 3.92.

  41. 2.8: General Linear Test Approach. Compares any “full” and “reduced” models and answers the question: do the additional terms in the full model explain significant additional variation? (Are they needed?) Example: Full model: Yi = β0 + β1·Xi + εi. Reduced model: Yi = β0 + εi.

  42. General Linear Test Approach. How much additional variation is explained by the full model? SSE(R) − SSE(F). Result: F* = [ (SSE(R) − SSE(F)) / (dfR − dfF) ] / [ SSE(F) / dfF ], compared to F(dfR − dfF, dfF). An amazingly general test: H0: the reduced model is true (R); Ha: the full model is true (F).

  43. Example: test β1 = 0 versus β1 ≠ 0. Full model: Yi = β0 + β1·Xi + εi, with corresponding SSE(F) = SSE. Reduced model: Yi = β0 + εi (the model under H0), with corresponding SSE(R) = SSTO. The test statistic reduces to F* = MSR/MSE (see pages 72-73).
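A sketch of the general linear test statistic as a function (the function name is ours, not from the text); fed the numbers from the lot-size output, SSE(R) = SSTO = 307203.04 with 24 df and SSE(F) = 54825.46 with 23 df, it reproduces F* = 105.88:

```python
from scipy import stats

def general_linear_test(sse_r, df_r, sse_f, df_f):
    """F* = [(SSE(R) - SSE(F)) / (df_R - df_F)] / [SSE(F) / df_F]."""
    f_star = ((sse_r - sse_f) / (df_r - df_f)) / (sse_f / df_f)
    p_value = stats.f.sf(f_star, df_r - df_f, df_f)  # upper-tail p-value
    return f_star, p_value

print(general_linear_test(307203.04, 24, 54825.46, 23))  # F* = 105.88, as in slide 30
```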

  44. 2.9: Measures of Association. R²: coefficient of determination. Variation explained by the regression: SSR. Total variation: SSTO. What fraction of the total variation was explained by the regression? R² = SSR/SSTO = 1 − SSE/SSTO (RSquare). (An over-used, over-rated, possibly misleading statistic!) Do not confuse it with causation. As a screening tool in model selection, it is helpful.

  45. Correlation illustrations

  46. R²? [Four scatterplots: Y1, Y2, Y3, and Y4, each plotted against X. Guess R² for each.]

  47. Answers:
    Y1: S = 0.8194, R-Sq = 97.7%, R-Sq(adj) = 97.6%
    Y2: S = 4.097, R-Sq = 63.4%, R-Sq(adj) = 62.6%
    Y3: S = 0.8194, R-Sq = 0.1%, R-Sq(adj) = 0.0%
    Y4: S = 0, R-Sq = 100.0%, R-Sq(adj) = 100.0%

  48. R²? [The same four scatterplots, now labeled with their answers: Y1: 0.977, Y2: 0.634, Y3: 0.001, Y4: 0.99+.]

  49. Misunderstandings about R²:
    1. High R² implies precise predictions. (Not necessarily!) [Scatterplot: Y6 vs. X1.]
    2. High R² implies good fit. (Not necessarily!) [Regression plot: Y5 = −634.823 + 53.5260 X; S = 45.7285, R-Sq = 90.6%, R-Sq(adj) = 90.4%.]

  50. Misunderstandings about R²:
    3. Low R² implies X and Y are not related. (Not necessarily!)
    Example 1: GPA data! [Regression plot: GPA = 1.40765 + 0.0635624 ACT; S = 0.567359, R-Sq = 14.1%, R-Sq(adj) = 13.5%.]
    Example 2: Wrong model. [Regression plot: Y7 = 21.2853 − 0.294808 X; S = 9.14570, R-Sq = 0.7%, R-Sq(adj) = 0.0%.]

  51. Misunderstandings about R²:
    3. Low R² implies X and Y are not related. (Not necessarily!)
    Example 3: Low R² may result from truncation. [Scatterplot: Y2 vs. X.]

  52. Coefficient of correlation: r = ±√R², where the sign is given by the sign of b1.

  53. Anscombe’s data, if time permits now

  54. Anscombe’s data, if time permits now • Resides in 8 columns in Help/Data Sets/Regression • Can stack the x columns and the y columns, add a count, and fit all 4 at once • Fasten your seatbelt • You should do these fits yourself! (A stacked-fit sketch follows.)
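A sketch of the stacked fit in Python rather than JMP: seaborn ships Anscombe's quartet in long format (dataset, x, y), which parallels the stack-the-columns suggestion; this assumes seaborn can fetch its bundled example data:

```python
import seaborn as sns
import statsmodels.formula.api as smf

df = sns.load_dataset("anscombe")  # columns: dataset, x, y
for name, grp in df.groupby("dataset"):
    fit = smf.ols("y ~ x", data=grp).fit()
    # near-identical intercept, slope, and R^2 for four very different plots
    print(name, fit.params.round(3).tolist(), round(fit.rsquared, 3))
```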
