1
Chapter 2
2
2.1: Inferences about β1
Test of interest throughout regression: H0: β1 = 0 versus Ha: β1 ≠ 0. We need the sampling distribution of the estimator b1. Idea: If b1 can be written as a linear combination of the responses (which are independent and normally distributed), then from (A.40) we will have the probability (sampling) distribution of b1!
3
Easy:
b1 = Σ ki Yi, where ki = (Xi − X̄) / Σ(Xi − X̄)²
So immediately from (A.40): b1 ~ Normal (E{b1}, Var{b1})
But we still need to find E{b1} and Var{b1}
4
Fun facts about the ki
Fun Fact 1. Σ ki = 0
Fun Fact 2. Σ ki Xi = 1
Fun Fact 3. Σ ki² = 1 / Σ(Xi − X̄)²
5
Using the “fun facts”, find E{b1} and Var{b1}
E{b1} = Σ ki E{Yi} = Σ ki (β0 + β1 Xi) = β0 Σ ki + β1 Σ ki Xi = β1 (Fun Facts 1 and 2)
Var{b1} = Σ ki² Var{Yi} = σ² Σ ki² = σ² / Σ(Xi − X̄)² (Fun Fact 3)
6
Major Results
So it follows from (A.59) that: b1 ~ Normal (β1, σ² / Σ(Xi − X̄)²)
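Not on the slides: a minimal Python simulation sketch of this result, with made-up β0, β1, σ, and X values, checking that draws of b1 = Σ ki Yi have mean β1 and variance σ²/Σ(Xi − X̄)².

```python
import numpy as np

# Sketch (illustrative values only): simulate b1 = sum(k_i * Y_i) and check
# that its sampling distribution matches Normal(beta1, sigma^2 / Sxx).
rng = np.random.default_rng(0)
beta0, beta1, sigma = 62.0, 3.57, 48.8
X = np.linspace(20, 120, 25)                     # fixed predictor levels
k = (X - X.mean()) / np.sum((X - X.mean())**2)   # the k_i weights

b1_draws = np.empty(10_000)
for j in range(b1_draws.size):
    Y = beta0 + beta1 * X + rng.normal(0.0, sigma, size=X.size)
    b1_draws[j] = np.sum(k * Y)                  # b1 as a linear combination of the Y_i

print(b1_draws.mean())                           # close to beta1 = 3.57
print(b1_draws.var())                            # close to sigma^2 / Sxx below
print(sigma**2 / np.sum((X - X.mean())**2))
```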
7
Picture of t density functions with various degrees of freedom (df)
[Figure: t density curves; horizontal axis t from −4 to 4, vertical axis f(t) from 0.0 to 0.4]
8
t distribution
- Ratio of a standard normal to the square root of a chi-squared random variable divided by its degrees of freedom n
- An n of about 30 is “close” to a standard normal, but not exactly so
- As n → ∞, it tends to a standard normal distribution
- For n = 1, we get a t with 1 degree of freedom, otherwise known as the _____ distribution
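A quick check of the ratio definition by simulation (a sketch, not from the slides; assumes scipy is available):

```python
import numpy as np
from scipy import stats

# T = Z / sqrt(V/n), with Z ~ N(0,1) and V ~ chi-square(n) independent,
# has a t distribution with n degrees of freedom.
rng = np.random.default_rng(1)
n = 5
Z = rng.standard_normal(100_000)
V = rng.chisquare(n, 100_000)
T = Z / np.sqrt(V / n)

# Simulated quantiles agree with scipy's t(n) quantiles.
for q in (0.90, 0.95, 0.975):
    print(q, np.quantile(T, q), stats.t.ppf(q, df=n))
```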
10
Confidence interval for β1
From the sampling distribution of b1:
P[ −t(1 − α/2; n − 2) ≤ (b1 − β1)/s{b1} ≤ t(1 − α/2; n − 2) ] = 1 − α
Rearranging inside the brackets:
P[ b1 − t(1 − α/2; n − 2) s{b1} ≤ β1 ≤ b1 + t(1 − α/2; n − 2) s{b1} ] = 1 − α
Result: the 1 – α confidence limits for β1 are b1 ± t(1 − α/2; n − 2) s{b1}
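As a sketch (assuming scipy), the interval is easy to compute with the lot-size estimates that appear later in this deck:

```python
from scipy import stats

# 95% CI for beta1 from the lot-size fit: b1 = 3.570202, s{b1} = 0.346972, n = 25.
b1, se_b1, n, alpha = 3.570202, 0.346972, 25, 0.05
t_crit = stats.t.ppf(1 - alpha / 2, df=n - 2)     # t(1 - alpha/2; n - 2)
print(b1 - t_crit * se_b1, b1 + t_crit * se_b1)   # roughly (2.85, 4.29)
```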
11
Hypothesis Tests for β1
Step 1: Null and alternative hypotheses: H0: β1 = 0 versus Ha: β1 ≠ 0
Step 2: Test statistic: t* = b1 / s{b1}
Step 3: Critical region: reject H0 if |t*| > t(1 − α/2; n − 2) (or see the p-value)
12
S = sqrt(2384) = 48.82
s{b1} = sqrt(2384/19800) = 0.3470 (the standard deviation of b1, not its variance)
t* = 3.5702 / 0.3470 = 10.29 = sqrt(105.8757)
Note that 19800 = Σ(Xi − X̄)² = Var(X) times 24
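The same arithmetic, checked in code (a sketch; assumes scipy):

```python
import numpy as np
from scipy import stats

# Reproduce the hand calculations: MSE = 2384, Sxx = 19800, b1 = 3.5702, n = 25.
MSE, Sxx, b1, n = 2384.0, 19800.0, 3.570202, 25
print(np.sqrt(MSE))               # S = 48.82
se_b1 = np.sqrt(MSE / Sxx)        # s{b1} = 0.3470
t_star = b1 / se_b1               # t* = 10.29
print(se_b1, t_star, 2 * stats.t.sf(abs(t_star), df=n - 2))  # p-value < .0001
```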
2.3 Considerations for Inferences on β0 and β1
- Non-normality of errors
– A small departure is not too big a deal
– For very large n, asymptotically okay
- Interpretation of confidence coefficients
– The predictor variables are fixed (not random)
- The spacing of the Xi impacts the variance of b1
- Power is computable via the non-centrality parameter of a t-distribution
13
Predictions…and their uncertainties
- Mean function at a given value of the predictor variable (with confidence limits)
– For many future lots of size 80, what should the average number of hours be?
- Future response at a given value of the predictor variable (with prediction limits)
– A client dropped off an order of size 80; what do we expect for the number of hours this order will take?
14
15
2.4 Interval Estimation of E{Yh}
Point estimate for the mean at X = Xh: Ŷh = b0 + b1 Xh. For the interval estimate of the mean at X = Xh, we require the sampling distribution of Ŷh:
- 1. Distribution?
- 2. Mean?
- 3. Variance?
16
Variance: Show Ŷh is a linear combination of the responses:
Ŷh = b0 + b1 Xh; the b0 term is Ȳ − b1 X̄, so
Ŷh = Ȳ + b1 (Xh − X̄) = Σ [1/n + ki (Xh − X̄)] Yi
17
18
i.e., it is a good idea to show that b0 and b1 are linear combinations of the Yi’s
19
20
Results:
Mean: E{Ŷh} = β0 + β1 Xh = E{Yh}
Variance: σ²{Ŷh} = σ² [1/n + (Xh − X̄)² / Σ(Xi − X̄)²]
Estimated Variance: s²{Ŷh} = MSE [1/n + (Xh − X̄)² / Σ(Xi − X̄)²]
Result: (Ŷh − E{Yh}) / s{Ŷh} ~ t(n − 2)
21
Major Result
Estimated variance of prediction: s²{Ŷh} = MSE [1/n + (Xh − X̄)² / Σ(Xi − X̄)²]
So: (Ŷh − E{Yh}) / s{Ŷh} ~ t(n − 2)
22
1 – α confidence limits for E{Yh}: Ŷh ± t(1 − α/2; n − 2) s{Ŷh}
Question: At which value of Xh is this confidence interval the smallest? (i.e., where is your estimation most precise?)
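A sketch of the computation (assumes scipy; X̄ = 70 is an assumption about the lot-size data, not stated on these slides):

```python
import numpy as np
from scipy import stats

# Confidence limits for E{Yh} at Xh = 80 using the lot-size summaries.
b0, b1, MSE, n = 62.365859, 3.570202, 2384.0, 25
xbar, Sxx, Xh, alpha = 70.0, 19800.0, 80.0, 0.05   # xbar = 70 is an assumption

yhat = b0 + b1 * Xh                                      # point estimate of E{Yh}
se_mean = np.sqrt(MSE * (1 / n + (Xh - xbar)**2 / Sxx))  # s{Yh-hat}
t_crit = stats.t.ppf(1 - alpha / 2, df=n - 2)
print(yhat, yhat - t_crit * se_mean, yhat + t_crit * se_mean)
```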
23
Predicting a Future Observation at Xh
We will use Ŷh, and our prediction error will be: pred = Yh(new) − Ŷh
The variance of our prediction error is: σ²{pred} = σ² [1 + 1/n + (Xh − X̄)² / Σ(Xi − X̄)²]
Estimated by: s²{pred} = MSE [1 + 1/n + (Xh − X̄)² / Σ(Xi − X̄)²]
24
1 – α prediction limits for Yh(new): Ŷh ± t(1 − α/2; n − 2) s{pred}
where: s²{pred} = MSE [1 + 1/n + (Xh − X̄)² / Σ(Xi − X̄)²]
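And the matching prediction limits, under the same assumptions (note the extra “1 +” term, which widens the interval versus the mean CI):

```python
import numpy as np
from scipy import stats

# Prediction limits for Yh(new) at Xh = 80; xbar = 70 is an assumption.
b0, b1, MSE, n = 62.365859, 3.570202, 2384.0, 25
xbar, Sxx, Xh, alpha = 70.0, 19800.0, 80.0, 0.05

yhat = b0 + b1 * Xh
se_pred = np.sqrt(MSE * (1 + 1 / n + (Xh - xbar)**2 / Sxx))  # s{pred}
t_crit = stats.t.ppf(1 - alpha / 2, df=n - 2)
print(yhat - t_crit * se_pred, yhat + t_crit * se_pred)
```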
25
Example: GPA data for 2000
CH01PR19
26
JMP Pro 11 Analysis
27
Obtain Estimation Interval and Prediction Interval at Xh = 25
And do all of the other Xh-points while you’re at it….
28
“Easy” with Fit Y by X
29
30
Summary of Fit
Y hours = 62.365859 + 3.570202 * X lot size
RSquare                     0.821533
RSquare Adj                 0.813774
Root Mean Square Error      48.82331
Mean of Response            312.28
Observations (or Sum Wgts)  25

Analysis of Variance
Source    DF  Sum of Squares  Mean Square   F Ratio   Prob > F
Model      1       252377.58       252378   105.8757  <.0001*
Error     23        54825.46         2384
C. Total  24       307203.04

Parameter Estimates
Term        Estimate   Std Error  t Ratio  Prob>|t|
Intercept   62.365859  26.17743      2.38   0.0259*
X lot size   3.570202   0.346972    10.29   <.0001*
S = sqrt(2384) = 48.82
St. Dev.(b1) = sqrt(2384/19800) = 0.3470
t = 3.5702 / 0.3470 = 10.29 = sqrt(105.8757)
Note that 19800 = Var(X) times 24:
- the lot-size sample variance is 825
- 825 × 24 = 19,800
31
Variance 825 times 24 = 19,800
32
33
Partitioning total sums of squares
34
Variation “Explained” by Regression
Unexplained variation before regression: SSTO = Σ(Yi − Ȳ)²
Unexplained variation after regression: SSE = Σ(Yi − Ŷi)²
So what variation was explained by the regression? SSR = SSTO − SSE
Amazing fact: SSR = Σ(Ŷi − Ȳ)²
35
Aside: Show SSTO = SSR + SSE
SSTO = Σ(Yi − Ȳ)² = Σ[(Yi − Ŷi) + (Ŷi − Ȳ)]² = SSE + SSR + 2 Σ(Yi − Ŷi)(Ŷi − Ȳ)
But: Σ(Yi − Ŷi)(Ŷi − Ȳ) = ? (it is 0, by the normal equations)
So: SSTO = SSR + SSE
36
Mean Squares
MSR = SSR / 1 and MSE = SSE / (n − 2)
37
ANOVA Table
Source      DF     SS    MS
Regression   1    SSR    MSR = SSR/1
Error      n−2    SSE    MSE = SSE/(n−2)
Total      n−1    SSTO
38
F test
Test Statistic: F* = MSR / MSE
Decision Rule: reject H0: β1 = 0 if F* > F(1 − α; 1, n − 2)
39
Result: F = t²
See text page 71. t is a standard normal over the square root of a scaled chi-square, so if you square t you get ______
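A numeric check (a sketch; assumes scipy), using the GPA fit on the next slide where t = 3.04 and F = 9.2402:

```python
from scipy import stats

# F = t^2 for testing beta1 = 0: 3.04^2 = 9.24 (rounding), matching F = 9.2402.
print(3.04**2)                          # ~ 9.24

# Critical values square the same way: t(0.975; 23)^2 = F(0.95; 1, 23).
print(stats.t.ppf(0.975, df=23)**2)     # about 4.28 ...
print(stats.f.ppf(0.95, dfn=1, dfd=23)) # ... either way
```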
40
Example: GPA 2000 Data
F* = 9.2402. Critical value at the .05 level: F(.95; 1, 118) ≈ 3.92, so reject H0: β1 = 0.
Summary of Fit
GPA = 2.1140493 + 0.0388271 * ACT
RSquare                     0.07262
RSquare Adj                 0.064761
Root Mean Square Error      0.623125
Mean of Response            3.07405
Observations (or Sum Wgts)  120

Analysis of Variance
Source    DF   Sum of Squares  Mean Square  F Ratio  Prob > F
Model       1        3.587846      3.58785   9.2402   0.0029*
Error     118       45.817608      0.38828
C. Total  119       49.405454

Parameter Estimates
Term       Estimate   Std Error  t Ratio  Prob>|t|
Intercept  2.1140493  0.320895      6.59   <.0001*
ACT        0.0388271  0.012773      3.04   0.0029*
41
2.8 General Linear Test Approach
Compares any “full” and “reduced” models and answers the question: do the additional terms in the full model explain significant additional variation? (Are they needed?)
Examples: Full Model: Yi = β0 + β1 Xi + εi; Reduced Model: Yi = β0 + εi
42
General Linear Test Approach
H0: the reduced model is true (R); Ha: the full model is true (F)
How much additional variation is explained by the full model? SSE(R) − SSE(F)
Result: F* = [ (SSE(R) − SSE(F)) / (dfR − dfF) ] / [ SSE(F) / dfF ]
An amazingly general test.
43
Example: Test β1 = 0 versus β1 ≠ 0
Full model: Yi = β0 + β1 Xi + εi, with corresponding SSE(F) = SSE
Reduced model: Yi = β0 + εi (the model under H0), with corresponding SSE(R) = SSTO
Here the test statistic reduces to F* = MSR/MSE (see pages 72-73)
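A sketch of the general linear test on simulated data (all values made up); for this special case F* equals MSR/MSE:

```python
import numpy as np

# General linear test: F* = [(SSE_R - SSE_F)/(df_R - df_F)] / [SSE_F/df_F].
rng = np.random.default_rng(2)
X = np.linspace(20, 120, 25)
Y = 62.0 + 3.57 * X + rng.normal(0.0, 48.8, X.size)
n = X.size

b1 = np.sum((X - X.mean()) * (Y - Y.mean())) / np.sum((X - X.mean())**2)
b0 = Y.mean() - b1 * X.mean()
SSE_F = np.sum((Y - (b0 + b1 * X))**2)   # full model,    df_F = n - 2
SSE_R = np.sum((Y - Y.mean())**2)        # reduced model, df_R = n - 1 (SSE_R = SSTO)

F_star = ((SSE_R - SSE_F) / 1.0) / (SSE_F / (n - 2))
print(F_star)                            # same as MSR/MSE in this special case
```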
44
2.9 Measures of Association
R²: Coefficient of determination.
Variation explained by the regression: SSR. Total variation: SSTO.
What fraction of the total variation was explained by the regression?
R² = SSR/SSTO = 1 − SSE/SSTO (JMP’s “RSquare”)
(An over-used, over-rated, possibly misleading statistic! Do not confuse it with causation. As a screening tool in model selection, it is helpful.)
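For instance, the lot-size ANOVA shown earlier gives R² directly from the decomposition:

```python
# R^2 from the lot-size ANOVA numbers earlier in this deck.
SSR, SSE = 252377.58, 54825.46
SSTO = SSR + SSE                    # 307203.04
print(SSR / SSTO, 1 - SSE / SSTO)   # both ~ 0.8215, JMP's "RSquare"
```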
Correlation illustrations
45
46
[Scatterplots of Y1, Y2, Y3, and Y4 versus X]
R2?
47
Answers:
Y1: S = 0.8194, R-Sq = 97.7%, R-Sq(adj) = 97.6%
Y2: S = 4.097, R-Sq = 63.4%, R-Sq(adj) = 62.6%
Y3: S = 0.8194, R-Sq = 0.1%, R-Sq(adj) = 0.0%
Y4: S = 0, R-Sq = 100.0%, R-Sq(adj) = 100.0%
48
[Scatterplots of Y1, Y2, Y3, and Y4 versus X, revisited]
R2 ?
R² ≈ 0.977, 0.634, 0.001, and 0.99+ for Y1-Y4, respectively
49
Misunderstandings about R2:
- 1. High R² implies precise predictions. (Not necessarily!)
- 2. High R² implies good fit. (Not necessarily!)
[Regression plot: Y5 = −634.823 + 53.5260 X, S = 45.7285, R-Sq = 90.6%, R-Sq(adj) = 90.4%; companion plot of Y6 versus X1]
50
Misunderstandings about R2:
- 3. Low R² implies X and Y are not related. (Not necessarily!)
Example 1: the GPA data! Example 2: the wrong model:
[Regression plots: GPA = 1.40765 + 0.0635624 ACT, S = 0.567359, R-Sq = 14.1%, R-Sq(adj) = 13.5%; Y7 = 21.2853 − 0.294808 X, S = 9.14570, R-Sq = 0.7%, R-Sq(adj) = 0.0%]
51
Misunderstandings about R2:
- 3. Low R² implies X and Y are not related. (Not necessarily!)
Example 3: Low R² may result from truncation
[Plot: Y2 versus X over a truncated range of X]
52
Coefficient of Correlation r = ±√R²
where the sign is given by the sign of b1
Anscombe’s data if time permits now
53
- Resides in 8 columns in Help/Data Sets/Regression
- Can stack the x columns and y columns, add a count, and fit all 4 at once
- Fasten your seatbelt
- You should do these fits yourself!
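A sketch of that stacked fit in Python rather than JMP (assumes seaborn is installed; it bundles a copy of Anscombe's quartet):

```python
import numpy as np
import seaborn as sns  # assumption: seaborn is available with its "anscombe" data set

# Fit all four Anscombe data sets at once and compare the summaries.
df = sns.load_dataset("anscombe")
for name, g in df.groupby("dataset"):
    b1, b0 = np.polyfit(g["x"], g["y"], deg=1)
    r2 = np.corrcoef(g["x"], g["y"])[0, 1] ** 2
    print(name, round(b0, 2), round(b1, 3), round(r2, 3))
# All four come out near y = 3.00 + 0.500 x with R^2 ~ 0.67; so plot them!
```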
54
Anscombe’s data if time permits now
55
More Good Stuff: material in Chapter 2 glossed over here
- Bivariate normal model (X also a random variable)
- Fisher’s z transformation for correlation in the bivariate normal model
- Spearman’s rank correlation coefficient (replace the data with marginal ranks; run the usual analysis)
56
- Already did the mean function and future observation (JMP).
57
2.10 Cautionary Notes
- Inferences for the future: X needs to be estimated as well (not fixed)
- Levels of X outside the range of observations (extrapolation)
- β1 ≠ 0 does not imply cause and effect
- Intervals for E(Y) and a future Y apply at a single Xh
- X subject to measurement errors
58
2.11 Normal Correlation Models
- Marginals are normal, conditional distributions are normal. Tests for ρ can be applied (Fisher’s z transformation).
59
Rank Regression
- Rank as in sorted, not sordid
- Spearman’s rank correlation coefficient is the regular correlation coefficient with the raw values replaced by their ranks
- Ranking procedures are under-utilized compared to normal-theory methods, which are over-utilized
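A sketch (made-up data; assumes scipy) showing that Spearman's coefficient is just Pearson's r on the ranks:

```python
import numpy as np
from scipy import stats

# Monotone but nonlinear relation between x and y.
rng = np.random.default_rng(3)
x = rng.normal(size=30)
y = np.exp(x) + rng.normal(scale=0.1, size=30)

rho, p = stats.spearmanr(x, y)
r_on_ranks = stats.pearsonr(stats.rankdata(x), stats.rankdata(y))[0]
print(rho, r_on_ranks)   # the two agree
```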
60
Exercises
- 1.6 (draw plot by hand), 1.7, 1.13, 1.16, 1.18, 1.19 (needs software and the data disc that comes with the book), 1.23 (continuation of 1.19), 1.29, 1.32, 1.33, 1.34, 1.35, 1.36, 1.43
61
Exercises
- 2.1, 2.4, 2.6, 2.10, 2.13, 2.23, 2.34, 2.50, 2.51, 2.57
- Feel free to do more if this gets you more comfortable with the material.
- For example, you may wish to do the series of problems for copier maintenance all the way through, as was done for the GPA data.
62
Exercises, Chap. 1
- 1.6 Intercept 200, slope 5, small sigma. X = 10, 20, and 40 give 250, 300, and 400 for Y on average.
- 1.7 a. The key word is exact. Plus/minus 1 standard deviation… cannot say much without the distribution; b. 0.68 for the normal error distribution.
- 1.13 a. Observational… no control over the amount of time each person is supposed to devote; b. “caused” v. “seem to be associated”; c. …(seminar leader measures productivity); d. Control the participation level.
63
Exercises
- 1.16 Least squares estimates do not depend on normality. MLEs with normally distributed errors give the same estimators, but LS validity is not dependent on normality.
- 1.18 The ei are observed values; the εi are random variables whose means are each 0 but whose sum is a random variable.
- GPA data… sounds familiar.
- Let’s look at the other ones as well (copier, airfreight, and plastic hardness); same issues.
64
Exercises
- 1.29 The “true” intercept happens to be zero. Collect data and fit an intercept; it could be > or < 0. If it comes out near zero, do we know the intercept is zero? Check on constraining through the origin.
- 1.32 Did this on an earlier handout; just in case, see the next slide.
65
66
- 1.33 and 1.34 are easier than with both intercept and slope.
- 1.35 Did that on the handout also, since the sum of the residuals is zero.
67
Appendix C.2 data set: saved the file with column headings
69
Exercises Chap 2
- 2.1 a. The CI for the slope is strictly positive; b. Present results that make sense.
- 2.4 GPA data set. Look back at the CI.
70
Exercises, Chap. 2; 2.4 continued
- The p-value is low, so the null must go.
71
Exercise 2.6 airfreight baggage; 2.10
72
73