Hypothesis Tests Recall the lm() regression output with two - PowerPoint PPT Presentation

ST 370 Probability and Statistics for Engineers Hypothesis Tests Recall the lm() regression output with two independent variables: Call: lm(formula = Strength ~ Length + Height, data = wireBond) Residuals: Min 1Q Median 3Q Max -3.865 -1.542 -0.362 1.196 5.841 Coefficients: Estimate Std. Error t value Pr(>|t|) (Intercept) 2.263791 1.060066 2.136 0.044099 * Length 2.744270 0.093524 29.343 < 2e-16 *** Height 0.012528 0.002798 4.477 0.000188 *** --- Signif. codes: 0 *** 0.001 ** 0.01 * 0.05 . 0.1 1 1 / 17 Multiple Linear Regression

ST 370 Probability and Statistics for Engineers Residual standard error: 2.288 on 22 degrees of freedom Multiple R-squared: 0.9811,Adjusted R-squared: 0.9794 F-statistic: 572.2 on 2 and 22 DF, p-value: < 2.2e-16 The statistical model is Y = β 0 + β 1 x 1 + β 2 x 2 + ǫ. The first test (“test for significance of regression”) is of the null hypothesis that neither variable has any effect; H 0 : β 1 = β 2 = 0. The test statistic is the F -statistic on the last line, p-value: < 2.2e-16 , which is F-statistic: 572.2 on 2 and 22 DF, large and highly significant: we reject this null hypothesis. 2 / 17 Multiple Linear Regression

ST 370 Probability and Statistics for Engineers We now test the individual coefficients, using the t -statistics on the respective rows of the output: H 0 : β 1 = 0, t = 29 . 343; H 0 : β 2 = 0, t = 4 . 477. Both have small P -values, so both null hypotheses are rejected, and we conclude that both predictors are needed in the regression model. 3 / 17 Multiple Linear Regression

ST 370 Probability and Statistics for Engineers The second-order model The output for the second-order model in one independent variable is similar: Call: lm(formula = Strength ~ Length + I(Length^2), data = wireBond) Residuals: Min 1Q Median 3Q Max -5.104 -2.092 0.564 1.807 4.870 Coefficients: Estimate Std. Error t value Pr(>|t|) (Intercept) 8.83254 1.49359 5.914 5.96e-06 *** Length 1.74319 0.36951 4.718 0.000105 *** I(Length^2) 0.06090 0.01871 3.255 0.003626 ** --- Signif. codes: 0 *** 0.001 ** 0.01 * 0.05 . 0.1 1 4 / 17 Multiple Linear Regression

ST 370 Probability and Statistics for Engineers Residual standard error: 2.598 on 22 degrees of freedom Multiple R-squared: 0.9757,Adjusted R-squared: 0.9735 F-statistic: 441.2 on 2 and 22 DF, p-value: < 2.2e-16 The statistical model is Y = β 0 + β 1 x 1 + β 2 x 2 1 + ǫ. Again, first test the significance of the overall model, H 0 : β 1 = β 2 = 0; p-value: < 2.2e-16 so we F-statistic: 441.2 on 2 and 22 DF, reject H 0 . 5 / 17 Multiple Linear Regression

ST 370 Probability and Statistics for Engineers Next, test H 0 : β 2 = 0, using the t -statistic; t = 3 . 255, so again we reject H 0 . Because we have decided that β 2 � = 0, we do not test the significance of β 1 . When the second-order term is needed in the model, we want to include the first-order term even if the coefficient is small. In the second-order model, the null hypothesis β 1 = 0 rarely if ever corresponds to any question of practical interest, so there is no point in testing it. 6 / 17 Multiple Linear Regression

ST 370 Probability and Statistics for Engineers Confidence Intervals Individual coefficient Confidence intervals for individual coefficients are constructed in the usual way: β j ± t α/ 2 ,ν × estimated standard error of ˆ ˆ β j for a 100(1 − α )% confidence interval, where ν = n − ( k + 1) is the residual degrees of freedom. 7 / 17 Multiple Linear Regression

ST 370 Probability and Statistics for Engineers Mean response We also need to estimate the mean response for new predictor values x 1 , new , x 2 , new , . . . , x k , new : Y new = ˆ ˆ β 0 + ˆ β 1 x 1 , new + · · · + ˆ β k x k , new . The confidence interval is of course ˆ Y new ± t α/ 2 ,ν × estimated standard error but the calculation is complicated, and best handled by software: wireBondLm1 <- lm(Strength ~ Length + Height, wireBond) predict(wireBondLm1, data.frame(Length = 8, Height = 275), interval = "confidence") 8 / 17 Multiple Linear Regression

ST 370 Probability and Statistics for Engineers Prediction Interval As before, we may want a prediction interval, for a single new observed response Y new : predict(wireBondLm1, data.frame(Length = 8, Height = 275), interval = "prediction") The prediction interval at the new predictors is always wider than the confidence interval at the same new predictors. 9 / 17 Multiple Linear Regression

ST 370 Probability and Statistics for Engineers Interpretation Suppose that a new item will be manufactured with semiconductors of die height 275 attached with wires of length 8. We are 95% confident that the mean pull-off strength for items from this process will be between 26.66 and 28.66. If a single prototype is produced and tested, there is a 95% probability that its pull-off strength will be between 22.81 and 32.51. 10 / 17 Multiple Linear Regression

ST 370 Probability and Statistics for Engineers Coefficient of Determination How well does the model fit the observed data? If we had no predictors, we would predict Y by β 0 , estimated by ¯ y ; the sum of squared residuals would be just the total sum of squares, n � y ) 2 . SS T = ( y i − ¯ i =1 Using the regression model, we predict Y by ˆ Y ; the sum of squared residuals is the residual sum of squares, n � y i ) 2 . SS E = ( y i − ˆ i =1 11 / 17 Multiple Linear Regression

ST 370 Probability and Statistics for Engineers The coefficient of determination , or the “fraction of variability explained by the model”, is R 2 = 1 − SS E SS T = SS T − SS E SS T = SS R SS T where SS R = SS T − SS E is the sum of squares for the regression. 12 / 17 Multiple Linear Regression

ST 370 Probability and Statistics for Engineers The R 2 values for the two-variable model and the second-order model are 98.11% and 97.57%, respectively. Both explain around 98% of the variability in pull-off strength; the two-variable model explains a little more than the second-order model. We can combine them: summary(lm(Strength ~ Length + I(Length^2) + Height, wireBond)) and R 2 increases to 98.64%. 13 / 17 Multiple Linear Regression

ST 370 Probability and Statistics for Engineers A problem R 2 always increases when we add a new predictor to a model. For instance, if we add Height 2 to the model, it is not significant, but R 2 increases to 98.66%. One solution is to use the adjusted R 2 , adj = 1 − SS E / [ n − ( k + 1)] = 1 − MS E R 2 SS T / ( n − 1) MS T which increases only if the new predictor reduces MS E , the residual mean square; however, R 2 adj can be negative. Both R 2 and R 2 adj are reported by most software. 14 / 17 Multiple Linear Regression

ST 370 Probability and Statistics for Engineers Another problem R 2 adj increases when we add a new predictor to a model, if and only if the t -ratio for the new predictor has | t | > 1. So the new ˆ β may not be significantly different from 0, but still R 2 adj increases. Suppose we change the question: how well will the model predict new observations? Ideally, we collect new data and test the model on them: a validation exercise. 15 / 17 Multiple Linear Regression

ST 370 Probability and Statistics for Engineers Cross validation If we have no new data, we cannot carry out a true validation, but we can use cross validation: Leave out the i th observation, and refit the model to the remaining observations; Use the refitted model to predict the left-out response y i , writing ˆ y ( i ) for the prediction; The Prediction Error Sum of Squares is n � 2 . � � PRESS = y i − ˆ y ( i ) i =1 16 / 17 Multiple Linear Regression

ST 370 Probability and Statistics for Engineers A statistic that corresponds to R 2 is P 2 = 1 − PRESS . SS T In R library("qpcR") wireBondLm2 <- lm(Strength ~ Length + I(Length^2) + Height, wireBond) PRESS(wireBondLm2)$P.square wireBondLm3 <- lm(Strength ~ Length + I(Length^2) + Height + I(Height^2), wireBond) PRESS(wireBondLm3)$P.square 17 / 17 Multiple Linear Regression

Hypothesis Tests Recall the lm() regression output with two - PowerPoint PPT Presentation

ST 370 Probability and Statistics for Engineers Hypothesis Tests Recall the lm() regression output with two independent variables: Call: lm(formula = Strength ~ Length + Height, data = wireBond) Residuals: Min 1Q Median 3Q Max -3.865

Hypothesis Tests using Excel T.TEST function V1e 11/12/2013 Two group hypothesis tests using

Hypothesis Tests using Z.TEST function in Excel 2008 V1c 11/16/2012 Hypothesis Tests [Excel

6.16.4 Hypothesis tests Prof. Tesler Math 186 Winter 2019 Prof. Tesler 6.16.4 Hypothesis

Hypothesis tests with binomial example STAT 587 (Engineering) Iowa State University October 2,

t -tests STAT 587 (Engineering) Iowa State University October 2, 2020 Statistical hypothesis

M6S4 - Hypothesis Tests Professor Jarad Niemi STAT 226 - Iowa State University November 1, 2018

Unit 3: Foundations for inference 3. Hypothesis tests GOVT 3990 - Spring 2020 Cornell University

Chapter 5.5: Hypothesis Tests 1. What is a hypothesis test? 2. The elements of a test: null and

Cluster Validity Hypothesis Random Graph Hypothesis Random Label Hypothesis Relative Criteria

Gov 2000: 6. Hypothesis Testing Matthew Blackwell October 11, 2016 1 / 55 1. Hypothesis

Hypothesis Testing Mark Lunt Centre for Epidemiology Versus Arthritis University of Manchester

Chapter 8 Inferences Based on a Single Sample: Tests of Hypothesis The Elements of a Test of

Chapter 5.5: Hypothesis Tests 1. What is a hypothesis test? 2. The elements of a test: null and

Hypothesis Tests for Population Means Bernd Schr oder logo1 Bernd Schr oder Louisiana

Nonparametric hypothesis tests and permutation tests 1.7 & 2.3. Probability Generating

STAT 113 Hypothesis Testing I Colin Reimer Dawson Oberlin College October 5, 2017 1 / 17

Topics Declaring and Instantiating Arrays Accessing Array Elements Writing Methods

Behavioral Classification of Language Primary Verbal Behaviors Non-Verbal Behavior Mand Tact

ICMP attacks against TCP Fernando Gont UTN/FRH, OpenBSD Project FIRST Technical Colloquium

INF 212/CS 253 Type Systems Instructors: Harry Xu Crista Lopes What is a Data Type? A type

Chapter 9: Testing Hypotheses In this chapter we will cover: 1. Hypothesis tests ( 9.1, 9.2

ECON2228 Notes 10 Christopher F Baum Boston College Economics 20142015 cfb (BC Econ)

Error estimates for the Galerkin finite element approximation for a linear second order hyperbolic

Solving Differential Equations Sanzheng Qiao Department of Computing and Software McMaster

Hypothesis Tests Recall the lm() regression output with two - PowerPoint PPT Presentation

ST 370 Probability and Statistics for Engineers Hypothesis Tests Recall the lm() regression output with two independent variables: Call: lm(formula = Strength ~ Length + Height, data = wireBond) Residuals: Min 1Q Median 3Q Max -3.865

Hypothesis Tests using Excel T.TEST function V1e 11/12/2013 Two group hypothesis tests using

Hypothesis Tests using Z.TEST function in Excel 2008 V1c 11/16/2012 Hypothesis Tests [Excel

6.16.4 Hypothesis tests Prof. Tesler Math 186 Winter 2019 Prof. Tesler 6.16.4 Hypothesis

Hypothesis tests with binomial example STAT 587 (Engineering) Iowa State University October 2,

t -tests STAT 587 (Engineering) Iowa State University October 2, 2020 Statistical hypothesis

M6S4 - Hypothesis Tests Professor Jarad Niemi STAT 226 - Iowa State University November 1, 2018

Unit 3: Foundations for inference 3. Hypothesis tests GOVT 3990 - Spring 2020 Cornell University

Chapter 5.5: Hypothesis Tests 1. What is a hypothesis test? 2. The elements of a test: null and

Cluster Validity Hypothesis Random Graph Hypothesis Random Label Hypothesis Relative Criteria

Gov 2000: 6. Hypothesis Testing Matthew Blackwell October 11, 2016 1 / 55 1. Hypothesis

Hypothesis Testing Mark Lunt Centre for Epidemiology Versus Arthritis University of Manchester

Chapter 8 Inferences Based on a Single Sample: Tests of Hypothesis The Elements of a Test of

Chapter 5.5: Hypothesis Tests 1. What is a hypothesis test? 2. The elements of a test: null and

Hypothesis Tests for Population Means Bernd Schr oder logo1 Bernd Schr oder Louisiana

Nonparametric hypothesis tests and permutation tests 1.7 &amp; 2.3. Probability Generating

STAT 113 Hypothesis Testing I Colin Reimer Dawson Oberlin College October 5, 2017 1 / 17

Topics Declaring and Instantiating Arrays Accessing Array Elements Writing Methods

Behavioral Classification of Language Primary Verbal Behaviors Non-Verbal Behavior Mand Tact

ICMP attacks against TCP Fernando Gont UTN/FRH, OpenBSD Project FIRST Technical Colloquium

INF 212/CS 253 Type Systems Instructors: Harry Xu Crista Lopes What is a Data Type? A type

Chapter 9: Testing Hypotheses In this chapter we will cover: 1. Hypothesis tests ( 9.1, 9.2

ECON2228 Notes 10 Christopher F Baum Boston College Economics 20142015 cfb (BC Econ)

Error estimates for the Galerkin finite element approximation for a linear second order hyperbolic

Solving Differential Equations Sanzheng Qiao Department of Computing and Software McMaster

Nonparametric hypothesis tests and permutation tests 1.7 & 2.3. Probability Generating