SLIDE 1
SIMPLE REGRESSION ANALYSIS Business Statistics CONTENTS Ordinary least squares (recap for some) Statistical formulation of the regression model Assessing the regression model Testing the regression coefficients The ANOVA table Old exam
SLIDE 2
SLIDE 3
Idea of “curve fitting” in a scatterplot ▪ linear fit: y = a + bx (x = floor area of house, y = price of house) ORDINARY LEAST SQUARES
SLIDE 4
You find the “best” line
▪ by minimizing the “misfit” (eᵢ) between the observed value (yᵢ) and the modelled/estimated value (ŷᵢ = a + bxᵢ)
▪ eᵢ = yᵢ − ŷᵢ
▪ in fact by minimizing the sum of squared misfits
▪ ∑ᵢ₌₁ⁿ eᵢ²
▪ OLS regression ORDINARY LEAST SQUARES
The hat (^) is our symbol for the estimate
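The minimization above has a closed-form solution; a minimal sketch in Python (the helper name `ols_fit` and the toy data are illustrative, not from the slides):

```python
def ols_fit(x, y):
    """Least-squares line y = a + b*x, minimizing sum((y_i - yhat_i)**2)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Closed-form OLS slope: b = Sxy / Sxx
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b = sxy / sxx
    a = y_bar - b * x_bar  # the fitted line passes through (x_bar, y_bar)
    return a, b

# Toy data lying exactly on y = 1 + 2x, so the misfit is zero
a, b = ols_fit([1, 2, 3, 4], [3, 5, 7, 9])
print(a, b)  # 1.0 2.0
```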
SLIDE 5
Rephrasing the model y = a + bx as a statistical model
Assumptions and notation
▪ we assume a linear relation of the form of the population regression model Yᵢ = β₀ + β₁Xᵢ + εᵢ
▪ or Y = β₀ + β₁X + ε
▪ with
▪ β₀ the intercept or constant
▪ β₁ the slope or slope coefficient
▪ random variable εᵢ the error or residual, the “unexplained part”
STATISTICAL FORMULATION OF THE REGRESSION MODEL
We prefer to use β₀ instead of a for the constant, and β₁ instead of b for the slope
SLIDE 6
Estimation of the model coefficients
▪ we assume that εᵢ ~ N(0, σ²)
▪ based on a sample of n paired data points (xᵢ, yᵢ), i = 1, …, n
▪ use OLS to estimate the best line through the estimated regression model Ŷ = b₀ + b₁X or ŷᵢ = b₀ + b₁xᵢ
▪ the estimated coefficients (b₀ for β₀ and b₁ for β₁) and the estimated errors (eᵢ for εᵢ) correspond to yᵢ = b₀ + b₁xᵢ + eᵢ
STATISTICAL FORMULATION OF THE REGRESSION MODEL
SLIDE 7
STATISTICAL FORMULATION OF THE REGRESSION MODEL
[Figure: scatterplot showing a data point (xᵢ, yᵢ), the fitted line Ŷ = b₀ + b₁X, the fitted value (xᵢ, ŷᵢ), the residual eᵢ, the intercept b₀ and the slope b₁]
SLIDE 8
So
▪ b₀ is the estimated value of β₀
▪ the intercept or constant of the regression line
▪ b₁ is the estimated value of β₁
▪ the slope or slope coefficient of the regression line
▪ eᵢ is the estimated residual or error for observation i
▪ the “misfit”
STATISTICAL FORMULATION OF THE REGRESSION MODEL
SLIDE 9
Look back at the house prices ▪ where we have found the line ŷ = −264700 + 6152x
- a. Give the theoretical model
- b. Give the estimated model
EXERCISE 1
SLIDE 10
OLS will always give an estimate for β₀ and β₁
▪ the line of “best fit”
But is “best” also “good enough” to make good predictions?
▪ can we do a statistical test on the quality of the model?
We have minimized the sum of squares (SS) of the error:
SSE = ∑ᵢ₌₁ⁿ eᵢ² = ∑ᵢ₌₁ⁿ (yᵢ − ŷᵢ)²
We would like to compare this with:
▪ the “total” sum of squares SST
▪ the “explained” sum of squares SSR
ASSESSING THE REGRESSION MODEL
“R” stands for “regression”
SLIDE 11
Total sum of squares: SST = ∑ᵢ₌₁ⁿ (yᵢ − ȳ)²
ASSESSING THE REGRESSION MODEL
So SST is the total variation around the mean ȳ
SLIDE 12
Regression sum of squares: SSR = ∑ᵢ₌₁ⁿ (ŷᵢ − ȳ)²
▪ So,
▪ the data has a total variability SST
▪ the regression model explains a variability SSR
▪ and the residual variability is SSE
▪ and SST = SSR + SSE
Coefficient of determination (“R-square”): R² = SSR/SST = 1 − SSE/SST
ASSESSING THE REGRESSION MODEL
So SSR is the variation around the mean ȳ that is explained by the model
SLIDE 13
R² is a measure of the usefulness of the model
▪ Properties
▪ 0 ≤ R² ≤ 1
▪ R² = 0 means the model doesn’t explain anything
▪ R² = 1 means the model explains everything
▪ in between, the model explains R² × 100% of the variance of Y
ASSESSING THE REGRESSION MODEL
R² = 1 − SSE/SST
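The decomposition SST = SSR + SSE and the resulting R² can be checked numerically; a sketch with hypothetical data (toy numbers, not the house-price sample):

```python
# Hypothetical sample
x = [1, 2, 3, 4, 5]
y = [2, 3, 5, 4, 6]
n = len(x)

# OLS fit
x_bar, y_bar = sum(x) / n, sum(y) / n
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / \
    sum((xi - x_bar) ** 2 for xi in x)
b0 = y_bar - b1 * x_bar
y_hat = [b0 + b1 * xi for xi in x]

# Sums of squares
sst = sum((yi - y_bar) ** 2 for yi in y)               # total
ssr = sum((yh - y_bar) ** 2 for yh in y_hat)           # explained
sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))  # residual

r_squared = ssr / sst  # equivalently 1 - sse/sst
print(round(sst, 6), round(ssr + sse, 6))  # equal: SST = SSR + SSE
print(round(r_squared, 2))  # 0.81
```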
SLIDE 14
If R² > 0, the regression model explains “something”
▪ but in a random sample, R² may be non-zero due to chance
▪ when is R² “significantly” different from 0?
Finding a test statistic
▪ look at the variances associated with SSR and SSE
▪ so define the mean sums of squares (MS) (variances!)
▪ MST = SST/(n−1); MSR = SSR/1; MSE = SSE/(n−2)
▪ use MSR/MSE = (SSR/1)/(SSE/(n−2)) as a ratio of two variances
ASSESSING THE REGRESSION MODEL
SLIDE 15
Statistical test:
▪ H₀: the independent variable (X) does not explain the variation in the dependent variable (Y)
▪ i.e., H₀: β₁ = 0 versus H₁: β₁ ≠ 0
▪ Sample statistic: F = MSR/MSE; reject for large values
▪ Under H₀: F ~ F(1, n−2); assumptions: see model
▪ Compare F_calc = MSR/MSE with F_crit = F(1, n−2; α)
▪ or compute the p-value as the probability of obtaining F_calc or more extreme if H₀ is true
ASSESSING THE REGRESSION MODEL
SLIDE 16
Using SPSS, three types of output
Model summary
▪ R²
Variance decomposition (ANOVA?)
▪ SSR, SSE, SST
▪ MSR, MSE
▪ F_calc
▪ p-value
Regression coefficients
▪ b₀ and b₁
ASSESSING THE REGRESSION MODEL
SLIDE 17
The model is Y = β₀ + β₁X + ε
▪ OLS extracts estimates from the data: b₀ and b₁
▪ But how accurate are these estimates?
We can also find the distribution of B₀ and B₁
▪ So, we can find confidence intervals and perform hypothesis tests
B₀ and B₁ are t-distributed:
▪ (B₀ − β₀)/S_B₀ ~ t(n−2)
▪ (B₁ − β₁)/S_B₁ ~ t(n−2)
ASSESSING THE REGRESSION MODEL
SLIDE 18
Mind the notation, like before:
▪ mean
▪ population value μ_X
▪ sample estimate x̄
▪ sampling distribution of random variable X̄
▪ regression slope
▪ population value β₁
▪ sample estimate b₁
▪ sampling distribution of random variable B₁
ASSESSING THE REGRESSION MODEL
When you’re careless with this, everything gets mixed up into one big abracadabra!
SLIDE 19
- a. Is the model significant?
- b. Does the model have practical relevance?
EXERCISE 2
SLIDE 20
▪ Testing β₀ is usually not interesting
▪ but testing β₁ is!
▪ in particular, the hypothesis β₁ = 0 is often interesting
▪ i.e., the hypothesis that there is no relation between X and Y
▪ or: that knowledge of X doesn’t tell you anything about Y
▪ This test requires the standard deviation of B₁
▪ it is calculated from the data; see computer output
▪ here s_B₁ = 347.578
TESTING THE REGRESSION COEFFICIENTS
SLIDE 21
So: t_calc = (b₁ − β₁)/s_B₁ = (6151.670 − 0)/347.578 = 17.699
▪ which has to be compared to t_crit = ±t(0.025; 69)
▪ reject H₀: β₁ = 0, because t_calc > t_crit
▪ or with the p-value: p = 0.000 ≪ 0.05
▪ and conclude that the slope differs significantly from zero
▪ post-hoc conclusion: it is larger than zero
TESTING THE REGRESSION COEFFICIENTS
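The arithmetic of this test can be reproduced directly from the estimates reported on the slide (b₁ = 6151.670 and s_B₁ = 347.578):

```python
# t test for H0: beta1 = 0, using the slide's reported estimates
b1 = 6151.670
s_b1 = 347.578
beta1_h0 = 0

t_calc = (b1 - beta1_h0) / s_b1
print(round(t_calc, 3))  # 17.699, matching the slide
```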
SLIDE 22
Testing the regression model
▪ on the basis of MSR/MSE ~ F(1, n−2)
Testing the regression coefficient b₁
▪ on the basis of (B₁ − 0)/S_B₁ ~ t(n−2)
The two approaches are equivalent
▪ they have the same null hypothesis: H₀: β₁ = 0
▪ they lead to the same conclusion (rejection or no rejection)
▪ they lead to the same p-value
▪ when we do multiple regression with several explanatory variables this is not the case! See later.
TESTING THE REGRESSION COEFFICIENTS
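The equivalence can be seen numerically: in simple regression F_calc equals t_calc². A sketch with hypothetical data (toy numbers, not the house-price sample; s_b1 = sqrt(MSE/Sxx) is the standard formula for the slope's standard error):

```python
import math

# Hypothetical sample
x = [1, 2, 3, 4, 5]
y = [2, 3, 5, 4, 6]
n = len(x)

# OLS fit
x_bar, y_bar = sum(x) / n, sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
b0 = y_bar - b1 * x_bar
y_hat = [b0 + b1 * xi for xi in x]

# F test: MSR / MSE
ssr = sum((yh - y_bar) ** 2 for yh in y_hat)
sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))
mse = sse / (n - 2)
f_calc = (ssr / 1) / mse

# t test: b1 / s_b1, with s_b1 = sqrt(MSE / Sxx)
s_b1 = math.sqrt(mse / sxx)
t_calc = b1 / s_b1

print(abs(f_calc - t_calc ** 2) < 1e-6)  # True: F equals t squared
```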
SLIDE 23
We can also perform tests other than H₀: β₁ = 0
▪ Case 1: different test values for β₁
▪ for example H₀: β₁ = 2
▪ t_calc = (b₁ − 2)/s_B₁
▪ not in SPSS, but easily calculated using s_B₁
▪ Case 2: one-sided tests
▪ for example H₀: β₁ ≥ 0
▪ t_calc as before, but now tested with a different t_crit
▪ not in SPSS, but also easily calculated using the two-sided p-value
▪ Case 3: combination of case 1 and case 2
▪ for example H₀: β₁ ≥ 2
▪ Try all! (see tutorials)
TESTING THE REGRESSION COEFFICIENTS
SLIDE 24
Example of case 3:
▪ is there evidence that the price per square meter is larger than €5500?
▪ H₀: β₁ ≤ 5500; H₁: β₁ > 5500; α = 0.05
▪ t_calc = (6151.670 − 5500)/347.578 = 1.875 > t_crit ≈ 1.7
▪ reject H₀
▪ conclude that the price per m² is significantly larger than €5500
TESTING THE REGRESSION COEFFICIENTS
One-sided critical value, with α, not α/2
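The one-sided calculation, again using the reported estimates b₁ = 6151.670 and s_B₁ = 347.578 (the critical value ≈ 1.7 is taken from the slide, not computed here):

```python
b1 = 6151.670
s_b1 = 347.578

# H0: beta1 <= 5500 vs H1: beta1 > 5500 (one-sided, test value 5500)
t_calc = (b1 - 5500) / s_b1
print(round(t_calc, 3))  # 1.875, above the one-sided t_crit of about 1.7
```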
SLIDE 25
One may also test β₀ in exactly the same way
▪ however, this is hardly ever useful
The overall significance of the F-test depends only on B₁/S_B₁, not on B₀/S_B₀
▪ that is because the slope explains variation
▪ while the intercept is only a vertical shift
TESTING THE REGRESSION COEFFICIENTS
SLIDE 26
One of the regression results is the ANOVA table
ANOVA = analysis of variance
▪ Excel
▪ SPSS
THE ANOVA TABLE
SLIDE 27
What was ANOVA?
▪ ANOVA: Y numerical; X categorical
▪ regression: Y numerical; X numerical
So ANOVA is really different from regression
▪ why then an ANOVA table in regression?
Because ANOVA and regression both decompose the total variance (SST) into
▪ an explained part
▪ SSA in ANOVA (factor “A”); SSR in regression (“Regression”)
▪ an unexplained part
▪ SSW in ANOVA (“Within”); SSE in regression (“Error”)
THE ANOVA TABLE
SLIDE 28
The ANOVA table for regression
▪ MS* = SS*/df*, with * = R, E
▪ F_calc = MSR/MSE and associated p-value
THE ANOVA TABLE
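A sketch of how the table's rows fit together (hypothetical data; the Source/SS/df/MS layout mimics the Excel and SPSS output):

```python
# Hypothetical sample and OLS fit
x = [1, 2, 3, 4, 5]
y = [2, 3, 5, 4, 6]
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / \
    sum((xi - x_bar) ** 2 for xi in x)
b0 = y_bar - b1 * x_bar
y_hat = [b0 + b1 * xi for xi in x]

sst = sum((yi - y_bar) ** 2 for yi in y)
ssr = sum((yh - y_bar) ** 2 for yh in y_hat)
sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))

# ANOVA table rows: (source, SS, df, MS), with MS* = SS*/df*
table = [
    ("Regression", ssr, 1,     ssr / 1),
    ("Error",      sse, n - 2, sse / (n - 2)),
    ("Total",      sst, n - 1, None),  # no MS for the Total row
]
f_calc = table[0][3] / table[1][3]  # F_calc = MSR / MSE
for source, ss, df, ms in table:
    print(source, round(ss, 3), df, ms if ms is None else round(ms, 3))
```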
SLIDE 29
21 May 2015, Q2c OLD EXAM QUESTION
SLIDE 30