

  1. Simple linear regression
     STAT 401A - Statistical Methods for Research Workers
     Jarad Niemi, Iowa State University
     October 4, 2013

  2. Simple Linear Regression
     Recall the one-way ANOVA model:
       $Y_{ij} \overset{ind}{\sim} N(\mu_i, \sigma^2)$
     where $Y_{ij}$ is the observation for individual $j$ in group $i$.
     The simple linear regression model is
       $Y_i \overset{ind}{\sim} N(\beta_0 + \beta_1 X_i, \sigma^2)$
     where $Y_i$ and $X_i$ are the response and explanatory variable, respectively, for individual $i$.
     Terminology (the terms within each column are equivalent):
       response      explanatory
       outcome       covariate
       dependent     independent
       endogenous    exogenous
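To make the model concrete, here is a minimal Python sketch (the slides themselves use SAS) that draws responses from $Y_i \overset{ind}{\sim} N(\beta_0 + \beta_1 X_i, \sigma^2)$. The function name `simulate_slr` and the $X$ values are hypothetical; the parameter values are the fitted telomere estimates from slide 9, borrowed only as plausible inputs, not as true parameters.

```python
import random

def simulate_slr(x, beta0, beta1, sigma, seed=None):
    """Draw independent Y_i ~ N(beta0 + beta1 * X_i, sigma^2), one per X_i."""
    rng = random.Random(seed)  # private generator so results are reproducible
    # rng.gauss(mu, sd) returns one normal draw with mean mu and standard deviation sd
    return [rng.gauss(beta0 + beta1 * xi, sigma) for xi in x]

# Hypothetical X values; parameters borrowed from the slide 9 fit for illustration
x = [2, 4, 6, 8, 10, 12]
y = simulate_slr(x, beta0=1.36768, beta1=-0.02637, sigma=0.16443, seed=1)
print([round(v, 3) for v in y])
```

Unlike the one-way ANOVA model, which gives each group its own mean $\mu_i$, every mean here lies on the single line $\beta_0 + \beta_1 X_i$.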

  3. Telomere length vs. years post diagnosis
     [Figure: scatterplot of telomere length against years post diagnosis (jittered).]
     Data: R package abd, data set Telomeres.

  4. Interpretation
     $E[Y_i \mid X_i = x] = \beta_0 + \beta_1 x$ and $V[Y_i \mid X_i = x] = \sigma^2$.
     If $X_i = 0$, then $E[Y_i \mid X_i = 0] = \beta_0$, so $\beta_0$ is the expected response when the explanatory variable is zero.
     If $X_i$ increases from $x$ to $x + 1$, then
       $E[Y_i \mid X_i = x + 1] - E[Y_i \mid X_i = x] = (\beta_0 + \beta_1 x + \beta_1) - (\beta_0 + \beta_1 x) = \beta_1,$
     so $\beta_1$ is the expected increase in the response for each one-unit increase in the explanatory variable.
     $\sigma$ is the standard deviation of the response for a fixed value of the explanatory variable.
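As a numeric illustration of these interpretations, the sketch below plugs the slide 9 telomere estimates into $E[Y_i \mid X_i = x] = \beta_0 + \beta_1 x$ as if they were the true parameters (an assumption made only for illustration). The helper `expected_response` is hypothetical. At $x = 0$ it returns the intercept, and every one-unit step in $x$ changes it by the slope, whatever the starting $x$.

```python
def expected_response(x, beta0, beta1):
    """E[Y | X = x] = beta0 + beta1 * x under the simple linear regression model."""
    return beta0 + beta1 * x

# Slide 9 estimates, treated here as if they were the true parameters
b0, b1 = 1.36768, -0.02637

# At x = 0 the expected response equals the intercept
print(expected_response(0, b0, b1))  # 1.36768

# A one-unit increase in x changes the expected response by beta1 (up to float rounding)
for x in (2, 7, 11):
    print(expected_response(x + 1, b0, b1) - expected_response(x, b0, b1))
```

Read on the telomere scale, this says the fitted model expects telomere length to decrease by about 0.026 per additional year post diagnosis.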

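  5. Estimators
     Remove the mean:
       $Y_i = \beta_0 + \beta_1 X_i + e_i, \qquad e_i \overset{iid}{\sim} N(0, \sigma^2)$
     so $e_i = Y_i - (\beta_0 + \beta_1 X_i)$, which we approximate by the residual
       $r_i = \hat{e}_i = Y_i - (\hat\beta_0 + \hat\beta_1 X_i).$
     The least squares, maximum likelihood, and Bayesian (under the standard noninformative prior) estimators of the coefficients coincide:
       $\hat\beta_1 = SXY / SXX, \qquad \hat\beta_0 = \bar{Y} - \hat\beta_1 \bar{X}.$
     The variance is estimated by
       $\hat\sigma^2 = SSE / (n - 2)$, with d.f. $= n - 2$
     (this estimator is unbiased; the maximum likelihood estimator of $\sigma^2$ divides by $n$ instead), where
       $\bar{X} = \frac{1}{n}\sum_{i=1}^n X_i, \qquad \bar{Y} = \frac{1}{n}\sum_{i=1}^n Y_i,$
       $SXY = \sum_{i=1}^n (X_i - \bar{X})(Y_i - \bar{Y}),$
       $SXX = \sum_{i=1}^n (X_i - \bar{X})(X_i - \bar{X}) = \sum_{i=1}^n (X_i - \bar{X})^2,$
       $SSE = \sum_{i=1}^n r_i^2.$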
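The estimator formulas translate directly into code. This is a minimal pure-Python sketch (standard library only); `fit_slr` and the five data points are hypothetical, not the telomere data. It computes $\bar X$, $\bar Y$, SXX, SXY, the coefficient estimates, the residuals $r_i$, and $\hat\sigma^2 = SSE/(n-2)$.

```python
def fit_slr(x, y):
    """Least squares fit of y = b0 + b1 * x; returns (b0, b1, sigma2_hat, residuals)."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need paired data with at least 3 observations")
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)                       # SXX
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))  # SXY
    b1 = sxy / sxx                  # slope: SXY / SXX
    b0 = ybar - b1 * xbar           # intercept: Ybar - b1 * Xbar
    resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]         # r_i
    sse = sum(r * r for r in resid)                               # SSE
    sigma2_hat = sse / (n - 2)      # SSE / (n - 2), d.f. = n - 2
    return b0, b1, sigma2_hat, resid

# Hypothetical toy data (not the telomere data)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
b0, b1, s2, r = fit_slr(x, y)
print(round(b0, 4), round(b1, 4), round(s2, 4))  # 2.2 0.6 0.8
```

With an intercept in the model, the least squares residuals always sum to zero (up to rounding), which is a quick sanity check on the output.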

  6. Standard errors
     How precisely do $\hat\beta_0$ and $\hat\beta_1$ estimate $\beta_0$ and $\beta_1$? We quantify this uncertainty using their standard errors, each with d.f. $= n - 2$:
       $SE(\hat\beta_0) = \hat\sigma \sqrt{\frac{1}{n} + \frac{\bar{X}^2}{(n-1) s_X^2}}, \qquad SE(\hat\beta_1) = \hat\sigma \sqrt{\frac{1}{(n-1) s_X^2}},$
     where
       $s_X^2 = SXX/(n-1), \qquad s_Y^2 = SYY/(n-1), \qquad SYY = \sum_{i=1}^n (Y_i - \bar{Y})^2.$
     The correlation coefficient and the coefficient of determination are
       $r_{XY} = \frac{SXY/(n-1)}{s_X s_Y}, \qquad R^2 = r_{XY}^2 = \frac{SST - SSE}{SST}, \qquad SST = SYY = \sum_{i=1}^n (Y_i - \bar{Y})^2.$
     The coefficient of determination is the proportion (often reported as a percentage) of the total response variation explained by the explanatory variable(s).
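The standard errors, correlation, and $R^2$ can be sketched the same way. This is illustrative Python with a hypothetical helper (`slr_summary`) and hypothetical toy data; it also lets you check numerically that $R^2 = r_{XY}^2$ in simple linear regression.

```python
import math

def slr_summary(x, y):
    """Standard errors, correlation and R^2 for a simple linear regression fit."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    syy = sum((yi - ybar) ** 2 for yi in y)                       # SYY = SST
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    sse = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    sigma_hat = math.sqrt(sse / (n - 2))
    s2x = sxx / (n - 1)                                           # sample variance of X
    se_b0 = sigma_hat * math.sqrt(1 / n + xbar ** 2 / ((n - 1) * s2x))
    se_b1 = sigma_hat * math.sqrt(1 / ((n - 1) * s2x))
    s_x, s_y = math.sqrt(s2x), math.sqrt(syy / (n - 1))
    r_xy = (sxy / (n - 1)) / (s_x * s_y)                          # correlation coefficient
    r2 = (syy - sse) / syy                                        # (SST - SSE) / SST
    return {"se_b0": se_b0, "se_b1": se_b1, "r": r_xy, "R2": r2}

# Hypothetical toy data
out = slr_summary([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print({k: round(v, 4) for k, v in out.items()})
# {'se_b0': 0.9381, 'se_b1': 0.2828, 'r': 0.7746, 'R2': 0.6}
```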

  7. P-values and confidence intervals
     We can compute two-sided p-values via
       $2P\left(t_{n-2} > \left|\frac{\hat\beta_0}{SE(\hat\beta_0)}\right|\right) \quad\text{and}\quad 2P\left(t_{n-2} > \left|\frac{\hat\beta_1}{SE(\hat\beta_1)}\right|\right).$
     These test the null hypothesis that the corresponding parameter is zero.
     We can construct $100(1-\alpha)\%$ confidence intervals via
       $\hat\beta_0 \pm t_{n-2}(1-\alpha/2)\, SE(\hat\beta_0) \quad\text{and}\quad \hat\beta_1 \pm t_{n-2}(1-\alpha/2)\, SE(\hat\beta_1),$
     where $t_{n-2}(1-\alpha/2)$ is the $1-\alpha/2$ quantile of the $t$ distribution with $n-2$ degrees of freedom.
     These provide ranges of parameter values consistent with the data.

  8. P-values and confidence intervals
     [Figure: telomere length vs. years post diagnosis (jittered), as on slide 3.]

  9. P-values and confidence intervals
     SAS code (the CLB option on the MODEL statement requests the 95% confidence limits shown in the output):

       DATA t;
         INFILE 'telomeres.csv' DSD FIRSTOBS=2;
         INPUT years length;
       RUN;

       PROC REG DATA=t;
         MODEL length = years / CLB;
       RUN;

     Output:

       The REG Procedure
       Model: MODEL1
       Dependent Variable: length

       Number of Observations Read    39
       Number of Observations Used    39

       Analysis of Variance
                                 Sum of        Mean
       Source            DF     Squares      Square    F Value    Pr > F
       Model              1     0.22777     0.22777       8.42    0.0062
       Error             37     1.00033     0.02704
       Corrected Total   38     1.22810

       Root MSE           0.16443    R-Square    0.1855
       Dependent Mean     1.22026    Adj R-Sq    0.1634
       Coeff Var         13.47473

       Parameter Estimates
                         Parameter   Standard
       Variable    DF     Estimate      Error   t Value   Pr > |t|   95% Confidence Limits
       Intercept    1      1.36768    0.05721     23.91     <.0001     1.25176     1.48360
       years        1     -0.02637    0.00909     -2.90     0.0062    -0.04479    -0.00796
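Most of the summary statistics in this output follow from the sums of squares by simple arithmetic. The sketch below (Python; the helper `anova_summary` is hypothetical) recomputes them from the printed values, and also shows that in simple linear regression the slope's $t$ statistic squared equals the model $F$ statistic. Because the inputs are the rounded printed numbers, the results match the output only to about the displayed precision (for example, the recomputed Adj R-Sq can differ from the printed 0.1634 in the last digit).

```python
def anova_summary(ss_model, ss_error, df_model, df_error, y_mean):
    """Derived quantities of a regression ANOVA table, as labeled in the PROC REG output."""
    ms_model = ss_model / df_model
    ms_error = ss_error / df_error          # MSE = sigma_hat^2
    ss_total = ss_model + ss_error          # corrected total sum of squares
    df_total = df_model + df_error
    r2 = ss_model / ss_total
    return {
        "F": ms_model / ms_error,
        "root_mse": ms_error ** 0.5,        # sigma_hat
        "r_square": r2,
        "adj_r_sq": 1 - (1 - r2) * df_total / df_error,
        "coeff_var": 100 * ms_error ** 0.5 / y_mean,
    }

# Sums of squares, degrees of freedom and dependent mean as printed in the output
out = anova_summary(0.22777, 1.00033, 1, 37, 1.22026)
print({k: round(v, 4) for k, v in out.items()})

# Slope t statistic squared is (up to rounding) the model F statistic
t_slope = -0.02637 / 0.00909
print(round(t_slope ** 2, 2))
```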
