  1. Lecture 13: Simple Linear Regression and Correlation
     219323 Probability and Statistics for Software and Knowledge Engineers, 2/22/2007
     Monchai Sopitkamon, Ph.D.
     Outline:
     - The Simple Linear Regression Model (12.1)
     - Fitting the Regression Line (12.2)
     - The Analysis of Variance Table (12.6)
     - Residual Analysis (12.7)
     - Correlation Analysis (12.9)

  2. The Simple Linear Regression Model I-II (12.1)
     - Purpose of regression analysis: predict the value of a dependent (response) variable from the values of one or more explanatory (independent) variables, also called predictors or factors.
     - Purpose of correlation analysis: measure the strength of the association between two variables.
     - Simple linear regression model: $y_i = \beta_0 + \beta_1 x_i + \epsilon_i$ with $\epsilon_i \sim N(0, \sigma^2)$, equivalently $Y_i \sim N(\beta_0 + \beta_1 x_i, \sigma^2)$.
     - $\beta_0$ is the intercept parameter and $\beta_1$ is the slope parameter.
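The model above can be illustrated by simulating data from it. This is only a sketch: the parameter values and the grid of x values below are invented for the example, not taken from the lecture.

```python
import numpy as np

# Hypothetical illustration of the model Y_i ~ N(beta0 + beta1*x_i, sigma^2);
# the parameter values and x grid are made up for this sketch.
rng = np.random.default_rng(0)
beta0, beta1, sigma = 0.4, 0.5, 0.17       # assumed intercept, slope, error SD
x = np.linspace(3.0, 6.5, 12)              # explanatory variable values
eps = rng.normal(0.0, sigma, size=x.size)  # error terms, N(0, sigma^2)
y = beta0 + beta1 * x + eps                # observed responses
print(y.round(3))
```

Each response is the straight-line mean beta0 + beta1*x plus an independent normal error, which is exactly the distributional statement of the model.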

  3. The Simple Linear Regression Model III-IV (12.1)
     [Figure: interpretation of the error variance $\sigma^2$.]
     - $\beta_1 > 0$: positive relationship; $\beta_1 = 0$: no relationship; $\beta_1 < 0$: negative relationship.
     - The SLR model is not appropriate for a nonlinear relationship.
     [Figure: scatterplot of a nonlinear relationship, for which the SLR model is inappropriate.]

  4. The Simple Linear Regression Model V (12.1)
     - Ex.67 pg.536: Car Plant Electricity Usage.
     [Figure: scatterplot of electricity usage against production.]

  5. Fitting the Regression Line I-II (12.2)
     - Selecting the "best" line: minimize the errors between the observed and estimated values of y (the least squares fit).
     - Fitted line: $\hat{y}_i = \hat\beta_0 + \hat\beta_1 x_i$, where $\hat{y}_i$ is the predicted value of y for observation i and $x_i$ is the value of the explanatory variable for observation i.
     - $\hat\beta_0$ and $\hat\beta_1$ are chosen to minimize $SSE = \sum_{i=1}^n e_i^2 = \sum_{i=1}^n (y_i - \hat{y}_i)^2 = \sum_{i=1}^n (y_i - (\hat\beta_0 + \hat\beta_1 x_i))^2$.
     - The resulting residuals satisfy $\sum_{i=1}^n e_i = 0$.

  6. Fitting the Regression Line III-IV (12.2)
     - Method of least squares: $\hat\beta_1 = \frac{\sum_{i=1}^n x_i y_i - n \bar{x} \bar{y}}{\sum_{i=1}^n x_i^2 - n \bar{x}^2}$ and $\hat\beta_0 = \bar{y} - \hat\beta_1 \bar{x}$.
     - Variance of the errors: $\hat\sigma^2 = \frac{SSE}{n-2}$; the divisor is n - 2 because two regression parameters must be estimated first.
     - Ex.67 pg.545: Car Plant Electricity Usage. With n = 12, $\bar{x} = 4.885$, $\bar{y} = 2.846$, $\sum x_i^2 = 291.231$, $\sum x_i y_i = 169.253$:
       $\hat\beta_1 = \frac{169.253 - 12 \times 4.885 \times 2.846}{291.231 - 12 \times 4.885^2} = 0.4988$
       $\hat\beta_0 = 2.846 - 0.4988 \times 4.885 = 0.409$
     - Therefore $\hat{y} = 0.409 + 0.499 x$.
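The Ex.67 arithmetic can be checked directly from the summary statistics quoted above. Note the quoted values are rounded, so the computed estimates can differ from the full-data fit ŷ = 0.409 + 0.499x in the third decimal place.

```python
# Checking the Ex.67 least-squares arithmetic from rounded summary statistics.
n = 12
xbar, ybar = 4.885, 2.846   # sample means
sum_x2 = 291.231            # sum of x_i^2
sum_xy = 169.253            # sum of x_i * y_i

b1 = (sum_xy - n * xbar * ybar) / (sum_x2 - n * xbar**2)  # slope estimate
b0 = ybar - b1 * xbar                                     # intercept estimate
print(f"slope = {b1:.3f}, intercept = {b0:.3f}")
```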

  7. Fitting the Regression Line V (12.2)
     - Ex.67 pg.545: Car Plant Electricity Usage.
     [Figure: scatterplot of electricity usage against production with the fitted line y = 0.498x + 0.409, R² = 0.802.]


  9. The Analysis of Variance Table: Sum of Squares Decomposition I-II (12.6.1)
     - Apply an ANOVA approach similar to the one-factor layout in Chapter 11, considering the variability in the dependent variable y.
     - Hypothesis test: $H_0: \beta_1 = 0$.
     - Total sum of squares: $SST = \sum_{i=1}^n (y_i - \bar{y})^2$.
     - Error sum of squares: $SSE = \sum_{i=1}^n (y_i - \hat{y}_i)^2$.
     - Regression sum of squares: $SSR = \sum_{i=1}^n (\hat{y}_i - \bar{y})^2 = SST - SSE$.
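The identity SST = SSR + SSE can be verified numerically. The small sample below is made up for the sketch; any data fitted by least squares would satisfy it.

```python
import numpy as np

# Numerical check of the decomposition SST = SSR + SSE on a made-up sample.
x = np.array([3.1, 3.8, 4.2, 4.9, 5.5, 6.1])
y = np.array([2.0, 2.4, 2.5, 2.9, 3.1, 3.5])
n = x.size

# Least squares fit (same formulas as in section 12.2)
b1 = ((x * y).sum() - n * x.mean() * y.mean()) / ((x**2).sum() - n * x.mean()**2)
b0 = y.mean() - b1 * x.mean()
yhat = b0 + b1 * x

sst = ((y - y.mean())**2).sum()      # total sum of squares
sse = ((y - yhat)**2).sum()          # error sum of squares
ssr = ((yhat - y.mean())**2).sum()   # regression sum of squares
print(f"SST = {sst:.6f}, SSR + SSE = {ssr + sse:.6f}")
```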

  10. The Analysis of Variance Table: Sum of Squares Decomposition III-IV (12.6.1)
      [Figure: the sums of squares for a simple linear regression.]
      The analysis of variance table for a simple linear regression analysis:

      Source      df     SS    MS                F
      Regression  1      SSR   MSR = SSR/1       F = MSR/MSE
      Error       n - 2  SSE   MSE = SSE/(n-2)
      Total       n - 1  SST

      - Hypothesis test: $H_0: \beta_1 = 0$. The two-sided p-value is $p = P(X > F)$, where X is a random variable with an $F_{1,n-2}$ distribution.

  11. The Analysis of Variance Table: Sum of Squares Decomposition V-VI (12.6.1)
      - Coefficient of determination $R^2$: the fraction of the variation explained by the regression, $R^2 = \frac{SSR}{SST} = \frac{SST - SSE}{SST} = 1 - \frac{SSE}{SST}$, with $0 \le R^2 \le 1$.
      - The closer $R^2$ is to one, the better the regression model.
      [Figure: two scatterplots; the coefficient of determination R² is larger in scenario II than in scenario I.]

  12. The Analysis of Variance Table: Sum of Squares Decomposition VII (12.6.1)
      - Ex.67 pg.572: Car Plant Electricity Usage.
        $F = \frac{MSR}{MSE} = \frac{1.2124}{0.0299} = 40.53$
        $R^2 = \frac{SSR}{SST} = \frac{1.2124}{1.5115} = 0.802$
      - The higher the value of $R^2$, the better the regression.
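The Ex.67 test statistic, R², and the corresponding p-value can be recomputed from the quoted sums of squares. This is a sketch: SciPy's F distribution is used only to evaluate the tail probability P(X > F).

```python
from scipy import stats  # used only for the F-distribution tail probability

# Recomputing the Ex.67 ANOVA quantities from the quoted sums of squares.
ssr, sst, mse = 1.2124, 1.5115, 0.0299
n = 12
msr = ssr / 1                            # the regression has 1 degree of freedom
f_stat = msr / mse                       # F = MSR / MSE
r2 = ssr / sst                           # coefficient of determination
p_value = stats.f.sf(f_stat, 1, n - 2)   # P(X > F) for X ~ F_{1, n-2}
print(f"F = {f_stat:.2f}, R^2 = {r2:.3f}, p = {p_value:.5f}")
```

The tiny p-value means H0: β1 = 0 is rejected, consistent with the strong fit seen in the scatterplot.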

  13. Residual Analysis Methods I-II (12.7.1)
      - Residuals: the differences between the observed values of the dependent variable and the corresponding predicted (fitted) values, $e_i = y_i - \hat{y}_i$, $1 \le i \le n$.
      - Residual analysis can be used to identify outliers, check whether the fitted model is good, check whether the error variance is constant, and check whether the error terms are normally distributed.
      - Plot the residuals $e_i$ against the values of the explanatory variable $x_i$; a random scatter indicates no problem with the fitted regression model.
      - If the standardized residual $e_i / \hat\sigma$ exceeds 3 in absolute value, data point i is an outlier.
      - If there are outliers, they should be removed and the regression line fitted again.
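The outlier screen above can be sketched as follows. The data are artificial: points lying exactly on a line, with one observation shifted upward so that its standardized residual exceeds 3.

```python
import numpy as np

# Sketch of the outlier screen: flag points whose standardized residual
# e_i / sigma_hat exceeds 3 in absolute value. Data are artificial.
x = np.arange(1.0, 21.0)   # 20 x values
y = 0.4 + 0.5 * x          # points exactly on a line...
y[9] += 4.0                # ...except one perturbed observation
n = x.size

b1 = ((x * y).sum() - n * x.mean() * y.mean()) / ((x**2).sum() - n * x.mean()**2)
b0 = y.mean() - b1 * x.mean()
resid = y - (b0 + b1 * x)                        # residuals e_i
sigma_hat = np.sqrt((resid**2).sum() / (n - 2))  # sigma_hat^2 = SSE/(n-2)
standardized = resid / sigma_hat
print(np.where(np.abs(standardized) > 3)[0])     # indices of flagged outliers
```

Note that a gross outlier also inflates sigma_hat, so with very small samples a single outlier may never reach the threshold of 3; the screen is more reliable for moderate n.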

  14. Residual Analysis Methods III-IV (12.7.1)
      [Figure: residual plot indicating points that may be outliers.]
      - If the residual plot shows positive and negative residuals grouped together, a linear model is not suitable.
      [Figure: a grouping of positive and negative residuals, indicating that the linear model is inappropriate.]

  15. Residual Analysis Methods V-VI (12.7.1)
      - If the residual plot shows a "funnel shape", the variance of the error ($\sigma^2$) is not constant, conflicting with the model assumption.
      [Figure: a funnel shape in the residual plot, indicating a non-constant error variance.]
      - A normal probability plot (normal scores plot) of the residuals can be used to check whether the error terms $\epsilon_i$ are normally distributed.
      [Figure: a normal scores plot of a simulated sample from a normal distribution, showing the points lying approximately on a straight line.]
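The normal scores check can be sketched with SciPy's probability-plot helper: for residuals that really are normal, the ordered residuals plotted against normal quantiles fall close to a straight line, so the plot's correlation coefficient is close to 1. The residual sample here is simulated.

```python
import numpy as np
from scipy import stats

# Normal scores check: the probability-plot correlation is close to 1
# when the residuals come from a normal distribution.
rng = np.random.default_rng(1)
resid = rng.normal(0.0, 0.17, size=100)  # simulated normal residuals
(osm, osr), (slope, intercept, r) = stats.probplot(resid)
print(f"probability-plot correlation r = {r:.3f}")
```

In practice the plot itself is inspected visually; a markedly lower correlation, or visible curvature, points to non-normal errors.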

  16. Residual Analysis Methods VII (12.7.1)
      - A nonlinear pattern in the normal scores plot indicates a non-normal distribution of the residuals, in which case the linear modeling approach may not be appropriate.
      [Figure: normal scores plots of simulated samples from non-normal distributions, showing nonlinear patterns.]

  17. The Sample Correlation Coefficient I-II (12.9.1)
      - From the correlation equation in Section 2.5.4, $\rho = Corr(X, Y) = \frac{Cov(X, Y)}{\sqrt{Var(X) Var(Y)}}$, which measures the strength of the linear association between two jointly distributed random variables X and Y.
      - The sample correlation coefficient r for a set of paired data observations $(x_i, y_i)$ is
        $r = \frac{\sum_{i=1}^n (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^n (x_i - \bar{x})^2 \sum_{i=1}^n (y_i - \bar{y})^2}} = \frac{\sum_{i=1}^n x_i y_i - n \bar{x} \bar{y}}{\sqrt{(\sum_{i=1}^n x_i^2 - n \bar{x}^2)(\sum_{i=1}^n y_i^2 - n \bar{y}^2)}}$
        with $-1 \le r \le 1$.
      - r = 0: no linear association; r < 0: negative linear association; r > 0: positive linear association.
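The sample correlation formula above can be computed directly and cross-checked against NumPy's built-in corrcoef. The (x, y) pairs are made up for the sketch.

```python
import numpy as np

# Sample correlation r from the paired-data formula, cross-checked
# against numpy's corrcoef; the data pairs are made up for the sketch.
x = np.array([3.0, 3.6, 4.1, 4.7, 5.2, 5.9, 6.4])
y = np.array([2.1, 2.3, 2.6, 2.7, 3.0, 3.3, 3.6])

num = ((x - x.mean()) * (y - y.mean())).sum()
den = np.sqrt(((x - x.mean())**2).sum() * ((y - y.mean())**2).sum())
r = num / den
print(round(float(r), 4), round(float(np.corrcoef(x, y)[0, 1]), 4))
```

Since the y values here increase steadily with x, r comes out strongly positive.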
