
Statistics and learning: Regression



  1. Statistics and learning: Regression. Emmanuel Rachelson and Matthieu Vignes, ISAE SupAero, Wednesday 6th November 2013.

  2. The regression model
  ◮ expresses a random variable $Y$ as a function of random variables $X \in \mathbb{R}^p$ according to $Y = f(X; \beta) + \epsilon$, where the function $f$ depends on unknown parameters $\beta_1, \dots, \beta_k$ and the residual (or error) $\epsilon$ is an unobservable random variable which accounts for random fluctuations between the model and $Y$.
  ◮ Goal: from $n$ experimental observations $(x_i, y_i)$, we aim at
    ◮ estimating the unknown parameters $(\beta_l)_{l=1 \dots k}$,
    ◮ evaluating the fitness of the model,
    ◮ and, if the fit is acceptable, performing tests on the parameters and using the model for predictions.

  3. Simple linear regression
  ◮ A single explanatory variable $X$ and an affine relationship to the dependent variable $Y$: $E[Y \mid X = x] = \beta_0 + \beta_1 x$, or $Y_i = \beta_0 + \beta_1 X_i + \epsilon_i$, where $\beta_1$ is the slope of the adjusted regression line and $\beta_0$ is the intercept.
  ◮ Residuals $\epsilon_i$ are assumed to be centred (R1), to have equal variances $\sigma^2$ (R2) and to be uncorrelated: $\mathrm{Cov}(\epsilon_i, \epsilon_j) = 0$ for all $i \neq j$ (R3).
  ◮ Hence $E[Y_i] = \beta_0 + \beta_1 x_i$, $\mathrm{Var}(Y_i) = \sigma^2$ and $\mathrm{Cov}(Y_i, Y_j) = 0$ for all $i \neq j$.
  ◮ Fitting (or adjusting) the model = estimating $\beta_0$, $\beta_1$ and $\sigma$ from the $n$-sample $(x_i, y_i)$.
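As a quick illustration (not part of the original slides), here is a minimal sketch simulating data from $Y_i = \beta_0 + \beta_1 x_i + \epsilon_i$ under assumptions R1–R3; the parameter values are arbitrary choices for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Arbitrary "true" parameters for the simulation (assumed, not from the slides).
beta0, beta1, sigma = 2.0, 0.5, 1.0
n = 100

x = rng.uniform(0.0, 10.0, size=n)    # fixed design points
eps = rng.normal(0.0, sigma, size=n)  # centred, equal-variance, independent residuals
y = beta0 + beta1 * x + eps           # the simple linear regression model
```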

  4. Least square estimate
  ◮ Seeking the values of $\beta_0$ and $\beta_1$ minimising the sum of quadratic errors: $(\hat\beta_0, \hat\beta_1) = \mathrm{argmin}_{(\beta_0, \beta_1) \in \mathbb{R}^2} \sum_i [y_i - (\beta_0 + \beta_1 x_i)]^2$. Note that $Y$ and $X$ do not play a symmetric role!
  ◮ In matrix notation (useful later): $Y = XB + \epsilon$, with $Y = (Y_1 \dots Y_n)^\top$, $B = (\beta_0, \beta_1)^\top$, $\epsilon = (\epsilon_1 \dots \epsilon_n)^\top$ and $X = \begin{pmatrix} 1 & \cdots & 1 \\ X_1 & \cdots & X_n \end{pmatrix}^\top$.
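A sketch of solving this minimisation numerically, continuing the simulated data above. It relies on the standard normal-equations solution $\hat B = (X^\top X)^{-1} X^\top Y$ of the argmin, which the slides do not spell out, so take it as the usual textbook route rather than the lecture's own derivation.

```python
# Design matrix: a column of ones (intercept) next to the observed x values.
X = np.column_stack([np.ones(n), x])

# Least-squares solution of min ||y - X B||^2; lstsq solves the normal
# equations without forming (X^T X)^{-1} explicitly.
B_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
beta0_hat, beta1_hat = B_hat
print(beta0_hat, beta1_hat)  # should be close to the simulated beta0, beta1
```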

  5. Estimator properties
  ◮ Useful notations: $\bar x = \frac{1}{n} \sum_i x_i$, $\bar y$, $s_x^2$, $s_y^2$ and $s_{xy} = \frac{1}{n-1} \sum_i (x_i - \bar x)(y_i - \bar y)$.
  ◮ Linear correlation coefficient: $r_{xy} = \frac{s_{xy}}{s_x s_y}$.
  Theorem
  1. The least square estimators are $\hat\beta_1 = s_{xy}/s_x^2$ and $\hat\beta_0 = \bar y - \hat\beta_1 \bar x$.
  2. These estimators are unbiased and efficient.
  3. $s^2 = \frac{1}{n-2} \sum_i \left[ y_i - (\hat\beta_0 + \hat\beta_1 x_i) \right]^2$ is an unbiased estimator of $\sigma^2$. It is however not efficient.
  4. $\mathrm{Var}(\hat\beta_1) = \frac{\sigma^2}{(n-1) s_x^2}$ and $\mathrm{Var}(\hat\beta_0) = \bar x^2 \, \mathrm{Var}(\hat\beta_1) + \sigma^2 / n$.
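The closed forms in point 1 are easy to check against the matrix solution above; a sketch on the same simulated data:

```python
# Closed-form estimators from the theorem.
x_bar, y_bar = x.mean(), y.mean()
s_xy = ((x - x_bar) * (y - y_bar)).sum() / (n - 1)
s_x2 = ((x - x_bar) ** 2).sum() / (n - 1)

beta1_cf = s_xy / s_x2               # beta_1-hat = s_xy / s_x^2
beta0_cf = y_bar - beta1_cf * x_bar  # beta_0-hat = y-bar - beta_1-hat * x-bar

# Unbiased estimator of sigma^2, with n - 2 degrees of freedom.
resid = y - (beta0_cf + beta1_cf * x)
s2 = (resid ** 2).sum() / (n - 2)
print(beta1_cf, beta0_cf, s2)  # first two match the lstsq solution above
```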

  6. Simple Gaussian linear model
  ◮ In addition to R1 (centred noise), R2 (equal variance noise) and R3 (uncorrelated noise), we assume (R3') $\epsilon_i$ and $\epsilon_j$ independent for all $i \neq j$, and (R4) $\epsilon_i \sim \mathcal{N}(0, \sigma^2)$ for all $i$, or equivalently $y_i \sim \mathcal{N}(\beta_0 + \beta_1 x_i, \sigma^2)$.
  ◮ Theorem: under R1, R2, R3' and R4, the least square estimators coincide with the maximum likelihood estimators.
  Theorem (Distribution of estimators)
  1. $\hat\beta_0 \sim \mathcal{N}(\beta_0, \sigma^2_{\hat\beta_0})$ and $\hat\beta_1 \sim \mathcal{N}(\beta_1, \sigma^2_{\hat\beta_1})$, with $\sigma^2_{\hat\beta_0} = \sigma^2 \left( \frac{\bar x^2}{\sum_i (x_i - \bar x)^2} + \frac{1}{n} \right)$ and $\sigma^2_{\hat\beta_1} = \frac{\sigma^2}{\sum_i (x_i - \bar x)^2}$.
  2. $(n-2) s^2 / \sigma^2 \sim \chi^2_{n-2}$.
  3. $\hat\beta_0$ and $\hat\beta_1$ are independent of the residuals $\hat\epsilon_i$.
  4. Estimators of $\sigma^2_{\hat\beta_0}$ and $\sigma^2_{\hat\beta_1}$ are given in 1. by replacing $\sigma^2$ by $s^2$.
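A Monte Carlo sketch (an addition, not on the slides) that checks point 2 of the theorem, reusing `x`, `X`, and the parameters from the earlier sketches: over repeated simulations, $(n-2)s^2/\sigma^2$ should average to the mean $n-2$ of a $\chi^2_{n-2}$ distribution.

```python
# Repeatedly re-simulate the noise and record (n - 2) s^2 / sigma^2,
# which equals the residual sum of squares divided by sigma^2.
stats = []
for _ in range(5000):
    eps_k = rng.normal(0.0, sigma, size=n)
    y_k = beta0 + beta1 * x + eps_k
    b, *_ = np.linalg.lstsq(X, y_k, rcond=None)
    rss = ((y_k - X @ b) ** 2).sum()
    stats.append(rss / sigma**2)

# A chi-square variable with n - 2 degrees of freedom has mean n - 2.
print(np.mean(stats), n - 2)  # the two numbers should be close
```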

  7. Tests, ANOVA and determination coefficient
  ◮ The previous theorem allows us to build confidence intervals for $\beta_0$ and $\beta_1$.
  ◮ $SST/n = SSR/n + SSE/n$, with $SST = \sum_i (y_i - \bar y)^2$ (total sum of squares), $SSR = \sum_i (\hat y_i - \bar y)^2$ (regression sum of squares) and $SSE = \sum_i (y_i - \hat y_i)^2$ (sum of squared errors).
  ◮ Definition: the determination coefficient is $R^2 = \frac{SSR}{SST} = \frac{\sum_i (\hat y_i - \bar y)^2}{\sum_i (y_i - \bar y)^2} = 1 - \frac{SSE}{SST} = 1 - \frac{\text{residual variance}}{\text{total variance}}$.
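To make the decomposition concrete, a short sketch computing the three sums of squares and $R^2$, continuing from the fitted line `B_hat` obtained earlier:

```python
# Fitted values from the least-squares line.
y_hat = X @ B_hat

SST = ((y - y.mean()) ** 2).sum()      # total sum of squares
SSR = ((y_hat - y.mean()) ** 2).sum()  # regression sum of squares
SSE = ((y - y_hat) ** 2).sum()         # sum of squared errors

R2 = SSR / SST
print(np.isclose(SST, SSR + SSE), R2, 1 - SSE / SST)  # decomposition holds; both R^2 forms agree
```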
