Statistics and learning — Regression
Emmanuel Rachelson and Matthieu Vignes
ISAE SupAero, Wednesday 6th November 2013
E. Rachelson & M. Vignes (ISAE) — SAD 2013
The regression model

◮ The model expresses a random variable $Y$ as a function of random variables $X \in \mathbb{R}^p$ according to
$$Y = f(X; \beta) + \varepsilon,$$
where the functional $f$ depends on unknown parameters $\beta_1, \ldots, \beta_k$ and the residual (or error) $\varepsilon$ is an unobservable random variable which accounts for random fluctuations between the model and $Y$.
◮ Goal: from $n$ experimental observations $(x_i, y_i)$, we aim at
  ◮ estimating the unknown parameters $(\beta_l)_{l = 1 \ldots k}$,
  ◮ evaluating the goodness of fit of the model,
  ◮ performing tests on the parameters and using the model for prediction, if the fit is acceptable.
Simple linear regression

◮ A single explanatory variable $X$ and an affine relationship to the dependent variable $Y$:
$$\mathbb{E}[Y \mid X = x] = \beta_0 + \beta_1 x \quad \text{or} \quad Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i,$$
where $\beta_1$ is the slope of the adjusted regression line and $\beta_0$ is the intercept.
◮ The residuals $\varepsilon_i$ are assumed to be centred (R1), to have equal variances $\mathrm{Var}(\varepsilon_i) = \sigma^2$ (R2), and to be uncorrelated: $\mathrm{Cov}(\varepsilon_i, \varepsilon_j) = 0$ for all $i \neq j$ (R3).
◮ Hence $\mathbb{E}[Y_i] = \beta_0 + \beta_1 x_i$, $\mathrm{Var}(Y_i) = \sigma^2$ and $\mathrm{Cov}(Y_i, Y_j) = 0$ for all $i \neq j$.
◮ Fitting (or adjusting) the model = estimating $\beta_0$, $\beta_1$ and $\sigma$ from the $n$-sample $(x_i, y_i)$.
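To make the model and assumptions concrete, here is a small simulation sketch (not part of the original slides): data are generated as $y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$ with i.i.d. Gaussian noise, which satisfies (R1)–(R3). The function name and all parameter values are illustrative choices.

```python
import random

def simulate_simple_linear(n, beta0, beta1, sigma, seed=0):
    """Draw n points from y_i = beta0 + beta1 * x_i + eps_i, with
    eps_i i.i.d. N(0, sigma^2): centred (R1), equal variances (R2)
    and uncorrelated (R3)."""
    rng = random.Random(seed)
    x = [i / n for i in range(n)]
    y = [beta0 + beta1 * xi + rng.gauss(0.0, sigma) for xi in x]
    return x, y

# Illustrative draw: the empirical mean of the noise is close to 0
x, y = simulate_simple_linear(100, beta0=1.0, beta1=2.0, sigma=0.1)
```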
Least squares estimate

◮ We seek values of $\beta_0$ and $\beta_1$ minimising the sum of squared errors:
$$(\hat\beta_0, \hat\beta_1) = \operatorname*{argmin}_{(\beta_0, \beta_1) \in \mathbb{R}^2} \sum_i \left[ y_i - (\beta_0 + \beta_1 x_i) \right]^2.$$
Note that $Y$ and $X$ do not play a symmetric role!
◮ In matrix notation (useful later): $Y = XB + \varepsilon$, with $Y = (Y_1, \ldots, Y_n)^\top$, $B = (\beta_0, \beta_1)^\top$, $\varepsilon = (\varepsilon_1, \ldots, \varepsilon_n)^\top$ and
$$X = \begin{pmatrix} 1 & \cdots & 1 \\ X_1 & \cdots & X_n \end{pmatrix}^{\!\top}.$$
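The matrix form suggests solving the normal equations $X^\top X B = X^\top Y$, which for the two-parameter case can be written out by hand. A minimal sketch (the function name is our own choice, not from the slides):

```python
def fit_normal_equations(x, y):
    """Solve the 2x2 normal equations X^T X B = X^T Y for
    (beta0, beta1) in the simple linear model."""
    n = len(x)
    sx = sum(x)
    sy = sum(y)
    sxx = sum(xi * xi for xi in x)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    # X^T X = [[n, sx], [sx, sxx]] and X^T Y = [sy, sxy];
    # invert the 2x2 system by Cramer's rule.
    det = n * sxx - sx * sx
    beta0 = (sxx * sy - sx * sxy) / det
    beta1 = (n * sxy - sx * sy) / det
    return beta0, beta1

# Data lying exactly on y = 1 + 2x is recovered exactly
b0, b1 = fit_normal_equations([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
```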
Estimator properties

◮ Useful notations: $\bar x = \frac{1}{n} \sum_i x_i$, $\bar y$, $s_x^2$, $s_y^2$ and $s_{xy} = \frac{1}{n-1} \sum_i (x_i - \bar x)(y_i - \bar y)$.
◮ Linear correlation coefficient: $r_{xy} = \frac{s_{xy}}{s_x s_y}$.

Theorem
1. The least squares estimators are $\hat\beta_1 = s_{xy} / s_x^2$ and $\hat\beta_0 = \bar y - \hat\beta_1 \bar x$.
2. These estimators are unbiased and efficient.
3. $s^2 = \frac{1}{n-2} \sum_i \left[ y_i - (\hat\beta_0 + \hat\beta_1 x_i) \right]^2$ is an unbiased estimator of $\sigma^2$. It is however not efficient.
4. $\mathrm{Var}(\hat\beta_1) = \frac{\sigma^2}{(n-1) s_x^2}$ and $\mathrm{Var}(\hat\beta_0) = \bar x^2 \, \mathrm{Var}(\hat\beta_1) + \sigma^2 / n$.
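The closed-form estimators of points 1. and 3. of the theorem translate directly into code; a sketch under the slide's formulas (names are our own):

```python
def ls_estimates(x, y):
    """Least squares estimates: beta1_hat = s_xy / s_x^2,
    beta0_hat = ybar - beta1_hat * xbar, and the unbiased
    residual variance estimate s^2 (n - 2 degrees of freedom)."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    s2_x = sum((xi - xbar) ** 2 for xi in x) / (n - 1)
    s_xy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / (n - 1)
    beta1 = s_xy / s2_x
    beta0 = ybar - beta1 * xbar
    s2 = sum((yi - (beta0 + beta1 * xi)) ** 2
             for xi, yi in zip(x, y)) / (n - 2)
    return beta0, beta1, s2
```

On noiseless data lying on a line, $s^2$ comes out as 0, as expected.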
Simple Gaussian linear model

◮ In addition to R1 (centred noise), R2 (equal variance noise) and R3 (uncorrelated noise), we assume (R3') $\varepsilon_i$ and $\varepsilon_j$ independent for all $i \neq j$, and (R4) $\varepsilon_i \sim \mathcal{N}(0, \sigma^2)$ for all $i$, or equivalently $y_i \sim \mathcal{N}(\beta_0 + \beta_1 x_i, \sigma^2)$.
◮ Theorem: under (R1, R2, R3' and R4), the least squares estimators coincide with the maximum likelihood estimators.

Theorem (Distribution of estimators)
1. $\hat\beta_0 \sim \mathcal{N}(\beta_0, \sigma^2_{\hat\beta_0})$ and $\hat\beta_1 \sim \mathcal{N}(\beta_1, \sigma^2_{\hat\beta_1})$, with $\sigma^2_{\hat\beta_0} = \sigma^2 \left( \frac{\bar x^2}{\sum_i (x_i - \bar x)^2} + \frac{1}{n} \right)$ and $\sigma^2_{\hat\beta_1} = \frac{\sigma^2}{\sum_i (x_i - \bar x)^2}$.
2. $(n-2) s^2 / \sigma^2 \sim \chi^2_{n-2}$.
3. $\hat\beta_0$ and $\hat\beta_1$ are independent of the $\hat\varepsilon_i$.
4. Estimators of $\sigma^2_{\hat\beta_0}$ and $\sigma^2_{\hat\beta_1}$ are obtained from 1. by replacing $\sigma^2$ with $s^2$.
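Point 4. amounts to plugging $s^2$ into the two variance formulas, which is what one does in practice to get standard errors for confidence intervals. A sketch (function name is our own):

```python
def estimated_variances(x, s2):
    """Plug s^2 in place of sigma^2 in the theorem's formulas:
    var(beta1_hat) = sigma^2 / sum((x_i - xbar)^2) and
    var(beta0_hat) = sigma^2 * (xbar^2 / sum((x_i - xbar)^2) + 1/n)."""
    n = len(x)
    xbar = sum(x) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    var_b1 = s2 / sxx
    var_b0 = s2 * (xbar ** 2 / sxx + 1.0 / n)
    return var_b0, var_b1
```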
Tests, ANOVA and determination coefficient

◮ The previous theorem allows us to build confidence intervals for $\beta_0$ and $\beta_1$.
◮ $SST/n = SSR/n + SSE/n$, with $SST = \sum_i (y_i - \bar y)^2$ (total sum of squares), $SSR = \sum_i (\hat y_i - \bar y)^2$ (regression sum of squares) and $SSE = \sum_i (y_i - \hat y_i)^2$ (sum of squared errors).
◮ Definition: the determination coefficient is $R^2 = \frac{\sum_i (\hat y_i - \bar y)^2}{\sum_i (y_i - \bar y)^2} = \frac{SSR}{SST} = 1 - \frac{SSE}{SST} = 1 - \frac{\text{residual variance}}{\text{total variance}}$.
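The ANOVA decomposition and $R^2$ can be checked numerically; for a least squares fit, $SST = SSR + SSE$ holds (and so does the $/n$ version on the slide). A sketch, with names of our own choosing:

```python
def anova_r_squared(x, y, beta0, beta1):
    """Compute SST, SSR, SSE for a fitted line and return them
    together with R^2 = SSR / SST."""
    n = len(y)
    ybar = sum(y) / n
    yhat = [beta0 + beta1 * xi for xi in x]
    sst = sum((yi - ybar) ** 2 for yi in y)
    ssr = sum((yh - ybar) ** 2 for yh in yhat)
    sse = sum((yi - yh) ** 2 for yi, yh in zip(y, yhat))
    return sst, ssr, sse, ssr / sst

# For x = [0,1,2,3], y = [0,1,2,4], the least squares line is
# y = -0.2 + 1.3 x; SST = SSR + SSE then holds up to rounding.
sst, ssr, sse, r2 = anova_r_squared([0.0, 1.0, 2.0, 3.0],
                                    [0.0, 1.0, 2.0, 4.0], -0.2, 1.3)
```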