SLIDE 1

Statistics and learning

Regression

Emmanuel Rachelson and Matthieu Vignes

ISAE SupAero

Wednesday 6th November 2013

SLIDE 2

The regression model

◮ expresses a random variable $Y$ as a function of random variables $X \in \mathbb{R}^p$ according to $Y = f(X; \beta) + \varepsilon$, where the functional $f$ depends on unknown parameters $\beta_1, \dots, \beta_k$ and the residual (or error) $\varepsilon$ is an unobservable random variable which accounts for random fluctuations between the model and $Y$.

◮ Goal: from $n$ experimental observations $(x_i, y_i)$, we aim at
  ◮ estimating the unknown $(\beta_l)_{l=1,\dots,k}$,
  ◮ evaluating the fitness of the model,
  ◮ if the fit is acceptable, performing tests on the parameters and using the model for predictions.
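A minimal simulation sketch of this model (a hedged illustration, not from the slides: $f$ is taken linear in a single regressor, and the parameter values and noise level are arbitrary assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative choice: f(X; beta) linear in one regressor, Gaussian noise.
beta = np.array([2.0, 0.5])            # assumed "true" (beta_0, beta_1)
x = rng.uniform(0.0, 10.0, size=50)    # observed explanatory variable
eps = rng.normal(0.0, 1.0, size=50)    # unobservable residuals
y = beta[0] + beta[1] * x + eps        # realisations of Y = f(X; beta) + eps
print(y[:5])
```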

SLIDE 3

Simple linear regression

◮ A single explanatory variable $X$ and an affine relationship to the dependent variable $Y$: $E[Y \mid X = x] = \beta_0 + \beta_1 x$, or $Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$, where $\beta_1$ is the slope of the adjusted regression line and $\beta_0$ is the intercept.

◮ The residuals $\varepsilon_i$ are assumed to be centred (R1), to have equal variances $\sigma^2$ (R2) and to be uncorrelated: $\mathrm{Cov}(\varepsilon_i, \varepsilon_j) = 0,\ \forall i \neq j$ (R3).

◮ Hence: $E[Y_i] = \beta_0 + \beta_1 x_i$, $\mathrm{Var}(Y_i) = \sigma^2$ and $\mathrm{Cov}(Y_i, Y_j) = 0,\ \forall i \neq j$.

◮ Fitting (or adjusting) the model = estimating $\beta_0$, $\beta_1$ and $\sigma$ from the $n$-sample $(x_i, y_i)$.

SLIDE 4

Least square estimate

◮ Seeking values of $\beta_0$ and $\beta_1$ minimising the sum of quadratic errors:
$$(\hat\beta_0, \hat\beta_1) = \operatorname*{argmin}_{(\beta_0, \beta_1) \in \mathbb{R}^2} \sum_i \left[ y_i - (\beta_0 + \beta_1 x_i) \right]^2.$$
Note that $Y$ and $X$ do not play a symmetric role!

◮ In matrix notation (useful later): $Y = X B + \varepsilon$, with $Y = {}^\top(Y_1 \dots Y_n)$, $B = {}^\top(\beta_0, \beta_1)$, $\varepsilon = {}^\top(\varepsilon_1 \dots \varepsilon_n)$ and
$$X = {}^\top\!\begin{pmatrix} 1 & \cdots & 1 \\ X_1 & \cdots & X_n \end{pmatrix}.$$
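A short numpy sketch of the least square fit in the matrix form above (the synthetic data and true parameters are assumptions for illustration; `np.linalg.lstsq` solves the stated argmin problem):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(0.0, 10.0, size=50)
y = 2.0 + 0.5 * x + rng.normal(0.0, 1.0, size=50)

# Design matrix with a column of ones (intercept) and the observations x_i.
X = np.column_stack([np.ones_like(x), x])

# Least squares: argmin over (beta_0, beta_1) of ||y - X beta||^2.
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
print("beta0_hat, beta1_hat:", beta_hat)
```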

SLIDE 5

Estimator properties

◮ Useful notations: $\bar{x} = \frac{1}{n}\sum_i x_i$, $\bar{y}$, $s_x^2$, $s_y^2$ and $s_{xy} = \frac{1}{n-1}\sum_i (x_i - \bar{x})(y_i - \bar{y})$.

◮ Linear correlation coefficient: $r_{xy} = \dfrac{s_{xy}}{s_x s_y}$.

Theorem
1. The Least Square estimators are $\hat\beta_1 = s_{xy}/s_x^2$ and $\hat\beta_0 = \bar{y} - \hat\beta_1 \bar{x}$.
2. These estimators are unbiased and efficient.
3. $s^2 = \frac{1}{n-2}\sum_i \left[ y_i - (\hat\beta_0 + \hat\beta_1 x_i) \right]^2$ is an unbiased estimator of $\sigma^2$. It is however not efficient.
4. $\mathrm{Var}(\hat\beta_1) = \dfrac{\sigma^2}{(n-1)s_x^2}$ and $\mathrm{Var}(\hat\beta_0) = \bar{x}^2\,\mathrm{Var}(\hat\beta_1) + \sigma^2/n$.
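A sketch checking the closed-form estimators of the theorem on simulated data (the data and true parameters are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 50
x = rng.uniform(0.0, 10.0, size=n)
y = 2.0 + 0.5 * x + rng.normal(0.0, 1.0, size=n)

xbar, ybar = x.mean(), y.mean()
s2x = np.sum((x - xbar) ** 2) / (n - 1)          # sample variance of x
sxy = np.sum((x - xbar) * (y - ybar)) / (n - 1)  # sample covariance

beta1_hat = sxy / s2x                 # slope estimator  s_xy / s_x^2
beta0_hat = ybar - beta1_hat * xbar   # intercept estimator

resid = y - (beta0_hat + beta1_hat * x)
s2 = np.sum(resid ** 2) / (n - 2)     # unbiased estimator of sigma^2

var_beta1 = s2 / ((n - 1) * s2x)              # plug-in estimate of Var(beta1_hat)
var_beta0 = xbar ** 2 * var_beta1 + s2 / n    # plug-in estimate of Var(beta0_hat)
print(beta0_hat, beta1_hat, s2, var_beta0, var_beta1)
```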

SLIDE 6

Simple Gaussian linear model

◮ In addition to R1 (centred noise), R2 (equal variance noise) and R3 (uncorrelated noise), we assume (R3') $\forall i \neq j$, $\varepsilon_i$ and $\varepsilon_j$ independent, and (R4) $\forall i$, $\varepsilon_i \sim \mathcal{N}(0, \sigma^2)$, or equivalently $y_i \sim \mathcal{N}(\beta_0 + \beta_1 x_i, \sigma^2)$.

◮ Theorem: under (R1, R2, R3' and R4), the Least Square estimators coincide with the MLE.

Theorem (Distribution of estimators)
1. $\hat\beta_0 \sim \mathcal{N}(\beta_0, \sigma^2_{\hat\beta_0})$ and $\hat\beta_1 \sim \mathcal{N}(\beta_1, \sigma^2_{\hat\beta_1})$, with $\sigma^2_{\hat\beta_0} = \sigma^2\left( \dfrac{\bar{x}^2}{\sum_i (x_i - \bar{x})^2} + \dfrac{1}{n} \right)$ and $\sigma^2_{\hat\beta_1} = \dfrac{\sigma^2}{\sum_i (x_i - \bar{x})^2}$.
2. $(n-2)s^2/\sigma^2 \sim \chi^2_{n-2}$.
3. $\hat\beta_0$ and $\hat\beta_1$ are independent of the $\hat\varepsilon_i$.
4. Estimators of $\sigma^2_{\hat\beta_0}$ and $\sigma^2_{\hat\beta_1}$ are obtained from 1. by replacing $\sigma^2$ with $s^2$.
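A quick Monte Carlo sanity check of these distributional results (illustrative only; the design, $\sigma$ and the number of replications are arbitrary assumptions):

```python
import numpy as np

rng = np.random.default_rng(3)
n, sigma = 30, 1.0
x = rng.uniform(0.0, 10.0, size=n)        # fixed design across replications
Sxx = np.sum((x - x.mean()) ** 2)

b1_hats, s2s = [], []
for _ in range(10000):
    y = 2.0 + 0.5 * x + rng.normal(0.0, sigma, size=n)
    b1 = np.sum((x - x.mean()) * (y - y.mean())) / Sxx
    b0 = y.mean() - b1 * x.mean()
    b1_hats.append(b1)
    s2s.append(np.sum((y - b0 - b1 * x) ** 2) / (n - 2))

# Var(beta1_hat) should be close to sigma^2 / Sxx, and (n-2) s^2 / sigma^2
# should have mean n-2 (the chi-square_{n-2} mean).
print("empirical Var(beta1_hat):", np.var(b1_hats), "theory:", sigma**2 / Sxx)
print("mean of (n-2) s^2 / sigma^2:", (n - 2) * np.mean(s2s) / sigma**2, "vs", n - 2)
```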

SLIDE 7

Tests, ANOVA and determination coefficient

◮ The previous theorem allows us to build confidence intervals for $\beta_0$ and $\beta_1$.

◮ $SST/n = SSR/n + SSE/n$, with $SST = \sum_i (y_i - \bar{y})^2$ (total sum of squares), $SSR = \sum_i (\hat{y}_i - \bar{y})^2$ (regression sum of squares) and $SSE = \sum_i (y_i - \hat{y}_i)^2$ (sum of squared errors).

◮ Definition: determination coefficient
$$R^2 = \frac{\sum_i (\hat{y}_i - \bar{y})^2}{\sum_i (y_i - \bar{y})^2} = \frac{SSR}{SST} = 1 - \frac{SSE}{SST} = 1 - \frac{\text{residual variance}}{\text{total variance}}.$$

→ Always use scatterplots to assess linear model adequacy: the slide illustrates this with several very different scatterplots that all share the same $R^2 = 0.667$.
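A short numerical illustration of the $SST = SSR + SSE$ decomposition and of $R^2$ (synthetic data assumed):

```python
import numpy as np

rng = np.random.default_rng(4)
x = rng.uniform(0.0, 10.0, size=50)
y = 2.0 + 0.5 * x + rng.normal(0.0, 1.0, size=50)

b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
y_hat = b0 + b1 * x

SST = np.sum((y - y.mean()) ** 2)       # total sum of squares
SSR = np.sum((y_hat - y.mean()) ** 2)   # regression sum of squares
SSE = np.sum((y - y_hat) ** 2)          # sum of squared errors

print("SST == SSR + SSE ?", np.isclose(SST, SSR + SSE))
print("R^2 =", SSR / SST, "= 1 - SSE/SST =", 1 - SSE / SST)
```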

SLIDE 8

Prediction

◮ Given a new $x^*$, what is the prediction $\hat{y}(x^*)$?

◮ It is simply $\hat{y}(x^*) = \hat\beta_0 + \hat\beta_1 x^*$. But what is its precision?

◮ Its confidence interval is $\left[ \hat\beta_0 + \hat\beta_1 x^* \pm t_{n-2;1-\alpha/2}\, s^* \right]$, where $s^* = s \sqrt{1 + \dfrac{1}{n} + \dfrac{(x^* - \bar{x})^2}{\sum_i (x_i - \bar{x})^2}}$.

◮ Predictions are valid in the range of the $(x_i)$'s.

◮ The precision varies according to the value $x^*$ at which you want to predict.
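A sketch of the prediction interval formula above, using `scipy.stats.t.ppf` for the Student quantile (the data, $x^*$ and $\alpha$ are illustrative assumptions):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
n = 50
x = rng.uniform(0.0, 10.0, size=n)
y = 2.0 + 0.5 * x + rng.normal(0.0, 1.0, size=n)

b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
s = np.sqrt(np.sum((y - b0 - b1 * x) ** 2) / (n - 2))

x_star, alpha = 5.0, 0.05
y_pred = b0 + b1 * x_star
s_star = s * np.sqrt(1 + 1 / n + (x_star - x.mean()) ** 2 / np.sum((x - x.mean()) ** 2))
t_q = stats.t.ppf(1 - alpha / 2, df=n - 2)
print("prediction:", y_pred, "interval:", (y_pred - t_q * s_star, y_pred + t_q * s_star))
```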

SLIDE 9

Multiple linear regression

◮ Natural extension when several variables $(X^j)_{j=1,\dots,p}$ are used to explain $Y$.

◮ The model simply reads $Y = \beta_0 + \sum_{j=1}^p \beta_j X^j + \varepsilon$, or, in matrix notation with the obvious generalisation, $Y = X\beta + \varepsilon$.

◮ $x = (x_i^j)_{i,j}$ is the observed design matrix.

◮ Identifiability of $\beta$ is equivalent to the linear independence of the columns of $x$, i.e. $\mathrm{Rank}(X) = p + 1$. This is equivalent to ${}^\top\!X X$ being invertible.

◮ Parameter estimation: $\operatorname*{argmin}_\beta \sum_{i=1}^n \left( y_i - \sum_{j=1}^p \beta_j x_i^j - \beta_0 \right)^2 \;\Leftrightarrow\; \operatorname*{argmin}_\beta \sum_i \hat\varepsilon_i^{\,2} \;\Leftrightarrow\; \operatorname*{argmin}_\beta \lVert Y - X\beta \rVert_2^2$.

◮ Theorem: the Least Square estimator of $\beta$ is $\hat\beta = ({}^\top\!X X)^{-1}\, {}^\top\!X\, Y$.
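A numpy sketch of the estimator $\hat\beta = ({}^\top\!X X)^{-1}\,{}^\top\!X\,Y$ via the normal equations (the design, true coefficients and noise level are assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(6)
n, p = 100, 3
x = rng.normal(size=(n, p))               # observed regressors
X = np.column_stack([np.ones(n), x])      # add intercept column; Rank(X) = p+1 assumed
beta_true = np.array([1.0, 2.0, -1.0, 0.5])
y = X @ beta_true + rng.normal(0.0, 0.5, size=n)

# Least square estimator: solve (X^T X) beta = X^T y rather than inverting explicitly.
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
print(beta_hat)
```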

SLIDE 10

Properties of the least square estimate

Theorem
The estimator $\hat\beta$ previously defined is such that:
1. $\hat\beta \sim \mathcal{N}\!\left(\beta, \sigma^2 ({}^\top\!X X)^{-1}\right)$ and
2. $\hat\beta$ is efficient: among all unbiased estimators, it has the smallest variance.

◮ We have little control over $\sigma^2$, so the structure of ${}^\top\!X X$ dictates the quality of the estimator $\hat\beta$: this is the subject of optimal experimental design.

Theorem
Let $\hat{Y} = X\hat\beta$ denote the predicted values. Then $\hat{Y} = H Y$, with $H = X ({}^\top\!X X)^{-1}\, {}^\top\!X$, and $\hat\varepsilon = Y - \hat{Y} = (\mathrm{Id} - H) Y$. Note that $H$ is the orthogonal projection onto $\mathrm{Vect}(X) \subset \mathbb{R}^n$. We have:
1. $\mathrm{Cov}(\hat{Y}) = \sigma^2 H$,
2. $\mathrm{Cov}(\hat\varepsilon) = \sigma^2 (\mathrm{Id} - H)$ and
3. $\hat\sigma^2 = \dfrac{\lVert \hat\varepsilon \rVert^2}{n - p - 1}$.
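A sketch of the hat matrix and of the residual-based variance estimator from the theorem (synthetic design assumed; the projection properties are checked numerically):

```python
import numpy as np

rng = np.random.default_rng(7)
n, p = 100, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])
y = X @ np.array([1.0, 2.0, -1.0, 0.5]) + rng.normal(0.0, 0.5, size=n)

H = X @ np.linalg.solve(X.T @ X, X.T)   # hat matrix H = X (X^T X)^{-1} X^T
y_hat = H @ y                           # fitted values  Y_hat = H Y
resid = (np.eye(n) - H) @ y             # residuals (Id - H) Y

# H is the orthogonal projection onto the column space of X: H^2 = H and H symmetric.
print(np.allclose(H @ H, H), np.allclose(H, H.T))
print("sigma^2 estimate:", np.sum(resid ** 2) / (n - p - 1))
```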

SLIDE 11

Practical uses

◮ Confidence interval for $\beta_j$: $\left[ \hat\beta_j \pm t_{n-p-1;1-\alpha/2}\, \sigma_{\hat\beta_j} \right]$, with $t_{n-p-1;1-\alpha/2}$ a Student quantile and $\sigma_{\hat\beta_j}$ the square root of the $j$-th diagonal element of $\mathrm{Cov}(\hat\beta)$.

◮ Tests on $\beta_j$: the random variable $\dfrac{\hat\beta_j - \beta_j}{\sigma_{\hat\beta_j}}$ has a Student distribution.

◮ Confidence region for $\beta = (\beta_0, \dots, \beta_p)$:
$$R_{1-\alpha}(\beta) = \left\{ z \in \mathbb{R}^{p+1} \;\middle|\; {}^\top(z - \hat\beta)\; {}^\top\!X X\; (z - \hat\beta) \le (p+1)\, s^2 f_{k;n-p-1;1-\alpha} \right\}.$$
It is an ellipsoid centred on $\hat\beta$ whose volume, shape and orientation depend upon ${}^\top\!X X$.

◮ Confidence interval for predictions at $x^*$: $\left[ \hat{y}^* \pm t_{n-p-1;1-\alpha/2}\, s \left( 1 + {}^\top\!x^* ({}^\top\!X X)^{-1} x^* \right)^{1/2} \right]$.
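A sketch of the confidence intervals and Student tests for the $\beta_j$ (synthetic data assumed; standard errors are taken from the diagonal of $s^2({}^\top\!X X)^{-1}$ as on the slide):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(8)
n, p, alpha = 100, 3, 0.05
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])
y = X @ np.array([1.0, 2.0, -1.0, 0.5]) + rng.normal(0.0, 0.5, size=n)

XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ y
s2 = np.sum((y - X @ beta_hat) ** 2) / (n - p - 1)
se = np.sqrt(s2 * np.diag(XtX_inv))            # standard errors of the beta_hat_j

t_q = stats.t.ppf(1 - alpha / 2, df=n - p - 1)
ci = np.column_stack([beta_hat - t_q * se, beta_hat + t_q * se])  # CIs for beta_j
t_stat = beta_hat / se                          # Student test of H0: beta_j = 0
p_val = 2 * stats.t.sf(np.abs(t_stat), df=n - p - 1)
print(ci, t_stat, p_val, sep="\n")
```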

SLIDE 12

Usual diagnosis

◮ Residual plot: variance homogeneity (weights can be used if it fails), model validation...

◮ QQ-plots: to detect outliers...

◮ Model selection: $R^2$ is only suitable for comparing models with the same number of regressors. Adjusted coefficient: $R^2_{\mathrm{adj}} = \dfrac{(n-1)R^2 - (p-1)}{n-p}$. Maximising $R^2_{\mathrm{adj}}$ is equivalent to minimising the mean quadratic error.

◮ Test by ANOVA: $F = \dfrac{SSR/p}{SSE/(n-p-1)}$ has a Fisher distribution with $(p,\, n-p-1)$ degrees of freedom. Since testing $(H_0)$: $\beta_1 = \dots = \beta_p = 0$ has little interest (it is rejected as soon as one of the variables is linked to $Y$), one can test $(H_0')$: $\beta_{i_1} = \dots = \beta_{i_q} = 0$, with $q < p$; then $\dfrac{(SSR - SSR_q)/q}{SSE/(n-p-1)}$ has a Fisher distribution with $(q,\, n-p-1)$ degrees of freedom.

◮ Application: variable selection for model interpretation: backward (remove variables one by one, least significant first, with a t-test), forward (include variables one by one, most significant first, with an F-test), stepwise (a variant of forward).
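A sketch of the adjusted $R^2$ and of the global ANOVA F-test (synthetic data assumed; the adjusted $R^2$ is written here with the $n-p-1$ degrees-of-freedom convention used elsewhere in the deck, which differs slightly from the slide's $(n-p)$ form):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(9)
n, p = 100, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])
y = X @ np.array([1.0, 2.0, -1.0, 0.5]) + rng.normal(0.0, 0.5, size=n)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
y_hat = X @ beta_hat
SST = np.sum((y - y.mean()) ** 2)
SSE = np.sum((y - y_hat) ** 2)
SSR = SST - SSE

R2 = 1 - SSE / SST
R2_adj = 1 - (SSE / (n - p - 1)) / (SST / (n - 1))   # adjusted R^2, p regressors + intercept
F = (SSR / p) / (SSE / (n - p - 1))                  # global F statistic
p_value = stats.f.sf(F, p, n - p - 1)                # Fisher(p, n-p-1) tail probability
print(R2, R2_adj, F, p_value)
```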

SLIDE 13

Collinearity and model selection

◮ Detecting collinearity between the regressors: inverting ${}^\top\!X X$ when $\det({}^\top\!X X) \approx 0$ is difficult; moreover, the resulting estimator has a huge variance!

◮ To detect collinearity, compute $VIF(x^j) = \dfrac{1}{1 - R_j^2}$, with $R_j^2$ the determination coefficient of $x^j$ regressed against $x \setminus \{x^j\}$. Perfect orthogonality gives $VIF(x^j) = 1$, and the stronger the collinearity, the larger the value of $VIF(x^j)$.

◮ Ridge regression introduces a bias but reduces the variance (it keeps all variables). Lasso regression does the same but also performs a selection on variables. Issue here: a penalty term to tune...
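A sketch of the VIF computation by regressing each variable on the others (synthetic, deliberately collinear data assumed):

```python
import numpy as np

rng = np.random.default_rng(10)
n = 200
x1 = rng.normal(size=n)
x2 = 0.9 * x1 + 0.1 * rng.normal(size=n)   # deliberately collinear with x1
x3 = rng.normal(size=n)
x = np.column_stack([x1, x2, x3])

def vif(x, j):
    """VIF(x_j) = 1 / (1 - R_j^2), with R_j^2 from regressing x_j on the other columns."""
    others = np.delete(x, j, axis=1)
    A = np.column_stack([np.ones(len(x)), others])
    coef, *_ = np.linalg.lstsq(A, x[:, j], rcond=None)
    resid = x[:, j] - A @ coef
    r2 = 1 - resid @ resid / np.sum((x[:, j] - x[:, j].mean()) ** 2)
    return 1.0 / (1.0 - r2)

print([round(vif(x, j), 1) for j in range(x.shape[1])])  # large VIF for x1 and x2
```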

SLIDE 14

Last generalisations

Multiple outputs, curvilinear and non-linear regressions

◮ Multiple output regression: $Y = X B + E$, with $Y \in \mathcal{M}(n, K)$ and $X \in \mathcal{M}(n, p)$, so $RSS(B) = \mathrm{Tr}\!\left[ {}^\top(Y - XB)(Y - XB) \right]$ (column-wise), or $\sum_i {}^\top(y_i - x_{i,.}B)\, \Sigma_\varepsilon^{-1} (y_i - x_{i,.}B)$, with $\Sigma_\varepsilon = \mathrm{Cov}(\varepsilon)$ (correlated errors).

◮ Curvilinear models are of the form $Y = \beta_0 + \sum_j \beta_j x^j + \sum_{k,l} \beta_{k,l} x^k x^l + \varepsilon$.

◮ Non-linear (parametric) regression has the form $Y = f(x; \theta) + \varepsilon$. Examples include exponential or logistic models.
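A sketch of a non-linear parametric fit $Y = f(x; \theta) + \varepsilon$ with an exponential model, using `scipy.optimize.curve_fit` (the model choice, data and starting values are illustrative assumptions):

```python
import numpy as np
from scipy.optimize import curve_fit

rng = np.random.default_rng(11)

def f(x, a, b):
    """Assumed exponential model f(x; theta) = a * exp(b * x), theta = (a, b)."""
    return a * np.exp(b * x)

x = np.linspace(0.0, 2.0, 60)
y = f(x, 2.0, 1.3) + rng.normal(0.0, 0.2, size=x.size)

# Non-linear least squares fit of theta.
theta_hat, cov = curve_fit(f, x, y, p0=[1.0, 1.0])
print("theta_hat:", theta_hat)
```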

SLIDE 15

Today's session is over

Next time: a practical R session to be studied by you!