Statistical Analysis of Corpus Data with R
A short introduction to regression and linear models


  1. Title: Statistical Analysis of Corpus Data with R. A short introduction to regression and linear models. Designed by Marco Baroni (1) and Stefan Evert (2). (1) Center for Mind/Brain Sciences (CIMeC), University of Trento; (2) Institute of Cognitive Science (IKW), University of Osnabrück.

  2. Outline
  1 Regression
    - Simple linear regression
    - General linear regression
  2 Linear statistical models
    - A statistical model of linear regression
    - Statistical inference
  3 Generalised linear models

  3-4. Linear regression
  Can a random variable Y be predicted from a random variable X? The focus here is on a linear relationship between the variables.
  Linear predictor: $Y \approx \beta_0 + \beta_1 X$
    - $\beta_0$ = intercept of the regression line
    - $\beta_1$ = slope of the regression line
  Least-squares regression minimises the prediction error
  $$Q = \sum_{i=1}^{n} \bigl( y_i - (\beta_0 + \beta_1 x_i) \bigr)^2$$
  for data points $(x_1, y_1), \ldots, (x_n, y_n)$.
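  The slides show no code at this point; a minimal R sketch of such a least-squares fit, using made-up data (all variable names here are hypothetical), might look like this:

```r
## Simulated data, e.g. word length (x) vs. mean reaction time (y)
set.seed(42)
x <- runif(100, 1, 10)
y <- 200 + 25 * x + rnorm(100, sd = 20)

fit <- lm(y ~ x)          # least-squares fit of y = b0 + b1 * x
coef(fit)                 # estimated intercept b0 and slope b1
sum(residuals(fit)^2)     # the minimised prediction error Q
```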

  5-6. Simple linear regression
  Coefficients of the least-squares line:
  $$\hat\beta_1 = \frac{\sum_{i=1}^{n} x_i y_i - n \bar{x}_n \bar{y}_n}{\sum_{i=1}^{n} x_i^2 - n \bar{x}_n^2}, \qquad \hat\beta_0 = \bar{y}_n - \hat\beta_1 \bar{x}_n$$
  Mathematical derivation of the regression coefficients:
    - the minimum of $Q(\beta_0, \beta_1)$ satisfies $\partial Q / \partial \beta_0 = \partial Q / \partial \beta_1 = 0$
    - this leads to the normal equations (a system of two linear equations):
  $$-2 \sum_{i=1}^{n} \bigl[ y_i - (\beta_0 + \beta_1 x_i) \bigr] = 0 \;\Rightarrow\; \beta_0 n + \beta_1 \sum_{i=1}^{n} x_i = \sum_{i=1}^{n} y_i$$
  $$-2 \sum_{i=1}^{n} \bigl[ y_i - (\beta_0 + \beta_1 x_i) \bigr] x_i = 0 \;\Rightarrow\; \beta_0 \sum_{i=1}^{n} x_i + \beta_1 \sum_{i=1}^{n} x_i^2 = \sum_{i=1}^{n} x_i y_i$$
    - the regression coefficients $\hat\beta_0, \hat\beta_1$ are the unique solution of this system
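  As a sanity check (not part of the slides), the closed-form coefficients can be computed directly in R and compared with lm(), reusing the x and y from the sketch above:

```r
## Closed-form least-squares coefficients
b1 <- (sum(x * y) - length(x) * mean(x) * mean(y)) /
      (sum(x^2) - length(x) * mean(x)^2)
b0 <- mean(y) - b1 * mean(x)
c(b0, b1)        # matches coef(lm(y ~ x)) up to rounding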

  7-8. The Pearson correlation coefficient
  Measuring the "goodness of fit" of the linear prediction:
    - variation among the observed values of Y = sum of squares $S_y^2$, closely related to the (sample estimate of the) variance of Y:
  $$S_y^2 = \sum_{i=1}^{n} (y_i - \bar{y}_n)^2$$
    - residual variation with respect to the linear prediction: $S_{\text{resid}}^2 = Q$
  Pearson correlation = amount of variation "explained" by X:
  $$R^2 = 1 - \frac{S_{\text{resid}}^2}{S_y^2} = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat\beta_0 - \hat\beta_1 x_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y}_n)^2}$$
  Correlation vs. slope of the regression line: $R^2 = \hat\beta_1(y \sim x) \cdot \hat\beta_1(x \sim y)$
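  Both formulations of $R^2$ can be verified in R against the fitted model from the earlier sketch (again not from the slides):

```r
## R-squared by hand vs. lm() summary
r2_manual <- 1 - sum(residuals(fit)^2) / sum((y - mean(y))^2)
r2_lm     <- summary(fit)$r.squared
## the slide's identity: product of the two regression slopes
r2_slopes <- coef(lm(y ~ x))["x"] * coef(lm(x ~ y))["y"]
c(r2_manual, r2_lm, r2_slopes)   # all three agree
```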

  9. Multiple linear regression
  Linear regression with multiple predictor variables:
  $$Y \approx \beta_0 + \beta_1 X_1 + \cdots + \beta_k X_k$$
  minimises
  $$Q = \sum_{i=1}^{n} \bigl( y_i - (\beta_0 + \beta_1 x_{i1} + \cdots + \beta_k x_{ik}) \bigr)^2$$
  for data points $(x_{11}, \ldots, x_{1k}, y_1), \ldots, (x_{n1}, \ldots, x_{nk}, y_n)$.
  Multiple linear regression fits a $k$-dimensional hyperplane instead of a regression line.
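  In R this is a straightforward extension of the lm() formula; a sketch with two made-up predictors (hypothetical names):

```r
## Two hypothetical predictors, e.g. word length and log frequency
x1 <- runif(100, 1, 10)
x2 <- rnorm(100)
y2 <- 150 + 20 * x1 - 30 * x2 + rnorm(100, sd = 15)

fit2 <- lm(y2 ~ x1 + x2)   # fits a plane in (x1, x2, y) space
coef(fit2)
```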

  10. Multiple linear regression: the design matrix
  Matrix notation of the linear regression problem: $\mathbf{y} \approx Z \boldsymbol\beta$
  "Design matrix" $Z$ of the regression data:
  $$Z = \begin{bmatrix} 1 & x_{11} & x_{12} & \cdots & x_{1k} \\ 1 & x_{21} & x_{22} & \cdots & x_{2k} \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & x_{n1} & x_{n2} & \cdots & x_{nk} \end{bmatrix}$$
  $$\mathbf{y} = (y_1 \;\, y_2 \;\, \cdots \;\, y_n)', \qquad \boldsymbol\beta = (\beta_0 \;\, \beta_1 \;\, \beta_2 \;\, \cdots \;\, \beta_k)'$$
  $A'$ denotes the transpose of a matrix; $\mathbf{y}$ and $\boldsymbol\beta$ are column vectors.
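  R constructs exactly this matrix internally; it can be inspected with model.matrix() (using the x1, x2 defined in the previous sketch):

```r
## R builds the design matrix, including the intercept column of 1s
Z <- model.matrix(~ x1 + x2)   # columns: (Intercept), x1, x2
head(Z)
```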

  11-13. General linear regression
  Matrix notation of the linear regression problem: $\mathbf{y} \approx Z \boldsymbol\beta$
  Residual error: $Q = (\mathbf{y} - Z\boldsymbol\beta)'(\mathbf{y} - Z\boldsymbol\beta)$
  System of normal equations, satisfying $\nabla_{\boldsymbol\beta}\, Q = 0$:
  $$Z'Z\boldsymbol\beta = Z'\mathbf{y}$$
  This leads to the regression coefficients
  $$\hat{\boldsymbol\beta} = (Z'Z)^{-1} Z'\mathbf{y}$$
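  The normal equations can be solved directly in R (a sketch, not the slides' code; solve() with two arguments is numerically preferable to forming the explicit inverse), reproducing the lm() coefficients from above:

```r
## Solve Z'Z beta = Z'y directly and compare with lm()
beta_hat <- solve(t(Z) %*% Z, t(Z) %*% y2)
cbind(beta_hat, coef(fit2))   # identical up to rounding
```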

  14. General linear regression (continued)
  The predictor variables can also be functions of the observed variables, since the regression only has to be linear in the coefficients $\boldsymbol\beta$. E.g. polynomial regression with design matrix
  $$Z = \begin{bmatrix} 1 & x_1 & x_1^2 & \cdots & x_1^k \\ 1 & x_2 & x_2^2 & \cdots & x_2^k \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & x_n & x_n^2 & \cdots & x_n^k \end{bmatrix}$$
  corresponding to the regression model
  $$Y \approx \beta_0 + \beta_1 X + \beta_2 X^2 + \cdots + \beta_k X^k$$
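  A polynomial fit needs no new machinery in R; reusing the x and y from the first sketch, a quadratic model can be specified in either of two equivalent ways:

```r
## I() protects arithmetic like x^2 inside a model formula
fit_poly  <- lm(y ~ x + I(x^2))
## equivalent: raw polynomial terms of degree 2
fit_poly2 <- lm(y ~ poly(x, 2, raw = TRUE))
cbind(coef(fit_poly), coef(fit_poly2))   # same estimates
```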

  15-17. Linear statistical models
  Linear statistical model ($\epsilon$ = random error):
  $$Y = \beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k + \epsilon, \qquad \epsilon \sim N(0, \sigma^2)$$
  Note that $x_1, \ldots, x_k$ are not treated as random variables!
    - $\sim$ = "is distributed as"; $N(\mu, \sigma^2)$ = normal distribution
  Mathematical notation:
  $$Y \mid x_1, \ldots, x_k \;\sim\; N\bigl(\beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k,\; \sigma^2\bigr)$$
  Assumptions:
    - the error terms $\epsilon_i$ are i.i.d. (independent, identically distributed)
    - the error terms follow normal (Gaussian) distributions
    - equal (but unknown) variance $\sigma^2$ = homoscedasticity
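  The model's assumptions are easiest to see by simulating from it; a sketch (hypothetical parameter values) with fixed predictors and i.i.d. Gaussian errors:

```r
## Simulate from Y = b0 + b1*x + eps with eps ~ N(0, sigma^2)
n     <- 100
x_fix <- seq(1, 10, length.out = n)   # x is fixed, NOT random
eps   <- rnorm(n, mean = 0, sd = 5)   # i.i.d. errors, equal variance
y_sim <- 3 + 2 * x_fix + eps
coef(lm(y_sim ~ x_fix))               # estimates close to (3, 2)
```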

  18-19. Linear statistical models (continued)
  Probability density function for the simple linear model:
  $$\Pr(\mathbf{y} \mid \mathbf{x}) = \frac{1}{(2\pi\sigma^2)^{n/2}} \cdot \exp\!\left( -\frac{1}{2\sigma^2} \sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_i)^2 \right)$$
    - $\mathbf{y} = (y_1, \ldots, y_n)$ = observed values of Y (sample size n)
    - $\mathbf{x} = (x_1, \ldots, x_n)$ = observed values of X
  The log-likelihood has a familiar form:
  $$\log \Pr(\mathbf{y} \mid \mathbf{x}) = C - \frac{1}{2\sigma^2} \sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_i)^2 = C - \frac{Q}{2\sigma^2}$$
  so maximising the likelihood is equivalent to minimising Q: the MLE parameter estimates $\hat\beta_0, \hat\beta_1$ coincide with the least-squares estimates from linear regression.
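  This correspondence can be checked numerically (a sketch, reusing fit and y from the first example): summing Gaussian log-densities at the fitted values, with the MLE of $\sigma^2$, reproduces what logLik() reports for the least-squares fit.

```r
## Log-likelihood by hand vs. logLik()
sigma_hat <- sqrt(sum(residuals(fit)^2) / length(y))  # MLE of sigma
ll_manual <- sum(dnorm(y, mean = fitted(fit), sd = sigma_hat, log = TRUE))
c(ll_manual, as.numeric(logLik(fit)))                 # agree
```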

  20. Statistical inference for linear models
  Model comparison with ANOVA techniques:
    - Is the variance reduced significantly by taking a specific explanatory factor into account?
    - intuitive: proportion of variance explained (like $R^2$)
    - mathematical: F statistic → p-value
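  In R, such a comparison of nested models is one call to anova(); a sketch using the two-predictor data from above:

```r
## Does x2 explain significantly more variance than x1 alone?
m0 <- lm(y2 ~ x1)
m1 <- lm(y2 ~ x1 + x2)
anova(m0, m1)   # F statistic and p-value for adding x2
```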
