  1. Lecture 1. From Linear Regression. Nan Ye, School of Mathematics and Physics, University of Queensland.

  2. Quiz. Q1. Which dataset is linear regression of y against x suitable for? [Figure: four scatter plots of y against x, panels (a)-(d).]

  3. Q2. If there is a unique least squares regression line $y = \beta^\top x$ on $(x_1, y_1), \ldots, (x_n, y_n) \in \mathbb{R}^d \times \mathbb{R}$, what is $\beta$?
     (a) $(X^\top X)^{-1} X^\top y$  (b) $(X X^\top)^{-1} X y$  (c) $X^\top y$  (d) $X y$
     where $X$ is the $n \times d$ design matrix with $x_i$ as the $i$-th row, and $y = (y_1, \ldots, y_n)^\top$.
     [Figure: scatter plot of y against x.]

  4. Q3. Suggest possible models for the data shown in the figures.
     [Figure: four scatter plots of y against x. (a) Continuous response, labelled "Linear regression". (b) Binary response. (c) Cardinal response. (d) Nonnegative continuous response.]

  5. Q3. Suggest possible models for the data shown in the figures. (Same figure as slide 4.) We will study some options in this course!

  6. Your Tasks
     Assignment 4 (14%): out 18 Sep, due 12pm 2 Oct.
     Assignment 5 (14%): out 2 Oct, due 12pm 16 Oct.
     Consulting Project: project description + data, out.
       2.5%: half-time check, due 6pm 1 Oct.
       7.5%: seminar, during a lecture in the week of 22 Oct.
       20%: report, due 6pm on 26 Oct.
     There are bonus questions in lectures and assignments.

  7. Our Problem: Regression.

  8. Course Objective
     • Understand the general theory of generalized linear models: model structure, parameter estimation, asymptotic normality, prediction.
     • Be able to recognize and apply generalized linear models and their extensions for regression on different types of data.
     • Be able to assess the goodness of fit and the prediction quality of a model.
     Put simply: to be able to do regression using generalized linear models and extensions.

  9. Course Overview
     Generalized linear models (GLMs)
     • Building blocks: systematic and random components, exponential families.
     • Prediction and parameter estimation.
     • Specific models for different types of data: continuous response, binary response, count response, ...
     • Modelling process and model diagnostics.
     Extensions of GLMs
     • Quasi-likelihood models.
     • Nonparametric models.
     • Mixed models and marginal models.
     Time series

  10. This Lecture
     • Revisit the basics of OLS.
     • Systematic and random components of OLS.
     • Extensions of OLS to other types of data.
     • A glimpse of generalized linear models.

  11. Revisiting OLS
     The objective function. Ordinary least squares (OLS) finds a hyperplane minimizing the sum of squared errors (SSE):
     $$\beta_n = \arg\min_{\beta \in \mathbb{R}^d} \sum_{i=1}^n (x_i^\top \beta - y_i)^2,$$
     where each $x_i \in \mathbb{R}^d$ and each $y_i \in \mathbb{R}$.
     Terminology. $x$: input, independent variables, covariate vector, observation, predictors, explanatory variables, features. $y$: output, dependent variable, response.
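The slides contain no code, but the objective is easy to state concretely. Below is a minimal numpy sketch; the data and the coefficient values are synthetic, made up purely for illustration.

```python
import numpy as np

def sse(beta, X, y):
    """Sum of squared errors: sum_i (x_i^T beta - y_i)^2."""
    r = X @ beta - y
    return r @ r

# Synthetic data (illustrative only): n = 50 points, d = 2 features.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))
y = X @ np.array([1.5, -0.7]) + rng.normal(scale=0.5, size=50)

print(sse(np.array([1.5, -0.7]), X, y))  # small: near the data-generating beta
print(sse(np.zeros(2), X, y))            # larger: far from it
```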

  12. Solution
     The solution to OLS is $\beta_n = (X^\top X)^{-1} X^\top y$, where $X$ is the $n \times d$ design matrix with $x_i$ as the $i$-th row, and $y = (y_1, \ldots, y_n)^\top$.
     The formula holds when $X^\top X$ is non-singular. When $X^\top X$ is singular, there are infinitely many possible values for $\beta_n$; they can be obtained by solving the linear system $(X^\top X)\beta = X^\top y$.
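A sketch of computing the solution with numpy, on synthetic data of the same shape as above; `np.linalg.lstsq` covers the singular case by returning the minimum-norm solution among the infinitely many solutions of $(X^\top X)\beta = X^\top y$.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))
y = X @ np.array([1.5, -0.7]) + rng.normal(scale=0.5, size=50)

# Closed form, valid when X^T X is non-singular.
beta_n = np.linalg.solve(X.T @ X, X.T @ y)

# Robust alternative: lstsq also works when X^T X is singular,
# returning the minimum-norm solution of (X^T X) beta = X^T y.
beta_min_norm, *_ = np.linalg.lstsq(X, y, rcond=None)
```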

  13. Justification as MLE
     • Assumption: $y_i \mid x_i \overset{\text{ind}}{\sim} N(x_i^\top \beta, \sigma^2)$.
     • Derivation: the log-likelihood of $\beta$ is
     $$\ln p(y_1, \ldots, y_n \mid x_1, \ldots, x_n, \beta) = \sum_i \ln p(y_i \mid x_i, \beta) = \sum_i \ln\left(\frac{1}{\sqrt{2\pi}\,\sigma} \exp\big(-(y_i - x_i^\top \beta)^2 / 2\sigma^2\big)\right) = \text{const.} - \frac{1}{2\sigma^2} \sum_i (y_i - x_i^\top \beta)^2.$$
     Thus minimizing the SSE is the same as maximizing the log-likelihood, i.e. maximum likelihood estimation (MLE).
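The equivalence can be checked numerically. The sketch below (synthetic data again, with $\sigma$ treated as known) minimizes the negative log-likelihood with scipy's general-purpose optimizer and compares the result to the closed-form OLS solution.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))
y = X @ np.array([1.5, -0.7]) + rng.normal(scale=0.5, size=50)
sigma = 0.5  # treated as known for this check

def neg_log_lik(beta):
    # Negative log-likelihood up to an additive constant:
    # (1 / (2 sigma^2)) * sum_i (y_i - x_i^T beta)^2
    r = y - X @ beta
    return (r @ r) / (2 * sigma**2)

beta_mle = minimize(neg_log_lik, x0=np.zeros(2)).x
beta_ols = np.linalg.solve(X.T @ X, X.T @ y)
print(np.allclose(beta_mle, beta_ols, atol=1e-4))  # True: MLE coincides with OLS
```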

  14. An Alternative View
     • OLS has two orthogonal components:
       $E(Y \mid x) = \beta^\top x$. (systematic)
       $Y \mid x$ is normally distributed with variance $\sigma^2$. (random)
     • This has two key features:
       • The expected value of $Y$ given $x$ is a function of $\beta^\top x$.
       • The parameters of the conditional distribution of $Y$ given $x$ can be determined from $E(Y \mid x)$.
     • This defines a conditional distribution $p(y \mid x, \beta)$, with parameters estimated using MLE.
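Read generatively, the two components give a recipe for simulating data: compute the systematic part, then draw the random part around it. A minimal numpy sketch, with made-up parameter values:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))         # covariate rows x_i (synthetic)
beta = np.array([1.5, -0.7])         # illustrative coefficients
sigma = 0.5                          # illustrative noise scale

mu = X @ beta                        # systematic component: E(Y|x) = beta^T x
y = rng.normal(loc=mu, scale=sigma)  # random component: Y|x ~ N(mu, sigma^2)
```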

  15. Generalization
     $E(Y \mid x) = g(\beta^\top x)$. (systematic)
     $Y \mid x$ is normally/Poisson/Bernoulli/... distributed. (random)

  16. Example 1. Logistic regression for binary response
     • When $Y$ takes value 0 or 1, we can use the logistic function to squash $x^\top \beta$ to $[0, 1]$, and use the Bernoulli distribution to model $Y \mid x$, as follows.
       $E(Y \mid x) = \mathrm{logistic}(\beta^\top x) = \frac{1}{1 + e^{-\beta^\top x}}$. (systematic)
       $Y \mid x$ is Bernoulli distributed. (random)
     • Or more compactly, $Y \mid x \sim B\left(\frac{1}{1 + e^{-\beta^\top x}}\right)$, where $B(p)$ is the Bernoulli distribution with parameter $p$.
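The slides don't prescribe software, but as one option, here is a sketch of simulating from and fitting this model with statsmodels' GLM interface; the data and the coefficients (0.5, 2.0) are made up for illustration.

```python
import numpy as np
import statsmodels.api as sm

# Synthetic binary data (illustrative only).
rng = np.random.default_rng(0)
x = rng.normal(size=200)
X = sm.add_constant(x)                  # design matrix: intercept + x
p = 1 / (1 + np.exp(-(0.5 + 2.0 * x)))  # systematic: logistic(beta^T x)
y = rng.binomial(1, p)                  # random: Y|x ~ Bernoulli(p)

# The Binomial family defaults to the logit link, matching the model above.
fit = sm.GLM(y, X, family=sm.families.Binomial()).fit()
print(fit.params)  # estimates, roughly (0.5, 2.0)
```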

  17. Example 2. Poisson regression for count response
     • When $Y$ is a count, we can use exponentiation to map $\beta^\top x$ to a non-negative value, and use the Poisson distribution to model $Y \mid x$, as follows.
       $E(Y \mid x) = \exp(\beta^\top x)$. (systematic)
       $Y \mid x$ is Poisson distributed. (random)
     • Or more compactly, $Y \mid x \sim \mathrm{Po}(\exp(\beta^\top x))$, where $\mathrm{Po}(\lambda)$ is a Poisson distribution with parameter $\lambda$.
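The same pattern works for counts; a statsmodels sketch again, with made-up coefficients (0.5, 0.8) and synthetic data.

```python
import numpy as np
import statsmodels.api as sm

# Synthetic count data (illustrative only).
rng = np.random.default_rng(0)
x = rng.normal(size=200)
X = sm.add_constant(x)
lam = np.exp(0.5 + 0.8 * x)   # systematic: E(Y|x) = exp(beta^T x)
y = rng.poisson(lam)          # random: Y|x ~ Po(lam)

# The Poisson family defaults to the log link, matching the model above.
fit = sm.GLM(y, X, family=sm.families.Poisson()).fit()
print(fit.params)  # estimates, roughly (0.5, 0.8)
```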

  18. Example 3. Gamma regression for non-negative response
     • When $Y$ is a non-negative continuous random variable, we can choose the systematic and random components as follows.
       $E(Y \mid x) = \exp(\beta^\top x)$. (systematic)
       $Y \mid x$ is Gamma distributed. (random)
     • We further assume the variance of the Gamma distribution is $\mu^2/\nu$ (with $\nu$ treated as known), so that $Y \mid x \sim \Gamma(\mu = \exp(\beta^\top x), \mathrm{var} = \mu^2/\nu)$, where $\Gamma(\mu = a, \mathrm{var} = b)$ denotes a Gamma distribution with mean $a$ and variance $b$.
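A corresponding sketch for the Gamma case. One caveat: statsmodels' Gamma family defaults to the inverse power link, so the log link is requested explicitly to match $E(Y \mid x) = \exp(\beta^\top x)$; the shape $\nu$ and the coefficients are made up.

```python
import numpy as np
import statsmodels.api as sm

# Synthetic non-negative continuous data (illustrative only).
rng = np.random.default_rng(0)
x = rng.normal(size=200)
X = sm.add_constant(x)
mu = np.exp(0.3 + 0.6 * x)              # systematic: E(Y|x) = exp(beta^T x)
nu = 5.0                                # shape, so that var(Y|x) = mu^2 / nu
y = rng.gamma(shape=nu, scale=mu / nu)  # random: Gamma with mean mu

# Request the log link explicitly (the Gamma default is inverse power).
fam = sm.families.Gamma(link=sm.families.links.Log())
fit = sm.GLM(y, X, family=fam).fit()
print(fit.params)  # estimates, roughly (0.3, 0.6)
```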
