Lecture 1. From Linear Regression Nan Ye School of Mathematics and - - PowerPoint PPT Presentation



SLIDE 1

Lecture 1. From Linear Regression

Nan Ye

School of Mathematics and Physics University of Queensland

1 / 20

SLIDE 2

Quiz

  • Q1. Which dataset is linear regression of y against x suitable for?

[Four scatter plots of y against x, labelled (a), (b), (c), (d)]

SLIDE 3
  • Q2. If there is a unique least squares regression line y = β⊤x on (x1, y1), . . . , (xn, yn) ∈ Rd × R, what is β?

    (a) (X⊤X)−1X⊤y   (b) (XX⊤)−1Xy   (c) X⊤y   (d) Xy

    where X is the n × d design matrix with xi as the i-th row, and y = (y1, . . . , yn)⊤.

[Scatter plot of y against x]

SLIDE 4
  • Q3. Suggest possible models for the data shown in the figures.

[Four scatter plots: (a) Continuous, (b) Binary, (c) Cardinal, (d) Nonnegative continuous]

Linear regression

SLIDE 5

  • Q3. Suggest possible models for the data shown in the figures.

[Same four scatter plots as Slide 4: (a) Continuous, (b) Binary, (c) Cardinal, (d) Nonnegative continuous]

Linear regression

We will study some options in this course!

SLIDE 6

Your Tasks

Assignment 4 14%

  • out 18 Sep, due 12pm 2 Oct

Assignment 5 14%

  • out 2 Oct, due 12pm 16 Oct

Consulting Project

  • project description + data, out
  • 2.5% half-time check, due 6pm 1 Oct
  • 7.5% seminar, during a lecture in the week of 22 Oct
  • 20% report, due 6pm on 26 Oct

There are bonus questions in lectures and assignments.

SLIDE 7

Our Problem

Regression


SLIDE 8

Course Objective

  • Understand the general theory of generalized linear models

model structure, parameter estimation, asymptotic normality, prediction

  • Be able to recognize and apply generalized linear models and extensions for regression on different types of data

  • Be able to determine the goodness of fit and the prediction quality of a model

Put simply: to be able to do regression using generalized linear models and extensions...

SLIDE 9

Course Overview

Generalized linear models (GLMs)

  • Building blocks

systematic and random components, exponential families

  • Prediction and parameter estimation
  • Specific models for different types of data

continuous response, binary response, count response...

  • Modelling process and model diagnostics

Extensions of GLMs

  • Quasi-likelihood models
  • Nonparametric models
  • Mixed models and marginal models

Time series


SLIDE 10

This Lecture

  • Revisit basics of OLS
  • Systematic and random components of OLS
  • Extensions of OLS to other types of data
  • A glimpse of generalized linear models


SLIDE 11

Revisiting OLS

The objective function

Ordinary least squares (OLS) finds a hyperplane minimizing the sum of squared errors (SSE):

    βn = arg min_{β ∈ Rd} ∑_{i=1}^n (x⊤i β − yi)²,

where each xi ∈ Rd and each yi ∈ R.

Terminology

x: input, independent variables, covariate vector, observation, predictors, explanatory variables, features.
y: output, dependent variable, response.

SLIDE 12

Solution The solution to OLS is βn = (X⊤X)−1X⊤y, where X is the n × d design matrix with xi as the i-th row, and y = (y1, . . . , yn)⊤.

The formula holds when X⊤X is non-singular. When X⊤X is singular, there are infinitely many possible values for βn. They can be obtained by solving the linear systems (X⊤X)β = X⊤y.
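The closed-form solution is easy to check numerically. A minimal sketch on synthetic data (all names and values below are illustrative, not from the lecture):

```python
import numpy as np

# Synthetic data: n = 50 points in d = 3 dimensions (illustrative values).
rng = np.random.default_rng(0)
n, d = 50, 3
X = rng.normal(size=(n, d))                    # n x d design matrix, x_i as rows
beta_true = np.array([2.0, -1.0, 0.5])
y = X @ beta_true + rng.normal(scale=0.1, size=n)

# Closed form: beta_n = (X^T X)^{-1} X^T y.
# Solving the normal equations (X^T X) beta = X^T y avoids an explicit inverse.
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# np.linalg.lstsq computes the same least squares solution more robustly.
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.allclose(beta_hat, beta_lstsq))       # the two agree
```

When X⊤X is well-conditioned both routes coincide; `lstsq` is preferred in practice because it also handles the singular case.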


SLIDE 13

Justification as MLE

  • Assumption: yi | xi ~ind N(x⊤i β, σ²).

  • Derivation: the log-likelihood of β is given by

    ln p(y1, . . . , yn | x1, . . . , xn, β) = ∑i ln p(yi | xi, β)
        = ∑i ln [ (1/(√(2π) σ)) exp(−(yi − x⊤i β)²/(2σ²)) ]
        = const. − (1/(2σ²)) ∑i (yi − x⊤i β)².

Thus minimizing the SSE is the same as maximizing the log-likelihood, i.e. maximum likelihood estimation (MLE).
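The equivalence can also be seen numerically: maximizing the Gaussian log-likelihood is minimizing the SSE, so a generic optimizer on the SSE lands on the closed-form OLS estimate. A sketch on made-up data (σ fixed at 1, plain gradient descent as the optimizer):

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 40, 2
X = rng.normal(size=(n, d))
y = X @ np.array([1.0, -2.0]) + rng.normal(size=n)

# Maximizing the Gaussian log-likelihood in beta is minimizing the SSE,
# so gradient descent on (1/2) * SSE converges to the MLE.
beta = np.zeros(d)
lr = 1.0 / np.linalg.norm(X.T @ X, 2)   # step size from the largest eigenvalue
for _ in range(2000):
    beta -= lr * X.T @ (X @ beta - y)   # gradient of (1/2) * SSE w.r.t. beta

beta_ols = np.linalg.solve(X.T @ X, X.T @ y)   # closed-form OLS
print(np.allclose(beta, beta_ols, atol=1e-6))
```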

SLIDE 14

An Alternative View

  • OLS has two orthogonal components:

    (systematic) E(Y | x) = β⊤x.
    (random) Y | x is normally distributed with variance σ².

  • This has two key features:
    • The expected value of Y given x is a function of β⊤x.
    • The parameters of the conditional distribution of Y given x can be determined from E(Y | x).

  • This defines a conditional distribution p(y | x, β), with parameters estimated using MLE.

SLIDE 15

Generalization

(systematic) E(Y | x) = g(β⊤x).
(random) Y | x is normally/Poisson/Bernoulli/... distributed.

SLIDE 16

Example 1. Logistic regression for binary response

  • When Y takes value 0 or 1, we can use the logistic function to squash β⊤x to [0, 1], and use the Bernoulli distribution to model Y | x, as follows.

    (systematic) E(Y | x) = logistic(β⊤x) = 1/(1 + e−β⊤x).
    (random) Y | x is Bernoulli distributed.

  • Or more compactly, Y | x ~ B(1/(1 + e−β⊤x)), where B(p) is the Bernoulli distribution with parameter p.
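A small simulation of the two components (a sketch; the coefficients and sample size are made up for illustration):

```python
import numpy as np

def logistic(t):
    # Squashes any real number into (0, 1).
    return 1.0 / (1.0 + np.exp(-t))

rng = np.random.default_rng(2)
n, d = 5000, 2
X = rng.normal(size=(n, d))
beta = np.array([1.5, -0.5])

p = logistic(X @ beta)       # systematic: E(Y | x) = logistic(beta^T x)
y = rng.binomial(1, p)       # random:     Y | x ~ B(p)

# The simulated responses are binary, and their overall rate tracks mean(p).
print(y.min(), y.max())
```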

SLIDE 17

Example 2. Poisson regression for count response

  • When Y is a count, we can use exponentiation to map β⊤x to a non-negative value, and use the Poisson distribution to model Y | x, as follows.

    (systematic) E(Y | x) = exp(β⊤x).
    (random) Y | x is Poisson distributed.

  • Or more compactly, Y | x ~ Po(exp(β⊤x)), where Po(λ) is a Poisson distribution with parameter λ.
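The same kind of simulation for count data (a sketch with illustrative values):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 10000
x = rng.uniform(-1, 1, size=(n, 1))
beta = np.array([0.8])

lam = np.exp(x @ beta)       # systematic: E(Y | x) = exp(beta^T x) > 0
y = rng.poisson(lam)         # random:     Y | x ~ Po(lam)

# Counts are non-negative integers, and their sample mean tracks mean(lam).
print(y.min(), round(y.mean(), 2))
```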

SLIDE 18

Example 3. Gamma regression for non-negative response

  • When Y is a non-negative continuous random variable, we can choose the systematic and random components as follows.

    (systematic) E(Y | x) = exp(β⊤x).
    (random) Y | x is Gamma distributed.

  • We further assume the variance of the Gamma distribution is µ²/ν (ν treated as known), thus Y | x ~ Γ(µ = exp(β⊤x), var = µ²/ν), where Γ(µ = a, var = b) denotes a Gamma distribution with mean a and variance b.
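In the usual shape/scale parameterization, mean µ with variance µ²/ν corresponds to shape ν and scale µ/ν, since shape·scale = µ and shape·scale² = µ²/ν. A sketch with made-up values of µ and ν:

```python
import numpy as np

# Convert the (mean, variance) parameterization to numpy's (shape, scale).
mu, nu = 3.0, 5.0
shape, scale = nu, mu / nu      # mean = shape * scale   = mu
                                # var  = shape * scale^2 = mu^2 / nu
rng = np.random.default_rng(4)
samples = rng.gamma(shape, scale, size=200_000)

# Sample moments should be close to mu = 3.0 and mu^2/nu = 1.8.
print(round(samples.mean(), 1), round(samples.var(), 1))
```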

SLIDE 19

Generalized Linear Models

  • A GLM has the following structure

(systematic) E(Y | x) = h(β⊤x). (random) Y | x follows an exponential family distribution.

  • This is usually separated into three components:
    • The linear predictor β⊤x.
    • The response function h. (People often specify the link function g = h−1 instead.)
    • The exponential family for the conditional distribution of Y given x.
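For logistic regression, for instance, the response function h is the logistic function and the link g is its inverse, the logit. A quick sketch of the inverse relationship:

```python
import numpy as np

def h(eta):
    # Response function: linear predictor -> mean, here the logistic function.
    return 1.0 / (1.0 + np.exp(-eta))

def g(mu):
    # Link function g = h^{-1}: mean -> linear predictor, here the logit.
    return np.log(mu / (1.0 - mu))

eta = np.linspace(-3.0, 3.0, 7)
print(np.allclose(g(h(eta)), eta))   # g undoes h
```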

SLIDE 20

Remarks on the exponential family

  • It is common!

normal, Bernoulli, Poisson, and Gamma distributions are exponential families.

  • It gives a well-defined model.

its parameters are determined by the mean µ = E(Y | x).

  • It leads to a unified treatment of many different models.

linear regression, logistic regression, ...

We will take a close look at these in the next few lectures.


SLIDE 21

What You Need to Know

  • The approach of regression by separately specifying systematic and random components.
  • Example applications of the approach: linear regression, logistic regression, Poisson regression, Gamma regression.
  • The components of generalized linear models.