Lecture 1. From Linear Regression Nan Ye School of Mathematics and - - PowerPoint PPT Presentation



SLIDE 1

Lecture 1. From Linear Regression

Nan Ye

School of Mathematics and Physics University of Queensland

1 / 20

SLIDE 2

Quiz

  • Q1. Which dataset is linear regression of y against x suitable for?

[Four scatter plots of y against x, labelled (a), (b), (c), (d)]

SLIDE 3
  • Q2. If there is a unique least squares regression line y = β⊤x on (x1, y1), . . . , (xn, yn) ∈ Rd × R, what is β?

    (a) (X⊤X)−1X⊤y   (b) (XX⊤)−1Xy   (c) X⊤y   (d) Xy

    where X is the n × d design matrix with xi as the i-th row, and y = (y1, . . . , yn)⊤.

[Scatter plot of y against x]

SLIDE 4
  • Q3. Suggest possible models for the data shown in the figures.

[Four scatter plots: (a) Continuous, (b) Binary, (c) Cardinal, (d) Nonnegative continuous]

Linear regression

SLIDE 5

  • Q3. Suggest possible models for the data shown in the figures.

[Same four scatter plots as Slide 4: (a) Continuous, (b) Binary, (c) Cardinal, (d) Nonnegative continuous]

Linear regression

We will study some options in this course!

SLIDE 6

Your Tasks

Assignment 4 14%

  • out 18 Sep, due 12pm 2 Oct

Assignment 5 14%

  • out 2 Oct, due 12pm 16 Oct

Consulting Project

  • project description + data, out
  • 2.5% half-time check, due 6pm 1 Oct
  • 7.5% seminar, during a lecture in the week of 22 Oct
  • 20% report, due 6pm on 26 Oct

There are bonus questions in lectures and assignments.

SLIDE 7

Our Problem

Regression


SLIDE 8

Course Objective

  • Understand the general theory of generalized linear models

model structure, parameter estimation, asymptotic normality, prediction

  • Be able to recognize and apply generalized linear models and extensions for regression on different types of data

  • Be able to determine the goodness of fit and the prediction quality of a model

Put simply: to be able to do regression using generalized linear models and extensions...

SLIDE 9

Course Overview

Generalized linear models (GLMs)

  • Building blocks

systematic and random components, exponential families

  • Prediction and parameter estimation
  • Specific models for different types of data

continuous response, binary response, count response...

  • Modelling process and model diagnostics

Extensions of GLMs

  • Quasi-likelihood models
  • Nonparametric models
  • Mixed models and marginal models

Time series


SLIDE 10

This Lecture

  • Revisit basics of OLS
  • Systematic and random components of OLS
  • Extensions of OLS to other types of data
  • A glimpse of generalized linear models


SLIDE 11

Revisiting OLS

The objective function

Ordinary least squares (OLS) finds a hyperplane minimizing the sum of squared errors (SSE):

    βn = arg min_{β ∈ Rd} ∑_{i=1}^n (x⊤i β − yi)²,

where each xi ∈ Rd and each yi ∈ R.

Terminology

x: input, independent variables, covariate vector, observation, predictors, explanatory variables, features.
y: output, dependent variable, response.

SLIDE 12

Solution The solution to OLS is βn = (X⊤X)−1X⊤y, where X is the n × d design matrix with xi as the i-th row, and y = (y1, . . . , yn)⊤.

The formula holds when X⊤X is non-singular. When X⊤X is singular, there are infinitely many possible values for βn. They can be obtained by solving the linear systems (X⊤X)β = X⊤y.
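The closed-form solution is easy to check numerically. A minimal sketch on synthetic data (all names and values below are illustrative, not from the lecture):

```python
import numpy as np

# Synthetic data: n = 50 points in d = 3 dimensions (illustrative values).
rng = np.random.default_rng(0)
n, d = 50, 3
X = rng.normal(size=(n, d))                    # n x d design matrix, x_i as rows
beta_true = np.array([2.0, -1.0, 0.5])
y = X @ beta_true + rng.normal(scale=0.1, size=n)

# Closed form: beta_n = (X^T X)^{-1} X^T y.
# Solving the normal equations (X^T X) beta = X^T y avoids an explicit inverse.
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# np.linalg.lstsq computes the same least squares solution more robustly.
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.allclose(beta_hat, beta_lstsq))       # the two agree
```

When X⊤X is well-conditioned both routes coincide; `lstsq` is preferred in practice because it also handles the singular case.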


SLIDE 13

Justification as MLE

  • Assumption: yi | xi ~ind N(x⊤i β, σ²).

  • Derivation: the log-likelihood of β is given by

    ln p(y1, . . . , yn | x1, . . . , xn, β) = ∑i ln p(yi | xi, β)
        = ∑i ln [ (1/(√(2π) σ)) exp(−(yi − x⊤i β)²/(2σ²)) ]
        = const. − (1/(2σ²)) ∑i (yi − x⊤i β)².

Thus minimizing the SSE is the same as maximizing the log-likelihood, i.e. maximum likelihood estimation (MLE).
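The equivalence can also be seen numerically: maximizing the Gaussian log-likelihood is minimizing the SSE, so a generic optimizer on the SSE lands on the closed-form OLS estimate. A sketch on made-up data (σ fixed at 1, plain gradient descent as the optimizer):

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 40, 2
X = rng.normal(size=(n, d))
y = X @ np.array([1.0, -2.0]) + rng.normal(size=n)

# Maximizing the Gaussian log-likelihood in beta is minimizing the SSE,
# so gradient descent on (1/2) * SSE converges to the MLE.
beta = np.zeros(d)
lr = 1.0 / np.linalg.norm(X.T @ X, 2)   # step size from the largest eigenvalue
for _ in range(2000):
    beta -= lr * X.T @ (X @ beta - y)   # gradient of (1/2) * SSE w.r.t. beta

beta_ols = np.linalg.solve(X.T @ X, X.T @ y)   # closed-form OLS
print(np.allclose(beta, beta_ols, atol=1e-6))
```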

SLIDE 14

An Alternative View

  • OLS has two orthogonal components:

    (systematic) E(Y | x) = β⊤x.
    (random) Y | x is normally distributed with variance σ².

  • This has two key features:
    • The expected value of Y given x is a function of β⊤x.
    • The parameters of the conditional distribution of Y given x can be determined from E(Y | x).

  • This defines a conditional distribution p(y | x, β), with parameters estimated using MLE.

SLIDE 15

Generalization

(systematic) E(Y | x) = g(β⊤x).
(random) Y | x is normally/Poisson/Bernoulli/... distributed.

SLIDE 16

Example 1. Logistic regression for binary response

  • When Y takes value 0 or 1, we can use the logistic function to squash β⊤x to [0, 1], and use the Bernoulli distribution to model Y | x, as follows.

    (systematic) E(Y | x) = logistic(β⊤x) = 1/(1 + e−β⊤x).
    (random) Y | x is Bernoulli distributed.

  • Or more compactly, Y | x ~ B(1/(1 + e−β⊤x)), where B(p) is the Bernoulli distribution with parameter p.
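A small simulation of the two components (a sketch; the coefficients and sample size are made up for illustration):

```python
import numpy as np

def logistic(t):
    # Squashes any real number into (0, 1).
    return 1.0 / (1.0 + np.exp(-t))

rng = np.random.default_rng(2)
n, d = 5000, 2
X = rng.normal(size=(n, d))
beta = np.array([1.5, -0.5])

p = logistic(X @ beta)       # systematic: E(Y | x) = logistic(beta^T x)
y = rng.binomial(1, p)       # random:     Y | x ~ B(p)

# The simulated responses are binary, and their overall rate tracks mean(p).
print(y.min(), y.max())
```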

SLIDE 17

Example 2. Poisson regression for count response

  • When Y is a count, we can use exponentiation to map β⊤x to a non-negative value, and use the Poisson distribution to model Y | x, as follows.

    (systematic) E(Y | x) = exp(β⊤x).
    (random) Y | x is Poisson distributed.

  • Or more compactly, Y | x ~ Po(exp(β⊤x)), where Po(λ) is a Poisson distribution with parameter λ.
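The same kind of simulation for count data (a sketch with illustrative values):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 10000
x = rng.uniform(-1, 1, size=(n, 1))
beta = np.array([0.8])

lam = np.exp(x @ beta)       # systematic: E(Y | x) = exp(beta^T x) > 0
y = rng.poisson(lam)         # random:     Y | x ~ Po(lam)

# Counts are non-negative integers, and their sample mean tracks mean(lam).
print(y.min(), round(y.mean(), 2))
```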

SLIDE 18

Example 3. Gamma regression for non-negative response

  • When Y is a non-negative continuous random variable, we can choose the systematic and random components as follows.

    (systematic) E(Y | x) = exp(β⊤x).
    (random) Y | x is Gamma distributed.

  • We further assume the variance of the Gamma distribution is µ²/ν (ν treated as known), thus Y | x ~ Γ(µ = exp(β⊤x), var = µ²/ν), where Γ(µ = a, var = b) denotes a Gamma distribution with mean a and variance b.
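In the usual shape/scale parameterization, mean µ with variance µ²/ν corresponds to shape ν and scale µ/ν, since shape·scale = µ and shape·scale² = µ²/ν. A sketch with made-up values of µ and ν:

```python
import numpy as np

# Convert the (mean, variance) parameterization to numpy's (shape, scale).
mu, nu = 3.0, 5.0
shape, scale = nu, mu / nu      # mean = shape * scale   = mu
                                # var  = shape * scale^2 = mu^2 / nu
rng = np.random.default_rng(4)
samples = rng.gamma(shape, scale, size=200_000)

# Sample moments should be close to mu = 3.0 and mu^2/nu = 1.8.
print(round(samples.mean(), 1), round(samples.var(), 1))
```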

SLIDE 19

Generalized Linear Models

  • A GLM has the following structure

(systematic) E(Y | x) = h(β⊤x). (random) Y | x follows an exponential family distribution.

  • This is usually separated into three components:
    • The linear predictor β⊤x.
    • The response function h. (People often specify the link function g = h−1 instead.)
    • The exponential family for the conditional distribution of Y given x.
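For logistic regression, for instance, the response function h is the logistic function and the link g is its inverse, the logit. A quick sketch of the inverse relationship:

```python
import numpy as np

def h(eta):
    # Response function: linear predictor -> mean, here the logistic function.
    return 1.0 / (1.0 + np.exp(-eta))

def g(mu):
    # Link function g = h^{-1}: mean -> linear predictor, here the logit.
    return np.log(mu / (1.0 - mu))

eta = np.linspace(-3.0, 3.0, 7)
print(np.allclose(g(h(eta)), eta))   # g undoes h
```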

SLIDE 20

Remarks on the exponential family

  • It is common!

normal, Bernoulli, Poisson, and Gamma distributions are exponential families.

  • It gives a well-defined model.

its parameters are determined by the mean µ = E(Y | x).

  • It leads to a unified treatment of many different models.

linear regression, logistic regression, ...

We will take a close look at these in the next few lectures.


SLIDE 21

What You Need to Know

  • The approach of regression by separately specifying systematic and random components.
  • Example applications of the approach: linear regression, logistic regression, Poisson regression, Gamma regression.
  • The components of generalized linear models.