
Workshop 10.4: Generalized linear models
Murray Logan
August 16, 2016

Table of contents
1 Exponential family distributions

0.1. Linear models

$$y_i = \beta_0 + \beta_1 \times x_i + \varepsilon_i, \qquad \varepsilon_i \sim N(0, \sigma^2)$$

$$V = \mathrm{cov} = \begin{pmatrix}
\sigma^2 & 0 & \cdots & 0 \\
0 & \sigma^2 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & \sigma^2
\end{pmatrix}$$

Assumptions: Normality, Homogeneity of variance, Zero covariance (= independence), Linearity.

0.2. Other data types
• Binary: only 0 and 1 (dead/alive, present/absent)
• Proportional abundance: ranges from 0 to 100
• Count data: minimum of zero
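Binary responses sit awkwardly with the Gaussian linear model. As a minimal sketch (hypothetical data, plain Python; `ols_fit` is an illustrative helper, not part of the workshop), fitting an ordinary least-squares line to presence/absence data produces fitted values outside the logical range [0, 1]:

```python
# Hypothetical presence/absence data: 0 = absent, 1 = present.
x = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
y = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]


def ols_fit(x, y):
    """Return (intercept, slope) of the least-squares line y = b0 + b1 * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1


b0, b1 = ols_fit(x, y)
predicted = [b0 + b1 * xi for xi in x]

# A probability must lie in [0, 1], but the straight line escapes that range.
print(f"intercept = {b0:.3f}, slope = {b1:.3f}")
print(f"prediction at x = 0: {predicted[0]:.3f}")   # below 0
print(f"prediction at x = 9: {predicted[-1]:.3f}")  # above 1
```

The same problem appears with counts: a straight line can predict negative values even though counts have a minimum of zero.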

0.3. Linear models

[Figure: a) frequency distribution of the response; b) presence/absence (0/1) against X, with the predicted probability of presence]

• expected values outside logical bounds
• response not normally distributed

0.4. Logistic models

[Figure: presence/absence (0/1) against X, with the predicted probability of presence]

A logistic model addresses both issues: predicted probabilities stay within [0, 1], and the response is modelled with a binomial rather than a normal distribution.

1. Exponential family distributions

1.1. Gaussian distribution
Virtually unbounded measurements (weight, length, etc.)

[Figure: probability density and cumulative distribution functions for µ = 25, σ² = 5; µ = 25, σ² = 2; µ = 10, σ² = 2 (x from 0 to 40)]

$$f(x \mid \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$$

1.2. Binomial distribution
Presence/absence data and data bounded to the range [0, 1]

[Figure: probability mass and cumulative distribution functions for n = 50, n = 20, n = 3 (x from 0 to 40)]

$$f(k \mid n, p) = \binom{n}{k} p^{k} (1-p)^{n-k}$$
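Both formulas can be checked numerically. A minimal sketch using only the Python standard library (`math.comb` needs Python 3.8+); the function names and the value p = 0.3 are illustrative choices, not from the workshop:

```python
import math


def gaussian_pdf(x, mu, sigma2):
    """Gaussian density f(x | mu, sigma^2)."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma2)) / math.sqrt(2 * math.pi * sigma2)


def binomial_pmf(k, n, p):
    """Binomial probability f(k | n, p) of k successes in n trials."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)


# Density at the mean for mu = 25, sigma^2 = 5 equals 1 / sqrt(2 * pi * 5).
print(gaussian_pdf(25, 25, 5))

# The binomial probabilities over k = 0..n sum to 1 (here n = 20, p = 0.3).
print(sum(binomial_pmf(k, 20, 0.3) for k in range(21)))
```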

1.3. Poisson distribution
Count data (or quantities derived from counts, such as low densities)

[Figure: probability mass and cumulative distribution functions for λ = 25, λ = 15, λ = 3 (x from 0 to 40)]

$$f(x \mid \lambda) = \frac{e^{-\lambda} \lambda^{x}}{x!}$$

1.4. Negative binomial distribution
Count data (or quantities derived from counts, such as low densities)

[Figure: probability mass and cumulative distribution functions for n = 25, n = 10, n = 1.5 (x from 0 to 40)]

$$f(x \mid \mu, \omega) = \frac{\Gamma(x + \omega)}{\Gamma(\omega)\, x!} \times \left(\frac{\mu}{\mu + \omega}\right)^{x} \left(\frac{\omega}{\mu + \omega}\right)^{\omega}$$
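Both probability functions can be evaluated directly. This sketch (Python standard library only; function names and parameter values are illustrative) computes them on the log scale for numerical stability and compares the first two moments. It relies on the standard result that, in this mean/dispersion form, the negative binomial has mean µ and variance µ + µ²/ω, whereas the Poisson mean and variance both equal λ:

```python
import math


def poisson_pmf(x, lam):
    """Poisson probability f(x | lambda), computed on the log scale."""
    return math.exp(-lam + x * math.log(lam) - math.lgamma(x + 1))


def negbin_pmf(x, mu, omega):
    """Negative binomial probability f(x | mu, omega) in mean/dispersion form."""
    log_p = (
        math.lgamma(x + omega)
        - math.lgamma(omega)
        - math.lgamma(x + 1)
        + x * math.log(mu / (mu + omega))
        + omega * math.log(omega / (mu + omega))
    )
    return math.exp(log_p)


def moments(pmf, upper):
    """Mean and variance of a count distribution, truncated at `upper`."""
    probs = [pmf(x) for x in range(upper + 1)]
    mean = sum(x * p for x, p in enumerate(probs))
    var = sum((x - mean) ** 2 * p for x, p in enumerate(probs))
    return mean, var


# Poisson: mean and variance both equal lambda.
print(moments(lambda x: poisson_pmf(x, 3), 200))

# Negative binomial: mean mu, but variance mu + mu^2 / omega (overdispersion).
mu, omega = 10, 1.5
print(moments(lambda x: negbin_pmf(x, mu, omega), 1000), mu + mu ** 2 / omega)
```

The smaller ω is, the larger the extra variance, which is why the negative binomial suits overdispersed counts.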

1.5. General linear models

$$y_i = \beta_0 + \beta_1 \times x_i + \varepsilon_i, \qquad \varepsilon_i \sim N(0, \sigma^2)$$

$$V = \mathrm{cov} = \begin{pmatrix}
\sigma^2 & 0 & \cdots & 0 \\
0 & \sigma^2 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & \sigma^2
\end{pmatrix}$$

Assumptions: Normality, Homogeneity of variance, Zero covariance (= independence), Linearity.

$$\underbrace{E(Y)}_{\text{Link function}} = \underbrace{\beta_0 + \beta_1 x_1 + \dots + \beta_p x_p}_{\text{Systematic}} + \varepsilon, \qquad \varepsilon \sim \mathrm{Dist}(\dots)$$

1.6. General linear models

$$\underbrace{E(Y)}_{\text{Link function}} = \underbrace{\beta_0 + \beta_1 x_1 + \dots + \beta_p x_p}_{\text{Systematic}} + \underbrace{\varepsilon}_{\text{Random}}$$

• Random component: a nominated distribution (Gaussian, Poisson, Binomial, Gamma, Beta, ...), e.g. $Y_i \sim N(\mu_i, \sigma^2)$

1.7. General linear models

• Random component
• Systematic component: $\beta_0 + \beta_1 x_1 + \dots + \beta_p x_p$
• Link function: relates the expected value $E(Y)$ to the systematic component

1.8. Generalized linear models
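To make the three components concrete, the sketch below simulates data from a Poisson GLM with a log link. It assumes NumPy (`numpy.random.default_rng`, `Generator.poisson`); the predictor values and coefficients are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical predictor and coefficients (illustrative values only).
x = np.linspace(0, 10, 50)
beta0, beta1 = 0.5, 0.2

# Systematic component: the linear predictor eta = beta0 + beta1 * x.
eta = beta0 + beta1 * x

# Link function g(mu) = log(mu), so the inverse link is mu = exp(eta).
mu = np.exp(eta)

# Random component: each observation is drawn from Poisson(mu_i).
y = rng.poisson(mu)

print(np.allclose(np.log(mu), eta))  # the link maps E(Y) back onto eta
print(y.min() >= 0, y.dtype.kind)     # non-negative integer counts
```

Swapping the random component (e.g. binomial) and the link (e.g. logit) while keeping the same linear predictor gives the other model families listed for generalized linear models.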

| Response variable | Probability distribution | Link function | Model name |
|---|---|---|---|
| Continuous measurements | Gaussian | identity: $\mu$ | Linear regression |
| Binary, proportions | Binomial | logit: $\log\left(\frac{\pi}{1-\pi}\right)$ | Logistic regression |
| | | probit: $\pi = \int_{-\infty}^{\alpha + \beta X} \frac{1}{\sqrt{2\pi}} \exp\left(-\tfrac{1}{2} Z^2\right) dZ$ | Probit regression |
| | | complementary log-log: $\log(-\log(1-\pi))$ | Complementary log-log regression |
| | Quasi-binomial | logit: $\log\left(\frac{\pi}{1-\pi}\right)$ | Logistic regression |
| Counts | Poisson | log: $\log \mu$ | Poisson regression / log-linear model |
| | Negative binomial | $\log\left(\frac{\mu}{\mu + \theta}\right)$ | Negative binomial regression |
| | Quasi-Poisson | log: $\log \mu$ | Poisson regression |

1.9. OLS

[Figure: sum of squares against candidate parameter estimates (6 to 14), annotated µ = 10]
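The OLS idea in 1.9 can be reproduced with a grid search: evaluate the sum of squares for each candidate value and keep the smallest. The sample below is hypothetical, chosen so that its mean is 10; for a single mean, the least-squares estimate is the sample mean:

```python
# Hypothetical sample whose mean is 10.
sample = [8.0, 9.0, 10.0, 11.0, 12.0]


def sum_of_squares(mu, data):
    """Residual sum of squares for a candidate estimate mu."""
    return sum((xi - mu) ** 2 for xi in data)


# Candidate estimates from 6 to 14 in steps of 0.5.
candidates = [6 + 0.5 * i for i in range(17)]
ss = [sum_of_squares(mu, sample) for mu in candidates]

# OLS picks the candidate with the smallest sum of squares.
best = candidates[ss.index(min(ss))]
print(best, sum(sample) / len(sample))
```

In practice the minimum is found analytically (or by numerical optimisation) rather than on a coarse grid; the grid only mirrors the plotted curve.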

1.10. Maximum likelihood

$$f(x \mid \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$$

$$\ln L(\mu, \sigma^2) = -\frac{n}{2}\ln(2\pi) - \frac{n}{2}\ln \sigma^2 - \frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i - \mu)^2$$

Maximum likelihood estimates:

$$\hat{\mu} = \bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i \qquad \hat{\sigma}^2 = \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2$$

1.11. Maximum likelihood

[Figure: log-likelihood against candidate parameter estimates (6 to 14), annotated µ = 10]
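The log-likelihood expression and the closed-form estimates can be cross-checked in a few lines. A minimal sketch in plain Python with a hypothetical sample (mean 10): it computes µ̂ and σ̂² from the formulas, then confirms that a grid search over µ, with σ² held at σ̂², peaks at the same value:

```python
import math

# Hypothetical sample with mean 10.
sample = [8.0, 9.0, 10.0, 11.0, 12.0]
n = len(sample)


def log_likelihood(mu, sigma2, data):
    """Gaussian log-likelihood ln L(mu, sigma^2) from the closed-form expression."""
    n = len(data)
    ss = sum((xi - mu) ** 2 for xi in data)
    return -n / 2 * math.log(2 * math.pi) - n / 2 * math.log(sigma2) - ss / (2 * sigma2)


# Closed-form maximum likelihood estimates (note the divisor n, not n - 1).
mu_hat = sum(sample) / n
sigma2_hat = sum((xi - mu_hat) ** 2 for xi in sample) / n

# A grid search over mu (sigma^2 held at its MLE) peaks at the same value.
candidates = [6 + 0.5 * i for i in range(17)]
best = max(candidates, key=lambda mu: log_likelihood(mu, sigma2_hat, sample))
print(mu_hat, sigma2_hat, best)
```

For Gaussian data the maximum likelihood estimate of µ coincides with the OLS estimate; the two approaches differ for the non-Gaussian distributions used in generalized linear models, where maximum likelihood is the standard fitting method.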
