Limitations of linear models: Richard Erickson, Instructor, DataCamp

SLIDE 1

DataCamp Generalized Linear Models in R

Limitations of linear models

GENERALIZED LINEAR MODELS IN R

Richard Erickson

Instructor

SLIDE 2

Course overview

Chapter 1: Review and limits of linear models, and Poisson regression
Chapter 2: Logistic (binomial) regression
Chapter 3: Interpreting and plotting GLMs
Chapter 4: Multiple regression with GLMs

SLIDE 3

Workhorse of data science

Image source: US Department of Agriculture

SLIDE 4

Linear models

How can linear coefficients explain the data?
Intercept for baseline effect
Slope for linear predictor
$y = \beta_0 + \beta_1 x + \epsilon$

SLIDE 5

Linear models in R

lm(y ~ x, data = dat)
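A minimal sketch with simulated data (the true values 2 and 0.5, the seed, and the data frame `dat` are hypothetical) showing that lm() recovers the intercept $\beta_0$ and slope $\beta_1$ of $y = \beta_0 + \beta_1 x + \epsilon$ when the errors are normal:

```r
# Simulate data from a known linear model (hypothetical parameters)
set.seed(123)
n <- 200
beta0 <- 2    # true intercept (baseline effect)
beta1 <- 0.5  # true slope
x <- runif(n, min = 0, max = 10)
y <- beta0 + beta1 * x + rnorm(n, mean = 0, sd = 1)  # normal errors
dat <- data.frame(x = x, y = y)

# Fit the linear model and extract the estimated coefficients
fit <- lm(y ~ x, data = dat)
est <- coef(fit)
print(est)  # named vector with entries "(Intercept)" and "x"
```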

SLIDE 6

Assumption of linearity

SLIDE 7

Assumption of normality

SLIDE 8

Assumption of continuous variables

SLIDE 9

SLIDE 10

Chick diets' impact on weight

ChickWeight data from the datasets package
ChickWeightEnd: last observation from the study

How do diets 2, 3, and 4 compare to diet 1?

lm(formula = weight ~ Diet, data = ChickWeightEnd)

Call:
lm(formula = weight ~ Diet, data = ChickWeightEnd)

Coefficients:
(Intercept)        Diet2        Diet3        Diet4
     177.75        36.95        92.55        60.81
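A minimal sketch of how ChickWeightEnd might be built. The slide does not show this step, so keeping only the final measurement day (Time == 21) is an assumption. Because Diet is a factor, the intercept is the diet 1 mean and each DietK coefficient is that diet's difference from diet 1:

```r
# ChickWeight ships with R in the datasets package
data(ChickWeight, package = "datasets")

# Assumption: ChickWeightEnd is the final measurement day of the study
ChickWeightEnd <- subset(ChickWeight, Time == max(Time))

# Diet 1 is the baseline (intercept); Diet2..Diet4 are differences from it
fit_diet <- lm(weight ~ Diet, data = ChickWeightEnd)
coef(fit_diet)

# Group means for comparison: intercept + DietK gives the mean of diet K
tapply(ChickWeightEnd$weight, ChickWeightEnd$Diet, mean)
```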

SLIDE 11

What about survivorship or counts?

What about chick survivorship or chick counts?
Neither is continuous!
We need a new tool: the generalized linear model

SLIDE 12

Generalized linear model

Similar to linear models
Non-normal error distribution
Link functions: $y = \psi(\beta_0 + \beta_1 x + \epsilon)$
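Base R family objects expose both the link function and its inverse; the inverse is the role ψ plays in the formula above, mapping the linear predictor back onto the scale of y. A short sketch using only the stats package (the values in `eta` are arbitrary examples):

```r
# Family objects carry their link functions
pois <- poisson()   # default link: "log"
gaus <- gaussian()  # default link: "identity"

pois$link
gaus$link

# linkfun maps a mean onto the linear-predictor scale;
# linkinv maps a linear predictor back onto the mean scale
eta <- c(-1, 0, 1)       # example linear-predictor values
pois$linkinv(eta)        # exp(eta): always positive, suitable for counts
gaus$linkinv(eta)        # returned unchanged
```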

SLIDE 13

GLMs in R

lm() is the same as glm(..., family = "gaussian")

glm(y ~ x, data = dat, family = "gaussian")
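A quick check on hypothetical simulated data that lm() and glm() with the gaussian family (identity link) return the same coefficient estimates:

```r
set.seed(1)
dat <- data.frame(x = 1:20)
dat$y <- 3 + 2 * dat$x + rnorm(20)   # hypothetical data

fit_lm  <- lm(y ~ x, data = dat)
fit_glm <- glm(y ~ x, data = dat, family = "gaussian")

# Same estimates, reported through different model objects
cbind(lm = coef(fit_lm), glm = coef(fit_glm))
```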

SLIDE 14

Let's practice!

SLIDE 15

Poisson regression


SLIDE 16

SLIDE 17

SLIDE 18

Poisson distribution

Discrete integers: x = 0, 1, 2, 3, ...
Mean and variance parameter: λ
$P(x) = \frac{\lambda^x e^{-\lambda}}{x!}$
Fixed area/time (e.g., goals per game)

SLIDE 19

Poisson distribution in R

dpois(x = ..., lambda = ...)
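A quick check, assuming a hypothetical rate of λ = 2 (for example, 2 goals per game), that dpois() matches the formula $P(x) = \lambda^x e^{-\lambda} / x!$ and that the mean and variance both equal λ:

```r
lambda <- 2   # hypothetical rate
x <- 0:5

# dpois() agrees with lambda^x * exp(-lambda) / x!
by_hand <- lambda^x * exp(-lambda) / factorial(x)
dpois(x = x, lambda = lambda)

# Mean and variance both equal lambda (summing over a long enough range)
k <- 0:100
p <- dpois(k, lambda = lambda)
pois_mean <- sum(k * p)
pois_var  <- sum((k - pois_mean)^2 * p)
c(mean = pois_mean, variance = pois_var)
```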

SLIDE 20

GLM with R requirements

Discrete counts: 0, 1, 2, 3, ...
Defined area and time
Log-scale coefficients

SLIDE 21

GLM with Poisson in R

glm(y ~ x, data = dat, family = 'poisson')
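A sketch on simulated counts (the true model $\log(\lambda) = 0.5 + 0.8x$ is hypothetical) showing what log-scale coefficients mean in practice: exponentiating the slope gives a multiplicative change in the expected count, and link-scale predictions exponentiate to response-scale predictions.

```r
set.seed(2)
n <- 500
x <- runif(n, 0, 2)
y <- rpois(n, lambda = exp(0.5 + 0.8 * x))   # hypothetical true model
dat <- data.frame(x = x, y = y)

fit_pois <- glm(y ~ x, data = dat, family = "poisson")

# Coefficients are on the log scale
coef(fit_pois)
# exp(slope): multiplicative change in the expected count per unit of x
exp(coef(fit_pois)[["x"]])

# Link-scale (log) predictions exponentiate to expected counts
eta <- predict(fit_pois, type = "link")
mu  <- predict(fit_pois, type = "response")
```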

SLIDE 22

When not to use the Poisson distribution

Non-count or negative data (e.g., 1.4 or -2)
Non-constant sample area or time (e.g., trees km⁻¹ vs. trees m⁻¹)
Mean ≳ 30
Over-dispersed data
Zero-inflated data
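One common way to screen for over-dispersion, sketched here on hypothetical negative binomial counts, is the Pearson dispersion statistic: the sum of squared Pearson residuals divided by the residual degrees of freedom. Values near 1 are consistent with a Poisson model; values well above 1 suggest over-dispersion.

```r
set.seed(3)
# Hypothetical over-dispersed counts: negative binomial with mean 5.
# Variance is mu + mu^2 / size = 5 + 25 = 30, far above the mean.
y <- rnbinom(500, mu = 5, size = 1)
fit <- glm(y ~ 1, family = "poisson")

# Roughly 1 under a Poisson model, much larger when over-dispersed
dispersion <- sum(residuals(fit, type = "pearson")^2) / df.residual(fit)
dispersion
```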

SLIDE 23

Formula intercepts

Comparison or intercepts?
Comparison: formula = y ~ x
Intercepts: formula = y ~ x - 1
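The difference between the two formulas is visible in the design matrix. A sketch with a hypothetical two-level factor, using model.matrix() from the stats package:

```r
# Hypothetical two-level factor
player <- factor(c("A", "A", "B", "B"))

# Comparison (default): intercept = level A, second column = B vs. A
X_cmp <- model.matrix(~ player)
# Intercepts: "- 1" drops the global intercept, one column per level
X_int <- model.matrix(~ player - 1)

colnames(X_cmp)   # "(Intercept)" "playerB"
colnames(X_int)   # "playerA" "playerB"
```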

SLIDE 24

Goals per game

Two players: which approach do we use?

If we want to know the difference between players, use comparison:

glm(goal ~ player, data = scores, family = "poisson")

If we want to know the average per player, use intercepts:

glm(goal ~ player - 1, data = scores, family = "poisson")
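A sketch with hypothetical goal counts for two players (the data frame `scores` below is invented for illustration). For a Poisson GLM whose only predictor is a factor, the fitted rates equal the per-player sample means, so the exponentiated intercepts are goals per game and the exponentiated comparison coefficient is a ratio of rates.

```r
# Hypothetical goals per game for two players
scores <- data.frame(
  player = factor(rep(c("A", "B"), each = 5)),
  goal   = c(1, 0, 2, 1, 1,   3, 2, 4, 1, 2)
)

fit_cmp <- glm(goal ~ player,     data = scores, family = "poisson")
fit_int <- glm(goal ~ player - 1, data = scores, family = "poisson")

# Intercepts model: mean goals per game, approximately 1.0 (A) and 2.4 (B)
exp(coef(fit_int))
# Comparison model: B's rate relative to A, approximately 2.4 / 1.0 = 2.4
exp(coef(fit_cmp)[["playerB"]])
```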

SLIDE 25

Let's practice!


SLIDE 26

Basic lm() functions with glm()


SLIDE 27

Interacting with model objects

Allow interaction with outputs
Base R functions apply to glm()
Useful shortcuts

SLIDE 28

Model print

print() is usually the default

> print(poissonOut)

Call:  glm(formula = y ~ x, family = "poisson", data = dat)

Coefficients:
(Intercept)            x
   -1.43036      0.05815

Degrees of Freedom: 29 Total (i.e. Null);  28 Residual
Null Deviance:      35.63
Residual Deviance: 30.92    AIC: 66.02

SLIDE 29

Model summary

summary() provides more details

> summary(poissonOut)
#...
Deviance Residuals:
    Min       1Q   Median       3Q      Max
-1.6547  -0.9666  -0.7226   0.3830   2.3022

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept) -1.43036    0.59004  -2.424   0.0153 *
x            0.05815    0.02779   2.093   0.0364 *
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for poisson family taken to be 1)

    Null deviance: 35.627  on 29  degrees of freedom
Residual deviance: 30.918  on 28  degrees of freedom
AIC: 66.024

Number of Fisher Scoring iterations: 5
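The tables printed by summary() can also be extracted programmatically. A sketch on hypothetical simulated data (30 rows, so the degrees of freedom match those above, though the estimates will not):

```r
set.seed(4)
dat <- data.frame(x = 1:30)
dat$y <- rpois(30, lambda = exp(-1 + 0.05 * dat$x))  # hypothetical data
poissonOut <- glm(y ~ x, data = dat, family = "poisson")

# The coefficient table printed by summary() is an ordinary matrix
coef_table <- coef(summary(poissonOut))
coef_table[, "Estimate"]
coef_table[, "Pr(>|z|)"]

# Other quantities reported by print() and summary()
deviance(poissonOut)        # residual deviance
poissonOut$null.deviance    # null deviance
AIC(poissonOut)
```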

SLIDE 30

Tidy output

tidy() from the tidyverse broom package provides standardized model output

> library(broom)
> tidy(poissonOut)
         term    estimate  std.error statistic    p.value
1 (Intercept) -1.43035579 0.59003923 -2.424171 0.01534339
2           x  0.05814858 0.02778801  2.092578 0.03638686

SLIDE 31

Regression coefficients

coef() extracts the regression coefficients

> coef(poissonOut)
(Intercept)           x
-1.43035579  0.05814858

SLIDE 32

Confidence intervals

confint() estimates the confidence intervals

> confint(poissonOut)
Waiting for profiling to be done...
                   2.5 %     97.5 %
(Intercept) -2.725545344 -0.3897748
x            0.005500767  0.1155564
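For a glm, confint() profiles the likelihood, which is what the "Waiting for profiling to be done..." message refers to. confint.default() instead returns Wald intervals (estimate ± 1.96 standard errors), which are usually close to the profile intervals but not identical. A sketch on hypothetical data:

```r
set.seed(5)
dat <- data.frame(x = 1:30)
dat$y <- rpois(30, lambda = exp(-1 + 0.05 * dat$x))  # hypothetical data
fit <- glm(y ~ x, data = dat, family = "poisson")

# Wald intervals: no profiling needed
wald <- confint.default(fit, level = 0.95)
wald

# The same Wald interval for the slope, computed by hand
est <- coef(fit)[["x"]]
se  <- sqrt(diag(vcov(fit)))[["x"]]
est + c(-1, 1) * qnorm(0.975) * se
```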

SLIDE 33

Predictions

predict(model, newdata)
newdata argument:

Unspecified: predict() returns predictions for the original data used to fit the model.
Specified: predict() returns predictions for newdata.
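The argument is spelled `newdata` in predict.glm(). By default, predictions from a Poisson GLM are on the link (log) scale; `type = "response"` returns expected counts. A sketch on hypothetical simulated data (`newDat` is an illustrative name):

```r
set.seed(6)
dat <- data.frame(x = runif(50, 0, 10))
dat$y <- rpois(50, lambda = exp(0.2 + 0.1 * dat$x))  # hypothetical data
fit <- glm(y ~ x, data = dat, family = "poisson")

# newdata unspecified: one prediction per row of the original data
length(predict(fit)) == nrow(dat)

# newdata specified: predictions for new x values
newDat <- data.frame(x = c(0, 5, 10))
pred_link <- predict(fit, newdata = newDat)                     # log scale (default)
pred_resp <- predict(fit, newdata = newDat, type = "response")  # expected counts
```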

SLIDE 34

Fire injury dataset

Daily civilian injuries in Louisville, KY
Count data, many zeros
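A quick zero check, sketched on a hypothetical vector of daily counts (not the Louisville data): compare the observed share of zeros with the share a Poisson distribution with the same mean would predict. A large excess points toward the zero-inflation concern listed among the reasons not to use a Poisson model.

```r
# Hypothetical daily injury counts with many zeros
injuries <- c(rep(0, 12), 4, 5, 6)

observed_zero <- mean(injuries == 0)                # 12 / 15 = 0.8
expected_zero <- dpois(0, lambda = mean(injuries))  # exp(-1), about 0.37

c(observed = observed_zero, expected = expected_zero)
```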

SLIDE 35

Let's practice!
