the magical world of mgcv, Noam Ross @noamross #nyhackr, 2017-11-15



slide-1
SLIDE 1

Nonlinear Modeling in R with GAMs: the magical world of mgcv

Noam Ross @noamross #nyhackr, 2017-11-15
slide-2
SLIDE 2
slide-3
SLIDE 3

Pre-Thanks

Gavin Simpson (@ucfagls) Eric Pedersen (@ericJpedersen) David Miller (@millerdl)

slide-4
SLIDE 4

Why Generalized Additive Models?

slide-5
SLIDE 5

When to use GAMs

  • To predict from complex, nonlinear, possibly interacting relationships
  • To understand and make inferences about those relationships
  • To control for those relationships
slide-6
SLIDE 6

Not bad at prediction!

From Kim Larsen @ Stitchfix: https://github.com/klarsen1/gampost

Performance in Binary Classification of Direct Mail Customer Acquisition

slide-7
SLIDE 7

A Thimbleful of Theory

slide-8
SLIDE 8

What are GAMs?

  • Generalized: can handle many distributions: normal, binomial, count, or other data
  • Additive: terms simply add together, but the terms themselves need not be linear
  • Model: Model
slide-9
SLIDE 9

Going from Linear to Additive

slide-10
SLIDE 10

Going from Linear to Additive

slide-11
SLIDE 11

GAM Smooths are made of basis functions

slide-12
SLIDE 12

Basis functions can have 1, 2, or more dimensions
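To make the basis-function idea concrete, here is a minimal sketch on simulated data (the data and object names are made up for illustration, not taken from the talk). It pulls the basis-function matrix out of a fitted model with predict(type = "lpmatrix") and checks that the fitted curve is just those columns weighted by their coefficients and added up.

```r
library(mgcv)

# Simulated data (an assumption for illustration)
set.seed(1)
n <- 200
dat <- data.frame(x = runif(n))
dat$y <- sin(2 * pi * dat$x) + rnorm(n, sd = 0.3)

m <- gam(y ~ s(x, k = 10), data = dat, method = "REML")

# Each column of X is the intercept or one basis function evaluated at the data
X <- predict(m, type = "lpmatrix")
dim(X)

# The fitted curve is the weighted sum of those columns
fit_by_hand <- as.numeric(X %*% coef(m))
max_diff <- max(abs(fit_by_hand - as.numeric(fitted(m))))
max_diff
```

With a Gaussian family and identity link the hand-built sum and fitted(m) should agree up to floating-point error.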

slide-13
SLIDE 13

Optimizing Wiggliness

log(L) - λW

log(L): Likelihood/Fit
W: Wiggliness
λ: Smoothing Parameter
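The trade-off can be seen by fixing λ by hand rather than letting mgcv choose it. The sketch below uses simulated data (an assumption for illustration) and the sp argument of gam() to force a tiny and a huge smoothing parameter, then compares the effective degrees of freedom each fit uses.

```r
library(mgcv)

# Simulated data (an assumption for illustration)
set.seed(2)
dat <- data.frame(x = runif(200))
dat$y <- sin(2 * pi * dat$x) + rnorm(200, sd = 0.3)

# sp fixes the smoothing parameter (lambda) instead of estimating it
m_wiggly <- gam(y ~ s(x, k = 20), data = dat, sp = 1e-6)  # almost no penalty
m_stiff  <- gam(y ~ s(x, k = 20), data = dat, sp = 1e6)   # very heavy penalty

# Total effective degrees of freedom, intercept included
edf_wiggly <- sum(m_wiggly$edf)
edf_stiff  <- sum(m_stiff$edf)
c(wiggly = edf_wiggly, stiff = edf_stiff)
```

A small λ lets the fit chase the data; a large λ pushes the default thin plate smooth toward a straight line.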

slide-14
SLIDE 14

Picking a Smoothing Parameter

slide-15
SLIDE 15

More Theory

slide-16
SLIDE 16

Picking a Smoothing Parameter

(This is automated in mgcv, phew!)
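As a rough illustration of that automation, the sketch below fits the same smooth to simulated data (an assumption for illustration) under two of the selection criteria gam() offers through its method argument, and reads back the smoothing parameter each one picked.

```r
library(mgcv)

# Simulated data (an assumption for illustration)
set.seed(3)
dat <- data.frame(x = runif(300))
dat$y <- sin(2 * pi * dat$x) + rnorm(300, sd = 0.3)

m_reml <- gam(y ~ s(x), data = dat, method = "REML")    # restricted maximum likelihood
m_gcv  <- gam(y ~ s(x), data = dat, method = "GCV.Cp")  # generalized cross-validation

# The estimated smoothing parameters, one per penalty
m_reml$sp
m_gcv$sp
```

The two criteria usually land on similar but not identical values.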

slide-17
SLIDE 17

A Smidgen of Syntax

slide-18
SLIDE 18

Fitting a GAM in R

lm(y ~ x1 + x2, data = data)

glm(y ~ x1 + x2, data = data, family = binomial)

library(mgcv)
gam(y ~ x1 + s(x2),     # model formula
    data = data,        # your data
    family = gaussian,  # or something more exotic
    method = "REML")    # how to pick λ
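For a version that runs end to end, here is a small sketch with simulated data standing in for "your data" (the variable names mirror the slide; the data-generating process is an assumption). It fits the model, prints the summary, and predicts at a few new points.

```r
library(mgcv)

# Simulated data: linear in x1, nonlinear in x2 (an assumption for illustration)
set.seed(4)
n <- 300
dat <- data.frame(x1 = rnorm(n), x2 = runif(n))
dat$y <- 0.5 * dat$x1 + sin(2 * pi * dat$x2) + rnorm(n, sd = 0.3)

m <- gam(y ~ x1 + s(x2), data = dat, family = gaussian, method = "REML")
summary(m)

# Predictions with standard errors along x2, holding x1 at 0
newdat <- data.frame(x1 = 0, x2 = seq(0, 1, length.out = 5))
pred <- predict(m, newdata = newdat, se.fit = TRUE)
pred$fit
```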

slide-19
SLIDE 19

The GAM Formula

y ~ x1 +        # linear terms
  s(            # smooth terms:
    x2,         # variable
    bs = "tp",  # the kind of basis function
    k = 10,     # how many basis functions
    ...)        # other complex and basis-specific stuff
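A quick way to see what k does is to fit the same smooth with two values and count coefficients. This sketch uses simulated data (an assumption for illustration); k sets the size of the basis, while the penalty decides how much of it actually gets used.

```r
library(mgcv)

# Simulated data (an assumption for illustration)
set.seed(5)
dat <- data.frame(x2 = runif(300))
dat$y <- sin(4 * pi * dat$x2) + rnorm(300, sd = 0.3)

m_k5  <- gam(y ~ s(x2, bs = "tp", k = 5),  data = dat, method = "REML")
m_k20 <- gam(y ~ s(x2, bs = "tp", k = 20), data = dat, method = "REML")

# One coefficient per smooth is absorbed by the sum-to-zero constraint,
# so the model has (k - 1) smooth coefficients plus the intercept
n_coef <- c(k5 = length(coef(m_k5)), k20 = length(coef(m_k20)))
n_coef

# Effective degrees of freedom used by the smooth (total minus intercept)
edf <- c(k5 = sum(m_k5$edf) - 1, k20 = sum(m_k20$edf) - 1)
edf
```

If the effective degrees of freedom sit right at k - 1, that is a hint k may be too small.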

slide-20
SLIDE 20

Going from Linear to Additive

slide-21
SLIDE 21

The GAM Formula in 2D

y ~ s(x1) + s(x2)                  # Two additive smooths
y ~ s(x1, x2)                      # 2D smooth/interaction
y ~ te(x1, x2)                     # 2D smooth, two wigglinesses
y ~ te(x1) + te(x2) + ti(x1, x2)   # 2D smooth, two wigglinesses,
                                   # interaction as a separate term
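The sketch below fits the four flavors to simulated data with a genuine interaction (the data are an assumption for illustration) and counts the smoothing parameters each one estimates. Note one departure from the slide: the main effects in the last model use ti() rather than te(), which as far as I know is the form the mgcv help for te/ti suggests when the interaction is a separate term.

```r
library(mgcv)

# Simulated data where the effect of x1 depends on x2 (an assumption)
set.seed(6)
n <- 400
dat <- data.frame(x1 = runif(n), x2 = runif(n))
dat$y <- sin(2 * pi * dat$x1) * dat$x2 + rnorm(n, sd = 0.2)

m_add <- gam(y ~ s(x1) + s(x2), data = dat, method = "REML")   # no interaction
m_iso <- gam(y ~ s(x1, x2), data = dat, method = "REML")       # one wiggliness
m_te  <- gam(y ~ te(x1, x2), data = dat, method = "REML")      # one per margin
m_ti  <- gam(y ~ ti(x1) + ti(x2) + ti(x1, x2), data = dat, method = "REML")

# Number of smoothing parameters estimated by each model
n_sp <- c(add = length(m_add$sp), iso = length(m_iso$sp),
          te = length(m_te$sp), ti = length(m_ti$sp))
n_sp

# The purely additive model cannot capture the interaction
AIC(m_add, m_iso, m_te, m_ti)
```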

slide-22
SLIDE 22

Smooths in Space

slide-23
SLIDE 23

Smooths in Space

gam(d ~ s(x, y) + s(depth), data=dolphin_observations)

slide-24
SLIDE 24

A Bevy of Basis Functions

slide-25
SLIDE 25

Slippery Smooths: "Soap Films"

gam(d ~ s(x, y, bs = "so", xt = list(bnd = my_boundary)), data = data)

slide-26
SLIDE 26

Smooths that Make the World Go Round

gam(y ~ s(latitude, longitude, bs="sos"), data=dat)

Spline-on-a-Sphere

slide-27
SLIDE 27

Smooths in Time

slide-28
SLIDE 28

Gaussian Process Smooths

gam(y ~ s(time, bs= "gp"), data=bat_antibodies, family = binomial)

slide-29
SLIDE 29

Cyclic Smooths

gam(y ~ s(time, bs= "gp") + s(month, bs = "cc"), data=bat_antibodies, family = "binomial")
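What makes a "cc" smooth cyclic is that its two ends are forced to meet. The sketch below checks that on simulated monthly data (an assumption for illustration, not the bat data from the slide). The knots argument is used to place the join between December and January rather than on top of either month.

```r
library(mgcv)

# Simulated seasonal data (an assumption for illustration)
set.seed(7)
n <- 600
dat <- data.frame(month = sample(1:12, n, replace = TRUE))
dat$y <- cos(2 * pi * dat$month / 12) + rnorm(n, sd = 0.3)

# End knots at 0.5 and 12.5 so that month 12 wraps around to month 1
m <- gam(y ~ s(month, bs = "cc", k = 8), data = dat,
         knots = list(month = c(0.5, 12.5)), method = "REML")

# The smooth takes the same value at both ends of the cycle
ends <- predict(m, newdata = data.frame(month = c(0.5, 12.5)))
ends
```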

slide-30
SLIDE 30

Smooths that Ain't Smooth

slide-31
SLIDE 31

Discrete Random Effects

gam(y ~ s(x, bs = "re"), data=dat)
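Here is a small sketch of a random intercept fitted this way, on simulated grouped data (an assumption for illustration). The grouping variable, called g here rather than x, must be a factor; the "re" smooth then gets one penalized coefficient per level.

```r
library(mgcv)

# Simulated data: 10 groups, each with its own random shift (an assumption)
set.seed(8)
n_groups <- 10
n <- 400
group_shift <- rnorm(n_groups, sd = 1)
dat <- data.frame(g = factor(sample(seq_len(n_groups), n, replace = TRUE)),
                  x = runif(n))
dat$y <- sin(2 * pi * dat$x) + group_shift[as.integer(dat$g)] +
  rnorm(n, sd = 0.3)

m <- gam(y ~ s(x) + s(g, bs = "re"), data = dat, method = "REML")

# One shrunken intercept per group level
re_coefs <- coef(m)[grep("^s\\(g\\)", names(coef(m)))]
round(re_coefs, 2)
```

The estimated group effects should track the simulated shifts closely, pulled slightly toward zero by the penalty.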

slide-32
SLIDE 32

Factor-Smooth Interactions

(or, different slopes for different folks)

gam(y ~ s(xc, xf, bs = "fs"), data = dat)
gam(y ~ te(xc, xf, bs = c("tp", "re")), data = dat)

slide-33
SLIDE 33

Different Slopes for Different Folks

gam(y ~ te(xc, bs = "gp") + ti(xc, xf, bs = c("gp", "re")), data = dat)

slide-34
SLIDE 34

Markov Random Fields

gam(y ~ s(x, bs = "mrf", xt = list( nb = nb )), data=dat)

slide-35
SLIDE 35

Adaptive Smooths

(Smooths in your Smooths)

gam(y ~ s(x, bs= "ad"), data=data)

slide-36
SLIDE 36

A Plethora of Probability Distributions

slide-37
SLIDE 37

Data with Outliers: Student's T

gam(y ~ s(x), data=fat_tailed_data, family = scat)

slide-38
SLIDE 38

Count Data

gam(y ~ x, data = dat, family = poisson)
gam(y ~ x, data = dat, family = nb)   # negative binomial, theta estimated
                                      # (negbin() needs theta supplied)
gam(y ~ x, data = dat, family = tw)   # Tweedie
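To see why the family matters, this sketch simulates overdispersed counts (the data are an assumption for illustration) and fits them with both the Poisson and the negative binomial family, using a smooth of x in place of the slide's linear term.

```r
library(mgcv)

# Simulated overdispersed counts (an assumption for illustration)
set.seed(9)
n <- 500
dat <- data.frame(x = runif(n))
mu <- exp(1 + sin(2 * pi * dat$x))
dat$y <- rnbinom(n, mu = mu, size = 1)  # variance is mu + mu^2

m_pois <- gam(y ~ s(x), data = dat, family = poisson, method = "REML")
m_nb   <- gam(y ~ s(x), data = dat, family = nb, method = "REML")

# Lower AIC is better; the Poisson fit ignores the extra variance
aic <- c(poisson = AIC(m_pois), nb = AIC(m_nb))
aic

# The fitted family reports the estimated theta in its name
m_nb$family$family
```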

slide-39
SLIDE 39

Count Data

gam(d ~ s(x, y, bs="tp") + s(depth), data=dolphin_observations, family = tw)

slide-40
SLIDE 40

Ordered Categorical Data

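# ocat needs the number of categories R and integer class labels 1..R
gam(as.integer(ordered_factor) ~ s(x), data = data,
    family = ocat(R = nlevels(data$ordered_factor)))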

slide-41
SLIDE 41

Multiple Output Variables

Unordered Categories: Multinomial

gam(list(category ~ s(x1) + s(x2),
                  ~ s(x1) + s(x2)),
    data = model_dat, family = multinom(K = 2))

Multiple Continuous Outputs: Multivariate Normal

# y1 and y2 are placeholder names for the two continuous responses
gam(list(y1 ~ s(x1) + s(x2),
         y2 ~ s(x1) + s(x3)),
    data = model_dat, family = mvn(d = 2))

slide-42
SLIDE 42

And More!

Survival data: Cox proportional hazards (family = cox.ph)
Heteroscedastic data: Gaussian location-scale models (family = gaulss)
Count data with excess zeros: Zero-inflated Poisson (family = ziplss)

slide-43
SLIDE 43

A Few more Features

slide-44
SLIDE 44

But I need variable selection

gam(y ~ s(x1) + s(x2) + s(x3) + s(x4) + s(x5) + s(x6), data=data, family = gaussian, select=TRUE)
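A sketch of what select = TRUE does, on simulated data where one predictor is pure noise (the data are an assumption for illustration). The extra penalty lets a whole term shrink toward zero, which shows up as an effective degrees of freedom well below that of the real effects.

```r
library(mgcv)

# Simulated data: x1 and x2 matter, x3 does not (an assumption)
set.seed(10)
n <- 400
dat <- data.frame(x1 = runif(n), x2 = runif(n), x3 = runif(n))
dat$y <- sin(2 * pi * dat$x1) + 2 * dat$x2^2 + rnorm(n, sd = 0.3)

m <- gam(y ~ s(x1) + s(x2) + s(x3), data = dat,
         family = gaussian, method = "REML", select = TRUE)

# Effective degrees of freedom per smooth term
term_edf <- summary(m)$s.table[, "edf"]
round(term_edf, 2)
```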

slide-45
SLIDE 45

But my data is biggish

system.time(
  b1 <- gam(y ~ s(x0, bs = bs) + s(x1, bs = bs) + s(x2, bs = bs, k = k),
            data = dat)
)
   user  system elapsed
 57.610 259.800  21.673

system.time(
  b1 <- bam(y ~ s(x0, bs = bs) + s(x1, bs = bs) + s(x2, bs = bs, k = k),
            data = dat, discrete = TRUE, nthreads = 2)
)
   user  system elapsed
  5.535  33.670   2.532

bam() is a memory-efficient, high-performance, parallelizable alternative
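A scaled-down sketch of the same comparison, on simulated data small enough to run quickly (the data are an assumption for illustration, and the timings on the slide will not be reproduced at this size). The point is only that bam() with discrete = TRUE gives essentially the same fit as gam().

```r
library(mgcv)

# Simulated data (an assumption for illustration)
set.seed(11)
n <- 5000
dat <- data.frame(x0 = runif(n), x1 = runif(n), x2 = runif(n))
dat$y <- sin(2 * pi * dat$x0) + exp(dat$x1) + rnorm(n, sd = 0.5)

f <- y ~ s(x0) + s(x1) + s(x2)

m_gam <- gam(f, data = dat, method = "REML")
m_bam <- bam(f, data = dat, discrete = TRUE)  # bam's default method is "fREML"

# Discretizing the covariates changes the fit only slightly
agreement <- cor(fitted(m_gam), fitted(m_bam))
agreement
```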

slide-46
SLIDE 46

But I have complex hierarchical data

br <- gamm4(y ~ s(v,w,by=z) + s(r,k=20,bs="cr"), random = ~ (x+0|g) + (1|g) + (1|a/b))

mgcv::gamm gives you mgcv + nlme; gamm4::gamm4 gives you mgcv + lme4

slide-47
SLIDE 47

But I want full Bayes!

# generates JAGS code
mgcv::jagam()

# mgcv-style GAMs in Stan
rstanarm::stan_gamm4()

# greta/Tensorflow GAMs
# (very in-development by @millerdl)
gretaGAM::jagam2greta()

Chill, we've got your back

slide-48
SLIDE 48

A Roundup of Resources

slide-49
SLIDE 49

help(package = "mgcv")
?smooth.terms
?missing.data
?gam.selection

slide-50
SLIDE 50

fromthebottomoftheheap.net

slide-51
SLIDE 51
slide-52
SLIDE 52
slide-53
SLIDE 53

https://noamross.github.io/mgcv-esa-workshop/

slide-54
SLIDE 54

Coming this spring...

slide-55
SLIDE 55

Thank You!

Noam Ross @noamross #nyhackr, 2017-11-15