SLIDE 1

Lecture 2: Generalized Additive Models

SLIDE 2

Overview

The count model, from scratch
What is a GAM?
What is smoothing?
Fitting GAMs using dsm

SLIDE 3

Building a model, from scratch

Know count $n_j$ in segment $j$
Want: $n_j = f([\text{environmental covariates}]_j)$
Additive model of smooths $s$: $n_j = \exp[\beta_0 + s(\text{y}_j) + s(\text{Depth}_j)]$
$\beta_0 + s(\text{y}_j) + s(\text{Depth}_j)$ are the model terms
$\exp$ is the inverse of the link function (the link itself is $\log$)

SLIDE 4

Building a model, from scratch

What about area and detectability?
$n_j = A_j \hat{p}_j \exp[\beta_0 + s(\text{y}_j) + s(\text{Depth}_j)]$
$A_j$: area of segment ("offset")
$\hat{p}_j$: probability of detection in segment
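
Why the area and detectability terms act as an "offset" is easier to see on the log scale. A short derivation, assuming the right-hand side of the model describes the expected count in segment $j$:

```latex
\begin{align*}
\mathbb{E}(n_j) &= A_j \hat{p}_j \exp\left[\beta_0 + s(\text{y}_j) + s(\text{Depth}_j)\right] \\
\log \mathbb{E}(n_j) &= \log(A_j \hat{p}_j) + \beta_0 + s(\text{y}_j) + s(\text{Depth}_j)
\end{align*}
```

The term $\log(A_j \hat{p}_j)$ enters the linear predictor with a fixed coefficient of 1 instead of an estimated one, which is what makes it an offset.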

SLIDE 5

Building a model, from scratch

It's a statistical model so:
$n_j = A_j \hat{p}_j \exp[\beta_0 + s(\text{y}_j) + s(\text{Depth}_j)] + \epsilon_j$
$n_j$ has a distribution (count)
$\epsilon_j$ are residuals (differences between model and observations)

SLIDE 6

That's a Generalized Additive Model!

SLIDE 7

Now let's look at each bit...

SLIDE 8

Response

$n_j = A_j \hat{p}_j \exp[\beta_0 + s(\text{y}_j) + s(\text{Depth}_j)] + \epsilon_j$
where $n_j \sim$ count distribution

SLIDE 9

Count distributions

Response is a count
Often, it's mostly zero
mean ≠ variance (Poisson isn't good at this)

SLIDE 10

Tweedie distribution

$\text{Var}(\text{count}) = \phi \mathbb{E}(\text{count})^q$
Poisson is $q = 1$
We estimate $q$ and $\phi$
(NB there is a point mass at zero not plotted)

SLIDE 11

Negative binomial distribution

$\text{Var}(\text{count}) = \mathbb{E}(\text{count}) + \kappa \mathbb{E}(\text{count})^2$
Estimate $\kappa$
(Poisson: $\text{Var}(\text{count}) = \mathbb{E}(\text{count})$)
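
The mean-variance relationships for the Poisson, Tweedie and negative binomial distributions can be compared numerically. This is a minimal sketch in base R: the function names are made up for this example, and the parameter values (phi = 6, q = 1.3, kappa = 2) are illustrative assumptions, not estimates from the course data.

```r
# Variance as a function of the mean for the three count distributions.
# Function names and parameter values are illustrative assumptions.
var_poisson <- function(mu) mu
var_tweedie <- function(mu, phi, q) phi * mu^q
var_negbin  <- function(mu, kappa) mu + kappa * mu^2

mu <- c(0.5, 1, 2, 5, 10)  # a few example mean counts

comparison <- data.frame(
  mean    = mu,
  poisson = var_poisson(mu),
  tweedie = var_tweedie(mu, phi = 6, q = 1.3),
  negbin  = var_negbin(mu, kappa = 2)
)
print(comparison)
```

Setting phi = 1 and q = 1 in the Tweedie function, or kappa = 0 in the negative binomial one, recovers the Poisson case where the variance equals the mean. In mgcv output the Tweedie power appears as `p`, as in `Tweedie(p=1.326)`.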

SLIDE 12

Smooths

$n_j = A_j \hat{p}_j \exp[\beta_0 + s(\text{y}_j) + s(\text{Depth}_j)] + \epsilon_j$

SLIDE 13

What about these "s" things?

Think $s$ = smooth
Want a line that is "close" to all the data
Balance between interpolation and "fit"

SLIDE 14

What is smoothing?

SLIDE 15

Smoothing

We think underlying phenomenon is smooth
"Abundance is a smooth function of depth"
1, 2 or more dimensions

SLIDE 16

Estimating smooths

We set:
  "type": bases (made up of basis functions)
  "maximum wigglyness": basis size (sometimes: dimension/complexity)
Automatically estimate:
  "how wiggly it needs to be": smoothing parameter(s)

SLIDE 17

Splines

Functions made of other, simpler functions
Basis functions $b_k$, estimate $\beta_k$
$s(x) = \sum_{k=1}^K \beta_k b_k(x)$
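
The idea of a smooth as a weighted sum of basis functions can be sketched in a few lines of R. This example uses a cubic B-spline basis from the `splines` package (which ships with R) purely for illustration; mgcv's default for `s()` is a thin plate regression spline basis, and the coefficient values here are arbitrary assumptions, not estimates.

```r
library(splines)  # ships with R; used here only to get an example basis

x <- seq(0, 1, length.out = 200)
K <- 6                                  # basis size (number of basis functions)
B <- bs(x, df = K, intercept = TRUE)    # 200 x K matrix; column k holds b_k(x)

beta <- c(0.2, -1, 1.5, 0.3, -0.8, 1)   # arbitrary coefficients for illustration

# s(x) = sum over k of beta_k * b_k(x), written as a matrix product...
s_matrix <- as.vector(B %*% beta)

# ...and as the explicit sum from the formula
s_loop <- numeric(length(x))
for (k in seq_len(K)) {
  s_loop <- s_loop + beta[k] * B[, k]
}

matplot(x, B, type = "l", lty = 2, ylim = range(c(B, s_matrix)),
        ylab = "value", main = "Basis functions (dashed) and their weighted sum")
lines(x, s_matrix, lwd = 2)
```

Fitting a smooth amounts to choosing the weights `beta`; the basis functions themselves stay fixed once the basis type and size are set.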

SLIDE 18

Measuring wigglyness

Visually:
Lots of wiggles ⇒ not smooth
Straight line ⇒ very smooth
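
The visual idea can also be turned into a number. One common choice, used here as an assumption since the slide only gives the visual intuition, is the integrated squared second derivative of the function. The sketch below approximates it with finite differences in base R; the helper name `wigglyness` is made up for this example.

```r
# Approximate the integral of the squared second derivative on an even grid.
# `wigglyness` is a hypothetical helper, not a function from dsm or mgcv.
wigglyness <- function(f_vals, h) {
  second_deriv <- diff(f_vals, differences = 2) / h^2  # finite-difference f''
  sum(second_deriv^2) * h                              # Riemann sum of (f'')^2
}

x <- seq(0, 1, length.out = 201)
h <- x[2] - x[1]

straight <- 1 + 2 * x          # a straight line: no curvature
wiggly   <- sin(6 * pi * x)    # three full oscillations on [0, 1]

w_straight <- wigglyness(straight, h)
w_wiggly   <- wigglyness(wiggly, h)
cat("straight line:", w_straight, "\n")
cat("wiggly curve: ", w_wiggly, "\n")
```

A straight line scores essentially zero and the oscillating curve scores very high, matching the "straight line is very smooth" intuition.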

SLIDE 19

How wiggly are things?

Set basis complexity or "size" $k$
Fitted smooths have effective degrees of freedom (EDF)
Set $k$ "large enough"
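
The relationship between basis size and EDF can be seen by fitting the same smooth with two values of `k`. This is a sketch using mgcv directly on simulated Poisson counts; the data, seed and true curve are assumptions made for illustration, not the survey data used in the course.

```r
library(mgcv)  # the package dsm is built on

set.seed(1)
n <- 300
x <- runif(n)
y <- rpois(n, lambda = exp(1 + sin(2 * pi * x)))  # simulated counts

# Same model, two basis sizes: k sets the maximum wigglyness
fit_k5  <- gam(y ~ s(x, k = 5),  family = poisson(), method = "REML")
fit_k20 <- gam(y ~ s(x, k = 20), family = poisson(), method = "REML")

# EDF of the smooth term: how wiggly the fitted curve actually is
edf_k5  <- summary(fit_k5)$edf
edf_k20 <- summary(fit_k20)$edf
cat("k = 5:  EDF =", round(edf_k5, 2), "\n")
cat("k = 20: EDF =", round(edf_k20, 2), "\n")
```

The EDF of a single `s(x, k = k)` term cannot exceed `k - 1`, because one degree of freedom is used by the identifiability constraint. If the EDF sits close to that ceiling, `k` is probably not "large enough"; mgcv's `gam.check()` offers a diagnostic for this.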

SLIDE 20

Getting more out of GAMs

I can't teach you all of GAMs in 1 week
Good intro book (also a good textbook on GLMs and GLMMs)
Quite technical in places
More resources on course website

SLIDE 21

Fitting GAMs using dsm

SLIDE 22

Translating maths into R

$n_j = A_j \hat{p}_j \exp[\beta_0 + s(\text{y}_j)] + \epsilon_j$
where $\epsilon_j$ are some errors, $n_j \sim$ count distribution

inside the link: formula=count ~ s(y)
response distribution: family=nb() or family=tw()
detectability: ddf.obj=df_hr
offset, data: segment.data=segs, observation.data=obs
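
The same translation can be tried without survey data by fitting the model with mgcv directly. This is a sketch under stated assumptions: the data are simulated, the data frame `segs_sim`, the constant detection probability `p_hat` and all parameter values are invented for illustration, and `rTweedie()` produces continuous non-negative values, not integer counts. It illustrates the model structure and is not a description of what `dsm()` does internally.

```r
library(mgcv)

set.seed(42)
n_seg <- 400

# Simulated segments: one covariate x and a segment area (assumed values)
segs_sim <- data.frame(
  x    = runif(n_seg, -1, 1),
  area = runif(n_seg, 0.5, 2)
)
p_hat <- 0.6  # assumed constant probability of detection

# Expected response: area * p_hat * exp(model terms)
mu <- segs_sim$area * p_hat * exp(0.5 + sin(pi * segs_sim$x))
segs_sim$count   <- rTweedie(mu, p = 1.3, phi = 2)
segs_sim$off.set <- log(segs_sim$area * p_hat)  # offset lives on the log scale

# formula = inside the link, family = response distribution, offset = area * p_hat
fit <- gam(count ~ s(x) + offset(off.set),
           family = tw(), data = segs_sim, method = "REML")
summary(fit)

# Doubling the effective area doubles the predicted response at the same x
nd    <- data.frame(x = c(0, 0), off.set = c(0, log(2)))
pred  <- as.numeric(predict(fit, newdata = nd, type = "response"))
ratio <- pred[2] / pred[1]
```

Because the offset has a fixed coefficient of 1 on the log scale, `ratio` is 2 regardless of the fitted smooth.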

SLIDE 23

Your first DSM

library(dsm)
dsm_x_tw <- dsm(count ~ s(x), ddf.obj=df,
                segment.data=segs, observation.data=obs,
                family=tw())

dsm is based on mgcv by Simon Wood

SLIDE 24

summary(dsm_x_tw)

##
## Family: Tweedie(p=1.326)
## Link function: log
##
## Formula:
## count ~ s(x) + offset(off.set)
##
## Parametric coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept) -19.8115     0.2277  -87.01   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Approximate significance of smooth terms:
##        edf Ref.df     F  p-value
## s(x) 4.962  6.047 6.403 1.07e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## R-sq.(adj) =  0.0283   Deviance explained = 17.9%
## -REML = 409.94  Scale est. = 6.0413    n = 949

SLIDE 25

Plotting

plot(dsm_x_tw)

Dashed lines indicate +/- 2 standard errors
Rug plot
On the link scale
EDF on $y$ axis

SLIDE 26

Adding a term

Just use +

dsm_xy_tw <- dsm(count ~ s(x) + s(y), ddf.obj=df,
                 segment.data=segs, observation.data=obs,
                 family=tw())

SLIDE 27

summary(dsm_xy_tw)

##
## Family: Tweedie(p=1.306)
## Link function: log
##
## Formula:
## count ~ s(x) + s(y) + offset(off.set)
##
## Parametric coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept) -20.0908     0.2381  -84.39   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Approximate significance of smooth terms:
##        edf Ref.df     F  p-value
## s(x) 4.943  6.057 3.224 0.004239 **
## s(y) 5.293  6.419 4.034 0.000322 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## R-sq.(adj) =  0.0678   Deviance explained = 27.4%
## -REML = 399.84  Scale est. = 5.3157    n = 949

SLIDE 28

Plotting

plot(dsm_xy_tw, pages=1)

SLIDE 29

Bivariate terms

Assumed an additive structure
No interaction
We can specify s(x,y) (and s(x,y,z,...))
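
Written out in the notation of the count model, the two structures differ only in the model terms inside the exponential. The subscripted covariates $\text{x}_j$ and $\text{y}_j$ are assumed here to be the segment coordinates used in the R formulas:

```latex
\begin{align*}
\text{additive:}  \quad & n_j = A_j \hat{p}_j \exp\left[\beta_0 + s(\text{x}_j) + s(\text{y}_j)\right] + \epsilon_j \\
\text{bivariate:} \quad & n_j = A_j \hat{p}_j \exp\left[\beta_0 + s(\text{x}_j, \text{y}_j)\right] + \epsilon_j
\end{align*}
```

In the additive form the effect of $\text{x}$ is the same at every value of $\text{y}$; the bivariate smooth lets the effect of one coordinate change with the other.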

SLIDE 30

Bivariate spatial term

dsm_xyb_tw <- dsm(count ~ s(x, y), ddf.obj=df,
                  segment.data=segs, observation.data=obs,
                  family=tw())

SLIDE 31

summary(dsm_xyb_tw)

##
## Family: Tweedie(p=1.29)
## Link function: log
##
## Formula:
## count ~ s(x, y) + offset(off.set)
##
## Parametric coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept) -20.2745     0.2477  -81.85   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Approximate significance of smooth terms:
##          edf Ref.df     F  p-value
## s(x,y) 16.89  21.12 4.333 3.73e-10 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## R-sq.(adj) =  0.102   Deviance explained = 34.7%
## -REML = 394.86  Scale est. = 4.8248    n = 949

SLIDE 32

Plotting

plot(dsm_xyb_tw, select=1, scheme=2, asp=1)

On link scale
scheme=2 makes heatmap
(set too.far to exclude points far from data)

SLIDE 33

Comparing bivariate and additive models
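
One simple way to set the three fitted models side by side is to collect the figures printed in their summaries. The sketch below only tabulates values reported in the summary output shown earlier in the lecture; the data frame and column names are made up for this example, and no model is refitted.

```r
# Values copied from the three summary() outputs in the slides
comparison <- data.frame(
  model         = c("dsm_x_tw", "dsm_xy_tw", "dsm_xyb_tw"),
  formula       = c("count ~ s(x)", "count ~ s(x) + s(y)", "count ~ s(x, y)"),
  total_edf     = c(4.962, 4.943 + 5.293, 16.89),  # summed over smooth terms
  dev_explained = c(17.9, 27.4, 34.7),             # percent
  neg_reml      = c(409.94, 399.84, 394.86)
)

# Order from highest to lowest deviance explained
print(comparison[order(-comparison$dev_explained), ])
```

The bivariate model explains the most deviance but also uses the most effective degrees of freedom, so these numbers on their own do not settle which model is preferable.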

SLIDE 34

Let's have a go...