slide-1
SLIDE 1

Lecture 3: Multivariate smoothing & model selection

slide-2
SLIDE 2

The story so far...

  • How GAMs work
  • How to include detection info
  • Simple spatial-only models

slide-3
SLIDE 3

Life isn't that simple

  • Which environmental covariates?
  • Which response distribution?
  • Which response?
  • How to select between possible models?

slide-4
SLIDE 4

Adding covariates

slide-5
SLIDE 5

Model formulation

  • Pure spatial, pure environmental, mixed?
  • Prior knowledge of biology/ecology of species
  • What are drivers of distribution?
  • What data is available?

slide-6
SLIDE 6

Sperm whale covariates


slide-7
SLIDE 7

Tobler's first law of geography

"Everything is related to everything else, but near things are more related than distant things"

Tobler (1970)

slide-8
SLIDE 8

Implications of Tobler's law


slide-9
SLIDE 9

Adding smooths

  • Already know that + is our friend
  • Can build a big model...

dsm_all <- dsm(count~s(x, y) +
                     s(Depth) +
                     s(DistToCAS) +
                     s(SST) +
                     s(EKE) +
                     s(NPP),
               ddf.obj=df_hr,
               segment.data=segs,
               observation.data=obs,
               family=tw())

slide-10
SLIDE 10

Each s() has its own options

  • s(..., k=...) to adjust basis size
  • s(..., bs="...") for basis type
  • lots more options (we'll see a few here)
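As a rough sketch of what k and bs change, the example below calls mgcv::gam() directly (the fitting engine that dsm() is built on) with simulated data. The data frame dat and the model names are made up for illustration and have nothing to do with the sperm whale analysis.

```r
library(mgcv)

set.seed(1)
n <- 300
dat <- data.frame(x = runif(n))
dat$y <- sin(2 * pi * dat$x) + rnorm(n, sd = 0.3)

# Default options: thin plate regression spline (bs = "tp"), k = 10
m_default <- gam(y ~ s(x), data = dat, method = "REML")

# Larger basis (k = 20) and a cubic regression spline basis (bs = "cr")
m_big <- gam(y ~ s(x, k = 20, bs = "cr"), data = dat, method = "REML")

# k fixes the number of basis functions (an upper limit on wiggliness);
# the EDF is what the penalty actually leaves in the fitted model.
# Coefficient counts include the intercept; each smooth loses one
# coefficient to its identifiability constraint.
n_coef <- c(default = length(coef(m_default)), big = length(coef(m_big)))
edf    <- c(default = sum(m_default$edf),      big = sum(m_big$edf))
print(rbind(n_coef, edf))
```

Raising k only raises the ceiling; if the two EDF values are similar, the default basis size was already large enough for this term.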

slide-11
SLIDE 11

Now we have a huge model, what do we do?

slide-12
SLIDE 12

Term selection

Two popular approaches (using p-values)

  • Stepwise selection: path dependence
  • All possible subsets: computationally expensive (fishing?)

slide-13
SLIDE 13

p-values

  • Test for zero effect of a smooth
  • They are approximate for GAMs (but useful)
  • Reported in summary
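The p-values printed by summary() can also be pulled out as numbers. A minimal sketch, assuming a plain mgcv::gam() fit to simulated Poisson counts (dat, x1 and x2 are invented here; x2 has no effect by construction):

```r
library(mgcv)

set.seed(2)
n <- 400
dat <- data.frame(x1 = runif(n), x2 = runif(n))
# x1 has a real effect on the count; x2 has none
dat$y <- rpois(n, exp(1 + sin(2 * pi * dat$x1)))

m <- gam(y ~ s(x1) + s(x2), family = poisson, data = dat, method = "REML")

# The "Approximate significance of smooth terms" block, as a matrix
s_tab  <- summary(m)$s.table
p_vals <- s_tab[, "p-value"]
print(round(s_tab, 4))
```

A dsm object is also a gam object, so the same s.table element should be available from summary(dsm_all).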

slide-14
SLIDE 14

summary(dsm_all)

##
## Family: Tweedie(p=1.25)
## Link function: log
##
## Formula:
## count ~ s(x, y) + s(Depth) + s(DistToCAS) + s(SST) + s(EKE) +
##     s(NPP) + offset(off.set)
##
## Parametric coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept) -20.6368     0.2751     -75   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Approximate significance of smooth terms:
##                edf Ref.df     F  p-value
## s(x,y)       5.225  7.153 1.233   0.2920
## s(Depth)     3.568  4.439 6.641 1.82e-05 ***
## s(DistToCAS) 1.000  1.000 1.504   0.2204
## s(SST)       5.927  6.986 2.068   0.0407 *
## s(EKE)       1.763  2.225 2.579   0.0693 .
## s(NPP)       2.393  3.068 0.856   0.4678
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

slide-15
SLIDE 15

Path dependence is an issue here

  • (silly) Strategy: want all p ≈ 0 (***), remove terms 1-by-1
  • Two different universes appear:
  • This isn't very satisfactory!

slide-16
SLIDE 16

Term selection during fitting

  • Already selecting wiggliness of terms (via a penalty)
  • What about using it to remove the whole term?

slide-17
SLIDE 17

Shrinkage approach

  • Basis s(..., bs="ts"): thin plate splines with shrinkage
  • remove the wiggles, then remove the "linear" bits
  • nullspace should be shrunk less than the wiggly part

slide-18
SLIDE 18

Shrinkage example

dsm_ts_all <- dsm(count~s(x, y, bs="ts") +
                        s(Depth, bs="ts") +
                        s(DistToCAS, bs="ts") +
                        s(SST, bs="ts") +
                        s(EKE, bs="ts") +
                        s(NPP, bs="ts"),
                  ddf.obj=df_hr,
                  segment.data=segs,
                  observation.data=obs,
                  family=tw())

slide-19
SLIDE 19

Model with no shrinkage


slide-20
SLIDE 20

... with shrinkage


slide-21
SLIDE 21

summary(dsm_ts_all)

##
## Family: Tweedie(p=1.277)
## Link function: log
##
## Formula:
## count ~ s(x, y, bs = "ts") + s(Depth, bs = "ts") + s(DistToCAS,
##     bs = "ts") + s(SST, bs = "ts") + s(EKE, bs = "ts") + s(NPP,
##     bs = "ts") + offset(off.set)
##
## Parametric coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)  -20.260      0.234  -86.59   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Approximate significance of smooth terms:
##                    edf Ref.df     F  p-value
## s(x,y)       1.8875209     29 0.705 4.33e-06 ***
## s(Depth)     3.6794182      9 4.811  < 2e-16 ***
## s(DistToCAS) 0.0000934      9 0.000   0.6797
## s(SST)       0.3826654      9 0.063   0.2160
## s(EKE)       0.8196256      9 0.499   0.0178 *
## s(NPP)       0.0003570      9 0.000   0.8372
## ---

slide-22
SLIDE 22

EDF comparison

                   tp       ts
s(x,y)         5.2245   1.8875
s(Depth)       3.5679   3.6794
s(DistToCAS)   1.0001   0.0001
s(SST)         5.9267   0.3827
s(EKE)         1.7631   0.8196
s(NPP)         2.3931   0.0004
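A table like this one can be rebuilt from two fitted models. The sketch below is an illustration on simulated data with mgcv::gam(), not the sperm whale fit: dat, x1 and x2 are invented, and x2 is pure noise, so it plays the role of a covariate that ought to be shrunk away.

```r
library(mgcv)

set.seed(3)
n <- 500
dat <- data.frame(x1 = runif(n), x2 = runif(n))
dat$y <- sin(2 * pi * dat$x1) + rnorm(n, sd = 0.5)  # x2 is pure noise

# Same formula twice: ordinary thin plate splines, then the shrinkage version
m_tp <- gam(y ~ s(x1, bs = "tp") + s(x2, bs = "tp"), data = dat, method = "REML")
m_ts <- gam(y ~ s(x1, bs = "ts") + s(x2, bs = "ts"), data = dat, method = "REML")

# One row per smooth, one column per basis
edf_tab <- cbind(tp = summary(m_tp)$s.table[, "edf"],
                 ts = summary(m_ts)$s.table[, "edf"])
print(round(edf_tab, 4))
```

With bs="tp" the linear part of each smooth is unpenalized, so its EDF cannot fall much below 1 (compare s(DistToCAS) at 1.0001 above); with bs="ts" the whole term can be shrunk towards 0.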

slide-23
SLIDE 23

Removing terms?

  • 1. EDF
    Terms with EDF<1 may not be useful (can we remove?)
  • 2. non-significant p-value
    Decide on a significance level and use that as a rule

(In some sense leaving "shrunk" terms in is more "consistent" in terms of variance estimation, but can be computationally annoying)
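Both rules of thumb can be tabulated side by side. The helper below, flag_terms(), is a hypothetical name introduced only for this sketch (it is not part of dsm or mgcv), and the thresholds edf_min = 1 and alpha = 0.05 are assumptions to be replaced by whatever rule is decided on.

```r
library(mgcv)

# Hypothetical helper: apply both rules of thumb to every smooth in a model
flag_terms <- function(model, edf_min = 1, alpha = 0.05) {
  s_tab <- summary(model)$s.table
  data.frame(term       = rownames(s_tab),
             edf        = s_tab[, "edf"],
             p_value    = s_tab[, "p-value"],
             low_edf    = s_tab[, "edf"] < edf_min,     # rule 1
             not_signif = s_tab[, "p-value"] > alpha,   # rule 2
             row.names  = NULL)
}

# Simulated example: x1 matters, x2 is pure noise
set.seed(5)
n <- 500
dat <- data.frame(x1 = runif(n), x2 = runif(n))
dat$y <- sin(2 * pi * dat$x1) + rnorm(n, sd = 0.5)
m_ts <- gam(y ~ s(x1, bs = "ts") + s(x2, bs = "ts"), data = dat, method = "REML")

flags <- flag_terms(m_ts)
print(flags)
```

The function only reports; whether a flagged term is actually dropped remains a modelling decision, for the reason given in the parenthetical remark above.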

slide-24
SLIDE 24

Comparing models

slide-25
SLIDE 25

Comparing models

  • Usually have >1 option
  • How can we pick?
  • Even if we have 1 model, is it any good?

(This can be subtle, more in model checking tomorrow!)

slide-26
SLIDE 26

Akaike's "An Information Criterion"

  • As for many other models, we can get an AIC from our model
  • Comparison of AIC fine but:
    • can't compare Tweedie (continuous) and negative binomial (discrete) distributions!
    • (within distribution is fine)

AIC(dsm_all)
## [1] 1238.288

AIC(dsm_ts_all)
## [1] 1225.822
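Several models can be passed to AIC() at once, which makes the differences easier to read. A small sketch on simulated Poisson counts, assuming plain mgcv::gam() fits (dat and the model names are invented); both models use the same data and the same response distribution, so the comparison is a legitimate one.

```r
library(mgcv)

set.seed(4)
n <- 400
dat <- data.frame(x1 = runif(n), x2 = runif(n))
dat$y <- rpois(n, exp(0.5 + sin(2 * pi * dat$x1)))

# Same data, same response distribution, different bases
m_tp <- gam(y ~ s(x1) + s(x2), family = poisson, data = dat, method = "REML")
m_ts <- gam(y ~ s(x1, bs = "ts") + s(x2, bs = "ts"),
            family = poisson, data = dat, method = "REML")

# One row per model; delta is the distance from the lowest (preferred) AIC
aic_tab <- AIC(m_tp, m_ts)
aic_tab$delta <- aic_tab$AIC - min(aic_tab$AIC)
print(aic_tab)
```

On the sperm whale numbers above, the same arithmetic gives 1238.288 - 1225.822 = 12.466 in favour of dsm_ts_all.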

slide-27
SLIDE 27

Selecting between response distributions

slide-28
SLIDE 28

Goodness of fit

  • Q-Q plots
  • Closer to the line is better
  • But what does "close" mean?

slide-29
SLIDE 29

Using reference bands

  • What is down to random variation?
  • Where does the model actually fail?
  • Resampling the response, generate bands

qq.gam(dsm_all, asp=1, main="Tweedie", cex=5, rep=100)
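The same call can be tried without the survey data. This is a sketch only, assuming a simulated Poisson model fitted with mgcv::gam() (dat and m are invented); the point is the rep argument, which switches qq.gam() from its default direct calculation to simulated reference quantiles.

```r
library(mgcv)

set.seed(6)
n <- 400
dat <- data.frame(x = runif(n))
dat$y <- rpois(n, exp(1 + sin(2 * pi * dat$x)))
m <- gam(y ~ s(x), family = poisson, data = dat, method = "REML")

# rep = 100: simulate 100 replicate response vectors from the fitted model
# and use them to draw a grey reference band behind the points
# (level = 0.9 is the default band level in qq.gam)
qq.gam(m, rep = 100, main = "Poisson (simulated data)")
```

Points falling outside the band are departures that simulation from the fitted model does not reproduce, which gives "close" a working definition.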

slide-30
SLIDE 30

Which response type?

slide-31
SLIDE 31

Count model: count~...

  • Effort is effective effort
  • Response is count per segment

slide-32
SLIDE 32

Estimated abundance: abundance.est~...

  • Effort is area of each segment
  • Response is estimated abundance per segment
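Written out, the two response types differ in where detectability enters. The notation here is assumed for illustration and does not appear on the slides: n_j is the count in segment j, A_j its area, \hat{p}_j the estimated detection probability for the segment, s_i and \hat{p}_i the size and detection probability of observed group i, and f_k the smooths of covariates z_{jk}.

```latex
% Count model: detectability enters through the offset (effective area)
\mathbb{E}(n_j) = \hat{p}_j \, A_j \exp\Big( \beta_0 + \sum_k f_k(z_{jk}) \Big)

% Estimated abundance: detectability enters through the response
\hat{N}_j = \sum_{i \,\in\, \text{segment } j} \frac{s_i}{\hat{p}_i},
\qquad
\mathbb{E}(\hat{N}_j) = A_j \exp\Big( \beta_0 + \sum_k f_k(z_{jk}) \Big)
```

The count model needs a single \hat{p}_j per segment, whereas the estimated abundance response corrects each observation separately with its own \hat{p}_i.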

slide-33
SLIDE 33

When to use each approach?

  • Practical choice
  • 2 detection function covariate "levels"
    • "Observer"/"observation": change within segment
    • "Segment": change between segments
  • "Count model" only lets us use segment-level covariates
  • "Estimated abundance" lets us use either

slide-34
SLIDE 34

Sperm whale response example (either)

  • Detection covariate: Beaufort
  • Changes at segment level
  • count or abundance.est

slide-35
SLIDE 35

Sperm whale response example (abundance.est)

  • Detection covariate: group size (size)
  • Changes at observation level
  • abundance.est only

slide-36
SLIDE 36

Recap

slide-37
SLIDE 37

Recap

  • Adding smooths
  • Path dependence
  • Removing smooths
    • p-values
    • shrinkage
  • Comparing models
  • Comparing response distributions