
So far...

Build, check & select detection models
Build, check & select spatial models
What about predictions?

Let's talk about maps

What does a map mean?

Grids!
Cells are abundance estimate "snapshots"
Sum cells to get abundance
Sum a subset?

Going back to the formula

Count model ($j$ observations):

$$n_j = A_j \hat{p}_j \exp\left[\beta_0 + s(\text{y}_j) + s(\text{Depth}_j)\right] + \epsilon_j$$

Predictions (index $r$):

$$\hat{n}_r = A_r \exp\left[\hat{\beta}_0 + \hat{s}(\text{y}_r) + \hat{s}(\text{Depth}_r)\right]$$

Need to "fill in" values for $A_r$, $\text{y}_r$ and $\text{Depth}_r$.
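As a rough illustration of the prediction formula, the sketch below evaluates it by hand on a three-cell grid. Everything here is hypothetical: the intercept `beta0_hat` and the two stand-in functions `s_y_hat` and `s_depth_hat` are made up for the example and do not come from a fitted dsm model.

```r
# Hypothetical "fitted" quantities, for illustration only
beta0_hat   <- -20
s_y_hat     <- function(y) 0.5 * sin(y / 1e5)        # stand-in for the smooth of y
s_depth_hat <- function(d) -((log(d) - 7)^2) / 4     # stand-in for the smooth of Depth

# Prediction grid: one row per cell, A is the cell area in m^2
grid <- data.frame(y     = c(788254, 778254, 778254),
                   Depth = c(153.6, 96.8, 1317.6),
                   A     = 1e8)

# n_hat_r = A_r * exp(beta0_hat + s_hat(y_r) + s_hat(Depth_r))
grid$n_hat <- grid$A * exp(beta0_hat + s_y_hat(grid$y) + s_depth_hat(grid$Depth))
print(grid)
```

In practice predict does this evaluation from the fitted model, so only the grid values ($A_r$ and the covariates) need to be supplied.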

Predicting

With these values we can use predict in R:

    predict(model, newdata=data, off.set=off.set)

off.set gives the area of the grid cells
More info in ?predict.dsm

Prediction data

    ##            x      y      Depth      SST      NPP DistToCAS
    ## 126 547984.6 788254  153.59825 12.04609 1462.521 11788.974
    ## 127 557984.6 788254  552.31067 12.81379 1465.410  5697.248
    ## 258 527984.6 778254   96.81992 12.90251 1429.432 13722.626
    ## 259 537984.6 778254  138.23763 13.21393 1424.862  9720.671
    ## 260 547984.6 778254  505.14386 13.75655 1379.351  8018.690
    ## 261 557984.6 778254 1317.59521 14.42525 1348.544  3775.462
    ##              EKE off.set      long      lat
    ## 126 0.0008329031   1e+08 -66.52252 40.94697
    ## 127 0.0009806611   1e+08 -66.40464 40.94121
    ## 258 0.0011575423   1e+08 -66.76551 40.86781
    ## 259 0.0013417297   1e+08 -66.64772 40.86227
    ## 260 0.0026881567   1e+08 -66.52996 40.85662
    ## 261 0.0045683752   1e+08 -66.41221 40.85087

Predictors


Making a prediction

Add another column to the prediction data
Plotting is then easier (in R)

    predgrid$Nhat_tw <- predict(dsm_all_tw_rm, predgrid,
                                off.set=predgrid$off.set)

Maps of predictions

    library(ggplot2)
    library(viridis)

    # one tile per grid cell, coloured by predicted abundance
    p <- ggplot(predgrid) +
      geom_tile(aes(x=x, y=y, fill=Nhat_tw)) +
      scale_fill_viridis() +
      coord_equal()
    print(p)

Total abundance

Each cell has an abundance; sum the cells to get the total

    sum(predgrid$Nhat_tw)
    ## [1] 2491.863

Subsetting

R subsetting lets you calculate "interesting" estimates:

    # how many sperm whales at depths shallower than 2500m?
    sum(predgrid$Nhat_tw[predgrid$Depth < 2500])
    ## [1] 1006.27

    # how many sperm whales East of 0?
    sum(predgrid$Nhat_tw[predgrid$x>0])
    ## [1] 1383.744
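The same idea on a toy grid, as a minimal sketch: the data frame `toy` and its values are invented for illustration. It shows that logical subsetting picks out cells, and that subsets which partition the grid add back up to the total.

```r
# Toy prediction grid (hypothetical values)
toy <- data.frame(x     = c(-5000, 5000, 15000, -15000),
                  Depth = c(150, 3000, 2400, 2600),
                  Nhat  = c(0.5, 2.0, 1.5, 1.0))

total   <- sum(toy$Nhat)                       # all cells
shallow <- sum(toy$Nhat[toy$Depth < 2500])     # cells shallower than 2500m
deep    <- sum(toy$Nhat[toy$Depth >= 2500])    # the remaining cells
east    <- sum(toy$Nhat[toy$x > 0])            # cells East of 0

# the two depth classes partition the grid, so they sum to the total
c(total = total, shallow = shallow, deep = deep, east = east)
```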

Extrapolation

What do we mean by extrapolation?

Predicting at values outside those observed
What does "outside" mean?
  between transects?
  outside "survey area"?

Extrapolation

In general, try not to do it!
Variance issues?
Space-time interchangeability?
dsmextra package by Phil Bouchet: https://densitymodelling.github.io/dsmextra/index.html
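One crude way to flag "outside those observed" is a per-covariate range check, sketched below. This is only a univariate screen written for illustration: the helper `outside_range` and both data frames are hypothetical, and it is not what dsmextra does (that package provides more thorough, multivariate assessments).

```r
# Hypothetical covariate values at the surveyed segments and at prediction cells
obs  <- data.frame(Depth = c(100, 800, 2500, 4000), SST = c(10, 14, 18, 22))
pred <- data.frame(Depth = c(50, 1500, 3000),       SST = c(12, 15, 25))

# Flag prediction cells where any covariate falls outside its observed range
outside_range <- function(pred, obs, vars) {
  flags <- sapply(vars, function(v) {
    r <- range(obs[[v]], na.rm = TRUE)
    pred[[v]] < r[1] | pred[[v]] > r[2]
  })
  # force a matrix so the single-cell case also works
  rowSums(matrix(flags, nrow = nrow(pred))) > 0
}

pred$extrapolated <- outside_range(pred, obs, c("Depth", "SST"))
print(pred)
```

A cell can pass this check and still be an extrapolation, for example when the combination of covariate values was never observed, so treat it as a first look only.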

Prediction recap

Using predict
Getting "overall" abundance
Subsetting
Plotting in R
Extrapolation (and its dangers)

Estimating variance

Now we can make predictions
Now we are dangerous.

Predictions are useless without uncertainty

Where does uncertainty come from?

Sources of uncertainty

Detection function parameters
GAM parameters
(And more! But only looking at these 2 here!)

Uncertainty of what?

Uncertainty from detection function + GAM
Want to talk about $\hat{N}$, so need to do some maths
dsm does this for you!
Details in Miller et al (2013) appendix

GAM + detection function uncertainty

(Getting a little fast-and-loose with the mathematics)

$$\text{CV}^2(\hat{N}) \approx \text{CV}^2(\text{GAM}) + \text{CV}^2(\text{detection function})$$

This is the "delta method"

When can we use the delta method?

Assumes detection function and GAM are independent
This is okay if there are no detection function covariates

Variance propagation

When detection function is not independent
Uncertainty "propagated" through the model
Refit both models together
Bravington, Miller and Hedley (2019) https://arxiv.org/abs/1807.07996

In R...

Functions in dsm to do this:
dsm.var.gam
  assumes spatial model and detection function are independent
dsm.var.prop
  propagates uncertainty from detection function to spatial model
  only works for count models
  covariates can only vary at segment level

Variance of abundance

Using dsm.var.gam

    dsm_tw_var_ind <- dsm.var.gam(dsm_all_tw_rm, predgrid,
                                  off.set=predgrid$off.set)
    summary(dsm_tw_var_ind)

    ## Summary of uncertainty in a density surface model calculated
    ##  analytically for GAM, with delta method
    ##
    ## Approximate asymptotic confidence interval:
    ##     2.5%     Mean    97.5%
    ## 1539.017 2491.863 4034.641
    ## (Using log-Normal approximation)
    ##
    ## Point estimate                 : 2491.863
    ## CV of detection function       : 0.2113123
    ## CV from GAM                    : 0.1329
    ## Total standard error           : 622.0386
    ## Total coefficient of variation : 0.2496
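The numbers in this summary can be checked by hand against the delta method formula. The sketch below combines the two CVs reported above and then forms a log-normal interval. The interval formula used here is a common log-normal construction and is an assumption about how the reported interval is built; it reproduces the printed values only approximately.

```r
# Values copied from the summary output above
N_hat  <- 2491.863
cv_df  <- 0.2113123   # CV of detection function
cv_gam <- 0.1329      # CV from GAM

# Delta method: squared CVs add when the two components are independent
cv_total <- sqrt(cv_df^2 + cv_gam^2)
se_total <- cv_total * N_hat

# Log-normal 95% interval (assumed form): divide and multiply N_hat by C
C  <- exp(qnorm(0.975) * sqrt(log(1 + cv_total^2)))
ci <- c(lower = N_hat / C, upper = N_hat * C)

round(cv_total, 4)
round(se_total, 2)
round(ci, 1)
```

The combined CV and standard error agree with the "Total" lines of the summary to the precision shown, and the interval ends land within a fraction of an animal of the printed 2.5% and 97.5% values.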

Plotting - data processing

Calculate uncertainty per-cell
dsm.var.* thinks predgrid is one "region"
Need to split data into cells (using split())
Need width and height of cells for plotting

Plotting (code)

    predgrid$width <- predgrid$height <- 10*1000
    predgrid_split <- split(predgrid, 1:nrow(predgrid))
    head(predgrid_split,3)

    ## $`1`
    ##            x      y    Depth      SST      NPP DistToCAS
    ## 126 547984.6 788254 153.5983 12.04609 1462.521  11788.97
    ##              EKE off.set      long      lat    Nhat_tw
    ## 126 0.0008329031   1e+08 -66.52252 40.94697 0.01417646
    ##     height width
    ## 126  10000 10000
    ##
    ## $`2`
    ##            x      y    Depth      SST     NPP DistToCAS
    ## 127 557984.6 788254 552.3107 12.81379 1465.41  5697.248
    ##              EKE off.set      long      lat    Nhat_tw
    ## 127 0.0009806611   1e+08 -66.40464 40.94121 0.05123446
    ##     height width
    ## 127  10000 10000
    ##
    ## $`3`
    ##            x      y    Depth      SST      NPP DistToCAS
    ## 258 527984.6 778254 96.81992 12.90251 1429.432  13722.63
    ##     EKE off.set long lat Nhat_tw
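To see what split() is doing here, a small sketch on an invented three-cell grid (the data frame `toy` is hypothetical): splitting by `1:nrow(...)` turns the data frame into a list with one single-row data frame per cell, which is the "one region per cell" structure described above.

```r
# Toy grid with three cells (hypothetical values)
toy <- data.frame(x = c(0, 10000, 20000),
                  y = c(0, 0, 10000),
                  off.set = 1e8)

# cell dimensions, needed later for plotting
toy$width <- toy$height <- 10 * 1000

# one list element per row, named "1", "2", "3"
toy_split <- split(toy, 1:nrow(toy))

length(toy_split)
names(toy_split)
toy_split[["2"]]
```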

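CV plot

    # variance per cell: pass the split grid so each cell is its own "region"
    dsm_tw_var_map <- dsm.var.gam(dsm_all_tw_rm, predgrid_split,
                                  off.set=predgrid$off.set)

    # build the map without observations, then add scales before printing
    p <- plot(dsm_tw_var_map,
              observations=FALSE,
              plot=FALSE) +
      coord_equal() +
      scale_fill_viridis()
    print(p)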

Interpreting CV plots

Plotting coefficient of variation
Standardise standard deviation by mean (per cell):

$$\text{CV} = \text{se}(\hat{N})/\hat{N}$$

Can be useful to overplot survey effort

Effort overplotted

Big CVs

Here CVs are "well behaved"
Not always the case (huge CVs possible)
These can be a pain to plot
Use cut() in R to make a categorical variable
  e.g. c(seq(0,1, len=10), 2:4, Inf) or somesuch
(Example in practical)
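A minimal sketch of the cut() approach, using the breaks suggested above on an invented vector of per-cell CVs (the values in `cv` are hypothetical). Fine bins cover 0 to 1 and a few coarse bins absorb the huge values, so a single extreme cell no longer dominates the colour scale.

```r
# Hypothetical per-cell CVs, including some very large ones
cv <- c(0.05, 0.3, 0.95, 1.5, 3.2, 12)

# fine bins on [0, 1], then (1,2], (2,3], (3,4] and everything above 4
breaks <- c(seq(0, 1, length.out = 10), 2:4, Inf)

# include.lowest = TRUE keeps a CV of exactly 0 in the first bin
cv_cat <- cut(cv, breaks = breaks, include.lowest = TRUE)

table(cv_cat)
```

The resulting factor can be mapped to fill in place of the raw CV when plotting.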

Uncertainty recap

How does uncertainty arise in a DSM?
Estimate variance of abundance estimate
Map coefficient of variation

Practical advice

Pilot studies and "you get what you pay for"

Designing surveys is hard
Designing surveys is essential
Better to fail one season than fail for 5, 10 years
Get information early, get it cheap
Inform design from a pilot study

Avoiding rules of thumb

Think about assumptions
  Detection function
  Spatial model
Think about design
  Spatial coverage
  Covariate coverage

Sometimes things are complicated

Weather has a big effect on detectability
Need to record during survey
Disambiguate between distribution/detectability
Potential confounding can be BAD

Visibility during POWER 2014

Thanks to Hiroto Murase and co. for this data!

Covariates can make a big difference!

Same data, same spatial model
With weather covariates and without

Disappointment

Sometimes you don't have enough data
Or, enough coverage
Or, the right covariates
Sometimes, you can't build a spatial model

Segmenting

Example on course site
Length of $\approx 2w$ is reasonable
Too big: no detail
Too small: all 0/1
See also Redfern et al. (2008)
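A rough sketch of cutting one transect into segments of about $2w$. It assumes that $w$ is the truncation distance, which the slide does not state, and the transect length, the label "T1" and the value of `w` are all invented. The column names `Sample.Label` and `Effort` are used on the assumption that they match what the segment data is expected to contain; the worked example on the course site is the reference for real data.

```r
w               <- 2000    # assumed truncation distance, in metres
transect_length <- 25000   # hypothetical transect length, in metres

# choose a whole number of equal segments, each as close to 2w as possible
n_seg  <- max(1, round(transect_length / (2 * w)))
breaks <- seq(0, transect_length, length.out = n_seg + 1)

segs <- data.frame(Sample.Label = paste0("T1-", seq_len(n_seg)),
                   start = head(breaks, -1),
                   end   = tail(breaks, -1))
segs$Effort <- segs$end - segs$start   # segment length

print(segs)
```

Using equal-length segments avoids a short leftover piece at the end of the transect, at the cost of segments that are only approximately $2w$ long.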

Getting help

Resources

Course reading list has pointers to these topics
DenMod wiki with FAQ and more
Distance sampling Google Group
  Friendly, helpful, low traffic
  see distancesampling.org/distancelist.html

That's all folks!

Lecture 5: Predictions and variance
