Modeling defoliation of Pinus radiata trees using hyperspectral remote sensing data


Modeling defoliation of Pinus radiata trees using hyperspectral remote sensing data. Patrick Schratz¹, Jannes Muenchow¹, Eugenia Iturritxa², Alexander Brenning¹. LIFE Healthy Forest Workshop, Vitoria-Gasteiz (Spain), 19 Sep 2018.


slide-1
SLIDE 1

Modeling defoliation of Pinus radiata trees using hyperspectral remote sensing data

Patrick Schratz¹, Jannes Muenchow¹, Eugenia Iturritxa², Alexander Brenning¹

LIFE Healthy Forest Workshop, Vitoria-Gasteiz (Spain), 19 Sep 2018

¹ Department of Geography, GIScience group, University of Jena
² NEIKER, Vitoria-Gasteiz, Spain

https://pjs-web.de  @pjs_228  @pat-s  patrick.schratz@uni-jena.de
Slides: https://bit.ly/2Nls9Do

slide-2
SLIDE 2

Contribution to project goals

Action B1

Deliverables

  • Remotely-sensed forest health map (80%) (maps for all plots; Basque Country missing)
  • Maps of forest disease potential (50%) (Diplodia & Fusarium; Armillaria & Heterobasidion missing)

Milestones

  • Data for spatial data analysis compiled
  • Developed model of forest disease potential (xgboost)
  • Final selection of algorithm for remotely-sensed forest health mapping (xgboost)

2 / 28

slide-3
SLIDE 3

Introduction

Study aims

  • Model defoliation (as a proxy of tree health) using remote sensing data (a high-dimensional modeling problem)
  • Find the most important variables and predict defoliation for the whole Basque Country
  • Find the best-performing algorithm

3 / 28

slide-4
SLIDE 4

Data 

4 / 28

slide-5
SLIDE 5

Data 

Hyperspectral data

Airborne data collected at the end of September 2016

Characteristic           Value
Geometric resolution     1 m
Radiometric resolution   12 bit
Spectral resolution      126 bands (404.08 nm - 996.31 nm)

Geometrically, radiometrically and atmospherically corrected
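To relate band numbers to wavelengths, the spectral range above can be turned into approximate band centres. The sketch below assumes the 126 bands are equally spaced between 404.08 nm and 996.31 nm, which the slides do not state; the function names are hypothetical.

```python
# Assumption: band centres are equally spaced between the first and last
# wavelengths quoted on the slide; the real sensor layout may differ.
FIRST_NM = 404.08
LAST_NM = 996.31
N_BANDS = 126

# Spacing between neighbouring band centres under that assumption.
STEP_NM = (LAST_NM - FIRST_NM) / (N_BANDS - 1)


def band_centre(band):
    """Approximate centre wavelength (nm) of a 1-based band number."""
    if not 1 <= band <= N_BANDS:
        raise ValueError("band must be between 1 and 126")
    return FIRST_NM + (band - 1) * STEP_NM


def nearest_band(wavelength_nm):
    """1-based number of the band whose centre is closest to a wavelength."""
    return min(range(1, N_BANDS + 1),
               key=lambda b: abs(band_centre(b) - wavelength_nm))
```

Under this assumption, `nearest_band(800.0)` gives the band that would stand in for R800 in a vegetation index formula.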

Survey data

  • In-situ data from Laukiz 1, Laukiz 2, Luiando and Oiartzun (1750 observations in total)
  • Surveyed in September 2016
  • Variables include defoliation (at three height levels), number of cankers, age, diameter, etc.

5 / 28

slide-6
SLIDE 6

Data 

6 / 28

slide-7
SLIDE 7

Methods 

7 / 28

slide-8
SLIDE 8

Methods 

Variable retrieval

Extract as much information from the hyperspectral data as possible:

  • 90 vegetation indices (using the hsdar R package (Lehnert, Meyer, and Bendix, 2018))
  • 7xxx NRIs (Normalized Ratio Indices)

What are NRIs?

$$NRI_{i,j} = \frac{b_i - b_j}{b_i + b_j}$$

where $i$ and $j$ are the respective band numbers. The most famous NRI is the NDVI (Normalized Difference Vegetation Index).

122 valid bands (4 were corrupt): (125 * 126) / 2 = 7875 band pairs, minus those involving corrupt bands or a division by zero, leaves 7471 valid NRI variables.
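A minimal Python sketch of the NRI computation, assuming the definition NRI = (b_i - b_j) / (b_i + b_j) for every unordered band pair. The four-band spectrum is made up for illustration, and pairs whose sum is zero are skipped, mirroring the "division by zero" exclusion mentioned on the slide. The study itself used the hsdar R package, not this code.

```python
from itertools import combinations
from typing import Dict, List, Optional, Tuple


def nri(b_i: float, b_j: float) -> Optional[float]:
    """Normalized ratio index of two band values; None if the sum is zero."""
    denominator = b_i + b_j
    if denominator == 0:
        return None
    return (b_i - b_j) / denominator


def all_nris(spectrum: List[float]) -> Dict[Tuple[int, int], float]:
    """NRI for every unordered band pair (i < j, 1-based band numbers)."""
    result = {}
    for (i, b_i), (j, b_j) in combinations(enumerate(spectrum, start=1), 2):
        value = nri(b_i, b_j)
        if value is not None:  # drop pairs that would divide by zero
            result[(i, j)] = value
    return result


# Hypothetical four-band reflectance spectrum, for illustration only.
toy_spectrum = [0.05, 0.08, 0.04, 0.45]
toy_nris = all_nris(toy_spectrum)

# Number of unordered band pairs for a 126-band sensor: (125 * 126) / 2.
n_pairs_126 = len(list(combinations(range(126), 2)))
```

With a near-infrared band as b_i and a red band as b_j, the same `nri` function yields the NDVI.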

8 / 28

slide-9
SLIDE 9

Methods 

Algorithm benchmarking

  • Extreme Gradient Boosting (xgboost) (Chen and Guestrin, 2016)
  • Support Vector Machine (SVM) (Vapnik, 1998)
  • Ridge regression (RR) (Friedman, Hastie, and Tibshirani, 2010)

Hyperparameter tuning

Using SMBO (Sequential Model-Based Optimization) (Bischl, Richter, Bossek, et al., 2017)

Different partitioning schemes for performance estimation

  • Plot level
  • Tree level

9 / 28

slide-10
SLIDE 10

Methods 

Plot level

Training on 3 out of 4 plots, testing on the remaining one. Four performance estimates.
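The plot-level scheme can be sketched as a leave-one-plot-out loop that produces one RMSE per held-out plot. The defoliation values below are invented, and the "model" is a placeholder that predicts the training mean; both are assumptions for illustration and not the models benchmarked in the study.

```python
import math
from statistics import mean

# Hypothetical toy data: (plot name, observed defoliation in %).
observations = [
    ("Laukiz 1", 40.0), ("Laukiz 1", 60.0),
    ("Laukiz 2", 10.0), ("Laukiz 2", 30.0),
    ("Luiando", 70.0), ("Luiando", 90.0),
    ("Oiartzun", 50.0), ("Oiartzun", 50.0),
]


def rmse(observed, predicted):
    """Root mean squared error of paired observations and predictions."""
    return math.sqrt(mean((o - p) ** 2 for o, p in zip(observed, predicted)))


def leave_one_plot_out(data):
    """Return {plot: RMSE} with each plot held out once as the test set."""
    scores = {}
    for test_plot in sorted({plot for plot, _ in data}):
        train = [y for plot, y in data if plot != test_plot]
        test = [y for plot, y in data if plot == test_plot]
        # Placeholder model (assumption): always predict the training mean.
        prediction = mean(train)
        scores[test_plot] = rmse(test, [prediction] * len(test))
    return scores


plot_scores = leave_one_plot_out(observations)
```

Because every test plot is unseen during training, the four scores reflect how well a model transfers to a new site rather than to new trees within a known site.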

Tree level

Spatial partitioning using k-means clustering (Brenning, 2012).

  • Five folds
  • Five repetitions
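The idea behind the tree-level scheme is that trees are grouped into folds by location, so that test trees are spatially separated from training trees. The sketch below is a plain k-means on x/y coordinates written from scratch; it is not the sperrorest implementation, and the coordinates, seeds and function name are assumptions.

```python
import random


def kmeans_folds(coords, k=5, seed=1, n_iter=50):
    """Assign each (x, y) coordinate to one of k spatial folds via k-means."""
    rng = random.Random(seed)
    centres = rng.sample(coords, k)  # start from k randomly chosen trees
    labels = [0] * len(coords)
    for _ in range(n_iter):
        # Assignment step: nearest centre by squared Euclidean distance.
        labels = [
            min(range(k), key=lambda c: (x - centres[c][0]) ** 2
                + (y - centres[c][1]) ** 2)
            for x, y in coords
        ]
        # Update step: move each centre to the mean of its members.
        new_centres = []
        for c in range(k):
            members = [p for p, lab in zip(coords, labels) if lab == c]
            if members:
                new_centres.append((sum(p[0] for p in members) / len(members),
                                    sum(p[1] for p in members) / len(members)))
            else:
                new_centres.append(centres[c])  # keep an empty cluster's centre
        if new_centres == centres:
            break  # converged
        centres = new_centres
    return labels


# Hypothetical tree coordinates (metres), for illustration only.
_rng = random.Random(42)
trees = [(_rng.uniform(0, 100), _rng.uniform(0, 100)) for _ in range(200)]

# Five repetitions of a five-fold partitioning, each with a different seed.
repetitions = [kmeans_folds(trees, k=5, seed=rep) for rep in range(5)]
```

Each repetition starts from different initial centres, so the fold boundaries vary between repetitions while every fold remains a spatially compact group of trees.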

10 / 28

slide-11
SLIDE 11

Methods 

Variable importance

Aim: Find the most important variables among the 7xxx predictors.

Method: Use the internal variable importance measure of the winning algorithm (xgboost):

  • Gain: The relative contribution of the feature to the model
  • Cover: How often a feature was selected to be the deciding feature in a tree for a specific observation
  • Frequency: How often a feature occurs in all trees of the model
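The three measures can be illustrated with a toy aggregation over the splits of a tree ensemble. The split records and all feature names other than "bf2_EVI" are invented, and treating cover as the number of observations reaching a split is a simplifying assumption; this is a sketch of the idea, not xgboost's own code.

```python
from collections import defaultdict

# Hypothetical split records from a boosted tree ensemble:
# (feature name, gain of the split, cover = observations reaching the split).
splits = [
    ("bf2_EVI", 12.0, 100.0),
    ("bf2_EVI", 6.0, 40.0),
    ("bf2_NRI_b10_b20", 2.0, 60.0),
    ("bf2_mSR", 4.0, 50.0),
]


def importance(split_records):
    """Per-feature Gain, Cover and Frequency, each normalised to sum to 1."""
    gain = defaultdict(float)
    cover = defaultdict(float)
    freq = defaultdict(float)
    for feature, split_gain, split_cover in split_records:
        gain[feature] += split_gain
        cover[feature] += split_cover
        freq[feature] += 1.0
    total_gain = sum(gain.values())
    total_cover = sum(cover.values())
    total_freq = sum(freq.values())
    return {
        feature: {
            "Gain": gain[feature] / total_gain,
            "Cover": cover[feature] / total_cover,
            "Frequency": freq[feature] / total_freq,
        }
        for feature in gain
    }


scores = importance(splits)
# Rank features by Gain, highest first.
ranking = sorted(scores, key=lambda f: scores[f]["Gain"], reverse=True)
```

The three measures can rank features differently, which is one reason a threshold for selecting a variable subset is hard to justify from a single measure.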

11 / 28

slide-12
SLIDE 12

Results 

12 / 28

slide-13
SLIDE 13

Results 

  • Fig. 1: Descriptive statistics of the response variable defoliation.

13 / 28

slide-14
SLIDE 14

Results 

Performance (CV)

  • Tab. 1: Spatial block CV performances of RR, SVM and xgboost using RMSE as the error measure. Mean and standard deviation are shown.

RR              SVM             xgboost
59.10 (22.71)   36.23 (15.73)   33.26 (16.61)

Plot level vs tree level

  • Tab. 2: Predictive performance (RMSE) of xgboost at the plot and tree level. The performance estimates for "Plot level" correspond to the fold for which the respective plot served as the test set (block CV). For "Tree level", a five-fold, five-times-repeated SpCV was used.

Plot/Data   Plot level   Tree level (SpCV)
Laukiz 1    22.03        19.18
Laukiz 2    51.75        17.24
Luiando     13.20         8.30
Oiartzun    32.97        14.40

14 / 28

slide-15
SLIDE 15

Results 

  • Fig. 2: RMSE vs. mean point density and coefficient of variation (defoliation).

15 / 28

slide-16
SLIDE 16

Results 

Variable importance

  • Fig. 3: The 30 most important variables as estimated by the internal variable importance measure of the xgboost algorithm. The higher the score, the more important the feature. "bf2" denotes that a buffer of 2 m was used to extract the variable information for the tree observation. "NRI" means that a normalized ratio index of the subsequent bands was calculated. Features without the "NRI" prefix are vegetation indices, e.g. "bf2_EVI".

16 / 28

slide-17
SLIDE 17

Results 

Variable importance

Acronym   Name                        Formula                                                                                           Reference
EVI       Enhanced vegetation index   $2.5 \times \frac{R_{800} - R_{670}}{R_{800} - (6 \times R_{670}) - (7.5 \times R_{475}) + 1}$   1
GDVI      Generalized DVI*            $\frac{R_{800}^{n} - R_{680}^{n}}{R_{800}^{n} + R_{680}^{n}}$                                     2
D1        Derivative Index            $\frac{D_{730}}{D_{706}}$                                                                         3
mNDVI     Normalized DVI*             $\frac{R_{800} - R_{680}}{R_{800} + R_{680} - 2 \times R_{445}}$                                  4
mSR       Simple Ratio Index          $\frac{R_{800} - R_{445}}{R_{680} - R_{445}}$                                                     4

1: Huete, Liu, Batchily, et al. (1997)
2: Wu, Niu, Tang, et al. (2008)
3: Zarco-Tejada, Pushnik, Dobrowski, et al. (2003)
4: Sims and Gamon (2002)
* DVI: Difference Vegetation Index
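As a worked example, four of the indices in the table can be written as small Python functions following the formulas as listed (GDVI, D1, mNDVI, mSR). The reflectance values are invented, and the default exponent `n=2` for GDVI is an assumption, since the slides do not say which exponent was used.

```python
def gdvi(r800, r680, n=2):
    """Generalized DVI: (R800^n - R680^n) / (R800^n + R680^n)."""
    return (r800 ** n - r680 ** n) / (r800 ** n + r680 ** n)


def d1(d730, d706):
    """Derivative index: ratio of first-derivative values at 730 and 706 nm."""
    return d730 / d706


def mndvi(r800, r680, r445):
    """Modified NDVI: (R800 - R680) / (R800 + R680 - 2 * R445)."""
    return (r800 - r680) / (r800 + r680 - 2 * r445)


def msr(r800, r680, r445):
    """Modified simple ratio: (R800 - R445) / (R680 - R445)."""
    return (r800 - r445) / (r680 - r445)


# Hypothetical reflectances of a vegetated pixel, for illustration only.
R800, R680, R445 = 0.5, 0.1, 0.05

example = {
    "GDVI": gdvi(R800, R680),
    "mNDVI": mndvi(R800, R680, R445),
    "mSR": msr(R800, R680, R445),
}
```

With `n=1`, `gdvi` reduces to the plain normalized difference of the two bands.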

17 / 28

slide-18
SLIDE 18

Results 

Spatial prediction

  • Fig. 4: Spatially predicted defoliation (in %) from xgboost of Laukiz 1, Laukiz 2, Luiando and Oiartzun.

18 / 28

slide-19
SLIDE 19

Results 

Spatial prediction

  • Fig. 5: Histograms of predicted defoliation (in %) from xgboost of Laukiz 1, Laukiz 2, Luiando and Oiartzun.

19 / 28

slide-20
SLIDE 20

Discussion 

20 / 28

slide-21
SLIDE 21

Discussion 

Derivation of indices

  • 2 m buffer may not be optimal
  • Varying number of contributing pixels to each index value (border effects)

21 / 28

slide-22
SLIDE 22

Discussion 

RMSE vs. plot characteristics

  • Laukiz 2 did not follow the pattern of the other three plots
  • Generalization not possible with only four plots

21 / 28

slide-23
SLIDE 23

Discussion 

Predictive performance

  • xgboost showed the best performance
  • RR showed a surprisingly bad performance
  • Random Forest was not used due to the high number of variables (very long runtime)

21 / 28

slide-24
SLIDE 24

Discussion 

Predictive performance

  • High performance variance between plots (more training data would increase performance)
  • NRIs do not seem to help much in this case; vegetation indices were most important

22 / 28

slide-25
SLIDE 25

Discussion 

Variable importance

  • Internal model variable importance measures can always be questioned (not comparable)
  • How to find a threshold for which subset of variables to use?
  • Possible enhancements:
      • Use two feature sets: only NRIs, only vegetation indices
      • Conduct a principal component analysis (PCA)

22 / 28

slide-26
SLIDE 26

Discussion 

Spatial prediction

To-do: Prediction to Basque Country using Sentinel-2 data with the seven most important variables

23 / 28

slide-27
SLIDE 27

Appendix 

24 / 28

slide-28
SLIDE 28

Appendix 

  • App. 1: Spectral signatures (mean and standard deviation) of each plot.

25 / 28

slide-29
SLIDE 29

References 

26 / 28

slide-30
SLIDE 30

References 

Bischl, B., J. Richter, J. Bossek, et al. (2017). "mlrMBO: A Modular Framework for Model-Based Optimization of Expensive Black-Box Functions". In: ArXiv e-prints. arXiv: 1703.03373 [stat].

Brenning, A. (2012). "Spatial Cross-Validation and Bootstrap for the Assessment of Prediction Rules in Remote Sensing: The R Package Sperrorest". In: 2012 IEEE International Geoscience and Remote Sensing Symposium. R package version 2.1.0. IEEE. DOI: 10.1109/igarss.2012.6352393.

Chen, T. and C. Guestrin (2016). "XGBoost: A Scalable Tree Boosting System". In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. KDD '16. New York, NY, USA: ACM, pp. 785-794. ISBN: 978-1-4503-4232-2. DOI: 10.1145/2939672.2939785.

Friedman, J., T. Hastie and R. Tibshirani (2010). "Regularization Paths for Generalized Linear Models via Coordinate Descent". In: Journal of Statistical Software 33.1, pp. 1-22.

Huete, A. R., H. Q. Liu, K. Batchily, et al. (1997). "A Comparison of Vegetation Indices over a Global Set of TM Images for EOS-MODIS". In: Remote Sensing of Environment 59.3, pp. 440-451. ISSN: 0034-4257. DOI: 10/bgtpgv.

27 / 28

slide-31
SLIDE 31

References 

Lehnert, L. W., H. Meyer and J. Bendix (2018). Hsdar: Manage, Analyse and Simulate Hyperspectral Data in R. R package version 0.7.1.

Sims, D. A. and J. A. Gamon (2002). "Relationships between Leaf Pigment Content and Spectral Reflectance across a Wide Range of Species, Leaf Structures and Developmental Stages". In: Remote Sensing of Environment 81.2, pp. 337-354. ISSN: 0034-4257. DOI: 10/fb9nnj.

Vapnik, V. (1998). "The Support Vector Method of Function Estimation". In: Nonlinear Modeling. Springer US, pp. 55-85. DOI: 10.1007/978-1-4615-5703-6_3.

28 / 28