Modeling defoliation of Pinus Radiata trees using hyperspectral remote sensing data

Patrick Schratz (1), Jannes Muenchow (1), Eugenia Iturritxa (2), Alexander Brenning (1)

LIFE Healthy Forest Workshop, Vitoria-Gasteiz (Spain), 19 Sep 2018

(1) Department of Geography, GIScience group, University of Jena
(2) NEIKER, Vitoria-Gasteiz, Spain

Patrick Schratz: https://pjs-web.de, @pjs_228, @pat-s, patrick.schratz@uni-jena.de
Slides: https://bit.ly/2Nls9Do

Contribution to project goals

Action B1

Deliverables
- Remotely-sensed forest health map (80%) (maps for all plots, Basque Country missing)
- Maps of forest disease potential (50%) (Diplodia & Fusarium, Armillaria & Heterobasidion missing)

Milestones
- Data for spatial data analysis compiled
- Developed model of forest disease potential (xgboost)
- Final selection of algorithm for remotely-sensed forest health mapping (xgboost)

Introduction

Study aims
- Modeling defoliation (as a proxy of tree health) using remote sensing data (high-dimensional modeling problem)
- Find the most important variables and predict defoliation for the whole Basque Country
- Find the best performing algorithm

Data

Hyperspectral data
- Airborne data collected end of September 2016
- Geometrically, radiometrically and atmospherically corrected

Characteristic          Value
Geometric resolution    1 m
Radiometric resolution  12 bit
Spectral resolution     126 bands (404.08 nm - 996.31 nm)

Survey data
- In-situ data from Laukiz 1, Laukiz 2, Luiando and Oiartzun (total 1750 observations)
- Surveyed in September 2016
- Variables like defoliation (at three height levels), number of cankers, age, diameter, etc.

Methods

Variable retrieval
- Extract as much information from the hyperspectral data as possible
- 90 vegetation indices (using the hsdar R package (Lehnert, Meyer, and Bendix, 2018))
- 7xxx NRIs (Normalized Ratio Indices)

What are NRIs?

$$\mathrm{NRI}_{i,j} = \frac{b_i - b_j}{b_i + b_j}$$

where $i$ and $j$ are the respective band numbers. The most famous NRI is the NDVI (Normalized Difference Vegetation Index).

122 valid bands (4 were corrupt): (125 * 126) / 2 = 7875 band pairs; removing corrupt bands and bands with division by zero leaves 7471 valid NRI variables.
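The NRI definition and the pair count can be illustrated with a minimal sketch. It is not the authors' hsdar workflow: the function names, the toy spectrum and the band order within each pair (i < j) are assumptions made for illustration.

```python
from itertools import combinations


def normalized_ratio_indices(reflectance):
    """Return {(i, j): NRI} for every band pair i < j of one spectrum.

    `reflectance` is a list of per-band reflectance values. Pairs whose
    denominator is zero are skipped, mirroring the "division by zero"
    filter mentioned on the slide.
    """
    indices = {}
    for i, j in combinations(range(len(reflectance)), 2):
        denominator = reflectance[i] + reflectance[j]
        if denominator == 0:
            continue  # NRI is undefined for this pair, so drop it
        indices[(i, j)] = (reflectance[i] - reflectance[j]) / denominator
    return indices


def count_band_pairs(n_bands):
    """Number of unordered band pairs: n * (n - 1) / 2."""
    return n_bands * (n_bands - 1) // 2


# Toy spectrum with four bands (hypothetical reflectance values).
toy_spectrum = [0.05, 0.10, 0.40, 0.50]
toy_nri = normalized_ratio_indices(toy_spectrum)
print(len(toy_nri), count_band_pairs(126))
```

For 126 bands, `count_band_pairs` reproduces the 7875 candidate pairs stated on the slide; the filtering of corrupt bands is not modeled here.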

Algorithm benchmarking
- Extreme Gradient Boosting (xgboost) (Chen and Guestrin, 2016)
- Support Vector Machine (SVM) (Vapnik, 1998)
- Ridge regression (RR) (Friedman, Hastie, and Tibshirani, 2010)

Hyperparameter tuning
- Using SMBO (Sequential Model-Based Optimization) (Bischl, Richter, Bossek, et al., 2017)

Different partitioning schemes for performance estimation
- Plot level
- Tree level

Plot level
- Training on 3 out of 4 plots, testing on the remaining one.
- Four performance estimates.

Tree level
- Spatial partitioning using k-means clustering (Brenning, 2012).
- Five folds
- Five repetitions
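Both partitioning schemes can be sketched in a few lines. This is an illustrative stand-in, not the sperrorest implementation cited above: the function names, the random initialization of the k-means centers and the toy coordinates are assumptions. Repetitions would correspond to calling `kmeans_folds` with different seeds.

```python
import random


def leave_one_plot_out(plot_labels):
    """Yield (test_plot, train_idx, test_idx) with one plot held out per fold."""
    for test_plot in sorted(set(plot_labels)):
        train_idx = [k for k, p in enumerate(plot_labels) if p != test_plot]
        test_idx = [k for k, p in enumerate(plot_labels) if p == test_plot]
        yield test_plot, train_idx, test_idx


def kmeans_folds(coords, n_folds, n_iter=50, seed=0):
    """Assign each (x, y) coordinate to a spatial fold via plain k-means."""
    rng = random.Random(seed)
    centers = rng.sample(coords, n_folds)  # initial centers: random observations
    assignment = [0] * len(coords)
    for _ in range(n_iter):
        # Assignment step: nearest center by squared Euclidean distance.
        assignment = [
            min(range(n_folds),
                key=lambda c: (x - centers[c][0]) ** 2 + (y - centers[c][1]) ** 2)
            for x, y in coords
        ]
        # Update step: move each center to the mean of its members.
        new_centers = []
        for c in range(n_folds):
            members = [coords[k] for k, a in enumerate(assignment) if a == c]
            if members:
                new_centers.append((sum(m[0] for m in members) / len(members),
                                    sum(m[1] for m in members) / len(members)))
            else:
                new_centers.append(centers[c])  # keep the center of an empty cluster
        if new_centers == centers:
            break  # converged
        centers = new_centers
    return assignment


# Plot level: one fold per plot (toy labels for six trees).
plots = ["Laukiz 1", "Laukiz 1", "Laukiz 2", "Luiando", "Oiartzun", "Oiartzun"]
splits = list(leave_one_plot_out(plots))

# Tree level: two spatially separated groups of hypothetical tree coordinates.
coords = [(0, 0), (0, 1), (1, 0), (100, 100), (100, 101), (101, 100)]
folds = kmeans_folds(coords, n_folds=2)
```

Grouping nearby trees into the same fold keeps spatially close observations out of both the training and the test set at once, which is the point of spatial cross-validation.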

Variable importance

Aim: Find the most important variables among the 7xxx predictors

Method: Using the internal variable importance measure of the winning algorithm (xgboost):
- Gain: The relative contribution of the feature to the model
- Cover: How often a feature was selected to be the deciding feature in a tree for a specific observation
- Frequency: How often a feature occurs in all trees of the model
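The three measures can be read as different ways of aggregating the splits of a boosted tree ensemble. The sketch below works on hypothetical split records, not on a fitted xgboost model: the tuple layout, the function name and the feature "bf2_mSR" are assumptions, and cover is approximated by the number of observations reaching a split.

```python
from collections import defaultdict


def importance_from_splits(splits):
    """Aggregate per-split records into Gain, Cover and Frequency shares.

    Each record is a hypothetical (feature, gain, n_observations) tuple
    describing one split of one tree. Each measure is normalized so that
    it sums to 1 across features.
    """
    gain = defaultdict(float)
    cover = defaultdict(float)
    frequency = defaultdict(int)
    for feature, split_gain, n_obs in splits:
        gain[feature] += split_gain   # loss reduction contributed by the split
        cover[feature] += n_obs       # observations passing through the split
        frequency[feature] += 1       # number of splits using the feature
    total_gain = sum(gain.values())
    total_cover = sum(cover.values())
    total_splits = len(splits)
    return {
        feature: {
            "Gain": gain[feature] / total_gain,
            "Cover": cover[feature] / total_cover,
            "Frequency": frequency[feature] / total_splits,
        }
        for feature in gain
    }


# Three toy splits across all trees of a hypothetical model.
toy_splits = [
    ("bf2_EVI", 6.0, 100),
    ("bf2_EVI", 2.0, 40),
    ("bf2_mSR", 2.0, 60),
]
toy_importance = importance_from_splits(toy_splits)
ranking = sorted(toy_importance, key=lambda f: toy_importance[f]["Gain"], reverse=True)
```

Because the three shares are normalized separately, a feature can rank differently under each measure, which is one reason such internal scores are hard to compare.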

Results

Fig. 1: Descriptive statistics of the response variable defoliation.

Performance (CV)

Tab. 1: Spatial block CV performances of RR, SVM and xgboost using RMSE as the error measure. Mean and standard deviation (in parentheses) are shown.

RR             SVM            xgboost
59.10 (22.71)  36.23 (15.73)  33.26 (16.61)

Plot level vs tree level

Tab. 2: Predictive performance of xgboost at the plot and tree level. The performance estimates for "Plot level" correspond to the fold for which the respective plot served as the test set (block CV). For "Tree level", a five-fold, five-times repeated spatial cross-validation (SpCV) was used.

Plot/Data  Plot level  Tree level (SpCV)
Laukiz 1   22.03       19.18
Laukiz 2   51.75       17.24
Luiando    13.20       8.30
Oiartzun   32.97       14.40
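As a reminder of how such a summary is formed, the sketch below computes an RMSE per test fold and then the mean and standard deviation across folds. The observed and predicted values are hypothetical, and using the sample standard deviation is an assumption; the slides do not state which variant was reported.

```python
import math
import statistics


def rmse(observed, predicted):
    """Root mean squared error between two equally long sequences."""
    squared_errors = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def summarize_folds(fold_rmses):
    """Mean and sample standard deviation of per-fold RMSE values."""
    return statistics.mean(fold_rmses), statistics.stdev(fold_rmses)


# Hypothetical observed and predicted defoliation (in %) for two test folds.
folds = [
    ([10.0, 50.0, 90.0], [20.0, 40.0, 80.0]),
    ([0.0, 30.0, 60.0], [0.0, 60.0, 60.0]),
]
fold_rmses = [rmse(obs, pred) for obs, pred in folds]
mean_rmse, sd_rmse = summarize_folds(fold_rmses)
```

Since RMSE is expressed in the units of the response, the values in the tables can be read as percentage points of defoliation.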

Fig. 2: RMSE vs. mean point density and coefficient of variation (defoliation).

Variable importance

Fig. 3: The 30 most important variables as estimated by the internal variable importance measure of the xgboost algorithm. The higher the score, the more important the feature. "bf2" denotes that a buffer of 2 m was used to extract the variable information for each tree observation. "NRI" means that a normalized ratio index of the subsequent bands was calculated. Features without the "NRI" prefix are vegetation indices, e.g. "bf2_EVI".

Variable importance

Acronym  Name                       Formula                                                                                             Reference
EVI      Enhanced vegetation index  $2.5 \times \frac{R_{800} - R_{670}}{R_{800} - (6 \times R_{670}) - (7.5 \times R_{475}) + 1}$      1
GDVI     Generalized DVI*           $\frac{R_{800}^{n} - R_{680}^{n}}{R_{800}^{n} + R_{680}^{n}}$                                       2
D1       Derivative Index           $\frac{D_{730}}{D_{706}}$                                                                           3
mNDVI    Normalized DVI*            $\frac{R_{800} - R_{680}}{R_{800} + R_{680} - 2 \times R_{445}}$                                    4
mSR      Simple Ratio Index         $\frac{R_{800} - R_{445}}{R_{680} - R_{445}}$                                                       4

1: Huete, Liu, Batchily, et al. (1997)
2: Wu, Niu, Tang, et al. (2008)
3: Zarco-Tejada, Pushnik, Dobrowski, et al. (2003)
4: Sims and Gamon (2002)
* DVI: Difference Vegetation Index
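Four of the listed indices are simple band ratios and can be written down directly. The following is a minimal sketch with hypothetical reflectance values; the function names are illustrative, the default exponent n = 2 for GDVI is an assumption (the slide leaves n open), and D1 expects first-derivative values that are computed elsewhere.

```python
def gdvi(r800, r680, n=2):
    """Generalized DVI: (R800^n - R680^n) / (R800^n + R680^n)."""
    return (r800 ** n - r680 ** n) / (r800 ** n + r680 ** n)


def mndvi(r800, r680, r445):
    """mNDVI: (R800 - R680) / (R800 + R680 - 2 * R445)."""
    return (r800 - r680) / (r800 + r680 - 2 * r445)


def msr(r800, r680, r445):
    """mSR: (R800 - R445) / (R680 - R445)."""
    return (r800 - r445) / (r680 - r445)


def d1(d730, d706):
    """Derivative index: ratio of the first derivatives at 730 nm and 706 nm."""
    return d730 / d706


# Hypothetical reflectances of a vegetated pixel at 800, 680 and 445 nm.
r800, r680, r445 = 0.5, 0.1, 0.05
toy_indices = {
    "GDVI": gdvi(r800, r680),
    "mNDVI": mndvi(r800, r680, r445),
    "mSR": msr(r800, r680, r445),
}
```

EVI is left out on purpose; its formula is given in the table above.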

Spatial prediction

Fig. 4: Spatially predicted defoliation (in %) from xgboost for Laukiz 1, Laukiz 2, Luiando and Oiartzun.

Fig. 5: Histograms of predicted defoliation (in %) from xgboost for Laukiz 1, Laukiz 2, Luiando and Oiartzun.

Discussion

Derivation of indices
- 2 m buffer may not be optimal
- Varying number of contributing pixels to each index value (border effects)

RMSE vs. plot characteristics
- Laukiz 2 did not follow the pattern of the other three plots
- Generalization not possible with only four plots

Predictive performance
- xgboost showed the best performance
- RR showed a surprisingly bad performance
- Random Forest was not used due to the high number of variables (very long runtime)

Predictive performance
- High performance variance between plots (more training data would increase performance)
- NRIs do not seem to help much in this case; vegetation indices were most important

Variable importance
- Internal model variable importance measures can always be questioned (not comparable)
- How to find a threshold for which subset of variables to use?
- Possible enhancements:
  - Use two feature sets: only NRIs, only vegetation indices
  - Conduct a principal component analysis (PCA)

Spatial prediction
- To-do: Prediction for the Basque Country using Sentinel-2 data with the seven most important variables

Appendix

App. 1: Spectral signatures (mean and standard deviation) of each plot.

References

Bischl, B., J. Richter, J. Bossek, et al. (2017). "mlrMBO: A Modular Framework for Model-Based Optimization of Expensive Black-Box Functions". In: ArXiv e-prints. arXiv: 1703.03373 [stat].

Brenning, A. (2012). "Spatial Cross-Validation and Bootstrap for the Assessment of Prediction Rules in Remote Sensing: The R Package Sperrorest". In: 2012 IEEE International Geoscience and Remote Sensing Symposium. R package version 2.1.0. IEEE. DOI: 10.1109/igarss.2012.6352393.

Chen, T. and C. Guestrin (2016). "XGBoost: A Scalable Tree Boosting System". In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. KDD '16. New York, NY, USA: ACM, pp. 785-794. ISBN: 978-1-4503-4232-2. DOI: 10.1145/2939672.2939785.

Friedman, J., T. Hastie and R. Tibshirani (2010). "Regularization Paths for Generalized Linear Models via Coordinate Descent". In: Journal of Statistical Software 33.1, pp. 1-22.

Huete, A. R., H. Q. Liu, K. Batchily, et al. (1997). "A Comparison of Vegetation Indices over a Global Set of TM Images for EOS-MODIS". In: Remote Sensing of Environment 59.3, pp. 440-451. ISSN: 0034-4257. DOI: 10/bgtpgv.
