SLIDE 1

Julie Campagna, PhD student, Angers, France; Aurélie Davranche, Associate Professor, Angers, France

Comparing CART and Random Forest for the monitoring of wetland vegetation with multispectral data.

cnrs.fr


IBS-DR Biometry Workshop, Würzburg University, 09 October 2015

SLIDE 2

SUMMARY

  • Decision trees : generalities
  • Presentation of CART and Random Forest
  • How CART works
  • How Random Forest works
  • Example of application : remote sensing


SLIDE 3

DECISION TREES

  • Classification (or regression) method
  • Non-parametric method
  • Can handle large amounts of data
  • Splits the sample recursively to obtain classes that are as homogeneous as possible
  • Existing separability criteria :
  • Gini index : CART
  • Chi-squared automatic interaction detection : CHAID
  • Shannon entropy : C5.0


SLIDE 4

COMPARISON OF CART AND RANDOM FOREST

Two decision tree methods developed essentially by Breiman et al.

  • CART came first, in 1984
  • Random Forest followed in 2001
  • Various applications : biology, medicine, remote sensing, …
  • Handle large numbers of samples and variables
  • Not disturbed by extreme values or irrelevant variables


SLIDE 5

CART : FUNDAMENTALS

  • CART uses the Gini criterion to split a training sample

n : number of classes to predict ; f_i : frequency of class i in the node

  • Dichotomous partitioning
  • A decision rule emerges


 

I = 1 − Σ_{i=1}^{n} f_i²
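As a minimal illustration of this criterion (a plain-Python sketch, not the rpart internals; `gini_impurity` is a name chosen here), the impurity of a node can be computed directly from its class labels:

```python
from collections import Counter

def gini_impurity(labels):
    """Gini impurity of a node: 1 minus the sum of squared class frequencies."""
    n = len(labels)
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())

# A pure node has impurity 0; a balanced two-class node has impurity 0.5.
print(gini_impurity([1, 1, 1, 1]))  # 0.0
print(gini_impurity([1, 1, 2, 2]))  # 0.5
```

At each node, CART picks the split that most reduces this impurity.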

SLIDE 6

CART : IMPROVEMENT

Choosing the final tree :

  • 75% of the sample for training, 25% for validation
  • 10-fold cross-validation
  • (Esposito et al., CV-1SE)


Sample → 75% training / 25% validation → 10-fold cross-validation → minimal error → final tree → validation accuracy
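The 75/25 split at the start of this flow can be sketched as follows (plain Python; `split_sample` is an illustrative name, and the tree growing and cross-validation steps are left out):

```python
import random

def split_sample(data, train_frac=0.75, seed=42):
    """Randomly split a dataset into a training part and a validation part."""
    rng = random.Random(seed)
    shuffled = list(data)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_frac)
    return shuffled[:cut], shuffled[cut:]

train, valid = split_sample(range(100))
print(len(train), len(valid))  # 75 25
```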

SLIDE 7

CART : PRUNING RESULT


SLIDE 8

CART : PARAMETERS

  • CART was implemented in R using the rpart package
  • Presence = "1" ; absence = "2"
  • Unbalanced sample

Optimal "prior" parameter : found by iterative runs of the algorithm


SLIDE 9

RANDOM FOREST : GENERAL OPERATION

  • RF grows many classification trees
  • To classify a new case, its input vector goes down each of the trees in the forest
  • Each tree gives a classification : we say the tree “votes” for that class
  • The forest chooses the classification having the most votes (over all the trees in the forest)
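The voting step can be sketched in a few lines (a toy Python illustration, not the actual Random Forest implementation):

```python
from collections import Counter

def forest_predict(tree_votes):
    """Each tree votes for a class; the forest returns the majority class."""
    return Counter(tree_votes).most_common(1)[0][0]

# Three of five trees vote "reed", so the forest predicts "reed".
print(forest_predict(["reed", "reed", "sedge", "reed", "sedge"]))  # reed
```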


SLIDE 10

RANDOM FOREST : STEP ONE

  • For each tree, about 2/3 of the sample is drawn at random, with replacement (a bootstrap sample), as the training set ; the remaining ~1/3 is used for validation (out-of-bag, OOB)
  • At each node, a random subset of the variables (generally the square root of the number of variables) is considered for the split
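The bootstrap/OOB split can be sketched as follows (plain Python; `bootstrap_split` is an illustrative name). Drawing n rows with replacement leaves, on average, about one third of the rows out of the sample, which is where the 2/3 vs 1/3 figures come from:

```python
import random

def bootstrap_split(n_samples, seed=0):
    """Draw a bootstrap sample (with replacement); unsampled rows form the OOB set."""
    rng = random.Random(seed)
    in_bag = [rng.randrange(n_samples) for _ in range(n_samples)]
    oob = sorted(set(range(n_samples)) - set(in_bag))
    return in_bag, oob

in_bag, oob = bootstrap_split(10000)
# Roughly 36.8% (1 - 1/e) of the rows end up out-of-bag.
print(len(oob) / 10000)
```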


SLIDE 11

RANDOM FOREST : STEP TWO FOREST CONSTRUCTION


SLIDE 12

RANDOM FOREST : PARAMETERS

  • Cannot deal with unbalanced samples by default
  • Two ways to adjust the data :
  • Up-sampling based on the size of the largest class
  • Down-sampling based on the size of the smallest class
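Both strategies can be sketched with a small helper (illustrative Python, not the authors' R code; `rebalance` and the class names are made up here). The class sizes 66 and 1831 echo the two-class example shown later in the talk:

```python
import random

def rebalance(groups, mode="down", seed=0):
    """groups: dict class -> list of samples. Up- or down-sample to equal sizes."""
    rng = random.Random(seed)
    sizes = [len(rows) for rows in groups.values()]
    target = max(sizes) if mode == "up" else min(sizes)
    out = {}
    for cls, rows in groups.items():
        if len(rows) >= target:
            # Down-sample without replacement.
            out[cls] = rng.sample(rows, target)
        else:
            # Up-sample by drawing extra rows with replacement.
            out[cls] = rows + rng.choices(rows, k=target - len(rows))
    return out

data = {"present": list(range(66)), "absent": list(range(1831))}
down = rebalance(data, "down")
up = rebalance(data, "up")
print(len(down["absent"]), len(up["present"]))  # 66 1831
```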


SLIDE 13

EXAMPLE OF APPLICATION : REMOTE SENSING

  • Satellite images are useful for monitoring wetland environments
  • In this case we used a high spatial resolution image (WorldView-2) of the Camargue, in the south of France
  • Needs :
  • Map the vegetation
  • Create a method that is easy to apply without knowledge of remote sensing or R programming


[image removed]

SLIDE 14

SAMPLE

  • 21 landcover classes from field data
  • 49 descriptive variables : reflectance values from the spectral bands and multispectral indices


[chart removed : number of samples per class, ranging from about 20 to 180]

SLIDE 15

EXAMPLE OF APPLICATION : REMOTE SENSING

Classification of Salicornia fruticosa


[image removed]

SLIDE 16

EXAMPLE OF APPLICATION : REMOTE SENSING

Mapping results : Sarcocornia fruticosa


SLIDE 17

EXAMPLE OF APPLICATION : REMOTE SENSING

Confusion matrices :

CART (columns : reference map ; rows : produced classification)

             | Class 1 | Class 2 | Overall accuracy
Training
  Class 1    |    49   |    65   |
  Class 2    |     1   |  1316   |
  Total      |    50   |  1381   | 0.954
  Omission   |  0.020  |  0.047  |
Validation
  Class 1    |    16   |    22   |
  Class 2    |     0   |   428   |
  Total      |    16   |   450   | 0.953
  Omission   |  0.000  |  0.049  |
Total
  Class 1    |    65   |    87   |
  Class 2    |     1   |  1744   |
  Total      |    66   |  1831   | 0.954
  Omission   |  0.015  |  0.047  |

RF (columns : reference map ; rows : produced classification)

             | Class 1 | Class 2 | Overall accuracy | OOB error
RF_Up
  Class 1    |   858   |     9   |
  Class 2    |     0   |  1822   |
  Total      |   858   |  1831   | 0.991            | 0.26%
  Omission   |  0.000  |  0.005  |
RF_Down
  Class 1    |    59   |    50   |
  Class 2    |     7   |  1781   |
  Total      |    66   |  1831   | 0.970            | 3%
  Omission   |  0.110  |  0.027  |


  • Classification accuracy values are close
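The overall accuracy and omission errors in these matrices can be recomputed from the raw counts (a plain-Python sketch; rows are produced classes, columns are reference classes, using the CART "Total" figures quoted on this slide):

```python
def accuracy_and_omission(matrix):
    """matrix[i][j] = count of cases predicted as class i whose reference class is j."""
    k = len(matrix)
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(k))
    col_totals = [sum(matrix[i][j] for i in range(k)) for j in range(k)]
    # Omission error of class j: reference-class-j cases missed, over its column total.
    omission = [(col_totals[j] - matrix[j][j]) / col_totals[j] for j in range(k)]
    return correct / total, omission

# CART "Total" confusion matrix from the slide (predicted rows, reference columns).
oa, om = accuracy_and_omission([[65, 87], [1, 1744]])
print(round(oa, 3), [round(e, 3) for e in om])  # 0.954 [0.015, 0.048]
```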
SLIDE 18

EXAMPLE OF APPLICATION : REMOTE SENSING

  • The difference in global accuracy between CART and Random Forest is really low (around 1.5%), and both results are good
  • CART provides an explicit model ; the Random Forest model is implicit
  • An explicit model can be reused on a new dataset or another image of the same date without repeating all the modeling steps : easier to use without specific knowledge


SLIDE 19

CONCLUSION AND DISCUSSION

  • On the same dataset, with all parameters suited to CART, we obtained results not significantly different from Random Forest
  • Both models need some parameter tuning to be able to deal with unbalanced samples
  • CART can generate an explicit model, whereas Random Forest cannot
  • Both algorithms also make it possible to identify important variables


SLIDE 20

THANKS FOR YOUR ATTENTION!
