Multivariate analysis of ecological data with ade4 (Stéphane Dray)

SLIDE 1

Multivariate analysis of ecological data with ade4

Stéphane Dray

Univ. Lyon 1

CARME 2011, Rennes

Stéphane Dray (Univ. Lyon 1) CARME 2011, Rennes 1 / 31

SLIDE 4

Introduction: Overview

What? ade4 is an R package for the exploratory analysis of ecological data:

  • multivariate analysis
  • graphics

It contains:

  • 105 datasets
  • 345 functions
  • 37 multivariate methods (16 developed in the lab)
  • 39 graphical functions

Why? To promote the methodological developments of the lab, and to facilitate the use of these new statistical methods by ecologists.

How? A long history.

SLIDE 5

Introduction: Short history

1980: Set of programs written in BASIC on a Data General Nova 3

SLIDE 6

Introduction: Short history

1985: Diagonalization procedure (assembly language for the Eclipse S/140)

◮ Analysis of real ecological datasets in a "reasonable time"

◮ Use by the ecologists of the lab

SLIDE 7

Introduction: Short history

1989: Distribution of ADECO
  • modules in Microsoft QuickBasic
  • HyperCard interface

SLIDE 8

Introduction: Short history

1995: ADE-4
  • modules in C
  • HyperCard and WinPlus interfaces

SLIDE 9

Introduction: Short history

2000: ADE-4
  • MetaCard interface
  • batch mode

SLIDE 10

Introduction: Short history

2002: ade4 package for R

SLIDE 11

Introduction: Short history

Who? The ade4 users (a bibliographic study): an increasing community...

Thioulouse et al. (1997) Statistics and Computing; Chessel et al. (2004) R News; Dray et al. (2007) R News; Dray and Dufour (2007) JSS.

... of ecologists

Subject area                          %
ECOLOGY                           30.27
MARINE & FRESHWATER BIOLOGY       18.95
ENVIRONMENTAL SCIENCES            12.18
MICROBIOLOGY                       8.51
PLANT SCIENCES                     8.31
SOIL SCIENCE                       6.67
GENETICS & HEREDITY                6.18
EVOLUTIONARY BIOLOGY               6.09
BIODIVERSITY CONSERVATION          5.60
BIOCHEMISTRY & MOLECULAR BIOLOGY   5.41
FORESTRY                           5.41
LIMNOLOGY                          5.31
...                                 ...

SLIDE 15

Diversity of ecological datasets

Ecology, a fertile ground for methodological developments

Great diversity of:

1. Biological questions/models
2. Sampling methods/tools
3. Data structures
   • variables (quantitative, qualitative, ordinal, fuzzy, etc.)
   • constraints on individuals or variables (weights, spatial, phylogenetic, hierarchical, etc.)

Usually, multivariate data.

SLIDE 16

Diversity of ecological datasets: Community ecology

One table: summarizing community data

[Figure: a sites × species table]

SLIDE 17

Diversity of ecological datasets: Community ecology

Two tables: linking species to environment

[Figure: a sites × species table and a sites × environment table]

SLIDE 18

Diversity of ecological datasets: Community ecology

Three tables: linking species traits to environment

[Figure: sites × species, sites × environment and species × traits tables]

SLIDE 19

Diversity of ecological datasets: Community ecology

K-tables: temporal evolution of structures

[Figure: a series of sites × species tables, one per date]

SLIDE 20

Diversity of ecological datasets: Community ecology

K-tables: temporal evolution of co-structures

[Figure: pairs of sites × species and sites × environment tables, one pair per date]

SLIDE 21

Diversity of ecological datasets: Community ecology

Some constraints: space, phylogeny

[Figure: tables labelled species, traits and sites, with spatial and phylogenetic constraints]

SLIDE 22

The ade4 philosophy

"French way" of multivariate analysis

[Figure: ade4 = theory, the duality diagram in which X and X^T link R^p (inner product Q) and R^n (inner product D), implemented in functions such as dudi, dudi.pca, pcaiv, coinertia and scatter]

SLIDE 23

The ade4 philosophy

One table, two viewpoints

X as a cloud of n rows (individuals) in the hyperspace spanned by the variables (variable 1, variable 2, ..., variable p): what are the resemblances/differences between the individuals?

X as a cloud of p columns (variables) in the hyperspace spanned by the individuals (individual 1, individual 2, ..., individual n): what are the relationships between the variables?

SLIDE 24

The ade4 philosophy: Theoretical aspects

Statistical triplet

Multivariate methods aim to answer these two questions: they seek low-dimensional hyperspaces (a few axes) in which the representations of individuals and variables are as close as possible to the original ones. To do so, we define:

  • Q, a p × p positive symmetric matrix, used as an inner product in $\mathbb{R}^p$, which allows distances between the n individuals to be measured;
  • D, an n × n positive symmetric matrix, used as an inner product in $\mathbb{R}^n$, which allows relationships between the p variables to be measured.

An analysis is then defined by the statistical triplet (X, Q, D).
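A minimal base-R sketch (it does not call ade4) of what the two inner products of the triplet measure. The random table, the PCA-style choices Q = I_p and D = I_n / n, and the helper name dist_Q are assumptions made for illustration only.

```r
# Sketch only: the two inner products of a statistical triplet (X, Q, D).
# Q = I_p and D = I_n / n (a centred PCA) are illustrative assumptions.
set.seed(1)
n <- 5
p <- 3
X <- matrix(rnorm(n * p), n, p)
X <- sweep(X, 2, colMeans(X))        # centre each column

Q <- diag(p)                         # p x p inner product in R^p
D <- diag(1 / n, n)                  # n x n inner product in R^n

# Hypothetical helper: squared Q-distance between individuals i and j,
# (x_i - x_j)^T Q (x_i - x_j)
dist_Q <- function(X, Q, i, j) {
  v <- X[i, ] - X[j, ]
  drop(t(v) %*% Q %*% v)
}

# D-inner products between the p variables: X^T D X. With centred columns
# and D = I_n / n this is the covariance matrix with divisor n.
cross_D <- t(X) %*% D %*% X

d12 <- dist_Q(X, Q, 1, 2)
```

Replacing Q by a diagonal matrix of inverse variances weights the variables so that the distance between individuals no longer depends on the units of the variables.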

SLIDE 25

The ade4 philosophy: Theoretical aspects

$$X Q X^\top D K = K \Lambda \qquad X^\top D X Q A = A \Lambda$$

  • K contains the principal components ($K^\top D K = I_r$).
  • A contains the principal axes ($A^\top Q A = I_r$).
  • $L = XQA$ contains the row scores (projection of the rows of X onto the principal axes).
  • $C = X^\top D K$ contains the column scores (projection of the columns of X onto the principal components).

Maximization of:

$$Q(a) = a^\top Q^\top X^\top D X Q a = \lambda \qquad S(k) = k^\top D^\top X Q X^\top D k = \lambda$$

$$\langle XQa \mid k \rangle_D = \langle X^\top D k \mid a \rangle_Q = \sqrt{\lambda}$$
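The duality equations on this slide can be checked numerically. The sketch below is not ade4's code: it assumes diagonal Q = diag(cw) and D = diag(lw), swaps in an SVD of D^{1/2} X Q^{1/2} in place of an eigen-decomposition, and triplet_svd is a hypothetical name.

```r
# Illustrative sketch: solve the duality diagram (X, Q, D) for diagonal
# Q = diag(cw) and D = diag(lw) through an SVD of D^{1/2} X Q^{1/2}.
# Returns all eigenvalues and the first nf axes.
triplet_svd <- function(X, cw, lw, nf = 2) {
  Qh <- sqrt(cw)
  Dh <- sqrt(lw)
  Z <- sweep(sweep(X, 2, Qh, "*"), 1, Dh, "*")      # D^{1/2} X Q^{1/2}
  s <- svd(Z)
  keep <- seq_len(nf)
  A <- sweep(s$v[, keep, drop = FALSE], 1, Qh, "/") # principal axes, A^T Q A = I
  K <- sweep(s$u[, keep, drop = FALSE], 1, Dh, "/") # principal components, K^T D K = I
  list(eig = s$d^2,                                 # eigenvalues (diagonal of Lambda)
       A = A,
       K = K,
       L = X %*% (cw * A),                          # row scores L = X Q A
       C = t(X) %*% (lw * K))                       # column scores C = X^T D K
}
```

Both eigen-problems share the same non-zero eigenvalues, and the scores satisfy L = K Λ^{1/2} and C = A Λ^{1/2}; solving either one is therefore enough to recover the other set of vectors, which is the idea behind diagonalizing in the smaller dimension.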

SLIDE 27

The ade4 philosophy: The dudi class

Implementation in ade4

Computations are performed by the as.dudi function (diagonalization in the smaller dimension): (X, Q, D) → as.dudi → object of class dudi.

ade4 (class dudi)   Duality diagram (theory)   Content
tab                 X                          (transformed) data table
cw                  Q                          column weights, inner product used for the rows
lw                  D                          row weights, inner product used for the columns
eig                 Λ                          eigenvalues
l1                  K                          principal components
c1                  A                          principal axes
li                  L                          row scores
co                  C                          column scores
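A rough sketch of how the slots in this table relate to one another, assuming diagonal Q and D given as weight vectors. dudi_sketch is a hypothetical helper, not as.dudi: it only mimics the idea of solving the eigen-problem in whichever dimension (p or n) is smaller, and returns a plain list that reuses the slot names above.

```r
# Hypothetical helper (not ade4's as.dudi): diagonalize the triplet
# (tab, diag(cw), diag(lw)) in the smaller dimension and return a list
# whose component names follow the dudi slots.
dudi_sketch <- function(tab, cw, lw, nf = 2) {
  X <- as.matrix(tab)
  keep <- seq_len(nf)
  if (ncol(X) <= nrow(X)) {
    # p x p problem: Q^{1/2} X^T D X Q^{1/2} = V Lambda V^T
    W <- crossprod(X * sqrt(lw)) * tcrossprod(sqrt(cw))
    e <- eigen(W, symmetric = TRUE)
    lam <- e$values[keep]
    c1 <- e$vectors[, keep, drop = FALSE] / sqrt(cw)      # A: principal axes
    li <- X %*% (c1 * cw)                                  # L = X Q A
    l1 <- sweep(li, 2, sqrt(lam), "/")                     # K = L Lambda^{-1/2}
  } else {
    # n x n problem: D^{1/2} X Q X^T D^{1/2} = U Lambda U^T
    W <- tcrossprod(sweep(X, 2, sqrt(cw), "*")) * tcrossprod(sqrt(lw))
    e <- eigen(W, symmetric = TRUE)
    lam <- e$values[keep]
    l1 <- e$vectors[, keep, drop = FALSE] / sqrt(lw)      # K: principal components
    li <- sweep(l1, 2, sqrt(lam), "*")                     # L = K Lambda^{1/2}
    c1 <- sweep(crossprod(X, l1 * lw), 2, sqrt(lam), "/")  # A = X^T D K Lambda^{-1/2}
  }
  co <- crossprod(X, l1 * lw)                              # C = X^T D K
  list(tab = tab, cw = cw, lw = lw, eig = e$values,
       c1 = c1, l1 = l1, li = li, co = co)
}
```

With cw equal to 1 and lw equal to 1/n on a centred table, the eigenvalues are those of a non-scaled PCA.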

SLIDE 28

The ade4 philosophy: The dudi class

"Generic" functions of the dudi class

  • print.dudi: display a dudi object
  • is.dudi: test if an object is of class dudi
  • redo.dudi: recompute an analysis with a new number of axes
  • t.dudi: transpose a dudi, (X, Q, D) → (X^T, D, Q)
  • scatter.dudi / biplot.dudi: biplot
  • screeplot.dudi: barplot of eigenvalues
  • summary.dudi: main information related to an analysis
  • inertia.dudi: inertia statistics (absolute, relative = cos²)
  • dist.dudi: dudi-based distance among rows/columns
  • reconst: data approximation
  • suprow / supcol: projection of supplementary rows/columns
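Two of these operations reduce to matrix products once the triplet is solved: since X = L A^T, keeping the first k axes gives a rank-k approximation of the table (the idea behind reconst), and supplementary rows are projected as X_sup Q A (the idea behind suprow). The helpers below (solve_triplet, approx_rank_k, project_rows) are hypothetical base-R stand-ins, not the ade4 functions, and assume diagonal Q and D.

```r
# Hypothetical helpers, diagonal Q = diag(cw) and D = diag(lw) assumed.
solve_triplet <- function(X, cw, lw) {
  s <- svd(sweep(sweep(X, 2, sqrt(cw), "*"), 1, sqrt(lw), "*"))
  A <- s$v / sqrt(cw)                        # principal axes (Q-normed)
  list(A = A, L = X %*% (A * cw), eig = s$d^2, cw = cw)
}

# Rank-k approximation of the table: L_k A_k^T
approx_rank_k <- function(fit, k) {
  idx <- seq_len(k)
  fit$L[, idx, drop = FALSE] %*% t(fit$A[, idx, drop = FALSE])
}

# Scores of supplementary rows on the principal axes: X_sup Q A
project_rows <- function(fit, Xsup) {
  Xsup %*% (fit$A * fit$cw)
}
```

Supplementary rows have to be transformed like the active table (for instance centred with the active column means) before they are projected.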

SLIDE 29

The ade4 philosophy: The dudi class

User-level functions

The as.dudi function is an internal function. It is called by user-friendly functions corresponding to the different analyses, and it can be used by experienced users to define their own analyses.

> apropos("dudi.")
 [1] "dudi.acm"       "dudi.coa"       "dudi.dec"
 [4] "dudi.fca"       "dudi.fpca"      "dudi.hillsmith"
 [7] "dudi.mix"       "dudi.nsc"       "dudi.pca"
[10] "dudi.pco"

SLIDE 30

One-table analysis

Available methods

[Figure: one individuals × variables table]

Function name    Analysis name
dudi.pca         Principal component analysis
dudi.pco         Principal coordinate analysis
dudi.coa         Correspondence analysis
dudi.acm         Multiple correspondence analysis
dudi.dec         Decentered correspondence analysis
dudi.fca         Fuzzy correspondence analysis
dudi.fpca        Fuzzy PCA
dudi.mix         Mixed type analysis
dudi.hillsmith   Hill & Smith type analysis
dudi.nsc         Non-symmetric correspondence analysis

SLIDE 31

One-table analysis

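Principal Component Analysis: dudi.pca(df)

$X = [x_{ij} - \bar{x}_j]$, $Q = I_p$, $D = \frac{1}{n} I_n$ (centred PCA; with its default scale = TRUE, dudi.pca also standardizes each column)

Maximization of:

$$Q(a) = a^\top Q^\top X^\top D X Q a = \|XQa\|_D^2 = \operatorname{var}(XQa)$$

$$S(k) = k^\top D^\top X Q X^\top D k = \|X^\top D k\|_Q^2 = \sum_{j=1}^{p} \operatorname{cov}^2(k, x_j)$$

Demo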
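A base-R check of these two maximization statements, using the built-in iris measurements as an arbitrary example table. pca_triplet is a hypothetical helper that builds the centred triplet (X, I_p, I_n / n) and diagonalizes X^T D X; it does not reproduce the default scaling of dudi.pca.

```r
# Hypothetical helper: centred PCA as the triplet (X, I_p, I_n / n)
pca_triplet <- function(df) {
  X <- sweep(as.matrix(df), 2, colMeans(df))        # x_ij - mean of column j
  n <- nrow(X)
  e <- eigen(crossprod(X) / n, symmetric = TRUE)    # X^T D X Q with Q = I, D = I/n
  list(X = X, eig = e$values, A = e$vectors)
}

res <- pca_triplet(iris[, 1:4])
n <- nrow(res$X)
a1 <- res$A[, 1]                      # first principal axis
score <- drop(res$X %*% a1)           # XQa
k1 <- score / sqrt(res$eig[1])        # D-normed principal component
# sum over variables of squared covariances between k1 and each column
S_k1 <- sum(apply(res$X, 2, function(x) mean(k1 * x))^2)
```

The eigenvalues differ from those of prcomp by the factor (n - 1)/n, because D = I_n / n uses the divisor n.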

SLIDE 32

One-table analysis

Graphical functions

Function name    Type of graph
s.arrow          Factor map with arrows (projection of a vector basis)
s.chull          Factor map with convex hulls
s.class          Factor map with classes of points
s.corcircle      Factor map with correlation circle
s.distri         Factor map with frequency distribution
s.hist           Factor map with marginal histograms
s.image          Factor map with background image and contour curves
s.kde2d          Factor map with kernel density estimation
s.label          Factor map with labels
s.logo           Factor map with logos (pictures)
s.match          Factor map with paired coordinates
s.match.class    Factor map with paired coordinates and classes of points
s.multinom       Factor map with frequency profiles (genetics)
s.traject        Factor map with trajectories
s.value          Factor map with symbols proportional to some values
sco.boxplot      Boxplots on a score for a set of factors
sco.class        Labels on a score grouped by a factor
sco.distri       Mean & Std Dev for a weighted score
sco.gauss        Gauss curves on a score and a set of factors
sco.label        Labels on a score
sco.match        Labels on two scores
sco.quant        Relations between a score and quantitative variables

SLIDE 33

Two-table analysis

Available methods

[Figure: two tables sharing the same individuals, each with its own variables]

Function name   Analysis name
between         Between-class analysis
within          Within-class analysis
discrimin       Discriminant analysis
coinertia       Coinertia analysis
cca             Canonical correspondence analysis
pcaiv           PCA on Instrumental Variables
pcaivortho      Orthogonal PCAIV
procuste        Procrustes analysis
niche           Niche (OMI) analysis

SLIDE 34

Two-table analysis

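Principal component analysis on instrumental variables: pcaiv(dudi, df)

From the triplet $(X, Q, D)$ and $Z$, an $n \times q$ matrix of explanatory variables, the analysis is the triplet $(\hat{X}, Q, D)$ where:

$$\hat{X} = P_Z X = Z (Z^\top D Z)^{-1} Z^\top D X$$

Maximization of:

$$Q(a) = a^\top Q^\top \hat{X}^\top D \hat{X} Q a = \|\hat{X} Q a\|_D^2 = \operatorname{var}(\hat{X} Q a)$$

$$S(k) = k^\top D^\top \hat{X} Q \hat{X}^\top D k = \|\hat{X}^\top D k\|_Q^2 = \sum_{j=1}^{p} \operatorname{cov}^2(k, \hat{x}_j)$$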
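A base-R sketch of the projection step, assuming uniform row weights D = I_n / n and Q = I_p. pcaiv_sketch is a hypothetical name, not ade4's pcaiv. With these weights, P_Z X coincides with the fitted values of an ordinary least-squares regression of each column of X on Z, which the test checks against lm().

```r
# Hypothetical sketch: PCA of the D-orthogonal projection of X onto span(Z),
# with D = I_n / n and Q = I_p assumed.
pcaiv_sketch <- function(X, Z, nf = 2) {
  n <- nrow(X)
  D <- diag(1 / n, n)                                    # uniform row weights
  P <- Z %*% solve(t(Z) %*% D %*% Z, t(Z) %*% D)         # P_Z = Z (Z^T D Z)^{-1} Z^T D
  Xhat <- P %*% X                                        # projected table
  e <- eigen(t(Xhat) %*% D %*% Xhat, symmetric = TRUE)   # Xhat^T D Xhat Q with Q = I
  list(Xhat = Xhat, P = P, eig = e$values,
       A = e$vectors[, seq_len(nf), drop = FALSE])
}

set.seed(11)
n <- 20
Z <- cbind(1, matrix(rnorm(n * 2), n, 2))   # intercept + 2 explanatory variables
X <- matrix(rnorm(n * 4), n, 4)
X <- sweep(X, 2, colMeans(X))               # centred response table
res <- pcaiv_sketch(X, Z)
```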

SLIDE 35

Two-table analysis

Specific summary and plot functions:

library(ade4)
data(doubs)  # recent ade4: doubs$fish and doubs$env (doubs$poi and doubs$mil in 2011)
acp1 <- dudi.pca(doubs$fish, scannf = FALSE)      # PCA of the fish table
pcaiv1 <- pcaiv(acp1, doubs$env, scannf = FALSE)  # projected onto the environmental variables
plot(pcaiv1)

[Figure: output of plot(pcaiv1), with panels Loadings, Correlation, Inertia axes, Scores and predictions, Variables and Eigenvalues]

SLIDE 36

Two-table analysis

Co-inertia analysis: coinertia(dudiX, dudiY)

$(X, Q, D)$ and $(Y, R, D)$ (same rows, same row weights D) → $(X^\top D Y, R, Q)$

Maximization of:

$$\langle X^\top D Y R a \mid k \rangle_Q = k^\top Q X^\top D Y R a = \operatorname{cov}(YRa, XQk)$$

Demo
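A sketch of the co-inertia criterion with diagonal weights Q = diag(qw), R = diag(rw) and shared row weights D = diag(dw). coinertia_sketch is a hypothetical helper, not ade4's coinertia: it takes the first singular triplet of Q^{1/2} X^T D Y R^{1/2}, whose singular value is the maximal D-weighted covariance cov(YRa, XQk) over Q-normed k and R-normed a.

```r
# Hypothetical sketch of the first co-inertia axis pair (diagonal weights).
coinertia_sketch <- function(X, Y, qw, rw, dw) {
  M <- crossprod(X * dw, Y)                           # X^T D Y  (p x q)
  W <- sqrt(qw) * M * rep(sqrt(rw), each = nrow(M))   # Q^{1/2} X^T D Y R^{1/2}
  s <- svd(W)
  list(k = s$u[, 1] / sqrt(qw),                       # axis in R^p, k^T Q k = 1
       a = s$v[, 1] / sqrt(rw),                       # axis in R^q, a^T R a = 1
       cov_max = s$d[1])                              # maximal covariance
}

set.seed(5)
n <- 15
X <- matrix(rnorm(n * 3), n, 3); X <- sweep(X, 2, colMeans(X))
Y <- matrix(rnorm(n * 4), n, 4); Y <- sweep(Y, 2, colMeans(Y))
qw <- c(1, 0.5, 2); rw <- rep(1, 4); dw <- rep(1 / n, n)
res <- coinertia_sketch(X, Y, qw, rw, dw)

# D-weighted covariance between the scores XQk and YRa (columns are centred)
co_cov <- function(k, a) sum(dw * (X %*% (qw * k)) * (Y %*% (rw * a)))
```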

SLIDE 37

Other functions

K-table

[Figure: a series of K individuals × variables tables]

Function name   Analysis name
sepan           K-table separate analyses
pta             Partial triadic analysis
foucart         Foucart analysis
statis          STATIS analysis
mfa             Multiple factor analysis
mcoa            Multiple coinertia analysis
statico         Analysis of two K-tables

SLIDE 38

Other functions

K-table: class ktab

Series of tables are stored in objects of class ktab, which can be created using ktab.within, ktab.list.df, ktab.list.dudi or ktab.data.frame.

SLIDE 39

Other functions

Other methods

Function name      Analysis name
witwit.coa         Internal correspondence analysis
betweencoinertia   Between-class coinertia analysis
withincoinertia    Within-class coinertia analysis
rlq                RLQ analysis
dpcoa              Double principal coordinate analysis
multispati         Spatial data analysis

Demo

SLIDE 40

The ade-family: Other thematic packages

  • adehabitat: analysis of habitat selection by animals
  • adegenet: classes and methods for the multivariate analysis of genetic markers
  • adephylo: exploratory analyses for the phylogenetic comparative method

SLIDE 41

The ade-family: The graphical interface

ade4TkGUI: ade4 Tcl/Tk Graphical User Interface

Demo

SLIDE 42

The ade-family: Resources

Mailing list: http://listes.univ-lyon1.fr/wws/info/adelist

Web sites:
  • Development on R-Forge: https://r-forge.r-project.org/R/?group_id=199
  • Software: http://pbil.univ-lyon1.fr/ADE-4/
  • Courses: http://pbil.univ-lyon1.fr/R/enseignement.html (> 300 documents, > 4000 pages)
