Gradient Analysis NMDS Indirect Gradient Analysis NMDS Direct - - PowerPoint PPT Presentation

slide-1
SLIDE 1

Gradient Analysis

Multivariate Fundamentals: Rotation/Distance

[Figure: two ordination panels, "NMDS – Indirect Gradient Analysis" and "NMDS – Direct Gradient Analysis"]

slide-2
SLIDE 2

Objective: Use one dataset to explain another

Use the spatial patterns in each dataset to understand data structure and variation as gradients in variable space, across multiple levels

E.g. environmental factors, species populations and characteristics of communities

“Gradient Analysis” is an umbrella term which includes both rotation-based and distance-based techniques
All of them aim to determine “What in dataset A explains dataset B?”

slide-3
SLIDE 3

Gradient analysis

Data are separated into 2 distinct datasets that have a spatial link
Which set of variables is the response and which is the predictors is not always clear
You have to use some logic to decide which set of variables potentially influences the other

Predictor Variables: MAT, MAP, AHI, …
Response Variables: Sp1, Sp2, Sp3, …

slide-4
SLIDE 4

Direct gradient analysis

Ordinates the data according to the independent variable (e.g. climate) and then investigates how the dependent variables (e.g. plant species) correlate to the ordination scores.

[Figure: ordination with vectors for AHI, MAP, MAT, Sp1, Sp2 and Sp3]

Example: Species 1 and 2 are associated with greater MAT and less MAP (warm & dry)

slide-5
SLIDE 5

Indirect gradient analysis

Ordinates your dependent variables (e.g. community data) according to their similarity in species composition. The relationship between species frequencies and environmental gradients is then investigated in a second step by correlating the ordination scores with the environmental variables.

[Figure: ordination with vectors for AHI, MAP, MAT, Sp1, Sp2 and Sp3]

Example: Species 1 and 2 are associated with moderate MAT and moderate MAP (mild & moist)

slide-6
SLIDE 6

Direct vs Indirect gradient analysis

Any ordination technique can be extended to a gradient analysis
But the easiest way to look at both a direct and an indirect gradient analysis is to use a PCA (rotation) or NMDS (distance) plot and simply add a second set of vectors to infer relationships between datasets (we do this in Lab 7)

[Figure panels: NMDS – Indirect Gradient Analysis, NMDS – Direct Gradient Analysis]

Indirect: Lodgepole pine (PINUCON) is associated with environments with greater growing season precipitation (lnMSP)
Direct: Lodgepole pine is still associated with wetter summer environments, but so is white spruce (PICEGLA)
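The two-step logic of both approaches can be sketched in base R, here with PCA (`prcomp`) standing in for the ordination. This is only an illustration on simulated data: the site count, the effect sizes and the object names (`env`, `spp`, `indirect`, `direct`) are assumptions, not the data behind the slides.

```r
# Simulated data (hypothetical): 30 sites, 3 climate predictors, 3 species
set.seed(1)
n <- 30
env <- data.frame(MAT = rnorm(n), MAP = rnorm(n), AHI = rnorm(n))
spp <- data.frame(
  Sp1 = 2.0 * env$MAT - env$MAP + rnorm(n, sd = 0.5),
  Sp2 = 1.5 * env$MAT - env$MAP + rnorm(n, sd = 0.5),
  Sp3 = env$AHI + rnorm(n, sd = 0.5)
)

# Indirect: ordinate the response (species) first, then correlate
# the site scores with the predictors in a second step
ord_spp  <- prcomp(spp, scale. = TRUE)
indirect <- cor(env, ord_spp$x[, 1:2])

# Direct: ordinate the predictors (climate) first, then correlate
# the species with those site scores
ord_env <- prcomp(env, scale. = TRUE)
direct  <- cor(spp, ord_env$x[, 1:2])

print(round(indirect, 2))  # predictors vs. species-based axes
print(round(direct, 2))    # species vs. climate-based axes
```

Each matrix holds the correlations that would be drawn as the second set of vectors on the corresponding ordination plot.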

slide-7
SLIDE 7

Direct vs Indirect gradient analysis

Direct and indirect ordinations may detect different trends hidden in your data structure
By only considering one orientation you may miss an important pattern in your plant community data
  E.g. soil attributes or climate variables you did not measure
Therefore it is good to use both direct and indirect gradient analysis to get the full picture of relationships within your data
This will allow you to see the full data structure and correctly interpret environmental drivers and data responses
Sometimes in the literature: Indirect gradient analysis, Direct gradient analysis, Constrained gradient analysis

[Figure panels: Indirect gradient analysis, Direct gradient analysis]

slide-8
SLIDE 8

Constrained gradient analysis (CGA)

The goal of CGA is to utilize both datasets to infer (as in regression) patterns in species composition from patterns in environmental variables
CGA identifies which environmental variables are most important in structuring the community
  E.g. brings out pairs of variables between datasets that are highly associated with each other
It further describes how the environmental variables are related and how the community varies along these most important gradients
BUT: CGA loses all structure between predictor and response variables
You will just pull out the cross-correlated components from the datasets and ignore everything else
Not necessarily a problem – but it is something to be aware of
Canonical – something being optimized against some other constraint

slide-9
SLIDE 9

Canonical Correlation Analysis (CANCOR)

Rotation-based technique for constrained gradient analysis
Rotate both the predictor and response datasets independently to maximize correlation between corresponding variables among datasets
Once correlation is maximized between datasets on the first axes, the axis for each dataset is fixed and rotation for the second predictor/response variable is carried out
Repeated for all variables
You do not need to have the same number of response and predictor variables
Canonical functions (rotations) will be built for the smaller number of variables
Harold Hotelling (1895-1973)

slide-10
SLIDE 10

Canonical Correlation Analysis (CANCOR)

All CGA techniques are typically based on an underlying community model
CANCOR (statistical test) assumes that when variables are sampled over a sufficient range, responses will be linear or unimodal
Linearity is an important assumption for CANCOR

[Figure: three Response vs. Predictor panels]
Panel captions, in order:
  Golden
  Can fit linear curve BUT hard to fit – should transform response or predictor variables
  CANCOR (and others) will fail to detect a relationship. Use MRT (Lab 8)

slide-11
SLIDE 11

CANCOR in R

CANCOR in R:

cancor(predictorData, responseData)   (stats package, included with R)

library(CCA)
cc(predictorData, responseData)   (CCA package)

predictorData: data table of your predictor variables, e.g. Environmental Variables
responseData: data table of your response variables, e.g. Species Community Variables

To run cc() you need to install the CCA package; cancor() is part of base R
The cancor() function will display the correlation values
The cc() function outputs a number of statistics that we can query to provide some more information from the analysis AND use to test significance
Predictor data table and Response data table need to have the same number of rows BUT they do not need to have the same number of columns
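A minimal sketch of the call, using only base R's `cancor()` so it runs without extra packages. The data are simulated and the column names (`Env1`..`Env4`, `Spec1`..`Spec3`) are placeholders; the point is that the two tables share rows but not columns, and that the number of canonical correlations equals the smaller column count.

```r
# Simulated tables (hypothetical): 50 sites, 4 predictors, 3 responses
set.seed(42)
n <- 50
predictorData <- matrix(rnorm(n * 4), n, 4,
                        dimnames = list(NULL, paste0("Env", 1:4)))
responseData <- cbind(
  Spec1 =  predictorData[, "Env1"] + rnorm(n, sd = 0.3),
  Spec2 = -predictorData[, "Env3"] + rnorm(n, sd = 0.8),
  Spec3 =  rnorm(n)
)

fit <- cancor(predictorData, responseData)

print(round(fit$cor, 3))    # canonical correlations, largest first
print(round(fit$xcoef, 3))  # coefficients applied to the predictors
print(round(fit$ycoef, 3))  # coefficients applied to the responses
```

With 4 predictors and 3 responses, three canonical correlations are returned.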

slide-12
SLIDE 12

CANCOR in R

Correlations between the rotated axes
  The first correlation will be the maximum value and all successive correlations will be smaller
Estimates of the predictor and response coefficients from the rotation model (matrix algebra)
  The value used to adjust each predictor and response variable under rotation

slide-13
SLIDE 13

CANCOR in R

We can use the output from the cc function to individually test if the correlation values between our axes are significant
P-values test the hypothesis that the true correlation is equal to 0, i.e. there is no correlation
Therefore small p-values reject this hypothesis and there is a true significant correlation between the axes
Based on our CANCOR analysis of 3 predictor and 3 response variables, the correlations for rotation 1 (0.93) and rotation 2 (0.7) were found to be significant BUT the correlation for rotation 3 (0.12) was found to NOT be significant
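The slides do not show which test is applied to the `cc` output. One common choice is Bartlett's chi-square approximation to Wilks' lambda, sketched here in base R on simulated data. Note that this is a sequential test: row k asks whether canonical correlations k, k+1, ... are jointly zero, which is an assumption about the intended procedure rather than a reproduction of it.

```r
# Simulated data (hypothetical): 60 sites, 3 predictors, 3 responses
set.seed(7)
n <- 60
X <- matrix(rnorm(n * 3), n, 3)
Y <- cbind(X[, 1] + rnorm(n, sd = 0.4),
           X[, 2] + rnorm(n, sd = 1.0),
           rnorm(n))

r <- cancor(X, Y)$cor          # canonical correlations
p <- ncol(X); q <- ncol(Y); s <- length(r)

# H0 for row k + 1: canonical correlations k + 1, ..., s are all zero
bartlett <- t(sapply(0:(s - 1), function(k) {
  wilks <- prod(1 - r[(k + 1):s]^2)                    # Wilks' lambda
  chisq <- -(n - 1 - (p + q + 1) / 2) * log(wilks)     # Bartlett statistic
  df    <- (p - k) * (q - k)
  c(from_axis = k + 1, wilks = wilks, chisq = chisq, df = df,
    p_value = pchisq(chisq, df, lower.tail = FALSE))
}))

print(round(bartlett, 4))
```

A small p-value in a row rejects the hypothesis that the remaining correlations are all zero, which matches the interpretation given on the slide.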

slide-14
SLIDE 14

CANCOR in R

If the correlation for the canonical functions (rotated axes) is significant we can look at the loadings to see what each new rotated axis is related to in our original predictor and response variables
In our case we only have to look at Can1 and Can2
We now have to interpret the loading values for the predictor and response variables together (e.g. associate high loadings together)
Can1: When Env3 is rare, Spec2 and Spec3 have lower frequency (both negative)
  Spec2 and Spec3 prefer Env3 (reverse)
Can2: When Env1 is abundant, Spec1 has a higher frequency (both positive)
  Spec1 prefers Env1
Nothing really likes Env2

slide-15
SLIDE 15

Canonical Correspondence Analysis (CCA)

Rotation-based technique for constrained gradient analysis
Like CANCOR, CCA aims to maximize the correlations between response and predictor variables, BUT response scores are constrained to be linear combinations of predictor variables in an effort to maximize the variance explained by the predictor data in all ordination axes (e.g. CCA1, CCA2, etc.)
Multiple linear regression is used to solve the linear combinations of predictor variables
Categorical variables can be used in CCA – converted to “dummy” variables where each class is assigned a numeric value (should be addressed with caution)
CCA is considered an improvement over CANCOR in some fields
  CCA was developed for Ecology
Like CANCOR – linearity of the relationship between response and predictor variables is assumed
CCA may be able to detect some non-linear responses, BUT there are better techniques for that (MRT, RandomForest – Lab 8)

slide-16
SLIDE 16

CCA in R

CCA in R:

library(vegan)
cca(responseData ~ Predictor1 + Predictor2 + …, data = predictorData)   (vegan package)

To run CCA you need to install the vegan package

responseData: data table of your response variables, e.g. Species Community Variables
Predictor1 + Predictor2 + …: a linear equation including the predictor variables (e.g. Environmental Variables) that you feel are related to the response variable outputs (e.g. Species Occurrence)
  You can include as many predictors as you wish
  HOWEVER, the more predictors you include, the more complex the analysis and the lower the capacity to detect strong relationships (so pick your predictor variables mindfully)
predictorData: data table of your predictor variables, e.g. Environmental Variables

Predictor data table and Response data table need to have the same number of rows BUT they do not need to have the same number of columns

slide-17
SLIDE 17

CCA in R

We analyzed a model where we included 3 environmental variables to explain species frequency

Variance Explained:
  Total variance – total amount of variance in the response variables (e.g. species data)
  Constrained variance – how much is explained by the predictor variables (e.g. environmental data)
  Unconstrained variance – how much variance is left in the response variables (unexplained)

Eigenvalues – how much of the variance is explained by the individual axes of the ordination (you can plot these axes with the plot command – Lab 8)

You will have to figure out the % of variance explained by yourself
Simply divide value / total variance
  Constrained CCA1 = 0.005395/0.007953 = ~68%
  Constrained CCA2 = 0.000214/0.007953 = ~3%

Unconstrained defaults to a correspondence analysis (unconstrained)
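The percentages quoted here are plain ratios of each eigenvalue to the total variance. A small sketch using the values from this slide (the variable names are illustrative):

```r
# Values copied from the slide; names are illustrative only
total_variance <- 0.007953
eigenvalues    <- c(CCA1 = 0.005395, CCA2 = 0.000214)

# Percent of total variance carried by each constrained axis
pct_explained <- 100 * eigenvalues / total_variance
print(round(pct_explained))
```

The same division applies to any other axis reported in the output.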

slide-18
SLIDE 18

CCA in R

To determine if a significant relationship between our response and predictor variables exists we can run our CCA output through an ANOVA
Generic ANOVA tells us if a significant relationship between the response and predictor variables exists
anova() (dispatches to anova.cca in the vegan package)
  For the by option – selecting "term", p-values will be produced for each predictor term
  For the permu option – the number of permutations to use to generate the p-values
P-values test the hypothesis that the correlation between species variables and each environmental variable is 0
From the p-values, Env2 and Env3 are significantly associated with species occurrences

slide-19
SLIDE 19

CCA in R

From the image we can see:

Env3 appears positively associated with Spec3 and negatively associated with Spec2 and Spec1
Env2 appears positively associated with Spec3 and Spec1 and negatively associated with Spec2
Env1 appears negatively associated with Spec2 – but from ANOVA this is not a significant relationship

slide-20
SLIDE 20

Redundancy Analysis (RDA)

Rotation-based technique for constrained gradient analysis
The goal of RDA is to apply linear regression in order to find linear combinations of predictor variables that represent as much variance in the response variables as possible
CCA focuses more on species composition, i.e. relative abundance
  If you have a gradient along which all species are positively correlated, RDA will detect such a gradient while CCA will not
With RDA, it is possible to use 'species' that are measured in different units, BUT in this case the data must be centered and standardized
RDA can be useful when gradients are short or you are conducting a short-term experimental study
Like CCA, categorical variables can be used in RDA – converted to “dummy” variables where each class is assigned a numeric value (should be addressed with caution)
Like CANCOR and CCA – linearity of the relationship between response and predictor variables is assumed
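The idea of regressing the responses on the predictors and then ordinating the result can be sketched by hand in base R. This is not the vegan implementation, only an illustration of the variance partition on simulated data; the variable names and effect sizes are assumptions.

```r
# Simulated data (hypothetical): 40 sites, 3 predictors, 4 species
set.seed(3)
n <- 40
env <- data.frame(Depth = rnorm(n), Sand = rnorm(n), Coral = rnorm(n))
Y <- cbind(
  Sp1 = env$Depth - env$Sand + rnorm(n, sd = 0.5),
  Sp2 = env$Depth - env$Sand + rnorm(n, sd = 0.5),
  Sp3 = env$Coral - env$Sand + rnorm(n, sd = 0.5),
  Sp4 = env$Coral - env$Sand + rnorm(n, sd = 0.5)
)

# Step 1: linear regression of every species on the predictors
fit   <- lm(Y ~ Depth + Sand + Coral, data = env)
Y_fit <- fitted(fit)       # part of Y explained by the predictors
Y_res <- residuals(fit)    # part of Y left unexplained

# Step 2: PCA of the fitted values gives the constrained axes,
# PCA of the residuals gives the unconstrained axes
con_eig <- prcomp(Y_fit)$sdev^2
unc_eig <- prcomp(Y_res)$sdev^2

total         <- sum(apply(Y, 2, var))
constrained   <- sum(con_eig)
unconstrained <- sum(unc_eig)

print(round(c(total = total, constrained = constrained,
              unconstrained = unconstrained), 3))
print(round(100 * con_eig / total, 1))  # % of total variance per constrained axis
```

Constrained plus unconstrained variance adds up to the total, and there can be no more constrained axes with non-zero variance than there are predictors. For responses in different units, standardizing the columns of `Y` first (e.g. with `scale(Y)`) would correspond to the centering and standardizing mentioned above.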

slide-21
SLIDE 21

RDA in R

RDA in R:

library(vegan)
rda(responseData ~ Predictor1 + Predictor2 + …, data = predictorData)   (vegan package)

To run RDA you need to install the vegan package

responseData: data table of your response variables, e.g. Species Community Variables
Predictor1 + Predictor2 + …: a linear equation including the predictor variables (e.g. Environmental Variables) that you feel are related to the response variable outputs (e.g. Species Occurrence)
predictorData: data table of your predictor variables, e.g. Environmental Variables

Predictor data table and Response data table need to have the same number of rows BUT they do not need to have the same number of columns
With RDA it is possible to use response variables that are measured in different units BUT in this case the dependent data must be centered and standardized before executing the analysis
  To do this you can specify the option scale=TRUE (default is FALSE)

slide-22
SLIDE 22

RDA in R

We analyzed a model where we included 4 environmental variables to explain species frequency for 6 species

Variance Explained:
  Total variance – total amount of variance in the response variables (e.g. species data)
  Constrained variance – how much is explained by the predictor variables (e.g. environmental data)
  Unconstrained variance – how much variance is left in the response variables (unexplained)

Eigenvalues – how much of the variance is explained by the individual axes of the ordination (you can plot these axes with the plot command – Lab 8)

You will have to figure out the % of variance explained by yourself
Simply divide value / total variance
  Constrained RDA1 = 74.52/112.88889 = ~66%
  Constrained RDA2 = 24.94/112.88889 = ~22%
  Constrained RDA3 = 8.88/112.88889 = ~8%

Unconstrained defaults to a PCA analysis (unconstrained)

slide-23
SLIDE 23

RDA in R

To determine if a significant relationship between our response and predictor variables exists we can run our RDA output through an ANOVA (like CCA)
Generic ANOVA tells us if a significant relationship between the response and predictor variables exists
anova() (dispatches to anova.cca in the vegan package)
  For the by option – selecting "term", p-values will be produced for each predictor term
  For the permu option – the number of permutations to use to generate the p-values
P-values test the hypothesis that the correlation between species variables and each environmental variable is 0
From the p-values, Depth, Sand, and Coral are all significantly associated with fish occurrences
“Other substrate” was removed from the analysis due to collinearity (last column entered)
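How permutations generate a p-value can be illustrated with a hand-rolled overall test in base R. This sketch is not `anova.cca`: it only tests all predictors together (no per-term p-values), uses a pseudo-F ratio of explained to unexplained variance, and shuffles the rows of the response table. The data, names and the choice of 999 permutations are assumptions.

```r
# Simulated data (hypothetical): 40 sites, 3 predictors, 3 species
set.seed(11)
n <- 40
env <- data.frame(Depth = rnorm(n), Sand = rnorm(n), Coral = rnorm(n))
Y <- cbind(Sp1 = env$Depth - env$Sand + rnorm(n),
           Sp2 = env$Coral - env$Sand + rnorm(n),
           Sp3 = rnorm(n))
X <- as.matrix(env)

# Pseudo-F: explained variance per predictor over residual variance per df
pseudo_F <- function(Y, X) {
  fit    <- lm.fit(cbind(1, X), Y)
  ss_con <- sum(scale(fit$fitted.values, scale = FALSE)^2)
  ss_res <- sum(fit$residuals^2)
  q      <- ncol(X)
  (ss_con / q) / (ss_res / (nrow(Y) - q - 1))
}

F_obs  <- pseudo_F(Y, X)
n_perm <- 999

# Shuffling the rows of Y breaks any link between species and environment
F_perm <- replicate(n_perm, pseudo_F(Y[sample(n), , drop = FALSE], X))

# Share of permuted statistics at least as large as the observed one
p_value <- (sum(F_perm >= F_obs) + 1) / (n_perm + 1)
print(c(F = F_obs, p = p_value))
```

More permutations give a finer-grained p-value; with 999 the smallest value that can be reported is 0.001.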

slide-24
SLIDE 24

RDA in R

From the image we can see:

All species were found to dislike environments characterized by Sand (i.e. all species respond the same across an environmental gradient)
Sp3 and Sp4 are associated with Coral environments
Sp1 and Sp2 are associated with environments with greater water Depth