Using distances to address the challenges of heterogeneous data. Susan Holmes, http://www-stat.stanford.edu/susan/, Bio-X and Statistics, Stanford University. July 29, 2015.


  1. Principal Component Analysis: Dimension Reduction. PCA seeks to replace the original (centered) matrix $X$ by a matrix of lower rank. This can be solved using the singular value decomposition of $X$: $X = USV'$, with $U'DU = I_n$, $V'QV = I_p$ and $S$ diagonal, so that $XX' = US^2U'$, with $U'DU = I_n$ and $S^2 = \Lambda$. PCA is a linear nonparametric multivariate method for dimension reduction. $D$ and $Q$ are the relevant metrics on the dual row and column spaces of $n$ samples and $p$ variables.

  2. A Commutative Diagram Approach (Caillez and Pages, 1976; Escoufier, 1977). Statisticians search for approximations with certain properties. For the case of PCA, for instance, we rephrase the problem as follows: ▶ $Q$ can be seen as a linear function from $\mathbb{R}^p$ to $\mathbb{R}^{p*} = \mathcal{L}(\mathbb{R}^p)$, the space of scalar linear functions on $\mathbb{R}^p$. ▶ $D$ can be seen as a linear function from $\mathbb{R}^n$ to $\mathbb{R}^{n*} = \mathcal{L}(\mathbb{R}^n)$. ▶ The duality diagram is
  $$\begin{array}{ccc} \mathbb{R}^{p*} & \xrightarrow{\;X\;} & \mathbb{R}^{n} \\ Q\,\big\uparrow\ \big\downarrow\,V & & D\,\big\downarrow\ \big\uparrow\,W \\ \mathbb{R}^{p} & \xleftarrow{\;X^t\;} & \mathbb{R}^{n*} \end{array} \qquad V = X^t D X, \quad W = X Q X^t.$$
  This duality gives `transposable' data.

  3. Properties of the Diagram. Rank of the diagram: $X$, $X^t$, $VQ$ and $WD$ all have the same rank. For $Q$ and $D$ symmetric matrices, $VQ$ and $WD$ are diagonalisable and have the same eigenvalues, $\lambda_1 \geq \lambda_2 \geq \lambda_3 \geq \ldots \geq \lambda_r \geq 0 \geq \cdots \geq 0$. Eigendecomposition of the diagram: $VQ$ is $Q$-symmetric, thus we can find $Z$ such that $VQZ = Z\Lambda$, $Z^t Q Z = I_p$, where $\Lambda = \mathrm{diag}(\lambda_1, \lambda_2, \ldots, \lambda_p)$. (1) Modern extensions to this approach include kernel methods in machine learning.

  4. Predicting and Summarizing through distances

  5. Comparing Two Diagrams: the RV coefficient. Many problems can be rephrased in terms of the comparison of two ``duality diagrams'' or, put more simply, two characterizing operators built from two ``triplets'', usually with one of the triplets being a response or having constraints imposed on it. Most often one compares two such diagrams and tries to get one to match the other in some optimal way ($O = WD$). To compare two symmetric operators, there is either a vector covariance, used as an inner product, $\mathrm{covV}(O_1, O_2) = \mathrm{Tr}(O_1^t O_2) = \langle O_1, O_2 \rangle$, or a vector correlation (Escoufier, 1977),
  $$RV(O_1, O_2) = \frac{\mathrm{Tr}(O_1^t O_2)}{\sqrt{\mathrm{Tr}(O_1^t O_1)\,\mathrm{Tr}(O_2^t O_2)}}.$$
  If we were to compare the two triplets $\left(X_{n \times 1}, 1, \frac{1}{n} I_n\right)$ and $\left(Y_{n \times 1}, 1, \frac{1}{n} I_n\right)$ we would have $RV = \rho^2$.
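The identity RV = ρ² for two single-variable triplets can be checked numerically. The following is a minimal sketch in plain Python, assuming centered vectors and operators O = WD = x xᵗ (1/n); the helper names (`operator`, `frobenius_inner`, `rv`) and the toy numbers are illustrative and do not come from the slides.

```python
import math

def center(values):
    """Subtract the mean so the vector is centered."""
    m = sum(values) / len(values)
    return [v - m for v in values]

def operator(x):
    """O = W D for the triplet (x, 1, (1/n) I_n): W = x x^t, D = I_n / n."""
    n = len(x)
    return [[x[i] * x[j] / n for j in range(n)] for i in range(n)]

def frobenius_inner(o1, o2):
    """Tr(O1^t O2), computed as the sum of elementwise products."""
    return sum(a * b for r1, r2 in zip(o1, o2) for a, b in zip(r1, r2))

def rv(o1, o2):
    """RV coefficient: vector correlation between two operators."""
    return frobenius_inner(o1, o2) / math.sqrt(
        frobenius_inner(o1, o1) * frobenius_inner(o2, o2)
    )

# Toy data (hypothetical values chosen only for illustration).
x = center([1.0, 2.0, 4.0, 7.0])
y = center([2.0, 1.0, 5.0, 6.0])

# Ordinary Pearson correlation between x and y.
rho = sum(a * b for a, b in zip(x, y)) / math.sqrt(
    sum(a * a for a in x) * sum(b * b for b in y)
)
rv_xy = rv(operator(x), operator(y))
print(rv_xy, rho ** 2)
```

The two printed numbers agree up to floating-point error, since Tr(O₁ᵗO₂) reduces to (x·y)²/n² in this rank-one case.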

  6. Part II. Dimension Reduction: the Euclidean embedding workhorse: MDS

  7. Metric Multidimensional Scaling. Schoenberg (1935)

  8. From Coordinates to Distances and Back. If we start with original data $Y$ in $\mathbb{R}^p$ that are not centered, we apply the centering matrix $H = I - \frac{1}{n} \mathbf{1}\mathbf{1}'$, with $\mathbf{1}' = (1, 1, \ldots, 1)$, to obtain $X = HY$. Call $B = XX'$. If $D^{(2)}$ is the matrix of squared distances between rows of $X$ in the Euclidean coordinates, we can show that $-\frac{1}{2} H D^{(2)} H = B$. Schoenberg's result (exact Euclidean distance): if $B$ is positive semi-definite then $D$ can be seen as a distance between points in a Euclidean space.
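The double-centering identity −½ H D⁽²⁾ H = XX′ is easy to verify on a small example. This is a sketch in plain Python under the assumption that the points are rows of a small hypothetical matrix `Y`; the function names are illustrative, not taken from the talk.

```python
def center_columns(y):
    """X = H Y: subtract each column mean from the rows of Y."""
    n, p = len(y), len(y[0])
    means = [sum(row[k] for row in y) / n for k in range(p)]
    return [[row[k] - means[k] for k in range(p)] for row in y]

def gram(x):
    """B = X X': inner products between rows of X."""
    return [[sum(a * b for a, b in zip(ri, rj)) for rj in x] for ri in x]

def squared_distances(y):
    """D^(2): squared Euclidean distances between rows."""
    return [[sum((a - b) ** 2 for a, b in zip(ri, rj)) for rj in y] for ri in y]

def double_center(d2):
    """-1/2 H D^(2) H, written out entrywise with row, column and grand means."""
    n = len(d2)
    row_mean = [sum(r) / n for r in d2]
    col_mean = [sum(d2[i][j] for i in range(n)) / n for j in range(n)]
    grand = sum(row_mean) / n
    return [
        [-0.5 * (d2[i][j] - row_mean[i] - col_mean[j] + grand) for j in range(n)]
        for i in range(n)
    ]

# Hypothetical uncentered data: 4 points in R^2.
Y = [[1.0, 2.0], [3.0, 0.0], [4.0, 5.0], [0.0, 1.0]]

B_from_coords = gram(center_columns(Y))
B_from_dists = double_center(squared_distances(Y))
max_gap = max(
    abs(B_from_coords[i][j] - B_from_dists[i][j])
    for i in range(len(Y))
    for j in range(len(Y))
)
print(max_gap)
```

The distances are computed from `Y` rather than `X` because centering is a translation and leaves pairwise distances unchanged.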

  9. Reverse engineering a Euclidean embedding. We can go backwards from a matrix $D$ to $X$ by taking the eigendecomposition of $B = -\frac{1}{2} H D^{(2)} H$, in much the same way that PCA provides the best rank $r$ approximation for data by taking the singular value decomposition of $X$, or the eigendecomposition of $XX'$: $X^{(r)} = U S^{(r)} V'$ with $S^{(r)} = \mathrm{diag}(s_1, s_2, \ldots, s_r, 0, \ldots, 0)$.

  10. Multidimensional Scaling (MDS). Simple classical multidimensional scaling: ▶ Square $D$ elementwise: $D^{(2)} = D^2$. ▶ Compute $B = -\frac{1}{2} H D^{2} H$. ▶ Diagonalize $B$ to find the principal coordinates $SV'$. ▶ Choose a number of dimensions by inspecting the screeplot of the eigenvalues. The advantage is that the original distances don't have to be Euclidean.
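The steps above can be sketched in plain Python. To stay dependency-free, this sketch swaps the full diagonalization for power iteration and therefore recovers only the first principal coordinate; a linear algebra library's symmetric eigensolver would return all of them. The function name and the toy distance matrix (four points on a line) are assumptions for illustration.

```python
import math

def first_principal_coordinate(dist, iterations=100):
    """Classical MDS restricted to the leading axis.

    Returns (eigenvalue, coordinates) for the top eigenpair of
    B = -1/2 H D^2 H, found by power iteration.
    """
    n = len(dist)
    # Step 1: square the distances elementwise.
    d2 = [[dist[i][j] ** 2 for j in range(n)] for i in range(n)]
    # Step 2: double centering (d2 is symmetric, so row means = column means).
    row_mean = [sum(r) / n for r in d2]
    grand = sum(row_mean) / n
    b = [
        [-0.5 * (d2[i][j] - row_mean[i] - row_mean[j] + grand) for j in range(n)]
        for i in range(n)
    ]
    # Step 3: power iteration for the leading eigenvector of B.
    v = [float(i + 1) for i in range(n)]
    for _ in range(iterations):
        w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient gives the eigenvalue (v has unit length).
    lam = sum(v[i] * sum(b[i][j] * v[j] for j in range(n)) for i in range(n))
    # Principal coordinate = sqrt(eigenvalue) * eigenvector.
    return lam, [math.sqrt(lam) * x for x in v]

# Toy example: pairwise distances between points at 0, 1, 3, 6 on a line.
positions = [0.0, 1.0, 3.0, 6.0]
D = [[abs(a - b) for b in positions] for a in positions]

eigenvalue, coords = first_principal_coordinate(D)
recovered = [[abs(a - b) for b in coords] for a in coords]
print(coords)
```

Because the toy points are exactly one-dimensional, the single recovered coordinate reproduces every original distance; with real data one would inspect the remaining eigenvalues before choosing the number of axes.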

  11. Taking Categorical Data and Making it into a Continuum. Horseshoe example: joint work with Persi Diaconis and Sharad Goel (Annals of Applied Statistics, 2008). Data from 2005 U.S. House of Representatives roll call votes. We further restricted our analysis to the 401 Representatives who voted on at least 90% of the roll calls (220 Republicans, 180 Democrats and 1 Independent), leading to a 401 × 669 matrix of voting data. The data:
         V1  V2  V3  V4  V5  V6  V7  V8  V9 V10
    R1   -1  -1   1  -1   0   1   1   1   1   1 ...
    R2   -1  -1   1  -1   0   1   1   1   1   1 ...
    R3    1   1  -1   1  -1   1   1  -1  -1  -1 ...
    R4    1   1  -1   1  -1   1   1  -1  -1  -1 ...
    R5    1   1  -1   1  -1   1   1  -1  -1  -1 ...
    R6   -1  -1   1  -1   0   1   1   1   1   1 ...
    R7   -1  -1   1  -1  -1   1   1   1   1   1 ...
    R8   -1  -1   1  -1   0   1   1   1   1   1 ...
    R9    1   1  -1   1  -1   1   1  -1  -1  -1 ...
    R10  -1  -1   1  -1   0   1   1   0   0   0 ...

  12. $L_1$ distance. We define a distance between legislators as $\hat{d}(l_i, l_j) = \frac{1}{669} \sum_{k=1}^{669} |v_{ik} - v_{jk}|$. Roughly, $\hat{d}(l_i, l_j)$ is the percentage of roll calls on which legislators $l_i$ and $l_j$ disagreed.
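As a small illustration of this distance, the sketch below applies it to the first ten votes of three Representatives as shown in the data excerpt, so the average runs over 10 roll calls rather than 669. The function name `l1_distance` is illustrative.

```python
def l1_distance(votes_i, votes_j):
    """Average absolute difference between two voting records."""
    if len(votes_i) != len(votes_j):
        raise ValueError("voting records must cover the same roll calls")
    return sum(abs(a - b) for a, b in zip(votes_i, votes_j)) / len(votes_i)

# First ten votes (V1..V10) of R1, R2 and R3 from the data excerpt.
r1 = [-1, -1, 1, -1, 0, 1, 1, 1, 1, 1]
r2 = [-1, -1, 1, -1, 0, 1, 1, 1, 1, 1]
r3 = [1, 1, -1, 1, -1, 1, 1, -1, -1, -1]

d12 = l1_distance(r1, r2)  # identical records
d13 = l1_distance(r1, r3)  # opposite votes on most roll calls
print(d12, d13)
```

With votes coded as +1, −1 and 0, a pair voting on opposite sides contributes 2 to the sum, which is why the value is only roughly a disagreement percentage and can exceed 1.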

  13. Figure: 3-dimensional MDS mapping of legislators based on the 2005 U.S. House of Representatives roll call votes. We used dissimilarity indices $1 - \exp(-\lambda\, d(R_1, R_2))$.

  14. Figure: 3-dimensional MDS mapping of legislators based on the 2005 U.S. House of Representatives roll call votes. Color has been added to indicate the party affiliation of each representative.

  15. Figure: comparison of the MDS-derived rank for Representatives with the National Journal's liberal score (x-axis: MDS Rank; y-axis: National Journal Score).

  16. An Application: Visualizing Geodesic Distances between Trees. ▶ Nearest Neighbor Interchange (NNI): rotation moves (figure: trees on leaves 0, 1, 2, 3, 4). ▶ Fill-in of NNI moves: Billera, Holmes, Vogtmann (2001) (BHV). The boundaries between regions represent an area of uncertainty about the exact branching order. In biological terminology this is called an `unresolved' tree.

  20. A Cone Path. A path between two trees $T$ and $T'$ always exists. Since all orthants connect at the origin, any two trees $T$ and $T'$ can be connected by a two-segment path; this is called the cone path.

  21. Theorem (Billera, Holmes, Vogtmann (BHV), 2001): Tree space with the BHV metric is a CAT(0) space, that is, it has non-positive curvature. This implies there are geodesics between any two trees (Gromov). Note: this space of trees is not a Euclidean space.

  22. The size of the ``pseudo-variance'' can be estimated from $\sum_i p_i\, d(T_0, T_i)^2$. Properties of the Fréchet mean of a set of trees have been studied (Bhattacharya et al., 2010; Miller, Mattingley, Owen, Marron, et al., 2013).

  23. Phylogenetic Trees. Malaria data as seen using ape (figure: tree with tips Pga11, Plo6, Pvi10, Pcy9, Pkn8, Pfr7, Pbe5, Pma3, Pfa4, Pre1, Pme2).

  24. Sampling Distribution for Trees (figure: the data and trees labeled 1, 2, 3).

  25. Figure: the data and trees 1, 2, 3 placed in treespace $T_n$.

  26. True Sampling Distribution (figure: the data and trees 1, 2, 3, 4).

  27. Bootstrap Sampling Distribution (nonparametric) (figure: the data and bootstrap trees *1, *2, *3, *4).

  28. Bootstrap of Malaria Data

  29. Hierarchical Clustering Trees (figure: clustered heatmap with gene names as row labels and sample labels of the form HEA25_EFFE_3, MEL53_NAI_3, MEL51_MEM_5 as column labels).

  30. Eigenvalues of MDS for bootstrapped trees (figure: screeplot of the eigenvalues).

  31. Bootstrapped trees (figure: two-dimensional MDS plot with each bootstrapped tree labeled by its index, 1 to 100).

  32. Part III. Combine and Compare Trees, Graphs and Contingent Count Data

  33. Layers of Data in the Microbiome. Joshua Lederberg: `the ecological community of commensal, symbiotic, and pathogenic microorganisms that literally share our body space and have been all but ignored as determinants of health and disease'. Microbiome: complete collection of genes contained in the genomes of microbes living in a given environment. Numbers: humans shelter 100 trillion microbes ($10^{14}$); we are made of $10 \times 10^{12}$ cells. Metagenome: composition of all genes present in an environment (soil, gut, seawater), regardless of species. Transcriptome: the mRNA transcripts in the cell; it reflects the genes that are being actively expressed at any given time. Metabolome: the metabolites (small molecules such as nucleic or fatty acids, sugars, ...) present in the sample, either endogenous or exogenous (medication, pollution).

  34. Figure. Source: YK Lee and SK Mazmanian, Science, 2010.

  35. Bacteria etc... and Us. The human microbiome or human microbiota is the assemblage of microorganisms that reside on the surface and in deep layers of skin, in the saliva and oral mucosa, in the conjunctiva, and in the gastrointestinal tracts. ▶ They include bacteria, fungi, and archaea. ▶ Some of these organisms perform tasks that are useful for the human host (they live in symbiosis). ▶ The majority have no known beneficial or harmful effect.

  36. Human Microbiome: What are the data? DNA: the genomic material present (16S rRNA gene especially, but also shotgun). RNA: what genes are being turned on (gene expression), transcriptomics. Mass Spec: specific signatures of chemical compounds present (LC/MS, GC/MS). Clinical: multivariate information about patients' clinical status, medication, weight. Environmental: location, nutrition, drugs, chemicals, temperature, time. Domain Knowledge: metabolic networks, phylogenetic trees, gene ontologies.

  37. Heterogeneous Data Objects. Object oriented input and data manipulation with phyloseq (McMurdie and Holmes, 2013, PLoS ONE). Object oriented data in R. Component data objects: ▶ OTU Abundance: class otuTable (a matrix); slots: .Data, speciesAreRows. ▶ Sample Variables: class sampleData (a data.frame); slots: .Data, names, row.names, .S3Class. ▶ Taxonomy Table: class taxonomyTable (a matrix); slots: .Data. ▶ Phylogenetic Tree: class phylo; slots: see the ape package. Experiment-level data object: class phyloseq; slots: otuTable, sampleData, taxTab, tre.

  38. Part IV. Heteroscedasticity: Mixtures and how to Normalize them. Source: xkcd.

  39. Points are measured with unequal variance (figure: points $x_{\cdot 1}, x_{\cdot 2}, \ldots, x_{\cdot p}$ and $x_{1 \cdot}, x_{2 \cdot}, \ldots, x_{n \cdot}$ drawn with different spreads).

  40. Some real data (Caporaso et al., 2011)
    > GlobalPatterns
    phyloseq-class experiment-level object
    otu_table()   OTU Table:         [ 19216 taxa and 26 samples ]
    sample_data() Sample Data:       [ 26 samples by 7 sample variables ]
    tax_table()   Taxonomy Table:    [ 19216 taxa by 7 taxonomic ranks ]
    phy_tree()    Phylogenetic Tree: [ 19216 tips and 19215 internal nodes ]
    > sample_sums(GlobalPatterns)
        CL3     CC1     SV1 M31Fcsw M11Fcsw M31Plmr M11Plmr F21Plmr
     864077 1135457  697509 1543451 2076476  718943  433894  186297
    .....
        NP3     NP5 TRRsed1 TRRsed2 TRRsed3    TS28    TS29   Even1
    1478965 1652754   58688  493126  279704  937466 1211071 1216137
    > summary(sample_sums(GlobalPatterns))
       Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
      58690  567100 1107000 1085000 1527000 2357000

  42. Equalization of variances. In this binomial example the variance of the proportion estimate is $\mathrm{Var}\left(\frac{X}{n}\right) = \frac{pq}{n} = \frac{q}{n} E\left(\frac{X}{n}\right)$, a function of the mean. This is a common occurrence and one that is traditionally dealt with in statistics by applying variance-stabilizing transformations. However, in order to find the right transformation, we need a good model for the error.

  43. Variance Stabilization. We prefer to deal with errors across samples that are independent and identically distributed, in particular with homoscedasticity (equal variances) across all the noise levels. This is not the case when we have unequal sample sizes and variations in the accuracy across instruments. A standard way of dealing with heteroscedastic noise is to try to decompose the sources of heterogeneity and apply transformations that make the noise variance almost constant. These are called variance-stabilizing transformations.
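To make the idea concrete for the binomial proportion, the sketch below computes variances exactly from the binomial probability mass function. It uses the arcsine square-root transform as the stabilizing map; that particular transform is a classical textbook choice and an assumption here, since the slides do not name one. The sample size `n = 400` and the function names are illustrative.

```python
import math

def binomial_pmf(n, p):
    """Exact Binomial(n, p) probabilities for k = 0..n."""
    return [math.comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(n + 1)]

def exact_variance(transform, n, p):
    """Variance of transform(X / n) when X ~ Binomial(n, p)."""
    pmf = binomial_pmf(n, p)
    values = [transform(k / n) for k in range(n + 1)]
    mean = sum(w * v for w, v in zip(pmf, values))
    return sum(w * (v - mean) ** 2 for w, v in zip(pmf, values))

def identity(x):
    return x

def arcsine_sqrt(x):
    """Classical variance-stabilizing transform for proportions (assumed here)."""
    return math.asin(math.sqrt(x))

n = 400
raw = {}
stabilized = {}
for p in (0.3, 0.5, 0.7):
    raw[p] = exact_variance(identity, n, p)             # equals p * q / n
    stabilized[p] = exact_variance(arcsine_sqrt, n, p)  # close to 1 / (4 n)
    print(p, raw[p], stabilized[p])
```

The raw variances change with `p`, while the transformed variances all sit near 1/(4n), which is what "almost constant noise variance" means in this setting.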

  44. Mixture Modeling works Miracles. ▶ Beta-Binomial (deepSNV). ▶ Zero-inflated Poisson or Gaussian. ▶ Gamma-Poisson. Mixtures are ubiquitous because of a mathematical theorem: de Finetti's theorem.

  45. Wolfgang Huber

  46. Correct transformations are available. McMurdie and Holmes (2014), ``Waste Not, Want Not: Why rarefying microbiome data is inadmissible'', PLOS Computational Biology, Methods. We propose to model the read counts. If technical replicates have the same number of reads $s_j$, there is Poisson variation with mean $\mu = s_j u_i$, where $u_i$ is the incidence proportion of taxon $i$. The number of reads for sample $j$ and taxon $i$ would be $K_{ij} \sim \mathrm{Poisson}(s_j u_i)$.

  47. A distance on the known tree. Monge-Kantorovich earth mover's distance on the tree. Used to compare two samples or body sites, for instance. Incorporates taxa abundances and the phylogenetic tree (figure: phylogenetic tree with genus names at the tips, point size showing Abundance on a 1, 25, 625 scale and color showing Class). Duality diagram methods can use any dependency structure.

  48. The UniFrac distance (Lozupone and Knight, 2005) is a distance between groups of organisms that are related to each other by a tree. Suppose we have the OTUs present in sample 1 (blue) and in sample 2 (red). Question: do the two samples differ phylogenetically? The distance is defined as the ratio of the sum of the lengths of the branches leading to members of group A or members of group B, but not both, to the total branch length of the tree.
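This definition can be sketched on a toy tree. In the example below each branch is stored as a pair (length, set of descendant tips); this representation, the tip names `t1`..`t4` and the function name are hypothetical, and the denominator follows the wording of the slide (total branch length of the whole tree).

```python
def unifrac(branches, tips_a, tips_b):
    """Unweighted UniFrac: unshared branch length over total branch length.

    branches: list of (length, descendant_tips) pairs.
    tips_a, tips_b: sets of tips (OTUs) present in each group.
    """
    unshared = 0.0
    total = 0.0
    for length, descendants in branches:
        in_a = bool(descendants & tips_a)
        in_b = bool(descendants & tips_b)
        total += length
        if in_a != in_b:  # branch leads to exactly one of the two groups
            unshared += length
    return unshared / total

# Toy tree ((t1, t2), (t3, t4)) with hypothetical branch lengths.
toy_branches = [
    (1.0, {"t1"}),
    (1.0, {"t2"}),
    (0.5, {"t1", "t2"}),
    (2.0, {"t3"}),
    (1.0, {"t4"}),
    (0.5, {"t3", "t4"}),
]
group_a = {"t1", "t2", "t3"}
group_b = {"t3", "t4"}

u_ab = unifrac(toy_branches, group_a, group_b)
u_aa = unifrac(toy_branches, group_a, group_a)
print(u_ab, u_aa)
```

Here the branches above `t1`, `t2`, their common ancestor and `t4` are unshared (total length 3.5 out of 6.0), while the branch above `t3` and its parent are shared by both groups.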

  49. Weighted UniFrac distance. A modification of UniFrac, weighted UniFrac is defined in (Lozupone et al., 2007) as $\sum_{i=1}^{n} b_i \times \left| \frac{A_i}{A_T} - \frac{B_i}{B_T} \right|$, where ▶ $n$ = number of branches in the tree ▶ $b_i$ = length of the $i$th branch ▶ $A_i$ = number of descendants of the $i$th branch in group A ▶ $A_T$ = total number of sequences in group A, with $B_i$ and $B_T$ defined in the same way for group B.
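The weighted formula can be evaluated on a toy tree as follows. As before, this is only a sketch: the (length, descendant tips) representation, the tip names and the sequence counts are hypothetical values chosen so the arithmetic is easy to follow by hand.

```python
def weighted_unifrac(branches, counts_a, counts_b):
    """Sum over branches of b_i * |A_i / A_T - B_i / B_T|.

    branches: list of (length, descendant_tips) pairs.
    counts_a, counts_b: dicts mapping tip name -> number of sequences.
    """
    total_a = sum(counts_a.values())  # A_T
    total_b = sum(counts_b.values())  # B_T
    distance = 0.0
    for length, descendants in branches:
        a_i = sum(counts_a.get(tip, 0) for tip in descendants)
        b_i = sum(counts_b.get(tip, 0) for tip in descendants)
        distance += length * abs(a_i / total_a - b_i / total_b)
    return distance

# Toy tree ((t1, t2), (t3, t4)) with hypothetical branch lengths.
toy_branches = [
    (1.0, {"t1"}),
    (1.0, {"t2"}),
    (0.5, {"t1", "t2"}),
    (2.0, {"t3"}),
    (1.0, {"t4"}),
    (0.5, {"t3", "t4"}),
]
counts_a = {"t1": 6, "t2": 2, "t3": 2, "t4": 0}
counts_b = {"t1": 0, "t2": 0, "t3": 5, "t4": 5}

w_ab = weighted_unifrac(toy_branches, counts_a, counts_b)
w_aa = weighted_unifrac(toy_branches, counts_a, counts_a)
print(w_ab, w_aa)
```

The six branches contribute 0.6, 0.2, 0.4, 0.6, 0.5 and 0.4 respectively, so the distance between the two groups is 2.7, and the distance of a group to itself is 0.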

  50. Costello et al. 2010

  51. Rao's Distance. We start with a distance between individuals. The heterogeneity of a population ($H_i$) is the average distance between members of that population. The heterogeneity between two populations ($H_{ij}$) is the average distance between a member of population $i$ and a member of population $j$. The distance between two populations is $D_{ij} = H_{ij} - \frac{1}{2}(H_i + H_j)$.

  52. Decomposition of Diversity. If we have populations $1, \ldots, k$ with frequencies $\pi_1, \ldots, \pi_k$, then the diversity of all the populations together is $H_0 = \sum_{i=1}^{k} \pi_i H_i + \sum_{i} \sum_{j} \pi_i \pi_j D_{ij} = H(w) + D(b)$.
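A small numerical check of Rao's distance $D_{ij} = H_{ij} - \frac{1}{2}(H_i + H_j)$ and of this decomposition is sketched below. Several choices are assumptions made for illustration: individuals are points on a line, the distance between individuals is the absolute difference, averages are taken over all ordered pairs (including an individual paired with itself), and the total diversity $H_0$ is computed as the average distance between two draws from the mixture.

```python
def mean_distance(pop_a, pop_b):
    """Average |x - y| over all ordered pairs (x in pop_a, y in pop_b)."""
    return sum(abs(x - y) for x in pop_a for y in pop_b) / (len(pop_a) * len(pop_b))

# Two hypothetical populations on the real line, with equal frequencies.
populations = [[0.0, 2.0], [4.0, 6.0]]
frequencies = [0.5, 0.5]
k = len(populations)

# H[i][i] is the heterogeneity within population i, H[i][j] between i and j.
H = [[mean_distance(populations[i], populations[j]) for j in range(k)] for i in range(k)]

# Rao's distance between populations.
D = [[H[i][j] - 0.5 * (H[i][i] + H[j][j]) for j in range(k)] for i in range(k)]

# Within-population and between-population parts of the diversity.
within = sum(frequencies[i] * H[i][i] for i in range(k))
between = sum(
    frequencies[i] * frequencies[j] * D[i][j] for i in range(k) for j in range(k)
)

# Total diversity of the pooled (mixture) population, under the stated assumption.
total = sum(
    frequencies[i] * frequencies[j] * H[i][j] for i in range(k) for j in range(k)
)
print(D[0][1], within, between, total)
```

With these numbers the within part is 1.0, the between part is 1.5 and the total is 2.5, so the two parts add up to the total as the decomposition states.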

  53. Double Principal Coordinate Analysis. Pavoine, Dufour and Chessel (2004), Purdom (2010) and Fukuyama et al. (2011). Suppose we have $n$ species in $p$ locations and a (Euclidean) matrix $\Delta$ giving the squares of the pairwise distances between the species. Then we can ▶ Use the distances between species to find an embedding in $(n-1)$-dimensional space such that the Euclidean distances between the species are the same as the distances between the species defined in $\Delta$. ▶ Place each of the $p$ locations at the barycenter of its species profile. The Euclidean distances between the locations will be the same as the square root of the Rao dissimilarity between them. ▶ Use PCA to find a lower-dimensional representation of the locations. Give the species and communities coordinates such that the inertia decomposes the same way the diversity does.

  54. Fukuyama and Holmes, 2012. Summary of the methods under consideration:
    Method: DPCoA. Original description: square root of Rao's distance based on the square root of the patristic distances. New formula: $\left[ \sum_i b_i (A_i/A_T - B_i/B_T)^2 \right]^{1/2}$. Properties: most sensitive to outliers, least sensitive to noise, upweights deep differences, gives OTU locations.
    Method: wUniFrac. Original description: $\sum_i b_i |A_i/A_T - B_i/B_T|$. New formula: $\sum_i b_i |A_i/A_T - B_i/B_T|$. Properties: less sensitive to outliers and more sensitive to noise than DPCoA.
    Method: UniFrac. Original description: fraction of branches leading to exactly one group. New formula: $\sum_i b_i \mathbf{1}\left\{ \frac{|A_i/A_T - B_i/B_T|}{A_i/A_T + B_i/B_T} \geq 1 \right\}$. Properties: sensitive to noise, upweights shallow differences on the tree.
  ``Outliers'' refers to highly abundant OTUs, and noise refers to noise in detecting low-abundance OTUs (see the text for more detail).

  55. Antibiotic Time Course Data. Measurements of about 2500 different bacterial OTUs from stool samples of three patients (D, E, F). Each patient was sampled ∼50 times during the course of treatment with ciprofloxacin (an antibiotic). Times are categorized as Pre Cp, 1st Cp, 1st WPC (week post cipro), Interim, 2nd Cp, 2nd WPC, and Post Cp.

  56. Figure with three panels titled UniFrac, weighted UniFrac, and weighted UF on presence/absence; points are marked by subject (D, E, F). Axis 1: 14.7%, 47.6%, 32.7%; Axis 2: 10.3%, 12.3%, 15.1% (panels left to right). Comparing the UniFrac variants. From left to right: PCoA/MDS with unweighted UniFrac, with weighted UniFrac, and with weighted UniFrac performed on presence/absence data extracted from the abundance data used in the other two plots.

  57. Figure with three panels: (a) MDS of OTUs, (b) DPCoA community plot, (c) DPCoA OTU plot. Legend entries include subject (D, E) and phylum (4C0d-2, Actinobacteria, Bacteroidetes, Candidate division TM7, Cyanobacteria, Firmicutes).
● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● F −0.4 ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● Fusobacteria ● ● ● ● ● ● ● ● ● ● −1.0 ● ● ● ● −0.5 ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● Lentisphaerae ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● −0.6 ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● −1.5 ● ● ● Proteobacteria ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● −0.8 ● −1.0 ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● Synergistetes −1.0 −0.5 0.0 0.5 −1.0 −0.5 0.0 0.5 −1.5 −1.0 −0.5 0.0 0.5 1.0 ● Verrucomicrobia (a) Axis 1: 6.2% Axis 1: 40.9% CS1 PCoA/MDS of the OTUs based on the patristic distance, (b) community and (c) species points for DPCoA after removing two outlying species. . . . . . .. . . . . . . .. . . . . . . .. . . . . . . .. . . .. . . . . .

  58. Antibiotic Stress We next want to visualize the effect of the antibiotic. The ordinations of the communities from DPCoA and UniFrac are annotated with whether each community was stressed or not stressed (pre cipro, interim, and post cipro were considered ``not stressed'', while first cipro, first week post cipro, second cipro, and second week post cipro were considered ``stressed''). For UniFrac, the first axis seems to separate the stressed communities from the not stressed communities. DPCoA also seems to separate out the stressed communities along the first axis (in the direction associated with Bacteroidetes), although only for subjects D and E.
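
The stressed / not stressed coding described above can be sketched in a few lines. This is only an illustration: the phase strings and the helper names `stress_code` and `plot_label` are assumptions made for the example, not the encoding used in the original analysis. The numeric codes follow the plot legend (1: not stressed, 2: stressed).

```python
# Hypothetical phase labels, taken from the wording on the slide.
NOT_STRESSED = {"pre cipro", "interim", "post cipro"}
STRESSED = {
    "first cipro",
    "first week post cipro",
    "second cipro",
    "second week post cipro",
}


def stress_code(phase: str) -> int:
    """Return 1 for a not-stressed phase and 2 for a stressed phase."""
    key = phase.strip().lower()
    if key in NOT_STRESSED:
        return 1
    if key in STRESSED:
        return 2
    raise ValueError(f"unknown sampling phase: {phase!r}")


def plot_label(subject: str, phase: str) -> str:
    """Combine subject and stress code into a label such as 'D1' or 'E2'."""
    return f"{subject}{stress_code(phase)}"


if __name__ == "__main__":
    for subject, phase in [("D", "pre cipro"), ("E", "first cipro"), ("F", "interim")]:
        print(subject, phase, "->", plot_label(subject, phase))
```

Raising on an unknown phase, rather than defaulting to one group, keeps a mislabeled sample from silently landing in the wrong condition.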

  59. [Figure: scatter plot, Axis1 versus Axis2. Points are labeled by subject (D, E, F) and antibiotic stress (1: not stressed, 2: stressed).] PCoA/MDS with unweighted UniFrac. The labels represent subject plus antibiotic condition.

  60. [Figure: scatter plot, Axis1 versus Axis2. Points are labeled D 1, D 2, E 1, E 2, F 1, F 2.] Community points as represented by DPCoA. The labels represent subject plus antibiotic condition.

  61. Conclusions for Antibiotic Stress Since UniFrac emphasizes shallow differences on the tree and since PCoA/MDS with UniFrac seems to separate the subjects from each other better than the other two methods, we can conclude that the differences between subjects are mainly shallow ones. However, DPCoA also separates the subjects and the stressed versus non-stressed communities, and examining the community and OTU ordinations can tell us about the differences in the compositions of these communities.

  62. Distances enable statisticians to... ▶ Summarize data with medians, means and principal directions. ▶ Encode some variations in uncertainty. ▶ Make comparisons of heterogeneous sources of information. ▶ Integrate network and tree information. ▶ Measure diversity, inertia and generalize the notion of variance.
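
As a small illustration of the last point, the sketch below checks the standard identity that, for equally weighted points, the inertia computed from pairwise distances alone, (1 / (2n²)) Σᵢⱼ dᵢⱼ², equals the usual variance when the distances are Euclidean. The function names and the toy data are assumptions made for the example.

```python
from itertools import product


def inertia_from_distances(d):
    """Inertia of n equally weighted points from their distance matrix only."""
    n = len(d)
    total = sum(d[i][j] ** 2 for i, j in product(range(n), repeat=2))
    return total / (2 * n * n)


def variance(xs):
    """Population variance (divide by n) of a list of numbers."""
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs) / len(xs)


# Toy one-dimensional data; distances are absolute differences.
xs = [1.0, 2.0, 4.0, 7.0]
d = [[abs(a - b) for b in xs] for a in xs]

if __name__ == "__main__":
    print("variance from coordinates:", variance(xs))
    print("inertia from distances:   ", inertia_from_distances(d))
```

Because `inertia_from_distances` never looks at coordinates, the same formula can be applied to any distance matrix, for instance one built from a tree or a network, which is the sense in which distances generalize the notion of variance.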

  63. Questions for mathematicians ▶ How to make a method designed for uniformly distributed points work for points generated by mixtures of heterogeneous distributions? Examples from work by Edelsbrunner, Carlsson, Zomorodian and co-authors. Source: Zomorodian.

  64. Questions for mathematicians ▶ How to build distances between images that account for unequal measurement errors, even locally? [Figure: schematic of points labeled x.1, x.2, ..., x.p and x1., x2., ..., xn.] Work by Adler, Taylor and Worsley (2003, 2005, 2007) using Random Fields.

  66. Questions for mathematicians ▶ How well can the Euclidean embedding approximations do? ▶ Are there better ways of approximating the commutative diagrams? This is also an important point of contact with the use of Stein's method in probability theory.
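
One rough way to probe the first question numerically is classical MDS/PCoA: double-center the squared distances and inspect the eigenvalues. Negative eigenvalues indicate that no exact Euclidean embedding exists. The sketch below assumes NumPy is available; the function names and the `euclidean_fit` summary (share of total absolute eigenvalue mass carried by the leading non-negative axes) are illustrative choices, not a measure proposed in the talk.

```python
import numpy as np


def pcoa_eigen(d):
    """Eigen-decomposition of the double-centered matrix B = -1/2 J D^2 J."""
    d = np.asarray(d, dtype=float)
    n = d.shape[0]
    j = np.eye(n) - np.ones((n, n)) / n      # centering operator
    b = -0.5 * j @ (d ** 2) @ j
    evals, evecs = np.linalg.eigh(b)          # B is symmetric
    order = np.argsort(evals)[::-1]           # largest eigenvalue first
    return evals[order], evecs[:, order]


def euclidean_fit(evals, k):
    """Share of total absolute eigenvalue mass on the top-k non-negative axes."""
    kept = np.clip(evals[:k], 0.0, None).sum()
    return kept / np.abs(evals).sum()


# Shortest-path distances on a 4-cycle: a metric that is not Euclidean.
cycle = [[0, 1, 2, 1],
         [1, 0, 1, 2],
         [2, 1, 0, 1],
         [1, 2, 1, 0]]

# Corners of a unit square with ordinary Euclidean distances.
r2 = 2 ** 0.5
square = [[0, 1, r2, 1],
          [1, 0, 1, r2],
          [r2, 1, 0, 1],
          [1, r2, 1, 0]]

if __name__ == "__main__":
    for name, d in [("cycle", cycle), ("square", square)]:
        evals, _ = pcoa_eigen(d)
        print(name, np.round(evals, 6), "fit with 2 axes:", round(euclidean_fit(evals, 2), 3))
```

For the 4-cycle the eigenvalues are 2, 2, 0 and -1, so two axes carry 4/5 of the absolute mass and the negative eigenvalue marks the part that no Euclidean configuration can reproduce. For the square the eigenvalues are 1, 1, 0, 0 and the two-dimensional embedding is exact.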

  67. Questions for mathematicians ▶ How to distinguish between the effect of the curvature of a state space and the effect of unequal sampling?

  68. Answers come from Differential Geometry. Xavier Pennec, Yann Ollivier, Tom Fletcher, Rabi Bhattacharya. In particular, these tools enable us to incorporate the relevant data-dependent transformations into localized metrics.

  69. Output showing posterior uncertainty measures.

  70. Benefitting from the tools and schools of Statisticians... Thanks to the R community: ▶ RStudio for tools for reproducible research and Hadley Wickham for ggplot2. ▶ Ecologists and biologists: Chessel, Jombart, Dray, Thioulouse (ade4) and Emmanuel Paradis (ape). Collaborators: David Relman, Alfred Spormann, Yves Escoufier, Les Dethlefsen, Justin Sonnenburg, Persi Diaconis, Sergio Bacallado, Elisabeth Purdom.

  71. Lab Group Postdoctoral Fellows: Paul (Joey) McMurdie, Ben Callahan, Simon Rubinstein-Salzedo, Christof Seiler. Students: John Chakerian, Julia Fukuyama, Kris Sankaran. Funding from NIH/NIGMS R01, NSF-VIGRE and NSF-DMS.

  72. References L. Billera, S. Holmes, and K. Vogtmann. The geometry of tree space. Adv. Appl. Maths, 771--801, 2001. J. Chakerian and S. Holmes. distory: Distances between trees, 2010. D. Chessel, A. Dufour, and J. Thioulouse. The ade4 package - I: One-table methods. R News, 4(1):5--10, 2004. P. Diaconis, S. Goel, and S. Holmes. Horseshoes in multidimensional scaling and kernel methods. Annals of Applied Statistics, 2007. Y. Escoufier. Operators related to a data matrix. In J. R. Barra et al., editors, Recent Developments in Statistics, pages 125--131. North Holland, 1977.

  73. S. N. Evans and F. A. Matsen. The phylogenetic Kantorovich-Rubinstein metric for environmental sequence samples. arXiv, q-bio.PE, Jan 2010. M. Hamady, C. Lozupone, and R. Knight. Fast UniFrac: facilitating high-throughput phylogenetic analyses of microbial communities including analysis of pyrosequencing and PhyloChip data. The ISME Journal, Jan 2009. S. Holmes. Multivariate analysis: The French way. In D. Nolan and T. P. Speed, editors, Probability and Statistics: Essays in Honor of David A. Freedman, volume 56 of IMS Lecture Notes--Monograph Series. IMS, Beachwood, OH, 2006. R. Ihaka and R. Gentleman. R: A language for data analysis and graphics.
