SLIDE 1

Statistical Foundations for Analyzing Human Microbiome Data

Patricio S. La Rosa¹, Paul Brooks², Yanjiao Zhou¹, Elena Deych¹, Berkley Shands¹, Ed Boone², David Edwards², Qin Wang², Erica Sodergren¹, George Weinstock¹, and Bill Shannon¹

¹ Washington University in St. Louis Medical School
² Virginia Commonwealth University

SLIDE 2

Probability Models Simplify Data

  • Replace data by model and parameters
    – Mean and standard deviation define normal data
    – Statistical tests compare parameters (e.g., t-test)

$$P(X = x_i;\ \mu, \sigma) = \frac{1}{\sqrt{2\pi\sigma^{2}}}\exp\left\{-\frac{(x_i-\mu)^{2}}{2\sigma^{2}}\right\}$$

  • What probability models will work for HMP (Human Microbiome Project) data?
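A minimal Python sketch of "replace data by model and parameters", assuming ordinary numeric samples (the numbers are toy values, not HMP data): each sample is summarized by its fitted mean and standard deviation, the log-likelihood uses the normal density with parameters μ and σ, and a two-sample t statistic compares the fitted means. The Welch (unequal-variance) form and the helper names `summarize`, `normal_loglik`, and `welch_t` are our choices for illustration.

```python
import math
from statistics import mean, stdev

def summarize(sample):
    """Replace a sample by the two parameters of a fitted normal model."""
    return mean(sample), stdev(sample)

def normal_loglik(sample, mu, sigma):
    """Sum of log N(x; mu, sigma^2) densities over the sample."""
    return sum(-0.5 * math.log(2 * math.pi * sigma ** 2) - (x - mu) ** 2 / (2 * sigma ** 2)
               for x in sample)

def welch_t(a, b):
    """Two-sample t statistic (unequal variances) comparing the fitted means."""
    (mean_a, sd_a), (mean_b, sd_b) = summarize(a), summarize(b)
    var_a, var_b = sd_a ** 2 / len(a), sd_b ** 2 / len(b)
    t = (mean_a - mean_b) / math.sqrt(var_a + var_b)
    # Welch-Satterthwaite degrees of freedom
    df = (var_a + var_b) ** 2 / (var_a ** 2 / (len(a) - 1) + var_b ** 2 / (len(b) - 1))
    return t, df

# Toy numbers for illustration only
group_a = [5.1, 4.9, 5.3, 5.0, 5.2]
group_b = [5.8, 6.1, 5.9, 6.0, 6.2]
t, df = welch_t(group_a, group_b)
print(summarize(group_a), summarize(group_b), round(t, 3), round(df, 3))
```

Once each group is reduced to (mean, standard deviation), the test only ever touches those parameters, which is the point the slide makes.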

3/9/2011 IHMC Vancouver 2

SLIDE 3

Dirichlet-Multinomial Distribution

  • Relative Abundance Data
    – Numbers of individuals observed for each taxon
    – Multivariate descriptor of ecological community

$$P(\mathbf{X}_i = \mathbf{x}_i;\ \boldsymbol{\pi}, \theta) = \frac{N_i!}{x_{i1}!\cdots x_{iK}!}\;\frac{\prod_{j=1}^{K}\prod_{r=1}^{x_{ij}}\{\pi_j(1-\theta) + (r-1)\theta\}}{\prod_{r=1}^{N_i}\{(1-\theta) + (r-1)\theta\}}$$

  • x_i = {x_i1, …, x_iK}: counts for sample i over K taxa, with N_i = x_i1 + … + x_iK
  • π_j = mean proportion of taxon j
  • θ = dispersion measure
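A Python sketch of the Dirichlet-multinomial probability in the (π, θ) parameterization listed on this slide, where π_j is the mean proportion of taxon j and θ the dispersion. Working on the log scale avoids overflowing the factorials; the function name `dm_log_pmf` and the toy values are ours. Setting θ = 0 makes every factor in the two products collapse to the multinomial case.

```python
import math

def dm_log_pmf(x, pi, theta):
    """Log-probability of count vector x under the Dirichlet-multinomial in (pi, theta) form.

    x     : reads observed for each of K taxa in one sample (N = sum(x))
    pi    : mean taxon proportions, summing to 1 (pi_j > 0 wherever x_j > 0)
    theta : dispersion, 0 <= theta < 1; theta = 0 gives the multinomial
    """
    n = sum(x)
    # log N! - sum_j log x_j!
    log_coef = math.lgamma(n + 1) - sum(math.lgamma(xj + 1) for xj in x)
    # numerator: prod_j prod_{r=1}^{x_j} [pi_j (1 - theta) + (r - 1) theta]
    log_num = sum(math.log(pj * (1 - theta) + (r - 1) * theta)
                  for xj, pj in zip(x, pi) for r in range(1, xj + 1))
    # denominator: prod_{r=1}^{N} [(1 - theta) + (r - 1) theta]
    log_den = sum(math.log((1 - theta) + (r - 1) * theta) for r in range(1, n + 1))
    return log_coef + log_num - log_den

pi = [0.5, 0.3, 0.2]
for theta in (0.0, 0.2):
    # probability that all 10 reads fall in the first taxon
    print(theta, math.exp(dm_log_pmf([10, 0, 0], pi, theta)))
```

The DM puts more mass on extreme compositions than the multinomial with the same mean, which is how θ captures extra between-sample variation.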

SLIDE 4

Does DM Fit HMP Data?

  • Goodness-of-Fit
    – Power > 99% to correctly decide data is Dirichlet-Multinomial
    – Size of test (probability of wrongly rejecting the multinomial when data are multinomial) ~5%

  • Simulations indicate DM is a good fit to HMP data
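The slide does not spell out its goodness-of-fit test, so the sketch below swaps in a simpler one to show how size and power can be estimated by simulation: a Pearson chi-square test of homogeneity on the samples-by-taxa count table, which tests the multinomial (every sample shares one π) and tends to reject when counts are overdispersed, as under the DM. The critical value uses the Wilson-Hilferty approximation. All function names are hypothetical, the DM sampler assumes α_j = π_j(1 − θ)/θ, and the taxon proportions, read depth, θ, and number of simulated studies are arbitrary.

```python
import random
from statistics import NormalDist

def sample_multinomial(n, probs, rng):
    """One multinomial count vector with n reads spread over len(probs) taxa."""
    counts = [0] * len(probs)
    for taxon in rng.choices(range(len(probs)), weights=probs, k=n):
        counts[taxon] += 1
    return counts

def sample_dm(n, pi, theta, rng):
    """One Dirichlet-multinomial draw: Dirichlet(alpha) proportions, then multinomial.
    Assumes alpha_j = pi_j * (1 - theta) / theta, so pi is the mean and theta the dispersion."""
    gammas = [rng.gammavariate(p * (1 - theta) / theta, 1.0) for p in pi]
    total = sum(gammas)
    return sample_multinomial(n, [g / total for g in gammas], rng)

def pearson_homogeneity(table):
    """Pearson chi-square statistic and df for a P x K table (rows = samples, cols = taxa)."""
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    grand = sum(row_tot)
    stat = sum((x - r * c / grand) ** 2 / (r * c / grand)
               for row, r in zip(table, row_tot) for x, c in zip(row, col_tot))
    return stat, (len(table) - 1) * (len(col_tot) - 1)

def chi2_critical(df, alpha=0.05):
    """Wilson-Hilferty approximation to the upper-alpha quantile of chi-square(df)."""
    z = NormalDist().inv_cdf(1 - alpha)
    h = 2.0 / (9.0 * df)
    return df * (1 - h + z * h ** 0.5) ** 3

def rejection_rate(draw_sample, n_samples, reps, rng):
    """Fraction of simulated studies in which the multinomial model is rejected at 5%."""
    rejected = 0
    for _ in range(reps):
        stat, df = pearson_homogeneity([draw_sample(rng) for _ in range(n_samples)])
        if stat > chi2_critical(df):
            rejected += 1
    return rejected / reps

# Illustrative settings (arbitrary): 3 taxa, 500 reads per sample, 20 samples per study
rng = random.Random(2011)
pi = [0.5, 0.3, 0.2]
size = rejection_rate(lambda r: sample_multinomial(500, pi, r), 20, 100, rng)
power = rejection_rate(lambda r: sample_dm(500, pi, 0.05, r), 20, 100, rng)
print(f"estimated size {size:.2f}, estimated power {power:.2f}")
```

Simulating under the multinomial estimates the size; simulating under the DM estimates the power to detect overdispersion.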

SLIDE 5

What Hypotheses Can We Test?

  • Test model parameters
    – [3] analogous to 1-sample t-test
    – [4] analogous to 2-sample t-test or ANOVA
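A sketch of a two-group comparison in the spirit of the 2-sample analog, not necessarily the test referenced as [4]. It assumes a standard method-of-moments estimate of θ for each group, estimates π by pooling reads, and uses a Wald-type statistic: under the DM, Cov(π̂) = c·(diag(π) − ππᵀ) with c = Σ_i N_i{1 + (N_i − 1)θ}/(Σ_i N_i)², and the quadratic form in the difference of means reduces to Σ_j d_j²/π_j divided by c, compared to chi-square with K − 1 df. With θ = 0 this is the usual 2 × K chi-square test. Function names, the clamping of negative θ estimates to 0, and the toy tables are our assumptions.

```python
import math

def chi2_sf(x, df):
    """Exact upper-tail probability P(X > x) for X ~ chi-square(df), integer df >= 1."""
    if x <= 0:
        return 1.0
    y = x / 2.0
    # Regularized upper gamma Q(s, y), started at s = 1 (even df) or s = 1/2 (odd df)
    q, s = (math.exp(-y), 1.0) if df % 2 == 0 else (math.erfc(math.sqrt(y)), 0.5)
    while s < df / 2.0:
        # Recurrence: Q(s + 1, y) = Q(s, y) + y**s * exp(-y) / Gamma(s + 1)
        q += math.exp(s * math.log(y) - y - math.lgamma(s + 1))
        s += 1.0
    return min(q, 1.0)

def dm_mom(table):
    """Method-of-moments (pi_hat, theta_hat) from a P x K count table, P >= 2."""
    P, K = len(table), len(table[0])
    n = [sum(row) for row in table]
    n_tot = sum(n)
    pi = [sum(row[j] for row in table) / n_tot for j in range(K)]
    props = [[x / ni for x in row] for row, ni in zip(table, n)]
    # Between-sample (s) and within-sample (g) variation, weighted by read counts
    s = sum(ni * (p - pj) ** 2 for ni, row in zip(n, props) for p, pj in zip(row, pi)) / (P - 1)
    g = sum(ni * p * (1 - p) for ni, row in zip(n, props) for p in row) / sum(ni - 1 for ni in n)
    n_c = (n_tot - sum(ni ** 2 for ni in n) / n_tot) / (P - 1)
    theta = (s - g) / (s + (n_c - 1) * g)
    return pi, max(theta, 0.0)  # assumption: negative estimates (underdispersion) set to 0

def dm_two_group_wald(table_a, table_b):
    """Wald-type test of H0: two groups share the same mean taxon proportions."""
    (pi_a, theta_a), (pi_b, theta_b) = dm_mom(table_a), dm_mom(table_b)
    rows = table_a + table_b
    total = sum(sum(row) for row in rows)
    pooled = [sum(row[j] for row in rows) / total for j in range(len(rows[0]))]

    def scale(table, theta):
        # Cov(pi_hat) = scale * (diag(pi) - pi pi^T) for the read-pooled estimate
        n = [sum(row) for row in table]
        return sum(ni * (1 + (ni - 1) * theta) for ni in n) / sum(n) ** 2

    c = scale(table_a, theta_a) + scale(table_b, theta_b)
    # Generalized-inverse quadratic form reduces to sum_j d_j^2 / pi_j, divided by c
    stat = sum((a - b) ** 2 / p for a, b, p in zip(pi_a, pi_b, pooled) if p > 0) / c
    df = sum(1 for p in pooled if p > 0) - 1
    return stat, df, chi2_sf(stat, df)

# Toy 3-taxon count tables (rows = samples), for illustration only
group_a = [[50, 30, 20], [45, 35, 20], [55, 25, 20]]
group_b = [[20, 30, 50], [25, 30, 45], [15, 35, 50]]
print(dm_two_group_wald(group_a, group_a))  # -> (0.0, 2, 1.0)
print(dm_two_group_wald(group_a, group_b))
```

The overdispersion enters only through c: a larger θ inflates the variance of π̂ and so shrinks the statistic, which is what separates this from a plain multinomial comparison.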

SLIDE 6

Power and Sample Sizes?

Table 3. Comparing RAD (relative abundance data) means from 2 populations using hypothesis test [5]. Power by number of samples (P, rows) and number of reads per sample (Nr, columns).

P \ Nr    100     500     1000    10000   20000
10        0.78    0.87    0.89    0.90    0.90
20        0.89    0.97    0.98    0.98    0.98
40        0.98    >0.99   >0.99   >0.99   >0.99
60        >0.99   >0.99   >0.99   >0.99   >0.99
100       >0.99   >0.99   >0.99   >0.99   >0.99
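A short derivation, offered as our own reading of Table 3 rather than the authors' power calculation. Assume P samples, each with N_r reads, and the read-pooled estimate π̂_j = Σ_i x_ij / (P N_r). Under the Dirichlet-multinomial with dispersion θ, Var(x_ij) = N_r π_j(1 − π_j){1 + (N_r − 1)θ}, so

$$\operatorname{Var}(\hat{\pi}_j) = \frac{\pi_j(1-\pi_j)\,\{1+(N_r-1)\theta\}}{P\,N_r} \;\longrightarrow\; \frac{\pi_j(1-\pi_j)\,\theta}{P} \qquad (N_r \to \infty)$$

Once N_r θ is large, adding reads barely shrinks the variance while adding samples always does, which is consistent with power leveling off across each row and rising down each column.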

SLIDE 7

Object Data Analysis (ODA)

  • Apply probability model to graphical (tree) objects
    – Sequence reads map to paths in a tree
    – Samples map to a tree

$$P(G = g_i;\ g^{*}, \tau) = c(g^{*}, \tau) \times \exp\{-\tau\, d(g_i, g^{*})\}$$

  • g* = microbiome core, τ = dispersion, d = distance, c(g*, τ) = normalizing constant
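The model P(G = g_i; g*, τ) = c(g*, τ) exp{−τ d(g_i, g*)} needs a tree representation and a distance d, which this slide does not pin down. The sketch assumes each sample's tree is the set of parent-to-child edges on its taxonomic paths and that d counts edges present in exactly one tree. Under that distance, keeping every edge found in more than half of the samples minimizes Σ_i d(g_i, g*), and because a child edge can only appear alongside its parent edge, the result is itself a rooted tree. That makes it one plausible estimate of the core g* (it would be the maximum-likelihood estimate only if c(g*, τ) did not depend on g*). All function names are hypothetical, and the toy samples reuse taxon names from these slides, not HMP data.

```python
from collections import Counter

def tree_edges(lineages):
    """Represent a sample as the set of (parent, child) edges on its root-to-taxon paths."""
    return frozenset((parent, child)
                     for path in lineages
                     for parent, child in zip(path, path[1:]))

def tree_distance(g, h):
    """Assumed distance d(g, h): number of edges present in exactly one tree."""
    return len(g ^ h)

def log_density_unnormalized(g, core, tau):
    """log P(G = g; core, tau) without the log c(core, tau) term."""
    return -tau * tree_distance(g, core)

def majority_core(trees):
    """Edges found in more than half of the trees; minimizes sum_i d(g_i, core)."""
    counts = Counter(edge for g in trees for edge in g)
    return frozenset(edge for edge, k in counts.items() if k > len(trees) / 2)

# Toy samples built from taxon names on these slides (not HMP data)
samples = [
    tree_edges([["Bacteria", "Bacteroidetes", "Bacteroidia"], ["Bacteria", "Firmicutes", "Bacilli"]]),
    tree_edges([["Bacteria", "Bacteroidetes", "Bacteroidia"], ["Bacteria", "Firmicutes", "Clostridia"]]),
    tree_edges([["Bacteria", "Bacteroidetes", "Bacteroidia"]]),
]
core = majority_core(samples)
print(sorted(core))
print([tree_distance(g, core) for g in samples])  # -> [1, 1, 1]
```

Larger τ concentrates the model around the core; with this distance, each extra or missing edge relative to g* lowers the log-probability by τ.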

SLIDE 8

[Figure: example taxonomic tree; node labels with the values shown on the slide]

Bacteria 2.97
  Bacteroidetes 0.99
    Bacteroidia 0.99
      Bacteroidales 0.99
        Prevotellaceae 0.99
          Prevotella 0.99
  Firmicutes 1.49
    Clostridia 0.53
      Clostridiales 0.53
        Veillonellaceae 0.53
          Megasphaera 0.52
    Bacilli 0.91
      Lactobacillales 0.9
        Enterococcaceae 0.44
          Pilibacter 0.41

SLIDE 9

How do we estimate the core?

SLIDE 10

Are Variable Region Cores Equal?

SLIDE 11

Are Body Site Cores Equal?

SLIDE 12

Why Use Probability Models?

  • Parameters simplify interpretation of data (e.g., core defined by central graph)
  • Formal hypotheses and P values (e.g., DM t-test and ANOVA analogs)
  • Existing statistical machinery (e.g., power calculations for study design)
  • All estimates come with error (e.g., confidence intervals, standard errors)

SLIDE 13

Two Posters

  • Dirichlet-Multinomial Power Calculations and Statistical Tests for Microbiome Data
    – La Rosa, Brooks, Deych, Boone, Edwards, Wang, Sodergren, Weinstock, Shannon

  • Statistical Analysis of Taxonomic Trees in Microbiome Research
    – La Rosa, Zhou, Deych, Shands, Sodergren, Weinstock, Shannon
