Likelihoods, Bootstraps and Testing Trees
Joe Felsenstein
Depts. of Genome Sciences and of Biology, University of Washington
Likelihoods, Bootstraps and Testing Trees – p.1/60

Odds ratio justification for maximum likelihood
D: the data
H_1: Hypothesis 1
H_2: Hypothesis 2
|: the symbol for "given"
\[ \frac{\mathrm{Prob}(H_1 \mid D)}{\mathrm{Prob}(H_2 \mid D)} = \frac{\mathrm{Prob}(D \mid H_1)}{\mathrm{Prob}(D \mid H_2)} \times \frac{\mathrm{Prob}(H_1)}{\mathrm{Prob}(H_2)} \]
posterior odds ratio = likelihood ratio × prior odds ratio

If a space probe finds no Little Green Men on Mars
Hypotheses: "yes" (there are Little Green Men) and "no" (there are none).
Priors (odds of yes to no): 4/1 in one case, 1/4 in the other.
Likelihoods of seeing none: 1/3 under "yes", 1 under "no" (under "no", the probability of seeing one is 0).
Posteriors (odds of yes to no):
\[ \frac{4}{1} \times \frac{1/3}{1} = \frac{4}{3} \qquad\qquad \frac{1}{4} \times \frac{1/3}{1} = \frac{1}{12} \]

The likelihood ratio term ultimately dominates
If we see one Little Green Man, the likelihood calculation does the right thing:
\[ \frac{4}{1} \times \frac{2/3}{0} = \infty \]
(put this way, this is OK but not mathematically kosher)
If we send n space probes and keep seeing none, the likelihood ratio term is
\[ \left( \frac{1}{3} \right)^{n} \]
It dominates the calculation, overwhelming the prior. Thus even if we don't have a prior we can believe in, we may be interested in knowing which hypothesis the likelihood ratio is recommending ...
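
To see how quickly the likelihood ratio term overwhelms the prior, here is a minimal Python sketch. It uses the numbers from the Mars example (prior odds of 4/1 or 1/4, likelihood ratio (1/3)/1 for each probe that sees nothing) and, like the slide, treats the probes as independent. The function and variable names are illustrative only.

```python
from fractions import Fraction


def posterior_odds(prior_odds, likelihood_ratio, n_probes=1):
    """Posterior odds after n independent probes with the same likelihood ratio."""
    return prior_odds * likelihood_ratio ** n_probes


# Likelihood ratio when a probe sees no Little Green Men: (1/3) / 1.
lr_none_seen = Fraction(1, 3) / 1

# One probe, prior odds 4/1 in favour of "yes".
odds_one_probe = posterior_odds(Fraction(4, 1), lr_none_seen)

# Ten probes, starting from two very different priors.
odds_ten_optimist = posterior_odds(Fraction(4, 1), lr_none_seen, 10)
odds_ten_sceptic = posterior_odds(Fraction(1, 4), lr_none_seen, 10)

print(odds_one_probe, odds_ten_optimist, odds_ten_sceptic)
```

After ten empty-handed probes both priors lead to posterior odds far below 1, which is the sense in which the likelihood ratio term dominates.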

Likelihood in Simple Coin-Tossing
Tossing a coin n times, with probability p of heads, the probability of outcome HHTHTTTHTTH (5 heads and 6 tails) is
\[ p\,p\,(1-p)\,p\,(1-p)(1-p)(1-p)\,p\,(1-p)(1-p)\,p \]
which is
\[ L = p^{5} (1-p)^{6} \]
Plotting L against p to find its maximum:
[Figure: Likelihood plotted against p from 0.0 to 1.0, with the maximum marked at p = 0.454.]

Differentiating to find the maximum:
Differentiating the expression for L with respect to p and equating the derivative to 0, the value of p that is at the peak is found (not surprisingly) to be p = 5/11:
\[ \frac{\partial L}{\partial p} = \left( \frac{5}{p} - \frac{6}{1-p} \right) p^{5} (1-p)^{6} = 0 \]
\[ 5 - 11p = 0 \]
\[ \hat{p} = \frac{5}{11} \]
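
The same maximum can be checked numerically. The sketch below evaluates ln L = 5 ln p + 6 ln(1 − p) on a grid of p values and picks the largest. The grid search and the function names are assumptions made for illustration; the slides find the maximum analytically.

```python
import math


def log_likelihood(p, heads=5, tails=6):
    """ln L for a coin with heads probability p: heads*ln(p) + tails*ln(1-p)."""
    return heads * math.log(p) + tails * math.log(1.0 - p)


def grid_mle(heads=5, tails=6, steps=10000):
    """Return the grid point in (0, 1) with the highest log-likelihood."""
    grid = [k / steps for k in range(1, steps)]
    return max(grid, key=lambda p: log_likelihood(p, heads, tails))


p_hat_grid = grid_mle()
p_hat_exact = 5 / 11  # from setting the derivative to zero

print(p_hat_grid, p_hat_exact)
```

Maximizing ln L rather than L gives the same p, because the logarithm is an increasing function.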

A log-likelihood curve
[Figure: a likelihood curve in one parameter. Vertical axis: Ln (Likelihood); horizontal axis: length of a branch in the tree.]

Its maximum likelihood estimate
[Figure: a likelihood curve in one parameter and the maximum likelihood estimate. Vertical axis: Ln (Likelihood); horizontal axis: length of a branch in the tree. The maximum likelihood estimate (MLE) is marked on the horizontal axis.]

The (approximate, asymptotic) confidence interval
[Figure: a likelihood curve in one parameter, the maximum likelihood estimate (MLE), and the confidence interval derived from it. Vertical axis: Ln (Likelihood); horizontal axis: length of a branch in the tree.]
The 95% confidence interval is the range of branch lengths for which Ln (Likelihood) is below its peak by no more than 1/2 the value of a chi-square with 1 d.f. that is significant at 95%.
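
The rule can be tried out on a one-parameter likelihood that is easy to compute. The sketch below is only an illustration: it borrows the coin-tossing likelihood L = p^5 (1 − p)^6 as a stand-in for the branch-length curve, and finds the interval by scanning a grid, both of which are assumptions not taken from the slides. The constant 3.841459 is the 95% point of a chi-square distribution with 1 d.f.

```python
import math

CHI2_1DF_95 = 3.841459  # 95% point of chi-square with 1 degree of freedom


def log_likelihood(p, heads=5, tails=6):
    """ln L for the coin example: heads*ln(p) + tails*ln(1-p)."""
    return heads * math.log(p) + tails * math.log(1.0 - p)


def likelihood_interval(heads=5, tails=6, steps=100000):
    """Grid points whose ln L is within CHI2_1DF_95 / 2 of the maximum."""
    p_hat = heads / (heads + tails)
    cutoff = log_likelihood(p_hat, heads, tails) - CHI2_1DF_95 / 2.0
    inside = [k / steps for k in range(1, steps)
              if log_likelihood(k / steps, heads, tails) >= cutoff]
    return min(inside), max(inside)


lower, upper = likelihood_interval()
print(lower, upper)
```

Because the interval is asymptotic, with only 11 tosses it should be read as approximate.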

Contours of a log-likelihood surface in two dimensions
[Figure: contours of the log-likelihood surface. Axes: length of branch 1 and length of branch 2.]

Contours of a log-likelihood surface in two dimensions
[Figure: contours of the log-likelihood surface with the MLE marked. Axes: length of branch 1 and length of branch 2.]

Log-likelihood-based confidence set for two variables
[Figure: contours of the log-likelihood surface with one contour shaded in. Axes: length of branch 1 and length of branch 2.]
The shaded area is the joint confidence interval. The height of this contour is less than at the peak by an amount equal to 1/2 the chi-square value with two degrees of freedom which is significant at the 95% level.

Confidence interval for one variable
[Figure: contours of the log-likelihood surface with one contour marked. Axes: length of branch 1 and length of branch 2.]
The height of this contour is less than at the peak by an amount equal to 1/2 the chi-square value with one degree of freedom which is significant at the 95% level.

Confidence interval for the other variable
[Figure: contours of the log-likelihood surface with one contour marked. Axes: length of branch 1 and length of branch 2.]
The height of this contour is less than at the peak by an amount equal to 1/2 the chi-square value with one degree of freedom which is significant at the 95% level.

Calculating the likelihood of a tree
If we have molecular sequences on a tree, the likelihood is the product over sites of the probabilities of the data D^{[i]} for each site (if those evolve independently):
\[ L = \mathrm{Prob}(D \mid T) = \prod_{i=1}^{\mathrm{sites}} \mathrm{Prob}(D^{[i]} \mid T) \]
With log-likelihoods, the product becomes a sum:
\[ \ln L = \ln \mathrm{Prob}(D \mid T) = \sum_{i=1}^{\mathrm{sites}} \ln \mathrm{Prob}(D^{[i]} \mid T) \]
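
One practical reason to work with the sum is numerical: a product of many small per-site probabilities underflows in floating point, while the sum of their logarithms does not. The sketch below illustrates this with made-up per-site values (100 sites, each with probability 1e-5); these numbers are hypothetical and do not come from any tree or data set.

```python
import math


def log_likelihood_from_sites(site_likelihoods):
    """ln L as the sum over sites of ln Prob(D[i] | T)."""
    return sum(math.log(x) for x in site_likelihoods)


# Hypothetical per-site probabilities, for illustration only.
site_likelihoods = [1e-5] * 100

naive_product = math.prod(site_likelihoods)  # underflows to 0.0 in double precision
ln_L = log_likelihood_from_sites(site_likelihoods)

print(naive_product, ln_L)
```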

Calculating the likelihood for site i on a tree
[Figure: a rooted tree with tips A, C, C, C, G on branches t_1 to t_5, interior nodes x, y and z, root w, and interior branches t_6, t_7 and t_8. The t_i are "branch lengths" (rate × time).]
Sum over all possible states (bases) at interior nodes:
\[ L^{(i)} = \sum_{x} \sum_{y} \sum_{z} \sum_{w} \mathrm{Prob}(w)\, \mathrm{Prob}(x \mid w, t_7)\, \mathrm{Prob}(A \mid x, t_1)\, \mathrm{Prob}(C \mid x, t_2)\, \mathrm{Prob}(z \mid w, t_8)\, \mathrm{Prob}(C \mid z, t_3)\, \mathrm{Prob}(y \mid z, t_6)\, \mathrm{Prob}(C \mid y, t_4)\, \mathrm{Prob}(G \mid y, t_5) \]
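
The sum over interior states can be written out directly in code. The sketch below does so for the five-tip tree above. The slides do not name a substitution model, so the Jukes-Cantor model (equal base frequencies of 1/4) is assumed here purely for illustration, and the branch lengths of 0.1 are hypothetical.

```python
import math
from itertools import product

BASES = "ACGT"


def jc_prob(child, parent, t):
    """Jukes-Cantor Prob(child | parent, t); t in expected substitutions per site."""
    e = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * e if child == parent else 0.25 - 0.25 * e


def site_likelihood(t, tips="ACCCG"):
    """Sum over states x, y, z, w at the interior nodes.

    t maps 1..8 to the branch lengths t1..t8; tips[k] is the base at the
    end of branch t(k+1), so the default matches the tree on the slide.
    """
    total = 0.0
    for x, y, z, w in product(BASES, repeat=4):
        total += (0.25 * jc_prob(x, w, t[7])
                  * jc_prob(tips[0], x, t[1]) * jc_prob(tips[1], x, t[2])
                  * jc_prob(z, w, t[8])
                  * jc_prob(tips[2], z, t[3]) * jc_prob(y, z, t[6])
                  * jc_prob(tips[3], y, t[4]) * jc_prob(tips[4], y, t[5]))
    return total


branch_lengths = {k: 0.1 for k in range(1, 9)}  # hypothetical values
L_i = site_likelihood(branch_lengths)
print(L_i)
```

The loop visits 4^4 = 256 combinations of interior states here. The number of combinations grows exponentially with the number of interior nodes, which is what makes a direct sum impractical on large trees.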

Calculating the likelihood for site i on a tree
We use the conditional likelihoods:
\[ L^{(i)}_{j}(s) \]
These compute the probability of everything at site i at or above node j on the tree, given that node j is in state s. Thus it assumes something (s) that we don't know in practice, so we compute these for all states s.
At the tips we can define these quantities: if the observed state is (say) C, the vector of L's is (0, 1, 0, 0). If we observe an ambiguity, say R (purine), they are (1, 0, 1, 0), not (1/2, 0, 1/2, 0).
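
A small sketch of the tip vectors, assuming the base order (A, C, G, T) implied by the two examples above. The table covers only a few standard IUPAC codes and the names are illustrative. Each entry is the probability of the observation given the state, so every base compatible with an ambiguity code gets 1.

```python
BASES = "ACGT"

# A few standard IUPAC nucleotide codes and the bases each one allows.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG",    # purine
    "Y": "CT",    # pyrimidine
    "N": "ACGT",  # any base
}


def tip_conditional_likelihoods(code):
    """Vector of L(s) at a tip, in the order A, C, G, T."""
    allowed = IUPAC[code]
    return tuple(1.0 if base in allowed else 0.0 for base in BASES)


print(tip_conditional_likelihoods("C"))  # (0.0, 1.0, 0.0, 0.0)
print(tip_conditional_likelihoods("R"))  # (1.0, 0.0, 1.0, 0.0)
```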

The "pruning" algorithm:
[Figure: a node ℓ with two descendant nodes j and k, connected to it by branches of length v_j and v_k.]
\[ L^{(i)}_{\ell}(s) = \left( \sum_{s_j} \mathrm{Prob}(s_j \mid s, v_j)\, L^{(i)}_{j}(s_j) \right) \times \left( \sum_{s_k} \mathrm{Prob}(s_k \mid s, v_k)\, L^{(i)}_{k}(s_k) \right) \]
(Felsenstein, 1973; 1981).

and at the bottom of the tree:
\[ L^{(i)}_{0} = \sum_{s} \pi_{s}\, L^{(i)}_{0}(s) \]
(Felsenstein, 1973, 1981)
and having gotten the likelihoods for each site:
\[ L = \prod_{i=1}^{\mathrm{sites}} L^{(i)}_{0} \]
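
The recursion and the final sum at the bottom of the tree can be sketched as follows, applied to the five-tip tree used earlier for the site-likelihood formula (tips A, C, C, C, G; interior nodes x, y, z; root w). As an assumption for illustration, the Jukes-Cantor model with π_s = 1/4 stands in for the unspecified substitution model, and all branch lengths are set to a hypothetical 0.1.

```python
import math

BASES = "ACGT"


def jc_prob(child, parent, v):
    """Jukes-Cantor Prob(child | parent, v); v in expected substitutions per site."""
    e = math.exp(-4.0 * v / 3.0)
    return 0.25 + 0.75 * e if child == parent else 0.25 - 0.25 * e


def tip(base):
    """Conditional likelihoods at a tip with an unambiguous observed base."""
    return {s: 1.0 if s == base else 0.0 for s in BASES}


def prune(cond_j, v_j, cond_k, v_k):
    """Conditional likelihoods at a node from those of its descendants j and k."""
    return {
        s: sum(jc_prob(sj, s, v_j) * cond_j[sj] for sj in BASES)
           * sum(jc_prob(sk, s, v_k) * cond_k[sk] for sk in BASES)
        for s in BASES
    }


def root_likelihood(cond_root):
    """Sum over root states of pi_s times the conditional likelihood (pi_s = 1/4)."""
    return sum(0.25 * cond_root[s] for s in BASES)


v = {k: 0.1 for k in range(1, 9)}  # hypothetical branch lengths v[1]..v[8]

L_x = prune(tip("A"), v[1], tip("C"), v[2])
L_y = prune(tip("C"), v[4], tip("G"), v[5])
L_z = prune(tip("C"), v[3], L_y, v[6])
L_w = prune(L_x, v[7], L_z, v[8])
site_L = root_likelihood(L_w)
print(site_L)
```

Each interior node is visited once and does a fixed amount of work per state, so the cost grows linearly with the number of nodes rather than exponentially as in the direct sum over all interior states.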

What does "tree space" (with branch lengths) look like?
An example: three species with a clock.
[Figure: clocklike trees of species A, B and C with times t_1 and t_2, and the space of (t_1, t_2) values, with labels "trifurcation", "OK", "not possible" and "etc."]
When we consider all three possible topologies, the space looks like:
[Figure: the combined space for the three topologies, with axes t_1 and t_2.]

For one tree topology
The space of trees varying all 2n − 3 branch lengths, each a nonnegative number, defines an "orthant" (open corner) of a (2n − 3)-dimensional real space:
[Figure: an unrooted tree with tips A to F and branch lengths v_1 to v_9, shown with an orthant whose faces are labelled "wall", "wall" and "floor".]

Through the looking-glass
Shrinking one of the n − 3 interior branches to 0, we arrive at a trifurcation:
[Figure: the tree with tips A to F and branch lengths v_1 to v_9; the same tree with v_9 shrunk to 0, leaving a trifurcation; and the two other tree topologies that can be reached from it.]
Here, as we pass "through the looking glass" we also touch the space for two other tree topologies, and we could enter either.
