Kingmans coalescent Random collision of lineages as go back in time - PowerPoint PPT Presentation

Kingman’s coalescent Random collision of lineages as go back in time (sans recombination) Collision is faster the smaller the effective population size u9 In a diploid population of u8 Average time for u7 u6 effective population size N, k copies to coalesce to u5 4N u4 k−1 = Average time for n k(k−1) copies to coalesce u3 = 4N ( 1 − 1 ( generations n Average time for two copies to coalesce u2 = 2N generations Week 9: Coalescents – p.17/60

Coalescence is faster in small populations Change of population size and coalescents N e time the changes in population size will produce waves of coalescence the tree time Coalescence events time The parameters of the growth curve for Ne can be inferred by likelihood methods as they affect the prior probabilities of those trees that fit the data. Week 9: Coalescents – p.24/60

“Skyline” and “Skyride” plots in BEAST Classical Skyline Plot ORMCP Model Bayesian Skyline Plot Effective Population Size 1.0 1.0 1.0 0.01 0.01 0.01 0.001 0.001 0.001 0.15 0.10 0.05 0.00 0.15 0.10 0.05 0.00 0.15 0.10 0.05 0.00 Uniform Bayesian Skyride Time−Aware Bayesian Skyride BEAST Bayesian Skyride Effective Population Size 1.0 1.0 1.0 0.01 0.01 0.01 0.001 0.001 0.001 0.15 0.10 0.05 0.00 0.15 0.10 0.05 0.00 0.15 0.10 0.05 0.00 Time (Past to Present) Time (Past to Present) Time (Past to Present) Figure from Minin, Bloomquist, and Suchard 2008

BEST Liu and Pearl (2007); Edwards et al. (2007) • X – sequence data • G – a genealogy (gene tree – with branch lengths) • S – a species tree • θ – demographic parameters • Λ – parameters of molecular sequence evolution Pr( S, θ ) Pr( X | S, θ ) Pr( S, θ | X ) = Pr( X ) � = Pr( S ) Pr( θ ) Pr( X | G ) Pr( G | S, θ ) dG � �� ∝ Pr( S ) Pr( θ ) Pr( X | G, Λ ) Pr( Λ ) d Λ Pr( G | S, θ ) dG

BEST – importance sampling 1. Generate a collection of gene trees, G , using an approximation of the coalescent prior 2. Sample from the distribution of the species trees conditional on the gene trees, G . 3. Use “importance weights” to correct the sample for the fact that an approximate prior was used

BEST – importance sampling 1. Generate a collection of gene trees, G , using an approximation of the coalescent prior (a) Use a tweaked version of MrBayes to sample N sets of gene trees, G , from Pr( G | X ) = Pr † ( G ) Pr( X | G ) † Pr † ( X ) (b) Pr † ( G ) is an approximate prior on gene trees from using a “maximal” species tree. 2. Sample from the distribution of the species trees conditional on the gene trees, G . 3. Use “importance weights” to correct the sample for the fact that an approximate prior was used

BEST – importance sampling 1. Generate a collection of gene trees, G , using an approximation of the coalescent prior 2. Sample from the distribution of the species trees conditional on the gene trees, G . (a) From each set of gene trees ( G j for 1 ≤ j ≤ N ) generate k species trees using coalescent theory: Pr( S i | G j ) = Pr( S i ) Pr( G j | S i ) Pr( G j ) 3. Use “importance weights” to correct the sample for the fact that an approximate prior was used

BEST – importance sampling 1. Generate a collection of gene trees, G , using an approximation of the coalescent prior 2. Sample from the distribution of the species trees conditional on the gene trees, G . 3. Use “importance weights” to correct the sample for the fact that an approximate prior was used (a) Estimate � Pr( G j ) by using the harmonic mean estimator from the MCMC in step 2. (b) Compute a normalization factor N � � Pr( G j ) β = Pr( G j ) j =1 (c) Reweight all sampled species trees by � Pr( G j ) Pr( G j ) β

BEST – conclusions 1. very expensive computationally (long MrBayes runs are needed) 2. should correctly deal with the variability in gene tree caused by the coalescent process.

∗ BEAST overview Goal: approximate Pr( S | X ) Pr( S | X ) ∝ Pr( X | S ) Pr( S ) � = Pr( X | G ) Pr( G | S ) Pr( S ) dG � � = Pr( X | G ) Pr( G | S, θ ) Pr( S ) dGd θ � � � = Pr( X | G, Λ ) Pr( G | S, θ ) Pr( S ) dGd θ d Λ = { N 1 , N 2 , . . . , } θ = { κ, π , . . . } Λ

Pr( S ) from speciation model S

Pr( G | S ) G S

Gene tree in a species tree w/ variable population size Pr( G | S ) = � b G in grey i Pr( G i | S i ) S in black A1 A2 A3 C1 C2 C3 B2 B1 B3 A C B Figure modified from Heled and Drummond 2010

In Species A

In Species C

In Species B

In ancestor of AC

In ancestor of ACB

Gene tree in a species tree w/ variable population size Pr( G | S ) = � b G in grey i Pr( G i | S i ) S in black A1 A2 A3 C1 C2 C3 B2 B1 B3 A C B Figure modified from Heled and Drummond 2010

MCMC update to gene tree → Changing G affects Pr( X | G, Λ ) and Pr( G | S, θ )

Another MCMC update to gene tree → Some changes to G are incompatible with S (and will be rejected).

MCMC update to species tree → Changing S affects Pr( G | S, θ ) , but not Pr( X | G, Λ ) . Note that the red dots are “flags” for when a lineage enters a new species; the heights are determined by the species tree.

Another MCMC update to species tree → Some changes to S are incompatible with C (and will be rejected).

An MCMC update to the population size → N e ∈ θ , so changing N e affects Pr( G | S, θ ) , but not Pr( X | G, Λ ) .

Multiple gene tree in a species tree w/ variable population size Figure from modified Heled and Drummond 2010

∗ BEST Similar model to BEST, but more efficient much implementation. Both attempt to sample the posterior distribution of species trees, gene trees, demographic parameter values and mutational parameter values. Both will be very sensitive to migration, but they represent the state-of-the-art for estimating species trees from gene trees.

Multiple Sequence Alignment - main points • The goal of MSA is to introduce gaps such that residues in the same column are homologous (all residues in the column descended from a residue in their common ancestor). • The problem is recast as: – reward matches (+ scores) – penalize rare substitutions (- scores), – penalize gaps (- scores), – try to find an alignment that maximizes the total score • pairwise alignment is tractable • MSA is usually done progressively • progressive alignment algorithms are heuristic, and do not optimize an evolutionary defensible criterion

Multiple Sequence Alignment tools • clustal variants are popular, but not very reliable. • simultaneous inference of MSA and tree is the most defensible (but computationally demanding) • Promising tools for MSA (roughly in order of computational tractability): 1. Simultaneous MSA + Trees (Handel, BAliPhy, BEAST, AliFritz . . . ) 2. FSA (fast statistical alignment); Infernal (for rRNA); Prank 3. MAFFT, Muscle, ProbCons • Iterative “meta-solutions” (e.g. SAT` e ) allow MSA uncertainty to be incorporated in tree inference. • GBlocks (and similar tools) cull ambiguously aligned regions.

human KRSV chimp KRV orang KPRV

human chimp orangutan KPRV KRV KRSV del S KRSV S->R P->R KPSV

human KRSV chimp KRV gorilla KSV orang KPRV How should we align these sequences? human human KRSV KRSV chimp OR chimp KR-V K-RV gorilla gorilla KS-V K-SV orang orang KPRV KPRV

Pairwise alignment Gap penalties and a substitution matrix imply a score for any alignment. Pairwise alignment involves finding the alignment that maximizes this score. • substitution matrices assign positive values to matches or similar substitutions (for example Leucine → Isoleucine). • unlikely substitutions receive negative scores • gaps are rare and are heavily penalized (given large negative values).

Scoring an alignment. Simplest case Costs: Match 1 Mismatch 0 Gap -5 Alignment: Pongo V D E V G G E L G R L F V V P T Q Gorilla V E V A G D L G R L L I V Y P S R Score 1 0 0 0 1 0 0 0 0 0 1 0 1 0 1 0 0 Total score = 5

Scoring an different alignment. Simplest case Match 1 Mismatch 0 Gap -5 V D E V G G E L G R L - F V V P T Q Pongo Gorilla V - E V A G D L G R L L I V Y P S R Score 1 -5 1 1 0 1 0 1 1 1 1 -5 0 1 0 1 0 0 Total score = 0

Kingmans coalescent Random collision of lineages as go back in time - PowerPoint PPT Presentation

Kingmans coalescent Random collision of lineages as go back in time (sans recombination) Collision is faster the smaller the effective population size u9 In a diploid population of u8 Average time for u7 u6 effective population size N,

The Metric Coalescent joint with David Aldous Daniel Lanoue University of California, Berkeley

The Metric Coalescent Process joint with David Aldous Daniel Lanoue June 17, 2014 Daniel Lanoue

Quartet Inference from SNP Data Under the Coalescent Model Syed Shalan Naqvi Quartet Inference

Fundamentals of Evoluon Fundamentals of Evoluon EEEB G6110 EEEB G6110 Session 11: The

Kingman Park Historic District (Proposed) Historic Preservation Review Board D.C. Historic

The Coalescent Evolution backward in time Joachim Hermisson Mathematics and Biosciences Group

SCOTTI: Inferring transmission with the Structured Coalescent Nicola De Maio, Chieh-Hsi Wu,

X X J. Morabito R. R. Reilly 3/21/01 Coalescent Knowledge Slide No: 1 (KKM)

Scaling and local limits of Baxter permutations and bipolar orientations through coalescent-walk

Fundamentals of Evoluon Fundamentals of Evoluon EEEB G6110 EEEB G6110 Session 12:

Decaying Majoron DM and Structure Formation kingman Cheung September 11 2019 TAUP References

Management Issues in Hypoglossal Stimulation for OSA Kingman P Strohl M.D. Professor of

Statistical binning enables an accurate coalescent-based estimation of the avian tree Siavash

Quartet Inference from SNP Data Under the Coalescent Model

Presented by Farzaneh Khajouei The Mul7species Coalescent

Coalescent-based models for selection and superspreading in viral phylogenies Patrick Hoscheit

NEW COMPUTATIONAL RESULTS ON SOLVING THE SEQUENTIAL PROCEDURE WITH FEEDBACK David Boyce,

Implementation Status of Fukushima Lessons Learned at Duke Energy Bill Pitesa Senior Vice

Third Quarter 2010 Investor Call Investor Call Terry Turner, President and CEO Harold Carpenter,

Introduction CSCE CSCE 471/871 471/871 Lecture 6: Lecture 6: Multiple Multiple CSCE

Eigensystem multiscale analysis for MSA Key step Anderson localization in energy intervals II

Two proofs of Strmers theorem Stanislaw Szarek Case Western Reserve/Paris 6 Paris, October

Supportin Supporting g children children and youths and youths from vulnerabl from vulnerable

Mathematics Review for MS Finance Students Anthony M. Marino Department of Finance and Business