
Statistical binning enables an accurate coalescent-based estimation of the avian tree



  1. Statistical binning enables an accurate coalescent-based estimation of the avian tree Siavash Mirarab, Md. Shamsuzzoha Bayzid, Bastien Boussau, and Tandy Warnow. Science (2014)

  2. Avian whole-genome phylogenies [Jarvis, Mirarab, et al., Science, 2014]
     48 representative birds
     [Plot: species tree error vs. data (i.e., # of genes), annotated "Hope!"]

  3. Gene tree discordance
     gene: recombination-free orthologous regions in genomes
     [Figure: trees for gene 1, gene 2, ..., gene 999, gene 1000 on Owl, Finch, Falcon, and Eagle, with differing topologies]

  4. Gene tree discordance
     [Figure: the species tree on Owl, Finch, Falcon, and Eagle, next to gene trees for gene 1, gene 2, ..., gene 999, gene 1000; a gene tree can differ from the species tree]

  5. Gene tree discordance
     [Figure: the species tree and the gene trees for gene 1, gene 2, ..., gene 999, gene 1000]
     Causes of gene tree discordance:
     • Incomplete Lineage Sorting (ILS)
       • Modeled by the multi-species coalescent
       • Highly probable for radiations (i.e., short branches) such as the bird radiation; 60 mya
       • The species tree is identifiable from the gene tree distribution [Degnan and Salter, 2005]
     • Duplication and loss
     • Horizontal Gene Transfer (HGT)
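A minimal Python sketch of why ILS is so likely on short branches, using the standard three-taxon result under the multi-species coalescent: for a rooted species tree ((A,B),C) whose internal branch has length t in coalescent units, the matching gene tree has probability 1 - (2/3)e^(-t) and each of the two discordant gene trees has probability (1/3)e^(-t). The taxon labels and the helper names (`triplet_probs`, `simulate_triplet`) are illustrative assumptions, not taken from the slides. Because 1 - (2/3)e^(-t) exceeds (1/3)e^(-t) for every t > 0, the most frequent gene tree still matches the species tree, a simple instance of identifiability from the gene tree distribution.

```python
import math
import random

TOPOLOGIES = ("((A,B),C)", "((A,C),B)", "((B,C),A)")


def triplet_probs(t):
    """Gene tree topology probabilities for rooted species tree ((A,B),C)
    under the multi-species coalescent; t is the internal branch length
    in coalescent units."""
    if t < 0:
        raise ValueError("branch length must be non-negative")
    p_discordant = math.exp(-t) / 3.0   # each of ((A,C),B) and ((B,C),A)
    p_match = 1.0 - 2.0 * p_discordant  # ((A,B),C)
    return {TOPOLOGIES[0]: p_match,
            TOPOLOGIES[1]: p_discordant,
            TOPOLOGIES[2]: p_discordant}


def simulate_triplet(t, rng):
    """Draw one gene tree topology. Lineages A and B coalesce on the internal
    branch at rate 1; if they fail to do so within t, all three lineages reach
    the root population, where each pair is equally likely to coalesce first."""
    if rng.expovariate(1.0) < t:
        return TOPOLOGIES[0]
    return rng.choice(TOPOLOGIES)


if __name__ == "__main__":
    rng = random.Random(42)
    for t in (0.1, 1.0, 3.0):
        probs = triplet_probs(t)
        n = 10_000
        freq = sum(simulate_triplet(t, rng) == TOPOLOGIES[0] for _ in range(n)) / n
        print(f"t={t}: P(match)={probs[TOPOLOGIES[0]]:.3f}, simulated={freq:.3f}")
```

With t = 0.1, the matching topology has probability of about 0.40 against about 0.30 for each alternative, so roughly 60% of gene trees disagree with the species tree even though the species tree topology remains the single most likely gene tree.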

  6. Species tree estimation from phylogenomic data (approach 1: concatenation)
     [Figure: multiple sequence alignments for gene 1, gene 2, ..., gene 999, gene 1000]

  7. Species tree estimation from phylogenomic data (approach 1: concatenation)
     [Figure: the alignments for gene 1, gene 2, ..., gene 999, gene 1000 are concatenated into one supermatrix]
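The concatenation step can be sketched in Python as below, assuming each gene alignment is a dict mapping taxon names to aligned sequences and that a taxon missing from a gene is padded with gaps. The function name and the assignment of sequence fragments to taxa in the example are hypothetical; the fragments themselves are ones shown on the slides. In practice the alignments would be read from files and the resulting supermatrix passed to an ML tree-inference program.

```python
def concatenate(alignments):
    """Concatenate per-gene alignments into one supermatrix.

    Each alignment is a dict mapping taxon name -> aligned sequence (all of
    equal length). Taxa absent from a gene are filled with gaps ('-') so that
    every row of the supermatrix has the same total length.
    """
    taxa = sorted({taxon for aln in alignments for taxon in aln})
    rows = {taxon: [] for taxon in taxa}
    for aln in alignments:
        if not aln:
            continue  # skip empty alignments
        widths = {len(seq) for seq in aln.values()}
        if len(widths) != 1:
            raise ValueError("sequences in one gene alignment differ in length")
        width = widths.pop()
        for taxon in taxa:
            rows[taxon].append(aln.get(taxon, "-" * width))
    return {taxon: "".join(parts) for taxon, parts in rows.items()}


# Hypothetical example: which taxon owns which fragment is an assumption.
gene1 = {"Owl": "ACTGCACACCG", "Finch": "ACTGC-CCCCG",
         "Falcon": "AATGC-CCCCG", "Eagle": "-CTGCACACGG"}
gene2 = {"Owl": "CTGAGCATCG", "Finch": "CTGAGC-TCG",
         "Falcon": "ATGAGC-TC-"}  # Eagle missing from this gene

supermatrix = concatenate([gene1, gene2])
for taxon, row in supermatrix.items():
    print(f"{taxon:<7}{row}")
```

Padding with gaps keeps every row the same length, which is what a single ML analysis over the combined matrix requires.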

  8. Species tree estimation from phylogenomic data (approach 1: concatenation)
     [Figure: ML (maximum likelihood) on the concatenated alignment yields a species tree on Owl, Finch, Falcon, and Eagle; one branch is labeled 81%]

  9. Species tree estimation from phylogenomic data (approach 1: concatenation)
     [Figure: ML concatenation producing a species tree on Owl, Finch, Falcon, and Eagle, plus a plot of error vs. data]
     - Statistically inconsistent & positively misleading [Roch and Steel, Theor. Pop. Biol., 2014]
     - Mixed accuracy in simulations [Kubatko and Degnan, Systematic Biology, 2007] [Mirarab, et al., Systematic Biology, 2014]

  10. Species tree estimation from phylogenomic data (approach 2: summary methods)
      [Figure: multiple sequence alignments for gene 1, gene 2, ..., gene 999, gene 1000]

  11. Species tree estimation from phylogenomic data (approach 2: summary methods)
      [Figure: a gene tree on Owl, Finch, Falcon, and Eagle is estimated from each gene alignment; the gene trees differ in topology]

  12. Species tree estimation from phylogenomic data (approach 2: summary methods)
      [Figure: a summary method combines the estimated gene trees into a species tree on Owl, Finch, Falcon, and Eagle; one branch is labeled 78%]

  13. Species tree estimation from phylogenomic data (approach 2: summary methods)
      [Figure: a summary method combining gene trees into a species tree, plus a plot of error vs. data]
      Can be statistically consistent under the multi-species coalescent
      • MP-EST (maximum pseudo-likelihood) [Liu, Yu, Edwards, BMC Evol. Biol., 2010]
      • BUCKy-pop., NJst, STAR, ASTRAL, …

  14. Species tree estimation from phylogenomic data (approach 2: summary methods)
      [Figure: a summary method combining gene trees into a species tree; the error vs. data plot now marks "True gene trees"]
      Can be statistically consistent
      • MP-EST (maximum pseudo-likelihood) [Liu, Yu, Edwards, BMC Evol. Biol., 2010]
      • BUCKy-pop., NJst, STAR, ASTRAL, …
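A toy summary method, sketched in Python under the assumption that gene trees are error-free rooted binary trees on four taxa written as nested tuples: reduce each gene tree to its unrooted quartet topology and return the most frequent one. This is not MP-EST or ASTRAL, and all names here are hypothetical; it only illustrates the idea behind quartet-based methods such as ASTRAL, which seek the species tree sharing the most quartet topologies with the input gene trees. For four taxa, the most probable unrooted gene tree under the multi-species coalescent matches the species tree, so this plurality rule is statistically consistent when the gene trees are correct.

```python
from collections import Counter


def leaves(tree):
    """Leaf names under a tree written as nested tuples of strings."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))


def quartet_topology(tree):
    """Unrooted topology of a binary tree on four leaves, returned as the
    split {pair, other pair}. Every rooted binary 4-leaf tree has at least
    one two-leaf clade (a cherry), and that cherry fixes the split."""
    all_leaves = leaves(tree)
    if len(all_leaves) != 4:
        raise ValueError("expected exactly four leaves")
    stack = [tree]
    while stack:
        node = stack.pop()
        if isinstance(node, str):
            continue
        clade = leaves(node)
        if len(clade) == 2:
            return frozenset([clade, all_leaves - clade])
        stack.extend(node)
    raise ValueError("no two-leaf clade found; is the tree binary?")


def most_frequent_quartet(gene_trees):
    """Toy summary method: the unrooted quartet seen in the most gene trees
    (ties are broken by first occurrence, as Counter.most_common does)."""
    counts = Counter(quartet_topology(t) for t in gene_trees)
    best, _ = counts.most_common(1)[0]
    return best, counts


# Hypothetical, error-free gene trees (not from the avian dataset).
gene_trees = [
    (("Owl", "Finch"), ("Falcon", "Eagle")),
    ((("Owl", "Finch"), "Falcon"), "Eagle"),
    (("Owl", "Falcon"), ("Finch", "Eagle")),
    ((("Falcon", "Eagle"), "Owl"), "Finch"),
]
best, counts = most_frequent_quartet(gene_trees)
print(" | ".join(",".join(sorted(side)) for side in sorted(best, key=sorted)))
```

Three of the four example gene trees induce the split Owl,Finch | Falcon,Eagle, so that quartet is returned. With estimated rather than true gene trees, errors in the input trees feed directly into the summary.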

  15. Gene trees on the avian dataset
      14,000 “genes”: 8,000 exons, 2,500 introns, and 3,500 Ultra-Conserved Elements
      [Histogram: branches (percentage, 0% to 20%) vs. branch bootstrap support (0% to 100%), with the median and mean marked]
      Branch bootstrap support: a measure of confidence in estimated gene tree branches

  16. Gene trees on the avian dataset
      14,000 “genes”: 8,000 exons, 2,500 introns, and 3,500 Ultra-Conserved Elements
      14,000 noisy gene trees
      [Histogram: branches (percentage, 0% to 20%) vs. branch bootstrap support (0% to 100%), with the median and mean marked]
      Branch bootstrap support: a measure of confidence in estimated gene tree branches
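The per-branch summary behind a support histogram like this one can be computed along these lines in Python, assuming bootstrap support values are percentages from 0 to 100. The function name, the input values, the 10-point bin width, and the 75% threshold are illustrative assumptions, not figures from the avian dataset.

```python
import statistics


def support_summary(supports, bin_width=10, threshold=75):
    """Summarize branch bootstrap support values given as percentages (0-100).

    Returns the mean, the median, the percentage of branches below
    `threshold`, and a histogram mapping each support bin (lo, hi) to the
    percentage of branches in it. `bin_width` is assumed to divide 100.
    """
    if not supports:
        raise ValueError("no support values given")
    n_bins = 100 // bin_width
    counts = [0] * n_bins
    for s in supports:
        if not 0 <= s <= 100:
            raise ValueError(f"support {s} is outside 0-100")
        counts[min(int(s // bin_width), n_bins - 1)] += 1  # 100% goes in the top bin
    n = len(supports)
    return {
        "mean": statistics.mean(supports),
        "median": statistics.median(supports),
        "pct_below_threshold": 100.0 * sum(s < threshold for s in supports) / n,
        "histogram": {(i * bin_width, (i + 1) * bin_width): 100.0 * c / n
                      for i, c in enumerate(counts)},
    }


# Hypothetical support values for eight gene tree branches.
summary = support_summary([100, 95, 80, 62, 40, 33, 100, 12])
print(summary["mean"], summary["median"], summary["pct_below_threshold"])
# prints: 65.25 71.0 50.0
```

Reporting both the mean and the median matters for skewed support distributions like the one in the histogram, where many branches sit near 100% and a long tail of poorly supported branches pulls the mean down.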
