Data Requirement for Species Tree from Multiple Gene Trees

SLIDE 1

Data Requirement for Species Tree from Multiple Gene Trees

(Dasarathy, Nowak, Roch 2015)

Daewon Seo

  • Mar. 14, 2017
SLIDE 2

Introduction

  • Incomplete lineage sorting (ILS), …
  • Gene tree topologies can differ from the species tree topology
  • Two statistically consistent algorithms: GLASS and STEAC
  • Key parameter: the smallest species tree branch length
  • Assuming perfect gene trees are given,
  • GLASS: # of genes ~ O(…)
  • STEAC: # of genes ~ O(…)
SLIDE 3

Introduction

  • Focus on the data length (≜ …), not the number of genes (≜ …)
  • In a single gene tree, to reconstruct the topology with high probability, … ~ O(1/…)
  • Therefore, in GLASS, … ~ O(1/…)
  • In STEAC, … ~ O(1/…)

SLIDE 4

Introduction

  • METAL: a modified STEAC algorithm
  • … ~ O(…) for any … ≥ 1, … ~ Θ(…)
  • While STEAC needs the molecular clock assumption, METAL does not
SLIDE 5

Gene Tree Generation Process

  • Given an unknown species tree, …: time of the species tree
  • …: random time of the gene tree branch
  • Sequence samples are drawn under the JC (Jukes-Cantor) model on each gene tree
  • Example: Gene 1: ((A,B),C); Gene 2: (A,(B,C)) (discordant)
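
The discordance in the example above can be illustrated with a small simulation. This is a minimal sketch, assuming a three-species tree ((A,B),C), one sampled lineage per species, and an internal branch of length t in coalescent units; the function names and the parameter t are illustrative and do not come from the slides. In this standard three-taxon setting, the gene tree matches the species tree with probability 1 − (2/3)·exp(−t).

```python
import math
import random


def sample_gene_topology(t, rng):
    """Sample one gene tree topology for the species tree ((A,B),C).

    t: internal branch length in coalescent units (assumed notation).
    """
    # Inside the ancestral AB population the two lineages coalesce at rate 1.
    if rng.expovariate(1.0) < t:
        return "((A,B),C)"
    # Otherwise all three lineages reach the root population, where the
    # first pair to coalesce is uniform over the three possible pairs.
    return rng.choice(["((A,B),C)", "((A,C),B)", "(A,(B,C))"])


def concordance_probability(t):
    """Closed-form probability that the gene tree matches ((A,B),C)."""
    return 1.0 - (2.0 / 3.0) * math.exp(-t)


def estimate_concordance(t, n_genes, seed=0):
    """Monte Carlo estimate of the concordance probability."""
    rng = random.Random(seed)
    hits = sum(
        sample_gene_topology(t, rng) == "((A,B),C)" for _ in range(n_genes)
    )
    return hits / n_genes
```

As t shrinks toward 0 the three topologies become equally likely, which is why a short species branch makes the species tree hard to recover from gene trees.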

SLIDE 6

METAL

Concatenation is good!

SLIDE 7

METAL with molecular clock

  • Take the normalized Hamming distance p̂ of the concatenated sequences
  • [Thm 1] p̂ is ultrametric
  • [Thm 2] UPGMA works! (other methods as well)
  • To achieve error less than …, … and … ≥ 1
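
The clock-case pipeline above (concatenate, take normalized Hamming distances, run UPGMA) can be sketched as follows. This is an illustration under assumptions, not the authors' implementation: the input format (one dict of aligned sequences per gene), the function names, and the plain average-linkage UPGMA are all choices made here.

```python
from itertools import combinations


def concatenate(genes):
    """genes: list of dicts mapping species name -> aligned sequence."""
    species = sorted(genes[0])
    return {s: "".join(g[s] for g in genes) for s in species}


def normalized_hamming(x, y):
    """Fraction of sites at which two aligned sequences differ."""
    if len(x) != len(y):
        raise ValueError("sequences must have the same length")
    return sum(a != b for a, b in zip(x, y)) / len(x)


def distance_matrix(seqs):
    """Pairwise distances keyed by frozenset({species1, species2})."""
    return {
        frozenset((a, b)): normalized_hamming(seqs[a], seqs[b])
        for a, b in combinations(sorted(seqs), 2)
    }


def upgma(dist, leaves):
    """Return the UPGMA topology as nested tuples of leaf names."""
    clusters = {name: (name, 1) for name in leaves}  # id -> (subtree, size)
    d = dict(dist)
    next_id = 0
    while len(clusters) > 1:
        pair = min(d, key=d.get)            # closest pair of clusters
        i, j = sorted(pair, key=str)
        (ti, ni), (tj, nj) = clusters.pop(i), clusters.pop(j)
        new = ("node", next_id)
        next_id += 1
        for k in clusters:                  # size-weighted average linkage
            dik = d.pop(frozenset((i, k)))
            djk = d.pop(frozenset((j, k)))
            d[frozenset((new, k))] = (ni * dik + nj * djk) / (ni + nj)
        del d[pair]
        clusters[new] = ((ti, tj), ni + nj)
    tree, _ = next(iter(clusters.values()))
    return tree
```

Calling `upgma(distance_matrix(concatenate(genes)), species_names)` returns a nested tuple such as `(("A", "B"), "C")`. Any other method that is correct on ultrametric input could replace the UPGMA step.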

SLIDE 8

METAL with non-molecular clock

  • The normalized Hamming distance p̂ is no longer ultrametric, so a new metric satisfying the four-point condition is needed
  • … if and only if …
  • Set … = −(3/4) log(1 − (4/3) p̂)

[Figure: tree on leaves A, B, C, D]
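
The log correction and the four-point condition can be checked numerically. The sketch below is a simplified illustration: it uses a single four-leaf tree under the Jukes-Cantor model rather than the multi-gene setting of the talk, and the function names are assumptions made here. For leaves a, b, c, d, the four-point condition says that among the three sums D(a,b)+D(c,d), D(a,c)+D(b,d), D(a,d)+D(b,c), the two largest are equal.

```python
import math


def jc_corrected_distance(p_hat):
    """Jukes-Cantor correction: -(3/4) * log(1 - (4/3) * p_hat)."""
    if not 0.0 <= p_hat < 0.75:
        raise ValueError("p_hat must lie in [0, 0.75)")
    return -0.75 * math.log(1.0 - (4.0 / 3.0) * p_hat)


def jc_expected_mismatch(dist):
    """Expected fraction of differing sites at JC distance dist."""
    return 0.75 * (1.0 - math.exp(-(4.0 / 3.0) * dist))


def satisfies_four_point(D, a, b, c, d, tol=1e-9):
    """True if the two largest of the three pairwise sums agree."""
    def get(x, y):
        return D[frozenset((x, y))]

    sums = sorted([
        get(a, b) + get(c, d),
        get(a, c) + get(b, d),
        get(a, d) + get(b, c),
    ])
    return abs(sums[2] - sums[1]) <= tol
```

On an additive tree metric the raw mismatch fractions generally fail the check because the map from distance to mismatch fraction is concave, while the corrected values pass it.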

SLIDE 9

METAL with non-molecular clock

  • [Thm 3] … satisfies the four-point condition
  • Thus, the log-corrected estimate, … log(1 − …), is our corrected distance
  • [Thm 4] The error probability of NJ (neighbor joining) can be characterized over …, when … is small enough and … ≥ 1

SLIDE 10

Discussion

  • What is the exact tradeoff between … and …?
  • A hypothesis-testing argument gives … ∈ Ω(…)
  • Steel and Szekely (2002): … ∈ Ω(…), …
  • This paper: … ∈ …, … ≥ 1
  • GLASS: … ∈ …, … ∈ … ⇒ … ∈ …
  • What if the mutation rate varies across gene trees?
SLIDE 11

Thank you!