CS 581 Paper Presentation Muhammad Samir Khan Recovering the - - PowerPoint PPT Presentation



slide-1
SLIDE 1

CS 581 Paper Presentation

Muhammad Samir Khan

Recovering the Treelike Trend of Evolution Despite Extensive Lateral Genetic Transfer: A Probabilistic Analysis by Sebastien Roch and Sagi Snir

slide-2
SLIDE 2

Overview

  • Introduction (what is LGT?)
  • Notation
  • Model
      • Bounded Rates Model
      • Yule Process
  • Quartet Based Approach
      • Bounded Rates Model
      • Yule Process
      • Preferential LGT
  • Further Results
slide-3
SLIDE 3

What is LGT?

  • Non-vertical transfer of genes
  • Overall evolution is tree-like
  • Particularly common in bacteria
  • Primary reason for the spread of antibiotic resistance [1]
  • 1. https://en.wikipedia.org/wiki/Horizontal_gene_transfer
  • 2. http://www.nature.com/nrmicro/journal/v3/n9/images/nrmicro1253-f1.gif
slide-4
SLIDE 4

Species Phylogeny

  • $T_s = (V_s, E_s, L_s : r, \tau)$
  • $V_s$: vertices
  • $E_s$: edges
  • $L_s$: leaves
  • $r$: root
  • $\tau(e)$: interspeciation times
  • Number of leaves $n = n^+ + n^-$
  • $n^+ > 0$ extant species
  • $n^- \geq 0$ extinct species

[Figure: example species phylogeny with the root $r$, extinct and extant leaves, and an edge length $\tau(e)$ labeled.]

slide-5
SLIDE 5

Extant Phylogeny

  • Denoted $T_s^+ = (V_s^+, E_s^+, L_s^+ : r^+, \tau^+)$
  • Restrict to extant leaves: $T_s|_{L_s^+}$
  • Suppress vertices of degree 2 (add up the branch lengths)
  • Root at the most recent common ancestor of $L_s^+$
  • $T_s^+$ is ultrametric
  • Want to recover the extant phylogeny

[Figure: species phylogeny with root $r$ drawn against a time axis.]

slide-6
SLIDE 6

Gene Trees

  • $T_g = (V_g, E_g, L_g : \omega_g)$ for a gene $g$ is an unrooted tree
  • $V_g$: vertices
  • $E_g$: edges
  • $L_g$: leaves, a subset of $L_s$
  • $\omega_g(e)$: branch lengths (expected number of substitutions)
  • Each vertex of degree 2 or 3
  • $\mathcal{T}_g = \mathcal{T}[T_g]$ is the topology of $T_g$ with degree 2 vertices suppressed
  • Not ultrametric
slide-7
SLIDE 7

LGT: Subtree Prune and Regraft

  • An LGT event takes place at locations along the edges
  • Recipient location: pruning
  • Donor location: regrafting
  • A new node at donor location
  • 1. Roch, S., & Snir, S. (2013). Recovering the treelike trend of evolution despite extensive lateral genetic transfer: a probabilistic analysis. Journal of Computational Biology, 20(2), 93-112.

slide-8
SLIDE 8

Contemporaneous Locations

  • Two locations $x, y$ are contemporaneous if their $\tau$-distance to the root is identical: $\tau(x, r) = \tau(y, r)$
  • For $R > 0$, $C_x^{(R)}$ is the set of locations contemporaneous to $x$ and with MRCA at $\tau$-distance at most $R$ from $x$: $C_x^{(R)} = \{ y : \tau(x, r) = \tau(y, r),\ \tau(x, y) \leq 2R \}$
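The factor of 2 in the set definition follows from a short calculation. The sketch below writes $m$ for the most recent common ancestor of the two locations, a symbol introduced here only for illustration.

```latex
% m denotes the MRCA of the locations x and y (notation assumed for this sketch).
\begin{align*}
\tau(x, r) &= \tau(r, m) + \tau(m, x), &
\tau(y, r) &= \tau(r, m) + \tau(m, y), \\
\tau(x, r) = \tau(y, r) &\;\Longrightarrow\; \tau(m, x) = \tau(m, y), \\
\tau(x, y) &= \tau(x, m) + \tau(m, y) = 2\,\tau(x, m), \\
\tau(x, y) \leq 2R &\;\Longleftrightarrow\; \tau(x, m) \leq R .
\end{align*}
```

So for contemporaneous locations, bounding the pairwise distance by $2R$ is the same as requiring the MRCA to lie within distance $R$.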

slide-9
SLIDE 9

Random LGT

  • Species phylogeny fixed: $T_s = (V_s, E_s, L_s : r, \tau)$
  • $0 < R \leq \infty$ (possibly depending on $n$)
  • Each edge has a rate of LGT $\lambda(e)$: $0 < \lambda(e) < +\infty$
  • $\Lambda(e) = \lambda(e)\,\tau(e)$
  • $\Lambda_{tot} = \sum_{e \in E_s} \Lambda(e)$
  • $\Lambda = \sum_{e \in E(T_s|_{L_s^+})} \Lambda(e)$
  • Taxon sampling probability $p$: $0 < p \leq 1$
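As a small worked example of the edge weights, the sketch below computes the product of rate and length per edge and sums it over all edges or over a subset. The edge names and numbers are made up for illustration, and the function names are hypothetical.

```python
def lgt_weights(edges):
    """edges: dict edge -> (LGT rate on the edge, edge length).

    Returns dict edge -> rate * length, the per-edge weight.
    """
    return {e: rate * length for e, (rate, length) in edges.items()}


def total_weight(weights, subset=None):
    """Sum the weights over all edges, or only over the edges in `subset`."""
    keys = weights.keys() if subset is None else subset
    return sum(weights[e] for e in keys)


# Hypothetical edges: (rate, length)
edges = {"e1": (0.5, 2.0), "e2": (0.25, 4.0), "e3": (1.0, 0.5)}
weights = lgt_weights(edges)
total_all = total_weight(weights)                   # sum over every edge
total_subset = total_weight(weights, {"e1", "e3"})  # sum over a chosen subset
```

Because recipient locations on an edge follow a Poisson process, the per-edge weight is the expected number of recipient locations on that edge.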
slide-10
SLIDE 10

Random LGT

  • LGT locations:
  • Start from root (chronologically)
  • Along each edge $e \in E_s$, select recipient locations according to a continuous-time Poisson process with rate $\lambda(e)$
  • If $x$ is selected as a recipient location, the donor location is selected uniformly at random from $C_x^{(R)}$
  • Keep each extant leaf independently with probability $p$, to get $L_g$
  • Gene tree $T_g$ is obtained by keeping the subtree restricted to $L_g$

slide-11
SLIDE 11

Bounded Rates Model

  • Constants:
  • $\rho_\lambda$: $0 < \rho_\lambda < 1$
  • $\rho_\tau$: $0 < \rho_\tau < 1$
  • $\bar{\tau}$: $0 < \bar{\tau} < +\infty$
  • $\bar{\lambda}$ possibly depending on $n^+$: $0 < \bar{\lambda} < +\infty$
  • Used to control the amount of LGT
  • Under the bounded rates model:

$\rho_\lambda \bar{\lambda} \leq \lambda(e) \leq \bar{\lambda} \quad \forall e \in E_s$
$\rho_\tau \bar{\tau} \leq \tau^+(e^+) \leq \bar{\tau} \quad \forall e^+ \in E_s^+$
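The two bounded-rates inequalities can be checked mechanically. The sketch below assumes the LGT rates and the extant-phylogeny edge lengths are given as dictionaries; the function name and the sample values are hypothetical.

```python
def satisfies_bounded_rates(rates, extant_lengths,
                            rho_lambda, lambda_bar, rho_tau, tau_bar):
    """Check both bounded-rates conditions.

    rates:          dict edge of the species phylogeny -> LGT rate
    extant_lengths: dict edge of the extant phylogeny  -> edge length
    """
    rates_ok = all(rho_lambda * lambda_bar <= lam <= lambda_bar
                   for lam in rates.values())
    lengths_ok = all(rho_tau * tau_bar <= t <= tau_bar
                     for t in extant_lengths.values())
    return rates_ok and lengths_ok


# Made-up values for illustration only.
ok = satisfies_bounded_rates(
    rates={"e1": 0.6, "e2": 1.0},
    extant_lengths={"f1": 0.5, "f2": 0.9},
    rho_lambda=0.5, lambda_bar=1.0, rho_tau=0.5, tau_bar=1.0,
)
```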

slide-12
SLIDE 12

Yule Process

  • Branching process that starts with two species
  • Each species generates a new offspring at rate $\nu$: $0 < \nu < +\infty$
  • No extinct species
  • Stop when number of species $= n + 1$ (ignore the last species)
  • $\rho_\lambda \bar{\lambda} \leq \lambda(e) \leq \bar{\lambda}$ for every edge $e \in E_s$
  • $\rho_\lambda$ constant: $0 < \rho_\lambda < 1$
  • $\bar{\lambda}$ possibly depending on $n$: $0 < \bar{\lambda} < +\infty$
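The Yule process described above can be simulated in a few lines. This sketch assumes integer node labels and a `children` dictionary, both hypothetical choices; it relies on the standard fact that with k species each splitting at the same rate, the waiting time to the next split is exponential with k times that rate and the splitting species is uniform.

```python
import random


def simulate_yule(n, nu, rng):
    """Simulate a Yule tree with n leaves (n >= 2) and speciation rate nu.

    Returns (children, birth, alive, stop_time), where `birth[v]` is the time
    at which node v appeared and `stop_time` is the moment the (n+1)-th species
    would have appeared (that species is ignored).
    """
    children = {0: [1, 2]}          # node 0 is the root, splitting at time 0
    birth = {0: 0.0, 1: 0.0, 2: 0.0}
    alive = [1, 2]
    t, next_id = 0.0, 3
    while True:
        # With k species alive, the next speciation occurs after Exp(k * nu).
        t += rng.expovariate(len(alive) * nu)
        if len(alive) == n:
            return children, birth, alive, t
        parent = alive.pop(rng.randrange(len(alive)))  # uniform choice
        children[parent] = [next_id, next_id + 1]
        birth[next_id] = birth[next_id + 1] = t
        alive += [next_id, next_id + 1]
        next_id += 2


children, birth, alive, stop_time = simulate_yule(n=5, nu=1.0, rng=random.Random(0))
```

All leaves are alive at `stop_time`, so measuring every pendant edge up to that time yields an ultrametric tree with no extinct species.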

slide-13
SLIDE 13

Quartet Based Approach

  • Input: Gene trees $T_{g_1}, \ldots, T_{g_N}$
  • Output: Estimated extant species phylogeny $\hat{T}$
  • Let $X = \{a, b, c, d\}$ be a four-tuple of extant species
  • Three possible quartets
  • $q_1 = ab|cd$
  • $q_2 = ac|bd$
  • $q_3 = ad|bc$
  • Frequency of quartet:

$f_X(q_i) = \dfrac{|\{ g_j : X \subseteq L_{g_j},\ \mathcal{T}_{g_j}|_X = q_i \}|}{|\{ g_j : X \subseteq L_{g_j} \}|}$

slide-14
SLIDE 14

Quartet Based Approach

  • 1. Roch, S., & Snir, S. (2013). Recovering the treelike trend of evolution despite extensive lateral genetic transfer: a probabilistic analysis. Journal of Computational Biology, 20(2), 93-112.

slide-15
SLIDE 15

Bounded Rates Model

  • 1. Roch, S., & Snir, S. (2013). Recovering the treelike trend of evolution despite extensive lateral genetic transfer: a probabilistic analysis. Journal of Computational Biology, 20(2), 93-112.

slide-16
SLIDE 16

Yule Process

  • 1. Roch, S., & Snir, S. (2013). Recovering the treelike trend of evolution despite extensive lateral genetic transfer: a probabilistic analysis. Journal of Computational Biology, 20(2), 93-112.

slide-17
SLIDE 17

Preferential LGT

  • 1. Roch, S., & Snir, S. (2013). Recovering the treelike trend of evolution despite extensive lateral genetic transfer: a probabilistic analysis. Journal of Computational Biology, 20(2), 93-112.

slide-18
SLIDE 18

Further Results

  • Highways of LGT
  • The same model as before with additional "highways"
  • Highways are pairs of edges where LGT occurs deterministically
  • Highways can be different for different genes
  • Same result holds under the bounded rates model
  • Assuming no extinctions
  • Frequency of genes affected by highways is low
  • Distance Based Approach under the GTR model
  • Compute the distance matrix by using the median of distances
  • Use any statistically consistent distance based method
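The median step of the distance-based approach can be sketched as follows. The input layout (one dictionary of pairwise distances per gene, keyed by an unordered pair of taxa) and the function name are assumptions for illustration; how the per-gene distances are estimated under the GTR model is outside this sketch.

```python
from statistics import median


def median_distance_matrix(gene_distances):
    """Combine per-gene pairwise distances by taking the median per pair.

    gene_distances: list of dicts mapping frozenset({x, y}) -> distance.
    A gene that lacks a pair of taxa simply does not contribute to that pair.
    """
    pooled = {}
    for dist in gene_distances:
        for pair, d in dist.items():
            pooled.setdefault(pair, []).append(d)
    return {pair: median(values) for pair, values in pooled.items()}


ab, ac = frozenset("ab"), frozenset("ac")
# Made-up distances; the third gene has an outlying a-b distance.
gene_distances = [
    {ab: 1.0, ac: 2.0},
    {ab: 1.2, ac: 4.0},
    {ab: 5.0},
]
combined = median_distance_matrix(gene_distances)
```

The median ignores the outlying value in the third gene, which is the kind of robustness one would want when only a minority of genes are distorted by transfers. The combined matrix can then be passed to a distance-based tree-building method.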
slide-19
SLIDE 19

Questions?