A Two-State Model of Tree Evolution and Its Applications to Alu Retrotransposition




SLIDE 1

NIEMA MOSHIRI AND SIAVASH MIRARAB Presented by: Surbhi Jain

A Two-State Model of Tree Evolution and Its Applications to Alu Retrotransposition

SLIDE 2

INTRODUCTION

SLIDE 3

[Figure: genome composition chart. Labels: protein-coding DNA; intergenic regions; Alu (11%); L1 (17%)]

Background

  • Approx. 6 billion base pairs of DNA in each diploid human cell
  • Only 3–10% actually code for proteins
  • 90–97% are intergenic regions
  • Within these intervening regions are repeating elements, including SINEs (short interspersed nuclear elements)

  • Alu most common
  • ~11% of the human genome
  • >1 million copies

Adapted from presentation on Alu elements. PCR Workshop (2005).

SLIDE 4

Alu elements

  • Alu elements probably arose from a gene that encodes the RNA component of the signal recognition particle, which labels proteins for export from the cell.
  • Roughly 1 million copies, about 11% of the total genome
  • The recognition site for the restriction enzyme AluI (AG^CT) is found within the Alu region, hence the name.
  • Approx. 300 bp in length
  • Alu does not encode any functional molecules and depends on the machinery of the active class of repetitive elements in order to be copied and moved about the genome.
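
The AluI recognition site mentioned above can be illustrated with a small sketch. It scans a DNA string for AGCT and cuts between G and C (AG^CT, a blunt cut). The function names and the example sequence are hypothetical and serve only as an illustration.

```python
import re
from typing import List

ALUI_SITE = "AGCT"  # AluI recognition sequence
CUT_OFFSET = 2      # blunt cut between G and C: AG^CT


def alui_cut_positions(seq: str) -> List[int]:
    """Return the indices at which AluI would cut the given sequence."""
    seq = seq.upper()
    return [m.start() + CUT_OFFSET for m in re.finditer(ALUI_SITE, seq)]


def alui_digest(seq: str) -> List[str]:
    """Split the sequence into the fragments produced by a complete digest."""
    seq = seq.upper()
    bounds = [0] + alui_cut_positions(seq) + [len(seq)]
    return [seq[a:b] for a, b in zip(bounds, bounds[1:])]


# Made-up example sequence containing two AGCT sites
fragments = alui_digest("TTAGCTGGAGCTAA")
```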

SLIDE 5

Retrotransposon - “Jumping Gene”

  • Copy-and-paste model
  • Transcribed into mRNA by RNA polymerase
  • Converted to double-stranded DNA by reverse transcriptase
  • Integrated into a different spot in the genome at the site of a single- or double-stranded break
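
The copy-and-paste steps above can be mimicked with a toy string model. This is only a sketch: the function name, the T/U substitution standing in for transcription and reverse transcription, and the example sequence are all assumptions made for illustration, not a biological simulation.

```python
def retrotranspose(genome: str, start: int, end: int, target: int) -> str:
    """Copy genome[start:end] and paste the copy at index `target`.

    The source element stays in place, so the genome grows by the
    length of the element (copy and paste, not cut and paste).
    """
    element = genome[start:end]       # source copy is left untouched
    rna = element.replace("T", "U")   # toy "transcription" into RNA
    cdna = rna.replace("U", "T")      # toy "reverse transcription" to DNA
    return genome[:target] + cdna + genome[target:]


# Made-up genome with one element (GGCCGG) at positions 4..9
genome = "AAAAGGCCGGTTTT"
new_genome = retrotranspose(genome, 4, 10, 12)
```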

Batzer, M. and Deininger, P. (2002). Alu repeats and human genomic diversity. Nature Reviews Genetics.

SLIDE 6

Why study Alu elements?

For biologists:

Impact on the genome

– insertion mutations
– recombination between elements
– gene conversion
– gene expression

Implicated in human diseases:

– Neurofibromatosis
– Haemophilia
– Familial hypercholesterolaemia
– Breast cancer
– Insulin-resistant diabetes type II
– Ewing sarcoma

Impact on genome regulation

– distribution of methylation
– transcription of genes throughout the genome

Transcription of Alu elements

– changes in response to cellular stress
– might be involved in maintaining or regulating the cellular stress response

Batzer, M. and Deininger, P. (2002). Alu repeats and human genomic diversity. Nature Reviews Genetics.

SLIDE 7

Why study Alu elements?

For phylogeneticists:

  • Alu elements are a primary source for the origin of simple sequence repeats in primate genomes
  • Alu-insertion polymorphisms are a boon for the study of human population genetics and primate comparative genomics because they are neutral, identical-by-descent genetic markers with known ancestral states
  • Phylogenetic analysis of Alu elements belonging to the Alu Ye5 subfamily has provided the strongest evidence yet that the chimp is humans' closest living relative

Batzer, M. and Deininger, P. (2002). Alu repeats and human genomic diversity. Nature Reviews Genetics.

SLIDE 8

Batzer, M. and Deininger, P. (2002). Alu repeats and human genomic diversity. Nature Reviews Genetics.

SLIDE 9

Dual Birth Model

$\lambda_a$, $\lambda_b$ = birth parameters; $r = \lambda_a / \lambda_b$; children are not exchangeable

  • The right child of any branch is always active while the left one is inactive.
  • Active entities propagate with rate $\lambda_b$ (b for “birth”), and inactive entities activate and simultaneously propagate with rate $\lambda_a$ (a for “activation”)
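
The two rules above can be turned into a minimal event-driven simulation. This is a sketch under stated assumptions, not the authors' simulator: the function name is hypothetical, the process is assumed to start from a single active lineage, and it stops as soon as the requested number of lineages exists.

```python
import random


def simulate_dual_birth(rate_a, rate_b, n_leaves, seed=0):
    """Toy simulation of a two-state (dual-birth) branching process.

    rate_a: rate at which an inactive lineage activates and propagates
    rate_b: rate at which an active lineage propagates
    Returns (splits, leaves): each entry is a (state, branch_length) pair.
    """
    rng = random.Random(seed)
    lineages = [("active", 0.0)]  # ASSUMPTION: start with one active lineage
    t = 0.0
    splits = []
    while len(lineages) < n_leaves:
        rates = [rate_b if s == "active" else rate_a for s, _ in lineages]
        t += rng.expovariate(sum(rates))  # waiting time to the next event
        i = rng.choices(range(len(lineages)), weights=rates)[0]
        state, born = lineages.pop(i)
        splits.append((state, t - born))  # branch that just split
        lineages.append(("inactive", t))  # left child is inactive
        lineages.append(("active", t))    # right child is active
    leaves = [(s, t - born) for s, born in lineages]
    return splits, leaves


splits, leaves = simulate_dual_birth(rate_a=0.1, rate_b=1.0, n_leaves=50)
```

With a small ratio of activation rate to birth rate, inactive lineages tend to persist as long pendant branches, while active lineages split repeatedly.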

Moshiri, N. and Mirarab, S. (2017). A Two-State Model of Tree Evolution and Its Applications to Alu Retrotransposition. Systematic Biology.

SLIDE 10

METHODS

SLIDE 11
  • Fixed-n sampling procedure to generate 20 replicate “true” trees
  • 6 experiments each varying a single parameter

Moshiri, N. and Mirarab, S. (2017). A Two-State Model of Tree Evolution and Its Applications to Alu Retrotransposition. Systematic Biology.

  • Tree inference: FastTreeII and RAxML
  • r estimation: cherry-based and length-based estimators
  • Error measurement: normalized Robinson–Foulds (RF) distance and Matching Split (MS) metric
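
For readers unfamiliar with the RF metric, the sketch below computes a normalized Robinson–Foulds distance between two trees. It makes several assumptions: trees are given as nested tuples of leaf names (a hypothetical representation chosen for brevity), they are compared as unrooted topologies, and the distance is normalized by the total number of non-trivial splits in both trees, which is a common convention but may differ from the exact normalization used in the paper.

```python
def _leaf_sets(tree, out):
    """Collect the leaf set below every internal node of a nested-tuple tree."""
    if isinstance(tree, str):
        return frozenset([tree])
    leaves = frozenset().union(*(_leaf_sets(child, out) for child in tree))
    out.append(leaves)
    return leaves


def splits(tree):
    """Return the set of non-trivial bipartitions (splits) of the tree."""
    clusters = []
    all_leaves = _leaf_sets(tree, clusters)
    result = set()
    for cluster in clusters:
        rest = all_leaves - cluster
        if len(cluster) > 1 and len(rest) > 1:  # skip trivial splits
            result.add(frozenset([cluster, rest]))
    return result


def normalized_rf(tree1, tree2):
    """Symmetric difference of split sets divided by the total split count."""
    a, b = splits(tree1), splits(tree2)
    total = len(a) + len(b)
    return len(a ^ b) / total if total else 0.0


true_tree = ((("A", "B"), "C"), ("D", "E"))
inferred_tree = ((("A", "C"), "B"), ("D", "E"))
error = normalized_rf(true_tree, inferred_tree)
```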

Simulations

SLIDE 12

Human Alu dataset

  • Dataset of 885,011 Alu repeats

– Human Alu profile hidden Markov models (profile HMMs) from the Dfam database
– nhmmer used to scan the hg19 reference genome

  • Alignment: PASTA
  • Tree inference: FastTreeII and RAxML

Moshiri, N. and Mirarab, S. (2017). A Two-State Model of Tree Evolution and Its Applications to Alu Retrotransposition. Systematic Biology.

SLIDE 13

RESULTS

SLIDE 14

Moshiri, N. and Mirarab, S. (2017). A Two-State Model of Tree Evolution and Its Applications to Alu Retrotransposition. Systematic Biology.

Simulations

SLIDE 15

Moshiri, N. and Mirarab, S. (2017). A Two-State Model of Tree Evolution and Its Applications to Alu Retrotransposition. Systematic Biology.

Simulations

SLIDE 16

Moshiri, N. and Mirarab, S. (2017). A Two-State Model of Tree Evolution and Its Applications to Alu Retrotransposition. Systematic Biology.

Human Alu dataset

  • 7% of Alu repeats have propagated at least once
  • $\lambda_a$ (activation events per inactive element per year) = $1.426 \times 10^{-8}$
  • $\lambda_b$ (propagation events per active element per year) = $2.384 \times 10^{-6}$
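
As a quick arithmetic check on the two rates reported above, the sketch below computes their ratio r (activation rate divided by propagation rate, as defined on the Dual Birth Model slide) and the mean waiting time implied by each rate. The waiting times assume constant-rate (exponential) events, for which the mean is the reciprocal of the rate; the variable names are illustrative.

```python
# Rates as reported on the slide (events per element per year)
rate_activation = 1.426e-8   # per inactive element
rate_propagation = 2.384e-6  # per active element

# Ratio r = activation rate / propagation rate
r = rate_activation / rate_propagation

# Mean waiting times under a constant-rate (exponential) assumption
mean_years_to_activation = 1.0 / rate_activation
mean_years_to_propagation = 1.0 / rate_propagation

print(f"r = {r:.5f}")
print(f"mean time to activation:  {mean_years_to_activation:.3e} years")
print(f"mean time to propagation: {mean_years_to_propagation:.3e} years")
```

The ratio comes out to roughly 0.006, meaning an inactive element is about two orders of magnitude less likely to propagate per unit time than an active one under these estimates.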
SLIDE 17

Moshiri, N. and Mirarab, S. (2017). A Two-State Model of Tree Evolution and Its Applications to Alu Retrotransposition. Systematic Biology.

Limitations and Future Directions

  • All elements are born into an inactive state, have an identical rate of activation at birth, and an identical rate of birth
  • Modeling deactivation would enable the estimation of the number of elements that are active at any specific point in time
  • Allowing deaths in addition to births
  • Discussion about the merit of the $\lambda_a$, $\lambda_b$ values
SLIDE 18

Thanks for your attention. Questions?

SLIDE 19

Questions from the class

  • How well do ML phylogeny models deal with retrotransposition? Is it the case that retrotransposition causes errors in typical models, or is the birth-death model just significantly better in these cases?

  • How much do birth-death models change the performance on canonical applications of phylogeny?
  • What are some technical difficulties in incorporating the deactivation process in the model?
  • Is there a possible explanation for why n did not influence the mean tree error much but has a large impact on the variance of the tree error? (Figure 3(a), lower center)
  • Is there any way to assess the validity of their results? How accurate are their estimations of Alu parameters?

  • Are there any additions to their model that can capture more of the biological complexity of these sequences?

  • Why is dominance of an Alu insertion governed by an element being under "selective pressure"?
  • How would the MCMC approach be used to estimate r in this context?
  • What is the process of setting a node ’active’ supposed to represent in biology?
  • They assume a molecular clock when estimating the activation and propagation events per year; is this a reasonable assumption?

  • Is it correct to say that translating the standard time-reversible models of evolution to this model would involve using two sets of substitution parameters, one for each child edge of an internal node?

  • The paper states that Alu elements have no known biological function of their own, but that studying them can provide insights into their contributions to genetic disease. So, have previous studies positively linked their presence in the genome to any specific diseases?

  • Does the dual-birth model lose anything by not accounting for death rates, where branches can go extinct with a constant rate?

SLIDE 20

Supplementary slides

  • For example, Price et al. (2004) used whole-genome Alu data to estimate the total number of active elements to have been at least 143 throughout the history of Alu elements
  • Wang et al. (2006) used human polymorphism data to estimate the number of currently active Alu elements to be at least 31
  • Wacholder and Pollock (2016) introduced a novel Bayesian transposable element ancestral reconstruction method and used it to estimate a lower bound of 1386 Alu elements to have ever been active
  • These studies look for strong evidence of transposition capability and do not rule out the possibility that other elements are able to propagate