A Two-State Model of Tree Evolution and Its Applications to Alu Retrotransposition
NIEMA MOSHIRI AND SIAVASH MIRARAB
Presented by: Surbhi Jain
INTRODUCTION
[Figure: genome composition (protein-coding DNA; intergenic regions; Alu, 11%; L1, 17%)]
Background
- Approx. 6 billion base pairs of DNA in each (diploid) human cell
- Only 3–10% actually codes for proteins
- 90–97% is intergenic regions
- Within these intervening regions are repeating elements, including SINEs (short interspersed nuclear elements)
- Alu most common
- ~11% of the human genome
- >1 million copies
Adapted from presentation on Alu elements. PCR Workshop (2005).
Alu elements
- Alu elements probably arose from a gene that encodes the RNA
component of the signal recognition particle, which labels proteins for export from the cell.
- Roughly 1 million copies, about 11% of the total genome
- The recognition site for the restriction enzyme AluI (AG^CT) is found within the Alu region, hence the name.
- Approx 300 bp in length
- Alu does not encode any functional molecules and depends on
the machinery of the active class of repetitive elements in order to be copied and moved about the genome.
Retrotransposon: “Jumping Gene”
- Copy-and-paste model
- Transcribed into mRNA by RNA polymerase
- Converted to double-stranded DNA by reverse transcriptase
- Integrated into a different spot in the genome at the site of a single- or double-stranded break
Batzer, M. and Deininger, P. (2002). Alu repeats and human genomic diversity. Nature Reviews Genetics.
Why study Alu elements?
For biologists:
Impact on the genome
– insertion mutations
– recombination between elements
– gene conversion
– gene expression
Implicated in human diseases:
– Neurofibromatosis
– Haemophilia
– Familial hypercholesterolaemia
– Breast cancer
– Insulin-resistant diabetes type II
– Ewing sarcoma
Impact on genome regulation
– distribution of methylation
– transcription of genes throughout the genome
Transcription of Alu elements
– changes in response to cellular stress
– might be involved in maintaining or regulating the cellular stress response
Batzer, M. and Deininger, P. (2002). Alu repeats and human genomic diversity. Nature Reviews Genetics.
Why study Alu elements?
For phylogeneticists:
- Alu elements are a primary source for the origin of simple sequence
repeats in primate genomes
- Alu-insertion polymorphisms are a boon for the study of human
population genetics and primate comparative genomics because they are neutral, identical-by-descent genetic markers with known ancestral states
- Phylogenetic analysis of Alu elements belonging to the Alu Ye5 subfamily
has provided the strongest evidence yet that the chimp is humans' closest living relative
Batzer, M. and Deininger, P. (2002). Alu repeats and human genomic diversity. Nature Reviews Genetics.
Dual Birth Model
λ_a, λ_b = birth parameters; r = λ_a / λ_b; not exchangeable
- The right child of any branch is always active, while the left one is inactive.
- Active entities propagate with rate λ_b (b for “birth”), and inactive entities activate and simultaneously propagate with rate λ_a (a for “activation”).
Moshiri, N. and Mirarab, S. (2017). A Two-State Model of Tree Evolution and Its Applications to Alu Retrotransposition. Systematic Biology.
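The process described on this slide can be sketched as a small Gillespie-style simulation that tracks only how many lineages are active and inactive. This is a minimal illustration, not the authors' simulator: the function name `simulate_dual_birth`, its parameters, and the choice to start the clock right after the root split (one active and one inactive lineage) are assumptions made for the example.

```python
import random


def simulate_dual_birth(rate_a, rate_b, n_leaves, seed=None):
    """Simulate lineage counts under a dual-birth process.

    rate_a: activation rate of each inactive lineage (hypothetical name)
    rate_b: birth rate of each active lineage (hypothetical name)
    Returns (elapsed_time, n_active, n_inactive) once n_leaves lineages exist.
    """
    if rate_a <= 0 or rate_b <= 0:
        raise ValueError("rates must be positive")
    if n_leaves < 2:
        raise ValueError("n_leaves must be at least 2")

    rng = random.Random(seed)
    # Assumption: start just after the root split, which yields one
    # active (right) child and one inactive (left) child.
    n_active, n_inactive = 1, 1
    elapsed = 0.0

    while n_active + n_inactive < n_leaves:
        total_rate = n_active * rate_b + n_inactive * rate_a
        # Waiting time to the next event is exponential in the total rate.
        elapsed += rng.expovariate(total_rate)
        if rng.random() < n_active * rate_b / total_rate:
            # An active lineage propagates: it is replaced by one active
            # and one inactive child, so only the inactive count grows.
            n_inactive += 1
        else:
            # An inactive lineage activates and propagates: it is replaced
            # by one active and one inactive child, so the active count grows.
            n_active += 1
    return elapsed, n_active, n_inactive


if __name__ == "__main__":
    t, active, inactive = simulate_dual_birth(0.01, 1.0, 1000, seed=42)
    print(f"time={t:.2f} active={active} inactive={inactive}")
```

Every event adds exactly one lineage, so the loop runs `n_leaves - 2` times. With a small ratio of activation rate to birth rate, most lineages end up inactive, which is the asymmetry the model is meant to capture.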
METHODS
Simulations
- Fixed-n sampling procedure to generate 20 replicate “true” trees
- 6 experiments, each varying a single parameter
- Tree inference: FastTreeII & RAxML
- r estimation: cherry-based & length-based estimators
- Error measurement: normalized Robinson–Foulds (RF) distance & Matching Split (MS) metric
Moshiri, N. and Mirarab, S. (2017). A Two-State Model of Tree Evolution and Its Applications to Alu Retrotransposition. Systematic Biology.
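As a rough illustration of the normalized RF error measure named above, the sketch below compares two trees by their non-trivial bipartitions. It assumes trees are given as nested Python tuples with string leaf labels, and it normalizes by the total number of non-trivial splits in both trees. The function names are hypothetical, and this is not the tooling used in the paper.

```python
def leaf_sets(tree, out):
    """Return the leaves under `tree`; append every internal clade to `out`."""
    if not isinstance(tree, tuple):
        return frozenset([tree])
    leaves = frozenset().union(*(leaf_sets(child, out) for child in tree))
    out.append(leaves)
    return leaves


def splits(tree):
    """Return the set of non-trivial bipartitions of a tree.

    Each bipartition is stored as the side that does NOT contain a fixed
    reference leaf, so the two sides of one split map to the same key.
    """
    clades = []
    all_leaves = leaf_sets(tree, clades)
    ref = min(all_leaves)
    result = set()
    for clade in clades:
        side = all_leaves - clade if ref in clade else clade
        # Skip trivial splits (a single leaf versus everything else).
        if 1 < len(side) < len(all_leaves) - 1:
            result.add(side)
    return result


def normalized_rf(tree1, tree2):
    """Symmetric difference of split sets, divided by the total split count."""
    s1, s2 = splits(tree1), splits(tree2)
    total = len(s1) + len(s2)
    return len(s1 ^ s2) / total if total else 0.0


if __name__ == "__main__":
    true_tree = ((("A", "B"), "C"), ("D", "E"))
    inferred = ((("A", "C"), "B"), ("D", "E"))
    print(normalized_rf(true_tree, inferred))  # 0.5
```

In the example, each tree has two non-trivial splits and they share one of them (D, E versus the rest), so two of the four splits disagree.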
Human Alu dataset
- Dataset of 885,011 Alu repeats
– Human Alu profile hidden Markov models (profile HMMs) from the Dfam database
– nhmmer to scan the hg19 reference genome
- Alignment: PASTA
- Tree inference: FastTreeII and RAxML
Moshiri, N. and Mirarab, S. (2017). A Two-State Model of Tree Evolution and Its Applications to Alu Retrotransposition. Systematic Biology.
RESULTS
Simulations
Moshiri, N. and Mirarab, S. (2017). A Two-State Model of Tree Evolution and Its Applications to Alu Retrotransposition. Systematic Biology.
Human Alu dataset
- 7% of Alu repeats have propagated at least once
- λ_a (activation events per inactive element per year) = 1.426 × 10^-8
- λ_b (propagation events per active element per year) = 2.384 × 10^-6
Moshiri, N. and Mirarab, S. (2017). A Two-State Model of Tree Evolution and Its Applications to Alu Retrotransposition. Systematic Biology.
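To put the two rate estimates on this slide in perspective, the short calculation below derives their ratio r and the implied mean waiting times. It assumes constant rates, so waiting times are exponential with mean 1/rate. The variable names are chosen for this example only.

```python
# Rate estimates reported on the slide (events per element per year).
lambda_a = 1.426e-8  # activation rate of an inactive element
lambda_b = 2.384e-6  # propagation rate of an active element

# Ratio of activation rate to birth rate.
r = lambda_a / lambda_b

# Under a constant-rate (exponential) assumption, mean waiting time = 1 / rate.
mean_years_to_activation = 1.0 / lambda_a
mean_years_between_propagations = 1.0 / lambda_b

print(f"r = {r:.5f}")
print(f"mean years until an inactive element activates: {mean_years_to_activation:.3e}")
print(f"mean years between propagations of an active element: {mean_years_between_propagations:.3e}")
```

The ratio comes out near 0.006, meaning activation is roughly two orders of magnitude slower than propagation under these estimates.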
Limitations and Future Directions
- All elements are born into an inactive state, have
identical rate of activation at birth, and an identical rate of birth
- Modeling deactivation would enable the estimation of the number of elements that are active at any specific point in time
- Allowing deaths in addition to births
- Discussion about the merit of the λ_a, λ_b values
Thanks for your attention. Questions?
Questions from the class
- How well do ML phylogeny models deal with retrotransposition? Is it the case that retrotransposition causes errors in typical models, or is the birth-death model just significantly better in these cases?
- How much do birth-death models change the performance on canonical applications of phylogeny?
- What are some technical difficulties in incorporating the deactivation process in the model?
- Is there a possible explanation for why n did not influence the mean tree error much but has a large impact on the variance of the tree error? (Figure 3(a), lower center)
- Is there any way to assess the validity of their results? How accurate are their estimations of Alu
parameters?
- Are there any additions to their model that can capture more of the biological complexity of these
sequences?
- Why is dominance of an Alu insertion governed by an element being under "selective pressure"?
- How would the MCMC approach be used to estimate r in this context?
- What is the process of setting a node ’active’ supposed to represent in biology?
- They assume a molecular clock while trying to find the activation and propagation events per year; is this ok
to assume?
- Is it correct to say that translating the standard time-reversible models of evolution to this model would
involve using two sets of substitution parameters, one for each child edge of an internal node?
- The paper states that Alu elements have no known biological function of their own, but that studying them can provide insights into their contributions to genetic disease. So, have previous studies positively linked their presence in the genome to any specific diseases?
- Does the dual-birth model lose anything by not accounting for death rates, where branches can go extinct
with a constant rate?
Supplementary slides
- For example, Price et al. (2004) used whole-genome Alu data to estimate the total number of active elements to have been at least 143 throughout the history of Alu elements
- Wang et al. (2006) used human polymorphism data to estimate the number of currently active Alu elements to be at least 31
- Wacholder and Pollock (2016) introduced a novel Bayesian transposable element ancestral reconstruction method and used it to estimate a lower bound of 1386 Alu elements to have ever been active
- These studies are looking for a strong evidence of
transposition capability and do not rule out the possibility that others are able to propagate