Computational aspects of ncRNA research
Mihaela Zavolan
Biozentrum, Basel
Swiss Institute of Bioinformatics
Computational aspects of ncRNA research: bacterial ncRNAs
- Gene discovery
- Target discovery
- Discovery of transcription regulatory elements for ncRNAs
Computational aspects of ncRNA research: miRNAs
- Gene discovery: automated annotation, gene prediction
- Expression profiling: sample comparisons, visualization
- Target discovery: modeling miRNA-mRNA target interaction
- Characterization of regulatory networks involving RNAs: miRNA target prediction, prediction of transcription regulatory elements
Computational aspects of ncRNA research: siRNAs
siRNA design:
- Optimization of silencing efficacy
- Minimization of off-target effects
ncRNA gene prediction
Main feature: RNA secondary structure is important. Look for evidence of selection on the secondary structure.
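Structure stabilization: proportion of miRNA sequences with a P-value less than a specified threshold, where the P-value measures how often randomized sequences fold at least as stably as the native precursor (Bonnet et al. (2004) Bioinformatics 20:2911)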
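The structure-stabilization test can be illustrated with a shuffling experiment: score the native sequence, score many composition-preserving shuffles, and report the fraction of shuffles that are at least as stable. This is a minimal sketch under stated assumptions: it swaps in Nussinov base-pair maximization as a crude stand-in for minimum free energy (a real analysis would use free energies from an energy-based folding program), it uses a plain mononucleotide shuffle, and the names `max_pairs` and `stability_pvalue` are hypothetical.

```python
import random

# Allowed pairs: Watson-Crick plus G-U wobble.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def max_pairs(seq: str, min_loop: int = 3) -> int:
    """Nussinov dynamic programming: maximum number of nested base pairs.

    Stand-in for folding free energy (more pairs ~ more stable).
    """
    n = len(seq)
    if n == 0:
        return 0
    dp = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = dp[i + 1][j]  # case 1: position i stays unpaired
            for k in range(i + min_loop + 1, j + 1):  # case 2: i pairs with k
                if (seq[i], seq[k]) in PAIRS:
                    right = dp[k + 1][j] if k < j else 0
                    best = max(best, dp[i + 1][k - 1] + 1 + right)
            dp[i][j] = best
    return dp[0][n - 1]


def stability_pvalue(seq: str, n_shuffles: int = 100, seed: int = 0) -> float:
    """Fraction of composition-preserving shuffles that are at least as 'stable'."""
    rng = random.Random(seed)
    seq = seq.upper().replace("T", "U")
    observed = max_pairs(seq)
    letters = list(seq)
    at_least_as_stable = 0
    for _ in range(n_shuffles):
        rng.shuffle(letters)
        if max_pairs("".join(letters)) >= observed:
            at_least_as_stable += 1
    # Add-one correction so the estimate is never exactly zero.
    return (at_least_as_stable + 1) / (n_shuffles + 1)
```

The folding step is cubic in sequence length, so this pure-Python version is only practical for short sequences and a modest number of shuffles.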
Example hairpins (stem in upper case, loop in lower case): GGACaagGUCC, GUGCucauGUAC, GGACagGUUC, GUAUuuuGUAC
Identification of pairs of sites with high mutual information
Mutations that are fixed in evolution preserve RNA structure; this is the principle behind covariance models (tRNAscan-SE, S. Eddy) and consensus structure prediction with RNAalifold (I. Hofacker).
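Covariation between two alignment columns i and j is commonly measured with mutual information, $M_{ij} = \sum_{a,b} f_{ij}(a,b)\log_2 \frac{f_{ij}(a,b)}{f_i(a)\,f_j(b)}$. A minimal sketch, assuming the two columns are given as equal-length strings (one character per aligned sequence) and ignoring gaps and small-sample corrections; `mutual_information` is a hypothetical helper name.

```python
from collections import Counter
from math import log2


def mutual_information(col_i: str, col_j: str) -> float:
    """Mutual information (in bits) between two alignment columns."""
    if len(col_i) != len(col_j) or not col_i:
        raise ValueError("columns must be non-empty and of equal length")
    n = len(col_i)
    f_i = Counter(col_i)               # single-column nucleotide counts
    f_j = Counter(col_j)
    f_ij = Counter(zip(col_i, col_j))  # joint counts of (a, b)
    mi = 0.0
    for (a, b), count in f_ij.items():
        p_ab = count / n
        mi += p_ab * log2(p_ab / ((f_i[a] / n) * (f_j[b] / n)))
    return mi
```

A perfectly conserved pair of columns scores 0 bits even if the two positions are base-paired; mutual information rewards compensatory changes, which is why it points to conserved structure rather than conserved sequence.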
Figure: hsa-mir-100 folded with flanking regions of 50-50, 200-200 and 300-50 nt.
mir-100 is expected to preserve its hairpin secondary structure through the various steps of miRNA biogenesis.
Prediction of bacterial ncRNAs
Promoter regions recognized by the σ70 subunit of E. coli RNA polymerase (figure labels: TATA box, factor binding site; source: http://cwx.prenhall.com/horton/medialib/media_portfolio/)
RNA hairpins regulate transcription termination (figure source: http://cwx.prenhall.com/horton/medialib/media_portfolio/)
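One ingredient of bacterial ncRNA prediction is a search for σ70-like promoters upstream of candidate transcripts, often combined with a search for terminator hairpins downstream. The sketch below only scans for the textbook σ70 consensus hexamers, TTGACA at -35 and TATAAT at -10; the spacer range of 15-19 nt, the mismatch tolerance and the function name `scan_sigma70` are illustrative assumptions, not a published promoter model.

```python
from typing import List, Tuple

MINUS35 = "TTGACA"  # sigma70 -35 consensus
MINUS10 = "TATAAT"  # sigma70 -10 consensus (Pribnow box)


def mismatches(a: str, b: str) -> int:
    """Number of differing positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def scan_sigma70(seq: str, max_mm: int = 1,
                 spacers: range = range(15, 20)) -> List[Tuple[int, int]]:
    """Return (start of -35 box, start of -10 box) for consensus-like promoters.

    max_mm: mismatches tolerated in each hexamer (illustrative default).
    spacers: allowed spacer lengths between the two boxes (assumed 15-19 nt).
    """
    seq = seq.upper().replace("U", "T")
    hits = []
    for i in range(len(seq) - len(MINUS35) + 1):
        if mismatches(seq[i:i + 6], MINUS35) > max_mm:
            continue
        for spacer in spacers:
            j = i + 6 + spacer
            box = seq[j:j + 6]
            if len(box) == 6 and mismatches(box, MINUS10) <= max_mm:
                hits.append((i, j))
    return hits
```

Consensus matching alone produces many false hits in a genome; in practice it would be combined with other evidence, such as a downstream terminator hairpin and conservation in related species.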
Conserved secondary structures of Vibrio ncRNAs
Lenz et al. - The small RNA chaperone Hfq and multiple small RNAs control quorum sensing in Vibrio harveyi and Vibrio cholerae. Cell 118:69-82 (2004).
miRNA gene discovery
Studies driven by computation (fast, incomplete):
1. genome-wide computational prediction
2. validation
(Lai et al., 2003 - fly; Lim et al., 2003 - worm; Lim et al., 2003 - vertebrates; Berezikov et al., 2005 - vertebrates; Pfeffer et al., 2005 - viruses)
Studies driven by experiment (laborious, exhaustive):
1. large-scale cloning
2. functional annotation
3. miRNA gene prediction
4. validation
(Houbaviy et al., 2003 - mouse; Dostie et al., 2003 - rat; Aravin et al., 2003 - fly; Suh et al., 2004 - man; Pfeffer et al., 2004 - man, viruses)
Functional annotation of small RNAs
Small (16-30 nt) cloned RNAs are aligned to the genome sequence and to sequences with known function (mRNA, rRNA, tRNA, miRNA, etc.):
- match known sequences: rRNA, tRNA, miRNA, mRNA
- match the genome, multiple copies, hairpin, conservation: novel miRNAs
- multiple genome matches: rasiRNA
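The decision flow above can be written as a small rule-based classifier. This is a hypothetical sketch: it assumes the alignment step has already produced, for each cloned RNA, its match to an annotated sequence (if any), its number of genomic loci, its clone count, and hairpin and conservation flags. The cut-offs `MAX_LOCI` and `MIN_CLONES` are invented for illustration, and "multiple copies" is read here as "cloned more than once".

```python
from dataclasses import dataclass
from typing import Optional

MAX_LOCI = 5     # hypothetical: more genomic loci than this -> repeat-associated
MIN_CLONES = 2   # hypothetical reading of "multiple copies"


@dataclass
class ClonedRNA:
    sequence: str
    known_class: Optional[str]  # e.g. "rRNA", "tRNA", "miRNA", "mRNA", or None
    genome_hits: int            # number of genomic loci the sequence maps to
    clone_count: int            # how many times it was cloned
    in_hairpin: bool            # genomic context folds into a hairpin
    conserved: bool             # locus conserved in related genomes


def annotate(r: ClonedRNA) -> str:
    """Assign a cloned small RNA to a category following the flow above."""
    if r.known_class is not None:        # matches a known sequence
        return r.known_class
    if r.genome_hits == 0:
        return "no genome match"
    if r.genome_hits > MAX_LOCI:         # multiple genome matches
        return "rasiRNA candidate"
    if r.clone_count >= MIN_CLONES and r.in_hairpin and r.conserved:
        return "novel miRNA candidate"
    return "unannotated"
```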
miRNA gene prediction
He & Hannon (Nat. Rev. Genet. 2004)
Main clue: miRNA precursors form stem-loop structures... but so do many other genomic regions (figure: stem loops of let-7a, mir-147 and a fragment of a protein-coding gene).
Issues:
- find the locations in the genome that can give rise to miRNAs
- predict the sequence of the mature miRNA
miRNA gene prediction using SVM
- Detect candidate stem loops in (large) genomic sequences.
- Build a model from positive and negative examples.
- Classify candidate stem loops using the model.
miRNA gene prediction using SVM
Example: hsa-let-7c (L = 84 nt, ΔG = -33.5 kcal/mol)
- Nucleotide composition: A 20%, C 19%, G 29%, U 32%
- Paired nucleotides: A-U 31%, G-U 14%, G-C 29%
- Longest symmetrical region (LSR) and longest slightly asymmetrical region (LSAR)
- Proportion of nucleotides in symmetrical loops: 17%; in asymmetrical loops: 4%
- Average distance between loops
Negative example: a stem loop with L = 68 nt, ΔG = -22.6 kcal/mol, with its longest symmetrical regions and longest slightly asymmetrical region marked (Pfeffer et al. 2005)
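Several of the features above can be computed directly from a sequence and its secondary structure in dot-bracket notation. A minimal sketch, assuming the structure comes from a folding program (which would also supply the free energy, not recomputed here) and that pair-type percentages are relative to the total length, which is consistent with the hsa-let-7c numbers summing to less than 100%. The loop-based LSR/LSAR features are omitted, and the function names are hypothetical.

```python
from typing import Dict, List, Tuple

# Canonical names for sorted nucleotide pairs.
PAIR_NAMES = {"AU": "A-U", "CG": "G-C", "GU": "G-U"}


def pair_list(structure: str) -> List[Tuple[int, int]]:
    """Parse dot-bracket notation into (i, j) base pairs."""
    stack, pairs = [], []
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced structure")
            pairs.append((stack.pop(), i))
    if stack:
        raise ValueError("unbalanced structure")
    return pairs


def stem_loop_features(seq: str, structure: str) -> Dict[str, float]:
    """Length, nucleotide composition and paired-nucleotide fractions."""
    seq = seq.upper().replace("T", "U")
    if len(seq) != len(structure) or not seq:
        raise ValueError("sequence and structure must have the same, non-zero length")
    n = len(seq)
    feats: Dict[str, float] = {"length": float(n)}
    for base in "ACGU":
        feats[f"frac_{base}"] = seq.count(base) / n
    counts = {name: 0 for name in PAIR_NAMES.values()}
    for i, j in pair_list(structure):
        key = "".join(sorted(seq[i] + seq[j]))
        if key in PAIR_NAMES:
            counts[PAIR_NAMES[key]] += 2  # both nucleotides of the pair
    for name, c in counts.items():
        feats[f"paired_{name}"] = c / n
    return feats
```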
miRNA gene prediction using SVM
Classification performance: 29% false negatives, 3% false positives (figure panels: negatives, positives)
Negatives: mRNAs, rRNAs, tRNAs, viral stem loops Positives: human genomic regions containing known miRNAs
Features with the largest negative weights:
- free energy
- number of nucleotides in symmetrical loops in the LSAR
- number of nucleotides in asymmetrical loops in the LSAR
- average size of asymmetrical loops
Features with the largest positive weights:
- stem length
- length of the longest symmetrical region
- number of A-U pairs in the LSAR
- number of G-C pairs in the LSAR
Used SVMlight http://svmlight.joachims.org/
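SVMlight reads training examples one per line: a target of +1 or -1 followed by `index:value` pairs with feature indices in increasing order (starting at 1), optionally followed by `#` and a comment. The sketch below writes feature vectors in that format and shows how a linear SVM turns learned weights into a score, $w \cdot x + b$, which is why features with large negative weights push candidates toward the negative class. The helper names are hypothetical, and tools differ in the sign convention used for the bias term.

```python
from typing import Optional, Sequence


def to_svmlight(label: int, features: Sequence[float],
                comment: Optional[str] = None) -> str:
    """Format one example as an SVMlight input line (zero features omitted)."""
    target = "+1" if label > 0 else "-1"
    parts = [target] + [f"{i}:{format(v, 'g')}"
                        for i, v in enumerate(features, start=1) if v != 0]
    line = " ".join(parts)
    if comment:
        line += " # " + comment
    return line


def linear_score(weights: Sequence[float], bias: float,
                 features: Sequence[float]) -> float:
    """Decision value of a linear classifier: w . x + b (positive -> predicted miRNA)."""
    return sum(w * x for w, x in zip(weights, features)) + bias
```

In practice features on very different scales (free energy, lengths, fractions) are usually rescaled before training so that no single feature dominates the weights.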
Detecting candidate stem loops
Figure: hsa-mir-100 folded with flanking regions of 50-50, 200-200 and 300-50 nt.
Search for stems whose secondary structure remains the same irrespective of their flanking sequences.
example: hsa-mir-100
86% of the known human microRNAs belong to such robust stems. Density of robust stems in human genome: approximately 1 every 10 kb.
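The robustness criterion can be sketched as: fold the candidate stem within several different flanking contexts and keep it only if the pairs inside the stem are identical every time. In this hypothetical sketch, Nussinov base-pair maximization stands in for an energy-based folding program, "same structure" is taken to mean the same set of base pairs inside the candidate region, and the flanks in the example are toy sequences rather than genomic 50-50, 200-200 or 300-50 nt contexts.

```python
from typing import Iterable, Set, Tuple

PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def nussinov_pairs(seq: str, min_loop: int = 3) -> Set[Tuple[int, int]]:
    """One maximal set of nested base pairs (Nussinov DP with traceback)."""
    seq = seq.upper().replace("T", "U")
    n = len(seq)
    if n == 0:
        return set()
    dp = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = dp[i + 1][j]
            for k in range(i + min_loop + 1, j + 1):
                if (seq[i], seq[k]) in PAIRS:
                    right = dp[k + 1][j] if k < j else 0
                    best = max(best, dp[i + 1][k - 1] + 1 + right)
            dp[i][j] = best
    pairs: Set[Tuple[int, int]] = set()
    stack = [(0, n - 1)]
    while stack:  # traceback
        i, j = stack.pop()
        if j - i <= min_loop:
            continue
        if dp[i][j] == dp[i + 1][j]:  # i unpaired
            stack.append((i + 1, j))
            continue
        for k in range(i + min_loop + 1, j + 1):
            if (seq[i], seq[k]) in PAIRS:
                right = dp[k + 1][j] if k < j else 0
                if dp[i][j] == dp[i + 1][k - 1] + 1 + right:
                    pairs.add((i, k))
                    stack.append((i + 1, k - 1))
                    if k < j:
                        stack.append((k + 1, j))
                    break
    return pairs


def core_structure(left: str, core: str, right: str) -> Set[Tuple[int, int]]:
    """Pairs entirely inside the core, in core coordinates."""
    offset = len(left)
    return {(i - offset, j - offset) for i, j in nussinov_pairs(left + core + right)
            if offset <= i and j < offset + len(core)}


def is_robust(core: str, flanks: Iterable[Tuple[str, str]]) -> bool:
    """True if the core folds the same way in every flanking context."""
    structures = [core_structure(l, core, r) for l, r in flanks]
    return all(s == structures[0] for s in structures) if structures else True
```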
Classification of candidate stem loops
Candidate stem loop: L = 78 nt, ΔG = -31.6 kcal/mol (figure marks the LSR and the LSAR)
miRNA precursor? SVM score: 0.8, yes: miR-UL1 of CMV (cloning frequency: 101)
Pfeffer et al. (2005) Identification of microRNAs of the herpesvirus family. Nature Methods.
Application: miRNA gene prediction in viruses
Sensitivity-specificity plots for evaluating the performance of prediction programs
$Sn = \frac{TP}{TP + FN}$, $Sp = \frac{TP}{TP + FP}$
where TP, FN and FP are the numbers of true positives, false negatives and false positives (Sp defined this way is also known as the positive predictive value, or precision).
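A sensitivity-specificity plot is obtained by sweeping the score threshold of a predictor and recomputing Sn and Sp at each threshold, with Sp defined as TP/(TP + FP) as on this slide. A minimal sketch with hypothetical function names; thresholds at which nothing is predicted are skipped because Sp is undefined there.

```python
from typing import List, Sequence, Tuple


def sensitivity_specificity(tp: int, fn: int, fp: int) -> Tuple[float, float]:
    """Sn = TP/(TP+FN), Sp = TP/(TP+FP)."""
    sn = tp / (tp + fn) if tp + fn else 0.0
    sp = tp / (tp + fp) if tp + fp else 0.0
    return sn, sp


def sn_sp_curve(pos_scores: Sequence[float], neg_scores: Sequence[float],
                thresholds: Sequence[float]) -> List[Tuple[float, float, float]]:
    """(threshold, Sn, Sp) for each threshold; a score >= threshold is a prediction."""
    points = []
    for t in thresholds:
        tp = sum(s >= t for s in pos_scores)
        fn = len(pos_scores) - tp
        fp = sum(s >= t for s in neg_scores)
        if tp + fp == 0:  # no predictions: Sp undefined
            continue
        sn, sp = sensitivity_specificity(tp, fn, fp)
        points.append((t, sn, sp))
    return points
```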
Variations on miRNA gene prediction
- Lim, L. P. et al. (2003) Genes & Dev. 17:991
- Berezikov, E. et al. (2005) Cell 120:21
- Xie, X. et al. (2005) Nature 434:338
miRNA gene prediction servers
http://genes.mit.edu/mirscan/ http://www.mirz.unibas.ch
Prediction of ncRNAs using comparative genomics
RNAz (www.tbi.univie.ac.at/~wash/RNAz)
- Start with an alignment of homologous sequences
- Compute the following features:
- mean z-score of the minimum free energies of the aligned sequences (thermodynamic stability)
- structure conservation index (SCI)
- mean pairwise identity
- number of sequences in the alignment
- Use a SVM to classify candidates
$SCI = E_A / \bar{E}$
$E_A$ is the free energy of the alignment's consensus structure (it takes into account mutations that preserve the structure), and $\bar{E}$ is the mean free energy of the aligned sequences folded individually.
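Two of the RNAz features are easy to compute once the folding results are available. This sketch assumes the consensus free energy $E_A$ (for example from RNAalifold) and the single-sequence minimum free energies (for example from RNAfold) are already known, and it defines pairwise identity over alignment columns where at least one of the two sequences has a nucleotide; RNAz's exact conventions may differ. The function names are hypothetical.

```python
from itertools import combinations
from statistics import mean
from typing import Sequence


def structure_conservation_index(e_alignment: float,
                                 single_mfes: Sequence[float]) -> float:
    """SCI = E_A / mean(E): close to 1 if all sequences share the consensus fold."""
    e_bar = mean(single_mfes)
    return e_alignment / e_bar if e_bar != 0 else 0.0


def mean_pairwise_identity(alignment: Sequence[str], gap: str = "-") -> float:
    """Average identity over all sequence pairs of a gapped alignment."""
    identities = []
    for a, b in combinations(alignment, 2):
        # Skip columns where both sequences are gapped; gap vs base is a mismatch.
        cols = [(x, y) for x, y in zip(a, b) if not (x == gap and y == gap)]
        if cols:
            identities.append(sum(x == y for x, y in cols) / len(cols))
    return mean(identities)
```

Because $E_A$ includes a bonus for compensatory mutations and penalties for inconsistent ones, the SCI of a structurally conserved alignment can be close to or even above 1, while unrelated folds give values near 0.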
Modeling miRNA-mRNA interaction for target prediction
target: C.e._COG-1A, miRNA: cel-lsy-6
target 5'      C  CA           A 3'
                GU  CUUAUACAAAA
                CG  GAGUAUGUUUU
miRNA  3' GCUUUA  CA             5'

target: C.e_LIN-41A, miRNA: cel-let-7
target 5'  U          AUU       U 3'
            UUAUACAACC   CUGCCUC
            GAUAUGUUGG   GAUGGAG
miRNA  3' UU          AU        U 5'

target: C.e_hbl-1, miRNA: cel-let-7
target 5' U           GUU C      A 3'
           AUUAUACAACC   C ACCUCA
           UGAUAUGUUGG   G UGGAGU
miRNA  3' U           AU  A        5'
Hybrids generated using RNAhybrid http://bibiserv.techfak.uni-bielefeld.de/rnahybrid/
Known miRNA-mRNA interactions in C. elegans
Modeling miRNA-mRNA interaction
Use evolutionary conservation to determine what defines an miRNA target site.
- Define an interaction model (e.g. the first 8 nucleotides of the miRNA have to be perfectly paired with their mRNA target site).
- Determine the locations of all candidate sites in a reference species (e.g. human).
- Determine the number of these candidate sites that are conserved in a set of species that have the miRNA.
- Compare with the number of conserved candidate sites obtained for a “random miRNA” that has approximately the same number of predicted sites in the reference species.
Lewis et al. 2005
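The conservation-based procedure above can be sketched as follows. This is a toy version under simplifying assumptions: the interaction model is perfect Watson-Crick pairing to miRNA nucleotides 1-8, orthologous 3' UTRs are assumed to be pre-aligned without gaps so that a conserved site sits at the same coordinate in every species, and the caller is responsible for choosing control ("random") miRNAs with about as many reference-species sites as the real one. All function names are hypothetical.

```python
from statistics import mean
from typing import Dict, List, Sequence

# Watson-Crick complement in the RNA alphabet.
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def seed_site(mirna: str, length: int = 8) -> str:
    """mRNA sequence (5'->3') perfectly complementary to miRNA nucleotides 1..length."""
    seed = mirna.upper().replace("T", "U")[:length]
    return seed.translate(COMPLEMENT)[::-1]


def find_sites(utr: str, site: str) -> List[int]:
    """Start positions of every occurrence of the site in a 3' UTR."""
    return [i for i in range(len(utr) - len(site) + 1) if utr.startswith(site, i)]


def conserved_sites(ref_utr: str, other_utrs: Sequence[str], site: str) -> List[int]:
    """Reference-species sites found at the same position in every other species."""
    return [i for i in find_sites(ref_utr, site)
            if all(u.startswith(site, i) for u in other_utrs)]


def count_conserved(utrs_by_gene: Dict[str, List[str]], site: str) -> int:
    # Each value is [reference UTR, ortholog UTR, ...], assumed gap-free and co-aligned.
    return sum(len(conserved_sites(u[0], u[1:], site)) for u in utrs_by_gene.values())


def signal_to_noise(utrs_by_gene: Dict[str, List[str]], mirna: str,
                    control_mirnas: Sequence[str]) -> float:
    """Conserved sites of the real miRNA / mean conserved sites of the controls."""
    signal = count_conserved(utrs_by_gene, seed_site(mirna))
    noise = mean([count_conserved(utrs_by_gene, seed_site(c)) for c in control_mirnas])
    return signal / noise if noise else float("inf")
```

Repeating the calculation for different interaction models (seed length, position, G-U tolerance) and comparing the resulting S/N ratios is one way to ask which model best captures real target sites.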
Modeling miRNA-mRNA interaction
Figure: S/N ratio for each interaction model.
Some miRNAs have hundreds of targets, but many do not.
miRNA target prediction servers
http://pictar.bio.nyu.edu/
siRNA design
Empirical “rules” for siRNA design - derived from the work in the Tuschl Lab (siRNA user’s guide: http://www.rockefeller.edu/labheads/tuschl/sirna.html):
siRNA design
Refining the rules by analyzing large datasets of siRNAs (Reynolds et al. 2004, and many others): different siRNAs for the same gene can have markedly different silencing efficiencies.
siRNA design
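Figure: siRNA scoring criteria with point values +1, +1/A, +1, +1, +1, +1, -1 and -1, compared across siRNAs grouped by silencing efficiency S (S<50%, S>50%, S>80%, S>95%).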
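The +1/-1 point values above are consistent with the eight criteria of Reynolds et al. (2004) as they are usually summarized: GC content of 30-52% (+1), points for A/U at sense-strand positions 15-19, no internal repeats (+1), A at position 19 (+1), A at position 3 (+1), U at position 10 (+1), G or C at position 19 (-1) and G at position 13 (-1). The sketch below assumes that mapping, takes a 19-nt sense-strand sequence, simply awards one point per A/U at positions 15-19, and omits the internal-repeat criterion because it needs a melting-temperature calculation. `reynolds_score` is a hypothetical name.

```python
def reynolds_score(sense19: str) -> int:
    """Approximate Reynolds-style score for a 19-nt siRNA sense strand.

    Positions are 1-based in the comments; the internal-repeat criterion is omitted.
    """
    s = sense19.upper().replace("T", "U")
    if len(s) != 19:
        raise ValueError("expected a 19-nt sense-strand sequence")
    score = 0
    gc = (s.count("G") + s.count("C")) / len(s)
    if 0.30 <= gc <= 0.52:                        # moderate to low GC content
        score += 1
    score += sum(1 for b in s[14:19] if b in "AU")  # one point per A/U at 15-19
    if s[18] == "A":                              # A at position 19
        score += 1
    if s[2] == "A":                               # A at position 3
        score += 1
    if s[9] == "U":                               # U at position 10
        score += 1
    if s[18] in "GC":                             # G or C at position 19
        score -= 1
    if s[12] == "G":                              # G at position 13
        score -= 1
    return score
```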
siRNA design
Accessibility of the target site influences siRNA efficacy (Far et al. (2003) Nucl. Acids Res. 31:4417):
Target accessibility prediction server http://sfold.wadsworth.org/index.pl