RESEARCH & METHODS RNA-RNA interaction prediction Jerome - - PowerPoint PPT Presentation
COMP598: ADVANCED COMPUTATIONAL BIOLOGY
RESEARCH & METHODS
RNA-RNA interaction prediction
Jerome Waldispuhl
School of Computer Science, McGill
Based on slides by Ivo Hofacker (University of Vienna)
Motivation
- Experimental and bioinformatic methods find novel ncRNAs en masse
- These methods give no hint as to the function of the novel ncRNAs
- Functional characterization of ncRNAs is difficult and slow
- Most ncRNAs function through interaction with other RNAs
- Identifying interaction partners is the easiest way to learn about possible functions
- This is most obvious in the case of miRNA target prediction
Well-known Examples of RNA-RNA Interaction
- microRNAs regulate mRNA translation
- snoRNAs guide methylation and pseudouridylation of rRNA
- some well-studied bacterial examples:
  - RyhB is transcribed under low iron, binds several mRNAs of iron-binding proteins (sdh, sodB) and leads to their degradation
  - GadY interacts with the 3’ UTR of GadX and inhibits its degradation
  - DsrA is expressed at low temperatures and stimulates the translation of RpoS, a transcriptional regulator (the stationary-phase sigma factor)
  - OxyS is expressed under oxidative stress and inhibits translation of its targets RpoS and fhlA
  - T-box motifs bind uncharged tRNAs to control transcription of aminoacyl-tRNA synthetase genes
Interaction of OxyS and fhlA
Binding of OxyS to the fhlA mRNA makes its ribosome-binding site (near the start codon) inaccessible
Transcriptional Control by T-box Motifs
The concentration of uncharged tRNA controls transcription of the corresponding aminoacyl-tRNA synthetase gene
Challenges
- Few well-studied examples
- Energetics of many interaction motifs are unknown
- Length of the interacting region is often quite small
- Binding is a concentration-dependent process
- Folding kinetics rather than thermodynamics may play a role
- A single small RNA may have many targets
- RNA chaperones such as Hfq may be required for binding
- ncRNAs often act within RNPs: what is the influence of the protein?
Overview of Prediction Strategies
- Co-folding by concatenation of the two sequences, e.g. RNAcofold, PairFold, DINAMelt, NUPACK
- Co-folding with pseudoknot-like structures, e.g. IRIS
- Using only inter-molecular interactions, i.e. assuming that both molecules are unstructured by themselves: RNAhybrid, RNAduplex, codeRNAplex
- Combining interaction search with accessibility calculations: RNAup, RNAplfold + RNAplex, OligoWalk
Simple Co-folding of two RNAs
- Poor man’s approach to co-folding:
  - Concatenate the two RNAs using a short linker
  - Use a conventional folding program such as mfold
- Proper way:
  - Use a modified folding algorithm that keeps track of the break between the strands
  - Any loop containing the break point is treated specially
  - Implemented in the RNAcofold program of the Vienna RNA package
  - Limited to structures that are pseudoknot-free when the two sequences are concatenated
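The bookkeeping behind the "proper way" can be illustrated with a deliberately simplified sketch. It assumes a base-pair-maximization (Nussinov-style) score instead of RNAcofold's nearest-neighbor energy model, and the function name `cofold_max_pairs` is hypothetical. The only idea it shows is how the break point changes the rules: the minimum hairpin-loop length applies to pairs within one strand, but not to a pair whose enclosed "loop" contains the break between the strands.

```python
# Toy co-folding by base-pair maximization (Nussinov-style).
# NOT RNAcofold's energy model: it only illustrates tracking the strand break.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def cofold_max_pairs(a, b, min_loop=3):
    """Maximum number of nested base pairs for strands a and b.

    The strands are concatenated without a linker; cut = len(a) is the index
    of the first nucleotide of b. The minimum hairpin-loop rule applies only
    to pairs within one strand; a pair spanning the cut may enclose any gap.
    """
    s = a + b
    n, cut = len(s), len(a)

    def can_pair(i, j):
        if (s[i], s[j]) not in PAIRS:
            return False
        spans_cut = i < cut <= j          # the loop closed by (i, j) contains the break
        return spans_cut or j - i > min_loop

    # N[i][j] = maximum number of pairs on s[i..j]; length-1 intervals stay 0
    N = [[0] * n for _ in range(n)]
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length - 1
            best = N[i + 1][j]                    # case 1: i is unpaired
            for k in range(i + 1, j + 1):         # case 2: i pairs with k
                if can_pair(i, k):
                    inner = N[i + 1][k - 1] if k - 1 > i else 0
                    outer = N[k + 1][j] if k + 1 <= j else 0
                    best = max(best, 1 + inner + outer)
            N[i][j] = best
    return N[0][n - 1] if n else 0
```

With this sketch, `cofold_max_pairs("GGG", "CCC")` returns 3, whereas treating the same letters as one strand, `cofold_max_pairs("GGGCCC", "")`, returns 1, because a single-strand hairpin needs at least three unpaired bases in its loop.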
Pair Probabilities from RNAcofold
Concentration Dependence of RNA-RNA interactions
Binding processes are always concentration dependent. For two RNAs we have three reactions in equilibrium:
A + B ⇋ AB
A + A ⇋ AA
B + B ⇋ BB
Compute the concentrations of all five species: the monomers A and B and the dimers AB, AA and BB.
[Figure: concentrations [nmol] of mRNA-siRNA dimer, mRNA dimer, mRNA monomer and siRNA monomer versus total siRNA concentration b [nmol]]
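A minimal sketch of this equilibrium computation, assuming ideal mass-action behaviour. The association constants `k_ab`, `k_aa` and `k_bb` are inputs; in practice they would be derived from predicted dimerization free energies (for example K = exp(−∆G/RT) under a chosen standard-state convention), and the values used below are arbitrary. The function name is hypothetical. Given the free concentration of A, the mass balance for B is a quadratic with a single positive root, and the remaining mass balance for A is increasing in [A], so bisection finds the unique solution.

```python
import math


def solve_dimerization(a0, b0, k_ab, k_aa, k_bb, tol=1e-12):
    """Equilibrium concentrations for A+B <-> AB, A+A <-> AA, B+B <-> BB.

    k_* are association constants in inverse concentration units (assumed inputs).
    Mass balance: a0 = A + AB + 2*AA and b0 = B + AB + 2*BB.
    """

    def free_b(a):
        # Positive root of 2*k_bb*B^2 + (1 + k_ab*a)*B - b0 = 0, written in a
        # cancellation-free form that also works when k_bb == 0.
        p = 1.0 + k_ab * a
        return 2.0 * b0 / (p + math.sqrt(p * p + 8.0 * k_bb * b0))

    def excess_a(a):
        # Residual of the A mass balance; increasing in a, negative at a = 0
        b = free_b(a)
        return a + k_ab * a * b + 2.0 * k_aa * a * a - a0

    lo, hi = 0.0, a0
    while hi - lo > tol * max(a0, 1.0):       # bisection on the free [A]
        mid = 0.5 * (lo + hi)
        if excess_a(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    a = 0.5 * (lo + hi)
    b = free_b(a)
    return {"A": a, "B": b, "AB": k_ab * a * b, "AA": k_aa * a * a, "BB": k_bb * b * b}
```

For example, `solve_dimerization(1.0, 1.0, 1.0, 0.0, 0.0)` gives free A and B concentrations of (√5 − 1)/2 ≈ 0.618. Sweeping `b0` while holding `a0` fixed produces concentration curves of the kind plotted on this slide.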
UNAFold: prediction of RNA/DNA hybridization
(Dimitrov & Zuker, 2004)
Principles:
- Simple modification of McCaskill’s algorithm
- Stacking energies derived from experimental measurements
Allowed configurations: [Figure: allowed configurations]
Motivation: Let A and B be two polynucleotide sequences. In solution, UNAFold aims to predict the concentrations of single-stranded A and B (folded and unfolded) as well as of the hybridization products AA, BB and AB.
Results: Reproduces experimental observations.
Sfold: Accessibility prediction through Boltzmann sampling (Ding & Lawrence, 2001)
Principle:
- Sample secondary structures using a stochastic backtracking procedure
- Estimate the accessibility (probability of not being base-paired) of each nucleotide in the sample set
- Identify the hybridization regions
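The sampling principle can be sketched as follows. This is not Sfold's implementation: it assumes a toy energy model in which every base pair contributes the same hypothetical energy `pair_energy` (instead of the Turner nearest-neighbor parameters), and all function names are hypothetical. It builds a McCaskill-style partition-function table, draws structures by stochastic backtracking with probabilities proportional to their Boltzmann weights, and estimates each nucleotide's accessibility as the fraction of samples in which it is unpaired.

```python
import math
import random

PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def partition_table(seq, pair_energy=-1.0, kT=0.6163, min_loop=3):
    """Toy partition function: every pair contributes pair_energy (kcal/mol)."""
    n = len(seq)
    w = math.exp(-pair_energy / kT)                # Boltzmann weight of one pair
    Z = [[1.0] * (n + 1) for _ in range(n + 1)]    # entries with j < i: empty, weight 1
    for length in range(1, n + 1):
        for i in range(n - length + 1):
            j = i + length - 1
            total = Z[i + 1][j]                    # i unpaired
            for k in range(i + min_loop + 1, j + 1):
                if (seq[i], seq[k]) in PAIRS:      # i pairs with k
                    total += w * Z[i + 1][k - 1] * Z[k + 1][j]
            Z[i][j] = total
    return Z, w


def sample_pairs(seq, Z, w, min_loop, rng):
    """Stochastic backtracking: one structure, drawn proportionally to its weight."""
    pairs, stack = [], [(0, len(seq) - 1)]
    while stack:
        i, j = stack.pop()
        if i >= j:
            continue
        r = rng.random() * Z[i][j]
        acc = Z[i + 1][j]
        if r < acc:                                # choose "i unpaired"
            stack.append((i + 1, j))
            continue
        for k in range(i + min_loop + 1, j + 1):   # choose a partner k for i
            if (seq[i], seq[k]) in PAIRS:
                acc += w * Z[i + 1][k - 1] * Z[k + 1][j]
                if r < acc:
                    pairs.append((i, k))
                    stack += [(i + 1, k - 1), (k + 1, j)]
                    break
        else:
            stack.append((i + 1, j))               # guard against round-off
    return pairs


def unpaired_frequencies(seq, n_samples=1000, seed=0, min_loop=3, **energy):
    """Fraction of sampled structures in which each position is unpaired."""
    Z, w = partition_table(seq, min_loop=min_loop, **energy)
    rng = random.Random(seed)
    counts = [0] * len(seq)
    for _ in range(n_samples):
        paired = {p for pair in sample_pairs(seq, Z, w, min_loop, rng) for p in pair}
        for pos in range(len(seq)):
            if pos not in paired:
                counts[pos] += 1
    return [c / n_samples for c in counts]
```

In the same spirit as the slide, candidate hybridization regions would be stretches whose estimated accessibility stays high.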
Structures (not) Predicted by RNAcofold
[Figure: knot-free structures (predicted) versus pseudoknotted structures (not predicted)]
Predicting more complex Structures
Without restricting the allowed structure motifs, RNA-RNA interaction prediction is NP-complete
- The most general algorithms (Alkan 2006, Pervouchine 2004) allow structures where
  - intra-molecular pairs form pseudoknot-free structures
  - inter-molecular pairs are not allowed to cross
- Run time is too slow for most purposes: O(n³ · m³)
Fast Interaction Search
Methods for fast interaction search
- Search for sequence complementarity with BLAST
- Better: interaction search using thermodynamics
  - Simplified folding algorithm without intra-molecular pairs
  - Runs in O(n · m) time (for bounded loop sizes)
  - Used in RNAhybrid (miRNA target prediction), RNAduplex, RNAplex
What is the effect of neglecting intra-molecular structure?
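A minimal sketch of interaction search that ignores intra-molecular pairs, in the spirit of RNAduplex or RNAhybrid but not their actual implementation: all energy parameters (`INIT`, `STACK`, `LOOP_OPEN`, `LOOP_PER_NT`) are hypothetical placeholders rather than Turner parameters, and the function names are hypothetical. Because the number of unpaired bases between consecutive inter-molecular pairs is bounded by `MAX_LOOP`, each cell looks back at a constant number of predecessors, so the runtime is O(n · m · MAX_LOOP²), i.e. O(n · m) for a fixed bound.

```python
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}

INIT = 4.0         # hypothetical duplex initiation penalty (kcal/mol)
STACK = -1.5       # hypothetical energy of each stacked pair
LOOP_OPEN = 1.0    # hypothetical penalty for a bulge or interior loop
LOOP_PER_NT = 0.5  # hypothetical extra penalty per unpaired nucleotide
MAX_LOOP = 3       # at most MAX_LOOP unpaired bases per side between two pairs


def loop_energy(u1, u2):
    """Energy of joining two consecutive inter-molecular pairs."""
    if u1 == 0 and u2 == 0:
        return STACK
    return LOOP_OPEN + LOOP_PER_NT * (u1 + u2)


def duplex(a, b):
    """Minimum-energy duplex of a and b (both 5'->3') with no intra-molecular pairs.

    A pair (i, j) means a[i] pairs with b[j]; along a duplex i increases while
    j decreases. Returns (energy, pairs); energy is inf if nothing can pair.
    """
    n, m, inf = len(a), len(b), float("inf")
    H = [[inf] * m for _ in range(n)]      # best duplex whose last pair is (i, j)
    prev = {}
    best, best_cell = inf, None
    for i in range(n):
        for j in range(m):
            if (a[i], b[j]) not in PAIRS:
                continue
            e, back = INIT, None           # (i, j) starts a new duplex
            for p in range(max(0, i - MAX_LOOP - 1), i):
                for q in range(j + 1, min(m, j + MAX_LOOP + 2)):
                    if H[p][q] < inf:
                        cand = H[p][q] + loop_energy(i - p - 1, q - j - 1)
                        if cand < e:
                            e, back = cand, (p, q)
            H[i][j], prev[(i, j)] = e, back
            if e < best:
                best, best_cell = e, (i, j)
    pairs, cell = [], best_cell
    while cell is not None:                # trace back to the first pair
        pairs.append(cell)
        cell = prev[cell]
    return best, pairs[::-1]
```

For instance, `duplex("GGGG", "CCCC")` returns `(-0.5, [(0, 3), (1, 2), (2, 1), (3, 0)])`: one initiation plus three stacks under these placeholder parameters.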
Frequency of ncRNA-mRNA Interactions
[Figure: density of ncRNA-mRNA interaction free energies [kcal/mol], divided into energy ranges I to IV]
RNA-mRNA interaction energies (from RNAduplex). Red: ncRNA candidates from RNAz; grey: shuffled sequences. Enrichment relative to randomly chosen conserved regions: I: 2.3, II: 1.9, III: 1.4, IV: 1.1
Combining Interaction and Accessibility
Two ingredients for efficient hybridization
- Complementarity
- Accessibility
How to quantify these?
- Complementarity → interaction energy
- Accessibility → probability of being unpaired
RNA Hybridization as a Two-Step Process
[Figure: free-energy diagram of the two steps]
- Opening the binding sites costs ∆Gopen
- Forming the duplex contributes ∆Gduplex
- Overall: ∆∆G = ∆Gopen + ∆Gduplex
Example: ompN and RybB
[Figure: predicted structures of the ompN and RybB sequences]
MFE -38.2 kcal/mol, cost of opening 23.6 kcal/mol
-24 kcal/mol
Duplex, -40.30 kcal/mol:
GCCAC-----TGCTTTTCTTTGATGTCCCCATTTT-GTGGA-------GC-CCATCAACCCCGCCATTTCGGTT---CAAG-GTTGGTGGGTTTTTT
AGGTCAAACAACGGC-AGAAACAATATT--TAAAGTCGCCGCACACGACGCGGTCGTCGGT-CGTCTCGGCCCTACTGTTCACGGTTATGAAAAGAAACC-3’
Example: ompN and RybB
[Figure: predicted ompN-RybB interaction structure]
∆Gopen = 1.6 + 3.9 = 5.5 kcal/mol, ∆∆G = −16 kcal/mol
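As a quick consistency check, assume the RNAup decomposition ∆∆G = ∆Gopen + ∆Ginteract and that ∆Gopen here is the sum of the two opening costs (1.6 and 3.9 kcal/mol), with ∆∆G = −16 kcal/mol. Since ∆∆G is reported rounded, the implied interaction term is only approximate:

```latex
\Delta G_{\text{open}} = 1.6 + 3.9 = 5.5~\text{kcal/mol}, \qquad
\Delta G_{\text{interact}} = \Delta\Delta G - \Delta G_{\text{open}} \approx -16 - 5.5 = -21.5~\text{kcal/mol}
```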
The RNAup Approach
[Figure: RNA 1 (positions 1 to n, 5’ to 3’) with binding site [i..j]; RNA 2 (positions 1 to m, 5’ to 3’) with binding site [j*..i*]]
- Compute the probability that the site [i..j] is unpaired (equivalent to the energy ∆Gopen needed to force it open)
- Consider all possible ways of binding to the region [i..j] to compute the interaction energy ∆Ginteract
- The total binding energy is the sum of these contributions:
∆∆G = ∆Gopen + ∆Ginteract
- Interactions are currently restricted to a single region
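A minimal sketch of how the two contributions are combined when choosing a binding site. It assumes that, for each candidate site, the unpaired probability Pu[i, j] and the interaction energy ∆Ginteract have already been computed elsewhere (here they are simply inputs); the function names and the example numbers are hypothetical. RT is taken as 0.6163 kcal/mol, its value at 37 °C.

```python
import math

RT = 0.6163  # kcal/mol at 37 degrees C


def open_energy(p_unpaired, rt=RT):
    """dG_open = -RT ln Pu[i, j]; infinite if the site is never unpaired."""
    return math.inf if p_unpaired <= 0.0 else -rt * math.log(p_unpaired)


def best_site(sites, rt=RT):
    """Pick the site minimizing ddG = dG_open + dG_interact.

    sites: iterable of (label, Pu[i, j], dG_interact) tuples (assumed inputs).
    Returns (label, ddG).
    """
    scored = [(open_energy(pu, rt) + dg, label) for label, pu, dg in sites]
    ddg, label = min(scored)
    return label, ddg
```

With the hypothetical inputs `[("buried", 1e-8, -20.0), ("open", 0.5, -12.0)]`, the weaker but accessible site wins (about −11.6 versus −8.6 kcal/mol), which is exactly the trade-off between complementarity and accessibility that RNAup captures.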
Computing Accessibility
∆Gopen is equivalent to the probability Pu[i, j] that the region [i..j] is unpaired in equilibrium:
∆Gopen = −RT ln Pu[i, j]
- Constrained folding: ∆Gopen = ∆Gconstr − ∆Gfree
- Boltzmann sampling (works for short regions only)
- Direct computation by a modified folding algorithm
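The constrained-folding route can be sketched with a toy model in which every base pair contributes the same hypothetical energy `pair_energy`, instead of the Turner parameters a real tool would use; function names are hypothetical. The ensemble free energy is ∆G = −kT ln Z, so ∆Gconstr − ∆Gfree = −kT ln(Zconstr / Zfree), and the ratio Zconstr / Zfree is exactly Pu[i, j].

```python
import math

PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def partition_function(seq, forbidden=frozenset(), pair_energy=-1.0, kT=0.6163, min_loop=3):
    """Toy partition function; positions in `forbidden` are not allowed to pair."""
    n = len(seq)
    w = math.exp(-pair_energy / kT)                # Boltzmann weight of one pair
    Z = [[1.0] * (n + 1) for _ in range(n + 1)]    # j < i: empty interval, weight 1
    for length in range(1, n + 1):
        for i in range(n - length + 1):
            j = i + length - 1
            total = Z[i + 1][j]                    # i unpaired
            if i not in forbidden:
                for k in range(i + min_loop + 1, j + 1):
                    if k not in forbidden and (seq[i], seq[k]) in PAIRS:
                        total += w * Z[i + 1][k - 1] * Z[k + 1][j]
            Z[i][j] = total
    return Z[0][n - 1] if n else 1.0


def open_energy(seq, i, j, kT=0.6163, **model):
    """dG_open = dG_constr - dG_free = -kT ln(Z_constr / Z_free), region [i..j] 0-based."""
    z_free = partition_function(seq, kT=kT, **model)
    z_constr = partition_function(seq, frozenset(range(i, j + 1)), kT=kT, **model)
    return -kT * math.log(z_constr / z_free)
```

For example, in `"GGGAAAACCC"` the A-run can never pair in this model, so `open_energy("GGGAAAACCC", 3, 6)` is 0, while forcing the G-run open (`open_energy("GGGAAAACCC", 0, 2)`) costs a positive amount.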
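Pu[i, j] sums two cases: [i..j] lies in the exterior loop, or it lies in the loop closed by some pair (k, l) with k < i and j < l. Here Z_{a,b} is the partition function of the subsequence a..b and p_{kl} is the probability of the pair (k, l):

$$P_u[i,j] = \frac{Z_{1,i-1}\,Z_{j+1,n}}{Z_{1,n}} + \sum_{k<i,\;j<l} p_{kl}\cdot \mathrm{Prob}\left([i,j] \mid (k,l)\right)$$

[Figure: recursion diagrams for Prob([i, j] | (k, l)) in terms of the C and M arrays]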
RNAup
Structural Information
RNAup
Interaction Information
Example: siRNA Binding
[Figure: panels VR1 straight, VR1 HP5_16, VR1 HP5_11 and VR1 HP5_6; unpaired probabilities, expression and interaction energy ∆Gi [kcal/mol] plotted against sequence position]
Data taken from Schubert et al. (2006)
A scanning Version of RNAup
Can we adapt this method for fast searching in large databases?
- Local folding algorithms can scan very large sequences by
restricting the size of local structures to some maximum L.
- RNAplfold computes the probability that regions of length u
are unpaired by averaging over all windows of length L
- Runtime is linear in the length of the database: O(n · L²)
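RNAplfold computes the average over all windows containing the region, where P_{u,L}[i, j] is the probability that [i..j] is unpaired in the window starting at position u:

$$\pi_L[i,j] = \frac{1}{L-(j-i)+1}\sum_{u=j-L}^{i} P_{u,L}[i,j]$$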
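A minimal sketch of the window average πL[i, j], which averages P_{u,L}[i, j] over the L − (j − i) + 1 windows containing [i..j]. The per-window probability is passed in as a function, because computing it requires a full folding step; that callable and the function name are hypothetical. It follows the slide's index convention, in which a window starting at u covers positions u..u+L. Skipping windows that would run past either end of the sequence is an added assumption, since the slide's formula does not address the sequence ends.

```python
def plfold_average(n, L, i, j, pu_in_window):
    """Average Pu over all windows that contain [i..j] (0-based, inclusive).

    A window starting at u covers positions u..u+L, so away from the sequence
    ends there are L - (j - i) + 1 such windows (u = j-L, ..., i).
    Windows extending past either end of the length-n sequence are skipped
    (an assumption). pu_in_window(u, i, j) is a hypothetical callable returning
    the unpaired probability of [i..j] when only the window at u is folded.
    """
    starts = [u for u in range(j - L, i + 1) if u >= 0 and u + L <= n - 1]
    if not starts:
        raise ValueError("no window of this size contains the region")
    return sum(pu_in_window(u, i, j) for u in starts) / len(starts)
```

For example, with n = 100, L = 10 and the region [20..22], the sketch averages over the 9 windows starting at u = 12, ..., 20, matching L − (j − i) + 1 = 9.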
[Figure: p[unpaired] versus sequence position, comparing RNAup, RNAplfold -u and Sfold]
Accessibility of miRNA targets
[Figure: log(accessibility) versus position in sequence (positions 1300 to 1350), curves U 8 and U 16]
Non-working site: -36.5 kcal/mol
[Figure: log(accessibility) versus position in sequence (positions 1660 to 1700), curves U 8 and U 16]
Working site: -28.3 kcal/mol