RESEARCH & METHODS RNA-RNA interaction prediction Jerome - - PowerPoint PPT Presentation



slide-1
SLIDE 1

COMP598: ADVANCED COMPUTATIONAL BIOLOGY RESEARCH & METHODS

RNA-RNA interaction prediction

Jerome Waldispuhl School of Computer Science, McGill From slides from Ivo Hofacker (University of Vienna)

slide-2
SLIDE 2

Motivation

  • Experimental and bioinformatic methods find novel ncRNAs en masse
  • These methods give no hint as to the function of the novel ncRNAs
  • Functional characterization of ncRNAs is difficult and slow
  • Most ncRNAs function through interaction with other RNAs
  • Identification of interaction partners is the easiest approach to learn about possible functions
  • Most obvious in the case of miRNA target prediction
slide-3
SLIDE 3

Well known Examples of RNA-RNA Interaction

  • microRNAs regulate mRNA translation
  • snoRNAs guide methylation and pseudouridylation of rRNA
  • Some well-studied bacterial examples:
  • RyhB is transcribed under low Fe, binds several mRNAs of Fe-binding proteins (sdh, sodB) and leads to mRNA degradation
  • GadY interacts with the 3’ UTR of GadX and inhibits its degradation
  • DsrA is expressed at low temperatures and stimulates the translation of RpoS, a transcriptional regulator (sigma factor)
  • OxyS is expressed under oxidative stress and inhibits translation of its targets RpoS and fhlA
  • T-box motifs bind uncharged tRNAs to control transcription of aminoacyl-tRNA synthetases

slide-4
SLIDE 4

Interaction of OxyS and fhlA

Binding of OxyS to fhlA mRNA makes the ribosome binding site (start codon) inaccessible

slide-5
SLIDE 5

Transcriptional control by T-box Motifs

The concentration of an uncharged tRNA controls transcription of its aminoacyl-tRNA synthetase

slide-6
SLIDE 6

Challenges

  • Few well-studied examples
  • Energetics of many interaction motifs are unknown
  • Length of the interacting region is often quite small
  • Binding is a concentration dependent process
  • Folding kinetics rather than thermodynamics may play a role
  • A single small RNA may have many targets
  • RNA chaperones such as Hfq may be required for binding
  • ncRNAs often act within RNPs; what’s the influence of the protein?

slide-7
SLIDE 7

Overview of Prediction Strategies

  • Co-folding by concatenation of two sequences, e.g. RNAcofold, pairfold, DINAMelt, NUPACK
  • Co-folding with pseudoknot-like structures, e.g. IRIS
  • Using only inter-molecular interactions, i.e. assuming that both molecules are unstructured by themselves: RNAhybrid, RNAduplex, RNAplex
  • Combining interaction search with accessibility calculations: RNAup, RNAplfold + RNAplex, OligoWalk

slide-8
SLIDE 8

Simple Co-folding of two RNAs

  • Poor man’s approach to cofolding:
  • Concatenate two RNAs using a short linker
  • Use conventional folding programs such as mfold
  • Proper way:
  • Use a modified folding algorithm that keeps track of the break between the strands
  • Any loop containing the break point is treated specially
  • Implemented in the RNAcofold program of the Vienna RNA package
  • Limited to structures that are pseudo-knot free for the concatenated sequences

slide-9
SLIDE 9

Pair Probabilities from RNAcofold

slide-10
SLIDE 10

Concentration Dependence of RNA-RNA interactions

Binding processes are always concentration dependent. For two RNAs we have three reactions in equilibrium:

A + B ⇋ AB
A + A ⇋ AA
B + B ⇋ BB

Compute the concentrations of all five species (two monomers and three dimers).

[Figure: concentration [nmol] of mRNA-siRNA, mRNA dimer, mRNA monomer and siRNA monomer versus total siRNA concentration b [nmol].]
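The equilibrium above can be solved numerically once the three equilibrium constants are known. The following is a minimal sketch, not the method used by any particular program: it assumes association constants K_AB, K_AA and K_BB are given (in practice they would follow from the dimer and monomer free energies), and it combines mass conservation for A and B with a simple bisection. The function names and the example values are hypothetical.

```python
import math


def free_monomer(total, k_self, k_hetero, partner):
    """Positive root x of 2*k_self*x^2 + (1 + k_hetero*partner)*x = total.

    This is mass conservation for one strand when the free concentration
    of the partner strand is held fixed.
    """
    lin = 1.0 + k_hetero * partner
    disc = lin * lin + 8.0 * k_self * total
    # Numerically stable form of the quadratic formula (also valid for k_self == 0).
    return 2.0 * total / (lin + math.sqrt(disc))


def equilibrium(a_total, b_total, k_ab, k_aa, k_bb, iterations=200):
    """Concentrations of A, B, AB, AA and BB at equilibrium.

    a_total, b_total: total strand concentrations (mol/L).
    k_ab, k_aa, k_bb: association constants (L/mol), assumed to be given.
    """
    lo, hi = 0.0, b_total
    for _ in range(iterations):
        b = 0.5 * (lo + hi)
        a = free_monomer(a_total, k_aa, k_ab, b)
        # Mass conservation for B: b_total = [B] + [AB] + 2 [BB]
        residual = b + k_ab * a * b + 2.0 * k_bb * b * b - b_total
        if residual > 0.0:
            hi = b
        else:
            lo = b
    b = 0.5 * (lo + hi)
    a = free_monomer(a_total, k_aa, k_ab, b)
    return {"A": a, "B": b, "AB": k_ab * a * b, "AA": k_aa * a * a, "BB": k_bb * b * b}


# Hypothetical example: 10 nmol/L of each strand, strong heterodimer.
example = equilibrium(1e-8, 1e-8, k_ab=1e9, k_aa=1e6, k_bb=1e5)
```

Bisection is used here only because the residual is monotonic in the free concentration of B; any root finder would do.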

slide-11
SLIDE 11

UNAFold: prediction of RNA/DNA hybridization

(Dimitrov & Zuker, 2004)

Motivation: Let A and B be two polynucleotide sequences in solution. UNAFold aims to predict the concentrations of single-stranded folded and unfolded A and B, and of the hybridization products AA, BB and AB.

Principles:

  • Simple modification of McCaskill’s algorithm.
  • Stacking energies computed from experimental measurements.

[Figure: allowed configurations]

Results: Reproduces experimental observations.

slide-12
SLIDE 12

Sfold: Accessibility prediction through Boltzmann sampling (Ding & Lawrence, 2001)

Principle:

  • Sample secondary structures using a stochastic backtracking procedure.
  • Estimate the accessibility (probability of not being base paired) of each nucleotide in the sample set.
  • Identify the hybridization regions.
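The estimation step can be illustrated with a short sketch. It assumes the sampled structures are already available as dot-bracket strings; the four strings below are hand-written stand-ins, not real Boltzmann samples, and the function names are hypothetical.

```python
def unpaired_frequencies(structures):
    """Per-position fraction of sampled structures in which the base is unpaired."""
    n = len(structures[0])
    counts = [0] * n
    for structure in structures:
        if len(structure) != n:
            raise ValueError("all sampled structures must have the same length")
        for pos, symbol in enumerate(structure):
            if symbol == ".":
                counts[pos] += 1
    return [count / len(structures) for count in counts]


def region_accessibility(structures, start, end):
    """Fraction of samples in which every position in [start, end) is unpaired."""
    hits = sum(1 for s in structures if all(ch == "." for ch in s[start:end]))
    return hits / len(structures)


# Hand-written stand-ins for a (much larger) Boltzmann sample.
samples = [
    "((....))..",
    "..((...)).",
    "(((...))).",
    "..........",
]
per_base = unpaired_frequencies(samples)
site = region_accessibility(samples, 3, 6)
```

Note that the accessibility of a whole region is counted jointly per sample; multiplying per-base frequencies would wrongly assume that positions are independent.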
slide-13
SLIDE 13

Structures (not) Predicted by RNAcofold

[Figure: two example structures, one labeled knot-free and one labeled pseudo-knotted]

slide-14
SLIDE 14

Predicting more complex Structures

Without restrictions on the allowed structure motifs, RNA-RNA interaction prediction is NP-complete

  • The most general algorithms (Alkan 2006, Pervouchine 2004) allow structures where
  • Intra-molecular pairs form pseudo-knot free structures
  • Inter-molecular pairs are not allowed to cross
  • Run time is too slow for most purposes (O(n^3 · m^3))

slide-15
SLIDE 15

Fast Interaction Search

Methods for fast interaction search

  • Search for sequence complementarity by BLAST
  • Better: Interaction search using thermodynamics
  • Simplified folding algorithm without intra-molecular pairs.
  • Runs in O(n · m) time.
  • Used in RNAhybrid (miRNA target prediction), RNAduplex, RNAplex

What’s the effect of neglecting intra-molecular structure?
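To make the O(n · m) claim concrete, here is a minimal sketch of an interaction search that ignores intra-molecular pairs. It is a local-alignment style dynamic program over the query and the reversed target, with made-up per-pair energies and penalties. It is not the nearest-neighbor energy model that RNAhybrid, RNAduplex or RNAplex use; all constants and names are assumptions for illustration.

```python
# Toy energies in kcal/mol (assumed values, not measured parameters).
PAIR_ENERGY = {
    ("G", "C"): -3.0, ("C", "G"): -3.0,
    ("A", "U"): -2.0, ("U", "A"): -2.0,
    ("G", "U"): -1.0, ("U", "G"): -1.0,
}
MISMATCH_PENALTY = 2.0  # two facing nucleotides that cannot pair
GAP_PENALTY = 1.5       # one bulged nucleotide on either strand


def best_duplex_energy(query, target):
    """Lowest toy energy of any local antiparallel duplex between two RNAs.

    Fills an (n+1) x (m+1) table row by row, so time is O(n * m).
    """
    rev_target = target[::-1]  # antiparallel pairing: read the target 3' to 5'
    m = len(rev_target)
    prev = [0.0] * (m + 1)
    best = 0.0
    for i in range(1, len(query) + 1):
        curr = [0.0] * (m + 1)
        for j in range(1, m + 1):
            pair = PAIR_ENERGY.get((query[i - 1], rev_target[j - 1]), MISMATCH_PENALTY)
            curr[j] = min(
                0.0,                        # start a new duplex here
                prev[j - 1] + pair,         # extend by one facing pair
                prev[j] + GAP_PENALTY,      # bulge in the query
                curr[j - 1] + GAP_PENALTY,  # bulge in the target
            )
            best = min(best, curr[j])
        prev = curr
    return best


energy = best_duplex_energy("GGAUCC", "GGAUCC")
```

Because neither strand is allowed to fold on itself, a site buried in a stable hairpin scores exactly like an open one, which is the limitation the closing question of the slide points at.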

slide-16
SLIDE 16

Frequency of ncRNA - mRNA Interactions

[Figure: density (0.00 to 0.10) of interaction free energies [kcal/mol], divided into regions I to IV.]

RNA-mRNA interaction energies (from RNAduplex). Red: ncRNA candidates from RNAz; grey: shuffled sequences. Enrichments relative to randomly chosen conserved regions: I: 2.3, II: 1.9, III: 1.4, IV: 1.1

slide-17
SLIDE 17

Combining Interaction and Accessibility

Two ingredients for efficient hybridization

  • Complementarity
  • Accessibility

How to quantify these?

  • Complementarity → interaction energy
  • Accessibility → probability to be unpaired

slide-18
SLIDE 18

RNA Hybridization as a two Step Process

[Figure: free energy diagram of the two steps. Opening the binding sites costs ∆Gopen; forming the duplex gains ∆Gduplex; the net change is ∆∆G.]

slide-19
SLIDE 19

Example: ompN and RybB

[Figure: secondary structure drawings of the two RNAs. Annotations: MFE -38.2 kcal/mol, cost of opening 23.6 kcal/mol, and 24 kcal/mol.]

Duplex alignment (energy -40.30), shown without the base-pair markers between the strands:

GCCAC-----TGCTTTTCTTTGATGTCCCCATTTT-GTGGA-------GC-CCATCAACCCCGCCATTTCGGTT---CAAG-GTTGGTGGGTTTTTT
AGGTCAAACAACGGC-AGAAACAATATT--TAAAGTCGCCGCACACGACGCGGTCGTCGGT-CGTCTCGGCCCTACTGTTCACGGTTATGAAAAGAAACC-3’

slide-20
SLIDE 20

Example: ompN and RybB

[Figure: secondary structure drawings for the ompN/RybB example]

∆Gopen = 1.6 + 3.9 kcal/mol, ∆∆G = −16 kcal/mol

slide-21
SLIDE 21

The RNAup Approach

[Figure: two strands, each numbered from 1 (5’) to its 3’ end (n and m); the region [i..j] on one strand pairs with the region [i*..j*] on the other.]

  • Compute the probability that a site at [i..j] is unpaired (equivalent to the energy ∆Gopen needed to force it open).
  • Consider all possible ways of binding to the region [i..j] to compute the interaction energy ∆Ginteract
  • Total binding energy is the sum of these contributions: ∆∆G = ∆Gopen + ∆Ginteract
  • Currently, interactions are restricted to a single region
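The energy bookkeeping can be sketched in a few lines. This is an illustration of the sum ∆∆G = ∆Gopen + ∆Ginteract only, not of how RNAup computes its terms. The first candidate uses the numbers from the ompN/RybB example (∆Gopen = 1.6 + 3.9 kcal/mol and ∆∆G = −16 kcal/mol, which implies ∆Ginteract = −21.5 kcal/mol under this sum). The second candidate, all site coordinates, the assignment of the two opening terms to the two RNAs, and the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Site:
    start: int          # first position of the region [i..j] (hypothetical coordinates)
    end: int            # last position of the region
    dg_open_a: float    # opening energy on the first RNA, kcal/mol
    dg_open_b: float    # opening energy on the second RNA, kcal/mol
    dg_interact: float  # interaction energy of the duplex, kcal/mol

    @property
    def ddg(self):
        """Total binding energy: opening costs plus interaction energy."""
        return self.dg_open_a + self.dg_open_b + self.dg_interact


def best_site(sites):
    """Candidate with the lowest total binding energy."""
    return min(sites, key=lambda site: site.ddg)


sites = [
    # Energies taken from the ompN/RybB example; coordinates are made up.
    Site(start=40, end=55, dg_open_a=1.6, dg_open_b=3.9, dg_interact=-21.5),
    # Hypothetical site: stronger duplex, but buried in stable structure.
    Site(start=90, end=110, dg_open_a=12.0, dg_open_b=9.0, dg_interact=-30.0),
]
winner = best_site(sites)
```

The second site has the more favorable duplex energy but loses once the cost of opening both binding sites is included.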
slide-22
SLIDE 22

Computing Accessibility

∆Gopen is equivalent to the probability that the region [i..j] is unpaired in equilibrium:

∆Gopen = −RT ln Pu[i, j]

  • Constrained folding ∆Gopen = ∆G constr − ∆G free
  • Boltzmann sampling, works for short regions only
  • Direct computation by modified folding algorithm
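The relation between ∆Gopen and Pu[i, j] is easy to check numerically. The sketch below assumes a temperature of 37 °C by default and the gas constant in kcal/(mol·K); the function names are hypothetical, and the constrained and unconstrained ensemble free energies would have to come from a folding program.

```python
import math

GAS_CONSTANT = 0.0019872  # kcal / (mol * K)


def thermal_energy(temperature_c=37.0):
    """RT in kcal/mol at the given temperature in degrees Celsius."""
    return GAS_CONSTANT * (temperature_c + 273.15)


def opening_energy(p_unpaired, temperature_c=37.0):
    """dG_open = -RT ln Pu for a region that is unpaired with probability Pu."""
    if not 0.0 < p_unpaired <= 1.0:
        raise ValueError("probability must be in (0, 1]")
    return -thermal_energy(temperature_c) * math.log(p_unpaired)


def unpaired_probability(dg_open, temperature_c=37.0):
    """Inverse relation: Pu = exp(-dG_open / RT)."""
    return math.exp(-dg_open / thermal_energy(temperature_c))


def opening_energy_from_constraint(dg_constrained, dg_free):
    """Constrained folding: dG_open = dG_constr - dG_free (ensemble free energies)."""
    return dg_constrained - dg_free


# A region that is open in 1% of the ensemble costs a few kcal/mol to open.
cost = opening_energy(0.01)
```

Since RT is only about 0.6 kcal/mol at 37 °C, each factor of ten in accessibility corresponds to roughly 1.4 kcal/mol of opening energy.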
slide-23
SLIDE 23

Computing Accessibility


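$$P_u[i,j] = \frac{Z_{1,i-1}\, Z_{j+1,n}}{Z_{1,n}} + \sum_{k<i,\; j<l} p_{k,l} \cdot \mathrm{Prob}\bigl([i,j] \mid (k,l)\bigr)$$

[Figure: the cases in which the unpaired region [i..j] lies inside the loop closed by the pair (k, l), including cases with an inner pair (p, q) and the C and M terms of the folding recursion.]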

slide-24
SLIDE 24

RNAup

Structural Information

slide-29
SLIDE 29

RNAup

Interaction Information

slide-30
SLIDE 30

Example: siRNA Binding

[Figure: unpaired probabilities (0.2 to 1), expression and opening energies ∆Gi [kcal/mol] versus sequence position for the constructs VR1 straight, VR1 HP5_16, VR1 HP5_11 and VR1 HP5_6.]

Data taken from Schubert et al 2006

slide-31
SLIDE 31

A scanning Version of RNAup

Can we adapt this method for fast searching in large databases?

  • Local folding algorithms can scan very large sequences by restricting the size of local structures to some maximum L.
  • RNAplfold computes the probability that regions of length u are unpaired by averaging over all windows of length L
  • Runtime is linear in the length of the database: O(n · L^2)

[Figure: a region of length u inside a window of length L]

slide-32
SLIDE 32

A scanning Version of RNAup


Computes average over all windows containing the region:

$$\pi_L[i,j] = \frac{1}{L-(j-i)+1} \sum_{u=j-L}^{i} P_{u,L}[i,j]$$
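The window average above can be written as a short function. This sketch assumes the per-window probabilities are supplied by a callable `window_prob(u, i, j)`, a hypothetical stand-in for the local partition function computation. It follows the summation limits of the formula as given and ignores windows that would run past the ends of the sequence.

```python
def averaged_unpaired_probability(window_prob, i, j, L):
    """Average of the per-window unpaired probabilities for the region [i..j].

    window_prob(u, i, j) is assumed to return the probability that [i..j]
    is unpaired within the window starting at u. The window start u runs
    from j - L to i, which gives L - (j - i) + 1 terms.
    """
    if j < i:
        raise ValueError("region must satisfy i <= j")
    if j - i > L:
        raise ValueError("region does not fit into a window of size L")
    starts = range(j - L, i + 1)
    total = sum(window_prob(u, i, j) for u in starts)
    return total / (L - (j - i) + 1)


# Toy stand-in: every window reports the same probability.
flat = averaged_unpaired_probability(lambda u, i, j: 0.25, i=10, j=12, L=5)
```

In a scanning implementation the per-window values would be produced once while sliding along the sequence, which is what keeps the total cost linear in the sequence length.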

slide-33
SLIDE 33

[Figure: p[unpaired] (0.0 to 1.0) versus sequence position (50 to 250), comparing RNAup, RNAplfold -u and sfold.]
slide-34
SLIDE 34

Accessibility of miRNA targets

[Figure: log(accessibility) versus position in sequence, with curves labeled U 8 and U 16, for two target sites.]

  • NON WORKING (positions 1300 to 1350): -36.5 kcal/mol
  • WORKING (positions 1660 to 1700): -28.3 kcal/mol

slide-35
SLIDE 35

Accessibility and miRNA targets

slide-36
SLIDE 36

Accessibility predicts siRNA efficiency

Data provided by Dharmacon