A partition function algorithm for RNA-RNA interaction Hamidreza - - PowerPoint PPT Presentation

a partition function algorithm for rna rna interaction
SMART_READER_LITE
LIVE PREVIEW

A partition function algorithm for RNA-RNA interaction Hamidreza - - PowerPoint PPT Presentation

A partition function algorithm for RNA-RNA interaction Hamidreza Chitsaz Raheleh Salari, Cenk Sahinalp, Rolf Backofen Wayne State University chitsaz@wayne.edu Benasque RNA Meeting July 27 th , 2012 1/76, Mini biography Robotics RNA


slide-1
SLIDE 1

A partition function algorithm for RNA-RNA interaction

Hamidreza Chitsaz Raheleh Salari, Cenk Sahinalp, Rolf Backofen

Wayne State University chitsaz@wayne.edu

Benasque RNA Meeting

July 27th, 2012 1/76,

slide-2
SLIDE 2

Mini biography

Robotics → RNA → Genome Assembly ◮ University of Illinois, Urbana-Champaign (Steven M. LaValle): PhD,

Computer Science, 2008

◮ Simon Fraser University, Vancouver (Cenk Sahinalp): Postdoc, RNA

algorithms, 2009

◮ University of California, San Diego (Pavel Pevzner): Postdoc, genome

assembly, 2011

◮ Wayne State University, Detroit: Assistant professor, 2011-

2/76,

slide-3
SLIDE 3

Mini biography

Robotics → RNA → Genome Assembly ◮ University of Illinois, Urbana-Champaign (Steven M. LaValle): PhD,

Computer Science, 2008

◮ Simon Fraser University, Vancouver (Cenk Sahinalp): Postdoc, RNA

algorithms, 2009

◮ University of California, San Diego (Pavel Pevzner): Postdoc, genome

assembly, 2011

◮ Wayne State University, Detroit: Assistant professor, 2011-

2/76,

slide-4
SLIDE 4

Mini biography

Robotics → RNA → Genome Assembly ◮ University of Illinois, Urbana-Champaign (Steven M. LaValle): PhD,

Computer Science, 2008

◮ Simon Fraser University, Vancouver (Cenk Sahinalp): Postdoc, RNA

algorithms, 2009

◮ University of California, San Diego (Pavel Pevzner): Postdoc, genome

assembly, 2011

◮ Wayne State University, Detroit: Assistant professor, 2011-

2/76,

slide-5
SLIDE 5

Mini biography

Robotics → RNA → Genome Assembly ◮ University of Illinois, Urbana-Champaign (Steven M. LaValle): PhD,

Computer Science, 2008

◮ Simon Fraser University, Vancouver (Cenk Sahinalp): Postdoc, RNA

algorithms, 2009

◮ University of California, San Diego (Pavel Pevzner): Postdoc, genome

assembly, 2011

◮ Wayne State University, Detroit: Assistant professor, 2011-

2/76,

slide-6
SLIDE 6

Single-cell bacterial genome and transcriptome assembly

3/76,

slide-7
SLIDE 7

Algorithmic Biology Laboratory

Wayne State University

http://compbio.cs.wayne.edu

4/76,

slide-8
SLIDE 8

Central dogma

DNA → RNA → Protein

5/76,

slide-9
SLIDE 9

Motivation

Post-transcriptional regulation of gene expression

6/76,

slide-10
SLIDE 10

Regulatory RNA

Repression example (Argaman and Altuvia, J. Mol. Biol. 2000)

7/76,

slide-11
SLIDE 11

Regulatory RNA

Activation example (Repoila, Majdalani, and Gottesman, Mol. Microbiol. 2003)

8/76,

slide-12
SLIDE 12

Background

RNA-RNA MFE structure prediction ◮ Avoid intramolecular base pairing

RNAhybrid (Rehmsmeier et al. 2004), RNAduplex (Bernhart et al. 2006), UNAFold (Markham et

  • al. 2008)

No internal structure

◮ Concatenate input sequences as a single strand; no pseudoknots

PairFold (Andronescu et al. 2005), RNAcofold (Bernhart et al. 2006) No kissing hairpins

◮ Predict binding sites

RNAup (Mückstein et al. 2008), intaRNA (Busch et al. 2008) Just one binding site not complete structure

◮ Concatenate input sequences; consider special pseudoknots

NUPACK (Dirks et al. 2003,2007) Still no kissing hairpins!

9/76,

slide-13
SLIDE 13

Background

RNA-RNA MFE structure prediction ◮ Avoid intramolecular base pairing

RNAhybrid (Rehmsmeier et al. 2004), RNAduplex (Bernhart et al. 2006), UNAFold (Markham et

  • al. 2008)

No internal structure

◮ Concatenate input sequences as a single strand; no pseudoknots

PairFold (Andronescu et al. 2005), RNAcofold (Bernhart et al. 2006) No kissing hairpins

◮ Predict binding sites

RNAup (Mückstein et al. 2008), intaRNA (Busch et al. 2008) Just one binding site not complete structure

◮ Concatenate input sequences; consider special pseudoknots

NUPACK (Dirks et al. 2003,2007) Still no kissing hairpins!

9/76,

slide-14
SLIDE 14

Background

RNA-RNA MFE structure prediction ◮ Avoid intramolecular base pairing

RNAhybrid (Rehmsmeier et al. 2004), RNAduplex (Bernhart et al. 2006), UNAFold (Markham et

  • al. 2008)

No internal structure

◮ Concatenate input sequences as a single strand; no pseudoknots

PairFold (Andronescu et al. 2005), RNAcofold (Bernhart et al. 2006) No kissing hairpins

◮ Predict binding sites

RNAup (Mückstein et al. 2008), intaRNA (Busch et al. 2008) Just one binding site not complete structure

◮ Concatenate input sequences; consider special pseudoknots

NUPACK (Dirks et al. 2003,2007) Still no kissing hairpins!

9/76,

slide-15
SLIDE 15

Background

RNA-RNA MFE structure prediction ◮ Avoid intramolecular base pairing

RNAhybrid (Rehmsmeier et al. 2004), RNAduplex (Bernhart et al. 2006), UNAFold (Markham et

  • al. 2008)

No internal structure

◮ Concatenate input sequences as a single strand; no pseudoknots

PairFold (Andronescu et al. 2005), RNAcofold (Bernhart et al. 2006) No kissing hairpins

◮ Predict binding sites

RNAup (Mückstein et al. 2008), intaRNA (Busch et al. 2008) Just one binding site not complete structure

◮ Concatenate input sequences; consider special pseudoknots

NUPACK (Dirks et al. 2003,2007) Still no kissing hairpins!

9/76,

slide-16
SLIDE 16

Background (continued)

RNA-RNA MFE structure prediction

Consider inter- and intramolecular base pairing

IRIS (Pervouchine 2004), inteRNA (Alkan et al. 2005), Grammatical Approach (Kato et al. 2009) Voilà, now we are talking business. The problem is NP-Hard (Alkan et al. 2005); no surprise as pseudoknots are NP-Hard. Exclude zigzags and crossing interactions to lift the curse of complexity and obtain an exact O(n6)-time O(n4)-space DP algorithm (albeit for simple base-pair counting). First order zigzag. A general zigzag involves an arbitrary number of kissing hairpins.

10/76,

slide-17
SLIDE 17

Ahhh...but MFE is often wrong!

Question: how about

  • 1. computing base pairing probabilities,
  • 2. sampling from the Boltzmann ensemble of interaction structures,

clustering, centroids, etc.,

  • 3. and computing equilibrium concentrations and melting temperature for

RNA-RNA compounds? Answer: the key enabling technology is the partition function. All of the above can be computed from the partition function.

11/76,

slide-18
SLIDE 18

Ahhh...but MFE is often wrong!

Question: how about

  • 1. computing base pairing probabilities,
  • 2. sampling from the Boltzmann ensemble of interaction structures,

clustering, centroids, etc.,

  • 3. and computing equilibrium concentrations and melting temperature for

RNA-RNA compounds? Answer: the key enabling technology is the partition function. All of the above can be computed from the partition function.

11/76,

slide-19
SLIDE 19

Ahhh...but MFE is often wrong!

Question: how about

  • 1. computing base pairing probabilities,
  • 2. sampling from the Boltzmann ensemble of interaction structures,

clustering, centroids, etc.,

  • 3. and computing equilibrium concentrations and melting temperature for

RNA-RNA compounds? Answer: the key enabling technology is the partition function. All of the above can be computed from the partition function.

11/76,

slide-20
SLIDE 20

Ahhh...but MFE is often wrong!

Question: how about

  • 1. computing base pairing probabilities,
  • 2. sampling from the Boltzmann ensemble of interaction structures,

clustering, centroids, etc.,

  • 3. and computing equilibrium concentrations and melting temperature for

RNA-RNA compounds? Answer: the key enabling technology is the partition function. All of the above can be computed from the partition function.

11/76,

slide-21
SLIDE 21

Partition function

Q(T) = ∑s∈S e−Gs/RT, S = All considered interaction structures, p(s) ∝ e−Gs/RT, and Q is the normalizing factor. Also other thermodynamic quantities can be derived from Q.

12/76,

slide-22
SLIDE 22

Partition function

Q(T) = ∑s∈S e−Gs/RT, S = All considered interaction structures, p(s) ∝ e−Gs/RT, and Q is the normalizing factor. Also other thermodynamic quantities can be derived from Q.

12/76,

slide-23
SLIDE 23

Partition function hardness ≥ MFE hardness

Partition function

s∈S

e−Gs/RT.

MFE secondary structure

argmins∈SGs. Transform a partition function algorithm to an MFE algorithm by e−Gs → Gs

× → + ∑ → min

13/76,

slide-24
SLIDE 24

Partition function hardness ≥ MFE hardness

Partition function

s∈S

e−Gs/RT.

MFE secondary structure

argmins∈SGs. Transform a partition function algorithm to an MFE algorithm by e−Gs → Gs

× → + ∑ → min

13/76,

slide-25
SLIDE 25

Turner energy model

Mathews et al. 1999 14/76,

slide-26
SLIDE 26

Our extension of the Turner model

Chitsaz et al., Bioinformatics 25(12): i365-i373

Hybrid component: as if intramolecular, with penalties. Kissing loop: like multibranch loop.

15/76,

slide-27
SLIDE 27

Interaction partition function

How?

Divide and conquer using dynamic programming: Q(T) = ∑

s∈S

e−Gs/RT

= ∑

s=sa∪sb

e−(Gsa+Gsb)/RT

= [ ∑

sa∈Sa

e−Gsa/RT][ ∑

sb∈Sb

e−Gsb/RT]

= Qa(T)Qb(T).

16/76,

slide-28
SLIDE 28

Partition function for single strand (McCaskill 1990)

straight horizontal line: nucleotides indexed from 1 to n solid arc: a base pair dashed arc: can be base pair or not white region: open to more recursions blue region: finalized in the recursion, compute its energy contribution green region: open to more recursions with multibranch loop energy

17/76,

slide-29
SLIDE 29

Partition function for two strands

straight vertical line: intermolecular bond solid: a base pair dotted: not a base pair dashed: either of those two

=

I Ia Ib iR k2 k1 k2 k1 jR iS jS

QI

iR,jR,iS,jS =QiR,jR QiS,jS +

iR≤k1<jR iS<k2≤jS

QiR,k1−1Qk2+1,jS QIb

k1,jR,iS,k2+

iR≤k1<jR iS<k2≤jS

QiR,k1−1Qk2+1,jS QIa

k1,jR,iS,k2. 18/76,

slide-30
SLIDE 30

Partition function for two strands

straight vertical line: intermolecular bond solid: a base pair dotted: not a base pair dashed: either of those two

=

I Ia Ib iR k2 k1 k2 k1 jR iS jS

QI

iR,jR,iS,jS =QiR,jR QiS,jS +

iR≤k1<jR iS<k2≤jS

QiR,k1−1Qk2+1,jS QIb

k1,jR,iS,k2+

iR≤k1<jR iS<k2≤jS

QiR,k1−1Qk2+1,jS QIa

k1,jR,iS,k2. 18/76,

slide-31
SLIDE 31

Partition function for two strands

straight vertical line: intermolecular bond solid: a base pair dotted: not a base pair dashed: either of those two

=

I Ia Ib iR k2 k1 k2 k1 jR iS jS

QI

iR,jR,iS,jS =QiR,jR QiS,jS +

iR≤k1<jR iS<k2≤jS

QiR,k1−1Qk2+1,jS QIb

k1,jR,iS,k2+

iR≤k1<jR iS<k2≤jS

QiR,k1−1Qk2+1,jS QIa

k1,jR,iS,k2. 18/76,

slide-32
SLIDE 32

QIb

=

Ib Ih = Ihh Ih Ib Ih Ib Ih Ia = Ihb jS iS k1 k′

1

k1 k′

1

k1 k′

1

k1 iR jR k2 k′

2

k2 k′

2

k2 k′

2

k2 bz bz

b: stands for bond

19/76,

slide-33
SLIDE 33

QIa

a: stands for arc s: stands for subsume e: stands for equivalent =

I I I Is Ie Ia Is′ k2 k1 k1 k2 k2 k1 jS iR iS jR

20/76,

slide-34
SLIDE 34

QIs and QIe

=

Ism Isk Is iR jS jR iS

s: stands for subsume k: stands for kissing-loop m: stands for multi-loop

=

Ie gm Ism Isk gk k2 k1 k2 k1 iR iS jS jR

e: stands for equivalent

21/76,

slide-35
SLIDE 35

All tables

=

bz b i j j i k2 k1

22/76,

slide-36
SLIDE 36

All tables

=

i j i j i j i j i j i j j i j i e d g e e e e e e i j e bz bz bz bz bz bz bz bz

g

e

g g g

k1 k2 k1 k2 k1 k2 d d d d d d d d k1 k2

23/76,

slide-37
SLIDE 37

All tables

=

e d k2 i j g k1 i j gm i j e e d d

24/76,

slide-38
SLIDE 38

All tables

=

Ind Idd I∗d jR iR jS iS

25/76,

slide-39
SLIDE 39

All tables

=

I I I Is Ie Ia Is′ k2 k1 k1 k2 k2 k1 jS iR iS jR

26/76,

slide-40
SLIDE 40

All tables

=

I∗n Idn Inn jR iR jS iS

27/76,

slide-41
SLIDE 41

All tables

=

Is Ie Iar Is′ Ir Ir Ir jR iR k2 k1 k1 k2 k2 k1 jS iS

28/76,

slide-42
SLIDE 42

All tables

=

Ib Ih

= Ihh

Ih Ib Ih Ib Ih Ia jS iS k1 k′

1

k1 k′

1

k1 k′

1

k1 iR jR k2 k′

2

k2 k′

2

k2 k′

2

k2 bz bz

29/76,

slide-43
SLIDE 43

All tables

=

Idn Idd Id∗ jR iR jS iS

30/76,

slide-44
SLIDE 44

All tables

=

Idn Iadn jR iR jS iS k2 k1

31/76,

slide-45
SLIDE 45

All tables

=

I Ia Ib iR k2 k1 k2 k1 jR iS jS

32/76,

slide-46
SLIDE 46

All tables

=

Ih Ih jR iR jS iS k1 k2

33/76,

slide-47
SLIDE 47

All tables

=

Ikk Iadd Iadd Iand Iadn Iadd iR jR jS iS Ibr Ib Ib Ib Iar Is Is′ Ie Is Is′ Ie Isk Isk′

34/76,

slide-48
SLIDE 48

All tables

=

Imk Iand Iand iR jR jS iS Is Ism′ Iand Ie Iann Isk 35/76,

slide-49
SLIDE 49

All tables

=

Inn Ind In∗ jR iR jS iS

36/76,

slide-50
SLIDE 50

All tables

=

Inn Iann jR iR jS iS k2 k1

37/76,

slide-51
SLIDE 51

All tables

=

Ism Isk Is iR jS jR iS

38/76,

slide-52
SLIDE 52

All tables

=

Ism gm gk Imm Ikm jR jS iS iR

39/76,

slide-53
SLIDE 53

All tables

=

b b i j i j i k1 k2 j i j b k1 k2 bz

40/76,

slide-54
SLIDE 54

All tables

=

b i j i j i j k1 k2

41/76,

slide-55
SLIDE 55

All tables

=

gk e d k2 i j g k1 i j i j e e d d

42/76,

slide-56
SLIDE 56

All tables

=

Iadd Ie Isk′ Id∗ Ism Isk Idd I∗d Ism′ Idd Idd jR iR jS iS k2 k1 k2 k1 k1 k2 k1 k2 k2 k1

43/76,

slide-57
SLIDE 57

All tables

=

Iadn Ie Isk′ Ism Idn I∗n Ism′ Idn Idn jR iR jS iS k2 k1 k1 k2 k1 k2 k2 k1 44/76,

slide-58
SLIDE 58

All tables

=

Iand Ism Ie Isk Ism′ Ind In∗ Ind Ind jR iR jS iS k2 k1 k1 k2 k1 k2 k2 k1 45/76,

slide-59
SLIDE 59

All tables

=

gm gk Isk Ikk Imk iS jS iR jR

46/76,

slide-60
SLIDE 60

All tables

=

Iann Ism Inn Ism′ Ie Inn Inn jR iR jS iS k2 k1 k1 k2 k2 k1

47/76,

slide-61
SLIDE 61

All tables

=

Ib Ib Ia Ihh jR iR jS iS k1 k1 k2 k2 Ihm Ihh

48/76,

slide-62
SLIDE 62

All tables

=

Ibr Iar Ibr Ih jR iR jS iS k1 k1 k2 k2 Ihh Ihb

49/76,

slide-63
SLIDE 63

All tables

=

Ie gm Ism Isk gk k2 k1 k2 k1 iR iS jS jR

50/76,

slide-64
SLIDE 64

All tables

=

Ib Iadd Idd k2 k2 k1 k1 jR iR jS iS

51/76,

slide-65
SLIDE 65

All tables

=

Ih Ih Ihb jR iR jS iS k1 k1 k2 k2 bz bz

52/76,

slide-66
SLIDE 66

All tables

=

Ih Ihh jR iR jS iS k1 k2

53/76,

slide-67
SLIDE 67

All tables

=

Ikm Iadn Iadn iR jR iS jS Ism Is′ Iadn Ie Iann Isk′ 54/76,

slide-68
SLIDE 68

All tables

=

Imm Ism Iann Iann Ism′ Iann Ie jR iR jS iS

55/76,

slide-69
SLIDE 69

All tables

=

Ind Iand jR iR jS iS k2 k1

56/76,

slide-70
SLIDE 70

All tables

=

Iar Ibr Ir k2 k1 k2 k1 jR iR jS iS

57/76,

slide-71
SLIDE 71

Equilibrium concentrations

For two RNAs R and S

Assume five types of chemical compounds: R, S, RR, SS, RS. Solve KR = QI

RR

Q2

R = NRR

N2

R ,

KS =

QI

SS

Q2

S = NSS

N2

S ,

KRS =

QI

RS

QRQS = NRS NRNS ,

NRS = N0

R − 2NRR − NR = N0 S − 2NSS − NS,

to obtain the equilibrium concentrations N. N0 are the initial concentrations of single strands.

58/76,

slide-72
SLIDE 72

Equilibrium concentration of OxyS with wild type fhlA

20 40 60 80 100 200 400 600 800 1000 OxyS in complex [%] fhlA Concentration [nM]

wild type fhlA

Our Algorithm Experiment

  • Init. [OxyS] = 2nM, [fhlA] = 0 to 1000nM

59/76,

slide-73
SLIDE 73

Equilibrium concentration of OxyS with fhlA mutants

20 40 60 80 100 100 200 300 400 OxyS in complex [%] fhlAA8G Concentration [nM]

fhlAA8G

Our Algorithm Experiment 20 40 60 80 100 0 100 200 300 400 500 600 700 OxyS in complex [%] fhlAC13G Concentration [nM]

fhlAC13G

Our Algorithm Experiment 20 40 60 80 100 200 400 600 800 1000 OxyS in complex [%] fhlAG37C;G38C Concentration [nM]

fhlAG37C;G38C

Our Algorithm Experiment 20 40 60 80 100 150 300 450 600 750 900 OxyS in complex [%] fhlAG38C;G39C Concentration [nM]

fhlAG38C;G39C

Our Algorithm Experiment

60/76,

slide-74
SLIDE 74

Melting temperature prediction

Comparison of piRNA results over three data sets

Set Size Length Avg error piRNA RNAcofold UNAFold I 9 short pairs 5-7nt 1.48◦C 9.35◦C 8.55◦C II 12 pairs

∼ 20nt

4.86◦C 22.97◦C 9.12◦C III 62 pairs 22− 40nt 1.91◦C 14.34◦C 26.53◦C Set Size Length Spearman rank correlation piRNA RNAcofold UNAFold I 9 short pairs 5-7nt 0.97 0.97 0.57 II 12 pairs

∼ 20nt

0.41

−0.03

0.1 III 62 pairs 22− 40nt 0.3

−0.04

0.24

61/76,

slide-75
SLIDE 75

Promised base pairing probabilities

PI and PIa examples

PI

iR,jR,iS,jS = ∑

1≤k1<iR jS<k2≤LS

PIa

k1,jR,iS,k2

(QIs

k1,iR,jS,k2 + QIs′ k1,iR,jS,k2 + QIe k1,iR,jS,k2)QI iR,jR,iS,jS

QIa

k1,jR,iS,k2

,

PIa

iR,jR,iS,jS = ∑

1≤k1≤iR jS≤k2≤LS

PI

k1,jR,iS,k2

Qk1,iR−1QjS+1,k2QIa

iR,jR,iS,jS

QI

k1,jR,iS,k2

+

1≤k1<iR jS≤k2≤LS

PIb

k1,jR,iS,k2

QIhh

k1,iR,jS,k2QIa iR,jR,iS,jS

QIb

k1,jR,iS,k2

.

More on this part will be presented by Peter Stadler.

62/76,

slide-76
SLIDE 76

Sampling from the Boltzmann ensemble

◮ Push I(1,n,1,m) onto the stack. ◮ Iterate until the stack is empty, i.e. reaching a leaf (structure) in the

recursions.

◮ In each iteration, sample 0 ≤ α ≤ 1 uniformly at random. ◮ Pop from the stack top(iR,jR,iS,jS). ◮ Pick a case of top according to α. For simplicity, we assume there is only

  • ne case here, i.e.

Qtop =

iR≤k1<jR iS<k2≤jS

Qleft

iR,k1,k2,jSQright k1+1,jR,iS,k2+1

◮ Find k∗

1,k∗ 2 such that

iR≤k1<k∗ 1 iS<k2≤k∗ 2

··· ≃α ∑

iR≤k1<jR iS<k2≤jS

··· .

◮ Push left(iR,k∗

1,k∗ 2,jS) and right(k1 + 1,jR,iS,k2 + 1) onto the stack.

63/76,

slide-77
SLIDE 77

Sampling from the Boltzmann ensemble

◮ Push I(1,n,1,m) onto the stack. ◮ Iterate until the stack is empty, i.e. reaching a leaf (structure) in the

recursions.

◮ In each iteration, sample 0 ≤ α ≤ 1 uniformly at random. ◮ Pop from the stack top(iR,jR,iS,jS). ◮ Pick a case of top according to α. For simplicity, we assume there is only

  • ne case here, i.e.

Qtop =

iR≤k1<jR iS<k2≤jS

Qleft

iR,k1,k2,jSQright k1+1,jR,iS,k2+1

◮ Find k∗

1,k∗ 2 such that

iR≤k1<k∗ 1 iS<k2≤k∗ 2

··· ≃α ∑

iR≤k1<jR iS<k2≤jS

··· .

◮ Push left(iR,k∗

1,k∗ 2,jS) and right(k1 + 1,jR,iS,k2 + 1) onto the stack.

63/76,

slide-78
SLIDE 78

Sampling from the Boltzmann ensemble

◮ Push I(1,n,1,m) onto the stack. ◮ Iterate until the stack is empty, i.e. reaching a leaf (structure) in the

recursions.

◮ In each iteration, sample 0 ≤ α ≤ 1 uniformly at random. ◮ Pop from the stack top(iR,jR,iS,jS). ◮ Pick a case of top according to α. For simplicity, we assume there is only

  • ne case here, i.e.

Qtop =

iR≤k1<jR iS<k2≤jS

Qleft

iR,k1,k2,jSQright k1+1,jR,iS,k2+1

◮ Find k∗

1,k∗ 2 such that

iR≤k1<k∗ 1 iS<k2≤k∗ 2

··· ≃α ∑

iR≤k1<jR iS<k2≤jS

··· .

◮ Push left(iR,k∗

1,k∗ 2,jS) and right(k1 + 1,jR,iS,k2 + 1) onto the stack.

63/76,

slide-79
SLIDE 79

Sampling from the Boltzmann ensemble

◮ Push I(1,n,1,m) onto the stack. ◮ Iterate until the stack is empty, i.e. reaching a leaf (structure) in the

recursions.

◮ In each iteration, sample 0 ≤ α ≤ 1 uniformly at random. ◮ Pop from the stack top(iR,jR,iS,jS). ◮ Pick a case of top according to α. For simplicity, we assume there is only

  • ne case here, i.e.

Qtop =

iR≤k1<jR iS<k2≤jS

Qleft

iR,k1,k2,jSQright k1+1,jR,iS,k2+1

◮ Find k∗

1,k∗ 2 such that

iR≤k1<k∗ 1 iS<k2≤k∗ 2

··· ≃α ∑

iR≤k1<jR iS<k2≤jS

··· .

◮ Push left(iR,k∗

1,k∗ 2,jS) and right(k1 + 1,jR,iS,k2 + 1) onto the stack.

63/76,

slide-80
SLIDE 80

Sampling from the Boltzmann ensemble

◮ Push I(1,n,1,m) onto the stack. ◮ Iterate until the stack is empty, i.e. reaching a leaf (structure) in the

recursions.

◮ In each iteration, sample 0 ≤ α ≤ 1 uniformly at random. ◮ Pop from the stack top(iR,jR,iS,jS). ◮ Pick a case of top according to α. For simplicity, we assume there is only

  • ne case here, i.e.

Qtop =

iR≤k1<jR iS<k2≤jS

Qleft

iR,k1,k2,jSQright k1+1,jR,iS,k2+1

◮ Find k∗

1,k∗ 2 such that

iR≤k1<k∗ 1 iS<k2≤k∗ 2

··· ≃α ∑

iR≤k1<jR iS<k2≤jS

··· .

◮ Push left(iR,k∗

1,k∗ 2,jS) and right(k1 + 1,jR,iS,k2 + 1) onto the stack.

63/76,

slide-81
SLIDE 81

Sampling from the Boltzmann ensemble

◮ Push I(1,n,1,m) onto the stack. ◮ Iterate until the stack is empty, i.e. reaching a leaf (structure) in the

recursions.

◮ In each iteration, sample 0 ≤ α ≤ 1 uniformly at random. ◮ Pop from the stack top(iR,jR,iS,jS). ◮ Pick a case of top according to α. For simplicity, we assume there is only

  • ne case here, i.e.

Qtop =

iR≤k1<jR iS<k2≤jS

Qleft

iR,k1,k2,jSQright k1+1,jR,iS,k2+1

◮ Find k∗

1,k∗ 2 such that

iR≤k1<k∗ 1 iS<k2≤k∗ 2

··· ≃α ∑

iR≤k1<jR iS<k2≤jS

··· .

◮ Push left(iR,k∗

1,k∗ 2,jS) and right(k1 + 1,jR,iS,k2 + 1) onto the stack.

63/76,

slide-82
SLIDE 82

Sampling from the Boltzmann ensemble

◮ Push I(1,n,1,m) onto the stack. ◮ Iterate until the stack is empty, i.e. reaching a leaf (structure) in the

recursions.

◮ In each iteration, sample 0 ≤ α ≤ 1 uniformly at random. ◮ Pop from the stack top(iR,jR,iS,jS). ◮ Pick a case of top according to α. For simplicity, we assume there is only

  • ne case here, i.e.

Qtop =

iR≤k1<jR iS<k2≤jS

Qleft

iR,k1,k2,jSQright k1+1,jR,iS,k2+1

◮ Find k∗

1,k∗ 2 such that

iR≤k1<k∗ 1 iS<k2≤k∗ 2

··· ≃α ∑

iR≤k1<jR iS<k2≤jS

··· .

◮ Push left(iR,k∗

1,k∗ 2,jS) and right(k1 + 1,jR,iS,k2 + 1) onto the stack.

63/76,

slide-83
SLIDE 83

Fast Ponty-style sampling of the Boltzmann ensemble

· · · k1 k2 iR iS

iS + 1 iS + 2

jS · · · jR

iR + 1 iR + 2

k1 k2 · · · iR+jR

2

  • jR − 1

iR + 1

jR iR iS jS iS+jS

2

  • ·

· ·

jS − 1 iS + 1

Naïve traversal of indices Balanced traversal of indices O(n4) O(n2 logn)

64/76,

slide-84
SLIDE 84

Time and space complexity of piRNA

◮ O(n4m2 + n2m4) time. ◮ O(n2m2) space. ◮ about 100 tables in the dynamic programming. ◮ takes about 1 day on 64 CPUs with 150GB RAM for two 110nt RNAs

(OxyS-fhlA). Therefore, a fast heuristic is on demand for high-throughput applications, possibly as a filtering step.

65/76,

slide-85
SLIDE 85

Time and space complexity of piRNA

◮ O(n4m2 + n2m4) time. ◮ O(n2m2) space. ◮ about 100 tables in the dynamic programming. ◮ takes about 1 day on 64 CPUs with 150GB RAM for two 110nt RNAs

(OxyS-fhlA). Therefore, a fast heuristic is on demand for high-throughput applications, possibly as a filtering step.

65/76,

slide-86
SLIDE 86

Binding sites prediction

biRNA : a fast algorithm to predict simultaneous binding sites of two nucleic acids

Pros

◮ Predicts multiple simultaneous binding sites. ◮ Computes a more accurate local energy of binding. ◮ Considers zigzags and crossing interactions. ◮ Maintains tractability for existing cases in the literature.

Cons

◮ Approximates the intramolecular site accessibility energy. ◮ Its running time grows exponentially with the maximum number of

simultaneous binding sites.

66/76,

slide-87
SLIDE 87

Binding sites prediction

biRNA : a fast algorithm to predict simultaneous binding sites of two nucleic acids

Pros

◮ Predicts multiple simultaneous binding sites. ◮ Computes a more accurate local energy of binding. ◮ Considers zigzags and crossing interactions. ◮ Maintains tractability for existing cases in the literature.

Cons

◮ Approximates the intramolecular site accessibility energy. ◮ Its running time grows exponentially with the maximum number of

simultaneous binding sites.

66/76,

slide-88
SLIDE 88

biRNA

Steps of the algorithm for R and S

  • 1. For all short subsequences W, compute Pu(W), the prob. of being

unpaired (Mückstein et al. 2008).

  • 2. Obtain V , a short list of candidate sites.
  • 3. For all pairs W1,W2, compute Pu(W1,W2), the joint pairwise prob. of

being simultaneously unpaired.

  • 4. Build tree-structured Markov Random Fields (MRF) T = (V ,E) to

approximate the distribution of being simultaneously unpaired (Chow and Liu 1968).

  • 5. Compute QI

W RW S, the interaction partition functions restricted to

subsequences W R and W S using piRNA.

  • 6. Compute a matching between T R and T S that minimizes the binding

energy or equivalently maximizes the binding probability.

67/76,

slide-89
SLIDE 89

biRNA

Steps of the algorithm for R and S

  • 1. For all short subsequences W, compute Pu(W), the prob. of being

unpaired (Mückstein et al. 2008).

  • 2. Obtain V , a short list of candidate sites.
  • 3. For all pairs W1,W2, compute Pu(W1,W2), the joint pairwise prob. of

being simultaneously unpaired.

  • 4. Build tree-structured Markov Random Fields (MRF) T = (V ,E) to

approximate the distribution of being simultaneously unpaired (Chow and Liu 1968).

  • 5. Compute QI

W RW S, the interaction partition functions restricted to

subsequences W R and W S using piRNA.

  • 6. Compute a matching between T R and T S that minimizes the binding

energy or equivalently maximizes the binding probability.

67/76,

slide-90
SLIDE 90

biRNA

Steps of the algorithm for R and S

  • 1. For all short subsequences W, compute Pu(W), the prob. of being

unpaired (Mückstein et al. 2008).

  • 2. Obtain V , a short list of candidate sites.
  • 3. For all pairs W1,W2, compute Pu(W1,W2), the joint pairwise prob. of

being simultaneously unpaired.

  • 4. Build tree-structured Markov Random Fields (MRF) T = (V ,E) to

approximate the distribution of being simultaneously unpaired (Chow and Liu 1968).

  • 5. Compute QI

W RW S, the interaction partition functions restricted to

subsequences W R and W S using piRNA.

  • 6. Compute a matching between T R and T S that minimizes the binding

energy or equivalently maximizes the binding probability.

67/76,

slide-91
SLIDE 91

biRNA

Steps of the algorithm for R and S

  • 1. For all short subsequences W, compute Pu(W), the prob. of being

unpaired (Mückstein et al. 2008).

  • 2. Obtain V , a short list of candidate sites.
  • 3. For all pairs W1,W2, compute Pu(W1,W2), the joint pairwise prob. of

being simultaneously unpaired.

  • 4. Build tree-structured Markov Random Fields (MRF) T = (V ,E) to

approximate the distribution of being simultaneously unpaired (Chow and Liu 1968).

  • 5. Compute QI

W RW S, the interaction partition functions restricted to

subsequences W R and W S using piRNA.

  • 6. Compute a matching between T R and T S that minimizes the binding

energy or equivalently maximizes the binding probability.

67/76,

slide-92
SLIDE 92

biRNA

Steps of the algorithm for R and S

  • 1. For all short subsequences W, compute Pu(W), the prob. of being

unpaired (Mückstein et al. 2008).

  • 2. Obtain V , a short list of candidate sites.
  • 3. For all pairs W1,W2, compute Pu(W1,W2), the joint pairwise prob. of

being simultaneously unpaired.

  • 4. Build tree-structured Markov Random Fields (MRF) T = (V ,E) to

approximate the distribution of being simultaneously unpaired (Chow and Liu 1968).

  • 5. Compute QI

W RW S, the interaction partition functions restricted to

subsequences W R and W S using piRNA.

  • 6. Compute a matching between T R and T S that minimizes the binding

energy or equivalently maximizes the binding probability.

67/76,

slide-93
SLIDE 93

biRNA

Steps of the algorithm for R and S

  • 1. For all short subsequences W, compute Pu(W), the prob. of being

unpaired (Mückstein et al. 2008).

  • 2. Obtain V , a short list of candidate sites.
  • 3. For all pairs W1,W2, compute Pu(W1,W2), the joint pairwise prob. of

being simultaneously unpaired.

  • 4. Build tree-structured Markov Random Fields (MRF) T = (V ,E) to

approximate the distribution of being simultaneously unpaired (Chow and Liu 1968).

  • 5. Compute QI

W RW S, the interaction partition functions restricted to

subsequences W R and W S using piRNA.

  • 6. Compute a matching between T R and T S that minimizes the binding

energy or equivalently maximizes the binding probability.

67/76,

slide-94
SLIDE 94

biRNA

Binding energy minimization

Exhaustive search to find matching M = {(W R

1 ,W S 1 ),(W R 2 ,W S 2 ),...,(W R k ,W S k )} that minimizes

∆G(M) = EDR

u (M)+ EDS u (M)+∆GRS b (M),

in which EDR

u (M) = −RT logPR∗ u (W R 1 ,W R 2 ,...,W R k )

EDS

u (M) = −RT logPS∗ u (W S 1 ,W S 2 ,...,W S k )

∆GRS

b (M) = −RT ∑ 1≤i≤k

log(QI

W R

i W S i − QW R i QW S i ).

R is the universal gas constant and T is temperature.

68/76,

slide-95
SLIDE 95

Experimental results

Multi-sites

Pair Binding Site(s) biRNA RNAup Literature Site(s) Site OxyS-fhlA [22,30] [95,87] (23,30) (94,87)

  • [98,104]

[45,39] (96,104) (48,39) (96,104) (48,39) CopA-CopT [22,33] [70,59] (22,31) (70,61)

  • [48,56]

[44,36] (49,57) (43,35) (49,67) (43,24) [62,67] [29,24] (58,67) (33,24)

  • 69/76,
slide-96
SLIDE 96

Experimental results

Uni-sites

Pair GcvB gltI GcvB argT GcvB dppA GcvB livJ GcvB livK GcvB

  • ppA

GcvB STM4351 MicA lamB MicA

  • mpA

DsrA rpoS RprA rpoS IstR tisA MicC

  • mpC

MicF

  • mpF

RyhB sdhD RyhB sodB SgrS ptsG IncRNA54 repZ Lengths: 71-253 nt Running time: 10 min - 1 hour on 8 dual core CPUs and 20GB of RAM

70/76,

slide-97
SLIDE 97

Summary

◮ We presented piRNA an O(n4m2 + n2m4)-time O(n2m2)-space

complexity algorithm for interaction partition function, base-pair probabilities, minimum free energy secondary structure, equilibrium concentrations, melting temperature, and some other derivatives of the partition function.

◮ piRNA outperforms all other alternatives and is available at

http://compbio.cs.wayne.edu/chitsaz/.

◮ We presented biRNA , a fast RNA-RNA binding sites prediction algorithm. ◮ biRNA ’s tree-structured MRF approximation is accurate enough for

predicting binding sites and may be used in other applications.

71/76,

slide-98
SLIDE 98

Summary

◮ We presented piRNA an O(n4m2 + n2m4)-time O(n2m2)-space

complexity algorithm for interaction partition function, base-pair probabilities, minimum free energy secondary structure, equilibrium concentrations, melting temperature, and some other derivatives of the partition function.

◮ piRNA outperforms all other alternatives and is available at

http://compbio.cs.wayne.edu/chitsaz/.

◮ We presented biRNA , a fast RNA-RNA binding sites prediction algorithm. ◮ biRNA ’s tree-structured MRF approximation is accurate enough for

predicting binding sites and may be used in other applications.

71/76,

slide-99
SLIDE 99

Summary

◮ We presented piRNA an O(n4m2 + n2m4)-time O(n2m2)-space

complexity algorithm for interaction partition function, base-pair probabilities, minimum free energy secondary structure, equilibrium concentrations, melting temperature, and some other derivatives of the partition function.

◮ piRNA outperforms all other alternatives and is available at

http://compbio.cs.wayne.edu/chitsaz/.

◮ We presented biRNA , a fast RNA-RNA binding sites prediction algorithm. ◮ biRNA ’s tree-structured MRF approximation is accurate enough for

predicting binding sites and may be used in other applications.

71/76,

slide-100
SLIDE 100

Summary

◮ We presented piRNA an O(n4m2 + n2m4)-time O(n2m2)-space

complexity algorithm for interaction partition function, base-pair probabilities, minimum free energy secondary structure, equilibrium concentrations, melting temperature, and some other derivatives of the partition function.

◮ piRNA outperforms all other alternatives and is available at

http://compbio.cs.wayne.edu/chitsaz/.

◮ We presented biRNA , a fast RNA-RNA binding sites prediction algorithm. ◮ biRNA ’s tree-structured MRF approximation is accurate enough for

predicting binding sites and may be used in other applications.

71/76,

slide-101
SLIDE 101

Future work

◮ RNA design for positive and negative interactions. ◮ Better interaction energy model, which requires more data. ◮ Incorporation of non-canonical base pairs.

72/76,

slide-102
SLIDE 102

Future work

◮ RNA design for positive and negative interactions. ◮ Better interaction energy model, which requires more data. ◮ Incorporation of non-canonical base pairs.

72/76,

slide-103
SLIDE 103

Future work

◮ RNA design for positive and negative interactions. ◮ Better interaction energy model, which requires more data. ◮ Incorporation of non-canonical base pairs.

72/76,

slide-104
SLIDE 104

Acknowledgement

Collaborators

◮ Rolf Backofen, University of Freiburg, Germany ◮ Cenk Sahinalp, SFU, Canada ◮ Raheleh Salari

Funding

73/76,

slide-105
SLIDE 105

Thanks for your attention!

74/76,

slide-106
SLIDE 106

Hybrid component

Example

stem1 bulge stem2 internal stem3

R S β1 ∗σ

Ghybrid = β1 +σ(Gstem1 + Gbulge + Gstem2 + Ginternal + Gstem3).

75/76,

slide-107
SLIDE 107

Kissing loop

Example

R S

β2 β2 β3 β3

Gkissing = 4β2 + 2β3.

76/76,