A partition function algorithm for RNA-RNA interaction
Hamidreza Chitsaz, Raheleh Salari, Cenk Sahinalp, Rolf Backofen
Wayne State University chitsaz@wayne.edu
Benasque RNA Meeting
July 27th, 2012
RNAhybrid (Rehmsmeier et al. 2004), RNAduplex (Bernhart et al. 2006), UNAFold (Markham et al.): no internal structure.
PairFold (Andronescu et al. 2005), RNAcofold (Bernhart et al. 2006): no kissing hairpins.
RNAup (Mückstein et al. 2008), IntaRNA (Busch et al. 2008): just one binding site, not the complete structure.
NUPACK (Dirks et al. 2003, 2007): still no kissing hairpins!
IRIS (Pervouchine 2004), inteRNA (Alkan et al. 2005), Grammatical Approach (Kato et al. 2009): voilà, now we are talking business.
The problem is NP-hard (Alkan et al. 2005); no surprise, as pseudoknots are NP-hard.
Exclude zigzags and crossing interactions to lift the curse of complexity and obtain an exact O(n^6)-time, O(n^4)-space DP algorithm (albeit for simple base-pair counting).
First-order zigzag. A general zigzag involves an arbitrary number of kissing hairpins.
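The "no crossing interactions" restriction can be illustrated with a small check. This is only a sketch under an assumed index convention that the slide does not state: each intermolecular bond is a pair (i, j), with i a position in R and j a position in S, and both strands are numbered in the order they are drawn facing each other. The function name is hypothetical, and the check does not test for zigzags.

```python
def is_noncrossing(bonds):
    """Return True if no two intermolecular bonds cross.

    bonds: iterable of (i, j) pairs; i indexes strand R, j indexes strand S.
    Assumed convention: both strands are numbered in drawing order, so two
    bonds (i1, j1), (i2, j2) with i1 < i2 are compatible only if j1 < j2.
    A nucleotide that appears in two bonds is also rejected.
    """
    ordered = sorted(bonds)
    # After sorting by i, it is enough to compare neighbouring bonds:
    # if j increases strictly along the list, every pair is compatible.
    for (i1, j1), (i2, j2) in zip(ordered, ordered[1:]):
        if i1 == i2 or j1 >= j2:
            return False
    return True
```

For example, the bonds (1, 2) and (3, 5) are compatible under this convention, while (1, 5) and (3, 2) cross.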
[Equation: a sum over s ∈ S]
Mathews et al. 1999
Chitsaz et al., Bioinformatics 25(12): i365-i373
[Equation: a sum over s ∈ S, with s = sa ∪ sb, sa ∈ Sa, sb ∈ Sb]
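For reference, the index sets s ∈ S, s = sa ∪ sb, sa ∈ Sa, sb ∈ Sb match the textbook Boltzmann partition function. The block below is a sketch of that standard definition, not a transcription of the slide. The symbols G (free energy of a structure), R (gas constant), T (temperature), Q_a and Q_b are assumptions introduced here.

```latex
\begin{align*}
Q &= \sum_{s \in S} e^{-G(s)/RT} \\
% If every s splits as s = s_a \cup s_b with no intermolecular bond
% and the energy is additive, G(s) = G(s_a) + G(s_b), then:
Q &= \sum_{s_a \in S_a} \sum_{s_b \in S_b} e^{-\left(G(s_a) + G(s_b)\right)/RT}
   = Q_a \, Q_b
\end{align*}
```

Under these assumptions, the product of two single-strand partition functions accounts exactly for the structures in which the two strands do not interact.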
straight horizontal line: nucleotides indexed from 1 to n
solid arc: a base pair
dashed arc: can be a base pair or not
white region: open to more recursions
blue region: finalized in the recursion; compute its energy contribution
green region: open to more recursions with multibranch loop energy
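straight vertical line: intermolecular bond
solid: a base pair
dotted: not a base pair
dashed: either of those two

[Figure: decomposition of I into the cases Ia and Ib over the indices iR, jR, iS, jS with split points k1, k2]

$$
Q^{I}_{i_R,j_R,i_S,j_S} = Q_{i_R,j_R}\,Q_{i_S,j_S}
+ \sum_{\substack{i_R \le k_1 < j_R \\ i_S < k_2 \le j_S}} Q_{i_R,k_1-1}\,Q_{k_2+1,j_S}\,Q^{Ib}_{k_1,j_R,i_S,k_2}
+ \sum_{\substack{i_R \le k_1 < j_R \\ i_S < k_2 \le j_S}} Q_{i_R,k_1-1}\,Q_{k_2+1,j_S}\,Q^{Ia}_{k_1,j_R,i_S,k_2}.
$$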
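The Q^I recursion can be evaluated directly once the sub-case tables are available. The following is a minimal sketch, assuming the single-strand partition functions and the Ia and Ib tables are supplied as callables; the names `QR`, `QS`, `QIa`, `QIb` and the function itself are hypothetical, and the sketch is not the authors' implementation.

```python
def interaction_partition(QR, QS, QIa, QIb, iR, jR, iS, jS):
    """Evaluate the Q^I recursion for one index tuple (iR, jR, iS, jS).

    QR(i, j), QS(i, j): single-strand partition functions of R and S; they
        are assumed to return 1.0 for an empty interval (j == i - 1).
    QIa(k1, jR, iS, k2), QIb(k1, jR, iS, k2): partition functions of the
        Ia and Ib sub-cases, assumed to be already computed.
    """
    # First term: no intermolecular bond, the two strands fold independently.
    total = QR(iR, jR) * QS(iS, jS)
    # Second and third terms: sum over the split point (k1, k2).
    for k1 in range(iR, jR):                # iR <= k1 < jR
        for k2 in range(iS + 1, jS + 1):    # iS < k2 <= jS
            flanks = QR(iR, k1 - 1) * QS(k2 + 1, jS)
            total += flanks * (QIb(k1, jR, iS, k2) + QIa(k1, jR, iS, k2))
    return total
```

A full implementation would fill the tables bottom-up or memoize them; evaluating a single entry as above costs O(n^2) lookups once the sub-tables exist.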
[Figures: recursion diagrams, drawn over the indices iR, jR, iS, jS with split points k1, k1′, k2, k2′ (and i, j for a single strand), for the interaction cases I, Ia, Ib, Ir, Iar, Ibr, Ih, Ihh, Ihb, Ihm, Is, Is′, Ie, Ism, Ism′, Isk, Isk′, Idd, Idn, Ind, Inn, Id∗, I∗d, In∗, I∗n, Iadd, Iadn, Iand, Iann, Ikk, Ikm, Imk, Imm and the single-strand cases b, bz, d, e, g, gk, gm]
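$$
\frac{Q^{I}_{RR}}{Q_R^{2}} = \frac{N_{RR}}{N_R^{2}}, \qquad
\frac{Q^{I}_{SS}}{Q_S^{2}} = \frac{N_{SS}}{N_S^{2}}, \qquad
\frac{Q^{I}_{RS}}{Q_R Q_S} = \frac{N_{RS}}{N_R N_S},
$$

$$
N^{0}_{R} - 2N_{RR} - N_R = N^{0}_{S} - 2N_{SS} - N_S.
$$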
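The equilibrium relations can be solved numerically for the concentrations. The sketch below makes several assumptions that the slide does not spell out: the three partition-function ratios are passed in as plain numbers `K_RR`, `K_SS`, `K_RS` with units already reconciled, both sides of the last relation equal N_RS (mass conservation), and N^0_R, N^0_S are the total concentrations. The function name and the bisection approach are illustrative choices, not the authors' method.

```python
import math

def equilibrium_concentrations(K_RR, K_SS, K_RS, N0_R, N0_S):
    """Solve N_RR = K_RR*N_R^2, N_SS = K_SS*N_S^2, N_RS = K_RS*N_R*N_S
    together with the assumed mass balances
        N0_R = N_R + 2*N_RR + N_RS,   N0_S = N_S + 2*N_SS + N_RS.
    """
    def free_R(N_S):
        # Positive root of 2*K_RR*x^2 + (1 + K_RS*N_S)*x - N0_R = 0,
        # written in a form that stays valid when K_RR == 0.
        b = 1.0 + K_RS * N_S
        return 2.0 * N0_R / (b + math.sqrt(b * b + 8.0 * K_RR * N0_R))

    def excess_S(N_S):
        # Residual of the S mass balance; it increases with N_S.
        return N_S + 2.0 * K_SS * N_S ** 2 + K_RS * free_R(N_S) * N_S - N0_S

    lo, hi = 0.0, N0_S          # the free S concentration lies in [0, N0_S]
    for _ in range(200):        # plain bisection on the monotone residual
        mid = 0.5 * (lo + hi)
        if excess_S(mid) > 0.0:
            hi = mid
        else:
            lo = mid
    N_S = 0.5 * (lo + hi)
    N_R = free_R(N_S)
    return {"R": N_R, "S": N_S, "RR": K_RR * N_R ** 2,
            "SS": K_SS * N_S ** 2, "RS": K_RS * N_R * N_S}
```

With these outputs, the fraction of R in complex with S would be `100 * c["RS"] / N0_R` percent, again under the stated assumptions.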
[Figure: four panels plotting OxyS in complex [%] against the concentration [nM] of the fhlA mutants A8G, C13G, G37C;G38C, and G38C;G39C, each comparing "Our Algorithm" with "Experiment"]
[Equations: sums over 1 ≤ k1 < iR, jS < k2 ≤ LS (and the variants with k1 ≤ iR or jS ≤ k2) whose terms combine quantities indexed by (k1, jR, iS, k2), (k1, iR, jS, k2), and (iR, jR, iS, jS), including Q^{Is′}, Q^{Ie}, Q^{Ia}, and Q^{I}; the remaining symbols are missing]
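◮ In each iteration, sample 0 ≤ α ≤ 1 uniformly at random.
◮ Pop top(iR, jR, iS, jS) from the stack.
◮ Pick a case of top according to α. For simplicity, we assume there is only one case:

$$
Q^{top}_{i_R,j_R,i_S,j_S} = \sum_{\substack{i_R \le k_1 < j_R \\ i_S < k_2 \le j_S}} Q^{left}_{i_R,k_1,k_2,j_S}\,Q^{right}_{k_1+1,j_R,i_S,k_2+1}
$$

◮ Find k1∗, k2∗ such that the partial sum of these terms over iR ≤ k1 < k1∗, iS < k2 ≤ k2∗ corresponds to the fraction α of the full sum over iR ≤ k1 < jR, iS < k2 ≤ jS.
◮ Push left(iR, k1∗, k2∗, jS) and right(k1∗ + 1, jR, iS, k2∗ + 1) onto the stack.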
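The split-selection step can be sketched as a cumulative-sum search. This is a minimal illustration, assuming the candidate splits are enumerated in some fixed order and that each weight is the product Q^left · Q^right for that split; the function name and the list-of-pairs input format are hypothetical.

```python
def sample_split(weights, alpha):
    """Pick a split (k1, k2) with probability proportional to its weight.

    weights: list of ((k1, k2), w) pairs in a fixed enumeration order,
             where w >= 0 stands for Q_left * Q_right of that split.
    alpha:   a number in [0, 1], e.g. drawn with random.random().
    """
    total = sum(w for _, w in weights)
    threshold = alpha * total
    running = 0.0
    for split, w in weights:
        running += w
        # The first split whose cumulative weight exceeds alpha * total.
        if running > threshold:
            return split
    # alpha == 1 or rounding: fall back to the last split with w > 0.
    for split, w in reversed(weights):
        if w > 0:
            return split
    raise ValueError("all weights are zero")
```

In a full traceback, the chosen split would determine the two sub-problems pushed onto the stack, and the loop would repeat until the stack is empty.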
$Q^{I}_{W^R W^S}$, the interaction partition functions restricted to
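$M = \{(W^R_1, W^S_1), (W^R_2, W^S_2), \ldots, (W^R_k, W^S_k)\}$ that minimizes

$$
ED^{R}_{u}(M) + ED^{S}_{u}(M) + \Delta G^{RS}_{b}(M),
$$

$$
ED^{R}_{u}(M) = -RT \log P^{R*}_{u}(W^R_1, W^R_2, \ldots, W^R_k)
$$

$$
ED^{S}_{u}(M) = -RT \log P^{S*}_{u}(W^S_1, W^S_2, \ldots, W^S_k)
$$

$$
\Delta G^{RS}_{b}(M) = -RT \sum_{1 \le i \le k} \log\left(Q^{I}_{W^R_i W^S_i} - Q_{W^R_i} Q_{W^S_i}\right).
$$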
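The objective can be evaluated for a candidate set of window pairs once the required probabilities and partition functions are known. The sketch below assumes those quantities are precomputed and passed in as numbers; the function name, the argument layout, and the value of RT (kcal/mol at 37 °C) are assumptions made for illustration.

```python
import math

RT = 0.61633  # kcal/mol at 37 C; assumed value and units

def matching_score(p_unpaired_R, p_unpaired_S, sites):
    """Score a set of binding-site pairs M; lower is better.

    p_unpaired_R: joint probability that all chosen windows in R are unpaired.
    p_unpaired_S: the same for the windows in S.
    sites: list of (QI_w, Q_wR, Q_wS) triples, one per window pair, holding
           the restricted interaction partition function and the two
           single-strand partition functions of the windows.
    """
    ed_R = -RT * math.log(p_unpaired_R)
    ed_S = -RT * math.log(p_unpaired_S)
    # QI_w - Q_wR * Q_wS keeps only the structures that actually interact.
    dG_b = -RT * sum(math.log(QI - QR * QS) for QI, QR, QS in sites)
    return ed_R + ed_S + dG_b
```

Searching over candidate sets M and keeping the one with the lowest score is a separate optimization step that this sketch does not cover.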
Pair        Literature binding site(s)   biRNA site(s)        RNAup site
OxyS-fhlA   [22,30] [95,87]              (23,30) (94,87)
            [45,39]                      (96,104) (48,39)     (96,104) (48,39)
CopA-CopT   [22,33] [70,59]              (22,31) (70,61)
            [44,36]                      (49,57) (43,35)      (49,67) (43,24)
            [62,67] [29,24]              (58,67) (33,24)
Pair
GcvB gltI
GcvB argT
GcvB dppA
GcvB livJ
GcvB livK
GcvB
GcvB STM4351
MicA lamB
MicA
DsrA rpoS
RprA rpoS
IstR tisA
MicC
MicF
RyhB sdhD
RyhB sodB
SgrS ptsG
IncRNA54 repZ

Lengths: 71-253 nt
Running time: 10 min to 1 hour on 8 dual-core CPUs and 20 GB of RAM
[Figure: interaction between R and S with the labels stem1, bulge, stem2, internal, stem3, β1, and σ]
[Figure: R and S with the labels β2 and β3]