A partition function algorithm for RNA-RNA interaction

Partition function for two strands

[Figure: recursion diagram decomposing I into the Ia and Ib cases. Legend: straight vertical line: intermolecular bond; solid: a base pair; dotted: not a base pair; dashed: either of those two.]

\[
Q^{I}_{i_R,j_R,i_S,j_S} = Q_{i_R,j_R}\,Q_{i_S,j_S}
+ \sum_{\substack{i_R \le k_1 < j_R \\ i_S < k_2 \le j_S}} Q_{i_R,k_1-1}\,Q_{k_2+1,j_S}\,Q^{Ib}_{k_1,j_R,i_S,k_2}
+ \sum_{\substack{i_R \le k_1 < j_R \\ i_S < k_2 \le j_S}} Q_{i_R,k_1-1}\,Q_{k_2+1,j_S}\,Q^{Ia}_{k_1,j_R,i_S,k_2}.
\]
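The recursion for Q^I can be read as a double loop over the split points k_1 and k_2. The following is a minimal sketch of one table entry, not the piRNA implementation: the four callables `Q_R`, `Q_S`, `Q_Ia` and `Q_Ib` are hypothetical lookups for already-filled tables, and the assumption that Q with R-indices refers to strand R and Q with S-indices to strand S is read off the subscripts.

```python
def q_interaction(iR, jR, iS, jS, Q_R, Q_S, Q_Ia, Q_Ib):
    """Evaluate one entry of Q^I from already-filled tables.

    Q_R(i, j), Q_S(i, j): single-strand partition functions (hypothetical
    callables; they must return 1.0 for an empty interval).
    Q_Ia, Q_Ib: interaction tables indexed as (k1, jR, iS, k2).
    """
    # First term: the two strands fold independently, no intermolecular bond.
    total = Q_R(iR, jR) * Q_S(iS, jS)
    for k1 in range(iR, jR):              # iR <= k1 < jR
        for k2 in range(iS + 1, jS + 1):  # iS < k2 <= jS
            outer = Q_R(iR, k1 - 1) * Q_S(k2 + 1, jS)
            # The Ib and Ia sums share limits and prefactor, so they are merged.
            total += outer * (Q_Ib(k1, jR, iS, k2) + Q_Ia(k1, jR, iS, k2))
    return total
```

Filling the whole four-dimensional table this way costs a double loop per entry, which is where the high polynomial running time of the full algorithm comes from.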

Q^{Ib}

[Figure: recursion diagram for Q^{Ib}. Tables appearing in it: Ih, Ihh, Ihb, Ia, Ib, bz.]
b: stands for bond

Q^{Ia}

[Figure: recursion diagram for Q^{Ia}. Tables appearing in it: Ia, Is, Is′, Ie, I.]
a: stands for arc; s: stands for subsume; e: stands for equivalent

Q^{Is} and Q^{Ie}

[Figure: recursion diagrams for Q^{Is} (tables Ism, Isk) and Q^{Ie} (tables Ism, Isk, gm, gk).]
s: stands for subsume; k: stands for kissing-loop; m: stands for multi-loop; e: stands for equivalent

All tables

[Figures: recursion diagrams for the dynamic-programming tables, slides 22 to 57 of 76. Table names appearing in the diagrams:]
◮ Indexed by one strand (i, j): b, bz, g, gm, gk.
◮ Interaction tables (i_R, j_R, i_S, j_S): I, Ia, Ib, Ie, Is, Is′, Ih, Ihh, Ihb, Ihm, Ism, Ism′, Isk, Isk′, Imm, Imk, Ikm, Ikk.
◮ Superscripted variants: I^{dd}, I^{dn}, I^{nd}, I^{nn}, I^{d∗}, I^{∗d}, I^{n∗}, I^{∗n}, Ia^{dd}, Ia^{dn}, Ia^{nd}, Ia^{nn}, I^{r}, Ia^{r}, Ib^{r}.

Equilibrium concentrations

For two RNAs R and S, assume five types of chemical compounds: R, S, RR, SS, RS. Solve

\[ K_R = \frac{Q^{I}_{RR}}{Q_R^{2}} = \frac{N_{RR}}{N_R^{2}}, \]
\[ K_S = \frac{Q^{I}_{SS}}{Q_S^{2}} = \frac{N_{SS}}{N_S^{2}}, \]
\[ K_{RS} = \frac{Q^{I}_{RS}}{Q_R\,Q_S} = \frac{N_{RS}}{N_R\,N_S}, \]
\[ N_{RS} = N^{0}_{R} - 2N_{RR} - N_R = N^{0}_{S} - 2N_{SS} - N_S, \]

to obtain the equilibrium concentrations N. N^0 are the initial concentrations of single strands.
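One possible way to solve this system numerically is sketched below. It is an illustration under stated assumptions, not the method used in the talk: the equilibrium constants are taken as given numbers in units consistent with the concentrations, N_S is eliminated with the closed-form root of its quadratic conservation equation, and N_R is then found by bisection. The function name `equilibrium` is hypothetical.

```python
import math

def equilibrium(K_R, K_S, K_RS, N0_R, N0_S, iterations=200):
    """Return (N_R, N_S, N_RR, N_SS, N_RS) at equilibrium.

    Uses N_RR = K_R*N_R^2, N_SS = K_S*N_S^2, N_RS = K_RS*N_R*N_S and the
    conservation laws N0_R = N_R + 2*N_RR + N_RS, N0_S = N_S + 2*N_SS + N_RS.
    """
    def free_S(N_R):
        # Positive root of 2*K_S*x^2 + (1 + K_RS*N_R)*x - N0_S = 0,
        # written in a form that stays valid when K_S == 0.
        b = 1.0 + K_RS * N_R
        return 2.0 * N0_S / (b + math.sqrt(b * b + 8.0 * K_S * N0_S))

    def residual(N_R):
        # Conservation of strand R; increasing in N_R on [0, N0_R].
        return N_R + 2.0 * K_R * N_R ** 2 + K_RS * N_R * free_S(N_R) - N0_R

    lo, hi = 0.0, N0_R
    for _ in range(iterations):  # plain bisection on the free R concentration
        mid = 0.5 * (lo + hi)
        if residual(mid) > 0.0:
            hi = mid
        else:
            lo = mid
    N_R = 0.5 * (lo + hi)
    N_S = free_S(N_R)
    return N_R, N_S, K_R * N_R ** 2, K_S * N_S ** 2, K_RS * N_R * N_S
```

The fraction of R in the heterodimer is then N_RS / N0_R, which is the kind of quantity plotted against concentration in a titration curve.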

Equilibrium concentration of OxyS with wild type fhlA

[Figure: OxyS in complex [%] versus fhlA concentration [nM] for wild type fhlA; curves: Our Algorithm, Experiment.]
Initial [OxyS] = 2 nM, [fhlA] = 0 to 1000 nM.

Equilibrium concentration of OxyS with fhlA mutants

[Figure: four panels, each plotting OxyS in complex [%] versus mutant fhlA concentration [nM], with curves Our Algorithm and Experiment. Panels: fhlA A8G; fhlA C13G; fhlA G37C;G38C; fhlA G38C;G39C.]

Melting temperature prediction

Comparison of piRNA results over three data sets

Average error
Set   Size            Length     piRNA     RNAcofold   UNAFold
I     9 short pairs   5-7nt      1.48 °C   9.35 °C     8.55 °C
II    12 pairs        ∼20nt      4.86 °C   22.97 °C    9.12 °C
III   62 pairs        22-40nt    1.91 °C   14.34 °C    26.53 °C

Spearman rank correlation
Set   Size            Length     piRNA     RNAcofold   UNAFold
I     9 short pairs   5-7nt      0.97      0.97        0.57
II    12 pairs        ∼20nt      0.41      −0.03       0.1
III   62 pairs        22-40nt    0.3       −0.04       0.24

Promised base pairing probabilities

P^I and P^{Ia} examples

\[
P^{I}_{i_R,j_R,i_S,j_S} = \sum_{\substack{1 \le k_1 < i_R \\ j_S < k_2 \le L_S}}
P^{Ia}_{k_1,j_R,i_S,k_2}\,
\frac{\left(Q^{Is}_{k_1,i_R,j_S,k_2} + Q^{Is'}_{k_1,i_R,j_S,k_2} + Q^{Ie}_{k_1,i_R,j_S,k_2}\right) Q^{I}_{i_R,j_R,i_S,j_S}}{Q^{Ia}_{k_1,j_R,i_S,k_2}}
\]

\[
P^{Ia}_{i_R,j_R,i_S,j_S} = \sum_{\substack{1 \le k_1 \le i_R \\ j_S \le k_2 \le L_S}}
P^{I}_{k_1,j_R,i_S,k_2}\,
\frac{Q_{k_1,i_R-1}\,Q_{j_S+1,k_2}\,Q^{Ia}_{i_R,j_R,i_S,j_S}}{Q^{I}_{k_1,j_R,i_S,k_2}}
+ \sum_{\substack{1 \le k_1 < i_R \\ j_S \le k_2 \le L_S}}
P^{Ib}_{k_1,j_R,i_S,k_2}\,
\frac{Q^{Ihh}_{k_1,i_R,j_S,k_2}\,Q^{Ia}_{i_R,j_R,i_S,j_S}}{Q^{Ib}_{k_1,j_R,i_S,k_2}}.
\]

More on this part will be presented by Peter Stadler.

Sampling from the Boltzmann ensemble

◮ Push I(1, n, 1, m) onto the stack.
◮ Iterate until the stack is empty, i.e. until every branch of the recursions has reached a leaf (a structure).
◮ In each iteration, sample 0 ≤ α ≤ 1 uniformly at random.
◮ Pop top(i_R, j_R, i_S, j_S) from the stack.
◮ Pick a case of top according to α. For simplicity, we assume there is only one case here, i.e.
\[
Q^{top}_{i_R,j_R,i_S,j_S} = \sum_{\substack{i_R \le k_1 < j_R \\ i_S < k_2 \le j_S}} Q^{left}_{i_R,k_1,k_2,j_S}\,Q^{right}_{k_1+1,j_R,i_S,k_2+1}
\]
◮ Find k_1^*, k_2^* such that
\[
\sum_{i_R \le k_1 < k_1^*}\ \sum_{i_S < k_2 \le k_2^*} \cdots \;\simeq\; \alpha \sum_{i_R \le k_1 < j_R}\ \sum_{i_S < k_2 \le j_S} \cdots .
\]
◮ Push left(i_R, k_1^*, k_2^*, j_S) and right(k_1^* + 1, j_R, i_S, k_2^* + 1) onto the stack.
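The stack-based procedure above can be illustrated on a much smaller recursion. The sketch below is not piRNA: it assumes a toy single-strand model in which every allowed base pair contributes the same Boltzmann weight and there is no minimum hairpin size, with Q(i, j) = Q(i, j−1) + Σ_k Q(i, k−1) · w · Q(k+1, j−1) over positions k that can pair with j. The names `fill_partition` and `sample_structure` are hypothetical. The steps are the same as on the slide: pop a subproblem, draw α, walk the cases until the cumulative weight reaches α · Q, then push the resulting subproblems.

```python
import random

PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}

def fill_partition(seq, weight=1.0):
    """Fill the toy table and return q(i, j), with q = 1 for intervals of length <= 1."""
    n = len(seq)
    table = [[1.0] * n for _ in range(n)]

    def q(i, j):
        return table[i][j] if i <= j else 1.0

    for span in range(1, n):
        for i in range(n - span):
            j = i + span
            total = q(i, j - 1)                      # case: j unpaired
            for k in range(i, j):                    # case: k pairs with j
                if (seq[k], seq[j]) in PAIRS:
                    total += q(i, k - 1) * weight * q(k + 1, j - 1)
            table[i][j] = total
    return q

def sample_structure(seq, q, rng, weight=1.0):
    """Draw one structure (sorted list of base pairs) from the toy ensemble."""
    pairs = []
    stack = [(0, len(seq) - 1)]
    while stack:
        i, j = stack.pop()
        if j <= i:
            continue                                 # leaf: nothing left to decide
        threshold = rng.random() * q(i, j)           # alpha * Q_top
        acc = q(i, j - 1)
        if acc >= threshold:
            stack.append((i, j - 1))
            continue
        for k in range(i, j):
            if (seq[k], seq[j]) in PAIRS:
                acc += q(i, k - 1) * weight * q(k + 1, j - 1)
                if acc >= threshold:                 # found the split point k*
                    pairs.append((k, j))
                    stack.append((i, k - 1))
                    stack.append((k + 1, j - 1))
                    break
        else:
            stack.append((i, j - 1))                 # defensive fallback only
    return sorted(pairs)
```

Calling `sample_structure(seq, fill_partition(seq), random.Random(seed))` repeatedly with different seeds yields independent draws; in the interaction setting the same loop would run over four indices and many more cases per table.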

Fast Ponty-style sampling of the Boltzmann ensemble

[Figure: two panels showing the order in which the split indices k_1 in [i_R, j_R] and k_2 in [i_S, j_S] are visited.]
◮ Naïve traversal of indices: O(n^4).
◮ Balanced traversal of indices: O(n^2 log n).
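As a rough illustration of what a balanced traversal can look like, the sketch below visits an index range alternately from both ends, finishing near the midpoint ⌊(i + j)/2⌋. This ordering is an assumption based on the boustrophedon idea associated with Ponty's sampling method, and the helper name `balanced_order` is hypothetical. The intuition is that a split point close to either end is found after few steps, so the expensive searches are those that cut the interval into two halves of similar size.

```python
def balanced_order(lo, hi):
    """Yield lo, hi, lo+1, hi-1, ... over the closed range [lo, hi]."""
    while lo < hi:
        yield lo
        yield hi
        lo += 1
        hi -= 1
    if lo == hi:
        yield lo  # middle element of an odd-length range
```

In a sampler, the candidate split points would be accumulated in this order instead of left to right.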

Time and space complexity of piRNA

◮ O(n^4 m^2 + n^2 m^4) time.
◮ O(n^2 m^2) space.
◮ About 100 tables in the dynamic programming.
◮ Takes about 1 day on 64 CPUs with 150GB RAM for two 110nt RNAs (OxyS-fhlA).

Therefore, a fast heuristic is needed for high-throughput applications, possibly as a filtering step.
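A back-of-the-envelope calculation shows how these bounds scale. The sketch below simply evaluates the two expressions; treating all of the roughly 100 tables as full four-dimensional arrays and using 8 bytes per entry are assumptions made for illustration, not figures from the talk.

```python
def pirna_cost_estimate(n, m, tables=100, bytes_per_entry=8):
    """Return (elementary steps, bytes) from the stated asymptotic bounds.

    Constant factors are ignored; `tables` and `bytes_per_entry` are
    illustrative assumptions.
    """
    steps = n ** 4 * m ** 2 + n ** 2 * m ** 4
    memory = tables * n ** 2 * m ** 2 * bytes_per_entry
    return steps, memory

steps, memory = pirna_cost_estimate(110, 110)
print(f"{steps:.2e} steps, {memory / 1e9:.0f} GB")
```

For n = m = 110 this gives about 3.5 × 10^12 steps and about 117 GB, the same order of magnitude as the 150GB quoted above, although the agreement depends entirely on the assumed entry size and table shapes.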

Binding sites prediction

biRNA: a fast algorithm to predict simultaneous binding sites of two nucleic acids

Pros
◮ Predicts multiple simultaneous binding sites.
◮ Computes a more accurate local energy of binding.
◮ Considers zigzags and crossing interactions.
◮ Maintains tractability for existing cases in the literature.

Cons
◮ Approximates the intramolecular site accessibility energy.
◮ Its running time grows exponentially with the maximum number of simultaneous binding sites.

biRNA

Steps of the algorithm for R and S
1. For all short subsequences W, compute P_u(W), the probability of being unpaired (Mückstein et al. 2008).
2. Obtain V, a short list of candidate sites.
3. For all pairs W_1, W_2, compute P_u(W_1, W_2), the joint pairwise probability of being simultaneously unpaired.
4. Build tree-structured Markov Random Fields (MRF) T = (V, E) to approximate the distribution of being simultaneously unpaired (Chow and Liu 1968).
5. Compute Q^I_{W^R W^S}, the interaction partition functions restricted to subsequences W^R and W^S, using piRNA.
6. Compute a matching between T^R and T^S that minimizes the binding energy or, equivalently, maximizes the binding probability.
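Step 4 can be sketched with the classical Chow-Liu construction: weight every pair of candidate sites by the mutual information of their "unpaired" indicators and keep a maximum-weight spanning tree. The code below is a minimal sketch under stated assumptions, not the biRNA implementation: each site is treated as a binary variable, `p_single[i]` stands for P_u(W_i), `p_pair[(i, j)]` with i < j stands for P_u(W_i, W_j), and the tree is grown with Prim's algorithm from site 0. All function and parameter names are hypothetical.

```python
import math

def mutual_information(p_i, p_j, p_ij):
    """Mutual information (in nats) of two binary indicators.

    p_i, p_j: marginal probabilities of being unpaired; p_ij: joint probability.
    """
    joint = {(1, 1): p_ij, (1, 0): p_i - p_ij,
             (0, 1): p_j - p_ij, (0, 0): 1.0 - p_i - p_j + p_ij}
    marg_i = {1: p_i, 0: 1.0 - p_i}
    marg_j = {1: p_j, 0: 1.0 - p_j}
    mi = 0.0
    for (a, b), p in joint.items():
        if p > 0.0:                       # zero-probability cells contribute 0
            mi += p * math.log(p / (marg_i[a] * marg_j[b]))
    return mi

def chow_liu_tree(p_single, p_pair):
    """Return the edges of a maximum mutual-information spanning tree."""
    n = len(p_single)

    def weight(i, j):
        a, b = min(i, j), max(i, j)
        return mutual_information(p_single[a], p_single[b], p_pair[(a, b)])

    in_tree, edges = {0}, []
    while len(in_tree) < n:
        # Prim's step: heaviest edge leaving the current tree.
        _, i, j = max((weight(i, j), i, j)
                      for i in in_tree for j in range(n) if j not in in_tree)
        edges.append((min(i, j), max(i, j)))
        in_tree.add(j)
    return sorted(edges)
```

The resulting tree factorizes the joint distribution into pairwise terms along its edges, which is what keeps the probability of several sites being unpaired at once cheap to evaluate.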

biRNA

Binding energy minimization

Exhaustive search to find the matching M = {(W^R_1, W^S_1), (W^R_2, W^S_2), ..., (W^R_k, W^S_k)} that minimizes

\[ \Delta G(M) = ED^{R}_{u}(M) + ED^{S}_{u}(M) + \Delta G^{RS}_{b}(M), \]

in which

\[ ED^{R}_{u}(M) = -RT \log P^{R*}_{u}(W^R_1, W^R_2, \ldots, W^R_k), \]
\[ ED^{S}_{u}(M) = -RT \log P^{S*}_{u}(W^S_1, W^S_2, \ldots, W^S_k), \]
\[ \Delta G^{RS}_{b}(M) = -RT \sum_{1 \le i \le k} \log\left(Q^{I}_{W^R_i W^S_i} - Q_{W^R_i}\,Q_{W^S_i}\right). \]

R is the universal gas constant and T is temperature.
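The objective is straightforward to evaluate once its ingredients are known. The sketch below scores a single matching and is only an illustration: the joint unpaired probabilities and the per-site partition functions are assumed to be given as plain numbers, the gas constant is expressed in kcal/(mol·K), and the default temperature of 310.15 K (37 °C) is an assumption. The function and parameter names are hypothetical.

```python
import math

GAS_CONSTANT = 0.0019872  # kcal/(mol*K)

def binding_free_energy(p_unpaired_R, p_unpaired_S, site_terms, temperature=310.15):
    """Delta G(M) = ED_u^R + ED_u^S + Delta G_b^RS for one matching M.

    p_unpaired_R, p_unpaired_S: joint probabilities that all matched sites
    on R (resp. S) are simultaneously unpaired.
    site_terms: one (Q_I, Q_WR, Q_WS) triple per matched pair of sites.
    """
    rt = GAS_CONSTANT * temperature
    ed_r = -rt * math.log(p_unpaired_R)
    ed_s = -rt * math.log(p_unpaired_S)
    # Q_I - Q_WR * Q_WS keeps only the structures with intermolecular bonds.
    dg_b = -rt * sum(math.log(q_i - q_r * q_s) for q_i, q_r, q_s in site_terms)
    return ed_r + ed_s + dg_b
```

An exhaustive search would call this for every candidate matching and keep the minimum, which is why the running time grows quickly with the number of simultaneous sites.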

Experimental results

Multi-sites

Pair        Binding site(s), literature   biRNA site(s)       RNAup site
OxyS-fhlA   [22,30] [95,87]               (23,30) (94,87)     -
            [98,104] [45,39]              (96,104) (48,39)    (96,104) (48,39)
CopA-CopT   [22,33] [70,59]               (22,31) (70,61)     -
            [48,56] [44,36]               (49,57) (43,35)     (49,67) (43,24)
            [62,67] [29,24]               (58,67) (33,24)     -

Experimental results

Uni-sites

Pairs: GcvB-gltI, GcvB-argT, GcvB-dppA, GcvB-livJ, GcvB-livK, GcvB-oppA, GcvB-STM4351, MicA-lamB, MicA-ompA, DsrA-rpoS, RprA-rpoS, IstR-tisA, MicC-ompC, MicF-ompF, RyhB-sdhD, RyhB-sodB, SgrS-ptsG, IncRNA54-repZ.

Lengths: 71-253 nt
Running time: 10 min to 1 hour on 8 dual-core CPUs and 20GB of RAM

Summary

◮ We presented piRNA, an O(n^4 m^2 + n^2 m^4)-time, O(n^2 m^2)-space algorithm for the interaction partition function, base-pair probabilities, minimum free energy secondary structure, equilibrium concentrations, melting temperature, and some other derivatives of the partition function.
◮ piRNA outperforms all other alternatives and is available at http://compbio.cs.wayne.edu/chitsaz/ .
◮ We presented biRNA, a fast RNA-RNA binding sites prediction algorithm.
◮ biRNA's tree-structured MRF approximation is accurate enough for predicting binding sites and may be used in other applications.

A partition function algorithm for RNA-RNA interaction
Hamidreza Chitsaz, Raheleh Salari, Cenk Sahinalp, Rolf Backofen
Wayne State University, chitsaz@wayne.edu
Benasque RNA Meeting, July 27th, 2012

Mini biography: Robotics, RNA

