Colloque MASIM@Journées BIM 2019
Secondary structure prediction of RNA complexes
Audrey Legendre, Eric Angel, Fariza Tahi 05/11/2019
CONTEXT
GCGGAUUUAGCUCAGUU GGGAGAGCGCCAGACUG AAGAPCUGGAGGUCCUG UGUPCGAUCCACAGAAU UCGCACCA
(PDB)
Riboswitch (Seeliger et al., 2012)
(Deigan et al., 2009)
Tools compared on: Pseudoknots / Several solutions / Experimental data
NanoFolder (Bindewald et al., 2016): ✗
NUPACK (Zadeh et al., 2011): ✗
MultiRNAFold (Andronescu et al., 2005): ✗ ✗ ✗
(RNA Complex Prediction)
(Constrained RNA Complex Prediction)
BiokoP (Legendre et al., 2018) pKiss (Janssen and Giegerich, 2014) RNAsubopt (Lorenz et al., 2011)
RNAsubopt (Lorenz et al., 2011)
MFE User constraints Structural data
Mono-objective: Free energy User constraints Structural data
[Figure: graph with vertices s1, s2, s3 and i1, i2, i3, i4, i5]
Structure 1 / Structure 2
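Multi-objective: Free energy User constraints Structural data
[Figure: graph with vertices s1, s2, s3 and i1, i2, i3, i4, i5, annotated with numeric weights; values in reading order: 60 0.65; 235 2.95; 80 0.75; 75 0.7; 80 0.9; 0.8; 0.5 0.75; 60 0.6; 70 0.7]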
(Benlic and Hao, 2013)
[Figure: graph with vertices s1, s2, s3 and i1, i2, i3, i4, i5]
three objectives,
movements,
30 solutions of BiokoP (Legendre et al., 2018) 30 solutions of pKiss (Janssen and Giegerich, 2014) 30 solutions of RNAsubopt (Lorenz et al., 2011)
30 solutions of RNAsubopt
(Lorenz et al., 2011)
RCPred*
NUPACK* (Zadeh et al., 2011)
NanoFolder** (Bindewald et al., 2016)
MultiRNAFold** (Andronescu et al., 2005)
* RCPred, NUPACK: best among the first 10 solutions
** only one solution is returned
(Mauger et al., 2015)
Constraint  Type         Positions                          Present in the reference structure
a           Base pair    RNA 1, base 9 with RNA 2, base 1
b           Base pair    RNA 3, base 8 with RNA 4, base 1
c           Base pair    RNA 2, base 8 with RNA 3, base 1   ✗
d           Single base  RNA 2, base 8                      ✗
RNA complex prediction as a constrained maximum weight clique problem. BMC bioinformatics, 20(3), 128.)
Eukaryotic ribosome (PDB): 28S, 5S, 5.8S and 18S rRNAs and proteins
Andronescu, M., Bereg, V., Hoos, H. H., and Condon, A. (2008). RNA STRAND: the RNA secondary structure and statistical analysis database. BMC bioinformatics, 9(1):340.
Andronescu, M., Zhang, Z. C., and Condon, A. (2005). Secondary structure prediction of interacting RNA molecules. Journal of molecular biology, 345(5):987–1001.
Benlic, U. and Hao, J.-K. (2013). Breakout local search for the quadratic assignment problem. Applied Mathematics and Computation, 219(9):4800–4815.
Bertram, K., Agafonov, D. E., Dybkov, O., Haselbach, D., Leelaram, M. N., Will, C. L., Urlaub, H., Kastner, B., Lührmann, R., and Stark, H. (2017). Cryo-EM structure of a pre-catalytic human spliceosome primed for activation. Cell, 170(4):701–713.
Bindewald, E., Afonin, K. A., Viard, M., Zakrevsky, P., Kim, T., and Shapiro, B. A. (2016). Multistrand structure prediction of nucleic acid assemblies and design of RNA switches. Nano letters, 16(3):1726–1735.
Bindewald, E., Wendeler, M., Legiewicz, M., Bona, M. K., Wang, Y., Pritt, M. J., Le Grice, S. F., and Shapiro, B. A. (2011). Correlating SHAPE signatures with three-dimensional RNA [...]
Carlisle, M. C. and Lloyd, E. L. (1995). On the k-coloring of [...]
Deigan, K. E., Li, T. W., Mathews, D. H., and Weeks, K. M. (2009). Accurate SHAPE-directed RNA structure prediction. Federation of American Societies for Experimental Biology, 23(1).
Dibrov, S. M., McLean, J., Parsons, J., and Hermann, T. (2011). Self-assembling RNA square. Proceedings of the National Academy of Sciences, 108(16):6405–6408.
Janssen, S. and Giegerich, R. (2014). The RNA shapes studio. Bioinformatics, 31(3):423–425.
Legendre, A., Angel, E., and Tahi, F. (2018). Bi-objective integer programming for RNA secondary structure prediction with [...]
Lorenz, R., Bernhart, S. H., Zu Siederdissen, C. H., Tafer, H., Flamm, C., Stadler, P. F., and Hofacker, I. L. (2011). ViennaRNA package 2.0. Algorithms for Molecular Biology, 6(1):26.
Mauger, D. M., Golden, M., Yamane, D., Williford, S., Lemon, [...] conserved architecture of hepatitis C virus RNA genomes. Proceedings of the National Academy of Sciences, 112(12):3692–3697.
Seeliger, J. C., Topp, S., Sogi, K. M., Previti, M. L., Gallivan, [...] gene expression system for mycobacteria. PloS one, 7(1):e29266.
Tang, Y., Bouvier, E., Kwok, C. K., Ding, Y., Nekrutenko, A., Bevilacqua, P. C., and Assmann, S. M. (2015). StructureFold: genome-wide RNA secondary structure mapping and reconstruction in vivo. Bioinformatics, 31(16):2668–2675.
Taufer, M., Licon, A., Araiza, R., Mireles, D., Van Batenburg, F., Gultyaev, A. P., and Leung, M.-Y. (2008). Pseudobase++: an extension of pseudobase for easy searching, formatting and visualization of pseudoknots. Nucleic acids research, 37(suppl_1):D127–D135.
Turner, D. H. and Mathews, D. H. (2009). NNDB: the nearest neighbor parameter database for predicting stability of nucleic acid secondary structure. Nucleic acids research, 38(suppl_1):D280–D282.
Wan, Y., Qu, K., Zhang, Q. C., Flynn, R. A., Manor, O., Ouyang, Z., Zhang, J., Spitale, R. C., Snyder, M. P., Segal, E., et al. (2014). Landscape and variation of RNA secondary structure across the human transcriptome. Nature, 505(7485):706.
Wu, Q. and Hao, J.-K. (2015). A review on algorithms for maximum clique problems. European Journal of Operational Research, 242(3):693–709.
Zadeh, J. N., Steenberg, C. D., Bois, J. S., Wolfe, B. R., Pierce, [...] NUPACK: analysis and design of nucleic acid systems. Journal [...]
$$CI_{structure} = \sum_{i} CI_i,$$
with $CI_i$ the confidence index of a constraint $i$ respected by the structure.
Matthews correlation coefficient (Bindewald et al., 2011):
$$\mathrm{MCC} = \frac{TP \times TN - FP \times FN}{\sqrt{(TP+FP)(TP+FN)(TN+FP)(TN+FN)}},$$
where
TP: true positives (nucleotides with normalized reactivity below a threshold, fixed at 0.5, and involved in a Watson-Crick or G-U base pair),
TN: true negatives (reactivity > 0.5 and not involved in a base pair),
FP: false positives (reactivity < 0.5 and not involved in a base pair),
FN: false negatives (reactivity > 0.5 and involved in a base pair).
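The MCC between a predicted structure and probing data can be illustrated with a short sketch. The function name `mcc_from_reactivity`, the boolean `paired` list, and the choices to count a reactivity exactly equal to the threshold as "high" and to return 0.0 when the denominator vanishes are assumptions made for this example; they are not taken from the slides.

```python
import math

def mcc_from_reactivity(reactivities, paired, threshold=0.5):
    """MCC between normalized reactivities and the pairing status of a structure.

    reactivities: normalized reactivity per nucleotide (values in [0, 1]).
    paired: True where the nucleotide is in a base pair of the structure.
    """
    tp = tn = fp = fn = 0
    for r, is_paired in zip(reactivities, paired):
        low = r < threshold  # low reactivity suggests a paired nucleotide
        if low and is_paired:
            tp += 1          # low reactivity, paired
        elif not low and not is_paired:
            tn += 1          # high reactivity, unpaired
        elif low and not is_paired:
            fp += 1          # low reactivity, unpaired
        else:
            fn += 1          # high reactivity, paired
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Assumed convention: an undefined MCC (zero denominator) is reported as 0.0.
    return (tp * tn - fp * fn) / denom if denom else 0.0
```

A structure whose paired positions all have low reactivity and whose unpaired positions all have high reactivity gets an MCC of 1; the fully opposite case gets -1.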
$$R(i) = \frac{r(i) - \min(r)}{\max(r) - \min(r)},$$
where $r(i)$ is the raw reactivity of nucleotide $i$ and $r$ the vector of all the reactivities of the RNA.
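As a minimal sketch of this min-max normalization, assuming the raw reactivities are given as a plain list of numbers (the function name `min_max_normalize` and the handling of a constant vector are illustrative choices, not part of the slides):

```python
def min_max_normalize(raw):
    """Rescale raw reactivities to [0, 1] with R(i) = (r(i) - min(r)) / (max(r) - min(r))."""
    lo, hi = min(raw), max(raw)
    if hi == lo:
        # Assumed convention: a constant vector carries no signal, so map it to zeros.
        return [0.0 for _ in raw]
    return [(r - lo) / (hi - lo) for r in raw]
```

For example, `min_max_normalize([2.0, 4.0, 6.0])` maps the smallest value to 0, the largest to 1, and the middle value to 0.5.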
transcript and its length:
$$R_{exp}(i) = \frac{\ln(r(i) + 1)}{\sum_{j=0}^{l} \ln(r(j) + 1) / l},$$
where $r(i)$ is the raw reactivity of nucleotide $i$ and $l$ the length of the transcript. The abundance of the transcript is the number of reads
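The log-based normalization can be sketched as follows. This is only an illustration: the function name `log_mean_normalize` is hypothetical, and the sketch assumes that the sum runs over exactly the positions of the input list, so that the denominator is the mean of the log-transformed reactivities.

```python
import math

def log_mean_normalize(raw):
    """Divide ln(r(i) + 1) by the mean of ln(r(j) + 1) over the transcript."""
    logs = [math.log(r + 1.0) for r in raw]
    mean_log = sum(logs) / len(logs)
    if mean_log == 0.0:
        # Assumed convention: a transcript with no signal yields all zeros.
        return [0.0 for _ in raw]
    return [x / mean_log for x in logs]
```

With this choice the normalized values of a transcript average to 1, which makes transcripts of different abundance and length comparable.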
$R_{DMS}(i) = R_{DMS} - R_{control}$, $R_{PARS}(i) = R_{S1} - R_{V1}$.
Data: interval graph with k vertices.
Result: optimal coloring of the graph.
We assume that the k vertices are sorted by the left endpoints of their intervals, in increasing order. If two intervals are identical, they are
Let the k colors be c1, ..., ck;
while the graph is not entirely colored do
    give the next uncolored vertex the smallest color not already used in its neighborhood;
end
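The greedy coloring above can be sketched in a few lines. Several details are assumptions of this example rather than facts from the slides: intervals are closed `(left, right)` pairs, two intervals sharing an endpoint count as neighbors, ties on the left endpoint are broken by the right endpoint, and colors are numbered from 0. The neighborhood is found by a simple quadratic scan to keep the sketch short.

```python
def color_intervals(intervals):
    """Greedy coloring of an interval graph.

    intervals: list of (left, right) closed intervals, one per vertex.
    Returns a list giving the color index (0, 1, 2, ...) of each vertex.
    """
    n = len(intervals)
    # Visit vertices by increasing left endpoint (tie-break on right endpoint: assumption).
    order = sorted(range(n), key=lambda v: intervals[v])
    colors = [None] * n
    for v in order:
        left, right = intervals[v]
        # Colors already taken by colored neighbors, i.e. overlapping intervals.
        used = {
            colors[u]
            for u in range(n)
            if colors[u] is not None
            and intervals[u][0] <= right
            and left <= intervals[u][1]
        }
        c = 0
        while c in used:  # smallest color absent from the neighborhood
            c += 1
        colors[v] = c
    return colors
```

On the intervals `[(1, 4), (2, 5), (6, 8), (3, 7)]`, the first, second and fourth intervals pairwise overlap and therefore need three colors, while `(6, 8)` can reuse the color of `(1, 4)`.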
$$\mathrm{MCC} = \frac{TP \times TN - FP \times FN}{\sqrt{(TP+FP)(TP+FN)(TN+FP)(TN+FN)}}$$
RCPred*
NUPACK* (Zadeh et al., 2011)
NanoFolder+ (Bindewald et al., 2016)
MultiRNAFold+ (Andronescu et al., 2005)
* RCPred, NUPACK: MFE
+ only one solution is returned