Secondary structure prediction of RNA complexes Audrey Legendre , - - PowerPoint PPT Presentation



slide-1
SLIDE 1

Colloque MASIM@Journées BIM 2019

Secondary structure prediction of RNA complexes

Audrey Legendre, Eric Angel, Fariza Tahi 05/11/2019


slide-2
SLIDE 2

CONTEXT


slide-4
SLIDE 4

RNA structures and interactions

Primary structure

GCGGAUUUAGCUCAGUU GGGAGAGCGCCAGACUG AAGAPCUGGAGGUCCUG UGUPCGAUCCACAGAAU UCGCACCA

Secondary structure Tertiary structure

(PDB)

RNA stability

Riboswitch (Seeliger et al., 2012)

→ Need to predict several structures

RNA-RNA interaction

→ RNA complexes


slide-6
SLIDE 6

Motifs of RNA structures and interactions

[Figure panels: single RNA representation, internal pseudoknot, motif-free interaction, external pseudoknot.]


slide-7
SLIDE 7

Pseudoknot motifs

Pseudoknot types (Taufer et al., 2008)

Pseudoknot depth

Example: a pseudoknot of depth 3 can be decomposed into 3 subsets without pseudoknot.


slide-8
SLIDE 8

Experimental data on RNA structure

  • Structural data: SHAPE, DMS, PARS
  • User knowledge: base pair, single base, motifs, ...

(Deigan et al., 2009)


slide-9
SLIDE 9

Our goal

Predict RNA complex secondary structure

  • with different motifs (pseudoknots),
  • taking into account experimental data,
  • and returning several solutions.

State of the art

Tools compared on three criteria: pseudoknots, several solutions, experimental data.

  • NanoFolder (Bindewald et al., 2016): ✗
  • NUPACK (Zadeh et al., 2011)
  • MultiRNAFold (Andronescu et al., 2005): ✗ ✗ ✗


slide-10
SLIDE 10

METHODS

RCPred

(RNA Complex Prediction)

C-RCPred

(Constrained RNA Complex Prediction)


slide-11
SLIDE 11

Principle of our methods

  • RNA sequences
  • RNA secondary structures: BiokoP (Legendre et al., 2018), pKiss (Janssen and Giegerich, 2014), RNAsubopt (Lorenz et al., 2011)
  • RNA-RNA interactions: RNAsubopt (Lorenz et al., 2011)
  • RCPred, mono-objective: MFE
  • RNA complexes: visualisation


slide-13
SLIDE 13

Principle of our methods

  • RNA sequences
  • RNA secondary structures: BiokoP (Legendre et al., 2018), pKiss (Janssen and Giegerich, 2014), RNAsubopt (Lorenz et al., 2011)
  • RNA-RNA interactions: RNAsubopt (Lorenz et al., 2011)
  • Additional inputs: user constraints, structural data
  • C-RCPred, multi-objective: MFE, user constraints, structural data
  • RNA complexes: visualisation, clustering, interactivity


slide-14
SLIDE 14

Algorithmic approach

Finding RNA complexes

  • RNA complex: set of secondary structures and interactions that are all compatible with each other → finding a clique in a graph

Undirected weighted graph

  • s vertices: secondary structures
  • i vertices: interactions
  • Edges: the two linked vertices are compatible

Criteria: free energy, user constraints, structural data. RCPred is mono-objective (free energy).

[Figure: example graph with vertex weights s1: 5.6, s2: 3.1, s3: 1.6, i1: 4.5, i2: 5.6, i3: 3.5, i4: 4.7, i5: 6.1; additional label: 14.8.]
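The clique formulation above can be illustrated with a minimal Python sketch. The vertex weights are the ones readable on the slide; the compatibility edges are hypothetical, since the figure's edges did not survive extraction, and the helper names `is_clique` and `clique_weight` are illustrative only.

```python
from itertools import combinations

# Vertex weights as read from the slide's example graph.
weights = {"s1": 5.6, "s2": 3.1, "s3": 1.6, "i1": 4.5,
           "i2": 5.6, "i3": 3.5, "i4": 4.7, "i5": 6.1}

# Hypothetical compatibility edges (assumption: not taken from the slide).
edges = {frozenset(e) for e in [
    ("s1", "s2"), ("s1", "s3"), ("s2", "s3"),
    ("s1", "i1"), ("s2", "i1"), ("s3", "i1"),
    ("s1", "i2"), ("i2", "i3"),
]}

def is_clique(vertices, edges):
    """True if every pair of vertices is linked by a compatibility edge."""
    return all(frozenset(pair) in edges for pair in combinations(vertices, 2))

def clique_weight(vertices, weights):
    """Sum of the vertex weights of a candidate complex."""
    return sum(weights[v] for v in vertices)

candidate = ["s1", "s2", "s3", "i1"]
print(is_clique(candidate, edges), round(clique_weight(candidate, weights), 1))
```

A candidate complex is valid only if it is a clique, that is, if all its structures and interactions are pairwise compatible; its score is then the sum of its vertex weights.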


slide-17
SLIDE 17

Algorithmic approach

Finding RNA complexes: clique in an undirected weighted graph (s vertices: secondary structures; i vertices: interactions; edges: the two linked vertices are compatible).

C-RCPred, multi-objective: free energy, user constraints, structural data.

[Figure: example graph with up to three weights per vertex, listed as extracted:
s1: 5.6, 60, 0.65
s2: 3.1, 80, 0.75
s3: 1.6, 75, 0.7
i1: 4.5, 80, 0.9
i2: 5.6, 0.8
i3: 3.5, 0.5, 0.75
i4: 4.7, 60, 0.6
i5: 6.1, 70, 0.7
Additional labels: 14.8, 235, 2.95.]


slide-18
SLIDE 18

Algorithmic approach

Clique problem: NP-hard → use of an approximate algorithm (Wu and Hao, 2015)

Heuristic: Breakout local search (Benlic and Hao, 2013)

  • Modify the solution at each iteration.
  • Moves: add, remove or replace a vertex, or reinitialize the clique.
  • Choose the best move in the neighborhood.

[Figure: example graph with vertex weights s1: 5.6, s2: 3.1, s3: 1.6, i1: 4.5, i2: 5.6, i3: 3.5, i4: 4.7, i5: 6.1.]
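The move-based search described above can be sketched in Python as follows. This is a deliberately simplified local search, not the Breakout local search of Benlic and Hao: it only uses add and replace moves, and its perturbation is a plain random restart. The function name `local_search`, the toy graph and the assumption that larger weights are better are all illustrative.

```python
import random

def local_search(weights, adj, iterations=200, seed=0):
    """Simplified local search for a heavy clique (assumes positive weights)."""
    rng = random.Random(seed)
    vertices = sorted(weights)
    current, best, best_w = set(), set(), 0.0
    for _ in range(iterations):
        moves = []  # list of (gain, resulting clique)
        for v in vertices:
            if v in current:
                continue
            conflicts = [u for u in current if u not in adj[v]]
            if not conflicts:
                # Add move: v is compatible with the whole clique.
                moves.append((weights[v], current | {v}))
            elif len(conflicts) == 1:
                # Replace move: swap the single conflicting vertex for v.
                u = conflicts[0]
                moves.append((weights[v] - weights[u], (current - {u}) | {v}))
        improving = [m for m in moves if m[0] > 0]
        if improving:
            # Choose the best move in the neighborhood.
            current = max(improving, key=lambda m: m[0])[1]
        else:
            # Local optimum: reinitialize the clique from a random vertex.
            current = {rng.choice(vertices)}
        w = sum(weights[v] for v in current)
        if w > best_w:
            best, best_w = set(current), w
    return best, best_w

# Hypothetical toy graph: a triangle a-b-c plus an isolated vertex d.
weights = {"a": 5.0, "b": 4.0, "c": 4.0, "d": 3.0}
adj = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b"}, "d": set()}
best, best_w = local_search(weights, adj)
print(sorted(best), best_w)
```

Because every move keeps `current` pairwise compatible, the best solution recorded is always a valid clique; only its optimality is not guaranteed.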


slide-19
SLIDE 19

Algorithmic approach

Adaptation of the Breakout local search heuristic

  • RCPred:
      • Save sub-optimal solutions.
  • C-RCPred:
      • Generate and save the Pareto set for three objectives,
      • Modification of the choice of the moves,
      • Control of the pseudoknot depth.
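As an illustration of what keeping a Pareto set for three objectives means, here is a minimal Python sketch. It assumes that each solution is scored by a triple in which higher is better on every objective; the function names and the example scores are hypothetical and do not come from the slides.

```python
def dominates(a, b):
    """a dominates b: at least as good on every objective, strictly better on one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def pareto_set(solutions):
    """Keep only the solutions that no other solution dominates."""
    return [s for s in solutions if not any(dominates(t, s) for t in solutions)]

# Hypothetical (free energy score, user constraint score, structural data score).
scores = [(12.0, 200, 0.6), (10.0, 300, 0.5), (9.0, 150, 0.4), (12.0, 200, 0.7)]
print(pareto_set(scores))
```

A solution stays in the set as long as no other solution is at least as good on all three objectives and strictly better on one, so several incomparable complexes can be returned together.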

slide-20
SLIDE 20

RESULTS


slide-21
SLIDE 21

RCPred results

Data and protocol

  • RNA sequences of 90 RNA complexes from RNA STRAND (Andronescu et al., 2008)
  • RNA secondary structures: 30 solutions of BiokoP (Legendre et al., 2018), 30 solutions of pKiss (Janssen and Giegerich, 2014), 30 solutions of RNAsubopt (Lorenz et al., 2011)
  • RNA-RNA interactions: 30 solutions of RNAsubopt (Lorenz et al., 2011)
  • RCPred: first 10 RNA complexes


slide-22
SLIDE 22

RCPred results

Importance of returning sub-optimal solutions

→ The best solution is not always the MFE structure.


slide-23
SLIDE 23

RCPred results

Comparison to the state of the art

Tools compared: RCPred*, NUPACK* (Zadeh et al., 2011), NanoFolder** (Bindewald et al., 2016), MultiRNAFold** (Andronescu et al., 2005)

* RCPred, NUPACK: best among the first 10 solutions
** one solution is returned

→ RCPred performs better than the tools of the literature.


slide-24
SLIDE 24

C-RCPred results

Data and protocol for the evaluation of the structural data criterion

  • RNA sequences of the square complex (Dibrov et al., 2011)
  • RNA secondary structures: 30 solutions of BiokoP (Legendre et al., 2018), 30 solutions of pKiss (Janssen and Giegerich, 2014), 30 solutions of RNAsubopt (Lorenz et al., 2011)
  • RNA-RNA interactions: 30 solutions of RNAsubopt (Lorenz et al., 2011)
  • Structural data: SHAPE data (Mauger et al., 2015)
  • C-RCPred: first 10 RNA complexes


slide-26
SLIDE 26

C-RCPred results

Evaluation on square complex

→ Poor performance

Observation

→ Low quality of predicted secondary structures and interactions


slide-27
SLIDE 27

C-RCPred results

Data and protocol for the evaluation of the user constraint criterion

  • RNA sequences of the PDB_01165 complex from RNA STRAND (Andronescu et al., 2008)
  • RNA secondary structures: 30 solutions of BiokoP (Legendre et al., 2018), 30 solutions of pKiss (Janssen and Giegerich, 2014), 30 solutions of RNAsubopt (Lorenz et al., 2011)
  • RNA-RNA interactions: 30 solutions of RNAsubopt (Lorenz et al., 2011)
  • User constraints
  • C-RCPred: first 10 RNA complexes


slide-28
SLIDE 28

C-RCPred results

Constraint | Type | Positions | Present in the reference structure
a | Base pair | RNA 1, base 9 with RNA 2, base 1 | ✓
b | Base pair | RNA 3, base 8 with RNA 4, base 1 | ✓
c | Base pair | RNA 2, base 8 with RNA 3, base 1 | ✗
d | Single base | RNA 2, base 8 | ✗


slide-29
SLIDE 29

Conclusion

  • RCPred and C-RCPred (Legendre, A., Angel, E., Tahi, F. (2019). RCPred: RNA complex prediction as a constrained maximum weight clique problem. BMC bioinformatics, 20(3), 128.)
  • Take advantage of the numerous tools for secondary structure and interaction prediction
  • RNA complex as a clique

Perspectives

  • Find more structural data for RNA complexes
  • New criterion: Maximum Expected Accuracy
  • RNA-protein complexes

Eukaryotic ribosome (PDB): 28S, 5S, 5.8S and 18S rRNAs and proteins

RCPred is available on the EvryRNA platform (https://EvryRNA.ibisc.univ-evry.fr/)


slide-30
SLIDE 30

Thank you for your attention !

slide-31
SLIDE 31

References

Andronescu, M., Bereg, V., Hoos, H. H., and Condon, A. (2008). RNA STRAND: the RNA secondary structure and statistical analysis database. BMC bioinformatics, 9(1):340.

Andronescu, M., Zhang, Z. C., and Condon, A. (2005). Secondary structure prediction of interacting RNA molecules. Journal of molecular biology, 345(5):987–1001.

Benlic, U. and Hao, J.-K. (2013). Breakout local search for the quadratic assignment problem. Applied Mathematics and Computation, 219(9):4800–4815.

Bertram, K., Agafonov, D. E., Dybkov, O., Haselbach, D., Leelaram, M. N., Will, C. L., Urlaub, H., Kastner, B., Lührmann, R., and Stark, H. (2017). Cryo-EM structure of a pre-catalytic human spliceosome primed for activation. Cell, 170(4):701–713.

Bindewald, E., Afonin, K. A., Viard, M., Zakrevsky, P., Kim, T., and Shapiro, B. A. (2016). Multistrand structure prediction of nucleic acid assemblies and design of RNA switches. Nano letters, 16(3):1726–1735.

Bindewald, E., Wendeler, M., Legiewicz, M., Bona, M. K., Wang, Y., Pritt, M. J., Le Grice, S. F., and Shapiro, B. A. (2011). Correlating SHAPE signatures with three-dimensional RNA structures. RNA.

Carlisle, M. C. and Lloyd, E. L. (1995). On the k-coloring of intervals. Discrete Applied Mathematics, 59(3):225–235.

Deigan, K. E., Li, T. W., Mathews, D. H., and Weeks, K. M. (2009). Accurate SHAPE-directed RNA structure prediction. Federation of American Societies for Experimental Biology, 23(1).

Dibrov, S. M., McLean, J., Parsons, J., and Hermann, T. (2011). Self-assembling RNA square. Proceedings of the National Academy of Sciences, 108(16):6405–6408.

Janssen, S. and Giegerich, R. (2014). The RNA shapes studio. Bioinformatics, 31(3):423–425.

Legendre, A., Angel, E., and Tahi, F. (2018). Bi-objective integer programming for RNA secondary structure prediction with pseudoknots. BMC bioinformatics, 19(1):13.

Lorenz, R., Bernhart, S. H., Zu Siederdissen, C. H., Tafer, H., Flamm, C., Stadler, P. F., and Hofacker, I. L. (2011). ViennaRNA package 2.0. Algorithms for Molecular Biology, 6(1):26.

Mauger, D. M., Golden, M., Yamane, D., Williford, S., Lemon, S. M., Martin, D. P., and Weeks, K. M. (2015). Functionally conserved architecture of hepatitis C virus RNA genomes. Proceedings of the National Academy of Sciences, 112(12):3692–3697.

Seeliger, J. C., Topp, S., Sogi, K. M., Previti, M. L., Gallivan, J. P., and Bertozzi, C. R. (2012). A riboswitch-based inducible gene expression system for mycobacteria. PloS one, 7(1):e29266.

Tang, Y., Bouvier, E., Kwok, C. K., Ding, Y., Nekrutenko, A., Bevilacqua, P. C., and Assmann, S. M. (2015). StructureFold: genome-wide RNA secondary structure mapping and reconstruction in vivo. Bioinformatics, 31(16):2668–2675.

Taufer, M., Licon, A., Araiza, R., Mireles, D., Van Batenburg, F., Gultyaev, A. P., and Leung, M.-Y. (2008). Pseudobase++: an extension of pseudobase for easy searching, formatting and visualization of pseudoknots. Nucleic acids research, 37(suppl_1):D127–D135.

Turner, D. H. and Mathews, D. H. (2009). NNDB: the nearest neighbor parameter database for predicting stability of nucleic acid secondary structure. Nucleic acids research, 38(suppl_1):D280–D282.

Wan, Y., Qu, K., Zhang, Q. C., Flynn, R. A., Manor, O., Ouyang, Z., Zhang, J., Spitale, R. C., Snyder, M. P., Segal, E., et al. (2014). Landscape and variation of RNA secondary structure across the human transcriptome. Nature, 505(7485):706.

Wu, Q. and Hao, J.-K. (2015). A review on algorithms for maximum clique problems. European Journal of Operational Research, 242(3):693–709.

Zadeh, J. N., Steenberg, C. D., Bois, J. S., Wolfe, B. R., Pierce, M. B., Khan, A. R., Dirks, R. M., and Pierce, N. A. (2011). NUPACK: analysis and design of nucleic acid systems. Journal of computational chemistry, 32(1):170–173.


slide-32
SLIDE 32

Compatibility threshold

$$\text{Maximum number of conflicts} = \left\lfloor \frac{100 - \text{threshold}}{100} \times T \right\rfloor,$$

with $T$ the number of nucleotides involved in common base pairs of the two structures.

Example (figure): two interactions between RNA A and RNA B, with $T = 12$.

Threshold | 80 | 90 | 100
Allowed number of conflicts | 2 | 1 | 0
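The threshold rule can be checked numerically with a short Python sketch. Rounding down to an integer is an assumption made here because it reproduces the allowed numbers of conflicts shown for T = 12; the function name `max_conflicts` is illustrative.

```python
def max_conflicts(threshold, t):
    """Maximum number of conflicts allowed between two structures.

    threshold: compatibility threshold in percent (0 to 100).
    t: number of nucleotides involved in common base pairs.
    Assumption: the result is rounded down; integer arithmetic avoids
    floating-point rounding issues.
    """
    return (100 - threshold) * t // 100

for threshold in (80, 90, 100):
    print(threshold, max_conflicts(threshold, 12))
```

With a threshold of 100, no conflict is tolerated, so two structures are compatible only if their common base pairs agree entirely.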

slide-33
SLIDE 33

Computing the weights of the vertices

Free energy

  • Turner model (Turner and Mathews, 2009) (ViennaRNA), without pseudoknots,
  • or NUPACK model (pseudoknots for a single RNA but not for interactions).

User constraints

$$CI_{\text{structure}} = \sum_i CI_i,$$

with $CI_i$ the confidence index of a constraint $i$ respected by the structure.

Structural data

Matthews correlation coefficient (Bindewald et al., 2011):

$$\mathrm{MCC} = \frac{TP \times TN - FP \times FN}{\sqrt{(TP + FP) \times (TP + FN) \times (TN + FP) \times (TN + FN)}},$$

where

  • TP: true positives (nucleotides with normalized reactivity below a threshold, fixed at 0.5, and involved in a Watson-Crick or G-U base pair),
  • TN: true negatives (reactivity > 0.5 and not involved in a base pair),
  • FP: false positives (reactivity < 0.5 and not involved in a base pair),
  • FN: false negatives (reactivity > 0.5 and involved in a base pair).
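The structural data weight can be sketched in Python as below. Two points are assumptions rather than statements from the slides: false negatives are taken to be nucleotides with high reactivity that are nevertheless paired (so that the four classes are disjoint), and a reactivity exactly equal to the threshold is treated as high. The function names are illustrative, and the MCC is returned as 0 when its denominator is zero.

```python
import math

def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; 0.0 when the denominator is zero."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

def structural_score(reactivities, paired, threshold=0.5):
    """Agreement between normalized reactivities and a predicted structure.

    reactivities: normalized reactivity per nucleotide.
    paired: booleans, True if the nucleotide is in a base pair.
    """
    tp = tn = fp = fn = 0
    for r, is_paired in zip(reactivities, paired):
        low = r < threshold
        if low and is_paired:
            tp += 1   # low reactivity, paired
        elif not low and not is_paired:
            tn += 1   # high reactivity, unpaired
        elif low and not is_paired:
            fp += 1   # low reactivity, unpaired
        else:
            fn += 1   # high reactivity, paired (assumed definition)
    return mcc(tp, tn, fp, fn)

print(structural_score([0.1, 0.2, 0.9, 0.8], [True, True, False, False]))
```

A structure whose paired positions all have low reactivity and whose unpaired positions all have high reactivity gets the maximum score of 1.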

slide-34
SLIDE 34

Normalization of structural data

  • SHAPE: linear normalization to obtain values between 0 and 1:

$$R(i) = \frac{r(i) - \min(r)}{\max(r) - \min(r)},$$

where $r(i)$ is the raw reactivity of nucleotide $i$ and $r$ the vector of all the reactivities of the RNA.

  • DMS and PARS: two scores per nucleotide.
  • Normalization of the two scores as a function of the abundance of the transcript and its length:

$$R_{\mathrm{exp}}(i) = \frac{\ln(r(i) + 1)}{\sum_{j=0}^{l} \ln(r(j) + 1) / l},$$

where $r(i)$ is the raw reactivity of nucleotide $i$ and $l$ the length of the transcript. The abundance of the transcript is the number of reads obtained in the experiment, i.e. the sum of all scores.

  • The difference of the two gives the final reactivity (Tang et al., 2015):

$$R_{\mathrm{DMS}}(i) = R_{\mathrm{DMS}} - R_{\mathrm{control}}, \qquad R_{\mathrm{PARS}}(i) = R_{S1} - R_{V1}.$$
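The normalizations above can be sketched in Python as follows. The function names are illustrative, the mean of the log-transformed scores is taken over the number of values provided, and the sketch assumes that the reactivities are not all identical (SHAPE) and that at least one count is non-zero (DMS, PARS).

```python
import math

def normalize_shape(r):
    """Linear (min-max) normalization of SHAPE reactivities to [0, 1]."""
    lo, hi = min(r), max(r)
    return [(x - lo) / (hi - lo) for x in r]  # assumes hi > lo

def normalize_counts(r):
    """Log-transform each score and divide by the mean log score."""
    logs = [math.log(x + 1) for x in r]
    mean_log = sum(logs) / len(logs)  # assumes at least one non-zero count
    return [v / mean_log for v in logs]

def final_reactivity(first, second):
    """Difference of two normalized score vectors.

    For DMS: treated minus control; for PARS: S1 minus V1.
    """
    a, b = normalize_counts(first), normalize_counts(second)
    return [x - y for x, y in zip(a, b)]

print(normalize_shape([2.0, 4.0, 6.0]))
print(final_reactivity([5, 0, 12], [1, 3, 2]))
```

Dividing by the mean log score puts both score vectors on a comparable scale before they are subtracted, whatever the sequencing depth of each experiment.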

slide-35
SLIDE 35

Optimal coloration of an interval graph (Carlisle and Lloyd, 1995)

Data: interval graph with k vertices.
Result: optimal coloration of the graph.

We assume that the k vertices are sorted by the left endpoints of their intervals, in increasing order. If two intervals are identical, they are ordered arbitrarily.

Let the k colors be c1, ..., ck;
while the graph is not entirely colored do
    give the next uncolored vertex the smallest color not already used in its neighborhood;
end
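The greedy procedure above translates directly into Python. In this sketch, intervals are closed `(left, right)` tuples, two intervals are neighbors when they overlap, and colors are numbered from 0; these conventions and the name `color_intervals` are assumptions made for illustration.

```python
def color_intervals(intervals):
    """Greedy coloring of an interval graph.

    intervals: list of (left, right) tuples, treated as closed intervals.
    Returns one color index per interval, in the input order.
    """
    # Process the intervals by increasing left endpoint.
    order = sorted(range(len(intervals)), key=lambda k: intervals[k])
    colors = {}
    for k in order:
        lo, hi = intervals[k]
        # Colors already used by overlapping, already colored intervals.
        used = {colors[j] for j in colors
                if intervals[j][0] <= hi and lo <= intervals[j][1]}
        c = 0
        while c in used:  # smallest color not used in the neighborhood
            c += 1
        colors[k] = c
    return [colors[k] for k in range(len(intervals))]

print(color_intervals([(1, 5), (2, 6), (7, 9), (3, 8)]))
```

On interval graphs, processing the vertices by increasing left endpoint makes this greedy choice use the minimum possible number of colors.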

slide-36
SLIDE 36

Statistics

  • TP, True Positive: number of base pairs correctly predicted.
  • FP, False Positive: number of base pairs in the prediction but not in the reference structure.
  • FN, False Negative: number of base pairs not predicted.
  • TN, True Negative: $\frac{n(n-1)}{2} - TP - FN - FP$.

Matthews Correlation Coefficient (MCC)

$$\mathrm{MCC} = \frac{TP \times TN - FP \times FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}}$$
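These statistics can be computed from two sets of base pairs, as in the Python sketch below. It assumes that `n` is the number of nucleotides, so that n(n−1)/2 counts all possible pairs, and that each base pair is given as a tuple `(i, j)` with `i < j`; the function names and the example structures are hypothetical.

```python
import math

def base_pair_stats(predicted, reference, n):
    """Return (TP, FP, FN, TN) for predicted versus reference base pairs."""
    pred, ref = set(predicted), set(reference)
    tp = len(pred & ref)   # pairs correctly predicted
    fp = len(pred - ref)   # predicted pairs absent from the reference
    fn = len(ref - pred)   # reference pairs not predicted
    tn = n * (n - 1) // 2 - tp - fn - fp
    return tp, fp, fn, tn

def mcc(tp, fp, fn, tn):
    """Matthews correlation coefficient; 0.0 when the denominator is zero."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

# Hypothetical example on a sequence of 10 nucleotides.
reference = [(1, 10), (2, 9), (3, 8)]
predicted = [(1, 10), (2, 9), (4, 7)]
stats = base_pair_stats(predicted, reference, 10)
print(stats, mcc(*stats))
```

Because TN is dominated by the large number of possible pairs that are neither predicted nor present, the MCC is driven mainly by the balance between TP, FP and FN.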

slide-37
SLIDE 37

RCPred results

Comparison to the state of the art

Tools compared: RCPred*, NUPACK* (Zadeh et al., 2011), NanoFolder+ (Bindewald et al., 2016), MultiRNAFold+ (Andronescu et al., 2005)

* RCPred, NUPACK: MFE solution
+ one solution is returned

→ RCPred performs better than the tools of the literature.

slide-38
SLIDE 38

C-RCPred results

Evaluation on B complex

→ Poor performance

Explanation

→ Low quality of predicted secondary structures and interactions

slide-39
SLIDE 39

C-RCPred results

Evaluation on C complex

→ Poor performance

Explanation

→ Low quality of predicted secondary structures and interactions