Mining molecular flexibility: novel tools, novel insights, F. Cazals

Mining molecular flexibility: novel tools, novel insights
F. Cazals, Inria, Algorithm-Biology-Structure
Joint work with:
◮ (Methods) R. Tetley, Inria, Algorithm-Biology-Structure
◮ (Class II fusion) F. Rey, Institut Pasteur Paris

Outline
◮ Introduction
◮ Multiscale analysis of structurally conserved motifs
◮ Combined RMSD
◮ The Structural Bioinformatics Library
◮ Outlook
◮ Technicalities: multiscale analysis of structurally conserved motifs

Challenge
Dynamics of proteins: specification
⊲ Input: structure(s) of biomolecules + potential energy model
⊲ Output:
◮ Thermodynamics: meta-stable states and observables
◮ Dynamics: Markov state model; requires rare transition events
⊲ Time-scales:
◮ Biological time-scale: > 1 millisecond
◮ Integration time step in molecular dynamics: $\Delta t \sim 10^{-15}$ s
◮ Example: 5.058 ms of simulation time amounts to ∼230 GPU years on NVIDIA GeForce GTX 980 processors
⊲ Ref: Chodera et al., eLife, 2019
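To make the time-scale gap concrete, the figures quoted above can be turned into step counts. This is a minimal back-of-the-envelope sketch, assuming a 1 fs time step and 365-day GPU years; the per-GPU-day throughput is derived here for illustration and is not a figure reported on the slide.

```python
# Back-of-the-envelope cost of brute-force MD, from the figures quoted on the slide.
DT_SECONDS = 1e-15      # integration time step, ~1 fs
BIO_TIMESCALE = 1e-3    # biological time-scale, ~1 ms
SIMULATED = 5.058e-3    # 5.058 ms of simulated time
GPU_YEARS = 230         # ~230 GPU years quoted for that simulation time


def n_steps(duration_s: float, dt_s: float = DT_SECONDS) -> float:
    """Number of integration steps needed to cover duration_s seconds."""
    return duration_s / dt_s


steps_ms = n_steps(BIO_TIMESCALE)
steps_sim = n_steps(SIMULATED)
# Average throughput implied by the quoted cost (assumes 365-day years).
steps_per_gpu_day = steps_sim / (GPU_YEARS * 365)

print(f"{steps_ms:.1e} steps to reach 1 ms")
print(f"{steps_sim:.3e} steps for 5.058 ms")
print(f"~{steps_per_gpu_day:.1e} steps per GPU-day")
```

Reaching a single millisecond already takes on the order of $10^{12}$ integration steps, which is why rare transition events are the bottleneck.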

Combined RMSD: TBEV glycoprotein in two different conformations, pre- and post-fusion
⊲ Classical analysis (statistics from Apurva):
◮ 370 a.a. aligned
◮ lRMSD: 11.1 Å
⊲ Our motifs:
Motif | Alignment size | lRMSD (Å)
Large | 88 | 1.69
Small | 40 | 0.38

Structural motif
⊲ Input: two polypeptide chains $S_A$ and $S_B$.
Definition 1. Given two sets of a.a. $M_A = \{a_{i_1}, \ldots, a_{i_s}\} \subset S_A$ and $M_B = \{b_{i_1}, \ldots, b_{i_s}\} \subset S_B$, and a one-to-one alignment $\{(a_{i_j} \leftrightarrow b_{i_j})\}$ between them, we define the least RMSD ratio as follows:
$$r_{\mathrm{lRMSD}}(M_A, M_B) = \frac{\mathrm{lRMSD}(M_A, M_B)}{\mathrm{lRMSD}(S_A, S_B)}. \quad (1)$$
The sets $M_A$ and $M_B$ are called structural motifs provided that $|M_A| = |M_B| \geq s_0$ and $r_{\mathrm{lRMSD}}(M_A, M_B) \leq r_0$, for appropriate thresholds $s_0$ and $r_0$.
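Definition 1 can be checked numerically once Cα coordinates are available in aligned order. The sketch below is not the authors' implementation: it computes the lRMSD with the standard Kabsch superposition, forms the ratio of Eq. (1), and tests both conditions. The thresholds `s0=10` and `r0=0.5` are hypothetical placeholders, and the test data are synthetic.

```python
import numpy as np


def lrmsd(P, Q) -> float:
    """Least RMSD between two (n, 3) point sets in one-to-one correspondence,
    after optimal rigid superposition (Kabsch algorithm)."""
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    P = P - P.mean(axis=0)                     # remove translations
    Q = Q - Q.mean(axis=0)
    H = P.T @ Q                                # 3x3 covariance matrix
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))     # +1 or -1: forbid reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T    # optimal rotation mapping P onto Q
    diff = P @ R.T - Q
    return float(np.sqrt((diff ** 2).sum() / len(P)))


def lrmsd_ratio(motif_a, motif_b, chain_a, chain_b) -> float:
    """r_lRMSD of Eq. (1); undefined (ZeroDivisionError) if the chains superpose exactly."""
    return lrmsd(motif_a, motif_b) / lrmsd(chain_a, chain_b)


def is_structural_motif(motif_a, motif_b, chain_a, chain_b, s0=10, r0=0.5) -> bool:
    """Both conditions of Definition 1; s0 and r0 are hypothetical default thresholds."""
    if len(motif_a) != len(motif_b) or len(motif_a) < s0:
        return False
    return lrmsd_ratio(motif_a, motif_b, chain_a, chain_b) <= r0


# Synthetic example: chain B moves its first 20 residues rigidly and keeps the rest,
# so the first half is a perfectly conserved block while the whole chain is not.
rng = np.random.default_rng(0)
chain_a = rng.normal(scale=10.0, size=(40, 3))
theta = np.pi / 2
rot_z = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                  [np.sin(theta), np.cos(theta), 0.0],
                  [0.0, 0.0, 1.0]])
chain_b = chain_a.copy()
chain_b[:20] = chain_a[:20] @ rot_z.T + np.array([5.0, 0.0, 0.0])
```

On this toy pair, the first 20 residues pass the motif test because their lRMSD is essentially zero, whereas the global lRMSD is large.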

Key idea: exploiting quasi-isometric deformations to identify almost rigid (isometric) regions in structures
⊲ Quasi-isometric deformation: (selected) distances are (almost) preserved.
[Figure: two configurations with distances $d_1, d_2, d_3$ and $d'_1, d'_2, d'_3$, where $d_1 \sim d'_1$, $d_2 \sim d'_2$, and $d_3 \neq d'_3$]
⊲ Tracking such deformations may be done at two scales:
◮ Global preservation: maximal cliques (an NP-hard problem).
◮ Local preservation: spanning trees connecting atoms whose relative distances are conserved.
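The local-preservation idea can be sketched with a union-find spanning forest: connect two atoms whenever their distance is conserved up to a tolerance, and keep an edge only if it joins two different trees. This is a minimal sketch assuming a fixed tolerance `eps` (a hypothetical parameter). Unlike a clique, a component only guarantees chains of conserved distances, not that every pairwise distance within it is preserved.

```python
import math
from itertools import combinations


def find(parent, i):
    """Root of i, with path halving."""
    while parent[i] != i:
        parent[i] = parent[parent[i]]
        i = parent[i]
    return i


def conserved_forest(coords_a, coords_b, eps):
    """Spanning forest of the graph whose edges are pairs (i, j) with
    |d^A_ij - d^B_ij| <= eps; returns the forest edges and the components."""
    n = len(coords_a)
    parent = list(range(n))
    forest = []
    for i, j in combinations(range(n), 2):
        d_a = math.dist(coords_a[i], coords_a[j])
        d_b = math.dist(coords_b[i], coords_b[j])
        if abs(d_a - d_b) <= eps:
            ri, rj = find(parent, i), find(parent, j)
            if ri != rj:               # edge joins two trees: keep it
                parent[ri] = rj
                forest.append((i, j))
    components = {}
    for i in range(n):
        components.setdefault(find(parent, i), []).append(i)
    return forest, sorted(components.values())


# Atoms 0, 1, 2 keep their mutual distances; atom 3 moves away.
coords_a = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (5, 5, 0)]
coords_b = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (9, 9, 0)]
forest, comps = conserved_forest(coords_a, coords_b, eps=0.1)
print(forest, comps)
```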

Multi-scale rigidity: embodied in the notion of filtration
⊲ Key ideas
◮ Filtration: sequence of nested topological spaces; read: sequence of nested sets of amino acids
◮ Ordering of a.a.: by decreasing rigidity index, so that those involved in rigid blocks come first

Motifs for two structures A and B: a generic approach
⊲ Step 1: use an aligner for the seed alignment and scores
◮ (A and B) Compute a seed alignment
◮ (A, then B) Sort residues by decreasing structural conservation
⊲ Step 2: use a filtration to perform a multiscale analysis
◮ (A, then B) Identify structurally conserved regions
⊲ Step 3: reuse the aligner to bootstrap the alignment
◮ (A and B) Re-compute a structural alignment between pairs of regions
[Figure: workflow.
Step 1, seed alignments and scores: given two structures, compute a pairwise structural alignment, then the distance conservation scores $s_{ij} = |d^A_{ij} - d^B_{ij}|$.
Step 2, filtrations and persistence diagrams: build filtrations from conserved distances (CD) or from the space filling diagram (SFD); for each chain, build the persistence diagram (birth vs death) of the connected components of the filtration.
Step 3: identification of structural motifs.
Step 4, filtering structural motifs: hierarchical representation with Hasse diagrams; statistical assessment of structural motifs.]
⊲ NB: $s$ is the distance variation $|D(t, t')|$ applied to Cα carbons.
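Step 2 can be illustrated on a conserved-distances (CD) filtration. The sketch below reflects one plausible reading, under our own assumptions, and is not necessarily the construction used in [2]: pairs are inserted by increasing $s_{ij}$ as in Kruskal's algorithm, a residue enters the filtration with its first pair, and when two connected components merge, the younger one dies (elder rule). Zero-persistence pairs are dropped.

```python
import math


def cd_persistence(scores):
    """0-dimensional persistence diagram of a conserved-distances filtration.
    scores: dict mapping residue pairs (i, j) to s_ij."""
    parent, birth = {}, {}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]    # path halving
            x = parent[x]
        return x

    pairs = []
    for (i, j), s in sorted(scores.items(), key=lambda kv: kv[1]):
        for v in (i, j):
            if v not in parent:              # residue enters the filtration at value s
                parent[v], birth[v] = v, s
        ri, rj = find(i), find(j)
        if ri == rj:
            continue                         # edge closes a cycle: components unchanged
        if birth[ri] > birth[rj]:            # make ri the older component
            ri, rj = rj, ri
        if s > birth[rj]:                    # elder rule: the younger component dies at s
            pairs.append((birth[rj], s))
        parent[rj] = ri                      # the root keeps the oldest birth
    roots = {find(v) for v in parent}
    pairs += [(birth[r], math.inf) for r in roots]
    return sorted(pairs)


# Scores following the ordering s12 < s34 < s23 < s13 < s14 < s24 (values made up).
scores = {(1, 2): 0.1, (3, 4): 0.2, (2, 3): 0.5, (1, 3): 0.7, (1, 4): 0.9, (2, 4): 1.2}
diagram = cd_persistence(scores)
print(diagram)
```

Here the block {3, 4}, born at 0.2, survives until it merges with {1, 2} at 0.5; the pair (0.2, 0.5) in the diagram is the kind of feature that flags a structurally conserved region.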

Generic method: instantiations
⊲ Main steps:
◮ step 1 ≡ alignment to rigidity scores;
◮ step 2 ≡ rigidity scores to filtrations;
◮ step 3 ≡ filtrations to motifs via local alignments.
⊲ Ingredient 1: an aligner for steps 1 and 3
◮ Options: Kpax, Apurva, (FATCAT)
⊲ Ingredient 2: a filtration encoding based on rigidity scores
◮ Option 1: based on conserved distances (cf. Kruskal's MST algorithm)
◮ Option 2: based on space filling diagrams (Voronoi diagrams / α-shapes)
⊲ Resulting programs: Align-Kpax-CD, Align-Kpax-SFD, Align-Apurva-CD, Align-Apurva-SFD
⊲ NB: when comparing two conformations of the same protein, rather than homologous proteins, the alignment is trivial.

Motifs reveal the multi-scale structural conservation within global alignments
⊲ Size of motifs vs lRMSD on challenging cases
[Figure panels: 1BGE vs 2GMF; 1CEW vs 1MOL; 1CID vs 2RHE; 1CRL vs 1EDE]
⊲ Ref: pairs of structures from Godzik et al., Bioinformatics, 2003

Comparing two molecules: the combined RMSD
⊲ Rationale: use one rigid motion for each rigid (structurally conserved) region
⊲ Motifs for two molecules A and B, and their intersection graph
[Figure: motifs $M_1, M_2, M_3$ on A (regions $A_1, \ldots, A_6$) and on B (regions $B_1, \ldots, B_5$), and the intersection graph between them]
Definition 2. Consider two structures A and B for which non-overlapping domains $\{C_i^{(A)}, C_i^{(B)}\}_{i=1,\ldots,m}$ have been identified. Assume that an lRMSD has been computed for each pair $(C_i^{(A)}, C_i^{(B)})$. Let $w_i$ be the weight associated with each individual lRMSD. The combined RMSD is defined by
$$\mathrm{RMSD}_{\mathrm{Comb.}}(A, B) = \sqrt{\sum_{i=1}^{m} \frac{w_i}{\sum_{j=1}^{m} w_j}\, \mathrm{lRMSD}^2\big(C_i^{(A)}, C_i^{(B)}\big)}. \quad (2)$$
⊲ Remark: the combined RMSD comes in two guises, namely vertex-weighted and edge-weighted.
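Eq. (2) can be evaluated directly. The following is an illustrative sketch, not a result from [1]. It applies Eq. (2) to the two TBEV motifs (sizes 88 and 40, lRMSDs 1.69 Å and 0.38 Å), assuming that the two motifs do not overlap and taking the motif sizes as weights, which is a hypothetical choice in the spirit of the vertex-weighted variant.

```python
import math


def combined_rmsd(lrmsds, weights):
    """Eq. (2): square root of the weighted mean of the squared per-domain lRMSDs."""
    total = sum(weights)
    return math.sqrt(sum(w * r * r for w, r in zip(weights, lrmsds)) / total)


# TBEV motifs; weights = motif sizes (assumption, not the authors' stated weighting).
tbev = combined_rmsd([1.69, 0.38], [88, 40])
print(f"{tbev:.2f} Å")   # about 1.42 Å under these assumptions
```

With one rigid motion per motif, the discrepancy drops to about 1.4 Å, compared with the 11.1 Å of a single global superposition. Most of that global lRMSD therefore comes from the relative motion of the conserved regions rather than from deformations within them.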


The Structural Bioinformatics Library http://sbl.inria.fr ⊲ Ref: Cazals and Dreyfus; Bioinformatics, 2016

SBL and Jupyter notebooks: guided tour http://sbl.inria.fr/applications

Summary and outlook
⊲ Combined RMSD ($\mathrm{RMSD}_{\mathrm{Comb.}}$)
◮ Structural comparisons based on (relatively) independent sets
⊲ Multiscale analysis of structural conservation
◮ Segregating degrees of freedom (internal coordinates) into active and passive ones
◮ Towards more efficient algorithms for thermodynamics and dynamics
⊲ Software: all tools in the SBL
⊲ Ongoing
◮ Design of move sets
◮ Applications to energy landscapes: exploration, thermodynamics

Bibliography
• Combined RMSD: [1] • Structural motifs: [2] • Software: [3] • Partition functions: [4] • Cluster matching: [5]
[1] F. Cazals and R. Tetley. Characterizing molecular flexibility by combining lRMSD measures. Proteins, 87(5):380–389, 2019.
[2] F. Cazals and R. Tetley. Multiscale analysis of structurally conserved motifs. Submitted, 2019.
[3] F. Cazals and T. Dreyfus. The Structural Bioinformatics Library: modeling in biomolecular science and beyond. Bioinformatics, 33(7):1–8, 2017.
[4] A. Chevallier and F. Cazals. Wang-Landau algorithm: an adapted random walk to boost convergence. J. of Computational Physics (under revision), 2019.
[5] F. Cazals, D. Mazauric, R. Tetley, and R. Watrigant. Comparing two clusterings using matchings between clusters of clusters. ACM J. of Experimental Algorithms, 24(1):1–42, 2019.

Step 1: rigidity score as Cα ranks for chains A and B
⊲ Input: a structural alignment yields
◮ $d^A_{i,j}$: distance between Cα atoms i and j on chain A
◮ $d^B_{i,j}$: distance between Cα atoms i and j on chain B
[Figure: aligned residues $a_1, \ldots, a_4$ on chain A and $b_1, \ldots, b_4$ on chain B, with $d^A_{i,j}$ and $d^B_{i,j}$ drawn between residues i and j]
⊲ Distance difference matrix between A and B:
$$s_{ij} = |d^A_{i,j} - d^B_{i,j}|, \quad i = 1, \ldots, N,\; j = 1, \ldots, N. \quad (3)$$
⊲ Cα rank of residue i: index of the smallest $s_{ij}$ involving this residue in the sorted sequence $\mathrm{Sorted}\{s_{ij}\}$. Assuming the depicted ordering of scores, $s_{12} < s_{34} < s_{23} < s_{13} < s_{14} < s_{24}$, the ranks are as follows:
◮ one for $C_1$ and $C_2$
◮ two for $C_3$ and $C_4$
◮ likewise for the second chain.
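Eq. (3) and the Cα rank translate into a few lines of code. This minimal sketch assumes that residues are given in aligned order, so that index i on chain A matches index i on chain B. It indexes residues from 0 when computing scores from coordinates, while the worked example reuses the slide's 1-based labels with made-up score values that respect the depicted ordering.

```python
import math
from itertools import combinations


def distance_differences(coords_a, coords_b):
    """Eq. (3): s_ij = |d^A_ij - d^B_ij| for all pairs i < j of aligned residues."""
    return {
        (i, j): abs(math.dist(coords_a[i], coords_a[j]) - math.dist(coords_b[i], coords_b[j]))
        for i, j in combinations(range(len(coords_a)), 2)
    }


def calpha_ranks(scores):
    """Cα rank of residue i: 1-based index, in the sorted sequence of scores,
    of the smallest s_ij involving i."""
    ranks = {}
    ordered = sorted(scores.items(), key=lambda kv: kv[1])
    for idx, ((i, j), _) in enumerate(ordered, start=1):
        for v in (i, j):
            ranks.setdefault(v, idx)   # first (smallest) score involving v wins
    return ranks


# The slide's ordering s12 < s34 < s23 < s13 < s14 < s24, with made-up values.
slide_scores = {(1, 2): 0.1, (3, 4): 0.2, (2, 3): 0.5, (1, 3): 0.7, (1, 4): 0.9, (2, 4): 1.2}
print(calpha_ranks(slide_scores))   # C1 and C2 get rank 1, C3 and C4 get rank 2
```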
