
NCI-DOE Cancer Initiative: Ras Biology in Membranes Molecular level



  1. NCI-DOE Cancer Initiative: Ras Biology in Membranes. Molecular-level Deep Learning (Towards Predictive Biology Through HPC). GTC 2017. Brian Van Essen, Computer Scientist. May 9, 2017. LLNL-PRES-730749. This work was performed under the auspices of the U.S. Department of Energy by Lawrence Livermore National Laboratory under contract DE-AC52-07NA27344. Lawrence Livermore National Security, LLC.

  2. Cancer Moonshot Pilot 2
     [Diagram; labels recovered from the slide:]
     § RAS activation experiments (FNLCR)
     § Adaptive sampling molecular dynamics simulation codes
     § Predictive simulation and analysis of RAS activation
     § Adaptive time stepping; adaptive spatial resolution
     § Phase field model of lipid membrane; coarse-grain MD; classical MD
     § High-fidelity subgrid modeling
     § Granular RAS membrane interaction simulations
     § Atomic resolution RAS-RAF interaction
     § Experiments on nanodisc; CryoEM imaging; X-ray/neutron scattering
     § Multi-modal experimental data, image reconstruction, analytics
     § Protein structure databases
     § Machine learning guided dynamic validation
     § Unsupervised deep feature learning; mechanistic network models
     § Uncertainty quantification

  3. Molecular-level Deep Learning Goals
     Identify characteristics of:
     § Individual molecules
       - Hand engineered vs learned features
     § Collection of molecules (simulation frame)
       - Instantaneous state of the system
     § Progression of system over time
       - Identify / predict behavior
     Adapt simulation to explore state space:
     § Observe / analyze rare events
     Can machine learned features identify and highlight biologically interesting correlations?

  4. Molecular-level Deep Learning Techniques
     Use unsupervised learning to maximize the value of labeled data
     § Convolutional autoencoders extract molecular-level features
     § Fully-connected autoencoders characterize state of simulation frame
     § Recurrent autoencoder predicts:
       - future events: queue in-depth (expensive) analysis
       - state transitions: progress simulation
     Data set characteristics:
     § Input dimensions: ~1.26e6 per time step (6000 lipids x 30 beads per lipid x (position + velocity + type))
     § Sample size: O(10^6) for simulation requiring O(10^9) time steps
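
The data set figures on this slide can be checked with a little arithmetic. The sketch below assumes that "position + velocity + type" means 3 + 3 + 1 scalar values per bead, which is not stated on the slide but does reproduce the quoted ~1.26e6. It also reads the sample-size line as roughly one stored sample per thousand time steps; that stride is an inference from the two orders of magnitude, not a number given in the source.

```python
# Per-frame input size from the slide's figures. The 3 + 3 + 1 split of
# (position + velocity + type) is an assumption, not stated in the source.
N_LIPIDS = 6000
BEADS_PER_LIPID = 30
POSITION_DIMS = 3   # assumed: x, y, z
VELOCITY_DIMS = 3   # assumed: vx, vy, vz
TYPE_DIMS = 1       # assumed: one scalar bead-type label


def inputs_per_frame(n_lipids, beads_per_lipid, values_per_bead):
    """Number of scalar inputs describing one simulation frame."""
    return n_lipids * beads_per_lipid * values_per_bead


values_per_bead = POSITION_DIMS + VELOCITY_DIMS + TYPE_DIMS
frame_inputs = inputs_per_frame(N_LIPIDS, BEADS_PER_LIPID, values_per_bead)

# Order-of-magnitude figures from the slide: O(10^6) samples, O(10^9) steps.
TIME_STEPS = 10**9
SAMPLES = 10**6
steps_per_sample = TIME_STEPS // SAMPLES  # implied stride between samples

print(frame_inputs)       # 1260000
print(steps_per_sample)   # 1000
```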

  5. RAS Monomer Simulations
     § Most MD studies of RAS have been in solution with no membrane
     § RAS only has biological activity when embedded in a membrane
     § NMR experiments have shown that RAS dynamics in membranes are complicated and are affected by the membrane composition and binding partners
     [Figure panels: inactive K-Ras binding GDP; active K-Ras binding GNP]

  6. Overview: Molecular Dynamics (MD)
     § Represent every atom in a system
     § Describe the forces on all atoms: $\mathbf{F} = -\nabla U(\mathbf{r}) = m\mathbf{a} = m\ddot{\mathbf{r}}$
     § Integrate: $\mathbf{F} = m\mathbf{a}$ (millions of times)
     § Result: position of every atom as a function of time
     § Compare with experiments: structures/dynamics
     Current limitations
     § 100,000s of atoms
     § 10,000s of water molecules
     § 1,000s of lipids
     § < 1 µs
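
The "describe the forces, then integrate F = ma" loop on this slide can be sketched for a single particle. The slide does not name an integrator, so the velocity Verlet scheme used here is an assumption (it is a common choice in MD codes), and the one-dimensional harmonic potential, mass, and time step are hypothetical values chosen only so the result can be compared with the exact solution.

```python
import math


def force(x, k=1.0):
    """F = -dU/dx for the harmonic potential U(x) = 0.5 * k * x**2."""
    return -k * x


def velocity_verlet(x, v, mass, dt, n_steps, k=1.0):
    """Integrate F = m * a for one particle; returns the (x, v) trajectory."""
    trajectory = [(x, v)]
    a = force(x, k) / mass
    for _ in range(n_steps):
        x = x + v * dt + 0.5 * a * dt * dt   # position update
        a_new = force(x, k) / mass           # force at the new position
        v = v + 0.5 * (a + a_new) * dt       # velocity update
        a = a_new
        trajectory.append((x, v))
    return trajectory


def total_energy(x, v, mass, k=1.0):
    """Kinetic plus potential energy of the oscillator."""
    return 0.5 * mass * v * v + 0.5 * k * x * x


# 1000 steps of dt = 0.01 cover t = 10 in reduced units.
traj = velocity_verlet(x=1.0, v=0.0, mass=1.0, dt=0.01, n_steps=1000)
e_start = total_energy(*traj[0], mass=1.0)
e_end = total_energy(*traj[-1], mass=1.0)
x_end = traj[-1][0]
x_exact = math.cos(10.0)   # exact solution for x(0) = 1, v(0) = 0, k = m = 1

print(e_start, e_end)
print(x_end, x_exact)
```

A production MD code evaluates the same update for every atom at every step, with forces coming from the full force field rather than a single spring.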

  7. Coarse Grained Molecular Dynamics (CGMD)
     § Merge several heavy atoms into a single “bead”
     § Describe bead-bead interactions with averaged force field
       - Sacrifice atomistic structural and dynamic information
       - Much less computer and time intensive
       - Same computational scaling properties
     § 6 orders of magnitude increase in sampling!
       - 100s of μs* (+3 orders of magnitude)
       - 100,000s of lipids (+2 orders of magnitude)
     *Actual “physiological” timescale is even longer, as there is also about a 10-fold increase in dynamics
     [Figure: all-atom vs CG representations of a DPPC lipid and a protein α-helix]
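
Merging several heavy atoms into one bead can be illustrated with a small mapping function. The slide does not say how bead positions are defined, so placing each bead at the center of mass of its atoms is an assumption, as are the 4-to-1 grouping and the toy 8-atom chain used as input.

```python
def coarse_grain(positions, masses, bead_groups):
    """Map atoms to beads: each bead sits at the center of mass of its atoms.

    positions   : list of (x, y, z) atom coordinates
    masses      : list of atom masses, same order as positions
    bead_groups : list of lists of atom indices, one list per bead
    Returns a list of (bead_position, bead_mass) pairs.
    """
    beads = []
    for group in bead_groups:
        total_mass = sum(masses[i] for i in group)
        com = tuple(
            sum(masses[i] * positions[i][axis] for i in group) / total_mass
            for axis in range(3)
        )
        beads.append((com, total_mass))
    return beads


# Hypothetical 8-atom chain of equal-mass heavy atoms, spaced 1.0 apart on x.
atom_positions = [(float(i), 0.0, 0.0) for i in range(8)]
atom_masses = [12.0] * 8
groups = [[0, 1, 2, 3], [4, 5, 6, 7]]   # assumed 4-to-1 mapping

beads = coarse_grain(atom_positions, atom_masses, groups)
for position, mass in beads:
    print(position, mass)
```

The mapping only reduces the number of particles; the averaged bead-bead force field mentioned on the slide is a separate modeling step that this sketch does not cover.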

  8. Adaptive resolution MD/CGMD coupled with phase field
     § Model complex (many lipid) bilayer with phase field to capture structure and topology
     § Model Ras on membrane using full atomistic resolution
     § Use CGMD as “glue” to connect different models
     Connecting MD and CGMD with continuum-scale phase field models will access biologically relevant time and length scales
     [Figure labels: Phase Field; Atomistic (MD); Coarse Grained (CGMD)]

  9. Simulation of full system will incorporate a large number of smaller simulations
     § 10-100 µm lipid patches
     § Dynamic membrane
     § Thousands of Ras proteins
       - Mutant and wild-type
       - Many conformations
       - Many environments
     Investigate diffusion and aggregation of Ras in context of specific membrane properties
     O(10^5) 100,000-atom simulations

  10. Simulations of KRAS have started in more biologically relevant lipid environments
      § Completed coarse-grained (CG) simulations of average mammalian plasma membrane with 63 distinct lipid types
      § Working on improving CG parameters for specific lipid types to be consistent with all-atom (AA) simulations of lipids (LANL and LLNL)
      § Investigating “simple” average plasma membrane [only 18 lipid types]
      § Looking into tissue specific lipid compositions
      Initial CGMD of KRAS proteins in complex human average plasma membrane
      § 64 KRAS4b in 70 nm x 70 nm membrane
      § HVR in alpha helix conformation
      § Inserted in inner plasma membrane leaflet
      [Figure: distribution of lipids in average plasma membrane; headgroups and tail unsaturation for the outer and inner leaflets]
      Ingólfsson H.I., M.N. Melo, F. van Eerden, C. Arnarez, C.A. Lopez, T.A. Wassenaar, X. Periole, A.H. de Vries, D.P. Tieleman and S.J. Marrink. 2014. Lipid organization of the plasma membrane. J Am Chem Soc, 136:14554-14559

  11. KRAS4b in mammalian plasma membrane
      § 20,000 lipids (70x70 nm)
      § 40 µs pre-equilibration
      § 64 Ras proteins cluster readily
      § Associates with and aggregates charged lipids in the membrane
      Helgi Ingólfsson, LLNL

  12. Automated hypothesis generation and dynamic validation
      [Diagram of a loop; labels recovered from the slide:]
      § High dimensional model parameters
      § High-fidelity simulation
      § Ensembles of simulation [parameter|output] sets
      § Machine learning to train a reduced-order predictive model
      § Hypothesis generation: use the ML model to predict the parameters for experimental data
      § CORAL computing architectures power dynamic validation loop

  13. Project will build understanding on computational advances
      [Figure: capability plotted against time]

  14. Applying Deep Learning to molecular-level simulations
      Challenges:
      § Train neural networks on simulation data (not image slices)
      § Minimal prior art on deep neural networks trained on molecular dynamics
      § Labeling data is time consuming and requires domain experts
      Approach:
      § Developing learned features that complement standard molecular level features
      § Create an encoded representation that characterizes simulation state
      § Create model that can predict future simulation state
      Questions:
      § Are these features useful for existing needs such as cluster detection?
      § Can these encodings be used to cue domain scientists?
      § ML provides data reduction and representation: how does this interface with traditional physics?

  15. Cluster Detection
      [Figure: cholesterol density maps; panel labels: Avg., Brain, Outer, Inner]

  16. Cluster Detection
      Domain size(s) and dynamics?
      § Neighbor counting and clustering
      § Density maps - time and space correlation
      § Structure factor analysis
      § Lipid feature selection for fancy clustering
      Candidate lipid features:
      1) x,y,z coordinates
      2) Lipid type
      3) Lipid area
      4) Local bilayer height
      5) Lipid order
      6) Lipid tilt
      7) Lipid movement
      8) Local density
      ...
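
A minimal sketch of "neighbor counting and clustering" on lipid positions follows. The slide gives no algorithm, so the choices here are assumptions: a fixed distance cutoff defines neighbors, clusters are the connected components of the resulting neighbor graph, positions are 2D, and periodic boundaries are ignored. The sample coordinates and the cutoff value are hypothetical.

```python
from collections import deque


def neighbor_lists(points, cutoff):
    """For each point, indices of other points within `cutoff` (O(n^2) scan)."""
    cutoff_sq = cutoff * cutoff
    neighbors = [[] for _ in points]
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            dx = points[i][0] - points[j][0]
            dy = points[i][1] - points[j][1]
            if dx * dx + dy * dy <= cutoff_sq:
                neighbors[i].append(j)
                neighbors[j].append(i)
    return neighbors


def clusters_from_neighbors(neighbors):
    """Label connected components of the neighbor graph by breadth-first search."""
    label = [-1] * len(neighbors)
    n_clusters = 0
    for start in range(len(neighbors)):
        if label[start] != -1:
            continue  # already assigned to a cluster
        label[start] = n_clusters
        queue = deque([start])
        while queue:
            i = queue.popleft()
            for j in neighbors[i]:
                if label[j] == -1:
                    label[j] = n_clusters
                    queue.append(j)
        n_clusters += 1
    return label


# Hypothetical in-plane lipid positions: a triplet, a pair, and an isolated lipid.
points = [(0.0, 0.0), (0.5, 0.0), (0.5, 0.5), (5.0, 5.0), (5.4, 5.0), (9.0, 0.0)]
nbrs = neighbor_lists(points, cutoff=0.8)
counts = [len(n) for n in nbrs]
labels = clusters_from_neighbors(nbrs)

print(counts)   # [2, 2, 2, 1, 1, 0]
print(labels)   # [0, 0, 0, 1, 1, 2]
```

A hard cutoff like this is exactly where poorly defined cluster boundaries cause trouble, since a small change in the cutoff can merge or split clusters.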

  17. Cluster Detection: Challenging Cases
      Cluster boundaries are not well defined

  18. Cluster Detection
      Learn features for cluster detection and characterizing state
      § Use a multi-layer perceptron stacked auto-encoder to generate features that describe the state of a simulation frame
      § Generate automatically extracted features representing molecular simulation data
      § Establish framework for building future tools using learned features
      Expected outcome:
      § Improvement in the understanding of protein formation and easing of the handling of large-scale molecular dynamics output
      [Diagram: Molecular Features, Molecular CNN, input X, Encoder, Code z (State of Frame), Decoder, reconstruction X']
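
The encoder, code z, and decoder structure on this slide can be sketched with a deliberately tiny autoencoder. This is not the stacked multi-layer perceptron or the Molecular CNN from the slide: it is a single linear encoder and decoder with a one-number code, trained by plain full-batch gradient descent on hypothetical 3-value "frames" that all lie along one direction. The learning rate, step count, and initial weights are arbitrary choices for the illustration.

```python
def encode(w_enc, x):
    """Encoder: compress the input vector x into a single code value z."""
    return sum(w * xi for w, xi in zip(w_enc, x))


def decode(w_dec, z):
    """Decoder: expand the code z back into a reconstruction X'."""
    return [w * z for w in w_dec]


def mean_loss(w_enc, w_dec, data):
    """Mean squared reconstruction error over the data set."""
    total = 0.0
    for x in data:
        x_hat = decode(w_dec, encode(w_enc, x))
        total += sum((h - xi) ** 2 for h, xi in zip(x_hat, x))
    return total / len(data)


def train(data, steps=500, lr=0.01):
    """Full-batch gradient descent on the reconstruction error."""
    dim = len(data[0])
    w_enc = [0.1] * dim
    w_dec = [0.1] * dim
    for _ in range(steps):
        g_enc = [0.0] * dim
        g_dec = [0.0] * dim
        for x in data:
            z = encode(w_enc, x)
            err = [h - xi for h, xi in zip(decode(w_dec, z), x)]
            # Gradient of the squared error with respect to the code z.
            dz = sum(2.0 * e * w for e, w in zip(err, w_dec))
            for i in range(dim):
                g_dec[i] += 2.0 * err[i] * z / len(data)
                g_enc[i] += dz * x[i] / len(data)
        w_enc = [w - lr * g for w, g in zip(w_enc, g_enc)]
        w_dec = [w - lr * g for w, g in zip(w_dec, g_dec)]
    return w_enc, w_dec


# Hypothetical frames: 3 values each, all multiples of the direction (1, 2, -1).
data = [[t * u for u in (1.0, 2.0, -1.0)] for t in (-1.0, -0.5, 0.0, 0.5, 1.0)]

initial_loss = mean_loss([0.1] * 3, [0.1] * 3, data)
w_enc, w_dec = train(data)
final_loss = mean_loss(w_enc, w_dec, data)
codes = [encode(w_enc, x) for x in data]   # one learned feature per frame

print(initial_loss, final_loss)
print(codes)
```

Because the toy data has only one degree of freedom, a one-number code is enough to reconstruct it; real simulation frames would need a much wider code and nonlinear layers.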

  19. Can we leverage deep learning for static state?
      § Do learned features outperform hand-selected features for cluster detection?
      § Do we have enough labeled data to learn complex representations?
      § Does the compressed frame representation provide a good basis for representing MD simulation state?
      § Can we develop state descriptions that are meaningful to domain experts?

  20. State Transition: Coupled Phase-Field Particle Model
      § Bilayer and water mapped to sheets with concentration and height fields
      § RAS mapped to particles as “point” particles
      [Figure labels: Water; RAS; Bilayer]

  21. State Transition: Statistical Multi-scale Coupling
      § Phase field
      § Statistical atoms: density, composition, and curvature consistent with the phase field
      § Full dynamical atoms: accelerate particles with parallel replica dynamics

  22. State Transition: Parallel Replica Dynamics
      A.F. Voter, Phys. Rev. B, 57, R13985 (1998)
      Parallelizes time evolution
      Assumptions:
        - infrequent events
        - exponential distribution of first-escape times: $p(t) = k e^{-kt}$
      [Figure: p(t) plotted against t]
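
The exponential first-escape assumption is what lets time evolution be parallelized: if each of M independent replicas escapes at rate k, the first escape among them occurs at rate Mk, so the simulated time summed over all replicas still averages 1/k. The sketch below checks this numerically by sampling escape times directly from $p(t) = k e^{-kt}$. It does not run any dynamics, and the rate, replica count, trial count, and seed are hypothetical values.

```python
import random


def first_escape_parallel(rate, n_replicas, rng):
    """Wall-clock time until the first of n independent replicas escapes."""
    return min(rng.expovariate(rate) for _ in range(n_replicas))


def mean_first_escape(rate, n_replicas, n_trials, seed=0):
    """Average wall-clock time to first escape over many independent trials."""
    rng = random.Random(seed)
    total = sum(
        first_escape_parallel(rate, n_replicas, rng) for _ in range(n_trials)
    )
    return total / n_trials


RATE = 2.0        # hypothetical escape rate k
N_REPLICAS = 8    # hypothetical number of replicas M

wall_clock = mean_first_escape(RATE, N_REPLICAS, n_trials=20000)
summed_time = N_REPLICAS * wall_clock   # simulated time summed over replicas

print(wall_clock)    # close to 1 / (M * k)
print(summed_time)   # close to 1 / k
```

In this idealized picture the wall-clock time to see an event drops by a factor of M while the statistics of the escape time, measured in summed simulated time, are unchanged.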

  23. State Transition: Ensemble Multi-scale (Statistical Coupling)
      § Phase field: phase field parameters determined via atomistic MD
      § Many 100k atom MD simulations
