NCI-DOE Cancer Initiative: Ras Biology in Membranes



slide-1
SLIDE 1

LLNL-PRES-730749

This work was performed under the auspices of the U.S. Department of Energy by Lawrence Livermore National Laboratory under contract DE-AC52-07NA27344. Lawrence Livermore National Security, LLC

NCI-DOE Cancer Initiative: Ras Biology in Membranes

Molecular level Deep Learning (Towards Predictive Biology Through HPC)

GTC 2017

Brian Van Essen, Computer Scientist
May 9, 2017

slide-2
SLIDE 2

Multi-modal experimental data, image reconstruction, analytics

Adaptive spatial resolution
Adaptive time stepping
High-fidelity subgrid modeling

Experiments on nanodisc

CryoEM imaging
X-ray/neutron scattering
Protein structure databases

Adaptive sampling molecular dynamics simulation codes

Unsupervised deep feature learning
Uncertainty quantification
Mechanistic network models

RAS activation experiments (FNLCR)

Phase field model
Coarse-grain MD
Classical MD

Machine learning guided dynamic validation

Granular RAS membrane interaction simulations
Atomic resolution RAS-RAF interaction
RAS activation

Predictive simulation and analysis of RAS

Phase field model of lipid membrane

Cancer Moonshot Pilot 2

slide-3
SLIDE 3

Identify characteristics of:

§ Individual molecules

— Hand engineered vs learned features

§ Collection of molecules (simulation frame)

— Instantaneous state of the system

§ Progression of system over time

— Identify / predict behavior

Adapt simulation to explore state space:

§ Observe / analyze rare events

Molecular-level Deep Learning Goals

Can machine learned features identify and highlight biologically interesting correlations?

slide-4
SLIDE 4

Use unsupervised learning to maximize labeled data

§ Convolutional autoencoders extract molecular-level features
§ Fully-connected autoencoders characterize state of simulation frame
§ Recurrent autoencoder predicts:
— future events -- queue in-depth (expensive) analysis
— state transitions -- progress simulation

Data set characteristics:

§ Input dimensions: ~1.26e6 per time step (6000 lipids x 30 beads per lipid x (position + velocity + type))
§ Sample size: O(10^6) for simulation requiring O(10^9) time steps
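The input-dimension figure can be checked with a few lines of arithmetic. The sketch below assumes 3 position components, 3 velocity components, and a single scalar for bead type; the slide does not spell out that split. The frame stride is likewise only an inference from the two orders of magnitude quoted.

```python
# Assumed per-bead layout (not stated on the slide): 3 + 3 + 1 values.
N_LIPIDS = 6000
BEADS_PER_LIPID = 30
POSITION_DIMS = 3   # x, y, z
VELOCITY_DIMS = 3   # vx, vy, vz
TYPE_DIMS = 1       # assumption: bead type stored as one value


def input_dimensions(n_lipids, beads_per_lipid, values_per_bead):
    """Number of scalar inputs describing one simulation time step."""
    return n_lipids * beads_per_lipid * values_per_bead


values_per_bead = POSITION_DIMS + VELOCITY_DIMS + TYPE_DIMS
total_inputs = input_dimensions(N_LIPIDS, BEADS_PER_LIPID, values_per_bead)

# If ~10^6 samples are drawn from ~10^9 time steps, roughly one frame
# in every thousand steps is kept (an inference, not a stated figure).
frame_stride = 10**9 // 10**6

print(total_inputs)   # 1260000
print(frame_stride)   # 1000
```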

Molecular-level Deep Learning Techniques

slide-5
SLIDE 5

Inactive K-Ras binding GDP
Active K-Ras binding GNP

§ Most MD studies of RAS have been in solution with no membrane
§ RAS only has biological activity when embedded in a membrane
§ NMR experiments have shown that RAS dynamics in membranes are complicated and are affected by the membrane composition and binding partners

RAS Monomer Simulations

slide-6
SLIDE 6

Overview: Molecular Dynamics (MD)

§ Represent every atom in a system
§ Describe the forces on all atoms
§ Integrate: F = ma (millions of times)
§ Result: position of every atom as a function of time
§ Compare with experiments: structures/dynamics

Current limitations

§ 100,000s of atoms
§ 10,000s of water molecules
§ 1,000s of lipids
§ < 1 µs

$F = -\nabla U(\mathbf{r}) = m\mathbf{a} = m\ddot{\mathbf{r}}$
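The "integrate F = ma" step can be illustrated with a velocity Verlet integrator, a standard MD time-stepping scheme. This is a minimal sketch, not the integrator of any code named in this deck: it assumes a single particle in a one-dimensional harmonic potential U(x) = k x^2 / 2, and all function names are hypothetical.

```python
import math


def force(x, k=1.0):
    """F = -dU/dx for the assumed harmonic potential U = 0.5 * k * x**2."""
    return -k * x


def total_energy(x, v, mass, k=1.0):
    """Kinetic plus potential energy of the toy system."""
    return 0.5 * mass * v * v + 0.5 * k * x * x


def velocity_verlet(x, v, mass, dt, n_steps, k=1.0):
    """Advance position and velocity by n_steps of size dt."""
    f = force(x, k)
    for _ in range(n_steps):
        v_half = v + 0.5 * dt * f / mass   # half-step velocity update
        x = x + dt * v_half                # full-step position update
        f = force(x, k)                    # force at the new position
        v = v_half + 0.5 * dt * f / mass   # finish the velocity update
    return x, v


MASS, DT, N_STEPS = 1.0, 0.01, 628         # about one oscillation period
x_start, v_start = 1.0, 0.0
e_start = total_energy(x_start, v_start, MASS)
x_end, v_end = velocity_verlet(x_start, v_start, MASS, DT, N_STEPS)
e_end = total_energy(x_end, v_end, MASS)

print(x_end, e_end - e_start)
```

For this toy system the exact solution is x(t) = cos(t), so the printed position can be compared against cos(6.28), and the energy difference shows how well the scheme conserves energy over many steps.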

slide-7
SLIDE 7

DPPC lipid Protein α-helix

All atom

§ Merge several heavy atoms into a single “bead”
§ Describe bead-bead interactions with averaged force field
— Sacrifice atomistic structural and dynamic information
— Much less computer and time intensive
— Same computational scaling properties
§ 6 orders of magnitude increase in sampling!
— 100s of μs* (+3 orders of magnitude)
— 100,000s of lipids (+2 orders of magnitude)

Coarse Grained Molecular Dynamics (CGMD)

*Actual “physiological” timescale is even longer, as there is also about a 10-fold increase in dynamics
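Merging atoms into beads can be sketched as a center-of-mass mapping. The grouping below is hypothetical and chosen only for illustration; real coarse-grained force fields define their own atom-to-bead assignments, and the function name is an assumption.

```python
def coarse_grain(positions, masses, bead_groups):
    """Map atoms to beads: each bead sits at the center of mass of its atoms.

    positions   -- list of (x, y, z) atom coordinates
    masses      -- list of atom masses, same order as positions
    bead_groups -- list of atom-index lists, one per bead (hypothetical mapping)
    Returns a list of (bead_position, bead_mass) tuples.
    """
    beads = []
    for group in bead_groups:
        bead_mass = sum(masses[i] for i in group)
        center = tuple(
            sum(masses[i] * positions[i][d] for i in group) / bead_mass
            for d in range(3)
        )
        beads.append((center, bead_mass))
    return beads


# Four equal-mass atoms merged into two beads of two atoms each.
atom_positions = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (0.0, 4.0, 0.0), (0.0, 4.0, 2.0)]
atom_masses = [12.0, 12.0, 12.0, 12.0]
groups = [[0, 1], [2, 3]]

beads = coarse_grain(atom_positions, atom_masses, groups)
print(beads)
```

Each bead replaces several particles with one, which is where the reduction in computational cost comes from; the internal arrangement of the merged atoms is the atomistic detail that is given up.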

CG

slide-8
SLIDE 8

Atomistic (MD) Coarse Grained (CGMD) Phase Field

Adaptive resolution MD/CGMD coupled with phase field

§ Model complex (many lipid) bilayer with phase field to capture structure and topology
§ Model Ras on membrane using full atomistic resolution
§ Use CGMD as “glue” to connect different models

Connecting MD and CGMD with continuum-scale phase field models will access biologically relevant time and length scales

slide-9
SLIDE 9

Simulation of full system will incorporate a large number of smaller simulations

§ 10-100 µm lipid patches
§ Dynamic membrane
§ Thousands of Ras proteins
— Mutant and wild-type
— Many conformations
— Many environments

(10^5) 100,000-atom simulations
Investigate diffusion and aggregation of Ras in context of specific membrane properties

slide-10
SLIDE 10

Inner leaflet Outer leaflet

Ingólfsson H.I., M.N. Melo, F. van Eerden, C. Arnarez, C.A. Lopez, T.A. Wassenaar, X. Periole, A.H. de Vries, D.P. Tieleman and S.J. Marrink. 2014. Lipid organization of the plasma membrane. J Am Chem Soc, 136:14554-14559

Simulations of KRAS have started in more biologically relevant lipid environments

Completed coarse-grained (CG) simulations of
§ average mammalian plasma membrane with 63 distinct lipid types
§ Working on improving CG parameters for specific lipid types to be consistent with all-atom (AA) simulations of lipids (LANL and LLNL)
§ Investigating “simple” average plasma membrane [only 18 lipid types]
§ Looking into tissue-specific lipid compositions

Initial CGMD of KRAS proteins in complex human average plasma membrane
§ 64 KRAS4b in 70 nm x 70 nm membrane
§ HVR in alpha helix conformation
§ Inserted in inner plasma membrane leaflet

Tail unsaturation Headgroups

Distribution of lipids in average plasma membrane

slide-11
SLIDE 11

KRAS4b in mammalian plasma membrane

Helgi Ingólfsson, LLNL

§ 20,000 lipids (70x70 nm)
§ 40 µs pre-equilibration
§ 64 Ras proteins cluster readily
§ Associates with and aggregates charged lipids in the membrane

slide-12
SLIDE 12

Automated hypothesis generation and dynamic validation

High-fidelity simulation
Ensembles of simulation [parameter|output] sets
CORAL computing architectures power the dynamic validation loop
Machine learning to train a reduced-order predictive model
High dimensional model parameters
Hypothesis generation – use the ML model to predict parameters for experimental data

slide-13
SLIDE 13

[Figure: capability versus time]

Project will build understanding on computational advances

slide-14
SLIDE 14

Challenges:
§ Train neural networks on simulation data (not image slices)
§ Minimal prior art on deep neural networks trained on molecular dynamics
§ Labeling data is time consuming and requires domain experts

Approach
§ Developing learned features that complement standard molecular-level features
§ Create an encoded representation that characterizes simulation state
§ Create a model that can predict future simulation state

Questions
§ Are these features useful for existing needs such as cluster detection?
§ Can these encodings be used to cue domain scientists?
§ ML provides data reduction and representation – how does this interface with traditional physics?

Applying Deep Learning to molecular-level simulations

slide-15
SLIDE 15

[Figure: cholesterol density in the outer and inner leaflets; panels: Avg., Brain]

Cluster Detection

slide-16
SLIDE 16

Domain size(s) and dynamics?

§ Neighbor counting and clustering
§ Density maps - time and space correlation
§ Structure factor analysis
§ Lipid feature selection for fancy clustering

1) x,y,z coordinates
2) Lipid type
3) Lipid area
4) Local bilayer height
5) Lipid order
6) Lipid tilt
7) Lipid movement
8) Local density
...
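Neighbor counting and clustering, the first approach listed above, can be sketched as a distance-cutoff graph followed by a connected-components pass. This is an illustrative baseline only: it assumes 2D lipid positions, a hypothetical cutoff, no periodic boundary conditions, and an O(N^2) pair loop that would be replaced by a cell list at the system sizes discussed here.

```python
import math
from collections import deque


def neighbor_lists(points, cutoff):
    """For each point, list the indices of points within the cutoff distance."""
    neighbors = [[] for _ in points]
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if math.dist(points[i], points[j]) <= cutoff:
                neighbors[i].append(j)
                neighbors[j].append(i)
    return neighbors


def label_clusters(neighbors):
    """Assign a cluster label to each point via breadth-first search."""
    labels = [-1] * len(neighbors)
    next_label = 0
    for start in range(len(neighbors)):
        if labels[start] != -1:
            continue
        labels[start] = next_label
        queue = deque([start])
        while queue:
            i = queue.popleft()
            for j in neighbors[i]:
                if labels[j] == -1:
                    labels[j] = next_label
                    queue.append(j)
        next_label += 1
    return labels


# Hypothetical lipid positions (nm) in the membrane plane.
lipids = [(0.0, 0.0), (0.5, 0.0), (0.4, 0.4), (5.0, 5.0), (5.3, 5.1), (10.0, 0.0)]
neighbors = neighbor_lists(lipids, cutoff=1.0)
neighbor_counts = [len(n) for n in neighbors]
labels = label_clusters(neighbors)

print(neighbor_counts)  # [2, 2, 2, 1, 1, 0]
print(labels)           # [0, 0, 0, 1, 1, 2]
```

A hard cutoff like this is exactly what struggles when cluster boundaries are not well defined, since the result depends entirely on one distance threshold.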

Cluster Detection

slide-17
SLIDE 17

Challenging Cases – Cluster boundaries are not well defined

Cluster Detection

slide-18
SLIDE 18

Learn features for cluster detection and characterizing state

§ Use a multi-layer perceptron stacked autoencoder to generate features that describe the state of a simulation frame
§ Generate automatically extracted features representing molecular simulation data
§ Establish framework for building future tools using learned features

Expected outcome:
§ Improvement in the understanding of protein formation and easier handling of large-scale molecular dynamics output

[Diagram: Molecular CNN produces molecular features; an encoder maps input X to a code z (state of frame), and a decoder reconstructs X']
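The encoder/decoder idea can be shown at toy scale. The sketch below is a simplification, not the stacked model described on this slide: it trains a single-hidden-layer autoencoder (tanh encoder, linear decoder, no biases) with plain stochastic gradient descent on synthetic 4-value "frames" that really have only two degrees of freedom. All names, sizes, and hyperparameters are assumptions.

```python
import math
import random


def make_frames(n, rng):
    """Synthetic frames: 4 features generated from 2 hidden variables."""
    frames = []
    for _ in range(n):
        a, b = rng.uniform(-1, 1), rng.uniform(-1, 1)
        frames.append([a, b, 0.5 * (a + b), 0.5 * (a - b)])
    return frames


def init_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def encode(w_enc, x):
    """Code z = tanh(W_enc x)."""
    return [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in w_enc]


def decode(w_dec, z):
    """Reconstruction x' = W_dec z."""
    return [sum(w * zi for w, zi in zip(row, z)) for row in w_dec]


def mean_loss(w_enc, w_dec, frames):
    """Mean squared reconstruction error over all frames."""
    total = 0.0
    for x in frames:
        x_hat = decode(w_dec, encode(w_enc, x))
        total += sum((p - q) ** 2 for p, q in zip(x_hat, x))
    return total / len(frames)


def train(frames, code_dim=2, lr=0.02, epochs=200, seed=0):
    rng = random.Random(seed)
    n_in = len(frames[0])
    w_enc = init_matrix(code_dim, n_in, rng)
    w_dec = init_matrix(n_in, code_dim, rng)
    for _ in range(epochs):
        for x in frames:
            z = encode(w_enc, x)
            x_hat = decode(w_dec, z)
            err = [p - q for p, q in zip(x_hat, x)]  # gradient of 0.5*||x'-x||^2
            # Backpropagate through the decoder, then through tanh.
            grad_z = [sum(err[i] * w_dec[i][j] for i in range(n_in))
                      for j in range(code_dim)]
            grad_pre = [g * (1.0 - zj * zj) for g, zj in zip(grad_z, z)]
            for i in range(n_in):
                for j in range(code_dim):
                    w_dec[i][j] -= lr * err[i] * z[j]
            for j in range(code_dim):
                for i in range(n_in):
                    w_enc[j][i] -= lr * grad_pre[j] * x[i]
    return w_enc, w_dec


frames = make_frames(32, random.Random(1))
untrained = train(frames, epochs=0)    # same seed, so same initial weights
trained = train(frames, epochs=200)
loss_before = mean_loss(untrained[0], untrained[1], frames)
loss_after = mean_loss(trained[0], trained[1], frames)
code = encode(trained[0], frames[0])   # compressed 2-value frame description

print(loss_before, loss_after, code)
```

No labels are used anywhere in training, which is the point of the unsupervised approach: the code z is learned purely from reconstructing the input.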

Cluster Detection

slide-19
SLIDE 19

§ Do learned features outperform hand-selected features for cluster detection?
§ Do we have enough labeled data to learn complex representations?
§ Does the compressed frame representation provide a good basis for representing MD simulation state?
§ Can we develop state descriptions that are meaningful to domain experts?

Can we leverage deep learning for static state?

slide-20
SLIDE 20

Water Bilayer RAS RAS

Coupled Phase-Field Particle Model

§ Bilayer and water mapped to sheets with concentration and height fields
§ RAS mapped to particles as “point” particles

State Transition

slide-21
SLIDE 21

Statistical Multi-scale Coupling

Phase Field Statistical atoms

Density, composition, and curvature consistent with the phase field

Full dynamical atoms

Accelerate particles with parallel replica dynamics

State Transition

slide-22
SLIDE 22

First-escape time distribution: $p(t) = k e^{-kt}$

Parallel Replica Dynamics
A.F. Voter, Phys. Rev. B 57, R13985 (1998)

Parallelizes time evolution

Assumptions:
  • infrequent events
  • exponential distribution of first-escape times
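The exponential assumption is what makes replicas add up. If each of M independent replicas escapes with rate k, the earliest escape among them is exponentially distributed with rate M k, so the wall-clock wait shrinks by a factor of M while the time summed over all replicas keeps the single-replica mean of 1/k. The sketch below checks this numerically by sampling escape times directly; it does not run any dynamics, and the rate, replica count, and function name are arbitrary choices for illustration.

```python
import random


def parallel_replica_escape(rate, n_replicas, rng):
    """Sample one parallel-replica escape.

    Returns (wall_clock_time, accumulated_time): the time until the first
    replica escapes, and that time summed over all replicas.
    """
    wall_clock = min(rng.expovariate(rate) for _ in range(n_replicas))
    return wall_clock, n_replicas * wall_clock


RATE = 0.5          # k, escapes per unit time (assumed value)
N_REPLICAS = 8      # M
N_TRIALS = 20000

rng = random.Random(42)
samples = [parallel_replica_escape(RATE, N_REPLICAS, rng) for _ in range(N_TRIALS)]
mean_wall_clock = sum(s[0] for s in samples) / N_TRIALS
mean_accumulated = sum(s[1] for s in samples) / N_TRIALS

print(mean_wall_clock)    # close to 1 / (M k) = 0.25
print(mean_accumulated)   # close to 1 / k = 2.0
```

This is also why state identification matters: the scheme only works if an escape from the current state can be detected reliably in every replica.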

State Transition

slide-23
SLIDE 23

Phase Field

Ensemble Multi-scale (Statistical Coupling)

Phase field parameters determined via atomistic MD

Many 100k atom MD simulations

State Transition

slide-24
SLIDE 24

Phase Field

Ensemble Multi-scale (Statistical Coupling)

Phase field parameters determined via atomistic MD

Many 100k’s atom MD simulations

Effective long range RAS-RAS interactions

State Transition

slide-25
SLIDE 25

Accelerated Dynamics Require “State” Identification
Can Machine Learning Help?

[Figure: energy landscapes; simple basin (low dimensional) versus complex basin (high dimensional)]

State Transition

slide-26
SLIDE 26

Can the compressed representation over time be used for:

§ predicting state transitions?
§ formation of interesting temporal sequences?
§ creating initial conditions?
§ steering through simulation state space?

Can we leverage deep learning for dynamic state?

slide-27
SLIDE 27

[Diagram: the Molecular CNN and frame autoencoder feed per-frame states into a chain of RNN cells that emit outputs y₁, y₂, ..., yt-₁, yt; the chain continues with states st₊₁, st₊₂ and outputs yt₊₁, yt₊₂]

§ Training over both space and time makes models very large
§ Compressive autoencoders provide data reduction

Can Recurrent Networks provide building blocks for advanced analysis?

§ Potential RNN outputs:
— predicted states
— is the current state the same as previous
— is the simulation exploring an interesting configuration
§ Starting point for ML exploration
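The recurrent stage can be sketched as a forward pass of a plain (Elman-style) RNN cell over a sequence of compressed frame codes. The weights here are random and untrained, so the outputs carry no meaning; the sketch only shows how a hidden state carries information from frame to frame and how one output is produced per frame. The cell type, dimensions, and names are assumptions, since the slides do not specify them.

```python
import math
import random


def init_matrix(rows, cols, rng, scale=0.3):
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vector):
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def rnn_forward(codes, w_in, w_rec, w_out):
    """Run h_t = tanh(W_in s_t + W_rec h_{t-1}), y_t = W_out h_t over a sequence."""
    hidden = [0.0] * len(w_rec)
    outputs = []
    for s in codes:
        pre = [a + b for a, b in zip(matvec(w_in, s), matvec(w_rec, hidden))]
        hidden = [math.tanh(p) for p in pre]
        outputs.append(matvec(w_out, hidden))   # e.g. a predicted next code
    return outputs, hidden


rng = random.Random(0)
CODE_DIM, HIDDEN_DIM, N_FRAMES = 2, 4, 5
w_in = init_matrix(HIDDEN_DIM, CODE_DIM, rng)
w_rec = init_matrix(HIDDEN_DIM, HIDDEN_DIM, rng)
w_out = init_matrix(CODE_DIM, HIDDEN_DIM, rng)

# Stand-in for autoencoder codes of consecutive simulation frames.
codes = [[math.sin(0.3 * t), math.cos(0.3 * t)] for t in range(N_FRAMES)]
predictions, final_hidden = rnn_forward(codes, w_in, w_rec, w_out)

print(len(predictions), final_hidden)
```

Feeding compressed codes rather than raw frames keeps the recurrent weights small, which is the data-reduction role the autoencoder plays above.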

slide-28
SLIDE 28

Phase field model
Free energy and phase diagram from MD simulations
Back-mapping PF to MD
CHARMM (all-atom)
MARTINI (coarse-grain bead)
Phase field membrane simulations
Phase field model
Back-mapped atomistic regions with RAS proteins

Multiscale modeling: Phase Field/MD coupling

Phase Field (PF) Model
§ N-lipid model developed for deformable bilayer
§ Implementation on MOOSE
§ CHARMM-informed free energy

Molecular Dynamics (MD)
§ Petascale MD code supporting CHARMM force field (ddcMD)
§ Strong-scaling framework for long-range forces (FMM)

Continuum-Particle Coupling
§ Back-mapping toolchain developed

Scoping and evaluation studies in progress
§ Accelerated dynamics
§ Machine learning/data analysis
§ Code harness

Free energy functional: f(c1, c2, h)

slide-29
SLIDE 29

[Diagram: two model replicas (Model M0 and Model M1), each with an input layer and layers H1 and H2 spread over four ranks (Ranks 0-3 on nodes N0-N3 and N4-N7), linked by peer-wise communication. Each replica reads its own input data partition (DP0 or DP1) from Lustre as mini-batches MB0-MB3 staged in node-local NVRAM.]

§ Deep Neural Network training / classification
— Optimized distributed memory algorithm
— Train large networks fast
— Optimize for strong & weak scaling
§ Unique HPC resources at scale
— InfiniBand interconnect (low latency / high cross-section bandwidth)
— Tightly-coupled GPU accelerators
— Node-local NVRAM
— High bandwidth Parallel File System
— State-of-the-art distributed linear algebra library
§ Open source under Apache license
— http://software.llnl.gov/lbann
— https://github.com/LLNL/lbann

LBANN: Livermore Big Artificial Neural Network Toolkit
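The model-replica layout can be illustrated with generic data-parallel training. The sketch below does not use LBANN or its API, and it ignores the splitting of each model across ranks; it only shows the replica-level idea, under the assumption that replicas hold identical weights, compute gradients on their own data partition, and average them. A tiny linear model stands in for the network, and all names are hypothetical.

```python
def mse_gradient(weights, batch):
    """Gradient of mean squared error for a linear model y = w . x."""
    grad = [0.0] * len(weights)
    for features, target in batch:
        error = sum(w * x for w, x in zip(weights, features)) - target
        for i, x in enumerate(features):
            grad[i] += 2.0 * error * x / len(batch)
    return grad


def all_reduce_mean(local_grads):
    """Average per-replica gradients (stand-in for a peer-wise exchange)."""
    n = len(local_grads)
    return [sum(g[i] for g in local_grads) / n for i in range(len(local_grads[0]))]


def sgd_step(weights, grad, lr):
    return [w - lr * g for w, g in zip(weights, grad)]


# Eight samples of y = 3 + 2x, split into two equal data partitions.
data = [([1.0, float(i)], 3.0 + 2.0 * i) for i in range(8)]
partitions = [data[:4], data[4:]]      # one partition per model replica

weights = [0.0, 0.0]                   # identical starting weights everywhere
local_grads = [mse_gradient(weights, part) for part in partitions]
averaged = all_reduce_mean(local_grads)
full_batch = mse_gradient(weights, data)

# Every replica applies the same averaged gradient, so weights stay in sync.
replica_weights = [sgd_step(weights, averaged, lr=0.01) for _ in partitions]

print(averaged, full_batch)
```

With equal-sized partitions the averaged gradient matches the gradient over the combined data, so adding replicas increases the amount of data processed per step without changing the update rule.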

slide-30
SLIDE 30

Goal is to develop a predictive molecular-scale model of RAS-driven cancer initiation and growth that can provide the needed insight to accelerate diagnostic and targeted therapy design.

Pilot 2 Aims and Objectives

  • Aim 1: Adaptive time and length scaling in dynamic multi-scale simulations
  • Aim 2: Extended RAS-complex interaction model
  • Aim 3: Development of machine learning for dynamic validation of models

slide-31
SLIDE 31

Multi-disciplinary team from FNLCR, LLNL, LANL, ORNL and ANL

Argonne National Lab: Prasanna Balaprakash, Tom Brettin, FangFang Xia

FNLCR / NCI Team: Frantz Jean-Francois, Frank McCormick, Dhirendra Simanshu, Eric Stahlberg, Andy Stephen, Tommy Turbyville

Oak Ridge National Lab: Pratul K. Agarwal, Debsindhu Bhowmik, Arvind Ramanathan, Blake A. Wilson, Christopher B. Stanley

Lawrence Livermore National Lab: Harsh Bhatia, Barry Belmont, Tim Carpenter, Francesco Di Natale, Jim Glosli, Helgi Ingólfsson, Piyush Karande, Felice Lightstone, Tomas Oppelstrup, Liam Stanton, Michael Surh, Sachin Talathi, Brian Van Essen, Yue Yang, Xiaohua Zhang

Los Alamos National Lab: Angel Garcia, Christoph Junghans, Cesar Lopez, Chris Neale, Danny Perez, Sandrasegaram Gnanakaran, Tim Travers, Art Voter
