SLIDE 1

Scaling Challenges in NAMD: Past and Future

NAMD Team: Abhinav Bhatele, Eric Bohm, Sameer Kumar, David Kunzman, Chao Mei, Chee Wai Lee, Kumaresh P., James Phillips, Gengbin Zheng, Laxmikant Kale, Klaus Schulten

SLIDE 2

Outline

  • NAMD: An Introduction
  • Past Scaling Challenges

– Conflicting Adaptive Runtime Techniques
– PME Computation
– Memory Requirements

  • Performance Results
  • Comparison with other MD codes
  • Future Challenges:

– Load Balancing
– Parallel I/O
– Fine-grained Parallelization

SLIDE 3

What is NAMD?

  • A parallel molecular dynamics application
  • Simulates the life of a bio-molecule
  • How is the simulation performed?

– Simulation window broken down into a large number of time steps (typically 1 fs each)
– Forces on every atom calculated every time step
– Velocities and positions updated and atoms migrated to their new positions (sketched below)
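A rough illustration of the loop described above, in plain C++ (not NAMD's actual integrator; the atom type and the force routine are simplified placeholders):

```cpp
#include <vector>

struct Atom { double x[3], v[3], f[3], mass; };

// Placeholder for NAMD's bonded + non-bonded (+ PME) force evaluation.
void computeForces(std::vector<Atom>& atoms) {
    for (Atom& a : atoms) a.f[0] = a.f[1] = a.f[2] = 0.0;  // stub
}

void runSimulation(std::vector<Atom>& atoms, int numSteps, double dt /* ~1 fs */) {
    for (int step = 0; step < numSteps; ++step) {
        computeForces(atoms);                    // forces on every atom, every step
        for (Atom& a : atoms) {
            for (int d = 0; d < 3; ++d) {
                a.v[d] += dt * a.f[d] / a.mass;  // update velocities
                a.x[d] += dt * a.v[d];           // update positions
            }
        }
        // In NAMD, atoms that have left their patch's spatial region
        // migrate to the neighboring patch at this point in the step.
    }
}
```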

SLIDE 4

How is NAMD parallelized?

HYBRID DECOMPOSITION
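A minimal sketch of the hybrid idea: spatial decomposition of the box into patches, plus a separate set of compute objects, one per pair of neighboring patches, that do the force work. The types and the neighbor test are simplified stand-ins, not NAMD's data structures:

```cpp
#include <cstdlib>
#include <vector>

// Spatial decomposition: each patch owns the atoms in one region of the box.
struct Patch { int ix, iy, iz; std::vector<int> atomIds; };

// Force decomposition: one compute object per pair of neighboring patches,
// responsible for the non-bonded interactions between their atoms.
struct Compute { int patchA, patchB; };

static bool areNeighbors(const Patch& a, const Patch& b) {
    return std::abs(a.ix - b.ix) <= 1 && std::abs(a.iy - b.iy) <= 1 &&
           std::abs(a.iz - b.iz) <= 1;
}

// Computes (not patches) are the units the load balancer later places on
// processors, which is where the extra parallelism comes from.
std::vector<Compute> buildComputes(const std::vector<Patch>& patches) {
    std::vector<Compute> computes;
    for (std::size_t i = 0; i < patches.size(); ++i)
        for (std::size_t j = i; j < patches.size(); ++j)
            if (areNeighbors(patches[i], patches[j]))
                computes.push_back({static_cast<int>(i), static_cast<int>(j)});
    return computes;
}
```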

SLIDE 5

SLIDE 6

What makes NAMD efficient?

  • Charm++ runtime support

– Asynchronous message-driven model
– Adaptive overlap of communication and computation

SLIDE 7

[Diagram: flow of data between patch integration and the bonded and non-bonded computes via multicasts, point-to-point messages, reductions, and PME; time profile legend: non-bonded work, bonded work, integration, PME, communication]

SLIDE 8

What makes NAMD efficient?

  • Charm++ runtime support

– Asynchronous message-driven model
– Adaptive overlap of communication and computation

  • Load balancing support

– Difficult problem: balancing heterogeneous computation
– Measurement-based load balancing (see the sketch below)
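A minimal sketch of what "measurement-based" means in practice: times measured for each compute during earlier steps drive a greedy re-assignment, heaviest compute first, onto the currently least loaded processor. This illustrates the idea only; it is not Charm++'s load balancer interface, and real strategies also weigh communication and topology:

```cpp
#include <algorithm>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

// One migratable compute object and its execution time measured at runtime.
struct Work { int id; double measuredTime; };  // ids assumed to be 0..N-1

// Returns procOf[id] = processor assigned to compute `id`.
std::vector<int> rebalance(std::vector<Work> work, int numProcs) {
    // Heaviest computes first.
    std::sort(work.begin(), work.end(),
              [](const Work& a, const Work& b) { return a.measuredTime > b.measuredTime; });

    using Proc = std::pair<double, int>;  // (current load, processor id)
    std::priority_queue<Proc, std::vector<Proc>, std::greater<Proc>> leastLoaded;
    for (int p = 0; p < numProcs; ++p) leastLoaded.push({0.0, p});

    std::vector<int> procOf(work.size(), -1);
    for (const Work& w : work) {
        auto [load, p] = leastLoaded.top();   // least loaded processor so far
        leastLoaded.pop();
        procOf[w.id] = p;
        leastLoaded.push({load + w.measuredTime, p});
    }
    return procOf;
}
```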

SLIDE 9

What makes NAMD highly scalable?

  • Hybrid decomposition scheme
  • Variants of this hybrid scheme used by Blue Matter and Desmond

SLIDE 10

Scaling Challenges

  • Scaling few-thousand-atom simulations to tens of thousands of processors

– Interaction of adaptive runtime techniques
– Optimizing the PME implementation

  • Running multi-million-atom simulations on machines with limited memory

– Memory Optimizations

SLIDE 11

Conflicting Adaptive Runtime Techniques

  • Patches multicast data to computes
  • At each load balancing step, computes are re-assigned to processors

  • Spanning tree re-built after computes have migrated

SLIDE 12

SLIDE 13

SLIDE 14

  • Solution

– Persistent spanning trees (sketched below)
– Centralized spanning tree creation

  • Unifying the two techniques
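A simplified picture of the persistent spanning tree used for a patch's multicast: the destination processors are arranged into a shallow tree with a fixed branching factor, so the patch's processor sends only to its children and they forward onward. The branching factor and layout below are assumptions for illustration; in NAMD the tree is created centrally and kept until computes migrate at the next load balancing step:

```cpp
#include <vector>

// Spanning tree over the processors holding computes that need one patch's atoms.
// childrenOf[i] lists the tree children of destination i (dests[0] is the root).
struct SpanningTree {
    std::vector<int> dests;
    std::vector<std::vector<int>> childrenOf;
};

SpanningTree buildTree(const std::vector<int>& dests, int branchingFactor = 4) {
    SpanningTree t;
    t.dests = dests;
    t.childrenOf.resize(dests.size());
    for (std::size_t i = 0; i < dests.size(); ++i) {
        for (int c = 1; c <= branchingFactor; ++c) {
            std::size_t child = branchingFactor * i + c;
            if (child < dests.size()) t.childrenOf[i].push_back(dests[child]);
        }
    }
    return t;  // reused every step until the next load balancing phase
}
```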
SLIDE 15

PME Calculation

  • Particle Mesh Ewald (PME) method used for long-range interactions

– 1D decomposition of the FFT grid

  • PME is a small portion of the total computation

– 1D is better than a 2D decomposition for small numbers of processors

  • On larger partitions

– Use a 2D decomposition
– More parallelism and better overlap (see the sketch below)
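The difference between the two schemes is easiest to see in how grid points map to processors: a 1D (slab) decomposition hands out whole planes, so parallelism is capped at the number of planes, while a 2D (pencil) decomposition hands out lines of the grid, exposing far more parallelism and smaller messages that overlap better with the rest of the step. A simplified sketch (grid dimensions and mapping are illustrative, not NAMD's code):

```cpp
// FFT grid of gx * gy * gz points.
struct FftGrid { int gx, gy, gz; };

// 1D (slab) decomposition: a processor owns whole x-y planes,
// so at most gz processors can share the PME work.
int slabOwner(const FftGrid& g, int z, int numPes) {
    return z * numPes / g.gz;
}

// 2D (pencil) decomposition: a processor owns lines of grid points along x,
// so up to gy * gz processors can participate, with more but smaller
// transpose messages that overlap better with other computation.
int pencilOwner(const FftGrid& g, int y, int z, int numPes) {
    long long pencilIndex = static_cast<long long>(z) * g.gy + y;
    return static_cast<int>(pencilIndex * numPes /
                            (static_cast<long long>(g.gy) * g.gz));
}
```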

SLIDE 16

Automatic Runtime Decisions

  • Use of 1D or 2D algorithm for PME
  • Use of spanning trees for multicast
  • Splitting of patches for fine-grained parallelism
  • Depend on:

– Characteristics of the machine
– No. of processors
– No. of atoms in the simulation
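Conceptually these choices reduce to a few startup-time tests on the machine, the processor count, and the system size. The thresholds below are invented placeholders purely to show the shape of such logic; they are not the rules NAMD actually applies:

```cpp
struct RunConfig {
    bool use2dPme;          // 2D (pencil) instead of 1D (slab) PME decomposition
    bool useSpanningTrees;  // tree-based multicast from patches to computes
    bool splitPatches;      // split patches for finer-grained parallelism
};

// numPes = number of processors; numAtoms = size of the molecular system.
// All thresholds are illustrative assumptions, not NAMD's actual values.
RunConfig chooseAlgorithms(int numPes, long numAtoms, bool torusNetwork) {
    RunConfig cfg;
    cfg.use2dPme         = numPes > 1024;                 // need more PME parallelism
    cfg.useSpanningTrees = numPes > 512 && torusNetwork;  // worthwhile on large partitions
    cfg.splitPatches     = numAtoms / numPes < 500;       // too few atoms per processor
    return cfg;
}
```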

SLIDE 17

Reducing the memory footprint

  • Exploit the fact that building blocks for a bio-molecule have common structures
  • Store information about a particular kind of atom only once
SLIDE 18

[Diagram: two water molecules; instead of storing absolute atom indices (14332-14334 and 14495-14497) for each bond, a shared signature records the partners as relative offsets (-1, +1)]

SLIDE 19

Reducing the memory footprint

  • Exploit the fact that building blocks for a bio-molecule have common structures
  • Store information about a particular kind of atom only once (sketched below)
  • Static atom information increases only with the addition of unique proteins in the simulation
  • Allows simulation of the 2.8 M-atom Ribosome on Blue Gene/L
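A sketch of the compression idea: the static data for each kind of atom (mass, charge, bonded-term offsets) is stored once in a signature table, and every atom instance keeps only its dynamic state plus an index into that table. Bonded partners are recorded as relative offsets, so all copies of the same building block (e.g., every water) share one entry. The structure names are illustrative, not NAMD's:

```cpp
#include <cstdint>
#include <vector>

// Stored once per unique kind of atom (per building block of the molecule).
struct AtomSignature {
    double mass;
    double charge;
    std::vector<int> bondPartnerOffsets;  // relative to the atom's own index, e.g. -1, +1
};

// Stored per atom: dynamic state plus a reference to the shared static data.
struct AtomInstance {
    double position[3];
    double velocity[3];
    std::uint16_t signatureId;            // index into the signature table
};

struct CompressedMolecule {
    std::vector<AtomSignature> signatures;  // grows only with unique building blocks
    std::vector<AtomInstance>  atoms;       // grows with atom count, but each entry is small
};
```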

SLIDE 20

Memory Reduction

[Chart: memory usage (MB, log scale from 0.01 to 1000) for the IAPP, DHFR, Lysozyme, ApoA1, F1-ATPase, STMV, Bar Domain, and Ribosome benchmarks, comparing the original and new versions; the new version stays below 0.5 MB for the smallest benchmarks]

SLIDE 21

NAMD on Blue Gene/L

1 million atom simulation on 64K processors (LLNL BG/L)

SLIDE 22

NAMD on Cray XT3/XT4

5570 atom simulation on 512 processors at 1.2 ms/step

SLIDE 23

Comparison with Blue Matter

  • Blue Matter developed specifically for Blue Gene/L

Time for ApoA1 (ms/step)

  • NAMD running on 4K cores of XT3 is comparable to BM running on 32K cores of BG/L

SLIDE 24

Time for ApoA1 (ms/step):

Number of Nodes            512     1024    2048    4096    8192    16384
Blue Matter (2 pes/node)   38.42   18.95   9.97    5.39    3.14    2.09
NAMD CO mode (No MTS)      19.59   11.42   7.48    5.52    4.2     3.46
NAMD CO mode (1 pe/node)   16.83   9.73    5.8     3.78    2.71    2.04
NAMD VN mode (No MTS)      11.99   9.99    5.62    5.3     3.7     -
NAMD VN mode (2 pes/node)  9.82    6.26    4.06    3.06    2.29    2.11

SLIDE 25

Comparison with Desmond

  • Desmond is a proprietary MD program

Time (ms/step) for Desmond on 2.4 GHz Opterons and NAMD on 2.6 GHz Xeons

  • Uses single precision and exploits SSE instructions
  • Low-level InfiniBand primitives tuned for MD

SLIDE 26

Time (ms/step):

Number of Cores   8       16      32      64      128     256     512     1024    2048
NAMD DHFR         27.3    14.9    8.09    4.3     2.4     1.5     1.1     1.0     -
Desmond DHFR      41.4    21.0    11.5    6.3     3.7     2.0     1.4     -       -
NAMD ApoA1        199.3   104.9   50.7    26.5    13.4    7.1     4.2     2.5     1.9
Desmond ApoA1     256.8   126.8   64.3    33.5    18.2    9.4     5.2     3.0     2.0

SLIDE 27

NAMD on Blue Gene/P

SLIDE 28

Future Work

  • Optimizing PME computation

– Use of one-sided puts between FFTs

  • Reducing communication and other overheads with increasing fine-grained parallelism

  • Running NAMD on Blue Waters

– Improved distributed load balancers
– Parallel Input/Output

SLIDE 29

Summary

  • NAMD is a highly scalable and portable MD program

– Runs on a variety of architectures
– Available free of cost on machines at most supercomputing centers
– Supports a range of sizes of molecular systems

  • Uses adaptive runtime techniques for high scalability
  • Automatic runtime selection of the algorithms best suited to the scenario
  • With new optimizations, NAMD is ready for the next generation of parallel machines

SLIDE 30

Questions?