NAMD - Scalable Molecular Dynamics Gengbin Zheng 9/1/01 1 - - PowerPoint PPT Presentation



slide-1
SLIDE 1

1

NAMD - Scalable Molecular Dynamics

Gengbin Zheng 9/1/01

slide-2
SLIDE 2

2

Molecular dynamics and NAMD

  • MD is used to understand the structure and function of biomolecules
    – proteins, DNA, membranes
  • NAMD is a production-quality MD program
    – Active use by biophysicists (science publications)
    – 50,000+ lines of C++ code
    – 1000+ registered users
    – Features and “accessories” such as
      • VMD: visualization and analysis
      • BioCoRE: collaboratory
      • Steered and Interactive Molecular Dynamics
slide-3
SLIDE 3

3

Molecular Dynamics

slide-4
SLIDE 4

4

Molecular Dynamics

  • Collection of [charged] atoms, with bonds
  • Like the N-body problem, but much more complicated
  • At each time-step
    – Calculate forces on each atom
      • Non-bonded: electrostatic and van der Waals
      • Bonded: bonds (2 atoms), angles (3 atoms) and dihedrals (4 atoms)
    – Integration: calculate velocities and advance positions
  • 1 femtosecond time-step, millions needed!
  • Thousands of atoms (1,000 - 100,000)
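
The per-time-step loop above (compute forces, then integrate) can be sketched as follows. This is a minimal illustration, not NAMD's code: it assumes a velocity-Verlet integrator (the slides do not name one), includes only a naive electrostatic term with no bonded or van der Waals forces, and uses arbitrary units. The names `coulomb_forces`, `md_step` and `steps_needed` are hypothetical.

```python
import math

def coulomb_forces(positions, charges, k=1.0):
    """Naive O(N^2) pairwise electrostatic forces (illustrative only)."""
    n = len(positions)
    forces = [[0.0, 0.0, 0.0] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = [positions[i][a] - positions[j][a] for a in range(3)]
            r = math.sqrt(sum(c * c for c in d))
            scale = k * charges[i] * charges[j] / r**3
            for a in range(3):
                forces[i][a] += scale * d[a]  # force on atom i due to atom j
                forces[j][a] -= scale * d[a]  # equal and opposite force on j
    return forces

def md_step(positions, velocities, forces, charges, masses, dt):
    """One velocity-Verlet step (assumed integrator): half-kick, drift,
    recompute forces at the new positions, half-kick."""
    n = len(positions)
    for i in range(n):
        for a in range(3):
            velocities[i][a] += 0.5 * dt * forces[i][a] / masses[i]
            positions[i][a] += dt * velocities[i][a]
    new_forces = coulomb_forces(positions, charges)
    for i in range(n):
        for a in range(3):
            velocities[i][a] += 0.5 * dt * new_forces[i][a] / masses[i]
    return new_forces

def steps_needed(simulated_seconds, dt_seconds=1e-15):
    """Number of time-steps needed to cover a simulated duration."""
    return round(simulated_seconds / dt_seconds)
```

With a 1 femtosecond time-step, `steps_needed(1e-9)` shows why "millions needed": a single simulated nanosecond already takes one million steps.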
slide-5
SLIDE 5

5

Cut-off radius

  • Use of cut-off radius to reduce work
    – 8 - 14 Å
    – Far away charges ignored!
  • 80-95% of the work is non-bonded force computation
  • Some simulations need far away contributions
    – Periodic systems: Ewald, Particle-Mesh Ewald (PME)
    – Aperiodic systems: FMA (fast multipole algorithm)
  • Even so, cut-off based computations are important:
    – Near-atom calculations are part of the above
    – Cycles: multiple time-stepping is used: k cut-off steps, 1 PME/FMA
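
A small sketch of the two ideas on this slide: filtering atom pairs by a cut-off radius, and a multiple time-stepping cycle. Both functions are hypothetical illustrations. The schedule encodes one possible reading of "k cut-off steps, 1 PME/FMA", namely that cut-off work runs every step and the long-range evaluation runs on the last step of each cycle of k; the slides do not spell out the exact interleaving.

```python
import math

def pairs_within_cutoff(positions, cutoff):
    """Return index pairs (i, j), i < j, whose atoms are closer than `cutoff`.

    Naive all-pairs scan; it only illustrates which interactions survive
    the cut-off, not how a production code finds them efficiently.
    """
    pairs = []
    for i in range(len(positions)):
        for j in range(i + 1, len(positions)):
            if math.dist(positions[i], positions[j]) < cutoff:
                pairs.append((i, j))
    return pairs

def cycle_schedule(num_steps, k):
    """List the work done at each step under the assumed interleaving:
    cut-off forces every step, long-range (PME/FMA) once per k steps."""
    schedule = []
    for step in range(1, num_steps + 1):
        work = ["cutoff"]
        if step % k == 0:
            work.append("long_range")
        schedule.append(work)
    return schedule
```

For example, with a 12 Å cut-off, two atoms 5 Å apart form a pair while an atom 25 Å away from both is ignored.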

slide-6
SLIDE 6

6

Spatial Decomposition

But the load balancing problems are still severe.

[Figure label: Patch]

slide-7
SLIDE 7

7

[Figure labels: Patch, Compute, Proxy]

slide-8
SLIDE 8

8

FD + SD

  • Now, we have many more objects to load balance:
    – Each diamond can be assigned to any processor
    – Number of diamonds (3D): 14 · number of patches
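
The factor of 14 can be checked by counting. The sketch below assumes a periodic 3D grid of patches with at least three patches along each axis, one compute object ("diamond") per patch for its internal interactions, and one per unordered pair of neighbouring patches. Under those assumptions each patch has 26 neighbours, each pair is shared between two patches, and the total is (1 + 26/2) = 14 per patch. The function name is hypothetical.

```python
from itertools import product

def count_compute_objects(nx, ny, nz):
    """Count self and neighbour-pair compute objects on a periodic patch grid.

    Assumes nx, ny, nz >= 3 so that the 26 neighbours of a patch are distinct.
    """
    dims = (nx, ny, nz)
    computes = set()
    for patch in product(range(nx), range(ny), range(nz)):
        for offset in product((-1, 0, 1), repeat=3):
            neighbour = tuple((patch[a] + offset[a]) % dims[a] for a in range(3))
            # A frozenset makes (A, B) and (B, A) the same object; the zero
            # offset collapses to a one-element set, the self-interaction.
            computes.add(frozenset((patch, neighbour)))
    return len(computes)
```

On a 3 × 3 × 3 grid (27 patches) this counts 378 compute objects, which is 14 × 27.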
slide-9
SLIDE 9

9

Load Balancing

  • Is a major challenge for this application
    – Especially for a large number of processors
  • Unpredictable workloads
    – Each diamond (force object) and patch encapsulates a variable amount of work
    – Static estimates are inaccurate
  • Measurement-based load balancing framework
    – Robert Brunner’s recent Ph.D. thesis
    – Very slow variations across timesteps

slide-10
SLIDE 10

10

Load Balancing

  • Based on migratable objects
  • Collect timing data for several cycles
  • Run heuristic load balancer
    – Several alternative ones:
      • Alg7 - Greedy
      • Refinement
  • Re-map and migrate objects accordingly
    – Registration mechanisms facilitate migration

slide-11
SLIDE 11

11

Load balancing strategy

Greedy variant (simplified):

    Sort compute objects (diamonds)
    Repeat (until all assigned)
        S = set of all processors that:
            - are not overloaded
            - generate least new communication
        P = least loaded processor in S
        Assign heaviest compute to P

Refinement:

    Repeat
        Pick a compute from the most overloaded PE
        Assign it to a suitable underloaded PE
    Until (no movement)

[Figure labels: Cell, Cell, Compute]
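
The greedy and refinement strategies above can be sketched in a few lines. This is an illustrative reading of the slide, not the actual load balancer: the data layout (each compute as a `(load, patch_a, patch_b)` tuple), the `overload_limit` threshold, and the use of "patches not yet present on a processor" as the measure of new communication are all assumptions made for the example.

```python
def greedy_assign(computes, num_pes, patch_home, overload_limit):
    """Greedy sketch: heaviest compute first, placed on the least loaded PE
    among those that stay under the limit and need the fewest new patches."""
    pe_load = [0.0] * num_pes
    present = [set() for _ in range(num_pes)]  # patches (or proxies) per PE
    for patch, pe in patch_home.items():
        present[pe].add(patch)
    assignment = {}
    order = sorted(range(len(computes)), key=lambda c: computes[c][0], reverse=True)
    for c in order:
        load, patch_a, patch_b = computes[c]
        needed = {patch_a, patch_b}
        candidates = [p for p in range(num_pes) if pe_load[p] + load <= overload_limit]
        if not candidates:
            candidates = list(range(num_pes))  # nothing fits: consider every PE
        fewest_new = min(len(needed - present[p]) for p in candidates)
        s = [p for p in candidates if len(needed - present[p]) == fewest_new]
        target = min(s, key=lambda p: pe_load[p])
        assignment[c] = target
        pe_load[target] += load
        present[target] |= needed  # missing patches would arrive as proxies
    return assignment, pe_load

def refine(computes, assignment, pe_load, overload_limit):
    """Refinement sketch: move computes off the most loaded PE, one at a
    time, until it is no longer overloaded or nothing can be moved."""
    moved = True
    while moved:
        moved = False
        donor = max(range(len(pe_load)), key=lambda p: pe_load[p])
        if pe_load[donor] <= overload_limit:
            break
        on_donor = sorted((c for c, p in assignment.items() if p == donor),
                          key=lambda c: computes[c][0], reverse=True)
        for c in on_donor:
            load = computes[c][0]
            fits = [p for p in range(len(pe_load))
                    if p != donor and pe_load[p] + load <= overload_limit]
            if fits:
                target = min(fits, key=lambda p: pe_load[p])
                assignment[c] = target
                pe_load[donor] -= load
                pe_load[target] += load
                moved = True
                break
    return assignment, pe_load
```

In a measurement-based setting, the `load` values would come from timings collected over previous cycles rather than from static estimates.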

slide-12
SLIDE 12

12

[Figure: two plots of time against processors, each showing migratable work, non-migratable work, and the average]

slide-13
SLIDE 13

13

Results on Linux Cluster

[Figure: Speedup on Linux Cluster. Axes: processors, speedup]

slide-14
SLIDE 14

14

Performance of Apo-A1 on ASCI Red

[Figure: speedup plot. Axes: processors, speedup]

slide-15
SLIDE 15

15

Performance of Apo-A1 on O2k and T3E

[Figure: speedup plot. Axes: processors, speedup]

slide-16
SLIDE 16

16

Future and Planned work

  • Increased speedups on 2k-10k processors
    – Smaller grainsizes
    – New algorithms for reducing communication impact
    – New load balancing strategies
  • Further performance improvements for PME/FMA
    – With multiple timestepping
    – Needs multi-phase load balancing

slide-17
SLIDE 17

17

Steered MD: example picture

Image and simulation by the Theoretical Biophysics Group, Beckman Institute, UIUC