HOOMD-blue - Scalable Molecular Dynamics and Monte Carlo - PowerPoint PPT Presentation


SLIDE 1

HOOMD-blue: Scalable Molecular Dynamics and Monte Carlo

Joshua Anderson and Jens Glaser

Glotzer Group, Chemical Engineering, University of Michigan
Blue Waters Symposium, Sunriver, OR, 05/12/2015

SLIDE 2

Scaling on OLCF Cray XK7

Figure: strong scaling benchmarks on Titan for all three simulation types (MD, MC, DEM) supported by HOOMD-blue.

Figure: example of colloidal-scale nucleation and growth of a crystal from a fluid of hard octahedra.
SLIDE 3

Applications of HOOMD-blue

Beltran-Villegas et al., Soft Matter 2014
Long, A. W. and Ferguson, A. L., J. Phys. Chem. B 2014
Marson, R. L. et al., Nano Lett. 2014
Trefz, B. et al., PNAS 2014
Nguyen et al., Phys. Rev. Lett. 2014
Knorowski, C. and Travesset, A., JACS 2014
Mahynski, A., Nat. Comm. 2014
Glaser et al., Macromolecules 2014

>100 peer-reviewed publications using HOOMD-blue as of May 2015: http://codeblue.umich.edu/hoomd-blue/publications.html

SLIDE 4

Universality of Block Copolymer Melts

Figure: (χ_e N)_ODT vs. invariant degree of polymerization N̄ for an AB diblock copolymer melt, comparing several simulation models (H, S1, S2, S3 at various chain lengths) with Flory-Huggins (FH) and SCFT predictions.

Glaser, J., Medapuram, P., Beardsley, T. M., Matsen, M. W., and Morse, D. C., PRL 113, 068302 (2014)
Medapuram, P., Glaser, J., and Morse, D. C., Macromolecules 2015, 48, 819-839

SLIDE 5

Spatial domain decomposition

  • Particles can leave and enter domains under periodic boundary conditions
  • Ghost particles are required for force computation
  • Ghost particle positions are updated every time step

Figure: a domain and its ghost layer, defined by the cutoff radius r_cut plus the buffer radius r_buff.
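A minimal sketch of how such a run is set up with the HOOMD-blue 1.x script API; the system, file name, and parameter values are illustrative, not taken from the slides. The same script runs on one GPU or, launched under MPI, with spatial domain decomposition.

# Illustrative LJ liquid with domain decomposition (HOOMD-blue 1.x script API).
# Launch across 8 domains with:  mpirun -n 8 hoomd lj_dd.py
from hoomd_script import *

init.create_random(N=64000, phi_p=0.2)        # box is split automatically across MPI ranks
lj = pair.lj(r_cut=2.5)                       # r_cut sets the interaction range
lj.pair_coeff.set('A', 'A', epsilon=1.0, sigma=1.0)
nlist.set_params(r_buff=0.4)                  # r_buff pads the ghost layer and delays neighbor-list rebuilds
integrate.mode_standard(dt=0.005)
integrate.nvt(group=group.all(), T=1.2, tau=0.5)
run(10000)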

SLIDE 6

Scaling bottlenecks in spatial domain decomposition

Diagram: on each node, a CPU (4-12 cores) and a GPU (1000's of cores) are linked by PCIe at about 6 GB/s; nodes communicate over the network.

SLIDE 7

Compute vs. Communication

Figure: average time per step [μs] vs. P (number of GPUs, Tesla K20X) for N=64,000 particles, broken down into migrate, ghost exchange, ghost update, neighbor list, force, and total communication.

SLIDE 8

Optimization of the communication algorithm

  • Device-resident data
  • Autotune kernels
  • Overlap synchronization with computation

Figure: profile of one MD time step (about 50 μs) showing MPI and GPU activity: pair force, NVT integration, thermodynamics, pack/unpack kernels on the GPU, and collective communication, with communication overlapped with computation and kernels auto-tuned.
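The idea of overlapping synchronization with computation can be illustrated with non-blocking MPI. This is a conceptual Python sketch using mpi4py and NumPy, not HOOMD-blue's internal C++/CUDA implementation, and all array sizes are made up.

# Post the ghost exchange, do ghost-independent work, then finish after the wait.
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()
left, right = (rank - 1) % size, (rank + 1) % size

ghosts_out = np.random.rand(128, 3)      # ghost positions to send to a neighboring domain
ghosts_in = np.empty_like(ghosts_out)    # buffer for incoming ghost positions

requests = [comm.Isend(ghosts_out, dest=right),    # 1. start the non-blocking ghost update
            comm.Irecv(ghosts_in, source=left)]

interior = np.random.rand(4096, 3)
interior_forces = -interior              # 2. stand-in for force work that needs no ghosts

MPI.Request.Waitall(requests)            # 3. wait, then do the ghost-dependent part
boundary_forces = -ghosts_in             # stand-in for the boundary force kernel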

SLIDE 9

Weak scaling up to 108,000,000 particles

Figure: weak scaling at 32,000 particles/GPU; time steps/sec vs. number of GPUs (= number of nodes), comparing HOOMD-blue 1.0 with LAMMPS-GPU 11Nov13.

Trung Nguyen

SLIDE 10

Strong Scaling of a LJ Liquid (N=10,976,000)

Figure: strong scaling; time steps/sec vs. number of GPUs (= number of nodes) for a Lennard-Jones liquid with N=10,976,000, comparing HOOMD-blue 1.0 with LAMMPS-GPU 11Nov13.

Trung Nguyen

SLIDE 11

Strong Scaling Efficiency

80% efficiency at 250,000 particles/GPU

Figure: strong-scaling efficiency [%] vs. N/P (number of particles per GPU) for system sizes N = 256,000; 864,000; 2,048,000; 4,000,000; 6,912,000; 10,976,000.
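Efficiency here can be read as the usual strong-scaling measure (this definition is an assumption; the slide does not spell it out): with S(P) the simulation rate in time steps per second on P GPUs,

\eta(P) = \frac{S(P)}{P\,S(1)} \times 100\%

so 80% efficiency means a P-GPU run delivers 0.8 P times the single-GPU rate.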

SLIDE 12

Polymer Brush Scaling

Jaime Millan

Figure: time steps/sec vs. number of nodes (= number of GPUs) for polymer brushes with N = 107,520; 430,080; 1,720,320 on GPUs, and N = 430,080 on CPUs.

SLIDE 13

GPUDirect RDMA on Wilkes

  • CUDA 5/6

Diagram: data path between the InfiniBand adapter and GPU memory; with GPUDirect RDMA the adapter accesses GPU memory directly rather than staging through the CPU, chipset, and system memory.

Pak Lui, Filippo Spiga, Rong Shi

SLIDE 14

Dissipative Particle Dynamics on Blue Waters and Titan

SLIDE 15

Summary - Molecular Dynamics

  • Multi-GPU support in HOOMD 1.0 enables large-scale MD using spatial domain decomposition
  • Strong scaling extends to 1000's of GPUs, and to more complex systems
  • GPUDirect RDMA is a promising technology, although strong scaling is ultimately limited by PCIe and kernel launch latency

Glaser, J., Nguyen, T. D., Anderson, J. A., et al. Strong scaling of general-purpose molecular dynamics simulations on GPUs. Comput. Phys. Commun. 192, 97-107 (2015). doi:10.1016/j.cpc.2015.02.028

SLIDE 16

Molecular dynamics

  • Tethered nanospheres (Langevin dynamics) - Marson, R. et al., Nano Letters 14, 4 (2014)
  • Surfactant-coated surfaces (dissipative particle dynamics) - Pons-Siepermann, I. C. et al., Soft Matter 6, 3919 (2012)
  • Self-propelled colloids (non-equilibrium MD) - Nguyen, N. et al., Phys. Rev. E 86, 1 (2012)
  • Quasicrystal growth (molecular dynamics) - Engel, M. et al., Nature Materials (in press)

Monte Carlo

  • Truncated tetrahedra (hard particle MC) - Damasceno, P. F. et al., ACS Nano 6, 609 (2012)
  • Arbitrary polyhedra (hard particle MC) - Damasceno, P. F. et al., Science 337, 453 (2012)
  • Interacting nanoplates (hard particle MC with interactions) - Ye, X. et al., Nature Chemistry, cover article (2013)
  • Hard disks, hexatic phase (hard particle MC) - Engel, M. et al., PRE 87, 042134 (2013)

SLIDE 17

Hard particle Monte Carlo

  • Hard particle Monte Carlo plugin for HOOMD-blue
  • 2D shapes
    • Disk
    • Convex (sphero)polygon
    • Concave polygon
    • Ellipse
  • 3D shapes
    • Sphere
    • Ellipsoid
    • Convex (sphero)polyhedron
  • NVT and NPT ensembles
  • Frenkel-Ladd free energy
  • Parallel execution on a single GPU
  • Domain decomposition across multiple nodes (CPUs or GPUs)

Images: example self-assembled structures, including a β-Mn cP20 (A13) crystal viewed along [100].

Damasceno, P. F. et al., Science 337, 453 (2012); Engel, M. et al., PRE 87, 042134 (2013); Damasceno, P. F. et al., ACS Nano 6, 609 (2012)

SLIDE 18

Easy and flexible to use

from hoomd_script import *
from hoomd_plugins import hpmc

init.read_xml(filename='init.xml')

# hard convex polygon MC; d and a are the maximum trial translation and rotation
mc = hpmc.integrate.convex_polygon(seed=10, d=0.25, a=0.3)
# unit square for particle type 'A'
mc.shape_param.set('A', vertices=[(-0.5, -0.5), (0.5, -0.5), (0.5, 0.5), (-0.5, 0.5)])

run(10e3)
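One way to launch such a script (assuming the standard HOOMD-blue 1.x command-line launcher; the file name is hypothetical): hoomd square_mc.py on a single GPU, or mpirun -n 4 hoomd square_mc.py to use domain decomposition across four ranks.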

SLIDE 19

SLIDE 20

SLIDE 21

Overlap checks

  • Disk/sphere - trivial
  • Convex polygons - separating axis (see the sketch below)
  • Concave polygons - brute force
  • Spheropolygons - XenoCollide/GJK
  • Convex polyhedra - XenoCollide/GJK
  • Ellipsoid / ellipse - matrix method
  • Compute the separation Δr in double precision, then convert to single for the expensive overlap check (e.g. 1001.842 - 1000.967 = 0.875)

Diagrams: separating axis and XenoCollide overlap tests.
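As an illustration of the separating-axis check for convex polygons, here is a minimal, self-contained Python/NumPy sketch; it is a generic textbook version, not HOOMD-blue's internal implementation.

# Separating axis test: two convex polygons are disjoint iff some edge normal
# of either polygon separates their projections.
import numpy as np

def overlap_convex_polygons(poly_a, poly_b):
    """Return True if convex polygons poly_a and poly_b (N x 2 vertex arrays) overlap."""
    for p, q in ((poly_a, poly_b), (poly_b, poly_a)):
        for i in range(len(p)):
            edge = p[(i + 1) % len(p)] - p[i]
            axis = np.array([-edge[1], edge[0]])   # normal to this edge (need not be normalized)
            proj_p, proj_q = p @ axis, q @ axis    # project all vertices onto the axis
            if proj_p.max() < proj_q.min() or proj_q.max() < proj_p.min():
                return False                       # a gap on any axis proves no overlap
    return True

square = np.array([(-0.5, -0.5), (0.5, -0.5), (0.5, 0.5), (-0.5, 0.5)])
print(overlap_convex_polygons(square, square + (0.4, 0.0)))   # True: overlapping
print(overlap_convex_polygons(square, square + (2.0, 0.0)))   # False: separated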

SLIDE 22

Divergence

Figure: per-thread timeline (0.5 ms total) of the hard particle MC kernel, panels (a) and (b): initialization, trial move, circumsphere check, and overlap check, with overlap divergence and early-exit divergence highlighted.

SLIDE 23

Strong scaling - squares

GPU: Tesla K20X, CPU: Xeon E5-2680 (XSEDE Stampede)

Figure: trial moves per second vs. P (number of GPUs or CPU cores) for N = 4,096; 65,536; 1,048,576 hard squares, on GPUs and CPUs; 80% and 50% efficiency levels and a 29x speedup are indicated.

SLIDE 24

Weak scaling - truncated octahedra (3D)

GPU: Tesla K20X on Cray XK7, CPU: AMD bulldozer on Cray XE6

Figure: trial moves / N / sec vs. number of nodes (8 to 1000) for truncated octahedra, comparing XK7 (GPU) and XE6 (CPU) nodes; the XK7 nodes are about 1.6x faster.

SLIDE 25

Questions?

Funding / Resources

  • National Science Foundation, Division of Materials Research Award # DMR 1409620
  • This work was partially supported by a Simons Investigator award from the Simons Foundation to Sharon Glotzer.
  • This work used the Extreme Science and Engineering Discovery Environment (XSEDE), which is supported by National Science Foundation grant number OCI-1053575.
  • This research is part of the Blue Waters sustained petascale computing project, which is supported by the National Science Foundation (award number ACI 1238993) and the state of Illinois. Blue Waters is a joint effort of the University of Illinois at Urbana-Champaign and its National Center for Supercomputing Applications.
  • This research used resources of the Oak Ridge Leadership Computing Facility at the Oak Ridge National Laboratory, which is supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC05-00OR22725.

email: joaander@umich.edu

HOOMD-blue: http://codeblue.umich.edu/hoomd-blue
The Monte Carlo code is not yet publicly available.

  • It will eventually be released open-source as part of HOOMD-blue
  • Paper on hard disks: Anderson, J. A. et al., JCP 254, 27-38 (2013)
  • Paper on 3D, anisotropic shapes, multi-GPU: coming soon