DRAFT
Using a Hybrid Cray Supercomputer to Model Non-Icing Surfaces for Cold-Climate Wind Turbines
Accelerating Three-Body Potentials using GPUs (NVIDIA Tesla K20X)
Masako Yamada, GE Global Research
2 GE Title or job number 11/11/2013
Opportunity in Cold-Climate Wind

- Installed wind energy capacity > 285 GW and growing
- Cold regions favorable
- Lower human population
- Good wind conditions
- 45-50 GW opportunity from 2013-2017, ~$2 million/MW installed
- Technical need: anti-icing surfaces
- 3-10% energy losses due to icing
- Shut-downs
- Active heating expensive
VTT Technical Research Centre of Finland
http://www.vtt.fi/news/2013/28052013_wind_energy.jsp?lang=en
ALCC Awards: 40 + 40 million hours

DOE ASCR Leadership Computing Challenge Awards, energy-relevant applications

- 1. Non-Icing Surfaces for Cold-Climate Wind Turbines
- Jaguar (Cray XK6) at Oak Ridge National Lab
- Molecular dynamics using LAMMPS
- 1-million-molecule mW water droplets on engineered surfaces
- Completed >300 simulations
- Achieved >200x speedup from 2011 to 2013
- >5x from GPU acceleration
- 2. Accelerated Non-Icing Surfaces for Cold-Climate Wind Turbines
- Titan (Cray XK7, hybrid) at Oak Ridge National Lab
- "Time parallelization" via Parallel Replica method
- Expected 10-100x faster results
Titan enables leadership-class study
- Size of simulation ~ 1 million molecules
- Droplet size >> critical nucleus size
- Mimic physical dimensions (*somewhat)
- Duration of simulation ~ 1 microsecond
- Nucleation is an activated process
- Freezing rarely observed in MD simulations
- Number of simulations ~ 100’s
- Study requires “embarrassingly parallel” runs
- Different surfaces, ambient temperatures, conductivity
- Multiple replicates required due to stochastic nature
*a million-molecule droplet is ~50 nm in diameter
Personal history with MD

Year      Software/Language     # of Molecules   Hardware
1995      Pascal                Few              Desktop Mac
2000      C, Fortran90          Hundreds         IBM SP, SGI O2K
2010      NAMD, LAMMPS          1000's           Linux HPC
Present   GPU-enabled LAMMPS    Millions         Titan
>200x overall speedup since 2011
- 1. Switched to mW water potential
3-body model is more expensive/complex than 2-body but
- Particle reduction – at least 3x
- Timestep increase – 10x
- No long-range forces
- 2. LAMMPS dynamic load balance – 2-3x
- 3. GPU acceleration of 3-body model – 5x
2011: 6 femtoseconds / 1024 CPU-seconds (SPC/E)
2013: 2 picoseconds / 1024 CPU-seconds (mW)
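As a sanity check, the component speedups above can be multiplied through. This sketch takes the conservative end of each quoted range (and ignores the unquantified saving from dropping long-range forces), then cross-checks against the throughput figures:

```python
# Component speedups quoted on this slide, conservative end of each range.
particle_reduction = 3   # mW: 1 particle/molecule vs 3 for point-charge water
timestep_increase = 10   # larger stable timestep with mW
load_balance = 2         # low end of the 2-3x dynamic load-balance range
gpu = 5                  # GPU acceleration of the 3-body kernel
print(particle_reduction * timestep_increase * load_balance * gpu)  # 300

# Cross-check against the measured throughput figures:
fs_2011 = 6       # femtoseconds simulated per 1024 CPU-seconds (SPC/E)
fs_2013 = 2000    # 2 picoseconds = 2000 fs per 1024 CPU-seconds (mW)
print(fs_2013 / fs_2011 > 200)  # True: consistent with ">200x overall"
```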
- 1. mW water potential
Stillinger-Weber 3-body model; 1 particle = one water molecule
- Introduced in 2009, Nature paper in 2011
- Bulk water properties comparable or better than existing
point-charge models
- Much faster than point-charge models
- Exemplary test case by authors: 180x faster than SPC/E
- GE production simulation: 40-50x faster than SPC/E
Benchmark: asymmetric million-molecule droplet on an engineered surface, loaded onto 64 nodes (SPC/E vs. mW)
- 2. LAMMPS dynamic load balance
- Introduced in 2012
- Adjusts size of processor sub-domains to equalize number of particles
- 2-3x speedup for 1-million-molecule droplets on 64 nodes (with user-specified processor mapping)

[Figure: no load balancing vs. default load balancing vs. user-specified mapping]
- 3. GPU-acceleration of 3-body potential
For details, see: Brown, W. M. and Yamada, M. Implementing Molecular Dynamics on Hybrid High Performance Computers – Three-Body Potentials. Computer Physics Communications (2013).
Load 1 million molecules on Host/CPU

- 1 million molecules, 64 nodes
- Processor sub-domains correspond to "spatial" partitioning of the droplet
- 8 MPI tasks/node
- 1 core/paired-unit
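The per-node and per-task workloads implied by this decomposition are simple arithmetic:

```python
# Spatial decomposition of the droplet across Titan nodes.
molecules = 1_000_000
nodes = 64
mpi_tasks_per_node = 8

per_node = molecules // nodes
per_task = per_node // mpi_tasks_per_node
print(per_node)  # 15625, the "~15,000 molecules per node" quoted in the deck
print(per_task)  # 1953 molecules per MPI task (sub-domain)
```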
Per node ~ 15,000 molecules

Host: AMD Opteron 6274 CPU (16 cores: Core0 ... Core15; host memory)
Accelerator: NVIDIA Tesla K20X GPU (14 processors, 192 cores each; global, local, and per-core private memory)

[Diagram: a "kernel" executes on the GPU as work groups composed of work items]

Work item = fundamental unit of activity
Parallelization in LAMMPS

- Accelerator: 3-body potential, neighbor lists
- Host: time integration, thermostat/barostat, bond/angle calculations, statistics
Generic 3-body potential

V = Σ_j Σ_{k≠j} Σ_{l>k} φ(q_j, q_k, q_l)   if s_jk < s_d and s_jl < s_d
V = 0 otherwise

s_d = cutoff; s_β = neighbor skin; q = particle coordinates (s_jk, s_jl = distances from central atom j)

Good candidate for GPU:
- 1. Occupies majority of computational time
- 2. Can be decomposed into independent kernels/work-items

Examples: Stillinger-Weber, MEAM, Tersoff, REBO/AIREBO, bond-order, ...
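A direct serial evaluation of this truncated triple sum can be sketched as follows (`phi` here is a hypothetical smooth 3-body term, not any of the listed potentials; the cutoff logic is the point):

```python
import numpy as np

rng = np.random.default_rng(1)
pos = rng.uniform(0.0, 3.0, size=(20, 3))  # 20 particles in a 3x3x3 box
s_d = 1.2                                  # cutoff distance

def phi(qj, qk, ql):
    """Hypothetical 3-body term (the real phi depends on the model)."""
    return np.cos(np.dot(qk - qj, ql - qj))

# V = sum over j, k != j, l > k of phi, restricted to s_jk < s_d and
# s_jl < s_d (all three atoms distinct).
V = 0.0
n = len(pos)
for j in range(n):
    for k in range(n):
        if k == j:
            continue
        s_jk = np.linalg.norm(pos[k] - pos[j])
        if s_jk >= s_d:
            continue                       # zero outside the cutoff
        for l in range(k + 1, n):
            if l == j:
                continue
            s_jl = np.linalg.norm(pos[l] - pos[j])
            if s_jl < s_d:
                V += phi(pos[j], pos[k], pos[l])
print(V)
```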
Redundant Computation Approach

Atom-decomposition
- 1 atom, 1 computational kernel only
- fewest operations (and effective parallelization), but
- shared memory access is a bottleneck

Force-decomposition
- 1 atom, 3 computational kernels required
- redundant computations, but
- reduced shared memory issues
- many work-items = more effective use of cores
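The trade-off above can be contrasted in a toy sketch (Python stand-in for the GPU kernels; `phi3` is a hypothetical 3-body term, not the Stillinger-Weber form): atom-decomposition evaluates each triplet once but scatters writes to three atoms, while force-decomposition re-evaluates each triplet once per member atom but each work-item writes only its own accumulator.

```python
import numpy as np

rng = np.random.default_rng(0)
pos = rng.standard_normal((5, 3))

def phi3(j, k, l):
    """Hypothetical 3-body term (stand-in for the real angular term)."""
    a, b = pos[k] - pos[j], pos[l] - pos[j]
    return (a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))

triplets = [(j, k, l) for j in range(5) for k in range(5) for l in range(5)
            if k != j and l != j and l > k]

# Atom-decomposition: one kernel per central atom j; each evaluation must
# scatter contributions to j, k and l (shared-memory writes).
evals_atom = 0
energy_atom = np.zeros(5)
for j, k, l in triplets:
    evals_atom += 1
    e = phi3(j, k, l)
    energy_atom[[j, k, l]] += e / 3   # writes to three different atoms

# Force-decomposition: one work-item per (atom, triplet) pair; each
# work-item recomputes the term but writes only its own accumulator.
evals_force = 0
energy_force = np.zeros(5)
for i in range(5):
    for j, k, l in triplets:
        if i in (j, k, l):
            evals_force += 1
            energy_force[i] += phi3(j, k, l) / 3

assert np.allclose(energy_atom, energy_force)
print(evals_force / evals_atom)  # 3.0: the cost of avoiding shared writes
```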
Stillinger-Weber Parallelization

V = Σ_j [ Σ_{k<j} φ₂(s_jk) + Σ_{k≠j} Σ_{l>k} φ₃(s_jk, s_jl, θ_kjl) ]

3 kernels, no data dependencies:
- 2-body operations
- 3-body operations with (s_jk < s_β) .AND. (s_jl < s_β) == .TRUE. (update forces on i only)
- 3-body operations with (s_jk < s_β) .AND. (s_jl < s_β) == .FALSE. (neighbor-of-neighbor interactions)
Neighbor List
- 3-body force-decomposition approach involves
neighbor-of-neighbor operations
- Requires additional overhead
- increase in border size shared by two processes
- neighbor list for ghost atoms “straddling” across cores
- GPU implementation not necessarily faster than CPU, but less time is spent in host-accelerator data transfer (note: neighbor lists are huge)
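Why neighbor-of-neighbor data is needed can be shown with a minimal sketch (brute-force list build; `neighbor_list` is an illustrative helper, not the LAMMPS implementation): for a triplet centered on atom j, the work-item updating atom k must see atom l, which may lie outside k's own cutoff.

```python
import numpy as np

def neighbor_list(pos, cutoff):
    """Brute-force neighbor list (production codes use binned cell lists)."""
    n = len(pos)
    d = np.linalg.norm(pos[:, None, :] - pos[None, :, :], axis=-1)
    return [set(np.flatnonzero((d[i] < cutoff) & (np.arange(n) != i)))
            for i in range(n)]

# Three collinear atoms, spacing 1.0, cutoff 1.5: the work-item updating
# atom 0 in the triplet centered on atom 1 needs atom 2, which is a
# neighbor of atom 1 but NOT of atom 0 -- a neighbor-of-neighbor.
pos = np.array([[0., 0., 0.], [1., 0., 0.], [2., 0., 0.]])
neigh = neighbor_list(pos, 1.5)
j, k, l = 1, 0, 2
print(l in neigh[k])  # False: atom 2 is outside atom 0's cutoff
print(l in neigh[j])  # True:  but inside its neighbor's cutoff
```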
GPU acceleration benefit
- >5x speedup achieved in production run: water droplet of 1 million molecules on engineered surface (64 nodes)
- Not limited to Stillinger-Weber -- applicable to MEAM, Tersoff, REBO, AIREBO, bond-order, etc.
Implementation
6 different surfaces
Interaction potential developed at GE Global Research
Freezing front propagation
Visualization of “latent heat” release
Visualizing crystalline regions

- Steinhardt-Nelson order parameter + particle mobility
- Side view and bottom view
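A minimal Steinhardt q_l calculation can be sketched in a few lines (an illustrative implementation, not the shape-matching analysis code actually used; `steinhardt_q` is a hypothetical helper, and SciPy's `sph_harm` is assumed):

```python
import numpy as np
from scipy.special import sph_harm

def steinhardt_q(l, bonds):
    """Steinhardt q_l for one particle, from the vectors to its neighbors."""
    bonds = np.asarray(bonds, dtype=float)
    r = np.linalg.norm(bonds, axis=1)
    polar = np.arccos(bonds[:, 2] / r)              # angle from +z
    azimuth = np.arctan2(bonds[:, 1], bonds[:, 0])  # angle in xy-plane
    total = 0.0
    for m in range(-l, l + 1):
        # scipy convention: sph_harm(m, l, azimuthal, polar)
        qlm = sph_harm(m, l, azimuth, polar).mean()
        total += abs(qlm) ** 2
    return np.sqrt(4 * np.pi / (2 * l + 1) * total)

# Six bonds of a perfect simple-cubic site: q4 = sqrt(7/12) ~ 0.764
sc = np.array([[1, 0, 0], [-1, 0, 0], [0, 1, 0],
               [0, -1, 0], [0, 0, 1], [0, 0, -1]], dtype=float)
print(round(steinhardt_q(4, sc), 4))  # 0.7638
```

Crystalline (ice-like) particles are then flagged by thresholding q_l, typically q6, against the values seen in the liquid.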
Advanced visualization

Mike Matheson, Oak Ridge National Lab
[Will include visuals/movies here]
Next steps
- Quasi "time parallelization" using Parallel Replica Method
- Launch dozens of replicates simultaneously; monitor ensemble behavior
- Expected outcome: 10-100x faster results
- Analysis and application of simulation results
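The Parallel Replica idea can be illustrated with a toy model (a sketch only, not the actual ParRep algorithm, which also handles dephasing and correlated events): if a rare escape such as nucleation is modeled as a Poisson process, the first event across N independent replicas arrives roughly N times sooner in wall-clock time.

```python
import random

def wallclock_to_first_event(rate, n_replicas, rng):
    """Wall-clock wait until the first escape among n_replicas independent
    replicas, with escape modeled as a Poisson process of the given rate."""
    return min(rng.expovariate(rate) for _ in range(n_replicas))

rng = random.Random(42)
trials = 10_000
serial = sum(wallclock_to_first_event(1.0, 1, rng) for _ in range(trials)) / trials
replica = sum(wallclock_to_first_event(1.0, 64, rng) for _ in range(trials)) / trials
print(f"mean wait, 1 replica  : {serial:.3f}")
print(f"mean wait, 64 replicas: {replica:.4f}")
print(f"observed wall-clock speedup: {serial / replica:.1f}x")
```

With dozens of replicas, this is the order-of-magnitude source of the expected 10-100x faster results.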
Credits
- Mike Brown (ORNL) – GPU acceleration
- Paul Crozier (Sandia) – dynamic load balancing
- Valeria Molinero (Utah) – mW potential
- Aaron Keyes (Umich, Berkeley) – Steinhardt-Nelson order parameters
- Art Voter/Danny Perez (LANL) – Parallel Replica method
- Mike Matheson (ORNL) -- Visualization
- Jack Wells, Suzy Tichenor (ORNL) – General
- Azar Alizadeh, Branden Moore, Rick Arthur, Margaret Blohm (GE Global Research)
This research was conducted in part under the auspices of the Office of Advanced Scientific Computing Research, Office of Science, U.S. Department of Energy under Contract No. DEAC05-00OR22725 with UT- Battelle, LLC. This research was also conducted in part under the auspices of the GE Global Research High Performance Computing program.
References
- http://www.vtt.fi/news/2013/28052013_wind_energy.jsp?lang=en
- Brown, W. M. and Yamada, M. Implementing molecular dynamics on hybrid high performance computers – three-body potentials. Computer Physics Communications (2013).
- Shi, B. and Dhir, V. K. Molecular dynamics simulation of the contact angle of liquids on solid surfaces. The Journal of Chemical Physics, 130, 3 (2009), 034705.
- Sergi, D., Scocchi, G. and Ortona, A. Molecular dynamics simulations of the contact angle between water droplets and graphite surfaces. Fluid Phase Equilibria, 332 (2012), 173-177.
- Oxtoby, D. W. Homogeneous nucleation: theory and experiment. Journal of Physics: Condensed Matter, 4, 38 (1992), 7627.
- Plimpton, S. Fast parallel algorithms for short-range molecular dynamics. Journal of Computational Physics, 117, 1 (1995), 1-19.
- Humphrey, W., Dalke, A. and Schulten, K. VMD: Visual molecular dynamics. Journal of Molecular Graphics, 14, 1 (1996), 33-38.
- Keys, A. S. Shape Matching Analysis Code. University of Michigan (2011).
- Keys, A. S., Iacovella, C. R. and Glotzer, S. C. Characterizing structure through shape matching and applications to self-assembly. Annual Review of Condensed Matter Physics, 2, 1 (2011), 263-285.
- Steinhardt, P. J., Nelson, D. R. and Ronchetti, M. Bond-orientational order in liquids and glasses. Physical Review B, 28, 2 (1983), 784-805.
- Stillinger, F. H. and Weber, T. A. Computer simulation of local order in condensed phases of silicon. Physical Review B, 31, 8 (1985), 5262-5271.
- Berendsen, H. J. C., Grigera, J. R. and Straatsma, T. P. The missing term in effective pair potentials. The Journal of Physical Chemistry, 91, 24 (1987), 6269-6271.
- Molinero, V. and Moore, E. B. Water modeled as an intermediate element between carbon and silicon. The Journal of Physical Chemistry B, 113, 13 (2009), 4008-4016.
- Moore, E. B. and Molinero, V. Structural transformation in supercooled water controls the crystallization rate of ice. Nature, 479, 7374 (2011), 506-508.
- Yamada, M., Mossa, S., Stanley, H. E. and Sciortino, F. Interplay between time-temperature transformation and the liquid-liquid phase transition in water. Physical Review Letters, 88, 19 (2002), 195701.
- Brown, W. M., Wang, P., Plimpton, S. J. and Tharrington, A. N. Implementing molecular dynamics on hybrid high performance computers – short range forces. Computer Physics Communications, 182, 4 (2011), 898-911.
- Voter, A. F. Parallel replica method for dynamics of infrequent events. Physical Review B, 57, 22 (1998), R13985-R13988.