

  1. Accelerating Large-scale Phase Field Simulation with GPU Jian Zhang, Computer Network Information Center (CNIC), Chinese Academy of Sciences

  2. Outline ➢ Background: Phase Field Model; Large Scale Simulations ➢ Compute-intensive large-time-step algorithm: cETD Schemes; Localized Exponential Integration ➢ Acceleration on heterogeneous platforms: GPU; Sunway TaihuLight; MIC ➢ Summary

  3. Background

  4. Micro-structures in Materials. Micro-structures: meso-scale morphological patterns

  5. Micro-structures in Materials: Fatigue Failure

  6. Phase Field Model ➢ Order-parameter fields: phase field, composition field, etc. ➢ Gradient-flow system

  7. Phase Field Model

  8. Explicit time marching: small time step. Allen-Cahn (AC) equation. Martin Bauer et al., SC2015: 8×10^9 cells, SuperMUC, Hornet and JUQUEEN. Takashi Shimokawabe et al., SC2011: 4×10^9 cells, TSUBAME 2.0. Tomohiro Takaki et al., Acta Materialia, 2016: 4×10^9 cells, TSUBAME 2.5.

  9. Energy stability

  10. Large Scale Phase Field Simulations

      AC equation, explicit time marching       | CH equation, implicit time marching
      Small time step-size                      | Large time step-size
      Integration scheme design: easy           | Integration scheme design: hard
      Stencil computing                         | Multi-level preconditioner-solver
      Performance ~ 25% of peak                 | Performance < 10% of peak
      Large scale simulation ~ 10 billion cells | Large scale simulation ~ 0.1 billion cells

      The limited resolution of 3D (CH) simulations constitutes a bottleneck in validating predictions based on the phase field approach. Needed: an accurate large-time-step marching scheme, scalability, and efficiency.
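
      The governing equations appear only as images in the original deck. For reference, the standard textbook forms of the two models compared above, written as gradient flows of the Ginzburg-Landau free energy (my transcription, not copied from the slides), are:

      ```latex
      % Ginzburg-Landau free energy with double-well potential F(u)
      E(u) = \int_\Omega \Big( \frac{\epsilon^2}{2} |\nabla u|^2 + F(u) \Big)\, dx

      % Allen-Cahn: L^2 gradient flow, non-conserved order parameter
      u_t = -\frac{\delta E}{\delta u} = \epsilon^2 \Delta u - F'(u)

      % Cahn-Hilliard: H^{-1} gradient flow, conserved order parameter
      u_t = \Delta \frac{\delta E}{\delta u} = \Delta \big( F'(u) - \epsilon^2 \Delta u \big)
      ```

      The fourth-order (biharmonic) operator in CH is what makes explicit schemes severely time-step limited and motivates the large-time-step schemes below.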

  11. Compute-intensive large-time-step algorithm

  12. Exponential Time Differencing (ETD). For $u_t = Lu + N(u, t)$, the exact (Duhamel) step is

      $u(t_{n+1}) = e^{L\Delta t} u(t_n) + e^{L\Delta t} \int_0^{\Delta t} e^{-Ls} N(u(t_n + s), t_n + s)\, ds$

      The linear part is integrated exactly; $N$ inside the integral is approximated polynomially. Stable at large time step-sizes given exact integration and a proper splitting of $L$ and $N$; high-order accuracy via multistep, prediction-correction, or Runge-Kutta variants.
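
      As a minimal illustration of the formula above (my own sketch, not the speaker's code): freezing $N$ at its value at $t_n$ makes the integral exact and gives the first-order ETD step $u_{n+1} = e^{L\Delta t} u_n + L^{-1}(e^{L\Delta t} - I)\, N(u_n, t_n)$:

      ```python
      import numpy as np
      from scipy.linalg import expm

      def etd1_step(L, N, u, t, dt):
          """One first-order ETD step for u_t = L u + N(u, t).

          Freezing N at (u_n, t_n) makes the Duhamel integral exact:
              u_{n+1} = e^{L dt} u_n + L^{-1} (e^{L dt} - I) N(u_n, t_n)
          Dense expm/solve is for illustration only; large-scale codes
          exploit the structure of L (see the localization slide).
          """
          E = expm(dt * L)                                   # e^{L dt}
          phi1 = np.linalg.solve(L, E - np.eye(L.shape[0]))  # L^{-1}(e^{L dt} - I)
          return E @ u + phi1 @ N(u, t)

      # Example: stiff linear part, mild cubic (Allen-Cahn-like) nonlinearity
      L = np.diag([-100.0, -1.0])
      N = lambda u, t: -u**3
      u = np.array([1.0, 0.5])
      u = etd1_step(L, N, u, 0.0, 0.1)   # stable despite dt*|lambda| = 10
      ```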

  13. Second-order ETD scheme: unconditionally energy stable
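
      The scheme itself is only an image in the original deck. For orientation, a representative second-order ETD multistep formula (Cox and Matthews, 2002) for a single mode $u_t = cu + N(u,t)$, which is not necessarily the energy-stable variant on the slide, reads:

      ```latex
      u_{n+1} = u_n e^{c\Delta t}
              + N_n \, \frac{(1 + c\Delta t)\, e^{c\Delta t} - 1 - 2c\Delta t}{c^2 \Delta t}
              + N_{n-1} \, \frac{-e^{c\Delta t} + 1 + c\Delta t}{c^2 \Delta t}
      ```

      As $c \to 0$ this reduces to the second-order Adams-Bashforth step $u_{n+1} = u_n + \Delta t(\tfrac{3}{2}N_n - \tfrac{1}{2}N_{n-1})$; the unconditional energy stability claimed on the slide depends on how $L$ and $N$ are split for the Cahn-Hilliard equation.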

  14. Time Integration Accuracy. High-order accuracy in time is important for simulating coarsening dynamics with large-time-step schemes. The time step-size can be 10-100X larger than for 1st-order implicit schemes, and more than 4 orders of magnitude larger than for the explicit Euler scheme. (Figure: comparison of 1st-order stabilized semi-implicit Euler, 1st-order cETD, and 2nd-order cETD.) Extensive numerical experiments can be found in "Fast and accurate algorithms for simulating coarsening dynamics of Cahn-Hilliard equations", Computational Materials Science, 108 (2015), pp. 272-282.

  15. Example

  16. Example

  17. Localized ETD. Standard ETD applies $e^{L\Delta t}$ to the global field $U$, where $L$ is an $(N_x N_y N_z) \times (N_x N_y N_z)$ matrix [M. Hochbruck and A. Ostermann, "Exponential integrators," Acta Numerica, vol. 19, pp. 209-286, 2010]. Compact ETD (cETD) instead applies $e^{A\Delta t}$ with small per-dimension ($N_x \times N_x$, etc.) matrices, based on an FD spatial discretization plus subdomain coupling techniques. Efficient direct subdomain integration; overlapping BC & discretization; large time step-size; stable and accurate; compute intensive.
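
      The matrix sketches on this slide did not survive extraction, but the computational pattern can be illustrated. Under the assumption that the discrete linear operator is a Kronecker sum, $L = A_x \oplus A_y \oplus A_z$ (my notation), $e^{L\Delta t}$ factors into small per-dimension exponentials applied as tensor dot products, which is the "tensor dot product" workload profiled on the GPU slides below. A minimal NumPy sketch:

      ```python
      import numpy as np
      from scipy.linalg import expm

      def apply_compact_etd(U, Ax, Ay, Az, dt):
          """Apply e^{L dt} to a 3-D field U when L = Ax (+) Ay (+) Az is a
          Kronecker sum: the exponential factors into one small matrix
          exponential per dimension, each applied as a tensor dot product."""
          Ex, Ey, Ez = expm(dt * Ax), expm(dt * Ay), expm(dt * Az)
          U = np.einsum('ai,ijk->ajk', Ex, U)   # contract along x
          U = np.einsum('bj,ajk->abk', Ey, U)   # contract along y
          U = np.einsum('ck,abk->abc', Ez, U)   # contract along z
          return U

      def lap1d(n, h):
          """Second-order 1-D Laplacian with Dirichlet BCs (illustration only)."""
          return (-2.0 * np.eye(n) + np.eye(n, k=1) + np.eye(n, k=-1)) / h**2

      n = 32
      A = lap1d(n, 1.0 / n)
      U = np.random.rand(n, n, n)
      U = apply_compact_etd(U, A, A, A, dt=1e-4)   # one linear-part step
      ```

      Each contraction is effectively a batched matrix multiplication, which is consistent with the near-DGEMM tensor-dot throughput (~66% of peak) reported on the performance slide.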

  18. GPU Acceleration

  19. MPI Communication: 26 adjacent subdomains; twice per step; 3-round scheme
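
      The slide gives only bullet points. A common realization of a 3-round scheme that reaches all 26 neighbors is to exchange ghost layers axis by axis (x, then y, then z), letting edge and corner data travel through ghost cells filled in earlier rounds, so six messages replace 26. A sketch of that interpretation with mpi4py (all names here are mine, not the speaker's code):

      ```python
      import numpy as np
      from mpi4py import MPI

      def halo_exchange_3round(comm_cart, u, g):
          """Exchange ghost layers of width g along each axis in turn.

          Each round sends slabs that span the full extent of the other
          two axes (including ghosts filled by earlier rounds), so edge
          and corner data reaches all 26 neighbors after 3 rounds.
          """
          for axis in range(3):
              lo, hi = comm_cart.Shift(axis, 1)   # lower/upper neighbor ranks
              send_lo = [slice(None)] * 3; send_lo[axis] = slice(g, 2 * g)
              send_hi = [slice(None)] * 3; send_hi[axis] = slice(-2 * g, -g)
              recv_lo = [slice(None)] * 3; recv_lo[axis] = slice(0, g)
              recv_hi = [slice(None)] * 3; recv_hi[axis] = slice(-g, None)
              buf_lo = np.empty(u[tuple(recv_lo)].shape, u.dtype)
              buf_hi = np.empty(u[tuple(recv_hi)].shape, u.dtype)
              comm_cart.Sendrecv(np.ascontiguousarray(u[tuple(send_lo)]),
                                 dest=lo, recvbuf=buf_hi, source=hi)
              comm_cart.Sendrecv(np.ascontiguousarray(u[tuple(send_hi)]),
                                 dest=hi, recvbuf=buf_lo, source=lo)
              if hi != MPI.PROC_NULL: u[tuple(recv_hi)] = buf_hi
              if lo != MPI.PROC_NULL: u[tuple(recv_lo)] = buf_lo

      comm = MPI.COMM_WORLD.Create_cart(
          dims=MPI.Compute_dims(MPI.COMM_WORLD.size, 3), periods=[True] * 3)
      u = np.zeros((32 + 2, 32 + 2, 32 + 2))   # field with 1-cell ghost layers
      halo_exchange_3round(comm, u, g=1)
      ```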

  20. Simulation setup. P100-PCIe-12GB: 4.7 TFlops DP (= 4812.8 GFlops); 540 GB/s memory bandwidth. Subdomain: 768*768*384 = 0.2109G points; 216 subdomains = 45G points. 20,000~50,000 time steps; average step size ~10,000X vs. explicit. Each subdomain is divided into 192*192*192 blocks when calculating matrix exponentials, so ~32 tensor dot products are performed simultaneously. Total work: ~2.45 TFlop per step.
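
      A quick check of the quoted sizes (my arithmetic; the 0.2109G figure is binary, i.e. Gi points):

      ```python
      # Points per subdomain (768 x 768 x 384 grid):
      pts = 768 * 768 * 384
      print(pts, round(pts / 2**30, 4))    # 226492416 -> 0.2109 Gi points
      print(round(216 * pts / 2**30, 1))   # ~45.6 Gi points over 216 subdomains
      # 192^3 blocks inside one subdomain:
      print((768 // 192) * (768 // 192) * (384 // 192))   # 4 * 4 * 2 = 32 blocks
      ```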

  21. Performance. Between-subdomain communication: 73 ms (pack, copy, MPI). Tensor dot products: 2.42 TFlop @ 3.19 TFlops = 759 ms, ~66% of peak. Stencil & pointwise: 47 ms. Overall: 2,787 GFlops DP, ~58% of peak, ~880 ms/step. Explicit FD scheme for comparison: stencil, 12.8 GFlop/step @ 40% of peak, ~6.2 ms/step. Since one ETD step covers ~10,000 explicit steps, compare 10,000 explicit steps = 62 s against ~0.88 s per ETD step: ETD is ~70X faster!

  22. Other Platforms

  23. Sunway TaihuLight: 40,960 SW26010 many-core processors; 260 cores per processor, divided into 4 core groups (CGs), each with 1 MPE + 64 CPEs; 8 GB main memory per CG; 64 KB SPM per CPE. MPI is recommended among CGs; DMA is available between SPM and main memory.

  24. Performance Analysis. DGEMM: 457.2 and 408.5 GFlops (60% and 53% of peak). Aggregate DMA bandwidth in T and SP: ~22 GB/s. Overall: 316.1 to 324.5 GFlops, 41%-42% of peak.

  25. Summary

  26. Summary. A promising algorithm for a variety of architectures: large time step, scalable, compute intensive. The idea is applicable to other stiff evolution equations: fluid dynamics, structure-fluid interaction... Thank you!
