Accelerated Astrophysics: Using NVIDIA GPUs to Simulate and Understand the Universe

Prof. Brant Robertson
Department of Astronomy and Astrophysics, University of California, Santa Cruz
brant@ucsc.edu, @brant_robertson


SLIDE 1

UC Santa Cruz Astrophysics @brant_robertson NVIDIA GTC2017

Accelerated Astrophysics: Using NVIDIA GPUs to Simulate and Understand the Universe

  • Prof. Brant Robertson

Department of Astronomy and Astrophysics, University of California, Santa Cruz
brant@ucsc.edu, @brant_robertson

SLIDE 2

UC Santa Cruz: a world-leading center for astrophysics

  • Home to one of the largest computational astrophysics groups in the world.
  • Home to the University of California Observatories.
  • Worldwide top-5 graduate program for astronomy and astrophysics according to US News and World Report.
  • Many PhD students in our program are interested in professional data science.
  • http://www.astro.ucsc.edu

https://www.usnews.com/education/best-global-universities/space-science

SLIDE 3

GPUs as a scientific tool

[Figure: grid code on a CPU vs. grid code on a GPU]

SLIDE 4

A (brief) intro to finite volume methods

u^{n+1}_{i,j,k} = u^{n}_{i,j,k}
  - \frac{\delta t}{\delta x}\left( F^{n+1/2}_{i+1/2,j,k} - F^{n+1/2}_{i-1/2,j,k} \right)
  - \frac{\delta t}{\delta y}\left( G^{n+1/2}_{i,j+1/2,k} - G^{n+1/2}_{i,j-1/2,k} \right)
  - \frac{\delta t}{\delta z}\left( H^{n+1/2}_{i,j,k+1/2} - H^{n+1/2}_{i,j,k-1/2} \right)

where u^{n+1}_{i,j,k} is the conserved quantity at time n+1, u^{n}_{i,j,k} is the conserved quantity at time n, and F, G, H are the "fluxes" of conserved quantities across each cell face.

[Figure: a simulation cell with axes x, y, z and face fluxes F_{i+1/2,j,k}, G_{i,j+1/2,k}, H_{i,j,k+1/2}.]

SLIDE 5

Conserved variable update in standard C

// i = 0 is skipped: its left-face flux lives in a ghost cell
for (i=1; i<nx; i++) {
  density[i]    += dt/dx * (F.d[i-1]  - F.d[i]);
  momentum_x[i] += dt/dx * (F.mx[i-1] - F.mx[i]);
  momentum_y[i] += dt/dx * (F.my[i-1] - F.my[i]);
  momentum_z[i] += dt/dx * (F.mz[i-1] - F.mz[i]);
  Energy[i]     += dt/dx * (F.E[i-1]  - F.E[i]);
}

Simple loop; potential for loop parallelization, vectorization.
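As one illustration of that parallelization potential, here is an OpenMP version of the same update in plain C. This is a hedged sketch: the `Flux` struct and function name are illustrative stand-ins, not Cholla's actual data layout.

```c
#include <assert.h>

/* Illustrative flux container -- Cholla's real data structures differ. */
typedef struct {
    double *d;   /* density flux at each right cell face    */
    double *mx;  /* x-momentum flux at each right cell face */
} Flux;

/* Conserved-variable update: cell i gains the flux through its left
 * face (stored at index i-1) and loses the flux through its right
 * face (index i). Cell 0 is skipped; its left-face flux lives in a
 * ghost cell. */
void update_conserved(double *density, double *momentum_x,
                      const Flux *F, int nx, double dt, double dx)
{
    #pragma omp parallel for  /* iterations are independent; safe to split */
    for (int i = 1; i < nx; i++) {
        density[i]    += dt / dx * (F->d[i-1]  - F->d[i]);
        momentum_x[i] += dt / dx * (F->mx[i-1] - F->mx[i]);
    }
}
```

Because each cell update touches only its own entries of the conserved arrays, the iterations are independent, which is exactly what makes the loop a good candidate for both threading and SIMD vectorization.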

SLIDE 6

Conserved variable update using CUDA

// copy the conserved variable array onto the GPU
cudaMemcpy(dev_conserved, host_conserved, 5*n_cells*sizeof(Real), cudaMemcpyHostToDevice);

// call the CUDA kernel
Update_Conserved_Variables<<<dimGrid,dimBlock>>>(dev_conserved, F_x, nx, dx, dt);

// copy the conserved variable array back to the CPU
cudaMemcpy(host_conserved, dev_conserved, 5*n_cells*sizeof(Real), cudaMemcpyDeviceToHost);

Memory transfer, CUDA kernel, memory transfer…
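The slide does not show how dimGrid and dimBlock are chosen. A common pattern (the block size of 256 is an illustrative choice, not necessarily Cholla's) is a ceiling division so that every cell gets a thread:

```c
#include <assert.h>

/* Number of thread blocks needed so that
 * blocks * threads_per_block >= n_cells (ceiling division). */
int blocks_for(int n_cells, int threads_per_block)
{
    return (n_cells + threads_per_block - 1) / threads_per_block;
}
```

The kernel would then be launched as Update_Conserved_Variables<<<blocks_for(nx, 256), 256>>>(...), and the bounds check inside the kernel discards the spare threads in the final, partially filled block.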

SLIDE 7

Conserved variable update CUDA kernel

__global__ void Update_Conserved_Variables(Real *dev_conserved, Real *dev_F, int nx, Real dx, Real dt)
{
  // get a global thread ID
  int id = threadIdx.x + blockIdx.x * blockDim.x;

  // update the conserved variable array
  // (cell 0 is skipped: its left-face flux lives in a ghost cell)
  if (id > 0 && id < nx) {
    dev_conserved[       id] += dt/dx * (dev_F[       id-1] - dev_F[       id]);
    dev_conserved[  nx + id] += dt/dx * (dev_F[  nx + id-1] - dev_F[  nx + id]);
    dev_conserved[2*nx + id] += dt/dx * (dev_F[2*nx + id-1] - dev_F[2*nx + id]);
    dev_conserved[3*nx + id] += dt/dx * (dev_F[3*nx + id-1] - dev_F[3*nx + id]);
    dev_conserved[4*nx + id] += dt/dx * (dev_F[4*nx + id-1] - dev_F[4*nx + id]);
  }
}

One-to-one mapping between CUDA thread and simulation cell; coalesced memory accesses for transfer efficiency.
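The indexing in the kernel follows a structure-of-arrays layout, which is what makes the accesses coalesce. A small sketch of the index arithmetic (the helper function is illustrative, not part of Cholla):

```c
#include <assert.h>

/* Structure-of-arrays index: field f of cell id lives at f*n_cells + id.
 * Consecutive threads (consecutive id) therefore touch consecutive
 * addresses within each field array -- a coalesced access pattern,
 * unlike an array-of-structs layout where they would be strided by
 * the number of fields. */
int soa_index(int field, int id, int n_cells)
{
    return field * n_cells + id;
}
```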

SLIDE 8

Cholla: Computational hydrodynamics on all (parallel) architectures

  • A GPU-native, massively-parallel, grid-based hydrodynamics code written by Evan Schneider for her PhD thesis.
  • Incorporates state-of-the-art hydrodynamics algorithms (unsplit integrators, 3rd-order spatial reconstruction, precise Riemann solvers, dual energy formulation, etc.).
  • Includes GPU-accelerated radiative cooling and photoionization.
  • github.com/cholla-hydro/cholla
  • Chollas are also a group of cactus species that grows in the Sonoran Desert of southern Arizona.

Schneider & Robertson (2015)

SLIDE 9

Cholla leverages the world’s most powerful supercomputers

Titan: Oak Ridge Leadership Computing Facility

SLIDE 10

Cholla achieves excellent scaling to >16,000 NVIDIA GPUs

Weak scaling: Total problem size increases, work assigned to each processor stays the same. Strong scaling: Same total problem size, work divided amongst more processors.

Schneider & Robertson (2015, 2017)

Strong scaling test: 512³ cells.

Weak scaling test: ~322³ cells / GPU.

Tests performed on ORNL Titan (allocations AST 109, 115, 125).
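The two scaling modes defined above are usually summarized as efficiencies. A sketch of both definitions (illustrative, not Cholla's benchmarking code): strong scaling compares N GPUs against the single-GPU time on the same problem, weak scaling compares runtimes at fixed work per GPU.

```c
#include <assert.h>

/* Strong scaling efficiency: fixed total work.
 * Ideal case: N GPUs run N times faster than 1 GPU (efficiency 1.0). */
double strong_efficiency(double t_1gpu, double t_Ngpu, double N)
{
    return t_1gpu / (N * t_Ngpu);
}

/* Weak scaling efficiency: fixed work per GPU.
 * Ideal case: runtime stays constant as GPUs are added (efficiency 1.0). */
double weak_efficiency(double t_1gpu, double t_Ngpu)
{
    return t_1gpu / t_Ngpu;
}
```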

SLIDE 11

Example test calculation: 2D implosion test (1024²) with Cholla on NVIDIA GPUs

55,804,166,144 cell updates; solution symmetric about y = x to roundoff error.

[Figure: initial conditions with ρ = 0.1, P = 0.14 and ρ = 1, P = 1.]

SLIDE 12

Application: modeling galactic outflows

Image credit: hubblesite.org

SLIDE 13

Cholla + NVIDIA GPUs form a unique tool for simulating astrophysical fluids.

Cholla can simulate the structure of galactic winds

[Figure: shock front with velocity v_shock approaching a cloud; axes x, y, z.]

Important questions:

  • How do mass and momentum become entrained in galactic winds?
  • How does the detailed structure of galactic winds arise?

SLIDE 14

Cholla can simulate the structure of galactic winds

1.25e9 cells, 512 NVIDIA K20X GPUs on ORNL Titan

Schneider, E. & Robertson, B. 2017, ApJ, 834, 144

SLIDE 15

Leveraging the NVIDIA DGX-1 for astrophysical research

NVIDIA DGX-1: 2x 20-core Intel E5-2698 v4 CPUs, 8x NVIDIA P100 GPUs, 768 GB/s bandwidth, 4x Mellanox EDR InfiniBand NICs.

  • Unlike risk-averse, mission-critical astronomical software, pipeline and high-level analysis software can leverage new and emerging technologies.
  • Utilize investments in software from Silicon Valley, data science, and other industries.
  • UCSC astrophysicists use the NVIDIA DGX-1 for astrophysical simulation and astronomical data analysis.

SLIDE 16

Accelerated simulations of disk galaxies

  • The UCSC Astrophysics DGX-1 system is our development platform for constructing complex initial conditions.
  • The DGX-1 system is powerful enough to perform high-quality Cholla simulations of disk galaxies.

256³ cells, single P100, ~2 hours.

SLIDE 17

Cholla simulations of M82 initial conditions

[Figure: M82 imaged in Hα with the WIYN (Wisconsin Indiana Yale NOAO) telescope, with star clusters embedded in the outflow (Smith, Gallagher & Westmoquette; reproduced in Annu. Rev. Astron. Astrophys. 43:769-826).]

[Simulation domain labels: 2048 × 2048 × 4096 cells (~33,000 ly × ~33,000 ly × ~66,000 ly); gain region; outflow.]

Cholla + Titan global simulations of galactic outflows
SLIDE 18

Cholla + ORNL Titan global simulations of galactic outflows

  • Test calculation on Titan: 1024³ cells, the largest hydrodynamic simulation of a single galaxy ever performed.
  • 512 K20X GPUs, ~6 hours, ~90K core hours.
  • ~47M core-hour allocation (AST-125).

[Figure: x-y and x-z slices of density and temperature.]

SLIDE 19

Using NVIDIA GPUs for astronomical data analysis

[Image: Hubble Ultra Deep Field]

SLIDE 20

Human galaxy classification…

Expert classifications of Hubble images from the CANDELS survey.

Kartaltepe et al., ApJS, 221, 11 (2015)

SLIDE 21

Human galaxy classification does not scale.

New observatories will image >10 billion galaxies.

SLIDE 22

NVIDIA DGX-1

Morpheus — a UCSC deep learning model for astronomical galaxy classification, by Ryan Hausen (Hausen & Robertson, in preparation).

[Figure: network architecture — multiband imaging → series of residual blocks → fully connected layer → classification PDF. Each "residual block" is a set of convolution layers whose output is added to an identity path, keeping the same dimensions.]

SLIDE 23

[Figure: Morpheus preliminary results (Hausen & Robertson).]

SLIDE 24

Summary

  • The Cholla hydrodynamical simulation code, written by Evan Schneider for her PhD thesis supervised by Brant Robertson, uses NVIDIA GPUs to model astrophysical fluid dynamics.
  • UCSC Astrophysics is using the ORNL Titan supercomputer and the DGX-1 system, each powered by NVIDIA GPUs, for astrophysical simulation and astronomical data analysis.
  • The Morpheus deep learning framework for astrophysics is under development by Ryan Hausen at UCSC for automated galaxy classification and other astrophysical machine learning applications.