Accelerated Astrophysics: Using NVIDIA GPUs to Simulate and Understand the Universe

Prof. Brant Robertson
Department of Astronomy and Astrophysics, University of California, Santa Cruz
brant@ucsc.edu, @brant_robertson


SLIDE 1

UC Santa Cruz Astrophysics @brant_robertson NVIDIA GTC2017

Accelerated Astrophysics: Using NVIDIA GPUs to Simulate and Understand the Universe

  • Prof. Brant Robertson

Department of Astronomy and Astrophysics, University of California, Santa Cruz
brant@ucsc.edu, @brant_robertson

SLIDE 2

UC Santa Cruz: a world-leading center for astrophysics

  • Home to one of the largest computational astrophysics groups in the world.
  • Home to the University of California Observatories.
  • Worldwide top-5 graduate program for astronomy and astrophysics according to US News and World Report.
  • Many PhD students in our program are interested in professional data science.
  • http://www.astro.ucsc.edu

https://www.usnews.com/education/best-global-universities/space-science

SLIDE 3

GPUs as a scientific tool

[Figure: grid code on a CPU vs. grid code on a GPU]

SLIDE 4

A (brief) intro to finite volume methods

u^{n+1}_{i,j,k} = u^{n}_{i,j,k}
  - \frac{\delta t}{\delta x}\left( F^{n+1/2}_{i+1/2,j,k} - F^{n+1/2}_{i-1/2,j,k} \right)
  - \frac{\delta t}{\delta y}\left( G^{n+1/2}_{i,j+1/2,k} - G^{n+1/2}_{i,j-1/2,k} \right)
  - \frac{\delta t}{\delta z}\left( H^{n+1/2}_{i,j,k+1/2} - H^{n+1/2}_{i,j,k-1/2} \right)

where u^{n+1}_{i,j,k} is the conserved quantity at time n+1, u^{n}_{i,j,k} is the conserved quantity at time n, and F, G, H are the "fluxes" of conserved quantities across each cell face.

[Figure: a simulation cell with axes x, y, z and face fluxes F_{i+1/2,j,k}, G_{i,j+1/2,k}, H_{i,j,k+1/2}.]

SLIDE 5

Conserved variable update in standard C

// i = 0 is skipped: its left-face flux lives in a ghost cell
for (i=1; i<nx; i++) {
  density[i]    += dt/dx * (F.d[i-1]  - F.d[i]);
  momentum_x[i] += dt/dx * (F.mx[i-1] - F.mx[i]);
  momentum_y[i] += dt/dx * (F.my[i-1] - F.my[i]);
  momentum_z[i] += dt/dx * (F.mz[i-1] - F.mz[i]);
  Energy[i]     += dt/dx * (F.E[i-1]  - F.E[i]);
}

Simple loop; potential for loop parallelization, vectorization.
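As one illustration of that parallelization potential, here is an OpenMP version of the same update in plain C. This is a hedged sketch: the `Flux` struct and function name are illustrative stand-ins, not Cholla's actual data layout.

```c
#include <assert.h>

/* Illustrative flux container -- Cholla's real data structures differ. */
typedef struct {
    double *d;   /* density flux at each right cell face    */
    double *mx;  /* x-momentum flux at each right cell face */
} Flux;

/* Conserved-variable update: cell i gains the flux through its left
 * face (stored at index i-1) and loses the flux through its right
 * face (index i). Cell 0 is skipped; its left-face flux lives in a
 * ghost cell. */
void update_conserved(double *density, double *momentum_x,
                      const Flux *F, int nx, double dt, double dx)
{
    #pragma omp parallel for  /* iterations are independent; safe to split */
    for (int i = 1; i < nx; i++) {
        density[i]    += dt / dx * (F->d[i-1]  - F->d[i]);
        momentum_x[i] += dt / dx * (F->mx[i-1] - F->mx[i]);
    }
}
```

Because each cell update touches only its own entries of the conserved arrays, the iterations are independent, which is exactly what makes the loop a good candidate for both threading and SIMD vectorization.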

SLIDE 6

Conserved variable update using CUDA

// copy the conserved variable array onto the GPU
cudaMemcpy(dev_conserved, host_conserved, 5*n_cells*sizeof(Real), cudaMemcpyHostToDevice);

// call the CUDA kernel
Update_Conserved_Variables<<<dimGrid,dimBlock>>>(dev_conserved, F_x, nx, dx, dt);

// copy the conserved variable array back to the CPU
cudaMemcpy(host_conserved, dev_conserved, 5*n_cells*sizeof(Real), cudaMemcpyDeviceToHost);

Memory transfer, CUDA kernel, memory transfer…
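The slide does not show how dimGrid and dimBlock are chosen. A common pattern (the block size of 256 is an illustrative choice, not necessarily Cholla's) is a ceiling division so that every cell gets a thread:

```c
#include <assert.h>

/* Number of thread blocks needed so that
 * blocks * threads_per_block >= n_cells (ceiling division). */
int blocks_for(int n_cells, int threads_per_block)
{
    return (n_cells + threads_per_block - 1) / threads_per_block;
}
```

The kernel would then be launched as Update_Conserved_Variables<<<blocks_for(nx, 256), 256>>>(...), and the bounds check inside the kernel discards the spare threads in the final, partially filled block.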

SLIDE 7

Conserved variable update CUDA kernel

__global__ void Update_Conserved_Variables(Real *dev_conserved, Real *dev_F, int nx, Real dx, Real dt)
{
  // get a global thread ID
  int id = threadIdx.x + blockIdx.x * blockDim.x;

  // update the conserved variable array
  // (cell 0 is skipped: its left-face flux lives in a ghost cell)
  if (id > 0 && id < nx) {
    dev_conserved[       id] += dt/dx * (dev_F[       id-1] - dev_F[       id]);
    dev_conserved[  nx + id] += dt/dx * (dev_F[  nx + id-1] - dev_F[  nx + id]);
    dev_conserved[2*nx + id] += dt/dx * (dev_F[2*nx + id-1] - dev_F[2*nx + id]);
    dev_conserved[3*nx + id] += dt/dx * (dev_F[3*nx + id-1] - dev_F[3*nx + id]);
    dev_conserved[4*nx + id] += dt/dx * (dev_F[4*nx + id-1] - dev_F[4*nx + id]);
  }
}

One-to-one mapping between CUDA thread and simulation cell; coalesced memory accesses for transfer efficiency.
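The indexing in the kernel follows a structure-of-arrays layout, which is what makes the accesses coalesce. A small sketch of the index arithmetic (the helper function is illustrative, not part of Cholla):

```c
#include <assert.h>

/* Structure-of-arrays index: field f of cell id lives at f*n_cells + id.
 * Consecutive threads (consecutive id) therefore touch consecutive
 * addresses within each field array -- a coalesced access pattern,
 * unlike an array-of-structs layout where they would be strided by
 * the number of fields. */
int soa_index(int field, int id, int n_cells)
{
    return field * n_cells + id;
}
```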

SLIDE 8

Cholla: Computational hydrodynamics on all (parallel) architectures

  • A GPU-native, massively-parallel, grid-based hydrodynamics code written by Evan Schneider for her PhD thesis.
  • Incorporates state-of-the-art hydrodynamics algorithms (unsplit integrators, 3rd-order spatial reconstruction, precise Riemann solvers, dual energy formulation, etc.).
  • Includes GPU-accelerated radiative cooling and photoionization.
  • github.com/cholla-hydro/cholla
  • Chollas are also a group of cactus species that grows in the Sonoran Desert of southern Arizona.

Schneider & Robertson (2015)

SLIDE 9

Cholla leverages the world’s most powerful supercomputers

Titan: Oak Ridge Leadership Computing Facility

SLIDE 10

Cholla achieves excellent scaling to >16,000 NVIDIA GPUs

Weak scaling: Total problem size increases, work assigned to each processor stays the same. Strong scaling: Same total problem size, work divided amongst more processors.

Schneider & Robertson (2015, 2017)

Strong scaling test: 512³ cells.

Weak scaling test: ~322³ cells / GPU.

Tests performed on ORNL Titan (allocations AST 109, 115, 125).
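The two scaling modes defined above are usually summarized as efficiencies. A sketch of both definitions (illustrative, not Cholla's benchmarking code): strong scaling compares N GPUs against the single-GPU time on the same problem, weak scaling compares runtimes at fixed work per GPU.

```c
#include <assert.h>

/* Strong scaling efficiency: fixed total work.
 * Ideal case: N GPUs run N times faster than 1 GPU (efficiency 1.0). */
double strong_efficiency(double t_1gpu, double t_Ngpu, double N)
{
    return t_1gpu / (N * t_Ngpu);
}

/* Weak scaling efficiency: fixed work per GPU.
 * Ideal case: runtime stays constant as GPUs are added (efficiency 1.0). */
double weak_efficiency(double t_1gpu, double t_Ngpu)
{
    return t_1gpu / t_Ngpu;
}
```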

SLIDE 11

Example test calculation: 2D implosion test (1024²) with Cholla on NVIDIA GPUs

55,804,166,144 cell updates; solution symmetric about y = x to roundoff error.

[Figure: initial conditions with ρ = 0.1, P = 0.14 and ρ = 1, P = 1.]

SLIDE 12

Application: modeling galactic outflows

Image credit: hubblesite.org

SLIDE 13

Cholla + NVIDIA GPUs form a unique tool for simulating astrophysical fluids.

Cholla can simulate the structure of galactic winds

[Figure: shock front with velocity v_shock approaching a cloud; axes x, y, z.]

Important questions:

  • How do mass and momentum become entrained in galactic winds?
  • How does the detailed structure of galactic winds arise?

SLIDE 14

Cholla can simulate the structure of galactic winds

1.25e9 cells, 512 NVIDIA K20X GPUs on ORNL Titan

Schneider, E. & Robertson, B. 2017, ApJ, 834, 144

SLIDE 15

Leveraging the NVIDIA DGX-1 for astrophysical research

NVIDIA DGX-1: 2x 20-core Intel E5-2698 v4 CPUs, 8x NVIDIA P100 GPUs, 768 GB/s bandwidth, 4x Mellanox EDR InfiniBand NICs.

  • Unlike risk-averse, mission-critical astronomical software, pipeline and high-level analysis software can leverage new and emerging technologies.
  • Utilize investments in software from Silicon Valley, data science, and other industries.
  • UCSC astrophysicists use the NVIDIA DGX-1 for astrophysical simulation and astronomical data analysis.

SLIDE 16

Accelerated simulations of disk galaxies

  • The UCSC Astrophysics DGX-1 system is our development platform for constructing complex initial conditions.
  • The DGX-1 system is powerful enough to perform high-quality Cholla simulations of disk galaxies.

256³ cells, single P100, ~2 hours.

SLIDE 17

Cholla simulations of M82 initial conditions

[Figure: M82 imaged in Hα with the WIYN (Wisconsin Indiana Yale NOAO) telescope, with star clusters embedded in the outflow (Smith, Gallagher & Westmoquette; reproduced in Annu. Rev. Astron. Astrophys. 43:769-826).]

[Simulation domain labels: 2048 × 2048 × 4096 cells (~33,000 ly × ~33,000 ly × ~66,000 ly); gain region; outflow.]

Cholla + Titan global simulations of galactic outflows
SLIDE 18

Cholla + ORNL Titan global simulations of galactic outflows

  • Test calculation on Titan: 1024³ cells, the largest hydrodynamic simulation of a single galaxy ever performed.
  • 512 K20X GPUs, ~6 hours, ~90K core hours.
  • ~47M core-hour allocation (AST-125).

[Figure: x-y and x-z slices of density and temperature.]

SLIDE 19

Using NVIDIA GPUs for astronomical data analysis

[Image: Hubble Ultra Deep Field]

SLIDE 20

Human galaxy classification…

Expert classifications of Hubble images from the CANDELS survey.

Kartaltepe et al., ApJS, 221, 11 (2015)

SLIDE 21

Human galaxy classification does not scale.

New observatories will image >10 billion galaxies.

SLIDE 22

NVIDIA DGX-1

Morpheus — a UCSC deep learning model for astronomical galaxy classification, by Ryan Hausen (Hausen & Robertson, in preparation).

[Figure: network architecture — multiband imaging → series of residual blocks → fully connected layer → classification PDF. Each "residual block" is a set of convolution layers whose output is added to an identity path, keeping the same dimensions.]

SLIDE 23

[Figure: Morpheus preliminary results (Hausen & Robertson).]

SLIDE 24

Summary

  • The Cholla hydrodynamical simulation code, written by Evan Schneider for her PhD thesis supervised by Brant Robertson, uses NVIDIA GPUs to model astrophysical fluid dynamics.
  • UCSC Astrophysics is using the ORNL Titan supercomputer and the DGX-1 system, each powered by NVIDIA GPUs, for astrophysical simulation and astronomical data analysis.
  • The Morpheus deep learning framework for astrophysics is under development by Ryan Hausen at UCSC for automated galaxy classification and other astrophysical machine learning applications.