HPC-AI CONVERGENCE: Workshop on the Convergence of ML & HPC


NVIDIA TO ACCELERATE THE HPC-AI CONVERGENCE. Workshop on the Convergence of ML & HPC. Gunter Roeth, gunterr@nvidia.com, March 2020.


SLIDE 1

NVIDIA TO ACCELERATE THE HPC-AI CONVERGENCE

Workshop on the Convergence of ML & HPC
Gunter Roeth, gunterr@nvidia.com
March 2020

SLIDE 2

GRAND CHALLENGES REQUIRE MASSIVE COMPUTING

REINVENTING THE LI-ION BATTERY

3M Node Hours | 7 Days on Titan

UNDERSTANDING HIV’S STRUCTURE

10M Node Hours | 16 Days on Blue Waters

CLOUD RESOLVING CLIMATE SIMULATIONS

100M Node Hours | 840 Days on Piz Daint
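The node-hour and wall-clock figures above imply an average machine allocation. A minimal sketch of that arithmetic, assuming each run kept a constant number of nodes busy for its full duration (the helper name `implied_nodes` is illustrative, not from the slides):

```python
# Implied average node counts from the node-hour and wall-clock figures above.
workloads = {
    "Li-ion battery (Titan)": (3e6, 7),
    "HIV structure (Blue Waters)": (10e6, 16),
    "Cloud-resolving climate (Piz Daint)": (100e6, 840),
}

def implied_nodes(node_hours, days):
    """Average number of nodes kept busy for the whole run."""
    return node_hours / (days * 24)

for name, (node_hours, days) in workloads.items():
    print(f"{name}: ~{implied_nodes(node_hours, days):,.0f} nodes")
```

The first two results land in the tens of thousands of nodes, which suggests these runs occupy most of a leadership-class machine for days at a time.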

SLIDE 3

TOP500 EFFECTS

[Figure: TOP500 performance chart with series All, #1, and #500; performance axis from 100 GFLOPS to 100 PFLOPS]

SLIDE 4

SOMETHING NEW:

AI + HPC = REVOLUTION

SLIDE 5

INGREDIENTS: BIG DATA

SLIDE 6

NVIDIA CONFIDENTIAL. DO NOT DISTRIBUTE.

BIG DATA IN SCIENCE

Big Science ingests/outputs Big Data

  • Large Hadron Collider
  • Square Kilometer Array
  • Johns Hopkins Turbulence Database

SLIDE 7

AI WORKFLOW FOR HPC

Stages and their numerical precision:

  • SIMULATION (FP64/FP32)
  • TRAINING (FP32/FP16)
  • INFERENCE (FP16/INT8)
  • REGRESSION TESTING (FP16/INT8)

DATA

  • TRAINING SET
  • NEW DATA
  • REGRESSION SET

ERRORS
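The workflow moves from high precision in simulation to low precision in inference. A small sketch of what each format can represent, using NumPy's type metadata (the helper `precision_table` is an illustrative name, not part of any NVIDIA tool):

```python
import numpy as np

def precision_table():
    """Return (bits, machine epsilon, largest finite value) per floating-point format."""
    formats = {"FP64": np.float64, "FP32": np.float32, "FP16": np.float16}
    return {name: (np.finfo(t).bits, float(np.finfo(t).eps), float(np.finfo(t).max))
            for name, t in formats.items()}

table = precision_table()
for name, (bits, eps, largest) in table.items():
    print(f"{name}: {bits} bits, eps={eps:.3e}, max={largest:.3e}")

# INT8 is an integer format, so it has a fixed range rather than an epsilon.
int8 = np.iinfo(np.int8)
print(f"INT8: {int8.bits} bits, integer range [{int8.min}, {int8.max}]")
```

Each step down the ladder trades relative accuracy and dynamic range for throughput and memory savings.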

SLIDE 8

THE CONVERGENCE OF HPC * AI

Integrating the Third and Fourth Pillars of Scientific Discovery

AI: New algorithms and models with potential to increase model size and accuracy

HPC: 40+ years of algorithms based on first principles theory

  • Commercially viable fusion energy
  • Understanding cosmological dark energy and matter
  • Clinically viable precision medicine
  • Improve or validate the Standard Model of Physics
  • Climate/weather forecasts with ultra-high fidelity

Dramatically Improves Accuracy and/or Time-to-Solution at Large Scale

SLIDE 9

AI FOR HPC

Transformative Tool To Accelerate The Pace of Scientific Innovation

Improves Accuracy: Enabling realization of full scientific potential

Accelerates Time to Solution: Unlocking the use of science in exciting new ways

  • 300,000X Faster: Predict Molecular Energetics (Drug Discovery)
  • 5,000X Faster: Process LIGO Signal (Understanding Universe)
  • Weeks to 10 milliseconds: Analyze Gravitational Lensing (Astrophysics)
  • 14X Faster: Generate Bose-Einstein Condensate (Physics)
  • 90% accuracy: Fusion Sustainment (Clean Energy)
  • 33% Faster: Track Neutrinos (Particle Physics)
  • 70% accuracy: Score Protein Ligand (Drug Discovery)
  • 11% higher accuracy: Monitor Earth’s Vital Climate

SLIDE 10

INTELLIGENT HPC

DL Driving Future HPC Breakthroughs

Pre-processing

  • Select/classify/augment/distribute input data
  • Control job parameters

Simulation

  • Trained networks as solvers
  • Super-resolution of coarse simulations
  • Low- and mixed-precision
  • Simulation for training, network in production

Post-processing

  • Analyze/reduce/augment output data
  • Act on output data

From calendar time to real time?
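One classic way low- and mixed-precision arithmetic enters a solver is iterative refinement: do the expensive solves in FP32 and correct with FP64 residuals. The sketch below is an illustration of that general technique, not a method described in the slides; the function name, matrix, and iteration count are assumptions.

```python
import numpy as np

def refine_solve(A, b, iterations=5):
    """Solve A x = b with FP32 solves and FP64 residuals (iterative refinement)."""
    A32 = A.astype(np.float32)
    x = np.linalg.solve(A32, b.astype(np.float32)).astype(np.float64)
    for _ in range(iterations):
        r = b - A @ x                                   # residual in FP64
        dx = np.linalg.solve(A32, r.astype(np.float32)) # cheap low-precision correction
        x += dx.astype(np.float64)
    return x

rng = np.random.default_rng(0)
n = 200
A = rng.standard_normal((n, n)) + n * np.eye(n)  # well-conditioned test matrix
x_true = rng.standard_normal(n)
b = A @ x_true

x32 = np.linalg.solve(A.astype(np.float32), b.astype(np.float32)).astype(np.float64)
x_ref = refine_solve(A, b)
err32 = np.linalg.norm(x32 - x_true) / np.linalg.norm(x_true)
err_ref = np.linalg.norm(x_ref - x_true) / np.linalg.norm(x_true)
print(f"FP32 only: {err32:.2e}, refined: {err_ref:.2e}")
```

A production solver would factor the matrix once in low precision and reuse the factors; the repeated `np.linalg.solve` calls here only keep the sketch short.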

SLIDE 11

THE SHAPE OF AI SUPERCOMPUTING

SLIDE 12

VOLTA TENSOR CORE GPU FUELS WORLD'S FASTEST SUPERCOMPUTER

Fused HPC and AI Computing In a Unified Platform

Genomics (CoMet)

  • World’s First Exascale Run
  • Finding Genes-to-disease Connection
  • Same accuracy as FP64 w/ Tensor Core

Quantum Chemistry (QMCPack)

  • Simulate New Materials
  • High-Temperature Semiconductors

50X Over Titan

150X Over Titan

Summit Supercomputer, Oak Ridge National Laboratory
AI: 3 Exaflops | HPC: 122 Petaflops

Measured performance: Summit node vs Titan node

SLIDE 13

AI: A NEW COMPUTING PARADIGM

[Figure: performance trends on a log scale (10^2 to 10^7), 1980 to 2020. Single-threaded perf: 1.5X per year, then 1.1X per year. GPU-Computing perf: 1.5X per year, 1000X by 2025.]

Original data up to the year 2010 collected and plotted by M. Horowitz, F. Labonte, O. Shacham, K. Olukotun, L. Hammond, and C. Batten. New plot and data collected for 2010-2015 by K. Rupp.
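The growth rates quoted on this chart compound quickly. A short sketch of the arithmetic, assuming constant yearly rates (the function names are illustrative):

```python
import math

def years_to_reach(factor, rate_per_year):
    """Years of compounding at rate_per_year needed to grow by factor."""
    return math.log(factor) / math.log(rate_per_year)

def growth_over(years, rate_per_year):
    """Total growth after compounding rate_per_year for the given years."""
    return rate_per_year ** years

years = years_to_reach(1000, 1.5)    # roughly 17 years at 1.5X per year
cpu_gain = growth_over(years, 1.1)   # roughly 5X over the same span at 1.1X per year
print(f"1000X at 1.5X/year takes ~{years:.1f} years")
print(f"1.1X/year over the same period gives ~{cpu_gain:.1f}X")
```

Over the roughly 17 years it takes 1.5X per year to reach 1000X, a 1.1X per year trend delivers only about 5X.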

SLIDE 14

NVIDIA

DL AND HPC – JOINTLY SOLVE NEW PROBLEMS, BETTER

SLIDE 15

AI SUPERCOMPUTING IS HERE

COMPUTATIONAL SCIENCE

  • Turbulent Flow
  • Molecular Dynamics
  • Structural Analysis
  • N-body Simulation

DATA SCIENCE

  • “Next move?”
  • “Is there cancer?”
  • “What’s happening?”
  • “What does she mean?”

COMPUTATIONAL & DATA SCIENCE

  • Understanding Universe
  • Clean Energy
  • Drug Discovery
  • Monitoring Climate Change

Extending The Reach of HPC By Combining Computational & Data Science

S8242: DL for Computational Science, Jeff Adie & Yang Juntao. Presented ~20 Success Stories of DL in Computational Science

(GTC on-demand: http://on-demand-gtc.gputechconf.com)

SLIDE 16

Computational Chemistry

SLIDE 17

AI Quantum Breakthrough

Background

Developing a new drug costs $2.5B and takes 10-15 years. Quantum chemistry (QC) simulations are important to accurately screen millions of potential drugs to a few most promising drug candidates.

Challenge

QC simulation is computationally expensive so researchers use approximations, compromising on accuracy. To screen 10M drug candidates, it takes 5 years to compute on CPUs.

Solution

Researchers at the University of Florida and the University of North Carolina leveraged GPU deep learning to develop ANAKIN-ME, which reproduces molecular energy surfaces at very high speed (microseconds versus several minutes), with extremely high (DFT) accuracy, and at 1 to 10 millionths of the cost of current computational methods. Essentially, the DL model is trained to learn the Hamiltonian of the Schrödinger equation.

Impact

Faster, more accurate screening at far lower cost
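A back-of-the-envelope view of the screening numbers above. The 5-year and 10M-candidate figures come from this slide; applying the 300,000X speedup quoted for molecular energetics on the earlier "AI FOR HPC" slide to this campaign is an assumption made only for illustration.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600

cpu_campaign_s = 5 * SECONDS_PER_YEAR   # 5 years of CPU screening (from the slide)
candidates = 10_000_000                 # 10M drug candidates (from the slide)
assumed_speedup = 300_000               # assumption: figure taken from the earlier slide

# Amortized wall-clock time per candidate on whatever CPU resources the slide assumes.
per_candidate_s = cpu_campaign_s / candidates
# Whole campaign if the assumed speedup applied uniformly.
fast_campaign_s = cpu_campaign_s / assumed_speedup

print(f"CPU: ~{per_candidate_s:.1f} s per candidate")
print(f"With {assumed_speedup:,}X speedup: ~{fast_campaign_s / 60:.1f} minutes for all candidates")
```

Under that assumption, a multi-year campaign shrinks to a matter of minutes, which is the scale of change the slide describes.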

SLIDE 18

NEURAL NETWORK MODEL APPROACH

Training set: ~20M DFT data points. Molecules with 1 to 8 atoms from GDB database

SLIDE 19

Computational Mechanics

SLIDE 20

FEA UPDATED WITH NEURAL NETWORK

An FEA-trained deep neural network serves as a surrogate model for estimating stress distributions. Deepvirtuality, a spinoff from Volkswagen Data:Lab in the NVIDIA Inception Program, has demonstrated this with software aimed at quicker prediction of structural data.

Deep Learning for Solid Mechanics

  • Demonstration of structure-borne noise of a V12 engine with Deepvirtuality
  • Torsional frequencies of a car body by Deepvirtuality

SLIDE 21

Eulerian Fluid Simulation

Approximating PDE solutions

“Accelerating Eulerian Fluid Simulation With Convolutional Networks”, Tompson et al., 2016

SLIDE 22

SimRes at SC19 in Denver Physics Informed NN

SLIDE 23

EXAMPLES OF PINN

Vortex-induced vibration problem of flow past a circular cylinder (eta).

Incompressible Navier-Stokes equations
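For reference, the incompressible Navier-Stokes equations named on this slide can be written as follows. The non-dimensional form with Reynolds number $Re$ is an assumption; the slide does not say which form is used.

```latex
\begin{aligned}
\frac{\partial \mathbf{u}}{\partial t} + (\mathbf{u}\cdot\nabla)\mathbf{u}
  &= -\nabla p + \frac{1}{Re}\,\nabla^{2}\mathbf{u}, \\
\nabla\cdot\mathbf{u} &= 0,
\end{aligned}
```

Here $\mathbf{u}$ is the velocity field and $p$ the pressure. A physics informed network penalizes the residuals of both equations at sampled points in the domain.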

SLIDE 24

Train first Data Driven Networks (DDNN)

SLIDE 25

AUTOMOTIVE AERODYNAMICS

Inference Training

SLIDE 26

DATA DRIVEN METHODS

Pros & Cons

+ Not dependent on Physics

  • Need to generate a lot of simulations (accuracy dependent on the simulation code)
  • No Physics Awareness; Generalizability may be limited
  • Not very efficient for Complex 3D Geometries/Curved Surfaces
  • Interpolation/Extrapolation Errors
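The interpolation/extrapolation point can be seen with even a trivial data-driven model. The sketch below fits a polynomial surrogate to samples of a known function and compares errors inside and outside the training range; the polynomial stands in for a neural network, and the target function is an assumption chosen for illustration.

```python
import numpy as np

def fit_surrogate(x_train, y_train, degree=5):
    """Least-squares polynomial fit used as a stand-in data-driven surrogate."""
    return np.poly1d(np.polyfit(x_train, y_train, degree))

# Training data covers only [0, pi].
x_train = np.linspace(0.0, np.pi, 50)
surrogate = fit_surrogate(x_train, np.sin(x_train))

x_inside = np.linspace(0.0, np.pi, 200)           # interpolation region
x_outside = np.linspace(np.pi, 2.0 * np.pi, 200)  # extrapolation region
err_inside = np.max(np.abs(surrogate(x_inside) - np.sin(x_inside)))
err_outside = np.max(np.abs(surrogate(x_outside) - np.sin(x_outside)))
print(f"max error inside training range:  {err_inside:.2e}")
print(f"max error outside training range: {err_outside:.2e}")
```

The surrogate is accurate where it has seen data and degrades sharply beyond it, which is the limitation a physics-aware loss is meant to reduce.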

SLIDE 27

Use Physics Informed NN (PINNs)

SLIDE 28

PHYSICS INFORMED NEURAL NETS: ARCHITECTURE

A Neural Network Architecture for Computational Mechanics/Physics problems

❑ Point Cloud for 3D Geometries & Meshes (Fixed/Moving, Deforming, Structured & Unstructured)

❑ Physics Driven & Physics Aware Networks (respects the governing PDEs, Multi-disciplinary)

❑ Performance optimized for GPU tensor cores

PINN - Physics Informed Neural Networks

Point Cloud representation of Computational Domain & Data on 3D Geometries

SLIDE 29

PHYSICS DRIVEN METHODS


❑ Problem Modeling:

  • Complex Geometries

❑ Sampling Insensitivity

❑ Network Architecture:

  • Faithfully represent the Physics with initial & boundary conditions
  • Architectural Requirements for nth order derivatives
  • Loss Convergence Acceleration
  • Activation Functions
  • Gradients & Discontinuities
  • Global vs. Local

Special Considerations
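To make the role of boundary conditions and higher-order derivatives concrete, here is a minimal sketch of a physics informed loss for the 1D problem u''(x) = -π² sin(πx) with u(0) = u(1) = 0. The test problem, the single hidden layer, and the hand-written derivatives are all assumptions for illustration; a real PINN would use automatic differentiation and a training loop, which are omitted here.

```python
import numpy as np

def network(params, x):
    """Single hidden layer tanh network and its first two derivatives in x."""
    w1, b1, w2, b2 = params
    z = np.outer(x, w1) + b1            # shape (points, hidden)
    t = np.tanh(z)
    s = 1.0 - t**2                      # derivative of tanh with respect to z
    u = t @ w2 + b2
    du = (s * w1) @ w2                  # chain rule: d/dx tanh(w1 x + b1)
    d2u = (-2.0 * t * s * w1**2) @ w2   # second derivative of tanh is -2 t (1 - t^2)
    return u, du, d2u

def pinn_loss(params, x_interior, x_boundary, u_boundary):
    """PDE residual loss at collocation points plus boundary-condition loss."""
    _, _, d2u = network(params, x_interior)
    residual = d2u + np.pi**2 * np.sin(np.pi * x_interior)
    u_b, _, _ = network(params, x_boundary)
    return np.mean(residual**2) + np.mean((u_b - u_boundary) ** 2)

rng = np.random.default_rng(0)
hidden = 16
params = (rng.standard_normal(hidden), rng.standard_normal(hidden),
          rng.standard_normal(hidden) / hidden, 0.0)
x_interior = rng.uniform(0.0, 1.0, 100)  # collocation points (a 1D point cloud)
x_boundary = np.array([0.0, 1.0])
loss = pinn_loss(params, x_interior, x_boundary, np.zeros(2))
print(f"untrained PINN loss: {loss:.3f}")
```

The loss needs the network's second derivative, which is why a smooth activation matters for nth order PDEs, and no simulation data enters the loss at all.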

SLIDE 30

Results of Physics Informed NN (PINNs)

SLIDE 31

STEADY STATE: 2D LID DRIVEN CAVITY SIMNET VS. OPENFOAM

  • U velocity difference = 0.2%
  • V velocity difference = 0.4%

SLIDE 32

HEAT SINK: A MULTI-PHYSICS PROBLEM

Heat Sink:

  • Temperatures must not exceed the design criteria

Objectives:

  • Similar accuracy as the Solver
  • Geometry representation with Point Clouds
  • Multiple simultaneous parametrized & unparametrized geometries

Physics involved: CFD & Heat Transfer

Ansys Icepak used for Simulation (we kindly acknowledge Ansys’s support)

SLIDE 33

NETWORK ARCHITECTURE

Multi-Physics Neural Networks

[Diagram labels: CFD (turbulent); Fluid-Solid Interface Conditions (Temperature, Heat Flux); Heat Transfer in Fluid; Heat Transfer in Solid]

PINN Network Architecture

  • 10 layers for non-Physics Informed Network
  • 10 x 2n layers for nth order PDEs
  • 50-500 neurons per layer
  • Swish Activation Function

Multi-Physics PDEs

  • CFD (with turbulence): 2nd Order PDE
  • Heat Transfer in Solids & Fluid
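The Swish activation listed above is smooth everywhere, which helps when the loss needs higher-order derivatives of the network. A minimal sketch, assuming the common form swish(x) = x · sigmoid(x) with no extra scaling parameter (the slide does not specify the variant):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def swish(x):
    """Swish activation: x * sigmoid(x)."""
    return x * sigmoid(x)

def swish_grad(x):
    """Analytic first derivative of swish."""
    s = sigmoid(x)
    return s + x * s * (1.0 - s)

x = np.linspace(-4.0, 4.0, 9)
print(np.round(swish(x), 3))
print(np.round(swish_grad(x), 3))
```

Unlike ReLU, whose second derivative is zero almost everywhere, Swish has non-trivial derivatives of every order, so second-order PDE residuals remain informative during training.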

SLIDE 34

HEATSINK DESIGN OPTIMIZATION

Physics Informed Neural Net for Coupled CFD-Heat Transfer Problems

SLIDE 35

Earth Science

SLIDE 36

ANOMALY DETECTION IN CLIMATE DATA

Identifying “extreme” weather events in multi-decadal datasets with a 5-layer Convolutional Neural Network, reaching 99.98% detection accuracy (Kim et al., 2017).

Deep Learning for Climate Modeling

Systemic framework for detection and localization of extreme climate events

Dataset: Visualization of historic cyclones from the JTWC hurricane report, 1979 to 2016

SLIDE 37

EMULATING RRTMG WITH DEEP NEURAL NETWORKS FOR THE ENERGY EXASCALE EARTH SYSTEM MODEL

  • The Rapid Radiative Transfer Model for GCMs (RRTMG) is the most time-consuming component of General Circulation Models (GCMs)
  • Oak Ridge National Laboratory used a Deep Neural Network to learn from the RRTMG model

Deep Learning for Climate Modeling

[Figures: GCM for climate modeling; Short Wave Test Results; Long Wave Test Results]
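The emulation idea can be sketched in a few lines: sample an expensive routine, then train a small network to reproduce its outputs. Everything below is a toy under stated assumptions; `expensive_model` is a hypothetical stand-in and has nothing to do with RRTMG's actual physics, and the network size and learning rate are arbitrary.

```python
import numpy as np

def expensive_model(x):
    """Hypothetical stand-in for a costly physics routine (not RRTMG)."""
    return np.exp(-x) * (1.0 + 0.5 * np.sin(3.0 * x))

def train_emulator(x, y, hidden=16, lr=0.05, steps=2000, seed=0):
    """Train a one-hidden-layer tanh network with full-batch gradient descent."""
    rng = np.random.default_rng(seed)
    w1 = rng.standard_normal((1, hidden)) * 0.5
    b1 = np.zeros(hidden)
    w2 = rng.standard_normal((hidden, 1)) * 0.5
    b2 = np.zeros(1)
    X, Y = x[:, None], y[:, None]
    losses = []
    for _ in range(steps):
        h = np.tanh(X @ w1 + b1)               # forward pass
        err = h @ w2 + b2 - Y
        losses.append(0.5 * np.mean(err**2))
        g_out = err / len(X)                   # gradient of loss w.r.t. prediction
        g_h = (g_out @ w2.T) * (1.0 - h**2)    # backpropagate through tanh
        w2 -= lr * (h.T @ g_out)
        b2 -= lr * g_out.sum(axis=0)
        w1 -= lr * (X.T @ g_h)
        b1 -= lr * g_h.sum(axis=0)

    def predict(xq):
        return (np.tanh(xq[:, None] @ w1 + b1) @ w2 + b2)[:, 0]

    return predict, losses

x = np.linspace(0.0, 2.0, 64)
y = expensive_model(x)                         # "simulation" outputs used as training data
predict, losses = train_emulator(x, y)
print(f"loss: {losses[0]:.4f} -> {losses[-1]:.4f}")
```

Once trained, calling `predict` costs a handful of small matrix products regardless of how expensive the original routine was.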

SLIDE 38

Computational Physics

SLIDE 39

ACCELERATING MODELS FOR ACCELERATORS

Challenge

The Full Order GEANT Simulation is used to model the CERN LHC, NOvA, DUNE and other High Energy Particle Physics experiments. The GEANT simulation is over 10 million lines of C++ code with a flat execution profile and takes hours to days to simulate an experiment run, so each experiment uses a Reduced Order Fast Sim.

Solution

A Generative Adversarial Network (CaloGAN) was trained on Fast Simulation data and compared against “ground truth” using GEANT output

Impact

The CaloGAN was shown to be 5 Orders of Magnitude faster than the FAST Simulation and nearly as accurate as the GEANT ground truth

SLIDE 40

DEEP LEARNING FOR GRAVITATIONAL WAVE DETECTION

A deep learning method named Deep Filtering was used in the first detection of gravitational waves. Numerically simulated data was used to train Deep Filtering, a convolutional neural network that replaces matched filtering. It provided a 20X speedup on a single core, with the potential to be accelerated further on GPUs.

Deep Learning for Computational Physics

[Figure: gravitational waves from black holes colliding and merging; the LIGO facility; the actual signal caused by the gravitational wave (to be observed) versus the actual observed data. How to find the signal? Deep Learning.]
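For context, this is what the classical baseline, matched filtering, looks like in its simplest form: slide a known template over noisy data and look for the correlation peak. The chirp shape, noise level, and injection position below are assumptions for a toy example and are not LIGO parameters.

```python
import numpy as np

def chirp_template(length=256, f0=0.05, f1=0.25):
    """Toy chirp whose frequency rises linearly (in cycles per sample)."""
    t = np.arange(length)
    phase = 2.0 * np.pi * (f0 * t + 0.5 * (f1 - f0) / length * t**2)
    return np.sin(phase)

def matched_filter(data, template):
    """Correlation of the template with the data at every valid offset."""
    return np.correlate(data, template, mode="valid")

rng = np.random.default_rng(0)
template = chirp_template()
data = rng.standard_normal(4096)                          # unit-variance noise
true_start = 1500
data[true_start:true_start + len(template)] += template   # signal at noise level

score = matched_filter(data, template)
estimated_start = int(np.argmax(score))
print(f"injected at {true_start}, recovered at {estimated_start}")
```

A real search repeats this for a large bank of templates covering the source parameters, which is the cost a trained network aims to avoid.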

SLIDE 41

HUNTING “GHOST PARTICLES” WITH A BETTER TOOL

Challenge

Neutrino detection experiments are massive and extremely expensive to build, and detection sensitivity is directly proportional to size

Solution

  • Convolutional Neural Net (ImageNet with layers removed)
  • Trained with data from Full Order Models like GEANT and GENIE
  • Validated with Full Order Model Data

Impact

Accuracy increased by 33% as a POC, and then improved with further tuning to 49%. An equivalent increase of 15 million pounds of detector mass

SLIDE 42

MAKING COMMERCIALLY VIABLE FUSION ENERGY POSSIBLE

Challenge

Accurate prediction of plasma disruption with enough lead time to shut down or modify the reactor

Solution

Recurrent Neural Net (custom for fusion), trained and validated with data from the Joint European Torus (JET)

Impact

Accuracy is >95% with <5% false alarms at 60 ms lead time, with higher accuracy at 10 ms lead time.

Next step: add higher order classifiers to the training set, and model improved actuator response time
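The two headline numbers are different rates over different denominators. A small sketch of how a detection rate and a false alarm rate are typically computed from a confusion matrix; the counts are hypothetical and chosen only for illustration (they are not JET data), and the slide's exact definition of "accuracy" is an assumption.

```python
def alarm_metrics(true_pos, false_neg, false_pos, true_neg):
    """Detection rate over disruptive shots, false alarm rate over safe shots."""
    detection_rate = true_pos / (true_pos + false_neg)
    false_alarm_rate = false_pos / (false_pos + true_neg)
    return detection_rate, false_alarm_rate

# Hypothetical counts for illustration only.
detection_rate, false_alarm_rate = alarm_metrics(
    true_pos=96, false_neg=4, false_pos=20, true_neg=480
)
print(f"detection rate: {detection_rate:.0%}, false alarm rate: {false_alarm_rate:.0%}")
```

Lengthening the required lead time usually pushes both rates in the wrong direction, which is why the slide quotes them at a specific lead time.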

SLIDE 43

NVIDIA ECOSYSTEM FOR CONVERGED METHODS

Optimized Applications: 600+ Apps and 50 Containerized Apps and DL Frameworks….

CUDA-X HPC & AI: Linear Algebra, Parallel Algorithms, Signal Processing, Deep Learning, Machine Learning, Visualization

Compilers and 3rd Party Libraries: PGI, GCC, KOKKOS, RAJA, MAGMA, PETSC, All DL Frameworks…..

CUDA

Workflow Support: RAPIDS, MERLIN….

Platforms: Desktop Development, Data Center, Supercomputers, GPU-Accelerated Cloud

SLIDE 44

Thank you
gunterr@nvidia.com