ON TACC S S TAMPEDE -KNL Paul A. Navrtil, Ph.D. Manager Scalable - - PowerPoint PPT Presentation

on tacc s s tampede knl
SMART_READER_LITE
LIVE PREVIEW

ON TACC S S TAMPEDE -KNL Paul A. Navrtil, Ph.D. Manager Scalable - - PowerPoint PPT Presentation

SDV IS AND I N -S ITU V ISUALIZATION ON TACC S S TAMPEDE -KNL Paul A. Navrtil, Ph.D. Manager Scalable Visualization Technologies pnav@tacc.utexas.edu 1 High-Fidelity Visualization Natively on Xeon and Xeon Phi 2 O UTLINE Stampede


slide-1
SLIDE 1

SDVIS AND IN-SITU VISUALIZATION

ON TACC’S STAMPEDE-KNL

Paul A. Navrátil, Ph.D. Manager – Scalable Visualization Technologies pnav@tacc.utexas.edu

1

slide-2
SLIDE 2

2

High-Fidelity Visualization Natively on Xeon and Xeon Phi

slide-3
SLIDE 3

OUTLINE

„ Stampede Architecture

„ Stampede – Sandy Bridge „ Stampede - KNL „ Stampede 2 – KNL + Skylake

„ Software-Defined Visualization Stack

„ VNC „ OpenSWR „ OSPRay

„ Path to In-Situ

„ ParaView Catalyst „ VisIt Libsim „ Direct renderer integration

3

slide-4
SLIDE 4

4

STAMPEDE ARCHITECTURE

slide-5
SLIDE 5

STAMPEDE SANDY BRIDGE

„ Mellanox FDR Interconnect „ 6400 compute nodes, each with:

„ 2x Intel Xeon E5-2680

“Sandy Bridge”

„ 1x Intel Xeon Phi SE10P „ 32 GB RAM / 8 GB Phi RAM

„ 16 Large Memory nodes, each with:

„ 4x Intel Xeon E5-4650 “Sandy Bridge” „ 2x NVIDIA Quadro 2000 “Fermi” „ 1 TB RAM

„ 128 GPU nodes, each with:

„ 2x Intel Xeon E5-2680 „ 1x Intel Xeon Phi SE10P „ 1x NVIDIA Tesla K20 “Kepler” „ 32 GB RAM

5

slide-6
SLIDE 6

STAMPEDE KNL

„ Deployed in 2016 as planned

upgrade to Stampede

„ First KNL-based system in Top500 „ Intel OmniPath interconnect „ 508 nodes, each with:

„ 1x Intel Xeon Phi 7250 „ 96 GB RAM + 16 GB MCDRAM

„ Notes:

„ Shared $WORK and $SCRATCH

Separate $HOME directories

„ Separate Login Node

login-knl1.stampede.tacc.utexas.edu

„ Login is Intel Xeon E5-2695

“Haswell”

„ Compile on compute node

  • r use “–xMIC-AVX512” on login

„ “normal” and ”development”

queues are Cache-Quadrant

„ Other MCDRAM configs available

by queue name

6

slide-7
SLIDE 7

7

slide-8
SLIDE 8

STAMPEDE 2 (COMING 2017)

„ ~18 PF Dell Intel Xeon + Intel Xeon Phi system „ Combine KNL + Skylake + OmniPath + 3D XPoint „ Phase 1: Spring 2017

„ Stampede KNL + 4200 new KNL nodes + new filesystem „ 60% of Stampede Sandy Bridge to remain operational during this phase

„ Phase 2: Fall 2017

„ 1736 Intel Skylake nodes

„ Phase 3: Spring 2018

„ Add 3D XPoint memory to subset of nodes

8

slide-9
SLIDE 9

KEY ARCHITECTURAL TAKE-AWAY

„ Current and near-future cyberinfrastructure will use processors with many cores „ Each core contains wide vector units: use them for max utilization (e.g., AVX512) „ Fortunately the Software-Defined Visualization stack is optimized for such processors! „ Use your preferred rendering method independent of the underlying hardware

„ Performant rasterization „ Performant ray tracing „ Visualization and analysis on the simulation machine

9

slide-10
SLIDE 10

SOFTWARE-DEFINED VISUALIZATION – WHY?

10

FILE SIZE 100 GBPS 10 GBPS 1 GBPS 300 MBPS 54 MBPS 1 GB < 1 sec 1 sec 10 sec 35 sec 2.5 min 1 TB ~100 sec ~17 min ~3 hours ~10 hours ~43 hours 1 PB ~1 day ~12 days ~121 days >1 year ~5 years

Increasingly Difficult to Move Data from Simulation Machine

slide-11
SLIDE 11

11

SOFTWARE-DEFINED VISUALIZATION

slide-12
SLIDE 12

SOFTWARE-DEFINED VISUALIZATION – WHY?

12

slide-13
SLIDE 13

SOFTWARE-DEFINED VISUALIZATION – WHY?

13

slide-14
SLIDE 14

SOFTWARE-DEFINED VISUALIZATION – WHY?

14

slide-15
SLIDE 15

SOFTWARE-DEFINED VISUALIZATION – WHY?

15

slide-16
SLIDE 16

SOFTWARE-DEFINED VISUALIZATION STACK

„ OpenSWR Software Rasterizer

„ openswr.org „ Performant rasterization for Xeon and Xeon Phi „ Thread-parallel vector processing (previous parallel Mesa3D only has threaded fragments) „ Support for wide vector instruction sets, particularly AVX2 (and soon AVX512) „ Integrated into Mesa3D 12.0 as optional driver (mesa3d.org)

„ Best Uses

„ Lines „ Graphs „ User Interfaces

16

slide-17
SLIDE 17

SOFTWARE-DEFINED VISUALIZATION STACK

„ OSPRay Ray Tracer

„ ospray.org „ Performant ray tracing for Xeon and Xeon Phi incorporating Embree kernels „ Thread- and wide-vector parallel using Intel ISPC (including AVX512 support) „ Parallel rendering support via distributed framebuffer

„ Best Uses

„ Photorealistic rendering „ Realistic lighting „ Realistic material effects „ Large geometry „ Implicit geometry (e.g., molecular ”ball and stick” models)

17

slide-18
SLIDE 18

SOFTWARE-DEFINED VISUALIZATION STACK

„ GraviT Scheduling Framework

„ tacc.github.io/GraviT/ „ Large-scale, data-distributed ray tracing (uses OSPRay for rendering engine target) „ Parallel rendering support via distributed ray scheduling

„ Best Uses

„ Large distrubted data „ Data outside of renderer control „ Incoherent ray-intensive sampling (e.g., global illumination approximations)

18

slide-19
SLIDE 19

OSPRAY TEST SUITE – SAMPLE IMAGES

19

Test 0 Test 1 Test 4 Test 2 Test 3 Test 5 Test 6 Test 7 Test 8

slide-20
SLIDE 20

OSPRAY TEST SUITE – MCDRAM PERFORMANCE RESULTS

20

slide-21
SLIDE 21

PARAVIEW TEST SUITE – MANYSPHERES

21

slide-22
SLIDE 22

22

Likely VNC limited

slide-23
SLIDE 23

23

Likely VNC limited

slide-24
SLIDE 24

24

Definitely VNC limited!

slide-25
SLIDE 25

FIU CORE SAMPLE – SAMPLE IMAGE

25

slide-26
SLIDE 26

26

Likely VNC limited

slide-27
SLIDE 27

27

Likely VNC limited

slide-28
SLIDE 28

28

Likely hitting VNC desktop limits

Definitely VNC limited!

slide-29
SLIDE 29

29

PATH TO IN-SITU VISUALIZATION

slide-30
SLIDE 30

WHY IN-SITU VISUALIZATION?

„ Processors (like KNL) enabling larger, more detailed simulations „ File system technologies not scaling at same rate (if at all….) „ Touching disk is expensive:

„ During simulation: time checkpointing is (often) not time computing „ During analysis: loading the data is (often) the overwhelming majority of runtime

„ In-situ capabilities overcome this data bottleneck

„ Render directly from resident simulation data „ Tightly coupled vis opens doors for online analysis, computational steering, etc

30

slide-31
SLIDE 31

CURRENT IN-SITU OPTIONS

„ Simulation developer

„ Implement visualization API (ParaView Catalyst, VisIt libsim, VTK) „ Implement data framework (ADIOS, etc) „ Implement direct rendering calls (OSPRay API, etc)

„ Simulation user

„ Hope the developers do one of the above J „ Do one of the above yourself L „ Hope technology keeps post-hoc analysis viable (3D XPoint NVRAM might help)

31

slide-32
SLIDE 32

IN-SITU VISUALIZATION APIS

„ ParaView Catalyst (and Cinema) (www.paraview.org/in-situ/) „ VisIt Libsim (www.visitusers.org/index.php?title=Libsim_Batch) „ Direct VTK integration (www.vtk.org) „ Visualization ops already implemented „ Need coordination b/t teams to ensure

simulation and vis performance

32

Image courtesy of Kitware Inc.

slide-33
SLIDE 33

IN-SITU-COMPATIBLE DATA FRAMEWORKS

„ ADIOS – https://www.olcf.ornl.gov/center-projects/adios/ „ Damaris – https://hal.inria.fr/hal-00859603/en „ DIY – http://www.mcs.anl.gov/~tpeterka/software.html „ GLEAN – http://www.mcs.anl.gov/project/glean-situ-visualization-and-analysis „ SCIRun – http://www.sci.utah.edu/cibc-software/scirun.html „ (Possibly) more invasive implementation effort „ (Possibly) broader benefits beyond visualization (framework now controls data) „ Requires engagement from simulation team to ensure performance and accuracy 33

slide-34
SLIDE 34

IN-SITU DIRECT RENDERING

„ Render directly within simulation using API (e.g., OSPRay, OpenGL, etc) „ Must implement visualization operations within simulation code

„ Lightest weight, lowest overhead „ Requires visualization algorithm knowledge for efficient implementation

„ Locks in particular rendering and visualization modes 34

slide-35
SLIDE 35

IN-SITU FUTURE?

35

Useful perspectives at ISAV – http://conferences.computer.org/isav/2016/

slide-36
SLIDE 36

TACC/KITWARE IPCC – UNIMPEDED IN SITU VISUALIZATION ON INTEL XEON AND INTEL XEON PHI

„ Optimize ParaView Catalyst for current and near-future Intel architectures

„ KNL, Skylake, Omnipath, 3D XPoint NVRAM „ Use Stampede-KNL as testbed to target TACC Stampede 2, NERSC Cori, LANL Trinity

„ Focus on data and rendering paths for OpenSWR and OSPRay

„ Parallelize VTK data processing filters „ Catalyst integration with simulation „ Targeted algorithm improvements

„ Increase processor and memory utilization

36

slide-37
SLIDE 37

THANK YOU!

pnav@tacc.utexas.edu

37