SDVIS AND IN-SITU VISUALIZATION
ON TACC’S STAMPEDE-KNL
Paul A. Navrátil, Ph.D. Manager – Scalable Visualization Technologies pnav@tacc.utexas.edu
High-Fidelity Visualization Natively on Xeon and Xeon Phi
Outline

Stampede Architecture
  Stampede – Sandy Bridge, Stampede-KNL, Stampede 2 – KNL + Skylake
Software-Defined Visualization Stack
  VNC, OpenSWR, OSPRay
Path to In-Situ
  ParaView Catalyst, VisIt Libsim, direct renderer integration
Mellanox FDR interconnect
6400 compute nodes, each with:
  2x Intel Xeon E5-2680 "Sandy Bridge"
  1x Intel Xeon Phi SE10P
  32 GB RAM / 8 GB Phi RAM
16 large-memory nodes, each with:
  4x Intel Xeon E5-4650 "Sandy Bridge"
  2x NVIDIA Quadro 2000 "Fermi"
  1 TB RAM
128 GPU nodes, each with:
  2x Intel Xeon E5-2680
  1x Intel Xeon Phi SE10P
  1x NVIDIA Tesla K20 "Kepler"
  32 GB RAM
Deployed in 2016 as a planned upgrade to Stampede
First KNL-based system in the Top500
Intel Omni-Path interconnect
508 nodes, each with:
  1x Intel Xeon Phi 7250 "Knights Landing"
  96 GB RAM + 16 GB MCDRAM
Notes:
  Shared $WORK and $SCRATCH with Stampede
  Separate $HOME directories
  Separate login node: login-knl1.stampede.tacc.utexas.edu
  The login node is an Intel Xeon E5-2695 "Haswell", so compile on a compute node
  The "normal" and "development" queues use the cache/quadrant MCDRAM configuration
  Other MCDRAM configurations are available by queue name
~18 PF Dell system combining Intel Xeon and Intel Xeon Phi
KNL + Skylake + Omni-Path + 3D XPoint
Phase 1 (Spring 2017): Stampede-KNL + 4200 new KNL nodes + a new filesystem;
  60% of the Stampede Sandy Bridge nodes remain operational during this phase
Phase 2 (Fall 2017): 1736 Intel Skylake nodes
Phase 3 (Spring 2018): add 3D XPoint memory to a subset of nodes
Current and near-future cyberinfrastructure will use processors with many cores.
Each core contains wide vector units: use them for maximum utilization (e.g., AVX-512).
Fortunately, the Software-Defined Visualization stack is optimized for such processors!
Use your preferred rendering method independent of the underlying hardware:
  performant rasterization, performant ray tracing,
  and visualization and analysis on the simulation machine.
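The SDVis libraries get their speed from exactly this kind of data parallelism. As a language-level illustration only (NumPy here, not the actual ISPC/AVX-512 code paths), one operation applied across a whole array is the same principle as packing 16 single-precision floats into one AVX-512 instruction:

```python
import numpy as np

# Scalar loop: one element per operation, poor utilization of wide units.
def scale_scalar(data, factor):
    out = [0.0] * len(data)
    for i in range(len(data)):
        out[i] = data[i] * factor
    return out

# Vectorized form: the whole array is processed with wide operations,
# analogous to 16 floats per AVX-512 instruction on KNL/Skylake.
def scale_vector(data, factor):
    return np.asarray(data) * factor

samples = [0.5, 1.0, 2.0, 4.0]
assert scale_scalar(samples, 2.0) == list(scale_vector(samples, 2.0))
```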
Increasingly Difficult to Move Data from Simulation Machine
OpenSWR Software Rasterizer
openswr.org
Performant rasterization for Xeon and Xeon Phi
Thread-parallel vector processing (previous parallel Mesa3D work threaded only fragment processing)
Support for wide vector instruction sets, particularly AVX2 (and soon AVX-512)
Integrated into Mesa3D 12.0 as an optional driver (mesa3d.org)

Best uses: lines, graphs, user interfaces
OSPRay Ray Tracer
ospray.org
Performant ray tracing for Xeon and Xeon Phi, incorporating Embree kernels
Thread- and wide-vector-parallel using Intel ISPC (including AVX-512 support)
Parallel rendering support via a distributed framebuffer

Best uses: photorealistic rendering, realistic lighting and material effects,
  large geometry, implicit geometry (e.g., molecular "ball and stick" models)
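Implicit geometry means primitives like spheres are rendered from their analytic description rather than tessellated into triangles. A minimal illustration of the underlying test (plain Python, not OSPRay's Embree/ISPC kernels):

```python
import math

def ray_sphere_hit(origin, direction, center, radius):
    """Return distance t to the nearest intersection, or None on a miss.
    Solves |o + t*d - c|^2 = r^2 for t (direction assumed unit length)."""
    oc = [o - c for o, c in zip(origin, center)]
    b = 2.0 * sum(d * o for d, o in zip(direction, oc))
    c = sum(o * o for o in oc) - radius * radius
    disc = b * b - 4.0 * c
    if disc < 0.0:
        return None  # ray misses the sphere entirely
    t = (-b - math.sqrt(disc)) / 2.0
    return t if t >= 0.0 else None

# A ray along +z hits a unit sphere centered at z=5 at distance t=4.
assert ray_sphere_hit((0, 0, 0), (0, 0, 1), (0, 0, 5), 1.0) == 4.0
```

No triangle mesh ever exists, which is why millions of atoms in a ball-and-stick model stay cheap in memory.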
GraviT Scheduling Framework
tacc.github.io/GraviT/
Large-scale, data-distributed ray tracing (uses OSPRay as the rendering-engine target)
Parallel rendering support via distributed ray scheduling

Best uses: large distributed data, data outside of renderer control,
  incoherent ray-intensive sampling (e.g., global illumination approximations)
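The core idea of distributed ray scheduling is moving rays to the data rather than data to the rays. A toy sketch of that assignment step (hypothetical function names; 1D extents stand in for 3D domain bounding boxes):

```python
# Assign each ray to the data domain whose extent contains it, building
# per-domain queues that can then be traced where the data resides.
def schedule_rays(rays, domains):
    """rays: list of x positions; domains: list of (lo, hi) extents.
    Returns {domain_index: [ray indices]}."""
    queues = {i: [] for i in range(len(domains))}
    for r, x in enumerate(rays):
        for i, (lo, hi) in enumerate(domains):
            if lo <= x < hi:
                queues[i].append(r)
                break
    return queues

queues = schedule_rays([0.1, 0.6, 0.9], [(0.0, 0.5), (0.5, 1.0)])
assert queues == {0: [0], 1: [1, 2]}
```

In the real framework, rays that exit one domain are re-queued toward the next domain along their path instead of terminating.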
[Benchmark result charts for Tests 0–8: several configurations are likely VNC limited, and the fastest results are definitely hitting VNC desktop limits]
Processors (like KNL) are enabling larger, more detailed simulations
File system technologies are not scaling at the same rate (if at all...)
Touching disk is expensive:
  during simulation, time spent checkpointing is (often) not time spent computing;
  during analysis, loading the data is (often) the overwhelming majority of runtime
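A back-of-envelope estimate makes the cost concrete. The per-node bandwidth share and checkpoint interval below are assumed purely for illustration, not measured Stampede figures:

```python
# Illustrative numbers (assumed, not measured): cost of writing a
# full-memory checkpoint through a shared filesystem.
node_memory_gb = 96.0          # KNL node RAM from the slides
io_bandwidth_gbps = 0.5        # assumed per-node share of filesystem bandwidth
checkpoint_interval_s = 1800.0 # assumed: checkpoint every 30 minutes

write_time_s = node_memory_gb / io_bandwidth_gbps
overhead = write_time_s / (checkpoint_interval_s + write_time_s)
print(f"{write_time_s:.0f} s per checkpoint, {overhead:.1%} of wall time")
```

Under these assumptions, roughly a tenth of wall time goes to I/O instead of computing, and the same bytes must be read back again at analysis time.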
In-situ capabilities overcome this data bottleneck:
  render directly from resident simulation data;
  tightly coupled visualization opens doors for online analysis, computational steering, etc.
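The tightly coupled pattern reduces to a simulation time loop that invokes a visualization callback on live, in-memory data every N steps. A generic sketch (hypothetical names, not the Catalyst or Libsim API):

```python
def run_simulation(steps, render_every, render):
    """Toy time loop: `render` is called in-situ on the live state,
    so nothing is written to disk between visualization points."""
    state = 0.0
    frames = []
    for step in range(1, steps + 1):
        state += 1.0  # stand-in for one timestep of real compute
        if step % render_every == 0:
            frames.append(render(step, state))  # operates on resident data
    return frames

frames = run_simulation(10, render_every=5, render=lambda s, x: (s, x))
assert frames == [(5, 5.0), (10, 10.0)]
```

Because the callback sees the simulation's own arrays, it can also feed results back, which is the hook for computational steering.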
Simulation developer options:
  implement a visualization API (ParaView Catalyst, VisIt Libsim, VTK);
  implement a data framework (ADIOS, etc.);
  implement direct rendering calls (OSPRay API, etc.)
Simulation user options:
  hope the developers do one of the above :)
  do one of the above yourself :(
  hope technology keeps post-hoc analysis viable (3D XPoint NVRAM might help)
ParaView Catalyst (and Cinema) – www.paraview.org/in-situ/
VisIt Libsim – www.visitusers.org/index.php?title=Libsim_Batch
Direct VTK integration – www.vtk.org
Visualization operations are already implemented
Requires coordination between teams to ensure simulation and visualization performance
Image courtesy of Kitware Inc.
ADIOS – https://www.olcf.ornl.gov/center-projects/adios/
Damaris – https://hal.inria.fr/hal-00859603/en
DIY – http://www.mcs.anl.gov/~tpeterka/software.html
GLEAN – http://www.mcs.anl.gov/project/glean-situ-visualization-and-analysis
SCIRun – http://www.sci.utah.edu/cibc-software/scirun.html
(Possibly) more invasive implementation effort
(Possibly) broader benefits beyond visualization (the framework now controls the data)
Requires engagement from the simulation team to ensure performance and accuracy
Render directly within the simulation using an API (e.g., OSPRay, OpenGL)
Must implement visualization operations within the simulation code
Lightest weight, lowest overhead
Requires visualization-algorithm knowledge for an efficient implementation
Locks in particular rendering and visualization modes
Useful perspectives at ISAV – http://conferences.computer.org/isav/2016/
Optimize ParaView Catalyst for current and near-future Intel architectures:
  KNL, Skylake, Omni-Path, 3D XPoint NVRAM
Use Stampede-KNL as a testbed to target TACC Stampede 2, NERSC Cori, LANL Trinity
Focus on data and rendering paths for OpenSWR and OSPRay:
  parallelize VTK data processing filters;
  Catalyst integration with simulations;
  targeted algorithm improvements
Increase processor and memory utilization
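The filter-parallelization goal above can be sketched as a chunked, thread-parallel map over points. The chunking scheme and names here are illustrative only, not VTK's SMP backend API:

```python
from concurrent.futures import ThreadPoolExecutor

# Thread-parallel filter pattern: split points into chunks and apply the
# per-point operation concurrently, then restore input order.
def parallel_filter(points, op, workers=4):
    chunks = [points[i::workers] for i in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(lambda c: [op(p) for p in c], chunks))
    out = [None] * len(points)
    for w, chunk in enumerate(results):
        out[w::workers] = chunk  # re-interleave chunk results
    return out

assert parallel_filter([1, 2, 3, 4, 5], lambda p: p * p) == [1, 4, 9, 16, 25]
```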
pnav@tacc.utexas.edu