SLIDE 1

VTK-m: Uniting GPU Acceleration Successes

Robert Maynard, Kitware Inc.

SLIDE 2

VTK-m Project

  • Supercomputer hardware advances every day
    – More and more parallelism
  • High-level parallelism
    – “The Free Lunch Is Over” (Herb Sutter)

SLIDE 3

VTK-m Project Goals

  • A single place for the visualization community to collaborate, contribute, and leverage massively threaded algorithms.
  • Reduce the challenges of writing highly concurrent algorithms by using data-parallel algorithms (see the sketch below).
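To make the data-parallel style concrete, here is a minimal sketch, not VTK-m or Dax API, of how an algorithm such as threshold can be composed from three generic primitives (map, exclusive scan, scatter) in standard C++17. Each primitive has a well-understood parallel implementation, which is what makes algorithms written this way portable across architectures:

// Composing "threshold" from data-parallel primitives (illustrative only).
#include <algorithm>
#include <cstdio>
#include <numeric>
#include <vector>

std::vector<float> threshold(const std::vector<float>& values,
                             float lo, float hi)
{
  const std::size_t n = values.size();

  // Map: flag each value that passes the threshold (1) or not (0).
  std::vector<int> flags(n);
  std::transform(values.begin(), values.end(), flags.begin(),
                 [=](float v) { return (v > lo && v < hi) ? 1 : 0; });

  // Exclusive scan: the running sum of flags gives each survivor's
  // output index.
  std::vector<int> offsets(n);
  std::exclusive_scan(flags.begin(), flags.end(), offsets.begin(), 0);

  // Scatter: write each surviving value to its computed position.
  const int kept = n ? offsets.back() + flags.back() : 0;
  std::vector<float> out(kept);
  for (std::size_t i = 0; i < n; ++i)   // trivially parallel loop
    if (flags[i]) out[offsets[i]] = values[i];

  return out;
}

int main()
{
  std::vector<float> density = {12.f, 37.5f, 44.f, 80.f, 40.f};
  for (float v : threshold(density, 35.f, 45.f)) std::printf("%g ", v);
  std::printf("\n");  // prints: 37.5 44 40
}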

SLIDE 4

VTK-m Project Goals

  • Make it easier for simulation codes to take advantage of these parallel visualization and analysis tasks on a wide range of current and next-generation hardware.

SLIDE 5

VTK-m Project

  • Combines the strengths of multiple projects:
    – EAVL, Oak Ridge National Laboratory
    – Dax, Sandia National Laboratories
    – PISTON, Los Alamos National Laboratory

SLIDE 6

[Diagram: VTK-m Architecture. Components: Filters, Worklets, Data Parallel Algorithms, Arrays, DataModel, Execution. Usage contexts: In-Situ and Post Processing.]
SLIDE 7

[VTK-m architecture diagram repeated.]

SLIDE 8

Gaps in Current Data Models

Point Arrangement         Cells         Coordinates
                                        Explicit            Logical            Implicit
Structured, Strided       Structured    Structured Grid     ?                  n/a
Structured, Separated     Structured    ?                   Rectilinear Grid   Image Data
Unstructured, Strided     Unstructured  Unstructured Grid   ?                  ?
Unstructured, Separated   Unstructured  ?                   ?                  ?

(? marks a combination no traditional data set type supports; n/a = not applicable)

  • Traditional data set models target only common combinations of cell and point arrangements
  • This limits their expressiveness and flexibility
SLIDE 9

Arbitrary Compositions for Flexibility

EAVL Data Set: every combination of point arrangement and coordinate type is supported.

Point Arrangement         Cells         Explicit   Logical   Implicit
Structured, Strided       Structured    ✓          ✓         ✓
Structured, Separated     Structured    ✓          ✓         ✓
Unstructured, Strided     Unstructured  ✓          ✓         ✓
Unstructured, Separated   Unstructured  ✓          ✓         ✓

  • EAVL allows clients to construct data sets from cell and point arrangements that exactly match their original data (see the sketch below)
    – In effect, this allows for hybrid and novel mesh types
  • Native data results in greater accuracy and efficiency
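To illustrate the composition idea, here is a minimal sketch (these are not EAVL's actual classes): the coordinate representation is just a type that produces a point for an index, so an algorithm written against that interface works with explicit or implicit coordinates alike, and implicit coordinates need no per-point storage, which is one source of the memory savings shown two slides below:

// Interchangeable coordinate schemes behind one interface (illustrative).
#include <array>
#include <cstdio>
#include <vector>

using Vec3 = std::array<float, 3>;

// Explicit: one stored (x,y,z) per point; general but memory-heavy.
struct ExplicitCoords {
  std::vector<Vec3> points;
  Vec3 operator()(std::size_t i) const { return points[i]; }
};

// Implicit: origin + spacing on a regular grid; no per-point storage.
struct ImplicitCoords {
  Vec3 origin, spacing;
  std::array<std::size_t, 3> dims;
  Vec3 operator()(std::size_t i) const {
    std::size_t x = i % dims[0];
    std::size_t y = (i / dims[0]) % dims[1];
    std::size_t z = i / (dims[0] * dims[1]);
    return {origin[0] + spacing[0] * x,
            origin[1] + spacing[1] * y,
            origin[2] + spacing[2] * z};
  }
};

// An algorithm written against "any coordinates" accepts either scheme.
template <typename Coords>
Vec3 centroid(const Coords& coords, std::size_t numPoints) {
  Vec3 c = {0, 0, 0};
  for (std::size_t i = 0; i < numPoints; ++i) {
    Vec3 p = coords(i);
    c[0] += p[0]; c[1] += p[1]; c[2] += p[2];
  }
  for (float& v : c) v /= static_cast<float>(numPoints);
  return c;
}

int main() {
  ImplicitCoords grid{{0, 0, 0}, {1, 1, 1}, {4, 4, 4}};
  Vec3 c = centroid(grid, 4 * 4 * 4);
  std::printf("centroid: %g %g %g\n", c[0], c[1], c[2]);  // 1.5 1.5 1.5
}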
SLIDE 10

Other Data Model Gaps Addressed in EAVL

  • Low/high dimensional data (9D mesh in GenASiS)
  • Multiple simultaneous coordinate systems (lat/lon + Cartesian xyz)
  • Multiple cell groups in one mesh (e.g. subsets, face sets, flux surfaces)
  • Non-physical data (graph, sensor, performance data)
  • Mixed topology meshes (atoms + bonds, sidesets)
  • Novel and hybrid mesh types (quadtree grid from MADNESS)

SLIDE 11

Memory Efficiency in EAVL

  • Data model designed for memory-efficient representations
    – Lower memory usage for the same mesh relative to traditional data models
    – Less data movement for common transformations leads to faster operation
  • Example: threshold data selection (35 < Density < 45)
    – 7x memory usage reduction
    – 5x performance improvement

[Charts: Memory Usage (bytes per grid cell) and Total Runtime (msec), VTK vs. EAVL, for the original data and three threshold cases.]

SLIDE 12

[VTK-m architecture diagram repeated.]

SLIDE 13

Dax: Data Analysis Toolkit for Extreme Scale

Kenneth Moreland, Sandia National Laboratories
Robert Maynard, Kitware, Inc.

SLIDE 14

Dax Framework

[Diagram: Dax Framework]
  • Control Environment (dax::cont): Grid Topology, Array Handle, Invoke
  • Execution Environment (dax::exec): Worklets with Cell Operations, Field Operations, Basic Math, Make Cells
  • Device Adapter: Allocate, Transfer, Schedule, Sort, … (sketched below)
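The device adapter is the portability seam. Here is a minimal sketch of the idea (names hypothetical, not Dax's actual API): the control environment calls an abstract Schedule/Sort, and each backend (serial, TBB/OpenMP, CUDA, …) supplies its own implementation:

// Device-adapter concept (illustrative serial backend).
#include <algorithm>
#include <cstddef>
#include <vector>

struct DeviceAdapterSerial {
  // Run functor(i) for i in [0, n). A CUDA adapter would launch a kernel;
  // a TBB/OpenMP adapter would use a parallel-for.
  template <typename Functor>
  static void Schedule(Functor functor, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) functor(i);
  }

  template <typename T>
  static void Sort(std::vector<T>& values) {
    std::sort(values.begin(), values.end());
  }
};

// A worklet-style algorithm written once against the adapter:
template <typename Device>
void doubleAll(std::vector<float>& data) {
  float* raw = data.data();
  Device::Schedule([raw](std::size_t i) { raw[i] *= 2.0f; }, data.size());
}

int main() {
  std::vector<float> v = {1, 2, 3};
  doubleAll<DeviceAdapterSerial>(v);  // v becomes {2, 4, 6}
}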

SLIDE 15

// Execution environment: the worklet runs once per input value.
struct Sine : public dax::exec::WorkletMapField
{
  typedef void ControlSignature(FieldIn, FieldOut);
  typedef _2 ExecutionSignature(_1);

  DAX_EXEC_EXPORT
  dax::Scalar operator()(dax::Scalar v) const
  {
    return dax::math::Sin(v);
  }
};

// Control environment: host-side memory management and dispatch.
dax::cont::ArrayHandle<dax::Scalar> inputHandle =
    dax::cont::make_ArrayHandle(input);
dax::cont::ArrayHandle<dax::Scalar> sineResult;

dax::cont::DispatcherMapField<Sine> dispatcher;
dispatcher.Invoke(inputHandle, sineResult);
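Note the split: the worklet holds only per-value logic, while the control environment owns memory and dispatch. Reading a result back on the host would look roughly like this; the portal API name is an assumption based on the Dax/VTK-m lineage, not something shown on the slide:

// Hedged sketch: reading the first result in the control environment.
// GetPortalConstControl() is assumed from the library's lineage.
dax::Scalar first = sineResult.GetPortalConstControl().Get(0);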

SLIDE 16

Dax Success

  • ParaView/VTK
    – Zero-copy support for vtkDataArray (see the sketch below)
    – Exposed as a plugin inside ParaView
  • Will fall back to the CPU version

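A minimal sketch of the zero-copy idea, assuming Dax's make_ArrayHandle has a pointer/length overload that references rather than copies memory, and that dax::Scalar is single-precision float (both assumptions; the slide does not show the exact API):

// Wrap an existing VTK buffer in a Dax ArrayHandle without copying.
#include <vtkFloatArray.h>

vtkFloatArray* vtkArray = vtkFloatArray::New();
vtkArray->SetNumberOfComponents(1);
vtkArray->SetNumberOfTuples(1024);

// Assumption: make_ArrayHandle(ptr, n) references the memory, no copy.
dax::Scalar* raw = static_cast<dax::Scalar*>(vtkArray->GetVoidPointer(0));
dax::cont::ArrayHandle<dax::Scalar> handle =
    dax::cont::make_ArrayHandle(raw, vtkArray->GetNumberOfTuples());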

SLIDE 17

Dax Success

  • TomViz: an open, general S/TEM visualization tool
    – Built on top of the ParaView framework
    – Operates on large (1024³ and greater) volumes
    – Uses Dax for algorithm construction
  • Implements streaming, interactive, incremental contouring
    – Streams indexed sub-grids to threaded contouring algorithms


SLIDE 18

[VTK-m architecture diagram repeated.]

SLIDE 19
PISTON

  • Focuses on developing data-parallel algorithms that are portable across multi-core and many-core architectures for use by LCF codes
  • … of interest
  • Algorithms are integrated into LCF codes in-situ, either directly or through integration with ParaView Catalyst

[Figures: PISTON isosurface with curvilinear coordinates; ocean temperature isosurface generated across four GPUs using distributed PISTON; PISTON integration with VTK and ParaView.]

SLIDE 20
Distributed Parallel Halo Finder

  • Particles are distributed among processors according to a decomposition of the physical space
  • Overload zones (where particles are assigned to two processors) are defined such that every halo will be fully contained within at least one processor
  • Each processor finds halos within its domain: drop in PISTON multi-/many-core accelerated algorithms
  • At the end, the parallel halo finder performs a merge step to handle “mixed” halos (shared between two processors), such that a unique set of halos is reported globally (the sketch below illustrates the core grouping step)
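For readers unfamiliar with halo finding, here is a minimal serial sketch of the friends-of-friends (FOF) grouping step using union-find: particles closer than a linking length belong to one halo. PISTON's version is a data-parallel, distributed reformulation of the same idea; this sketch (with an O(n²) pair loop for brevity) only makes the grouping concrete:

// Toy friends-of-friends halo grouping via union-find (illustrative).
#include <array>
#include <cmath>
#include <cstdio>
#include <numeric>
#include <vector>

struct UnionFind {
  std::vector<int> parent;
  explicit UnionFind(int n) : parent(n) {
    std::iota(parent.begin(), parent.end(), 0);
  }
  int find(int x) {
    while (parent[x] != x) x = parent[x] = parent[parent[x]];
    return x;
  }
  void unite(int a, int b) { parent[find(a)] = find(b); }
};

int main() {
  // Toy particle positions (x, y, z) and linking length b.
  std::vector<std::array<float, 3>> p = {
      {0, 0, 0}, {0.1f, 0, 0}, {0.2f, 0.1f, 0}, {5, 5, 5}, {5.1f, 5, 5}};
  const float b = 0.3f;

  UnionFind uf(static_cast<int>(p.size()));
  for (std::size_t i = 0; i < p.size(); ++i)
    for (std::size_t j = i + 1; j < p.size(); ++j) {
      float dx = p[i][0] - p[j][0], dy = p[i][1] - p[j][1],
            dz = p[i][2] - p[j][2];
      if (std::sqrt(dx * dx + dy * dy + dz * dz) < b) uf.unite(i, j);
    }

  for (std::size_t i = 0; i < p.size(); ++i)
    std::printf("particle %zu -> halo %d\n", i, uf.find(static_cast<int>(i)));
  // Two halos result: {0, 1, 2} and {3, 4}.
}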

SLIDE 21
Distributed Parallel Halo Finder (continued)

  • This test problem has ~90 million particles per process.
  • Due to memory constraints on the GPUs, we utilize a hybrid approach in which the halos are computed on the CPU but the centers on the GPU.
  • The PISTON MBP center-finding algorithm requires much less memory than the halo-finding algorithm but provides the large majority of the speed-up, since MBP center finding takes much longer than FOF halo finding with the original CPU code.

Performance Improvements

  • On Moonlight with 1024³ particles on 128 nodes with 16 processes per node, PISTON on GPUs was 4.9x faster for halo + most bound particle center finding
  • On Titan with 1024³ particles on 32 nodes with 1 process per node, PISTON on GPUs was 11x faster for halo + most bound particle center finding
  • Implemented a grid-based most bound particle center finder using a Poisson solver that performs fewer total computations than the standard O(n²) algorithm

Science Impact

  • These performance improvements allowed halo analysis to be performed on a very large 8192³ particle data set across 16,384 nodes on Titan, for which analysis using the existing CPU algorithms was not feasible

Publications

  • Submitted to PPoPP ’15: “Utilizing Many-Core Accelerators for Halo and Center Finding within a Cosmology Simulation,” Christopher Sewell, Li-ta Lo, Katrin Heitmann, Salman Habib, and James Ahrens

SLIDE 22

Results: Visual comparison of halos

[Figure: side-by-side halo visualizations, Original Algorithm vs. VTK-m Algorithm.]

SLIDE 23

[VTK-m architecture diagram repeated.]

Questions?