Roadmap for Many-core Visualization Software in DOE Jeremy Meredith - PowerPoint PPT Presentation

Roadmap for Many-core Visualization Software in DOE Jeremy Meredith Oak Ridge National Laboratory

Supercomputers! • Supercomputer Hardware Advances Everyday – More and more parallelism • High-Level Parallelism – “The Free Lunch Is Over” (Herb Sutter)

VTK-m Project • Combines the strengths of multiple projects: – EAVL, Oak Ridge National Laboratory – DAX, Sandia National Laboratory – PISTON, Los Alamos National Laboratory

VTK-m Goals • A single place for the visualization community to collaborate, contribute, and leverage massively threaded algorithms. • Reduce the challenges of writing highly concurrent algorithms by using data parallel algorithms

VTK-m Goals • Make it easier for simulation codes to take advantage these parallel visualization and analysis tasks on a wide range of current and next-generation hardware.

VTK-m Architecture Filters Post Processing Data Model Worklets In-Situ Data Parallel Algorithms Execution Arrays

Extreme-scale Analysis and Visualization Library ( EAVL ) EAVL enables advanced visualization and analysis for the next generation of scientific simulations, supercomputing systems, and end-user analysis tools. New Mesh Layouts Greater Memory Efficiency • More accurately represent simulation • Support future low-memory systems data in analysis results • Minimize data movement and • Support novel simulation applications transformation costs Parallel Algorithm Framework In Situ Support • Accelerator-based system support • Direct zero-copy mapping of data from simulation to analysis codes • Pervasive parallelism for multi-core • Heterogeneous processing models and many-core processors J.S. Meredith, S. Ahern, D. Pugmire, R. Sisneros, "EAVL: The Extreme-scale Analysis and Visualization http://ft.ornl.gov/eavl Library", Eurographics Symposium on Parallel Graphics and Visualization (EGPGV), 2012.

Gaps in Current Data Models • Traditional data set models target only common combinations of cell and point arrangements • This limits their expressiveness and flexibility Point Arrangement Cells Coordinates Explicit Logical Implicit Hybrid ? ? Strided Structured Grid Image ? ? Structured Separated Rectilinear Grid Data ? ? ? Hybrid ? ? Strided Unstructured Grid ? ? ? ? Unstructured Separated ? ? ? Hybrid

Arbitrary Compositions for Flexibility • EAVL allows clients to construct data sets from cell and point arrangements that exactly match their original data – In effect, this allows for hybrid and novel mesh types • Native data results in greater accuracy and efficiency Point Arrangement Cells Coordinates Explicit Logical Implicit Hybrid     Strided EAVL     Structured Separated     Hybrid     Strided Data Set     Unstructured Separated     Hybrid

Other Data Model Gaps Addressed in EAVL A B Multiple simultaneous Low/high dimensional data Multiple cell groups in one mesh coordinate systems (9D mesh in GenASiS) (E.g. subsets, face sets, flux surfaces) (lat/lon + Cartesian xyz) H H C C H H Non-physical data (graph, Novel and hybrid mesh types Mixed topology meshes sensor, performance data) (quadtree grid from MADNESS) (atoms + bonds, sidesets)

Memory Efficiency in EAVL • Data model designed for memory efficient representations Memory Usage VTK EAVL – Lower memory usage for same mesh relative to 128 64 Bytes per Crid Cell traditional data models 32 – Less data movement for common transformations leads 16 8 to faster operation 4 2 • Example: threshold data selection 1 Original Threshold Threshold Threshold – 7x memory usage reduction Data (a) (b) (c) – 5x performance improvement Total Runtime VTK EAVL 16 Runtime (msec) 8 4 2 1 35 < Density < 45 Cells Remaining

Tightly Coupled In Situ with EAVL • Efficient in situ visualization and analysis – light weight, zero-dependency library – zero-copy references to host simulation – heterogeneous memory support for accelerators – flexible data model supports non-physical data types • Example: scientific and performance visualization, tightly coupled EAVL with SciDAC Xolotl plasma/surface simulation In Situ Scientific Visualization with Xolotl and EAVL In Situ Performance Visualization with Xolotl and EAVL Species concentrations across grid Cluster concentrations at 2.5mm Solver time for each MPI task Solver time at each time step

Loosely coupled In Situ with EAVL • Application de-coupled from visualization using ADIOS and Data Spaces – EAVL plug-in reads data from staging nodes – System nodes running EAVL perform visualization operations and rendering • Example: field and particle data, EAVL in situ with XGC SciDAC simulation via ADIOS and Data Spaces Visualization of XGC field data from running simulation HPC Application Vis/Analysis ( EAVL ) ADIOS ADIOS Staging ( Data Spaces ) Visualization of XGC particles from running simulation. All Supercomputer node layout for loosely coupled EAVL in situ particles (left), and selected subset of particles (right).

Data Parallelism in EAVL • Algorithm development framework in EAVL Runtimes for Surface Normal Operation 160 µs combines productivity with pervasive parallelism 140 µs 120 µs 100 µs – Data parallel primitives map functors onto 80 µs 60 µs 40 µs mesh-aware iteration patterns 20 µs 0 µs Intel AMD OpenMP NVIDIA NVIDIA NVIDIA • Example: surface normal operation Xeon Opteron 4xAMD GeForce Tesla Tesla E5520 8356 8356 8800GTX C1060 C2050 – strong performance scaling on multi-core and many-core devices Performance Scaling on Xeon Phi (CPU, GPU, MIC/KNF ) Parallel Efficiency Relative Runtime 100 % 90 % 80 % Publications: 70 % 60 % • D. Pugmire , J. Kress, J.S. Meredith, N. Podhorszki, J. Choi, S. Klasky, “Towards Scalable Visualization Plugins for Data 50 % 40 % Staging Workflows”, 5th International Workshop on Big Data Analytics: Challenges and Opportunities (BDAC), 2014. 30 % 20 % • C. Sewell, J.S. Meredith, K. Moreland, T. Peterka, D. DeMarle, L.-T. Lo, J. Ahrens, R. Maynard, B. Geveci, "The SDAV 10 % Software Frameworks for Visualization and Analysis on Next-Generation Multi-Core and Many-Core Architectures", 0 % 2 4 8 16 32 64 128 Seventh Workshop on Ultrascale Visualization (UltraVis), 2012. Number of Threads • J.S. Meredith, R. Sisneros, D. Pugmire, S. Ahern, "A Distributed Data-Parallel Framework for Analysis and Visualization Algorithm Development", Workshop on General Purpose Processing on Graphics Processing Units (GPGPU5), 2012. • J.S. Meredith, S. Ahern, D. Pugmire, R. Sisneros, "EAVL: The Extreme-scale Analysis and Visualization Library", Eurographics Symposium on Parallel Graphics and Visualization (EGPGV), 2012.

Advanced Rendering in EAVL • Advanced rendering capabilities – raster/vector, ray tracing, volume rendering – all GPU accelerated using EAVL’s data parallel API – parallel rendering support via MPI and IceT • Examples: ambient occlusion lighting effects highlight subtle shape cues for scientific understanding • Example: direct volume rendering achieves high accuracy images with GPU-accelerated performance Shear-wave perturbations in Ebola glycoprotein with Direct volume rendering from SPECFEM3D_GLOBAL code proteins from survivor Shepard global interpolant

Dax: Data Analysis Toolkit for Extreme Scale Kenneth Moreland Sandia National Laboratories Robert Maynard Kitware, Inc.

Dax Success • ParaView/VTK – Zero-copy support for vtkDataArray – Exposed as a plugin inside ParaView • Will fall back to cpu version 19

Dax Success • TomViz: an open, general S/TEM visualization tool – Built on top of ParaView framework – Operates on large (1024 3 and greater) volumes – Uses Dax for algorithm construction • Implements streaming, interactive, incremental contouring – Streams indexed sub-grids to threaded contouring algorithms 20

dax::cont::ArrayHandle<dax::Scalar> inputHandle = dax::cont::make_ArrayHandle(input); dax::cont::ArrayHandle<dax::Scalar> sineResult; dax::cont::DispatcherMapField<Sine> dispatcher; dispatcher.Invoke(inputHandle, sineResult); Control Environment Execution Environment struct Sine: public dax::exec::WorkletMapField { typedef void ControlSignature(FieldIn, FieldOut); typedef _2 ExecutionSignature(_1); DAX_EXEC_EXPORT dax::Scalar operator()(dax::Scalar v) const { return dax::math::Sin(v); } };

Results: Visual comparison of halos Original Algorithm PISTON Algorithm

Piston • Focuses on developing data-parallel algorithms that are portable across multi-core and many-core architectures for use by LCF codes of interest • Algorithms are integrated into LCF codes in-situ either directly or though integration with ParaView Catalyst Ocean temperature isosurface generated PISTON isosurface with curvilinear PISTON integration with VTK and across four GPUs using distributed PISTON coordinates ParaView

Roadmap for Many-core Visualization Software in DOE Jeremy Meredith - PowerPoint PPT Presentation

Roadmap for Many-core Visualization Software in DOE Jeremy Meredith Oak Ridge National Laboratory Supercomputers! Supercomputer Hardware Advances Everyday More and more parallelism High-Level Parallelism The Free Lunch Is

Welcome Welcome Core: Core A Regional Destination Core: Core UL Core: Core Downtown

Caching, Parallelism, Fault Tolerance Marco Serafini COMPSCI 532 Lectures 2-3 Memory Hierarchy

ICCA/IEA/DECHEMA Roadmap Catalysis ICCA/IEA/DECHEMA Roadmap Catalysis ICCA/IEA/DECHEMA Roadmap

FIFE Roadmap Workshop Mike Kirby FIFE Roadmap Workshop Dec 5, 2017 FIFE Roadmap Workshop The

Toward Efficient Many-to-Many Broadcast in Dynamic Wireless Networks Fabian Mager , Carsten

Defining Encryption Lecture 2 1 Roadmap 2 Roadmap First, Symmetric Key Encryption 2 Roadmap

Roadmap: Transition to College Professor Doug Szajda Roadmap, Fall 2017 Welcome This is my

Partnering with Missouri Communities: Roadmap to Resilience Webinar 2: Roadmap Action Steps 1-3

Motivation Memory is a shared resource Core Core Memory Core Core Threads requests

PSHE curriculum Robert Willmott Core Themes Core Theme 1: Health and Core Theme 2: Core Theme

Final Assembly Chip Core Your final project chip consists of a core The Chip Core is

unification 2016 unification strategic roadmap succession unification strategic roadmap

Comparing P2P Systems Anthony D. Joseph John Kubiatowicz CS294-4 Why so many systems? Many

Real-Time Multi/Many-Core Architecture Heechul Yun 1 Real-Time Multi/Many-Core Architecture

Software Sustainability in the Many-Core Era Jonas Thies > Software Sustainability in the

Many-core Computing Many-core Computing Can compilers and tools do the Can compilers and tools

I nterOperability am ong Grids: A Case Study w ith GARUDA & EGEE Grids Sham jith K. V.

Some Mathematical Challenges from Life Sciences Part I Peter Schuster, Universitt Wien Peter

Signal Detection I m aging in the SEM Images are formed because of the beam interactions that

2017 Background of RCI Business Created in March, 2004 Left John Deere in February

EPOCH Microplate Spectrophotometers 2012.Sep. BioTeks product range: microplate

Gregory Fripp Founder and Executive Director www.whisperingroots.org 402-321-7228

The DelsaMax system Pre-Course Reading material DelsaMax PRO DelsaMax Return to I ndex Service

Information Services and Technology Community Forum Friday, April 28, 2006 1 Agenda and

Roadmap for Many-core Visualization Software in DOE Jeremy Meredith - PowerPoint PPT Presentation

Roadmap for Many-core Visualization Software in DOE Jeremy Meredith Oak Ridge National Laboratory Supercomputers! Supercomputer Hardware Advances Everyday More and more parallelism High-Level Parallelism The Free Lunch Is

Welcome Welcome Core: Core A Regional Destination Core: Core UL Core: Core Downtown

Caching, Parallelism, Fault Tolerance Marco Serafini COMPSCI 532 Lectures 2-3 Memory Hierarchy

ICCA/IEA/DECHEMA Roadmap Catalysis ICCA/IEA/DECHEMA Roadmap Catalysis ICCA/IEA/DECHEMA Roadmap

FIFE Roadmap Workshop Mike Kirby FIFE Roadmap Workshop Dec 5, 2017 FIFE Roadmap Workshop The

Toward Efficient Many-to-Many Broadcast in Dynamic Wireless Networks Fabian Mager , Carsten

Defining Encryption Lecture 2 1 Roadmap 2 Roadmap First, Symmetric Key Encryption 2 Roadmap

Roadmap: Transition to College Professor Doug Szajda Roadmap, Fall 2017 Welcome This is my

Partnering with Missouri Communities: Roadmap to Resilience Webinar 2: Roadmap Action Steps 1-3

Motivation Memory is a shared resource Core Core Memory Core Core Threads requests

PSHE curriculum Robert Willmott Core Themes Core Theme 1: Health and Core Theme 2: Core Theme

Final Assembly Chip Core Your final project chip consists of a core The Chip Core is

unification 2016 unification strategic roadmap succession unification strategic roadmap

Comparing P2P Systems Anthony D. Joseph John Kubiatowicz CS294-4 Why so many systems? Many

Real-Time Multi/Many-Core Architecture Heechul Yun 1 Real-Time Multi/Many-Core Architecture

Software Sustainability in the Many-Core Era Jonas Thies &gt; Software Sustainability in the

Many-core Computing Many-core Computing Can compilers and tools do the Can compilers and tools

I nterOperability am ong Grids: A Case Study w ith GARUDA &amp; EGEE Grids Sham jith K. V.

Some Mathematical Challenges from Life Sciences Part I Peter Schuster, Universitt Wien Peter

Signal Detection I m aging in the SEM Images are formed because of the beam interactions that

2017 Background of RCI Business Created in March, 2004 Left John Deere in February

EPOCH Microplate Spectrophotometers 2012.Sep. BioTeks product range: microplate

Gregory Fripp Founder and Executive Director www.whisperingroots.org 402-321-7228

The DelsaMax system Pre-Course Reading material DelsaMax PRO DelsaMax Return to I ndex Service

Information Services and Technology Community Forum Friday, April 28, 2006 1 Agenda and

Software Sustainability in the Many-Core Era Jonas Thies > Software Sustainability in the

I nterOperability am ong Grids: A Case Study w ith GARUDA & EGEE Grids Sham jith K. V.