Data Management, In-Situ Workflows and Extreme Scales Manish - PowerPoint PPT Presentation

Data Management, In-Situ Workflows and Extreme Scales Manish Parashar, Ph.D . Director, Rutgers Discovery Informatics Institute RDI 2 Distinguished Professor, Department of Computer Science Philip Davis, Shaohua Duan, Yubo Qin, Melissa Romanus, Pradeep Subedi, Zhe Wang ROSS 2018 @ HPDC’18, Tempe, AZ, USA June 12, 2018

Outline • Extreme scale simulation-based science – opportunities and challenges • Rethinking the simulations-to-insights pipeline: Data staging and in-Situ workflows • Runtime management for data staging and in-situ workflows – Data placement – Resilience • Conclusion

Science / Society Transformed by Compute & Data • computation & da The sc scientific process ss has evolved to include co data • Nearly every field of discovery is transitioning from “data poor” to “ da data rich ” Oceanography Neuroscience: EEG, fMRI OOI Crisis Management POS Terminals Economics: KSTAR Fusion: Sociology: The Web Physics: LHC Internet of Things Biology: Sequencing Astronomy: LSST Personalized Medicine

Moving Aggressively Towards Exascale • Create systems that can apply exaflops of computing power to exabytes of data • Improve HPC application developer productivity • Establish hardware technology for future HPC systems • ….

Moving Aggressively Towards Exascale

Moving Aggressively Towards Exascale Source: Hyperion (IDC) Paints a Bullish Picture of HPC Future By John Russell

Extreme Scales => Extreme Challenges • Exponential increase in parallelism • Extreme core counts, concurrency • Diversity in emerging memory and storage technologies • New memory technologies • Increasing performance gap between memory and disks • Growing data volumes, increasing in data costs • Data access costs vary widely with the location • Variability and heterogeneity in data movement cost (performance, energy) • Increasingly heterogeneous machine architecture • Complex CPU + accelerator architectures • Proliferation of accelerators • Diverse and complex application/user requirements • Complex application workflows; complex mapping onto heterogeneous systems • Large numbers of domain scientists and non-experts • Reliability, energy efficiency, correctness, ….

RDI 2 Scientific Discovery through Simulations: A BigData Problem • Scientific simulations running on current high-end computing systems generate huge amounts of data! – If a single core produces 2MB/minute on average, one of these machines could generate simulation data between ~ 170TB per hour -> ~ 700PB per day -> ~ 1.4EB per year • Successful scientific discovery depends on a comprehensive understanding of this enormous simulation data How we enable the computation scientists to efficiently manage and explore extreme scale data: “find the needles in haystack” ??

RDI 2 Traditional Simulation -> Insight Workflows Break Down Storage Servers n o i t a l u m i S a t a D w a R Analysis/Visualization Simulation Cluster Machines … Figure. Traditional data analysis pipeline – Perform data manipulations and • Traditional simulation -> insight analysis on mid-size clusters pipeline – Collect experimental / observational – Run large-scale simulation data workflows on large supercomputers – Move to analysis sites – Dump data to parallel disk systems – Perform comparison of – Export data to archives experimental/observational to – Move data to users’ sites – usually validate simulation data selected subsets

The Cost of Data Movement • The energy cost of moving data is a • Moving data between node significant concern memory and persistent storage is slow! performance gap bitrate*length 2 Energy_move_data = cross_section_area_of_wire K. Yelick, “Software and Algorithms for Exascale: Ten Ways to Waste an Exascale Computer”

We need to Rethink extreme scale simulation workflows! Traditional data analysis pipeline The costs of data movement (power and performance) are increasing and dominating! We need to Rethink extreme scale simulation workflows! – Reduce data movement – Move computation/analytics closer to the data – Add value to simulation data along the IO path In-situ workflows, In-transit processing

Some Recent Research Addressing In-Situ • Swift/T • SuperGlue – Workflow coordination, all – Standardizing glue components for applications share an MPI context, HPC workflows which is split by an execution • Landrush wrapper – Leverage heterogeneous compute • Catalyst and Libsim node resources like GPUs to run – Embed analysis/viz in simulation in-situ workflow processes using time division. • Damaris • ADIOS – Leverages dedicated cores in – Flexible I/O abstractions for end to multicore nodes to offload data end data pipelines management tasks • FlowVR • Mercury – Independent task coordination – RPC and bulk message passing across processes. across applications. • Decaf • FlexPath – Decoupled dataflow middleware – Communication between MPI for in-situ workflows applications using a reliable transport. • Bredala – Semantic data redistribution of • DataSpaces complex data structures for in-situ applications

RDI 2 Rethinking the Data Management Pipeline – Hybrid Staging + In-Situ & In-Transit Execution • Reduce data movement • Move computation/analytics closer to the data source • Process, transform data along the data path

DataSpaces: Extreme Scale Data Staging Service The DataSpaces Abstraction Virtual shared-space programming abstraction l Simple API for coordination, interaction and messaging l Distributed, associative, in-memory object store l Online data indexing, flexible querying l Adaptive cross-layer runtime management l Hybrid in-situ/in-transit execution l Efficient, high-throughput/low-latency asynchronous data transport l

The DataSpaces Staging Abstraction • In-memory storage distributed across set of cores/node • In-staging data processing, querying sharing and exchange Runtime data coupling Data staging Online data analysis and processing 15

Design Space for Staging • Location of the compute resources Analysis Tasks DRAM DRAM Simulation – Same cores as the simulation (in situ) Simulation Node Visualization – Some (dedicated) cores on the same nodes Network Communication Sharing cores with the simulation – Some dedicated nodes on the same machine DRAM DRAM – Dedicated nodes on an external resource Staging Node Processing data on remote nodes Using distinct cores on same node • Data access, placement, and persistence – Direct access to simulation data structures Node N ... – Shared memory access via hand-off / copy CPUs – Shared memory access via non-volatile near Node 2 Node 1 Staging option 1 node storage (NVRAM) DRAM Staging option 2 – Data transfer to dedicated nodes or external CPUs NVRAM resources Staging option 3 SSD DRAM • Synchronization and scheduling Hard Disk NVRAM – Execute synchronously with simulation SSD every n th simulation time step Hard Disk – Execute asynchronously Network 16

Extreme Scale Storage Hierarchies: Devices SRAM: Latency ~1X DRAM: Latency ~10X 3D-RAM: Latency ~100X NAND-SSDs: Latency ~100,000X Disks: Latency ~10 MillionX 17

Extreme Scale Storage Hierarchies: Architectures • Non-volatile memory attached to nodes or to burst-buffer nodes • Storage nodes accessed via PFS (Lustre) or object stores (DAOS) 18

Time-Sensitivity of Data Storage in Scientific Workflows Credit: Gary Grider LANL 19

Outline • Extreme scale simulation-based science – opportunities and challenges • Rethinking the simulations-to-insights pipeline: Data staging and in-Situ workflows • Runtime management for data staging and in-situ workflows – Data placement – Resilience • Conclusion 20

In-Staging Data Management • Limited DRAM capacity, and decreasing bandwidth vs. increasing data size -- need to use multiple memory levels for staging • Effectiveness of staging is sensitive to the data placement across the staging cores/nodes and the levels of the memory hierarchy – Data access latency can significant impact the overall performance of the workflows • Efficient data placement can be challenging because of the complex and dynamic data exchange/access patterns exhibited by the different components of the workflow, and by different workflows

Example: Managing Multi-tiered Data Staging in DataSpaces A multi-tiered data staging approach that leverages both DRAM and SSD to • support code coupling and data management in data-intensive simulation workflows. Efficient utility-based application-aware data placement mechanism • -- Application-aware: utilizing temporal and spatial data access attributes -- Adaptive: placing data objects dynamically based on data read patterns

Data Management, In-Situ Workflows and Extreme Scales Manish - PowerPoint PPT Presentation

Data Management, In-Situ Workflows and Extreme Scales Manish Parashar, Ph.D . Director, Rutgers Discovery Informatics Institute RDI 2 Distinguished Professor, Department of Computer Science Philip Davis, Shaohua Duan, Yubo Qin, Melissa Romanus,

RECENT USES OF IN SITU STABILIZATION, IN SITU CHEMICAL OXIDATION, AND IN SITU CHEMICAL

Momentum i i Filtered Filtered = Momentum v f x G

Extreme Heat Preparedness Objectives What is extreme heat ? How does it impact SF? What are the

2014: Extreme territories 2 2015: Extreme territories 3 2016: Extreme territories 4 2018:

Importing data Peter Humburg Statistician, Macquarie University DataCamp ChIP-seq Workflows in

Waste Data Automation Alan Housley Vice President Marketing / LoadMan On-Board Truck Scales

Waste Data Automation Alan Housley Vice President Marketing / LoadMan On-Board Truck Scales

MATHEMATICS 1 CONTENTS Extreme values in one dimension Extreme values in two dimensions

An Example of Index An Example of Index pattern of structure in indicators pattern of structure

Spring Scales Theyre only accurate when everything is at rest Turn off all electronic

Extreme Value Theory in Risk Management See McNeil, Extreme Value Theory for Risk Managers Risk

Convergence of computation and data workflows IS-ENES Workshop on Workflows and Metadata

Workflows Description, Workflows Description, Enactment and Monitoring in Enactment and

In Situ I/O Processing: A Case for In Situ I/O Processing: A Case for Location Flexibility

An in situ sediment sound speed An in situ sediment sound speed measurement platform:

Ex-situ and in-situ studies of radiation damage mechanisms in Zr-Nb alloys Junliang Liu 1 , Guanze

In Situ Measurements of Jet Energy Scale in ATLAS Doug Schouten, Andres Tanasiczjuk, and Mike

In-situ MapReduce for Log Processing Dionysios Logothe9s,

In Situ Visualization using VisIt Brad Whitlock Jean M. Favre Jeremy S. Meredith Lawrence

HUMAN-COMPUTER CO-CREATION Anna Kantosalo Matemaattis-luonnontieteellinen tiedekunta CC-2017

Enabling Precision W and Z Physics at ILC with In-Situ Center-of-Mass Energy Measurements (plus

Discovery of Genomic Structural Variations with Next-Generation Sequencing Data Advanced Topics

Against a feature driven view of wh -movement Hadas Kotek McGill University DGfS workshop

CS591 progress bar Storage Layouts NoSQL Engines Rows vs Cols vs Hybrid LSM-Trees Distributed