Exa-DM: Enabling Scientific Discovery in Exascale Simulations

SLIDE 1

Exa-DM: Enabling Scientific Discovery in Exascale Simulations

Jeremy Iverson (1,2), Ya Ju Fan (1), George Karypis (2), Chandrika Kamath (1)

(1) Lawrence Livermore National Laboratory
(2) University of Minnesota

DOE Exascale Research Conference, October 2012

LLNL-PRES-584212

This work was performed under the auspices of the U.S. Department of Energy by Lawrence Livermore National Laboratory under Contract DE-AC52-07NA27344.

Chandrika Kamath (LLNL) Exa-DM: Scientific Discovery at the Exascale 1 / 20

SLIDE 2

Outline

1. Motivation
2. In-situ analysis
3. Compression using graph-based clustering
4. Compressed sensing
5. Next steps

SLIDE 3

Motivation

How will analysis of simulation output change at the exascale?

SLIDE 4

Motivation

Present: write out the simulation output for analysis

Problem: identify and track coherent structures in plasma turbulence.
64 poloidal planes, 600,000 grid points, 10 variables per grid point
8000 time steps, output every 5 time steps
Unstructured mesh, 2.5 TB of data

Need a single analysis algorithm and parameters for all time steps. The results may be unexpected...
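As a rough check on the 2.5 TB figure quoted above, the output size can be estimated from the numbers on this slide. The sketch below rests on two assumptions that the slide does not state: that the 600,000 grid points are per poloidal plane, and that each value is stored in 4 bytes (single precision). The constant and function names are illustrative only.

```python
# Values taken from the slide.
POLOIDAL_PLANES = 64
GRID_POINTS = 600_000        # assumption: grid points per poloidal plane
VARIABLES_PER_POINT = 10
TIME_STEPS = 8000
OUTPUT_EVERY = 5

# Assumption: single-precision storage; the slide does not give the precision.
BYTES_PER_VALUE = 4


def output_size_bytes():
    """Estimate the total bytes written over the whole run."""
    outputs = TIME_STEPS // OUTPUT_EVERY  # number of time steps written out
    values_per_output = POLOIDAL_PLANES * GRID_POINTS * VARIABLES_PER_POINT
    return outputs * values_per_output * BYTES_PER_VALUE


total = output_size_bytes()
print(f"{total / 1e12:.2f} TB")  # prints "2.46 TB"
```

Under these assumptions the estimate lands close to the quoted 2.5 TB. Double-precision values would roughly double it.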

SLIDE 5

Motivation

The present approach will not work at the exascale

[Figure: Present vs. Future]

Exa-DM: Find ways of intelligently reducing the size of the output so we can still enable scientific discovery at the exascale.

SLIDE 6

Motivation

We are investigating several different solutions

Move the analysis in-situ, but need to
    know which analysis algorithm and parameters to use
    modify algorithms: low memory sizes and high cost of data movement
    co-exist with the simulation

Exploit similarity between coherent structures and clustering
Create general reduced representations, such as compressive sensing

We consider the problem of detection and tracking of coherent structures, which occurs in fusion, materials science, combustion, and other domains.

A collaboration with Zhihong Lin (UCI, GSEP SciDAC PI) and Sean Garrick (UMN), who provide data and domain expertise in fusion and chemically reacting flows, respectively.
SLIDE 7

In-situ analysis

An in-situ version of a threshold-based algorithm

SLIDE 8

In-situ analysis

Threshold-based algorithm to extract coherent structures

Calculate and apply the threshold
Extract the structures using connected component analysis

[Figure: given input and desired output]
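The two steps above can be sketched in serial form as follows. This is a minimal illustration, not the authors' implementation. It assumes a regular 2D grid with 4-connectivity, whereas the fusion data described earlier live on an unstructured mesh. It also takes the threshold as an input because the slides do not say how it is calculated. The function name `extract_structures` is hypothetical.

```python
from collections import deque


def extract_structures(field, threshold):
    """Label connected regions of a 2D grid whose values exceed `threshold`.

    Returns (labels, count), where labels[r][c] is 0 for background cells
    and 1..count for cells belonging to an extracted structure.
    """
    rows, cols = len(field), len(field[0])
    labels = [[0] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            # Skip cells below the threshold and cells already labeled.
            if field[r][c] <= threshold or labels[r][c]:
                continue
            # Start a new structure and flood-fill it breadth-first.
            count += 1
            labels[r][c] = count
            queue = deque([(r, c)])
            while queue:
                i, j = queue.popleft()
                for ni, nj in ((i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1)):
                    if (0 <= ni < rows and 0 <= nj < cols
                            and not labels[ni][nj]
                            and field[ni][nj] > threshold):
                        labels[ni][nj] = count
                        queue.append((ni, nj))
    return labels, count


field = [
    [0.9, 0.8, 0.1, 0.0],
    [0.1, 0.7, 0.1, 0.6],
    [0.0, 0.1, 0.1, 0.9],
]
labels, count = extract_structures(field, threshold=0.5)
print(count)  # prints 2
```

On an unstructured mesh, the neighbor loop would presumably walk the mesh adjacency lists instead of grid offsets. The rest of the flood fill stays the same.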

SLIDE 9

In-situ analysis

Parallel connected component analysis

Identify local connected components
Make local labels globally unique
Exchange ghost labels and identify cross-PE merges
All-to-all exchange of merges
Local playback of merges

Schematic of connected components in parallel†

Work-in-progress: We are investigating variants of the basic method and analyzing their scalability.
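To make the five steps listed on this slide concrete, here is a small serial simulation of them. It is a sketch under simplifying assumptions, not the method from the cited paper or the authors' code. The domain is 1D and already thresholded into True/False cells, each "PE" owns one contiguous chunk, and the all-to-all exchange is modeled by simply building one shared list of merges. All function names are hypothetical.

```python
def local_components(mask):
    """Step 1: label runs of True cells in one PE's chunk (0 = background)."""
    labels, current, prev = [], 0, False
    for cell in mask:
        if cell and not prev:
            current += 1  # a new run of True cells starts here
        labels.append(current if cell else 0)
        prev = cell
    return labels, current


def find(parent, x):
    """Union-find root lookup with path halving."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def parallel_label(chunks):
    # Step 1: every PE labels its own chunk independently.
    local = [local_components(chunk) for chunk in chunks]

    # Step 2: offset the local labels so that they are globally unique.
    offset, unique = 0, []
    for labels, n in local:
        unique.append([lab + offset if lab else 0 for lab in labels])
        offset += n

    # Step 3: compare boundary ("ghost") cells of neighboring PEs and
    # record a merge whenever a structure crosses the PE boundary.
    merges = []
    for left, right in zip(unique, unique[1:]):
        if left and right and left[-1] and right[0]:
            merges.append((left[-1], right[0]))

    # Step 4: all-to-all exchange, modeled here by the shared `merges` list.
    # Step 5: each PE replays the merges and relabels its own cells.
    parent = list(range(offset + 1))
    for a, b in merges:
        ra, rb = find(parent, a), find(parent, b)
        parent[max(ra, rb)] = min(ra, rb)
    return [[find(parent, lab) if lab else 0 for lab in labels]
            for labels in unique]


chunks = [
    [True, True, False, True],   # PE 0
    [True, False, True, True],   # PE 1
    [True, False, False, True],  # PE 2
]
print(parallel_label(chunks))
# prints [[1, 1, 0, 2], [2, 0, 4, 4], [4, 0, 0, 6]]
```

In a real distributed run each PE would hold only its own chunk, and steps 3 and 4 would be communication calls. The union-find replay in step 5 is one common way to apply the merges, chosen here for brevity.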

† Schematic adapted from C. Harrison, H. Childs, and K.P. Gaither, “Data-parallel mesh connected components labeling and analysis,” Eurographics Symposium on Parallel Graphics and Visualization, 2011.
SLIDE 10

Compression using graph-based clustering

Fast algorithms for compression using graph-based clustering

SLIDE 11

Compression using graph-based clustering

Goal: reduce the data required for accurate reconstruction

Exploit the grid topology and local smoothness
Model the grid as a graph
Decompose graph into sets of vertices satisfying an error constraint
Represent each set of vertices by a single value
Encode sets and their representative values
Compress encoding using auxiliary compression program

Set-based (use only values)
Region-based (also use topology)
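The set-based idea (group vertices by value alone, then store one representative per set) can be illustrated with a simple greedy grouping. This is only a sketch under stated assumptions, not the algorithm from the authors' paper. It assumes the error constraint is a maximum absolute error per vertex, it forms sets by scanning the values in sorted order, and it uses the midpoint of each set as the representative. The names `set_based_compress` and `reconstruct` are hypothetical.

```python
def set_based_compress(values, max_error):
    """Group vertex values into sets whose spread is at most 2 * max_error.

    Returns (representatives, assignment): one representative value per set
    and, for each vertex, the index of the set it belongs to.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    assignment = [0] * len(values)
    representatives = []
    start = 0
    while start < len(order):
        low = values[order[start]]
        end = start
        # Extend the set while every member stays within the error budget.
        while (end + 1 < len(order)
               and values[order[end + 1]] - low <= 2 * max_error):
            end += 1
        high = values[order[end]]
        for k in range(start, end + 1):
            assignment[order[k]] = len(representatives)
        # The midpoint keeps every member within max_error of the representative.
        representatives.append((low + high) / 2)
        start = end + 1
    return representatives, assignment


def reconstruct(representatives, assignment):
    """Replace every vertex by the representative value of its set."""
    return [representatives[s] for s in assignment]


values = [1.00, 1.05, 3.00, 1.10, 2.95, 5.00]
reps, assignment = set_based_compress(values, max_error=0.1)
print(assignment)  # prints [0, 0, 1, 0, 1, 2]
```

A region-based variant would additionally require each set to be connected in the grid graph, which this sketch does not attempt. The encoding and the auxiliary compression step are also left out.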

SLIDE 12

Compression using graph-based clustering

We considered two encodings

[Figure panels: Scalar quantization; Differential encoding]

For more details: J. Iverson, C. Kamath, G. Karypis, “Fast and effective lossy compression algorithms for scientific datasets,” Euro-Par, Rhodes Island, Greece, 2012.
SLIDE 13

Compression using graph-based clustering

We outperform state-of-the-art lossy compression methods

Observations on 7 datasets (structured and unstructured):
Can compress to 2-5% of original size
Set-based outperforms at all PSNR levels
At lower reconstruction error, gap between set-based and other methods grows

SLIDE 14

Compressed sensing

General reduced representations using compressed sensing

SLIDE 15

Compressed sensing

CS: applicable to data that are sparse in some basis

Signal $X$ of length $N$ is sparse: if $K$ = number of nonzeros, then $K \ll N$
Generate a reduced representation of $X$: $Y = \Phi X$, where $\Phi \in \mathbb{R}^{M \times N}$ and $M \approx K$
Recover the signal by solving a linear programming problem:

$$\min_X \|X\|_1 \quad \text{such that} \quad Y = \Phi X$$
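The recovery problem above has an $\ell_1$ objective, which is not linear as written. A standard way to see that it is nonetheless a linear program is to split $X$ into its positive and negative parts. The auxiliary variables $U$ and $V$ below are introduced only for this illustration and do not appear in the slides.

```latex
\begin{aligned}
\min_{U,\,V \in \mathbb{R}^{N}} \quad & \sum_{i=1}^{N} \left( U_i + V_i \right) \\
\text{such that} \quad & \Phi \,(U - V) = Y, \\
& U \ge 0, \quad V \ge 0,
\end{aligned}
\qquad \text{with } X = U - V .
```

At an optimum, at most one of $U_i$ and $V_i$ is nonzero for each $i$, so $U_i + V_i = |X_i|$ and the objective equals $\|X\|_1$. Any general-purpose LP solver can then be applied to this form.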

SLIDE 16

Compressed sensing

How do we select Φ ?

Our data are sparse
The sensing matrix Φ is chosen to be a random matrix†.
Related to random projections (machine learning), sketches (data management), and the Johnson-Lindenstrauss Lemma (theoretical computer science).
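A minimal sketch of the measurement step Y = ΦX with a random sensing matrix follows. The slides say only that Φ is random. The Gaussian entries, the 1/sqrt(M) scaling, and the fixed seed are assumptions made here for illustration, and the function names are hypothetical.

```python
import random


def sensing_matrix(m, n, seed=0):
    """Build an m-by-n matrix with independent Gaussian entries.

    Assumption: entries are N(0, 1) scaled by 1/sqrt(m); the slides do not
    specify the distribution.
    """
    rng = random.Random(seed)
    scale = 1.0 / m ** 0.5
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(n)] for _ in range(m)]


def measure(phi, x):
    """Compute the reduced representation y = phi @ x."""
    return [sum(p * xi for p, xi in zip(row, x)) for row in phi]


# A sparse signal of length N = 10 with K = 2 nonzeros, reduced to M = 4 values.
x = [0.0] * 10
x[2], x[7] = 1.5, -2.0
phi = sensing_matrix(4, 10)
y = measure(phi, x)
print(len(y))  # prints 4
```

Because the same seed regenerates the same Φ, only Y and the seed would need to be stored in order to attempt recovery later.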

† Figure from R. Baraniuk, “Compressive Signal Processing,” talk presented at the CASIS workshop, LLNL, May 2012; available from casis.llnl.gov.
SLIDE 17

Compressed sensing

Our early results indicate that this works!

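[Figure panels: Original (N = 2524, K = 531); reconstructions with M = 900, M = 1050, M = 1100]

                   M = 900    M = 1050    M = 1100
$R^2$              0.59       0.86        1.0
$\|error\|_2$      6.08       3.486       0.0016

error = original − reconstructed

$$R^2 = 1.0 - \frac{\sum_i (x_i - \hat{x}_i)^2}{\sum_i (x_i - \bar{x})^2}, \qquad \bar{x} = \frac{1}{N} \sum_i x_i$$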
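The two quality measures reported on this slide can be computed as in the sketch below. It assumes the usual coefficient-of-determination form of R² (squared residuals over squared deviations from the mean) and reads the second row of the table as the 2-norm of the error vector. Both readings are assumptions, since the formula on the slide did not survive extraction cleanly. The function name is hypothetical.

```python
import math


def reconstruction_metrics(original, reconstructed):
    """Return (R^2, 2-norm of the error) for a reconstructed signal."""
    n = len(original)
    mean = sum(original) / n
    # Sum of squared errors between the original and the reconstruction.
    ss_res = sum((x - y) ** 2 for x, y in zip(original, reconstructed))
    # Sum of squared deviations of the original from its mean.
    ss_tot = sum((x - mean) ** 2 for x in original)
    if ss_tot == 0.0:
        raise ValueError("R^2 is undefined for a constant signal")
    return 1.0 - ss_res / ss_tot, math.sqrt(ss_res)


original = [1.0, 2.0, 3.0, 4.0]
print(reconstruction_metrics(original, original))   # prints (1.0, 0.0)
print(reconstruction_metrics(original, [2.5] * 4))  # R^2 is 0.0 here
```

A perfect reconstruction gives R² = 1 and zero error, while reconstructing every sample as the mean gives R² = 0. This matches the trend in the table as M increases.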

SLIDE 18

Next steps

Next steps

SLIDE 19

Next steps

We continue our investigations into potential solutions

Run parallel versions of the connected components algorithms to generate scaling results.
Extend clustering to multivariate and temporal data.
Understand better the applicability and effectiveness of compressed sensing in the context of our problem.
Compare different compression techniques to evaluate their scalability, accuracy, and applicability to datasets from simulations.

Our goal: Intelligently reduce the amount of data written out by extreme-scale simulations while still enabling scientists to perform data analysis and scientific discovery.

SLIDE 20

Next steps

Acknowledgments

Zhihong Lin (UCI) and Sean Garrick (UMN) for data and domain expertise, as well as others who provided datasets in the public domain.
Funding sources: ASCR Exascale Program

For more details:
https://computation.llnl.gov/casc/sapphire/
https://computation.llnl.gov/casc/StarSapphire/
Chandrika Kamath, kamath2@llnl.gov
