Analysis of Large-Scale Scalar Data Using Hixels (Joshua A. Levine)



slide-1
SLIDE 1

LDAV 2011

Analysis of Large-Scale Scalar Data Using Hixels

Joshua A. Levine2, in collaboration with

  • D. Thompson1, J.C. Bennett1, P.-T. Bremer3, A. Gyulassy1, P.P. Pébay4, V. Pascucci1


slide-2
SLIDE 2

HPC Has Led to Increases in Both Data Size and Complexity

  • “Hero” runs
  • Increased spatial resolution
  • Increased number of variables
  • Uncertainty Quantification (UQ)
  • Ensembles of runs
  • Polynomial Chaos
  • Stochastic Simulations
  • Many analysis methods do not scale with size & complexity of the data

Images courtesy of: National Energy Research Scientific Computing Center, Los Alamos National Laboratory, Argonne National Laboratory, and Oak Ridge Leadership Computing Facility.

slide-3
SLIDE 3

Hixels: A Unified Data Representation

  • A hixel is a point with an associated histogram of scalar values

  • Hixel samples may represent:
  • Spatial down-sampling
  • Ensemble values
  • Random variables
  • Trade data size/complexity for uncertainty

[Figure: histogram h(f) of scalar values f at a hixel]

slide-4
SLIDE 4

1D Example of Hixels (Block Compression)
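
As a rough illustration of block compression in 1D, the sketch below groups consecutive samples into blocks and stores one histogram per block. The function name `hixelate_1d` and its parameters (`block_size`, `n_bins`, `value_range`) are hypothetical and chosen for this example only; they are not taken from the slides.

```python
import numpy as np

def hixelate_1d(values, block_size, n_bins, value_range):
    """Compress a 1D scalar field into hixels: one histogram per block."""
    values = np.asarray(values, dtype=float)
    n_blocks = len(values) // block_size  # a trailing partial block is dropped
    blocks = values[: n_blocks * block_size].reshape(n_blocks, block_size)
    # One row of bin counts per hixel; all hixels share the same bin edges.
    return np.stack([
        np.histogram(block, bins=n_bins, range=value_range)[0]
        for block in blocks
    ])

field = np.array([0.1, 0.2, 0.15, 0.9, 0.8, 0.85, 0.5, 0.1])
hixels = hixelate_1d(field, block_size=4, n_bins=2, value_range=(0.0, 1.0))
```

Each row of `hixels` replaces four scalar samples with a two-bin histogram, which is the size-for-uncertainty trade described on the previous slide.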

slide-5
SLIDE 5

Motivation: Feature-Based Analysis

  • Characterize and define features
  • Segment the domain by function behavior
  • Answer questions:
  • How many features are there?
  • What is the behavior of other variables within these features?
  • How do you define a good threshold value on which to segment the domain?

Data courtesy of: Dr. Jacqueline Chen, SNL

slide-6
SLIDE 6

Goal: Extend Topological Methods

  • What structures are present?
  • How persistent are they?
  • How do we visualize features?
  • Our Contributions:
  • 1. Sampled topology
  • 2. Topological analysis of statistically associated buckets

  • 3. Visualizing fuzzy isosurfaces
slide-7
SLIDE 7

Sampled Topology: Algorithm

  • 1. Sample the hixels to construct a scalar field V_i
  • 2. Compute the Morse complex for V_i
  •    a) Identify basins around minima & arcs between adjacent basins
  •    b) Encode arc locations in a binary field C_i (boundaries = 1, rest = 0)
  • 3. Construct aggregate A as mean of the C_i's
  • 4. Visualize variability of arc locations

Assumption: hixels are independent
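
The steps above can be sketched in 1D, under strong simplifications. In one dimension the boundaries between basins of adjacent minima are the interior local maxima, so the sketch marks those instead of computing a full Morse complex. All function names here are hypothetical, and each hixel is sampled independently, matching the stated assumption.

```python
import numpy as np

def sample_field(hists, bin_centers, rng):
    """Step 1: draw one scalar value per hixel from its normalized histogram."""
    probs = hists / hists.sum(axis=1, keepdims=True)
    idx = np.array([rng.choice(len(bin_centers), p=p) for p in probs])
    return bin_centers[idx]

def basin_boundaries_1d(v):
    """Step 2 (1D stand-in): binary field C_i with 1 at interior local maxima."""
    c = np.zeros(len(v))
    # Ties are broken by index so a plateau yields a single boundary.
    c[1:-1] = (v[1:-1] >= v[:-2]) & (v[1:-1] > v[2:])
    return c

def aggregate(hists, bin_centers, n_runs, seed=0):
    """Steps 1-3: mean of the boundary fields over n_runs sampled fields."""
    rng = np.random.default_rng(seed)
    fields = [basin_boundaries_1d(sample_field(hists, bin_centers, rng))
              for _ in range(n_runs)]
    return np.mean(fields, axis=0)

# Degenerate hixels (all mass in one bin) make the samples deterministic.
hists = np.array([[4, 0, 0], [0, 0, 4], [4, 0, 0], [0, 4, 0], [4, 0, 0]])
centers = np.array([0.0, 0.5, 1.0])
A = aggregate(hists, centers, n_runs=8)
```

With broader histograms, entries of `A` fall strictly between 0 and 1 and indicate how often a boundary lands at each hixel across runs.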

slide-8
SLIDE 8

Aggregate Segmentation on Temporal Jet

[Figure: aggregate segmentation A (color scale 0 to 1) for 1, 16, 64, 256, and 16384 runs, at persistence p = 0, 0.008, and 0.128]

slide-9
SLIDE 9

Convergence of Sampled Topology

slide-10
SLIDE 10

Varying Block Size & Persistence

[Figure: aggregate segmentation A (color scale 0 to 1) for block sizes 1x1 (1 run), 2x2 (512 runs), 4x4 (2048 runs), 8x8 (8196 runs), and 16x16 (16384 runs), at persistence p = 0, 0.004, 0.016, 0.064, and 0.256]

slide-11
SLIDE 11

Topological Analysis of Statistically Associated Buckets: Algorithm

  • Aimed at recovering prominent features from ensemble data
  • Exploit dependencies between runs
  • Identify regions in space & scalar values consistent with positive association
  • Perform topological segmentation on these regions individually
  • 1. Compute buckets
  • 2. Compute contingency statistics
  • 3. Identify sheets
  • 4. Perform topological analysis on individual sheets
slide-12
SLIDE 12

Computing Buckets

  • Values of high probability associated with peaks in the histogram
  • Identify peaks + range of function values around that peak
  • Topological segmentation on histogram
  • Use areal (hypervolume) persistence
  • Weight of interval = area of the histogram
  • Merge until the probability of smallest bucket is above a particular threshold

[Figure: histogram bins merged into buckets]
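
A simplified sketch of the bucketing idea follows. It cuts the histogram at local minima so that each interval surrounds a peak, then merges the lightest interval into a neighbor until every bucket carries at least `min_prob` of the total mass. The neighbor choice (merge into the lighter adjacent bucket) is an assumption made for this example; it stands in for, and is not the same as, the areal persistence ordering named on the slide. Function names are hypothetical.

```python
import numpy as np

def initial_intervals(hist):
    """Cut the bin range at local minima (ties broken by index)."""
    cuts = [0]
    for i in range(1, len(hist) - 1):
        if hist[i] < hist[i - 1] and hist[i] <= hist[i + 1]:
            cuts.append(i)  # the valley bin starts the next interval
    cuts.append(len(hist))
    return [(cuts[j], cuts[j + 1]) for j in range(len(cuts) - 1)]

def compute_buckets(hist, min_prob):
    """Merge intervals until the smallest bucket holds at least min_prob."""
    hist = np.asarray(hist, dtype=float)
    total = hist.sum()
    buckets = initial_intervals(hist)
    while len(buckets) > 1:
        mass = [hist[a:b].sum() / total for a, b in buckets]
        j = int(np.argmin(mass))
        if mass[j] >= min_prob:
            break
        # Assumed rule: merge into the lighter adjacent bucket.
        if j == 0:
            k = 1
        elif j == len(buckets) - 1:
            k = j - 1
        else:
            k = j - 1 if mass[j - 1] <= mass[j + 1] else j + 1
        lo, hi = min(j, k), max(j, k)
        buckets[lo:hi + 1] = [(buckets[lo][0], buckets[hi][1])]
    return buckets

hist = [1, 6, 2, 0, 1, 0, 5, 9, 1]
buckets = compute_buckets(hist, min_prob=0.1)  # half-open bin index ranges
```

Here the small middle bump (mass 1 of 25) falls below the threshold and is absorbed, leaving two buckets around the two prominent peaks.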

slide-13
SLIDE 13

Persistence Simplification of Buckets

Persistence Pairs

slide-14
SLIDE 14

Persistence Simplification of Buckets

slide-15
SLIDE 15

Persistence Simplification of Buckets

slide-16
SLIDE 16

Persistence Simplification of Buckets

slide-17
SLIDE 17

Effect of Persistence on Bucket Count

[Figure: number of buckets vs. persistence threshold (p), with panels for p = 16, 32, 64, 128, 256, and 512]

slide-18
SLIDE 18

Contingency Tables on Bucketed Hixels

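[Figure: hixels h1, h2, h3 with buckets a-d, e-g, and h-j]

h1-h2 (columns e, f, g; counts listed per row):
  a: 4 2
  b: 2 3 1
  c: 5 1
  d: 6

h1-h3 (columns h, i, j; counts listed per row):
  a: 5 1
  b: 1 4 1
  c: 2 4
  d: 1 5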
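
As a minimal sketch of how such a table could be tallied, the code below counts how often each pair of buckets co-occurs across paired samples of two neighboring hixels. The label sequences are made-up illustration data, not the counts from the slide, and `contingency_table` is a hypothetical helper name.

```python
from collections import Counter

def contingency_table(labels_1, labels_2):
    """Count co-occurrences of bucket labels over paired samples (e.g. runs)."""
    if len(labels_1) != len(labels_2):
        raise ValueError("both hixels need one bucket label per sample")
    return Counter(zip(labels_1, labels_2))

# Bucket label of each sample in hixel h1 and in its neighbor h2 (made-up data).
h1_labels = ["a", "a", "b", "c", "a", "b"]
h2_labels = ["e", "e", "f", "e", "f", "f"]
table = contingency_table(h1_labels, h2_labels)
```

A `Counter` returns 0 for pairs that never co-occur, which matches the empty cells of a sparse contingency table.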

slide-19
SLIDE 19

Pointwise Mutual Information (PMI) Encodes Association Between Hixels

$$\operatorname{pmi}(x, y) := \log \frac{p_{X,Y}(x, y)}{p_X(x)\, p_Y(y)}$$

Goal: Identify buckets that co-occur more frequently than if statistically independent

pmi(x,y) = 0 => x and y are independent
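
The PMI definition can be evaluated directly from co-occurrence counts. The sketch below estimates the joint and marginal probabilities from a table of counts and returns the PMI of every pair that occurs at least once; the function name `pmi_table` and the example counts are assumptions for illustration.

```python
import math

def pmi_table(counts):
    """Return pmi(x, y) = log(p(x, y) / (p(x) p(y))) for each observed pair."""
    total = sum(counts.values())
    px, py = {}, {}
    for (x, y), n in counts.items():
        px[x] = px.get(x, 0) + n  # marginal count of bucket x
        py[y] = py.get(y, 0) + n  # marginal count of bucket y
    return {
        (x, y): math.log((n / total) / ((px[x] / total) * (py[y] / total)))
        for (x, y), n in counts.items()
        if n > 0  # pmi is undefined (minus infinity) for pairs never observed
    }

# Perfectly associated buckets: a always occurs with e, b always with f.
associated = pmi_table({("a", "e"): 4, ("b", "f"): 4})
# Independent buckets: every combination is equally frequent.
independent = pmi_table({("a", "e"): 1, ("a", "f"): 1,
                         ("b", "e"): 1, ("b", "f"): 1})
```

Pairs with positive PMI, such as those in `associated`, are the candidates for linking buckets across neighboring hixels.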

slide-20
SLIDE 20

Positive PMI Constructs Sheets of Statistically Associated Buckets

Before: Bucketed Hixels

slide-21
SLIDE 21

Positive PMI Constructs Sheets of Statistically Associated Buckets

After: Sheets Connecting Buckets

slide-22
SLIDE 22

An Ensemble of Mixed Distributions

  • 512 x 512 hixels, 128 bins each
  • 3200 samples from Poisson distribution
  • λ is 100 at 5 source points in a circle
  • λ decreases to 12 with distance from source points
  • 9600 samples from a Gaussian distribution
  • μ & σ are min & max at 4 points in a circle
  • μ & σ vary with distance from source points

[Figure panels: Mean Poisson Surface; Mean Gaussian Surface]
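
One hixel of such an ensemble might be generated as sketched below: 3200 Poisson samples and 9600 Gaussian samples are pooled into a single 128-bin histogram. The sample counts and bin count come from the slide; the histogram range, the Gaussian parameters, and the function name `mixed_hixel` are assumptions for illustration. The spatial variation of the parameters across the 512 x 512 grid is not modeled here.

```python
import numpy as np

def mixed_hixel(lam, mu, sigma, rng, n_poisson=3200, n_gauss=9600,
                n_bins=128, value_range=(0.0, 128.0)):
    """Pool Poisson and Gaussian samples into one histogram (one hixel)."""
    samples = np.concatenate([
        rng.poisson(lam, n_poisson),      # Poisson component with rate lam
        rng.normal(mu, sigma, n_gauss),   # Gaussian component
    ])
    # Samples outside value_range (an assumed range) are silently dropped.
    hist, _ = np.histogram(samples, bins=n_bins, range=value_range)
    return hist

rng = np.random.default_rng(0)
hist = mixed_hixel(lam=100.0, mu=30.0, sigma=4.0, rng=rng)
```

With these hypothetical parameters the histogram is bimodal, so its mean falls between the two modes where few samples actually lie.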

slide-23
SLIDE 23

An Ensemble of Mixed Distributions

[Figure: Mean Surface (Yellow) for Combined Samples, shown with the Mean Poisson Surface and Mean Gaussian Surface]

slide-24
SLIDE 24

“Simple” Topological Tests Fail!

  • Probability that each hixel corresponds to
  • Minimum ~ 20%
  • Maximum ~ 20%
  • Saddle ~ 7%
  • Regular point ~ 53%

Sample Frequency

slide-25
SLIDE 25

Sheets Isolate Prominent Features

[Figure panels: Basins of Minima; Basins of Maxima]

slide-26
SLIDE 26

Sheets for Lifted Ethylene Jet

Buckets per hixel

slide-27
SLIDE 27

Visualizing Fuzzy Isosurfaces: Algorithm

  • 1. Compute likelihood function g
  • 2. Volume render g
  • Provides a fuzzy description of the likelihood of where an isosurface exists

[Equation: g defined piecewise in terms of a and b]

[Figure: histogram h(f) with isovalue k and regions a and b]
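
The sketch below shows one way a per-hixel likelihood value could be computed from a histogram and an isovalue k. It assumes that a is the histogram mass below k, that b is the mass at or above k, and that g is a signed balance between the two, ranging over [-1, 1] with 0 where the isovalue splits the mass evenly. This specific piecewise form is an assumption made for illustration and should not be read as the formula on the slide; `fuzzy_iso_value` is a hypothetical name.

```python
import numpy as np

def fuzzy_iso_value(hist, bin_edges, k):
    """Signed balance of histogram mass below (a) and above (b) isovalue k."""
    hist = np.asarray(hist, dtype=float)
    bin_edges = np.asarray(bin_edges, dtype=float)
    centers = 0.5 * (bin_edges[:-1] + bin_edges[1:])
    a = hist[centers < k].sum()   # assumed meaning of a: mass below k
    b = hist[centers >= k].sum()  # assumed meaning of b: mass at or above k
    if a == 0 and b == 0:
        return 0.0                # empty histogram
    if a < b:
        return a / b - 1.0        # in [-1, 0): most mass lies above k
    return 1.0 - b / a            # in [0, 1]: most mass lies at or below k

edges = np.linspace(0.0, 1.0, 5)
g_even = fuzzy_iso_value([2, 2, 0, 0], edges, k=0.25)   # mass split evenly
g_below = fuzzy_iso_value([2, 2, 0, 0], edges, k=0.5)   # all mass below k
g_above = fuzzy_iso_value([1, 0, 0, 3], edges, k=0.5)   # most mass above k
```

Evaluating such a value at every hixel yields a scalar volume that can be passed to a volume renderer, as in step 2 of the algorithm.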

slide-28
SLIDE 28

Comparison to Downsampling

[Figure: fuzzy isosurface (g) compared with mean and lower-left downsampling, for block sizes 4^3, 8^3, 16^3, 32^3, and 64^3]

slide-29
SLIDE 29

Fuzzy Isosurface of Temporal Jet

Likelihood that isovalue k = 0.506 passes through a hixel

[Figure: g for block sizes 2^3, 8^3, and 32^3]

slide-30
SLIDE 30

Conclusions and Summary

  • Unified representations of large scalar fields from various modalities

  • 3 proof of concept applications
  • Sampled topology
  • Topological analysis of statistically associated buckets

  • Visualizing fuzzy isosurfaces


slide-31
SLIDE 31

Future Work

  • Larger ensembles/larger data
  • Performance/scaling
  • Infer sheets from multivariate hixels
  • Issues to study
  • What is preserved by hixels vs. resolution loss
  • Identify appropriate number of bins/hixel
  • Persistence thresholds for bucketing algorithm
  • Balance data storage vs. feature preservation
  • What topological features can/cannot be preserved by hixelation


slide-32
SLIDE 32

Acknowledgement

Contact: Joshua A. Levine jlevine@sci.utah.edu

This work was supported by the Department of Energy Office of Advanced Scientific Computing Research, award number DE-SC0001922. Sandia is a multi-program laboratory operated by Sandia Corporation, a Lockheed Martin Company, for the United States Department of Energy’s National Nuclear Security Administration under Contract DE-AC04-94AL85000. This work was also performed under the auspices of the US Department of Energy (DOE) by the Lawrence Livermore National Laboratory under contract nos. DE-AC52-07NA27344, LLNL-JRNL-412904L and by the University of Utah under contract DE-FC02-06ER25781. We are grateful to Dr. Jacqueline Chen for the combustion data sets and M. Eduard Göller, Georg Glaeser, and Johannes Kastner for the stag beetle dataset.