End-to-End In Situ Data Processing and Analytics
Han-Wei Shen, Professor, Department of Computer Science and Engineering, The Ohio State University
In Situ Processing and Visualization
ExaFLOP supercomputers are becoming a reality (exa = 10^18).
[Diagram: traditional pipeline. The simulator on the supercomputer produces raw data in memory, which is written through disk I/O to storage for post-analysis.]
[Diagram: supercomputer with in-situ analysis. The simulator produces raw data in memory; in-situ data processing reduces it to compact data proxies, which are written through I/O to storage for post-hoc reconstruction and visualization.]
Data proxies are compact data summaries (histogram, Gaussian, Gaussian mixture model) stored as distributions. They preserve visualization capabilities, expose uncertainty, and support post-hoc analysis under novel parameter configurations.
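A minimal sketch of how such block-wise histogram proxies could be computed in situ (the function name, block size, and bin count are illustrative assumptions, not the talk's actual code):

```python
import numpy as np

def block_histograms(volume, block=8, bins=32):
    """Partition a 3D volume into blocks and summarize each block
    with a normalized value histogram (one possible data proxy)."""
    lo, hi = float(volume.min()), float(volume.max())
    nz, ny, nx = (s // block for s in volume.shape)
    proxies = {}
    for k in range(nz):
        for j in range(ny):
            for i in range(nx):
                sub = volume[k*block:(k+1)*block,
                             j*block:(j+1)*block,
                             i*block:(i+1)*block]
                hist, _ = np.histogram(sub, bins=bins, range=(lo, hi))
                proxies[(k, j, i)] = hist / hist.sum()  # normalized PDF
    return proxies

vol = np.random.default_rng(0).random((16, 16, 16))
proxies = block_histograms(vol, block=8, bins=16)
# 2 x 2 x 2 = 8 blocks, each summarized by a 16-bin distribution
```

Each block is then represented by a few dozen floats instead of its raw voxels.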
In Situ Data Reduction and Transformation
Post-Hoc Analysis and Visualization
Image space (~10^6 pixels) is much smaller than object space (~10^9 to 10^15 voxels), but image-space proxies have limited ability to explore occluded features.
One-pixel frustum: subpixel ray casting accumulates a per-pixel value histogram.
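A toy version of the per-pixel histogram idea, using an axis-aligned orthographic footprint as a stand-in for the pixel frustum (all names and parameters here are assumptions for illustration):

```python
import numpy as np

def pixel_histogram(volume, px, py, scale=4, bins=8):
    """Per-pixel value histogram: pixel (px, py) covers a scale x scale
    voxel footprint, and every voxel along the subpixel rays through
    that footprint contributes a sample to the histogram."""
    frustum = volume[:, py*scale:(py+1)*scale, px*scale:(px+1)*scale]
    hist, _ = np.histogram(frustum, bins=bins, range=(0.0, 1.0))
    return hist / hist.sum()   # one normalized distribution per pixel

vol = np.random.default_rng(0).random((32, 32, 32))
h = pixel_histogram(vol, px=1, py=2)   # an 8-bin distribution
```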
Supercomputer
Post-analysis machine
Transfer function (opacity curve) combined with the per-pixel histogram: data values with both high frequency and high opacity matter most, yielding an importance distribution:

Importance(v) = Opacity(v) × Histogram(v)
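The importance distribution can be sketched directly from that product (the opacity ramp and histogram below are made-up placeholders):

```python
import numpy as np

# Hypothetical opacity ramp O(v) and per-pixel histogram H(v), 16 value bins.
opacity = np.linspace(0.0, 1.0, 16)           # O(v): opacity per value bin
hist = np.random.default_rng(0).random(16)
hist /= hist.sum()                            # H(v): per-pixel value PDF

importance = opacity * hist                   # Importance(v) = O(v) * H(v)
importance /= importance.sum()                # normalize to a distribution
```

Fully transparent values (opacity 0) get zero importance regardless of frequency.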
[Figure: image from proxy (15.3 GB, PSNR 37.07) vs. image from raw data (271 GB), per view and time step; proxy images can be warped to different views.]
Data modeling (a local block): partition the raw data; summarize each block with block distributions and, per value, a spatial distribution (GMM); value estimation via Bayes' rule then yields a PDF at any spatial location ℓ, from which statistical visualizations are produced:

p(v_i | ℓ) = p(ℓ | v_i) p(v_i) / Σ_j p(ℓ | v_j) p(v_j)
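A small numerical sketch of this Bayes'-rule value estimation, with 1D Gaussians standing in for the spatial GMMs (all numbers are illustrative assumptions):

```python
import numpy as np

def gaussian_pdf(x, mean, var):
    return np.exp(-0.5 * (x - mean) ** 2 / var) / np.sqrt(2 * np.pi * var)

# Hypothetical proxy for one block: value histogram p(v_i) and, for each
# value bin, a 1D spatial Gaussian p(l | v_i) standing in for the GMM.
p_v = np.array([0.5, 0.3, 0.2])     # block histogram over 3 value bins
means = np.array([0.2, 0.5, 0.8])   # spatial mean per value bin
vars_ = np.array([0.01, 0.02, 0.01])

def posterior(loc):
    """p(v_i | l) = p(l | v_i) p(v_i) / sum_j p(l | v_j) p(v_j)"""
    lik = gaussian_pdf(loc, means, vars_)
    w = lik * p_v
    return w / w.sum()

post = posterior(0.5)   # value PDF at location l = 0.5
```

At l = 0.5 the middle value bin dominates, since its spatial Gaussian is centered there.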
[Figure: probability vs. data value: the value histogram of one block.]
In the histogram variant, the block histogram gives p(v) and, for each value interval, a spatial GMM gives p(ℓ | v); value estimation again follows Bayes' rule.
EM algorithm: fit a spatial GMM (a probability density function) for each value interval V; the block histogram supplies the evidence p(V).
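A compact EM fit for a 1D Gaussian mixture, as a hedged illustration of the fitting step (the initialization and iteration count are arbitrary choices, not the talk's settings):

```python
import numpy as np

def em_gmm_1d(x, k=2, iters=100):
    """Minimal EM for a 1D Gaussian mixture (illustrative sketch)."""
    mu = np.quantile(x, np.linspace(0.1, 0.9, k))   # spread-out init
    var = np.full(k, x.var())
    pi = np.full(k, 1.0 / k)
    for _ in range(iters):
        # E-step: responsibilities r[n, i] = p(component i | x_n)
        d = x[:, None] - mu[None, :]
        lik = np.exp(-0.5 * d**2 / var) / np.sqrt(2 * np.pi * var)
        r = pi * lik
        r /= r.sum(axis=1, keepdims=True)
        # M-step: re-estimate weights, means, and variances
        n_i = r.sum(axis=0)
        pi = n_i / len(x)
        mu = (r * x[:, None]).sum(axis=0) / n_i
        var = (r * (x[:, None] - mu[None, :])**2).sum(axis=0) / n_i
    return pi, mu, var

rng = np.random.default_rng(1)
x = np.concatenate([rng.normal(0.0, 1.0, 500), rng.normal(6.0, 1.0, 500)])
pi, mu, var = em_gmm_1d(x, k=2)   # recovers the two modes near 0 and 6
```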
Volume rendering from the reconstructed volume of the Turbine pressure variable:

Block histogram:                  131.4 MB  (block size 22^3)
Block histogram w/ interpolation: 131.4 MB  (block size 22^3)
Block GMM:                        163.71 MB (block size 10^3)
Our approach:                     151.54 MB (block size 32^3, 4 Gaussians)
Raw data:                         10871 MB
Each block stores two mixtures: a location GMM

f(x⃗) = Σ_{i=1..K} π_i N(x⃗ | μ_i, Σ_i)

and a winding-angle GMM

h(θ) = Σ_{i=1..K} π_i N(θ | μ^θ_i, Σ^θ_i)
Conditional Monte Carlo tracing: a traced line is a sequence {x⃗_1, …, x⃗_{t-1}}. The next position follows p(x⃗_t | x⃗_1, …, x⃗_{t-1}), which a first-order Markov assumption reduces to p(x⃗_t | x⃗_{t-1}). Conditioning the stored mixture on the previous position gives another mixture,

p(x⃗ | x⃗_{t-1}) = c · Σ_{i=1..K} π_i N(x⃗_{t-1} | μ̂_i, Σ̂_i) N(x⃗ | μ_i, Σ_i)

where c is a normalization constant and the N(x⃗_{t-1} | μ̂_i, Σ̂_i) terms re-weight the components. Sample x⃗ | x⃗_{t-1} from this conditional in the second step, and use the winding-angle distributions N(θ | μ^θ_i, Σ^θ_i) only when the mean of the winding-angle distribution has an absolute value larger than a threshold.
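A sketch of sampling the next particle position from such a conditional mixture, assuming a 2D isotropic GMM (the weights, means, and variances are invented for illustration):

```python
import numpy as np

# Hypothetical per-block GMM over positions: weights pi, 2D means mu,
# isotropic variances var. Conditioning on the previous position
# re-weights each component by its likelihood at x_prev.
rng = np.random.default_rng(0)
pi = np.array([0.5, 0.5])
mu = np.array([[0.0, 0.0], [5.0, 5.0]])
var = np.array([0.5, 0.5])

def sample_next(x_prev):
    """Draw x ~ p(x | x_prev) = c * sum_i pi_i N(x_prev|mu_i) N(x|mu_i)."""
    d2 = ((x_prev - mu) ** 2).sum(axis=1)
    w = pi * np.exp(-0.5 * d2 / var) / (2 * np.pi * var)
    w /= w.sum()                       # conditional component weights
    i = rng.choice(len(pi), p=w)       # pick a component
    return rng.normal(mu[i], np.sqrt(var[i]))

nxt = sample_next(np.array([4.8, 5.1]))  # stays near the nearby mode
```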
Baseline Monte Carlo vs. conditional Monte Carlo, timing:

                     Baseline    Ours
Data reduction       73.35 s     76.53 s
Single line tracing  0.1003 s    0.1080 s (CMV)
Monte Carlo tracing  3.307 s     5.480 s (CMC)
Stored per block: the location GMM f(x⃗) = Σ_{i=1..K} π_i N(x⃗ | μ_i, Σ_i) plus the winding-angle GMM h(θ) = Σ_{i=1..K} π_i N(θ | μ^θ_i, Σ^θ_i).
Incremental estimation of temporal data distribution: update block distributions incrementally as new samples x_t arrive,

w_{i,t} = (1 - α) w_{i,t-1} + α M_{i,t}
μ_{i,t} = (1 - ρ) μ_{i,t-1} + ρ x_t
σ²_{i,t} = (1 - ρ) σ²_{i,t-1} + ρ (x_t - μ_{i,t})²

where M_{i,t} indicates whether component i matched the new sample.
[Figure: new data points observed; the GMM before the update at t = t0 and the distribution after the update at t = t1.]
Foreground possibility per block, Possibility_foreground(b_{i,t}), estimated from the updated component weights w_{i,t}.
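The incremental update can be sketched for a single Gaussian component (the learning rates α and ρ and the match indicator are assumptions, not the talk's settings):

```python
import numpy as np

def incremental_update(mu, var, w, x, matched, alpha=0.05, rho=0.05):
    """First-order incremental update for one Gaussian component:
       w_t   = (1 - alpha) * w_{t-1}   + alpha * M_t
       mu_t  = (1 - rho)   * mu_{t-1}  + rho * x_t          (if matched)
       var_t = (1 - rho)   * var_{t-1} + rho * (x_t - mu_t)**2
    """
    w = (1 - alpha) * w + alpha * (1.0 if matched else 0.0)
    if matched:
        mu = (1 - rho) * mu + rho * x
        var = (1 - rho) * var + rho * (x - mu) ** 2
    return mu, var, w

mu, var, w = 0.0, 1.0, 0.5
for x in np.random.default_rng(0).normal(3.0, 0.2, 2000):
    mu, var, w = incremental_update(mu, var, w, x, matched=True)
# the component drifts toward the new data's mean and variance
```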
Similarity with a target distribution (high vs. low similarity values):

Possibility_similarity(b_{i,t}) = 1 - f_norm(b_{i,t}, y_t)

where f_norm is a normalized distance between the block distribution b_{i,t} and the target distribution y_t.
Final combined field (foreground measure + similarity measure):

Possibility_feature(b_i) = γ · Possibility_similarity(b_i) + (1 - γ) · Possibility_foreground(b_i)
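The blend is a one-liner; a tiny example with made-up possibility values and γ = 0.5:

```python
import numpy as np

# Made-up per-block possibility values; gamma blends the two cues.
similarity = np.array([0.9, 0.2, 0.6])   # Possibility_similarity(b_i)
foreground = np.array([0.8, 0.1, 0.7])   # Possibility_foreground(b_i)
gamma = 0.5

feature = gamma * similarity + (1 - gamma) * foreground
mask = feature > 0.5                     # candidate feature blocks
```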
Extract and track features using classification fields: (1) update block distributions incrementally (incremental estimation of the temporal data distribution); (2) estimate the foreground possibility; (3) estimate similarity with the target; (4) generate a feature-aware classification field from the foreground information and the similarity measure.
[Figure: tracked feature at T = 10, T = 20, and T = 40.]
A probability distribution field
Organize the ensemble into a binary tree using hierarchical clustering; applied to the concentration variable (CHL) on all 600 ensemble members.
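A self-contained sketch of agglomerative (single-linkage) hierarchical clustering over ensemble members, using synthetic data in place of the CHL fields (this naive O(n^3) loop is for illustration only):

```python
import numpy as np

def hierarchical_cluster(points, n_clusters):
    """Naive single-linkage agglomerative clustering: repeatedly merge
    the two closest clusters, implicitly tracing out a binary merge
    tree over the members, until n_clusters remain."""
    clusters = [[i] for i in range(len(points))]
    d = np.linalg.norm(points[:, None] - points[None, :], axis=-1)
    while len(clusters) > n_clusters:
        best, pair = np.inf, None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                dist = min(d[i, j] for i in clusters[a] for j in clusters[b])
                if dist < best:
                    best, pair = dist, (a, b)
        a, b = pair
        clusters[a] += clusters.pop(b)   # merge cluster b into a
    return clusters

rng = np.random.default_rng(0)
pts = np.vstack([rng.normal(0, 0.5, (5, 3)), rng.normal(5, 0.5, (5, 3))])
groups = hierarchical_cluster(pts, 2)   # recovers the two groups
```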