End-to-End In Situ Data Processing and Analytics, Han-Wei Shen (PowerPoint PPT Presentation)



SLIDE 1

End-to-End In Situ Data Processing and Analytics

Han-Wei Shen Professor Department of Computer Science and Engineering The Ohio State University

SLIDE 2

In Situ Processing and Visualization

  • ExaFLOP supercomputers are becoming a reality (exa = 10^18, i.e., 1,000,000,000,000,000,000)
  • The number of cores per processor will increase
  • Memory per core will decrease
  • The speed and size of memory and I/O devices cannot keep pace with the increase in compute power
  • The cost of moving data will increase
  • It will be very difficult for scientists to store and analyze even a small portion of their simulation output

In situ visualization: generating visualizations while the simulation is still running

SLIDE 3

Characteristics of In Situ Visualization

  • Data are transient; only available for a short time
  • Mainly batch-mode processing; interactive exploration is not possible
  • Need to know what is needed a priori; salient information might not be found
  • Limited parameters to explore; sophisticated visualization is not possible

[Diagram: on the supercomputer, the simulator produces raw data in memory, which goes through I/O to disk for post-analysis]

SLIDE 4

In Situ Visualization Strategies

  • Generate images from preselected parameters (e.g., Catalyst, Libsim)
  • Database from a large collection of images (e.g., the Cinema project)
  • Visualization with explorable contents (e.g., explorable images)
  • Feature extraction (e.g., contour trees, flowlines)
  • Data reduction – compact data representations or representative samples or time steps (e.g., compression, key time steps)

[Diagram: on the supercomputer, in situ data processing turns raw simulator data in memory into compact data proxies, which go through I/O to disk; post-analysis reconstructs and visualizes from the proxies]

SLIDE 5

In Situ Visualization Software

  • Application-aware vs. not
  • Tightly or loosely coupled
  • Shallow or deep copy
  • Space- or time-shared
  • Data synchronization and communication
  • Software control (automatic or human)
  • Proximity: same or different machines
  • Single- or multi-purpose APIs (e.g., ADIOS)
  • Types of output (data, images, etc.)
SLIDE 6

Distribution-based In Situ Analytics @ OSU

Approaches

  • Probability distributions collected at in situ time
  • Block- or particle-based
  • Histograms, GMMs
  • Multivariate
  • Distribution-based post-hoc analysis
  • Resampling-based visualization
  • Direct inference based on distributions
  • Interactive data queries

Goals

  • Preserve
  • important data characteristics
  • field values and feature locations
  • Allow
  • post-hoc analysis with standard visualization capabilities
  • quantitative analysis of the quality of uncertainty
  • interactive data-driven queries
  • Predict
  • results of simulations with novel parameter configurations

SLIDE 7

In Situ Research @OSU

[Diagram: in situ data summaries (histograms, Gaussian mixture models, Gaussians) written to storage]

In Situ Data Reduction and Transformation

  • Distribution Modeling:
  • Spatial Partition
  • Field and particle data
  • Image space (View dependent)
  • Object space
  • Multivariate
  • Time-varying
  • Ensemble data

Post-Hoc Analysis and Visualization

  • Visualization and Analytics:
  • Sampling
  • Scalar data visualization algorithms
  • Vector data visualization algorithms
  • Feature tracking
  • Distribution Exploration
  • Distribution Search
  • Ensemble data analysis
SLIDE 8

View Dependent Distributions Proxy

Motivations

  • Image-space approaches have emerged as a promising method
  • The scale of data defined in image space (~10^6 pixels) is much smaller than in object space (~10^9 to 10^15 voxels)
  • Existing image-based approaches have limited ability to explore occluded features
  • Data loss in the compact representation is inevitable

Methods

  • Collect samples during volume ray casting
  • Allow changing the transfer function in post-hoc analysis
  • Errors are constrained in the depth dimension
  • Warping the samples to different views is possible
  • Freely explore the occluded features

SLIDE 9

View Dependent Proxy Construction

  • Image-based proxy is constructed at each selected view
  • Subpixel ray casting to collect samples in the pixel frustum
  • Histogram is used to statistically summarize data in the pixel frustum

[Figure: one pixel frustum; subpixel ray casting; the resulting histogram]
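The per-frustum summarization above can be sketched in Python. This assumes an axis-aligned orthographic view where each pixel's frustum is a footprint × footprint column of voxels (the function name and the footprint scheme are illustrative; a real in situ renderer collects these samples during perspective ray casting):

```python
import numpy as np

def pixel_frustum_histogram(volume, i, j, footprint=4, bins=32, vrange=(0.0, 1.0)):
    """Histogram proxy for one pixel frustum (orthographic sketch).

    Pixel (i, j) is assumed to cover a footprint x footprint block of
    voxels; every sample its subpixel rays pass through is pooled into
    one histogram.
    """
    x0, y0 = i * footprint, j * footprint
    samples = volume[x0:x0 + footprint, y0:y0 + footprint, :].ravel()
    hist, edges = np.histogram(samples, bins=bins, range=vrange)
    return hist / hist.sum(), edges  # normalized bin frequencies
```

With a perspective camera, only the sample-gathering step changes; the histogram summary itself is identical.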

SLIDE 10

Irregular Frustum Subdivision

  • A histogram does not preserve the samples' order within the pixel frustum
  • The samples' order is critical for providing depth cues in rendering
  • A pixel frustum is subdivided into sub-frusta, each summarized by a histogram
  • More sub-frusta: more accurate sample order, but more histograms to store
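A minimal sketch of the subdivision, assuming the "irregular" boundaries are placed at the largest value jumps along the depth-ordered samples (an illustrative criterion; the actual subdivision rule may differ):

```python
import numpy as np

def subdivide_frustum(samples, n_sub=4, bins=16, vrange=(0.0, 1.0)):
    """Split a frustum's depth-ordered samples into n_sub sub-frusta,
    one histogram each, so rendering can recover an approximate depth
    order. Boundaries go where consecutive samples differ the most.
    """
    jumps = np.abs(np.diff(samples))
    cuts = np.sort(np.argsort(jumps)[-(n_sub - 1):] + 1)  # split indices
    segments = np.split(samples, cuts)
    return [np.histogram(s, bins=bins, range=vrange)[0] for s in segments]
```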

SLIDE 11

Data Visualization on the Post-Analysis Machine


SLIDE 12

Data Visualization on the Post-Analysis Machine


SLIDE 13

Importance Sampling

[Figure: transfer function (opacity curve) over a frustum histogram]

  • Samples drawn from a histogram are biased toward values with high frequency
  • Samples with high frequency may have low opacity
  • Interesting features consist of samples with high opacity
  • Importance sampling
  • combine the histogram and the opacity function
SLIDE 14

Importance Sampling

  • The importance distribution combines the frustum histogram h(v) with the transfer function's opacity α(v):

p_I(v) ∝ α(v) · h(v)

[Figure: transfer function (opacity curve), histogram, and the resulting importance distribution]
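Drawing samples from this importance distribution can be sketched as follows, with a hypothetical piecewise-constant opacity value per histogram bin:

```python
import numpy as np

def importance_sample(hist, edges, opacity, n=1000, rng=None):
    """Draw values from a per-frustum histogram reweighted by opacity,
    so high-opacity (interesting) values are not drowned out by
    frequent low-opacity ones: q(bin) proportional to h(bin) * alpha(bin).
    """
    rng = np.random.default_rng(rng)
    q = hist * opacity
    q = q / q.sum()                       # normalize importance weights
    idx = rng.choice(len(hist), size=n, p=q)
    lo, hi = edges[idx], edges[idx + 1]   # uniform jitter inside a bin
    return rng.uniform(lo, hi)
```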

SLIDE 15

Quality and Storage

Image from the proxy (PSNR 37.07): 15.3 GB, vs. image from the raw data: 271 GB

  • Turbine dataset
  • 50 time steps
  • 6-view proxy
  • Budget: 50 MB (per view and time step)

SLIDE 16

Object Space Distributions Proxy

Arbitrary view exploration

  • Option 1: samples generated from the view-dependent proxies can be warped to different views
  • Option 2: create object-space distributions
SLIDE 17

Data Modeling – Block Histogram

[Pipeline: partition raw data → model each local block (block distribution + spatial GMM) → estimate the value PDF at any spatial location ℓ via Bayes' rule → statistical visualizations from the PDFs]

SLIDE 18

Data Modeling – Block Distributions

  • A block histogram or value GMM summarizes the data samples in a block
  • Bin b_i represents a continuous data value range [min_i, max_i]
  • P(b_i) = c(b_i) / Σ_{j=0}^{B−1} c(b_j)
  • c(b_j): the number of grid points whose values fall in the range [min_j, max_j]

[Figure: data of a block and its value histogram (probability vs. data value)]
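A direct sketch of the block histogram P(b_i) = c(b_i) / Σ_j c(b_j):

```python
import numpy as np

def block_histogram(block, bins=16, vrange=None):
    """Per-block value histogram: the fraction of grid points whose
    values fall inside each bin's value range [min_i, max_i]."""
    vrange = vrange or (block.min(), block.max())
    counts, edges = np.histogram(block.ravel(), bins=bins, range=vrange)
    return counts / counts.sum(), edges
```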

SLIDE 19

Data Modeling – Spatial Distribution


SLIDE 20

Data Modeling – Spatial Distribution

  • A block histogram does not retain the samples' locations
  • Each bin creates a spatial distribution: {S_0, S_1, …, S_{B−1}}
  • S_i maps a spatial location ℓ to a probability:
  • how likely ℓ holds a sample whose value is within the range of bin b_i
  • Estimated by a multivariate GMM (spatial GMM)
  • Spatial GMM modeling
  • collect the coordinates of all grid points assigned to bin b_i
  • use the EM algorithm to estimate the parameters of the GMM
  • repeat the process for each bin
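A minimal EM sketch for one bin's spatial GMM; it uses isotropic covariances and deterministic farthest-point seeding for brevity, whereas the method described above fits full multivariate Gaussians:

```python
import numpy as np

def fit_spatial_gmm(coords, k=2, iters=50):
    """Fit a GMM to the grid-point coordinates assigned to one bin, so
    the bin's spatial distribution can be evaluated at any location."""
    n, d = coords.shape
    mu = [coords[0]]
    for _ in range(1, k):                      # farthest-point seeding
        d2min = ((coords[:, None, :] - np.array(mu)[None]) ** 2).sum(-1).min(1)
        mu.append(coords[np.argmax(d2min)])
    mu = np.array(mu, dtype=float)
    var = np.full(k, coords.var() + 1e-6)
    w = np.full(k, 1.0 / k)
    for _ in range(iters):
        # E-step: responsibility of each component for each point
        d2 = ((coords[:, None, :] - mu[None]) ** 2).sum(-1)
        logp = np.log(w) - 0.5 * d * np.log(2 * np.pi * var) - d2 / (2 * var)
        p = np.exp(logp - logp.max(1, keepdims=True))
        r = p / p.sum(1, keepdims=True)
        # M-step: re-estimate weights, means, variances
        nk = r.sum(0) + 1e-12
        w = nk / n
        mu = (r[:, :, None] * coords[:, None, :]).sum(0) / nk[:, None]
        d2 = ((coords[:, None, :] - mu[None]) ** 2).sum(-1)
        var = (r * d2).sum(0) / (d * nk) + 1e-6
    return w, mu, var

def spatial_pdf(loc, w, mu, var):
    """Evaluate the fitted spatial GMM at one location."""
    d = mu.shape[1]
    d2 = ((loc - mu) ** 2).sum(-1)
    return float((w * np.exp(-d2 / (2 * var)) / (2 * np.pi * var) ** (d / 2)).sum())
```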

SLIDE 21

Value Estimation at a location X

  • Spatial GMMs model the spatial probability density function for each value interval (bin)
  • Bayes' rule
  • the prior is adjusted by the related evidence
  • prior P(v): the block distribution/histogram
  • evidence: the probabilities of the spatial GMMs at x
  • posterior: the estimated PDF at x

P(v | x) ∝ P(x | v) · P(v)
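Value estimation at a location then multiplies the histogram prior by the spatial-GMM evidence and renormalizes; in this sketch the per-bin spatial GMMs are passed in as callables:

```python
import numpy as np

def estimate_value_pdf(loc, prior, spatial_pdfs):
    """Posterior value distribution at `loc` by Bayes' rule:
    P(v_i | loc) proportional to P(loc | v_i) * P(v_i), where P(v_i) is
    the block-histogram bin probability and P(loc | v_i) is bin i's
    spatial GMM evaluated at `loc`.
    """
    evid = np.array([pdf(loc) for pdf in spatial_pdfs])  # P(loc | v_i)
    post = evid * prior                                   # unnormalized
    return post / post.sum()
```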

SLIDE 22

Post-Hoc Analysis Sampling-based Volume Rendering

Volume rendering from the reconstructed volume of the Turbine pressure variable:

  • Block histogram: 131.4 MB (block size 22³)
  • Block histogram w/ interpolation: 131.4 MB (block size 22³)
  • Block GMM: 163.71 MB (block size 10³)
  • Our approach: 151.54 MB (block size 32³, number of Gaussians: 4)
  • Raw data: 10,871 MB

SLIDE 23

Particle Tracing in Distribution Fields

  • Representing the vectors in a block using a Gaussian mixture model (GMM):

f(v⃗) = Σ_{i=1}^{K} w_i N(v⃗ | μ_i, Σ_i)

  • The vector transition information can also be represented by a GMM of the winding angle θ:

h(θ) = Σ_{i=1}^{K} w_i N(θ | μ_i^θ, Σ_i^θ)

SLIDE 24

Particle Tracing in Distribution Fields

  • What to do with the vector GMM f(v⃗) = Σ_{i=1}^{K} w_i N(v⃗ | μ_i, Σ_i)?
  • Use Monte Carlo sampling to trace a bundle of traces
  • Use the mean vector to trace a single trace
  • f(v⃗) is an unconditional distribution
  • What should f(v⃗) be conditioned on?
  • The particle has already been traced for t steps, through {v⃗_0, …, v⃗_{t−1}}
  • Conditional distribution f(v⃗ | v⃗_0, …, v⃗_{t−1})
  • Assume a Markov model
  • Conditional distribution f(v⃗ | v⃗_{t−1})

SLIDE 25

Particle Tracing in Distribution Fields

  • Conditional distribution f(v⃗ | v⃗_{t−1})
  • Bayes' theorem:

f(v⃗ | v⃗_{t−1}) = c · f(v⃗) · f(v⃗_{t−1} | v⃗)

  • Replace v⃗_{t−1} with its angle to v⃗, θ(v⃗_{t−1}, v⃗):

f(v⃗ | v⃗_{t−1}) = c · f(v⃗) · f(θ(v⃗_{t−1}, v⃗) | v⃗)

  • As a result:

f(v⃗ | v⃗_{t−1}) = c · Σ_{i=1}^{K} w_i N(θ(v⃗_{t−1}, v⃗) | μ_i^θ, Σ_i^θ) · N(v⃗ | μ_i, Σ_i)

SLIDE 26

Particle Tracing in Distribution Fields

  • Conditional distribution f(v⃗ | v⃗_{t−1})
  • Unconditional: f(v⃗) = Σ_{i=1}^{K} w_i N(v⃗ | μ_i, Σ_i)
  • Conditional: f(v⃗ | v⃗_{t−1}) = c · Σ_{i=1}^{K} w_i N(θ | μ_i^θ, Σ_i^θ) · N(v⃗ | μ_i, Σ_i)
  • That is, each component's weight w_i is rescaled by its winding-angle likelihood N(θ | μ_i^θ, Σ_i^θ) and renormalized over all K components

SLIDE 27

Tracing Method

  • Tracing with the conditional distribution f(v⃗ | v⃗_{t−1})
  • Use Monte Carlo sampling to trace a bundle of traces: sample from f(v⃗ | v⃗_{t−1})
  • Conditional Monte Carlo (CMC)
  • use f(v⃗ | v⃗_{t−1}) from the second step on
  • Use the mean vector to trace a single trace: the mean of f(v⃗ | v⃗_{t−1})
  • Conditional Mean Vector (CMV)
  • use f(v⃗ | v⃗_{t−1}) from the second step on
  • use f(v⃗ | v⃗_{t−1}) only when the mean of the winding-angle distribution has an absolute value larger than a threshold
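One CMC step can be sketched as follows. To keep the sampling exact, the winding-angle likelihood of each component is evaluated at the angle between v⃗_{t−1} and the component mean μ_i (an approximation, and the names th_mu/th_sig for the angle-GMM parameters are illustrative), which turns the conditional into a reweighted GMM:

```python
import numpy as np

def angle(u, v):
    """Unsigned angle between two vectors, in radians."""
    c = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-12)
    return np.arccos(np.clip(c, -1.0, 1.0))

def cmc_step(v_prev, w, mu, cov, th_mu, th_sig, rng=None):
    """Sample the next vector from the conditional GMM
    f(v | v_prev) ~ sum_i w_i N(theta | th_mu_i, th_sig_i) N(v | mu_i, cov_i),
    with each component's angle likelihood evaluated at its mean direction.
    """
    rng = np.random.default_rng(rng)
    th = np.array([angle(v_prev, m) for m in mu])
    like = np.exp(-0.5 * ((th - th_mu) / th_sig) ** 2) / th_sig
    cw = w * like
    cw = cw / cw.sum()                             # conditional weights
    i = rng.choice(len(w), p=cw)                   # pick a component ...
    return rng.multivariate_normal(mu[i], cov[i])  # ... then sample it
```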

SLIDE 28

Qualitative Comparison

  • Comparison - Conditional Monte Carlo (CMC)
  • Reward the Gaussian component that better fits the angle pattern

[Figure: winding-angle distribution h(θ); baseline Monte Carlo vs. conditional Monte Carlo traces]

SLIDE 29

Cost and Performance

Time (s):

  • Data reduction: Baseline 73.35, Our Method 76.53
  • Single line tracing: Baseline 0.1003, CMV 0.1080
  • Monte Carlo tracing: Baseline 3.307, CMC 5.480

  • Cost of using the conditional distribution
  • Extra storage: f(v⃗) = Σ_{i=1}^{K} w_i N(v⃗ | μ_i, Σ_i), plus h(θ) = Σ_{i=1}^{K} w_i N(θ | μ_i^θ, Σ_i^θ)
  • 33% extra storage
SLIDE 30

Probabilistic Data Modeling

  • A block-wise data modeling approach
  • Each block is represented by a mixture of Gaussians (GMM)
  • The probability density of a GMM is expressed as:

p(X) = Σ_{i=1}^{K} w_i N(X | μ_i, σ_i)

Distribution-Based Feature Tracking

  • Incremental estimation of the temporal data distribution: update block distributions incrementally

SLIDE 31

Incremental Distribution Update for Time-Varying Fields

  • Update the mean and standard deviation as:

μ_{i,t} = (1 − β) μ_{i,t−1} + β x_t
σ²_{i,t} = (1 − β) σ²_{i,t−1} + β (x_t − μ_{i,t})²

  • Update the weight as:

w_{i,t} = (1 − β) w_{i,t−1} + β I(match_{i,t})

[Figure: new data points observed; GMM before the update at t = t0; distribution after the update at t = t1]
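The update equations can be folded into a small routine; the matching rule (closest component within 2.5σ) and the new-component seeding follow the Stauffer-Grimson style the equations suggest, and are assumptions rather than the exact choices made here:

```python
import math

def update_gmm(w, mu, var, x, beta=0.05, n_sigma=2.5):
    """Incrementally fold one sample x into a 1-D GMM: the matched
    component's mean/variance blend toward x, all weights decay except
    the matched one's, and an unmatched sample seeds a new Gaussian.
    """
    match = None
    for i in range(len(w)):  # match x to a nearby component
        if abs(x - mu[i]) <= n_sigma * math.sqrt(var[i]):
            match = i
            break
    for i in range(len(w)):
        hit = 1.0 if i == match else 0.0
        w[i] = (1 - beta) * w[i] + beta * hit
        if hit:
            mu[i] = (1 - beta) * mu[i] + beta * x
            var[i] = (1 - beta) * var[i] + beta * (x - mu[i]) ** 2
    if match is None:                 # unmatched: new Gaussian at x
        w.append(beta); mu.append(x); var.append(1.0)
    s = sum(w)
    for i in range(len(w)):
        w[i] /= s                     # keep weights a distribution
    return w, mu, var, match
```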

SLIDE 32

Classification Using Foreground Detection

  • A block is classified as foreground if the new data
  • do not match any existing Gaussian, or
  • match a newly created Gaussian

Possibility_foreground(b_{i,t}) = q_{i,t} / n_{i,t}

SLIDE 33

Similarity Based Classification

  • The similarity of a block to the target GMM is estimated by the Bhattacharyya distance:

Possibility_similarity(b_{i,t}) = 1 − f_norm(ψ(b_{i,t}, target_t))

ψ(p, p′) = Σ_{i=1}^{n} Σ_{j=1}^{m} w_i w′_j ξ(p_i, p′_j)

[Figure: target distribution; regions of high vs. low similarity value]

SLIDE 34

Feature-aware Classification Field

  • Linear combination of the foreground information and the similarity measure:

Possibility_feature(b_i) = γ · Possibility_similarity(b_i) + (1 − γ) · Possibility_foreground(b_i)

[Figure: foreground measure + similarity measure = final combined field]
SLIDE 35

Tracking in Classification Field

  • Given a user-specified threshold
  • Segment the data using the threshold
  • Apply a connected-component algorithm
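The two steps can be sketched with a small flood-fill labeler (4-connectivity, 2-D; a production version would use an optimized connected-components routine):

```python
import numpy as np

def track_features(field, threshold):
    """Segment a classification field at `threshold` and label its
    connected components; each labeled region is one feature candidate."""
    mask = field >= threshold
    labels = np.zeros(field.shape, dtype=int)
    nxt = 0
    for seed in zip(*np.nonzero(mask)):
        if labels[seed]:
            continue
        nxt += 1
        stack = [seed]                      # flood fill from the seed
        while stack:
            i, j = stack.pop()
            if not (0 <= i < mask.shape[0] and 0 <= j < mask.shape[1]):
                continue
            if not mask[i, j] or labels[i, j]:
                continue
            labels[i, j] = nxt
            stack += [(i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1)]
    return labels, nxt
```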

SLIDE 36

Distribution Driven Feature Tracking

  • Incremental estimation of the temporal data distribution: update block distributions incrementally
  • Estimate the foreground possibility; estimate the similarity with the target
  • Generate the classification field using (1) foreground information and (2) the similarity measure (the feature-aware classification field)
  • Extract and track features using the classification fields

SLIDE 37

Tracking Examples

T=10 T=20 T=40

SLIDE 38
Query and Exploration of Distributions

  • Provide an overview of the distribution data without sampling
  • Identify features from distributions directly
  • Visualizing probability distribution fields is challenging
  • visualizing the distribution at each data point needs more screen space
  • the overall trend may not be easy to see
  • Possible approaches
  • statistical summaries (e.g., the mean)
  • dissimilarity measures

[Figure: a probability distribution field]

SLIDE 39

Visualizing Cumulative Probabilities

  • Visualizing and analyzing distributions with cumulative probabilities over different value ranges
  • The cumulative probability of a probability density function f_X(x) for random variable X over a range Γ = (a, b) is defined as

P(Γ) = ∫_a^b f_X(x) dx

SLIDE 40

Probability Distribution Field to Cumulative Probability Fields

  • By calculating the cumulative probability over a given value range for the distribution on each grid point, a scalar field is generated
  • The resulting scalar field is called a range likelihood field (RLF)
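Computing an RLF then reduces to summing, at every grid point, the probability mass inside (a, b); this sketch assumes per-point histogram distributions on shared bin edges, with each bin lying fully inside or outside the range (partially covered bins would need fractional weighting):

```python
import numpy as np

def range_likelihood_field(probs, edges, a, b):
    """Turn a probability-distribution field into a scalar Range
    Likelihood Field: at each grid point, the probability its
    distribution places inside the value range (a, b).

    probs: (..., bins) bin probabilities per grid point on shared edges.
    """
    lo, hi = edges[:-1], edges[1:]
    inside = (lo >= a) & (hi <= b)        # bins within the range
    return probs[..., inside].sum(axis=-1)
```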

SLIDE 41

Exploring Value Ranges

  • To select representative value ranges, we
  • partition the value domain into N subranges Γ1, Γ2, …, ΓN
  • generate N RLFs L1, L2, …, LN for the subranges
  • compute distances between every pair of RLFs
  • organize the value ranges and corresponding RLFs into a binary tree using hierarchical clustering
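The last two steps can be sketched as average-linkage agglomerative clustering on pairwise L2 distances between RLFs (the distance and linkage choices here are illustrative, not necessarily the ones used):

```python
import numpy as np

def cluster_rlfs(rlfs):
    """Organize range likelihood fields into a binary merge tree by
    average-linkage agglomerative clustering. Returns the merge list
    [(id_a, id_b, new_id), ...]; leaf ids are 0..N-1, internal node ids
    continue from N.
    """
    flats = [f.ravel().astype(float) for f in rlfs]
    clusters = {i: [i] for i in range(len(flats))}   # id -> leaf members
    merges, nxt = [], len(flats)
    def dist(a, b):  # average pairwise distance between two clusters
        return np.mean([np.linalg.norm(flats[i] - flats[j])
                        for i in clusters[a] for j in clusters[b]])
    while len(clusters) > 1:
        ids = sorted(clusters)
        pairs = [(dist(a, b), a, b) for k, a in enumerate(ids)
                 for b in ids[k + 1:]]
        _, a, b = min(pairs)                         # closest pair merges
        clusters[nxt] = clusters.pop(a) + clusters.pop(b)
        merges.append((a, b, nxt))
        nxt += 1
    return merges
```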


SLIDE 43

Case Study - Massachusetts Bay Sea Trial Ensemble Dataset

  • The probability distribution field: kernel density estimation is performed for the variable chlorophyll-a concentration (CHL) on all 600 ensemble members
  • The initial RLT view

SLIDE 44

Case Study - Massachusetts Bay Sea Trial Ensemble Dataset

  • Visualizing user selected RLFs


SLIDE 45

Case Study - Massachusetts Bay Sea Trial Ensemble Dataset

  • Visualizing and Analyzing Multiple RLFs


SLIDE 46

Additional Work

  • Multivariate distribution modeling using Copula functions (Vis 17, 18)
  • Pathline and data modeling for time-varying flow fields (LDAV 16)
  • Efficient histogram search (EuroVis 16, PacificVis 17)
  • Uncertainty and sensitivity analysis of simulation parameters (Vis 16, 17)
  • Surface density estimation (TVCG 19)
  • Ensemble data modeling and reconstruction (PacificVis 19)
SLIDE 47

Future Research Directions