

SLIDE 1

Research in Middleware Systems For In-Situ Data Analytics and Instrument Data Analysis

Gagan Agrawal The Ohio State University (Joint work with Yi Wang, Yu Su, Tekin Bicer and others)

SLIDE 2

Outline

  • Middleware Systems

– Work on In Situ Analysis
– Analysis of Instrument Data

  • Compression/Summarization of Streaming Data

– Post-analysis using just the summary

SLIDE 3

In Situ Analysis – Simulation Data

  • In-Situ Algorithms

– No disk I/O
– Indexing, compression, visualization, statistical analysis, etc.

  • In-Situ Resource Scheduling Systems

– Enhance resource utilization
– Simplify the management of analytics code
– GoldRush, Glean, DataSpaces, FlexIO, etc.


[Diagram: the algorithm/application level and the platform/system level — are the two seamlessly connected?]

SLIDE 4

Opportunity

  • Explore the Programming Model Level in the In-Situ Environment

– Between the application level and the system level
– Hides all the parallelization complexities behind a simplified API
– A prominent example: MapReduce


[Diagram: MapReduce + In Situ]
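To make the programming-model level concrete, here is a minimal, hypothetical MapReduce-style engine in Python. The `run_mapreduce` name and the histogram example are illustrative, not part of the actual Smart API: the point is that the user supplies only sequential `map_fn`/`reduce_fn` code, and the middleware hides the parallelization.

```python
from collections import defaultdict

def run_mapreduce(records, map_fn, reduce_fn):
    """Minimal MapReduce-style engine: map each record to (key, value)
    pairs, group values by key, then reduce each group."""
    groups = defaultdict(list)
    for rec in records:
        for key, value in map_fn(rec):
            groups[key].append(value)
    return {key: reduce_fn(values) for key, values in groups.items()}

# Example analytics task: histogram of simulation values by integer bin.
counts = run_mapreduce(
    [0.2, 1.7, 1.1, 0.9],
    map_fn=lambda v: [(int(v), 1)],
    reduce_fn=sum,
)
```

A real in-situ engine would parallelize the map loop across threads and combine per-thread groups, but the user-visible API is this small.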

SLIDE 5

Challenges

  • Hard to Adapt MapReduce to the In-Situ Environment

– MapReduce is not designed for in-situ analytics

  • 4 Mismatches

– Data Loading Mismatch
– Programming View Mismatch
– Memory Constraint Mismatch
– Programming Language Mismatch


SLIDE 6

System Overview


[Diagram: Shared-Memory System, Distributed System, and In-Situ System]

In-Situ System = Shared-Memory System + Combination
In-Situ System = Distributed System – Partitioning
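The identity above can be sketched in Python: an in-situ run keeps the distributed model's combination step but drops the partitioning/loading stage, because each node reduces the simulation data already resident in its memory. The function names (`local_reduction`, `global_combination`) and the global-mean example are illustrative:

```python
def local_reduction(resident_data):
    """Shared-memory stage: each node reduces the simulation data that is
    already resident in its memory -- no partitioning or loading step."""
    return (sum(resident_data), len(resident_data))

def global_combination(partials):
    """Combination stage borrowed from the distributed model: merge the
    per-node partial results into a final answer (here, a global mean)."""
    total = sum(t for t, _ in partials)
    count = sum(c for _, c in partials)
    return total / count

per_node_data = [[1.0, 2.0], [3.0, 5.0]]   # data resident on two nodes
mean = global_combination([local_reduction(d) for d in per_node_data])
```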

SLIDE 7

Two In-Situ Modes


Time Sharing Mode: minimizes memory consumption.
Space Sharing Mode: enhances resource utilization when the simulation reaches its scalability bottleneck.
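A toy sketch of the two modes, with hypothetical `simulate_step`/`analyze` stand-ins for the real simulation and analytics kernels: in time sharing, analytics runs on the simulation's own cores between time steps; in space sharing, it runs concurrently on dedicated worker threads while the simulation keeps producing steps.

```python
from concurrent.futures import ThreadPoolExecutor

def simulate_step(step):
    """Toy stand-in for one simulation time step."""
    return [step * i for i in range(4)]

def analyze(data):
    """Toy analytics kernel (e.g., a global reduction)."""
    return sum(data)

def time_sharing(num_steps):
    """Time sharing: analytics alternates with the simulation on the
    same cores, minimizing extra memory footprint."""
    results = []
    for s in range(num_steps):
        data = simulate_step(s)
        results.append(analyze(data))   # runs after each step completes
    return results

def space_sharing(num_steps, workers=2):
    """Space sharing: analytics runs on dedicated worker threads,
    using cores the simulation cannot scale onto."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(analyze, simulate_step(s))
                   for s in range(num_steps)]
        return [f.result() for f in futures]
```

Both modes compute the same results; they differ only in where and when the analytics runs.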

SLIDE 8

Smart vs. Spark

  • To Make a Fair Comparison

– Bypass programming view mismatch

  • Run on an 8-core node: multi-threaded but not distributed

– Bypass memory constraint mismatch

  • Use a simulation emulator that consumes little memory

– Bypass programming language mismatch

  • Rewrite the simulation in Java and only compare computation time
  • 40 GB input and 0.5 GB per time-step


Computation times (secs) by number of threads:

K-Means (Smart up to 62X faster):
  Threads: 1      2      4     8
  Smart:   813    424    210   105
  Spark:   15550  10403  7750  6559

Histogram (Smart up to 92X faster):
  Threads: 1      2     4     8
  Smart:   344    173   96    43
  Spark:   10361  6697  4766  3992

SLIDE 9

Smart vs. Low-Level Implementations

  • Setup

– Smart: time sharing mode; Low-Level: OpenMP + MPI
– Apps: K-means and logistic regression
– 1 TB input on 8–64 nodes

  • Programmability

– 55% and 69% of the parallel code is either eliminated or converted into sequential code

  • Performance

– Up to 9% extra overhead for K-means
– Nearly unnoticeable overhead for logistic regression


[Charts: computation times (secs) on 8–64 nodes for Smart vs. Low-Level — K-Means (left) and Logistic Regression (right)]

SLIDE 10

Tomography at Advanced Photon Source

EuroPar’15

SLIDE 11

Tomographic Image Reconstruction

  • Analysis of tomographic datasets is challenging
  • Long image reconstruction/analysis time

– E.g., 12 GB of data takes 12 hours on 24 cores
– Different reconstruction algorithms

  • Longer computation times

– Input dataset is smaller than output dataset

  • 73 MB vs. 476 MB

  • Parallelization using MATE+

– Predecessor of the Smart system

SLIDE 12


Inputs:
  IS: assigned projection slices
  Recon: reconstruction object
  dist: subsetting distance
Output:
  Recon: final reconstruction object

/* (Partial) iteration i */
For each assigned projection slice, is, in IS {
    IR = GetOrderedRaySubset(is, i, dist);
    For each ray, ir, in IR {
        (k, off, val) = LocalRecon(ir, Recon(is));
        ReconRep(k) = Reduce(ReconRep(k), off, val);
    }
}
/* Combine updated replicas */
Recon = PartialCombination(ReconRep)
/* Exchange and update adjacent slices */
Recon = GlobalCombination(Recon)

Mapping to a MapReduce-like API

[Diagram of iteration i: on each node (Node 0 … Node n), threads 0 … m each run LocalRecon(…) into their own ReconRep replica (Local Reconstruction Phase); replicas are merged per node by PartialCombination() (Partial Combination Phase); adjacent slices are then exchanged and updated across nodes (Global Combination Phase), producing Recon[i] from the projections and Recon[i-1]]
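The replicated-reduction pattern in the pseudocode above can be sketched as runnable Python. The ray layout and the per-ray update rule here are toy stand-ins for the actual reconstruction kernel; what is faithful is the structure: per-ray updates accumulate into replicas, which are then combined into the reconstruction object.

```python
def reconstruct_iteration(slices, recon, num_replicas=2):
    """One (partial) iteration of the replicated-reduction pattern:
    per-ray local updates go into replicas (one per thread in the real
    system), which are then merged by a partial-combination step."""
    replicas = [dict() for _ in range(num_replicas)]
    for s_idx, rays in enumerate(slices):
        for r_idx, ray in enumerate(rays):
            k = r_idx % num_replicas            # replica owning this update
            off, val = s_idx, ray * 0.5         # toy LocalRecon(ir, Recon(is))
            replicas[k][off] = replicas[k].get(off, 0.0) + val  # Reduce
    # PartialCombination: merge replicas into the reconstruction object;
    # the real system would follow with a GlobalCombination across nodes.
    for rep in replicas:
        for off, val in rep.items():
            recon[off] = recon.get(off, 0.0) + val
    return recon
```

Because each replica is private until the combination step, the inner loop needs no locking, which is what makes the pattern map cleanly onto a MapReduce-like API.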

SLIDE 13

In Situ Analysis

  • How do we decide what data to save?

– This analysis cannot take too much time/memory
– Simulations already consume most available memory
– Scientists cannot accept much slowdown for analytics

  • How can insights be obtained in situ?

– Must be memory- and time-efficient

  • What representation should be used for data stored on disk?

– Effective analysis/visualization
– Disk/network efficient

SLIDE 14

Specific Issues

  • Bitmaps as Data Summarization

– Utilize extra compute power for data reduction
– Save memory usage, disk I/O, and network transfer time

  • In-Situ Data Reduction

– Generate bitmaps in situ

  • Bitmap generation is time-consuming
  • Bitmaps before compression have a large memory cost

  • In-Situ Data Analysis

– Time-step selection

  • Can bitmaps support time-step selection?
  • Efficiency of time-step selection using bitmaps

  • Offline Analysis

– Keep only bitmaps instead of the full data
– Types of analysis supported by bitmaps
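A minimal sketch of bitmap-based summarization, assuming simple threshold binning (the binning scheme and function names are illustrative, not the system's actual encoding): each value sets one bit in the bitmap of its bin, after which range queries become bitwise ORs plus a population count, with no access to the original data.

```python
def build_bitmaps(values, bin_edges):
    """Bin each value and set the corresponding bit in that bin's bitmap.
    Uses Python ints as arbitrary-length bit vectors."""
    bitmaps = [0] * (len(bin_edges) + 1)
    for i, v in enumerate(values):
        b = sum(1 for e in bin_edges if v >= e)   # index of bin containing v
        bitmaps[b] |= 1 << i                      # set bit i in bin b's bitmap
    return bitmaps

def count_in_range(bitmaps, lo_bin, hi_bin):
    """Answer a value-range query by OR-ing bin bitmaps and counting bits."""
    acc = 0
    for b in range(lo_bin, hi_bin + 1):
        acc |= bitmaps[b]
    return bin(acc).count("1")

bms = build_bitmaps([0.1, 0.5, 0.9, 0.5], [0.25, 0.75])
```

In practice the bitmaps would be run-length compressed (e.g., WAH-style), which is what makes them cheap to write out and to keep in memory in situ.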

SLIDE 15

Time-Step Selection

[Diagram: storing full data to I/O devices makes both the I/O and the correlation metrics slow; storing bitmaps makes both the I/O and the correlation metrics fast]

SLIDE 16

Efficiency Comparison for In-Situ Analysis - MIC

  • MIC: more cores, lower bandwidth
  • Full Data (original): huge data-writing time
  • Bitmaps:

– Good scalability of both bitmap generation and time-step selection using bitmaps
– Much smaller data-writing time

  • Overall: 0.81x to 3.28x relative to full data
  • Setup: Heat3D simulation on MIC; select 25 of 100 time steps; 1.6 GB per time step (200×1000×1000); metric: conditional entropy
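The conditional-entropy metric used for time-step selection can be sketched over binned (bitmap-indexed) data. A low H(Y|X) between two time steps means step Y adds little information beyond step X and can be skipped. This is an illustrative formulation over bin indices, not the system's exact implementation:

```python
from collections import Counter
from math import log2

def conditional_entropy(x_bins, y_bins):
    """H(Y|X) from two aligned sequences of bin indices, one per grid
    point, as recoverable from the two time steps' bitmap summaries."""
    n = len(x_bins)
    joint = Counter(zip(x_bins, y_bins))          # joint bin counts
    marg_x = Counter(x_bins)                      # marginal counts for X
    h = 0.0
    for (x, _), c in joint.items():
        p_xy = c / n                              # P(x, y)
        p_y_given_x = c / marg_x[x]               # P(y | x)
        h -= p_xy * log2(p_y_given_x)
    return h
```

When the second step is fully determined by the first, H(Y|X) is 0; selection keeps the time steps where it is largest.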