Storing and Processing Multi-dimensional Scientific Datasets
Alan Sussman
UMIACS & Department of Computer Science
http://www.cs.umd.edu/~als
Data Exploration and Analysis
Large data collections emerge as important resources
– Data collected from sensors and large-scale simulations
– Multi-resolution, multi-scale, multi-dimensional
  - data elements often correspond to points in a multi-dimensional attribute space
  - medical images, satellite data, hydrodynamics data, etc.
– Terabytes to petabytes today
Low-cost, high-performance, high-capacity commodity hardware
– 5 PCs, 5 Terabytes of disk storage for << $10,000
Large Data Collections
Scientific data exploration and analysis
– To identify trends or interesting phenomena
– Requires only a portion of the data, accessed through a spatial index
  - e.g., quad-tree, R-tree
Spatial (range) query often used to specify an iterator (a sketch follows below)
– computation on data obtained from the spatial query
– computation aggregates data (as in MapReduce) – resulting data product size is significantly smaller than the results of the range query
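To make the query pattern concrete, here is a minimal C++ sketch (all types and names are hypothetical, not ADR's actual interfaces): each chunk carries a multi-dimensional bounding box, a range query selects the chunks that intersect the query box, and an aggregation reduces the selected elements to a much smaller data product. A real system would consult a quad-tree or R-tree instead of the linear scan shown here.

    #include <cstddef>
    #include <cstdio>
    #include <vector>

    // Hypothetical chunk layout for illustration; ADR's interfaces differ.
    struct Box   { double lo[2], hi[2]; };                // 2-D bounding box
    struct Chunk { Box mbr; std::vector<double> values; };

    static bool overlaps(const Box& a, const Box& b) {
        for (int d = 0; d < 2; ++d)
            if (a.hi[d] < b.lo[d] || b.hi[d] < a.lo[d]) return false;
        return true;
    }

    int main() {
        std::vector<Chunk> dataset = {
            {{{0, 0}, {10, 10}},   {1.0, 2.0, 3.0}},
            {{{20, 20}, {30, 30}}, {4.0, 5.0}},
            {{{5, 5}, {15, 15}},   {6.0}},
        };
        Box query{{0, 0}, {12, 12}};                      // spatial range of interest

        // Range query: keep chunks whose bounding box intersects the query box
        // (a quad-tree or R-tree would replace this linear scan), then
        // aggregate the selected elements down to one small result.
        double sum = 0; std::size_t count = 0;
        for (const Chunk& c : dataset) {
            if (!overlaps(c.mbr, query)) continue;
            for (double v : c.values) { sum += v; ++count; }
        }
        std::printf("mean over query region = %f (%zu elements)\n",
                    count ? sum / count : 0.0, count);
        return 0;
    }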
Typical Query
– Specify portion of raw sensor data corresponding to some search criterion
– Output grid onto which a projection is carried out
Target example applications
Processing Remotely-Sensed Data
NOAA Tiros-N w/ AVHRR sensor
AVHRR Level 1 Data
- As the TIROS-N satellite orbits, the Advanced Very High Resolution Radiometer (AVHRR) sensor scans perpendicular to the satellite's track.
- At regular intervals along a scan line, measurements are gathered to form an instantaneous field of view (IFOV).
- Scan lines are aggregated into Level 1 data sets.
A single file of Global Area Coverage (GAC) data represents:
- ~one full earth orbit.
- ~110 minutes.
- ~40 megabytes.
- ~15,000 scan lines.
- One scan line is 409 IFOVs.
Other target applications: water contamination study, pathology, satellite data processing, multi-perspective volume reconstruction.
Outline
Active Data Repository
– Overall architecture
– Query planning
– Query execution
– Experimental results
DataCutter
Active Data Repository (ADR)
An object-oriented framework (class library + runtime system) for building parallel databases of multi-dimensional datasets
– enables integration of storage, retrieval and processing of multi-dimensional datasets on distributed-memory parallel machines
– can store and process multiple datasets
– provides support and runtime system for common operations such as data retrieval, memory management, and scheduling of processing across a parallel machine
– customizable for application-specific processing
ADR Architecture
[Figure: clients (sequential or parallel) submit queries through an application front end and receive results from the parallel back end, which comprises the query submission, query interface, query planning, query execution, indexing, data aggregation, attribute space, and dataset services.]
Active Data Repository (ADR)
Dataset is a collection of user-defined data chunks
– a data chunk contains a set of data elements
– multi-dimensional bounding box (MBR) for each chunk, used by the spatial index
– chunks declustered across disks to maximize aggregate I/O bandwidth (see the sketch below)
Separate planning and execution phases for queries
– Tile output if too large to fit entirely in memory
– Plan each tile's I/O, data movement and computation
  - Identify all chunks of input that map to the tile
  - Distribute processing for chunks among processors
– All processors work on one tile at a time
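As a toy illustration of the declustering bullet above (hypothetical, not ADR's actual placement algorithm), a simple round-robin assignment spreads consecutive chunks across disks, so the chunks touched by a typical range query can be read from many disks in parallel:

    #include <cstddef>
    #include <vector>

    // Toy round-robin declustering (hypothetical): chunk i is stored on disk
    // i mod D, so a query touching a run of nearby chunks reads from many
    // disks concurrently, maximizing aggregate I/O bandwidth.
    std::vector<std::size_t> decluster(std::size_t numChunks, std::size_t numDisks) {
        std::vector<std::size_t> diskOf(numChunks);
        for (std::size_t i = 0; i < numChunks; ++i)
            diskOf[i] = i % numDisks;
        return diskOf;
    }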
Query Planning
Three steps: index lookup, tiling, workload partitioning
Index lookup
– Select data chunks of interest
– Compute mapping between input and output chunks
Tiling
– Partition output chunks so that each tile fits in memory
– Use Hilbert curve to minimize total length of tile boundaries (a sketch follows below)
Workload partitioning
– Each aggregation operation involves an input/output chunk pair
– Want good load balance and low communication overhead
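The tiling step can be sketched as follows (a hedged illustration with assumed chunk and memory sizes, not ADR's planner): order the output chunks by their position along a Hilbert curve, then greedily cut the ordered list into tiles that fit the memory budget. Because the Hilbert curve preserves spatial locality, consecutive chunks are spatial neighbors, which keeps total tile boundary length short. The xy2d routine is the standard Hilbert index computation for a 2^k x 2^k grid.

    #include <algorithm>
    #include <cstdio>
    #include <utility>
    #include <vector>

    // Rotate/flip a quadrant so the Hilbert recursion lines up (standard step).
    static void rot(int n, int& x, int& y, int rx, int ry) {
        if (ry == 0) {
            if (rx == 1) { x = n - 1 - x; y = n - 1 - y; }
            std::swap(x, y);
        }
    }

    // Index of grid cell (x, y) along a Hilbert curve over an n x n grid,
    // where n is a power of two.
    static int xy2d(int n, int x, int y) {
        int d = 0;
        for (int s = n / 2; s > 0; s /= 2) {
            int rx = (x & s) > 0;
            int ry = (y & s) > 0;
            d += s * s * ((3 * rx) ^ ry);
            rot(n, x, y, rx, ry);
        }
        return d;
    }

    int main() {
        const int n = 4;                    // 4x4 grid of output chunks
        const long chunkBytes = 64L << 20;  // assumed 64MB per output chunk
        const long memBudget  = 256L << 20; // assumed 256MB per-tile budget

        // Order all output chunks along the Hilbert curve.
        std::vector<std::pair<int, std::pair<int, int>>> order;
        for (int x = 0; x < n; ++x)
            for (int y = 0; y < n; ++y)
                order.push_back({xy2d(n, x, y), {x, y}});
        std::sort(order.begin(), order.end());

        // Greedily cut the ordered chunks into tiles that fit in memory:
        // neighbors on the curve are spatial neighbors, so tiles stay compact.
        int tile = 0; long used = 0;
        for (const auto& e : order) {
            if (used + chunkBytes > memBudget) { ++tile; used = 0; }
            used += chunkBytes;
            std::printf("chunk (%d,%d) -> tile %d\n",
                        e.second.first, e.second.second, tile);
        }
        return 0;
    }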
Query Execution
Broadcast query plan to all processors
For each output tile:
– Initialization phase
  - Read output chunks into memory, replicate if necessary
– Reduction phase
  - Read and process input chunks that map to the current tile
– Combine phase
  - Combine partial results in replicated output chunks, if any
– Output handling phase
  - Compute final output values
ADR Processing Loop

O ← Output dataset, I ← Input dataset
A ← Accumulator (for intermediate results)
[S_I, S_O] ← Intersect(I, O, R_query)
foreach o_e in S_O do
    read o_e
    a_e ← Initialize(o_e)
foreach i_e in S_I do
    read i_e
    S_A ← Map(i_e) ∩ S_O
    foreach a_e in S_A do
        a_e ← Aggregate(i_e, a_e)
foreach a_e in S_O do
    o_e ← Output(a_e)
    write o_e
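Below is a hedged C++ rendering of the loop above for a single tile, with the user-customizable Initialize / Map / Aggregate / Output steps inlined as one concrete example (computing a mean per output element). The element types and the key-matching Map are illustrative assumptions, not ADR's class library.

    #include <cstddef>
    #include <vector>

    // Illustrative element types; real ADR chunks and elements are user-defined.
    struct InputElem  { int key; double value; };
    struct OutputElem { int key; double value; };
    struct Accum      { double sum = 0; int count = 0; };

    // One pass over the chunks selected by the range query for a single tile,
    // following the Initialize / Map / Aggregate / Output structure above.
    void processTile(const std::vector<InputElem>& selIn,   // S_I
                     std::vector<OutputElem>& selOut)       // S_O
    {
        // Initialize: one accumulator per output element in this tile.
        std::vector<Accum> acc(selOut.size());

        // Reduction: Map each input element to the accumulators it affects
        // (here, by matching keys), then Aggregate (here, a running sum).
        for (const InputElem& ie : selIn)
            for (std::size_t j = 0; j < selOut.size(); ++j)
                if (selOut[j].key == ie.key) {              // Map(i_e) ∩ S_O
                    acc[j].sum   += ie.value;               // Aggregate
                    acc[j].count += 1;
                }

        // Output: turn each accumulator into a final value (here, a mean).
        for (std::size_t j = 0; j < selOut.size(); ++j)
            if (acc[j].count > 0)
                selOut[j].value = acc[j].sum / acc[j].count;
    }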
Query Execution Strategies
Distributed Accumulator (DA)
– Assign aggregation operation to the owner of the output chunk
Fully Replicated Accumulator (FRA)
– Assign aggregation operation to the owner of the input chunk
– Requires a combine phase
Sparsely Replicated Accumulator (SRA)
– similar to FRA, but only replicates an output chunk when needed (a sketch follows below)
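A minimal sketch of where the aggregation for an (input chunk, output chunk) pair executes under each strategy (hypothetical owner maps, not ADR code): DA ships input data to the output chunk's owner, while FRA/SRA compute at the input chunk's owner and merge replicated accumulators in the combine phase.

    // Hypothetical owner maps: inOwner/outOwner are the processors that store
    // the input and output chunks of one aggregation operation.

    // DA: run on the output chunk's owner; input chunks are communicated,
    // so communication volume grows with fan-out.
    int assignDA(int /*inOwner*/, int outOwner)  { return outOwner; }

    // FRA/SRA: run on the input chunk's owner into a replicated accumulator;
    // partial results are merged in the combine phase (volume grows with fan-in).
    int assignFRA(int inOwner, int /*outOwner*/) { return inOwner; }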
Performance Evaluation
128-node IBM SP, with 256MB memory per node
Datasets generated by Application Emulators
– Satellite Data Processing (SAT) – non-uniform mapping
– Virtual Microscope (VM)

App | Input    | Output | Fan-in   | Fan-out (avg) | Comp (ms), t_init-t_red-t_comb
SAT | 1.6-26GB | 25MB   | 161-1307 | 4.6           | 1-40-20
VM  | 1.5-24GB | 192MB  | 16-128   | 1.0           | 1-5-1
[Figure: query execution time (sec) vs. number of processors (8-128) for the FRA, DA, and SRA strategies; left panel SAT, right panel VM (fixed input size).]
Summary of Experimental Results
Communication volume
– Comm. volume (DA) ∝ fan-out
– Comm. volume (FRA/SRA) ∝ fan-in
DA may have computational load imbalance due to non-uniform mapping
Relative performance depends on
– Query characteristics (e.g., fan-in, fan-out)
– Machine configurations (e.g., number of processors)
No strategy always outperforms the others
ADR queries vs. Other Approaches
Similar to out-of-core reductions (a more general MapReduce)
– Commutative & associative operations
– Most reduction optimization techniques target in-core data
– Out-of-core techniques require data redistribution
Similar to relational group-by queries
– Distributive & algebraic aggregates [Gray96]
– spatial join + group-by
– For ADR, output data items and extents are known prior to processing
double x[max_nodes], y[max_nodes];
int ia[max_edges], ib[max_edges];
for (i = 0; i < max_edges; i++)
    x[ia[i]] += y[ib[i]];

SELECT Dept, AVG(Salary)
FROM Employee
GROUP BY Dept
Outline
Active Data Repository
DataCutter
– Architecture
– Filter-stream programming
– Group instances
– Transparent copies
Distributed Grid Environment
Heterogeneous shared resources:
– Host level: machine, CPUs, memory, disk storage
– Network connectivity
Many remote datasets:
– Inexpensive archival storage
– Islands of useful data
– Too large for replication
DataCutter
Targets the same classes of applications as ADR
Indexing service
– Multi-level hierarchical indexes based on spatial indexing methods, e.g., R-trees
– Relies on an underlying multi-dimensional space
– User can add new indexing methods
Filtering service
– Distributed C++ (and Java) component framework
– Transparent tuning and adaptation for heterogeneity
– Filters implemented as threads – 1 process per host
Filter-Stream Programming (FSP)
Purpose: specialized components for processing data, based on Active Disks research [Acharya, Uysal, Saltz: ASPLOS'98], macro-dataflow, and functional parallelism
Filters – logical unit of computation
– high-level tasks
– init, process, finalize interface
Streams – how filters communicate
– unidirectional buffer pipes
– use fixed-size buffers (min, good)
Users specify filter connectivity and filter-level characteristics
[Figure: example filter group – Extract ref (from a Reference DB) and Extract raw (from the Raw Dataset) feed a 3D reconstruction filter, whose output goes to a View result filter.]
FSP: Abstractions
Filter Group
– logical collection of filters to use together
– application starts filter group instances
Unit-of-work cycle
– "work" is application defined (e.g., a query)
– work is appended to running instances
– init(), process(), finalize() called for each unit of work
– process() returns { EndOfWork | EndOfFilter }
– allows for adaptivity (a filter sketch follows below)
[Figure: filters A and B connected by stream S; buffers for successive units of work (uow 0, uow 1, uow 2) flow through the stream.]
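The following C++ sketch shows the filter abstraction described above: an init/process/finalize interface, unidirectional streams of fixed-size buffers, and a process() that handles one unit of work and reports EndOfWork. All type and method names here are assumptions for illustration, not DataCutter's actual API.

    #include <vector>

    // Hedged sketch of the filter-stream abstractions; names are hypothetical.
    enum class ProcessResult { EndOfWork, EndOfFilter };

    struct Buffer { std::vector<unsigned char> data; };

    struct Stream {                        // unidirectional buffer pipe
        virtual bool read(Buffer& b) = 0;  // false once this unit of work ends
        virtual void write(const Buffer& b) = 0;
        virtual ~Stream() = default;
    };

    // A filter is the logical unit of computation, driven through the
    // init / process / finalize interface once per unit of work.
    class Filter {
    public:
        virtual void init() {}
        virtual ProcessResult process(Stream& in, Stream& out) = 0;
        virtual void finalize() {}
        virtual ~Filter() = default;
    };

    // Example filter: thresholds each byte and forwards the buffer downstream.
    class ThresholdFilter : public Filter {
    public:
        ProcessResult process(Stream& in, Stream& out) override {
            Buffer b;
            while (in.read(b)) {               // consume this unit of work
                for (auto& v : b.data) v = (v > 128) ? 255 : 0;
                out.write(b);                  // buffers flow downstream
            }
            return ProcessResult::EndOfWork;   // ready for the next unit of work
        }
    };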
Optimization Techniques
Mapping filters to hosts
– allow components to execute concurrently
Multiple filter group instances
– allow work to be processed concurrently
Transparent copies
– keep the pipeline full by avoiding filter processing imbalance, using write policies to deal with dynamic buffer distribution
Application memory tuning
– minimize resource usage to allow for copies
Optimization - Group Instances
[Figure: two filter group instances (P0-F0-C0 and P1-F1-C1) mapped across three 2-CPU hosts, processing a stream of work.]
Match the number of instances to the environment (CPU capacity, network)
Transparent Copies
Replicate filters within an instance (intra-work parallelism)
Write policy distributes work buffers among the copies
– shared queue within a host
– across hosts: round robin (RR), weighted RR (WRR), demand-driven (DD), user-defined (UD)
Maintains the illusion of a single stream: buffers of unit of work i are delivered before those of unit of work i+1
State consistency problems addressed by a merge step
[Figure: producer P0 feeds transparent copies F0 and F1; the write policy decides which copy receives each buffer before results reach consumer C0.]
Runtime Pipeline Balancing
Use local information:
– queue size, send time / receiver acks
Adjust the number of transparent copies
Demand-based dataflow (choice of consumer)
– Within a host – perfect shared queue among copies
– Across hosts (a sketch follows below):
  - Round Robin (RR)
  - Weighted Round Robin (WRR)
  - Demand-Driven (DD) – sliding window over buffer consumption rate
  - User-defined
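Here is a hedged sketch of the three across-host policies named above; the CopyState bookkeeping and function signatures are assumptions, not DataCutter's implementation. RR cycles blindly, WRR biases toward copies with higher static weights, and DD picks the copy with the fewest unacknowledged buffers in the current sliding window, i.e., the fastest consumer right now.

    #include <cstddef>
    #include <cstdint>
    #include <vector>

    // Per-copy bookkeeping for the write policy; fields are assumptions.
    struct CopyState {
        int weight = 1;          // static capacity estimate (for WRR)
        std::size_t sent  = 0;   // buffers sent in the sliding window
        std::size_t acked = 0;   // buffers acknowledged in the window
    };

    // Round robin: cycle through the copies, ignoring their state.
    std::size_t pickRR(std::size_t& next, std::size_t numCopies) {
        return next++ % numCopies;
    }

    // Weighted round robin: favor copies proportionally to their weight.
    std::size_t pickWRR(std::size_t& next, const std::vector<CopyState>& copies) {
        std::size_t total = 0;
        for (const auto& c : copies) total += c.weight;
        std::size_t slot = next++ % total;
        for (std::size_t i = 0; i < copies.size(); ++i) {
            if (slot < static_cast<std::size_t>(copies[i].weight)) return i;
            slot -= copies[i].weight;
        }
        return 0;
    }

    // Demand-driven: send to the copy with the fewest unacknowledged buffers
    // outstanding in the window, i.e., the one consuming fastest.
    std::size_t pickDD(const std::vector<CopyState>& copies) {
        std::size_t best = 0, bestOut = SIZE_MAX;
        for (std::size_t i = 0; i < copies.size(); ++i) {
            std::size_t outstanding = copies[i].sent - copies[i].acked;
            if (outstanding < bestOut) { bestOut = outstanding; best = i; }
        }
        return best;
    }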
Experiment – Isosurface Rendering
Isosurface rendering on the Red/Blue Linux cluster at Maryland
– Red – 16 2-processor PII-450 nodes, 256MB memory, 18GB SCSI disk
– Blue – 12 2-processor PIII-550 nodes, 1GB memory, 2 8GB SCSI disks, plus 1 8-processor PIII-550 node, 4GB memory, 2 18GB SCSI disks
– Connected via Gigabit Ethernet
UT Austin ParSSim chemical species transport simulation
– Single time step 3D visualization; read all data for 1 time step
Two implementations of the Raster filter – z-buffer and active pixels
Sample Isosurface Visualization
[Figure: sample isosurface visualizations for isovalues V = 0.35 and V = 0.7.]
Experimental setup
Pipeline: R (read dataset) → E (isosurface extraction) → Ra (shade + rasterize) → M (merge / view)

Per-filter time    R       E       Ra       M       Total
Active Pixel     0.64s   1.64s   11.67s   0.73s   = 14.68s
Z-buffer         0.68s   1.65s    9.43s   0.90s   = 12.66s

Data volumes between stages: 150 MB, 38.6 MB, 11.8 MB, 28.5 MB (Active Pixel); 32.0 MB (Z-buffer)
Experiment to follow combines R and E filters, since that showed best performance in experiments not shown
Active Pixel vs. Z-Buffer
[Figure: time (seconds) vs. number of processors (1-8) with 1 or 2 Raster filter copies, comparing the Active Pixel and Z-buffer implementations.]
Configuration: RE-Ra-M. Only Red nodes used – each runs 1 RE and 1 or 2 Ra, and one node runs M.
Heterogeneous Nodes
– Active Pixel algorithm on the 8-processor Blue node plus Red data nodes
– Blue node runs 7 Ra or ERa copies and M; Red nodes each run 1 of each filter except M
[Figure: time (seconds) vs. number of processors (1-8) for the RR, WRR, and DD write policies, under the RE-Ra-M and R-ERa-M decompositions.]
Summary of Results
Placement matters
– Heterogeneity of shared resources, data volume
More instances and transparent copies
– Balance applications for heterogeneity
No static choice will work
– Runtime heterogeneity and dynamic shared resources
DataCutter as a Grid Service
[Figure: DataCutter in the Grid software landscape, spanning the application level, programming models, infrastructure services, and the resource level – alongside systems such as AppLeS, NetSolve, Ninf, HPC++, JavaRMI, DCOM, CORBA, Legion, Condor pools, SRB, Harmony DSM, MPI, RPC, DPSS, NWS, and Globus, over grid-available, user-specified, and idle resources and client/server sockets.]
Acknowledgments
Students
– Chialin Chang – ADR
– Michael Beynon, Renato Ferreira – DataCutter
Other faculty and postdocs (now at Ohio State)
– Joel Saltz
– Tahsin Kurc
– Umit Catalyurek