

SLIDE 1

Architecture and Performance of Runtime Environments for Data Intensive Scalable Computing

Data/compute intensive applications implemented as MapReduce “filters”

[Figure: Architecture of CGL-MapReduce]

Measured using 32 compute nodes, each with 8 cores and 16 GB of memory.

CAP3 – Gene Assembly Program

  • Compute intensive application
  • Embarrassingly parallel operation (a map-only sketch follows below)
  • All runtimes perform equally well

[Chart: CAP3 performance vs. number of reads processed]
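Because the operation is embarrassingly parallel, CAP3 maps naturally onto a map-only job: each map task simply shells out to the cap3 executable on one input file, and there is no reduce step. Below is a minimal single-node sketch in plain Java; the binary path, input directory, and .fsa file layout are illustrative assumptions, not details from the slide.

```java
import java.io.File;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/** Map-only CAP3 run: one cap3 process per input file, no reduce step. */
public class Cap3MapOnly {
    // Hypothetical paths; a real deployment would stage files onto each node.
    static final String CAP3_BIN = "/opt/cap3/cap3";
    static final String INPUT_DIR = "/data/reads";

    public static void main(String[] args) throws Exception {
        File[] inputs = new File(INPUT_DIR).listFiles((dir, name) -> name.endsWith(".fsa"));
        ExecutorService pool = Executors.newFixedThreadPool(8); // 8 cores per node
        for (File f : inputs) {
            pool.submit(() -> {
                try {
                    // Each "map task" just shells out to cap3; the assembler writes
                    // its results next to the input file, so no reduce is needed.
                    new ProcessBuilder(CAP3_BIN, f.getAbsolutePath())
                            .inheritIO().start().waitFor();
                } catch (Exception e) {
                    e.printStackTrace();
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.DAYS);
    }
}
```

Since each invocation is independent, any of the measured runtimes only has to schedule the invocations, which is why they all perform equally well here.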

High Energy Physics Data Analysis

  • Data intensive application
  • MapReduce style parallel operation (a histogram sketch follows below)
  • Both runtimes perform comparably well
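The HEP analysis is MapReduce-style in the classic sense: each map task histograms its own partition of event data, and the reduce step merges the partial histograms bin by bin. A minimal sketch, with the bin count and value range as illustrative assumptions:

```java
import java.util.Arrays;
import java.util.List;

/** Shape of the HEP analysis as MapReduce: map = partial histogram, reduce = merge. */
public class HepHistogram {
    static final int BINS = 100;            // illustrative bin count
    static final double MIN = 0, MAX = 10;  // illustrative value range

    /** Map: histogram one partition of event values. */
    static long[] map(double[] events) {
        long[] h = new long[BINS];
        for (double v : events) {
            int b = (int) ((v - MIN) / (MAX - MIN) * BINS);
            if (b >= 0 && b < BINS) h[b]++;
        }
        return h;
    }

    /** Reduce: merge partial histograms bin by bin. */
    static long[] reduce(List<long[]> partials) {
        long[] total = new long[BINS];
        for (long[] h : partials)
            for (int i = 0; i < BINS; i++) total[i] += h[i];
        return total;
    }

    public static void main(String[] args) {
        long[] merged = reduce(List.of(
                map(new double[] {1.5, 2.5, 2.6}),
                map(new double[] {2.4, 9.9})));
        System.out.println(Arrays.toString(merged));
    }
}
```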


SLIDE 2

Iterative MapReduce – Kmeans Clustering and Matrix Multiplication

[Figure: Iterative MapReduce algorithm for Matrix Multiplication]
[Figure: Kmeans Clustering implemented as an iterative MapReduce application]

Overhead of parallel runtimes – Matrix Multiplication

  • Compute intensive application: O(n^3)
  • Higher data transfer requirements: O(n^2)
  • CGL-MapReduce shows minimal overheads next to MPI (overhead defined below)
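The overhead plotted in the two charts on this slide is presumably the usual parallel-overhead metric: with P compute units, parallel runtime T_P, and sequential runtime T_1,

```latex
\mathrm{overhead}(P) = \frac{P \cdot T_P - T_1}{T_1}
```

so values near zero indicate near-ideal scaling.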

Overhead of parallel runtimes – Kmeans Clustering

  • O(n) calculations in each iteration
  • Small data transfer requirements: O(1)
  • With large data sets, CGL-MapReduce shows negligible overheads (a sketch of the iterative structure follows below)
  • Much higher overheads in Hadoop and Dryad
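The iterative structure that favors CGL-MapReduce looks roughly like the loop below: map tasks assign their (cached) points to the nearest centroid, the reduce step averages assignments into new centroids, and only the small centroid array is transferred per iteration. This is a single-process sketch of the algorithm's shape, not the CGL-MapReduce API:

```java
import java.util.Arrays;

/** Shape of Kmeans as iterative MapReduce: map = assign points, reduce = re-average centroids. */
public class KmeansIterative {
    static double[][] kmeans(double[][] points, double[][] centroids, int maxIter) {
        int k = centroids.length, dim = centroids[0].length;
        for (int iter = 0; iter < maxIter; iter++) {
            double[][] sums = new double[k][dim];
            long[] counts = new long[k];
            // "Map": assign each point to its nearest centroid. In CGL-MapReduce the
            // points stay cached in long-running map tasks across iterations, so only
            // the small centroid array crosses the network each round.
            for (double[] p : points) {
                int best = 0;
                double bestDist = Double.MAX_VALUE;
                for (int c = 0; c < k; c++) {
                    double d = 0;
                    for (int j = 0; j < dim; j++)
                        d += (p[j] - centroids[c][j]) * (p[j] - centroids[c][j]);
                    if (d < bestDist) { bestDist = d; best = c; }
                }
                counts[best]++;
                for (int j = 0; j < dim; j++) sums[best][j] += p[j];
            }
            // "Reduce": each new centroid is the mean of its assigned points; the
            // small result is broadcast to start the next iteration.
            for (int c = 0; c < k; c++)
                if (counts[c] > 0)
                    for (int j = 0; j < dim; j++)
                        centroids[c][j] = sums[c][j] / counts[c];
        }
        return centroids;
    }

    public static void main(String[] args) {
        double[][] pts = { {1, 1}, {1.2, 0.8}, {8, 8}, {7.9, 8.2} };
        double[][] init = { {0, 0}, {10, 10} };
        System.out.println(Arrays.deepToString(kmeans(pts, init, 10)));
    }
}
```

Hadoop and Dryad re-read the input and re-launch tasks every iteration, which is where their much higher overheads come from.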


SLIDE 3

High Performance Parallel Computing on Cloud

  • Performance of MPI on virtualized resources

    – Evaluated using a dedicated private cloud infrastructure
    – Exactly the same hardware and software configurations on bare-metal and virtual nodes
    – Applications with different communication-to-computation ratios
    – Different virtual machine (VM) allocation strategies, from 1 VM per node to 8 VMs per node

[Chart: Performance of Matrix Multiplication under different VM configurations]
[Chart: Overhead under different VM configurations for the Concurrent Wave Equation Solver]

Matrix Multiplication:

  • O(n^2) communication (n = dimension of the matrix)
  • More susceptible to bandwidth than latency
  • Minimal overheads under virtualized resources

Concurrent Wave Equation Solver:

  • O(1) communication (smaller messages)
  • More susceptible to latency (see the cost model below)
  • Higher overheads under virtualized resources
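One way to read these two results together is the standard latency-bandwidth cost model (the symbols are conventional, not from the slide): sending an m-byte message costs roughly

```latex
T(m) \approx \alpha + \frac{m}{\beta}
```

Virtualization mainly inflates the latency term α, so the solver's frequent small messages suffer, while matrix multiplication's large transfers are dominated by the bandwidth term m/β and are less affected.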

Jaliya Ekanayake {jekanaya@cs.indiana.edu}