

SLIDE 1

(Towards) Programming Models for Jungle Computing

Jason Maassen

Computer Systems Group Department of Computer Science VU University, Amsterdam, The Netherlands

SLIDE 2

Requirements (revisited)

  • Resource independence
  • Transparent / easy deployment
  • Middleware independence & interoperability
  • Jungle-aware middleware
  • Jungle-aware communication
  • Robust connectivity
  • System-support for malleability and fault-tolerance
  • Globally unique naming
  • Transparent parallelism & application-level fault-tolerance

  • Easy integration with external software
  • MPI, OpenCL, CUDA, C, C++, scripts, …

ComplexHPC Spring School 2011 2


SLIDE 4

Where are we?

SLIDE 5

Introduction

  • We now have everything we need to create and run Jungle Computing applications

  • Creating such applications is still difficult!
  • IPL is a communication library (not a programming model)
  • Applications using IPL must implement their own:
  • Work distribution
  • Load balancing
  • Fault tolerance
  • ...

SLIDE 6

Programming Models

  • RMI (object-oriented RPC)
  • Client-server
  • Distributed systems
  • MPJ (MPI for Java)
  • SPMD
  • Homogeneous clusters
  • Joris (image processing)
  • User-transparent parallelism (sequential)
  • Homogeneous clusters
  • Satin (divide & conquer)
  • User-transparent parallelism (recursive)
  • Automatic load-balancing and fault-tolerance
  • Grids (heterogeneous performance)

SLIDE 7

Satin

Divide & Conquer

  • Nicely fits hierarchical grids

[Figure: a divide-and-conquer job tree (job1 splits into job2 and job3, which split into job4 to job7) mapped onto clusters 1 to 4]
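Satin expresses such computations as recursive methods that its runtime spawns and load-balances across clusters automatically. As a rough illustration of the same recursive splitting, the sketch below uses the JDK's plain fork/join framework; the FibTask class is illustrative only and is not part of Satin's actual API.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// Illustrative divide-and-conquer task (not Satin API): each call splits
// into two subtasks that can be stolen by idle workers, mirroring how
// Satin distributes the job tree over the nodes of hierarchical grids.
class FibTask extends RecursiveTask<Long> {
    private final int n;

    FibTask(int n) { this.n = n; }

    @Override
    protected Long compute() {
        if (n < 2) {
            return (long) n;          // leaf job: solve directly
        }
        FibTask left = new FibTask(n - 1);
        left.fork();                  // "spawn": may run on another worker
        long right = new FibTask(n - 2).compute(); // run one half locally
        return left.join() + right;   // "sync": wait for the spawned half
    }

    public static void main(String[] args) {
        long result = new ForkJoinPool().invoke(new FibTask(10));
        System.out.println(result);   // fib(10) = 55
    }
}
```

In Satin itself the spawn and sync points are generated by the compiler, so the application stays close to the sequential recursive code.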

SLIDE 8

What is Missing?

  • Support for heterogeneous hardware
  • Many state-of-the-art systems use accelerators
  • GPUs
  • Cell processors
  • FPGAs
  • Huge performance gain for certain algorithms
  • Fastest NVidia GPU offers 1.5 TFlop/s!
  • Examples: DAS-4, CIEMAT-CIE clusters

SLIDE 9

Problems

  • Accelerators typically require specialized tools to program them
  • CUDA, OpenCL, Verilog, etc.
  • These tools are designed to create applications for a single accelerator
  • Not a set of similar accelerators
  • Not a mix of different accelerators

SLIDE 10

What do we need?

  • A programming model that can combine specialized accelerator codes with all the benefits of Ibis!
  • Ibis/Constellation
  • Inspired by the Many Task Computing model
  • Task scheduling with match-making
  • Ensures that each job is sent to a machine that can actually execute it.

SLIDE 11

Many Task Computing

According to Foster, Raicu et al.: "High-performance computations comprising multiple distinct activities, coupled via file system operations or message passing. Tasks may be small or large, uniprocessor or multiprocessor, compute-intensive or data-intensive. The set of tasks may be static or dynamic, homogeneous or heterogeneous, loosely coupled or tightly coupled. The aggregate number of tasks, quantity of computing, and volumes of data may be extremely large."

  • Applications are dynamic and heterogeneous workflows / DAGs of tasks

SLIDE 12

MTC in the Jungle

  • MTC has advantages for Jungle Computing
  • Many distinct activities
  • Each can be implemented independently, using the tools and HPC architecture that best suit it
  • Reduced programming complexity
  • Complete applications are constructed using sequences and combinations of activities

SLIDE 13

Constellation Model

  • Application: a set of activities
  • Loosely coupled (communicate using events)
  • Size and complexity may vary
  • From sub-second sequential jobs to large parallel simulations that take hours
  • Hardware: a set of executors
  • Capable of running activities
  • May represent anything from a single core to an entire cluster, a GPU, etc.

SLIDE 14

Constellation Model

  • Both activities and executors can be tagged with a context
  • A simple application-defined label
  • Defines the relationship between activities and executors
  • Data dependencies
  • Hardware requirements and capabilities
  • Data and resource sizes
  • ...
  • The Constellation RTS performs match-making and load-balancing

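The match-making described above can be sketched in a few lines of plain Java. The Activity and Executor types below are hypothetical stand-ins for illustration, not the actual Ibis/Constellation API: both carry a context label, and an executor only picks up activities whose label matches its own.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical sketch of Constellation-style match-making (not the real API):
// the runtime only hands an activity to an executor with a matching context.
public class MatchMakingSketch {

    record Activity(String name, String context) {}

    static class Executor {
        final String context;                       // capability label, e.g. "gpu"
        final List<String> executed = new ArrayList<>();

        Executor(String context) { this.context = context; }

        // Steal the first queued activity whose context matches this executor.
        boolean steal(Deque<Activity> pool) {
            for (Activity a : pool) {
                if (a.context().equals(context)) {
                    pool.remove(a);
                    executed.add(a.name());
                    return true;
                }
            }
            return false;                           // nothing suitable to run
        }
    }

    public static void main(String[] args) {
        Deque<Activity> pool = new ArrayDeque<>(List.of(
            new Activity("detect-supernova", "gpu"),
            new Activity("read-image", "cpu"),
            new Activity("write-result", "cpu")));

        Executor gpu = new Executor("gpu");
        Executor cpu = new Executor("cpu");

        // Executors repeatedly steal matching work until the pool is empty.
        while (!pool.isEmpty()) {
            if (!gpu.steal(pool) && !cpu.steal(pool)) break;
        }

        System.out.println("gpu ran: " + gpu.executed);  // [detect-supernova]
        System.out.println("cpu ran: " + cpu.executed);  // [read-image, write-result]
    }
}
```

The real runtime combines this matching with work-stealing across machines, so the same mechanism also provides the load-balancing mentioned above.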
SLIDE 15

Constellation Example

SLIDE 16

Early Experiments

  • Supernova detection application
  • Our winning entry in the 2008 Data Challenge
  • Originally IPL + JavaGAT
  • Ported to Constellation
  • Analyse 1052 image pairs
  • Varying resolution
  • Test Constellation in 3 different scenarios

SLIDE 17

Scenario 1

Data Locality

  • Data distributed over 4 clusters (DAS-3/4)
  • Activity: the entire application
  • Executor: a complete node
  • Use context to express data locality
  • Locality-aware task farming
  • No change in the application!
  • Use a Constellation wrapper
  • Adapt the context to tune the application
SLIDE 18

Scenario 2

Executor Granularity

  • Single 48-core machine
  • Activity: the entire application (a-c), a single task (d)
  • Executor: [n] cores
  • No change in the application for experiments (a-c)
  • Only the executor configuration changes
  • Completely ported application in (d)
  • Significant performance gain!

SLIDE 19

Scenario 3:

Heterogeneous System

  • 18-node GPU cluster
  • 8 cores + 1 GPU per node
  • Activity: a single task
  • Executor: 1 core (top), 1 core or 1 GPU (bottom)
  • Replaced activity 7.2 with a GPU version
  • Label activities and executors accordingly
  • Constellation takes care of the rest!
  • Significant performance gain.

SLIDE 20

Conclusions

  • Initial experiments show that Constellation works well for a wide range of hardware configurations
  • Allows integration of specialized accelerator codes
  • Easy to extend and reconfigure applications
  • A suitable basis for a Jungle Computing model

SLIDE 21

Future Work

  • More applications on a wider range of hardware
  • Integration of executor deployment into the model
  • Implement on top of Constellation:
  • Domain-specific languages
  • Phyxis-DT (successor of Joris)
  • User-friendly workflow models
  • Existing programming models
  • Satin
