1. (Towards) Programming Models for Jungle Computing
Jason Maassen
Computer Systems Group, Department of Computer Science
VU University, Amsterdam, The Netherlands

2. Requirements (revisited)
● Resource independence
● Transparent / easy deployment
● Middleware independence & interoperability
● Jungle-aware middleware
● Jungle-aware communication
● Robust connectivity
● System support for malleability and fault tolerance
● Globally unique naming
● Transparent parallelism & application-level fault tolerance
● Easy integration with external software
  ● MPI, OpenCL, CUDA, C, C++, scripts, …
ComplexHPC Spring School 2011


4. Where are we?

5. Introduction
● We now have everything we need to create and run Jungle Computing applications
● Creating such applications is still difficult!
● IPL is a communication library, not a programming model
● Applications using IPL must implement their own:
  ● Work distribution
  ● Load balancing
  ● Fault tolerance
  ● ...
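To make the burden concrete, the following is a minimal plain-Java sketch of the kind of work-distribution and load-balancing logic an IPL-based application must hand-roll itself (no IPL calls appear here; the class and the squaring "task" are invented for illustration). Workers self-schedule by pulling tasks from a shared queue; note that fault tolerance is still entirely absent.

```java
import java.util.*;
import java.util.concurrent.*;

// Hypothetical sketch: hand-rolled work distribution, as an IPL
// application would have to implement it. Worker threads pull tasks
// from a shared queue (simple self-scheduling load balancing).
public class ManualWorkDistribution {
    public static List<Integer> run(List<Integer> inputs, int workers)
            throws InterruptedException {
        BlockingQueue<Integer> queue = new LinkedBlockingQueue<>(inputs);
        Queue<Integer> results = new ConcurrentLinkedQueue<>();
        Thread[] pool = new Thread[workers];
        for (int w = 0; w < workers; w++) {
            pool[w] = new Thread(() -> {
                Integer task;
                // poll() is non-blocking: a worker exits when the queue is empty
                while ((task = queue.poll()) != null) {
                    results.add(task * task);   // the stand-in "computation"
                }
            });
            pool[w].start();
        }
        for (Thread t : pool) t.join();
        List<Integer> out = new ArrayList<>(results);
        Collections.sort(out);                  // deterministic output order
        return out;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(run(Arrays.asList(1, 2, 3, 4), 2)); // prints [1, 4, 9, 16]
    }
}
```

Even this toy version must decide queueing, worker lifecycle, and result collection by hand; higher-level programming models exist precisely to take this off the application programmer's plate.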

6. Programming Models
● Satin (Divide & Conquer)
  ● User transparent parallelism (recursive)
  ● Automatic load-balancing and fault-tolerance
  ● Grids (heterogeneous performance)
● RMI (object-oriented RPC)
  ● Client–server
  ● Distributed system
● MPJ (MPI for Java)
  ● SPMD
  ● Homogeneous cluster
● Joris (image processing)
  ● User transparent parallelism (sequential)
  ● Homogeneous cluster

7. Satin Divide & Conquer
● Nicely fits hierarchical grids
[Figure: a divide-and-conquer tree of jobs (job1–job7) spread over clusters 1–4]
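Satin's divide-and-conquer style can be approximated in plain Java with the JDK fork/join framework: `fork()` plays roughly the role of Satin's spawn, and `join()` that of sync. This is only an analogy under that assumption; Satin itself adds transparent, grid-wide load balancing and fault tolerance on top, which fork/join does not provide.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// Divide-and-conquer Fibonacci as a fork/join analogy for Satin's
// spawn/sync model. The class name is invented for illustration.
public class DivideAndConquerFib extends RecursiveTask<Long> {
    private final int n;

    public DivideAndConquerFib(int n) { this.n = n; }

    @Override
    protected Long compute() {
        if (n < 2) return (long) n;                 // base case: solve directly
        DivideAndConquerFib left = new DivideAndConquerFib(n - 1);
        DivideAndConquerFib right = new DivideAndConquerFib(n - 2);
        left.fork();                                 // "spawn" one subproblem
        long r = right.compute();                    // solve the other locally
        return left.join() + r;                      // "sync": wait for the spawn
    }

    public static void main(String[] args) {
        long fib10 = new ForkJoinPool().invoke(new DivideAndConquerFib(10));
        System.out.println(fib10);                   // prints 55
    }
}
```

The recursive structure is exactly why this model "nicely fits hierarchical grids": subtrees of the computation can be stolen wholesale by remote clusters.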

8. What is Missing?
● Support for heterogeneous hardware
● Many state-of-the-art systems use accelerators
  ● GPUs
  ● Cell processors
  ● FPGAs
● Huge performance gain for certain algorithms
  ● The fastest NVIDIA GPU offers 1.5 TFlop/s!
● Examples: DAS-4, CIEMAT-CIE clusters

9. Problems
● Accelerators typically require specialized tools to program them
  ● CUDA, OpenCL, Verilog, etc.
● These tools are designed to create applications for a single accelerator
  ● Not a set of similar accelerators
  ● Not a mix of different accelerators

10. What do we need?
● A programming model that combines specialized accelerator codes with all the benefits of Ibis!
● Ibis/Constellation
  ● Inspired by the Many Task Computing model
  ● Task scheduling with match-making
  ● Ensures that each job is sent to a machine that can actually execute it

11. Many Task Computing
According to Foster, Raicu et al.: "High-performance computations comprising multiple distinct activities, coupled via file system operations or message passing. Tasks may be small or large, uni-processor or multi-processor, compute-intensive or data-intensive. The set of tasks may be static or dynamic, homogeneous or heterogeneous, loosely coupled or tightly coupled. The aggregate number of tasks, quantity of computing, and volumes of data may be extremely large."
● Applications are dynamic and heterogeneous workflows / DAGs of tasks
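Such a workflow can be sketched as a small dependency graph of named tasks executed in topological order. All task names below are hypothetical; a real MTC system would additionally schedule each task onto an appropriate resource.

```java
import java.util.*;

// Minimal MTC-style workflow sketch: a DAG of named tasks with
// dependencies, scheduled so every task runs after its dependencies.
// Assumes the graph is acyclic (a cycle would loop forever here).
public class TaskDag {
    // task name -> names of the tasks it depends on
    private final Map<String, List<String>> deps = new LinkedHashMap<>();

    public void addTask(String name, String... dependsOn) {
        deps.put(name, Arrays.asList(dependsOn));
    }

    // Repeatedly pick any task whose dependencies are all done.
    public List<String> schedule() {
        List<String> order = new ArrayList<>();
        Set<String> done = new HashSet<>();
        while (done.size() < deps.size()) {
            for (Map.Entry<String, List<String>> e : deps.entrySet()) {
                if (!done.contains(e.getKey()) && done.containsAll(e.getValue())) {
                    order.add(e.getKey());
                    done.add(e.getKey());
                }
            }
        }
        return order;
    }

    public static void main(String[] args) {
        TaskDag dag = new TaskDag();
        dag.addTask("fetch");
        dag.addTask("preprocess", "fetch");
        dag.addTask("analyse", "preprocess");
        dag.addTask("report", "analyse", "fetch");
        System.out.println(dag.schedule()); // prints [fetch, preprocess, analyse, report]
    }
}
```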

12. MTC in the Jungle
● MTC has advantages for Jungle Computing
● Many distinct activities
  ● Each can be implemented independently, using the tools and targeting the HPC architecture that best suit it
● Reduced programming complexity
  ● Complete applications are constructed from sequences and combinations of activities

13. Constellation Model
● Application: a set of activities
  ● Loosely coupled (communicate using events)
  ● Size and complexity may vary: from sub-second sequential jobs to large parallel simulations that take hours
● Hardware: a set of executors
  ● Capable of running activities
  ● May represent anything from a single core to an entire cluster, a GPU, etc.

14. Constellation Model
● Both activities and executors can be tagged with a context
  ● A simple application-defined label
● Contexts define the relationship between activities and executors
  ● Data dependencies
  ● Hardware requirements and capabilities
  ● Data and resource sizes
  ● ...
● The Constellation RTS performs match-making and load-balancing
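A rough sketch of the match-making idea, assuming contexts are plain string labels: an activity is only handed to an executor whose context set contains the activity's label. All class and method names here are invented for illustration and do not reflect Constellation's actual API.

```java
import java.util.*;

// Hypothetical sketch of Constellation-style match-making between
// context-tagged activities and executors. Not Constellation's real API.
public class ContextMatcher {
    record Activity(String name, String context) {}

    record Executor(String name, Set<String> contexts) {
        boolean canRun(Activity a) { return contexts.contains(a.context()); }
    }

    // Assign each activity to the first executor whose context matches;
    // a real runtime would also load-balance among matching executors.
    static Map<String, String> matchmake(List<Activity> acts, List<Executor> execs) {
        Map<String, String> assignment = new LinkedHashMap<>();
        for (Activity a : acts) {
            for (Executor e : execs) {
                if (e.canRun(a)) {
                    assignment.put(a.name(), e.name());
                    break;
                }
            }
        }
        return assignment;
    }

    public static void main(String[] args) {
        List<Activity> acts = List.of(
            new Activity("detect", "gpu"),        // needs a GPU kernel
            new Activity("compress", "cpu"));
        List<Executor> execs = List.of(
            new Executor("core0", Set.of("cpu")),
            new Executor("gpu0", Set.of("gpu")));
        System.out.println(matchmake(acts, execs)); // prints {detect=gpu0, compress=core0}
    }
}
```

This is also the mechanism behind Scenario 3 below: relabelling one activity from a CPU context to a GPU context redirects it to GPU executors without touching the rest of the application.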

15. Constellation Example
[Figure]

16. Early Experiments
● Supernova detection application
  ● Our winning entry in the 2008 Data Challenge
  ● Originally IPL + JavaGAT
  ● Ported to Constellation
● Analyse 1052 image pairs
  ● Varying resolution
● Test Constellation in three different scenarios

17. Scenario 1: Data Locality
● Data distributed over 4 clusters (DAS-3/4)
● Activity: entire application
● Executor: complete node
● Use context to express data locality
  ● Locality-aware task farming
● No change in application!
  ● Use a Constellation wrapper
  ● Adapt the context to tune the application

18. Scenario 2: Executor Granularity
● Single 48-core machine
● Activity: entire application (a–c), single task (d)
● Executor: [n] cores
● No change in application for experiments (a–c)
  ● Only the executor configuration changes
● Completely ported application in (d)
  ● Significant performance gain!

19. Scenario 3: Heterogeneous System
● 18-node GPU cluster
  ● 8 cores + 1 GPU per node
● Activity: single task
● Executor: 1 core (top), 1 core or 1 GPU (bottom)
● Replaced activity 7.2 with a GPU version
  ● Label activities and executors accordingly
  ● Constellation takes care of the rest!
● Significant performance gain

20. Conclusions
● Initial experiments show that Constellation works well for a wide range of hardware configurations
● Allows integration of specialized accelerator codes
● Easy to extend and reconfigure applications
● A suitable basis for a Jungle Computing model

21. Future Work
● More applications on a wider range of hardware
● Integration of executor deployment into the model
● Implement on top of Constellation:
  ● Domain-specific languages
    ● Phyxis-DT (successor of Joris)
  ● User-friendly workflow models
  ● Existing programming models
    ● Satin

