Work Queue + Python A Framework For Scalable Scientific Ensemble - PowerPoint PPT Presentation

Work Queue + Python: A Framework For Scalable Scientific Ensemble Applications
Peter Bui, Dinesh Rajan, Badi Abdul-Wahid, Jesus Izaguirre, Douglas Thain
University of Notre Dame

Distributed Computing Examples
● Condor cluster
● SGE grid
● Beowulf cluster

Programming Challenges
Resource Management
● Storage
● CPUs
● Network
Scheduling
● Packaging
● Deployment
● Task dispatch
Fault Tolerance
● Nodes die
● Jobs fail
● Network problems

Work Queue
A flexible master/worker framework for building large-scale scientific ensemble applications that span many machines, including clusters, grids, and clouds.
Features
● Data management
● Fault tolerance
● Scheduling
● Fast abort
● Flexible worker deployment
● Catalog discovery service

Master/Worker Model
Central Master
● Divides work into tasks
● Sends tasks to Workers
● Gathers results
Pool of Workers
● Receives input and executable files
● Executes task command
● Returns output files
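The division of labor above can be illustrated with a small, single-machine sketch that uses only Python's standard library. This is not Work Queue: threads stand in for remote workers, a `queue.Queue` stands in for the network, and the function names `master` and `worker` are hypothetical.

```python
import queue
import threading


def worker(tasks, results):
    """Hypothetical worker: take a task, execute it, return its output."""
    while True:
        item = tasks.get()
        if item is None:  # sentinel from the master: no more work
            break
        task_id, func, arg = item
        results.put((task_id, func(arg)))  # stands in for "return output files"


def master(func, inputs, num_workers=4):
    """Hypothetical master: divide work into tasks, dispatch them, gather results."""
    inputs = list(inputs)
    tasks = queue.Queue()
    results = queue.Queue()
    pool = [threading.Thread(target=worker, args=(tasks, results))
            for _ in range(num_workers)]
    for t in pool:
        t.start()
    # Divide the work into tasks and send them to the workers.
    for task_id, arg in enumerate(inputs):
        tasks.put((task_id, func, arg))
    # One sentinel per worker so every thread eventually exits.
    for _ in pool:
        tasks.put(None)
    for t in pool:
        t.join()
    # Gather the results and restore the original task order.
    gathered = {}
    while not results.empty():
        task_id, value = results.get()
        gathered[task_id] = value
    return [gathered[i] for i in range(len(inputs))]


if __name__ == "__main__":
    print(master(lambda x: x * x, range(10)))  # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
```

Workers finish in arbitrary order, so the master keys results by task id; a real master/worker system has the same bookkeeping problem.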

Work Queue: Data Management, Fault Tolerance
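One way to picture task-level fault tolerance: when a worker dies or a task fails, the master puts the task back in its queue and dispatches it again. The sketch below simulates this with a deterministic "flaky" executor. The function names, the `max_attempts` retry limit, and the failure model are all assumptions for illustration, not Work Queue's internals.

```python
from collections import deque


class WorkerLost(Exception):
    """Raised when a (simulated) worker disappears mid-task."""


def run_with_resubmission(tasks, execute, max_attempts=3):
    """Hypothetical master loop that resubmits a task whenever its worker is lost.

    tasks:   iterable of task payloads
    execute: callable(payload, attempt) -> result; may raise WorkerLost
    Returns a dict mapping payload -> result. Raises RuntimeError if a task
    is lost max_attempts times (the limit is an assumption of this sketch).
    """
    pending = deque((payload, 1) for payload in tasks)
    results = {}
    while pending:
        payload, attempt = pending.popleft()
        try:
            results[payload] = execute(payload, attempt)
        except WorkerLost:
            if attempt >= max_attempts:
                raise RuntimeError(f"task {payload!r} failed {attempt} times")
            # Requeue at the back so other tasks keep making progress.
            pending.append((payload, attempt + 1))
    return results


if __name__ == "__main__":
    def flaky(payload, attempt):
        # Simulated failure: odd payloads lose their worker on the first attempt.
        if payload % 2 == 1 and attempt == 1:
            raise WorkerLost()
        return payload * 10

    print(run_with_resubmission(range(5), flaky))
```

Because the master owns the queue, a failure costs only a re-run of the affected task rather than the whole application.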

Work Queue: Scheduling, Fast Abort
Work Queue provides multiple algorithms for assigning tasks to workers:
1. First Come First Serve
2. Cached Files
3. Fastest Time
4. Preferred Hosts
5. Random
To prevent stragglers, Work Queue collects task statistics and performs a fast abort on slow workers.
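A rough sketch of two of these ideas, under stated assumptions. The `Worker` record, the function names, and the exact rules are hypothetical. In particular, the fast-abort rule here (flag a task whose elapsed time exceeds a multiplier times the average time of completed tasks) is a simplified reading of the slide, not the library's code; the multiplier plays the role of the factor passed to `activate_fast_abort` in the module example.

```python
from dataclasses import dataclass, field
from statistics import mean


@dataclass
class Worker:
    """Hypothetical worker record: a name plus the files it already caches."""
    name: str
    cached_files: set = field(default_factory=set)


def pick_fcfs(workers):
    """First come, first serve: the first available worker in arrival order."""
    return workers[0] if workers else None


def pick_cached_files(workers, input_files):
    """Prefer the worker that already caches the most of the task's input files."""
    if not workers:
        return None
    # max() keeps the first maximal element, so ties fall back to arrival order.
    return max(workers, key=lambda w: len(w.cached_files & set(input_files)))


def fast_abort_candidates(running, completed_times, multiplier):
    """Return ids of running tasks whose elapsed time exceeds multiplier * average.

    running:         dict task_id -> elapsed seconds so far
    completed_times: execution times of finished tasks (the collected statistics)
    """
    if not completed_times:
        return []  # no statistics yet, so nothing to compare against
    threshold = multiplier * mean(completed_times)
    return sorted(t for t, elapsed in running.items() if elapsed > threshold)


if __name__ == "__main__":
    pool = [Worker("w1"), Worker("w2", {"a.dat", "b.dat"}), Worker("w3", {"a.dat"})]
    print(pick_fcfs(pool).name)                                          # w1
    print(pick_cached_files(pool, ["a.dat", "b.dat"]).name)              # w2
    print(fast_abort_candidates({"t1": 50, "t2": 200}, [90, 110], 1.5))  # ['t2']
```

In the last call the average completed time is 100 s, so with a factor of 1.5 only tasks running longer than 150 s are flagged.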

Work Queue: Worker Deployment, Architecture

Work Queue + Python
● The library is written in C and provides a straightforward API
  ○ C is a low-level language
  ○ Domain scientists are more familiar with scripting languages
● Provide Python bindings to the library
  ○ Initially hand-written, but switched to SWIG
  ○ Allow scientists to use a high-level language
  ○ Access to a large community and ecosystem of third-party software

Python-WorkQueue Module

WorkQueue

    # Import the WorkQueue class and the scheduling constant used below
    from work_queue import WorkQueue, WORK_QUEUE_SCHEDULE_FILES
    # Create master work queue
    wq = WorkQueue()
    # Set catalog project name
    wq.specify_name('project.name')
    # Set selection algorithm
    wq.specify_algorithm(WORK_QUEUE_SCHEDULE_FILES)
    # Set fast abort factor
    wq.activate_fast_abort(1.5)

Task

    # Import work_queue module
    from work_queue import Task
    # Create task
    task = Task('date > output.txt')
    # Specify output file
    task.specify_output_file('output.txt')
    # Submit task to master
    wq.submit(task)

Example: Distributed Convert

    from work_queue import WorkQueue, Task
    import os
    import sys

    wq = WorkQueue()
    output_ext = sys.argv[1]

    # For each file, construct & submit a transcoding task
    for input_file in sys.argv[2:]:
        output_file = os.path.splitext(input_file)[0] + '.' + output_ext
        task = Task('convert %s %s' % (input_file, output_file))
        task.specify_input_file(input_file)
        task.specify_output_file(output_file)
        wq.submit(task)

    # While the queue is not empty, wait up to 1 second for a finished task,
    # then print its command and result
    while not wq.empty():
        task = wq.wait(1)
        if task:
            print(task.command, task.result)
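The loop above builds each `convert` command by plain string formatting. If the command string is interpreted by a shell on the worker (the module example relies on this with `date > output.txt`), a file name containing spaces or shell metacharacters would break it. A stdlib-only sketch of just the task-construction step, quoting each name with `shlex.quote`, might look like this. `build_convert_tasks` is a hypothetical helper; its `(command, input, output)` tuples would then feed `Task`, `specify_input_file`, and `specify_output_file` as in the example.

```python
import os
import shlex


def build_convert_tasks(output_ext, input_files):
    """Hypothetical helper: map each input file to a (command, input, output) triple.

    Mirrors the loop in the Distributed Convert example, but quotes file
    names so the shell command survives spaces and metacharacters.
    """
    tasks = []
    for input_file in input_files:
        output_file = os.path.splitext(input_file)[0] + '.' + output_ext
        command = 'convert %s %s' % (shlex.quote(input_file), shlex.quote(output_file))
        tasks.append((command, input_file, output_file))
    return tasks


if __name__ == "__main__":
    for command, _, _ in build_convert_tasks('png', ['photo.jpg', 'my scan.tif']):
        print(command)
    # convert photo.jpg photo.png
    # convert 'my scan.tif' 'my scan.png'
```

Only the command string is quoted; the unquoted names are still what get registered as input and output files.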

Application: Replica Exchange

Evaluation: Replica Exchange
Events
A: Start 100 SGE workers
B: Add 150 Condor workers
C: Add 110 Condor and 40 Amazon EC2 workers
D: Remove 100 SGE workers
E: Remove 125 Condor and 25 Amazon EC2 workers
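Tracking the pool size through these events is simple bookkeeping. The sketch below computes the per-system and total worker counts after each event; the `EVENTS` table is a transcription of the slide, and `pool_timeline` is a hypothetical helper.

```python
from collections import Counter

# Worker changes per event, transcribed from the slide
# (positive = workers added, negative = workers removed).
EVENTS = [
    ("A", {"SGE": +100}),
    ("B", {"Condor": +150}),
    ("C", {"Condor": +110, "EC2": +40}),
    ("D", {"SGE": -100}),
    ("E", {"Condor": -125, "EC2": -25}),
]


def pool_timeline(events):
    """Return [(event, per-system counts, total)] after applying each event in order."""
    pool = Counter()
    timeline = []
    for label, changes in events:
        pool.update(changes)  # Counter.update adds counts; negative values subtract
        snapshot = {system: n for system, n in pool.items() if n}  # drop emptied systems
        timeline.append((label, snapshot, sum(snapshot.values())))
    return timeline


if __name__ == "__main__":
    for label, snapshot, total in pool_timeline(EVENTS):
        print(label, total, snapshot)
```

The totals come out to 100, 250, 400, 300, and 150 workers, ending with 135 Condor and 15 Amazon EC2 workers.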

Application: Folding@Work

Evaluation: Folding@Work
Results after One Month
Tasks assigned                     283830
Results received                   122141
Simulation time gathered           305 µs
Execution time average (min)       125
Execution time std. dev. (min)     87
Number of workers                  5000
Number of unique machines          370
Represents about 3,000 CPU days of work.

Future Work
● Use SWIG to generate bindings for additional languages (Perl, Lua, etc.)
● Monitoring and visualization software
● Extend Work Queue to better support hierarchical workflows
  ○ Multiple masters
  ○ Resource management and allocation
● Integrate into Programming Paradigms course

Summary
Work Queue is a flexible and powerful framework for constructing scalable scientific ensemble applications.
● Provides data management, fault tolerance, multiple scheduling algorithms, fast abort, and support for multiple distributed systems.
● With the Python-WorkQueue module, it is now available in a user-friendly language.

Questions?
CCTools Software Download: http://cse.nd.edu/~ccl/software

Analysis
● Work Queue transparently handles worker additions and failures
● Work Queue harnesses resources from multiple distributed systems
● Work Queue scales to hundreds or thousands of workers

Work Queue: Success Stories
● Makeflow
● AllPairs
● SAND
● Wavefront

Work Queue versus MPI
Work Queue
● Orchestrates an ensemble of multiple external executables
● Number of workers is dynamic
● Scales up to a large number of workers (10s, 100s, 1000s)
● Reliable and fault tolerant at the task level
● Allows for heterogeneous deployment environments
● Workers communicate only with the Master
MPI
● Coordinates multiple instances of a single executable
● Number of workers is static
● Difficult to scale up; limited number of workers (16, 32, 64)
● Reliable at the application level, but no fault tolerance
● Requires a homogeneous deployment environment
● Workers can communicate with anyone
