
Detecting Bottlenecks in Parallel DAG-based Data Flow Programs



  1. Detecting Bottlenecks in Parallel DAG-based Data Flow Programs
     Björn Lohrmann, Dominic Battré, Matthias Hovestadt, Alexander Stanik, Daniel Warneke
     Email: {firstname}.{lastname}@tu-berlin.de
     Complex and Distributed IT-Systems, Technische Universität Berlin

  2. Introduction (1)
     IaaS clouds offer virtual machines on demand. Why use clouds for data processing?
     ■ Fast and (almost) unlimited scale-out
     ■ Pricing model
       ♦ Pay-as-you-go
       ♦ 10 nodes for 1 day = 1 node for 10 days
     ■ No long-term obligations
     15.11.2010 Björn Lohrmann - Detecting Bottlenecks in Parallel DAG-based Data Flow Programs

  3. Introduction (2)
     Frameworks are required for effective use of clouds.
     [Diagram: framework responsibilities (parallelization, job modelling, job scheduling, VM management, job deployment, job monitoring); frameworks named: Eucalyptus, Hadoop, Nephele, etc.]

  4. Prerequisites
     ● Jobs modelled as directed acyclic graphs
       ■ Vertices are tasks
       ■ Edges are communication channels
     ● Each task has 1..n parallel task instances
     ● Unidirectional and blocking communication
     [Diagram: example job graph with Task 1, Task 2, Task 3 and Task 4]
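The job model on this slide can be sketched as a small data structure. This is a minimal illustration, not Nephele's actual API: the names `Task`, `JobGraph`, `add_task`, `connect` and `is_acyclic` are hypothetical, and the diamond-shaped wiring of Task 1 to Task 4 is an assumption, since the slide's diagram edges did not survive extraction.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    name: str
    instances: int = 1  # each task has 1..n parallel task instances


@dataclass
class JobGraph:
    tasks: dict = field(default_factory=dict)     # task name -> Task
    channels: list = field(default_factory=list)  # (producer, consumer) pairs

    def add_task(self, name, instances=1):
        if instances < 1:
            raise ValueError("a task needs at least one instance")
        self.tasks[name] = Task(name, instances)

    def connect(self, producer, consumer):
        # Channels are unidirectional: data flows from producer to consumer.
        self.channels.append((producer, consumer))

    def is_acyclic(self):
        # Kahn's algorithm: the graph is a DAG if every vertex can be removed.
        indegree = {name: 0 for name in self.tasks}
        for _, consumer in self.channels:
            indegree[consumer] += 1
        ready = [name for name, degree in indegree.items() if degree == 0]
        removed = 0
        while ready:
            node = ready.pop()
            removed += 1
            for producer, consumer in self.channels:
                if producer == node:
                    indegree[consumer] -= 1
                    if indegree[consumer] == 0:
                        ready.append(consumer)
        return removed == len(self.tasks)


# Assumed wiring (hypothetical): Task 1 feeds Task 2 and Task 3, both feed Task 4.
job = JobGraph()
for name in ("Task 1", "Task 2", "Task 3", "Task 4"):
    job.add_task(name)
job.connect("Task 1", "Task 2")
job.connect("Task 1", "Task 3")
job.connect("Task 2", "Task 4")
job.connect("Task 3", "Task 4")
print(job.is_acyclic())
```

The blocking behaviour of channels is not modelled here; the sketch only captures the graph shape and the per-task instance count.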

  5. Overview
     Key question of this talk:
     ● Given a DAG-shaped job, how many task instances should I assign to each task?
     Our approach:
     ● Begin with 1 instance for each task
     ● Iteratively detect bottlenecks and add instances where necessary
     [Diagram: job graph with Task 1 to Task 5; Task 5 appears twice]
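The iterative approach can be sketched as a simple loop. This is an illustration under stated assumptions, not the authors' implementation: `tune_parallelism` and `toy_find_bottleneck` are hypothetical names, adding exactly one instance per round is an assumption, and the throughput numbers are made up. The `find_bottleneck` callback stands in for one monitored run of the job.

```python
def tune_parallelism(tasks, find_bottleneck, max_rounds=20):
    """Start with one instance per task; add one instance per detected bottleneck."""
    instances = {task: 1 for task in tasks}
    for _ in range(max_rounds):
        bottleneck = find_bottleneck(instances)  # stands in for one monitored job run
        if bottleneck is None:
            break
        instances[bottleneck] += 1
    return instances


# Hypothetical records/s that a single instance of each task can process.
RATE_PER_INSTANCE = {"Task 1": 100, "Task 2": 30, "Task 3": 50}
TARGET_RATE = 100  # hypothetical rate at which input records arrive


def toy_find_bottleneck(instances):
    # Toy detector: the slowest task whose total capacity is below the target rate.
    capacity = {task: RATE_PER_INSTANCE[task] * n for task, n in instances.items()}
    slow = [task for task in capacity if capacity[task] < TARGET_RATE]
    return min(slow, key=capacity.get) if slow else None


plan = tune_parallelism(RATE_PER_INSTANCE, toy_find_bottleneck)
print(plan)  # {'Task 1': 1, 'Task 2': 4, 'Task 3': 2}
```

In the talk's setting each round corresponds to a manual adaptation and a job re-run, so `max_rounds` simply bounds how many re-runs the sketch is willing to simulate.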

  6. Bottlenecks
     Negative effects of bottlenecks:
     ■ Input starvation
     ■ Output blockage
     Low throughput of workflow
     Low resource utilization
     Time and money wasted
     [Diagram: job graph with Task 1 to Task 5; Task 5 appears twice]

  7. Bottlenecks
     Types:
     ● CPU
       ■ Enough input available
       ■ Throughput limited by CPU
       ■ Lack of input for subsequent tasks
     ● I/O
       ■ Transport infrastructure is overloaded (NICs, switches, etc.)
       ■ Forces tasks to wait
     [Diagrams: task chains with the CPU of each task marked]

  8. Bottleneck Detection
     ● Monitor job at runtime:
       ■ Continuously measure CPU load and I/O wait on task instances
       ■ Aggregate to task statistics
     ● Continuously analyze task statistics:
       ■ Traverse task nodes in reverse topological order and check for CPU bottlenecks
       ■ If none found, traverse edges in reverse topological order and check for I/O bottlenecks
       ■ If bottleneck found: report it!
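The analysis step can be sketched with Python's standard `graphlib` module (Python 3.9+). The function name `detect_bottleneck` and the two predicate callbacks are hypothetical. The slide does not say how edges are ordered, so ordering them by the position of their consumer (then producer) in the reversed task order is an assumption.

```python
from graphlib import TopologicalSorter


def detect_bottleneck(channels, cpu_bottleneck, io_bottleneck):
    """Return ("CPU", task), ("I/O", edge), or None if nothing is found."""
    # Build the predecessor map expected by TopologicalSorter.
    predecessors = {}
    for producer, consumer in channels:
        predecessors.setdefault(producer, set())
        predecessors.setdefault(consumer, set()).add(producer)

    order = list(TopologicalSorter(predecessors).static_order())
    order.reverse()  # reverse topological order: sinks first, sources last

    # Step 1: check task nodes for CPU bottlenecks.
    for task in order:
        if cpu_bottleneck(task):
            return ("CPU", task)

    # Step 2: only if no CPU bottleneck was found, check edges for I/O bottlenecks.
    rank = {task: position for position, task in enumerate(order)}
    for edge in sorted(channels, key=lambda e: (rank[e[1]], rank[e[0]])):
        if io_bottleneck(edge):
            return ("I/O", edge)
    return None


# Hypothetical three-task chain in which Task 1 and Task 2 are both CPU-bound.
channels = [("Task 1", "Task 2"), ("Task 2", "Task 3")]
busy = {"Task 1", "Task 2"}
result = detect_bottleneck(channels, lambda task: task in busy, lambda edge: False)
print(result)  # ('CPU', 'Task 2')
```

Because traversal starts at the sink, the sketch reports Task 2 rather than Task 1 even though both are flagged as CPU-bound.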

  9. Implementation
     ● Based on Nephele framework
       ■ Java framework
       ■ 1 master, n workers
       ■ Task instance = Java thread
     ● Analysis of thread state statistics:
       ■ Threshold for CPU bottleneck:
         ♦ USR + SYS + BLK >= 90% of time
       ■ Threshold for I/O bottleneck:
         ♦ WAIT caused by sending on channel >= 90% of time
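The two thresholds can be sketched as follows. This is not Nephele code: the dictionary keys `usr`, `sys`, `blk` and `send_wait` are hypothetical names for the fraction of time a thread spends in each state, the sample values are invented, and aggregating instance samples with an arithmetic mean is an assumption, since the slides only say that instance measurements are aggregated to task statistics.

```python
from statistics import mean

THRESHOLD = 0.90  # both checks on the slide use 90% of the time


def aggregate(samples):
    """Average per-instance thread state fractions into one task statistic."""
    keys = ("usr", "sys", "blk", "send_wait")
    return {key: mean(sample[key] for sample in samples) for key in keys}


def is_cpu_bottleneck(stats):
    # CPU bottleneck: USR + SYS + BLK >= 90% of the time.
    return stats["usr"] + stats["sys"] + stats["blk"] >= THRESHOLD


def is_io_bottleneck(stats):
    # I/O bottleneck: WAIT caused by sending on a channel >= 90% of the time.
    return stats["send_wait"] >= THRESHOLD


# Invented samples for two instances of a CPU-heavy task.
cpu_heavy = aggregate([
    {"usr": 0.85, "sys": 0.08, "blk": 0.02, "send_wait": 0.01},
    {"usr": 0.80, "sys": 0.10, "blk": 0.05, "send_wait": 0.02},
])
# Invented sample for a task that mostly waits while sending output.
send_blocked = aggregate([
    {"usr": 0.03, "sys": 0.01, "blk": 0.00, "send_wait": 0.95},
])
print(is_cpu_bottleneck(cpu_heavy), is_io_bottleneck(send_blocked))
```

In the talk the I/O check is attached to a channel rather than to a task, so a fuller sketch would track `send_wait` per outgoing edge.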

  10. Evaluation
      Setup:
      ● Private compute cloud
      ● Hosts with two Intel Xeon 2.66 GHz, 32 GB RAM and 1 Gbit Ethernet
      ● KVM guests with one virtual CPU and 2 GB RAM
      ● Eucalyptus framework for VM allocation/deallocation
      [Diagram: demo job with the tasks File Reader, OCR, PDF Creator, Inverted Index, PDF Writer and Index Writer]

  11. Evaluation (2)
      Phase 1: Fine tuning

  12. Evaluation (3)
      Phase 2: Scale-out

  13. Conclusion
      ● Bottleneck detection is useful to scale out jobs in the cloud while maintaining high resource utilization
      ● We presented a simple approach to gather and analyze relevant statistics
      ● Right now, manual adaptation and job re-runs are necessary to eliminate bottlenecks
      ● Future work:
        ■ Dynamically and automatically adjust parallelization at runtime
