
SLIDE 1

Detecting Bottlenecks in Parallel DAG-based Data Flow Programs

Björn Lohrmann, Dominic Battré, Matthias Hovestadt, Alexander Stanik, Daniel Warneke

Email: {firstname}.{lastname}@tu-berlin.de

Complex and Distributed IT-Systems, Technische Universität Berlin

SLIDE 2

Introduction (1)

IaaS clouds offer virtual machines on demand.

Why use clouds for data processing?

■ Fast and (almost) unlimited scale-out
■ Pricing model
♦ Pay-as-you-go
♦ 10 nodes for 1 day cost the same as 1 node for 10 days
■ No long-term obligations
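The pay-as-you-go equivalence can be checked with a few lines of arithmetic. This is only a sketch: the function name `rental_cost_cents` and the price of 10 cents per node-hour are made up for illustration and do not come from the talk or from any real provider.

```python
def rental_cost_cents(nodes, days, cents_per_node_hour):
    """Cost under a purely usage-based model: pay per node-hour, nothing else."""
    return nodes * days * 24 * cents_per_node_hour


PRICE = 10  # hypothetical price in cents per node-hour

wide = rental_cost_cents(nodes=10, days=1, cents_per_node_hour=PRICE)
narrow = rental_cost_cents(nodes=1, days=10, cents_per_node_hour=PRICE)
print(wide, narrow)  # both are 2400 cents
```

Under this assumed model, scaling out shortens the run without raising the bill, provided the job actually keeps all ten nodes busy.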

15.11.2010 Björn Lohrmann - Detecting Bottlenecks in Parallel DAG-based Data Flow Programs

SLIDE 3

Introduction (2)

Frameworks are required for effective use of clouds


[Figure: cloud framework concerns (VM management, job deployment, job modelling, job monitoring, parallelization, job scheduling); example frameworks: Eucalyptus, Hadoop, Nephele, etc.]

SLIDE 4

Prerequisites

  • Jobs modelled as directed acyclic graphs (DAGs)

■ Vertices are tasks
■ Edges are communication channels


[Figure: example job DAG with Task 1 to Task 4]

  • Each task has 1..n parallel task instances

  • Unidirectional and blocking communication
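The job model above can be sketched as a small data structure. This is an illustration, not Nephele's API: the names `Task`, `JobGraph`, `add_task`, `connect` and `topological_order` are hypothetical, the linear wiring of Task 1 to Task 4 is an assumption (the figure's edges are not recoverable), and blocking channel behaviour is not modelled.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    """A vertex of the job DAG; runs as 1..n parallel task instances."""
    name: str
    instances: int = 1

    def __post_init__(self):
        if self.instances < 1:
            raise ValueError("a task needs at least one instance")


@dataclass
class JobGraph:
    """Tasks plus unidirectional (sender, receiver) channels between them."""
    tasks: dict = field(default_factory=dict)
    channels: list = field(default_factory=list)

    def add_task(self, name, instances=1):
        self.tasks[name] = Task(name, instances)

    def connect(self, sender, receiver):
        # Data flows from sender to receiver only.
        self.channels.append((sender, receiver))

    def topological_order(self):
        """Kahn's algorithm; raises ValueError if the graph has a cycle."""
        indegree = {name: 0 for name in self.tasks}
        for _, receiver in self.channels:
            indegree[receiver] += 1
        ready = sorted(n for n, d in indegree.items() if d == 0)
        order = []
        while ready:
            current = ready.pop(0)
            order.append(current)
            for sender, receiver in self.channels:
                if sender == current:
                    indegree[receiver] -= 1
                    if indegree[receiver] == 0:
                        ready.append(receiver)
        if len(order) != len(self.tasks):
            raise ValueError("job graph is not acyclic")
        return order


job = JobGraph()
for name in ["Task 1", "Task 2", "Task 3", "Task 4"]:
    job.add_task(name)  # every task starts with one instance
job.connect("Task 1", "Task 2")
job.connect("Task 2", "Task 3")
job.connect("Task 3", "Task 4")
print(job.topological_order())
```

Rejecting cycles when the order is computed keeps the "acyclic" prerequisite explicit instead of leaving it as an unchecked convention.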

SLIDE 5

Overview

Key question of this talk:

  • Given a DAG-shaped job, how many task instances should I assign to each task?


[Figure: example job DAG]

Our approach

  • Begin with 1 instance for each task

  • Iteratively detect bottlenecks and add instances where necessary
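The iterative approach can be sketched as a loop. Everything here is an assumption made for illustration: the function `tune_parallelism`, the per-record costs in `COST`, the target rate, and the toy detector that stands in for real bottleneck detection. In practice each round would correspond to a job run followed by an adjustment.

```python
def tune_parallelism(tasks, find_bottleneck, max_rounds=20):
    """Start with one instance per task; add one wherever a bottleneck is reported."""
    instances = {task: 1 for task in tasks}
    for _ in range(max_rounds):
        bottleneck = find_bottleneck(instances)
        if bottleneck is None:
            break  # no bottleneck reported: stop scaling out
        instances[bottleneck] += 1
    return instances


# Hypothetical cost model: seconds of CPU time per record for each task.
COST = {"Task 1": 1.0, "Task 2": 3.0, "Task 3": 2.0}
TARGET_RATE = 1.0  # records per second the job should sustain (made up)


def toy_detector(instances):
    """Report the slowest task if it cannot sustain the target rate."""
    rates = {task: instances[task] / COST[task] for task in instances}
    slowest = min(rates, key=rates.get)
    return slowest if rates[slowest] < TARGET_RATE else None


result = tune_parallelism(COST, toy_detector)
print(result)  # {'Task 1': 1, 'Task 2': 3, 'Task 3': 2}
```

The `max_rounds` cap is a safety net for detectors that never report a bottleneck-free state.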

SLIDE 6

Bottlenecks

Negative effects of bottlenecks:

■ Input starvation
■ Output blockage


Consequences: low throughput of the workflow, low resource utilization, time and money wasted.

[Figure: example job DAG]

SLIDE 7

Bottlenecks

Types:

  • CPU

■ Enough input available
■ Throughput limited by CPU
■ Lack of input for subsequent tasks


[Figure: example task pipelines annotated with per-task CPU load]

  • I/O

■ Transport infrastructure is overloaded (NICs, switches, etc.)
■ Forces tasks to wait

SLIDE 8

Bottleneck Detection

  • Monitor job at runtime:
  • Continuously measure CPU load and I/O wait on task instances
  • Aggregate to task statistics
  • Continuously analyze task statistics:

■ Traverse task nodes in reverse topological order and check for CPU bottlenecks
■ If none found, traverse edges in reverse topological order and check for I/O bottlenecks
■ If a bottleneck is found: report it!
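The detection steps above can be sketched as follows. This is a hedged illustration, not the authors' code: the function names, the use of a plain average for aggregation, the `threshold=0.9` cutoff, and the rule for ordering edges (by receiver position, then sender position) are all assumptions. Measurements are fractions of time between 0 and 1.

```python
from statistics import mean


def aggregate(per_instance):
    """Average the per-instance samples of each key into one task-level value."""
    return {key: mean(samples) for key, samples in per_instance.items()}


def detect_bottleneck(topo_order, channels, cpu_load, send_wait, threshold=0.9):
    """Return ("CPU", task), ("I/O", channel) or None."""
    # Step 1: visit tasks in reverse topological order (sinks first).
    for task in reversed(topo_order):
        if cpu_load[task] >= threshold:
            return ("CPU", task)
    # Step 2: only if no CPU bottleneck was found, visit the edges.
    # Assumed edge order: latest receiver first, ties broken by sender.
    position = {task: i for i, task in enumerate(topo_order)}
    ordered = sorted(
        channels,
        key=lambda ch: (position[ch[1]], position[ch[0]]),
        reverse=True,
    )
    for channel in ordered:
        if send_wait[channel] >= threshold:
            return ("I/O", channel)
    return None


topo = ["Task 1", "Task 2", "Task 3"]
channels = [("Task 1", "Task 2"), ("Task 2", "Task 3")]

# Made-up samples: Task 2 runs two instances, both close to full CPU load.
cpu = aggregate({"Task 1": [0.30], "Task 2": [0.95, 0.97], "Task 3": [0.20]})
wait = aggregate({("Task 1", "Task 2"): [0.10], ("Task 2", "Task 3"): [0.05, 0.15]})

report = detect_bottleneck(topo, channels, cpu, wait)
print(report)  # ('CPU', 'Task 2')
```

Returning at the first hit means at most one bottleneck is reported per analysis pass, matching the "if none found" wording of the steps.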

SLIDE 9

Implementation

  • Based on Nephele framework

■ Java framework
■ 1 master, n workers
■ Task instance = Java thread

  • Analysis of thread state statistics:

■ Threshold for CPU bottleneck:

♦ USR + SYS + BLK >= 90% of the time

■ Threshold for I/O bottleneck:

♦ WAIT caused by sending on a channel >= 90% of the time
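The two thresholds can be expressed as simple predicates over thread state statistics. The sketch below is in Python for brevity, although Nephele itself is a Java framework. The `ThreadStats` type, its field names, and the sample values are hypothetical, and the code does not show how Nephele actually collects these statistics.

```python
from dataclasses import dataclass

THRESHOLD = 0.90  # 90% of the measured time, as stated on the slide


@dataclass
class ThreadStats:
    """Fractions (0..1) of measured time a task thread spent in each state."""
    usr: float        # USR
    system: float     # SYS
    blk: float        # BLK
    wait_send: float  # WAIT caused by sending on a channel


def is_cpu_bottleneck(stats):
    return stats.usr + stats.system + stats.blk >= THRESHOLD


def is_io_bottleneck(stats):
    return stats.wait_send >= THRESHOLD


def classify(stats):
    """Return "CPU", "I/O" or None; the CPU check runs first."""
    if is_cpu_bottleneck(stats):
        return "CPU"
    if is_io_bottleneck(stats):
        return "I/O"
    return None


# Made-up measurements for three threads.
compute_heavy = ThreadStats(usr=0.80, system=0.08, blk=0.05, wait_send=0.02)
send_blocked = ThreadStats(usr=0.05, system=0.03, blk=0.00, wait_send=0.92)
relaxed = ThreadStats(usr=0.30, system=0.05, blk=0.05, wait_send=0.10)

print(classify(compute_heavy), classify(send_blocked), classify(relaxed))
```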

SLIDE 10

Evaluation

Demo Job


[Figure: demo job DAG with the tasks File Reader, PDF Writer, OCR, Inverted Index, PDF Creator, Index Writer]

Setup:

  • Private compute cloud
  • Hosts with two Intel Xeon 2.66 GHz CPUs, 32 GB RAM and 1 Gbit/s Ethernet
  • KVM guests with one virtual CPU and 2 GB RAM
  • Eucalyptus framework for VM allocation/deallocation

SLIDE 11

Evaluation (2)


Phase 1: Fine tuning

SLIDE 12

Evaluation (3)


Phase 2: Scale-out

SLIDE 13

Conclusion

  • Bottleneck detection is useful to scale out jobs in the cloud while maintaining high resource utilization

  • We presented a simple approach to gather and analyze relevant statistics

  • Right now, manual adaptation and job re-runs are necessary to eliminate bottlenecks

  • Future work:

■ Dynamically and automatically adjust parallelization at runtime
