

slide-1
SLIDE 1

Exploiting Inter-Flow Relationship for Coflow Placement in Data Centers

Xin Sunny Huang, T. S. Eugene Ng Rice University

slide-2
SLIDE 2

This Work

  • Optimizing Coflow performance has many benefits, such as avoiding application stragglers [1,2] and improving resource utilization [3,4].
  • Coflow placement is an unexplored yet important factor in determining Coflow performance.
  • 2D-Placement leverages inter-flow relationships to find good placements for Coflows.

[1] Orchestra (SIGCOMM ’11). [2] Varys (SIGCOMM ’14). [3] CARBYNE (OSDI ’16). [4] YARN-ME (memory elasticity, in ATC ’17)

slide-3
SLIDE 3

Coflow

  • Coflow [1]: a set of parallel flows.
  • Produced by distributed applications (e.g. Hadoop & Spark).
  • Performance is measured by Coflow Completion Time (CCT), i.e. the slowest flow’s completion time.

[Figure: three example Coflows: Coflow #1 (shuffle), Coflow #2 (aggregation), Coflow #3 (broadcast).]

[1] Chowdhury, M. et al. Coflow: An application layer abstraction for cluster networking. (HotNets ’12)
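The CCT definition on this slide can be illustrated with a minimal sketch. The flow names and finish times below are hypothetical, chosen only to show that a Coflow completes when its slowest flow does.

```python
# Hypothetical per-flow completion times (seconds) for one Coflow.
flow_completion_times = {"f1": 2.0, "f2": 5.5, "f3": 3.1}


def coflow_completion_time(completion_times):
    """CCT is the completion time of the slowest flow in the Coflow."""
    return max(completion_times.values())


cct = coflow_completion_time(flow_completion_times)
print(cct)
```

Speeding up f1 or f3 would not change the CCT here; only the slowest flow (f2) matters.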

slide-4
SLIDE 4

Coflow Scheduling

  • Prior works demonstrate benefits of Coflow scheduling.
  • Limitation: Assume predetermined placement for Coflows, i.e. predetermined sender/receiver locations.

Varys (SIGCOMM ’14), Aalo (SIGCOMM ’15), CODA (SIGCOMM ’16) and Sunflow (CoNEXT ’16), etc.

[Figure: network with input ports 1..N and output ports 1..N carrying existing Coflows.]

slide-5
SLIDE 5

Coflow Scheduling

  • Prior works demonstrate benefits of Coflow scheduling.
  • Limitation: Assume predetermined placement for Coflows, i.e. predetermined sender/receiver locations.

Varys (SIGCOMM ’14), Aalo (SIGCOMM ’15), CODA (SIGCOMM ’16) and Sunflow (CoNEXT ’16), etc.

[Figure: network with input ports 1..N and output ports 1..N, shown once with existing Coflows and once with a newly arriving Coflow.]

slide-6
SLIDE 6
Coflow Placement

  • Coflow placement can be flexible (e.g. the cluster scheduler chooses machines for the tasks in a stage).
  • Placement and scheduling together decide Coflow performance.

[Figure: network with input ports 1..N and output ports 1..N.]

slide-10
SLIDE 10
Coflow Placement

  • Coflow placement can be flexible (e.g. the cluster scheduler chooses machines for the tasks in a stage).
  • Placement and scheduling together decide Coflow performance.

Placement problem: find input/output ports on which to place the sender/receiver tasks of a newly arriving Coflow.

[Figure: three copies of the network with input ports 1..N and output ports 1..N.]

slide-11
SLIDE 11
Coflow Placement

  • Coflow placement can be flexible (e.g. the cluster scheduler chooses machines for the tasks in a stage).
  • Placement and scheduling together decide Coflow performance.

Placement problem: find input/output ports on which to place the sender/receiver tasks of a newly arriving Coflow.

This work: good placement under optimal scheduling.

[Figure: three copies of the network with input ports 1..N and output ports 1..N.]

slide-12
SLIDE 12

Coflow Placement Constrained by Inter-Flow Relationship

  • Within a Coflow, the placements of its flows depend on one another.

slide-17
SLIDE 17

Challenge #1: Intra-Coflow Bottleneck Delay

How to place?

[Figure: Coflow C2 (endpoints s1, s2, s3, r1, r2; flow sizes 30, 30, 30, 50) next to a network with ports in.1 to in.4 and out.1 to out.4 that already carries C1 (flow labels 2, 2, 2, 2, 2).]

slide-18
SLIDE 18

Challenge #1: Intra-Coflow Bottleneck Delay

How to place?

Only consider C2: C1 is prioritized under optimal scheduling, and thus C1 is not sensitive to C2.

[Figure: Coflow C2 (endpoints s1, s2, s3, r1, r2; flow sizes 30, 30, 30, 50) next to a network with ports in.1 to in.4 and out.1 to out.4 that already carries C1 (flow labels 2, 2, 2, 2, 2).]

slide-19
SLIDE 19

Challenge #1: Intra-Coflow Bottleneck Delay

How to place?

  • Bottleneck at r1.
  • out.1, out.2, out.3: less bandwidth.
  • Optimal: place r1 at the less busy port out.4.

[Figure: Coflow C2 (endpoints s1, s2, s3, r1, r2; flow sizes 30, 30, 30, 50) and the network with C1, before and after the optimal placement. Flow labels in the placed network: 50, 30, 2, 30, 2, 2, 30, 2, 2.]

slide-20
SLIDE 20

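Challenge #2: Inter-Coflow Bottleneck Contentions

How to place?

  • In-cast bottleneck at r1.
  • in.1, out.3, out.4: using these ports would heavily delay C2 (priority: C1 > C3 > C2).
  • Optimal: place r1 at the less busy port out.1.

[Figure: Coflow C3 (endpoints s1, s2, s3, r1; flow sizes 20, 20, 20) and the network with ports in.1 to in.4 and out.1 to out.4, before and after the optimal placement. Flow labels already in the network: 50, 30, 2, 30, 2, 2, 30, 2, 2.]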

slide-21
SLIDE 21

Summary: Keys to Coflow Placement

  • Intra-Coflow: avoid delaying critical endpoints (bottlenecks).
  • Inter-Coflow: avoid contention among critical endpoints.

slide-22
SLIDE 22

2D-Placement

  • Intra-Coflow: identify critical endpoints that require better placement.
    Step 1: Calculate endpoint demand

slide-23
SLIDE 23

2D-Placement

  • Intra-Coflow: identify critical endpoints that require better placement.
    Step 1: Calculate endpoint demand
  • Inter-Coflow: find ports with less contention.
    Step 2: Calculate load on ports

slide-24
SLIDE 24

2D-Placement

  • Intra-Coflow: identify critical endpoints that require better placement.
    Step 1: Calculate endpoint demand
  • Inter-Coflow: find ports with less contention; avoid contention on critical endpoints.
    Step 2: Calculate load on ports
  • Step 3: Place heavily loaded endpoints on less loaded ports!

slide-25
SLIDE 25

2D-Placement

[Figure: Coflow C2 (endpoints s1, s2, s3, r1, r2; flow sizes 30, 30, 30, 50; demand labels 90 and 50) next to the network with C1 (ports in.1 to in.4 and out.1 to out.4; flow labels 2, 2, 2, 2, 2).]

slide-26
SLIDE 26

2D-Placement

Step 1: Calculate endpoint demand

[Figure: Coflow C2 (endpoints s1, s2, s3, r1, r2; flow sizes 30, 30, 30, 50; demand labels 90 and 50) next to the network with C1 (ports in.1 to in.4 and out.1 to out.4; flow labels 2, 2, 2, 2, 2).]

slide-27
SLIDE 27

2D-Placement

Step 1: Calculate endpoint demand
Step 2: Calculate load on ports

[Figure: Coflow C2 (endpoints s1, s2, s3, r1, r2; flow sizes 30, 30, 30, 50) annotated with endpoint demands 30, 30, 80 and 90, 50; network with C1 (ports in.1 to in.4 and out.1 to out.4) annotated with port loads 2, 4, 4, 4, 4, 2.]
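Steps 1 and 2 amount to summing flow sizes per endpoint and per port. The sketch below is one way to reproduce the labels on this slide, under stated assumptions: the sender/receiver pairing of C2's flows is inferred from the demand labels (30, 30, 80 and 90, 50), and the mapping of C1's five size-2 flows onto specific ports is hypothetical, chosen only so that the port loads come out as 2, 4, 4 and 4, 4, 2. The helper name `aggregate` is not from the talk.

```python
from collections import defaultdict


def aggregate(flows):
    """Sum flow sizes per source and per destination.

    flows: iterable of (source, destination, size) tuples.
    Returns (total sent per source, total received per destination).
    """
    sent, received = defaultdict(int), defaultdict(int)
    for src, dst, size in flows:
        sent[src] += size
        received[dst] += size
    return dict(sent), dict(received)


# C2's flows as (sender, receiver, size); the pairing is inferred, not stated.
c2_flows = [("s1", "r1", 30), ("s2", "r1", 30), ("s3", "r1", 30), ("s3", "r2", 50)]

# C1's flows as (input port, output port, size); this port mapping is hypothetical.
c1_flows = [
    ("in.1", "out.1", 2),
    ("in.2", "out.1", 2),
    ("in.2", "out.2", 2),
    ("in.3", "out.2", 2),
    ("in.3", "out.3", 2),
]

# Step 1: endpoint demand of the Coflow to be placed.
send_demand, recv_demand = aggregate(c2_flows)

# Step 2: load that already-placed Coflows put on each port.
in_load, out_load = aggregate(c1_flows)

print(send_demand, recv_demand)
print(in_load, out_load)
```

Ports that carry no traffic (in.4 and out.4 under this mapping) do not appear in the result and have load 0.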

slide-28
SLIDE 28

2D-Placement

Step 1: Calculate endpoint demand
Step 2: Calculate load on ports
Step 3: Place heavily loaded endpoints on less loaded ports!

[Figure: C2 (endpoint demands 30, 30, 80 and 90, 50) and the network with C1 (port loads 2, 4, 4, 4, 4, 2). Placement values shown so far: 80.]

slide-29
SLIDE 29

2D-Placement

Step 1: Calculate endpoint demand
Step 2: Calculate load on ports
Step 3: Place heavily loaded endpoints on less loaded ports!

[Figure: C2 (endpoint demands 30, 30, 80 and 90, 50) and the network with C1 (port loads 2, 4, 4, 4, 4, 2). Placement values shown so far: 80, 32.]

slide-30
SLIDE 30

2D-Placement

Step 1: Calculate endpoint demand
Step 2: Calculate load on ports
Step 3: Place heavily loaded endpoints on less loaded ports!

[Figure: C2 (endpoint demands 30, 30, 80 and 90, 50) and the network with C1 (port loads 2, 4, 4, 4, 4, 2). Placement values shown so far: 80, 32, 34.]

slide-31
SLIDE 31

2D-Placement

Step 1: Calculate endpoint demand
Step 2: Calculate load on ports
Step 3: Place heavily loaded endpoints on less loaded ports!

[Figure: C2 (endpoint demands 30, 30, 80 and 90, 50) and the network with C1 (port loads 2, 4, 4, 4, 4, 2). Placement values shown so far: 80, 32, 34, 90.]

slide-32
SLIDE 32

2D-Placement

Step 1: Calculate endpoint demand
Step 2: Calculate load on ports
Step 3: Place heavily loaded endpoints on less loaded ports!

[Figure: C2 (endpoint demands 30, 30, 80 and 90, 50) and the network with C1 (port loads 2, 4, 4, 4, 4, 2). Placement values shown: 80, 32, 34, 90, 52.]

slide-33
SLIDE 33

2D-Placement

Step 1: Calculate endpoint demand
Step 2: Calculate load on ports
Step 3: Place heavily loaded endpoints on less loaded ports!

[Figure: C2 (endpoint demands 30, 30, 80 and 90, 50) and the network with C1 (port loads 2, 4, 4, 4, 4, 2). Placement values shown: 80, 32, 34, 90, 52. Flow labels in the network after placement: 50, 30, 2, 30, 2, 2, 30, 2, 2.]

slide-34
SLIDE 34

2D-Placement

Step 1: Calculate endpoint demand
Step 2: Calculate load on ports
Step 3: Place heavily loaded endpoints on less loaded ports!

Greedy heuristic

[Figure: C2 (endpoint demands 30, 30, 80 and 90, 50) and the network with C1 (port loads 2, 4, 4, 4, 4, 2). Placement values shown: 80, 32, 34, 90, 52. Flow labels in the network after placement: 50, 30, 2, 30, 2, 2, 30, 2, 2.]
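Step 3 can be read as a greedy matching: sort endpoints by demand (largest first), sort ports by existing load (smallest first), and pair them in order. The sketch below is one plausible reading of the slide, not the paper's exact algorithm; details such as tie-breaking are assumptions. Demands and loads are taken from the figure labels, but the assignment of loads to specific port names is hypothetical, and the function name `greedy_place` is illustrative. Each port is assumed to host at most one endpoint of the Coflow.

```python
def greedy_place(demand, port_load):
    """Pair the most demanding endpoint with the least loaded port, and so on."""
    if len(demand) > len(port_load):
        raise ValueError("more endpoints than ports")
    ports = sorted(port_load, key=port_load.get)              # least loaded first
    endpoints = sorted(demand, key=demand.get, reverse=True)  # most demanding first
    return dict(zip(endpoints, ports))


# Endpoint demands of C2 (Step 1) and port loads caused by C1 (Step 2).
send_demand = {"s1": 30, "s2": 30, "s3": 80}
recv_demand = {"r1": 90, "r2": 50}
in_load = {"in.1": 2, "in.2": 4, "in.3": 4, "in.4": 0}        # port names assumed
out_load = {"out.1": 4, "out.2": 4, "out.3": 2, "out.4": 0}   # port names assumed

senders = greedy_place(send_demand, in_load)
receivers = greedy_place(recv_demand, out_load)

# Load on each chosen port after C2 is placed.
new_in_load = {port: in_load[port] + send_demand[ep] for ep, port in senders.items()}
new_out_load = {port: out_load[port] + recv_demand[ep] for ep, port in receivers.items()}

print(senders, new_in_load)
print(receivers, new_out_load)
```

Under these assumptions the resulting port loads are 80, 32, 34 on the input side and 90, 52 on the output side, which matches the values on the slide.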

slide-35
SLIDE 35

Simulation setup

  • Implemented a flow-level, discrete-event simulator.
  • Workload [1]: realistic trace derived from a Facebook cluster.
  • 1-hour traffic trace, > 500 Coflows, > 700,000 flows.
  • Baseline: flow-by-flow placement for Coflows (Neat [3]).
  • Coflow schedulers: Aalo [2] (this talk) and Varys [1] (paper), both designed to minimize average CCT by prioritizing small Coflows to avoid head-of-line (HOL) blocking.

[1] Varys (SIGCOMM ’14). [2] Aalo (SIGCOMM ’15). [3] Neat (CoNEXT ’16)
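Why prioritizing small Coflows lowers average CCT can be seen in a toy example with a single bottleneck link served one Coflow at a time. This is only an illustration of the head-of-line blocking effect, not an implementation of Aalo or Varys; the Coflow sizes, the unit bandwidth, and the function name are hypothetical.

```python
def average_cct(sizes_in_service_order, bandwidth=1.0):
    """Average completion time when Coflows use one bottleneck link back to back."""
    clock, total = 0.0, 0.0
    for size in sizes_in_service_order:
        clock += size / bandwidth  # this Coflow finishes after all earlier ones
        total += clock
    return total / len(sizes_in_service_order)


sizes = [100, 10]  # hypothetical Coflow sizes, in arrival order

large_first = average_cct(sizes)          # the small Coflow waits behind the large one
small_first = average_cct(sorted(sizes))  # the small Coflow goes first

print(large_first, small_first)
```

Serving the large Coflow first gives completion times of 100 and 110 (average 105); serving the small one first gives 10 and 110 (average 60).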

slide-36
SLIDE 36

Improvement in Average CCT

[Chart: 2D-Placement’s average CCT over Neat’s average CCT under Aalo, by traffic scale factor. Lower is better.]

Traffic scale factor   ×0.5   ×0.75   ×1     ×1.25   ×1.5
Aalo                   0.87   0.82    0.77   0.77    0.87

2D-Placement improves over Neat by up to 23% under Aalo scheduling.

slide-37
SLIDE 37

Improvement in Individual CCT

[Chart: individual CCT reduction by 2D-Placement from Neat under Aalo. x-axis: ratio of Coflow bottleneck L over link bandwidth B (seconds), 0.001 to 1000; y-axis: CCT reduction (seconds), -100 to 1100, higher is better. Annotations: "60 sec" and "Reduction = 0".]

  • Small Coflows are prioritized and less sensitive to placement.
  • Large Coflows are harder to place and more sensitive to placement.
  • For large Coflows, 2D-Placement’s CCT is only 0.85× that of Neat under Aalo scheduling.

slide-38
SLIDE 38

More in paper: results under Varys scheduling, sensitivity to schedulers, …

slide-39
SLIDE 39

Conclusions

  • First study on Coflow placement, which has a decisive impact on Coflow performance.
  • Coflow placement is more challenging due to inter-flow dependency.
  • 2D-Placement leverages inter-flow relationships to find good placements for Coflows.

Thank You!

slide-40
SLIDE 40

Thank You!

Xin Sunny Huang, T. S. Eugene Ng Rice University

slide-41
SLIDE 41

Backup slides

slide-42
SLIDE 42

Sensitivity to Schedulers

  • 2D-Placement’s improvement over Neat is usually larger under Aalo scheduling.

1. Aalo, lacking precise information about Coflow size, may temporarily violate the smallest-Coflow-first priority.
2. Neat optimizes placement based on a specific traffic priority used for scheduling. It is therefore prone to error when scheduling dynamics deviate from that priority at runtime.
3. 2D-Placement optimizes placement for a more general case, independent of the scheduling.

slide-43
SLIDE 43

Improvement in Average CCT

[Chart: 2D-Placement’s average CCT over Neat’s average CCT under Aalo and Varys, by traffic scale factor. Lower is better.]

Traffic scale factor   ×0.5   ×0.75   ×1     ×1.25   ×1.5
Aalo                   0.87   0.82    0.77   0.77    0.87
Varys                  1.00   0.96    0.79   0.74    0.78

2D-Placement improves over Neat by up to 26%.
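The "up to" percentages follow directly from the plotted ratios: an average-CCT ratio of r relative to Neat corresponds to an improvement of (1 - r). A short check, using the values read from the chart (the variable and function names are illustrative):

```python
# Ratio of 2D-Placement's average CCT to Neat's, per traffic scale factor
# (x0.5, x0.75, x1, x1.25, x1.5), as read from the chart.
ratio = {
    "Aalo": [0.87, 0.82, 0.77, 0.77, 0.87],
    "Varys": [1.00, 0.96, 0.79, 0.74, 0.78],
}


def best_improvement_percent(ratios):
    """Largest reduction in average CCT relative to the baseline, in percent."""
    return round((1 - min(ratios)) * 100)


best = {scheduler: best_improvement_percent(r) for scheduler, r in ratio.items()}
print(best)
```

The best case is 23% under Aalo (ratio 0.77) and 26% under Varys (ratio 0.74).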

slide-44
SLIDE 44

Improvement in Individual CCT

For large Coflows, 2D-Placement is only 0.85× (0.92×) of Neat under Aalo (Varys) scheduling.

[Charts: individual CCT reduction by 2D-Placement from Neat, one panel for Aalo and one for Varys. x-axis: ratio of Coflow bottleneck L over link bandwidth B (seconds), 0.001 to 1000; y-axis: CCT reduction (seconds), -100 to 1100 (Aalo) and -100 to 1500 (Varys).]
