Exploiting Inter-Flow Relationship for Coflow Placement in Data Centers
Xin Sunny Huang, T. S. Eugene Ng Rice University
This Work
Optimizing Coflow performance has many benefits such as avoiding application stragglers [1,2] and improving resource utilization [3,4].
factor to determine Coflow performance.
find good placement for Coflows.
[1] Orchestra (SIGCOMM ’11). [2] Varys (SIGCOMM ’14). [3] CARBYNE (OSDI ’16). [4] YARN-ME (memory elasticity, in ATC ’17).
[Figure: three example Coflows: Coflow #1 (shuffle), Coflow #2 (aggregation), Coflow #3 (broadcast).]
i.e. the slowest flow’s completion time.
[1] Chowdhury, M. et al. Coflow: An application layer abstraction for cluster networking. (HotNets ’12)
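The definition above says that a Coflow finishes only when its slowest flow finishes. A minimal sketch of that rule follows; the function name and the sample finish times are illustrative assumptions, not values from the talk.

```python
def coflow_completion_time(flow_finish_times):
    """Return the Coflow completion time (CCT): the finish time of the slowest flow."""
    if not flow_finish_times:
        raise ValueError("a Coflow needs at least one flow")
    return max(flow_finish_times)


# Hypothetical finish times (seconds) for the flows of one Coflow.
finish_times = [1.2, 0.4, 3.5]
cct = coflow_completion_time(finish_times)
```

Speeding up any flow other than the slowest one leaves the CCT unchanged, which is why the later slides focus on bottleneck endpoints.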
[Figure: network with input ports 1, 2, 3, ..., N-1, N and output ports 1, 2, 3, ..., N-1, N.]
i.e. predetermined sender/receiver locations.
Varys (SIGCOMM ’14), Aalo (SIGCOMM ’15), CODA (SIGCOMM ’16) and Sunflow (CoNEXT ’16), etc.
[Figure: the same network, showing existing Coflows and a newly arriving Coflow.]
to choose machines for tasks in a stage).
Finding input/output ports to place sender/receiver tasks for a newly arriving Coflow
This work: good placement under
[Figure: a network with ports in.1 to in.4 and out.1 to out.4 that already carries Coflow C1 (flow labels 2, 2, 2, 2, 2). A new Coflow C2 has senders s1, s2, s3 and receivers r1, r2 (flow labels 30, 30, 30, 50).]
How to place C2?
Only consider C2: C1 is prioritized under
and thus C1 is not sensitive to C2.
Bottleneck at r1: less bandwidth.
Place r1 at less busy port out.4.
Optimal
Challenge #2: Inter-Coflow Bottleneck Contentions
[Figure: the same network after C2 is placed. A new Coflow C3 has senders s1, s2, s3 and one receiver r1 (flow labels 20, 20, 20).]
How to place C3?
In-cast bottleneck at r1.
Place r1 at less busy port out.1.
in.1, out.3, out.4: heavily delay C2 (priority: C1 > C3 > C2).
Optimal
Intra-Coflow Inter-Coflow
Summary: Keys to Coflow Placement
Avoid delaying critical endpoints (bottleneck).
Avoid contentions among critical endpoints.
2D-Placement
Step 1: Calculate endpoint demand. Identify critical endpoints that require better placement.
Step 2: Calculate load on ports. Find ports with less contention.
Step 3: Place heavily loaded endpoints. Avoid contention on critical endpoints.
[Figure: 2D-Placement applied to the earlier example. Coflow C2 (senders s1, s2, s3; receivers r1, r2; flow labels 30, 30, 30, 50) is placed on the network with C1 (flow labels 2, 2, 2, 2, 2; ports in.1 to in.4 and out.1 to out.4).]
Step 1: Calculate endpoint demand (values shown: 90, 50 and 30, 30, 80).
Step 2: Calculate load on ports (values shown: 2, 4, 4, 4, 4, 2).
Step 3: Place heavily loaded endpoints (values shown as endpoints are placed one by one: 80, 32, 34, 90, 52).
Greedy heuristic
Flow labels in the final figure: 50, 30, 2, 30, 2, 2, 30, 2, 2.
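The three steps and the greedy heuristic can be sketched as below. This is one reading of the example's numbers, not the authors' implementation. The flow list for C2, the assignment of C1's loads to specific ports, the one-endpoint-per-port rule, the tie-breaking order, and all function names are assumptions made for illustration.

```python
from collections import defaultdict

# Assumed reading of the C2 example: (sender, receiver, size).
C2_FLOWS = [("s1", "r1", 30), ("s2", "r1", 30), ("s3", "r1", 30), ("s3", "r2", 50)]
# Assumed load that C1 already puts on each port (Step 2 input).
IN_LOAD = {"in.1": 0, "in.2": 2, "in.3": 4, "in.4": 4}
OUT_LOAD = {"out.1": 4, "out.2": 4, "out.3": 2, "out.4": 0}


def endpoint_demand(flows):
    """Step 1: total size each sender sends and each receiver receives."""
    send, recv = defaultdict(int), defaultdict(int)
    for sender, receiver, size in flows:
        send[sender] += size
        recv[receiver] += size
    return dict(send), dict(recv)


def place(demand, port_load):
    """Step 3: heaviest endpoint first, each onto the least-loaded free port."""
    free = dict(port_load)  # copy so the caller's port loads are not modified
    result = {}
    for endpoint in sorted(demand, key=demand.get, reverse=True):
        port = min(free, key=free.get)  # least-loaded port still available
        result[endpoint] = (port, free.pop(port) + demand[endpoint])
    return result


senders, receivers = endpoint_demand(C2_FLOWS)
sender_ports = place(senders, IN_LOAD)
receiver_ports = place(receivers, OUT_LOAD)
```

Under these assumptions the resulting port loads are 80, 32 and 34 on the input side and 90 and 52 on the output side, which matches the sequence 80, 32, 34, 90, 52 shown in the example, with r1 landing on out.4.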
both designed to minimize average CCT by prioritizing small Coflows to avoid HOL blocking.
[1] Varys (SIGCOMM ’14). [2] Aalo (SIGCOMM ’15). [3] Neat (CoNEXT ’16).
2D-Placement’s average-CCT over Neat’s average-CCT under Aalo scheduling (↓ lower is better)
Traffic Scale Factor: x0.5, x0.75, x1, x1.25, x1.5
Ratio: 0.87, 0.82, 0.77, 0.77, 0.87
2D-Placement improves over Neat by up to 23% under Aalo scheduling.
Individual CCT Reduction by 2D-Placement from Neat (Aalo; ↑ higher is better)
[Figure: CCT reduction (second) plotted against the ratio of Coflow bottleneck L over link bandwidth B (second). Plot annotations: 60 sec; Reduction = 0.]
Small Coflows are prioritized and less sensitive to placement.
Large Coflows are harder to place and more sensitive to placement.
For large Coflows, 2D-Placement is
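The plot's x-axis is the ratio of a Coflow's bottleneck L to the link bandwidth B. The slides do not spell out how L is computed; the sketch below assumes L is the largest total demand at any single sender or receiver, a definition commonly used in Coflow scheduling work. The demand values reuse one reading of the C2 example, and the bandwidth value and its units are hypothetical.

```python
def bottleneck_over_bandwidth(sender_demand, receiver_demand, bandwidth):
    """Return L / B, where L is the largest per-endpoint demand (assumed definition)."""
    bottleneck = max(max(sender_demand.values()), max(receiver_demand.values()))
    return bottleneck / bandwidth


# Assumed endpoint demands for C2 and a hypothetical bandwidth of 10 units/second.
c2_senders = {"s1": 30, "s2": 30, "s3": 80}
c2_receivers = {"r1": 90, "r2": 50}
seconds = bottleneck_over_bandwidth(c2_senders, c2_receivers, bandwidth=10.0)
```

With these inputs the bottleneck is r1 (demand 90), so no placement can finish the Coflow faster than 90 / 10 seconds on an otherwise idle network.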
More in paper: Results under Varys scheduling, Sensitivity to Schedulers, …
decisive impact on Coflow performance.
inter-flow dependency.
find good placement for Coflows.
under Aalo scheduling.
1. Aalo lacks precise information about Coflow sizes, so it may temporarily violate the smallest-Coflow-first priority.
2. Neat optimizes placement based on a specific traffic priority used for scheduling. Thus it is prone to errors caused by scheduling dynamics at runtime.
3. 2D-Placement optimizes placement in a more general case, independent of the scheduling.
2D-Placement’s average-CCT over Neat’s average-CCT (↓ lower is better)
Traffic Scale Factor: x0.5, x0.75, x1, x1.25, x1.5
Aalo: 0.87, 0.82, 0.77, 0.77, 0.87
Varys: 1.00, 0.96, 0.79, 0.74, 0.78
2D-Placement improves over Neat by up to 26%.
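The "up to" percentages follow from the smallest ratio in each series: a ratio of 0.77 means 2D-Placement's average CCT is 23% lower than Neat's. The short check below uses the ratios from the slides; the variable and function names are illustrative, and the pairing of each value with a scale factor assumes the numbers appear in axis order.

```python
SCALE_FACTORS = ["x0.5", "x0.75", "x1", "x1.25", "x1.5"]
CCT_RATIO = {
    "Aalo": [0.87, 0.82, 0.77, 0.77, 0.87],
    "Varys": [1.00, 0.96, 0.79, 0.74, 0.78],
}


def best_improvement_percent(ratios):
    """Largest average-CCT improvement over Neat, as a whole percentage."""
    return round((1 - min(ratios)) * 100)


best = {name: best_improvement_percent(r) for name, r in CCT_RATIO.items()}
```

This gives 23 for Aalo and 26 for Varys, matching the two headline claims.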
For large Coflows, CCT under 2D-Placement is only 0.85× (0.92×) of that under Neat with Aalo (Varys) scheduling.
[Figure: Individual CCT Reduction by 2D-Placement from Neat, in two panels (Aalo, Varys). x-axis: ratio of Coflow bottleneck L over link bandwidth B (second); y-axis: CCT reduction (second).]