Saath: Speeding up CoFlows by Exploiting the Spatial Dimension
Akshay Jajoo, Rohan Gandhi, Y. Charlie Hu, Cheng-Kok Koh
Analytics Jobs in Big Data
Analytics jobs in data centers
Process huge amounts of data
Distributed in nature
Map Stage Shuffle (Communication) Reduce Stage
[1] Based on information from the full Facebook trace used in Aalo (source: Aalo slides).
Map Stage Shuffle Reduce Stage
CoFlow
[2] M. Chowdhury and I. Stoica. Coflow: A networking abstraction for cluster applications. In HotNets, 2012.
Reduce Stage Map Stage
CoFlow 1 CoFlow 2
[3] M. Chowdhury, Y. Zhong, and I. Stoica. Efficient coflow scheduling with Varys. In SIGCOMM, 2014.
[Figure: labels P1, P2, P3; priority levels High and Low]
[Figure: Sender Node-1, Sender Node-2, DC Network, Receiver Node-1, Receiver Node-2]
[4] M. Alizadeh et al. CONGA. In SIGCOMM, 2014.
[5] A. Greenberg et al. VL2. In SIGCOMM, 2009.
[Figure: Global Co-ordinator, Sender Node-1, Sender Node-2, DC Network, Receiver Node-1, Receiver Node-2; coflow labels C3, C2]
[6] M. Chowdhury et al. Aalo. In SIGCOMM, 2015.
[Figure: two sets of priority queues, Q0 (High) and Q1 (Low); one set holds coflows C2 and C1, the other holds C2 and C3]
[Figure: Global Co-ordinator, Sender Node-1, Sender Node-2, DC Network; each sender has queues Q0 and Q1; coflow labels C2, C3 and C2, C1]
C2 C3 C2 C1
Average CCT = (2+1+2)/3 = 5/3
Average CCT = (1+2+1)/3 = 4/3
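The two averages on this slide are plain means of the per-coflow completion times (CCT). As a minimal sketch, the arithmetic can be checked as follows. The function name `average_cct` and the list names are hypothetical, chosen only for illustration, and the slide text does not say which completion time belongs to which coflow, so the lists simply keep the order in which the numbers appear.

```python
from fractions import Fraction


def average_cct(completion_times):
    """Return the mean coflow completion time as an exact fraction."""
    if not completion_times:
        raise ValueError("need at least one coflow completion time")
    return Fraction(sum(completion_times), len(completion_times))


# Completion times copied from the slide, in abstract time units.
# The mapping of each value to C1, C2, C3 is not given in the text.
first_schedule = [2, 1, 2]
second_schedule = [1, 2, 1]

print(average_cct(first_schedule))   # 5/3
print(average_cct(second_schedule))  # 4/3
```

Using `Fraction` keeps the results in the same exact form as the slide (5/3 and 4/3) instead of rounded decimals.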
[Figure, shown for two schedules: Global Co-ordinator, Sender Node-1, Sender Node-2, DC Network; each sender has queues Q0 and Q1; coflow labels C2, C3 and C2, C1]
Average CCT = (1+2+2)/3 = 5/3
Average CCT = (1+2+1)/3 = 4/3
[Figure: time marks t and 2t; labels P1, P2, P3, P4]
[7] https://github.com/coflow/coflow-benchmark
                                   SAATH                  Aalo
                                   Average    P90         Average    P90
Global Coordinator
  CPU %                            37.8       42.7        33.5       35.5
  Memory (MB)                      229        284         267        374
  Total time (msec)                0.57       2.85        0.1        0.2
  (LCoF/All-or-none) (msec)        (0.02/0.24) (0.03/0.7)
Local Node
  CPU %                            5.6        5.7         5.5        5.7
  Memory (MB)                      1.68       1.7         1.75       1.78