HotCloud’20
Towards An Application Objective-Aware Network Interface
Sangeetha Abdu Jyothi, Sayed Hadi Hashemi, Roy Campbell, Brighten Godfrey
Evolution of Application Network Interface (ANI)
ANI      Metrics
Packet   Delay, jitter
Network Fabric
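[Figure: example with nodes A, B, C, flows f1 and f2, and computations c1, c2, c3. Network and compute timelines are shown for two schedules, with time marks at 1, 1.5, 2, 2.5 and at 0.5, 1, 1.5, 2.]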
[Figure: Sample TensorFlow Model: One Iteration. Three workers and a parameter server; Input Data, Read A through Read D, Update A through Update D.]
[Figure: Client and Proxy with requests Req 1 through Req n; stages Gather, Update, Scatter.]
$CF = \{(f_1, T_1), (f_2, T_2), \ldots, (f_n, T_n), \Gamma\}$ where $T_i = (t_{i1}, m_{i1}), (t_{i2}, m_{i2}), \ldots$
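The definition above can be mirrored as a small data structure. This is only a sketch: the slides do not spell out the meaning of the symbols here, so the code assumes each t is a time in seconds, each m is a numeric value attached to that time, and Γ is a single number for the whole collection. The names `FlowObjective`, `CadentFlow`, and `targets_for` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class FlowObjective:
    """One (f_i, T_i) entry of the definition."""
    flow_id: str
    # T_i as a list of (t_ij, m_ij) pairs.
    # Assumption: t_ij is a time in seconds and m_ij is a numeric value.
    targets: List[Tuple[float, float]] = field(default_factory=list)


@dataclass
class CadentFlow:
    """CF = {(f_1, T_1), ..., (f_n, T_n), Gamma}."""
    flows: List[FlowObjective]
    gamma: float  # Assumption: the collection-level term is a single number.

    def targets_for(self, flow_id: str) -> List[Tuple[float, float]]:
        """Return T_i for the flow with the given id."""
        for flow in self.flows:
            if flow.flow_id == flow_id:
                return flow.targets
        raise KeyError(flow_id)


# Illustrative values only; they are not taken from the talk.
cf = CadentFlow(
    flows=[
        FlowObjective("f1", [(1.0, 1.0)]),
        FlowObjective("f2", [(1.5, 0.5)]),
    ],
    gamma=1.5,
)
```

Keeping the per-flow pairs separate from the collection-level term lets a scheduler read both levels of the objective from one object.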
CadentFlows with deadlines provide flexibility to delay some flows without affecting application performance
Min CCT is 1s, but up to 1.5s is tolerable without any impact
[Figure: performance-optimized schedules for the example with nodes A, B, C, flows f1 and f2, and computations c1, c2, c3, shown for two cases: c1 takes 0.5s and c1 takes 1s.]
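How a compute time can leave room to delay a flow can be sketched as an earliest-finish computation over a dependency graph. The graph and durations below are hypothetical and are not read off the figure: the sketch assumes f1 feeds c1, f2 feeds c2, and c3 waits for both, and it ignores contention between flows on shared links. The function name `finish_times` is also an assumption.

```python
from graphlib import TopologicalSorter  # Python 3.9+


def finish_times(durations, deps):
    """Earliest finish time of each task.

    durations: task name -> duration in seconds.
    deps: task name -> set of tasks that must finish before it starts.
    """
    graph = {task: deps.get(task, ()) for task in durations}
    finish = {}
    # static_order() yields every task after all of its predecessors.
    for task in TopologicalSorter(graph).static_order():
        start = max((finish[d] for d in deps.get(task, ())), default=0.0)
        finish[task] = start + durations[task]
    return finish


# Hypothetical dependency structure (an assumption, not from the slides).
deps = {"c1": {"f1"}, "c2": {"f2"}, "c3": {"c1", "c2"}}
base = {"f1": 0.5, "f2": 1.0, "c2": 0.5, "c3": 0.5}

fast = finish_times({**base, "c1": 0.5}, deps)  # c1 takes 0.5s
slow = finish_times({**base, "c1": 1.0}, deps)  # c1 takes 1s
```

Under these assumed numbers c3 finishes at the same time in both cases, because the f2 and c2 branch is the longer path. In the 0.5s case the f1 branch has half a second of slack that a scheduler could spend elsewhere.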
[Figure: Sample TensorFlow Model: One Iteration. Input Data, Read A through Read D, Update A through Update D, annotated with t=3ms, t=5ms, t=4ms, t=2ms.]
delay of flow i
dependencies and computation/communication times
a link)
(MADD) [Coflow control in Varys]
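For readers unfamiliar with the MADD reference, the following is a minimal sketch of Minimum Allocation for Desired Duration as used for coflow scheduling in Varys, restricted to a single coflow. It assumes a non-blocking fabric in which every ingress and egress port has the same capacity. The function name `madd_rates` and the example flows are hypothetical, and this is not the authors' implementation.

```python
from collections import defaultdict


def madd_rates(flows, port_capacity):
    """Compute MADD rates for one coflow.

    flows: list of (src, dst, size) tuples.
    port_capacity: capacity of every port, in size units per second
                   (assumed uniform across ports).
    Returns (gamma, rates), where gamma is the bottleneck completion time
    and rates[i] is the rate given to flows[i].
    """
    egress = defaultdict(float)   # total size leaving each source port
    ingress = defaultdict(float)  # total size entering each destination port
    for src, dst, size in flows:
        egress[src] += size
        ingress[dst] += size

    loads = list(egress.values()) + list(ingress.values())
    gamma = max(loads, default=0.0) / port_capacity
    if gamma == 0:
        return 0.0, [0.0 for _ in flows]

    # Each flow gets just enough rate to finish exactly at gamma,
    # so no flow finishes earlier than the bottleneck requires.
    rates = [size / gamma for _, _, size in flows]
    return gamma, rates


# Illustrative coflow: sizes in MB, capacity in MB/s.
example = [("A", "C", 100.0), ("B", "C", 50.0), ("A", "B", 50.0)]
gamma, rates = madd_rates(example, port_capacity=100.0)
```

In this example the most loaded ports carry 150 MB each, so the coflow cannot finish before 1.5 s. Every flow is paced to finish at that time, which leaves the remaining bandwidth free for other traffic.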
[Figure: Iteration time (relative to TCP), Coflow-optimized vs. CadentFlow-optimized, for AlexNet-v2, CifarNet, Inception-v1, Inception-v3, MobileNet-v2, ResNet-v1-50, ResNet-v1-152, ResNet-v1-200, ResNet-v2-101, ResNet-v2-152, and VGG-19. Panels: 8 workers, 8 PS; 16 workers, 16 PS.]
[Figure: CCT flexibility ratio (max feasible CCT / min CCT), two panels.]
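The flexibility ratio named in the axis label is a simple quotient. A minimal sketch, using the 1s and 1.5s values from the earlier deadline example as inputs; the function name `cct_flexibility_ratio` is hypothetical.

```python
def cct_flexibility_ratio(min_cct, max_feasible_cct):
    """Return max feasible CCT divided by min CCT.

    Both arguments are in the same time unit. A ratio of 1.0 means there is
    no room to delay; larger values mean more room.
    """
    if min_cct <= 0:
        raise ValueError("min_cct must be positive")
    if max_feasible_cct < min_cct:
        raise ValueError("max_feasible_cct cannot be below min_cct")
    return max_feasible_cct / min_cct


# Values from the deadline example: min CCT 1s, up to 1.5s tolerable.
ratio = cct_flexibility_ratio(1.0, 1.5)
slack_seconds = 1.5 - 1.0
```

With these inputs the ratio is 1.5, which corresponds to half a second of tolerable delay.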
Email: sangeetha.aj@uci.edu