Towards An Application Objective-Aware Network Interface


slide-1
SLIDE 1

HotCloud’20

Towards An Application Objective-Aware Network Interface

Sangeetha Abdu Jyothi, Sayed Hadi Hashemi, Roy Campbell, Brighten Godfrey

slide-4
SLIDE 4

Evolution of Application Network Interface (ANI)

Network Fabric

ANI:     Packet        | Flow                 | Coflow
Metrics: Delay, jitter | Flow Completion Time | Coflow Completion Time

slide-5
SLIDE 5

What is the ultimate goal of an ANI?

Translating application requirements to actionable network requirements

Are current ANIs sufficient?

slide-6
SLIDE 6

Understanding an Application’s Objective

  • Applications have complex interdependencies between computation and communication
  • Prioritizing flows based on computations in the succeeding stage is critical

[Figure: nodes A, B, C with flows f1, f2 and computations c1, c2, c3. Two Network/Compute timelines are compared: Coflow-Optimized (time marks 1, 1.5, 2, 2.5) and Performance-Optimized (time marks 0.5, 1, 1.5, 2).]

Current abstractions fail to capture the application objective effectively

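
The point that flow ordering matters more than coflow completion time can be illustrated with a small sketch. The workload below is hypothetical (it is not the A/B/C example from the figure): two equal flows share one link, computation c1 needs f1, c2 needs f2, and the computations run one after another. The function name `finish_time` and all numbers are assumptions made for illustration.

```python
def finish_time(flow_done, compute_order, compute_time, needs):
    """Run computations sequentially; each starts once its input flow has arrived."""
    clock = 0.0
    for op in compute_order:
        # Wait for the flow this computation depends on, then compute.
        clock = max(clock, flow_done[needs[op]]) + compute_time[op]
    return clock


# Hypothetical workload: each flow needs 0.5 s of exclusive link time.
needs = {"c1": "f1", "c2": "f2"}
compute_time = {"c1": 0.5, "c2": 0.5}
order = ["c1", "c2"]

# Equal sharing: both flows get half the link and finish together at t = 1.0.
fair_sharing = {"f1": 1.0, "f2": 1.0}
# Prioritized: f1 is sent first at full rate, then f2.
prioritized = {"f1": 0.5, "f2": 1.0}

fair_cct = max(fair_sharing.values())
prio_cct = max(prioritized.values())
fair_finish = finish_time(fair_sharing, order, compute_time, needs)
prio_finish = finish_time(prioritized, order, compute_time, needs)

print(f"CCT: fair={fair_cct}, prioritized={prio_cct}")
print(f"Application finish: fair={fair_finish}, prioritized={prio_finish}")
```

Both schedules have the same coflow completion time, yet the prioritized one lets c1 start earlier and finishes the application sooner. A coflow-level metric alone cannot tell the two schedules apart.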
slide-7
SLIDE 7

An Example Application: Distributed Deep Learning

  • Gigabytes of data are transferred in each iteration, which lasts milliseconds (e.g., VGG-16 sends ~1 GB of data every 200 ms)
  • Parameters are consumed in a particular order
  • Sending parameter updates from the PS to workers in the best order can accelerate training
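
To put the VGG-16 figure in perspective, the sketch below converts "~1 GB every 200 ms" into an average bandwidth. It assumes 1 GB = 10^9 bytes and a steady transfer over the whole iteration; the helper name `required_gbps` is made up for this example.

```python
def required_gbps(bytes_per_iteration: float, iteration_seconds: float) -> float:
    """Average bandwidth in Gbit/s needed to move the data within one iteration."""
    bits = bytes_per_iteration * 8
    return bits / iteration_seconds / 1e9


# Assumption: 1 GB = 1e9 bytes, sent once per 0.2 s iteration.
vgg16_gbps = required_gbps(1e9, 0.2)
print(f"VGG-16 needs roughly {vgg16_gbps:.0f} Gbit/s on average")
```

Under these assumptions the average demand is on the order of 40 Gbit/s, which is why the order in which parameters arrive can matter so much.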

[Figure: three workers and a parameter server. Sample TensorFlow Model: One Iteration, showing Input Data, operations Read A to Read D and Update A to Update D, and labels p1 to p4 and p1’ to p4’.]

slide-8
SLIDE 8

Other Applications

  • User-facing partition-aggregation workloads (remote dependency resolution at a Web proxy)
  • Graph processing systems
  • Iterative analytics with deadlines (e.g., Naiad), and so on

[Figure: a client and a proxy handling requests Req 1 to Req n; a Gather, Update, Scatter cycle.]

slide-13
SLIDE 13

Towards A Novel Application Network Interface

  • Computation completely represented by a DAG. What is the network equivalent?
  • The goal is to capture an application’s network objective
  • CadentFlow:
      • A set of flows with metrics AND
      • An application-level objective
      • Metrics may be priority, deadline, weight, etc.

$CF = \{(f_1, T_1), (f_2, T_2), \dots, (f_n, T_n), \Gamma\}$ where $T_i = (t_{i1}, m_{i1}), (t_{i2}, m_{i2}), \dots$
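
One way to picture the CadentFlow definition is as a small data structure. The sketch below is an illustration only: the class and field names are invented here, Γ is read as the application-level objective, and each pair (t_ij, m_ij) is read as a (metric type, metric value) pair. The slide does not spell out either reading, so both are assumptions.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class FlowEntry:
    flow_id: str                      # f_i
    # T_i: assumed here to be (metric type, metric value) pairs.
    metrics: List[Tuple[str, float]]


@dataclass
class CadentFlow:
    flows: List[FlowEntry]            # (f_1, T_1), ..., (f_n, T_n)
    objective: str                    # Gamma, assumed to be the application-level objective


# Hypothetical instance with two flows.
cf = CadentFlow(
    flows=[
        FlowEntry("f1", [("priority", 1)]),
        FlowEntry("f2", [("priority", 2), ("deadline_ms", 3.0)]),
    ],
    objective="minimize completion time subject to priorities",
)

for entry in cf.flows:
    print(entry.flow_id, entry.metrics)
print("objective:", cf.objective)
```

Keeping the objective next to the per-flow metrics is what distinguishes this from a plain coflow, which only groups flows.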

slide-14
SLIDE 14

Defining CCT flexibility ratio

  • When computation is the bottleneck, a CadentFlow with deadlines provides flexibility for delaying some flows without affecting application performance
  • In the example, the best Coflow Completion Time (CCT) is 1s, but up to 1.5s is tolerable without any impact
  • CCT flexibility ratio = Max tolerable CCT / Min CCT
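
As a worked instance of the ratio, the sketch below plugs in the two values quoted in the example (a minimum CCT of 1s and a maximum tolerable CCT of 1.5s). The function name is an assumption made for illustration.

```python
def cct_flexibility_ratio(max_tolerable_cct: float, min_cct: float) -> float:
    """Ratio of the largest CCT the application tolerates to the smallest achievable CCT."""
    if min_cct <= 0:
        raise ValueError("min_cct must be positive")
    return max_tolerable_cct / min_cct


# Values from the example: best CCT is 1 s, up to 1.5 s is tolerable.
ratio = cct_flexibility_ratio(1.5, 1.0)
print(f"CCT flexibility ratio = {ratio}")
```

A ratio above 1 means the network has slack: the coflow can finish later than its minimum CCT without slowing the application.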

[Figure: the example with nodes A, B, C, flows f1, f2 and computations c1, c2, c3. The Performance-Optimized schedule is shown for two cases: c1 takes 0.5s (time marks 0.5, 1, 1.5, 2) and c1 takes 1s (time marks 0.5, 1.5, 2, 2.5).]

slide-15
SLIDE 15

Distributed DNN Training CadentFlow

  • Priority-based
      • Assign priorities based on DAG structure
      • Objective: Minimize completion time subject to priorities
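
A minimal sketch of assigning priorities from DAG structure follows. It gives each transfer a priority equal to its depth in the dependency graph, so transfers needed earlier get lower numbers. The diamond-shaped graph and the function name are hypothetical, and using depth as the priority is only one plausible rule, not necessarily the one used in the talk.

```python
from collections import deque


def priorities_from_dag(nodes, edges):
    """Priority = 1 + length of the longest path from any source (Kahn's algorithm)."""
    indegree = {n: 0 for n in nodes}
    children = {n: [] for n in nodes}
    for src, dst in edges:
        children[src].append(dst)
        indegree[dst] += 1

    level = {n: 1 for n in nodes}
    ready = deque(n for n in nodes if indegree[n] == 0)
    visited = 0
    while ready:
        node = ready.popleft()
        visited += 1
        for child in children[node]:
            # A child can only be consumed after its deepest parent.
            level[child] = max(level[child], level[node] + 1)
            indegree[child] -= 1
            if indegree[child] == 0:
                ready.append(child)
    if visited != len(nodes):
        raise ValueError("dependency graph has a cycle")
    return level


# Hypothetical dependency graph: A feeds B and C, which both feed D.
nodes = ["Read A", "Read B", "Read C", "Read D"]
edges = [("Read A", "Read B"), ("Read A", "Read C"),
         ("Read B", "Read D"), ("Read C", "Read D")]
priorities = priorities_from_dag(nodes, edges)
print(priorities)
```

With this graph, Read A gets priority 1, Read B and Read C share priority 2, and Read D gets priority 3.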

[Figure: Sample TensorFlow Model: One Iteration, showing Input Data, operations Read A to Read D and Update A to Update D, and labels p1 to p4 and p1’ to p4’, annotated with p1, p2, p2, p3.]

slide-17
SLIDE 17

Distributed DNN Training CadentFlow

  • Priority-based
      • Assign priorities based on DAG structure
      • Objective: Minimize completion time subject to priorities
  • Deadline-based
      • Assign deadlines based on per-op computation time
      • Objective: Minimize $\max_i(\mathrm{endTime}_i - \mathrm{deadline}_i)$, where $\mathrm{endTime}_i - \mathrm{deadline}_i$ is the delay of flow $i$

[Figure: Sample TensorFlow Model: One Iteration, showing Input Data, operations Read A to Read D and Update A to Update D, and labels p1 to p4 and p1’ to p4’, annotated with t=3ms, t=5ms, t=4ms, t=2ms and d=0ms, d=3ms, d=3ms, d=12ms.]
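
The deadline-based objective can be evaluated for a candidate schedule as sketched below. The deadlines reuse the d values shown on the slide, but mapping them to flows A to D in that order is an assumption, and the per-flow transfer times and both schedules are hypothetical.

```python
def max_lateness(end_times, deadlines):
    """Objective value: max over flows of (endTime_i - deadline_i)."""
    return max(end_times[f] - deadlines[f] for f in deadlines)


def sequential_end_times(order, transfer_ms):
    """End time of each flow when flows use the link one at a time in the given order."""
    clock, ends = 0, {}
    for flow in order:
        clock += transfer_ms[flow]
        ends[flow] = clock
    return ends


# d values from the slide; assigning them to A..D in order is an assumption.
deadlines_ms = {"A": 0, "B": 3, "C": 3, "D": 12}
# Hypothetical transfer times on an exclusive link.
transfer_ms = {"A": 2, "B": 2, "C": 3, "D": 2}

by_deadline = sequential_end_times(["A", "B", "C", "D"], transfer_ms)
d_first = sequential_end_times(["D", "A", "B", "C"], transfer_ms)

lateness_by_deadline = max_lateness(by_deadline, deadlines_ms)
lateness_d_first = max_lateness(d_first, deadlines_ms)
print("earliest deadline first:", lateness_by_deadline, "ms")
print("D sent first:", lateness_d_first, "ms")
```

Sending the flow with the loosest deadline first raises the worst-case delay, which is the quantity this objective tries to minimize.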

slide-18
SLIDE 18

Quantifying benefits achievable with a better network abstraction

  • Representative application: distributed deep learning
  • Methodology
      • Trace distributed deep learning workloads to obtain dependencies and computation/communication times
      • Simulate various network control schemes
          1. TCP (max-min fairness across flows sharing a link)
          2. Minimum Allocation for Desired Duration (MADD) [Coflow control in Varys]
          3. CadentFlow-optimized scheme
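
The first two baseline schemes can be sketched for the simplest case of a single shared link. This is an illustration under stated assumptions, not the simulator used in the talk: max-min fairness is computed by water-filling over per-flow rate demands, and MADD is reduced to one link, where every flow gets the rate that makes all flows finish together at the bottleneck time. Function names and numbers are hypothetical.

```python
def max_min_fair(demands, capacity):
    """Water-filling: max-min fair rates on one link for the given rate demands."""
    rates = {}
    remaining = capacity
    pending = sorted(demands.items(), key=lambda kv: kv[1])
    while pending:
        share = remaining / len(pending)
        flow, demand = pending[0]
        if demand <= share:
            # The smallest demand fits within an equal share: satisfy it fully.
            rates[flow] = demand
            remaining -= demand
            pending.pop(0)
        else:
            # Every remaining flow wants more than the equal share.
            for name, _ in pending:
                rates[name] = share
            break
    return rates


def madd_rates(sizes, capacity):
    """Single-link MADD: rate_i = size_i / gamma, so all flows end at time gamma."""
    gamma = sum(sizes.values()) / capacity
    return {flow: size / gamma for flow, size in sizes.items()}, gamma


fair = max_min_fair({"f1": 2.0, "f2": 8.0, "f3": 10.0}, capacity=12.0)
madd, gamma = madd_rates({"f1": 2.0, "f2": 6.0}, capacity=4.0)
print("max-min fair rates:", fair)
print("MADD rates:", madd, "finish time:", gamma)
```

Neither scheme looks at which flow the application needs first: max-min fairness splits bandwidth evenly, and MADD only guarantees that the coflow as a whole finishes as early as possible.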

[Figure: Sample TensorFlow Model: One Iteration, showing Input Data, operations Read A to Read D and Update A to Update D, and labels p1 to p4 and p1’ to p4’.]

slide-19
SLIDE 19

Performance Improvement

[Figure: iteration time (relative to TCP, axis from 0.2 to 1.2) under Coflow-optimized and CadentFlow-optimized schemes for AlexNet-v2, CifarNet, Inception-v1, Inception-v3, MobileNet-v2, ResNet-v1-50, ResNet-v1-152, ResNet-v1-200, ResNet-v2-101, ResNet-v2-152 and VGG-19; 8 workers, 8 PS.]

Up to 25% improvement in iteration time with CadentFlow

slide-20
SLIDE 20

Performance Improvement

[Figure: iteration time (relative to TCP) per model, Coflow-optimized vs. CadentFlow-optimized; 8 workers, 8 PS.]

Coflow optimization may delay completion time because smaller parameters are delayed

slide-23
SLIDE 23

Performance Improvement

[Figure: four panels covering AlexNet-v2, CifarNet, Inception-v1, Inception-v3, MobileNet-v2, ResNet-v1-50, ResNet-v1-152, ResNet-v1-200, ResNet-v2-101, ResNet-v2-152 and VGG-19. Two panels show iteration time (relative to TCP) under Coflow-optimized and CadentFlow-optimized schemes, one with 8 workers, 8 PS and one with 16 workers, 16 PS. Two panels show the CCT flexibility ratio (max feasible CCT / min CCT).]

When gain in iteration time is lower, CCT flexibility ratio is higher

slide-24
SLIDE 24

Open Challenges

  • Extracting the application objective
  • Designing network controllers that can handle multiple application objectives
      • How to handle conflicting objectives?
  • Implementation challenges
      • Real-time decision making
      • End host vs. in-network implementation

slide-25
SLIDE 25

Thank You

Email: sangeetha.aj@uci.edu