Towards An Application Objective-Aware Network Interface

Sangeetha Abdu Jyothi, Sayed Hadi Hashemi, Roy Campbell, Brighten Godfrey
HotCloud’20

Evolution of Application Network Interface (ANI)
ANI and its metrics, each layered over the network fabric:
• Packet: delay, jitter
• Flow: flow completion time
• Coflow: coflow completion time

What is the ultimate goal of an ANI?
Translating application requirements into actionable network requirements.
Are current ANIs sufficient?

Understanding an Application’s Objective
• Applications have complex interdependencies between computation and communication
• Prioritizing flows based on the computations in the succeeding stage is critical
Current abstractions fail to capture the application objective effectively.
[Figure: example with nodes A, B and C, flows f1 and f2, and computations c1, c2 and c3; network and compute timelines for a coflow-optimized schedule and a performance-optimized schedule]
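
The gap between a coflow-level metric and the application's objective can be illustrated with a toy calculation. The sketch below is not the example from the slide's figure: the two flows, the link, and all timings are hypothetical, and the computations are assumed to run one after another on a single device. Both schedules have the same coflow completion time, yet the application finishes earlier when the flow needed first is sent first.

```python
def iteration_time(flow_arrival, compute_time, order):
    """Run computations sequentially in `order`; each one starts once the flow
    it needs has arrived and the previous computation has finished."""
    clock = 0.0
    for name in order:
        start = max(clock, flow_arrival[name])
        clock = start + compute_time[name]
    return clock

# Hypothetical workload: two equal flows share one link; each would take 1 s
# alone. c1 consumes f1, c2 consumes f2, and each computation takes 0.5 s.
compute_time = {"c1": 0.5, "c2": 0.5}

# Fair sharing: each flow gets half the link, so both arrive at t = 2 s.
fair_share = {"c1": 2.0, "c2": 2.0}
# f1 first: f1 arrives at 1 s, then f2 uses the whole link and arrives at 2 s.
prioritized = {"c1": 1.0, "c2": 2.0}

fair_cct = max(fair_share.values())
prio_cct = max(prioritized.values())
fair_total = iteration_time(fair_share, compute_time, ["c1", "c2"])
prio_total = iteration_time(prioritized, compute_time, ["c1", "c2"])

print("CCT:", fair_cct, prio_cct)
print("application completion time:", fair_total, prio_total)
```

A scheduler that only sees the coflow cannot tell these two schedules apart, which is the motivation for exposing the application's objective to the network.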

An Example Application: Distributed Deep Learning
• Gigabytes of data are transferred in each iteration, which lasts milliseconds (e.g., VGG-16 sends ~1 GB of data every 200 ms)
• Parameters are consumed in a particular order
• Sending parameter updates from the parameter server (PS) to the workers in the best order can accelerate training
[Figure: a parameter server and three workers; sample TensorFlow model, one iteration, with input data, Read A to Read D, op1 to op4, op1’ to op4’, and Update A to Update D]
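
For a sense of scale, the VGG-16 figure above can be turned into an average bandwidth. This is a back-of-the-envelope sketch that assumes 1 GB means 10^9 bytes and that the transfer is spread evenly over the iteration; real traffic is burstier.

```python
data_per_iteration_bytes = 1e9   # ~1 GB per iteration, as quoted for VGG-16
iteration_seconds = 0.2          # 200 ms per iteration

# Convert bytes to bits, divide by the iteration time, and express in Gbit/s.
required_gbps = data_per_iteration_bytes * 8 / iteration_seconds / 1e9
print(f"{required_gbps:.0f} Gbit/s on average")
```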

Other Applications
• User-facing partition-aggregation workloads (remote dependency resolution at a Web proxy)
• Graph processing systems
• Iterative analytics with deadlines (e.g., Naiad)
and so on …
[Figure: client and proxy with requests Req 1 … Req n; Gather, Scatter, Update]

Towards A Novel Application Network Interface
• Computation is completely represented by a DAG. What is the network equivalent?
• The goal is to capture an application’s network objective
• CadentFlow:
  • A set of flows with metrics AND
  • An application-level objective
  • Metrics may be priority, deadline, weight, etc.
$CF = \{(f_1, T_1), (f_2, T_2), \dots, (f_n, T_n), \Gamma\}$, where $T_i = (t_{i1}, m_{i1}), (t_{i2}, m_{i2}), \dots$
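
One way to picture the CadentFlow definition is as a small data structure. The sketch below is only an illustration: the slide does not spell out what t and m stand for, so reading each (t, m) pair as a metric type and its value is an assumption, and the class names, fields, and host names are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Flow:
    src: str
    dst: str
    size_bytes: int

@dataclass
class CadentFlow:
    # Each entry pairs a flow f_i with its list T_i = [(t_i1, m_i1), ...].
    # Assumption: t is a metric type (e.g. "priority") and m is its value.
    flows: list = field(default_factory=list)
    objective: str = ""   # stands in for the application-level objective, Γ

    def add(self, flow, *metrics):
        self.flows.append((flow, list(metrics)))

    def metric(self, index, metric_type):
        """Return the value of `metric_type` on flow `index`, or None."""
        for t, m in self.flows[index][1]:
            if t == metric_type:
                return m
        return None

cf = CadentFlow(objective="minimize completion time subject to priorities")
cf.add(Flow("ps0", "worker0", 4_000_000), ("priority", 1))
cf.add(Flow("ps0", "worker0", 16_000_000), ("priority", 2), ("deadline_ms", 3))

print(cf.metric(1, "deadline_ms"))
```

Keeping the objective separate from the per-flow metrics mirrors the definition: the metrics describe individual flows, while Γ describes what the application wants from the set as a whole.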

Defining CCT flexibility ratio
• When computation is the bottleneck, CadentFlow with deadlines provides flexibility to delay some flows without affecting application performance
• In the example, the best Coflow Completion Time (CCT) is 1s, but up to 1.5s is tolerable without any impact
• CCT flexibility ratio = Max tolerable CCT / Min CCT
[Figure: two performance-optimized schedules for the example with nodes A, B and C, flows f1 and f2, and computations c1, c2 and c3: one where c1 takes 0.5s and one where c1 takes 1s]
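
Applying the ratio to the numbers in the example gives 1.5 s / 1 s. A minimal sketch, with a hypothetical function name and basic input checks added for illustration:

```python
def cct_flexibility_ratio(max_tolerable_cct, min_cct):
    """Max tolerable CCT divided by min CCT; 1.0 means no slack at all."""
    if min_cct <= 0:
        raise ValueError("min_cct must be positive")
    if max_tolerable_cct < min_cct:
        raise ValueError("max tolerable CCT cannot be below the min CCT")
    return max_tolerable_cct / min_cct

# Numbers from the example: best CCT is 1 s, up to 1.5 s is tolerable.
ratio = cct_flexibility_ratio(1.5, 1.0)
print(ratio)
```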

Distributed DNN Training CadentFlow
• Priority-based
  • Assign priorities based on DAG structure
  • Objective: Minimize completion time subject to priorities
• Deadline-based
  • Assign deadlines based on per-op computation time
  • Objective: Minimize $\max_i (\text{endTime}_i - \text{deadline}_i)$, where $\text{endTime}_i - \text{deadline}_i$ is the delay of flow $i$
[Figure: sample TensorFlow model, one iteration. Priority labels: Read A p1, Read B p2, Read C p2, Read D p3. Deadline labels: Read A d=0ms, Read B d=3ms, Read C d=3ms, Read D d=12ms. Computation times: op1 t=3ms, op2 and op3 t=5ms and t=4ms, op4 t=2ms.]
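
The deadline labels in the figure can be reproduced with a simple rule: the deadline of a read is the total computation time of every op that must run before the op consuming it. This is one reading that matches the figure's numbers, not necessarily the authors' procedure. The DAG edges (op1 feeds op2 and op3, which feed op4) and the assignment of 5 ms to op2 and 4 ms to op3 are assumptions, and the flow end times at the bottom are made up.

```python
# Per-op compute times (ms); which of op2/op3 takes 5 ms is an assumption.
compute_ms = {"op1": 3, "op2": 5, "op3": 4, "op4": 2}
# Assumed DAG: op1 -> op2, op1 -> op3, and both op2 and op3 -> op4.
depends_on = {"op1": [], "op2": ["op1"], "op3": ["op1"], "op4": ["op2", "op3"]}
# Each parameter read feeds exactly one op.
consumer = {"Read A": "op1", "Read B": "op2", "Read C": "op3", "Read D": "op4"}

def ancestors(op):
    """All ops that must finish before `op` can start."""
    seen, stack = set(), list(depends_on[op])
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(depends_on[current])
    return seen

deadline_ms = {
    read: sum(compute_ms[a] for a in ancestors(op))
    for read, op in consumer.items()
}
print(deadline_ms)

def max_lateness(end_time_ms):
    """The deadline-based objective: max over flows of endTime - deadline."""
    return max(end_time_ms[r] - deadline_ms[r] for r in deadline_ms)

# Hypothetical flow end times (ms) produced by some network schedule.
example_end_ms = {"Read A": 1, "Read B": 3, "Read C": 5, "Read D": 9}
print(max_lateness(example_end_ms))
```

In this made-up schedule Read C is the worst offender, arriving 2 ms after its deadline, so a scheduler minimizing the objective would try to pull Read C forward even at the cost of delaying Read D, which has slack.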

Quantifying benefits achievable with a better network abstraction
• Representative application: distributed deep learning
• Methodology
  • Trace distributed deep learning workloads to obtain dependencies and computation/communication times
  • Simulate various network control schemes
    1. TCP (max-min fairness across flows sharing a link)
    2. Minimum Allocation for Desired Duration (MADD) [Coflow control in Varys]
    3. CadentFlow-optimized scheme
[Figure: sample TensorFlow model, one iteration]
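
The first two schemes can be sketched for the simplest case of a single shared link. This is not the simulator used in the study: the function names are hypothetical, max-min fairness is shown as plain water-filling over rate demands, and MADD is reduced to one link, whereas Varys derives the coflow's completion time from the bottleneck across all ports.

```python
def max_min_fair_rates(demands, capacity):
    """Water-filling on one link: satisfy the smallest demands in full,
    then split what is left equally among the remaining flows."""
    rates = {}
    remaining = capacity
    pending = sorted(demands.items(), key=lambda kv: kv[1])
    while pending:
        share = remaining / len(pending)
        name, demand = pending[0]
        if demand <= share:
            rates[name] = demand
            remaining -= demand
            pending.pop(0)
        else:
            # Every remaining flow wants more than the equal share.
            for other, _ in pending:
                rates[other] = share
            break
    return rates

def madd_rates(sizes, capacity):
    """Single-link MADD sketch: give each flow of a coflow just enough rate
    so that all flows finish together at the earliest possible time."""
    gamma = sum(sizes.values()) / capacity   # coflow completion time
    return {name: size / gamma for name, size in sizes.items()}

fair = max_min_fair_rates({"a": 2, "b": 8, "c": 8}, capacity=12)
madd = madd_rates({"f1": 2, "f2": 6}, capacity=4)
print(fair)
print(madd)
```

Under the MADD allocation both flows finish at the same instant, which minimizes the coflow completion time but gives no preference to the flow the application needs first.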

Performance Improvement
• Up to 25% improvement in iteration time with CadentFlow
• Coflow optimization may delay completion time because smaller parameters are delayed
• When gain in iteration time is lower, CCT flexibility ratio is higher
[Figure: bar charts comparing the coflow-optimized and CadentFlow-optimized schemes for 8 workers, 8 PS and for 16 workers, 16 PS. Axes: iteration time (relative to TCP) and CCT flexibility ratio (max feasible CCT / min CCT). Models: AlexNet-v2, CifarNet, Inception-v1, Inception-v3, MobileNet-v2, ResNet-v1-50, ResNet-v1-152, ResNet-v1-200, ResNet-v2-101, ResNet-v2-152, VGG-19.]
