1 Background | Problems | Challenges | Design | Evaluation | Summary - - PowerPoint PPT Presentation




slide-1
SLIDE 1

SPEED


Resource-Efficient and High-Performance Deployment for Data Plane Programs

Xiang Chen, Hongyan Liu, Qun Huang, Peiqiao Wang, Dong Zhang, Haifeng Zhou, Chunming Wu

slide-4
SLIDE 4

Control Plane: Applications (Monitor, Security, Routing)
 gen: applications generate DP Programs (e.g., P4)
 input: DP programs are the input to Program Deployment
 deploy: Program Deployment deploys them onto the data plane
Data Plane: Programmable Switches (e.g., Tofino, Trident)

Background | Problems | Challenges | Design | Evaluation | Summary

slide-6
SLIDE 6

Data Plane Program Deployment

Input: data plane programs with match-action tables (MATs)

Program (4 MATs): MAC learn, Routing, Switching, ACL (between Pkt in and Pkt out)

Details of an MAT (ACL):
 Match: Pkt.srcip, Pkt.dstip
 Rules: on hit, Action: Output to Port1
 Else: Action: Drop
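The ACL table above can be sketched in a few lines of Python. This is only an illustration of the match-action idea, not how a switch or a P4 compiler implements it; the `MatchActionTable` class, the dictionary packet model, and the IP addresses are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Tuple

Packet = Dict[str, str]   # hypothetical packet model: header field -> value
Key = Tuple[str, ...]     # values of the match fields, in order
Action = Callable[[Packet], str]


def output_to_port1(pkt: Packet) -> str:
    return "output:Port1"


def drop(pkt: Packet) -> str:
    return "drop"


@dataclass
class MatchActionTable:
    """Hypothetical model of an MAT: match fields, rules, and a default action."""
    name: str
    match_fields: Tuple[str, ...]
    default_action: Action
    rules: Dict[Key, Action] = field(default_factory=dict)

    def apply(self, pkt: Packet) -> str:
        # Build the lookup key from the packet, then run the hit or miss action.
        key = tuple(pkt[f] for f in self.match_fields)
        action = self.rules.get(key, self.default_action)
        return action(pkt)


# ACL from the slide: match on srcip/dstip, output to Port1 on hit, else drop.
acl = MatchActionTable("ACL", ("srcip", "dstip"), default_action=drop)
acl.rules[("10.0.0.1", "10.0.0.2")] = output_to_port1  # illustrative rule

result_hit = acl.apply({"srcip": "10.0.0.1", "dstip": "10.0.0.2"})
result_miss = acl.apply({"srcip": "10.0.0.9", "dstip": "10.0.0.2"})
```

The rules occupy stage memory and the actions occupy ALUs, which is why the number and size of MATs matter for deployment.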

slide-7
SLIDE 7

Data Plane Program Deployment

Input: data plane programs with match-action tables (MATs)
Target: programmable switches with switch stages

Program (4 MATs): MAC learn, Routing, Switching, ACL (between Pkt in and Pkt out)

Switch Arch (4 stages): S1, S2, S3, S4
 Stage resources: ALUs for actions of MATs; RAM for MAT rules

slide-9
SLIDE 9

Data Plane Program Deployment

Input: data plane programs with match-action tables (MATs)
Target: programmable switches with switch stages
Output: mapping between an MAT and a stage

Enables deployment of advanced network applications:
(1) Software-defined measurement: FlowRadar, Martini, PINT, OmniMon, etc.
(2) In-network acceleration: NetCache, NetChain, NetLock, Cheetah, etc.
(3) Traffic scheduling and optimization: PIFO, PIEO, HPCC, P4air, etc.

slide-10
SLIDE 10

Requirements of Program Deployment

Given multiple input data plane programs, simultaneously deploy these programs on the network with:

(1) Resource efficiency
 given that switch resources are limited (e.g., <10 MB memory)

(2) High end-to-end packet processing performance
 satisfy the tight latency/throughput requirements issued by apps

slide-12
SLIDE 12

Limitations of Existing Solutions

(1) Compiler design: RMT (NSDI’15), dRMT (SIGCOMM’17), etc.
(2) Virtualization: Hyper4 (CoNEXT’16), P4Visor (CoNEXT’18), etc.

These support program deployment on a single programmable switch:
(1) Poor resource efficiency when scaling to multiple programs
(2) Low performance because constraints (device connectivity, traffic routing, etc.) are not considered

slide-13
SLIDE 13

Goal

Provide program deployment that achieves:
(1) Resource Efficiency: make the best use of switch resources
(2) High Performance: low latency and high throughput

Input: Program#1, Program#2, ···, Program#N
Our Framework
Output: programs (P#1, P#2, P#3) deployed on Programmable Networks

slide-16
SLIDE 16

Challenges

(1) Program diversity: case-by-case analysis and deployment
 e.g., Count-Min (sequential layout), NetCache (branch-heavy)
(2) Heterogeneous constraints: complicated problem solving
 switch resource limitations vs. network-wide constraints (e.g., device connectivity)
(3) Inter-device coordination: pkt scheduling among switches
 to preserve original packet processing semantics

slide-18
SLIDE 18

SPEED Framework

(1) Table dependency graph for program diversity
(2) Program merging for achieving resource efficiency
(3) One big switch for heterogeneous constraints
(4) Inter-device packet scheduling for device coordination

This Talk

slide-19
SLIDE 19

Table Dependency Graph (TDG)

Universal intermediate representation of data plane programs
 T = (V_T, E_T): a node in V_T is an MAT; an edge in E_T is an MAT dependency

[Figure: an L2/L3 routing program and the TDG for the program. Figures extracted from “Compiling Packet Programs to Reconfigurable Switches”, NSDI 2015]

slide-20
SLIDE 20

Table Dependency Graph (TDG)

Universal intermediate representation of data plane programs
 T = (V_T, E_T): a node in V_T is an MAT; an edge in E_T is an MAT dependency

[Figure: an L2/L3 routing program and the TDG for the program]

Benefit#1: Handle program diversity
Benefit#2: Ease SPEED's analysis of program properties
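A TDG is a plain directed acyclic graph, so it can be sketched with a node list and an edge set. The `TDG` class below is a hypothetical illustration, not SPEED's data structure; the chain a1 → a2 → a3 is assumed for the example. The topological ordering it computes (Kahn's algorithm) is the form the merging step works on.

```python
from collections import deque
from typing import Dict, Iterable, List, Set, Tuple


class TDG:
    """Hypothetical table dependency graph: nodes are MATs, edges are dependencies."""

    def __init__(self, nodes: Iterable[str], edges: Iterable[Tuple[str, str]]):
        self.nodes: List[str] = list(nodes)
        self.edges: Set[Tuple[str, str]] = set(edges)

    def topological_order(self) -> List[str]:
        # Kahn's algorithm: repeatedly emit a node with no unprocessed predecessor.
        indegree: Dict[str, int] = {n: 0 for n in self.nodes}
        successors: Dict[str, List[str]] = {n: [] for n in self.nodes}
        for u, v in sorted(self.edges):
            successors[u].append(v)
            indegree[v] += 1
        ready = deque(n for n in self.nodes if indegree[n] == 0)
        order: List[str] = []
        while ready:
            u = ready.popleft()
            order.append(u)
            for v in successors[u]:
                indegree[v] -= 1
                if indegree[v] == 0:
                    ready.append(v)
        if len(order) != len(self.nodes):
            raise ValueError("dependency cycle: not a valid TDG")
        return order


# Assumed example: a three-MAT chain a1 -> a2 -> a3, nodes given out of order.
t1 = TDG(["a3", "a1", "a2"], [("a1", "a2"), ("a2", "a3")])
order = t1.topological_order()
```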

slide-24
SLIDE 24

Program Merging for Resource Efficiency

Motivation#1: Requirement for reducing resource usage
Motivation#2: Occurrence of redundant MATs among programs

In Software-defined Measurement (SDM):
 Program#1 for flow count: A: CRC hashing
 Program#2 for heavy hitter: B: CRC hashing
 Program#3 for anomalies: C: CRC hashing
 Redundant MATs (3× hashing)

Program#4 merges #1-#3: A + B + C = CRC hashing (only one hashing)

slide-25
SLIDE 25

Program Merging for Resource Efficiency

Algorithm based on longest common subsequence (LCS)
 Input: n TDGs
 Output: a compound TDG, Tm
 Workflow: n-1 iterations; each iteration takes 2 TDGs to merge

slide-29
SLIDE 29

[Figure: merging two TDGs]
(a) TDG T1: a1, a2, a3
(b) TDG T2: b1, b2, b3, b4, b5
(c) Topological Orderings: a1 a2 a3; b1 b2 b3 b4 b5
(d) Pairs of Redundant MATs: (a1, b1), (a2, b3), (a3, b4)
(e) Longest Common Subsequence (LCS) of the two orderings
(f) Merging T1 and T2 into TDG Tm: c1, b2, c2, c3, b5
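The figure's steps (c) to (f) can be sketched as an LCS over the two topological orderings, where two MATs count as equal when they form a redundant pair. This is a minimal sketch under assumptions: it merges node orderings only and does not rebuild the edges of Tm, the function names are hypothetical, and the `c1, c2, ...` naming follows the figure.

```python
from typing import List, Set, Tuple


def lcs_pairs(o1: List[str], o2: List[str],
              redundant: Set[Tuple[str, str]]) -> List[Tuple[int, int]]:
    """Index pairs (i, j) of a longest common subsequence under `redundant`."""
    n, m = len(o1), len(o2)
    # dp[i][j] = LCS length of o1[i:] and o2[j:]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            best = max(dp[i + 1][j], dp[i][j + 1])
            if (o1[i], o2[j]) in redundant:
                best = max(best, 1 + dp[i + 1][j + 1])
            dp[i][j] = best
    pairs, i, j = [], 0, 0
    while i < n and j < m:
        if (o1[i], o2[j]) in redundant and dp[i][j] == 1 + dp[i + 1][j + 1]:
            pairs.append((i, j))
            i, j = i + 1, j + 1
        elif dp[i + 1][j] >= dp[i][j + 1]:
            i += 1
        else:
            j += 1
    return pairs


def merge_orderings(o1: List[str], o2: List[str],
                    redundant: Set[Tuple[str, str]]) -> List[str]:
    """Merge two orderings; each matched redundant pair becomes one node c<k>."""
    merged: List[str] = []
    i = j = k = 0
    # The sentinel pair flushes the unmatched tails of both orderings.
    for pi, pj in lcs_pairs(o1, o2, redundant) + [(len(o1), len(o2))]:
        merged.extend(o1[i:pi])   # unmatched MATs of T1 before the match
        merged.extend(o2[j:pj])   # unmatched MATs of T2 before the match
        if pi < len(o1):
            k += 1
            merged.append(f"c{k}")
        i, j = pi + 1, pj + 1
    return merged


t1_order = ["a1", "a2", "a3"]
t2_order = ["b1", "b2", "b3", "b4", "b5"]
pairs = {("a1", "b1"), ("a2", "b3"), ("a3", "b4")}
tm_order = merge_orderings(t1_order, t2_order, pairs)
```

For n TDGs, the same pairwise merge would be applied n-1 times, feeding each result into the next iteration.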

slide-31
SLIDE 31

One Big Switch (OBS) Abstraction

To place Tm, SPEED abstracts the substrate network as an OBS
 (1) Separate heterogeneous constraints in two phases
 (2) In each phase, only consider one objective
 Benefit#1: Simplify program deployment
 Benefit#2: Achieve multi-objective deployment

[Figure: S1 (4 stages) and S2 (4 stages) form an OBS (8 stages, first 4 of S1, last 4 of S2)]
Consolidate all stages of all programmable switches

slide-34
SLIDE 34

One Big Switch (OBS) Abstraction

To place Tm, SPEED abstracts the substrate network as an OBS
 Property#1: Separate heterogeneous constraints in two phases
 Property#2: In a phase, one objective and one type of constraints
 Benefit#1: Simplify program deployment
 Benefit#2: Achieve multi-objective deployment

Program deployment in SPEED:
 Phase#1: TDG placement on OBS
 Phase#2: OBS placement on network

slide-35
SLIDE 35

Phase#1: TDG Placement on OBS

Formulate as ILP:
 Goal: For MAT u of Tm, place u on an OBS stage v
 Obj: min (# occupied OBS stages)
 C#1: Per-stage resource limitation
 C#2: MAT dependencies (i.e., edges of Tm)
Solve ILP using Gurobi solver [1]

[Figure: compound TDG Tm (c1, a2, b2, a3, b3) placed on OBS Stages]

[1] Gurobi solver: https://www.gurobi.com/
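To make the objective and the two constraints concrete, here is a brute-force stand-in for the ILP on a toy instance. It is not the Gurobi formulation: it enumerates every assignment, models the per-stage resource limit as a single hypothetical capacity number, and assumes a dependency (u, v) means u must sit in an earlier stage than v. The edges of Tm are also assumed.

```python
from itertools import product
from typing import Dict, List, Optional, Tuple


def place_tdg(nodes: List[str], edges: List[Tuple[str, str]],
              demand: Dict[str, int], num_stages: int,
              capacity: int) -> Optional[Tuple[int, Dict[str, int]]]:
    """Return (occupied stages, MAT -> stage) minimizing occupied stages, or None."""
    best: Optional[Tuple[int, Dict[str, int]]] = None
    for stages in product(range(num_stages), repeat=len(nodes)):
        assign = dict(zip(nodes, stages))
        # C#2: every dependency must go from an earlier stage to a later one.
        if any(assign[u] >= assign[v] for u, v in edges):
            continue
        # C#1: the total demand in each stage must fit the stage capacity.
        load = [0] * num_stages
        for n in nodes:
            load[assign[n]] += demand[n]
        if any(l > capacity for l in load):
            continue
        used = len(set(stages))  # Obj: number of occupied OBS stages
        if best is None or used < best[0]:
            best = (used, assign)
    return best


# Toy Tm: c1 feeds two chains (edges assumed for illustration).
tm_nodes = ["c1", "a2", "a3", "b2", "b3"]
tm_edges = [("c1", "a2"), ("a2", "a3"), ("c1", "b2"), ("b2", "b3")]
unit_demand = {n: 1 for n in tm_nodes}   # hypothetical resource units
placement = place_tdg(tm_nodes, tm_edges, unit_demand, num_stages=4, capacity=2)
```

Enumeration grows exponentially with the number of MATs, which is why an ILP solver is the practical choice for real programs.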

slide-36
SLIDE 36

Phase#2: OBS Placement on Network

Formulate as ILP:
 Goal: For OBS stage u, place u on a real stage v
 Obj: max (throughput) | min (latency)
 C#1: One-to-one mapping
 C#2: Performance metrics
Solve ILP using Gurobi solver [1]

[Figure: OBS Stages mapped onto the Network]

[1] Gurobi solver: https://www.gurobi.com/

slide-37
SLIDE 37

Example: Software-defined Measurement (SDM)

SDM deploys two measurement tasks via SPEED:

TDG1 of Task#1
 MAT a1: [Match] None; [Action] idx = crc32(pkt.srcIP); [Rule Number] 1
 MAT a2: [Match] None; [Action] update(CM, idx); [Rule Number] 1
 MAT a3: [Match] pkt.srcIP; [Action] forward(output_port); [Rule Number] 1024

TDG2 of Task#2
 MAT b1: [Match] None; [Action] idx = crc32(pkt.srcIP); [Rule Number] 1
 MAT b2: [Match] None; [Action] update(ES, idx); [Rule Number] 1
 MAT b3: [Match] pkt.srcIP; [Action] forward(output_port); [Rule Number] 512

slide-38
SLIDE 38

Step#1: Program Merging
 Tm ← Merge(TDG1, TDG2)
 c1 ← Merge(a1, b1)
 [Figure: Tm contains c1, a2, a3, b2, b3]
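One way to see why only a1 and b1 merge in this example: compare the MATs field by field. The check below is an assumed criterion (identical match, action, and rule number), not necessarily the redundancy test SPEED uses; the `MAT` tuple and `redundant_pairs` are hypothetical names.

```python
from typing import List, NamedTuple, Tuple


class MAT(NamedTuple):
    name: str
    match: str
    action: str
    rules: int


tdg1 = [
    MAT("a1", "None", "idx = crc32(pkt.srcIP)", 1),
    MAT("a2", "None", "update(CM, idx)", 1),
    MAT("a3", "pkt.srcIP", "forward(output_port)", 1024),
]
tdg2 = [
    MAT("b1", "None", "idx = crc32(pkt.srcIP)", 1),
    MAT("b2", "None", "update(ES, idx)", 1),
    MAT("b3", "pkt.srcIP", "forward(output_port)", 512),
]


def redundant_pairs(t1: List[MAT], t2: List[MAT]) -> List[Tuple[str, str]]:
    # Assumed criterion: same match fields, same action, same rule number.
    return [(x.name, y.name) for x in t1 for y in t2
            if (x.match, x.action, x.rules) == (y.match, y.action, y.rules)]


mergeable = redundant_pairs(tdg1, tdg2)
```

Under this criterion a2/b2 differ in their action (CM vs. ES) and a3/b3 differ in rule number, so the hashing tables a1 and b1 are the only pair that collapses into c1.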

slide-40
SLIDE 40

Step#1: Program Merging
 Tm ← Merge(TDG1, TDG2)
 c1 ← Merge(a1, b1)

Step#2: Place Tm on OBS
 [Figure: c1, a2, b2, a3, b3 placed on OBS Stage 1 to Stage 4]

Step#3: Place OBS on Network
 [Figure: switches S1 and S2, nodes N1 and N2, Link (N1,N2); Path#1 t=55ms, Path#2 t=32ms]
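For the min-latency objective in Step#3, a toy selection between the two paths in the figure might look as follows. Only the latencies come from the slide; the `free_stages` values, the `Path` tuple, and `pick_path` are hypothetical, and the real Phase#2 solves an ILP rather than comparing paths directly.

```python
from typing import Dict, NamedTuple, Optional


class Path(NamedTuple):
    latency_ms: float   # from the slide: 55 ms and 32 ms
    free_stages: int    # hypothetical: switch stages available along the path


def pick_path(paths: Dict[str, Path], needed_stages: int) -> Optional[str]:
    """Lowest-latency path that still has enough stages for the OBS."""
    feasible = {name: p for name, p in paths.items()
                if p.free_stages >= needed_stages}
    if not feasible:
        return None
    return min(feasible, key=lambda name: feasible[name].latency_ms)


paths = {"Path#1": Path(55.0, 8), "Path#2": Path(32.0, 8)}
chosen = pick_path(paths, needed_stages=4)   # the OBS in Step#2 uses 4 stages
```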

slide-41
SLIDE 41

Evaluation

Testbed: Sender <=> Tofino <=> Receiver; Simulator: Mininet
Workload: 10 real programs (5 SDM, 5 switch.p4)
Comparison: FFL, FFLS (NSDI’15), Heuristics (BFS, NodeRank)

(1) Can SPEED achieve resource efficiency?
(2) Can SPEED achieve high packet processing performance?

More results can be found in our paper :-)

slide-42
SLIDE 42

Can SPEED achieve resource efficiency?

[Figures: Deploy SDM programs; Deploy switch.p4 programs]

Yes! SPEED reduces the number of switch stages by up to 25%

slide-43
SLIDE 43

Can SPEED achieve high performance?

[Figures: AboveNet topology; Internet2 topology]

Yes! SPEED achieves 14%-59% latency reduction

slide-44
SLIDE 44

Takeaways

SPEED: Resource-Efficient and Performant Program Deployment
 (1) TDG, (2) program merging, (3) OBS-based placement
Evaluation on 10 real-world data plane programs:
 (1) save up to 25% switch stages; (2) reduce latency by 14%-59%

slide-45
SLIDE 45

Thank you very much!


Xiang Chen, Hongyan Liu, Qun Huang, Peiqiao Wang, Dong Zhang, Haifeng Zhou, Chunming Wu
Email: wasdnsxchen@gmail.com
Page: wasdns.github.io