In-Network Computing to the Rescue of Faulty Links - PowerPoint PPT Presentation



SLIDE 1

In-Network Computing to the Rescue of Faulty Links

Acknowledgements: Isaac Pedisich (UPenn), Gordon Brebner (Xilinx), DARPA Contracts No. HR0011-17-C-0047 and HR0011-16-C-0056, and NSF grant CNS-1513679.


SLIDE 2

[Diagram: Node 1 connected to Node 2 over a Path]

SLIDE 3

[Diagram: Node 1 connected to Node 2 over a Path]

Packet loss -> Application malfunction

SLIDE 4

[Diagram: Node 1 connected to Node 2 over a Path, with congestion on the path]

Congestion: unstable, various mitigations

SLIDE 5

[Diagram: Node 1 connected to Node 2 over a Path, with congestion and corruption]

Corruption: stable, mitigated by rerouting + replacement

SLIDE 6

[Diagram: Node 1 connected to Node 2; congestion and corruption shown on a single Link]

SLIDE 7

[Diagram: Node 1 connected to Node 2 over a Path, with congestion and corruption]

Traffic Engineering

SLIDE 8

[Plot "Loss and TCP throughput": TCP throughput (Gb/s) over time (seconds) as the loss rate varies from 10^-7 to 10^-1]

Loss disproportionate to corruption: TCP throughput degrades far more than the raw corruption rate would suggest

SLIDE 9

[Diagram: Node 1 connected to Node 2, with congestion and corruption]

Current solution: disable the faulty link(s)

SLIDE 10

[Diagram: Node 1 connected to Node 2, with congestion and corruption]

This talk: Forward Error-Correction

SLIDE 11

[Diagram: Node 1 connected to Node 2, with congestion and corruption]

This talk: FEC

SLIDE 12

This talk

[Diagram: Node 1 connected to Node 2, with FEC applied inside the network]

In-network solution relying on computing

SLIDE 13

This talk

[Diagram: Node 1 connected to Node 2, with FEC applied inside the network]

In-network solution relying on computing, building on recent advances in programmable datacenter networks

SLIDE 14

Design goals

  • Transparent to the rest of the network
  • Low overhead (beyond the FEC overhead)
  • Low-complexity activation: adjacent elements decide between themselves whether to activate FEC
  • Support for different traffic classes (affecting latency and redundancy)

SLIDE 15

Where to decide?

  • Central: a single element decides for other elements' links
  • Distributed: each element sees to its own links (faster reaction time)

SLIDE 16

What to do?

  • Repeat: resend information
  • Redundancy: send redundant information, in the hope that more of it gets through

SLIDE 17

What layer FEC?

  • Physical: requires changing Ethernet
  • Link: overhead only on faulty links
  • Network: end-to-end overhead

SLIDE 18

What layer FEC? (Physical, Link, Network)

  • Link: overhead only on faulty links
  • Network: end-to-end overhead

SLIDE 19

What layer FEC? (Physical, Link, Network)

  • Link: overhead only on faulty links

SLIDE 20

LL FEC

[Diagram: Client - Encoding Switch - Faulty Link - Decoding Switch - Server]

(But the design could work on non-switch-to-switch links)

SLIDE 21


Data

SLIDE 22


Data

SLIDE 23


Parity Data

SLIDE 24


Parity Data

SLIDE 25

[Diagram: a block of k data frames and h parity frames]

SLIDE 26


SLIDE 27


SLIDE 28

Stats collected on both switches (see paper)

SLIDE 29


Tagging
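
The deck does not give the tag format. As a hedged sketch in C (the language of the external logic), the per-frame tag added by the encoding switch could carry just enough context for the decoding switch to group frames into blocks and spot losses; every field name and width below is hypothetical, not the actual wire format.

    #include <stdint.h>

    /* Hypothetical per-frame FEC tag, present only on the protected link.
     * Field names and sizes are illustrative, not the paper's wire format. */
    struct fec_tag {
        uint8_t  traffic_class;  /* class chosen by protocol+port classification */
        uint8_t  block_id;       /* block this frame belongs to (wraps around) */
        uint8_t  index;          /* 0..k-1 for data frames, k..k+h-1 for parity */
        uint8_t  flags;          /* e.g. bit 0 set when the frame carries parity */
        uint16_t orig_len;       /* original frame length, to strip padding on decode */
    } __attribute__((packed));

Stripping the tag again at the decoding switch is what would keep the scheme transparent to the rest of the network.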

SLIDE 30


SLIDE 31


Parity frames

SLIDE 32


1 block = k data frames + h parity frames
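
As a minimal sketch of the encoding step, assuming for simplicity a single XOR parity frame per block (the h = 1 case; the real implementation may use a stronger code for h > 1):

    #include <stddef.h>
    #include <stdint.h>
    #include <string.h>

    #define MAX_FRAME 1518   /* illustrative maximum frame size */

    /* XOR-parity sketch: given the k buffered data frames of a block, produce
     * one parity frame. Shorter frames are treated as zero-padded. */
    size_t encode_block_xor(const uint8_t frames[][MAX_FRAME],
                            const size_t lens[], size_t k,
                            uint8_t parity[MAX_FRAME])
    {
        size_t parity_len = 0;
        memset(parity, 0, MAX_FRAME);
        for (size_t i = 0; i < k; i++) {
            for (size_t b = 0; b < lens[i]; b++)
                parity[b] ^= frames[i][b];      /* XOR byte-wise across the block */
            if (lens[i] > parity_len)
                parity_len = lens[i];           /* parity is as long as the longest frame */
        }
        return parity_len;
    }

With one XOR parity frame the decoder can rebuild any single lost data frame of a block by XOR-ing the parity with the k - 1 frames that did arrive; tolerating more losses per block needs h > 1 and a stronger erasure code.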

SLIDE 33
SLIDE 34

Traffic classification: protocol + port (configured by the network controller)
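
Classification itself sits in the P4 pipeline; purely as an illustration, in C to match the external-function language, a protocol+port rule installed by the controller might look like the sketch below. The class names, protocol numbers, and ports are made-up examples, not the paper's configuration.

    #include <stdint.h>

    enum fec_class { FEC_NONE = 0, FEC_LOW_LATENCY = 1, FEC_BULK = 2 };

    /* Example protocol+port classifier; the controller would install the
     * actual match values. The ones below are invented for illustration. */
    enum fec_class classify(uint8_t ip_proto, uint16_t dst_port)
    {
        if (ip_proto == 6 && dst_port == 5001)   /* TCP, iperf-style bulk flow */
            return FEC_BULK;                     /* larger blocks, more redundancy */
        if (ip_proto == 17 && dst_port == 53)    /* UDP, DNS */
            return FEC_LOW_LATENCY;              /* small blocks, flushed quickly */
        return FEC_NONE;                         /* other traffic passes through untouched */
    }

The selected class would then drive the (k, h) parameters and how long the encoder may hold frames, which is how different classes could trade latency against redundancy.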

SLIDE 35

Implementation

  • High-level logic in P4 (e.g., traffic classification)
  • Two toolchains: Xilinx’s SDNet and P4’s p4c-BMv2
  • External logic in C, targeting both an FPGA board (Xilinx ZCU102) and a CPU (x86)
  • Work in progress: stats gathering, hardware decoding

SLIDE 36
SLIDE 37

FEC Implementation

[Diagram: Packet Port In -> Pre-processor -> FEC UserEngine (P4/PX) -> Post-processor -> Packet Port Out; the UserEngine exchanges Data Words In/Out with the FEC External Function written in C, and the stages are connected by Packet and Packet Stream interfaces]

SLIDE 38

Evaluation

  • Unmodified host stacks and applications
  • Raw throughput: DPDK vs. FPGA/CPU implementations of the encoder (FPGA: 9.3 Gbps; CPU: 1.4 Gbps with 8 physical cores)
  • Goodput vs. error rate: iperf measurements vs. model

SLIDE 39

Evaluation

[Plot: throughput (Gb/s) vs. error rate (percent of packets lost, 10^-5 to 10^-1)]

Curves: No FEC, (25, 1), (25, 5), (25, 10), (10, 5), (5, 5)
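
Assuming the legend tuples are (k, h) in the sense of slide 32 (one block = k data frames + h parity frames), the added redundancy of each configuration is h / (k + h): (25, 1) ≈ 3.8%, (25, 5) ≈ 16.7%, (25, 10) ≈ 28.6%, (10, 5) ≈ 33.3%, and (5, 5) = 50%.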


SLIDE 40

Evaluation

[Plot: throughput (Gb/s) vs. error rate (percent of packets lost, 10^-5 to 10^-1) for No FEC, (25, 1), (25, 5), (25, 10), (10, 5), (5, 5)]

SLIDE 41

Evaluation

[Plot: throughput (Gb/s) vs. error rate (percent of packets lost, 10^-5 to 10^-1) for No FEC, (25, 1), (25, 5), (25, 10), (10, 5), (5, 5)]

SLIDE 42

[Plot: congestion window size (KB, 10^1 to 10^3) vs. error rate (percent of packets lost, 10^-5 to 10^-1) for No FEC, (25, 1), (25, 5), (25, 10), (10, 5), (5, 5)]

SLIDE 43

Conclusions

  • Design for in-network lossy-link mitigation; components: FEC + management logic
  • Goals: network transparency, quick reaction, configurable classes, low non-FEC overhead
  • Compatible with existing/centralised approaches, to alert technicians/SREs
  • Ongoing work: completing the implementation, integrating new “externs” on heterogeneous host/network

SLIDE 44

In-Network Computing to the Rescue of Faulty Links

Acknowledgements: Isaac Pedisich (UPenn), Gordon Brebner (Xilinx), DARPA Contracts No. HR0011-17-C-0047 and HR0011-16-C-0056, and NSF grant CNS-1513679.