SpeedLight: Synchronized Network Snapshots, Nofel Yaseen, John Sonchack, Vincent Liu (PowerPoint presentation)

SLIDE 1

SpeedLight: Synchronized Network Snapshots

Nofel Yaseen, John Sonchack, Vincent Liu

SLIDE 2

Network Measurements

SLIDE 3

Network Measurements

  • Measurements are how we understand networks
    • Operators: configuration, management, and provisioning
    • Architects: designing new protocols and topologies
    • Researchers: measurement studies and evaluation
  • Today’s measurement techniques
    • Single device, e.g., counters, sampling
    • Single path or packet, e.g., pings, INT, ECN

SLIDE 4

A Case for Consistency

[Figure: network with nodes A, B, X, Y]

SLIDE 6

A Case for Consistency

What is the reason for this packet drop?

[Figure: network with nodes A, B, X, Y]

SLIDE 7

A Case for Consistency

SLIDE 8

A Case for Consistency

SLIDE 9

A Case for Consistency

[Figure label: Congestion]

SLIDE 10

A Case for Consistency

[Figure labels: Congestion; Poor Load Balancing]

SLIDE 11

A Case for Consistency

[Figure labels: Congestion; Poor Load Balancing]

  • Single Device: No relationship among measurements across time or devices.
SLIDE 13

A Case for Consistency

[Figure labels: Congestion; Poor Load Balancing]

  • Single Device: No relationship among measurements across time or devices.
  • Single Path or Packet: No relationship among measurements across paths or packets.
SLIDE 14

A Case for Consistency

[Figure labels: Congestion; Poor Load Balancing]

  • Single Device: No relationship among measurements across time or devices.
  • Single Path or Packet: No relationship among measurements across paths or packets.

Existing tools fail to capture simultaneous behavior

SLIDE 15

Our Goal

SLIDE 16

Our Goal

Truly simultaneous behavior is not possible; instead, require:

  • Causal consistency, i.e., the set should make sense: if an event is included, so are all events that caused it
  • Near synchrony, i.e., the set should be as close as possible to an actual state (<RTT)

A set of data-plane measurements that capture the state of the network at approximately a single point in time

SLIDE 17

Speedlight

A set of data-plane measurements that capture the state of the network at approximately a single point in time

SLIDE 18

Speedlight

  • A P4-based system for synchronized network snapshots
  • Implemented on Wedge100BF
  • Can capture the network-wide state of any value accessible in the data plane
  • Amenable to partial deployment
  • <100 μs synchronization, even for large networks

A set of data-plane measurements that capture the state of the network at approximately a single point in time

SLIDE 19

Outline

  • Chandy–Lamport Algorithm
  • Challenges of Taking Synchronized Network Snapshots
  • Protocol
  • Prototype Implementation
  • Evaluation

SLIDE 20

Global Network View

  • Partition the network's events into pre-snapshot and post-snapshot
  • e is pre-snapshot ⇒ all events that caused e are pre-snapshot
    • E.g., a message's send causes its receive, so a pre-snapshot receive requires a pre-snapshot send

Figure adapted from Linh T. X. Phan

[Figure: timelines of processes A and B, with events A0–A3 and B0–B3]

SLIDE 21

Global Network View

  • Partition the network's events into pre-snapshot and post-snapshot
  • e is pre-snapshot ⇒ all events that caused e are pre-snapshot
    • E.g., a message's send causes its receive, so a pre-snapshot receive requires a pre-snapshot send

Figure adapted from Linh T. X. Phan

[Figure: timelines of processes A and B, with events A0–A3 and B0–B3; an inconsistent cut is marked]

SLIDE 22

Global Network View

  • Partition the network's events into pre-snapshot and post-snapshot
  • e is pre-snapshot ⇒ all events that caused e are pre-snapshot
    • E.g., a message's send causes its receive, so a pre-snapshot receive requires a pre-snapshot send

Figure adapted from Linh T. X. Phan

[Figure: timelines of processes A and B, with events A0–A3 and B0–B3; an inconsistent cut and a consistent cut are marked]

SLIDE 23

Chandy–Lamport (CL) Snapshots

  • Messages carry the current snapshot number (SS#)
  • On seeing a message with a new SS# for the first time:
    • The node takes a local checkpoint
    • The node attaches the new SS# to all subsequent messages
  • On seeing a message with an old SS#:
    • The message was in flight; add it to the channel state

[Figure: nodes A, B, C; labels SS# 1, SS# 1, SS# 1]

Figure adapted from Linh T. X. Phan

SLIDE 25

Chandy–Lamport (CL) Snapshots

  • Messages carry the current snapshot number (SS#)
  • On seeing a message with a new SS# for the first time:
    • The node takes a local checkpoint
    • The node attaches the new SS# to all subsequent messages
  • On seeing a message with an old SS#:
    • The message was in flight; add it to the channel state

[Figure: nodes A, B, C; labels SS# 1, SS# 1, SS# 1, SS# 2]

Figure adapted from Linh T. X. Phan

SLIDE 27

Chandy–Lamport (CL) Snapshots

  • Messages carry the current snapshot number (SS#)
  • On seeing a message with a new SS# for the first time:
    • The node takes a local checkpoint
    • The node attaches the new SS# to all subsequent messages
  • On seeing a message with an old SS#:
    • The message was in flight; add it to the channel state

[Figure: nodes A, B, C; labels SS# 1, SS# 1, SS# 1, SS# 2, SS# 2]

Figure adapted from Linh T. X. Phan

SLIDE 28

Chandy–Lamport (CL) Snapshots

  • Messages carry the current snapshot number (SS#)
  • On seeing a message with a new SS# for the first time:
    • The node takes a local checkpoint
    • The node attaches the new SS# to all subsequent messages
  • On seeing a message with an old SS#:
    • The message was in flight; add it to the channel state

[Figure: nodes A, B, C; labels SS# 1, SS# 1, SS# 1, SS# 2, SS# 2, SS# 2]

Figure adapted from Linh T. X. Phan
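The SS#-based rules above can be sketched as a single node's message handler. This is an illustrative model under stated assumptions, not SpeedLight's P4 code: the node's local state is a single message counter, channels are identified by hypothetical names such as "from_A", and `on_message` is a hypothetical handler called once per received message. It also simplifies skipped snapshots, recording an old-SS# message only against the node's current snapshot.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class SnapshotNode:
    """Toy Chandy-Lamport node driven by snapshot numbers (SS#) in messages."""
    current_ss: int = 0          # SS# this node attaches to outgoing messages
    counter: int = 0             # local state being snapshotted
    checkpoints: Dict[int, int] = field(default_factory=dict)   # SS# -> local state
    channel_state: Dict[int, Dict[str, List[str]]] = field(default_factory=dict)

    def take_checkpoint(self, ss: int) -> None:
        # Record local state *before* processing the triggering message,
        # because that message was sent after its sender's checkpoint.
        self.checkpoints[ss] = self.counter
        self.channel_state[ss] = {}
        self.current_ss = ss     # subsequent messages carry the new SS#

    def on_message(self, channel: str, ss: int, payload: str) -> int:
        if ss > self.current_ss:
            # First message with a new SS#: take a local checkpoint.
            self.take_checkpoint(ss)
        elif ss < self.current_ss:
            # Old SS#: the message was in flight when this node checkpointed,
            # so it belongs to the snapshot's channel state (simplified to the
            # current snapshot only).
            self.channel_state[self.current_ss].setdefault(channel, []).append(payload)
        self.counter += 1        # process the message normally
        return self.current_ss   # SS# to attach to messages sent next


node = SnapshotNode()
node.on_message("from_A", 0, "p1")  # ordinary message
node.on_message("from_A", 1, "p2")  # new SS#: checkpoint taken with counter == 1
node.on_message("from_C", 0, "p3")  # old SS#: in flight on channel from_C
print(node.checkpoints)    # {1: 1}
print(node.channel_state)  # {1: {'from_C': ['p3']}}
```

The snapshot for SS# 1 is then the checkpoint from every node plus the in-flight messages each node recorded.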

SLIDE 29

Challenges for Synchronized Network Snapshots

SLIDE 30

Challenges for Synchronized Network Snapshots

  1. CL provides no guarantee of synchrony
    • We want something that’s close to an actual state
  2. CL assumes single-threaded nodes and FIFO channels
    • Modern networks are highly parallel, which breaks consistency
  3. CL assumes general-purpose CPUs
    • Switch data planes are extremely limited
    • Switch CPUs are no better than remote hosts (with respect to consistency)

SLIDE 31

Ensuring Synchrony

Challenge 1: Chandy–Lamport provides no guarantee of synchrony

[Figure: Observer]

SLIDE 32

Ensuring Synchrony

Challenge 1: Chandy–Lamport provides no guarantee of synchrony

[Figure: Observer]

  • Router CPUs are synchronized via PTP
SLIDE 33

Ensuring Synchrony

Challenge 1: Chandy–Lamport provides no guarantee of synchrony

  • Router CPUs are synchronized via PTP
  • The user/observer schedules a snapshot at every router

[Figure: Observer sends "Take SS# n at time t" to the routers]

SLIDE 34

Ensuring Synchrony

Challenge 1: Chandy–Lamport provides no guarantee of synchrony

  • Router CPUs are synchronized via PTP
  • The user/observer schedules a snapshot at every router

[Figure: Observer sends "Take SS# n at time t" to the routers; each router has a CPU and an ASIC]
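The scheduling step can be modeled to show what limits synchrony. The sketch below is a toy model, not SpeedLight's control plane. It assumes each router initiates snapshot n when its own PTP-synchronized clock reads t, so its true start time is t minus that router's residual clock offset plus a CPU-to-ASIC invocation delay. The offsets and delays are made-up numbers, and synchronization is taken here as the spread between the earliest and latest true start times.

```python
from typing import Dict


def actual_start_times(t: float,
                       clock_offset_us: Dict[str, float],
                       invoke_delay_us: Dict[str, float]) -> Dict[str, float]:
    """True time (us) at which each router's ASIC starts snapshot n.

    A router whose clock runs ahead by `offset` reads t at true time
    t - offset; its CPU then needs `delay` to invoke the snapshot.
    """
    return {r: t - clock_offset_us[r] + invoke_delay_us[r] for r in clock_offset_us}


def synchronization_us(starts: Dict[str, float]) -> float:
    """Spread between the earliest and latest snapshot start."""
    return max(starts.values()) - min(starts.values())


# Hypothetical residual PTP offsets and CPU invocation delays, in microseconds.
offsets = {"A": 1.0, "B": -2.0, "Y": 0.5}
delays = {"A": 3.0, "B": 4.0, "Y": 10.0}
starts = actual_start_times(t=1_000.0, clock_offset_us=offsets, invoke_delay_us=delays)
print(synchronization_us(starts))  # 7.5
```

This model ignores in-band propagation. Per the slides, a snapshot also propagates through the data plane, so a router whose CPU invocation is late can still be snapshotted earlier by the first arriving packet that carries the new SS#.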

SLIDE 36

Ensuring Consistency

Challenge 2: CL assumes single-threaded nodes, FIFO channels

[Figure: Observer, CPU, and ASIC]

Figure from the P4 Language Specification

SLIDE 37

Ensuring Consistency

Challenge 2: CL assumes single-threaded nodes, FIFO channels

[Figure: Observer, CPU, and ASIC]

  • Data-plane snapshot at the level of individual processing units and priority channels

SLIDE 38

Ensuring Consistency

Challenge 2: CL assumes single-threaded nodes, FIFO channels

[Figure: Observer, CPU, and ASIC]

  • Data-plane snapshot at the level of individual processing units and priority channels
  • The snapshot propagates even if CPU invocation is delayed

SLIDE 39

Ensuring Consistency

Challenge 2: CL assumes single-threaded nodes, FIFO channels

[Figure: Observer, CPU, and ASIC]

  • Data-plane snapshot at the level of individual processing units and priority channels
  • The snapshot propagates even if CPU invocation is delayed

[Figure: packet format: Ethernet | IP | Snapshot | TCP/UDP | Data]
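The packet-format figure places a snapshot header between IP and TCP/UDP, and the overhead slides report 8 bytes per packet. As a hedged illustration of how such a shim could fit in 8 bytes, the sketch below packs a hypothetical layout with Python's `struct` module. The field names and widths (`snapshot_id`, `channel_id`, `next_proto`, `flags`, `reserved`) are assumptions for illustration, not SpeedLight's actual header format.

```python
import struct
from typing import NamedTuple

# HYPOTHETICAL 8-byte layout in network byte order; NOT SpeedLight's real format.
#   snapshot_id : uint16  current SS#
#   channel_id  : uint16  sender-side port/priority channel (assumed field)
#   next_proto  : uint8   IP protocol number of the following header (6=TCP, 17=UDP)
#   flags       : uint8   unused in this sketch
#   reserved    : uint16  padding up to 8 bytes
SNAPSHOT_HDR = struct.Struct("!HHBBH")
assert SNAPSHOT_HDR.size == 8


class SnapshotHeader(NamedTuple):
    snapshot_id: int
    channel_id: int
    next_proto: int
    flags: int = 0
    reserved: int = 0

    def pack(self) -> bytes:
        """Serialize the header to its 8-byte wire form."""
        return SNAPSHOT_HDR.pack(*self)

    @classmethod
    def unpack(cls, data: bytes) -> "SnapshotHeader":
        """Parse the first 8 bytes of `data` as a snapshot header."""
        return cls(*SNAPSHOT_HDR.unpack_from(data))


hdr = SnapshotHeader(snapshot_id=7, channel_id=3, next_proto=17)
wire = hdr.pack()
print(len(wire))                           # 8
print(SnapshotHeader.unpack(wire) == hdr)  # True
```

Carrying the original protocol number is one way a shim between IP and TCP/UDP could let the receiver restore the packet; whether SpeedLight does this is not stated in the slides.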

SLIDE 40

Compensate for Data-plane Limitations

Challenge 3: CL assumes general-purpose CPUs

SLIDE 41
Compensate for Data-plane Limitations

Challenge 3: CL assumes general-purpose CPUs

  • Programmable ASICs are limited
    • Limited programming model, registers, and register accesses
  • The control plane compensates, for example:
    • Detecting snapshot completion (notifications, extraction from RAM)
    • Ensuring liveness despite a lack of traffic
    • Handling skipped snapshots
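Two of the control-plane duties listed above, detecting completion and detecting skipped snapshots, can be sketched as a simple check. This is a hypothetical model, not SpeedLight's control plane. It assumes the control plane can read, for each processing unit (identified here by a hypothetical (port, priority) pair), the last SS# that unit reached and the set of SS#s for which it actually recorded a checkpoint. How a skipped snapshot is then repaired is left out.

```python
from typing import Dict, Set, Tuple

Unit = Tuple[int, int]   # hypothetical (port, priority) identifier


def snapshot_status(n: int,
                    last_ss: Dict[Unit, int],
                    recorded: Dict[Unit, Set[int]]) -> Tuple[bool, Set[Unit]]:
    """Return (complete, skipped_units) for snapshot n.

    complete      -- every unit has advanced to SS# n or later
    skipped_units -- units that advanced to or past n without recording n
    """
    complete = all(ss >= n for ss in last_ss.values())
    skipped = {u for u, ss in last_ss.items()
               if ss >= n and n not in recorded.get(u, set())}
    return complete, skipped


last = {(0, 0): 5, (0, 1): 5, (1, 0): 4}
rec = {(0, 0): {4, 5}, (0, 1): {3, 5}, (1, 0): {3, 4}}
print(snapshot_status(5, last, rec))  # (False, set())
print(snapshot_status(4, last, rec))  # (True, {(0, 1)})
```

A unit that sees no traffic never advances its SS#, so `complete` stays False indefinitely; this is the liveness concern in the list above.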

SLIDE 42

Implementation and Evaluation

  • Implemented on a Barefoot Wedge100BF-32X
    • Control plane: ~2000 lines of Python
    • Data plane: ~1000 lines of P4 (per variant)
  • Evaluation
    • How synchronized is Speedlight?
    • What is the overhead?
    • How do its results compare against current mechanisms?

SLIDE 44

How Synchronized is Speedlight?

SLIDE 45

How Synchronized is Speedlight?

[Figure: CDF of synchronization (μs, log scale from 1 to 10,000) for Speedlight and polling]

SLIDE 46

How Synchronized is Speedlight?

[Figure: CDF of synchronization (μs, log scale from 1 to 10,000) for Speedlight and polling]

Speedlight median: 6.4 μs

SLIDE 47

How Synchronized is Speedlight?

[Figure: CDF of synchronization (μs, log scale from 1 to 10,000) for Speedlight and polling]

Speedlight median: 6.4 μs; polling median: 3500 μs

SLIDE 48

How Does Synchronization Scale?

SLIDE 49

How Does Synchronization Scale?

[Figure: synchronization (μs, log scale) vs. number of routers (20 to 100)]

  • Average synchronization in a simulated network of 64-port routers
SLIDE 50

How Does Synchronization Scale?

[Figure: synchronization (μs, log scale) vs. number of routers (20 to 100)]

  • Average synchronization in a simulated network of 64-port routers
  • The number of routers only increases the probability of hitting the tail, not the length of the tail
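The scaling claim follows from taking a spread over routers. The toy Monte Carlo below is an illustration with a made-up delay distribution, not the simulation behind the plot: each router's start delay is usually a few microseconds but, with an assumed 1% probability, falls in a tail bounded at 100 μs. The network's synchronization for one snapshot is the spread between the earliest and latest start.

```python
import random
import statistics


def router_delay_us(rng: random.Random) -> float:
    """Hypothetical per-router start delay: usually small, rarely in a bounded tail."""
    if rng.random() < 0.01:
        return rng.uniform(50.0, 100.0)   # rare tail, bounded at 100 us
    return rng.uniform(0.0, 5.0)          # common case


def mean_sync_us(num_routers: int, trials: int = 2000, seed: int = 0) -> float:
    """Average spread (latest minus earliest start) over many snapshots."""
    rng = random.Random(seed)
    spreads = []
    for _ in range(trials):
        delays = [router_delay_us(rng) for _ in range(num_routers)]
        spreads.append(max(delays) - min(delays))
    return statistics.mean(spreads)


for n in (20, 40, 60, 80, 100):
    print(n, round(mean_sync_us(n), 1))
```

The averages rise with the number of routers because some router is more likely to land in the tail, yet in this model no single snapshot's spread can exceed 100 μs, since the tail itself does not grow.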

SLIDE 51

What’s the Overhead?

SLIDE 52

What’s the Overhead?

  • No added delay
  • Network overhead: 8 bytes per packet

SLIDE 53

What’s the Overhead?

  • No added delay
  • Network overhead: 8 bytes per packet

Computational resources
  Stateless ALUs: 24
  Stateful ALUs: 11

SLIDE 54

What’s the Overhead?

  • No added delay
  • Network overhead: 8 bytes per packet

Computational resources
  Stateless ALUs: 24
  Stateful ALUs: 11
Memory resources
  SRAM: 770 kB
  TCAM: 244 kB
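To put the 8-byte per-packet network overhead in perspective, its relative cost depends on packet size. The arithmetic below assumes the 8 bytes are added to every packet and measures them against the original packet length; the packet sizes are common examples, not figures from the evaluation.

```python
HEADER_BYTES = 8  # per-packet snapshot overhead reported on the slide


def overhead_pct(packet_bytes: int, header_bytes: int = HEADER_BYTES) -> float:
    """Extra bytes on the wire as a percentage of the original packet size."""
    return 100.0 * header_bytes / packet_bytes


for size in (64, 576, 1500, 9000):
    print(f"{size:>5} B packet: +{overhead_pct(size):.2f}%")
```

For a 1500-byte packet this is about 0.53%; for a minimum-size 64-byte packet it is 12.5%.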

SLIDE 55

Use Case: Synchronized Traffic - GraphX

[Figure panels: SpeedLight snapshots vs. polling]

SLIDE 58

Use Case: Synchronized Traffic - GraphX

[Figure panels: SpeedLight snapshots vs. polling; ECMP]

SLIDE 59

Use Case: Load Balancing

SLIDE 60

Use Case: Load Balancing

[Figure (Hadoop): CDF of standard deviation (ms) for ECMP and flowlet load balancing, each measured by polling and by snapshots]

SLIDE 61

Use Case: Load Balancing

[Figure (Hadoop): CDF of standard deviation (ms) for ECMP and flowlet load balancing, each measured by polling and by snapshots]

  • Polling shows no difference between ECMP and flowlets
  • Reality: flowlets halve the 90th-percentile standard deviation

SLIDE 62

Use Case: Load Balancing

[Figure, two panels: CDF of standard deviation for ECMP and flowlet load balancing, each measured by polling and by snapshots; Hadoop (ms) and Memcache (μs)]

  • Polling shows no difference between ECMP and flowlets
  • Reality: flowlets halve the 90th-percentile standard deviation

SLIDE 63

Use Case: Load Balancing

[Figure, two panels: CDF of standard deviation for ECMP and flowlet load balancing, each measured by polling and by snapshots; Hadoop (ms) and Memcache (μs)]

  • Polling shows no difference between ECMP and flowlets
  • Reality: flowlets halve the 90th-percentile standard deviation
  • Polling consistently overestimates imbalance

SLIDE 64

Use Case: Load Balancing

[Figure, two panels: CDF of standard deviation for ECMP and flowlet load balancing, each measured by polling and by snapshots; Hadoop (ms) and Memcache (μs)]

Averaging shows perfect balance in both cases

  • Polling shows no difference between ECMP and flowlets
  • Reality: flowlets halve the 90th-percentile standard deviation
  • Polling consistently overestimates imbalance
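Why unsynchronized reads can overstate imbalance is easy to see in a toy model. The sketch below is an illustration under assumed conditions, not the experiment in the plots. Every link carries exactly the same constant load (a hypothetical 1 Gb/s, i.e., 125 bytes/μs), so the true imbalance is zero. A synchronized snapshot reads all cumulative byte counters at the same instant, while polling reads link i at a staggered time t + i·Δ. With a hypothetical Δ of 500 μs across 8 links, the reads span 3.5 ms, roughly the polling median reported earlier.

```python
import statistics
from typing import List

RATE_BYTES_PER_US = 125.0   # hypothetical: identical 1 Gb/s load on every link
NUM_LINKS = 8


def counter(link: int, t_us: float) -> float:
    """Cumulative byte counter of `link` at time t.

    `link` does not affect the value because every link carries the same load.
    """
    return RATE_BYTES_PER_US * t_us


def snapshot_readings(t_us: float) -> List[float]:
    # Synchronized snapshot: all counters read at the same instant.
    return [counter(i, t_us) for i in range(NUM_LINKS)]


def polled_readings(t_us: float, skew_us: float) -> List[float]:
    # Polling: link i is read skew_us later than link i - 1.
    return [counter(i, t_us + i * skew_us) for i in range(NUM_LINKS)]


print(statistics.pstdev(snapshot_readings(1_000.0)))                # 0.0
print(statistics.pstdev(polled_readings(1_000.0, skew_us=500.0)))   # nonzero
```

The nonzero standard deviation under polling comes entirely from read skew, not from any real difference between links.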

SLIDE 65

Speedlight Summary

  • Unsynchronized measurements can be misleading
  • Speedlight: a complete picture of the network
    • Causal consistency
    • Approximate synchrony (<RTT)
  • Wedge100BF-32X implementation
    • https://github.com/eniac/Speedlight

SLIDE 66

THANK YOU

QUESTIONS AND COMMENTS

SLIDE 67

When We Go Too Fast

[Figure: maximum rate (Hz, log scale) vs. number of ports per router (4, 8, 16, 32, 64)]

  • Limited by the number of ports
  • Detect inconsistent/incomplete snapshots
