Fault Tolerant Service Function Chaining M. GHAZNAVI, E. JALALPOUR, - - PowerPoint PPT Presentation



SLIDE 1

Fault Tolerant Service Function Chaining

  • M. GHAZNAVI, E. JALALPOUR, B. WONG, R. BOUTABA, A. MASHTIZADEH

UNIVERSITY OF WATERLOO

SLIDE 2

Middleboxes and Service Function Chains

[Figure: a service function chain of Firewall, IDS, and NAT middleboxes connected to the Internet]

SLIDE 3

Middlebox Failures

Percent contribution    High-severity incidents    Population
L2 switches             20                         81
L3 routers              36                         2
Middleboxes             43                         11
Others                  1                          3

Source: "Demystifying the dark side of the middle: A field study of middlebox failures in datacenters", IMC 2013

SLIDE 4

Middlebox Fault Tolerance

[Figure: Alice and Bob reach the Internet through a NAT; a second NAT is shown alongside it]

SLIDE 5

Consistent State Replication

[Figure: Alice and Bob reach the Internet through a NAT; both NAT instances hold the mappings Alice ⬄ Apple and Bob ⬄ Bing]

SLIDE 6

Consistent State Replication

[Figure: Alice and Bob reach the Internet through a NAT; one NAT instance holds the mappings Alice ⬄ Apple and Bob ⬄ Bing, the other holds only Bob ⬄ Bing]

SLIDE 7

Previous Approaches

EXTERNALIZED STATE

  • StatelessNF, NSDI 2017
  • CHC, NSDI 2019

SNAPSHOT BASED

  • Pico Replication, SoCC 2013
  • FTMB, SIGCOMM 2015
  • REINFORCE, CoNEXT 2018

SLIDE 8

Externalized State Approach

[Figure: a NAT connected to the Internet reads and writes its state in a fault-tolerant data store]
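
The externalized state idea can be sketched in a few lines. This is only an illustration under stated assumptions: `DataStore` is a hypothetical in-memory stand-in for the fault-tolerant data store, and `ExternalizedNAT`, its port allocation scheme, and the addresses are invented for the example rather than taken from StatelessNF or CHC.

```python
class DataStore:
    """Hypothetical stand-in for a fault-tolerant data store (a plain dict here)."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data.get(key)

    def put(self, key, value):
        self._data[key] = value

    def next_value(self, key, start):
        # Return the current counter value and advance it in the store.
        value = self._data.get(key, start)
        self._data[key] = value + 1
        return value


class ExternalizedNAT:
    """NAT that keeps no state of its own; every mapping lives in the store."""

    def __init__(self, store, public_ip, first_port=10000):
        self.store = store
        self.public_ip = public_ip
        self.first_port = first_port

    def translate(self, src_ip, src_port):
        key = ("mapping", src_ip, src_port)
        mapping = self.store.get(key)          # read state
        if mapping is None:
            port = self.store.next_value("next_port", self.first_port)
            mapping = (self.public_ip, port)
            self.store.put(key, mapping)       # write state
        return mapping


if __name__ == "__main__":
    store = DataStore()
    nat = ExternalizedNAT(store, "203.0.113.1")
    first = nat.translate("10.0.0.1", 5555)

    # A fresh NAT instance attached to the same store sees the same mapping.
    replacement = ExternalizedNAT(store, "203.0.113.1")
    print(first == replacement.translate("10.0.0.1", 5555))
```

Because the NAT object holds nothing but a reference to the store, a replacement instance can take over without copying state from the failed one.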

SLIDE 9

Snapshot Based Approaches

[Figure: Alice and Bob reach the Internet through a NAT; the NAT's primary state (Alice ⬄ Apple, Bob ⬄ Bing) is copied to a second NAT as replicated state]

SLIDE 10

Snapshot Based Approaches for a Chain

[Figure: a chain of FW, IDS, and NAT connected to the Internet, each middlebox with its own replica. Legend: primary state, replicated state]

High latency, low throughput

SLIDE 11

Our Approach

[Figure: a fault-tolerant chain of Firewall, IDS, and NAT. State labels: F, I, N and F’, I’. Legend: primary state, replicated state]

SLIDE 12

Goals

  • Consistent state replication to tolerate f middlebox failures
  • Minimizing performance overhead during normal operation
  • Minimizing disruption during middlebox failures

SLIDE 13

Fault Tolerant Chaining (FTC)

In-chain replication

  • Replicates a chain’s state instead of the state of individual middleboxes
  • Each middlebox’s state is replicated to the subsequent f middlebox servers
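
As a rough illustration of in-chain replication, the sketch below computes which servers in a chain would hold a copy of each middlebox's state when that state is replicated to the subsequent f servers. The function name, the 0-based indexing, and the wrap-around from the end of the chain to its beginning are assumptions made for this example, not details given on the slide.

```python
def replication_group(i: int, n: int, f: int) -> list[int]:
    """Servers holding the state of middlebox i in a chain of n servers.

    Hypothetical helper: the middlebox's own server comes first, followed by
    the f subsequent servers. Wrapping past the end of the chain is an
    assumption of this sketch.
    """
    if not 0 <= i < n:
        raise ValueError("middlebox index out of range")
    if not 0 <= f < n:
        raise ValueError("need 0 <= f < n so the copies land on distinct servers")
    return [(i + k) % n for k in range(f + 1)]


if __name__ == "__main__":
    # Chain of three middleboxes tolerating one failure (f = 1).
    for i in range(3):
        print(f"middlebox {i}: servers {replication_group(i, n=3, f=1)}")
```

With n = 3 and f = 1, every server ends up holding its own middlebox's state plus a copy of its predecessor's state, so no dedicated replica servers are needed in this sketch.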

Transactional packet processing

  • Simplifies the development of multi-threaded middleboxes
  • Improves scalability and performance

Data dependency vectors

  • Enables concurrent state replication
SLIDE 14

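Normal Operation

[Figure: a forwarder (Forw.), middleboxes m1, m2, m3 with replicas r1, r2, r3, and a buffer; animation frame of numbered packets moving along the chain]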

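SLIDE 15

Normal Operation

[Figure: a forwarder (Forw.), middleboxes m1, m2, m3 with replicas r1, r2, r3, and a buffer; next animation frame of numbered packets moving along the chain]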

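SLIDE 16

Normal Operation

[Figure: a forwarder (Forw.), middleboxes m1, m2, m3 with replicas r1, r2, r3, and a buffer; final animation frame of numbered packets moving along the chain]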

SLIDE 17

Failure Recovery

[Figure: a forwarder (Forw.), middleboxes m1, m2, m3 with replicas r1, r2, r3, and a buffer; a new replica r’2 hosting m2 is shown. Legend: primary state, replicated state]

SLIDE 18

Transactional Packet Processing

Existing approaches

  • Single-threaded or batched packet processing
  • FTMB: multi-threaded packet processing
  • Tracking state changes at the granularity of each state variable read/write
  • Frequent periodic state snapshots

Our approach

  • Packet transaction model for concurrent packet processing
  • Using the isolation property to track state changes at the granularity of packet transactions
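
The packet transaction model above can be sketched as follows. This is a minimal illustration, not FTC's implementation: the class names, the one-lock-per-variable design, the sorted lock order used to avoid deadlock, and the shape of the log entries are all assumptions of this example. What it tries to show is that, because each transaction runs in isolation, it is enough to record one log entry per packet rather than one per variable access.

```python
import threading
from contextlib import contextmanager


class _View:
    """Read/write handle limited to the variables a transaction declared."""

    def __init__(self, values, writes, keys):
        self._values, self._writes, self._keys = values, writes, set(keys)

    def read(self, key):
        self._check(key)
        return self._writes.get(key, self._values[key])

    def write(self, key, value):
        self._check(key)
        self._writes[key] = value  # buffered until the transaction commits

    def _check(self, key):
        if key not in self._keys:
            raise KeyError(f"{key!r} was not declared by this transaction")


class TransactionalState:
    """Hypothetical state store with one lock per state variable."""

    def __init__(self, keys):
        self._values = {k: 0 for k in keys}
        self._locks = {k: threading.Lock() for k in keys}
        self._log_lock = threading.Lock()
        self.log = []  # one entry per committed packet transaction

    @contextmanager
    def transaction(self, keys):
        ordered = sorted(set(keys))  # fixed lock order (assumption) avoids deadlock
        for k in ordered:
            self._locks[k].acquire()
        writes = {}
        try:
            yield _View(self._values, writes, ordered)
            # Reached only if the body did not raise: commit and log once.
            self._values.update(writes)
            if writes:
                with self._log_lock:
                    self.log.append(dict(writes))
        finally:
            for k in reversed(ordered):
                self._locks[k].release()

    def snapshot(self):
        return dict(self._values)


def process_packet(state):
    # One packet = one transaction over the variables it touches.
    with state.transaction(["packet_count"]) as tx:
        tx.write("packet_count", tx.read("packet_count") + 1)


if __name__ == "__main__":
    state = TransactionalState(["packet_count", "byte_count"])
    workers = [
        threading.Thread(target=lambda: [process_packet(state) for _ in range(1000)])
        for _ in range(4)
    ]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    print(state.snapshot()["packet_count"], len(state.log))
```

Transactions that touch disjoint variables take disjoint locks and can run concurrently, while a transaction whose body raises leaves the state and the log untouched.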
SLIDE 19

Data Dependency Vectors

Tracking data changes instead of thread operations

Enabling a different number of threads at the middlebox and its replicas

  • Fail over to smaller machine
  • Scale up to a larger machine

Middlebox        Product               Throughput    CPU cores
IPSec            HP VSR1001            268 Mbps      1
IPSec            HP VSR1008            926 Mbps      8
WAN Optimizer    STEELHEAD CCX770M     10 Mbps       2
WAN Optimizer    STEELHEAD CCX1555M    50 Mbps       4
WAF              Barracuda Level 1     100 Mbps      1
WAF              Barracuda Level 5     200 Mbps      2

SLIDE 20

Data Dependency Vectors Example

[Figure: timeline between a middlebox and a replica, in five numbered steps. The transactions W(1) and R(1), W(3) carry the dependency vectors ⟨0,x,x⟩ and ⟨1,x,4⟩. The replica holds a log that it cannot apply yet and applies it later.]

Middlebox’s dependency vector: ⟨0,3,4⟩, ⟨1,3,4⟩, ⟨2,3,5⟩

Replica’s dependency vector: ⟨0,3,4⟩, ⟨1,3,4⟩, ⟨2,3,5⟩
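
The example can be replayed with a short sketch. The rules used here are inferred from the vectors on the slide and should be read as assumptions: a transaction's dependency vector copies the middlebox's current sequence numbers for the partitions it accesses and uses x elsewhere, the sequence number of every accessed partition then advances by one, and a replica applies a log only when all non-x entries equal its own sequence numbers. The class and method names are hypothetical, and partitions 1 and 3 become indices 0 and 2.

```python
X = None  # "don't care" entry, written as x on the slide


class Middlebox:
    def __init__(self, vector):
        self.vector = list(vector)

    def tag(self, accessed):
        """Return the dependency vector for a transaction over `accessed` partitions."""
        dep = [seq if i in accessed else X for i, seq in enumerate(self.vector)]
        for i in accessed:
            self.vector[i] += 1  # advance every accessed partition
        return dep


class Replica:
    def __init__(self, vector):
        self.vector = list(vector)
        self.held = []  # logs that arrived before their dependencies

    def _ready(self, dep):
        return all(d is X or d == s for d, s in zip(dep, self.vector))

    def receive(self, dep):
        self.held.append(dep)
        progress = True
        while progress:  # keep applying until no held log is ready
            progress = False
            for d in list(self.held):
                if self._ready(d):
                    self.held.remove(d)
                    for i, entry in enumerate(d):
                        if entry is not X:
                            self.vector[i] += 1
                    progress = True


if __name__ == "__main__":
    middlebox = Middlebox([0, 3, 4])
    t1 = middlebox.tag({0})       # W(1)        -> [0, None, None]
    t2 = middlebox.tag({0, 2})    # R(1), W(3)  -> [1, None, 4]

    replica = Replica([0, 3, 4])
    replica.receive(t2)           # arrives first and is held
    print(replica.vector, replica.held)
    replica.receive(t1)           # unblocks the held log
    print(replica.vector, replica.held)
```

Since the check compares data versions rather than thread identities, the replica in this sketch can apply logs with any number of threads, which is the property the previous slide relies on for failing over to a smaller machine or scaling up to a larger one.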

SLIDE 21

Evaluation

METHOD

Comparing FTC with:

NF, a non-fault-tolerant system

  • Ideal performance

FTMB (SIGCOMM 2015)

  • State logging

FTMB + Snapshot (SIGCOMM 2015)

  • State logging + Snapshots

ENVIRONMENTS

A cluster of 12 servers

  • 40 Gbps network

SAVI Cloud environment

  • A virtual network of OVS switches

MoonGen and pktGen traffic generators

  • UDP traffic
  • Packet size: 256 B
SLIDE 22

Fault Tolerant NATs

[Figure: throughput (Mpps) versus number of threads (1, 2, 4, 8) for NF, FTC, and FTMB]

2x higher throughput

SLIDE 23

Fault Tolerant Chains – Throughput

[Figure: throughput (Mpps) versus chain length (2 to 5) for NF, FTC, FTMB, and FTMB+Snapshot]

  • 3.5x higher throughput
  • 39% drop due to snapshots
  • 1.8x higher throughput

SLIDE 24

Fault Tolerant Chains – Latency

[Figure: CDF of packet latency (µs, 40 to 180) for NF, FTC, and FTMB]

SLIDE 25

Conclusion

Keeps a chain of middleboxes online after f middleboxes fail

  • In-chain replication
  • Transactional packet processing
  • Data dependency vectors