Fault Tolerant Service Function Chaining
- M. GHAZNAVI, E. JALALPOUR, B. WONG, R. BOUTABA, A. MASHTIZADEH
UNIVERSITY OF WATERLOO
Middleboxes and Service Function Chains
Figure: a chain of Firewall, IDS, and NAT middleboxes connected to the Internet.
Middlebox Failures
Figure: percent contribution of L2 switches, L3 routers, middleboxes, and others to high severity incidents and to device population.
Source: "Demystifying the dark side of the middle: A field study of middlebox failures in datacenters", IMC 2013
Middlebox Fault Tolerance
Figure: Alice and Bob reach the Internet through a NAT.
Consistent State Replication
Figure: Alice and Bob reach the Internet through a NAT. The NAT and its replica both hold the mappings Alice ⬄ Apple and Bob ⬄ Bing.
Figure (next frame): the two copies diverge. One holds Alice ⬄ Apple and Bob ⬄ Bing, the other only Bob ⬄ Bing.
Previous Approaches
EXTERNALIZED STATE
- StatelessNF, NSDI 2017
- CHC, NSDI 2019
SNAPSHOT BASED
- Pico Replication, SoCC 2013
- FTMB, SIGCOMM 2015
- REINFORCE, CoNEXT 2018
Externalized State Approach
Figure: a NAT connected to the Internet reads and writes its state in a fault tolerant data store.
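The read/write pattern on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `DataStore` is a hypothetical in-memory stand-in for the fault tolerant data store, and `ExternalizedNAT`, its port allocation scheme, and the key layout are invented for the example rather than taken from StatelessNF or CHC.

```python
class DataStore:
    """Hypothetical stand-in for a fault tolerant data store (a plain dict here)."""

    def __init__(self):
        self._kv = {}

    def read(self, key):
        return self._kv.get(key)

    def write(self, key, value):
        self._kv[key] = value


class ExternalizedNAT:
    """A NAT instance that keeps no mapping state of its own (assumed design)."""

    def __init__(self, store, first_port=5000):
        self.store = store
        self.first_port = first_port

    def translate(self, src, dst):
        # Every packet reads the flow's mapping from the external store.
        port = self.store.read(("map", src, dst))
        if port is None:
            # First packet of a flow: allocate a port and write it back.
            port = self.store.read("next_port") or self.first_port
            self.store.write(("map", src, dst), port)
            self.store.write("next_port", port + 1)
        return port


store = DataStore()
nat = ExternalizedNAT(store)
alice_port = nat.translate("Alice", "Apple")
bob_port = nat.translate("Bob", "Bing")

# After a failure, a fresh instance attached to the same store sees the same mappings.
recovered = ExternalizedNAT(store)
alice_port_after = recovered.translate("Alice", "Apple")
```

Because every packet goes through `read` and possibly `write`, the sketch also shows where this approach pays its cost: state access sits on the packet processing path.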
Snapshot Based Approaches
Figure: Alice and Bob reach the Internet through a NAT. The NAT holds the primary state (Alice ⬄ Apple, Bob ⬄ Bing) and a second NAT holds the replicated state (Alice ⬄ Apple, Bob ⬄ Bing).
Snapshot Based Approaches for a Chain
Figure: a chain of middleboxes (labels FW, IDS, IDPS, NAT) toward the Internet, with primary state and replicated state marked for the chain.
High latency, low throughput
Our Approach
Figure: a fault tolerant chain of Firewall, IDS, and NAT. Primary state is labeled F, I, N and replicated state is labeled F’, I’, …
Goals
- Consistent state replication to tolerate f middlebox failures
- Minimizing performance overhead during normal operation
- Minimizing disruption during middlebox failures
Fault Tolerant Chaining (FTC)
In-chain replication
Transactional packet processing
Data dependency vectors
Normal Operation
Figure (three animation frames): a chain of middleboxes m1, m2, m3 with replicas r1, r2, r3, a forwarder (Forw.), and a buffer. Numbered items 1, 2, 3 advance along the chain from frame to frame.
Failure Recovery
Figure: the chain m1, m2, m3 with replicas r1, r2, r3, a forwarder (Forw.), and a buffer. A new m2 instance appears with state r’2. Legend: primary state, replicated state.
Transactional Packet Processing
Existing approaches
Our approach
Data Dependency Vectors
- Tracking data changes instead of thread operations
- Enabling a different number of threads at the middlebox and at the replicas
Middlebox      Product              Throughput  CPU Cores
IPSec          HP VSR1001           268 Mbps    1
IPSec          HP VSR1008           926 Mbps    8
WAN Optimizer  STEELHEAD CCX770M    10 Mbps     2
WAN Optimizer  STEELHEAD CCX1555M   50 Mbps     4
WAF            Barracuda Level 1    100 Mbps    1
WAF            Barracuda Level 5    200 Mbps    2
Data Dependency Vectors Example
Figure (steps 1 to 5): a middlebox runs two transactions, W(1) and then R(1), W(3), and a replica receives them with their dependency vectors.
Middlebox’s dependency vector: ⟨0,3,4⟩, then ⟨1,3,4⟩, then ⟨2,3,5⟩
Vectors attached to the two transactions: ⟨0,x,x⟩ and ⟨1,x,4⟩
Replica’s dependency vector: ⟨0,3,4⟩, then ⟨1,3,4⟩, then ⟨2,3,5⟩
Checks at the replica:
⟨0,3,4⟩ ≥ ⟨1,x,4⟩ ? (hold)
⟨0,3,4⟩ ≥ ⟨0,x,x⟩ ✓
⟨1,3,4⟩ ≥ ⟨1,x,4⟩ ✓
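The check on this slide can be reproduced with a small Python sketch. Several points are assumptions read off the slide's numbers rather than stated on it: an `x` entry is treated as "this transaction did not touch the variable", the replica applies an update only when its own vector is ≥ the attached vector on every non-`x` entry, applying an update increments every non-`x` entry (which matches ⟨1,3,4⟩ becoming ⟨2,3,5⟩ after R(1), W(3)), and held updates are retried after each successful apply. The names `can_apply`, `apply_update`, and `replay` are hypothetical.

```python
X = None  # assumed meaning of "x": the transaction did not touch this variable


def can_apply(replica, attached):
    """True if the replica's vector is >= the attached vector on every non-x entry."""
    return all(a is X or r >= a for r, a in zip(replica, attached))


def apply_update(replica, attached):
    """Advance the replica's vector: increment every entry the transaction touched."""
    return [r if a is X else r + 1 for r, a in zip(replica, attached)]


def replay(replica, arrivals):
    """Process (name, vector) updates in arrival order, holding those not yet applicable."""
    held, applied = [], []
    for update in arrivals:
        held.append(update)
        progress = True
        while progress:  # retry held updates after every successful apply
            progress = False
            for name, vec in list(held):
                if can_apply(replica, vec):
                    replica = apply_update(replica, vec)
                    applied.append(name)
                    held.remove((name, vec))
                    progress = True
    return replica, applied, held


# The second transaction's update arrives first and must be held.
final_vector, applied_order, still_held = replay(
    [0, 3, 4],
    [("R(1),W(3)", [1, X, 4]), ("W(1)", [0, X, X])],
)
```

With the slide's values, the update carrying ⟨1,x,4⟩ is held against ⟨0,3,4⟩, the update carrying ⟨0,x,x⟩ is applied, and the held update then passes against ⟨1,3,4⟩. Nothing in the check refers to which thread produced an update, which is consistent with the slide's point that the middlebox and its replicas may run different numbers of threads.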
Evaluation
METHOD
Comparing FTC with:
- NF, a non-fault-tolerant system
- FTMB (SIGCOMM 2015)
- FTMB + Snapshot (SIGCOMM 2015)
ENVIRONMENTS
- A cluster of 12 servers
- SAVI Cloud environment
- MoonGen and pktGen traffic generators
Fault Tolerant NATs
Figure: throughput (Mpps) versus number of threads (1, 2, 4, 8) for NF, FTC, and FTMB.
2x higher throughput
Fault Tolerant Chains – Throughput
Figure: throughput (Mpps) versus chain length (2, 3, 4, 5) for NF, FTC, FTMB, and FTMB+Snapshot.
- 3.5x higher throughput
- 39% drop due to snapshots
- 1.8x higher throughput
Fault Tolerant Chains – Latency
Figure: CDF of packets versus latency (µs, 40 to 180) for NF, FTC, and FTMB.
Conclusion
Keep a chain of middleboxes operating after f middleboxes fail