

SLIDE 1

restoring tcp sessions with a distributed hash table

Advanced Networking RP2

Peter Boers, June 29, 2016

System and Network Engineering - FNWI - UVA

SLIDE 2

scaling infrastructure

∙ Imagine you are one of the largest providers of web services in the world…
∙ How do you service your infrastructure while making sure your clients never notice that it is happening?


SLIDE 3

load balancers

Why do you balance load?
∙ To maintain the integrity of the end-to-end session between a client and the service it is trying to reach.
∙ To distribute load across multiple endpoints.


SLIDE 4

traditional solutions

Traditional hardware and software load balancers can do some or all of the following:
∙ Maintain a highly available setup
∙ Balance at layers 3, 4 and/or 7 of the OSI stack
∙ TLS offloading
∙ Compression
∙ Marshalling of TCP sessions
∙ Proxying
However, these solutions often require high licensing fees and cannot scale far enough.



SLIDE 6

traditional solutions

Figure: A simple highly available setup



SLIDE 11

new network design

In a recent draft RFC by Facebook and Arista Networks, a new network design for very large data centers is discussed [1]:
∙ "Environments of this scale have a unique set of network requirements with an emphasis on operational simplicity and network stability."
∙ The document proposes EBGP as the only routing protocol.
∙ To distribute load and traffic, Anycast in combination with Equal Cost MultiPath routing (ECMP) is used instead of traditional load balancers.
The goal is to achieve greater horizontal scalability and to use proven network protocols for simplicity. (A toy illustration of how ECMP pins flows to paths follows below.)
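The slides do not show how ECMP pins a flow to a path. As a hedged toy illustration (not how any particular router vendor implements it), routers commonly hash the flow 5-tuple over the set of equal-cost next hops; this also shows why the mapping changes when a link, and therefore a next hop, disappears:

    import hashlib

    # Toy ECMP: hash the flow 5-tuple and pick one of the equal-cost
    # next hops. The same flow always maps to the same hop, so a TCP
    # session stays on one path until the next-hop set changes.
    def ecmp_next_hop(five_tuple, next_hops):
        digest = hashlib.sha256(repr(five_tuple).encode()).digest()
        return next_hops[int.from_bytes(digest[:4], "big") % len(next_hops)]

    hops = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]  # illustrative addresses
    flow = ("145.100.102.131", 12445, "10.100.10.1", 80, "tcp")
    print(ecmp_next_hop(flow, hops))
    # Remove a hop (simulating a link failure) and many flows re-hash
    # to a different server, breaking their TCP sessions:
    print(ecmp_next_hop(flow, hops[:2]))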


SLIDE 12

new network design

Figure: New Design


SLIDE 13

new network design

Features of the new network:
∙ Balancing is no longer done at the edge but at the endpoints
∙ All hosts take part in the routing protocol
∙ Layer 3/4 balancing is no longer scalable through traditional means
∙ How do you maintain the integrity of a TCP session?


SLIDE 14

research questions

How can a DHT be leveraged to maintain TCP session state in the case of a failure in large BGP networks with thousands of hosts [1]?
∙ What technical requirements must be met to maintain the TCP session in the case of a failure?
∙ Does using a DHT to look up invalid sessions provide enough performance for the session to continue?


SLIDE 15

method - why a dht?

What makes a Distributed Hash Table a good fit in this situation?
∙ Nodes do not hold all the information, but know where to look it up.
∙ Information is distributed evenly over all nodes.
∙ Scales well: lookups take O(log n) hops.
∙ Stores key-value pairs.
A Kademlia implementation was chosen to build the Distributed Hash Table (a minimal usage sketch follows below).
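The slides include no code; the following is a minimal sketch using the Python kademlia library (an asyncio-based Kademlia implementation), with illustrative addresses and ports:

    import asyncio
    from kademlia.network import Server  # pip install kademlia

    async def main():
        # Each web server also runs a node in the DHT overlay.
        node = Server()
        await node.listen(8468)

        # Bootstrap against any node already in the overlay
        # (10.100.10.1 is an illustrative address).
        await node.bootstrap([("10.100.10.1", 8468)])

        # Store and retrieve a key-value pair; Kademlia routes each
        # request to the responsible nodes in O(log n) hops.
        await node.set("145.100.102.131:12445", "10.100.10.1:80")
        print(await node.get("145.100.102.131:12445"))  # -> "10.100.10.1:80"

    asyncio.run(main())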



SLIDE 17

method - how to handle tcp

How does a node detect that a TCP session is wrong?
∙ Nodes must track connections.
∙ If a connection is not NEW or ESTABLISHED, do a lookup in the DHT.
∙ The 4-tuple is ideal for storing in the DHT: client socket = key, server socket = value:
{ "145.100.102.131:12445" : "10.100.10.1:80" }
When a wrong session arrives, do a lookup and redirect the traffic (sketched below).
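The report's actual implementation is not shown on the slides; this is a hedged sketch of the per-node logic, where dht and forward are hypothetical stand-ins for the DHT client and the redirect mechanism:

    # Connection tracking on one node: 4-tuples this node has seen.
    LOCAL_CONNECTIONS = set()

    async def handle_segment(src_ip, src_port, dst_ip, dst_port,
                             is_syn, dht, forward):
        client_socket = f"{src_ip}:{src_port}"
        conn = (src_ip, src_port, dst_ip, dst_port)

        if is_syn:
            # New session: remember it locally and publish it in the
            # DHT so any node can later find which server owns it.
            LOCAL_CONNECTIONS.add(conn)
            await dht.set(client_socket, f"{dst_ip}:{dst_port}")
        elif conn not in LOCAL_CONNECTIONS:
            # Mid-stream segment for a session this node never saw:
            # ECMP must have rerouted it here. Look up the owner and
            # redirect the traffic to it.
            owner = await dht.get(client_socket)
            if owner is not None:
                await forward(conn, owner)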



SLIDE 20

scenario

In the scenario we assume the following:
∙ N servers hosting a website and taking part in a DHT overlay
∙ The website is balanced using ECMP and Anycast on the network
∙ All new TCP sessions are stored in the DHT
Then we simulate a link failure:
∙ Let ECMP recalculate the path of the traffic
∙ Look up the "Key" (client socket)
∙ Forward traffic to the "Value" (server identifier)



SLIDE 22

scenario - step 1


SLIDE 23

scenario - step 2


SLIDE 24

scenario - step 3



SLIDE 28

in what case is the test successful?

How do you measure whether a failover is within an industry-standard acceptable window?
∙ Amazon Web Services' load-balancing health check has a default interval of 30 seconds and a minimum of 5 seconds, with a timeout of 30 seconds [2]
∙ Kemp Technologies has a default health check of 9 seconds and a minimum of 3 seconds, with a timeout of 15 seconds [3]
∙ F5 has a default health check every 5 seconds, with a timeout of 15 seconds [4]
This means that in the worst case there is a window of 20 seconds to around one minute before TCP session restoration (worked out below).
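A hedged reading of where that window comes from, treating the worst case as one full check interval plus the probe timeout (the slides do not give retry counts, so those are omitted):

    # Illustrative worst-case failover window: the failure happens right
    # after a successful check, so one full interval passes before the
    # next probe, which must then time out before the backend is marked
    # down.
    def failover_window(interval_s, timeout_s):
        return interval_s + timeout_s

    print(failover_window(5, 15))   # F5 defaults quoted above  -> 20 s
    print(failover_window(30, 30))  # AWS defaults quoted above -> 60 s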


SLIDE 29

results

Results for the test setup of this research (measurement sketch below):
∙ Setting time - the time it takes to set a key in the DHT
∙ Detection time - the time it takes to detect a link failure
∙ Lookup time - the time it takes to look up a key in the DHT
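The slides do not show the instrumentation; this is a hedged sketch of how the setting and lookup times could be measured against an async DHT client (detection time was measured between the link failure and the first rerouted packet, which cannot be timed from a single node this way):

    import time

    async def timed_set(dht, key, value):
        # "Setting time": how long one DHT set takes end to end.
        t0 = time.monotonic()
        await dht.set(key, value)
        return time.monotonic() - t0

    async def timed_lookup(dht, key):
        # "Lookup time": how long one DHT get takes end to end.
        t0 = time.monotonic()
        value = await dht.get(key)
        return time.monotonic() - t0, value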


SLIDE 30

setting time

Figure: This plot shows the time in seconds that it takes to set the key-value pair in the DHT (axis range roughly 0.200-0.208 seconds, across attempts)


SLIDE 31

detection time

Figure: This plot shows the time in seconds between the failure of a link and the rerouting of packets (axis range roughly 1.0-2.0 seconds, across attempts)


SLIDE 32

lookup time

Figure: This plot shows the time in seconds that it takes for a node to look up a key in the DHT (axis range roughly 0.100-0.102 seconds, across attempts)


SLIDE 33

conclusion - discussion

Key findings:
∙ At this small scale it is fast enough to detect a failure and act on it.
∙ No protocol changes are needed.
∙ Horizontal scalability is very simple in this model.
Future efforts:
∙ What is the performance cost at scale?
∙ Convert the script to a binary and integrate it with other software.
∙ How do you make sure it is reliable?



SLIDE 35

references

[1] P. Lapukhov, A. Premji, and J. Mitchell. Use of BGP for Routing in Large-Scale Data Centers. IETF draft, 2016. URL: https://datatracker.ietf.org/doc/draft-ietf-rtgwg-bgp-routing-large-dc/.

[2] Amazon. Elastic Load Balancing - Configure Health Checks. 2016. URL: http://docs.aws.amazon.com/ElasticLoadBalancing/latest/DeveloperGuide/elb-healthchecks.html (visited on 06/28/2016).

[3] Kemp Technologies. Frequently Asked Questions. 2016. URL: https://kemptechnologies.com/faq/ (visited on 06/28/2016).

[4] F5. Manual Chapter: Configuring Monitors. 2016. URL: https://support.f5.com/kb/en-us/products/big-ip_ltm/manuals/product/ltm_configuration_guide_10_0_0/ltm_monitors.html#1201151 (visited on 06/28/2016).
