

SLIDE 1

FLAIR: Accelerating Reads with Consistency-Aware Network Routing

Hatem Takruri, Ibrahim Kettaneh, Ahmed Alquraan, Samer Al-Kiswany

SLIDE 2

Introduction

Modern cloud applications

  • Are read intensive
  • R:W in Google’s F1 advertising system is 380:1 [1]
  • R:W in Facebook’s TAO is 500:1 [2]
  • Require data reliability

Main approach: replication. Strongly consistent replication protocols are popular.

[1] J. Shute, R. Vingralek, B. Samwel, et al. F1: a distributed SQL database that scales. Proc. VLDB Endow. 6(11):1068-1079, 2013.
[2] N. Bronson, Z. Amsden, G. Cabrera, et al. TAO: Facebook's distributed data store for the social graph. In Proceedings of the USENIX Annual Technical Conference, San Jose, CA, 2013.

SLIDE 3

Inefficiency of Current Replication Protocols

Modern strongly consistent protocols are inefficient for read-heavy workloads. Main reason: they are leader-based.

Examples: Paxos, Viewstamped Replication, Raft, ZAB

SLIDE 4

Leader-based Consensus Inefficiency

(Diagram: a client's write request goes to the leader, which replicates it to all followers.)

SLIDE 5

Leader-based Consensus Inefficiency

Inefficient: only the leader handles read/write requests.
Missed opportunity: utilize the followers to serve reads.

SLIDE 6

Current Approaches to Utilizing Followers

Read leases

  • The leader grants read leases to followers
  • With a valid lease: followers serve reads
  • On write: leader revokes leases
  • Drawbacks:
    • Complicates lease management
    • Increases write latency
    • Complicates fault tolerance

Eventual consistency

  • Replicas serve reads, albeit with the possibility of returning stale data


SLIDE 7

FLAIR: Fast, Linearizable, Network-Accelerated Client Reads

A novel approach to serve reads from followers while maintaining linearizability. A shim layer atop current leader-based protocols.

Main idea:

  • The network detects read/write conflicts, and
  • Load-balances reads across consistent replicas

Enabler: programmable switches

SLIDE 8

FLAIR in a Nutshell

  • Network switches monitor write requests/responses
  • Identify which objects are stable, and on which replicas
  • Load balance reads across consistent replicas


FLAIR is an in-network consistency-aware load-balancing protocol

SLIDE 9

Evaluation Summary

  • Implemented FLAIR using P4
  • Evaluated FLAIR on a cluster with a Barefoot Tofino switch

FLAIR achieves

  • 1.4x to 2.1x higher throughput
  • 1.5x to 2.4x lower latency

SLIDE 10

Outline

  • Overview of programmable switches
  • FLAIR design
  • Implementation
  • Evaluation


SLIDE 11

Programmable Switches Overview

Example: key-based routing pipeline.

(Diagram: a client sends GET Key = 200; the switch routes it to Node 1, which serves key range [0, 1000); Node 2 serves [1000, 2000).)

SLIDE 12

Programmable Switches Overview

Example: key-based routing pipeline.

(Diagram: a packet's header and metadata flow through the pipeline stages: L2 table, IPv4 table, ACL table, KV routing table.)

SLIDE 13

Programmable Switches Overview

Example: key-based routing pipeline. Each pipeline stage (L2, IPv4, ACL, KV routing) is a match + action table:

Match                        Action
header.key ∈ [0, 1000)       forward(0)
header.key ∈ [1000, 2000)    forward(1)

SLIDE 14

Programmable Switches Overview

Example: key-based routing pipeline, with custom actions and switch memory:

Match                        Action
header.key ∈ [0, 1000)       forward(0)
header.key ∈ [1000, 2000)    forward(1)

forward(index): dstIPAddr = addressArray[index]

addressArray (switch memory): IP 1, IP 2, IP 3

Facilitates building an application-optimized network substrate.
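The match + action behavior above can be sketched outside the switch in plain Python (illustrative only; on real hardware this logic is written in P4 and the table entries are installed by a controller):

```python
# Minimal sketch of the key-based routing stage described above.
# Names (address_array, forward) mirror the slide; this is illustrative
# Python, not P4.

address_array = ["10.0.0.1", "10.0.0.2"]  # switch memory: replica IPs

# Match + action table: key range -> action argument (index)
routing_table = [
    (range(0, 1000), 0),     # header.key in [0, 1000)    -> forward(0)
    (range(1000, 2000), 1),  # header.key in [1000, 2000) -> forward(1)
]

def forward(packet, index):
    """Custom action: rewrite the destination IP from switch memory."""
    packet["dstIPAddr"] = address_array[index]
    return packet

def kv_routing_stage(packet):
    """One pipeline stage: match on header.key, apply the action."""
    for key_range, index in routing_table:
        if packet["key"] in key_range:
            return forward(packet, index)
    return packet  # no match: leave the packet unchanged

pkt = kv_routing_stage({"key": 200, "dstIPAddr": None})  # -> "10.0.0.1"
```

A real switch evaluates all stages at line rate in a fixed pipeline; the loop here only stands in for the table lookup.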

SLIDE 15

Programmable Switches Challenges

  • No loops or recursion
  • Restricted pipeline-based programming model
  • Limited number of pipeline stages
  • Limited computational power
  • Restricted memory access model


Can we use programmable switches to build a consistency-aware network routing protocol?

SLIDE 16

Outline

  • Overview of programmable switches
  • FLAIR design
  • Implementation
  • Evaluation


SLIDE 17

FLAIR Design

Which nodes can serve Read(Key1)?

(Diagram: clients send Read(Key1) to the switch; the FLAIR modules in the switch's pipeline, managed by a network controller, route the read to the leader or to a follower.)

SLIDE 18

FLAIR's Object Stability Array

Objects stability array:

Key range        Status      Replicas
[0, 4096)        Stable      All
[4096, 8192)     Stable      L, F1
[8192, 12288)    Unstable    -
…                …           …
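The array's role in read routing can be sketched in Python (an illustrative model, not the switch implementation; node names L, F1, F2 follow the slides):

```python
# Sketch of the objects stability array and the read-routing check
# (illustrative Python, not P4).
import random

STABLE, UNSTABLE = "Stable", "Unstable"

# Each entry: (key range, status, replicas that may serve reads)
stability_array = [
    (range(0, 4096), STABLE, ["L", "F1", "F2"]),  # "All"
    (range(4096, 8192), STABLE, ["L", "F1"]),
    (range(8192, 12288), UNSTABLE, []),
]

def route_read(key_hash):
    """Return the node that should serve a read for key_hash."""
    for key_range, status, replicas in stability_array:
        if key_hash in key_range:
            if status == STABLE:
                return random.choice(replicas)  # load-balance reads
            return "L"  # unstable: only the leader is safe
    return "L"  # default: fall back to the leader

assert route_read(9000) == "L"          # unstable range -> leader
assert route_read(5000) in ("L", "F1")  # stable on L and F1 only
```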

SLIDE 19

FLAIR in Action

Objects stability array:

Key range        Status    Replicas
[0, 4096)        Stable    All
[4096, 8192)     Stable    All
[8192, 12288)    Stable    All
…                …         …

Read(KeyHash = 5000)

(Diagram: the leader, Follower 1, and Follower 2 each store Key = 5000, Value = A.)

SLIDE 20

FLAIR in Action

Read(KeyHash = 5000): the switch checks the objects stability array:

Key range        Status    Replicas
[0, 4096)        Stable    All
[4096, 8192)     Stable    All
[8192, 12288)    Stable    All
…                …         …

(Diagram: the leader, Follower 1, and Follower 2 each store Key = 5000, Value = A.)

SLIDE 21

FLAIR in Action

The key range is stable on all replicas, so the read is served by a follower: Read(KeyHash = 5000) returns ReadResponse(KeyHash = 5000, Val = A).

Key range        Status    Replicas
[0, 4096)        Stable    All
[4096, 8192)     Stable    All
[8192, 12288)    Stable    All
…                …         …

(Diagram: the leader, Follower 1, and Follower 2 each store Key = 5000, Value = A.)

SLIDE 22

FLAIR in Action

Write(KeyHash = 5000, Val = B) arrives at the switch.

Key range        Status    Replicas
[0, 4096)        Stable    All
[4096, 8192)     Stable    All
[8192, 12288)    Stable    All
…                …         …

(Diagram: the leader, Follower 1, and Follower 2 each store Key = 5000, Value = A.)

SLIDE 23

FLAIR in Action

Write(KeyHash = 5000, Val = B): the switch updates the objects stability array, marking the key range unstable:

Key range        Status      Replicas
[0, 4096)        Stable      All
[4096, 8192)     Unstable    -
[8192, 12288)    Stable      All
…                …           …

(Diagram: the leader, Follower 1, and Follower 2 each still store Key = 5000, Value = A.)

SLIDE 24

FLAIR in Action

The switch forwards Write(KeyHash = 5000, Val = B) to the leader.

Key range        Status      Replicas
[0, 4096)        Stable      All
[4096, 8192)     Unstable    -
[8192, 12288)    Stable      All
…                …           …

(Diagram: the leader, Follower 1, and Follower 2 each still store Key = 5000, Value = A.)

SLIDE 25

FLAIR in Action

While the write is in flight, Read(KeyHash = 5000) arrives at the switch.

Key range        Status      Replicas
[0, 4096)        Stable      All
[4096, 8192)     Unstable    -
[8192, 12288)    Stable      All
…                …           …

(Diagram: the leader, Follower 1, and Follower 2 each still store Key = 5000, Value = A.)

SLIDE 26

FLAIR in Action

The switch checks the objects stability array for Read(KeyHash = 5000): the key range is unstable.

Key range        Status      Replicas
[0, 4096)        Stable      All
[4096, 8192)     Unstable    -
[8192, 12288)    Stable      All
…                …           …

(Diagram: the leader, Follower 1, and Follower 2 each still store Key = 5000, Value = A.)

SLIDE 27

FLAIR in Action

The switch forwards Read(KeyHash = 5000) to the leader, which returns ReadResponse(KeyHash = 5000).

Key range        Status      Replicas
[0, 4096)        Stable      All
[4096, 8192)     Unstable    -
[8192, 12288)    Stable      All
…                …           …

(Diagram: the leader, Follower 1, and Follower 2 each still store Key = 5000, Value = A.)

SLIDE 28

FLAIR in Action

The leader replicates Write(KeyHash = 5000, Val = B) to the followers.

Key range        Status      Replicas
[0, 4096)        Stable      All
[4096, 8192)     Unstable    -
[8192, 12288)    Stable      All
…                …           …

(Diagram: the leader, Follower 1, and Follower 2 each still store Key = 5000, Value = A.)

SLIDE 29

FLAIR in Action

Follower 1 acknowledges the write (Ack).

Key range        Status      Replicas
[0, 4096)        Stable      All
[4096, 8192)     Unstable    -
[8192, 12288)    Stable      All
…                …           …

(Diagram: the leader and Follower 1 now store Key = 5000, Value = B; Follower 2 still stores Value = A.)

SLIDE 30

FLAIR in Action

The leader sends WriteResponse(Key1, [L, F1]): the write is committed on the leader and Follower 1. Follower 2 is a stale follower.

Key range        Status      Replicas
[0, 4096)        Stable      All
[4096, 8192)     Unstable    -
[8192, 12288)    Stable      All
…                …           …

(Diagram: the leader and Follower 1 store Value = B; Follower 2, the stale follower, still stores Value = A.)

SLIDE 31

FLAIR in Action

On WriteResponse(Key1, [L, F1]), the switch updates the objects stability array: the key range is stable again, on replicas L and F1.

Key range        Status    Replicas
[0, 4096)        Stable    All
[4096, 8192)     Stable    L, F1
[8192, 12288)    Stable    All
…                …         …

(Diagram: the leader and Follower 1 store Value = B; Follower 2, the stale follower, still stores Value = A.)

SLIDE 32

FLAIR in Action

The switch forwards WriteResponse(Key1, [L, F1]) to the client.

Key range        Status    Replicas
[0, 4096)        Stable    All
[4096, 8192)     Stable    L, F1
[8192, 12288)    Stable    All
…                …         …

(Diagram: the leader and Follower 1 store Value = B; Follower 2, the stale follower, still stores Value = A.)

SLIDE 33

FLAIR in Action

Subsequent Read(KeyHash = 5000) requests are checked against the array and routed only to the leader and Follower 1, avoiding the stale follower.

Key range        Status    Replicas
[0, 4096)        Stable    All
[4096, 8192)     Stable    L, F1
[8192, 12288)    Stable    All
…                …         …

(Diagram: the leader and Follower 1 store Value = B; Follower 2, the stale follower, still stores Value = A.)

SLIDE 34

FLAIR Design

  • Concurrent writes to the same object
  • Packet reordering
  • Failures
    • Switch failure
    • Leader failure
    • Follower failure
    • Network partitioning

Protocol validation

  • Detailed proof
  • TLA+ model checking

SLIDE 35

Concurrent Writes

(Timeline diagram, as seen in the switch: two overlapping writes to x, w1(x) and w2(x); events in order: w1 request, w2 request, w1 response, w2 response. x is unstable while the writes are outstanding.)

SLIDE 36

Concurrent Writes

(Timeline diagram, as seen in the switch: two overlapping writes to x, w1(x) and w2(x); events in order: w1 request, w2 request, w1 response, w2 response. x is unstable while the writes are outstanding.)

SLIDE 37

Concurrent Writes

(Timeline diagram, as seen in the switch: w1(x, w1_seq#) and w2(x, w2_seq#) overlap; x is unstable.)

  • Every write gets a unique sequence number
  • Objects stability array stores the sequence number of the last write

SLIDE 38

Concurrent Writes

(Timeline diagram, as seen in the switch: w1(x, w1_seq#) and w2(x, w2_seq#) overlap; x is unstable.)

Unstable objects array:

Hash range    Status      Seq#
[0, 4096)     Unstable    w2_seq#
…             …           …

  • Every write gets a unique sequence number
  • Objects stability array stores the sequence number of the last write
  • Objects remain unstable until the last sequence number is acknowledged

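The sequence-number rule above can be sketched as follows (an illustrative Python model of the switch state; the function names are hypothetical):

```python
# Sketch of the rule: an object stays unstable until the *last* write's
# sequence number is acknowledged (illustrative Python, not P4).

unstable = {}  # key-range id -> seq# of the last write seen
next_seq = 0   # monotonically increasing write sequence number

def on_write_request(range_id):
    """Switch sees a write request: record the range's last seq#."""
    global next_seq
    next_seq += 1
    unstable[range_id] = next_seq  # range is now unstable
    return next_seq

def on_write_response(range_id, seq):
    """Switch sees a write response: stable only if it acks the last write."""
    if unstable.get(range_id) == seq:
        del unstable[range_id]  # last write acknowledged: stable again

w1 = on_write_request("r0")
w2 = on_write_request("r0")   # concurrent second write to the same range
on_write_response("r0", w1)   # w1 acked, but w2 is still outstanding
assert "r0" in unstable       # still unstable
on_write_response("r0", w2)   # last write acked
assert "r0" not in unstable   # stable again
```
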
SLIDE 39

Outline

  • Overview of programmable switches
  • FLAIR architecture
  • Implementation
  • Evaluation


SLIDE 40

Implementation

FlairKV is a key-value store that optimizes Raft using FLAIR

  • Implemented using the P4 language
  • Utilizes 12 registers and 30 tables (only 5% of ASIC memory)
  • Implemented consistency-aware load balancing techniques:
    • Random
    • Leader avoidance
    • Follower load awareness
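The three policies might be sketched as replica-selection functions (illustrative Python; the exact policy details are assumptions, not FlairKV's code):

```python
# Sketch of the three consistency-aware load-balancing policies named
# above, as selection functions over the set of consistent replicas
# (illustrative Python; the policy details are assumptions).
import random

def pick_random(replicas, loads, leader):
    """Random: any consistent replica, leader included."""
    return random.choice(replicas)

def pick_leader_avoidance(replicas, loads, leader):
    """Leader avoidance: prefer followers so the leader keeps capacity
    for writes; fall back to the leader only if it is the sole option."""
    followers = [r for r in replicas if r != leader]
    return random.choice(followers) if followers else leader

def pick_load_aware(replicas, loads, leader):
    """Follower load awareness: least-loaded consistent follower."""
    followers = [r for r in replicas if r != leader] or [leader]
    return min(followers, key=lambda r: loads.get(r, 0))

loads = {"L": 10, "F1": 3, "F2": 7}
assert pick_leader_avoidance(["L", "F1", "F2"], loads, "L") in ("F1", "F2")
assert pick_load_aware(["L", "F1", "F2"], loads, "L") == "F1"
```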

SLIDE 41

Evaluation

Alternatives

  • Leader-based (Raft, VR)
  • Leader lease (Opt. Raft)
  • Unreplicated
  • Fast Paxos
  • Follower leases (Fleases)

Metrics

  • Throughput
  • Latency
  • Load balancing efficiency

We varied:

  • Number of replicas
  • Number of clients
  • Read-to-write ratio
  • Workload skewness
  • Data set size

SLIDE 42

Throughput

(Graph: throughput in 1000 ops/sec, 100 to 500, vs. number of clients, 1 to 256. Workload: YCSB-B, 95% reads, uniform.)

SLIDE 43

Throughput

(Graph: throughput in 1000 ops/sec vs. number of clients, 1 to 256. Workload: YCSB-B, 95% reads, uniform. Curves: Raft, Fast Paxos, VR.)

SLIDE 44

Throughput

(Graph: throughput in 1000 ops/sec vs. number of clients, 1 to 256. Workload: YCSB-B, 95% reads, uniform. Curves: Raft, Fast Paxos, VR, Opt. Raft, Unrep.)

SLIDE 45

Throughput

(Graph: throughput in 1000 ops/sec vs. number of clients, 1 to 256. Workload: YCSB-B, 95% reads, uniform. Curves: Raft, Fast Paxos, VR, Opt. Raft, Unrep., Fleases.)

SLIDE 46

Throughput

(Graph: throughput in 1000 ops/sec vs. number of clients, 1 to 256. Workload: YCSB-B, 95% reads, uniform. Curves: Raft, Fast Paxos, VR, Opt. Raft, Unrep., Fleases, FlairKV. Callouts: 42% higher throughput; 2.1x higher throughput.)

SLIDE 47

Read Latency

(Graph: CDF of read latency. Workload: YCSB-B, 95% reads, uniform.)

SLIDE 48

Read Latency

(Graph: CDF of read latency. Workload: YCSB-B, 95% reads, uniform.)

SLIDE 49

Read Latency

(Graph: CDF of read latency. Workload: YCSB-B, 95% reads, uniform.)

SLIDE 50

Read Latency

(Graph: CDF of read latency. Workload: YCSB-B, 95% reads, uniform. FlairKV shows 50% lower latency.)

FLAIR reduces latency:

  • Avoids inconsistent replicas
  • Avoids the leader

SLIDE 51

Conclusion

  • FLAIR is a shim layer atop leader-based consensus protocols
    • Exploits programmable switches
    • Builds in-network consistency-aware load balancing
    • Maintains linearizability
  • FlairKV achieves
    • 2.1x higher throughput than classical approaches, and up to 1.4x higher than leases
    • 2.4x lower latency than classical approaches, and up to 1.5x lower than leases

Despite their limitations, programmable switches can be leveraged to accelerate complex system protocols.

SLIDE 52

FLAIR project: https://wasl.uwaterloo.ca/projects/flair/