SLIDE 1

Sharding

SLIDE 2

Scaling Paxos: Shards

We can use Paxos to decide on the order of operations, e.g., to a key-value store

  • leader sends each op to all servers
  • practical limit on how many ops/second

What if we want to scale to more clients? Shard among multiple Paxos groups (see the sketch below)

  • partition the key-space among groups
  • for single-key operations, still linearizable
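A minimal sketch of the idea, purely illustrative (the split points, names, and interface below are made up and are not Lab 4's actual API): the key-space is partitioned among Paxos groups, and a client sends each single-key operation to the group that owns the key.

```go
// Illustrative only: partition the key-space among Paxos groups by key range.
package main

import "fmt"

// Split points are made up: group 0 owns keys below "h",
// group 1 owns ["h", "p"), group 2 owns the rest.
var splits = []string{"h", "p"}

// groupFor returns the index of the Paxos group responsible for key.
func groupFor(key string) int {
	for i, s := range splits {
		if key < s {
			return i
		}
	}
	return len(splits)
}

func main() {
	for _, k := range []string{"apple", "kiwi", "zebra"} {
		// A client sends Get/Put(k) to the leader of group groupFor(k).
		// Each group orders its own ops with Paxos, so single-key
		// operations remain linearizable.
		fmt.Printf("key %q -> Paxos group %d\n", k, groupFor(k))
	}
}
```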
SLIDE 3

Replicated, Sharded Database

[Diagram: several Paxos groups, each running a replicated state machine over its shard of the key-value store]

SLIDE 4

Replicated, Sharded Database

[Same diagram] Which keys are where?

SLIDE 5

Lab 4 (and other systems)

[Diagram: several Paxos groups, each running a replicated state machine, plus a Paxos-replicated shard master]

SLIDE 6

Replicated, Sharded Database

Shard master decides

  • which Paxos group has which keys

Shards operate independently

How do clients know who has what keys?

  • Ask shard master? Becomes the bottleneck!
  • Avoid shard master communication if possible

Can clients predict which group has which keys?

SLIDE 7

Recurring Problem

Client needs to access some resource, sharded for scalability

How does the client find the specific server to use? Central redirection won't scale!

SLIDES 8-13

Another scenario

[Animation: a client sends GET index.html to a web server and gets back index.html, which links to logo.jpg, jquery.js, …; the client (and later a second client) then fetches logo.jpg and jquery.js from a pool of caches: Cache 1, Cache 2, Cache 3]

SLIDE 14

Other Examples

Scalable stateless web front ends (FE)

  • cache efficient iff same client goes to same FE

Scalable shopping cart service
Scalable email service
Scalable cache layer (Memcache)
Scalable network path allocation
Scalable network function virtualization (NFV)
…

SLIDE 15

What’s in common?

Want to assign keys to servers with minimal communication and fast lookup

Requirement 1: clients all have same assignment

SLIDES 16-18

Proposal 1

For n nodes, a key k goes to k mod n

[Example: Cache 1 holds "a", "d", "ab"; Cache 2 holds "b"; Cache 3 holds "c"]

Problems with this approach?

  • uneven distribution of keys

SLIDE 19

A Bit of Queueing Theory

Assume Poisson arrivals:

  • random, uncorrelated, memoryless
  • utilization (U): fraction of time server is busy (0 - 1)
  • service time (S): average time per request
SLIDE 20

Queueing Theory

[Plot: response time R vs. utilization U; R rises slowly at low utilization, then shoots toward 100 S as U approaches 1]

R = S / (1 - U)

Variance in response time ~ S / (1 - U)^2
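To make the formula concrete: at U = 0.5, R = 2S; at U = 0.9, R = 10S; at U = 0.99, R = 100S. Response time (and its variance) explodes as utilization approaches 1, which is why uneven load across servers is so costly.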

SLIDE 21

Requirements, revisited

Requirement 1: clients all have same assignment
Requirement 2: keys uniformly distributed

SLIDES 22-26

Proposal 2: Hashing

For n nodes, a key k goes to hash(k) mod n

Hash distributes keys uniformly

[Example: with 3 caches, h("a")=1, h("abc")=2, h("b")=3; after adding Cache 4, h("a")=3 and h("b")=4]

But, new problem: what if we add a node?

  • Redistribute a lot of keys! (on average, all but K/n of them move)
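A small sketch of Proposal 2 (the hash function and key set are just illustrative): assign key k to server hash(k) mod n, then count how many keys land on a different server when n changes.

```go
// Assign key k to server hash(k) mod n; see how many keys move when n grows.
package main

import (
	"fmt"
	"hash/fnv"
)

func assign(key string, n uint32) uint32 {
	h := fnv.New32a()
	h.Write([]byte(key))
	return h.Sum32() % n
}

func main() {
	keys := []string{"a", "b", "c", "d", "ab", "abc", "logo.jpg", "jquery.js"}

	moved := 0
	for _, k := range keys {
		before, after := assign(k, 3), assign(k, 4) // grow from 3 to 4 servers
		if before != after {
			moved++
		}
	}
	// With hash(k) mod n, most keys land on a different server after the
	// change: on average, all but K/n of them.
	fmt.Printf("%d of %d keys moved going from 3 to 4 servers\n", moved, len(keys))
}
```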

SLIDE 27

Requirements, revisited

Requirement 1: clients all have same assignment
Requirement 2: keys uniformly distributed
Requirement 3: add/remove node moves only a few keys

SLIDES 28-40

Proposal 3: Consistent Hashing

First, hash the node ids onto a ring of hash values from 0 to 2^32

Keys are hashed too; each key goes to the “next” node on the ring

[Animation: Cache 1, Cache 2, Cache 3 are placed at hash(1), hash(2), hash(3); keys "a" and "b" are placed at hash("a") and hash("b") and assigned to the next cache after them on the ring]

SLIDES 41-47

Proposal 3: Consistent Hashing

[Diagram: ring with Cache 1, Cache 2, Cache 3 and keys "a", "b"]

What if we add a node?

[Cache 4 joins the ring] Only "b" has to move! On average, K/n keys move
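A minimal consistent-hashing sketch (node names, keys, and the FNV hash are illustrative): nodes and keys are hashed onto a 32-bit ring, and a key is owned by the first node at or after its hash, wrapping around.

```go
// Consistent hashing: a sorted ring of node positions; lookup finds the
// first node clockwise from hash(key).
package main

import (
	"fmt"
	"hash/fnv"
	"sort"
)

func hash32(s string) uint32 {
	h := fnv.New32a()
	h.Write([]byte(s))
	return h.Sum32()
}

type Ring struct {
	points []uint32          // sorted node positions on the ring
	owner  map[uint32]string // position -> node name
}

func NewRing(nodes ...string) *Ring {
	r := &Ring{owner: map[uint32]string{}}
	for _, n := range nodes {
		r.Add(n)
	}
	return r
}

func (r *Ring) Add(node string) {
	p := hash32(node)
	r.owner[p] = node
	r.points = append(r.points, p)
	sort.Slice(r.points, func(i, j int) bool { return r.points[i] < r.points[j] })
}

// Lookup returns the node owning key: the first point at or after hash(key),
// wrapping around to the start of the ring.
func (r *Ring) Lookup(key string) string {
	h := hash32(key)
	i := sort.Search(len(r.points), func(i int) bool { return r.points[i] >= h })
	if i == len(r.points) {
		i = 0
	}
	return r.owner[r.points[i]]
}

func main() {
	ring := NewRing("cache1", "cache2", "cache3")
	fmt.Println("a ->", ring.Lookup("a"), " b ->", ring.Lookup("b"))

	ring.Add("cache4") // only keys between cache4's predecessor and cache4 move
	fmt.Println("a ->", ring.Lookup("a"), " b ->", ring.Lookup("b"))
}
```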

SLIDE 48

Load Balance

Assume # keys >> # of servers

  • For example, 100K users -> 100 servers

How far off of equal balance is hashing?

  • What is typical worst case server?

How far off of equal balance is consistent hashing?

  • What is typical worst case server?
SLIDE 49

Proposal 3: Consistent Hashing

[Diagram: ring with Cache 1-4 and keys "a", "b"]

Only "b" has to move! On average, K/n keys move, but all between two nodes

SLIDE 50

Requirements, revisited

Requirement 1: clients all have same assignment
Requirement 2: keys uniformly distributed
Requirement 3: add/remove node moves only a few keys
Requirement 4: minimize worst case overload
Requirement 5: parcel out work of redistributing keys

SLIDES 51-54

Proposal 4: Virtual Nodes

First, hash the node ids to multiple locations on the ring

As it turns out, hash functions come in families such that their members are independent, so this is easy!

[Diagram: ring from 0 to 2^32 with Cache 1, Cache 2, Cache 3 each appearing at several positions, labeled by which copy of the node it is]

SLIDES 55-58

Prop 4: Virtual Nodes

[Diagram: ring with the virtual nodes of Cache 1, Cache 2, Cache 3 interleaved]

Keys more evenly distributed, and migration is evenly spread out.
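A sketch of virtual nodes (V, the server names, and the "name#v" salting are illustrative): each server is hashed onto the ring at V positions; salting the name with the replica index is a simple stand-in for picking the v-th member of a hash-function family.

```go
// Virtual nodes: each server appears V times on the ring.
package main

import (
	"fmt"
	"hash/fnv"
	"sort"
)

const V = 100 // virtual nodes per server (a tuning knob)

func hash32(s string) uint32 {
	h := fnv.New32a()
	h.Write([]byte(s))
	return h.Sum32()
}

type vnode struct {
	pos    uint32
	server string
}

func buildRing(servers []string) []vnode {
	var ring []vnode
	for _, s := range servers {
		for v := 0; v < V; v++ {
			ring = append(ring, vnode{hash32(fmt.Sprintf("%s#%d", s, v)), s})
		}
	}
	sort.Slice(ring, func(i, j int) bool { return ring[i].pos < ring[j].pos })
	return ring
}

func lookup(ring []vnode, key string) string {
	h := hash32(key)
	i := sort.Search(len(ring), func(i int) bool { return ring[i].pos >= h })
	if i == len(ring) {
		i = 0 // wrap around
	}
	return ring[i].server
}

func main() {
	ring := buildRing([]string{"cache1", "cache2", "cache3"})
	// Count keys per server: load is much closer to even than with a single
	// point per server, and when a server joins or leaves, the moved keys are
	// spread across all the other servers rather than one neighbor.
	counts := map[string]int{}
	for i := 0; i < 100000; i++ {
		counts[lookup(ring, fmt.Sprintf("key%d", i))]++
	}
	fmt.Println(counts)
}
```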

SLIDE 59

How Many Virtual Nodes?

How many virtual nodes do we need per server?

  • to spread worst case load
  • to distribute migrating keys

Assume 100000 clients, 100 servers

  • 10?
  • 100?
  • 1000?
  • 10000?
SLIDE 60

Requirements, revisited

Requirement 1: clients all have same assignment
Requirement 2: keys uniformly distributed
Requirement 3: add/remove node moves only a few keys
Requirement 4: minimize worst case overload
Requirement 5: parcel out work of redistributing keys

SLIDE 61

Key Popularity

  • What if some keys are more popular than others?
  • Hashing is no longer load balanced!
  • One model for popularity is the Zipf distribution
  • Popularity of the kth most popular item ~ 1/k^c, with 1 < c < 2
  • Ex: 1, 1/2, 1/3, … 1/100 … 1/1000 … 1/10000
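For a rough sense of the skew: with c = 1 and 10,000 items, the top 100 items draw about half of all requests (the harmonic sums give roughly 5.2 vs. 9.8), and larger c concentrates demand even more. A handful of hot keys can therefore overload whichever server they hash to.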
SLIDE 62

Zipf “Heavy Tail” Distribution

SLIDE 63

Zipf Examples

  • Web pages
  • Movies
  • Library books
  • Words in text
  • Salaries
  • City population
  • Twitter followers

Whenever popularity is self-reinforcing

Popularity changes dynamically: what is popular right now?

SLIDE 64

Proposal 5: Table Indirection

Consistent hashing is (mostly) stateless

  • Map is a hash function of # servers, # virtual nodes
  • Unbalanced with Zipf workloads, dynamic load

Instead, put a small table on each client: O(# vnodes)

  • table[hash(key)] -> server
  • Same table on every client
  • Shard master adjusts table entries to balance load
  • Periodically broadcast new table
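A sketch of table indirection (the bucket count, assignments, and server names are made up): the hash range is split into a fixed number of buckets, and a small table, identical on every client, maps each bucket to a server; the shard master can reassign buckets away from a busy server and broadcast the new table.

```go
// Table indirection: a small, shared bucket -> server table.
package main

import (
	"fmt"
	"hash/fnv"
)

const NBuckets = 8 // O(# vnodes); small enough to ship to every client

// table[b] = server responsible for bucket b; same table on every client.
var table = [NBuckets]string{
	"cache1", "cache2", "cache3", "cache3",
	"cache3", "cache2", "cache2", "cache2",
}

func lookup(key string) string {
	h := fnv.New32a()
	h.Write([]byte(key))
	return table[h.Sum32()%NBuckets]
}

func main() {
	fmt.Println(lookup("despacito"), lookup("paxos"))

	// If cache3 becomes overloaded, the shard master hands one of its
	// buckets to another server and pushes the updated table to clients.
	table[3] = "cache1"
	fmt.Println(lookup("despacito"), lookup("paxos"))
}
```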
SLIDES 65-67

Table Indirection

[Diagram: the 0 to 2^32 hash range split into buckets assigned to servers 1, 2, 3, 3, 3, 2, 2, 2; lookups like hash("despacito") and hash("paxos") index into the table]

Split the hash range into buckets and assign each bucket to a server; a busy server gets fewer buckets (low-load servers get more), and the assignment can change over time

SLIDE 68

Proposal 6: Power of Two Choices

Read-only or stateless workloads:

  • allow any task to be handled on one of two servers
  • pair picked at random: hash(k), hash’(k)
  • (using consistent hashing with virtual nodes)
  • periodically collect data about server load
  • send new work to less loaded server of the two
  • or with likelihood ~ (1 - load)
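A sketch of the power of two choices (server names, load numbers, and the salted-hash trick are illustrative; the slide pairs this with consistent hashing and virtual nodes, while this sketch uses plain hashing to stay short): each key hashes to two candidate servers, and the request goes to whichever candidate currently reports less load.

```go
// Power of two choices: pick the less loaded of two hashed candidates.
package main

import (
	"fmt"
	"hash/fnv"
)

var servers = []string{"cache1", "cache2", "cache3"}

// load would be refreshed periodically from server-reported statistics.
var load = map[string]int{"cache1": 10, "cache2": 80, "cache3": 25}

// hashWith simulates two independent hash functions by salting the key.
func hashWith(salt, key string) int {
	h := fnv.New32a()
	h.Write([]byte(salt + key))
	return int(h.Sum32()) % len(servers)
}

// pick returns the less loaded of the key's two candidate servers.
func pick(key string) string {
	a := servers[hashWith("h1:", key)]
	b := servers[hashWith("h2:", key)]
	if load[b] < load[a] {
		return b
	}
	return a
}

func main() {
	for _, k := range []string{"despacito", "paxos"} {
		fmt.Printf("%s -> %s\n", k, pick(k))
	}
}
```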
SLIDE 69

Power of Two Choices

Why does this work?

  • every key assigned to a different random pair
  • suppose k1 happens to map to the same server as a popular key k2
  • k1's alternate is very likely to be different from k2's alternate

Generalize: spread very busy keys over more choices

SLIDES 70-73

Power of Two Choices

[Diagram: hash("despacito") picks the pair Cache 1 / Cache 2, hash("paxos") picks the pair Cache 2 / Cache 3; each request goes to the less loaded server of its pair]

SLIDE 74

Requirements, revisited

Requirement 1: clients all have same assignment
Requirement 2: keys uniformly distributed
Requirement 3: add/remove node moves only a few keys
Requirement 4: minimize worst case overload
Requirement 5: parcel out work of redistributing keys
Requirement 6: balance work even with Zipf demand

SLIDE 75

Next

“Distributed systems in practice”

  • Memcache: scalable caching layer between stateless front ends and storage
  • GFS: scalable distributed storage for stream files
  • BigTable: scalable key-value store
  • Spanner: cross-data center transactional key-value store

SLIDE 76

Thursday

Yegge on Service-Oriented Architectures

  • Steve Yegge, prolific programmer and blogger
  • Moved from Amazon to Google
  • Reading is an accidentally-leaked memo about differences between Amazon's and Google's system architectures (at that time)
  • SOA: separate applications (e.g. Google Search) into many primitive services, run internally as products