

SLIDE 1

Chubby

Doug Woos

SLIDE 2

Logistics notes

Lab 3a due tonight

Friday’s class is in GWN 201!

SLIDE 3

Next few papers

Three real-world systems from Google:

  • Chubby: coordination service
  • BigTable: storage for structured data
  • GFS: storage for bulk data

All highly influential; all have open-source clones:

  • Chubby -> ZooKeeper, etcd
  • BigTable -> HBase, Cassandra, other NoSQL stores
  • GFS -> HDFS

SLIDE 4

Chubby

Distributed coordination service

Goal: allow client applications to synchronize and manage dynamic configuration state

Intuition: only some parts of an app need consensus!

  • Lab 2: Highly available view service
  • Master election in a distributed FS (e.g. GFS)
  • Metadata for sharded services

Implementation: (Multi-)Paxos SMR

SLIDE 5

Why Chubby?

Many applications need coordination (locking, metadata, etc.)

“Every sufficiently complicated distributed system contains an ad-hoc, informally-specified, bug-ridden, slow implementation of Paxos”

Paxos is a known good solution

(Multi-)Paxos is hard to implement and use

SLIDE 6

How to do consensus as a service

Chubby provides:

  • Small files
  • Locking
  • “Sequencers”

Filesystem-like API

  • Open, Close, Poison
  • GetContents, SetContents, Delete
  • Acquire, TryAcquire, Release
  • GetSequencer, SetSequencer, CheckSequencer
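
A sequencer records which lock a holder owns and the generation at which it acquired it; a server receiving a request calls CheckSequencer to reject work from a client that has since lost the lock. Below is a minimal sketch of the idea in Go (not Chubby's actual encoding or API: the lockService type, the struct fields, and the generation counter are illustrative stand-ins):

package main

import "fmt"

// Sequencer names a lock and the generation at which it was acquired.
// Real Chubby encodes this as an opaque byte string; a struct keeps
// the sketch simple.
type Sequencer struct {
    Lock       string
    Generation uint64
}

// lockService is a hypothetical in-memory stand-in for a Chubby cell.
type lockService struct {
    generation map[string]uint64 // current generation of each lock
}

// Acquire bumps the lock's generation and returns a sequencer for it,
// as GetSequencer would after a successful Acquire.
func (s *lockService) Acquire(lock string) Sequencer {
    s.generation[lock]++
    return Sequencer{Lock: lock, Generation: s.generation[lock]}
}

// CheckSequencer reports whether the sequencer still describes the
// current holder, i.e. the lock has not been lost and re-acquired.
func (s *lockService) CheckSequencer(seq Sequencer) bool {
    return s.generation[seq.Lock] == seq.Generation
}

func main() {
    cell := &lockService{generation: make(map[string]uint64)}

    stale := cell.Acquire("/ls/cell/service/primary") // first primary
    fresh := cell.Acquire("/ls/cell/service/primary") // lock lost, re-acquired

    // A downstream server checks the sequencer attached to each request
    // and drops work from the stale holder.
    fmt.Println("stale holder accepted:", cell.CheckSequencer(stale)) // false
    fmt.Println("fresh holder accepted:", cell.CheckSequencer(fresh)) // true
}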
SLIDE 8

Example: primary election

x = Open("/ls/cell/service/primary")
if (TryAcquire(x) == success) {
  // I'm the primary, tell everyone
  SetContents(x, my-address)
} else {
  // I'm not the primary, find out who is
  primary = GetContents(x)
  // also set up notifications
  // in case the primary changes
}

SLIDES 9-19

Example

[Animation: a Chubby cell (the Paxos group) with two app servers and a client. One app calls TryAcquire and gets OK, becoming the primary; the other app's TryAcquire is answered "Nope", so it becomes a backup. The primary's address is read with GetContents, and the client then sends its requests directly to the primary.]

SLIDE 20

Why a lock service?

One option: a Paxos library (these exist). Why a service?

  • Easier to add to existing systems
  • Want to store small amounts of data, e.g. names, externally (for clients)
  • Developers don’t understand Paxos!
  • As it turns out, they don’t understand locks either
  • Can have fewer app servers
SLIDE 21

Performance

Not highly optimized!

Later (and last Thursday): how to do Paxos, fast

Paxos implementation: ~1000 ops/s

Initially, needed to handle ~5000 ops/s

How to scale?

  • Adding nodes to Paxos group?
SLIDE 22

Performance

  • Batching
  • Partitioning
  • Leases
  • (Consistent) Caching
  • Proxies

SLIDE 23

Batching

Master accumulates requests from many clients

Does one round of Paxos to commit all to log

Big throughput gains at expense of latency

  • Classic systems trick (e.g. disks)
  • Ubiquitous in systems w/o latency requirements
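
Below is a sketch of the loop in Go, with commitBatch standing in for one Paxos round (the channel-based queue, batch size, and timer are illustrative, not Chubby's code): requests accumulate until the batch fills or a short timer fires, then commit together as a single log entry, so maxWait bounds the added latency.

package main

import (
    "fmt"
    "time"
)

// commitBatch stands in for one Paxos round that appends the whole
// batch to the replicated log as a single entry.
func commitBatch(batch []string) {
    fmt.Printf("one Paxos round commits %d ops: %v\n", len(batch), batch)
}

// batchLoop accumulates requests until the batch is full or a short
// timer fires, then commits them all at once.
func batchLoop(requests <-chan string, maxBatch int, maxWait time.Duration) {
    for req := range requests {
        batch := []string{req}
        timer := time.After(maxWait)
    fill:
        for len(batch) < maxBatch {
            select {
            case r, ok := <-requests:
                if !ok {
                    break fill // no more requests; commit what we have
                }
                batch = append(batch, r)
            case <-timer:
                break fill // latency bound reached
            }
        }
        commitBatch(batch)
    }
}

func main() {
    reqs := make(chan string)
    go func() {
        for i := 0; i < 10; i++ {
            reqs <- fmt.Sprintf("op%d", i)
        }
        close(reqs)
    }()
    batchLoop(reqs, 4, 5*time.Millisecond)
}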
SLIDE 24

Partitioning

Run multiple Paxos groups, each responsible for different keys

Each replica is master for some groups and a follower for others

Common in practice

  • Alternative: Egalitarian Paxos
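
The routing step is a deterministic map from file name to Paxos group, so every client and replica agrees on who owns which keys without coordination. A sketch in Go, assuming simple hash partitioning of the namespace (the group count and paths are illustrative):

package main

import (
    "fmt"
    "hash/fnv"
)

const numGroups = 5 // independent Paxos groups in the cell (illustrative)

// groupFor deterministically routes a file name to one Paxos group,
// so clients and replicas agree on ownership.
func groupFor(path string) int {
    h := fnv.New32a()
    h.Write([]byte(path))
    return int(h.Sum32() % numGroups)
}

func main() {
    for _, p := range []string{
        "/ls/cell/service/primary",
        "/ls/cell/bigtable/root",
        "/ls/cell/gfs/master",
    } {
        fmt.Printf("%-26s -> group %d\n", p, groupFor(p))
    }
}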
SLIDE 25

Leases

Most requests are reads

Want to avoid communication on reads

  • Communication not needed for durability
  • Just need to ensure master hasn’t changed

Optimization: master gets lease, renewed while up

  • Chubby: ~10s
  • Master can process reads alone if holding lease
  • If master fails, need to wait 10s before new master can respond to requests (why?)
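
With a lease, the read fast path reduces to a clock check. A sketch in Go with a hypothetical master type (a real implementation must also allow for clock skew between master and replicas):

package main

import (
    "errors"
    "fmt"
    "time"
)

// master holds a lease granted by a majority of replicas; while it is
// valid, no other master can exist, so reads need no Paxos round.
type master struct {
    leaseExpiry time.Time
    data        map[string]string
}

var errLeaseExpired = errors.New("lease expired: renew before serving reads")

// Read serves from local state only while the lease holds; otherwise
// the master must renew (or a new election must happen) first.
func (m *master) Read(key string) (string, error) {
    if time.Now().After(m.leaseExpiry) {
        return "", errLeaseExpired
    }
    return m.data[key], nil
}

func main() {
    m := &master{
        leaseExpiry: time.Now().Add(10 * time.Second), // ~10s, as in Chubby
        data:        map[string]string{"/ls/cell/service/primary": "10.0.0.7:4321"},
    }
    fmt.Println(m.Read("/ls/cell/service/primary"))
}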

SLIDE 26

Caching

Chubby uses client caching heavily

  • file data
  • file metadata (incl. non-existence!)

Write-through, strong leases (+ invalidations)

  • Master tracks which clients might have file cached
  • Sends invalidations on update
  • Caches expire automatically after 12s
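
On the master, the bookkeeping is a map from each file to the sessions that might cache it; a write invalidates those copies before it becomes visible. A sketch in Go with hypothetical types (real Chubby holds the write until invalidations are acknowledged or the cache leases expire):

package main

import "fmt"

// session stands in for a connected client with a local cache.
type session struct {
    name  string
    cache map[string]string
}

func (s *session) invalidate(file string) {
    delete(s.cache, file)
    fmt.Printf("%s: dropped cached copy of %s\n", s.name, file)
}

// cellMaster tracks which sessions might cache each file.
type cellMaster struct {
    store   map[string]string
    holders map[string][]*session // file -> sessions that may cache it
}

// SetContents invalidates every cached copy, then applies the write,
// so no session can read a stale value afterward.
func (m *cellMaster) SetContents(file, value string) {
    for _, s := range m.holders[file] {
        s.invalidate(file)
    }
    m.holders[file] = nil // nobody caches it until they read again
    m.store[file] = value
}

func main() {
    a := &session{name: "clientA", cache: map[string]string{"/ls/x": "old"}}
    m := &cellMaster{
        store:   map[string]string{"/ls/x": "old"},
        holders: map[string][]*session{"/ls/x": {a}},
    }
    m.SetContents("/ls/x", "new")
    fmt.Println("store:", m.store["/ls/x"], "| clientA cache:", a.cache["/ls/x"])
}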
SLIDE 27

Proxies

KeepAlives and invalidations are a huge % of load

Use proxies to track state for groups of clients

  • To master, proxies act exactly like clients
  • To clients, proxies act exactly like master

[Diagram: Client ↔ Proxy ↔ Master]
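
The win is fan-in: a proxy holds one session with the master on behalf of many clients, so the master's KeepAlive load scales with the number of proxies rather than the number of clients. A back-of-the-envelope sketch in Go (the numbers are illustrative):

package main

import "fmt"

// keepAliveStreams returns how many KeepAlive sessions the master sees
// for a given client population: one per client without proxies, one
// per proxy with them (rounded up).
func keepAliveStreams(clients, clientsPerProxy int) int {
    if clientsPerProxy <= 1 {
        return clients
    }
    return (clients + clientsPerProxy - 1) / clientsPerProxy
}

func main() {
    fmt.Println("no proxies:", keepAliveStreams(50000, 1))    // 50000 sessions
    fmt.Println("100/proxy: ", keepAliveStreams(50000, 100))  // 500 sessions
}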

SLIDE 28

Handling failure

Replica failure: no problem

Master failure

Client failure

SLIDE 29

Master failure

Client stops hearing from master

  • Notifies application (stop sending new requests!)
  • Clears cache
  • “Grace period” begins (wait for election before giving up on Chubby entirely)
  • If new master found, continue
  • Otherwise, throw an error to the application
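
A sketch of that client-side flow in Go, with tryKeepAlive as a hypothetical stand-in for a KeepAlive RPC to whichever replica claims to be master (the intervals and grace period here are illustrative):

package main

import (
    "errors"
    "fmt"
    "time"
)

// tryKeepAlive stands in for a KeepAlive RPC; here it fails until a
// new master has been elected (simulated by the attempt count).
func tryKeepAlive(attempt int) error {
    if attempt < 3 {
        return errors.New("no master")
    }
    return nil
}

// reconnect is the library-side flow: tell the app to pause, drop the
// cache, then retry through the grace period before giving up.
func reconnect(grace time.Duration) error {
    fmt.Println("jeopardy: pause new requests, clear cache")
    deadline := time.Now().Add(grace)
    for attempt := 0; time.Now().Before(deadline); attempt++ {
        if tryKeepAlive(attempt) == nil {
            fmt.Println("new master found; session continues")
            return nil
        }
        time.Sleep(10 * time.Millisecond) // retry interval (illustrative)
    }
    return errors.New("grace period expired: report error to application")
}

func main() {
    if err := reconnect(time.Second); err != nil {
        fmt.Println(err)
    }
}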
SLIDE 30

Master failure

Meanwhile, in the Chubby cell…

If master has failed:

  • Do leader election (PMMC)
  • Rebuild state from other replicas + clients
  • Wait for old lease to expire!
SLIDE 31

Performance

~50k clients per cell

~20k files

  • Majority are open at any given time
  • Most < 1k
  • All < 256k (hard limit—why?)

2k RPCs/s

  • 93% KeepAlives!

All of these numbers probably bigger now!

SLIDE 32

Name service

Surprising dominant use case: name servers!

Problems with DNS:

  • Designed for web, where slow propagation OK
  • Weak leases
  • Performance bad (see Ousterhout!) if TTLs are low

Chubby: decent performance, strongly consistent

Why not use Chubby on the web?

SLIDE 33

Discussion

Most errors in failover code

  • Netflix: Chaos Monkey

Chubby metadata stored in Chubby itself

Advisory vs. mandatory locks

Importance of programmer convenience

  • Locks—familiar, but programmers get it wrong!

How much are clients trusted?

Note: interesting paper called “Paxos Made Live”

  • Making Paxos work within Chubby