

SLIDE 1

Memcache as a Service

Tom Anderson

SLIDE 2

Goals

Rapid application development (“velocity”)

  • Speed of adding new features is paramount

Scale

– Billions of users
– Every user on FB all the time

Performance

– Low latency for every user everywhere

Fault tolerance

– Scale implies failures

Consistency model:

– “Best effort eventual consistency”

SLIDE 3

Facebook’s Scaling Problem

  • Rapidly increasing user base

– Small initial user base
– 2x every 9 months
– 2013: 1B users globally

  • Users read/update many times per day

– Increasingly intensive app logic per user
– 2x I/O every 4-6 months

  • Infrastructure has to keep pace
SLIDE 4

Scaling Strategy

Adapt off-the-shelf components where possible

Fix as you go

– no overarching plan

Rule of thumb: Every order of magnitude requires a rethink

SLIDE 5

Three-Tier Web Architecture

[Diagram: clients → front-end web servers → cache servers → storage servers]

SLIDE 6

Three-Tier Web Architecture

[Diagram: clients → front-end web servers → cache servers → storage servers]

SLIDE 7

Three-Tier Web Architecture

[Diagram: same tiers; a cache miss falls through from the cache servers to the storage servers]

Cache miss

SLIDE 8

Facebook Three Layer Architecture

  • Application front end

– Stateless, rapidly changing program logic
– If app server fails, redirect client to new app server

  • Memcache

– Lookaside key-value cache
– Keys defined by app logic (can be computed results)

  • Fault tolerant storage backend

– Stateful
– Careful engineering to provide safety and performance
– Both SQL and NoSQL

SLIDE 9

Facebook Workload

Each user’s page is unique

– draws on events posted by other users

Users not in cliques

– For the most part

User popularity is Zipf-distributed

– Some user posts affect very large numbers of other pages
– Most affect a much smaller number

SLIDE 10

Workload

  • Many small lookups
  • Many dependencies
  • App logic: many diffuse, chained reads

– latency of each read is crucial

  • Much smaller update rate

– still large in absolute terms

SLIDE 11

Scaling

  • A few servers
  • Many servers
  • An entire data center
  • Many data centers

Each step 10-100x previous one

SLIDE 12

Facebook

  • Scale by hashing to partitioned servers
  • Scale by caching
  • Scale by replicating popular keys
  • Scale by replicating clusters
  • Scale by replicating data centers
SLIDE 13

Scale By Consistent Hashing

Hash users to front-end web servers

Hash keys to memcache servers

Hash files to SQL servers

The result of consistent hashing is an all-to-all communication pattern

– Each web server pulls data from all memcache servers and all storage servers
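A minimal sketch of the consistent-hashing idea, assuming MD5 as the hash and illustrative server names (this is not Facebook's code): each server owns many points on a hash ring, and a key belongs to the first server point at or after the key's hash.

```python
import bisect
import hashlib

class HashRing:
    """Toy consistent-hash ring: each server gets many points on the ring;
    a key is owned by the first server point at or after the key's hash."""

    def __init__(self, servers, points_per_server=100):
        self.ring = []  # sorted list of (hash, server) pairs
        for server in servers:
            for i in range(points_per_server):
                point = self._hash(f"{server}#{i}")
                bisect.insort(self.ring, (point, server))

    @staticmethod
    def _hash(s):
        return int(hashlib.md5(s.encode()).hexdigest(), 16)

    def server_for(self, key):
        i = bisect.bisect(self.ring, (self._hash(key), ""))
        return self.ring[i % len(self.ring)][1]  # wrap around the ring

ring = HashRing(["mc0", "mc1", "mc2"])
print(ring.server_for("user:42:feed"))  # this key always maps to one cache server
```

Adding or removing a server only remaps the keys on that server's ring segments, which is why each tier can grow without reshuffling all of its data.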

SLIDE 14

Scale By Caching: Memcache

Sharded in-memory key-value cache

– Keys and values assigned by application code
– Values can be data or the result of a computation
– Independent of backend storage architecture (SQL, NoSQL) or format
– Designed for high volume, low latency

Lookaside architecture

SLIDE 15

Lookaside Read

[Diagram: web server, cache, SQL]

1. get k — the web server requests key k from the cache

SLIDE 16

Lookaside Read

[Diagram: web server, cache, SQL]

2. get k — on a miss, the web server fetches the data from SQL

SLIDE 17

Lookaside Read

[Diagram: web server, cache, SQL]

3. put k — the web server stores the fetched data back into the cache
SLIDE 18

Lookaside Operation (Read)

  • Webserver needs key value
  • Webserver requests from memcache
  • Memcache: If in cache, return it
  • If not in cache:

– Return error
– Webserver gets data from storage server
– Possibly an SQL query or complex computation
– Webserver stores result back into memcache
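A sketch of this read path, assuming hypothetical `cache` and `db` client objects (`query_for` is an illustrative stand-in for the SQL query or computation); the point is the protocol, not any particular API.

```python
def lookaside_read(cache, db, key):
    value = cache.get(key)           # 1. web server asks memcache first
    if value is not None:
        return value                 # hit: serve directly from cache
    # miss: memcache returns an error, so the web server does the work
    value = db.query_for(key)        # possibly an SQL query or complex computation
    cache.set(key, value)            # web server fills the cache itself
    return value
```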

SLIDE 19

Question

What if swarm of users read same key at the same time?

SLIDE 20

Lookaside Write

[Diagram: web server, cache, SQL]

1. update — the web server writes the new value to SQL (the old k is still cached)
SLIDE 21

Lookaside Write

[Diagram: web server, cache, SQL]

2. delete k — the web server invalidates the cached entry
SLIDE 22

Lookaside Operation (Write)

  • Webserver changes a value that would invalidate a memcache entry

– Could be an update to a key
– Could be an update to a table
– Could be an update to a value used to derive some key value

  • Client puts new data on storage server
  • Client invalidates entry in memcache
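The matching write path, with the same illustrative client objects; note the cache entry is deleted, not updated in place.

```python
def lookaside_write(cache, db, key, new_value):
    db.update_for(key, new_value)    # 1. apply the write at the storage layer
    cache.delete(key)                # 2. invalidate; the next read misses and refills
```

Deletes are idempotent, so a repeated or raced invalidation is harmless; the next reader simply refills the entry from storage.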
SLIDE 23

Why Not Delete then Update?

[Diagram: web server, cache, SQL]

1. delete k — the web server invalidates the cached entry first
SLIDE 24

Why Not Delete then Update?

[Diagram: web server, cache, SQL]

2. update — the web server then writes the new value to SQL
SLIDE 25

Why Not Delete then Update?

[Diagram: web server, cache, SQL]

2. update — same step, but a concurrent read can slip in between the delete and the update

Read miss might reload data before it is updated.

SLIDE 26

Memcache Consistency

Is memcache linearizable?

SLIDE 27

Example

Reader:
– Read cache
– If missing, fetch from database
– Store back to cache

Writer:
– Change database
– Delete cache entry

Interleave any number of readers/writers

SLIDE 28

Example

  • Reader: read cache (miss)
  • Reader: read database (old value)
  • Writer: change database
  • Writer: delete cache entry
  • Reader: store old value back to cache

The reader's delayed write-back lands after the writer's delete, so the stale value sits in the cache until the next invalidation.
SLIDE 29

Memcache Consistency

Is the lookaside protocol eventually consistent?

SLIDE 30

Lookaside With Leases

Goals:

– Reduce (not eliminate) per-key inconsistencies
– Reduce cache miss swarms

On a read miss:

– leave a marker in the cache (fetch in progress)
– return timestamp
– check timestamp when filling the cache
– if changed in meantime: don't overwrite

If another thread read misses:

– find marker and wait for update (retry later)
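A toy, single-process model of this lease scheme (assumed semantics; real memcached leases differ in detail, and a token stands in for the slide's timestamp).

```python
import uuid

class LeaseCache:
    def __init__(self):
        self.data = {}     # key -> value
        self.leases = {}   # key -> token for an in-progress fill

    def get(self, key):
        if key in self.data:
            return ("hit", self.data[key])
        if key in self.leases:
            return ("wait", None)       # someone else is filling: retry later
        token = uuid.uuid4().hex        # marker: fetch in progress
        self.leases[key] = token
        return ("miss", token)

    def set_with_lease(self, key, value, token):
        if self.leases.get(key) != token:
            return False                # key changed in the meantime: don't overwrite
        del self.leases[key]
        self.data[key] = value
        return True

    def delete(self, key):
        self.data.pop(key, None)
        self.leases.pop(key, None)      # a write invalidates any outstanding lease
```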

SLIDE 31

Question

What if front end crashes while holding read lease? Would any other front end be able to read the data?

SLIDE 32

Question

Is FB’s version of lookaside with leases linearizable?

SLIDE 33

Example: Cache data with 1 replica

Reader1 (data cached):
– Read replica1 (old value)
– Read replica1 (old value)

Writer:
– Change database
– CRASH! (before delete cache)

SLIDE 34

Question

Is FB’s version of lookaside with leases linearizable?

Note: FB allows popular data to be found in multiple cache servers.

SLIDE 35

FB Replicates Popular Data Across Caches

[Diagram: two clients repeatedly refreshing; the popular “Dance monkey” post is served from multiple cache servers]

SLIDE 36

Example: Cache data with 2 replicas

Reader1 (data cached):
– Read replica1 (old value)
– Read replica1 (old value)

Writer:
– Change database
– CRASH! (before delete cache)

Reader2 (not cached):
– Read replica2: miss
– Fetch from db
– Write back to replica2 (new value)

Reader1 keeps seeing the old value while Reader2 sees the new one: not linearizable.

slide-37
SLIDE 37

Latency Optimizations

Concurrent lookups

– Issue many lookups concurrently
– Prioritize those that have chained dependencies

Batching

– Batch multiple requests (e.g., for different end users) to the same memcache server

Incast control:

– Limit concurrency to avoid collisions among RPC responses
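A sketch of the batching idea: group the keys needed for a page by destination cache server and issue one multiget per server instead of one RPC per key. `ring` is the toy HashRing sketched earlier; `get_many` is an assumed batched-RPC API.

```python
from collections import defaultdict

def batched_get(ring, clients, keys):
    by_server = defaultdict(list)
    for key in keys:
        by_server[ring.server_for(key)].append(key)   # route each key via the ring
    results = {}
    for server, server_keys in by_server.items():
        results.update(clients[server].get_many(server_keys))  # one RPC, many keys
    return results
```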

SLIDE 38

More Optimizations

Return stale data to web server if lease is held

– No guarantee that concurrent requests returning stale data will be consistent with each other

Partitioned memory pools

– Infrequently accessed, expensive to recompute
– Frequently accessed, cheap to recompute
– If mixed, frequent accesses will evict all others

Replicate keys if access rate is too high

SLIDE 39

Gutter Cache

When a memcache server fails, flood of requests to fetch data from storage layer

– Slower for users needing any key on failed server
– Slower for all users due to storage server contention

Solution: backup (gutter) cache

– Time-to-live invalidation (OK if clients disagree as to whether memcache server is still alive)
– TTL is eventually consistent
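A sketch of gutter failover, with assumed client objects (a `ConnectionError` stands in for "the primary cache server looks dead"); gutter entries carry a short TTL so stale data ages out on its own.

```python
import time

GUTTER_TTL = 10  # seconds; short, so stale gutter entries expire on their own

def read_with_gutter(primary, gutter, db, key):
    try:
        value = primary.get(key)            # normal path: the primary cache server
    except ConnectionError:                 # primary looks dead (clients may disagree)
        entry = gutter.get(key)
        if entry is not None:
            value, stored_at = entry
            if time.time() - stored_at < GUTTER_TTL:
                return value                # absorb the miss flood, possibly stale
        value = db.query_for(key)
        gutter.set(key, (value, time.time()))
        return value
    if value is None:                       # ordinary miss: normal lookaside fill
        value = db.query_for(key)
        primary.set(key, value)
    return value
```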

SLIDE 40

Scaling Within a Cluster

What happens as we increase the number of memcache servers to handle more load?

– Recall: all-to-all communication pattern
– Less data between any pair of nodes: less batching
– Need even more replication of popular keys
– More failures: need bigger gutter cache
– …

SLIDE 41

Multi-Cluster Scaling

Multiple independent clusters within data center

– Each with front-ends, memcache servers
– Data replicated in the caches in each partition
– Shared storage backend

Data is replicated in each cluster (inefficient?)

– need to invalidate every cluster on every update

Instead:

– invalidate local cluster on update (read my writes)
– background invalidate driven off of database update log
– temporary inconsistency!

SLIDE 42

Multi-Cluster Scaling

[Diagram: two clusters, each with its own web servers and caches, sharing one SQL backend; each cluster serves gets from its local cache]

SLIDE 43

Multi-Cluster Scaling

[Diagram: same two clusters; a web server in one cluster sends an update to the shared SQL backend]

SLIDE 44

Multi-Cluster Scaling

[Diagram: after the update, the writing web server deletes the key from its local cache; the other cluster's cache still holds the stale value]

SLIDE 45

mcsqueal

Web servers talk to local memcache. On update:

– Acquire local lease
– Tell storage layer which keys to invalidate
– Invalidate local memcache

Storage layer sends invalidations to other clusters

– Scan database log for updates/invalidations
– Batch invalidations to each cluster (mcrouter)
– Forward/batch invalidations to remote memcache servers
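A sketch of this pipeline; the commit-log iterator (`db_log.tail`), the per-record `invalidated_keys`, and the router's `delete_many` are all assumptions for illustration, not the real mcsqueal/mcrouter APIs.

```python
from collections import defaultdict

def pump_invalidations(db_log, routers, batch_size=100):
    pending = defaultdict(list)                 # cluster -> keys awaiting delete
    for record in db_log.tail():                # hypothetical commit-log iterator
        for key in record.invalidated_keys:     # keys this transaction invalidates
            for cluster, router in routers.items():
                pending[cluster].append(key)
                if len(pending[cluster]) >= batch_size:
                    router.delete_many(pending[cluster])   # one batched RPC per cluster
                    pending[cluster] = []
```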

SLIDE 46

Per-Cluster vs. Multi-Cluster

Per-cluster memcache servers

– Frequently accessed data
– Inexpensive-to-compute data
– Lower latency, less efficient use of memory

Shared multi-cluster memcache servers

– Infrequently accessed data
– Hard-to-compute data
– Higher latency, more memory efficient

SLIDE 47

Cold Start Consistency

During new cluster startup:

– Many cache misses!
– Lots of extra load on SQL servers

Instead of going to SQL server on cache miss:

– Webserver gets data from warm memcache cluster
– Puts data into local cluster
– Subsequent requests hit in local cluster
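A sketch of the cold-cluster read path, with illustrative names:

```python
def cold_read(cold_cache, warm_cache, db, key):
    value = cold_cache.get(key)
    if value is not None:
        return value                 # local cluster already has it
    value = warm_cache.get(key)      # borrow from the already-warm cluster
    if value is None:
        value = db.query_for(key)    # last resort: the storage layer
    cold_cache.set(key, value)       # warm the local cluster as we go
    return value
```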

SLIDE 48

Multi-Region Scaling

Storage layer consistency

– Storage at one data center designated as primary
– All updates applied at primary
– Updates propagated in background to other data centers

What could go wrong?

SLIDE 49

Stale Reads

[Diagram: a write applied at the primary copy; a read-only replica’s cache can keep returning the old value until the update propagates]

SLIDE 50

Multi-Region Consistency

To perform an update to key:

– Put marker into local region
– Send write to primary region
– Delete local copy

On a cache miss:

– Check for local marker
– If so, fetch data from primary region
– Fill local copy
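A sketch of this remote-marker protocol, with illustrative names; marker expiry and cleanup are elided.

```python
def write_from_replica_region(local_cache, primary_db, key, value):
    local_cache.set("marker:" + key, True)   # 1. put marker into local region
    primary_db.update_for(key, value)        # 2. send write to primary region
    local_cache.delete(key)                  # 3. delete local copy

def read_in_replica_region(local_cache, local_db, primary_db, key):
    value = local_cache.get(key)
    if value is not None:
        return value
    if local_cache.get("marker:" + key):     # local replica may lag the write
        value = primary_db.query_for(key)    # so fetch from the primary region
    else:
        value = local_db.query_for(key)
    local_cache.set(key, value)              # fill local copy
    return value
```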

SLIDE 51

FB: Data Centers without Data

Tradeoff in increasing number of data centers

– Lower latency when data near clients
– More consistency overhead
– More opportunity for inconsistency

Mini-data centers

– Front-end web servers
– Memcache servers
– No backend storage: remote access for cache misses

SLIDE 52

Linearizability?

Is linearizability possible with a memcache layer?

  • Needs help from storage layer
  • Every cached copy removed before write

What about snapshot reads?

  • Needs help from storage layer
  • Every copy has version timestamp range
  • Multikey query valid if ranges overlap
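A sketch of that version-range check, under the assumption that each cached value carries the range of storage versions for which it is valid: a multikey read forms a consistent snapshot exactly when all the ranges share a common version.

```python
def snapshot_consistent(entries):
    """entries: list of (first_valid_version, last_valid_version, value)."""
    lo = max(first for first, _last, _value in entries)
    hi = min(last for _first, last, _value in entries)
    return lo <= hi   # some single version lies inside every entry's range

# Two entries valid over overlapping version ranges form a snapshot:
print(snapshot_consistent([(3, 7, "a"), (5, 9, "b")]))   # True
print(snapshot_consistent([(3, 4, "a"), (5, 9, "b")]))   # False: no common version
```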