Transactional Consistency and Automatic Management in an Application Data Cache - PowerPoint PPT Presentation

SLIDE 1

Transactional Consistency and Automatic Management in an Application Data Cache

Dan Ports Austin Clements Irene Zhang Samuel Madden Barbara Liskov MIT CSAIL

Tuesday, October 5, 2010

SLIDE 2

Modern web applications face immense scaling challenges

  • increasingly complex, personalized content, e.g. Facebook, MediaWiki, LiveJournal...

Existing caching techniques are less useful

  • whole-page caches: foiled by personalization
  • database caches: more processing is being done in the application layer

SLIDE 5

Application-Level Caching

[Diagram: Application, Database, Cache]

e.g. memcached, Java object caches

  • very lightweight in-memory caches
  • store application objects (computations), i.e.:
    • not a database replica
    • not a query cache

SLIDE 6

Why Cache Application Data?

Cache higher-level data, closer to app needs: DB queries, complex structures, HTML fragments

Can separate common and customized content

Reduces database load

Reduces application server load

  • this matters too (application servers aren’t cheap!)

SLIDE 7

Existing Caches Add To Application Complexity

No transactional consistency

  • violates guarantees of the underlying DB
  • app. code must deal with transient anomalies

Hash table interface leaves apps responsible for:

  • naming and retrieving cache entries
  • keeping cache up-to-date (invalidations)

SLIDE 9

Harder Than You Think!

Naming: cache key must uniquely identify value

  • MediaWiki stored the list of recent changes under the same key regardless of the number of days requested (#7541)

Invalidations: require reasoning globally about entire application

  • After editing wiki page, what to invalidate?
  • Forgot the editor’s User object, which contains the edit count (#8391)
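The naming pitfall above can be reproduced with a plain hash-table cache in a few lines. This Python sketch is illustrative only: `recent_changes_buggy`, `recent_changes_fixed`, and `db_recent_changes` are hypothetical stand-ins, not MediaWiki's actual code, and a dict plays the role of memcached.

```python
# Hypothetical cache-aside code; a dict stands in for memcached.
cache = {}

def db_recent_changes(days):
    # Hypothetical stand-in for a database query: one entry per day requested.
    return [f"change-day-{d}" for d in range(days)]

def recent_changes_buggy(days):
    key = "recentchanges"               # BUG: key ignores the `days` argument
    if key not in cache:
        cache[key] = db_recent_changes(days)
    return cache[key]

def recent_changes_fixed(days):
    key = f"recentchanges:days={days}"  # key now identifies every input
    if key not in cache:
        cache[key] = db_recent_changes(days)
    return cache[key]

print(len(recent_changes_buggy(1)), len(recent_changes_buggy(7)))  # 1 1
print(len(recent_changes_fixed(1)), len(recent_changes_fixed(7)))  # 1 7
```

The second buggy call silently returns one day of changes when seven were requested. Putting every input into the key fixes it, which is exactly the bookkeeping TxCache aims to take over from the application.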

SLIDE 10

Introducing TxCache

Our cache provides:

  • transactional consistency: serializable, point-in-time view of data, whether from cache or DB
  • bounded staleness: improves hit rate for applications that accept old (but consistent) data
  • simpler interface: applications mark functions cacheable; TxCache caches their results and handles naming and invalidations automatically

SLIDE 11

[Diagram: Application, TxCache Library, Cache, Database]

  • TxCache library hides the complexity of cache management
  • Integrates with a new cache server and a minimally modified DB (Postgres; <2K lines changed)
  • Together, these ensure whole-system transactional consistency

SLIDE 14
TxCache Interface

  • beginRO(staleness), commit(), beginRW(), abort()
  • make-cacheable(fn)
    where fn is a side-effect-free function that depends only on its arguments and the database state
    ➔ fn returns cached result of previous call with same inputs if still consistent w/ DB

That’s it. Really!

SLIDE 15

[Diagram, built up over slides 15 to 23: the Application CALLs a cacheable function through the TxCache Library, which does a LOOKUP in the cache. On a HIT, the cached result is returned. On a MISS, the library makes an UPCALL to the function, which issues QUERIES to the database, and the library then INSERTs the result into the cache.]

SLIDE 24

Outline

  • 1. Application-Level Caching
  • 2. TxCache Interface
  • 3. Ensuring Transactional Consistency
  • 4. Automating Invalidations
  • 5. Evaluation

SLIDE 25

Consistency Approach

Goal: all data seen in a transaction reflects single point-in-time snapshot

  • Assign timestamp to transaction
  • Know the validity interval of each object in cache or database: set of timestamps when it was valid
  • Then: transaction can read data if data’s validity interval contains txn’s timestamp

SLIDE 26

A Versioned Cache

Cache entries tagged with validity intervals

  • each entry one immutable version of an object
  • allows lookup for value valid at certain time

[Figure: cache entries K1, K2, K3, K4 along a time axis]
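A versioned cache can be sketched as a map from key to a list of immutable versions, each tagged with its validity interval, plus a lookup by timestamp. This Python sketch assumes integer timestamps and half-open intervals [start, end) (matching the [41, 48) notation used later in the deck), with `None` standing for a not-yet-known upper bound; `VersionedCache` and its methods are hypothetical names.

```python
from typing import Any, Dict, List, Optional, Tuple

class VersionedCache:
    """Hypothetical versioned cache: each key maps to immutable versions,
    each tagged with a half-open validity interval [start, end)."""

    def __init__(self) -> None:
        self._versions: Dict[str, List[Tuple[int, Optional[int], Any]]] = {}

    def insert(self, key: str, start: int, end: Optional[int], value: Any) -> None:
        # end=None: the version is still valid as far as we know.
        self._versions.setdefault(key, []).append((start, end, value))

    def lookup(self, key: str, ts: int) -> Optional[Any]:
        # Return the version valid at timestamp ts; None signals a miss.
        for start, end, value in self._versions.get(key, []):
            if start <= ts and (end is None or ts < end):
                return value
        return None

c = VersionedCache()
c.insert("K1", 10, 20, "v1")
c.insert("K1", 20, None, "v2")
print(c.lookup("K1", 15), c.lookup("K1", 25), c.lookup("K1", 5))  # v1 v2 None
```

Because versions are never overwritten in place, two transactions with different timestamps can read different, individually consistent versions of the same key.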

SLIDE 28

Staleness

Assign transaction an earlier timestamp

  • if consistent with application requirements
  • allows cached data to be used longer

[Figure: cache entries K1, K2, K3, K4 along a time axis]

SLIDE 31

Staleness

Assign transaction an earlier timestamp

  • if consistent with application requirements
  • allows cached data to be used longer

Requires starting a DB transaction at the same timestamp

  • internally, snapshot isolation supports this
  • added interface to expose this to cache library
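One simple, hypothetical policy for using a staleness limit: the transaction may take any timestamp in [now - staleness, now], so pick the latest timestamp in that window at which some cached version is valid. The sketch assumes integer timestamps and half-open intervals; `pick_timestamp` is an illustrative name, and, as the slide notes, the database transaction would then have to run at the chosen timestamp.

```python
from typing import List, Optional, Tuple

Interval = Tuple[int, Optional[int]]  # [start, end); end=None: no upper bound yet

def pick_timestamp(now: int, staleness: int, cached: List[Interval]) -> int:
    """Hypothetical policy: latest timestamp in [now - staleness, now] at which
    some cached version is valid; falls back to `now` (a miss) if none is."""
    lo = now - staleness
    best: Optional[int] = None
    for start, end in cached:
        # Last timestamp covered by both this version and the staleness window.
        hi = now if end is None else min(now, end - 1)
        if max(start, lo) <= hi:
            best = hi if best is None else max(best, hi)
    return now if best is None else best

print(pick_timestamp(100, 30, [(60, 85)]))  # 84: stale but consistent hit
print(pick_timestamp(100, 0, [(60, 85)]))   # 100: no staleness allowed, miss
```

Choosing the latest qualifying timestamp keeps data as fresh as the cache allows; the next slides show why committing to a timestamp this early is not ideal.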

SLIDE 35

Where Do Validity Intervals Come From?

Validity of an application object = validity of the DB queries used to generate it

  • library tracks query dependencies

Validity of a DB query = validity of the tuples accessed to compute it

  • we modify the DB to report this

Validity of a tuple = timestamps of creating, deleting transactions

  • multiversion DBs already track this
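The three levels compose by intersection: a tuple version is valid from its creating transaction until its deleting transaction, a query is valid where all tuples it accessed are, and an application object is valid where all the queries it issued are. This Python sketch assumes integer, non-negative timestamps and half-open intervals; `DependencyTracker` and the helper functions are hypothetical, and the timestamps in the example are made up for illustration.

```python
from functools import reduce
from typing import Iterable, List, Optional, Tuple

Interval = Tuple[int, Optional[int]]  # [start, end); end=None: not yet deleted
ALWAYS: Interval = (0, None)          # identity for intersection (ts >= 0 assumed)

def intersect(a: Interval, b: Interval) -> Interval:
    ends = [e for e in (a[1], b[1]) if e is not None]
    return (max(a[0], b[0]), min(ends) if ends else None)

def combine(intervals: Iterable[Interval]) -> Interval:
    # Used at both levels: tuples -> query, and queries -> cached object.
    return reduce(intersect, intervals, ALWAYS)

def tuple_validity(created_by: int, deleted_by: Optional[int]) -> Interval:
    # Multiversion DBs already record these two transaction timestamps.
    return (created_by, deleted_by)

class DependencyTracker:
    """Hypothetical library-side record of queries run by one cacheable call."""

    def __init__(self) -> None:
        self.queries: List[Interval] = []

    def record_query(self, validity: Interval) -> None:
        self.queries.append(validity)

    def object_validity(self) -> Interval:
        return combine(self.queries)

tracker = DependencyTracker()
q1 = combine([tuple_validity(41, 50), tuple_validity(30, None)])  # hypothetical tuples
tracker.record_query(q1)          # (41, 50)
tracker.record_query((45, None))  # a second, hypothetical query
print(tracker.object_validity())  # (45, 50)
```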

SLIDE 36

Computing Query Validity

Intersect validity intervals of tuples accessed

SELECT * FROM ...;

result = {x, y}   VALIDITY [41, 48)

[Figure, built up over slides 36 to 42: tuples x, y, z, q on a time axis marked 40, 45, 50, annotated "inserted by txn #41" and "deleted by txn #50"]
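Written as a formula, as a restatement of the slide's rule, assuming each accessed tuple t has creating-transaction timestamp c_t and deleting-transaction timestamp d_t, with d_t = ∞ for tuples not yet deleted:

```latex
V(Q) \;=\; \bigcap_{t \in \mathrm{acc}(Q)} [\,c_t,\ d_t\,)
     \;=\; \Big[\ \max_{t \in \mathrm{acc}(Q)} c_t,\ \ \min_{t \in \mathrm{acc}(Q)} d_t \Big)
```

Read this way, VALIDITY [41, 48) says the result {x, y} is valid from timestamp 41 up to, but not including, 48.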

SLIDE 43

Lazy Timestamp Selection

Hard to choose timestamp a priori

  • Don’t know access pattern or cache contents
  • Insight: don’t have to choose right away!

[Figure: cache entries K1, K2, K3, K4 along a time axis]
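Lazy selection can be sketched as tracking a range of still-acceptable timestamps instead of one fixed value: the range starts as the staleness window, and each cache read narrows it to the part consistent with the version read. This Python sketch is hypothetical: `LazyTxn` and its policy of taking the first overlapping version are assumptions for illustration, not the paper's algorithm, and it uses integer timestamps with half-open intervals.

```python
from typing import List, Optional, Tuple

Interval = Tuple[int, Optional[int]]  # [start, end); end=None: no upper bound yet

class LazyTxn:
    """Hypothetical read-only transaction that postpones picking its timestamp.

    [lo, hi) holds every timestamp still consistent with all reads so far.
    """

    def __init__(self, now: int, staleness: int) -> None:
        self.lo, self.hi = now - staleness, now + 1  # i.e. [now - staleness, now]

    def read(self, versions: List[Tuple[Interval, object]]) -> Optional[object]:
        # Take the first cached version overlapping [lo, hi) and narrow the
        # range to the overlap (a real policy might prefer the freshest one).
        for (start, end), value in versions:
            new_lo = max(self.lo, start)
            new_hi = self.hi if end is None else min(self.hi, end)
            if new_lo < new_hi:
                self.lo, self.hi = new_lo, new_hi
                return value
        return None  # miss: would have to go to the database instead

    def timestamp(self) -> int:
        # Commit to one timestamp only at the end; any value in [lo, hi) works.
        return self.hi - 1

t = LazyTxn(now=100, staleness=30)                   # range [70, 101)
print(t.read([((60, 85), "a"), ((85, None), "b")]))  # prints a; range [70, 85)
print(t.read([((80, None), "c")]))                   # prints c; range [80, 85)
print(t.timestamp())                                 # 84
```

The first read could have used either version; deferring the final choice is what lets the second read still hit.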

SLIDE 47

Outline

  • 1. Application-Level Caching
  • 2. TxCache Interface
  • 3. Ensuring Transactional Consistency
  • 4. Automating Invalidations
  • 5. Evaluation

SLIDE 48

Invalidations

What about objects that are still valid?

  • don’t know their upper validity bound yet!
  • represent as open-ended validity intervals

Later, database notifies cache if object changes; cache truncates interval

[Figure: cache entries K1, K2 along a time axis ending at "now"]

SLIDE 49

Invalidation Tags

How to identify which objects changed?

  • DB doesn’t know which app-level objects are cached

Objects in cache have invalidation tags

  • Modified DB to assign invalidation tags to each query
  • DB generates list of tags affected by each update
  • Cache finds affected objects and updates interval

SLIDE 50

Invalidation Tags

  • Inval. tags come from query’s access methods
    • TABLE:KEY=VALUE for queries that use index lookups
    • TABLE:* for non-indexed queries (rare)

SELECT * FROM users WHERE name = 'floyd';
  ➔ [result]
  ➔ INVALIDATION TAGS: users:name=floyd
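The tag scheme can be sketched as two functions, one producing tags from a query's access method and one producing tags for an updated tuple, plus a matcher. The `TABLE:KEY=VALUE` and `TABLE:*` formats follow the slides; the function names, and the choice to match `TABLE:*` against any change to that table, are assumptions for illustration.

```python
from typing import Dict, Iterable, Optional, Set

def query_tags(table: str, index_lookup: Optional[Dict[str, object]]) -> Set[str]:
    """TABLE:KEY=VALUE per index lookup; TABLE:* if the query scans the table."""
    if not index_lookup:
        return {f"{table}:*"}
    return {f"{table}:{col}={val}" for col, val in index_lookup.items()}

def update_tags(table: str, row: Dict[str, object], indexed: Iterable[str]) -> Set[str]:
    """One tag per index key of an affected tuple (call for old and new versions)."""
    return {f"{table}:{col}={row[col]}" for col in indexed}

def is_affected(obj_tags: Set[str], changed: Set[str]) -> bool:
    for tag in obj_tags:
        table, _, rest = tag.partition(":")
        if rest == "*":
            # Assumption: a wildcard tag matches any change to its table.
            if any(c.partition(":")[0] == table for c in changed):
                return True
        elif tag in changed:
            return True
    return False

floyd = query_tags("users", {"name": "floyd"})
edit = update_tags("users", {"id": 7, "name": "floyd"}, ["id", "name"])
print(is_affected(floyd, edit))  # True
other = update_tags("users", {"id": 8, "name": "alice"}, ["id", "name"])
print(is_affected(floyd, other))  # False
```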

SLIDE 51

Invalidation Stream

On each update, DB generates affected tags:

  • for each tuple affected, one tag per index key

Broadcasts to all cache nodes

  • ordered stream, with transaction timestamps

Cache lookups treat unbounded intervals as bounded at last timestamp received

  • avoids invalidate & lookup race conditions
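The last rule is the subtle one: an entry that has not been invalidated is only known to be valid up through the last stream timestamp the node has processed, because an invalidation for a later commit may still be in flight. This Python sketch of a cache node consuming the ordered stream is hypothetical (`CacheNode`, `Entry`, and the integer, half-open timestamps are all assumptions).

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set

@dataclass
class Entry:
    value: object
    tags: Set[str]
    start: int
    end: Optional[int] = None  # None: no invalidation seen yet

class CacheNode:
    """Hypothetical cache node fed by the DB's ordered invalidation stream."""

    def __init__(self) -> None:
        self.entries: Dict[str, List[Entry]] = {}
        self.last_ts = 0  # timestamp of the latest stream message processed

    def insert(self, key: str, entry: Entry) -> None:
        self.entries.setdefault(key, []).append(entry)

    def on_invalidation(self, ts: int, tags: Set[str]) -> None:
        # The stream is ordered, so every message has ts >= last_ts.
        for versions in self.entries.values():
            for e in versions:
                if e.end is None and e.tags & tags:
                    e.end = ts  # truncate: not valid from ts onward
        self.last_ts = ts

    def lookup(self, key: str, ts: int) -> Optional[object]:
        for e in self.entries.get(key, []):
            # Treat an open interval as ending just after last_ts.
            end = e.end if e.end is not None else self.last_ts + 1
            if e.start <= ts < end:
                return e.value
        return None

node = CacheNode()
node.on_invalidation(40, set())  # stream has reached timestamp 40
node.insert("user:floyd", Entry("row-v1", {"users:name=floyd"}, start=35))
print(node.lookup("user:floyd", 40))  # row-v1
print(node.lookup("user:floyd", 45))  # None: stream not yet past 45
node.on_invalidation(48, {"users:name=floyd"})
print(node.lookup("user:floyd", 45))  # row-v1: now known valid in [35, 48)
print(node.lookup("user:floyd", 50))  # None: invalidated at 48
```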

SLIDE 52

Outline

  • 1. Application-Level Caching
  • 2. TxCache Interface
  • 3. Ensuring Transactional Consistency
  • 4. Automating Invalidations
  • 5. Evaluation

SLIDE 53

Evaluation

  • How much benefit from adding caching?
  • Does using stale data help?
  • Does consistency hurt performance?

SLIDE 54

RUBiS Benchmarks

RUBiS: simulated eBay-like auction site

  • standard browsing & bidding workload; 85% read-only
  • two datasets: 850 MB (in-memory), 6 GB (disk-bound)

All servers 2x 3.20 GHz Xeon, 2 GB RAM

  • 1 DB server (modified Postgres 8.2.11)
  • 9 frontend/cache servers (Apache 2 / PHP 5)

SLIDE 55

Cache Performance

(in-memory DB; 2 cache nodes)

[Figure: max throughput (requests/sec) and cache hit rate vs. cache size, 64 MB to 1024 MB]

SLIDE 56

Cache Performance

(disk-bound DB; 8 shared web/cache nodes)

[Figure: max throughput (requests/sec) and cache hit rate vs. cache size, 1 GB to 9 GB]

SLIDE 57

Even A Little Staleness Helps

[Figure: relative throughput vs. staleness limit (seconds) for no caching (baseline), TxCache (in-memory DB), and TxCache (disk-bound DB)]

SLIDE 59

Costs of Consistency

Cache misses classified as:

  • compulsory: data never seen
  • staleness: data invalidated & too old to use
  • capacity: data was evicted
  • consistency: data available but inconsistent w/ prior reads

configuration                      consistency misses (% of total misses)
in-memory, 512 MB, 30 s stale      7.8%
in-memory, 512 MB, 15 s stale      5.4%
in-memory, 64 MB, 30 s stale       0.2%
disk-bound, 9 GB, 30 s stale       0.7%

Compulsory, staleness, and capacity misses are common to other caches.
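The four kinds of cache miss can be told apart with a small decision procedure. In this hypothetical Python sketch, `window` is the timestamp range the staleness limit allows and `candidates` is the narrower range still consistent with the transaction's earlier reads; `classify_lookup`, its inputs, and the rule "inserted once but no versions left means evicted" are assumptions for illustration.

```python
from typing import List, Optional, Tuple

Interval = Tuple[int, Optional[int]]  # [start, end); end=None: no upper bound yet

def overlaps(a: Interval, b: Interval) -> bool:
    a_end = float("inf") if a[1] is None else a[1]
    b_end = float("inf") if b[1] is None else b[1]
    return max(a[0], b[0]) < min(a_end, b_end)

def classify_lookup(ever_inserted: bool, versions: List[Interval],
                    window: Interval, candidates: Interval) -> str:
    """Hypothetical classifier; `versions` are the versions still cached."""
    if not ever_inserted:
        return "compulsory"   # data never seen
    if not versions:
        return "capacity"     # was cached, now evicted
    if any(overlaps(v, candidates) for v in versions):
        return "hit"
    if not any(overlaps(v, window) for v in versions):
        return "staleness"    # every cached version is too old to use
    return "consistency"      # fresh enough, but conflicts with prior reads

print(classify_lookup(True, [(60, 85)], (70, 101), (90, 101)))  # consistency
```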

SLIDE 60

Costs of Consistency

Verified experimentally by disabling consistency:

  • transaction can read any data valid in the last 30 sec

[Figure: peak requests/sec vs. cache size, 64 MB to 1024 MB, for no caching, TxCache, and no consistency]

SLIDE 61

Related Work

Application-level caches:

  • more flexible than whole-page caches: partial results
  • require explicit management by application
  • no transactional support (e.g. memcached)
  • or transactions only within the cache (e.g. JBoss, AppFabric)

Database replication:

  • FAS, Ganymed: keep stale replicas with batched updates
  • can’t apply methods to app-level caching

SLIDE 62

Conclusion

TxCache: application-layer caching with a simpler programming model

  • provides transactional consistency across both cache and database
  • automatic management: applications not responsible for lookups, updates, invalidations

New mechanisms:

  • consistency ensured by tracking object validity intervals
  • automatic database-generated invalidations

Consistency imposes little performance cost
