Transactional Consistency and Automatic Management in an Application Data Cache
Dan Ports Austin Clements Irene Zhang Samuel Madden Barbara Liskov MIT CSAIL
Tuesday, October 5, 2010
Modern web applications face immense scaling challenges
increasingly complex, personalized content e.g. Facebook, MediaWiki, LiveJournal...
whole-page caches: foiled by personalization
database caches: more processing is being done in the application layer
[Diagram: application, database, and cache]
e.g. memcached, Java object caches
very lightweight in-memory caches
stores application objects (computations), i.e.:
not a database replica
not a query cache
Cache higher-level data closer to app needs:
DB queries, complex structures, HTML fragments
Can separate common and customized content
Reduces database load
Reduces application server load
Naming: cache key must uniquely identify value
same key regardless of # days requested (#7541)
Invalidations: require reasoning globally about entire application
(#8391)
Our cache provides:
time view of data, whether from cache or DB
applications that accept old (but consistent) data
applications mark functions cacheable; TxCache caches their results, including naming and invalidations
[Diagram: application with TxCache library, database, and cache]
complexity of cache management
cache server, minor DB modifications (Postgres; <2K lines changed)
whole-system transactional consistency
beginRW(), abort()
where fn is a side-effect-free function that depends
➔ fn returns cached result of previous call with same inputs if still consistent w/ DB
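A minimal sketch of the idea of marking a function cacheable, assuming an in-process dictionary in place of the cache server. The names `SimpleCache`, `make_cacheable`, `still_consistent`, and `recent_changes` are hypothetical and are not the TxCache API; the consistency check is left as a placeholder that always succeeds.

```python
import functools


class SimpleCache:
    """Hypothetical in-process stand-in for the cache server."""

    def __init__(self):
        self.entries = {}
        self.hits = 0
        self.misses = 0


def make_cacheable(cache, still_consistent=lambda key: True):
    """Decorator factory. `still_consistent` is a placeholder for the
    check that a cached result is still consistent with the database."""

    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args):
            # The key is derived from the function identity and all of its
            # inputs, so the application never names cache entries by hand.
            # Arguments must be hashable for this sketch to work.
            key = (fn.__module__, fn.__qualname__, args)
            if key in cache.entries and still_consistent(key):
                cache.hits += 1           # reuse the earlier result
                return cache.entries[key]
            cache.misses += 1             # run the application function
            value = fn(*args)
            cache.entries[key] = value    # store for later calls
            return value

        return wrapper

    return decorator


cache = SimpleCache()


@make_cacheable(cache)
def recent_changes(days):
    # Side-effect-free: the result depends only on the argument here.
    return f"changes for the last {days} days"
```

Because the key includes every argument, calls with different inputs cannot collide on the same entry, which is the kind of naming mistake that hand-written cache keys invite.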
[Diagram, built up step by step: the application makes a CALL into the TxCache library, which does a cache LOOKUP. On a HIT the cached result is returned. On a MISS the library makes an UPCALL to the application function, which issues QUERIES to the database, and the library INSERTs the result into the cache.]
validity interval contains txn’s timestamp
Cache entries tagged with validity intervals
[Figure: timeline of cache entries K1 to K4]
Assign transaction an earlier timestamp
[Figure: timeline of cache entries K1 to K4]
Requires starting a DB transaction at same timestamp
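The lookup rule can be sketched as follows: a cached version is usable only if its validity interval contains the transaction's timestamp. This is an illustration under assumptions, not the TxCache implementation; `Version` and `VersionedCache` are hypothetical names, timestamps are plain integers, and intervals are taken to be half-open.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Version:
    value: str
    start: int  # first timestamp at which the value is valid (inclusive)
    end: int    # first timestamp at which it is no longer valid (exclusive)


class VersionedCache:
    """Hypothetical cache that keeps several versions per key."""

    def __init__(self):
        self._versions = {}  # key -> list of Version

    def insert(self, key, value, start, end):
        self._versions.setdefault(key, []).append(Version(value, start, end))

    def lookup(self, key, txn_ts):
        # Return the version whose validity interval contains txn_ts,
        # or None, in which case the caller falls back to the database.
        for version in self._versions.get(key, []):
            if version.start <= txn_ts < version.end:
                return version.value
        return None


versioned = VersionedCache()
versioned.insert("K1", "old value", 10, 20)
versioned.insert("K1", "new value", 20, 30)
```

A transaction assigned timestamp 15 would read "old value" from the cache; any database queries it then issues would need to run at timestamp 15 as well for the two sources to agree.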
Validity of an application object = validity of the DB queries used to generate it
Validity of a DB query = validity of the tuples accessed to compute it
Validity of a tuple = timestamps of creating, deleting transactions
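One way to read these three rules is as nested interval intersections. The sketch below assumes half-open integer intervals and uses made-up tuple and query intervals; the helper name `intersect` is hypothetical. It considers only the tuples a query returned and ignores tuples that were examined but excluded, which a real system would also have to account for.

```python
def intersect(intervals):
    """Intersect half-open (start, end) intervals; fail if they do not overlap."""
    intervals = list(intervals)
    start = max(lo for lo, _ in intervals)
    end = min(hi for _, hi in intervals)
    if start >= end:
        raise ValueError("no common validity interval")
    return (start, end)


# Hypothetical tuple validity: (creating txn timestamp, deleting txn timestamp).
tuple_validity = {"x": (41, 50), "y": (40, 48)}

# Validity of a query = intersection over the tuples it accessed.
query_validity = intersect(tuple_validity.values())

# Validity of an application object = intersection over its queries.
# The second interval stands for another, equally hypothetical query.
object_validity = intersect([query_validity, (43, 60)])
```

With these numbers the query is valid over [41, 48) and the object built from both queries over [43, 48).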
[Figure: timeline with ticks at 40, 45, and 50 showing the validity of tuples x, y, z, and q; one tuple is annotated "inserted by txn #41" and "deleted by txn #50"]
SELECT * FROM ...;
result = {x, y} VALIDITY [41, 48)
Hard to choose timestamp a priori
[Figure: timeline of cache entries K1 to K4]
Later, database notifies cache if object changes; cache truncates interval
[Figure: timeline showing cache entries K1 and K2 and the current time ("now")]
SELECT * FROM users WHERE name = 'floyd';
[result]
INVALIDATION TAGS: users:name=floyd
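A rough sketch of how tags of this form could drive invalidation, assuming each cache entry stores the tags of the queries it depends on and that a still-valid entry has an open-ended upper bound. `TaggedCache` and its methods are hypothetical names, and the notification from the database is modeled as a plain method call.

```python
import math


class TaggedCache:
    """Hypothetical cache whose entries carry invalidation tags."""

    def __init__(self):
        self.entries = {}  # key -> dict with value, start, end, tags

    def insert(self, key, value, start, tags):
        # Assumption: a still-valid entry has no known upper bound yet.
        self.entries[key] = {
            "value": value,
            "start": start,
            "end": math.inf,
            "tags": frozenset(tags),
        }

    def invalidate(self, tag, ts):
        """Truncate every entry carrying `tag` so it is valid only before ts."""
        truncated = []
        for key, entry in self.entries.items():
            if tag in entry["tags"] and entry["end"] > ts:
                entry["end"] = ts
                truncated.append(key)
        return truncated


tagged = TaggedCache()
tagged.insert("user_page:floyd", "<html>floyd</html>", 41, ["users:name=floyd"])
tagged.insert("user_page:alice", "<html>alice</html>", 41, ["users:name=alice"])

# A later update to the row with name = 'floyd', committed at timestamp 48.
affected = tagged.invalidate("users:name=floyd", 48)
```

The truncated entry is not deleted: a transaction running at a timestamp before 48 can still read it consistently, while entries with other tags keep their open-ended intervals.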
All servers 2x 3.20 GHz Xeon, 2 GB RAM
[Figure (in-memory DB; 2 cache nodes): max throughput (requests/sec) and cache hit rate vs. cache size, 64MB to 1024MB]
[Figure (disk-bound DB; 8 shared web/cache nodes): max throughput (requests/sec) and cache hit rate vs. cache size, 1GB to 9GB]
[Figure: relative throughput vs. staleness limit in seconds, for no caching (baseline), TxCache (in-memory DB), and TxCache (disk-bound DB)]
configuration                     consistency misses (% of total misses)
in-memory, 512 MB, 30 s stale     7.8%
in-memory, 512 MB, 15 s stale     5.4%
in-memory, 64 MB, 30 s stale      0.2%
disk-bound, 9 GB, 30 s stale      0.7%
common to
Verified experimentally by disabling consistency:
transaction can read any data valid in last 30 sec
[Figure: peak requests/sec vs. cache size (64MB to 1024MB) for no caching, TxCache, and no consistency]
Application-level caches:
Database replication:
TxCache: application-layer caching with a simpler programming model
and database
lookups, updates, invalidations
New mechanisms:
Consistency imposes little performance cost