SLIDE 1
View Invalidation for Dynamic Content Caching in Multitiered Architectures
Divyakant Agrawal, Wen-Syan Li, Oliver Po, Wang-Pin Hsiung
NEC USA, C&C Research Labs, CA, USA
SLIDE 2 12/3/2002 Presented by K. Selçuk Candan
Multi-tiered architectures…
- Clients do not access the database directly.
- Instead, they use applications
  – which invoke DBMSs
- Or they access result caches
  – proxy cache (A)
  – front-end cache (B)
  – edge cache (C)
  – user side cache (D)
  – middle-tier caches (E)
SLIDE 3
Problem…
[Figure: Users]
SLIDE 4
Result caches and consistency
- Various view materialization and update management techniques have been proposed to deal with updates to the underlying data.
- These techniques guarantee that cached results are always consistent with the underlying data.
SLIDE 5
Strong consistency requirements
[Figure: Data Warehouse, Data, Data]
SLIDE 6
Strong consistency requirements
[Figure: Data Warehouse, Data, Data]
SLIDE 7
Strong consistency requirements
[Figure: Data Warehouse, Data, Data, Queries]
SLIDE 8
Result caches and consistency
- Various view materialization and update management techniques have been proposed to deal with updates to the underlying data.
- These techniques guarantee that cached results are always consistent with the underlying data.
- Other applications do not require that caches reflect the database exactly all the time.
SLIDE 9
Relaxed consistency requirements
[Figure: Data Warehouse, Data, Data, Queries; Middle-tier Cache, Data, Data, Queries, Misses]
SLIDE 10
Invalidation vs. view maintenance
- Result caches need all outdated results to be invalidated in a timely fashion.
SLIDE 11
Example
Cached page: http://www.autobuy.com/modelinfo?car=Toyota
Query behind the page: select maker, model, price from Car where maker = "Toyota";
SLIDE 12
Example (cont.)
If a new tuple (Toyota; Avalon; 25000) is inserted into Car, then we can either
- recompute the new results of this query (preferably incrementally) and rerun the application to regenerate the page, or
- purge the corresponding page from the cache.
  – the request can still be served from the database!
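The two options above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `page_cache`, `render_page`, `get_page`, and `handle_insert` are hypothetical names, the Camry row is made-up sample data, and the "recompute" branch reruns the full query instead of maintaining the result incrementally.

```python
import sqlite3

# Hypothetical names and sample data; none of this comes from the slides
# except the URL, the query text, and the (Toyota, Avalon, 25000) tuple.
URL = "http://www.autobuy.com/modelinfo?car=Toyota"
QUERY = "select maker, model, price from Car where maker = ?"

def render_page(db, maker):
    """Stand-in for the application: run the query and build the page."""
    rows = db.execute(QUERY, (maker,)).fetchall()
    return "\n".join(f"{mk} {model} {price}" for mk, model, price in rows)

def get_page(db, page_cache, url, maker):
    """Serve from the cache on a hit; on a miss, go back to the database."""
    if url not in page_cache:
        page_cache[url] = render_page(db, maker)
    return page_cache[url]

def handle_insert(db, page_cache, row, purge=True):
    """Apply an insert to Car, then either purge or regenerate the page."""
    db.execute("insert into Car values (?, ?, ?)", row)
    if row[0] == "Toyota":
        if purge:
            page_cache.pop(URL, None)                    # invalidation
        else:
            page_cache[URL] = render_page(db, "Toyota")  # full recompute

db = sqlite3.connect(":memory:")
db.execute("create table Car (maker text, model text, price integer)")
db.execute("insert into Car values ('Toyota', 'Camry', 20000)")
page_cache = {}
get_page(db, page_cache, URL, "Toyota")                     # page is now cached
handle_insert(db, page_cache, ("Toyota", "Avalon", 25000))  # page is purged
```

After the purge, the next `get_page` call misses the cache and is answered from the database, so the request is still served.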
SLIDE 13
Overinvalidation as a tool
Overinvalidation can be used if accurate invalidation is
- too expensive or
- not feasible in a given time frame
Underinvalidation is not acceptable!
Invalidation is inherently cheaper than view maintenance:
- we do not need to compute all consequences of updates
- to reduce the invalidation delay, we can overinvalidate
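As a rough illustration of the trade-off, the sketch below compares a precise rule with a cheaper, coarser one. The descriptors in `CACHED`, the function names, and the Honda entry are all hypothetical; the only point is that the coarse rule returns a superset of the precise one, so it may overinvalidate but never underinvalidates.

```python
# Hypothetical cached-query descriptors: (cache key, table read, maker predicate).
CACHED = [
    ("toyota_page", "Car", "Toyota"),
    ("honda_page", "Car", "Honda"),
    ("mileage_page", "Mileage", None),
]

def precise_invalidation(table, row):
    """Invalidate only queries whose selection predicate matches the new row."""
    return {key for key, t, maker in CACHED
            if t == table and (maker is None or maker == row[0])}

def coarse_invalidation(table, row):
    """Overinvalidate: drop every cached query that reads the updated table."""
    return {key for key, t, _maker in CACHED if t == table}

update = ("Car", ("Toyota", "Avalon", 25000))
precise = precise_invalidation(*update)   # only the Toyota page
coarse = coarse_invalidation(*update)     # every page built from Car
```

The coarse rule skips the predicate check entirely, which is the kind of shortcut that shortens invalidation delay at the price of extra cache misses.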
SLIDE 14
Query and update streams…
[Figure: updates (up1, up2, up3), queries (q1, q2, q3, q4, q5), invalidations (inv1, inv3, inv2)]
SLIDE 15
Example
select * from Car, Mileage where Car.maker = "Toyota" and Car.model = Mileage.model;
Tuples inserted into Car:
- ("Mitsubishi", "Galant", 23000): no additional information required
- ("Toyota", "Avalon", 25000): additional information required
- For the second tuple, we need to check whether Car.model = Mileage.model can be satisfied using the data in the database (polling query).
SLIDE 16
Polling queries (cont.)
Polling query that has to be answered:
select * from Mileage where "Avalon" = Mileage.model;
If the result of the polling query is non-empty, then
- the newly inserted tuple affected the query
Key point: We only need to check for existence; we do not need to evaluate the polling query completely.
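An existence-only poll might look like the following sketch, which uses Python's built-in `sqlite3` module. The function names and the `epa` column of Mileage are assumptions made for illustration; the slides do not give the Mileage schema.

```python
import sqlite3

def poll_exists(db, model):
    """Existence check: 'limit 1' lets the DBMS stop at the first match."""
    row = db.execute(
        "select 1 from Mileage where model = ? limit 1", (model,)
    ).fetchone()
    return row is not None

def affects_cached_query(db, new_car):
    """Decide whether an inserted Car tuple can change the cached join result."""
    maker, model, _price = new_car
    if maker != "Toyota":
        return False               # no additional information required
    return poll_exists(db, model)  # additional information: poll Mileage

db = sqlite3.connect(":memory:")
db.execute("create table Mileage (model text, epa integer)")  # 'epa' is hypothetical
db.execute("insert into Mileage values ('Avalon', 28)")
```

With this sample data, the Mitsubishi tuple is rejected by the local predicate alone, while the Toyota Avalon tuple requires the poll and is found to affect the cached query.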
SLIDE 17
Δ: the effect of updates on join views
SLIDE 18
Δ: the effect of updates on join views
- no distinction between deleted or inserted tuples
- no need to evaluate entire Δ
SLIDE 19
Challenges in calculating Δ
available from the update logs
SLIDE 20
Challenges in calculating Δ
available from the update logs
not available!!!
- synchronous: a single copy is maintained
  – the copy is locked during invalidation
- snapshot-based: a copy of the database is maintained
- asynchronous: a single copy is maintained
  – no locking is used
SLIDE 21
Snapshot-based approach (new and old versions are available)
SLIDE 22
Results
Snapshot-based approach
– no over- or under-invalidation
– replication overhead
SLIDE 23
Synchronous approach (only new available)
- old version of the database is not available!!!
OVERINVALIDATION
SLIDE 24
Results
Snapshot-based approach
– no over- or under-invalidation
– replication overhead
Synchronous approach
– when there are more than two relations, unrecoverable overinvalidation is possible
– locking overhead
SLIDE 25
Asynchronous approach (neither old nor new is available)
SLIDE 26
Results
Snapshot-based approach
– no over- or under-invalidation
– replication overhead
Synchronous approach
– when there are more than two relations, unrecoverable overinvalidation is possible
– locking overhead
Asynchronous approach
– when there are more than two relations, unrecoverable underinvalidation is possible
– no overhead
SLIDE 27
Efficiency: consolidated invalidation
[Figure: timeline, axis label TIME]
SLIDE 28
Consolidated invalidation
SLIDE 29
Consolidated invalidation
SLIDE 30
Consolidated invalidation
SLIDE 31
Consolidated invalidation
SLIDE 32
Consolidation versus individual invalidation
Individual invalidation cost depends on:
– the average top-1 retrieval cost
– the number of queries
Consolidated invalidation cost depends on:
– the total size of Δ
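The difference between the two strategies can be illustrated with a small sketch. It is an assumption about how consolidation might be realized, not the presenters' confirmed method: the update set is loaded into a hypothetical `Delta` table, the parameters of the cached query instances into a hypothetical `CachedQuery` table, and one join replaces the per-query existence polls. The schema and sample rows are made up.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
create table Mileage (model text, epa integer);              -- hypothetical schema
create table Delta (maker text, model text, price integer);  -- inserted Car tuples
create table CachedQuery (qid text, maker text);             -- one row per cached query
insert into Mileage values ('Avalon', 28), ('Civic', 35);
insert into Delta values ('Toyota', 'Avalon', 25000), ('Mitsubishi', 'Galant', 23000);
insert into CachedQuery values ('q1', 'Toyota'), ('q2', 'Honda'), ('q3', 'Mitsubishi');
""")

def individual(db):
    """One existence poll per (cached query, matching update tuple) pair."""
    affected, polls = set(), 0
    queries = db.execute("select qid, maker from CachedQuery").fetchall()
    deltas = db.execute("select maker, model from Delta").fetchall()
    for qid, q_maker in queries:
        for d_maker, d_model in deltas:
            if d_maker != q_maker:
                continue  # decided locally, no poll needed
            polls += 1
            hit = db.execute(
                "select 1 from Mileage where model = ? limit 1", (d_model,)
            ).fetchone()
            if hit:
                affected.add(qid)
    return affected, polls

def consolidated(db):
    """A single polling query over the whole update set."""
    rows = db.execute("""
        select distinct c.qid
        from CachedQuery c
        join Delta d on d.maker = c.maker
        join Mileage m on m.model = d.model
    """).fetchall()
    return {qid for (qid,) in rows}, 1
```

Both functions return the affected query ids together with the number of polling queries sent to the database. The individual strategy issues more polls as the number of cached queries grows, while the consolidated one issues a single query whose work depends on the size of the update set.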
SLIDE 33
Polling query overhead
SLIDE 34
Polling query overhead
SLIDE 35
Overinvalidation vs. table sizes
SLIDE 36
Overinvalidation vs. update rate
SLIDE 37
Conclusions
- Fast invalidation is key for caching in multi-tiered architectures
- Hard consistency is not required by many applications
  – Overinvalidation is acceptable
  – Underinvalidation is not!
- View invalidation is inherently cheaper than view maintenance
- View invalidation is feasible!