SLIDE 1
Real-Time Databases Meghan Russ Miriam Speert Pete Dempsey Sedat Behar Yevgeny Ioffe Zachi Klopman Timeline 1:40 - 1:50: Introduction 1:50 - 3:00: Real-Time Databases/Scheduling 3:00 - 3:10: Break 3:10 - 4:00: Operator Scheduling in
SLIDE 2
SLIDE 3
Real-Time Databases/Scheduling
- General Introduction
- Scheduling Policies
- Resource Allocation
- Properties of Data: consistency and
validity
- Conclusions
SLIDE 4
References
- http://www.fpa.org/newsletter_info2584/newsletter_info.htm (info on Scud missiles)
- http://www.fas.org/spp/starwars/gao/im92026.htm (info on Patriot Missile System)
SLIDE 5
Imagine this…
- We are at war with Iraq
- Our soldiers find a potential target
- Military intelligence consults a database
to determine course of action
SLIDE 6
Imagine this…
- We are at war with Iraq
- Air control system constantly monitors
hundreds of aircraft and records them in a database
- Intelligence systems constantly query the
database for potential threats
SLIDE 7
Suddenly…
- Hundreds of missiles are launched
- We suspect some are nuclear
- Need information that allows us to determine a course of action
- Need this information in time to make a rapid decision
- The costs of indecision are catastrophic
SLIDE 8
What could go wrong?
- Limited number of missiles we can
intercept
- Once they’re launched, we have limited
time to react
- Our traditional database is slowed by less
critical queries
- Finally, our queries may not be answered
in time due to system load
SLIDE 9
We need a system that:
- Handles time-sensitive queries
- Returns only temporally valid data
- Supports priority scheduling
- Solution: Real-Time Databases!
SLIDE 10
Real-Time Databases and Streams
- Scheduling
– Streams: priority based on QoS optimization
– Real-Time: priority based on deadlines
- Load Shedding
– Streams: dropping tuples from queues
– Real-Time: missing deadlines
- Freshness of data:
– Streams: not guaranteed
– Real-Time: resample
SLIDE 11
Real-Time Databases and Streams
- Scheduling
– Streams: priority based on QoS optimization
– Real-Time: priority based on deadlines and user-supplied values
- Load Shedding
– Streams: dropping tuples from queues
– Real-Time: missing deadlines, dropping transactions
- Freshness of data:
– Streams: not guaranteed
– Real-Time: resample
SLIDE 12
Real-Time Databases
- An extension to traditional databases
- Motivated by class of applications that
require reliable responses
- Predictable (not necessarily fast)
SLIDE 13
Real-Time Database Features
- Priority
– Classification of transactions
– Assigns value to transactions
- Deadlines
– Transactions specify explicit time requirements
– Transaction scheduling takes time requirements into account
– Predictability that transactions will complete by deadline or not at all
SLIDE 14
Transactions and Streams
- Transactions are operations on the database that perform combinations of reads/writes in an atomic step
– Queries are a subset of transactions
- Streams are read-only data (may create
new tuples)
- Data Consistency
SLIDE 15
Characteristics of Transactions
- Manner in which transactions use data
- Nature of time constraints
- Significance of executing a transaction by its
deadline
– consequence of missing specified time constraints
SLIDE 16
Transaction Classification
- Effect of missing transaction deadlines
- Value to user is dependent on timeliness:
– Soft: have some value after deadline
– Firm: have no value after deadline
– Hard: have negative value after deadline
- Special case: no deadline
- Idea for Streams: Queries have periodic
deadlines
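The soft/firm/hard classification can be read as a value function over completion time. The following is a minimal sketch of that reading; the linear decay for soft deadlines and the `base_value`, `decay`, and `penalty` parameters are illustrative assumptions, not something the slides specify.

```python
from enum import Enum


class DeadlineKind(Enum):
    SOFT = "soft"
    FIRM = "firm"
    HARD = "hard"


def value_at(kind, finish_time, deadline,
             base_value=1.0, decay=0.1, penalty=10.0):
    """Value delivered by a transaction that finishes at finish_time.

    base_value, decay and penalty are hypothetical tuning knobs.
    """
    if finish_time <= deadline:
        return base_value  # on time: full value for every class
    lateness = finish_time - deadline
    if kind is DeadlineKind.SOFT:
        # Some value remains after the deadline (assumed linear decay to 0).
        return max(0.0, base_value - decay * lateness)
    if kind is DeadlineKind.FIRM:
        return 0.0  # no value after the deadline
    return -penalty  # hard: a late result is actively harmful
```

A scheduler could use such a function to decide whether a late transaction is still worth running (soft), should be dropped (firm), or must never be allowed to be late (hard).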
SLIDE 17
Scheduling and Streams
- Streams: schedules queries in terms of
QoS
- Real-Time Databases: schedule
transactions in terms of scheduling policy
SLIDE 18
Real-Time Databases/Scheduling
- General Introduction
- Scheduling Policies
- Resource Allocation
- Properties of Data: consistency and
validity
- Conclusions
SLIDE 20
Scheduling Policies
- Earliest deadline first (PMM, PAQRS)
- Highest value first
- Highest value per unit computation time
first
- Longest executed transaction first
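Each of the four policies above amounts to a different sort key over the ready transactions. The sketch below illustrates this; the `Txn` fields and policy names are hypothetical and chosen only for the example.

```python
from dataclasses import dataclass


@dataclass
class Txn:
    name: str
    deadline: float   # absolute deadline
    value: float      # user-supplied value
    est_cost: float   # estimated computation time
    executed: float   # time already spent executing


# Each policy maps a transaction to a key; the smallest key runs first.
POLICIES = {
    "earliest_deadline_first": lambda t: t.deadline,
    "highest_value_first": lambda t: -t.value,
    "highest_value_per_unit_time_first": lambda t: -(t.value / t.est_cost),
    "longest_executed_first": lambda t: -t.executed,
}


def pick_next(ready, policy):
    """Return the transaction the given policy would run next."""
    return min(ready, key=POLICIES[policy])
```

With the same ready set, different policies can pick different transactions, which is why the choice of policy matters under load.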
SLIDE 21
PMM
- Priority Memory Management
- Admission Control
– Decide if we run a query.
- Memory Allocation
– How much memory does each running query get.
SLIDE 22
Memory Allocation: Two Strategies
- Max
– Queries get their maximum required memory or no memory at all.
- MinMax
– High priority queries get their maximum required memory and low priority queries get their minimum.
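The two strategies can be contrasted with a small sketch. The dictionary keys (`name`, `priority`, `min_mem`, `max_mem`) are hypothetical, a query that does not fit is simply skipped (an assumption; the slides do not say whether admission stops at that point), and every minimum requirement is assumed to be greater than zero.

```python
def by_priority(queries):
    """Highest priority first."""
    return sorted(queries, key=lambda q: q["priority"], reverse=True)


def allocate_max(queries, total_memory):
    """Max: a query gets its maximum requirement or nothing."""
    grants, free = {}, total_memory
    for q in by_priority(queries):
        grant = q["max_mem"] if q["max_mem"] <= free else 0
        grants[q["name"]] = grant
        free -= grant
    return grants


def allocate_minmax(queries, total_memory):
    """MinMax: admit at minimum memory, then top up by priority."""
    ordered = by_priority(queries)
    grants, free = {}, total_memory
    # Pass 1: admit queries while their minimum requirement still fits.
    for q in ordered:
        grant = q["min_mem"] if q["min_mem"] <= free else 0
        grants[q["name"]] = grant
        free -= grant
    # Pass 2: raise admitted queries toward their maximum, best priority first.
    for q in ordered:
        if grants[q["name"]] == 0:
            continue  # not admitted
        extra = min(q["max_mem"] - q["min_mem"], free)
        grants[q["name"]] += extra
        free -= extra
    return grants
```

Under Max, fewer queries run but each runs at full speed; under MinMax, more queries are admitted and the low-priority ones run with constrained memory.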
SLIDE 23
Admission Control
- Goal: minimize the miss ratio (number of
queries that miss their deadline/total queries).
- MultiProgramming Level (MPL) =
number of queries to run.
- Optimize system resource use: optimal
MPL.
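As a rough illustration of the goal stated above, the miss ratio can be computed per MPL setting and the setting with the lowest ratio chosen. This is only a sketch: the `observations` structure is hypothetical, and a real admission controller would adapt the MPL online rather than pick from a fixed table.

```python
def miss_ratio(missed, total):
    """Queries that missed their deadline divided by total queries."""
    return missed / total if total else 0.0


def best_mpl(observations):
    """observations maps an MPL to (missed, total) counts seen at that MPL."""
    return min(observations, key=lambda mpl: miss_ratio(*observations[mpl]))
```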
SLIDE 24
Relating MPL to Streams
- Real-Time: One time queries
- Stream: Continuous Queries
- Possibilities for future DSMS:
– Using MPL for QoS
SLIDE 25
Oh no! Missiles are launched again.
- We are running two types of queries:
– Query1 – Where should CNN’s cameras face to see the missile?
– Query2 – Should we shoot the missile down?
- Queries of type 2 are obviously more
important, but how does the db know?
- Consider: Applications for relative query
values in stream systems.
SLIDE 26
PAQRS – extension of PMM
- Priority Adaptation Query Resource
Scheduling.
- PMM only minimizes miss ratio for the entire
system.
- We would like to be able to specify a ratio
between query classes for missed deadlines.
- RelMissRatio (Relative Miss Ratio) = {99:1}
Query1:Query2.
SLIDE 27
Why do we care?
- Think of the missile example.
– Same problems still exist in stream systems.
- Potential Stream Additions:
– Relative Priority Scheduling.
  - Not all queries are equal
  - Another form of QoS
– Periodic Query Deadlines.
  - Deadlines for continuous queries
SLIDE 28
Bias Control
- Puts queries into two groups:
– Regular – Queries run with normal priority
– Reserve – Queries run with priority lower than regular.
- Manages groups on a per query basis
– Each class gets RegQuota regular queries.
– The rest have to run as reserve queries.
SLIDE 29
Relative Weights
Weight should reflect a class’ RelMissRatio.
$$\mathrm{Weight}_i = \frac{1/\mathrm{RelMissRatio}_i}{\sum_j 1/\mathrm{RelMissRatio}_j}$$
$$\mathrm{Weight}_{cnn} = \frac{1/99}{1/99 + 1} = 0.01$$
$$\mathrm{Weight}_{mis} = \frac{1}{1/99 + 1} = 0.99$$
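The weight formula can be checked numerically. The sketch below normalizes the inverse RelMissRatio values; the function name and the class labels `cnn` and `missile` are chosen for illustration.

```python
def relative_weights(rel_miss_ratio):
    """Weight_i = (1 / RelMissRatio_i) / sum_j (1 / RelMissRatio_j)."""
    inverse = {cls: 1.0 / ratio for cls, ratio in rel_miss_ratio.items()}
    total = sum(inverse.values())
    return {cls: inv / total for cls, inv in inverse.items()}


# RelMissRatio = {99:1} for the CNN query class vs. the missile query class.
weights = relative_weights({"cnn": 99, "missile": 1})
```

The class that is allowed to miss more often (CNN) ends up with the smaller weight.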
SLIDE 30
Bias Control using Relative Weights
$$\mathrm{WeightedMissRatio} = \sum_i \mathrm{Weight}_i \cdot \mathrm{MissRatio}_i$$
All terms are equal when the ratio is correct.
$$\mathrm{WeightedMissRatio}_{ex} = (0.01 \cdot 99x\%) + (0.99 \cdot x\%) = 0.99x\% + 0.99x\%$$
SLIDE 31
Back to Missiles and CNN
- The actual miss ratio is not correct: the miss ratio is 50:50!
- $$\mathrm{RegQuota}_i^{new} = \mathrm{RegQuota}_i^{old} \cdot \frac{\mathrm{Weight}_i \cdot \mathrm{MissRatio}_i}{\mathrm{WeightedMissRatio} / \mathrm{NumClasses}}$$
SLIDE 32
Missiles and CNN Calculations
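$$\mathrm{WeightedMissRatio} = (0.01 \cdot 0.50) + (0.99 \cdot 0.50) = 0.5$$
$$0.005 \neq 0.495$$
$$\mathrm{RegQuota}_{cnn}^{new} = \mathrm{RegQuota}_{cnn}^{old} \cdot \frac{0.01 \cdot 0.50}{0.5/2} = \mathrm{RegQuota}_{cnn}^{old} \cdot 0.02 \quad \text{(98\% less)}$$
$$\mathrm{RegQuota}_{mis}^{new} = \mathrm{RegQuota}_{mis}^{old} \cdot \frac{0.99 \cdot 0.50}{0.5/2} = \mathrm{RegQuota}_{mis}^{old} \cdot 1.98 \quad \text{(98\% more)}$$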
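The calculation on this slide can be reproduced in a few lines. This is a sketch of the arithmetic only; the function names, the starting quota of 100 per class, and the guard for a zero miss ratio are assumptions added for the example.

```python
def weighted_miss_ratio(weights, miss_ratios):
    """Sum over classes of Weight_i * MissRatio_i."""
    return sum(weights[c] * miss_ratios[c] for c in weights)


def update_reg_quota(old_quota, weights, miss_ratios):
    """Scale each class's RegQuota by its term relative to the per-class average."""
    target = weighted_miss_ratio(weights, miss_ratios) / len(weights)
    if target == 0:
        return dict(old_quota)  # nothing missed: leave quotas unchanged (assumed)
    return {
        c: old_quota[c] * (weights[c] * miss_ratios[c]) / target
        for c in weights
    }


weights = {"cnn": 0.01, "missile": 0.99}
observed = {"cnn": 0.50, "missile": 0.50}   # the 50:50 miss ratio
new_quota = update_reg_quota({"cnn": 100, "missile": 100}, weights, observed)
```

With these inputs the CNN class's regular quota shrinks to 2% of its old value and the missile class's grows to 198%, matching the factors 0.02 and 1.98 above.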
SLIDE 33
Does it really work?
SLIDE 34
Real-Time Databases/Scheduling
- General Introduction
- Scheduling Policies
- Resource Allocation
- Properties of Data:consistency & validity
- Conclusions
SLIDE 36
Essence of Real Time
- Although adaptive systems give better throughput, IT DOESN’T MATTER!
- RT is about dependability, not throughput. A 1% miss rate is (usually) unacceptable.
- Throughput can (usually) be handled with extra hardware (i.e. money). Dependability needs a special design.
SLIDE 37
Resources in Databases
- Logical
– Locks
- Physical
– CPU(s)
– Memory
  - Cache
  - Work Area
– I/O Bandwidth
  - Disks & Storage
  - Network for Distributed Processing
– Time...
SLIDE 38
Cost of a Transaction
- Waiting for locks to release
- Work memory needed (e.g. O(n) for in-memory hash join, O(sqrt(n)) for disk-assisted)
- I/O amount (e.g. worst-case join: product of the input sizes)
- CPU needed to process
- Cost of aborting a transaction (negligible for queries)
If success cannot be guaranteed, don't start!
SLIDE 39
Physical Resources – Now and Then
Resource               1995 (Paper)   2003 (opt. RAID)
CPU Speed (MIPS)       40             2 x 2500
Memory Buffers (MB)    20             800
I/O Bandwidth (MB/s)   10             100
Disk Size (GB)         1              120 (1TB)
Disk Cache (kB)        256            8192 (1GB)
Disk Latency (ms)      16.7           6
# Disks                10             4 (16)
Latency is Forever…
SLIDE 40
Memory Allocation Strategies (I)
- Max
– all memory needed or nothing (don't admit)
- MinMax
– all memory needed for high-priority
– min memory needed for low-priority
- M&M
– feedback-based allocation
– adaptive
– small amount of memory set aside for small transactions
SLIDE 41
Memory Allocation Strategies (II)
- Multiclass Dependent
– Small get all the memory they need
– Large get a minimum amount
– Medium get memory according to the load level
- Classes are:
– Small – needs less than 10% of memory
– Large – needs more than the available memory
– Medium – between them.
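The class boundaries above translate directly into a small classifier. This is a sketch under the assumption that "memory" means the total memory available to queries; the function and parameter names are hypothetical.

```python
def memory_class(required, total_memory, small_fraction=0.10):
    """Classify a query by how much memory it needs relative to the total."""
    if required < small_fraction * total_memory:
        return "small"    # gets all the memory it needs
    if required > total_memory:
        return "large"    # can only ever get a minimum amount
    return "medium"       # allocation depends on the current load level
```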
SLIDE 42
Allocating Memory
[Figure: a sequence of small (S), medium (M), and large (L) transactions being allocated memory]
SLIDE 43
Multi-Class Resource Allocation
[Figure: Single Queue vs. Multiple Queues; S, M, and L transactions waiting for resources either in one shared queue or in one queue per class]
SLIDE 44
Locking Strategies for Transactions
- Wait patiently…
– Bad idea: can wait forever behind a lower-priority transaction, or deadlock
- Upgrade priority of lock holder
– Holder will complete the less important job first, then the waiting transaction continues
- Abort transaction with lower priority
– Need to assess time of abort…
RT systems are not tolerant about lock delays!
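One way to picture the choice between these strategies is a small decision rule. The rule below is purely illustrative and is not taken from the slides: it assumes a lower-priority requester simply waits, and that for a higher-priority requester the system compares the holder's remaining run time with the cost of aborting it. All names are hypothetical.

```python
def resolve_lock_conflict(requester_priority, holder_priority,
                          holder_remaining, abort_cost):
    """Pick a strategy when a requester hits a lock held by another transaction.

    holder_remaining: estimated time for the holder to finish and release.
    abort_cost: estimated time to abort and roll back the holder.
    """
    if requester_priority <= holder_priority:
        return "wait"  # the holder is at least as important
    if holder_remaining <= abort_cost:
        # Cheaper to let the holder finish quickly at the requester's priority.
        return "inherit_priority"
    return "abort_holder"
```

The comparison reflects the point that the time needed for an abort has to be assessed before choosing it over a priority upgrade.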
SLIDE 45
Locks for Queries (Cursors)
- Grab all locks
– Long wait, holds other transactions
- Disregard locks (“dirty read”)
– May read inconsistent data or data to be discarded
- Read only committed data (“committed read”)
– May read stale data
– Data may change while acting upon it
- Lock current record (“cursor stability”)
– Other parts of the set may change while active
– May interfere with transactions
SLIDE 46
Costs of Distributed Processing
- Two phase commit protocol
- Aborting a transaction
- I/O for Queries
- Network delays (use dedicated
connections)
SLIDE 47
Relevance to Streams
NO
- Locks
- Rollbacks
YES
- Memory allocation
- Disk latency
- I/O Bandwidth
- Deadlines?
SLIDE 48
Real-Time Databases/Scheduling
- General Introduction
- Scheduling Policies
- Resource Allocation
- Properties of Data:consistency & validity
- Conclusions
SLIDE 50
Properties of Data
- Consistency
– Temporal
– Absolute
– Relative
- Validity and Timestamp
– Validity interval = how long a reading remains accurate after it arrives in the system (its timestamp)
SLIDE 51
Temporal Consistency
- R-T imposes temporal constraints not
present in streaming systems
- Need to preserve temporal validity of
data to reflect state of environment
- If transaction must meet deadline, valid
data must be in R-T system
- Consists of absolute and relative
consistency
SLIDE 52
Looking at Fresh Data
- How do we look at relevant data in streams?
– No guarantee data is fresh
– Shed older data: as new data comes in, older data is flushed from the system to make room
- How do we look at fresh data in Real-Time
databases?
– Make sure data hasn’t expired – absolute consistency
SLIDE 53
Absolute Consistency
- Consistency between the state of the environment and its reflection in the database
- Necessary to ensure controlling system is
aware of actual state of environment
- Example:
– A reading is taken indicating which reporter is with the 3rd infantry on April 8th; this reading is valid for 24 hours
SLIDE 54
Formal Definition: Absolute Consistency
- Data item d is described by: (value, avi, timestamp)
– $d_{value}$ = current state of d
– $d_{timestamp}$ = time when the observation concerning d was made
– $d_{avi}$ = d’s absolute validity interval: the length of time following $d_{timestamp}$ during which d has absolute validity
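The definition above suggests a simple validity check: an item is absolutely consistent while the time elapsed since its timestamp does not exceed its avi. The sketch below encodes that reading; the class and function names are hypothetical, and timestamps and intervals are assumed to share one time unit.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataItem:
    value: float
    avi: float        # absolute validity interval
    timestamp: float  # when the observation was made


def absolutely_consistent(d, now):
    """True while d is still within its absolute validity interval."""
    return now - d.timestamp <= d.avi
```

For the reporter example, a reading with a 24-hour avi stays usable for queries issued up to 24 hours after it was taken.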
SLIDE 55
Validity between Data in Streams
- Suppose: we want tuple1 and tuple2 to have been created within a given time interval
- Implicit notion of relative validity
– Data isn’t persistent in streaming systems, so relative inconsistency is less likely: as data becomes stale, it is less likely to still be in the system
– If the time interval is important, specify it in the query
  - E.g. window joins
SLIDE 56
Relative Consistency: R-T
- Data must be consistent in a group used
to derive other data
– Data used to derive other data must be produced close together
- Example:
– If we are taking the average temperature of 3 locations, readings for the 3 areas should be taken within proximity of each other
SLIDE 57
Formal Definition: Relative Consistency
- Set of data items used to derive other
data is a relative consistency set, R
- $R_{rvi}$ = relative validity interval
- R is relatively consistent if:
– $\forall d, d' \in R,\ |d_{timestamp} - d'_{timestamp}| \le R_{rvi}$
SLIDE 58
Illustration
- Scud missile is traveling at 500 mph SW at –45º, at an altitude of 100 ft
- Patriot missile is traveling at 1500 mph NE at 60º
- Can compute whether the Patriot will intercept the Scud
- Readings must be taken within some time interval I to ensure the computation is possible
SLIDE 59
Observe…
- $\mathrm{Scud\_speed}_{avi}$ = 4 ms, $\mathrm{Patriot\_speed}_{avi}$ = 2 ms, and $R_{rvi}$ = 1; time = 12:33
- Scud_speed = (500, 4, 12:30)
- Patriot_speed1 = (1500, 2, 12:31)
- Patriot_speed2 = (1500, 2, 12:32)
- All have absolute consistency, but R’s relative consistency is violated
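The example can be checked mechanically. The sketch below assumes the validity intervals are expressed in the same unit as the timestamps (minutes past 12:00), which is an interpretation: the slide labels the intervals "ms" but gives minute-granularity timestamps. Items are written as (value, avi, timestamp) tuples.

```python
from itertools import combinations


def absolutely_consistent(item, now):
    """Item is (value, avi, timestamp); valid while now - timestamp <= avi."""
    _value, avi, timestamp = item
    return now - timestamp <= avi


def relatively_consistent(items, rvi):
    """Every pair of timestamps in the set must lie within rvi of each other."""
    return all(abs(a[2] - b[2]) <= rvi for a, b in combinations(items, 2))


now = 33                          # 12:33
scud_speed = (500, 4, 30)         # read at 12:30
patriot_speed1 = (1500, 2, 31)    # read at 12:31
patriot_speed2 = (1500, 2, 32)    # read at 12:32
R = [scud_speed, patriot_speed1, patriot_speed2]

all_absolute = all(absolutely_consistent(d, now) for d in R)
relative_ok = relatively_consistent(R, rvi=1)
```

Under this reading every item is still within its own avi at 12:33, yet the Scud reading and the second Patriot reading are two minutes apart, which exceeds the rvi of 1.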
SLIDE 60
Achieving Validity Intervals
- avi:
– R-T: realized by frequent sampling of real-world data
– Streams: can’t do this
- rvi:
– R-T: when rvi is combined with avi, the smallest avi belonging to the relative consistency set will prevail
– Streams: specified in query
– Note: only necessary to achieve the rvi of relative consistency set R if data is being derived from R
SLIDE 61
Timestamps of Derived Data
- How to assign a timestamp to derived data d’?
- One possibility: give d’ the timestamp of the oldest item from which it is derived:
$$d'_{timestamp} = \min_{d \in R} (d_{timestamp})$$
- Alternative: $d'_{timestamp}$ = some function of the data from which it is derived
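Both options fit one small helper in which the combining function is a parameter. This is a sketch; the function name and the use of `max` as an example alternative are assumptions.

```python
def derived_timestamp(source_timestamps, combine=min):
    """Timestamp for derived data; defaults to the oldest source timestamp."""
    return combine(source_timestamps)
```

Using `min` is the conservative choice, since the derived item then expires no later than a value computed from its oldest input would.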
SLIDE 62
Another Note on Consistency
- avi and rvi may change with system
dynamics
– Streams: if querying a soldier’s heartbeat and we see it stabilize, can issue the query less often
– R-T: if system notices heartbeat is steady, may increase validity interval
SLIDE 63
Real-Time relation to Streams
- Real-Time can have streaming queries
- Temporal validity of data vs. window
joins
- Load shedding based on user-defined priorities (QoS); R-T sheds transactions vs. Streams shed tuples
SLIDE 64
Conclusions
- Parallels between Streams and Real-
Time Databases:
– Scheduling (Streams: based on QoS; R-T: based on priority and deadlines)
– Load Shedding (Streams: tuple-based; R-T: based on ability to meet deadlines)
– Freshness of Data (Streams: defined in query; R-T: defined in data)
SLIDE 65
Discussion Questions
- What are the pros and cons of building
notion of relative consistency into DSMS itself instead of its queries?
- Is it worthwhile to define QoS for a
DSMS in terms of a ratio between queries? For example a relative periodic query scheduling policy.
SLIDE 66
Discussion Questions
- Should it be possible to dynamically update the Relative Miss Ratio? What are some situations that would benefit from this? In streams?
- Low priority queries miss their deadlines and do not run; what is the parallel to this in a DSMS?
SLIDE 67
Real-Time Databases/Scheduling
- General Introduction
- Scheduling Policies
- Resource Allocation
- Properties of Data: consistency and
validity
- Conclusions
SLIDE 69