Real-Time Databases Meghan Russ Miriam Speert Pete Dempsey Sedat - - PowerPoint PPT Presentation



slide-1
SLIDE 1

Real-Time Databases

Meghan Russ Miriam Speert Pete Dempsey Sedat Behar Yevgeny Ioffe Zachi Klopman

slide-2
SLIDE 2

Timeline

1:40 - 1:50: Introduction
1:50 - 3:00: Real-Time Databases/Scheduling
3:00 - 3:10: Break
3:10 - 4:00: Operator Scheduling in Aurora
4:00 - 4:25: Discussion
4:25 - 4:30: Comments

slide-3
SLIDE 3

Real-Time Databases/Scheduling

  • General Introduction
  • Scheduling Policies
  • Resource Allocation
  • Properties of Data: consistency and validity

  • Conclusions
slide-4
SLIDE 4

References

  • http://www.fpa.org/newsletter_info2584/newsletter_info.htm (info on Scud missiles)
  • http://www.fas.org/spp/starwars/gao/im92026.htm (info on Patriot Missile System)

slide-5
SLIDE 5

Imagine this…

  • We are at war with Iraq
  • Our soldiers find a potential target
  • Military intelligence consults a database to determine a course of action

slide-6
SLIDE 6

Imagine this…

  • We are at war with Iraq
  • Air control system constantly monitors hundreds of aircraft and records them in a database
  • Intelligence systems constantly query the database for potential threats

slide-7
SLIDE 7

Suddenly…

  • Hundreds of missiles are launched
  • We suspect some are nuclear
  • Need info that will allow us to determine a course of action

  • Need this info to make rapid decision
  • The costs of indecision are catastrophic
slide-8
SLIDE 8

What could go wrong?

  • Limited number of missiles we can intercept
  • Once they’re launched, we have limited time to react
  • Our traditional database is slowed by less critical queries
  • Finally, our queries may not be answered in time due to system load

slide-9
SLIDE 9

We need a system that:

  • Handles time-sensitive queries
  • Returns only temporally valid data
  • Supports priority scheduling
  • Solution: Real-Time Databases!
slide-10
SLIDE 10

Real-Time Databases and Streams

  • Scheduling
      – Streams: priority based on QoS optimization
      – Real-Time: priority based on deadlines
  • Load Shedding
      – Streams: dropping tuples from queues
      – Real-Time: missing deadlines
  • Freshness of data:
      – Streams: not guaranteed
      – Real-Time: resample

slide-11
SLIDE 11

Real-Time Databases and Streams

  • Scheduling
      – Streams: priority based on QoS optimization
      – Real-Time: priority based on deadlines and user-supplied values
  • Load Shedding
      – Streams: dropping tuples from queues
      – Real-Time: missing deadlines, dropping transactions
  • Freshness of data:
      – Streams: not guaranteed
      – Real-Time: resample

slide-12
SLIDE 12

Real-Time Databases

  • An extension to traditional databases
  • Motivated by a class of applications that require reliable responses

  • Predictable (not necessarily fast)
slide-13
SLIDE 13

Real-Time Database Features

  • Priority
      – Classification of transactions
      – Assigns value to transactions
  • Deadlines
      – Transactions specify explicit time requirements
      – Transaction scheduling takes time requirements into account
      – Predictability that transactions will complete by deadline or not at all

slide-14
SLIDE 14

Transactions and Streams

  • Transactions: operations on the database that perform combinations of reads/writes in an atomic step
      – Queries are a subset of transactions
  • Streams are read-only data (may create new tuples)

  • Data Consistency
slide-15
SLIDE 15

Characteristics of Transactions

  • Manner in which transactions use data
  • Nature of time constraints
  • Significance of executing a transaction by its deadline
      – Consequence of missing specified time constraints

slide-16
SLIDE 16

Transaction Classification

  • Effect of missing transaction deadlines
  • Value to user is dependent on timeliness:
      – Soft: have some value after deadline
      – Firm: have no value after deadline
      – Hard: have negative value after deadline
  • Special case: no deadline
  • Idea for Streams: Queries have periodic deadlines
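
The three deadline classes can be read as value functions of completion time. The sketch below is one possible reading, not something the slides specify: the linear decay for soft transactions (`grace`) and the fixed cost of a late hard transaction (`penalty`) are illustrative assumptions.

```python
def transaction_value(kind, finish, deadline, value=1.0, grace=10.0, penalty=1.0):
    """Value delivered by a transaction that finishes at time `finish`.

    `grace` and `penalty` are hypothetical parameters chosen for illustration.
    """
    if finish <= deadline:
        return value                      # on time: full value for every class
    late = finish - deadline
    if kind == "soft":
        # Some value remains after the deadline; here it decays linearly to 0.
        return max(0.0, value * (1.0 - late / grace))
    if kind == "firm":
        return 0.0                        # no value after the deadline
    if kind == "hard":
        return -penalty                   # negative value after the deadline
    raise ValueError(f"unknown class: {kind}")
```

A scheduler that maximizes total value would therefore drop a late firm transaction but would try hard never to let a hard one become late.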

slide-17
SLIDE 17

Scheduling and Streams

  • Streams: schedule queries in terms of QoS
  • Real-Time Databases: schedule transactions in terms of scheduling policy

slide-18
SLIDE 18

Real-Time Databases/Scheduling

  • General Introduction
  • Scheduling Policies
  • Resource Allocation
  • Properties of Data: consistency and validity

  • Conclusions
slide-20
SLIDE 20

Scheduling Policies

  • Earliest deadline first (PMM, PAQRS)
  • Highest value first
  • Highest value per unit computation time first

  • Longest executed transaction first
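
Each of the four policies can be expressed as a sort key over the ready transactions. This is a minimal sketch: the `Txn` fields and the `pick_next` helper are hypothetical names, and using the remaining computation time as the denominator for "value per unit computation time" is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Txn:
    name: str
    deadline: float    # absolute deadline
    value: float       # user-supplied value
    remaining: float   # estimated computation time still needed (assumed > 0)
    executed: float    # computation time already consumed


# Each policy maps a transaction to a sort key; the smallest key runs first.
POLICIES = {
    "earliest_deadline_first": lambda t: t.deadline,
    "highest_value_first": lambda t: -t.value,
    "highest_value_per_unit_time_first": lambda t: -t.value / t.remaining,
    "longest_executed_first": lambda t: -t.executed,
}


def pick_next(ready, policy):
    """Return the transaction that the named policy would run next."""
    return min(ready, key=POLICIES[policy])
```

The same ready set can yield a different winner under each policy, which is why the choice of policy matters under load.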
slide-21
SLIDE 21

PMM

  • Priority Memory Management
  • Admission Control

– Decide if we run a query.

  • Memory Allocation

– How much memory does each running query get.

slide-22
SLIDE 22

Memory Allocation: Two Strategies

  • Max
      – Queries get their maximum required memory or no memory at all.
  • MinMax
      – High priority queries get their maximum required memory and low priority queries get their minimum.
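
The difference between the two strategies shows up when memory is scarce. The sketch below is one interpretation rather than the PMM algorithm itself: queries are assumed to be dicts with hypothetical `name`, `priority`, `min`, and `max` fields, and MinMax is modeled as "give everyone who fits the minimum, then top up to the maximum in priority order".

```python
def allocate_max(queries, total):
    """Max: a query gets its full maximum requirement or no memory at all."""
    grants, free = {}, total
    for q in sorted(queries, key=lambda q: q["priority"], reverse=True):
        give = q["max"] if q["max"] <= free else 0
        grants[q["name"]] = give
        free -= give
    return grants


def allocate_minmax(queries, total):
    """MinMax: admitted queries get their minimum; high priority is topped up."""
    ordered = sorted(queries, key=lambda q: q["priority"], reverse=True)
    grants, free, admitted = {}, total, []
    for q in ordered:
        if q["min"] <= free:
            grants[q["name"]] = q["min"]
            free -= q["min"]
            admitted.append(q)
        else:
            grants[q["name"]] = 0          # not admitted
    for q in admitted:                     # still in priority order
        extra = q["max"] - q["min"]
        if extra <= free:
            grants[q["name"]] = q["max"]
            free -= extra
    return grants
```

With three queries that each need between 2 and 6 units and 10 units in total, Max runs only the highest-priority query, while MinMax runs all three and gives the full 6 units to the highest-priority one.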

slide-23
SLIDE 23

Admission Control

  • Goal: minimize the miss ratio (number of queries that miss their deadline/total queries).
  • MultiProgramming Level (MPL) = number of queries to run.
  • Optimize system resource use: optimal MPL.

slide-24
SLIDE 24

Relating MPL to Streams

  • Real-Time: One time queries
  • Stream: Continuous Queries
  • Possibilities for future DSMS:

– Using MPL for QoS

slide-25
SLIDE 25

Oh no! Missiles are launched again.

  • We are running two types of queries:
      – Query1 – Where should CNN’s cameras face to see the missile?
      – Query2 – Should we shoot the missile down?
  • Queries of type 2 are obviously more important, but how does the db know?
  • Consider: Applications for relative query values in stream systems.

slide-26
SLIDE 26

PAQRS – extension of PMM

  • Priority Adaptation Query Resource Scheduling.
  • PMM only minimizes miss ratio for the entire system.
  • We would like to be able to specify a ratio between query classes for missed deadlines.
  • RelMissRatio (Relative Miss Ratio) = {99:1} for Query1:Query2.

slide-27
SLIDE 27

Why do we care?

  • Think of the missile example.
      – Same problems still exist in stream systems.
  • Potential Stream Additions:
      – Relative Priority Scheduling.
          • Not all queries are equal
          • Another form of QoS
      – Periodic Query Deadlines.
          • Deadlines for continuous queries
slide-28
SLIDE 28

Bias Control

  • Puts queries into two groups:
      – Regular – Queries run with normal priority
      – Reserve – Queries run with priority lower than regular.
  • Manages groups on a per query basis
      – Each class gets RegQuota regular queries.
      – The rest have to run as reserve queries.

slide-29
SLIDE 29

Relative Weights

Weight should reflect a class’ RelMissRatio.

$\mathit{Weight}_i = \dfrac{1/\mathit{RelMissRatio}_i}{\sum_j (1/\mathit{RelMissRatio}_j)}$

$\mathit{Weight}_{cnn} = \dfrac{1/99}{1/99 + 1} = 0.01$

$\mathit{Weight}_{mis} = \dfrac{1}{1/99 + 1} = 0.99$
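
The weight formula is a normalization of inverse relative miss ratios. A minimal sketch, assuming the ratios arrive as a dict keyed by class name (`relative_weights` is a hypothetical helper, not a name from the slides):

```python
def relative_weights(rel_miss_ratio):
    """Normalize 1/RelMissRatio over all classes so the weights sum to 1."""
    inverse = {cls: 1.0 / ratio for cls, ratio in rel_miss_ratio.items()}
    total = sum(inverse.values())
    return {cls: inv / total for cls, inv in inverse.items()}


weights = relative_weights({"cnn": 99, "mis": 1})
print(weights)  # cnn is approximately 0.01, mis approximately 0.99
```

A class that is allowed to miss often (a large RelMissRatio) gets a small weight, so each of its misses counts for little in the weighted miss ratio.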

slide-30
SLIDE 30

Bias Control using Relative Weights

$\mathit{WeightedMissRatio} = \sum_i (\mathit{Weight}_i \cdot \mathit{MissRatio}_i)$

All terms are equal when the ratio is correct.

$\mathit{WeightedMissRatio}_{ex} = (0.01 \cdot 99x\%) + (0.99 \cdot x\%)$

$\mathit{WeightedMissRatio}_{ex} = 0.99x\% + 0.99x\%$

slide-31
SLIDE 31

Back to Missiles and CNN

  • The actual miss ratio does not match the target: it is 50:50!
  • $\mathit{RegQuota}_i^{new} = \mathit{RegQuota}_i^{old} \cdot \dfrac{\mathit{Weight}_i \cdot \mathit{MissRatio}_i}{\mathit{WeightedMissRatio} / \mathit{NumClasses}}$

slide-32
SLIDE 32

Missiles and CNN Calculations

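$\mathit{WeightedMissRatio} = (0.01 \cdot 0.50) + (0.99 \cdot 0.50) = 0.5$

$0.005 \neq 0.495$

$\mathit{RegQuota}_{cnn}^{new} = \mathit{RegQuota}_{cnn}^{old} \cdot \dfrac{0.01 \cdot 0.50}{0.5 / 2} = \mathit{RegQuota}_{cnn}^{old} \cdot 0.02$ (98% less)

$\mathit{RegQuota}_{mis}^{new} = \mathit{RegQuota}_{mis}^{old} \cdot \dfrac{0.99 \cdot 0.50}{0.5 / 2} = \mathit{RegQuota}_{mis}^{old} \cdot 1.98$ (98% more)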
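
The quota update can be checked numerically. The sketch below applies the RegQuota formula to the CNN/missile numbers; the function name, the dict-based inputs, and the starting quota of 100 per class are assumptions made for illustration.

```python
def update_reg_quota(old_quota, weights, miss_ratios):
    """Scale each class's regular quota by its share of the weighted miss ratio."""
    weighted = sum(weights[c] * miss_ratios[c] for c in weights)
    if weighted == 0:
        return dict(old_quota)            # nothing missed: leave quotas alone
    per_class_target = weighted / len(weights)
    return {
        c: old_quota[c] * (weights[c] * miss_ratios[c]) / per_class_target
        for c in weights
    }


new_quota = update_reg_quota(
    old_quota={"cnn": 100, "mis": 100},   # hypothetical starting quotas
    weights={"cnn": 0.01, "mis": 0.99},
    miss_ratios={"cnn": 0.50, "mis": 0.50},
)
print(new_quota)  # cnn shrinks to about 2, mis grows to about 198
```

When every class's weighted term already equals the per-class target, each factor is 1 and the quotas stop changing, which is the "all terms are equal" condition.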

slide-33
SLIDE 33

Does it really work?

slide-34
SLIDE 34

Real-Time Databases/Scheduling

  • General Introduction
  • Scheduling Policies
  • Resource Allocation
  • Properties of Data:consistency & validity
  • Conclusions
slide-36
SLIDE 36

Essence of Real Time

  • Although adaptive systems give better throughput, IT DOESN’T MATTER!
  • RT is about dependability, not throughput. 1% miss rate is (usually) unacceptable.
  • Throughput can be handled (usually) with extra hardware (i.e. money). Dependability needs a special design.

slide-37
SLIDE 37

Resources in Databases

  • Logical
      – Locks
  • Physical
      – CPU(s)
      – Memory
          • Cache
          • Work Area
      – I/O Bandwidth
          • Disks & Storage
          • Network for Distributed Processing
      – Time...

slide-38
SLIDE 38

Cost of a Transaction

  • Waiting for locks to release
  • Work memory needed (e.g. O(n) for in-memory hash join, O(sqrt(n)) for disk-assisted)
  • I/O amount (e.g. worst case join: multiplication)
  • CPU needed to process
  • Cost of aborting a transaction (negligible for queries)

If success cannot be guaranteed, don't start!
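
The "don't start" rule amounts to an admission test against the deadline. A minimal sketch, assuming worst-case estimates for each cost component are available; the function and parameter names are hypothetical.

```python
def should_admit(now, deadline, lock_wait, io_time, cpu_time):
    """Admit only if the worst-case estimated finish time meets the deadline."""
    estimated_finish = now + lock_wait + io_time + cpu_time
    return estimated_finish <= deadline
```

Rejecting at admission time costs nothing, whereas starting and then aborting wastes the resources already consumed and may require a rollback.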

slide-39
SLIDE 39

Physical Resources – Now and Then

Resource                1995 (Paper)    2003 (opt. RAID)
CPU Speed (MIPS)        40              2 x 2500
Memory Buffers (MB)     20              800
I/O Bandwidth (MB/s)    10              100
Disk Size (GB)          1               120 (1 TB)
Disk Cache (kB)         256             8192 (1 GB)
Disk Latency (ms)       16.7            6
# Disks                 10              4 (16)

Latency is Forever…

slide-40
SLIDE 40

Memory Allocation Strategies (I)

  • Max
      – all memory needed or nothing (don't admit)
  • MinMax
      – all memory needed for high-priority
      – min memory needed for low-priority
  • M&M
      – feedback-based allocation
      – adaptive
      – small amount of memory set aside for small transactions

slide-41
SLIDE 41

Memory Allocation Strategies (II)

  • Multiclass Dependent
      – Small get all the memory they need
      – Large get a minimum amount
      – Medium get memory according to the load level
  • Classes are:
      – Small – less than 10% of memory
      – Large – more than the available memory
      – Medium – between them.
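
The class boundaries translate directly into a classifier. A minimal sketch, assuming "memory" means the total buffer memory available to queries and that requirements exactly on a boundary count as medium (the slide does not say):

```python
def memory_class(required, total_memory):
    """Classify a query by its memory requirement relative to total memory."""
    if required * 10 < total_memory:      # less than 10% of memory
        return "small"
    if required > total_memory:           # needs more than all of memory
        return "large"
    return "medium"
```

Under this reading, a large query can never be given everything it asks for, which is why it receives only a minimum allocation.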

slide-42
SLIDE 42

Allocating Memory

[Figure: memory divided among transactions labeled S, M, and L]

slide-43
SLIDE 43

Multi-Class Resource Allocation

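[Figure: two panels, "Single Queue" and "Multiple Queues". In the single-queue panel, S, M, and L transactions share one queue in front of the resources; in the multiple-queue panel, separate queues labeled SP, MP, and LP feed the resources.]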

slide-44
SLIDE 44

Locking Strategies for Transactions

  • Wait patiently…
      – Bad idea – can wait forever for lower priority or deadlock
  • Upgrade priority of lock holder
      – Will complete less important job and then continue
  • Abort transaction with lower priority
      – Need to assess time of abort…

RT systems are not tolerant about lock delays!
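
The three strategies can be combined into one conflict-resolution rule. The sketch below is an illustrative policy, not one given in the slides: it assumes larger numbers mean higher priority and that estimates of the holder's remaining run time and abort cost are available.

```python
def resolve_lock_conflict(requester_prio, holder_prio, holder_remaining, abort_cost):
    """Return (action, holder's resulting priority) for a lock conflict."""
    if requester_prio <= holder_prio:
        # The holder is at least as important: the requester simply waits.
        return "wait", holder_prio
    if abort_cost < holder_remaining:
        # Rolling the holder back is quicker than letting it finish.
        return "abort_holder", holder_prio
    # Otherwise raise the holder's priority so it finishes and releases the lock.
    return "inherit_priority", requester_prio
```

The comparison between `abort_cost` and `holder_remaining` is where the time of the abort gets assessed: aborting a nearly finished transaction can delay the requester more than waiting would.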

slide-45
SLIDE 45

Locks for Queries (Cursors)

  • Grab all locks
      – Long wait, holds other transactions
  • Disregard locks (“dirty read”)
      – May read inconsistent data or data to be discarded
  • Read only committed data (“committed read”)
      – May read stale data
      – Data may change while acting upon it
  • Lock current record (“cursor stability”)
      – Other parts of the set may change while active
      – May interfere with transactions

slide-46
SLIDE 46

Costs of Distributed Processing

  • Two phase commit protocol
  • Aborting a transaction
  • I/O for Queries
  • Network delays (use dedicated connections)

slide-47
SLIDE 47

Relevance to Streams

NO

  • Locks
  • Rollbacks

YES

  • Memory allocation
  • Disk latency
  • I/O Bandwidth
  • Deadlines?
slide-48
SLIDE 48

Real-Time Databases/Scheduling

  • General Introduction
  • Scheduling Policies
  • Resource Allocation
  • Properties of Data:consistency & validity
  • Conclusions
slide-50
SLIDE 50

Properties of Data

  • Consistency
      – Temporal
      – Absolute
      – Relative
  • Validity and Timestamp
      – Validity interval = how long a reading remains accurate after it arrives in the system (its timestamp)

slide-51
SLIDE 51

Temporal Consistency

  • R-T imposes temporal constraints not present in streaming systems
  • Need to preserve temporal validity of data to reflect state of environment
  • If transaction must meet deadline, valid data must be in R-T system
  • Consists of absolute and relative consistency

slide-52
SLIDE 52

Looking at Fresh Data

  • How do we look at relevant data in streams?
      – No guarantee data is fresh
      – Shed older data; as new data comes in, older data is flushed from system to make room
  • How do we look at fresh data in Real-Time databases?
      – Make sure data hasn’t expired – absolute consistency

slide-53
SLIDE 53

Absolute Consistency

  • Between the state of the environment and its reflection in the database
  • Necessary to ensure controlling system is aware of actual state of environment
  • Example:
      – A reading is taken indicating which reporter is with the 3rd infantry on April 8th; this reading is valid for 24 hours

slide-54
SLIDE 54

Formal Definition: Absolute Consistency

  • Data item d is described by: (value, avi, timestamp)
      – $d_{value}$ = current state of d
      – $d_{timestamp}$ = time when the observation concerning d was made
      – $d_{avi}$ = d’s absolute validity interval: length of time following $d_{timestamp}$ during which d has absolute validity
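
Read as a check, the definition says an item is absolutely consistent while the current time is no more than avi past its timestamp. A minimal sketch, assuming all times share one unit and treating the boundary as still valid; `DataItem` and the function name are hypothetical.

```python
from typing import NamedTuple


class DataItem(NamedTuple):
    value: float
    avi: float        # absolute validity interval
    timestamp: float  # time the observation was made


def is_absolutely_consistent(d, now):
    """True while `now` falls within d.avi time units after d.timestamp."""
    return now - d.timestamp <= d.avi
```

For the reporter example, a reading stamped at hour 0 with an avi of 24 is still usable at hour 24 and expired at hour 25.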

slide-55
SLIDE 55

Validity b/w Data in Streams

  • Suppose: Want tuple1 and tuple2 to have been created within given time interval
  • Implicit notion of relative validity
      – Data isn’t persistent in streaming systems, so relative inconsistency is less likely: as data becomes stale, it is less likely to still be in the system
      – If time interval is important, specify in query
          • E.g. window joins
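
A window join makes the time interval an explicit part of the query: two tuples match only if their keys agree and their timestamps are close enough. The sketch below is a simple nested-loop version over in-memory lists, assuming tuples of the form (key, timestamp, payload); it is not how any particular DSMS implements the operator.

```python
def window_join(left, right, window):
    """Join tuples with equal keys whose timestamps differ by at most `window`."""
    matches = []
    for l_key, l_ts, l_payload in left:
        for r_key, r_ts, r_payload in right:
            if l_key == r_key and abs(l_ts - r_ts) <= window:
                matches.append((l_key, l_payload, r_payload))
    return matches
```

Here the window plays the role that the relative validity interval plays in a real-time database, except that it is stated per query rather than attached to the data.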
slide-56
SLIDE 56

Relative Consistency: R-T

  • Data must be consistent in a group used to derive other data
      – Data used to derive other data must be produced close together
  • Example:
      – If we are taking the average temperature of 3 locations, readings for the 3 areas should be taken close together in time

slide-57
SLIDE 57

Formal Definition: Relative Consistency

  • Set of data items used to derive other data is a relative consistency set, R
  • $R_{rvi}$ = relative validity interval
  • R is relatively consistent if:
      – $\forall d, d' \in R: |d_{timestamp} - d'_{timestamp}| \le R_{rvi}$
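
Because the condition must hold for every pair of items, it is enough to compare the oldest and newest timestamps in the set. A minimal sketch, assuming timestamps are plain numbers in one unit and treating an empty set as trivially consistent:

```python
def is_relatively_consistent(timestamps, rvi):
    """True if every pair of timestamps in the set is within `rvi` of each other."""
    timestamps = list(timestamps)
    if not timestamps:
        return True
    # The largest pairwise gap is always between the minimum and the maximum.
    return max(timestamps) - min(timestamps) <= rvi
```

This avoids the quadratic pairwise comparison while giving the same answer.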

slide-58
SLIDE 58

Illustration

  • Scud missile is traveling at 500 mph SW at –45º, at an altitude of 100 ft
  • Patriot missile is traveling at 1500 mph NE at 60º
  • Can run calculations to see whether they will intercept
  • Readings must be taken within some time interval I, to ensure computation is possible

slide-59
SLIDE 59

Observe…

  • $\mathit{Scud\_speed}_{avi}$ = 4 ms, $\mathit{Patriot\_speed}_{avi}$ = 2 ms, and $R_{rvi}$ = 1; time = 12:33
  • Scud_speed = (500, 4, 12:30)
  • Patriot_speed1 = (1500, 2, 12:31)
  • Patriot_speed2 = (1500, 2, 12:32)
  • All have absolute consistency, but R’s relative consistency is violated
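
The claim can be verified by working through the numbers. The arithmetic below assumes that R contains all three readings and that the validity intervals are measured in the same unit as the differences between the clock times (the slide labels the intervals in ms but gives timestamps to the minute).

```latex
\begin{align*}
\text{Scud\_speed:}     \quad & 12{:}33 - 12{:}30 = 3 \le 4 && \text{(absolutely consistent)} \\
\text{Patriot\_speed1:} \quad & 12{:}33 - 12{:}31 = 2 \le 2 && \text{(absolutely consistent)} \\
\text{Patriot\_speed2:} \quad & 12{:}33 - 12{:}32 = 1 \le 2 && \text{(absolutely consistent)} \\
\text{Relative:}        \quad & |12{:}30 - 12{:}32| = 2 > 1 = R_{rvi} && \text{(violated)}
\end{align*}
```

Each reading is individually fresh, yet the Scud reading and the second Patriot reading are too far apart to be combined.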

slide-60
SLIDE 60

Achieving Validity Intervals

  • avi:
      – R-T: realized by frequent sampling of real-world data
      – Streams: can’t do this
  • rvi:
      – R-T: when rvi is combined with avi, the smallest avi belonging to the relative consistency set will prevail
      – Streams: specified in query
      – Note: only necessary to achieve rvi of RC Set R if data is being derived from R

slide-61
SLIDE 61

Timestamps of Derived Data

  • How to assign timestamp to derived data d’?
  • One possibility: give d’ timestamp of oldest item from which derived:
      $d'_{timestamp} = \min_{d \in R} (d_{timestamp})$
  • Alternative: $d'_{timestamp}$ = some function of data from which derived
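
The "oldest item" rule is easy to apply when the derived value is an aggregate. A minimal sketch using the average-temperature example, assuming readings are (value, timestamp) pairs; `derive_average` is a hypothetical helper.

```python
def derive_average(readings):
    """Return (average value, timestamp of the oldest contributing reading)."""
    values = [value for value, _ in readings]
    oldest = min(timestamp for _, timestamp in readings)
    return sum(values) / len(values), oldest
```

Taking the minimum is the conservative choice: the derived item expires as soon as its stalest input would.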

slide-62
SLIDE 62

Another Note on Consistency

  • avi and rvi may change with system dynamics
      – Streams: if querying soldier’s heartbeat and see it stabilize, can issue query less often
      – R-T: if system notices heartbeat is steady, may increase validity interval

slide-63
SLIDE 63

Real-Time relation to Streams

  • Real-Time can have streaming queries
  • Temporal validity of data vs. window joins
  • Load shedding based on user-defined priorities (QoS); R-T sheds transactions vs. Streams shed tuples
slide-64
SLIDE 64

Conclusions

  • Parallels between Streams and Real-Time Databases:
      – Scheduling (Streams: based on QoS; R-T: based on priority and deadlines)
      – Load Shedding (Streams: tuple-based; R-T: based on ability to meet deadlines)
      – Freshness of Data (Streams: defined in query; R-T: defined in data)

slide-65
SLIDE 65

Discussion Questions

  • What are the pros and cons of building the notion of relative consistency into the DSMS itself instead of its queries?
  • Is it worthwhile to define QoS for a DSMS in terms of a ratio between queries? For example, a relative periodic query scheduling policy.

slide-66
SLIDE 66

Discussion Questions

  • Should it be possible to dynamically update the Relative Miss Ratio? What are some situations that would benefit from this? In streams?
  • Low priority queries miss their deadlines and do not run; what is the parallel to this in a DSMS?

slide-67
SLIDE 67

Real-Time Databases/Scheduling

  • General Introduction
  • Scheduling Policies
  • Resource Allocation
  • Properties of Data: consistency and validity

  • Conclusions
slide-69
SLIDE 69

Persistence of Memory (Dali, 1931)

databases are persistent. data streams, like memory, fade with time…