
NiagaraCQ: A Scalable Continuous Query System for Internet Databases

Jianjun Chen David J. DeWitt Feng Tian Yuan Wang Computer Sciences Department University of Wisconsin-Madison

Outline

Motivation What is NiagaraCQ ? Details Performance Conclusion

Motivation

Continuous queries are increasingly popular. Why?

They allow users to receive new results as they become available, without having to issue the same query repeatedly.

They are especially useful in an environment like the Internet, which comprises large amounts of frequently changing information.

Challenges:

Need to be able to support millions of queries due to the scale of the Internet.

No existing systems have achieved this level of scalability.

What’s NiagaraCQ?

The continuous query subsystem of Niagara, a distributed database system for querying distributed XML data sets using a query language like XML-QL.

Supports scalable continuous query processing over multiple, distributed XML files.

NiagaraCQ Novelty and Approaches

Grouping. Incremental group optimization strategy with dynamic re-grouping.

Query-split scheme.

Support for both change-based and timer-based queries in a uniform way.

To ensure scalability, more is needed:

Incremental evaluation of continuous queries.

Use of both pull and push models for detecting heterogeneous data source changes.

Memory caching.

NiagaraCQ Command Language

CREATE CQ_name
    XML-QL query
    DO action
    {START start_time} {EVERY time_interval} {EXPIRE expiration_time}

Delete CQ_name


Incremental group optimization

Groups are created for existing queries according to their “signatures”, which represent similar structures among the queries.

Groups allow the “common parts” of two or more queries to be shared.

Each individual query in a query group shares the results from the execution of the “group plan”.

A new query is merged into an existing group whose signature matches that of the query.

Expression Signature

  • Represents the same syntax structure, but possibly different constant values, in different queries.

  • Expression signatures allow queries with the same syntactic structure to be grouped together to share computation.

XML-QL examples (Fig. 3.1) Expression Signature (Fig. 3.2)
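The idea of an expression signature can be illustrated with a minimal Python sketch. It assumes a simplified predicate syntax (attribute, operator, quoted constant) rather than real XML-QL, and the helper name `expression_signature` and the symbols "INTC" and "AOL" are illustrative choices, not taken from the NiagaraCQ implementation.

```python
import re

# Assumption: a selection predicate has the form  attribute op "constant".
_PREDICATE = re.compile(
    r'^\s*(?P<attr>[\w.]+)\s*(?P<op>=|<|>)\s*"(?P<const>[^"]*)"\s*$'
)


def expression_signature(predicate):
    """Split a predicate into (signature, constant).

    The signature keeps the syntactic structure and replaces the
    constant with a placeholder, so predicates that differ only in
    their constant value map to the same signature.
    """
    match = _PREDICATE.match(predicate)
    if match is None:
        raise ValueError(f"unsupported predicate: {predicate!r}")
    signature = f"{match['attr']} {match['op']} <constant>"
    return signature, match["const"]


sig_a, const_a = expression_signature('Symbol = "INTC"')
sig_b, const_b = expression_signature('Symbol = "AOL"')
print(sig_a == sig_b, const_a, const_b)
```

Both predicates yield the same signature and differ only in the extracted constant, which is what allows the two queries to fall into one group.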

Group

Groups are created for queries based on their expression signatures. A group consists of 3 parts:

Group signature: the common expression signature of all queries in the group.

Group constant table: contains the signature constants of all queries in the group.

Group plan: the query plan shared by all queries in the group. It is derived from the common part of all single query plans in the group.

The Split operator

The result of the shared computation contains results for all the queries in the group. How can these results be filtered and sent to the correct destination operator for further processing?

A Split operator is combined with a Join operator, which joins on the constant values stored in the constant table to perform the filtering.

The Split operator distributes each result tuple of the Join operator to its correct destination based on the destination buffer name in the tuple (obtained from the constant table).
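A rough Python sketch of the Join plus Split combination follows. It models the constant table as a dictionary from constant to destination buffer name and the input as a list of dictionaries. The function name `join_and_split`, the buffer names, and the sample tuples are hypothetical.

```python
from collections import defaultdict


def join_and_split(tuples, constant_table, key):
    """Join tuples with the constant table on `key`, then route each
    surviving tuple to the buffer named in the matching table entry."""
    buffers = defaultdict(list)
    for row in tuples:
        # Join step: only tuples whose key value appears in the constant
        # table survive, and they pick up a destination buffer name.
        destination = constant_table.get(row[key])
        if destination is not None:
            # Split step: distribute by destination buffer name.
            buffers[destination].append(row)
    return dict(buffers)


# Hypothetical constant table: signature constant -> destination buffer.
constant_table = {"INTC": "dest_0", "AOL": "dest_1"}
quotes = [
    {"Symbol": "INTC", "Price": 30},
    {"Symbol": "AOL", "Price": 55},
    {"Symbol": "IBM", "Price": 100},
]
routed = join_and_split(quotes, constant_table, "Symbol")
```

The "IBM" tuple matches no constant, so it is filtered out by the join and never reaches a buffer.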

Incremental Grouping Algorithm

  • When a new query is submitted:

1. The group optimizer traverses its query plan bottom up and tries to match its expression signature with the signatures of existing groups.

2. The group optimizer breaks the new query plan into two parts.

3. The lower part of the query is removed. The upper part of the query is added onto the group plan.

4. If the constant table does not have an entry “AOL”, it will be added and a new destination buffer allocated.

  • If no match is found, a new group will be generated for this signature and added to the group table.
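The bookkeeping side of these steps can be sketched in Python as below. This is only an approximation: it keeps a group table keyed by signature and a constant table per group, and it omits the plan splitting of steps 2 and 3. The name `install_query` and the buffer naming scheme are assumptions made for illustration.

```python
def install_query(group_table, signature, constant):
    """Merge a new query into the group with a matching signature,
    creating the group first when no match exists.

    Returns the destination buffer assigned to the query's constant.
    """
    group = group_table.get(signature)
    if group is None:
        # No match: generate a new group and add it to the group table.
        group = {"id": len(group_table), "constants": {}}
        group_table[signature] = group
    constants = group["constants"]
    if constant not in constants:
        # New constant: add an entry and allocate a destination buffer.
        constants[constant] = f"g{group['id']}_dest_{len(constants)}"
    return constants[constant]


group_table = {}
b1 = install_query(group_table, "Symbol = <constant>", "INTC")
b2 = install_query(group_table, "Symbol = <constant>", "AOL")
b3 = install_query(group_table, "Symbol = <constant>", "AOL")
b4 = install_query(group_table, "Price > <constant>", "100")
```

The second "AOL" query reuses the existing buffer, while the query with a different signature starts a second group. Existing groups are never rebuilt, which is the incremental part of the approach.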


Discussion (1)

  • 1. Provide arguments justifying whether NiagaraCQ is a better application for XML (groups 1 and 3) or for relational data (groups 2 and 4). Why?

  • 2. To support your answer, provide some examples (applications) where this kind of size/scalability is needed.

Buffer Design

A destination buffer is needed for the split operator. Two designs:

Pipelined scheme

Intermediate files

[Figure: Split operator, buffers, and downstream operator]

Pipeline approach

Tuples are pipelined from the output of one operator into the input of the next operator.

Doesn’t work for grouping timer-based CQs: it is difficult for a split operator to determine which tuples should be stored and how long they should be stored for.

The query structure is a directed graph, not a tree, and hence the plan may be too complicated for a general XML-QL query engine to execute.

The combined plan may be very large and require resources beyond the limits of the system.

A large portion of the query plan may not need to be executed at each query invocation.

One query may block many other queries.

Materialized Intermediate Files

Advantages

Intermediate files and data sources are monitored uniformly.

Each query is scheduled independently.

The potential bottleneck problem of the pipelined approach is avoided.

Disadvantages

Extra disk I/Os.

The Split operator becomes a blocking operator.

Timer-based Continuous Queries

Grouped in the same way as change-based queries, except that the time information needs to be recorded at installation time.

Challenges

Hard to monitor the timer events of those queries.

Sharing the common computation becomes difficult due to the various time intervals.

Timer-based continuous queries fire at specific times, but only if the corresponding input files have been modified.
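The firing rule for timer-based queries can be written as a small predicate. The sketch below uses plain numbers as timestamps, and the function and parameter names are hypothetical.

```python
def should_fire(now, next_fire_time, last_fire_time, input_mtimes):
    """Return True when a timer-based query should run.

    The query fires only if its timer is due AND at least one of its
    input files was modified after the query last fired.
    """
    timer_due = now >= next_fire_time
    input_changed = any(mtime > last_fire_time for mtime in input_mtimes)
    return timer_due and input_changed


due_and_changed = should_fire(100, 90, 50, [60])
due_but_unchanged = should_fire(100, 90, 50, [40])
changed_but_not_due = should_fire(80, 90, 50, [60])
```

Only the first call returns True. In the other two cases either the inputs are unchanged or the timer has not yet expired.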


Incremental Evaluation

Incremental evaluation allows queries to be invoked only on the changed data.

For each file on which CQs are defined, NiagaraCQ keeps a “delta file” that contains recent changes.

Queries are run over the delta files whenever possible instead of their original files.

A time stamp is added to each tuple in the delta file.

NiagaraCQ fetches only tuples that were added to the delta file since the query’s last firing time.
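As a minimal illustration of the timestamp filter, the Python sketch below models a delta file as a list of (timestamp, tuple) pairs. The representation and the name `fetch_delta` are assumptions and do not reflect the actual on-disk format.

```python
def fetch_delta(delta_file, last_fire_time):
    """Return only the tuples added after the query's last firing time."""
    return [row for timestamp, row in delta_file if timestamp > last_fire_time]


# Hypothetical delta file: each change carries the time it was appended.
delta_file = [
    (10, {"Symbol": "INTC", "Price": 30}),
    (20, {"Symbol": "INTC", "Price": 31}),
    (30, {"Symbol": "AOL", "Price": 55}),
]
new_rows = fetch_delta(delta_file, last_fire_time=10)
```

A query that last fired at time 10 sees only the two later changes, so it never reprocesses data it has already handled.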

Memory Caching

Caching is used to obtain good performance with a limited amount of memory.

Caches query plans, system data structures, and data files for better performance.

What should be cached?

Grouped query plans, assuming that the number of query groups is relatively small.

Recently accessed files.

The event list for monitoring the timer-based events. Because this list can be large, only a “time window” of it is kept.
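One possible way to keep only a time window of the event list in memory is sketched below in Python. The class `EventWindow`, its fields, and the use of a heap are illustrative assumptions. The slides do not say how NiagaraCQ implements the window.

```python
import heapq


class EventWindow:
    """Keep only timer events that fall inside a time window in memory."""

    def __init__(self, all_events, window):
        self.stored = sorted(all_events)  # stand-in for the full event list
        self.window = window
        self.cached = []                  # min-heap of (fire_time, query_id)

    def load(self, now):
        """Move events due within `window` of `now` into the cache."""
        horizon = now + self.window
        remaining = []
        for event in self.stored:
            if event[0] <= horizon:
                heapq.heappush(self.cached, event)
            else:
                remaining.append(event)
        self.stored = remaining

    def due(self, now):
        """Pop and return the ids of cached events whose time has come."""
        fired = []
        while self.cached and self.cached[0][0] <= now:
            fired.append(heapq.heappop(self.cached)[1])
        return fired


events = EventWindow([(5, "q1"), (15, "q2"), (40, "q3")], window=20)
events.load(now=0)            # caches q1 and q2; q3 is outside the window
first = events.due(now=10)
events.load(now=25)           # the window has advanced far enough for q3
second = events.due(now=40)
```

Memory use is bounded by the number of events inside the window instead of the total number of installed timer-based queries.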

Some performance comparisons

Conclusion

  • Incremental grouping methodology makes group optimization more scalable.

  • A query-split scheme requires minimal changes to a general-purpose query engine. In this model, both timer-based and change-based continuous queries can be grouped together for event detection and group execution.

  • Incremental evaluation of continuous queries, use of both pull and push models for detecting heterogeneous data source changes, and a caching mechanism further improve scalability.

Discussion (2)

Q: Another similar (in terms of continuous information retrieval) and very popular technology is RSS, which also uses XML. Identify what types of applications are better off with RSS, and in which scenarios you would use a system like NiagaraCQ.

Optional (if time permits): This paper has some conceptual / functional similarities with other systems, e.g. the use of a time concept and the integration of information from various sources. Compare and contrast these, and discuss the challenges for this kind of system.