SLIDE 1 Damon Sotoudeh
Efficient Detection of Empty-Result Queries
Gang Luo, IBM Watson Research Centre
SLIDE 2
Agenda
Introduction
The detection method
Related work
Future work
Conclusion
SLIDE 3 Empty-Result Queries
Queries that return nothing
Do not provide much information
May take a long time to produce
Frequently encountered:
○ CRM (at IBM): 18%
○ Biomedical domain: up to 40%
○ In interactive systems
SLIDE 4 Empty-Result Queries
In interactive systems
Users keep refining their queries
Only a few parameters change between refinements
Many query parts are shared
○ In the IBM CRM application, only 38% of queries are distinct
SLIDE 5
Intuition
Remember query parts that previously led to empty result sets
If a new query matches those parts, it will return an empty result
No query execution is required
SLIDE 6
Detection Method
In the query plan shown, the numbers are the cardinalities of the intermediate result sets
SLIDE 7
Detection Method
Identify the lowest node whose output cardinality is zero, and the sub-tree rooted at that node
SLIDE 8
Detection Method
It is easy to see that all set cardinalities above this point are zero
SLIDE 9 Detection Method
If a new query contains this query part, it is an empty-result query
But only if all the operators above it are empty-result propagating
○ Selection
○ Projection
○ Join
○ And most other SQL operators
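As a minimal sketch of slides 7 to 9, assuming a plan tree whose nodes record the observed output cardinality after execution, the code below finds the lowest empty sub-trees and checks whether every operator above a given sub-tree is empty-result propagating. `PlanNode`, the operator names, and the `EMPTY_PROPAGATING` set are hypothetical choices for illustration, not the paper's data structures.

```python
from dataclasses import dataclass, field

# Hypothetical set of operators treated as empty-result propagating:
# if an input of the operator is empty, its output is empty too.
EMPTY_PROPAGATING = {"select", "project", "join", "sort", "hash"}


@dataclass(eq=False)  # compare nodes by identity
class PlanNode:
    op: str                  # e.g. "scan", "select", "join", "count"
    cardinality: int         # observed output size after execution
    children: list = field(default_factory=list)


def lowest_empty_parts(node):
    """Roots of the lowest empty sub-trees: nodes with cardinality 0
    whose children (if any) are all non-empty."""
    found = []
    for child in node.children:
        found.extend(lowest_empty_parts(child))
    if node.cardinality == 0 and all(c.cardinality > 0 for c in node.children):
        found.append(node)
    return found


def contains(node, target):
    return node is target or any(contains(c, target) for c in node.children)


def all_propagating_above(root, target):
    """True if every operator strictly above `target` on the path from
    `root` is empty-result propagating."""
    if root is target:
        return True
    for child in root.children:
        if contains(child, target):
            return root.op in EMPTY_PROPAGATING and all_propagating_above(child, target)
    return False  # target is not in this tree
```

A remembered sub-tree is only useful for a new query when `all_propagating_above` holds at the matching position in the new plan. An aggregate such as `COUNT(*)` without grouping returns one row even on empty input, which is why `"count"` is left out of the set in this sketch.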
SLIDE 10 Simplifying query plans
Abstract away physical details
Certain operators do not affect whether the output is empty
○ Projection
○ Hash
○ Sort, ...
Every physical join operator is treated as a logical join
○ Hash join
○ Sort-merge join
○ Nested-loops join
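The abstraction on this slide can be sketched as a tree rewrite. The sketch assumes a hypothetical `Node` type and hypothetical operator names (`hash_join`, `sort_merge_join`, and so on); it drops single-input operators that cannot change emptiness and renames every physical join to a logical `join`.

```python
from dataclasses import dataclass, field

# Single-input operators assumed not to affect whether the output is empty.
TRANSPARENT = {"project", "hash", "sort"}
# Physical join variants that all collapse to a logical "join".
JOIN_VARIANTS = {"hash_join", "sort_merge_join", "nested_loops_join"}


@dataclass
class Node:
    op: str
    children: list = field(default_factory=list)
    detail: str = ""         # e.g. a table name or a selection predicate


def simplify(node):
    """Return a simplified copy of the plan rooted at `node`."""
    kids = [simplify(c) for c in node.children]
    if node.op in TRANSPARENT and len(kids) == 1:
        return kids[0]       # drop the operator, keep its input
    op = "join" if node.op in JOIN_VARIANTS else node.op
    return Node(op, kids, node.detail)


def show(node):
    """Compact text rendering of a plan, for inspection."""
    if not node.children:
        return f"{node.op}({node.detail})"
    inner = ", ".join(show(c) for c in node.children)
    return f"{node.op}[{node.detail}]({inner})" if node.detail else f"{node.op}({inner})"
```

For example, a `project` over a `hash_join` of a sorted scan of A and a selection on B simplifies to `join(scan(A), select[B.x > 5](scan(B)))`.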
SLIDE 11
Simplifying query plans
SLIDE 12
Simplifying query plans
The previous figure corresponds to the following query:
SLIDE 13
Further simplification
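Convert selection conditions to disjunctive normal form (DNF): an OR of ANDs of simple predicates
○ For example: (A ∨ B) ∧ C = (A ∧ C) ∨ (B ∧ C)
Interval selections do not need to be changed (an interval is already a single conjunction)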
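A small sketch of DNF conversion, assuming conditions are given as nested tuples `("and", ...)` / `("or", ...)` over atomic predicate strings, with negations already pushed into the atoms. The function name and the representation are illustrative assumptions. Distributing AND over OR is what can make the result exponentially large.

```python
from itertools import product


def to_dnf(cond):
    """Return the DNF of `cond` as a list of conjunctions (tuples of atoms).

    `cond` is either an atom (a string such as "A.x = 1") or a tuple
    ("and", c1, c2, ...) / ("or", c1, c2, ...). NOT is not handled here.
    """
    if isinstance(cond, str):
        return [(cond,)]
    op, *args = cond
    sub = [to_dnf(a) for a in args]
    if op == "or":
        # OR of DNFs: concatenate all disjuncts.
        return [conj for dnf in sub for conj in dnf]
    if op == "and":
        # AND of DNFs: distribute, picking one disjunct from each operand.
        return [sum(choice, ()) for choice in product(*sub)]
    raise ValueError(f"unknown operator {op!r}")
```

For instance, `to_dnf(("and", ("or", "A", "B"), "C"))` yields `[("A", "C"), ("B", "C")]`, matching the identity on the slide, while an interval atom such as `"10 < x < 20"` comes back unchanged as a single conjunction.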
SLIDE 14
Further simplification
After rewriting the selections in DNF, pick one disjunct from each relation's selection and combine them
○ Each combination gives one simplified query part
SLIDE 15 Further simplification
Great news:
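The output of each of the four simplified query parts is also empty!
○ Reason: the original output is the union of their outputs, so if it is empty, so is each of them
They are called atomic query parts
○ They cannot be simplified further
But their number can grow exponentially with the number of disjuncts
○ Poor performance for complex queries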
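Assuming each relation's selection is already in DNF and the join predicates are left unchanged, generating the atomic parts can be sketched as a Cartesian product over the relations' disjuncts. `atomic_parts` and `atomic_part_count` are hypothetical names. The count makes the exponential blow-up concrete: k relations with d disjuncts each give d^k atomic parts.

```python
from itertools import product
from math import prod


def atomic_parts(relations, selections):
    """relations: names of the relations joined in the query part.
    selections: relation -> list of the DNF disjuncts of its selection.
    A relation with no selection contributes the single disjunct "TRUE".
    Returns one atomic part (relation -> disjunct) per combination."""
    per_relation = [selections.get(r, ["TRUE"]) for r in relations]
    return [dict(zip(relations, combo)) for combo in product(*per_relation)]


def atomic_part_count(relations, selections):
    # Size of the Cartesian product: grows exponentially with the
    # number of relations that have disjunctive selections.
    return prod(len(selections.get(r, ["TRUE"])) for r in relations)
```

With two relations whose selections each have two disjuncts, this produces four atomic parts, as in the slide's example.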
SLIDE 16 Detection
How to detect an empty-result query Q?
Break Q into its atomic parts
Is every atomic part of Q covered by some atomic part stored in the container?
○ If yes, then Q is an empty-result query
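The detection test can be sketched generically. Because the DNF split makes Q's output the union of the outputs of its atomic parts, Q is known to be empty only when every atomic part is covered. The `covers` predicate is passed in as a parameter, which is a hypothetical interface chosen for illustration; a `False` answer means "not detected", not "non-empty".

```python
from typing import Callable, Iterable, TypeVar

Part = TypeVar("Part")


def is_empty_result(query_parts: Iterable[Part],
                    container: Iterable[Part],
                    covers: Callable[[Part, Part], bool]) -> bool:
    """True if every atomic part of Q is covered by some stored
    empty-result atomic part. False means "not detected", not "non-empty"."""
    stored = list(container)
    parts = list(query_parts)
    # Q's output is the union of its atomic parts' outputs, so one
    # uncovered part is enough to make the answer unknown.
    return bool(parts) and all(any(covers(s, p) for s in stored) for p in parts)
```

With exact matching as the `covers` predicate, a query with atomic parts for `A.x=1` and `A.x=2` is only flagged once both parts are in the container.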
SLIDE 17
Coverage
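A selection condition X covers selection condition Y if and only if Y implies X: whenever Y is true, X is true.
In other words, if X is false, then so is Y.
So if the query part with condition X returns no rows, the matching part with condition Y returns none either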
SLIDE 18 Coverage
The notion of coverage extends detection beyond exact matches
But deciding coverage in general is exponential
○ The paper uses a restricted form of coverage detection
○ Trade-off between efficiency and detection power
If every atomic part of query Q is covered by some empty-result atomic query part, then Q definitely generates an empty result
But such a match may not be found even when Q is empty (the check can miss empty-result queries)
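One possible restricted coverage test, assumed here purely for illustration and not necessarily the restriction the paper uses: atomic parts are conjunctions of closed intervals on individual attributes, and parts over shared relations use the same join predicates. Under those assumptions, X covers Y when X's relations are a subset of Y's and each of X's intervals contains Y's interval on the same attribute. `AtomicPart` and `covers` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AtomicPart:
    relations: frozenset   # relations joined in this part
    ranges: tuple = ()     # (attribute, low, high) triples: closed intervals, ANDed

    def range_map(self):
        return {attr: (lo, hi) for attr, lo, hi in self.ranges}


def covers(x: AtomicPart, y: AtomicPart) -> bool:
    """Restricted, conservative test of whether x covers y (y implies x).
    May answer False when y does imply x, but answers True only when it does."""
    # Every relation of x must also be joined in y.
    if not x.relations <= y.relations:
        return False
    y_ranges = y.range_map()
    for attr, (lo, hi) in x.range_map().items():
        if attr not in y_ranges:
            return False      # x constrains attr but y does not
        y_lo, y_hi = y_ranges[attr]
        if y_lo < lo or y_hi > hi:
            return False      # y's interval is not inside x's
    return True
```

The test runs in time linear in the number of constrained attributes, which illustrates the efficiency-versus-detection trade-off: richer conditions (disjunctions, arithmetic) would be missed rather than decided expensively.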
SLIDE 19 Atomic query container
Is fully stored in memory
○ For fast access
Is of fixed size M, but M can be fairly large
○ Trade-off between efficiency and coverage
Once the container is full, keep only the atomic parts most likely to be reused
○ E.g., evict with the least recently used (LRU) policy
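A minimal sketch of a fixed-size container with LRU eviction, built on Python's `collections.OrderedDict`. The class name `AtomicPartContainer` and its methods are hypothetical, and atomic parts are assumed to be hashable.

```python
from collections import OrderedDict


class AtomicPartContainer:
    """In-memory container C holding at most M empty-result atomic parts.
    When full, the least recently used part is evicted (LRU)."""

    def __init__(self, capacity_m: int):
        if capacity_m <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity_m
        self._parts = OrderedDict()   # atomic part -> None, oldest first

    def add(self, part):
        """Store a newly found empty-result atomic part."""
        if part in self._parts:
            self._parts.move_to_end(part)
            return
        if len(self._parts) >= self.capacity:
            self._parts.popitem(last=False)   # evict least recently used
        self._parts[part] = None

    def touch(self, part):
        """Mark a stored part as used, e.g. after it covered a query part."""
        if part in self._parts:
            self._parts.move_to_end(part)

    def __contains__(self, part):
        # Membership tests deliberately do not change recency.
        return part in self._parts

    def __len__(self):
        return len(self._parts)
```

Recency is updated only through `add` and `touch`, so the caller decides what counts as a "use" of a stored part.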
SLIDE 20
Atomic query container
To avoid scanning the whole container
○ Index the container by the relations involved
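The slide does not say how the index is organized. One possibility, assumed here, keys stored parts by their relation set: since a stored part can only cover a query part that joins a superset of its relations, only the buckets for subsets of the query's relations need probing. `RelationIndex` is a hypothetical name.

```python
from collections import defaultdict
from itertools import combinations


class RelationIndex:
    """Hypothetical index over container C: relation set -> stored atomic parts."""

    def __init__(self):
        self._buckets = defaultdict(list)

    def add(self, relations, part):
        self._buckets[frozenset(relations)].append(part)

    def candidates(self, query_relations):
        """Yield stored parts whose relation set is a subset of query_relations."""
        rels = sorted(query_relations)
        for k in range(1, len(rels) + 1):
            for subset in combinations(rels, k):
                # .get avoids creating empty buckets in the defaultdict
                yield from self._buckets.get(frozenset(subset), [])
```

Enumerating subsets is exponential in the number of relations of the query part, which is acceptable only because such parts typically join few relations; iterating over the buckets and testing for a subset would be the alternative when the container holds few distinct relation sets.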
SLIDE 21 Experiments
Based on two queries
Q1: Find the information about certain parts that were sold on certain days
Q2: Find the information about certain parts that were sold to certain customers on certain days
SLIDE 22 Experiments
The checking overhead is negligible compared to query execution time
[Figure: execution time or checking overhead (seconds, log scale from 0.001 to 1000) vs. database size (1, 2, 3 GB); series: execute Q1, check Q1, execute Q2, check Q2]
SLIDE 23 Experiments
The overhead of the method increases with both query complexity and the number of atomic query parts stored in the container C
When the check fails (no covering part is found), the overhead is higher than when the check succeeds
SLIDE 24 Related Work
Two general approaches
1. Find what leads to empty results
○ Time consuming
○ A lot of possibilities
2. Automatically generalize the query to obtain some answers
○ Domain specific
○ Restricted forms of queries
No best approach
SLIDE 25 Open issues
How to handle database updates?
○ An insertion can make a stored empty-result atomic part non-empty
Extension beyond empty-result-propagating operators
A method that combines the advantages of all current solutions
○ Not restrictive
○ Efficient
SLIDE 26 Conclusion
An efficient method for detecting empty-result queries
High detection rate once the container is well populated
Low overhead compared to actual query execution
Small storage requirements
Well suited to interactive query refinement
Query hotspots are reflected in the container contents
SLIDE 27
Thanks for listening! Questions?