Efficient Detection of Empty-Result Queries Gang Luo IBM Watson - - PowerPoint PPT Presentation



SLIDE 1

Damon Sotoudeh

Efficient Detection of Empty-Result Queries

Gang Luo IBM Watson Research Centre

SLIDE 2

Agenda

• Introduction
• The detection method
• Related work
• Future work
• Conclusion

SLIDE 3

Empty-Result Queries

• Queries that return nothing
• Do not provide much information
• May take a long time to produce
• Frequently encountered:
  ○ CRM application at IBM: 18% of queries
  ○ Biomedical domain: up to 40% of queries
  ○ In interactive systems

SLIDE 4

Empty-Result Queries

• In interactive systems:
  • Users keep refining their queries
  • Only a few parameters change from one query to the next
  • Large parts of the queries are common
    ○ In the IBM CRM application, only 38% of queries are distinct

SLIDE 5

Intuition

• Remember the query parts that previously led to empty result sets
• If a new query matches those parts, it will also return an empty result
  ○ No query execution required

SLIDE 6

Detection Method

The numbers are the cardinalities of the intermediate result sets

SLIDE 7

Detection Method

Identify the lowest set with cardinality zero and the sub-tree rooted at that point
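The step above amounts to a bottom-up walk over the executed plan. Here is a minimal sketch. It assumes a hypothetical `PlanNode` class that stores the cardinality observed for each operator. A node counts as "lowest" when it produced zero rows while none of its children did:

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class PlanNode:
    """Hypothetical plan node annotated with the cardinality seen at run time."""
    op: str
    cardinality: int
    children: List["PlanNode"] = field(default_factory=list)


def lowest_empty_subtrees(node: PlanNode) -> List[PlanNode]:
    """Return the roots of the lowest sub-trees whose output set is empty.

    A node qualifies when it produced zero rows while none of its children
    did, i.e. it is the point where emptiness first appears.
    """
    found: List[PlanNode] = []
    for child in node.children:
        found.extend(lowest_empty_subtrees(child))
    if node.cardinality == 0 and all(c.cardinality > 0 for c in node.children):
        found.append(node)
    return found


plan = PlanNode("join", 0, [
    PlanNode("select A", 0, [PlanNode("scan A", 500)]),
    PlanNode("select B", 20, [PlanNode("scan B", 300)]),
])
print([n.op for n in lowest_empty_subtrees(plan)])  # ['select A']
```

The join is not reported even though it is also empty. Its emptiness is already explained by the empty selection below it.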

SLIDE 8

Detection Method

• It is easy to see that the cardinalities of all sets above this point are also zero

SLIDE 9

Detection Method

• If a new query contains this query part, it is an empty-result query
• This holds only if all the operators above the part are empty-result propagating (an empty input yields an empty output):
  ○ Selection
  ○ Projection
  ○ Join
  ○ Most other SQL operators
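The condition above can be sketched as a check on the operators between the empty sub-tree and the plan root. The operator labels and the set below are hypothetical and not exhaustive. One non-propagating example is a scalar aggregate such as COUNT(*) without GROUP BY: it still returns one row when its input is empty.

```python
from typing import Sequence

# Illustrative, not exhaustive: operators whose output is empty whenever an
# input is empty. Names are hypothetical labels for plan nodes.
EMPTY_PROPAGATING = {"select", "project", "join", "sort", "hash"}


def propagates_to_root(ops_above: Sequence[str]) -> bool:
    """True if every operator between the empty sub-tree and the plan root
    is empty-result propagating, so the whole query must return nothing."""
    return all(op in EMPTY_PROPAGATING for op in ops_above)


print(propagates_to_root(["join", "project", "sort"]))  # True
print(propagates_to_root(["join", "aggregate_count"]))  # False
```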

SLIDE 10

Simplifying query plans

• Abstractly:
  • Certain operators have no influence on the emptiness of their output
    ○ Projection
    ○ Hash
    ○ Sort, ...
  • Any join operator is simply a join
    ○ Hash join
    ○ Sort-merge join
    ○ Nested-loops join
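The two simplification rules above can be sketched as a recursive rewrite of a plan tree. The `Op` class and the operator names are hypothetical. Emptiness-neutral unary operators are spliced out, and every physical join algorithm becomes a logical "join":

```python
from dataclasses import dataclass, field
from typing import List

# Operators that never change whether their output is empty (illustrative list).
TRANSPARENT = {"project", "hash", "sort"}
# Physical join algorithms, all treated as one logical join.
JOIN_VARIANTS = {"hash_join", "sort_merge_join", "nested_loops_join"}


@dataclass
class Op:
    name: str
    children: List["Op"] = field(default_factory=list)


def simplify(node: Op) -> Op:
    """Drop emptiness-neutral unary operators and collapse join algorithms."""
    kids = [simplify(c) for c in node.children]
    if node.name in TRANSPARENT and len(kids) == 1:
        return kids[0]  # the child alone decides whether the output is empty
    name = "join" if node.name in JOIN_VARIANTS else node.name
    return Op(name, kids)


def show(node: Op) -> str:
    """Render a plan tree as nested text, e.g. join(select(scan A), ...)."""
    if not node.children:
        return node.name
    return f"{node.name}({', '.join(show(c) for c in node.children)})"


plan = Op("sort", [Op("hash_join", [
    Op("project", [Op("select", [Op("scan A")])]),
    Op("hash", [Op("select", [Op("scan B")])]),
])])
print(show(simplify(plan)))  # join(select(scan A), select(scan B))
```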

SLIDE 11

Simplifying query plans

SLIDE 12

Simplifying query plans

• The previous figure corresponds to the following query:

SLIDE 13

Further simplification

• Convert selection conditions to DNF
  ○ Disjunctive normal form: a disjunction (OR) of conjunctions (AND) of simple predicates
  ○ For example: =
  ○ Interval selections do not need to be changed
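The DNF conversion can be sketched in a few lines. The sketch assumes conditions are nested tuples of "and"/"or" over atomic predicate strings, with negations already pushed into the atoms. The example condition is made up for illustration. AND distributes over OR, so each output conjunction picks one disjunct from every operand:

```python
from itertools import product
from typing import List, Tuple, Union

# A condition is an atomic predicate (a string such as "a > 5") or a tuple
# ("and" | "or", operand, operand, ...). Negations are assumed to be pushed
# into the atomic predicates already.
Cond = Union[str, Tuple]


def to_dnf(cond: Cond) -> List[Tuple[str, ...]]:
    """Return cond as a list of conjunctions (each a tuple of predicates)."""
    if isinstance(cond, str):
        return [(cond,)]
    op, *args = cond
    parts = [to_dnf(a) for a in args]
    if op == "or":
        # OR of DNFs: concatenate the disjuncts.
        return [conj for p in parts for conj in p]
    if op == "and":
        # AND distributes over OR: pick one disjunct from each operand.
        return [sum(choice, ()) for choice in product(*parts)]
    raise ValueError(f"unknown connective {op!r}")


cond = ("and", ("or", "a = 1", "a = 2"), "b < 10")
print(to_dnf(cond))  # [('a = 1', 'b < 10'), ('a = 2', 'b < 10')]
```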

SLIDE 14

Further simplification

• After rewriting the selections in DNF, combine the individual selection terms on each relation, one term per relation, into simplified query parts
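The combination step can be sketched as a Cartesian product over the per-relation disjuncts. The input format is hypothetical. Join conditions are shared by every part and are not modeled here. With two disjuncts on each of two relations, this yields four parts. In general the count is the product of the disjunct counts, which is why generation can become exponential:

```python
from itertools import product
from typing import Dict, List, Tuple

# Hypothetical input: for each relation, its selection condition in DNF,
# i.e. a list of disjuncts, each a tuple of conjoined predicates.
Selections = Dict[str, List[Tuple[str, ...]]]


def atomic_parts(selections: Selections) -> List[Dict[str, Tuple[str, ...]]]:
    """Combine one disjunct per relation into each atomic query part."""
    relations = sorted(selections)
    return [dict(zip(relations, choice))
            for choice in product(*(selections[r] for r in relations))]


sels = {
    "A": [("a = 1",), ("a = 2",)],
    "B": [("b < 10",), ("b > 90",)],
}
parts = atomic_parts(sels)
print(len(parts))  # 4: two disjuncts on A times two on B
```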

SLIDE 15

Further simplification

• Great news:
  • The outputs of the four simplified query parts are also empty!
    ○ Proof by intuition!
• They are called atomic query parts
  ○ They cannot be simplified further
• But the number of atomic query parts can grow exponentially
  ○ Poor performance for complex queries
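A small worked case shows why the atomic parts inherit emptiness. Assume a query over two relations R and S with join condition J, and selection conditions already in DNF: C_1 ∨ C_2 on R and D_1 ∨ D_2 on S. The symbols are illustrative. Selection on a disjunction is a union of selections, and join distributes over union:

```latex
\begin{aligned}
\sigma_{C_1 \lor C_2}(R) \bowtie_J \sigma_{D_1 \lor D_2}(S)
  &= \bigl(\sigma_{C_1}(R) \cup \sigma_{C_2}(R)\bigr) \bowtie_J \bigl(\sigma_{D_1}(S) \cup \sigma_{D_2}(S)\bigr) \\
  &= \bigcup_{i=1}^{2} \bigcup_{k=1}^{2} \sigma_{C_i}(R) \bowtie_J \sigma_{D_k}(S)
\end{aligned}
```

A union is empty exactly when every member is empty. So if the query is empty, all four atomic parts are empty. Conversely, the query can be declared empty once every one of its atomic parts is known to be empty.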

SLIDE 16

Detection

• How do we detect an empty-result query Q?
  • Break Q into its atomic query parts
  • Is every atomic part of Q covered by some atomic part in the container?
    ○ If yes, then Q is an empty-result query
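The detection test can be sketched independently of how coverage is decided. In the sketch below, `covers` is a caller-supplied predicate, and `exact_match` is a deliberately trivial stand-in for it. Both names are hypothetical. Q's result is the union of its atomic parts' results, so every part must be covered before Q can be declared empty:

```python
from typing import Callable, Sequence, TypeVar

Part = TypeVar("Part")


def is_empty_result(query_parts: Sequence[Part],
                    container: Sequence[Part],
                    covers: Callable[[Part, Part], bool]) -> bool:
    """Return True only if every atomic part of Q is covered by a stored part.

    One uncovered part leaves the answer undecided, so Q must be executed.
    """
    if not query_parts:
        return False  # nothing to reason about; fall back to execution
    return all(any(covers(stored, part) for stored in container)
               for part in query_parts)


def exact_match(stored: tuple, part: tuple) -> bool:
    """Simplest possible coverage test: the two parts are identical."""
    return stored == part


container = [("A", "a = 1"), ("A", "a = 2")]
print(is_empty_result([("A", "a = 1"), ("A", "a = 2")], container, exact_match))  # True
print(is_empty_result([("A", "a = 1"), ("A", "a = 3")], container, exact_match))  # False
```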

SLIDE 17

Coverage

• A selection condition X covers a selection condition Y if and only if Y implies X, that is, whenever Y is true, X is true as well.
• In other words, whenever X is false, Y is false too.
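For the simplest case, the definition above becomes an interval-containment test. The case is a range predicate on a single attribute. The sketch assumes closed, non-empty intervals and a hypothetical `Interval` type. Y implies X exactly when Y's interval lies inside X's interval:

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """Hypothetical selection condition lo <= attr <= hi on one attribute."""
    attr: str
    lo: float = -math.inf
    hi: float = math.inf


def covers(x: Interval, y: Interval) -> bool:
    """X covers Y iff Y implies X: every value satisfying Y also satisfies X."""
    return x.attr == y.attr and x.lo <= y.lo and y.hi <= x.hi


x = Interval("price", 10, 100)  # 10 <= price <= 100
y = Interval("price", 20, 50)   # 20 <= price <= 50
print(covers(x, y), covers(y, x))  # True False
```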

SLIDE 18

Coverage

• The notion of coverage expands the detection possibilities
• But deciding coverage in general takes exponential time
  ○ The paper uses a restricted form of coverage detection
  ○ Trade-off between efficiency and detection power
• If an atomic query part known to return an empty result covers an atomic part of query Q, that part of Q also returns an empty result; if every atomic part of Q is covered in this way, Q definitely returns an empty result
• But such a match is not guaranteed to be found, even when Q is in fact empty
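One plausible restricted check for atomic parts is sketched below. It is not necessarily the paper's exact rule. The `AtomicPart` layout is hypothetical: relations, join predicates as strings, and per-relation range predicates. A stored empty part P covers Q's part when three conditions hold:

* P's relations are a subset of the part's relations.
* P's join predicates are a subset of the part's join predicates.
* Every range that P places on an attribute contains the range the part places on that attribute.

Under these conditions the corresponding sub-expression of the part is contained in P's empty result.

```python
import math
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, Tuple

Range = Tuple[float, float]  # closed interval [lo, hi]


@dataclass
class AtomicPart:
    """Hypothetical atomic query part: relations, join predicates, and
    per-relation conjunctions of range predicates (attr -> interval)."""
    relations: FrozenSet[str]
    joins: FrozenSet[str] = frozenset()
    ranges: Dict[str, Dict[str, Range]] = field(default_factory=dict)


def covers(p: AtomicPart, q: AtomicPart) -> bool:
    """Restricted check: if p is known to be empty and covers q, q is empty."""
    if not (p.relations <= q.relations and p.joins <= q.joins):
        return False
    for rel, p_conds in p.ranges.items():
        q_conds = q.ranges.get(rel, {})
        for attr, (p_lo, p_hi) in p_conds.items():
            q_lo, q_hi = q_conds.get(attr, (-math.inf, math.inf))
            if not (p_lo <= q_lo and q_hi <= p_hi):
                return False  # q's predicate on attr does not imply p's
    return True


stored = AtomicPart(frozenset({"A", "B"}), frozenset({"A.k = B.k"}),
                    {"A": {"x": (0, 100)}})
query = AtomicPart(frozenset({"A", "B", "C"}),
                   frozenset({"A.k = B.k", "B.m = C.m"}),
                   {"A": {"x": (10, 20)}, "C": {"y": (5, 5)}})
print(covers(stored, query))  # True
```

The check is restricted on purpose. It never reasons about implications across attributes or across relations, which keeps the check cheap but misses some matches.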

SLIDE 19

Atomic query container

• Is stored entirely in memory
  ○ For fast access
• Has a fixed size M, but M can be fairly large
  ○ Trade-off between efficiency and detection coverage
• Once the container is full, keep only the most useful atomic parts and evict the rest with a replacement policy
  ○ E.g., least recently used (LRU)
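A fixed-size container with LRU replacement can be sketched with `collections.OrderedDict`. The class name and its methods are hypothetical. Atomic parts are assumed to be hashable keys, and `capacity` plays the role of M:

```python
from collections import OrderedDict
from collections.abc import Hashable


class AtomicPartContainer:
    """In-memory container of at most M empty-result atomic parts (LRU)."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity  # M
        self._parts: "OrderedDict[Hashable, None]" = OrderedDict()

    def add(self, part: Hashable) -> None:
        """Store a newly found empty-result part, evicting the LRU one if full."""
        self._parts[part] = None
        self._parts.move_to_end(part)         # newest at the end
        if len(self._parts) > self.capacity:
            self._parts.popitem(last=False)   # evict least recently used

    def touch(self, part: Hashable) -> bool:
        """Record a hit (e.g. the part covered a query); return whether present."""
        if part in self._parts:
            self._parts.move_to_end(part)
            return True
        return False

    def __contains__(self, part: Hashable) -> bool:
        return part in self._parts


c = AtomicPartContainer(capacity=2)
c.add("p1")
c.add("p2")
c.touch("p1")  # p1 becomes the most recently used part
c.add("p3")    # container is over capacity, so p2 is evicted
print("p1" in c, "p2" in c, "p3" in c)  # True False True
```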

SLIDE 20

Atomic query container

• To avoid scanning the whole container:
  ○ Index the container by the relations that each atomic part involves
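Below is one possible indexing scheme, sketched as a dictionary keyed by the set of relations each stored part involves. The `RelationIndex` class and the relation names are hypothetical. Only parts over a subset of a query part's relations can cover it, so lookup visits those keys instead of every stored part. Enumerating the subsets is exponential in the number of relations in the query part, which is usually small:

```python
from collections import defaultdict
from itertools import combinations
from typing import DefaultDict, FrozenSet, Iterable, List


class RelationIndex:
    """Hypothetical index: stored atomic parts grouped by their relations."""

    def __init__(self) -> None:
        self._by_relations: DefaultDict[FrozenSet[str], List[str]] = defaultdict(list)

    def add(self, relations: Iterable[str], part: str) -> None:
        self._by_relations[frozenset(relations)].append(part)

    def candidates(self, query_relations: Iterable[str]) -> List[str]:
        """Parts whose relation set is a subset of the query part's relations;
        only these can cover it, so nothing else needs to be examined."""
        rels = sorted(set(query_relations))
        found: List[str] = []
        for k in range(1, len(rels) + 1):
            for subset in combinations(rels, k):
                found.extend(self._by_relations.get(frozenset(subset), []))
        return found


idx = RelationIndex()
idx.add(["A", "B"], "P1")
idx.add(["C"], "P2")
idx.add(["C", "D"], "P3")
print(idx.candidates(["A", "B", "C"]))  # ['P2', 'P1']
```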

SLIDE 21

Experiments

• Based on two queries:
  ○ Q1: Find the information about certain parts that were sold on certain days
  ○ Q2: Find the information about certain parts that were sold to certain customers on certain days

SLIDE 22

Experiments

• The checking overhead is negligible compared to the query execution time

[Figure: execution time or checking overhead (seconds, log scale from 0.001 to 1000) vs. database size (1, 2, 3 GB); series: execute Q1, check Q1, execute Q2, check Q2]

SLIDE 23

Experiments

• The overhead of our method increases with both query complexity and the number of atomic query parts stored in the container C
• When the check fails, the overhead of our method is higher than when the check succeeds

SLIDE 24

Related Work

• Two general approaches:
  1. Find what leads to empty results
     ○ Time consuming
     ○ A lot of possibilities
  2. Automatically generalize the query to obtain some answers
     ○ Domain specific
     ○ Restricted forms of queries
• No best approach

SLIDE 25

Open issues

• How to handle updates?
• Extension beyond empty-result propagating operators
• A method that combines the advantages of all current solutions:
  ○ Not restrictive
  ○ Efficient

SLIDE 26

Conclusion

• An efficient method for detecting empty-result queries
• High detection rate once the container is well filled
• Low overhead compared to actual execution of the query
• Small storage requirements
• Well suited to interactive systems
• Reflects the existence of hotspots

SLIDE 27

Thanks for listening! Questions?