Query Processing over Incomplete Autonomous Databases



slide-1
SLIDE 1
  • Query Processing over Incomplete

Autonomous Databases

  • !

"# $! %%%&

slide-2
SLIDE 2
  • Introduction

''%(%)% )&&'%% ''%

– *+(+!+,-+,(,*+

slide-3
SLIDE 3
  • !

Incompleteness in Web Databases

  • "#$
  • %&'('#$
  • )*+),-

,.+./- *- 0,)). 1.2 3 00+,- 00+- ),+4- .10/4 *4 5+ ,+*- .+/- ..+/- 10*1 *. 6'+

  • '#-
  • 66

7( 8

slide-4
SLIDE 4
  • Problem
  • !'%9,

)./+

Want a ‘Honda Accord’ with a ‘sedan’ body style for under ‘$12,000’

High Precision, Low Recall

"9 :'($;

'

  • 01,222

3111 ' ' 033,422 032,522

  • 5

<: <' = ' ' ' 4223 ' ' ' 4224 ' '

  • 01,222

3111 ' ' 033,422 032,522

  • 5

<: <' = ' ' ' 4223 ' ' ' 4224 ' '

  • ;

3111 ' ' 033,422 032,522

  • 5

<: <' = ' ' ' 4223 ; ; ' 4224 ' '

  • ;

3111 ' ' 033,422 032,522

  • 5

<: <' = ' ' ' 4223 ; ; ' 4224 '

Many entities corresponding to tuples with missing values might be relevant to the user query

slide-5
SLIDE 5
  • Low Precision,

Infeasible

Possible Naïve Approaches

>?'#@5A >?'#@5A Low Recall Costly, Infeasible

3+5!6BCB=> -) ''%, ('6! 4+!6%!B> ,)( '6! ()) (%' 7+!BD> -)( '6!+'', )(%' % &'(('

slide-6
SLIDE 6
  • Outline

56$E 8&9/ :&;* !;#

slide-7
SLIDE 7
  • The QPIAD Solution

Select Top K Rewritten Queries <3=>'6? <4=>'6@? <7=>'6.

3E>?'@5 A

' <: <' = ' 3 ' ? 4223 ! 4

  • @?

4224 ! 7 8 . 4225 ! ?

  • @?

4227 B% 5 ' ! 422? B% A 9 ! 4224 ' B ' ? 422A B% ' <: <' = ' 3 ' ? 4223 ! 4

  • @?

4224 ! 7 8 . 4225 !

Base Result Set

AFD: Model~> Body style

' <: <' = ' 5(' ?

  • @?

4227 B% 2+B B ' ? 422A B% 2+7 Ranked Relevant Uncertain Answers

  • LEARN → REWRITE → RANK → EXPLAIN

slide-8
SLIDE 8
  • LEARN → REWRITE → RANK → EXPLAIN

slide-9
SLIDE 9
  • 8$$';

– ()( – *(&%%'% – !''(

Learning Statistics to Support Ranking & Rewriting

  • C D!!
  • 5 C &&.#E&'#E

;&&.

*8<F-66F' G,H, IJK 2LL63


AFD: Make, Body ~> Model
P(Model=Accord | Make=Honda, Body=Coupe)
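The conditional probability attached to an AFD such as Make, Body ~> Model can be estimated from a sample of the database. The sketch below is a simplification that uses plain frequency counts over complete sample rows; it is not the classifier used in the talk. The function name `learn_value_distribution` and the sample rows are hypothetical and only serve the illustration.

```python
from collections import Counter, defaultdict


def learn_value_distribution(sample, determining, target):
    """Estimate P(target | determining attributes) by counting complete rows."""
    counts = defaultdict(Counter)
    for row in sample:
        # Rows missing the target or any determining attribute carry no evidence.
        if row.get(target) is None or any(row.get(a) is None for a in determining):
            continue
        key = tuple(row[a] for a in determining)
        counts[key][row[target]] += 1
    return {
        key: {value: n / sum(c.values()) for value, n in c.items()}
        for key, c in counts.items()
    }


# Hypothetical sample rows (not taken from the slides).
sample = [
    {"Make": "Honda", "Body": "Coupe", "Model": "Accord"},
    {"Make": "Honda", "Body": "Coupe", "Model": "Accord"},
    {"Make": "Honda", "Body": "Coupe", "Model": "Civic"},
    {"Make": "Honda", "Body": "Sedan", "Model": "Accord"},
    {"Make": "Honda", "Body": None, "Model": "Civic"},
]

dist = learn_value_distribution(sample, ("Make", "Body"), "Model")
p_accord = dist[("Honda", "Coupe")]["Accord"]  # P(Model=Accord | Make=Honda, Body=Coupe)
```

With this sample, two of the three complete Honda coupes are Accords, so the estimate is 2/3. The row with a missing Body is skipped rather than guessed.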

  • <'

4223 = <: ' ' !&

slide-10
SLIDE 10
  • )/''>

– <3=>'6? – <4=>'6@? – <7=>'6.

Rewriting to Retrieve Relevant Uncertain Results

! 4225 . 8 ! 4227 @?

  • <:

<' = ' ' ? 4223 !

  • @?

4224 !

Base Set for Q:(Body=Convt)

AFD: Model~> Body
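One way to read the rewriting step: given the base set for Q:(Body=Convt) and the AFD Model ~> Body, each distinct Model value in the base set yields a rewritten query that looks for tuples with that Model and a missing Body. The sketch below assumes an in-memory table and uses `None` for a missing value; the helper names, the table contents, and the model names are hypothetical, and a real autonomous source may not allow querying for missing values directly.

```python
def answer(query, table):
    """Return rows matching every attribute=value pair in the query."""
    return [row for row in table if all(row.get(a) == v for a, v in query.items())]


def rewrite_queries(base_set, determining, constrained):
    """Build one rewritten query per distinct determining-set value in the base set."""
    seen = []
    for row in base_set:
        key = tuple(row[a] for a in determining)
        if key not in seen:
            seen.append(key)
    rewritten = []
    for key in seen:
        query = dict(zip(determining, key))
        query[constrained] = None  # None stands for "value is missing"
        rewritten.append(query)
    return rewritten


# Hypothetical table (not taken from the slides).
table = [
    {"Id": 1, "Model": "A4", "Body": "Convt"},
    {"Id": 2, "Model": "Z4", "Body": "Convt"},
    {"Id": 3, "Model": "Z4", "Body": None},
    {"Id": 4, "Model": "Civic", "Body": None},
    {"Id": 5, "Model": "A4", "Body": "Sedan"},
]

base_set = answer({"Body": "Convt"}, table)               # certain answers
queries = rewrite_queries(base_set, ("Model",), "Body")   # one query per Model seen
uncertain_ids = [row["Id"] for q in queries for row in answer(q, table)]
```

Only tuple 3 is retrieved as an uncertain answer. Tuple 4 also has a missing Body, but its Model never appears in the base set, so no rewritten query asks for it.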

  • #E',

& )

3 '?,@?. 4 '

%+


  • 8$$';

  • (& )(

  • )(/('/&
slide-11
SLIDE 11
  • Selecting/Ordering Top-K Rewritten Queries

8M *'8

  • M *'-

\[ F_\alpha = \frac{(1 + \alpha) \cdot P \cdot R}{\alpha \cdot P + R} \]

  • &C /%'F&<
  • '/%''

\[ F = \frac{P \cdot R}{R} = P \quad (\alpha = 0) \]

– B'&:($G

  • & '(:

All tuples returned for a single query are ranked equally
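The selection and ordering steps can be sketched as follows, using the F-measure in the form F = (1+α)·P·R / (α·P + R) given elsewhere in the deck. This is a minimal illustration under stated assumptions: the candidate queries and their estimated precision and recall values are made up, and `select_and_order` is a hypothetical helper name.

```python
def f_measure(p, r, alpha):
    """F = (1 + alpha) * P * R / (alpha * P + R); alpha = 0 reduces to P."""
    if p == 0 or r == 0:
        return 0.0
    return (1 + alpha) * p * r / (alpha * p + r)


def select_and_order(candidates, alpha, k):
    """Pick the top-k rewritten queries by F-measure, then order them by precision."""
    top_k = sorted(
        candidates,
        key=lambda q: f_measure(q["P"], q["R"], alpha),
        reverse=True,
    )[:k]
    # Every tuple returned by one query inherits that query's precision as its rank.
    return sorted(top_k, key=lambda q: q["P"], reverse=True)


# Hypothetical estimates for three rewritten queries.
candidates = [
    {"query": "Model=A4", "P": 0.9, "R": 0.1},
    {"query": "Model=Z4", "P": 0.6, "R": 0.5},
    {"query": "Model=Boxster", "P": 0.3, "R": 0.4},
]

balanced = [q["query"] for q in select_and_order(candidates, alpha=1.0, k=2)]
precision_only = [q["query"] for q in select_and_order(candidates, alpha=0.0, k=2)]
```

With α = 1 the high-precision but low-recall query is dropped in favour of the two with better balance; with α = 0 the selection is driven by precision alone.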


  • 8$$';

  • (&,C&

– ('C&'

slide-12
SLIDE 12
  • Explaining Results to the Users

'F+ !) '$>

) *.&


  • 8$$';

– (= – ((.&

Make, Body ~> Model yields:
"This car is 83% likely to have Model=Accord given that its Make=Honda and Body=Sedan"
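An explanation of this kind can be produced directly from the AFD and the learned probability. The sketch below is a minimal illustration; the function name `explain` and its parameters are hypothetical, and the 0.83 figure is the one quoted on the slide.

```python
def explain(target, value, probability, evidence):
    """Render an uncertain answer's justification from an AFD and its probability."""
    given = " and ".join(f"{attr}={val}" for attr, val in evidence.items())
    return (
        f"This car is {probability:.0%} likely to have "
        f"{target}={value} given that its {given}"
    )


message = explain("Model", "Accord", 0.83, {"Make": "Honda", "Body": "Sedan"})
```

The determining attributes of the AFD (Make, Body) become the evidence, and the predicted attribute (Model) becomes the target.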

slide-13
SLIDE 13
  • Outline

!9/ $6$E :&;* !;#

slide-14
SLIDE 14
  • #?'H<:H<'H=HH<A

Leveraging Correlation between Data Sources

>?'@5A

<'>3#?'H<:H<'H=HH<A

#?<:H<'H=HH<A

>?'@5A >?<'@5A >?<'@5A I

UNION

#2 F?<'

  • 'A

69> '= && % &N =%

AFDs learned from Cars.com

slide-15
SLIDE 15
  • Handling Aggregate and Join Queries
  • (((<
  • "<

<: <' = < ' ' B% A2,?22 9 ! 422? 43,352 B% ! 4224 57,4B5 ' ? 4227 ?3,A52 4225 4224 4223 = <:

  • '

B% ') 9 ( ') 57,4B5 4224 !

B% B%

A2,?22 A2,?22 < ')

  • <: <' =

' '

B% 1* "' B%

'

B% 11

Make=Honda

5 ? 7 4 3 ' !& B% ! B% ! ' <: <' ' ?

  • @?

?5A@+)H ?5A@+*

8 .

  • 745

?5A@+4H ?5A@+/

t1 + t2 + t3 = 3
t1 + t3 + 0.9(t2) + 0.4(t4) = 3.3

Include a portion of each tuple relative to the probability its missing value matches the query constraint

5?JA@.+.

>?5?JA 8$'@5A

Only include tuples whose most likely missing value matches the query constraint

5?JA@.
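The two counting rules on this slide can be compared with a small calculation. In the sketch below, t1 and t3 are certain matches, while t2 and t4 match the constraint with probabilities 0.9 and 0.4, as in the slide's arithmetic. Treating "most likely value matches" as "probability above 0.5" is an assumption that only holds when the attribute has two candidate values; the function names are hypothetical.

```python
# (tuple id, probability that the tuple satisfies the query constraint)
tuples = [("t1", 1.0), ("t2", 0.9), ("t3", 1.0), ("t4", 0.4)]


def count_most_likely(rows, threshold=0.5):
    """Count only tuples whose most likely value matches (approximated by p > threshold)."""
    return sum(1 for _, p in rows if p > threshold)


def count_fractional(rows):
    """Count each tuple in proportion to its probability of matching."""
    return sum(p for _, p in rows)


most_likely_count = count_most_likely(tuples)   # t1 + t2 + t3
fractional_count = count_fractional(tuples)     # t1 + t3 + 0.9(t2) + 0.4(t4)
```

The first rule gives a whole-number count of 3, while the second gives approximately 3.3 because t4 still contributes its 0.4 share.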

slide-16
SLIDE 16
  • Outline

!9/ 8&9/ K !;#

slide-17
SLIDE 17
  • QPIAD Web Interface

http://rakaposhi.eas.asu.edu/qpiad

slide-18
SLIDE 18
  • Empirical Evaluation
  • E>

– 5

O !+ O B%,55,222&

– 5

O 9PE:( O 33%,422,222&

– 5

O !',!:& O 34%,?5,222&

  • &>

– 7C35Q'%

  • :&>

– 32Q&( – '''&) ('

  • *>

  • ('-)('+(+/,,+

– <+(+',(((,R,+ – ('+(+,&,+

slide-19
SLIDE 19
  • Experimental Results – Ranking & Rewriting

<8:E+-*9-*EC

!6%!B M )S))' %

slide-20
SLIDE 20
  • Experimental Results – Ranking & Rewriting

<8:E+-*EC ((

!BD M )S))&''( &%%%'

slide-21
SLIDE 21
  • Experimental Results – General Queries

((( "

slide-22
SLIDE 22
  • Experimental Results – Learning Methods

!

[Figure: prediction accuracy (0 to 1) per attribute (Year, Make, Model, Price, Mileage, Body, Certified) for NBC, AFD-Enhanced NBC, BayesNet 3, BayesNet 2, and Decision Tree]

slide-23
SLIDE 23
  • Experimental Summary
  • )(N- (

– M <8:E(&

  • *9-*E %(&

– (( M <8:E/)& % '%

  • *E
  • ('

– #E &'

  • <

– (((/( ) (&'' – <8:E( R/ )'(%&

  • ''*.&

  • %(')+++&

– *&#C – *(%)

slide-24
SLIDE 24
  • Outline

!9/ 8&9/ :&;* 5KF8:

slide-25
SLIDE 25
  • Related Work
  • <(:&E%

– 8%'&&M &&&!'' 9%,TC9%,!'9% – 8%%&&M /'%&'( %) '&%)

  • 8%%E%

– 9&')%'%(&%%. – ),) ,''&%' '('%

  • <-N-.

– &&.)( %.) – P(&)('%

  • ((T

– !&&&&(%%( ,,',( ,,+ – P) /'&'%)%) '%( Our work fits here

  • ('
slide-26
SLIDE 26
  • Contributions
  • *)

('/&

– <-)(

  • ))( '%

)'( '('%

– #EC*'!

  • )(; ('%)

&'

– #C%' (

  • #E &R>

– <-)( – # – *.&

slide-27
SLIDE 27
  • Current Directions – QUIC (CIDR '07 Demo)
  • ='''>
  • (

> L'!!:M 5$> ';C

  • 'E
  • E%&&'%>
  • '
  • '

http://rakaposhi.eas.asu.edu/quic

!?!A>

(9$(

  • )(''%

– 5'# – 5&:'# – 5&'# E#

  • #
slide-28
SLIDE 28
  • Problem
  • !'9,

)./+

Want a ‘Honda Accord’ with a ‘sedan’ body style for under ‘$12,000’

High Precision, Low Recall

"9E

  • ''

;

'

  • 01,222

3111 ' ' 033,422 032,522

  • 5

<: <' = ' ' ' 4223 ' ' ' 4224 ' '

  • 01,222

3111 ' ' 033,422 032,522

  • 5

<: <' = ' ' ' 4223 ' ' ' 4224 ' '

  • ;

3111 ' ' 033,422 032,522

  • 5

<: <' = ' ' ' 4223 ; ; ' 4224 ' '

  • ;

3111 ' ' 033,422 032,522

  • 5

<: <' = ' ' ' 4223 ; ; ' 4224 '

Many entities corresponding to tuples with missing values might be relevant to the user query

slide-29
SLIDE 29
  • Handling Aggregate and Join Queries
  • (((<
  • "<

– -&&'

5 ? 7 4 3 ' !& B% ! B% ! ' <: <' ++ ' ?

  • @?

?5A@+)H ?5A@+*

8 .

  • 745

?5A@+4H ?5A@+/

t1 + 0.9(t2) + t3 + 0.4(t4) = 3.3
t1 + t2 + t3 = 3

Only include tuples whose most likely missing value matches the query constraint

5?JA@.

Include a portion of each tuple relative to the probability its missing value matches the query constraint

5?JA@.+.

>?5?JA8$'@5A

slide-30
SLIDE 30
  • 8$$';

– ()( – *(&%%'% – !''(

Learning Statistics to Support Ranking & Rewriting

  • C D!!
  • 5 C &&.#E&'#E

;&&.

  • # M &,-,8:&

*<F-6&<U&--U8:- *8<F-66F' G,H, IJ 2LL63


slide-31
SLIDE 31
  • !
  • :&(()%&(

('(')

Incompleteness in Web Databases

"#$

  • %&''%'

%&

  • %&'('#$
  • ''%+(+ +

'& +(+&) &'

  • %%

(%''

)*+),- ,.+./- *- 0,)). 1.2 3 00+,- 00+- ),+4- .10/4 *4 5+ ,+*- .+/- ..+/- 10*1 *. 6'+

  • '#-
  • 66

7( 8

slide-32
SLIDE 32
  • Introduction

''%(%)% )&&'%% ''%

– *+(+!+,-+,(,*+

,'%('&' &'(&&'%

slide-33
SLIDE 33
  • Problem
  • !'9,

)./+

Want a ‘Honda Accord’ with a ‘sedan’ body style for under ‘$12,000’

High Precision, Low Recall

'

  • 01,222

3111 ' ' 033,422 032,522

  • 5

<: <' = ' ' ' 4223 ' ' ' 4224 ' '

  • 01,222

3111 ' ' 033,422 032,522

  • 5

<: <' = ' ' ' 4223 ' ' ' 4224 ' '

  • ;

3111 ' ' 033,422 032,522

  • 5

<: <' = ' ' ' 4223 ; ; ' 4224 ' '

  • ;

3111 ' ' 033,422 032,522

  • 5

<: <' = ' ' ' 4223 ; ; ' 4224 '


slide-35
SLIDE 35
  • Selecting Top-K Rewritten Queries using F-Measure
  • &

V/)

8M *'8

  • M *'-

%'8;*++

\[ F_\alpha = \frac{(1 + \alpha) \cdot P \cdot R}{\alpha \cdot P + R} \]

  • P9:P>#C%'

)(%&&

– α@*

86-

– αN*

8K-

– αO*

8L-

  • P9*>#C'(

&C/%'' ')

We still want the most precise tuples first!

  • 9,)'

&C/)(&& %%)&'


slide-36
SLIDE 36
  • Ordering Top-K Queries using Estimated Precision
  • P)='&C)/,)'

''

\[ F = \frac{P \cdot R}{R} = P \quad (\alpha = 0) \]

– B'&:( $H$9$$G

  • :(/''&)

&'

NOTE: All tuples returned for a single query are ranked equally

– &)'


slide-37
SLIDE 37
  • Problem
  • !'%9,

)./+

Want a ‘Honda Accord’ with a ‘sedan’ body style for under ‘$12,000’

High Precision, Low Recall

'

  • 01,222

3111 ' ' 033,422 032,522

  • 5

<: <' = ' ' ' 4223 ' ' ' 4224 ' '

  • 01,222

3111 ' ' 033,422 032,522

  • 5

<: <' = ' ' ' 4223 ' ' ' 4224 ' '

  • ;

3111 ' ' 033,422 032,522

  • 5

<: <' = ' ' ' 4223 ; ; ' 4224 ' '

  • ;

3111 ' ' 033,422 032,522

  • 5

<: <' = ' ' ' 4223 ; ; ' 4224 '

slide-38
SLIDE 38
  • Problem
  • !'%9,

)./+

Want a ‘Honda Accord’ with a ‘sedan’ body style for under ‘$12,000’

High Precision, Low Recall

"9E

  • ''
  • :'($;

'

  • 01,222

3111 ' ' 033,422 032,522

  • 5

<: <' = ' ' ' 4223 ' ' ' 4224 ' '

  • 01,222

3111 ' ' 033,422 032,522

  • 5

<: <' = ' ' ' 4223 ' ' ' 4224 ' '

  • ;

3111 ' ' 033,422 032,522

  • 5

<: <' = ' ' ' 4223 ; ; ' 4224 ' '

  • ;

3111 ' ' 033,422 032,522

  • 5

<: <' = ' ' ' 4223 ; ; ' 4224 '

Many entities corresponding to tuples with missing values might be relevant to the user query

slide-39
SLIDE 39
  • Query Rewriting in QPIAD

Select Top K Rewritten Queries <3=>'6? <4=>'6@? <7=>'6.

3E>?'#@5A

' <: <' = ' 3 ' ? 4223 ! 4

  • @?

4224 ! 7 8 . 4225 ! ?

  • @?

4227

  • 5

' ! 422?

  • A

9 ! 4224 ' B ' ? 422A

  • '

<: <' = ' 3 ' ? 4223 ! 4

  • @?

4224 ! 7 8 . 4225 !

Base Result Set

AFD: Model~> Body style

' <: <' = ' 5(' ?

  • @?

4227

  • 2+B

B ' ? 422A

  • 2+7

Ranked Relevant Uncertain Answers

  • We can select top K rewritten queries using F-measure

F-Measure = (1+α)*P*R / (α*P+R)
P – Estimated Precision
R – Estimated Recall based on P and Estimated Selectivity

slide-40
SLIDE 40
  • LEARN → REWRITE → RANK → EXPLAIN

slide-41
SLIDE 41
  • Handling Aggregate Queries

&&, ((('!P9%(

  • – #C%6CB>%E9'

'$($

3+ :(/'%'%+ 4+ !&(((%&+ 7+ %()/'(<8:E (+ ?+ :)/'.''+ 5+ #&.'' % &=(+ A+ : /&'( /,'&(((( + B+

  • (((()(((+
slide-42
SLIDE 42
  • Learning Statistics to Support Ranking & Rewriting
  • ('%(D!!
  • (%&&.

#E&'#E'&&.

  • (**-)<%'>

– )/'& &< –

  • ('%&

&-- – 8&&)(& 8:-

*<F-6&<U&--U8:- *8<F-66F' G,H, IJ 2LL63


slide-43
SLIDE 43
  • Rewriting to Retrieve Relevant Uncertain Results
  • #E

&,= '%'' '

! 4225 . 8 7 ! 4227 @?

  • ?

' <: <' = ' 3 ' ? 4223 ! 4

  • @?

4224 !

Base Set for Q:(Body=Convt)

AFD: Model~> Body

  • & )'6! )>

3 *& ('% 4 '& (',

&= '!

  • /%,'#EJK,)(

)/%>

– E'%' – #',()/( &'(%)'


slide-44
SLIDE 44
  • Selecting/Ordering Top-K Rewritten Queries
  • &

V/)

8M *'8

  • M *'-

\[ F_\alpha = \frac{(1 + \alpha) \cdot P \cdot R}{\alpha \cdot P + R} \]

  • P9:P>#C%'

)(%&&

  • P9*>#C'(&C/

%''')

  • P)='&C)/,)'

''

\[ F = \frac{P \cdot R}{R} = P \quad (\alpha = 0) \]

– B'&:( $H$9$$G

  • :(/''

&)&'

NOTE: All tuples returned for a single query are ranked equally


slide-45
SLIDE 45
  • Explaining Results to the Users

>?<:@"''<'@''NP*1HA !) '$>

) *.& > )())(&&W


slide-46
SLIDE 46
  • Experimental Results – Learning Methods

!

[Figure: prediction accuracy (0 to 1) per attribute (Year, Make, Model, Price, Mileage, Body, Certified) for NBC, AFD-Enhanced NBC, BayesNet 3, BayesNet 2, and Decision Tree]

(#E'( & #EC*'! &%)

  • %)+++&

<8:E%)' )'& )%' !'%

slide-47
SLIDE 47
  • Handling Aggregate and Join Queries
  • (((<

– &&,((( '!P9%(

O #C%6CB>%E9'' $($

  • "<

– R/%(''/, )R'' – *'&''%'' ) ''()/ – (&N,'%' /&''/

O 8$$($($''E $Q

slide-48
SLIDE 48
  • Experimental Results – Ranking & Rewriting

*α #C

> 32/V )/))'

  • Combined Effect = α + k

α: sets the tradeoff of precision & recall
k: resource limitation on # of rewritten queries

&,))/ ))&%' '%((&

slide-49
SLIDE 49
  • Experimental Results – Ranking & Rewriting

<8:E+-*9-*EC

!6%!B 9 %&)( '%/ $$ !6%!B &)('%) %/

slide-50
SLIDE 50
  • Experimental Results – Ranking & Rewriting

<8:E+-*EC ((

!BD &&( ' )' !BD &&&)N(' ' % &(( '%, )( % / % !BD ) /((9 % '

slide-51
SLIDE 51
  • Experimental Results – Learning Methods

!

0.2 0.4 0.6 0.8 1 Year Make Model Price Mileage Body Certified

Accuracy

NBC AFD-Enhanced NBC BayesNet 3 BayesNet 2 Decision Tree

(#E'( & #EC*'! &%)

  • %)+++&

<8:E%)' )'& )%' !'%

slide-52
SLIDE 52
  • Experimental Results - Extensions

((( "

8'( / ( &',)%( )(& &&.42Q/ 322Q)&' '