How Caching Improves Efficiency and Result Completeness for Querying Linked Data



slide-1
SLIDE 1

How Caching Improves Efficiency and Result Completeness for Querying Linked Data

Olaf Hartig

http://olafhartig.de/foaf.rdf#olaf
@olafhartig
Database and Information Systems Research Group
Humboldt-Universität zu Berlin

slide-2
SLIDE 2

Olaf Hartig - How Caching Improves Efficiency and Result Completeness for Querying Linked Data 2

SELECT DISTINCT ?i ?label
WHERE {
  ?prof rdf:type <http://res ... data/dbprofs#DBProfessor> ;
        foaf:topic_interest ?i .
  OPTIONAL {
    ?i rdfs:label ?label
    FILTER( LANG(?label)="en" || LANG(?label)="" )
  }
}
ORDER BY ?label

Can we query the Web of Data as if it were a single, giant database?

Our approach: Link Traversal Based Query Execution [ISWC'09]

slide-3
SLIDE 3

Main Idea

  • Intertwine query evaluation with traversal of data links
  • We alternate between:
      • Evaluating parts of the query (triple patterns) on a continuously augmented set of data (the query-local dataset)
      • Looking up URIs in intermediate solutions and adding the retrieved data to the query-local dataset
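The alternation described above can be sketched in a few lines of Python. This is a simplified illustration, not the implementation from the talk: the `lookup` callback, the in-memory `WEB` dictionary, the URI `http://example.org/AlicesPrj`, the literal `"Alice's project"`, and the short predicate names are all assumptions made for the example. Triple patterns are evaluated one after another, and every URI that shows up in an intermediate solution is looked up so that its data joins the query-local dataset.

```python
from typing import Callable, Dict, Iterable, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]
Binding = Dict[str, str]


def is_var(term: str) -> bool:
    return term.startswith("?")


def is_uri(term: str) -> bool:
    return term.startswith("http://") or term.startswith("https://")


def match(pattern: Triple, triple: Triple, binding: Binding) -> Optional[Binding]:
    """Extend `binding` so that `pattern` matches `triple`; None if impossible."""
    result = dict(binding)
    for p, t in zip(pattern, triple):
        if is_var(p):
            if result.setdefault(p, t) != t:
                return None
        elif p != t:
            return None
    return result


def link_traversal(patterns: List[Triple],
                   lookup: Callable[[str], Iterable[Triple]]) -> List[Binding]:
    dataset: Set[Triple] = set()  # query-local dataset, grows during execution
    seen: Set[str] = set()        # URIs that were already looked up

    def dereference(term: str) -> None:
        if is_uri(term) and term not in seen:
            seen.add(term)
            dataset.update(lookup(term))

    # Seed the dataset with data about the URIs mentioned in the query.
    for pattern in patterns:
        for term in pattern:
            dereference(term)

    solutions: List[Binding] = [{}]
    for pattern in patterns:
        extended: List[Binding] = []
        for binding in solutions:
            # Substitute the variables that are already bound.
            bound = tuple(binding.get(term, term) for term in pattern)
            for triple in list(dataset):  # snapshot: the dataset may grow below
                new_binding = match(bound, triple, binding)
                if new_binding is not None:
                    # Look up URIs in the intermediate solution.
                    for value in new_binding.values():
                        dereference(value)
                    extended.append(new_binding)
        solutions = extended
    return solutions


# Hypothetical stand-in for the Web: URI -> triples returned by a look-up.
WEB = {
    "http://bob.name": [("http://bob.name", "knows", "http://alice.name")],
    "http://alice.name": [("http://alice.name", "project", "http://example.org/AlicesPrj")],
    "http://example.org/AlicesPrj": [("http://example.org/AlicesPrj", "name", "Alice's project")],
}

QUERY = [
    ("http://bob.name", "knows", "?acq"),
    ("?acq", "project", "?prj"),
    ("?prj", "name", "?prjName"),
]

results = link_traversal(QUERY, lambda uri: WEB.get(uri, []))
print(results)
```

The sketch only finds data that is reachable from a URI in the query, which mirrors the limitation stated later in the deck.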

slide-4
SLIDE 4

Main Idea

Query: http://bob.name knows ?acq . ?acq project ?prj . ?prj name ?prjName

Query-local dataset: (no data yet)

slide-5
SLIDE 5

Main Idea

Query: http://bob.name knows ?acq . ?acq project ?prj . ?prj name ?prjName

Look-up: http://bob.name ?

slide-8
SLIDE 8

Main Idea

Query: http://bob.name knows ?acq . ?acq project ?prj . ?prj name ?prjName

Look-up: http://bob.name ?
Retrieved: "Descriptor object"

slide-9
SLIDE 9

Main Idea

Query: http://bob.name knows ?acq . ?acq project ?prj . ?prj name ?prjName

slide-10
SLIDE 10

Main Idea

Query: http://bob.name knows ?acq . ?acq project ?prj . ?prj name ?prjName

Query-local dataset:
    http://bob.name knows http://alice.name

slide-11
SLIDE 11

Main Idea

Query: http://bob.name knows ?acq . ?acq project ?prj . ?prj name ?prjName

Query-local dataset:
    http://bob.name knows http://alice.name

Intermediate solutions:
    ?acq = http://alice.name

slide-12
SLIDE 12

Main Idea

Query: http://bob.name knows ?acq . ?acq project ?prj . ?prj name ?prjName

Intermediate solutions:
    ?acq = http://alice.name

Look-up: http://alice.name ?

slide-15
SLIDE 15

Main Idea

Query: http://bob.name knows ?acq . ?acq project ?prj . ?prj name ?prjName

Intermediate solutions:
    ?acq = http://alice.name

slide-16
SLIDE 16

Main Idea

Query: http://bob.name knows ?acq . ?acq project ?prj . ?prj name ?prjName

Intermediate solutions:
    ?acq = http://alice.name

slide-17
SLIDE 17

Main Idea

Query: http://bob.name knows ?acq . ?acq project ?prj . ?prj name ?prjName

Query-local dataset:
    http://alice.name project http://.../AlicesPrj

Intermediate solutions:
    ?acq = http://alice.name

slide-18
SLIDE 18

Main Idea

Query: http://bob.name knows ?acq . ?acq project ?prj . ?prj name ?prjName

Query-local dataset:
    http://alice.name project http://.../AlicesPrj

Intermediate solutions:
    ?acq = http://alice.name
    ?acq = http://alice.name, ?prj = http://.../AlicesPrj

slide-19
SLIDE 19

Main Idea

Query: http://bob.name knows ?acq . ?acq project ?prj . ?prj name ?prjName

Intermediate solutions:
    ?acq = http://alice.name
    ?acq = http://alice.name, ?prj = http://.../AlicesPrj

slide-20
SLIDE 20

Main Idea

Query: http://bob.name knows ?acq . ?acq project ?prj . ?prj name ?prjName

Intermediate solutions:
    ?acq = http://alice.name
    ?acq = http://alice.name, ?prj = http://.../AlicesPrj
    ?prj = http://.../AlicesPrj, ?prjName = "…"

slide-21
SLIDE 21

Main Idea

Query: http://bob.name knows ?acq . ?acq project ?prj . ?prj name ?prjName

Intermediate solutions:
    ?acq = http://alice.name
    ?acq = http://alice.name, ?prj = http://.../AlicesPrj
    ?prj = http://.../AlicesPrj, ?prjName = "…"

Query result:
    ?acq = http://alice.name, ?prj = http://.../AlicesPrj, ?prjName = "…"

slide-22
SLIDE 22

Characteristics

  • Link traversal based query execution:
      • Evaluation on a continuously augmented dataset
      • Discovery of potentially relevant data during execution
      • Discovery driven by intermediate solutions
  • Main advantage:
      • No need to know all data sources in advance
  • Limitations:
      • Query has to contain a URI as a starting point
      • Ignores data that is not reachable* by the query execution

* formal definition in the paper

slide-23
SLIDE 23

The Issue

Query: http://bob.name knows ?acq . ?acq interest ?i . ?i label ?iLabel

Query-local dataset: (no data yet)

slide-24
SLIDE 24

The Issue

Query: http://bob.name knows ?acq . ?acq interest ?i . ?i label ?iLabel

Look-up: http://bob.name ?

slide-25
SLIDE 25

The Issue

Query: http://bob.name knows ?acq . ?acq interest ?i . ?i label ?iLabel

Query-local dataset:
    http://bob.name knows http://alice.name

slide-26
SLIDE 26

The Issue

Query: http://bob.name knows ?acq . ?acq interest ?i . ?i label ?iLabel

Query-local dataset:
    http://bob.name knows http://alice.name

Result columns: ?acq, ?i, ?iLabel

slide-27
SLIDE 27

The Issue

Two queries, each with its own query-local dataset:

Query: http://bob.name knows ?acq . ?acq interest ?i . ?i label ?iLabel
Query: http://bob.name knows ?acq . ?acq project ?prj . ?prj name ?prjName

slide-28
SLIDE 28

Reusing the Query-Local Dataset

Two queries, each with its own query-local dataset:

Query: http://bob.name knows ?acq . ?acq interest ?i . ?i label ?iLabel
Query: http://bob.name knows ?acq . ?acq project ?prj . ?prj name ?prjName

slide-29
SLIDE 29

Reusing the Query-Local Dataset

Two queries, one shared query-local dataset:

Query: http://bob.name knows ?acq . ?acq interest ?i . ?i label ?iLabel
Query: http://bob.name knows ?acq . ?acq project ?prj . ?prj name ?prjName

Query-local dataset: a knows triple connecting http://bob.name and http://alice.name

slide-30
SLIDE 30

Reusing the Query-Local Dataset

Two queries, one shared query-local dataset:

Query: http://bob.name knows ?acq . ?acq interest ?i . ?i label ?iLabel
Query: http://bob.name knows ?acq . ?acq project ?prj . ?prj name ?prjName

Query-local dataset:
    http://bob.name knows http://alice.name

Intermediate solutions:
    ?acq = http://alice.name

slide-31
SLIDE 31

Hypothesis

Re-using the query-local dataset (a.k.a. data caching) may benefit query performance and result completeness.
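A minimal sketch of what re-using the query-local dataset could look like in code, assuming a hypothetical `fetch` callback that dereferences a URI and a toy `WEB` dictionary standing in for the Web. The class name `DataCache` and its methods are illustrative only. Replacement and invalidation, which the talk declares out of scope, are not modelled.

```python
from typing import Callable, Dict, Iterable, List, Set, Tuple

Triple = Tuple[str, str, str]


class DataCache:
    """Keeps the data retrieved by URI look-ups so that later queries can reuse it."""

    def __init__(self, fetch: Callable[[str], Iterable[Triple]]) -> None:
        self._fetch = fetch
        self._by_uri: Dict[str, List[Triple]] = {}
        self.fetched: List[str] = []  # URIs that actually had to be retrieved

    def lookup(self, uri: str) -> List[Triple]:
        # Only go to the Web if this URI has not been looked up before.
        if uri not in self._by_uri:
            self.fetched.append(uri)
            self._by_uri[uri] = list(self._fetch(uri))
        return self._by_uri[uri]

    def dataset(self) -> Set[Triple]:
        # Everything retrieved so far, across all queries that used this cache.
        return {t for triples in self._by_uri.values() for t in triples}


# Hypothetical stand-in for the Web: URI -> triples returned by a look-up.
WEB = {
    "http://bob.name": [("http://bob.name", "knows", "http://alice.name")],
    "http://alice.name": [("http://alice.name", "project", "http://example.org/AlicesPrj")],
}

cache = DataCache(lambda uri: WEB.get(uri, []))

# A first query looks up Bob and then Alice.
cache.lookup("http://bob.name")
cache.lookup("http://alice.name")

# A later query that also starts at Bob is answered without a new retrieval,
# and it starts with all data the first query discovered.
cache.lookup("http://bob.name")
print(cache.fetched)
print(len(cache.dataset()))
```

Starting a later query on an already populated dataset is what makes additional results possible: data that this query would never have reached on its own may already be present.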

slide-32
SLIDE 32

Contributions

  • Systematic analysis of the impact of data caching
      • Theoretical foundation*
      • Conceptual analysis*
      • Empirical evaluation of the potential impact
  • Out of scope: Caching strategies (replacement, invalidation)

* see paper

slide-33
SLIDE 33

Experiment – Scenario

  • Information about the distributed social network of FOAF profiles
  • 5 types of queries
  • Experiment Setup:
      • 23 persons
      • Sequential use
    ➔ 115 queries
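The setup arithmetic (5 query types times 23 persons gives 115 queries) can be sketched as follows. The person URIs are placeholders, the template names are taken from the query template slides in the backup section, and the per-person ordering is an assumption based on the chart labels for queries No. 36 to 40. The `<PERSON>` placeholder is the one used in the query templates.

```python
from typing import List, Tuple

# Template names as they appear on the backup slides.
TEMPLATES = ["Contact", "UnsetProps", "2ndDegree1", "2ndDegree2", "Incoming"]


def build_sequence(persons: List[str]) -> List[Tuple[str, str]]:
    """All templates for the first person, then all for the second, and so on."""
    return [(template, person) for person in persons for template in TEMPLATES]


def instantiate(template_text: str, person_uri: str) -> str:
    """Replace the <PERSON> placeholder of a query template with a concrete URI."""
    return template_text.replace("<PERSON>", f"<{person_uri}>")


# Hypothetical person URIs; the real experiment used 23 FOAF profiles.
PERSONS = [f"http://example.org/person{i}" for i in range(1, 24)]
SEQUENCE = build_sequence(PERSONS)

print(len(SEQUENCE))   # 5 templates x 23 persons
print(SEQUENCE[35])    # query No. 36 in 1-based numbering
print(instantiate("<PERSON> foaf:knows ?p .", PERSONS[0]))
```

Under this ordering, queries No. 36 to 40 are the five queries for the eighth person, which matches the five chart labels shown for that range.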

slide-34
SLIDE 34

Experiment – Complete Sequence

[Figure: hit rate (axis 0 to 1) for queries No. 36 to 40 (ContactInfoPhillipe, UnsetPropsPhillipe, 2ndDegree1Phillipe, 2ndDegree2Phillipe, IncomingPhillipe); series: no reuse, given order]

  • no reuse experiment:
      • No data caching
  • given order experiment:
      • Reuse of the query-local dataset for the complete sequence of all 115 queries
  • Hit rate = (look-ups answered from cache) / (all look-up requests)
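The hit rate can be computed from a trace of look-up requests. The sketch below assumes that a repeated look-up of the same URI within one query counts as a hit, which would explain a non-zero hit rate even without reuse; the function name, the `cached` parameter, and the URIs are illustrative.

```python
from typing import Iterable, List


def hit_rate(lookups: List[str], cached: Iterable[str] = ()) -> float:
    """Fraction of look-up requests answered from the cache.

    `cached` holds URIs already present before the query starts
    (empty when the query-local dataset is not reused).
    """
    if not lookups:
        raise ValueError("hit rate is undefined without look-up requests")
    cache = set(cached)
    hits = 0
    for uri in lookups:
        if uri in cache:
            hits += 1
        else:
            cache.add(uri)  # retrieved now, available for later requests
    return hits / len(lookups)


# Hypothetical trace of one query's look-up requests.
trace = ["http://a.example", "http://b.example", "http://a.example", "http://a.example"]

print(hit_rate(trace))                                          # no reuse
print(hit_rate(trace, ["http://a.example", "http://b.example"]))  # warm cache
```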

slide-36
SLIDE 36

Experiment – Complete Sequence

[Figure: three panels (hit rate, query execution time in seconds, number of query results) for queries No. 36 to 40 (ContactInfoPhillipe, UnsetPropsPhillipe, 2ndDegree1Phillipe, 2ndDegree2Phillipe, IncomingPhillipe); series: no reuse, given order]
slide-38
SLIDE 38

Summary

  • Contributions:
      • Theoretical foundation
      • Conceptual analysis
      • Empirical evaluation
  • Main findings:
      • Additional results possible (for semantically similar queries)
      • Impact on performance may be positive but also negative
  • Future work:
      • Analysis of caching strategies in our context
      • Main issue: invalidation
slide-39
SLIDE 39


Backup Slides

slide-40
SLIDE 40

Contributions

  • Theoretical foundation (extension of the original definition)
      • Reachability by a $D_{seed}$-initialized execution of a BGP query $b$
      • $D_{seed}$-dependent solution for a BGP query $b$
      • Reachability $R(B)$ for a serial execution of $B = b_1, \ldots, b_n$
    ➔ Each solution for $b_{cur}$ is also an $R(B)$-dependent solution for $b_{cur}$
  • Conceptual analysis of the impact of data caching
      • Performance factor: $p(b_{cur}, B) = c(b_{cur}, [\,]) - c(b_{cur}, B)$
      • Serendipity factor: $s(b_{cur}, B) = b(b_{cur}, B) - b(b_{cur}, [\,])$
  • Empirical verification of the potential impact
  • Out of scope: Caching strategies (replacement, invalidation)
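As a worked example of the two factors, the sketch below plugs in the values reported for query No. 39 in the complete-sequence experiment: 1 result and 31.48 s without reuse, 9 results and 68.64 s with the cache filled by queries 1 to 38. The function and parameter names are illustrative, and measured execution time is used as the cost, as in the experiment slides.

```python
def performance_factor(cost_without_cache: float, cost_with_cache: float) -> float:
    """p = c(b_cur, []) - c(b_cur, B); positive means the cache saved cost."""
    return cost_without_cache - cost_with_cache


def serendipity_factor(results_with_cache: int, results_without_cache: int) -> int:
    """s = b(b_cur, B) - b(b_cur, []); positive means additional results."""
    return results_with_cache - results_without_cache


# Values for query No. 39 as reported on the experiment slides.
s_q39 = serendipity_factor(results_with_cache=9, results_without_cache=1)
p_q39 = performance_factor(cost_without_cache=31.48, cost_with_cache=68.64)

print(s_q39)            # additional results thanks to the cache
print(round(p_q39, 2))  # negative: the query became slower with the cache
```

The example shows both effects at once: the cache yields 8 additional results for this query, but the execution takes about 37 seconds longer.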
slide-41
SLIDE 41

Query Template Contact

SELECT * WHERE {
  <PERSON> foaf:knows ?p .
  OPTIONAL { ?p foaf:name ?name }
  OPTIONAL { ?p foaf:firstName ?firstName }
  OPTIONAL { ?p foaf:givenName ?givenName }
  OPTIONAL { ?p foaf:givenname ?givenname }
  OPTIONAL { ?p foaf:familyName ?familyName }
  OPTIONAL { ?p foaf:family_name ?family_name }
  OPTIONAL { ?p foaf:lastName ?lastName }
  OPTIONAL { ?p foaf:surname ?surname }
  OPTIONAL { ?p foaf:birthday ?birthday }
  OPTIONAL { ?p foaf:img ?img }
  OPTIONAL { ?p foaf:phone ?phone }
  OPTIONAL { ?p foaf:aimChatID ?aimChatID }
  OPTIONAL { ?p foaf:icqChatID ?icqChatID }
  OPTIONAL { ?p foaf:jabberID ?jabberID }
  OPTIONAL { ?p foaf:msnChatID ?msnChatID }
  OPTIONAL { ?p foaf:skypeID ?skypeID }
  OPTIONAL { ?p foaf:yahooChatID ?yahooChatID }
}

slide-42
SLIDE 42

Query Template UnsetProps

SELECT DISTINCT ?result ?resultLabel WHERE {
  ?result rdfs:isDefinedBy <http://xmlns.com/foaf/0.1/> .
  ?result rdfs:domain foaf:Person .
  OPTIONAL { <PERSON> ?result ?var0 }
  FILTER ( !bound(?var0) )
  <PERSON> foaf:knows ?var2 .
  ?var2 ?result ?var3 .
  ?result rdfs:label ?resultLabel .
  ?result vs:term_status ?var1 .
}
ORDER BY ?var1

slide-43
SLIDE 43

Query Template Incoming

SELECT DISTINCT ?result WHERE {
  ?result foaf:knows <PERSON> .
  OPTIONAL {
    ?result foaf:knows ?var1 .
    FILTER ( <PERSON> = ?var1 )
    <PERSON> foaf:knows ?result .
  }
  FILTER ( !bound(?var1) )
}

slide-44
SLIDE 44

Query Template 2ndDegree1

SELECT DISTINCT ?result WHERE {
  <PERSON> foaf:knows ?p1 .
  <PERSON> foaf:knows ?p2 .
  FILTER ( ?p1 != ?p2 )
  ?p1 foaf:knows ?result .
  FILTER ( <PERSON> != ?result )
  ?p2 foaf:knows ?result .
  OPTIONAL {
    <PERSON> ?knows ?result .
    FILTER ( ?knows = foaf:knows )
  }
  FILTER ( !bound(?knows) )
}

slide-45
SLIDE 45

Query Template 2ndDegree2

SELECT DISTINCT ?result WHERE {
  <PERSON> foaf:knows ?p1 .
  <PERSON> foaf:knows ?p2 .
  FILTER ( ?p1 != ?p2 )
  ?result foaf:knows ?p1 .
  FILTER ( <PERSON> != ?result )
  ?result foaf:knows ?p2 .
  OPTIONAL {
    <PERSON> ?knows ?result .
    FILTER ( ?knows = foaf:knows )
  }
  FILTER ( !bound(?knows) )
}

slide-46
SLIDE 46

Experiment – Single Query

[Figure: hit rate (axis 0 to 1) for queries No. 36 to 40 (ContactInfoPhillipe, UnsetPropsPhillipe, 2ndDegree1Phillipe, 2ndDegree2Phillipe, IncomingPhillipe); series: no reuse, upper bound]

  • no reuse experiment:
      • No data caching
  • upper bound experiment:
      • Reuse of query-local dataset for 3 executions of each query
      • Third execution measured
  • Hit rate = (look-ups answered from cache) / (all look-up requests)

slide-48
SLIDE 48

Experiment – Single Query

[Figure: three panels (hit rate, query execution time in seconds, number of query results) for queries No. 36 to 40 (ContactInfoPhillipe, UnsetPropsPhillipe, 2ndDegree1Phillipe, 2ndDegree2Phillipe, IncomingPhillipe); series: no reuse, upper bound]

slide-50
SLIDE 50

Experiment – Single Query

  • In the ideal case for $B_{upper} = [b_{cur}, b_{cur}]$:
      • $p_{upper}(b_{cur}, B_{upper}) = c(b_{cur}, [\,]) - c(b_{cur}, B_{upper}) = c(b_{cur}, [\,])$
      • $s_{upper}(b_{cur}, B_{upper}) = b(b_{cur}, B_{upper}) - b(b_{cur}, [\,]) = 0$

Experiment     Avg.¹ number of query results (std. dev.)   Avg.¹ hit rate (std. dev.)   Avg.¹ query execution time (std. dev.)
no reuse       4.983 (11.658)                              0.576 (0.182)                30.036 s (46.708)
upper bound    5.070 (11.813)                              0.996 (0.017)                1.943 s (11.375)

¹ Averaged over all 115 queries

slide-51
SLIDE 51

Experiment – Single Query

  • Summary (measurement errors aside):
      • Same number of query results
      • Significant improvements in query performance

Experiment     Avg.¹ number of query results (std. dev.)   Avg.¹ hit rate (std. dev.)   Avg.¹ query execution time (std. dev.)
no reuse       4.983 (11.658)                              0.576 (0.182)                30.036 s (46.708)
upper bound    5.070 (11.813)                              0.996 (0.017)                1.943 s (11.375)

¹ Averaged over all 115 queries

slide-52
SLIDE 52

Experiment – Complete Sequence

[Figure: three panels (hit rate, query execution time in seconds, number of query results) for queries No. 36 to 40 (ContactInfoPhillipe, UnsetPropsPhillipe, 2ndDegree1Phillipe, 2ndDegree2Phillipe, IncomingPhillipe); series: no reuse, upper bound, given order]

  • given order experiment:
      • Reuse of the query-local dataset for the complete sequence of all 115 queries

slide-54
SLIDE 54

Experiment – Complete Sequence

[Figure: three panels (hit rate, query execution time in seconds, number of query results) for queries No. 36 to 40 (ContactInfoPhillipe, UnsetPropsPhillipe, 2ndDegree1Phillipe, 2ndDegree2Phillipe, IncomingPhillipe); series: no reuse, upper bound, given order]
slide-56
SLIDE 56

Experiment – Complete Sequence

[Figure: three panels (hit rate, query execution time in seconds, number of query results) for queries No. 36 to 40 (ContactInfoPhillipe, UnsetPropsPhillipe, 2ndDegree1Phillipe, 2ndDegree2Phillipe, IncomingPhillipe); series: no reuse, upper bound, given order]

$B_{\text{given order}} = [q_1, \ldots, q_{38}]$

$s(q_{39}, B_{\text{given order}}) = b(q_{39}, B_{\text{given order}}) - b(q_{39}, [\,]) = 9 - 1 = 8$

slide-57
SLIDE 57

Experiment – Complete Sequence

[Figure: three panels (hit rate, query execution time in seconds, number of query results) for queries No. 36 to 40 (ContactInfoPhillipe, UnsetPropsPhillipe, 2ndDegree1Phillipe, 2ndDegree2Phillipe, IncomingPhillipe); series: no reuse, upper bound, given order]

$B_{\text{given order}} = [q_1, \ldots, q_{38}]$

$p'(q_{39}, B_{\text{given order}}) = c'(q_{39}, [\,]) - c'(q_{39}, B_{\text{given order}}) = 31.48\,\text{s} - 68.64\,\text{s} = -37.16\,\text{s}$

slide-58
SLIDE 58

Experiment – Complete Sequence

  • Summary:
      • Data cache may provide for additional query results
      • Impact on performance may be positive but also negative

Experiment     Avg.¹ number of query results (std. dev.)   Avg.¹ hit rate (std. dev.)   Avg.¹ query execution time (std. dev.)
no reuse       4.983 (11.658)                              0.576 (0.182)                30.036 s (46.708)
upper bound    5.070 (11.813)                              0.996 (0.017)                1.943 s (11.375)
given order    6.878 (12.158)                              0.932 (0.139)                39.845 s (145.898)

¹ Averaged over all 115 queries

slide-59
SLIDE 59

Experiment – Complete Sequence

  • Executing the query sequence in a random order results in measurements similar to the given order.

Experiment      Avg.¹ number of query results (std. dev.)   Avg.¹ hit rate (std. dev.)   Avg.¹ query execution time (std. dev.)
no reuse        4.983 (11.658)                              0.576 (0.182)                30.036 s (46.708)
upper bound     5.070 (11.813)                              0.996 (0.017)                1.943 s (11.375)
given order     6.878 (12.158)                              0.932 (0.139)                39.845 s (145.898)
random orders   6.652 (11.966)                              0.954 (0.036)                36.994 s (118.700)

slide-60
SLIDE 60

These slides have been created by Olaf Hartig
http://olafhartig.de

This work is licensed under a Creative Commons Attribution-Share Alike 3.0 License (http://creativecommons.org/licenses/by-sa/3.0/)