Enabling Completeness-aware Querying in SPARQL
Luis Galárraga, Katja Hose, Simon Razniewski May 14th, 2017 WebDB, Chicago
Outline
– Completeness in RDF knowledge bases
– Completeness oracles
– Our vision
  – Representations for completeness oracles
  – Reasoning with completeness oracles
  – Enabling completeness in SPARQL
[Figure: example RDF knowledge base with nodes Leonhard Euler, Switzerland, Français, Italiano and Romance, and edges labeled citizenOf and family.]
– 1% of people have a citizenship in YAGO
– A single person in the KB could actually be single or the …
– Consumers: no completeness guarantees for queries.
– Producers: which parts of the KB need to be populated?
– A query q is complete in K, if q(K*) …
[Figure: Switzerland linked to Français and Italiano.]
Are these all the official languages of Switzerland?
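The completeness notion above can be illustrated on the Switzerland example. The sketch below assumes the common formalization in which q is complete in K when q(K*) ⊆ q(K), with K* a hypothetical ideal KB holding all true facts; the slide text is cut off before the condition, so this reading is an assumption. The `hasOfficialLang` triples, the helper names and the contents of `K_STAR` are illustrative only.

```python
# Toy illustration of query completeness, ASSUMING that a query q is
# complete in K when q(K*) is a subset of q(K).

# The available KB: two official languages of Switzerland.
K = {
    ("Switzerland", "hasOfficialLang", "Français"),
    ("Switzerland", "hasOfficialLang", "Italiano"),
}

# Hypothetical ideal KB K*; in practice it is never available.
K_STAR = K | {
    ("Switzerland", "hasOfficialLang", "Deutsch"),
    ("Switzerland", "hasOfficialLang", "Rumantsch"),
}

def official_languages(kb, country):
    """Evaluate the query 'official languages of country' on kb."""
    return {o for (s, p, o) in kb if s == country and p == "hasOfficialLang"}

def is_complete(query, kb, ideal_kb):
    """True iff every answer on the ideal KB is also an answer on kb."""
    return query(ideal_kb) <= query(kb)

def q(kb):
    return official_languages(kb, "Switzerland")

print(is_complete(q, K, K_STAR))       # False: two languages are missing
print(is_complete(q, K_STAR, K_STAR))  # True
```

Since K* is not available in practice, the check cannot be run directly, which is what motivates approximating it with oracles.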
Gold standard: complete instances in the domain of …
PCA oracle
[Figure: language values (Français, Italiano, Dansk) for several subjects, comparing the PCA oracle with the gold standard; further labels: "American country" and "i(s, o)".]
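A PCA oracle of the kind named above can be sketched as follows, assuming the usual partial completeness assumption: a subject is taken as complete for a relation as soon as the KB holds at least one object for it. The toy KB, the `GOLD_COMPLETE` set and the function names are hypothetical and only serve to show how such an oracle is scored against a gold standard.

```python
# Sketch of a PCA-style oracle: a subject counts as complete for a relation
# as soon as the KB holds at least one object for it.

KB = {
    ("Switzerland", "hasOfficialLang", "Français"),
    ("Switzerland", "hasOfficialLang", "Italiano"),
    ("Denmark", "hasOfficialLang", "Dansk"),
    ("Haiti", "hasOfficialLang", "Français"),
}

# Hypothetical gold standard: subjects whose official languages are all in KB.
GOLD_COMPLETE = {"Denmark"}

def pca_oracle(kb, subject, relation):
    """Return True iff kb contains some triple (subject, relation, o)."""
    return any(s == subject and p == relation for (s, p, _o) in kb)

def precision_recall(predicted, gold):
    """Score the set of subjects predicted complete against the gold set."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall

subjects = ["Switzerland", "Denmark", "Haiti", "Canada"]
predicted = {s for s in subjects if pca_oracle(KB, s, "hasOfficialLang")}
precision, recall = precision_recall(predicted, GOLD_COMPLETE)
print(sorted(predicted))
print(precision, recall)
```

On this toy data the oracle marks every subject that has at least one language as complete, so it finds the single gold instance but also produces false positives.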
notype(x, Adult), type(x, Person) ⇒ complete(x, hasChild)
dateOfDeath(x, y), lessThan1(x, placeOfDeath) ⇒ incomplete(x, placeOfDeath)
– On gold standard built via crowdsourcing
– 100% F1-measure for functional relations, quite good for …
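The two rules above can be read as checks over a KB. The sketch below is one possible reading, assuming that notype(x, Adult) means x is not typed as Adult and that lessThan1(x, placeOfDeath) means the KB holds fewer than one placeOfDeath object for x. The entities `person_1` and `person_2` and their facts are made up for illustration.

```python
# Sketch of the two completeness rules as predicates over a toy KB.
# The entities and facts are hypothetical.

KB = {
    ("person_1", "type", "Person"),
    ("person_2", "type", "Person"),
    ("person_2", "type", "Adult"),
    ("person_2", "dateOfDeath", "1900-01-01"),
}

def objects(kb, subject, predicate):
    """All objects o such that (subject, predicate, o) is in kb."""
    return {o for (s, p, o) in kb if s == subject and p == predicate}

def rule_complete_has_child(kb, x):
    """notype(x, Adult), type(x, Person) => complete(x, hasChild)."""
    types = objects(kb, x, "type")
    return "Person" in types and "Adult" not in types

def rule_incomplete_place_of_death(kb, x):
    """dateOfDeath(x, y), lessThan1(x, placeOfDeath) => incomplete(x, placeOfDeath)."""
    has_death_date = bool(objects(kb, x, "dateOfDeath"))
    return has_death_date and len(objects(kb, x, "placeOfDeath")) < 1

print(rule_complete_has_child(KB, "person_1"))         # True
print(rule_complete_has_child(KB, "person_2"))         # False
print(rule_incomplete_place_of_death(KB, "person_2"))  # True
```

A return value of False only means that the rule does not fire; it says nothing about completeness in the other direction.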
– An oracle is a collection of completeness statements
[Figure: RDF representation of a completeness statement. A statement node is linked via hasPattern to a pattern node; the pattern has subject ?x, predicate hasOfficialLang and ?y, where ?x and ?y are each a Variable; distinct is set to true. One edge label is only partially legible ("hasPr…ecti…").]
– A call to the oracle asks for the existence of the query in …
– The oracle logic is embedded as a lambda function or a …
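Both oracle representations can be sketched behind one call signature. The code below is a minimal illustration, not the authors' data model: `Pattern`, `covers`, `StatementOracle` and `lambda_oracle` are hypothetical names, and the matching rule (a statement covers a query when each statement term is a variable or equals the query term) is a simplifying assumption.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Pattern:
    """A single triple pattern; terms starting with '?' are variables."""
    subject: str
    predicate: str
    obj: str

    def terms(self):
        return (self.subject, self.predicate, self.obj)

def is_var(term):
    return term.startswith("?")

def covers(statement, query):
    """Assumed matching rule: statement variables act as wildcards."""
    return all(is_var(s) or s == q
               for s, q in zip(statement.terms(), query.terms()))

class StatementOracle:
    """An oracle stored as a collection of completeness statements."""
    def __init__(self, statements):
        self.statements = list(statements)

    def __call__(self, query):
        return any(covers(s, query) for s in self.statements)

# An oracle whose logic is embedded as a function instead of stored data.
lambda_oracle = lambda query: query.predicate == "hasOfficialLang"

oracle = StatementOracle([Pattern("?x", "hasOfficialLang", "?y")])
print(oracle(Pattern("Switzerland", "hasOfficialLang", "?lang")))  # True
print(oracle(Pattern("?p", "citizenOf", "?c")))                    # False
print(lambda_oracle(Pattern("?c", "hasOfficialLang", "?lang")))    # True
```

Keeping both kinds of oracle callable with the same argument lets a query processor treat them interchangeably.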
[Figure: RDF descriptions of oracles. Nodes pca-citizenship and amie-oracle, with types SR-Oracle and RM-Oracle, and properties hasFormula (∃ o : r(s, o)), precision (96%) and address (http://example.org/rest/oracle).]
List of results is complete according to oracle ɷ with confidence X
How to provide completeness guarantees for arbitrary queries?
– It requires a large amount of effort
It will generate false negatives
If the KB misses Ligurian, this term returns false
Even though this term does not care, because Ligurian is not …
[Figure: query graph with projection variable ?country. Recoverable edges: ?country locatedIn Europe; ?lang family Romance; an edge labeled monarch to ?monarch.]
Selective graph pattern
Non-selective graph pattern
not needed in case of Set semantics
It could easily lead to false negatives
We would like to minimize the number …
Use more complex oracles that cover larger parts of the query graph at once
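One way to read the last two remarks is as a covering problem: pick few oracles so that every triple pattern of the query graph is covered. The sketch below uses a greedy set-cover heuristic, which is a stand-in technique chosen for illustration and not taken from the slides. The oracle names, the patterns each oracle covers, and the `hasOfficialLang` edge between ?country and ?lang are assumptions.

```python
# Greedy set-cover sketch: choose few oracles covering all triple patterns.
# This heuristic is an illustrative stand-in, not the authors' algorithm.

PATTERNS = {
    "p_located": ("?country", "locatedIn", "Europe"),
    "p_lang": ("?country", "hasOfficialLang", "?lang"),  # assumed edge label
    "p_family": ("?lang", "family", "Romance"),
    "p_monarch": ("?country", "monarch", "?monarch"),
}

# Hypothetical oracles and the patterns each one can vouch for in one call.
ORACLES = {
    "o_located": {"p_located"},
    "o_lang": {"p_lang"},
    "o_family": {"p_family"},
    "o_monarch": {"p_monarch"},
    "o_lang_family": {"p_lang", "p_family"},
    "o_located_monarch": {"p_located", "p_monarch"},
}

def greedy_oracle_cover(patterns, oracles):
    """Return a list of oracle names covering all patterns, or None."""
    remaining = set(patterns)
    chosen = []
    while remaining:
        # Pick the oracle that covers the most still-uncovered patterns;
        # sorting the names makes tie-breaking deterministic.
        best = max(sorted(oracles),
                   key=lambda name: len(oracles[name] & remaining),
                   default=None)
        if best is None or not (oracles[best] & remaining):
            return None  # some pattern cannot be covered by any oracle
        chosen.append(best)
        remaining -= oracles[best]
    return chosen

print(greedy_oracle_cover(PATTERNS, ORACLES))
# ['o_lang_family', 'o_located_monarch']
```

With only the four single-pattern oracles the same query would need four calls, each of which can fail independently; the two larger oracles cover it in two.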
– Example: aggregated number of Spanish speakers in a …
Boolean aggregation function on sets of bindings

?state     ?county      ?nspeak
Delaware   New Castle   2000
Delaware   Kent         4300
Delaware   Sussex       1200
Hawaii     Hawaii       30000
Hawaii     Kalawao      1200

Complete list?

SELECT complete(?nspeak) WHERE {
  ?county inState Delaware .
  ?county spanishSpeakers ?nspeak .
}

Completeness oracles to the rescue!
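The behaviour of complete as a Boolean aggregate over a set of bindings can be sketched outside SPARQL. The code below is an illustration under assumptions: the oracle is a hypothetical cardinality check that knows how many counties each state has (`COUNTY_COUNTS`), and `complete` and `county_oracle` are made-up names. The bindings are the rows of the table above.

```python
# Sketch of complete() as a Boolean aggregate over a group of bindings.

BINDINGS = [
    {"state": "Delaware", "county": "New Castle", "nspeak": 2000},
    {"state": "Delaware", "county": "Kent", "nspeak": 4300},
    {"state": "Delaware", "county": "Sussex", "nspeak": 1200},
    {"state": "Hawaii", "county": "Hawaii", "nspeak": 30000},
    {"state": "Hawaii", "county": "Kalawao", "nspeak": 1200},
]

# Hypothetical oracle knowledge: number of counties per state.
COUNTY_COUNTS = {"Delaware": 3, "Hawaii": 5}

def county_oracle(state, group):
    """True iff the group lists as many counties as the state has."""
    counties = {b["county"] for b in group}
    return len(counties) == COUNTY_COUNTS.get(state)

def complete(bindings, state, oracle):
    """Boolean aggregate: is the set of bindings for this state complete?"""
    group = [b for b in bindings if b["state"] == state]
    return oracle(state, group)

for state in ("Delaware", "Hawaii"):
    total = sum(b["nspeak"] for b in BINDINGS if b["state"] == state)
    print(state, total, complete(BINDINGS, state, county_oracle))
# Delaware 7500 True
# Hawaii 31200 False
```

The aggregate tells the consumer that the Delaware total can be trusted, while the Hawaii total is computed over a partial list of counties.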
– It determines the value and reliability of the data
– Existing work provides only completeness statements and …
– Vision
– Goal: Increase the value of the results delivered by …
– Extend the SPARQL query language to support the …