

slide-1
SLIDE 1

Enabling Completeness-aware Querying in SPARQL

1

Luis Galárraga, Katja Hose, Simon Razniewski May 14th, 2017 WebDB, Chicago

slide-2
SLIDE 2

Outline

  • Completeness in RDF knowledge bases
  • Completeness oracles
  • Our vision

– Representations for completeness oracles
– Reasoning with completeness oracles
– Enabling completeness in SPARQL

  • Summary & conclusions

2


slide-5
SLIDE 5

RDF Knowledge Bases (KBs)

Collection of structured knowledge

5

[Figure: example RDF graph with the nodes Switzerland, Français, Italiano, Romance, and Leonhard Euler, connected by edges labeled officialLanguage, family, and citizenOf]

slide-6
SLIDE 6

Plenty of KBs out there!

6


slide-8
SLIDE 8

KBs in action

8

slide-9
SLIDE 9

Outline

  • Completeness in RDF knowledge bases
  • Completeness oracles
  • Our vision

– Representations for completeness oracles
– Reasoning with completeness oracles
– Enabling completeness in SPARQL

  • Summary & conclusions

9

slide-14
SLIDE 14

Completeness in RDF KBs

  • KBs are highly incomplete

– 1% of people have a citizenship in YAGO

  • We do not know where the incompleteness lies

– A person who appears single in the KB could actually be single, or the KB may be incomplete

  • Problems for data producers and consumers

– Consumers: no completeness guarantees for queries.
– Producers: which parts of the KB need to be populated?

14

slide-19
SLIDE 19

Completeness

  • Defined with respect to a query q via a complete hypothetical KB K*.

– A query q is complete in K if q(K*) ⊆ q(K).

SELECT ?x WHERE { Switzerland officialLang ?x }

[Figure: in the KB, Switzerland has two officialLanguage edges, to Français and to Italiano]

Are these all the official languages of Switzerland?

[Incomplete query]
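The definition on this slide can be checked mechanically once a reference KB is available. The following is a minimal sketch, assuming KBs are plain sets of (subject, relation, object) triples; the helper names and the contents of K* (which adds Deutsch as a further official language) are illustrative assumptions, not part of the talk.

```python
from typing import Callable, Set, Tuple

Triple = Tuple[str, str, str]
KB = Set[Triple]


def objects_of(kb: KB, subject: str, relation: str) -> Set[str]:
    """Evaluate SELECT ?x WHERE { subject relation ?x } over a set of triples."""
    return {o for (s, r, o) in kb if s == subject and r == relation}


def is_complete(query: Callable[[KB], Set[str]], kb: KB, kb_star: KB) -> bool:
    """A query q is complete in K if q(K*) is a subset of q(K)."""
    return query(kb_star) <= query(kb)


# K: the available KB, as drawn on the slide.
K: KB = {
    ("Switzerland", "officialLanguage", "Français"),
    ("Switzerland", "officialLanguage", "Italiano"),
}

# K*: a hypothetical complete KB (assumption: it additionally lists Deutsch).
K_STAR: KB = K | {("Switzerland", "officialLanguage", "Deutsch")}


def swiss_languages(kb: KB) -> Set[str]:
    return objects_of(kb, "Switzerland", "officialLanguage")


print(is_complete(swiss_languages, K, K_STAR))  # False: Deutsch is missing from K
```

In practice K* is not available, which is why the rest of the talk relies on oracles that guess the answer instead.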

slide-24
SLIDE 24

Completeness in RDF data

  • Wikidata provides "no value" annotations

SELECT ?x WHERE { USA officialLang ?x }

[Figure: officialLanguage edge for USA]

[Complete query]

  • Not applicable if we know some official language

[Figure: in the KB, Switzerland has two officialLanguage edges, to Français and to Italiano]

slide-25
SLIDE 25

Outline

  • Completeness in RDF knowledge bases
  • Completeness oracles
  • Our vision

– Representations for completeness oracles
– Reasoning with completeness oracles
– Enabling completeness in SPARQL

  • Summary & conclusions

25

slide-26
SLIDE 26

Completeness oracle

  • Boolean function ɷ(q, K) that guesses the completeness of a query q in a KB K.

26

slide-30
SLIDE 30

SR completeness oracle

  • Function ɷ that guesses the completeness of queries of the form [Galárraga et al., 2017]:

SELECT ?x WHERE { subject relation ?x }

  • We use the notation ɷ(subject, relation)
  • ɷ = pca(s, r) = partial completeness assumption

– Query is complete in KB if at least one answer is known

slide-31
SLIDE 31

Evaluating SR oracles

Gold standard: complete instances in the domain of officialLanguage

[Figure: entities with their known officialLanguage values (Français, Italiano, Dansk), overlaid with the gold standard and with the sets that the PCA oracle and the American country oracle label as complete]

  • ɷ = pca(s, r) = partial completeness assumption

– Precision = 3/5
– Recall = 3/4

  • ɷ = american-country-oracle(s, r)

– Precision = 1/2
– Recall = 1/4
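The precision and recall figures for the two oracles follow from set overlaps with the gold standard. A minimal sketch, assuming hypothetical entity identifiers (e1 to e7) chosen only so that the counts reproduce the numbers on the slide; the actual entities in the figure are not recoverable here.

```python
from fractions import Fraction
from typing import Set, Tuple


def precision_recall(predicted: Set[str], gold: Set[str]) -> Tuple[Fraction, Fraction]:
    """Precision and recall of an oracle's 'complete' predictions against a gold standard."""
    if not predicted or not gold:
        raise ValueError("both sets must be non-empty")
    hits = len(predicted & gold)
    return Fraction(hits, len(predicted)), Fraction(hits, len(gold))


# Hypothetical identifiers: 4 subjects are truly complete for officialLanguage.
gold = {"e1", "e2", "e3", "e4"}
# The PCA oracle labels 5 subjects complete, 3 of them correctly.
pca_complete = {"e1", "e2", "e3", "e5", "e6"}
# The American country oracle labels 2 subjects complete, 1 of them correctly.
american_complete = {"e1", "e7"}

print(precision_recall(pca_complete, gold))       # (Fraction(3, 5), Fraction(3, 4))
print(precision_recall(american_complete, gold))  # (Fraction(1, 2), Fraction(1, 4))
```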
slide-35
SLIDE 35

SR completeness oracles

  • Closed World Assumption: cwa(s, r) = true
  • PCA: pca(s, r) = ∃o : r(s, o)
  • Cardinality: card(s, r) = #(o : r(s, o)) ≥ k
  • Popular entities: popularity_pop(s, r) = pop(s)
  • No change over time: nochange_chg(s, r) = ∼chg(s, r)
  • Star: star_{r1,..,rn}(s, r) = ∀i ∊ {1,..,n} : ∃o : r_i(s, o)
  • Class: class_c(s, r) = type(s, c)
  • Rule mining oracle
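Several of the oracles listed above are simple predicates over the KB. A minimal sketch, assuming the KB is a set of (subject, relation, object) triples and that class membership is stored under a relation named type; the function names and the sample data are illustrative, and the popularity and no-change oracles are omitted because they need external signals.

```python
from typing import Callable, Sequence, Set, Tuple

Triple = Tuple[str, str, str]
Oracle = Callable[[str, str], bool]  # ɷ(subject, relation)


def objects(kb: Set[Triple], s: str, r: str) -> Set[str]:
    """All o such that r(s, o) holds in the KB."""
    return {o for (s2, r2, o) in kb if s2 == s and r2 == r}


def cwa(kb: Set[Triple]) -> Oracle:
    # Closed World Assumption: everything is complete.
    return lambda s, r: True


def pca(kb: Set[Triple]) -> Oracle:
    # Complete if at least one object is known.
    return lambda s, r: len(objects(kb, s, r)) > 0


def card(kb: Set[Triple], k: int) -> Oracle:
    # Complete if at least k objects are known.
    return lambda s, r: len(objects(kb, s, r)) >= k


def star(kb: Set[Triple], relations: Sequence[str]) -> Oracle:
    # Complete if the subject has some object for every relation r_1..r_n.
    return lambda s, r: all(objects(kb, s, ri) for ri in relations)


def class_oracle(kb: Set[Triple], c: str) -> Oracle:
    # Complete if the subject is an instance of class c.
    return lambda s, r: (s, "type", c) in kb


kb = {
    ("Switzerland", "officialLanguage", "Français"),
    ("Switzerland", "officialLanguage", "Italiano"),
    ("Switzerland", "type", "Country"),
    ("Denmark", "type", "Country"),
}

print(pca(kb)("Switzerland", "officialLanguage"))      # True
print(pca(kb)("Denmark", "officialLanguage"))          # False
print(card(kb, 3)("Switzerland", "officialLanguage"))  # False
```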

35

slide-38
SLIDE 38

Rule mining SR oracle

38

  • Based on completeness rules

notype(x, Adult), type(x, Person) ⇒ complete(x, hasChild)
dateOfDeath(x, y), lessThan1(x, placeOfDeath) ⇒ incomplete(x, placeOfDeath)

  • Learned using the AMIE [Galárraga et al., 2013] rule mining system

– On gold standard built via crowdsourcing
– 100% F1-measure for functional relations, quite good for relations hasChild, graduatedFrom
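To make the two example rules concrete, the sketch below applies them by hand to a toy KB. It is not AMIE and does not learn anything; it assumes notype(x, C) means that x has no type C in the KB and lessThan1(x, r) means that x has fewer than one known object for r. The entities alice and bob are hypothetical.

```python
from typing import Optional, Set, Tuple

Triple = Tuple[str, str, str]


def objects(kb: Set[Triple], s: str, r: str) -> Set[str]:
    return {o for (s2, r2, o) in kb if s2 == s and r2 == r}


def rule_oracle(kb: Set[Triple], s: str, r: str) -> Optional[bool]:
    """Return True (complete), False (incomplete), or None if no rule fires."""
    types = objects(kb, s, "type")
    # notype(x, Adult), type(x, Person) => complete(x, hasChild)
    if r == "hasChild" and "Person" in types and "Adult" not in types:
        return True
    # dateOfDeath(x, y), lessThan1(x, placeOfDeath) => incomplete(x, placeOfDeath)
    if (
        r == "placeOfDeath"
        and objects(kb, s, "dateOfDeath")
        and len(objects(kb, s, "placeOfDeath")) < 1
    ):
        return False
    return None


kb = {
    ("alice", "type", "Person"),         # hypothetical person without the Adult type
    ("bob", "type", "Person"),
    ("bob", "type", "Adult"),
    ("bob", "dateOfDeath", "1990-01-01"),  # hypothetical date, no placeOfDeath known
}

print(rule_oracle(kb, "alice", "hasChild"))    # True
print(rule_oracle(kb, "bob", "placeOfDeath"))  # False
print(rule_oracle(kb, "bob", "hasChild"))      # None
```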

slide-39
SLIDE 39

Outline

  • Completeness in RDF knowledge bases
  • Completeness oracles
  • Our vision

– Representations for completeness oracles
– Reasoning with completeness oracles
– Enabling completeness in SPARQL

  • Summary & conclusions

40

slide-42
SLIDE 42

Representing completeness oracles

  • Extensional approach [Darari et al., 2013]

– An oracle is a collection of completeness statements about queries

[Figure: RDF graph encoding a completeness statement. A statement node points via hasPattern to a pattern whose subject is ?x, predicate is hasOfficialLang, and object is ?y; ?x and ?y are each typed (a) Variable; the edge labels hasProjection and distinct (with value true) also appear]

SELECT DISTINCT ?y WHERE { ?x hasOfficialLang ?y } is complete in the KB

slide-43
SLIDE 43

Representing completeness oracles

  • Extensional approach [Darari et al., 2013]

– A call to the oracle asks for the existence of the query in the graph

[Figure: RDF graph of the completeness statement, with the nodes statement, pattern, ?x, ?y and the edge labels hasPattern, subject, predicate, object, hasProjection, distinct]

SELECT DISTINCT ?y WHERE { ?x hasOfficialLang ?y } is complete in the KB
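In the extensional approach, asking the oracle reduces to a lookup. A minimal sketch, assuming each completeness statement covers a single triple pattern and is stored as an immutable record; the class and field names are hypothetical and do not reproduce the vocabulary of [Darari et al., 2013]. Matching is by exact equality, so queries that differ only in variable names would need normalization first.

```python
from dataclasses import dataclass
from typing import Set


@dataclass(frozen=True)
class CompletenessStatement:
    """One triple pattern plus its projection variable and DISTINCT flag."""
    subject: str
    predicate: str
    obj: str
    projection: str
    distinct: bool


# The collection of statements plays the role of the oracle.
STATEMENTS: Set[CompletenessStatement] = {
    # SELECT DISTINCT ?y WHERE { ?x hasOfficialLang ?y } is complete in the KB
    CompletenessStatement("?x", "hasOfficialLang", "?y", projection="?y", distinct=True),
}


def extensional_oracle(query: CompletenessStatement) -> bool:
    """A call to the oracle asks whether the query exists among the statements."""
    return query in STATEMENTS


languages = CompletenessStatement("?x", "hasOfficialLang", "?y", "?y", True)
countries = CompletenessStatement("?x", "hasOfficialLang", "?y", "?x", True)

print(extensional_oracle(languages))  # True
print(extensional_oracle(countries))  # False: no statement about this query
```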

slide-45
SLIDE 45

Representing completeness oracles

  • Intensional approach

– The oracle logic is embedded as a lambda function or a link to a program or resource

[Figure: RDF descriptions of two oracles, pca-citizenship (a SR-Oracle) and amie-oracle (a RM-Oracle), with the edge labels hasFormula (∃ o : r(s, o)), precision (96%), and address (http://example.org/rest/oracle)]
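An intensional oracle carries its logic rather than a list of statements. A minimal sketch, assuming the formula is stored as a Python callable next to a precision estimate; the record type is hypothetical, and attaching the 96% value to pca-citizenship is an assumption, since the extracted figure does not show which oracle it belongs to.

```python
from dataclasses import dataclass
from typing import Callable, Set, Tuple

Triple = Tuple[str, str, str]


@dataclass(frozen=True)
class IntensionalOracle:
    name: str
    formula: Callable[[Set[Triple], str, str], bool]  # embedded oracle logic
    precision: float                                   # estimated on a gold standard


# hasFormula: ∃ o : r(s, o), written as a lambda over the KB.
pca_citizenship = IntensionalOracle(
    name="pca-citizenship",
    formula=lambda kb, s, r: any(s2 == s and r2 == r for (s2, r2, _) in kb),
    precision=0.96,  # assumption: which oracle has this precision is not recoverable
)

kb = {("Leonhard Euler", "citizenOf", "Switzerland")}

print(pca_citizenship.formula(kb, "Leonhard Euler", "citizenOf"))  # True
print(pca_citizenship.formula(kb, "Switzerland", "citizenOf"))     # False
```

An oracle published behind a link, such as the address in the figure, would replace the lambda with a call to that resource.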

slide-46
SLIDE 46

Providing completeness guarantees

List of results is complete according to oracle ɷ with confidence X

47

slide-49
SLIDE 49

Providing completeness guarantees

50

SELECT ?country WHERE { ?country officialLang ?lang . ?lang family Romance . }

How to provide completeness guarantees for arbitrary queries?

slide-50
SLIDE 50

Outline

  • Completeness in RDF knowledge bases
  • Completeness oracles
  • Our vision

– Representations for completeness oracles
– Reasoning with completeness oracles
– Enabling completeness in SPARQL

51

slide-53
SLIDE 53

D completeness oracles

  • Oracle ɷd for the completeness of queries:

SELECT DISTINCT ?x WHERE { ?x relation ?y }
SELECT DISTINCT ?y WHERE { ?x relation ?y }

  • We use the notation ɷd(relation) or ɷd(relation⁻¹)

SELECT DISTINCT ?y WHERE { ?x officialLang ?y }

  • If ɷd returns true, ɷd states that the KB knows all languages that are official in some country

slide-60
SLIDE 60

Completeness guarantees for arbitrary queries

  • Write completeness annotations for every possible type of query

– It requires a large amount of effort

  • Reuse existing SR and D oracles

SELECT ?country WHERE { ?country officialLang ?lang . ?lang family Romance . }

ɷ’ = ɷ(Romance, family⁻¹) ∧ (∧_{l : family(l, Romance)} ɷ(l, officialLang⁻¹))

  • It will generate false negatives

– If the KB misses Ligurian, the term ɷ(Romance, family⁻¹) returns false
– Even though the second term does not care, because Ligurian is not official in any country
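The composed oracle ɷ’ and its false negative can be reproduced on a toy example. A minimal sketch, assuming the SR oracle is backed by a hypothetical set of (subject, relation) pairs it deems complete, and writing inverse relations with a ^-1 suffix; none of these names come from the talk.

```python
from typing import Callable, Set, Tuple

Triple = Tuple[str, str, str]
Oracle = Callable[[str, str], bool]


def composed_oracle(kb: Set[Triple], omega: Oracle) -> bool:
    """ɷ' = ɷ(Romance, family^-1) AND, for every known Romance language l, ɷ(l, officialLang^-1)."""
    # First term: does the KB know all languages of the Romance family?
    first = omega("Romance", "family^-1")
    # Second term: for each known Romance language, do we know all countries using it?
    romance = {s for (s, r, o) in kb if r == "family" and o == "Romance"}
    second = all(omega(lang, "officialLang^-1") for lang in romance)
    return first and second


kb = {
    ("Français", "family", "Romance"),
    ("Italiano", "family", "Romance"),
    ("Switzerland", "officialLang", "Français"),
    ("Switzerland", "officialLang", "Italiano"),
}

# Hypothetical oracle: the KB misses Ligurian, so family^-1 is not complete for Romance.
deemed_complete = {("Français", "officialLang^-1"), ("Italiano", "officialLang^-1")}
omega = lambda s, r: (s, r) in deemed_complete

# False, although a missing Ligurian cannot change the answer if it is official nowhere.
print(composed_oracle(kb, omega))
```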
slide-63
SLIDE 63

Automatic oracle composition

SELECT ?country WHERE { ?country monarch ?monarch . ?country locatedIn Europe . ?country officialLang ?lang . ?lang family Romance . }

[Figure: query graph. ?country, the projection variable, has the edges monarch to ?monarch, locatedIn to Europe, and officialLang to ?lang; ?lang has the edge family to Romance. Successive slides mark one part of the graph as "Selective graph pattern" and another as "Non-selective graph pattern"]

ɷ’ = ɷ(Europe, locatedIn⁻¹)
ɷ’’ = ɷd(monarch) ∧ (∧_{country} ɷ(country, monarch))
ɷ* = ɷ(Romance, family⁻¹)
ɷ** = ∧_{l : family(l, Romance)} ɷ(l, officialLang⁻¹)

Completeness verdict = ɷ** ∧ ɷ* ∧ ɷ’ ∧ ɷ’’
Confidence = prec(ɷ**) ⨉ prec(ɷ*) ⨉ prec(ɷ’) ⨉ prec(ɷ’’)

  • Annotation on part of ɷ’’: not needed in case of set semantics
  • It could easily lead to false negatives
  • We would like to minimize the number of used oracles
  • Use more complex oracles that cover larger parts of the query graph at once
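The final verdict and confidence are a conjunction and a product over the oracles used. A minimal sketch, assuming each sub-oracle has already been evaluated to a Boolean and comes with a precision estimate; the precision values below are made up for illustration.

```python
from math import prod
from typing import Dict, Tuple


def combine(results: Dict[str, Tuple[bool, float]]) -> Tuple[bool, float]:
    """Completeness verdict = AND of all oracle answers; confidence = product of precisions."""
    verdict = all(answer for answer, _ in results.values())
    confidence = prod(precision for _, precision in results.values())
    return verdict, confidence


# Hypothetical answers and precisions for the four oracles of the example query.
results = {
    "ɷ'":  (True, 0.9),   # ɷ(Europe, locatedIn^-1)
    "ɷ''": (True, 0.8),   # ɷd(monarch) and per-country monarch oracles
    "ɷ*":  (True, 0.5),   # ɷ(Romance, family^-1)
    "ɷ**": (True, 1.0),   # per-language officialLang^-1 oracles
}

verdict, confidence = combine(results)
print(verdict, round(confidence, 2))  # True 0.36
```

Every additional factor can only lower the product, which is one way to read the wish to minimize the number of used oracles.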

slide-81
SLIDE 81

Outline

  • Completeness in RDF knowledge bases
  • Completeness oracles
  • Our vision

– Representations for completeness oracles
– Reasoning with completeness oracles
– Enabling completeness in SPARQL

  • Summary & conclusions

82

slide-85
SLIDE 85

Enabling completeness in SPARQL

  • Calls to completeness oracles could be embedded in the query language

– Example: number of Spanish speakers per county, aggregated per state, only for those states with complete information

SELECT ?state sum(?nspeak) WHERE {
  ?county inState ?state .
  ?county spanishSpeakers ?nspeak .
} GROUP BY ?state HAVING (complete(?nspeak))

– complete: Boolean aggregation function on sets of bindings

slide-86
SLIDE 86

Enabling completeness in SPARQL

  • For each value of ?state check if the bindings for ?nspeak are complete

SELECT ?state sum(?nspeak) WHERE {
  ?county inState ?state .
  ?county spanishSpeakers ?nspeak .
} GROUP BY ?state HAVING (complete(?nspeak))

?state     ?county      ?nspeak
Delaware   New Castle   2000
Delaware   Kent         4300
Delaware   Sussex       1200
Hawaii     Hawaii       30000
Hawaii     Kalawao      1200

Complete list?

SELECT complete(?nspeak) WHERE { ?county inState Delaware . ?county spanishSpeakers ?nspeak . }

Completeness oracles to the rescue!
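The effect of the proposed complete aggregate can be imitated outside SPARQL. A minimal sketch over the bindings shown on the slide, assuming a hypothetical oracle that knows how many counties each state should have (3 for Delaware, 5 for Hawaii); the talk leaves the choice of oracle open, so this cardinality check is only one possible stand-in.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Bindings (?state, ?county, ?nspeak) from the slide.
ROWS: List[Tuple[str, str, int]] = [
    ("Delaware", "New Castle", 2000),
    ("Delaware", "Kent", 4300),
    ("Delaware", "Sussex", 1200),
    ("Hawaii", "Hawaii", 30000),
    ("Hawaii", "Kalawao", 1200),
]

# Hypothetical oracle knowledge: expected number of counties per state.
EXPECTED_COUNTIES: Dict[str, int] = {"Delaware": 3, "Hawaii": 5}


def complete(state: str, bindings: List[Tuple[str, int]]) -> bool:
    """Stand-in for complete(?nspeak): are all counties of the state bound?"""
    expected = EXPECTED_COUNTIES.get(state)
    return expected is not None and len(bindings) >= expected


def speakers_per_complete_state(rows: List[Tuple[str, str, int]]) -> Dict[str, int]:
    """GROUP BY ?state, keep groups where complete(...) holds, then sum ?nspeak."""
    groups: Dict[str, List[Tuple[str, int]]] = defaultdict(list)
    for state, county, nspeak in rows:
        groups[state].append((county, nspeak))
    return {
        state: sum(n for _, n in bindings)
        for state, bindings in groups.items()
        if complete(state, bindings)
    }


print(speakers_per_complete_state(ROWS))  # {'Delaware': 7500}
```

Hawaii is dropped because only two of its counties have bindings, so its sum would understate the total.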

slide-89
SLIDE 89

Outline

  • Completeness in RDF knowledge bases
  • Completeness oracles
  • Our vision

– Representations for completeness oracles
– Reasoning with completeness oracles
– Enabling completeness in SPARQL

  • Summary & conclusions

90

slide-90
SLIDE 90

Summary

  • Completeness is a dimension of data quality

– It determines the value and reliability of the data
– Existing work provides only completeness statements and oracles for simple queries

  • Semantic Web is not completeness-aware

– Vision

    • Use completeness oracles for simpler queries to infer completeness for arbitrary queries
    • Embed completeness in the SPARQL query language

– Goal: Increase the value of the results delivered by queries

91

slide-91
SLIDE 91

Future work

  • Augment existing RDF data with completeness statements and oracles
  • Implement reasoning with completeness oracles in SPARQL query engines

– Extend the SPARQL query language to support the complete aggregation function

92