
slide-1
SLIDE 1

SQUALL: a Controlled Natural Language for Querying and Updating RDF Graphs

Sébastien Ferré
Team LIS, Data and Knowledge Management, Irisa

Controlled Natural Language, 30 August 2012, Zurich

slide-2
SLIDE 2

The Web of Data

◮ How to search and explore the Web of Data (RDF graphs)?
◮ How to fill the gap between end users and formal languages (RDF, OWL, SPARQL)?

[Figure: Linking Open Data cloud diagram, as of September 2010. Dataset categories in the legend: Media, Geographic, Publications, Government, Cross-domain, Life sciences, User-generated content.]

Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/

slide-3
SLIDE 3

Formal vs Natural Languages

◮ SPARQL: a formal language à la SQL
  ◮ very expressive and precise for querying and updating RDF graphs
  ◮ requires understanding of low-level notions: relational algebra and logic
◮ natural language interfaces (ex., Aqualog, FREyA)
  ◮ good usability through NL
  ◮ difficult problems: ambiguity and adequacy w.r.t. the underlying system
  ◮ in practice, generally limited to simple questions (much less expressive than SPARQL)
  ◮ ex., Aqualog queries are limited to 2-triples queries

slide-4
SLIDE 4

Controlled Natural Languages (CNL)

◮ on the natural/formal continuum [Kaufmann & Bernstein 2010]
◮ combine natural syntax and formal semantics
◮ “There is no important theoretical difference between natural languages and the artificial languages of logicians.” (Montague)
◮ a few CNLs:
  ◮ ACE [Fuchs et al.]: a general-purpose CNL
  ◮ SOS, Rabbit: CNLs for verbalizing OWL axioms
  ◮ SQUALL: the first CNL for SPARQL queries and updates

slide-5
SLIDE 5

What SQUALL is not...

1. a pure (grammatically correct) subset of English
  ◮ natural languages are a source of inspiration for flexibility, expressiveness, concision, high-level forms
  ◮ I think that CNLs should be more regular than NLs because they have to be learnt anyway
2. concerned with morphology (lexicon, agreements, etc.)
  ◮ should have the same requirements as SPARQL w.r.t. data
  ◮ should be able to refer to every resource without preprocessing
  ◮ shares non-ambiguous notations with SPARQL
  ◮ author → hackcraft:authoredBy or purl:author ... ?
3. a user interface
  ◮ querying is difficult, whatever the language
  ◮ syntax errors, empty results, preferences
  ◮ syntactic guided input (e.g., Ginseng) is not enough
  ◮ the objective is semantic guided input (done to a limited extent in previous work with Sewelis)

slide-6
SLIDE 6

What SQUALL is...?

◮ an alternative CNL syntax for SPARQL
  ◮ hence, same expressiveness as SPARQL
  ◮ hence, full adequacy to RDF data
  ◮ with natural high-level syntax
◮ its implementation is a compiler (3 phases)
  1. parsing of the source sentence (SQUALL)
  2. generation of an intermediate representation (Montague λ-terms)
  3. production of the target code (SPARQL)

slide-7
SLIDE 7

RDF Graphs

◮ an RDF graph is a set of triples (labeled edges)
◮ a triple (of resources) has a subject, a predicate, and an object
  ◮ a triple is a basic sentence
  ◮ ex., ex:John ex:loves ex:Mary .
◮ a resource denotes an entity or a concept or a literal value (e.g., numbers, dates, strings)
◮ a property denotes a binary relation between resources
  ◮ a property can be a transitive verb (ex., ex:loves) or a relational noun (ex., ex:author)
◮ a class denotes a set of resources
  ◮ a class can be a noun (ex., ex:woman) or an intransitive verb (ex., ex:works)
◮ properties and classes are resources themselves
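
As a minimal illustration of the definitions above (not part of the original slides), an RDF graph can be modelled in Python as a set of 3-tuples of strings. Only the triple `ex:John ex:loves ex:Mary` comes from the slide; the other triples and the helper names `objects_of` and `instances_of` are assumptions made for this sketch.

```python
# An RDF graph is a set of (subject, predicate, object) triples.
graph = {
    ("ex:John", "ex:loves", "ex:Mary"),        # the triple from the slide
    ("ex:Mary", "rdf:type", "ex:woman"),       # class membership (assumed example)
    ("ex:loves", "rdf:type", "rdf:Property"),  # a property is itself a resource
}

def objects_of(graph, subject, predicate):
    """Return every object o such that (subject, predicate, o) is in the graph."""
    return {o for (s, p, o) in graph if s == subject and p == predicate}

def instances_of(graph, cls):
    """Return every resource that is typed with the class cls."""
    return {s for (s, p, o) in graph if p == "rdf:type" and o == cls}
```

Because properties and classes are resources, `ex:loves` can appear both as a predicate and as the subject of another triple in the same set.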

slide-8
SLIDE 8

SPARQL 1.1: querying and updating RDF graphs

◮ SPARQL forms
  ◮ closed question: ASK graph pattern
  ◮ open question: SELECT vars WHERE graph pattern
  ◮ update: DELETE graph INSERT graph WHERE graph pattern
◮ graph patterns
  ◮ relational algebra: joins, unions, complements, selections, projections
  ◮ constraints: logic, arithmetic, built-ins
  ◮ named graphs
  ◮ subqueries
  ◮ aggregations

slide-9
SLIDE 9

SPARQL query example

SELECT ?r WHERE {
  ?r rdf:type :researcher .
  BIND (?r AS ?X)
  GRAPH :DBLP {
    FILTER NOT EXISTS {
      ?p rdf:type :publication .
      ?p :author ?X .
      ?p :year ?y .
      FILTER (?y >= 2000)
      FILTER NOT EXISTS {
        { SELECT (COUNT(?a) AS ?n) WHERE { ?p :author ?a . } }
        FILTER (?n >= 2)
      }
    }
  }
}

slide-10
SLIDE 10

A Montague grammar of SQUALL

◮ Syntax and semantics
◮ Modules:
  1. lexical conventions
  2. triples as sentences
  3. relational algebra as coordinations
  4. natural constructs (headed NPs, relatives, ...)
  5. queries with wh-words
  6. quantifiers as determiners
  7. subordination and n-ary predicates with reification
  8. built-in predicates and aggregations
  9. resolving syntactic ambiguities

slide-11
SLIDE 11

Lexical conventions

The same as in well-known notations (Turtle, SPARQL, N3)

◮ proper nouns, nouns, and verbs (URIs)
  ◮ <http://dbpedia.org/resource/Berlin>: a full URI for the Berlin city
  ◮ dbpedia:Berlin: an abbreviated URI with DBpedia namespace
  ◮ :Berlin: an abbreviated URI with default namespace
  ◮ Berlin: a bare URI (default namespace)
◮ literals
  ◮ "Hello world!": a plain literal
  ◮ "42"^^xsd:integer: a typed literal
  ◮ 42: a bare integer
◮ variables: ?X
◮ grammatical words (SQUALL reserved keywords)
  ◮ is, a, which, every, ...
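
The lexical categories listed above can be sketched as a small token classifier. This is only an illustration under simplifying assumptions: the regular expressions are rough approximations written for this example (they are not the SQUALL lexer, and they do not follow the full Turtle/SPARQL grammar), and `KEYWORDS` contains only the four keywords named on the slide.

```python
import re

# Only the reserved words that the slide lists explicitly.
KEYWORDS = {"is", "a", "which", "every"}

# Ordered from most to least specific; the first full match wins.
TOKEN_KINDS = [
    ("full_uri",      re.compile(r"<[^<>\s]+>")),
    ("typed_literal", re.compile(r'"[^"]*"\^\^\S+')),
    ("plain_literal", re.compile(r'"[^"]*"')),
    ("variable",      re.compile(r"\?\w+")),
    ("integer",       re.compile(r"[+-]?\d+")),
    ("prefixed_uri",  re.compile(r"\w*:\w+")),
    ("bare_word",     re.compile(r"\w+")),
]

def classify(token):
    """Return the lexical category of a single, already-split token."""
    for kind, pattern in TOKEN_KINDS:
        if pattern.fullmatch(token):
            if kind == "bare_word":
                # A bare word is either a grammatical word or a bare URI.
                return "keyword" if token in KEYWORDS else "bare_uri"
            return kind
    return "unknown"
```

For instance, `classify(":Berlin")` and `classify("dbpedia:Berlin")` both fall in the `prefixed_uri` category, while `classify("Berlin")` is a `bare_uri` because it is not a reserved keyword.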

slide-12
SLIDE 12

Triples as sentences

S  → NP VP          { np vp }                   “[NP A] [VP know-s B]”
NP → Term           { λd.(d term) }             “A”
VP → P1             { λx.(p1 x) }               “[P1 work-s]”
   | P2 NP          { λx.(np λy.(p2 x y)) }     “[P2 know-s] [NP B]”
P1 → ClassURI       { λx.(type x uri) }         “work”
P2 → PropertyURI    { λx.λy.(stat x uri y) }    “know”
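
The rules above can be read as function definitions, which the following sketch transcribes into Python closures. It is an illustration rather than the actual implementation (which the slides say is written in OCaml): the constants `stat` and `type` are represented as nested tuples, and all Python names (`term`, `vp_p2`, `s`, ...) are chosen for this example.

```python
# Semantic constants: build nested tuples standing for the intermediate language.
def stat(x, uri, y):
    return ("stat", x, uri, y)

def type_(x, uri):
    return ("type", x, uri)

# Lexical rules.
def term(t):                 # NP -> Term         { λd.(d term) }
    return lambda d: d(t)

def p1(uri):                 # P1 -> ClassURI     { λx.(type x uri) }
    return lambda x: type_(x, uri)

def p2(uri):                 # P2 -> PropertyURI  { λx.λy.(stat x uri y) }
    return lambda x: lambda y: stat(x, uri, y)

# Phrase rules.
def vp_p1(p1_sem):           # VP -> P1           { λx.(p1 x) }
    return lambda x: p1_sem(x)

def vp_p2(p2_sem, np_sem):   # VP -> P2 NP        { λx.(np λy.(p2 x y)) }
    return lambda x: np_sem(lambda y: p2_sem(x)(y))

def s(np_sem, vp_sem):       # S -> NP VP         { np vp }
    return np_sem(vp_sem)

# "A know-s B" and "A work-s"
a_knows_b = s(term("A"), vp_p2(p2("know"), term("B")))
a_works = s(term("A"), vp_p1(p1("work")))
```

Evaluating the closures plays the role of β-reduction: `a_knows_b` reduces to the tuple `("stat", "A", "know", "B")` and `a_works` to `("type", "A", "work")`.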

slide-13
SLIDE 13

Relational algebra as coordinations

They apply to most syntagms ∆: S, NP, VP, P1, P2, (next: Rel, AP, PP).

∆ → not ∆1         { not δ1 }       “not [VP know-s B]”
  | ∆1 and ∆2      { and δ1 δ2 }    “[VP work-s] and [VP cite-s X]”
  | ∆1 or ∆2       { or δ1 δ2 }     “[NP A] or [NP B]”
  | maybe ∆1       { option δ1 }    “maybe [VP know-s B]”
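
One way to let the same connectives apply to sentences, verb phrases, and noun phrases alike is to lift them pointwise over functions. The sketch below assumes this reading; the slide only gives the semantic actions `{ and δ1 δ2 }` and so on, and does not say how the constants are made polymorphic. The helpers `lift1` and `lift2` are hypothetical names.

```python
def lift2(op):
    """Lift a binary connective from propositions to functions of any arity."""
    def lifted(d1, d2):
        if callable(d1) and callable(d2):
            # Pointwise: (and d1 d2) x = and (d1 x) (d2 x)
            return lambda x: lifted(d1(x), d2(x))
        return (op, d1, d2)
    return lifted

def lift1(op):
    """Lift a unary connective from propositions to functions of any arity."""
    def lifted(d):
        if callable(d):
            return lambda x: lifted(d(x))
        return (op, d)
    return lifted

and_ = lift2("and")
or_ = lift2("or")
not_ = lift1("not")
option = lift1("option")

# VP coordination: "work-s and cite-s X"
works = lambda x: ("type", x, "work")
cites_x = lambda x: ("stat", x, "cite", "X")
works_and_cites_x = and_(works, cites_x)

# NP coordination: "A or B", with NPs of the form λd.(d term)
np_a = lambda d: d("A")
np_b = lambda d: d("B")
a_or_b = or_(np_a, np_b)
```

Applying `a_or_b` to the verb phrase `works` distributes the disjunction over the two subjects, giving `("or", ("type", "A", "work"), ("type", "B", "work"))`.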

slide-14
SLIDE 14

Headed NPs

NP  → Det NG1         { λd.(det (init ng1) d) }                 “[Det a] [NG1 woman]”
    | Det NG2 of NP   { λd.(np λx.(det (init (ng2 x)) d)) }     “[Det the] [NG2 author-s] of [NP X]”
Det → a(n)            { λd1.λd2.(exists (and d1 d2)) }
    | the             { λd1.λd2.(the d1 d2) }
NG1 → thing AR        { and thing ar }                          “thing [AR that cite-s A]”
    | P1 AR           { and p1 ar }                             “[P1 woman] [AR ?A]”
NG2 → P2 AR           { λx.λy.(and (p2 x y) (ar y)) }           “[P2 author] [AR ?A]”
AR  → App Rel         { and app rel }                           “[App ?A] [Rel that X cite-s]”
    | App             { app }
App → URI             { λx.(eq x uri) }                         “A”
    | Var             { λx.(bind x var) }                       “?X”
    | ε               { λx.true }

slide-15
SLIDE 15

Relatives

Rel → that VP                  { init vp }                        “that [VP know-s B]”
    | that NP P2               { init λx.(np λy.(p2 y x)) }       “that [NP X] [P2 cite-s]”
    | such that S              { init λx.s }                      “such that [S ?A work-s]”
    | Det NG2 of which VP      { init λx.(det (ng2 x) vp) }       “[Det an] [NG2 author] of which [VP know-s B]”
    | whose NG2 VP             ≡ the NG2 of which VP              “whose [NG2 author ?A] [VP cite-s a colleague of ?A]”
    | whose P2 is/are NP       { λx.(np λy.(p2 x y)) }            “whose [P2 author] [VP is a woman]”

slide-16
SLIDE 16

Auxiliary verbs

VP → is/are AP               { ap }                        “is [AP a woman]”
   | is/are Rel              { rel }                       “is [Rel such that ?A work-s]”
   | has/have Det P2 AR      { λx.(det (p2 x) ar) }        “have [Det an] [P2 author] [AR that X cite-s]”
AP → Term                    { λx.(eq x term) }            “A”
   | a(n)/the NG1            { ng1 }                       “a [NG1 woman]”
   | a(n)/the NG2 of NP      { λx.(np λy.(ng2 y x)) }      “the [NG2 author] of [NP X]”

slide-17
SLIDE 17

Queries with wh-words

S   → whether S1    { ask s1 }                          “whether [S A know-s a woman]”
NP  → what          ≡ which thing
    | whose NG2     ≡ the NG2 of what                   “whose [NG2 author ?A]”
Det → which         { λd1.λd2.(select (and d1 d2)) }
    | how many      { λd1.λd2.(count (and d1 d2)) }

slide-18
SLIDE 18

Quantifiers as determiners

Not present as such in SPARQL.

Det → some              { λd1.λd2.(exists (and d1 d2)) }
    | every             { λd1.λd2.(forall d1 d2) }
    | no                { λd1.λd2.(not (exists (and d1 d2))) }
    | at least i        { λd1.λd2.(atleast i (and d1 d2)) }
S   → for NP , S        { np λx.s }       “for [NP every publication ?X], [S an author of ?X work-s]”
    | there is/are NP   { np λx.true }    “there are [NP at least 3 person-s that know A]”
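
To make the meaning of these determiners concrete, the sketch below evaluates them directly over a small finite domain instead of compiling them to SPARQL. This model-theoretic reading is an assumption made for illustration; the data, the `domain` set, and the Python names are all hypothetical.

```python
# Tiny RDF graph (hypothetical data) and its finite domain of resources.
triples = {
    ("p1", "rdf:type", "person"), ("p2", "rdf:type", "person"),
    ("p3", "rdf:type", "person"), ("A", "rdf:type", "person"),
    ("p1", "know", "A"), ("p2", "know", "A"), ("p3", "know", "A"),
}
domain = {s for (s, _, _) in triples} | {o for (_, _, o) in triples}

# Semantic constants, evaluated over the finite domain.
def and_(d1, d2):
    return lambda x: d1(x) and d2(x)

def exists(d):
    return any(d(x) for x in domain)

def forall(d1, d2):
    return all(d2(x) for x in domain if d1(x))

def atleast(i, d):
    return sum(1 for x in domain if d(x)) >= i

# Determiners, transcribed from the grammar rules.
some = lambda d1: lambda d2: exists(and_(d1, d2))
every = lambda d1: lambda d2: forall(d1, d2)
no = lambda d1: lambda d2: not exists(and_(d1, d2))
at_least = lambda i: lambda d1: lambda d2: atleast(i, and_(d1, d2))

person = lambda x: (x, "rdf:type", "person") in triples
knows_a = lambda x: (x, "know", "A") in triples

# "there are at least 3 person-s that know A"
three_know_a = at_least(3)(person)(knows_a)
# "every person know-s A": fails here because A does not know A in this data
everyone_knows_a = every(person)(knows_a)
```

Each determiner takes a restriction `d1` and a scope `d2`, exactly as in the λ-terms of the rules; only the interpretation of `exists`, `forall`, and `atleast` is specific to this sketch.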

slide-19
SLIDE 19

Subordination and n-ary predicates with reification

NP   → that S                      { λd.λa.λg.(s () λt.(d t a g)) }         “that [S A know-s B]”
PP   → at/in Prep NP               { λs.(np λz.(prep z s)) }                “at [Prep place] [NP the city Rennes]”
     | at/in Det Prep AR           ≡ at Prep Det thing AR                   “at [Det some] [Prep venue] [AR whose place is Rennes]”
Rel  → at/in which Prep AR S       { init λx.(and (ar x) (prep x s)) }      “in which [Prep graph] [S A work-s]”
Prep → graph                       { graph }
     | URI                         { arg uri }

◮ Prepositional phrases (PP) can occur at any position of a sentence among the subject, verb, and object.

slide-20
SLIDE 20

Built-in predicates and aggregations

P1  → Pred1URI                           { λx.(pred1 uri x) }                “Monday”
P2  → Pred2URI                           { λx.λy.(pred2 uri x y) }           “match”
NG1 → AggregURI of AP (per AP_i+)?       { λx.(aggreg uri x ap (ap_i)_i) }   “count of [AP the publication ?P] per [AP the year of ?P]”

slide-21
SLIDE 21

Translation to SPARQL (principle)

Starting with the λ-term produced when parsing a SQUALL sentence:

1. replace some constants by their definition
  ◮ ex., count = λd.(select λx.(aggreg COUNT x d ()))
2. perform all β-reductions on the result
3. inductively generate SPARQL code
  ◮ 4 generators for: sentences, updates, queries, and graph patterns
  ◮ [ask f] = ASK { [f]G }
  ◮ [select d] = [?x | d ?x]Q
  ◮ [forall d1 d2]G = [not (exists (and d1 (not d2)))]G
  ◮ [the d1 d2]G = [exists (and d1 d2)]G
  ◮ [the d1 d2]U = [forall d1 d2]U
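
The graph-pattern rules above can be illustrated with a small generator over first-order terms. This sketch makes strong simplifications that are assumptions of the example, not of SQUALL: terms are nested tuples with explicit SPARQL variable names (rather than λ-terms after β-reduction), only the `ask` form and the `[.]G` rules for `forall` and `the` are covered, and the function names `rewrite`, `pattern`, and `ask` are hypothetical.

```python
def rewrite(f):
    """Eliminate 'forall' and 'the' following the graph-pattern rules."""
    tag = f[0]
    if tag == "forall":  # [forall d1 d2]G = [not (exists (and d1 (not d2)))]G
        _, d1, d2 = f
        return rewrite(("not", ("exists", ("and", d1, ("not", d2)))))
    if tag == "the":     # [the d1 d2]G = [exists (and d1 d2)]G
        _, d1, d2 = f
        return rewrite(("exists", ("and", d1, d2)))
    if tag == "and":
        return ("and", rewrite(f[1]), rewrite(f[2]))
    if tag in ("not", "exists"):
        return (tag, rewrite(f[1]))
    return f             # a triple is left unchanged

def pattern(f):
    """Generate the text of a SPARQL graph pattern from a rewritten term."""
    tag = f[0]
    if tag == "triple":
        return "{} {} {} .".format(f[1], f[2], f[3])
    if tag == "and":
        return pattern(f[1]) + " " + pattern(f[2])
    if tag == "exists":  # existential variables are implicit in a graph pattern
        return pattern(f[1])
    if tag == "not":
        return "FILTER NOT EXISTS { " + pattern(f[1]) + " }"
    raise ValueError("unsupported term: " + repr(tag))

def ask(f):              # [ask f] = ASK { [f]G }
    return "ASK { " + pattern(rewrite(f)) + " }"

# "whether every publication has an author"
every_publication_has_author = ask(
    ("forall",
     ("triple", "?p", "rdf:type", ":publication"),
     ("triple", "?p", ":author", "?a")))
```

The universal quantifier becomes a double negation, which is the same nested `FILTER NOT EXISTS` shape that appears in the example query of the slides.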

slide-22
SLIDE 22

Translation to SPARQL (example)

The SQUALL sentence

“for which researcher-s ?X, in graph DBLP every publication whose author is ?X and whose year ≥ 2000 has at least 2 author-s”

is parsed as

“[S for [NP [Det which] [NG1 [P1 researcher-s] [AR [App ?X]]]],
    [S [PP in [Prep graph] [NP DBLP]]
       [S [NP [Det every] [NG1 [P1 publication] [AR [Rel [Rel whose [NG2 [P2 author]] [VP is [AP ?X]]] and [Rel whose [NG2 [P2 year]] [VP [P2 ≥] [NP 2000]]]]]]]
          [VP has [Det at least 2] [P2 author-s]]]]]”

slide-24
SLIDE 24

Translation to SPARQL (example)

...whose internal representation is:

(select X
  (and (triple X rdf:type :researcher)
       (graph :DBLP
         (forall
           (exists x3 x5
             (and (triple x3 rdf:type :publication)
                  (triple x3 :author X)
                  (triple x3 :year x5)
                  (pred2 ≥ x5 2000)))
           (exists x6
             (and (aggreg COUNT x6 x8 x3 (exists x8 (triple x3 :author x8)))
                  (pred2 ≥ x6 2)))))))

slide-25
SLIDE 25

Translation to SPARQL (example)

...which translates to the SPARQL query:

SELECT ?r WHERE {
  ?r rdf:type :researcher .
  BIND (?r AS ?X)
  GRAPH :DBLP {
    FILTER NOT EXISTS {
      ?p rdf:type :publication .
      ?p :author ?X .
      ?p :year ?y .
      FILTER (?y >= 2000)
      FILTER NOT EXISTS {
        { SELECT (COUNT(?a) AS ?n) WHERE { ?p :author ?a . } }
        FILTER (?n >= 2)
      }
    }
  }
}

slide-26
SLIDE 26

Implementation

◮ available as a Web form
  ◮ See http://www.irisa.fr/LIS/softwares/squall
◮ developed in OCaml (functional, type safe, concise)
  ◮ syntax (367 loc), semantics (295 loc), sparql (198 loc)
◮ extends the paper with
  ◮ semantic validation: variable scope and binding, semantic restrictions (e.g., no built-ins in assertions)
  ◮ arithmetic expressions and function calls (NP, P2, AP)
    ◮ “what is the height * width of the rectangle-s whose color is red ?”
  ◮ quoted NPs as verbs (P1, P2)
    ◮ “a publication has author a person ?”
    ◮ “a publication has 'which rdf:Property' a person ?”

slide-27
SLIDE 27

Perspectives as a language

◮ covering 100% of SPARQL 1.1 (nothing hard)
  ◮ CONSTRUCT and DESCRIBE forms
  ◮ query modifiers (ORDER BY, LIMIT, OFFSET)
  ◮ conditional expressions
  ◮ concise notation of RDF collections (list) and membership test
  ◮ + type checking in expressions
◮ adding more natural constructs
  ◮ anaphoras: e.g., “some man ... this man” instead of “some man ?X ... ?X”
  ◮ comparatives and superlatives
  ◮ adjectives and adverbs (as RDF classes)
  ◮ more forms of aggregations
    ◮ e.g., “what is the average age of the researcher-s ?”

slide-28
SLIDE 28

Perspectives as a user interface

◮ even a CNL is not simple enough
◮ need for guided input
  ◮ syntax-based auto-completion (like in Ginseng)
  ◮ query-based faceted search (like in Sewelis)

slide-29
SLIDE 29

Thanks!

Questions?