Graph-Based RDF Knowledge Graph Research Lei Zou Peking - - PowerPoint PPT Presentation



slide-1
SLIDE 1

Graph-Based RDF Knowledge Graph Research

Lei Zou Peking University, China

1

slide-2
SLIDE 2

Collaborators

2

  • Prof. Tamer Özsu, University of Waterloo
  • Prof. Jeffrey Xu Yu, The Chinese University of Hong Kong
  • Prof. Lei Chen, Hong Kong University of Science and Technology
  • Dr. Haixun Wang, Facebook
slide-3
SLIDE 3

Collaborators

3

PhD students (including alumni):
  • Weiguo Zheng, graduated in 2015, post-doc at The Chinese University of Hong Kong
  • Peng Peng, graduated in 2016, assistant professor at Hunan University
  • Shuo Han
  • Seng Hu

Master's students (including alumni):
  • Shuo Yang
  • Xinbo Zhang

slide-4
SLIDE 4

Knowledge Graph

Google launched its Knowledge Graph project in 2012.

4

slide-5
SLIDE 5

Knowledge Graph

Essentially, a KG is a semantic network that models entities (including their properties) and the relations between them.

5

slide-6
SLIDE 6

RDF Data Model

  • RDF is the de facto standard data format for knowledge graphs.
  • Simple triple format: <subject, predicate, object>
  • Represents both the properties of entities and the relations between entities.

[Figure: RDF graph for y:Abraham_Lincoln, with xmlns:y=http://en.wikipedia.org/wiki. Properties: hasName "Abraham Lincoln", BornOnDate "1809-02-12", DiedOnDate "1865-04-15". Relation: DiedIn y:Washington_DC.]

7

slide-7
SLIDE 7

RDF & SPARQL

RDF Datasets

Subject            | Predicate    | Object
Abraham_Lincoln    | hasName      | "Abraham Lincoln"
Abraham_Lincoln    | BornOnDate   | "1809-02-12"
Abraham_Lincoln    | DiedOnDate   | "1865-04-15"
Abraham_Lincoln    | DiedIn       | Washington_DC
Abraham_Lincoln    | bornIn       | Hodgenville_KY
Reese_Witherspoon  | bornOnDate   | "1976-03-22"
Reese_Witherspoon  | bornIn       | New_Orleans_LA
New_Orleans_LA     | foundingYear | "1718"
New_Orleans_LA     | locatedIn    | United_States
United_States      | hasName      | "United States"
United_States      | hasCapital   | Washington_DC
United_States      | foundingYear | "1776"

SPARQL

  SELECT ?name WHERE {
    ?m <bornIn> ?city .
    ?m <hasName> ?name .
    ?m <bornOnDate> ?bd .
    ?city <foundingYear> "1718" .
    FILTER(regex(str(?bd), "1976"))
  }

"Find people who were born in 1976 and whose birthplace is a city founded in 1718."
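The query on this slide can be traced by hand over the triple table. The following is a minimal Python sketch of that trace, not how any RDF engine evaluates SPARQL. It assumes the predicate casing is normalised (bornOnDate rather than BornOnDate). It also adds one hypothetical hasName triple for Reese_Witherspoon, because the table as shown has none and the query would otherwise return no rows.

```python
import re

# Triples from the slide, with predicate casing normalised (an assumption).
TRIPLES = [
    ("Abraham_Lincoln", "hasName", "Abraham Lincoln"),
    ("Abraham_Lincoln", "bornOnDate", "1809-02-12"),
    ("Abraham_Lincoln", "diedOnDate", "1865-04-15"),
    ("Abraham_Lincoln", "diedIn", "Washington_DC"),
    ("Abraham_Lincoln", "bornIn", "Hodgenville_KY"),
    ("Reese_Witherspoon", "bornOnDate", "1976-03-22"),
    ("Reese_Witherspoon", "bornIn", "New_Orleans_LA"),
    ("New_Orleans_LA", "foundingYear", "1718"),
    ("New_Orleans_LA", "locatedIn", "United_States"),
    ("United_States", "hasName", "United States"),
    ("United_States", "hasCapital", "Washington_DC"),
    ("United_States", "foundingYear", "1776"),
    # Hypothetical triple, NOT on the slide; added so the query has an answer.
    ("Reese_Witherspoon", "hasName", "Reese Witherspoon"),
]


def objects(subject, predicate):
    """All objects o such that (subject, predicate, o) is in the dataset."""
    return [o for s, p, o in TRIPLES if s == subject and p == predicate]


def born_1976_in_city_founded_1718():
    """Hand-written evaluation of the slide's four triple patterns plus FILTER."""
    names = []
    for m, p, city in TRIPLES:
        if p != "bornIn":
            continue  # ?m <bornIn> ?city
        if "1718" not in objects(city, "foundingYear"):
            continue  # ?city <foundingYear> "1718"
        for bd in objects(m, "bornOnDate"):  # ?m <bornOnDate> ?bd
            if re.search("1976", bd):  # FILTER(regex(str(?bd), "1976"))
                names.extend(objects(m, "hasName"))  # ?m <hasName> ?name
    return names


names = born_1976_in_city_founded_1718()
print(names)
```

Each triple pattern becomes a scan over the whole list, which is why the later slides look for storage and indexing schemes that avoid such scans.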

slide-8
SLIDE 8

Interdisciplinary Research

KG sits at the intersection of four areas:

  • Knowledge Engineering: KB construction; rule-based reasoning
  • Machine Learning: knowledge representation (graph embedding)
  • Natural Language Processing: information extraction; semantic parsing
  • Database: RDF databases; data integration and knowledge fusion

slide-9
SLIDE 9

Knowledge Engineering

KB construction [Mendes et al. 12; Suchanek et al. 07; Bollacker ]

Organizations shown:
  • Leipzig University; University of Mannheim; OpenLink Software
  • Max-Planck-Institute
  • Metaweb (company acquired by Google in 2010)

Dataset sizes shown: 1.1 billion triples; 180 million triples; 2.5 billion triples

10

slide-10
SLIDE 10

Natural Language Processing

Semantic Parsing [Zettlemoyer et al., UAI 05]

Transforming natural language (NL) sentences into computer-executable, complete meaning representations (MRs) for domain-specific applications. E.g., "Which states border New Mexico?"

Lambda calculus [Alonzo Church, 1940]

"Simply typed Lambda-calculus can express various database query languages such as relational algebra, fixpoint logic and the complex object algebra." [Hillebrand et al., 1996]

11

slide-11
SLIDE 11

Machine Learning

Knowledge Representation: TransE [Bordes et al., NIPS 13]

  • For each triple (Subject, Predicate, Object), the Predicate is modeled as a translation from Subject to Object.
  • Each Subject/Predicate/Object in the KG maps to a multi-dimensional vector.
  • Objective: S + P = O

S      | P       | O
China  | Capital | Beijing
Canada | Capital | Ottawa
……     | ……      | ……

Beijing − China = Ottawa − Canada = Capital
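The translation objective can be illustrated with toy numbers. The sketch below is only an illustration: the 2-D vectors are picked by hand so that S + P lands exactly on O for the two capital triples. In TransE the vectors are learned from the data, and that training step is not shown here.

```python
import math

# Hand-picked toy embeddings (illustrative values, not learned).
EMB = {
    "China": (1.0, 0.0),
    "Beijing": (1.5, 1.0),
    "Canada": (-1.0, 0.5),
    "Ottawa": (-0.5, 1.5),
    "Capital": (0.5, 1.0),
}


def transe_distance(s, p, o):
    """Euclidean norm of S + P - O; a smaller value means a more plausible triple."""
    sv, pv, ov = EMB[s], EMB[p], EMB[o]
    return math.sqrt(sum((a + b - c) ** 2 for a, b, c in zip(sv, pv, ov)))


good = transe_distance("China", "Capital", "Beijing")
also_good = transe_distance("Canada", "Capital", "Ottawa")
bad = transe_distance("China", "Capital", "Ottawa")
print(good, also_good, bad)
```

With these values both true triples score 0, while the corrupted triple (China, Capital, Ottawa) scores about 2.06, so the distance separates plausible from implausible triples.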

slide-12
SLIDE 12

A Fundamental Problem: How to store RDF data and answer SPARQL queries

[RDF triple table and SPARQL query repeated from Slide 7]

How to answer SPARQL queries efficiently?

DBpedia and Freebase each contain more than a billion triples.

Database

slide-13
SLIDE 13

Graph

14

slide-14
SLIDE 14

Graph

15

Graphs are everywhere:

  • Social network
  • Citation network
  • Road network
  • Protein network
  • Knowledge graph
  • Internet

slide-15
SLIDE 15

16

Graph computing differs from traditional computing tasks.

             | Traditional computing                                     | Graph computing
Benchmark    | Solving a dense n-by-n system of linear equations Ax = b  | BFS search over a large graph
Measure      | Floating-point computing power (TFlops/s)                 | GTEPS (giga-traversed edges per second)
Applications | Engineering computing                                     | Data-intensive workloads
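To make the graph-side measure concrete, here is a minimal Python sketch of BFS that counts the edges it traverses and divides by elapsed time. The tiny adjacency list is made up for illustration. The resulting number only follows the spirit of traversed edges per second; it is not the official benchmark procedure.

```python
import time
from collections import deque


def bfs_traversed_edges(adj, source):
    """Breadth-first search; returns (visit order, number of edges scanned)."""
    visited = {source}
    order = [source]
    queue = deque([source])
    edges = 0
    while queue:
        u = queue.popleft()
        for v in adj.get(u, ()):
            edges += 1  # every scanned edge counts as traversed
            if v not in visited:
                visited.add(v)
                order.append(v)
                queue.append(v)
    return order, edges


# Hypothetical toy graph as adjacency lists.
adj = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}

start = time.perf_counter()
order, edges = bfs_traversed_edges(adj, "a")
elapsed = time.perf_counter() - start
teps = edges / elapsed if elapsed > 0 else float("inf")
print(order, edges)
```

The work is dominated by irregular memory accesses rather than floating-point arithmetic, which is the contrast the table draws.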

slide-16
SLIDE 16

17

Graph computing differs from traditional computing tasks.

slide-17
SLIDE 17

Knowledge “GRAPH”

18

[RDF triple table repeated from Slide 7]

slide-18
SLIDE 18

Graph-based RDF Data management

19

KG problems:
  • SPARQL query evaluation
  • Natural language question answering over KG
  • Keyword search over KG
  • Semantic search
  • Ontology-based document retrieval

Graph techniques:
  • Subgraph matching
  • Bipartite graph matching
  • Similarity subgraph search
  • Random walk-based similarity computing

slide-19
SLIDE 19

Graph-based RDF Data management

20

KG problems:
  • SPARQL query evaluation
  • Natural language question answering over KG
  • Keyword search over KG
  • Semantic search
  • Ontology-based document retrieval

Graph techniques (our solution):
  • Subgraph matching
  • Bipartite graph matching
  • Similarity subgraph search
  • Random walk-based similarity computing

slide-20
SLIDE 20

Subgraph Matching-based SPARQL Query Evaluation

21

slide-21
SLIDE 21

A Fundamental Problem: How to store RDF data and answer SPARQL queries

[RDF triple table and SPARQL query repeated from Slide 7]

How to answer SPARQL queries efficiently?

DBpedia and Freebase each contain more than a billion triples.

slide-22
SLIDE 22

Existing Solutions: Resorting to RDBMS techniques

[RDF triple table repeated from Slide 7]

SPARQL

  SELECT ?name WHERE {
    ?m <bornIn> ?city .
    ?m <hasName> ?name .
    ?m <bornOnDate> ?bd .
    ?city <foundingYear> "1718" .
    FILTER(regex(str(?bd), "1976"))
  }

SQL (over a single triple table T)

  SELECT T2.object
  FROM T AS T1, T AS T2, T AS T3, T AS T4
  WHERE T1.property = 'bornIn'
    AND T2.property = 'hasName'
    AND T3.property = 'bornOnDate'
    AND T1.subject = T2.subject
    AND T2.subject = T3.subject
    AND T1.object = T4.subject
    AND T4.property = 'foundingYear'
    AND T4.object = '1718'
    AND T3.object LIKE '%1976%'

Too many self-joins.
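The self-join translation can be run as-is on a small triple table. The sketch below uses Python's standard sqlite3 module with an in-memory database. The table name T and its columns follow the slide. The hasName row for Reese_Witherspoon is a hypothetical addition, since the slide's table has none.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE T (subject TEXT, property TEXT, object TEXT)")
conn.executemany(
    "INSERT INTO T VALUES (?, ?, ?)",
    [
        ("Abraham_Lincoln", "hasName", "Abraham Lincoln"),
        ("Abraham_Lincoln", "bornOnDate", "1809-02-12"),
        ("Abraham_Lincoln", "bornIn", "Hodgenville_KY"),
        ("Reese_Witherspoon", "bornOnDate", "1976-03-22"),
        ("Reese_Witherspoon", "bornIn", "New_Orleans_LA"),
        ("New_Orleans_LA", "foundingYear", "1718"),
        ("United_States", "foundingYear", "1776"),
        # Hypothetical row, not on the slide.
        ("Reese_Witherspoon", "hasName", "Reese Witherspoon"),
    ],
)

# One alias of T per triple pattern: four patterns, four copies of the table.
SQL = """
SELECT T2.object
FROM T AS T1, T AS T2, T AS T3, T AS T4
WHERE T1.property = 'bornIn'
  AND T2.property = 'hasName'
  AND T3.property = 'bornOnDate'
  AND T1.subject = T2.subject
  AND T2.subject = T3.subject
  AND T1.object = T4.subject
  AND T4.property = 'foundingYear'
  AND T4.object = '1718'
  AND T3.object LIKE '%1976%'
"""
rows = conn.execute(SQL).fetchall()
print(rows)
```

Every additional triple pattern in the SPARQL query adds another alias of T, so the number of self-joins grows with the query size.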

23

slide-23
SLIDE 23

Existing Solutions (based on RDBMS techniques)

  • Property tables: Jena [Wilkinson et al., 2003], FlexTable [Wang et al., 2010], DB2-RDF [Bornea et al., 2013]
  • Vertically partitioned tables: SW-Store [Abadi et al., 2009]
  • Exhaustive indexing: RDF-3X [Neumann and Weikum, 2008], Hexastore [Weiss et al., 2008]

Basic idea: divide the single large triple table into several carefully designed tables.
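As one illustration of dividing the triple table, the sketch below applies the vertical partitioning idea in plain Python: one two-column (subject, object) table per predicate. It is a toy rendering of the idea only and does not reflect how SW-Store or any other cited system is implemented. The helper name vertically_partition is made up for this example.

```python
from collections import defaultdict


def vertically_partition(triples):
    """Group triples into one (subject, object) table per predicate."""
    tables = defaultdict(list)
    for s, p, o in triples:
        tables[p].append((s, o))
    return dict(tables)


triples = [
    ("Abraham_Lincoln", "bornIn", "Hodgenville_KY"),
    ("Reese_Witherspoon", "bornIn", "New_Orleans_LA"),
    ("New_Orleans_LA", "foundingYear", "1718"),
    ("United_States", "foundingYear", "1776"),
]
tables = vertically_partition(triples)

# Join bornIn.object = foundingYear.subject, touching only two small tables.
founding = dict(tables["foundingYear"])
result = [
    (person, city)
    for person, city in tables["bornIn"]
    if founding.get(city) == "1718"
]
print(result)
```

A query that mentions two predicates now reads two narrow tables instead of joining the full triple table with itself.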

  • M. T. Özsu. "A Survey of RDF Data Management Systems", Front. Comp. Sci., 2016.
  • Lei Zou, M. T. Özsu. "Graph-based RDF Data Management", Data Science and Engineering, 2(1): 56-70 (2017)

24

slide-24
SLIDE 24

Our Solution---gStore [Zou et al., VLDB 11; VLDB J 14]

Answering SPARQL == subgraph matching
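The equivalence can be shown with a small backtracking matcher: each triple pattern is a query edge, and a match binds every variable so that all query edges exist in the data graph. This is a naive Python sketch with no index and no pruning, not gStore's algorithm. It uses SPARQL's homomorphism semantics, so two variables may bind to the same vertex. The hasName triple is a hypothetical addition to the slide data.

```python
def match(patterns, triples, binding=None):
    """Yield bindings under which every triple pattern is an edge of the data graph."""
    binding = binding or {}
    if not patterns:
        yield dict(binding)
        return
    first, rest = patterns[0], patterns[1:]
    for triple in triples:
        new = dict(binding)
        ok = True
        for term, value in zip(first, triple):
            if term.startswith("?"):  # variable: bind it or check the old binding
                if new.setdefault(term, value) != value:
                    ok = False
                    break
            elif term != value:  # constant: must equal the data value
                ok = False
                break
        if ok:
            yield from match(rest, triples, new)


DATA = [
    ("Abraham_Lincoln", "hasName", "Abraham Lincoln"),
    ("Abraham_Lincoln", "bornOnDate", "1809-02-12"),
    ("Abraham_Lincoln", "bornIn", "Hodgenville_KY"),
    ("Reese_Witherspoon", "bornOnDate", "1976-03-22"),
    ("Reese_Witherspoon", "bornIn", "New_Orleans_LA"),
    ("New_Orleans_LA", "foundingYear", "1718"),
    ("Reese_Witherspoon", "hasName", "Reese Witherspoon"),  # hypothetical
]
QUERY = [
    ("?m", "bornIn", "?city"),
    ("?m", "hasName", "?name"),
    ("?m", "bornOnDate", "?bd"),
    ("?city", "foundingYear", "1718"),
]

# The FILTER is applied to the matches afterwards.
names = [b["?name"] for b in match(QUERY, DATA) if "1976" in b["?bd"]]
print(names)
```

The matcher scans all triples for every pattern; the encoding and index techniques on the following slides exist to avoid exactly that.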

25

slide-25
SLIDE 25

Our Solution---gStore [Zou et al., VLDB 11; VLDB J 14 ]

26

Main Techniques:

  • Store the RDF graph G as adjacency lists;
  • Neighborhood structure summarization: encoding;
  • Structure-aware index: VS*-tree.
slide-26
SLIDE 26

Our Solution---gStore

28

Why encode the neighborhood?

Neighborhood pruning: if a vertex u in query graph Q can match a vertex v in data graph G, then every neighbor of u must match some neighbor of v; otherwise, u cannot match v.
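One way to apply this rule cheaply is to summarise each vertex's neighborhood as a bit string and compare bit strings before doing any real matching. The Python sketch below is a simplified illustration. The 64-bit width, the CRC32 hash, and the restriction to outgoing edges are assumptions made for this example and are not taken from gStore.

```python
import zlib

BITS = 64  # signature width; an arbitrary choice for this sketch


def bit(label):
    """Map a label to a single bit position via a deterministic hash."""
    return 1 << (zlib.crc32(label.encode("utf-8")) % BITS)


def signature(edges):
    """OR together one bit per adjacent edge label and one per known neighbor."""
    sig = 0
    for predicate, neighbor in edges:
        sig |= bit("p:" + predicate)
        if neighbor is not None:  # None stands for a query variable
            sig |= bit("n:" + neighbor)
    return sig


def may_match(query_sig, data_sig):
    """Necessary condition: every bit set for u must also be set for v."""
    return (query_sig & data_sig) == query_sig


# Query vertex ?city has one outgoing edge: <foundingYear> "1718".
q_city = signature([("foundingYear", "1718")])

# Outgoing edges of two data vertices from the example dataset.
new_orleans = signature([("foundingYear", "1718"), ("locatedIn", "United_States")])
hodgenville = signature([])  # no outgoing edges in the example

print(may_match(q_city, new_orleans), may_match(q_city, hodgenville))
```

The test can produce false positives when two labels hash to the same bit, but never false negatives, so a vertex that fails it can be discarded safely.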

slide-27
SLIDE 27

Our Solution---gStore

29

slide-28
SLIDE 28

Our Solution---gStore

30

slide-29
SLIDE 29

Our Solution---gStore

31

slide-30
SLIDE 30

Our Solution---gStore

32

slide-31
SLIDE 31

Challenge: How to find "crossing matches"

33

gStore-D: Distributed RDF System

[Peng P, et al., VLDB J 16]

slide-32
SLIDE 32

gStore-D: Distributed RDF System

[Peng P, et al., VLDB J 16]

Main Techniques:

  • Partial evaluation and assembly-based solution;
  • Optimized assembly strategy in the distributed setting.

[Figure: three-stage pipeline. Initialization: SPARQL queries are sent to sites S1, S2, ..., Sn. Partial Evaluation: each site Si computes its local partial matches. Assembly: all local partial matches are assembled into SPARQL matches.]

slide-33
SLIDE 33

Background: Partial Evaluation [Jones, 1996; Fan et al., 06; Shuai et al., 2012]

35

gStore-D: Distributed RDF System

[Peng P, et al., VLDB J 16]

$f(x) = f(s, d) = f'(f''(s), d)$

Here $s$ is the known input, $d$ is the unknown input, $f''(s)$ gives the partial results, and $f'$ produces the final results.

slide-34
SLIDE 34

Which are the "known inputs" and the "partial results"?

gStore-D: Distributed RDF System

Known inputs: the graph fragment stored at each site and the query graph Q.

Partial results: the maximal partial matches of query graph Q over the site's own fragment of the data graph.
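The two phases can be sketched in Python on a toy two-site partition. This is a deliberately simplified stand-in: each site returns bindings per triple pattern rather than the maximal local partial matches described above, and the assembly is a brute-force join rather than the optimized strategy of the paper. The function names and the hasName triple are hypothetical.

```python
from itertools import product


def local_partial_matches(fragment, patterns):
    """Site-side step: for each triple pattern, the bindings the local triples support."""
    per_pattern = []
    for pattern in patterns:
        found = []
        for triple in fragment:
            b, ok = {}, True
            for term, value in zip(pattern, triple):
                if term.startswith("?"):
                    if b.setdefault(term, value) != value:
                        ok = False
                        break
                elif term != value:
                    ok = False
                    break
            if ok:
                found.append(b)
        per_pattern.append(found)
    return per_pattern


def assemble(site_results, n_patterns):
    """Coordinator step: one binding per pattern from any site, kept if consistent."""
    pools = [[b for site in site_results for b in site[i]] for i in range(n_patterns)]
    matches = []
    for combo in product(*pools):
        merged = {}
        if all(merged.setdefault(k, v) == v for b in combo for k, v in b.items()):
            matches.append(merged)
    return matches


QUERY = [
    ("?m", "bornIn", "?city"),
    ("?m", "hasName", "?name"),
    ("?city", "foundingYear", "1718"),
]
S1 = [
    ("Reese_Witherspoon", "bornIn", "New_Orleans_LA"),
    ("Reese_Witherspoon", "hasName", "Reese Witherspoon"),  # hypothetical triple
    ("Abraham_Lincoln", "bornIn", "Hodgenville_KY"),
]
S2 = [
    ("New_Orleans_LA", "foundingYear", "1718"),
    ("United_States", "foundingYear", "1776"),
]

partial = [local_partial_matches(site, QUERY) for site in (S1, S2)]
matches = assemble(partial, len(QUERY))
print(matches)
```

Neither site can answer the query alone, because the bornIn edge lives in S1 and the foundingYear edge in S2. The single result is a crossing match that only appears after assembly.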

slide-35
SLIDE 35

gStore-D: Distributed RDF System

[Peng P, et al., VLDB J 16]

[Figure: three-stage pipeline. Initialization: SPARQL queries are sent to sites S1, S2, ..., Sn. Partial Evaluation: each site Si computes its local partial matches. Assembly: all local partial matches are assembled into SPARQL matches.]

slide-36
SLIDE 36

Assembly

38

gStore-D: Distributed RDF System

slide-37
SLIDE 37

Our System

Code: more than 140,000 lines of C++, written from scratch

Project address: https://github.com/Caesar11/gStore/ (includes all code, the user manual, the benchmark test report, and a system demo video)

License: BSD

APIs: C++, Java, Python, PHP, and HTTP REST

Supports SPARQL 1.1 (including UNION, OPTIONAL, FILTER, GROUP BY, BIND)

39

slide-38
SLIDE 38

Our System

Capability: a single site can support a large KG with more than FIVE billion edges (e.g., the full versions of DBpedia and Freebase on a single machine).

Performance: see our system performance report on GitHub.

Endpoints:
  • http://dbpedia.gstore-pku.com
  • http://freebase.gstore-pku.com

40

slide-39
SLIDE 39

Third-Party Comments

  • Excerpted from the paper "Querying RDF Data Using A Multigraph-based Approach", in Proceedings of EDBT 2016

[Vijay Ingalalli, Dino Ienco, Pascal Poncelet, Serena Villata: Querying RDF Data Using A Multigraph-based Approach. EDBT 2016: 245-256]

  • LIRMM, IRSTEA
  • CNRS, I3S Laboratory

DBpedia: 33 million triples, 4 million vertices

Comparative System  | System Features                                                  | Comments
Apache Jena         | Open-source RDF database; originally from HP Labs                | "x-RDF-3x, Jena are not able to output results for size 20 onwards"
x-RDF-3x            | Influential academic system, from Max-Planck-Institute           | (same comment as Apache Jena)
Virtuoso            | Commercial system                                                | "Virtuoso seems to become less robust with the increasing query size"
gStore (our system) | Open-source system on GitHub [Zou et al., VLDB 2011]             | "the time performance of gStore seems better than Virtuoso"

Average time (seconds) for a sample of 200 complex queries on DBpedia [Ingalalli et al., EDBT 16]:

System   | Average time
gStore   | 11.96 sec
Virtuoso | 20.45 sec
RDF-3x   | >60 sec

41

slide-40
SLIDE 40

gStore Application

42

  • Institute of Microbiology, CAS: World Data Center for Microorganisms

# of triples: 3,594,457,749
# of entities: 414,953,654

"Searching strains of Micrococcus luteus"

  PREFIX annotation: <http://gcm.wdcm.org/ontology/gcmAnnotation/v1/>
  PREFIX taxonomy: <http://gcm.wdcm.org/data/gcmAnnotation1/taxonomy/>
  SELECT ?taxonId ?name WHERE {
    ?taxonId annotation:parentTaxid taxonomy:1270 .
    ?nameId annotation:taxid ?taxonId .
    ?nameId annotation:nameclass 'scientificName' .
    ?nameId annotation:taxname ?name .
  }

Taxonomy shown: Micrococcus luteus; Bacteria; Terrabacteria; Actinobacteria (phylum); Actinobacteria (class); Micrococcales (order); Micrococcaceae (family); Micrococcus (genus)

slide-41
SLIDE 41

43

gStore Application

  • Institute of Microbiology, CAS: World Data Center for Microorganisms

# of triples: 3,594,457,749
# of entities: 414,953,654

"The number of genes related to Micrococcus luteus and its descendants"

  PREFIX annotation: <http://gcm.wdcm.org/ontology/gcmAnnotation/v1/>
  PREFIX taxonomy: <http://gcm.wdcm.org/data/gcmAnnotation1/taxonomy/>
  SELECT (COUNT(?geneid) AS ?num) WHERE {
    {
      ?taxonid annotation:ancestorTaxid taxonomy:1270 .
      ?geneid a annotation:GeneNode .
      ?geneid annotation:x-taxon ?taxonid .
    } UNION {
      ?geneid a annotation:GeneNode .
      ?geneid annotation:x-taxon taxonomy:1270 .
    }
  }
slide-42
SLIDE 42

44

gStore Application

  • Institute of Microbiology, CAS: World Data Center for Microorganisms

# of triples: 3,594,457,749
# of entities: 414,953,654

"Searching for the genes and descriptions related to Micrococcus luteus and its descendants"

  PREFIX annotation: <http://gcm.wdcm.org/ontology/gcmAnnotation/v1/>
  PREFIX taxonomy: <http://gcm.wdcm.org/data/gcmAnnotation1/taxonomy/>
  SELECT ?taxonid ?name ?genomeid ?description ?strain WHERE {
    ?taxonid annotation:ancestorTaxid taxonomy:1270 .
    ?nameId a annotation:TaxonName .
    ?nameId annotation:taxid ?taxonid .
    ?nameId annotation:nameclass 'scientificName' .
    ?nameId annotation:taxname ?name .
    ?genomeid a annotation:GenomeNode .
    ?genomeid annotation:x-taxon ?taxonid .
    ?genomeid annotation:definition ?description .
    OPTIONAL { ?genomeid annotation:strain ?strain . }
  }

slide-43
SLIDE 43

Subgraph Matching-based Natural Language Question/Answering

45

slide-44
SLIDE 44

KG-based Question/Answering

  • SPARQL syntax is too complex for ordinary users.
  • An RDF KG is "schema-less" data, unlike a schema-first relational database.

46

slide-45
SLIDE 45
  • An Easy-to-Use Interface to Access Knowledge Graph
  • It is interesting to both academia and industry.
  • Interdisciplinary research between the database and NLP (natural language processing) communities.

47

KG-based Question/Answering

slide-46
SLIDE 46
  • Information Retrieval-based
      • Generate candidate answers
      • Ranking
  • Semantic Parsing-based
      • Translate NLQ to logical forms
      • Executing

48

KG-based Question/Answering

slide-47
SLIDE 47
  • Information Retrieval-based

50

KG-based Question/Answering

Question: "What is the budget of the film directed by Paul Anderson?"

[Figure: KG fragment with vertices Paul.W.S.Anderson, film, Resident_Evil, and "6.5E7", connected by edges labelled director, type, and budget.]

slide-48
SLIDE 48
  • Information Retrieval-based

51

KG-based Question/Answering

Question: "What is the budget of the film directed by Paul Anderson?"

  • Step 1: Mentioned entity (Paul.W.S.Anderson)
  • Step 2: Candidate answer selection (within 2 hops)
  • Step 3: Ranking answers ("6.5E7")

[Figure: KG fragment with vertices Paul.W.S.Anderson, film, Resident_Evil, and "6.5E7", connected by edges labelled director, type, and budget.]
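The three steps can be sketched in Python on the fragment shown in the figure. The edge directions are one reading of the figure. The ranking rule, which prefers candidates reached through an edge named by a question keyword, is a made-up stand-in for the feature-based ranking a real system would use.

```python
# KG fragment as read from the figure (edge directions are an assumption).
KG = [
    ("Resident_Evil", "type", "film"),
    ("Resident_Evil", "director", "Paul.W.S.Anderson"),
    ("Resident_Evil", "budget", "6.5E7"),
]


def neighbors(triples, node):
    """Vertices one hop away from node, ignoring edge direction."""
    out = set()
    for s, _, o in triples:
        if s == node:
            out.add(o)
        if o == node:
            out.add(s)
    return out


def candidates_within_two_hops(triples, entity):
    """Step 2: every vertex reachable from the mentioned entity in at most 2 hops."""
    one_hop = neighbors(triples, entity)
    two_hop = set()
    for n in one_hop:
        two_hop |= neighbors(triples, n)
    return (one_hop | two_hop) - {entity}


def rank(triples, candidates, keyword):
    """Step 3 (toy rule): candidates that are objects of a `keyword` edge come first."""
    def hit(c):
        return any(p == keyword and o == c for _, p, o in triples)
    return sorted(candidates, key=lambda c: (not hit(c), c))


entity = "Paul.W.S.Anderson"  # Step 1: the mentioned entity, assumed already linked
cands = candidates_within_two_hops(KG, entity)
ranked = rank(KG, cands, "budget")
print(ranked)
```

The answer "6.5E7" sits two hops from the mentioned entity, which is why the candidate set has to look beyond direct neighbors.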

slide-49
SLIDE 49

Our Approach- Data Driven gAnswer [Zou et al, SIGMOD 14]

  • Uses a graph matching-based method
  • Graph matching-based disambiguation
  • Combines disambiguation and query evaluation

52

slide-50
SLIDE 50

Our Approach- Data Driven Solution gAnswer

53

Using two dictionaries:

  • Entity name dictionary: entity mention extraction and linking
  • Relation paraphrasing dictionary: relation mention extraction and mapping

slide-51
SLIDE 51

Our Approach- Data Driven Solution gAnswer

54

slide-52
SLIDE 52

Our Approach- Data Driven Solution gAnswer

55

slide-53
SLIDE 53

Our Approach- Data Driven Solution gAnswer

56

slide-54
SLIDE 54

Our Approach- Data Driven Solution gAnswer

57

slide-55
SLIDE 55

Our Approach- Data Driven Solution gAnswer

58

slide-56
SLIDE 56

Our Approach- Data Driven Solution gAnswer

59

slide-57
SLIDE 57

Our Approach- Data Driven Solution gAnswer

60

slide-58
SLIDE 58

Our Approach- Data Driven Solution gAnswer

61

slide-59
SLIDE 59

Online Demo

URL: http://ganswer.gstore-pku.com/

slide-60
SLIDE 60

Keyword Search Over RDF graphs

  • --a query graph assembly approach

63

slide-61
SLIDE 61

Motivation

64

SPARQL vs Keywords

  • Easy-to-use RDF query interfaces:
      • Natural Language Query Answering (NL-QA): "Which scientists graduated from a university located in the USA?"
      • Keyword search: "scientist graduate from university USA"
          • more concise and flexible
slide-62
SLIDE 62

Challenges

65

  • Understanding the query intention accurately
  • ambiguity of keywords – multiple ways to “interpret” a keyword

"USA": res:United_States, res:USA_Today, res:USA_(album)

"graduate from": dbo:almaMater, dbo:education, dbo:college

Effectiveness

slide-63
SLIDE 63

Challenges

66

  • Understanding the query intention accurately
  • ambiguity of keywords – multiple ways to “interpret” a keyword
  • ambiguity of query structures – multiple ways to “assemble” the query

[Figure: two different query graphs assembled from the same elements: dbo:Scientist, dbo:University, res:United_States, dbo:almaMater, dbo:country.]

Effectiveness

slide-64
SLIDE 64

Our Task

67

  • We study keyword search on RDF graphs.
  • Given a keyword token sequence $RQ = \{k_1, k_2, \ldots, k_m\}$, our task is to interpret $RQ$ as a query graph $Q$.

Example: $RQ$ = "scientist graduate from university USA" is interpreted as the query graph $Q$:

  ?x rdf:type dbo:Scientist .
  ?y rdf:type dbo:University .
  ?x dbo:almaMater ?y .
  ?y dbo:country res:United_States .

slide-65
SLIDE 65

Solution Overview

69

Elementary Query Graph Building Blocks

  • "scientist": V1 = { dbo:Scientist }
  • "university": V2 = { dbo:University }
  • "USA": V3 = { res:United_States, res:USA_Today }
  • "graduate from": E1 = { dbo:almaMater, dbo:education }
  • "locate": E2 = { dbo:country, dbo:location }

Query Graph Assembly

slide-66
SLIDE 66

QGA Problem

70

  • Query Graph Assembly Problem (QGA):
  • Given $n$ vertex terms $t_i^v$ ($i = 1, \ldots, n$), each $t_i^v$ is matched to a set $V_i$ of candidate entity/class vertices;
  • and $m$ edge terms $t_j^e$ ($j = 1, \ldots, m$), each $t_j^e$ is matched to a set $E_j$ of candidate predicate edges.
  • A valid assembly query graph $Q(V_Q, E_Q)$ must satisfy the following constraints:
      • each set $V_i$ has exactly one vertex in $V_Q$;
      • each set $E_j$ has exactly one edge in $E_Q$.

Definition

slide-67
SLIDE 67

QGA Problem

71

  • $cost(Q) = \sum_{e(v_1, v_2, p) \in Q} w(v_1, v_2, p)$
  • where $w(v_1, v_2, p)$ denotes the triple assembly cost.
  • The query graph assembly (QGA) problem is to construct a valid query graph $Q$ with the minimum $cost(Q)$.
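For very small inputs the minimum-cost query graph can be found by exhaustive enumeration. The Python sketch below is such a brute-force baseline, not the method of the talk. The cost table W holds made-up numbers, and the rule that two edge terms may not connect the same pair of vertex terms is an assumption of this example.

```python
from itertools import combinations, permutations, product

# Candidate sets for "scientist graduate from university USA".
V = [["dbo:Scientist"], ["dbo:University"], ["res:United_States", "res:USA_Today"]]
E = [["dbo:almaMater", "dbo:education"], ["dbo:country", "dbo:location"]]

# Hypothetical triple assembly costs w(v1, v2, p); unlisted triples get DEFAULT.
W = {
    ("dbo:Scientist", "dbo:University", "dbo:almaMater"): 0.1,
    ("dbo:University", "res:United_States", "dbo:country"): 0.2,
    ("dbo:Scientist", "dbo:University", "dbo:education"): 0.6,
    ("dbo:University", "res:USA_Today", "dbo:location"): 0.9,
}
DEFAULT = 5.0


def w(v1, v2, p):
    """Look the cost up in either direction and keep the cheaper one."""
    return min(W.get((v1, v2, p), DEFAULT), W.get((v2, v1, p), DEFAULT))


def assemble_brute_force():
    """Try every vertex choice, predicate choice, and edge placement."""
    best = None
    slots = list(combinations(range(len(V)), 2))  # pairs of vertex terms
    for vs in product(*V):  # exactly one vertex from each V_i
        for ps in product(*E):  # exactly one predicate from each E_j
            for ends in permutations(slots, len(E)):  # a distinct pair per edge
                graph = [(vs[i], vs[j], p) for (i, j), p in zip(ends, ps)]
                cost = sum(w(*t) for t in graph)
                if best is None or cost < best[0]:
                    best = (cost, graph)
    return best


best_cost, best_graph = assemble_brute_force()
print(best_cost, best_graph)
```

The number of combinations grows exponentially with the number of terms, so this enumeration is only usable as a correctness check on toy inputs.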

Cost Function

slide-68
SLIDE 68

Assembly Cost

72

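  • $w(v_1, v_2, p) = \mathrm{MIN}(\lVert v_1 + p - v_2 \rVert, \lVert v_2 + p - v_1 \rVert)$

TransE Model

[Figure: vectors $s$, $p$, $s + p$, and $o$ drawn in the x-y plane with origin O.]

$s + p \approx o$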
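The assembly cost can be computed directly from embedding vectors. The Python sketch below uses the Euclidean norm and hand-picked 2-D vectors. Both choices are assumptions for illustration, since the slide fixes neither the norm nor the embedding values.

```python
import math


def norm(vec):
    """Euclidean length of a vector."""
    return math.sqrt(sum(x * x for x in vec))


def assembly_cost(v1, v2, p):
    """MIN(||v1 + p - v2||, ||v2 + p - v1||): try the predicate in both directions."""
    forward = norm([a + b - c for a, b, c in zip(v1, p, v2)])
    backward = norm([a + b - c for a, b, c in zip(v2, p, v1)])
    return min(forward, backward)


# Hand-picked toy embeddings (illustrative values, not learned).
scientist = (0.0, 0.0)
university = (1.0, 1.0)
alma_mater = (1.0, 1.0)
country = (3.0, -4.0)

cheap = assembly_cost(scientist, university, alma_mater)
same_reversed = assembly_cost(university, scientist, alma_mater)
costly = assembly_cost(scientist, university, country)
print(cheap, same_reversed, costly)
```

Taking the minimum over both directions makes the cost independent of which endpoint is listed first, which matters because a keyword query does not state the direction of an edge.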

slide-69
SLIDE 69

QGA Problem

73

  • Theorem: The QGA problem is NP-complete.
  • Proof: We reduce the 3-SAT problem to QGA.

Hardness

slide-70
SLIDE 70

Bipartite Graph Model

74

Grouped nodes:

Vertex-pair groups ($V_i \times V_j$):
  • V1 × V2: <dbo:Scientist, dbo:University>
  • V1 × V3: <dbo:Scientist, res:USA_Today>, <dbo:Scientist, res:United_States>
  • V2 × V3: <dbo:University, res:United_States>, <dbo:University, res:USA_Today>

Edge groups ($E_i$):
  • E1: dbo:almaMater, dbo:education
  • E2: dbo:country, dbo:location

slide-71
SLIDE 71

Experiments

QALD is a series of evaluation campaigns on question answering over linked data.

QALD-6 Competition Results

slide-72
SLIDE 72

IMPROVE-QA: An Interactive Mechanism for RDF Question/Answering Systems

83

slide-73
SLIDE 73

84

WHY? & WHY NOT?

Which actress was born in countries in Europe?

Entities shown: <Marilyn_Monroe>, <Judy_Garland>, <Lana_Turner>, <Audrey_Hepburn>, <Marlene_Dietrich>, <Eva_Green>, <Elizabeth_Taylor>

[Figure annotations on the entities: WHY NOT, WHY NOT, WHY]

Motivation

slide-74
SLIDE 74

Framework: IMPROVE-QA [Xinbo Zhang et al., WWW 17 poster & SIGMOD 18 demo]

Demo Group 2, Wednesday 14:00-15:30

Question: "Which actress was born in countries in Europe?"

[Figure: the ordinary query Q and the refined query composed of Q1 and Q2, with result sets labelled Q(D), Q+(D), Q-(D), QΔ(D), and R. Query elements shown: ?actress, ?city, ?country, <occupation>, <Actress>, <birthPlace>, <deathPlace>, <country>, <type>, <Country>, <EuropeanCountry>.]

No. | Entity
n1  | <Marilyn_Monroe>
n2  | <Judy_Garland>
n3  | <Lana_Turner>
r1  | <Audrey_Hepburn>
r2  | <Marlene_Dietrich>
r3  | <Eva_Green>
r4  | <Elizabeth_Taylor>

slide-75
SLIDE 75

Semantic Search---a graph similarity- based method

86

slide-76
SLIDE 76

Enumerating all ?

"Schema-less" leads to "schema variety".

E.g., in DBpedia, "Germanic Vehicles" has at least FIVE different schemas.

87

Motivation

slide-77
SLIDE 77

Semantic Similarity Search [Weiguo Zheng et al., VLDB 2016]

Key issue: how to define a "graph similarity function" in the context of KGs?

slide-78
SLIDE 78

Take-home Message

Graph-based KG data management is a feasible strategy. We need to re-consider graph computing techniques in the context of KG.

98

slide-79
SLIDE 79

Thanks

zoulei@pku.edu.cn

99