MPRI Internship Defense: Advances in Holistic Ontology Alignment



SLIDE 1

Background Paris Performance Joins Theory Literals Application to IE Conclusion

MPRI Internship Defense
Advances in Holistic Ontology Alignment

Antoine Amarilli
Supervised by Pierre Senellart

Télécom ParisTech

SLIDE 2

The Semantic Web

Facts on the Web:
<p><b>Paris</b> is the <a href="Capital_city">capital</a> of <a href="France">France</a></p>

Facts on the semantic Web:
Paris --capitalOf--> France

  • The Web. Lots of information in semi-structured HTML documents.
  • The semantic Web. An effort to represent information in a structured and semantic way.
  • Uses. Interoperability, integration of sources, constraints, complex queries, inference.

SLIDE 3

Ontologies

[Figure: example ontology graph. dbp:Paris and dbp:France are joined by an edge labeled dbp:capital; dbp:Paris has an edge labeled foaf:homepage to http://www.paris.fr/ and an edge labeled foaf:name to the literal 'Paris'.]

  • Ontologies are the information sources of the Semantic Web.
  • Vertices are entities or literals.
  • Edges are facts labeled with a relation.
  • Sources: manual creation, existing databases, information extraction.
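To make the graph view concrete, here is a minimal Python sketch that stores an ontology as a list of (subject, relation, object) triples and indexes the outgoing edges of each vertex. The direction of the dbp:capital edge, the convention that literals are written with surrounding quotes, and the helper names `is_literal` and `outgoing_edges` are assumptions made for this illustration, not part of the slides.

```python
from collections import defaultdict

# Each fact is an edge of the graph: (subject, relation, object).
# The direction of dbp:capital is an assumption for this sketch.
FACTS = [
    ("dbp:France", "dbp:capital", "dbp:Paris"),
    ("dbp:Paris", "foaf:homepage", "http://www.paris.fr/"),
    ("dbp:Paris", "foaf:name", "'Paris'"),
]


def is_literal(node):
    """Illustrative convention: literals are written between single quotes."""
    return node.startswith("'") and node.endswith("'")


def outgoing_edges(facts):
    """Map every subject to the list of (relation, object) edges leaving it."""
    index = defaultdict(list)
    for subject, relation, obj in facts:
        index[subject].append((relation, obj))
    return dict(index)


if __name__ == "__main__":
    for subject, edges in outgoing_edges(FACTS).items():
        print(subject, edges)
```

Keeping facts as plain triples mirrors the "vertices and labeled edges" description; a real system would use a dedicated RDF store or library.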

SLIDE 4

Linked Data Cloud

[Figure: excerpt of the Linking Open Data cloud diagram, showing datasets such as DBpedia, UniProt, PubMed, DBLP, Drug Bank, Gene Ontology and Project Gutenberg.]

  • Many ontologies are created independently: different entities and relations express the same things.
  • Linked Data: integrate existing ontologies in a network structured by equality links between equivalent concepts.
  • To automatically derive those links, we need to perform ontology alignment.

SLIDE 5

Ontology Alignment

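Sometimes URIs do not help us and literals are ambiguous or have minor differences...

[Figure: two graphs, listed here as subject, relation, object]

dbp:Charles_Brackett      foaf:name        'Charles William Brackett'
dbp:Titanic_(1953_film)   dbp:producer     dbp:Charles_Brackett
dbp:Titanic_(1953_film)   foaf:name        'Titanic'
imdb:p138992              imdb:label       'Charles Brackett'
imdb:p138992              imdb:producerOf  imdb:tt0046435
imdb:tt0046435            imdb:label       'Titanic'

Sometimes the structures of the two ontologies do not match...

[Figure: two graphs, listed here as subject, relation, object]

dbp:Douglas_Adams                dbp:birthDate  '1952-03-11'
bnb:AdamsDouglas1952-2001        bio:event      bnb:AdamsDouglas1952-2001/birth
bnb:AdamsDouglas1952-2001/birth  rdf:type       bio:Birth
bnb:AdamsDouglas1952-2001/birth  bio:date       '1952'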

SLIDE 6

Table of Contents

1. Background: the Semantic Web
2. The Paris System
3. Performance Improvements
4. Join Relations
5. Theoretical Analysis
6. Approximate Literal Matching
7. Application to Information Extraction
8. Conclusion

SLIDE 7

Paris

Paris: Probabilistic Alignment of Relations, Instances, and Schema. To bootstrap a matching, Paris uses an equality function on literals and applies propagation rules.

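[Figure: two diagrams of the propagation rules, each relating a fact r(x, y) and a fact r'(x', y') through x ≡ x', r ⊆ r' and y ≡ y'.]

The rules are represented as a system of equations which we iterate until a fixpoint is reached:

\[
\mathrm{Pr}_{n+1}(x \equiv x') = 1 - \prod_{\substack{r(x,y) \\ r'(x',y')}}
  \bigl(1 - \mathrm{Pr}_n(r' \subseteq r) \times \mathrm{fun}^{-1}(r) \times \mathrm{Pr}_n(y \equiv y')\bigr)
  \times
  \bigl(1 - \mathrm{Pr}_n(r \subseteq r') \times \mathrm{fun}^{-1}(r') \times \mathrm{Pr}_n(y \equiv y')\bigr)
\]

\[
\mathrm{Pr}_{n+1}(r \subseteq r') =
  \frac{\sum_{r(x,y)} \Bigl(1 - \prod_{r'(x',y')} \bigl(1 - (\mathrm{Pr}_n(x \equiv x') \times \mathrm{Pr}_n(y \equiv y'))\bigr)\Bigr)}
       {\sum_{r(x,y)} \Bigl(1 - \prod_{x',y'} \bigl(1 - \mathrm{Pr}_n(x \equiv x') \times \mathrm{Pr}_n(y \equiv y')\bigr)\Bigr)}
\]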
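The entity alignment update can be sketched in a few lines of Python. This is a naive illustration of the equation, not the Paris implementation (which is written in Java): it enumerates all pairs of facts, and the dictionary layout, the function name `entity_alignment_step` and the numeric values in the example are assumptions chosen for readability.

```python
def entity_alignment_step(facts_a, facts_b, sub_ba, sub_ab,
                          inv_fun_a, inv_fun_b, prev):
    """Compute Pr_{n+1}(x = x') for every pair (x, x') with some evidence.

    facts_a, facts_b: lists of (x, r, y) facts of the two ontologies.
    sub_ba[(r2, r)]:  Pr_n(r' included in r), with r2 standing for r'.
    sub_ab[(r, r2)]:  Pr_n(r included in r').
    inv_fun_a[r], inv_fun_b[r2]: inverse functionalities fun^-1.
    prev[(y, y2)]:    Pr_n(y = y'); missing pairs count as 0.
    """
    product = {}
    for x, r, y in facts_a:
        for x2, r2, y2 in facts_b:
            p_y = prev.get((y, y2), 0.0)
            if p_y == 0.0:
                continue  # this pair of facts contributes a factor of 1
            factor = ((1 - sub_ba.get((r2, r), 0.0) * inv_fun_a[r] * p_y)
                      * (1 - sub_ab.get((r, r2), 0.0) * inv_fun_b[r2] * p_y))
            product[(x, x2)] = product.get((x, x2), 1.0) * factor
    return {pair: 1 - value for pair, value in product.items()}


# Tiny example with made-up probabilities: equal literals are aligned with
# probability 1, and each relation is half-believed to match its namesake.
FACTS_A = [("a:Elvis", "a:name", "'Elvis Presley'"),
           ("a:Elvis", "a:birthdate", "'1935-01-08'")]
FACTS_B = [("b:Elvis", "b:name", "'Elvis Presley'"),
           ("b:Elvis", "b:birthdate", "'1935-01-08'")]
PREV = {("'Elvis Presley'", "'Elvis Presley'"): 1.0,
        ("'1935-01-08'", "'1935-01-08'"): 1.0}
SUB_AB = {("a:name", "b:name"): 0.5, ("a:birthdate", "b:birthdate"): 0.5}
SUB_BA = {("b:name", "a:name"): 0.5, ("b:birthdate", "a:birthdate"): 0.5}
INV_FUN_A = {"a:name": 1.0, "a:birthdate": 1.0}
INV_FUN_B = {"b:name": 1.0, "b:birthdate": 1.0}

RESULT = entity_alignment_step(FACTS_A, FACTS_B, SUB_BA, SUB_AB,
                               INV_FUN_A, INV_FUN_B, PREV)
```

In this example each of the two matching fact pairs contributes a factor 0.5 × 0.5, so the two Elvis entities end up aligned with probability 1 − 0.25² = 0.9375: independent pieces of evidence accumulate.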
SLIDE 8

Paris by Example

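[Figure: two ontologies with the same structure, listed here as subject, relation, object]

a:Elvis      a:name       'Elvis Presley'
a:Elvis      a:birthdate  '1935-01-08'
a:Elvis      a:spouse     a:Priscilla
a:Priscilla  a:name       'Priscilla Presley'

b:Elvis      b:name       'Elvis Presley'
b:Elvis      b:birthdate  '1935-01-08'
b:Elvis      b:spouse     b:Priscilla
b:Priscilla  b:name       'Priscilla Presley'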

SLIDE 12

Relation Functionalities

A  name      'Eiffel tower'
A  position  '48.8583°N 2.2945°E'
B  name      'Tour Eiffel'
B  position  '48.8583°N 2.2945°E'

  • Two instances should be aligned when they share the same values for aligned functional relations.
  • In theory, the ontology schema should indicate which relations are functional.
  • In practice, no schema, and no “strict” functionality: compute a fuzzy functionality in [0, 1] from the data.
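A fuzzy functionality can be estimated directly from the facts. The sketch below uses one natural definition, the number of distinct subjects of a relation divided by its number of facts; the slides do not spell out the formula, so treat this choice, the function names and the sample facts as assumptions.

```python
def functionality(facts, relation):
    """Distinct subjects / distinct facts of `relation`; 1.0 means functional."""
    pairs = {(x, y) for x, r, y in facts if r == relation}
    if not pairs:
        return 0.0
    return len({x for x, _ in pairs}) / len(pairs)


def inverse_functionality(facts, relation):
    """Same ratio computed on the objects, i.e. functionality of the inverse."""
    pairs = {(x, y) for x, r, y in facts if r == relation}
    if not pairs:
        return 0.0
    return len({y for _, y in pairs}) / len(pairs)


# Made-up facts: one entity with two names and a single position.
SAMPLE = [
    ("A", "name", "'Eiffel tower'"),
    ("A", "name", "'Tour Eiffel'"),
    ("A", "position", "'48.8583°N 2.2945°E'"),
]
```

Here `functionality(SAMPLE, "name")` is 0.5 because the entity has two names, while `functionality(SAMPLE, "position")` is 1.0, so a shared position is stronger evidence for alignment than a shared name.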

SLIDE 13

Existing Implementation and Previous Results

Paris is implemented in Java. Paris was evaluated on:

  • toy datasets from the OAEI,
  • DBpedia and Yago (two ontologies extracted from Wikipedia),
  • Yago and IMDb.

The evaluation is done in terms of precision, recall and F-measure.

                  Instances            Classes              Relations
                  Prec   Rec    F      Prec   Rec    F      Prec   Rec    F
OAEI person       100%   100%   100%   100%   100%   100%   100%   100%   100%
OAEI restaurant   95%    88%    91%    100%   100%   100%   100%   66%    88%
DBpedia–Yago      90%    73%    81%    94%    -      -      93%    -      -
IMDb–Yago         94%    90%    92%    28%    -      -      100%   80%    89%
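As a reminder of how the third column of each group relates to the first two, the F-measure is the harmonic mean of precision and recall. The short Python sketch below recomputes it for a few instance-alignment rows of the table; the function name `f_measure` is chosen for this illustration.

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall (both given in [0, 1])."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Instance alignment rows of the table: (precision, recall).
INSTANCE_ROWS = {
    "OAEI restaurant": (0.95, 0.88),
    "DBpedia-Yago": (0.90, 0.73),
    "IMDb-Yago": (0.94, 0.90),
}

if __name__ == "__main__":
    for name, (p, r) in INSTANCE_ROWS.items():
        print(name, round(f_measure(p, r), 2))
```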

SLIDE 16

Performance Improvements

The original Paris takes a few hours per iteration. Ways to improve this:

  • Replace BerkeleyDB by an in-memory representation of the ontologies.
  • Parallelize the propagation of entity alignment scores over all entities. Aggregate results at the end to avoid races.
  • Change the hardware (now that the computation is CPU-bound).
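The "aggregate at the end" idea can be illustrated as follows: each worker fills a private dictionary for its share of the entities, and the results are merged in a single thread once all workers are done, so no shared structure is written concurrently. This is a Python sketch of the pattern only; Paris itself is written in Java, and `score_entity` is a hypothetical stand-in for the per-entity propagation. In CPython, threads would not speed up CPU-bound work, so a process pool would be the usual substitute.

```python
from concurrent.futures import ThreadPoolExecutor


def score_chunk(entities, score_entity):
    """Score one share of the entities into a worker-local dictionary."""
    local = {}
    for entity in entities:
        local[entity] = score_entity(entity)
    return local


def parallel_scores(entities, score_entity, workers=4):
    """Split the entities among workers, then merge the local results."""
    chunks = [entities[i::workers] for i in range(workers)]
    merged = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Merging happens here, in the calling thread only: no races.
        for local in pool.map(lambda chunk: score_chunk(chunk, score_entity),
                              chunks):
            merged.update(local)
    return merged


if __name__ == "__main__":
    # Hypothetical scoring function used only for the demonstration.
    print(parallel_scores(list(range(10)), lambda e: e * e))
```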

SLIDE 17

Performance Improvement Results

Iteration   Original PARIS   New PARIS (1 thread)   New PARIS (4 threads)
Startup     0h00             0h27                   0h10
1           4h04             0h40                   0h27
2           5h06             3h00                   1h02
3           5h00             0h34                   0h24
4           5h30             0h29                   0h16
Total       20h              5h                     2h

Table: Running times for the DBpedia–Yago alignment task. The original Paris was run on an Intel Xeon E5620 CPU clocked at 2.40 GHz on a machine with 12 GB of RAM. The new Paris was run on an Intel Core i7-3820 CPU clocked at 3.60 GHz with 48 GB of RAM.

SLIDE 19

Join Relations

[Figure: a relation of one ontology matching a join in the other, listed here as subject, relation, object]

a:Douglas_Adams  a:countryOfBirth           a:UK
b:Douglas_Adams  b:birthPlace               b:Cambridge
b:Cambridge      b:country                  b:UK
b:Douglas_Adams  (b:birthPlace, b:country)  b:UK

  • The simplest possible difference in structure between ontologies: relations of one ontology correspond to join relations in the other ontology.
  • The terminology is motivated by the “join” operator of relational algebra.
  • We see the join as a binary predicate: the intermediate nodes are existentially quantified but projected away.
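The join of two relations, seen as a binary predicate, can be sketched in Python as below. The function name `join` and the sample facts are illustrative; returning a set is what projects the intermediate nodes away, since two different intermediate nodes leading to the same (x, z) pair yield a single join fact.

```python
def join(facts, r1, r2):
    """Pairs (x, z) such that r1(x, y) and r2(y, z) hold for some y."""
    successors = {}
    for y, r, z in facts:
        if r == r2:
            successors.setdefault(y, set()).add(z)
    return {(x, z)
            for x, r, y in facts if r == r1
            for z in successors.get(y, ())}


FACTS_B = [
    ("b:Douglas_Adams", "b:birthPlace", "b:Cambridge"),
    ("b:Cambridge", "b:country", "b:UK"),
]

if __name__ == "__main__":
    # The join (b:birthPlace, b:country) plays the role of a:countryOfBirth.
    print(join(FACTS_B, "b:birthPlace", "b:country"))
```

Note that this sketch materializes the join, which is convenient for small examples but differs from an implicit representation kept for memory reasons.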

SLIDE 20

Support in Paris

  • We must keep the representation of joins implicit in Paris (memory constraints).
  • We must recursively enumerate all possible join facts instead of enumerating all possible facts.
  • We must avoid duplicate facts caused by multiple possible choices for the intermediate nodes.
  • We cannot afford to enumerate all possible relations anymore (many possible joins).

⇒ New algorithm to compute the entity and relation alignments simultaneously.

SLIDE 21

Practical Issues

  • How to determine the functionality of join relations?
  • How to select interesting joins to perform without exploring all joins?
  • How to achieve acceptable running time on large ontologies?

⇒ We only perform the join alignment on small ontologies.

SLIDE 23

Log-transformation and Product Graph

\[
\mathrm{Pr}_{n+1}(x \equiv x') = 1 - \prod_{\substack{r(x,y) \\ r'(x',y')}}
  \bigl(1 - \mathrm{Pr}_n(r' \subseteq r) \times \mathrm{fun}^{-1}(r) \times \mathrm{Pr}_n(y \equiv y')\bigr)
  \times
  \bigl(1 - \mathrm{Pr}_n(r \subseteq r') \times \mathrm{fun}^{-1}(r') \times \mathrm{Pr}_n(y \equiv y')\bigr)
\]

  • The entity alignment equation is justified by a probabilistic model (independent choices).
  • If the relation functionalities and alignments are in {0, 1}, we can apply a log-transformation:

\[
\mathrm{LPr}_n(x \equiv x') := -\log\bigl(1 - \mathrm{Pr}_n(x \equiv x')\bigr)
\]

  • By looking at propagation in the product graph, we get a nicer equation, for some matrix M and a constant literal alignment vector L:

\[
\mathrm{LPr}_{n+1} = M \, \mathrm{LPr}_n + L
\]
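The step from the product to the linear equation can be spelled out as follows. This is a reconstruction under the stated {0, 1} assumption; the shorthands a and b, and the reading of M and L as the entity-pair and literal-pair parts of the sum, are introduced here for the derivation and are not notation from the slides.

```latex
% Shorthand for one pair of facts r(x,y), r'(x',y'):
%   a = Pr_n(r' \subseteq r) fun^{-1}(r),  b = Pr_n(r \subseteq r') fun^{-1}(r'),
%   p = Pr_n(y \equiv y').
% For a, b in {0,1} we have (1 - a p) = (1 - p)^a, hence
\[
(1 - a\,p)(1 - b\,p) = (1 - p)^{a + b}.
\]
% The update therefore reads
\[
1 - \mathrm{Pr}_{n+1}(x \equiv x')
  = \prod_{\substack{r(x,y) \\ r'(x',y')}} \bigl(1 - \mathrm{Pr}_n(y \equiv y')\bigr)^{a + b},
\]
% and applying -\log to both sides turns the product into a sum:
\[
\mathrm{LPr}_{n+1}(x \equiv x')
  = \sum_{\substack{r(x,y) \\ r'(x',y')}} (a + b)\,\mathrm{LPr}_n(y \equiv y').
\]
% Terms where (y, y') is a pair of entities give the entries of M;
% terms where (y, y') is a pair of literals are constant and give L.
```

An alignment probability of 1 corresponds to LPr = +∞, which is why divergence of the transformed score is the expected behavior for confidently aligned pairs.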

SLIDE 24

Green Measures

\[
\mathrm{LPr}_{n+1} = M \, \mathrm{LPr}_n + L
\]

This equation is similar to PageRank (LPr_{n+1} = M LPr_n where M is a stochastic matrix) except:

1. The matrix is not stochastic.
2. Diverging to +∞ means convergence (because of the log-transformation).
3. L pours alignment weight into the aligned pairs of literals.

This last point can be linked to the use of Green measures to focus the PageRank computation. This interpretation suggests possible changes to the entity alignment equation (but we lose the probabilistic interpretation).
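A few iterations of the linear equation on a toy instance show the behavior described above. The 2×2 matrix M and the vector L below are made up for the illustration (two entity pairs that support each other, one of which receives weight from aligned literals); they do not come from a real ontology.

```python
import math


def iterate(M, L, steps):
    """Apply v <- M v + L `steps` times, starting from the zero vector."""
    v = [0.0] * len(L)
    for _ in range(steps):
        v = [sum(M[i][j] * v[j] for j in range(len(v))) + L[i]
             for i in range(len(L))]
    return v


def to_probability(lpr):
    """Invert the log-transformation: Pr = 1 - exp(-LPr)."""
    return 1 - math.exp(-lpr)


# Made-up instance: pair 0 and pair 1 propagate to each other,
# and only pair 0 is fed by an aligned pair of literals.
M = [[0.0, 1.0],
     [1.0, 0.0]]
L = [1.0, 0.0]

if __name__ == "__main__":
    for n in range(1, 6):
        scores = iterate(M, L, n)
        print(n, scores, [round(to_probability(s), 3) for s in scores])
```

The transformed scores keep growing with the number of iterations, while the corresponding probabilities approach 1, which matches the remark that divergence to +∞ means convergence.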

SLIDE 26

Literal Similarity Functions

Edgar R. Burroughs       Edgar Rice Burroughs
Douglas Adams            Adams, Douglas
and Constance Garnett    Constance Garnett

  • The original Paris uses an exact literal equality function.
  • Possible refinements: adjust for case, strip special characters, etc.
  • Yet, we would need a better equality function giving > 0 weight to the alignment of similar literals.
  • Approximate dictionary searching problem: given a literal, quickly find all similar literals in the other ontology.

SLIDE 27

Results

We use a shingling technique which was implemented by Mayur Garg (who interned in the team from IIT Delhi). I interfaced his code with Paris. The performance of the shingling technique matches ad-hoc normalization on the OAEI restaurants dataset.

                             Precision   Recall   F-measure
Paris with exact equality    95%         88%      91%
Paris with shingling         96%         95%      96%
Paris with normalization     98%         96%      97%
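To give an idea of what a shingling technique looks like, here is a generic Python sketch based on character 3-grams, Jaccard similarity and an inverted index from shingles to literals. It is not the implementation interfaced with Paris: the shingle size, the threshold, the normalization and all function names are assumptions made for this illustration.

```python
def shingles(text, k=3):
    """Set of character k-grams of the lowercased, whitespace-normalized text."""
    normalized = " ".join(text.lower().split())
    if len(normalized) <= k:
        return {normalized}
    return {normalized[i:i + k] for i in range(len(normalized) - k + 1)}


def jaccard(a, b, k=3):
    """Jaccard similarity of the shingle sets of two literals, in [0, 1]."""
    sa, sb = shingles(a, k), shingles(b, k)
    return len(sa & sb) / len(sa | sb)


def build_index(literals, k=3):
    """Inverted index: shingle -> literals of the other ontology containing it."""
    index = {}
    for literal in literals:
        for s in shingles(literal, k):
            index.setdefault(s, set()).add(literal)
    return index


def similar(literal, index, threshold=0.5, k=3):
    """Literals sharing at least one shingle and similar above the threshold."""
    candidates = set()
    for s in shingles(literal, k):
        candidates |= index.get(s, set())
    return sorted(c for c in candidates if jaccard(literal, c, k) >= threshold)


OTHER = ["Edgar Rice Burroughs", "Adams, Douglas", "Constance Garnett"]
INDEX = build_index(OTHER)
```

With this index, `similar("Edgar R. Burroughs", INDEX)` only needs to compare the query against literals that share a shingle with it, instead of scanning the whole dictionary. The similarity value could then serve as a weight between 0 and 1 for the literal alignment.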

SLIDE 29

The Deep Web

  • Many structured databases can only be queried through interfaces designed for humans (Web forms and HTML result pages).
  • To access this structured information, an automated agent must probe the form and perform wrapper induction on the result pages.
  • To understand the meaning of the extracted records and attributes, we can use Paris (with a reference ontology).

SLIDE 30

Application to Form Understanding

[Figure: form-understanding pipeline. A Web form (fields Author, Title, Publisher) is probed; wrapper induction turns each result page into a list of records (for example "Great Expectations / Charles Dickens / Dover Thrift Editions" and "David Copperfield / by Charles Dickens / Penguin Classics"); RDF triples generation builds a labeled graph with unknown entities (?e1, ?e2, of type ?class) and their literals; ontology alignment against Yago (relations y:hasName, y:created, rdfs:type) yields the input and output schema mapping, ontology enrichment and new probing terms.]

SLIDE 38

Results

  • We tested the approach on the Amazon book search form.
  • The entity alignments with the best confidence were indeed books aligned through their title and author.
  • The system identified the relations y:hasPreferredName and (y:created, y:hasPreferredName). It linked them to the result page DOM paths and form fields.
  • Support for join relations and approximate string matching is required in this setting.
  • The approach was presented as a vision paper in the VLDS workshop of VLDB.

SLIDE 40

Summary of Contributions

  • Performance improvements resulting in a 10-fold speedup over the original implementation.
  • Support of join relation alignments on small ontologies.
  • Insights on the relation between Paris and PageRank-inspired techniques.
  • Integration of approximate string matching to improve the literal alignment.
  • Application of Paris for deep Web analysis.

SLIDE 41

Further Work

  • Performance. Further gains to be made, perform more complete benchmarks.
  • Join relations. Performance improvements, especially ways to only select interesting joins. Arbitrary patterns?
  • Theory. Study the possible alternative choices and benchmark them. Understand the full model (we still have no proof of overall convergence!) and the effects of implementation tweaks. Find links with Max-SAT or Markov Logic Networks?
  • Literal matching. Support of various datatypes such as numbers and dates (engineering work). Fix performance issues to perform larger experiments.
  • Information Extraction. Try with more sources. Find links with named entity disambiguation techniques such as AIDA? Intensional use for large-scale integration.

SLIDE 42

Thanks!

Thanks for your attention! Questions?

The research has been funded by the European Union’s seventh framework programme, in the setting of the European Research Council grant Webdam, agreement 226513, and the FP7 grant ARCOMEM, agreement 270239.

Frame 4: Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/