
Ontology matching tutorial

Jérôme Euzenat, Montbonnot Saint-Martin, France, Jerome.Euzenat@inrialpes.fr

Pavel Shvaiko, Trento, Italy, pavel@dit.unitn.it

June 28, 2009

Problem Applications Basic techniques Process Conclusions

Goals of the tutorial

◮ Illustrate the role of ontology matching
◮ Provide an overview of basic matching techniques
◮ Demonstrate the use of basic matching techniques in state-of-the-art systems
◮ Motivate future research

Ontology matching tutorial (v14) – Euzenat and Shvaiko 2 / 51 Problem Applications Basic techniques Process Conclusions

Outline

1. The ontology matching problem
2. Applications
3. Basic techniques
4. Matching process
5. Conclusions


Semantic webs


Being serious about the semantic web

◮ It is not one guy's ontology
◮ It is not several guys' common ontology
◮ It is many guys and girls' many ontologies
◮ So it is a mess, but a meaningful mess.


Living with heterogeneity

The semantic web will be:
◮ huge;
◮ dynamic;
◮ heterogeneous.

These are not bugs, they are features. We must learn to live with them.


Heterogeneity problem

Resources being expressed in different ways must be reconciled before being used. Mismatch between formalized knowledge can occur when:
◮ different languages are used;
◮ different terminologies are used;
◮ different modelling is used.


I have a plan for you

Reconciliation can be performed in 2 steps:

1. Match: the Matcher determines the alignment A between the two ontologies;
2. Generate: the Generator derives from A a processor (for merging, transforming, etc.), e.g., a Transformation.

Matching can be achieved at run time or at design time.


Matching process

[Figure: the matching operation takes an input alignment A, parameters and resources, and returns an alignment A′.]


Motivation: two ontologies

[Figure: two ontologies to be matched. The first contains Product, DVD, Book, CD and Person, with the properties price, title, doi, creator, topic and author and the datatypes integer, string and uri. The second contains Monograph, Essay, Literary critics, Politics, Biography, Autobiography, Literature, Human and Writer, with the properties isbn, author, title and subject. The figure also shows the instances "Bertrand Russell: My life" and "Albert Camus: La chute". A second view of the same figure adds correspondences between entities of the two ontologies, labelled ≥, ≥, ≥, ≥ and ≤.]

Transformation and mediation

SELECT x.isbn WHERE x : Autobiography AND x.author = "Bertrand Russell"

mediator

SELECT x.doi WHERE x : Book AND x.author = "Bertrand Russell" AND x.topic = "Bertrand Russell"

x.doi = http://dx.doi.org/10.1080/041522862X
x.isbn = 041522862X


Correspondence

Definition (Correspondence)

Given two ontologies o and o′, a correspondence between o and o′ is a 5-tuple ⟨id, e, e′, r, n⟩ such that:
◮ id is an identifier of the correspondence;
◮ e and e′ are entities of o and o′ (e.g., XML elements, classes);
◮ r is a relation (e.g., equivalence (=), more general (⊒), disjointness (⊥));
◮ n is a confidence measure in some mathematical structure (typically in the [0, 1] range).
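The definition above maps naturally onto a small record type. The following is a minimal sketch, assuming entities are identified by plain strings and confidence lies in [0, 1]; the class name, field names, the set of accepted relation symbols and the example values are illustrative assumptions, not part of the tutorial.

```python
from dataclasses import dataclass

# Relation symbols accepted by this sketch (an illustrative assumption).
RELATIONS = {"=", "⊒", "⊑", "⊥"}


@dataclass(frozen=True)
class Correspondence:
    """One correspondence <id, e, e', r, n> between entities of o and o'."""
    id: str
    e: str          # entity of the first ontology o
    e_prime: str    # entity of the second ontology o'
    r: str          # relation asserted between e and e'
    n: float = 1.0  # confidence, assumed here to lie in [0, 1]

    def __post_init__(self) -> None:
        if self.r not in RELATIONS:
            raise ValueError(f"unknown relation: {self.r!r}")
        if not 0.0 <= self.n <= 1.0:
            raise ValueError("confidence must be in [0, 1]")


# Hypothetical example: the entity names and the confidence are made up.
c = Correspondence("c1", "o:Book", "o':Monograph", "⊒", 0.8)
```

An alignment can then be held as a set of such objects, which works because the dataclass is frozen and therefore hashable.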


Alignment

Definition (Alignment)

Given two ontologies o and o′, an alignment (A) between o and o′:
◮ is a set of correspondences on o and o′;
◮ with some additional metadata (multiplicity: 1-1, 1-*, method, date, properties, etc.).


An application


Interoperability in semantic P2P systems

[Figure: a Matcher and a mediator sit between two peers; queries and answers exchanged between the peers pass through the mediator.]


Application: ontology evolution

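[Figure: a Matcher compares two versions of an ontology, o_t and o_t+n, and produces an alignment A; a Generator derives from A a Transformation that takes the knowledge base KB_t to KB_t+n.]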


Application: Catalog integration

[Figure: a Matcher produces an alignment A between the catalogs of DB and DBPortal; a Generator derives a Transformation from A.]


Applications: P2P information sharing

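[Figure: a Matcher produces an alignment A between the ontologies of peer1 and peer2; a Generator derives from A a mediator that translates the queries and answers exchanged by the peers.]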


Applications: Peer-to-peer and emergent semantics

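[Figure: peer1 and peer2 with a Matcher producing an alignment A; the figure also shows the ontologies o1, o2 and o3 and the alignments A1 and A2.]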


Applications: Web service composition

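[Figure: the output of service1 is matched against the input of service2; a Matcher produces an alignment A, from which a Generator derives a mediator between the two services.]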


Applications: Agent communication

[Figure: agent communication, with the labels message, Matcher, A, Generator, Translator and axioms.]


Applications requirements

Columns: Application | instances | run time | automatic | correct | complete | operation

Ontology evolution √ √ √ transformation
Schema integration √ √ √ merging
Catalog integration √ √ √ data translation
Data integration √ √ √ query answering
P2P information sharing √ query answering
Web service composition √ √ √ data mediation
Multi agent communication √ √ √ √ data translation
Query answering √ √ query reformulation


On what basis can we match?

◮ Content: relying on what is inside the ontology
  ◮ Name, comments, alternate names, names of related entities: NLP, IR, etc.
  ◮ Internal structure: constraints on relations, typing
  ◮ External structure: relations between entities: data mining, discrete mathematics
  ◮ Extension: statistics, data analysis, data mining, machine learning
  ◮ Semantics (models): reasoning techniques

◮ Context: the relations of the ontology with the outside
  ◮ Annotated resources
  ◮ The web
  ◮ External ontologies: DBpedia, etc.
  ◮ External resources: WordNet, etc.


Element-level techniques: String-based

Edit distance
◮ takes as input two strings and calculates the number of edit operations (e.g., insertions, deletions, substitutions) of characters required to transform one string into the other,
◮ normalized by the length of the longer string
◮ EditDistance(NKN, Nikon) = 2/5 = 0.4
◮ EditDistance(editeur, editor) = 3/7 = 0.43

(e.g., S-Match, OLA, Anchor-Prompt)
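The normalized edit distance can be sketched with the usual dynamic programming recurrence. This is an illustrative implementation, not the code of any of the systems named above; the function names and the `substitution_cost` parameter are assumptions. Strings are lower-cased before comparison, which is what makes NKN/Nikon come out at 2/5. The 3/7 value for editeur/editor is obtained when a substitution is counted as a deletion plus an insertion (cost 2); with unit-cost substitutions the same pair gives 2/7.

```python
def edit_distance(s: str, t: str, substitution_cost: int = 1) -> int:
    """Minimal total cost of insertions and deletions (cost 1 each) and
    substitutions (cost `substitution_cost`) needed to turn s into t."""
    previous = list(range(len(t) + 1))  # distances from "" to prefixes of t
    for i, cs in enumerate(s, start=1):
        current = [i]  # distance from s[:i] to ""
        for j, ct in enumerate(t, start=1):
            current.append(min(
                previous[j] + 1,       # delete cs
                current[j - 1] + 1,    # insert ct
                previous[j - 1] + (0 if cs == ct else substitution_cost),
            ))
        previous = current
    return previous[-1]


def normalized_edit_distance(s: str, t: str, substitution_cost: int = 1) -> float:
    """Edit distance on lower-cased strings, divided by the longer length."""
    s, t = s.lower(), t.lower()
    longest = max(len(s), len(t))
    if longest == 0:
        return 0.0
    return edit_distance(s, t, substitution_cost) / longest


print(normalized_edit_distance("NKN", "Nikon"))                         # 2/5
print(normalized_edit_distance("editeur", "editor", substitution_cost=2))  # 3/7
```

With `substitution_cost=2` the ratio can exceed 1 for very dissimilar strings, so a different normalization would be needed if a value in [0, 1] is required in that variant.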


Element-level techniques: Linguistic resources

◮ Sense-based: WordNet hierarchy distance

[Figure: fragment of the WordNet hierarchy with the senses person, God, creator1, creator2, artist, maker, communicator, literate, legal document, illustrator, author1, writer2=author2, writer1 and writer3, related to the ontology labels illustrator, author, creator, Person and writer.]

Some other measures (e.g., Resnik measure) depend on the frequency of the terms in the corpus made of all the labels of the ontologies.

(e.g., S-Match)


Context-based matching

◮ Using the ontologies on the web as context;
◮ Composing the relations obtained through these ontologies.

1. Harvest ontologies on the web;
2. Select those which are related to the ontologies to match;
3. Match them to the found ontologies;
4. Compose the relations between entities through the intermediate ontologies;
5. Aggregate the obtained results (if desired).

At each step there is some latitude.
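Step 4, composing relations along a path through an intermediate ontology, can be sketched as a small lookup table. This is an illustration restricted to the relations =, ≤ and ≥; the function names and the rule that any other combination yields no definite relation are assumptions of this sketch, and the chain used in the usage lines is a hypothetical path in the spirit of the Beef/Food example.

```python
from typing import Optional, Sequence

# Composition table for = (equivalent), ≤ (less general), ≥ (more general).
# Pairs that are not listed (e.g. ≤ followed by ≥) give no definite relation.
_COMPOSE = {
    ("=", "="): "=",
    ("=", "≤"): "≤", ("≤", "="): "≤", ("≤", "≤"): "≤",
    ("=", "≥"): "≥", ("≥", "="): "≥", ("≥", "≥"): "≥",
}


def compose(r1: str, r2: str) -> Optional[str]:
    """Relation between x and z given x r1 y and y r2 z, or None."""
    return _COMPOSE.get((r1, r2))


def compose_path(relations: Sequence[str]) -> Optional[str]:
    """Compose the relations found along a path, left to right."""
    result: Optional[str] = "="  # identity: x = x
    for r in relations:
        if result is None:
            return None
        result = compose(result, r)
    return result


# Hypothetical path: anchor (=), three subsumption steps (≤), anchor (=).
print(compose_path(["=", "≤", "≤", "≤", "="]))  # ≤
print(compose_path(["≤", "≥"]))                 # None: nothing can be concluded
```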


Example: Scarlet

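[Figure: Scarlet example with the labels Agrovoc, NAL and TAP. Beef and Food are related through the intermediate concepts RedMeat and MeatOrPoultry using ≤ relations, with = relations anchoring the concepts across ontologies.]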


Extensional techniques

ε : C → E

E can be a set of instances, a set of documents which are indexed by concepts, or a set of items, e.g., people, which use these concepts. Two cases:
◮ E is common to both ontologies;
◮ E depends on the ontology. This can be reduced to the former case by identification or record linkage techniques.

Techniques:
◮ statistical and machine learning techniques infer and compare the characteristics of populations;
◮ set-theoretic techniques compare the extensions.
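For the first case, where both ontologies share the same set E, the set-theoretic comparison can be sketched as follows. The function names, the use of the Jaccard coefficient as the similarity, and the instance identifiers are assumptions made for illustration; the tutorial does not prescribe a particular measure.

```python
from typing import Optional, Set


def jaccard(ext_c: Set[str], ext_d: Set[str]) -> float:
    """Size of the intersection of two extensions over the size of their union."""
    union = ext_c | ext_d
    return len(ext_c & ext_d) / len(union) if union else 0.0


def extensional_relation(ext_c: Set[str], ext_d: Set[str]) -> Optional[str]:
    """Relation suggested by set inclusion between the two extensions."""
    if ext_c == ext_d:
        return "="
    if ext_c >= ext_d:
        return "≥"  # c covers every instance of d: c is more general
    if ext_c <= ext_d:
        return "≤"
    return None     # the extensions merely overlap or are disjoint


# Hypothetical extensions over a common set of instance identifiers.
book = {"i1", "i2", "i3", "i4", "i5"}
monograph = {"i1", "i2", "i3", "i4"}

print(extensional_relation(book, monograph))  # ≥
print(jaccard(book, monograph))               # 0.8
```

When E depends on the ontology, the same functions apply once instances have been identified across the two sets, for example by record linkage.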

[Figure: the classes of the two ontologies (Product, DVD, Book, CD on one side; Monograph, Essay, Literary critics, Politics, Biography, Autobiography, Literature on the other) compared through their extensions, with correspondences labelled ≥, ≥ and .8.]

Structure-level techniques: Model-based

Description logics (DL)-based:

micro-company = company ⊓ ≤5 employee
SME = firm ⊓ ≤10 associate

Given the correspondences company = firm and associate ⊑ employee (employee ≥ associate), reasoning yields micro-company ⊑ SME (micro-company ≤ SME).
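The entailment can be spelled out step by step. This is a sketch of the reasoning, assuming that associate ⊑ employee is read as a role inclusion, so that having at most n employees implies having at most n associates:

```latex
\begin{align*}
\text{micro-company}
  &\equiv \text{company} \sqcap (\leq 5\ \text{employee}) \\
  &\sqsubseteq \text{company} \sqcap (\leq 5\ \text{associate})
     && \text{since associate} \sqsubseteq \text{employee} \\
  &\sqsubseteq \text{company} \sqcap (\leq 10\ \text{associate})
     && \text{since } 5 \leq 10 \\
  &\equiv \text{firm} \sqcap (\leq 10\ \text{associate})
     && \text{since company} \equiv \text{firm} \\
  &\equiv \text{SME}
\end{align*}
```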


Matching process

Basic matchers provide candidate correspondences; most systems use several such matchers and then combine and filter their results.

[Figure: overall matching process with the stages matcher composition, aggregation, filtering and iteration; labels A, M, M′, M′′, A′, A′′, A′′′, A′′′′, A′′′′′.]


Sequential composition

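[Figure: sequential composition. A first matching operation, with its parameters and resources, turns the input alignment A into A′; a second operation, matching′, with parameters′ and resources′, turns A′ into A′′.]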


Data integration as sequential composition

[Figure: data integration seen as a sequential composition, with the labels A, f, A′, d, d′, f′ and A′′.]


Parallel composition

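[Figure: parallel composition. Two matching operations, matching (with parameters and resources) and matching′ (with parameters′ and resources′), are applied to the same input alignment A and produce A′ and A′′, which are combined into A′′′.]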


Similarity filter, alignment extractor and alignment filter

Many algorithms are based on similarity or distance computation. A number of operations can be based on similarity/distance matrices.

[Figure: a similarity filter turns a matrix M into M′, an alignment extractor turns M′ into an alignment A, and an alignment filter turns A into A′.]


Aggregation operations

There are many different ways to aggregate matcher results, usually depending on confidence/similarity:
◮ Triangular norms (min, weighted products), useful for selecting only the best results;
◮ Multidimensional distances (Euclidean distance, weighted sum), useful for taking into account all dimensions;
◮ Fuzzy aggregation (min, weighted average), useful for aggregating competing algorithms and averaging their results;
◮ Other specific measures (e.g., ordered weighted average).
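A few of these aggregation operators can be sketched directly. The functions below are illustrative: their names, the sample values and the weights are assumptions, and each function aggregates the scores that several matchers gave to one and the same pair of entities.

```python
import math
from typing import Sequence


def aggregate_min(values: Sequence[float]) -> float:
    """Minimum: a pair scores well only if every matcher scores it well."""
    return min(values)


def weighted_product(values: Sequence[float], weights: Sequence[float]) -> float:
    """Product of the values, each raised to the power of its weight."""
    return math.prod(v ** w for v, w in zip(values, weights))


def weighted_average(values: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted sum of the values divided by the sum of the weights."""
    return sum(w * v for v, w in zip(values, weights)) / sum(weights)


def euclidean_distance(distances: Sequence[float]) -> float:
    """Treats each matcher's distance as one dimension of a vector."""
    return math.sqrt(sum(d * d for d in distances))


# Hypothetical similarities from three matchers for one pair of entities.
similarities = [0.9, 0.6, 0.3]
weights = [0.5, 0.3, 0.2]

print(aggregate_min(similarities))
print(weighted_product(similarities, weights))
print(weighted_average(similarities, weights))
print(euclidean_distance([1 - s for s in similarities]))
```

The minimum and the weighted product are pulled down by any single low score, whereas the weighted average lets a strong matcher compensate for a weak one.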


Dealing with cycles: fix point computation

[Figure: two small ontologies, one with classes C1, C2 and properties p, q, the other with classes C′1, C′2 and properties p′, q′.]

$$\sigma_C(c, c') = w^C_A \cdot \frac{1}{\max(|A(c)|, |A(c')|)} \sum_{(a, a') \in match(A(c), A(c'))} \sigma_A(a, a') + w^C_N \cdot \sigma(N(c), N(c'))$$

$$\sigma_A(a, a') = w^A_C \cdot \sigma_C(domain(a), domain(a')) + w^A_N \cdot \sigma(N(a), N(a'))$$

In the example, w^C_A = w^A_C = .6 and w^C_N = w^A_N = .4. The similarity matrix takes the following successive values (columns: C1, p, C2, q).

Step 1:
C′1: .4 .6
p′: .8 .2
C′2: .5 .6
q′: .4 .5

Step 2:
C′1: .64 .36
p′: .68 .38
C′2: .32 .54
q′: .52 .44

Step 3:
C′1: .57 .47
p′: .64 .27
C′2: .51 .5
q′: .38 .58

Step 4:
C′1: .53 .47
p′: .67 .34
C′2: .46 .56
q′: .4 .52

Threshold reached: no .1 variation.
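The mutual dependency between the two equations can be resolved by iterating until no value moves by more than a threshold. The sketch below rests on several assumptions that the slides do not state: A(c) is taken to be the list of properties whose domain is c, N-similarities are given as a table, match is the best one-to-one pairing (found by brute force), and both equations are updated simultaneously from the previous values. The graph and the input values are hypothetical, so the output is not meant to reproduce the matrices of the slides.

```python
from itertools import permutations
from typing import Dict, List, Tuple

Pair = Tuple[str, str]


def best_match_sum(attrs: List[str], attrs2: List[str],
                   sigma_a: Dict[Pair, float]) -> float:
    """Sum of sigma_A over the best one-to-one pairing (brute force)."""
    if not attrs or not attrs2:
        return 0.0
    if len(attrs) <= len(attrs2):
        return max(sum(sigma_a[(a, b)] for a, b in zip(attrs, perm))
                   for perm in permutations(attrs2, len(attrs)))
    return max(sum(sigma_a[(a, b)] for a, b in zip(perm, attrs2))
               for perm in permutations(attrs, len(attrs2)))


def fixed_point(attrs_of: Dict[str, List[str]], attrs_of2: Dict[str, List[str]],
                name_sim: Dict[Pair, float], w_struct: float = 0.6,
                w_name: float = 0.4, threshold: float = 0.1, max_iter: int = 100):
    """Iterate both equations until no value changes by `threshold` or more."""
    domain = {a: c for d in (attrs_of, attrs_of2) for c, al in d.items() for a in al}
    class_pairs = [(c, d) for c in attrs_of for d in attrs_of2]
    attr_pairs = [(a, b) for c in attrs_of for a in attrs_of[c]
                  for d in attrs_of2 for b in attrs_of2[d]]
    # Start from the name similarities alone.
    sigma_c = {p: name_sim[p] for p in class_pairs}
    sigma_a = {p: name_sim[p] for p in attr_pairs}
    for iteration in range(1, max_iter + 1):
        new_a = {(a, b): w_struct * sigma_c[(domain[a], domain[b])]
                 + w_name * name_sim[(a, b)] for a, b in attr_pairs}
        new_c = {}
        for c, d in class_pairs:
            size = max(len(attrs_of[c]), len(attrs_of2[d]))
            total = best_match_sum(attrs_of[c], attrs_of2[d], sigma_a)
            struct = total / size if size else 0.0
            new_c[(c, d)] = w_struct * struct + w_name * name_sim[(c, d)]
        delta = max(abs(new[p] - old[p])
                    for new, old in ((new_a, sigma_a), (new_c, sigma_c)) for p in new)
        sigma_a, sigma_c = new_a, new_c
        if delta < threshold:
            break
    return sigma_c, sigma_a, iteration


# Hypothetical input: each class owns one property.
attrs_of = {"C1": ["p"], "C2": ["q"]}
attrs_of2 = {"C1'": ["p'"], "C2'": ["q'"]}
name_sim = {("C1", "C1'"): .4, ("C2", "C1'"): .6, ("C1", "C2'"): .5, ("C2", "C2'"): .6,
            ("p", "p'"): .8, ("q", "p'"): .2, ("p", "q'"): .4, ("q", "q'"): .5}
sigma_c, sigma_a, iterations = fixed_point(attrs_of, attrs_of2, name_sim)
print(iterations, sigma_c, sigma_a)
```

Because the structural weight is below 1 and the best pairing is normalized by the larger property set, each round shrinks the largest change, so the loop stops well before `max_iter` in this setting.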


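Learning matcher (parameter)s

[Figure: learning loop. The matching operation, with its parameters and resources, outputs an alignment A′, which goes through a comparison involving the alignment A.]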


Filtering similarities: thresholding

◮ Hard threshold retains all the correspondences above threshold n;
◮ Delta threshold uses as a threshold the highest similarity value minus a given constant d;
◮ Proportional threshold uses as a threshold a given percentage of the highest similarity value;
◮ Percentage retains the n% of correspondences with the highest similarity.
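The four filters differ only in how the cut-off is chosen. A minimal sketch, assuming similarities are held in a dictionary keyed by entity pairs, that "above" includes equality, and that the percentage filter rounds the number of kept pairs up; the function names and sample values are illustrative.

```python
import math
from typing import Dict, Tuple

Sim = Dict[Tuple[str, str], float]


def hard_threshold(sim: Sim, n: float) -> Sim:
    """Keep the pairs whose similarity is at least n."""
    return {pair: v for pair, v in sim.items() if v >= n}


def delta_threshold(sim: Sim, d: float) -> Sim:
    """Keep the pairs within d of the highest similarity."""
    return hard_threshold(sim, max(sim.values()) - d) if sim else {}


def proportional_threshold(sim: Sim, fraction: float) -> Sim:
    """Keep the pairs reaching a given fraction of the highest similarity."""
    return hard_threshold(sim, max(sim.values()) * fraction) if sim else {}


def percentage(sim: Sim, n: float) -> Sim:
    """Keep the top n percent of the pairs, ranked by similarity."""
    keep = math.ceil(len(sim) * n / 100)
    ranked = sorted(sim.items(), key=lambda item: item[1], reverse=True)
    return dict(ranked[:keep])


# Hypothetical similarity values.
sim = {("a", "x"): 0.9, ("a", "y"): 0.7, ("b", "x"): 0.4, ("b", "y"): 0.1}

print(hard_threshold(sim, 0.5))          # the 0.9 and 0.7 pairs
print(delta_threshold(sim, 0.25))        # cut-off around 0.65
print(proportional_threshold(sim, 0.4))  # cut-off around 0.36
print(percentage(sim, 25))               # the single best pair
```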


Filtering similarities: Softening and hardening

Applies a monotonic function f : [0, 1] → [0, 1]
◮ Hardening: all correspondences with non-1 confidence are assigned 0 confidence;
◮ Smoothening (e.g., sigmoid) consists of using as a threshold the highest similarity value out of which a particular constant value d is subtracted;
◮ Weakening consists of using as a threshold the percentage of the highest similarity value.



Extracting alignments

            Book   Translator   Publisher   Writer
Product     .84    0.           .90         .12
Provider    .12    0.           .84         .60
Creator     .60    .05          .12         .84

◮ Greedy algorithm: 1.96;
◮ Stable marriage: 2.1;
◮ Maximal weight match: 2.52.
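The maximal weight match for this matrix can be checked by enumerating every one-to-one assignment of rows to columns. This brute-force sketch is only meant for small matrices such as this one (a real system would use an assignment algorithm such as the Hungarian method); the function name is an assumption, and the code assumes there are at least as many columns as rows.

```python
from itertools import permutations
from typing import List, Tuple

rows = ["Product", "Provider", "Creator"]
cols = ["Book", "Translator", "Publisher", "Writer"]
sim = [
    [.84, 0., .90, .12],
    [.12, 0., .84, .60],
    [.60, .05, .12, .84],
]


def max_weight_match(matrix: List[List[float]]) -> Tuple[float, List[Tuple[int, int]]]:
    """Best total similarity over all injective row-to-column assignments."""
    n_rows, n_cols = len(matrix), len(matrix[0])
    best_total, best_assignment = -1.0, ()
    for perm in permutations(range(n_cols), n_rows):
        total = sum(matrix[i][j] for i, j in enumerate(perm))
        if total > best_total:
            best_total, best_assignment = total, perm
    return best_total, list(enumerate(best_assignment))


total, assignment = max_weight_match(sim)
for i, j in assignment:
    print(rows[i], "-", cols[j], sim[i][j])
print(round(total, 2))  # 2.52
```

The selected pairs are Product-Book, Provider-Publisher and Creator-Writer: none of them is the single highest cell (.90), which is why a method that commits to the best local choice first ends up with a lower total.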


Summary

◮ Ontology heterogeneity is the nature of the semantic web;
◮ Ontology matching is part of the solution;
◮ It can be based on many different techniques;
◮ Numerous systems already exist;
◮ A relatively solid research field has emerged (tools, formats, evaluation, etc.) and is making progress;
◮ But there remain serious challenges ahead.


Challenges 2009

◮ Large-scale ontology matching evaluation,
◮ Efficiency of ontology matching techniques,
◮ Uncertainty in ontology matching,
◮ Context-based matching,
◮ Matcher selection and self-configuration,
◮ User involvement,
◮ Explanation of matching results,
◮ Social and collaborative ontology matching,
◮ Alignment management: infrastructure and support,
◮ Reasoning with alignments,

and, of course, many others.


Acknowledgments

We thank all the participants of the Heterogeneity work package of the Knowledge Web network of excellence. In particular, we are grateful to Than-Le Bach, Jesus Barrasa, Paolo Bouquet, Jan De Bo, Jos De Bruijn, Rose Dieng-Kuntz, Enrico Franconi, Raúl García Castro, Manfred Hauswirth, Pascal Hitzler, Mustafa Jarrar, Markus Krötzsch, Ruben Lara, Malgorzata Mochol, Amedeo Napoli, Luciano Serafini, François Sharffe, Giorgos Stamou, Heiner Stuckenschmidt, York Sure, Vojtěch Svátek, Valentina Tamma, Sergio Tessaris, Paolo Traverso, Raphaël Troncy, Sven van Acker, Frank van Harmelen, and Ilya Zaihrayeu.

And more specifically to Marc Ehrig, Fausto Giunchiglia, Loredana Laera, Diana Maynard, Deborah McGuinness, Petko Valchev, Mikalai Yatskevich, and Antoine Zimmermann for their support and insightful comments.

Part of this work was carried out while Pavel Shvaiko was with the University of Trento.


Ontology matching: the book

Jérôme Euzenat, Pavel Shvaiko

Ontology matching

1. Applications
2. Problem definition
3. Classification
4. Basic techniques
5. Strategies
6. Systems
7. Evaluation
8. Representation
9. Explanation
10. Processing

http://book.ontologymatching.org


Questions?

pavel@dit.unitn.it
Jerome.Euzenat@inria.fr
http://exmo.inrialpes.fr