Linking and Building Ontologies of Linked Data



slide-1
SLIDE 1

Rahul Parundekar, Craig A. Knoblock and Jose-Luis Ambite

{parundek,knoblock,ambite}@isi.edu

University of Southern California

Linking and Building Ontologies of Linked Data

slide-2
SLIDE 2

Web of Linked Data

  • Vast collection of interlinked information
  • Different sources with different schemas
slide-3
SLIDE 3

Web of Linked Data

  • Interlinked instances in the various domains
  • Equivalent instances linked with owl:sameAs

Geospatial Domain

slide-4
SLIDE 4

Interlinked Instances

Figure: Source 1 and Source 2, each shown at the schema level and the instance level.

  • Instance level: Los Angeles owl:sameAs City of Los Angeles
  • Schema level: PopulatedPlace, City

slide-5
SLIDE 5

Disjoint Schemas

Figure: Source 1 and Source 2, each shown at the schema level and the instance level.

  • Instance level: Los Angeles owl:sameAs City of Los Angeles
  • Schema level: City and PopulatedPlace, NO LINKS!!

slide-6
SLIDE 6

Objective 1: Find Schema Alignments

Figure: Source 1 and Source 2, each shown at the schema level and the instance level.

  • Instance level: Los Angeles owl:sameAs City of Los Angeles
  • Schema level: City = PopulatedPlace

slide-7
SLIDE 7

Ontologies of Linked Data

  • Ontologies can be highly specialized
  • e.g. DBpedia has classes for Educational Institutions, Bridges, Airports, etc.
  • But some can be rudimentary
  • e.g. in Geonames all instances only belong to a single class, ‘Feature’
  • Derived from RDBMS schemas from which Linked Data was generated

slide-8
SLIDE 8

Traditional Alignments

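Figure: Geonames and DBpedia, each shown at the schema level and the instance level.

  • Instance level: University of Southern California (Geonames) owl:sameAs University of Southern California (DBpedia)
  • Schema level: Feature (Geonames) ⊃ Educational Institution (DBpedia)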

  • There might not exist exact equivalences between classes in two sources
  • Only subset relations possible
slide-9
SLIDE 9

Restriction Classes

  • A specialized class can be created by restricting the value of one or more properties
  • The following Venn diagram explains a restriction class in Geonames with a restriction on the value of the featureCode property as ‘S.SCH’

Venn diagram: the set of all instances in the restricted class (rdf:type=Feature & featureCode=S.SCH) lies inside the set of all instances in the original class (rdf:type=Feature).
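As a rough illustration of the idea, a restriction class can be thought of as a filter over instances. The sketch below assumes a toy in-memory list of instances; the ids and the helper name `extension` are hypothetical and are not taken from the slides or from any Geonames API.

```python
from typing import Dict, List, Set

Instance = Dict[str, str]

# Hypothetical toy instances; the ids and values are illustrative only.
instances: List[Instance] = [
    {"id": "g1", "rdf:type": "Feature", "featureCode": "S.SCH"},
    {"id": "g2", "rdf:type": "Feature", "featureCode": "P.PPL"},
    {"id": "g3", "rdf:type": "Feature", "featureCode": "S.SCH"},
]

def extension(restriction: Dict[str, str], data: List[Instance]) -> Set[str]:
    """Return the ids of all instances that satisfy every property=value constraint."""
    return {
        inst["id"]
        for inst in data
        if all(inst.get(prop) == value for prop, value in restriction.items())
    }

# Original class: every instance typed as Feature.
original_class = extension({"rdf:type": "Feature"}, instances)
# Restriction class: additionally constrain featureCode to S.SCH.
restricted_class = extension(
    {"rdf:type": "Feature", "featureCode": "S.SCH"}, instances
)
```

Adding a constraint can only keep or shrink the set of matching instances, which is why the restricted class sits inside the original class in the Venn diagram.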

slide-10
SLIDE 10

Objective 2: Find Alignments Between Restriction Classes

  • Find and model specialized descriptions of classes

Figure: Geonames and DBpedia, each shown at the schema level and the instance level.

  • Instance level: University of Southern California (Geonames) owl:sameAs University of Southern California (DBpedia)
  • Schema level: (rdf:type=Feature & featureCode=S.SCH) in Geonames = (rdf:type=Educational Institution) in DBpedia

slide-11
SLIDE 11

Domains

  • Geospatial
      • DBpedia
      • LinkedGeoData
      • Geonames
  • Zoology
      • Geospecies
      • DBpedia
  • Genetics (Bio2RDF)
      • GeneID
      • MGI
slide-12
SLIDE 12

Approach

  • Aligning Restriction Classes

R1 R2

slide-13
SLIDE 13

Approach

  • Aligning Restriction Classes
  • Find relation between the two restriction classes

  • Equivalent
  • Subset

R1 R2 ?

slide-14
SLIDE 14

Extensional Approach to Ontology Alignment
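The slide carries only its title here. As a minimal sketch of what an extensional comparison could look like, the code below compares the instance sets of two restriction classes after mapping one side through owl:sameAs links. The function name `relation`, the ids, and the returned labels are assumptions made for illustration, not the authors' implementation.

```python
from typing import Dict, Set

def relation(ext1: Set[str], ext2: Set[str], same_as: Dict[str, str]) -> str:
    """Compare two extensions using only instances connected by owl:sameAs."""
    # Image of r1 in source 2: follow each owl:sameAs link that exists.
    image = {same_as[i] for i in ext1 if i in same_as}
    # Only source-2 instances that are linked at all can support a decision.
    linked2 = ext2 & set(same_as.values())
    if not image or not linked2:
        return "no evidence"
    if image == linked2:
        return "equivalent"
    if image < linked2:
        return "r1 subset of r2"
    if linked2 < image:
        return "r2 subset of r1"
    return "no alignment"

# Hypothetical links between a Geonames-like and a DBpedia-like source.
same_as = {"g1": "d1", "g2": "d2", "g3": "d3"}
schools = {"g1", "g3"}             # e.g. featureCode=S.SCH
all_features = {"g1", "g2", "g3"}  # e.g. rdf:type=Feature
edu_institutions = {"d1", "d3"}    # e.g. rdf:type=Educational Institution

print(relation(schools, edu_institutions, same_as))
print(relation(all_features, edu_institutions, same_as))
```

The comparison is strict: a single mismatching instance breaks an equivalence.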

slide-15
SLIDE 15

Lattice of Restriction Classes

  • Instances belonging to a restriction class also belong to its parent restriction class
  • e.g. restrictions from Geonames below
  • This also results in a hierarchy in the alignments, which our algorithm exploits
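The parent relationship can be sketched by treating a restriction class as a set of property=value constraints, so that a parent is the same set with one constraint dropped. The data, the `DATA` table, and the helper names below are hypothetical, chosen only to show that a child's instances always belong to each parent.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

Constraint = Tuple[str, str]         # (property, value)
Restriction = FrozenSet[Constraint]  # conjunction of constraints

# Hypothetical toy data keyed by instance id.
DATA: Dict[str, Dict[str, str]] = {
    "g1": {"featureClass": "S", "featureCode": "S.SCH"},
    "g2": {"featureClass": "S", "featureCode": "S.BDG"},
    "g3": {"featureClass": "P", "featureCode": "P.PPL"},
}

def extension(r: Restriction) -> Set[str]:
    """Ids of the instances that satisfy every constraint in r."""
    return {i for i, props in DATA.items() if all(props.get(p) == v for p, v in r)}

def parents(r: Restriction) -> List[Restriction]:
    """Parents in the lattice: drop exactly one constraint from r."""
    return [r - {c} for c in r]

child: Restriction = frozenset({("featureClass", "S"), ("featureCode", "S.SCH")})
for parent in parents(child):
    # Every instance of the child also belongs to each parent.
    assert extension(child) <= extension(parent)
```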

slide-16
SLIDE 16

Exploration of Hypotheses Search Space (LinkedGeoData with DBpedia)

Figure: search tree over hypotheses, each pairing a LinkedGeoData restriction class with a DBpedia restriction class.

Hypotheses shown:
  • (lgd:gnis%3AST_alpha=NJ) (dbpedia:Place#type=http://dbpedia.org/resource/City_(New_Jersey))
  • (rdf:type=lgd:country) (rdf:type=owl:Thing)
  • (rdf:type=lgd:node) (rdf:type=dbpedia:PopulatedPlace)
  • (rdf:type=lgd:node) (rdf:type=dbpedia:BodyOfWater)
  • (rdf:type=lgd:node) (rdf:type=dbpedia:PopulatedPlace & dbpedia:Place#type=dbpedia:City)
  • (rdf:type=lgd:node) (rdf:type=dbpedia:BodyOfWater & dbpedia:Place#type=dbpedia:City)
  • (rdf:type=lgd:node) (dbpedia:Place#type=dbpedia:City)
  • (rdf:type=lgd:node) (dbpedia:Place#type=dbpedia:City & rdf:type=owl:Thing)

Annotations in the figure:
  • Seed hypotheses generation
  • Seed hypothesis pruning (owl:Thing covers all instances)
  • Prune as no change in the extension set
  • Pruning on empty set r2=Ø

slide-17
SLIDE 17
  • 1. Prune seed hypothesis if either restriction covers all instances in that source


slide-18
SLIDE 18
  • 2. Number of instance pairs supporting hypothesis must be above a threshold


slide-19
SLIDE 19
  • 3. Prune if the added constraint does not change the extension

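Pruning rules 1 to 3 can be sketched as a single check on a hypothesis. This is a hedged illustration only: the function name `prune_reason`, its parameters, and the value of `MIN_SUPPORT` are assumptions, since the slides do not state a threshold or show code.

```python
from typing import Optional, Set, Tuple

MIN_SUPPORT = 2  # hypothetical threshold; the slides do not give a value

def prune_reason(
    ext1: Set[str],                # extension of r1 in source 1
    ext2: Set[str],                # extension of r2 in source 2
    all1: Set[str],                # all instances of source 1
    all2: Set[str],                # all instances of source 2
    links: Set[Tuple[str, str]],   # owl:sameAs pairs (id in source 1, id in source 2)
    is_seed: bool = False,
    previous: Optional[Tuple[Set[str], Set[str]]] = None,  # parent's (ext1, ext2)
) -> Optional[str]:
    """Return why a hypothesis would be pruned, or None to keep exploring it."""
    # Rule 1: a seed restriction covering its whole source is uninformative.
    if is_seed and (ext1 == all1 or ext2 == all2):
        return "covers all instances"
    # Rule 2: count the linked instance pairs that support the hypothesis.
    support = sum(1 for a, b in links if a in ext1 and b in ext2)
    if support < MIN_SUPPORT:
        return "insufficient support"
    # Rule 3: the newly added constraint left both extensions unchanged.
    if previous is not None and (ext1, ext2) == previous:
        return "no change in extension"
    return None

all1 = {"a1", "a2", "a3"}
all2 = {"b1", "b2", "b3"}
links = {("a1", "b1"), ("a2", "b2"), ("a3", "b3")}
print(prune_reason(all1, {"b1"}, all1, all2, links, is_seed=True))
```

A hypothesis whose r2 is empty has no supporting pairs, so in this sketch the "empty set" case from the figure falls under rule 2.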

slide-20
SLIDE 20
  • 4. Lexicographic ordering

Lexicographic ordering provides a systematic search by pruning hypotheses with reverse order

Figure: hypotheses pairing r1 with r2, with one branch marked Prune:
  • (p5=v5) (p8=v8)
  • (p5=v5 & p6=v6) (p8=v8)
  • (p5=v5 & p7=v7) (p8=v8)
  • (p5=v5 & p6=v6 & p7=v7) (p8=v8)
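One common way to get this effect is to extend a restriction only with constraints that sort after the last one already present, so each conjunction is reached along exactly one path. The sketch below assumes that reading of the slide; the function name `expansions` and the tuple representation are hypothetical.

```python
from typing import List, Tuple

Constraint = Tuple[str, str]  # (property, value)

def expansions(
    restriction: Tuple[Constraint, ...], candidates: List[Constraint]
) -> List[Tuple[Constraint, ...]]:
    """Extend a restriction only with constraints ordered after its last one."""
    last = restriction[-1] if restriction else None
    return [
        restriction + (c,)
        for c in sorted(candidates)
        if last is None or c > last
    ]

candidates = [("p5", "v5"), ("p6", "v6"), ("p7", "v7")]

# From (p5=v5 & p6=v6) the search may still add p7=v7.
from_p5_p6 = expansions((("p5", "v5"), ("p6", "v6")), candidates)
# From (p5=v5 & p7=v7) adding p6=v6 would be in reverse order, so it is skipped.
from_p5_p7 = expansions((("p5", "v5"), ("p7", "v7")), candidates)
```

With this rule, (p5=v5 & p6=v6 & p7=v7) is generated once, from (p5=v5 & p6=v6), and never a second time from (p5=v5 & p7=v7).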

slide-21
SLIDE 21

Relaxed Scoring

  • Compensates for missing and inconsistent information in the data
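A relaxed score can be sketched by replacing strict set equality and containment with overlap ratios checked against a threshold. The threshold value 0.9, the function name `relaxed_relation`, and the returned labels are assumptions for illustration; the slide does not specify the scoring function.

```python
from typing import Set

RELAXED = 0.9  # hypothetical threshold, not taken from the slides

def relaxed_relation(image1: Set[str], ext2: Set[str], threshold: float = RELAXED) -> str:
    """Classify an alignment while tolerating a small fraction of mismatches.

    image1 is the extension of r1 mapped into source 2 through owl:sameAs links.
    """
    if not image1 or not ext2:
        return "no evidence"
    overlap = len(image1 & ext2)
    covered1 = overlap / len(image1)  # share of r1's instances also in r2
    covered2 = overlap / len(ext2)    # share of r2's instances also in r1
    if covered1 >= threshold and covered2 >= threshold:
        return "equivalent"
    if covered1 >= threshold:
        return "r1 subset of r2"
    if covered2 >= threshold:
        return "r2 subset of r1"
    return "no alignment"

# 19 of 20 instances agree on each side: not strictly equal, but close enough here.
image1 = {f"d{i}" for i in range(20)}
ext2 = {f"d{i}" for i in range(19)} | {"x"}
print(relaxed_relation(image1, ext2))
```

A single missing link or mistyped value would break a strict equivalence test, while the relaxed version still reports the alignment.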
slide-22
SLIDE 22

Post-processing: Removing Implied Alignments

Keep the simpler definition & Remove the implied definition

slide-23
SLIDE 23

Removing Implied Alignments

r1 r’1 r2 r’2 Cascading
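One possible reading of "implied" is that a subset alignment follows from a simpler one through the lattice: if r1 is a subset of r2, then any r'1 that adds constraints to r1 is also a subset of any r'2 that drops constraints from r2. The sketch below implements only that reading; the names and the representation of alignments are hypothetical, and the slides do not confirm that this is the exact rule used.

```python
from typing import FrozenSet, List, Tuple

Restriction = FrozenSet[Tuple[str, str]]
Alignment = Tuple[Restriction, Restriction]  # read as "first is a subset of second"

def remove_implied(subsets: List[Alignment]) -> List[Alignment]:
    """Keep only subset alignments that no simpler alignment already implies."""
    kept = []
    for r1p, r2p in subsets:
        implied = any(
            (r1, r2) != (r1p, r2p)
            and r1 <= r1p   # r1p adds constraints to r1, so its extension is smaller
            and r2p <= r2   # r2p drops constraints from r2, so its extension is larger
            for r1, r2 in subsets
        )
        if not implied:
            kept.append((r1p, r2p))
    return kept

feature: Restriction = frozenset({("rdf:type", "Feature")})
school: Restriction = feature | {("featureCode", "S.SCH")}
place: Restriction = frozenset({("rdf:type", "dbpedia:Place")})

# "school subset of place" follows from "feature subset of place", so it is dropped.
result = remove_implied([(feature, place), (school, place)])
```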

slide-24
SLIDE 24

Results: Geospatial Domain

slide-25
SLIDE 25

Results: Zoology Domain

slide-26
SLIDE 26

Results: Genetics Domain

slide-27
SLIDE 27

Results: Alignments Found

  • Equivalences, Subset alignments before and after removing implied alignments

slide-28
SLIDE 28

Datasets: http://www.isi.edu/integration/data/LinkedData

slide-29
SLIDE 29

Related Work

  • Euzenat et al. – Ontology Matching
      • Terminological
      • Structural
      • Semantic
  • FCA-Merge, Duckham et al.
      • Use extensional techniques
  • GLUE
      • Uses an extensional technique after performing machine learning operations
slide-30
SLIDE 30

Conclusion

  • Our algorithm generates alignments consisting of conjunctions of restriction classes
  • Extensional approach on Linked Data
  • Use of restriction classes
  • Alignments based on the actual data
  • We determine the relationships based on the data
  • Schemas of linked sources can be readily modeled and used
  • Algorithm is also able to
      • Specialize ontologies where the originals were rudimentary
      • Find a complementary hierarchy across an ontology
slide-31
SLIDE 31

Future Work

  • How to actually understand these alignments
  • Scalability
      • Pre-processing of the sources
      • Faster alignment processing