INCMAP: A JOURNEY TOWARDS ONTOLOGY-BASED DATA INTEGRATION



SLIDE 1

INCMAP: A JOURNEY TOWARDS ONTOLOGY-BASED DATA INTEGRATION

CHRISTOPH PINKEL (MAIN AUTHOR), CARSTEN BINNIG, ERNESTO JIMENEZ-RUIZ, EVGENY KHARLAMOV, ET AL.

SLIDE 2

EXPLORING DATABASES CAN BE TEDIOUS…

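Figure: three sources (DBLP, CMT, EASYCHAIR), each with its own schema (Schema 1, Schema 2, Schema 3). Answering "Author of paper with title ‘IncMap’?" requires a separate SQL query per source (SQL 1, SQL 2, SQL 3).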

SLIDE 3

PROBLEM 1: TOO MANY TABLES

Figure: the question "Author of paper with title ‘IncMap’?" posed against a large collection of tables, each with columns Id, Name, …

A typical SAP schema has more than 10,000 tables

SLIDE 4

PROBLEM 2: LIMITED EXPRESSIVENESS

Ontology

  • Classes: Person, with sub-classes Author and Reviewer
  • Properties: name (domain Person), e-mail (domain Author), area (domain Reviewer)

Relational Schema (Options 1 to 3): three alternative layouts for the same ontology

  • One table per sub-class:

    Author      aid  name      e-mail
                1    Lennon    a@b

    Reviewer    rid  name      area
                1    Harrison  Onto

  • One table per class, sharing pid:

    Person      pid  name
                1    Lennon
                2    Harrison

    Author      pid  e-mail
                1    a@b

    Reviewer    pid  area
                2    Onto

  • Single table with a type column:

    pid  name      e-mail  area  type
    1    Lennon    a@b     -     author
    2    Harrison  -       Onto  reviewer

Modeling generalization is “messy”
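To make the "messy" part concrete, the sketch below loads the three layouts from this slide into an in-memory SQLite database and asks the same question ("names of all authors") against each. The table suffixes _A, _B and _C and the letter keys are hypothetical names introduced only for this illustration; they are not the slide's option numbers.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
-- Layout A (hypothetical name): one table per sub-class
CREATE TABLE Author_A   (aid INTEGER PRIMARY KEY, name TEXT, email TEXT);
CREATE TABLE Reviewer_A (rid INTEGER PRIMARY KEY, name TEXT, area TEXT);
INSERT INTO Author_A   VALUES (1, 'Lennon', 'a@b');
INSERT INTO Reviewer_A VALUES (1, 'Harrison', 'Onto');

-- Layout B (hypothetical name): one table per class, sharing pid
CREATE TABLE Person_B   (pid INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE Author_B   (pid INTEGER, email TEXT);
CREATE TABLE Reviewer_B (pid INTEGER, area TEXT);
INSERT INTO Person_B   VALUES (1, 'Lennon'), (2, 'Harrison');
INSERT INTO Author_B   VALUES (1, 'a@b');
INSERT INTO Reviewer_B VALUES (2, 'Onto');

-- Layout C (hypothetical name): single table with a type column
CREATE TABLE Person_C (pid INTEGER PRIMARY KEY, name TEXT,
                       email TEXT, area TEXT, type TEXT);
INSERT INTO Person_C VALUES (1, 'Lennon', 'a@b', NULL, 'author'),
                            (2, 'Harrison', NULL, 'Onto', 'reviewer');
""")

# The same information need requires a differently shaped query per layout.
queries = {
    "A": "SELECT name FROM Author_A",
    "B": "SELECT p.name FROM Person_B p JOIN Author_B a ON a.pid = p.pid",
    "C": "SELECT name FROM Person_C WHERE type = 'author'",
}
author_names = {key: [row[0] for row in con.execute(sql)]
                for key, sql in queries.items()}
print(author_names)
```

All three queries return the same author, but a mapping tool has to recognise three different structures as the same Author class.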

SLIDE 5

PROBLEM 3: TECHNICAL DESIGN

Example table names: BDC_IXN_FACT_MA, BDC_ACCOUNT_DIM, BDC_DEMOGRAPHICS_DIM, BDC_IXN_FACT_WA

Other issues:

  • De-normalization (i.e., merged tables)
  • No foreign keys!
  • Performance optimizations (horizontal, vertical fragmentation, …)

SLIDE 6

ONTOLOGY-BASED DATA ACCESS

Figure: a single HIGH-LEVEL QUERY ("Author of paper with title ‘IncMap’?") is posed against the ontology; the ontology-based data access layer answers it through SQL 1, SQL 2 and SQL 3 on DBLP, CMT and EASYCHAIR.

Ontology: Person with sub-classes Author and Reviewer; properties name, e-mail and area

Minimal Ontology (in OWL QL)

SLIDE 7

ONTOLOGY-BASED DATA ACCESS

Figure: which mapping ("Mapping?") connects the Relational Schema to the Ontology (Person with sub-classes Author and Reviewer; properties name, e-mail and area)?

IncMap: A Mapping Tool for Relational-To-Ontology Data Integration

SLIDE 8

THE JOURNEY OF INCMAP

First version of IncMap

  • Incremental mapping
  • Leverage lexicographical and structural similarity

Christoph Pinkel, et al.: Pay as you go Matching of Relational Schemata to OWL Ontologies with IncMap. International Semantic Web Conference 2013
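As a rough illustration of the lexical side of that similarity, the sketch below splits identifiers into tokens and compares them with Python's standard `difflib.SequenceMatcher`. The slides do not say which string measure IncMap uses, so the measure, the `tokens` helper and the `lexical_similarity` function are all assumptions made for this example.

```python
import re
from difflib import SequenceMatcher


def tokens(name: str) -> list[str]:
    """Split identifiers such as 'PersID' or 'has_title' into lowercase tokens."""
    parts = re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", name)
    return [part.lower() for part in parts]


def lexical_similarity(a: str, b: str) -> float:
    """Similarity in [0, 1] between two names (stand-in measure, not IncMap's)."""
    return SequenceMatcher(None, " ".join(tokens(a)), " ".join(tokens(b))).ratio()


# Compare a column name from the relational side with ontology property names.
for candidate in ("hasTitle", "writes"):
    print("Title vs", candidate, round(lexical_similarity("Title", candidate), 2))
```

Scores like these would only seed the matching; the structural part is what the later graph-based steps add.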

SLIDE 9

THE JOURNEY OF INCMAP

First version of IncMap

  • Incremental mapping
  • Leverage lexicographical and structural similarity

Second version of IncMap

  • Consider typical design patterns
  • Leverage reasoning (open vs. closed-world)
  • Bootstrap mappings (fully automatic)

Christoph Pinkel, Carsten Binnig, Ernesto Jiménez-Ruiz, Evgeny Kharlamov, Andriy Nikolov, Andreas Schwarte, Christian Heupel, Tim Kraska: IncMap: A Journey towards Ontology-based Data Integration. BTW 2017

SLIDE 10

STEP 1: MAPPING TO INCGRAPHS

Relational Schema R: Person (ID, ...), Paper (Title, PersID (FK), ...)

Ontology O: classes Person, Author (subClassOf Person) and Paper; object property writes (domain Author, range Paper); datatype property hasTitle (domain Paper)

IncGraph(R): nodes Person, Paper, ID, PersID, Title, varchar; edge labels ref, val, type

IncGraph(O): nodes Person, Author, Paper, writes, hasTitle, string; edge labels ref, val, type, subClassOf

Main Reason: Mitigate structural differences
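A minimal sketch of how a relational schema could be turned into a labelled graph with the edge labels named on this slide (ref, val, type). The `Table` class, the `inc_graph_r` function, the edge directions and the `int` type of the ID column are assumptions for illustration; the slide shows only the resulting node and edge labels, not the construction rules.

```python
from dataclasses import dataclass, field


@dataclass
class Table:
    name: str
    columns: dict[str, str]  # column name -> SQL type
    foreign_keys: dict[str, str] = field(default_factory=dict)  # column -> referenced table


def inc_graph_r(tables: list[Table]) -> set[tuple[str, str, str]]:
    """Return labelled edges (source node, label, target node)."""
    edges = set()
    for table in tables:
        for column, sql_type in table.columns.items():
            node = f"{table.name}.{column}"
            if column in table.foreign_keys:
                # Foreign keys become 'ref' paths: table -> column -> referenced table.
                edges.add((table.name, "ref", node))
                edges.add((node, "ref", table.foreign_keys[column]))
            else:
                # Plain columns hang off their table via 'val' and carry a 'type'.
                edges.add((table.name, "val", node))
                edges.add((node, "type", sql_type))
    return edges


schema = [
    Table("Person", {"ID": "int"}),  # 'int' is an assumed type
    Table("Paper", {"Title": "varchar", "PersID": "int"}, {"PersID": "Person"}),
]
graph = inc_graph_r(schema)
for edge in sorted(graph):
    print(edge)
```

Building the ontology side with the same three labels is what lets the later steps compare the two graphs edge by edge.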

SLIDE 11

STEP 2: REASONING AND PATTERNS

Figure: IncGraph(R) is extended to IncGraph+(R) by patterns, and IncGraph(O) is extended to IncGraph+(O) by reasoning.

Pattern: Inheritance (adds a multi-type element to IncGraph(R)), shown for a single Person table with a type column:

    pid  name      e-mail  area  type
    1    Lennon    a@b     -     author
    2    Harrison  -       Onto  reviewer

Reasoning: applied to IncGraph(O) (nodes Author, Person, Paper, writes, hasTitle, string; edge labels ref, val, type, subClassOf)

SLIDE 12

REASONING: TWO OPTIONS

Option 1: Full reasoning

  1. Reasoning on the base ontology using OWL QL
  2. Add all derivable elements to IncGraph(O)

Option 2: Custom reasoning (to close “modeling gaps”)

  1. Reasoning on the IncGraph(O)
     • Generalization hierarchies
     • Additional domain and range information
  2. Add selected elements to IncGraph(O) and set weights (see next slides)
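One way to read Option 2 is sketched below: walk the generalization hierarchy and derive, for each sub-class, the properties whose domain is one of its super-classes, attaching a weight to every derived element. The function names, the dictionary encoding and the weight of 0.5 are hypothetical; the slides do not specify which elements are selected or how they are weighted.

```python
def superclasses(cls: str, subclass_of: dict[str, list[str]]) -> set[str]:
    """All direct and indirect super-classes of cls."""
    seen, stack = set(), [cls]
    while stack:
        for parent in subclass_of.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def inherited_domains(domains: dict[str, str],
                      subclass_of: dict[str, list[str]],
                      weight: float = 0.5) -> dict[tuple[str, str], float]:
    """Derive (property, sub-class) pairs from super-class domains, with a weight."""
    derived = {}
    for cls in subclass_of:
        for parent in superclasses(cls, subclass_of):
            for prop, domain in domains.items():
                if domain == parent:
                    derived[(prop, cls)] = weight  # weight value is an assumption
    return derived


subclass_of = {"Author": ["Person"], "Reviewer": ["Person"]}
domains = {"name": "Person", "e-mail": "Author", "area": "Reviewer"}
extra = inherited_domains(domains, subclass_of)
print(extra)
```

For the running example this adds name as a weighted candidate property of both Author and Reviewer, which is the kind of "modeling gap" a relational table with a name column per sub-class would otherwise leave unmatched.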

SLIDE 13

STEP 3: PAIRWISE MATCHING

Figure: Source elements from the relational graph (Person, PersID, Paper, …) are paired with Target elements from the ontology graph (Author, writes, Paper, …). Each pair is a Possible Match and becomes a node of the graph named below, annotated with an initial similarity score (values shown: 1.0, 0.1, 0.2, 0.1, 0.1, 0.5, 0.2, 0.5, 0.2).

Pairwise Connectivity Graph
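The construction can be sketched in the style of the original Similarity Flooding algorithm: two pair nodes are connected whenever both graphs contain an edge with the same label between the corresponding elements. This is the textbook construction, assumed here for illustration; IncMap's extended variant may build the graph differently, and the function name and edge encoding are hypothetical.

```python
Edge = tuple[str, str, str]  # (source node, label, target node)


def pairwise_connectivity_graph(source_edges: set[Edge], target_edges: set[Edge]):
    """Connect (s1, t1) to (s2, t2) when s1 -l-> s2 and t1 -l-> t2 share label l."""
    pcg = set()
    for s1, source_label, s2 in source_edges:
        for t1, target_label, t2 in target_edges:
            if source_label == target_label:
                pcg.add(((s1, t1), source_label, (s2, t2)))
    return pcg


# Simplified fragments of the two graphs from the running example.
relational = {("Paper", "ref", "PersID"), ("PersID", "ref", "Person"),
              ("Paper", "val", "Title")}
ontology = {("Author", "ref", "writes"), ("writes", "ref", "Paper"),
            ("Paper", "val", "hasTitle")}

pcg = pairwise_connectivity_graph(relational, ontology)
for edge in sorted(pcg):
    print(edge)
```

Note that structurally compatible but wrong pairs, such as (Paper, Author), also appear as nodes; the scores computed in the next step are what separate them from the right ones.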

SLIDE 14

STEP 4: FIXPOINT COMPUTATION

  • Human Input (Acceptance and Rejection of Mappings)
  • Weights for Patterns (Probability of Pattern)
  • Deactivation of Edges (based on Patterns)

Figure: Fixpoint Computation (Ext. Similarity Flooding) on the Pairwise Connectivity Graph. The pair nodes start from their initial scores (1.0, 0.1, 0.2, 0.1, 0.1, 0.5, 0.2, 0.5, 0.2). Further values shown on the graph: 0.7, 0.5, 0.9, 0.3, 0.3, 0.3; values shown at the sub-class element: 0.9, 1.0, 1.0, 1.0.
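A minimal sketch of a similarity-flooding style fixpoint, assuming a basic propagation rule: each pair keeps its initial score, adds its current score and receives an equal share from every neighbour, after which all scores are normalised by the maximum. Human input is modelled by pinning accepted pairs to 1.0 and rejected pairs to 0.0. Pattern weights and edge deactivation from the slide are not modelled, and the function name, parameters and propagation rule are assumptions, not IncMap's implementation.

```python
def similarity_flooding(pcg_edges, initial, accepted=(), rejected=(),
                        max_iter=100, eps=1e-4):
    """Iterate score propagation over pair nodes until the scores stabilise."""
    neighbours = {}
    for a, _label, b in pcg_edges:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    nodes = set(neighbours) | set(initial) | set(accepted) | set(rejected)
    sigma0 = {n: initial.get(n, 0.0) for n in nodes}

    def pin(scores):
        # Human feedback overrides computed scores (assumed handling).
        for n in accepted:
            scores[n] = 1.0
        for n in rejected:
            scores[n] = 0.0
        return scores

    sigma = pin(dict(sigma0))
    for _ in range(max_iter):
        nxt = {}
        for n in nodes:
            flow = sum(sigma[m] / len(neighbours[m]) for m in neighbours.get(n, ()))
            nxt[n] = sigma0[n] + sigma[n] + flow
        top = max(nxt.values()) or 1.0  # avoid dividing by zero
        nxt = pin({n: value / top for n, value in nxt.items()})
        delta = max(abs(nxt[n] - sigma[n]) for n in nodes)
        sigma = nxt
        if delta < eps:
            break
    return sigma


edges = [(("Paper", "Paper"), "val", ("Title", "hasTitle")),
         (("Paper", "Author"), "ref", ("PersID", "writes"))]
initial = {("Paper", "Paper"): 1.0, ("Title", "hasTitle"): 0.7,
           ("Paper", "Author"): 0.1, ("PersID", "writes"): 0.1}
scores = similarity_flooding(edges, initial)
pinned = similarity_flooding(edges, initial, rejected=[("Title", "hasTitle")])
```

In an incremental setting the computation would be re-run after each accepted or rejected suggestion, so feedback on one pair shifts the ranking of its neighbours.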

SLIDE 15

EVALUATION: RODI BENCHMARK

Target Ontologies (Schema)

  • Conference ontology 1, Conference ontology 2 (CMT, Conference, SIGKDD)
  • Geodata ontology
  • Oil & gas ontology (Real-World)

Source Databases (Schema+Data)

  • CMT Variant, CMT Canon., …
  • Conf. Variant, Conf. Canon., …
  • Mond. Variant, Mond. Rel., …
  • Single, large real-world schema

Mapping Rules? (between each source database and its target ontology)

Variants:

  1. Adjusted Naming
  2. Structural Adjustments (e.g., hierarchies)
  3. Removed foreign keys
  4. Merging / Splitting of tables
  5. Combined cases

Christoph Pinkel, Carsten Binnig, Ernesto Jiménez-Ruiz, Wolfgang May, Dominique Ritze, Martin G. Skjæveland, Alessandro Solimando, Evgeny Kharlamov: RODI: A Benchmark for Automatic Mapping Generation in Relational-to-Ontology Data Integration. ESWC 2015

https://github.com/chrpin/rodi

SLIDE 16

EVALUATION: RODI BENCHMARK

Evaluation queries:

  • Queries simulate information need
  • Can be additional input for mapping
  • 56 queries from simple to complex

Metric: per-query F-measure
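For reference, a per-query F-measure can be computed from the expected and the returned result tuples as the harmonic mean of precision and recall. The sketch below uses plain set comparison and scores two empty result sets as 1.0; both choices are assumptions, since the slide does not describe how RODI compares result tuples.

```python
def f_measure(expected, returned) -> float:
    """Harmonic mean of precision and recall over result tuples."""
    expected, returned = set(expected), set(returned)
    if not expected and not returned:
        return 1.0  # assumed convention: nothing expected, nothing returned
    true_positives = len(expected & returned)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(returned)
    recall = true_positives / len(expected)
    return 2 * precision * recall / (precision + recall)


# Hypothetical result sets for one evaluation query.
reference = {("Lennon",), ("Harrison",), ("Starr",)}
answer = {("Harrison",), ("Starr",), ("Best",)}
score = f_measure(reference, answer)
print(round(score, 3))
```

Here two of three expected tuples are found and one returned tuple is wrong, so precision and recall are both 2/3 and so is the F-measure.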

SLIDE 17

EVALUATION: COMPETITORS

Relational-to-Ontology Mapping Systems

  • Ontop: http://ontop.inf.unibz.it (Free University of Bozen-Bolzano)
  • BootOX: https://www.cs.ox.ac.uk/isg/tools/BootOX/ (University of Oxford)

General Mapping Systems (Baseline)

  • COMA++: http://dbs.uni-leipzig.de/de/Research/coma.html (University of Leipzig)

SLIDE 18

EVALUATION: RESULTS

SLIDE 19

EVALUATION: RESULTS

SLIDE 20

CONCLUSIONS

  • Incremental Mapping Generation for Relational-to-Ontology Mappings
  • Most benefits from domain knowledge (patterns, reasoning)
  • Integrated into real-world platform at fluidOps
  • Possible future directions: Patterns, other graph similarity metrics, …