KDI A Methodology for Data Integration Fausto Giunchiglia and - - PowerPoint PPT Presentation




slide-1
SLIDE 1

Fausto Giunchiglia and Mattia Fumagalli

University of Trento

KDI A Methodology for Data Integration

1

slide-2
SLIDE 2

  • Overview of the Model
  • Generalized Queries
  • Etypes Model
  • Evaluation
  • Case Studies

2

slide-3
SLIDE 3

Overview of the Model

  • Components of the Model
  • “Data wrangling”

3

slide-4
SLIDE 4

Components of the Model

[Diagram: KDI Methodology. Labels: Generalized Query, Standards, Datasets, Language, Schema, MODEL, Application]

slide-5
SLIDE 5

Components of the Model

[Diagram: KDI Methodology, annotated with examples]

  • Generalized Query: GQ 1, 2, 3, ..., n
  • Standards: INSPIRE, GTFS, SIRI
  • Datasets: Open Street Map, Open Data Trentino, European Open Data Portal
  • Language: En, It, Hi
  • Schema
  • Ontological principles
  • MODEL
  • Application

slide-6
SLIDE 6

“Data wrangling”

[Diagram labels: Standard, De facto Standard, Technical Standard; Reference Datasets, Dataset1, Dataset2; Relevant Application, Pilot, Application 1, Application 2]

6

slide-7
SLIDE 7

Generalized Queries

  • Application Scenario
  • Identify the Concepts
  • Queries Collection Mechanism

7

slide-8
SLIDE 8

Application Scenario

Choose the application scenario, for example Tourism or Transport.

8

slide-9
SLIDE 9

Generalized Queries

Start with a set of ground queries: given the application scenario, a set of queries will arise that place demands on the underlying ontology.

  • Example ground query: Give a list of all the hotels in city X that have facilities for disabled people.
  • Identification of the general query pattern:

Give me all X in Y AND WHERE property is True

  • Identification of concepts and properties:

Entities: Hotel, City
Properties: Hotel.name, City.name, facilityForDisable (Boolean)
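The generalized query pattern on this slide can be sketched in Python as a filter over typed entities. This is only an illustration, not part of the KDI methodology itself: the `Entity` class, the `located_in` field, and the sample hotels are hypothetical names introduced for the example, while `facilityForDisable` is the property named on the slide.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Entity:
    etype: str                      # entity type, e.g. "Hotel" (the X in the pattern)
    name: str
    located_in: Optional[str] = None  # hypothetical field for the Y in the pattern
    properties: dict = field(default_factory=dict)


def generalized_query(entities, x, y, prop):
    """Give me all X in Y where the Boolean property `prop` is True."""
    return [
        e for e in entities
        if e.etype == x and e.located_in == y and e.properties.get(prop) is True
    ]


# Made-up sample data, for illustration only.
entities = [
    Entity("Hotel", "Hotel A", "Trento", {"facilityForDisable": True}),
    Entity("Hotel", "Hotel B", "Trento", {"facilityForDisable": False}),
    Entity("Hotel", "Hotel C", "London", {"facilityForDisable": True}),
    Entity("Restaurant", "Restaurant D", "Trento", {"facilityForDisable": True}),
]

result = generalized_query(entities, "Hotel", "Trento", "facilityForDisable")
print([e.name for e in result])
```

Binding X, Y and the property as parameters is what turns a single ground query into a reusable pattern: the same function answers "all restaurants in London with Wi-Fi" once those concepts and properties exist in the data.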

9

slide-10
SLIDE 10

Identify the Concepts

Identify all the core concepts which are needed to answer the generalized queries.

[Concept cloud: Driver, Location, House No, Wheelchair Accessibility, Hotel, Recipe, Restaurant, Movie, Agency, Cold, Mountain, Weather, Train, Date, Address Country, Wi-Fi, Party, Dinner, Elevator, Price, Ticket, Road, Building, Statue, Bus, Speed, Trip]

10

slide-11
SLIDE 11

Queries Collecting Mechanism

Query generation methodology:

  • 1. via a user study, for instance via questionnaires or focus groups
  • 2. via a benchmarking analysis of existing sites and data
  • 3. heuristically, based on the developer's understanding of the domain
  • 4. from datasets (see RapidMiner tree example… see also http://quepy.machinalis.com/)
  • 5. a combination of the above

11

slide-12
SLIDE 12

EER Model

  • Schema Level
  • Language Level

12

slide-13
SLIDE 13

Schema

Schema Level

13

slide-14
SLIDE 14

Schema Level

Schema Example

[Diagram labels: entity The Plaza; etypes Hotel, Building, Country; attributes Date of construction, Status, AddressCountry; values 1907, Functioning; edge labels IS_A, Attribute, ValueOf]
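A minimal sketch of how the schema example could be represented in Python. The slide's diagram is not recoverable here, so the wiring is an assumption: `Hotel` IS_A `Building`, the three attributes are declared on `Building`, and `The Plaza` is an entity of type `Hotel`. The class names `EType` and `EntityInstance` and the camel-case attribute keys are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class EType:
    name: str
    parent: Optional["EType"] = None                 # IS_A link to the more general etype
    attributes: dict = field(default_factory=dict)   # attribute name -> expected Python type

    def all_attributes(self) -> dict:
        """Attributes declared here plus those inherited through IS_A."""
        inherited = self.parent.all_attributes() if self.parent is not None else {}
        return {**inherited, **self.attributes}

    def is_a(self, other: "EType") -> bool:
        """True if `other` is this etype or one of its ancestors."""
        node = self
        while node is not None:
            if node is other:
                return True
            node = node.parent
        return False


@dataclass
class EntityInstance:
    name: str
    etype: EType
    values: dict = field(default_factory=dict)       # attribute name -> value (ValueOf)

    def set_value(self, attribute: str, value) -> None:
        expected = self.etype.all_attributes().get(attribute)
        if expected is None:
            raise KeyError(f"{self.etype.name} has no attribute {attribute!r}")
        if not isinstance(value, expected):
            raise TypeError(f"{attribute!r} expects {expected.__name__}")
        self.values[attribute] = value


# Assumed wiring of the slide's labels.
building = EType("Building", attributes={
    "dateOfConstruction": int,
    "status": str,
    "addressCountry": str,
})
hotel = EType("Hotel", parent=building)

the_plaza = EntityInstance("The Plaza", hotel)
the_plaza.set_value("dateOfConstruction", 1907)
the_plaza.set_value("status", "Functioning")

print(hotel.is_a(building), the_plaza.values)
```

Declaring attributes on the most general etype that owns them keeps the schema small: every subtype of `Building` inherits them without redefinition.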

14

slide-15
SLIDE 15

ER Model (example)

15

slide-16
SLIDE 16

Hotel Country

ER Model and Relational Database (example)

16

slide-17
SLIDE 17

Hotel Country

EER Model (example)

17

slide-18
SLIDE 18

Schema Level

Alignment with Upper Ontology and Classification

[Diagram, upper ontology labels: Physical Place Artifact Social Object Event Property Mental Object]

[Diagram, concepts to classify: Location, Wi-Fi, Hotel, Recipe, Restaurant, Movie, Agency, Cold, Mountain, Weather, Train, Address Country, Dinner, Party, Elevator, Price, Ticket, Road, Building, Statue, Bus, Trip, House No, Wheelchair Accessibility]

18

slide-19
SLIDE 19

Ontology Design

Formal Modelling

Schema Level

19

slide-20
SLIDE 20

Schema Level

Issue_1: Attributes and DataProperties

[Diagram labels: AddressCountry, Address, Wi-Fi (Yes, No); Complex, Simple]

20

slide-21
SLIDE 21

Schema Level

Issue_2: Relation and ObjectProperties

[Diagram labels: Hotel (The Plaza), City (New York), Country (USA); AddressCountry, PartOf]
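The difference between the two issues can be illustrated with a small Python sketch: a data property holds a literal value (such as Wi-Fi: yes or no), while an object property links one entity to another, so it can be navigated. The direction of the links is an assumption, since the slide's diagram is not recoverable: here `AddressCountry` points from the hotel to the country and `PartOf` from the city to the country. The `Node` class, the `parts_of` helper, and the Wi-Fi value are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    etype: str
    name: str
    data: dict = field(default_factory=dict)       # data properties: name -> literal value
    relations: dict = field(default_factory=dict)  # object properties: name -> Node


usa = Node("Country", "USA")
new_york = Node("City", "New York", relations={"PartOf": usa})
the_plaza = Node(
    "Hotel",
    "The Plaza",
    data={"Wi-Fi": True},                 # simple attribute with a literal value (assumed)
    relations={"AddressCountry": usa},    # relation to another entity
)


def parts_of(whole, nodes):
    """Names of all nodes that are directly PartOf the given node."""
    return [n.name for n in nodes if n.relations.get("PartOf") is whole]


# Because AddressCountry is an object property, its value is an entity
# that can itself be queried, unlike the literal string "USA".
country = the_plaza.relations["AddressCountry"]
print(country.name, parts_of(country, [usa, new_york, the_plaza]))
```

Had `AddressCountry` been stored as the string "USA", the second lookup would need string matching instead of following a link, which is the practical cost of modelling a relation as a plain attribute.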

21

slide-22
SLIDE 22

Language

Language

Language Level

22

slide-23
SLIDE 23

Language Level

Language (En)

  • Freeway IS_A Highway
  • Synonyms: expressway, freeway, motorway, pike, state highway, superhighway
  • Gloss: a broad highway designed for high-speed traffic

[Diagram labels: 17, IS_A, synonym, hyponym]
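One way to picture the language level is as a lexicon attached to language-independent concepts: each concept carries its words and gloss per language, and synonyms resolve to the same concept. The sketch below is an assumption about how this could be coded, not the authors' implementation; the `Concept` class, the `lookup` helper, and the concept identifiers are hypothetical, while the English words and gloss are the ones on the slide.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Concept:
    concept_id: int                             # hypothetical identifier
    parent: Optional["Concept"] = None          # IS_A link (this concept is a hyponym of parent)
    words: dict = field(default_factory=dict)   # language code -> list of synonyms
    gloss: dict = field(default_factory=dict)   # language code -> natural-language definition


def lookup(concepts, word, lang):
    """Return the first concept that lists `word` among its synonyms in `lang`."""
    needle = word.lower()
    for concept in concepts:
        if needle in concept.words.get(lang, []):
            return concept
    return None


highway = Concept(1, words={"en": ["highway"]})
freeway = Concept(
    2,
    parent=highway,
    words={"en": ["expressway", "freeway", "motorway", "pike",
                  "state highway", "superhighway"]},
    gloss={"en": "a broad highway designed for high-speed traffic"},
)
concepts = [highway, freeway]

hit = lookup(concepts, "Motorway", "en")
print(hit.concept_id, hit.gloss["en"], hit.parent.words["en"])
```

Keeping words and glosses keyed by language code means further languages can be added to an existing concept without touching the schema-level IS_A structure.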

23

slide-24
SLIDE 24

Evaluation

  • Inconsistency check
  • Incompleteness check

24

slide-25
SLIDE 25
Evaluation of Ontological Model

  • Inconsistency
      • circularity errors: [e.g. Traveler subClassOf Person; Person subClassOf Traveler]
      • semantic inconsistency errors: [e.g. Airbus or Waterbus subClassOf Bus]
      • partition errors: [e.g. NonStopFlight subClassOf both InternationalFlight and DomesticFlight, where InternationalFlight and DomesticFlight are disjoint]
  • Incompleteness: e.g. in the travel domain, classifying only beach and mountain locations and not considering cultural heritage sites
  • Redundancy
      • identical formal definitions of some classes
      • identical formal definitions of instances
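Two of the inconsistency checks above, circularity and partition errors, are mechanical enough to sketch in Python over a plain subclass graph. This is a simplified illustration under stated assumptions, not a replacement for a DL reasoner: the dictionary encoding, the function names, and the extra `Flight` class are hypothetical, and the class names come from the slide's examples.

```python
def ancestors(cls, subclass_of):
    """All classes reachable from `cls` by following subClassOf links."""
    seen = set()
    stack = list(subclass_of.get(cls, ()))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(subclass_of.get(current, ()))
    return seen


def circularity_errors(subclass_of):
    """Classes that end up being their own ancestor."""
    return sorted(c for c in subclass_of if c in ancestors(c, subclass_of))


def partition_errors(subclass_of, disjoint_pairs):
    """Classes that fall under both members of a disjoint pair."""
    errors = []
    for cls in subclass_of:
        supers = ancestors(cls, subclass_of)
        for a, b in disjoint_pairs:
            if a in supers and b in supers:
                errors.append((cls, a, b))
    return errors


# class -> set of direct superclasses
subclass_of = {
    "Traveler": {"Person"},
    "Person": {"Traveler"},
    "NonStopFlight": {"InternationalFlight", "DomesticFlight"},
    "InternationalFlight": {"Flight"},
    "DomesticFlight": {"Flight"},
}
disjoint_pairs = [("InternationalFlight", "DomesticFlight")]

print(circularity_errors(subclass_of))
print(partition_errors(subclass_of, disjoint_pairs))
```

Semantic inconsistency (Airbus as a subclass of Bus) and incompleteness cannot be detected this way, since they depend on the intended meaning of the terms and on domain coverage rather than on graph structure.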

25

slide-26
SLIDE 26

Case Studies

Evaluation of Methodology Result

26

slide-27
SLIDE 27

Case Studies (example)

Topics:

  • Real Estate
  • Where to eat in Trento
  • Transport in London
  • Tourism in Trento
  • Emergency Response in London
  • Event in Trento
  • Geospatial

27

slide-28
SLIDE 28
Case Studies

Evaluation of Methodology

  • Technique
      • Used a standard Human-Computer Interaction (HCI) technique
      • Open-ended questions mixed with Likert-scale closed questions
      • How: balanced questionnaires
      • Number of participants: 18
  • Participant information
      • Nationalities: Italian, Indian, German, Brazilian, Ukrainian, Ethiopian, Mexican, Ugandan, Cameroonian
      • Gender: Male (13), Female (5)
      • Age range: 18-25 (14), 26-30 (4)
      • Level of education: Undergraduate (3), Postgraduate (15)

28

slide-29
SLIDE 29

Case Studies

Evaluation of Methodology

  • Perspicuity: how easy is it to get familiar with the methodology?
  • Efficiency: how effectively can users perform the process?
  • Dependability: can users control the process?
  • Stimulation: is it exciting and motivating?
  • Novelty: is it innovative and creative?

29

slide-30
SLIDE 30

Case Studies

Results

Pros

  • Well structured
  • Programmatically durable
  • In practice, it allows one to describe the world
  • Provides methods to minimize the distance between the real world and the abstraction
  • Helps find possible defects in the ontology and correct them: taxonomic errors, inconsistencies, reliability

Cons

  • Needs a lot of practice to build something well
  • Needs more time to master
  • Difficult to identify the top-level class to align with
  • Documentation must be written to clarify choices and terms
  • Formalizing DERA to DL

30

slide-31
SLIDE 31

References

  • Data on the Web Best Practices. W3C Recommendation, 31 January 2017. https://www.w3.org/TR/dwbp/
  • Das, S., & Giunchiglia, F. (2016, October). GeoEtypes: Harmonizing Diversity in Geospatial Data (Short Paper). In OTM Confederated International Conferences "On the Move to Meaningful Internet Systems" (pp. 643-653). Springer International Publishing.
  • Hlomani, H., & Stacey, D. (2014). Approaches, methods, metrics, measures, and subjectivity in ontology evaluation: A survey. Semantic Web Journal, 1-5.
  • Giunchiglia, F., & Dutta, B. (2011). DERA: A Faceted Knowledge Organization Framework.
  • Guarino, N., & Welty, C. A. (2009). An overview of OntoClean. In Handbook on ontologies (pp. 201-220). Springer Berlin Heidelberg.
  • Gomez-Perez, A., Fernández-López, M., & Corcho, O. (2006). Ontological Engineering: with examples from the areas of Knowledge Management, e-Commerce and the Semantic Web. Springer Science & Business Media.

31