PABench: Designing a Taxonomy and Implementing a Benchmark for Spatial Entity Matching



SLIDE 1

PABench: Designing a Taxonomy and Implementing a Benchmark for Spatial Entity Matching

  • B. Berjawi, F. Duchateau, F. Favetta, M. Miquel, R. Laurini

GEOProcessing 2015 Lisbon, Portugal

SLIDE 2

Motivation

Multiplication of Points of Interest (POI) and data sources
  • Several Location-Based Services (LBS) providers
  • Incomplete, inconsistent, inaccurate, wrong information

Integration of multiple sources
  • Similarity measures
  • Probability measures
  • Learning-based methods

How to evaluate and compare spatial integration methods?

  • B. Berjawi - PABench: Designing a Taxonomy and Implementing a Benchmark for Spatial Entity Matching
SLIDE 3

Related Work

Ontology matching:
  • Ontology Alignment Evaluation Initiative (OAEI) [1]

Schema matching:
  • XBenchMatch [2]
  • STBenchmark [3]

Entity matching:
  • EMBench [4]

[1]: “Ontology Alignment Evaluation Initiative,” URL: http://oaei.ontologymatching.org
[2]: F. Duchateau and Z. Bellahsene, “Designing a benchmark for the assessment of schema matching tools,” in Open Journal of Databases (OJDB), vol. 1, no. 1. RonPub, Germany, 2014, pp. 3–25.
[3]: B. Alexe, W. C. Tan, and Y. Velegrakis, “Stbenchmark: towards a benchmark for mapping systems,” Proceedings of the VLDB, vol. 1, no. 1, 2008, pp. 230–244.
[4]: E. Ioannou, N. Rassadko, and Y. Velegrakis, “On generating benchmark data for entity matching,” Journal on Data Semantics, vol. 2, no. 1, 2013, pp. 37–56.
SLIDE 4

Related Work

Spatial entity matching:
  • Geoddupe [5]
  • Random-spatial-dataset generator [6]

[5]: H. Kang, V. Sehgal, and L. Getoor, “Geoddupe: A novel interface for interactive entity resolution in geospatial data,” in International Conference on Information Visualisation, 2007, pp. 489–496.
[6]: C. Beeri, Y. Doytsher, Y. Kanza, E. Safra, and Y. Sagiv, “Finding corresponding objects when integrating several geo-spatial datasets,” in ACM International Workshop on Geographic Information Systems, 2005, pp. 87–96.

Need for a spatial entity matching benchmark
PABench: Point of Interest Alignment Benchmark

SLIDE 5

Contributions

Taxonomy of LBS
  • Describe the context of LBS providers
  • Compare the LBS providers
  • Characterize the differences that occur between LBS providers

Benchmark
  • Construct PABench based on the taxonomy characterization
  • Generate a characterized training dataset using real data

SLIDE 6

Outline

Introduction
Related Work
Taxonomy of LBS
  • Preliminary definitions
  • Differences
Benchmark
  • Benchmark construction
  • Datasets
Conclusion and Future Work

SLIDE 7

Taxonomy - Preliminary definitions

POI: geographical object described by a set of properties

POI = (name, type, coordinates, shape)

Schema of provider: structure of entities offered by the provider
  • I: Internal identifier
  • L: Spatial attributes
  • A: Primary terminological
  • B: Secondary terminological

Entity of POI: instance of a schema that refers to one real-world POI

e = {(idk:label, idk:val), (LATITUDEk:label, LATITUDEk:val), … }

SLIDE 8

Taxonomy - Preliminary definitions

Association function f: returns the POI described by a given entity

Corresponding entities: two entities from two distinct providers refer to the same POI (e1 ≡ e2)

∃ p ∈ P \ f (e1) = f (e2) = p

Corresponding attributes: two attributes from two distinct schemas represent the same concept (at1 ≡ at2)
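The definition of corresponding entities can be illustrated with a small sketch. It assumes, hypothetically, that the association function f is available as a lookup table from (provider, entity id) to a POI identifier. In practice f is unknown and is what a matching method tries to approximate. The names `ASSOCIATION` and `corresponding` and the POI identifier are illustrative and do not come from the slides.

```python
from typing import Optional

# Hypothetical association function f, stored as a lookup table:
# (provider, entity id) -> identifier of the real-world POI.
ASSOCIATION = {
    ("provider1", "51190385"): "poi:eiffel-tower",
    ("provider2", "fd0cf424bbd79bf28a832e1764f1c2"): "poi:eiffel-tower",
}

Entity = tuple[str, str]  # (provider, entity id)


def f(entity: Entity) -> Optional[str]:
    """Return the POI described by an entity, or None if it is unknown."""
    return ASSOCIATION.get(entity)


def corresponding(e1: Entity, e2: Entity) -> bool:
    """True when e1 and e2 come from distinct providers and f(e1) = f(e2) = p."""
    p = f(e1)
    return e1[0] != e2[0] and p is not None and p == f(e2)


x = ("provider1", "51190385")
y = ("provider2", "fd0cf424bbd79bf28a832e1764f1c2")
print(corresponding(x, y))
```

Two entities from the same provider are never reported as corresponding here, which mirrors the "two distinct providers" condition of the definition.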

SLIDE 9

Taxonomy - Differences

Category: Difference

Schema:
  • Attribute Heterogeneity
  • Different structure

Terminology:
  • Semantic Different Data (SEM)
  • Syntactic Different Data (SYN)
  • Missing Data (MD)
  • Similar Data (SD)

Spatial:
  • Different locations (DL)
  • Equipollent Positions (EP)
  • Superposition (SUP)

Availability:
  • Not found POI
  • Duplicate Entities

Legend:
  • Differences of corresponding entities
  • Differences of non-corresponding entities

SLIDE 10

Taxonomy - Example

Entity x (offered by provider 1):
  • EntityID: 51190385
  • Latitude: 48,858606
  • Longitude: 2,293971
  • DisplayName: Tour Eiffel
  • EntityTypeID: 7999
  • Phone: 0892701239
  • CountryRegion: FRA
  • Locality: Paris
  • PostalCode: 75007
  • AddressLine: Champ De Mars, Avenue Anatole France
  • ...

Entity y (offered by provider 2):
  • id: fd0cf424bbd79bf28a832e1764f1c2
  • geometry: { location : { lat : 48.85837, lng: 2.294481}}
  • name: Eiffel Tower
  • types: establishment
  • formatted phone number: +33892701239
  • website: http://www.tour-eiffel.fr
  • formatted address: Champ de Mars, 5 Avenue Anatole France, 75007 Paris, France

Attribute Heterogeneity: (ati ≡ atj) ∧ (ati.label ≠ atj.label ∨ ati.type ≠ atj.type)
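A minimal sketch of the attribute heterogeneity check, assuming the definition reads as "two attributes represent the same concept but differ in label or in type". The `Attribute` class, its `concept` field (a stand-in for the correspondence between attributes) and the `"string"` types are assumptions made for illustration; they are not part of the slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Attribute:
    label: str
    type: str     # data type of the attribute, e.g. "string" (assumed)
    concept: str  # hypothetical tag naming the concept the attribute represents


def attribute_heterogeneity(ati: Attribute, atj: Attribute) -> bool:
    """Same concept, but a different label or a different type."""
    same_concept = ati.concept == atj.concept  # stands in for "ati corresponds to atj"
    return same_concept and (ati.label != atj.label or ati.type != atj.type)


display_name = Attribute(label="DisplayName", type="string", concept="name")
name = Attribute(label="name", type="string", concept="name")
print(attribute_heterogeneity(display_name, name))
```

With the labels taken from the example, `DisplayName` (provider 1) and `name` (provider 2) describe the same concept under different labels, so the check reports a heterogeneity.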

SLIDE 12

Taxonomy - Example

Entities x and y are the same as on slide 10.

Different Structure: ati ≡ (at1, at2, …) ∨ (at1, at2, …) ≡ atj

SLIDE 14

Taxonomy - Example

Entity x (offered by provider 1):
  • EntityID: 51190385
  • Latitude: 48,858606
  • Longitude: 2,293971
  • DisplayName: Tour Eiffel
  • EntityTypeID: Touristic place
  • Phone: 0892701239
  • CountryRegion: FRA
  • Locality: Paris
  • PostalCode: 75007
  • AddressLine: Champ De Mars, Avenue Anatole France
  • ...

Entity y (offered by provider 2):
  • id: fd0cf424bbd79bf28a832e1764f1c2
  • geometry: { location : { lat : 48.85837, lng: 2.294481}}
  • name: Eiffel Tower
  • types: Landmark - attraction
  • formatted phone number: +33892701239
  • website: http://www.tour-eiffel.fr
  • formatted address: Champ de Mars, 5 Avenue Anatole France, 75007 Paris, France

Semantic and Syntactic Different Data: ∃ ati ∈ A1 ∪ B1, ∃ atj ∈ A2 ∪ B2 \ e1 ≡ e2 ∧ (e1.ati ≡ e2.atj) ∧ (e1.ati.val ≠ e2.atj.val)

SLIDE 16

Taxonomy - Example

Entities x and y are the same as on slide 14.

Missing Data: ∃ ati ∈ A1 ∪ B1, ∃ atj ∈ A2 ∪ B2 \ (ati ≡ atj) ∧ (e1.ati.val = NULL ∨ e2.atj.val = NULL)
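The missing data condition can be sketched as follows, assuming it reads as "for a pair of corresponding attributes, at least one of the two values is NULL". The list `CORRESPONDING_ATTRIBUTES` is a hypothetical set of attribute correspondences, and the `Website` attribute of provider 1 is invented for the illustration (the example shows no website for entity x).

```python
# Hypothetical attribute correspondences between the two schemas.
CORRESPONDING_ATTRIBUTES = [
    ("Phone", "formatted phone number"),
    ("AddressLine", "formatted address"),
    ("Website", "website"),  # "Website" is an assumed attribute of provider 1
]


def missing_data(e1: dict, e2: dict) -> list[tuple[str, str]]:
    """Return the corresponding attribute pairs where at least one value is NULL."""
    return [
        (ati, atj)
        for ati, atj in CORRESPONDING_ATTRIBUTES
        if e1.get(ati) is None or e2.get(atj) is None
    ]


entity_x = {
    "Phone": "0892701239",
    "AddressLine": "Champ De Mars, Avenue Anatole France",
}
entity_y = {
    "formatted phone number": "+33892701239",
    "website": "http://www.tour-eiffel.fr",
    "formatted address": "Champ de Mars, 5 Avenue Anatole France, 75007 Paris, France",
}
print(missing_data(entity_x, entity_y))
```

Here only the website pair is reported, because entity x carries no value for it while entity y does.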

SLIDE 18

Taxonomy - Example

Entities x and y are the same as on slide 14.

Different Location: e1 ≡ e2 ∧ (e1.LATITUDE.val ≠ e2.LATITUDE.val ∨ e1.LONGITUDE.val ≠ e2.LONGITUDE.val)
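A short sketch of the different location check on the coordinates of entities x and y, assuming the two entities are already known to correspond and that the condition reads as "latitude values differ or longitude values differ". Provider 1 writes coordinates with a decimal comma, so they are normalized first. The function names are illustrative, and the exact comparison follows the definition literally; a real matcher would probably use a distance threshold instead.

```python
def parse_coordinate(text: str) -> float:
    """Parse a coordinate written with a decimal comma or a decimal point."""
    return float(text.replace(",", "."))


def different_location(pos1: tuple[float, float], pos2: tuple[float, float]) -> bool:
    """True when the latitudes differ or the longitudes differ."""
    (lat1, lon1), (lat2, lon2) = pos1, pos2
    return lat1 != lat2 or lon1 != lon2


# Entity x (provider 1) and entity y (provider 2) from the example.
x_pos = (parse_coordinate("48,858606"), parse_coordinate("2,293971"))
y_pos = (48.85837, 2.294481)
print(different_location(x_pos, y_pos))
```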

SLIDE 20

Taxonomy - Example

Equipollent Positions: (e1 ≡ e2) ∧ (e1.LATITUDE, e1.LONGITUDE) ∈ p.coordinates ∧ (e2.LATITUDE, e2.LONGITUDE) ∈ p.coordinates ∧ (e1.LONGITUDE.val ≠ e2.LONGITUDE.val) ∨ (e1.LATITUDE.val ≠ e2.LATITUDE.val)

SLIDE 22

Benchmark - Construction

Level | Attributes | Set of possible differences
Spatial | Location | ∅, DL, EP
Primary Terminological | Name and Type | ∅, SEM, SYN, {SEM, SYN}
Secondary Terminological | Phone, Address, Site, etc. | ∅, MD, SEM, SYN, {SEM, SYN, MD}, {SEM, SYN}, {SEM, MD}, {SYN, MD}

Differences concerning corresponding entities:
  • 96 (3x4x8) distinct situations of differences
  • Generate a test case for each of the 96 situations

Example of a situation: s = {DL, {SEM, SYN}, MD}
Test_case(s) = (Source dataset, Target dataset, Ground_truth)

Remaining differences will be used to add noise:
  • Superposition, Similar Data, Not found POI
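The count of 96 situations follows from picking one option per level: 3 spatial options, 4 primary terminological options and 8 secondary terminological options. The sketch below enumerates them as a Cartesian product. Encoding each option as a `frozenset` and the constant names are choices made for this illustration, not the representation used by PABench.

```python
from itertools import product

EMPTY = frozenset()  # no difference at this level

SPATIAL = [EMPTY, frozenset({"DL"}), frozenset({"EP"})]
PRIMARY = [EMPTY, frozenset({"SEM"}), frozenset({"SYN"}), frozenset({"SEM", "SYN"})]
SECONDARY = [
    EMPTY,
    frozenset({"MD"}),
    frozenset({"SEM"}),
    frozenset({"SYN"}),
    frozenset({"SEM", "SYN", "MD"}),
    frozenset({"SEM", "SYN"}),
    frozenset({"SEM", "MD"}),
    frozenset({"SYN", "MD"}),
]

# One situation = one choice per level (spatial, primary, secondary).
SITUATIONS = list(product(SPATIAL, PRIMARY, SECONDARY))

# The example situation s = {DL, {SEM, SYN}, MD}.
example = (frozenset({"DL"}), frozenset({"SEM", "SYN"}), frozenset({"MD"}))

print(len(SITUATIONS), example in SITUATIONS)
```

Each element of `SITUATIONS` would then drive the generation of one test case (source dataset, target dataset, ground truth).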

SLIDE 23

Benchmark - Datasets

Create a characterized dataset – GeoBench tool [7]

[7]: G. Morana, T. Morel, B. Berjawi, and F. Duchateau, “GeoBench: a Geospatial Integration Tool for Building a Spatial Entity Matching Benchmark (Demo),” in ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, Dallas, Texas, USA, 4-7 November, 2014, pp. 533-536. http://tinyurl.com/p3dbmpj

SLIDE 24

Benchmark - Datasets

Test cases generator

http://tinyurl.com/nc4rurr

SLIDE 25

Conclusion and Future Work

Contribution:
  • Taxonomy that describes LBS context
  • Necessary specifications to design PABench
  • GeoBench tool to create a characterized dataset and a test case generator

Future Work:
  • Extend PABench by adding more entities
  • Create a survey that compares and evaluates existing approaches using PABench
  • Extend the taxonomy to cover complex objects

SLIDE 26

PABench

Bilal Berjawi
LIRIS, INSA de Lyon, France
bberjawi@liris.cnrs.fr
http://unimap.liris.cnrs.fr

Thank you for your attention

SLIDE 27

Benchmark – Datasets statistics

Dataset | Number of Entities
E1 | 846
E2 | 685
E3 | 314
Total | 1845

Number of correspondences
E1, E2 | 671
E1, E3 | 286
E2, E3 | 277
Total | 1234

Situations of differences | Number of correspondences
{EP, SYN, {SYN, MD}} | 147
{DL, SYN, {SYN, MD}} | 93
{EP, SYN, SYN} | 71
{∅, SYN, {SYN, MD}} | 70
{EP, ∅, {SYN, MD}} | 63
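The totals in these statistics can be cross-checked with a few lines of arithmetic. The sketch below only re-adds the figures shown on the slide; the variable names are illustrative.

```python
entities = {"E1": 846, "E2": 685, "E3": 314}
correspondences = {("E1", "E2"): 671, ("E1", "E3"): 286, ("E2", "E3"): 277}
listed_situations = {
    "{EP, SYN, {SYN, MD}}": 147,
    "{DL, SYN, {SYN, MD}}": 93,
    "{EP, SYN, SYN}": 71,
    "{∅, SYN, {SYN, MD}}": 70,
    "{EP, ∅, {SYN, MD}}": 63,
}

total_entities = sum(entities.values())
total_correspondences = sum(correspondences.values())
listed = sum(listed_situations.values())
share = listed / total_correspondences  # fraction covered by the listed situations

print(total_entities, total_correspondences, listed, f"{share:.1%}")
```

The sums reproduce the reported totals (1845 entities, 1234 correspondences), and the five situations listed account for 444 correspondences, about 36% of the total.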

SLIDE 28

Example of differences

[Figure: example of differences, annotated with the labels Attributes names, Structure, Legends, Different values, Missing values, Positioning]

SLIDE 29

Benchmark - Datasets

Statistics and Test Cases generator

http://tinyurl.com/nc4rurr