Schema Matching in a Large Scale

SLIDE 1

Schema Matching in a Large Scale
Personal Schema Based Querying

Marko Smiljanić, Maurice van Keulen, Willem Jonker

Dutch-Belgian Database Day - December 3, 2004 - Antwerp, Belgium

SLIDE 2

in this talk

  • motivation: personal schema based querying
  • understanding: formalizing the schema matching problem
  • solving: clustering in schema matching
  • validating: semantic validation without semantics

SLIDE 3

motivation

SLIDE 4

mediated schema

//account[number=1234]/owner

mediator

data

SLIDE 5

personal schema

//account[number=1234]/owner

data

PSQ – Personal Schema Based Query Answering System
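
The query on this slide, //account[number=1234]/owner, can be tried against a small document. The following is a minimal sketch using Python's standard xml.etree.ElementTree; the sample document, the <bank> root element, and the owner names are invented for illustration and do not come from the talk. ElementTree supports only a subset of XPath, so the predicate is written as a text comparison and the path starts with a dot.

```python
import xml.etree.ElementTree as ET

# Hypothetical data laid out the way one user's personal schema expects it.
SAMPLE = """
<bank>
  <account><number>1234</number><owner>Alice</owner></account>
  <account><number>5678</number><owner>Bob</owner></account>
</bank>
"""


def owners_of(xml_text, number):
    """Return the text of every owner of the account with the given number."""
    root = ET.fromstring(xml_text)
    # ElementTree form of //account[number=1234]/owner
    path = f".//account[number='{number}']/owner"
    return [element.text for element in root.findall(path)]


print(owners_of(SAMPLE, 1234))
```

In a personal schema based system the user would write such a query once against their own schema; locating real schemas that the query fits is the schema matching problem the rest of the talk addresses.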

SLIDE 6

architecture

[Diagram labels: data, schemas, schema loader, schema repository, select]

SLIDE 7

Déjà Vu

SLIDE 8

goals and issues

goals

  • efficiency of schema matching (time-to-last, time-to-first)
  • effectiveness of schema matching (precision/recall)

issues

  • trees vs. graphs
  • the objective function
SLIDE 9

understanding

SLIDE 10

schema matching

hints

SLIDE 11

formalism

constraint optimization problem

well known framework, offering a range of approaches for efficient problem solving

SLIDE 12

formalism

ranking correctness

SLIDE 13

finding a solution

SLIDE 14

the idea of clustering

distance based clustering

SLIDE 15

why clustering?

  • clusters can be ranked
  • search space is reduced
SLIDE 16

clustering approaches (and issues)

k-medoid

  • how to initialize
  • pre-computation of distance

hand made linear-time clustering

  • make it intelligent, yet keep it close to linear-time

  • clustering method has to be scalable
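
To make the two k-medoid issues concrete, here is a minimal sketch of an alternating k-medoids loop over a precomputed distance matrix. It is not the clustering used in the talk: the function name, the naive "first k items" initialization, and the one-dimensional sample points are all assumptions chosen for illustration. The sketch also assumes k is no larger than the number of items.

```python
def k_medoids(dist, k, max_iter=100):
    """Cluster items 0..n-1 given a symmetric n x n distance matrix.

    Returns a dict mapping each medoid index to the list of its members.
    """
    n = len(dist)
    medoids = list(range(k))  # naive initialization (assumption): first k items
    clusters = {}
    for _ in range(max_iter):
        # Assignment step: every item joins the cluster of its nearest medoid.
        clusters = {m: [] for m in medoids}
        for i in range(n):
            if i in clusters:
                # A medoid always stays in its own cluster, so none ends up empty.
                clusters[i].append(i)
            else:
                nearest = min(medoids, key=lambda m: dist[i][m])
                clusters[nearest].append(i)
        # Update step: the new medoid of a cluster is the member with the
        # smallest total distance to the other members.
        new_medoids = [
            min(members, key=lambda c: sum(dist[c][j] for j in members))
            for members in clusters.values()
        ]
        if set(new_medoids) == set(medoids):
            break  # medoids are stable, so the assignment is final
        medoids = new_medoids
    return clusters


# Hypothetical data: six points on a line forming two obvious groups.
points = [1, 2, 3, 10, 11, 12]
# The full pairwise matrix is the "pre-computation of distance" cost.
dist = [[abs(a - b) for b in points] for a in points]
clusters = k_medoids(dist, k=2)
print(clusters)
```

Both issues from the slide are visible here: the result depends on how `medoids` is initialized, and the distance matrix alone is quadratic in the number of items, which is what a close to linear-time method has to avoid.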
SLIDE 17

validation

SLIDE 18

validation paradox

  • semantic validation does not like large search spaces!

vs.

  • clustering is only useful in large search spaces!

search space: H, A, T

P = T / A, R = T / H
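
The two formulas can be computed directly once the sets are known. The sketch below assumes the usual reading of the letters, which the slide itself does not spell out: H is the set of mappings a human judges correct, A is the set returned by the automatic matcher, and T is the set of mappings found in both. The function name and the sample mappings are hypothetical.

```python
def precision_recall(human, automatic):
    """Return (P, R) with P = |T| / |A| and R = |T| / |H|, where T = H & A."""
    true_found = human & automatic
    precision = len(true_found) / len(automatic) if automatic else 0.0
    recall = len(true_found) / len(human) if human else 0.0
    return precision, recall


# Hypothetical mappings: (personal schema element, repository schema element).
H = {("owner", "holder"), ("number", "acct_no"), ("account", "bankAccount")}
A = {("owner", "holder"), ("number", "acct_no"),
     ("number", "phone"), ("owner", "address")}

p, r = precision_recall(H, A)
print(f"P = {p:.2f}, R = {r:.2f}")
```

Computing H requires a human to inspect the search space, which is why this kind of validation becomes impractical exactly where clustering starts to pay off.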

SLIDE 19

estimating the precision and recall

  • size based
  • order based
SLIDE 20

size based quality estimation

[Diagram labels: H, A, T, B; no clustering vs. yes clustering]

P = T / A, R = T / H

R12 = B / A

SLIDE 21

size based quality estimation

[Chart legend: NO CLUSTERING, CLUST. BEST CASE, CLUST. WORST CASE; labels: B, H]

B/A = 93%

SLIDE 22

order based quality estimation

[Diagram labels: no clustering, yes clustering]

SLIDE 23

order based quality estimation

[Chart legend: NO CLUSTERING, CLUST. ALG 1, CLUST. ALG 2]
SLIDE 24

what comes next

  • add intelligence to clustering
  • impact of other hints on clustering
  • using graphs
SLIDE 25

And that was it!

Questions?