Schema Matching in a Large Scale



  1. Schema Matching in a Large Scale: Personal Schema Based Querying. Marko Smiljanić, Maurice van Keulen, Willem Jonker. Dutch-Belgian Database Day, December 3, 2004, Antwerp, Belgium

  2. in this talk • motivation: personal schema based querying • understanding: formalizing the schema matching problem • solving: clustering in schema matching • validating: semantic validation without semantics

  3. motivation

  4. mediated schema [diagram: the query //account[number=1234]/owner goes to a mediator placed over several data sources]
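To make the query on the slide concrete, here is a minimal sketch of evaluating it over a small XML document with Python's standard library. The document content and the helper name `owners_of` are assumptions made for illustration; the slides do not show any data. ElementTree implements only a subset of XPath, so the path is written relative to the root and the predicate compares against a quoted string.

```python
import xml.etree.ElementTree as ET

# Hypothetical data source; only the element names come from the slide's query.
DOC = """
<bank>
  <account><number>1234</number><owner>Alice</owner></account>
  <account><number>5678</number><owner>Bob</owner></account>
</bank>
"""

def owners_of(xml_text, number):
    """Return the text of every owner under an account with the given number."""
    root = ET.fromstring(xml_text)
    # ElementTree's XPath subset: relative path, child-text predicate.
    path = f".//account[number='{number}']/owner"
    return [el.text for el in root.findall(path)]

print(owners_of(DOC, 1234))  # ['Alice']
```

In a mediated setting the same query would be written once against the mediated schema and translated for each source; the sketch covers only the final evaluation step on one source.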

  5. personal schema [diagram: the query //account[number=1234]/owner goes to PSQ, placed over several data sources] PSQ: Personal Schema Based Query Answering System

  6. architecture [diagram; recoverable labels: schemas, schema loader, schema repository, select, data]

  7. Déjà Vu [diagram; labels not recoverable]

  8. goals and issues. goals • efficiency of schema matching (time-to-last, time-to-first) • effectiveness of schema matching (precision/recall). issues • trees vs. graphs • the objective function

  9. understanding

  10. schema matching: hints

  11. formalism: constraint optimization problem. well known framework, offering a range of approaches for efficient problem solving

  12. formalism: correctness, ranking

  13. finding a solution

  14. the idea of clustering: distance based clustering

  15. why clustering? • clusters can be ranked • search space is reduced

  16. clustering approaches (and issues) • clustering method has to be scalable. k-medoid • how to initialize • pre-computation of distance. hand made linear-time clustering • make it intelligent, yet keep it close to linear-time
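The two k-medoid issues named on this slide, initialization and pre-computed distances, can be seen in a minimal sketch of the textbook alternating k-medoids procedure. This is not the authors' implementation: the function name `k_medoids`, the farthest-first initialization, and the toy one-dimensional data are all assumptions chosen for illustration, and the distance between schema elements is left abstract as a pre-computed matrix. It expects `k` to be at most the number of items.

```python
def k_medoids(dist, k, max_iter=100):
    """Alternating k-medoids over a pre-computed distance matrix.

    dist[i][j] is the distance between items i and j.
    Returns a dict mapping each medoid index to its member indices.
    """
    n = len(dist)
    # Initialization (an assumption): farthest-first, starting from item 0.
    medoids = [0]
    while len(medoids) < k:
        nxt = max(range(n), key=lambda i: min(dist[i][m] for m in medoids))
        medoids.append(nxt)
    for _ in range(max_iter):
        # Assignment step: each item joins its nearest medoid.
        clusters = {m: [] for m in medoids}
        for i in range(n):
            nearest = min(medoids, key=lambda m: dist[i][m])
            clusters[nearest].append(i)
        # Update step: the member with the smallest total distance to the
        # other members becomes the medoid (empty clusters keep theirs).
        new_medoids = [
            min(members, key=lambda c: sum(dist[c][j] for j in members))
            if members else m
            for m, members in clusters.items()
        ]
        if set(new_medoids) == set(medoids):
            break
        medoids = new_medoids
    return clusters

# Toy data: six points on a line forming two obvious groups.
points = [0, 1, 2, 10, 11, 12]
dist = [[abs(a - b) for b in points] for a in points]
clusters = k_medoids(dist, 2)
print(clusters)  # {1: [0, 1, 2], 4: [3, 4, 5]}
```

The full distance matrix costs quadratic time and space in the number of items, which is presumably why the slide also lists a hand made, close to linear-time alternative.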

  17. validation

  18. validation paradox [diagram: search space containing sets A, H and T; P = T / A, R = T / H] • semantic validation does not like large search spaces! vs. • clustering is only useful in large search spaces!
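The formulas P = T / A and R = T / H can be computed directly once the sets are fixed. The sketch below assumes the usual reading, which the slide does not spell out: A is the set of mappings returned by the matcher, H is the set of mappings a human judges correct, and T is their intersection. The example mappings are invented.

```python
def precision_recall(found, correct):
    """Precision P = |T| / |A| and recall R = |T| / |H|, with T = A ∩ H."""
    true_matches = found & correct
    precision = len(true_matches) / len(found) if found else 0.0
    recall = len(true_matches) / len(correct) if correct else 0.0
    return precision, recall

# Hypothetical mappings between personal-schema and repository elements.
A = {("owner", "holder"), ("number", "id"), ("owner", "branch")}
H = {("owner", "holder"), ("number", "id"), ("name", "title"), ("city", "town")}

p, r = precision_recall(A, H)
print(p, r)  # precision 2/3, recall 0.5
```

Building H needs a human judgment for every candidate mapping, which is why this kind of validation becomes impractical as the search space grows.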

  19. estimating the precision and recall • size based • order based

  20. size based quality estimation [diagram: "no clustering" panel with sets A, H and T, P = T / A, R = T / H; "yes clustering" panel with sets B, H and T, R 12 = B / A]
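One way to read a size based estimate, offered here as an assumption and not as the authors' method: if clustering shrinks the result set A to a subset B and only the set sizes are known, the number of true matches that survive can be bounded from both sides. The function name and the example numbers below are hypothetical.

```python
def retained_true_matches_bounds(size_a, size_b, size_t):
    """Bounds on |T ∩ B| given only |A|, |B| and |T|, with B ⊆ A and T ⊆ A.

    Worst case: every mapping dropped by clustering was a true match.
    Best case: no true match was dropped (limited by the size of B).
    """
    if not (0 <= size_b <= size_a and 0 <= size_t <= size_a):
        raise ValueError("expected 0 <= |B| <= |A| and 0 <= |T| <= |A|")
    dropped = size_a - size_b
    worst = max(0, size_t - dropped)
    best = min(size_t, size_b)
    return worst, best

# Hypothetical sizes: clustering keeps 93 of 100 mappings, 20 are true matches.
worst, best = retained_true_matches_bounds(100, 93, 20)
print(worst, best)  # 13 20
```

Dividing both bounds by the number of true matches gives a worst-case and best-case recall relative to the run without clustering, without asking a human to inspect any mapping.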

  21. size based quality estimation [charts: NO CLUSTERING; CLUST. BEST CASE; CLUST. WORST CASE; sets B and H; B/A = 93%]

  22. order based quality estimation [diagram: "no clustering" vs. "yes clustering"; remaining labels not recoverable]

  23. order based quality estimation [charts: NO CLUSTERING; CLUST. ALG 1; CLUST. ALG 2]

  24. what comes next • add intelligence to clustering • impact of other hints on clustering • using graphs

  25. And that was it! Questions?
