Your Mediators Need Data Conversion! SIGMOD'98


Your Mediators Need Data Conversion! SIGMOD'98, Seattle, Washington. Sophie Cluet, Claude Delobel, Jérôme Siméon and Katarzyna Smaga, Verso Database Group


slide-1
SLIDE 1
Your Mediators Need Data Conversion!
SIGMOD'98, Seattle, Washington
Sophie Cluet, Claude Delobel, Jérôme Siméon and Katarzyna Smaga
Verso Database Group, INRIA Rocquencourt
LRI, Université Paris XI, Orsay
YAT Team
slide-2
SLIDE 2
Context
More and more applications need to:
  • use the same data in different formats (e.g. rtf2LaTeX, LaTeX2HTML, etc.);
  • integrate existing data from legacy sources (with Wrappers and Mediators);
  • offer Web interfaces (e.g. O2Web, Strudel, Araneus, etc.).
⇒ All this requires conversion capabilities between heterogeneous sources.
slide-3
SLIDE 3
Context
Abiteboul-Cluet-Milo (ICDT'97) present a language to define correspondences and translations.
But in real applications (e.g. O2Web) we want:
  • powerful manipulation of collections (lists, ...);
  • adaptable typing;
  • customization!
⇒ We propose a complete system to design and implement conversion-based software.
slide-4
SLIDE 4
Outline
  • The YAT System for data conversion
      ▹ YAT data model
      ▹ A Conversion Language: YATL
      ▹ Reusing YATL programs
      ▹ Prototype
  • Conclusion and future work
slide-5
SLIDE 5
Scenario
[Figure: conversion scenario built around the YAT middleware (YAT patterns). XML files and an RDBMS are read through the YAT/XML and YAT/relational import wrappers. (1) A YATL translation rel+XML -> odmg feeds an OODBMS through the YAT/ODMG import/export wrapper. (2) A YATL translation odmg -> html produces HTML files through the YAT/HTML export wrapper.]
slide-6
SLIDE 6
Outline
  • The YAT System for data conversion
      ▸ YAT data model
      ▹ A Conversion Language: YATL
      ▹ Reusing YATL programs
      ▹ Prototype
  • Conclusion and future work
slide-7
SLIDE 7
The YAT model
  • semistructured data model as in [ACM97]:
    ordered labeled trees, with names (= identity) that we call patterns.
  • new constructs to allow representation of abstract levels:
      – labels for types;
      – edge annotations to describe multiple children;
      – union.
  • instantiation mechanism to describe the link between different representation levels.
⇒ a semistructured data model with flexible typing.
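
The idea of one tree being an instance of a more general tree can be sketched in a few lines of Python. This is only an illustration, not the YAT implementation (the prototype is written in Objective Caml): the names `Node`, `Star`, `Type` and `is_instance` are assumptions made here, unions and references are left out, and the sample trees are loosely modeled on the Golf example, without the tuple level.

```python
from dataclasses import dataclass, field
from enum import Enum


class Type(Enum):
    """Hypothetical type labels; a model node may carry one instead of a constant."""
    ANY = "Any"
    STRING = "String"
    INT = "Int"


@dataclass
class Star:
    """A starred edge in a model: zero or more children of this shape."""
    child: "Node"


@dataclass
class Node:
    label: object
    kids: list = field(default_factory=list)  # ordered; Node, or Star(Node) in a model


def label_ok(value, model_label):
    # A type label accepts any value of that type; a constant must match exactly.
    if model_label is Type.ANY:
        return True
    if model_label is Type.STRING:
        return isinstance(value, str)
    if model_label is Type.INT:
        return isinstance(value, int)
    return value == model_label


def is_instance(data, model):
    if not label_ok(data.label, model.label):
        return False
    if model.label is Type.ANY and not model.kids:
        return True  # a bare Any leaf accepts any subtree
    return kids_ok(data.kids, model.kids)


def kids_ok(data_kids, model_kids):
    # Match the ordered children of the data against the ordered model edges.
    if not model_kids:
        return not data_kids
    first, rest = model_kids[0], model_kids[1:]
    if isinstance(first, Star):
        if kids_ok(data_kids, rest):  # the starred edge matches nothing more
            return True
        return (bool(data_kids) and is_instance(data_kids[0], first.child)
                and kids_ok(data_kids[1:], model_kids))
    return (bool(data_kids) and is_instance(data_kids[0], first)
            and kids_ok(data_kids[1:], rest))


car_schema = Node("class", [Node("car", [
    Node("name", [Node(Type.STRING)]),
    Node("desc", [Node(Type.STRING)]),
    Node("suppliers", [Node("set", [Star(Node(Type.ANY))])]),
])])

golf = Node("class", [Node("car", [
    Node("name", [Node("Golf")]),
    Node("desc", [Node("The 1995 Volkswagen Golf GTI ...")]),
    Node("suppliers", [Node("set", [Node("s1"), Node("s2")])]),
])])
```

With these definitions `is_instance(golf, car_schema)` holds, while a car node without the `name`, `desc` and `suppliers` children is rejected.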
slide-8
SLIDE 8
Example 1: From general to specific
[Figure: four levels of patterns, from general to specific. The YAT model (Yat: any label, with starred edges to &Yat or Yat). The ODMG model (Pclass: class → Symbol → Ptype; Ptype: Int, String, ..., a tuple with starred Symbol → Ptype edges, a collection {set, list, bag, array} of Ptype, or a reference &Pclass). The Car Schema model (Pcar: class → car with name, desc and a suppliers set of &Psup). The Golf GTI model (c1: class → car with name "Golf", desc "The 1995 Volkswagen Golf GTI ..." and suppliers set &s1, &s2).]
slide-9
SLIDE 9
Example 2: Mixing structure and less structure
[Figure: two models. The XML publication model: Pbr, a brochure with number, title, model, desc and suppliers (supplier entries with name and address), followed by starred free-form Pfield children; Pfield is a Symbol leading to a String or to further Pfield children. A subset of the HTML model: HtmlPage (html with head, title and body) and HtmlElement (h1, ul with li items, a with href to &HtmlPage and content).]
slide-10
SLIDE 10
Advantages
  • Allows type checking;
  • Allows programs to be written at various levels of representation;
  • Basis of the customization process;
  • Allows conversion programs to be coherently combined (in parallel) or composed (in sequence).
slide-11
SLIDE 11
Outline
  • The YAT System for data conversion
      ▹ YAT data model
      ▸ A Conversion Language: YATL
      ▹ Reusing YATL programs
      ▹ Prototype
  • Conclusion and future work
slide-12
SLIDE 12
The YAT conversion language (YATL)
  • rule-based, declarative language with pattern matching;
  • explicit Skolem functions (for id creation);
  • management of collection types (set, bag, list);
  • ability to represent rules graphically;
  • makes use of the YAT model typing abilities (through type checking and several customization mechanisms).
slide-13
SLIDE 13
YATL examples (1)
From XML brochures to car objects

Pcar(Pbr) :
  class → car ⟨ → name → N,
                → desc → D,
                → suppliers → set {}→ &Psup(SN) ⟩
⇐
  Pbr : brochure ⟨ → number → Num,
                   → title → T,
                   → model → Year,
                   → desc → D,
                   → suppls *→ supplier ⟨ → name → SN,
                                          → address → Add ⟩,
                   *→ Pfield ⟩,
  Year > 1980, N is getname(T)
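
For readers new to the notation, here is a rough Python rendering of what this rule computes. It is a sketch under stated assumptions, not YATL semantics: dictionaries stand in for YAT trees, `make_skolem` is a hypothetical helper that mimics Skolem functions (equal arguments give the same identifier), the brochure number is assumed to identify a brochure, and `getname` is assumed to return the title unchanged because the slides do not define it. The first two brochures reuse the Golf values from the talk; the third is invented to show the filter.

```python
def make_skolem(prefix):
    """Return a function mapping equal arguments to the same identifier."""
    ids = {}

    def skolem(*args):
        if args not in ids:
            ids[args] = f"&{prefix}{len(ids) + 1}"
        return ids[args]

    return skolem


def getname(title):
    # Assumption: getname is not defined on the slides; identity is used here.
    return title


def brochures_to_cars(brochures):
    pcar = make_skolem("c")  # stands in for Pcar(Pbr)
    psup = make_skolem("s")  # stands in for Psup(SN)
    cars = {}
    for b in brochures:
        if not b["model"] > 1980:  # the predicate Year > 1980
            continue
        cars[pcar(b["number"])] = {
            "name": getname(b["title"]),
            "desc": b["desc"],
            # set grouping: one reference per distinct supplier name
            "suppliers": {psup(s["name"]) for s in b["suppls"]},
        }
    return cars


VW_CENTER = {"name": "VW Center", "address": "Bd Lenoir, 75005 Paris"}
VW2 = {"name": "VW2", "address": "Bd Leblanc, 78000 Versailles"}

BROCHURES = [
    {"number": 1, "title": "Golf", "model": 1995,
     "desc": "The Golf ...", "suppls": [VW_CENTER]},
    {"number": 2, "title": "Golf GTI", "model": 1997,
     "desc": "The new Golf GTI...", "suppls": [VW2, VW_CENTER]},
    # Hypothetical brochure, added only to exercise the Year > 1980 filter.
    {"number": 3, "title": "Beetle", "model": 1975,
     "desc": "An older model", "suppls": [VW_CENTER]},
]

cars = brochures_to_cars(BROCHURES)
```

Because `psup` is keyed on the supplier name, both cars end up referencing the same identifier for "VW Center", which is the point of using a Skolem function for id creation.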
slide-14
SLIDE 14
YATL examples (2)
[Figure: input brochures and output cars.
Input brochures: b1 is brochure number 1, title "Golf", model 1995, desc "The Golf ...", with supplier "VW Center", "Bd Lenoir, 75005 Paris". b2 is brochure number 2, title "Golf GTI", model 1997, desc "The new Golf GTI...", with suppliers "VW2", "Bd Leblanc, 78000 Versailles" and "VW Center", "Bd Lenoir, 75005 Paris", plus extra options fields (fun, security) with values such as "radio", "TV" and "ABS".
Output cars: c1 is class car with name "Golf", desc "The Golf ..." and suppliers set &s1. c2 is class car with name "Golf GTI", desc "The new Golf GTI..." and suppliers set &s2, &s1.]
slide-15
SLIDE 15
YATL examples: Collection management
Matrix Transposition (requires ordering and grouping)

New(Id) : Mat []J→ Y []I→ X → A
⇐
Id : Mat I→ X J→ Y → A

[Figure: a sales tree indexed by car model (Golf, Polo) and city (caen, nice, paris), shown before and after transposition; leaf values 20, 22, 100, 110, 300, 330.]
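
The combination of ordering and grouping that the transposition rule relies on can be sketched in Python as follows. This is an illustration only: nested lists of (label, children) pairs stand in for YAT trees, `transpose` is a name chosen here, and the assignment of the sales figures to cells is illustrative because it cannot be recovered from the slide.

```python
def transpose(mat):
    """Swap the two index levels of a tree given as [(X, [(Y, A), ...]), ...]."""
    # 1. Pattern matching: enumerate the bindings (I, X, J, Y, A) of the rule body.
    bindings = [(i, x, j, y, a)
                for i, (x, row) in enumerate(mat)
                for j, (y, a) in enumerate(row)]
    # 2. Head construction: group by J, with groups ordered by J and members by I.
    out = {}
    for i, x, j, y, a in sorted(bindings, key=lambda b: (b[2], b[0])):
        out.setdefault((j, y), []).append((x, a))
    return [(y, cells) for (_, y), cells in out.items()]


# Illustrative data: the labels come from the slide, the cell values are assumed.
sales = [
    ("Golf", [("caen", 100), ("nice", 110), ("paris", 300)]),
    ("Polo", [("caen", 20), ("nice", 22), ("paris", 330)]),
]

by_city = transpose(sales)
```

Here `by_city` starts with `("caen", [("Golf", 100), ("Polo", 20)])`, and transposing it again gives back `sales`.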
slide-16
SLIDE 16
Type checking possibilities in YATL
A conversion program has input/output models. It is possible to statically infer the input/output models of a conversion program.
Several ways to use this information:
  • check generated data with regard to the output wrapper(s);
  • verify the coherence when combining/composing programs;
  • turn type checking on/off at run time.
slide-17
SLIDE 17
Outline
  • The YAT System for data conversion
      ▹ YAT data model
      ▹ A Conversion Language: YATL
      ▸ Reusing YATL programs
      ▹ Prototype
  • Conclusion and future work
slide-18
SLIDE 18
Customization by instantiation
[Figure: a conversion program prg maps the ODMG model (input model) to HTML (output model). Model instantiation specializes the input model to an ODMG schema; program instantiation yields an equivalent program prg' from the ODMG schema to HTML.]
slide-19
SLIDE 19
Combining programs
Combination allows the generic program and the specific one to be used at the same time.
[Figure: prg1 maps the ODMG model to HTML; prg2 maps an ODMG schema to HTML; prg3 = prg1 + prg2 (= if prg2 then prg2 else prg1) maps ODMG to HTML.]
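
The reading "if prg2 then prg2 else prg1" can be sketched in Python as follows. This is a simplification made for illustration: conversion programs are modeled as plain functions that return `None` when none of their rules matches, and `combine`, `generic_page` and `car_page` are hypothetical names, not part of YAT.

```python
def combine(generic, specific):
    """prg1 + prg2: use the specific program when it applies, else the generic one."""
    def combined(data):
        result = specific(data)
        return result if result is not None else generic(data)

    return combined


def generic_page(obj):
    # Hypothetical generic ODMG -> HTML program: accepts any class.
    return f"<h1>{obj['class']}</h1>"


def car_page(obj):
    # Hypothetical customized program: only matches car objects.
    if obj.get("class") != "car":
        return None
    return f"<h1>car</h1><ul><li>name : {obj['name']}</li></ul>"


page = combine(generic_page, car_page)
```

A car object is rendered by `car_page`, while any other object falls back to `generic_page`.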
slide-20
SLIDE 20
Composing programs
Two existing programs can be composed into a new, equivalent one.
[Figure: prg1 maps XML & relational data to ODMG (model and schema); prg2 maps the ODMG model to HTML; prg3 = prg2 o prg1 (= prg2(prg1)) maps XML & relational data to HTML.]
slide-21
SLIDE 21
Outline
  • The YAT System for data conversion
      ▹ YAT data model
      ▹ A Conversion Language: YATL
      ▹ Reusing YATL programs
      ▸ Prototype
  • Conclusion and future work
slide-22
SLIDE 22
YAT System Prototype
An environment to specify and run data conversion components, written in Objective CAML [Ler97].
  • core modules: YAT model manager, YATL parser, YATL interpreter, external functions and predicate evaluation, type checker.
  • Wrappers for O2, HTML, BibTeX, Unix file system, gnuplot, etc.
  • graphical interface for YATL.
slide-23
SLIDE 23
YAT System Applications
  • OPAL European project (with Bosch & Peugeot): proprietary object model ↦ HTML.
  • O2Web re-implementation: ODMG ↦ HTML.
  • Verso publication Web site: XML and ODMG ↦ HTML.
    http://www-rocq.inria.fr/verso/publications/
  • exporting O2 statistical results to gnuplot: ODMG ↦ gnuplot files.
slide-24
SLIDE 24
Perspectives and future work
  • use of the YAT system for data integration:
    ⇒ problems related to query processing, view maintenance, etc.
  • public release of YAT.
YAT homepage: http://www-rocq.inria.fr/~simeon/YAT/
slide-25
SLIDE 25
References
[ACM97] S. Abiteboul, S. Cluet, and T. Milo. Correspondence and translation for heterogeneous data. In Proceedings of the International Conference on Database Theory (ICDT), Delphi, Greece, January 1997.
[AQM+97] S. Abiteboul, D. Quass, J. McHugh, J. Widom, and J. L. Wiener. The Lorel query language for semistructured data. International Journal on Digital Libraries, 1(1):68-88, April 1997.
[BDHS96] P. Buneman, S. B. Davidson, G. Hillebrand, and D. Suciu. A query language and optimization techniques for unstructured data. In Proceedings of the ACM SIGMOD Conference on Management of Data, pages 505-516, Montreal, Canada, June 1996.
[dSAD94] C. Souza dos Santos, S. Abiteboul, and C. Delobel. Virtual schemas and bases. In Proc. of the Int. Conf. on Extending Database Technology, Cambridge, March 1994.
[FFK+98] M. Fernandez, D. Florescu, J. Kang, A. Y. Levy, and D. Suciu. Catching the Boat with Strudel: Experiences with a Web-Site Management System. In Proceedings of the ACM SIGMOD Conference on Management of Data, Seattle, Washington, June 1998.
slide-26
SLIDE 26
[Kos96] A. S. Kosky. Transforming Databases with Recursive Data Structures. PhD thesis, University of Pennsylvania, 1996.
[Ler97] X. Leroy. The Objective Caml system release 1.07. INRIA, December 1997. Documentation and user's manual. ftp://ftp.inria.fr/lang/caml-light/.
[PAGM96] Y. Papakonstantinou, S. Abiteboul, and H. Garcia-Molina. Object fusion in mediator systems. In Proceedings of the International Conference on Very Large Data Bases (VLDB), pages 413-424, Bombay, India, September 1996.
slide-27
SLIDE 27
Related work
Flexibility: Strudel [FFK+98], query languages like Lorel [AQM+97], UnQL [BDHS96], etc. Missing powerful restructuring primitives, no support for lists.
More conversion power: WOL [Kos96], Languages for Views [dSAD94]. But using a schema-driven approach.
With both: MSL [PAGM96]. Missing: conversion of recursive structures and support for lists.
Additional functionalities: graphical abilities, verification (type checking), customization mechanisms, good support for collection types.
slide-28
SLIDE 28
Customization
From ODMG to HTML

Page(Pclass) :
  html ⟨ → head → title → Name,
         → body ⟨ → h1 → Name,
                  → ul *→ li ⟨ → Att2,
                               → Elem(Ptype) ⟩⟩⟩
⇐
  Pclass : class → Name *→ Att → Ptype,
  Att2 is concat(Att, " : ")

Elem(Ptype) :
  ul *→ li ⟨ → Att2,
             → Elem(P2) ⟩
⇐
  Ptype : tuple *→ Att → P2,
  Att2 is concat(Att, " : ")

Elem(Ptype) :
  a ⟨ → href → &Page(Pclass),
      → cont → Name ⟩
⇐
  Ptype : &Pclass,
  Pclass : class → Name → P
slide-29
SLIDE 29
Customization (2)
Using instantiation on Pcar:

Page(Pcar) :
  html ⟨ → head → title → car,
         → body ⟨ → h1 → car,
                  → ul ⟨ → li ⟨ → "name : ",
                                → T1 ⟩,
                         → li ⟨ → "age : ",
                                → D1 ⟩,
                         → li ⟨ → "suppliers : ",
                                → ul *→ li → a ⟨ → href → &Page(Psup),
                                                 → cont → Name ⟩⟩⟩⟩⟩
⇐
  Pcar : class → car → tuple ⟨ → name → T,
                               → desc → D,
                               → suppliers → set *→ &Psup ⟩,
  Psup : class → Name → P2,
  T1 is data_to_string(T),
  D1 is data_to_string(D)
slide-30
SLIDE 30
Customization (3)
Changing the rule by hand (example):

Page(Pcar) :
  html ⟨ → head → title → car,
         → body ⟨ → h1 → car,
                  → ul ⟨ → li ⟨ → "name : ",
                                → T1 ⟩,
                         → li ⟨ → "age : ",
                                → D1 ⟩⟩⟩⟩
⇐
  Pcar : class → car → tuple ⟨ → name → T,
                               → desc → D,
                               → suppliers → set *→ &Psup ⟩,
  T1 is data_to_string(T),
  D1 is data_to_string(D)
slide-31
SLIDE 31
Combining programs
Once a conversion program has been instantiated, combination allows the generic program and the specific one to be used at the same time.
When a given input pattern matches both programs, the system must decide which one to use.
  • This can be done automatically for programs whose input models are instances of one another (e.g. odmg and cars) and which code for the same output.
  • The user can specify a hierarchy of rules. The more specific one is tried first.
(⇒ declarativity vs functionality issue).
slide-32
SLIDE 32
Composing programs
The last feature allows two existing programs to be composed into a new, equivalent one.
Taking two conversion programs prg1 : M1 ↦ M2 and prg2 : M3 ↦ M4, one would like to create a program doing (prg2 ∘ prg1) : M1 ↦ M4.
The YAT system can compute (prg2 ∘ prg1) if M2 is an instance of M3.
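
The side condition on the models can be illustrated with a short Python sketch. It makes several assumptions: programs are plain functions tagged with model names, the instantiation relation is a hand-written table (`INSTANCE_OF`), and `compose` merely chains the two functions after checking the condition, whereas the slides describe the YAT system as writing a new, equivalent program. All names here are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Assumed instantiation relation: each model maps to the model it is an instance of.
INSTANCE_OF = {"Cars": "ODMG", "ODMG": "YAT"}


def is_instance_of(model: Optional[str], other: str) -> bool:
    """True if `model` equals `other` or reaches it by following INSTANCE_OF."""
    while model is not None:
        if model == other:
            return True
        model = INSTANCE_OF.get(model)
    return False


@dataclass(frozen=True)
class Program:
    name: str
    input_model: str
    output_model: str
    run: Callable


def compose(prg2: Program, prg1: Program) -> Program:
    """Build prg2 o prg1, allowed only if prg1's output is an instance of prg2's input."""
    if not is_instance_of(prg1.output_model, prg2.input_model):
        raise TypeError(f"{prg1.output_model} is not an instance of {prg2.input_model}")
    return Program(f"{prg2.name} o {prg1.name}", prg1.input_model,
                   prg2.output_model, lambda data: prg2.run(prg1.run(data)))


# Toy programs, for illustration only.
prg1 = Program("prg1", "XML", "Cars", lambda b: {"class": "car", "name": b["title"]})
prg2 = Program("prg2", "ODMG", "HTML", lambda o: f"<h1>{o['class']}</h1>")
prg3 = compose(prg2, prg1)
```

Here `prg3` maps XML to HTML directly, while `compose(prg1, prg2)` raises `TypeError` because HTML is not an instance of XML.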
slide-33
SLIDE 33
Composing programs (2)
E.g. with prg1 : XML ↦ Cars and prg2 : ODMG ↦ HTML, (prg2 ∘ prg1) : XML ↦ HTML directly.

Page(Pcar) :
  html ⟨ → head → title → car,
         → body ⟨ → h1 → car,
                  → ul ⟨ → li ⟨ → "name : ",
                                → T1 ⟩,
                         → li ⟨ → "age : ",
                                → D1 ⟩⟩⟩⟩
⇐
  Pbr : brochure ⟨ → number → Num,
                   → title → T,
                   → model → Year,
                   → desc → D,
                   → suppls *→ supplier ⟨ → name → SN,
                                          → address → Add ⟩⟩,
  T1 is data_to_string(T),
  D1 is data_to_string(D)