
End of Query Optimization
Data Integration

May 24, 2004

Agenda
  • Questions?
  • Finish last bits of query optimization
  • Data integration: the last frontier

Query Execution

[Figure: query processing architecture. A User/Application sends a query or update to the Query compiler, which hands a query execution plan to the Execution engine. The engine sends record and index requests to the Index/record manager, which sends page commands to the Buffer manager, which reads and writes pages through the Storage manager to storage.]

Query Execution Plans

SELECT P.buyer
FROM Purchase P, Person Q
WHERE P.buyer=Q.name AND Q.city='seattle' AND Q.phone > '5430000'

[Figure: query plan. π buyer on top of a Simple Nested Loops join on buyer=name; its inputs are Purchase (Table scan) and σ city='seattle' AND phone>'5430000' applied to Person (Index scan).]

Query Plan:
  • logical tree
  • implementation choice at every node
  • scheduling of operations.

Some operators are from relational algebra, and others (e.g., scan, group) are not.

What We've Seen So Far
  • Transformation rules
  • The cost module:
    – Given a candidate plan: what is its expected cost and size of the result?
  • Now: putting it all together.

Plans for Single-Relation Queries (Prep for Join Ordering)
  • Task: create a query execution plan for a single select-project-group-by block.
  • Key idea: consider each possible access path to the relevant tuples of the relation. Choose the cheapest one.
  • The different operations are essentially carried out together (e.g., if an index is used for a selection, projection is done for each retrieved tuple, and the resulting tuples are pipelined into the aggregate computation).


Example

SELECT S.sid FROM Sailors S WHERE S.rating=8

Here Sailors has NTuples(R) = 40000 tuples on NPages(R) = 500 pages, each index occupies NPages(I) = 50 pages, and rating has NKeys(I) = 10 distinct values.
  • If we have an index on rating:
    – (1/NKeys(I)) * NTuples(R) = (1/10) * 40000 = 4000 tuples retrieved.
    – Clustered index: (1/NKeys(I)) * (NPages(I)+NPages(R)) = (1/10) * (50+500) = 55 pages are retrieved.
    – Unclustered index: (1/NKeys(I)) * (NPages(I)+NTuples(R)) = (1/10) * (50+40000) = 4005 pages are retrieved.
  • If we have an index on sid:
    – Would have to retrieve all tuples/pages. With a clustered index, the cost is 50+500.
  • Doing a file scan: we retrieve all file pages (500).
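A quick way to check these estimates is to compute them. The sketch below is an illustration, not part of the original slides: the constant names mirror the NTuples/NPages/NKeys notation, and, like the formulas, it assumes rating values are uniformly distributed.

```python
# Page/tuple estimates for: SELECT S.sid FROM Sailors S WHERE S.rating=8
N_TUPLES_R = 40_000  # NTuples(R): tuples in Sailors
N_PAGES_R = 500      # NPages(R): pages in Sailors
N_PAGES_I = 50       # NPages(I): pages in the index
N_KEYS_I = 10        # NKeys(I): distinct rating values (uniformity assumed)


def tuples_retrieved() -> float:
    # (1/NKeys(I)) * NTuples(R)
    return N_TUPLES_R / N_KEYS_I


def clustered_index_pages() -> float:
    # (1/NKeys(I)) * (NPages(I) + NPages(R)): matching tuples sit on adjacent pages
    return (N_PAGES_I + N_PAGES_R) / N_KEYS_I


def unclustered_index_pages() -> float:
    # (1/NKeys(I)) * (NPages(I) + NTuples(R)): up to one page fetch per matching tuple
    return (N_PAGES_I + N_TUPLES_R) / N_KEYS_I


def file_scan_pages() -> int:
    # Read every page of Sailors regardless of the predicate
    return N_PAGES_R


if __name__ == "__main__":
    plans = {
        "clustered index on rating": clustered_index_pages(),
        "unclustered index on rating": unclustered_index_pages(),
        "file scan": file_scan_pages(),
    }
    # Cheapest access path first
    for name, pages in sorted(plans.items(), key=lambda kv: kv[1]):
        print(f"{name:28s} {pages:>8.0f} pages")
```

Sorting the estimates lists the clustered index (55 pages) first, then the file scan (500), then the unclustered index (4005), which is why an unclustered index on rating loses to a plain scan here.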

Determining Join Ordering
  • R1 ⋈ R2 ⋈ … ⋈ Rn
  • Join tree: [Figure: a join tree over R3, R1, R2, R4]
  • A join tree represents a plan. An optimizer needs to inspect many (all?) join trees.

Types of Join Trees
  • Left deep: [Figure: a left-deep join tree over R3, R1, R5, R2, R4]
  • Bushy: [Figure: a bushy join tree over R3, R1, R2, R4, R5]
  • Right deep: [Figure: a right-deep join tree over R3, R1, R5, R2, R4]
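The three shapes can be told apart mechanically: in a left-deep tree every join's right input is a base relation, in a right-deep tree every join's left input is, and a bushy tree is neither. A minimal sketch, assuming a hypothetical encoding of join trees as nested Python pairs; the example trees are chosen for illustration, since the exact shapes drawn on the slides are not recoverable.

```python
from __future__ import annotations

from typing import Union

# A join tree is either a base relation name or a pair (left, right) of subtrees.
JoinTree = Union[str, tuple]


def is_leaf(t: JoinTree) -> bool:
    return isinstance(t, str)


def is_left_deep(t: JoinTree) -> bool:
    """Every join's right input is a base relation."""
    if is_leaf(t):
        return True
    left, right = t
    return is_leaf(right) and is_left_deep(left)


def is_right_deep(t: JoinTree) -> bool:
    """Every join's left input is a base relation."""
    if is_leaf(t):
        return True
    left, right = t
    return is_leaf(left) and is_right_deep(right)


def shape(t: JoinTree) -> str:
    # A single two-way join is both left and right deep; report it as left deep.
    if is_left_deep(t):
        return "left deep"
    if is_right_deep(t):
        return "right deep"
    return "bushy"


def leaves(t: JoinTree) -> list[str]:
    """Base relations in left-to-right order."""
    return [t] if is_leaf(t) else leaves(t[0]) + leaves(t[1])
```

For example, `shape(((("R3", "R1"), ("R2", "R4")), "R5"))` returns `"bushy"`, because the join's left input is itself a join of two joins.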

Problem
  • Given: a query R1 ⋈ R2 ⋈ … ⋈ Rn
  • Assume we have a function cost() that gives us the cost of every join tree
  • Find the best join tree for the query


Join Ordering by Dynamic Programming
  • Idea: for each subset of {R1, …, Rn}, compute the best plan for that subset
  • In increasing order of set cardinality:
    – Step 1: for {R1}, {R2}, …, {Rn}
    – Step 2: for {R1, R2}, {R1, R3}, …, {Rn-1, Rn}
    – …
    – Step n: for {R1, …, Rn}
  • A subset of {R1, …, Rn} is also called a subquery

Dynamic Programming: step 1
  • Step 1: For each {Ri} do:
    – Size({Ri}) = B(Ri)
    – Plan({Ri}) = Ri
    – Cost({Ri}) = (cost of scanning Ri)

Dynamic Programming: step i
  • Step i: For each subset Q ⊆ {R1, …, Rn} of cardinality i do:
    – Compute Size(Q)
    – For every pair of disjoint, non-empty subqueries Q', Q'' s.t. Q = Q' ∪ Q'', compute cost(Plan(Q') ⋈ Plan(Q''))
    – Cost(Q) = the smallest such cost
    – Plan(Q) = the corresponding plan
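The two steps translate almost line for line into code. This is a sketch under stated assumptions: `best_join_plan` and its arguments are hypothetical names, Size(Q) is estimated as the product of the base sizes B(Ri) times one selectivity factor per join predicate inside Q, and the join cost Cost(Q') + Cost(Q'') + Size(Q') + Size(Q'') is only a placeholder for whatever cost module the optimizer really uses.

```python
from itertools import combinations
from math import prod


def best_join_plan(sizes, selectivity, scan_cost=None):
    """Bottom-up dynamic programming over subsets of {R1, ..., Rn}.

    sizes:       {relation: B(R)}
    selectivity: {frozenset({Ri, Rj}): factor} for join predicates; a missing
                 pair means no predicate, i.e. a cartesian product (factor 1.0)
    scan_cost:   {relation: cost of scanning it}; defaults to B(R)
    Returns (plan, cost, size) for the whole query.
    """
    rels = sorted(sizes)
    scan_cost = scan_cost or sizes

    def size_of(q):
        # Size(Q): product of the B(Ri) times every selectivity inside Q
        s = prod(sizes[r] for r in q)
        for a, b in combinations(sorted(q), 2):
            s *= selectivity.get(frozenset((a, b)), 1.0)
        return s

    best = {}  # subquery (frozenset) -> (plan, cost)

    # Step 1: Plan({Ri}) = Ri, Cost({Ri}) = cost of scanning Ri
    for r in rels:
        best[frozenset([r])] = (r, scan_cost[r])

    # Step i: every subquery Q of cardinality i, trying every split Q = Q' U Q''
    for i in range(2, len(rels) + 1):
        for members in combinations(rels, i):
            q = frozenset(members)
            candidates = []
            for k in range(1, i):
                for left in combinations(members, k):
                    q1 = frozenset(left)
                    q2 = q - q1
                    plan1, cost1 = best[q1]
                    plan2, cost2 = best[q2]
                    # Placeholder cost model: both inputs are read once
                    cost = cost1 + cost2 + size_of(q1) + size_of(q2)
                    candidates.append((cost, (plan1, plan2)))
            cost, plan = min(candidates, key=lambda c: c[0])
            best[q] = (plan, cost)

    plan, cost = best[frozenset(rels)]
    return plan, cost, size_of(frozenset(rels))
```

With sizes `{"R1": 1000, "R2": 10, "R3": 500}` and join predicates R1/R2 and R2/R3 of selectivity 0.01, it picks `("R1", ("R2", "R3"))` at cost 3070: joining R2 with R3 first keeps the intermediate result at 50, whereas starting with R1 and R3 would build a 500,000-tuple cartesian product.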

A few practical considerations
  • Heuristics for reducing the search space
    – Restrict to left linear trees
    – Restrict to trees "without cartesian product"
  • Need more than just one plan for each subquery:
    – "interesting orders": save a single plan for every possible ordering of the result.
    – Why?

Query Optimization Summary
  • Create initial (naïve) query execution plan.
  • Apply transformation rules:
    – Try to un-nest blocks
    – Move predicates and grouping operators.
  • Consider each block at a time:
    – Determine join order
    – Push selections, projections if possible.

Data Integration

What is Data Integration
  • Providing
    – Uniform (same query interface to all sources)
    – Access to (queries; eventually updates too)
    – Multiple (we want many, but 2 is hard too)
    – Autonomous (DBA doesn't report to you)
    – Heterogeneous (data models are different)
    – Structured (or at least semi-structured)
    – Data Sources (not only databases).

[Figure: the mybooks.com mediated schema (Books, Inventory, Orders, Shipping, Reviews) over sources reached through a WAN and the Internet: West and East order databases, Customer Reviews, FedEx, UPS, alt.books.reviews, NYTimes, Morgan Kaufmann, Prentice-Hall, …]

The Problem: Data Integration
Uniform query capability across autonomous, heterogeneous data sources on LAN, WAN, or Internet

Motivation(s)
  • Enterprise data integration; web-site construction.
  • WWW:
    – Comparison shopping
    – Portals integrating data from multiple sources
    – B2B, electronic marketplaces
  • Science and culture:
    – Medical genetics: integrating genomic data
    – Astrophysics: monitoring events in the sky.
    – Environment: Puget Sound Regional Synthesis Model
    – Culture: uniform access to all cultural databases produced by countries in Europe.

Discussion
  • Why is it hard?
  • How will we solve it?

Current Solutions
  • Mostly ad-hoc programming: create a special solution for every case; pay consultants a lot of money.
  • Data warehousing: load all the data periodically into a warehouse.
    – 6-18 months lead time
    – Separates operational DBMS from decision support DBMS. (Not only a solution to data integration.)
    – Performance is good; data may not be fresh.
    – Need to clean, scrub your data.

Data Warehouse Architecture
[Figure: several data sources feed data extraction programs and data cleaning/scrubbing, which load a relational database (the warehouse); the warehouse serves user queries and OLAP / decision support / data cubes / data mining.]


The Virtual Integration Architecture
  • Leave the data in the sources.
  • When a query comes in:
    – Determine the relevant sources to the query
    – Break down the query into sub-queries for the sources.
    – Get the answers from the sources, and combine them appropriately.
  • Data is fresh.
  • Challenge: performance.
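The three query-time steps can be sketched as follows. Everything here is hypothetical: a `Source` is assumed to contribute rows to exactly one mediated relation and to answer simple equality selections, and "combine" is taken to be a union with duplicate elimination. A real mediator decides relevance and builds sub-queries from source descriptions, which later slides cover.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class Source:
    name: str
    relation: str  # the mediated-schema relation this source contributes to
    fetch: Callable[[dict], Iterable[dict]]  # answers an equality-selection sub-query


def table_source(name: str, relation: str, rows: list[dict]) -> Source:
    """Wrap an in-memory table as a source that filters on equality conditions."""
    def fetch(conditions: dict) -> list[dict]:
        return [r for r in rows if all(r.get(k) == v for k, v in conditions.items())]
    return Source(name, relation, fetch)


def answer(relation: str, conditions: dict, sources: list[Source]) -> list[dict]:
    # 1. Determine the sources relevant to the query.
    relevant = [s for s in sources if s.relation == relation]
    results, seen = [], set()
    for source in relevant:
        # 2. Send each relevant source its sub-query (here, the same selection).
        for row in source.fetch(conditions):
            # 3. Combine the answers: union with duplicate elimination.
            key = tuple(sorted(row.items()))
            if key not in seen:
                seen.add(key)
                results.append(row)
    return results
```

Because nothing is copied ahead of time, every call reads the sources' current contents, which is where both the freshness and the performance challenge come from.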

Virtual Integration Architecture
[Figure: user queries are posed over the mediated schema of the Mediator, which consists of a data source catalog, a reformulation engine, an optimizer, and an execution engine; the mediator reaches each data source through a wrapper.]
Sources can be: relational, hierarchical (IMS), structured files, web sites.

Which data model?

Data Integration: Higher-level Abstraction
[Figure: a query Q over the mediated schema is translated, through semantic mappings, into queries Q1, Q2, Q3 over three sources. Each source holds tables with columns (SSN, Name, Category), (SSN, CID), and (CID, Name, Quarter), e.g. 123-45-6789 Charles undergrad, enrolled in CSE444 Databases (fall).]

Mediated Schema
[Figure: a biomedical mediated schema with the concepts Entity, Sequenceable Entity, Gene, Phenotype, Structured Vocabulary, Experiment, Protein, Nucleotide Sequence, and Microarray Experiment, mapped to the sources OMIM, Swiss-Prot, HUGO, GO, Gene-Clinics, Entrez, Locus-Link, and GEO.]
Query: For the microarray experiment I just ran, what are the related nucleotide sequences and for what protein do they code?
www.biomediator.org
Tarczy-Hornoch, Mork

Research Projects
  • Garlic (IBM)
  • Information Manifold (AT&T)
  • Tsimmis, InfoMaster (Stanford)
  • The Internet Softbot/Razor/Tukwila (UW)
  • Hermes (Maryland)
  • DISCO, Agora (INRIA, France)
  • SIMS/Ariadne (USC/ISI)
  • Many, many more!

Semantic Mappings

Inventory Database A:
  BooksAndMusic(Title, Author, Publisher, ItemID, ItemType, SuggestedPrice, Categories, Keywords)

Inventory Database B:
  Books(Title, ISBN, Price, DiscountPrice, Edition)
  CDs(Album, ASIN, Price, DiscountPrice, Studio)
  BookCategories(ISBN, Category)
  CDCategories(ASIN, Category)
  Artists(ASIN, ArtistName, GroupName)
  Authors(ISBN, FirstName, LastName)

  • Differences in:
    – Names in schema
    – Attribute grouping
    – Coverage of databases
    – Granularity and format of attributes
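To make the differences concrete, here is one possible mapping of Database B's book tables into the shape of Database A's BooksAndMusic. The correspondences (ISBN as ItemID, Price as SuggestedPrice, authors concatenated into one Author field) and the function name are assumptions chosen for illustration; Publisher and Keywords have no counterpart in B.

```python
def books_b_to_a(books, authors, categories):
    """Map Database B's Books/Authors/BookCategories rows into A's BooksAndMusic shape.

    The attribute correspondences below are illustrative assumptions.
    """
    rows = []
    for b in books:
        isbn = b["ISBN"]
        # Attribute grouping: B splits names into FirstName/LastName, A has one Author field
        names = [f'{a["FirstName"]} {a["LastName"]}' for a in authors if a["ISBN"] == isbn]
        # Granularity: B has one row per category, A packs them into one field
        cats = sorted(c["Category"] for c in categories if c["ISBN"] == isbn)
        rows.append({
            "Title": b["Title"],
            "Author": ", ".join(names),
            "Publisher": None,           # coverage: B does not record publishers
            "ItemID": isbn,              # naming: assumed to play the role of ItemID
            "ItemType": "Book",          # B keeps books and CDs in separate tables
            "SuggestedPrice": b["Price"],
            "Categories": "; ".join(cats),
            "Keywords": None,            # coverage: no counterpart in B
        })
    return rows
```

Every comment marks a decision a person had to make; none of them can be read off the two schemas alone, which is why creating mappings is hard.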

Issues for Semantic Mappings
[Figure: a query Q over the mediated schema is reformulated into queries Q' over three sources holding tables with columns (SSN, Name, Category), (SSN, CID), and (CID, Name, Quarter); the two levels are connected by semantic mappings.]
  • Formalism for mappings
  • Reformulation algorithms
  • How will we create them?

Beyond Data Integration
  • Mediated schema is a bottleneck for large-scale data sharing
  • It's hard to create, maintain, and agree upon.

Peer Data Management Systems
[Figure: peers UW, Stanford, DBLP, UBC, Waterloo, CiteSeer, and Toronto linked by mappings; a query Q posed at one peer is rewritten into queries Q1 through Q6 at the others.]
  • Mappings specified locally
  • Map to most convenient nodes
  • Queries answered by traversing semantic paths.
Piazza: [Tatarinov, H., Ives, Suciu, Mork]
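Answering a query by traversing semantic paths starts with finding which peers are reachable through chains of mappings. The sketch below does only that part, with a breadth-first search; the mapping graph in the example is hypothetical, and a real PDMS such as Piazza must also rewrite the query at each hop, which is omitted here.

```python
from collections import deque


def reachable_peers(mappings, start):
    """Return each peer reachable from `start` with the mapping path used to reach it.

    mappings: {peer: [peers it has a semantic mapping to]}
    Breadth-first search, so every path uses the fewest mapping hops.
    """
    paths = {start: [start]}
    queue = deque([start])
    while queue:
        peer = queue.popleft()
        for nxt in mappings.get(peer, []):
            if nxt not in paths:
                paths[nxt] = paths[peer] + [nxt]
                queue.append(nxt)
    return paths


# Hypothetical mapping graph over the peers in the figure
mappings = {
    "UW": ["Stanford", "DBLP"],
    "Stanford": ["UBC"],
    "DBLP": ["CiteSeer"],
    "UBC": ["Waterloo"],
    "Waterloo": ["Toronto"],
}
print(reachable_peers(mappings, "UW")["Toronto"])
```

Preferring short paths is a design choice: each mapping hop can lose information, so fewer hops tend to preserve more of the query's meaning.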

PDMS-Related Projects
  • Hyperion (Toronto)
  • PeerDB (Singapore)
  • Local relational models (Trento)
  • Edutella (Hannover, Germany)
  • Semantic Gossiping (EPFL Zurich)
  • Raccoon (UC Irvine)
  • Orchestra (Ives, U. Penn)

A Few Comments about Commerce
  • Until 5 years ago:
    – Data integration = Data warehousing.
  • Since then:
    – A wave of startups: Nimble, MetaMatrix, Calixa, Composite, Enosys
    – Big guys made announcements (IBM, BEA).
    – [Delay] Big guys released products.
  • Success: analysts have new buzzword: EII
    – New addition to acronym soup (with EAI).
  • Lessons:
    – Performance was fine. Need management tools.

Data Integration: Before
[Figure: a query Q over a single mediated schema is reformulated into queries Q' over five sources.]


Data Integration: After
[Figure: the Nimble architecture. User applications (Lens™ File, InfoBrowser™, Software Developers Kit) use NIMBLE™ APIs and an XML front end to send XML queries to the Integration Layer, the Nimble Integration Engine™ (compiler, executor, metadata server, cache), which presents a common XML view over relational databases, data warehouses/marts, legacy systems, flat files, and web pages. Management tools (Lens Builder™, Integration Builder, Security Tools, Concordance Developer) serve the data administrator.]

Sound Business Models
  • Explosion of intranet and extranet information
  • 80% of corporate information is unmanaged
  • By 2004 30X more enterprise data than 1999
  • The average company:
    – maintains 49 distinct enterprise applications
    – spends 35% of total IT budget on integration-related efforts
[Chart: Enterprise Information, 1995 to 2005]
Source: Gartner, 1999


Dimensions to Consider
  • How many sources are we accessing?
  • How autonomous are they?
  • Meta-data about sources?
  • Is the data structured?
  • Queries or also updates?
  • Requirements: accuracy, completeness, performance, handling inconsistencies.
  • Closed world assumption vs. open world?

Outline
  • Wrappers
  • Semantic integration and source descriptions:
    – Modeling source completeness
    – Modeling source capabilities
  • Query optimization
  • Query execution
  • Peer-data management systems
  • Creating schema mappings

Wrapper Programs
  • Task: to communicate with the data sources and do format translations.
  • They are built w.r.t. a specific source.
  • They can sit either at the source or at the mediator.
  • Often hard to build (very little science).
  • Can be "intelligent": perform source-specific optimizations.


Example
<b> Introduction to DB </b> <i> Phil Bernstein </i> <i> Eric Newcomer </i> Addison Wesley, 1999
Transform into:
<book> <title> Introduction to DB </title> <author> Phil Bernstein </author> <author> Eric Newcomer </author> <publisher> Addison Wesley </publisher> <year> 1999 </year> </book>
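A wrapper for a source that produces listings in exactly this format could be a couple of regular expressions plus an XML builder. This is a fragile sketch that assumes the layout shown above (bold title, italic authors, then "Publisher, Year"); the pattern names and `wrap` are hypothetical, and the fragility itself illustrates why wrappers are source-specific and often hard to build.

```python
import re
import xml.etree.ElementTree as ET

# Assumed listing layout: <b>title</b>, one or more <i>author</i>, then "Publisher, Year".
ENTRY = re.compile(
    r"<b>\s*(?P<title>.*?)\s*</b>\s*"
    r"(?P<authors>(?:<i>.*?</i>\s*)+)"
    r"(?P<publisher>[^,<]+?)\s*,\s*(?P<year>\d{4})"
)
AUTHOR = re.compile(r"<i>\s*(.*?)\s*</i>")


def wrap(html: str) -> list:
    """Translate each listing in the source's HTML into a <book> element."""
    books = []
    for m in ENTRY.finditer(html):
        book = ET.Element("book")
        ET.SubElement(book, "title").text = m.group("title")
        for name in AUTHOR.findall(m.group("authors")):
            ET.SubElement(book, "author").text = name
        ET.SubElement(book, "publisher").text = m.group("publisher").strip()
        ET.SubElement(book, "year").text = m.group("year")
        books.append(book)
    return books


html = ("<b> Introduction to DB </b> <i> Phil Bernstein </i> "
        "<i> Eric Newcomer </i> Addison Wesley, 1999")
for book in wrap(html):
    print(ET.tostring(book, encoding="unicode"))
```

Any change to the source's page layout (say, the year moving before the publisher) silently breaks the pattern, so production wrappers need monitoring or learning techniques.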

Data Source Catalog
  • Contains all meta-information about the sources:
    – Logical source contents (books, new cars).
    – Source capabilities (can answer SQL queries)
    – Source completeness (has all books).
    – Physical properties of source and network.
    – Statistics about the data (like in an RDBMS)
    – Source reliability
    – Mirror sources
    – Update frequency.

Content Descriptions
  • User queries refer to the mediated schema.
  • Data is stored in the sources in a local schema.
  • Content descriptions provide the semantic mappings between the different schemas.
  • Data integration system uses the descriptions to translate user queries into queries on the sources.

Desiderata from Source Descriptions
  • Expressive power: distinguish between sources with closely related data. Hence, be able to prune access to irrelevant sources.
  • Easy addition: make it easy to add new data sources.
  • Reformulation: be able to reformulate a user query into a query on the sources efficiently and effectively.

Reformulation Problem
  • Given:
    – A query Q posed over the mediated schema
    – Descriptions of the data sources
  • Find:
    – A query Q' over the data source relations, such that:
      • Q' provides only correct answers to Q, and
      • Q' provides all possible answers to Q given the sources.

Languages for Schema Mapping
[Figure: a query Q over the mediated schema is reformulated into queries Q' over five sources.]
GAV, LAV, GLAV


Global-as-View
Mediated schema: Movie(title, dir, year, genre), Schedule(cinema, title, time).
Create View Movie AS
    select * from S1          [S1(title, dir, year, genre)]
    union
    select * from S2          [S2(title, dir, year, genre)]
    union                     [S3(title, dir), S4(title, year, genre)]
    select S3.title, S3.dir, S4.year, S4.genre
    from S3, S4
    where S3.title=S4.title
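SQLite can stand in for the reformulation engine here: once Movie is defined as a view over the sources, a query over the mediated schema is answered by unfolding the view definition. The source contents below are illustrative sample rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE S1(title, dir, year, genre);
    CREATE TABLE S2(title, dir, year, genre);
    CREATE TABLE S3(title, dir);
    CREATE TABLE S4(title, year, genre);

    INSERT INTO S1 VALUES ('Annie Hall', 'Allen', 1977, 'Comedy');
    INSERT INTO S2 VALUES ('Alien', 'Scott', 1979, 'Sci-Fi');
    INSERT INTO S3 VALUES ('Manhattan', 'Allen');
    INSERT INTO S4 VALUES ('Manhattan', 1979, 'Comedy');

    -- GAV: the mediated relation Movie is defined as a query over the sources.
    CREATE VIEW Movie AS
        SELECT title, dir, year, genre FROM S1
        UNION
        SELECT title, dir, year, genre FROM S2
        UNION
        SELECT S3.title, S3.dir, S4.year, S4.genre
        FROM S3, S4
        WHERE S3.title = S4.title;
""")

# A user query over the mediated schema; SQLite unfolds Movie into the union above.
allen_movies = conn.execute(
    "SELECT title FROM Movie WHERE dir = 'Allen' ORDER BY title"
).fetchall()
print(allen_movies)
```

Manhattan is found only through the S3/S4 branch of the view, since neither of those sources alone has all four attributes.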

Global-as-View: Example 2
Mediated schema: Movie(title, dir, year, genre), Schedule(cinema, title, time).
Create View Movie AS
    select title, dir, year, NULL from S1      [S1(title, dir, year)]
    union
    select title, dir, NULL, genre from S2     [S2(title, dir, genre)]

Global-as-View: Example 3
Mediated schema: Movie(title, dir, year, genre), Schedule(cinema, title, time).
Source S4: S4(cinema, genre)
Create View Movie AS
    select NULL, NULL, NULL, genre from S4
Create View Schedule AS
    select cinema, NULL, NULL from S4
But what if we want to find which cinemas are playing comedies?
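The information loss shows up as soon as the views are run. In this sketch the cinema names are illustrative; the point is that S4 does know which cinemas play comedies, but once the views replace title with NULL, the join Movie.title = Schedule.title can never succeed, because NULL = NULL is not true in SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE S4(cinema, genre);
    INSERT INTO S4 VALUES ('Egyptian', 'Comedy');
    INSERT INTO S4 VALUES ('Cinerama', 'Drama');

    -- GAV views from Example 3: each keeps one half of S4 and pads with NULLs.
    CREATE VIEW Movie AS
        SELECT NULL AS title, NULL AS dir, NULL AS year, genre FROM S4;
    CREATE VIEW Schedule AS
        SELECT cinema, NULL AS title, NULL AS time FROM S4;
""")

# Which cinemas are playing comedies? Asked over the mediated schema:
comedy_cinemas = conn.execute("""
    SELECT s.cinema FROM Movie m, Schedule s
    WHERE m.title = s.title AND m.genre = 'Comedy'
""").fetchall()

# The same question asked directly of the source:
direct = conn.execute("SELECT cinema FROM S4 WHERE genre = 'Comedy'").fetchall()
print(comedy_cinemas, direct)
```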

Global-as-View Summary
  • Query reformulation boils down to view unfolding.
  • Very easy conceptually.
  • Can build hierarchies of mediated schemas.
  • You sometimes lose information. Not always natural.
  • Adding sources is hard. Need to consider all other sources that are available.

Local-as-View (LAV)
Mediated schema: Book(ISBN, Title, Genre, Year), Author(ISBN, Name); sources R1, …, R5.
  R1(x, y, n) :- Book(x, y, z, t), Author(x, n), t < 1970     (books before 1970)
  R5(x, y) :- Book(x, y, "Humor", t)                            (humor books)

Query Reformulation
[Figure: mediated schema Book(ISBN, Title, Genre, Year), Author(ISBN, Name) over sources R1 (books before 1970), R2, R3, R4, and R5 (humor books).]
Query: Find authors of humor books
Plan: R1 Join R5
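The plan joins the two sources on the variables they share (x, the ISBN, and y, the title) and keeps the author name n. A sketch with a hypothetical function name and made-up ISBNs:

```python
def authors_of_humor_books(r1, r5):
    """Plan R1 join R5 for 'find authors of humor books'.

    r1: (isbn, title, name) tuples from R1, i.e. books before 1970 with their authors
    r5: (isbn, title) tuples from R5, i.e. humor books
    Joins on the shared variables x (ISBN) and y (Title), then projects the name.
    """
    humor = set(r5)
    return sorted({name for isbn, title, name in r1 if (isbn, title) in humor})


r1 = [("isbn-1", "Catch-22", "Heller"), ("isbn-2", "Dune", "Herbert")]
r5 = [("isbn-1", "Catch-22"), ("isbn-3", "Good Omens")]
print(authors_of_humor_books(r1, r5))
```

The answer is correct but not complete: Good Omens is a humor book from after 1970, so it is in R5 but not in R1, and its authors cannot be found from these sources. This is the best any plan over these sources can do.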


Query Reformulation
[Figure: the same mediated schema; R1 exports (ISBN, Title, Name) and R5 exports (ISBN, Title).]
Query: Find authors of humor books before 1960
Plan: Can't do it! (subtle reasons: neither R1 nor R5 exports Year, so no plan over them can enforce the 1960 cutoff)

Local-as-View: example 1
Mediated schema: Movie(title, dir, year, genre), Schedule(cinema, title, time).
Create Source S1 AS
    select * from Movie
Create Source S3 AS                  [S3(title, dir)]
    select title, dir from Movie
Create Source S5 AS
    select title, dir, year from Movie
    where year > 1960 AND genre="Comedy"

Local-as-View: Example 2
Mediated schema: Movie(title, dir, year, genre), Schedule(cinema, title, time).
Source S4: S4(cinema, genre)
Create Source S4 AS
    select cinema, genre
    from Movie m, Schedule s
    where m.title=s.title
Now if we want to find which cinemas are playing comedies, there is hope!

Local-as-View Summary
  • Very flexible. You have the power of the entire query language to define the contents of the source.
  • Hence, can easily distinguish between contents of closely related sources.
  • Adding sources is easy: they're independent of each other.
  • Query reformulation: answering queries using views!

The General Problem
  • Given a set of views V1, …, Vn, and a query Q, can we answer Q using only the answers to V1, …, Vn?
  • Many, many papers on this problem.
  • The best performing algorithm: The MiniCon Algorithm (Pottinger & Levy, 2000).
  • Great survey on the topic: (Halevy, 2001).

Local Completeness Information
  • If sources are incomplete, we need to look at each one of them.
  • Often, sources are locally complete.
  • Movie(title, director, year) complete for years after 1960, or for American directors.
  • Question: given a set of local completeness statements, is a query Q' a complete answer to Q?

Example
  • Movie(title, director, year) (complete after 1960).
  • Show(title, theater, city, hour)
  • Query: find movies (and directors) playing in Seattle:
      Select m.title, m.director
      From Movie m, Show s
      Where m.title=s.title AND city="Seattle"
  • Complete or not?

Example #2
  • Movie(title, director, year), Oscar(title, year)
  • Query: find directors whose movies won Oscars after 1965:
      select m.director
      from Movie m, Oscar o
      where m.title=o.title AND m.year=o.year AND o.year > 1965
  • Complete or not?

Query Optimization
  • Very related to query reformulation!
  • Goal of the optimizer: find a physical plan with minimal cost.
  • Key components in optimization:
    – Search space of plans
    – Search strategy
    – Cost model

Optimization in Distributed DBMS
  • A distributed database (2-minute tutorial):
    – Data is distributed over multiple nodes, but is uniform.
    – Query execution can be distributed to sites.
    – Communication costs are significant.
  • Consequences for optimization:
    – Optimizer needs to decide locality
    – Need to exploit independent parallelism.
    – Need operators that reduce communication costs (semijoins).
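A semijoin cuts communication by shipping join keys instead of whole relations. A minimal sketch, assuming R and S sit at two different sites and that counting shipped key values plus shipped tuples is an adequate stand-in for communication cost; the function name and the cost measure are hypothetical.

```python
def semijoin_join(r_at_site1, s_at_site2, key):
    """Join R (at site 1) with S (at site 2) using a semijoin to cut shipping.

    Returns (result, shipped), where `shipped` counts key values plus tuples
    sent between sites: a crude, hypothetical stand-in for communication cost.
    """
    # 1. Site 2 ships only the distinct join-key values of S to site 1.
    s_keys = {t[key] for t in s_at_site2}
    # 2. Site 1 computes the semijoin of R with S: the R tuples that will find a partner.
    r_reduced = [t for t in r_at_site1 if t[key] in s_keys]
    # 3. Site 1 ships the reduced R to site 2, which finishes the join.
    result = [{**r, **s} for r in r_reduced for s in s_at_site2 if r[key] == s[key]]
    shipped = len(s_keys) + len(r_reduced)
    return result, shipped


r = [{"id": i, "a": i * 10} for i in range(100)]
s = [{"id": 3, "b": "x"}, {"id": 7, "b": "y"}]
result, shipped = semijoin_join(r, s, "id")
print(len(result), shipped)
```

Here the semijoin plan ships 2 key values and 2 reduced tuples instead of all 100 tuples of R; the saving disappears when most of R joins with S, which is why the optimizer must choose semijoins by cost.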

DDBMS vs. Data Integration
  • In a DDBMS, data is distributed over a set of uniform sites with precise rules.
  • In a data integration context:
    – Data sources may provide only limited access patterns to the data.
    – Data sources may have additional query capabilities.
    – Cost of answering queries at sources unknown.
    – Statistics about data unknown.
    – Transfer rates unpredictable.

Modeling Source Capabilities
  • Negative capabilities:
    – A web site may require certain inputs (in an HTML form).
    – Need to consider only valid query execution plans.
  • Positive capabilities:
    – A source may be an ODBC compliant system.
    – Need to decide placement of operations according to capabilities.
  • Problem: how to describe and exploit source capabilities.


Example #1: Access Patterns
Mediated schema relation: Cites(paper1, paper2)
Create Source S1 as
    select * from Cites
    given paper1
Create Source S2 as
    select paper1 from Cites
Query: select paper1 from Cites where paper2="Hal00"

Example #1: Continued
Create Source S1 as
    select * from Cites
    given paper1
Create Source S2 as
    select paper1 from Cites
Select S1.paper1
From S1, S2
Where S2.paper1=S1.paper1 AND S1.paper2="Hal00"
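The plan respects S1's access pattern by using S2 to generate the paper1 bindings that S1 needs. A sketch with the two sources modeled as hypothetical Python functions over a made-up Cites table:

```python
def papers_citing(target, s1_lookup, s2_scan):
    """select paper1 from Cites where paper2 = target, respecting access patterns.

    s1_lookup(p1) -> (paper1, paper2) pairs for a given paper1  (S1: paper1 must be bound)
    s2_scan()     -> all paper1 values                          (S2: no binding needed)
    """
    answers = set()
    for p1 in s2_scan():                      # S2 supplies the bindings...
        for paper1, paper2 in s1_lookup(p1):  # ...that make S1 callable
            if paper2 == target:
                answers.add(paper1)
    return sorted(answers)


cites = [("Pot00", "Hal00"), ("Ives99", "Hal00"), ("Pot00", "Lev96")]


def s1(p1):
    return [(a, b) for a, b in cites if a == p1]


def s2():
    return sorted({a for a, _ in cites})


print(papers_citing("Hal00", s1, s2))
```

Because S2 returns every paper1 value in Cites, feeding all of them into S1 retrieves every citation, so this plan returns the complete answer.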

Example #2: Access Patterns
Create Source S1 as
    select * from Cites
    given paper1
Create Source S2 as
    select paperID from UW-Papers
Create Source S3 as
    select paperID from AwardPapers
    given paperID
Query: select * from AwardPapers

Example #2: Solutions
  • Can't go directly to S3 because it requires a binding.
  • Can go to S2, get UW papers, and check if they're in S3.
  • Can go to S2, get UW papers, feed them into S1, and feed the results into S3.
  • Can go to S2, feed results into S1, feed results into S1 again, and then feed results into S3.
  • Strictly speaking, we can't a priori decide when to stop.
  • Need recursive query processing.
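The recursive strategy is a fixpoint computation: keep feeding newly discovered paper IDs into S1 until a round finds nothing new, then probe S3 with every ID found. A sketch with the three sources modeled as hypothetical functions over made-up data; the loop stops only because the set of paper IDs is finite, which is exactly why the number of rounds cannot be fixed in advance.

```python
def award_papers(s1_cites, s2_uw_papers, s3_is_award):
    """select * from AwardPapers using only the sources' access patterns.

    s1_cites(p)    -> papers cited by p           (S1: paper1 must be bound)
    s2_uw_papers() -> all UW paper IDs            (S2: no binding needed)
    s3_is_award(p) -> True if p is an award paper (S3: paperID must be bound)
    """
    known = set(s2_uw_papers())
    frontier = set(known)
    while frontier:  # fixpoint: stop when a round discovers no new paper IDs
        discovered = {q for p in frontier for q in s1_cites(p)}
        frontier = discovered - known
        known |= frontier
    return sorted(p for p in known if s3_is_award(p))


citations = {"uw1": ["a"], "a": ["b"], "b": ["c"], "x": ["y"]}
awards = {"c", "uw2", "y"}


def s1(p):
    return citations.get(p, [])


def s2():
    return ["uw1", "uw2"]


def s3(p):
    return p in awards


print(award_papers(s1, s2, s3))
```

Award paper "y" is never found: no chain of citations leads to it from a UW paper, and the access patterns offer no other way to enumerate paper IDs.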

Handling Positive Capabilities
  • Characterizing positive capabilities:
    – Schema independent (e.g., can always perform joins, selections).
    – Schema dependent: can join R and S, but not T.
    – Given a query, tells you whether it can be handled.
  • Key issue: how do you search for plans?
  • Garlic approach (IBM): Given a query, STAR rules determine which subqueries are executable by the sources. Then proceed bottom-up as in System-R.

Matching Objects Across Sources
• How do I know that A. Halevy in source 1 is the same as Alon Halevy in source 2?
• If there are uniform keys across sources, no problem.
• If not:
  – Domain-specific solutions (e.g., maybe look at the address, ssn).
  – Use information retrieval techniques (Cohen, 98). Judge similarity as you would between documents.
  – Use concordance tables. These are time-consuming to build, but you can then sell them for lots of money.
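The information-retrieval idea can be sketched with token-based cosine similarity between names. This is a simplification: Cohen's approach weights tokens by TF-IDF over the whole collection, while this sketch uses raw token counts. The 0.4 threshold is an arbitrary assumption.

```python
import math
import re
from collections import Counter


def tokens(name):
    # Lowercase and split on any run of non-alphanumeric characters.
    return [t for t in re.split(r"[^a-z0-9]+", name.lower()) if t]


def cosine(a, b):
    """Cosine similarity of raw token-count vectors (no IDF weighting)."""
    va, vb = Counter(tokens(a)), Counter(tokens(b))
    dot = sum(count * vb[t] for t, count in va.items())
    norm = math.sqrt(sum(c * c for c in va.values()) * sum(c * c for c in vb.values()))
    return dot / norm if norm else 0.0


def same_entity(a, b, threshold=0.4):
    # Arbitrary threshold; in practice it would be tuned on labeled pairs.
    return cosine(a, b) >= threshold


print(cosine("A. Halevy", "Alon Halevy"))       # 0.5
print(same_entity("A. Halevy", "Alon Halevy"))  # True
print(same_entity("A. Halevy", "J. Ullman"))    # False
```

Raw counts weight the initial "a" as heavily as the surname, so "A. Smith" would also score 0.5 against "A. Halevy". IDF weighting exists to damp such common tokens.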


Optimization and Execution
• Problem:
  – Few and unreliable statistics about the data.
  – Unexpected (possibly bursty) network transfer rates.
  – Generally, an unpredictable environment.
• General solution (research area):
  – Adaptive query processing.
  – Interleave optimization and execution. As you get to know more about your data, you can improve your plan.
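One simple form of interleaving works like this: run a plan fragment, compare its actual cardinality with the optimizer's estimate, and re-plan the rest of the query when the two diverge. This minimal sketch shows only that trigger. The factor-of-2 threshold and the `reoptimize` callback are hypothetical. Tukwila's actual mechanism, an event handler reacting to runtime events, is more general.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Fragment:
    name: str
    estimated_rows: int
    run: Callable[[], int]  # executes the fragment and returns the actual row count


def execute_adaptively(fragments: List[Fragment],
                       reoptimize: Callable[[List[Fragment], Dict[str, int]], List[Fragment]],
                       threshold: float = 2.0) -> List[str]:
    """Run plan fragments in order; re-plan the rest when an estimate is badly off.

    Returns a trace of fragment names and "reoptimize" events.
    """
    observed: Dict[str, int] = {}
    trace: List[str] = []
    remaining = list(fragments)
    while remaining:
        frag = remaining.pop(0)
        actual = frag.run()
        observed[frag.name] = actual
        trace.append(frag.name)
        est = max(frag.estimated_rows, 1)
        error = max(actual / est, est / max(actual, 1))  # symmetric error factor
        if error > threshold and remaining:
            remaining = reoptimize(remaining, observed)  # new plan for the rest
            trace.append("reoptimize")
    return trace


frags = [Fragment("scan_S1", 100, lambda: 5000), Fragment("join_S2", 10, lambda: 10)]
print(execute_adaptively(frags, lambda rest, obs: rest))
# ['scan_S1', 'reoptimize', 'join_S2']
```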

[Figure: Tukwila architecture. Components: Reformulator, Optimizer, (Re-)Optimizer, Mem Alloc/Fragmenter, Execution Engine, Event Handler, Query Operators, Temp Store, Catalog. Labeled flows: query, source mappings, logical plan, exec plan, data, exec results, answer.]

Tukwila Data Integration System
• Novel components:
  – Event handler
  – Optimization-execution loop

Double Pipelined Join (Tukwila)

Hash Join
✗ Partially pipelined: no output until inner is read
✗ Asymmetric (inner vs. outer): optimization requires source behavior knowledge

Double Pipelined Hash Join
✓ Outputs data immediately
✓ Symmetric: requires less source knowledge to optimize
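The double pipelined (symmetric) hash join keeps one hash table per input. When a tuple arrives from either side, the join inserts it into that side's table and probes it against the other side's table. A match is therefore emitted as soon as both halves have arrived, whichever source is faster. This minimal in-memory sketch omits the memory-overflow handling Tukwila needs. The interleaved arrival order in the example is made up.

```python
from collections import defaultdict


def double_pipelined_join(arrivals, left_key, right_key):
    """Symmetric hash join over an interleaved stream of ("L" | "R", tuple).

    Yields joined (left, right) pairs as soon as a match becomes possible.
    """
    tables = {"L": defaultdict(list), "R": defaultdict(list)}
    for side, tup in arrivals:
        key = left_key(tup) if side == "L" else right_key(tup)
        other = "R" if side == "L" else "L"
        # Build: remember this tuple for future arrivals from the other side.
        tables[side][key].append(tup)
        # Probe: join with everything already seen from the other side.
        for match in tables[other][key]:
            yield (tup, match) if side == "L" else (match, tup)


purchases_and_persons = [
    ("R", ("Ann", "seattle")),   # Person(name, city) arrives first
    ("L", ("Ann", "book")),      # Purchase(buyer, item)
    ("L", ("Bob", "cd")),
    ("R", ("Bob", "portland")),
]
for pair in double_pipelined_join(purchases_and_persons, lambda p: p[0], lambda q: q[0]):
    print(pair)
# (('Ann', 'book'), ('Ann', 'seattle'))
# (('Bob', 'cd'), ('Bob', 'portland'))
```

Because both inputs are treated identically, the optimizer does not have to guess which source will be the slower "inner" relation.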

Semantic Mappings

[Figure: semantic mappings between Inventory Database A and Inventory Database B.]
Relations shown:
  BooksAndMusic(Title, Author, Publisher, ItemID, ItemType, SuggestedPrice, Categories, Keywords)
  Books(Title, ISBN, Price, DiscountPrice, Edition)
  CDs(Album, ASIN, Price, DiscountPrice, Studio)
  BookCategories(ISBN, Category)
  CDCategories(ASIN, Category)
  Artists(ASIN, ArtistName, GroupName)
  Authors(ISBN, FirstName, LastName)

• Need mappings in every data sharing architecture.
• "Standards are great, but there are too many."

Why is it so Hard?
• Schemas never fully capture their intended meaning:
  – We need to leverage any additional information we may have.
• A human will always be in the loop.
  – Goal is to improve the designer's productivity.
  – Solution must be extensible.
• Two cases for schema matching:
  – Find a map to a common mediated schema.
  – Find a direct mapping between two schemas.

Typical Matching Heuristics
• We build a model for every element from multiple sources of evidence in the schemas:
  – Schema element names
    • BooksAndCDs/Categories ~ BookCategories/Category
  – Descriptions and documentation
    • ItemID: unique identifier for a book or a CD
    • ISBN: unique identifier for any book
  – Data types, data instances
    • DateTime ≠ Integer; addresses have similar formats
  – Schema structure
    • All books have similar attributes

Models consider only the two schemas. In isolation, techniques are incomplete or brittle: Need principled combination.
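A name-based heuristic like the first bullet can be sketched in Python. The sketch splits element paths on CamelCase and separators, strips a trailing plural crudely, drops a few stopwords, and compares the resulting token sets with Jaccard similarity. The tokenizer, the stemming rule and the stopword list are all simplifying assumptions.

```python
import re

STOPWORDS = {"and", "of", "the"}


def name_tokens(path):
    """Split an element path such as 'BooksAndCDs/Categories' into crude word stems."""
    out = set()
    for word in re.findall(r"[A-Z]+[a-z]*|[a-z]+|\d+", path):
        w = word.lower()
        if w.endswith("ies"):
            w = w[:-3] + "y"      # categories -> category
        elif w.endswith("s") and not w.endswith("ss") and len(w) > 2:
            w = w[:-1]            # books -> book, cds -> cd
        if w not in STOPWORDS:
            out.add(w)
    return out


def name_similarity(a, b):
    """Jaccard similarity of the two names' token sets."""
    ta, tb = name_tokens(a), name_tokens(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


print(sorted(name_tokens("BooksAndCDs/Categories")))  # ['book', 'category', 'cd']
print(round(name_similarity("BooksAndCDs/Categories", "BookCategories/Category"), 2))  # 0.67
print(name_similarity("ItemID", "ISBN"))  # 0.0
```

The last line shows the brittleness the slide warns about. ItemID and ISBN are related identifiers, which the documentation heuristic picks up, yet their names share no tokens.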


Using Past Experience
• Matching tasks are often repetitive.
  – Humans improve over time at matching.
  – A matching system should improve too!
• LSD:
  – Learns to recognize elements of the mediated schema.
  – [Doan, Domingos, H., SIGMOD-01, MLJ-03]
• Doan: 2003 ACM Distinguished Dissertation Award.

[Figure: mediated schema over a set of data sources.]

Example: Matching Real-Estate Sources

Mediated schema: address, price, agent-phone, description

Schema of realestate.com: location, listed-price, phone, comments
  location: Miami, FL; Boston, MA; ...
  listed-price: $250,000; $110,000; ...
  phone: (305) 729 0831; (617) 253 1429; ...
  comments: Fantastic house; Great location; ...

homes.com: price, contact-phone, extra-info
  price: $550,000; $320,000; ...
  contact-phone: (278) 345 7215; (617) 335 2315; ...
  extra-info: Beautiful yard; Great beach; ...

Learned hypotheses:
  If "fantastic" & "great" occur frequently in data values => description
  If "phone" occurs in the name => agent-phone
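A content-based (instance) learner in the spirit of the first hypothesis can be sketched as a multinomial Naive Bayes classifier over tokens in data values. It is trained on the realestate.com values above and then labels homes.com columns. Two things are assumptions here rather than LSD's actual configuration. The first is the realestate.com column mappings (location to address, listed-price to price, phone to agent-phone, comments to description). The second is the tokenizer and add-one smoothing.

```python
import math
import re
from collections import Counter, defaultdict

# realestate.com values, labeled with the mediated-schema element each column
# is assumed to map to.
TRAIN = [
    ("address", "Miami, FL"), ("address", "Boston, MA"),
    ("price", "$250,000"), ("price", "$110,000"),
    ("agent-phone", "(305) 729 0831"), ("agent-phone", "(617) 253 1429"),
    ("description", "Fantastic house"), ("description", "Great location"),
]


def value_tokens(text):
    # Words and digit runs, lowercased.
    return re.findall(r"[a-z]+|\d+", text.lower())


class NaiveBayesColumnLearner:
    """Multinomial Naive Bayes over tokens in data values, with add-one smoothing."""

    def fit(self, labeled_values):
        self.token_counts = defaultdict(Counter)
        self.value_counts = Counter()
        self.vocab = set()
        for label, value in labeled_values:
            toks = value_tokens(value)
            self.token_counts[label].update(toks)
            self.value_counts[label] += 1
            self.vocab.update(toks)
        self.total_values = sum(self.value_counts.values())
        return self

    def log_score(self, label, value):
        counts = self.token_counts[label]
        denom = sum(counts.values()) + len(self.vocab)
        score = math.log(self.value_counts[label] / self.total_values)
        for tok in value_tokens(value):
            score += math.log((counts[tok] + 1) / denom)
        return score

    def predict_column(self, values):
        """Label a whole column by summing the per-value log scores."""
        return max(self.value_counts,
                   key=lambda label: sum(self.log_score(label, v) for v in values))


learner = NaiveBayesColumnLearner().fit(TRAIN)
print(learner.predict_column(["Beautiful yard", "Great beach"]))  # description
print(learner.predict_column(["$550,000", "$320,000"]))           # price
```

With this little data, phone numbers are only weakly separated from prices. This is one reason LSD adds dedicated recognizers, such as one for phone numbers.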

Learning Source Descriptions
• We learn a classifier for each element of the mediated schema.
• Training examples are provided by the given mappings.
• Multi-strategy learning:
  – Base learners: name, instance, description
  – Combine using stacking.
• Accuracy of 70-90% in experiments.
• Learning about the mediated schema.

Multi-Strategy Learning
• Use a set of base learners:
  – Name learner, Naïve Bayes, Whirl, XML learner
• And a set of recognizers:
  – County name, zip code, phone numbers.
• Each base learner produces a prediction weighted by a confidence score.
• Combine base learners with a meta-learner, using stacking.
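The prediction-time step of the stacking meta-learner can be sketched as follows. Each base learner reports a confidence per mediated-schema label, and the meta-learner takes a weighted sum using per-(learner, label) weights. In LSD these weights are learned from training data. Here they are hand-set placeholders, and the base-learner outputs are made-up numbers, so this shows only the combination step.

```python
from typing import Dict, List, Tuple

# Per-(base learner, label) weights. Stacking learns these from training
# data; the numbers here are placeholders.
WEIGHTS: Dict[str, Dict[str, float]] = {
    "name_learner": {"agent-phone": 0.7, "description": 0.2},
    "naive_bayes": {"agent-phone": 0.3, "description": 0.8},
}


def combine(predictions: Dict[str, Dict[str, float]],
            weights: Dict[str, Dict[str, float]]) -> List[Tuple[str, float]]:
    """Meta-learner step: weighted sum of base-learner confidence scores per label.

    predictions maps learner -> {label: confidence}; returns (label, score)
    pairs sorted from most to least likely.
    """
    combined: Dict[str, float] = {}
    for learner, scores in predictions.items():
        for label, confidence in scores.items():
            w = weights.get(learner, {}).get(label, 0.0)
            combined[label] = combined.get(label, 0.0) + w * confidence
    return sorted(combined.items(), key=lambda kv: kv[1], reverse=True)


# Made-up base-learner outputs for homes.com's "contact-phone" column.
ranked = combine({"name_learner": {"agent-phone": 0.9, "description": 0.1},
                  "naive_bayes": {"agent-phone": 0.6, "description": 0.4}}, WEIGHTS)
print(ranked[0][0])  # agent-phone
```

Per-label weights let the meta-learner trust the name learner for agent-phone, where a column name like "contact-phone" is telling, and the content learner for description, where free-text values carry the signal.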

The Semantic Web
• A web of structured data:
  – The 5-year-old vision of Tim Berners-Lee
• How does it relate to data integration?
• How are we going to do it?
• Why should we do it? Do we need a killer app, or is the semantic web a killer app?

The End