SLIDE 1

The Future of Data: A Smorgasbord

Guy M. Lohman, IBM Almaden Research Center

Myth #1: XML will "solve" the data format problem

Heterogeneity will always reign!
Not everything will be XMLized! Legacy systems, flat files, the next "great thing", ...
Who's going to control the semantics of all those XML tags? Remember, the "X" stands for "extensible"! Everyone and his mother will be coming up with new tags, and who knows what they mean when you're searching the web?

SLIDE 2

XML is NOT a Panacea

EXAMPLE 1: What does the tag <salary> mean?

What currency? What frequency? (annual, monthly, hourly, ...?)

EXAMPLE 2: What does the value "order" mean when its tag is <type>?

Type of what? "Order" of what? Purchase? Sequence?

Tags give some increased context... but only a slight improvement over Google search!
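To make Example 1 concrete, here is a minimal sketch using Python's standard xml.etree.ElementTree. The two feeds, their values, and the currency/per attributes are all hypothetical, invented for illustration. Trusting the tag name alone produces a meaningless total. Carrying the semantics along with the value at least lets a program refuse to add salaries whose meanings differ.

```python
import xml.etree.ElementTree as ET

# Hypothetical feeds: both use <salary>, but the meaning is unstated.
FEED_A = "<employee><name>Ann</name><salary>85000</salary></employee>"  # meant as annual USD
FEED_B = "<employee><name>Bob</name><salary>40</salary></employee>"     # meant as hourly EUR

def naive_total(docs):
    """Sum every <salary> value, trusting the tag name alone."""
    return sum(float(ET.fromstring(d).findtext("salary")) for d in docs)

# Hypothetical enriched markup: the semantics travel with the value as attributes.
RICH_A = '<employee><name>Ann</name><salary currency="USD" per="year">85000</salary></employee>'
RICH_B = '<employee><name>Bob</name><salary currency="EUR" per="hour">40</salary></employee>'

def totals_by_meaning(docs):
    """Add salaries only when they share the same (currency, period) meaning."""
    totals = {}
    for d in docs:
        sal = ET.fromstring(d).find("salary")
        # Missing attributes are recorded as "?" rather than guessed.
        key = (sal.get("currency", "?"), sal.get("per", "?"))
        totals[key] = totals.get(key, 0.0) + float(sal.text)
    return totals

print(naive_total([FEED_A, FEED_B]))        # 85040.0, a number with no meaning
print(totals_by_meaning([RICH_A, RICH_B]))  # separate totals per meaning
```

The attributes do not settle who defines "USD" or "per", which is exactly the tag-semantics problem raised in Myth #1.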

Myth #2: Relational is Dead -- native XML repositories are the future

Relational DBMSs are hugely successful, with a complete array of utilities, features, and performance honing.
Evolutionary rather than revolutionary changes are the only way that change will happen.
Remember how object-oriented systems, which surely subsumed relational systems, were going to replace relational?

SLIDE 3

Myth #3: Just shred everything into relational tables!

Boy, that's a LOT of work for all documents, few of which will ever be retrieved by queries.
Many documents won't even be searched!
This won't exploit the nesting structure that XML provides -- a lost opportunity.
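To show what "shredding" involves, here is a minimal sketch using Python's standard sqlite3 and xml.etree.ElementTree. The document, the orders/items schema, and the shred function are hypothetical. Each nesting level becomes its own table. The parent/child link and the document order have to be stored explicitly, and reading the document back takes a join.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical nested document: one order containing two line items.
DOC = """
<order id="17">
  <customer>Acme</customer>
  <item sku="A1" qty="2"/>
  <item sku="B7" qty="5"/>
</order>
"""

def shred(conn, xml_text):
    """Flatten the nesting into parent and child rows linked by a foreign key."""
    root = ET.fromstring(xml_text.strip())
    order_id = int(root.get("id"))
    conn.execute("INSERT INTO orders (id, customer) VALUES (?, ?)",
                 (order_id, root.findtext("customer")))
    for pos, item in enumerate(root.findall("item")):
        # Sibling order is implicit in XML; in tables it must be stored as a column.
        conn.execute("INSERT INTO items (order_id, pos, sku, qty) VALUES (?, ?, ?, ?)",
                     (order_id, pos, item.get("sku"), int(item.get("qty"))))

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT)")
conn.execute("CREATE TABLE items (order_id INTEGER REFERENCES orders(id), "
             "pos INTEGER, sku TEXT, qty INTEGER)")
shred(conn, DOC)

# Reassembling the nesting needs a join plus ORDER BY on the stored position.
rows = conn.execute(
    "SELECT o.customer, i.sku, i.qty FROM orders o JOIN items i ON i.order_id = o.id "
    "WHERE o.id = ? ORDER BY i.pos", (17,)).fetchall()
print(rows)  # [('Acme', 'A1', 2), ('Acme', 'B7', 5)]
```

Every distinct document shape needs its own mapping like this. That work is wasted for documents that are never queried.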

Myth #4: Everything's off the Web as Data Streams

SOMEONE has to store the stuff!
Companies won't store their corporate jewels on the Web, except possibly in an intranet inside the firewall.
Caching will become even more commonplace, for performance.

SLIDE 4

Myth #5: There's just one copy of the data I'm interested in

Multiple levels of caching are now commonplace

Edge servers
Mobile clients that are periodically detached
Multiple tiers
Multiple components within a server

Different degrees of synchronization
Synchronizing is a major headache!

Cache Write-Through Dilemma

[Figure: Master, Replica 1, and Replica 2 all hold Guido='smart'. Replica 1 sets its copy to Guido='nice', while Replica 2 sets its copy to Guido='jerk'.]

SLIDE 5

Cache Write-Through Dilemma (continued)

[Figure: Replica 1 now holds Guido='nice' and Replica 2 holds Guido='jerk'. Both write through to the Master, which started with Guido='smart'. What should the Master end up with? Guido = ???]
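One common way to at least detect this dilemma is optimistic concurrency with version numbers. The talk does not prescribe this technique. Here it is a swapped-in illustration, and the Master, Replica, and WriteConflict names are hypothetical. A naive write-through that just overwrites would leave the Master holding whichever write arrives last ('jerk' in this sequence) with no warning. Tagging each write with the version it was based on lets the Master reject the stale one.

```python
from dataclasses import dataclass

class WriteConflict(Exception):
    """Raised when a write-through is based on a stale copy of the master's value."""

@dataclass
class Master:
    value: str
    version: int = 0

    def write_through(self, new_value, base_version):
        # Accept the write only if the replica saw the master's current version.
        if base_version != self.version:
            raise WriteConflict(
                f"based on v{base_version}, master is at v{self.version} ({self.value!r})")
        self.value = new_value
        self.version += 1

@dataclass
class Replica:
    value: str
    base_version: int

    def update(self, master, new_value):
        # The local copy changes first; if the write-through is rejected,
        # this replica is left holding a value the master never accepted.
        self.value = new_value
        master.write_through(new_value, self.base_version)
        self.base_version = master.version  # in sync with the master again

master = Master("smart")
r1 = Replica(master.value, master.version)
r2 = Replica(master.value, master.version)

r1.update(master, "nice")          # accepted: master becomes 'nice' at v1
try:
    r2.update(master, "jerk")      # r2 still believes it is editing v0
except WriteConflict as e:
    print("conflict:", e)
print(master.value, master.version)  # nice 1
```

Detection is only half the problem. Whether to reject, merge, or let the last writer win is a policy choice, and detached mobile clients make every option harder.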

SLIDE 6

Myth #6: Don't need to integrate data -- use Web Services

Back to the future!
Return to the "Balkanization" of data silos!
Encapsulating data within an app makes sense for security, but not within an enterprise!

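App Silos vs. Integration

[Figure: a Customers App and an Orders App, each over its own database (Customers Database, Orders Database) managed by separate DBMSs (DBMS 1, DBMS 2). The figure contrasts connecting them through a Web Service with DB Integration of the two databases.]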
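The contrast in this figure can be sketched with Python's standard sqlite3, using in-memory databases as stand-ins. The tables, sample rows, and the get_customer_name/list_orders "service" functions are hypothetical. In the silo version the application performs the join by hand, with one service call per order. In the integrated version both tables sit under one DBMS, so the same question becomes a single declarative query that the optimizer can plan as a whole.

```python
import sqlite3

# Hypothetical sample data, shared by both setups below.
CUSTOMERS = [(1, "Acme"), (2, "Globex")]
ORDERS = [(10, 1, 250.0), (11, 1, 100.0), (12, 2, 75.0)]  # (id, cust_id, amount)
CUST_DDL = "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)"
ORD_DDL = "CREATE TABLE orders (id INTEGER PRIMARY KEY, cust_id INTEGER, amount REAL)"
CUST_INS = "INSERT INTO customers VALUES (?, ?)"
ORD_INS = "INSERT INTO orders VALUES (?, ?, ?)"

def load(conn, ddl, insert_sql, rows):
    conn.execute(ddl)
    conn.executemany(insert_sql, rows)
    return conn

# Silos: each app owns a separate database, reachable only through its "service".
customers_db = load(sqlite3.connect(":memory:"), CUST_DDL, CUST_INS, CUSTOMERS)
orders_db = load(sqlite3.connect(":memory:"), ORD_DDL, ORD_INS, ORDERS)

def get_customer_name(cust_id):  # hypothetical stand-in for a Web Service call
    row = customers_db.execute(
        "SELECT name FROM customers WHERE id = ?", (cust_id,)).fetchone()
    return row[0] if row else None

def list_orders():  # hypothetical stand-in for another Web Service call
    return orders_db.execute("SELECT cust_id, amount FROM orders").fetchall()

def silo_totals():
    """The application does the join: one customer lookup per order."""
    totals = {}
    for cust_id, amount in list_orders():
        name = get_customer_name(cust_id)
        totals[name] = totals.get(name, 0.0) + amount
    return totals

# Integration: both tables under one DBMS, so the join is one query.
integrated = sqlite3.connect(":memory:")
load(integrated, CUST_DDL, CUST_INS, CUSTOMERS)
load(integrated, ORD_DDL, ORD_INS, ORDERS)

def integrated_totals():
    return dict(integrated.execute(
        "SELECT c.name, SUM(o.amount) FROM customers c "
        "JOIN orders o ON o.cust_id = c.id GROUP BY c.name"))

print(silo_totals())
print(integrated_totals())
```

Both versions print the same totals. The difference is where the join lives: hand-written in every app that needs it, or declared once to a DBMS that can optimize it.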

SLIDE 7

Who's REALLY Doing These?

Stock quotes
Searching Shakespeare's plays
Most XPath examples

More Realistic Examples

Everything on IBM stock: price + analysts' opinions + news items

A great statistic I saw a while ago (when?)...

In an article on the web?
In an e-mail from someone? Who? Folder?
In my Palm? Where?
In a presentation someone sent me?
In a paper I read?
In a file (which directory?) on my development machine? My laptop?

SLIDE 8

My Position

Heterogeneity will always reign

Format (structured, semi-structured, unstructured)
Schema chaos, even for structured data!
Schema and data are interchangeable

A "Data Smorgasbord" -- deal with it!
Databases (not apps) are still the best hope for integrating data (richer modeling)
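One way to read "schema and data are interchangeable" is that the same fact can live in the schema (a column name) or in the data (a value). This minimal sketch is an illustration rather than anything from the talk: the sales figures and the pivot/unpivot helpers are hypothetical. It moves quarter labels between the two roles and back.

```python
# Hypothetical sales figures: here the quarter is part of the schema (one column per quarter).
wide = [
    {"region": "East", "Q1": 10, "Q2": 12},
    {"region": "West", "Q1": 7, "Q2": 9},
]

def unpivot(rows, key):
    """Schema -> data: each non-key column name becomes a value in a (key, column, value) triple."""
    return [(r[key], col, val) for r in rows for col, val in r.items() if col != key]

def pivot(triples, key):
    """Data -> schema: each distinct column value becomes a column name again."""
    out = {}
    for k, col, val in triples:
        out.setdefault(k, {key: k})[col] = val
    return list(out.values())

tall = unpivot(wide, "region")
print(tall)  # [('East', 'Q1', 10), ('East', 'Q2', 12), ('West', 'Q1', 7), ('West', 'Q2', 9)]
print(pivot(tall, "region") == wide)  # True
```

Two sources holding the same facts in these two shapes have "schema chaos" even though both are perfectly structured. Integrating them means recognizing that one source's column names are the other source's values.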

Consequences

Will see:

Ad hoc "communities" for standardizing the semantics of tags (like e-marketplaces)
Products promising integration

Need:

Richer semantic models (yes, even for XML!)
More robust/adaptive query processing
Better tools for managing diversity