Data Curation Means … by Michael Stonebraker



SLIDE 1

Data Curation Means … by Michael Stonebraker

SLIDE 2

For K Data Sources:

Identify the data sources!
Ingest the data
Clean the data
Transform the data
Perform schema integration
Perform entity consolidation

SLIDE 3

Simple Example --- 2 Data Sources

Employee (name, salary, hobbies, age, city, state)
Person (p-id, wages, address, birthday, year_born, likes)

SLIDE 4

And 2 Records

Sam Madden, $4000, {bike, dogs}, 36, Cambridge, Mass.
Samuel E. Madden, $5000, Newton Ma., October 4, 1975, bicycling
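As a rough illustration, the two records can be lined up against the two schemas and placed in one shared list. This is only a sketch: the helper `to_record`, the `common_store` list, and the field-by-field alignment of the Person record (in particular, treating "Samuel E. Madden" as the p-id value and splitting "October 4" / 1975 across birthday and year_born) are assumptions, not something the slides state.

```python
# Schemas as given on the slide.
EMPLOYEE_FIELDS = ("name", "salary", "hobbies", "age", "city", "state")
PERSON_FIELDS = ("p-id", "wages", "address", "birthday", "year_born", "likes")

# Raw values as given on the slide; the positional alignment is an assumption.
employee_row = ("Sam Madden", "$4000", {"bike", "dogs"}, 36, "Cambridge", "Mass.")
person_row = ("Samuel E. Madden", "$5000", "Newton Ma.", "October 4", 1975, "bicycling")


def to_record(source, fields, row):
    """Pair each value with its attribute name and tag the record with its source."""
    if len(fields) != len(row):
        raise ValueError(f"{source}: expected {len(fields)} values, got {len(row)}")
    return {"source": source, "data": dict(zip(fields, row))}


# Hypothetical "common place": a single list holding records from both sources.
common_store = [
    to_record("Employee", EMPLOYEE_FIELDS, employee_row),
    to_record("Person", PERSON_FIELDS, person_row),
]

for record in common_store:
    print(record["source"], record["data"])
```

Keeping the source tag on each record preserves provenance, which later steps need when two sources disagree.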

SLIDE 5

Data Curation (1)

Ingest Read the 2 records and store in a common place Clean $4000 and $5000: both wrong? One right? Both

right? (May have to ask an expert)

Tranform October 4, 1975  39 Now clean 39 and 36

SLIDE 6

Data Curation (2)

Schema Integration hobbies same as likes? Person same as Employee? Entity consolidation 2 Sams or 1 Sam?

SLIDE 7

Data Curation (3)

Making use of “trusted” data sources, if available
  Dictionary of hobbies, …
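A trusted dictionary of hobbies could be used to map differently worded values onto one canonical term, which also gives evidence that hobbies and likes describe the same thing. The dictionary contents below are invented for illustration, and `canonical` is a hypothetical helper.

```python
# Hypothetical trusted dictionary: surface form -> canonical hobby.
HOBBY_DICTIONARY = {
    "bike": "cycling",
    "biking": "cycling",
    "bicycling": "cycling",
    "dogs": "dogs",
}


def canonical(values):
    """Map each value to its canonical form; unknown values pass through lowercased."""
    result = set()
    for value in values:
        key = value.strip().lower()
        result.add(HOBBY_DICTIONARY.get(key, key))
    return result


hobbies = canonical({"bike", "dogs"})  # from the Employee record
likes = canonical({"bicycling"})       # from the Person record
shared = hobbies & likes

print(shared)
```

After normalization the two records share a value, which supports both the schema match (hobbies and likes) and the entity match (one Sam).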

SLIDE 8

Papers in this Session

Consider various aspects of data curation
  Web Tables
  “finding” issue
  Sandbox for experimentation
In 18 minutes or less
Leaving 15 minutes for discussion

SLIDE 9

Advertisement

This problem is killing most enterprises!!!!
  Customer integration for cross selling
  Purchasing integration to get “most favored nation” terms
  Medical data records
  …