
Enhancing Traditional Databases to Support Broader Data Management Applications
Yi Chen
Computer Science & Engineering, Arizona State University

What Is a Database System?
• Of course, there are traditional relational database management systems (RDBMS)
• The relational model was introduced in 1970 by Dr. E. F. Codd (of IBM)
• Commercial relational databases began to appear in the 1980s
• RDBMS have been the focus of most database work over the past 30 years
Yi Chen --- January 23, 2006

A Relational Database (RDBMS)
• A table (relation) is made of columns (attributes) and rows (tuples, records).
• The Name column of Climbs refers to the Name column of Climber.
• A predefined data structure (schema) is required.

Climber
Name    Skill        Age
James   Beginner     21
Bob     Experienced  33

Climbs
Name    Route        Date       Duration
Bob     Last Tango   10/10/05   5
Bob     Last Tango   1/10/06    4.5
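To make "a predefined schema is required" concrete, here is a minimal sketch using Python's built-in sqlite3 module with the two tables above. The column types, and declaring Climbs.Name as a foreign key to Climber.Name, are assumptions about how the slide's schema would be written down; the Email column and the climber Ann are hypothetical.

```python
import sqlite3

def make_climbing_db() -> sqlite3.Connection:
    """Declare the Climber/Climbs schema up front, then load the sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
    conn.execute("CREATE TABLE Climber (Name TEXT PRIMARY KEY, Skill TEXT, Age INTEGER)")
    conn.execute(
        "CREATE TABLE Climbs (Name TEXT REFERENCES Climber(Name), "
        "Route TEXT, Date TEXT, Duration REAL)"
    )
    conn.executemany("INSERT INTO Climber VALUES (?, ?, ?)",
                     [("James", "Beginner", 21), ("Bob", "Experienced", 33)])
    conn.executemany("INSERT INTO Climbs VALUES (?, ?, ?, ?)",
                     [("Bob", "Last Tango", "10/10/05", 5),
                      ("Bob", "Last Tango", "1/10/06", 4.5)])
    return conn

conn = make_climbing_db()

# A record carrying an attribute the schema does not declare is rejected.
try:
    conn.execute("INSERT INTO Climber (Name, Skill, Age, Email) "
                 "VALUES ('Ann', 'Beginner', 25, 'ann@example.org')")
    unknown_column_rejected = False
except sqlite3.OperationalError:
    unknown_column_rejected = True

# A climb by someone missing from Climber violates the "refers to" link.
try:
    conn.execute("INSERT INTO Climbs VALUES ('Ann', 'Last Tango', '2/1/06', 3)")
    dangling_reference_rejected = False
except sqlite3.IntegrityError:
    dangling_reference_rejected = True

print(unknown_column_rejected, dangling_reference_rejected)  # True True
```

The rigidity is deliberate: because every row has the same declared shape, the engine can check references and plan queries ahead of time.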

Querying RDBMS: SQL
• selection: σ Name = "James" (Climber) returns the single row (James, Beginner, 21)
• projection: π keeps only the chosen columns of a table, e.g. the Route column of Climbs (both rows contain "Last Tango")
• join: Climber ⋈ Climbs on Climber.Name = Climbs.Name pairs each climber with their climbs:

Name   Skill        Age   Route        Date       Duration
Bob    Experienced  33    Last Tango   10/10/05   5
Bob    Experienced  33    Last Tango   1/10/06    4.5
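The three operations can be tried on the sample tables with sqlite3 in a short sketch. The SQL shown is one natural rendering of each operation, not taken from the talk. Note that SQL keeps duplicate rows unless told otherwise, so DISTINCT is needed to match the set semantics of π.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Climber (Name TEXT PRIMARY KEY, Skill TEXT, Age INTEGER);
    CREATE TABLE Climbs (Name TEXT, Route TEXT, Date TEXT, Duration REAL);
    INSERT INTO Climber VALUES ('James', 'Beginner', 21), ('Bob', 'Experienced', 33);
    INSERT INTO Climbs VALUES ('Bob', 'Last Tango', '10/10/05', 5),
                              ('Bob', 'Last Tango', '1/10/06', 4.5);
""")

# selection: sigma Name = "James" (Climber)
selection = conn.execute("SELECT * FROM Climber WHERE Name = 'James'").fetchall()

# projection: pi Route (Climbs); DISTINCT gives relational (set) semantics
projection = conn.execute("SELECT DISTINCT Route FROM Climbs").fetchall()

# join: Climber joined with Climbs on Climber.Name = Climbs.Name
join = conn.execute("""
    SELECT Climber.Name, Skill, Age, Route, Date, Duration
    FROM Climber JOIN Climbs ON Climber.Name = Climbs.Name
    ORDER BY Duration DESC""").fetchall()

print(selection)   # [('James', 'Beginner', 21)]
print(projection)  # [('Last Tango',)]
print(join)
```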

The Advantages of RDBMS
• Good data organization
• High efficiency on large datasets via indexing and query optimization
• Concurrency control and reliability
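As a small illustration of indexing and query optimization, SQLite's EXPLAIN QUERY PLAN shows the optimizer switching from a full table scan to an index search once an index exists. The table contents and the index name are hypothetical, and the exact wording of the plan text varies between SQLite versions, so only the keywords SCAN and INDEX are relied on.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Climbs (Name TEXT, Route TEXT, Date TEXT, Duration REAL)")
conn.executemany(
    "INSERT INTO Climbs VALUES (?, ?, ?, ?)",
    [(f"climber{i}", f"route{i % 50}", "1/10/06", 1.0 + i % 7) for i in range(1000)],
)

QUERY = "SELECT * FROM Climbs WHERE Route = 'route7'"

def plan(sql: str) -> str:
    """Return the optimizer's plan; the readable detail is the last column of each row."""
    return " ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

before = plan(QUERY)  # no index yet: every row has to be examined
conn.execute("CREATE INDEX idx_climbs_route ON Climbs(Route)")
after = plan(QUERY)   # the optimizer now searches the index instead

print(before)
print(after)
```

The query text never changes; only the physical design does, and the optimizer picks the cheaper access path on its own.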

But 80% of the World's Data Is Not in RDBMS!
• Examples:
  - WWW, emails
  - Personal data and documents in various formats
  - Sensor data
  - Much scientific data (experimental data, large images, documentation, etc.)
• Why not?
  - Several assumptions made by relational databases do not fit this kind of data.
  - My research addresses how to enhance RDBMS to manage it.

Challenges for RDBMS (I)
• RDBMS assumption: data conforms to a predefined, fixed schema that is kept separate from the data itself.
• Reality:
  - Data may be collected from different sources on the web and therefore have different schemas.
  - The schema of a single source can change over time.
• Requirement: we need to handle data with different schemas and keep each schema tightly associated with its data.

XML as a Data Representation Format
• XML has become a standard data format for various applications because of its:
  - Flexible schemas (semi-structured data)
  - Self-describing nature
  - Natural representation of a tree data model
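A small sketch of the self-describing property, using Python's standard xml.etree.ElementTree: two records whose fields differ can be read without declaring any schema first, because each value carries its own tag. The record contents and the email field are hypothetical.

```python
import xml.etree.ElementTree as ET

# Two hypothetical records for the same kind of entity, from sources with different "schemas".
SOURCE_A = "<climber><name>James</name><skill>Beginner</skill><age>21</age></climber>"
SOURCE_B = ("<climber><name>Bob</name><skill>Experienced</skill>"
            "<email>bob@example.org</email></climber>")

def to_dict(xml_text: str) -> dict[str, str]:
    """Read a flat XML record: the element tags themselves name the fields."""
    root = ET.fromstring(xml_text)
    return {child.tag: child.text for child in root}

records = [to_dict(SOURCE_A), to_dict(SOURCE_B)]
for record in records:
    print(sorted(record))
# ['age', 'name', 'skill']
# ['email', 'name', 'skill']
```

Contrast this with the sqlite3 Climber table, which rejects a row carrying an undeclared column.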

XML: the Standard for Web Data
[Figure: NCBI web services such as GenBank, PubMed, and BLAST act as web service publishers that exchange XML data over the Internet with web service requesters.]

XML: Representing Phylogenetic Trees
[Figure: a phylogenetic tree of Orangutan, Gorilla, Human, and Chimpanzee, from the Tree of Life website, University of Arizona.]
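A sketch of how nested XML elements can hold a phylogenetic tree: each clade element groups its descendants, so "which taxa share the most recent common ancestor" becomes a tree query. The element names clade and taxon are hypothetical, and the nesting assumes the widely accepted great-ape grouping (Human and Chimpanzee closest, then Gorilla, then Orangutan) rather than copying the figure.

```python
import xml.etree.ElementTree as ET

TREE_XML = """
<clade>
  <clade>
    <clade>
      <taxon>Human</taxon>
      <taxon>Chimpanzee</taxon>
    </clade>
    <taxon>Gorilla</taxon>
  </clade>
  <taxon>Orangutan</taxon>
</clade>
"""

def leaves(clade: ET.Element) -> set[str]:
    """All taxa anywhere inside this clade."""
    return {t.text for t in clade.iter("taxon")}

def smallest_common_clade(root: ET.Element, a: str, b: str) -> ET.Element:
    """The most specific clade containing both taxa, i.e. their most recent common ancestor."""
    candidates = [c for c in root.iter("clade") if {a, b} <= leaves(c)]
    return min(candidates, key=lambda c: len(leaves(c)))

root = ET.fromstring(TREE_XML.strip())
print(sorted(leaves(smallest_common_clade(root, "Human", "Gorilla"))))
# ['Chimpanzee', 'Gorilla', 'Human']
```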

Challenges for RDBMS (II)
• RDBMS assumption: data is clean and consistent.
• Reality: real-world data is dirty.
  - Data collected from different sources may have missing and conflicting information.
  - Data obtained from data mining is often error-prone.
  - Experimental data often contains random errors.
• Requirement: we need to measure data quality and handle imprecise and/or incomplete data.
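One simple starting point for measuring the quality of data merged from two sources is to list, field by field, which values are missing and which conflict, and to report how complete the combined data is. The sketch below is a hypothetical illustration of that requirement, not the technique of [Chen et al 06]; the sample records are invented.

```python
from typing import Optional

Record = dict[str, Optional[str]]

def compare_sources(a: Record, b: Record) -> dict[str, list[str]]:
    """Missing: absent or None in either source. Conflicting: present in both but different."""
    fields = sorted(set(a) | set(b))
    missing = [f for f in fields if a.get(f) is None or b.get(f) is None]
    conflicting = [f for f in fields
                   if a.get(f) is not None and b.get(f) is not None and a[f] != b[f]]
    return {"missing": missing, "conflicting": conflicting}

def completeness(*records: Record) -> float:
    """Fraction of (record, field) slots holding a value, over the union of all fields."""
    fields = set().union(*records)
    filled = sum(r.get(f) is not None for r in records for f in fields)
    return filled / (len(fields) * len(records))

source_1 = {"name": "Bob", "skill": "Experienced", "age": "33"}
source_2 = {"name": "Bob", "skill": None, "age": "34"}
print(compare_sources(source_1, source_2))  # {'missing': ['skill'], 'conflicting': ['age']}
print(completeness(source_1, source_2))     # 5 of 6 slots filled
```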

Roadmap of This Talk
• Managing XML by leveraging mature RDBMS [Chen et al 04]
  - Introduction to XML
  - A generic and efficient XML-to-RDBMS mapping:
    - Data mapping from trees to tables
    - Query translation from tree navigation queries to efficient SQL queries
• Handling imprecise and incomplete data in DBMS [Chen et al 06]

Sample XML Data
<books>
  ...
  <book>
    <title> The lord of the rings... </title>
    ...
    <section>
      <title> Locating middle-earth </title>
      ...
    </section>
    ...
  </book>
</books>
[Figure: the same data as a tree. books has book children; a book has a title ("The lord of the rings …") and a section; that section has a title ("Locating middle-earth") and a nested section, whose title is "A hall fit for a king" and which contains a figure with the description "King Theoden's golden hall".]

Sample XML Queries
• XML query languages are based on navigating the hierarchical structure (e.g. XPath).
• Sample queries on the sample document:
  - What are the titles of all sections? //section/title
  - What are the titles of sections that contain a figure? //section[figure]/title
• In these queries, // is the descendant axis, / is the child axis, and [...] is a predicate.
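Both sample queries can be run with the limited XPath subset supported by Python's xml.etree.ElementTree, on a small reconstruction of the sample document (text values abbreviated from the slides). ElementTree is only a convenient stand-in here: its findall supports child and descendant steps and [tag] predicates, but not full XPath.

```python
import xml.etree.ElementTree as ET

# Reconstruction of the sample document; text values abbreviated from the slides.
DOC = """
<books>
  <book>
    <title>The lord of the rings</title>
    <section>
      <title>Locating middle-earth</title>
      <section>
        <title>A hall fit for a king</title>
        <figure><description>King Theoden's golden hall</description></figure>
      </section>
    </section>
  </book>
</books>
"""

root = ET.fromstring(DOC.strip())  # root is the <books> element

# //section/title: titles that are children of any section
section_titles = [t.text for t in root.findall(".//section/title")]

# //section[figure]/title: the [tag] predicate keeps sections with a figure child
figure_section_titles = [t.text for t in root.findall(".//section[figure]/title")]

print(section_titles)         # ['Locating middle-earth', 'A hall fit for a king']
print(figure_section_titles)  # ['A hall fit for a king']
```

In ElementTree the leading "." makes each path relative to the root element, so .//section means any section below books.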

How to Query XML Data Efficiently?
• RDBMS have achieved high performance in query evaluation.
• Can we leverage RDBMS by encoding XML as tables?

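Analogy: Fourier Transforms
• Computing a convolution directly is complex: $(g * h)(t) = \int_{-\infty}^{+\infty} g(u)\,h(t-u)\,du$
• After a Fourier transform it becomes an efficient pointwise product: $\mathcal{F}\{g * h\}(f) = G(f)\,H(f)$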
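The analogy rests on the convolution theorem. The sketch below computes a discrete convolution both directly (one multiplication per pair of terms) and by transforming, multiplying G(f)H(f) pointwise, and transforming back with a textbook radix-2 FFT; the two agree up to rounding. The fft helper is a plain recursive implementation written for illustration, not a library routine.

```python
import cmath

def fft(a: list[complex], invert: bool = False) -> list[complex]:
    """Recursive radix-2 Cooley-Tukey transform; len(a) must be a power of two.
    With invert=True it computes the unnormalized inverse (the caller divides by n)."""
    n = len(a)
    if n == 1:
        return list(a)
    even, odd = fft(a[0::2], invert), fft(a[1::2], invert)
    sign = 1 if invert else -1
    out = [0j] * n
    for k in range(n // 2):
        twiddled = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddled
        out[k + n // 2] = even[k] - twiddled
    return out

def convolve_direct(g: list[float], h: list[float]) -> list[float]:
    """(g * h)[t] = sum over u of g[u] h[t - u], evaluated term by term."""
    out = [0.0] * (len(g) + len(h) - 1)
    for i, gi in enumerate(g):
        for j, hj in enumerate(h):
            out[i + j] += gi * hj
    return out

def convolve_fft(g: list[float], h: list[float]) -> list[float]:
    """Transform, multiply pointwise, transform back."""
    m = len(g) + len(h) - 1
    n = 1
    while n < m:  # zero-pad so the circular convolution equals the linear one
        n *= 2
    G = fft([complex(x) for x in g] + [0j] * (n - len(g)))
    H = fft([complex(x) for x in h] + [0j] * (n - len(h)))
    back = fft([x * y for x, y in zip(G, H)], invert=True)
    return [(v / n).real for v in back[:m]]

g, h = [1.0, 2.0, 3.0], [0.0, 1.0, 0.5]
print(convolve_direct(g, h))  # [0.0, 1.0, 2.5, 4.0, 1.5]
print(convolve_fft(g, h))     # same values up to floating-point rounding
```

The talk's point maps onto this directly: move the problem into a domain (tables) where a mature, efficient machine (the RDBMS) does the work, then map the answer back.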

Mapping XML Data to RDBMS
• Challenge: how to build the bridge between hierarchies and tables?
[Figure: Query Translation turns XPath queries into SQL, and Storage Mapping stores XML data in relational databases; the answers are XML fragments.]

Data Mapping
• Store each XML node as a row of a table T(ID, Tag, Value, Structural Information).
• One choice of structural information is the parent ID [Florescu & Kossmann 99].
• Alternatively, design special labels to encode node relationships.

T
ID    Tag       Value          Structural Information (Parent ID)
1     books
2     book                     1
3     title     The...         2
4     section                  2
5     title     Locating...    4
...   ...       ...            ...

(The sample tree's nodes (1)-(5) are numbered in document order: books, book, the book's title, its section, and the section's title.)
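A minimal sketch of the parent-ID mapping, assuming a table T(ID, Tag, Value, ParentID) filled by a preorder walk so that IDs follow the slide's document-order numbering. A child step such as //section/title becomes one self-join, while a descendant step such as //book//description needs a recursive query over ParentID, which is one motivation for labels that encode ancestor-descendant relationships directly. The shred function and the column names are hypothetical.

```python
import sqlite3
import xml.etree.ElementTree as ET

DOC = ("<books><book><title>The lord of the rings</title><section>"
       "<title>Locating middle-earth</title><section><title>A hall fit for a king</title>"
       "<figure><description>King Theoden's golden hall</description></figure>"
       "</section></section></book></books>")

def shred(root: ET.Element) -> list[tuple]:
    """Preorder walk: one (ID, Tag, Value, ParentID) row per element."""
    rows: list[tuple] = []
    def visit(elem: ET.Element, parent_id) -> None:
        node_id = len(rows) + 1
        rows.append((node_id, elem.tag, (elem.text or "").strip() or None, parent_id))
        for child in elem:
            visit(child, node_id)
    visit(root, None)
    return rows

rows = shred(ET.fromstring(DOC))
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE T (ID INTEGER PRIMARY KEY, Tag TEXT, Value TEXT, ParentID INTEGER)")
conn.executemany("INSERT INTO T VALUES (?, ?, ?, ?)", rows)

# //section/title: one join per child step.
child_step = conn.execute("""
    SELECT c.Value FROM T AS p JOIN T AS c ON c.ParentID = p.ID
    WHERE p.Tag = 'section' AND c.Tag = 'title' ORDER BY c.ID""").fetchall()

# //book//description: the descendant step needs recursion over ParentID.
descendant_step = conn.execute("""
    WITH RECURSIVE subtree(ID) AS (
        SELECT ID FROM T WHERE Tag = 'book'
        UNION
        SELECT T.ID FROM T JOIN subtree ON T.ParentID = subtree.ID)
    SELECT T.Value FROM T JOIN subtree ON T.ID = subtree.ID
    WHERE T.Tag = 'description'""").fetchall()

print(rows[:5])
print(child_step, descendant_step)
```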

Query Translator Architecture
[Figure: XPath → XPath decomposition → sub-queries → sub-query translation → SQL sub-queries → SQL composition → SQL]
• How should we choose XPath subqueries such that:
  - they can be easily translated into SQL subqueries, and
  - the SQL subqueries can be efficiently evaluated?
• How do we combine the SQL subqueries into a complete query?

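Query Translator
• Example query Q: //book[.//figure]/section/title (titles of sections that are children of a book containing a figure somewhere below it)
[Figure: the query tree of Q, with nodes book, section, title, and figure, fed to the query translator.]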

Query Translator: (I) Decomposition into Suffix Paths
[Figure: the query tree of Q = //book[.//figure]/section/title (nodes book, section, title, figure) decomposed into suffix paths.]
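A toy decomposition for a restricted XPath subset: descendant steps (//) cut a path into child-only suffix paths, and a single-level predicate of the form [.//…] is split off and remembered together with the step it hangs on, so it can later be connected by an ancestor-descendant join. The function names, the regular expression, and the supported subset are assumptions for illustration; the translator in [Chen et al 04] handles general queries and may cut them differently.

```python
import re

# A step carrying a predicate such as book[.//figure]; nested brackets are not supported.
PREDICATE = re.compile(r"([\w-]+)\[\.(//[^\[\]]+)\]")

def split_at_descendant_axes(path: str) -> list[str]:
    """Cut a predicate-free path that starts with // into child-only suffix paths."""
    if not path.startswith("//"):
        raise ValueError("expected a path starting with //")
    return ["//" + piece for piece in path[2:].split("//")]

def decompose(xpath: str) -> tuple[list[str], list[tuple[str, list[str]]]]:
    """Return the main path's suffix paths and, per predicate, its anchor step and suffix paths."""
    predicates = [(step, split_at_descendant_axes(pred))
                  for step, pred in PREDICATE.findall(xpath)]
    main_path = PREDICATE.sub(r"\1", xpath)  # drop predicates, keep the step names
    return split_at_descendant_axes(main_path), predicates

print(decompose("//book[.//figure]/section/title"))
# (['//book/section/title'], [('book', ['//figure'])])
```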

Encoding Suffix Paths Using P-labeling
• Each suffix path query corresponds to a range of P-labels; //book/section/title corresponds to [342000, 343000).
• Evaluating a suffix path therefore becomes an SQL selection on P-labels: σ 342000 ≤ Plabel < 343000 (T)
• In the sample tree, only node 5 (the section's title, P-label 342100) falls in this range.

T
id    Plabel
1     100000
2     210000
3     321000
4     421000
5     342100
...   ...
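Reading the table above, each node's P-label appears to list one digit per tag from the node up to the root (title = 3, section = 4, book = 2, books = 1), padded with zeros to six digits; //book/section/title read backwards is 3-4-2, hence the range starting at 342000. The sketch below reproduces this decimal scheme under that reading. The codes for figure and description are assumptions, and the published P-labeling may use a different base and width; the point is only that a suffix path becomes one range predicate.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Tag codes read off the slide's table; figure and description are assumed.
TAG_CODE = {"books": 1, "book": 2, "title": 3, "section": 4, "figure": 5, "description": 6}
WIDTH = 6  # digits per label; must be at least the document depth (6 here)

DOC = ("<books><book><title>The lord of the rings</title><section>"
       "<title>Locating middle-earth</title><section><title>A hall fit for a king</title>"
       "<figure><description>King Theoden's golden hall</description></figure>"
       "</section></section></book></books>")

def suffix_range(suffix_path: list[str]) -> tuple[int, int]:
    """Half-open range [lo, hi) of P-labels for nodes whose root path ends with suffix_path."""
    digits = "".join(str(TAG_CODE[t]) for t in reversed(suffix_path))
    scale = 10 ** (WIDTH - len(digits))
    return int(digits) * scale, (int(digits) + 1) * scale

def label_nodes(root: ET.Element) -> list[tuple]:
    """Rows (ID, Tag, Value, Plabel) in document order."""
    rows: list[tuple] = []
    def visit(elem: ET.Element, path: list[str]) -> None:
        plabel = suffix_range(path)[0]  # a node's P-label is the low end of its full-path range
        rows.append((len(rows) + 1, elem.tag, (elem.text or "").strip() or None, plabel))
        for child in elem:
            visit(child, path + [child.tag])
    visit(root, [root.tag])
    return rows

rows = label_nodes(ET.fromstring(DOC))
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE T (ID INTEGER PRIMARY KEY, Tag TEXT, Value TEXT, Plabel INTEGER)")
conn.execute("CREATE INDEX idx_plabel ON T(Plabel)")  # lets SQLite answer the range from the index
conn.executemany("INSERT INTO T VALUES (?, ?, ?, ?)", rows)

def evaluate_suffix_path(suffix_path: list[str]) -> list[str]:
    lo, hi = suffix_range(suffix_path)
    return [v for (v,) in conn.execute(
        "SELECT Value FROM T WHERE Plabel >= ? AND Plabel < ? ORDER BY ID", (lo, hi))]

print([r[3] for r in rows[:5]])                            # [100000, 210000, 321000, 421000, 342100]
print(suffix_range(["book", "section", "title"]))          # (342000, 343000)
print(evaluate_suffix_path(["book", "section", "title"]))  # ['Locating middle-earth']
```

The upper bound is exclusive: a label of exactly 343000 would encode a different path (title/section/title read upward), which must not match.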

Query Translator: (II) Selection on P-labels
[Figure: the query tree of Q = //book[.//figure]/section/title, with each suffix path evaluated as a selection on P-labels.]

D-labeling Scheme
• D-labeling is used to connect suffix paths.
• D-labels (start, end, depth) can be used to detect ancestor-descendant relationships between nodes in a tree: a is an ancestor of d exactly when a.start < d.start and d.end < a.end, and a is the parent of d if, in addition, d.depth = a.depth + 1.
[Figure: the sample tree with D-labels: books (1, 20000, 1), book (6, 1200, 2), the book's title (10, 80, 3) and section (81, 250, 3), the nested section (100, 200, 4), and a depth-5 node below it (120, 160, 5).]
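The sketch below first checks the containment rule on D-labels copied from the slide. It then assigns D-labels to the sample document with a single counter (incremented on entering and leaving each node, so the values are small and consecutive rather than the slide's spaced-out numbers) and puts both labelings to work: Q = //book[.//figure]/section/title becomes three P-label range selections connected by D-label conditions. This is one plausible way to compose the SQL, assuming the decimal P-label reading and tag codes from the P-label table (figure and description codes assumed); it is not necessarily the SQL generated by the translator in [Chen et al 04]. Column names are hypothetical (StartPos and EndPos avoid SQL's END keyword).

```python
import sqlite3
import xml.etree.ElementTree as ET

TAG_CODE = {"books": 1, "book": 2, "title": 3, "section": 4, "figure": 5, "description": 6}
WIDTH = 6  # P-label digits; at least the document depth

DOC = ("<books><book><title>The lord of the rings</title><section>"
       "<title>Locating middle-earth</title><section><title>A hall fit for a king</title>"
       "<figure><description>King Theoden's golden hall</description></figure>"
       "</section></section></book></books>")

def is_ancestor(a: tuple, d: tuple) -> bool:
    """D-labels are (start, end, depth): a is a proper ancestor of d iff a's interval encloses d's."""
    return a[0] < d[0] and d[1] < a[1]

def is_parent(a: tuple, d: tuple) -> bool:
    return is_ancestor(a, d) and d[2] == a[2] + 1

# D-labels copied from the slide.
BOOK, BOOK_TITLE, OUTER_SECTION, NESTED_SECTION = (6, 1200, 2), (10, 80, 3), (81, 250, 3), (100, 200, 4)
slide_checks = (is_ancestor(BOOK, NESTED_SECTION), is_parent(BOOK, NESTED_SECTION),
                is_parent(OUTER_SECTION, NESTED_SECTION), is_ancestor(BOOK_TITLE, NESTED_SECTION))

def suffix_range(tags: list[str]) -> tuple[int, int]:
    """Half-open P-label range [lo, hi) for nodes whose root path ends with tags."""
    digits = "".join(str(TAG_CODE[t]) for t in reversed(tags))
    scale = 10 ** (WIDTH - len(digits))
    return int(digits) * scale, (int(digits) + 1) * scale

def label_nodes(root: ET.Element) -> list[tuple]:
    """Rows (ID, Tag, Value, Plabel, StartPos, EndPos, Depth) in document order."""
    rows: list = []
    counter = 0
    def visit(elem: ET.Element, path: list[str], depth: int) -> None:
        nonlocal counter
        counter += 1
        start, slot = counter, len(rows)
        rows.append(None)  # reserve the slot so IDs stay in document order
        for child in elem:
            visit(child, path + [child.tag], depth + 1)
        counter += 1
        rows[slot] = (slot + 1, elem.tag, (elem.text or "").strip() or None,
                      suffix_range(path)[0], start, counter, depth)
    visit(root, [root.tag], 1)
    return rows

rows = label_nodes(ET.fromstring(DOC))
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE T (ID INTEGER PRIMARY KEY, Tag TEXT, Value TEXT, "
             "Plabel INTEGER, StartPos INTEGER, EndPos INTEGER, Depth INTEGER)")
conn.executemany("INSERT INTO T VALUES (?, ?, ?, ?, ?, ?, ?)", rows)

# Q = //book[.//figure]/section/title: three P-label selections joined on D-labels.
Q_SQL = """
SELECT DISTINCT t.Value
FROM T AS t, T AS b, T AS f
WHERE t.Plabel >= ? AND t.Plabel < ?                    -- //book/section/title
  AND b.Plabel >= ? AND b.Plabel < ?                    -- //book
  AND f.Plabel >= ? AND f.Plabel < ?                    -- //figure
  AND b.StartPos < t.StartPos AND t.EndPos < b.EndPos
  AND t.Depth = b.Depth + 2                             -- b is the book two levels above t
  AND b.StartPos < f.StartPos AND f.EndPos < b.EndPos   -- f lies somewhere below b
"""
params = (*suffix_range(["book", "section", "title"]), *suffix_range(["book"]),
          *suffix_range(["figure"]))
answer = [value for (value,) in conn.execute(Q_SQL, params)]

print(slide_checks)  # (True, False, True, False)
print(answer)        # ['Locating middle-earth']
```

Each suffix path is a cheap range selection, and the only structural joins left are the interval comparisons, which is the division of labor the query translator architecture describes.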
