Fast and Scalable Relational Division on Database Systems
André S. Gonzaga, Robson L. F. Cordeiro

1. Introduction 2. Contributions 3. Background a. Genetic Data b. Relational Division 4. Related Work 5. Proposed Algorithms a. Index-Division b. Division Data Generator 6. Experiments a. Synthetic Data b. Case Study 7. Conclusion

INTRODUCTION Relational Division allows simple representations of queries involving the concept of “for all”

INTRODUCTION 1. To select candidates having all the skills for a given job

INTRODUCTION 2. To select the diseases that have all the given symptoms

INTRODUCTION 3. To select the animals that have all the desired genetic conditions

INTRODUCTION 1. Relational Algebra: division has its own operator (÷). 2. RDBMS / SQL: a. There is no explicit operator for division. b. Several implementations are possible in SQL. c. Most of the time, relational division is used indirectly.
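
Since SQL has no division operator, the query has to be phrased with other constructs. The sketch below shows two well-known formulations (double NOT EXISTS, and GROUP BY with HAVING COUNT) on the "candidates having all the skills for a job" example. The table names `candidate_skill` and `job_skill`, their columns, and the sample rows are assumptions made for illustration; these are not necessarily the implementations evaluated by the authors.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE candidate_skill (candidate TEXT, skill TEXT);  -- dividend
    CREATE TABLE job_skill (skill TEXT);                        -- divisor
    INSERT INTO candidate_skill VALUES
        ('ana', 'sql'), ('ana', 'python'), ('ana', 'java'),
        ('bob', 'sql'),
        ('eva', 'python'), ('eva', 'sql');
    INSERT INTO job_skill VALUES ('sql'), ('python');
""")

# Formulation 1: double negation. Keep a candidate when there is
# no required skill that the candidate lacks.
NOT_EXISTS_QUERY = """
    SELECT DISTINCT c.candidate
    FROM candidate_skill AS c
    WHERE NOT EXISTS (
        SELECT 1 FROM job_skill AS j
        WHERE NOT EXISTS (
            SELECT 1 FROM candidate_skill AS c2
            WHERE c2.candidate = c.candidate AND c2.skill = j.skill))
    ORDER BY c.candidate
"""

# Formulation 2: counting. Keep a candidate when the number of required
# skills they hold equals the number of required skills.
COUNT_QUERY = """
    SELECT c.candidate
    FROM candidate_skill AS c
    JOIN job_skill AS j ON j.skill = c.skill
    GROUP BY c.candidate
    HAVING COUNT(DISTINCT c.skill) =
           (SELECT COUNT(DISTINCT skill) FROM job_skill)
    ORDER BY c.candidate
"""

def run_division(query):
    """Run one division query and return the candidate names as a list."""
    return [row[0] for row in conn.execute(query)]

not_exists_result = run_division(NOT_EXISTS_QUERY)
count_result = run_division(COUNT_QUERY)
print(not_exists_result, count_result)
```

Both queries return the candidates 'ana' and 'eva' on this data. They differ when the divisor is empty: the NOT EXISTS form returns every candidate, while the counting form returns none because the join produces no rows.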

CONTRIBUTIONS 1. Evaluate the division implementations in RDBMSs under different use cases. 2. Investigate which aspects of the data affect the execution time of each implementation. 3. Propose a new algorithm to solve relational division queries. 4. Perform a case study that selects genetic data using relational division.

BACKGROUND | Genetic Data SNP - Single Nucleotide Polymorphism ● Variations in the genome among individuals in which the least frequent allele has a frequency of 1% or greater. ● Some SNPs are reported to be strongly related to diseases or to the development of specific traits of the individual. ● SNPs represent about 90% of all genetic variation among individuals.

BACKGROUND | Genetic Data SNP - Single Nucleotide Polymorphism Codified as: SNP: position along the chromosome. Alleles: 11, 12, 21, 22. Together these form the genetic data of the individual.

BACKGROUND | Relational Division ● It is the only direct algebraic counterpart of the universal quantifier ( ∀ ) from the Relational Calculus. ● Division is a derived operator: it can be expressed using other relational algebra operators.
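
As a derived operator, division can be written with projection, Cartesian product, and set difference: R1 ÷ R2 = π_A(R1) − π_A((π_A(R1) × R2) − R1). The following is a minimal sketch of that identity over Python sets; the function name `divide` and the representation of the dividend as (individual, attribute) pairs are assumptions for illustration.

```python
def divide(dividend, divisor):
    """Relational division built only from projection, product and difference.

    dividend: set of (a, b) pairs; divisor: set of b values.
    Returns the set of a values that are paired with every b in divisor.
    """
    candidates = {a for a, _ in dividend}                      # pi_A(R1)
    all_pairs = {(a, b) for a in candidates for b in divisor}  # pi_A(R1) x R2
    missing = all_pairs - dividend          # required pairs that do not occur
    disqualified = {a for a, _ in missing}  # individuals lacking some b
    return candidates - disqualified

dividend = {(1, "x"), (1, "y"), (2, "x"), (3, "x"), (3, "y"), (3, "z")}
divisor = {"x", "y"}
quotient = divide(dividend, divisor)
print(sorted(quotient))
```

Here individuals 1 and 3 hold both "x" and "y", so they form the quotient; individual 2 lacks "y" and is removed by the difference step. The intermediate product grows with |π_A(R1)| × |R2|, which is one reason the choice of implementation matters for large relations.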

BACKGROUND | Relational Division [Figure: example relations labeled DIVIDEND, DIVISOR, and QUOTIENT]

RELATED WORK

PROPOSED ALGORITHMS ● We developed a new algorithm for the division operation

PROPOSED ALGORITHMS | Index-Division Valid groups: {1, 2, 3}


PROPOSED ALGORITHMS | Data Generator 1. Cardinality: the number of tuples in the dividend relation R1 and in the divisor relation R2. 2. Number of individuals: the number of groups of tuples representing the individuals to be evaluated in the operation. 3. Correlation: the percentage of all individuals that satisfy every requirement in R2 and are therefore part of the result. 4. Variability: the differences in size between individuals, which adjusts the number of tuples in each group.
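
To make the four parameters concrete, here is a minimal sketch of a synthetic generator driven by them. It is not the authors' Division Data Generator: the function name `generate_division_data`, the integer attribute ids, the uniform size distribution, and the use of filler attributes outside the divisor are all assumptions for illustration. It also assumes `r2_size` is at least 1, and it only approximates the requested R1 cardinality when variability is above zero.

```python
import random

def generate_division_data(r1_size, r2_size, n_individuals,
                           correlation, variability, seed=0):
    """Return (dividend, divisor) for a synthetic division workload.

    correlation and variability are fractions in [0, 1].
    """
    rng = random.Random(seed)
    divisor = list(range(r2_size))            # R2: attribute ids 0..r2_size-1
    n_matching = round(correlation * n_individuals)
    avg_size = max(r1_size // n_individuals, 1)
    spread = int(avg_size * variability)      # max deviation from avg_size
    dividend = []
    for individual in range(n_individuals):
        size = max(avg_size + rng.randint(-spread, spread), 1)
        if individual < n_matching:
            # Matching individual: holds every divisor attribute.
            required = list(divisor)
            size = max(size, r2_size)
        else:
            # Non-matching individual: holds a strict subset of the divisor.
            keep = rng.randint(0, min(size, r2_size - 1))
            required = rng.sample(divisor, keep)
        # Filler attributes use ids >= r2_size, so they never match R2.
        filler = range(r2_size, r2_size + size - len(required))
        dividend.extend((individual, attr)
                        for attr in list(required) + list(filler))
    return dividend, divisor

dividend, divisor = generate_division_data(
    r1_size=1000, r2_size=5, n_individuals=100,
    correlation=0.3, variability=0.5)

# Check the correlation by computing the quotient directly.
groups = {}
for individual, attr in dividend:
    groups.setdefault(individual, set()).add(attr)
quotient = [i for i, attrs in groups.items() if set(divisor) <= attrs]
print(len(quotient), "of", len(groups), "individuals are in the result")
```

With a correlation of 0.3 and 100 individuals, 30 individuals end up in the quotient. Setting variability to 0 makes every group the same size, so the dividend then has the requested cardinality whenever `r1_size` is a multiple of `n_individuals` and the average group size is at least `r2_size`.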

EXPERIMENTS 1. Synthetic data: R1: [100,000; 1,000,000] R2: [1; 1,000] Correlation: 0% to 100% Variability: 0% to 100% 2. Genetic data: 4,100 animals, 10,000 SNPs, > 40,000,000 tuples! http://qtl-mas-2012.kassiopeagroup.com/en/index.php

EXPERIMENTS

CONCLUSION We believe that implementing Index-Division inside the core of the DBMS could achieve the best performance on relational division queries.

Thanks for your attention!
