Fast and Scalable Relational Division on Database Systems
André S. Gonzaga, Robson L. F. Cordeiro
1. Introduction
2. Contributions
3. Background
   a. Genetic Data
   b. Relational Division
4. Related Work
5. Proposed Algorithms
   a. Index-Division
   b. Division Data Generator
6. Experiments
   a. Synthetic Data
   b. Case Study
7. Conclusion
INTRODUCTION
Relational Division allows simple representations of queries involving the concept of “for all”.
1. To select the candidates having all the skills for a given job
2. To select the diseases that have all the given symptoms
3. To select the animals that have all the desired genetic conditions
How the operation is expressed:
1. Relational Algebra: has an explicit division operator (÷).
2. RDBMS / SQL:
   a. Does not have an explicit operator for it.
   b. There are several possible implementations in SQL.
   c. Most of the time, relational division is used indirectly.
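Because SQL has no division operator, the query has to be written indirectly. The sketch below shows two commonly used formulations, a double NOT EXISTS and a GROUP BY with COUNT, run through Python's built-in sqlite3 module. The tables `candidate_skill` and `job_skill` and their contents are hypothetical, chosen only to mirror the candidates-and-skills example; they do not come from the slides.

```python
import sqlite3

def build_db():
    """Create an in-memory database with a dividend and a divisor table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE candidate_skill (candidate TEXT, skill TEXT);  -- dividend
        CREATE TABLE job_skill (skill TEXT);                        -- divisor
        INSERT INTO candidate_skill VALUES
            ('ana', 'sql'), ('ana', 'python'), ('ana', 'r'),
            ('bob', 'sql'),
            ('carla', 'python'), ('carla', 'sql');
        INSERT INTO job_skill VALUES ('sql'), ('python');
    """)
    return conn

# "There is no required skill that the candidate lacks."
DOUBLE_NOT_EXISTS = """
    SELECT DISTINCT c.candidate
    FROM candidate_skill AS c
    WHERE NOT EXISTS (
        SELECT 1 FROM job_skill AS j
        WHERE NOT EXISTS (
            SELECT 1 FROM candidate_skill AS c2
            WHERE c2.candidate = c.candidate AND c2.skill = j.skill))
    ORDER BY c.candidate
"""

# "The candidate matches as many required skills as the job lists."
GROUP_BY_COUNT = """
    SELECT c.candidate
    FROM candidate_skill AS c
    JOIN job_skill AS j ON j.skill = c.skill
    GROUP BY c.candidate
    HAVING COUNT(DISTINCT c.skill) =
           (SELECT COUNT(DISTINCT skill) FROM job_skill)
    ORDER BY c.candidate
"""

def divide(conn, query):
    """Run one of the division queries and return the quotient as a list."""
    return [row[0] for row in conn.execute(query)]

conn = build_db()
print(divide(conn, DOUBLE_NOT_EXISTS))  # ['ana', 'carla']
print(divide(conn, GROUP_BY_COUNT))     # ['ana', 'carla']
```

The two formulations agree on this data but differ when the divisor is empty: the double NOT EXISTS returns every candidate, while the join in the counting version returns none.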
CONTRIBUTIONS
1. Evaluate the division implementations in RDBMSs under different use cases.
2. Investigate which aspects of the data affect the execution time of each implementation.
3. Propose a new algorithm to solve relational division queries.
4. Perform a case study that selects genetic data using relational division.
BACKGROUND | Genetic Data
SNP - Single Nucleotide Polymorphism
- Variations in the genome among individuals in which the least frequent allele has an abundance of 1% or greater.
- Some SNPs are reported to be highly related to diseases or to the development of specific traits of the individual.
- SNPs represent about 90% of all genetic variations of the individuals.
SNP codified as:
- Position along the chromosome
- Alleles: 11, 12, 21, 22
[Figure: genetic data of the individual]
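One way to picture this codification is as one tuple per individual and SNP. The sketch below assumes a hypothetical layout `(individual_id, position, alleles)`; the field names, the `encode` helper, and the sample values are illustrative and are not taken from the slides.

```python
from typing import NamedTuple

# The four allele codes listed on the slide.
VALID_ALLELES = {"11", "12", "21", "22"}

class SnpTuple(NamedTuple):
    individual_id: int
    position: int   # position along the chromosome
    alleles: str    # one of "11", "12", "21", "22"

def encode(individual_id, genotype):
    """Turn {position: alleles} into one tuple per SNP, ordered by position."""
    rows = []
    for position, alleles in sorted(genotype.items()):
        if alleles not in VALID_ALLELES:
            raise ValueError(f"unexpected allele code: {alleles!r}")
        rows.append(SnpTuple(individual_id, position, alleles))
    return rows

rows = encode(7, {1200: "12", 350: "11"})
for row in rows:
    print(row)
```

Stored this way, the genetic data of all individuals forms a single relation with one group of tuples per individual.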
BACKGROUND | Relational Division
- It is the only algebraic operator that corresponds directly to the Universal Quantification (∀) of the Relational Calculus.
- The division operation is a derived operator.
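Being a derived operator means that division can be rewritten using only the basic operators of the algebra. One standard formulation, assuming a dividend $R_1(A, B)$ and a divisor $R_2(B)$ (the attribute names $A$ and $B$ are introduced here only for illustration), is:

```latex
R_1 \div R_2 \;=\; \pi_A(R_1) \;-\; \pi_A\bigl( (\pi_A(R_1) \times R_2) - R_1 \bigr)
```

The inner difference collects the $(A, B)$ pairs that are required but missing from $R_1$. Projecting them on $A$ gives the groups that fail the test, and these are removed from the set of all candidate groups.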
[Figure: worked example of a DIVIDEND relation divided by a DIVISOR relation, producing the QUOTIENT]
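The relationship between dividend, divisor, and quotient can be sketched in a few lines of plain Python, assuming the dividend is a collection of (group, value) pairs and the divisor is a collection of values. The function name `relational_division` and the sample data are illustrative and are not taken from the slides.

```python
from collections import defaultdict

def relational_division(dividend, divisor):
    """Return the groups of `dividend` that are paired with every value in `divisor`."""
    required = set(divisor)
    values_by_group = defaultdict(set)
    for group, value in dividend:
        values_by_group[group].add(value)
    # A group belongs to the quotient when it covers all required values.
    return {g for g, vals in values_by_group.items() if required <= vals}

dividend = {
    (1, "a"), (1, "b"), (1, "c"),
    (2, "a"), (2, "c"),
    (3, "b"), (3, "c"),
}
divisor = {"a", "c"}

quotient = relational_division(dividend, divisor)
print(sorted(quotient))  # [1, 2]
```

Group 3 is excluded because it lacks the value "a". Extra values, such as "b" in group 1, do not affect membership in the quotient.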
RELATED WORK
PROPOSED ALGORITHMS
- We developed a new algorithm for the division operation.
PROPOSED ALGORITHMS | Index-Division
Valid groups: {1, 2, 3}
PROPOSED ALGORITHMS | Data Generator
1. Cardinality: the number of tuples in the dividend relation R1 and in the divisor relation R2;
2. Number of individuals: the number of groups of tuples representing the individuals to be evaluated in the operation;
3. Correlation: the percentage of all individuals that satisfy every requirement in R2 and are therefore part of the result;
4. Variability: the differences in size between individuals, which adjust the number of tuples in each group.
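The four parameters can be read as the inputs of a generator function. The following is only a sketch of how such a generator might be organized, not the authors' implementation: the function `generate_division_data`, its signature, the `extra_per_group` knob, and the way variability is applied are all assumptions made for illustration. For simplicity it takes the number of individuals and the divisor size as inputs and lets the dividend cardinality follow from them.

```python
import random

def generate_division_data(n_individuals, divisor_size, correlation,
                           variability, extra_per_group=5, seed=0):
    """Return (dividend, divisor, expected_quotient) for a division test.

    correlation: fraction in [0, 1] of individuals that hold every divisor value.
    variability: fraction in [0, 1]; 0 keeps the number of extra tuples per
                 individual fixed, larger values let it vary more.
    """
    if divisor_size < 1:
        raise ValueError("divisor_size must be at least 1")
    rng = random.Random(seed)
    divisor = list(range(divisor_size))
    n_valid = round(n_individuals * correlation)
    spread = round(extra_per_group * variability)

    dividend, expected = [], set()
    for ind in range(n_individuals):
        values = set(divisor)
        if ind < n_valid:
            expected.add(ind)
        else:
            # Drop one required value so this individual fails the division.
            values.discard(rng.choice(divisor))
        # Add values outside the divisor; keep at least one so that every
        # individual appears in the dividend.
        n_extra = max(1, extra_per_group + rng.randint(-spread, spread))
        values.update(range(divisor_size, divisor_size + n_extra))
        dividend.extend((ind, v) for v in sorted(values))
    return dividend, divisor, expected

dividend, divisor, expected = generate_division_data(
    n_individuals=10, divisor_size=4, correlation=0.5, variability=1.0)
print(len(dividend), sorted(expected))
```

Returning the expected quotient alongside the relations makes it easy to check any division implementation against the generated data.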
EXPERIMENTS
1. Synthetic data:
   - R1: 100,000 to 1,000,000 tuples
   - R2: 1 to 1,000 tuples
   - Correlation: 0% to 100%
   - Variability: 0% to 100%
2. Genetic data:
   - 4,100 animals
   - 10,000 SNPs
   - > 40,000,000 tuples!
   - http://qtl-mas-2012.kassiopeagroup.com/en/index.php