slide-1
SLIDE 1

Fast and Scalable Relational Division on Database Systems

André S. Gonzaga, Robson L. F. Cordeiro

slide-2
SLIDE 2

1. Introduction
2. Contributions
3. Background
  a. Genetic Data
  b. Relational Division
4. Related Work
5. Proposed Algorithms
  a. Index-Division
  b. Division Data Generator
6. Experiments
  a. Synthetic Data
  b. Case Study
7. Conclusion

slide-3
SLIDE 3

Relational Division allows simple representations of queries involving the concept of “for all”.

INTRODUCTION

slide-4
SLIDE 4
  • 1. To select candidates having all the skills for a given job

INTRODUCTION

slide-5
SLIDE 5
  • 2. To select the diseases that have all the given symptoms

INTRODUCTION

slide-6
SLIDE 6
  • 3. To select the animals that have all the desired genetic conditions

INTRODUCTION

slide-8
SLIDE 8
  • 1. Relational Algebra:
  • 2. RDBMS / SQL:
  • a. There is no explicit operator for it.
  • b. It can be implemented in SQL in several ways.
  • c. Most of the time, relational division is used indirectly.

INTRODUCTION
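
Because SQL has no division operator, the query has to be rewritten with other constructs. A minimal sketch of two well-known formulations follows, run through Python's `sqlite3` module. The tables `candidate_skill` and `job_skill`, their columns, and the sample rows are hypothetical, invented here for illustration; the slides do not say which formulations were evaluated.

```python
import sqlite3

# Hypothetical schema and data, invented for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE candidate_skill (candidate TEXT, skill TEXT);  -- dividend
    CREATE TABLE job_skill (skill TEXT);                        -- divisor
    INSERT INTO candidate_skill VALUES
        ('ana', 'sql'), ('ana', 'python'), ('ana', 'java'),
        ('bob', 'sql'),
        ('eve', 'python'), ('eve', 'sql');
    INSERT INTO job_skill VALUES ('sql'), ('python');
""")

# Formulation 1: double NOT EXISTS ("no required skill is missing").
NOT_EXISTS_SQL = """
    SELECT DISTINCT c.candidate
    FROM candidate_skill AS c
    WHERE NOT EXISTS (
        SELECT 1 FROM job_skill AS j
        WHERE NOT EXISTS (
            SELECT 1 FROM candidate_skill AS c2
            WHERE c2.candidate = c.candidate AND c2.skill = j.skill))
    ORDER BY c.candidate
"""

# Formulation 2: count the matched skills and compare with the divisor size.
# Assumes job_skill has no duplicate rows and is not empty.
COUNT_SQL = """
    SELECT c.candidate
    FROM candidate_skill AS c
    JOIN job_skill AS j ON j.skill = c.skill
    GROUP BY c.candidate
    HAVING COUNT(DISTINCT c.skill) = (SELECT COUNT(*) FROM job_skill)
    ORDER BY c.candidate
"""

by_not_exists = [row[0] for row in conn.execute(NOT_EXISTS_SQL)]
by_count = [row[0] for row in conn.execute(COUNT_SQL)]
print(by_not_exists, by_count)  # ['ana', 'eve'] ['ana', 'eve']
```

The two queries agree on this data but differ on an empty divisor: the `NOT EXISTS` form returns every candidate, while the counting form returns none.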

slide-13
SLIDE 13

1. Evaluate the division implementations in RDBMS under different use cases.
2. Investigate which aspects of the data affect the execution time of each implementation.
3. Propose a new algorithm to answer relational division queries.
4. Perform a case study that selects genetic data using relational division.

CONTRIBUTIONS

slide-18
SLIDE 18

SNP - Single Nucleotide Polymorphism

  • Variations in the genome among individuals, wherein the least frequent allele has an abundance of 1% or greater.
  • Some SNPs are reported to be highly related to diseases or to the development of specific traits of the individual.
  • SNPs represent about 90% of all genetic variations of the individuals.

BACKGROUND | Genetic Data

slide-20
SLIDE 20

SNP - Single Nucleotide Polymorphism

Codified as:
  • Position along the chromosome
  • Alleles: 11, 12, 21, 22

Genetic data of the individual

BACKGROUND | Genetic Data

slide-23
SLIDE 23

BACKGROUND | Relational Division

  • It is the only algebraic operator that corresponds directly to the Universal Quantification (∀) of the Relational Calculus.
  • The division operation is a derived operator.
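
"Derived" means that division can be rewritten using only projection, Cartesian product, and set difference. The standard identity is given below for a dividend R1(A, B) and a divisor R2(B). The attribute names A and B are assumed here for illustration, and R1 and R2 follow the naming the Data Generator slides use for dividend and divisor.

```latex
\[
R_1 \div R_2 \;=\; \pi_A(R_1) \;-\; \pi_A\bigl( (\pi_A(R_1) \times R_2) - R_1 \bigr)
\]
```

The inner difference lists every (a, b) pair that would be required but is absent from R1. Projecting it on A gives the disqualified values, and subtracting those from all candidate values leaves the quotient.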

slide-24
SLIDE 24

BACKGROUND | Relational Division

slide-25
SLIDE 25

BACKGROUND | Relational Division

DIVIDEND

slide-29
SLIDE 29

BACKGROUND | Relational Division

DIVISOR

slide-32
SLIDE 32

BACKGROUND | Relational Division

QUOTIENT
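
The roles of dividend, divisor, and quotient can be illustrated with a minimal in-memory sketch. The helper name `divide` and the sample data (diseases D1 to D3, symptoms s1 to s3) are hypothetical. The code applies the definition directly and is not the Index-Division algorithm presented later.

```python
from collections import defaultdict

def divide(dividend, divisor):
    """Return the quotient: every group whose values cover the whole divisor.

    dividend: iterable of (group, value) pairs
    divisor:  iterable of required values
    """
    required = set(divisor)
    values_by_group = defaultdict(set)
    for group, value in dividend:
        values_by_group[group].add(value)
    # A group qualifies when the required set is a subset of its values.
    return {g for g, vals in values_by_group.items() if required <= vals}

# Hypothetical data: which symptoms each disease presents.
dividend = [
    ("D1", "s1"), ("D1", "s2"), ("D1", "s3"),
    ("D2", "s2"),
    ("D3", "s1"), ("D3", "s2"),
]
divisor = ["s1", "s2"]

quotient = divide(dividend, divisor)
print(sorted(quotient))  # ['D1', 'D3']
```

D2 is excluded because it lacks s1, while D1 qualifies even though it has an extra symptom: division asks for "at least all", not "exactly".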

slide-33
SLIDE 33

RELATED WORK

slide-36
SLIDE 36
  • We developed a new algorithm for the division operation

PROPOSED ALGORITHMS

slide-37
SLIDE 37

PROPOSED ALGORITHMS | Index-Division

Valid groups: {1, 2, 3}

slide-47
SLIDE 47

PROPOSED ALGORITHMS | Data Generator

1. Cardinality: the number of tuples in the dividend relation R1 and in the divisor relation R2.
2. Number of individuals: the number of groups of tuples representing the individuals to be evaluated in the operation.
3. Correlation: the percentage of individuals, out of the total, that satisfy all the requirements in R2 and are therefore part of the result.
4. Variability: the differences in size between individuals, which adjust the number of tuples in each group.
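
To make the four parameters concrete, here is a minimal sketch of a generator driven by them. It is not the authors' Division Data Generator: the function name `generate`, its signature, and the padding strategy are assumptions made for illustration. In this sketch the cardinality of R2 is `divisor_size`, and the cardinality of R1 follows from the number of individuals and the group sizes.

```python
import random

def generate(num_individuals, divisor_size, base_group_size,
             correlation, variability, seed=0):
    """Hypothetical generator: returns (dividend, divisor).

    correlation: fraction of individuals that contain every divisor value.
    variability: maximum relative deviation of a group's size from
                 base_group_size.
    """
    rng = random.Random(seed)
    divisor = list(range(divisor_size))      # required values 0..divisor_size-1
    num_valid = round(correlation * num_individuals)
    spread = int(base_group_size * variability)
    dividend = []
    for individual in range(num_individuals):
        # Group size varies within +/- spread around the base size.
        size = base_group_size + rng.randint(-spread, spread)
        if individual < num_valid:
            values = set(divisor)            # satisfies all requirements
        else:
            values = set(divisor[1:])        # misses divisor value 0 on purpose
        # Pad with values outside the divisor until the target size is reached.
        filler = divisor_size
        while len(values) < size:
            values.add(filler)
            filler += 1
        dividend.extend((individual, v) for v in sorted(values))
    return dividend, divisor

dividend, divisor = generate(num_individuals=8, divisor_size=3,
                             base_group_size=5, correlation=0.25,
                             variability=0.4, seed=1)
print(len(dividend), divisor)
```

With `correlation=0.25` and 8 individuals, exactly 2 groups contain the whole divisor, so the expected quotient size is known before any division query runs.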

slide-50
SLIDE 50

EXPERIMENTS

1. Synthetic data:
  • R1: 100,000 to 1,000,000
  • R2: 1 to 1,000
  • Correlation: 0% to 100%
  • Variability: 0% to 100%
2. Genetic data:
  • 4,100 animals
  • 10,000 SNPs
  • > 40,000,000 tuples!

http://qtl-mas-2012.kassiopeagroup.com/en/index.php
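
As a rough check of the genetic data size, and a sketch of how the case study maps onto division: assuming the dividend stores one tuple per animal and SNP, 4,100 animals times 10,000 SNPs gives 41,000,000 tuples, consistent with the figure above. The tuple layout, the SNP positions, and the animal identifiers below are hypothetical; only the allele codes 11, 12, 21, 22 come from the slides.

```python
# Assumption: one (animal, snp_position, allele) tuple per animal and SNP.
ANIMALS, SNPS = 4_100, 10_000
total_tuples = ANIMALS * SNPS
print(f"{total_tuples:,}")  # 41,000,000

# Tiny hypothetical sample of the dividend.
genotypes = [
    ("animal_1", 101, 11), ("animal_1", 205, 12), ("animal_1", 309, 22),
    ("animal_2", 101, 11), ("animal_2", 205, 21), ("animal_2", 309, 22),
    ("animal_3", 101, 11), ("animal_3", 205, 12), ("animal_3", 309, 21),
]
# Divisor: the desired genetic conditions, as (snp_position, allele) pairs.
desired = {(101, 11), (205, 12)}

conditions_by_animal = {}
for animal, snp, allele in genotypes:
    conditions_by_animal.setdefault(animal, set()).add((snp, allele))

# An animal is selected when it satisfies every desired condition.
selected = sorted(a for a, conds in conditions_by_animal.items()
                  if desired <= conds)
print(selected)  # ['animal_1', 'animal_3']
```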

slide-51
SLIDE 51

EXPERIMENTS

slide-53
SLIDE 53

CONCLUSION

We consider that implementing Index-Division inside the core of the DBMS could achieve the best performance on relational division queries.

slide-54
SLIDE 54

Thanks for your attention!
