Systems Infrastructure for Data Science Web Science Group Uni Freiburg WS 2012/13

Lecture VIII: Fragmentation

Fragmentation
• Fragments should be subsets of database relations, for two main reasons:
  – Access locality: Application views are subsets of relations. Also, multiple views that access a relation may reside at different sites.
  – Query concurrency and system throughput: Subqueries can operate on fragments in parallel.
• Main issues:
  – Views that cannot be defined on a single fragment require extra processing and communication cost.
  – Semantic data control (e.g., integrity checking) of dependent fragments residing at different sites is more complicated and costly.

Fragmentation Alternatives
• Horizontal fragmentation
  – Primary horizontal fragmentation
  – Derived horizontal fragmentation
• Vertical fragmentation
• Hybrid fragmentation

Example Database

Horizontal Fragmentation Example
• Projects with BUDGET < $200,000
• Projects with BUDGET ≥ $200,000

Vertical Fragmentation Example
• Project budgets
• Project names and locations

Hybrid Fragmentation Example
• Horizontal:
  – Projects with BUDGET < $200,000
  – Projects with BUDGET ≥ $200,000
• Vertical:
  – Project budgets
  – Project names and locations
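The three alternatives can be sketched in a few lines of Python. This is only an illustration: the PROJ rows below are hypothetical sample data (the example database figure is not part of this transcript), and `select` and `project` are helper names introduced here.

```python
# Hypothetical PROJ rows; the slide's example database is not in the transcript.
PROJ = [
    {"PNO": "P1", "PNAME": "Instrumentation", "BUDGET": 150000, "LOC": "Montreal"},
    {"PNO": "P2", "PNAME": "Database Develop.", "BUDGET": 135000, "LOC": "New York"},
    {"PNO": "P3", "PNAME": "CAD/CAM", "BUDGET": 250000, "LOC": "New York"},
    {"PNO": "P4", "PNAME": "Maintenance", "BUDGET": 310000, "LOC": "Paris"},
]

def select(rows, pred):
    """Horizontal fragmentation: keep the tuples that satisfy pred."""
    return [r for r in rows if pred(r)]

def project(rows, attrs):
    """Vertical fragmentation: keep only the listed attributes."""
    return [{a: r[a] for a in attrs} for r in rows]

# Horizontal: split the tuples on BUDGET.
proj_low = select(PROJ, lambda r: r["BUDGET"] < 200000)
proj_high = select(PROJ, lambda r: r["BUDGET"] >= 200000)

# Vertical: the key PNO is kept in both fragments so they can be re-joined.
proj_budgets = project(PROJ, ["PNO", "BUDGET"])
proj_names = project(PROJ, ["PNO", "PNAME", "LOC"])

# Hybrid: vertical fragmentation applied to a horizontal fragment.
proj_low_budgets = project(proj_low, ["PNO", "BUDGET"])
proj_low_names = project(proj_low, ["PNO", "PNAME", "LOC"])

print([r["PNO"] for r in proj_low])   # ['P1', 'P2']
print([r["PNO"] for r in proj_high])  # ['P3', 'P4']
print(proj_low_budgets)
```

Keeping the primary key in every vertical fragment is a design choice of this sketch; it is what makes a join-based reconstruction possible.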

Correctness of Fragmentation
• Completeness
  – Decomposition of relation R into fragments R_1, R_2, ..., R_n is complete iff each data item in R can also be found in one or more of the R_i's.
• Reconstruction
  – If a relation R is decomposed into fragments R_1, R_2, ..., R_n, then there should exist a relational operator θ such that R = θ_{1≤i≤n} R_i.
• Disjointness
  – If a relation R is horizontally (vertically) decomposed into fragments R_1, R_2, ..., R_n, and data item d_i (non-primary key attribute d_i) is in R_j, then d_i should not be in any other fragment R_k (k ≠ j).
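For horizontal fragments, the three rules can be checked mechanically. The following is a minimal sketch that assumes tuples are hashable and that the reconstruction operator θ is union; the relation R and both fragmentations are hypothetical.

```python
def is_complete(relation, fragments):
    """Every tuple of the relation appears in at least one fragment."""
    return all(any(t in f for f in fragments) for t in relation)

def reconstructs_by_union(relation, fragments):
    """Reconstruction for horizontal fragments: the union gives back R."""
    return set().union(*fragments) == set(relation)

def is_disjoint(fragments):
    """No tuple appears in more than one fragment."""
    seen = set()
    for f in fragments:
        if seen & set(f):
            return False
        seen |= set(f)
    return True

# Hypothetical relation of (PNO, BUDGET) tuples.
R = [("P1", 150000), ("P2", 135000), ("P3", 250000), ("P4", 310000)]

# Mutually exclusive predicates give a correct fragmentation.
good = [[t for t in R if t[1] < 200000], [t for t in R if t[1] >= 200000]]
# Overlapping predicates put P3 into both fragments.
bad = [[t for t in R if t[1] <= 250000], [t for t in R if t[1] >= 250000]]

print(is_complete(R, good), reconstructs_by_union(R, good), is_disjoint(good))  # True True True
print(is_complete(R, bad), reconstructs_by_union(R, bad), is_disjoint(bad))     # True True False
```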

Horizontal Fragmentation Algorithms: What is given?
• Relationships among database relations
  – L_i: a one-to-many relationship from an “owner” to a “member”

Horizontal Fragmentation Algorithms: What is given?
• Cardinality of each database relation
• Most frequently used predicates in user queries
• Predicate selectivities
• Access frequencies for data

Horizontal Fragmentation Algorithms: Predicates
• Simple predicate
  – Given R(A_1, A_2, ..., A_n), a simple predicate p_j is defined as “p_j: A_i θ value”, where θ ∈ {=, <, ≤, >, ≥, ≠} and value ∈ D_i, where D_i is the domain of A_i.
  – Examples:
    PNAME = “Maintenance”
    BUDGET ≤ 200000
• Minterm predicate
  – A conjunction of simple and negated simple predicates
  – Examples:
    PNAME = “Maintenance” AND BUDGET ≤ 200000
    NOT(PNAME = “Maintenance”) AND BUDGET ≤ 200000
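One possible way to represent these predicates in code is as functions over a tuple. In the sketch below, `simple` and `minterm` are names introduced here for illustration, and the two rows are hypothetical.

```python
import operator

# The comparison operators theta allowed in a simple predicate.
OPS = {"=": operator.eq, "<": operator.lt, "<=": operator.le,
       ">": operator.gt, ">=": operator.ge, "!=": operator.ne}

def simple(attr, op, value):
    """Build the simple predicate 'attr op value' as a function of a tuple."""
    return lambda row: OPS[op](row[attr], value)

def minterm(literals):
    """AND together (predicate, negated) pairs; negated=True means NOT(predicate)."""
    return lambda row: all(p(row) != negated for p, negated in literals)

p1 = simple("PNAME", "=", "Maintenance")
p2 = simple("BUDGET", "<=", 200000)

m_a = minterm([(p1, False), (p2, False)])  # PNAME = "Maintenance" AND BUDGET <= 200000
m_b = minterm([(p1, True), (p2, False)])   # NOT(PNAME = "Maintenance") AND BUDGET <= 200000

# Hypothetical tuples.
row1 = {"PNAME": "Maintenance", "BUDGET": 310000}
row2 = {"PNAME": "CAD/CAM", "BUDGET": 150000}

print(m_a(row1), m_b(row1))  # False False
print(m_a(row2), m_b(row2))  # False True
```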

Primary Horizontal Fragmentation: Definition
• Given an owner relation R, its horizontal fragments are given by
    R_i = σ_{F_i}(R), 1 ≤ i ≤ w
  where F_i is a minterm predicate.
• First step: Determine a set of simple predicates that will form the minterm predicates. This set of simple predicates must have two key properties:
  – completeness
  – minimality

Completeness of Simple Predicates: Definition
• A set of simple predicates P is complete iff any two tuples of the same minterm fragment defined on P have the same probability of being accessed by any application.

Completeness of Simple Predicates: Example
• Set of simple predicates: P = {LOC=“Montreal”, LOC=“New York”, LOC=“Paris”}
• App 1: Find the budgets of projects at each location.
• App 2: Find projects with budgets less than $200000.
• With only the LOC predicates, App 2 accesses the tuples within one fragment differently depending on their budget, so the BUDGET predicates are added:
  P = {LOC=“Montreal”, LOC=“New York”, LOC=“Paris”, BUDGET ≤ 200000, BUDGET > 200000} is complete.

Minimality of Simple Predicates: Definition
• A set of simple predicates P is minimal iff for each predicate p ∈ P:
  – if p influences how fragmentation is performed (i.e., causes a fragment f to be further fragmented into f_i and f_j), then there should be at least one application that accesses f_i and f_j differently.

Minimality of Simple Predicates: Example
• App 1: Find the budgets of projects at each location.
• App 2: Find projects with budgets less than $200000.
• P = {LOC=“Montreal”, LOC=“New York”, LOC=“Paris”, BUDGET ≤ 200000, BUDGET > 200000} is complete and minimal.
• Adding PNAME=“Instrumentation” gives
  P = {LOC=“Montreal”, LOC=“New York”, LOC=“Paris”, BUDGET ≤ 200000, BUDGET > 200000, PNAME=“Instrumentation”},
  which is complete but NOT minimal: neither application accesses the fragments produced by the PNAME predicate differently.

Primary Horizontal Fragmentation: COM_MIN Algorithm Sketch
• Input: a relation R and a set of simple predicates P_r
• Output: a complete and minimal set of simple predicates P_r' for P_r
• Rule 1: A relation or fragment is partitioned into at least two parts which are accessed differently by at least one application.
• Find a p_i ∈ P_r such that p_i partitions R according to Rule 1. Initialize P_r' = {p_i}.
• Iteratively add predicates to P_r' until it is complete.

Primary Horizontal Fragmentation: PHORIZONTAL Algorithm Sketch
• Input: a relation R and a set of simple predicates P_r
• Output: a set of minterm predicates M according to which relation R is to be fragmented
• P_r' ← COM_MIN(R, P_r)
• Determine the set M of minterm predicates
• Determine the set I of implications among p_i ∈ P_r'
• Eliminate the minterms from M that contradict I
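The last three steps can be sketched as follows. This is not the algorithm as given in the lecture: it assumes COM_MIN has already produced P_r', and instead of deriving an explicit implication set I it drops every minterm that no combination of sample domain values can satisfy, which has the same effect for small finite domains. The predicate encoding, the function names and the `DOMAINS` values are all assumptions of this sketch.

```python
import operator
from itertools import product

OPS = {"=": operator.eq, "<=": operator.le, ">": operator.gt}

def holds(pred, row):
    """Evaluate a simple predicate (attr, op, value) on a tuple."""
    attr, op, value = pred
    return OPS[op](row[attr], value)

def minterms(preds):
    """All 2^n sign assignments; True keeps the predicate, False negates it."""
    return [tuple(zip(preds, signs))
            for signs in product([True, False], repeat=len(preds))]

def satisfiable(minterm, domains):
    """True if some combination of sample domain values satisfies every literal."""
    attrs = sorted(domains)
    for values in product(*(domains[a] for a in attrs)):
        row = dict(zip(attrs, values))
        if all(holds(p, row) == sign for p, sign in minterm):
            return True
    return False

def eliminate(preds, domains):
    """Keep only the minterms that are not contradictory."""
    return [m for m in minterms(preds) if satisfiable(m, domains)]

# P_r' as in the PROJ example; assumed to be complete and minimal already.
P = [("LOC", "=", "Montreal"), ("LOC", "=", "New York"), ("LOC", "=", "Paris"),
     ("BUDGET", "<=", 200000), ("BUDGET", ">", 200000)]
# Assumed domains: three locations, one representative budget on each side of the cut.
DOMAINS = {"LOC": ["Montreal", "New York", "Paris"], "BUDGET": [200000, 200001]}

M = eliminate(P, DOMAINS)
print(len(minterms(P)), len(M))  # 32 6
```

Each surviving minterm has exactly one positive LOC predicate and one positive BUDGET predicate; the other 26 candidates contradict facts such as "LOC = Montreal implies NOT(LOC = Paris)".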

Primary Horizontal Fragmentation: Example
• PAY(title, sal) and PROJ(pno, pname, budget, loc)
• Fragmentation of relation PAY
  – Application: Check the salary info and determine raise. (employee records kept at two sites → application run at two sites)
  – Simple predicates
    • p_1: sal ≤ 30000
    • p_2: sal > 30000
    • P_r = {p_1, p_2}, which is complete and minimal, so P_r' = P_r
  – Minterm predicates
    • m_1: (sal ≤ 30000)
    • m_2: NOT(sal ≤ 30000) = (sal > 30000)

Primary Horizontal Fragmentation: Example

Primary Horizontal Fragmentation: Example
• Fragmentation of relation PROJ
  – App1: Find the name and budget of projects given their location. (issued at 3 sites)
  – App2: Access project information according to budget. (one site accesses ≤ 200000, the other accesses > 200000)
  – Simple predicates
    • For App1:
      p_1: LOC = “Montreal”
      p_2: LOC = “New York”
      p_3: LOC = “Paris”
    • For App2:
      p_4: BUDGET ≤ 200000
      p_5: BUDGET > 200000
  – P_r = P_r' = {p_1, p_2, p_3, p_4, p_5}

Primary Horizontal Fragmentation: Example
• Fragmentation of relation PROJ
  – Minterm fragments left after elimination
    m_1: (LOC = “Montreal”) AND (BUDGET ≤ 200000)
    m_2: (LOC = “Montreal”) AND (BUDGET > 200000)
    m_3: (LOC = “New York”) AND (BUDGET ≤ 200000)
    m_4: (LOC = “New York”) AND (BUDGET > 200000)
    m_5: (LOC = “Paris”) AND (BUDGET ≤ 200000)
    m_6: (LOC = “Paris”) AND (BUDGET > 200000)

Primary Horizontal Fragmentation: Example

Primary Horizontal Fragmentation: Correctness
• Completeness
  – Since P_r' is complete and minimal, the selection predicates are complete.
• Reconstruction
  – If relation R is fragmented into F_R = {R_1, R_2, ..., R_r}, then R = ∪_{R_i ∈ F_R} R_i.
• Disjointness
  – Minterm predicates that form the basis of fragmentation should be mutually exclusive.

Derived Horizontal Fragmentation
• Defined on a member relation of a link according to a selection operation specified on its owner.
• Two important points:
  – Each link is an equi-join.
  – Equi-join can be implemented using semi-joins.

Semi-join
• Given R(A) and S(B), the semi-join of R with S is defined as follows: (formula not preserved in this transcript)
• Example: (figure not preserved in this transcript)
• Semi-join reduces the amount of data that needs to be transmitted between sites.
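A semi-join of R with S is commonly understood as the tuples of R that have at least one join partner in S, keeping only R's attributes. The sketch below uses that reading to derive fragments of a member relation from the fragments of its owner. PAY(title, sal) comes from the earlier example; the member relation EMP, all row values and the function name `semi_join` are assumptions made for illustration.

```python
def semi_join(r_rows, s_rows, attr):
    """R semi-join S on equality of attr: the tuples of R with a match in S."""
    s_values = {s[attr] for s in s_rows}  # only this column of S is needed
    return [r for r in r_rows if r[attr] in s_values]

# Owner relation PAY(title, sal) with hypothetical rows.
PAY = [
    {"TITLE": "Elect. Eng.", "SAL": 40000},
    {"TITLE": "Syst. Anal.", "SAL": 34000},
    {"TITLE": "Mech. Eng.", "SAL": 27000},
    {"TITLE": "Programmer", "SAL": 24000},
]
# Hypothetical member relation EMP, linked to PAY through TITLE.
EMP = [
    {"ENO": "E1", "TITLE": "Elect. Eng."},
    {"ENO": "E2", "TITLE": "Syst. Anal."},
    {"ENO": "E3", "TITLE": "Mech. Eng."},
    {"ENO": "E4", "TITLE": "Programmer"},
    {"ENO": "E5", "TITLE": "Syst. Anal."},
]

# Primary horizontal fragments of the owner.
PAY1 = [p for p in PAY if p["SAL"] <= 30000]
PAY2 = [p for p in PAY if p["SAL"] > 30000]

# Derived horizontal fragments of the member.
EMP1 = semi_join(EMP, PAY1, "TITLE")
EMP2 = semi_join(EMP, PAY2, "TITLE")

print([e["ENO"] for e in EMP1])  # ['E3', 'E4']
print([e["ENO"] for e in EMP2])  # ['E1', 'E2', 'E5']
```

Only the TITLE values of a PAY fragment have to reach the site that stores EMP, which is where the reduction in transmitted data comes from.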
