CS377: Database Systems Distributed Databases
1
Distributed Databases
- Department of Mathematics and Computer Science
Emory University
CS377: Database Systems Distributed Databases Distributed Databases - - PowerPoint PPT Presentation
CS377: Database Systems Distributed Databases Distributed Databases Department of Mathematics and Computer Science Emory University 1 Centralized DBMS on a Network
1
Emory University
A distributed database (DDB) is a collection of
4
A distributed database management system (D–
Distributed database system (DDBS) = DDB + D–
6
7
How to distribute the database
8
Optimize cost = data transmission + local processing
Synchronization of concurrent accesses Consistency and isolation of transactions' effects
9
Deadlock management How to make the system resilient to failures Atomicity and durability
Data distribution
TopAdown A mostly in designing systems from scratch BottomAup A when the databases already exist at a
10
Unit of distribution
relation fragments of relations (subArelations)
Data are inherently fragmented, e.g. in locality Allow concurrent execution of a number of transactions that
access different portions of a relation
11
" # $ % !!!! % &'"() # !!! * + #!!!!
!!!!!
$ % !!!!
+ #!!!!
!!!!!
" % &'"() # !!!
!!!!!
" # $ % !!!! % &'"() # !!! * + #!!!!
!!!!!
# % !!!! % # !!! * #!!!!
# $
&'"()
+
!!!!!
A horizontal subset of a relation which contain those of tuples
E.g. Employee relation with selection condition (DNO = 5)
Can be specified by a σ
σ σ σCi (R) operation in the relational algebra.
14
Can be specified by a σ
σ σ σCi (R) operation in the relational algebra.
Complete horizontal fragmentation
A set of horizontal fragments whose conditions C1, C2, …, Cn
include all the tuples in RA every tuple in R satisfies (C1 OR C2 OR … OR Cn).
Disjoint complete horizontal fragmentation: No tuple in R
satisfies (Ci AND Cj) where i ≠ j.
How to reconstruct R from complete horizontal fragments?
Round robin Hash partitioning Range partitioning
15
16
17
A vertical subset of a relation that contains a subset of
E.g. Employee relation: a vertical fragment of Name, Bdate, Sex Can be specified by a ΠLi(R) operation in the relational algebra.
19
Can be specified by a ΠLi(R) operation in the relational algebra. Each fragment must include the primary key attribute of the parent
relation Employee Complete vertical fragmentation
A set of vertical fragments whose projection lists L1, L2, …, Ln
include all the attributes in R but share only the primary key of R.
L1 ∪ L2 ∪ ... ∪ Ln = ATTRS (R) Li ∩ Lj = PK(R) for any i j
How to reconstruct R from complete vertical fragments?
A combination of Vertical fragmentation and Horizontal
This is achieved by SELECTAPROJECT operations
20
!"
A definition of a set of fragments (horizontal or vertical or
#"
Distribution of fragments to sites of distributed databases. It
21
Distribution of fragments to sites of distributed databases. It
$
Full replication: database is replicated to all sites. Partial replication: some selected part is replicated
22
How to distribute the database
23
Optimize cost = data transmission + local processing
Cost of transferring data (files and results) over the network is usually high Example:
Employee at site 1 and Department at Site 2
Employee at site 1. 10,000 rows. Row size = 100 bytes. Table
size = 106 bytes.
24
size = 106 bytes.
Department at Site 2. 100 rows. Row size = 35 bytes. Table size
= 3,500 bytes.
Q submitted at Site 3: retrieve employee name and department name
where the employee works.
ΠFname,Lname,Dname (Employee Dno = Dnumber Department) Result has 10,000 tuples and each result tuple is 40 bytes
Strategies:
Total transfer size
25
Total transfer size
Total bytes transferred
Optimization criteria: minimizing data transfer. Which strategy?
Strategies:
Total transfer bytes = 1,000,000 + 3500 = 1,003,500 bytes.
26
Query result size = 40 * 10,000 = 400,000 bytes. Total transfer
size = 400,000 + 1,000,000 = 1,400,000 bytes.
Total bytes transferred = 400,000 + 3500 = 403,500 bytes.
Optimization criteria: minimizing data transfer.
Preferred approach: strategy 3.
What if Q is submitted at site 2? Example:
Employee at site 1 and Department at Site 2
Employee at site 1. 10,000 rows. Row size = 100 bytes. Table
size = 106 bytes.
27
size = 106 bytes.
Department at Site 2. 100 rows. Row size = 35 bytes. Table size
= 3,500 bytes.
Q submitted at Site 2: retrieve employee name and department name
where the employee works.
ΠFname,Lname,Dname (Employee Dno = Dnumber Department) Result has 10,000 tuples and each result tuple is 40 bytes
28
⋉ ⋉ ⋉ = Π ( join ).
Parallel database
Using parallel processers
Architectures
Shared memory
29
Shared memory Shared disk Shared nothing
Data partitioning (shard)