Advanced Database Management Systems
Distributed DBMS:Introduction and Architectures Alvaro A A Fernandes
School of Computer Science, University of Manchester
AAAF (School of CS, Manchester) Advanced DBMSs 2011-2012 1 / 121
Advanced Database Management Systems Distributed DBMS:Introduction - - PowerPoint PPT Presentation
Advanced Database Management Systems Distributed DBMS:Introduction and Architectures Alvaro A A Fernandes School of Computer Science, University of Manchester AAAF (School of CS, Manchester) Advanced DBMSs 2011-2012 1 / 121 Outline
AAAF (School of CS, Manchester) Advanced DBMSs 2011-2012 1 / 121
AAAF (School of CS, Manchester) Advanced DBMSs 2011-2012 2 / 121
Introduction to Distributed DBMSs
◮ Control (e.g., over updates, allocation of resources, etc.) ◮ Processing logic (e.g., algebraic operators, data movements, etc.) ◮ Services (e.g., optimization, access control, etc.) ◮ Data (e.g., tuples, columns, relations, etc.) AAAF (School of CS, Manchester) Advanced DBMSs 2011-2012 3 / 121
Introduction to Distributed DBMSs
AAAF (School of CS, Manchester) Advanced DBMSs 2011-2012 4 / 121
Introduction to Distributed DBMSs
AAAF (School of CS, Manchester) Advanced DBMSs 2011-2012 5 / 121
Introduction to Distributed DBMSs
AAAF (School of CS, Manchester) Advanced DBMSs 2011-2012 6 / 121
Introduction to Distributed DBMSs
AAAF (School of CS, Manchester) Advanced DBMSs 2011-2012 7 / 121
Introduction to Distributed DBMSs
◮ Transparent management of distributed, fragmented, and replicated
◮ Improved reliability/availability through distributed processes ◮ Improved performance by exploiting locality and parallelism ◮ Easier and more economical system expansion through scale out (i.e.,
AAAF (School of CS, Manchester) Advanced DBMSs 2011-2012 8 / 121
Introduction to Distributed DBMSs
◮ Network (or distribution) transparency ◮ Replication transparency ◮ Fragmentation transparency ◮ horizontal, through selection ◮ vertical, through projection ◮ hybrid, combining both AAAF (School of CS, Manchester) Advanced DBMSs 2011-2012 9 / 121
Introduction to Distributed DBMSs
ENO ENAME TITLE E1
E2
E3
E4
Programmer E5
E6
E7
E8
PNO PNAME BUDGET LOC P1 Instrumentation 150000 Tokyo P2 Database Develop. 135000 Oslo P3 CAD/CAM 250000 Oslo P4 Maintenance 310000 Paris P5 CAD/CAM 500000 Paris
ENO PNO RESP DUR E1 P1 Manager 12 E2 P1 Analyst 24 E2 P2 Analyst 6 E3 P3 Consultant 10 E3 P4 Engineer 48 E4 P2 Programmer 18 E5 P2 Manager 24 E6 P4 Manager 48 E7 P3 Engineer 36 E7 P5 Engineer 23 E8 P3 Manager 40
TITLE SAL
40000
34000
27000 Programmer 24000 AAAF (School of CS, Manchester) Advanced DBMSs 2011-2012 10 / 121
Introduction to Distributed DBMSs
AAAF (School of CS, Manchester) Advanced DBMSs 2011-2012 11 / 121
Introduction to Distributed DBMSs
AAAF (School of CS, Manchester) Advanced DBMSs 2011-2012 12 / 121
Introduction to Distributed DBMSs
AAAF (School of CS, Manchester) Advanced DBMSs 2011-2012 13 / 121
Distributed DBMS Architectures
AAAF (School of CS, Manchester) Advanced DBMSs 2011-2012 14 / 121
Distributed DBMS Architectures
AAAF (School of CS, Manchester) Advanced DBMSs 2011-2012 15 / 121
Distributed DBMS Architectures
◮ infrastructural (e.g., different hardware, communications, OSs, etc.) ◮ syntactic (e.g., different data model, database languages, etc.) ◮ semantic (e.g., different names for the same concepts, different
◮ Design autonomy, i.e., the degree to which the design of a component
◮ Communication autonomy, i.e., the degree to which a component
◮ Execution autonomy, i.e., the degree to which a component DBMS can
AAAF (School of CS, Manchester) Advanced DBMSs 2011-2012 16 / 121
Distributed DBMS Architectures
AAAF (School of CS, Manchester) Advanced DBMSs 2011-2012 17 / 121
Distributed DBMS Architectures
AAAF (School of CS, Manchester) Advanced DBMSs 2011-2012 18 / 121
Distributed DBMS Architectures
AAAF (School of CS, Manchester) Advanced DBMSs 2011-2012 19 / 121
Distributed DBMS Architectures
AAAF (School of CS, Manchester) Advanced DBMSs 2011-2012 20 / 121
Distributed DBMS Architectures
AAAF (School of CS, Manchester) Advanced DBMSs 2011-2012 21 / 121
Distributed DBMS Architectures
AAAF (School of CS, Manchester) Advanced DBMSs 2011-2012 22 / 121
Distributed DBMS Architectures
AAAF (School of CS, Manchester) Advanced DBMSs 2011-2012 23 / 121
Distributed DBMS Architectures
AAAF (School of CS, Manchester) Advanced DBMSs 2011-2012 24 / 121
Distributed DBMS Architectures
AAAF (School of CS, Manchester) Advanced DBMSs 2011-2012 25 / 121
Distributed DBMS Architectures
◮ either relatively few
◮ or an extremely large amount of
AAAF (School of CS, Manchester) Advanced DBMSs 2011-2012 26 / 121
Distributed DBMS Architectures
AAAF (School of CS, Manchester) Advanced DBMSs 2011-2012 27 / 121
Distributed DBMS Architectures
AAAF (School of CS, Manchester) Advanced DBMSs 2011-2012 28 / 121
AAAF (School of CS, Manchester) Advanced DBMSs 2011-2012 29 / 121
Distributed DBMSs: The Design Problem
◮ the placement of data and programs ◮ across the sites of a computer network ◮ as well as possibly designing the network itself
◮ placement of the distributed DBMS software ◮ placement of the applications that run on the database AAAF (School of CS, Manchester) Advanced DBMSs 2011-2012 30 / 121
Distributed DBMSs: The Design Problem
AAAF (School of CS, Manchester) Advanced DBMSs 2011-2012 31 / 121
Distributed DBMSs: The Design Problem
AAAF (School of CS, Manchester) Advanced DBMSs 2011-2012 32 / 121
Data Distribution Strategies
AAAF (School of CS, Manchester) Advanced DBMSs 2011-2012 33 / 121
Data Distribution Strategies
◮ Because most relations are designed to be suitable for a great many
◮ Roughly, that view on a relation that is needed by one or more
AAAF (School of CS, Manchester) Advanced DBMSs 2011-2012 34 / 121
Fragmentation and Allocation
AAAF (School of CS, Manchester) Advanced DBMSs 2011-2012 35 / 121
Fragmentation and Allocation
AAAF (School of CS, Manchester) Advanced DBMSs 2011-2012 36 / 121
Fragmentation and Allocation
AAAF (School of CS, Manchester) Advanced DBMSs 2011-2012 37 / 121
Fragmentation and Allocation
AAAF (School of CS, Manchester) Advanced DBMSs 2011-2012 38 / 121
Fragmentation and Allocation
AAAF (School of CS, Manchester) Advanced DBMSs 2011-2012 39 / 121
Fragmentation and Allocation
PROJ1 = PNO BUDGET P1 150000 P2 135000 P3 250000 P4 310000 P5 500000 PROJ2 = PNO PNAME LOC P1 Instrumentation Tokyo P2 Database Develop. Oslo P3 CAD/CAM Oslo P4 Maintenance Paris P5 CAD/CAM Paris PROJ = PNO PNAME BUDGET LOC P1 Instrumentation 150000 Tokyo P2 Database Develop. 135000 Oslo P3 CAD/CAM 250000 Oslo P4 Maintenance 310000 Paris P5 CAD/CAM 500000 Paris AAAF (School of CS, Manchester) Advanced DBMSs 2011-2012 40 / 121
Fragmentation and Allocation
AAAF (School of CS, Manchester) Advanced DBMSs 2011-2012 41 / 121
Fragmentation and Allocation
AAAF (School of CS, Manchester) Advanced DBMSs 2011-2012 42 / 121
Fragmentation and Allocation
AAAF (School of CS, Manchester) Advanced DBMSs 2011-2012 43 / 121
Fragmentation, in More Detail
◮ about the database ◮ about the applications (i.e., the queries, by and large) ◮ about the communication network ◮ about the computer system AAAF (School of CS, Manchester) Advanced DBMSs 2011-2012 44 / 121
Fragmentation, in More Detail
◮ Primary Horizontal Fragmentation (PHF) ◮ Derived Horizontal Fragmentation (DHF)
AAAF (School of CS, Manchester) Advanced DBMSs 2011-2012 45 / 121
Fragmentation, in More Detail
AAAF (School of CS, Manchester) Advanced DBMSs 2011-2012 46 / 121
Fragmentation, in More Detail
AAAF (School of CS, Manchester) Advanced DBMSs 2011-2012 47 / 121
Fragmentation, in More Detail
AAAF (School of CS, Manchester) Advanced DBMSs 2011-2012 48 / 121
Fragmentation, in More Detail
◮ The selectivity of a minterm mi, denoted by sel(mi) is the number of
◮ The access frequency of an application qi, denoted by acc(qi) is the
AAAF (School of CS, Manchester) Advanced DBMSs 2011-2012 49 / 121
Fragmentation, in More Detail
AAAF (School of CS, Manchester) Advanced DBMSs 2011-2012 50 / 121
Fragmentation, in More Detail
◮ A1: Find the name and budget of projects given their project number. ◮ A2: Find projects according to their budget. ◮ Let A1 be issued at three sites. ◮ Let one site access A2 for budgets below 200000, and the other two
◮ Let the following be the simple predicates:
′Tokyo′
′Oslo′
′Paris′
AAAF (School of CS, Manchester) Advanced DBMSs 2011-2012 51 / 121
Fragmentation, in More Detail
AAAF (School of CS, Manchester) Advanced DBMSs 2011-2012 52 / 121
Fragmentation, in More Detail
AAAF (School of CS, Manchester) Advanced DBMSs 2011-2012 53 / 121
Fragmentation, in More Detail
AAAF (School of CS, Manchester) Advanced DBMSs 2011-2012 54 / 121
Fragmentation, in More Detail
◮ EMP1 ← EMP ⋉ PAY1 ◮ EMP2 ← EMP ⋉ PAY2 AAAF (School of CS, Manchester) Advanced DBMSs 2011-2012 55 / 121
Fragmentation, in More Detail
AAAF (School of CS, Manchester) Advanced DBMSs 2011-2012 56 / 121
Fragmentation, in More Detail
◮ normalization of designs ◮ physical clustering
AAAF (School of CS, Manchester) Advanced DBMSs 2011-2012 57 / 121
Fragmentation, in More Detail
AAAF (School of CS, Manchester) Advanced DBMSs 2011-2012 58 / 121
Fragmentation, in More Detail
AAAF (School of CS, Manchester) Advanced DBMSs 2011-2012 59 / 121
Fragmentation, in More Detail
AAAF (School of CS, Manchester) Advanced DBMSs 2011-2012 60 / 121
AAAF (School of CS, Manchester) Advanced DBMSs 2011-2012 61 / 121
The Distributed Query Processing Problem
AAAF (School of CS, Manchester) Advanced DBMSs 2011-2012 62 / 121
The Distributed Query Processing Problem
AAAF (School of CS, Manchester) Advanced DBMSs 2011-2012 63 / 121
The Distributed Query Processing Problem
AAAF (School of CS, Manchester) Advanced DBMSs 2011-2012 64 / 121
The Distributed Query Processing Problem
1| = |ASG ′ 2| = |σRESP=′manager′(ASG1)| = |ASGi | V (ASG,RESP) = 50 5 = 10
1| + |ASG ′ 2| = 10 + 10 = 20
i ) = length(EMP) + length(ASG) − length(ENO) =
AAAF (School of CS, Manchester) Advanced DBMSs 2011-2012 65 / 121
The Distributed Query Processing Problem
i
i to sites 3, 4
i ) × tt
i
i | × ta
i to site 5
i ) × tt
40
AAAF (School of CS, Manchester) Advanced DBMSs 2011-2012 66 / 121
The Distributed Query Processing Problem
AAAF (School of CS, Manchester) Advanced DBMSs 2011-2012 67 / 121
The Distributed Query Processing Problem
AAAF (School of CS, Manchester) Advanced DBMSs 2011-2012 68 / 121
The Distributed Query Processing Problem
AAAF (School of CS, Manchester) Advanced DBMSs 2011-2012 69 / 121
The Distributed Query Processing Problem
AAAF (School of CS, Manchester) Advanced DBMSs 2011-2012 70 / 121
Two-Phase Distributed Query Optimization
AAAF (School of CS, Manchester) Advanced DBMSs 2011-2012 71 / 121
Localization and Reduction
◮ determine which are distributed; ◮ for those, determine which fragments are involved; ◮ replace references to global relations with the reconstruction expression
AAAF (School of CS, Manchester) Advanced DBMSs 2011-2012 72 / 121
Localization and Reduction
AAAF (School of CS, Manchester) Advanced DBMSs 2011-2012 73 / 121
Localization and Reduction
AAAF (School of CS, Manchester) Advanced DBMSs 2011-2012 74 / 121
Localization and Reduction
AAAF (School of CS, Manchester) Advanced DBMSs 2011-2012 75 / 121
Localization and Reduction
AAAF (School of CS, Manchester) Advanced DBMSs 2011-2012 76 / 121
Localization and Reduction
AAAF (School of CS, Manchester) Advanced DBMSs 2011-2012 77 / 121
Localization and Reduction
AAAF (School of CS, Manchester) Advanced DBMSs 2011-2012 78 / 121
Localization and Reduction
AAAF (School of CS, Manchester) Advanced DBMSs 2011-2012 79 / 121
Cost-Related Issues
◮ Which relation to ship where? ◮ Whether to ship the whole or to ship as needed? ◮ Whether to use semijoins? (Semijoins save on communication at the
AAAF (School of CS, Manchester) Advanced DBMSs 2011-2012 80 / 121
Cost-Related Issues
AAAF (School of CS, Manchester) Advanced DBMSs 2011-2012 81 / 121
Cost-Related Issues
AAAF (School of CS, Manchester) Advanced DBMSs 2011-2012 82 / 121
Cost-Related Issues
AAAF (School of CS, Manchester) Advanced DBMSs 2011-2012 83 / 121
Cost-Related Issues
AAAF (School of CS, Manchester) Advanced DBMSs 2011-2012 84 / 121
Cost-Related Issues
◮ only the communication cost is considered ◮ one message conveys one unit of work
◮ Total time = 1, 900UM + 181, 000UT ◮ Response time = 1, 000UM + 100, 000UT AAAF (School of CS, Manchester) Advanced DBMSs 2011-2012 85 / 121
Join Ordering in DQP
i=1 length(Ai).
AAAF (School of CS, Manchester) Advanced DBMSs 2011-2012 86 / 121
Join Ordering in DQP
AAAF (School of CS, Manchester) Advanced DBMSs 2011-2012 87 / 121
Join Ordering in DQP
AAAF (School of CS, Manchester) Advanced DBMSs 2011-2012 88 / 121
Join Ordering in DQP
AAAF (School of CS, Manchester) Advanced DBMSs 2011-2012 89 / 121
Join Ordering in DQP
AAAF (School of CS, Manchester) Advanced DBMSs 2011-2012 90 / 121
Join Ordering in DQP
AAAF (School of CS, Manchester) Advanced DBMSs 2011-2012 91 / 121
Join Ordering in DQP
AAAF (School of CS, Manchester) Advanced DBMSs 2011-2012 92 / 121
Join Ordering in DQP
AAAF (School of CS, Manchester) Advanced DBMSs 2011-2012 93 / 121
AAAF (School of CS, Manchester) Advanced DBMSs 2011-2012 94 / 121
Data Integration: Problem Definition
AAAF (School of CS, Manchester) Advanced DBMSs 2011-2012 95 / 121
Data Integration: Problem Definition
AAAF (School of CS, Manchester) Advanced DBMSs 2011-2012 96 / 121
Process Alternatives
AAAF (School of CS, Manchester) Advanced DBMSs 2011-2012 97 / 121
Process Alternatives
◮ translation ◮ Each LCS abstracts over a
◮ A translator maps across to
◮ integration ◮ The ISs are cast in an
◮ The integrator uses the ISs
AAAF (School of CS, Manchester) Advanced DBMSs 2011-2012 98 / 121
Process Alternatives
AAAF (School of CS, Manchester) Advanced DBMSs 2011-2012 99 / 121
View-Based Data Integration
AAAF (School of CS, Manchester) Advanced DBMSs 2011-2012 100 / 121
Schema Matching, Mapping and Integration
AAAF (School of CS, Manchester) Advanced DBMSs 2011-2012 101 / 121
Schema Matching, Mapping and Integration
AAAF (School of CS, Manchester) Advanced DBMSs 2011-2012 102 / 121
Schema Matching, Mapping and Integration
AAAF (School of CS, Manchester) Advanced DBMSs 2011-2012 103 / 121
Schema Matching, Mapping and Integration
AAAF (School of CS, Manchester) Advanced DBMSs 2011-2012 104 / 121
Schema Matching, Mapping and Integration
AAAF (School of CS, Manchester) Advanced DBMSs 2011-2012 105 / 121
Schema Matching, Mapping and Integration
◮ Insufficient schema and instance information: how can one find out
◮ Subjectivity of the matching: how can one be sure that the
◮ Schema-level versus instance-level matching: which do we use? Both?
◮ Element-level versus structure-level matching: if we find a match for an
◮ Matching cardinality is hard without additional information, as it is not
AAAF (School of CS, Manchester) Advanced DBMSs 2011-2012 106 / 121
Schema Matching, Mapping and Integration
AAAF (School of CS, Manchester) Advanced DBMSs 2011-2012 107 / 121
Schema Matching, Mapping and Integration
AAAF (School of CS, Manchester) Advanced DBMSs 2011-2012 108 / 121
Dataspaces
AAAF (School of CS, Manchester) Advanced DBMSs 2011-2012 109 / 121
Dataspaces
AAAF (School of CS, Manchester) Advanced DBMSs 2011-2012 110 / 121
Dataspaces
AAAF (School of CS, Manchester) Advanced DBMSs 2011-2012 111 / 121
Dataspaces
AAAF (School of CS, Manchester) Advanced DBMSs 2011-2012 112 / 121
Dataspaces
AAAF (School of CS, Manchester) Advanced DBMSs 2011-2012 113 / 121
Dataspaces
AAAF (School of CS, Manchester) Advanced DBMSs 2011-2012 114 / 121
Dataspaces
AAAF (School of CS, Manchester) Advanced DBMSs 2011-2012 115 / 121
Dataspaces
AAAF (School of CS, Manchester) Advanced DBMSs 2011-2012 116 / 121
Dataspaces
AAAF (School of CS, Manchester) Advanced DBMSs 2011-2012 117 / 121
Dataspaces
AAAF (School of CS, Manchester) Advanced DBMSs 2011-2012 118 / 121
Dataspaces
AAAF (School of CS, Manchester) Advanced DBMSs 2011-2012 119 / 121
Dataspaces
AAAF (School of CS, Manchester) Advanced DBMSs 2011-2012 120 / 121
Dataspaces
AAAF (School of CS, Manchester) Advanced DBMSs 2011-2012 121 / 121