Database Systems 15-445/15-645 Fall 2018 Andy Pavlo Computer Science Carnegie Mellon Univ.
AP AP
Lecture # 24
Distributed OLAP Databases Lecture # 24 Database Systems Andy - - PowerPoint PPT Presentation
Distributed OLAP Databases Lecture # 24 Database Systems Andy Pavlo AP AP Computer Science 15-445/15-645 Carnegie Mellon Univ. Fall 2018 2 UPCO M IN G DATABASE EVEN TS Swarm64 Tech Talk Thursday November 29 th @ 12pm GHC 8102
Database Systems 15-445/15-645 Fall 2018 Andy Pavlo Computer Science Carnegie Mellon Univ.
Lecture # 24
CMU 15-445/645 (Fall 2018)
UPCO M IN G DATABASE EVEN TS
Swarm64 Tech Talk
→ Thursday November 29th @ 12pm → GHC 8102 ← Different Location!
VoltDB Research Talk
→ Monday December 3rd @ 4:30pm → GHC 8102
2
CMU 15-445/645 (Fall 2018)
O LTP VS. O LAP
On-line Transaction Processing (OLTP):
→ Short-lived read/write txns. → Small footprint. → Repetitive operations.
On-line Analytical Processing (OLAP):
→ Long-running, read-only queries. → Complex joins. → Exploratory queries.
8
CMU 15-445/645 (Fall 2018)
BIFURCATED EN VIRO N M EN T
9
Extract Transform Load OLAP Database OLTP Databases
CMU 15-445/645 (Fall 2018)
DECISIO N SUPPO RT SYSTEM S
Applications that serve the management,
to help people make decisions about future issues and problems by analyzing historical data. Star Schema vs. Snowflake Schema
10
CMU 15-445/645 (Fall 2018)
STAR SCH EM A
11
CATEGORY_NAME CATEGORY_DESC PRODUCT_CODE PRODUCT_NAME PRODUCT_DESC
PRODUCT_DIM
COUNTRY STATE_CODE STATE_NAME ZIP_CODE CITY
LOCATION_DIM
ID FIRST_NAME LAST_NAME EMAIL ZIP_CODE
CUSTOMER_DIM
YEAR DAY_OF_YEAR MONTH_NUM MONTH_NAME DAY_OF_MONTH
TIME_DIM
SALES_FACT
PRODUCT_FK TIME_FK LOCATION_FK CUSTOMER_FK PRICE QUANTITY
CMU 15-445/645 (Fall 2018)
SN OWFLAKE SCH EM A
12 CATEGORY_FK PRODUCT_CODE PRODUCT_NAME PRODUCT_DESC
PRODUCT_DIM
COUNTRY STATE_FK ZIP_CODE CITY
LOCATION_DIM
ID FIRST_NAME LAST_NAME EMAIL ZIP_CODE
CUSTOMER_DIM
YEAR DAY_OF_YEAR MONTH_FK DAY_OF_MONTH
TIME_DIM
SALES_FACT
PRODUCT_FK TIME_FK LOCATION_FK CUSTOMER_FK PRICE QUANTITY
CATEGORY_ID CATEGORY_NAME CATEGORY_DESC
CAT_LOOKUP
STATE_ID STATE_CODE STATE_NAME
STATE_LOOKUP
MONTH_NUM MONTH_NAME MONTH_SEASON
MONTH_LOOKUP
CMU 15-445/645 (Fall 2018)
STAR VS. SN OWFLAKE SCH EM A
Issue #1: Normalization
→ Snowflake schemas take up less storage space. → Denormalized data models may incur integrity and consistency violations.
Issue #2: Query Complexity
→ Snowflake schemas require more joins to get the data needed for a query. → Queries on star schemas will (usually) be faster.
13
CMU 15-445/645 (Fall 2018)
P3 P4 P1 P2
PRO BLEM SETUP
14
Application Server Partitions
SELECT * FROM R JOIN S ON R.id = S.id
CMU 15-445/645 (Fall 2018)
P3 P4 P1 P2
PRO BLEM SETUP
14
Application Server Partitions
SELECT * FROM R JOIN S ON R.id = S.id
P2 P4 P3
CMU 15-445/645 (Fall 2018)
TO DAY'S AGEN DA
Execution Models Query Planning Distributed Join Algorithms Cloud Systems
15
CMU 15-445/645 (Fall 2018)
PUSH VS. PULL
Approach #1: Push Query to Data
→ Send the query (or a portion of it) to the node that contains the data. → Perform as much filtering and processing as possible where data resides before transmitting over network.
Approach #2: Pull Data to Query
→ Bring the data to the node that is executing a query that needs it for processing.
16
CMU 15-445/645 (Fall 2018)
PUSH Q UERY TO DATA
17
Node
Application Server
Node
P1→ID:1-100 P2→ID:101-200
SELECT * FROM R JOIN S ON R.id = S.id R ⨝ S IDs [101,200] Result: R ⨝ S
CMU 15-445/645 (Fall 2018)
Storage
PULL DATA TO Q UERY
18
Node
Application Server
Node
Page ABC Page XYZ R ⨝ S IDs [101,200] P1→ID:1-100 P2→ID:101-200 SELECT * FROM R JOIN S ON R.id = S.id
CMU 15-445/645 (Fall 2018)
Storage
PULL DATA TO Q UERY
18
Node
Application Server
Node
Page ABC Page XYZ R ⨝ S IDs [101,200] P1→ID:1-100 P2→ID:101-200 SELECT * FROM R JOIN S ON R.id = S.id
CMU 15-445/645 (Fall 2018)
Storage
PULL DATA TO Q UERY
18
Node
Application Server
Node
R ⨝ S IDs [101,200] P1→ID:1-100 P2→ID:101-200 SELECT * FROM R JOIN S ON R.id = S.id Result: R ⨝ S
CMU 15-445/645 (Fall 2018)
FAULT TO LERAN CE
Traditional distributed OLAP DBMSs were designed to assume that nodes will not fail during query execution.
→ If the DBMS fails during query execution, then the whole query fails.
The DBMS could take a snapshot of the intermediate results for a query during execution to allow it to recover after a crash.
21
CMU 15-445/645 (Fall 2018)
Q UERY PLAN N IN G
All the optimizations that we talked about before are still applicable in a distributed environment.
→ Predicate Pushdown → Early Projections → Optimal Join Orderings
But now the DBMS must also consider the location of data at each partition when optimizing
22
CMU 15-445/645 (Fall 2018)
Q UERY PLAN FRAGM EN TS
Approach #1: Physical Operators
→ Generate a single query plan and then break it up into partition-specific fragments. → Most systems implement this approach.
Approach #2: SQL
→ Rewrite original query into partition-specific queries. → Allows for local optimization at each node. → MemSQL is the only system that I know that does this.
23
CMU 15-445/645 (Fall 2018)
Q UERY PLAN FRAGM EN TS
25
SELECT * FROM R JOIN S ON R.id = S.id
Id:1-100
SELECT * FROM R JOIN S ON R.id = S.id WHERE R.id BETWEEN 1 AND 100
Id:101-200
SELECT * FROM R JOIN S ON R.id = S.id WHERE R.id BETWEEN 101 AND 200
Id:201-300
SELECT * FROM R JOIN S ON R.id = S.id WHERE R.id BETWEEN 201 AND 300
CMU 15-445/645 (Fall 2018)
Q UERY PLAN FRAGM EN TS
25
SELECT * FROM R JOIN S ON R.id = S.id
Id:1-100
SELECT * FROM R JOIN S ON R.id = S.id WHERE R.id BETWEEN 1 AND 100
Id:101-200
SELECT * FROM R JOIN S ON R.id = S.id WHERE R.id BETWEEN 101 AND 200
Id:201-300
SELECT * FROM R JOIN S ON R.id = S.id WHERE R.id BETWEEN 201 AND 300
Union the output of each join together to produce final result.
CMU 15-445/645 (Fall 2018)
O BSERVATIO N
The efficiency of a distributed join depends on the target tables' partitioning schemes. One approach is to put entire tables on a single node and then perform the join.
→ You lose the parallelism of a distributed DBMS. → Costly data transfer over the network.
26
CMU 15-445/645 (Fall 2018)
DISTRIBUTED J O IN ALGO RITH M S
To join tables R and S, the DBMS needs to get the proper tuples on the same node. Once there, it then executes the same join algorithms that we discussed earlier in the semester.
27
CMU 15-445/645 (Fall 2018)
SCEN ARIO # 1
One table is replicated at every node. Each node joins its local data and then sends their results to a coordinating node.
28
R (Id) S
Id:1-100 Replicated
R (Id) S
Id:101-200 Replicated
SELECT * FROM R JOIN S ON R.id = S.id
P1 :R⨝S P2:R⨝S
CMU 15-445/645 (Fall 2018)
SCEN ARIO # 1
One table is replicated at every node. Each node joins its local data and then sends their results to a coordinating node.
28
R (Id) S
Id:1-100 Replicated
R (Id) S
Id:101-200 Replicated
SELECT * FROM R JOIN S ON R.id = S.id
P1 :R⨝S P2:R⨝S R⨝S
CMU 15-445/645 (Fall 2018)
SCEN ARIO # 2
Tables are partitioned on the join
for coalescing.
29
R (Id) S (Id)
Id:1-100
R (Id) S (Id)
Id:101-200 Id:1-100 Id:101-200
P1 :R⨝S P2:R⨝S
SELECT * FROM R JOIN S ON R.id = S.id
CMU 15-445/645 (Fall 2018)
SCEN ARIO # 2
Tables are partitioned on the join
for coalescing.
29
R (Id) S (Id)
Id:1-100
R (Id) S (Id)
Id:101-200 Id:1-100 Id:101-200
P1 :R⨝S P2:R⨝S R⨝S
SELECT * FROM R JOIN S ON R.id = S.id
CMU 15-445/645 (Fall 2018)
SCEN ARIO # 3
Both tables are partitioned on different keys. If one of the tables is small, then the DBMS broadcasts that table to all nodes.
30
R (Id) S (Val)
Id:1-100
R (Id) S (Val)
Id:101-200 Val:1-50 Val:51-100
SELECT * FROM R JOIN S ON R.id = S.id
CMU 15-445/645 (Fall 2018)
SCEN ARIO # 3
Both tables are partitioned on different keys. If one of the tables is small, then the DBMS broadcasts that table to all nodes.
30
R (Id) S (Val)
Id:1-100
R (Id) S (Val)
Id:101-200 Val:1-50 Val:51-100
S
SELECT * FROM R JOIN S ON R.id = S.id
CMU 15-445/645 (Fall 2018)
SCEN ARIO # 3
Both tables are partitioned on different keys. If one of the tables is small, then the DBMS broadcasts that table to all nodes.
30
R (Id) S (Val)
Id:1-100
R (Id) S (Val)
Id:101-200 Val:1-50 Val:51-100
S S
SELECT * FROM R JOIN S ON R.id = S.id
CMU 15-445/645 (Fall 2018)
SCEN ARIO # 3
Both tables are partitioned on different keys. If one of the tables is small, then the DBMS broadcasts that table to all nodes.
30
R (Id) S (Val)
Id:1-100
R (Id) S (Val)
Id:101-200 Val:1-50 Val:51-100
S S P1 :R⨝S P2:R⨝S
SELECT * FROM R JOIN S ON R.id = S.id
CMU 15-445/645 (Fall 2018)
SCEN ARIO # 3
Both tables are partitioned on different keys. If one of the tables is small, then the DBMS broadcasts that table to all nodes.
30
R (Id) S (Val)
Id:1-100
R (Id) S (Val)
Id:101-200 Val:1-50 Val:51-100
S S P1 :R⨝S P2:R⨝S R⨝S
SELECT * FROM R JOIN S ON R.id = S.id
CMU 15-445/645 (Fall 2018)
SCEN ARIO # 4
Both tables are not partitioned on the join key. The DBMS copies the tables by reshuffling them across nodes.
31
R (Name)
S (Val)
Name:A-M
R (Name)
S (Val)
Name:N-Z Val:1-50 Val:51-100
SELECT * FROM R JOIN S ON R.id = S.id
CMU 15-445/645 (Fall 2018)
SCEN ARIO # 4
Both tables are not partitioned on the join key. The DBMS copies the tables by reshuffling them across nodes.
31
R (Name)
S (Val)
Name:A-M
R (Name)
S (Val)
Name:N-Z Val:1-50 Val:51-100
R (Id)
Id:101-200
SELECT * FROM R JOIN S ON R.id = S.id
CMU 15-445/645 (Fall 2018)
SCEN ARIO # 4
Both tables are not partitioned on the join key. The DBMS copies the tables by reshuffling them across nodes.
31
R (Name)
S (Val)
Name:A-M
R (Name)
S (Val)
Name:N-Z Val:1-50 Val:51-100
R (Id)
Id:1-100
R (Id)
Id:101-200
SELECT * FROM R JOIN S ON R.id = S.id
CMU 15-445/645 (Fall 2018)
SCEN ARIO # 4
Both tables are not partitioned on the join key. The DBMS copies the tables by reshuffling them across nodes.
31
R (Name)
S (Val)
Name:A-M
R (Name)
S (Val)
Name:N-Z Val:1-50 Val:51-100 Id:101-200
S (Id) R (Id)
Id:1-100
R (Id)
Id:101-200
SELECT * FROM R JOIN S ON R.id = S.id
CMU 15-445/645 (Fall 2018)
SCEN ARIO # 4
Both tables are not partitioned on the join key. The DBMS copies the tables by reshuffling them across nodes.
31
R (Name)
S (Val)
Name:A-M
R (Name)
S (Val)
Name:N-Z Val:1-50 Val:51-100 Id:1-100
S (Id)
Id:101-200
S (Id) R (Id)
Id:1-100
R (Id)
Id:101-200
SELECT * FROM R JOIN S ON R.id = S.id
CMU 15-445/645 (Fall 2018)
SCEN ARIO # 4
Both tables are not partitioned on the join key. The DBMS copies the tables by reshuffling them across nodes.
31
R (Name)
S (Val)
Name:A-M
R (Name)
S (Val)
Name:N-Z Val:1-50 Val:51-100 Id:1-100
S (Id)
Id:101-200
S (Id) P1 :R⨝S P2:R⨝S R (Id)
Id:1-100
R (Id)
Id:101-200
SELECT * FROM R JOIN S ON R.id = S.id
CMU 15-445/645 (Fall 2018)
SCEN ARIO # 4
Both tables are not partitioned on the join key. The DBMS copies the tables by reshuffling them across nodes.
31
R (Name)
S (Val)
Name:A-M
R (Name)
S (Val)
Name:N-Z Val:1-50 Val:51-100 Id:1-100
S (Id)
Id:101-200
S (Id) P1 :R⨝S P2:R⨝S R⨝S R (Id)
Id:1-100
R (Id)
Id:101-200
SELECT * FROM R JOIN S ON R.id = S.id
CMU 15-445/645 (Fall 2018)
RELATIO N AL ALGEBRA: SEM I- J O IN
Like a natural join except that the attributes that are not used to compute the join are restricted. Syntax: (R⋉ S)
32
a_id b_id xxx a1 101 X1 a2 102 X2 a3 103 X3
R(a_id,b_id,xxx) S(a_id,b_id,yyy)
a_id b_id yyy a3 103 Y1 a4 104 Y2 a5 105 Y3
(R ⋉ S)
a_id b_id a3 103
Distributed DBMSs use semi-join to minimize the amount of data sent during joins. This is the same as a projection pushdown.
CMU 15-445/645 (Fall 2018)
CLO UD SYSTEM S
Vendors provide database-as-a-service (DBaaS)
Newer systems are starting to blur the lines between shared-nothing and shared-disk.
33
CMU 15-445/645 (Fall 2018)
CLO UD SYSTEM S
Approach #1: Managed DBMSs
→ No significant modification to the DBMS to be "aware" that it is running in a cloud environment. → Examples: Most vendors
Approach #2: Cloud-Native DBMS
→ The system is designed explicitly to run in a cloud environment. → Usually based on a shared-disk architecture. → Examples: Snowflake, Google BigQuery, Amazon Redshift, Microsoft SQL Azure
34
CMU 15-445/645 (Fall 2018)
UN IVERSAL FO RM ATS
Traditional DBMSs store data in proprietary binary file formats that are incompatible. One can use text formats (XML/JSON/CSV) to share data across different systems. There are now standardized file formats.
35
CMU 15-445/645 (Fall 2018)
UN IVERSAL FO RM ATS
Apache Parquet
→ Compressed columnar storage from Cloudera/Twitter
Apache ORC
→ Compressed columnar storage from Apache Hive.
HDF5
→ Multi-dimensional arrays for scientific workloads.
Apache Arrow
→ In-memory compressed columnar storage from Pandas/Dremio
36
CMU 15-445/645 (Fall 2018)
CO N CLUSIO N
Again, efficient distributed OLAP systems are difficult to implement. More data, more problems…
37
CMU 15-445/645 (Fall 2018)
N EXT CLASS
VoltDB Guest Speaker
38