24 Databases Intro to Database Systems Andy Pavlo AP AP - - PowerPoint PPT Presentation

24
SMART_READER_LITE
LIVE PREVIEW

24 Databases Intro to Database Systems Andy Pavlo AP AP - - PowerPoint PPT Presentation

Distributed OLAP 24 Databases Intro to Database Systems Andy Pavlo AP AP 15-445/15-645 Computer Science Carnegie Mellon University Fall 2019 2 ADM IN ISTRIVIA Homework #5 : Monday Dec 3 rd @ 11:59pm Project #4 : Monday Dec 10 th @


slide-1
SLIDE 1

Intro to Database Systems 15-445/15-645 Fall 2019 Andy Pavlo Computer Science Carnegie Mellon University

AP AP

24

Distributed OLAP Databases

slide-2
SLIDE 2

CMU 15-445/645 (Fall 2019)

ADM IN ISTRIVIA

Homework #5: Monday Dec 3rd @ 11:59pm Project #4: Monday Dec 10th @ 11:59pm Extra Credit: Wednesday Dec 10th @ 11:59pm Final Exam: Monday Dec 9th @ 5:30pm Systems Potpourri: Wednesday Dec 4th

→ Vote for what system you want me to talk about. → https://cmudb.io/f19-systems

2

slide-3
SLIDE 3

CMU 15-445/645 (Fall 2019)

ADM IN ISTRIVIA

Monday Dec 2nd – Oracle Lecture

→ Shasank Chavan (VP In-Memory Databases)

Monday Dec 2nd – Oracle Systems Talk

→ 4:30pm in GHC 6115 → Pizza will be served

Tuesday Dec 3rd – Oracle Research Talk

→ Hideaki Kimura (Oracle Beast) → 12:00pm in CIC 4th Floor (Panther Hollow Room) → Pizza will be served.

3

slide-4
SLIDE 4

CMU 15-445/645 (Fall 2019)

LAST CLASS

Atomic Commit Protocols Replication Consistency Issues (CAP) Federated Databases

4

slide-5
SLIDE 5

CMU 15-445/645 (Fall 2019)

BIFURCATED EN VIRO N M EN T

5

Extract Transform Load OLAP Database OLTP Databases

slide-6
SLIDE 6

CMU 15-445/645 (Fall 2019)

DECISIO N SUPPO RT SYSTEM S

Applications that serve the management,

  • perations, and planning levels of an organization

to help people make decisions about future issues and problems by analyzing historical data. Star Schema vs. Snowflake Schema

6

slide-7
SLIDE 7

CMU 15-445/645 (Fall 2019)

STAR SCH EM A

7

CATEGORY_NAME CATEGORY_DESC PRODUCT_CODE PRODUCT_NAME PRODUCT_DESC

PRODUCT_DIM

COUNTRY STATE_CODE STATE_NAME ZIP_CODE CITY

LOCATION_DIM

ID FIRST_NAME LAST_NAME EMAIL ZIP_CODE

CUSTOMER_DIM

YEAR DAY_OF_YEAR MONTH_NUM MONTH_NAME DAY_OF_MONTH

TIME_DIM

SALES_FACT

PRODUCT_FK TIME_FK LOCATION_FK CUSTOMER_FK PRICE QUANTITY

slide-8
SLIDE 8

CMU 15-445/645 (Fall 2019)

SN OWFLAKE SCH EM A

8 CATEGORY_FK PRODUCT_CODE PRODUCT_NAME PRODUCT_DESC

PRODUCT_DIM

COUNTRY STATE_FK ZIP_CODE CITY

LOCATION_DIM

ID FIRST_NAME LAST_NAME EMAIL ZIP_CODE

CUSTOMER_DIM

YEAR DAY_OF_YEAR MONTH_FK DAY_OF_MONTH

TIME_DIM

SALES_FACT

PRODUCT_FK TIME_FK LOCATION_FK CUSTOMER_FK PRICE QUANTITY

CATEGORY_ID CATEGORY_NAME CATEGORY_DESC

CAT_LOOKUP

STATE_ID STATE_CODE STATE_NAME

STATE_LOOKUP

MONTH_NUM MONTH_NAME MONTH_SEASON

MONTH_LOOKUP

slide-9
SLIDE 9

CMU 15-445/645 (Fall 2019)

STAR VS. SN OWFLAKE SCH EM A

Issue #1: Normalization

→ Snowflake schemas take up less storage space. → Denormalized data models may incur integrity and consistency violations.

Issue #2: Query Complexity

→ Snowflake schemas require more joins to get the data needed for a query. → Queries on star schemas will (usually) be faster.

9

slide-10
SLIDE 10

CMU 15-445/645 (Fall 2019)

P3 P4 P1 P2

PRO BLEM SETUP

10

Application Server Partitions

SELECT * FROM R JOIN S ON R.id = S.id

slide-11
SLIDE 11

CMU 15-445/645 (Fall 2019)

P3 P4 P1 P2

PRO BLEM SETUP

10

Application Server Partitions

SELECT * FROM R JOIN S ON R.id = S.id

P2 P4 P3

slide-12
SLIDE 12

CMU 15-445/645 (Fall 2019)

TO DAY'S AGEN DA

Execution Models Query Planning Distributed Join Algorithms Cloud Systems

11

slide-13
SLIDE 13

CMU 15-445/645 (Fall 2019)

PUSH VS. PULL

Approach #1: Push Query to Data

→ Send the query (or a portion of it) to the node that contains the data. → Perform as much filtering and processing as possible where data resides before transmitting over network.

Approach #2: Pull Data to Query

→ Bring the data to the node that is executing a query that needs it for processing.

12

slide-14
SLIDE 14

CMU 15-445/645 (Fall 2019)

PUSH Q UERY TO DATA

13

Node

Application Server

Node

P1→ID:1-100 P2→ID:101-200

SELECT * FROM R JOIN S ON R.id = S.id R ⨝ S IDs [101,200] Result: R ⨝ S

slide-15
SLIDE 15

CMU 15-445/645 (Fall 2019)

Storage

PULL DATA TO Q UERY

14

Node

Application Server

Node

Page ABC Page XYZ R ⨝ S IDs [101,200] P1→ID:1-100 P2→ID:101-200 SELECT * FROM R JOIN S ON R.id = S.id

slide-16
SLIDE 16

CMU 15-445/645 (Fall 2019)

Storage

PULL DATA TO Q UERY

14

Node

Application Server

Node

Page ABC Page XYZ R ⨝ S IDs [101,200] P1→ID:1-100 P2→ID:101-200 SELECT * FROM R JOIN S ON R.id = S.id

slide-17
SLIDE 17

CMU 15-445/645 (Fall 2019)

Storage

PULL DATA TO Q UERY

14

Node

Application Server

Node

R ⨝ S IDs [101,200] P1→ID:1-100 P2→ID:101-200 SELECT * FROM R JOIN S ON R.id = S.id Result: R ⨝ S

slide-18
SLIDE 18

CMU 15-445/645 (Fall 2019)

O BSERVATIO N

The data that a node receives from remote sources are cached in the buffer pool.

→ This allows the DBMS to support intermediate results that are large than the amount of memory available. → Ephemeral pages are not persisted after a restart.

What happens to a long-running OLAP query if a node crashes during execution?

15

slide-19
SLIDE 19

CMU 15-445/645 (Fall 2019)

Q UERY FAULT TO LERAN CE

Most shared-nothing distributed OLAP DBMSs are designed to assume that nodes do not fail during query execution.

→ If one node fails during query execution, then the whole query fails.

The DBMS could take a snapshot of the intermediate results for a query during execution to allow it to recover if nodes fail.

16

slide-20
SLIDE 20

CMU 15-445/645 (Fall 2019)

Storage

Q UERY FAULT TO LERAN CE

17

Node

Application Server

Node

R ⨝ S SELECT * FROM R JOIN S ON R.id = S.id Result: R ⨝ S

slide-21
SLIDE 21

CMU 15-445/645 (Fall 2019)

Storage

Q UERY FAULT TO LERAN CE

17

Node

Application Server

Node

SELECT * FROM R JOIN S ON R.id = S.id Result: R ⨝ S

slide-22
SLIDE 22

CMU 15-445/645 (Fall 2019)

Q UERY PLAN N IN G

All the optimizations that we talked about before are still applicable in a distributed environment.

→ Predicate Pushdown → Early Projections → Optimal Join Orderings

Distributed query optimization is even harder because it must consider the location of data in the cluster and data movement costs.

18

slide-23
SLIDE 23

CMU 15-445/645 (Fall 2019)

Q UERY PLAN FRAGM EN TS

Approach #1: Physical Operators

→ Generate a single query plan and then break it up into partition-specific fragments. → Most systems implement this approach.

Approach #2: SQL

→ Rewrite original query into partition-specific queries. → Allows for local optimization at each node. → MemSQL is the only system that I know that does this.

19

slide-24
SLIDE 24

CMU 15-445/645 (Fall 2019)

Q UERY PLAN FRAGM EN TS

20

SELECT * FROM R JOIN S ON R.id = S.id

Id:1-100

SELECT * FROM R JOIN S ON R.id = S.id WHERE R.id BETWEEN 1 AND 100

Id:101-200

SELECT * FROM R JOIN S ON R.id = S.id WHERE R.id BETWEEN 101 AND 200

Id:201-300

SELECT * FROM R JOIN S ON R.id = S.id WHERE R.id BETWEEN 201 AND 300

slide-25
SLIDE 25

CMU 15-445/645 (Fall 2019)

Q UERY PLAN FRAGM EN TS

20

SELECT * FROM R JOIN S ON R.id = S.id

Id:1-100

SELECT * FROM R JOIN S ON R.id = S.id WHERE R.id BETWEEN 1 AND 100

Id:101-200

SELECT * FROM R JOIN S ON R.id = S.id WHERE R.id BETWEEN 101 AND 200

Id:201-300

SELECT * FROM R JOIN S ON R.id = S.id WHERE R.id BETWEEN 201 AND 300

Union the output of each join to produce final result.

slide-26
SLIDE 26

CMU 15-445/645 (Fall 2019)

O BSERVATIO N

The efficiency of a distributed join depends on the target tables' partitioning schemes. One approach is to put entire tables on a single node and then perform the join.

→ You lose the parallelism of a distributed DBMS. → Costly data transfer over the network.

21

slide-27
SLIDE 27

CMU 15-445/645 (Fall 2019)

DISTRIBUTED J O IN ALGO RITH M S

To join tables R and S, the DBMS needs to get the proper tuples on the same node. Once there, it then executes the same join algorithms that we discussed earlier in the semester.

22

slide-28
SLIDE 28

CMU 15-445/645 (Fall 2019)

SCEN ARIO # 1

One table is replicated at every node. Each node joins its local data and then sends their results to a coordinating node.

23

R{Id} S

Id:1-100 Replicated

R{Id} S

Id:101-200 Replicated

SELECT * FROM R JOIN S ON R.id = S.id

P1:R⨝S P2:R⨝S

slide-29
SLIDE 29

CMU 15-445/645 (Fall 2019)

SCEN ARIO # 1

One table is replicated at every node. Each node joins its local data and then sends their results to a coordinating node.

23

R{Id} S

Id:1-100 Replicated

R{Id} S

Id:101-200 Replicated

SELECT * FROM R JOIN S ON R.id = S.id

P1:R⨝S P2:R⨝S R⨝S

slide-30
SLIDE 30

CMU 15-445/645 (Fall 2019)

SCEN ARIO # 2

Tables are partitioned on the join

  • attribute. Each node performs the join
  • n local data and then sends to a node

for coalescing.

24

R{Id} S{Id}

Id:1-100

R{Id} S{Id}

Id:101-200 Id:1-100 Id:101-200

P1:R⨝S P2:R⨝S

SELECT * FROM R JOIN S ON R.id = S.id

slide-31
SLIDE 31

CMU 15-445/645 (Fall 2019)

SCEN ARIO # 2

Tables are partitioned on the join

  • attribute. Each node performs the join
  • n local data and then sends to a node

for coalescing.

24

R{Id} S{Id}

Id:1-100

R{Id} S{Id}

Id:101-200 Id:1-100 Id:101-200

P1:R⨝S P2:R⨝S R⨝S

SELECT * FROM R JOIN S ON R.id = S.id

slide-32
SLIDE 32

CMU 15-445/645 (Fall 2019)

SCEN ARIO # 3

Both tables are partitioned on different keys. If one of the tables is small, then the DBMS broadcasts that table to all nodes.

25

R{Id} S{Val}

Id:1-100

R{Id} S{Val}

Id:101-200 Val:1-50 Val:51-100

SELECT * FROM R JOIN S ON R.id = S.id

slide-33
SLIDE 33

CMU 15-445/645 (Fall 2019)

SCEN ARIO # 3

Both tables are partitioned on different keys. If one of the tables is small, then the DBMS broadcasts that table to all nodes.

25

R{Id} S{Val}

Id:1-100

R{Id} S{Val}

Id:101-200 Val:1-50 Val:51-100

S SELECT * FROM R JOIN S ON R.id = S.id

slide-34
SLIDE 34

CMU 15-445/645 (Fall 2019)

SCEN ARIO # 3

Both tables are partitioned on different keys. If one of the tables is small, then the DBMS broadcasts that table to all nodes.

25

R{Id} S{Val}

Id:1-100

R{Id} S{Val}

Id:101-200 Val:1-50 Val:51-100

S S SELECT * FROM R JOIN S ON R.id = S.id

slide-35
SLIDE 35

CMU 15-445/645 (Fall 2019)

SCEN ARIO # 3

Both tables are partitioned on different keys. If one of the tables is small, then the DBMS broadcasts that table to all nodes.

25

R{Id} S{Val}

Id:1-100

R{Id} S{Val}

Id:101-200 Val:1-50 Val:51-100

S S

P1:R⨝S P2:R⨝S

SELECT * FROM R JOIN S ON R.id = S.id

slide-36
SLIDE 36

CMU 15-445/645 (Fall 2019)

SCEN ARIO # 3

Both tables are partitioned on different keys. If one of the tables is small, then the DBMS broadcasts that table to all nodes.

25

R{Id} S{Val}

Id:1-100

R{Id} S{Val}

Id:101-200 Val:1-50 Val:51-100

S S

P1:R⨝S P2:R⨝S R⨝S

SELECT * FROM R JOIN S ON R.id = S.id

slide-37
SLIDE 37

CMU 15-445/645 (Fall 2019)

SCEN ARIO # 4

Both tables are not partitioned on the join key. The DBMS copies the tables by reshuffling them across nodes.

26

R{Name}

S{Val}

Name:A-M

R{Name}

S{Val}

Name:N-Z Val:1-50 Val:51-100

SELECT * FROM R JOIN S ON R.id = S.id

slide-38
SLIDE 38

CMU 15-445/645 (Fall 2019)

SCEN ARIO # 4

Both tables are not partitioned on the join key. The DBMS copies the tables by reshuffling them across nodes.

26

R{Name}

S{Val}

Name:A-M

R{Name}

S{Val}

Name:N-Z Val:1-50 Val:51-100

R{Id}

Id:101-200

SELECT * FROM R JOIN S ON R.id = S.id

slide-39
SLIDE 39

CMU 15-445/645 (Fall 2019)

SCEN ARIO # 4

Both tables are not partitioned on the join key. The DBMS copies the tables by reshuffling them across nodes.

26

R{Name}

S{Val}

Name:A-M

R{Name}

S{Val}

Name:N-Z Val:1-50 Val:51-100

R{Id}

Id:1-100

R{Id}

Id:101-200

SELECT * FROM R JOIN S ON R.id = S.id

slide-40
SLIDE 40

CMU 15-445/645 (Fall 2019)

SCEN ARIO # 4

Both tables are not partitioned on the join key. The DBMS copies the tables by reshuffling them across nodes.

26

R{Name}

S{Val}

Name:A-M

R{Name}

S{Val}

Name:N-Z Val:1-50 Val:51-100 Id:101-200

S{Id} R{Id}

Id:1-100

R{Id}

Id:101-200

SELECT * FROM R JOIN S ON R.id = S.id

slide-41
SLIDE 41

CMU 15-445/645 (Fall 2019)

SCEN ARIO # 4

Both tables are not partitioned on the join key. The DBMS copies the tables by reshuffling them across nodes.

26

R{Name}

S{Val}

Name:A-M

R{Name}

S{Val}

Name:N-Z Val:1-50 Val:51-100 Id:1-100

S{Id}

Id:101-200

S{Id} R{Id}

Id:1-100

R{Id}

Id:101-200

SELECT * FROM R JOIN S ON R.id = S.id

slide-42
SLIDE 42

CMU 15-445/645 (Fall 2019)

SCEN ARIO # 4

Both tables are not partitioned on the join key. The DBMS copies the tables by reshuffling them across nodes.

26

R{Name}

S{Val}

Name:A-M

R{Name}

S{Val}

Name:N-Z Val:1-50 Val:51-100 Id:1-100

S{Id}

Id:101-200

S{Id}

P1:R⨝S P2:R⨝S

R{Id}

Id:1-100

R{Id}

Id:101-200

SELECT * FROM R JOIN S ON R.id = S.id

slide-43
SLIDE 43

CMU 15-445/645 (Fall 2019)

SCEN ARIO # 4

Both tables are not partitioned on the join key. The DBMS copies the tables by reshuffling them across nodes.

26

R{Name}

S{Val}

Name:A-M

R{Name}

S{Val}

Name:N-Z Val:1-50 Val:51-100 Id:1-100

S{Id}

Id:101-200

S{Id}

P1:R⨝S P2:R⨝S R⨝S

R{Id}

Id:1-100

R{Id}

Id:101-200

SELECT * FROM R JOIN S ON R.id = S.id

slide-44
SLIDE 44

CMU 15-445/645 (Fall 2019)

SEM I- J O IN

Join operator where the result only contains columns from the left table. Distributed DBMSs use semi-join to minimize the amount of data sent during joins.

→ This is like a projection pushdown.

Some DBMSs support SEMI JOIN SQL syntax. Otherwise you fake it with EXISTS.

27

SELECT R.id FROM R LEFT OUTER JOIN S ON R.id = S.id WHERE R.id IS NOT NULL

R S S

slide-45
SLIDE 45

CMU 15-445/645 (Fall 2019)

SEM I- J O IN

Join operator where the result only contains columns from the left table. Distributed DBMSs use semi-join to minimize the amount of data sent during joins.

→ This is like a projection pushdown.

Some DBMSs support SEMI JOIN SQL syntax. Otherwise you fake it with EXISTS.

27

SELECT R.id FROM R LEFT OUTER JOIN S ON R.id = S.id WHERE R.id IS NOT NULL

R S R

slide-46
SLIDE 46

CMU 15-445/645 (Fall 2019)

SEM I- J O IN

Join operator where the result only contains columns from the left table. Distributed DBMSs use semi-join to minimize the amount of data sent during joins.

→ This is like a projection pushdown.

Some DBMSs support SEMI JOIN SQL syntax. Otherwise you fake it with EXISTS.

27

SELECT R.id FROM R LEFT OUTER JOIN S ON R.id = S.id WHERE R.id IS NOT NULL

R S

SELECT R.id FROM R WHERE EXISTS ( SELECT 1 FROM S WHERE R.id = S.id)

R.id

R.id

slide-47
SLIDE 47

CMU 15-445/645 (Fall 2019)

SEM I- J O IN

Join operator where the result only contains columns from the left table. Distributed DBMSs use semi-join to minimize the amount of data sent during joins.

→ This is like a projection pushdown.

Some DBMSs support SEMI JOIN SQL syntax. Otherwise you fake it with EXISTS.

27

SELECT R.id FROM R LEFT OUTER JOIN S ON R.id = S.id WHERE R.id IS NOT NULL

R S

SELECT R.id FROM R WHERE EXISTS ( SELECT 1 FROM S WHERE R.id = S.id)

R.id

R.id

slide-48
SLIDE 48

CMU 15-445/645 (Fall 2019)

RELATIO N AL ALGEBRA: SEM I - J O IN

Like a natural join except that the attributes on the right table that are not used to compute the join are restricted. Syntax: (R⋉ S)

28

a_id b_id xxx a1 101 X1 a2 102 X2 a3 103 X3

R(a_id,b_id,xxx) S(a_id,b_id,yyy)

a_id b_id yyy a3 103 Y1 a4 104 Y2 a5 105 Y3

(R ⋉ S)

a_id b_id xxx a3 103 X3

slide-49
SLIDE 49

CMU 15-445/645 (Fall 2019)

CLO UD SYSTEM S

Vendors provide database-as-a-service (DBaaS)

  • fferings that are managed DBMS environments.

Newer systems are starting to blur the lines between shared-nothing and shared-disk.

→ Example: You can do simple filtering on Amazon S3 before copying data to compute nodes.

29

slide-50
SLIDE 50

CMU 15-445/645 (Fall 2019)

CLO UD SYSTEM S

Approach #1: Managed DBMSs

→ No significant modification to the DBMS to be "aware" that it is running in a cloud environment. → Examples: Most vendors

Approach #2: Cloud-Native DBMS

→ The system is designed explicitly to run in a cloud environment. → Usually based on a shared-disk architecture. → Examples: Snowflake, Google BigQuery, Amazon Redshift, Microsoft SQL Azure

30

slide-51
SLIDE 51

CMU 15-445/645 (Fall 2019)

SERVERLESS DATABASES

Rather than always maintaining compute resources for each customer, a "serverless" DBMS evicts tenants when they become idle.

31

Application Server

Node

slide-52
SLIDE 52

CMU 15-445/645 (Fall 2019)

SERVERLESS DATABASES

Rather than always maintaining compute resources for each customer, a "serverless" DBMS evicts tenants when they become idle.

31

Application Server

Node

slide-53
SLIDE 53

CMU 15-445/645 (Fall 2019)

SERVERLESS DATABASES

Rather than always maintaining compute resources for each customer, a "serverless" DBMS evicts tenants when they become idle.

31

Application Server

Node Storage

slide-54
SLIDE 54

CMU 15-445/645 (Fall 2019)

SERVERLESS DATABASES

Rather than always maintaining compute resources for each customer, a "serverless" DBMS evicts tenants when they become idle.

31

Application Server

Node Storage

Buffer Pool Page Table

slide-55
SLIDE 55

CMU 15-445/645 (Fall 2019)

SERVERLESS DATABASES

Rather than always maintaining compute resources for each customer, a "serverless" DBMS evicts tenants when they become idle.

31

Application Server

Storage

slide-56
SLIDE 56

CMU 15-445/645 (Fall 2019)

SERVERLESS DATABASES

Rather than always maintaining compute resources for each customer, a "serverless" DBMS evicts tenants when they become idle.

31

Application Server

Node Storage

Buffer Pool Page Table

slide-57
SLIDE 57

CMU 15-445/645 (Fall 2019)

DISAGGREGATED CO M PO N EN TS

System Catalogs

→ HCatalog, Google Data Catalog, Amazon Glue Data Catalog

Node Management

→ Kubernetes, Apache YARN, Cloud Vendor Tools

Query Optimizers

→ Greenplum Orca, Apache Calcite

32

slide-58
SLIDE 58

CMU 15-445/645 (Fall 2019)

UN IVERSAL FO RM ATS

Most DBMSs use a proprietary on-disk binary file format for their databases.

→ Think of the BusTub page types…

The only way to share data between systems is to convert data into a common text-based format

→ Examples: CSV, JSON, XML

There are new open-source binary file formats that make it easier to access data across systems.

33

slide-59
SLIDE 59

CMU 15-445/645 (Fall 2019)

UN IVERSAL FO RM ATS

Apache Parquet

→ Compressed columnar storage from Cloudera/Twitter

Apache ORC

→ Compressed columnar storage from Apache Hive.

Apache CarbonData

→ Compressed columnar storage with indexes from Huawei.

34

Apache Iceberg

→ Flexible data format that supports schema evolution from Netflix.

HDF5

→ Multi-dimensional arrays for scientific workloads.

Apache Arrow

→ In-memory compressed columnar storage from Pandas/Dremio.

slide-60
SLIDE 60

CMU 15-445/645 (Fall 2019)

CO N CLUSIO N

More money, more data, more problems… Cloud OLAP Vendors: On-Premise OLAP Systems:

35

slide-61
SLIDE 61

CMU 15-445/645 (Fall 2019)

N EXT CLASS

Oracle Guest Speaker

36