SLIDE 1

Intro to Database Systems (15-445/15-645), Fall 2019
Andy Pavlo, Computer Science, Carnegie Mellon University

Lecture #23: Distributed OLTP Databases

SLIDE 2

ADMINISTRIVIA

Homework #5: Monday Dec 3rd @ 11:59pm
Project #4: Monday Dec 10th @ 11:59pm
Extra Credit: Wednesday Dec 10th @ 11:59pm
Final Exam: Monday Dec 9th @ 5:30pm

SLIDE 3

LAST CLASS

System Architectures
→ Shared-Memory, Shared-Disk, Shared-Nothing

Partitioning/Sharding
→ Hash, Range, Round Robin

Transaction Coordination
→ Centralized vs. Decentralized

SLIDE 4

OLTP VS. OLAP

On-line Transaction Processing (OLTP):
→ Short-lived read/write txns.
→ Small footprint.
→ Repetitive operations.

On-line Analytical Processing (OLAP):
→ Long-running, read-only queries.
→ Complex joins.
→ Exploratory queries.

SLIDES 5-7

DECENTRALIZED COORDINATOR

[Diagram: an application server executes a txn against partitions P1-P4. It sends a Begin Request, issues queries to the partitions holding the data it needs, and finally sends a Commit Request asking whether it is safe to commit.]

SLIDE 8

OBSERVATION

We have not discussed how to ensure that all nodes agree to commit a txn, and then how to make sure the txn actually commits once we decide that it should.
→ What happens if a node fails?
→ What happens if our messages show up late?
→ What happens if we don't wait for every node to agree?

SLIDE 9

IMPORTANT ASSUMPTION

We can assume that all nodes in a distributed DBMS are well-behaved and under the same administrative domain.
→ If we tell a node to commit a txn, then it will commit the txn (if there is no failure).

If you do not trust the other nodes in a distributed DBMS, then you need to use a Byzantine Fault Tolerant protocol for txns (e.g., blockchain).

SLIDE 10

TODAY'S AGENDA

→ Atomic Commit Protocols
→ Replication
→ Consistency Issues (CAP)
→ Federated Databases

SLIDE 11

ATOMIC COMMIT PROTOCOL

When a multi-node txn finishes, the DBMS needs to ask all the nodes involved whether it is safe to commit. Examples:
→ Two-Phase Commit
→ Three-Phase Commit (not used)
→ Paxos
→ Raft
→ ZAB (Apache Zookeeper)
→ Viewstamped Replication

SLIDES 12-17

TWO-PHASE COMMIT (SUCCESS)

[Diagram: the application server sends a Commit Request to the coordinator (Node 1). Phase 1: the coordinator sends Prepare to the participants (Node 2, Node 3), and each votes OK. Phase 2: the coordinator sends Commit, each participant replies OK, and the coordinator returns Success! to the application server.]
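To make the two phases concrete, here is a minimal sketch of the coordinator's side of the protocol in Python. The participant stubs (prepare/commit/abort) are illustrative assumptions, not an API from the lecture, and a real coordinator would also log its decision to non-volatile storage before starting Phase 2 (see Slide 31).

    # Minimal 2PC coordinator sketch (hypothetical RPC stubs, not a real API).
    # Each participant exposes prepare()/commit()/abort(); prepare() returns
    # True for a YES vote and raises (or returns False) on failure/timeout.

    def two_phase_commit(txn_id, participants):
        # Phase 1: ask every participant whether it can commit.
        try:
            votes = [p.prepare(txn_id) for p in participants]
        except Exception:
            votes = [False]              # a crash or timeout counts as a NO vote

        if all(votes):
            # A real DBMS force-logs the COMMIT decision here before Phase 2.
            for p in participants:
                p.commit(txn_id)         # Phase 2: everyone voted YES
            return "COMMIT"
        for p in participants:
            p.abort(txn_id)              # Phase 2: at least one NO vote
        return "ABORT"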

SLIDES 18-23

TWO-PHASE COMMIT (ABORT)

[Diagram: the application server sends a Commit Request to the coordinator (Node 1). Phase 1: the coordinator sends Prepare, but one participant votes ABORT!. The coordinator immediately reports Aborted to the application server, then runs Phase 2: it sends Abort to the participants, which acknowledge with OK.]

SLIDE 24

2PC OPTIMIZATIONS

Early Prepare Voting
→ If you send a query to a remote node that you know will be the last one you execute there, then that node will also return its vote for the prepare phase with the query result.

Early Acknowledgement After Prepare
→ If all nodes vote to commit a txn, the coordinator can send the client an acknowledgement that the txn was successful before the commit phase finishes.

SLIDES 25-30

EARLY ACKNOWLEDGEMENT

[Diagram: the application server sends a Commit Request to the coordinator (Node 1). Phase 1: the coordinator sends Prepare to the participants (Node 2, Node 3). As soon as both vote OK, the coordinator returns Success! to the application server, and only then runs Phase 2 (Commit), collecting the participants' OK acknowledgements.]
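Under the same hypothetical stubs as the earlier 2PC sketch, the only change is where the client acknowledgement happens: once every participant has voted YES, the outcome can no longer be an abort, so it is safe to answer the client before Phase 2 finishes. The client.ack()/client.fail() hooks are assumed notification callbacks.

    # Early-acknowledgement variant of the 2PC coordinator sketch above.

    def two_phase_commit_early_ack(txn_id, participants, client):
        if all(p.prepare(txn_id) for p in participants):   # Phase 1
            client.ack(txn_id)        # safe: with all YES votes in, the
                                      # outcome is guaranteed to be COMMIT
            for p in participants:
                p.commit(txn_id)      # Phase 2 can finish in the background
        else:
            for p in participants:
                p.abort(txn_id)
            client.fail(txn_id)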

SLIDE 31

TWO-PHASE COMMIT

Each node records the outcome of each phase in a non-volatile storage log.

What happens if the coordinator crashes?
→ Participants must decide what to do.

What happens if a participant crashes?
→ The coordinator assumes that it responded with an abort if it has not sent an acknowledgement yet.

SLIDE 32

PAXOS

Consensus protocol where a coordinator proposes an outcome (e.g., commit or abort) and then the participants vote on whether that outcome should succeed.

Does not block if a majority of participants are available, and has provably minimal message delays in the best case.

SLIDES 33-39

PAXOS

[Diagram: the application server sends a Commit Request to the proposer (Node 1). The proposer sends Propose to the acceptors (Nodes 2-4). Node 3 has failed (X), but the other two acceptors reply Agree, which is a majority. The proposer then sends Commit, the live acceptors reply Accept, and the proposer returns Success! to the application server.]

SLIDES 40-48

PAXOS

[Diagram: timeline with two competing proposers and a set of acceptors. Proposer 1 sends Propose(n) and the acceptors reply Agree(n). Proposer 2 then sends Propose(n+1), so when Proposer 1 later sends Commit(n) the acceptors reply Reject(n, n+1). The acceptors Agree(n+1), Proposer 2 sends Commit(n+1), and the acceptors reply Accept(n+1).]
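A minimal sketch of an acceptor's state machine helps explain the Reject above. It uses the slide's message names (Propose/Agree/Commit/Accept/Reject) rather than the standard Prepare/Promise/Accept terminology, and the class and field names are assumptions for illustration, not the lecture's code.

    # Single-decree Paxos acceptor sketch. Ballot numbers n must be
    # globally unique and totally ordered across proposers.

    class Acceptor:
        def __init__(self):
            self.promised_n = -1        # highest ballot we agreed to honor
            self.accepted_n = -1        # ballot of the last value we accepted
            self.accepted_value = None

        def on_propose(self, n):
            # Agree only to ballots newer than anything already promised;
            # this is why Commit(n) is rejected once Propose(n+1) arrives.
            if n > self.promised_n:
                self.promised_n = n
                # Report any previously accepted value so the new proposer
                # must carry it forward instead of choosing its own.
                return ("AGREE", self.accepted_n, self.accepted_value)
            return ("REJECT", self.promised_n, None)

        def on_commit(self, n, value):
            if n >= self.promised_n:    # still the newest ballot we know of
                self.promised_n = n
                self.accepted_n = n
                self.accepted_value = value
                return "ACCEPT"
            return "REJECT"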

SLIDE 49

MULTI-PAXOS

If the system elects a single leader that is in charge of proposing changes for some period of time, then it can skip the Propose phase.
→ Fall back to full Paxos whenever there is a failure.

The system periodically renews who the leader is using another Paxos round.

SLIDE 50

2PC VS. PAXOS

Two-Phase Commit
→ Blocks if the coordinator fails after the prepare message is sent, until the coordinator recovers.

Paxos
→ Non-blocking if a majority of participants are alive, provided there is a sufficiently long period without further failures.

SLIDE 51

REPLICATION

The DBMS can replicate data across redundant nodes to increase availability.

Design Decisions:
→ Replica Configuration
→ Propagation Scheme
→ Propagation Timing
→ Update Method

SLIDE 52

REPLICA CONFIGURATIONS

Approach #1: Master-Replica
→ All updates go to a designated master for each object.
→ The master propagates updates to its replicas without an atomic commit protocol.
→ Read-only txns may be allowed to access replicas.
→ If the master goes down, then hold an election to select a new master.

Approach #2: Multi-Master
→ Txns can update data objects at any replica.
→ Replicas must synchronize with each other using an atomic commit protocol.

SLIDE 53

REPLICA CONFIGURATIONS

[Diagram: Master-Replica: writes go to the master copy of P1, which propagates updates to its replicas; reads can be served by the master or the replicas. Multi-Master: Node 1 and Node 2 each hold a writable copy of P1, and each node accepts both writes and reads.]

SLIDE 54

K-SAFETY

K-safety is a threshold for determining the fault tolerance of the replicated database.

The value K represents the number of replicas per data object that must always be available.

If the number of replicas goes below this threshold, then the DBMS halts execution and takes itself offline.
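As a toy illustration of the rule (the names and structure are assumed for this sketch, not taken from the lecture):

    # Toy K-safety check: if any object has fewer than K live replicas,
    # the DBMS refuses to keep running.
    K = 2

    def enforce_k_safety(live_replicas):     # {object_id: live replica count}
        if any(count < K for count in live_replicas.values()):
            raise SystemExit("K-safety violated: DBMS halting, going offline")

    enforce_k_safety({"P1": 3, "P2": 2})     # OK: both objects have >= K replicas
    # enforce_k_safety({"P1": 1, "P2": 2})   # would halt: P1 is below K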

SLIDE 55

PROPAGATION SCHEME

When a txn commits on a replicated database, the DBMS decides whether it must wait for that txn's changes to propagate to other nodes before it can send the acknowledgement to the application.

Propagation levels:
→ Synchronous (Strong Consistency)
→ Asynchronous (Eventual Consistency)

SLIDES 56-58

PROPAGATION SCHEME

Approach #1: Synchronous
→ The master sends updates to replicas and then waits for them to acknowledge that they fully applied (i.e., logged) the changes.

Approach #2: Asynchronous
→ The master immediately returns the acknowledgement to the client without waiting for replicas to apply the changes.

[Diagram: synchronous: the master receives Commit?, sends Flush? to the replicas, waits for their Acks, then flushes and acknowledges the client. Asynchronous: the master acknowledges the client's Commit? immediately and sends Flush? to the replicas in the background.]
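A sketch of the difference in commit paths, with assumed replica.flush()/client.ack() hooks standing in for the real log-shipping machinery:

    # Synchronous vs. asynchronous propagation (illustrative hooks only).

    def commit_synchronous(txn, replicas, client):
        # Strong consistency: wait until every replica has logged the changes.
        acks = [r.flush(txn.log_records) for r in replicas]
        if all(acks):
            client.ack(txn.id)

    def commit_asynchronous(txn, replicas, client):
        # Eventual consistency: acknowledge first, propagate in the background.
        client.ack(txn.id)
        for r in replicas:
            r.flush_async(txn.log_records)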

SLIDE 59

PROPAGATION TIMING

Approach #1: Continuous
→ The DBMS sends log messages immediately as it generates them.
→ Also needs to send a commit/abort message.

Approach #2: On Commit
→ The DBMS only sends the log messages for a txn to the replicas once the txn commits.
→ Does not waste time sending log records for aborted txns.
→ Assumes that a txn's log records fit entirely in memory.

SLIDE 60

ACTIVE VS. PASSIVE

Approach #1: Active-Active
→ A txn executes at each replica independently.
→ Need to check at the end whether the txn ends up with the same result at each replica.

Approach #2: Active-Passive
→ Each txn executes at a single location and propagates the changes to the replicas.
→ Can either do physical or logical replication.
→ Not the same as master-replica vs. multi-master.

SLIDE 61

CAP THEOREM

Proposed by Eric Brewer: it is impossible for a distributed system to always be:
→ Consistent
→ Always Available
→ Network Partition Tolerant

Proved in 2002. Pick two! Sort of…

SLIDE 62

CAP THEOREM

C (Consistency): Linearizability.
A (Availability): All up nodes can satisfy all requests.
P (Partition Tolerant): Still operate correctly despite message loss.

All three at once: impossible.

SLIDES 63-68

CAP CONSISTENCY

[Diagram: an application server sends Set A=2 to the master, which holds A=1, B=8. The master applies A=2, propagates it over the network to the replica, and ACKs the client only once the replica also has A=2. A second application server then issues Read A against the replica and gets A=2.]

If the master says the txn committed, then it should be immediately visible on replicas.

SLIDES 69-73

CAP AVAILABILITY

[Diagram: the master becomes unreachable (X) while both nodes hold A=1, B=8. The application servers keep reading from the replica: Read B returns B=8 and Read A returns A=1. All up nodes continue to satisfy requests.]

SLIDES 74-79

CAP PARTITION TOLERANCE

[Diagram: a network partition splits the two nodes, and each side operates as a master. One application server executes Set A=2 on the first master while the other executes Set A=3 on the second; both receive an ACK. The copies have now diverged (A=2 vs. A=3) and must somehow be reconciled when the network heals.]

SLIDE 80

CAP FOR OLTP DBMSs

How a DBMS handles failures determines which elements of the CAP theorem it supports.

Traditional/NewSQL DBMSs
→ Stop allowing updates until a majority of nodes are reconnected.

NoSQL DBMSs
→ Provide mechanisms to resolve conflicts after nodes are reconnected.

SLIDE 81

OBSERVATION

We have assumed that the nodes in our distributed systems are running the same DBMS software. But organizations often run many different DBMSs in their applications. It would be nice if we could have a single interface for all our data.

SLIDE 82

FEDERATED DATABASES

Distributed architecture that connects together multiple DBMSs into a single logical system. A query can access data at any location.

This is hard, and nobody does it well:
→ Different data models, query languages, limitations.
→ No easy way to optimize queries.
→ Lots of data copying (bad).

SLIDES 83-84

FEDERATED DATABASE EXAMPLE

[Diagram: two designs. (1) Middleware: the application server sends query requests to a middleware layer, which uses connectors to dispatch sub-queries to the back-end DBMSs. (2) Foreign Data Wrappers: the application server queries a single DBMS directly, and that DBMS uses foreign data wrappers (connectors) to pull data from the other back-end DBMSs.]
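As a toy illustration of the middleware design, here is a sketch where two sqlite files stand in for two independent back-end DBMSs. The routing-by-table scheme and all names are assumptions; a real federation layer would parse queries, consult a global catalog, and handle cross-backend joins.

    import sqlite3

    # Toy federated "middleware": route each query to the back-end that
    # owns the table it touches.

    class FederatedMiddleware:
        def __init__(self, backends):
            # backends: table name -> location of the DBMS that owns it
            self.conns = {t: sqlite3.connect(db) for t, db in backends.items()}

        def query(self, table, sql, params=()):
            # The caller names the owning table; a real system would infer
            # this by parsing the SQL.
            return self.conns[table].execute(sql, params).fetchall()

    mw = FederatedMiddleware({"orders": "sales.db", "users": "accounts.db"})
    # mw.query("orders", "SELECT * FROM orders WHERE total > ?", (100,))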

SLIDE 85

CONCLUSION

We assumed that the nodes in our distributed DBMS are friendly. Blockchain databases assume that the nodes are adversarial, which means you must use different protocols to commit transactions.

SLIDE 86

NEXT CLASS

Distributed OLAP Systems