

SLIDE 1

Intro to Database Systems (15-445/15-645), Fall 2019
Andy Pavlo, Computer Science, Carnegie Mellon University

Lecture #20: Logging Schemes

SLIDE 2

CMU 15-445/645 (Fall 2019)

ADMINISTRIVIA

Homework #4 is due Wed Nov 13th @ 11:59pm.
Project #3 is due Sun Nov 17th @ 11:59pm.
Extra Credit Checkpoint is due Sun Nov 24th.


SLIDE 3

MOTIVATION

[Figure: schedule for txn T1 (BEGIN, R(A), W(A), ⋮, COMMIT) alongside the buffer pool: T1 updates A=1 to A=2 on its in-memory copy of the page while the disk page still holds A=1.]

SLIDE 4

CRASH RECOVERY

Recovery algorithms are techniques to ensure database consistency, transaction atomicity, and durability despite failures.

Recovery algorithms have two parts:
→ Actions during normal txn processing to ensure that the DBMS can recover from a failure.
→ Actions after a failure to recover the database to a state that ensures atomicity, consistency, and durability.

(Today's focus: actions during normal txn processing.)

SLIDE 5

→ Failure Classification
→ Buffer Pool Policies
→ Shadow Paging
→ Write-Ahead Log
→ Logging Schemes
→ Checkpoints

SLIDE 6

CRASH RECOVERY

The DBMS is divided into different components based on the underlying storage device.

We must also classify the different types of failures that the DBMS needs to handle.

SLIDE 7

FAILURE CLASSIFICATION

Type #1 – Transaction Failures
Type #2 – System Failures
Type #3 – Storage Media Failures

SLIDE 8

TRANSACTION FAILURES

Logical Errors:

→ Transaction cannot complete due to some internal error condition (e.g., integrity constraint violation).

Internal State Errors:

→ DBMS must terminate an active transaction due to an error condition (e.g., deadlock).


SLIDE 9

SYSTEM FAILURES

Software Failure:

→ Problem with the DBMS implementation (e.g., uncaught divide-by-zero exception).

Hardware Failure:

→ The computer hosting the DBMS crashes (e.g., power plug gets pulled).
→ Fail-stop Assumption: Non-volatile storage contents are assumed to not be corrupted by a system crash.

SLIDE 10

STORAGE MEDIA FAILURES

Non-Repairable Hardware Failure:

→ A head crash or similar disk failure destroys all or part of non-volatile storage.
→ Destruction is assumed to be detectable (e.g., the disk controller uses checksums to detect failures).

No DBMS can recover from this! Database must be restored from archived version.


SLIDE 11

OBSERVATION

The primary storage location of the database is on non-volatile storage, but this is much slower than volatile storage. Use volatile memory for faster access:

→ First copy the target record into memory.
→ Perform the writes in memory.
→ Write dirty records back to disk.

SLIDE 12

OBSERVATION

The DBMS needs to ensure the following guarantees:

→ The changes for any txn are durable once the DBMS has told somebody that it committed.
→ No partial changes are durable if the txn aborted.

SLIDE 13

UNDO VS. REDO

Undo: The process of removing the effects of an incomplete or aborted txn.

Redo: The process of re-instating the effects of a committed txn for durability.

How the DBMS supports this functionality depends on how it manages the buffer pool…

SLIDE 14

BUFFER POOL

[Figure: schedule for T1 (BEGIN, R(A), W(A), ⋮, ABORT) and T2 (BEGIN, R(B), W(B), COMMIT) above a buffer pool; the disk pages hold A=1, B=9, C=7 while in memory T1 has written A=3 and T2 has written B=8.]

Do we force T2's changes to be written to disk?
Is T1 allowed to overwrite A even though it has not committed?
What happens when we need to rollback T1?

SLIDE 15

STEAL POLICY

Whether the DBMS allows an uncommitted txn to overwrite the most recent committed value of an object in non-volatile storage.

STEAL: Is allowed.
NO-STEAL: Is not allowed.

SLIDE 16

FORCE POLICY

Whether the DBMS requires that all updates made by a txn are reflected on non-volatile storage before the txn is allowed to commit.

FORCE: Is required.
NO-FORCE: Is not required.

SLIDE 17

NO-STEAL + FORCE

[Figure: the same schedule, T1 (BEGIN, R(A), W(A), ⋮, ABORT) and T2 (BEGIN, R(B), W(B), COMMIT), over the buffer pool. FORCE means that T2's changes must be written to disk at commit. NO-STEAL means that T1's changes cannot be written to disk yet, so the DBMS copies the page and writes back only B=8. Now it's trivial to rollback T1.]

SLIDE 18

NO-STEAL + FORCE

This approach is the easiest to implement:
→ Never have to undo changes of an aborted txn because the changes were not written to disk.
→ Never have to redo changes of a committed txn because all the changes are guaranteed to be written to disk at commit time (assuming atomic hardware writes).

But this approach cannot support write sets that exceed the amount of physical memory available.

SLIDE 19

SHADOW PAGING

Maintain two separate copies of the database:
→ Master: Contains only changes from committed txns.
→ Shadow: Temporary database with changes made from uncommitted txns.

Txns only make updates in the shadow copy. When a txn commits, atomically switch the shadow to become the new master.

Buffer Pool Policy: NO-STEAL + FORCE

SLIDE 20

SHADOW PAGING

Instead of copying the entire database, the DBMS copies pages on write.

Organize the database pages in a tree structure where the root is a single disk page. There are two copies of the tree, the master and the shadow:
→ The root points to the master copy.
→ Updates are applied to the shadow copy.

SLIDE 21

SHADOW PAGING EXAMPLE

[Figure: on disk, the database root points to the master page table; entries 1–4 of the table point to database pages.]

SLIDE 22

SHADOW PAGING

To install the updates, overwrite the root so it points to the shadow, thereby swapping the master and shadow:
→ Before overwriting the root, none of the txn's updates are part of the disk-resident database.
→ After overwriting the root, all of the txn's updates are part of the disk-resident database.

Source: The Great Phil Bernstein

SLIDE 23

SHADOW PAGING EXAMPLE

[Figure: the txn copies the master page table into a shadow page table and applies its updates to copies of pages 1–4. Read-only txns access the current master; the active modifying txn updates the shadow pages. On COMMIT, the database root is overwritten to point to the shadow page table.]

SLIDE 24

SHADOW PAGING UNDO/REDO

Supporting rollbacks and recovery is easy.

Undo: Remove the shadow pages. Leave the master and the DB root pointer alone.
Redo: Not needed at all.
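The commit-by-pointer-swap and abort-by-discard behavior described above can be sketched in a few lines. This is a toy in-memory model, not any real DBMS's implementation; the class and method names are illustrative assumptions.

```python
# Minimal shadow-paging sketch: the "database" is a page table plus a root
# pointer. Updates go only to a shadow copy of the table; commit atomically
# repoints the root; abort simply discards the shadow pages.

class ShadowDB:
    def __init__(self, pages):
        self.tables = {"A": dict(pages)}   # master page table lives at key "A"
        self.root = "A"                    # DB root points at the master

    def begin(self):
        # Start of a txn: the shadow begins as a copy of the master table.
        self.shadow = "B" if self.root == "A" else "A"
        self.tables[self.shadow] = dict(self.tables[self.root])

    def write(self, page_id, value):
        self.tables[self.shadow][page_id] = value   # only the shadow changes

    def read(self, page_id):
        return self.tables[self.root][page_id]      # readers follow the root

    def commit(self):
        self.root = self.shadow   # single atomic pointer swap installs updates

    def abort(self):
        del self.tables[self.shadow]   # undo = remove the shadow pages
```

Note that `commit` is the only step that must be atomic: everything before it touches only the shadow copy, so a crash at any earlier point leaves the master untouched.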

SLIDE 25

SHADOW PAGING DISADVANTAGES

Copying the entire page table is expensive:
→ Use a page table structured like a B+tree.
→ No need to copy the entire tree, only the paths in the tree that lead to updated leaf nodes.

Commit overhead is high:
→ Flush every updated page, page table, and root.
→ Data gets fragmented.
→ Need garbage collection.
→ Only supports one writer txn at a time or txns in a batch.

SLIDE 26

SQLITE (PRE-2010)

When a txn modifies a page, the DBMS copies the original page to a separate journal file before overwriting the master version.

After restarting, if a journal file exists, then the DBMS restores it to undo changes from uncommitted txns.

[Figure: pages 1–6 on disk; the DBMS copies the original Page 2 and Page 3 into the journal file before writing the modified Page 2′ and Page 3′ back to the database file.]
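The rollback-journal mechanism above can be sketched as follows. This is a minimal model in the spirit of the description, not SQLite's actual file format or API; the function names and dict-based "files" are illustrative assumptions.

```python
# Rollback-journal sketch: before overwriting a page in the database "file",
# save its before-image to the journal; crash recovery replays the journal
# to undo changes from uncommitted txns.

def write_page(db, journal, page_id, new_value):
    if page_id not in journal:        # journal the original page only once
        journal[page_id] = db[page_id]
    db[page_id] = new_value           # then overwrite the master version

def commit(journal):
    journal.clear()                   # deleting the journal commits the txn

def recover(db, journal):
    for page_id, original in journal.items():
        db[page_id] = original        # restore before-images = undo
    journal.clear()
```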

SLIDE 27

OBSERVATION

Shadow paging requires the DBMS to perform writes to random non-contiguous pages on disk. We need a way for the DBMS to convert random writes into sequential writes.

SLIDE 28

WRITE-AHEAD LOG

Maintain a log file separate from data files that contains the changes that txns make to the database.
→ Assume that the log is on stable storage.
→ The log contains enough information to perform the necessary undo and redo actions to restore the database.

The DBMS must write to disk the log records that correspond to changes made to a database object before it can flush that object to disk.

Buffer Pool Policy: STEAL + NO-FORCE
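The "log before data" rule can be sketched with the usual pageLSN/flushedLSN bookkeeping: track the log sequence number of each page's latest update and allow the page to be flushed only once the log has been flushed at least that far. The class and method names here are illustrative assumptions, not any real DBMS's API.

```python
# WAL flush-ordering sketch: a dirty page may be written back only after
# every log record describing its changes is durable (pageLSN <= flushedLSN).

class WALBufferPool:
    def __init__(self):
        self.flushed_lsn = 0      # highest LSN durable in the on-disk log
        self.page_lsn = {}        # page_id -> LSN of the page's latest update

    def log_update(self, page_id, lsn):
        self.page_lsn[page_id] = lsn          # record generated in WAL buffer

    def flush_log(self, up_to_lsn):
        self.flushed_lsn = max(self.flushed_lsn, up_to_lsn)

    def can_flush_page(self, page_id):
        # The WAL invariant the slide states: log records first, page second.
        return self.page_lsn.get(page_id, 0) <= self.flushed_lsn
```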

SLIDE 29

WAL PROTOCOL

The DBMS stages all of a txn's log records in volatile storage (usually backed by the buffer pool).

All log records pertaining to an updated page are written to non-volatile storage before the page itself is overwritten in non-volatile storage.

A txn is not considered committed until all its log records have been written to stable storage.

SLIDE 30

WAL PROTOCOL

Write a <BEGIN> record to the log for each txn to mark its starting point.

When a txn finishes, the DBMS will:
→ Write a <COMMIT> record to the log.
→ Make sure that all log records are flushed before it returns an acknowledgement to the application.

SLIDE 31

WAL PROTOCOL

Each log entry contains information about the change to a single object:
→ Transaction Id
→ Object Id
→ Before Value (UNDO)
→ After Value (REDO)
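A record with these four fields, together with the undo/redo actions that consume the before/after images, might look like the following. This is a toy sketch; the class and field names are illustrative, not any real DBMS's log format.

```python
# Log-record sketch: txn id, object id, before-image (UNDO), after-image (REDO).

from dataclasses import dataclass

@dataclass
class LogRecord:
    txn_id: int
    object_id: str
    before: object   # value to restore when undoing
    after: object    # value to re-apply when redoing

def undo(db, rec):
    db[rec.object_id] = rec.before   # roll the object back to its old value

def redo(db, rec):
    db[rec.object_id] = rec.after    # re-apply the change for durability
```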

SLIDE 32

WAL EXAMPLE

[Figure: T1's schedule (BEGIN, W(A), W(B), ⋮, COMMIT) next to the WAL buffer and buffer pool. The WAL buffer collects <T1 BEGIN>, <T1, A, 1, 8>, <T1, B, 5, 9>, <T1 COMMIT>. Once these records are flushed to the log on disk, the txn result is safe to return to the application, even though the dirty pages (A=8, B=9) have not yet been written back. Everything we need to restore T1 is in the log!]

SLIDE 33

WAL IMPLEMENTATION

When should the DBMS write log entries to disk?
→ When the transaction commits.
→ Can use group commit to batch multiple log flushes together to amortize overhead.

SLIDE 34

WAL GROUP COMMIT

[Figure: T1 (BEGIN, W(A), W(B), ⋮, COMMIT) and T2 (BEGIN, W(C), W(D), ⋮, COMMIT) run concurrently. Their log records (<T1 BEGIN>, <T1, A, 1, 8>, <T1, B, 5, 9>, <T2 BEGIN>, <T2, C, 1, 2>, <T2, D, 3, 4>) accumulate in a WAL buffer that is flushed as a group: flush the buffer when it is full, or after an elapsed amount of time.]
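The two flush triggers above (buffer full, timer elapsed) can be sketched as a small in-memory buffer that amortizes one "sync" across many records. The class name, capacity, and interval are illustrative assumptions.

```python
# Group-commit sketch: buffer log records in memory and flush when the buffer
# fills or a timeout elapses, so one disk sync covers many txns' records.

import time

class GroupCommitLog:
    def __init__(self, capacity=4, flush_interval=0.01):
        self.buffer = []
        self.flushed = []                 # stands in for the log on disk
        self.capacity = capacity
        self.flush_interval = flush_interval
        self.last_flush = time.monotonic()

    def append(self, record):
        self.buffer.append(record)
        if (len(self.buffer) >= self.capacity or
                time.monotonic() - self.last_flush >= self.flush_interval):
            self.flush()

    def flush(self):
        self.flushed.extend(self.buffer)  # one write covers many records
        self.buffer.clear()
        self.last_flush = time.monotonic()
```

A txn's commit acknowledgement would wait until its `<COMMIT>` record reaches `flushed`; the latency cost of waiting is traded for fewer, larger log writes.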

SLIDE 35

WAL IMPLEMENTATION

When should the DBMS write log entries to disk?
→ When the transaction commits.
→ Can use group commit to batch multiple log flushes together to amortize overhead.

When should the DBMS write dirty records to disk?
→ Every time the txn executes an update?
→ Once when the txn commits?

SLIDE 36

BUFFER POOL POLICIES

Almost every DBMS uses NO-FORCE + STEAL.

Runtime performance: STEAL + NO-FORCE is the fastest, NO-STEAL + FORCE is the slowest.
Recovery performance: NO-STEAL + FORCE is the fastest (no undo + no redo), STEAL + NO-FORCE is the slowest (undo + redo).

SLIDE 37

LOGGING SCHEMES

Physical Logging:
→ Record the changes made to a specific location in the database.
→ Example: git diff

Logical Logging:
→ Record the high-level operations executed by txns.
→ Not necessarily restricted to a single page.
→ Example: The UPDATE, DELETE, and INSERT queries invoked by a txn.

SLIDE 38

PHYSICAL VS. LOGICAL LOGGING

Logical logging requires less data to be written in each log record than physical logging.

It is difficult to implement recovery with logical logging if you have concurrent txns:
→ Hard to determine which parts of the database may have been modified by a query before a crash.
→ Recovery also takes longer because you must re-execute every txn all over again.

SLIDE 39

PHYSIOLOGICAL LOGGING

Hybrid approach where log records target a single page but do not specify the data organization of the page.

This is the most popular approach.

SLIDE 40

LOGGING SCHEMES

UPDATE foo SET val = XYZ WHERE id = 1;

Physical:
<T1, Table=X, Page=99, Offset=4, Before=ABC, After=XYZ>
<T1, Index=X_PKEY, Page=45, Offset=9, Key=(1,Record1)>

Logical:
<T1, Query="UPDATE foo SET val=XYZ WHERE id=1">

Physiological:
<T1, Table=X, Page=99, ObjectId=1, Before=ABC, After=XYZ>
<T1, Index=X_PKEY, IndexPage=45, Key=(1,Record1)>

SLIDE 41

CHECKPOINTS

The WAL will grow forever. After a crash, the DBMS would have to replay the entire log, which can take a long time.

The DBMS therefore periodically takes a checkpoint where it flushes all buffers out to disk.

SLIDE 42

CHECKPOINTS

→ Output onto stable storage all log records currently residing in main memory.
→ Output to disk all modified blocks.
→ Write a <CHECKPOINT> entry to the log and flush it to stable storage.

SLIDE 43

CHECKPOINTS

WAL:
<T1 BEGIN>
<T1, A, 1, 2>
<T1 COMMIT>
<T2 BEGIN>
<T2, A, 2, 3>
<T3 BEGIN>
<CHECKPOINT>
<T2 COMMIT>
<T3, A, 3, 4>
⋮
CRASH!

Any txn that committed before the checkpoint is ignored (T1). T2 and T3 did not commit before the last checkpoint:
→ Need to redo T2 because it committed after the checkpoint.
→ Need to undo T3 because it did not commit before the crash.
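The post-crash analysis described above (ignore txns that committed before the last checkpoint, redo those that committed after it, undo those with no COMMIT record) can be sketched as a single scan over the log. The tuple-based record format is an illustrative assumption.

```python
# Checkpoint recovery-analysis sketch. The log is a list of (kind, txn)
# tuples with kind in {"BEGIN", "COMMIT", "CHECKPOINT"}.

def analyze(log):
    # Position of the last checkpoint record (or -1 if none).
    last_ckpt = max((i for i, (k, _) in enumerate(log) if k == "CHECKPOINT"),
                    default=-1)
    committed, active = set(), set()
    for kind, txn in log:
        if kind == "BEGIN":
            active.add(txn)
        elif kind == "COMMIT":
            active.discard(txn)
            committed.add(txn)
    # Txns that committed before the checkpoint are already on disk: ignore.
    committed_before = {t for k, t in log[:last_ckpt + 1] if k == "COMMIT"}
    redo = committed - committed_before   # committed after the checkpoint
    undo = active                         # never committed before the crash
    return redo, undo
```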

SLIDE 44

CHECKPOINT CHALLENGES

We have to stall all txns when taking a checkpoint to ensure a consistent snapshot.

Scanning the log to find uncommitted txns can take a long time.

It is not obvious how often the DBMS should take a checkpoint…

SLIDE 45

CHECKPOINT FREQUENCY

Checkpointing too often causes runtime performance to degrade:
→ The system spends too much time flushing buffers.

But waiting too long is just as bad:
→ The checkpoint will be large and slow.
→ Makes recovery time much longer.

SLIDE 46

CONCLUSION

Write-Ahead Logging is (almost) always the best approach to handle loss of volatile storage.
→ Use incremental updates (STEAL + NO-FORCE) with checkpoints.
→ On recovery: undo uncommitted txns + redo committed txns.

SLIDE 47

NEXT CLASS

Recovery with ARIES.