ADVANCED DATABASE SYSTEMS Multi-Version Concurrency Control - - PowerPoint PPT Presentation

advanced database systems
SMART_READER_LITE
LIVE PREVIEW

ADVANCED DATABASE SYSTEMS Multi-Version Concurrency Control - - PowerPoint PPT Presentation

Lect ure # 03 ADVANCED DATABASE SYSTEMS Multi-Version Concurrency Control (Design Decisions) @ Andy_Pavlo // 15- 721 // Spring 2019 CMU 15-721 (Spring 2019) 2 CO RRECTIO N Original SQL-92 isolation levels were not devised assuming a


slide-1
SLIDE 1

Multi-Version Concurrency Control (Design Decisions)

@ Andy_Pavlo // 15- 721 // Spring 2019

ADVANCED DATABASE SYSTEMS

Lect ure # 03

slide-2
SLIDE 2 CMU 15-721 (Spring 2019)

CO RRECTIO N

Original SQL-92 isolation levels were not devised assuming a 2PL-based DBMS.

2

A CRITIQUE OF ANSI SQL ISOLATION LEVELS

SIGMOD 1995

slide-3
SLIDE 3 CMU 15-721 (Spring 2019)

CO RRECTIO N

Original SQL-92 isolation levels were not devised assuming a 2PL-based DBMS.

2

A CRITIQUE OF ANSI SQL ISOLATION LEVELS

SIGMOD 1995

slide-4
SLIDE 4 CMU 15-721 (Spring 2019)

CO RRECTIO N

Original SQL-92 isolation levels were not devised assuming a 2PL-based DBMS.

2

A CRITIQUE OF ANSI SQL ISOLATION LEVELS

SIGMOD 1995

slide-5
SLIDE 5 CMU 15-721 (Spring 2019)

TO DAY'S AGEN DA

Overview of In-Memory MVCC

3

slide-6
SLIDE 6 CMU 15-721 (Spring 2019)

M ULTI- VERSIO N CO N CURREN CY CO N TRO L

The DBMS maintains multiple physical versions

  • f a single logical object in the database:

→ When a txn writes to an object, the DBMS creates a new version of that object. → When a txn reads an object, it reads the newest version that existed when the txn started.

First proposed in 1978 MIT PhD dissertation. First implementation was InterBase (Firebird). Used in almost every new DBMS in last 10 years.

4

slide-7
SLIDE 7 CMU 15-721 (Spring 2019)

M ULTI- VERSIO N CO N CURREN CY CO N TRO L

Main benefits:

→ Writers don’t block readers. → Read-only txns can read a consistent snapshot without acquiring locks. → Easily support time-travel queries.

MVCC is more than just a “concurrency control protocol”. It completely affects how the DBMS manages transactions and the database.

5

slide-8
SLIDE 8 CMU 15-721 (Spring 2019)

SN APSH OT ISO LATIO N

When a txn starts, it sees a consistent snapshot of the database that existed at the moment that the txn started.

→ No torn writes from active txns. → If two txns update the same object, then first writer wins.

We get SI automatically for "free" with MVCC.

→ If we want serializable isolation, then the DBMS has to do extra stuff…

6

slide-9
SLIDE 9 CMU 15-721 (Spring 2019)

M VCC DESIGN DECISIO N S

Concurrency Control Protocol Version Storage Garbage Collection Index Management

7

AN EMPIRICAL EVALUATION OF IN- MEMORY MULTI- VERSION CONCURRENCY CONTROL

VLDB 2017

slide-10
SLIDE 10 CMU 15-721 (Spring 2019)

M VCC DESIGN DECISIO N S

Concurrency Control Protocol Version Storage Garbage Collection Index Management

7

AN EMPIRICAL EVALUATION OF IN- MEMORY MULTI- VERSION CONCURRENCY CONTROL

VLDB 2017

slide-11
SLIDE 11 CMU 15-721 (Spring 2019)

M VCC DESIGN DECISIO N S

Concurrency Control Protocol Version Storage Garbage Collection Index Management

7

AN EMPIRICAL EVALUATION OF IN- MEMORY MULTI- VERSION CONCURRENCY CONTROL

VLDB 2017

slide-12
SLIDE 12 CMU 15-721 (Spring 2019)

M VCC DESIGN DECISIO N S

Concurrency Control Protocol Version Storage Garbage Collection Index Management

7

AN EMPIRICAL EVALUATION OF IN- MEMORY MULTI- VERSION CONCURRENCY CONTROL

VLDB 2017

slide-13
SLIDE 13 CMU 15-721 (Spring 2019)

M VCC DESIGN DECISIO N S

Concurrency Control Protocol Version Storage Garbage Collection Index Management

7

AN EMPIRICAL EVALUATION OF IN- MEMORY MULTI- VERSION CONCURRENCY CONTROL

VLDB 2017

slide-14
SLIDE 14 CMU 15-721 (Spring 2019)

CO N CURREN CY CO N TRO L PROTO CO L

Approach #1: Timestamp Ordering

→ Assign txns timestamps that determine serial order. → Considered to be original MVCC protocol.

Approach #2: Optimistic Concurrency Control

→ Three-phase protocol from last class. → Use private workspace for new versions.

Approach #3: Two-Phase Locking

→ Txns acquire appropriate lock on physical version before they can read/write a logical tuple.

8

slide-15
SLIDE 15 CMU 15-721 (Spring 2019)

TUPLE FO RM AT

9

Unique Txn Identifier Version Lifetime Next/Prev Version Additional Meta-data

TXN-ID DATA BEGIN-TS END-TS POINTER ...

slide-16
SLIDE 16 CMU 15-721 (Spring 2019)

TXN-ID READ-TS BEGIN-TS END-TS

A1 1 1

B1 1

TIM ESTAM P O RDERIN G (M VTO )

10

Use read-ts field in the header to keep track of the timestamp of the last txn that read it.

VERSION

slide-17
SLIDE 17 CMU 15-721 (Spring 2019)

TXN-ID READ-TS BEGIN-TS END-TS

A1 1 1

B1 1

TIM ESTAM P O RDERIN G (M VTO )

10

Use read-ts field in the header to keep track of the timestamp of the last txn that read it.

READ(A) WRITE(B)

VERSION

Thread #1 Tid=10

slide-18
SLIDE 18 CMU 15-721 (Spring 2019)

TXN-ID READ-TS BEGIN-TS END-TS

A1 1 1

B1 1

TIM ESTAM P O RDERIN G (M VTO )

10

Use read-ts field in the header to keep track of the timestamp of the last txn that read it.

READ(A) WRITE(B)

Txn is allowed to read version if the latch is unset and its Tid is between begin-ts and end-ts.

VERSION

Thread #1 Tid=10

slide-19
SLIDE 19 CMU 15-721 (Spring 2019)

TXN-ID READ-TS BEGIN-TS END-TS

A1 1 1

B1 1

TIM ESTAM P O RDERIN G (M VTO )

10

Use read-ts field in the header to keep track of the timestamp of the last txn that read it.

READ(A) WRITE(B)

Txn is allowed to read version if the latch is unset and its Tid is between begin-ts and end-ts.

VERSION

Thread #1 Tid=10

slide-20
SLIDE 20 CMU 15-721 (Spring 2019)

TXN-ID READ-TS BEGIN-TS END-TS

A1 1 1

B1 1

TIM ESTAM P O RDERIN G (M VTO )

10

Use read-ts field in the header to keep track of the timestamp of the last txn that read it.

READ(A) WRITE(B)

10

Txn is allowed to read version if the latch is unset and its Tid is between begin-ts and end-ts.

VERSION

Thread #1 Tid=10

slide-21
SLIDE 21 CMU 15-721 (Spring 2019)

TXN-ID READ-TS BEGIN-TS END-TS

A1 1 1

B1 1

TIM ESTAM P O RDERIN G (M VTO )

10

Use read-ts field in the header to keep track of the timestamp of the last txn that read it.

READ(A) WRITE(B)

10

Txn is allowed to read version if the latch is unset and its Tid is between begin-ts and end-ts. Txn creates a new version if no other txn holds latch and Tid is greater than read-ts.

VERSION

Thread #1 Tid=10

slide-22
SLIDE 22 CMU 15-721 (Spring 2019)

TXN-ID READ-TS BEGIN-TS END-TS

A1 1 1

B1 1

TIM ESTAM P O RDERIN G (M VTO )

10

Use read-ts field in the header to keep track of the timestamp of the last txn that read it.

READ(A) WRITE(B)

10

Txn is allowed to read version if the latch is unset and its Tid is between begin-ts and end-ts. Txn creates a new version if no other txn holds latch and Tid is greater than read-ts.

VERSION

Thread #1 Tid=10

slide-23
SLIDE 23 CMU 15-721 (Spring 2019)

TXN-ID READ-TS BEGIN-TS END-TS

A1 1 1

B1 1

TIM ESTAM P O RDERIN G (M VTO )

10

Use read-ts field in the header to keep track of the timestamp of the last txn that read it.

READ(A) WRITE(B)

10

Txn is allowed to read version if the latch is unset and its Tid is between begin-ts and end-ts. Txn creates a new version if no other txn holds latch and Tid is greater than read-ts.

10

VERSION

Thread #1 Tid=10

slide-24
SLIDE 24 CMU 15-721 (Spring 2019)

TXN-ID READ-TS BEGIN-TS END-TS

A1 1 1

B1 1

TIM ESTAM P O RDERIN G (M VTO )

10

Use read-ts field in the header to keep track of the timestamp of the last txn that read it.

READ(A) WRITE(B)

10

Txn is allowed to read version if the latch is unset and its Tid is between begin-ts and end-ts. Txn creates a new version if no other txn holds latch and Tid is greater than read-ts.

10 B2 10 10

VERSION

Thread #1 Tid=10

slide-25
SLIDE 25 CMU 15-721 (Spring 2019)

TXN-ID READ-TS BEGIN-TS END-TS

A1 1 1

B1 1

TIM ESTAM P O RDERIN G (M VTO )

10

Use read-ts field in the header to keep track of the timestamp of the last txn that read it.

READ(A) WRITE(B)

10

Txn is allowed to read version if the latch is unset and its Tid is between begin-ts and end-ts. Txn creates a new version if no other txn holds latch and Tid is greater than read-ts.

10 B2 10 10

10

VERSION

Thread #1 Tid=10

slide-26
SLIDE 26 CMU 15-721 (Spring 2019)

TXN-ID READ-TS BEGIN-TS END-TS

A1 1 1

B1 1

TIM ESTAM P O RDERIN G (M VTO )

10

Use read-ts field in the header to keep track of the timestamp of the last txn that read it.

READ(A) WRITE(B)

10

Txn is allowed to read version if the latch is unset and its Tid is between begin-ts and end-ts. Txn creates a new version if no other txn holds latch and Tid is greater than read-ts.

B2 10 10

10

VERSION

Thread #1 Tid=10

slide-27
SLIDE 27 CMU 15-721 (Spring 2019)

TXN-ID READ-CNT BEGIN-TS END-TS

A1 1

B1 1

TWO - PH ASE LO CKIN G (M V2PL)

11

Txns use the tuple's read- cnt field as SHARED lock. Use txn-id and read-cnt together as EXCLUSIVE lock.

READ(A) WRITE(B)

VERSION

Thread #1 Tid=10

slide-28
SLIDE 28 CMU 15-721 (Spring 2019)

TXN-ID READ-CNT BEGIN-TS END-TS

A1 1

B1 1

TWO - PH ASE LO CKIN G (M V2PL)

11

Txns use the tuple's read- cnt field as SHARED lock. Use txn-id and read-cnt together as EXCLUSIVE lock.

READ(A) WRITE(B)

If txn-id is zero, then the txn acquires the SHARED lock by incrementing the read-cnt field.

VERSION

Thread #1 Tid=10

slide-29
SLIDE 29 CMU 15-721 (Spring 2019)

TXN-ID READ-CNT BEGIN-TS END-TS

A1 1

B1 1

TWO - PH ASE LO CKIN G (M V2PL)

11

Txns use the tuple's read- cnt field as SHARED lock. Use txn-id and read-cnt together as EXCLUSIVE lock.

READ(A) WRITE(B)

1

If txn-id is zero, then the txn acquires the SHARED lock by incrementing the read-cnt field.

VERSION

Thread #1 Tid=10

slide-30
SLIDE 30 CMU 15-721 (Spring 2019)

TXN-ID READ-CNT BEGIN-TS END-TS

A1 1

B1 1

TWO - PH ASE LO CKIN G (M V2PL)

11

Txns use the tuple's read- cnt field as SHARED lock. Use txn-id and read-cnt together as EXCLUSIVE lock.

READ(A) WRITE(B)

1

If txn-id is zero, then the txn acquires the SHARED lock by incrementing the read-cnt field. If both txn-id and read-cnt are zero, then txn acquires the EXCLUSIVE lock by setting both of them.

VERSION

Thread #1 Tid=10

slide-31
SLIDE 31 CMU 15-721 (Spring 2019)

TXN-ID READ-CNT BEGIN-TS END-TS

A1 1

B1 1

TWO - PH ASE LO CKIN G (M V2PL)

11

Txns use the tuple's read- cnt field as SHARED lock. Use txn-id and read-cnt together as EXCLUSIVE lock.

READ(A) WRITE(B)

1

If txn-id is zero, then the txn acquires the SHARED lock by incrementing the read-cnt field. If both txn-id and read-cnt are zero, then txn acquires the EXCLUSIVE lock by setting both of them.

10

VERSION

1

Thread #1 Tid=10

slide-32
SLIDE 32 CMU 15-721 (Spring 2019)

TXN-ID READ-CNT BEGIN-TS END-TS

A1 1

B1 1

TWO - PH ASE LO CKIN G (M V2PL)

11

Txns use the tuple's read- cnt field as SHARED lock. Use txn-id and read-cnt together as EXCLUSIVE lock.

READ(A) WRITE(B)

1

If txn-id is zero, then the txn acquires the SHARED lock by incrementing the read-cnt field. If both txn-id and read-cnt are zero, then txn acquires the EXCLUSIVE lock by setting both of them.

10 B2 10 10

VERSION

1

Thread #1 Tid=10

slide-33
SLIDE 33 CMU 15-721 (Spring 2019)

TXN-ID READ-CNT BEGIN-TS END-TS

A1 1

B1 1

TWO - PH ASE LO CKIN G (M V2PL)

11

Txns use the tuple's read- cnt field as SHARED lock. Use txn-id and read-cnt together as EXCLUSIVE lock.

READ(A) WRITE(B)

1

If txn-id is zero, then the txn acquires the SHARED lock by incrementing the read-cnt field. If both txn-id and read-cnt are zero, then txn acquires the EXCLUSIVE lock by setting both of them.

10 B2 10 10

10

VERSION

1

Thread #1 Tid=10

slide-34
SLIDE 34 CMU 15-721 (Spring 2019)

TXN-ID READ-CNT BEGIN-TS END-TS

A1 1

B1 1

TWO - PH ASE LO CKIN G (M V2PL)

11

Txns use the tuple's read- cnt field as SHARED lock. Use txn-id and read-cnt together as EXCLUSIVE lock.

READ(A) WRITE(B)

If txn-id is zero, then the txn acquires the SHARED lock by incrementing the read-cnt field. If both txn-id and read-cnt are zero, then txn acquires the EXCLUSIVE lock by setting both of them.

B2 10 10

10

VERSION

Thread #1 Tid=10

slide-35
SLIDE 35 CMU 15-721 (Spring 2019)

TXN-ID READ-TS BEGIN-TS END-TS

A1

  • 99999

VERSION

O BSERVATIO N

If the DBMS reaches the max value for its timestamps, it will have to wrap around and restart at one. This will make all previous versions be in the "future" from new transactions.

12

Thread #1 Tid=231-1

WRITE(A)

slide-36
SLIDE 36 CMU 15-721 (Spring 2019)

TXN-ID READ-TS BEGIN-TS END-TS

A1

  • 99999

VERSION

O BSERVATIO N

If the DBMS reaches the max value for its timestamps, it will have to wrap around and restart at one. This will make all previous versions be in the "future" from new transactions.

12

Thread #1 Tid=231-1

WRITE(A)

231-1 A2

  • 231-1

231-1 231-1

slide-37
SLIDE 37 CMU 15-721 (Spring 2019)

TXN-ID READ-TS BEGIN-TS END-TS

A1

  • 99999

VERSION

O BSERVATIO N

If the DBMS reaches the max value for its timestamps, it will have to wrap around and restart at one. This will make all previous versions be in the "future" from new transactions.

12

Thread #1 Tid=231-1

WRITE(A)

231-1 A2

  • 231-1

slide-38
SLIDE 38 CMU 15-721 (Spring 2019)

TXN-ID READ-TS BEGIN-TS END-TS

A1

  • 99999

VERSION

O BSERVATIO N

If the DBMS reaches the max value for its timestamps, it will have to wrap around and restart at one. This will make all previous versions be in the "future" from new transactions.

12

Thread #1 Tid=231-1 Thread #2 Tid=1

WRITE(A)

231-1 A2

  • 231-1

slide-39
SLIDE 39 CMU 15-721 (Spring 2019)

TXN-ID READ-TS BEGIN-TS END-TS

A1

  • 99999

VERSION

O BSERVATIO N

If the DBMS reaches the max value for its timestamps, it will have to wrap around and restart at one. This will make all previous versions be in the "future" from new transactions.

12

A3

  • 1

1

Thread #1 Tid=231-1 Thread #2 Tid=1

WRITE(A)

231-1 A2

  • 231-1

1 1

slide-40
SLIDE 40 CMU 15-721 (Spring 2019)

TXN-ID READ-TS BEGIN-TS END-TS

A1

  • 99999

VERSION

O BSERVATIO N

If the DBMS reaches the max value for its timestamps, it will have to wrap around and restart at one. This will make all previous versions be in the "future" from new transactions.

12

A3

  • 1

Thread #1 Tid=231-1 Thread #2 Tid=1

231-1 A2

  • 231-1

1

slide-41
SLIDE 41 CMU 15-721 (Spring 2019)

PO STGRES TXN ID WRAPARO UN D

Set a flag in each tuple header that says that it is "frozen" in the past. Any new txn id will always be newer than a frozen version. Runs the vacuum before the system gets close to this upper limit. Otherwise it has to stop accepting new commands when the system gets close to the max txn id.

13

slide-42
SLIDE 42 CMU 15-721 (Spring 2019)

VERSIO N STO RAGE

The DBMS uses the tuples’ pointer field to create a latch-free version chain per logical tuple.

→ This allows the DBMS to find the version that is visible to a particular txn at runtime. → Indexes always point to the “head” of the chain.

Threads store versions in “local” memory regions to avoid contention on centralized data structures. Different storage schemes determine where/what to store for each version.

14

slide-43
SLIDE 43 CMU 15-721 (Spring 2019)

VERSIO N STO RAGE

Approach #1: Append-Only Storage

→ New versions are appended to the same table space.

Approach #2: Time-Travel Storage

→ Old versions are copied to separate table space.

Approach #3: Delta Storage

→ The original values of the modified attributes are copied into a separate delta record space.

15

slide-44
SLIDE 44 CMU 15-721 (Spring 2019)

APPEN D- O N LY STO RAGE

All of the physical versions of a logical tuple are stored in the same table space. The versions are mixed together. On every update, append a new version of the tuple into an empty space in the table.

16

Main Table

VERSION VALUE

A0 $111

POINTER

A1 $222 Ø B1 $10 Ø

slide-45
SLIDE 45 CMU 15-721 (Spring 2019)

APPEN D- O N LY STO RAGE

All of the physical versions of a logical tuple are stored in the same table space. The versions are mixed together. On every update, append a new version of the tuple into an empty space in the table.

16

Main Table

VERSION VALUE

A0 $111

POINTER

A1 $222 Ø A2 $333 Ø B1 $10 Ø

slide-46
SLIDE 46 CMU 15-721 (Spring 2019)

APPEN D- O N LY STO RAGE

All of the physical versions of a logical tuple are stored in the same table space. The versions are mixed together. On every update, append a new version of the tuple into an empty space in the table.

16

Main Table

VERSION VALUE

A0 $111

POINTER

A1 $222 A2 $333 Ø B1 $10 Ø

slide-47
SLIDE 47 CMU 15-721 (Spring 2019)

VERSIO N CH AIN O RDERIN G

Approach #1: Oldest-to-Newest (O2N)

→ Just append new version to end of the chain. → Have to traverse chain on look-ups.

Approach #2: Newest-to-Oldest (N2O)

→ Have to update index pointers for every new version. → Don’t have to traverse chain on look ups.

The ordering of the chain has different performance trade-offs.

17

slide-48
SLIDE 48 CMU 15-721 (Spring 2019)

TIM E- TRAVEL STO RAGE

18

On every update, copy the current version to the time- travel table. Update pointers. Main Table

VERSION VALUE

A2 $222

POINTER

B1 $10

Time-Travel Table

VERSION VALUE

A1 $111

POINTER

Ø

slide-49
SLIDE 49 CMU 15-721 (Spring 2019)

TIM E- TRAVEL STO RAGE

18

On every update, copy the current version to the time- travel table. Update pointers. Main Table

VERSION VALUE

A2 $222

POINTER

B1 $10

Time-Travel Table

VERSION VALUE

A1 $111

POINTER

A2 $222 Ø

slide-50
SLIDE 50 CMU 15-721 (Spring 2019)

TIM E- TRAVEL STO RAGE

18

On every update, copy the current version to the time- travel table. Update pointers. Overwrite master version in the main table. Update pointers. Main Table

VERSION VALUE

A2 $222

POINTER

B1 $10 A3 $333

Time-Travel Table

VERSION VALUE

A1 $111

POINTER

A2 $222 Ø

slide-51
SLIDE 51 CMU 15-721 (Spring 2019)

TIM E- TRAVEL STO RAGE

18

On every update, copy the current version to the time- travel table. Update pointers. Overwrite master version in the main table. Update pointers. Main Table

VERSION VALUE

A2 $222

POINTER

B1 $10 A3 $333

Time-Travel Table

VERSION VALUE

A1 $111

POINTER

A2 $222 Ø

slide-52
SLIDE 52 CMU 15-721 (Spring 2019)

DELTA STO RAGE

19

On every update, copy only the values that were modified to the delta storage and overwrite the master version. Main Table

VERSION VALUE

A1 $111

POINTER

B1 $10

Delta Storage Segment

slide-53
SLIDE 53 CMU 15-721 (Spring 2019)

DELTA STO RAGE

19

On every update, copy only the values that were modified to the delta storage and overwrite the master version. Main Table

VERSION VALUE

A1 $111

POINTER

B1 $10

Delta Storage Segment

DELTA POINTER

A1 (VALUE→$111) Ø

slide-54
SLIDE 54 CMU 15-721 (Spring 2019)

DELTA STO RAGE

19

On every update, copy only the values that were modified to the delta storage and overwrite the master version. Main Table

VERSION VALUE

A1 $111

POINTER

B1 $10

Delta Storage Segment

DELTA POINTER

A1 (VALUE→$111) Ø A2 $222

slide-55
SLIDE 55 CMU 15-721 (Spring 2019)

DELTA STO RAGE

19

On every update, copy only the values that were modified to the delta storage and overwrite the master version. Main Table

VERSION VALUE

A1 $111

POINTER

B1 $10

Delta Storage Segment

DELTA POINTER

A2 (VALUE→$222) A1 (VALUE→$111) Ø A2 $222

slide-56
SLIDE 56 CMU 15-721 (Spring 2019)

DELTA STO RAGE

19

On every update, copy only the values that were modified to the delta storage and overwrite the master version. Txns can recreate old versions by applying the delta in reverse order. Main Table

VERSION VALUE

A1 $111

POINTER

B1 $10

Delta Storage Segment

DELTA POINTER

A2 (VALUE→$222) A1 (VALUE→$111) Ø A2 $222 A3 $333

slide-57
SLIDE 57 CMU 15-721 (Spring 2019)

N O N- IN LIN E ATTRIBUTES

20

INT_VAL

A1 $100

Variable-Length Data

A1

STR_VAL

MY_LONG_STRING

Main Table

VERSION

slide-58
SLIDE 58 CMU 15-721 (Spring 2019)

N O N- IN LIN E ATTRIBUTES

20

Reuse pointers to variable- length pool for values that do not change between versions.

INT_VAL

A1 $100 A2 $90

Variable-Length Data

A1

STR_VAL

MY_LONG_STRING MY_LONG_STRING

Main Table

VERSION

slide-59
SLIDE 59 CMU 15-721 (Spring 2019)

N O N- IN LIN E ATTRIBUTES

20

Reuse pointers to variable- length pool for values that do not change between versions. Requires reference counters to know when it is safe to free memory. Unable to relocate memory easily.

INT_VAL

A1 $100 A2 $90

Variable-Length Data

MY_LONG_STRING Refs=1 A1

STR_VAL

Main Table

VERSION

slide-60
SLIDE 60 CMU 15-721 (Spring 2019)

N O N- IN LIN E ATTRIBUTES

20

Reuse pointers to variable- length pool for values that do not change between versions. Requires reference counters to know when it is safe to free memory. Unable to relocate memory easily.

INT_VAL

A1 $100 A2 $90

Variable-Length Data

MY_LONG_STRING Refs=1 A1

STR_VAL

Refs=2

Main Table

VERSION

slide-61
SLIDE 61 CMU 15-721 (Spring 2019)

GARBAGE CO LLECTIO N

The DBMS needs to remove reclaimable physical versions from the database over time.

→ No active txn in the DBMS can “see” that version (SI). → The version was created by an aborted txn.

Three additional design decisions:

→ How to look for expired versions? → How to decide when it is safe to reclaim memory? → Where to look for expired versions?

21

slide-62
SLIDE 62 CMU 15-721 (Spring 2019)

GARBAGE CO LLECTIO N

Approach #1: Tuple-level

→ Find old versions by examining tuples directly. → Background Vacuuming vs. Cooperative Cleaning

Approach #2: Transaction-level

→ Txns keep track of their old versions so the DBMS does not have to scan tuples to determine visibility.

22

slide-63
SLIDE 63 CMU 15-721 (Spring 2019)

Thread #1 Tid=12 Thread #2 Tid=25

VERSION BEGIN-TS END-TS

A100 1 9 B100 1 9 B101 10 20

TUPLE- LEVEL GC

23

Background Vacuuming: Separate thread(s) periodically scan the table and look for reclaimable versions. Works with any storage. Vacuum

slide-64
SLIDE 64 CMU 15-721 (Spring 2019)

Thread #1 Tid=12 Thread #2 Tid=25

VERSION BEGIN-TS END-TS

A100 1 9 B100 1 9 B101 10 20

TUPLE- LEVEL GC

23

Background Vacuuming: Separate thread(s) periodically scan the table and look for reclaimable versions. Works with any storage. Vacuum

slide-65
SLIDE 65 CMU 15-721 (Spring 2019)

Thread #1 Tid=12 Thread #2 Tid=25

VERSION BEGIN-TS END-TS

A100 1 9 B100 1 9 B101 10 20

TUPLE- LEVEL GC

23

Background Vacuuming: Separate thread(s) periodically scan the table and look for reclaimable versions. Works with any storage. Vacuum

slide-66
SLIDE 66 CMU 15-721 (Spring 2019)

Thread #1 Tid=12 Thread #2 Tid=25

VERSION BEGIN-TS END-TS

A100 1 9 B100 1 9 B101 10 20

TUPLE- LEVEL GC

23

Background Vacuuming: Separate thread(s) periodically scan the table and look for reclaimable versions. Works with any storage. Vacuum

slide-67
SLIDE 67 CMU 15-721 (Spring 2019)

Thread #1 Tid=12 Thread #2 Tid=25

VERSION BEGIN-TS END-TS

A100 1 9 B100 1 9 B101 10 20

TUPLE- LEVEL GC

23

Background Vacuuming: Separate thread(s) periodically scan the table and look for reclaimable versions. Works with any storage. Vacuum

Dirty Block BitMap

slide-68
SLIDE 68 CMU 15-721 (Spring 2019)

Thread #1 Tid=12 Thread #2 Tid=25

VERSION BEGIN-TS END-TS

A100 1 9 B100 1 9 B101 10 20

TUPLE- LEVEL GC

23

Background Vacuuming: Separate thread(s) periodically scan the table and look for reclaimable versions. Works with any storage. Vacuum

Dirty Block BitMap

slide-69
SLIDE 69 CMU 15-721 (Spring 2019)

Thread #1 Tid=12 Thread #2 Tid=25

TUPLE- LEVEL GC

23

Background Vacuuming: Separate thread(s) periodically scan the table and look for reclaimable versions. Works with any storage. Cooperative Cleaning: Worker threads identify reclaimable versions as they traverse version chain. Only works with O2N.

A2 A3 B0 B1 B2 B3

INDEX

A0 A1

slide-70
SLIDE 70 CMU 15-721 (Spring 2019)

Thread #1 Tid=12 Thread #2 Tid=25

TUPLE- LEVEL GC

23

Background Vacuuming: Separate thread(s) periodically scan the table and look for reclaimable versions. Works with any storage. Cooperative Cleaning: Worker threads identify reclaimable versions as they traverse version chain. Only works with O2N.

A2 A3 B0 B1 B2 B3

INDEX

A0 A1

GET(A)

slide-71
SLIDE 71 CMU 15-721 (Spring 2019)

Thread #1 Tid=12 Thread #2 Tid=25

TUPLE- LEVEL GC

23

Background Vacuuming: Separate thread(s) periodically scan the table and look for reclaimable versions. Works with any storage. Cooperative Cleaning: Worker threads identify reclaimable versions as they traverse version chain. Only works with O2N.

A2 A3 B0 B1 B2 B3

INDEX

A0 A1

X

GET(A)

slide-72
SLIDE 72 CMU 15-721 (Spring 2019)

Thread #1 Tid=12 Thread #2 Tid=25

TUPLE- LEVEL GC

23

Background Vacuuming: Separate thread(s) periodically scan the table and look for reclaimable versions. Works with any storage. Cooperative Cleaning: Worker threads identify reclaimable versions as they traverse version chain. Only works with O2N.

A2 A3 B0 B1 B2 B3

INDEX

A0 A1

X X

GET(A)

slide-73
SLIDE 73 CMU 15-721 (Spring 2019)

Thread #1 Tid=12 Thread #2 Tid=25

TUPLE- LEVEL GC

23

Background Vacuuming: Separate thread(s) periodically scan the table and look for reclaimable versions. Works with any storage. Cooperative Cleaning: Worker threads identify reclaimable versions as they traverse version chain. Only works with O2N.

A2 A3 B0 B1 B2 B3

INDEX

GET(A)

slide-74
SLIDE 74 CMU 15-721 (Spring 2019)

TRAN SACTIO N - LEVEL GC

Each txn keeps track of its read/write set. The DBMS determines when all versions created by a finished txn are no longer visible. May still require multiple threads to reclaim the memory fast enough for the workload.

24

slide-75
SLIDE 75 CMU 15-721 (Spring 2019)

IN DEX M AN AGEM EN T

PKey indexes always point to version chain head.

→ How often the DBMS has to update the pkey index depends on whether the system creates new versions when a tuple is updated. → If a txn updates a tuple’s pkey attribute(s), then this is treated as an DELETE followed by an INSERT.

Secondary indexes are more complicated…

25

slide-76
SLIDE 76 CMU 15-721 (Spring 2019)

IN DEX M AN AGEM EN T

PKey indexes always point to version chain head.

→ How often the DBMS has to update the pkey index depends on whether the system creates new versions when a tuple is updated. → If a txn updates a tuple’s pkey attribute(s), then this is treated as an DELETE followed by an INSERT.

Secondary indexes are more complicated…

25

slide-77
SLIDE 77 CMU 15-721 (Spring 2019)

SECO N DARY IN DEXES

Approach #1: Logical Pointers

→ Use a fixed identifier per tuple that does not change. → Requires an extra indirection layer. → Primary Key vs. Tuple Id

Approach #2: Physical Pointers

→ Use the physical address to the version chain head.

26

slide-78
SLIDE 78 CMU 15-721 (Spring 2019)

IN DEX PO IN TERS

27

PRIMARY INDEX SECONDARY INDEX

A4 A3 A2 A1

Append-Only Newest-to-Oldest

slide-79
SLIDE 79 CMU 15-721 (Spring 2019)

IN DEX PO IN TERS

27

PRIMARY INDEX SECONDARY INDEX

A4 A3 A2 A1

GET(A) Append-Only Newest-to-Oldest

Physical Address

slide-80
SLIDE 80 CMU 15-721 (Spring 2019)

IN DEX PO IN TERS

27

PRIMARY INDEX SECONDARY INDEX

A4 A3 A2 A1

Append-Only Newest-to-Oldest GET(A)

Physical Address

slide-81
SLIDE 81 CMU 15-721 (Spring 2019)

SECONDARY INDEX SECONDARY INDEX SECONDARY INDEX

IN DEX PO IN TERS

27

PRIMARY INDEX SECONDARY INDEX

A4 A3 A2 A1

Append-Only Newest-to-Oldest GET(A)

slide-82
SLIDE 82 CMU 15-721 (Spring 2019)

SECONDARY INDEX SECONDARY INDEX SECONDARY INDEX

IN DEX PO IN TERS

27

PRIMARY INDEX SECONDARY INDEX

A4 A3 A2 A1

Append-Only Newest-to-Oldest GET(A)

slide-83
SLIDE 83 CMU 15-721 (Spring 2019)

IN DEX PO IN TERS

27

PRIMARY INDEX SECONDARY INDEX

A4 A3 A2 A1

Append-Only Newest-to-Oldest GET(A)

Physical Address Primary Key

slide-84
SLIDE 84 CMU 15-721 (Spring 2019)

IN DEX PO IN TERS

27

PRIMARY INDEX SECONDARY INDEX

A4 A3 A2 A1

Append-Only Newest-to-Oldest GET(A)

TupleId→Address

TupleId Physical Address

slide-85
SLIDE 85 CMU 15-721 (Spring 2019)

M VCC EVALUATIO N PAPER

We implemented all of the design decisions in the Peloton DBMS as part of 15-721 in Spring 2016. Two categories of experiments:

→ Evaluate each of the design decisions in isolation to determine their trade-offs. → Compare configurations of real-world MVCC systems.

28

AN EMPIRICAL EVALUATION OF IN- MEMORY MULTI- VERSION CONCURRENCY CONTROL

VLDB 2017

slide-86
SLIDE 86 CMU 15-721 (Spring 2019)

M VCC DESIGN DECISIO N S

CC Protocol: Inconclusive results… Version Storage: Deltas Garbage Collection: Tuple-Level Vacuuming Indexes: Logical Pointers

29

slide-87
SLIDE 87 CMU 15-721 (Spring 2019)

M VCC CO N FIGURATIO N EVALUATIO N

30

Protocol Version Storage Garbage Collection Indexes

Oracle MV2PL Delta Vacuum Logical Postgres MV-2PL/MV-TO Append-Only Vacuum Physical MySQL-InnoDB MV-2PL Delta Vacuum Logical HYRISE MV-OCC Append-Only – Physical Hekaton MV-OCC Append-Only Cooperative Physical MemSQL MV-OCC Append-Only Vacuum Physical SAP HANA MV-2PL Time-travel Hybrid Logical NuoDB MV-2PL Append-Only Vacuum Logical HyPer MV-OCC Delta Txn-level Logical CMU's TBD MV-OCC Delta Txn-level Logical

slide-88
SLIDE 88 CMU 15-721 (Spring 2019)

M VCC CO N FIGURATIO N EVALUATIO N

31

25 50 75 100

8 16 24 32 40

Throughput (txn/sec) # Threads Oracle/MySQL NuoDB HyPer HYRISE MemSQL HANA HEKATON Postgres

Database: TPC-C Benchmark (40 Warehouses) Processor: 4 sockets, 10 cores per socket

slide-89
SLIDE 89 CMU 15-721 (Spring 2019)

M VCC CO N FIGURATIO N EVALUATIO N

31

25 50 75 100

8 16 24 32 40

Throughput (txn/sec) # Threads Oracle/MySQL NuoDB HyPer HYRISE MemSQL HANA HEKATON Postgres

Database: TPC-C Benchmark (40 Warehouses) Processor: 4 sockets, 10 cores per socket

slide-90
SLIDE 90 CMU 15-721 (Spring 2019)

PARTIN G TH O UGH TS

MVCC is the best approach for supporting txns in mixed workloads. We only discussed MVCC for OLTP.

→ Design decisions may be different for HTAP

32

slide-91
SLIDE 91 CMU 15-721 (Spring 2019)

N EXT CLASS

Modern MVCC Implementations

→ TUM HyPer → CMU Cicada → Microsoft Hekaton

33