15-721 DATABASE SYSTEMS Lecture #05 Multi-Version Concurrency - - PowerPoint PPT Presentation

15 721
SMART_READER_LITE
LIVE PREVIEW

15-721 DATABASE SYSTEMS Lecture #05 Multi-Version Concurrency - - PowerPoint PPT Presentation

15-721 DATABASE SYSTEMS Lecture #05 Multi-Version Concurrency Control Andy Pavlo / / Carnegie Mellon University / / Spring 2016 @Andy_Pavlo // Carnegie Mellon University // Spring 2017 TODAYS AGENDA Compare-and-Swap (CAS) MVCC


slide-1
SLIDE 1

Andy Pavlo / / Carnegie Mellon University / / Spring 2016

DATABASE SYSTEMS

Lecture #05 – Multi-Version Concurrency Control

15-721

@Andy_Pavlo // Carnegie Mellon University // Spring 2017

slide-2
SLIDE 2

CMU 15-721 (Spring 2017)

TODAY’S AGENDA

Compare-and-Swap (CAS) MVCC Overview Design Decisions Modern MVCC Implementations Project #2

slide-3
SLIDE 3

CMU 15-721 (Spring 2017)

COMPARE-AND-SWAP

Atomic instruction that compares contents of a memory location M to a given value V

→ If values are equal, installs new given value V’ in M → Otherwise operation fails

3

M

__sync_bool_compare_and_swap(&M, 20, 30)

20

Compare Value Address New Value

slide-4
SLIDE 4

CMU 15-721 (Spring 2017)

COMPARE-AND-SWAP

Atomic instruction that compares contents of a memory location M to a given value V

→ If values are equal, installs new given value V’ in M → Otherwise operation fails

3

M

__sync_bool_compare_and_swap(&M, 20, 30)

30

Compare Value Address New Value

slide-5
SLIDE 5

CMU 15-721 (Spring 2017)

COMPARE-AND-SWAP

Atomic instruction that compares contents of a memory location M to a given value V

→ If values are equal, installs new given value V’ in M → Otherwise operation fails

3

M

__sync_bool_compare_and_swap(&M, 20, 30)

30

X

25 35 Compare Value Address New Value

slide-6
SLIDE 6

CMU 15-721 (Spring 2017)

MULTI-VERSION CONCURRENCY CONTROL

The DBMS maintains multiple physical versions

  • f a single logical object in the database:

→ When a txn writes to an object, the DBMS creates a new version of that object. → When a txn reads an object, it reads the newest version that existed when the txn started.

First proposed in 1978 MIT PhD dissertation. Used in almost every new DBMS in last 10 years.

4

slide-7
SLIDE 7

CMU 15-721 (Spring 2017)

MULTI-VERSION CONCURRENCY CONTROL

Main benefits:

→ Writers don’t block readers. → Read-only txns can read a consistent snapshot without acquiring locks. → Easily support time-travel queries.

MVCC is more than just a “concurrency control protocol”. It completely affects how the DBMS manages transactions and the database.

5

slide-8
SLIDE 8

CMU 15-721 (Spring 2017)

MVCC DESIGN DECISIONS

Concurrency Control Protocol Version Storage Garbage Collection Index Management

6

WE STILL NEED TO THINK OF A TITLE BUT TRUST ME THIS IS A REALLY GOOD PAPER ON IN-MEMORY MVCC VLDB 2017

slide-9
SLIDE 9

CMU 15-721 (Spring 2017)

MVCC IMPLEMENTATIONS

7

Protocol Version Storage Garbage Collection Indexes

Oracle MV2PL Delta Vacuum Logical Postgres MV-2PL/MV-TO Append-Only Vacuum Physical MySQL-InnoDB MV-2PL Delta Vacuum Logical HYRISE MV-OCC Append-Only – Physical Hekaton MV-OCC Append-Only Cooperative Physical MemSQL MV-OCC Append-Only Vacuum Physical SAP HANA MV-2PL Time-travel Hybrid Logical NuoDB MV-2PL Append-Only Vacuum Logical HyPer MV-OCC Delta Txn-level Logical

slide-10
SLIDE 10

CMU 15-721 (Spring 2017)

TUPLE FORMAT

8

Unique Txn Identifier Version Lifetime Next/Prev Version Additional Metadata

TXN-ID DATA BEGIN-TS END-TS POINTER ...

slide-11
SLIDE 11

CMU 15-721 (Spring 2017)

CONCURRENCY CONTROL PROTOCOL

Approach #1: Timestamp Ordering

→ Assign txns timestamps that determine serial order. → Considered to be original MVCC protocol.

Approach #2: Optimistic Concurrency Control

→ Three-phase protocol from last class. → Use private workspace for new versions.

Approach #3: Two-Phase Locking

→ Txns acquire appropriate lock on physical version before they can read/write a logical tuple.

9

slide-12
SLIDE 12

CMU 15-721 (Spring 2017)

TXN-ID READ-TS BEGIN-TS END-TS

Ax 1 1

Bx 1

TIMESTAMP ORDERING (MVTO)

10

Use “read-ts” field in the header to keep track of the timestamp of the last txn that read it.

slide-13
SLIDE 13

CMU 15-721 (Spring 2017)

TXN-ID READ-TS BEGIN-TS END-TS

Ax 1 1

Bx 1

TIMESTAMP ORDERING (MVTO)

10

Use “read-ts” field in the header to keep track of the timestamp of the last txn that read it.

slide-14
SLIDE 14

CMU 15-721 (Spring 2017)

TXN-ID READ-TS BEGIN-TS END-TS

Ax 1 1

Bx 1

TIMESTAMP ORDERING (MVTO)

10

Use “read-ts” field in the header to keep track of the timestamp of the last txn that read it.

slide-15
SLIDE 15

CMU 15-721 (Spring 2017)

TXN-ID READ-TS BEGIN-TS END-TS

Ax 1 1

Bx 1

TIMESTAMP ORDERING (MVTO)

10

Use “read-ts” field in the header to keep track of the timestamp of the last txn that read it.

READ(A) WRITE(B)

Tid=10

slide-16
SLIDE 16

CMU 15-721 (Spring 2017)

TXN-ID READ-TS BEGIN-TS END-TS

Ax 1 1

Bx 1

TIMESTAMP ORDERING (MVTO)

10

Use “read-ts” field in the header to keep track of the timestamp of the last txn that read it.

READ(A) WRITE(B)

Txn is allowed to read version if the lock is unset and its Tid is between “begin-ts” and “end-ts”.

Tid=10

slide-17
SLIDE 17

CMU 15-721 (Spring 2017)

TXN-ID READ-TS BEGIN-TS END-TS

Ax 1 1

Bx 1

TIMESTAMP ORDERING (MVTO)

10

Use “read-ts” field in the header to keep track of the timestamp of the last txn that read it.

READ(A) WRITE(B)

10

Txn is allowed to read version if the lock is unset and its Tid is between “begin-ts” and “end-ts”.

Tid=10

slide-18
SLIDE 18

CMU 15-721 (Spring 2017)

TXN-ID READ-TS BEGIN-TS END-TS

Ax 1 1

Bx 1

TIMESTAMP ORDERING (MVTO)

10

Use “read-ts” field in the header to keep track of the timestamp of the last txn that read it.

READ(A) WRITE(B)

10

Txn is allowed to read version if the lock is unset and its Tid is between “begin-ts” and “end-ts”. Txn creates a new version if no other txn holds lock and Tid is greater than “read-ts”.

Tid=10

slide-19
SLIDE 19

CMU 15-721 (Spring 2017)

TXN-ID READ-TS BEGIN-TS END-TS

Ax 1 1

Bx 1

TIMESTAMP ORDERING (MVTO)

10

Use “read-ts” field in the header to keep track of the timestamp of the last txn that read it.

READ(A) WRITE(B)

10

Txn is allowed to read version if the lock is unset and its Tid is between “begin-ts” and “end-ts”. Txn creates a new version if no other txn holds lock and Tid is greater than “read-ts”.

Tid=10

10

slide-20
SLIDE 20

CMU 15-721 (Spring 2017)

TXN-ID READ-TS BEGIN-TS END-TS

Ax 1 1

Bx 1

TIMESTAMP ORDERING (MVTO)

10

Use “read-ts” field in the header to keep track of the timestamp of the last txn that read it.

READ(A) WRITE(B)

10

Txn is allowed to read version if the lock is unset and its Tid is between “begin-ts” and “end-ts”. Txn creates a new version if no other txn holds lock and Tid is greater than “read-ts”.

Tid=10

10 BX+1 10 10

slide-21
SLIDE 21

CMU 15-721 (Spring 2017)

TXN-ID READ-TS BEGIN-TS END-TS

Ax 1 1

Bx 1

TIMESTAMP ORDERING (MVTO)

10

Use “read-ts” field in the header to keep track of the timestamp of the last txn that read it.

READ(A) WRITE(B)

10

Txn is allowed to read version if the lock is unset and its Tid is between “begin-ts” and “end-ts”. Txn creates a new version if no other txn holds lock and Tid is greater than “read-ts”.

Tid=10

10 BX+1 10 10

10

slide-22
SLIDE 22

CMU 15-721 (Spring 2017)

TXN-ID READ-TS BEGIN-TS END-TS

Ax 1 1

Bx 1

TIMESTAMP ORDERING (MVTO)

10

Use “read-ts” field in the header to keep track of the timestamp of the last txn that read it.

READ(A) WRITE(B)

10

Txn is allowed to read version if the lock is unset and its Tid is between “begin-ts” and “end-ts”. Txn creates a new version if no other txn holds lock and Tid is greater than “read-ts”.

Tid=10

BX+1 10 10

10

slide-23
SLIDE 23

CMU 15-721 (Spring 2017)

VERSION STORAGE

The DBMS uses the tuples’ pointer field to create a latch-free version chain per logical tuple.

→ This allows the DBMS to find the version that is visible to a particular txn at runtime. → Indexes always point to the “head” of the chain.

Threads store versions in “local” memory regions to avoid contention on centralized data structures. Different storage schemes determine where/what to store for each version.

11

slide-24
SLIDE 24

CMU 15-721 (Spring 2017)

VERSION STORAGE

Approach #1: Append-Only Storage

→ New versions are appended to the same table space.

Approach #2: Time-Travel Storage

→ Old versions are copied to separate table space.

Approach #3: Delta Storage

→ The original values of the modified attributes are copied into a separate delta record space.

12

slide-25
SLIDE 25

CMU 15-721 (Spring 2017)

APPEND-ONLY STORAGE

13

All of the physical versions of a logical tuple are stored in the same table space Main Table On every update, append a new version of the tuple into an empty space in the table.

KEY VALUE

Ax XXX $111

POINTER

Ax+1 XXX $222 Ø Bx YYY $10 Ø

slide-26
SLIDE 26

CMU 15-721 (Spring 2017)

APPEND-ONLY STORAGE

13

All of the physical versions of a logical tuple are stored in the same table space Main Table On every update, append a new version of the tuple into an empty space in the table.

KEY VALUE

Ax XXX $111

POINTER

Ax+1 XXX $222 Ø Ax+2 XXX $333 Ø Bx YYY $10 Ø

slide-27
SLIDE 27

CMU 15-721 (Spring 2017)

APPEND-ONLY STORAGE

13

All of the physical versions of a logical tuple are stored in the same table space Main Table On every update, append a new version of the tuple into an empty space in the table.

KEY VALUE

Ax XXX $111

POINTER

Ax+1 XXX $222 Ax+2 XXX $333 Ø Bx YYY $10 Ø

slide-28
SLIDE 28

CMU 15-721 (Spring 2017)

VERSION CHAIN ORDERING

Approach #1: Oldest-to-Newest (O2N)

→ Just append new version to end of the chain. → Have to traverse chain on look-ups.

Approach #2: Newest-to-Oldest (N2O)

→ Have to update index pointers for every new version. → Don’t have to traverse chain on look ups.

The ordering of the chain has different performance trade-offs.

14

slide-29
SLIDE 29

CMU 15-721 (Spring 2017)

TIME-TRAVEL STORAGE

15

On every update, copy the current version to the time- travel table. Update pointers. Main Table

KEY VALUE

A2 XXX $222

POINTER

B1 YYY $10

Time-Travel Table

KEY VALUE

A1 XXX $111

POINTER

Ø

slide-30
SLIDE 30

CMU 15-721 (Spring 2017)

TIME-TRAVEL STORAGE

15

On every update, copy the current version to the time- travel table. Update pointers. Main Table

KEY VALUE

A2 XXX $222

POINTER

B1 YYY $10

Time-Travel Table

KEY VALUE

A1 XXX $111

POINTER

A2 XXX $222 Ø

slide-31
SLIDE 31

CMU 15-721 (Spring 2017)

TIME-TRAVEL STORAGE

15

On every update, copy the current version to the time- travel table. Update pointers. Overwrite master version in the main table. Update pointers. Main Table

KEY VALUE

A2 XXX $222

POINTER

B1 YYY $10

Time-Travel Table

KEY VALUE

A1 XXX $111

POINTER

A2 XXX $222 Ø

slide-32
SLIDE 32

CMU 15-721 (Spring 2017)

TIME-TRAVEL STORAGE

15

On every update, copy the current version to the time- travel table. Update pointers. Overwrite master version in the main table. Update pointers. Main Table

KEY VALUE

A2 XXX $222

POINTER

B1 YYY $10 A3 $333

Time-Travel Table

KEY VALUE

A1 XXX $111

POINTER

A2 XXX $222 Ø

slide-33
SLIDE 33

CMU 15-721 (Spring 2017)

DELTA STORAGE

16

On every update, copy only the values that were modified to the delta storage and

  • verwrite the master version.

Main Table

KEY VALUE

A1 XXX $111

POINTER

B1 YYY $10

Delta Storage Segment

slide-34
SLIDE 34

CMU 15-721 (Spring 2017)

DELTA STORAGE

16

On every update, copy only the values that were modified to the delta storage and

  • verwrite the master version.

Main Table

KEY VALUE

A1 XXX $111

POINTER

B1 YYY $10

Delta Storage Segment

DELTA POINTER

A1 (VALUE→$111) Ø A2 $222

slide-35
SLIDE 35

CMU 15-721 (Spring 2017)

DELTA STORAGE

16

On every update, copy only the values that were modified to the delta storage and

  • verwrite the master version.

Main Table

KEY VALUE

A1 XXX $111

POINTER

B1 YYY $10

Delta Storage Segment

DELTA POINTER

A2 (VALUE→$222) A1 (VALUE→$111) Ø A2 $222

slide-36
SLIDE 36

CMU 15-721 (Spring 2017)

DELTA STORAGE

16

On every update, copy only the values that were modified to the delta storage and

  • verwrite the master version.

Txns can recreate old versions by applying the delta in reverse order. Main Table

KEY VALUE

A1 XXX $111

POINTER

B1 YYY $10

Delta Storage Segment

DELTA POINTER

A2 (VALUE→$222) A1 (VALUE→$111) Ø A2 $222 A3 $333

slide-37
SLIDE 37

CMU 15-721 (Spring 2017)

GARBAGE COLLECTION

The DBMS needs to remove reclaimable physical versions from the database over time.

→ No active txn in the DBMS can “see” that version (SI). → The version was created by an aborted txn.

Two additional design decisions:

→ How to look for expired versions? → How to decide when it is safe to reclaim memory?

17

slide-38
SLIDE 38

CMU 15-721 (Spring 2017)

GARBAGE COLLECTION

Approach #1: Tuple-level

→ Find old versions by examining tuples directly. → Background Vacuuming vs. Cooperative Cleaning

Approach #2: Transaction-level

→ Txns keep track of their old versions so the DBMS does not have to scan tuples to determine visibility.

18

slide-39
SLIDE 39

CMU 15-721 (Spring 2017)

TXN-ID BEGIN-TS END-TS

Ax 1 9 Bx 1 9 Bx+1 10 20

TUPLE-LEVEL GC

19

Background Vacuuming: Separate thread(s) periodically scan the table and look for reclaimable

  • versions. Works with any storage.

Thread #1 Tid=12 Thread #2 Tid=25 Vacuum

slide-40
SLIDE 40

CMU 15-721 (Spring 2017)

TXN-ID BEGIN-TS END-TS

Ax 1 9 Bx 1 9 Bx+1 10 20

TUPLE-LEVEL GC

19

Background Vacuuming: Separate thread(s) periodically scan the table and look for reclaimable

  • versions. Works with any storage.

Thread #1 Tid=12 Thread #2 Tid=25 Vacuum

Dirty?

slide-41
SLIDE 41

CMU 15-721 (Spring 2017)

TUPLE-LEVEL GC

19

Background Vacuuming: Separate thread(s) periodically scan the table and look for reclaimable

  • versions. Works with any storage.

Thread #1 Tid=12 Thread #2 Tid=25

Cooperative Cleaning: Worker threads identify reclaimable versions as they traverse version

  • chain. Only works with O2N.

Ax Ax+1 Ax+2 Ax+3 Bx Bx+1 Bx+2 Bx+3

INDEX

slide-42
SLIDE 42

CMU 15-721 (Spring 2017)

TUPLE-LEVEL GC

19

Background Vacuuming: Separate thread(s) periodically scan the table and look for reclaimable

  • versions. Works with any storage.

Thread #1 Tid=12 Thread #2 Tid=25

Cooperative Cleaning: Worker threads identify reclaimable versions as they traverse version

  • chain. Only works with O2N.

Ax Ax+1 Ax+2 Ax+3 Bx Bx+1 Bx+2 Bx+3

INDEX

X

slide-43
SLIDE 43

CMU 15-721 (Spring 2017)

TRANSACTION-LEVEL GC

Each txn keeps track of its read/write set. The DBMS determines when all versions created by a finished txn are no longer visible.

20

slide-44
SLIDE 44

CMU 15-721 (Spring 2017)

INDEX MANAGEMENT

PKey indexes always point to version chain head.

→ How often the DBMS has to update the pkey index depends on whether the system creates new versions when a tuple is updated. → If a txn updates a tuple’s pkey attribute(s), then this is treated as an DELETE followed by an INSERT.

Secondary indexes are more complicated…

21

slide-45
SLIDE 45

CMU 15-721 (Spring 2017)

INDEX MANAGEMENT

PKey indexes always point to version chain head.

→ How often the DBMS has to update the pkey index depends on whether the system creates new versions when a tuple is updated. → If a txn updates a tuple’s pkey attribute(s), then this is treated as an DELETE followed by an INSERT.

Secondary indexes are more complicated…

21

slide-46
SLIDE 46

CMU 15-721 (Spring 2017)

SECONDARY INDEXES

Approach #1: Logical Pointers

→ Use a fixed identifier per tuple that does not change. → Requires an extra indirection layer. → Primary Key vs. Tuple Id

Approach #2: Physical Pointers

→ Use the physical address to the version chain head.

22

slide-47
SLIDE 47

CMU 15-721 (Spring 2017)

INDEX POINTERS

23

PRIMARY INDEX SECONDARY INDEX

Ax+3 Ax+2 Ax+1 Ax

GET(A) Append-Only Newest-to-Oldest

Physical Address

slide-48
SLIDE 48

CMU 15-721 (Spring 2017)

INDEX POINTERS

23

PRIMARY INDEX SECONDARY INDEX

Ax+3 Ax+2 Ax+1 Ax

Append-Only Newest-to-Oldest GET(A)

Physical Address

slide-49
SLIDE 49

CMU 15-721 (Spring 2017)

SECONDARY INDE SECONDARY INDEX SECONDARY INDEX

INDEX POINTERS

23

PRIMARY INDEX SECONDARY INDEX

Ax+3 Ax+2 Ax+1 Ax

Append-Only Newest-to-Oldest GET(A)

slide-50
SLIDE 50

CMU 15-721 (Spring 2017)

INDEX POINTERS

23

PRIMARY INDEX SECONDARY INDEX

Ax+3 Ax+2 Ax+1 Ax

Append-Only Newest-to-Oldest GET(A)

Physical Address Primary Key

slide-51
SLIDE 51

CMU 15-721 (Spring 2017)

INDEX POINTERS

23

PRIMARY INDEX SECONDARY INDEX

Ax+3 Ax+2 Ax+1 Ax

Append-Only Newest-to-Oldest GET(A)

TupleId→Address

TupleId Physical Address

slide-52
SLIDE 52

CMU 15-721 (Spring 2017)

MVCC CONFIGURATION EVALUATION

24

25 50 75 100

8 16 24 32 40

Throughput (txn/sec) # Threads

Oracle/MySQL Postgres HYRISE HEKATON MemSQL HANA NuoDB HyPer

Database: TPC-C Benchmark (40 Warehouses) Processor: 4 sockets, 10 cores per socket

slide-53
SLIDE 53

CMU 15-721 (Spring 2017)

MODERN MVCC

Microsoft Hekaton (SQL Server) TUM HyPer SAP HANA

25

slide-54
SLIDE 54

CMU 15-721 (Spring 2017)

MICROSOFT HEKATON

Incubator project started in 2008 to create new OLTP engine for MSFT SQL Server (MSSQL).

→ Led by DB ballers Paul Larson and Mike Zwilling

Had to integrate with MSSQL ecosystem. Had to support all possible OLTP workloads with predictable performance.

→ Single-threaded partitioning (e.g., H-Store) works well for some applications but terrible for others.

26

slide-55
SLIDE 55

CMU 15-721 (Spring 2017)

HEKATON MVCC

Every txn is assigned a timestamp (TS) when they begin and when they commit. DBMS maintains “chain” of versions per tuple:

→ BEGIN: The BeginTS of the active txn or the EndTS of the committed txn that created it. → END: The BeginTS of the active txn that created the next version or infinity or the EndTS of the committed txn that created it. → POINTER: Location of the next version in the chain.

27

HIGH-PERFORMANCE CONCURRENCY CONTROL MECHANISMS FOR MAIN-MEMORY DATABASES VLDB 2011

slide-56
SLIDE 56

CMU 15-721 (Spring 2017)

BEGIN END POINTER ATTR1 ATTR2

HEKATON: OPERATIONS

28

10 20 John $100 20 John $110

INDEX

slide-57
SLIDE 57

CMU 15-721 (Spring 2017)

BEGIN END POINTER ATTR1 ATTR2

HEKATON: OPERATIONS

28

10 20 John $100 20 John $110

BEGIN @ 25 INDEX Read “John”

slide-58
SLIDE 58

CMU 15-721 (Spring 2017)

BEGIN END POINTER ATTR1 ATTR2

HEKATON: OPERATIONS

28

10 20 John $100 20 John $110

BEGIN @ 25 INDEX Read “John”

slide-59
SLIDE 59

CMU 15-721 (Spring 2017)

BEGIN END POINTER ATTR1 ATTR2

HEKATON: OPERATIONS

28

10 20 John $100 20 John $110

BEGIN @ 25 INDEX Update “John” Read “John”

slide-60
SLIDE 60

CMU 15-721 (Spring 2017)

BEGIN END POINTER ATTR1 ATTR2

HEKATON: OPERATIONS

28

10 20 John $100 20 John $110

Txn25

BEGIN @ 25 INDEX Update “John” Read “John”

slide-61
SLIDE 61

CMU 15-721 (Spring 2017)

BEGIN END POINTER ATTR1 ATTR2

HEKATON: OPERATIONS

28

10 20 John $100 Txn25 ∞ John $130 20 John $110

Txn25

BEGIN @ 25 INDEX Update “John” Read “John”

slide-62
SLIDE 62

CMU 15-721 (Spring 2017)

BEGIN END POINTER ATTR1 ATTR2

HEKATON: OPERATIONS

28

10 20 John $100 Txn25 ∞ John $130 20 John $110

Txn25

BEGIN @ 25 INDEX Update “John” Read “John” COMMIT @ 35

slide-63
SLIDE 63

CMU 15-721 (Spring 2017)

BEGIN END POINTER ATTR1 ATTR2

HEKATON: OPERATIONS

28

10 20 John $100 Txn25 ∞ John $130 20 John $110

Txn25

BEGIN @ 25 INDEX Update “John” Read “John” COMMIT @ 35

slide-64
SLIDE 64

CMU 15-721 (Spring 2017)

BEGIN END POINTER ATTR1 ATTR2

HEKATON: OPERATIONS

28

10 20 John $100 Txn25 ∞ John $130 20 John $110

Txn25

BEGIN @ 25 INDEX Update “John” Read “John” COMMIT @ 35

35 35

slide-65
SLIDE 65

CMU 15-721 (Spring 2017)

BEGIN END POINTER ATTR1 ATTR2

HEKATON: OPERATIONS

28

10 20 John $100 Txn25 ∞ John $130 20 John $110

Txn25

BEGIN @ 25 INDEX Update “John” Read “John” COMMIT @ 35

35 35

REWIND

slide-66
SLIDE 66

CMU 15-721 (Spring 2017)

BEGIN END POINTER ATTR1 ATTR2

HEKATON: OPERATIONS

29

10 20 John $100 Txn25 ∞ John $130 20 John $110

Txn25

BEGIN @ 25 INDEX Update “John” Read “John”

slide-67
SLIDE 67

CMU 15-721 (Spring 2017)

BEGIN END POINTER ATTR1 ATTR2

HEKATON: OPERATIONS

29

10 20 John $100 Txn25 ∞ John $130 20 John $110

Txn25

BEGIN @ 25 INDEX Update “John” Read “John” BEGIN @ 30

slide-68
SLIDE 68

CMU 15-721 (Spring 2017)

BEGIN END POINTER ATTR1 ATTR2

HEKATON: OPERATIONS

29

10 20 John $100 Txn25 ∞ John $130 20 John $110

Txn25

BEGIN @ 25 INDEX Update “John” Read “John” BEGIN @ 30 Read “John”

slide-69
SLIDE 69

CMU 15-721 (Spring 2017)

BEGIN END POINTER ATTR1 ATTR2

HEKATON: OPERATIONS

29

10 20 John $100 Txn25 ∞ John $130 20 John $110

Txn25

BEGIN @ 25 INDEX Update “John” Read “John” BEGIN @ 30 Read “John” Update “John”

slide-70
SLIDE 70

CMU 15-721 (Spring 2017)

BEGIN END POINTER ATTR1 ATTR2

HEKATON: OPERATIONS

29

10 20 John $100 Txn25 ∞ John $130 20 John $110

Txn25

BEGIN @ 25 INDEX Update “John” Read “John” BEGIN @ 30 Read “John” Update “John”

slide-71
SLIDE 71

CMU 15-721 (Spring 2017)

BEGIN END POINTER ATTR1 ATTR2

HEKATON: OPERATIONS

29

10 20 John $100 Txn25 ∞ John $130 20 John $110

Txn25

BEGIN @ 25 INDEX Update “John” Read “John” BEGIN @ 30 Read “John” Update “John”

slide-72
SLIDE 72

CMU 15-721 (Spring 2017)

HEKATON: TRANSACTION STATE MAP

Global map of all txns’ states in the system:

→ ACTIVE: The txn is executing read/write operations. → VALIDATING: The txn has invoked commit and the DBMS is checking whether it is valid. → COMMITTED: The txn is finished, but may have not updated its versions’ TS. → TERMINATED: The txn has updated the TS for all of the versions that it created.

30

slide-73
SLIDE 73

CMU 15-721 (Spring 2017)

HEKATON: TRANSACTION META-DATA

Read Set

→ Pointers to every version read.

Write Set

→ Pointers to versions updated (old and new), versions deleted (old), and version inserted (new).

Scan Set

→ Stores enough information needed to perform each scan

  • peration.

Commit Dependencies

→ List of txns that are waiting for this txn to finish.

32

slide-74
SLIDE 74

CMU 15-721 (Spring 2017)

HEKATON: TRANSACTION VALIDATION

Read Stability

→ Check that each version read is still visible as of the end

  • f the txn.

Phantom Avoidance

→ Repeat each scan to check whether new versions have become visible since the txn began.

Extent of validation depends on isolation level:

→ SERIALIZABLE: Read Stability + Phantom Avoidance → REPEATABLE READS: Read Stability → SNAPSHOT ISOLATION: None → READ COMMITTED: None

33

slide-75
SLIDE 75

CMU 15-721 (Spring 2017)

HEKATON: OPTIMISTIC VS. PESSIMISTIC

Optimistic Txns:

→ Check whether a version read is still visible at the end of the txn. → Repeat all index scans to check for phantoms.

Pessimistic Txns:

→ Use shared & exclusive locks on records and buckets. → No validation is needed. → Separate background thread to detect deadlocks.

34

slide-76
SLIDE 76

CMU 15-721 (Spring 2017)

HEKATON: OPTIMISTIC VS. PESSIMISTIC

35

0.5 1 1.5 2

6 12 18 24

Throughput (txn/sec)

Millions

# Threads Optimistic Pessimistic

Source: Paul Larson

Database: Single table with 1000 tuples Workload: 80% read-only txns + 20% update txns Processor: 2 sockets, 12 cores

slide-77
SLIDE 77

CMU 15-721 (Spring 2017)

HEKATON: PERFORMANCE

Bwin – Large online betting company

→ Before: 15,000 requests/sec → Hekaton: 250,000 requests/sec

EdgeNet – Up-to-date inventory status

→ Before: 7,450 rows/sec (ingestion rate) → Hekaton: 126,665 rows/sec

SBI Liquidity Market – FOREX broker

→ Before: 2,812 txn/sec with 4 sec latency → Hekaton: 5,313 txn/sec with <1 sec latency

36

Source: Paul Larson

slide-78
SLIDE 78

CMU 15-721 (Spring 2017)

HEKATON: IMPLEMENTATION

Use only lock-free data structures

→ No latches, spin locks, or critical sections → Indexes, txn map, memory alloc, garbage collector → We will discuss Bw-Trees + Skip Lists later…

Only one single serialization point in the DBMS to get the txn’s begin and commit timestamp

→ Atomic Addition (CAS)

37

slide-79
SLIDE 79

CMU 15-721 (Spring 2017)

OBSERVATIONS

Read/scan set validations are expensive if the txns access a lot of data. Appending new versions hurts the performance of OLAP scans due to pointer chasing & branching. Record-level conflict checks may be too coarse- grained and incur false positives.

38

slide-80
SLIDE 80

CMU 15-721 (Spring 2017)

HYPER MVCC

Rollback Segment with Deltas

→ In-Place updates for non-indexed attributes → Delete/Insert updates for indexed attributes.

Newest-to-Oldest Version Chains No Predicate Locks Avoids write-write conflicts by aborting txns that try to update an uncommitted object.

39

FAST SERIALIZABLE MULTI-VERSION CONCURRENCY CONTROL FOR MAIN-MEMORY DATABASE SYSTEMS SIGMOD 2015

slide-81
SLIDE 81

CMU 15-721 (Spring 2017)

HYPER MVCC

40

Delta Storage (Per Txn) Main Data Table

ATTR1

Tupac IceT B.I.G DrDre

ATTR2

$100 $200 $150 $99

Version Vector

(ATTR2→$122) Txn 263+1 (ATTR2→$199) Txn 123 (ATTR2→$100) Txn 263 (ATTR2→$139)

slide-82
SLIDE 82

CMU 15-721 (Spring 2017)

PARTING THOUGHTS

MVCC is currently the best approach for supporting txns in mixed workoads We only discussed MVCC for OLTP.

→ Design decisions may be different for HTAP

Interesting MVCC research/project Topics:

→ Block compaction → Version compression → On-line schema changes

41

slide-83
SLIDE 83

CMU 15-721 (Spring 2017)

PROJECT #2

Implement a latch-free Skip List in Peloton.

→ Forward / Reverse Iteration → Garbage Collection

Must be able to support both unique and non- unique keys.

42

slide-84
SLIDE 84

CMU 15-721 (Spring 2017)

PROJECT #2 – DESIGN

We will provide you with a header file with the index API that you have to implement.

→ Data serialization and predicate evaluation will be taken care of for you.

There are several design decisions that you are going to have to make.

→ There is no right answer. → Do not expect us to guide you at every step of the development process.

43

slide-85
SLIDE 85

CMU 15-721 (Spring 2017)

PROJECT #2 – TESTING

We are providing you with C++ unit tests for you to check your implementation. We also have a BwTree implementation to compare against. We strongly encourage you to do your own additional testing.

44

slide-86
SLIDE 86

CMU 15-721 (Spring 2017)

PROJECT #2 – DOCUMENTATION

You must write sufficient documentation and comments in your code to explain what you are doing in all different parts. We will inspect the submissions manually.

45

slide-87
SLIDE 87

CMU 15-721 (Spring 2017)

PROJECT #2 – GRADING

We will run additional tests beyond what we provided you for grading.

→ Bonus points will be given to the groups with the fastest implementation. → We will use Valgrind when testing your code.

All source code must pass ClangFormat syntax formatting checker.

→ See Peloton documentation for formatting guidelines.

46

slide-88
SLIDE 88

CMU 15-721 (Spring 2017)

PROJECT #2 – GROUPS

This is a group project.

→ Everyone should contribute equally. → I will review commit history.

Email me if you do not have a group.

47

slide-89
SLIDE 89

CMU 15-721 (Spring 2017)

PROJECT #2

Due Date: March 2nd, 2017 @ 11:59pm Projects will be turned in using Autolab. Full description and instructions: http://15721.courses.cs.cmu.edu/spring2017/proj ect2.html

48

slide-90
SLIDE 90

CMU 15-721 (Spring 2017)

NEXT CLASS

Index Locking + Latching

49