Lect ure # 06 ADVANCED DATABASE SYSTEMS Multi-Version - - PowerPoint PPT Presentation

lect ure 06 advanced database
SMART_READER_LITE
LIVE PREVIEW

Lect ure # 06 ADVANCED DATABASE SYSTEMS Multi-Version - - PowerPoint PPT Presentation

Lect ure # 06 ADVANCED DATABASE SYSTEMS Multi-Version Concurrency Control (Part II) @ Andy_Pavlo // 15- 721 // Spring 2018 2 TO DAY'S AGEN DA Microsoft Hekaton (SQL Server) TUM HyPer CMU Cicada CMU 15-721 (Spring 2018) 3 M ICRO SO FT


slide-1
SLIDE 1

Multi-Version Concurrency Control (Part II)

@ Andy_Pavlo // 15- 721 // Spring 2018

ADVANCED DATABASE SYSTEMS Lect ure # 06

slide-2
SLIDE 2

CMU 15-721 (Spring 2018)

TO DAY'S AGEN DA

Microsoft Hekaton (SQL Server) TUM HyPer CMU Cicada

2

slide-3
SLIDE 3

CMU 15-721 (Spring 2018)

M ICRO SO FT H EKATO N

Incubator project started in 2008 to create new OLTP engine for MSFT SQL Server (MSSQL).

→ Led by DB ballers Paul Larson and Mike Zwilling

Had to integrate with MSSQL ecosystem. Had to support all possible OLTP workloads with predictable performance.

→ Single-threaded partitioning (e.g., H-Store) works well for some applications but terrible for others.

3

slide-4
SLIDE 4

CMU 15-721 (Spring 2018)

H EKATO N M VCC

Each txn is assigned a timestamp when they begin (BeginTS) and when they commit (EndTS). Each tuple contains two timestamps that represents their visibility and current state:

→ BEGIN: The BeginTS of the active txn or the EndTS of the committed txn that created it. → END: The BeginTS of the active txn that created the next version or infinity or the EndTS of the committed txn that created it.

4

HIGH- PERFORMANCE CONCURRENCY CONTROL MECHANISMS FOR MAIN- MEMORY DATABASES

VLDB 2011

slide-5
SLIDE 5

CMU 15-721 (Spring 2018)

BEGIN END POINTER ATTR1 ATTR2

H EKATO N : O PERATIO N S

5

10 20 John $100 20 John $110

INDEX

slide-6
SLIDE 6

CMU 15-721 (Spring 2018)

BEGIN END POINTER ATTR1 ATTR2

H EKATO N : O PERATIO N S

5

10 20 John $100 20 John $110

BEGIN @ 25 INDEX

slide-7
SLIDE 7

CMU 15-721 (Spring 2018)

BEGIN END POINTER ATTR1 ATTR2

H EKATO N : O PERATIO N S

5

10 20 John $100 20 John $110

BEGIN @ 25 INDEX Read "John"

slide-8
SLIDE 8

CMU 15-721 (Spring 2018)

BEGIN END POINTER ATTR1 ATTR2

H EKATO N : O PERATIO N S

5

10 20 John $100 20 John $110

BEGIN @ 25 INDEX Read "John"

slide-9
SLIDE 9

CMU 15-721 (Spring 2018)

BEGIN END POINTER ATTR1 ATTR2

H EKATO N : O PERATIO N S

5

10 20 John $100 20 John $110

BEGIN @ 25 INDEX Update "John" Read "John"

slide-10
SLIDE 10

CMU 15-721 (Spring 2018)

BEGIN END POINTER ATTR1 ATTR2

H EKATO N : O PERATIO N S

5

10 20 John $100 20 John $110

Txn25

BEGIN @ 25 INDEX Update "John" Read "John"

slide-11
SLIDE 11

CMU 15-721 (Spring 2018)

BEGIN END POINTER ATTR1 ATTR2

H EKATO N : O PERATIO N S

5

10 20 John $100 Txn25

John $130 20 John $110

Txn25

BEGIN @ 25 INDEX Update "John" Read "John"

slide-12
SLIDE 12

CMU 15-721 (Spring 2018)

BEGIN END POINTER ATTR1 ATTR2

H EKATO N : O PERATIO N S

5

10 20 John $100 Txn25

John $130 20 John $110

Txn25

BEGIN @ 25 INDEX Update "John" Read "John" COMMIT @ 35

slide-13
SLIDE 13

CMU 15-721 (Spring 2018)

BEGIN END POINTER ATTR1 ATTR2

H EKATO N : O PERATIO N S

5

10 20 John $100 Txn25

John $130 20 John $110

Txn25

BEGIN @ 25 INDEX Update "John" Read "John" COMMIT @ 35

35 35

slide-14
SLIDE 14

CMU 15-721 (Spring 2018)

BEGIN END POINTER ATTR1 ATTR2

H EKATO N : O PERATIO N S

5

10 20 John $100 Txn25

John $130 20 John $110

Txn25

BEGIN @ 25 INDEX Update "John" Read "John" COMMIT @ 35

35 35

REWIND

slide-15
SLIDE 15

CMU 15-721 (Spring 2018)

BEGIN END POINTER ATTR1 ATTR2

H EKATO N : O PERATIO N S

6

10 20 John $100 Txn25

John $130 20 John $110

Txn25

BEGIN @ 25 INDEX Update "John" Read "John"

slide-16
SLIDE 16

CMU 15-721 (Spring 2018)

BEGIN END POINTER ATTR1 ATTR2

H EKATO N : O PERATIO N S

6

10 20 John $100 Txn25

John $130 20 John $110

Txn25

BEGIN @ 25 INDEX Update "John" Read "John" BEGIN @ 30

slide-17
SLIDE 17

CMU 15-721 (Spring 2018)

BEGIN END POINTER ATTR1 ATTR2

H EKATO N : O PERATIO N S

6

10 20 John $100 Txn25

John $130 20 John $110

Txn25

BEGIN @ 25 INDEX Update "John" Read "John" BEGIN @ 30 Read "John"

slide-18
SLIDE 18

CMU 15-721 (Spring 2018)

BEGIN END POINTER ATTR1 ATTR2

H EKATO N : O PERATIO N S

6

10 20 John $100 Txn25

John $130 20 John $110

Txn25

BEGIN @ 25 INDEX Update "John" Read "John" BEGIN @ 30 Read "John" Update "John"

slide-19
SLIDE 19

CMU 15-721 (Spring 2018)

BEGIN END POINTER ATTR1 ATTR2

H EKATO N : O PERATIO N S

6

10 20 John $100 Txn25

John $130 20 John $110

Txn25

BEGIN @ 25 INDEX Update "John" Read "John" BEGIN @ 30 Read "John" Update "John"

slide-20
SLIDE 20

CMU 15-721 (Spring 2018)

H EKATO N : TRAN SACTIO N STATE M AP

Global map of all txns’ states in the system:

→ ACTIVE: The txn is executing read/write operations. → VALIDATING: The txn has invoked commit and the DBMS is checking whether it is valid. → COMMITTED: The txn is finished, but may have not updated its versions’ TS. → TERMINATED: The txn has updated the TS for all of the versions that it created.

7

slide-21
SLIDE 21

CMU 15-721 (Spring 2018)

H EKATO N : TRAN SACTIO N M ETA- DATA

Read Set

→ Pointers to every version read.

Write Set

→ Pointers to versions updated (old and new), versions deleted (old), and version inserted (new).

Scan Set

→ Stores enough information needed to perform each scan

  • peration.

Commit Dependencies

→ List of txns that are waiting for this txn to finish.

8

slide-22
SLIDE 22

CMU 15-721 (Spring 2018)

H EKATO N : TRAN SACTIO N VALIDATIO N

Read Stability

→ Check that each version read is still visible as of the end

  • f the txn.

Phantom Avoidance

→ Repeat each scan to check whether new versions have become visible since the txn began.

Extent of validation depends on isolation level:

→ SERIALIZABLE: Read Stability + Phantom Avoidance → REPEATABLE READS: Read Stability → SNAPSHOT ISOLATION: None → READ COMMITTED: None

9

slide-23
SLIDE 23

CMU 15-721 (Spring 2018)

H EKATO N : O PTIM ISTIC VS. PESSIM ISTIC

Optimistic Txns:

→ Check whether a version read is still visible at the end of the txn. → Repeat all index scans to check for phantoms.

Pessimistic Txns:

→ Use shared & exclusive locks on records and buckets. → No validation is needed. → Separate background thread to detect deadlocks.

10

slide-24
SLIDE 24

CMU 15-721 (Spring 2018)

H EKATO N : O PTIM ISTIC VS. PESSIM ISTIC

11

0.5 1 1.5 2

6 12 18 24

Throughput (txn/sec)

Millions

# Threads Optimistic Pessimistic

Source: Paul Larson

Database: Single table with 1000 tuples Workload: 80% read-only txns + 20% update txns Processor: 2 sockets, 12 cores

slide-25
SLIDE 25

CMU 15-721 (Spring 2018)

H EKATO N : LESSO N S

Use only lock-free data structures

→ No latches, spin locks, or critical sections → Indexes, txn map, memory alloc, garbage collector → We will discuss Bw-Trees + Skip Lists later…

Only one single serialization point in the DBMS to get the txn’s begin and commit timestamp

→ Atomic Addition (CAS)

12

slide-26
SLIDE 26

CMU 15-721 (Spring 2018)

O BSERVATIO N S

Read/scan set validations are expensive if the txns access a lot of data. Appending new versions hurts the performance of OLAP scans due to pointer chasing & branching. Record-level conflict checks may be too coarse- grained and incur false positives.

13

slide-27
SLIDE 27

CMU 15-721 (Spring 2018)

H YPER M VCC

Column-store with delta record versioning.

→ In-Place updates for non-indexed attributes → Delete/Insert updates for indexed attributes. → Newest-to-Oldest Version Chains → No Predicate Locks / No Scan Checks

Avoids write-write conflicts by aborting txns that try to update an uncommitted object. Designed for HTAP workloads.

14

FAST SERIALIZABLE MULTI- VERSION CONCURRENCY CONTROL FOR MAIN- MEMORY DATABASE SYSTEMS

SIGMOD 2015

slide-28
SLIDE 28

CMU 15-721 (Spring 2018)

H YPER: STO RAGE ARCH ITECTURE

15

Delta Storage (Per Txn) Main Data Table

ATTR1

Tupac IceT B.I.G DrDre

ATTR2

$100 $200 $150 $99

Version Vector

Ø (ATTR2→$122) Txn #2 (ATTR2→$199) Txn #1 (ATTR2→$100) Txn #3 (ATTR2→$139)

slide-29
SLIDE 29

CMU 15-721 (Spring 2018)

H YPER: VALIDATIO N

First-Writer Wins

→ The version vector always points to the last committed version. → Do not need to check whether write-sets overlap.

Check the undo buffers (i.e., delta records) of txns that committed after the validating txn started.

→ Compare the committed txn's write set for phantoms using Precision Locking. → Only need to store the txn's read predicates and not its entire read set.

16

slide-30
SLIDE 30

CMU 15-721 (Spring 2018)

H YPER: PRECISIO N LO CKIN G

17

Validating Txn

SELECT * FROM foo WHERE attr2 > 20 AND attr2 < 30 SELECT COUNT(attr1) FROM foo WHERE attr2 IN (10,20,30)

Delta Storage (Per Txn)

Txn #1003 (ATTR1→'IceCube', ATTR2→199)

SELECT attr1, AVG(attr2) FROM foo WHERE attr1 LIKE '%Ice%' GROUP BY attr1 HAVING AVG(attr2) > 100

Txn #1002 (ATTR2→122) Txn #1001 (ATTR2→99) (ATTR2→33)

99>20 AND 99<30

FALSE

slide-31
SLIDE 31

CMU 15-721 (Spring 2018)

H YPER: PRECISIO N LO CKIN G

17

Validating Txn

SELECT * FROM foo WHERE attr2 > 20 AND attr2 < 30 SELECT COUNT(attr1) FROM foo WHERE attr2 IN (10,20,30)

Delta Storage (Per Txn)

Txn #1003 (ATTR1→'IceCube', ATTR2→199)

SELECT attr1, AVG(attr2) FROM foo WHERE attr1 LIKE '%Ice%' GROUP BY attr1 HAVING AVG(attr2) > 100

Txn #1002 (ATTR2→122) Txn #1001 (ATTR2→99) (ATTR2→33) FALSE

slide-32
SLIDE 32

CMU 15-721 (Spring 2018)

H YPER: PRECISIO N LO CKIN G

17

Validating Txn

SELECT * FROM foo WHERE attr2 > 20 AND attr2 < 30 SELECT COUNT(attr1) FROM foo WHERE attr2 IN (10,20,30)

Delta Storage (Per Txn)

Txn #1003 (ATTR1→'IceCube', ATTR2→199)

SELECT attr1, AVG(attr2) FROM foo WHERE attr1 LIKE '%Ice%' GROUP BY attr1 HAVING AVG(attr2) > 100

Txn #1002 (ATTR2→122) Txn #1001 (ATTR2→99) (ATTR2→33) FALSE

slide-33
SLIDE 33

CMU 15-721 (Spring 2018)

H YPER: PRECISIO N LO CKIN G

17

Validating Txn

SELECT * FROM foo WHERE attr2 > 20 AND attr2 < 30 SELECT COUNT(attr1) FROM foo WHERE attr2 IN (10,20,30)

Delta Storage (Per Txn)

Txn #1003 (ATTR1→'IceCube', ATTR2→199)

SELECT attr1, AVG(attr2) FROM foo WHERE attr1 LIKE '%Ice%' GROUP BY attr1 HAVING AVG(attr2) > 100

Txn #1002 (ATTR2→122) Txn #1001 (ATTR2→99) (ATTR2→33)

slide-34
SLIDE 34

CMU 15-721 (Spring 2018)

H YPER: PRECISIO N LO CKIN G

17

Validating Txn

SELECT * FROM foo WHERE attr2 > 20 AND attr2 < 30 SELECT COUNT(attr1) FROM foo WHERE attr2 IN (10,20,30)

Delta Storage (Per Txn)

Txn #1003 (ATTR1→'IceCube', ATTR2→199)

SELECT attr1, AVG(attr2) FROM foo WHERE attr1 LIKE '%Ice%' GROUP BY attr1 HAVING AVG(attr2) > 100

Txn #1002 (ATTR2→122) Txn #1001 (ATTR2→99) (ATTR2→33)

'Ice Cube' LIKE '%Ice%'

TRUE

slide-35
SLIDE 35

CMU 15-721 (Spring 2018)

H YPER: VERSIO N SYN O PSES

Store a separate column that tracks the position of the first and last versioned tuple in a block of tuples. When scanning tuples, the DBMS can check for strides of tuples without older versions and execute more efficiently.

18

Main Data Table

ATTR1

Tupac IceT B.I.G DrDre

ATTR2

$100 $200 $150 $99

Version Vector

Ø Ø Ø RZA GZA ODB $300 $300 $0 Ø Ø

Version Synopsis

[2,5)

slide-36
SLIDE 36

CMU 15-721 (Spring 2018)

H YPER: VERSIO N SYN O PSES

Store a separate column that tracks the position of the first and last versioned tuple in a block of tuples. When scanning tuples, the DBMS can check for strides of tuples without older versions and execute more efficiently.

18

Main Data Table

ATTR1

Tupac IceT B.I.G DrDre

ATTR2

$100 $200 $150 $99

Version Vector

Ø Ø Ø RZA GZA ODB $300 $300 $0 Ø Ø

Version Synopsis

[2,5)

1 2 3 4 5 6

Offsets

slide-37
SLIDE 37

CMU 15-721 (Spring 2018)

H YPER: VERSIO N SYN O PSES

Store a separate column that tracks the position of the first and last versioned tuple in a block of tuples. When scanning tuples, the DBMS can check for strides of tuples without older versions and execute more efficiently.

18

Main Data Table

ATTR1

Tupac IceT B.I.G DrDre

ATTR2

$100 $200 $150 $99

Version Vector

Ø Ø Ø RZA GZA ODB $300 $300 $0 Ø Ø

Version Synopsis

[2,5)

slide-38
SLIDE 38

CMU 15-721 (Spring 2018)

H YPER: VERSIO N SYN O PSES

Store a separate column that tracks the position of the first and last versioned tuple in a block of tuples. When scanning tuples, the DBMS can check for strides of tuples without older versions and execute more efficiently.

18

Main Data Table

ATTR1

Tupac IceT B.I.G DrDre

ATTR2

$100 $200 $150 $99

Version Vector

Ø Ø Ø RZA GZA ODB $300 $300 $0 Ø Ø

Version Synopsis

[2,5)

slide-39
SLIDE 39

CMU 15-721 (Spring 2018)

CM U CICADA

In-memory OLTP engine based on optimistic MVCC with append-only storage (N2O).

→ Best-effort Inlining → Loosely Synchronized Clocks → Contention-Aware Validation → Index Nodes Stored in Tables

Designed to be scalable for both low- and high- contention workloads.

19

CICADA: DEPENDABLY FAST MULTI- CORE IN- MEMORY TRANSACTIONS

SIGMOD 2017

slide-40
SLIDE 40

CMU 15-721 (Spring 2018)

Record Meta-data

CICADA: BEST- EFFO RT IN LIN IN G

Record meta-data is stored in a fixed location. Threads will attempt to inline read-mostly version within this meta-data to reduce version chain traversals.

20

POINTER LATEST VERSION

EMPTY

KEY VALUE

XXX $111

POINTER KEY VALUE

YYY $222

POINTER

slide-41
SLIDE 41

CMU 15-721 (Spring 2018)

CICADA: FAST VALIDATIO N

Contention-aware Validation

→ Validate access to recently modified records first.

Early Consistency Check

→ Pre-validate access set before making global writes.

Incremental Version Search

→ Resume from last search location in version list.

21

Source: Hyeontaek Lim

Skip if all recent txns committed successfully.

slide-42
SLIDE 42

CMU 15-721 (Spring 2018)

CICADA: IN DEX STO RAGE

22

Index Node Table

NODE DATA

A1

Keys→[100,200] Pointers→[B,C]

POINTER

B2

Keys→[50,70] Pointers→[D,E]

E3

Keys→[10,30] Pointers→[RID,RID]

Ø B1

Keys→[52,70] Pointers→[D,E]

Ø

Index

A B C D E F G

E2

Keys→[11,30] Pointers→[RID,RID]

E1

Keys→[12,30] Pointers→[RID,RID]

slide-43
SLIDE 43

CMU 15-721 (Spring 2018)

CICADA: IN DEX STO RAGE

22

Index Node Table

NODE DATA

A1

Keys→[100,200] Pointers→[B,C]

POINTER

B2

Keys→[50,70] Pointers→[D,E]

E3

Keys→[10,30] Pointers→[RID,RID]

Ø B1

Keys→[52,70] Pointers→[D,E]

Ø

Index

A B C D E F G

E2

Keys→[11,30] Pointers→[RID,RID]

E1

Keys→[12,30] Pointers→[RID,RID]

slide-44
SLIDE 44

CMU 15-721 (Spring 2018)

CICADA: LOW CO N TEN TIO N

23

10 20 30 40 50

6 12 18 24

Throughput (txn/sec)

Millions

# Threads 2PL Silo Silo' TicToc FOEDUS Hekaton ERMIA Cicada

Workload: YCSB (95% read / 5% write) - 1 op per txn

Source: Hyeontaek Lim

slide-45
SLIDE 45

CMU 15-721 (Spring 2018)

CICADA: H IGH CO N TEN TIO N

24

0.11 0.22 0.33

6 12 18 24

Throughput (txn/sec)

Millions

# Threads 2PL Silo Silo' TicToc FOEDUS Hekaton ERMIA Cicada

Workload: TPC-C (1 Warehouse)

Source: Hyeontaek Lim

slide-46
SLIDE 46

CMU 15-721 (Spring 2018)

PARTIN G TH O UGH TS

There are different ways to check for phantoms in

  • MVCC. We will see more "traditional" ways next

lecture. Andy considers HyPer and Cicada to be state-of- the-art as of January 2018.

25

slide-47
SLIDE 47

CMU 15-721 (Spring 2018)

N EXT CLASS

Index Locking + Latching

26