15-721 DATABASE SYSTEMS [Source] Lecture #04 Concurrency Control - - PowerPoint PPT Presentation

15 721
SMART_READER_LITE
LIVE PREVIEW

15-721 DATABASE SYSTEMS [Source] Lecture #04 Concurrency Control - - PowerPoint PPT Presentation

15-721 DATABASE SYSTEMS [Source] Lecture #04 Concurrency Control Part II Andy Pavlo / / Carnegie Mellon University / / Spring 2016 2 TODAYS AGENDA Isolation Levels Modern Multi-Version Concurrency Control CMU 15-721 (Spring 2016)


slide-1
SLIDE 1

Andy Pavlo / / Carnegie Mellon University / / Spring 2016

Lecture #04 – Concurrency Control Part II

DATABASE SYSTEMS

15-721

[Source]
slide-2
SLIDE 2

CMU 15-721 (Spring 2016)

TODAY’S AGENDA

Isolation Levels Modern Multi-Version Concurrency Control

2

slide-3
SLIDE 3

CMU 15-721 (Spring 2016)

OBSERVATION

Serializability is useful because it allows programmers to ignore concurrency issues but enforcing it may allow too little parallelism and limit performance. We may want to use a weaker level of consistency to improve scalability.

3

slide-4
SLIDE 4

CMU 15-721 (Spring 2016)

ISOLATION LEVELS

Controls the extent that a txn is exposed to the actions of other concurrent txns. Provides for greater concurrency at the cost of exposing txns to uncommitted changes:

→ Dirty Read Anomaly → Unrepeatable Reads Anomaly → Phantom Reads Anomaly

4

slide-5
SLIDE 5

CMU 15-721 (Spring 2016)

ANSI ISOLATION LEVELS

SERIALIZABLE

→ No phantoms, all reads repeatable, no dirty reads.

REPEATABLE READS

→ Phantoms may happen.

READ COMMITTED

→ Phantoms and unrepeatable reads may happen.

READ UNCOMMITTED

→ All of them may happen.

5

Isolation (High→Low)

slide-6
SLIDE 6

CMU 15-721 (Spring 2016)

ISOLATION LEVEL HIERARCHY

6

REPEATABLE READS READ UNCOMMITTED SERIALIZABLE READ COMMITTED

slide-7
SLIDE 7

CMU 15-721 (Spring 2016)

ANSI ISOLATION LEVELS

7

Default Maximum

Actian Ingres 10.0/10S SERIALIZABLE SERIALIZABLE Greenplum 4.1 READ COMMITTED SERIALIZABLE MySQL 5.6 REPEATABLE READS SERIALIZABLE MemSQL 1b READ COMMITTED READ COMMITTED MS SQL Server 2012 READ COMMITTED SERIALIZABLE Oracle 11g READ COMMITTED SNAPSHOT ISOLATION Postgres 9.2.2 READ COMMITTED SERIALIZABLE SAP HANA READ COMMITTED SERIALIZABLE ScaleDB 1.02 READ COMMITTED READ COMMITTED VoltDB SERIALIZABLE SERIALIZABLE

Source: Peter Bailis

slide-8
SLIDE 8

CMU 15-721 (Spring 2016)

ANSI ISOLATION LEVELS

7

Default Maximum

Actian Ingres 10.0/10S SERIALIZABLE SERIALIZABLE Greenplum 4.1 READ COMMITTED SERIALIZABLE MySQL 5.6 REPEATABLE READS SERIALIZABLE MemSQL 1b READ COMMITTED READ COMMITTED MS SQL Server 2012 READ COMMITTED SERIALIZABLE Oracle 11g READ COMMITTED SNAPSHOT ISOLATION Postgres 9.2.2 READ COMMITTED SERIALIZABLE SAP HANA READ COMMITTED SERIALIZABLE ScaleDB 1.02 READ COMMITTED READ COMMITTED VoltDB SERIALIZABLE SERIALIZABLE

Source: Peter Bailis

slide-9
SLIDE 9

CMU 15-721 (Spring 2016)

CRITICISM OF ISOLATION LEVELS

The isolation levels defined as part of SQL-92 standard only focused on anomalies that can

  • ccur in a 2PL-based DBMS.

Two additional isolation levels:

→ CURSOR STABILITY → SNAPSHOT ISOLATION

8

A CRITIQUE OF ANSI SQL ISOLATION LEVELS SIGMOD 1995

slide-10
SLIDE 10

CMU 15-721 (Spring 2016)

CURSOR STABILITY (CS)

The DBMS’s internal cursor maintains a lock

  • n a item in the database until it moves on to

the next item. CS is a stronger isolation level in between REPEATABLE READS and READ COMMITTED that can (sometimes) prevent the Lost Update Anomaly.

9

slide-11
SLIDE 11

CMU 15-721 (Spring 2016)

LOST UPDATE ANOMALY

10

Txn #2

BEGIN COMMIT

WRITE(A)

Txn #1

BEGIN COMMIT

READ(A) WRITE(A)

  • • •
  • • •
  • • • • •
slide-12
SLIDE 12

CMU 15-721 (Spring 2016)

LOST UPDATE ANOMALY

10

Txn #2

BEGIN COMMIT

WRITE(A)

Txn #1

BEGIN COMMIT

READ(A) WRITE(A)

  • • •
  • • •
  • • • • •
slide-13
SLIDE 13

CMU 15-721 (Spring 2016)

LOST UPDATE ANOMALY

10

Txn #2

BEGIN COMMIT

WRITE(A)

Txn #1

BEGIN COMMIT

READ(A) WRITE(A)

  • • •
  • • •
  • • • • •
slide-14
SLIDE 14

CMU 15-721 (Spring 2016)

LOST UPDATE ANOMALY

10

Txn #2

BEGIN COMMIT

WRITE(A)

Txn #1

BEGIN COMMIT

READ(A) WRITE(A)

  • • •
  • • •
  • • • • •
slide-15
SLIDE 15

CMU 15-721 (Spring 2016)

LOST UPDATE ANOMALY

10

Txn #2

BEGIN COMMIT

WRITE(A)

Txn #1

BEGIN COMMIT

READ(A) WRITE(A)

  • • •
  • • •
  • • • • •
slide-16
SLIDE 16

CMU 15-721 (Spring 2016)

LOST UPDATE ANOMALY

10

Txn #2

BEGIN COMMIT

WRITE(A)

Txn #1

BEGIN COMMIT

READ(A) WRITE(A)

  • • •
  • • •
  • • • • •
slide-17
SLIDE 17

CMU 15-721 (Spring 2016)

LOST UPDATE ANOMALY

Txn #2’s write to A will be lost even though it commits after Txn #1.

10

Txn #2

BEGIN COMMIT

WRITE(A)

Txn #1

BEGIN COMMIT

READ(A) WRITE(A)

  • • •
  • • •
  • • • • •
slide-18
SLIDE 18

CMU 15-721 (Spring 2016)

LOST UPDATE ANOMALY

Txn #2’s write to A will be lost even though it commits after Txn #1.

10

Txn #2

BEGIN COMMIT

WRITE(A)

Txn #1

BEGIN COMMIT

READ(A) WRITE(A)

  • • •
  • • •
  • • • • •

A cursor lock on A would prevent this problem (but not always).

slide-19
SLIDE 19

CMU 15-721 (Spring 2016)

SNAPSHOT ISOLATION (SI)

Guarantees that all reads made in a txn see a consistent snapshot of the database that existed at the time the txn started.

→ A txn will commit under SI only if its writes do not conflict with any concurrent updates made since that snapshot.

SI is susceptible to the Write Skew Anomaly

11

slide-20
SLIDE 20

CMU 15-721 (Spring 2016)

WRITE SKEW ANOMALY

12

slide-21
SLIDE 21

CMU 15-721 (Spring 2016)

WRITE SKEW ANOMALY

12

slide-22
SLIDE 22

CMU 15-721 (Spring 2016)

WRITE SKEW ANOMALY

12

Txn #1

Change white marbles to black.

Txn #2

Change black marbles to white.

slide-23
SLIDE 23

CMU 15-721 (Spring 2016)

WRITE SKEW ANOMALY

12

Txn #1

Change white marbles to black.

Txn #2

Change black marbles to white.

slide-24
SLIDE 24

CMU 15-721 (Spring 2016)

WRITE SKEW ANOMALY

12

Txn #1

Change white marbles to black.

Txn #2

Change black marbles to white.

slide-25
SLIDE 25

CMU 15-721 (Spring 2016)

WRITE SKEW ANOMALY

12

Txn #1

Change white marbles to black.

Txn #2

Change black marbles to white.

slide-26
SLIDE 26

CMU 15-721 (Spring 2016)

WRITE SKEW ANOMALY

12

Txn #1

Change white marbles to black.

Txn #2

Change black marbles to white.

slide-27
SLIDE 27

CMU 15-721 (Spring 2016)

WRITE SKEW ANOMALY

12

slide-28
SLIDE 28

CMU 15-721 (Spring 2016)

WRITE SKEW ANOMALY

12

Txn #1

Change white marbles to black.

Txn #2

Change black marbles to white.

slide-29
SLIDE 29

CMU 15-721 (Spring 2016)

ISOLATION LEVEL HIERARCHY

13

REPEATABLE READS SNAPSHOT ISOLATION READ UNCOMMITTED CURSOR STABILITY SERIALIZABLE READ COMMITTED

slide-30
SLIDE 30

CMU 15-721 (Spring 2016)

MULTI-VERSION CONCURRENCY CONTROL

Timestamp-ordering scheme that maintains multiple versions of database objects:

→ When a txn writes to an object, the DBMS creates a new version of that object. → When a txn reads an object, it reads the newest version that existed when the txn started.

First proposed in 1978 MIT PhD dissertation.

14

slide-31
SLIDE 31

CMU 15-721 (Spring 2016)

MULTI-VERSION CONCURRENCY CONTROL

15

Txn #1

BEGIN COMMIT

READ(A) WRITE(B) WRITE(A)

  • • • •
  • • •
slide-32
SLIDE 32

CMU 15-721 (Spring 2016)

MULTI-VERSION CONCURRENCY CONTROL

15

Txn #1

BEGIN COMMIT

READ(A) WRITE(B) WRITE(A)

  • • • •
  • • •

Record Write Timestamp

A1 10000 B1 10000

slide-33
SLIDE 33

CMU 15-721 (Spring 2016)

MULTI-VERSION CONCURRENCY CONTROL

15

Txn #1

BEGIN COMMIT

READ(A) WRITE(B) WRITE(A)

  • • • •
  • • •

Record Write Timestamp

A1 10000 B1 10000

10001

slide-34
SLIDE 34

CMU 15-721 (Spring 2016)

MULTI-VERSION CONCURRENCY CONTROL

15

Txn #1

BEGIN COMMIT

READ(A) WRITE(B) WRITE(A)

  • • • •
  • • •

Record Write Timestamp

A1 10000 B1 10000

10001

slide-35
SLIDE 35

CMU 15-721 (Spring 2016)

MULTI-VERSION CONCURRENCY CONTROL

15

Txn #1

BEGIN COMMIT

READ(A) WRITE(B) WRITE(A)

  • • • •
  • • •

Record Write Timestamp

A1 10000 B1 10000

10001

slide-36
SLIDE 36

CMU 15-721 (Spring 2016)

MULTI-VERSION CONCURRENCY CONTROL

15

Txn #1

BEGIN COMMIT

READ(A) WRITE(B) WRITE(A)

  • • • •
  • • •

Record Write Timestamp

A1 10000 B1 10000 10001 B2

10001

slide-37
SLIDE 37

CMU 15-721 (Spring 2016)

MULTI-VERSION CONCURRENCY CONTROL

15

Txn #1

BEGIN COMMIT

READ(A) WRITE(B) WRITE(A)

  • • • •
  • • •

Record Write Timestamp

A1 10000 B1 10000 10001 B2

10001

slide-38
SLIDE 38

CMU 15-721 (Spring 2016)

MULTI-VERSION CONCURRENCY CONTROL

15

Txn #1

BEGIN COMMIT

READ(A) WRITE(B) WRITE(A)

  • • • •
  • • •

Record Write Timestamp

A1 10000 B1 10000 10001 B2

10001

slide-39
SLIDE 39

CMU 15-721 (Spring 2016)

MULTI-VERSION CONCURRENCY CONTROL

15

Txn #1

BEGIN COMMIT

READ(A) WRITE(B) WRITE(A)

  • • • •
  • • •

Record Write Timestamp

A1 10000 B1 10000 10001 B2 10003 A2

10001

slide-40
SLIDE 40

CMU 15-721 (Spring 2016)

MULTI-VERSION CONCURRENCY CONTROL

15

Txn #1

BEGIN COMMIT

READ(A) WRITE(B) WRITE(A)

  • • • •
  • • •

Record Write Timestamp

A1 10000 B1 10000 10001 B2 10003 A2

10001

slide-41
SLIDE 41

CMU 15-721 (Spring 2016)

MULTI-VERSION CONCURRENCY CONTROL

15

Txn #1

BEGIN COMMIT

READ(A) WRITE(B) WRITE(A)

  • • • •
  • • •

Record Write Timestamp

A1 10000 B1 10000 10001 B2 10003 A2

10001

slide-42
SLIDE 42

CMU 15-721 (Spring 2016)

MODERN MVCC

Microsoft Hekaton (SQL Server) TUM HyPer HPI HYRISE SAP HANA

16

slide-43
SLIDE 43

CMU 15-721 (Spring 2016)

MICROSOFT HEKATON

Incubator project started in 2008 to create new OLTP engine for MSFT SQL Server (MSSQL).

→ Led by DB ballers Paul Larson and Mike Zwilling

Had to integrate with MSSQL ecosystem. Had to support all possible OLTP workloads with predictable performance.

→ Single-threaded partitioning (e.g., H-Store) works well for some applications but terrible for others.

17

slide-44
SLIDE 44

CMU 15-721 (Spring 2016)

HEKATON MVCC

Every txn is assigned a timestamp (TS) when they begin and when they commit. DBMS maintains “chain” of versions per tuple:

→ BEGIN: The BeginTS of the active txn or the EndTS of the committed txn that created it. → END: The BeginTS of the active txn that created the next version or infinity or the EndTS of the committed txn that created it. → POINTER: Location of the next version in the chain.

18

HIGH-PERFORMANCE CONCURRENCY CONTROL MECHANISMS FOR MAIN- MEMORY DATABASES VLDB 2011

slide-45
SLIDE 45

CMU 15-721 (Spring 2016)

BEGIN END POINTER ATTR1 ATTR2

HEKATON: OPERATIONS

19

10 20 John $100 20 John $110

INDEX

slide-46
SLIDE 46

CMU 15-721 (Spring 2016)

BEGIN END POINTER ATTR1 ATTR2

HEKATON: OPERATIONS

19

10 20 John $100 20 John $110

BEGIN @ 25 INDEX

slide-47
SLIDE 47

CMU 15-721 (Spring 2016)

BEGIN END POINTER ATTR1 ATTR2

HEKATON: OPERATIONS

19

10 20 John $100 20 John $110

BEGIN @ 25 INDEX Read “John”

slide-48
SLIDE 48

CMU 15-721 (Spring 2016)

BEGIN END POINTER ATTR1 ATTR2

HEKATON: OPERATIONS

19

10 20 John $100 20 John $110

BEGIN @ 25 INDEX Read “John”

slide-49
SLIDE 49

CMU 15-721 (Spring 2016)

BEGIN END POINTER ATTR1 ATTR2

HEKATON: OPERATIONS

19

10 20 John $100 20 John $110

BEGIN @ 25 INDEX Read “John”

slide-50
SLIDE 50

CMU 15-721 (Spring 2016)

BEGIN END POINTER ATTR1 ATTR2

HEKATON: OPERATIONS

19

10 20 John $100 20 John $110

BEGIN @ 25 INDEX Read “John”

slide-51
SLIDE 51

CMU 15-721 (Spring 2016)

BEGIN END POINTER ATTR1 ATTR2

HEKATON: OPERATIONS

19

10 20 John $100 20 John $110

BEGIN @ 25 INDEX Update “John” Read “John”

slide-52
SLIDE 52

CMU 15-721 (Spring 2016)

BEGIN END POINTER ATTR1 ATTR2

HEKATON: OPERATIONS

19

10 20 John $100 20 John $110

BEGIN @ 25 INDEX Update “John” Read “John”

slide-53
SLIDE 53

CMU 15-721 (Spring 2016)

BEGIN END POINTER ATTR1 ATTR2

HEKATON: OPERATIONS

19

10 20 John $100 20 John $110

BEGIN @ 25 INDEX Update “John” Read “John”

slide-54
SLIDE 54

CMU 15-721 (Spring 2016)

BEGIN END POINTER ATTR1 ATTR2

HEKATON: OPERATIONS

19

10 20 John $100 20 John $110

Txn25

BEGIN @ 25 INDEX Update “John” Read “John”

slide-55
SLIDE 55

CMU 15-721 (Spring 2016)

BEGIN END POINTER ATTR1 ATTR2

HEKATON: OPERATIONS

19

10 20 John $100 Txn25

John $130 20 John $110

Txn25

BEGIN @ 25 INDEX Update “John” Read “John”

slide-56
SLIDE 56

CMU 15-721 (Spring 2016)

BEGIN END POINTER ATTR1 ATTR2

HEKATON: OPERATIONS

19

10 20 John $100 Txn25

John $130 20 John $110

Txn25

BEGIN @ 25 INDEX Update “John” Read “John”

slide-57
SLIDE 57

CMU 15-721 (Spring 2016)

BEGIN END POINTER ATTR1 ATTR2

HEKATON: OPERATIONS

19

10 20 John $100 Txn25

John $130 20 John $110

Txn25

BEGIN @ 25 INDEX Update “John” Read “John” COMMIT @ 35

slide-58
SLIDE 58

CMU 15-721 (Spring 2016)

BEGIN END POINTER ATTR1 ATTR2

HEKATON: OPERATIONS

19

10 20 John $100 Txn25

John $130 20 John $110

Txn25

BEGIN @ 25 INDEX Update “John” Read “John” COMMIT @ 35

slide-59
SLIDE 59

CMU 15-721 (Spring 2016)

BEGIN END POINTER ATTR1 ATTR2

HEKATON: OPERATIONS

19

10 20 John $100 Txn25

John $130 20 John $110

Txn25

BEGIN @ 25 INDEX Update “John” Read “John” COMMIT @ 35

35 35

slide-60
SLIDE 60

CMU 15-721 (Spring 2016)

BEGIN END POINTER ATTR1 ATTR2

HEKATON: OPERATIONS

19

10 20 John $100 Txn25

John $130 20 John $110

Txn25

BEGIN @ 25 INDEX Update “John” Read “John” COMMIT @ 35

35 35

REWIND

slide-61
SLIDE 61

CMU 15-721 (Spring 2016)

BEGIN END POINTER ATTR1 ATTR2

HEKATON: OPERATIONS

20

10 20 John $100 Txn25

John $130 20 John $110

Txn25

BEGIN @ 25 INDEX Update “John” Read “John”

slide-62
SLIDE 62

CMU 15-721 (Spring 2016)

BEGIN END POINTER ATTR1 ATTR2

HEKATON: OPERATIONS

20

10 20 John $100 Txn25

John $130 20 John $110

Txn25

BEGIN @ 25 INDEX Update “John” Read “John” BEGIN @ 30

slide-63
SLIDE 63

CMU 15-721 (Spring 2016)

BEGIN END POINTER ATTR1 ATTR2

HEKATON: OPERATIONS

20

10 20 John $100 Txn25

John $130 20 John $110

Txn25

BEGIN @ 25 INDEX Update “John” Read “John” BEGIN @ 30 Read “John”

slide-64
SLIDE 64

CMU 15-721 (Spring 2016)

BEGIN END POINTER ATTR1 ATTR2

HEKATON: OPERATIONS

20

10 20 John $100 Txn25

John $130 20 John $110

Txn25

BEGIN @ 25 INDEX Update “John” Read “John” BEGIN @ 30 Read “John”

slide-65
SLIDE 65

CMU 15-721 (Spring 2016)

BEGIN END POINTER ATTR1 ATTR2

HEKATON: OPERATIONS

20

10 20 John $100 Txn25

John $130 20 John $110

Txn25

BEGIN @ 25 INDEX Update “John” Read “John” BEGIN @ 30 Read “John”

slide-66
SLIDE 66

CMU 15-721 (Spring 2016)

BEGIN END POINTER ATTR1 ATTR2

HEKATON: OPERATIONS

20

10 20 John $100 Txn25

John $130 20 John $110

Txn25

BEGIN @ 25 INDEX Update “John” Read “John” BEGIN @ 30 Read “John” Update “John”

slide-67
SLIDE 67

CMU 15-721 (Spring 2016)

BEGIN END POINTER ATTR1 ATTR2

HEKATON: OPERATIONS

20

10 20 John $100 Txn25

John $130 20 John $110

Txn25

BEGIN @ 25 INDEX Update “John” Read “John” BEGIN @ 30 Read “John” Update “John”

slide-68
SLIDE 68

CMU 15-721 (Spring 2016)

BEGIN END POINTER ATTR1 ATTR2

HEKATON: OPERATIONS

20

10 20 John $100 Txn25

John $130 20 John $110

Txn25

BEGIN @ 25 INDEX Update “John” Read “John” BEGIN @ 30 Read “John” Update “John”

slide-69
SLIDE 69

CMU 15-721 (Spring 2016)

HEKATON: TRANSACTION STATE MAP

Global map of all txns’ states in the system:

→ ACTIVE: The txn is executing read/write operations. → VALIDATING: The txn has invoked commit and the DBMS is checking whether it is valid. → COMMITTED: The txn is finished, but may have not updated its versions’ TS. → TERMINATED: The txn has updated the TS for all of the versions that it created.

21

slide-70
SLIDE 70

CMU 15-721 (Spring 2016)

HEKATON: TRANSACTION LIFECYCLE

22

Txn events Txn phases

Source: Paul Larson

slide-71
SLIDE 71

CMU 15-721 (Spring 2016)

HEKATON: TRANSACTION LIFECYCLE

22

Get txn start timestamp, set state to ACTIVE

Begin Txn events Txn phases

Source: Paul Larson

slide-72
SLIDE 72

CMU 15-721 (Spring 2016)

HEKATON: TRANSACTION LIFECYCLE

22

Get txn start timestamp, set state to ACTIVE

Begin Normal processing Txn events Txn phases

Source: Paul Larson

Perform normal processing

→ Track txn’s read set, scan set, and write set.

slide-73
SLIDE 73

CMU 15-721 (Spring 2016)

HEKATON: TRANSACTION LIFECYCLE

22

Get txn start timestamp, set state to ACTIVE

Begin Precommit Normal processing Txn events Txn phases

Source: Paul Larson

Perform normal processing

→ Track txn’s read set, scan set, and write set.

Get txn end timestamp, set state to VALIDATING

slide-74
SLIDE 74

CMU 15-721 (Spring 2016)

HEKATON: TRANSACTION LIFECYCLE

22

Get txn start timestamp, set state to ACTIVE

Begin Precommit Normal processing Validation Txn events Txn phases

Source: Paul Larson

Perform normal processing

→ Track txn’s read set, scan set, and write set.

Get txn end timestamp, set state to VALIDATING Validate reads and scans

→ If validation OK, write new versions to redo log

slide-75
SLIDE 75

CMU 15-721 (Spring 2016)

HEKATON: TRANSACTION LIFECYCLE

22

Get txn start timestamp, set state to ACTIVE

Begin Precommit Commit Normal processing Validation Txn events Txn phases

Source: Paul Larson

Perform normal processing

→ Track txn’s read set, scan set, and write set.

Get txn end timestamp, set state to VALIDATING Validate reads and scans

→ If validation OK, write new versions to redo log

Set txn state to COMMITTED

slide-76
SLIDE 76

CMU 15-721 (Spring 2016)

HEKATON: TRANSACTION LIFECYCLE

22

Get txn start timestamp, set state to ACTIVE

Begin Precommit Commit Normal processing Validation Post- processing Txn events Txn phases

Source: Paul Larson

Perform normal processing

→ Track txn’s read set, scan set, and write set.

Get txn end timestamp, set state to VALIDATING Validate reads and scans

→ If validation OK, write new versions to redo log

Set txn state to COMMITTED Fix up version timestamps

→ Begin TS in new versions, end TS in old versions

slide-77
SLIDE 77

CMU 15-721 (Spring 2016)

HEKATON: TRANSACTION LIFECYCLE

22

Get txn start timestamp, set state to ACTIVE

Begin Precommit Commit Terminate Normal processing Validation Post- processing Txn events Txn phases

Source: Paul Larson

Perform normal processing

→ Track txn’s read set, scan set, and write set.

Get txn end timestamp, set state to VALIDATING Validate reads and scans

→ If validation OK, write new versions to redo log

Set txn state to COMMITTED Fix up version timestamps

→ Begin TS in new versions, end TS in old versions

Set txn state to TERMINATED Remove from txn map

slide-78
SLIDE 78

CMU 15-721 (Spring 2016)

HEKATON: TRANSACTION META-DATA

Read Set

→ Pointers to every version read.

Write Set

→ Pointers to versions updated (old and new), versions deleted (old), and version inserted (new).

Scan Set

→ Stores enough information needed to perform each scan operation.

Commit Dependencies

→ List of txns that are waiting for this txn to finish.

23

slide-79
SLIDE 79

CMU 15-721 (Spring 2016)

HEKATON: TRANSACTION VALIDATION

Read Stability

→ Check that each version read is still visible as of the end of the txn.

Phantom Avoidance

→ Repeat each scan to check whether new versions have become visible since the txn began.

Extent of validation depends on isolation level:

→ SERIALIZABLE: Read Stability + Phantom Avoidance → REPEATABLE READS: Read Stability → SNAPSHOT ISOLATION: None → READ COMMITTED: None

24

slide-80
SLIDE 80

CMU 15-721 (Spring 2016)

HEKATON: OPTIMISTIC VS. PESSIMISTIC

Optimistic Txns:

→ Check whether a version read is still visible at the end of the txn. → Repeat all index scans to check for phantoms.

Pessimistic Txns:

→ Use shared & exclusive locks on records and buckets. → No validation is needed. → Separate background thread to detect deadlocks.

25

slide-81
SLIDE 81

CMU 15-721 (Spring 2016)

HEKATON: OPTIMISTIC VS. PESSIMISTIC

26

0.5 1 1.5 2

6 12 18 24

Throughput (txn/sec)

Millions

# Threads Optimistic Pessimistic

Source: Paul Larson

Database: Single table with 1000 tuples Workload: 80% read-only txns + 20% update txns Processor: 2 sockets, 12 cores

slide-82
SLIDE 82

CMU 15-721 (Spring 2016)

HEKATON: IMPLEMENTATION

Use only lock-free data structures

→ No latches, spin locks, or critical sections → Indexes, txn map, memory alloc, garbage collector → We will discuss Bw-Trees + Skip Lists later…

Only one single serialization point in the DBMS to get the txn’s begin and commit timestamp

→ Atomic Addition (CAS)

27

slide-83
SLIDE 83

CMU 15-721 (Spring 2016)

HEKATON: PERFORMANCE

Bwin – Large online betting company

→ Before: 15,000 requests/sec → Hekaton: 250,000 requests/sec

EdgeNet – Up-to-date inventory status

→ Before: 7,450 rows/sec (ingestion rate) → Hekaton: 126,665 rows/sec

SBI Liquidity Market – FOREX broker

→ Before: 2,812 txn/sec with 4 sec latency → Hekaton: 5,313 txn/sec with <1 sec latency

28

Source: Paul Larson

slide-84
SLIDE 84

CMU 15-721 (Spring 2016)

MVCC DESIGN CHOICES

Version Chains Version Storage Garbage Collection

29

slide-85
SLIDE 85

CMU 15-721 (Spring 2016)

VERSION CHAINS

Approach #1: Oldest-to-Newest

→ Just append new version to end of the chain. → Have to traverse chain on look-ups.

Approach #2: Newest-to-Oldest

→ Have to update index pointers for every new version. → Don’t have to traverse chain on look ups.

The ordering of the chain has different performance trade-offs.

30

slide-86
SLIDE 86

CMU 15-721 (Spring 2016)

VERSION STORAGE

Approach #1: Insert Method

→ New versions are added as new tuples to the table.

Approach #2: Delta Method

→ Copy the current version to a separate storage location and then overwrite it with the new data. → Rollback segment with deltas, Time-travel table

31

slide-87
SLIDE 87

CMU 15-721 (Spring 2016)

ROLLBACK SEGMENTS

32

BEGIN END ATTR1 ATTR2

10 20 John $100

Main Data Table

slide-88
SLIDE 88

CMU 15-721 (Spring 2016)

ROLLBACK SEGMENTS

32

BEGIN END ATTR1 ATTR2

10 20 John $100

Main Data Table

slide-89
SLIDE 89

CMU 15-721 (Spring 2016)

ROLLBACK SEGMENTS

On every update, copy the old version to the rollback segment and overwrite the tuple in the main data table.

32

BEGIN END ATTR1 ATTR2

10 20 John $100

Main Data Table

slide-90
SLIDE 90

CMU 15-721 (Spring 2016)

ROLLBACK SEGMENTS

On every update, copy the old version to the rollback segment and overwrite the tuple in the main data table.

32

Rollback Segment (Per Tuple)

BEGIN END DELTA BEGIN END ATTR1 ATTR2

10 20 John $100

Main Data Table

slide-91
SLIDE 91

CMU 15-721 (Spring 2016)

ROLLBACK SEGMENTS

On every update, copy the old version to the rollback segment and overwrite the tuple in the main data table.

32

Rollback Segment (Per Tuple)

10 20 (ATTR2→$100)

BEGIN END DELTA BEGIN END ATTR1 ATTR2

10 20 John $100

Main Data Table

slide-92
SLIDE 92

CMU 15-721 (Spring 2016)

ROLLBACK SEGMENTS

On every update, copy the old version to the rollback segment and overwrite the tuple in the main data table.

32

Rollback Segment (Per Tuple)

10 20 (ATTR2→$100)

BEGIN END DELTA BEGIN END ATTR1 ATTR2

10 20 John $100

Main Data Table

$110 20 25

slide-93
SLIDE 93

CMU 15-721 (Spring 2016)

ROLLBACK SEGMENTS

On every update, copy the old version to the rollback segment and overwrite the tuple in the main data table.

32

Rollback Segment (Per Tuple)

10 20 (ATTR2→$100)

BEGIN END DELTA

20 25 (ATTR2→$110)

BEGIN END ATTR1 ATTR2

10 20 John $100

Main Data Table

$110 20 25 $130 30 35

slide-94
SLIDE 94

CMU 15-721 (Spring 2016)

ROLLBACK SEGMENTS

On every update, copy the old version to the rollback segment and overwrite the tuple in the main data table.

32

Rollback Segment (Per Tuple)

10 20 (ATTR2→$100)

BEGIN END DELTA

20 25 (ATTR2→$110)

BEGIN END ATTR1 ATTR2

10 20 John $100

Main Data Table

$110 20 25 $130 30 35

Txns can recreate old versions by applying the delta in reverse order.

slide-95
SLIDE 95

CMU 15-721 (Spring 2016)

GARBAGE COLLECTION

Approach #1: Vacuum Thread

→ Use a separate background thread to find old versions and delete them.

Approach #2: Cooperative Threads

→ Worker threads remove old versions that they encounter during scans.

GC overhead depends on read/write ratio

→ Hekaton authors report about a 15% overhead on a write-heavy workload. Typically much less.

33

slide-96
SLIDE 96

CMU 15-721 (Spring 2016)

OBSERVATIONS

Read/scan set validations are expensive if the txns access a lot of data. Appending new versions hurts the performance of OLAP scans due to pointer chasing & branching. Record-level conflict checks may be too coarse- grained and incur false positives.

34

slide-97
SLIDE 97

CMU 15-721 (Spring 2016)

HYPER MVCC

Rollback Segment with Deltas

→ In-Place updates for non-indexed attributes → Delete/Insert updates for indexed attributes.

Newest-to-Oldest Version Chains No Predicate Locks Avoids write-write conflicts by aborting txns that try to update an uncommitted object.

35

FAST SERIALIZABLE MULTI-VERSION CONCURRENCY CONTROL FOR MAIN- MEMORY DATABASE SYSTEMS SIGMOD 2015

slide-98
SLIDE 98

CMU 15-721 (Spring 2016)

HYPER MVCC

36

Rollback Segment (Per Txn) Main Data Table

ATTR1

Tupac IceT B.I.G DrDre

ATTR2

$100 $200 $150 $99

Version Vector

(ATTR2→$122) Txn 263+1 (ATTR2→$199) Txn 123 (ATTR2→$100) Txn 263 (ATTR2→$139)

slide-99
SLIDE 99

CMU 15-721 (Spring 2016)

HYRISE MVCC

Insert Method (no rollback segment) Oldest-to-Newest No garbage collection. All updates are executed as DELETE/INSERT.

37

EFFICIENT TRANSACTION PROCESSING FOR HYRISE IN MIXED WORKLOAD ENVIRONMENTS IMDM 2014

slide-100
SLIDE 100

CMU 15-721 (Spring 2016)

SAP HANA MVCC

Insert Method (no rollback segment) Background GC thread (optional) It’s not clear what else they are doing…

38

HIGH-PERFORMANCE TRANSACTION PROCESSING IN SAP HANA IEEE Data Engineering Bulletin 2013

slide-101
SLIDE 101

CMU 15-721 (Spring 2016)

PARTING THOUGHTS

MVCC is currently the best approach for supporting txns in mixed workoads

→ Readers are not blocked by writers.

HyPer’s MVCC makes a lot of good decisions for HTAP workloads.

39

slide-102
SLIDE 102

CMU 15-721 (Spring 2016)

NEXT CLASS

Stored Procedures Optimistic Concurrency Control

40