Multi-Version Concurrency Control (Garbage Collection)
@ Andy_Pavlo // 15- 721 // Spring 2019
Lecture #05
MVCC GARBAGE COLLECTION
An MVCC DBMS needs to remove reclaimable physical versions from the database over time.
→ No active txn in the DBMS can "see" that version (SI).
→ The version was created by an aborted txn.
The DBMS uses a tuple's version meta-data to decide whether it is visible.
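The two reclaimability conditions above can be sketched as follows. This is a minimal illustration, not any particular system's implementation; the `Version` layout and function names are assumptions for the example.

```python
INF = float("inf")

class Version:
    """A physical version with begin/end timestamps (hypothetical layout)."""
    def __init__(self, begin_ts, end_ts=INF, aborted=False):
        self.begin_ts = begin_ts
        self.end_ts = end_ts      # INF means this is still the newest version
        self.aborted = aborted    # created by a txn that later aborted

def is_visible(version, txn_ts):
    # Under snapshot isolation, a txn sees a version if its timestamp
    # falls inside the version's lifetime [begin_ts, end_ts).
    return (not version.aborted
            and version.begin_ts <= txn_ts < version.end_ts)

def is_reclaimable(version, active_txn_timestamps):
    # Reclaimable if the version came from an aborted txn, or if no
    # active txn can see it.
    if version.aborted:
        return True
    return not any(is_visible(version, ts) for ts in active_txn_timestamps)
```

For example, a version that lived from TS 1 to TS 25 is reclaimable once every active txn began at TS 25 or later.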
OBSERVATION
We have assumed that queries / txns will complete in a short amount of time. This means that the lifetime of an obsolete version is short as well. But HTAP workloads may have long running queries that access old snapshots. Such queries block the traditional garbage collection methods that we have discussed.
PROBLEMS WITH OLD VERSIONS
→ Increased Memory Usage
→ Memory Allocator Contention
→ Longer Version Chains
→ Garbage Collector CPU Spikes
→ Poor Time-based Version Locality

→ MVCC Deletes
→ Indexes with MVCC Tables
→ Garbage Collection
→ Block Compaction
MVCC DELETES
The DBMS physically deletes a tuple from the database only when all versions of a logically deleted tuple are not visible.
→ If a tuple is deleted, then there cannot be a new version of that tuple after the newest version.
→ No write-write conflicts / first-writer wins.
We need a way to denote that a tuple has been logically deleted at some point in time.
Approach #1: Deleted Flag
→ Maintain a flag to indicate that the logical tuple has been deleted after the newest physical version.
→ Can either be in the tuple header or a separate column.
Approach #2: Tombstone Tuple
→ Create an empty physical version to indicate that a logical tuple is deleted.
→ Use a separate pool for tombstone tuples with only a special bit pattern in the version chain pointer to reduce the storage overhead.
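Approach #2 (tombstone tuples) can be sketched as an empty version prepended to the chain. This is an illustrative model, not a real system's layout; `delete_tuple` and `read` are hypothetical helpers.

```python
INF = float("inf")

class Version:
    def __init__(self, value, begin_ts, end_ts=INF, tombstone=False):
        self.value = value
        self.begin_ts = begin_ts
        self.end_ts = end_ts
        self.tombstone = tombstone  # Approach #2: empty "delete marker" version
        self.next = None            # version chain pointer

def delete_tuple(newest, delete_ts):
    """Logically delete by closing the newest version and prepending a tombstone."""
    newest.end_ts = delete_ts
    marker = Version(None, delete_ts, tombstone=True)
    marker.next = newest
    return marker  # new head of the version chain

def read(head, txn_ts):
    """Walk the chain; a visible tombstone means the tuple is logically deleted."""
    v = head
    while v is not None:
        if v.begin_ts <= txn_ts < v.end_ts:
            return None if v.tombstone else v.value
        v = v.next
    return None
```

A reader with an older snapshot still sees the pre-delete version, while a reader after the delete sees nothing.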
MVCC INDEXES
MVCC DBMS indexes (usually) do not store version information about tuples with their keys.
→ Exception: Index-organized tables (e.g., MySQL)
Every index must support duplicate keys from different snapshots:
→ The same key may point to different logical tuples in different snapshots.
MVCC DUPLICATE KEY PROBLEM
[Figure: Thread #1 begins @ 10 and runs READ(A), which resolves to version A1 (BEGIN-TS 1, END-TS ∞). Thread #2 begins @ 20, updates and then deletes A, installing version A2; its commit @ 25 finalizes the version timestamps. Thread #3 begins @ 30 and runs INSERT(A), creating a brand-new tuple under the same key (BEGIN-TS 30, END-TS ∞). The index now holds entries for key A that point to two different logical tuples, so each READ(A) must use version visibility to pick the right one for its snapshot.]
MVCC INDEXES
Each index's underlying data structure has to support the storage of non-unique keys.
Use additional execution logic to perform conditional inserts for pkey / unique indexes.
→ Atomically check whether the key exists and then insert.
Workers may get back multiple entries for a single key. They must then follow the version chain to find the proper physical version.
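The lookup-then-filter behavior can be sketched as below. The `NonUniqueIndex` class and its methods are assumptions for illustration; a real index would do the existence check and insert atomically rather than in plain Python.

```python
from collections import defaultdict

INF = float("inf")

class Version:
    def __init__(self, tuple_id, begin_ts, end_ts=INF):
        self.tuple_id = tuple_id
        self.begin_ts = begin_ts
        self.end_ts = end_ts

class NonUniqueIndex:
    """A single key may map to entries from different snapshots."""
    def __init__(self):
        self.entries = defaultdict(list)

    def insert_if_absent(self, key, version, txn_ts):
        # Conditional insert for unique indexes: check that no entry for
        # this key is visible to the txn, then insert.
        if any(v.begin_ts <= txn_ts < v.end_ts for v in self.entries[key]):
            return False
        self.entries[key].append(version)
        return True

    def lookup(self, key, txn_ts):
        # Workers may get back multiple entries for one key; filter them
        # down to the entry visible in this txn's snapshot.
        return [v for v in self.entries[key]
                if v.begin_ts <= txn_ts < v.end_ts]
```

With an old version of A (TS 1 to 25) and a re-inserted A (TS 30 onward) under the same key, a reader at TS 10 and a reader at TS 35 get different tuples back.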
GC DESIGN DECISIONS
→ Index Clean-up
→ Version Tracking / Identification
→ Granularity
→ Comparison Unit
HYBRID GARBAGE COLLECTION FOR MULTI-VERSION CONCURRENCY CONTROL IN SAP HANA
SIGMOD 2016
GC INDEX CLEAN-UP
The DBMS must remove a tuple's keys from indexes when their corresponding versions are no longer visible to active txns.
Track the txn's modifications to individual indexes to support GC of older versions on commit and removal of its modifications on abort.
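The tracking rule above can be sketched as a per-txn list of index writes, so an abort can undo every key the txn ever added. The `Index` and `Transaction` classes here are illustrative assumptions, not any system's actual API.

```python
class Index:
    def __init__(self):
        self.entries = {}          # key -> tuple id

    def insert(self, key, tuple_id):
        self.entries[key] = tuple_id

    def remove(self, key):
        self.entries.pop(key, None)

class Transaction:
    def __init__(self):
        self.index_writes = []     # (index, key) for every key this txn added

    def insert_key(self, index, key, tuple_id):
        index.insert(key, tuple_id)
        self.index_writes.append((index, key))  # remember it for abort / GC

    def abort(self):
        # Remove every key this txn added, including keys from earlier
        # updates to the same tuple.
        for index, key in self.index_writes:
            index.remove(key)
        self.index_writes.clear()
```

A txn that updates the same tuple three times leaves three keys in the index; tracking all three is what makes the rollback complete.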
PELOTON MISTAKE
If a txn writes to the same tuple more than once, the system only tracked the key from its most recent write. Upon rollback, the DBMS did not know what keys it added to the index in previous versions.
[Figure: Thread #1 begins @ 10 and updates tuple A three times, installing versions A2 (key=222), A3 (key=333), and A4 (key=444) on top of A1 (key=111). When the txn aborts, the intermediate keys are left dangling in the index because the DBMS never recorded inserting them.]
GC VERSION TRACKING
Approach #1: Tuple-level
→ Find old versions by examining tuples directly.
→ Background Vacuuming vs. Cooperative Cleaning
Approach #2: Transaction-level
→ Txns keep track of their old versions so the DBMS does not have to scan tuples to determine visibility.
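Approach #2 can be sketched as below: each txn records the versions it supersedes, and the vacuum only consults those lists. The class and function names are assumptions for the sketch.

```python
INF = float("inf")

class Version:
    def __init__(self, name, begin_ts, end_ts=INF):
        self.name, self.begin_ts, self.end_ts = name, begin_ts, end_ts

class Transaction:
    """Transaction-level tracking: the txn remembers the versions it
    superseded so GC never has to scan tuples."""
    def __init__(self, begin_ts):
        self.begin_ts = begin_ts
        self.old_versions = []
        self.commit_ts = None

    def update(self, old_version):
        self.old_versions.append(old_version)   # e.g., A2, B6

    def commit(self, commit_ts):
        self.commit_ts = commit_ts
        for v in self.old_versions:             # close out the old versions
            v.end_ts = commit_ts

def vacuum(finished_txns, min_active_ts):
    # A txn's old versions are reclaimable once every active txn began
    # after it committed (commit TS < min_active_ts).
    reclaimed = []
    for txn in finished_txns:
        if txn.commit_ts is not None and txn.commit_ts < min_active_ts:
            reclaimed.extend(txn.old_versions)
    return reclaimed
```

With a txn that begins @ 10, supersedes A2 and B6, and commits @ 15, a vacuum running when the oldest active txn is @ 20 reclaims both versions without touching the table.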
[Figure: transaction-level tracking example. Thread #1 begins @ 10 and updates tuples A and B, closing out the old versions A2 (BEGIN-TS 1) and B6 (BEGIN-TS 8) and adding both to its "Old Versions" list. After the txn commits @ 15, the old versions' END-TS is set to 15, and a vacuum pass reclaims every tracked version with TS < 15 without scanning the table.]
GC GRANULARITY
How should the DBMS internally organize the expired versions that it needs to check to determine whether they are reclaimable?
There is a trade-off between the ability to reclaim versions sooner and the computational overhead of tracking them.
Approach #1: Single Version
→ Track the visibility of individual versions and reclaim them separately.
→ More fine-grained control, but higher overhead.
Approach #2: Group Version
→ Organize versions into groups and reclaim all of them together.
→ Less overhead, but may delay reclamations.
Approach #3: Tables
→ Reclaim all versions from a table if the DBMS determines that active txns will never access it.
→ Special case for stored procedures and prepared statements, since it requires the DBMS knowing what tables a txn will access in advance.
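Approach #2 (group version) can be sketched with epoch-based grouping: versions that expire in the same epoch are retired and freed together, trading prompt reclamation for cheaper bookkeeping. The epoch scheme and class below are assumptions for illustration.

```python
from collections import defaultdict

class EpochGC:
    """Group-version sketch: versions expiring in the same fixed-size
    timestamp epoch are tracked and reclaimed together."""
    def __init__(self, epoch_size=10):
        self.epoch_size = epoch_size
        self.groups = defaultdict(list)   # epoch number -> retired versions

    def retire(self, version_id, end_ts):
        self.groups[end_ts // self.epoch_size].append(version_id)

    def reclaim(self, min_active_ts):
        # A whole group is freed once every active txn began after the
        # epoch ended, i.e., no active txn can see anything in it.
        freed = []
        for epoch in list(self.groups):
            if (epoch + 1) * self.epoch_size <= min_active_ts:
                freed.extend(self.groups.pop(epoch))
        return freed
```

Note how a version that expired at TS 12 is not freed until the oldest active txn reaches TS 20: the group boundary delays its reclamation, which is exactly the trade-off the slide describes.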
GC COMPARISON UNIT
How should the DBMS determine whether version(s) are reclaimable?
Examining the list of active txns and reclaimable versions should be latch-free.
→ It is okay if the GC misses a recently committed txn. It will find it in the next round.
Approach #1: Timestamp
→ Use a global minimum timestamp to determine whether versions are safe to reclaim.
→ Easiest to implement and execute.
Approach #2: Interval
→ Excise timestamp ranges that are not visible.
→ More difficult to identify ranges.
[Figure: Thread #1 begins @ 10 and runs READ(A) on version A1 (BEGIN-TS 1). Thread #2 begins @ 20, updates A to create A2, and commits @ 25. Thread #3 begins @ 30, updates A again to create A3, and commits @ 35. The chain is now A1 (TS 1 to 25), A2 (TS 25 to 35), A3 (TS 35 to ∞), with Thread #1 (TS 10) still active.]
Timestamp
→ GC cannot reclaim A2 because the lowest active txn TS (10) is less than its END-TS.
Interval
→ GC can reclaim A2 because no active txn TS intersects the interval [25,35].
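The two comparison units on this example can be sketched as below, with versions as (BEGIN-TS, END-TS) pairs; the function names are illustrative, not any system's API.

```python
def reclaimable_timestamp(version, active_ts):
    # Approach #1: reclaim only versions that died before the oldest
    # active txn began (global minimum timestamp).
    begin, end = version
    return end <= min(active_ts)

def reclaimable_interval(version, active_ts):
    # Approach #2: reclaim a version if no active txn's timestamp falls
    # inside its lifetime interval.
    begin, end = version
    return not any(begin <= ts < end for ts in active_ts)
```

With A2 spanning TS 25 to 35 and the only active txn at TS 10, the timestamp rule refuses to reclaim A2 (10 < 35) while the interval rule reclaims it, since no active timestamp lands inside [25, 35).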
OBSERVATION
If the application deletes a tuple, then what should the DBMS do with the slots occupied by that tuple's versions?
→ Always reuse variable-length data slots.
→ More nuanced for fixed-length data slots.
What if the application deletes many (but not all) tuples in a table in a short amount of time?
MVCC DELETED TUPLES
Approach #1: Reuse Slot
→ Allow workers to insert new tuples in the empty slots.
→ Obvious choice for append-only storage since there is no distinction between versions.
→ Destroys temporal locality of tuples in delta storage.
Approach #2: Leave Slot Unoccupied
→ Workers can only insert new tuples in slots that were not previously occupied.
→ Ensures that tuples in the same block were inserted into the database at around the same time.
→ Need an extra mechanism to fill holes.
BLOCK COMPACTION
Consolidate less-than-full blocks into fewer blocks and then return memory to the OS.
→ Move data using DELETE + INSERT to ensure transactional guarantees during consolidation.
Ideally the DBMS wants to store tuples that are likely to be accessed within the same window of time together in the same block.
→ This will matter more when we talk about compression and moving cold data out to disk.
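The consolidation step can be sketched as follows. This models blocks as lists with `None` marking empty slots; in a real MVCC DBMS each tuple move would be a transactional DELETE + INSERT, which this sketch omits.

```python
def compact(blocks, block_size):
    """Move live tuples out of sparse blocks into as few full blocks as
    possible, and report how many blocks were freed back to the OS."""
    # Gather the live tuples (non-empty slots) from every block.
    live = [t for block in blocks for t in block if t is not None]
    # Repack them densely into new blocks of the given capacity.
    compacted = [live[i:i + block_size]
                 for i in range(0, len(live), block_size)]
    freed = len(blocks) - len(compacted)
    return compacted, freed
```

Two half-empty blocks holding three live tuples between them collapse into a single block, freeing one block's worth of memory.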
BLOCK COMPACTION TARGETS
Approach #1: Time Since Last Update
→ Leverage the BEGIN-TS in each tuple.
Approach #2: Time Since Last Access
→ Expensive to maintain unless the tuple has a READ-TS.
Approach #3: Application-level Semantics
→ Tuples from the same table that are related to each other according to some higher-level construct.
→ Difficult to figure out automatically.
BLOCK COMPACTION: TRUNCATE
The TRUNCATE operation removes all tuples in a table.
→ Think of it like a DELETE without a WHERE clause.
The fastest way to execute it is to drop the table and then create it again.
→ Do not need to track the visibility of individual tuples.
→ The GC will free all memory when there are no active txns that began before the drop operation.
→ If the catalog is transactional, then this is easy to do.
PARTING THOUGHTS
Classic storage vs. compute trade-off. My impression is that people want to reduce the memory footprint of the DBMS and are willing to pay a (small) computational overhead for more aggressive GC.
PROJECT #1
Identify bottlenecks in the DBMS's transaction manager using profiling tools and refactor the system to remove them. This project is meant to teach you how to work in a highly concurrent system.
YET-TO-BE-NAMED DBMS
CMU's new in-memory hybrid relational DBMS
→ HyPer-style MVCC column store
→ Multi-threaded architecture
→ Latch-free Bw-Tree Index
→ Native support for Apache Arrow format
→ Vectorized Execution Engine
→ MemSQL-style LLVM-based Query Compilation
→ Cascades-style Query Optimizer
→ Postgres Wire Protocol / Catalog Compatible
The long-term vision is to build a "self-driving" system.
PROJECT #1 TESTING
We are providing you with a suite of C++ benchmarks for you to check your implementation.
→ Focus on the concurrent-read microbenchmark, but you will want to run all of them to make sure your code works.
We strongly encourage you to do your own additional testing.
→ Different workloads
→ Different # of threads
→ Different access patterns
PROJECT #1 GRADING
We will run additional tests beyond what we provided you for grading. We will also use Google's Sanitizers when testing your code. All source code must pass the ClangFormat + ClangTidy syntax formatting checks.
→ See documentation for formatting guidelines.
DEVELOPMENT ENVIRONMENT
The DBMS only builds on Ubuntu 18.04 and OSX.
→ You can also do development on a VM.
This is CMU, so I'm going to assume that each of you is capable of getting access to a machine.
Important: You will not be able to identify the bottleneck on a machine with fewer than 20 cores.
TESTING ENVIRONMENT
Every student will receive $50 of Amazon AWS credits to run experiments on EC2.
→ Set up monitoring + alerts to prevent yourself from burning through your credits.
→ Use spot instances whenever possible.
Target EC2 Instance: c5.9xlarge
→ On Demand: $1.53/hr
→ Spot Instance: $0.45/hr (as of Jan 2019)
PROJECT #1
Due Date: February 27th @ 11:59pm
Source code + final report will be turned in using Gradescope but graded using a different machine.
Full description and instructions:
https://15721.courses.cs.cmu.edu/spring2019/project1.html
NEXT CLASS
Index Locking + Latching