Hybrid Indexes: Huanchen Zhang, David G. Andersen, Andrew Pavlo. PowerPoint PPT Presentation.


slide-1
SLIDE 1

Reducing the Storage Overhead of Main-Memory OLTP Databases with

Hybrid Indexes

Huanchen Zhang

David G. Andersen, Andrew Pavlo, Michael Kaminsky, Lin Ma, Rui Shen

PARALLEL DATA LABORATORY

Carnegie Mellon University


slide-5
SLIDE 5

Part I: Initial Exploration

  • Hybrid Indexes [SIGMOD’16]

5

slide-6
SLIDE 6

You are running out of memory

6


slide-8
SLIDE 8

You are running out of memory

Buy more?

6

slide-9
SLIDE 9

[Figure: TPC-C on H-Store. Throughput (txn/s, 20K–60K) vs. transactions executed (2M–10M); memory use (GB, limit = 5 GB) broken into disk tuples, in-memory tuples, and indexes.]

7


slide-11
SLIDE 11

The better way: Use memory more efficiently

9

slide-12
SLIDE 12

Indexes are LARGE

Benchmark   % space for indexes   with Hybrid Indexes
TPC-C       58%                   34%
Voter       55%                   41%
Articles    34%                   18%

10

slide-13
SLIDE 13

Our Contributions [SIGMOD’16]

The hybrid index architecture
The Dual-Stage Transformation
Applied to 4 index structures:

  • B+tree
  • Masstree
  • Skip List
  • Adaptive Radix Tree (ART)

11

30 – 70% space savings with comparable performance

slide-14
SLIDE 14

[Figure: TPC-C on H-Store. Throughput (txn/s, 20K–60K) vs. transactions executed (2M–10M).]

Did we solve this problem? Stay tuned.

12

slide-15
SLIDE 15

How do hybrid indexes achieve memory savings?

13

slide-16
SLIDE 16

dynamic stage static stage

Hybrid Index: a dual-stage architecture

14

slide-17
SLIDE 17

dynamic stage static stage write merge

Inserts are batched in the dynamic stage

15

slide-18
SLIDE 18

dynamic stage static stage

Reads search the stages in order

16

slide-19
SLIDE 19

dynamic stage static stage read

A Bloom filter improves read performance

17
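Slides 16–19 sketch the dual-stage read and write paths. A minimal Python illustration of that flow (our own stand-in structures, not the paper’s implementation: a dict plays the dynamic stage, a sorted array plays the static stage, and a toy Bloom filter guards the dynamic stage):

```python
import hashlib
from bisect import bisect_left

class BloomFilter:
    """Tiny Bloom filter; a real hybrid index would size m and k from the
    expected key count and a target false-positive rate."""
    def __init__(self, m=1024, k=3):
        self.m, self.k, self.bits = m, k, 0

    def _hashes(self, key):
        for i in range(self.k):
            h = hashlib.blake2b(key.encode(), salt=bytes([i])).digest()
            yield int.from_bytes(h[:8], "big") % self.m

    def add(self, key):
        for h in self._hashes(key):
            self.bits |= 1 << h

    def may_contain(self, key):
        return all(self.bits >> h & 1 for h in self._hashes(key))

class HybridIndex:
    """Dual-stage sketch: writes are batched in the dynamic stage;
    reads search the stages in order."""
    def __init__(self):
        self.dynamic = {}            # write-optimized stage (stand-in for a B+tree)
        self.static_keys = []        # compact sorted array = static stage
        self.static_vals = []
        self.bloom = BloomFilter()   # lets reads skip the dynamic stage

    def insert(self, key, val):
        self.dynamic[key] = val      # all inserts go to the dynamic stage
        self.bloom.add(key)

    def get(self, key):
        # Dynamic stage first (skipped when the Bloom filter says "definitely
        # not there"), then binary search in the static stage.
        if self.bloom.may_contain(key) and key in self.dynamic:
            return self.dynamic[key]
        i = bisect_left(self.static_keys, key)
        if i < len(self.static_keys) and self.static_keys[i] == key:
            return self.static_vals[i]
        return None

    def merge(self):
        # Part I's blocking merge: fold dynamic entries into the sorted arrays.
        merged = {**dict(zip(self.static_keys, self.static_vals)), **self.dynamic}
        items = sorted(merged.items())
        self.static_keys = [k for k, _ in items]
        self.static_vals = [v for _, v in items]
        self.dynamic.clear()
        self.bloom = BloomFilter()
```

After a merge the Bloom filter is rebuilt empty, so reads for merged keys go straight to the static stage.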

slide-20
SLIDE 20

dynamic stage static stage read write merge Memory-efficient Skew-aware

18


slide-21
SLIDE 21

dynamic stage static stage merge

The Dual-Stage Transformation

19


slide-23
SLIDE 23

The Dynamic-to-Static Rules: Compaction, Reduction, Compression

20


slide-25
SLIDE 25

[Figure: a half-full B+tree (keys 1–12, values a–n) repacked into fully-packed blocks.]

Compaction: minimize # of memory blocks

21
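The compaction rule can be illustrated with a hypothetical fixed-capacity block size (the helper name and capacity are ours):

```python
def compact_leaves(leaves, capacity=3):
    """Compaction: repack entries from half-full leaves into as few
    fully-packed blocks as possible (hypothetical block size `capacity`)."""
    entries = [kv for leaf in leaves for kv in leaf]   # leaves are in key order
    return [entries[i:i + capacity] for i in range(0, len(entries), capacity)]

# Half-full leaves in the style of the slide's diagram:
leaves = [[(1, 'a'), (2, 'b')], [(3, 'c'), (4, 'd')],
          [(5, 'e'), (6, 'f')], [(7, 'g'), (8, 'h')]]
blocks = compact_leaves(leaves)
# 8 entries now occupy 3 fully-packed blocks instead of 4 half-full leaves.
```

Since the static stage never accepts in-place inserts, it needs no slack space, which is what makes this dense packing safe.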

slide-28
SLIDE 28

[Figure: the compacted tree with pointer-heavy internal structure trimmed away.]

Reduction: minimize structural overhead

22

slide-31
SLIDE 31

dynamic stage static stage merge

The merge routine is a blocking process

23


slide-33
SLIDE 33

[Figure: TPC-C on H-Store. Throughput (txn/s, 20K–60K) vs. transactions executed (2M–10M); B+tree baseline.]

Did we solve this problem?

24

slide-34
SLIDE 34

[Figure: TPC-C on H-Store. Throughput (txn/s, 20K–60K) vs. transactions executed (2M–10M); B+tree vs. Hybrid.]

Yes, we improved the DBMS’s capacity!

24

slide-35
SLIDE 35

[Figure: TPC-C on H-Store, B+tree vs. Hybrid. Throughput (txn/s, 20K–60K) vs. transactions executed (2M–10M); memory use (GB, 4–8) broken into disk tuples, in-memory tuples, and indexes.]

25

Take Away: memory saved by indexes keeps a larger working set in memory, which yields higher throughput.

slide-41
SLIDE 41

Part I Recap

The hybrid index architecture
The Dual-Stage Transformation
Applied to 4 index structures:

  • B+tree
  • Masstree
  • Skip List
  • Adaptive Radix Tree (ART)

GENERAL PRACTICAL USEFUL

26

slide-42
SLIDE 42

Part II: Concurrent hybrid indexes with non-blocking merge

27

slide-43
SLIDE 43

dynamic stage static stage write merge

Building a Concurrent Hybrid Index?

28


slide-45
SLIDE 45

29

Use concurrent data structures for the dynamic stage

dynamic stage static stage write merge

slide-46
SLIDE 46

30

The static stage is read-only, so it is perfectly concurrent by default

dynamic stage static stage write merge

slide-47
SLIDE 47

31

Challenge: efficient non-blocking merge algorithm

dynamic stage static stage write merge

slide-48
SLIDE 48

Merge Algorithm Requirements

Efficient:

  • Fast
  • Bounded temporary memory use

Non-blocking:

  • All existing items are accessible during merge
  • New items can still enter

32

slide-49
SLIDE 49

Naïve Solution 1: Coarse-grained Locking

dynamic stage static stage write merge

33


slide-51
SLIDE 51

The intermediate stage unblocks write traffic: the dynamic stage is frozen into an intermediate stage, and new writes go to a fresh dynamic stage

static stage merge dynamic stage write

34


slide-54
SLIDE 54

static stage merge Intermediate stage

35

How do we unblock reads during merge?

slide-55
SLIDE 55

Naïve Solution 2: Full Copy-on-write

static stage merge Intermediate stage

36

slide-56
SLIDE 56

Key Observation

Merged-in items in the static stage will NOT be accessed until the intermediate stage is deleted.

Merge incrementally!

37
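A toy model of this observation (names and structures are ours, not the actual algorithm): the merge can move one item at a time into the static stage, because readers consult the frozen intermediate stage first and therefore never see a half-merged key.

```python
class IncrementalMerge:
    """Sketch: while the frozen intermediate stage is still installed, readers
    never look up newly merged-in items in the static stage, so the merge can
    proceed in small non-blocking steps instead of one long pass."""
    def __init__(self, intermediate, static):
        self.intermediate = dict(intermediate)   # frozen; still serves reads
        self.static = dict(static)               # still serves reads
        self._steps = iter(sorted(intermediate.items()))
        self.done = False

    def do_step(self):
        # Move one item into the static stage; readers are unaffected because
        # the intermediate stage still answers for that key.
        try:
            k, v = next(self._steps)
            self.static[k] = v
        except StopIteration:
            self.intermediate = {}               # retire the intermediate stage last
            self.done = True

    def get(self, key):
        # Reads search the stages in order: intermediate first, then static.
        return self.intermediate.get(key, self.static.get(key))
```

Retiring the intermediate stage only after the final step is what makes every intermediate state safe for concurrent readers.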

slide-57
SLIDE 57

Our Solution: Incremental Copy-on-write with Rapid GC

[Diagram: copy-on-write of a node; the old node becomes garbage once the new parent points at its copy.]

38

slide-58
SLIDE 58
Our Solution: Incremental Copy-on-write with Rapid GC

When can we safely reclaim the garbage?

38

slide-60
SLIDE 60
Our Solution: Incremental Copy-on-write with Rapid GC

When no thread still holds a reference to it!

Thread-local counters: C1, C2, C3, …, Cn

38

slide-62
SLIDE 62
Our Solution: Incremental Copy-on-write with Rapid GC

Each thread keeps a local counter Ci and advances it on every operation: ++Ci = MAX(Ci, Cmax) + 1. Retired nodes are tagged with the current Cmax. GC condition: reclaim when Cmin > garbage tag.

38
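A sketch of that counter scheme as we read the slide (class and method names are ours; calls are shown single-threaded for clarity, whereas a real implementation would use atomic counters):

```python
class RapidGC:
    """Epoch-like reclamation: each thread bumps its local counter past the
    global maximum on every operation; garbage is tagged with the current
    Cmax and reclaimed once the minimum counter exceeds the tag."""
    def __init__(self, nthreads):
        self.c = [0] * nthreads          # thread-local counters C1..Cn
        self.garbage = []                # (tag, object) pairs

    def enter(self, tid):
        # ++Ci = MAX(Ci, Cmax) + 1
        self.c[tid] = max(self.c[tid], max(self.c)) + 1

    def retire(self, obj):
        self.garbage.append((max(self.c), obj))   # tag with current Cmax

    def collect(self):
        # GC condition: Cmin > garbage tag means every thread has moved on
        # since the object was unlinked, so no one can still reference it.
        cmin = min(self.c)
        freed = [o for (t, o) in self.garbage if t < cmin]
        self.garbage = [(t, o) for (t, o) in self.garbage if t >= cmin]
        return freed
```

The point of bumping past Cmax (rather than just incrementing) is that any later operation on any thread is guaranteed to carry a counter larger than the tag of previously retired garbage.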

slide-63
SLIDE 63

A Quick Recap of the Merge Algorithm

  • The intermediate stage separates writes from the merge process
  • The incremental merge algorithm with rapid GC is non-blocking and space-efficient

39

slide-64
SLIDE 64

What we are building now: a Compact Radix Tree with Non-blocking Merge

40


slide-67
SLIDE 67

What we are building now: a Compact Radix Tree (static stage) with Non-blocking Merge, paired with a Bw-Tree, Skip List, or Masstree dynamic stage

40

slide-70
SLIDE 70

Part III: Super-compact static stage

41

slide-71
SLIDE 71

Go “crazy” on space-efficiency

42

Succinct Data Structures

  • Z + o(Z), where Z is the information-theoretic lower bound
  • Still allow for efficient query operations

rank1(x) = # of 1’s up to position x
select1(x) = position of the x-th 1

Example bit vector: 100011010000101…
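A naive (non-succinct) reference implementation of these two operations; a succinct bit vector answers both in O(1) by storing o(Z) bits of sampled counts alongside the raw bits:

```python
class BitVector:
    """Naive rank/select over a bitstring, for checking semantics only."""
    def __init__(self, bits):
        self.bits = bits                 # e.g. "100011010000101"

    def rank1(self, x):
        """Number of 1's in positions [0, x] (inclusive)."""
        return self.bits[:x + 1].count("1")

    def select1(self, x):
        """Position of the x-th 1 (1-based)."""
        seen = 0
        for i, b in enumerate(self.bits):
            seen += b == "1"
            if seen == x:
                return i
        raise ValueError("fewer than x ones in the bit vector")

bv = BitVector("100011010000101")
# Ones sit at positions 0, 4, 5, 7, 12, 14.
```

Note that rank and select are inverses in the sense that rank1(select1(x)) == x.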

slide-72
SLIDE 72

Encoding Radix Tree

43

[Figure: a radix tree over example strings (with $ terminators) and its bit-sequence encoding of node labels and structure.]
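One standard way to encode a tree with bitmaps is the LOUDS breadth-first layout; the sketch below is illustrative of the general idea, not necessarily the exact encoding on the slide, and the trie example is ours:

```python
from collections import deque

def louds_encode(tree):
    """LOUDS-style encoding: visit nodes breadth-first, emitting one '1' per
    child followed by a '0' terminator; child labels are stored in the same
    breadth-first order. Navigation then reduces to rank/select on the bits."""
    bits, labels = [], []
    q = deque([tree])
    while q:
        node = q.popleft()               # node = {label: child_subtree}
        for label, child in node.items():
            bits.append("1")
            labels.append(label)
            q.append(child)
        bits.append("0")
    return "".join(bits), "".join(labels)

# Hypothetical trie for the keys "an" and "at":
trie = {"a": {"n": {}, "t": {}}}
bits, labels = louds_encode(trie)
```

Each node costs roughly one bit per child plus one terminator bit, which is how the pointer overhead of an ordinary radix tree disappears.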

slide-73
SLIDE 73

44

Memory Savings with the New Encoding

[Figure: memory (MB, 200–1000) for 50M email keys with average length = 20 bytes: our encoding uses 84% less memory than ART.]

slide-74
SLIDE 74

The Takeaway Message

45

Hybrid indexes can save precious memory with minimal performance penalty.

slide-75
SLIDE 75

Toll-Free Hotline: 1-844-88-CMUDB

slide-76
SLIDE 76

Back-up Slides

slide-77
SLIDE 77

Latency (ms)

          50%   99%   MAX
Hybrid    10    50    115
B+tree    10    52    611

slide-78
SLIDE 78

YCSB-based Microbenchmark Evaluation

Workload: insert, then read/update (50/50)
Key: email; Value: 64-bit unsigned integer (pointer)
Single thread
50M entries, 10M queries (Zipf distributed)

slide-79
SLIDE 79

[Figure: memory (GB, 4–8) for B+tree, Masstree, Skip List, and ART, original vs. hybrid.]

Hybrid index saves 30 – 70% memory

slide-80
SLIDE 80

[Figure: throughput (txn/s, 2M–16M), original vs. hybrid, for insert-only and read/update (50/50) workloads.]

Hybrid index provides comparable throughput