slide-1
SLIDE 1

SIAS-Chains: Snapshot Isolation Append Storage Chains

  • Dr. Robert Gottstein
  • Prof. Ilia Petrov
  • M.Sc. Sergej Hardock
  • Prof. Alejandro Buchmann
slide-2
SLIDE 2

Motivation: Storage Technology Evolution

| Dr. Robert Gottstein | 01.09.2017

Significant impact of storage technology evolution

▪ Savvio 15k HDD
  ▪ Seq. Read / Write: 160 MB/s
  ▪ Read/Write IOPS: 350 / 300
  ▪ Latency Read/Write: 3.2 / 3.5 ms
  ▪ Direct overwrite
▪ Intel X25-E SLC SSD
  ▪ Seq. Read/Write: 250 / 170 MB/s
  ▪ Read/Write IOPS (4K): 35 000 / 3 300
  ▪ Latency Read/Write (4K): 0.075 / 0.085 ms
  ▪ Erase before overwrite
    ▪ slow & large granularity

[Figure: Random Throughput [IOPS] and Sequential Throughput [MB/s] over Blocksize [KB], read vs. write]

slide-3
SLIDE 3

Motivation: Storage Technology Evolution


HDD: symmetric read/write; high latency; big block; rotational moving parts
SSD: asymmetric read/write; low latency; no in-place updates; small block; write sequentialization; intrinsic parallelism; endurance

DBMS needs to leverage:
▪ Fast reads
▪ Low latencies
▪ Asymmetry
▪ Parallelism
▪ Write sequentialization

slide-4
SLIDE 4

Motivation: Storage Technology Evolution


Multi-version DBMS: in principle suitable for asymmetric storage.

  • Parallelism
  • Out-of-place updates
  • Sequentialization
  • ...
slide-5
SLIDE 5

Introduction

▪ Asymmetric: fast reads & slow writes
▪ Low latency: no moving parts
▪ No in-place updates: need to erase first (slow)
▪ Intrinsic parallelism: read in parallel

Visibility
▪ Timestamps
  ▪ creation: tscreate
  ▪ invalidation: tsinval

Version Organization & Invalidation

[Figure: Relation R holding versions X0, X1, X2 of item X with values 9, 10, 11]

W1[X0=9]; C1; W2[X1=10]; C2; W3[X2=11]; C3;

Each version as it is created:
▪ Tuple X0 Value=9 (tscreate=123, tsinval=null)
▪ Tuple X1 Value=10 (tscreate=134, tsinval=null)
▪ Tuple X2 Value=11 (tscreate=141, tsinval=null)

Older versions after invalidation by their successor:
▪ Tuple X0 Value=9 (tscreate=123, tsinval=134)
▪ Tuple X1 Value=10 (tscreate=134, tsinval=141)
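
The timestamp pair on each tuple version can be read as a visibility interval. The following is a minimal sketch of such a check, not the authors' implementation. It assumes that a version is visible to a snapshot if it was created at or before the snapshot timestamp and not invalidated at or before it, and it ignores uncommitted transactions. The names `TupleVersion`, `is_visible` and `read_x` are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TupleVersion:
    name: str
    value: int
    tscreate: int
    tsinval: Optional[int] = None  # None: not invalidated


def is_visible(version: TupleVersion, snapshot_ts: int) -> bool:
    # Assumed rule: created at or before the snapshot,
    # and not invalidated at or before the snapshot.
    created = version.tscreate <= snapshot_ts
    invalidated = version.tsinval is not None and version.tsinval <= snapshot_ts
    return created and not invalidated


# State of item X after W1, W2 and W3 have committed (values from the slide).
versions = [
    TupleVersion("X0", 9, tscreate=123, tsinval=134),
    TupleVersion("X1", 10, tscreate=134, tsinval=141),
    TupleVersion("X2", 11, tscreate=141),
]


def read_x(snapshot_ts: int) -> Optional[int]:
    # Return the value of the single version visible to the snapshot, if any.
    for version in versions:
        if is_visible(version, snapshot_ts):
            return version.value
    return None
```

Under these assumptions a snapshot taken at timestamp 130 sees X0, one taken at 140 sees X1, and any snapshot from 141 onwards sees X2. Note that setting tsinval requires changing an already stored version.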

slide-6
SLIDE 6


Introduction


Version Organization & Invalidation: Small Random Updates

slide-7
SLIDE 7

SIAS: Snapshot Isolation Append Storage

SIAS in a nutshell: redesign architecture and algorithms

▪ Version Organization
  ▪ Backward chaining of versions
  ▪ Chain identified by virtual ID (VID)
  ▪ Store the entry point in a data structure: VIDmap
▪ New Invalidation
  ▪ Invalidation coded within the chain
  ▪ "One-place" invalidation
▪ Append Storage
  ▪ Append tuple versions to a new page
  ▪ Write page when filled or on a threshold

W1[X0=9]; C1; W2[X1=10]; C2; W3[X2=11]; C3;

Item X with timestamp-based invalidation (tscreate, tsinval):
▪ Tuple X0 Value=9 (tscreate=123, tsinval=null)
▪ Tuple X1 Value=10 (tscreate=134, tsinval=null)
▪ Tuple X2 Value=11 (tscreate=141, tsinval=null)
▪ Tuple X0 Value=9 (tscreate=123, tsinval=134)
▪ Tuple X1 Value=10 (tscreate=134, tsinval=141)

Item X in SIAS, VID=34:
▪ Tuple X0 Value=9 (tscreate=123, VID=34)
▪ Tuple X1 Value=10 (tscreate=134, VID=34)
▪ Tuple X2 Value=11 (tscreate=141, VID=34)
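
Backward chaining with a VIDmap entry point can be sketched as follows. This is an illustration under stated assumptions, not the PostgreSQL implementation: a Python list stands in for append storage, list positions stand in for tuple IDs, the `pred` field and the class name `SiasStore` are hypothetical, and commit status and concurrency are ignored.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Version:
    value: int
    tscreate: int
    vid: int
    pred: Optional[int]  # log position of the predecessor version (hypothetical field)


class SiasStore:
    def __init__(self) -> None:
        self.log: List[Version] = []      # append-only; positions stand in for tuple IDs
        self.vidmap: Dict[int, int] = {}  # VID -> position of the newest version

    def write(self, vid: int, value: int, ts: int) -> int:
        # Append the new version and link it back to its predecessor.
        pred = self.vidmap.get(vid)
        self.log.append(Version(value, ts, vid, pred))
        pos = len(self.log) - 1
        self.vidmap[vid] = pos  # only the entry point changes; old versions stay untouched
        return pos

    def read(self, vid: int, snapshot_ts: int) -> Optional[int]:
        # Walk the chain from the newest version backwards until one is old enough.
        pos = self.vidmap.get(vid)
        while pos is not None:
            version = self.log[pos]
            if version.tscreate <= snapshot_ts:
                return version.value
            pos = version.pred
        return None


# History from the slide: W1[X0=9]; W2[X1=10]; W3[X2=11] for item X with VID=34.
store = SiasStore()
store.write(34, 9, 123)
store.write(34, 10, 134)
store.write(34, 11, 141)
```

In this sketch no stored version carries an invalidation timestamp. A version counts as invalidated once a successor exists in the chain, and the successor's creation timestamp takes the role of tsinval.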

slide-8
SLIDE 8

SIAS: Snapshot Isolation Append Storage


The tsinval-based variant is widespread in multi-version databases. The VID-based variant makes it possible to address Flash storage properties.

slide-9
SLIDE 9

Multi Version DBMS Example

W1[X0=9];C1; W2[X1=10];C2; W3[X2=11];C3;

[Figure: Versions X0=9, X1=10, X2=11 of item X, written by transactions T1, T2, T3 to DB pages P0 ... Pn, which map to device blocks B0 ... Bn. Arrows mark creation and invalidation. Legend: Ti = transaction, Pn = DB page, Bn = device block.]


slide-11
SLIDE 11

Multi Version DBMS Example


▪ Random writes
▪ In-place updates
▪ Mixed load

slide-12
SLIDE 12

SIAS Principle Example

W1[X0=9];C1; W2[X1=10];C2; W3[X2=11];C3;

Tuple Append Storage Management
▪ No in-place invalidation
▪ Append versions instead of pages
▪ Write filled pages
▪ DBMS specific
▪ Write reduction
▪ Simplified buffer management

[Figure: Versions X0=9, X1=10, X2=11 of item X, created by transactions T1, T2, T3, are appended to one DB page Pn. Pages are written in write order to device blocks Bk-3, Bk-2, Bk-1, Bk. The VIDMap references the chain X0, X1, X2. Arrows mark creation and invalidation. Legend: Ti = transaction, Pn = DB page, Bn = device block.]


slide-14
SLIDE 14

SIAS Principle Example


▪ Significant write reduction (5 pages vs. 1 page)
▪ Sequentialization
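
One way to arrive at the 5 vs. 1 page count for the three-version history is sketched below. The counting model is an assumption of this sketch, not taken from the slides: with in-place invalidation, every new version lands on its own page and the page of its predecessor is written again to set the invalidation timestamp. With append storage, versions fill a page that is written once. Both function names are hypothetical.

```python
from typing import List, Tuple


def in_place_page_writes(num_versions: int) -> List[Tuple[str, int]]:
    # Assumed model: version i lives on its own page i; creating version i
    # also dirties page i - 1, because the predecessor is invalidated in place.
    writes: List[Tuple[str, int]] = []
    for i in range(num_versions):
        writes.append(("create", i))
        if i > 0:
            writes.append(("invalidate", i - 1))
    return writes


def append_page_writes(num_versions: int, versions_per_page: int) -> int:
    # Assumed model: versions are appended to the current page; each filled
    # page is written once, plus one write for a final partially filled page.
    full_pages, remainder = divmod(num_versions, versions_per_page)
    return full_pages + (1 if remainder else 0)


in_place = in_place_page_writes(3)    # W1, W2, W3 on item X
appended = append_page_writes(3, 3)   # all three versions fit on one page
print(len(in_place), "page writes in place vs.", appended, "with append storage")
```

The in-place writes also target different pages in an order driven by the workload, whereas the appended page is written to the next free position, which is where the sequentialization comes from.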

slide-15
SLIDE 15

SIAS Principle: VIDmap

▪ VID not explicitly stored
  → Index to hash bucket
▪ No overflow buckets
  → VIDs are unique: one VID per tuple (data item)
▪ Storing one TID per VID (sizeof(TID) = 6 bytes)
▪ TIDpos = VID mod 1024

[Figure: SIAS-Chains. The VIDMap maps each Virtual ID to a TupleID (e.g. VID 34 to 0x291). The TupleID points to a tuple version in the DB relation, which holds Y0, Y1, X0, X1, X2.]

▪ Tuple X0 Value=9 (tscreate=123, VID=34)
▪ Tuple X1 Value=10 (tscreate=134, VID=34)
▪ Tuple X2 Value=11 (tscreate=141, VID=34)
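
The addressing rule TIDpos = VID mod 1024 with 6-byte TIDs can be sketched as follows. The slide only gives the slot position inside a bucket. Selecting the bucket as VID div 1024 is an assumption of this sketch, as are the names `locate` and `VidMap` and the use of a zero-filled byte array per bucket.

```python
from typing import Dict, Optional, Tuple

TID_SIZE = 6             # bytes per TID (from the slide)
SLOTS_PER_BUCKET = 1024  # from TIDpos = VID mod 1024


def locate(vid: int) -> Tuple[int, int, int]:
    bucket = vid // SLOTS_PER_BUCKET  # assumption: bucket number is VID div 1024
    tid_pos = vid % SLOTS_PER_BUCKET  # slot inside the bucket (from the slide)
    byte_offset = tid_pos * TID_SIZE  # the VID itself is never stored
    return bucket, tid_pos, byte_offset


class VidMap:
    def __init__(self) -> None:
        self.buckets: Dict[int, bytearray] = {}

    def set_tid(self, vid: int, tid: bytes) -> None:
        if len(tid) != TID_SIZE:
            raise ValueError("a TID must be exactly 6 bytes")
        bucket, _, offset = locate(vid)
        # Buckets are allocated on demand with fixed size; no overflow buckets.
        buf = self.buckets.setdefault(bucket, bytearray(SLOTS_PER_BUCKET * TID_SIZE))
        buf[offset:offset + TID_SIZE] = tid

    def get_tid(self, vid: int) -> Optional[bytes]:
        bucket, _, offset = locate(vid)
        buf = self.buckets.get(bucket)
        if buf is None:
            return None
        # A slot that was never set reads back as six zero bytes in this sketch.
        return bytes(buf[offset:offset + TID_SIZE])


vidmap = VidMap()
vidmap.set_tid(34, (0x291).to_bytes(TID_SIZE, "big"))  # entry point of item X
```

Because every VID owns exactly one slot, a lookup is a single offset computation, and a bucket in this sketch occupies 1024 * 6 = 6144 bytes.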

slide-16
SLIDE 16

Evaluation

TPC-C: OLTP benchmark of the Transaction Processing Performance Council (TPC)

slide-17
SLIDE 17

Write reduction

▪ Significant reduction of host writes
▪ Written blocks during TPC-C
▪ TPC-C benchmark, Stock relation
▪ SIAS vs. SI
▪ 97% write reduction

[Figure: TPC-C OLTP benchmark, Stock relation. SI (random & in-place writes): 2566 page writes. SIAS (sequentialization): 53 page writes. SSD endurance?]
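
The write reduction can be recomputed from the two page-write counts on this slide. The sketch assumes that 2566 page writes belong to SI and 53 to SIAS, which is a reading of the figure labels rather than something the slide states explicitly. The function name is hypothetical.

```python
def write_reduction(baseline_writes: int, reduced_writes: int) -> float:
    # Fraction of the baseline writes that is saved.
    return 1.0 - reduced_writes / baseline_writes


si_page_writes = 2566    # assumed to be the SI count
sias_page_writes = 53    # assumed to be the SIAS count

reduction = write_reduction(si_page_writes, sias_page_writes)
print(f"write reduction: {reduction:.1%}")
```

The result lies just under 98%, which is consistent with the 97% stated on the slide if that figure is truncated rather than rounded.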

slide-18
SLIDE 18

TPC-C: Throughput on 2x SSD RAID

TPC-C on SSD RAID: Throughput in New Order Transactions per Minute (NOTPM)

Warehouses (WHs)    350    400    450    500    520
SIAS-Chains        4480   5094   5676   6123   6164
SI                 4468   4858   4862   4799   4716

▪ +30%
▪ SIAS scales further (>540 WHs)
▪ SI: saturated with <450 WHs
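
The +30% figure can be checked against the measured NOTPM values on this slide. The snippet below is a small sketch that computes the relative gain of SIAS-Chains over SI per warehouse count. The variable and function names are hypothetical, and the numbers are copied from the slide.

```python
warehouses = [350, 400, 450, 500, 520]
sias_chains = [4480, 5094, 5676, 6123, 6164]  # NOTPM, from the slide
si = [4468, 4858, 4862, 4799, 4716]           # NOTPM, from the slide


def relative_gain(new: float, base: float) -> float:
    # Gain of `new` over `base` as a fraction of `base`.
    return new / base - 1.0


gains = {
    wh: relative_gain(a, b)
    for wh, a, b in zip(warehouses, sias_chains, si)
}

# Warehouse count at which SI reaches its highest measured throughput.
si_peak_wh = warehouses[si.index(max(si))]

for wh in warehouses:
    print(f"{wh} WHs: {gains[wh]:+.1%}")
```

At 350 warehouses both systems are nearly equal. The gap opens once SI stops scaling at its measured peak of 450 warehouses and reaches roughly 30% at 520 warehouses.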

slide-19
SLIDE 19

TPC-C: Response Time on 2x SSD RAID

TPC-C on SSD RAID: Response Time (sec.), average response time of the New Order transaction

Warehouses (WHs)     350     400     450     500     520
SIAS-Chains        0.274   0.711     1.5   2.179   2.931
SI                 0.376   2.063   4.806   7.937   9.707

▪ SI: saturated
▪ 3x lower
▪ Peak load tolerance: SIAS scales with ~540 WHs

slide-20
SLIDE 20

TPC-C: Throughput on 6x SSD RAID (Sylt)

TPC-C on Sylt: Throughput in New Order Transactions per Minute (NOTPM)

Warehouses (WHs)    500     800    1000    1200    1300    1500
SIAS-Chains        6424   10254   12693   13482   13375   13054
SI                 6422   10113   10964   10553   10485    9294

▪ Saturation
▪ SIAS removes I/O bottleneck

slide-21
SLIDE 21

TPC-C: Response Time on 6x SSD RAID (Sylt)

TPC-C on Sylt: Response time (sec.), average response time of the New Order transaction

Warehouses (WHs)     500     800    1000    1200    1300     1500
SIAS-Chains        0.024   0.075   0.277   3.031   5.272   10.048
SI                 0.028   0.369   3.648   9.725   12.48     22.6

▪ SI saturation: <1200 WHs
▪ SIAS saturation: ~1300 WHs

slide-22
SLIDE 22

Contributions


SIAS Architectural Changes

▪ New version organization
▪ One-place invalidation
▪ Tuple-granularity append storage manager
▪ Write retention
▪ Selective scan over VIDmap
▪ VID-optimized index
▪ Implementation in PostgreSQL
▪ Chaining

slide-23
SLIDE 23

Thank You

Publications

▪ Gottstein, Robert. Impact of New Storage Technologies on an OLTP DBMS, Its Architecture and Algorithms. Doctoral dissertation, Technische Universität Darmstadt. 2016.
▪ DBMS on Modern Storage Hardware (Tutorial). I. Petrov, R. Gottstein, S. Hardock. ICDE 2015.
▪ SIAS-V in Action: Snapshot Isolation Append Storage - Vectors on Flash. Robert Gottstein, Thorsten Peter, Ilia Petrov and Alejandro Buchmann. In 17th International Conference on Extending Database Technology (EDBT) 2014.
▪ Multi-Version Databases on Flash: Append Storage and Access Paths. Robert Gottstein, Ilia Petrov, Alejandro Buchmann. International Journal On Advances in Software, Vol. 6, Number 3 and 4, 2013.
▪ Read Optimisations for Append Storage on Flash. Robert Gottstein, Ilia Petrov, Alejandro Buchmann. 17th International Database Engineering and Applications Symposium (IDEAS 2013), Barcelona, Spain, ACM, 2013.
▪ FBARC: I/O Asymmetry-Aware Buffer Replacement Strategy. Paul Dubs, Ilia Petrov, Robert Gottstein, Alejandro Buchmann. Workshop on Accelerating Data Management Systems Using Modern Processor and Storage Architectures (ADMS 2013), in conjunction with VLDB 2013, Riva del Garda, Trento, Italy, August 2013.
▪ Aspects of Append-Based Database Storage Management on Flash Memories. Robert Gottstein, Ilia Petrov, Alejandro Buchmann. DBKDA 2013. Best Paper Award.
▪ Append Storage in Multi-Version Databases on Flash. Robert Gottstein, Ilia Petrov, Alejandro Buchmann. 29th British National Conference on Databases, BNCOD 2013, University of Oxford, United Kingdom, 2013.
▪ SI-CV: Snapshot Isolation with Co-located Versions. Robert Gottstein, Ilia Petrov, Alejandro Buchmann. In Raghunath Nambiar, Meikel Poess: Topics in Performance Evaluation, Measurement and Characterization, Lecture Notes in Computer Science 7144, ISBN 978-3-642-32626-4, Springer Berlin / Heidelberg, 2012. DOI 10.1007/978-3-642-32627-1_9
▪ Data-Intensive Systems on Evolving Memory Hierarchies. I. Petrov, D. Bausch, R. Gottstein, A. Buchmann. EEbS 2012.
▪ Revisiting DBMS Space Management for Native Flash. S. Hardock, I. Petrov, R. Gottstein, A. Buchmann. EDBT 2016.
▪ NoFTL for Real: Databases on Real Native Flash Storage. S. Hardock, I. Petrov, R. Gottstein, A. Buchmann. EDBT 2015.
▪ NoFTL: Database Systems on FTL-less Flash Storage. Sergej Hardock, Ilia Petrov, Robert Gottstein, Alejandro Buchmann. 39th International Conference on Very Large Databases (VLDB), Riva del Garda, Italy, 2013.