

slide-1
SLIDE 1

Consistent and Durable Data Structures for Non-Volatile Byte-Addressable Memory

Shivaram Venkataraman*†, Niraj Tolia‡, Parthasarathy Ranganathan* and Roy H. Campbell†

*HP Labs, Palo Alto, ‡Maginatics, and

†University of Illinois, Urbana-Champaign

slide-2
SLIDE 2

Non-Volatile Byte-Addressable Memory (NVBM)

Phase Change Memory, Memristor

3/4/11 2

slide-3
SLIDE 3

Non-Volatile Byte-Addressable Memory (NVBM)

§ 50-150 nanoseconds
§ Scalable
§ Non-Volatile
§ Lower energy

slide-4
SLIDE 4

Access Times

[Figure: log-scale axis from 1 to 10,000,000 nanoseconds]

§ Processor clock cycle – 1 ns
§ Access L2 cache – 10 ns
§ Update DRAM – 55 ns
§ Write to SLC Flash – 200 μs
§ Hard Disk Writes – 3 ms

slide-5
SLIDE 5

Access Times

[Figure: log-scale axis from 1 to 10,000,000 nanoseconds]

§ Processor clock cycle – 1 ns
§ Access L2 cache – 10 ns
§ Update DRAM – 55 ns
§ Writes to PCM / Memristor – 100-150 ns
§ Write to SLC Flash – 200 μs
§ Hard Disk Writes – 3 ms
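
The figures on this slide can be put on a common scale. The sketch below converts each latency to nanoseconds and compares it with the upper bound quoted for PCM / Memristor writes (150 ns). The dictionary and helper names are illustrative and do not come from the talk.

```python
# Latencies quoted on the slide, converted to nanoseconds.
LATENCY_NS = {
    "Processor clock cycle": 1,
    "Access L2 cache": 10,
    "Update DRAM": 55,
    "Write to PCM / Memristor (upper bound)": 150,
    "Write to SLC Flash": 200_000,   # 200 microseconds
    "Hard disk write": 3_000_000,    # 3 milliseconds
}

NVBM = "Write to PCM / Memristor (upper bound)"


def slowdown_vs_nvbm(name: str) -> float:
    """Return how many times slower `name` is than a 150 ns NVBM write."""
    return LATENCY_NS[name] / LATENCY_NS[NVBM]


if __name__ == "__main__":
    for name in LATENCY_NS:
        print(f"{name}: {slowdown_vs_nvbm(name):,.2f}x")
```

With these numbers a hard disk write is 20,000 times slower than a 150 ns NVBM write, and an SLC Flash write is roughly 1,300 times slower.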

slide-6
SLIDE 6

Data Stores - Disk

Traditional DB, File systems

[Figure: Core1 and Core2, each with an L1 Cache; L2 Cache; DRAM; Disk]

slide-7
SLIDE 7

Data Stores - DRAM

RAMCloud, memcached, Memory-based DB

[Figure: Core1 and Core2, each with an L1 Cache; L2 Cache; DRAM; Commit Log - Disk]

slide-8
SLIDE 8

Data Stores - NVBM

Single-level store

[Figure: Core1 and Core2, each with an L1 Cache; L2 Cache; DRAM; Non-Volatile Memory]

slide-9
SLIDE 9

Challenges

§ Consistency
§ Durability

[Figure: diagram with values 10, 5, 20, 15, 2, 1]

slide-10
SLIDE 10

Outline

§ Motivation
§ Consistent durable data structures
  § Consistent durable B-Tree
  § Tembo – Distributed Data Store Implementation
§ Evaluation

slide-11
SLIDE 11

Consistent Durable Data Structures

§ Versioning for consistency across failures
§ Restore to last consistent version on recovery
§ Atomic change across versions
§ No new processor extensions!
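
A minimal sketch of the "restore to last consistent version" rule, under assumptions not stated on this slide: each entry carries a `[start, end)` version range, and a single integer holds the last consistent version. The names `Entry` and `recover` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Entry:
    key: int
    start: int                 # version that created the entry
    end: Optional[int] = None  # version that deleted it; None while live


def recover(entries: List[Entry], last_consistent: int) -> List[Entry]:
    """Undo the effects of any version newer than `last_consistent`."""
    restored = []
    for e in entries:
        if e.start > last_consistent:
            continue  # created by an update that never committed: discard
        if e.end is not None and e.end > last_consistent:
            e = Entry(e.key, e.start, None)  # uncommitted delete: live again
        restored.append(e)
    return restored
```

In this sketch, nothing written by an uncommitted version survives recovery, so a crash in the middle of an update leaves the structure as it was at the last consistent version.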

slide-12
SLIDE 12

Versioning

§ Totally ordered – Increasing natural numbers
§ Every update creates a new version
§ Last consistent version
  § Stored in a well-known location
  § Used by reader threads and for recovery
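
The versioning rules above can be sketched as a small version counter. This is an illustration only: `VersionClock` and its methods are hypothetical names, a single writer is assumed, and an ordinary Python attribute stands in for the well-known NVBM location, which on real hardware would need an atomic, durable write.

```python
class VersionClock:
    """Tracks the last consistent version of one data structure."""

    def __init__(self) -> None:
        self.last_consistent = 0  # stand-in for the well-known location

    def begin_update(self) -> int:
        # Every update works on a new version, one past the last consistent one.
        return self.last_consistent + 1

    def commit(self, version: int) -> None:
        # Versions are totally ordered, so a commit must be the next number.
        if version != self.last_consistent + 1:
            raise ValueError("versions must be committed in order")
        self.last_consistent = version  # a single store publishes the update

    def snapshot(self) -> int:
        # Readers (and recovery) only ever use the last consistent version.
        return self.last_consistent
```

Until `commit` runs, `snapshot` keeps returning the previous version, so readers never observe a half-finished update.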

slide-13
SLIDE 13

Consistent Durable B-Tree

B – Size of a B-Tree node

[Figure legend: each entry is shown as Key [start, end); entries are marked as Live entry or Deleted entry]
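
One way to read the `Key [start, end)` notation is as a half-open range of versions in which the key is visible. The sketch below assumes that reading. `VersionedEntry` and the use of `None` for an open-ended range are illustrative choices, not the authors' memory layout.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class VersionedEntry:
    key: int
    start: int                 # first version in which the key exists
    end: Optional[int] = None  # first version in which it no longer exists

    def is_live(self) -> bool:
        # A live entry has not been deleted by any version yet.
        return self.end is None

    def visible_at(self, version: int) -> bool:
        # Half-open interval: start is included, end is excluded.
        return self.start <= version and (self.end is None or version < self.end)
```

Under this reading a deleted entry is not erased: it stays in the node with a finite `end`, which is what lets older versions still be read.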

slide-14
SLIDE 14

Lookup

Find key 20 at version 5
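
A lookup such as "find key 20 at version 5" can be sketched over a single sorted node. The node contents below are made up for illustration, since the slide's figure is not reproduced here, and `lookup` is a hypothetical helper that assumes entries are `(key, start, end)` tuples sorted by key.

```python
from bisect import bisect_left
from typing import List, Optional, Tuple

# (key, start_version, end_version); end_version None means still live.
Entry = Tuple[int, int, Optional[int]]


def lookup(node: List[Entry], key: int, version: int) -> Optional[Entry]:
    """Return the entry for `key` that is visible at `version`, if any."""
    keys = [e[0] for e in node]
    i = bisect_left(keys, key)  # index of the first entry with this key
    while i < len(node) and node[i][0] == key:
        _, start, end = node[i]
        if start <= version and (end is None or version < end):
            return node[i]
        i += 1
    return None


# Hypothetical node: key 20 was written at version 2 and rewritten at version 6.
node = [(5, 1, None), (20, 2, 6), (20, 6, None), (99, 3, 4)]
```

With this data, `lookup(node, 20, 5)` returns the older entry `(20, 2, 6)`, while the same lookup at version 6 returns `(20, 6, None)`.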


slide-15
SLIDE 15

Insert / Split
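
The slide's figure is not reproduced here, so the following is only a plausible sketch of a versioned insert, not the authors' algorithm. It assumes entries are `[key, start, end]` lists and that overwriting a key ends the old entry at the new version rather than modifying it in place. Node splits are left out entirely, and `insert` is a hypothetical name.

```python
from typing import List, Optional

Entry = List[Optional[int]]  # [key, start_version, end_version or None]


def insert(node: List[Entry], key: int, new_version: int) -> None:
    """Add `key` at `new_version` without destroying older versions."""
    for entry in node:
        if entry[0] == key and entry[2] is None:
            entry[2] = new_version  # old entry stays readable below new_version
    node.append([key, new_version, None])
    node.sort(key=lambda e: (e[0], e[1]))  # keep order by key, then version
```

Because the old entry is only given an end version, a reader working at an earlier version still finds it.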


slide-16
SLIDE 16

Garbage Collection
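
As a rough illustration of what garbage collection can mean for versioned entries (the slide's figure is not reproduced here, so this is an assumption rather than the authors' procedure): an entry whose range ended at or before the oldest version still needed can never be seen again and can be reclaimed. `collect` and `oldest_needed` are hypothetical names.

```python
from typing import List, Optional, Tuple

Entry = Tuple[int, int, Optional[int]]  # (key, start_version, end_version or None)


def collect(node: List[Entry], oldest_needed: int) -> List[Entry]:
    """Keep only entries visible at some version >= `oldest_needed`."""
    return [e for e in node if e[2] is None or e[2] > oldest_needed]
```

For example, with entries `(20, 2, 6)` and `(20, 6, None)`, calling `collect` with `oldest_needed=6` drops the first entry and keeps the second.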


slide-17
SLIDE 17

Tembo – Distributed Data Store Implementation

§ Based on open source key-value store
§ Widely used in production
§ In-memory dataset

slide-18
SLIDE 18

Tembo – Distributed Data Store Implementation

Key Value Server
§ Consistent durable B-Tree
§ Single writer, shared reader

Consistent Hashing
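
The slide names consistent hashing but gives no details of how Tembo uses it. The sketch below shows the general technique only. The class name, the choice of SHA-256, and the number of virtual nodes are assumptions, not Tembo's actual configuration.

```python
import hashlib
from bisect import bisect_right
from typing import List, Tuple


def _hash(value: str) -> int:
    # Any stable hash works; SHA-256 is used here only for its even spread.
    return int(hashlib.sha256(value.encode("utf-8")).hexdigest(), 16)


class HashRing:
    def __init__(self, servers: List[str], vnodes: int = 64) -> None:
        # Each server is placed on the ring at `vnodes` pseudo-random points.
        self._ring: List[Tuple[int, str]] = sorted(
            (_hash(f"{server}#{i}"), server)
            for server in servers
            for i in range(vnodes)
        )
        self._points = [point for point, _ in self._ring]

    def server_for(self, key: str) -> str:
        # The owner is the first ring point clockwise from the key's hash.
        i = bisect_right(self._points, _hash(key)) % len(self._ring)
        return self._ring[i][1]
```

The property that makes this attractive for a distributed store is that adding a server only moves keys onto the new server; keys owned by the other servers stay where they were.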

slide-19
SLIDE 19

Outline

§ Motivation
§ Consistent durable data structures
  § Consistent durable B-Tree
  § Tembo – Distributed Data Store Implementation
§ Evaluation

slide-20
SLIDE 20

Ease of Integration

Code base               Lines of Code
Original STX B-Tree     2110
CDDS Modifications      1902 (90%)
Redis (v2.0.0-rc4)      18539
Tembo Modifications     321 (1.7%)
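
The percentages on this slide follow from dividing the modified lines by the size of the original code base, which the sketch below recomputes. The function name is illustrative.

```python
def modified_percent(modified_lines: int, original_lines: int) -> float:
    """Size of a modification relative to the original code base, in percent."""
    return 100.0 * modified_lines / original_lines


cdds = modified_percent(1902, 2110)    # CDDS changes vs. original STX B-Tree
tembo = modified_percent(321, 18539)   # Tembo changes vs. Redis v2.0.0-rc4

if __name__ == "__main__":
    print(f"CDDS: {cdds:.0f}%  Tembo: {tembo:.1f}%")
```

The contrast is the point of the slide: the B-Tree itself changes substantially, while the key-value store built on top of it needs changes amounting to under 2% of its code.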

slide-21
SLIDE 21

Evaluation - Setup

§ API Microbenchmarks
  § Compare with Berkeley DB
  § Tembo: Versioning vs. write-ahead logging
§ End-to-End Comparison
  § NoSQL systems – Cassandra
  § Yahoo Cloud Serving Benchmark
§ 15 node test cluster
  § 13 servers, 2 clients
  § 720 GB RAM, 120 cores

slide-22
SLIDE 22

Durability - Logging vs. Versioning

[Figure: Throughput (Ops/sec) vs. Value size (bytes) for value sizes 256, 1024 and 4096. Series: Redis - BTree+Logging, Redis - Hashtable+Logging, Tembo - CDDS BTree]

2M insert operations, two client threads

slide-23
SLIDE 23

Yahoo Cloud Serving Benchmark

[Figure: Ops/sec vs. Client Threads (2, 10, 20, 30). Series: Tembo, Cassandra-inmemory, Cassandra-disk. Annotations on the chart: 286% and 44%]

slide-24
SLIDE 24

Furthermore

§ Algorithms for deletion
§ Analysis for space usage and height of B-Tree
§ Durability techniques for current processors

slide-25
SLIDE 25

Related Work

§ Multi-version data structures
  § Used in transaction time databases
§ NVBM based systems
  § BPFS – File system (SOSP 2009)
  § NV-Heaps – Transaction Interface (ASPLOS 2011)
§ In-memory data stores
  § H-Store – MIT, Brown University, Yale University
  § RAMCloud – Stanford University

slide-26
SLIDE 26

Work-in-progress

§ Robust reliability testing
§ Support for transaction-like operations
§ Integration of versioning and wear-leveling

slide-27
SLIDE 27

Conclusion

§ Changes in storage media
  § Rethink software stack
§ Consistent Durable Data Structures
  § Single-level store
  § Durability through versioning
  § Up to 286% faster than memory-backed systems