The Bw-Tree: A B-tree for New Hardware Platforms. Author: J. Levandoski et al. (PowerPoint presentation)



SLIDE 1

The Bw-Tree: A B-tree for New Hardware Platforms

Author: J. Levandoski et al.

SLIDE 2

The Bw-Tree: A B-tree for New Hardware Platforms

Author: J. Levandoski et al.

“Bw” = Buzz Word: DRAM + Flash storage

SLIDE 3

Hardware Trends

  • Multi-core + large main memories

○ Latch contention
  ■ Worker threads set latches for accessing data
○ Cache invalidation
  ■ Worker threads access data from different NUMA nodes

SLIDE 4

Hardware Trends

  • Multi-core + large main memories

○ Latch contention
  ■ Worker threads set latches for accessing data
○ Cache invalidation
  ■ Worker threads access data from different NUMA nodes

Delta updates
○ No updates in place
○ Reduce cache invalidation
○ Enable latch-free tree operations

SLIDE 5

Hardware Trends

  • Flash storage

○ Good at random reads and sequential reads/writes
○ Bad at random writes
  ■ Erase cycle

SLIDE 6

Hardware Trends

  • Flash storage

○ Good at random reads and sequential reads/writes
○ Bad at random writes
  ■ Erase cycle

Log-structured storage design

SLIDE 7

Architecture

Bw-tree Layer
  • CRUD API
  • Bw-tree search logic
  • In-memory pages

Cache Layer
  • Logical page abstraction
  • Paging between flash and RAM

Flash Layer
  • Sequential writes to log-structured storage
  • Flash garbage collection
SLIDE 8

Architecture

Bw-tree Layer
  • CRUD API
  • Bw-tree search logic
  • In-memory pages

Cache Layer
  • Logical page abstraction
  • Paging between flash and RAM

Flash Layer
  • Sequential writes to log-structured storage
  • Flash garbage collection

Atomic record store, not an ACID transactional database


SLIDE 10

Logical Pages and Mapping Table

  • Logical pages are identified by PIDs stored as Mapping Table keys.
  • Physical addresses can be either in main memory or in flash storage.
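The indirection can be sketched in a few lines of Python (illustrative names, not the paper's implementation): the Mapping Table translates a PID to a physical location, which may be either an in-memory page or a flash offset, so a page can move between RAM and flash without touching any pointer inside the tree.

```python
# Illustrative sketch of the Mapping Table's PID indirection.
# Class and method names here are ours, not from the paper.

class MappingTable:
    def __init__(self):
        self._next_pid = 0
        self._entries = {}              # PID -> ("mem", page) | ("flash", offset)

    def allocate(self, page):
        """Register a new in-memory page and hand out its PID."""
        pid = self._next_pid
        self._next_pid += 1
        self._entries[pid] = ("mem", page)
        return pid

    def translate(self, pid):
        """Resolve a logical PID to its current physical location."""
        return self._entries[pid]

    def relocate(self, pid, location):
        """Swap in a new physical location; callers only ever hold PIDs,
        so pages can migrate between RAM and flash transparently."""
        self._entries[pid] = location

table = MappingTable()
pid = table.allocate({"k": "v"})
table.relocate(pid, ("flash", 4096))    # page evicted to flash storage
kind, where = table.translate(pid)
```

Because the tree only stores PIDs, updating this one table entry is enough to republish a page, which is exactly what the delta-update mechanism on the next slides exploits.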
SLIDE 11

Delta Updates

  • Tree operations are atomic.
  • Update operations are “logged” as a lineage of delta records.
  • Delta records are incorporated into the base page asynchronously.
  • Updates are “installed” in the Mapping Table through compare-and-swap.
  • Important enabler for latch-freedom and cache efficiency.
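A minimal sketch of the mechanism (illustrative names; a real Bw-tree issues a hardware compare-and-swap on a 64-bit mapping-table word, simulated here with a lock): an update never touches the base page, it prepends a delta record linked to the prior chain head and installs the new head with a CAS, retrying if it loses a race.

```python
import threading

class BasePage:
    def __init__(self, records):
        self.records = dict(records)

class Delta:
    """A delta record: one update, linked to the prior page state."""
    def __init__(self, op, key, value, nxt):
        self.op, self.key, self.value, self.next = op, key, value, nxt

class MappingTable:
    def __init__(self):
        self.slots = {}                     # PID -> chain head (Delta or BasePage)
        self._lock = threading.Lock()       # stand-in for a hardware CAS

    def cas(self, pid, expected, new):
        """Install `new` only if the slot still holds `expected`."""
        with self._lock:
            if self.slots.get(pid) is expected:
                self.slots[pid] = new
                return True
            return False

def upsert(table, pid, key, value):
    while True:                             # retry if another thread won the CAS
        head = table.slots[pid]
        if table.cas(pid, head, Delta("upsert", key, value, head)):
            return

def lookup(table, pid, key):
    node = table.slots[pid]
    while isinstance(node, Delta):          # newest delta wins
        if node.key == key:
            return None if node.op == "delete" else node.value
        node = node.next
    return node.records.get(key)

table = MappingTable()
table.slots[0] = BasePage({"a": 1})
upsert(table, 0, "b", 2)                    # prepend a delta, no in-place update
```

Note the trade-off this implies: reads must walk the delta chain before reaching the base page, which is why long chains are consolidated asynchronously.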
SLIDE 12

Delta Updates

  • Tree operations are atomic.
  • Update operations are “logged” as a lineage of delta records.
  • Delta records are incorporated into the base page asynchronously.
  • Updates are “installed” in the Mapping Table through compare-and-swap.
  • Important enabler for latch-freedom and cache efficiency.

Q: What is the performance of reading data from page P?

SLIDE 13

Other details

  • SMO: structure modification operations

○ Split, merge, consolidate
○ Has multiple phases -> how to make an SMO atomic?

  • In-memory page garbage collection

○ Epoch-based
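The epoch idea can be sketched as follows (assumed structure, not the paper's code): a thread enters the current epoch before touching pages; a page unlinked during epoch E may only be freed once no thread is still inside an epoch ≤ E, so no reader can be left holding a pointer to reclaimed memory.

```python
# Illustrative sketch of epoch-based memory reclamation.

class EpochManager:
    def __init__(self):
        self.current = 0
        self.active = {}        # epoch -> number of threads inside it
        self.garbage = []       # (epoch, object) pairs awaiting reclamation
        self.freed = []         # objects actually reclaimed (for inspection)

    def enter(self):
        """A thread joins the current epoch before accessing pages."""
        e = self.current
        self.active[e] = self.active.get(e, 0) + 1
        return e

    def leave(self, e):
        self.active[e] -= 1
        if self.active[e] == 0:
            del self.active[e]

    def retire(self, obj):
        """Unlinked page: defer the free until its epoch has drained."""
        self.garbage.append((self.current, obj))

    def advance(self):
        """Bump the epoch and free garbage no active thread can reach."""
        self.current += 1
        oldest = min(self.active, default=self.current)
        remaining = []
        for e, obj in self.garbage:
            if e < oldest:
                self.freed.append(obj)   # safe: nobody can still see obj
            else:
                remaining.append((e, obj))
        self.garbage = remaining

em = EpochManager()
e = em.enter()
em.retire("unlinked-page")
em.advance()                             # reader still in epoch 0: not freed yet
pending = list(em.freed)
em.leave(e)
em.advance()                             # epoch drained: now reclaimed
```

This sidesteps latches on the read path: readers pay only an epoch enter/leave, and reclamation cost is shifted to the background.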

SLIDE 14

Architecture

Bw-tree Layer
  • CRUD API
  • Bw-tree search logic
  • In-memory pages

Cache Layer
  • Logical page abstraction
  • Paging between flash and RAM

Flash Layer
  • Sequential writes to log-structured storage
  • Flash garbage collection
SLIDE 15

Flash Layer

SLIDE 16

Flushing Pages

[Figure: Mapping Table (PID -> physical address) entry for P; page P with delta chain Insert 40, Insert 50, Delete 33, Modify 40 to 60; log-structured store]

Q: Why flush pages?
Q: When to flush pages?
Q: How many pages to flush?
Q: What if you crash during a flush?
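The animation on the following slides can be condensed into a short sketch (our names, not the paper's): a flush marshals only the deltas that have not yet reached flash into a write buffer, which is appended sequentially to the log-structured store, never updating flash in place.

```python
# Illustrative sketch of incremental page flushing to a log-structured store.

class LogStructuredStore:
    """Append-only 'flash' log: sequential writes only, never in place."""
    def __init__(self):
        self.log = []

    def append(self, entries):
        offset = len(self.log)          # flush lands at the current log tail
        self.log.extend(entries)
        return offset

def flush_page(pid, deltas, flushed, store):
    """Marshal only the not-yet-flushed deltas of `pid` into the store."""
    new = [d for d in deltas if d not in flushed]
    offset = store.append([(pid, d) for d in new])
    flushed.update(new)
    return offset

store = LogStructuredStore()
flushed = set()
first = flush_page(7, ["Insert 40", "Insert 50"], flushed, store)
second = flush_page(7, ["Insert 40", "Insert 50", "Delete 33"], flushed, store)
```

Because every flush is an append, a crash mid-flush cannot corrupt earlier log state: the partially written tail is simply discarded on recovery.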

SLIDE 17

Flushing Pages

[Figure: Mapping Table entry for P; page P; flush write buffer in front of the log-structured store]

SLIDE 18

Flushing Pages

[Figure: page P appended to the log-structured store via the flush write buffer]

SLIDE 19

Flushing Pages

[Figure: pages P and T in the log-structured store]

SLIDE 20

Flushing Pages

[Figure: a second flush begins; pages P and T already in the log-structured store]

SLIDE 21

Flushing Pages

[Figure: deltas Insert 50 and Delete 33 of page P being flushed; pages P and T in the log-structured store]


SLIDE 23

Flushing Pages

[Figure: deltas Insert 50 and Delete 33 appended to the log-structured store after pages P and T]

SLIDE 24

Flushing Pages

[Figure: page E enters the flush write buffer; pages P and T and the deltas Insert 50, Delete 33 already in the log-structured store]

SLIDE 25

Flushing Pages

[Figure: page E flushed; the log now holds page P, page T, deltas Insert 50 and Delete 33, and page E in append order]

SLIDE 26

Other details

  • Log-structured Store garbage collection

○ Cleans orphaned data unreachable from the mapping table
○ Relocates entire pages in sequential blocks (to reduce fragmentation)

  • Access method recovery

○ Occasionally checkpoint the mapping table
○ Redo scan starts from the last checkpoint
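A sketch of that recovery scheme (assumed shapes, not the paper's format): a checkpoint snapshots the mapping table together with the log offset it covers; recovery reloads the snapshot and redo-scans only the log tail written after it, with later entries winning.

```python
# Illustrative sketch of checkpoint + redo-scan recovery of the mapping table.

def checkpoint(mapping, log_position):
    """Snapshot the mapping table together with the log offset it covers."""
    return dict(mapping), log_position

def recover(ckpt, log):
    """Reload the snapshot, then redo-scan the log tail past the checkpoint."""
    mapping, start = ckpt
    mapping = dict(mapping)
    for pid, flash_offset in log[start:]:
        mapping[pid] = flash_offset         # later log entries win
    return mapping

log = [(1, 100)]                            # (PID, flash offset) flush records
ckpt = checkpoint({1: 100}, len(log))
log += [(2, 200), (1, 150)]                 # flushes after the checkpoint
recovered = recover(ckpt, log)
```

Checkpointing more often shortens the redo scan at the cost of more checkpoint writes, which is the usual knob in log-based recovery.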

SLIDE 27

Experiment

  • Compared against:

○ BerkeleyDB (without transactions)
○ A latch-free skip list

SLIDE 28

Experiment

Over BerkeleyDB:

  • 18x speedup in read-intensive workload
  • 5-8x speedup in update-intensive workload

Over Skip-list:

  • 4.4x speedup in read-only workload.
  • 3.7x speedup in update-intensive workload.
SLIDE 29

Thank you!

Slides adapted from http://www.hpts.ws/papers/2013/bw-tree-hpts2013.pdf