
SLIDE 1

A Light-weight Compaction Tree to Reduce I/O Amplification toward Efficient Key-Value Stores

Ting Yao¹, Jiguang Wan¹, Ping Huang², Xubin He², Qingxin Gui¹, Fei Wu¹, and Changsheng Xie¹

¹ Wuhan National Laboratory for Optoelectronics, Huazhong University of Science and Technology
² Temple University

SLIDE 2

Outline

➢ Background
➢ LWC-tree (Light-Weight Compaction tree)
➢ LWC-store on SMR drives
➢ Experiment Results
➢ Summary

SLIDE 3

Background

Key-value stores are widespread in modern data centers.
• Better service quality
• Responsive user experience

The log-structured merge tree (LSM-tree) is widely deployed.
• RocksDB, Cassandra, HBase, PNUTS, and LevelDB
• High write throughput
• Fast read performance

[Figure: LSM-tree architecture — a Write(key, value) goes to the MemTable in memory, becomes an immutable MemTable, and is flushed to SSTables on disk across levels L0, L1, L2, …, Ln; compaction merges tables between levels; a log persists incoming writes.]

SLIDE 4

Background · LSM-tree

[Figure: a conventional compaction — ① read the victim table a–c from level L1 and the overlapped tables a, b, c from L2; ② sort the merged data; ③ write the result back to L2.]
The overall read and write data size for this compaction: 8 tables.

The resulting I/O amplification is above 10x (13x in the illustrated case).

➢ This serious I/O amplification of compactions motivates our design!
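To make the accounting concrete, here is a minimal arithmetic sketch (the one-victim, three-overlapped-table counts come from the figure; everything else is illustrative):

```python
# Sketch: I/O accounting for one conventional LSM-tree compaction.
# Table counts from the figure: 1 victim table in Li overlapping
# 3 tables in Li+1, all of roughly equal size.
victim_tables = 1
overlapped_tables = 3

tables_read = victim_tables + overlapped_tables     # step ①: read them all
tables_written = victim_tables + overlapped_tables  # step ③: write sorted result

total_io = tables_read + tables_written             # 8 tables of I/O
print(f"{total_io} tables read+written to push {victim_tables} table down a level")
```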

SLIDE 5

➢ Background
➢ LWC-tree (Light-Weight Compaction tree)
➢ LWC-store on SMR drives
➢ Experiment Results
➢ Summary

SLIDE 6

LWC-tree

Aim

• Alleviate I/O amplification
• Achieve high write throughput
• No sacrifice of read performance

How

Keep the basic components of the LSM-tree: tables stay sorted and the multilevel structure is retained.

➢ Light-weight compaction – reduces I/O amplification
➢ Metadata aggregation – reduces random reads in a compaction
➢ New table structure, DTable – improves lookup efficiency within a table
➢ Workload balance – keeps the LWC-tree balanced

SLIDE 7

LWC-tree · Light-weight compaction

Aim

➢ Reduce I/O amplification

How

Append the data and merge only the metadata:
➢ Read the victim table
➢ Sort and divide the data; merge the metadata
➢ Overwrite the metadata and append the segments

This reduces amplification by 10x in theory (amplification factor AF = 10).

(In the LSM-tree, the overall read and write data size for a conventional compaction is 8 tables.)

[Figure: a light-weight compaction — ① read the victim table a–c in level L1; ② sort and divide its data into segments a′, b′, c′ by the key ranges of the overlapped tables a, b, c in L2; ③ overwrite the metadata and append each segment to its overlapped table.]

The overall read and write data size for a light-weight compaction: 2 tables.
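A minimal Python sketch of these three steps (the `DTable` class, its fields, and the dict-based segments are illustrative stand-ins, not the paper's actual structures):

```python
from dataclasses import dataclass, field

@dataclass
class DTable:
    low: str                                      # key range covered by this table
    high: str
    data: dict = field(default_factory=dict)      # origin data
    segments: list = field(default_factory=list)  # appended segments
    meta: dict = field(default_factory=dict)      # index blocks, one per segment

def light_weight_compaction(victim: DTable, overlapped: list[DTable]) -> None:
    # (1) Read the victim table: the only data read in a light-weight compaction.
    items = sorted(victim.data.items())
    for table in overlapped:
        # (2) Divide the victim's data by each overlapped table's key range.
        seg = {k: v for k, v in items if table.low <= k <= table.high}
        if seg:
            # (3) Append the segment; only metadata is merged and rewritten.
            table.segments.append(seg)
            table.meta[f"segment_{len(table.segments)}"] = sorted(seg)

victim = DTable("a", "c", data={"a1": 1, "b2": 2, "c3": 3})
level_down = [DTable("a", "az"), DTable("b", "bz"), DTable("c", "cz")]
light_weight_compaction(victim, level_down)
# Total I/O: read 1 table, append ~1 table of segments -> 2 tables.
```

Because the existing data in the overlapped tables is never rewritten, the compaction touches two tables' worth of data instead of eight.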

SLIDE 8

LWC-tree · Metadata Aggregation

Aim

Reduce random read in a compaction Efficiently obtain the metadata form

  • verlapped Dtables

How

Cluster the metadata of the overlapped DTables into the corresponding victim DTable after each compaction.

[Figure: a light-weight compaction — the victim a–c in level Li is divided into segments a′, b′, c′ and appended to the overlapped tables a, b, c in level Li+1.]

SLIDE 9

LWC-tree · Metadata Aggregation

Aim and How: as on the previous slide.

[Figure: metadata aggregation after the light-weight compaction — the metadata of the overlapped DTables a, b, c is clustered into the corresponding victim DTable in level Li.]
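A sketch of the clustering step, with plain dicts standing in for the on-disk metadata blocks (an assumption, not the actual format): after each compaction, the table that may become the next victim caches its overlapped DTables' metadata, so the next light-weight compaction avoids random metadata reads.

```python
def aggregate_metadata(victim_meta: dict, overlapped: list) -> None:
    """Cluster the metadata of the overlapped DTables into the victim's
    metadata block (illustrative structure, not the on-disk format)."""
    victim_meta["overlapped"] = [
        {"low": t["low"], "high": t["high"], "index": t["index"]}
        for t in overlapped
    ]

# The next compaction of this table obtains every overlapped table's key
# range and index with one sequential read, instead of random reads.
meta = {}
aggregate_metadata(meta, [{"low": "a", "high": "az", "index": ["a1"]},
                          {"low": "b", "high": "bz", "index": ["b2"]}])
```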

SLIDE 10

LWC-tree · DTable

Aim

• Support light-weight compaction
• Keep lookup efficiency within a DTable

How

• Store the metadata of its corresponding overlapped DTables
• Manage the data and block index per segment

[Figure: DTable layout — origin data followed by appended segments (segment 1, segment 2, …); a metadata region holding the overlapped DTables' metadata; data blocks, filter blocks, a meta-index block, an overlapped meta-index block, and an index block (origin index plus one index per segment), ending with a footer and magic number. Each appended segment carries its own index block.]
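Under this layout, a point lookup might proceed as in the sketch below (an assumed probe order: per-segment index blocks newest-first, since appended segments supersede the origin data, then the origin index; the dict-based blocks are illustrative):

```python
def dtable_get(dtable: dict, key: str):
    """Sketch of a point lookup in a DTable: probe the per-segment index
    blocks newest-first (appended data supersedes older data), then fall
    back to the origin index."""
    for segment in reversed(dtable["segments"]):
        if key in segment["index"]:        # segment's own index block
            return segment["data"][key]
    if key in dtable["origin_index"]:      # origin index block
        return dtable["origin_data"][key]
    return None
```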

SLIDE 11

LWC-tree · Workload Balance

Aim

• Keep the LWC-tree balanced
• Improve operation efficiency

How

Deliver part of the key range of the overly-full table to its siblings after a light-weight compaction.

Advantage

• No data movement and no extra overhead

[Figure: data volume per DTable (1…n) in level Li — a light-weight compaction followed by a range adjustment hands part of the overly-full table's key range to a sibling, evening out the data volume.]
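A sketch of the range adjustment with hypothetical field names (`low`/`high` bounds on dicts): only key-range metadata changes hands, which is why there is no data movement.

```python
def adjust_range(full: dict, right_sibling: dict, new_high: str) -> None:
    """Sketch: after a light-weight compaction, deliver the upper part of
    an overly-full DTable's key range to its right sibling. Only the
    key-range metadata changes; no data blocks move."""
    assert full["low"] <= new_high < full["high"]
    right_sibling["low"] = new_high   # sibling receives future data
    full["high"] = new_high           # for keys above the new boundary
```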

SLIDE 12

➢ Background
➢ LWC-tree (Light-Weight Compaction tree)
➢ LWC-store on SMR drives
➢ Experiment Results
➢ Summary

SLIDE 13

LevelDB on SMR Drives

SMR (Shingled Magnetic Recording)
• Overlapped tracks
• Bands and guard regions
• Random-write constraint

LevelDB on SMR
➢ Multiplicative I/O amplification

[Figure from FAST 2015, "Skylight – A Window on Shingled Disk Operation".]

SLIDE 14

LevelDB on SMR Drives

[Chart: write amplification of LevelDB on SMR vs. band size.]

Band size (MB):   20      30      40      50      60
WA:               9.73    10.07   9.83    9.72    9.86
MWA:              25.22   39.89   52.85   62.14   76.59

At a 40 MB band size:
• WA (write amplification of LevelDB): 9.83x
• AWA (auxiliary write amplification of SMR): 5.38x
• MWA (multiplicative write amplification of LevelDB on SMR): 52.85x
➢ This auxiliary I/O amplification of SMR motivates our implementation!

SLIDE 15

LWC-store on SMR drive

Aim

➢ Eliminate the auxiliary I/O amplification of SMR
➢ Improve the overall performance

How

➢ A DTable is mapped to a band in the SMR drive
➢ A segment appends to the band and overlaps (invalidates) the out-of-date metadata
➢ Equal division: divide a DTable that overflows a band into several sub-tables in the same level
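A sketch of the band mapping (the `band` dict and `BAND_SIZE` constant are illustrative; the experiments vary band size between 20 and 60 MB):

```python
BAND_SIZE = 40 * 1024 * 1024  # assumed band size (40 MB), an illustrative choice

def append_segment(band: dict, segment: bytes) -> None:
    """Sketch: a DTable owns one SMR band; compaction segments append
    sequentially, and freshly written metadata supersedes the stale copy."""
    if band["used"] + len(segment) > BAND_SIZE:
        # Equal division: split the overflowing DTable into several
        # sub-tables in the same level, each mapped to its own band.
        raise OverflowError("DTable overflows its band; split it")
    band["data"].append(segment)   # strictly sequential, SMR-friendly write
    band["used"] += len(segment)
```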

SLIDE 16

➢ Background
➢ LWC-tree (Light-Weight Compaction tree)
➢ LWC-store on SMR drives
➢ Experiment Results
➢ Summary

SLIDE 17

Configuration

1. LevelDB on HDDs (LDB-hdd)
2. LevelDB on SMR drives (LDB-smr)
3. SMRDB*
   • An SMR-drive-optimized key-value store
   • Reduces the LSM-tree to only two levels (L0 and L1)
   • Key ranges of tables at the same level may overlap
   • Matches the SSTable size to the band size
4. LWC-store on SMR drives (LWC-smr)

Experiment Platform

Test machine: 16 Intel(R) Xeon(R) CPU E5-2660 0 @ 2.20GHz processors
SMR drive: Seagate ST5000AS0011
CMR HDD: Seagate ST1000DM003
SSD: Intel P3700

SLIDE 18

Experiment · Load (100GB data)

Random load
• Involves a large number of compactions
• LWC-store is 9.80x better than LDB-smr, 4.67x better than LDB-hdd, and 5.76x better than SMRDB

Sequential load
• No compactions
• All four stores show similar sequential load throughput

Random load throughput (MB/s):     LDB-SMR 1.00,  LDB-HDD 2.10,  SMRDB 1.70,  LWC-SMR 9.80
Sequential load throughput (MB/s): LDB-SMR 12.20, LDB-HDD 42.90, SMRDB 45.00, LWC-SMR 45.50

SLIDE 19

Experiment · Read (100K entries)

Random get latency (ms):          LDB-SMR 30.58, LDB-HDD 28.91, SMRDB 28.88, LWC-SMR 28.65
Sequential get throughput (MB/s): LDB-SMR 15.50, LDB-HDD 12.60, SMRDB 20.50, LWC-SMR 26.90

Look-ups of 100K KV entries against a 100GB randomly loaded database.

SLIDE 20

Experiment · Compaction (randomly load 40GB data)

Compaction performance under the microscope
➢ LevelDB: the number of compactions is large
➢ SMRDB: the data size of each compaction is large
➢ LWC-tree: a small number of compactions, each with a small data size

Overall compaction time
➢ LWC-smr achieves the highest efficiency

SLIDE 21

Experiment · Compaction (randomly load 40GB data), continued

Overall compaction time (s): LDB-SMR 35640, LDB-HDD 19227, SMRDB 49298, LWC-SMR 5128

➢ LWC-smr achieves the highest compaction efficiency

SLIDE 22

Experiment · Write amplification

Competitors
• LWC-SMR
• LDB-SMR

Write amplification (WA)
• Write amplification of the KV store

Auxiliary write amplification (AWA)
• Auxiliary write amplification of SMR

Multiplicative write amplification (MWA)
• Multiplicative write amplification of the KV store on SMR

Band size:      20MB    30MB    40MB    50MB    60MB
LWC-SMR WA:     2.24    1.56    1.47    1.39    1.39
LDB-SMR WA:     9.73    10.07   9.83    9.72    9.86
LWC-SMR AWA:    1.02    1.08    1.14    1.17    1.17
LDB-SMR AWA:    2.59    3.96    5.38    6.39    7.77
LWC-SMR MWA:    2.28    1.68    1.68    1.63    1.63
LDB-SMR MWA:    25.22   39.89   52.85   62.14   76.59

At a 50 MB band size, LWC-SMR's MWA is 38.12x lower than LDB-SMR's (62.14 / 1.63).

SLIDE 23

Experiment · LWC-store on HDD and SSD

SLIDE 24

➢ Background
➢ LWC-tree (Light-Weight Compaction tree)
➢ LWC-store on SMR drives
➢ Experiment Results
➢ Summary

SLIDE 25

Summary

LWC-tree: a variant of the LSM-tree
➢ Light-weight compaction – significantly reduces the I/O amplification of compaction

LWC-store on SMR drives
➢ Data management inside the SMR drive – eliminates the auxiliary I/O amplification of the SMR drive

Experiment results
• High compaction efficiency
• High write efficiency
• Read performance as fast as the LSM-tree
• Wide applicability

SLIDE 26

Thank you!

QUESTIONS? Email: tingyao@hust.edu.cn