SLIDE 1

Cascade Mapping: Optimizing Memory Efficiency for Flash-based Key-value Caching

Kefei Wang and Feng Chen, Louisiana State University

SoCC '18, Carlsbad, CA

SLIDE 2

Key-value Systems in Internet Services


  • Key-value systems are widely used today

– Online shopping
– Social media
– Cloud storage
– Big data

(Figure: an example key-value pair, with a key such as Product_ID mapped to values such as Product_Name, URL, and Image)

SLIDE 3

Key-value Caching

“First line of defense” in today’s Internet services

  • High throughput
  • Low latency

Operations: SET, GET, DELETE

(Figure: client requests go from the web server to the cache server; hits are served from the cache, misses fall through to the database server)

SLIDE 4

Flash-based Key-value Caching


  • In-flash key-value caches

– Key-values are stored in commercial flash SSDs
– Examples: Facebook’s McDipper, Twitter’s Fatcache

  • Key features

– Memcached compatible (SET, GET, DELETE)
– Advantages: low cost and high performance

  • McDipper: reduced deployed servers by 90%; 90% of GETs < 1 ms*

        Speed     Power     Cost      Capacity   Persistency
DRAM    High      High      High      Low        No
Flash   Low (-)   Low (+)   Low (+)   High (+)   Yes (+)

(+ and - mark advantages and drawbacks of flash relative to DRAM)

*https://www.facebook.com/notes/facebook-engineering/mcdipper-a-key-value-cache-for-flash-storage/10151347090423920/

SLIDE 5

Flash-based Key-value Caching

Data is stored in flash; all the mappings are kept in DRAM.

(Figure: a hash-based mapping table in DRAM memory; each entry (MD[20], Slab_ID, Slot_ID, Expiry) points to a slot within a key-value slab on the flash SSD)
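For concreteness, a minimal C sketch of what one such in-DRAM mapping entry could look like, using the field names from the figure (the actual Fatcache layout differs; the deck quotes roughly 44 bytes per entry):

    #include <stdint.h>

    /* Sketch of one in-DRAM mapping entry. Field names follow the figure;
     * the real Fatcache struct differs, and padding plus the chain pointer
     * push the per-entry cost to roughly the 44 bytes quoted on the slides. */
    struct kv_mapping {
        uint8_t  md[20];          /* digest of the key (e.g., SHA-1)       */
        uint32_t slab_id;         /* which in-flash slab holds the item    */
        uint32_t slot_id;         /* slot index inside that slab           */
        uint32_t expiry;          /* expiration time of the cached item    */
        struct kv_mapping *next;  /* hash-bucket chain pointer (in DRAM)   */
    };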

SLIDE 6

Scalability Challenge

  • High Index-to-data Ratio

– Key-value cache is dominated by small items (90% < 500 bytes)
– Key-value mapping entry size: 44 bytes in Fatcache

  • Flash memory vs. DRAM memory

– Capacity: a flash cache is 10-100x larger than a memory-based cache
– Price: 1 TB of flash costs $200-500, 1 TB of DRAM costs >$10,000
– Growth: flash capacity grows 50-60% per year, DRAM 25-40% per year

Atikoglu et al., “Workload Analysis of A Large-scale Key-value Store”, in SIGMETRICS ’12.

(Figure: assuming an average key-value size of 300 bytes, a 2 TB flash cache needs roughly 300 GB of DRAM just to index it)

A technical dilemma: We have a lot of flash space to cache the data, but we don’t have enough DRAM to index the data
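To see where the figure's numbers come from, a quick back-of-the-envelope calculation using the slide's own assumptions (300-byte average items, 44-byte mapping entries):

    2 TB of flash / 300 B per item ≈ 6.7 billion cached items
    6.7 billion entries × 44 B per mapping ≈ 295 GB ≈ 300 GB of DRAM for the index alone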

SLIDE 7

Evolution of Key-value Caching

  • Leverage the strong locality to differentiate hot and cold mappings

– Hold the most popular mappings in a small in-DRAM mapping structure
– Leave the majority of mappings in a large in-flash mapping structure

(Figure: three designs side by side. An all-in-DRAM cache keeps both the mapping table and the key-value slabs in DRAM and serves a GET with zero flash I/Os; a flash-based cache keeps the mapping table in DRAM and the key-value slabs in flash, costing one flash I/O; splitting the mappings between DRAM and flash keeps the slabs in flash and may cost N flash I/Os per lookup)

SLIDE 8

Outline

  • Cascade mapping design
  • Optimizations
  • Evaluation results
  • Conclusions


SLIDE 9

Cascade Mapping

(Figure: Tier 1 resides in the memory space; Tiers 2 and 3, together with the key-value slabs, reside in the flash space; a key is looked up tier by tier)

Hierarchical Mapping Structure

– Tier 1 – Hot mappings

  • Hash index based search in memory

– Tier 2 – Warm mappings

  • High-bandwidth quick scan in flash

– Tier 3 – Cold mappings

  • Efficient linked-list structure in flash

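A minimal C sketch of the lookup cascade this structure implies (the tier lookup helpers are hypothetical, not the paper's API): check the in-DRAM Tier 1 first, then batch-search the Tier 2 blocks in flash, and only then walk the Tier 3 lists.

    #include <stddef.h>
    #include <stdint.h>

    struct kv_mapping;                      /* Tier-1 entry, as sketched earlier */

    /* Hypothetical per-tier lookup helpers. */
    struct kv_mapping *tier1_hash_lookup(const uint8_t md[20]);  /* 0 flash I/Os */
    struct kv_mapping *tier2_batch_search(const uint8_t md[20]); /* ~1 flash I/O */
    struct kv_mapping *tier3_list_walk(const uint8_t md[20]);    /* N flash I/Os */

    struct kv_mapping *cascade_lookup(const uint8_t md[20])
    {
        struct kv_mapping *m;
        if ((m = tier1_hash_lookup(md)) != NULL)   /* hot: in-memory hash index  */
            return m;
        if ((m = tier2_batch_search(md)) != NULL)  /* warm: scan mapping blocks  */
            return m;
        return tier3_list_walk(md);                /* cold: in-flash hash lists  */
    }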

SLIDE 10

Tier 1: A Mapping Table in Memory

(Figure: a key is hashed into one of the buckets; buckets are grouped into partitions, and each partition has a virtual buffer from which cold entries are demoted to Tier 2)

(Chart: hit ratio (%) versus the fraction of mappings kept in Tier 1 (%), comparing CLOCK, LRU, and FIFO demotion policies)
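Where the deck adopts a memory-efficient CLOCK-based demotion policy for Tier 1 (compared against LRU and FIFO in the chart above), a minimal self-contained C sketch of the CLOCK idea; the structure names are hypothetical and the paper's partitioned, virtual-buffer-based mechanism is more involved:

    #include <stdbool.h>
    #include <stddef.h>

    struct t1_entry {
        bool ref;                 /* set on every hit to this Tier-1 entry   */
        /* ... mapping fields as in the earlier entry sketch ...             */
    };

    /* Sweep a partition's entries with a clock hand and pick one victim to
     * demote to Tier 2: recently referenced entries get a second chance,
     * and the first entry found with its reference bit clear is demoted.   */
    size_t clock_pick_victim(struct t1_entry *entries, size_t n, size_t *hand)
    {
        for (;;) {
            struct t1_entry *e = &entries[*hand];
            size_t victim = *hand;
            *hand = (*hand + 1) % n;
            if (!e->ref)
                return victim;    /* cold: demote this mapping to Tier 2     */
            e->ref = false;       /* warm: clear the bit, revisit later      */
        }
    }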

SLIDE 11

Tier 2: Direct Indexing in Flash

  • Direct mapping block

– A set of mapping entries demoted from Tier 1

  • A FIFO array of blocks

– The most recent version is always in the latest position

  • Parallelized Batch Search

– Parallel I/Os to load multiple mapping blocks into memory
– Scan and find the most recent version of the data in one I/O time (a sketch follows this slide)

(Figure: mapping blocks are appended in FIFO order; a serial search that reads three blocks one after another costs 3x the single-I/O time T, while a parallel batch search that reads them concurrently finds the entry in about 1x T)

Chen et al., “Internal Parallelism of Flash-based Solid State Drives”, ACM Transactions on Storage, 12:3, May 2016
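To illustrate the parallelized batch search, a hedged sketch that uses POSIX AIO to issue reads for several direct mapping blocks at once and then scans them newest-first. The block size, the scan_block() helper, and the choice of POSIX AIO are assumptions made for illustration, not the paper's code:

    #include <aio.h>
    #include <errno.h>
    #include <stdint.h>
    #include <stdlib.h>
    #include <string.h>
    #include <sys/types.h>

    #define BLOCK_SIZE 4096               /* assumed size of one mapping block  */

    struct kv_mapping;                    /* as in the earlier entry sketch     */

    /* Hypothetical helper: scan one block for the digest, fill *out on a hit. */
    int scan_block(const void *block, const uint8_t md[20], struct kv_mapping *out);

    /* Read the newest nblocks direct mapping blocks concurrently, then scan
     * them newest-first so the first match is the most recent version.
     * block_offsets[] is assumed to be ordered from oldest to newest.         */
    int tier2_batch_search(int fd, const off_t *block_offsets, int nblocks,
                           const uint8_t md[20], struct kv_mapping *out)
    {
        struct aiocb cbs[nblocks];
        const struct aiocb *list[nblocks];
        char *bufs = malloc((size_t)nblocks * BLOCK_SIZE);
        int i, found = 0;

        if (bufs == NULL)
            return 0;
        memset(cbs, 0, sizeof(cbs));
        for (i = 0; i < nblocks; i++) {            /* issue all reads at once   */
            cbs[i].aio_fildes = fd;
            cbs[i].aio_buf    = bufs + (size_t)i * BLOCK_SIZE;
            cbs[i].aio_nbytes = BLOCK_SIZE;
            cbs[i].aio_offset = block_offsets[i];
            list[i] = &cbs[i];
            aio_read(&cbs[i]);
        }
        for (i = 0; i < nblocks; i++) {            /* wait for every read       */
            while (aio_error(&cbs[i]) == EINPROGRESS)
                aio_suspend(list, nblocks, NULL);
            aio_return(&cbs[i]);
        }
        for (i = nblocks - 1; i >= 0 && !found; i--)   /* newest block first    */
            found = scan_block(bufs + (size_t)i * BLOCK_SIZE, md, out);

        free(bufs);
        return found;
    }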

SLIDE 12

Tier 3: Hash Table List Designs

(Figure: a narrow table with buckets 0..1023 versus a wide table with buckets 0..1048575, each bucket heading a linked list of mapping blocks in flash)

  • “Narrow” hash table

– Long list to walk through
– Needs fewer memory buffers (e.g., 128 MB)

  • “Wide” hash table

– Short list to walk through
– Needs more memory buffers (e.g., 128 GB)

(Figure: per-bucket memory buffers feed the in-flash lists)

Memory efficiency vs. I/O efficiency
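The two buffer figures are consistent with one in-memory write buffer of roughly 128 KB per bucket; the deck does not state the per-bucket buffer size, so treat this as an inference:

    1,024 buckets × 128 KB = 128 MB (narrow table)
    1,048,576 buckets × 128 KB = 128 GB (wide table)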

SLIDE 13

Tier 3: Dual-mode Hash Table

Memory & I/O efficiency both achieved

– Only one set of dynamic buffers
– Write to the active list first
– Reorganize into the inactive list
– Combines the advantages of both designs

(Figure: writes go first into a narrow active table (buckets 0..1023) backed by dynamic buffers; when a bucket's list reaches its length limit, compaction reorganizes its entries into a wide inactive table (buckets 0..1048575) with dedicated buffers)
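A compressed C sketch of the dual-mode behavior described above (structure names, the length limit, and the helpers are hypothetical): inserts go to the narrow active table first, and once a bucket's chain exceeds its length limit, that chain is reorganized into the wide inactive table.

    #include <stddef.h>

    #define ACTIVE_BUCKETS    1024u      /* narrow table, from the figure      */
    #define INACTIVE_BUCKETS  1048576u   /* wide table, from the figure        */
    #define LENGTH_LIMIT      64u        /* hypothetical per-bucket limit      */

    struct t3_node { struct t3_node *next; /* mapping entry payload omitted */ };

    struct dual_table {
        struct t3_node *active[ACTIVE_BUCKETS];     /* short, write-buffered   */
        struct t3_node *inactive[INACTIVE_BUCKETS]; /* long-term, short chains */
        size_t          active_len[ACTIVE_BUCKETS];
    };

    /* Hypothetical helpers: hash of the key digest, and a compaction pass
     * that rewrites one active chain into the inactive table in flash.        */
    size_t hash_key(const void *md, size_t nbuckets);
    void   compact_bucket(struct dual_table *t, size_t active_idx);

    void t3_insert(struct dual_table *t, const void *md, struct t3_node *n)
    {
        size_t b = hash_key(md, ACTIVE_BUCKETS);
        n->next = t->active[b];              /* write to the active table      */
        t->active[b] = n;
        if (++t->active_len[b] >= LENGTH_LIMIT)
            compact_bucket(t, b);            /* reorganize into the inactive   */
    }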

SLIDE 14

Outline

  • Cascade mapping design
  • Optimizations
  • Evaluation results
  • Conclusions


SLIDE 15

Optimization Techniques

  • Partition the hash space to create multiple demotion I/O streams
  • Adopt a memory-efficient CLOCK-based demotion policy
  • Organize an array of direct mapping blocks in FIFO order
  • Parallel batch search to quickly complete a one-to-one scan
  • Use a dual-mode hash table for both memory and I/O efficiency
  • A jump list using Bloom filters to skip impossible blocks
  • Make the FIFO-based eviction policy locality aware
  • Use a slab sequence counter to realize zero-I/O demapping
  • Leverage the FIFO nature of slabs for efficient crash recovery


SLIDE 17

Optimization: Jump List


Bloom filters are used to avoid unnecessary tier-3 I/Os

– Bloom filters are stored in flash together with the regular mapping blocks
– Each indicates whether a mapping can be found within the next several blocks
– If the test returns negative, jump ahead to the next Bloom filter block

(Figure: an 8-bit Bloom filter with elements A, B, and C hashed into its set bits)

Bloom filter: a structure to test whether an element is in a set

– A query returns either “possibly in the set” or “definitely not in the set”
– A false positive is possible, but a false negative is impossible
– Elements can be added to the set, but not removed

(Figure: without the jump list, a bucket heads one single long list; with Bloom filter blocks interleaved, it becomes several short lists connected by hops)
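To make the Bloom filter mechanics concrete, a small self-contained C sketch; the filter size, the number of probes, and the FNV-1a hashing are illustrative choices, not the paper's parameters:

    #include <stdbool.h>
    #include <stddef.h>
    #include <stdint.h>

    #define BF_BITS   8192u          /* illustrative filter size in bits       */
    #define BF_HASHES 3u             /* illustrative number of hash probes     */

    typedef struct { uint8_t bits[BF_BITS / 8]; } bloom_t;

    /* FNV-1a hash, salted per probe so the probes land on different bits.     */
    static uint32_t bf_hash(const void *key, size_t len, uint32_t salt)
    {
        const uint8_t *p = key;
        uint32_t h = 2166136261u ^ salt;
        for (size_t i = 0; i < len; i++) { h ^= p[i]; h *= 16777619u; }
        return h % BF_BITS;
    }

    void bloom_add(bloom_t *bf, const void *key, size_t len)
    {
        for (uint32_t i = 0; i < BF_HASHES; i++) {
            uint32_t b = bf_hash(key, len, i);
            bf->bits[b / 8] |= (uint8_t)(1u << (b % 8));
        }
    }

    /* Returns false only if the key is definitely not in the set; true means
     * "possibly in the set", so false positives can occur but never negatives. */
    bool bloom_might_contain(const bloom_t *bf, const void *key, size_t len)
    {
        for (uint32_t i = 0; i < BF_HASHES; i++) {
            uint32_t b = bf_hash(key, len, i);
            if (!(bf->bits[b / 8] & (1u << (b % 8))))
                return false;
        }
        return true;
    }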

slide-68
SLIDE 68

Optimization: Garbage Collection

18

  • GC is a must-have for key-value systems

– To reclaim flash space – To organize large sequential writes

slide-69
SLIDE 69

Optimization: Garbage Collection

18

  • GC is a must-have for key-value systems

– To reclaim flash space – To organize large sequential writes

Victim slab

slide-70
SLIDE 70

Optimization: Garbage Collection

18

  • GC is a must-have for key-value systems

– To reclaim flash space – To organize large sequential writes

  • Traditional: Free up space immediately

– Erase entire victim slab based on FIFO order – Reclaim space quickly, but may delete hot data

Victim slab

slide-71
SLIDE 71

Optimization: Garbage Collection

18

  • GC is a must-have for key-value systems

– To reclaim flash space – To organize large sequential writes

  • Traditional: Free up space immediately

– Erase entire victim slab based on FIFO order – Reclaim space quickly, but may delete hot data

Victim slab

slide-72
SLIDE 72

Optimization: Garbage Collection

18

  • GC is a must-have for key-value systems

– To reclaim flash space – To organize large sequential writes

  • Traditional: Free up space immediately

– Erase entire victim slab based on FIFO order – Reclaim space quickly, but may delete hot data

Victim slab

slide-73
SLIDE 73

Optimization: Garbage Collection

18

  • GC is a must-have for key-value systems

– To reclaim flash space – To organize large sequential writes

  • Traditional: Free up space immediately

– Erase entire victim slab based on FIFO order – Reclaim space quickly, but may delete hot data

  • Our solution: Keep hot data in cache

– If a k-v item’s mapping is in tier 1, indicating it is hot data – Rewrite hot data to a new slab, then erase victim slab

Victim slab

slide-74
SLIDE 74

Optimization: Garbage Collection

18

  • GC is a must-have for key-value systems

– To reclaim flash space – To organize large sequential writes

  • Traditional: Free up space immediately

– Erase entire victim slab based on FIFO order – Reclaim space quickly, but may delete hot data

  • Our solution: Keep hot data in cache

– If a k-v item’s mapping is in tier 1, indicating it is hot data – Rewrite hot data to a new slab, then erase victim slab

Victim slab

slide-75
SLIDE 75

Optimization: Garbage Collection

18

  • GC is a must-have for key-value systems

– To reclaim flash space – To organize large sequential writes

  • Traditional: Free up space immediately

– Erase entire victim slab based on FIFO order – Reclaim space quickly, but may delete hot data

  • Our solution: Keep hot data in cache

– If a k-v item’s mapping is in tier 1, indicating it is hot data – Rewrite hot data to a new slab, then erase victim slab

Victim slab

slide-76
SLIDE 76

Optimization: Garbage Collection

18

  • GC is a must-have for key-value systems

– To reclaim flash space – To organize large sequential writes

  • Traditional: Free up space immediately

– Erase entire victim slab based on FIFO order – Reclaim space quickly, but may delete hot data

  • Our solution: Keep hot data in cache

– If a k-v item’s mapping is in tier 1, indicating it is hot data – Rewrite hot data to a new slab, then erase victim slab

Victim slab

slide-77
SLIDE 77

Optimization: Garbage Collection


  • GC is a must-have for key-value systems

– To reclaim flash space
– To organize large sequential writes

  • Traditional: Free up space immediately

– Erase the entire victim slab, chosen in FIFO order
– Reclaims space quickly, but may delete hot data

  • Our solution: Keep hot data in cache

– If a k-v item’s mapping is still in Tier 1, the item is hot data
– Rewrite hot data to a new slab, then erase the victim slab

  • Adaptive two-phase GC

– If free flash space is too low, perform fast space reclamation
– Keep hot data when the system is under moderate pressure

(Figure: a victim slab selected for garbage collection)
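A brief C sketch of the adaptive two-phase decision described above (the threshold and helper names are hypothetical, not the paper's code): under severe space pressure the victim slab is reclaimed immediately, otherwise items whose mappings are still in Tier 1 are rewritten to a new slab before the victim is erased.

    #include <stdbool.h>
    #include <stddef.h>

    /* Hypothetical helpers over the slab pool and the Tier-1 index. */
    double free_space_ratio(void);                           /* fraction of free flash */
    size_t slab_item_count(int slab_id);
    bool   item_mapping_in_tier1(int slab_id, size_t slot);  /* "is this item hot?"    */
    void   rewrite_item_to_new_slab(int slab_id, size_t slot);
    void   erase_slab(int slab_id);

    #define URGENT_FREE_RATIO 0.05       /* hypothetical "too low" threshold */

    void gc_reclaim(int victim_slab)
    {
        if (free_space_ratio() >= URGENT_FREE_RATIO) {
            /* Moderate pressure: keep hot data by rewriting it first. */
            for (size_t slot = 0; slot < slab_item_count(victim_slab); slot++)
                if (item_mapping_in_tier1(victim_slab, slot))
                    rewrite_item_to_new_slab(victim_slab, slot);
        }
        /* Severe pressure skips the rewrite pass and reclaims immediately. */
        erase_slab(victim_slab);
    }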

SLIDE 19

Outline

  • Cascade mapping design
  • Optimizations
  • Evaluation results
  • Conclusions


SLIDE 20

Experimental Setup

  • Implementation

– SlickCache: 3,800 lines of C code added to Twitter’s Fatcache

  • Hardware environment

– Lenovo ThinkServers: 4-core Intel Xeon 3.4 GHz with 16 GB DRAM
– 240-GB Intel 730 SSD as the cache device
– 280-GB Intel Optane 900P SSD as the swapping device
– 7,200 RPM Seagate 2-TB HDD as the database device

  • Software environment

– Ubuntu 16.04 with Linux kernel 4.12 and the Ext4 file system
– MongoDB 3.4 for the backend database

  • Workloads

– Yahoo! Cloud Serving Benchmark (YCSB)
– Popular distributions: Hotspot, Zipfian, and Normal


SLIDE 21

Evaluation Results


Comparison with Fatcache and system swapping

Fatcache-Swap-Flash and Fatcache-Swap-Optane are both configured with 10% of physical memory and allowed to swap on the flash SSD and the Optane SSD, respectively.

(Chart: throughput comparison; the callouts on the slide read 2x and 7x)

SLIDE 22

Evaluation Results


Cache effectiveness (Fixed cache size)

SlickCache uses only 10% of the memory used by Fatcache while achieving comparable performance. SlickCache-GC increases throughput by up to 85% due to the optimized GC policy.

SLIDE 23

Evaluation Results

Cache effectiveness (Fixed memory size)

SlickCache is able to index a 10 times larger flash cache with the same amount of memory, which in turn increases the hit ratio by up to 8.2 times and the throughput by up to 125 times.

SLIDE 24

Conclusions


Cascade Mapping for flash-based key-value caching

  • A hierarchical mapping structure for flash-based key-value cache
  • A set of optimizations to improve performance
  • Uses less memory while performing better than the current design

SLIDE 25

Thanks! And Questions?