

SLIDE 1

The Multi-streamed Solid-State Drive

Jeong-Uk Kang*, Jeeseok Hyun, Hyunjoo Maeng, and Sangyeun Cho

Memory Solutions Lab., Memory Division, Samsung Electronics Co., Ltd.

SLIDE 2

SSD as a Drop-in Replacement of HDD

SSD shares a common interface with HDD

  • The block device abstraction paved the way for wide adoption of SSDs

[Figure: The same host stack (Application, OS, File System, Generic Block Layer) accesses either an HDD or an SSD over SATA using logical block addresses]

SLIDE 3

Great, BUT…

Rotating media and NAND flash memory are very different!

[Figure: Behind the same host stack and sector-based logical block address interface, the HDD exposes Read_Sector()/Write_Sector() on a rotating disk, while the SSD exposes Read_Page()/Write_Page()/Erase_Block()/Copy_Page() on NAND flash memory]

SLIDE 4

The Trick is FTL!

The flash translation layer (FTL) inside the SSD provides:

  • Logical block mapping (see the mapping sketch below)
  • Bad block management
  • Garbage collection (GC)

[Figure: The host (Application, sector-based OS/File System, Generic Block Layer) issues sector reads and writes; inside the SSD, the FTL turns them into page writes, page reads, and block erases on NAND flash memory organized as blocks of pages]
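The FTL's logical block mapping is the piece that makes everything else on this slide possible. The C snippet below is a minimal sketch of it (our own illustration with invented names such as `l2p` and `ftl_write`, not the FTL of any real device): a page-level logical-to-physical map services an overwrite out of place, so the new data lands in a fresh physical page and the old page is only marked invalid, leaving behind exactly the garbage that GC must later reclaim.

```c
/* Minimal page-mapping FTL sketch (illustration only; real FTLs are far
 * more sophisticated). An overwrite of an LBA cannot update the old NAND
 * page in place, so it is redirected to a fresh page and the old copy is
 * marked invalid. */
#include <stdio.h>

#define NUM_BLOCKS       4
#define PAGES_PER_BLOCK  4
#define NUM_PAGES        (NUM_BLOCKS * PAGES_PER_BLOCK)
#define UNMAPPED         (-1)

static int l2p[NUM_PAGES];      /* logical page number -> physical page number */
static int valid[NUM_PAGES];    /* does this physical page hold live data?     */
static int next_free = 0;       /* next free physical page (append-only)       */

static void ftl_init(void)
{
    for (int i = 0; i < NUM_PAGES; i++) {
        l2p[i] = UNMAPPED;
        valid[i] = 0;
    }
}

/* Out-of-place write: allocate a new physical page, invalidate the old one. */
static int ftl_write(int lpn)
{
    if (next_free == NUM_PAGES)
        return -1;                        /* out of free pages: GC is needed */
    if (l2p[lpn] != UNMAPPED)
        valid[l2p[lpn]] = 0;              /* old copy becomes garbage        */
    l2p[lpn] = next_free;
    valid[next_free] = 1;
    return next_free++;
}

int main(void)
{
    ftl_init();
    ftl_write(0);
    ftl_write(1);
    ftl_write(0);                         /* overwrite: LBA 0 moves to a new page */
    printf("LBA 0 now lives in physical page %d\n", l2p[0]);
    return 0;
}
```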

SLIDE 5

Garbage Collection (GC)

GC reclaims space to prepare new empty blocks

  • NAND's "erase-before-update" requirement → valid page copying followed by an erase operation (see the GC sketch below)
  • Has a large impact on SSD lifetime and performance

[Figure: Blocks A and B each hold a mix of valid and invalid pages; GC copies the valid pages (Valid data1 to Valid data4) into a free block, after which Blocks A and B can be erased]
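To make the copy-then-erase cycle concrete, here is a small greedy GC sketch in C (again our own illustration under simplified assumptions, not the drive's actual policy): the victim is the sealed block with the fewest valid pages, those pages are copied into a free block, and only then is the victim erased. The copies are pure overhead, which is why later slides measure "valid pages copied".

```c
/* Greedy GC sketch (illustrative). valid_count[b] is the number of live
 * pages in block b; sealed[b] marks blocks that have been fully written
 * and are therefore GC candidates. */
#include <stdio.h>

#define NUM_BLOCKS 4

static int valid_count[NUM_BLOCKS] = {1, 3, 4, 0};  /* block 3 is still free */
static int sealed[NUM_BLOCKS]      = {1, 1, 1, 0};

/* Pick the sealed block with the fewest valid pages (cheapest to clean). */
static int pick_victim(void)
{
    int victim = -1;
    for (int b = 0; b < NUM_BLOCKS; b++) {
        if (!sealed[b])
            continue;
        if (victim < 0 || valid_count[b] < valid_count[victim])
            victim = b;
    }
    return victim;
}

/* Copy the victim's valid pages into a free block, then erase the victim.
 * Returns the number of valid pages copied (the GC overhead). */
static int gc_once(int free_block)
{
    int victim = pick_victim();
    if (victim < 0)
        return 0;
    int copied = valid_count[victim];
    valid_count[free_block] += copied;   /* valid page copying             */
    valid_count[victim] = 0;             /* erase: the victim becomes free */
    sealed[victim] = 0;
    return copied;
}

int main(void)
{
    printf("GC copied %d valid page(s) before erasing its victim\n", gc_once(3));
    return 0;
}
```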

SLIDE 6

GC is Expensive!

Performance of SSD gradually decreases as time goes on

  • Example: Cassandra update throughput

[Plot: Cassandra update throughput (ops/sec) vs. time, curve labeled "Throughput"]

SLIDE 7

GC is Expensive!

Performance of SSD gradually decreases as time goes on

  • Example: Cassandra update throughput

GC highly affects the SSD performance!

[Plot: Cassandra update throughput (ops/sec) and valid pages copied vs. time (minutes), curves labeled "Throughput" and "GC overhead"]

SLIDE 8

Our Idea: Multi-streamed SSD

New interface for SSD

  • Co-exists with the existing block layer
  • General & concrete interface
  • Host-provided stream information guides desirable data placement within the SSD!

[Figure: The host (Application, OS, File System, Generic Block Layer) talks to the SSD (FTL over NAND flash memory) through an added multi-streaming interface]

SLIDE 9

End Result

The multi-streamed SSD can sustain Cassandra update throughput

[Plot: Update throughput (ops/sec) vs. time (minutes), comparing "Proposed" and "Traditional SSD"]

SLIDE 10

Contents

  • Background
    – Write optimization in SSD
  • The Multi-streamed SSD
    – Our approach
    – Case study
  • Evaluation
    – Experimental setup
    – Results
  • Conclusion

SLIDE 11

Effects of Write Patterns

Previous write patterns (= current state) matter

  • Sequential LBA updates into Block 2 (LBA 0, 1, 2, 3): the old copies are spread over Block 0 and Block 1, so GC needs valid page copying from Block 0 & Block 1
  • Random LBA updates into Block 2 (LBA 0, 1, 4, 7): the old copies all sit in Block 0, so GC can just erase Block 0

The small program below reproduces both cases.

[Figure: Block 0 holds LBA 7, 0, 1, 4 and Block 1 holds LBA 2, 3, 6, 5; the two panels show the state after sequential vs. random updates written into Block 2]
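The point is easy to check mechanically. The toy C program below (our own sketch, assuming the block layout drawn on the slide: Block 0 holds LBA 7, 0, 1, 4 and Block 1 holds LBA 2, 3, 6, 5) applies each update pattern and counts the valid pages each old block keeps; a block left with zero valid pages can be erased without any copying.

```c
/* Toy check of the slide's example. Block 0 holds LBA 7,0,1,4 and Block 1
 * holds LBA 2,3,6,5 before the updates; updated LBAs are rewritten into
 * Block 2, invalidating their old copies. */
#include <stdio.h>

static int survivors(const int block[4], const int updated[4])
{
    int live = 0;
    for (int i = 0; i < 4; i++) {
        int invalidated = 0;
        for (int j = 0; j < 4; j++)
            if (block[i] == updated[j])
                invalidated = 1;
        if (!invalidated)
            live++;
    }
    return live;
}

int main(void)
{
    const int block0[4]      = {7, 0, 1, 4};
    const int block1[4]      = {2, 3, 6, 5};
    const int seq_updates[4] = {0, 1, 2, 3};   /* sequential LBA updates */
    const int rnd_updates[4] = {0, 1, 4, 7};   /* random LBA updates     */

    printf("Sequential: Block 0 keeps %d, Block 1 keeps %d valid page(s)\n",
           survivors(block0, seq_updates), survivors(block1, seq_updates));
    printf("Random:     Block 0 keeps %d, Block 1 keeps %d valid page(s)\n",
           survivors(block0, rnd_updates), survivors(block1, rnd_updates));
    return 0;
}
```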

SLIDE 12

Stream

[Figure: Inside the SSD, NAND blocks (each made of pages) are grouped into Stream 1, Stream 2, and Stream 3; a write tagged "write to stream N" goes to that stream's blocks, and each stream is associated with a data lifetime (Lifetime 1, 2, 3)]

SLIDE 13

The Multi-streamed SSD

Multi-streamed SSD

  • Mapping data with different lifetimes to different streams
  • Host: provides information about data lifetime as a stream ID on each write
  • SSD (FTL): places data with similar lifetimes into the same erase unit

A simple placement sketch follows below.

[Figure: Through the multi-stream interface, the application tags its writes (Data1 to Data13) with stream IDs 1, 2, or 3; inside the SSD, the FTL fills a separate NAND block for each stream]
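What "place data with similar lifetime into the same erase unit" means inside the drive can be sketched as follows (our own simplified model, not the product firmware): each stream keeps its own active NAND block, so writes that arrive tagged with the same stream ID fill the same block.

```c
/* Stream-aware placement sketch (illustration only). Each stream fills its
 * own active block, so data tagged with the same stream ID, and hence a
 * similar expected lifetime, shares an erase unit. */
#include <stdio.h>

#define NUM_STREAMS      4     /* stream 0: untagged/legacy writes (assumed) */
#define PAGES_PER_BLOCK  4

struct stream {
    int active_block;          /* block currently being filled by this stream */
    int next_page;             /* next free page within that block            */
};

static struct stream streams[NUM_STREAMS];
static int next_free_block = 0;

static void place_write(int lba, int stream_id)
{
    struct stream *s = &streams[stream_id];
    if (s->next_page == 0 || s->next_page == PAGES_PER_BLOCK) {
        s->active_block = next_free_block++;   /* open a fresh block for this stream */
        s->next_page = 0;
    }
    printf("LBA %3d (stream %d) -> block %d, page %d\n",
           lba, stream_id, s->active_block, s->next_page++);
}

int main(void)
{
    place_write(10, 1);    /* e.g. short-lived, log-like data  */
    place_write(50, 2);    /* e.g. longer-lived data           */
    place_write(11, 1);    /* lands next to LBA 10, not LBA 50 */
    place_write(51, 2);
    return 0;
}
```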

SLIDE 14

Working Example

Multi-streamed SSD

  • High GC efficiency (reduced GC overhead) → effects on performance!
  • Reduces the number of valid pages to copy

For effective multi-streaming, proper mapping of data to streams is essential!

[Figure: The same sequence of write requests placed into Blocks 0, 1, and 2 without streams (data with different lifetimes mixed in each block) and with multi-streaming (each block filled from a single stream); the multi-stream case leaves fewer valid pages to copy at GC time]

SLIDE 15

Case Study: Cassandra

Cassandra employs a size-tiered compaction strategy

[Figure: A write request goes to the Commit Log and the in-memory Memtable; flushing the Memtable produces SSTables (SSTable 1 to 4 with overlapping keys K1, K2, K3), which compaction merges into a larger SSTable (SSTable 5, holding K1, K2, K3) alongside others (SSTable 6, 7, … 21)]

SLIDE 16

Summary of Cassandra's Write Patterns

Write operations when Cassandra runs

  • System data write (metadata, journal, …)
  • Commit-log write
  • Flushing data write
  • Compaction data write

[Figure: The Cassandra write path from slide 15, with each of the four write categories marked]

SLIDE 17

Mapping #1: “Conventional”

Just one stream ID (= conventional SSD)

  • System data, commit-log, flushing, and compaction writes all share a single stream

[Figure: The Cassandra write path with every write category mapped to the single default stream]

SLIDE 18

Mapping #2: “Multi-App”

Add a new stream to separate application writes (stream ID 1) from system traffic (stream ID 0)

  • System data write → stream 0
  • Commit-log, flushing, and compaction writes → stream 1

[Figure: The Cassandra write path with the commit-log, flushing, and compaction writes all tagged with stream ID 1]

SLIDE 19

Mapping #3: “Multi-Log”

Use three streams; further separate the commit log

  • System data write → stream 0
  • Commit-log write → stream 1
  • Flushing and compaction data writes → stream 2

[Figure: The Cassandra write path with the commit log on its own stream and flushed/compacted SSTable data sharing another]

SLIDE 20

Mapping #4: “Multi-Data”

Give distinct streams to different tiers of SSTables

  • System data write → stream 0
  • Commit-log write → stream 1
  • Flushing and compaction data writes → further separated (stream IDs 2 and up), so that different tiers of SSTables land on distinct streams

A stream-ID selection sketch covering all four mappings follows below.

[Figure: The Cassandra write path with system data, commit-log writes, flushing, and compaction writes each tagged with their own stream IDs]
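The four mappings on slides 17 to 20 boil down to a small policy that picks a stream ID from the write's category. The sketch below is our own encoding of that policy (C enums and values chosen for illustration, not code from the talk); for Multi-Data it uses the four-way split quoted on the results slides (system vs. commit log vs. flushed data vs. compaction data), and it does not model the per-tier SSTable split hinted at above.

```c
/* Stream-ID selection for the four mappings (illustrative encoding only).
 * Write categories follow slide 16; ID choices follow slides 17-20. */
#include <stdio.h>

enum write_kind { SYSTEM_DATA, COMMIT_LOG, FLUSH_DATA, COMPACTION_DATA };
enum mapping    { CONVENTIONAL, MULTI_APP, MULTI_LOG, MULTI_DATA };

static int stream_id(enum mapping m, enum write_kind k)
{
    switch (m) {
    case CONVENTIONAL:                        /* everything on one stream    */
        return 0;
    case MULTI_APP:                           /* Cassandra writes vs. system */
        return (k == SYSTEM_DATA) ? 0 : 1;
    case MULTI_LOG:                           /* commit log split out        */
        if (k == SYSTEM_DATA) return 0;
        return (k == COMMIT_LOG) ? 1 : 2;
    case MULTI_DATA:                          /* flush and compaction split  */
        if (k == SYSTEM_DATA) return 0;
        if (k == COMMIT_LOG)  return 1;
        return (k == FLUSH_DATA) ? 2 : 3;
    }
    return 0;
}

int main(void)
{
    printf("Multi-Log:  compaction write -> stream %d\n",
           stream_id(MULTI_LOG, COMPACTION_DATA));
    printf("Multi-Data: compaction write -> stream %d\n",
           stream_id(MULTI_DATA, COMPACTION_DATA));
    return 0;
}
```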

SLIDE 21

Experimental Setup

Multi-stream SSD prototype

  • Samsung 840 Pro SSD
    – 60 GB device capacity

Host machine

  • Intel i7-3770 3.4 GHz processor
  • 2 GB memory

Linux kernel 3.13 (modified)

  • Passes the stream ID through the fadvise() system call
  • Stores it in the VFS inode

An illustrative fadvise() call is sketched below.

YCSB benchmark on Cassandra

  • Write-intensive workload
    – 1 KB data × 1,000,000 records
    – 100,000,000 operations
  • Accelerates SSD aging by increasing Cassandra's flush frequency

[Figure: Application → VFS → EXT4 → device; fadvise(fd, Stream ID) sets an inode field, the stream ID is stored in the buffer head, and it travels with each write down to the SSD]
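How an application would actually tag its files might look like the sketch below. The slide only says that the modified 3.13 kernel takes a stream ID through fadvise() and stashes it in the VFS inode (and later the buffer head); the advice constant POSIX_FADV_STREAM_ID, its numeric value, the use of the length argument to carry the ID, and the file name are all placeholders we invented for illustration, since the slide does not show the exact call encoding.

```c
/* Hypothetical use of the multi-stream fadvise() hook described on the
 * slide. POSIX_FADV_STREAM_ID and carrying the stream ID in the length
 * argument are placeholders; the real patch defines its own encoding,
 * which the slide does not spell out. */
#include <fcntl.h>
#include <stdio.h>

#define POSIX_FADV_STREAM_ID 8          /* invented advice value */

static int set_stream_id(int fd, int stream_id)
{
    /* The modified kernel would record the ID in the file's inode and later
     * copy it into each buffer head so it reaches the SSD with the write. */
    return posix_fadvise(fd, 0, stream_id, POSIX_FADV_STREAM_ID);
}

int main(void)
{
    int fd = open("commitlog-example.log", O_WRONLY | O_CREAT, 0644);
    if (fd < 0) {
        perror("open");
        return 1;
    }
    if (set_stream_id(fd, 1) != 0)      /* e.g. commit-log data -> stream 1 */
        fprintf(stderr, "kernel without multi-stream support (expected on stock kernels)\n");
    return 0;
}
```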

SLIDE 22

Results

Cassandra's normalized update throughput

  • Conventional, "TRIM off"

[Plot: Normalized update throughput (ops/sec) vs. time (minutes) for Conventional (TRIM off)]

SLIDE 23

Results

Cassandra's normalized update throughput

  • Conventional, "TRIM on"

TRIM gives a non-trivial improvement, but it is still far from ideal

[Plot: Normalized update throughput (ops/sec) vs. time (minutes) for Conventional (TRIM off) and Conventional (TRIM on)]

SLIDE 24

Results

Cassandra's normalized update throughput

  • "Multi-App" (System data vs. Cassandra data)

[Plot: Normalized update throughput (ops/sec) vs. time (minutes) for Conventional (TRIM off), Conventional (TRIM on), and Multi-App]

SLIDE 25

Results

Cassandra's normalized update throughput

  • "Multi-Log" (System data vs. Commit-Log vs. Flushed data)

[Plot: Normalized update throughput (ops/sec) vs. time (minutes) for Conventional (TRIM off/on), Multi-App, and Multi-Log]

SLIDE 26

Results

Cassandra's normalized update throughput

  • "Multi-Data" (System data vs. Commit-Log vs. Flushed data vs. Compaction data)

[Plot: Normalized update throughput (ops/sec) vs. time (minutes) for Conventional (TRIM off/on), Multi-App, Multi-Log, and Multi-Data]

SLIDE 27

Result #2

Cassandra's GC overheads

  • Conventional, "TRIM off"

[Plot: Valid pages copied (ops/sec) vs. time (minutes) for Conventional (TRIM off)]

SLIDE 28

Results

Cassandra's GC overheads

  • Conventional, "TRIM on"

[Plot: Valid pages copied (ops/sec) vs. time (minutes) for Conventional (TRIM off) and Conventional (TRIM on)]

SLIDE 29

Results

Cassandra's GC overheads

  • "Multi-App" (System data vs. Cassandra data)

[Plot: Valid pages copied (ops/sec) vs. time (minutes) for Conventional (TRIM off/on) and Multi-App]

SLIDE 30

Results

Cassandra's GC overheads

  • "Multi-Log" (System data vs. Commit-Log vs. Flushed data)

[Plot: Valid pages copied (ops/sec) vs. time (minutes) for Conventional (TRIM off/on), Multi-App, and Multi-Log]

SLIDE 31

Results

Cassandra's GC overheads

  • "Multi-Data" (System data vs. Commit-Log vs. Flushed data vs. Compaction data)

The throughput is very well correlated with GC overheads

[Plot: Valid pages copied (ops/sec) vs. time (minutes) for Conventional (TRIM off/on), Multi-App, Multi-Log, and Multi-Data]

SLIDE 32

Result #3

Cassandra's cumulative latency distribution

  • Multi-streaming improves write latency
  • At the 99.9th percentile, Multi-Data lowers latency by 54% compared to the conventional SSD

[Plot: Cumulative distribution (%) of write latency (µs), from 99% to 100%, for Conventional (TRIM on), Multi-App, Multi-Log, and Multi-Data]

SLIDE 33

Conclusion

Multi-streamed SSD

  • Mapping application and system data with different lifetimes to SSD streams
    – Higher GC efficiency, lower latency
  • Multi-streaming can be supported on a state-of-the-art SSD and co-exist with the traditional block interface
  • The multi-stream interface could become a standard way of using SSDs more efficiently

[Figure: The host with a multi-stream enhanced block layer on top of the SSD's FTL and NAND flash memory]

SLIDE 34