SLIDE 1

A Semi‐Preemptive Garbage Collector for Solid State Drives

Junghee Lee, Youngjae Kim, Galen M. Shipman, Sarp Oral, Feiyi Wang, and Jongman Kim

Presented by Junghee Lee

SLIDE 2

High Performance Storage Systems

  • Server-centric services

– File, web & media servers, transaction processing servers

  • Enterprise‐scale Storage Systems

– Information technology focusing on the storage, protection, and retrieval of data in large‐scale environments

[Image: Google's massive server farms; high-performance storage units built from hard disk drives]

SLIDE 3

Spider: A Large‐scale Storage System

  • Jaguar

– Peta‐scale computing machine

– 25,000 nodes with 250,000 cores and over 300 TB of memory

  • Spider storage system

– The largest center‐wide Lustre‐based file system

– Over 10.7 PB of RAID 6 formatted capacity


  • 13,400 x 1 TB HDDs

– 192 Lustre I/O servers

  • Over 3 TB of memory (on Lustre I/O servers)

SLIDE 4

Emergence of NAND Flash based SSD

  • NAND Flash vs. Hard Disk Drives

– Pros:

  • Semi-conductor technology, no mechanical parts
  • Offer lower access latencies

– μs for SSDs vs. ms for HDDs

  • Lower power consumption
  • Higher robustness to vibrations and temperature

– Cons:

  • Limited lifetime

– 10K to 1M erase cycles per block

  • High cost

– About 8X more expensive than current hard disks

  • Performance variability


SLIDE 5

Outline

  • Introduction
  • Background and Motivation

– NAND Flash and SSD

– Garbage Collection

– Pathological Behavior of SSDs

  • Semi‐Preemptive Garbage Collection
  • Evaluation
  • Conclusion


SLIDE 6

NAND Flash based SSD

[Diagram: the I/O path from application to flash. A process calls fwrite(file, data); the file system (FAT, Ext2, NTFS …) issues a block write (LBA, size); the OS block device driver issues a page write (bank, block, page) over the block interface (SATA, SCSI, etc.); inside the SSD, a CPU and memory run the FTL and drive multiple flash packages.]

SLIDE 7

NAND Flash Organization

[Diagram: a package contains dies (Die 0, Die 1); each die has four planes (Plane 0 … Plane 3); each plane has a register and blocks (Block 0 … Block 2047); each block has 64 pages (Page 0 … Page 63)]

Read: 0.025 ms
Write: 0.200 ms
Erase: 1.500 ms

SLIDE 8

Out‐Of‐Place Write

[Diagram: a logical‐to‐physical address mapping table maps logical page numbers (LPNs) to physical page numbers (PPNs); each physical page is valid (V), invalid (I), or erased (E)]

Write to LPN2: invalidate the old page PPN2, write the data to the erased page PPN3, and update the mapping table.
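To make the mechanism concrete, here is a minimal sketch of out‐of‐place writes in a page‐mapped FTL. The class name, page states, and FIFO free‐page policy are illustrative assumptions, not the paper's implementation:

```python
# Minimal sketch of out-of-place writes in a page-mapped FTL.
# Names and the FIFO free-page policy are illustrative assumptions.
ERASED, VALID, INVALID = "E", "V", "I"

class SimpleFTL:
    def __init__(self, num_physical_pages):
        self.mapping = {}                            # LPN -> PPN
        self.state = [ERASED] * num_physical_pages   # per-physical-page state
        self.free = list(range(num_physical_pages))  # erased (free) pages

    def write(self, lpn):
        old_ppn = self.mapping.get(lpn)
        if old_ppn is not None:
            self.state[old_ppn] = INVALID   # invalidate the old copy
        new_ppn = self.free.pop(0)          # program a free page instead
        self.state[new_ppn] = VALID
        self.mapping[lpn] = new_ppn         # update the mapping table
        return new_ppn

ftl = SimpleFTL(8)
ftl.write(2)   # first write of LPN2 programs a free page
ftl.write(2)   # rewrite invalidates the old page and programs another
```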

SLIDE 9

Garbage Collection

[Diagram: garbage collection on the physical blocks]

– Select a victim block (one with many invalid pages)

– Move its valid pages to a free block

– Erase the victim block

Moving two valid pages costs 2 reads + 2 writes + 1 erase = 2 × 0.025 + 2 × 0.200 + 1.5 = 1.950 ms!
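The cost arithmetic on the slide generalizes directly; a small sketch using the slide‐7 latencies (the helper name is an illustrative assumption):

```python
# GC cost for one victim block, using the slide-7 latencies.
T_READ, T_WRITE, T_ERASE = 0.025, 0.200, 1.500   # ms

def gc_cost_ms(valid_pages):
    """Reads and rewrites each valid page, then erases the victim block."""
    return valid_pages * (T_READ + T_WRITE) + T_ERASE

assert abs(gc_cost_ms(2) - 1.950) < 1e-9   # matches the slide: 1.950 ms
```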

SLIDE 10

Pathological Behavior of SSDs

  • Does GC have an impact on the foreground operations?

– If so, we can observe a sudden bandwidth drop

– More drop with more write requests

– More drop with more bursty workloads

  • Experimental Setup

– SSD devices

  • Intel (SLC) 64GB SSD
  • SuperTalent (MLC) 120GB SSD

– I/O generator

  • Used the libaio asynchronous I/O library for block‐level testing


SLIDE 11

Bandwidth Drop for Write‐Dominant Workloads

  • Experiments

– Measured bandwidth for 1 MB sequential requests while varying the read‐write ratio

[Figure: bandwidth (MB/s) over time (s) for 80%/60%/40%/20% write mixes; left: Intel SLC (SSD), right: SuperTalent MLC (SSD)]

Performance variability increases as we increase the write percentage of workloads.

SLIDE 12

Performance Variability for Bursty Workloads

  • Experiments

– Measured SSD write bandwidth for queue depths (qd) of 8 and 64

– Normalized I/O bandwidth with a Z distribution

[Figure: normalized write bandwidth at qd = 8 and qd = 64; left: Intel SLC (SSD), right: SuperTalent MLC (SSD)]

Performance variability increases as we increase the arrival rate of requests (bursty workloads).

SLIDE 13

Lessons Learned

  • From the empirical study, we learned:

– Performance variability increases as the percentage of writes in the workload increases.

– Performance variability increases with the arrival rate of write requests.

  • This is because:

– Any incoming request during GC must wait until the on‐going GC ends.

– GC is not preemptive.


SLIDE 14

Outline

  • Introduction
  • Background and Motivation
  • Semi‐Preemptive Garbage Collection

– Semi‐Preemption

– Further Optimization

– Level of Allowed Preemption

  • Evaluation
  • Conclusion


SLIDE 15

Technique #1: Semi‐Preemption

[Diagram: GC performs Rx, Wx, E, Ry, Wy over time. An incoming request Wz arriving mid‐GC waits until GC completes under non‐preemptive GC, but is serviced at a preemption point between page moves under preemptive GC. Legend: Rx = read page x, Wx = write page x, E = erase a block; each operation includes data transfer and metadata update.]
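A minimal sketch of the idea, with illustrative function names rather than the paper's simulator code: GC is decomposed into per‐page moves, and the pending request queue is drained at the preemption point before each move.

```python
# Sketch of semi-preemptive GC: pending requests are serviced at the
# preemption points between page moves. Names are illustrative.
from collections import deque

def move_page(page):
    pass        # placeholder: read + write one valid page

def erase_victim():
    pass        # placeholder: erase the victim block

def semi_preemptive_gc(valid_pages, pending, service):
    for page in valid_pages:
        while pending:                  # preemption point
            service(pending.popleft())  # incoming request goes first
        move_page(page)
    erase_victim()                      # the erase itself is not preemptible

semi_preemptive_gc([3, 7], deque(["Wz"]), service=print)
```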

SLIDE 16

Technique #2: Merge

[Diagram: an incoming read Ry arrives while GC is about to move page y. Instead of issuing a separate flash read, the request is merged with the GC's own Ry, so page y is read once and both the request and the page move are served. Legend: Rx = read page x, Wx = write page x, E = erase a block.]
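A sketch of the merge optimization in the same illustrative style: before moving a page, GC checks whether a pending read targets that page and answers it from the GC's own flash read.

```python
# Sketch of the merge optimization: a pending read of the page being
# moved is answered by the GC's own read. Names are illustrative.
def flash_read(page):
    return b"..."          # placeholder: read one page from flash

def flash_write(page, data):
    pass                   # placeholder: program one page

def gc_move_with_merge(page, pending, respond):
    data = flash_read(page)                       # GC's read of the page
    for req in [r for r in pending if r == ("R", page)]:
        respond(req, data)                        # merged: no extra read
        pending.remove(req)
    flash_write(page, data)                       # finish the page move

pending = [("R", 7)]
gc_move_with_merge(7, pending, respond=lambda r, d: print("served", r))
```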

SLIDE 17

Technique #3: Pipeline

[Diagram: an incoming read Rz arrives during GC. Instead of waiting for the current page move to finish, its flash operation is pipelined with the GC's, overlapping one operation's data transfer with the next flash access. Legend: Rx = read page x, Wx = write page x, E = erase a block.]
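A toy timing model of the pipelining effect, assuming (illustratively; the split is not specified on the slide) that each operation has a bus‐transfer phase and a flash‐array phase, and that the array phase can overlap the next transfer:

```python
# Toy model: serial vs. pipelined execution of (transfer, array) phases.
# The phase split and the numbers below are illustrative assumptions.
def serial_time(ops):
    return sum(xfer + array for xfer, array in ops)

def pipelined_time(ops):
    clock = array_free = 0.0
    for xfer, array in ops:
        clock += xfer                        # transfers serialize on the bus
        start = max(clock, array_free)       # array phase overlaps next xfer
        array_free = start + array
    return max(clock, array_free)

ops = [(0.05, 0.20), (0.05, 0.20)]            # two hypothetical page writes
print(serial_time(ops), pipelined_time(ops))  # 0.50 vs. 0.45 ms
```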

SLIDE 18

Level of Allowed Preemption

  • Drawback of PGC

– The completion time of GC is delayed, which may incur a lack of free blocks

– Sometimes preemption needs to be prohibited

  • States of PGC

State    | Garbage collection | Read requests | Write requests
State 0  | X                  | O             | O
State 1  | O                  | O             | O
State 2  | O                  | O             | X
State 3  | O                  | X             | X

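A sketch of that state logic; the trigger (free‐block count) and the threshold values are illustrative assumptions, not from the paper:

```python
# Sketch of PGC's allowed-preemption states. Threshold values and the
# free-block trigger are illustrative assumptions.
def pgc_state(gc_running, free_blocks, low=16, critical=4):
    if not gc_running:
        return 0              # State 0: no GC, all requests allowed
    if free_blocks > low:
        return 1              # State 1: GC on, reads and writes allowed
    if free_blocks > critical:
        return 2              # State 2: free blocks low, writes blocked
    return 3                  # State 3: critical, all requests blocked

def allowed(state):
    return {"reads": state <= 2, "writes": state <= 1}
```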

SLIDE 19

Outline

  • Introduction
  • Background and Motivation
  • Semi‐Preemptive Garbage Collection
  • Evaluation

– Setup

– Synthetic Workloads

– Realistic Workloads

  • Conclusion


SLIDE 20

Setup

  • Simulator

– MSR’s SSD simulator based on DiskSim

  • Workloads

– Synthetic workloads

  • Used the synthetic workload generator in DiskSim

– Realistic workloads

Workloads | Average request size (KB) | Read ratio (%) | Arrival rate (IOP/s) | Type
Financial | 7.09  | 18.92 | 47.19  | Write dominant
Cello     | 7.06  | 19.63 | 74.24  | Write dominant
TPC-H     | 31.62 | 91.80 | 172.73 | Read dominant
OpenMail  | 9.49  | 63.30 | 846.62 | Read dominant


SLIDE 21

Performance Improvements for Synthetic Workloads

  • Varied four parameters: request size, inter‐arrival time, sequentiality, and read/write ratio
  • Varied one parameter at a time while fixing the others

[Figure: average response time (ms) and standard deviation of response time for NPGC vs. PGC, varying request size (8, 16, 32, 64 KB)]

SLIDE 22

Performance Improvements for Synthetic Workloads (cont’d)

[Figure: results while varying inter‐arrival time (10, 5, 3, 1 ms; bursty), probability of sequential access (0.8 to 0.2; random dominant), and probability of read access (0.8 to 0.2; write dominant)]

SLIDE 23

Performance Improvement for Realistic Workloads

[Figure: normalized average response time (left) and normalized standard deviation of response times (right) for Financial, Cello, TPC-H, and OpenMail]

Average response time improves by 6.5% and 66.6%, and the variance of response times by 49.8% and 83.3%, for Financial and Cello respectively.

SLIDE 24

Conclusions

  • Solid state drives

– Fast access speed

– Performance variation caused by garbage collection

  • Semi‐preemptive garbage collection

– Service incoming requests during GC

  • Average response time and performance variation are reduced by up to 66.6% and 83.3%, respectively

SLIDE 25

Questions?

Contact info

Junghee Lee
junghee.lee@gatech.edu
Electrical and Computer Engineering, Georgia Institute of Technology


SLIDE 26

Thank you!
