
SLIDE 1

Leveraging Value Locality in Optimizing NAND Flash-based SSDs

Aayush Gupta, Raghav Pisolkar, Bhuvan Urgaonkar and Anand Sivasubramaniam Computer Systems Lab The Pennsylvania State University

SLIDE 2

Agenda

  • Relook at Locality
  • Another Dimension of Locality: Value Locality
  • Value Locality and SSDs
  • CA-SSD Design
  • Mapping Structures
  • Metadata Management
  • Evaluation
  • CA-SSD vs Traditional SSD

SLIDE 3

Locality: The pillar of storage

  • Temporal Locality: if a logical address is accessed now, it is likely to be accessed again in the near future
  • Spatial Locality: if a logical address is accessed now, there is a high likelihood that its neighboring addresses will be accessed in the near future
  • Pervasive: L1/L2 cache, TLB, buffer cache, virtual memory, disk cache, web cache …

SLIDE 4

Another Dimension of Locality

  • Value Locality: certain content is accessed preferentially
  • Data deduplication using Content Addressable Storage (CAS)
  • Use cases of Value Locality (VL):
  • Network traffic reduction
  • Content-based caching
  • Efficient data storage (archival/backup)
  • E.g.: Venti, Foundation, EMC Centera, Data Domain storage systems

Can we use Value Locality to address the idiosyncrasies of SSDs?

SLIDE 5

CAS suits SSD

SSD                      | CAS
Writes are a bottleneck  | Provides deduplication
Read/write asymmetry     |
Block erases             |

SLIDE 6

CAS and SSD: Made for each other?

SSD                      | CAS
                         | Out-of-place updates

SLIDE 7

Out of Place Updates in CAS

[Figure: out-of-place updates in CAS. Logical addresses translate to physical addresses 120–124, which hold values ABC, DEF, PQR, TUV. Write(123, 'XYZ') stores XYZ at a new physical address rather than overwriting in place; a later Write(123, 'ABC') can simply remap logical address 123 to the existing copy of ABC.]
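The remapping in the figure can be sketched as a minimal log-structured page store. The `PageStore` class and its fields are our illustrative names, not the paper's implementation:

```python
class PageStore:
    """Minimal out-of-place page store: updates never overwrite in place."""

    def __init__(self, first_ppn=120):
        self.next_ppn = first_ppn      # next free physical page number
        self.pages = {}                # PPN -> value
        self.lpt = {}                  # logical page -> physical page
        self.invalid = set()           # stale pages awaiting garbage collection

    def write(self, lpn, value):
        old = self.lpt.get(lpn)
        if old is not None:
            self.invalid.add(old)      # old copy becomes stale, not erased yet
        ppn = self.next_ppn            # always append to a fresh page
        self.next_ppn += 1
        self.pages[ppn] = value
        self.lpt[lpn] = ppn
        return ppn

store = PageStore()
for lpn, val in [(120, 'ABC'), (121, 'DEF'), (122, 'PQR'), (123, 'TUV')]:
    store.write(lpn, val)
store.write(123, 'XYZ')               # lands on a new page; old page marked stale
```

Note that the stale copy of TUV stays on flash until garbage collection, which is exactly why out-of-place updates suit NAND's erase-before-write constraint.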

SLIDE 8

CAS and SSD: Made for each other?

SSD                      | CAS
Erase before write       | Out-of-place updates

SLIDE 9

Problem with CAS

SSD                      | CAS
Fast random reads        | Loss of sequentiality

Do real workloads exhibit Value Locality?

SLIDE 10

Workloads [Koller10]

Workload   Writes (%)   Total Requests (M)   Unique Write Requests (%)   Unique Read Requests (%)
web        77.0         3.8                  42.35                       32.05
mail       77.3         3.6                  7.83                        80.85
homes      96.7         4.4                  66.37                       80.75

[Koller10] Koller, R., and Rangaswami, R. “I/O Deduplication: Utilizing Content Similarity to Improve I/O Performance.” (FAST’10)

Takeaways: the workloads are write dominant and show significant duplication.

SLIDE 11

Value Popularity

  • VP represents the number of occurrences of each unique value in a workload
  • Signifies the potential for deduplication in a workload
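Value popularity can be measured by fingerprinting each written block and counting repeats. A sketch over a toy in-memory trace; `value_popularity` is our name, and the choice of SHA-1 is an assumption for illustration:

```python
import hashlib
from collections import Counter

def value_popularity(write_trace):
    """Count occurrences of each unique value in a write trace.

    write_trace: iterable of (lpn, data) pairs; only the data matters here.
    Returns a Counter mapping content hash -> number of writes of that value.
    """
    pop = Counter()
    for _, data in write_trace:
        digest = hashlib.sha1(data).hexdigest()   # content fingerprint
        pop[digest] += 1
    return pop

# Toy trace: the value b'AAAA' is written three times to different addresses.
trace = [(1, b'AAAA'), (2, b'BBBB'), (3, b'AAAA'), (1, b'CCCC'), (7, b'AAAA')]
pop = value_popularity(trace)
most_common_count = pop.most_common(1)[0][1]      # popularity of the top value
```

With deduplication, a value written n times costs one flash write instead of n, so a skewed popularity distribution directly translates into write savings.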

SLIDE 12

Some Values are Very Popular

[CDF of write accesses vs. unique values (x 10^5) for the mail trace: about half (0.5) of all write accesses are to roughly 24K unique values, i.e. only 8.8% of them.]

SLIDE 13

Some Values are Very Popular

[CDF of write accesses vs. unique values (x 10^5) for web, mail, and homes: half of all write accesses go to roughly 24K unique values (8.8%) for mail and roughly 84K (30%) for web.]

SLIDE 14

CA-SSD

[Figure: CA-SSD brings CAS inside the SSD.]

SLIDE 15

CA-SSD Design

[Figure: CA-SSD hardware. The SSD controller is augmented with a hash co-processor and battery-backed RAM (BB-RAM) in addition to regular RAM.]

SLIDE 16

CA-SSD Design

[Write path: the device driver sends Write(LPN, Data). The hash co-processor computes H(Data), and the controller (FTL) looks it up in the mapping structures held in BB-RAM. If the lookup returns NULL, the data is written to a new PPN and the mappings are updated; if the value already exists at some PPN, only the mapping structures are updated and the write returns without touching flash.]
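The write flow above can be sketched as follows. SHA-1 as the content hash and the class and field names are our assumptions for illustration, not the paper's FTL:

```python
import hashlib

class CASSDWritePath:
    """Sketch of CA-SSD's write path: hash first, write flash only for new values."""

    def __init__(self):
        self.hpt = {}          # hash -> PPN (is this value already on flash?)
        self.lpt = {}          # LPN -> PPN
        self.flash_writes = 0  # count of actual page programs
        self.next_ppn = 0

    def write(self, lpn, data):
        h = hashlib.sha1(data).hexdigest()    # hash co-processor step
        ppn = self.hpt.get(h)
        if ppn is None:                       # new value: program a flash page
            ppn = self.next_ppn
            self.next_ppn += 1
            self.flash_writes += 1
            self.hpt[h] = ppn
        self.lpt[lpn] = ppn                   # duplicate: metadata-only update
        return ppn

ssd = CASSDWritePath()
ssd.write(10, b'hello')
ssd.write(11, b'hello')   # duplicate value: no flash write, only a remap
ssd.write(12, b'world')
```

The key property is that a duplicate write completes with a BB-RAM update alone, which is why write-dominant, duplicate-heavy traces benefit most.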

SLIDE 17

Mapping Structures: LPT & HPT

LPT (LPN → PPN): L1 → P1, L2 → P2, L3 → P1, L4 → P4

HPT (Hash → PPN): H1 → P1, H2 → P2, H3 → P3, H4 → P4

SLIDE 18

Mapping Structures: iLPT

iLPT (PPN → LPNs): P1 → {L1, L3}, P2 → {L2}, P3 → INV, P4 → {L4}

LPT (LPN → PPN): L1 → P1, L2 → P2, L3 → P1, L4 → P4

HPT (Hash → PPN): H1 → P1, H2 → P2, H3 → P3, H4 → P4

SLIDE 19

Mapping Structures: iLPT & iHPT

iLPT (PPN → LPNs): P1 → {L1, L3}, P2 → {L2}, P3 → INV, P4 → {L4}

iHPT (PPN → Hash): P1 → H1, P2 → H2, P3 → H3, P4 → H4

LPT (LPN → PPN): L1 → P1, L2 → P2, L3 → P1, L4 → P4

HPT (Hash → PPN): H1 → P1, H2 → P2, H3 → P3, H4 → P4

[The inverse tables support removal: when a page such as P3 loses its last logical reference, iHPT yields its hash so the stale HPT entry can be removed without scanning the forward tables.]
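The four tables and the role of the inverse mappings can be sketched together. The `map_write` method and its cleanup logic are our reading of the figures, not the paper's exact algorithm:

```python
class MappingStructures:
    """Sketch of CA-SSD's four tables; iLPT/iHPT make cleanup O(1) per page."""

    def __init__(self):
        self.lpt = {}    # LPN -> PPN
        self.hpt = {}    # hash -> PPN
        self.ilpt = {}   # PPN -> set of LPNs referencing it
        self.ihpt = {}   # PPN -> hash of its value

    def map_write(self, lpn, h, ppn):
        """Record that LPN now holds the value with hash h, stored at PPN."""
        old = self.lpt.get(lpn)
        if old is not None and old != ppn:
            self.ilpt[old].discard(lpn)          # drop the stale back-pointer
            if not self.ilpt[old]:               # PPN lost its last reference:
                self.hpt.pop(self.ihpt.pop(old)) # remove its HPT entry via iHPT
                del self.ilpt[old]               # page is now invalid (GC-able)
        self.lpt[lpn] = ppn
        self.hpt[h] = ppn
        self.ilpt.setdefault(ppn, set()).add(lpn)
        self.ihpt[ppn] = h

m = MappingStructures()
m.map_write('L1', 'H1', 'P1')
m.map_write('L3', 'H1', 'P1')     # duplicate value: two LPNs share P1
m.map_write('L2', 'H2', 'P2')
m.map_write('L2', 'H3', 'P3')     # L2 overwritten: P2 loses its last reference
```

Without iLPT and iHPT, invalidating P2 would require scanning every LPT and HPT entry to find the ones pointing at it.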

SLIDE 20

Metadata: Traditional SSD

[Traditional SSD: the controller keeps a single mapping table in RAM.]

LPT (LPN → PPN): L1 → P1, L2 → P2, L3 → P3, L4 → P4

SLIDE 21

Metadata: CA-SSD

[CA-SSD: the controller and hash co-processor maintain all four tables (LPT, HPT, iLPT, iHPT) in BB-RAM.]

How do we fit the metadata in CA-SSD's RAM?

Option 1: Larger RAM. Not scalable!

SLIDE 22

Option 2: Shrink Metadata

[Figure: the same four mapping structures (LPT, HPT, iLPT, iHPT) in BB-RAM; the goal is to shrink the metadata rather than enlarge the RAM.]

SLIDE 23

Temporal Value Locality

  • TVL implies that if a certain value is accessed now, it is likely to be accessed again in the near future, though not necessarily from the same address
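Locality of this kind is typically quantified with LRU stack (reuse) distances: how quickly values recur compared with how quickly addresses recur. A minimal sketch; the list-based O(n²) implementation and the function name are ours, for illustration only:

```python
def lru_stack_distances(keys):
    """Reuse (LRU stack) distances for a sequence of accesses.

    Returns one distance per access: the key's position in the LRU queue
    at the moment it is re-accessed (None for a first access).
    """
    stack = []        # most recently used key at the front
    dists = []
    for k in keys:
        if k in stack:
            d = stack.index(k)     # position in the LRU queue
            stack.remove(k)
        else:
            d = None
        stack.insert(0, k)         # refresh recency
        dists.append(d)
    return dists

# Two writes of the same *value* from different addresses reuse at a short
# value distance even when the address sequence shows no reuse at all.
addresses = ['L1', 'L2', 'L3', 'L4']
values    = ['V1', 'V2', 'V1', 'V2']
addr_d  = lru_stack_distances(addresses)   # all first accesses: no reuse
value_d = lru_stack_distances(values)      # values repeat at short distances
```

A distribution concentrated at small value distances is what lets CA-SSD keep only a short LRU list of values and still catch most duplicates.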

SLIDE 24

Temporal Value Locality: Writes

[CDF of write requests vs. position in an LRU queue (x 10^5) for the web trace, comparing value-based and LPN-based recency: 90% of write requests fall within roughly the 1.3K most recently used values, versus the 141K most recently used LPNs.]

  • Higher TVL than traditional temporal locality
  • Shrink metadata using TVL

SLIDE 25

Metadata Management: TVL

[Figure: all four mapping structures (LPT, HPT, iLPT, iHPT) held in BB-RAM, as before.]

SLIDE 26

Metadata Management: TVL

[With TVL-based management the full LPT and iLPT are retained, but HPT and iHPT keep only recently used values in an MRU-to-LRU list: the entries for H1–H3 survive while the least recently used entry (H4) is discarded.]
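The MRU-to-LRU discard policy can be sketched as a fixed-capacity hash table with LRU eviction. `BoundedHPT` is our illustrative name; an evicted entry only costs a missed deduplication opportunity, never correctness, since an unrecognized value is simply treated as new:

```python
from collections import OrderedDict

class BoundedHPT:
    """HPT kept as a fixed-size LRU: only recently seen values stay mapped."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.table = OrderedDict()    # hash -> PPN, most recently used last

    def lookup(self, h):
        ppn = self.table.get(h)
        if ppn is not None:
            self.table.move_to_end(h)  # a hit refreshes recency
        return ppn

    def insert(self, h, ppn):
        self.table[h] = ppn
        self.table.move_to_end(h)
        if len(self.table) > self.capacity:
            self.table.popitem(last=False)   # discard the LRU entry (e.g. H4)

hpt = BoundedHPT(capacity=3)
for h, p in [('H4', 'P4'), ('H1', 'P1'), ('H2', 'P2'), ('H3', 'P3')]:
    hpt.insert(h, p)      # inserting the fourth entry evicts the oldest, H4
```

High TVL means even a small capacity (the 16K–128K configurations evaluated next) catches most duplicate writes.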

How does CA-SSD perform compared to Traditional SSDs?

SLIDE 27

Evaluation: Response Time

[Bar chart: response time (ms) per trace (web, mail, homes), comparing a non-CAS SSD with CA-SSD at 16K, 64K, and 128K metadata entries; annotations read 65%, 84%, and 7 ms. Mail shows lower TVL.]

SLIDE 28

Evaluation : Response Time

28

2 4 6 8 10 12 14 web mail home

Response Time (ms) Traces

NON CAS CAS 128K

Similar to infinite RAM

SLIDE 29

Total Writes : web

29

1 2 3 4 5 6 7 8 Non CAS CAS 16K 64K 128K

Total writes (millions)

Workload Writes GC writes

94% reduction

  • Dedup reduces

valid content

  • 75% reduction in

valid pages copied

SLIDE 30

Total Erases : web

Total Erases: web

[Bar chart: block erases (thousands), non-CAS vs. CA-SSD with 16K/64K/128K entries on the web trace: a 77% reduction.]

  • Fewer total writes
  • Reduced GC invocations

SLIDE 31

Conclusions

  • Workloads exhibit significant value locality
  • Characterization of Value Popularity and Temporal Value Locality
  • CAS and SSDs complement each other
  • Certain implementation challenges need to be addressed:
  • Mapping structures
  • Metadata management

SLIDE 32

Thank You! Questions?