Ouroboros Wear-leveling: A Two-level Hierarchical Wear-leveling - - PowerPoint PPT Presentation



slide-1
SLIDE 1

Ouroboros Wear-leveling: A Two-level Hierarchical Wear-leveling Model for NVRAM

Qingyue Liu Peter Varman ECE Department, Rice University May 18, 2017

slide-2
SLIDE 2

New Challenges for New Technologies

  • Technologies: RRAM, PCM, 3D XPoint
  • Advantages

– High density: scales easily below 10 nm
– Non-volatile
– In-place update
– Low leakage power

  • Major drawback:

– Limited write endurance
– PCM: 10^7 to 10^8 writes per cell
– In practice, lifetime is around 20x shorter without wear-leveling

slide-3
SLIDE 3

Wear-leveling (WL)

  • A technique for prolonging the service life of some kinds of erasable computer storage media
  • Block migration across the memory according to fixed rules

– Move high-usage blocks to low-usage frames

  • Aim: distribute writes evenly across the memory

[Figure: write counts per block (A 800, B 20, C 270, D 6, E 80, F 600, G 100, H 96) laid out across numbered frames, with a write to A and blocks D and A marked.]

slide-4
SLIDE 4

SSD WL vs. NVRAM WL

  • Solid State Disk (SSD)

– Written out-of-place
– Granularity:
  ➢ Read/write: page
  ➢ Erase: block
– Requires garbage collection

  • NVRAM

– In-place writing
– Granularity:
  ➢ Read/write: byte
  ➢ No erase
– No garbage collection

  • NVRAM has more freedom and can do better

– No complex design for garbage collection
– Fine-grained wear-leveling
– Allows both algebraic and fully associative logical-to-physical mappings

slide-5
SLIDE 5

Outline

  • Background
  • Previous Work
  • Our Contributions

– Hierarchical Ouroboros Wear-leveling
– System Design
  ➢ Architecture
  ➢ Parameter selection
– Experiments and Results

  • Conclusion
slide-6
SLIDE 6

Previous Work: NVRAM

  • Wear-leveling using restricted algebraic mappings

– No address mapping table
– Granularity: memory line (cache line)
– Example: Start-Gap Wear-leveling [1]

  • Wear-leveling using fully associative mappings

– Additional address mapping table needed
– Granularity: block
– Examples: Segment Swapping [2], PCM-aware swap [3]

[1] Qureshi et al., "Enhancing lifetime and security of PCM-based main memory with start-gap wear-leveling," MICRO, 2009.
[2] Zhou et al., "A durable and energy efficient main memory using phase change memory technology," ISCA, 2009.
[3] A. P. Ferreira, M. Zhou, S. Bock, B. Childers, R. Melhem, and D. Mossé, "Increasing PCM main memory lifetime," in Proceedings of the Conference on Design, Automation and Test in Europe. European Design and Automation Association, 2010, pp. 914–919.

slide-7
SLIDE 7

Start-Gap Method Analysis

  • Advantages:

– Distributes writes smoothly within the frame
– Small space overhead
– Simple algorithm

  • Disadvantages:

– Region size is limited, since only one line is relocated at a time
– May not use the whole region to distribute the writes

[Figure: Start-Gap example with three groups of lines (A B C D, Q R S T, U V W X), each followed by a GapLine, the Start and Gap registers, and a write to A.]
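The Start-Gap rule referenced on this slide can be sketched as follows. This is a minimal illustration of the scheme as described in [1], not code from the presentation: a region of n logical lines is stored in n + 1 physical lines, one of which is the empty gap line, and the mapping uses only two registers. The class name, the `gap_interval` knob, and the in-memory list are assumptions made for the example.

```python
class StartGap:
    """Start-Gap remapping: n logical lines stored in n + 1 physical lines."""

    def __init__(self, n_lines, gap_interval):
        self.n = n_lines
        self.start = 0                    # advances once per full sweep of the gap
        self.gap = n_lines                # physical index of the empty gap line
        self.gap_interval = gap_interval  # writes between gap moves (hypothetical knob)
        self.writes = 0
        self.mem = [None] * (n_lines + 1)

    def physical(self, logical):
        # Algebraic mapping: no table, only the Start and Gap registers.
        pa = (logical + self.start) % self.n
        return pa + 1 if pa >= self.gap else pa

    def _move_gap(self):
        if self.gap == 0:
            # Wrap around: the last physical line moves into line 0.
            self.mem[0] = self.mem[self.n]
            self.mem[self.n] = None
            self.gap = self.n
            self.start = (self.start + 1) % self.n
        else:
            # Relocate one line into the gap; the gap moves down by one.
            self.mem[self.gap] = self.mem[self.gap - 1]
            self.mem[self.gap - 1] = None
            self.gap -= 1

    def write(self, logical, value):
        self.mem[self.physical(logical)] = value
        self.writes += 1
        if self.writes % self.gap_interval == 0:
            self._move_gap()

    def read(self, logical):
        return self.mem[self.physical(logical)]
```

Because only one line is relocated per gap move, a hot line shifts by a single position per sweep, which is the region-size limitation listed under the disadvantages.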

slide-9
SLIDE 9

Segment Swap vs. PCM-aware Swap

  • Segment Swap:

– Periodically swap the content of the highest-usage frame with the content of the lowest-usage frame

[Figure: blocks A to H laid out in numbered frames, with A and G marked.]

  • Advantages:

– Can involve the whole memory space in wear-leveling
– Easy to implement

  • PCM-aware Swap:

– Periodically swap the content of the highest-usage frame with the content of a random frame

slide-10
SLIDE 10

Analysis of 2 Swap Methods: A* Pattern

  • A* pattern: write to the same logical block A continuously
  • Deterministic swap is better than randomized swap under favorable conditions

[Figure: three panels: Without Wear-leveling, Segment Swap, PCM-aware Swap.]

slide-11
SLIDE 11

Analysis of 2 Swap Methods: AB* Pattern

  • AB* pattern: alternate writes to two logical blocks A and B (a catastrophic pattern for Segment Swap)
  • Randomized swap is better than deterministic swap in bad cases

[Figure: three panels: Without Wear-leveling, Segment Swap, PCM-aware Swap.]

slide-13
SLIDE 13

NVRAM Model

  • Memory is partitioned into frames
  • Each frame holds a block
  • A block holds a set of memory lines
  • A block is assumed to cover a consecutive address range

[Figure: a frame holding a block of memory lines; example blocks A B C D, Q R S T, and U V W X.]

slide-14
SLIDE 14

Hierarchical Ouroboros Wear-leveling

  • Aim:

– Keep the write distribution as smooth as possible

  • Level 1: Local WL within frames

– Start-Gap-like rule
– Smooth distribution of writes within a frame
– Granularity: memory line
– Aim: make the expensive large-block Global WL less frequent

slide-15
SLIDE 15

Hierarchical Ouroboros Wear-leveling

  • Level 2: Global WL across frames

– Exploit demand prediction to direct global wear-leveling
– Use randomization in block migration to avoid worst-case behavior
– Smooth distribution of writes across frames
– Granularity: frame
– Aim: involve the whole memory space in wear-leveling
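One way to picture how the two levels could combine into a single address translation is sketched below. It assumes a table lookup from logical block to frame for the global level and a Start-Gap style algebraic mapping inside each frame for the local level. `FrameState`, `translate`, and the per-frame register layout are hypothetical names for this illustration; the slides do not give the actual hardware organization.

```python
from dataclasses import dataclass


@dataclass
class FrameState:
    start: int  # local Start register of the frame (assumed per-frame state)
    gap: int    # local Gap register: physical slot currently left empty


def translate(logical_line, lines_per_block, block_to_frame, frames):
    """Map a logical line number to (frame index, physical slot in that frame)."""
    block, offset = divmod(logical_line, lines_per_block)
    frame = block_to_frame[block]                 # level 2: fully associative table
    st = frames[frame]
    slot = (offset + st.start) % lines_per_block  # level 1: algebraic mapping
    if slot >= st.gap:
        slot += 1                                 # skip over the gap slot
    return frame, slot


# Example: 3 blocks of 4 lines; each frame has 5 physical slots (4 lines + gap).
frames = [FrameState(0, 4), FrameState(1, 2), FrameState(0, 4)]
block_to_frame = [2, 0, 1]
print(translate(9, 4, block_to_frame, frames))    # line 9 = block 2, offset 1
```

Only the global table grows with the number of frames; the local level needs two small registers per frame, which is why the finer granularity stays cheap.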

slide-16
SLIDE 16

Global Wear-Leveling Framework

  • Inputs

1. Usage counter of each physical frame (U)
2. Prediction of the number of future writes to each logical block (P)

– Repetitive workloads
– Program analysis (embedded applications)
– Use recent activity (demand) as the predictor

Demand-based Ouroboros Migration
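As a small illustration of the third option, using recent activity as the predictor, the sketch below counts the writes each logical block received in the most recent observation window and uses those counts as P. The function name and the choice of a plain count (rather than, say, a weighted average) are assumptions; the slides do not specify the estimator.

```python
from collections import Counter


def predict_demand(recent_writes, n_blocks):
    """Return P, where P[b] is the number of recent writes to logical block b.

    recent_writes is the sequence of logical block ids written during the
    last observation window (a hypothetical input format).
    """
    counts = Counter(recent_writes)
    return [counts.get(b, 0) for b in range(n_blocks)]


print(predict_demand([0, 2, 2, 1, 2], n_blocks=4))
```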

slide-17
SLIDE 17

Global Wear-Leveling Framework

  • 1. Collect statistics:

– Estimate the future demand of each block to form a vector P
– Collect the current usage of each frame to form a vector U

  • 2. Generate the raw block migration mapping

– Aim: map the i-th hottest (highest-demand) block to the i-th coldest (lowest-usage) frame
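Step 2 amounts to two sorts and a pairing. The sketch below is a minimal version with made-up demand and usage numbers; `raw_migration` is a hypothetical name, and ties are broken by index order, which the slides do not specify.

```python
def raw_migration(P, U):
    """Pair the i-th hottest block with the i-th coldest frame.

    P[b] is the predicted demand of logical block b.
    U[f] is the accumulated usage of physical frame f.
    Returns a dict mapping block -> target frame.
    """
    hot_to_cold_blocks = sorted(range(len(P)), key=lambda b: -P[b])
    cold_to_hot_frames = sorted(range(len(U)), key=lambda f: U[f])
    return dict(zip(hot_to_cold_blocks, cold_to_hot_frames))


P = [5, 100, 20, 40]   # illustrative demand per block
U = [30, 10, 50, 20]   # illustrative usage per frame
print(raw_migration(P, U))
```

Applied literally, this mapping relocates every block at each global wear-leveling round, which is expensive for large frames.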

slide-18
SLIDE 18

Raw Block Migration

[Figure: raw block migration example. Logical blocks A to F are sorted hot to cold by demand P, and physical frames are sorted cold to hot by usage U; pairing the two orders gives the final block order. Panels: Initialization, Final Block Order.]

slide-19
SLIDE 19

Global Wear-Leveling Framework

  • 1. Collect statistics:

– Estimate the future demand of each block to form a vector P
– Collect the current usage of each frame to form a vector U

  • 2. Generate the raw block migration mapping

– Aim: map the i-th hottest (highest-demand) block to the i-th coldest (lowest-usage) frame

  • 3. Classification step:

– Identify a hot pool with up to K of the hottest blocks that meet a minimum demand threshold

  • 4. Pruning step:

– Move only the blocks in the hot pool to deterministic frames
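The classification and pruning steps can be sketched as a filter on the hot-to-cold pairing: keep at most K of the hottest blocks, drop any whose predicted demand is below the threshold, and assign only those to the coldest frames. The function and parameter names are hypothetical and the numbers are illustrative.

```python
def pruned_migration(P, U, K, min_demand):
    """Return block -> target frame for the blocks in the hot pool only."""
    blocks = sorted(range(len(P)), key=lambda b: -P[b])   # hot to cold
    frames = sorted(range(len(U)), key=lambda f: U[f])    # cold to hot
    # Classification: up to K hottest blocks that meet the demand threshold.
    hot_pool = [b for b in blocks[:K] if P[b] >= min_demand]
    # Pruning: only hot-pool blocks get a deterministic target frame.
    return {b: f for b, f in zip(hot_pool, frames)}


P = [5, 100, 20, 40]   # illustrative demand per block
U = [10, 30, 50, 20]   # illustrative usage per frame
print(pruned_migration(P, U, K=2, min_demand=30))
```

All other blocks stay where they are unless they are displaced, which keeps the number of block copies per round bounded by the hot pool size.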
slide-20
SLIDE 20

Block Migration with Pruning Method

[Figure: block migration with pruning, |H| = |C| = 2. Only the hot blocks are moved to the coldest frames; panels show the initialization (physical frame usage U, logical block demand) and the final block order.]

slide-21
SLIDE 21

Deterministic Block Migration Ring

[Figure: deterministic block migration ring over blocks C, B, and E and frames 2, 1, and 4, with hot and cold blocks labeled.]

slide-22
SLIDE 22

Ouroboros Block Migration Ring

[Figure: Ouroboros block migration ring. The deterministic migration over blocks C, B, and E (frames 2, 1, and 4) is extended with a random free block (F) drawn from the free frame pool.]

slide-23
SLIDE 23

Ouroboros Block Migration Ring

22

C B E F 5 3

2 1 4 5

Ouroboros Block Migration Ring Free Frame Pool C

Deterministic Block Migration

E B

2 1 4

C B E F Hot Block Cold Block Hot Block Cold Block Random Free Block

slide-24
SLIDE 24

Global Wear-Leveling Framework

  • 1. Collect statistics:

– Estimate the future demand of each block to form a vector P
– Collect the current usage of each frame to form a vector U

  • 2. Generate the raw block migration mapping

– Aim: map the i-th hottest (highest-demand) block to the i-th coldest (lowest-usage) frame

  • 3. Classification step:

– Identify a hot pool with up to K of the hottest blocks that meet a minimum demand threshold

  • 4. Pruning step:

– Move only the blocks in the hot pool to deterministic frames

  • 5. Randomization step:

– Identify a free frame pool with more than K free frames for randomization

  • 6. Form the Ouroboros block migration ring for block relocation
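Steps 5 and 6 can be illustrated with a cyclic relocation along a ring of frames. The slides do not spell out exactly how the ring is closed, so the construction below is one plausible reading and should be treated as an assumption: each hot block moves to its cold frame, the cold block displaced from there moves to a randomly chosen frame from the free pool, and the block displaced from that frame fills the slot vacated by the next hot block. `build_ring` and `rotate` are hypothetical names.

```python
import random


def build_ring(hot_moves, frame_of, free_frames, rng):
    """Build a ring of frames (assumed construction, not from the slides).

    hot_moves: list of (hot_block, target_cold_frame) pairs.
    frame_of: dict mapping block -> frame it currently occupies.
    free_frames: pool of candidate frames for the randomized hop.
    """
    picks = rng.sample(sorted(free_frames), len(hot_moves))
    ring = []
    for (block, cold_frame), rand_frame in zip(hot_moves, picks):
        ring += [frame_of[block], cold_frame, rand_frame]
    if len(set(ring)) != len(ring):
        raise ValueError("this sketch assumes all frames on the ring are distinct")
    return ring


def rotate(ring, content):
    """Move the content of ring[i] into ring[i + 1]; the last frame feeds the first."""
    carried = content[ring[-1]]
    for f in ring:
        content[f], carried = carried, content[f]


content = dict(enumerate("ABCDEFGH"))              # frame -> block
frame_of = {b: f for f, b in content.items()}
ring = build_ring([("C", 0), ("B", 3)], frame_of, {4, 5, 6, 7}, random.Random(1))
rotate(ring, content)
print(ring, content)
```

A single ring needs only one spare buffer to carry a block while the others shift, and the random hop keeps displaced cold blocks from landing in predictable frames.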
slide-25
SLIDE 25

Block Migration with Randomization

[Figure: block migration with randomization, |H| = |C| = 2, |F| = 2. Hot blocks move to the coldest frames, and the displaced blocks are relocated through the free frame pool; panels show the initialization (physical frame usage U, logical block demand) and the final block order.]

slide-27
SLIDE 27

Architecture

  • Each request

– Size: 16 B × 32 = 512 B
– Touches the same partition and offset on all 32 chips

slide-28
SLIDE 28

Parameter Selection

  • Parameters

– Smoothness level: l2
– Global WL threshold: ΓG
– Local WL threshold: ΓL
– Frame size: F
– Time overhead: Ωt
– Space overhead: Ωs

  • Example: parameter selection for a 512 GB memory

– Input: l2: 7 × 10^-6, Ωt: 0.6%, Ωs: 0.5%
– Output: F: 8 KB, ΓG: 1 × 10^8, ΓL: 195
– Worst-case overhead: Ωt: 0.52%, Ωs: 0.2%

[Figure: parameter selection diagram with boxes labeled System Configurations, Local WL Constraint, Global WL Constraint, and System Parameters.]
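For a sense of scale, the arithmetic below derives two quantities from the example numbers: how many frames the global level has to track, and how many 512 B requests fit in one frame. It reads GB and KB as binary units and treats a 512 B request (16 B × 32 chips, from the Architecture slide) as the unit of local relocation; both are assumptions, since the slides do not state them.

```python
GIB = 2**30
KIB = 2**10

memory_bytes = 512 * GIB   # memory size from the example (binary units assumed)
frame_bytes = 8 * KIB      # frame size F from the example output
request_bytes = 16 * 32    # 16 B per chip across 32 chips

n_frames = memory_bytes // frame_bytes
requests_per_frame = frame_bytes // request_bytes

print(n_frames)            # frames tracked by the global mapping
print(requests_per_frame)  # request-sized units per frame
```

Under these assumptions the global level manages 2^26 frames, which gives a feel for why the mapping table and the space overhead Ωs enter the parameter selection.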

slide-30
SLIDE 30

Experiments

  • Smoothness value:

– L∞ smoothness: (formula not preserved in this transcript)
– L2 smoothness: (formula not preserved in this transcript)
– Note: the formulas are defined in terms of the real usage distribution, the ideal usage distribution, and W, the total number of writes

  • Usage distribution
  • Experiments

– Micro benchmarks:
  ➢ A* pattern, AB* pattern, AB*50% pattern
  ➢ Total writes: 10^14
– Storage benchmarks:
  ➢ MSR Cambridge pattern, FIU IODedup pattern
  ➢ Total writes per chip: 2.83 × 10^12
  ➢ Write rate per chip: 500 MB/s × 32
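The smoothness formulas themselves did not survive in this transcript. Purely as an illustration of what such metrics could look like, the sketch below measures the L∞ and L2 distance between the real and the ideal usage distribution, normalized by the total number of writes W. This normalization and the uniform ideal distribution are assumptions and may differ from the authors' definitions.

```python
import math


def smoothness(usage, ideal=None):
    """Return (l_inf, l_2) distances between real and ideal usage, divided by W.

    usage[f] is the number of writes absorbed by frame f.
    ideal defaults to a perfectly even spread of the W writes (an assumption).
    """
    W = sum(usage)
    if ideal is None:
        ideal = [W / len(usage)] * len(usage)
    dev = [abs(u - v) for u, v in zip(usage, ideal)]
    l_inf = max(dev) / W
    l_2 = math.sqrt(sum(d * d for d in dev)) / W
    return l_inf, l_2


print(smoothness([10, 10, 10, 10]))   # perfectly level usage
print(smoothness([40, 0, 0, 0]))      # all writes absorbed by one frame
```

Under this reading, smaller values mean a smoother distribution, with zero reached only when every frame has absorbed exactly its ideal share.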

slide-31
SLIDE 31

Micro Experiments Results

[Figure: six panels showing each of the A*, AB*, and (AB)*50% patterns without wear-leveling and after wear-leveling.]

slide-32
SLIDE 32

Summary of Ouroboros WL

  • Correct prediction: achieves the best possible smoothness
  • Wrong prediction: no worse than the distribution obtained by a random write pattern
  • Partially correct prediction: takes full advantage of the correct part of the prediction to smooth the usage distribution

slide-33
SLIDE 33

Storage Experiments Results

[Figure: four panels showing the MSR Cambridge and FIU IODedup workloads without wear-leveling and after wear-leveling.]

slide-34
SLIDE 34

Comparison Among Three WL Methods

[Figure: comparison of the three wear-leveling methods by l∞ smoothness level and l2 smoothness level.]

slide-36
SLIDE 36

Conclusion

  • Design a hierarchical Ouroboros wear-leveling method

– Memory-line-level Local Wear-leveling
– Frame-level Global Wear-leveling

  • Devise a cyclic block migration method

– Deterministically smooth out wear based on prediction
– Use randomization to break up destructive write patterns

  • Show the Ouroboros wear-leveling system architecture
  • Provide a general way to select parameter settings
  • Show the realizability and feasibility of Ouroboros wear-leveling through experiments