RELIABILITY OF RESISTIVE MEMORIES, Mahdi Nazm Bojnordi


Mar 19, 2024


SLIDE 1

RELIABILITY OF RESISTIVE MEMORIES

CS/ECE 7810: Advanced Computer Architecture

Mahdi Nazm Bojnordi

Assistant Professor School of Computing University of Utah

SLIDE 2

Overview

¨ Upcoming deadlines

¤ April 6th: student paper presentation

¨ This lecture

¤ Hard errors in resistive memories

¤ Increasing reliability by replication, ECP, SAFER, FREE-p

¤ Resistive computing

SLIDE 3

Recall: Resistive vs. Dynamic RAM

¨ Phase-Change RAM

¤ Nonvolatile

¤ Projected to be more scalable

¤ Cells may be written individually

¤ Slower, with more energy-intensive writes

¤ Susceptible to hard errors

¨ DRAM

¤ Volatile, charge based

¤ Difficult to further scale down the capacitor

¤ All accesses go through the row buffer

¤ Faster, with acceptable energy consumption

¤ Vulnerable to soft errors

SLIDE 4

Solutions to Memory Hard Errors

¨ Accept failure of some fraction of pages

¤ Map failed pages out of logical memory

¨ Wear-level data pages/blocks, and within blocks

¤ Shift/rotate data randomly (intervals/locations)

¨ Differential writes

¤ Write only cells with values that change

¨ Correct errors when possible

¤ Error correction techniques
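The differential-write idea above can be sketched in a few lines. This is a minimal illustration, not a controller design: the function name `differential_write` and the 64-bit word width are assumptions made for the example.

```python
def differential_write(old_word: int, new_word: int, width: int = 64):
    """Return (write_mask, cells_written) for a differential write.

    Only cells whose stored value differs from the new value are programmed,
    so unchanged cells see no wear. Names and width are illustrative.
    """
    limit = (1 << width) - 1
    write_mask = (old_word ^ new_word) & limit   # 1 = cell must be programmed
    cells_written = bin(write_mask).count("1")
    return write_mask, cells_written


if __name__ == "__main__":
    mask, count = differential_write(0b1010_1100, 0b1010_0101)
    print(f"mask={mask:#010b}, cells written={count} of 64")
```

A conventional write would program all 64 cells; here only the cells selected by the mask are touched.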

SLIDE 5

Error Correction Techniques

¨ No correction (detection only)

¤ Inefficient

¤ A page must be retired when the first cell fails

¨ SECDED ECC

¤ With a 12.5% memory overhead

[Figure: 8 chips at 8 bits/chip supply 64 data bits; SEC/SECDED adds 7/8 check bits, a 10.9%/12.5% overhead]
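The 7/8 check bits and the 10.9%/12.5% overheads quoted above follow from the Hamming bound for single-error correction: the smallest r with 2^r >= m + r + 1 for m data bits, plus one overall parity bit for double-error detection. A small sketch (function names are illustrative):

```python
def sec_check_bits(data_bits: int) -> int:
    """Smallest r such that 2**r >= data_bits + r + 1 (Hamming SEC)."""
    r = 0
    while (1 << r) < data_bits + r + 1:
        r += 1
    return r


def secded_check_bits(data_bits: int) -> int:
    """SEC plus one overall parity bit for double-error detection."""
    return sec_check_bits(data_bits) + 1


if __name__ == "__main__":
    m = 64
    for name, r in (("SEC", sec_check_bits(m)), ("SECDED", secded_check_bits(m))):
        print(f"{name}: {r} check bits, overhead {100 * r / m:.1f}%")
```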

SLIDE 6

Error Correction Techniques

¨ No correction (detection only)

¤ Inefficient

¤ A page must be retired when the first cell fails

¨ SECDED ECC

¤ With a 12.5% memory overhead

¤ A page must be retired when a block within the page suffers a second error

SLIDE 7

Error Correction Codes

¨ Good for soft errors

¤ Transient errors

¨ Not good for hard errors

¤ ECC has high entropy and can hasten wear-out

¤ Flipping just one data bit changes about half of the ECC bits

[Figure: 16 numbered bit positions, 1 to 16]
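To make the "about half of the ECC bits" claim concrete, the sketch below encodes a 64-bit word with a textbook Hamming SEC code (data bits placed at non-power-of-two codeword positions) and measures how many of the 7 check bits change when a single data bit flips. The specific code layout is an assumption for illustration; real memory ECC may use a different parity-check matrix.

```python
def hamming_check_bits(data: int, data_bits: int = 64):
    """Check bits of a textbook Hamming SEC code for `data`."""
    r = 0
    while (1 << r) < data_bits + r + 1:
        r += 1
    checks = [0] * r
    pos, placed = 1, 0
    while placed < data_bits:
        if pos & (pos - 1):                  # skip power-of-two positions
            if (data >> placed) & 1:
                for j in range(r):           # this bit feeds check j if bit j of pos is set
                    if (pos >> j) & 1:
                        checks[j] ^= 1
            placed += 1
        pos += 1
    return checks


def average_check_bits_flipped(data: int, data_bits: int = 64) -> float:
    """Average number of check bits that change per single data-bit flip."""
    base = hamming_check_bits(data, data_bits)
    total = 0
    for i in range(data_bits):
        flipped = hamming_check_bits(data ^ (1 << i), data_bits)
        total += sum(a != b for a, b in zip(base, flipped))
    return total / data_bits


if __name__ == "__main__":
    avg = average_check_bits_flipped(0x0123456789ABCDEF)
    print(f"average check bits changed per data-bit flip: {avg:.2f} of 7")
```

Because the code is linear, the result does not depend on the data word: each check cell is rewritten far more often than a typical data cell, which is why ECC cells tend to wear out first.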

SLIDE 8

Dynamically Replicated Memory

¨ Goal: handle hard errors by pairing two pages that have faults in different locations; replicate data across the two pages

¨ How: errors are detected with parity bits; replica reads are issued if the initial read is faulty
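A rough sketch of the pairing idea: two faulty pages are compatible when their fault locations do not overlap, so every location is healthy in at least one copy. The greedy pairing policy, the function names, and the modeling of a parity mismatch as a lookup in a known fault set are all assumptions for illustration, not the mechanism from the paper.

```python
def compatible(faults_a: set, faults_b: set) -> bool:
    """Two pages can be paired if no location is faulty in both."""
    return faults_a.isdisjoint(faults_b)


def pair_pages(fault_map: dict) -> list:
    """Greedily pair faulty pages; fault_map maps page id -> set of faulty offsets."""
    remaining = dict(fault_map)
    pairs = []
    while remaining:
        page, faults = remaining.popitem()
        partner = next(
            (p for p, f in remaining.items() if compatible(faults, f)), None
        )
        if partner is not None:
            del remaining[partner]
            pairs.append((page, partner))
    return pairs


def read_byte(offset: int, primary, replica, primary_faults: set):
    """Read from the primary page; fall back to the replica on a detected fault.

    A parity mismatch is modeled here as membership in primary_faults.
    """
    if offset in primary_faults:
        return replica[offset]
    return primary[offset]


if __name__ == "__main__":
    print(pair_pages({"A": {0, 5}, "B": {5, 9}, "C": {1, 2}}))
```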

[ASPLOS’10]

SLIDE 9

Dynamically Replicated Memory

¨ Improves the lifetime of PCM by up to 40x over conventional error-detection techniques

[ASPLOS’10]

SLIDE 10

Error Correction Pointers

¨ Key idea: instead of using ECC to handle a few transient faults in DRAM, use error-correcting pointers to handle hard errors in specific locations

¨ For a 512-bit line with 1 failed bit, maintain a 9-bit field to track the failed location and another bit to store the value in that location

¨ Can store multiple such pointers and can recover from faults in the pointers too
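The pointer mechanism can be modeled with a toy line that verifies each write and allocates a correction entry (pointer plus replacement value) for every newly failed cell. The class `EcpLine`, its fields, and the choice of six entries are assumptions for the example; recovery from faults inside the correction entries themselves is not modeled here.

```python
class EcpLine:
    """Toy model of one memory line protected by error-correcting pointers."""

    def __init__(self, n_bits: int = 512, n_entries: int = 6):
        self.n_entries = n_entries
        self.cells = [0] * n_bits
        self.stuck = {}     # simulation only: position -> stuck-at value
        self.entries = {}   # pointer (failed position) -> replacement cell value

    @property
    def full(self) -> bool:
        return len(self.entries) == self.n_entries

    def write(self, bits):
        # Program the data cells; stuck cells ignore the new value.
        for i, b in enumerate(bits):
            self.cells[i] = self.stuck.get(i, b)
        # Refresh replacement cells for already-known failures.
        for p in self.entries:
            self.entries[p] = bits[p]
        # Verify the write and allocate an entry for each new failure.
        for i, b in enumerate(bits):
            if self.cells[i] != b and i not in self.entries:
                if self.full:
                    raise RuntimeError("no free correction entry; retire the line")
                self.entries[i] = b

    def read(self):
        bits = list(self.cells)
        for p, v in self.entries.items():
            bits[p] = v     # replacement cell overrides the failed cell
        return bits


def ecp_overhead_bits(n_bits: int = 512, n_entries: int = 6) -> int:
    """Each entry holds a pointer plus one replacement bit; add one 'full' bit."""
    pointer_bits = (n_bits - 1).bit_length()
    return n_entries * (pointer_bits + 1) + 1
```

With a 9-bit pointer and one replacement bit per entry, six entries plus the full bit cost 61 bits per 512-bit line under this accounting.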

[ISCA’10]

SLIDE 11

Error Correction Pointers

[Figure: a row of 512 data cells with one correction entry, made up of a correction pointer and a replacement cell, plus a "Full?" bit]

[ISCA’10]

SLIDE 12

Error Correction Pointers

[Figure: data cells with several correction entries and a "Full?" bit]

[ISCA’10]

SLIDE 13

Error Correction Pointers

[Figure: data cells with correction entries in use and the "Full?" bit]

What if correction entry fails?

[ISCA’10]

SLIDE 14

Stuck-At-Fault Error Recovery

¨ Observation: a failed cell with a stuck-at value is still readable

¨ Goal: either write the word or its flipped version so that the failed bit is made to store the stuck-at value

¨ For multi-bit errors, the line can be partitioned such that each partition has a single error

¨ Errors are detected by verifying a write; recently failed bit locations are cached so multiple writes can be avoided
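The write-or-flip idea with partitioning can be sketched as follows. Groups are formed from selected bits of each cell's index, and each group carries one inversion flag. The brute-force search over index bits, the function names, and the 16-bit line are simplifying assumptions for illustration; this is not the partition-selection hardware described in the paper.

```python
from itertools import combinations


def find_partition(stuck_positions, index_bits: int, max_fields: int):
    """Return index-bit positions that put every stuck cell in its own group."""
    for k in range(max_fields + 1):
        for fields in combinations(range(index_bits), k):
            groups = {tuple((p >> f) & 1 for f in fields) for p in stuck_positions}
            if len(groups) == len(stuck_positions):
                return fields
    return None


def safer_write(data, stuck, index_bits: int = 4, max_fields: int = 2):
    """data: list of bits; stuck: dict position -> stuck-at value.

    Returns (stored_bits, fields, invert_flags).
    """
    fields = find_partition(list(stuck), index_bits, max_fields)
    if fields is None:
        raise ValueError("faults cannot be separated with this many fields")

    def group_of(p):
        return tuple((p >> f) & 1 for f in fields)

    # Invert a group exactly when its stuck cell disagrees with the data bit.
    invert = {group_of(p): data[p] != v for p, v in stuck.items()}
    stored = [bit ^ int(invert.get(group_of(p), False)) for p, bit in enumerate(data)]
    # Every stuck cell is now asked to hold its stuck-at value.
    assert all(stored[p] == v for p, v in stuck.items())
    return stored, fields, invert


def safer_read(stored, fields, invert):
    """Undo the per-group inversion to recover the original data."""
    def group_of(p):
        return tuple((p >> f) & 1 for f in fields)

    return [bit ^ int(invert.get(group_of(p), False)) for p, bit in enumerate(stored)]
```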

[MICRO’10]

SLIDE 15

Stuck-At-Fault Error Recovery

[MICRO’10]

¨ Three partition candidates in SAFER

How to detect two fails? (read the paper)

SLIDE 16

Stuck-At-Fault Error Recovery

¨ Fail recovery

[MICRO’10]

SLIDE 17

Multi-tiered ECC for Hard/Soft Errors

¨ FREE-p: fine-grained remapping with ECC and embedded pointer

¤ Re-use a “dead” 64B block for storing a remap pointer

¤ Architectural techniques to accelerate address remapping

¨ Detection/correction at the memory controller

¤ Allow simple NVRAM devices

¤ Tolerate hard/soft errors in the cell array, periphery, etc.

[HPCA’11]

SLIDE 18

FREE-p

¨ Embed a 64-bit pointer within a faulty block

¤ There are still-functional bits in a faulty block

¤ 1-bit D/P flag per 64B block

n Identifies whether a block is remapped or not

¤ Avoid chained remapping

n Always embed the FINAL pointer
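A toy model of the embedded-pointer scheme: each block carries a D/P flag, a dead block stores a pointer instead of data, and when a remapped block later fails, the original block's pointer is rewritten so that a lookup never follows more than one hop. The class `FreePMemory`, its spare-block pool, and the assumption that the data can be recovered before remapping are illustrative only.

```python
class FreePMemory:
    """Toy model of fine-grained remapping with an embedded pointer."""

    def __init__(self, num_blocks: int, spare_blocks):
        self.is_pointer = [False] * num_blocks   # D/P flag: False = data, True = pointer
        self.payload = [None] * num_blocks       # data, or the remap target
        self.spares = list(spare_blocks)         # hypothetical pool of free blocks

    def resolve(self, addr: int) -> int:
        """Follow at most one pointer to find the block that holds the data."""
        return self.payload[addr] if self.is_pointer[addr] else addr

    def read(self, addr: int):
        return self.payload[self.resolve(addr)]

    def write(self, addr: int, data):
        self.payload[self.resolve(addr)] = data

    def retire(self, addr: int) -> int:
        """The block currently backing `addr` wore out: remap to a fresh spare.

        The pointer in the ORIGINAL block is overwritten with the final
        target, so no chain of pointers ever forms.
        """
        old = self.resolve(addr)
        new = self.spares.pop(0)
        self.payload[new] = self.payload[old]    # assumes data was recoverable
        self.is_pointer[addr] = True
        self.payload[addr] = new
        return new
```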

[HPCA’11]

SLIDE 19

Capacity vs. Lifetime

[HPCA’11]

SLIDE 20

Resistive Computation

¨ Leverage STT-MRAM for energy efficiency

¤ Near-zero leakage power

¤ Low-energy read operation

¨ Goal: selectively migrate on-chip storage and combinational logic to STT-MRAM to reduce power

¤ On-chip storage: caches, TLBs, register files, queues

¤ Combinational logic: lookup-table (LUT) based computing
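LUT-based computing replaces a gate network with a stored truth table that is indexed by the input bits. The sketch below builds LUTs for a 1-bit full adder as an illustrative example; the helper names are assumptions, and nothing here models STT-MRAM timing or energy.

```python
from itertools import product


def build_lut(func, n_inputs: int):
    """Tabulate `func` over all input combinations (first input is the MSB)."""
    return [func(*bits) for bits in product((0, 1), repeat=n_inputs)]


def lut_eval(lut, *bits):
    """Evaluate the function by indexing the table with the input bits."""
    index = 0
    for b in bits:
        index = (index << 1) | b
    return lut[index]


# A 1-bit full adder expressed as two 3-input LUTs.
SUM_LUT = build_lut(lambda a, b, c: a ^ b ^ c, 3)
CARRY_LUT = build_lut(lambda a, b, c: (a & b) | (a & c) | (b & c), 3)


if __name__ == "__main__":
    for a, b, c in product((0, 1), repeat=3):
        print(a, b, c, "->", lut_eval(CARRY_LUT, a, b, c), lut_eval(SUM_LUT, a, b, c))
```

Each evaluation is a single table read, which is why low read energy and near-zero leakage matter more than write cost for this use.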

[ISCA’10]

SLIDE 21

Hybrid CMT Pipeline

¨ Small arrays and simple logic in CMOS

¨ Large arrays and complex logic in STT-MRAM

[Figure: hybrid CMT pipeline with clocked stages: Fetch Logic, Inst Buf x 8, Thrd Sel, Decode Logic, Reg File x 8, Func Unit (ALU, FPU), ST Buf x 8, I$, I-TLB, D$, D-TLB, Shared L2$ Banks x 8, and MC0 to MC3 Queue and Logic; legend: STT-MRAM LUTs, STT-MRAM Arrays, Pure CMOS]

[ISCA’10]

SLIDE 22

System Power


[ISCA’10]

SLIDE 23

System Performance

[Figure: system throughput normalized to CMOS for BLAST, BSOM, CG, CHOLESKY, EQUAKE, FFT, KMEANS, LU, MG, OCEAN, RADIX, SWIM, WATER, and GEOMEAN; y-axis ticks 0.2 to 1]

[ISCA’10]