SLIDE 1

Improving the Performance and Endurance of Encrypted Non-volatile Main Memory through Deduplicating Writes

Pengfei Zuo, Yu Hua, Ming Zhao*, Wen Zhou, Yuncheng Guo
Huazhong University of Science and Technology (HUST), China
*Arizona State University (ASU), USA

SLIDE 2

Non-volatile Memory (NVM)

Non-volatile memory is expected to replace or complement DRAM in the memory hierarchy

Non-volatility, low power, high density, large capacity

                      PCM          ReRAM         DRAM
Read (ns)             20-70        20-50         10
Write (ns)            150-220      70-140        10
Non-volatility        √            √             ×
Standby power         ~0           ~0            High
Endurance (writes)    10^7-10^9    10^8-10^12    10^15
Density (Gb/cm²)      13.5         24.5          9.1

[Images: PCM, ReRAM]

  • K. Suzuki and S. Swanson. “A Survey of Trends in Non-Volatile Memory Technologies: 2000-2014”, IMW, 2015.
  • C. Xu et al. “Overcoming the Challenges of Crossbar Resistive Memory Architectures”, HPCA, 2015.
SLIDE 3

Endurance and Security in Non-volatile Memory

NVM typically has limited endurance

– 10^7-10^9 writes for PCM, 10^8-10^12 for ReRAM
– Writes have much higher latency than reads
– Write reduction matters for NVM

NVM is vulnerable to the stolen-DIMM attack

– NVM still retains data after the system powers down
– An attacker can directly read data from a stolen NVM DIMM
– Memory encryption matters for NVM

SLIDE 4

Encryption Increases Bit Flips to NVM

Diffusion property of encryption

– Changing one bit of the original data changes about half of the bits of the encrypted data

Example: overwriting a 512-bit line in NVM

Old data in NVM: 00000000…000000000000 → encryption → 01011010…000010110100
New data:        10000000…000000000000 → encryption → 10101100…000100101001

Plaintext overwrite: 1 of 512 bits modified
Ciphertext overwrite: 256 of 512 bits modified
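The diffusion effect in the example can be reproduced with the Python standard library. This sketch uses SHA-512 as a stand-in for the cipher: it is a hash, not encryption, but it shows the same avalanche behaviour on a 512-bit block. `toy_encrypt` and `bit_flips` are illustrative names; `bit_flips` counts the cells a bit-level write scheme would actually have to program.

```python
import hashlib

LINE_BYTES = 64  # one 512-bit cache line

def bit_flips(old: bytes, new: bytes) -> int:
    """Count differing bit positions, i.e. the NVM cells a bit-level write must program."""
    return sum(bin(a ^ b).count("1") for a, b in zip(old, new))

def toy_encrypt(line: bytes) -> bytes:
    """Stand-in for a block cipher: SHA-512 is a hash, NOT encryption, but it
    shows the same avalanche (diffusion) behaviour on a 512-bit block."""
    return hashlib.sha512(line).digest()

old = bytes(LINE_BYTES)                      # 00000000...0
new = bytes([0x80]) + bytes(LINE_BYTES - 1)  # 10000000...0

print(bit_flips(old, new))                            # 1 (plaintext)
print(bit_flips(toy_encrypt(old), toy_encrypt(new)))  # close to 256 of 512
```

A bit-level write reduction scheme that only programs differing cells therefore saves almost everything in the plaintext case and only about half in the encrypted case.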

SLIDE 5

Encryption Increases Bit Flips to NVM

Young et al. “DEUCE: Write-efficient encryption for non-volatile memories”, in Proc. of ASPLOS, 2015.

[Figure: encryption increases the number of bit flips by about 4X]

Encryption renders existing bit-level write reduction techniques ineffective

SLIDE 6

Observation and Motivation

  • A large number of entire-line duplicates exist in real-world applications

[Figure: entire-line duplicates in SPEC CPU2006 and PARSEC 2.1]

SLIDE 7

DeWrite

  • Lightweight cache-line-level deduplication for NVMM

– Employ lightweight hashing
– Leverage NVM read/write asymmetry
– Eliminate a write at the cost of a read

[Figure: Hardware architecture. Between the last level cache and the encrypted NVMM, the memory controller contains the dedup logic, a metadata cache, and an AES-ctr engine. Non-duplicate data is encrypted with counter-mode encryption (CME), i.e., XORed with a one-time pad (OTP); metadata uses direct encryption and is kept in a separate metadata storage.]

  • Efficient synergization between deduplication and encryption

– Opportunistic parallelism
– Metadata storage co-location

SLIDE 9

Prediction-based Parallelism

The direct way: a write request → detect duplication → duplicate? Yes: cancel the write; No: encrypt the data → write to NVM

  • Inefficient for non-duplicate writes: detection and encryption run serially, adding latency

The parallel way: a write request → detect duplication and encrypt the data in parallel → duplicate? Yes: discard the ciphertext; No: write to NVM

  • Inefficient for duplicate writes: the encryption is unnecessary
SLIDE 11

Prediction-based Parallelism

Prediction selects the way for each write request:

  • Predicted duplicate → the direct way
  • Predicted non-duplicate → the parallel way
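To make the trade-off concrete, here is a toy latency/work model of the three options. The timings are hypothetical placeholders (not measurements from the talk), and the model assumes a duplicate is resolved as soon as detection finishes; `WriteCost`, `direct_way`, `parallel_way`, and `predicted_way` are illustrative names.

```python
from dataclasses import dataclass

@dataclass
class WriteCost:
    latency_ns: float  # time until the write request is resolved
    encryptions: int   # AES-ctr operations spent on this request

def direct_way(is_dup: bool, t_detect: float, t_encrypt: float, t_write: float) -> WriteCost:
    # Serial: detect first; only a non-duplicate is encrypted and written.
    if is_dup:
        return WriteCost(t_detect, 0)                   # cancel the write
    return WriteCost(t_detect + t_encrypt + t_write, 1)

def parallel_way(is_dup: bool, t_detect: float, t_encrypt: float, t_write: float) -> WriteCost:
    # Detection and encryption overlap; a duplicate wastes the encryption.
    if is_dup:
        return WriteCost(t_detect, 1)                   # discard the ciphertext
    return WriteCost(max(t_detect, t_encrypt) + t_write, 1)

def predicted_way(predicted_dup: bool, is_dup: bool, t_detect: float,
                  t_encrypt: float, t_write: float) -> WriteCost:
    # Predicted duplicate -> direct way; predicted non-duplicate -> parallel way.
    path = direct_way if predicted_dup else parallel_way
    return path(is_dup, t_detect, t_encrypt, t_write)

# Hypothetical latencies in ns, for illustration only.
T = dict(t_detect=50.0, t_encrypt=40.0, t_write=150.0)
print(direct_way(False, **T))    # WriteCost(latency_ns=240.0, encryptions=1)
print(parallel_way(False, **T))  # WriteCost(latency_ns=200.0, encryptions=1)
print(parallel_way(True, **T))   # WriteCost(latency_ns=50.0, encryptions=1)
```

A misprediction costs either a wasted encryption (a duplicate sent down the parallel way) or the serial latency (a non-duplicate sent down the direct way), which is why prediction accuracy matters.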

SLIDE 12

Prediction-based Parallelism

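  • How can we know whether a cache line is a duplicate before detection finishes?
  • Observation: the duplication states of most memory writes are the same as those of their previous writes
  • A prediction scheme:

[Figure: writes A, B, C, D go from the CPU to memory; a history window records the duplication states of recent writes (1: duplicate, 0: non-duplicate) and is used to predict the state of the next write]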
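The slides do not spell out how the history window is turned into a prediction. A minimal sketch, assuming a majority vote over the last `window` states (ties broken by the most recent state) and a non-duplicate default when the history is empty; `DuplicationPredictor` and `prediction_accuracy` are hypothetical names. With `window=1` it reduces to "same state as the previous write", which is the observation above.

```python
from collections import deque

class DuplicationPredictor:
    """Hypothetical history-window predictor (1 = duplicate, 0 = non-duplicate).

    Predicts the majority state in the window; on a tie, the most recent state.
    """

    def __init__(self, window: int = 1) -> None:
        self.history = deque(maxlen=window)  # oldest states drop out automatically

    def predict(self) -> int:
        if not self.history:
            return 0  # assumption: no history yet -> predict non-duplicate
        ones = sum(self.history)
        zeros = len(self.history) - ones
        if ones == zeros:
            return self.history[-1]
        return 1 if ones > zeros else 0

    def update(self, actual: int) -> None:
        self.history.append(actual)


def prediction_accuracy(states: list[int], window: int = 1) -> float:
    """Fraction of writes whose duplication state was predicted correctly."""
    predictor = DuplicationPredictor(window)
    hits = 0
    for actual in states:
        hits += predictor.predict() == actual
        predictor.update(actual)
    return hits / len(states)

print(prediction_accuracy([1, 1, 1, 1, 0, 0, 0, 0]))  # 0.75
```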

SLIDE 20

Prediction-based Parallelism

[Figure: the history window predicting the duplication state of the next write]

Prediction accuracy: 92.1%
SLIDE 23

Prediction-based Parallelism

[Figure: the history window (recorded states: 0 0) predicting the next write]

Prediction accuracy: 92.1% and 93.6%
SLIDE 25

Prediction-based Parallelism

  • Why can we achieve such a high prediction accuracy (92.1% and 93.6%)?
  • Rationale: the size of duplicate (non-duplicate) data is usually much larger than a cache line

– E.g., a page (4KB) that is entirely duplicate or entirely non-duplicate: 100% accuracy
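A small experiment shows why coarse-grained duplication makes prediction easy. The sketch assumes the simplest predictor (repeat the previous write's state), a first prediction of non-duplicate, and a hypothetical trace of 8 pages (64 lines each) that alternate between entirely duplicate and entirely non-duplicate; within each page, every line after the first is predicted correctly.

```python
LINES_PER_PAGE = 4096 // 64  # 64 cache lines per 4KB page

def last_state_accuracy(states: list[int]) -> float:
    """Accuracy of predicting each write's state (1 = dup, 0 = non-dup) as the
    previous write's state. Hypothetical predictor; the first guess is 0."""
    prev = 0
    hits = 0
    for s in states:
        hits += s == prev
        prev = s
    return hits / len(states)

# Hypothetical trace: whole pages alternate between duplicate and non-duplicate.
trace: list[int] = []
for page in range(8):
    trace += [1 if page % 2 == 0 else 0] * LINES_PER_PAGE

print(last_state_accuracy(trace))  # 0.984375
```

Even with a state change at every page boundary, only one line per page is mispredicted, giving 63/64 here.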

SLIDE 27

Lightweight Deduplication for NVMM

Traditional deduplication

– Compute a SHA-1/MD5 hash of the line: hash match → duplicate, no match → non-duplicate
– Hash computation latency: >300ns, comparable to the NVM write latency

DeWrite

– Compute a CRC-32 hash of the line (15ns): no match → non-duplicate
– On a match, read the stored line and compare (75ns + 1ns), since CRC-32 values can collide: match → duplicate, no match → non-duplicate
– The latency is 91ns at most (15ns + 75ns + 1ns)
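The detection path can be modelled with `zlib.crc32` from the standard library. This is a simplified, hypothetical sketch: it keeps data in plaintext, writes a non-duplicate line at its own address, uses a hash index keyed by CRC-32 (how DeWrite looks lines up by hash is not shown on the slides), and ignores overwrites of lines that other addresses still reference. `LightweightDedup` is an illustrative name; the latencies are the ones on this slide.

```python
import zlib

CRC_NS, READ_NS, CMP_NS = 15, 75, 1  # latencies from the slide (ns)

class LightweightDedup:
    """Simplified model: CRC-32 lookup, then read-and-compare on a hash match."""

    def __init__(self) -> None:
        self.nvm: dict[int, bytes] = {}    # real address -> stored line
        self.by_crc: dict[int, int] = {}   # CRC-32 -> real address of a stored line
        self.mapping: dict[int, int] = {}  # initAddr -> realAddr

    def write(self, addr: int, line: bytes) -> tuple[bool, int]:
        """Return (deduplicated?, detection latency in ns)."""
        crc = zlib.crc32(line)
        cand = self.by_crc.get(crc)
        if cand is None:
            latency = CRC_NS                     # no hash match: non-duplicate
        else:
            latency = CRC_NS + READ_NS + CMP_NS  # read and compare: 91 ns
            if self.nvm[cand] == line:           # genuine duplicate
                self.mapping[addr] = cand        # remap, no NVM write issued
                return True, latency
        # Non-duplicate (or CRC collision): issue the NVM write.
        self.nvm[addr] = line
        self.by_crc.setdefault(crc, addr)
        self.mapping[addr] = addr
        return False, latency

    def read(self, addr: int) -> bytes:
        return self.nvm[self.mapping[addr]]

d = LightweightDedup()
print(d.write(0, bytes(64)))  # (False, 15)
print(d.write(1, bytes(64)))  # (True, 91)
```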

SLIDE 28

Metadata Colocation

Encryption metadata: per-line counter

[Figure: AES-ctr takes the LineAddr, the Counter, and the Key, and generates a one-time pad (OTP).
Encryption: Ciphertext = Plaintext ⊕ OTP
Decryption: Plaintext = Ciphertext ⊕ OTP]
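Counter-mode encryption as drawn in the figure can be sketched in a few lines. The standard library has no AES, so this hypothetical sketch derives the 64-byte pad with HMAC-SHA-512 over (LineAddr, Counter) instead; it mirrors the structure of CME, not its exact cipher or bit widths. The 8-byte little-endian encoding of the address and counter is also an assumption.

```python
import hashlib
import hmac

LINE_BYTES = 64

def otp(key: bytes, line_addr: int, counter: int) -> bytes:
    """One-time pad for (LineAddr, Counter) under Key.

    Stand-in: HMAC-SHA-512 replaces AES here, so this shows the structure of
    counter-mode encryption, not the real cipher."""
    seed = line_addr.to_bytes(8, "little") + counter.to_bytes(8, "little")
    return hmac.new(key, seed, hashlib.sha512).digest()  # 64 bytes = one line

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def encrypt(key: bytes, line_addr: int, counter: int, plaintext: bytes) -> bytes:
    return xor(plaintext, otp(key, line_addr, counter))

def decrypt(key: bytes, line_addr: int, counter: int, ciphertext: bytes) -> bytes:
    return xor(ciphertext, otp(key, line_addr, counter))  # same pad, XOR again

key = bytes(16)                        # hypothetical all-zero key, illustration only
line = b"A" * LINE_BYTES
ct1 = encrypt(key, 0x1000, 1, line)
ct2 = encrypt(key, 0x1000, 2, line)    # counter bumped on the next write
print(decrypt(key, 0x1000, 1, ct1) == line)  # True
```

Because the counter is incremented on every write, rewriting a line uses a fresh pad, so consecutive ciphertexts of the same line differ in about half of their bits.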

SLIDE 29

Metadata Colocation

Encryption metadata: per-line counter
Deduplication metadata: address mapping, inverted hash

SLIDE 30

Metadata Colocation

[Figure: (a) The inverted hash table maps each initAddr to the hash of its line (h0, h1, h3, …, hn). (b) The address mapping table maps the initAddr of each deduplicated line to its realAddr (a2, a4, a5, …, an). '-' marks an empty entry.]

SLIDE 31

Metadata Colocation

[Figure: the same tables with encryption counters stored in their empty entries: c0, c1, c3 in the address mapping table and c2, c4, c5 in the inverted hash table]

SLIDE 32

Metadata Colocation

[Figure: the same co-located tables with an added Flag field]
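Reading the co-location figures, counters c0, c1, c3 sit in address-mapping entries of lines that have their own hash (non-deduplicated), while c2, c4, c5 sit in inverted-hash entries of deduplicated lines. The sketch below encodes that reading; the meaning of the Flag (marking which slot holds a counter) and the names `Slot` and `ColocatedMetadata` are assumptions, not the paper's definitions.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Slot:
    flag: bool  # assumed Flag bit: True = slot holds an encryption counter
    value: int  # the counter, or the table's own metadata (realAddr / hash)

class ColocatedMetadata:
    """Hypothetical sketch of counter co-location, as read from the figures."""

    def __init__(self, n_lines: int) -> None:
        self.addr_map: list[Optional[Slot]] = [None] * n_lines  # indexed by initAddr
        self.inv_hash: list[Optional[Slot]] = [None] * n_lines  # indexed by initAddr

    def record_non_duplicate(self, init_addr: int, line_hash: int, counter: int) -> None:
        # The line keeps its own data: it has a hash but no remapping,
        # so its otherwise empty address-mapping slot stores the counter.
        self.inv_hash[init_addr] = Slot(False, line_hash)
        self.addr_map[init_addr] = Slot(True, counter)

    def record_duplicate(self, init_addr: int, real_addr: int, counter: int) -> None:
        # The line is remapped to real_addr and has no data of its own,
        # so its otherwise empty inverted-hash slot stores the counter.
        self.addr_map[init_addr] = Slot(False, real_addr)
        self.inv_hash[init_addr] = Slot(True, counter)

    def counter(self, init_addr: int) -> int:
        for slot in (self.addr_map[init_addr], self.inv_hash[init_addr]):
            if slot is not None and slot.flag:
                return slot.value
        raise KeyError(init_addr)

m = ColocatedMetadata(8)
m.record_non_duplicate(0, 0xABC, counter=5)  # like h0 / c0
m.record_duplicate(2, 0, counter=7)          # like a2 / c2
print(m.counter(0), m.counter(2))            # 5 7
```

Either way, no extra storage is spent on counters: each line has exactly one empty slot across the two tables, and the counter fills it.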

SLIDE 33

Evaluation

Benchmarks

– 12 benchmarks from SPEC CPU2006: single-threaded
– 8 benchmarks from PARSEC 2.1: multi-threaded

Simulation: gem5 + NVMain

SLIDE 34

NVM Endurance

DeWrite reduces writes to secure NVM by 54% on average

SLIDE 35

Write Speedup


DeWrite speeds up NVM writes by 4.2X on average

SLIDE 36

Read Speedup


DeWrite speeds up NVM reads by 3.1X on average

SLIDE 37

Instructions per Cycle


DeWrite improves the IPC by 80% on average

SLIDE 38

Energy Consumption


  • DeWrite reduces energy consumption by 40% on average
SLIDE 39

Space Overheads of Metadata Storage & Cache

Metadata storage

– 6.25%

Metadata cache

– (a) 512KB
– (b) 512KB
– (c) 512KB
– (d) 128KB
– Total <2MB
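The metadata cache total on this slide can be checked directly (taking 1 MB = 1024 KB):

```latex
3 \times 512\,\mathrm{KB} + 128\,\mathrm{KB} = 1664\,\mathrm{KB} = 1.625\,\mathrm{MB} < 2\,\mathrm{MB}
```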

SLIDE 40

Conclusion

Memory encryption renders bit-level write reduction techniques ineffective for secure NVMM

This paper proposes DeWrite, a line-level write reduction technique to enhance endurance and performance

– Lightweight cache-line-level deduplication
– Efficient synergization of deduplication and encryption

DeWrite reduces writes by 54%, and speeds up memory writes and reads of secure NVMM by 4.2× and 3.1×, on average

SLIDE 41

Thanks! Q&A