SuperMem: Enabling Application-transparent Secure Persistent Memory




slide-1
SLIDE 1

SuperMem: Enabling Application-transparent Secure Persistent Memory with Low Overheads

Pengfei Zuo (1,2), Yu Hua (1), Yuan Xie (2)

(1) Huazhong University of Science and Technology, China
(2) University of California at Santa Barbara, USA

52nd IEEE/ACM International Symposium on Microarchitecture (MICRO), 2019

slide-2
SLIDE 2

DRAM and Persistent Memory

[Figure: DRAM shown next to persistent memory; the only surviving label is "Low power". Images from the Internet.]

slide-3
SLIDE 3

Two Key Challenges for Persistent Memory

Persistence
– Volatile: core, cache. Non-volatile: persistent memory.
– A crash can leave persistent memory inconsistent
– Clflush, mfence, and logging for crash consistency

Security
– Persistent memory may hold sensitive data (e.g., username, password)
– Memory encryption for data security

Gap between persistence and security: encryption introduces a new inconsistency problem

slide-4
SLIDE 4

Counter Mode Encryption

[Diagram: the AES engine turns a counter into a one-time pad, which is XORed with the data to produce the encrypted data. The CPU cache writes back data; the counter cache writes back counters.]
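The encryption path on this slide can be sketched in a few lines. This is a minimal illustration, not the hardware design: SHA-256 stands in for the AES engine only because the Python standard library has no AES, mixing the line address into the pad is an assumption (common in counter-mode memory encryption but not visible in the transcript), and the names `one_time_pad`, `encrypt_line`, and `decrypt_line` are hypothetical.

```python
import hashlib

LINE_SIZE = 64  # bytes per cache line (assumed)


def one_time_pad(key: bytes, line_addr: int, counter: int) -> bytes:
    """Stand-in for the AES engine: derive a LINE_SIZE-byte pad.

    Assumption: SHA-256 replaces AES, and the line address is mixed in
    alongside the counter so that different lines get different pads.
    """
    pad = b""
    block = 0
    while len(pad) < LINE_SIZE:
        seed = (key
                + line_addr.to_bytes(8, "little")
                + counter.to_bytes(8, "little")
                + bytes([block]))
        pad += hashlib.sha256(seed).digest()
        block += 1
    return pad[:LINE_SIZE]


def encrypt_line(key: bytes, line_addr: int, counter: int, data: bytes) -> bytes:
    """XOR the data with the one-time pad derived from the counter."""
    pad = one_time_pad(key, line_addr, counter)
    return bytes(d ^ p for d, p in zip(data, pad))


# XOR is its own inverse, so decryption reuses the same function.
decrypt_line = encrypt_line

key = b"k" * 16
plain = b"username=alice;password=hunter2".ljust(LINE_SIZE, b"\0")
cipher = encrypt_line(key, 0x1000, 7, plain)
recovered = decrypt_line(key, 0x1000, 7, cipher)
```

Decryption only works with the exact counter used for encryption, which is why the counter has to be persisted together with the data.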

slide-5
SLIDE 5

Counter Mode Encryption

[Diagram: on a write-back, the CPU cache writes back the encrypted data and the counter cache writes back the updated counter.]

slide-6
SLIDE 6

Crash Inconsistency Caused by Encryption

Data and counter cannot reach NVM at the same time

[Diagram: the CPU cache writes back the encrypted data; the counter cache writes back the updated counter.]

slide-7
SLIDE 7

Crash Inconsistency Caused by Encryption

Data and counter cannot reach NVM at the same time

CASE 1: [diagram only: write-back of the encrypted data from the CPU cache and of the updated counter from the counter cache]

slide-8
SLIDE 8

Crash Inconsistency Caused by Encryption

Data and counter cannot reach NVM at the same time

CASE 2: [diagram only: write-back of the encrypted data from the CPU cache and of the updated counter from the counter cache]

slide-9
SLIDE 9

Crash Inconsistency Caused by Encryption

– Data and counter cannot reach NVM at the same time
– Clflush and mfence cannot operate on the counter cache

[Diagram: Clflush forces the write-back of the encrypted data from the CPU cache, but not the write-back of the updated counter from the counter cache.]
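A small simulation can make the inconsistency concrete. The sketch below assumes a SHA-256-based pad as a stand-in for the AES engine and a hypothetical helper `crash_after`; it covers the two one-sided outcomes (only the data persisted, or only the counter persisted), which are presumably what the slides label Case 1 and Case 2, although the transcript does not preserve which is which.

```python
import hashlib


def pad(counter: int, n: int) -> bytes:
    # Stand-in for the AES engine (assumption): any keyed PRF would do here.
    return hashlib.sha256(b"demo-key" + counter.to_bytes(8, "little")).digest()[:n]


def crypt(data: bytes, counter: int) -> bytes:
    """Encrypt or decrypt by XORing with the pad for this counter."""
    return bytes(d ^ p for d, p in zip(data, pad(counter, len(data))))


def crash_after(persist_data: bool, persist_counter: bool) -> bytes:
    """Overwrite 'old value' with 'new value', crash, and return what
    recovery decrypts from NVM."""
    nvm_counter = 1
    nvm_data = crypt(b"old value", nvm_counter)

    new_counter = nvm_counter + 1
    new_cipher = crypt(b"new value", new_counter)

    if persist_data:        # the encrypted data reached NVM before the crash
        nvm_data = new_cipher
    if persist_counter:     # the updated counter reached NVM before the crash
        nvm_counter = new_counter

    return crypt(nvm_data, nvm_counter)


both = crash_after(True, True)           # consistent: new value
neither = crash_after(False, False)      # consistent: old value
data_only = crash_after(True, False)     # new data, stale counter
counter_only = crash_after(False, True)  # old data, new counter
```

In the two mixed outcomes the decrypted bytes match neither the old nor the new value, so ordinary logging cannot repair the line.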

slide-10
SLIDE 10

Existing Solutions (Write-back Counter Cache)

Large Battery Backup [Awad et al., ASPLOS’16] [Zuo et al., MICRO’18]
– [Diagram: Counter Cache, CPU Cache, Battery]
– Drawback: expensive

Software-level Modification [Liu et al., HPCA’18]
– New programming primitives: counter_cache_writeback(), CounterAtomic
– Drawback: portability limitation

Error Correction [Ye et al., MICRO’18]
– Drawback: long recovery time

[Remaining diagram labels: App, Encrypted, Unencrypted, Check]

slide-11
SLIDE 11

SuperMem: Secure and Persistent Memory

Exploit a write-through counter cache
– No large battery backup
– No software-level modifications
– No need to correct counters
– Drawback: double writes

A counter write coalescing scheme
– Reduce the number of write requests

A cross-bank counter storage scheme
– Speed up memory writes

Asynchronous DRAM refresh (ADR): cache lines reaching the write queue can be considered durable.

slide-12
SLIDE 12

SuperMem: Secure and Persistent Memory

– Write-through counter cache (guarantees consistency)
– Counter write coalescing (reduces writes)
– Cross-bank counter storage (speeds up writes)

Application-transparent

slide-13
SLIDE 13

SuperMem: Secure and Persistent Memory

– Write-through counter cache (guarantees consistency)
– Counter write coalescing (reduces writes)
– Cross-bank counter storage (speeds up writes)

slide-14
SLIDE 14

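Write-through Counter Cache

Ensure that data and its counter reach the write queue at the same time
– Write-through counter cache

[Diagram: timeline across CPU, memory controller, and write queue with the steps Flu(A), Read(Ac), Ac++, Enc(A), App(Ac), App(A), Ack(A), Ret(A). The counter Ac and the data A are appended to the write queue separately.]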

slide-15
SLIDE 15

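Write-through Counter Cache

Ensure that data and its counter reach the write queue at the same time
– Write-through counter cache
– Add a register

[Diagram: timeline across CPU, memory controller (with the added register), and write queue with the steps Flu(A), Read(Ac), Ac++, Enc(A), Sto(Ac), Sto(A), App(Ac+A), Ack(A), Ret(A). The counter Ac and the data A are appended to the write queue together.]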
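The flush sequence with the register can be modeled as a toy memory controller. This is a sketch under stated assumptions: Sto is read as "store into the register" and App as "append to the write queue" (an interpretation of the slide's abbreviations), encryption is omitted, and the class `MemoryController`, its `crash_before_append` flag, and `is_consistent` are hypothetical names.

```python
from collections import deque


class MemoryController:
    """Toy model: anything in write_queue is treated as durable (ADR)."""

    def __init__(self):
        self.counters = {}          # counter cache: line address -> counter
        self.register = None        # staging register for (counter, data)
        self.write_queue = deque()  # ADR-protected write queue

    def flush(self, addr, data, crash_before_append=False):
        # Read(Ac), Ac++
        counter = self.counters.get(addr, 0) + 1
        self.counters[addr] = counter
        # Enc(A): encryption omitted; record which counter the data used.
        encrypted = (data, counter)
        # Sto(Ac), Sto(A): stage both in the register.
        self.register = (("ctr", addr, counter), ("data", addr, encrypted))
        if crash_before_append:
            self.register = None    # volatile register contents are lost
            return False
        # App(Ac+A): both entries enter the write queue together.
        self.write_queue.extend(self.register)
        self.register = None
        return True                 # Ack(A)


def is_consistent(write_queue):
    """Every durable data line has a durable counter with the same value."""
    ctrs = {(addr, c) for kind, addr, c in write_queue if kind == "ctr"}
    return all((addr, enc[1]) in ctrs
               for kind, addr, enc in write_queue if kind == "data")


mc = MemoryController()
ok = mc.flush(0x40, b"A")
crashed = mc.flush(0x80, b"B", crash_before_append=True)
```

Because the pair is appended in one step, a crash leaves either both entries or neither in the durable queue, so no special primitive is needed in the application.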

slide-16
SLIDE 16

SuperMem: Secure and Persistent Memory

– Write-through counter cache (guarantees consistency)
– Counter write coalescing (reduces writes)
– Cross-bank counter storage (speeds up writes)

slide-17
SLIDE 17

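Cross-bank Counter Storage

SingleBank: counters are stored in a contiguous area in NVM [ASPLOS’15, ASPLOS’16, HPCA’18]

[Diagram: NVM banks by bank ID, split into a data area and a counter area. Data0, Data1, and Data2 are spread over the data area, while Ctr0, Ctr1, and Ctr2 land in the same place, which becomes the bottleneck.]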

slide-18
SLIDE 18

Cross-bank Counter Storage

SameBank: stores each counter in the same bank as its data

[Diagram: each bank holds both a data area and a counter area, so Ctr0 and Data0, Ctr1 and Data1, and Ctr2 and Data2 share a bank.]

Result: 2X write latency

slide-19
SLIDE 19

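Cross-bank Counter Storage

XBank: stores each data line and its counter in different banks to leverage bank parallelism

[Diagram: the data area lists bank IDs 1 2 3 4 5 6 7 and the counter area lists bank IDs 4 5 6 7 1 2 3, so Data0, Data1, and Data2 and their counters Ctr0, Ctr1, and Ctr2 fall in different banks.]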
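The three placements can be compared with a crude bank-load model. Everything numeric here is an assumption: 8 banks (the diagram's bank IDs look like 0 to 7 with the 0 lost in extraction), a half-rotation offset for XBank read off the diagram's counter-area labels, SingleBank modeled as "all counters in the last bank", and a cost model where each bank services one write per time slot. The function names are hypothetical.

```python
NUM_BANKS = 8  # assumption: inferred from the bank IDs in the diagram


def single_bank(data_bank: int) -> int:
    """SingleBank: all counters share one area, modeled as the last bank."""
    return NUM_BANKS - 1


def same_bank(data_bank: int) -> int:
    """SameBank: the counter lives in the same bank as its data."""
    return data_bank


def xbank(data_bank: int) -> int:
    """XBank: the counter lives half a rotation away from its data (assumed)."""
    return (data_bank + NUM_BANKS // 2) % NUM_BANKS


def write_slots(placement, data_banks) -> int:
    """Slots needed if every bank can service one write per slot."""
    load = [0] * NUM_BANKS
    for bank in data_banks:
        load[bank] += 1             # the data write
        load[placement(bank)] += 1  # its counter write
    return max(load)


banks = [0, 1, 2]  # Data0, Data1, Data2 in three different banks
slots = {p.__name__: write_slots(p, banks)
         for p in (single_bank, same_bank, xbank)}
# slots == {"single_bank": 3, "same_bank": 2, "xbank": 1}
```

Under this model SingleBank serializes the three counter writes in one bank, SameBank serializes each data write with its counter write (the 2X write latency on the slide), and XBank lets all six writes proceed in parallel.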

slide-20
SLIDE 20

SuperMem: Secure and Persistent Memory

– Write-through counter cache (guarantees consistency)
– Counter write coalescing (reduces writes)
– Cross-bank counter storage (speeds up writes)

slide-21
SLIDE 21

Locality-aware Counter Write Coalescing

Spatial locality of counter storage
– All counters of a page are stored in a counter line

A page (64 lines): Line1 Line2 Line3 Line4 … Line64

A counter line (64B): M m1 m2 m3 m4 … m64
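The page-to-counter-line mapping can be worked through numerically. The sketch assumes 64-byte data lines (so a 64-line page is 4 KB) and a split-counter layout with a 64-bit M and 7-bit m1 to m64; those bit widths are an assumption that happens to fill exactly 64 bytes, and the slides do not state them. `counter_location` is a hypothetical helper.

```python
LINE_BYTES = 64                           # assumed data line size
LINES_PER_PAGE = 64                       # from the slide: a page has 64 lines
PAGE_BYTES = LINE_BYTES * LINES_PER_PAGE  # 4096

MAJOR_BITS = 64  # assumed width of M
MINOR_BITS = 7   # assumed width of each of m1..m64


def counter_location(addr: int):
    """Map a byte address to (counter line index, minor counter index).

    The counter line index equals the page number; the minor index is the
    position of the data line inside its page (0 for m1, 63 for m64).
    """
    page = addr // PAGE_BYTES
    minor_index = (addr % PAGE_BYTES) // LINE_BYTES
    return page, minor_index


# 64 + 64 * 7 = 512 bits, i.e. exactly one 64-byte counter line.
counter_line_bits = MAJOR_BITS + LINES_PER_PAGE * MINOR_BITS
counter_line_bytes = counter_line_bits // 8

first = counter_location(0x0000)     # (0, 0)
last = counter_location(0x0FFF)      # (0, 63)
next_page = counter_location(0x1040) # (1, 1)
```

Any two lines of the same page map to the same counter line, which is the locality the coalescing scheme relies on.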

slide-22
SLIDE 22

Locality-aware Counter Write Coalescing

Spatial locality of counter storage
– All counters of a page are stored in a counter line

Spatial locality of log and data writes
– A log entry or the transaction data covers neighboring lines of a page

A page (64 lines): Line1 Line2 Line3 Line4 … Line64

slide-23
SLIDE 23

Locality-aware Counter Write Coalescing

An example of writing 4 lines within a page

A page (64 lines): Line1 Line2 Line3 Line4 … Line64

slide-24
SLIDE 24

Locality-aware Counter Write Coalescing

An example of writing 4 lines within a page

Cache: A B C D

Write Queue: A Ac B Bc C Cc D Dc

slide-25
SLIDE 25

Locality-aware Counter Write Coalescing

An example of writing 4 lines within a page

Write Queue: A Ac B Bc C Cc D Dc

Ac: M m1' m2  m3  m4  … m64
Bc: M m1' m2' m3  m4  … m64
Cc: M m1' m2' m3' m4  … m64
Dc: M m1' m2' m3' m4' … m64

slide-26
SLIDE 26

Locality-aware Counter Write Coalescing

Write Queue: A Ac B Bc C Cc D Dc

Ac: M m1' m2  m3  m4  … m64
Bc: M m1' m2' m3  m4  … m64
Cc: M m1' m2' m3' m4  … m64
Dc: M m1' m2' m3' m4' … m64

Coalescing counter writes in the write queue

slide-27
SLIDE 27

Locality-aware Counter Write Coalescing (CWC)

Coalescing counter writes in the write queue

Write Queue without CWC: A Ac B Bc C Cc D Dc

Write Queue with CWC: A B C D Dc
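The four-line example can be replayed with a simple queue model. This is a sketch, assuming that a newer counter line for a page simply replaces an older pending one in the write queue (it already contains the earlier minor-counter updates) and that nothing drains from the queue in the meantime; `enqueue` and `write_page_lines` are hypothetical names.

```python
def enqueue(queue, entry, coalesce):
    """Append an entry; with coalescing, a newer counter line for the same
    page replaces any older counter line still pending in the queue."""
    kind, addr, _payload = entry
    if coalesce and kind == "ctr":
        queue[:] = [e for e in queue
                    if not (e[0] == "ctr" and e[1] == addr)]
    queue.append(entry)


def write_page_lines(lines, coalesce):
    """Write the given (name, line index) pairs of page 0 and return the queue."""
    queue = []
    minors = [0] * 64  # m1..m64 of the page's counter line
    page = 0
    for name, idx in lines:
        minors[idx] += 1  # write-through: bump this line's minor counter
        enqueue(queue, ("data", (page, idx), name), coalesce)
        enqueue(queue, ("ctr", page, tuple(minors)), coalesce)
    return queue


lines = [("A", 0), ("B", 1), ("C", 2), ("D", 3)]
without_cwc = write_page_lines(lines, coalesce=False)  # A Ac B Bc C Cc D Dc
with_cwc = write_page_lines(lines, coalesce=True)      # A B C D Dc
```

The surviving counter line carries m1' through m4', so dropping Ac, Bc, and Cc loses no information while cutting the queue from 8 entries to 5.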

slide-28
SLIDE 28

Performance Evaluation

Model NVM using gem5 and NVMain

Comparisons
– Unsec: an unencrypted NVM
– WB: an ideal write-back scheme
– WT: a write-through scheme
– WT+CWC: a write-through scheme with CWC
– WT+XBank: a write-through scheme with XBank
– SuperMem

Benchmarks
– Array: randomly swapping entries
– Queue: randomly enqueueing and dequeueing
– B-tree: inserting random KVs
– Hash Table: inserting random KVs
– RB-tree: inserting random KVs

slide-29
SLIDE 29

Transaction Execution Latency – Single-core

[Figure: normalized execution latency (axis 0.0 to 2.0) of Unsec, WB, WT, WT+CWC, WT+XBank, and SuperMem on Array, Queue, B-tree, Hash Table, and RB-tree. Two panels: transaction size 256B and transaction size 4KB. Callout: WT+CWC.]

  • SuperMem achieves performance comparable to a secure NVM with an ideal write-back cache (WB)

slide-30
SLIDE 30

Transaction Execution Latency – Multi-core

[Figure: normalized execution latency (axis 0.0 to 2.0) of Unsec, WB, WT, WT+CWC, WT+XBank, and SuperMem on Array, Queue, B-tree, Hash Table, and RB-tree. Two panels: 2 programs and 8 programs. Callout: WT+XBank.]

  • SuperMem achieves performance comparable to a secure NVM with an ideal write-back cache (WB)

slide-31
SLIDE 31

The Number of Write Requests

[Figure: normalized number of writes (axis 0.0 to 2.0) of Unsec, WB, WT, and SuperMem on Array, Queue, B-tree, Hash Table, and RB-tree. Three panels: transaction size 256B, 1KB, and 4KB.]

  • SuperMem reduces write requests by up to 50% by using the CWC scheme

slide-32
SLIDE 32

Conclusion

Problem
– Memory encryption incurs a crash inconsistency issue

Existing Work
– Use a write-back counter cache
– Rely on a large battery backup, software-level modification, or error correction

Our Solution
– SuperMem: exploit a write-through counter cache
– No large battery backup, software-level modification, or error correction
– Counter write coalescing for reducing writes
– Cross-bank counter storage for speeding up writes

slide-33
SLIDE 33

Thanks! Q&A