ASSURE: Authentication Scheme for SecURE Energy Efficient Non-Volatile Memories
Joydeep Rakshit, Kartik Mohanram
Electrical and Computer Engineering, University of Pittsburgh
Non-Volatile Memories Workshop, March 12, 2018, San Diego, CA
Main memory requirements and DRAM drawbacks
Capacity: DRAM density hard to scale [1]
Energy: High DRAM refresh power due to leakage [2-8]
PCM and RRAM: Emerging NVMs [2-8]
Better scalability
High data density (MLC – 2 bits/cell, TLC – 3 bits/cell)
Data persistence – no refresh power
Low endurance
High write energy/latency
[1] International Technology Roadmap for Semiconductors, 2011
[2] M. K. Qureshi et al., “Scalable high performance main memory system using phase-change memory technology,” ISCA, 2009
[3] B. C. Lee et al., “Phase change technology and the future of main memory,” IEEE Micro, 2010
[4] A. Ferreira et al., “Increasing PCM main memory lifetime,” DATE, 2010
[5] S. Sheu et al., “Fast-write resistive RAM (RRAM) for embedded applications,” IEEE Design and Test of Computers, 2011
[6] S. Bock et al., “Analyzing the impact of useless write-backs on the endurance and energy consumption of PCM main memory,” ISPASS, 2011
[7] L. Jiang et al., “Improving write operations in MLC phase change memory,” HPCA, 2012
[8] C. Xu et al., “Understanding the trade-offs in multi-level cell ReRAM memory design,” DAC, 2013
PCM and RRAM: Emerging NVMs
Architecture-based solutions [1-9]
[1] B. Yang et al., “A low power phase change random access memory using a data-comparison write scheme,” ISCAS, 2007
[2] S. Cho et al., “Flip-N-Write: A simple deterministic technique to improve PRAM write performance, energy and endurance,” MICRO, 2009
[3] P. Palangappa et al., “Compex: Compression-expansion coding for energy, latency, and lifetime improvements in MLC/TLC NVM,” HPCA, 2016
[4] M. Qureshi et al., “Enhancing lifetime and security of PCM-based main memory with Start-Gap wear leveling,” MICRO, 2009
[5] S. Schechter et al., “Use ECP, not ECC, for hard failures in resistive memories,” ISCA, 2010
[6] R. Wang et al., “SD-PCM: Constructing reliable super dense Phase Change Memory under write disturbance,” ASPLOS, 2015
[7] L. Jiang et al., “Improving write operations in MLC phase change memory,” HPCA, 2012
[8] X. Zhang et al., “TriState-SET: Proactive SET for improved performance of MLC phase change memories,” ICCD, 2015
[9] J. Li et al., “Write-once-memory-code phase change memory,” DATE, 2014
PCM and RRAM: Emerging NVMs
Security vulnerabilities [1-5]
[1] J. Kong and H. Zhou, “Improving privacy and lifetime of PCM-based main memory,” DSN, 2010
[2] S. Chhabra and Y. Solihin, “i-NVMM: A secure non-volatile main memory system with incremental encryption,” ISCA, 2011
[3] V. Young et al., “DEUCE: Write-efficient encryption for non-volatile memories,” ASPLOS, 2015
[4] A. Awad et al., “Silent Shredder: Zero-cost shredding for secure non-volatile main memory controllers,” ASPLOS, 2016
[5] S. Swami et al., “SECRET: Smartly EnCRypted energy EfficienT non-volatile memories,” DAC, 2016
Cornerstones of secure platform [1]
Confidentiality Integrity Availability
Credit: http://www.cybersafesolutions.com/wp-content/uploads/2016/08/CSS_ThreatPolicies_CIAgraphic.jpg
[1] R. B. Lee, “Security basics for computer architects,” Synthesis Lectures on Computer Architecture, 2013
Cornerstones of secure platform
Confidentiality
Encryption: impacts energy and lifetime
Solution: Efficient NVM encryption
BLE, i-NVMM, DEUCE, Silent Shredder, SECRET [1-5]
Integrity
Availability
[1] J. Kong and H. Zhou, “Improving privacy and lifetime of PCM-based main memory,” DSN, 2010
[2] S. Chhabra and Y. Solihin, “i-NVMM: A secure non-volatile main memory system with incremental encryption,” ISCA, 2011
[3] V. Young et al., “DEUCE: Write-efficient encryption for non-volatile memories,” ASPLOS, 2015
[4] A. Awad et al., “Silent Shredder: Zero-cost shredding for secure non-volatile main memory controllers,” ASPLOS, 2016
[5] S. Swami et al., “SECRET: Smartly EnCRypted energy EfficienT non-volatile memories,” DAC, 2016
Cornerstones of secure platform
Confidentiality
Integrity
Authentication: impacts energy, lifetime, and memory accesses
Solution: ASSURE [1]
Availability
[1] J. Rakshit and K.Mohanram, “ASSURE: Authentication Scheme for SecURE Energy Efficient Non-Volatile Memories”, DAC, 2017
Cornerstones of secure platform
Confidentiality Integrity
Availability
Exploiting low endurance [1-3]
[1] M. Qureshi et al., “Enhancing lifetime and security of PCM-based main memory with Start-Gap wear leveling,” MICRO, 2009
[2] N. H. Seong et al., “Security Refresh: Prevent malicious wear-out and increase durability for phase-change memory with dynamically randomized address mapping,” ISCA, 2010
[3] F. Huang et al., “Security RBSG: Protecting phase change memory with security-level adjustable dynamic mapping,” IPDPS, 2016
Cornerstones of secure platform
Confidentiality
Integrity
Availability
Threat model
Trusted Computing Base (TCB) [1-4]
Secure: processor chip (processor core, registers, caches, etc.) and critical parts of the OS
Insecure: off-chip resources (memory, buses, etc.)
[1] R. B. Lee, “Security basics for computer architects,” Synthesis Lectures on Computer Architecture, 2013
[2] G. E. Suh et al., “Efficient memory integrity verification and encryption for secure processors,” MICRO, 2003
[3] B. Rogers et al., “Using address independent seed encryption and Bonsai Merkle Trees to make secure processors OS- and performance-friendly,” MICRO, 2007
[4] A. D. Hilton et al., “PoisonIvy: Safe speculation for secure memory,” MICRO, 2016
Memory data integrity: Attacks and defenses
Spoofing
Attacker changes data at a particular memory location
Example: memory contents A B C D become A X C D
Memory data integrity: Attacks and defenses
Splicing
Attacker swaps data between 2 memory locations
Example: memory contents A B C D become A D C B
Memory data integrity: Attacks and defenses
Replay
Attacker replays data; replaces new data with older versions
[Figure: memory contents along a time axis (t1, t2), shown as the rows A B C D and W B Y Z, illustrating an older version being replayed]
Memory data integrity: Authentication
Spoofing Splicing Replay
[Figure: Merkle tree of data hashes (DH) built over the data; the root, DHROOT, is stored on the secure processor]
Merkle Tree (MT): Data structure constructed by recursive hashing; culminates in a root stored on the secure processor.
The root cannot be spoofed, spliced, or replayed, hence data is tamper-proof.
DH = HMAC_K(D), where K is the secret key
HMAC: Hash-based Message Authentication Code
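The recursive hashing behind the Merkle tree can be illustrated with a minimal sketch. It assumes HMAC-SHA256 as the MAC, a binary tree, and a hard-coded demo key; none of these choices come from the slides, and `merkle_root` is a hypothetical helper name.

```python
import hashlib
import hmac

KEY = b"demo-secret-key"  # assumption: stands in for the on-chip secret key K


def mac(data: bytes) -> bytes:
    """DH = HMAC_K(D); HMAC-SHA256 is an illustrative choice only."""
    return hmac.new(KEY, data, hashlib.sha256).digest()


def merkle_root(blocks: list[bytes]) -> bytes:
    """Hash every block, then hash pairs of children until one root remains."""
    level = [mac(b) for b in blocks]
    while len(level) > 1:
        if len(level) % 2:  # duplicate the last node on odd-sized levels
            level.append(level[-1])
        level = [mac(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]


memory = [b"A", b"B", b"C", b"D"]
trusted_root = merkle_root(memory)  # kept on the secure processor

spoofed = [b"A", b"X", b"C", b"D"]  # attacker overwrites one block
spliced = [b"A", b"D", b"C", b"B"]  # attacker swaps two blocks

print(merkle_root(spoofed) == trusted_root)  # tampering changes the root
print(merkle_root(spliced) == trusted_root)
```

Because the trusted root never leaves the processor, any recomputed root that differs from it signals tampering. Replay is caught the same way, since an older block no longer hashes to the current root.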
Counter mode encryption [1, 2, 3, 4]
Implemented in secure processor
Low decryption latency during read
Block cipher (Key, Line Address, Counter) generates a One-time Pad (OTP)
Encryption on write: Ciphertext = Plaintext XOR OTP
Decryption on read: Plaintext = Ciphertext XOR OTP
[1] J. Yang et al., “Improving memory encryption performance in secure processors,” IEEE Trans. Computers, 2005
[2] W. Enck et al., “Defending against attacks on main memory persistence,” ACSAC, 2008
[3] J. Kong and H. Zhou, “Improving privacy and lifetime of PCM-based main memory,” DSN, 2010
[4] V. Young et al., “DEUCE: Write-efficient encryption for non-volatile memories,” ASPLOS, 2015
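A small sketch of counter mode encryption follows. As a loudly labeled assumption, SHA-256 stands in for the block cipher that generates the pad (a real design would use a cipher such as AES), and all function names are hypothetical.

```python
import hashlib


def one_time_pad(key: bytes, line_addr: int, counter: int, length: int) -> bytes:
    """Derive a pad from (key, line address, counter).

    Assumption: SHA-256 replaces the block cipher purely for illustration.
    """
    seed = key + line_addr.to_bytes(8, "little") + counter.to_bytes(8, "little")
    pad = b""
    block = 0
    while len(pad) < length:
        pad += hashlib.sha256(seed + block.to_bytes(4, "little")).digest()
        block += 1
    return pad[:length]


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def encrypt_on_write(key: bytes, line_addr: int, counter: int, plaintext: bytes) -> bytes:
    return xor(plaintext, one_time_pad(key, line_addr, counter, len(plaintext)))


# XOR with the same pad inverts the encryption, so decryption reuses the routine.
decrypt_on_read = encrypt_on_write

key = b"k" * 16
plaintext = b"cache line data!"
ct1 = encrypt_on_write(key, 0x40, 1, plaintext)
ct2 = encrypt_on_write(key, 0x40, 2, plaintext)  # counter bumped on the next write
```

The pad depends only on the key, address, and counter, so it can be generated while the ciphertext is still being fetched. That is one common explanation for the low read latency noted above.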
Memory data integrity: State-of-the-art
Spoofing Splicing Replay
Bonsai Merkle Tree (BMT) [1]
[Figure: each encrypted data block carries a data HMAC (DH); counter HMACs (CH) over the encryption counters form the tree, whose root, CHROOT, is stored on the secure processor]
DH = HMAC (Encrypted data || Address || Ctr)
CH = HMAC (Ctr)
||: Concatenation
[1] B. Rogers et al., “Using address independent seed encryption and Bonsai Merkle Trees to make secure processors OS- and performance-friendly,” MICRO, 2007
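The data HMAC of the BMT can be sketched as below, assuming HMAC-SHA256, 8-byte address and counter encodings, and a demo key. These are illustrative assumptions rather than details from the slides, and the counters are taken to be already authenticated by the tree.

```python
import hashlib
import hmac

KEY = b"demo-secret-key"  # assumption: on-chip secret key


def data_hmac(enc_data: bytes, addr: int, ctr: int) -> bytes:
    """DH = HMAC(Encrypted data || Address || Ctr)."""
    msg = enc_data + addr.to_bytes(8, "little") + ctr.to_bytes(8, "little")
    return hmac.new(KEY, msg, hashlib.sha256).digest()


def verify(enc_data: bytes, addr: int, ctr: int, stored_dh: bytes) -> bool:
    """Recompute DH with the trusted address and counter, then compare."""
    return hmac.compare_digest(data_hmac(enc_data, addr, ctr), stored_dh)


# (ciphertext, DH) pairs as they would sit in untrusted memory.
old = (b"old-ciphertext", data_hmac(b"old-ciphertext", 0x10, 1))
new = (b"new-ciphertext", data_hmac(b"new-ciphertext", 0x10, 2))
other = (b"other-line", data_hmac(b"other-line", 0x20, 1))

print(verify(new[0], 0x10, 2, new[1]))      # genuine line at address 0x10
print(verify(old[0], 0x10, 2, old[1]))      # replay of an older version
print(verify(other[0], 0x10, 1, other[1]))  # line spliced in from 0x20
```

Binding the address into the HMAC catches splicing, and binding the counter catches replay as long as the counter itself is trusted.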
Memory authentication overheads
HMAC: high entropy, high cell writes
Merkle Tree: HMAC node fetch/update, additional memory accesses
[Chart, normalized to the encrypted-only system]
                          IPC    NVM Energy   Cell Writes
Encrypted                 1      1            1
Encrypted+Authenticated   0.65   5.3          5.8
Higher aggregate read/write energy
Lower lifetime
Lower system performance
Objective: Design tamper-proof NVMs with
Lower authentication-related cell updates for improved lifetime
Lower authentication-related NVM energy
Lower effective authentication latency for better system performance
ASSURE: Authentication Scheme for SecURE Energy Efficient Non-Volatile Memories
ASSURE integrates
Smart message authentication codes (SMACs)
Reduce cell writes pertaining to HMAC updates
Multi-root Merkle Trees (MMTs)
Reduce authentication-related memory accesses
Reduce effective authentication latency and energy
Observation
On a write-back, only a few words are modified within a cache line [1,2]
State-of-the-art NVM encryption re-encrypts only the modified words
The nominal HMAC is computed over the entire memory line (modified + unmodified words)
SMAC
Update only the sections of the HMAC corresponding to modified words
[1] V. Young et al., “DEUCE: Write-efficient encryption for non-volatile memories”, ASPLOS, 2015 [2] S. Swami et al., “SECRET: Smartly EnCRypted Energy EfficienT Non-Volatile Memories”, DAC, 2016
SMAC
Update sections of HMAC corresponding to modified words
[Figure: SMAC computation over time for two successive writes to a 4-word encrypted cache line (Word 1 to Word 4, plus Modbits)]
Write 1
Original Encrypted Cache Line: 0187 5678 9ABC DEF4, Modbits 1000
Intermediate Message 1 (IM1): 0187 0000 0000 0000, Modbits 1000
Intermediate Message 2 (IM2): 0000 5678 9ABC DEF4, Modbits 0000
Intermediate HMACs (IHs): IH1 = 689A, IH2 = 7214
Final HMAC (FH): 6214
Write 2
Encrypted Cache Line: 640F 5678 9ABC DEF4, Modbits 1000
Intermediate Message 1 (IM1): 640F 0000 0000 0000, Modbits 1000
Intermediate Message 2 (IM2): 0000 5678 9ABC DEF4, Modbits 0000
Intermediate HMACs (IHs): IH1 = A513, IH2 = 7214
Final HMAC (FH): A214 (the sections 2 1 4 are the unchanged words of FH)
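The SMAC example can be approximated in a few lines. This is one reading of the figure, not the authors' implementation: it assumes HMAC-SHA256, one equal-sized HMAC section per word, and that each section of the final HMAC is taken from IH1 for modified words and from IH2 otherwise. The function name `smac` and the byte encodings are hypothetical.

```python
import hashlib
import hmac

KEY = b"demo-secret-key"  # assumption: on-chip secret key


def mac(msg: bytes) -> bytes:
    return hmac.new(KEY, msg, hashlib.sha256).digest()


def smac(words: list[bytes], modbits: list[int]) -> bytes:
    """Build the final HMAC (FH) section by section from two intermediate HMACs."""
    zero = bytes(len(words[0]))
    # IM1: modified words (zeros elsewhere) followed by the modbits.
    im1 = b"".join(w if m else zero for w, m in zip(words, modbits)) + bytes(modbits)
    # IM2: unmodified words (zeros elsewhere) followed by all-zero modbits.
    im2 = b"".join(zero if m else w for w, m in zip(words, modbits)) + bytes(len(modbits))
    ih1, ih2 = mac(im1), mac(im2)
    sec = len(ih1) // len(words)  # HMAC bytes assigned to each word
    return b"".join(
        (ih1 if m else ih2)[i * sec:(i + 1) * sec] for i, m in enumerate(modbits)
    )


modbits = [1, 0, 0, 0]  # only Word 1 is modified, as in the figure
line1 = [bytes.fromhex(w) for w in ("0187", "5678", "9abc", "def4")]
line2 = [bytes.fromhex(w) for w in ("640f", "5678", "9abc", "def4")]
fh1, fh2 = smac(line1, modbits), smac(line2, modbits)

sec = len(fh1) // 4
changed = [i for i in range(4) if fh1[i * sec:(i + 1) * sec] != fh2[i * sec:(i + 1) * sec]]
print(changed)  # indices of FH sections that must be rewritten in NVM
```

Since IM2 is identical across the two writes, the FH sections for the unmodified words stay the same and need no cell updates, which is the saving the slide points to.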
Observation
The effective authentication latency is dominated by the MT accesses
A smaller MT leads to fewer memory accesses
Multi-root Merkle Tree (MMT)
Replace single MT with multiple smaller MTs covering the memory
Static multi-root Merkle Tree (SMMT)
Partition memory into memory block groups (MBGs)
Statically assign an MT to each MBG, and store the roots on the secure processor
[Figure: a single MT with secure root R over nodes M20-M21, M10-M13, and leaves L0-L7, compared with an SMMT of two MBGs, G0 and G1, whose secure roots R0 and R1 sit directly over M10-M13 and leaves L0-L7. Legend: counter read/updated; nodes traversed for authentication]
Disadvantages
On-chip storage for the roots scales linearly with the number of MBGs
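The trade-off between traversal depth and on-chip root storage can be made concrete with a short calculation. The binary arity and the 16-byte root size are assumptions chosen for illustration, and both helper names are hypothetical.

```python
def levels_above_leaves(num_leaves: int, arity: int = 2) -> int:
    """Number of tree levels above the leaves, counting the root."""
    levels, nodes = 0, num_leaves
    while nodes > 1:
        nodes = -(-nodes // arity)  # ceiling division
        levels += 1
    return levels


def smmt_cost(total_leaves: int, num_mbgs: int, arity: int = 2, root_bytes: int = 16):
    """Return (in-memory nodes traversed per access, on-chip bytes for roots).

    root_bytes is an assumed HMAC size, not a figure from the slides.
    """
    leaves_per_mbg = total_leaves // num_mbgs
    in_memory_nodes = max(levels_above_leaves(leaves_per_mbg, arity) - 1, 0)
    return in_memory_nodes, num_mbgs * root_bytes


print(smmt_cost(8, 1))  # single MT over L0-L7: M1x and M2x are fetched
print(smmt_cost(8, 2))  # two MBGs: only M1x is fetched, but two roots are stored
```

Each doubling of the number of MBGs removes one level from the traversal while doubling the root storage, which matches the linear-scaling disadvantage listed above.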
Dynamic multi-root Merkle Tree (DMMT)
Leverage spatial and temporal locality of memory accesses
Dynamically determine the hot block after PPRED accesses
Assign a smaller MT to the hot block, and a larger MT covering all cold blocks
Only 2 roots on the secure processor
[Figure: Merkle tree over leaves L0-L15 in four MBGs (G0-G3) with nodes M10-M17, M20-M23, and M30-M31; the secure processor holds the secure roots RHOT and RCOLD. G0 is the hot MBG and RHOT holds M20, the root of the hot MT. Legend: nodes traversed for access to hot MBG; nodes traversed for access to cold MBG]
Dynamic multi-root Merkle Tree (DMMT)
Hot block update: Hot MBG changed from G0 to G2
Update M20 with RHOT; update the corresponding branch up to RCOLD
Fetch and verify M22; store as RHOT
[Figure: the same tree over leaves L0-L15 and MBGs G0-G3 after the update; the secure processor holds RHOT and RCOLD, and RHOT now holds M22, the root of the hot MT]
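The two-step hot block update can be modeled with a toy tree. This is a simplified sketch under stated assumptions: plain SHA-256 replaces the keyed HMAC, only the four MBG-level nodes (M20-M23) are modeled, and verification recomputes the whole cold root instead of walking a single branch. The `DMMT` class and its methods are hypothetical names.

```python
import hashlib


def h(*parts: bytes) -> bytes:
    return hashlib.sha256(b"".join(parts)).digest()


def cold_root(m2: list[bytes]) -> bytes:
    """RCOLD over the four MBG-level nodes M20..M23 via M30 and M31."""
    return h(h(m2[0], m2[1]), h(m2[2], m2[3]))


class DMMT:
    def __init__(self, m2: list[bytes], hot: int):
        self.m2 = list(m2)                # nodes in untrusted memory
        self.hot = hot
        self.r_hot = self.m2[hot]         # on-chip root of the hot MT
        self.r_cold = cold_root(self.m2)  # on-chip root of the cold MT

    def update_hot(self, new_root: bytes) -> None:
        """Writes to the hot MBG end at the on-chip RHOT; memory copy goes stale."""
        self.r_hot = new_root

    def _verify_cold(self) -> None:
        if cold_root(self.m2) != self.r_cold:
            raise ValueError("integrity violation in cold MT")

    def switch_hot(self, new_hot: int) -> None:
        self._verify_cold()               # nodes read from memory must be authentic
        self.m2[self.hot] = self.r_hot    # step 1: write RHOT back (e.g. into M20)
        self.r_cold = cold_root(self.m2)  #         and update the branch up to RCOLD
        self.r_hot = self.m2[new_hot]     # step 2: fetch verified node (e.g. M22) as RHOT
        self.hot = new_hot


initial = [h(bytes([i])) for i in range(4)]
tree = DMMT(initial, hot=0)
tree.update_hot(h(b"new G0 contents"))
tree.switch_hot(2)  # hot MBG changes from G0 to G2
```

While an MBG is hot its accesses stop at RHOT, and the cost of the switch is paid only when the predictor selects a new hot MBG.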
Hot block prediction architecture
[Figure: block diagram with an n×k-bit AccessCount RAM indexed by the group index (Gi), a +1 incrementer, MaxCount, NextHot, and CurrentHot registers, a k-bit comparator (GT), a k-bit AccessCounter (ACOUNT) with RESET, WREN/WR/RD control signals, and a "Fetch new HotBlock" signal]
n : Number of MBGs
m : log2 n
k : log2 PPRED
WREN : Write Enable
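A behavioral sketch of the hot block predictor is given below. It is one interpretation of the block diagram: per-MBG counters, a running maximum, and a switch decision every PPRED accesses. The reset policy and the class and method names are assumptions.

```python
class HotBlockPredictor:
    def __init__(self, num_mbgs: int, p_pred: int):
        self.counts = [0] * num_mbgs  # AccessCount RAM, one counter per MBG
        self.p_pred = p_pred          # prediction period (PPRED accesses)
        self.accesses = 0             # AccessCounter
        self.max_count = 0            # MaxCount register
        self.next_hot = 0             # NextHot register
        self.current_hot = 0          # CurrentHot register

    def access(self, group: int) -> bool:
        """Record one access; return True when a new hot MBG must be fetched."""
        self.counts[group] += 1
        if self.counts[group] > self.max_count:  # comparator GT enables the write
            self.max_count = self.counts[group]
            self.next_hot = group
        self.accesses += 1
        if self.accesses < self.p_pred:
            return False
        # End of the period: promote NextHot and reset all counters (assumed policy).
        changed = self.next_hot != self.current_hot
        self.current_hot = self.next_hot
        self.counts = [0] * len(self.counts)
        self.accesses = 0
        self.max_count = 0
        return changed


predictor = HotBlockPredictor(num_mbgs=4, p_pred=8)
fetch_new = False
for g in [2, 2, 0, 2, 1, 2, 3, 2]:
    fetch_new = predictor.access(g)
print(predictor.current_hot, fetch_new)
```

Tracking only a running maximum avoids scanning all n counters at the end of each period, which is presumably why the diagram keeps MaxCount and NextHot registers.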
Memory system evaluation (TLC RRAM)
Trace-driven simulation; NVM energy and lifetime evaluations
SPEC CPU2006 [1] memory traces, Intel PIN toolset [2]
Simulator: NVMain [3]
Full system evaluation
System performance (IPC)
Simulator: MARSSx86 [4]
Evaluated techniques
Bonsai Merkle Tree (baseline)
SMMT ASSURE
DMMT ASSURE
[1] J. L. Henning, “SPEC CPU2006 benchmark descriptions,” ACM SIGARCH, 2006
[2] C. K. Luk et al., “Pin: Building customized program analysis tools with dynamic instrumentation,” PLDI, 2005
[3] M. Poremba et al., “NVMain: An architectural-level main memory simulator for emerging non-volatile memories,” Annual Symposium on VLSI, 2012
[4] A. Patel et al., “MARSS: a full system simulator for multicore x86 CPUs,” DAC, 2011
Results: Summary (normalized to baseline)
                     BMT   SMMT ASSURE   DMMT ASSURE
NVM Energy           1     0.41          0.45
Memory Lifetime      1     2.36          2.11
System performance   1     1.11          1.10
On-chip memory       1     12.8n         n
NVM authentication
Increases NVM energy
Reduces memory lifetime and system performance
Solution: ASSURE
Prevents redundant HMAC computation – SMAC
Reduces MT overhead – MMT
Preserves security of state-of-the-art BMT authentication