Hardware Acceleration of Cryptography
Patrick Schaumont, Professor
Bradley Department of ECE, Virginia Tech

Outline
1. Fundamentals of Parallelism
2. Embedded Architecture of MSP430, MSP432
3. Hardware Acceleration in Embedded Architectures
4. AES Hardware Accelerator
5. Direct Memory Access
6. Power Dissipation
7. Literature

This lecture is about:
• Accelerators in microcontrollers
• Embedded computing
• Efficient crypto (fast, low energy)

This lecture is NOT about:
• FPGA
• Multi-core
• High-speed crypto

Sequential, Concurrent and Parallel
• Consider three tasks (A, B, C) and two processors (Proc 1, Proc 2)
[Figure: task graph. A produces v1, B produces v2, and C consumes both.]

Sequential, Concurrent and Parallel
• Running A, B and C sequentially on Proc 1
v1 and v2 are stored in Proc 1's memory.
[Figure: timeline. Proc 1 runs A, then B, then C; Proc 2 is idle.]

Sequential, Concurrent and Parallel
• Running A on Proc 1 and B on Proc 2 in parallel
v1 is stored in Proc 1's memory. v2 is communicated from Proc 2 to Proc 1.
[Figure: timeline. Proc 1 runs A, then C; Proc 2 runs B while Proc 1 runs A.]

Sequential, Concurrent and Parallel
• Running A and B concurrently on Proc 1, C on Proc 2
v1 and v2 are communicated from Proc 1 to Proc 2.
There are many concurrency mechanisms: threading, hyperthreading, SMT, TMT, ...
[Figure: timeline. Proc 1 interleaves A and B (A + B); Proc 2 runs C afterwards.]

Control and Data Dependency
A Data Dependency is a relation between two operations such that the result of one operation is used by the next.
A Data Dependency is a fundamental property of the application.
A Control Dependency is a relation between two operations such that one operation must execute after the other.
A Control Dependency is caused by a resource constraint.
[Figure: A → B → C]

Software and Hardware
Software is sequential/concurrent:
• As parallel as allowed by resource constraints
• Many control dependencies!

    PUSH  {A4, V1, V2, LR}
    MOVS  A3, #0
    MOVW  A4, sbox+0
    MOVT  A4, sbox+0
    MOVS  A2, #0
    ADD   V1, A1, A2, LSL #2
    LDRB  V2, [A3, +V1]
    LDRB  V2, [A4, +V2]
    ADDS  A2, A2, #1
    UXTB  A2, A2
    CMP   A2, #4
    STRB  V2, [A3, +V1]
    BLT   ||$C$L7||
    ADDS  A3, A3, #1
    UXTB  A3, A3
    CMP   A3, #4
    BLT   ||$C$L6||
    POP   {A4, V1, V2, PC}

Hardware is parallel:
• Maximally parallel
• Ideally, only data dependencies
[Figure: four SBOX blocks side by side]
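The listing on this slide reads like compiler output for a nested loop that replaces every byte of a 4x4 state with a table lookup. The C below is a minimal sketch of such a loop, not the lecture's source: the function name sub_bytes, the state layout and the table parameter are assumptions. It illustrates the point of the slide: each lookup uses only its own input byte, so the 16 lookups have no data dependencies on each other, and their order is a control dependency that comes from executing on a single processor.

```c
#include <stdint.h>

/* Replace every byte of a 4x4 state by table[byte].
 * 'table' is any 256-entry lookup table supplied by the caller
 * (assumed interface; the slide uses a global named sbox). */
void sub_bytes(uint8_t state[4][4], const uint8_t table[256])
{
    /* The loop order below is only a control dependency: the 16
     * lookups are independent and could run in any order, or all
     * at once in hardware with enough lookup units. */
    for (uint8_t i = 0; i < 4; i++) {
        for (uint8_t j = 0; j < 4; j++) {
            state[j][i] = table[state[j][i]];
        }
    }
}
```

In software the 16 lookups execute one after another; hardware with four lookup units, as in the figure, could finish in four steps.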

Hardware acceleration
[Figure: timeline comparing sequential software (time T_SW) with parallel hardware (time T_HW)]
If T_HW < T_SW, overall performance may improve.

Hardware acceleration
[Figure: same timeline; the data dependencies between software and hardware add a communication time T_COM to the hardware time T_HW]
Better: if (T_HW + T_COM) < T_SW, overall performance may improve.

Speedup
[Figure: timeline of sequential software (T_SW) and parallel hardware (T_COM + T_HW)]
Speedup =? T_SW / (T_COM + T_HW)

Speedup
[Figure: timeline of sequential software (T_SW) and parallel hardware (T_COM + T_HW) within the total run time T_TOTAL]
Better: Speedup =? T_TOTAL / (T_TOTAL - (T_SW - (T_COM + T_HW)))

Speedup
[Figure: timeline of sequential software (T_SW) and parallel hardware (T_COM + T_HW) within the total run time T_TOTAL]
Speedup = 1 / ((1 - p) + p/s)
with
p = T_SW / T_TOTAL ~ parallelizable portion
s = T_SW / (T_COM + T_HW) ~ acceleration
About time!
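A small worked example may make the formula concrete. The sketch below takes p as the fraction of the total run time spent in the function that is moved to hardware (T_SW / T_TOTAL) and s as the acceleration of that function, T_SW / (T_COM + T_HW), which gives the speedup the form of Amdahl's law. The function name speedup and the numbers in the comment are illustrative assumptions, not measurements from the lecture.

```c
/* Overall speedup from moving one software function to hardware.
 *   t_total: run time of the all-software program
 *   t_sw:    part of t_total spent in the function to accelerate
 *   t_com:   time to move data to and from the accelerator
 *   t_hw:    time the accelerator needs for the computation
 * All times must use the same unit (cycles, seconds, ...). */
double speedup(double t_total, double t_sw, double t_com, double t_hw)
{
    double p = t_sw / t_total;          /* parallelizable portion */
    double s = t_sw / (t_com + t_hw);   /* acceleration of that portion */
    return 1.0 / ((1.0 - p) + p / s);
}

/* Illustrative numbers: t_total = 100, t_sw = 80, t_com = 10, t_hw = 10
 * give p = 0.8 and s = 4, so the speedup is 1 / (0.2 + 0.2) = 2.5.
 * The same result follows directly from 100 / (100 - (80 - 20)). */
```

Note that if T_COM + T_HW equals T_SW, then s = 1 and the speedup is 1: communication overhead can cancel the whole benefit of a fast accelerator.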

Target Platform
• MSP432P401R
  • ARM Cortex M4
  • 256K Flash / 64K SRAM
  • $20
• MSP430FR5994
  • MSP430 (16 bit)
  • 256K FRAM / 8K SRAM
  • $17

MSP432P401R
[Figure: device block diagram, annotated in steps]
• CPU, bus and memory; the memory holds the program (read) and the data (read/write)
• Direct Memory Access and the crypto accelerator
• Bus masters and bus slaves

MSP430FR5994
[Figure: device block diagram, annotated]
• CPU, DMA, bus, memory and the crypto accelerator
• The memory holds the program (read) and the data (read/write)

Memory Map (MSP430FR5994)
• Unified view of all bus slaves in a single memory space
• FRAM, SRAM store program and variables
• Peripherals contain memory-mapped registers

    0x04000 - 0x43FFF   256K FRAM
    0x01C00 - 0x03BFF   8K SRAM
    0x00020 - 0x00FFF   Peripherals

Software reaches hardware over the bus: the instruction MOV.B #1,&P1OUT writes the value 1 to the P1OUT register, which sits in the peripheral address range.
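In C, a memory-mapped register is usually reached through a volatile pointer to a fixed address, which is what the compiler turns into an instruction such as MOV.B #1,&P1OUT. The sketch below shows the idea in a form that also runs on a PC: the array address_space stands in for the device's address space, and the address 0x0202 for P1OUT is an assumption that should be checked against the device header or datasheet. On the real device the array does not exist and the pointer refers to the bus address directly.

```c
#include <stdint.h>

/* Host-side stand-in for the low 4 KB of the address space
 * (assumption for this sketch; not present on the real device). */
static uint8_t address_space[0x1000];

/* Assumed address of P1OUT; take the real value from the device header. */
#define P1OUT_ADDR 0x0202u

/* An 8-bit memory-mapped register is a volatile byte at a fixed address.
 * 'volatile' keeps the compiler from removing or reordering the access. */
#define REG8(addr) (*(volatile uint8_t *)&address_space[(addr)])

/* Write a value to P1OUT, comparable to MOV.B #value,&P1OUT. */
void write_p1out(uint8_t value)
{
    REG8(P1OUT_ADDR) = value;
}

/* Read P1OUT back over the same address. */
uint8_t read_p1out(void)
{
    return REG8(P1OUT_ADDR);
}
```

From the software side there is no difference between writing a variable in SRAM and writing a peripheral register; only the address decides which bus slave responds.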

Example: LED Blinker

    #include <msp430.h>

    int main(void) {
        WDTCTL = WDTPW | WDTHOLD;      // stop the watchdog timer
        P1OUT &= ~BIT0;                // clear output bit P1.0
        P1DIR |= BIT0;                 // configure P1.0 as output
        while (1) {
            P1OUT ^= BIT0;             // toggle P1.0
            __delay_cycles(100000);
        }
    }

[Figure: Port 1 registers P1IN, P1OUT and P1DIR]

    // Assembly
    BIC.B   #1,&P1OUT+0
    OR.B    #1,&P1DIR+0
    XOR.B   #1,&P1OUT+0
    PUSHM.A #1, r13
    MOV.W   #33330, r13
    SUB.W   #1, r13
    JNE     $1
    POPM.A  #1, r13
    JMP     $C$L4

Hardware Acceleration
• Let's design a multiplier peripheral (MSP430)
• Our benchmark program

    unsigned long mymul(unsigned a, unsigned b) {
        unsigned long r;
        r = (unsigned long) a * b;
        return r;
    }

    volatile int arg1 = 5, arg2 = 3;

    int main() {
        return mymul(arg1, arg2);
    }

Hardware Acceleration
• Let's design a multiplier peripheral (MSP430)
• Our benchmark program

    main:
        MOV.W   #5,0(SP)
        MOV.W   #3,2(SP)
        MOV.W   0(SP),r12           ; arguments
        MOV.W   2(SP),r13
        CALLA   #mymul
    mymul:
        SUBA    #8,SP
        MOV.W   r13,6(SP)
        MOV.W   r12,4(SP)
        CALLA   #__mspabi_mpyul
        MOV.W   r12,0(SP)
        MOV.W   r13,2(SP)
        ADDA    #8,SP
        RETA

Stack frame just before ADDA #8,SP (highest address first):

    return address
    b                  6(SP)
    a                  4(SP)
    rhi                2(SP)
    rlo                0(SP)  <- SP

Hardware Acceleration
• Our benchmark program: mymul calls the library function __mspabi_mpyul

    __mspabi_mpyul:                 ; library function
        MOV.W   R12,R11             ; R11 = multiplier
        MOV.W   R13,R14             ; R15:R14 = multiplicand
        CLR.W   R15
        CLR.W   R12                 ; R13:R12 = result = 0
        CLR.W   R13
        CLRC
        RRC.W   R11                 ; lowest multiplier bit into carry
        JMP     mpyul_add_loop1
    mpyul_add_loop:
        RRA.W   R11                 ; next multiplier bit into carry
    mpyul_add_loop1:
        JNC     shift_test_mpyul    ; bit is 0: skip the addition
        ADD.W   R14,R12             ; result += multiplicand
        ADDC.W  R15,R13
    shift_test_mpyul:
        RLA.W   R14                 ; multiplicand <<= 1
        RLC.W   R15
        TST.W   R11
        JNE     mpyul_add_loop      ; repeat while multiplier bits remain
        RET
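The library routine on this slide appears to be a shift-and-add multiplier: it examines the multiplier one bit at a time and adds a shifted copy of the multiplicand for every 1 bit. The C below is a minimal sketch of that technique for 16-bit operands and a 32-bit result. The function name mpyul_shift_add is hypothetical, and the register names in the comments are a reading of the assembly, not something the slides state.

```c
#include <stdint.h>

/* Shift-and-add multiplication: 16 x 16 -> 32 bits, unsigned. */
uint32_t mpyul_shift_add(uint16_t a, uint16_t b)
{
    uint16_t multiplier   = a;   /* plays the role of R11     */
    uint32_t multiplicand = b;   /* plays the role of R15:R14 */
    uint32_t result       = 0;   /* plays the role of R13:R12 */

    while (multiplier != 0) {
        if (multiplier & 1u) {
            result += multiplicand;   /* add for every 1 bit */
        }
        multiplicand <<= 1;           /* next bit weighs twice as much */
        multiplier >>= 1;             /* move to the next bit */
    }
    return result;
}
```

The loop runs once per bit up to the highest set bit of the multiplier, so up to 16 iterations of several instructions each. That iteration count is the software cost a single-cycle hardware multiplier is meant to remove.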

Hardware Acceleration
• Our accelerated program

    main:
        MOV.W   #5,0(SP)
        MOV.W   #3,2(SP)
        MOV.W   0(SP),r12           ; arguments
        MOV.W   2(SP),r13
        CALLA   #myhwmul
    myhwmul:                        ; software HAL
        MOV.W   r12,&HW1            ; data + control dependency
        MOV.W   r13,&HW2
        MOV.W   #0,&CTL1
        MOV.W   #1,&CTL1
        MOV.W   &HW3,r12            ; data dependency
        MOV.W   &HW4,r13
        RETA

[Figure: the software HAL writes to and reads from the Hardware Multiplier]

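Hardware Multiplier System Interconnect
[Figure: the CPU connects to the multiplier peripheral and to the other bus slaves over a shared peripheral bus]
• From the CPU to the bus slaves: per_addr (16 or 20 bits), per_din (16 bits), per_we, per_en
• Inside the peripheral: bus interface, registers a, ctl and b, a 1-cycle multiplier (mul), result registers rlo and rhi
• From the bus slaves to the CPU: per_dout (16 bits), combined with the per_dout of the other bus slaves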

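Hardware Multiplier Peripheral Design
[Figure: register-level schematic of the peripheral]
• Address decoding of per_addr, per_we and per_en generates the strobes write_a, write_b, write_c and read_r
• write_a and write_b load the 16-bit registers a and b from per_din; write_c loads the 1-bit register ctl from per_din[0]
• The multiplier forms the 32-bit product of a and b, which is held in the 16-bit registers rlo and rhi
• Several blocks in the figure are labeled "edge"
• read_r (2 bits) selects rlo, rhi or 0 for per_dout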
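The behavior of such a peripheral, together with the software that drives it, can be sketched as a small C model that runs on a PC. Everything below is an assumption made for illustration: the register offsets, the type hwmul_t, the function names, and in particular the rule that the product is captured when ctl changes from 0 to 1 (suggested by the driver writing 0 and then 1 to the control register, but not stated in the slides).

```c
#include <stdint.h>

/* Behavioral model of the multiplier peripheral's registers. */
typedef struct {
    uint16_t a, b;       /* operand registers          */
    uint16_t ctl;        /* 1-bit control register     */
    uint16_t rlo, rhi;   /* low and high result words  */
} hwmul_t;

/* Hypothetical register offsets; the slides give no addresses. */
enum { REG_A = 0, REG_B = 2, REG_CTL = 4, REG_RLO = 6, REG_RHI = 8 };

/* Bus write: address decoding picks the register that takes 'din'. */
void hwmul_write(hwmul_t *hw, uint16_t offset, uint16_t din)
{
    switch (offset) {
    case REG_A:
        hw->a = din;
        break;
    case REG_B:
        hw->b = din;
        break;
    case REG_CTL: {
        uint16_t prev = hw->ctl;
        hw->ctl = din & 1u;                 /* only bit 0 is stored */
        if (prev == 0 && hw->ctl == 1) {    /* assumed: 0 -> 1 captures */
            uint32_t r = (uint32_t)hw->a * hw->b;
            hw->rlo = (uint16_t)r;
            hw->rhi = (uint16_t)(r >> 16);
        }
        break;
    }
    default:
        break;                              /* not our address: ignore */
    }
}

/* Bus read: result registers are readable, anything else reads as 0. */
uint16_t hwmul_read(const hwmul_t *hw, uint16_t offset)
{
    switch (offset) {
    case REG_RLO: return hw->rlo;
    case REG_RHI: return hw->rhi;
    default:      return 0;
    }
}

/* Software driver (HAL): write operands, pulse ctl, read the result. */
uint32_t myhwmul(hwmul_t *hw, uint16_t a, uint16_t b)
{
    hwmul_write(hw, REG_A, a);
    hwmul_write(hw, REG_B, b);
    hwmul_write(hw, REG_CTL, 0);
    hwmul_write(hw, REG_CTL, 1);
    uint32_t lo = hwmul_read(hw, REG_RLO);
    uint32_t hi = hwmul_read(hw, REG_RHI);
    return (hi << 16) | lo;
}
```

The driver needs six bus accesses for one multiplication. That communication cost is the T_COM term of the speedup estimate, and for a one-cycle multiplier it is likely to dominate T_HW.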
