Design Automation for Cryptography


SLIDE 1

Design Automation for Cryptography

Anupam Chattopadhyay

Assistant Professor
School of Computer Science and Engineering
School of Physical and Mathematical Sciences
Nanyang Technological University

June 7, 2017

SLIDE 2

Motivation

  • Security not a feature but a design metric
  • Cryptography is highly dynamic

[Figure: Timeline of cryptographic competitions, 1996 to 2017]

  • Competitions: AES, NESSIE, CRYPTREC, eSTREAM, SHA-3, PHC, CAESAR
  • Primitives covered: block ciphers, stream ciphers, hash functions, authenticated encryption (AE)
  • Trends marked on the timeline: cryptanalysis, custom cryptanalysis, lightweight cryptography
  • Figure annotations: "15 Year"; counts 42, 35, 59, 24, 58; "All proposals attacked!"

SLIDE 3

Motivation

  • Design metrics
  • A security-kernel developer faces a huge design space

    – Variety of constraints: area footprint, power utilization, latency, operating frequency, cost, ...
    – Variety of requirements: throughput, security, thermal limitations, distribution, scalability, flexibility, ...
    – Variety of target platforms: GPPs, DSPs, GPUs, ASICs, ASIPs, FPGAs, CPLDs, microcontrollers, ...
    – Architectural customization: word size, instruction set, memory, microarchitectural template, interfaces, ...

SLIDE 4

Contents

  • Custom Optimization Examples
  • Domain-specific High Level Synthesis
  • Fault-resistant Design by Physical Synthesis

SLIDE 5

HC-128: Parallelization by State Splitting

  • P and Q each hold 512 words of 32 bits
  • 5 reads, 1 write
  • A. Khalid, et al. One Word/Cycle HC-128 Accelerator via State-Splitting Optimization, in INDOCRYPT 2014

[Figure: state memory organizations feeding the "Update P & Key Gen." logic]

  • Design1: unsplit memories (P0, Q0)
  • Design2: even-odd splitting (P0, P1, Q0, Q1)
  • Design3: 4-way splitting (P0, P1, P2, P3, Q0, Q1, Q2, Q3)
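
To see why splitting the state helps, the following minimal sketch counts how many of the five P-table reads of one HC-128 step land in the same memory bank. The read offsets follow the HC-128 specification (P[j], P[j-3], P[j-10], P[j-511] for the update and P[j-12] for the output function). The bank mapping (word i stored in bank i mod k) is an assumption made for illustration and is not necessarily the mapping used in the cited accelerator; the data-dependent Q lookups and the single write are ignored.

```python
from collections import Counter

TABLE_WORDS = 512
# Offsets d such that P[(j - d) mod 512] is read in one HC-128 step:
# P[j], P[j-3], P[j-10], P[j-511] feed the update, P[j-12] feeds h1.
READ_OFFSETS = (0, 3, 10, 511, 12)


def reads_per_bank(j, banks):
    """Count how many of the five P reads at step j land in each bank.

    Assumption: word i lives in bank i % banks (low-order interleaving).
    """
    return Counter(((j - d) % TABLE_WORDS) % banks for d in READ_OFFSETS)


def worst_case_reads(banks):
    """Largest number of reads any single bank must serve in one step."""
    return max(max(reads_per_bank(j, banks).values())
               for j in range(TABLE_WORDS))


for banks in (1, 2, 4):
    print(f"{banks} bank(s): worst case {worst_case_reads(banks)} reads per bank")
```

Under this assumed mapping, the worst case drops from five reads on a single memory to three with even-odd splitting and two with 4-way splitting, which a dual-port memory could serve in one cycle.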

SLIDE 6

HC-128: Parallelization by State Splitting

Pipeline for Design3

Faraday 65 nm standard cell library, typical case

SLIDE 7

AES: Technology Mapping

  • AES MixColumns multiplies the AES state byte matrix by a constant matrix over GF(2^8), given by

        02 03 01 01
        01 02 03 01
        01 01 02 03
        03 01 01 02

  • The smallest circuit in the literature requires 108 XOR gates to implement this.
  • This function is four instances of the following equation over GF(2^8), with indices taken modulo 4:

        b_i = 02·a_i ⊕ 03·a_(i+1) ⊕ a_(i+2) ⊕ a_(i+3)

    → 41 LUTs using LUT6 FPGA technology.
  • Instead, we view the operation as a Boolean function rather than one over GF(2^8), and we optimize it toward an implementation of 36 LUTs.
  • Inverse MixColumns can similarly be reduced from 72 to 60 LUTs.

Joint work with Mustafa Khairallah and Thomas Peyrin, unpublished
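
The Boolean view can be illustrated with a small sketch. It implements MixColumns on one column over GF(2^8), then uses linearity over GF(2) to recover which input bits each of the 32 output bits depends on. The function names and the naive LUT6 estimate (one XOR tree per output bit, no logic sharing) are assumptions for illustration; the 41 and 36 LUT figures on the slide come from the authors' own mapping, which is not reproduced here.

```python
from collections import Counter


def xtime(a):
    """Multiply a byte by 02 in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    a <<= 1
    return (a ^ 0x11B) & 0xFF if a & 0x100 else a


def mix_column(col):
    """Apply MixColumns to one column of four bytes."""
    out = []
    for i in range(4):
        a0, a1, a2, a3 = (col[(i + k) % 4] for k in range(4))
        out.append(xtime(a0) ^ (xtime(a1) ^ a1) ^ a2 ^ a3)
    return out


def output_supports():
    """For each of the 32 output bits, collect the input bits it depends on.

    MixColumns is linear over GF(2), so feeding unit vectors reveals the
    32x32 binary matrix column by column.
    """
    supports = [set() for _ in range(32)]
    for in_bit in range(32):
        col = [0, 0, 0, 0]
        col[in_bit // 8] = 1 << (in_bit % 8)
        out = mix_column(col)
        for out_bit in range(32):
            if (out[out_bit // 8] >> (out_bit % 8)) & 1:
                supports[out_bit].add(in_bit)
    return supports


def naive_lut6_count(supports):
    """LUT6s for one n-input XOR tree per output bit: ceil((n - 1) / 5)."""
    return sum(-(-(len(s) - 1) // 5) for s in supports)


supports = output_supports()
print(Counter(len(s) for s in supports))
print(naive_lut6_count(supports))
```

Each output bit turns out to be the XOR of either 5 or 7 input bits, so any count below the naive per-bit estimate has to come from sharing logic between output bits.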

SLIDE 8

FPGA-Aware Pipelining

Joint work with Mustafa Khairallah and Thomas Peyrin, unpublished

[Figure: logic-aware partitioning compared with FPGA-aware partitioning]

SLIDE 9

High-Level Synthesis

  • Focuses on algorithm to RTL flow

    × Dependent on user proficiency; varies widely from tool to tool
    × Unaware of technology platforms
    × Hard to reuse design knowledge
    × Storage allocation optimizations missing

Domain-specialization?

SLIDE 10

Berkeley Dwarfs for Parallel Computing[1]

  • How apps relate to the 13 dwarfs (red: hot → blue: cool)

[1] The Landscape of Parallel Computing Research: A View from Berkeley, by K. Asanovic et al., Technical Report, 2006

[Figure: heat map of application domains against the 13 dwarfs]

  • Application domains: Embed, SPEC, DB, Games, ML, HPC, Health, Image, Speech, Music, Browser
  • Dwarfs: 1 Finite State Machine, 2 Combinational, 3 Graph Traversal, 4 Structured Grid, 5 Dense Matrix, 6 Sparse Matrix, 7 Spectral (FFT), 8 Dynamic Programming, 9 N-Body, 10 MapReduce, 11 Backtrack/B&B, 12 Graphical Models, 13 Unstructured Grid

SLIDE 11

Source: T. Noll, RWTH Aachen

[Figure: SoC processing elements (GPP, DSP, ASIP, FPGA), ranging from programmable to configurable, placed on logarithmic axes of power dissipation, performance, and flexibility; tick labels shown are 10^3 ... 10^4 and 10^5 ... 10^6]

SLIDE 12

Domain-specific High Level Synthesis: Lessons from Wireless Communication IP

SLIDE 13

Contents

  • Custom Optimization Examples
  • Domain-specific High Level Synthesis
  • Fault-resistant Design by Physical Synthesis

SLIDE 14

CRYKET: Overview

  • CRYKET (Cryptographic Kernels Toolkit): domain-specific HLS
    – Language-independent, GUI-based design capture
    – Domain-specific expertise, well-understood kernels

[Figure: CRYKET toolflow]

  • Inputs: algorithmic specifications, architectural specifications, test vectors
  • Tool: CRYKET with the CRYKET library
  • Generated outputs: synthesis scripts, test bench, Verilog RTL, ANSI C model, verification model
  • Downstream steps: logic synthesis, RTL simulation, system simulation, software integration, system validation

  • A. Khalid, et al. RAPID-FeinSPN: A Rapid Prototyping Framework for Feistel and SPN-Based Block Ciphers. ICISS 2013
SLIDE 15

RunFein: Feistel and SPN Block Cipher

[Figure: RunFein block diagrams for a Feistel network cipher and an SPN cipher. Labels: Plaintext, Data Register, L Data, R Data, Rearrange, S1 S2 ... Sn, P-Box, GF Mul, Key Register, Key Update, Key Ki, layer 0 to layer 4]

  ▪ Block/key/word sizes, rounds, mode of operation, test vectors
  ▪ Layers of operation: S/P-Box, bitwise/arithmetic/Boolean/field operations, compound popular cipher operations

  • A. Khalid, et al. RunFein: A Rapid Prototyping Framework for Feistel and SPN Based Block Ciphers, JCEN 2016
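
The layered view of an SPN cipher can be sketched in software. The example below describes PRESENT-80 (one of the ciphers RunFein supports) as an ordered list of data layers plus a key-update function. It is only an illustration of the layer idea under the published PRESENT specification; the names `DATA_LAYERS`, `key_update`, and `encrypt` are hypothetical and do not reflect RunFein's configuration format or generated code.

```python
SBOX = (0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
        0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2)


def sbox_layer(state):
    """Apply the 4-bit S-box to each of the 16 nibbles of the 64-bit state."""
    return sum(SBOX[(state >> (4 * i)) & 0xF] << (4 * i) for i in range(16))


def pbox_layer(state):
    """Move bit i to position 16*i mod 63 (bit 63 stays in place)."""
    out = 0
    for i in range(64):
        dest = 63 if i == 63 else (16 * i) % 63
        out |= ((state >> i) & 1) << dest
    return out


def key_update(key, round_counter):
    """One step of the 80-bit key schedule."""
    key = ((key & (2**19 - 1)) << 61) | (key >> 19)      # rotate left by 61
    key = (SBOX[key >> 76] << 76) | (key & (2**76 - 1))  # S-box on top nibble
    return key ^ (round_counter << 15)                   # add round counter


def round_key(key):
    """The round key is the leftmost 64 bits of the 80-bit key register."""
    return key >> 16


# Hypothetical cipher description: ordered layers applied to the data state.
DATA_LAYERS = (sbox_layer, pbox_layer)


def encrypt(plaintext, key, rounds=31):
    state = plaintext
    for r in range(1, rounds + 1):
        state ^= round_key(key)      # layer 0: key addition
        for layer in DATA_LAYERS:    # layers 1..n: substitution, permutation
            state = layer(state)
        key = key_update(key, r)
    return state ^ round_key(key)    # final key addition


print(hex(encrypt(0, 0)))
```

Swapping the entries of `DATA_LAYERS` or the key-update function is enough to describe a different SPN cipher, which is the kind of reuse a layer-based design capture aims for.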

SLIDE 16

RunFein: Fast Design Space Exploration

[Figure: microarchitecture options explored by RunFein, each taking the plaintext into a data register]

  • No unrolling (loop folded): a single round (Round 1)
  • N times unrolled: Round 1, Round 2, ..., Round N
  • Sub-pipelined round: Round 1a, Round 1b, Round 1c with additional data registers
  • N times unrolled with pipelining: Round 1, ..., Round N with data registers
  • Bit slicing

  • A. Khalid, et al. RunFein: A Rapid Prototyping Framework for Feistel and SPN Based Block Ciphers, JCEN 2016
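
A first-order throughput model shows how such options can be compared quickly. This is a simplified sketch, not RunFein's estimator: it assumes the clock frequency stays the same when rounds are unrolled (in practice the critical path grows), and that a fully unrolled, pipelined design accepts one block per cycle. The function names and example parameters are hypothetical.

```python
from math import ceil


def cycles_per_block(rounds, unroll=1):
    """Cycles to process one block when `unroll` rounds run per cycle."""
    return ceil(rounds / unroll)


def throughput_bps(block_bits, rounds, freq_hz, unroll=1, pipelined=False):
    """Steady-state throughput in bits per second.

    Assumption: a fully unrolled, pipelined design accepts one block
    per cycle; every other option is limited by cycles_per_block.
    """
    if pipelined and unroll >= rounds:
        return block_bits * freq_hz
    return block_bits * freq_hz / cycles_per_block(rounds, unroll)


# Hypothetical 64-bit block cipher taking 32 cycles when loop folded, at 100 kHz.
for label, kwargs in [
    ("loop folded", dict(unroll=1)),
    ("2x unrolled", dict(unroll=2)),
    ("fully unrolled, pipelined", dict(unroll=32, pipelined=True)),
]:
    print(label, throughput_bps(64, 32, 100_000, **kwargs))
```

Area and power are deliberately left out; a real exploration would pair each point with synthesis results.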
SLIDE 17

RunFein: GUI

SLIDE 18

RunFein Cipher Design Capture GUI

[Figure: RunFein toolflow]

  • Known cipher? If yes, load configuration; if no, enter configuration
  • Known configurations: PRESENT-80/128, AES-128, KLEIN-64/80/96, LED-64/128, SEA, TEA, XTEA, XXTEA, ...
  • Captured inputs: algorithmic parameters (basic, round layers and kround layers), microarchitecture, test vectors
  • Validate? Validation checks use the kernel library; on fail, return to design capture; on pass, generate
  • Cipher model creation after validation of layers: controller (state update) and datapath, with d_state (Layer 0, Layer 1, ..., Layer n) and k_state (Layer 0, Layer 1, ..., Layer m)
  • Generated software (.c, .h, scripts): ANSI C algorithmic implementation, verification environment, profiling switches, nodes library (.xml); used for test vectors, verification, throughput profiling, NIST test suite
  • Generated hardware (.v, .h, scripts): HDL for controller and datapath, testbench, synthesis scripts; used for RTL/gate-level simulation, synthesis, switching power estimation

SLIDE 19

RunFein: PRESENT-80 Bitslicing

Faraday 65 nm standard cell library, typical case, Synopsys Design Compiler F-2011.09

Faraday 65 nm standard cell library, typical case, operating frequency 100 kHz

SLIDE 20

RunStream: Analysis and Results


  • A. Khalid, et al. RunStream: A High-level Rapid Prototyping Framework for Stream Ciphers. In ACM TECS, 2016
SLIDE 21

Contents

  • Custom Optimization Examples
  • Domain-specific High Level Synthesis
  • Fault-resistant Design by Physical Synthesis

SLIDE 22

Preventing Differential Fault Analysis Attack

  • Attacker assumptions
    – Ability to induce a fault at a given time (T) and space (S) precision
    – Ability to infer/solve a system of equations based on the observed faulty (C_T*) and correct (C_T) ciphertexts

  • Prevention
    – Redundancy, concurrent error detection
    – Attack-specific mounted sensor
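
Redundancy-based detection can be sketched as time redundancy: compute the ciphertext twice and release it only if both results agree, so the attacker never observes a faulty ciphertext. This is a minimal illustration, assuming a transient fault corrupts at most one of the two runs; `toy_cipher` and `make_faulty` are hypothetical stand-ins used only to exercise the check.

```python
def protected_encrypt(encrypt, plaintext, key):
    """Run the cipher twice and release the result only if both runs agree.

    Assumption: a transient fault corrupts at most one of the two runs.
    """
    first = encrypt(plaintext, key)
    second = encrypt(plaintext, key)
    if first != second:
        return None  # suppress the faulty ciphertext
    return first


def toy_cipher(plaintext, key):
    """Stand-in for a real cipher, used only to exercise the check."""
    return (plaintext ^ key) & 0xFFFFFFFFFFFFFFFF


def make_faulty(encrypt, faulty_call, mask):
    """Hypothetical fault injector: flips `mask` bits in one call's output."""
    calls = {"n": 0}

    def wrapped(plaintext, key):
        calls["n"] += 1
        out = encrypt(plaintext, key)
        return out ^ mask if calls["n"] == faulty_call else out

    return wrapped


print(protected_encrypt(toy_cipher, 5, 3))
print(protected_encrypt(make_faulty(toy_cipher, 1, 0x80), 5, 3))
```

An identical fault injected into both runs would pass this check, which is one reason redundancy is often combined with a sensor-based countermeasure.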

SLIDE 23

DFARPA: Differential Fault Attack Resistant Physical Design Automation

  • Exemplary fault attack on AES [1]
    – Multi-byte fault attack model
    – Fault is induced in at least one of the four diagonals in the AES state

  • Solution: generate a floorplan for the 16 blocks so that the fault becomes unexploitable in the presence of a fault cluster of radius r units
    – Place the blocks (elements) of each diagonal at least r units apart
    – Formulated as a constrained placement problem

[1] D. Saha, D. Mukhopadhyay, and D. RoyChowdhury. A diagonal fault attack on the Advanced Encryption Standard. Cryptology ePrint Archive, Report 2009/581, 2009. http://eprint.iacr.org/2009/581

Joint work with Debdeep Mukhopadhyay and Shivam Bhasin, unpublished

D0 D1 D2 D3
D3 D0 D1 D2
D2 D3 D0 D1
D1 D2 D3 D0
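
The placement constraint can be checked with a few lines of code. The sketch below labels each AES state byte with its diagonal, matching the D0 to D3 grid, and reports pairs of same-diagonal blocks that sit closer than a minimum distance. It reads the constraint as a pairwise Euclidean distance between block positions, which is an assumption; the `placement` format and function names are hypothetical, and this is a checker, not the constrained placement solver itself.

```python
from itertools import combinations
from math import dist


def diagonal_of(row, col):
    """Diagonal index of an AES state byte, matching the D0..D3 grid."""
    return (col - row) % 4


def violations(placement, min_distance):
    """Return pairs of same-diagonal blocks placed closer than min_distance.

    `placement` maps the (row, col) of a state byte to its (x, y) position.
    """
    bad = []
    for a, b in combinations(sorted(placement), 2):
        same_diagonal = diagonal_of(*a) == diagonal_of(*b)
        if same_diagonal and dist(placement[a], placement[b]) < min_distance:
            bad.append((a, b))
    return bad


# Hypothetical baseline: blocks placed in state order on a unit grid.
grid = {(r, c): (float(c), float(r)) for r in range(4) for c in range(4)}
print(len(violations(grid, 1.0)))  # neighbours on a diagonal are sqrt(2) apart
print(len(violations(grid, 1.5)))
```

On this baseline layout, same-diagonal neighbours are only sqrt(2) units apart, so any required separation above that forces the floorplanner to move blocks.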

SLIDE 24

DFARPA: Reactive Countermeasure

  • The sensor is composed of two key components: a watchdog ring oscillator (WRO) and a phase detection (PD) circuit.
  • High-energy injections impact signal propagation delay, which disturbs the phase of the WRO. This phase change is detected by the PD circuit to raise an alarm and halt sensitive computation.
  • Two metal layers are reserved for WRO routing

Joint work with Debdeep Mukhopadhyay and Shivam Bhasin, unpublished

SLIDE 25

Contents

  • Custom Optimization Examples
  • Domain-specific High Level Synthesis
  • Fault-resistant Design by Physical Synthesis

SLIDE 26

Conclusion and Outlook

  • Conclusion
    – Domain-specific HLS can push the design efficiency and productivity
    – Different phases of EDA can integrate cryptographic/cryptanalyst know-how to improve

  • Outlook
    – Integrating (more) custom optimizations
    – Diverse technology/platform-specific constraints
    – Diverse cipher families
    – Integrating automated SCA and DFA protection

SLIDE 27

Thank you! Questions?