Shohreh Sharif Mansouri and Elena Dubrova Department of Electronic - PowerPoint PPT Presentation

An Improved Hardware Implementation of the Grain-128a Stream Cipher Shohreh Sharif Mansouri and Elena Dubrova Department of Electronic Systems Royal Institute of Technology (KTH), Stockholm Email:{shsm,dubrova}@kth.se

Overview
• Motivation
• Structure of Grain-128a
• Four techniques to improve the implementation
• Experimental results
• Conclusion

The Main Goal
• Improving Grain-128a in terms of throughput, area and power
• We achieve this by modifying the architecture of Grain-128a without changing its algorithm
• We increase the throughput by 52% on average

Grain Family of Stream Ciphers
• Supports 80-bit-key and 128-bit-key versions
• Supports degrees of parallelization of 4, 8, 16 and (for Grain-128) 32
• The new version of Grain-128, Grain-128a, supports authentication with a maximal tag length of 32 bits

Grain-128a
• The cipher is divided into two parts:
  – Keystream generator section
  – Authentication section

Cipher Phases
The cipher goes through the following phases:
• Loading of the key and the initial value (IV)
• Keystream initialization phase: 256 clock cycles, no output bits
• Keystream generation phase:
  – Authentication initialization phase: 64 clock cycles, initializing the accumulator and the authentication shift register
  – Operational phase: producing the output stream and the tag
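As a rough illustration of how the phase lengths scale with parallelization, the sketch below assumes that an x-way parallel implementation processes x pre-output bits per clock cycle, so the 256-cycle initialization and the 64-cycle authentication initialization both shrink by a factor of x. The names `phase_schedule`, `INIT_BITS` and `AUTH_INIT_BITS` are illustrative only, and key/IV loading is left out because the slides give no cycle count for it.

```python
# Sketch, assuming x pre-output bits are processed per clock cycle.
INIT_BITS = 256       # keystream initialization (bit-serial: 256 clock cycles)
AUTH_INIT_BITS = 64   # authentication initialization: accumulator + shift register

def phase_schedule(parallelism):
    """Return (phase, clock cycles) pairs for the phases that precede operation."""
    if INIT_BITS % parallelism or AUTH_INIT_BITS % parallelism:
        raise ValueError("parallelism must divide 64")
    return [
        ("keystream initialization (no output)", INIT_BITS // parallelism),
        ("authentication initialization", AUTH_INIT_BITS // parallelism),
    ]

for x in (1, 2, 4, 8, 16, 32):
    print(x, phase_schedule(x))
```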

How to Improve Throughput?
• For a given degree of parallelization, throughput is set by the maximal clock frequency, which is limited by the critical path, i.e. the longest combinational path in the system
• 5 potential candidates for the critical path:
  – Dn: maximal delay from any NLFSR flip-flop to any other NLFSR flip-flop
  – Dhy: maximal delay from any NLFSR or LFSR flip-flop through the h and y functions to the output of the cipher
  – Dhya: maximal delay from any NLFSR or LFSR flip-flop through the h and y functions to any accumulator flip-flop
  – Da: maximal delay from any flip-flop in the authentication section of the cipher to any accumulator flip-flop
  – Dhyn: maximal delay from any NLFSR or LFSR flip-flop through the h and y functions to the first flip-flop of the NLFSR (active only during the initialization phase)
[Figure: Grain-128a datapath annotated with the paths Dn, Dl, Dhy, Dhya and Da]
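A minimal sketch of the throughput argument, using hypothetical delay values (not results from the paper): the clock period must cover the longest of the candidate paths, and an x-way parallel implementation produces x bits per cycle. It also shows that shortening Dn alone leaves the frequency unchanged while Dhyn dominates.

```python
def critical_path(delays_ns):
    """Return (name, delay) of the longest candidate path."""
    name = max(delays_ns, key=delays_ns.get)
    return name, delays_ns[name]

def throughput_gbps(delays_ns, parallelism):
    """x bits per cycle at f_max = 1 / critical delay (bits per ns = Gbit/s)."""
    _, delay = critical_path(delays_ns)
    return parallelism / delay

# Hypothetical path delays in ns; these are NOT measured values.
delays = {"Dn": 0.50, "Dhy": 0.45, "Dhya": 0.60, "Da": 0.30, "Dhyn": 0.80}
print(critical_path(delays))  # ('Dhyn', 0.8)

# Speeding up Dn alone does not help while Dhyn is the critical path.
faster_dn = dict(delays, Dn=0.20)
print(throughput_gbps(delays, 8) == throughput_gbps(faster_dn, 8))  # True
```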

Our Approach
• We apply four techniques:
  – Isolation of the authentication section (improving Dhya)
  – Fibonacci-to-Galois transformation of the shift registers (improving Dn)
  – Multi-frequency implementation (improving Dhyn)
  – Internal pipelining (improving Dhy)

Our Approach
• Isolation of the authentication section
• Fibonacci-to-Galois transformation of the feedback shift registers
• Multi-frequency implementation
• Internal pipelining

1. Isolation of the Authentication Section
• Problem:
  – Dhya increases as the degree of parallelization of Grain-128a grows
• Possible solution:
  – Isolate the authentication section by inserting flip-flops in the authentication section on the outputs of the h/y function
• This solution:
  – Adds one cycle of latency
  – Has no effect on security
[Figure: flip-flops (ff) on the h/y outputs split the Dhya path into Dhy and Da]
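The latency claim can be illustrated with a toy, bit-serial model of the authentication section: a 32-bit accumulator and shift register, loosely following Grain-128a's rule of accumulating the shift register when the message bit is 1. The function `auth_section`, the bit ordering and the choice to register the message bit together with the y bit are assumptions for illustration, not the exact Grain-128a datapath. With the isolation flip-flop, the accumulator value is unchanged but one extra cycle is needed.

```python
import random

MASK32 = (1 << 32) - 1

def auth_section(y_bits, msg_bits, isolated=False):
    """Return (accumulator, cycles) for a toy bit-serial authentication section."""
    acc = sreg = 0
    ff = None                        # isolation flip-flop: holds no valid data at reset
    n = len(y_bits)
    cycles = n + 1 if isolated else n
    for t in range(cycles):
        fresh = (y_bits[t], msg_bits[t]) if t < n else None
        if isolated:
            current, ff = ff, fresh  # section sees last cycle's value; ff captures the new one
        else:
            current = fresh          # direct combinational path (long Dhya delay)
        if current is None:
            continue                 # first cycle after reset: nothing valid yet
        y, m = current
        if m:
            acc ^= sreg              # accumulate the shift register when the message bit is 1
        sreg = ((sreg << 1) | y) & MASK32  # shift in the next pre-output bit
    return acc, cycles

rng = random.Random(2012)
y = [rng.getrandbits(1) for _ in range(500)]
m = [rng.getrandbits(1) for _ in range(500)]
print(auth_section(y, m)[0] == auth_section(y, m, isolated=True)[0])  # True
```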

2. Fibonacci to Galois Transformation
• Improves Dhyn and Dn
• Brings no area or power penalty
[Figure: shift registers annotated with the paths Dn, Dl and Dhyn]

Fibonacci to Galois Transformation*
• Fibonacci configuration (critical delay = 5): $f_3 = x_0 + x_2x_3 + x_1x_2$, $f_2 = x_3$, $f_1 = x_2$, $f_0 = x_1$
• Galois configuration (critical delay = 3): $f_3 = x_0 + x_2x_3$, $f_2 = x_3 + x_0x_1$, $f_1 = x_2$, $f_0 = x_1$
* E. Dubrova, "A Transformation from the Fibonacci to the Galois NLFSRs", IEEE Transactions on Information Theory, 55(11), 2009, pp. 5263-5271

Example
The transformation from Fibonacci to Galois is not unique. Three 4-stage NLFSRs related by it:
• $f_3 = x_1x_2 + x_0$, $f_2 = x_3 + x_0x_2$, $f_1 = x_2$, $f_0 = x_1$
• $f_3 = x_0$, $f_2 = x_3 + x_0x_1 + x_0x_2$, $f_1 = x_2$, $f_0 = x_1$
• $f_3 = x_1x_2 + x_1x_3 + x_0$, $f_2 = x_3$, $f_1 = x_2$, $f_0 = x_1$ (Fibonacci configuration)
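One way to check that the three NLFSRs above are interchangeable is to simulate them: an equivalent Galois NLFSR generates the same set of output sequences as the Fibonacci one, although the same sequence may start from a different initial state. The sketch below is a brute-force check that is only practical for a small 4-stage example; it reads "+" as XOR and products as AND (GF(2) arithmetic), and the names `GALOIS_1`, `GALOIS_2` and `all_sequences` are illustrative.

```python
from itertools import product

# Feedback functions [f0, f1, f2, f3]; stage i is loaded with f_i(x) on every
# clock and the output is taken from stage x0.
FIBONACCI = [
    lambda x: x[1],
    lambda x: x[2],
    lambda x: x[3],
    lambda x: (x[1] & x[2]) ^ (x[1] & x[3]) ^ x[0],   # f3 = x1x2 + x1x3 + x0
]
GALOIS_1 = [
    lambda x: x[1],
    lambda x: x[2],
    lambda x: x[3] ^ (x[0] & x[2]),                   # f2 = x3 + x0x2
    lambda x: (x[1] & x[2]) ^ x[0],                   # f3 = x1x2 + x0
]
GALOIS_2 = [
    lambda x: x[1],
    lambda x: x[2],
    lambda x: x[3] ^ (x[0] & x[1]) ^ (x[0] & x[2]),   # f2 = x3 + x0x1 + x0x2
    lambda x: x[0],                                   # f3 = x0
]

def output_sequence(nlfsr, state, length):
    out = []
    for _ in range(length):
        out.append(state[0])
        state = tuple(f(state) for f in nlfsr)  # all stages update simultaneously
    return tuple(out)

def all_sequences(nlfsr, length=32):
    """Set of output sequences over every one of the 2^4 initial states."""
    return {output_sequence(nlfsr, s, length) for s in product((0, 1), repeat=4)}

print(all_sequences(FIBONACCI) == all_sequences(GALOIS_1) == all_sequences(GALOIS_2))  # True
```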

Fibonacci to Galois Transformation
• Explore the design space to find the best Galois NLFSR equivalent to a given Fibonacci NLFSR
• Optimal algorithm: synthesize every possible combination and pick the best solution
  – Computationally infeasible, so we need a heuristic approach*
* J.-M. Chabloz, S. Mansouri, E. Dubrova, "An Algorithm for Constructing a Fastest Galois NLFSR Generating a Given Sequence", in Sequences and Their Applications, LNCS 6338, 2010, pp. 41-55
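For a toy register, the exhaustive "optimal algorithm" can be written down directly. The sketch below enumerates every way of moving the terms of the 4-stage Fibonacci example ($f_3 = x_0 + x_1x_2 + x_1x_3$) down into lower stages, keeps only the candidates whose output sequences match the Fibonacci NLFSR, and picks the cheapest. The cost function (largest number of terms in any feedback function) is a hypothetical stand-in for a synthesized delay, and this is not the heuristic of the cited paper. Since the number of candidates is the product of the allowed shifts per term, and each candidate would have to be synthesized, it hints at why exhaustive search does not scale to a real cipher.

```python
from itertools import product

N = 4
# Fibonacci NLFSR: f3 = x0 + x1x2 + x1x3, and f_i = x_{i+1} for i < 3.
# A term is a tuple of stage indices that are ANDed; a function XORs its terms.
FIB_F3 = [(0,), (1, 2), (1, 3)]

def build(shifts):
    """Move term k of f3 down by shifts[k] stages, decrementing its indices."""
    funcs = [[(i + 1,)] for i in range(N - 1)] + [[]]
    for term, s in zip(FIB_F3, shifts):
        funcs[N - 1 - s].append(tuple(j - s for j in term))
    return funcs

def evaluate(terms, x):
    value = 0
    for term in terms:
        p = 1
        for j in term:
            p &= x[j]
        value ^= p
    return value

def sequences(funcs, length=32):
    """Set of output sequences (taken from x0) over all 2^N initial states."""
    result = set()
    for state in product((0, 1), repeat=N):
        out = []
        for _ in range(length):
            out.append(state[0])
            state = tuple(evaluate(f, state) for f in funcs)
        result.add(tuple(out))
    return result

def cost(funcs):
    # Hypothetical proxy for delay: the largest number of terms in one function.
    return max(len(f) for f in funcs)

def exhaustive_search():
    reference = sequences(build([0] * len(FIB_F3)))
    valid = []
    # A term can move down at most min(term) stages (indices cannot go below 0).
    for shifts in product(*(range(min(t) + 1) for t in FIB_F3)):
        funcs = build(shifts)
        if sequences(funcs) == reference:   # keep only equivalent NLFSRs
            valid.append((cost(funcs), shifts, funcs))
    best = min(valid, key=lambda v: v[0])
    return valid, best

valid, best = exhaustive_search()
print(len(valid), best[1])  # 3 (0, 0, 1)
```

On this example the search keeps three candidates, matching the three NLFSRs listed in the example above, while moving only the term $x_1x_2$ gives a register that is not equivalent.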

Improvement on Dl and Dn
• The highest improvement is achieved on Dn of the bit-serial (x1) Grain-128a
Improvement of Dl (LFSR) and Dn (NLFSR):
Parallelization   x1    x2    x4    x8    x16   x32
LFSR (Dl)         60%   53%   51%   26%   18%   0%
NLFSR (Dn)        67%   54%   42%   24%   13%   0%

3. Multi-Frequency Implementation
• The critical path of every version of Grain-128a is given by Dhyn
• Although transforming the configuration of Grain-128a improves the delays Dn and Dl by up to 67%, the clock frequency of the overall cipher improves by only about 10%
• Dhyn is active only during the keystream initialization phase
• To efficiently support both the initialization and the keystream generation phases, we suggest a dual-frequency implementation of Grain-128a

Multi-Frequency Implementation
• Grain-128a phases:
  – Keystream initialization phase (Dhyn path)
  – Keystream generation phase (Dn path)
• Multi-frequency Grain-128a:
  – The cipher receives only one external clock signal (fast clk)
  – The slow clock is derived from the fast clock by a clock divider
  – The slower clock is used during the keystream initialization phase
  – The faster clock is used during the keystream generation phase
[Figure: a clock divider block supplies the slow clock for the initialization phase and the fast clock for the generation phase]
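The benefit of the divided clock can be estimated with simple arithmetic. The sketch assumes that the fast clock period is set by the generation-phase path (Dn), that the slow clock is an integer division of the fast clock long enough to cover Dhyn, and uses hypothetical delays in picoseconds; the function names are illustrative.

```python
def divider_ratio(d_init_ps, d_gen_ps):
    """Smallest integer k such that k fast periods cover the initialization path."""
    return -(-d_init_ps // d_gen_ps)  # integer ceiling division

def single_clock_time(init_cycles, gen_cycles, d_init_ps, d_gen_ps):
    period = max(d_init_ps, d_gen_ps)  # one clock must satisfy both phases
    return (init_cycles + gen_cycles) * period

def dual_clock_time(init_cycles, gen_cycles, d_init_ps, d_gen_ps):
    fast = d_gen_ps
    slow = divider_ratio(d_init_ps, d_gen_ps) * fast
    return init_cycles * slow + gen_cycles * fast

# Hypothetical delays (ps): Dhyn = 1000 during initialization, Dn = 600 during generation.
print(single_clock_time(256, 10_000, 1000, 600))  # 10256000
print(dual_clock_time(256, 10_000, 1000, 600))    # 6307200
```

Using integer picoseconds keeps the ceiling division exact; with floating-point periods, a ratio such as 1.2 / 0.6 can round slightly above 2 and select a needlessly slow clock.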

4. Internal Pipelining
• The h/y function is pipelined during the keystream generation phase
• Advantage:
  – The cipher frequency is improved during the keystream generation phase
• Disadvantages:
  – Overhead of the pipeline flip-flops
  – Increases the latency by a fixed number of cycles during the keystream generation phase
[Figure: the Dhy, Dhya and Dhyn paths through the h/y function]
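A minimal sketch of why pipelining the output function costs only latency: splitting it into two registered stages produces the same bit stream, delayed by one cycle. `h_part` and `linear_part` are hypothetical stand-ins, not the actual Grain-128a h and y functions. During initialization the y output is fed back into the NLFSR and LFSR (the Dhyn path), so an extra register there would change the state update, which is presumably why the pipeline is used only in the keystream generation phase.

```python
import random

def h_part(x):
    """Hypothetical nonlinear part, standing in for h."""
    return (x[0] & x[1]) ^ (x[2] & x[3]) ^ (x[0] & x[4] & x[5])

def linear_part(x):
    """Hypothetical linear part, standing in for the XOR terms added in y."""
    return x[6] ^ x[7] ^ x[8]

def combinational(inputs):
    """Unpipelined output: h and the XOR are evaluated in the same cycle."""
    return [h_part(x) ^ linear_part(x) for x in inputs]

def pipelined(inputs):
    """Two-stage output: both parts are registered, then XORed in the next cycle."""
    out = []
    reg_h = reg_lin = 0                 # pipeline flip-flops, reset to 0
    for x in list(inputs) + [None]:     # one extra cycle drains the pipeline
        out.append(reg_h ^ reg_lin)     # stage 2: short path from the registers
        if x is not None:
            reg_h, reg_lin = h_part(x), linear_part(x)  # stage 1
    return out

rng = random.Random(1)
inputs = [tuple(rng.getrandbits(1) for _ in range(9)) for _ in range(100)]
print(pipelined(inputs)[1:] == combinational(inputs))  # True
```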

Throughput Improvement
Maximal improvement in frequency compared to the original design:

Grain-128a   x1    x2    x4    x8    x16   x32   x8    x16   x32
Freq.        67%   80%   65%   53%   41%   40%   32%   29%   33%
Area         0%    -5%   -7%   -10%  -12%  -23%  -1%   -5%   -13%
Power        3%    -7%   -13%  -4%   -11%  1%    -2%   1%    4%

More information about the different trade-offs can be found in the paper.

Conclusion
• High throughput improvement
• Limited area/power impact
• Techniques compatible with the standard ASIC flow
• Some techniques can be applied to other ciphers

Thank you for your attention. Questions?
F2G: http://web.it.kth.se/~dubrova/fib2gal.html
