Smashing the Implementation Records of AES S-box

slide-1
SLIDE 1

Smashing the Implementation Records of AES S-box

Arash Reyhani-Masoleh, Mostafa Taha, and Doaa Ashmawy
Western University, London, Ontario, Canada
CHES 2018

slide-2
SLIDE 2

Outline

  • Introduction.
  • Proposed AES S-box Architecture.
  • New Logic-Minimization Algorithms.
  • New GF((2^4)^2) Inversion.
  • New Exponentiation Stage.
  • New Representation of Subfield Inversion.
  • New Output Multipliers.
  • Comparisons and Concluding Remarks.

slide-6
SLIDE 6

Introduction

Timeline axis: 1998, 2001, 2005, 2010, 2015, 2016, 2018

  • First introduction of Rijndael (Rijmen & Daemen).
  • Standardizing Rijndael as the AES.
  • First implementation using tower fields (Satoh et al.).

Target small area:

  • Most compact S-box (Canright).
  • Reduce the number of gates in Canright to 115 (Boyar and Peralta).
  • Then to 113 (CMT).

Target small delay / high efficiency:

  • Most efficient S-box (Ueno et al.).
  • Reduce the depth of the S-box to 16 gates (Boyar, Find and Peralta).

In this paper, we propose:

  • 1. The most compact S-box to date.
  • 2. The most efficient S-box to date.

slide-12
SLIDE 12

Implementation Pitfalls

  • 1. Using AND gates, when NAND gates have smaller area and delay in all technology libraries.
  • 2. Using only simple gates, when compound gates (AND-OR-Invert, OR-AND-Invert) may be more efficient.
  • We improved previous designs that use AND gates into ones that use NAND/NOR gates (targeting the STM 65-nm CMOS standard library):

| S-box | Area (GEs), Original | Area (GEs), Improved | Delay (ns), Original | Delay (ns), Improved |
|---|---|---|---|---|
| Canright [Can05b] | 200 | - | 1.253 | - |
| 113-gates [Boy16] | 202 | 194 | 1.523 | 1.346 |
| Depth-16 (2012) [BP12] | 230.5 | 222 | 0.960 | 0.906 |
| Depth-16 (2017) [BFP17] | 224.5 | 216 | 0.957 | 0.912 |
| Ueno et al. [UHS+15] | 256.5 | 238 | 0.831 | 0.772 |

  • The smallest original design is Canright (200 GEs); the smallest improved design is the 113-gates circuit (194 GEs).
  • The fastest original design is Ueno et al. (0.831 ns); the fastest improved design is the improved Ueno et al. circuit (0.772 ns).

At the end, we compare only against the improved versions. Formulations of the improved designs are included in the paper.
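The first pitfall can be illustrated with a small exhaustive check. This is only a sketch of one well-known identity, not the formulations from the paper: when two AND terms feed an XOR, both ANDs can be swapped for NANDs because the two inversions cancel, and when a single AND term feeds an XOR, the XOR becomes an XNOR. The function names below are hypothetical.

```python
from itertools import product

def nand(a, b):
    """2-input NAND on single bits."""
    return 1 ^ (a & b)

def xnor(a, b):
    """2-input XNOR on single bits."""
    return 1 ^ a ^ b

def and_pair(a, b, c, d):
    """Sum of two products, written with AND gates."""
    return (a & b) ^ (c & d)

def nand_pair(a, b, c, d):
    """Same function with NAND gates: the two inversions cancel under XOR."""
    return nand(a, b) ^ nand(c, d)

def and_single(a, b, c):
    """One product added to a plain signal, written with an AND gate."""
    return (a & b) ^ c

def nand_single(a, b, c):
    """Same function with a NAND gate; the XOR turns into an XNOR."""
    return xnor(nand(a, b), c)

# Exhaustive equivalence check over all input combinations.
pair_ok = all(and_pair(*v) == nand_pair(*v) for v in product((0, 1), repeat=4))
single_ok = all(and_single(*v) == nand_single(*v) for v in product((0, 1), repeat=3))
print(pair_ok, single_ok)
```

Both checks cover every input combination, so the substitution changes the gate types but not the computed function.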

slide-14
SLIDE 14

AES S-box

  • Original S-box: g → inversion in GF(2^8) → affine transformation (× M, + h) → s.
  • Typical implementation using composite fields in normal basis: g → X → composite field inversion → X^-1 → affine transformation (× M, + h) → s.
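As a reference point for the block diagram, the original S-box (inversion in GF(2^8) followed by the affine step) can be sketched in software. This is a plain functional model using the standard AES field polynomial and affine constant, not a gate-level design; the function names are hypothetical.

```python
def gf256_mul(a, b):
    """Multiply in GF(2^8) modulo the AES polynomial x^8 + x^4 + x^3 + x + 1."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return r

def gf256_inv(a):
    """Inverse as a^254 (square-and-multiply); maps 0 to 0 as AES requires."""
    result, base, e = 1, a, 254
    while e:
        if e & 1:
            result = gf256_mul(result, base)
        base = gf256_mul(base, base)
        e >>= 1
    return result

def rotl8(x, n):
    """Rotate an 8-bit value left by n bits."""
    return ((x << n) | (x >> (8 - n))) & 0xFF

def aes_sbox(g):
    """s = M * g^-1 + h, with the matrix M expressed as XORs of rotations."""
    x = gf256_inv(g)
    return x ^ rotl8(x, 1) ^ rotl8(x, 2) ^ rotl8(x, 3) ^ rotl8(x, 4) ^ 0x63

print(hex(aes_sbox(0x00)), hex(aes_sbox(0x01)), hex(aes_sbox(0x53)))
```

A hardware design has to reproduce exactly this 8-bit to 8-bit mapping; the composite-field approach changes how the inversion is computed, not its result.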

slide-17
SLIDE 17

Proposed AES S-box Architecture

  • 12 terms are shared between the Exponentiation and Multipliers.
  • Everything is optimized by hand and by CAD tools at various abstraction levels (promoting the use of NAND/NOR and compound gates).

[Figure: block diagram g → Tin → composite field inversion → Tout → s. Bus labels: 12, 5, 10, 5 and 6, 6. Block annotations: New Logic-Minimization Algorithms (twice), New Formulations (twice), New, Improved Representations, New Multipliers.]
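The three inner stages (exponentiation, subfield inversion, output multipliers) follow the usual composite-field decomposition of an inverse. The sketch below assumes the standard identity x^-1 = x^16 · (x^17)^-1, where x^17 always lies in the subfield GF(2^4); it checks the identity numerically in the AES representation of GF(2^8) rather than in the normal-basis tower field used in the slides, and all names are hypothetical.

```python
def gf256_mul(a, b):
    """Multiply in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return r

def gf256_pow(a, e):
    """Square-and-multiply exponentiation in GF(2^8)."""
    result = 1
    while e:
        if e & 1:
            result = gf256_mul(result, a)
        a = gf256_mul(a, a)
        e >>= 1
    return result

def inverse_via_subfield(x):
    """Invert x in three steps mirroring the architecture's stages."""
    norm = gf256_pow(x, 17)             # exponentiation: lands in GF(2^4)
    assert gf256_pow(norm, 16) == norm  # subfield elements satisfy y^16 = y
    norm_inv = gf256_pow(norm, 14)      # subfield inversion: y^14 = y^-1 in GF(2^4)
    return gf256_mul(gf256_pow(x, 16), norm_inv)  # output multiplication

all_ok = all(gf256_mul(x, inverse_via_subfield(x)) == 1 for x in range(1, 256))
print(all_ok)
```

In a normal basis of GF((2^4)^2), raising to the 16th power only swaps the two 4-bit halves, which is consistent with the final stage reducing to two 4-bit multiplications that share one input.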

slide-18
SLIDE 18

Outline

  • Introduction, Motivation and Previous Work.
  • Proposed AES S-box Architecture.
  • New Logic-Minimization Algorithms.
  • New GF((2^4)^2) Inversion.
  • New Exponentiation Stage.
  • New Representation of Subfield Inversion.
  • New Output Multipliers.
  • Comparisons and Concluding Remarks.

slide-19
SLIDE 19

Logic-Minimization Algorithms

  • Implement isomorphic transformation matrices using the smallest number of gates.
  • NP-hard problem [BMP08].

[Figure: block Tin with input g; its outputs are the input representation in GF((2^4)^2) and the 12 shared terms. Bus labels: 12, 8.]

slide-25
SLIDE 25

Logic-Minimization Algorithms (cont.)

  • Implement isomorphic transformation matrices using the smallest number of gates.
  • NP-hard problem [BMP08].
  • Previous work:
      • Cancellation-free search: gates are never used to cancel out common terms; Canright [Can05b] and Paar [Paa94].
      • Heuristics (with cancellation): Normal-BP (Boyar and Peralta [BP10]):
          1. Test adding one gate.
          2. Compute the Distance to each target (assuming no sharing).
          3. Select a gate leading to the minimum average Distance; resolve ties using different methods.
          4. Add the selected gate and redo.

[Figure: the first 8 rows of Tin, annotated with steps 1 to 3 ("Compute Dist").]
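The Normal-BP loop above can be sketched as a small greedy search. This is a simplified illustration, not the authors' implementation: signals are bitmasks over the inputs, the Distance is found by brute force (fine only for tiny examples), and ties are broken by the largest Euclidean norm, one of several possible tie-breaking rules. All function names are hypothetical, and targets are assumed to be nonzero.

```python
from itertools import combinations

def dist(base, target):
    """Fewest XOR gates needed to build target from signals in base (no sharing)."""
    if target in base:
        return 0
    for k in range(2, len(base) + 1):
        for combo in combinations(base, k):
            acc = 0
            for v in combo:
                acc ^= v
            if acc == target:
                return k - 1  # XOR-ing k signals costs k - 1 gates
    raise ValueError("target is not reachable from the base signals")

def normal_bp(n_inputs, targets):
    """Greedy straight-line program: returns a list of (a, b, a ^ b) XOR gates."""
    base = [1 << i for i in range(n_inputs)]  # one bitmask per input signal
    gates = []
    while any(t not in base for t in targets):
        best = None
        for a, b in combinations(base, 2):   # step 1: test adding one gate
            cand = a ^ b
            if cand in base:
                continue
            d = [dist(base + [cand], t) for t in targets]  # step 2: distances
            # step 3: minimum sum (same as minimum average), then largest norm
            key = (sum(d), -sum(x * x for x in d))
            if best is None or key < best[0]:
                best = (key, a, b, cand)
        _, a, b, cand = best
        gates.append((a, b, cand))           # step 4: add the gate and redo
        base.append(cand)
    return gates

# Targets x0+x1 and x0+x1+x2 over three inputs: two gates suffice when shared.
example_gates = normal_bp(3, [0b011, 0b111])
print(example_gates)
```

The example shows the benefit of sharing: building the two targets independently would cost three XOR gates, while the greedy search reuses x0+x1 and needs two.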

slide-26
SLIDE 26

Logic-Minimization Algorithms (cont.)

  • Proposed Logic-Minimization Algorithms:
      • Improved-BP:
          • Test all the ties.
          • Monitor progress of the delay.
      • Shortest-Dist-First:
          • Select a gate leading to many small (short) Distances (prioritize small Distances, not the average).
          • Test all the ties and monitor the delay.
      • Focused-Search:
          • Select a gate leading to any small (short) Distance (ignore the count and search through more cases; close to exhaustive search).
          • Test all the ties and monitor the delay.

[Figure: the first 8 rows of Tin, annotated with steps 1 to 3 ("Compute Dist").]

slide-30
SLIDE 30

Logic-Minimization Algorithms (cont.)

  • Studied Tin and Tout for all possible isomorphic transformations (a total of 96 matrices).
  • The proposed algorithms consistently lead to equal or better implementations.
  • Lightweight Implementation (number of gates):

| Matrix | Optimized by CAD tools | Normal-BP | Improved-BP | Shortest-Dist-First | Focused-Search |
|---|---|---|---|---|---|
| Tin | 29 | 19 | 19 | 19 | 19 |
| Tout | 23 | 19 | 17 | 17 | 16 |

  • Fast Implementation:

| Matrix | Area (# XOR gates) | Delay (levels of XOR gates) |
|---|---|---|
| Tin | 24 | 3 |
| Tout | 21 | 3 |

slide-31
SLIDE 31

Outline

  • Introduction, Motivation and Previous Work.
  • Proposed AES S-box Architecture.
  • New Logic-Minimization Algorithms.
  • New GF((2^4)^2) Inversion.
  • New Exponentiation Stage.
  • New Representation of Subfield Inversion.
  • New Output Multipliers.
  • Comparisons and Concluding Remarks.

slide-33
SLIDE 33

New Exponentiation Stage

  • Express as one operation with closed-form equations (allows for maximum sharing).
  • Two designs: Lightweight and Fast (optimized by hand).
  • One design optimized by CAD tools.

[Figure: exponentiation block.]

slide-34
SLIDE 34

New Exponentiation Stage (cont.)

| Design | Area (GEs) | Delay (ns) |
|---|---|---|
| 1. Lightweight (optimized by-hand) | 30 | 0.103 |
| 2. Fast (optimized by-hand) | 30 | 0.091 |
| 3. Optimized by CAD tool | 29.25 | 0.100 |

[Figure: circuits of the three designs: 1. Lightweight (optimized by-hand), 2. Fast (optimized by-hand), 3. Optimized by CAD tool (used XOR3 gates).]

slide-36
SLIDE 36

New Subfield Inversion

  • Express in closed-form equations.
  • Derive 12 equivalent functions using Karnaugh maps, and optimize by-hand.
  • Optimized using CAD tools.

| Design | Area (GEs) | Delay (ns) |
|---|---|---|
| Lightweight and fast (optimized by-hand) | 36 | 0.121 |
| Optimized by CAD tools | 31 | 0.102 |

[Figure: the two circuits. The hand-optimized lightweight and fast design used NAND3 gates; the CAD-optimized design used OR-AND-Invert gates.]
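To see what such closed-form equations are derived from, the sketch below tabulates inversion in GF(2^4) and lists, for each output bit, the inputs on which that bit is 1 (the minterms one would place on a Karnaugh map). The field polynomial x^4 + x + 1 and the polynomial basis are assumptions chosen for illustration; the slides use a normal basis and a redundant representation, so the actual tables differ. All names are hypothetical.

```python
def gf16_mul(a, b, poly=0b10011):
    """Multiply in GF(2^4); poly = x^4 + x + 1 is an assumed choice."""
    r = 0
    for _ in range(4):
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & 0x10:
            a ^= poly
    return r

def gf16_inv(a):
    """Inverse as a^14; maps 0 to 0."""
    r = 1
    for _ in range(14):
        r = gf16_mul(r, a)
    return r

def minterms(bit):
    """Inputs (0..15) for which output bit `bit` of the inverse equals 1."""
    return [a for a in range(16) if (gf16_inv(a) >> bit) & 1]

inverse_table = [gf16_inv(a) for a in range(16)]
print(inverse_table)
for bit in range(4):
    print("output bit", bit, "minterms:", minterms(bit))
```

Each output bit is a 4-input Boolean function, small enough to minimize by hand, which is why several equivalent formulations can be enumerated and compared.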

slide-39
SLIDE 39

New Output Multipliers

  • Two multipliers with a common input: W = B × E and Z = A × E.
  • Input and output terms are represented as 4 bits × 4 bits → 5 bits. The reduction from 5 bits back to 4 bits is part of Tout.
  • Previous work: 4×4 → 4 [Can05b], 5×5 → 5 [NNI12], 4×5 → 5 [UHS+15].

[Figure: two multipliers producing Z and W from inputs A, B, and the common input E; both output buses are labeled 5.]

slide-43
SLIDE 43

New Output Multipliers (cont.)

  • Focus on the combined cost of the two multipliers (deploy maximum sharing).
  • Some multipliers do not allow sharing ([Mas91], [RDJ+01] and [GM16]).

[Figure: the two multipliers Z and W with inputs A, B, and the common input E. Annotations: sums bi + bj, ei + ej, ai + aj; "Used NAND3 gates"; "Part of Tin"; "Implemented once (shared)". Bus labels: 5, 6, 6, 6, 5, 4, 4, 4.]

slide-45
SLIDE 45

New Output Multipliers (cont.)

Space and time complexities of a single multiplier (D_X, D_AD, D_ND: delays of an XOR, AND, and NAND gate):

| Field | Multiplier used in | Space Complexity | Time Complexity |
|---|---|---|---|
| GF(((2^2)^2)^2) | Satoh et al. [SMTM01] | 21 XOR + 9 AND | 4 D_X + D_AD |
| GF(((2^2)^2)^2) | Canright [Can05b] | 20 XOR + 9 NAND | 4 D_X + D_ND |
| GF(((2^2)^2)^2) | Nogami et al. [NNT+10] | 21 XOR + 9 AND | 4 D_X + D_AD |
| GF((2^4)^2) | Rudra et al. [RDJ+01] | 15 XOR + 16 AND | 3 D_X + D_AD |
| GF((2^4)^2) | Gueron et al. [GM16] | 15 XOR + 16 AND | 3 D_X + D_ND |
| GF((2^4)^2) | Nekado et al. [NNI12] | 25 XOR + 10 AND | 2 D_X + D_AD |
| GF((2^4)^2) | Ueno et al. [UHS+15] | 21 XOR + 10 AND | 2 D_X + D_AD |
| GF((2^4)^2) | This work | 17 XOR + 10 NAND | 2 D_X + D_ND |

The smallest and fastest 4-bit multiplier to date among all the GF((2^4)^2) and GF(((2^2)^2)^2) multipliers.

slide-46
SLIDE 46

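New Output Multipliers (cont.)

Additional area and delay required for the multipliers:

| Design | Area (GEs) | Delay (ns) |
|---|---|---|
| Optimized by-hand | 52 | 0.099 |
| Optimized by CAD tools | 53.5 | 0.121 |

[Figure: the hand-optimized multipliers producing Z and W from E, ai, and bi. The sums bij = bi + bj and aij = ai + aj come from Tin; the sums ei + ej are formed in the multiplier stage. Bus labels: 5, 6, 6, 6, 5, 4, 4, 4.]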

slide-47
SLIDE 47

Outline

  • Introduction, Motivation and Previous Work.
  • Architecture of the Proposed AES S-box.
  • New Logic-Minimization Algorithms.
  • New GF((2^4)^2) Inversion.
  • New Exponentiation Stage.
  • New Representation of Subfield Inversion.
  • New Output Multipliers.
  • Comparisons and Concluding Remarks.

slide-50
SLIDE 50

Comparisons

At STM 65-nm CMOS standard technology library.

  • Targeting Lightweight Implementation:

| S-box | Area (GEs) | Delay (ns) | Area-Time Product |
|---|---|---|---|
| Canright [Can05b] | 200 | 1.25 | 250 |
| Improved 113-gates | 194 | 1.35 | 261.9 |
| This work (Lightweight) | 182.25 | 1.20 | 218.7 |

The smallest, fastest and most efficient Lightweight S-box.

  • Targeting Fast Implementation:

| S-box | Area (GEs) | Delay (ns) | Area-Time Product |
|---|---|---|---|
| Improved Depth-16 (2012) | 222 | 0.91 | 202.02 |
| Improved Depth-16 (2017) | 216 | 0.91 | 196.56 |
| Improved Ueno et al. | 238 | 0.77 | 183.26 |
| This work (Fast) | 208 | 0.78 | 162.24 |

The smallest, fastest and most efficient Fast S-box, as compared against the improved versions proposed in this paper.

These results come from testing more than 46 pieces of VHDL code, at various abstraction levels of the designs.
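The Area-Time Product column is simply area multiplied by delay. The short sketch below recomputes it from the area and delay figures in the two tables; the dictionary layout and names are only illustrative.

```python
# (area in GEs, delay in ns), copied from the comparison tables.
designs = {
    "Canright [Can05b]": (200, 1.25),
    "Improved 113-gates": (194, 1.35),
    "This work (Lightweight)": (182.25, 1.20),
    "Improved Depth-16 (2012)": (222, 0.91),
    "Improved Depth-16 (2017)": (216, 0.91),
    "Improved Ueno et al.": (238, 0.77),
    "This work (Fast)": (208, 0.78),
}

def area_time_product(area, delay):
    """Area-time product, rounded to two decimals as in the tables."""
    return round(area * delay, 2)

products = {name: area_time_product(a, d) for name, (a, d) in designs.items()}
for name, atp in products.items():
    print(f"{name}: {atp}")

# Design with the lowest area-time product overall.
most_efficient = min(products, key=products.get)
print(most_efficient)
```

Lower is better for this metric, since it rewards designs that are both small and fast.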

slide-54
SLIDE 54

Effect of Target Library

  • Industrial technology libraries (e.g., STM and TSMC):
      • Lightweight: used XOR3 and OAI32 → 182.25 GEs.
      • Fast: used NAND3 → 208 GEs.
  • NanGate 45 nm:
      • Lightweight: used AOI12 and OAI12 gates → 186 GEs.
      • Fast: used NAND3 → 208 GEs (no change).
  • Without using any compound gate:
      • Lightweight: 191 GEs (best previous work: 194 GEs).
      • Fast: 211 GEs (best previous work: 216 GEs).

The proposed designs are superior under any restriction by the target library.

slide-55
SLIDE 55

Concluding Remarks

  • In this paper, we proposed:
      • Two new designs for the AES S-box: Lightweight and Fast.
      • New logic-minimization heuristics.
      • New formulations for each stage of the S-box.
      • New output multipliers.
      • A design methodology for an optimum synergy between theoretical analysis and technology-assisted CAD tools.

slide-56
SLIDE 56

References

  • [Can05b] David Canright. A very compact S-box for AES. CHES 2005.
  • [Boy16] CMT: Circuit minimization team, 2016. http://www.cs.yale.edu/homes/peralta/CircuitStuff/CMT.html
  • [BP12] Joan Boyar and René Peralta. A small depth-16 circuit for the AES S-box. Information Security and Privacy Conference, SEC 2012.
  • [BFP17] Joan Boyar, Magnus Find, and René Peralta. Low-depth, low-size circuits for cryptographic applications. In Boolean Functions and their Applications, BFA 2017.
  • [UHS+15] Rei Ueno, Naofumi Homma, Yukihiro Sugawara, Yasuyuki Nogami, and Takafumi Aoki. Highly efficient GF(2^8) inversion circuit based on redundant GF arithmetic and its application to AES design. CHES 2015.
  • [BMP08] Joan Boyar, Philip Matthews, and René Peralta. On the shortest linear straight-line program for computing linear forms. Mathematical Foundations of Computer Science, MFCS 2008.
  • [Paa94] Christof Paar. Efficient VLSI architectures for bit parallel computation in Galois fields. PhD thesis, University of Duisburg-Essen, Germany, 1994.
  • [BP10] Joan Boyar and René Peralta. A new combinational logic minimization technique with applications to cryptology. Symposium on Experimental Algorithms, SEA 2010.
  • [NNI12] Kenta Nekado, Yasuyuki Nogami, and Kengo Iokibe. Very short critical path implementation of AES with direct logic gates. International Workshop on Security, IWSEC 2012.
  • [Mas91] E. D. Mastrovito. VLSI Architectures for Computation in Galois Fields. PhD thesis, Linkoping Univ., Linkoping, Sweden, 1991.
  • [RDJ+01] Atri Rudra, Pradeep K. Dubey, Charanjit S. Jutla, Vijay Kumar, Josyula R. Rao, and Pankaj Rohatgi. Efficient Rijndael encryption implementation with composite field arithmetic. CHES 2001.
  • [GM16] Shay Gueron and Sanu Mathew. Hardware implementation of AES using area-optimal polynomials for composite-field representation GF((2^4)^2) of GF(2^8). ARITH 2016.
  • [SMTM01] Akashi Satoh, Sumio Morioka, Kohji Takano, and Seiji Munetoh. A compact Rijndael hardware architecture with S-box optimization. ASIACRYPT 2001.
  • [NNT+10] Yasuyuki Nogami, Kenta Nekado, Tetsumi Toyota, Naoto Hongo, and Yoshitaka Morikawa. Mixed bases for efficient inversion in F_((2^2)^2)^2 and conversion matrices of SubBytes of AES. CHES 2010.

slide-57
SLIDE 57

Thank You, Questions?

slide-58
SLIDE 58

Logic-Minimization Algorithms

Example on Tout (each vector lists the Dist to the eight targets):

| Case | Dist | Sum(Dist) |
|---|---|---|
| Input and Dist, using the original inputs | 3 7 7 5 3 3 1 5 | |
| First, add all gates with Dist=1 | 3 6 7 5 3 2 1 5 | |
| Dist, assume using w0+w1 | 3 5 6 4 3 2 1 5 | 29 |
| Dist, assume using w0+w2 | 3 6 7 5 3 2 1 5 | 32 |
| Dist, assume using w0+w3 | 3 6 6 5 3 2 1 5 | 31 |
| Dist, assume using w0+w4 | 3 6 6 5 3 2 1 5 | 31 |

  • Normal-BP:

1. Test all the possible XOR gates that can use the previous-level gates (the inputs and (w2+w4)). That is: from (w0+w1) all the way to (z4 + (w2+w4)).
2. Select one gate that leads to [ min (sum (Dist)) ]. In case of ties, select one gate based on different tie-breaking criteria. For example, within the best gates, select the one that maximizes the Euclidean norm of Dist.

  • Improved-BP:

Similar to Normal-BP, but try all the ties, and monitor progress of the Delay.

  • Shortest-Dist-First:

Similar to Normal-BP, but select all the gates that lead to as many small numbers in the Dist as possible. If we consider the four cases above, we select all of them, because the smallest number is 2 (excluding ones) and this number (2) appears once in each case. If it appeared twice in any case, we would select that case. If the smallest number were 3, that would be the smallest Dist, and we would select the cases that lead to as many (Dist=3) entries as possible.

  • Focused-Search:

Similar to Shortest-Dist-First, but we ignore the count of (Dist=2) or (Dist=3). Here, we select all the gates that include (Dist=2) within the vector of Distances. We do not differentiate based on the count. If there is no gate that leads to Dist=2, we select all the gates that include Dist=3, and so on.
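The selection rules can be compared directly on the four Dist vectors of this example. The sketch below is one reading of the descriptions above, not the authors' code: entries equal to 1 are ignored when looking for the smallest Dist (the "excluding ones" remark), and each rule returns the list of candidate gates it would keep. Function names and the `floor` parameter are hypothetical.

```python
# Dist vectors for the four candidate gates, copied from the example.
candidates = {
    "w0+w1": [3, 5, 6, 4, 3, 2, 1, 5],
    "w0+w2": [3, 6, 7, 5, 3, 2, 1, 5],
    "w0+w3": [3, 6, 6, 5, 3, 2, 1, 5],
    "w0+w4": [3, 6, 6, 5, 3, 2, 1, 5],
}

def normal_bp_choice(cands):
    """Keep the gates with the minimum Sum(Dist)."""
    best = min(sum(d) for d in cands.values())
    return [g for g, d in cands.items() if sum(d) == best]

def smallest_open_dist(cands, floor=1):
    """Smallest Dist value above `floor` across all candidates (assumed rule)."""
    return min(x for d in cands.values() for x in d if x > floor)

def shortest_dist_first_choice(cands, floor=1):
    """Keep the gates with the most entries equal to the smallest Dist."""
    s = smallest_open_dist(cands, floor)
    top = max(d.count(s) for d in cands.values())
    return [g for g, d in cands.items() if d.count(s) == top]

def focused_search_choice(cands, floor=1):
    """Keep every gate whose Dist vector contains the smallest Dist at all."""
    s = smallest_open_dist(cands, floor)
    return [g for g, d in cands.items() if s in d]

print(normal_bp_choice(candidates))
print(shortest_dist_first_choice(candidates))
print(focused_search_choice(candidates))
```

On this data Normal-BP keeps only w0+w1 (sum 29), while the other two rules keep all four gates because Dist=2 appears exactly once in each vector, which matches the discussion in the text.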
