
New Circuit Minimization Techniques for Smaller and Faster AES SBoxes



  1. New Circuit Minimization Techniques for Smaller and Faster AES SBoxes. Alexander Maximov and Patrik Ekdahl, Ericsson Research. Patrik Ekdahl, Ericsson Research, 2019-08-26. Ericsson Internal | 2018-02-21

  2. Preliminaries.
     [Figure: AES round function datapath. 128-bit Plaintext, Roundkey 1, Mux, Registers, SubBytes, ShiftRows, MixColumns, Roundkey n, 128-bit Ciphertext.]
     - SubBytes is the only non-linear part.
     - 16 8x8 SBoxes are needed for a full implementation.
     - Forward only or combined SBox.
     - In ASICs: look-up table or gate implementation.
     What to remember:
     - New improved methods for circuit minimization.
     - New SBox architecture which improves the critical path.

  3. Preliminaries: basic flow of the AES SBox.
     [Figure: Input U → Inversion in GF(2^8) → Affine transformation (linear part xM, constant +b) → Output R.]
     Direct implementation of inversion over the Rijndael field is very complex.
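The two stages on this slide can be sketched directly in software. The block below is a reference model only, not a circuit: it assumes the standard AES field polynomial 0x11B and affine constant 0x63, and the helper names (`gf_mul`, `gf_inv`, `rotl8`, `sbox`) are chosen here for illustration.

```python
def gf_mul(a: int, b: int) -> int:
    """Multiply in GF(2^8) modulo the AES polynomial x^8 + x^4 + x^3 + x + 1."""
    result = 0
    for _ in range(8):
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a & 0x100:
            a ^= 0x11B  # reduce by the field polynomial
    return result


def gf_inv(a: int) -> int:
    """Inversion as a^254, since a^255 = 1 for a != 0; maps 0 to 0 as AES does."""
    result = 1
    for _ in range(254):
        result = gf_mul(result, a)
    return result


def rotl8(x: int, k: int) -> int:
    """Rotate an 8-bit value left by k positions."""
    return ((x << k) | (x >> (8 - k))) & 0xFF


def sbox(u: int) -> int:
    """Inversion in GF(2^8) followed by the affine transformation (xM, +b)."""
    x = gf_inv(u)
    # Linear part M written as an XOR of rotations, constant b = 0x63.
    return x ^ rotl8(x, 1) ^ rotl8(x, 2) ^ rotl8(x, 3) ^ rotl8(x, 4) ^ 0x63


assert sbox(0x00) == 0x63 and sbox(0x53) == 0xED
```

The naive exponentiation is what makes a direct hardware implementation expensive, which motivates the composite-field approaches on the following slides.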

  4. Previous work (low area). Rijmen [Rij00] proposed (based on Itoh and Tsujii [IT88]) to use a composite field and do the inversion in GF(2^4) instead.
     [Figure: 8-bit Input U → base conversion matrix → two 4-bit paths with ( )^2 v, ( )^-1 (inversion over GF(2^4)) and multipliers X → base back-conversion matrix, xM, +b → 8-bit Output R.]
     - Satoh et al. [SMT01] reduced inversion to GF(2^2).
     - Canright [Can05] investigated the importance of subfield representation.

  5. Previous work (low depth). Boyar, Peralta et al. ([BP10a, BP10b, BP12, BFP18]) used a normal basis $A = a_0 Y + a_1 Y^{16}$ and $A^{-1} = (A A^{16})^{-1} A^{16}$ (also based on Itoh and Tsujii [IT88]) to derive another implementation.
     [Figure: 8-bit Input U → ( )^17 and ( )^-1 on 4-bit paths, multiplier X → xM, +b → 8-bit Output R.]
     Several papers followed:
     - Nogami et al. [NNT+10], looking at mixed bases.
     - Ueno et al. [UHS+15], looking at redundant bases.
     - Reyhani et al. [RMTA18a,b], improving the Boyar-Peralta (BP) search algorithm.
     - Li et al. [LSL+19], incorporating depth into the BP algorithm.

  6. Previous work (low depth). Boyar, Peralta et al. ([BP10a, BP10b, BP12, BFP18]) used a normal basis $A = a_0 Y + a_1 Y^{16}$ and $A^{-1} = (A A^{16})^{-1} A^{16}$ (also based on Itoh and Tsujii [IT88]) to derive another implementation.
     [Figure: 8-bit Input U → ( )^17 and ( )^-1 on 4-bit paths, multiplier X → xM, +b → 8-bit Output R.]
     Collect all linear terms and push into two matrices.
     Several papers followed:
     - Nogami et al. [NNT+10], looking at mixed bases.
     - Ueno et al. [UHS+15], looking at redundant bases.
     - Reyhani et al. [RMTA18a,b], improving the Boyar-Peralta (BP) search algorithm.
     - Li et al. [LSL+19], incorporating depth into the BP algorithm.
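The identity $A^{-1} = (A A^{16})^{-1} A^{16}$ can be checked numerically. The sketch below works in the polynomial representation of the Rijndael field (polynomial 0x11B) rather than the normal basis used on the slide, which is an assumption made for brevity; the identity itself does not depend on the representation. The function names are illustrative.

```python
def gf_mul(a: int, b: int) -> int:
    """Multiply in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    result = 0
    for _ in range(8):
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
    return result


def gf_pow(a: int, e: int) -> int:
    """Square-and-multiply exponentiation in GF(2^8)."""
    result = 1
    while e:
        if e & 1:
            result = gf_mul(result, a)
        a = gf_mul(a, a)
        e >>= 1
    return result


def itoh_tsujii_inverse(a: int) -> int:
    """Invert a != 0 using only an inversion in the subfield GF(2^4)."""
    a16 = gf_pow(a, 16)
    a17 = gf_mul(a, a16)           # A * A^16 lies in the subfield GF(2^4)
    assert gf_pow(a17, 16) == a17  # subfield elements are fixed by x -> x^16
    sub_inv = gf_pow(a17, 14)      # b^15 = 1 in GF(2^4)*, so b^-1 = b^14
    return gf_mul(sub_inv, a16)


assert all(gf_mul(a, itoh_tsujii_inverse(a)) == 1 for a in range(1, 256))
```

The point of the construction is that the only true inversion happens on a 4-bit value; everything else is a multiplication or a linear map (raising to the power 16 is linear over GF(2)).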

  7. Architectural starting point [BP12].
     [Figure: SBox architecture with blocks Top linear, Mul-Sum, Inverse GF(2^4), 2 x Mul and Bottom linear; signals 8-bit Input U, 22-bit Q, 4-bit X, 4-bit Y, 18-bit N, 8-bit Output R. Top linear: base conversion and generation of the linear parts of the multiplications. Bottom linear: base back-conversion and the affine transformation of the AES SBox.]
     Basic problem statement: Given a binary matrix $M_{m \times n}$ and the maximum allowed depth maxD, find the circuit of depth D ≤ maxD with the minimum number of 2-input XOR gates such that it computes $Y = M \cdot X$.
     Example:
     $M = \begin{pmatrix} 1 & 0 & 1 & 1 & 1 \\ 0 & 1 & 1 & 0 & 1 \\ 1 & 1 & 0 & 1 & 1 \end{pmatrix}$
     $y_0 = x_0 + x_2 + x_3 + x_4$
     $y_1 = x_1 + x_2 + x_4$
     $y_2 = x_0 + x_1 + x_3 + x_4$
     - Additional Input Requirement (AIR): input signals may arrive with different delays $d_i$.
     - Additional Output Requirement (AOR): output signals may need to be ready earlier, $e_i \le \mathrm{maxD}$.
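To make the problem statement concrete, the sketch below checks one hand-written XOR circuit for the 3x5 example matrix and reports its gate count and depth, optionally with AIR-style input delays. The circuit in `GATES` was written for this illustration and is not taken from the slides or claimed to be optimal; `run_xor_program` and `row_mask` are hypothetical helper names.

```python
M = [
    [1, 0, 1, 1, 1],  # y0 = x0 + x2 + x3 + x4
    [0, 1, 1, 0, 1],  # y1 = x1 + x2 + x4
    [1, 1, 0, 1, 1],  # y2 = x0 + x1 + x3 + x4
]


def row_mask(row):
    """Encode a matrix row as a bitmask, bit i set when x_i is present."""
    return sum(bit << i for i, bit in enumerate(row))


def run_xor_program(n_inputs, gates, input_delays=None):
    """Evaluate a straight-line XOR program symbolically.

    Signals 0..n_inputs-1 are the inputs; each gate (a, b) appends the
    signal a XOR b. Returns the bitmask and the depth of every signal.
    """
    masks = [1 << i for i in range(n_inputs)]
    depths = list(input_delays) if input_delays else [0] * n_inputs
    for a, b in gates:
        masks.append(masks[a] ^ masks[b])
        depths.append(max(depths[a], depths[b]) + 1)
    return masks, depths


# Illustrative circuit: signals 5..10 are created in this order.
GATES = [(2, 4), (0, 3), (5, 6), (1, 5), (1, 4), (6, 9)]
OUTPUTS = [7, 8, 10]  # indices of the signals holding y0, y1, y2

masks, depths = run_xor_program(5, GATES)
assert [masks[o] for o in OUTPUTS] == [row_mask(r) for r in M]

naive_gates = sum(sum(row) - 1 for row in M)  # one XOR chain per row
print("gates:", len(GATES), "naive:", naive_gates)
print("depth:", max(depths[o] for o in OUTPUTS))
```

Passing `input_delays=[0, 0, 0, 0, 1]` models an input `x4` that arrives one gate delay late, which is the situation the AIR constraint describes.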

  8. Our contributions.
     - New techniques for minimizing the Top and Bottom matrices (area with delay constraints).
     - Introduced a probabilistic heuristic approach to the cancellation-free algorithm by Paar [Paa97].
     - New cancellation-allowed exhaustive search algorithm, based on the BP algorithm [BP10a].
     - Floating multiplexers for the combined SBox.
     - New generalization of the BP algorithm, allowing other types of gates.
     - New metrics, with lots of speed-up tricks for the distance function.
     - Stack algorithm with a search tree.
     - New architecture that removes the Bottom matrix and reduces the overall depth.
     - New circuit for the inverse operation.
     - Additional transformation matrices.

  9. Combined SBox with multiplexers.
     [Figure: Input U → Top Forward and Top Inverse → Mux → Common part → Bottom Forward and Bottom Inverse → Mux → Output R.]

  10. Combined SBox with multiplexers.
     [Figure: Input U → Top Forward and Top Inverse (outputs $Y^F$ and $Y^I$) → Mux → Y → Common part → Bottom Forward and Bottom Inverse → Mux → Output R.]
     Example, with input X: $Y^F = X_0 + X_1$ and $Y^I = X_0 + X_2$, so $Y = \mathrm{MUX}(\mathrm{select}, X_0 + X_1, X_0 + X_2)$.
     Replace with: $Y = \mathrm{MUX}(\mathrm{select}, X_1, X_2) + X_0$.
     Generally: $Y = A + \mathrm{MUX}(\mathrm{select}, B, C) \rightarrow Y = (A + \Delta) + \mathrm{MUX}(\mathrm{select}, B + \Delta, C + \Delta)$.
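The rewriting rules above can be confirmed by brute force over all input bits. In this sketch `mux(select, a, b)` returns `a` when `select` is 1, which is an assumed convention; the identities hold under the opposite convention as well.

```python
from itertools import product


def mux(select: int, a: int, b: int) -> int:
    """2-to-1 multiplexer on single bits (assumed: select = 1 picks a)."""
    return a if select else b


def example_original(select, x0, x1, x2):
    return mux(select, x0 ^ x1, x0 ^ x2)  # two XORs feed the mux


def example_floated(select, x0, x1, x2):
    return mux(select, x1, x2) ^ x0       # the mux "floats" above one XOR


def general_original(select, a, b, c, delta):
    return a ^ mux(select, b, c)


def general_floated(select, a, b, c, delta):
    return (a ^ delta) ^ mux(select, b ^ delta, c ^ delta)


for bits in product((0, 1), repeat=4):
    assert example_original(*bits) == example_floated(*bits)
for bits in product((0, 1), repeat=5):
    assert general_original(*bits) == general_floated(*bits)
```

The general form shows why the position of the multiplexer is a free parameter: any common term Δ can be moved across it, so the search can place the mux wherever the surrounding XOR network becomes cheapest.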

  11. Boyar-Peralta algorithm [BP10a].
     - Notion of a "point". In the original algorithm, this is a linear combination of input signals. Set of gates used: G = {XOR}.
     - Base set of known points S: $S_0 = \{x_0, x_1, \ldots, x_4\} = \{(1,0,0,0,0), (0,1,0,0,0), \ldots, (0,0,0,0,1)\}$.
     - Set of target points T, the rows $y_i$ of M: $\begin{pmatrix} y_0 \\ y_1 \\ y_2 \end{pmatrix} = \begin{pmatrix} 1 & 0 & 1 & 1 & 1 \\ 0 & 1 & 1 & 0 & 1 \\ 1 & 1 & 0 & 1 & 1 \end{pmatrix}$.
     - Metric using a distance function $\delta_i(S, y_i)$, with $\Delta = (\delta_0, \delta_1, \ldots, \delta_{m-1})$.
     - Set of candidates C. Try all base pairs $(s_i, s_j)$ in $S_r$ and form a candidate $c = g(s_i, s_j)$, in this case $c = s_i + s_j$.
       • Calculate the new distance vector $\Delta$ based on $S_r \cup \{c\}$.
       • Save the candidate $c$ that gives the lowest distance: $S_{r+1} = S_r \cup \{c\}$.
       • Repeat until the distance vector is all-zero.
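The loop described above can be sketched for the small 3x5 example. This is a simplified illustration, not the algorithm of [BP10a] or of the presentation: the distance is found by brute force over subsets, ties between candidates are broken by enumeration order rather than by a secondary metric, and depth constraints are ignored. All names are hypothetical.

```python
from itertools import combinations


def distance(known, target):
    """Minimum number of XOR gates needed to form `target` from `known`.

    Equals the size of the smallest subset of `known` that sums to
    `target`, minus one. Brute force, so only usable for tiny examples.
    """
    for k in range(1, len(known) + 1):
        for combo in combinations(known, k):
            acc = 0
            for point in combo:
                acc ^= point
            if acc == target:
                return k - 1
    raise ValueError("target is not in the span of the known points")


def bp_greedy(n_inputs, targets):
    """Greedy candidate selection by the sum of the distance vector."""
    known = [1 << i for i in range(n_inputs)]  # S_0: the unit vectors
    gates = []
    while any(t not in known for t in targets):
        best = None
        for i, j in combinations(range(len(known)), 2):
            cand = known[i] ^ known[j]         # c = s_i + s_j
            if cand in known:
                continue
            total = sum(distance(known + [cand], t) for t in targets)
            if best is None or total < best[0]:
                best = (total, i, j, cand)
        _, i, j, cand = best
        known.append(cand)                     # S_{r+1} = S_r ∪ {c}
        gates.append((i, j))
    return known, gates


# Rows of the example matrix as bitmasks (bit i corresponds to x_i).
TARGETS = [0b11101, 0b10110, 0b11011]
known, gates = bp_greedy(5, TARGETS)
print(len(gates), "XOR gates:", gates)
```

Each accepted candidate lowers the summed distance by at least one, so the loop terminates and never uses more gates than the naive one-chain-per-row circuit.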

  12. BP for Linear Circuits with Floating Multiplexers.
     - Include MUX, NMUX in the set of gates. The six gates: MUX(v,w), MUX(w,v), NMUX(v,w), NMUX(w,v), XOR(v,w), XNOR(v,w).
     - A point is now a tuple $p = (F, I)$.
     - F and I are linear combinations of input signals.
     - Translated into $\mathrm{MUX}(ZF, F \cdot X, I \cdot X)$.
     - Input points $X_k = (2^k, 2^k)$, $k = 0, \ldots, n-1$.
     - Target points $Y_k = (Y_k^F, Y_k^I)$, $k = 0, \ldots, m-1$.
     - Improved metrics and new algorithm (with lots of speed-up) to calculate $\delta_i(S, y_i \mid \mathrm{maxD})$.
     - We keep track of AIR and AOR at each stage.
     - For the full affine transformation, define the point as $p = (f, F, i, I) \rightarrow \mathrm{MUX}(ZF, F \cdot X + f, I \cdot X + i)$.

  13. BP for any Nonlinear Circuit.
     - Allow all kinds of gates in G (XOR, AND, MUX, ...; 2-input, 3-input, ...).
     - A point is now the truth table of a Boolean function.
     - Combine points using truth tables and gate functionality.
     - Target points are the truth tables of every output signal of the nonlinear block.
     - Applicable to circuits of at most 4-5 input signals; the number of output signals is not limited.
     - Used to derive a smaller inversion circuit over GF(2^4).
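The truth-table representation of points can be sketched as follows. The search here is a plain iterative-deepening enumeration over a small assumed gate set {AND, OR, XOR} with three inputs and one target (the majority function); it only illustrates how points are combined through gate functions and is not the distance-guided search from the presentation.

```python
from itertools import combinations

N = 3  # number of input signals; truth tables are 2^N-bit integers

GATES = {
    "AND": lambda a, b: a & b,
    "OR": lambda a, b: a | b,
    "XOR": lambda a, b: a ^ b,
}


def input_tables(n):
    """Truth table of input k: bit r holds the value of x_k on row r."""
    return [sum(((r >> k) & 1) << r for r in range(1 << n)) for k in range(n)]


def search(points, target, budget, circuit=()):
    """Depth-first search for a circuit of at most `budget` gates."""
    if target in points:
        return circuit
    if budget == 0:
        return None
    for i, j in combinations(range(len(points)), 2):
        for name, fn in GATES.items():
            new = fn(points[i], points[j])
            if new in points:
                continue  # a gate that recreates a known point is useless
            found = search(points + [new], target, budget - 1,
                           circuit + ((name, i, j),))
            if found is not None:
                return found
    return None


def shortest(target, max_gates=5):
    """Iterative deepening: first hit uses the fewest gates in this gate set."""
    for budget in range(max_gates + 1):
        circuit = search(input_tables(N), target, budget)
        if circuit is not None:
            return circuit
    return None


def evaluate(circuit):
    """Replay a circuit on truth tables and return the last signal."""
    points = input_tables(N)
    for name, i, j in circuit:
        points.append(GATES[name](points[i], points[j]))
    return points[-1]


a, b, c = input_tables(N)
MAJORITY = (a & b) | (a & c) | (b & c)
circuit = shortest(MAJORITY)
assert evaluate(circuit) == MAJORITY
print(circuit)
```

The search space grows very quickly with the number of inputs and gates, which is consistent with the slide's remark that the method applies to blocks with at most 4-5 input signals.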

  14. Search Tree.
     [Figure: search tree with nodes $S_r$, $S_{r+1}$, $S_{r+2}$, $S_{r+3}$ and leaves $S_{r+TD}$; 20-50 children per node, ~400 total children.]
     - Try to keep leaves from as many different branches as possible.

  15. Search Tree.
     [Figure: search tree with nodes $S_r$, $S_{r+1}$, $S_{r+2}$, $S_{r+3}$ and leaves $S_{r+TD}$; TD marks the tree depth.]
     - Try to keep leaves from as many different branches as possible.

  16. New architecture for lower depths.
     [Figure, Architecture A: 8-bit Input U, Top linear, 18-bit Q, Mul-Sum, 4-bit X, Inverse GF(2^4), 4-bit Y, 2xMul, 18-bit N, Bottom linear, 8-bit output R.]
     [Figure, Architecture D: 4-bit Y, 32-bit L, 32nand2 + 8xor4, 8-bit output R.]
     The Bottom matrix only depends on the multiplication of the 4-bit signal Y with some linear combination of the input signal U:
     $R = Y_0 \cdot M_0 \cdot U + \cdots + Y_3 \cdot M_3 \cdot U$
     where $M_i$ is an 8x8 matrix to be scalar multiplied by the $Y_i$ bit. Calculate the $M_i \cdot U$ terms in parallel in the Top matrix. Assembling requires 56 gates (32 NAND, 24 XOR).
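One way to read the "32nand2 + 8xor4" assembly is sketched below: each output bit is an XOR4 of four NAND2 gates, and because the number of inverted terms is even, the inversions cancel and the result equals the XOR of the four AND products. This reading of the figure is an assumption, as are the function names and the byte values used for the check; `l` stands for the four precomputed 8-bit terms $M_i \cdot U$ (the 32-bit signal L).

```python
from itertools import product


def nand2(a: int, b: int) -> int:
    return 1 ^ (a & b)


def xor4(a: int, b: int, c: int, d: int) -> int:
    return a ^ b ^ c ^ d


def assemble_reference(y, l):
    """R = Y_0*L_0 + Y_1*L_1 + Y_2*L_2 + Y_3*L_3 over GF(2), L_i as bytes."""
    r = 0
    for y_i, l_i in zip(y, l):
        if y_i:
            r ^= l_i
    return r


def assemble_nand_xor4(y, l):
    """Same result from 4 NAND2 gates and one XOR4 gate per output bit."""
    r = 0
    for bit in range(8):
        terms = [nand2(y_i, (l_i >> bit) & 1) for y_i, l_i in zip(y, l)]
        r |= xor4(*terms) << bit  # four inversions cancel under XOR
    return r


NAND2_COUNT = 8 * 4      # 32 NAND2 gates
XOR2_EQUIVALENT = 8 * 3  # each XOR4 counted as three 2-input XORs

for l in [(0x1B, 0xA7, 0x5C, 0xF0), (0xFF, 0x00, 0x3C, 0x81)]:
    for y in product((0, 1), repeat=4):
        assert assemble_nand_xor4(y, l) == assemble_reference(y, l)
assert NAND2_COUNT + XOR2_EQUIVALENT == 56
```

Counting each XOR4 as three 2-input XORs reproduces the 56 gates (32 NAND, 24 XOR) quoted on the slide.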

  17. New circuit for the inversion in GF(2^4).
     $Y_0 = X_1 X_2 X_3 + X_0 X_2 + X_1 X_2 + X_2 + X_3$
     $Y_1 = X_0 X_2 X_3 + X_0 X_2 + X_1 X_2 + X_1 X_3 + X_3$
     $Y_2 = X_0 X_1 X_3 + X_0 X_2 + X_0 X_3 + X_0 + X_1$
     $Y_3 = X_0 X_1 X_2 + X_0 X_2 + X_0 X_3 + X_1 X_3 + X_1$
     - In [BP12] they found a circuit of 17 gates and depth 4 (with base gates {AND, XOR}).
     - By applying the BP algorithm for general non-linear circuits, we managed to achieve 9 gates and depth 3:
       T0 = NAND(X0, X2)
       T1 = NOR(X1, X3)
       T2 = XNOR(T0, T1)
       T3 = MUX(X1, X2, 1)
       T4 = MUX(X3, X0, 1)
       Y0 = MUX(X2, T2, X3)
       Y1 = MUX(T2, X3, T3)
       Y2 = MUX(X0, T2, X1)
       Y3 = MUX(T2, X1, T4)
     We also found a small conventional (no MUXes) circuit of 15 gates and depth 3.
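The 9-gate circuit can be checked against the four output equations by exhaustive evaluation. The sketch assumes that `MUX(s, a, b)` returns `a` when `s` is 1 and `b` otherwise; that convention is not stated on the slide, but it is the one under which the circuit reproduces the equations.

```python
from itertools import product


def mux(s: int, a: int, b: int) -> int:
    """Assumed convention: MUX(s, a, b) = a if s == 1 else b."""
    return a if s else b


def inv_equations(x0, x1, x2, x3):
    """The four output equations, with + as XOR and products as AND."""
    y0 = (x1 & x2 & x3) ^ (x0 & x2) ^ (x1 & x2) ^ x2 ^ x3
    y1 = (x0 & x2 & x3) ^ (x0 & x2) ^ (x1 & x2) ^ (x1 & x3) ^ x3
    y2 = (x0 & x1 & x3) ^ (x0 & x2) ^ (x0 & x3) ^ x0 ^ x1
    y3 = (x0 & x1 & x2) ^ (x0 & x2) ^ (x0 & x3) ^ (x1 & x3) ^ x1
    return y0, y1, y2, y3


def inv_circuit(x0, x1, x2, x3):
    """The 9-gate, depth-3 circuit from the slide."""
    t0 = 1 ^ (x0 & x2)      # NAND, depth 1
    t1 = 1 ^ (x1 | x3)      # NOR, depth 1
    t2 = 1 ^ (t0 ^ t1)      # XNOR, depth 2
    t3 = mux(x1, x2, 1)     # depth 1
    t4 = mux(x3, x0, 1)     # depth 1
    y0 = mux(x2, t2, x3)    # depth 3
    y1 = mux(t2, x3, t3)    # depth 3
    y2 = mux(x0, t2, x1)    # depth 3
    y3 = mux(t2, x1, t4)    # depth 3
    return y0, y1, y2, y3


for bits in product((0, 1), repeat=4):
    assert inv_circuit(*bits) == inv_equations(*bits)
```

The longest path runs through T0 or T1, then T2, then an output multiplexer, which gives the depth of 3 stated on the slide.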
