slide-1
SLIDE 1

A New Multiplicative Inverse Architecture in Normal Basis Using Novel Concurrent Serial Squaring and Multiplication

Amin Monfared, Hayssam El-Razouk and Arash Reyhani-Masoleh

Presented by: Arash Reyhani-Masoleh
Department of Electrical and Computer Engineering
Western University, London, Ontario, Canada

24th IEEE Symposium on Computer Arithmetic, 2017

slide-2
SLIDE 2

Outline

  • Motivation
  • Arithmetic operations over $GF(2^m)$ using Gaussian Normal Basis (GNB)
  • Proposed digit-level square-multiply architecture
      • It computes $A \times B^{2^e}$
      • The digits of both inputs $A$ and $B$ are entered serially
      • Denoted by Digit-Level Fully Serial-In Square-Multiply (DL-FSISM)
  • Proposed inversion architecture
      • It uses the DL-FSISM
  • ASIC implementations and comparison
  • Conclusions and future work

slide-3
SLIDE 3

Motivation: Finite Fields

  • Many applications use arithmetic operations over $GF(2^m)$
      • Cryptography: Elliptic Curve, AES
      • Error control coding
          • Reed-Solomon code
  • There are different bases to represent a field element.
      • Polynomial basis, normal basis (NB), dual basis, etc.
  • In NB, squaring is free in hardware.

slide-4
SLIDE 4

Motivation: Gaussian Normal Basis (GNB)

  • GNB over $GF(2^m)$ is a special class of NB and exists whenever $m$ is not divisible by 8.
  • GNBs have been included in IEEE and NIST standards for ECDSA.
  • Any field element $A$ can be represented as

$$A = \sum_{i=0}^{m-1} a_i \beta^{2^i},$$

    where $a_i \in \{0,1\}$ and $\{\beta, \ldots, \beta^{2^{m-1}}\}$ is a GNB over $GF(2^m)$.

  • In this paper, we consider GNB and propose new digit-level architectures for square-multiply and inversion.

slide-5
SLIDE 5

Arithmetic Operations over $GF(2^m)$ using GNB

  • Addition
      • Let $A$ and $B$ be two field elements represented in GNB.
      • The addition operation is a bit-wise XOR of the coordinates of the two inputs:

$$A + B = \sum_{i=0}^{m-1} (a_i + b_i)\,\beta^{2^i}$$

  • Squaring
      • The squaring operation is performed by a right cyclic shift of the coordinates of $A$:

$$A^2 = \sum_{i=0}^{m-1} a_i\,\beta^{2^{i+1}}$$

      • It is free in hardware if all coordinates are available in parallel.
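The two coordinate-level operations on this slide can be sketched in a few lines of Python. This is only an illustration: it assumes an element is stored as a list `a` in which `a[i]` is the coefficient of β^(2^i), and the names `nb_add` and `nb_square` are hypothetical, not taken from the slides.

```python
def nb_add(a, b):
    """Add two normal-basis coordinate vectors: coordinate-wise XOR."""
    if len(a) != len(b):
        raise ValueError("operands must have the same length m")
    return [x ^ y for x, y in zip(a, b)]


def nb_square(a):
    """Square in a normal basis: coordinate i moves to position i + 1 (mod m)."""
    return a[-1:] + a[:-1]


a = [1, 0, 1, 1, 0]      # toy element with m = 5 (assumed size)
b = [0, 0, 1, 0, 1]
print(nb_add(a, b))      # [1, 0, 0, 1, 1]
print(nb_square(a))      # [0, 1, 0, 1, 1]
```

Applying `nb_square` m times returns the original vector, which mirrors the field identity A^(2^m) = A.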

slide-6
SLIDE 6

Arithmetic Operations using GNB: Multiplication

  • Finite field multiplication is more complex than addition and squaring.
  • Multiplication can be implemented in digit-level architectures, in which the digit size can be chosen based on the available resources.
  • In this paper, we have used two different types of digit-level multipliers, namely:
      • Digit-Level Parallel-In Serial-Out (DL-PISO)
      • Digit-Level Parallel-In Parallel-Out (DL-PIPO)
  • Also, we have proposed a new multiplier/squarer architecture:
      • Digit-Level Fully Serial-In Square-Multiply (DL-FSISM).

slide-7
SLIDE 7

Arithmetic Operations using GNB: Inversion

  • Based on Fermat's Little Theorem, an inversion can be calculated by
      • $A^{-1} = A^{2^m-2} \in GF(2^m)$, $A \neq 0$.
  • In the Itoh and Tsujii algorithm (ITA) [4], the number of multiplications is reduced by decomposing $2^{m-1}-1$.
  • As an example, for the NIST-recommended field $GF(2^{233})$:

$$2^{232}-1 = (1+2)(1+2^2)(1+2^4)\Big(1+2^8(1+2^8)(1+2^{16})\big(1+2^{32}(1+2^{32})(1+2^{64}(1+2^{64}))\big)\Big)$$

  • The inversion using ITA takes a total of 10 iterations.
  • Each iteration consists of one single digit-level parallel-in parallel-out (DL-PIPO) multiplication and one free squaring.

[4] T. Itoh and S. Tsujii, "A fast algorithm for computing multiplicative inverses in GF(2^m) using normal bases," Information and Computation, vol. 78, no. 3, pp. 171–177, 1988.
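The nested factorisation of 2^232 − 1 quoted for GF(2^233) can be checked with ordinary Python integers. The sketch below only verifies the arithmetic identity and counts its factors; it does not perform any field arithmetic, and the names `chain_232` and `FACTOR_SHIFTS` are hypothetical.

```python
def chain_232():
    """Evaluate the nested factorisation of 2^232 - 1, innermost factor first."""
    t = 1 + 2**64 * (1 + 2**64)
    t = 1 + 2**32 * (1 + 2**32) * t
    t = 1 + 2**8 * (1 + 2**8) * (1 + 2**16) * t
    return (1 + 2) * (1 + 2**2) * (1 + 2**4) * t


# Exponents s of the ten factors of the form (1 + 2^s ...); each factor
# corresponds to one field multiplication in the ITA.
FACTOR_SHIFTS = [1, 2, 4, 8, 8, 16, 32, 32, 64, 64]

assert chain_232() == 2**232 - 1
print(len(FACTOR_SHIFTS))   # 10
```

The ten factors match the ten iterations stated on the slide. The inverse itself is then one further squaring away, since 2^233 − 2 = 2 · (2^232 − 1).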

slide-8
SLIDE 8

Arithmetic Operations using GNB: Inversion (cont'd)

  • Our inversion flow diagram (based on ITA) interleaves the computations of a digit-level parallel-in serial-out (DL-PISO) multiplier and our new DL-FSISM architecture.
  • It only needs a total of 5 iterations.
  • Each iteration consists of two single multiplications (and squarings).
  • In this paper, we propose a new digit-level fully serial-in parallel-out square-multiply (DL-FSISM) architecture which performs concurrent squaring and multiplication without introducing any delay.

slide-9
SLIDE 9

Proposed Digit-Level Fully Serial-In Square-Multiply (DL-FSISM)

  • Let $A$ and $B$ be field elements and $e$ be an integer.
  • The proposed scheme reads the inputs $A$ and $B$ serially, digit by digit, and concurrently computes $F = A \times B^{2^e}$.
  • The composite operations of squaring and multiplication are performed concurrently without introducing any additional delay.
  • For a digit size of $d$ bits, it takes $\lceil m/d \rceil$ clock cycles to generate the result $F = A \times B^{2^e}$.

slide-10
SLIDE 10

Proposed DL-FSISM: Key Formulation

Proposition 1: Let $A$ and $B$ be two $GF(2^m)$ elements that are represented in the GNB $\{\beta, \ldots, \beta^{2^{m-1}}\}$. One can compute $F = AB^{2^e}$ by proceeding from $i = 0$ to $k-1$; the result $F = F^{(k-1)} = A^{(k-1)}\left(B^{(k-1)}\right)^{2^e}$ is obtained using the following recurrence relation

$$F^{(i)} = \left(F^{(i-1)}\right)^{2^d} + \sum_{j=0}^{d-1} \delta_j\!\left(a_{d(k-1-i)+j},\ \left(B^{(i)}\right)^{2^e}\right) + \left(\sum_{j=0}^{d-1} \delta_j\!\left(b_{d(k-1-i)+j},\ \left(A^{(i-1)}\right)^{2^{d-e}}\right)\right)^{2^e}$$

where $\delta_j(u, V) = uV\beta^{2^j}$, $u \in \{0,1\}$ and $V = \sum_{l=0}^{m-1} v_l \beta^{2^l} \in GF(2^m)$.
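The recurrence of Proposition 1 can be exercised numerically on a toy field. The sketch below is an illustration under several assumptions that are not in the slides: a small field GF(2^7) built from the polynomial x^7 + x + 1, field arithmetic done in polynomial basis purely as a reference model, a normal element found by brute force (a normal basis, not necessarily a Gaussian one), k = ⌈m/d⌉ digits with zero padding of the top digit, and exponents such as 2^(d−e) reduced modulo m. All function names are hypothetical.

```python
import random

M = 7                  # toy extension degree (assumption; the slides target m = 233)
POLY = 0b10000011      # x^7 + x + 1, irreducible over GF(2)


def gf_mul(x, y):
    """Multiply two GF(2^M) elements stored as polynomial-basis bit masks."""
    r = 0
    while y:
        if y & 1:
            r ^= x
        y >>= 1
        x <<= 1
        if x >> M:
            x ^= POLY
    return r


def frob(x, t):
    """Return x^(2^t); t is reduced modulo M because x^(2^M) = x."""
    for _ in range(t % M):
        x = gf_mul(x, x)
    return x


def rank_gf2(vectors):
    """Rank over GF(2) of a list of bit-mask vectors (XOR basis)."""
    basis = []
    for v in vectors:
        for b in basis:
            v = min(v, v ^ b)
        if v:
            basis.append(v)
    return len(basis)


def find_normal_element():
    """Brute-force a beta whose conjugates beta^(2^i) form a basis."""
    for beta in range(2, 1 << M):
        if rank_gf2([frob(beta, i) for i in range(M)]) == M:
            return beta
    raise ValueError("no normal element found")


def from_coords(bits, beta):
    """Field element sum_j bits[j] * beta^(2^j)."""
    x = 0
    for j, bit in enumerate(bits):
        if bit:
            x ^= frob(beta, j)
    return x


def fsism(a, b, d, e, beta):
    """Digit-serial A * B^(2^e), most significant digit first."""
    k = -(-M // d)                       # ceil(M / d) digits
    pad = [0] * (k * d - M)              # zero-extend the top digit
    a, b = a + pad, b + pad
    A = B = F = 0                        # running operands and result, cleared
    for i in range(k):
        lo = d * (k - 1 - i)
        a_dig = from_coords(a[lo:lo + d], beta)
        b_dig = from_coords(b[lo:lo + d], beta)
        B = frob(B, d) ^ b_dig                            # B^(i)
        term_a = gf_mul(a_dig, frob(B, e))                # a-digit times (B^(i))^(2^e)
        term_b = frob(gf_mul(b_dig, frob(A, d - e)), e)   # uses A^(i-1)
        F = frob(F, d) ^ term_a ^ term_b
        A = frob(A, d) ^ a_dig                            # A^(i)
    return F


beta = find_normal_element()
rng = random.Random(0)
for d in range(1, M + 1):
    for e in range(M):
        a = [rng.randrange(2) for _ in range(M)]
        b = [rng.randrange(2) for _ in range(M)]
        expected = gf_mul(from_coords(a, beta), frob(from_coords(b, beta), e))
        assert fsism(a, b, d, e, beta) == expected
```

The loop at the end compares the digit-serial result with a direct computation of A · B^(2^e) for every digit size d and exponent e in the toy field. A hardware design would replace `frob` by cyclic shifts of normal-basis coordinates; the polynomial-basis model here only serves to check the algebra.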

slide-11
SLIDE 11

Proposed DL-FSISM: Architecture

  • Three registers X, Y, and Z are initially cleared.
  • Digits of the inputs are entered into X and Y serially, starting from the MSB.
  • After $\lceil m/d \rceil$ clock cycles, Z contains $AB^{2^e}$.

$$F^{(i)} = \left(F^{(i-1)}\right)^{2^d} + \sum_{j=0}^{d-1} \delta_j\!\left(a_{d(k-1-i)+j},\ \left(B^{(i)}\right)^{2^e}\right) + \left(\sum_{j=0}^{d-1} \delta_j\!\left(b_{d(k-1-i)+j},\ \left(A^{(i-1)}\right)^{2^{d-e}}\right)\right)^{2^e}$$

[Figure: DL-FSISM architecture with registers <X>, <Y>, and <Z>; the output is $AB^{2^e}$.]

slide-12
SLIDE 12

Proposed DL-FSISM: Architecture (cont'd)

[Figure: DL-FSISM architecture (cont'd), showing registers <X>, <Y>, and <Z> together with the $\delta_j$ blocks.]

$$F^{(i)} = \left(F^{(i-1)}\right)^{2^d} + \sum_{j=0}^{d-1} \delta_j\!\left(a_{d(k-1-i)+j},\ \left(B^{(i)}\right)^{2^e}\right) + \left(\sum_{j=0}^{d-1} \delta_j\!\left(b_{d(k-1-i)+j},\ \left(A^{(i-1)}\right)^{2^{d-e}}\right)\right)^{2^e}$$

slide-13
SLIDE 13

Proposed Inversion Architecture

[Figure: proposed inversion architecture; the figure lists the values {2, 8, 32, 64}.]

  • The inversion core is formed by connecting the DL-PISO and the DL-FSISM in series.
  • The register file only stores results from the multipliers.
  • A $d$-bit register is added between the two multipliers to shorten the critical path.
  • Each iteration selects one of the inputs of the multiplexers and takes $\lceil m/d \rceil + 1$ clock cycles.
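As a rough illustration of the cycle count per iteration, the sketch below multiplies the stated ⌈m/d⌉ + 1 clock cycles by the number of iterations. Treating the total latency as exactly this product, with no extra cycles for loading operands or reading out the result, is an assumption of the sketch and not a figure from the slides. The function name `inversion_cycles` is hypothetical.

```python
def inversion_cycles(m, d, iterations):
    """Clock cycles if every iteration costs ceil(m / d) + 1 cycles."""
    per_iteration = -(-m // d) + 1      # integer ceiling of m / d, plus one
    return iterations * per_iteration


# m = 233 with the 5 iterations quoted for the proposed flow
for d in (1, 8, 30):
    print(d, inversion_cycles(233, d, iterations=5))
# 1 1170
# 8 155
# 30 45
```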

slide-14
SLIDE 14

Inversion Architecture Comparison (Number of Iterations)

| Architecture | Algorithm | Multiplication type | Number of iterations | m = 163 | m = 233 | m = 283 | m = 409 | m = 571 |
|---|---|---|---|---|---|---|---|---|
| [4] | ITA | 1 × single | $N_1$ | 9 | 10 | 11 | 11 | 13 |
| [7, 6] | TIT/MTIT | 1 × double | $N_2$ | 5 | 9 | 8 | 7 | 8 |
| [8] | Optimal-3 chain | 1 × double | $N_3$ | 5 | 7 | 6 | 7 | 7 |
| Proposed | ITA | 2 × single, interleaved | $\lceil N_1/2 \rceil$ | 5 | 5 | 6 | 6 | 7 |

[4] T. Itoh and S. Tsujii, "A fast algorithm for computing multiplicative inverses in GF(2^m) using normal bases," Information and Computation, vol. 78, no. 3, pp. 171–177, 1988.
[6] J. Hu, W. Guo, J. Wei, and R. Cheung, "Fast and Generic Inversion Architectures Over GF(2^m) Using Modified Itoh–Tsujii Algorithms," IEEE Transactions on Circuits and Systems II: Express Briefs, vol. 62, pp. 367–371, April 2015.
[7] R. Azarderakhsh, K. Jarvinen, and V. Dimitrov, "Fast Inversion in GF(2^m) with Normal Basis Using Hybrid-Double Multipliers," IEEE Trans. Comput., vol. 63, pp. 1041–1047, April 2014.
[8] K. Jarvinen, V. Dimitrov, and R. Azarderakhsh, "A Generalization of Addition Chains and Fast Inversions in Binary Fields," IEEE Trans. Comput., vol. 64, pp. 2421–2432, Sept. 2015.

  • Our proposed inversion architecture reduces the required number of iterations compared with previous works.
  • The best performance is achieved when $m = 233$.
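The ITA row and the proposed row of the comparison can be reproduced with a short calculation. The sketch assumes that N1 is the standard Itoh-Tsujii multiplication count, ⌊log2(m − 1)⌋ + HW(m − 1) − 1, where HW is the Hamming weight; the slides do not state this formula, but it yields the tabulated values. The function name `ita_multiplications` is hypothetical.

```python
def ita_multiplications(m):
    """Multiplications in the classic Itoh-Tsujii chain for GF(2^m)."""
    n = m - 1
    return (n.bit_length() - 1) + bin(n).count("1") - 1


for m in (163, 233, 283, 409, 571):
    n1 = ita_multiplications(m)
    print(m, n1, -(-n1 // 2))   # field size, N1, ceil(N1 / 2)
# 163 9 5
# 233 10 5
# 283 11 6
# 409 11 6
# 571 13 7
```

Because the proposed flow interleaves two single multiplications per iteration, an even N1 (as for m = 233) halves exactly, while an odd N1 leaves one slot of the last iteration unused.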
slide-15
SLIDE 15
ASIC Implementations

  • We have implemented our proposed inversion architecture, as well as the inversion architectures presented in [4] and [8], over $GF(2^{233})$ using VHDL.
  • The functionality of the architectures has been verified through simulations using ModelSim.
  • ASIC implementations of the proposed architectures were conducted using the Synopsys Design Compiler tool.
  • We have used the default settings in Design Compiler with STMicroelectronics' standard 65 nm CMOS technology libraries.
slide-16
SLIDE 16

ASIC Implementations (cont'd)

  • The proposed architecture is faster than the other two for digit sizes less than 30, while for larger digit sizes, [4] is faster.
  • The proposed architecture has better efficiency for digit sizes ≤ 8, while for digit sizes larger than 8, [4] achieves better efficiency.
slide-17
SLIDE 17

Conclusion and Future Work

  • Conclusion:
      • We have proposed a novel scheme for concurrently computing the composite square-and-multiply operation at the digit level.
      • We have proposed a new GNB field inversion architecture.
      • We have conducted ASIC implementations of different inversion schemes and shown that the proposed inversion architecture outperforms its counterparts.
  • Future Work:
      • Integrating the proposed inverter into elliptic curve cryptographic processors.
      • Evaluating the possibility of using polynomial basis representation for the proposed architectures.

slide-18
SLIDE 18

Thank You & Questions?
