High Performance ECC over NIST Primes on Commercial FPGAs
ECC 2008, Utrecht, September 22-24, 2008
Tim Güneysu
Horst Görtz Institute for IT-Security, Ruhr University of Bochum, Germany
Agenda
Introduction and Motivation
Brief
Application Specific Integrated Circuit (ASIC)
Field Programmable Gate Arrays (FPGA)
Arithmetic Logic Unit Supporting High-performance RSA and ECC over GF(p), International Journal of Electronics 2007.
[Figure: Generic FPGA structure. A ring of I/O blocks surrounds an array of CLBs. Each CLB contains a switch matrix and four slices (Slice 0 to Slice 3) with carry chains (CIN/COUT), a shift chain (SHIFTIN/SHIFTOUT) and interconnect to neighboring CLBs. Each slice holds two 16-bit LUTs (4 inputs each) and two flip-flops (FF).]
Floorplan of a 32-bit Counting Application
[Figure: floorplan; surviving labels: P, B, A, C, FPGA, PC.]
Structure of a modern Xilinx Virtex-4 FPGA
[Figure: CLB array with I/O and clock (CLK) blocks, plus columns of 18K BRAM and DSP blocks (DSP A, DSP B). DSP block i is cascaded with DSP block i+1 ("From previous DSP" / "To next DSP"); operand inputs are 18 bits wide, accumulator and cascade paths are 48 bits wide.]
Selected (most relevant) primes standardized by NIST:
P-224: $2^{224} - 2^{96} + 1$
P-256: $2^{256} - 2^{224} + 2^{192} + 2^{96} - 1$
Weierstraß equation for projective Jacobian-Chudnovsky coordinates: $Y^2 = X^3 + aXZ^4 + bZ^6 \bmod P$
Chudnovsky point representation: $P = (X, Y, Z, Z^2, Z^3)$
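The two primes and the curve equation in the five-coordinate representation can be checked with a few lines of Python. This is only an illustrative sketch: the helper names (`to_chudnovsky`, `on_curve`) and the toy point are assumptions made here, not taken from the slides. The toy curve uses a = -3 and derives b so that the chosen affine point lies on it; it is not a standardized NIST curve.

```python
# NIST primes as given on the slide.
P224 = 2**224 - 2**96 + 1
P256 = 2**256 - 2**224 + 2**192 + 2**96 - 1

def to_chudnovsky(x, y, z, p):
    """Lift affine (x, y) to (X, Y, Z, Z^2, Z^3) with X = x*Z^2, Y = y*Z^3."""
    z2 = z * z % p
    z3 = z2 * z % p
    return (x * z2 % p, y * z3 % p, z % p, z2, z3)

def on_curve(pt, a, b, p):
    """Check Y^2 = X^3 + a*X*Z^4 + b*Z^6 (mod p) using the cached Z^2 and Z^3."""
    X, Y, Z, Z2, Z3 = pt
    lhs = Y * Y % p
    rhs = (X * X * X + a * X * Z2 * Z2 + b * Z3 * Z3) % p
    return lhs == rhs

# Toy example (assumption): pick x, y, a freely and derive b from the affine equation.
p = P256
a = p - 3
x, y = 5, 7
b = (y * y - x**3 - a * x) % p

pt = to_chudnovsky(x, y, z=0x1234567, p=p)
print(on_curve(pt, a, b, p))
```

Carrying Z^2 and Z^3 alongside (X, Y, Z) costs two extra field elements per point but lets the curve equation, and the point addition formulas, reuse those powers instead of recomputing them.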
Standard multiplication A x B in product scanning form with a single ℓ-bit multiplier: n^2 = 16 cycles
Parallel Comba multiplication of A x B using the MACC function of n DSPs: 4 DSP units, n = 4 cycles
Partial products by result column (n = 4):
s0: a0b0
s1: a1b0 a0b1
s2: a2b0 a1b1 a0b2
s3: a3b0 a2b1 a1b2 a0b3
s4: a3b1 a2b2 a1b3
s5: a3b2 a2b3
s6: a3b3
[Figure: in the parallel variant the same 16 partial products are distributed over DSP #1 to DSP #4, four per DSP, and an ACCUMULATOR combines the column sums s0 ... s6.]
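The two multiplication schedules above can be compared in a small software model. The sketch below is an assumption-laden illustration, not the hardware design from the slides: it assumes 16-bit words, and `parallel_columns` uses one possible assignment (unit j handles columns j and j + n) that gives every unit exactly n multiply-accumulate steps.

```python
W = 16  # word size in bits (assumed to match the 16-bit multiplier words)

def to_words(x, n, w=W):
    """Split x into n little-endian w-bit words."""
    return [(x >> (w * i)) & ((1 << w) - 1) for i in range(n)]

def product_scanning(a, b, w=W):
    """Column-wise (Comba) multiplication with a single multiplier.
    Returns (result words, number of multiplications)."""
    n = len(a)
    acc, out, mults = 0, [], 0
    for k in range(2 * n - 1):
        for i in range(max(0, k - n + 1), min(k, n - 1) + 1):
            acc += a[i] * b[k - i]
            mults += 1
        out.append(acc & ((1 << w) - 1))  # emit one result word per column
        acc >>= w                         # keep the carry for the next column
    out.append(acc)
    return out, mults

def parallel_columns(a, b):
    """Hypothetical schedule: unit j accumulates columns j and j + n.
    Returns (column sums s_0 .. s_{2n-2}, MACC cycles used per unit)."""
    n = len(a)
    s = [0] * (2 * n - 1)
    cycles = [0] * n
    for j in range(n):
        for k in (j, j + n):
            if k > 2 * n - 2:
                continue
            for i in range(max(0, k - n + 1), min(k, n - 1) + 1):
                s[k] += a[i] * b[k - i]
                cycles[j] += 1
    return s, cycles

def accumulate(s, w=W):
    """Add the column sums with their word offsets (the accumulator's job)."""
    return sum(sk << (w * k) for k, sk in enumerate(s))

A, B = 0x0123456789ABCDEF, 0xFEDCBA9876543210
a, b = to_words(A, 4), to_words(B, 4)
words, mults = product_scanning(a, b)
s, cycles = parallel_columns(a, b)
print(mults, cycles)
```

With n = 4 the sequential loop performs n^2 = 16 multiplications, while each of the four modeled units performs 4, which mirrors the 16-cycle versus 4-cycle figures on the slide.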
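[Figure: Parallel multiplier architecture. Partial Product Unit: operand words a0 ... an-1 and b0 ... bn-1 feed n DSP multiply-accumulate blocks whose outputs pass through a registered n-to-1 multiplexer. Accumulator Unit: a DSP-based adder with delay (cDELAY) and CARRY feedback that adds the 2ℓm-bit partial sums into an ℓACC-bit accumulator and outputs result words ci. Word sizes: ℓm = 16 bit, ℓACC = 36 bit. A second diagram shows how the ℓm-bit words align within the ℓACC-bit result range.]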
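[Figure: Modular reduction. The product words ci, ci+1, ..., ci+k enter a Reduction Chain of DSP-based adders and adder/subtractors (+/-) on 2ℓm-bit words, producing intermediate words rj. A Correction Step, driven by Look Ahead Logic and a ROM holding the multiples 1P, 2P, ... (control signal CTL), yields the reduced result c. Word size: ℓm = 16 bit.]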
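A reduction chain followed by a correction step is the usual shape of fast reduction modulo the NIST primes. The sketch below shows the standard word-wise reduction for P-224, derived from 2^224 ≡ 2^96 - 1 (mod P), as a software stand-in. It assumes 32-bit words, whereas the slide's datapath uses 16-bit words, and it replaces the look-ahead logic and ROM of multiples with simple loops, so it should be read as an illustration of the arithmetic rather than of the hardware.

```python
import random

P224 = 2**224 - 2**96 + 1

def words32(c, n):
    """Split c into n little-endian 32-bit words."""
    return [(c >> (32 * i)) & 0xFFFFFFFF for i in range(n)]

def join(ws):
    """Combine 32-bit words listed most significant first."""
    v = 0
    for wd in ws:
        v = (v << 32) | wd
    return v

def reduce_p224(c):
    """Reduce a value below 2^448 modulo P-224 with additions and subtractions only."""
    assert 0 <= c < 2**448
    c = words32(c, 14)
    # Reduction chain: rearranged input words, added or subtracted.
    s1 = join([c[6], c[5], c[4], c[3], c[2], c[1], c[0]])
    s2 = join([c[10], c[9], c[8], c[7], 0, 0, 0])
    s3 = join([0, c[13], c[12], c[11], 0, 0, 0])
    s4 = join([c[13], c[12], c[11], c[10], c[9], c[8], c[7]])
    s5 = join([0, 0, 0, 0, c[13], c[12], c[11]])
    r = s1 + s2 + s3 - s4 - s5
    # Correction step: r is within a few multiples of P, so these loops
    # run at most a handful of times.
    while r < 0:
        r += P224
    while r >= P224:
        r -= P224
    return r

rng = random.Random(1)
samples = [0, 1, (P224 - 1) ** 2, 2**448 - 1] + [rng.getrandbits(448) for _ in range(100)]
print(all(reduce_p224(c) == c % P224 for c in samples))
```

Because the intermediate result stays within a small, known range of multiples of P, hardware can select the right multiple from a small table instead of iterating, which is presumably the role of the ROM entries 1P, 2P, ... in the figure.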
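[Figure: ℓA-bit adder datapath with two carry chains (CIN1, CIN2, COUT2, CARRY), a ROM supplying the prime words pi, shift registers (SR) of nA·ℓA bits, and an output multiplexer (MUX) controlled by a 1-bit flag f. Word size: ℓA = 32 bit.]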
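One common way to build a modular adder/subtractor from such a datapath is to run two word-serial carry chains side by side, one for A ± B and one that additionally applies ∓ P, and to select the valid result at the end. The sketch below models that idea with 32-bit words. The function name `mod_addsub` and the exact selection rule are assumptions for illustration; the slides do not spell out the control logic.

```python
W = 32
MASK = (1 << W) - 1
P256 = 2**256 - 2**224 + 2**192 + 2**96 - 1

def to_words(x, n):
    """Split x into n little-endian 32-bit words."""
    return [(x >> (W * i)) & MASK for i in range(n)]

def from_words(ws):
    return sum(wd << (W * i) for i, wd in enumerate(ws))

def mod_addsub(a, b, p, sub=False):
    """Word-serial (a + b) mod p or (a - b) mod p for 0 <= a, b < p."""
    c1 = 0  # carry (add) or borrow (sub) of the first chain: a +/- b
    c2 = 0  # borrow (add) or carry (sub) of the second chain: -/+ p
    r1, r2 = [], []
    for ai, bi, pi in zip(a, b, p):
        if not sub:
            t = ai + bi + c1
            c1, t = t >> W, t & MASK
            u = t - pi - c2
            c2, u = (1 if u < 0 else 0), u & MASK
        else:
            t = ai - bi - c1
            c1, t = (1 if t < 0 else 0), t & MASK
            u = t + pi + c2
            c2, u = u >> W, u & MASK
        r1.append(t)
        r2.append(u)
    if not sub:
        use_corrected = (c1 == 1) or (c2 == 0)  # a + b >= p
    else:
        use_corrected = (c1 == 1)               # a < b
    return r2 if use_corrected else r1

n = 8
p = to_words(P256, n)
x, y = P256 - 1, 5
print(from_words(mod_addsub(to_words(x, n), to_words(y, n), p)))
```

Computing both candidates in the same pass avoids a second sweep over the words; only the final carry and borrow bits decide which set of words is kept.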
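[Figure: ECC Core. A Dual Port RAM (inputs IN1, IN2; outputs OUT1, OUT2) supplies operands A and B to the Modular Multiplier and to the Modular Addition/Subtraction unit (with a SUB control input). Operand words a0 ... an-1 are ℓ bits wide. An FSM drives the control signals (CTL), and a MUX routes the unit outputs (OUT) back to the RAM. The core's external IN and OUT ports and the internal buses are 32 bits wide.]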
Tim Güneysu gueneysu@crypto.rub.de