Pushing the Limits of High-Speed GF(2^m) Elliptic Curve Scalar Multiplier on FPGAs
Chester Rebeiro, Sujoy Sinha Roy, and Debdeep Mukhopadhyay
Secured Embedded Architecture Lab, Indian Institute of Technology Kharagpur, India
CHES 2012, Leuven, Belgium, 9/12/2012

Elliptic Curve Scalar Multiplication
• An elliptic curve over $GF(2^m)$ is the set of points $(x, y)$ satisfying $y^2 + xy = x^3 + ax^2 + b$, where $a, b \in GF(2^m)$ and $b \neq 0$, together with the point at infinity. The points on the elliptic curve form an additive group.
• In López-Dahab projective coordinates ($x = X/Z$, $y = Y/Z^2$), the curve equation becomes $Y^2 + XYZ = X^3Z + aX^2Z^2 + bZ^4$.
• Scalar multiplication: given a base point $P = (X_P, Y_P, Z_P)$ on the elliptic curve and a scalar $s$, compute $Q = sP$, i.e., $Q = P + P + \cdots + P$ ($s$ times).
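
To make the group law and the definition of $sP$ concrete, here is a minimal Python sketch over a toy field. The field $GF(2^4)$ with reduction polynomial $z^4 + z + 1$, the coefficients $a = b = 1$, and all function names are illustrative assumptions. The formulas are the standard affine addition and doubling formulas for this curve form, and the repeated-addition loop only spells out the definition of $sP$; it is not an efficient algorithm.

```python
# Toy parameters (assumptions for illustration only, far too small for real use)
M, POLY = 4, 0b10011      # GF(2^4) with reduction polynomial z^4 + z + 1
A, B = 1, 1               # curve y^2 + xy = x^3 + A*x^2 + B, with B != 0


def gf_mul(x, y):
    """Multiply two elements of GF(2^M), reducing modulo POLY on the fly."""
    r = 0
    while y:
        if y & 1:
            r ^= x            # addition in GF(2^m) is XOR
        x <<= 1
        if x >> M:            # degree reached M: subtract (XOR) the modulus
            x ^= POLY
        y >>= 1
    return r


def gf_inv(x):
    """Inverse of a nonzero x as x^(2^M - 2) (Fermat), by square-and-multiply."""
    r, e = 1, (1 << M) - 2
    while e:
        if e & 1:
            r = gf_mul(r, x)
        x = gf_mul(x, x)
        e >>= 1
    return r


def on_curve(P):
    """True if P is the point at infinity (None) or satisfies the curve equation."""
    if P is None:
        return True
    x, y = P
    lhs = gf_mul(y, y) ^ gf_mul(x, y)
    rhs = gf_mul(gf_mul(x, x), x) ^ gf_mul(A, gf_mul(x, x)) ^ B
    return lhs == rhs


def point_add(P, Q):
    """Affine addition/doubling on the binary curve; None is the point at infinity."""
    if P is None:
        return Q
    if Q is None:
        return P
    (x1, y1), (x2, y2) = P, Q
    if x1 == x2 and y2 == x1 ^ y1:          # Q = -P (includes doubling a point with x = 0)
        return None
    if P == Q:                              # doubling: lambda = x1 + y1/x1
        lam = x1 ^ gf_mul(y1, gf_inv(x1))
        x3 = gf_mul(lam, lam) ^ lam ^ A
        y3 = gf_mul(x1, x1) ^ gf_mul(lam ^ 1, x3)
    else:                                   # addition: lambda = (y1 + y2)/(x1 + x2)
        lam = gf_mul(y1 ^ y2, gf_inv(x1 ^ x2))
        x3 = gf_mul(lam, lam) ^ lam ^ x1 ^ x2 ^ A
        y3 = gf_mul(lam, x1 ^ x3) ^ x3 ^ y1
    return (x3, y3)


def scalar_mul_naive(s, P):
    """Q = sP computed literally as P + P + ... + P (s times)."""
    Q = None
    for _ in range(s):
        Q = point_add(Q, P)
    return Q


# Brute-force point count, feasible only because the toy field has 16 elements
points = [(x, y) for x in range(1 << M) for y in range(1 << M) if on_curve((x, y))]
order = len(points) + 1                     # + the point at infinity
```

As a sanity check, multiplying any point by the group order should give the point at infinity (here `None`), since the order of every point divides the order of the group.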

Montgomery Ladder for Scalar Multiplication
Inputs: scalar $s = (s_{t-1} s_{t-2} \cdots s_1 s_0)_2$, base point $P$
Output: scalar product $Q = sP$
1. $P_1 = (X_1, Y_1, Z_1) \leftarrow P$ and $P_2 = (X_2, Y_2, Z_2) \leftarrow 2 \cdot P$
2. For each bit $s_k$ (for $k = t-2, t-3, \ldots, 0$):
• if $s_k = 1$ then $P_1 \leftarrow P_1 + P_2$; $P_2 \leftarrow 2 \cdot P_2$
• if $s_k = 0$ then $P_2 \leftarrow P_1 + P_2$; $P_1 \leftarrow 2 \cdot P_1$
3. $Q = \mathrm{Projective2Affine}(P_1)$

Performing $P_i \leftarrow P_i + P_j$ and $P_j \leftarrow 2 \cdot P_j$ (with $(i, j) = (1, 2)$ when $s_k = 1$ and $(i, j) = (2, 1)$ when $s_k = 0$):
$X_i \leftarrow X_i \cdot Z_j$; $Z_i \leftarrow X_j \cdot Z_i$; $T \leftarrow X_j$; $X_j \leftarrow X_j^4 + b \cdot Z_j^4$;
$Z_j \leftarrow (T \cdot Z_j)^2$; $T \leftarrow X_i \cdot Z_i$; $Z_i \leftarrow (X_i + Z_i)^2$; $X_i \leftarrow x \cdot Z_i + T$
All operations are in $GF(2^m)$; $x$ is the affine x-coordinate of the base point $P$.
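
The ladder and its combined add/double step can be modelled in software. The Python sketch below follows the listed operation order for $P_i \leftarrow P_i + P_j$, $P_j \leftarrow 2 \cdot P_j$, over an assumed toy field $GF(2^4)$ with $z^4 + z + 1$ and $b = 1$. It tracks only $X$ and $Z$ and returns only the affine x-coordinate of $sP$, a simplified stand-in for Projective2Affine, which would also recover $y$. The function names are hypothetical.

```python
# Toy parameters (assumptions for illustration only)
M, POLY = 4, 0b10011      # GF(2^4) with reduction polynomial z^4 + z + 1
A, B = 1, 1               # y^2 + xy = x^3 + A*x^2 + B; the x-only ladder never uses A


def gf_mul(x, y):
    """Multiply in GF(2^M), reducing modulo POLY on the fly."""
    r = 0
    while y:
        if y & 1:
            r ^= x
        x <<= 1
        if x >> M:
            x ^= POLY
        y >>= 1
    return r


def gf_sq(x):
    return gf_mul(x, x)


def gf_inv(x):
    """x^(2^M - 2) for nonzero x (Fermat's little theorem)."""
    r, e = 1, (1 << M) - 2
    while e:
        if e & 1:
            r = gf_mul(r, x)
        x = gf_sq(x)
        e >>= 1
    return r


def ladder_step(x, Xi, Zi, Xj, Zj):
    """P_i <- P_i + P_j and P_j <- 2*P_j, in the operation order listed for the ladder."""
    Xi = gf_mul(Xi, Zj)
    Zi = gf_mul(Xj, Zi)
    T = Xj
    Xj = gf_sq(gf_sq(Xj)) ^ gf_mul(B, gf_sq(gf_sq(Zj)))   # X_j^4 + b*Z_j^4
    Zj = gf_sq(gf_mul(T, Zj))                            # (T*Z_j)^2
    T = gf_mul(Xi, Zi)
    Zi = gf_sq(Xi ^ Zi)                                  # (X_i + Z_i)^2
    Xi = gf_mul(x, Zi) ^ T                               # x*Z_i + T
    return Xi, Zi, Xj, Zj


def montgomery_ladder_x(s, x):
    """Affine x-coordinate of sP for s >= 1, given x = x(P) != 0; None means sP = O."""
    X1, Z1 = x, 1                                # P1 <- P
    X2, Z2 = gf_sq(gf_sq(x)) ^ B, gf_sq(x)       # P2 <- 2P: X = x^4 + b, Z = x^2
    for k in range(s.bit_length() - 2, -1, -1):  # bits s_{t-2}, ..., s_0
        if (s >> k) & 1:
            X1, Z1, X2, Z2 = ladder_step(x, X1, Z1, X2, Z2)   # (i, j) = (1, 2)
        else:
            X2, Z2, X1, Z1 = ladder_step(x, X2, Z2, X1, Z1)   # (i, j) = (2, 1)
    if Z1 == 0:
        return None
    return gf_mul(X1, gf_inv(Z1))                # x = X1 / Z1
```

Each bit costs one addition and one doubling whatever its value; only the roles of $P_1$ and $P_2$ swap, which is what makes the ladder regular.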

Engineering the Montgomery Ladder for Scalar Multiplication
[Figure: (a) The ECC pyramid: scalar multiplication, built on elliptic curve group operations, built on finite field operations. (b) Block diagram: control unit, ROM, register bank (regbank), and arithmetic unit, with scalar input s and output sP.]
High-speed scalar multiplication on FPGAs:
• Minimize area by maximizing utilization of available resources
• Optimal pipelining
• Efficient scheduling of operations

Field Programmable Gate Arrays
• Provide the speed of hardware and the reconfigurability of software
• FPGA architecture
[Figure: (a) FPGA island: logic blocks connected through programmable connection switches and programmable routing switches. (b) Lookup table: a logic block containing a four-input LUT (inputs F1 to F4), control and carry logic (CIN, COUT), and a D flip-flop.]

LUT Utilization
• A LUT has four (or six) inputs and one output.
• It can implement any four (or six) input truth table.
• $y_1 = x_1 \oplus x_2 \oplus x_3 \oplus x_4$ requires one LUT.
• $y_2 = x_1 \oplus x_2$ still requires one LUT.
• $y_2$ therefore results in an underutilized LUT, so LUT utilization must be maximized to minimize area.
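
A simple software model illustrates the utilization argument. The sketch below packs a $k$-input truth table into a $2^k$-bit integer, counts how many LUT inputs a function actually depends on, and estimates the minimum number of $k$-input LUTs for an $n$-input XOR tree as $\lceil (n-1)/(k-1) \rceil$, since each LUT merges up to $k$ signals into one. It is an assumption-laden model that ignores routing, carry chains, and synthesis-tool behaviour, and the function names are hypothetical.

```python
def lut_truth_table(f, k=4):
    """Pack the truth table of a k-input Boolean function into a 2**k-bit integer.

    Bit i of the result is f applied to the bits of i (input 1 = least significant bit).
    """
    table = 0
    for i in range(1 << k):
        bits = [(i >> j) & 1 for j in range(k)]
        if f(*bits):
            table |= 1 << i
    return table


def inputs_used(table, k=4):
    """Number of LUT inputs whose value can change the output."""
    used = 0
    for j in range(k):
        if any(((table >> i) & 1) != ((table >> (i ^ (1 << j))) & 1)
               for i in range(1 << k)):
            used += 1
    return used


def xor_tree_luts(n, k=4):
    """Minimum number of k-input LUTs in a tree computing an n-input XOR."""
    return 0 if n <= 1 else (n - 2) // (k - 1) + 1   # ceil((n - 1) / (k - 1))


xor4 = lut_truth_table(lambda x1, x2, x3, x4: x1 ^ x2 ^ x3 ^ x4)   # y1: all 4 inputs used
xor2 = lut_truth_table(lambda x1, x2, x3, x4: x1 ^ x2)             # y2: only 2 of 4 used
```

Both $y_1$ and $y_2$ occupy one LUT, but `inputs_used` reports 4 and 2 respectively, which is the under-utilization the slide points out.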

Finite Field Multiplier for Best LUT Utilization [VLSID 2008]
[Figure: (a) Karatsuba-Ofman multiplication: a 233-bit Karatsuba multiplier recursively split into smaller multipliers (117 and 116 bits, then roughly 58, 29, 14, and 7 bits). (b) Hybrid Karatsuba multiplication: the recursion stops at a threshold and the remaining small multiplications use classical multipliers. (c) Finding the right threshold: LUTs versus threshold. (d) Comparing multipliers: area × time versus number of bits for the Karatsuba-Ofman and hybrid Karatsuba multipliers.]
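
The hybrid scheme in panel (b) can be sketched in Python for polynomials over $GF(2)$ packed into integers (bit $i$ holds the coefficient of $x^i$). This is an illustrative software model, not the authors' hardware: the split point (the low half gets the extra bit, so 233 splits into 117 and 116) mirrors panel (a), `threshold` is a free parameter standing in for the threshold swept in panel (c), and the function names are hypothetical.

```python
def clmul(a, b):
    """Classical (schoolbook) carry-less multiplication in GF(2)[x]."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        b >>= 1
    return r


def hybrid_karatsuba(a, b, n, threshold):
    """Product of two GF(2)[x] polynomials with fewer than n coefficients each.

    Karatsuba splitting is applied recursively until the operand size is at
    most `threshold`; below that, the classical multiplier is used.
    """
    if threshold < 1:
        raise ValueError("threshold must be >= 1")
    if n <= threshold:
        return clmul(a, b)
    h = (n + 1) // 2                 # low half gets the extra bit: 233 -> 117 + 116
    mask = (1 << h) - 1
    a0, a1 = a & mask, a >> h        # a = a1*x^h + a0
    b0, b1 = b & mask, b >> h
    lo = hybrid_karatsuba(a0, b0, h, threshold)
    hi = hybrid_karatsuba(a1, b1, n - h, threshold)
    # (a0 + a1)(b0 + b1) + a0*b0 + a1*b1 = a0*b1 + a1*b0 (addition is XOR)
    mid = hybrid_karatsuba(a0 ^ a1, b0 ^ b1, h, threshold) ^ lo ^ hi
    return (hi << (2 * h)) ^ (mid << h) ^ lo
```

The result is the unreduced product; a field multiplier would follow it with reduction modulo the field polynomial. Every threshold yields the same product; in hardware the threshold only changes the area, which is the trade-off panel (c) explores.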

Finite Field Inversion Using the Itoh-Tsujii Algorithm
• Given $a \in GF(2^m)$, find $a^{-1} \in GF(2^m)$ such that $a \cdot a^{-1} = 1$.
• Fermat's little theorem: $a^{-1} = a^{2^m - 2}$.
• Itoh-Tsujii algorithm:
1. Define an addition chain for $m - 1$ (for example, for $m = 233$: (1, 2, 3, 6, 7, 14, 28, 29, 58, 116, 232)).
2. Compute $a \to a^{2^2 - 1} \to a^{2^3 - 1} \to a^{2^6 - 1} \to a^{2^7 - 1} \to a^{2^{14} - 1} \to \cdots \to a^{2^{232} - 1}$.
3. Square to get $a^{2^{233} - 2}$.
• The exponentiation requires a series of cascaded squarers, called a powerblock, along with a finite field multiplier.
[Figure: powerblock: Input → Square Circuit-1 → Square Circuit-2 → Square Circuit-3 → … → Square Circuit-11; a multiplexer controlled by qsel selects the output.]
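
The Itoh-Tsujii steps can be checked in software. This minimal Python sketch walks an addition chain for $m - 1 = 232$ using $\beta_k = a^{2^k - 1}$ and the identity $\beta_{k_1 + k_2} = (\beta_{k_1})^{2^{k_2}} \cdot \beta_{k_2}$, so each chain step costs $k_2$ cascaded squarings (the powerblock's job) and one multiplication. The reduction polynomial $z^{233} + z^{74} + 1$ (the NIST B-233/K-233 trinomial) is an assumption, since the slides do not name one, and the function names are hypothetical.

```python
M = 233
POLY = (1 << 233) | (1 << 74) | 1                 # assumed f(z) = z^233 + z^74 + 1
CHAIN_232 = (1, 2, 3, 6, 7, 14, 28, 29, 58, 116, 232)


def gf_mul(a, b):
    """Multiply in GF(2^M): carry-less product, then reduction modulo POLY."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        b >>= 1
    while r.bit_length() > M:                     # cancel the top term until deg(r) < M
        r ^= POLY << (r.bit_length() - 1 - M)
    return r


def gf_sq(a):
    return gf_mul(a, a)


def itoh_tsujii_inverse(a, chain=CHAIN_232):
    """a^(-1) = a^(2^M - 2) via beta_k = a^(2^k - 1) along an addition chain for M - 1."""
    if a == 0:
        raise ZeroDivisionError("0 has no inverse")
    if chain[0] != 1 or chain[-1] != M - 1:
        raise ValueError("chain must run from 1 to M - 1")
    beta = {1: a}                                 # beta_1 = a^(2^1 - 1) = a
    for e in chain[1:]:
        # find k1 + k2 = e with both exponents already computed (largest k1 first)
        k1 = next((k for k in sorted(beta, reverse=True) if e - k in beta), None)
        if k1 is None:
            raise ValueError(f"{e} is not a sum of two earlier chain elements")
        k2 = e - k1
        t = beta[k1]
        for _ in range(k2):                       # k2 cascaded squarings
            t = gf_sq(t)
        beta[e] = gf_mul(t, beta[k2])             # beta_e = beta_k1^(2^k2) * beta_k2
    return gf_sq(beta[M - 1])                     # (a^(2^(M-1) - 1))^2 = a^(2^M - 2)
```

A chain that skips 29 is rejected with `ValueError`, because 58 cannot then be written as a sum of two earlier elements.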

Using Higher Exponents in the Itoh-Tsujii Algorithm [IEEE TVLSI 2011, DATE 2011]
Consider using a quad circuit (computing $a^4$) instead of a squarer.
• This requires an addition chain for $(m - 1)/2$ instead of $m - 1$, so the inversion finishes faster.
• The frequency of operation is not affected, and the area used is smaller due to better LUT utilization.

Table: Comparison of Squarer and Quad Circuits on a Xilinx Virtex-4 FPGA

Field        Squarer #LUT_s   Squarer delay (ns)   Quad #LUT_q   Quad delay (ns)   Size ratio #LUT_q / (2 · #LUT_s)
GF(2^193)    96               1.48                 145           1.48              0.75
GF(2^233)    153              1.48                 230           1.48              0.75

• Larger exponent circuits can similarly be used to obtain faster results.
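
The quad variant can be modelled the same way. In this hedged Python sketch, $\gamma_k = a^{4^k - 1}$ starts from $\gamma_1 = a^3$ and follows the addition chain (1, 2, 3, 6, 7, 14, 28, 29, 58, 116) for $(m - 1)/2 = 116$ using $\gamma_{k_1 + k_2} = (\gamma_{k_1})^{4^{k_2}} \cdot \gamma_{k_2}$. Since $\gamma_{116} = a^{4^{116} - 1} = a^{2^{232} - 1}$, one final squaring gives $a^{2^{233} - 2} = a^{-1}$. The field polynomial $z^{233} + z^{74} + 1$ and the names are assumptions, and `gf_quad` is a software stand-in for a single-step hardware quad circuit.

```python
M = 233
POLY = (1 << 233) | (1 << 74) | 1                 # assumed f(z) = z^233 + z^74 + 1
CHAIN_116 = (1, 2, 3, 6, 7, 14, 28, 29, 58, 116)  # addition chain for (M - 1) / 2


def gf_mul(a, b):
    """Multiply in GF(2^M): carry-less product, then reduction modulo POLY."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        b >>= 1
    while r.bit_length() > M:
        r ^= POLY << (r.bit_length() - 1 - M)
    return r


def gf_quad(a):
    """a^4: one step for a hardware quad circuit, two squarings in this model."""
    s = gf_mul(a, a)
    return gf_mul(s, s)


def itoh_tsujii_quad(a, chain=CHAIN_116):
    """Return (a^(-1), number of quad operations) using gamma_k = a^(4^k - 1)."""
    if a == 0:
        raise ZeroDivisionError("0 has no inverse")
    if (M - 1) % 2 or chain[0] != 1 or chain[-1] != (M - 1) // 2:
        raise ValueError("chain must run from 1 to (M - 1) / 2")
    gamma = {1: gf_mul(gf_mul(a, a), a)}          # gamma_1 = a^3 = a^(4^1 - 1)
    quads = 0
    for e in chain[1:]:
        k1 = next((k for k in sorted(gamma, reverse=True) if e - k in gamma), None)
        if k1 is None:
            raise ValueError(f"{e} is not a sum of two earlier chain elements")
        k2 = e - k1
        t = gamma[k1]
        for _ in range(k2):                       # k2 cascaded quad operations
            t = gf_quad(t)
            quads += 1
        gamma[e] = gf_mul(t, gamma[k2])           # gamma_e = gamma_k1^(4^k2) * gamma_k2
    top = gamma[chain[-1]]                        # a^(2^(M-1) - 1)
    return gf_mul(top, top), quads                # squared: a^(2^M - 2)
```

In this model the chain for 116 takes 115 quad operations, while walking the chain (1, 2, 3, 6, 7, 14, 28, 29, 58, 116, 232) with single squarings takes 231 squarings (the final squaring excluded in both cases); how that maps to clock cycles depends on the hardware schedule.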
