Efficient arithmetic in finite fields
D. J. Bernstein
University of Illinois at Chicago

Some examples of finite fields:
$\mathbf{Z}/(2^{255} - 19)$.
$(\mathbf{Z}/(2^{61} - 1))[t]/(t^5 - 3)$.
$(\mathbf{Z}/223)[t]/(t^{37} - 2)$.
$(\mathbf{Z}/2)[t]/(t^{283} - t^{12} - t^7 - t^5 - 1)$.
Topic of this talk: How quickly can we add, subtract, multiply in these fields?
Answer will depend on platform: AMD Athlon, Sun UltraSPARC IV, Intel 8051, Xilinx Spartan-3, etc.
Warning: different platforms often favor different fields!

Why do we care?
"Modular exponentiation": can quickly compute $4^n \bmod 2^{262} - 5081$ given $n \in \{0, 1, 2, \ldots, 2^{256} - 1\}$.
Similarly, can quickly compute $4^{mn} \bmod 2^{262} - 5081$ given $n$ and $4^m \bmod 2^{262} - 5081$.
Time-savers: fast field mults, short "addition chains."
"Discrete-logarithm problem": given $4^n \bmod 2^{262} - 5081$, find $n$. This computation seems harder.
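A minimal sketch of the modular exponentiation described above, using plain left-to-right binary square-and-multiply. The function name `modexp` and the multiplication counter are illustrative assumptions, not part of the talk; shorter addition chains than this binary chain exist.

```python
P = 2**262 - 5081  # modulus from the slide


def modexp(base, n, p):
    """Left-to-right square-and-multiply.

    Returns (base**n mod p, number of modular multiplications used).
    """
    result = 1
    mults = 0
    for bit in bin(n)[2:]:
        result = result * result % p      # one squaring per exponent bit
        mults += 1
        if bit == "1":
            result = result * base % p    # one extra multiplication per set bit
            mults += 1
    return result, mults


n = 2**256 - 1
value, mults = modexp(4, n, P)
print(value == pow(4, n, P), mults)
```

The cost is at most two field multiplications per exponent bit, which is why fast field multiplication dominates the running time.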

Diffie-Hellman secret-sharing system using $p = 2^{262} - 5081$:
Alice's secret key $m$ gives Alice's public key $4^m \bmod p$.
Bob's secret key $n$ gives Bob's public key $4^n \bmod p$.
{Alice, Bob}'s shared secret $4^{mn} \bmod p$ = {Bob, Alice}'s shared secret $4^{mn} \bmod p$.
Alice, Bob easily find $4^{mn} \bmod p$. Seems harder for attacker.
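The exchange can be sketched in a few lines, assuming Python's built-in `pow` for modular exponentiation and `secrets.randbelow` for key generation. The helper names `keypair` and `shared_secret` are hypothetical, and the parameters are the toy ones from the slide, not a recommendation for real use.

```python
import secrets

P = 2**262 - 5081  # modulus from the slide
G = 4              # base from the slide


def keypair():
    """Pick a secret exponent below 2^256 and compute the matching public key."""
    secret = secrets.randbelow(2**256)
    return secret, pow(G, secret, P)


def shared_secret(my_secret, their_public):
    """Raise the other party's public key to our own secret exponent."""
    return pow(their_public, my_secret, P)


m, alice_public = keypair()   # Alice
n, bob_public = keypair()     # Bob

alice_view = shared_secret(m, bob_public)
bob_view = shared_secret(n, alice_public)
print(alice_view == bob_view)
```

Both sides arrive at $4^{mn} \bmod p$ without ever transmitting $m$ or $n$.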

Bad news: "Index calculus" solves DLP at surprising speed!
To protect against this attack, replace $2^{262} - 5081$ with a much larger prime. Much slower arithmetic.
Alternative: Elliptic-curve cryptography. Replace $\{1, 2, \ldots, 2^{262} - 5082\}$ with a comparable-size "safe elliptic-curve group." Somewhat slower arithmetic.
Either way, need fast arithmetic in a finite field.

The core question
How to multiply big integers?
Child's answer: Use polynomial with coefficients in $\{0, 1, \ldots, 9\}$ to represent integer in radix 10.
With this representation, multiply integers in two steps:
1. Multiply polynomials.
2. "Carry" extra digits.
Polynomial multiplication involves small integers. Have split one big multiplication into many small operations.

Example of representation:
$839 = 8 \cdot 10^2 + 3 \cdot 10^1 + 9 \cdot 10^0$ = value (at $t = 10$) of polynomial $8t^2 + 3t^1 + 9t^0$.
Squaring: $(8t^2 + 3t^1 + 9t^0)^2 = 64t^4 + 48t^3 + 153t^2 + 54t^1 + 81t^0$.
Carrying:
$64t^4 + 48t^3 + 153t^2 + 54t^1 + 81t^0$;
$64t^4 + 48t^3 + 153t^2 + 62t^1 + 1t^0$;
$64t^4 + 48t^3 + 159t^2 + 2t^1 + 1t^0$;
$64t^4 + 63t^3 + 9t^2 + 2t^1 + 1t^0$;
$70t^4 + 3t^3 + 9t^2 + 2t^1 + 1t^0$;
$7t^5 + 0t^4 + 3t^3 + 9t^2 + 2t^1 + 1t^0$.
In other words, $839^2 = 703921$.
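The two steps (multiply polynomials, then carry) can be sketched as follows. Coefficient lists are stored lowest degree first, which is a convention chosen here for convenience; the function names are illustrative.

```python
def poly_mul(f, g):
    """Schoolbook product of two coefficient lists (lowest degree first)."""
    h = [0] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            h[i + j] += a * b
    return h


def carry(h, radix=10):
    """Propagate carries upward so every coefficient lies in 0..radix-1."""
    digits = []
    c = 0
    for coeff in h:
        c, digit = divmod(coeff + c, radix)
        digits.append(digit)
    while c:                      # leftover carry creates new top digits
        c, digit = divmod(c, radix)
        digits.append(digit)
    return digits


def value(digits, radix=10):
    """Evaluate the polynomial at t = radix."""
    return sum(d * radix**i for i, d in enumerate(digits))


f = [9, 3, 8]                     # 839 = 8t^2 + 3t + 9 at t = 10
product = poly_mul(f, f)          # [81, 54, 153, 48, 64]
print(product, carry(product), value(carry(product)))
```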

What operations were used here?
[Diagram: the digits 8, 3, 9 are multiplied to give 72, 9, 72; these are added to give 153; the incoming carry 6 is added to give 159; dividing by 10 gives 15 and reducing mod 10 gives 9.]

Scaled variation:
$839 = 800 + 30 + 9$ = value (at $t = 1$) of polynomial $800t^2 + 30t^1 + 9t^0$.
Squaring: $(800t^2 + 30t^1 + 9t^0)^2 = 640000t^4 + 48000t^3 + 15300t^2 + 540t^1 + 81t^0$.
Carrying:
$640000t^4 + 48000t^3 + 15300t^2 + 540t^1 + 81t^0$;
$640000t^4 + 48000t^3 + 15300t^2 + 620t^1 + 1t^0$;
...
$700000t^5 + 0t^4 + 3000t^3 + 900t^2 + 20t^1 + 1t^0$.

What operations were used here?
[Diagram: the scaled digits 800, 30, 9 are multiplied to give 7200, 900, 7200; these are added to give 15300; the incoming carry 600 is added to give 15900; reducing mod 1000 gives 900 and subtracting gives 15000.]

Speedup: double inside squaring
Squaring $\cdots + f_2 t^2 + f_1 t^1 + f_0 t^0$ produces coefficients such as $f_4 f_0 + f_3 f_1 + f_2 f_2 + f_1 f_3 + f_0 f_4$.
Compute more efficiently as $2 f_4 f_0 + 2 f_3 f_1 + f_2 f_2$.
Or, slightly faster, $2(f_4 f_0 + f_3 f_1) + f_2 f_2$.
Or, slightly faster, $(2 f_4) f_0 + (2 f_3) f_1 + f_2 f_2$ after precomputing $2 f_1, 2 f_2, \ldots$.
Have eliminated $\approx 1/2$ of the work if there are many coefficients.
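A sketch of the last variant, with doubled coefficients precomputed once and each cross term multiplied only once. The multiplication counter is added purely to illustrate the saving; coefficient lists are lowest degree first, and the function name is hypothetical.

```python
def square_doubled(f):
    """Square a polynomial using precomputed doubled coefficients.

    Returns (coefficients of f^2, number of coefficient multiplications).
    """
    n = len(f)
    doubled = [2 * c for c in f]          # precompute 2*f_j once
    h = [0] * (2 * n - 1)
    mults = 0
    for k in range(2 * n - 1):
        i = max(0, k - (n - 1))           # smallest valid index i with k - i < n
        while i < k - i:                  # cross terms f_i * f_(k-i), i < k - i
            h[k] += doubled[k - i] * f[i]
            mults += 1
            i += 1
        if i == k - i:                    # middle term f_i * f_i when k is even
            h[k] += f[i] * f[i]
            mults += 1
    return h, mults


coeffs, mults = square_doubled([9, 3, 8])
print(coeffs, mults)
```

For $n$ coefficients this uses $n(n+1)/2$ multiplications instead of $n^2$, which approaches half the work as $n$ grows.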

Speedup: allow negative coeffs
Recall $159 \mapsto 15, 9$. Scaled: $15900 \mapsto 15000, 900$.
Alternative: $159 \mapsto 16, -1$. Scaled: $15900 \mapsto 16000, -100$.
Use digits $\{-5, -4, \ldots, 4, 5\}$ instead of $\{0, 1, \ldots, 9\}$.
Several small advantages: easily handle negative integers; easily handle subtraction; reduce products a bit.
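A sketch of carrying with signed digits, assuming a round-to-nearest rule that breaks ties upward, so the digits produced land in $-5..4$ (a subset of the range on the slide). The function names are illustrative; coefficient lists are lowest degree first.

```python
def carry_signed(h, radix=10):
    """Carry so that every digit lies in -radix//2 .. radix//2 - 1."""
    half = radix // 2
    digits = []
    c = 0
    for coeff in h:
        total = coeff + c
        c = (total + half) // radix       # quotient rounded to nearest
        digits.append(total - c * radix)  # small signed remainder
    while c:                              # leftover carry creates new top digits
        total = c
        c = (total + half) // radix
        digits.append(total - c * radix)
    return digits


def value(digits, radix=10):
    """Evaluate the polynomial at t = radix."""
    return sum(d * radix**i for i, d in enumerate(digits))


print(carry_signed([159]))                    # 159 -> digit -1, carry 16
print(carry_signed([81, 54, 153, 48, 64]))    # signed digits of 839^2
print(carry_signed([-839]))                   # negative input needs no special case
```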

Speedup: delay carries
Computing (e.g.) big $ab + c^2$: multiply $a, b$ polynomials, carry, square $c$ poly, carry, add, carry.
e.g. $a = 314$, $b = 271$, $c = 839$:
$(3t^2 + 1t^1 + 4t^0)(2t^2 + 7t^1 + 1t^0) = 6t^4 + 23t^3 + 18t^2 + 29t^1 + 4t^0$; carry: $8t^4 + 5t^3 + 0t^2 + 9t^1 + 4t^0$.
As before $(8t^2 + 3t^1 + 9t^0)^2 = 64t^4 + 48t^3 + 153t^2 + 54t^1 + 81t^0$; carry: $7t^5 + 0t^4 + 3t^3 + 9t^2 + 2t^1 + 1t^0$.
Add: $7t^5 + 8t^4 + 8t^3 + 9t^2 + 11t^1 + 5t^0$; carry: $7t^5 + 8t^4 + 9t^3 + 0t^2 + 1t^1 + 5t^0$.

Faster: multiply $a, b$ polynomials, square $c$ polynomial, add, carry.
$(6t^4 + 23t^3 + 18t^2 + 29t^1 + 4t^0) + (64t^4 + 48t^3 + 153t^2 + 54t^1 + 81t^0) = 70t^4 + 71t^3 + 171t^2 + 83t^1 + 85t^0$; carry: $7t^5 + 8t^4 + 9t^3 + 0t^2 + 1t^1 + 5t^0$.
Eliminate intermediate carries. Outweighs cost of handling slightly larger coefficients.
Important to carry between multiplications (and squarings) to reduce coefficient size; but carries are usually a bad idea for additions, subtractions, etc.
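The two schedules can be compared directly on the slide's example. This is a sketch with illustrative helper names, using lowest-degree-first coefficient lists; it checks only that both orders give the same digits, not their relative speed.

```python
from itertools import zip_longest


def poly_mul(f, g):
    """Schoolbook product of two coefficient lists (lowest degree first)."""
    h = [0] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            h[i + j] += a * b
    return h


def poly_add(f, g):
    """Coefficient-wise sum, padding the shorter list with zeros."""
    return [a + b for a, b in zip_longest(f, g, fillvalue=0)]


def carry(h, radix=10):
    """Propagate carries so every coefficient lies in 0..radix-1."""
    digits, c = [], 0
    for coeff in h:
        c, digit = divmod(coeff + c, radix)
        digits.append(digit)
    while c:
        c, digit = divmod(c, radix)
        digits.append(digit)
    return digits


a, b, c = [4, 1, 3], [1, 7, 2], [9, 3, 8]     # 314, 271, 839

# Carry after every operation: three carry passes.
eager = carry(poly_add(carry(poly_mul(a, b)), carry(poly_mul(c, c))))

# Delay carries until after the addition: one carry pass.
lazy = carry(poly_add(poly_mul(a, b), poly_mul(c, c)))

print(eager, lazy)
```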

Speedup: polynomial Karatsuba
Computing product $fg$ of polys $f, g$ with (e.g.) $\deg f < 20$, $\deg g < 20$: 400 coefficient mults, 361 coefficient adds.
Faster: Write $f$ as $F_0 + F_1 t^{10}$ with $\deg F_0 < 10$, $\deg F_1 < 10$. Similarly write $g$ as $G_0 + G_1 t^{10}$.
Then $fg = (F_0 + F_1)(G_0 + G_1) t^{10} + (F_0 G_0 - F_1 G_1 t^{10})(1 - t^{10})$.

20 adds for $F_0 + F_1$, $G_0 + G_1$.
300 mults for three products $F_0 G_0$, $F_1 G_1$, $(F_0 + F_1)(G_0 + G_1)$.
243 adds for those products.
9 adds for $F_0 G_0 - F_1 G_1 t^{10}$ with subs counted as adds and with delayed negations.
19 adds for $\cdots (1 - t^{10})$.
19 adds to finish.
Total 300 mults, 310 adds. Larger coefficients, slight expense; still saves time.
Can apply idea recursively as poly degree grows.
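A one-level sketch of the identity $fg = (F_0 + F_1)(G_0 + G_1) t^{10} + (F_0 G_0 - F_1 G_1 t^{10})(1 - t^{10})$, checked against schoolbook multiplication on random inputs. Helper names are illustrative, coefficient lists are lowest degree first, and the code does not try to reproduce the exact addition counts from the slide.

```python
import random
from itertools import zip_longest


def schoolbook(f, g):
    """Quadratic-time polynomial product (lowest degree first)."""
    h = [0] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            h[i + j] += a * b
    return h


def add(f, g):
    return [a + b for a, b in zip_longest(f, g, fillvalue=0)]


def sub(f, g):
    return [a - b for a, b in zip_longest(f, g, fillvalue=0)]


def shift(f, k):
    """Multiply by t^k."""
    return [0] * k + f


def karatsuba_once(f, g, half=10):
    """One Karatsuba level: three half-size products instead of four."""
    F0, F1 = f[:half], f[half:]
    G0, G1 = g[:half], g[half:]
    low = schoolbook(F0, G0)                      # F0*G0
    high = schoolbook(F1, G1)                     # F1*G1
    mid = schoolbook(add(F0, F1), add(G0, G1))    # (F0+F1)*(G0+G1)
    inner = sub(low, shift(high, half))           # F0*G0 - F1*G1*t^half
    outer = sub(inner, shift(inner, half))        # inner * (1 - t^half)
    return add(shift(mid, half), outer)


random.seed(1)
f = [random.randrange(10) for _ in range(20)]
g = [random.randrange(10) for _ in range(20)]
print(karatsuba_once(f, g) == schoolbook(f, g))
```

Replacing the three `schoolbook` calls with recursive calls gives the recursive version mentioned on the slide.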

Many other algebraic speedups in polynomial multiplication: Toom, FFT, etc.
Increasingly important as polynomial degree grows.
$O(n \lg n \lg\lg n)$ coeff operations to compute $n$-coeff product.
Useful for sizes of $n$ that occur in cryptography? Maybe; active research area.

Using CPU's integer instructions
Replace radix 10 with, e.g., $2^{24}$. Power of 2 simplifies carries.
Adapt radix to platform.
e.g. Every 2 cycles, Athlon 64 can compute a 128-bit product of two 64-bit integers. (5-cycle latency; parallelize!) Also low cost for 128-bit add.
Reasonable to use radix $2^{60}$. Sum of many products of digits fits comfortably below $2^{128}$.
Be careful: analyze largest sum.

e.g. In 4 cycles, Intel 8051 can compute a 16-bit product of two 8-bit integers. Could use radix $2^6$. Could use radix $2^8$, with 24-bit sums.
e.g. Every 2 cycles, Pentium 4 F3 can compute a 64-bit product of two 32-bit integers. (11-cycle latency; yikes!) Reasonable to use radix $2^{28}$.
Warning: Multiply instructions are very slow on some CPUs. e.g. Pentium 4 F2: 10 cycles!
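The "analyze largest sum" advice can be illustrated with a small bound calculation. This sketch assumes unsigned, fully carried digits below $2^{\text{digit bits}}$; delayed carries or signed digits change the bound, and the function name is hypothetical.

```python
def max_products(digit_bits, accumulator_bits):
    """How many worst-case digit products fit in an unsigned accumulator."""
    max_digit = 2**digit_bits - 1
    max_sum = 2**accumulator_bits - 1
    return max_sum // (max_digit * max_digit)


# (radix exponent, accumulator width) pairs taken from the examples above
for digit_bits, acc_bits in [(60, 128), (28, 64), (8, 24), (6, 16)]:
    print(digit_bits, acc_bits, max_products(digit_bits, acc_bits))
```

Under these assumptions, radix $2^{60}$ with a 128-bit accumulator leaves room for 256 worst-case products.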

Using floating-point instructions
Big CPUs have separate floating-point instructions, aimed at numerical simulation but useful for cryptography.
In my experience, floating-point instructions support faster multiplication (often much, much faster) than integer instructions, except on the Athlon 64.
Other advantages: portability; easily scaled coefficients.

e.g. Every 2 cycles, Pentium III can compute a 64-bit product of two floating-point numbers, and an independent 64-bit sum.
e.g. Every cycle, Athlon can compute a 64-bit product and an independent 64-bit sum.
e.g. Every cycle, UltraSPARC III can compute a 53-bit product and an independent 53-bit sum. Reasonable to use radix $2^{24}$.
e.g. Pentium 4 can do the same using SSE2 instructions.

How to do carries in floating-point registers? (No CPU carry instruction: not useful for simulations.)
Exploit floating-point rounding: add big constant, subtract same constant.
e.g. Given $\alpha$ with $|\alpha| \le 2^{75}$: compute 53-bit floating-point sum of $\alpha$ and constant $3 \cdot 2^{75}$, obtaining a multiple of $2^{24}$; subtract $3 \cdot 2^{75}$ from result, obtaining multiple of $2^{24}$ nearest $\alpha$; subtract from $\alpha$.
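The rounding trick can be tried with ordinary Python floats, assuming they are IEEE 754 doubles with a 53-bit mantissa and round-to-nearest (true for CPython on common platforms). The names `BIG` and `split` are illustrative.

```python
BIG = 3.0 * 2.0**75   # lies in [2^76, 2^77), where doubles are spaced 2^24 apart


def split(alpha):
    """Split alpha (|alpha| <= 2^75) into a multiple of 2^24 plus a small remainder."""
    high = (alpha + BIG) - BIG   # rounding of the sum snaps to a multiple of 2^24
    low = alpha - high           # remainder, at most 2^23 in absolute value
    return high, low


high, low = split(float(2**40 + 12345))
print(high, low)
```

Here `high` plays the role of the carry into the next coefficient and `low` is the reduced coefficient.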

Reducing modulo a prime
Fix a prime $p$. The prime field $\mathbf{Z}/p$ is the set $\{0, 1, 2, \ldots, p - 1\}$ with $-$ defined as $-$ mod $p$, $+$ defined as $+$ mod $p$, $\cdot$ defined as $\cdot$ mod $p$.
e.g. $p = 1000003$: $1000000 + 50 = 47$ in $\mathbf{Z}/p$; $-1 = 1000002$ in $\mathbf{Z}/p$; $117505 \cdot 23131 = 1$ in $\mathbf{Z}/p$.
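The three examples can be checked directly. This sketch uses Python's `%` operator as the definition of "mod $p$"; the function names are illustrative.

```python
P = 1000003


def add(a, b):
    return (a + b) % P


def neg(a):
    return (-a) % P


def mul(a, b):
    return (a * b) % P


print(add(1000000, 50))     # 47
print(neg(1))               # 1000002
print(mul(117505, 23131))   # 1
```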

How to multiply in $\mathbf{Z}/p$?
Can use definition: $fg \bmod p = fg - p \lfloor fg/p \rfloor$.
Can multiply $fg$ by a precomputed $1/p$ approximation; easily adjust to obtain $\lfloor fg/p \rfloor$.
Slight speedup: "2-adic inverse"; "Montgomery reduction."
We can do better: normally $p$ is chosen with a special form (or dividing a special form; see "redundant representations") to make $fg \bmod p$ much faster.
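A sketch of the "precomputed $1/p$ approximation" approach, in the style usually called Barrett reduction. The scaling exponent `K`, the constant `RECIP`, and the function name are assumptions made for this illustration; inputs are assumed to be already reduced, so that $fg < 2^K$.

```python
P = 1000003
K = 2 * P.bit_length()        # products of two reduced elements are below 2^K
RECIP = (1 << K) // P         # precomputed approximation to 2^K / p


def mulmod(f, g):
    """Compute f*g mod P for 0 <= f, g < P without dividing by P at run time."""
    fg = f * g
    q = (fg * RECIP) >> K     # estimate of floor(fg / p), never too large
    r = fg - q * P            # candidate remainder, slightly too big at worst
    while r >= P:             # adjust; only a couple of subtractions are needed
        r -= P
    return r


print(mulmod(117505, 23131))
```

The run-time work is two multiplications, a shift, and a small correction, with the division by $p$ done once in advance.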
