Efficient arithmetic on elliptic curves in large characteristic



SLIDE 1

Efficient arithmetic on elliptic curves in large characteristic

D. J. Bernstein
University of Illinois at Chicago

SLIDE 2

Fix a field and an elliptic curve.

e.g. NIST P-224: the elliptic curve $y^2 = x^3 - 3x + a_6$ over $\mathbf{Z}/p$. Here $p = 2^{224} - 2^{96} + 1$ and $a_6 = 18958286285566608000408668544493926415504680968679321075787234672564$.

e.g. NIST P-256: the elliptic curve $y^2 = x^3 - 3x + a_6$, for a different constant $a_6$, over $\mathbf{Z}/p$ where $p = 2^{256} - 2^{224} + 2^{192} + 2^{96} - 1$.

e.g. Curve25519: the elliptic curve $y^2 = x^3 + 486662x^2 + x$ over $\mathbf{Z}/p$ where $p = 2^{255} - 19$.
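As a quick sanity check on the three example fields, the following Python sketch builds each modulus from its power-of-two expression and runs a base-2 Fermat test. The names `FIELDS` and `passes_fermat` are illustrative and not from the talk. A Fermat test is only a necessary condition for primality, so this is a plausibility check rather than a proof.

```python
# Moduli of the three example fields, written as on the slide.
FIELDS = {
    "NIST P-224": 2**224 - 2**96 + 1,
    "NIST P-256": 2**256 - 2**224 + 2**192 + 2**96 - 1,
    "Curve25519": 2**255 - 19,
}

def passes_fermat(p, base=2):
    """Necessary (not sufficient) condition for primality: base^(p-1) = 1 mod p."""
    return pow(base, p - 1, p) == 1

for name, p in FIELDS.items():
    print(name, p.bit_length(), "bits, Fermat test passed:", passes_fermat(p))
```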
SLIDE 3

“Elliptic-curve scalar multiplication”: Given $(x, y)$ on curve, and given integer $n \ge 0$, compute $n$th multiple of $(x, y)$ in the elliptic-curve group.

This is the bottleneck in elliptic-curve Diffie-Hellman.

The big question: How quickly can we do this?

Many variations of problem: e.g. $m, n, P, Q \mapsto mP + nQ$, critical for elliptic-curve signatures.

SLIDE 4

Review of addition chains

Typical recursive formulas:

$2P = P + P$.
$3P = 2P + P$.
$4P = 2P + 2P$.
$5P = 3P + 2P$.
$6P = 3P + 3P$.
$7P = 5P + 2P$.
$2nP = 7P + (2n - 7)P$ if $4 \le n < 8$.
$(2n+1)P = 2nP + P$ if $4 \le n < 8$.
$(4n+1)P = 4nP + P$ if $4 \le n < 8$.
$(4n+3)P = 4nP + 3P$ if $4 \le n < 8$.
$2nP = nP + nP$ if $8 \le n$.
$(8n+1)P = 8nP + P$ if $4 \le n$.
$(8n+3)P = 8nP + 3P$ if $4 \le n$.
$(8n+5)P = 8nP + 5P$ if $4 \le n$.
$(8n+7)P = 8nP + 7P$ if $4 \le n$.
SLIDE 5

This addition chain (“length-3 sliding windows”) uses $\approx \lg n$ doublings and $\approx 0.25 \lg n$ more additions to compute $nP$ for average $n$.

e.g. $\approx 320$ additions for average $n \in \{0, 1, \ldots, 2^{256} - 1\}$.

Some easy improvements from fast negation on elliptic curves: $(16n - 7)P = 16nP - 7P$, etc.

Also use endomorphisms for “Koblitz curves,” “GLV curves.”

More complicated methods replace $0.25$ by $\approx 1/\lg\lg n$.
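The operation counts for the length-3 sliding-window chain can be explored with a short Python sketch. It follows the recursive formulas from the addition-chain slide, using plain integers in place of curve points, and counts doublings separately from other additions. The function name `chain_cost`, the table `SMALL`, and the sample size are illustrative choices, not from the talk.

```python
import random

# How the multiples below 8 are built: m -> (a, b) with a + b = m.
SMALL = {2: (1, 1), 3: (2, 1), 4: (2, 2), 5: (3, 2), 6: (3, 3), 7: (5, 2)}

def chain_cost(n):
    """Return (doublings, other additions) used to reach nP from P."""
    done = {1}                      # multiples of P already available
    cost = {"dbl": 0, "add": 0}

    def need(m):
        if m in done:
            return
        if m < 8:
            a, b = SMALL[m]
        elif m < 16:
            a, b = (7, m - 7) if m % 2 == 0 else (m - 1, 1)
        elif m % 2 == 0:
            a, b = m // 2, m // 2   # doubling
        elif m < 32:
            a, b = m - m % 4, m % 4
        else:
            a, b = m - m % 8, m % 8
        need(a)
        need(b)
        cost["dbl" if a == b else "add"] += 1
        done.add(m)

    if n > 0:
        need(n)
    return cost["dbl"], cost["add"]

random.seed(1)
samples = [chain_cost(random.getrandbits(256)) for _ in range(200)]
print("average total operations:", sum(d + a for d, a in samples) / len(samples))
```

The printed average should land near the figure of about 320 operations quoted for 256-bit $n$; the exact value depends on the random sample.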
SLIDE 6

Explicit doubling formulas

On curve $y^2 = x^3 - 3x + a_6$: $2(x, y) = (x'', y'')$ where

$\lambda = (3x^2 - 3)/(2y)$,
$x'' = \lambda^2 - 2x$,
$y'' = \lambda(x - x'') - y$.

7 subs etc., 2 squarings, 1 more mult, 1 division.

How do we divide efficiently in a finite field?
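The doubling formulas can be tried out directly. The sketch below is a minimal Python version over the P-224 field; the test point is hypothetical: any $(x, y)$ with $y \ne 0$ is chosen and $a_6$ is then defined so that the point lies on a curve of the form $y^2 = x^3 - 3x + a_6$. The division is done by exponentiation, which is only one of several possible methods.

```python
P224 = 2**224 - 2**96 + 1

def affine_double(x, y, p=P224):
    """Double (x, y) on y^2 = x^3 - 3x + a6 over Z/p; requires y != 0 mod p."""
    lam = (3 * x * x - 3) * pow(2 * y, p - 2, p) % p   # 1 squaring, then the division
    x2 = (lam * lam - 2 * x) % p                       # 1 squaring
    y2 = (lam * (x - x2) - y) % p                      # 1 more mult
    return x2, y2

# Hypothetical test point: it determines a6, so it is on the curve by construction.
x, y = 5, 7
a6 = (y * y - x**3 + 3 * x) % P224
x2, y2 = affine_double(x, y)
print("doubled point is on the same curve:",
      (y2 * y2 - (x2**3 - 3 * x2 + a6)) % P224 == 0)
```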

SLIDE 7

$f/g = f g^{p-2}$ in prime field $\mathbf{Z}/p$.

Can compute $g^{p-2}$ with $\approx \lg p$ squarings and $\approx (\lg p)/\lg\lg p$ more mults.

e.g. $p = 2^{224} - 2^{96} + 1$: 223 squarings, 11 more mults.

More generally, $f/g = f g^{q-2}$ in any field of size $q$.

There are faster division methods (e.g. “Euclid”; beware timing attacks!); smaller “I/M ratio.”

Special methods for some fields.
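A minimal Python sketch of division by exponentiation follows. Python's built-in `pow` picks its own exponentiation strategy, so it does not reproduce the 223-squaring, 11-mult chain; the two counts printed at the end are for plain left-to-right square-and-multiply, shown only for comparison. The name `field_div` is illustrative.

```python
P224 = 2**224 - 2**96 + 1

def field_div(f, g, p=P224):
    """Return f/g in Z/p computed as f * g^(p-2); g must be nonzero mod p."""
    return f * pow(g, p - 2, p) % p

q = field_div(3, 7)
print("7 * (3/7) == 3:", 7 * q % P224 == 3)

# Cost of naive square-and-multiply for the exponent p - 2:
exponent = P224 - 2
naive_squarings = exponent.bit_length() - 1
naive_mults = bin(exponent).count("1") - 1
print(naive_squarings, "squarings,", naive_mults, "more mults")
```

The naive chain needs the same number of squarings but far more multiplications, which is why a tailored exponentiation chain is worth the effort.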

SLIDE 8

Speedup: delay divisions

Division costs many mults even with fastest division methods. Save time by delaying divisions.

Naive division-delay method: Store field elements as fractions until end of computation. Divide once before output.

Mult fractions with 2 field mults.
Divide fractions with 2 field mults.
Add fractions with 3 field mults.
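The naive division-delay method can be sketched as follows: each field element is a (numerator, denominator) pair, and the only real division happens in `to_field`. The modulus and all function names are illustrative assumptions.

```python
P = 2**255 - 19   # example prime; any prime modulus would do

def frac_mul(a, b):
    """(a0/a1) * (b0/b1): 2 field mults."""
    return (a[0] * b[0] % P, a[1] * b[1] % P)

def frac_div(a, b):
    """(a0/a1) / (b0/b1): 2 field mults."""
    return (a[0] * b[1] % P, a[1] * b[0] % P)

def frac_add(a, b):
    """a0/a1 + b0/b1: 3 field mults."""
    return ((a[0] * b[1] + a[1] * b[0]) % P, a[1] * b[1] % P)

def to_field(a):
    """The single real division, done once before output."""
    return a[0] * pow(a[1], P - 2, P) % P

# (1/3 + 2/5) / (7/11) equals 121/105
result = to_field(frac_div(frac_add((1, 3), (2, 5)), (7, 11)))
print(result * 105 % P == 121)
```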

SLIDE 9

Speedup: unify denominators

For elliptic-curve doubling, have

denominator $2y$ in $\lambda = (3x^2 - 3)/(2y)$;
denominator $(2y)^2$ in $x'' = \lambda^2 - 2x$;
denominator $(2y)^3$ in $y'' = \lambda(x - x'') - y$.

Subsequent computations will perform separate computations on the denominators $(2y)^2, (2y)^3$ of $x'', y''$.

Save time by manipulating denominators together.

SLIDE 10

“Jacobian coordinates”: Store $(x, y, z)$ to represent elliptic-curve point $(x/z^2, y/z^3)$.

$2(x/z^2, y/z^3) = (x'', y'')$ where

$\lambda = (3(x/z^2)^2 - 3)/(2(y/z^3)) = \mu/(2yz)$ with $\mu = 3x^2 - 3z^4$;

$x'' = \lambda^2 - 2(x/z^2) = (\mu^2 - 8xy^2)/(2yz)^2$;

$y'' = \lambda((x/z^2) - x'') - (y/z^3) = (12xy^2\mu - \mu^3 - 8y^4)/(2yz)^3$.
SLIDE 11

$2(x/z^2, y/z^3) = (x_2/z_2^2, y_2/z_2^3)$ where

$z_2 = 2yz$,
$\mu = 3x^2 - 3z^4$,
$x_2 = \mu^2 - 8xy^2$,
$y_2 = \mu(4xy^2 - x_2) - 8y^4$.

Easily compute with 6 squarings, 3 more mults: $x^2$, $z^2$, $z^4$, $y^2$, $y^4$, $yz$, $xy^2$, $\mu^2$, $\mu(\cdots)$.

Also some subs, doublings, etc.

Use fast field arithmetic: e.g., can delay carries and reductions in computing $y_2$.
SLIDE 12

Speedup: difference of squares

Can compute $3x^2 - 3z^4$ as $3(x - z^2)(x + z^2)$.

Replace 3 squarings by 1 mult, 1 squaring. Revised total: 4 squarings, 4 more mults.

Note: $3x^2 - 3z^4$ came from $3x^2 - 3$, derivative of $x^3 - 3x + a_6$. Wouldn’t have same speedup for, e.g., $x^3 - 5x + a_6$.
SLIDE 13

Speedup: $f^2, g^2, 2fg$

After computing $f^2$ and $g^2$ can compute $2fg$ as $(f + g)^2 - f^2 - g^2$.

In particular: After computing $y^2$ and $z^2$ can compute $2yz$ as $(y + z)^2 - y^2 - z^2$.

Replace 1 mult with 1 squaring. Revised total: 5 squarings, 3 more mults.
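Putting the Jacobian doubling formulas and the two speedups together gives a doubling with 5 squarings and 3 more mults. The Python sketch below is one possible arrangement of those operations, not necessarily the ordering used in any real implementation; it is checked against plain affine doubling. The test triple is hypothetical: any $(x, y, z)$ with $y, z \ne 0$ represents a point on some curve of the form $y^2 = x^3 - 3x + a_6$.

```python
P224 = 2**224 - 2**96 + 1

def jacobian_double(x, y, z, p=P224):
    """Double the point (x/z^2, y/z^3) using 5 squarings and 3 more mults."""
    zz = z * z % p                                 # squaring 1: z^2
    yy = y * y % p                                 # squaring 2: y^2
    mu = 3 * (x - zz) * (x + zz) % p               # mult 1: 3x^2 - 3z^4
    z2 = ((y + z) ** 2 - yy - zz) % p              # squaring 3: 2yz
    xyy = x * yy % p                               # mult 2: x y^2
    x2 = (mu * mu - 8 * xyy) % p                   # squaring 4: mu^2
    y2 = (mu * (4 * xyy - x2) - 8 * yy * yy) % p   # mult 3, squaring 5: y^4
    return x2, y2, z2

def affine_double(x, y, p=P224):
    """Reference doubling with one division, for comparison."""
    lam = (3 * x * x - 3) * pow(2 * y, p - 2, p) % p
    x2 = (lam * lam - 2 * x) % p
    return x2, (lam * (x - x2) - y) % p

def inv(v, p=P224):
    return pow(v, p - 2, p)

x, y, z = 5, 7, 11                                 # hypothetical Jacobian triple
xa, ya = x * inv(z * z) % P224, y * inv(z**3) % P224
X2, Y2, Z2 = jacobian_double(x, y, z)
print((X2 * inv(Z2 * Z2) % P224, Y2 * inv(Z2**3) % P224) == affine_double(xa, ya))
```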

SLIDE 14

Explicit addition formulas

Similar speedups in formulas for adding distinct points. 5 squarings, 11 more mults.

Again some opportunities to delay carries, etc.

SLIDE 15

Speedup: cache results

In adding $(x_1/z_1^2, y_1/z_1^3)$ to $(x_2/z_2^2, y_2/z_2^3)$, compute many intermediates, including $z_1^2, z_1^3$.

Often add same point again to a different point; can reuse $z_1^2, z_1^3$.

“Chudnovsky coordinates.”

SLIDE 16

Speedup: delay fewer divisions?

Faster divisions sometimes justify delaying fewer divisions.

e.g. Do we really need fractions for $P, 3P, 5P, 7P$?

Can convert $P, 3P, 5P, 7P$ out of Jacobian coordinates with one division, several mults. Then save mults in every addition of $P, 3P, 5P, 7P$.

“Mixed coordinates.” Sometimes worthwhile, depending on division speed.
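One standard way to convert several points with a single division is Montgomery's simultaneous-inversion trick; the slide does not say which method it has in mind, so treat the following Python sketch as an assumption. The function names and the sample triples are hypothetical.

```python
def batch_inverse(values, p):
    """Invert every value mod prime p (all nonzero) with one real division."""
    prefix = [1]
    for v in values:
        prefix.append(prefix[-1] * v % p)      # running products
    inv = pow(prefix[-1], p - 2, p)            # the single division
    out = [0] * len(values)
    for i in reversed(range(len(values))):
        out[i] = inv * prefix[i] % p           # 1 / values[i]
        inv = inv * values[i] % p              # drop values[i] from the product
    return out

def jacobian_to_affine(points, p):
    """Convert Jacobian triples (x, y, z) to affine pairs (x/z^2, y/z^3)."""
    zinvs = batch_inverse([z for _, _, z in points], p)
    result = []
    for (x, y, _), zi in zip(points, zinvs):
        zi2 = zi * zi % p
        result.append((x * zi2 % p, y * zi2 * zi % p))
    return result

p = 2**224 - 2**96 + 1
points = [(5, 7, 11), (2, 3, 4), (9, 8, 6)]    # hypothetical Jacobian triples
affine = jacobian_to_affine(points, p)
print(all(xa * z * z % p == x and ya * z**3 % p == y
          for (x, y, z), (xa, ya) in zip(points, affine)))
```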

SLIDE 17

Montgomery coordinates

On elliptic curves with “Montgomery form” $y^2 = x^3 + a_2 x^2 + x$, preferably with small $(a_2 - 2)/4$: $n(x_1, \ldots) = (x_n/z_n, \ldots)$ where

$z_1 = 1$;
$x_{2m} = (x_m^2 - z_m^2)^2$;
$z_{2m} = 4 x_m z_m (x_m^2 + a_2 x_m z_m + z_m^2)$;
$x_{2m+1} = 4(x_m x_{m+1} - z_m z_{m+1})^2$;
$z_{2m+1} = 4(x_m z_{m+1} - z_m x_{m+1})^2 x_1$.

Can also figure out $y$, or use cryptographic protocols that ignore $y$.
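The recurrences above translate almost line for line into a ladder that tracks the pair $(x_m/z_m, x_{m+1}/z_{m+1})$ while scanning the bits of $n$. The Python sketch below uses the Curve25519 parameters from earlier in the talk and the base $x_1 = 9$ as an example input. It evaluates the formulas literally, without the rearrangement that exploits small $(a_2 - 2)/4$, and it branches on the bits of $n$, so it is not constant-time and is for illustration only. The function names are assumptions.

```python
P = 2**255 - 19
A2 = 486662

def double(x, z):
    """(x_m, z_m) -> (x_2m, z_2m)."""
    xx, zz, xz = x * x % P, z * z % P, x * z % P
    return (xx - zz) ** 2 % P, 4 * xz * (xx + A2 * xz + zz) % P

def ladder(n, x1):
    """Return the x-coordinate x_n/z_n of n times a point with x-coordinate x1."""
    xm, zm = 1, 0          # m = 0: neutral element
    xm1, zm1 = x1, 1       # m + 1 = 1, with z_1 = 1
    for bit in bin(n)[2:]:
        # (2m+1)-th multiple from the m-th and (m+1)-th, whose difference is x1
        xs = 4 * (xm * xm1 - zm * zm1) ** 2 % P
        zs = 4 * (xm * zm1 - zm * xm1) ** 2 * x1 % P
        if bit == "0":     # (m, m+1) -> (2m, 2m+1)
            (xm, zm), (xm1, zm1) = double(xm, zm), (xs, zs)
        else:              # (m, m+1) -> (2m+1, 2m+2)
            (xm, zm), (xm1, zm1) = (xs, zs), double(xm1, zm1)
    return xm * pow(zm, P - 2, P) % P   # one division at the end

a, b = 1234567, 7654321
print(ladder(a, ladder(b, 9)) == ladder(b, ladder(a, 9)))
```

The final comparison is the Diffie-Hellman property: both parties reach the same $x$-coordinate without ever computing $y$.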
SLIDE 18

[Diagram: data flow of one ladder step. Inputs $x_m$, $z_m$, $x_{m+1}$, $z_{m+1}$; constants $(a_2 - 2)/4$ and $x_1$; outputs $x_{2m}$, $z_{2m}$, $x_{2m+1}$, $z_{2m+1}$.]
SLIDE 19

Assuming $(a_2 - 2)/4$ small, main operations are 4 squarings, 5 more mults for each bit of $n$.

Compare to Jacobian coordinates: each bit of $n$ has 5 squarings, 3 more mults, and on occasion 5 more squarings, 11 more mults.

Montgomery form is better if $n$ is not gigantic.
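The comparison can be made concrete with a rough per-bit cost model. The Python sketch below weights a squaring and a mult by the fp-op counts quoted in this talk for arithmetic mod $2^{255} - 19$ (162 and 243); applying those same weights to Jacobian coordinates is an assumption made purely for illustration, as is treating "on occasion" as a fixed rate of additions per bit.

```python
# Assumed weights: fp ops per squaring and per mult, as quoted for 2^255 - 19.
S, M = 162, 243

def montgomery_per_bit():
    """4 squarings and 5 more mults for each bit of n."""
    return 4 * S + 5 * M

def jacobian_per_bit(add_rate):
    """One doubling per bit plus add_rate additions per bit
    (0.25 for length-3 sliding windows)."""
    return (5 * S + 3 * M) + add_rate * (5 * S + 11 * M)

# Addition rate below which Jacobian coordinates would win under this model.
break_even = (montgomery_per_bit() - (5 * S + 3 * M)) / (5 * S + 11 * M)

print(montgomery_per_bit(), jacobian_per_bit(0.25), round(break_even, 3))
```

Under these assumptions Jacobian coordinates only catch up once additions are much rarer than one per four bits, which matches the remark that Montgomery form is better unless $n$ is gigantic.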
SLIDE 20

What are today’s speed records?

Let’s focus on Pentium M. Each Pentium M cycle does at most 1 floating-point operation: fp add or fp sub or fp mult.

Current scalar-multiplication software for $y^2 = x^3 + 486662x^2 + x$ over $\mathbf{Z}/(2^{255} - 19)$: 640838 Pentium M cycles. 589825 fp ops; $\approx 0.92$ per cycle.

Understand cycle counts fairly well by simply counting fp ops.

SLIDE 21

Main loop: 545700 fp ops. 2140 times 255 iterations.

Reciprocal: 43821 fp ops.
$41148 = 254 \cdot 162$ for 254 squares;
$2673 = 11 \cdot 243$ for 11 more mults.

Additional work: 304 fp ops.

Inside one main-loop iteration:
$80 = 8 \cdot 10$ for 8 adds/subs;
55 for mult by 121665;
$648 = 4 \cdot 162$ for 4 squarings;
$1215 = 5 \cdot 243$ for 5 more mults;
142 for $b\,x[1] + (1 - b)\,x[0]$ etc.
SLIDE 22

An integer mod $2^{255} - 19$ is represented in radix $2^{25.5}$ as a sum of 10 fp numbers in specified ranges.

Add/sub: 10 fp adds/subs. Delay reductions and carries!

Mult: poly mult using $10^2$ fp mults, $9^2$ fp adds; reduce using 9 fp mults, 9 fp adds; carry 11 times, each 4 fp adds; overall $2 \cdot 10^2 + 4 \cdot 10 + 3$ fp ops.

Squaring: first do 9 fp doublings; then eliminate $9^2 + 9$ fp ops; overall $1 \cdot 10^2 + 6 \cdot 10 + 2$ fp ops.
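The fp-op bookkeeping on the last three slides can be rechecked with a few lines of Python. Every number below comes from the slides; only the variable names are invented.

```python
# Field operations mod 2^255 - 19 with 10 floating-point limbs.
mult_ops = 10**2 + 9**2 + (9 + 9) + 11 * 4        # poly mult + reduce + carries
square_ops = mult_ops + 9 - (9**2 + 9)            # 9 doublings, then savings

# One main-loop iteration.
iteration_ops = 8 * 10 + 55 + 4 * square_ops + 5 * mult_ops + 142

main_loop_ops = iteration_ops * 255
reciprocal_ops = 254 * square_ops + 11 * mult_ops
total_ops = main_loop_ops + reciprocal_ops + 304  # 304 = additional work

cycles = 640838
print(mult_ops, square_ops, iteration_ops, total_ops, round(total_ops / cycles, 2))
```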
SLIDE 23

Course advertisement

“High-speed cryptography” at the Fields Institute, 36 hours, starting 23 Oct, ending 17 Nov.

What are the state-of-the-art cryptographic functions for sharing secrets, expanding keys, authenticating data, signing data? How fast are these functions in software for typical CPUs? What’s known about security? How were the functions chosen?

cr.yp.to/highspeed.html