Factoring Large Numbers with the TWIRL Device



slide-1
SLIDE 1
  • Factoring Large Numbers with the TWIRL Device

Adi Shamir, Eran Tromer

slide-2
SLIDE 2
  • Bicycle chain sieve [D. H. Lehmer, 1928]


slide-3
SLIDE 3
  • The Number Field Sieve

Integer Factorization Algorithm

  • Best algorithm known for factoring large integers.
  • Subexponential time, subexponential space.
  • Successfully factored a 512-bit RSA key in 1999 (hundreds of workstations running for many months).
  • Record: 530-bit integer factored in 2003.

slide-4
SLIDE 4
  • NFS: Main steps

  • Relation collection (sieving) step: Find many numbers satisfying a certain (rare) property.
  • Matrix step: Find a linear dependency among the numbers found.

slide-5
SLIDE 5
  • NFS: Main steps

  • Relation collection (sieving) step: Find many numbers satisfying a certain (rare) property. This work.
  • Matrix step: Find a linear dependency among the numbers found. Cost dramatically reduced by [Bernstein 2001], followed by [LSTT 2002] and [GS 2003].

slide-6
SLIDE 6
  • Cost of sieving for RSA-1024 in 1 year
  • Traditional PC-based: [Silverman 2000]

100M PCs with 170GB RAM each: $5×10^12

  • TWINKLE: [Lenstra,Shamir 2000][Silverman 2000]*

3.5M TWINKLEs and 14M PCs: ~$10^11

  • Mesh-based sieving: [Geiselmann,Steinwandt 2002]*

Millions of devices, $10^11 to $10^10 (if at all?). Multi-wafer design – feasible?

  • Our design: $10M using standard silicon technology (0.13 µm, 1 GHz).

slide-7
SLIDE 7
  • The Sieving Problem

Input: a set of arithmetic progressions. Each progression has a prime interval p and value log p.

[Figure: arithmetic progressions marked (O) along a line of indices.]

Output: indices where the sum of values exceeds a threshold.
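
The input/output specification above can be written down directly. The following is a minimal sketch, not the method of any device in these slides: the function name, the `(p, r)` encoding of a progression (indices r, r + p, r + 2p, ...), and the sample threshold are all assumptions made for illustration.

```python
import math

def sieve_by_definition(progressions, n_indices, threshold):
    """Return the indices in range(n_indices) whose summed values exceed threshold.

    progressions: list of (p, r) pairs; each pair stands for the indices
    r, r + p, r + 2p, ... and contributes the value log p to each of them.
    (The (p, r) encoding is an illustrative assumption.)
    """
    hits = []
    for a in range(n_indices):
        # Sum log p over every progression that contains index a.
        total = sum(math.log(p) for p, r in progressions
                    if a >= r and (a - r) % p == 0)
        if total > threshold:
            hits.append(a)
    return hits

# Hypothetical toy input: four progressions over 30 indices.
progressions = [(2, 0), (3, 1), (5, 2), (7, 3)]
print(sieve_by_definition(progressions, 30, threshold=2.5))
```

This brute-force form tests every index against every progression, which is exactly the work the sieving approaches in the following slides are designed to organize more cheaply.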

slide-8
SLIDE 8
  • 1024-bit NFS sieving parameters
  • Total number of indices to test: 3×10^23.
  • Each index should be tested against all primes up to 3.5×10^9.

slide-9
SLIDE 9
  • Three ways to sieve your numbers...

[Figure: grid of primes (2 through 41) against indices (1 through 24); each O marks a contribution of a prime's progression to an index.]

slide-10
SLIDE 10
  • PC-based sieving, à la Eratosthenes (276–194 BC)

[Figure: the primes-versus-indices grid, with axes labeled "Time" and "Memory".]

One contribution per clock cycle.
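
A rough software sketch of this style of sieving, under assumed names: one memory cell per index, and each progression is walked in turn with one addition per step. The `(p, r)` encoding (indices r, r + p, ...) and the toy parameters are illustrative, not taken from the slides.

```python
import math

def sieve_pc_style(progressions, n_indices, threshold):
    """Eratosthenes-style sieve: memory holds one accumulator per index."""
    memory = [0.0] * n_indices
    for p, r in progressions:
        # Stride through memory; each step is one contribution
        # (one "clock cycle" in the slide's terms).
        for a in range(r, n_indices, p):
            memory[a] += math.log(p)
    return [a for a, total in enumerate(memory) if total > threshold]

print(sieve_pc_style([(2, 0), (3, 1), (5, 2), (7, 3)], 30, threshold=2.5))
```

The cost pattern is visible in the loops: space grows with the number of indices, and time grows with the total number of contributions.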

slide-11
SLIDE 11
  • TWINKLE: time-space reversal

The Weizmann Institute Key Locating Engine [Shamir 99]

[Figure: the primes-versus-indices grid, with axes labeled "Counters" and "Time".]

One index handled at each clock cycle.
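
The reversal can be sketched in software as follows. This is only an analogy under stated assumptions: each progression gets its own countdown counter, each loop iteration plays the role of one clock cycle (one index), and the inner loop stands in for cells that would all operate simultaneously in hardware. Names and toy parameters are hypothetical.

```python
import math

def sieve_twinkle_style(progressions, n_indices, threshold):
    """One counter per progression; one index handled per clock tick."""
    # counters[i] = ticks remaining until progression i next contributes.
    counters = [r for p, r in progressions]
    hits = []
    for a in range(n_indices):            # tick number == index being tested
        total = 0.0
        for i, (p, _) in enumerate(progressions):
            if counters[i] == 0:
                total += math.log(p)      # this cell fires on the current tick
                counters[i] = p           # re-arm for the next multiple
            counters[i] -= 1
        if total > threshold:
            hits.append(a)
    return hits

print(sieve_twinkle_style([(2, 0), (3, 1), (5, 2), (7, 3)], 30, threshold=2.5))
```

Compared with the memory-per-index approach, space now grows with the number of progressions and time with the number of indices.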

slide-12
SLIDE 12
  • TWIRL: compressed time

The Weizmann Institute Relation Locator

[Figure: the primes-versus-indices grid, with axes labeled "Various circuits" and "Time".]

s=5 indices handled at each clock cycle. (real: s=32768)
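
A minimal sketch of time compression, assuming a simple software model: each simulated clock cycle handles a block of s consecutive indices, and a contribution to index a lands on line a mod s. The function name, the `(p, r)` encoding, and the toy parameters are assumptions; the actual circuits are not described at this level in the slides.

```python
import math

def sieve_twirl_style(progressions, n_indices, threshold, s=5):
    """Handle s consecutive indices per simulated clock cycle."""
    next_hit = [r for p, r in progressions]   # next index each progression reaches
    hits = []
    for base in range(0, n_indices, s):       # one clock cycle per block of s indices
        lines = [0.0] * s                     # one accumulator per pipeline line
        for i, (p, _) in enumerate(progressions):
            # A small p may contribute several times within one cycle.
            while next_hit[i] < base + s:
                lines[next_hit[i] - base] += math.log(p)   # equals index mod s
                next_hit[i] += p
        for offset, total in enumerate(lines):
            a = base + offset
            if a < n_indices and total > threshold:
                hits.append(a)
    return hits

print(sieve_twirl_style([(2, 0), (3, 1), (5, 2), (7, 3)], 30, threshold=2.5, s=5))
```

With s = 5 the 30 toy indices take 6 cycles instead of 30; the output is the same set of indices regardless of s.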

slide-13
SLIDE 13
  • Parallelization in TWIRL

TWINKLE-like pipeline: a = 0, 1, 2, ...

slide-14
SLIDE 14

Parallelization in TWIRL

  • TWINKLE-like pipeline: a = 0, 1, 2, ...
  • Simple parallelization with factor s: a = 0, s, 2s, ...
  • TWIRL with parallelization factor s: a = 0, s, 2s, ...

slide-15
SLIDE 15
  • Example (simplified): handling large primes
  • Each prime makes a contribution once per tens of thousands of clock cycles (after time compression); in between, it is merely stored compactly in DRAM.
  • Each memory+processor unit handles tens of thousands of progressions. It computes and sends contributions across the bus, where they are added at just the right time. Timing is critical.

[Figure: memory+processor units attached to the bus.]

slide-16
SLIDE 16
  • Handling large primes (cont.)

Memory Processor

slide-17
SLIDE 17
  • Implementing a priority queue of events
  • The memory contains a list of events of the form (p, a), meaning “a progression with interval p will make a contribution to index a”. Goal: implement a priority queue.
  • The list is ordered by increasing a.
  • At each clock cycle:
      1. Read next event (p, a).
      2. Send a log p contribution to line a (mod s) of the pipeline.
      3. Update a ← a + p.
      4. Save the new event (p, a) to the memory location that will be read just before index a passes through the pipeline.
  • To handle collisions, slacks and logic are added.
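
The four per-cycle steps can be sketched with an ordinary software priority queue. Note the substitution: Python's `heapq` is used here as a stand-in for the ordered memory list, and it is not how the hardware stores events. The function name and toy parameters are assumptions.

```python
import heapq
import math

def run_event_queue(progressions, n_indices, s):
    """Process events in increasing index order.

    progressions: list of (p, a) pairs, where a is the first index hit.
    Returns a list of (index, pipeline line, contribution) tuples.
    """
    # Key each event by a so the smallest pending index is read first.
    heap = [(a, p) for p, a in progressions]
    heapq.heapify(heap)
    emitted = []
    while heap and heap[0][0] < n_indices:
        a, p = heapq.heappop(heap)                 # 1. read next event (p, a)
        emitted.append((a, a % s, math.log(p)))    # 2. send log p to line a mod s
        heapq.heappush(heap, (a + p, p))           # 3-4. update a and save the event
    return emitted

for index, line, value in run_event_queue([(5, 2), (7, 3)], n_indices=20, s=4):
    print(index, line, round(value, 3))
```

A heap costs O(log n) per event; the slides instead place each saved event at the memory location that will be read at the right moment, which this sketch does not attempt to model.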
slide-18
SLIDE 18
  • Handling large primes (cont.)
  • The memory used by past events can be reused.
  • Think of the processor as rotating around the cyclic memory:

[Figure: a processor rotating around a cyclic memory.]
slide-19
SLIDE 19
  • Handling large primes (cont.)
  • The memory used by past events can be reused.
  • Think of the processor as rotating around the cyclic memory:
  • By assigning similarly sized primes to the same processor (and choosing the parameters appropriately), we guarantee that new events are always written just behind the read head.
  • There is a tiny (1:1000) window of activity which is “twirling” around the memory bank. It is handled by an SRAM-based cache. The bulk of storage is handled in compact DRAM.

[Figure: a processor rotating around a cyclic memory.]
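
A toy model of the cyclic memory, under several assumptions that are not in the slides: the memory is a ring of `ring_size` slots, the event for index a lives in slot a mod `ring_size`, every interval p is at most `ring_size`, and each slot is a Python list so that collisions need no special handling (the slides mention added slack and logic for this instead).

```python
import math

def cyclic_sieve(progressions, n_indices, ring_size):
    """Simulate a read head sweeping a ring of event slots, one slot per index.

    progressions: list of (p, a) with p <= ring_size and 0 <= a < ring_size.
    Returns (per-index sums, largest distance a write landed behind the head).
    """
    ring = [[] for _ in range(ring_size)]
    for p, a in progressions:
        assert p <= ring_size and 0 <= a < ring_size
        ring[a].append((p, a))
    sums = [0.0] * n_indices
    max_lag = 0
    for t in range(n_indices):
        slot = t % ring_size
        due, ring[slot] = ring[slot], []        # read the slot and free it for reuse
        for p, a in due:
            assert a == t                       # events surface exactly on time
            sums[t] += math.log(p)
            new_slot = (a + p) % ring_size
            ring[new_slot].append((p, a + p))   # rewritten behind the read head
            max_lag = max(max_lag, (slot - new_slot) % ring_size)
    return sums, max_lag

sums, max_lag = cyclic_sieve([(5, 2), (7, 3)], n_indices=40, ring_size=8)
print(max_lag)
```

In this model a progression with interval p is rewritten `ring_size - p` slots behind the head, so intervals close to `ring_size` keep the write position near the read position; this is one way to read the slide's remark about grouping similarly sized primes.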
slide-20
SLIDE 20

Rational vs. algebraic sieves

  • In fact, we need to perform two sieves: rational (expensive) and algebraic (even more expensive).
  • We are interested only in indices which pass both sieves.
  • We can use the results of the rational sieve to greatly reduce the cost of the algebraic sieve.

[Figure: diagram with regions labeled "rational" and "algebraic".]
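
One way to picture the saving is as a two-stage filter in software. This is only an analogy: the slides do not say how the hardware exploits the rational results, and every name, threshold, and progression below is hypothetical.

```python
import math

def sieve_sums(progressions, indices):
    """Sum of log p over the progressions (p, r) that contain each index."""
    return {a: sum(math.log(p) for p, r in progressions
                   if a >= r and (a - r) % p == 0)
            for a in indices}

def two_stage_sieve(rational, algebraic, n_indices, t_rat, t_alg):
    # Stage 1: run the rational sieve over every index.
    rat = sieve_sums(rational, range(n_indices))
    survivors = [a for a, total in rat.items() if total > t_rat]
    # Stage 2: examine only the survivors in the algebraic sieve.
    alg = sieve_sums(algebraic, survivors)
    return [a for a in survivors if alg[a] > t_alg]

print(two_stage_sieve([(2, 0), (3, 1)], [(5, 2), (7, 3)], 30, t_rat=1.5, t_alg=1.5))
```

Because few indices pass the first stage, the second stage touches only a small fraction of the index range.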

slide-21
SLIDE 21
  • Notes
  • TWIRL is a hypothetical and untested design.
  • It uses a highly fault-tolerant wafer-scale design.
  • The following analysis is based on approximations and simulations.

slide-22
SLIDE 22
  • TWIRL for 512-bit composites

One silicon wafer full of TWIRL devices (total cost ~$15,000) can complete the sieving in under 10 minutes. This is 1,600 times faster than the best previous design.

slide-23
SLIDE 23
  • TWIRL for 1024-bit composites
  • Operates in clusters of 3 almost independent wafers.
  • Initial investment (NRE): ~$20M
  • To complete the sieving in 1 year:
      • Use 194 clusters (~600 wafers).
      • Silicon cost: ~$2.9M
      • Total cost: ~$10M (compared to ~$1T).

[Figure: cluster diagram with one unit labeled "A" and eight units labeled "R".]
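
The wafer count and silicon cost on this slide can be cross-checked with a few lines of arithmetic. The per-wafer figure below is only what the slide's numbers imply, not a quoted price.

```python
clusters = 194
wafers_per_cluster = 3
silicon_cost = 2.9e6   # dollars, as stated on the slide

wafers = clusters * wafers_per_cluster
cost_per_wafer = silicon_cost / wafers

print(wafers)                  # exact count; the slide rounds it to ~600
print(round(cost_per_wafer))   # implied silicon cost per wafer, in dollars
```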
