Special Purpose Hardware for Factoring: Special Purpose Hardware for - PowerPoint PPT Presentation

Special Purpose Hardware for Factoring: the NFS Sieving Step
Adi Shamir, Eran Tromer
Weizmann Institute of Science

Bicycle chain sieve [D. H. Lehmer, 1928]

NFS: Main computational steps
• Relation collection (sieving) step: Find many relations. Presently dominates cost for 1024-bit composites. Subject of this survey.
• Matrix step: Find a linear relation between the corresponding exponent vectors. Cost dramatically reduced by mesh-based circuits. Surveyed in Adi Shamir’s talk.

Outline
• The relation collection problem
• Traditional sieving
• TWINKLE
• TWIRL
• Mesh-based sieving

The Relation Collection Step
The task: Given a polynomial f (and f′), find many integers a for which f(a) is B-smooth (and f′(a) is B′-smooth).
For 1024-bit composites (TWIRL settings):
• We need to test 3 × 10^23 sieve locations (per sieve).
• The values f(a) are on the order of 10^100.
• Each f(a) should be tested against all primes up to B = 3.5 × 10^9 (rational sieve) and B′ = 2.6 × 10^10 (algebraic sieve).

Sieveless Relation Collection
• We can just factor each f(a) using our favorite factoring algorithm for medium-sized composites, and see if all factors are smaller than B.
• By itself, highly inefficient. (But useful for cofactor factorization or Coppersmith’s NFS variants.)
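To make the sieveless approach concrete, here is a minimal sketch of a B-smoothness test. It uses plain trial division as a stand-in for "our favorite factoring algorithm"; the function name `is_smooth` and the toy polynomial are assumptions chosen for illustration, not part of the talk.

```python
def is_smooth(n: int, bound: int) -> bool:
    """Return True if every prime factor of |n| is at most `bound`."""
    n = abs(n)
    if n == 0:
        return False  # treat 0 as not smooth in this sketch
    d = 2
    while d <= bound and d * d <= n:
        while n % d == 0:  # strip every power of d
            n //= d
        d += 1
    # What remains is 1, a single prime, or a product of primes above the bound.
    return n <= bound


# Toy usage with a hypothetical polynomial f(a) = a^2 + 1 and bound B = 13.
smooth_locations = [a for a in range(1, 50) if is_smooth(a * a + 1, 13)]
print(smooth_locations)
```

Testing every location this way costs a full factoring attempt per candidate, which is the inefficiency that sieving avoids.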

Relation Collection via Sieving
• The task: Given a polynomial f (and f′), find many integers a for which f(a) is B-smooth (and f′(a) is B′-smooth).
• We look for a such that p | f(a) for many large p.
• Each prime p “hits” at arithmetic progressions { a : a ≡ r_i (mod p) } = { r_i + kp : k ∈ ℤ }, where r_i are the roots modulo p of f (there are at most deg(f) such roots, ~1 on average).
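The link between roots modulo p and arithmetic progressions can be checked on a toy case. The sketch below finds roots by brute force, which is only sensible for tiny primes; the helper names and the example polynomial f(a) = a^2 + 1 are assumptions made for illustration.

```python
def poly_eval(coeffs, a):
    """Evaluate f(a) by Horner's rule; coeffs are listed highest degree first."""
    value = 0
    for c in coeffs:
        value = value * a + c
    return value


def roots_mod_p(coeffs, p):
    """Brute-force roots of f modulo a small prime p (illustration only)."""
    return [r for r in range(p) if poly_eval(coeffs, r) % p == 0]


def progression(r, p, limit):
    """Members of {r + k*p : k >= 0} that lie below `limit`."""
    return list(range(r, limit, p))


f = [1, 0, 1]  # hypothetical example: f(a) = a^2 + 1
for p in (2, 3, 5, 13):
    for r in roots_mod_p(f, p):
        # Every member of the progression is a location where p divides f(a).
        assert all(poly_eval(f, a) % p == 0 for a in progression(r, p, 100))
```

For this f, the prime 3 has no roots and therefore never contributes, while 5 and 13 each contribute two progressions.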

The Sieving Problem
Input: a set of arithmetic progressions. Each progression has a prime interval p and value log p.
Output: indices where the sum of values exceeds a threshold.
[Figure: contributions (O) marked along the a axis]

The Game Board
[Figure: grid with sieve locations (a values) 0 to 24 on the horizontal axis and the arithmetic progressions for the primes 2 to 41 on the vertical axis; an O marks each location hit by a progression]

Traditional PC-based sieving [Eratosthenes of Cyrene, 276–194 BC] [Carl Pomerance, Richard Schroeppel]

PC-based sieving
2. Assign one memory location to each candidate number in the interval.
3. For each arithmetic progression:
• Go over the members of the arithmetic progression in the interval, and for each:
• Add the log p value to the appropriate memory location.
4. Scan the array for values passing the threshold.
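The array-based procedure above can be sketched in a few lines. This is a minimal illustration, assuming each progression is given as a pair (p, r) meaning "p divides f(a) whenever a ≡ r (mod p)"; the function name and the toy inputs are not from the talk.

```python
import math


def sieve_interval(progressions, length, threshold):
    """Return the indices in [0, length) whose summed log p exceeds `threshold`."""
    scores = [0.0] * length  # one memory location per candidate a
    for p, r in progressions:
        log_p = math.log(p)
        for a in range(r % p, length, p):  # walk the progression inside the interval
            scores[a] += log_p
    # Final scan of the whole array.
    return [a for a, score in enumerate(scores) if score > threshold]


# Toy usage: three progressions through 0; the threshold sits just below log 30.
print(sieve_interval([(2, 0), (3, 0), (5, 0)], 31, math.log(30) - 0.01))
```

Each inner-loop step touches a different array entry, so for large p consecutive accesses are far apart in memory, which is the cache behaviour criticized on the next slides.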

Traditional sieving, à la Eratosthenes
[Figure: the game board with the horizontal axis (sieve locations 0 to 24) labeled “Memory” and the vertical axis (progressions for primes 2 to 41) labeled “Time”]

Properties of traditional PC-based sieving:
• Handles (at most) one contribution per clock cycle.
• Requires PCs with enormously large RAM.
• For large p, almost every memory access is a cache miss.

Estimated recurring costs with current technology (US$ × year)
                        768-bit       1024-bit
Traditional PC-based    1.3 × 10^7    10^12

TWINKLE (The Weizmann INstitute Key Locating Engine) [Shamir 1999] [Lenstra, Shamir 2000]

TWINKLE: An electro-optical sieving device
• Reverses the roles of time and space: assigns each arithmetic progression to a small “cell” on a GaAs wafer, and considers the sieved locations one at a time.
• A cell handling a prime p flashes an LED once every p clock cycles.
• The strength of the observed flash is determined by a variable density optical filter placed over the wafer.
• Millions of potential contributions are optically summed and then compared to the desired threshold by a fast photodetector facing the wafer.
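The time-space reversal can be mimicked in software: one object per progression, one loop iteration per sieve location. This sketch only models the arithmetic; the `Cell` class, the countdown counter, and the (p, r) input format are assumptions for illustration, and the sum over cells stands in for the optical summation.

```python
import math


class Cell:
    """One progression: emits intensity log p once every p clock cycles."""

    def __init__(self, p, r):
        self.p = p
        self.intensity = math.log(p)  # plays the role of the optical filter
        self.countdown = r % p        # cycles remaining until the first flash

    def tick(self):
        """Advance one clock cycle and return the light emitted during it."""
        if self.countdown == 0:
            self.countdown = self.p - 1
            return self.intensity
        self.countdown -= 1
        return 0.0


def twinkle(progressions, length, threshold):
    cells = [Cell(p, r) for p, r in progressions]
    hits = []
    for a in range(length):  # time plays the role of the sieve location
        total = sum(cell.tick() for cell in cells)  # all cells summed at once
        if total > threshold:  # the photodetector comparison
            hits.append(a)
    return hits


print(twinkle([(2, 0), (3, 0), (5, 0)], 31, math.log(30) - 0.01))
```

In software the sum over cells costs time proportional to the number of cells; the point of the optical device is that this summation happens within a single clock cycle.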

Breaking News: Exclusive photos of a working TWINKLE device in this very city!

[Figure labels: photo-emitting cells (every round hour); concave mirror; optical sensor]

TWINKLE: time-space reversal
[Figure: the game board with the horizontal axis (sieve locations 0 to 24) labeled “Time” and the vertical axis (progressions for primes 2 to 41) labeled “Counters”]

Estimated recurring costs with current technology (US$ × year)
                        768-bit       1024-bit
Traditional PC-based    1.3 × 10^7    10^12
TWINKLE                 8 × 10^6
But: NRE…

Properties of TWINKLE:
• Takes a single clock cycle per sieve location, regardless of the number of contributions.
• Requires complicated and expensive GaAs wafer-scale technology.
• Dissipates a lot of heat since each (continuously operating) cell is associated with a single arithmetic progression.
• Limited number of cells per wafer.
• Requires auxiliary support PCs, which turn out to dominate cost.

TWIRL (The Weizmann Institute Relation Locator) [Shamir, Tromer 2003] [Lenstra, Tromer, Shamir, Kortsmit, Dodson, Hughes, Leyland 2004]

TWIRL: TWINKLE with compressed time
• Uses the same time-space reversal as TWINKLE.
• Uses a pipeline (skewed local processing) instead of electro-optical phenomena (instantaneous global processing).
• Uses compact representations of the progressions (but requires more complicated logic to “decode” these representations).
• Runs 3-4 orders of magnitude faster than TWINKLE by parallelizing the handling of sieve locations: “compressed time”.

TWIRL: compressed time
s = 5 indices handled at each clock cycle. (real: s = 32768)
[Figure: the game board with the horizontal axis (sieve locations 0 to 24) labeled “Time” and the vertical axis (progressions for primes 2 to 41) labeled “Various circuits”]


Parallelization in TWIRL
• TWINKLE-like pipeline: a = 0, 1, 2, …
• Simple parallelization with factor s: a = 0, s, 2s, …
• TWIRL with parallelization factor s: a = 0, s, 2s, …
[Figure: the sieve locations arranged in rows of s consecutive indices: 0 … s−1, then s … 2s−1, then 2s … 3s−1]
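A rough software picture of "compressed time": each clock cycle covers a block of s consecutive sieve locations, one per pipeline line. The sketch below models only which line receives which contribution; it does not model the systolic pipeline itself, and the function name and (p, r) input format are assumptions for illustration.

```python
import math


def twirl_like(progressions, length, threshold, s):
    """Process s sieve locations per clock cycle; return the indices above threshold."""
    state = [[p, r % p] for p, r in progressions]  # [interval, next index to hit]
    hits = []
    cycles = (length + s - 1) // s
    for t in range(cycles):            # one iteration = one clock cycle
        lines = [0.0] * s              # one adder per pipeline line
        block_end = (t + 1) * s
        for entry in state:
            p, a = entry
            while a < block_end:       # a small p may hit several lines in one cycle
                lines[a % s] += math.log(p)
                a += p
            entry[1] = a               # remember where this progression hits next
        for line, total in enumerate(lines):
            a = t * s + line
            if a < length and total > threshold:
                hits.append(a)
    return hits


print(twirl_like([(2, 0), (3, 0), (5, 0)], 31, math.log(30) - 0.01, s=5))
```

With s = 1 this degenerates to the one-location-per-cycle TWINKLE schedule; larger s gives the same output in proportionally fewer cycles.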

Heterogeneous design
• A progression of interval p makes a contribution every p/s clock cycles.
• There are a lot of large primes, but each contributes very seldom.
• There are few small primes, but their contributions are frequent.
[Figure: pattern of contributions (O), dense for small primes and sparse for large primes]
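A quick calculation shows how wide the gap between small and large intervals is. The value s = 32768 is the one quoted on the "compressed time" slide; the sample intervals below are arbitrary illustrative sizes (not necessarily primes), and the function names are assumptions.

```python
def cycles_between_contributions(p: int, s: int) -> float:
    """Average number of clock cycles between contributions of an interval-p progression."""
    return p / s


def contributions_per_cycle(p: int, s: int) -> float:
    """Average number of contributions per clock cycle (the reciprocal view)."""
    return s / p


S = 32768  # parallelization factor quoted for the real device
for p in (3, 1_000_000, 3_500_000_000):  # illustrative interval sizes, small to large
    print(p, cycles_between_contributions(p, S), contributions_per_cycle(p, S))
```

An interval near 3 contributes thousands of times per cycle, while an interval near the bound B contributes roughly once per hundred thousand cycles, which is why a single station design cannot serve both.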

[Figure: small primes (few but bright); large primes (many but dark)]

Heterogeneous design
We place several thousand “stations” along the pipeline. Each station handles progressions whose prime intervals are in a certain range. Station design varies with the magnitude of the prime.

Example: handling large primes
• Each prime makes a contribution once per tens of thousands of clock cycles (after time compression); in between, it is merely stored compactly in DRAM.
• Each memory+processor unit handles many progressions. It computes and sends contributions across the bus, where they are added at just the right time. Timing is critical.
[Figure: processor and memory units attached to the pipeline bus]

Implementing a priority queue of events
• The memory contains a list of events of the form (p_i, a_i), meaning “a progression with interval p_i will make a contribution to index a_i”.
• Goal: implement a priority queue. The list is ordered by increasing a_i.
• At each clock cycle:
  1. Read next event (p_i, a_i).
  2. Send a log p_i contribution to line a_i (mod s) of the pipeline.
  3. Update a_i ← a_i + p_i.
  4. Save the new event (p_i, a_i) to the memory location that will be read just before index a_i passes through the pipeline.
• To handle collisions, slacks and logic are added.
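The four steps of the event loop can be sketched with a binary heap standing in for the hardware memory layout. The heap is a software substitute, not the mechanism described in the talk; the function name and the (p, r) input format are assumptions for illustration.

```python
import heapq
import math


def run_events(progressions, length, s):
    """Return a list of (a_i, line, log p_i) contributions in order of increasing a_i."""
    events = [(r % p, p) for p, r in progressions]  # stored as (a_i, p_i), keyed by a_i
    heapq.heapify(events)
    contributions = []
    while events and events[0][0] < length:
        a, p = heapq.heappop(events)                    # 1. read the next event
        contributions.append((a, a % s, math.log(p)))   # 2. log p_i goes to line a_i mod s
        heapq.heappush(events, (a + p, p))              # 3-4. update a_i and store the event
    return contributions


for a, line, value in run_events([(5, 2), (7, 3)], 20, s=4):
    print(a, line, round(value, 3))
```

A heap costs a logarithmic number of memory operations per event. The scheme on the slide instead writes each event directly to the location that will be read at the right moment, which is what the slack and collision logic support.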


Handling large primes (cont.)
• The memory used by past events can be reused.
• Think of the processor as rotating around the cyclic memory. [Figure: processor rotating around a cyclic memory]
• By assigning similarly-sized primes to the same processor (+ appropriate choice of parameters), we guarantee that new events are always written just behind the read head.
• There is a tiny (1:1000) window of activity which is “twirling” around the memory bank. It is handled by an SRAM-based cache. The bulk of storage is handled in compact DRAM.
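The cyclic-memory idea resembles a timing wheel, which can be sketched as follows. This is a simplified model under stated assumptions: the memory has `size` slots, every interval p is smaller than `size`, and each slot holds a small list that plays the role of the "slack". The function name and parameters are hypothetical, and the SRAM/DRAM split is not modeled.

```python
def cyclic_memory_run(progressions, length, size):
    """Return (a, p) contributions using a cyclic memory of `size` slots."""
    if not all(0 < p < size for p, _ in progressions):
        raise ValueError("every interval must be smaller than the memory size")
    slots = [[] for _ in range(size)]
    for p, r in progressions:
        slots[r % p].append((p, r % p))      # first hit lies within the first rotation
    hits = []
    for a in range(length):                  # the read head advances one slot per index
        head = a % size
        due, slots[head] = slots[head], []   # the slot is freed: past events are reusable
        for p, _ in due:
            hits.append((a, p))
            # Since 0 < p < size, the target slot is never the head itself and is
            # next read exactly when index a + p comes up. When p is close to
            # `size`, that slot lies just behind the read head.
            slots[(a + p) % size].append((p, a + p))
    return hits


print(cyclic_memory_run([(5, 2), (7, 3)], 20, size=8))
```

With intervals 5 and 7 in a memory of 8 slots, each rescheduled event lands one or three slots behind the head, mirroring the claim that similarly-sized primes keep the writes close to the read position.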
