Hardware-Based Implementations of Factoring Algorithms (PowerPoint presentation)


SLIDE 1
  • Hardware-Based Implementations of Factoring Algorithms

Factoring Large Numbers with the TWIRL Device

Adi Shamir, Eran Tromer

Analysis of Bernstein’s Factorization Circuit

Arjen Lenstra, Adi Shamir, Jim Tomlinson, Eran Tromer

SLIDE 2
  • Bicycle chain sieve [D. H. Lehmer, 1928]

SLIDE 3
  • The Number Field Sieve

Integer Factorization Algorithm

  • Best algorithm known for factoring large integers.
  • Subexponential time, subexponential space.
  • Successfully factored a 512-bit RSA key

(hundreds of workstations running for many months).

  • Record: 530-bit integer (RSA-160, 2003).
  • Factoring 1024-bit: previous estimates were trillions of $×year.

  • Our result: a hardware implementation which can factor 1024-bit composites at a cost of about 10M $×year.

SLIDE 4
  • NFS – main parts
  • Relation collection (sieving) step:

Find many integers satisfying a certain (rare) property.

  • Matrix step:

Find an element from the kernel of a huge but sparse matrix.

SLIDE 5
  • Previous works: 1024-bit sieving

Cost of completing all sieving in 1 year:

  • Traditional PC-based [Silverman 2000]: 100M PCs with 170GB RAM each: $5×10¹²
  • TWINKLE [Lenstra,Shamir 2000, Silverman 2000]*: 3.5M TWINKLEs and 14M PCs: ~$10¹¹
  • Mesh-based sieving [Geiselmann,Steinwandt 2002]*: millions of devices, $10¹¹ to $10¹⁰ (if at all?). Multi-wafer design – feasible?
  • New device: $10M
SLIDE 6
  • Previous works: 1024-bit matrix step

Cost of completing the matrix step in 1 year:

  • Serial [Silverman 2000]: 19 years and 10,000 interconnected Crays.
  • Mesh sorting [Bernstein 2001, LSTT 2002]: 273 interconnected wafers – feasible?! $4M and 2 weeks.
  • New device: $0.5M
SLIDE 7
  • Review: the Quadratic Sieve

To factor n:

  • Find “random” r₁, r₂ such that r₁² ≡ r₂² (mod n).
  • Hope that gcd(r₁−r₂, n) is a nontrivial factor of n. How?
  • Let f₁(a) = (a+⌊√n⌋)² − n and f₂(a) = a+⌊√n⌋.
  • Find a nonempty set S⊂Z such that ∏_{a∈S} f₁(a) = r₁² and ∏_{a∈S} f₂(a) = r₂ over Z for some r₁, r₂ ∈ Z.
  • Then r₁² ≡ r₂² (mod n).
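The congruence-of-squares step above can be checked with a toy example (a minimal Python sketch; the modulus 91 and the pair r1=10, r2=3 are illustrative values, not from the talk):

```python
from math import gcd

# Toy instance of the slide's congruence of squares: if r1^2 ≡ r2^2 (mod n)
# and r1 ≢ ±r2 (mod n), then gcd(r1 - r2, n) is a nontrivial factor of n.
n = 91                # = 7 * 13
r1, r2 = 10, 3        # 10^2 = 100 ≡ 9 = 3^2 (mod 91)
assert (r1 * r1 - r2 * r2) % n == 0

f = gcd(r1 - r2, n)
print(f, n // f)      # → 7 13
```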

SLIDE 8
  • The Quadratic Sieve (cont.)

How to find S such that ∏_{a∈S} f₁(a) is a square? Look at the factorization of f₁(a):

  145  = 5 · 29
  616  = 2³ · 7 · 11
  42   = 2 · 3 · 7
  84   = 2² · 3 · 7
  1495 = 5 · 13 · 23
  33   = 3 · 11
  102  = 2 · 3 · 17

616 · 42 · 33 = 2⁴ · 3² · 7² · 11². This is a square, because all exponents are even.

SLIDE 9
  • The Quadratic Sieve (cont.)

How to find S such that ∏_{a∈S} f₁(a) is a square?

  • Consider only the π(B) primes smaller than a bound B.
  • Search for integers a for which f₁(a) is B-smooth. For each such a, represent the factorization of f₁(a) as a vector of b exponents: f₁(a) = 2^e₁·3^e₂·5^e₃·7^e₄···  ↦  (e₁, e₂, ..., e_b)
  • Once b+1 such vectors are found, find a dependency modulo 2 among them. That is, find S such that ∏_{a∈S} f₁(a) = 2^e₁·3^e₂·5^e₃·7^e₄··· where the eᵢ are all even.
  • Finding smooth f₁(a) is the relation collection step; finding the dependency is the matrix step.
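The search for S can be sketched in Python using the example values from the previous slide. A real implementation finds the dependency with linear algebra over GF(2) (the matrix step), not brute force; the small factor base and the exhaustive search here are illustrative:

```python
from itertools import combinations

# Represent each smooth value by its exponent vector over a small factor
# base, then search for a subset whose exponent sums are all even, i.e.
# whose product is a perfect square.
primes = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]

def exponent_vector(m):
    """Exponents of m over the factor base, or None if m is not smooth."""
    vec = []
    for p in primes:
        e = 0
        while m % p == 0:
            m //= p
            e += 1
        vec.append(e)
    return vec if m == 1 else None

values = [145, 616, 42, 84, 1495, 33, 102]
vectors = {v: exponent_vector(v) for v in values}

def find_square_subset(values):
    for k in range(2, len(values) + 1):
        for subset in combinations(values, k):
            sums = [sum(vectors[v][i] for v in subset)
                    for i in range(len(primes))]
            if all(e % 2 == 0 for e in sums):
                return subset
    return None

print(find_square_subset(values))   # → (616, 42, 33); 616*42*33 = 924**2
```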

SLIDE 10

Observations

[Bernstein 2001]

  • The matrix step involves multiplication of a single huge

matrix (of size subexponential in n) by many vectors.

  • On a single-processor computer, storage dominates cost

yet is poorly utilized.

  • Sharing the input: collisions, propagation delays.
  • Solution: use a mesh-based device, with a small

processor attached to each storage cell. Devise an appropriate distributed algorithm. Bernstein proposed an algorithm based on mesh sorting.

  • Asymptotic improvement: at a given cost you can factor integers that are 1.17× longer, when cost is defined as throughput cost = run time × construction cost (AT cost).

SLIDE 11
  • Implications?
  • The expressions for asymptotic costs have the form e^((c+o(1))·(log n)^(1/3)·(log log n)^(2/3)).
  • Is it feasible to implement the circuits with

current technology? For what problem sizes?

  • Constant-factor improvements to the

algorithm? Take advantage of the quirks of available technology?

  • What about relation collection?
SLIDE 12
  • The Relation Collection Step
  • Task: find many integers a for which f(a) is B-smooth (and their factorization).
  • We look for a such that p | f(a) for many large p:
  • Each prime p “hits” at arithmetic progressions: a ≡ rᵢ (mod p), where the rᵢ are the roots modulo p of f (there are at most deg(f) such roots, ~1 on average).

SLIDE 13

The Sieving Problem

Input: a set of arithmetic progressions. Each progression has a prime interval p and a value log p.

Output: indices where the sum of values exceeds a threshold.

(There is about one progression for every prime smaller than 10⁸.)
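The sieving problem as stated has a direct serial software solution, sketched below; the progressions and the threshold are illustrative values:

```python
from math import log

# Each progression (p, r) contributes log(p) at indices r, r+p, r+2p, ...
# Output: indices whose accumulated value exceeds the threshold.
def sieve(length, progressions, threshold):
    acc = [0.0] * length
    for p, r in progressions:
        for a in range(r, length, p):
            acc[a] += log(p)
    return [a for a, v in enumerate(acc) if v > threshold]

progressions = [(2, 0), (3, 1), (5, 2), (7, 0), (11, 3)]
print(sieve(30, progressions, threshold=3.0))   # indices passing the threshold
```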

SLIDE 14
  • Three ways to sieve your numbers...

[Diagram: a grid of contributions, one row per prime (2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41) and one column per sieve index (1–24); each O marks an index hit by that prime’s progression.]

SLIDE 15
  • Serial sieving, à la Eratosthenes (276–194 BC)

[Diagram: the same primes-vs-indices grid, traversed one prime at a time; indices along the Time axis, primes held in Memory.]

One contribution per clock cycle.

SLIDE 16
  • TWINKLE: time-space reversal

[Diagram: the same grid, now traversed one index at a time along the Time axis, with Counters accumulating each index’s contributions.]

One index handled at each clock cycle.

SLIDE 17
  • TWIRL: compressed time

[Diagram: the same grid, with various circuits processing s consecutive indices in parallel.]

s=5 indices handled at each clock cycle. (real: s=32768)

SLIDE 18

Parallelization in TWIRL

TWINKLE-like pipeline

[Diagram: a pipeline scanning sieve indices a one at a time.]

SLIDE 19
  • Parallelization in TWIRL

[Diagram comparing three designs: a TWINKLE-like pipeline handling one index per clock cycle; a simple parallelization with factor s; and TWIRL with parallelization factor s, handling s indices per clock cycle.]

SLIDE 20

Heterogeneous design

  • A progression of interval p makes a contribution every p/s clock cycles.
  • There are a lot of large primes, but each contributes very seldom.
  • There are few small primes, but their contributions are frequent.

We place numerous “stations” along the pipeline. Each station handles progressions whose prime intervals are in a certain range. Station design varies with the magnitude of the prime.
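The station partitioning described above can be sketched numerically; the parallelization factor and station boundaries below are illustrative choices, not TWIRL’s actual parameters:

```python
# At parallelization factor s, a progression of interval p contributes
# once every p/s clock cycles: small primes fire constantly, large ones
# almost never, which motivates a different station design per range.
s = 4096

def cycles_between_contributions(p):
    return p / s

def station(p):
    if p < s:                 # more than one contribution per cycle
        return "small-prime station"
    elif p < 256 * s:         # a contribution every few cycles
        return "medium-prime station"
    else:                     # mostly idle: event list in compact memory
        return "large-prime station"

for p in [257, 65_537, 2_000_003]:
    print(p, station(p), round(cycles_between_contributions(p), 3))
```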

SLIDE 21
  • Example: handling large primes
  • Primary consideration: efficient storage between contributions.
  • Each memory+processor unit handles many progressions. It computes and sends contributions across the bus, where they are added at just the right time. Timing is critical.

[Diagram: Memory+Processor units attached to the pipeline bus.]

SLIDE 22
  • Handling large primes (cont.)

[Diagram: a Memory bank paired with a Processor.]

SLIDE 23
  • Handling large primes (cont.)
  • The memory contains a list of events of the form (pᵢ, aᵢ), meaning “a progression with interval pᵢ will make a contribution to index aᵢ”. Goal: simulate a priority queue.
  • The list is ordered by increasing aᵢ.
  • At each clock cycle:
  • 1. Read the next event (pᵢ, aᵢ).
  • 2. Send a log pᵢ contribution to line aᵢ (mod s) of the pipeline.
  • 3. Update aᵢ ← aᵢ + pᵢ.
  • 4. Save the new event (pᵢ, aᵢ) to the memory location that will be read just before index aᵢ passes through the pipeline.
  • To handle collisions, slacks and logic are added.
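The per-cycle loop amounts to simulating a priority queue keyed by the next contribution index. A minimal software model, using Python’s heapq in place of the ordered cyclic memory (the primes, s, and starting indices are illustrative):

```python
import heapq
from math import log

# Each event (a, p) means "a progression of interval p contributes at
# index a". Pop the earliest event, emit log(p) to pipeline line a mod s,
# then re-insert the event at a + p, keeping the list ordered by a.
def run_large_prime_station(primes, s, limit):
    events = [(p, p) for p in primes]   # first contribution of each p (illustrative start)
    heapq.heapify(events)
    contributions = []                  # (line a mod s, index a, value log p)
    while events and events[0][0] < limit:
        a, p = heapq.heappop(events)                 # 1. read next event
        contributions.append((a % s, a, log(p)))     # 2. contribute to line a (mod s)
        heapq.heappush(events, (a + p, p))           # 3.+4. update a <- a+p, re-save
    return contributions

out = run_large_prime_station([101, 103, 107], s=32, limit=400)
print([a for _, a, _ in out])   # contribution indices, in increasing order
```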
SLIDE 24
  • Handling large primes (cont.)
  • The memory used by past events can be reused.
  • Think of the processor as rotating around the cyclic memory:

[Diagram: a Processor rotating around a circular memory bank.]
SLIDE 25
  • Handling large primes (cont.)
  • The memory used by past events can be reused.
  • Think of the processor as rotating around the cyclic memory:
  • By appropriate choice of parameters, we guarantee that new events are always written just behind the read head.
  • There is a tiny (1:1000) window of activity which is “twirling” around the memory bank. It is handled by an SRAM-based cache. The bulk of storage is handled in compact DRAM.

[Diagram: a Processor rotating around a circular memory bank.]
SLIDE 26
  • Rational vs. algebraic sieves
  • We actually have two sieves: rational and algebraic.

We are looking for the indices that accumulated enough value in both sieves.

  • The algebraic sieve has many more progressions,

and thus dominates cost.

  • We cannot compensate by making s much larger,

since the pipeline becomes very wide and the device exceeds the capacity of a wafer.

[Diagram: rational and algebraic sieve pipelines side by side.]

SLIDE 27
  • Optimization: cascaded sieves
  • The algebraic sieve will consider only the indices that passed the rational sieve.
  • In the algebraic sieve, we still scan the indices at a rate of thousands per clock cycle, but only a few of these have to be considered. ⇒ much narrower bus; s increased to 32,768.

[Diagram: the rational sieve feeding the algebraic sieve.]
SLIDE 28

Performance

  • Asymptotically: speedup of … compared to traditional sieving.

  • For 512-bit composites:

One silicon wafer full of TWIRL devices completes the sieving in under 10 minutes (0.00022sec per sieve line of length 1.8×1010). 1,600 times faster than best previous design.

  • Larger composites?
SLIDE 29
  • Estimating NFS parameters
  • Predicting cost requires estimating the NFS

parameters (smoothness bounds, sieving area, frequency of candidates etc.).

  • Methodology:

[Lenstra,Dodson,Hughes,Kortsmit,Leyland 2003]

  • Find good NFS polynomials for the RSA-1024 and

RSA-768 composites.

  • Analyze and optimize relation yield for these

polynomials according to smoothness probability functions.

  • Hope that cycle yield, as a function of relation

yield, behaves similarly to past experiments.

SLIDE 30

1024-bit NFS sieving parameters

  • Smoothness bounds:
  • Rational: 3.5×10⁹
  • Algebraic: 2.6×10¹⁰
  • Region:
  • a∈{−5.5×10¹⁴,…,5.5×10¹⁴}
  • b∈{1,…,2.7×10⁸}
  • Total: 3×10²³ (×6/π²)

SLIDE 31
  • TWIRL for 1024-bit composites
  • A cluster of 9 TWIRLs can process a sieve line (10¹⁵ indices) in 34 seconds.
  • To complete the sieving in 1 year, use 194 clusters.
  • Initial investment (NRE): ~$20M
  • After NRE, total cost of sieving for a given 1024-bit composite: ~10M $×year (compared to ~1T $×year).

[Diagram: one algebraic TWIRL (A) connected to eight rational TWIRLs (R).]
SLIDE 32
  • The matrix step

We look for elements from the kernel of a sparse matrix over GF(2). Using Wiedemann’s algorithm, this is reduced to the following:

  • Input: a sparse D×D binary matrix A and a binary D-vector v.
  • Output: the first few bits of each of the vectors Av, A²v, A³v, ..., A^D v (mod 2).
  • D is huge (e.g., ≈10⁹).
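The required input/output can be reproduced directly in software for a tiny D (the matrix, vector, and D below are illustrative; in NFS, D is huge and this product sequence is exactly what the mesh circuit accelerates):

```python
import random

random.seed(1)
D = 8
# Sparse D x D binary matrix: for each row, the columns of its 1-entries.
rows = [random.sample(range(D), 3) for _ in range(D)]
v = [random.randrange(2) for _ in range(D)]

def mat_vec_mod2(rows, v):
    # Each output bit is the parity of the v-entries selected by the row.
    return [sum(v[j] for j in cols) % 2 for cols in rows]

first_bits = []
w = v
for _ in range(D):                # compute Av, A^2 v, ..., A^D v (mod 2)
    w = mat_vec_mod2(rows, w)
    first_bits.append(w[0])       # keep the first bit of each product
print(first_bits)
```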
SLIDE 33
  • The matrix step (cont.)
  • Bernstein proposed a parallel algorithm for

sparse matrix-by-vector multiplication with asymptotic speedup

  • Alas, for the parameters of choice it is inferior

to straightforward PC-based implementation.

  • We give a different algorithm which reduces

the cost by a constant factor of 45,000.

SLIDE 34
  • Matrix-by-vector multiplication

[Diagram: a sparse 0/1 matrix multiplied by a 0/1 vector; each output bit is the sum mod 2 of the matrix entries in its row selected by the vector.]

SLIDE 35
  • A routing-based circuit for the matrix step

[Lenstra,Shamir,Tomlinson,Tromer 2002]

[Diagram: row indices of the non-zero matrix entries laid out in the mesh cells.]

Model: two-dimensional mesh, nodes connected to 4 neighbours. Preprocessing: load the non-zero entries of the matrix into the mesh, one entry per node. The entries of each column are stored in a square block of the mesh, along with a “target cell” for the corresponding vector bit.

SLIDE 36
  • Operation of the routing-based circuit

To perform a multiplication:

  • Initially the target cells contain the vector bits. These are locally broadcast within each block (i.e., within the matrix column).
  • A cell containing a row index i that receives a “1” emits an ⟨i⟩ value (which corresponds to a 1 at row i).
  • Each ⟨i⟩ value is routed to the target cell of the i-th block (which is collecting ⟨i⟩’s for row i).
  • Each target cell counts the number of ⟨i⟩ values it received.
  • That’s it! Ready for next iteration.

SLIDE 37

How to perform the routing?

Routing dominates cost, so the choice of algorithm (time, circuit area) is critical. There is extensive literature about mesh routing. Examples:

  • Bounded-queue-size algorithms
  • Hot-potato routing
  • Off-line algorithms

None of these are ideal.

SLIDE 38
  • Clockwise transposition routing on the mesh
  • One packet per cell.
  • Only pairwise compare-exchange operations.
  • Compared pairs are swapped according to the preference of the packet that has the farthest to go along this dimension.
  • Very simple schedule, can be realized implicitly by a pipeline.
  • Pairwise annihilation.
  • Worst-case: m²
  • Average-case: ?
  • Experimentally: 2m steps suffice for random inputs – optimal.
  • The point: m² values handled in O(m) time. [Bernstein]

[Diagram: the four compare-exchange directions applied in cyclic order 1 2 3 4.]
SLIDE 39
  • Comparison to Bernstein’s design
  • Time: a single routing operation (2m steps) vs. 3 sorting operations (8m steps each).
  • Circuit area:
  • Only the ⟨i⟩ values move; the matrix entries don’t.
  • Simple routing logic and small routed values.
  • Matrix entries compactly stored in DRAM (~1/100 the area of “active” storage).
  • Fault-tolerance
  • Flexibility

[Diagram annotations: 1/12, 1/3.]
SLIDE 40

Improvements

  • Reduce the number of cells in the mesh (for small µ, decreasing #cells by a factor of µ decreases throughput cost by ~µ^(1/2)).
  • Use Coppersmith’s block Wiedemann.
  • Execute the separate multiplication chains of block Wiedemann simultaneously on one mesh (for small K, reduces cost by ~K).

Compared to Bernstein’s original design, this reduces the throughput cost by a constant factor of 45,000.

[Diagram annotations: 1/7, 1/15, 1/6.]
SLIDE 41
  • Implications for 1024-bit composites:
  • Sieving step: ~10M $×year

(including cofactor factorization).

  • Matrix step: <0.5M $×year
  • Other steps: unknown, but no obvious

bottleneck.

  • This relies on a hypothetical design and many

approximations, but should be taken into account by anyone planning to use 1024-bit RSA keys.

  • For larger composites (e.g., 2048 bit) the cost

is impractical.

SLIDE 42
  • Conclusions
  • 1024-bit RSA is less secure than

previously assumed.

  • Tailoring algorithms to the concrete

properties of available technology can have a dramatic effect on cost.

  • Never underestimate the power of

custom-built highly-parallel hardware.