SLIDE 1

Filtering and the matrix step in NFS

Thorsten Kleinjung

Laboratory for Cryptologic Algorithms EPFL, Station 14, CH-1015 Lausanne, Switzerland

SLIDE 2

Contents

- Overview NFS
- The matrix step
- Filtering
- A modified filtering approach

SLIDE 3

Overview: number field sieve (NFS)

N: the number to be factored

1. Find two polynomials fi ∈ Z[x], i = 1, 2, with a common zero m modulo N (and some conditions). Denote by Fi the corresponding homogeneous polynomials.

2. Choose L and find sufficiently many pairs a, b ∈ Z such that F1(a, b) and F2(a, b) decompose into prime factors ≤ L. Each such pair corresponds to a congruence in a number field such that both sides are divisible only by prime ideals of norm ≤ L.

3. Find a subset of these congruences such that the products of both sides are squares. This is equivalent to solving a system of linear equations over F2.

4. Compute square roots and obtain a congruence of type c² ≡ d² (mod N), c, d ∈ Z. Then gcd(c + d, N) will be a proper divisor of N with probability ≥ 1/2.
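As a toy illustration of step 4 (N, c, d are hand-picked here, not the output of an actual NFS run):

```python
# Toy illustration of step 4; numbers chosen by hand, not NFS output.
from math import gcd

N = 91            # = 7 * 13
c, d = 10, 3      # c^2 = 100 ≡ 9 = d^2 (mod 91), and c ≢ ±d (mod 91)
assert (c * c - d * d) % N == 0
print(gcd(c + d, N))   # 13, a proper divisor of N
```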

SLIDE 4

Comments on the individual steps

1. Polynomial selection: parallel; quality determines run time of subsequent steps
2. Sieving: parallel; time consuming
3. Filtering: easy, but a lot of data movement; quality determines run time of next step
4. Matrix step: some parallelisation possible; time and memory consuming
5. Square root: parallel; negligible amount of time and memory

SLIDE 5

Filtering / matrix step

Data from sieving step: S, a sparse matrix (rows = prime ideals, columns = relations); want to find solutions Sv′ = 0

Filtering: produces a smaller sparse matrix A by column operations; Av = 0 easily gives v′ with Sv′ = 0

Matrix step: find several vectors v with Av = 0

SLIDE 8

Brief history of the matrix step

Gaussian elimination
- O(d³) algorithm (d: dimension of matrix)
- many tricks for reducing d
- RSA-129 in 1994 (quadratic sieve): 188 160 × 188 614 matrix, dense

Block Lanczos
- O(d²) algorithm for sparse matrices
- RSA-512 in 1999 in Amsterdam: 6 699 191 × 6 711 336 matrix with 417 132 631 non-zero entries

Block Wiedemann
- O(d²) algorithm for sparse matrices
- allows for limited disjoint parallelisation but needs more operations
- RSA-768 in 2009, computation in different places: 192 795 550 × 192 796 550 matrix with 27 797 115 920 non-zero entries

SLIDE 11

Block Wiedemann

Input: d × d matrix A over F2; output: solution(s) of Av = 0

Idea: find a linear combination of A^i y, 0 < i ≤ d, which is orthogonal to sufficiently many x(A^T)^j, j ≥ 0 ⇒ have to compute x^T A^(i+j) y

Berlekamp-Massey step: essentially a half-gcd of polynomials of degree ≈ 2d over F2; gives the sought linear combination

Main computations:
- ≈ 3d matrix-vector multiplications
- Berlekamp-Massey step

Block version: choose n vectors x1, …, xn and y1, …, yn, and find linear combinations of A^i y_l orthogonal to sufficiently many x_k(A^T)^j; still ≈ 3d matrix-vector multiplications, but the Berlekamp-Massey step is more complex
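The scalar case n = 1 already shows the whole structure. Below is a minimal, illustrative scalar Wiedemann over F2 (my sketch, with my own naming, not the code used for the records above): vectors are Python ints used as bitsets, the matrix is a list of row bitmasks, and a kernel vector is extracted from the minimal polynomial that Berlekamp-Massey finds for the sequence x^T A^i y.

```python
import random

def matvec(rows, v):
    """A*v over F2; rows[i] is the bitmask of row i, v is an int bitset."""
    out = 0
    for i, r in enumerate(rows):
        out |= (bin(r & v).count("1") & 1) << i
    return out

def berlekamp_massey_f2(s):
    """Connection polynomial c (c[0] = 1) and LFSR length L of bit sequence s,
    so that sum_{i=0..L} c[i]*s[n-i] = 0 over F2 for all n >= L."""
    c, b, L, m = [1], [1], 0, 1
    for n in range(len(s)):
        disc = s[n]
        for i in range(1, L + 1):
            if i < len(c):                  # missing coefficients are 0
                disc ^= c[i] & s[n - i]
        if disc == 0:
            m += 1
            continue
        t = c[:]
        if m + len(b) > len(c):
            c += [0] * (m + len(b) - len(c))
        for i, bit in enumerate(b):
            c[i + m] ^= bit                 # c(X) += X^m * b(X)
        if 2 * L <= n:
            L, b, m = n + 1 - L, t, 1
        else:
            m += 1
    return c, L

def kernel_vector(rows, d, tries=20):
    """Heuristically find v != 0 with A*v = 0; None if every try fails."""
    for _ in range(tries):
        x, y = random.getrandbits(d), random.getrandbits(d)
        s, v = [], y
        for _ in range(2 * d):              # sequence s_i = x^T A^i y
            s.append(bin(x & v).count("1") & 1)
            v = matvec(rows, v)
        c, L = berlekamp_massey_f2(s)
        c += [0] * (L + 1 - len(c))
        rev = c[L::-1]                      # reciprocal poly: rev(A) y = 0, heuristically
        k = next(i for i, bit in enumerate(rev) if bit)
        w, u = 0, y                         # compute w = h(A) y, where rev(X) = X^k h(X)
        for bit in rev[k:]:
            if bit:
                w ^= u
            u = matvec(rows, u)
        for _ in range(k + 1):              # A^k w = 0, so walk down to the kernel
            if w == 0:
                break                       # unlucky x or y: retry
            Aw = matvec(rows, w)
            if Aw == 0:
                return w
            w = Aw
    return None
```

For example, kernel_vector([0b011, 0b011, 0b101], 3) typically returns 0b111, a nonzero kernel vector of that singular 3 × 3 matrix. The block version replaces the bit sequence by a sequence of n × n bit matrices and the Berlekamp-Massey step by its matrix-polynomial analogue.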

SLIDE 12

Block Wiedemann analysis

A: d × d matrix over F2 with w non-zero entries
n: number of independent sequences (n ≪ d)

Main operations:
- 3d multiplications A · x
- half-gcd of polynomials of degree ≈ d/n with 2n × 2n matrix coefficients

Step               memory            run time
Multiplications    O(w)              O(dw)
Berlekamp-Massey   O(n·d^(1+o(1)))   O(n²·d^(1+o(1)))

Parallelisation issues:
- Communication for multiplication on a cluster with k² nodes: O(d²k)
- Berlekamp-Massey: basic steps are ≈ d/n triangulations of 2n × 2n matrices

SLIDE 13

Matrix for RSA-768

192 795 550 × 192 796 550 matrix with 27 797 115 920 non-zero entries
n = 512 (processed in blocks of 64 per cluster, i.e., can use 8 clusters)

Step               memory   run time   wall clock time
Multiplications    200 GB   99.8%      85 days
Berlekamp-Massey   1 TB     0.2%       0.75 days

SLIDE 14

Extrapolation

Assumptions:
- everything scales as in the heuristic run time analysis
- clusters consist of k² nodes (16 cores, 32 GB each)
- need 8 bytes per entry to store the matrix
- want to do the non-communication part of the multiplications in ≈ 1 year

size   d     w     k²     mem/cl   # cl   comm   BM-mem   BM-time
768    2^28  2^35  3²     288 GB   1      0.5a   128 GB   2h
1024   2^33  2^40  16²    8 TB     25     4a     100 TB   0.2a?
1536   2^41  2^48  256²   2 PB     6400   60a    6 EB     10 000a?

mem = memory, cl = cluster, comm = communication part of the multiplications, BM-time = Berlekamp-Massey step on one cluster; a = years, h = hours
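A quick sanity check of the mem/cl column (my arithmetic, assuming each cluster stores a full copy of the matrix at 8 bytes per entry, as the blocks-of-64-per-cluster scheme above suggests):

```python
# Matrix of w = 2^log2_w entries at 8 bytes/entry vs memory of one cluster.
for bits, log2_w, mem_per_cluster_gib in [(768, 35, 288),
                                          (1024, 40, 8 * 1024),
                                          (1536, 48, 2 * 1024 ** 2)]:
    matrix_gib = 2 ** log2_w * 8 / 1024 ** 3
    print(f"{bits}-bit: matrix {matrix_gib:.0f} GiB, cluster {mem_per_cluster_gib} GiB")
# 768-bit: 256 vs 288; 1024-bit: 8192 vs 8192; 1536-bit: 2097152 vs 2097152
```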

SLIDE 15

Filtering overview

Input: large, very sparse matrix (≈ 20 entries per column), sometimes very overdetermined

Try to eliminate rows by:
- adding columns
- removing columns
- removing zero rows

Output: smaller, but still sparse matrix (≈ 50–200 entries per column)
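A minimal sketch of one such elimination ("merge") step, with my own naming and with columns stored as Python sets of row indices: to remove a row that occurs in several columns, add the lightest containing column to the others (over F2) and discard it.

```python
def eliminate_row(columns, row):
    """Zero out `row` by F2 column additions; the pivot column is consumed.
    columns: list of sets of row indices (the non-zero positions per relation)."""
    hits = {j for j, col in enumerate(columns) if row in col}
    if not hits:
        return columns                      # row is already zero
    pivot = min(hits, key=lambda j: len(columns[j]))    # lightest column
    return [col ^ columns[pivot] if j in hits else col  # F2 add = symmetric difference
            for j, col in enumerate(columns)
            if j != pivot]                  # drop the pivot column
```

For example, eliminate_row([{0, 1}, {0, 2}, {2}], 0) returns [{1, 2}, {2}]: row 0 is gone at the cost of one moved entry. Real filtering performs a merge only when the resulting weight increase is acceptable.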

SLIDE 16

Doing more filtering

Sieving for a 110-digit number produced (after removing some relations) a 2 297 422 × 2 357 458 matrix with 18.94 entries per column

[Plot: matrix dimension d (red) and time (blue) as functions of the average weight w/d]

SLIDE 17

Reviewing filtering

Input: S, the matrix from the sieving step (possibly some columns removed)

Elimination of one row is equivalent to multiplication of S on the right by some Fi, where Fi is essentially the identity matrix plus a few entries in one row, e.g.:

$$F_i = \begin{pmatrix}
1 & & & & \\
& \ddots & & & \\
\ast & \cdots & \ast & \cdots & \ast \\
& & & \ddots & \\
& & & & 1
\end{pmatrix}$$

Output: A = S · F1 · … · Fr (and F = F1 · … · Fr); Av = 0 gives S · (Fv) = 0
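A tiny numeric example of one such step (my construction, not from the talk): F adds relation 0 to relation 1, zeroing the first row of S outside the pivot column, and a kernel vector v of A that avoids the dropped column pulls back to Fv, which S annihilates.

```python
import numpy as np

S = np.array([[1, 1, 0],     # row 0: a prime ideal occurring in relations 0 and 1
              [1, 0, 1],
              [0, 1, 1]])
F = np.identity(3, dtype=int)
F[0, 1] = 1                  # the single extra entry: add column 0 to column 1
A = S @ F % 2                # row 0 of A is zero outside column 0, which gets dropped

v = np.array([0, 1, 1])      # kernel vector of A not using the dropped column
assert not (A @ v % 2).any()             # A v = 0
assert not (S @ (F @ v % 2) % 2).any()   # hence S (F v) = 0
```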

SLIDE 18

A modified approach

Split A = S · F1 · … · Fr = M1 · … · Ms as follows:

$$A = \underbrace{S \cdot F_1 \cdots F_{t_1}}_{M_1} \cdot \underbrace{F_{t_1+1} \cdots F_{t_2}}_{M_2} \cdots \underbrace{F_{t_{s-1}+1} \cdots F_r}_{M_s}$$

Idea:
- the weight increase wt(S · F1 · … · F_i) − wt(S · F1 · … · F_(i−1)) becomes much bigger than wt(F_i) − d for large i
- for small h we have wt(F_(j+1) · … · F_(j+h)) − d ≈ (wt(F_(j+1)) − d) + … + (wt(F_(j+h)) − d)

Therefore hope for wt(M1) + wt(M2) < wt(M1 · M2), etc.
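The hoped-for gap is easy to observe on random sparse factors (a toy experiment of mine, not data from the talk): with about 10 entries per row, the two factors together hold far fewer entries than their product.

```python
import numpy as np

rng = np.random.default_rng(0)
d, per_row = 200, 10                      # toy sizes; real matrices are much larger

def sparse_f2(d, per_row):
    """Random d x d matrix over F2 with per_row entries in each row."""
    M = np.zeros((d, d), dtype=np.int64)
    for i in range(d):
        M[i, rng.choice(d, per_row, replace=False)] = 1
    return M

M1, M2 = sparse_f2(d, per_row), sparse_f2(d, per_row)
print(int(M1.sum() + M2.sum()))           # 4000 entries kept as two factors
print(int((M1 @ M2 % 2).sum()))           # typically ~12000 entries multiplied out
```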

SLIDE 20

How to split the product?

Compare A (a d × d matrix) with the product A′ · B (e × d and d × e), where

$$A = \begin{pmatrix} A' \\ A'' \end{pmatrix}, \qquad A \cdot B = \begin{pmatrix} A' \cdot B \\ 0 \end{pmatrix}$$

Berlekamp-Massey: space and time are reduced by a factor e/d

Communication:
- number of multiplications: d → e
- communication per multiplication: d → d + e
- if e < ((√5 − 1)/2) · d, there is less communication for A′ · B

Computation:
- number of multiplications reduced by a factor e/d
- cost per multiplication depends on the weight and sparseness of A″ and B, usually both very sparse (i.e., high cost per entry)
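The communication threshold for e above follows from comparing total communication, (number of multiplications) × (communication per multiplication), for the two variants:

$$e\,(d + e) < d \cdot d \iff e^2 + ed - d^2 < 0 \iff e < \frac{\sqrt{5} - 1}{2}\, d$$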

SLIDE 21

Results

red: one matrix; blue: e ≈ 0.9 · d; light blue: e ≈ 0.8 · d; purple: e ≈ 0.6 · d

[Plot: time versus matrix dimension e]

SLIDE 22

Further research

- How far can we go? Relation to the factor base bound?
- Good strategies for removing rows, perhaps different strategies for later phases
- Good splitting points
- How to deal with the initial excess
- Speed up filtering without decreasing the quality of the output
