
Two Fast Parallel GCD Algorithms of Many Integers

Two Fast Parallel GCD Algorithms of Many Integers. Sidi Mohamed SEDJELMACI, Laboratoire d’Informatique Paris Nord, France. ISSAC 2017, Kaiserslautern, 24-28 July 2017. Motivations: GCD of two integers: used in CAS as a low operation,


  1. Two Fast Parallel GCD Algorithms of Many Integers. Sidi Mohamed SEDJELMACI, Laboratoire d’Informatique Paris Nord, France. ISSAC 2017, Kaiserslautern, 24-28 July 2017.

  2. Motivations
     • GCD of two integers: used in CAS as a low-level operation, in cryptography, etc.
       - Sequential: $O(n \log^2 n \log\log n)$, Knuth (70)-Schönhage (71).
       - Parallel: $O_\epsilon(n/\log n)$ time with $O(n^{1+\epsilon})$ processors, Chor-Goldreich (90), Sorenson (94) and Sedjelmaci (08). Whether this problem is P-complete or in NC is still open.
     • GCD of many integers: polynomial computations, matrix computations, Hermite and Smith normal forms (HNF and SNF).
       - Sequential: Blan (63), Brad (70), Hav (98), Cop (99), etc.
       - Parallel: not addressed?

  3. Table 1: Sequential GCD algorithms for two integers.

     | Name           | Year   | Worst case          |
     |----------------|--------|---------------------|
     | Euclid         | ∼ −300 | $O(n^2)$            |
     | Lehmer         | 1938   | $O(n^2)$            |
     | Stein          | 1961   | $O(n^2)$            |
     | Knuth          | 1970   | $O(\log^4 n\,M(n))$ |
     | Schönhage      | 1971   | $O(\log n\,M(n))$   |
     | Brent-Kung     | 1983   | $O(n^2)$            |
     | Jebelean-Weber | 1993   | $O(n^2)$            |
     | Sorenson       | 1994   | $O(n^2/\log n)$     |
     | Stehlé et al.  | 2004   | $O(\log n\,M(n))$   |
     | Möhler         | 2008   | $O(\log n\,M(n))$   |

  4. Table 2: Parallel GCD algorithms for two integers.

     | Authors                      | Time                      | Nb. of proc.              | Model     |
     |------------------------------|---------------------------|---------------------------|-----------|
     | Brent-Kung, 1983             | $O(n)$                    | $O(n)$                    | Systolic  |
     | Purdy, 1983                  | $O(n)$                    | $O(n)$                    | Systolic  |
     | Kannan et al., 1987          | $O(n \log\log n/\log n)$  | $O(n^{2+\epsilon})$       | PRAM-crcw |
     | Adleman et al., rand., 1988  | $O(\log^2 n)$             | $e^{O(\sqrt{n \log n})}$  | PRAM-crcw |
     | Chor-Goldreich, 1990         | $O(n/\log n)$             | $O(n^{1+\epsilon})$       | PRAM-crcw |
     | Sorenson, 1994               | $O(n/\log n)$             | $O(n^{1+\epsilon})$       | PRAM-crcw |
     | Sedjelmaci, 2008             | $O(n/\log n)$             | $O(n^{1+\epsilon})$       | PRAM-crcw |
     | Sorenson, rand., 2010        | $O(n \log\log n/\log n)$  | $O(n^{6+\epsilon})$       | PRAM-erew |

  5. Our results:
     • The GCD of $n$ integers of $O(n)$ bits can be computed in $O(n/\log n)$ time with $O(n^{2+\epsilon})$ processors in the CRCW PRAM model, in the worst case.
     • The GCD of $m$ integers of $O(n)$ bits can be computed in $O(n/\log n)$ time with $O(mn^{1+\epsilon})$ processors in the CRCW PRAM model, for $2 \le m \le n^{3/2}/\log n$.
     • We suggest an extended GCD version for many integers and an algorithm to solve linear Diophantine equations.
     • To our knowledge, this is the first time such parallel performance has been obtained for computing the GCD of many integers.

  6. Notation:
     • $A$ is a vector of $n$ (or $m$) integers of $O(n)$ bits: $A = (a_0, a_1, \cdots, a_{n-1})$, with $a_i \ge 0$ and $n \ge 4$.
     • $k$ is an integer parameter satisfying $\log k = \Theta(\log n)$.
     • $\gcd(A) = \gcd(a_0, a_1, \cdots, a_{n-1})$.
     • $\gcd(0, 0) = 0$.
     • We use the PRAM (Parallel Random Access Machine) model of computation and its CRCW (Concurrent Read Concurrent Write) sub-model.

  7. Main idea for designing a fast parallel GCD algorithm for many integers:
       Find a small integer α
       Repeat
         a_I := α ;
         a_j := a_j mod α ; (in parallel, ∀ j ≠ I)
       Until almost all the integers a_i are zeros.
     How to find a small α?

  8. Pigeonhole-like techniques:
     Lemma 1: Let $A = \{a_1, a_2, \cdots, a_n\}$ be a set of $n$ distinct positive integers such that $n \ge 2$ and $a_n/n < a_1 < a_2 < \cdots < a_n$. Then $\exists\, i \in \{1, 2, \cdots, n-1\}$ s.t. $a_{i+1} - a_i < \frac{a_n}{n}$.
     A straightforward consequence is the following:
     Corollary 1: Let $A = \{a_1, a_2, \cdots, a_n\}$ be a set of $n$ distinct positive integers, with $n \ge 2$. Then $\min\{a_k, |a_i - a_j| > 0\} \le \frac{\max\{a_i\}}{n}$, where $1 \le k, i, j \le n$.
     We derive the following algorithm:
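
Corollary 1 can be checked numerically. The following is a minimal Python sketch, not part of the slides: the helper names `smallest_candidate` and `corollary_holds` are hypothetical, and the sample vector is the one used in the Δ-GCD example of the talk.

```python
from itertools import combinations

def smallest_candidate(values):
    """Smallest of the a_k and of the positive differences |a_i - a_j|."""
    diffs = [abs(x - y) for x, y in combinations(values, 2) if x != y]
    return min(list(values) + diffs)

def corollary_holds(values):
    """Check n * min{...} <= max{a_i}, avoiding floating-point division."""
    return len(values) * smallest_candidate(values) <= max(values)

A = [912672, 815430, 721161, 565701, 662592]
print(smallest_candidate(A), corollary_holds(A))  # prints: 58569 True
```

The candidate 58569 is the difference 721161 − 662592, which is well below max{a_i}/n = 912672/5.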

  9. The Δ-GCD Algorithm (Poster, ISSAC 2013)
     Input: A set A = {a_0, a_1, ···, a_{n−1}} of n integers of O(n) bits, n ≥ 4.
     Output: gcd(a_0, a_1, ···, a_{n−1}).
       α := a_0 ; I := 0 ; p := n ;
       While (α > 1) Do
         For (i = 0) to (n − 1) ParDo
           If (0 < a_i ≤ 2^n/p) Then { α := a_i ; I := i ; }
         Endfor
         If (α > 2^n/p) Then /* Compute in parallel I, J and α */
           α := min { |a_i − a_j| > 0 } = a_I − a_J ; a_I := α ;
         Endif
         For (i = 0) to (n − 1) ParDo /* Reduce all the a_i's */
           If (i ≠ I) Then a_i := a_i mod α ;
         Endfor /* ∀ i, 0 ≤ a_i ≤ α */
         If (∀ i ≠ I, a_i = 0) Then Return α ;
         p := np ; /* p is O(log n) bits larger */
       Endwhile
       Return α.

  10. Example (Δ-GCD): Let A = (912672, 815430, 721161, 565701, 662592), with n = 20. After 4 iterations, we obtain GCD(A) = 3.

      |         | Step 0 | Step 1 | Step 2 | Step 3 | Step 4 |
      |---------|--------|--------|--------|--------|--------|
      | a_0     | 912672 | 34137  | 4443   | 72     | 0      |
      | a_1     | 815430 | 54033  | 717    | 93     | 0      |
      | a_2     | 721161 | 58569  | 810    | 66     | 0      |
      | a_3     | 565701 | 38580  | 3036   | 60     | 0      |
      | a_4     | 662592 | 18333  | 561    | 3      | 3      |
      | α       | 58569  | 4443   | 93     | 3      | 3      |
      | (I, J)  | (2, 4) | (0, 3) | (1, 2) | (4, −) | STOP   |
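
The Δ-GCD loop can be simulated sequentially to replay this example. What follows is a hedged Python sketch, not the author's parallel implementation: the function name `delta_gcd` is hypothetical, the parallel steps are replaced by plain loops, the pivot is taken as the larger element of the closest pair, and the fallback used when no positive difference exists (keep the smallest positive entry as α) is an assumption added here so that the loop always makes progress.

```python
def delta_gcd(values, n):
    """Sequential sketch of the Delta-GCD loop.

    values: non-negative integers below 2**n.
    Returns (gcd, number of reduction rounds).
    """
    a = list(values)
    p = n
    rounds = 0
    while True:
        positive = [(x, i) for i, x in enumerate(a) if x > 0]
        if not positive:
            return 0, rounds                # gcd(0, ..., 0) = 0
        if len(positive) == 1:
            return positive[0][0], rounds   # a single non-zero entry is the gcd
        rounds += 1
        alpha, pivot = min(positive)        # candidate "small" entry
        if alpha * p > 2 ** n:              # no a_i <= 2^n / p
            ordered = sorted(positive)
            gaps = [(y - x, j)
                    for (x, _), (y, j) in zip(ordered, ordered[1:]) if y > x]
            if gaps:                        # smallest positive difference
                alpha, pivot = min(gaps)
            # Assumption: otherwise keep the smallest positive entry as alpha.
        a = [alpha if i == pivot else x % alpha for i, x in enumerate(a)]
        p *= n                              # threshold 2^n / p shrinks each round

print(delta_gcd([912672, 815430, 721161, 565701, 662592], 20))  # prints: (3, 4)
```

The successive values of α are 58569, 4443, 93 and 3, as in the table. Because the sketch may pick the other member of the closest pair as pivot, individual entries can sit at different positions than on the slide, but each round produces the same multiset of values.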

  11. Drawbacks of the pigeonhole technique
      - The number of distinct integers matters. If there are only $O(\log n)$ distinct integers in $A$, the pigeonhole technique reduces the bit size of the integers by only $O(\log\log n)$ bits, and the number of iterations of the main while loop becomes $O(n/\log\log n)$.
      - What happens if $\alpha = 0$? For example, take $n = 8$ and $A = (255, 255, 193, 161, 129, 97, 65, 65)$. Only two pairs of integers match in their 3 most significant bits, namely $(255, 255)$ and $(65, 65)$. Unfortunately, in both cases $\alpha = 0$.
      - Comparing the $O(n^2)$ pairs of integers $(a_i, a_j)$ to find a small $\alpha = a_i - a_j > 0$ in constant parallel time needs $O(n^3)$ processors.
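
The α = 0 case above can be reproduced with a few lines of Python. This is only an illustrative sketch: the helper `pairs_sharing_top_bits` and its `width` and `top` parameters are hypothetical names, and grouping by leading bits is used here simply as a way to list the pairs the slide refers to.

```python
from collections import defaultdict

def pairs_sharing_top_bits(values, width, top):
    """Index pairs (i, j), i < j, whose `top` leading bits (out of `width`) agree."""
    buckets = defaultdict(list)
    for index, value in enumerate(values):
        buckets[value >> (width - top)].append(index)
    return [(i, j)
            for members in buckets.values()
            for pos, i in enumerate(members)
            for j in members[pos + 1:]]

A = [255, 255, 193, 161, 129, 97, 65, 65]
pairs = pairs_sharing_top_bits(A, width=8, top=3)
print(pairs)                                 # prints: [(0, 1), (6, 7)]
print([abs(A[i] - A[j]) for i, j in pairs])  # prints: [0, 0]
```

Both matching pairs consist of equal integers, so neither yields a usable α > 0.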

  12. Solution: Use other techniques
      - Consider $O(\sqrt{n})$ integers and compute their differences $a_i - a_j$ to find $\alpha > 0$. There are $O(n)$ comparisons, done in constant time with $O(n^{2+\epsilon})$ processors.
      - If this fails, use a Lehmer-like reduction ($R_{ILE}$, ISSAC’2001).
      - If every $R_{ILE}$ gives zero, the reduce transformation right-shifts all the zeros of $A$, and we continue the process with this new $A$.

  13. The Lehmer-like reduction: $R_{ILE}$ and Ext-$R_{ILE}$. The $R_{ILE}$ and Ext-$R_{ILE}$ algorithms are described in Sed-ISSAC’01 and Sed-JDA’08. ILE stands for Improved Lehmer Euclid.
      (1) $R_{ILE}$ is defined by
          Input: $u \ge v \ge 0$, $k = 2^m$, $m = \Theta(\log n)$.
          Output: $R_{ILE}(u, v) = |au + bv| < 2v/k$, with $1 \le |a| \le k$.
          - Roughly speaking, $R_{ILE}(u, v)$ computes the continued fractions.
      (2) Ext-$R_{ILE}$ is the extended version of $R_{ILE}$, i.e. we add the Bézout matrix $M$ such that, for $0 \le i, j \le \lfloor\sqrt{n}\rfloor$:
          $M \times (a_i, a_j)^T = (R_i, R_j)^T$, with $R_j = R_{ILE}$,
          $0 \le R_j < R_i$ and $\gcd(R_i, R_j) = \gcd(a_i, a_j)$,
          $R_j < (2/k) \max\{a_i, a_j\}$.

  14. Example: Let $u = 1\,759\,291$ and $v = 1\,349\,639$. Their binary representations are respectively:
        11010110 1100000111011_2 = 1 759 291
        10100100 1100000000111_2 = 1 349 639
      We have $n = p = 21$. For $m = 3$, we obtain $\lambda = 2m + 2 = 8$, $u_1 = 214$ and $v_1 = 164$ (the $\lambda$ leading bits, set apart by a space above, give $u_1$ and $v_1$). Using EEA with $u_1$ and $v_1$, we obtain in turn $q$, $r$, $b$ and $a$ ($r = a u_1 + b v_1$):

  15. | q | r   | a  | b   |
      |---|-----|----|-----|
      |   | 214 | 1  | 0   |
      |   | 164 | 0  | 1   |
      | 1 | 50  | 1  | −1  |
      | 3 | 14  | −3 | 4   |
      | 3 | 8   | 10 | −13 |

      In our example, we obtain $a = -3$, $b = 4$, $r = 14 < v_1/k = 164/8 = 20.50$ and $R_{ILE} = |-3u + 4v| = 120\,683 < v/8 = 168\,704.88$.
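
The worked example can be replayed with the extended Euclidean algorithm on the leading bits. This is a minimal Python sketch under stated assumptions, not the $R_{ILE}$ algorithm of Sed-ISSAC’01: the function name `rile_sketch` is hypothetical, and the stopping rule (stop at the first remainder $r < v_1/k$) is read off the example rather than taken from the paper.

```python
def rile_sketch(u, v, m):
    """EEA on the lambda = 2m + 2 leading bits of u >= v, stopped when r < v1 / k.

    Returns (a, b, |a*u + b*v|). Illustrative only.
    """
    k = 1 << m
    lam = 2 * m + 2
    shift = max(u.bit_length() - lam, 0)
    u1, v1 = u >> shift, v >> shift          # leading bits of u and v
    r0, a0, b0 = u1, 1, 0                    # invariant: r = a*u1 + b*v1
    r1, a1, b1 = v1, 0, 1
    while r1 and r1 * k >= v1:               # continue while r >= v1 / k
        q = r0 // r1
        r0, a0, b0, r1, a1, b1 = (r1, a1, b1,
                                  r0 - q * r1, a0 - q * a1, b0 - q * b1)
    return a1, b1, abs(a1 * u + b1 * v)

print(rile_sketch(1759291, 1349639, 3))  # prints: (-3, 4, 120683)
```

The loop stops at the row $r = 14$ of the table, so the last row ($r = 8$) is never needed, and the result 120 683 is below the bound $2v/k$ stated for $R_{ILE}$.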

  16. Properties of $R_{ILE}$ and Ext-$R_{ILE}$:
      • Parallel complexity: $O_\epsilon(n/\log n)$ time with $O(n^{1+\epsilon})$ processors on CRCW PRAM (ISSAC’01).
      • It efficiently computes the Bézout coefficients in parallel, with the same parallel performance (JDA’08).

  17. High-level description of the Δ-2GCD algorithm.
      - Test 1: Is there a small enough $a_i > 0$ that we can use directly as $\alpha$?
      - Test 2: Does the pigeonhole algorithm provide an $\alpha > 0$?
      - Test 3: Use a new transformation $R$ based on continued fractions (Sed-ISSAC’01) and test whether $R > 0$.
      If Test 3 fails, i.e. $R_j(a_i, a_j) = 0$ for all $(a_i, a_j)$ with $i, j \le \sqrt{n}$, then $(R_i, R_j) = (R_i, 0)$ and $(a_i, a_j) \leftarrow (0, R_i)$. A new transformation called reduce right-shifts all the zeros in $A$. This halves the number of the $O(\sqrt{n})$ positive integers considered (the other half are all zeros). Moreover, it can be iterated at most $O(\sqrt{n})$ times since, at each step, we add $O(\sqrt{n})$ new zeros to the vector $A$.

  18. Δ-2GCD algorithm:
      Input: A vector A = (a_0, a_1, ···, a_{n−1}), n ≥ 4 and max{a_i} < 2^n.
      Output: gcd(a_0, a_1, ···, a_{n−1}).
        (α, I) := (a_0, 0) ; p := n ; N := ⌊√n⌋ ;
        While (α > 1) Do
          For (i = 0) to (n − 1) ParDo
            If (0 < a_i ≤ 2^n/p) then { (α, I) := (a_i, i) ; S := 1 } ;
            else S := 0 ; /* No small a_i */
          Endfor
          If (S = 0) then (α, I) := pigeonhole(A, N) ;
          If (I = −1) then
            R := 0 ; /* The pigeonhole fails */
            For (i, j = 0) to (N − 1) ParDo
              x_ij := R_ILE(a_i, a_j) ;
              If (x_ij > 0) then { (α, I) := (x_ij, i) ; R := 1 ; a_I := x_ij }
              /* We can divide all the a_i's by α = x_ij */
              Endif
            Endfor

  19.       If (R = 0) /* ∀ i, j, R_ILE(a_i, a_j) = 0 */
              then A := reduce(A, N) ;
            Endif
          Endif
          If (I ≥ 0) then A := remainder(A, α, I) ;
          If (∃ a_k ≠ 0 s.t.: ∀ i ≠ k ⇒ a_i = 0) then Return a_k ;
          p := np ; /* p is O(log n) bits larger */
        Endwhile
        Return α.
