A Galois Field Arithmetic Library

Pakize ŞANAL, MSc Candidate



  1. A Galois Field Arithmetic Library. Pakize ŞANAL, MSc Candidate. Supervisor: Asst. Prof. Hüseyin HIŞIL. Yasar University, Faculty of Engineering, Department of Computer Engineering. June 5, 2017

  2. Outline
     - Content of the bachelor thesis
     - Studied assembly optimizations
     - Test results

  3. Content of the bachelor thesis: A Galois Field Arithmetic Library
     - Operations: $+$, $-$, $*$.
     - Fields: GF($2^w - c$) where $w = 127, 128, 255, 256$, and GF($2^{127} - 1$).
     - Constant-time AMD64 assembly.
     - Extensive validation and performance tests.

  4. 1. By scheduling of the operations: four-digit schoolbook vs. one-level recursive schoolbook multiplication vs. ...

     Modulus          SCB   RSCB   OSCB
     $2^{256} - c$    38    -      -

     [Figure: $(a_3\,a_2\,a_1\,a_0) \times (b_3\,b_2\,b_1\,b_0)$. The 16 partial products $a_i \cdot b_j$ are generated row by row ($a_0 \cdot b_0$, $a_1 \cdot b_0$, $a_2 \cdot b_0$, $a_3 \cdot b_0$, then the $b_1$, $b_2$, and $b_3$ rows) and summed to give $a \cdot b$.]

  5. 1. By scheduling of the operations: four-digit schoolbook vs. one-level recursive schoolbook multiplication vs. ...

     Modulus          SCB   RSCB   OSCB
     $2^{256} - c$    38    35     -

     [Figure: $(a_3\,a_2\,a_1\,a_0) \times (b_3\,b_2\,b_1\,b_0)$. The same 16 partial products $a_i \cdot b_j$, grouped into four two-digit by two-digit sub-products (one level of recursion) and summed to give $a \cdot b$.]

  6. 1. By scheduling of the operations: four-digit schoolbook vs. one-level recursive schoolbook multiplication vs. ...

     Modulus          SCB   RSCB   OSCB
     $2^{256} - c$    38    35     37

     [Figure: $(a_3\,a_2\,a_1\,a_0) \times (b_3\,b_2\,b_1\,b_0)$. The 16 partial products $a_i \cdot b_j$ in the OSCB ordering, summed to give $a \cdot b$.]
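As a point of comparison for the schedules above, the four-digit schoolbook product can be sketched in portable C. This is only an illustration, not the thesis's assembly: the function name `mul_4x4_schoolbook` is hypothetical, digits are assumed to be 64-bit and little-endian, and the GCC/Clang `unsigned __int128` extension stands in for the `mulq` double-width product.

```c
#include <stdint.h>

typedef unsigned __int128 u128;   /* GCC/Clang extension, assumed available */

/* r[0..7] = a[0..3] * b[0..3], little-endian 64-bit digits.
 * r must not overlap a or b. Hypothetical helper for illustration. */
static void mul_4x4_schoolbook(const uint64_t a[4], const uint64_t b[4],
                               uint64_t r[8])
{
    for (int k = 0; k < 8; k++)
        r[k] = 0;

    for (int j = 0; j < 4; j++) {          /* one row per digit of b */
        uint64_t carry = 0;
        for (int i = 0; i < 4; i++) {      /* partial product a_i * b_j */
            /* Cannot overflow: (2^64-1)^2 + 2*(2^64-1) = 2^128 - 1. */
            u128 t = (u128)a[i] * b[j] + r[i + j] + carry;
            r[i + j] = (uint64_t)t;
            carry = (uint64_t)(t >> 64);
        }
        r[j + 4] = carry;                  /* top digit of this row */
    }
}
```

The loop order here corresponds to the row-by-row schedule; the recursive and OSCB variants visit the same 16 partial products in a different order, which changes register pressure and carry chains in assembly but not the result.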

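  7. 1. By scheduling of the operations: one-level Karatsuba multiplication vs. one-level schoolbook multiplication

     Modulus          Karatsuba   SCB
     $2^{127} - 1$    12          6
     $2^{127} - c$    17          13
     $2^{128} - c$    12          10

     [Figure: $(a_1\,a_0) \times (b_1\,b_0)$ by one-level Karatsuba. The products $a_1 \cdot b_1$, $a_0 \cdot b_0$, and $(a_1 + a_0) \cdot (b_1 + b_0)$ are formed; $a_1 \cdot b_1$ and $a_0 \cdot b_0$ are subtracted from the middle term; the rows are summed to give $a \cdot b$.]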
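The Karatsuba schedule in the figure can be sketched in C for two 64-bit digits per operand. This is a sketch under assumptions, not the thesis code: `mul_2x2_karatsuba` is a hypothetical name, `unsigned __int128` is a GCC/Clang extension, and no claim of constant-time behavior is made for the compiled output. The point to notice is that the sums $a_1 + a_0$ and $b_1 + b_0$ are 65 bits wide, so their carry bits need separate handling.

```c
#include <stdint.h>

typedef unsigned __int128 u128;   /* GCC/Clang extension, assumed available */

/* r[0..3] = a[0..1] * b[0..1] using three 64x64 multiplications.
 * Hypothetical helper for illustration; r must not overlap a or b. */
static void mul_2x2_karatsuba(const uint64_t a[2], const uint64_t b[2],
                              uint64_t r[4])
{
    u128 lo = (u128)a[0] * b[0];               /* a0 * b0 */
    u128 hi = (u128)a[1] * b[1];               /* a1 * b1 */

    /* 65-bit sums, kept as 64 low bits plus a carry bit. */
    uint64_t sa = a[0] + a[1], ca = sa < a[0];
    uint64_t sb = b[0] + b[1], cb = sb < b[0];

    /* (ca*2^64 + sa) * (cb*2^64 + sb) as three words m2:m1:m0. */
    u128 p = (u128)sa * sb;
    u128 cross = (u128)((0 - ca) & sb) + ((0 - cb) & sa);
    u128 t = (p >> 64) + cross;
    uint64_t m0 = (uint64_t)p;
    uint64_t m1 = (uint64_t)t;
    uint64_t m2 = (uint64_t)(t >> 64) + (ca & cb);

    /* Middle term: subtract a0*b0 and a1*b1, borrowing from m2. */
    u128 m = ((u128)m1 << 64) | m0;
    m2 -= (m < lo); m -= lo;
    m2 -= (m < hi); m -= hi;

    /* a*b = lo + middle * 2^64 + hi * 2^128. */
    u128 t1 = (lo >> 64) + (uint64_t)m;
    u128 t2 = (u128)(uint64_t)hi + (uint64_t)(m >> 64) + (t1 >> 64);
    r[0] = (uint64_t)lo;
    r[1] = (uint64_t)t1;
    r[2] = (uint64_t)t2;
    r[3] = (uint64_t)(hi >> 64) + m2 + (uint64_t)(t2 >> 64);
}
```

The extra additions, subtractions, and carry handling are the price paid for saving one multiplication, which is consistent with schoolbook being faster than Karatsuba at these small sizes in the table above.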

  8. 2. By making optimization: register optimization

         // ...
         movq 8*0(%r8), %rax
         mulq 8*0(%r9)            /* rdx:rax = a0 * b0 */
         movq %rax, %rbx          /* low half kept in rbx */
         movq %rdx, %rsi          /* high half kept in rsi */
         movq 8*1(%r8), %rax
         mulq 8*1(%r9)            /* rdx:rax = a1 * b1 */
         movq %rax, %r10
         movq %rdx, %r11
         movq 8*1(%r8), %rax
         mulq 8*0(%r9)            /* rdx:rax = a1 * b0 */
         addq %rax, %rsi          /* accumulate into r11:r10:rsi */
         adcq %rdx, %r10
         adcq $0, %r11
         movq 8*0(%r8), %rax
         mulq 8*1(%r9)            /* rdx:rax = a0 * b1 */
         addq %rax, %rsi          /* accumulate into r11:r10:rsi */
         adcq %rdx, %r10
         adcq $0, %r11
         movq %rbx, 8*0(%rdi)     /* store result digit 0 */
         movq %rsi, 8*1(%rdi)     /* store result digit 1 */
         // ...

     Listing 1: <GF($2^{255} - c$), $*$>

     [Figure: $(a_3\,a_2\,a_1\,a_0) \times (b_3\,b_2\,b_1\,b_0)$ with the 16 partial products $a_i \cdot b_j$ summed to give $a \cdot b$.]

  9. 3. By using special instructions: the instruction cmovxx (Conditional Move)

     if r13 = 0 then Return 0. else Return r14. end
     if r12 = 0 then Return 0. else Return r15. end

         // ...
         movq %r12, %rax
         mulq %r14                /* rdx:rax = r12 * r14 */
         movq $0, %rbp            /* rbp = 0 */
         cmp $0, %r13
         cmovz %rbp, %r14         /* r14 = 0 if r13 == 0 */
         cmp $0, %r15
         cmovz %rbp, %r12         /* r12 = 0 if r15 == 0 */
         andq %r13, %r15          /* r15 = r13 & r15 */
         addq %r12, %rdx
         adcq $0, %rbp
         addq %r14, %rdx
         adcq %r15, %rbp
         // ...

     Listing 2: <GF($2^{128} - c$), $*$>

     [Figure: $(r_{13}\,r_{12}) \times (r_{15}\,r_{14})$ with partial products a12 · b14, $r_{13} \cdot r_{14}$, $r_{12} \cdot r_{15}$, and a final term marked "?", summed to give $a \cdot b$.]
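The same "return 0 or return x" selection can be written in C without a branch by using a mask. This is a sketch only: `ct_keep_if_nonzero` is a hypothetical helper, and whether a compiler emits `cmov`, mask arithmetic, or something else for it is not guaranteed, which is one reason to write such code directly in assembly.

```c
#include <stdint.h>

/* Returns x when flag != 0 and 0 when flag == 0, with no branch on flag.
 * Hypothetical helper for illustration. */
static uint64_t ct_keep_if_nonzero(uint64_t flag, uint64_t x)
{
    /* flag | -flag has its top bit set exactly when flag != 0. */
    uint64_t nonzero = (flag | (0 - flag)) >> 63;   /* 0 or 1 */
    uint64_t mask = 0 - nonzero;                    /* 0 or all ones */
    return x & mask;
}
```

With this helper, `ct_keep_if_nonzero(r13, r14)` plays the role of the `cmp` / `cmovz` pair acting on `%r14` in Listing 2.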

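  10. 3. By using special instructions: the instruction btxx (Bit Test and Reset)

          // ...
          /* r11, r10, r9, r8 */
          shlq $1, %r11
          btrq $63, %r10          /* CF = bit 63 of r10, then clear it */
          adcq $0, %r11           /* r11 = 2*r11 + CF */
          shlq $1, %r10
          btrq $63, %r9           /* CF = bit 63 of r9, then clear it */
          adcq $0, %r10           /* r10 = 2*r10 + CF */

          addq %r8, %r10          /* add low part r9:r8 to high part r11:r10 */
          adcq %r9, %r11

          btrq $63, %r11          /* CF = bit 127 of the sum, then clear it */
          adcq $0, %r10           /* fold that bit back in */
          adcq $0, %r11
          // ...

      Listing 3: <GF($2^{127} - 1$), $*$>

      [Figure: the four-word product $(r_{11}\,r_{10}\,r_9\,r_8)$ is split into $(r_{11}\,r_{10})$ and $(r_9\,r_8)$, which are added; the carry out of the sum is added back to give $(r_{11}\,r_{10})$.]

      Reference: C. Costello, H. Hisil, and B. Smith, "Faster compact Diffie-Hellman: Endomorphisms on the x-line".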
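The reduction relies on $2^{127} \equiv 1 \pmod{2^{127} - 1}$: the bits of the product at position 127 and above can simply be added onto the low 127 bits. A C sketch of the same shift-and-add idea follows. It is an illustration under assumptions, not the library's code: `reduce_mod_p127` is a hypothetical name, `unsigned __int128` is a GCC/Clang extension, and the input is assumed to be a product of two values below $2^{127}$, so it is below $2^{254}$.

```c
#include <stdint.h>

typedef unsigned __int128 u128;   /* GCC/Clang extension, assumed available */

/* Folds a product t[0..3] < 2^254 (little-endian 64-bit digits) into
 * r[0..1] <= 2^127 - 1 with the same residue mod p = 2^127 - 1.
 * Hypothetical helper for illustration. */
static void reduce_mod_p127(const uint64_t t[4], uint64_t r[2])
{
    const u128 mask = ((u128)1 << 127) - 1;

    u128 lo_full = ((u128)t[1] << 64) | t[0];
    u128 hi_full = ((u128)t[3] << 64) | t[2];

    u128 lo = lo_full & mask;                      /* bits 0..126   */
    u128 hi = (hi_full << 1) | (lo_full >> 127);   /* bits 127..253 */

    u128 s = lo + hi;                 /* below 2^128, so no overflow */
    s = (s & mask) + (s >> 127);      /* fold bit 127 back in */

    r[0] = (uint64_t)s;
    r[1] = (uint64_t)(s >> 64);
}
```

In this sketch the output can be $p$ itself, a second representation of zero, so a caller that needs a unique representative would still have to map $p$ to 0.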

  11. 3. By using special instructions: comparing with the MPFQ library, <GF($2^{127} - 1$), $*$>

      This library: 33 instructions, 6 clock cycles. MPFQ: 45 instructions, 9 clock cycles.

          // ...
          /* r11, r10, r9, r8 */
          shlq $1, %r11
          btrq $63, %r10
          adcq $0, %r11
          shlq $1, %r10
          btrq $63, %r9
          adcq $0, %r10

          addq %r8, %r10
          adcq %r9, %r11

          btrq $63, %r11
          adcq $0, %r10
          adcq $0, %r11
          // ...

      Listing 4: My schoolbook code's reduction part

          // ...
          /* r11, r10, r9, r8 */
          movq $9223372036854775807, %rax    /* rax = 2^63 - 1 */
          movq %r9, %r12
          andq %rax, %r9                     /* clear bit 63 of r9 */
          shrq $63, %r12                     /* r12 = old bit 63 of r9 */
          movq %r10, %rdx
          shlq $1, %r10
          orq %r10, %r12                     /* r12 = 2*r10 | bit */
          shlq $1, %r11
          shrq $63, %rdx                     /* rdx = old bit 63 of r10 */
          orq %r11, %rdx                     /* rdx = 2*r11 | bit */
          addq %r12, %r8
          adcq %rdx, %r9
          movq %r9, %r12
          andq %rax, %r9                     /* clear bit 63 of r9 */
          shlq $1, %r12                      /* CF = old bit 63 of r9 */
          adcq $0, %r8
          adcq $0, %r9
          // ...

      Listing 5: MPFQ schoolbook code's reduction part

      https://www.imsc.res.in/~ecc14/slides/hisil.pdf

  12. Test Results

      Timing benchmarks were taken on an Intel Core i7-6500U processor running Ubuntu 14.04.5 LTS, with TurboBoost disabled and all cores but one switched off (i.e., hyperthreading is disabled). The executables were built with GNU gcc version 4.8.4 with the -O2 flag set and GNU assembler version 2.24.

      Results (clock cycles):

      Modulus          Karatsuba   Schoolbook (SCB)   Recursive SCB
      $2^{127} - 1$    12          6                  -
      $2^{127} - c$    17          13                 -
      $2^{128} - c$    12          10                 -
      $2^{255} - c$    -           46                 40
      $2^{256} - c$    -           38                 34

  13. Listing 6: A performance test

          /* libraries */
          #include <stdio.h>
          #include <stdlib.h>

          #define TRIAL 100000000000

          int main() {
              long long st, fn;
              st = cpucycles();
              unsigned long an[2], bn[2], cn[2];
              an[0] = (unsigned long)rand() * (unsigned long)rand();
              an[1] = (unsigned long)rand() * (unsigned long)rand();
              bn[0] = (unsigned long)rand() * (unsigned long)rand();
              bn[1] = (unsigned long)rand() * (unsigned long)rand();
              cn[0] = (unsigned long)rand() * (unsigned long)rand();
              cn[1] = (unsigned long)rand() * (unsigned long)rand();
              unsigned long int i;
              /* First timed loop: the routine being measured. */
              for (i = 0; i < TRIAL; i++) {
                  mul127_scb_v01(an, bn, cn);
                  an[0] = bn[1];
                  an[1] = cn[0];
                  bn[0] = an[1];
                  bn[1] = cn[1];
                  cn[0] = an[1];
                  cn[1] = bn[0];
              }
              fn = cpucycles();
              double first = ((double)fn - st) / TRIAL;
              st = cpucycles();
              /* Second timed loop: its per-iteration cost is subtracted below. */
              for (i = 0; i < TRIAL; i++) {
                  mul127_scb_test(an, bn, cn);
                  an[0] = bn[1];
                  an[1] = cn[0];
                  bn[0] = an[1];
                  bn[1] = cn[1];
                  cn[0] = an[1];
                  cn[1] = bn[0];
              }
              fn = cpucycles();
              double second = ((double)fn - st) / TRIAL;
              printf("net clock cycle: %lf\n\n", first - second);
              return 1;
          }
