Efficient Binary Field Arithmetic and Applications to Curve-based Cryptography
Diego F. Aranha
Department of Computer Science, University of Brasília
CHES 2012 Tutorial

Part I: RELIC

Numbers

RELIC is an Efficient LIbrary for Cryptography (http://code.google.com/p/relic-toolkit):
- Research framework
- Licensed as free software (LGPL)
- 11 source code releases
- 78,000 lines of code
- 1300 visitors from 74 countries
- 1500 downloads

Introduction

Limitations of other libraries:
- Restricted portability
- Uninteresting licensing model
- Emphasis on standards and commercial algorithms

Why a new cryptographic library?
- Organization oriented for portability
- Complete control of licensing model
- Code sharing and reproducibility of results
- Focus on research

Organization

Basic organization:
- Meta-library
- Compile-time configuration
- Inspired by the GNU Multiple Precision Arithmetic Library (GMP)
- Two layers: Protocols and Arithmetic backend

Breakdown

Arithmetic backend:
- Architecture-dependent
- Rigid interface with upper layers
- Generic modules available in C and with GMP support
- 21 functions for multiple precision integer arithmetic, 26 functions for binary fields, 32 functions for prime fields

Why this organization?
It is currently possible to obtain competitive timings with the same library on an 8-bit processor with 4KB of RAM and on an 8-core Intel desktop processor.

Breakdown

Binary field arithmetic:
- Field size specified at compile time
- 3 different strategies for squaring, 5 for multiplication, 2 for square root extraction, 2 for half-trace and 6 for inversion
- Modular reduction by trinomials and pentanomials

Binary curve arithmetic:
- Supersingular, Koblitz and ordinary (standardized or not)
- Affine, projective and mixed coordinate systems
- 4 different strategies for random point scalar multiplication, 6 for fixed point and 4 for multiple point
- Symmetric pairings over genus-1 or genus-2 curves

Breakdown

Miscellaneous:
- Support for words of 8, 16, 32 and 64 bits
- Static, stack, automatic and dynamic memory allocators
- Helper macros for testing and benchmarking
- Support for debugging, profiling, tracing and multithreading
- Abundant Doxygen documentation
- Deactivation of modules and automatic elimination of algorithms to reduce code size
- Standard PRNG with configurable seed source
- Support for FreeBSD, Linux, Mac OS X, Windows
- Management of configuration and build system with CMake
- Open collaboration with academia and industry

Part II: Binary fields

Introduction

A finite field $\mathbb{F}_{p^m}$ consists of all polynomials with coefficients in $\mathbb{Z}_p$, for a prime $p$, reduced modulo an irreducible degree-$m$ polynomial $f(z)$. The prime $p$ is the characteristic of the field and $m$ is the extension degree.

A binary field $\mathbb{F}_{2^m}$ is the special case $p = 2$ and is formed by polynomials with binary coefficients.

Introduction

Example: field $\mathbb{F}_{2^8}$

Irreducible polynomial: $f(z) = z^8 + z^4 + z^3 + z + 1$ = 1 0001 1011

Representation:
  $a(z) = z^7 + z^3 + 1$ = 1000 1001 = 0x89
  $b(z) = z^6 + z^5 + z^2$ = 0110 0100 = 0x64

Addition:
  $a(z) + b(z) = z^7 + z^6 + z^5 + z^3 + z^2 + 1$ = 1110 1101 = 0xED

Note: $a(z) + a(z) = 2 \cdot a(z) = 0, \; \forall a \in \mathbb{F}_{2^m}$

Multiplication:
  $a(z) \times b(z) = z^{13} + z^{12} + z^8 + z^6 + z^2 \bmod f(z) = z^7 + z^5 + z^4 + z^3 + 1$ = 1011 1001 = 0xB9

Multiplication by $z$:
  $z \times b(z) = z^7 + z^6 + z^3$ = 1100 1000 = 0xC8 = $b \ll 1$
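The arithmetic in this example can be reproduced with a few lines of portable C. The sketch below is only illustrative: the names `gf256_add`, `gf256_mulz`, `gf256_mul` and `gf256_demo` are hypothetical and not part of RELIC, and the multiplication uses the plain shift-and-add method rather than any of the optimized strategies discussed later.

```c
#include <assert.h>
#include <stdint.h>

/* f(z) = z^8 + z^4 + z^3 + z + 1 with the z^8 term dropped (0x11B -> 0x1B). */
#define GF256_POLY 0x1Bu

/* Addition (and subtraction) is a coefficient-wise XOR. */
uint8_t gf256_add(uint8_t a, uint8_t b) {
    return (uint8_t)(a ^ b);
}

/* Multiplication by z: shift left by one, then reduce if a z^8 term appears. */
uint8_t gf256_mulz(uint8_t a) {
    uint8_t overflow = (uint8_t)(a >> 7);
    return (uint8_t)((uint8_t)(a << 1) ^ (overflow ? GF256_POLY : 0u));
}

/* Shift-and-add multiplication: accumulate a * z^i for every set bit i of b. */
uint8_t gf256_mul(uint8_t a, uint8_t b) {
    uint8_t r = 0;
    while (b != 0) {
        if (b & 1u) {
            r ^= a;
        }
        a = gf256_mulz(a);
        b >>= 1;
    }
    return r;
}

/* Checks based on the example operands a = 0x89 and b = 0x64. */
void gf256_demo(void) {
    assert(gf256_add(0x89, 0x64) == 0xED);  /* a + b */
    assert(gf256_add(0x89, 0x89) == 0x00);  /* a + a = 0 */
    assert(gf256_mulz(0x64) == 0xC8);       /* z * b, no reduction needed */
    assert(gf256_mul(0x89, 0x01) == 0x89);  /* multiplying by 1 */
    assert(gf256_mul(0x89, 0x64) == gf256_mul(0x64, 0x89));
}
```

Calling `gf256_demo()` from a test program exercises the addition and multiplication-by-z values of the example; `gf256_mul(0x89, 0x64)` computes the reduced product of the two operands.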

Introduction

Binary fields ($\mathbb{F}_{2^m}$) are omnipresent in Cryptography:
- Efficient Curve-based Cryptography (ECC, PBC)
- Post-quantum Cryptography
- Block ciphers

Many algorithms/optimizations already described in the literature:
- Is it possible to unify the fastest ones in a simple formulation?
- Can such a formulation reflect the state-of-the-art and provide new ideas?

Objective

Contributions:
- Formulation of state-of-the-art binary field arithmetic using vector instructions
- New strategy for the implementation of multiplication
- Time-memory trade-offs to compensate for native multiplier
- Experimental results

Arsenal

Intel Core architecture:
- 128-bit Streaming SIMD Extensions instructions (65/45 nm)
- Super shuffle engine introduced in 45 nm series
- Carry-less multiplier introduced in Nehalem family
- 256-bit Advanced Vector Extensions instructions (32 nm)

Relevant vector instructions:

Instruction       Description                 Cost     Mnemonic
MOVDQA            Memory load/store           3/2      ←
PSLLQ, PSRLQ      64-bit bitwise shifts       1        ≪∤8, ≫∤8
PXOR, PAND, POR   Bitwise XOR, AND, OR        1        ⊕, ∧, ∨
PUNPCKLBW/HBW     Byte interleaving           3        interlo/hi
PSLLDQ, PSRLDQ    128-bit bytewise shift      2 (1)    ≪8, ≫8
PSHUFB            Byte shuffling              3 (1)    shuffle, lookup
PALIGNR           Memory alignment            2 (1)    ⊳
PCLMULQDQ         Carry-less multiplication   10 (8)   ⊗

New SSSE3 instructions

PSHUFB instruction (_mm_shuffle_epi8): [figure not recovered]

Real power: any function from 4-bit inputs to 8-bit outputs can be evaluated on 16 bytes in parallel as a table lookup.
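To make the table-lookup reading of PSHUFB concrete, here is a portable scalar model of the instruction on one 16-byte vector. It is a sketch for illustration only: `pshufb_model` and `pshufb_demo` are hypothetical names, the loop processes one byte at a time where the hardware handles all 16 at once, and the 4-bit population count table is just one arbitrary choice of function.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model of PSHUFB: each output byte is table[idx & 0x0F], or 0 when
 * the top bit of the index byte is set. `out` must not alias the inputs. */
void pshufb_model(uint8_t out[16], const uint8_t table[16],
                  const uint8_t idx[16]) {
    for (int i = 0; i < 16; i++) {
        out[i] = (idx[i] & 0x80) ? 0 : table[idx[i] & 0x0F];
    }
}

/* Example function: number of set bits in a 4-bit value. */
void pshufb_demo(void) {
    static const uint8_t popcnt4[16] = {0, 1, 1, 2, 1, 2, 2, 3,
                                        1, 2, 2, 3, 2, 3, 3, 4};
    uint8_t idx[16];
    uint8_t out[16];

    for (int i = 0; i < 16; i++) {
        idx[i] = (uint8_t)(15 - i);
    }
    pshufb_model(out, popcnt4, idx);
    assert(out[0] == 4);   /* index 15 has four set bits */
    assert(out[15] == 0);  /* index 0 has none */

    idx[3] = 0x80;         /* top bit set: the lane is cleared */
    pshufb_model(out, popcnt4, idx);
    assert(out[3] == 0);
}
```

Any other 16-entry byte table can be substituted for `popcnt4`, which is what makes the instruction useful for 4-bit granular arithmetic.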

New SSSE3 instructions

Example: bit manipulation [figure not recovered]

New SSSE3 instructions

PALIGNR instruction (_mm_alignr_epi8): [figure not recovered]
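Since the PALIGNR figure did not survive, the following scalar model may help fix its semantics: the two 16-byte operands are concatenated, the 32-byte value is shifted right by a byte count, and the low 16 bytes are kept. This is an illustrative sketch; `palignr_model` and `palignr_demo` are hypothetical names, and the hardware instruction takes the shift amount as an immediate rather than a run-time argument.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model of PALIGNR: take bytes n .. n+15 of the 32-byte value hi:lo
 * (lo holds bytes 0..15, hi holds bytes 16..31); bytes past the end are 0. */
void palignr_model(uint8_t out[16], const uint8_t hi[16],
                   const uint8_t lo[16], unsigned n) {
    for (unsigned i = 0; i < 16; i++) {
        unsigned j = i + n;
        out[i] = (j < 16) ? lo[j] : (j < 32) ? hi[j - 16] : 0;
    }
}

void palignr_demo(void) {
    uint8_t lo[16], hi[16], out[16];

    for (unsigned i = 0; i < 16; i++) {
        lo[i] = (uint8_t)i;         /* bytes 0..15 */
        hi[i] = (uint8_t)(16 + i);  /* bytes 16..31 */
    }
    palignr_model(out, hi, lo, 4);
    assert(out[0] == 4);    /* first byte comes from lo */
    assert(out[11] == 15);  /* last byte taken from lo */
    assert(out[12] == 16);  /* first byte taken from hi */
    assert(out[15] == 19);
}
```

In multi-word field arithmetic this kind of operation is useful for shifting a value that spans several vectors by whole bytes without separate shift and OR steps.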

Binary field $\mathbb{F}_{2^m}$

Irreducible polynomial: $f(z)$ (trinomial or pentanomial)
Polynomial basis: $a(z) \in \mathbb{F}_{2^m} = \sum_{i=0}^{m-1} a_i z^i$.
Software representation: vector of $n = \lceil m/64 \rceil$ words (even).
Graphical representation: [figure not recovered]

Data types

    #include <stdint.h>     /* fixed-width integer types */
    #include <emmintrin.h>  /* SSE2: __m128i */

    /* A digit is one machine word of the field element. */
    #if WORD == 8
    typedef uint8_t dig_t;
    #elif WORD == 16
    typedef uint16_t dig_t;
    #elif WORD == 32
    typedef uint32_t dig_t;
    #elif WORD == 64
    typedef uint64_t dig_t;
    #endif

    /* A vector is one 128-bit SSE register. */
    typedef __m128i vec_t;

Useful macros

    #define LOAD     _mm_load_si128
    #define STORE    _mm_store_si128
    #define PSHUFB   _mm_shuffle_epi8
    #define XOR      _mm_xor_si128
    #define AND      _mm_and_si128
    #define SHL      _mm_slli_epi64
    #define SHR      _mm_srli_epi64
    #define SHL8     _mm_slli_si128
    #define SHR8     _mm_srli_si128
    #define UNPACKLO _mm_unpacklo_epi8
    #define UNPACKHI _mm_unpackhi_epi8
    #define CLMUL    _mm_clmulepi64_si128

Proposed representation

To employ 4-bit granular arithmetic, convert to split form:

$a_L = \sum_{0 \le i < m,\; 0 \le i \bmod 8 \le 3} a_i z^i, \qquad a_H = \sum_{0 \le i < m,\; 4 \le i \bmod 8 \le 7} a_i z^{i-4}$

[Figure: vector $A_i$ split into $A_L$ and $A_H$]

Proposed representation

Easy to convert to split form:
  $A_L = A_i \wedge$ 0x0F0F0F0F0F0F0F0F0F0F0F0F0F0F0F0F
  $A_H = (A_i \wedge$ 0xF0F0F0F0F0F0F0F0F0F0F0F0F0F0F0F0$) \gg 4$

Easy to convert back: $a(z) = a_H(z) z^4 + a_L(z)$.

Addition/subtraction in $\mathbb{F}_{2^m}$

$c(z) = a(z) + b(z) = \sum_{i=0}^{m-1} (a_i \oplus b_i) z^i$

[Figure: words $A_{n-1}, \ldots, A_0$ and $B_{n-1}, \ldots, B_0$ are added word by word to give $C_{n-1}, \ldots, C_0$]

Guidelines:
- Use XOR instruction with largest operand size.
- Verify impact of higher throughput.
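The split-form conversion described above can be tried on a single 64-bit word before moving to 128-bit vectors. The sketch below assumes nothing beyond standard C; `split_form`, `join_form` and `split_demo` are hypothetical names, and the masks are the 64-bit halves of the 128-bit constants.

```c
#include <assert.h>
#include <stdint.h>

#define LO_MASK UINT64_C(0x0F0F0F0F0F0F0F0F)
#define HI_MASK UINT64_C(0xF0F0F0F0F0F0F0F0)

/* Split a word into its low nibbles and its high nibbles (moved down by 4). */
void split_form(uint64_t a, uint64_t *lo, uint64_t *hi) {
    *lo = a & LO_MASK;
    *hi = (a & HI_MASK) >> 4;
}

/* Convert back: a(z) = a_H(z) * z^4 + a_L(z). */
uint64_t join_form(uint64_t lo, uint64_t hi) {
    return (hi << 4) ^ lo;
}

void split_demo(void) {
    uint64_t a = UINT64_C(0x0123456789ABCDEF);
    uint64_t b = UINT64_C(0x0F1E2D3C4B5A6978);
    uint64_t al, ah, bl, bh, cl, ch;

    split_form(a, &al, &ah);
    assert(al == UINT64_C(0x01030507090B0D0F));
    assert(ah == UINT64_C(0x00020406080A0C0E));
    assert(join_form(al, ah) == a);

    /* The conversion is linear, so addition can be done in either form. */
    split_form(b, &bl, &bh);
    split_form(a ^ b, &cl, &ch);
    assert(cl == (al ^ bl) && ch == (ah ^ bh));
}
```

Because the high nibbles have already been masked, the shift by 4 never moves bits across a byte boundary, which is why a plain 64-bit shift is enough.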

Addition/subtraction in $\mathbb{F}_{2^m}$

    /* c = a + b. Two digits are processed per 128-bit vector, so this assumes
     * 64-bit digits, an even FB_DIGS and 16-byte aligned operands. */
    void fb_addn_low(dig_t *c, dig_t *a, dig_t *b) {
        int i;

        for (i = 0; i < FB_DIGS; i += 2, c += 2, a += 2, b += 2) {
            vec_t t0 = LOAD((vec_t *)a);   /* aligned load of two digits of a */
            vec_t t1 = LOAD((vec_t *)b);   /* aligned load of two digits of b */
            t0 = XOR(t0, t1);              /* addition is a bitwise XOR */
            STORE((vec_t *)c, t0);
        }
    }

Squaring in $\mathbb{F}_{2^m}$

$a(z) = \sum_{i=0}^{m-1} a_i z^i = a_{m-1} z^{m-1} + \cdots + a_2 z^2 + a_1 z + a_0$

$a(z)^2 = \sum_{i=0}^{m-1} a_i z^{2i} = a_{m-1} z^{2m-2} + \cdots + a_2 z^4 + a_1 z^2 + a_0$

Example:
  $a(z) = (a_{m-1}, a_{m-2}, \ldots, a_2, a_1, a_0)$
  $a(z)^2 = (a_{m-1}, 0, a_{m-2}, 0, \ldots, 0, a_2, 0, a_1, 0, a_0)$
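The zero-insertion view of squaring can be checked directly with a bit-by-bit expansion. The sketch below is deliberately naive and is not one of the library's squaring strategies; `sqr_spread32` and `sqr_spread_demo` are hypothetical names, and the result is the unreduced square (modular reduction by $f(z)$ is a separate step).

```c
#include <assert.h>
#include <stdint.h>

/* Unreduced squaring of a 32-bit polynomial: coefficient i moves to 2i,
 * which interleaves the input bits with zeros. */
uint64_t sqr_spread32(uint32_t a) {
    uint64_t r = 0;
    for (int i = 0; i < 32; i++) {
        if ((a >> i) & 1u) {
            r |= UINT64_C(1) << (2 * i);
        }
    }
    return r;
}

void sqr_spread_demo(void) {
    /* (z^7 + z^3 + 1)^2 = z^14 + z^6 + 1 */
    assert(sqr_spread32(0x89u) == UINT64_C(0x4041));
    /* All ones become the alternating pattern 0101...01. */
    assert(sqr_spread32(0xFFFFFFFFu) == UINT64_C(0x5555555555555555));
}
```

No cross terms appear because every product $a_i a_j z^{i+j}$ with $i \ne j$ occurs twice and cancels in characteristic 2.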

Squaring in $\mathbb{F}_{2^m}$

Since squaring is a linear operation:
$a(z)^2 = a_H(z)^2 \cdot z^8 + a_L(z)^2$.

We can compute $a_L(z)^2$ and $a_H(z)^2$ with a lookup table.
For $u = (u_3, u_2, u_1, u_0)$, use $\mathrm{table}(u) = (0, u_3, 0, u_2, 0, u_1, 0, u_0)$.

Proposed squaring in $\mathbb{F}_{2^m}$

[Figure: $A_i$ is split into $A_L$ and $A_H$; each half goes through a lookup in the table (00000000, 00000001, 00000100, 00000101, 00010000, 00010001, ..., 01010101); interlo and interhi then interleave the two results into $T_{2i}$ and $T_{2i+1}$]

$a(z)^2 = a_L(z)^2 + a_H(z)^2 \cdot z^8$.
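A scalar rendering of the table-based squaring may make the data flow easier to follow. This is a sketch under stated assumptions, not the vectorized routine itself: `fb_sqr_table_sketch` and `fb_sqr_table_demo` are hypothetical names, the element is stored as little-endian bytes, and the output is the unreduced square. In the vector version the two lookups would map to PSHUFB and the byte placement to the interleaving instructions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* table(u) = (0, u3, 0, u2, 0, u1, 0, u0) for every 4-bit u. */
static const uint8_t SQR_TABLE[16] = {
    0x00, 0x01, 0x04, 0x05, 0x10, 0x11, 0x14, 0x15,
    0x40, 0x41, 0x44, 0x45, 0x50, 0x51, 0x54, 0x55
};

/* Unreduced squaring of an n-byte element a into the 2n-byte result t.
 * The low nibble of byte j expands into byte 2j, the high nibble into
 * byte 2j + 1, which is the interleaving of the two lookup results. */
void fb_sqr_table_sketch(uint8_t *t, const uint8_t *a, size_t n) {
    for (size_t j = 0; j < n; j++) {
        t[2 * j]     = SQR_TABLE[a[j] & 0x0F];  /* square of the a_L part */
        t[2 * j + 1] = SQR_TABLE[a[j] >> 4];    /* square of the a_H part */
    }
}

void fb_sqr_table_demo(void) {
    /* a(z) = z^8 + z^7 + z^3 + 1, stored as little-endian bytes. */
    const uint8_t a[2] = {0x89, 0x01};
    uint8_t t[4];

    fb_sqr_table_sketch(t, a, 2);
    /* a(z)^2 = z^16 + z^14 + z^6 + 1 */
    assert(t[0] == 0x41 && t[1] == 0x40 && t[2] == 0x01 && t[3] == 0x00);
}
```

The table has only 16 entries, so it fits in a single 128-bit register, which is the property the proposed squaring relies on.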
