
High-speed parallel software implementation of the η_T pairing
Diego F. Aranha, Institute of Computing, UNICAMP
Joint work with Julio López and Darrel Hankerson

Introduction
Pairing computation is the most expensive operation in pairing-based cryptography.
Parallelism is increasingly being introduced in modern architectures.

Objective
Explore two types of parallelism in software to reduce pairing computation latency: vector instructions and multiprocessing.
Applications: real-time services (DNS?), embedded devices.

Contributions
Novel ways of implementing binary field arithmetic; parallelization of Miller's Algorithm; a static load balancing technique; experimental results.

Arsenal
Intel Core architecture: 128-bit Streaming SIMD Extensions instruction set; multiprocessing with overheads of around 10 microseconds; Super Shuffle Engine introduced in the 45 nm series.

Relevant vector instructions:

Instruction        Description              Cost    Mnemonic
MOVDQA             Memory load/store        2.5     ←
PSLLQ, PSRLQ       64-bit bitwise shifts    1       ≪∤8, ≫∤8
PXOR, PAND, POR    Bitwise XOR, AND, OR     1       ⊕, ∧, ∨
PUNPCKLBW/HBW      Byte interleaving        3       interlo/hi
PSLLDQ, PSRLDQ     128-bit bytewise shift   2 (1)   ≪8, ≫8
PSHUFB             Byte shuffling           3 (1)   shuffle, lookup
PALIGNR            Memory alignment         2 (1)   ⊳

New SSSE3 instructions
PSHUFB instruction (_mm_shuffle_epi8): [figure]
Real power: we can implement any function in parallel: [figure]
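To make the "any function in parallel" remark concrete, here is a minimal sketch that emulates the byte-shuffle semantics of PSHUFB on 16-byte vectors in plain Python. The helper names (`pshufb`, `popcount_bytes`) and the choice of a population-count table are illustrative assumptions, not taken from the slides; the point is only that a 16-entry table turns the shuffle into 16 simultaneous 4-bit lookups.

```python
def pshufb(table, indices):
    """Emulate PSHUFB on 16-byte vectors.

    For each index byte: if bit 7 is set the output byte is 0,
    otherwise it is table[index & 0x0F].
    """
    return bytes(0 if idx & 0x80 else table[idx & 0x0F] for idx in indices)


# Hypothetical example function: number of set bits in a 4-bit value.
POPCOUNT4 = bytes(bin(u).count("1") for u in range(16))


def popcount_bytes(data):
    """Per-byte population count built from two nibble lookups."""
    lo = bytes(b & 0x0F for b in data)   # low nibbles (a PAND with 0x0F...0F)
    hi = bytes(b >> 4 for b in data)     # high nibbles (shift, then mask)
    lo_counts = pshufb(POPCOUNT4, lo)
    hi_counts = pshufb(POPCOUNT4, hi)
    return bytes(x + y for x, y in zip(lo_counts, hi_counts))
```

Any other function from 4-bit inputs to 8-bit outputs can be substituted for `POPCOUNT4` without changing the lookup code.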

New SSSE3 instructions
Example: bit manipulation [figure]

New SSSE3 instructions
PALIGNR instruction (_mm_alignr_epi8): [figure]
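The behaviour of PALIGNR can be sketched in a few lines of Python: the two 16-byte operands are concatenated, the result is shifted right by a whole number of bytes, and the low 16 bytes are kept. The function name `palignr` and the little-endian byte-string representation are assumptions made for this illustration.

```python
def palignr(hi, lo, n):
    """Emulate PALIGNR / _mm_alignr_epi8(hi, lo, n) on 16-byte vectors.

    Vectors are little-endian byte strings, so byte 0 is the least
    significant. The 32-byte value hi:lo is shifted right by n bytes
    and the low 16 bytes are returned, padded with zeros when n > 16.
    """
    concat = lo + hi
    window = concat[n:n + 16]
    return window + bytes(16 - len(window))
```

For example, with `lo = bytes(range(16))` and `hi = bytes(range(16, 32))`, `palignr(hi, lo, 3)` yields bytes 3 through 18 of the concatenation, which is how a byte-granular shift can be carried across adjacent registers.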

Binary field $\mathbb{F}_{2^m}$
Irreducible polynomial: $f(z)$ (trinomial or pentanomial).
Polynomial basis: $a(z) \in \mathbb{F}_{2^m} = \sum_{i=0}^{m-1} a_i z^i$.
Software representation: vector of $n = \lceil m/64 \rceil$ words.
Notation: A is a 64-bit variable, A is a 128-bit variable.
Graphical representation: [figure]

Squaring in $\mathbb{F}_{2^m}$

$$a(z) = \sum_{i=0}^{m-1} a_i z^i = a_{m-1} z^{m-1} + \cdots + a_2 z^2 + a_1 z + a_0$$

$$a(z)^2 = \sum_{i=0}^{m-1} a_i z^{2i} = a_{m-1} z^{2m-2} + \cdots + a_2 z^4 + a_1 z^2 + a_0$$

Example:
$a(z) = (a_{m-1}, a_{m-2}, \ldots, a_2, a_1, a_0)$
$a(z)^2 = (a_{m-1}, 0, a_{m-2}, 0, \ldots, 0, a_2, 0, a_1, 0, a_0)$

Squaring in $\mathbb{F}_{2^m}$
We can write: $a(z) = a_L(z) + a_H(z) \cdot z^4$.
Since squaring is a linear operation: $a(z)^2 = a_L(z)^2 + a_H(z)^2 \cdot z^8$.
Polynomials $a_L(z)$ and $a_H(z)$ are easy to compute:
$A_L = A_i \wedge \texttt{0x0F0F0F0F0F0F0F0F0F0F0F0F0F0F0F0F}$
$A_H = A_i \wedge \texttt{0xF0F0F0F0F0F0F0F0F0F0F0F0F0F0F0F0}$

Squaring in $\mathbb{F}_{2^m}$
We can compute $a_L(z)^2$ and $a_H(z)^2$ with a lookup table.
For $u = (u_3, u_2, u_1, u_0)$ we use $\mathrm{table}(u) = (0, u_3, 0, u_2, 0, u_1, 0, u_0)$: [figure]

Proposed squaring in $\mathbb{F}_{2^m}$
$t(z) = a_L(z)^2 + a_H(z)^2 \cdot z^8$
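The table-based squaring can be sketched in scalar Python as follows. This is an illustration of the idea only, assuming a little-endian byte representation; the names `SQUARE_TABLE`, `square_unreduced`, and `spread_bits` are hypothetical, and the vectorized version described in the slides would perform the two lookups on 16 bytes at once and interleave the results.

```python
# table(u) = (0, u3, 0, u2, 0, u1, 0, u0): spread a nibble over one byte.
SQUARE_TABLE = bytes(
    (u & 1) | ((u & 2) << 1) | ((u & 4) << 2) | ((u & 8) << 3)
    for u in range(16)
)


def square_unreduced(a_bytes):
    """Square a binary polynomial given as little-endian bytes.

    Each input byte yields two output bytes: the squared low nibble
    (a_L^2) followed by the squared high nibble (a_H^2 * z^8).
    The result is NOT reduced modulo f(z).
    """
    out = bytearray()
    for byte in a_bytes:
        out.append(SQUARE_TABLE[byte & 0x0F])  # a_L(z)^2
        out.append(SQUARE_TABLE[byte >> 4])    # a_H(z)^2, one byte higher
    return bytes(out)


def spread_bits(a):
    """Reference: insert a zero between consecutive bits of the integer a."""
    result, i = 0, 0
    while a:
        result |= (a & 1) << (2 * i)
        a >>= 1
        i += 1
    return result
```

Placing the low-nibble square before the high-nibble square in the output corresponds to the byte-interleaving step that the interlo/hi instructions provide.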

Square root extraction in $\mathbb{F}_{2^m}$

$$\sqrt{a} = a^{2^{m-1}} = \left(\sum_{i=0}^{m-1} a_i z^i\right)^{2^{m-1}} = \sum_{i=0}^{m-1} a_i \left(z^{2^{m-1}}\right)^i$$

$$= \sum_{i\ \text{even}} a_i z^{\frac{i}{2}} + \sqrt{z} \sum_{i\ \text{odd}} a_i z^{\frac{i-1}{2}} = a_{\text{even}} + \sqrt{z} \cdot a_{\text{odd}}$$

For $f(z) = z^{1223} + z^{255} + 1$ in $\mathbb{F}_{2^{1223}}$, we have $\sqrt{z} = z^{612} + z^{128}$.
Important: multiplication by $\sqrt{z}$ requires shifts and additions only.
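As a worked check of the even/odd decomposition, the sketch below represents field elements as Python integers (bit i holds the coefficient of z^i) and computes the square root with two shifts and two XORs. The function names are assumptions for this illustration, and the bit-by-bit reduction is a slow reference routine, not the word-level method from the slides.

```python
M = 1223
F = (1 << 1223) | (1 << 255) | 1  # f(z) = z^1223 + z^255 + 1


def clmul(a, b):
    """Schoolbook carry-less product of two binary polynomials."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        b >>= 1
    return r


def reduce_mod_f(c):
    """Slow reference reduction: cancel set bits from the top down."""
    for i in range(c.bit_length() - 1, M - 1, -1):
        if (c >> i) & 1:
            c ^= F << (i - M)
    return c


def field_sqrt(a):
    """Square root of a (degree < M) as a_even + sqrt(z) * a_odd."""
    even = odd = 0
    for i in range(M):
        bit = (a >> i) & 1
        if i % 2 == 0:
            even |= bit << (i // 2)
        else:
            odd |= bit << ((i - 1) // 2)
    # sqrt(z) = z^612 + z^128. a_odd has degree at most 610, so the
    # product has degree at most 1222 and needs no reduction.
    return even ^ (odd << 612) ^ (odd << 128)
```

Squaring the result with `reduce_mod_f(clmul(r, r))` should return the original element, which is a convenient way to test the routine.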

Proposed square root in $\mathbb{F}_{2^m}$
[figure]

Multiplication in $\mathbb{F}_{2^m}$
1. Multi-precision multiplication: an instance of Karatsuba; López-Dahab comb method.
2. Modular reduction.

Karatsuba multiplication in $\mathbb{F}_{2^m}$
Write $a(z) = A_1 z^{\lceil m/2 \rceil} + A_0$ and $b(z) = B_1 z^{\lceil m/2 \rceil} + B_0$. Then:

$$c(z) = a(z) \cdot b(z) = A_1 B_1 z^{2\lceil m/2 \rceil} + \left[(A_1 + A_0)(B_1 + B_0) + A_1 B_1 + A_0 B_0\right] z^{\lceil m/2 \rceil} + A_0 B_0$$

The leading exponent $2\lceil m/2 \rceil$ equals $m$ when $m$ is even.
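A single Karatsuba step over binary polynomials can be sketched as below, with polynomials stored as Python integers and the split taken at k = ⌈m/2⌉ bits. The names `clmul` and `karatsuba_step` are illustrative; the three half-size products use a schoolbook routine here, whereas a real implementation would recurse or call a faster multiplier.

```python
def clmul(a, b):
    """Schoolbook carry-less product of two binary polynomials."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        b >>= 1
    return r


def karatsuba_step(a, b, m):
    """One Karatsuba level for polynomials of degree < m.

    Addition in F_2[z] is XOR, so the middle term needs no subtraction.
    """
    k = (m + 1) // 2                 # ceil(m / 2)
    mask = (1 << k) - 1
    a0, a1 = a & mask, a >> k
    b0, b1 = b & mask, b >> k
    lo = clmul(a0, b0)               # A0 * B0
    hi = clmul(a1, b1)               # A1 * B1
    mid = clmul(a0 ^ a1, b0 ^ b1) ^ lo ^ hi
    return (hi << (2 * k)) ^ (mid << k) ^ lo
```

The step replaces four half-size products with three, at the cost of a few extra XORs.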

López-Dahab multiplication in $\mathbb{F}_{2^m}$
We can compute $u \cdot b(z)$ using shifts and additions.
If $a(z)$ is divided into 4-bit polynomials, compute $a(z) \cdot b(z)$ by: [figure]

Proposed multiplication in $\mathbb{F}_{2^m}$

Algorithm 1: LD multiplication implemented with n 128-bit registers.
Input: a(z) = a[0..n−1], b(z) = b[0..n−1].
Output: c(z) = c[0..n−1].
Note: m_i denotes the vector of n/2 128-bit registers (r_(i−1+n/2), ..., r_i).

Compute T0(u) = u(z) · b(z), T1(u) = u(z) · (b(z) ≪ 4) for all u(z) of degree lower than 4.
(r_(n−1), ..., r_0) ← 0
for k ← 56 downto 0 by 8 do
    for j ← 1 to n − 1 by 2 do
        Let u = (u3, u2, u1, u0), where u_t is bit (k + t) of a[j].
        Let v = (v3, v2, v1, v0), where v_t is bit (k + t + 4) of a[j].
        m_((j−1)/2) ← m_((j−1)/2) ⊕ T0(u), m_((j−1)/2) ← m_((j−1)/2) ⊕ T1(v)
    end for
    (r_(n−1), ..., r_0) ← (r_(n−1), ..., r_0) ⊳ 8
end for
for k ← 56 downto 0 by 8 do
    for j ← 0 to n − 2 by 2 do
        Let u = (u3, u2, u1, u0), where u_t is bit (k + t) of a[j].
        Let v = (v3, v2, v1, v0), where v_t is bit (k + t + 4) of a[j].
        m_(j/2) ← m_(j/2) ⊕ T0(u), m_(j/2) ← m_(j/2) ⊕ T1(v)
    end for
    if k > 0 then (r_(n−1), ..., r_0) ← (r_(n−1), ..., r_0) ⊳ 8
end for
return c = (r_(n−1), ..., r_0) mod f(z)
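For orientation, the sketch below shows the textbook left-to-right comb method with a 4-bit window on 64-bit words, which is the idea the register-based algorithm builds on. It is not a transcription of Algorithm 1: it uses a single table, processes one nibble per step, and models the accumulator as one Python integer. The names `ld_comb_mul` and `clmul` are assumptions for this example.

```python
def clmul(a, b):
    """Schoolbook carry-less product, used for the table and as a reference."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        b >>= 1
    return r


def ld_comb_mul(a, b, n, word_bits=64, w=4):
    """Unreduced product a(z) * b(z), where a occupies n words.

    For each window position k (highest first), the k-th nibble of every
    word of a selects a precomputed multiple u(z) * b(z), which is added
    at that word's offset. The accumulator is then shifted by w bits.
    """
    table = [clmul(u, b) for u in range(1 << w)]      # u(z) * b(z), deg u < w
    word_mask = (1 << word_bits) - 1
    words = [(a >> (word_bits * j)) & word_mask for j in range(n)]
    c = 0
    for k in range(word_bits // w - 1, -1, -1):
        for j, word in enumerate(words):
            u = (word >> (w * k)) & ((1 << w) - 1)
            c ^= table[u] << (word_bits * j)
        if k:
            c <<= w                                    # multiply by z^w
    return c
```

The result still has to be reduced modulo f(z), as in the final step of the algorithm above.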

Modular reduction (64-bit mode)

Algorithm 2: Fast modular reduction by $f(z) = z^{1223} + z^{255} + 1$.
Input: c(z) = c[0..2n−1].
Output: c(z) mod f(z) = c[0..n−1].

for i ← 2n − 1 downto n do
    t ← c[i]
    c[i − 15] ← c[i − 15] ⊕ (t ≫ 8)
    c[i − 16] ← c[i − 16] ⊕ (t ≪ 56)
    c[i − 19] ← c[i − 19] ⊕ (t ≫ 7)
    c[i − 20] ← c[i − 20] ⊕ (t ≪ 57)
end for
t ← c[19] ≫ 7, c[0] ← c[0] ⊕ t, t ← t ≪ 7
c[3] ← c[3] ⊕ (t ≪ 56)
c[4] ← c[4] ⊕ (t ≫ 8)
c[19] ← (c[19] ⊕ t) ∧ 0x7F
return c
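The word-level reduction can be sketched in Python by masking every left shift to 64 bits, with n = 20 words for m = 1223. The shift amounts follow from 1223 = 19·64 + 7 and 1223 − 255 = 968 = 15·64 + 8. The function names and the bit-by-bit reference routine are additions for this illustration.

```python
MASK64 = (1 << 64) - 1
N = 20                                   # ceil(1223 / 64)
F = (1 << 1223) | (1 << 255) | 1         # f(z) = z^1223 + z^255 + 1


def reduce_words(c):
    """Reduce 2N little-endian 64-bit words modulo f(z); returns N words."""
    c = list(c)
    for i in range(2 * N - 1, N - 1, -1):
        t = c[i]
        c[i - 15] ^= t >> 8               # z^(64i + j - 968), high part
        c[i - 16] ^= (t << 56) & MASK64   # same term, low part
        c[i - 19] ^= t >> 7               # z^(64i + j - 1223), high part
        c[i - 20] ^= (t << 57) & MASK64   # same term, low part
    t = c[19] >> 7                        # bits at positions >= 1223
    c[0] ^= t
    t = (t << 7) & MASK64
    c[3] ^= (t << 56) & MASK64
    c[4] ^= t >> 8
    c[19] = (c[19] ^ t) & 0x7F
    return c[:N]


def reduce_reference(words):
    """Slow check: pack the words into an integer and reduce bit by bit."""
    value = sum(w << (64 * i) for i, w in enumerate(words))
    for i in range(value.bit_length() - 1, 1222, -1):
        if (value >> i) & 1:
            value ^= F << (i - 1223)
    return [(value >> (64 * i)) & MASK64 for i in range(N)]
```

Because the loop runs from the highest word down, any bits folded into words 20 through 24 are picked up again in later iterations, and the final four statements clear the leftover bits above position 1222 in word 19.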
