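Learning with Errors
Solving a system of linear equations over $\mathbb{Z}_{13}$: a given 7×4 matrix $A$ times a secret 4×1 vector $s$ yields a 7×1 vector $b$, i.e., $A \cdot s = b$.
[Figure: numeric 7×4 example matrix $A$, secret $s$ and result $b$ over $\mathbb{Z}_{13}$]
Blue ($A$, $b$) is given; find (learn) red ($s$) → solve the linear system (e.g., by Gaussian elimination).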
Learning with Errors
Same system, but with small noise added: $A \cdot s + e = b$ over $\mathbb{Z}_{13}$, where $A$ (7×4) looks random, $s$ (4×1) is a random secret and $e$ (7×1) is small noise (entries such as −1, 0, 1).
[Figure: numeric 7×4 example over $\mathbb{Z}_{13}$ with a small noise vector $e$]
Blue ($A$, $b$) is given; find red ($s$) → Learning with Errors (LWE) problem
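A minimal C sketch of how an LWE instance is formed, assuming row-major storage and toy dimensions. The function name `lwe_sample` is hypothetical, and the 2×2 numbers used to exercise it are illustrative rather than taken from the slide's figure.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Reduce a signed value into [0, q). */
static uint32_t mod_q(int64_t x, uint32_t q)
{
    int64_t r = x % (int64_t)q;
    return (uint32_t)(r < 0 ? r + (int64_t)q : r);
}

/* b = A*s + e (mod q) for a rows x cols matrix A stored row-major.
   s and e may hold negative entries (e.g. noise in {-1, 0, 1}). */
void lwe_sample(const uint32_t *A, const int32_t *s, const int32_t *e,
                uint32_t *b, size_t rows, size_t cols, uint32_t q)
{
    assert(q > 1);
    for (size_t i = 0; i < rows; i++) {
        int64_t acc = e[i];                      /* start from the error */
        for (size_t j = 0; j < cols; j++)
            acc += (int64_t)A[i * cols + j] * s[j];
        b[i] = mod_q(acc, q);
    }
}
```

With `e = 0` the secret follows from Gaussian elimination on `(A, b)`. The small error term is what turns this into the LWE problem.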
Key Aspects of Lattice-based Systems
• Encryption and signature systems are both feasible (and secure)
  – Significant ciphertext expansion for (R-)LWE encryption
  – Non-zero decryption error probability with (R-)LWE encryption
• Random sampling not only from uniform but also from discrete Gaussian distributions (not a trivial task!)
• Most operations are efficient and parallelizable
  – (Ideal lattices) Use of the FFT for polynomial multiplication
  – (Standard lattices) Matrix-vector arithmetic
• Public and private keys are reasonably large
  – Acceptable for encryption and signature constructions
  – Unclear for advanced services such as fully homomorphic encryption (FHE) or functional encryption
Outline
• Introduction
• Classes of Post-Quantum Cryptography (PQC)
  – Code-Based Cryptography
  – Lattice-Based Cryptography
  – Hash-Based Cryptography
• Lessons Learned
Hash-based Cryptography: Lamport-Diffie One-Time Signatures (LD-OTS, 1979)
• Definition: Given a security parameter $n$, the set of $n$-bit vectors $\{0,1\}^n$ and a one-way function $f: \{0,1\}^n \to \{0,1\}^n$
• Secret key: Generate $2n$ random $n$-bit vectors $X = (x_{0,0}, x_{0,1}, x_{1,0}, x_{1,1}, \ldots, x_{n-1,0}, x_{n-1,1})$
• Public key: Compute $Y = (y_{0,0}, \ldots, y_{n-1,1})$ with $y_{i,j} = f(x_{i,j})$
[Figure: each secret value $x_{i,j}$ is mapped by the one-way function to the public value $y_{i,j}$]
• Publish the public key $Y$
Hash-based Cryptography: Lamport-Diffie One-Time Signatures (LD-OTS, 1979)
• Definition: Given a published public key $Y$ and an $n$-bit message $M = (m_0, \ldots, m_{n-1})$ to sign
• Sign: Generate the signature $\sigma = (x_{0,m_0}, \ldots, x_{n-1,m_{n-1}})$ by revealing the corresponding secret values $x_{i,m_i}$
• Verify: Check that $f(\sigma_i) = y_{i,m_i}$ for all $i \in [0, n-1]$
[Figure: each message bit $m_i$ selects one of the two secret values $x_{i,0}, x_{i,1}$; applying $f$ to the revealed values must reproduce the selected public values in $Y$]
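The key generation, signing and verification rules can be sketched in C as follows. This is a toy built on explicit assumptions: messages are only 8 bits, secret values are 64-bit words, and `toy_f` is a hypothetical stand-in for the one-way function. It is a bijective mixing function that is not one-way at all, whereas a real LD-OTS uses a cryptographic hash. All function names are illustrative.

```c
#include <assert.h>
#include <stdint.h>

#define MSG_BITS 8   /* toy message length n (real LD-OTS: n = 256) */

/* Hypothetical stand-in for the one-way function f. This splitmix64-style
   mixer is a bijection and NOT one-way; use a cryptographic hash instead. */
static uint64_t toy_f(uint64_t v)
{
    v ^= v >> 30; v *= 0xbf58476d1ce4e5b9ULL;
    v ^= v >> 27; v *= 0x94d049bb133111ebULL;
    v ^= v >> 31;
    return v;
}

/* Public key: y[i][j] = f(x[i][j]) for i < n, j in {0, 1}. */
void ldots_keygen(uint64_t x[MSG_BITS][2], uint64_t y[MSG_BITS][2])
{
    for (int i = 0; i < MSG_BITS; i++)
        for (int j = 0; j < 2; j++)
            y[i][j] = toy_f(x[i][j]);
}

/* Sign: reveal x[i][m_i] for every message bit m_i (bit i of msg). */
void ldots_sign(uint64_t x[MSG_BITS][2], uint8_t msg, uint64_t sig[MSG_BITS])
{
    for (int i = 0; i < MSG_BITS; i++)
        sig[i] = x[i][(msg >> i) & 1];
}

/* Verify: accept iff f(sig[i]) == y[i][m_i] for all i. */
int ldots_verify(uint64_t y[MSG_BITS][2], uint8_t msg, const uint64_t sig[MSG_BITS])
{
    for (int i = 0; i < MSG_BITS; i++)
        if (toy_f(sig[i]) != y[i][(msg >> i) & 1])
            return 0;
    return 1;
}
```

Signing a second message with the same key reveals both secret values at every position where the messages differ. This is why each key pair may only be used once.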
Extension for Multiple Use: Merkle's Signature Scheme
• Idea by R. Merkle [1979]: reduces the validity of many OTS verification keys to a single verification key using a binary tree
[Figure: binary tree whose leaves $V_0[0], \ldots, V_0[7]$ are the hashes $g(Y_0), \ldots, g(Y_7)$ of the public OTS keys; the root $V_3[0]$ is the public MSS key $PK$]
• Properties and requirements
  – Max. signature count determined by the height $H$ of the tree (fixed at setup)
  – Needs to keep track of already used signatures in the tree → stateful signature scheme
  – Can be used with any one-time signature scheme and (collision-resistant) cryptographic hash function
Merkle Signature Scheme Principle
Let $g: \{0,1\}^* \to \{0,1\}^n$ be a hash function with security parameter $n$.
• Fix the height $H$ and generate $2^H$ LD-OTS key pairs $(X_i, Y_i)$ with $0 \le i < 2^H$
• Notation: $V_h[j]$ with $0 \le h \le H$ and $0 \le j < 2^{H-h}$
• Leaves: $V_0[i] = g(Y_i)$
• Computation rule for inner nodes: $V_h[j] = g(V_{h-1}[2j] \,\|\, V_{h-1}[2j+1])$ with $0 < h \le H$ and $0 \le j < 2^{H-h}$
Example: $H = 3$, $PK = V_3[0]$
[Figure: tree with leaves $V_0[0], \ldots, V_0[7]$ computed from the key pairs $(X_0, Y_0), \ldots, (X_7, Y_7)$, inner nodes $V_1[0], \ldots, V_1[3]$ and $V_2[0], V_2[1]$, and root $V_3[0]$]
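The inner-node rule can be illustrated by computing the root for $H = 3$. The sketch also checks one leaf against its authentication path, which is the set of sibling nodes a signer includes in a Merkle signature. `toy_g2` is a hypothetical, non-cryptographic placeholder for $g$, and all names are illustrative.

```c
#include <assert.h>
#include <stdint.h>

#define TREE_H 3                    /* height H of the example tree */
#define LEAVES (1u << TREE_H)       /* 2^H one-time key pairs */

/* Stand-in for g(left || right). NOT a cryptographic hash; a real MSS
   would hash the concatenation with e.g. SHA-256. */
static uint64_t toy_g2(uint64_t left, uint64_t right)
{
    uint64_t x = (left * 0x9e3779b97f4a7c15ULL) ^ (right + 0x632be59bd9b4e019ULL);
    x ^= x >> 31;
    x *= 0xbf58476d1ce4e5b9ULL;
    x ^= x >> 29;
    return x;
}

/* Root V_H[0] from the leaves V_0[j] = g(Y_j), level by level, using
   V_h[j] = g(V_{h-1}[2j] || V_{h-1}[2j+1]); computed in place. */
uint64_t merkle_root(const uint64_t leaves[LEAVES])
{
    uint64_t level[LEAVES];
    for (unsigned j = 0; j < LEAVES; j++)
        level[j] = leaves[j];
    for (unsigned width = LEAVES; width > 1; width /= 2)
        for (unsigned j = 0; j < width / 2; j++)
            level[j] = toy_g2(level[2 * j], level[2 * j + 1]);
    return level[0];
}

/* Verifier side: rebuild the root from leaf `idx` and its authentication
   path (one sibling per level, bottom-up) and compare with PK. */
int merkle_verify(uint64_t leaf, unsigned idx, const uint64_t auth[TREE_H], uint64_t pk)
{
    uint64_t node = leaf;
    for (unsigned h = 0; h < TREE_H; h++, idx >>= 1)
        node = (idx & 1) ? toy_g2(auth[h], node) : toy_g2(node, auth[h]);
    return node == pk;
}
```

The in-place update is safe because level `j` only reads positions `2j` and `2j + 1`, which have not yet been overwritten in that pass.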
Key Aspects of Hash-based Cryptographic Systems
• Only signature schemes available, no encryption
• Moderate requirements for implementations
  – Second-preimage (older schemes: collision) resistant hash function
  – Pseudorandom functions for OTS (XMSS)
• Hard limitation on the number of signatures per tree
  – Height of the tree determines the max. number of signatures (issue with DoS attacks for real-world systems)
  – Requires keeping track of the signatures already used (critical in untrusted environments!)
  – Increasing the tree height increases memory requirements and computational complexity
Lessons Learned
• Post-Quantum Cryptography is essential for long-term security
  – Code-based encryption schemes are the most mature candidates
  – Digital signatures from hash-based cryptography offer high confidence with respect to security and are under standardization
  – Lattice-based cryptography has high potential and extremely high versatility
• Next topics in this tutorial (selection due to time constraints)
  – Efficient implementation strategies for code-based cryptosystems
  – Efficient implementation of lattice-based cryptosystems
ICT-644729
Part I: Introduction to Post Quantum Cryptography
Tutorial@CHES 2017 - Taipei
Tim Güneysu, Ruhr-Universität Bochum & DFKI, 04.10.2017
Thank you! Questions?
Part II: Hardware Architectures for Post Quantum Cryptography
including slides by Ingo von Maurich and Thomas Pöppelmann
Tutorial Outline – Part II
• Code-based Cryptography
• Efficient Code-based Implementations
• Lattice-based Cryptography
• Efficient Lattice-based Implementations
• Lessons Learned
Recall: McEliece Encryption Scheme [1978]
Key Generation
Given an $[n, k]$-code $C$ with generator matrix $G$ and error-correcting capability $t$
• Private key: $(S, G, P)$, where $S$ is a scrambling and $P$ is a permutation matrix
• Public key: $G' = S \cdot G \cdot P$
Encryption
• Message $m \in \mathbb{F}_2^k$, error vector $e \in_R \mathbb{F}_2^n$ with $\mathrm{wt}(e) \le t$
• $x \leftarrow mG' + e$
Decryption
• Let $\Psi_H$ be a $t$-error-correcting decoding algorithm.
• $m \cdot S \leftarrow \Psi_H(x \cdot P^{-1})$, which removes the error $e \cdot P^{-1}$
• Extract $m$ by computing $m \cdot S \cdot S^{-1}$
Security Parameters (Binary Goppa Codes)
• Original proposal: McEliece with binary Goppa codes
  – Code properties determine the key size; matrices are often large
• Code parameters revisited by Bernstein, Lange and Peters
• Public key is a $k \times (n - k)$ bit matrix (redundant part only)
Code-based Cryptography for Embedded Devices
[Figure: Encrypt $x$ with $K_{pub} = M$: $y = Mx + e$; decrypt $y$ with $K_{priv}$: $x = \Psi(y, K_{priv})$]
• Selection of the employed code is a highly critical issue
  – Properties of the code determine the key size; short keys are essential
  – Structures in codes reduce the key size, but can enable attacks
  – Encoding is a fast operation on all platforms (matrix multiplication)
  – Decoding requires efficient techniques in terms of time and memory
• Basic McEliece is only CPA-secure; a conversion is required
• Protection against side-channel and fault-injection attacks
Quasi-Cyclic Moderate Density Parity-Check Codes (QC-MDPC)
• $t$-error-correcting $(n, r, w)$-QC-MDPC code of length $n = n_0 r$
• Parity-check matrix $H$ consists of $n_0$ blocks with fixed row weight $w$
Code/Key Generation
1. Generate the $n_0$ first rows of the parity-check matrix blocks $H_i$: $h_i \in_R \mathbb{F}_2^r$ of weight $w_i$, with $w = \sum_{i=0}^{n_0-1} w_i$
2. Obtain the remaining rows by $r - 1$ quasi-cyclic shifts of $h_i$
3. $H = [H_0 \,|\, H_1 \,|\, \ldots \,|\, H_{n_0-1}]$
4. Generator matrix of systematic form $G = [I_k \,|\, Q]$ with $Q = \begin{pmatrix} (H_{n_0-1}^{-1} \cdot H_0)^T \\ (H_{n_0-1}^{-1} \cdot H_1)^T \\ \vdots \\ (H_{n_0-1}^{-1} \cdot H_{n_0-2})^T \end{pmatrix}$
Background on QC-MDPC Codes
[Figure: parity-check matrix $H = [H_0 \,|\, H_1]$ for $n_0 = 2$, built from two circulant blocks, and the corresponding generator matrix $G = [I \,|\, Q]$]
(QC-)MDPC McEliece
Encryption
• Message $m \in \mathbb{F}_2^k$, error vector $e \in_R \mathbb{F}_2^n$ with $\mathrm{wt}(e) \le t$
• $x \leftarrow mG + e$
Decryption
• Let $\Psi_H$ be a $t$-error-correcting (QC-)MDPC decoding algorithm.
• $mG \leftarrow \Psi_H(mG + e)$
• Extract $m$ from the first $k$ positions.
Parameters for 80-bit equivalent symmetric security [MTSB13]: $n_0 = 2$, $n = 9602$, $r = 4801$, $w = 90$, $t = 84$
Hardware Implementation of Building Blocks for McEliece/Niederreiter
Two operations:
• Encryption/Encoding (message → codeword)
  – Matrix-vector multiplication with $G$ (with large matrices, either stored or generated on-the-fly)
  – TRNG for error generation
• Decryption/Decoding (ciphertext → message)
  – Code-specific syndrome decoding; hard-decision decoding with simple (bitwise) operations preferred
  – Inverse-matrix-vector multiplication
Efficient Decoding of MDPC Codes
Decoders for LDPC/MDPC codes: bit flipping and belief propagation
"Bit-Flipping" Decoder
1. Compute the syndrome $s$ of the ciphertext
2. Count the unsatisfied parity-check equations $\#upc$ for each ciphertext bit
3. Flip the ciphertext bits that violate $\ge b$ equations
4. Recompute the syndrome
5. Repeat until $s = 0$ or the max. number of iterations is reached (decoding failure)
How to determine the threshold $b$?
• Precompute $b_i$ for each iteration [Gal62]
• $b = \mathrm{max}_{upc}$, the largest $\#upc$ over all bits [HP03]
• $b = \mathrm{max}_{upc} - \delta$ [MTSB13]
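The five decoder steps can be sketched on a toy code. The assumptions are significant. The sketch uses a hypothetical 4×6 parity-check matrix, the incidence matrix of the complete graph on four vertices, in place of a QC-MDPC matrix. It stores one bit per byte in dense form and applies the threshold rule $b = \mathrm{max}_{upc}$ from [HP03]. All names are illustrative.

```c
#include <assert.h>
#include <stdint.h>

#define R 4   /* parity checks (rows of H) */
#define N 6   /* code length (columns of H) */

/* Toy parity-check matrix: incidence matrix of K4 (rows = vertices,
   columns = edges). Every column has weight 2 and two columns share at
   most one row. NOT a QC-MDPC matrix. */
static const uint8_t Hm[R][N] = {
    {1, 1, 1, 0, 0, 0},
    {1, 0, 0, 1, 1, 0},
    {0, 1, 0, 1, 0, 1},
    {0, 0, 1, 0, 1, 1},
};

/* s = H * y^T over GF(2); returns the Hamming weight of s. */
static int syndrome(const uint8_t y[N], uint8_t s[R])
{
    int weight = 0;
    for (int i = 0; i < R; i++) {
        s[i] = 0;
        for (int j = 0; j < N; j++) s[i] ^= Hm[i][j] & y[j];
        weight += s[i];
    }
    return weight;
}

/* Bit flipping with b = max_upc. Corrects y in place;
   returns 1 on success (s = 0), 0 on decoding failure. */
int bitflip_decode(uint8_t y[N], int max_iter)
{
    uint8_t s[R];
    for (int it = 0; it < max_iter; it++) {
        if (syndrome(y, s) == 0) return 1;           /* steps 1, 4, 5 */
        int upc[N], b = 0;
        for (int j = 0; j < N; j++) {                 /* step 2 */
            upc[j] = 0;
            for (int i = 0; i < R; i++) upc[j] += Hm[i][j] & s[i];
            if (upc[j] > b) b = upc[j];
        }
        for (int j = 0; j < N; j++)                   /* step 3 */
            if (upc[j] >= b) y[j] ^= 1;
    }
    return syndrome(y, s) == 0;
}
```

A single error in column `j` lights up exactly the two checks of that column. Because no other column covers both of those checks, only bit `j` reaches the threshold. Real MDPC decoders rely on a similar low-overlap property, but only in a probabilistic sense.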
FPGA Low-Resource Encryption
Target: Xilinx Spartan-6 FPGA; scheme: QC-MDPC encryption
• Given the first 4801-bit row $g$ of $G$ and message $m$, compute $x = mG + e$
• Storage requirements
  – One 18 kBit BRAM is sufficient to store message $m$, row $g$ and the redundant part (3 × 4801-bit vectors)
  – But only two data ports are available
  – Read out 32 bits of the message and store them in a separate register
• Error addition
  – Instead of starting with an all-zero redundant part, we preload it with the second half of the error vector
[Figure: datapath with BRAM (message $m$, row $g$, redundant part), a 32-bit flip-flop register, and control + XOR logic]
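The encryption datapath can be sketched at bit level in C: message bits select rows of $G$, and the accumulator is preloaded with the error half. The sketch assumes $n_0 = 2$, so $k = r$. It stores one bit per byte instead of 32-bit words and takes row $i$ of the circulant block to be $g$ rotated right by $i$. `qc_encode` is a hypothetical name. Only the first row is stored, which is what keeps the BRAM footprint small.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* x = mG + e for a QC code with n0 = 2, G = [I | Q]; one bit per byte.
   x0 = m + e0 (systematic part), x1 = e1 + sum over set m_i of row i of Q.
   Row i of the circulant Q is its stored first row g rotated right by i. */
void qc_encode(const uint8_t *m, const uint8_t *g, const uint8_t *e0,
               const uint8_t *e1, uint8_t *x0, uint8_t *x1, int r)
{
    assert(r > 0);
    memcpy(x1, e1, (size_t)r);            /* preload with second error half */
    for (int i = 0; i < r; i++) {
        x0[i] = m[i] ^ e0[i];
        if (m[i])                         /* set message bit selects row i */
            for (int j = 0; j < r; j++)
                x1[j] ^= g[(j - i + r) % r];
    }
}
```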
FPGA Low-Resource Decryption
QC-MDPC Decryption
• Secret key and ciphertext consist of two blocks
• Iterative vs. parallel design
• Decoding is a complex task → parallel processing
• BRAM-based implementation: storage requirements
  – Secret key (2 × 4801 bit)
  – Ciphertext (2 × 4801 bit)
  – Syndrome (4801 bit)
  – In total 3 BRAMs due to memory and port access requirements
FPGA Low-Resource Decryption
QC-MDPC Decryption
• Syndrome computation $s = H x^T$
  – Similar technique as for encoding
• Compare $s = 0$?
  – Compute the binary OR of all 32-bit blocks of the syndrome
• Count $\#upc$
  – Hamming weight of syndrome AND $h_0$/$h_1$ (32 bits at a time)
  – Accumulate the Hamming weight
• Bit flipping
  – If $\#upc \ge b_i$, invert the ciphertext bit(s) and XOR $h_0$/$h_1$ into the syndrome while rotating both
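Two of the word-level tricks can be sketched in C: the OR-based $s = 0$ test, and $\#upc$ computed as the Hamming weight of (syndrome AND key row). The sketch assumes both vectors are packed into 32-bit words. Rotation and bit flipping are omitted, and the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* s == 0 test: OR all 32-bit words together, as in the datapath. */
int syndrome_is_zero(const uint32_t *s, size_t words)
{
    uint32_t acc = 0;
    for (size_t i = 0; i < words; i++) acc |= s[i];
    return acc == 0;
}

/* Portable Hamming weight of one 32-bit word. */
static unsigned popcount32(uint32_t v)
{
    unsigned c = 0;
    for (; v; v &= v - 1) c++;      /* clear the lowest set bit */
    return c;
}

/* #upc = HW(s AND h), accumulated 32 bits at a time. */
unsigned count_upc(const uint32_t *s, const uint32_t *h, size_t words)
{
    unsigned upc = 0;
    for (size_t i = 0; i < words; i++) upc += popcount32(s[i] & h[i]);
    return upc;
}
```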
Lightweight FPGA Results
• Post-PAR results for Xilinx Spartan-6 XC6SLX4 and Virtex-6 XC6VLX240T
• Encryption takes 735,000 cycles
• Decryption takes 4,274,000 cycles on average
Lightweight FPGA Comparison
• Realistic public key size (0.6 kByte vs. 50-100 kByte)
• Smallest McEliece FPGA implementation
• Sufficient performance for many applications
Lattice-Based Cryptography
• Recall: benefits of lattice-based cryptography
  – We can get signatures and public-key encryption from lattices, and also more advanced services (IBE, FHE)
  – A lot of development on the theory side; schemes are improving
  – Implementation of lattice-based cryptography is a young field, only done for a few years (except maybe for NTRU)
To be Ideal or not Ideal?
Two important lines of research: random lattices and ideal lattices
• Major impact on implementation (on theory not that much)
• Security for random lattices is better understood (ideal lattices are more structured)
Random Lattices
• Operations on large matrices (e.g., 532 × 840)
• Mostly matrix-vector multiplication modulo $q < 2^{32}$
• Large public keys (e.g., a 532 × 840 matrix)
Ideal Lattices
• Operations on polynomials with 256 or 512 coefficients
• Mostly polynomial multiplication modulo $q < 2^{32}$
• Public keys are one (or two) polynomials with 256 or 512 coefficients
(Ring) Learning with Errors
From learning with errors to ring learning with errors
• Shift the first line on every line → only one line has to be stored
• Use the rule that we negate $x$ in case of wrap-around (e.g., $-10 \equiv 3 \bmod 13$)
Example (7 × 4 matrix over $\mathbb{Z}_{13}$ generated from the stored first row):
4 1 11 10
3 4 1 11
2 3 4 1
12 2 3 4
9 12 2 3
10 9 12 2
11 10 9 12
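The row rule can be checked mechanically. This sketch, which uses a hypothetical function name, derives each row from the previous one and regenerates the 7 × 4 example over $\mathbb{Z}_{13}$ from its stored first row.

```c
#include <assert.h>
#include <stdint.h>

/* Next row of the ring-LWE matrix: shift the previous row right by one
   position and negate the coefficient that wraps around (mod q). */
void next_negacyclic_row(const uint32_t *prev, uint32_t *next, int n, uint32_t q)
{
    assert(prev != next);                     /* no in-place update */
    next[0] = (q - prev[n - 1] % q) % q;      /* wrapped entry: -x mod q */
    for (int j = 1; j < n; j++)
        next[j] = prev[j - 1];
}
```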
Ring Learning with Errors: Principle
• Ideal lattices correspond to ideals in the ring $R = \mathbb{Z}_q[x]/\langle x^n + 1 \rangle$
• A Ring Learning With Errors (RLWE) sample is $t = as + e \in R$ for uniform $a \in R$ and small discrete Gaussian distributed $s, e \leftarrow D_\sigma$
  – Search-RLWE: find $s$ when given $t$ and $a$
  – Decision-RLWE: distinguish $t$ from uniform when given $t$ and $a$
[Figure: random $a = (34, 23, \ldots, 23)$ times small secret $s = (1, -2, \ldots, 0)$ plus small (Gaussian) error $e = (0, 1, \ldots, 0)$ gives $t = (32, 43, \ldots, 12)$]
Example: Polynomial Addition in $R = \mathbb{Z}_q[x]/\langle x^n + 1 \rangle$
• Assume parameters $q = 5$ and $n = 4$
• $a = 4x^3 + 2x^2 + 0x + 1 = (4, 2, 0, 1)$
• $b = 2x^3 + 1x^2 + 4x + 0 = (2, 1, 4, 0)$
• $c = a + b = (4 + 2 \bmod 5,\; 2 + 1,\; 0 + 4,\; 1 + 0) = (1, 3, 4, 1)$
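Example: Polynomial Multiplication in $R = \mathbb{Z}_q[x]/\langle x^n + 1 \rangle$ (again $q = 5$, $n = 4$)
• $a = (2, 1, 4, 0)$
• $b = (1, 3, 4, 1)$
• Task: $c = a \cdot b = (3, 0, 2, 0)$, i.e., multiply as usual, then reduce with $x^4 \equiv -1$ and the coefficients modulo 5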
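A schoolbook sketch of both examples. It assumes arrays store coefficients from degree 0 upward, while the slides list them from the highest degree down. For example, $(2, 1, 4, 0)$ becomes `{0, 4, 1, 2}`. The wrap-around uses $x^n = -1$ in $R$, and the function names are illustrative.

```c
#include <assert.h>
#include <stdint.h>

/* Arrays hold coefficients from degree 0 upward; inputs are reduced mod q. */

/* c = a + b in Z_q[x]/(x^n + 1): coefficient-wise addition mod q. */
void poly_add(const uint32_t *a, const uint32_t *b, uint32_t *c, int n, uint32_t q)
{
    for (int k = 0; k < n; k++) c[k] = (a[k] + b[k]) % q;
}

/* c = a * b in Z_q[x]/(x^n + 1), schoolbook O(n^2); uses x^n = -1. */
void poly_mul(const uint32_t *a, const uint32_t *b, uint32_t *c, int n, uint32_t q)
{
    assert(c != a && c != b);            /* c is overwritten while a, b are read */
    for (int k = 0; k < n; k++) c[k] = 0;
    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++) {
            uint32_t t = (uint32_t)((uint64_t)a[i] * b[j] % q);
            if (i + j < n)
                c[i + j] = (c[i + j] + t) % q;
            else                           /* wrap-around: subtract */
                c[i + j - n] = (c[i + j - n] + q - t) % q;
        }
}
```

With these conventions the addition yields `{1, 4, 3, 1}` (that is, $(1, 3, 4, 1)$) and the product yields `{0, 2, 0, 3}` (that is, $(3, 0, 2, 0)$), matching the two slides.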
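Discrete Gaussian Distribution
• $D_\sigma$ is defined by assigning to each $x \in \mathbb{Z}$ a weight proportional to $\rho_\sigma(x) = \exp\!\left(\frac{-x^2}{2\sigma^2}\right)$
[Figure: in $R = \mathbb{Z}_{4093}[x]/\langle x^{256} + 1 \rangle$, a uniform polynomial $a = (-1501, 1020, 502, \ldots, -1900, 572)$ versus a Gaussian polynomial $e = (-1, 4, -8, \ldots, 0, 1)$]
Remark on the arithmetic of distributed values:
• Uniform · Gaussian = Uniform
• Gaussian · Gaussian = larger Gaussian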
Gaussian Sampling: Options
• Cumulative Distribution Table (CDT) sampling
• Rejection sampling
• Bernoulli sampling
• Knuth-Yao sampling
References:
[DG14] Dwarakanath and Galbraith, "Efficient sampling from discrete Gaussians for lattice-based cryptography on a constrained device," Applicable Algebra in Engineering, Communication and Computing, 2014
[DDLL14] Ducas, Durmus, Lepoint and Lyubashevsky, "Lattice Signatures and Bimodal Gaussians," CRYPTO '13
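As an illustration of the first option, here is a minimal floating-point CDT sampler sketch. It assumes a hypothetical tail cut of 12, `rand()` as the uniform source, and double precision. Real implementations instead use fixed-point tables of precision λ, a tail cut τ derived from the security target, a cryptographic RNG and a constant-time table search. The value σ = 3.33 used to exercise it is only an example (σ = s/√(2π) for s = 8.35). The exponential is computed with a short Taylor series so that the sketch does not depend on libm. All names are illustrative.

```c
#include <assert.h>
#include <stdlib.h>

#define TAIL 12   /* hypothetical tail cut: samples satisfy |x| <= TAIL */

/* exp(-t) for 0 <= t <= 1 via its Taylor series. */
static double exp_neg_small(double t)
{
    double term = 1.0, sum = 1.0;
    for (int k = 1; k < 30; k++) {
        term *= -t / k;
        sum += term;
    }
    return sum;
}

/* CDT over |x| in [0, TAIL]: weight 1/2 at 0 (the sign is drawn separately)
   and rho(x) = exp(-x^2 / (2 sigma^2)) for x > 0, normalized to 1. */
void cdt_build(double cdt[TAIL + 1], double sigma)
{
    assert(sigma >= 0.75);                 /* keeps 1/(2 sigma^2) <= 1 */
    double r = exp_neg_small(1.0 / (2.0 * sigma * sigma));
    double rho = 1.0, total = 0.5;
    cdt[0] = total;
    for (int x = 1; x <= TAIL; x++) {
        for (int k = 0; k < 2 * x - 1; k++) rho *= r;   /* rho(x) = rho(x-1) r^(2x-1) */
        total += rho;
        cdt[x] = total;
    }
    for (int x = 0; x <= TAIL; x++) cdt[x] /= total;
}

/* Smallest x with u < cdt[x], for u uniform in [0, 1). */
int cdt_lookup(const double cdt[TAIL + 1], double u)
{
    int x = 0;
    while (x < TAIL && u >= cdt[x]) x++;
    return x;
}

/* One signed sample. rand() is a placeholder, NOT a cryptographic RNG,
   and the linear search is not constant time. */
int cdt_sample(const double cdt[TAIL + 1])
{
    double u = rand() / ((double)RAND_MAX + 1.0);
    int x = cdt_lookup(cdt, u);
    if (x != 0 && (rand() & 1)) x = -x;
    return x;
}
```

Giving the value 0 half weight before drawing a random sign makes every $x \in [-12, 12]$ come out with probability proportional to $\rho_\sigma(x)$.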
Ring-LWE Encryption Scheme [LP11/LPR10]
• Gen: Choose $a \leftarrow R$ and $r_1, r_2 \leftarrow D_\sigma$; pk: $p = r_1 - a \cdot r_2 \in R$; sk: $r_2$
• Enc($a, p, m \in \{0,1\}^n$): $e_1, e_2, e_3 \leftarrow D_\sigma$; $\bar{m} = \mathrm{encode}(m)$. Ciphertext: $[c_1 = a \cdot e_1 + e_2,\; c_2 = p \cdot e_1 + e_3 + \bar{m}]$
• Dec($c = [c_1, c_2], r_2$): Output $\mathrm{decode}(c_1 \cdot r_2 + c_2)$
• Correctness: $c_1 r_2 + c_2 = (a e_1 + e_2) r_2 + p e_1 + e_3 + \bar{m} = r_2 a e_1 + e_2 r_2 + r_1 e_1 - r_2 a e_1 + e_3 + \bar{m} = \bar{m} + e_2 r_2 + r_1 e_1 + e_3$, i.e., the large term $\bar{m}$ plus small noise
[Figure: dataflow of Gen, Enc and Dec built from Gaussian samplers $D_\sigma$, polynomial multipliers and adders]
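Ring-LWE Encryption: Parameters
$R = \mathbb{Z}_{4093}[x]/\langle x^{256} + 1 \rangle$; $n$-bit message, one bit per coefficient
Error correction:
• encode($m$): return $\bar{m} = m \cdot \lfloor q/2 \rfloor$
• decode($x$): if $q/4 < x < 3q/4$, return 1; else return 0
[Figure example: $m = (0, 1, \ldots, 1, 0) \to \bar{m} = (0, 2046, \ldots, 2046, 0)$; after decryption $\bar{m} + e_2 r_2 + e_1 r_1 + e_3 = (402, 1907, \ldots, 2631, 4024)$, which decodes back to $m = (0, 1, \ldots, 1, 0)$]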
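The scheme and the encode/decode rules can be sketched end to end for $n = 256$, $q = 4093$. This is a toy, not the LP11 construction as specified. All noise ($r_1, r_2, e_1, e_2, e_3$) is drawn uniformly from $\{-1, 0, 1\}$ instead of $D_\sigma$, randomness comes from a non-cryptographic xorshift generator, and multiplication is schoolbook. With ternary noise every coefficient of $e_1 r_1 + e_2 r_2 + e_3$ lies in $[-513, 513]$, well inside the $q/4$ decoding margin, so decryption cannot fail here. Gaussian noise leaves a small failure probability. Function names are illustrative.

```c
#include <assert.h>
#include <stdint.h>

#define N 256
#define Q 4093                    /* (n, q) of the [LP11] parameter set */

typedef int32_t poly[N];          /* coefficients kept in [0, Q) */

/* xorshift32: placeholder randomness, NOT cryptographically secure */
static uint32_t rng_state = 2463534242u;
static uint32_t rng(void)
{
    rng_state ^= rng_state << 13;
    rng_state ^= rng_state >> 17;
    rng_state ^= rng_state << 5;
    return rng_state;
}

static void sample_uniform(poly a) { for (int i = 0; i < N; i++) a[i] = (int32_t)(rng() % Q); }

/* Stand-in for D_sigma: coefficients uniform in {-1, 0, 1}, stored mod Q */
static void sample_small(poly e) { for (int i = 0; i < N; i++) e[i] = (int32_t)((rng() % 3 + Q - 1) % Q); }

/* c = a * b in Z_Q[x]/(x^N + 1), schoolbook; c may alias a or b */
static void mul(const int32_t *a, const int32_t *b, int32_t *c)
{
    int64_t t[N] = {0};
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++) {
            int64_t prod = (int64_t)a[i] * b[j];
            if (i + j < N) t[i + j] += prod;
            else t[i + j - N] -= prod;            /* x^N = -1 */
        }
    for (int i = 0; i < N; i++) c[i] = (int32_t)(((t[i] % Q) + Q) % Q);
}

static void add(const int32_t *a, const int32_t *b, int32_t *c)
{
    for (int i = 0; i < N; i++) c[i] = (a[i] + b[i]) % Q;
}

/* Gen: a uniform, r1, r2 small; pk = (a, p = r1 - a*r2), sk = r2 */
void keygen(poly a, poly p, poly r2)
{
    poly r1, t;
    sample_uniform(a);
    sample_small(r1);
    sample_small(r2);
    mul(a, r2, t);
    for (int i = 0; i < N; i++) p[i] = (r1[i] - t[i] + Q) % Q;
}

/* Enc: c1 = a*e1 + e2, c2 = p*e1 + e3 + encode(m), encode(bit) = bit * floor(Q/2) */
void encrypt(const int32_t *a, const int32_t *p, const uint8_t *m, poly c1, poly c2)
{
    poly e1, e2, e3;
    sample_small(e1);
    sample_small(e2);
    sample_small(e3);
    mul(a, e1, c1);
    add(c1, e2, c1);
    mul(p, e1, c2);
    add(c2, e3, c2);
    for (int i = 0; i < N; i++)
        if ((m[i / 8] >> (i % 8)) & 1) c2[i] = (c2[i] + Q / 2) % Q;
}

/* Dec: decode(c1*r2 + c2); a coefficient decodes to 1 iff Q/4 < x < 3Q/4 */
void decrypt(const int32_t *c1, const int32_t *c2, const int32_t *r2, uint8_t *m)
{
    poly t;
    mul(c1, r2, t);
    add(t, c2, t);
    for (int i = 0; i < N / 8; i++) m[i] = 0;
    for (int i = 0; i < N; i++)
        if (4 * t[i] > Q && 4 * t[i] < 3 * Q) m[i / 8] |= (uint8_t)(1u << (i % 8));
}
```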
Ring-LWE Encryption: Parameters (sizes in bits)

| Parameter set | $n$ | $q$ | $\sigma$ | Ciphertext $(c_1, c_2)$ | sk | pk | Security |
|---|---|---|---|---|---|---|---|
| (256, 4093, 8.35) [LP11] | 256 | 4093 | ~4.5 | 6,144 | 1,792 | 6,144 | ~106 bits |
| (256, 7681, 11.32) [GFSBH12] | 256 | 7681 | ~4.8 | 6,656 | 1,792 | 6,656 | ~106 bits |
| (512, 12289, 12.18) [GFSBH12] | 512 | 12289 | ~4.9 | 14,336 | 3,584 | 14,336 | ~256 bits |

• Message and ciphertext:
  – Message space: $n$ bits
  – Expansion: $2 \cdot \log_2 q$
  – Two large polynomials ($c_1$, $c_2$)
• Public key: one or two large polynomials ($a$, $p$)
• Secret key: small polynomial ($r_2$)
Hardware Implementation Building Blocks for R-LWE
• Two main components
  – Polynomial multiplier for $n \in \{256, 512, 1024\}$ over specific rings with coefficients of $\log_2(q) < 24$ bits
  – Discrete Gaussian sampler with precisely defined precision $\lambda$
Hardware Implementation: Low-Cost Design for Xilinx Spartan-6
• Row-wise polynomial multiplication ($a e_1$ / $p e_1$)
  – Simple address generation
  – Sample a coefficient of $e_1$, add the corresponding row to $c_1$, then to $c_2$; add the coefficients of $e_2$ and $e_3$
• Key and ciphertext are stored in block memory
• DSP block for arithmetic (multiplier)
[Figure: datapath with multiplication (DSP) followed by modular reduction (power-of-two modulus possible)]
Hardware Implementation: Low Area
Post-place-and-route performance on a Spartan-6 LX9 FPGA.
• Area savings by a power-of-two modulus: using $q = 4096$ leads to an area improvement and a higher clock frequency
• Performance is still very good
• Area consumption is low, especially for decryption
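A sketch of the row-wise multiplication with the power-of-two modulus $q = 4096$. Every product is reduced by masking the low 12 bits, and the negacyclic wrap-around becomes an unsigned subtraction. The sketch assumes 16-bit coefficients, passes a negative $e$ coefficient as its residue (for example, −1 as 4095), and uses a hypothetical function name. Note that $q = 4096$ is not prime, so this choice rules out the NTT.

```c
#include <assert.h>
#include <stdint.h>

#define Q_BITS 12
#define Q_MASK ((1u << Q_BITS) - 1u)   /* q = 4096: reduction is a bit mask */

/* c = a * e in Z_4096[x]/(x^n + 1), computed row by row: for every
   coefficient e[j] (as if freshly sampled), add e[j] * x^j * a into c. */
void rowwise_mul_pow2(const uint16_t *a, const uint16_t *e, uint16_t *c, int n)
{
    assert(c != a && c != e);
    for (int k = 0; k < n; k++) c[k] = 0;
    for (int j = 0; j < n; j++) {              /* one coefficient of e */
        for (int i = 0; i < n; i++) {          /* one row: a shifted by j */
            uint32_t t = (uint32_t)a[i] * e[j];
            int k = i + j;
            if (k < n)
                c[k] = (uint16_t)((c[k] + t) & Q_MASK);
            else                               /* x^n = -1: subtract */
                c[k - n] = (uint16_t)((c[k - n] - t) & Q_MASK);
        }
    }
}
```

For $a = 1 + 2x + 3x^2 + 4x^3$ and $e = 1 - x^3$ this yields $3 + 5x + 7x^2 + 3x^3$. Because $4096$ divides $2^{32}$, masking after unsigned wrap-around gives the correct residue.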
Ring-LWE: Can we do better?
• Schoolbook polynomial multiplication is simple and independent of parameters
• Performance is reasonable but can still be improved
• Remember: with schoolbook multiplication we need $n^2$ multiplications modulo $q$ for one polynomial multiplication
  – $128^2 = 16{,}384$
  – $256^2 = 65{,}536$
  – $512^2 = 262{,}144$
  – $1024^2 = 1{,}048{,}576$
Can we do better?
Optimization: Polynomial Multiplication based on NTT
• Include algorithmic tweaks for fast polynomial multiplication
• The Number Theoretic Transform (NTT) is a discrete Fourier transform (DFT) defined over a finite field or ring. For a given primitive $n$-th root of unity $\omega$, the NTT is defined as:
  – Forward transformation NTT: $A[i] = \sum_{j=0}^{n-1} a[j]\,\omega^{ij}$, $i = 0, 1, \ldots, n-1$
  – Inverse transformation INTT: $a[i] = n^{-1} \sum_{j=0}^{n-1} A[j]\,\omega^{-ij}$, $i = 0, 1, \ldots, n-1$
• The NTT for the negative wrapped convolution exists if $q$ is a prime, $n$ a power of two and $q \equiv 1 \bmod 2n$
• Example: Ring-LWE encryption: $7681 \bmod (2 \cdot 256) = 1$
NTT for Lattice Cryptography: Convolution Theorem
• With the convolution theorem we can multiply two vectors/polynomials with the help of the NTT
  – $c = \mathrm{INTT}(\mathrm{NTT}(a) \circ \mathrm{NTT}(b))$, where $\circ$ denotes coefficient-wise multiplication
  – Efficient algorithms are known for the conversion in both directions (NTT and INTT)
• Negative wrapped convolution:
  – Polynomial multiplication in $\mathbb{Z}_q[x]/\langle x^n + 1 \rangle$
  – Runtime $O(n \log n)$
  – No appending of zeros required (as for regular convolution)
  – Implicit polynomial reduction by $x^n + 1$
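The negative wrapped convolution can be checked with the $O(n^2)$ definition. This sketch assumes toy parameters $n = 8$, $q = 17$ (so $17 \equiv 1 \bmod 16$) and $\psi = 3$, a primitive 16th root of unity mod 17, with $\omega = \psi^2$. Weighting $a[j]$ by $\psi^j$ turns the transform into an evaluation at the roots $\psi^{2i+1}$ of $x^n + 1$, which is what makes the reduction implicit. Names are illustrative, and a real implementation would use a fast $O(n \log n)$ butterfly network.

```c
#include <assert.h>
#include <stdint.h>

#define N 8
#define Q 17u
#define PSI 3u    /* primitive 2n-th root of unity mod 17: 3^8 = 16 = -1 */

static uint32_t mulmod(uint32_t a, uint32_t b) { return (uint32_t)((uint64_t)a * b % Q); }

static uint32_t powmod(uint32_t b, uint32_t e)
{
    uint32_t r = 1;
    for (; e; e >>= 1, b = mulmod(b, b))
        if (e & 1) r = mulmod(r, b);
    return r;
}

/* A[i] = sum_j a[j] * psi^j * omega^(ij) with omega = psi^2, i.e. a evaluated
   at psi^(2i+1), the n roots of x^n + 1. O(n^2), straight from the definition. */
static void ntt_naive(const uint32_t *a, uint32_t *A)
{
    for (uint32_t i = 0; i < N; i++) {
        A[i] = 0;
        for (uint32_t j = 0; j < N; j++)
            A[i] = (A[i] + mulmod(a[j], powmod(PSI, (2 * i + 1) * j))) % Q;
    }
}

/* a[j] = n^-1 * psi^-j * sum_i A[i] * omega^(-ij). */
static void intt_naive(const uint32_t *A, uint32_t *a)
{
    uint32_t psi_inv = powmod(PSI, Q - 2), n_inv = powmod(N, Q - 2);
    for (uint32_t j = 0; j < N; j++) {
        uint32_t s = 0;
        for (uint32_t i = 0; i < N; i++)
            s = (s + mulmod(A[i], powmod(psi_inv, (2 * i + 1) * j))) % Q;
        a[j] = mulmod(s, n_inv);
    }
}

/* Negative wrapped convolution: c = INTT(NTT(a) o NTT(b)). */
void ntt_polymul(const uint32_t *a, const uint32_t *b, uint32_t *c)
{
    uint32_t A[N], B[N];
    ntt_naive(a, A);
    ntt_naive(b, B);
    for (int i = 0; i < N; i++) A[i] = mulmod(A[i], B[i]);
    intt_naive(A, c);
}

/* Schoolbook reference in Z_q[x]/(x^n + 1) for comparison. */
void schoolbook(const uint32_t *a, const uint32_t *b, uint32_t *c)
{
    for (int k = 0; k < N; k++) c[k] = 0;
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++) {
            uint32_t t = mulmod(a[i], b[j]);
            int k = (i + j) % N;
            c[k] = (i + j < N) ? (c[k] + t) % Q : (c[k] + Q - t) % Q;
        }
}
```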
Efficient Computation of the NTT (Cooley-Tukey)
[Figure: butterfly network with twiddle factors; multiplication by $\omega^0 = 1$ is trivial]
• Bit-reversal required ($\mathrm{NTT}_{bo \to no}$)
• Precomputation of powers of $\omega$ possible
• Arithmetic is basically multiplication and reduction modulo $q$ ($\frac{n}{2} \log_2 n$ times)
• Further optimizations still possible
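Bit-reversal moves element $i$ to the position whose $\log_2 n$-bit index is $i$ read backwards. This costs an extra pass over the data before each transform. A minimal sketch of that permutation, using hypothetical function names:

```c
#include <assert.h>
#include <stdint.h>

/* Reverse the lowest `bits` bits of i (bits = 3: 1 = 001 -> 100 = 4). */
static uint32_t bitrev(uint32_t i, int bits)
{
    uint32_t r = 0;
    for (int b = 0; b < bits; b++, i >>= 1) r = (r << 1) | (i & 1);
    return r;
}

/* In-place bit-reversal permutation of an array of length 2^bits. */
void bitrev_permute(uint32_t *a, int bits)
{
    assert(bits >= 0 && bits < 32);
    uint32_t n = 1u << bits;
    for (uint32_t i = 0; i < n; i++) {
        uint32_t j = bitrev(i, bits);
        if (i < j) { uint32_t t = a[i]; a[i] = a[j]; a[j] = t; }   /* swap each pair once */
    }
}
```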
Ring-LWE Encryption on FPGA
• The NTT-based design is very fast but still quite small
• Lots of improvement since [GFS+12]
Lessons Learned
• Efficient McEliece implementations with practical key sizes
  – QC-MDPC codes are an efficient alternative to binary Goppa codes
  – Note: consider attacks on the decryption failure rate (ASIACRYPT 2016)
  – Low-cost FPGA implementation practical for a key agreement scheme (in preparation)
• R-LWE encryption schemes are extremely efficient
  – R-LWE (and variants) also allow signatures and advanced schemes
  – FPGA implementations are more efficient than RSA and on par with ECC
• Papers and source code available at http://www.seceng.rub.de/research/projects/pqc/
• For more papers and code, see the project websites of ICT-644729
Part II: Hardware Architectures for Post Quantum Cryptography
Thank you! Questions?
Part III: Post Quantum Cryptography in Embedded Software
including slides by Ingo von Maurich and Thomas Pöppelmann
Tutorial Outline – Part III
• Code-based Cryptography
• Efficient Code-based Implementations
• Lattice-based Cryptography
• Efficient Lattice-based Implementations
• Lessons Learned
32-bit ARM Microcontroller
ARM-based 32-bit microcontroller
• STM32F407 @ 168 MHz
• 32-bit ARM Cortex-M4
• 1 Mbyte flash, 192 kbyte SRAM
• Crypto functions: TRNG, 3DES, AES, SHA-1/-256, HMAC co-processor
• Costs: roughly US$ 10
AVR-based 8-bit microcontroller
• ATxmega128A1 @ 32 MHz
• 8-bit AVR XMEGA family
• 256 Kbyte flash, 8 Kbyte SRAM
• Crypto functions: DES, AES
• Costs: roughly US$ 10
Implementing Key Generation
• Memory is a scarce resource on microcontrollers
• Generate and store random sparse vectors of length 4801 with 45 bits set → store set-bit locations only
Generating the secret key $H = [H_0 \,|\, H_1]$
• Generate the first row of $H_1$, repeat if not invertible
• Generate the first row of $H_0$
• Convert to sparse representation → 90 counters
Computing the public key $G = [I \,|\, Q]$
• Compute $Q$ from the first rows of $H_1^{-1}$ and $H_0$
Implementing (Plain) Encryption
• Recall the operation principle as for low-cost hardware
  – All processes are based on 32-bit operations
  – Set bits in message $m$ select rows of the public key $G$
  – Parse $m$ bit by bit; XOR the current row of $G$ if the bit is set
• Error addition for encryption
  – Use the TRNG to provide random bits to add $t$ errors
  – Obtain individual error indices by rejection sampling from $\lceil \log_2 n \rceil = 14$-bit values
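Both the sparse secret-key rows (length 4801, 45 set bits) and the error positions (length 9602, 84 set bits) can be produced by rejection sampling. In this sketch a xorshift generator stands in for the TRNG and is not suitable for real keys. Duplicates are rejected by a linear search, and the function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* xorshift32 placeholder for the TRNG; NOT suitable for real keys. */
static uint32_t trng_state = 0x12345678u;
static uint32_t trng32(void)
{
    trng_state ^= trng_state << 13;
    trng_state ^= trng_state >> 17;
    trng_state ^= trng_state << 5;
    return trng_state;
}

/* Draw `weight` distinct indices in [0, len) by rejection sampling from
   `bits`-bit random values (bits = ceil(log2 len), e.g. 14 for n = 9602).
   Only the set-bit positions are stored (sparse representation). */
void sample_sparse(uint16_t *idx, int weight, uint32_t len, int bits)
{
    assert(len <= (1u << bits) && len <= 65536u && (uint32_t)weight <= len);
    int count = 0;
    while (count < weight) {
        uint32_t r = trng32() & ((1u << bits) - 1u);
        if (r >= len) continue;                        /* out of range: reject */
        int dup = 0;
        for (int k = 0; k < count; k++)
            if (idx[k] == r) { dup = 1; break; }
        if (!dup) idx[count++] = (uint16_t)r;          /* new position: keep */
    }
}
```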
Implementing (Plain) Decryption
Recall the syndrome computation; parity-check matrix in sparse representation
• Parse the ciphertext bit by bit
• XOR the row of the secret key if the corresponding ciphertext bit is set
Decoding iteration
• Count the bits that are set in both the syndrome and the current row of the parity-check matrix blocks → use 90 counters
• Compare the count to the decoding threshold
• Invert the current ciphertext bit if the count is above the threshold
• Add the current row to the syndrome
• Generate the next row → increment counters (check for overflows)
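The sparse decryption loop can be sketched for a single circulant block. The $w$ set-bit positions of the current row act as counters, and incrementing them with wrap-around moves to the next row. The sketch assumes one bit per byte and a single block instead of $n_0 = 2$, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Advance to the next row of a circulant block with r columns: every
   set-bit position moves by one, wrapping from r-1 to 0 (overflow check). */
void next_row(uint16_t *pos, int w, uint16_t r)
{
    for (int k = 0; k < w; k++)
        pos[k] = (uint16_t)(pos[k] + 1 == r ? 0 : pos[k] + 1);
}

/* Syndrome: for every set ciphertext bit, XOR the current sparse row into s.
   After r steps pos has cycled back to the first row. */
void sparse_syndrome(const uint8_t *x, uint8_t *s, uint16_t *pos, int w, uint16_t r)
{
    assert(r > 0 && w > 0);
    memset(s, 0, r);
    for (uint16_t i = 0; i < r; i++) {
        if (x[i])
            for (int k = 0; k < w; k++) s[pos[k]] ^= 1;
        next_row(pos, w, r);
    }
}

/* #upc of the current row: how many of its set bits hit a set syndrome bit. */
int sparse_upc(const uint8_t *s, const uint16_t *pos, int w)
{
    int upc = 0;
    for (int k = 0; k < w; k++) upc += s[pos[k]];
    return upc;
}
```

Only $w$ 16-bit counters per block are kept in RAM (45 per block, 90 in total for the MTSB13 parameters), instead of the full 4801-bit rows.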
Implementation Results

| Scheme | Platform | Cycles/Op | Time |
|---|---|---|---|
| McE MDPC (keygen) | STM32F407 | 148,576,008 | 884 ms |
| McE MDPC (enc) | STM32F407 | 16,771,239 | 100 ms |
| McE MDPC (dec) | STM32F407 | 37,171,833 | 221 ms |
| McE MDPC (enc) | ATxmega256 | 26,767,463 | 836 ms |
| McE MDPC (dec) | ATxmega256 | 86,874,388 | 2.71 s |

• The 8-bit AVR platform is too slow for real-world deployment
  – Key generation is excessive; decryption takes roughly 3 seconds
• 32-bit ARM is a suitable platform and provides a built-in TRNG
• Improved QcBits software for Cortex-M4 by Chou (CHES 2016)
Further Implementation Remarks and Requirements
• CCA2 security for McEliece encryption:
  – Additional conversion (e.g., via Fujisaki-Okamoto), which requires a hash function and re-encryption
• Side-channel attacks:
  – Masking schemes (SCA) for McEliece by Eisenbarth et al. [SAC15]; they do not include CCA2 security
• Decryption failure rate attacks:
  – Guo et al. [ASIACRYPT16] identify a correlation between decoding failures in iterative (bit-flipping) decoders and the secret key
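Simple Implementation of RLWE-Encryption

    #include <stdint.h>

    enum { n = 256, q = 7681 };   /* example parameters; adjust per parameter set */
    typedef int32_t poly[n];

    /* Helpers assumed to be defined elsewhere (not shown on the slide) */
    void gauss_poly(poly p);                      /* fill p with discrete Gaussian samples */
    void poly_init(poly p, int32_t val, int len); /* set len coefficients to val */
    int32_t modq(int32_t x);                      /* reduce x into [0, q), also for x < 0 */

    void encrypt(poly a, poly p, unsigned char *plaintext, poly c1, poly c2)
    {
        int i, j;
        poly e1, e2, e3;
        gauss_poly(e1);
        gauss_poly(e2);
        gauss_poly(e3);
        poly_init(c1, 0, n);   // init with 0
        poly_init(c2, 0, n);   // init with 0
        /* This has to be fast: n^2 multiply-accumulate steps per product */
        for (i = 0; i < n; i++) {              // multiplication loops
            for (j = 0; j < n; j++) {
                /* negacyclic wrap-around: x^n = -1 */
                c1[(i + j) % n] = modq(c1[(i + j) % n] + (a[i] * e1[j] * (i + j >= n ? -1 : 1)));
                c2[(i + j) % n] = modq(c2[(i + j) % n] + (p[i] * e1[j] * (i + j >= n ? -1 : 1)));
            }
            c1[i] = modq(c1[i] + e2[i]);
            /* add e3 and encode message bit i as q/2 */
            c2[i] = (plaintext[i >> 3] & (1 << (i % 8)))
                        ? modq(c2[i] + e3[i] + q / 2)
                        : modq(c2[i] + e3[i]);
        }
    }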
Software Implementation: Main Functions for R-LWE
• Two main components
  – Polynomial multiplier for $n \in \{256, 512, 1024\}$ over specific rings with coefficients of $\log_2(q) < 24$ bits
  – Discrete Gaussian sampler with precisely defined precision $\lambda$ and tail cut $\tau$
Intermediate Results
• Implementation of RLWE encryption on the 8-bit AVR ATxmega processor running at 32 MHz
• Schoolbook multiplication (SchoolMul)
• Encryption requires two polynomial multiplications, decryption one
Recall Improvement: Polynomial Multiplication with NTT
• The Number Theoretic Transform (NTT) is a discrete Fourier transform (DFT) defined over a finite field or ring. For a given primitive $n$-th root of unity $\omega$, the NTT is defined as:
  – Forward transformation NTT: $A[i] = \sum_{j=0}^{n-1} a[j]\,\omega^{ij}$, $i = 0, 1, \ldots, n-1$
  – Inverse transformation INTT: $a[i] = n^{-1} \sum_{j=0}^{n-1} A[j]\,\omega^{-ij}$, $i = 0, 1, \ldots, n-1$
• The NTT for the negative wrapped convolution exists if $q$ is a prime, $n$ a power of two and $q \equiv 1 \bmod 2n$
Efficient Computation of the NTT (Textbook)
[Figure: butterfly network with twiddle factors; multiplication by $\omega^0 = 1$ is trivial]
• Bit-reversal required ($\mathrm{NTT}_{bo \to no}$)
• Precomputation of powers of $\omega$ possible
• Arithmetic is basically multiplication and reduction modulo $q$ ($\frac{n}{2} \log_2 n$ times)
09.10.2012
Optimization of NTT Computation
Removal of expensive "helper" functions
• Problem: the permutation (bit-reversal) of the polynomial is expensive
  – The "standard" $\mathrm{NTT}_{bo \to no}$ requires bit-reversed input and produces naturally ordered output
  – Bit-reversal before each forward or inverse NTT
• Solution: the NTT algorithm can be written as
  – Natural to bit-reversed for the forward transform: $\mathrm{NTT}_{no \to bo}$
  – Bit-reversed to natural for the inverse transform: $\mathrm{INTT}_{bo \to no}$
  – No bit-reversal necessary anymore: $c = \mathrm{INTT}_{bo \to no}(\mathrm{NTT}_{no \to bo}(a) \circ \mathrm{NTT}_{no \to bo}(b))$
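A sketch of the two reordered transforms. It follows the widely used formulation in which the powers of $\psi$ (a primitive $2n$-th root of unity) are merged into the butterflies. Here they are recomputed with `powmod` rather than read from a precomputed table of $\psi^{\mathrm{bitrev}(k)}$. The toy parameters $n = 8$, $q = 17$, $\psi = 3$ are assumptions, and the names are illustrative. Both operands stay in bit-reversed order between the transforms. That is harmless, because the coefficient-wise product does not depend on the order.

```c
#include <assert.h>
#include <stdint.h>

#define N 8
#define LOGN 3
#define Q 17u
#define PSI 3u    /* primitive 2n-th root of unity mod 17 (3^8 = -1) */

static uint32_t mulmod(uint32_t a, uint32_t b) { return (uint32_t)((uint64_t)a * b % Q); }

static uint32_t powmod(uint32_t b, uint32_t e)
{
    uint32_t r = 1;
    for (; e; e >>= 1, b = mulmod(b, b))
        if (e & 1) r = mulmod(r, b);
    return r;
}

static uint32_t bitrev(uint32_t i)        /* reverse the low LOGN bits */
{
    uint32_t r = 0;
    for (int k = 0; k < LOGN; k++, i >>= 1) r = (r << 1) | (i & 1);
    return r;
}

/* NTT_no->bo: Cooley-Tukey butterflies with psi^bitrev(k) twiddles;
   natural-order input, bit-reversed output, no permutation pass. */
void ntt_no_bo(uint32_t *a)
{
    for (uint32_t m = 1, t = N; m < N; m <<= 1) {
        t >>= 1;
        for (uint32_t i = 0; i < m; i++) {
            uint32_t S = powmod(PSI, bitrev(m + i)), j1 = 2 * i * t;
            for (uint32_t j = j1; j < j1 + t; j++) {
                uint32_t U = a[j], V = mulmod(a[j + t], S);
                a[j] = (U + V) % Q;
                a[j + t] = (U + Q - V) % Q;
            }
        }
    }
}

/* INTT_bo->no: Gentleman-Sande butterflies with psi^-bitrev(k) twiddles;
   bit-reversed input, natural-order output, final scaling by n^-1. */
void intt_bo_no(uint32_t *a)
{
    uint32_t psi_inv = powmod(PSI, Q - 2), n_inv = powmod(N, Q - 2);
    for (uint32_t m = N, t = 1; m > 1; m >>= 1, t <<= 1) {
        uint32_t h = m >> 1, j1 = 0;
        for (uint32_t i = 0; i < h; i++, j1 += 2 * t) {
            uint32_t S = powmod(psi_inv, bitrev(h + i));
            for (uint32_t j = j1; j < j1 + t; j++) {
                uint32_t U = a[j], V = a[j + t];
                a[j] = (U + V) % Q;
                a[j + t] = mulmod((U + Q - V) % Q, S);
            }
        }
    }
    for (int j = 0; j < N; j++) a[j] = mulmod(a[j], n_inv);
}

/* c = a * b mod (x^n + 1, q) without any bit-reversal pass. */
void ntt_polymul(const uint32_t *a, const uint32_t *b, uint32_t *c)
{
    uint32_t B[N];
    for (int i = 0; i < N; i++) { c[i] = a[i]; B[i] = b[i]; }
    ntt_no_bo(c);
    ntt_no_bo(B);
    for (int i = 0; i < N; i++) c[i] = mulmod(c[i], B[i]);
    intt_bo_no(c);
}
```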