An Implementation of SPELT(31, 4, 96, 96, (32, 16, 8))
Tung Chou
January 5, 2012

QUAD
◮ Stream cipher. Security relies on MQ (Multivariate Quadratics).
◮ With multivariate quadratic systems $P$, $Q$, generate the key stream $y_0, y_1, y_2, \ldots$:
  $x_0 \to x_1 = Q(x_0) \to x_2 = Q(x_1) \to x_3 = Q(x_2) \to \cdots$
  $y_0 = P(x_0),\ y_1 = P(x_1),\ y_2 = P(x_2),\ y_3 = P(x_3),\ \ldots$
◮ Simply speaking, QUAD is polynomial evaluation.
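The iteration above can be sketched in a few lines of Python. This is only a toy illustration: the system size (8 variables, 8 equations), the random dense quadratic systems, and the names `random_quadratic_system`, `evaluate`, and `keystream` are assumptions made for the example, not the parameters or code of the presented implementation.

```python
import random

FIELD_SIZE = 31  # small prime field, chosen to match the SPELT example

def random_quadratic_system(n_vars, n_eqs, rng):
    """Return n_eqs random quadratic polynomials as (quad, lin, const) tuples."""
    system = []
    for _ in range(n_eqs):
        quad = {(i, j): rng.randrange(FIELD_SIZE)
                for i in range(n_vars) for j in range(i, n_vars)}
        lin = [rng.randrange(FIELD_SIZE) for _ in range(n_vars)]
        const = rng.randrange(FIELD_SIZE)
        system.append((quad, lin, const))
    return system

def evaluate(system, x):
    """Evaluate every polynomial of the system at the point x."""
    out = []
    for quad, lin, const in system:
        acc = const
        for (i, j), c in quad.items():
            acc += c * x[i] * x[j]
        for i, c in enumerate(lin):
            acc += c * x[i]
        out.append(acc % FIELD_SIZE)
    return out

def keystream(P, Q, x0, n_blocks):
    """Emit y_i = P(x_i) and update the state with x_{i+1} = Q(x_i)."""
    x = list(x0)
    blocks = []
    for _ in range(n_blocks):
        blocks.append(evaluate(P, x))
        x = evaluate(Q, x)
    return blocks

rng = random.Random(2012)  # fixed seed so the toy run is reproducible
N = 8                      # toy size, far smaller than a real instance
P = random_quadratic_system(N, N, rng)
Q = random_quadratic_system(N, N, rng)
x0 = [rng.randrange(FIELD_SIZE) for _ in range(N)]
ys = keystream(P, Q, x0, 4)
```

Every step is two polynomial-system evaluations at the same point, which is why the evaluation cost dominates.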

SPELT
◮ Security relies on SMP; i.e., $P$, $Q$ are sparse.
◮ Usually of higher degree than QUAD.
◮ Example: SPELT(31, 4, 96, 96, (32, 16, 8)):
  ◮ Field: $\mathbb{F}_{31}$, Degree: 4, #Variables: 96, #Equations: 96 (for each of $P$, $Q$)
  ◮ Each equation has only 32 degree-2 terms, 16 degree-3 terms, 8 degree-4 terms
◮ More efficient than QUAD in practice.
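To make the parameter set concrete, here is a minimal Python sketch of one sparse equation with the (32, 16, 8) shape. Several things are assumptions for illustration only: a dense linear part (one term per variable), no constant term, randomly chosen variables that may repeat within a monomial, and the helper names `random_sparse_equation` and `evaluate_equation`.

```python
import random

FIELD_SIZE = 31
N_VARS = 96
TERMS_PER_DEGREE = {2: 32, 3: 16, 4: 8}  # the (32, 16, 8) part of the parameter set

def random_sparse_equation(rng):
    """Build one equation as a list of (coefficient, variable-index tuple) terms."""
    # Assumption: every variable has a linear term with a nonzero coefficient.
    terms = [(rng.randrange(1, FIELD_SIZE), (i,)) for i in range(N_VARS)]
    for degree, count in TERMS_PER_DEGREE.items():
        for _ in range(count):
            variables = tuple(rng.randrange(N_VARS) for _ in range(degree))
            terms.append((rng.randrange(1, FIELD_SIZE), variables))
    return terms

def evaluate_equation(terms, x):
    """Evaluate the sparse equation at x, term by term, over F_31."""
    acc = 0
    for coeff, variables in terms:
        prod = coeff
        for v in variables:
            prod = (prod * x[v]) % FIELD_SIZE
        acc = (acc + prod) % FIELD_SIZE
    return acc

rng = random.Random(31)
equation = random_sparse_equation(rng)
x = [rng.randrange(FIELD_SIZE) for _ in range(N_VARS)]
value = evaluate_equation(equation, x)
```

With this representation an equation has 96 + 32 + 16 + 8 = 152 terms, far fewer than a dense degree-4 polynomial in 96 variables.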

Implementation Platform: GTX480
◮ 15 × 32 = 480 SPs (cores) running at 1.4 GHz (32 SPs in each MP).
◮ Each MP has 16 KB L1 cache and 48 KB shared memory (the sizes can be switched).
◮ Each MP has 32K registers.
◮ 32 memory banks in shared memory.
◮ The maximal number of registers assigned to each thread is 64.

Implementation Details
◮ Threads in a warp deal with the same equation(s) but different sets of $x_i$'s. In other words, each block generates 32 key streams at the same time.
◮ Information about each term is encoded in the instructions.
◮ Values of $x_i$ are stored in shared memory.
  ◮ We need 96 × 32 bytes. This is augmented into 100 × 32 to avoid bank conflicts.
  ◮ There are two buffers in shared memory, serving as source and destination.
  ◮ The results of the last 96 equations ($Q$) are written to global memory.
◮ DIMGRID=30, DIMBLOCK=512. This means each warp has to deal with 192/16=12 equations.
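The effect of padding 96 to 100 can be checked with a small bank-index simulation. The layout used here is an assumption, one plausible reading of "96 × 32 bytes": each of the 32 key streams owns a contiguous row of one byte per variable, so the row stride is 96 bytes unpadded and 100 bytes padded. The 4-byte bank width is also an assumption about Fermi-class shared memory, and `max_conflict_degree` is a hypothetical helper.

```python
N_BANKS = 32     # shared-memory banks per MP
BANK_WIDTH = 4   # bytes per bank word (assumed)
WARP_SIZE = 32

def max_conflict_degree(row_stride_bytes, var_index):
    """Largest number of distinct words that a warp requests from a single bank
    when thread t reads byte var_index of its own row."""
    words_per_bank = {}
    for t in range(WARP_SIZE):
        address = t * row_stride_bytes + var_index
        word = address // BANK_WIDTH
        words_per_bank.setdefault(word % N_BANKS, set()).add(word)
    return max(len(words) for words in words_per_bank.values())

# Worst case over all 96 variable positions, for both row strides.
unpadded = max(max_conflict_degree(96, v) for v in range(96))
padded = max(max_conflict_degree(100, v) for v in range(96))
print(unpadded, padded)
```

Under these assumptions a 96-byte stride is 24 words, which shares a factor of 8 with the 32 banks, so the warp lands on only 4 banks. A 100-byte stride is 25 words, coprime to 32, so all 32 threads hit different banks.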

Experiment Results
◮ Each block uses ≤ 64 KB of shared memory. Each thread uses ≤ 32 registers. Therefore each MP should be able to run two blocks (32 warps) simultaneously.
◮ Performance: 1.38 Gbps.
  ◮ Good news: Better than the previous result: 0.91 Gbps.
  ◮ Bad news: Peak performance should be 6.99 Gbps if we consider multiplications only.
◮ Mysterious behaviors of nvcc make it hard to find the bottleneck.
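The "two blocks per MP" claim can be reproduced from the register budget alone. The sketch below uses only figures quoted on the slides (32K registers per MP, 15 MPs, DIMGRID=30, DIMBLOCK=512, at most 32 registers per thread, 96 + 96 equations). It deliberately ignores shared memory and any other hardware limit, so it is a partial check rather than a full occupancy calculation. The variable names are made up for the example.

```python
REGISTERS_PER_MP = 32 * 1024  # "32K registers" per MP
N_MPS = 15
WARP_SIZE = 32
DIMGRID, DIMBLOCK = 30, 512
REGS_PER_THREAD = 32          # upper bound quoted for each thread
N_EQUATIONS = 96 + 96         # equations of P plus equations of Q

regs_per_block = DIMBLOCK * REGS_PER_THREAD                # registers one block needs
blocks_per_mp_by_regs = REGISTERS_PER_MP // regs_per_block # register-limited blocks per MP
warps_per_block = DIMBLOCK // WARP_SIZE
resident_warps = blocks_per_mp_by_regs * warps_per_block
equations_per_warp = N_EQUATIONS // warps_per_block
blocks_launched_per_mp = DIMGRID // N_MPS

print(regs_per_block, blocks_per_mp_by_regs, resident_warps,
      equations_per_warp, blocks_launched_per_mp)
```

A block of 512 threads at 32 registers each takes exactly half of an MP's register file, which matches launching 30 blocks on 15 MPs.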

Tweaks to Accelerate the Evaluations
◮ Total number of mults: $96 + 32 \cdot 2 + 16 \cdot 3 + 8 \cdot 4 = 240$.
◮ Classifying terms by $x_i$'s.
  $7 x_0 x_1 x_4 + 29 x_1 \longrightarrow x_1 \cdot (7 x_0 x_4 + 29)$
  Saving at least $32 + 16 + 8 = 56$ mults.
◮ Classifying terms by coefficients.
  $14 x_0 x_1 + 14 x_3 x_9 \longrightarrow 14 \cdot (x_0 x_1 + x_3 x_9)$
  Saving at least $(32 + 16 + 8) + (96 - 30) = 122$ mults.
◮ A mixed approach
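The counts above can be re-derived with a short Python check. The interpretation is an assumption: a degree-d term is taken to cost d mults (coefficient times d variables), grouping by variable is taken to save one mult per higher-degree term, and grouping by coefficient is taken to replace the 152 per-term coefficient mults with at most 30 (one per nonzero element of F_31). The two helper functions only confirm that the rewritten examples are equal modulo 31.

```python
FIELD_SIZE = 31
LINEAR_TERMS = 96
HIGHER_TERMS = {2: 32, 3: 16, 4: 8}  # degree -> number of terms per equation

# Naive cost: a degree-d term needs d mults (coefficient times d variables).
naive_mults = LINEAR_TERMS * 1 + sum(d * n for d, n in HIGHER_TERMS.items())

# Grouping by variable: each higher-degree term merges with a linear term.
n_higher = sum(HIGHER_TERMS.values())
saving_by_variable = n_higher

# Grouping by coefficient: at most FIELD_SIZE - 1 distinct nonzero coefficients.
n_terms = LINEAR_TERMS + n_higher
saving_by_coefficient = n_terms - (FIELD_SIZE - 1)

def by_variable(x0, x1, x4):
    """Both sides of 7*x0*x1*x4 + 29*x1 -> x1*(7*x0*x4 + 29), modulo 31."""
    lhs = (7 * x0 * x1 * x4 + 29 * x1) % FIELD_SIZE
    rhs = (x1 * (7 * x0 * x4 + 29)) % FIELD_SIZE
    return lhs, rhs

def by_coefficient(x0, x1, x3, x9):
    """Both sides of 14*x0*x1 + 14*x3*x9 -> 14*(x0*x1 + x3*x9), modulo 31."""
    lhs = (14 * x0 * x1 + 14 * x3 * x9) % FIELD_SIZE
    rhs = (14 * (x0 * x1 + x3 * x9)) % FIELD_SIZE
    return lhs, rhs

print(naive_mults, saving_by_variable, saving_by_coefficient)
```

In the first example the factored form needs 3 mults instead of 4, and the same holds for the second.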

Future Work
◮ asfermi: An assembler for the NVIDIA Fermi Instruction Set
  http://code.google.com/p/asfermi/
◮ AMD GPUs
