slide-1
SLIDE 1

Parallel cube testing on GPUs

Sudarshan Rao June 10, 2010

1 / 50

slide-2
SLIDE 2

Outline

1 Introduction and Background
    Background
    Cube Testing
    CUDA
    Primitives
2 Framework
3 Experiments
    Description of experiments
    Results
    Timing
4 Conclusions
5 Future Work

2 / 50

slide-4
SLIDE 4

Cryptographic primitives

Algorithms used to construct security systems
Crypto primitives are used everywhere
Security is essential
Examples: hash functions, block ciphers, stream ciphers, etc.

4 / 50

slide-5
SLIDE 5

Hash functions

Convert a variable-length message to a fixed-length message digest
Used in digital signatures, message authentication codes, etc.
Necessary security properties: preimage resistance, collision resistance, second preimage resistance
Brute force attacks: birthday paradox
e.g., MD5, SHA family, etc.

5 / 50

slide-6
SLIDE 6

Block cipher

Encrypts fixed-size blocks of data
Used to encrypt fixed-size data blocks, in the construction of stream ciphers, etc.
Components of a block cipher: plaintext, key, ciphertext
e.g., DES, AES, Twofish, etc.

6 / 50

slide-7
SLIDE 7

Cube attack

Cube attack: introduced by Itai Dinur and Adi Shamir
Successful against primitives that can be described by low-degree polynomials
Treats the primitive under attack as a black box
Attacks on Trivium reported

7 / 50

slide-8
SLIDE 8

Terminology

In GF(2):
  $X + Y = X \oplus Y$ (XOR)
  $X \cdot Y = X \wedge Y$ (AND)
$p(x_1, x_2, \ldots, x_n)$: polynomial
$p(x_1, x_2, \ldots, x_n) = t_I \cdot p_{S(I)} + q(x_1, x_2, \ldots, x_n)$
$I \subseteq \{1, 2, \ldots, n\}$: index set
$p_{S(I)}$: superpoly
$q$: remainder
$t_I = x_i x_{i+1} \cdots x_j$, where $i, i+1, \ldots, j \in I$
$x_i, x_{i+1}, \ldots, x_j$ are known as the cube variables

8 / 50

slide-9
SLIDE 9

Evaluation of a superpoly

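$p = x_1 x_2 (x_3 + x_4) + x_1 x_3$
$x_1, x_2$ are the cube variables
Consider the sum of $p$ over all assignments of the cube variables:
$\sum_{x_1 x_2 = 00}^{11} p = [0 \cdot 0 (x_3 + x_4) + 0 \cdot x_3] + [0 \cdot 1 (x_3 + x_4) + 0 \cdot x_3] + [1 \cdot 0 (x_3 + x_4) + 1 \cdot x_3] + [1 \cdot 1 (x_3 + x_4) + 1 \cdot x_3]$
$= x_3 + (x_3 + x_4) + x_3 = x_3 + x_4$, since $x_3 + x_3 = 0$ in GF(2)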
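The sum on this slide can be checked mechanically. The following Python sketch XORs p over the four assignments of the cube variables and records the result for every value of x3 and x4. The function names are illustrative and do not come from the presentation.

```python
from itertools import product

def p(x1, x2, x3, x4):
    """Example polynomial over GF(2): addition is XOR, multiplication is AND."""
    return (x1 & x2 & (x3 ^ x4)) ^ (x1 & x3)

def cube_sum_x1x2(x3, x4):
    """XOR p over all four assignments of the cube variables x1, x2."""
    total = 0
    for x1, x2 in product((0, 1), repeat=2):
        total ^= p(x1, x2, x3, x4)
    return total

# Cube sum for each (x3, x4); it should agree with the superpoly x3 + x4.
results = {(x3, x4): cube_sum_x1x2(x3, x4)
           for x3, x4 in product((0, 1), repeat=2)}
```

The remainder term x1·x3 does not contain x2, so it appears twice in the sum and cancels, which is why only x3 + x4 is left.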

9 / 50

slide-10
SLIDE 10

Evaluation of the superpoly

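$p(x_1, x_2, \ldots, x_n) = t_I \cdot p_{S(I)} + q(x_1, x_2, \ldots, x_n)$
Every term of $q$ misses at least one $x_i$, $i \in I$
So each term of $q$ is added an even number of times and cancels in GF(2)
$p_{S(I)}$ is added only once (when all cube variables are 1)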

10 / 50

slide-11
SLIDE 11

Superpoly

Theorem

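$\sum_{I} \left( t_I \cdot p_{S(I)} + q(x_1, x_2, \ldots, x_n) \right) = p_{S(I)}$
where the sum runs over all 0/1 assignments of the cube variables indexed by $I$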

11 / 50

slide-12
SLIDE 12

Find the value of the superpoly

Choose a set of cube variables, say c1, c2, ..., cn
Choose a set of superpoly variables, say s1, s2, ..., sm
Choose a random assignment for s1, s2, ..., sm
Q = 0
for c1, c2, ..., cn = 000...00 to 111...11 do
    Q = Q ⊕ p(c1, c2, ..., cn, s1, s2, ..., sm)
end for
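The procedure above can be sketched in Python for an arbitrary black-box Boolean function. This is a minimal sketch, not the implementation used in the talk: `cube_sum` and `toy` are hypothetical names, and holding every input bit that is neither a cube nor a superpoly variable at 0 is an assumption the slides do not state.

```python
import random
from itertools import product

def cube_sum(f, num_vars, cube_idx, assignment):
    """XOR f over every 0/1 setting of the cube variables.

    f          -- black box taking a list of num_vars bits, returning 0 or 1
    cube_idx   -- positions of the cube variables
    assignment -- dict {position: bit} fixing the superpoly variables;
                  all other positions are held at 0 (an assumption)
    """
    q = 0
    for cube_bits in product((0, 1), repeat=len(cube_idx)):
        x = [0] * num_vars
        for pos, bit in assignment.items():
            x[pos] = bit
        for pos, bit in zip(cube_idx, cube_bits):
            x[pos] = bit
        q ^= f(x)
    return q

def toy(x):
    # Hypothetical stand-in for the primitive: x0*x1*(x2 + x3) + x0*x2
    return (x[0] & x[1] & (x[2] ^ x[3])) ^ (x[0] & x[2])

rng = random.Random(1)
assignment = {pos: rng.randint(0, 1) for pos in (2, 3)}  # superpoly variables
q = cube_sum(toy, 4, [0, 1], assignment)
```

The loop performs 2^n evaluations of the primitive for n cube variables, which is the cost the GPU framework is meant to absorb.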

12 / 50

slide-13
SLIDE 13

Cube Testing

For a secure primitive, Q should behave like a random polynomial
A variety of tests can be performed on Q
Cube testing:
  Test for balance of Q
  Test for linear variables in Q
  Test for neutral variables in Q
  Test for low degree of Q

13 / 50

slide-14
SLIDE 14

CUDA

NVIDIA's SDK for programming their GPUs
C for CUDA lets developers write C-like programs
Functions called kernels are executed on the GPU
Kernels are executed in parallel on the GPU

14 / 50

slide-15
SLIDE 15

CUDA contd...

Figure: CUDA program execution[3]

15 / 50

slide-16
SLIDE 16

CUDA concepts

Thread hierarchy

Thread blocks, grids

Memory hierarchy

Global memory, shared memory, registers

16 / 50

slide-17
SLIDE 17

AES

Block cipher standardized by NIST in 2001 (FIPS 197)
Block size of 128 bits; key sizes of 128, 192 or 256 bits
Not based on the popular Feistel network

Figure: AES Round function[1]

In our tests we use AES-128

17 / 50

slide-18
SLIDE 18

Threefish

Tweakable block cipher
Component of Skein, a NIST SHA-3 contest candidate
Block sizes of 256 bits, 512 bits and 1024 bits
Design principle: many simple rounds are more effective than a few complicated rounds
We use Threefish-256 in our tests

18 / 50

slide-19
SLIDE 19

Threefish Mix and Round functions

Figure: Threefish Mix and Round function[2]

19 / 50

slide-20
SLIDE 20

Keccak

Keccak: candidate hash algorithm in the SHA-3 contest
Based on the sponge construction
Uses a permutation as part of the construction
The Keccak-f[1600] permutation is studied

20 / 50

slide-21
SLIDE 21

Keccak permutation

Keccak-f[1600]: the state is a 3-dimensional array of bits
Round function: R = ι ∘ χ ∘ π ∘ ρ ∘ θ
χ is a non-linear mapping
θ, π, ρ: linear operations that mix and permute the bits of the state
ι: mixes in a round constant

21 / 50

slide-23
SLIDE 23

Design of the framework

CUDA and Java

CUDA: data collection
Java: statistical analysis

Majority of computation offloaded to GPU

23 / 50

slide-24
SLIDE 24

Data collection

Data collection is performed by the CUDA program
Choose a random subset of the plaintext bits as the cube variables, say c1, c2, ..., cn
Choose a random subset of the plaintext bits as the superpoly variables, say s1, s2, ..., sm
{Outer parallel loop: split among thread blocks}
for i = 1 to N do
    Choose a random assignment for s1, s2, ..., sm
    Qi = 0
    {Inner parallel loop: split among threads}
    for c1, c2, ..., cn = 000...00 to 111...11 do
        Qi = Qi ⊕ Fi(c1, c2, ..., cn, s1, s2, ..., sm)
    end for
end for
Write the values of Qi to an output file
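The two nested loops can be illustrated with a sequential Python sketch. Everything here is an assumption made for illustration: `toy_primitive` is a made-up 16-bit mixing function standing in for AES-128, Threefish-256 or Keccak-f[1600], the remaining plaintext bits are held at 0, and the loops run one after another rather than being split among thread blocks and threads as in the CUDA program.

```python
import random
from itertools import product

BLOCK_BITS = 16  # toy block size; the primitives in the talk are much wider

def toy_primitive(x):
    """Hypothetical 16-bit mixing function standing in for the cipher under test."""
    for _ in range(4):
        x = (x * 0x9E37 + 0x79B9) & 0xFFFF
        x ^= x >> 7
        x = ((x << 3) | (x >> 13)) & 0xFFFF
    return x

def collect(num_samples, num_cube, num_super, seed=0):
    """Return (cube positions, superpoly positions, [(i, Q_i), ...])."""
    rng = random.Random(seed)
    positions = rng.sample(range(BLOCK_BITS), num_cube + num_super)
    cube_pos, super_pos = positions[:num_cube], positions[num_cube:]
    rows = []
    for i in range(1, num_samples + 1):       # outer loop: thread blocks on the GPU
        base = 0
        for pos in super_pos:                 # random superpoly assignment
            base |= rng.randint(0, 1) << pos
        q = 0
        for bits in product((0, 1), repeat=num_cube):  # inner loop: threads on the GPU
            x = base
            for pos, bit in zip(cube_pos, bits):
                x |= bit << pos
            q ^= toy_primitive(x)             # XOR all output bits at once
        rows.append((i, q))
    return cube_pos, super_pos, rows

cube_pos, super_pos, rows = collect(num_samples=8, num_cube=4, num_super=6)
```

Each Q_i here is a whole output word, so every output bit yields its own superpoly value from the same set of evaluations.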

24 / 50

slide-25
SLIDE 25

Output file

786432274 203b3a06433a16480d4077af23830b01 43 102 86 81 10 17 51 72 107 41 45 12 71 31 95 117 16 FAC660A226D84441536B6DBE1F4DE419 1 15BD983E24D135969C5F891007805132 2 E6327AEC447FBEA5CFE0D97F0A7A7AD9 3 426A1ABBE71F6181FA9551967BCAB1CD 4 E907E333D4C476ADB0076DF299FE9C20 5 B4DAEB1D515767B9F5C5DA99CC33DE17 6 FB6AE7838E383226EB55B9C41E4FD227 7 0DE3FC648462065F200CAABCAC6792A5 . .

25 / 50

slide-26
SLIDE 26

Statistical Analysis

Output files are analysed by a Java program
Study the data with different significance levels and numbers of samples
Statistical functions: Parallel Java Library[4]
Plots: Cube Test Library[5]

26 / 50

slide-28
SLIDE 28

Balance Test of 1 superpoly

Let Q be a superpoly
Hypothesis:
  Q is a random polynomial
  The value of Q is 0 or 1 with equal probability
Let N be the number of random assignments to the superpoly variables
$\chi^2$ test:
  Expected number of 0s = expected number of 1s = N/2
  $n_0$ = observed number of 0s
  $n_1$ = observed number of 1s
  $\chi^2 = \frac{(n_0 - N/2)^2}{N/2} + \frac{(n_1 - N/2)^2}{N/2}$
Calculate the p-value (for the $\chi^2$ distribution with 1 degree of freedom)
The test fails if the p-value is less than the significance level
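A minimal Python sketch of this balance test follows. The talk's analysis was done in Java with the Parallel Java Library; the function names below are hypothetical. For 1 degree of freedom the upper-tail probability of the χ² distribution has the closed form erfc(sqrt(χ²/2)), which the sketch uses for the p-value.

```python
import math

def chi2_pvalue_1df(chi2):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2.0))

def balance_test(q_values, significance):
    """Return (chi2, p_value, passed) for a list of 0/1 superpoly values."""
    n = len(q_values)
    n1 = sum(q_values)          # observed number of 1s
    n0 = n - n1                 # observed number of 0s
    expected = n / 2.0
    chi2 = (n0 - expected) ** 2 / expected + (n1 - expected) ** 2 / expected
    p_value = chi2_pvalue_1df(chi2)
    return chi2, p_value, p_value >= significance

balanced = balance_test([0, 1] * 50, 0.01)          # 50 zeros, 50 ones
skewed = balance_test([1] * 80 + [0] * 20, 0.01)    # 20 zeros, 80 ones
```

With 50 zeros and 50 ones the statistic is 0 and the test passes; with 20 zeros and 80 ones the statistic is 36 and the test fails at the 0.01 level.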

28 / 50

slide-29
SLIDE 29

Balance Test of all superpolys

Hypothesis (significance level of P):
  A superpoly will pass the balance test with a probability of (1 − P)
Let N be the number of superpolys being tested
$\chi^2$ test:
  $N_p$ = expected number of passes = $(1 - P) \cdot N$
  $N_f$ = expected number of failures = $P \cdot N$
  $n_0$ = observed number of passed tests
  $n_1$ = observed number of failed tests
  $\chi^2 = \frac{(n_0 - N_p)^2}{N_p} + \frac{(n_1 - N_f)^2}{N_f}$
Calculate the p-value (for the $\chi^2$ distribution with 1 degree of freedom)
The test fails if the p-value is less than the significance level
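This second-level test can be sketched in Python as below. It is an illustrative sketch: the function name is hypothetical, and it assumes the input is the list of p-values produced by the individual balance tests, one per superpoly.

```python
import math

def balance_of_all(p_values, significance):
    """Chi-square test on the pass/fail counts of many balance tests.

    p_values     -- p-value of the balance test of each superpoly
    significance -- significance level P used for every test
    Returns (chi2, p_value, passed).
    """
    n = len(p_values)
    n_fail = sum(1 for p in p_values if p < significance)   # observed failures
    n_pass = n - n_fail                                      # observed passes
    exp_pass = (1.0 - significance) * n
    exp_fail = significance * n
    chi2 = ((n_pass - exp_pass) ** 2 / exp_pass
            + (n_fail - exp_fail) ** 2 / exp_fail)
    p_value = math.erfc(math.sqrt(chi2 / 2.0))   # chi-square tail, 1 degree of freedom
    return chi2, p_value, p_value >= significance

# 1000 superpolys at P = 0.01: about 10 failures are expected by chance.
as_expected = balance_of_all([0.5] * 990 + [0.001] * 10, 0.01)
too_many = balance_of_all([0.5] * 900 + [0.001] * 100, 0.01)
```

A few failed balance tests are therefore not evidence of non-randomness on their own; only a failure count far from P · N is.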

29 / 50

slide-30
SLIDE 30

Output/Output independence Test

Let $Q_i$ and $Q_j$ be two superpolys
Hypothesis:
  The value of $Q_i$ is independent of the value of $Q_j$
Let N be the number of random assignments to the superpoly variables
$\chi^2$ test:
  Expected number of (0,0) values for $(Q_i, Q_j)$ = N/4 (same for (0,1), (1,0), (1,1))
  Let $n_0$, $n_1$, $n_2$ and $n_3$ be the observed counts of (0,0), (0,1), (1,0) and (1,1) values for $(Q_i, Q_j)$
  $\chi^2 = \frac{(n_0 - N/4)^2}{N/4} + \frac{(n_1 - N/4)^2}{N/4} + \frac{(n_2 - N/4)^2}{N/4} + \frac{(n_3 - N/4)^2}{N/4}$
Calculate the p-value (for the $\chi^2$ distribution with 3 degrees of freedom)
The test fails if the p-value is less than the significance level
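The output/output test can be sketched in Python in the same way. The function names are hypothetical. For 3 degrees of freedom the upper-tail probability of the χ² distribution is erfc(sqrt(χ²/2)) + sqrt(2χ²/π) · exp(−χ²/2), so no statistics library is needed.

```python
import math

def chi2_pvalue_3df(chi2):
    """Upper-tail probability of a chi-square variable with 3 degrees of freedom."""
    half = chi2 / 2.0
    return (math.erfc(math.sqrt(half))
            + math.sqrt(2.0 * chi2 / math.pi) * math.exp(-half))

def output_output_test(qi, qj, significance):
    """Return (chi2, p_value, passed) for two equal-length lists of 0/1 values."""
    n = len(qi)
    counts = [0, 0, 0, 0]            # counts of (0,0), (0,1), (1,0), (1,1)
    for a, b in zip(qi, qj):
        counts[2 * a + b] += 1
    expected = n / 4.0
    chi2 = sum((c - expected) ** 2 / expected for c in counts)
    p_value = chi2_pvalue_3df(chi2)
    return chi2, p_value, p_value >= significance

uniform = output_output_test([0, 0, 1, 1] * 25, [0, 1, 0, 1] * 25, 0.01)
identical = output_output_test([0, 1] * 50, [0, 1] * 50, 0.01)
```

Two identical superpolys are each perfectly balanced, yet the pair only ever takes the values (0,0) and (1,1), so this test can fail where the balance test passes.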

30 / 50

slide-31
SLIDE 31

AES-128 Balance Test

Figure: AES-128 Balance Test

31 / 50

slide-32
SLIDE 32

AES-128 Balance Test

Figure: AES-128 Balance Test

32 / 50

slide-33
SLIDE 33

AES-128 Output/Output Independence Test

Figure: AES-128 Independence Test

33 / 50

slide-34
SLIDE 34

AES-128 Output/Output Independence Test

Figure: AES-128 Independence Test

34 / 50

slide-35
SLIDE 35

Threefish-256 Balance Test

Figure: Threefish-256 Balance Test

35 / 50

slide-36
SLIDE 36

Threefish-256 Balance Test

Figure: Threefish-256 Balance Test

36 / 50

slide-37
SLIDE 37

Threefish-256 Output/Output Independence Test

Figure: Threefish-256 Independence Test

37 / 50

slide-38
SLIDE 38

Threefish-256 Output/Output Independence Test

Figure: Threefish-256 Independence Test

38 / 50

slide-39
SLIDE 39

Keccak-f [1600] Balance Test

Figure: Keccak-f [1600] Balance Test

39 / 50

slide-40
SLIDE 40

Keccak-f [1600] Output/Output Independence Test

Figure: Keccak-f [1600] Independence Test

40 / 50

slide-41
SLIDE 41

Speedup plots

Figure: Speedup (1 thread per block)

41 / 50

slide-42
SLIDE 42

Speedup plots

Figure: Speedup (32 threads per block)

42 / 50

slide-43
SLIDE 43

Speedup plots

Figure: Speedup (64 threads per block)

43 / 50

slide-44
SLIDE 44

Speedup plots

Figure: Speedup (20 thread blocks)

44 / 50

slide-46
SLIDE 46

Conclusions

GPUs are excellent platforms for executing massively parallel programs
The balance test detected no non-randomness in any of the three primitives
The output/output independence test shows non-randomness in all three primitives

46 / 50

slide-48
SLIDE 48

Future Work

Perform similar studies on the remaining SHA-3 candidates
Run different cube tests on the primitives, such as tests for linear and neutral variables
More performance testing of the framework
OpenCL, object-oriented approach

48 / 50

slide-49
SLIDE 49

References

[1] Joan Daemen and Vincent Rijmen. Specification for the Advanced Encryption Standard (AES). Federal Information Processing Standards Publication 197, 2001.
[2] Niels Ferguson, Stefan Lucks, Bruce Schneier, Doug Whiting, Mihir Bellare, Tadayoshi Kohno, Jon Callas, and Jesse Walker. The Skein Hash Function Family. Submission to NIST, 2008.
[3] NVIDIA. NVIDIA CUDA programming guide.
[4] A. Kaminsky. Parallel Java Library. http://www.cs.rit.edu/~ark/pj.shtml
[5] A. Kaminsky. Cube test analysis of the statistical behavior of CubeHash and Skein. http://www.cs.rit.edu/~ark/parallelcrypto/cubetest01/

49 / 50

slide-50
SLIDE 50

Fin

Questions?

50 / 50