


slide-1
SLIDE 1

FACE: Fast AES CTR mode Encryption Techniques based on the Reuse of Repetitive Data

Jin Hyung Park and Dong Hoon Lee

Center for Information Security Technologies, Korea University

slide-2
SLIDE 2

FACE

Introduction

[Figure: CTR mode encryption. For the i-th block, the value IV (Counter) + (i - 1) is encrypted with the block cipher under key K, and the result is XORed with Plaintext i-1 to produce Ciphertext i-1 (1st block, 2nd block, ..., i-th block).]

The counter is incremented for each block.
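The per-block counter update can be sketched as below. This is a minimal illustration, assuming a 128-bit big-endian counter and an increment of 1 per block; the helper name `ctr_increment` is hypothetical and not taken from the slides.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Add 1 to a 128-bit big-endian counter block. The carry propagates
 * from the last byte (index 15) towards the first (index 0). */
static void ctr_increment(uint8_t ctr[16])
{
    assert(ctr != NULL);
    for (int i = 15; i >= 0; i--) {
        ctr[i] = (uint8_t)(ctr[i] + 1);
        if (ctr[i] != 0) {
            break;  /* this byte did not wrap, so there is no carry */
        }
    }
}
```

Most calls only touch byte 15, which is the property the caching techniques in this deck rely on.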


slide-4
SLIDE 4

FACE

Introduction

< Initial Whitening phase of AES >

CTR0 : 0x00 00 00 00 0x00 00 00 00 0x00 00 00 00 0x00 00 00 01
CTR1 : 0x00 00 00 00 0x00 00 00 00 0x00 00 00 00 0x00 00 00 02

1st block
  State     : 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 01
  Round Key : 8A 4F 5E 59 48 7B BA B3 ED 83 6A C4 AC 56 50 38
  State     : 8A 4F 5E 59 48 7B BA B3 ED 83 6A C4 AC 56 50 39

2nd block
  State     : 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 02
  Round Key : 8A 4F 5E 59 48 7B BA B3 ED 83 6A C4 AC 56 50 38
  State     : 8A 4F 5E 59 48 7B BA B3 ED 83 6A C4 AC 56 50 3A
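The whitening step on this slide can be reproduced with a short sketch. The round key and counter values are the ones printed on the slide; the function names `initial_whitening` and `count_diff_bytes` are illustrative assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Round 0 (initial whitening): state = counter block XOR round key 0. */
static void initial_whitening(uint8_t state[16],
                              const uint8_t ctr[16],
                              const uint8_t rk0[16])
{
    assert(state != NULL && ctr != NULL && rk0 != NULL);
    for (int i = 0; i < 16; i++) {
        state[i] = (uint8_t)(ctr[i] ^ rk0[i]);
    }
}

/* Count how many byte positions differ between two 16-byte states. */
static int count_diff_bytes(const uint8_t a[16], const uint8_t b[16])
{
    assert(a != NULL && b != NULL);
    int n = 0;
    for (int i = 0; i < 16; i++) {
        n += (a[i] != b[i]);
    }
    return n;
}
```

With the slide's values, the two whitened states end in 0x39 and 0x3A and agree in the other 15 bytes, so `count_diff_bytes` reports a single differing byte.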

slide-5
SLIDE 5

FACE

Introduction

Counter-mode Caching

CTR0 : 0x00 00 00 00 0x00 00 00 00 0x00 00 00 00 0x00 00 00 01
CTR1 : 0x00 00 00 00 0x00 00 00 00 0x00 00 00 00 0x00 00 00 02

1st block
  State     : 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 01
  Round Key : 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 01
  State     : 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

2nd block
  State     : 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 02
  Round Key : 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 01
  State     : 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 03

slide-6
SLIDE 6

FACE

Round Function - 4 Transformations

[Figure: the four transformations of the round function, applied to a 16-byte state with bytes [0] ... [15].]

  • S-Box: every byte [0] ... [15] is substituted through the S-Box.
  • Shift: the rows are shifted; written row by row, the state becomes

      [0]  [4]  [8]  [12]
      [5]  [9]  [13] [1]
      [10] [14] [2]  [6]
      [15] [3]  [7]  [11]

  • Mix: each column is multiplied by the matrix

      2 3 1 1
      1 2 3 1
      1 1 2 3
      3 1 1 2

  • Round Key: the round key is XORed into the state.
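Two of the four transformations are easy to check in isolation. The sketch below assumes the usual column-major state layout (byte i sits in row i mod 4, column i / 4); the function names are illustrative, and the S-Box and round-key steps are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Multiply by 2 in GF(2^8) modulo the AES polynomial 0x11B. */
static uint8_t xtime(uint8_t a)
{
    return (uint8_t)((a << 1) ^ ((a & 0x80) ? 0x1B : 0x00));
}

/* ShiftRows on a column-major state: row r is rotated left by r positions. */
static void shift_rows(uint8_t s[16])
{
    assert(s != NULL);
    uint8_t t[16];
    for (int c = 0; c < 4; c++) {
        for (int r = 0; r < 4; r++) {
            t[r + 4 * c] = s[r + 4 * ((c + r) % 4)];
        }
    }
    memcpy(s, t, 16);
}

/* MixColumns on one column, using the matrix rows
 * (2 3 1 1), (1 2 3 1), (1 1 2 3), (3 1 1 2). */
static void mix_column(uint8_t col[4])
{
    assert(col != NULL);
    uint8_t a0 = col[0], a1 = col[1], a2 = col[2], a3 = col[3];
    col[0] = (uint8_t)(xtime(a0) ^ xtime(a1) ^ a1 ^ a2 ^ a3);
    col[1] = (uint8_t)(a0 ^ xtime(a1) ^ xtime(a2) ^ a2 ^ a3);
    col[2] = (uint8_t)(a0 ^ a1 ^ xtime(a2) ^ xtime(a3) ^ a3);
    col[3] = (uint8_t)(xtime(a0) ^ a0 ^ a1 ^ a2 ^ xtime(a3));
}
```

Applying `shift_rows` to a state whose byte i holds the value i yields the first column 0, 5, 10, 15, which is the S[0] S[5] S[10] S[15] grouping used later in the deck.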

slide-7
SLIDE 7

FACE

AES Implementation Methods

    s0 = GETU32(in     ) ^ rk[0];
    s1 = GETU32(in +  4) ^ rk[1];
    s2 = GETU32(in +  8) ^ rk[2];
    s3 = GETU32(in + 12) ^ rk[3];
    /* round 1: */
    t0 = Te0[s0 >> 24] ^ Te1[(s1 >> 16) & 0xff] ^ Te2[(s2 >> 8) & 0xff] ^ Te3[s3 & 0xff] ^ rk[ 4];
    t1 = Te0[s1 >> 24] ^ Te1[(s2 >> 16) & 0xff] ^ Te2[(s3 >> 8) & 0xff] ^ Te3[s0 & 0xff] ^ rk[ 5];
    t2 = Te0[s2 >> 24] ^ Te1[(s3 >> 16) & 0xff] ^ Te2[(s0 >> 8) & 0xff] ^ Te3[s1 & 0xff] ^ rk[ 6];
    t3 = Te0[s3 >> 24] ^ Te1[(s0 >> 16) & 0xff] ^ Te2[(s1 >> 8) & 0xff] ^ Te3[s2 & 0xff] ^ rk[ 7];

    static const u32 Te0[256] = {
        0xc66363a5U, 0xf87c7c84U, 0xee777799U, 0xf67b7b8dU,
        0xfff2f20dU, 0xd66b6bbdU, 0xde6f6fb1U, 0x91c5c554U,
        0x60303050U, 0x02010103U, 0xce6767a9U, 0x562b2b7dU,
        0xe7fefe19U, 0xb5d7d762U, 0x4dababe6U, 0xec76769aU,
        …
        0x824141c3U, 0x299999b0U, 0x5a2d2d77U, 0x1e0f0f11U,
        0x7bb0b0cbU, 0xa85454fcU, 0x6dbbbbd6U, 0x2c16163aU,
    };
    static const u32 Te3[256] = {
        0x6363a5c6U, 0x7c7c84f8U, 0x777799eeU, 0x7b7b8df6U,
        0xf2f20dffU, 0x6b6bbdd6U, 0x6f6fb1deU, 0xc5c55491U,
        0x30305060U, 0x01010302U, 0x6767a9ceU, 0x2b2b7d56U,
        0xfefe19e7U, 0xd7d762b5U, 0xababe64dU, 0x76769aecU,
        …
        0x4141c382U, 0x9999b029U, 0x2d2d775aU, 0x0f0f111eU,
        0xb0b0cb7bU, 0x5454fca8U, 0xbbbbd66dU, 0x16163a2cU,
    };

< OpenSSL >
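The first table entries above can be derived from the S-Box output alone, which shows why one lookup covers SubBytes and MixColumns at once. This is a hedged sketch: `te_entries` is a hypothetical helper, and the byte order (2s, s, s, 3s for Te0) is inferred from the values shown on the slide.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Multiply by 2 in GF(2^8) modulo the AES polynomial 0x11B. */
static uint8_t xtime(uint8_t a)
{
    return (uint8_t)((a << 1) ^ ((a & 0x80) ? 0x1B : 0x00));
}

/* For an S-Box output s, build the Te0 word (2s, s, s, 3s), most
 * significant byte first, and the Te3 word, which is the same word
 * rotated left by 8 bits: (s, s, 3s, 2s). */
static void te_entries(uint8_t s, uint32_t *te0, uint32_t *te3)
{
    assert(te0 != NULL && te3 != NULL);
    uint8_t s2 = xtime(s);
    uint8_t s3 = (uint8_t)(s2 ^ s);
    uint32_t t = ((uint32_t)s2 << 24) | ((uint32_t)s << 16) |
                 ((uint32_t)s << 8) | (uint32_t)s3;
    *te0 = t;
    *te3 = (t << 8) | (t >> 24);
}
```

For s = 0x63 (the S-Box output for input 0x00) this gives 0xc66363a5 and 0x6363a5c6, matching the first entries of Te0 and Te3 above.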

slide-8
SLIDE 8

FACE

AES Implementation Methods

[Same OpenSSL round-1 code and Te0 / Te3 tables as on the previous slide.]

< OpenSSL >

Vulnerable to cache-timing attacks

slide-9
SLIDE 9

FACE

AES Implementation Methods

< 8 plaintext blocks >

            MSB                                   LSB
  Block0 :  b0 b1 b2 b3 … b12 b13 b14 00000000
  Block1 :  b0 b1 b2 b3 … b12 b13 b14 00000001
  Block2 :  b0 b1 b2 b3 … b12 b13 b14 00000010
  Block3 :  b0 b1 b2 b3 … b12 b13 b14 00000011
  Block4 :  b0 b1 b2 b3 … b12 b13 b14 00000100
  Block5 :  b0 b1 b2 b3 … b12 b13 b14 00000101
  Block6 :  b0 b1 b2 b3 … b12 b13 b14 00000110
  Block7 :  b0 b1 b2 b3 … b12 b13 b14 00000111

< bitsliced form transformation (OpenSSL implementation based on [1]) >

< 8 [128-bit] registers >

               MSB            LSB
  Register0 :  … 1 0 1 0 1 0 1 0
  Register1 :  … 1 1 0 0 1 1 0 0
  Register2 :  … 1 1 1 1 0 0 0 0
  Register3 :  … 0 0 0 0 0 0 0 0
  Register4 :  … 0 0 0 0 0 0 0 0
  Register5 :  … 0 0 0 0 0 0 0 0
  Register6 :  … 0 0 0 0 0 0 0 0
  Register7 :  … 0 0 0 0 0 0 0 0
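The register contents in the figure can be reproduced for a single byte position with the sketch below. It is a simplification under stated assumptions: real bitsliced code transposes all 16 bytes into 128-bit registers, while the hypothetical `bitslice8` here handles only the last byte of each of the 8 blocks and stores Block7 at the most significant position, as the figure suggests.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bit-plane k collects bit k of each of the 8 input bytes. The bit taken
 * from block j is stored at bit position j, so Block7 lands at the MSB. */
static void bitslice8(const uint8_t in[8], uint8_t planes[8])
{
    assert(in != NULL && planes != NULL);
    for (int k = 0; k < 8; k++) {
        uint8_t p = 0;
        for (int j = 0; j < 8; j++) {
            p |= (uint8_t)(((in[j] >> k) & 1u) << j);
        }
        planes[k] = p;
    }
}
```

Feeding the last counter bytes 0 ... 7 of Block0 ... Block7 gives the planes 10101010, 11001100 and 11110000 for Register0 to Register2 and zeros for the rest, in line with the figure.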

slide-10
SLIDE 10

FACE

AES Implementation Methods

  Instruction       Description
  AESENC            Perform one round of an AES encryption flow
  AESENCLAST        Perform the last round of an AES encryption flow
  AESDEC            Perform one round of an AES decryption flow
  AESDECLAST        Perform the last round of an AES decryption flow
  AESKEYGENASSIST   Assist in AES round key generation
  AESIMC            Assist in AES Inverse Mix Columns
  PCLMULQDQ         Carryless multiply

    *block = _mm_xor_si128(*block, skeys[0]);
    /* round 1: */
    *block = _mm_aesenc_si128(*block, skeys[1]);

< Crypto++ >

slide-11
SLIDE 11

FACE

AES Implementation Methods

  Method        Performance (Cycles per Byte)   Test Environment     Reference
  Table-based   10.57 + α (not for CTR)         Core 2 Quad Q6600    INDOCRYPT 2008 [1]
  Bitslicing    9.32                            Core 2 Quad Q6600    CHES 2009 [2]
                7.59                            Core 2 Quad Q9550
  AES-NI        1.4 - 2.0                       Westmere Processor   INTEL whitepaper [3]
                0.57                            Skylake Core i5      Crypto++ Benchmark [4]

[1] : Daniel J. Bernstein and Peter Schwabe, "New AES software speed records", INDOCRYPT 2008
[2] : Emilia Käsper and Peter Schwabe, "Faster and Timing-Attack Resistant AES-GCM", CHES 2009
[3] : Shay Gueron, "Intel Advanced Encryption Standard (AES) New Instructions Set", May 2010 (The first Westmere-based processors, which support AES-NI, were launched in January 2010.)
[4] : Crypto++ 6.0.0 Benchmarks, https://www.cryptopp.com/benchmarks.html, 2017. 12

slide-12
SLIDE 12

FACE

Problem

Bitslice

  • During a format conversion, each byte of the input is sliced bitwise, and the sliced bits are spread into the corresponding positions of each register.
  • The input bytes needed to calculate the rest are spread across the whole register.
  • Almost all instructions of the previous implementation still have to be performed, together with additional operations (save, load, merge).

AES-NI

  • aesenc xmm15, xmm1 -> only 1 instruction performs the round operation.
  • Such operations (for the rest) have to be composed of several instructions.
  • Adding operations to calculate the rest becomes a considerable burden, even though instruction latency and throughput differ from instruction to instruction.

slide-13
SLIDE 13

FACE

Our Work (FACE)

  • FACE extends counter-mode caching.
  • FACE is the first to combine counter-mode caching with a bitsliced implementation.
  • FACE is the first to apply counter-mode caching up to the round transformations of AES-NI.
  • FACE: the highest throughput.

slide-14
SLIDE 14

FACE

Fast AES Counter mode Encryption

FACE (Fast AES Counter Mode Encryption)

  Technique   Cache size   Cache reusable for (blocks)
  FACErd0     12 bytes     2^32 - 1
  FACErd1     12 bytes     255
  FACErd1+    1K           2^40
  FACErd2     16 bytes     255
  FACErd2+    4K           2^40

slide-17
SLIDE 17

FACE

Fast AES Counter mode Encryption

FACErd0

  • 256 blocks: the difference is the last 1 byte
  • Cache: 3 columns, 2^32 - 1 blocks

[Figure: Round 0 (initial whitening) for the 1st and 2nd blocks. The initialization vector (128-bit counter value) goes through the block-to-state transformation into S[0] ... S[15] and is XORed with the round key before Round 1. The 2nd block uses the 1st block's counter value + interval. Different part: S[12] S[13] S[14] S[15]. Available part of cache: S[0] ... S[11]. Cache: the first block state and the second block state show a 1-byte difference.]
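The round-0 reuse shown on this slide can be sketched as follows. The sketch assumes that only counter bytes 12 to 15 change between blocks, so the 12 whitened bytes S[0] ... S[11] are computed once; the type and function names (`rd0_cache`, `rd0_cache_init`, `rd0_state`) are hypothetical and not the authors' code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Cached part of the round-0 state: the 12 bytes (3 columns) that stay
 * constant while only counter bytes 12..15 change. */
typedef struct {
    uint8_t fixed[12];
} rd0_cache;

/* Compute the cache once: IV bytes 0..11 XOR round-key-0 bytes 0..11. */
static void rd0_cache_init(rd0_cache *cache,
                           const uint8_t iv[16], const uint8_t rk0[16])
{
    assert(cache != NULL && iv != NULL && rk0 != NULL);
    for (int i = 0; i < 12; i++) {
        cache->fixed[i] = (uint8_t)(iv[i] ^ rk0[i]);
    }
}

/* Per block: copy the cached 12 bytes and whiten only bytes 12..15. */
static void rd0_state(uint8_t state[16], const rd0_cache *cache,
                      const uint8_t ctr[16], const uint8_t rk0[16])
{
    assert(state != NULL && cache != NULL && ctr != NULL && rk0 != NULL);
    memcpy(state, cache->fixed, 12);
    for (int i = 12; i < 16; i++) {
        state[i] = (uint8_t)(ctr[i] ^ rk0[i]);
    }
}
```

The cache has to be rebuilt whenever a carry reaches byte 11 of the counter, which is consistent with the 2^32 - 1 figure on the slide.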

slide-20
SLIDE 20

FACE

Fast AES Counter mode Encryption

FACErd1

  • The difference: the last 1 byte
  • This difference spreads
  • Cache: 3 columns, 255 blocks

[Figure: Round 0 -> Round 1 (SubBytes, ShiftRows, MixColumns, AddRoundKey) -> Round 2 for the 1st and 2nd blocks, each with state S[0] ... S[15]. Cache: the first block state and the second block state show a 4-byte difference. Legend: different part; available part of cache; correlation of transformation with bytes.]
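The spreading of the 1-byte difference into 4 bytes can be checked with a plain AES round. The sketch below is illustrative only: it computes the S-Box on the fly instead of using a table, assumes a column-major state, and uses an arbitrary round key supplied by the caller; none of the names come from the slides.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Multiply by 2 in GF(2^8) modulo the AES polynomial 0x11B. */
static uint8_t xtime(uint8_t a)
{
    return (uint8_t)((a << 1) ^ ((a & 0x80) ? 0x1B : 0x00));
}

/* GF(2^8) multiplication by shift-and-add. */
static uint8_t gmul(uint8_t a, uint8_t b)
{
    uint8_t p = 0;
    while (b) {
        if (b & 1) p ^= a;
        a = xtime(a);
        b >>= 1;
    }
    return p;
}

static uint8_t rotl8(uint8_t x, int n)
{
    return (uint8_t)((x << n) | (x >> (8 - n)));
}

/* AES S-Box: multiplicative inverse (x^254) followed by the affine map. */
static uint8_t sbox(uint8_t x)
{
    uint8_t inv = 0;
    if (x != 0) {
        inv = 1;
        for (int i = 0; i < 254; i++) inv = gmul(inv, x);
    }
    return (uint8_t)(inv ^ rotl8(inv, 1) ^ rotl8(inv, 2) ^
                     rotl8(inv, 3) ^ rotl8(inv, 4) ^ 0x63);
}

/* One AES round: SubBytes, ShiftRows, MixColumns, AddRoundKey. */
static void aes_round(uint8_t s[16], const uint8_t rk[16])
{
    assert(s != NULL && rk != NULL);
    uint8_t t[16];
    for (int c = 0; c < 4; c++)
        for (int r = 0; r < 4; r++)
            t[r + 4 * c] = sbox(s[r + 4 * ((c + r) % 4)]);
    for (int c = 0; c < 4; c++) {
        const uint8_t *a = &t[4 * c];
        s[4 * c + 0] = (uint8_t)(xtime(a[0]) ^ xtime(a[1]) ^ a[1] ^ a[2] ^ a[3] ^ rk[4 * c + 0]);
        s[4 * c + 1] = (uint8_t)(a[0] ^ xtime(a[1]) ^ xtime(a[2]) ^ a[2] ^ a[3] ^ rk[4 * c + 1]);
        s[4 * c + 2] = (uint8_t)(a[0] ^ a[1] ^ xtime(a[2]) ^ xtime(a[3]) ^ a[3] ^ rk[4 * c + 2]);
        s[4 * c + 3] = (uint8_t)(xtime(a[0]) ^ a[0] ^ a[1] ^ a[2] ^ xtime(a[3]) ^ rk[4 * c + 3]);
    }
}
```

If two round-0 states differ only in S[15], ShiftRows moves that byte into the first column, so after `aes_round` only bytes 0 to 3 differ and the remaining 12 bytes (3 columns) can be cached.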

slide-21
SLIDE 21

FACE

Fast AES Counter mode Encryption

FACErd1+

  • Pre-computation: lookup table

[Figure: Round 0 -> Round 1 (SubBytes, ShiftRows, MixColumns, AddRoundKey) -> Round 2 for the 1st and 2nd blocks. The state S[0] ... S[15] is reordered by ShiftRows into the columns (S[0] S[5] S[10] S[15]), (S[4] S[9] S[14] S[3]), (S[8] S[13] S[2] S[7]), (S[12] S[1] S[6] S[11]). Legend: different part; byte that is used as index; correlation of transformation with bytes.]

ctr[15] xor Round 0's rk[15]

slide-22
SLIDE 22

FACE

Fast AES Counter mode Encryption

FACErd1+

  • Pre-computation: lookup table
  • 2^40
  • Lookup index: the last byte of the counter

[Figure: Round 0 -> Round 1 (SubBytes, ShiftRows, MixColumns, AddRoundKey) -> Round 2 for the first and second block states. The state S[0] ... S[15] is reordered by ShiftRows into the columns (S[0] S[5] S[10] S[15]), (S[4] S[9] S[14] S[3]), (S[8] S[13] S[2] S[7]), (S[12] S[1] S[6] S[11]). Lookup table: index (1 ... 255) -> cached state; the index comes from the counter bytes in12 in13 in14 in15 (State_in0,0[3] for the first block, State_in1,0[3] for the second block). Legend: different part; byte that is used as index; correlation of transformation with bytes.]

slide-23
SLIDE 23

FACE

Fast AES Counter mode Encryption

Leverage FACErd1 & FACErd1+

Caching Procedure (Round 1)

  Counter: b0 b1 b2 b3 b4 b5 b6 b7 b8 b9 b10 b11 b12 b13 b14 b15
    -> Initial whitening (Round 0): S[0] ... S[15]
    -> Round 1: SubBytes( ), ShiftRows( ), MixColumns( ), AddRoundKey( )
    -> Round 2

Round 1 Look-up Table (each entry is the 4-byte state value 0xS[0]S[1]S[2]S[3])

  Index (b15)   State Value (4 Bytes)
  0             0x64E2F9C1
  1             0x1A83B211
  …             …
  254           0x73816F1F
  255           0x6C8EB21D

Saved State (FACErd1)

Complete (up to Round 1) after memory load and merge.
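The load-and-merge idea on this slide can be sketched as below. It is a simplified model, not the authors' implementation: the table is built by brute force with a plain AES round, the S-Box is computed on the fly, the state is column-major, and all names (`rd1p_cache`, `rd1p_build`, `rd1p_state`) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

static uint8_t xtime(uint8_t a)
{
    return (uint8_t)((a << 1) ^ ((a & 0x80) ? 0x1B : 0x00));
}

static uint8_t gmul(uint8_t a, uint8_t b)
{
    uint8_t p = 0;
    while (b) { if (b & 1) p ^= a; a = xtime(a); b >>= 1; }
    return p;
}

static uint8_t rotl8(uint8_t x, int n)
{
    return (uint8_t)((x << n) | (x >> (8 - n)));
}

/* AES S-Box: multiplicative inverse (x^254) followed by the affine map. */
static uint8_t sbox(uint8_t x)
{
    uint8_t inv = 0;
    if (x != 0) { inv = 1; for (int i = 0; i < 254; i++) inv = gmul(inv, x); }
    return (uint8_t)(inv ^ rotl8(inv, 1) ^ rotl8(inv, 2) ^
                     rotl8(inv, 3) ^ rotl8(inv, 4) ^ 0x63);
}

/* One AES round: SubBytes, ShiftRows, MixColumns, AddRoundKey. */
static void aes_round(uint8_t s[16], const uint8_t rk[16])
{
    uint8_t t[16];
    for (int c = 0; c < 4; c++)
        for (int r = 0; r < 4; r++)
            t[r + 4 * c] = sbox(s[r + 4 * ((c + r) % 4)]);
    for (int c = 0; c < 4; c++) {
        const uint8_t *a = &t[4 * c];
        s[4 * c + 0] = (uint8_t)(xtime(a[0]) ^ xtime(a[1]) ^ a[1] ^ a[2] ^ a[3] ^ rk[4 * c + 0]);
        s[4 * c + 1] = (uint8_t)(a[0] ^ xtime(a[1]) ^ xtime(a[2]) ^ a[2] ^ a[3] ^ rk[4 * c + 1]);
        s[4 * c + 2] = (uint8_t)(a[0] ^ a[1] ^ xtime(a[2]) ^ xtime(a[3]) ^ a[3] ^ rk[4 * c + 2]);
        s[4 * c + 3] = (uint8_t)(xtime(a[0]) ^ a[0] ^ a[1] ^ a[2] ^ xtime(a[3]) ^ rk[4 * c + 3]);
    }
}

typedef struct {
    uint8_t col0[256][4];  /* 1 KB table: round-1 column 0, indexed by ctr[15] */
    uint8_t rest[12];      /* saved state: round-1 columns 1..3 (as in FACErd1) */
} rd1p_cache;

/* Pre-compute the table for every possible value of the last counter byte. */
static void rd1p_build(rd1p_cache *c, const uint8_t ctr[16],
                       const uint8_t rk0[16], const uint8_t rk1[16])
{
    assert(c != NULL && ctr != NULL && rk0 != NULL && rk1 != NULL);
    for (int v = 0; v < 256; v++) {
        uint8_t s[16];
        for (int i = 0; i < 16; i++) s[i] = (uint8_t)(ctr[i] ^ rk0[i]);
        s[15] = (uint8_t)(v ^ rk0[15]);          /* ctr[15] xor Round 0's rk[15] */
        aes_round(s, rk1);
        memcpy(c->col0[v], s, 4);
        if (v == 0) memcpy(c->rest, s + 4, 12);  /* does not depend on ctr[15] */
    }
}

/* Per block: one table load plus a merge with the saved 12 bytes. */
static void rd1p_state(uint8_t out[16], const rd1p_cache *c, uint8_t ctr15)
{
    assert(out != NULL && c != NULL);
    memcpy(out, c->col0[ctr15], 4);
    memcpy(out + 4, c->rest, 12);
}
```

After `rd1p_build`, the state after Round 1 for any value of the last counter byte is obtained without executing SubBytes, ShiftRows, MixColumns or AddRoundKey again.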

slide-26
SLIDE 26
FACE

Fast AES Counter mode Encryption

FACErd2

  • The difference: the first column (4 bytes)
  • This difference spreads to all states
  • Cache: intermediate result (16 bytes), 255 blocks

[Figure: Round 1 -> Round 2 (SubBytes, ShiftRows, MixColumns, AddRoundKey) -> Round 3 for the 1st and 2nd blocks. The state S[0] ... S[15] is reordered by ShiftRows into the columns (S[0] S[5] S[10] S[15]), (S[4] S[9] S[14] S[3]), (S[8] S[13] S[2] S[7]), (S[12] S[1] S[6] S[11]). Legend: different part; available part of cache; correlation of transformation with bytes.]

[ The case of State[0] ]

$$
\begin{pmatrix} 2 & 3 & 1 & 1 \\ 1 & 2 & 3 & 1 \\ 1 & 1 & 2 & 3 \\ 3 & 1 & 1 & 2 \end{pmatrix}
\begin{pmatrix} S[0] \\ S[5] \\ S[10] \\ S[15] \end{pmatrix}
\oplus \text{Round Key}
=
\begin{pmatrix} 2 \cdot S[0] \\ 1 \cdot S[0] \\ 1 \cdot S[0] \\ 3 \cdot S[0] \end{pmatrix}
\oplus
\underbrace{
\begin{pmatrix} 3 \cdot S[5] \\ 2 \cdot S[5] \\ 1 \cdot S[5] \\ 1 \cdot S[5] \end{pmatrix}
\oplus
\begin{pmatrix} 1 \cdot S[10] \\ 3 \cdot S[10] \\ 2 \cdot S[10] \\ 1 \cdot S[10] \end{pmatrix}
\oplus
\begin{pmatrix} 1 \cdot S[15] \\ 1 \cdot S[15] \\ 3 \cdot S[15] \\ 2 \cdot S[15] \end{pmatrix}
\oplus \text{Round Key}
}_{\text{Cache}}
$$
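The split of MixColumns into a changing term and a cached term can be sketched for one column as follows. The sketch assumes that S[0], S[5], S[10], S[15] denote the bytes entering MixColumns and that only S[0] changes between blocks; the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Multiply by 2 in GF(2^8) modulo the AES polynomial 0x11B. */
static uint8_t xtime(uint8_t a)
{
    return (uint8_t)((a << 1) ^ ((a & 0x80) ? 0x1B : 0x00));
}

/* Cached intermediate result for one column: the MixColumns contributions
 * of the three unchanged bytes S[5], S[10], S[15], XORed with the round key. */
static void rd2_cache_column(uint8_t cache[4], uint8_t s5, uint8_t s10,
                             uint8_t s15, const uint8_t rk[4])
{
    assert(cache != NULL && rk != NULL);
    cache[0] = (uint8_t)(xtime(s5) ^ s5 ^ s10 ^ s15 ^ rk[0]);    /* 3 1 1 */
    cache[1] = (uint8_t)(xtime(s5) ^ xtime(s10) ^ s10 ^ s15 ^ rk[1]); /* 2 3 1 */
    cache[2] = (uint8_t)(s5 ^ xtime(s10) ^ xtime(s15) ^ s15 ^ rk[2]); /* 1 2 3 */
    cache[3] = (uint8_t)(s5 ^ s10 ^ xtime(s15) ^ rk[3]);          /* 1 1 2 */
}

/* Per block: add the contribution (2, 1, 1, 3) * S[0] of the changing byte. */
static void rd2_finish_column(uint8_t out[4], const uint8_t cache[4], uint8_t s0)
{
    assert(out != NULL && cache != NULL);
    out[0] = (uint8_t)(cache[0] ^ xtime(s0));
    out[1] = (uint8_t)(cache[1] ^ s0);
    out[2] = (uint8_t)(cache[2] ^ s0);
    out[3] = (uint8_t)(cache[3] ^ xtime(s0) ^ s0);
}
```

Because MixColumns is linear over XOR, the result equals a full MixColumns plus AddRoundKey on the column, while the per-block work shrinks to the S[0] term. One such 4-byte cache per column gives the 16 bytes mentioned on the slide.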

slide-28
SLIDE 28

FACE

Fast AES Counter mode Encryption

FACErd2+

  • Pre-computation: lookup table

[Figure: Round 1 -> Round 2 (SubBytes, ShiftRows, MixColumns, AddRoundKey) -> Round 3 for the 1st and 2nd blocks. The state S[0] ... S[15] is reordered by ShiftRows into the columns (S[0] S[5] S[10] S[15]), (S[4] S[9] S[14] S[3]), (S[8] S[13] S[2] S[7]), (S[12] S[1] S[6] S[11]). Legend: different part; correlation of transformation with bytes.]

[ The case of State[0] ]

$$
\begin{pmatrix} 2 & 3 & 1 & 1 \\ 1 & 2 & 3 & 1 \\ 1 & 1 & 2 & 3 \\ 3 & 1 & 1 & 2 \end{pmatrix}
\begin{pmatrix} S[0] \\ S[5] \\ S[10] \\ S[15] \end{pmatrix}
\oplus \text{Round Key}
=
\begin{pmatrix} 2 \cdot S[0] \\ 1 \cdot S[0] \\ 1 \cdot S[0] \\ 3 \cdot S[0] \end{pmatrix}
\oplus
\begin{pmatrix} 3 \cdot S[5] \\ 2 \cdot S[5] \\ 1 \cdot S[5] \\ 1 \cdot S[5] \end{pmatrix}
\oplus
\begin{pmatrix} 1 \cdot S[10] \\ 3 \cdot S[10] \\ 2 \cdot S[10] \\ 1 \cdot S[10] \end{pmatrix}
\oplus
\begin{pmatrix} 1 \cdot S[15] \\ 1 \cdot S[15] \\ 3 \cdot S[15] \\ 2 \cdot S[15] \end{pmatrix}
\oplus \text{Round Key}
$$

By FACErd1+, ctr[15] determines $(2 \cdot S[0],\ 1 \cdot S[0],\ 1 \cdot S[0],\ 3 \cdot S[0])$.

slide-30
SLIDE 30

FACE

Fast AES Counter mode Encryption

FACErd2+

  • Pre-computation: lookup table
  • 2^40
  • Lookup index: the last byte of the counter

[Figure: Round 1 -> Round 2 (SubBytes, ShiftRows, MixColumns, AddRoundKey) -> Round 3 for the first and second block, shown for the case of State[0] and combined with FACErd2. Lookup table: index (1 ... 255) -> cached state; the index comes from the counter bytes in12 in13 in14 in15 (State_in0,0[3] for the first block, State_in1,0[3] for the second block). Legend: different part; byte that is used as index; correlation of transformation with bytes.]

All operations up to round 2 can be done with only 2 memory loads and 1 XOR operation!

slide-32
SLIDE 32

FACE

Cache timing Attacks

< Cache timing variation: cached vs. non-cached > **

A timing-attack-resistant implementation:

  • uses fixed-time instructions for operations that depend on secret data
  • does not use conditional branches that depend on secret data
  • does not use memory access patterns that depend on secret data

** ARMageddon, USENIX 2016

▣ Our method looks vulnerable to timing attacks (because it uses lookup tables)
▣ But FACE has no operations that depend on secret data

  • In the case of FACErd0, FACErd1, and FACErd2, the cache is small and the indices are fixed (i.e. constant data)
  • In the case of FACErd1+ and FACErd2+, the index is merely a part of the counter, which does not need to be secret, and the index increases linearly

slide-33
SLIDE 33

FACE

Evaluations

[Figure not recoverable; surviving labels: BS08, KS09]

              Test Env_1                Test Env_2               Test Env_3
  CPU         Intel Core 2 Quad Q9550   Intel Core i7 4770K      Intel Core i7 8700K
  Frequency   2.8 GHz                   3.5 GHz                  3.7 GHz
  RAM         4 GB                      8 GB                     16 GB
  OS          Linux 3.19.0-32 x86_64    Linux 3.19.0-32 x86_64   Linux 4.13.0-36 x86_64

slide-34
SLIDE 34

FACE

Evaluations


slide-36
SLIDE 36

FACE

Conclusion

Thank you for your attention! Any Questions?