Who watches the watchmen?: Utilizing Performance Monitors for Compromising keys of RSA on Intel Platforms

Sarani Bhattacharya and Debdeep Mukhopadhyay

Indian Institute of Technology Kharagpur

CHES 2015 September 15, 2015

CHES 2015 Sarani Bhattacharya Who watches the watchmen? 1 / 34

Overview of the talk

• Introduction
• Motivation of the problem
• Exponentiation primitives for Public key cryptography
• Modelling branch misses as side-channel
• Formally modeling success probability
• Experimental validation
• Conclusion

Introduction

Hardware performance counters (HPCs) are a set of special-purpose registers that store counts of hardware-related activities within the microprocessor. Since they reflect this hardware activity, HPCs can be used both for attacks and for their countermeasures. When implemented on systems with branch predictors, asymmetric-key cryptographic algorithms are susceptible to side-channel attacks that exploit the deterministic behaviour of the branch predictor on their key-dependent input sequences.

Objective of the work

This work shows that HPCs, which serve as performance monitors (the "watchmen") in modern computer systems, can be used by reasonably modelled adversaries to retrieve secret keys. The attack exploits the characteristics of the branch predictor, and we show formally that the leakage of the key increases with the attacker's ability to model the predictor accurately. We claim that branch misses from HPCs are a more significant side channel than timing.

Why should we consider HPCs for security analysis?

Results from HPCs are treated as accurate representations of events occurring in hardware [1], [2]. This holds when the overhead introduced by the performance counter interface does not dominate the event counts. The accuracy depends on the interface used, the application, and the event being measured [1].

Exploiting Hardware Performance Counters

The HPC L1 and L2 D-cache miss counters have been exploited as side channels in [3] to perform timing-based cache attacks on symmetric-key algorithms such as AES. In [4], by contrast, data from performance counters are used to develop a hardware malware detector using machine learning techniques. In [5], a new Virtual Machine Monitor (VMM) named NumChecker is proposed, which uses HPCs to detect kernel rootkits in a guest Virtual Machine.

Performance Monitoring over the years

In [6], profiling HPCs are described as being accessible in high-privilege modes. But since the advent of Linux perf for user-space program analysis [7], [8], this highly accurate performance-monitoring information has been available to Linux users on systems ranging from supercomputers to embedded devices.

OProfile, a system-wide sampling profiler by Levon, was included in Linux 2.5.43 in 2002. The PAPI implementation for Linux uses the perfctr Linux patch, an event-monitoring device driver, to enable access to the counters. In 2009, the perf event subsystem ('perf') was added to the Linux kernel, making user access to performance counters less clumsy, with no kernel patches or recompiles required [9]. The greatest advantage of perf event [9] is that the subsystem is already part of the kernel: it was included in Linux 2.6.31 as “Performance Counters for Linux”.

Public-Key Cryptography

Exponentiation and Underlying Multiplication Primitive

Inputs (M) are encrypted and decrypted by modular exponentiation with modulus N, using the public or private key, represented as an n-bit binary string, as the exponent.

Square and Multiply Exponentiation

Algorithm 1: Binary version of Square and Multiply Exponentiation Algorithm

S ← M
for i from 1 to n − 1 do
    S ← S ∗ S mod N
    if d_i = 1 then
        S ← S ∗ M mod N
    end
end
return S

The conditional execution of instructions and its dependence on the secret exponent are exploited by simple power analysis and timing side channels [10].
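
A minimal Python sketch of Algorithm 1 (our illustration, not the authors' code) makes this leak concrete: the sequence of squarings ('S') and multiplications ('M') spells out the exponent, because an 'M' follows an 'S' exactly when the corresponding bit is 1. The helper `to_bits` is a hypothetical convenience that produces the bit list most significant bit first.

```python
from typing import List, Tuple


def square_and_multiply(m: int, d_bits: List[int], n: int) -> Tuple[int, List[str]]:
    """Left-to-right binary exponentiation following Algorithm 1.

    d_bits holds the exponent bits d_0 ... d_{n-1}, most significant first.
    Like Algorithm 1 (which starts from S = M), it assumes d_0 = 1.
    Returns the result and the sequence of operations performed.
    """
    if not d_bits or d_bits[0] != 1:
        raise ValueError("Algorithm 1 assumes the leading bit d_0 is 1")
    s = m % n
    ops = []
    for d in d_bits[1:]:
        s = (s * s) % n          # squaring: executed for every bit
        ops.append("S")
        if d == 1:               # key-dependent branch
            s = (s * m) % n
            ops.append("M")
    return s, ops


def to_bits(e: int) -> List[int]:
    """Hypothetical helper: binary expansion of e > 0, MSB first."""
    return [int(c) for c in bin(e)[2:]]


result, ops = square_and_multiply(5, to_bits(13), 77)   # 13 = 0b1101
print(result, "".join(ops))                              # prints: 26 SMSSM
```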

Montgomery Ladder Exponentiation Algorithm

A naïve modification is a balanced ladder structure with an equal number of squarings and multiplications. It is the most popular exponentiation primitive for asymmetric-key cryptographic implementations.

Algorithm 2: Montgomery Ladder Algorithm

R0 ← 1
R1 ← M
for i from 0 to n − 1 do
    if d_i = 0 then
        R1 ← (R0 ∗ R1) mod N
        R0 ← (R0 ∗ R0) mod N
    else
        R0 ← (R0 ∗ R1) mod N
        R1 ← (R1 ∗ R1) mod N
    end
end
return R0
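
Algorithm 2 can be checked the same way. The sketch below is ours and uses plain modular arithmetic instead of Montgomery multiplication. Each bit costs one multiplication and one squaring whatever its value, which is what makes the ladder balanced; the `if` on the key bit `d` remains, and that kind of key-dependent branch is what a branch predictor observes. `to_bits` is again a hypothetical helper.

```python
from typing import List


def montgomery_ladder(m: int, d_bits: List[int], n: int) -> int:
    """Montgomery ladder following Algorithm 2 (bits most significant first).

    Invariant after every iteration: R1 = R0 * M mod N.
    """
    r0, r1 = 1, m % n
    for d in d_bits:
        if d == 0:                      # key-dependent branch
            r1 = (r0 * r1) % n
            r0 = (r0 * r0) % n
        else:
            r0 = (r0 * r1) % n
            r1 = (r1 * r1) % n
    return r0


def to_bits(e: int) -> List[int]:
    """Hypothetical helper: binary expansion of e > 0, MSB first."""
    return [int(c) for c in bin(e)[2:]]


print(montgomery_ladder(11, to_bits(45), 3233) == pow(11, 45, 3233))  # prints: True
```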

Montgomery Multiplication Algorithm

A highly efficient algorithm for modular squaring and modular multiplication [11] that avoids the time-consuming integer division operation. $R$ is assumed to be $2^k$, where $N$ is a $k$-bit number ($N$ must be odd so that it is invertible modulo $R$). It calculates $Z = A \cdot B \cdot R^{-1} \bmod N$, where $A = a \cdot R \bmod N$, $B = b \cdot R \bmod N$, and $R^{-1} \cdot R \equiv 1 \pmod{N}$.

Algorithm 3: Montgomery Multiplication Algorithm

S ← A ∗ B
S ← (S + (S ∗ (−N^{-1}) mod R) ∗ N) / R
if S ≥ N then
    S ← S − N      (conditional reduction)
end
return S
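
Below is a minimal sketch of Algorithm 3 with $R = 2^k$. It assumes a textbook implementation that multiplies by $-N^{-1} \bmod R$, so the division by $R$ is exact and becomes a right shift (`pow(n, -1, r)` needs Python 3.8+). It also returns whether the conditional subtraction was taken, because this data-dependent branch is the one the attack later models. The toy modulus 3233 is only for illustration.

```python
from typing import Tuple


def mont_mul(a: int, b: int, n: int, k: int) -> Tuple[int, bool]:
    """Montgomery multiplication (Algorithm 3) with R = 2**k.

    Assumes n odd, n < 2**k and 0 <= a, b < n.
    Returns (a * b * R^-1 mod n, reduced), where `reduced` records whether
    the final conditional subtraction was taken.
    """
    r = 1 << k
    n_neg_inv = -pow(n, -1, r) % r          # -N^{-1} mod R
    s = a * b
    t = s + (s * n_neg_inv % r) * n         # divisible by R by construction
    assert t % r == 0
    s = t >> k                              # exact division by R = 2**k
    reduced = s >= n                        # the conditional reduction branch
    if reduced:
        s -= n
    return s, reduced


# Toy example: N = 3233 (= 61 * 53), R = 2**12 = 4096 > N.
N, K = 3233, 12
R = 1 << K
a, b = 1234, 2345
A, B = a * R % N, b * R % N                 # Montgomery representations
Z, _ = mont_mul(A, B, N, K)
print(Z == a * b * R % N)                   # prints: True (Z is the Montgomery form of a*b)
```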

Branch Predictor State Machines

Figure: Dynamic 2-bit predictor state machine (two "Predict Taken" and two "Predict Not Taken" states, with transitions on Taken and Not Taken outcomes)

The predictor must miss twice before its prediction changes. Conditional branches that recur in a regular pattern are not detected by this predictor.
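
To make the state machine concrete, here is a small sketch of a 2-bit saturating counter. The numeric encoding of the four states (0 and 1 predict not taken, 2 and 3 predict taken) and the initial state are our assumptions. The usage line shows the two consecutive misses needed to flip the prediction; fed a strictly alternating sequence from a weakly taken state, the counter mispredicts every branch, which illustrates how regularly recurring patterns escape it.

```python
class TwoBitPredictor:
    """2-bit saturating counter: states 0-1 predict not taken, 2-3 predict taken."""

    def __init__(self, state: int = 3):
        self.state = state          # assumed start: 3 = strongly taken

    def predict(self) -> bool:
        return self.state >= 2

    def update(self, taken: bool) -> bool:
        """Record the actual outcome; return True if it was mispredicted."""
        miss = self.predict() != taken
        if taken:
            self.state = min(self.state + 1, 3)
        else:
            self.state = max(self.state - 1, 0)
        return miss


def count_misses(outcomes, state: int = 3) -> int:
    """Number of mispredictions over a sequence of branch outcomes."""
    predictor = TwoBitPredictor(state)
    return sum(predictor.update(t) for t in outcomes)


p = TwoBitPredictor(3)
print(p.update(False), p.update(False), p.predict())  # prints: True True False
```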

Two-Level Adaptive Branch Prediction [12]

Figure: Two-level adaptive branch prediction. The branch history register (contents such as 1111...10) indexes the pattern history table; the selected pattern history bits $S_c$ give the prediction $\lambda(S_c)$ for branch B, and the state transition logic updates them as $S_{c+1} = \delta(S_c, R_c)$ using the branch result $R_c$.

Modelling Branch Miss as Side-Channel from HPC

We monitor branch misses during the square-and-multiply and Montgomery ladder algorithms, both of which use Montgomery multiplication as the subroutine for squaring and multiplication. Branch misses depend on the ability of the branch predictor to correctly predict which future branches will be taken. Performance-monitoring tools that profile HPCs provide a simple user interface to various hardware event counts, and we treat these counts as a side channel.
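
The modelling step can be sketched as follows: run square and multiply in the Montgomery domain, record whether each conditional reduction is taken, and count how often a 2-bit predictor mispredicts that sequence. This is our illustration under simplifying assumptions: the conditional reduction is treated as one branch tracked by one counter that starts weakly not taken, and the modulus and exponent are toy values.

```python
from typing import List, Tuple


def mont_mul(a: int, b: int, n: int, k: int) -> Tuple[int, bool]:
    """Algorithm 3 with R = 2**k; returns (a*b*R^-1 mod n, reduction taken?)."""
    r = 1 << k
    n_neg_inv = -pow(n, -1, r) % r
    s = (a * b + (a * b * n_neg_inv % r) * n) >> k
    if s >= n:
        return s - n, True
    return s, False


def exp_with_trace(m: int, d_bits: List[int], n: int, k: int) -> Tuple[int, List[bool]]:
    """Square and multiply (Algorithm 1, d_0 = 1) in the Montgomery domain.

    Returns m^d mod n and the taken/not-taken outcome of the conditional
    reduction for every Montgomery multiplication, in execution order.
    """
    m_bar = m * (1 << k) % n                    # Montgomery form of m
    s, trace = m_bar, []
    for d in d_bits[1:]:
        s, red = mont_mul(s, s, n, k)           # squaring
        trace.append(red)
        if d == 1:
            s, red = mont_mul(s, m_bar, n, k)   # multiplication
            trace.append(red)
    result, _ = mont_mul(s, 1, n, k)            # leave the Montgomery domain
    return result, trace


def two_bit_misses(trace: List[bool], state: int = 1) -> int:
    """Mispredictions of one 2-bit saturating counter (assumed start: weakly not taken)."""
    misses = 0
    for taken in trace:
        misses += (state >= 2) != taken
        state = min(state + 1, 3) if taken else max(state - 1, 0)
    return misses


N, K = 3233, 12
bits = [1, 0, 1, 1, 0, 1]                        # hypothetical exponent 0b101101 = 45
for m in (2, 3, 5):
    value, trace = exp_with_trace(m, bits, N, K)
    print(m, value == pow(m, 45, N), two_bit_misses(trace))
```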

Approximating the System Predictor with a 2-bit Branch Predictor

Figure: Variation of branch misses observed from performance counters (Perf) with increase in branch misses predicted by the 2-bit dynamic predictor algorithm

A direct correlation is observed between the branch misses from HPCs and those from the simulated 2-bit dynamic predictor over a sample of exponent bitstreams. This confirms the assumption that the 2-bit dynamic predictor approximates the underlying system branch predictor.
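
The slide reports a direct correlation without naming a statistic. One common way to quantify it is Pearson's correlation coefficient between the per-exponent branch-miss counts read from Perf and those predicted by the 2-bit model. The helper below is a hypothetical sketch; its inputs stand in for the reader's own measurements, and no data from the talk is reproduced.

```python
import math
from typing import Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equally long series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length >= 2")
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for a constant series")
    return sxy / math.sqrt(sxx * syy)


# observed: misses read from the HPC for each sampled exponent
# predicted: misses from the simulated 2-bit predictor for the same exponents
```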

Idea of the Attack

In [13], timing attacks exploiting branch mispredictions are demonstrated; they require knowledge of the actual structure of the target system's branch prediction hardware. The advantage of this attack is that the adversary, despite having no knowledge of the underlying architecture, can target real systems and reveal secret exponent bits by exploiting branch misses from HPCs as a side channel. The attack is iterative: it targets the $i$th bit assuming the previous bits are known. It partitions a sample input set according to mispredictions of the conditional reduction in the Montgomery multiplication at the $(i+1)$th squaring step of the exponentiation, under each assumption for the secret $i$th bit.

Threat Model for the attack

The attacker knows the first $i$ bits of the private key $(d_0, d_1, \ldots, d_i, \ldots, d_{n-1})$ and wants to determine the next unknown bit $d_i$. For each input $m$, the attacker generates a trace of branches $(t_{m,1}, t_{m,2}, \ldots, t_{m,i})$ for the conditional reduction of the Montgomery multiplication at every squaring step. Under the assumption that $d_i$ has value $j$, where $j \in \{0, 1\}$, the corresponding value $t^j_{m,i+1}$ is simulated.
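
A sketch of how these traces could be simulated, under our reading of the threat model: bits are processed most significant first with $d_0 = 1$ as in Algorithm 1, and $t_{m,j}$ is the conditional-reduction outcome of the Montgomery squaring in iteration $j$. The squaring of iteration $i$ depends only on $d_0, \ldots, d_{i-1}$, so the trace up to $t_{m,i}$ is computable, while $t^j_{m,i+1}$ needs the assumption $d_i = j$. Function names are hypothetical.

```python
from typing import List, Tuple


def mont_mul(a: int, b: int, n: int, k: int) -> Tuple[int, bool]:
    """Algorithm 3 with R = 2**k; returns (a*b*R^-1 mod n, reduction taken?)."""
    r = 1 << k
    n_neg_inv = -pow(n, -1, r) % r
    s = (a * b + (a * b * n_neg_inv % r) * n) >> k
    return (s - n, True) if s >= n else (s, False)


def squaring_trace(m: int, known_bits: List[int], n: int, k: int) -> List[bool]:
    """Return (t_{m,1}, ..., t_{m,i}) for known bits d_0..d_{i-1} (d_0 = 1)."""
    m_bar = m * (1 << k) % n
    s, trace = m_bar, []
    for d in known_bits[1:]:
        s, red = mont_mul(s, s, n, k)        # squaring of this iteration
        trace.append(red)
        if d == 1:
            s, _ = mont_mul(s, m_bar, n, k)  # multiplication (not traced)
    s, red = mont_mul(s, s, n, k)            # squaring of the next iteration
    trace.append(red)
    return trace


def simulated_next_branch(m: int, known_bits: List[int], j: int, n: int, k: int) -> bool:
    """t^j_{m,i+1}: the (i+1)th squaring's reduction outcome assuming d_i = j."""
    return squaring_trace(m, known_bits + [j], n, k)[-1]


N, K = 3233, 12
known = [1, 0, 1, 1]                          # hypothetical known bits d_0..d_3
m = 42
print(squaring_trace(m, known, N, K),
      simulated_next_branch(m, known, 0, N, K),
      simulated_next_branch(m, known, 1, N, K))
```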

Offline Phase of the Attack

Figure: Partitioning a randomly generated ciphertext set based on simulated branch-miss modelling. For the known bits $d_0, d_1, \ldots, d_{i-1}$, the trace of taken or not-taken branches $t_1, t_2, \ldots, t_i$ is given to a prediction oracle $T$; assuming $d_i = 0$ and $d_i = 1$ yields the simulated next branches $t^0_{i+1}$ and $t^1_{i+1}$. For an input plaintext $m$:
if $T(t_1, t_2, \ldots, t_i) = t^1_{i+1}$, then add $m$ to $M_1$, else add $m$ to $M_2$;
if $T(t_1, t_2, \ldots, t_i) = t^0_{i+1}$, then add $m$ to $M_3$, else add $m$ to $M_4$.

Separation of Random Inputs

1. M1 = {m | m does not cause a miss during MM of (i + 1)th squaring if di = 1}
2. M2 = {m | m causes a misprediction during MM of (i + 1)th squaring if di = 1}
3. M3 = {m | m does not cause a miss during MM of (i + 1)th squaring if di = 0}
4. M4 = {m | m causes a misprediction during MM of (i + 1)th squaring if di = 0}

We ensure that the sets are disjoint: no ciphertext is common to M1 and M3, or to M2 and M4.
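
A sketch of this offline partitioning follows. It assumes the prediction oracle $T$ is a single 2-bit counter, starting weakly not taken, trained on the trace $t_{m,1}, \ldots, t_{m,i}$; the slides also use a two-level adaptive model, which could replace the hypothetical `two_bit_oracle`. Inputs common to $M_1$ and $M_3$, or to $M_2$ and $M_4$, are then removed.

```python
from typing import Iterable, List, Set, Tuple


def mont_mul(a: int, b: int, n: int, k: int) -> Tuple[int, bool]:
    """Algorithm 3 with R = 2**k; returns (a*b*R^-1 mod n, reduction taken?)."""
    r = 1 << k
    n_neg_inv = -pow(n, -1, r) % r
    s = (a * b + (a * b * n_neg_inv % r) * n) >> k
    return (s - n, True) if s >= n else (s, False)


def squaring_trace(m: int, known_bits: List[int], n: int, k: int) -> List[bool]:
    """Reduction outcomes of the Montgomery squarings (Algorithm 1, d_0 = 1)."""
    m_bar = m * (1 << k) % n
    s, trace = m_bar, []
    for d in known_bits[1:]:
        s, red = mont_mul(s, s, n, k)
        trace.append(red)
        if d == 1:
            s, _ = mont_mul(s, m_bar, n, k)
    s, red = mont_mul(s, s, n, k)
    trace.append(red)
    return trace


def two_bit_oracle(trace: List[bool], state: int = 1) -> bool:
    """Assumed oracle T: train a 2-bit counter on the trace, predict the next branch."""
    for taken in trace:
        state = min(state + 1, 3) if taken else max(state - 1, 0)
    return state >= 2


def partition(inputs: Iterable[int], known_bits: List[int], n: int,
              k: int) -> Tuple[Set[int], Set[int], Set[int], Set[int]]:
    m1, m2, m3, m4 = set(), set(), set(), set()
    for m in inputs:
        p = two_bit_oracle(squaring_trace(m, known_bits, n, k))
        t0 = squaring_trace(m, known_bits + [0], n, k)[-1]
        t1 = squaring_trace(m, known_bits + [1], n, k)[-1]
        (m1 if p == t1 else m2).add(m)
        (m3 if p == t0 else m4).add(m)
    common13, common24 = m1 & m3, m2 & m4   # duplicate ciphertexts
    m1 -= common13
    m3 -= common13
    m2 -= common24
    m4 -= common24
    return m1, m2, m3, m4


N, K = 3233, 12
known = [1, 0, 1, 1]                          # hypothetical known bits
M1, M2, M3, M4 = partition(range(2, 400), known, N, K)
print(len(M1), len(M2), len(M3), len(M4))
```

Because branch outcomes are binary, every input that survives duplicate removal has $t^0_{m,i+1} \neq t^1_{m,i+1}$.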

Online Phase

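The probable next bit is decided following Algorithm 4, where MMk denotes the distribution of branch misses observed from performance counters for the inputs in set Mk:

If (avg(MM2) > avg(MM1)) and (avg(MM4) < avg(MM3)), then the next bit nb_i = 1.
Otherwise, if (avg(MM4) > avg(MM3)) and (avg(MM2) < avg(MM1)), then the next bit nb_i = 0.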
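
The online decision rule is small enough to write directly. In this sketch, which is ours, each argument holds the HPC branch-miss counts observed for the inputs of one set; returning `None` when neither condition holds is our choice, since the rule leaves that case open.

```python
from statistics import mean
from typing import Optional, Sequence


def decide_next_bit(mm1: Sequence[float], mm2: Sequence[float],
                    mm3: Sequence[float], mm4: Sequence[float]) -> Optional[int]:
    """Return 1, 0, or None (inconclusive) following the online rule."""
    a1, a2, a3, a4 = mean(mm1), mean(mm2), mean(mm3), mean(mm4)
    if a2 > a1 and a4 < a3:
        return 1
    if a4 > a3 and a2 < a1:
        return 0
    return None
```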

Algorithm 4: Adversary Attack Algorithm

Input: (d_0, d_1, …, d_{i−1}), M
Output: Probable next bit nb_i
begin
    Offline Phase:
    for all m ∈ M do
        Generate the taken/not-taken trace for input m as t_{m,1}, t_{m,2}, …, t_{m,i}
        Assuming d_i = 0 and d_i = 1, generate t^0_{m,i+1} and t^1_{m,i+1} respectively
        p_{m,i+1} = T(t_{m,1}, t_{m,2}, …, t_{m,i})
        if p_{m,i+1} = t^1_{m,i+1} then add m to M1 else add m to M2
        if p_{m,i+1} = t^0_{m,i+1} then add m to M3 else add m to M4
    end
    Remove duplicate ciphertexts in the sets M1, M3 and M2, M4
    Online Phase:
    Observe the distributions of branch misses from performance counters as MM1, MM2, MM3, MM4
    if avg(MM2) > avg(MM1) and avg(MM4) < avg(MM3) then nb_i = 1
    if avg(MM4) > avg(MM3) and avg(MM2) < avg(MM1) then nb_i = 0
    return nb_i
end

Formally Modelling the Success Probability

In the offline phase:

Assuming $d_i = 1$,
$$\Pr[m_1 \in M_1] = \Pr[p_{m_1,i+1} = t^1_{m_1,i+1}], \qquad \Pr[m_2 \in M_2] = \Pr[p_{m_2,i+1} \neq t^1_{m_2,i+1}].$$

Assuming $d_i = 0$,
$$\Pr[m_3 \in M_3] = \Pr[p_{m_3,i+1} = t^0_{m_3,i+1}], \qquad \Pr[m_4 \in M_4] = \Pr[p_{m_4,i+1} \neq t^0_{m_4,i+1}].$$

After removing duplicates, $t^0_{m,i+1} \neq t^1_{m,i+1}$.

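In the online phase:

Let $nb_i$ be the bit that the attacker concludes to be the next secret bit, and let $\overline{M_{M_k}}$ denote the expectation of the distribution $M_{M_k}$ of branch misses over the inputs $m \in M_k$. Thus,
$$\Pr[nb_i = 0] = \Pr[(\overline{M_{M_4}} - \overline{M_{M_3}}) > 0 \wedge (\overline{M_{M_2}} - \overline{M_{M_1}}) < 0],$$
$$\Pr[nb_i = 1] = \Pr[(\overline{M_{M_2}} - \overline{M_{M_1}}) > 0 \wedge (\overline{M_{M_4}} - \overline{M_{M_3}}) < 0].$$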

Formally Modelling the Success Probability

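Let $r_{m,i+1}$ be the $(i+1)$th branch predicted by the real predictor for input $m$, and let $B_{m,i+1}$ be the actual outcome of the $(i+1)$th branch instruction, which depends on the unknown bit $d_i$: if $d_i = 0$ then $B_{m,i+1} = t^0_{m,i+1}$, otherwise $B_{m,i+1} = t^1_{m,i+1}$. A larger average miss count for $M_4$ than for $M_3$ means that inputs in $M_4$ are mispredicted by the real predictor ($r \neq B$) while inputs in $M_3$ are not, and likewise for $M_1$ and $M_2$.

Thus we can rewrite the previous equation as
$$\Pr[nb_i = 0] = \Pr[(\overline{M_{M_4}} - \overline{M_{M_3}}) > 0 \wedge (\overline{M_{M_2}} - \overline{M_{M_1}}) < 0] = \Pr[(r_{m_4,i+1} \neq B_{m_4,i+1}) \wedge (r_{m_3,i+1} = B_{m_3,i+1}) \wedge (r_{m_2,i+1} = B_{m_2,i+1}) \wedge (r_{m_1,i+1} \neq B_{m_1,i+1})] \tag{1}$$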

Formally Modelling the Success Probability

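$$\Pr(\text{Success}) = \Pr[nb_i = d_i] = \Pr[nb_i = 0 \wedge d_i = 0] + \Pr[nb_i = 1 \wedge d_i = 1] = \Pr[nb_i = 0 \mid d_i = 0] \cdot \Pr[d_i = 0] + \Pr[nb_i = 1 \mid d_i = 1] \cdot \Pr[d_i = 1]$$

If $d_i = 0$, we substitute $B_{m,i+1} = t^0_{m,i+1}$ in Equation (1):
$$\Pr[nb_i = 0 \mid d_i = 0] = \Pr[(r_{m_4,i+1} \neq t^0_{m_4,i+1}) \wedge (r_{m_3,i+1} = t^0_{m_3,i+1}) \wedge (r_{m_2,i+1} = t^0_{m_2,i+1}) \wedge (r_{m_1,i+1} \neq t^0_{m_1,i+1})]$$
$$= \Pr[(r_{m_4,i+1} \neq t^0_{m_4,i+1}) \wedge (r_{m_3,i+1} = t^0_{m_3,i+1}) \wedge (r_{m_2,i+1} \neq t^1_{m_2,i+1}) \wedge (r_{m_1,i+1} = t^1_{m_1,i+1})]$$
(since branch outcomes are binary and $t^0_{m_2,i+1} \neq t^1_{m_2,i+1}$, $t^1_{m_1,i+1} \neq t^0_{m_1,i+1}$).

Substituting the events from the offline phase ($p_{m_4,i+1} \neq t^0_{m_4,i+1}$, $p_{m_3,i+1} = t^0_{m_3,i+1}$, $p_{m_2,i+1} \neq t^1_{m_2,i+1}$, $p_{m_1,i+1} = t^1_{m_1,i+1}$),
$$\Pr[nb_i = 0 \mid d_i = 0] = \Pr[(r_{m_4,i+1} = p_{m_4,i+1}) \wedge (r_{m_3,i+1} = p_{m_3,i+1}) \wedge (r_{m_2,i+1} = p_{m_2,i+1}) \wedge (r_{m_1,i+1} = p_{m_1,i+1})] = \Pr[r_{m,i+1} = p_{m,i+1}]$$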

Formally Modelling the Success Probability

Similar calculations give $\Pr[nb_i = 1 \mid d_i = 1] = \Pr[r_{m,i+1} = p_{m,i+1}]$. Combining the equations,
$$\Pr(\text{Success}) = \Pr[r_{m,i+1} = p_{m,i+1}] \cdot [\Pr(d_i = 0) + \Pr(d_i = 1)] = \Pr[r_{m,i+1} = p_{m,i+1}].$$
Thus the probability of success equals the probability that the theoretical predictor's prediction matches that of the real predictor, i.e., how closely the theoretical predictor models the real one.

Experimental Validation for the Online Phase of the Attack

A large input set is partitioned by simulating bimodal and two-level adaptive predictors. The average branch misses are observed from HPCs for each element of the sets M1, M2, M3 and M4. Each set has L = 1000 elements, and the experiment is repeated over I = 1000 iterations. Experiments are performed on various platforms: Intel Core 2 Duo E7400, Intel Core i3 M350 and Intel Core i5-3470.

Experiments on Square and Multiply Algorithm

Figure: Branch misses from HPCs on square and multiply correctly identify the secret bit $d_i = 1$; the ciphertext set is partitioned by simulated misses of the two-level adaptive predictor. Both panels plot the average branch misses from performance counters against iterations: (a) correct assumption $d_i = 1$, M1 (no simulated miss) vs. M2 (misprediction); (b) incorrect assumption $d_i = 0$, M3 (no simulated miss) vs. M4 (misprediction).

Experiments on Montgomery Ladder

Figure: Branch misses from HPCs on the Montgomery ladder correctly identify the secret bit $d_i = 1$; the ciphertext set is partitioned by simulated misses of the two-level adaptive predictor. Both panels plot the average branch misses from performance counters against iterations: (a) correct assumption $d_i = 1$, M1 (no simulated miss) vs. M2 (misprediction); (b) incorrect assumption $d_i = 0$, M3 (no simulated miss) vs. M4 (misprediction).

Comparison with timing as side-channel

Figure: No identification of the secret bit is possible using timing as the side channel with L = 1000 and I = 1000. Both panels plot execution time against iterations: (a) correct assumption $d_i = 1$, M1 (no simulated misprediction) vs. M2 (misprediction); (b) incorrect assumption $d_i = 0$, M3 (no simulated misprediction) vs. M4 (misprediction).

Variation of parameters such as Number of Inputs (L) and Iteration (I)

Variation in separation with increase of Ciphertexts

Figure: Variation in the separation of branch misses for correct secret bit = 1, showing a positive difference for M1 and M2 as the number of ciphertexts (L) increases, with I = 100. Both panels plot the average branch misses from performance counters against the number of inputs (L): (a) correct assumption $d_i = 1$, M1 (no simulated miss) vs. M2 (misprediction); (b) incorrect assumption $d_i = 0$, M3 (no simulated miss) vs. M4 (misprediction).

Variation in separation with increase of Iterations

Figure: Variation in the separation of branch misses for correct secret bit = 0, showing a positive difference for M3 and M4 as the number of iterations (I) increases, with L = 1000. Both panels plot the average branch misses from performance counters against the number of iterations (I): (a) incorrect assumption $d_i = 1$, M1 (no simulated miss) vs. M2 (misprediction); (b) correct assumption $d_i = 0$, M3 (no simulated miss) vs. M4 (misprediction).

RSA-OAEP Randomized Padding Scheme

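Figure: Decryption in the RSA-OAEP procedure [14]. The diagram shows the RSA ciphertext and RSA plaintext, the mask generation functions (MGF), the hash H' of the parameters, and the padding and message, together with the checks "xx = 00?", "correct seed form?" and "H = H'?".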

RSA-OAEP

Separation for RSA-OAEP scheme

Figure: Branch misses from HPCs on an RSA-OAEP implementation correctly identify the secret bit $d_i = 1$; the ciphertext set is partitioned by simulated misses of the bimodal predictor. Both panels are plotted against iterations: (a) correct assumption $d_i = 1$, M1 (no simulated misprediction) vs. M2 (misprediction); (b) incorrect assumption $d_i = 0$, M3 (no simulated misprediction) vs. M4 (misprediction).

Probable Countermeasures

If the inputs to the MM algorithm are masked with two random numbers r1, r2 generated at run time, so that the inputs become ar = a + r1 and br = b + r2, the branch predictor observes branches that depend on r1 and r2. Because r1 and r2 are generated at run time, this masking prevents the adversary from simulating branch misses. The effect of r1 and r2 can be removed by adding correction terms. Other implementations of RSA, such as CRT-RSA, may be more resistant, since the adversary cannot perform the necessary simulations without knowing the prime factors of the RSA modulus. However, under stronger fault models, performance counters remain a threatening side channel.
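
The masking idea can be sketched as below. The slide does not specify the correction terms; here we assume they are three extra Montgomery products taken from the expansion (a + r1)(b + r2) = ab + a·r2 + r1·b + r1·r2, which is our illustration rather than the authors' construction. Every Montgomery product in the sketch, including the corrections, involves at least one fresh random operand, so its reduction branch cannot be simulated from a and b alone.

```python
import random
from typing import Tuple


def mont_mul(a: int, b: int, n: int, k: int) -> Tuple[int, bool]:
    """Algorithm 3 with R = 2**k; returns (a*b*R^-1 mod n, reduction taken?)."""
    r = 1 << k
    n_neg_inv = -pow(n, -1, r) % r
    s = (a * b + (a * b * n_neg_inv % r) * n) >> k
    return (s - n, True) if s >= n else (s, False)


def masked_mont_mul(a: int, b: int, n: int, k: int, rng: random.Random) -> int:
    """Montgomery product of a and b computed from randomly masked operands."""
    r1, r2 = rng.randrange(n), rng.randrange(n)   # fresh masks per call
    ar, br = (a + r1) % n, (b + r2) % n           # masked inputs
    z, _ = mont_mul(ar, br, n, k)                 # branch depends on ar, br
    # assumed correction terms: (a*r2 + r1*b + r1*r2) * R^-1 mod n
    c = (mont_mul(a, r2, n, k)[0]
         + mont_mul(r1, b, n, k)[0]
         + mont_mul(r1, r2, n, k)[0]) % n
    return (z - c) % n


rng = random.Random(2015)
N, K = 3233, 12
print(masked_mont_mul(1234, 2345, N, K, rng) == mont_mul(1234, 2345, N, K)[0])  # prints: True
```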

Conclusion

Experiments show that HPCs form a threatening side channel for existing implementations of RSA-like ciphers and similar implementations. The information provided by performance counters should be computed so as to assess performance, without providing a mechanism to extract secret information.

References

[1] W. Korn, P. J. Teller, and G. Castillo. Just how accurate are performance counters?, pages 303–310. 2001.
[2] Vincent M. Weaver and Sally A. McKee. Can hardware performance counters be trusted? In David Christie, Alan Lee, Onur Mutlu, and Benjamin G. Zorn, editors, 4th International Symposium on Workload Characterization (IISWC 2008), Seattle, Washington, USA, September 14–16, 2008, pages 141–150. IEEE, 2008.
[3] Leif Uhsadel, Andy Georges, and Ingrid Verbauwhede. Exploiting hardware performance counters. In Luca Breveglieri, Shay Gueron, Israel Koren, David Naccache, and Jean-Pierre Seifert, editors, FDTC, pages 59–67. IEEE Computer Society, 2008.
[4] John Demme, Matthew Maycock, Jared Schmitz, Adrian Tang, Adam Waksman, Simha Sethumadhavan, and Salvatore J. Stolfo. On the feasibility of online malware detection with performance counters. In Avi Mendelson, editor, ISCA, pages 559–570. ACM, 2013.
[5] Xueyang Wang and Ramesh Karri. NumChecker: detecting kernel control-flow modifying rootkits by using hardware performance counters. In DAC, page 79. ACM, 2013.
[6] Kris Tiri, Onur Acıiçmez, Michael Neve, and Flemming Andersen. An Analytical Model for Time-Driven Cache Attacks. In Alex Biryukov, editor, FSE, volume 4593 of Lecture Notes in Computer Science, pages 399–413. Springer, 2007.
[7] Tetsuo Takata, Platform Solutions Business Unit, System Platforms Sector, NTT DATA CORPORATION. Perf for User Space Program Analysis. [Online]. Available: http://events.linuxfoundation.org/sites/events/files/lcjp13_takata.pdf, 2013.
[8] Arnaldo Carvalho de Melo, Linux Kongress, September 2010. The New Linux ‘perf’ tools. [Online]. Available: http://www.linux-kongress.org/2010/slides/lk2010-perf-acme.pdf, 2010.
[9] Vincent M. Weaver, University of Maine. Linux perf event features and overhead. In 2013 FastPath Workshop, 2013.
[10] Paul C. Kocher. Timing Attacks on Implementations of Diffie-Hellman, RSA, DSS, and Other Systems. In Neal Koblitz, editor, CRYPTO ’96: Proceedings of the 16th Annual International Cryptology Conference on Advances in Cryptology, volume 1109 of Lecture Notes in Computer Science, pages 104–113, London, UK, 1996. Springer-Verlag.
[11] Peter L. Montgomery. Modular Multiplication without Trial Division. Mathematics of Computation, 44(170):519–521, 1985.
[12] Tse-Yu Yeh and Yale N. Patt. Two-level adaptive training branch prediction. In MICRO, pages 51–61, 1991.
[13] Onur Acıiçmez, Çetin Kaya Koç, and Jean-Pierre Seifert. Predicting Secret Keys Via Branch Prediction. In Masayuki Abe, editor, CT-RSA, volume 4377 of Lecture Notes in Computer Science, pages 225–242. Springer, 2007.
[14] James Manger. A chosen ciphertext attack on RSA optimal asymmetric encryption padding (OAEP) as standardized in PKCS #1 v2.0. In CRYPTO, pages 230–238, 2001.