Non-transient Side Channels. Mengjia Yan, Fall 2020, 6.888 (lecture slides)



SLIDE 1

Non-transient Side Channels

Mengjia Yan Fall 2020

6.888 L5-Non-transient Side Channels 1

SLIDE 2

Lab Assignment

  • Handout on the course website
  • Each (regular) student will receive an email
  • Solo or 2-person groups
  • Individual GitHub repo
  • Info about accessing a server machine
  • Listeners: send us an email if you want to try the lab
  • Advice:
  • Start early. The first step is not to implement the attack, but to reverse engineer the machine.

6.888 L5-Non-transient Side Channels 2


SLIDE 7

Recap: Prime+Probe

[Figure: sender and receiver share a cache; one cache set with 8 ways is shown. Timeline: the receiver primes the set with its own lines, waits while the sender accesses its line, then probes its own lines again.]

Receive “1” = 8 accesses → 1 miss

6.888 L5-Non-transient Side Channels 5


SLIDE 10

Analogy: Bucket/Ball

[Figure: shared cache between sender and receiver. Each cache set is a bucket that can hold 8 balls (# ways); the sender’s and receiver’s addresses are balls thrown into buckets.]

6.888 L5-Non-transient Side Channels 6

How many cache lines are there in total in the system? How do we find the bucket used by the sender?
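The first question has a mechanical answer. A minimal C sketch, using toy numbers of our own choosing (not the slide's):

```c
#include <stdint.h>

/* Toy geometry: a 4KB cache with 64B lines and 8 ways. */
#define CACHE_SIZE 4096
#define LINE_SIZE  64
#define WAYS       8

/* Total cache lines = cache size / line size.
 * Buckets (sets) = total lines / ways: each bucket holds WAYS balls. */
enum {
    TOTAL_LINES = CACHE_SIZE / LINE_SIZE,
    NUM_SETS    = TOTAL_LINES / WAYS
};
```

With these numbers, the cache holds 64 lines in 8 buckets of 8 balls each.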

SLIDE 11

Practical Cache Side Channels

6.888 L5-Non-transient Side Channels 7


SLIDE 14

Cache Mapping – Directly Mapped Cache

[Figure: a direct-mapped cache with sets 0-7, each holding a tag and 64 bytes of data, indexed by a 32-bit physical address]

  • Cache mapping can be thought of as a hash table of limited size
  • Linear cache set mapping using modular arithmetic

Set Index = (Addr / Block Size) % Number of Sets

6.888 L5-Non-transient Side Channels 8
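The formula can be written directly in C. A minimal sketch; the 8-set, 64-byte-line geometry is the slide's toy example:

```c
#include <stdint.h>

#define LINE_SIZE 64   /* bytes per cache line */
#define NUM_SETS  8    /* toy direct-mapped cache with 8 sets */

/* Set Index = (Addr / Block Size) % Number of Sets */
static unsigned set_index(uintptr_t addr) {
    return (addr / LINE_SIZE) % NUM_SETS;
}

/* Number of bits needed for the set index = log2(number of sets). */
static unsigned index_bits(unsigned num_sets) {
    unsigned bits = 0;
    while (num_sets > 1) { num_sets >>= 1; bits++; }
    return bits;
}
```

For the later quiz question (a 1MB L2 with 1024 sets), `index_bits(1024)` gives 10.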


SLIDE 18

Cache Mapping – Directly Mapped Cache

[Figure: a direct-mapped cache with sets 0-7, each holding a tag and 64 bytes of data. Physical address: Tag (high-order bits 31-9, to distinguish addresses in the same set) | Set Index (3 bits, bits 8-6) | Line offset (6 bits, bits 5-0)]

  • Cache mapping can be thought of as a hash table of limited size
  • Linear cache set mapping using modular arithmetic

Number of bits for set index = log2(Number of sets)

Question: Given a 1MB L2 with 1024 sets, how many bits are used for the set index? (log2(1024) = 10 bits)

6.888 L5-Non-transient Side Channels 9

Assuming byte-addressable



SLIDE 23

Cache Mapping – Set Associative Cache

  • Cache mapping can be thought of as a hash table of limited size
  • Linear cache set mapping using modular arithmetic

[Figure: a 2-way cache with sets 0-7, each set holding two ways (tag + data). Physical address: Tag (high-order bits) | Set Index (3 bits) | Line offset (6 bits)]

Question: How to decide which way to use?
Answer: The cache replacement policy.

Find an eviction set == find addresses with the same set-index bits

6.888 L5-Non-transient Side Channels 10
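That takeaway (same set-index bits ⇒ same set) suggests a simple way to generate conflicting addresses: step by the cache's "period", sets × line size. A sketch under the slide's toy geometry; the helper names are ours:

```c
#include <stdint.h>
#include <stddef.h>

#define LINE_SIZE 64
#define NUM_SETS  8    /* toy geometry from the slide, not a real LLC */
#define WAYS      2

/* Two addresses map to the same set iff their set-index bits agree. */
static int same_set(uintptr_t a, uintptr_t b) {
    return ((a / LINE_SIZE) % NUM_SETS) == ((b / LINE_SIZE) % NUM_SETS);
}

/* Fill `out` with n addresses congruent to `target` modulo
 * NUM_SETS * LINE_SIZE: every candidate lands in the target's set,
 * so WAYS such candidates fill that set and force an eviction. */
static void eviction_candidates(uintptr_t target, uintptr_t *out, size_t n) {
    uintptr_t stride = (uintptr_t)NUM_SETS * LINE_SIZE;
    for (size_t i = 0; i < n; i++)
        out[i] = target + (i + 1) * stride;
}
```

On a real machine this only works directly if the attacker controls the set-index bits of the physical address, which is what the next slides are about.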


SLIDE 26

Address Translation (4KB page)

Programmer’s view, Virtual Address (48 bits): Virtual page number (bits 47-12) | Page offset (12 bits)
System’s view, Physical Address (32 bits): Physical page number (bits 31-12) | Page offset (12 bits)

The page table translates the virtual page number to the physical page number; the page offset is copied unchanged.

6.888 L5-Non-transient Side Channels 11


SLIDE 34

Find Eviction Set Using Virtual Addresses

Virtual Address (48 bits): Virtual page number (bits 47-12) | Page offset (12 bits)
Physical Address (32 bits, 4KB page): Physical page number (bits 31-12) | Page offset (12 bits)

Cache mapping (8 sets): Tag | Index (3 bits) | Line offset (6 bits)
Cache mapping (256 sets): Tag | Set Index (8 bits) | Line offset (6 bits)

With 256 sets the index spans bits 13-6, so its top 2 bits lie in the physical page number: not controllable via the virtual address.

6.888 L5-Non-transient Side Channels 12


SLIDE 37

Huge Pages

  • Huge page size: 2MB or 1GB
  • Number of bits for the page offset?

Virtual Address, 4KB page: Virtual page number (bits 47-12) | Page offset (12 bits)
Virtual Address, 2MB page: Virtual page number (bits 47-21) | Page offset (21 bits)

Cache mapping (256 sets): Tag | Set Index (8 bits) | Line offset (6 bits)

With a 2MB page, the 21 offset bits cover the full line offset plus set index (14 bits), so the set is fully controllable from the virtual address.

6.888 L5-Non-transient Side Channels 13
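To see what huge pages buy, count how many set-index bits fall inside the page offset. A sketch using the slide's 256-set, 64B-line geometry; the helper is ours:

```c
#define LINE_OFFSET_BITS 6   /* 64B lines */
#define INDEX_BITS       8   /* 256 sets, as on the slide */

/* Set-index bits that lie inside the page offset are controllable
 * from the virtual address; the rest depend on the physical page
 * the OS happens to allocate. */
static int controllable_index_bits(int page_offset_bits) {
    int in_offset = page_offset_bits - LINE_OFFSET_BITS;
    if (in_offset < 0) in_offset = 0;
    if (in_offset > INDEX_BITS) in_offset = INDEX_BITS;
    return in_offset;
}
```

4KB pages (12 offset bits) give 6 of the 8 index bits, leaving the 2 uncontrollable bits from the previous slide; 2MB pages (21 offset bits) give all 8.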


SLIDE 40

Multi-level Caches

  • Motivation: a memory cannot be both large and fast, so levels of cache are added to reduce the miss penalty

                  L1-I/D cache   L2 cache   L3 cache (LLC)   DRAM
Size              32KB           256KB      1MB/core         16GB
Ways              4 or 8         8          16               N/A
Latency (cycles)  1-5            12         ~40              ~150

A typical configuration of Intel Ivy Bridge; configurations differ across processor types.

[Figure: two cores, each with private I-L1/D-L1 and L2, sharing the LLC]

6.888 L5-Non-transient Side Channels 14


SLIDE 44

Multi-level Caches

  • Motivation: a memory cannot be both large and fast, so levels of cache are added to reduce the miss penalty
  • The LLC is generally divided into multiple slices
  • A conflict happens only if addresses map to the same slice and the same set

[Figure: two cores, each with private I-L1/D-L1 and L2, sharing a sliced LLC]

Physical address: Tag | Set Index | Line offset
Slice ID = Hash(address bits), an undocumented hash function

6.888 L5-Non-transient Side Channels 15
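The slice hash is undocumented, but its general shape is known from reverse-engineering work: each slice-ID bit is the parity (XOR) of a subset of physical-address bits. The masks below are made up for illustration; they are not Intel's real masks:

```c
#include <stdint.h>

/* Parity (XOR of all bits) of x. */
static unsigned parity(uint64_t x) {
    unsigned p = 0;
    while (x) { p ^= 1; x &= x - 1; }   /* clear lowest set bit */
    return p;
}

/* Hypothetical slice hash for 4 slices: each slice-ID bit is the
 * parity of the physical address ANDed with a fixed bit mask.
 * These masks are invented; real ones differ per processor. */
static unsigned slice_id(uint64_t paddr) {
    const uint64_t mask0 = 0x1b5f575440ULL;  /* hypothetical */
    const uint64_t mask1 = 0x2eb5faa880ULL;  /* hypothetical */
    return (parity(paddr & mask1) << 1) | parity(paddr & mask0);
}
```

Because the hash mixes high address bits, an attacker cannot tell the slice from a virtual address alone, which is one more obstacle to building eviction sets.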


SLIDE 47

Eviction Set Construction Algorithm

[Figure: timeline over a shared cache. The receiver accesses the candidate addresses, waits while the victim accesses the target address, then measures the latency of each candidate address.]

Vila et al. Theory and Practice of Finding Eviction Sets. S&P’19

6.888 L5-Non-transient Side Channels 18
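The core test such algorithms repeat is "does this candidate set evict the target?". A minimal C sketch, x86 only; the names are ours, and the threshold constant is machine-dependent and must be calibrated on the target machine:

```c
#include <stdint.h>
#include <x86intrin.h>   /* __rdtsc, _mm_mfence: x86 only */

#define MISS_THRESHOLD 120   /* cycles; must be calibrated per machine */

/* Time a single load of *addr in cycles. */
static uint64_t timed_load(volatile char *addr) {
    _mm_mfence();
    uint64_t t1 = __rdtsc();
    (void)*addr;             /* volatile read: the load is performed */
    _mm_mfence();
    return __rdtsc() - t1;
}

/* Load the target, touch every candidate, then reload the target:
 * a slow reload means the candidates evicted it. */
static int evicts(volatile char *target, volatile char **cand, int n) {
    (void)*target;                   /* bring target into the cache */
    for (int i = 0; i < n; i++)
        (void)*cand[i];              /* access all candidates */
    return timed_load(target) > MISS_THRESHOLD;
}
```

Vila et al. then shrink a large candidate set by group testing: repeatedly discard subsets whose removal still leaves `evicts()` true.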


SLIDE 52

Problems Due to Replacement Policy

  • Self-eviction due to the replacement policy
  • An LRU (least recently used) example:
      Initial: (empty set)
      Prime: fill the 8-way set (lines 6 7 5 8 2 3 1 4)
      Victim access: line 9 evicts the least recently used line
      Probe: which line gets evicted next? Probing in the same order as the prime keeps evicting the very line about to be accessed (self-eviction)
  • A small trick: access the addresses in reverse order

6.888 L5-Non-transient Side Channels 19
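The reverse-order trick is a one-line change in the probe loop. A sketch (x86 only; the function name and latency-recording interface are ours):

```c
#include <stdint.h>
#include <x86intrin.h>   /* __rdtsc, _mm_mfence: x86 only */

/* Probe in the reverse of prime order, so the most recently primed
 * (MRU) lines are touched first and LRU self-eviction is avoided.
 * Records each access latency; a high value marks an evicted line. */
static void probe_reverse(volatile char **addrs, uint64_t *lat, int n) {
    for (int i = n - 1; i >= 0; i--) {
        _mm_mfence();
        uint64_t t = __rdtsc();
        (void)*addrs[i];
        _mm_mfence();
        lat[i] = __rdtsc() - t;
    }
}
```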


SLIDE 55

Measure Latency of Multiple Accesses

  • HW prefetcher + out-of-order execution

    T1 = rdtsc()
    Dummy1 = Ld(Addr1)
    ......
    Dummy8 = Ld(Addr8)
    T2 = rdtsc()
    Latency = T2 - T1

What we expect: Ld A1, Ld A2, ..., Ld A7, Ld A8 issued one after another.
What actually happens: the loads are independent, so the prefetcher and the out-of-order core overlap them in time.

6.888 L5-Non-transient Side Channels 20


SLIDE 59

Out-of-Order Processor

Pipeline: Fetch → Decode → RegRead → Execute → Writeback (Commit)
RegRead checks whether the register to be read is ready; independent loads (Ld A1, Ld A2, ..., Ld A8) are all ready at once, so they execute overlapped, out of order.

Question: How to serialize data accesses?

6.888 L5-Non-transient Side Channels 21


SLIDE 63

Serialize Data Accesses

  • A special instruction: mfence
  • Add a data dependency by creating a linked list: each node stores a pointer to the next node, turning the independent Dummy1 = Ld(Addr1) into the dependent Addr2 = Ld(Addr1)
  • Use a doubly linked list to access the addresses in reverse order

[Figure: nodes A1 → A2 → A3 → ..., each holding dummy content plus a pointer to the next node; the doubly linked version adds back pointers]

6.888 L5-Non-transient Side Channels 22

https://www.felixcloutier.com/x86/mfence
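A minimal sketch of the linked-list (pointer-chasing) measurement, assuming x86 and a GCC-style compiler; the node layout and names are ours:

```c
#include <stdint.h>
#include <stdlib.h>
#include <x86intrin.h>   /* __rdtsc, _mm_mfence: x86 only */

/* One node per 64-byte cache line; the payload is the pointer to the
 * next node, so every load depends on the previous load's result and
 * out-of-order execution cannot overlap them. */
typedef struct node {
    struct node *next;
    char pad[64 - sizeof(struct node *)];
} node_t;

/* Link n nodes into a circular list in forward order. */
static void build_list(node_t *nodes, int n) {
    for (int i = 0; i < n; i++)
        nodes[i].next = &nodes[(i + 1) % n];
}

/* Chase the list for `steps` dependent loads; return elapsed cycles. */
static uint64_t timed_chase(node_t *head, int steps) {
    node_t *p = head;
    _mm_mfence();
    uint64_t t1 = __rdtsc();
    for (int i = 0; i < steps; i++)
        p = p->next;                     /* serialized pointer chase */
    _mm_mfence();
    uint64_t t2 = __rdtsc();
    __asm__ volatile("" :: "r"(p));      /* keep the loads alive (GCC/Clang) */
    return t2 - t1;
}
```

On the lab machine one would compare the per-load latency of a list whose nodes all map to one cache set against a list spread across sets.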


SLIDE 65

Handle Noise

  • A real-world example: square-and-multiply exponentiation

What you generally see in papers:

    for i = n-1 downto 0 do
        r = sqr(r) mod n
        if e_i == 1 then
            r = mul(r, b) mod n
        end
    end

6.888 L5-Non-transient Side Channels 23
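The loop above, written as a C sketch. This is toy single-word arithmetic, only valid for small moduli; a real RSA implementation uses multiprecision routines:

```c
#include <stdint.h>

/* Square-and-multiply, scanning exponent bits from MSB to LSB.
 * The multiply happens only when the exponent bit is 1: exactly
 * the secret-dependent behavior the cache attack observes.
 * (r*r overflows uint64_t for moduli above 32 bits: toy code only.) */
static uint64_t mod_exp(uint64_t b, uint64_t e, uint64_t n) {
    uint64_t r = 1;
    b %= n;
    for (int i = 63; i >= 0; i--) {
        r = (r * r) % n;          /* sqr: every iteration */
        if ((e >> i) & 1)
            r = (r * b) % n;      /* mul: only when bit i is 1 */
    }
    return r;
}
```

An attacker who can tell "sqr only" iterations from "sqr then mul" iterations reads off the exponent bits directly.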

SLIDE 66

The Multiply Function

6.888 L5-Non-transient Side Channels 24


SLIDE 68

Raw Trace

[Figure: raw trace of access latencies measured in the probe operation of Prime+Probe. A sequence “01010111011001” can be deduced as part of the exponent.]

6.888 L5-Non-transient Side Channels 25

SLIDE 69

Other Problems May Exist

  • Tips for the lab assignment:
  • Build the attack step by step
  • Recommended reading: “Last-Level Cache Side-Channel Attacks are Practical”
  • Ask questions via Piazza

6.888 L5-Non-transient Side Channels 26

SLIDE 70

Defenses

6.888 L5-Non-transient Side Channels 27


SLIDE 75

Micro-architecture Side Channels

[Figure: the victim’s secret-dependent execution modulates a channel (a micro-architecture structure), which the attacker observes. Channels may be {transient, non-transient} and built on {cache, DRAM, TLB, NoC, etc.}]

Kiriansky et al. DAWG: a defense against cache timing attacks in speculative execution processors. MICRO’18

6.888 L5-Non-transient Side Channels 28


SLIDE 78

Micro-architecture Side Channels

[Figure: victim → channel → attacker, as before]

Defenses:
  • Block creation of signals: oblivious execution, speculative execution defenses, etc.
  • Close the channel: isolation, etc.
  • Block detection of signals: randomization, etc.

Kiriansky et al. DAWG: a defense against cache timing attacks in speculative execution processors. MICRO’18

6.888 L5-Non-transient Side Channels 29

SLIDE 79

Defense Design Considerations

[Figure: design trade-off triangle: Security, Performance, Portability]

6.888 L5-Non-transient Side Channels 30

SLIDE 80

The Problem: The ISA Abstraction

  • Interface between HW and SW: the ISA
  • Advantage: HW optimizations without affecting usability/portability

    Software (branches, arithmetic instructions, loads/stores)
    ISA (instruction set architecture)
    Hardware (caches, DRAM, TLBs, etc.)

6.888 L5-Non-transient Side Channels 31

SLIDE 81

[Figure: screenshot of the x86 instruction set reference]

6.888 L5-Non-transient Side Channels 32

From https://www.felixcloutier.com/x86/index.html

SLIDE 82

The Problem: The ISA Abstraction

  • Interface between HW and SW: the ISA
  • The ISA specifies functionality, not performance/timing
  • Compare Intel Ivy Bridge and Cascade Lake processors

    Software (branches, arithmetic instructions, loads/stores)
    ISA (instruction set architecture)
    Hardware (caches, DRAM, TLBs, etc.)

Example: DEC [addr]

6.888 L5-Non-transient Side Channels 33


SLIDE 86

Data Oblivious / “Constant Time” Programming

Write the program without data-dependent behavior.

secret = confidential, addr1 = public, addr2 = public

Original:
    if (secret) a = *(addr1);
    else        a = *(addr2);

Data oblivious:
    a ← load(addr1);
    b ← load(addr2);
    cmov a = (secret) ? a : b;

[Figure: dataflow DAG: addr1 → load → a, addr2 → load → b; a, b, and secret feed the cmov]

6.888 L5-Non-transient Side Channels 34
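The cmov transformation can be sketched branchlessly in C. `oblivious_load` is our name, and whether the compiler emits an actual cmov instruction is not guaranteed, which is why the explicit mask form is shown:

```c
#include <stdint.h>

/* Both addr1 and addr2 are always dereferenced, so neither the
 * control flow nor the memory access pattern depends on `secret`.
 * The all-ones/all-zeros mask blends the two values without a branch. */
static uint64_t oblivious_load(int secret, const uint64_t *addr1,
                               const uint64_t *addr2) {
    uint64_t a = *addr1;                  /* always loaded */
    uint64_t b = *addr2;                  /* always loaded */
    uint64_t mask = (uint64_t)0 - (uint64_t)(secret != 0);  /* 0 or ~0 */
    return (a & mask) | (b & ~mask);      /* secret ? a : b */
}
```

The original branching version touches only one of the two addresses, which is exactly the cache footprint Prime+Probe recovers.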

SLIDE 87

Programming in Circuit Abstraction

  • Program = DAG (“circuit”)
  • Operations = nodes (“gates”)
  • Data transfers = edges (“wires”)
  • Topology must be confidential data-independent
  • Each gate’s execution must hide its inputs
  • Each wire must hide the value it carries
[Figure: example circuit DAG with inputs p1-p4; nodes are gates, edges are wires]

6.888 L5-Non-transient Side Channels 35


SLIDE 91

What assumptions underpin the model?

  • Rule 1: instruction/gate execution = confidential data-independent
  • Rule 2: data transfer/wire = confidential data-independent
  • Rule 3: circuit/program topology = fixed

secret = confidential, addr1 = public, addr2 = public

    if (secret) a = *(addr1); else a = *(addr2);

becomes

    a ← load addr1
    b ← load addr2
    cmov secret, b, a

36

SLIDE 92

Today’s machines can violate these assumptions

  • Rule 1: instruction/gate execution = confidential data-independent
  • Rule 2: data transfer/wire = confidential data-independent
  • Rule 3: circuit/program topology = fixed

Violations due to: data-dependent instruction optimizations (e.g., zero-skip, early exit, microcode, silent stores, …)

[Figure: the cmov dataflow DAG (a ← load addr1; b ← load addr2; cmov secret, b, a)]

37

SLIDE 93

Today’s machines can violate these assumptions

  • Rule 1: instruction/gate execution = confidential data-independent
  • Rule 2: data transfer/wire = confidential data-independent
  • Rule 3: circuit/program topology = fixed

Violations due to: data-at-rest optimizations (e.g., compression in the register file, cache, or page tables; uop fusion, …)

[Figure: the cmov dataflow DAG]

38

SLIDE 94

Today’s machines can violate these assumptions

  • Rule 1: instruction/gate execution = confidential data-independent
  • Rule 2: data transfer/wire = confidential data-independent
  • Rule 3: circuit/program topology = fixed

Violations due to: speculative/out-of-order execution

[Figure: the cmov dataflow DAG]

39


SLIDE 97

HW Resource Partition

  • Security vs. Quality of Service (QoS)
  • Intel Cache Allocation Technology (CAT)
  • Temporal partitioning vs. spatial partitioning
  • Challenges nowadays:
  • Determining security domains is tricky
  • Scalability: what if #domains > #partitions?
  • How to partition inside cores?
  • Why not execute applications on a single node?

6.888 L5-Non-transient Side Channels 40


SLIDE 101

Randomization/Fuzzing

  • Introduce noise into time measurement / make time measurement coarse-grained
  • Pros and cons?
      + Simple, no performance overhead
      + Effective against a group of popular attacks
      - Not effective against attacks that do not measure time
      - Not effective for victims that cause large timing differences
      - Hurts usability if a benign application needs a fine-grained timer
  • Randomize cache mapping functions
  • Pros and cons?
      + Generally low performance overhead (the cache can still be shared)
      - Difficult to reason about security
      +/- Can reduce attack bandwidth, but unlikely to eliminate attacks

6.888 L5-Non-transient Side Channels 41

SLIDE 102

Next Lecture: Transient Side Channels

6.888 L5-Non-transient Side Channels 42