

SLIDE 1

Bankrupt Covert Channel:

Turning Network Predictability into Vulnerability

Dmitrii Ustiugov, Plamen Petrov, Siavash Katebzadeh, Boris Grot
University of Edinburgh

This work is supported by ARM Center of Excellence at University of Edinburgh

SLIDE 2

Data Breaches Have Never Been More Relevant


SLIDE 3

Containing Data Breaches in Public Cloud

Data breaches happen ☹

  • Spyware, side channels, …

Cloud vendors strive to contain stolen info

  • Firewalls, authentication, …


[Figure: a Spy with a secret inside the secure cloud environment; the Receiver in the outside world. How to extract the secret?]

SLIDE 4

Containing Data Breaches in Public Cloud

Data breaches happen ☹

  • Spyware, side channels, …

Cloud vendors strive to contain stolen info

  • Firewalls, authentication, …


Isolation layers:

  • Process and virtual machine isolation
  • Physical server isolation
  • Virtual network isolation

How to extract the secret?

[Figure: Spy with secret behind the isolation layers of the secure cloud environment; Receiver in the outside world. Image by Lynn Willis.]

SLIDE 5

Containing Data Breaches in Public Cloud

Data breaches happen ☹

  • Spyware, side channels, …

Cloud vendors strive to contain stolen info

  • Firewalls, authentication, …


[Figure: Spy with secret behind VM, physical server, and virtual network isolation layers; Receiver in the outside world.]

Secret extraction?

Are secrets safe now?

SLIDE 6

Covert Channels


Definition: Communication without using legitimate data transfer mechanisms

  • Usually via resource sharing (e.g., CPU cache)
  • Example: Timing channel via access latency modulation

○ High latency for transmitting “1”, low for “0”

Image by Rick Leche (flipped vertically)

Covert channels allow bypassing isolation layers
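The latency-modulation example above can be sketched as a toy in-process simulation (illustration only, not an actual cross-VM channel; the delay values, noise range, and threshold are invented):

```python
import random

def transmit(bits, high_delay=2.0, low_delay=1.0):
    """Sender: induce high access latency for '1', low for '0' (plus noise)."""
    return [(high_delay if b else low_delay) + random.uniform(-0.2, 0.2)
            for b in bits]

def receive(latencies, threshold=1.5):
    """Receiver: threshold its own latency measurements to recover bits."""
    return [1 if lat > threshold else 0 for lat in latencies]

secret = [1, 0, 1, 1, 0, 0, 1, 0]
assert receive(transmit(secret)) == secret  # bits recovered without a direct message
```

With a comfortable gap between the two delay levels, a fixed threshold decodes reliably; the rest of the talk is about engineering that gap over a real network.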

SLIDE 7

Network Covert Channels


Allow communication across a cluster/datacenter
Breach many isolation layers at once
Stereotypical thinking: networks are noisy ⇨ low-accuracy, low-throughput channels

[Figure: Spy with secret behind VM, physical server, and virtual network isolation layers; Receiver extracts the secret through the network.]

Secret extraction

But… Are modern networks noisy?

SLIDE 8

Emerging Networks in Public Cloud

Remote Direct Memory Access (RDMA)

  • Today most cloud providers offer RDMA networks

AWS, Azure, Alibaba, Oracle, …

RDMA network packets bypass destination CPU

  • Low round-trip latency: 2-4μs
  • High BW with commodity NICs: 100+Gb/s

Nodes use one-sided reads/writes to their private data in a remote node’s memory


[Figure: Node A’s NIC issues a one-sided RDMA read over the RDMA network to A’s remote region in Node B’s memory, bypassing Node B’s CPU.]

SLIDE 9

Network BW vs. Memory BW Discrepancy


First glance at bandwidth in modern servers

  • RDMA NICs offer 100-200Gb/s
  • Memory delivers >100GB/s (=800Gb/s)

Expectation: Memory BW always much larger

[Figure: Node A and Node B, each with CPU, NIC, and memory; network BW 100Gb/s vs. memory BW 100GB/s (=800Gb/s).]

SLIDE 10

Network BW vs. Memory BW Discrepancy


First glance at bandwidth in modern servers

  • RDMA NICs offer 100-200Gb/s
  • Memory delivers >100GB/s (=800Gb/s)

Expectation: Memory BW always much larger. Wrong!

  • Memory has 100s of internal devices (banks)
  • Each bank delivers just ~10Gb/s

○ E.g., same for both Micron DDR2 and DDR4
○ Bank behaves as a FIFO: ~50ns fixed service time

CPU NIC

Node A

CPU NIC

Node B

Memory

Memory

Network BW 100Gb/s

100s of banks

Memory BW 100GB/s=800Gb/s

10Gb/s

Network traffic can easily congest one memory bank
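The unit arithmetic behind this slide, as a quick sanity check (all figures are the slide's approximations):

```python
# Aggregate numbers from the slide (approximate).
network_bw_gbps = 100                 # RDMA NIC: 100 Gb/s
memory_bw_gbps = 100 * 8              # memory: 100 GB/s = 800 Gb/s
bank_bw_gbps = 10                     # a single bank: ~10 Gb/s

# At first glance, aggregate memory BW dwarfs the NIC...
assert memory_bw_gbps == 800 and memory_bw_gbps == 8 * network_bw_gbps

# ...but one bank serves only ~10 Gb/s, so NIC-rate traffic aimed at a
# single bank oversubscribes it by 10x:
assert network_bw_gbps // bank_bw_gbps == 10
```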

SLIDE 11

Bankrupt: RDMA Intra-Cluster Covert Channel


Key features

  • No direct communication between Sender and Receiver
  • Extremely stealthy!

Basic idea

  • Sender transmits the secret by modulating the latency of one memory bank on an Intermediary node
  • Receiver probes the bank latency and decodes the message
  • Intermediary is an unrelated, innocuous node

○ No shared memory between Sender and Receiver

[Figure: Sender issues bursts of RDMA reads over the RDMA network to one bank in the Intermediary’s memory; Receiver probes the same bank with individual RDMA reads and decodes 1/0 from high/low bank delay over the timeline.]
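The basic idea can be sketched as a minimal simulation, assuming an idealized FIFO bank with the slide's ~50ns fixed service time (not real RDMA code; the burst size and decoding threshold are illustrative):

```python
SERVICE_NS = 50  # fixed per-request service time of a FIFO bank (slide's ~50ns)

def bank_delay(outstanding):
    """A probe queues behind all outstanding requests in the bank's FIFO."""
    return (outstanding + 1) * SERVICE_NS

def sender_schedule(bits, burst=32):
    """Per transmission period: a burst of RDMA reads for '1', idle for '0'."""
    return [burst if b else 0 for b in bits]

def receiver_decode(probe_delays, threshold=8 * SERVICE_NS):
    """Receiver: a high probe delay means the Sender was bursting ('1')."""
    return [1 if d > threshold else 0 for d in probe_delays]

secret = [1, 0, 1, 1, 0, 1, 0, 0]
probe_delays = [bank_delay(q) for q in sender_schedule(secret)]
assert receiver_decode(probe_delays) == secret
```

Real probes also see network noise on top of the bank delay; the burst size (here 32) is exactly the knob the evaluation slides tune.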

SLIDE 12

Constructing Bankrupt


1. Search for addresses that map to target bank

  • Challenge: CPU hashes addresses to determine the bank
  • Sender and Receiver search addresses independently
  • Addresses are different for Sender and Receiver

○ Recall: No memory sharing!

2. Determining communication parameters

  • Sender side

○ How many RDMA reads per burst?
○ Transmission frequency?

  • Receiver side

○ Receiving (probes) frequency?

[Figure: same setup as before: Sender’s bursts of RDMA reads and Receiver’s probes (individual RDMA reads) target one bank in the Intermediary’s memory.]

SLIDE 13

Finding Addresses in Same Bank


SLIDE 14

Virtual Memory Addressing

Virtual addresses translated to physical upon access

  • Translation at page granularity
  • Same mechanism for local and remote (over RDMA) accesses


[Diagram: a virtual address (VA) is translated to a physical address (PA). The cache-block offset and page offset bits pass through translation unchanged; the upper PA bits are arbitrary, defined by the OS.]

Within a page, physical address bits are the same as in the virtual address
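The pass-through of in-page bits can be sketched with a toy translation function (the page size and frame address are arbitrary examples):

```python
PAGE_SIZE = 1 << 30        # e.g., a 1GB huge page
PAGE_MASK = PAGE_SIZE - 1  # the low bits: in-page offset

def translate(va, page_frame_base):
    """Toy VA->PA translation: the OS chooses the (page-aligned) frame;
    the in-page offset bits pass through unchanged."""
    assert page_frame_base & PAGE_MASK == 0
    return page_frame_base | (va & PAGE_MASK)

va = 0x12345678
pa = translate(va, page_frame_base=0x7_4000_0000)
# Within the page, PA bits equal VA bits:
assert (pa & PAGE_MASK) == (va & PAGE_MASK)
```

This is why the attacker can reason about the low physical-address bits (including the bank bits, if they fall within the page) while knowing only virtual addresses.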

SLIDE 15

Bank Location

Some physical address bits, “bank bits”, define bank

  • Low-order bits to maximize bank-level parallelism

How to find addresses in same bank?

  • These addresses have same bank bits


[Diagram: an XOR function over the physical address (PA) bank bits determines the bank location.]

Need to find the exact bank-bit positions
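An XOR bank function of this kind can be sketched as follows. The bit positions below are invented for illustration; real controllers use vendor-specific, undocumented mappings, which is precisely why the search in the next slides is needed:

```python
def bank_of(pa, bank_bit_pairs=((6, 13), (8, 15))):
    """Toy XOR bank function: each bank-index bit is the XOR of two PA bits.
    The positions here are illustrative, not a real controller's mapping."""
    index = 0
    for i, (lo, hi) in enumerate(bank_bit_pairs):
        index |= (((pa >> lo) ^ (pa >> hi)) & 1) << i
    return index

# Addresses that agree on all bank bits land in the same bank,
# even if they differ elsewhere:
a = 0x0000
b = a | (1 << 7) | (1 << 20)  # differs only in non-bank bits
assert bank_of(a) == bank_of(b)
# Flipping a bank bit moves the address to another bank:
assert bank_of(a) != bank_of(a | (1 << 6))
```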

SLIDE 16

Same-Bank Addresses Search: Iteration 1

Attacker (Sender and Receiver independently):

1. Chooses arbitrary addresses in remote memory

○ Reads to the same cache block (64B) are coalesced ⇨ set bits {5:0} to 0

2. Issues RDMA reads to the chosen addresses and measures network throughput

○ Network BW = number of serving banks × ~10Gb/s


[Diagram: VA with cache-block bits {5:0} zeroed; measured throughput compared against a single bank’s BW of ~10Gb/s (can vary slightly across vendors).]

SLIDE 17

Same-Bank Addresses Search: Iteration 2

Attacker (Sender and Receiver independently):

1. Reduces the subset of addresses

a. Set bits {6:0} to 0

2. Issues RDMA reads & measures throughput


[Diagram: VA with bits {6:0} zeroed; measured throughput vs. a single bank’s BW of ~10Gb/s.]

Throughput dropped by 2x ⇨ bit 6 is a bank bit

SLIDE 18

Same-Bank Addresses Search: Iteration 3

Attacker (Sender and Receiver independently):

1. Reduces the subset of addresses

a. Set bits {7:0} to 0

2. Issues RDMA reads & measures throughput


[Diagram: VA with bits {7:0} zeroed; measured throughput vs. a single bank’s BW of ~10Gb/s.]

Same throughput ⇨ bit 7 is NOT a bank bit

SLIDE 19

Same-Bank Addresses Search: Iteration 4

Attacker (Sender and Receiver independently):

1. Reduces the subset of addresses

a. Set bits {8:0} to 0

2. Issues RDMA reads & measures throughput


[Diagram: VA with bits {8:0} zeroed; measured throughput vs. a single bank’s BW of ~10Gb/s.]

Throughput dropped by 2x ⇨ bit 8 is a bank bit

SLIDE 20

Same-Bank Addresses Search: Iteration N

Attacker (Sender and Receiver independently):

1. Reduces the subset of addresses

a. Set bits {N-6:0} to 0

2. Issues RDMA reads & measures throughput

Knowing the bank-bit locations, choose arbitrary addresses with

  • bank bits equal to 0
  • cache block bits equal to 0


[Diagram: VA with all bank and cache-block bits zeroed; measured throughput vs. a single bank’s BW of ~10Gb/s (can vary slightly across vendors).]

Throughput saturated ⇨ all bank bits zeroed

Trivial complexity: a remote attacker finds the addresses in <1 second
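The whole search loop above can be sketched as a simulation. The "true" bank-bit positions and the throughput model (number of banks × ~10Gb/s) are invented stand-ins for what the attacker actually measures over the network:

```python
import random

TRUE_BANK_BITS = {6, 8, 14}   # hidden ground truth, invented for the simulation
BANK_BW_GBPS = 10             # a single bank's BW, per the slides

def measured_throughput(addresses):
    """Network BW = number of distinct serving banks x ~10Gb/s."""
    banks = {tuple((a >> b) & 1 for b in sorted(TRUE_BANK_BITS))
             for a in addresses}
    return len(banks) * BANK_BW_GBPS

def find_bank_bits(max_bit=20, n=4096):
    """Zero one more low-order bit each iteration; a throughput drop
    means that bit was a bank bit (cache-block bits {5:0} start at 0)."""
    addrs = [random.randrange(1 << 30) & ~0x3F for _ in range(n)]
    found, prev = set(), measured_throughput(addrs)
    for bit in range(6, max_bit):
        addrs = [a & ~(1 << bit) for a in addrs]
        cur = measured_throughput(addrs)
        if cur < prev:
            found.add(bit)
        prev = cur
    return found

assert find_bank_bits() == TRUE_BANK_BITS
```

The loop is linear in the number of candidate bits, which matches the slide's point that the search is trivially cheap.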

SLIDE 21

Determining Communication Parameters

[Figure: Sender’s bursts of RDMA reads and Receiver’s probes (individual RDMA reads) target one bank in the Intermediary’s memory; 1/0 decoded from high/low bank delay over the timeline.]

SLIDE 22

Sender Side

Key parameter: Sender’s burst size

  • Larger bursts are more pronounced ⇨ higher accuracy

○ Especially in noisy networks

  • Smaller bursts drain quicker ⇨ higher transmission frequency


[Figure: optimization space: burst size trades transmission accuracy against transmission frequency.]

SLIDE 23

Receiver Side

Transmission period estimation

  • Transmitted packets comprise a fixed-size preamble and payload

○ Example: 32-bit preamble & 200-bit payload

  • Receiver iteratively determines the transmission period by looking for the pre-agreed preamble value

Key parameter: Probing frequency

  • Several probes (measurements) per transmission period
  • Found little sensitivity of decoding accuracy to probing frequency above 2MHz (1 probe per 0.5μs)


[Figure: timeline of probe round-trip delay measurements across one transmission period.]
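The period search can be sketched as follows, operating on an already-thresholded probe stream (a simulation; the 8-bit preamble and 5-probes-per-bit figures are invented for brevity, vs. the slide's 32-bit preamble):

```python
PREAMBLE = [1, 0, 1, 1, 0, 0, 1, 0]  # pre-agreed between Sender and Receiver

def downsample(samples, period):
    """One bit per transmission period: majority vote over its probes."""
    return [1 if sum(samples[i:i + period]) * 2 > period else 0
            for i in range(0, len(samples) - period + 1, period)]

def find_period(samples, max_period=16):
    """Try candidate periods; the correct one reveals the preamble."""
    for period in range(2, max_period + 1):
        bits = downsample(samples, period)
        if bits[:len(PREAMBLE)] == PREAMBLE:
            return period, bits[len(PREAMBLE):]
    return None, []

payload = [0, 1, 1, 0]
trace = [b for bit in PREAMBLE + payload for b in [bit] * 5]  # 5 probes per bit
period, decoded = find_period(trace)
assert period == 5 and decoded == payload
```

A wrong candidate period smears adjacent bits together, so the preamble fails to match and the search moves on.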

SLIDE 24

Evaluation Platforms

Private Cluster (isolated and loaded network )

Cluster size: 6 nodes Infiniband CPU: Xeon E5-2630v4 (Broadwell) RAM: 64GB, DDR4-2400 NIC: Mellanox CX-5, 56Gb/s

Public Cloud: CloudLab Utah (80% utilized during measurements)

Cluster size: 200 nodes Infiniband CPU: Xeon E5-2640v4 (Broadwell) RAM: 64GB, DDR4-2400 NIC: Mellanox CX-4, 50Gb/s


SLIDE 25

Private Cluster: Isolated Environment

Burst size 32 (× 64 bytes) is the minimum required for reliable decoding

Larger burst sizes:

  • Decrease transmission frequency
  • Make the signal more pronounced

○ Larger gap between high and low delay


[Figure: probe delay traces for burst sizes 32 and 128.]

SLIDE 26

Private Cluster: Noisy Environment

Network-loading μbenchmark issues 40Gb/s (70% of link BW) of RDMA read traffic to the Intermediary

  • Signal with burst size 32 is indistinguishable
  • Signal is clear with burst size 128


[Figure: probe delay traces for burst sizes 32 and 128 under network load.]

Noise is efficiently compensated for with a larger burst size

SLIDE 27

Private Cluster: Stealthiness

1. CPU hardware counters

  • Memory bandwidth monitoring

○ Bankrupt loads only one bank, <1% of memory
○ CPU counters are too coarse-grain

2. Measure local memory access time (LMAT) with a software random-access μbenchmark

  • Using RDTSC timestamps
  • With Bankrupt, LMAT is affected only at 99.9-th percentile

3. Network counters unhelpful: no network resources are congested


Virtually undetectable with current HW and SW
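The LMAT observation can be illustrated with synthetic numbers (the latencies and the affected fraction are invented; only the shape mirrors the slide: median unchanged, tail shifted):

```python
import random, statistics

def p999(samples):
    """99.9th-percentile latency (nearest-rank on a sorted copy)."""
    s = sorted(samples)
    return s[min(len(s) - 1, int(0.999 * len(s)))]

random.seed(0)
# Baseline LMAT samples: ~100-101ns (numbers invented for illustration).
baseline = [100 + random.random() for _ in range(100_000)]
# Under Bankrupt, only the rare accesses hitting the loaded bank slow down
# (here 0.2% of accesses pay an extra 400ns; both figures invented).
under_attack = [x + (400 if random.random() < 0.002 else 0) for x in baseline]

# Median (typical behavior) is essentially unchanged...
assert abs(statistics.median(under_attack) - statistics.median(baseline)) < 1
# ...but the 99.9th percentile clearly shifts:
assert p999(under_attack) > p999(baseline) + 100
```

A monitor that watches averages or medians sees nothing; only a fine-grained tail-latency metric moves.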

SLIDE 28

Private Cluster vs. CloudLab: Throughput & Accuracy

Accuracy: % of correctly transmitted bits
Throughput: true data rate, without preambles and errors
Similar accuracy but 20% lower throughput in CloudLab

  • Used larger preambles to improve accuracy

Optimal burst size of 32


Sweet spot between transmission frequency & accuracy


SLIDE 29

Takeaways

Covert channels allow bypassing cloud isolation layers

We introduce the Bankrupt covert channel

  • No direct communication between Sender and Receiver
  • Affects the timing of a single memory bank on an Intermediary node
  • Delivers 74Kb/s throughput & is robust in a noisy public-cloud network

See paper for mitigation strategies and other details


SLIDE 30

Thank you!

Source code available at: github.com/ease-lab/bankrupt

Contact: dmitrii.ustiugov(at)ed.ac.uk

The authors thank the ARM Center of Excellence at the University of Edinburgh for their support
