Bankrupt Covert Channel:
Turning Network Predictability into Vulnerability
Dmitrii Ustiugov, Plamen Petrov, Siavash Katebzadeh, Boris Grot
University of Edinburgh
This work is supported by ARM Center of Excellence at University of Edinburgh
Data breaches happen ☹
Cloud vendors strive to contain stolen info
[Figure: a spy holding a secret inside the secure cloud environment, a receiver in the outside world. Image by Lynn Willis.]
How to extract secret?
Isolation layers:
○ VM isolation
○ Physical server isolation
○ Virtual network isolation
Covert channel definition: communication without using legitimate data transfer mechanisms
○ E.g., high latency for transmitting “1”, low for “0”
Image by Rick Leche (flipped vertically)
Network covert channels:
○ Allow communication across cluster/datacenter
○ Breach many isolation layers at once
Stereotypical thinking: networks are noisy ⇨ low-accuracy, low-throughput channels
[Figure: secret extraction from spy to receiver crosses VM isolation, physical server isolation, and virtual network isolation.]
Remote Direct Memory Access (RDMA)
○ Cloud vendors: AWS, Azure, Alibaba, Oracle, …
○ RDMA network packets bypass destination CPU
○ Latency: 2-4μs
○ Bandwidth: 100+Gb/s
○ Nodes use one-sided reads/writes to their private data in remote node’s memory
[Figure: Node A and Node B, each with CPU, NIC, and memory, connected by the RDMA network. Node A issues a one-sided RDMA read to the remote region for A in Node B's memory and receives the data.]
First glance at bandwidth in modern servers
Expectation: Memory BW always much larger
Wrong!
○ Memory BW of 100GB/s = 800Gb/s is the aggregate over 100s of banks; network BW is 100Gb/s
○ A single bank delivers only ~10Gb/s, less than the network can demand
○ E.g., same for both Micron DDR2, DDR4
○ Bank behaves as FIFO: ~50ns fixed service time
[Figure: Node A and Node B, each with CPU, NIC, and memory; network BW 100Gb/s vs. memory BW 100GB/s = 800Gb/s spread over 100s of banks at 10Gb/s each.]
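The ~10Gb/s per-bank figure is consistent with the other numbers in this talk: one 64-byte cache block served per ~50ns service slot. A minimal sketch of that arithmetic, assuming exactly one cache block per slot; the function name is hypothetical and only illustrates the calculation.

```python
def bank_bandwidth_gbps(block_bytes: int = 64, service_time_ns: float = 50.0) -> float:
    """Peak bandwidth of one bank in Gb/s, assuming one block per fixed service slot."""
    bits_per_access = block_bytes * 8
    # Bits per nanosecond is numerically equal to Gb/s.
    return bits_per_access / service_time_ns


per_bank = bank_bandwidth_gbps()
print(f"single bank: ~{per_bank:.1f} Gb/s")
print(f"banks needed to keep a 100 Gb/s link busy: ~{100 / per_bank:.0f}")
```

Under these assumptions a single bank cannot keep up with a 100Gb/s NIC, which is the mismatch the channel exploits.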
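Basic idea: Sender and Receiver modulate and observe the latency of one memory bank on an Intermediary node
○ Sender issues bursts of RDMA reads to the bank: a burst raises the bank delay (“1”), no burst leaves it low (“0”)
○ Receiver measures the bank delay with probes (individual RDMA reads)
Key features
○ No shared memory between Sender and Receiver
[Figure: Sender and Receiver reach one bank of the Intermediary's memory through its NIC over the RDMA network. Timeline of bank delay alternating high and low, decoded as 1 0 1 0 1 0.]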
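The interaction between bursts and probes can be illustrated with a toy model of a single FIFO bank. This is a sketch under stated assumptions, not the authors' implementation: only the ~50ns service time comes from the slides, while the period, probe offset, and threshold are hypothetical values, and network round-trip time and noise are ignored.

```python
SERVICE_NS = 50  # fixed per-request bank service time (from the slides)


def probe_delays(bits, burst_size=32, period_ns=4000, probe_offset_ns=100):
    """Return the probe delay (ns) a Receiver would observe for each sent bit."""
    bank_free_at = 0  # time at which the FIFO bank has drained its queue
    delays = []
    for i, bit in enumerate(bits):
        start = i * period_ns
        if bit:
            # Sender's burst: burst_size reads queue up behind each other.
            bank_free_at = max(bank_free_at, start) + burst_size * SERVICE_NS
        # Receiver's single probe arrives shortly after the burst was issued.
        arrival = start + probe_offset_ns
        begin = max(bank_free_at, arrival)
        bank_free_at = begin + SERVICE_NS
        delays.append(bank_free_at - arrival)
    return delays


def decode(delays, threshold_ns):
    """High delay decodes as 1, low delay as 0."""
    return [1 if d > threshold_ns else 0 for d in delays]


sent = [1, 0, 1, 0, 1, 0]
observed = probe_delays(sent)
print(observed)
print(decode(observed, threshold_ns=800))
```

In this model the period must exceed the time the bank needs to drain a burst, otherwise a “1” leaks into the following bit.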
1. Search for addresses that map to target bank
○ Recall: No memory sharing!
2. Determine communication parameters
○ How many RDMA reads per burst?
○ Transmission frequency?
○ Receiving (probes) frequency?
Virtual addresses translated to physical upon access
○ Within a page, physical address (PA) bits are same as in virtual address (VA): cache block offset and page offset
○ Bits above the page offset are arbitrary, defined by OS
○ Some physical address bits, “bank bits”, define bank
○ Bank bits feed an XOR function that yields the bank location
How to find addresses in same bank?
[Figure: VA and PA bit layouts from bit 63 down to bit 0, with the cache block offset in bits {5:0}, the page offset above it, and the upper bits translated VA->PA.]
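To make the XOR-based bank mapping concrete, here is a small sketch. The bit positions in `EXAMPLE_GROUPS` are hypothetical and not taken from any real memory controller; the slides only state that bank bits pass through an XOR function.

```python
def bank_index(phys_addr: int, xor_groups) -> int:
    """XOR the address bits of each group into one bit of the bank index."""
    index = 0
    for out_bit, positions in enumerate(xor_groups):
        parity = 0
        for pos in positions:
            parity ^= (phys_addr >> pos) & 1
        index |= parity << out_bit
    return index


# Hypothetical mapping for illustration only.
EXAMPLE_GROUPS = [(6, 13), (8, 14), (9, 15)]

a = 0x0
b = (1 << 6) | (1 << 13)  # flips both bits of the first XOR group
c = 1 << 6                # flips only one bit of the first XOR group

# a and b land in the same bank, c lands in a different one.
print(bank_index(a, EXAMPLE_GROUPS), bank_index(b, EXAMPLE_GROUPS), bank_index(c, EXAMPLE_GROUPS))
```

Two addresses collide on a bank whenever every XOR group has the same parity in both, which is why the attacker needs to learn where the bank bits are.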
Attacker (Sender and Receiver independently):
1. Chooses arbitrary addresses in remote memory
○ Reads to same cache block (64B) are coalesced ⇨ set bits {5:0} to 0
2. Issues RDMA reads to chosen addresses and measures network throughput
○ Network BW = number of serving banks x 10Gb/s
○ Single bank’s BW = ~10Gb/s (can vary slightly across vendors)
[Figure: measured throughput for virtual addresses (VA) within a page (e.g., 1GB) with the cache block offset bits zeroed.]
Attacker (Sender and Receiver independently):
1. Reduces subset of addresses by zeroing one more address bit per step
2. Issues RDMA reads & measures throughput after each step
○ Set bits {6:0} to 0: throughput dropped by 2x ⇨ bit 6 is bank bit
○ Set bits {7:0} to 0: same throughput ⇨ bit 7 is NOT bank bit
○ Set bits {8:0} to 0: throughput dropped by 2x ⇨ bit 8 is bank bit
○ Continue up to bits {N-6:0}: throughput saturated ⇨ all bank bits zeroed
Knowing bank bit locations, choose arbitrary addresses with the same bank bit values ⇨ same bank
Trivial complexity: Remote attacker finds addresses in <1 second
Single bank’s BW = ~10Gb/s (can vary slightly across vendors)
[Figure: measured throughput per step for virtual addresses (VA) within a page (e.g., 1GB) as more low-order bits are zeroed.]
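The search procedure can be sketched against a simulated throughput measurement. Everything here is an assumption made for illustration: `simulated_throughput` stands in for real RDMA measurements, the hidden bank bits (6, 8, 13) are hypothetical (only bits 6 and 8 appear on the slides), the 0.6 and 1.2 tolerance factors are arbitrary, and the model ignores both the XOR function and the link bandwidth cap.

```python
from functools import partial

BANK_BW_GBPS = 10  # single bank's bandwidth, from the slides


def simulated_throughput(zeroed_up_to, hidden_bank_bits):
    """Stand-in for a measurement with address bits {zeroed_up_to:0} set to 0."""
    free_bank_bits = [b for b in hidden_bank_bits if b > zeroed_up_to]
    # Network BW = number of serving banks x per-bank BW.
    return BANK_BW_GBPS * 2 ** len(free_bank_bits)


def find_bank_bits(measure, first_bit=6, last_bit=29):
    """Zero one more bit per step; a ~2x throughput drop marks a bank bit."""
    found = []
    previous = measure(first_bit - 1)  # baseline: only bits {5:0} zeroed
    for bit in range(first_bit, last_bit + 1):
        current = measure(bit)
        if current <= 0.6 * previous:
            found.append(bit)
        if current <= 1.2 * BANK_BW_GBPS:
            break  # throughput saturated: all bank bits are zeroed
        previous = current
    return found


measure = partial(simulated_throughput, hidden_bank_bits=(6, 8, 13))
print(find_bank_bits(measure))
```

The default `last_bit=29` corresponds to the 1GB page from the slides, since only bits inside the page are under the attacker's control.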
Key parameter: Sender’s burst size
○ Larger bursts ⇨ higher accuracy
○ Especially in noisy networks
○ Smaller bursts ⇨ higher transmission frequency
[Figure: optimization space of burst size against transmission accuracy and transmission frequency.]
Transmission period estimation
○ Example: 32-bit preamble & 200-bit payload
○ Receiver scans its measurements looking for pre-agreed preamble value
Key parameter: Probing frequency
○ Probing frequency > 2MHz (1/0.5μs)
[Figure: timeline of probe round-trip delay measurements with the transmission period marked.]
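Once probe delays are thresholded into bits, locating a frame reduces to searching for the pre-agreed preamble. The sketch below assumes an exact match and an already known transmission period, which is a simplification; the function name and the short 8-bit preamble are hypothetical (the slides use a 32-bit preamble and a 200-bit payload).

```python
def find_payload(bits, preamble, payload_len):
    """Return the payload that follows the first exact preamble match, or None."""
    n = len(preamble)
    for start in range(len(bits) - n - payload_len + 1):
        if bits[start:start + n] == preamble:
            return bits[start + n:start + n + payload_len]
    return None


preamble = [1, 0, 1, 1, 0, 0, 1, 0]
payload = [1, 1, 0, 1]
stream = [0, 0, 0] + preamble + payload + [0, 0]  # idle bits around one frame
print(find_payload(stream, preamble, len(payload)))
```

A receiver on a noisy network would likely need to tolerate a few mismatched preamble bits, which this exact-match version does not do.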
Private Cluster (isolated and loaded network)
○ Cluster size: 6 nodes
○ Network: InfiniBand
○ CPU: Xeon E5-2630v4 (Broadwell)
○ RAM: 64GB, DDR4-2400
○ NIC: Mellanox CX-5, 56Gb/s
Public Cloud: CloudLab Utah (80% utilized during measurements)
○ Cluster size: 200 nodes
○ Network: InfiniBand
○ CPU: Xeon E5-2640v4 (Broadwell)
○ RAM: 64GB, DDR4-2400
○ NIC: Mellanox CX-4, 50Gb/s
○ Burst size of 32 (x 64 bytes) is the minimum required for reliable decoding
○ Larger burst sizes ⇨ larger gap between high and low delay
[Figure: probe delay measurements for burst sizes 32 and 128.]
○ Network-loading μbenchmark issues 40Gb/s (70% link BW) of RDMA read traffic to Intermediary
○ Signal with burst size of 32 is indistinguishable
○ Signal is clear with burst size of 128
Noise is efficiently compensated for with larger burst size
[Figure: probe delay measurements under load for burst sizes 32 and 128.]
Detection approaches:
1. CPU hardware counters
○ Bankrupt loads only one bank, <1% of memory
○ CPU counters are too coarse-grained
2. Measure local memory access time (LMAT) with software random-access μbenchmark
3. Network counters are unhelpful: no network resources are congested
○ Accuracy: % of correctly transmitted bits
○ Throughput: true data rate without preambles and errors
○ Similar accuracy but 20% lower throughput in CloudLab
○ Optimal burst size of 32: sweet spot between transmission frequency & accuracy
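The two metrics can be written down as follows. This is one plausible reading of "true data rate without preambles and errors", not a formula confirmed by the slides: the raw bit rate is scaled by the payload share of each frame and by the accuracy. Function names and the example inputs are hypothetical and are not measured results.

```python
def accuracy(sent, received):
    """Fraction of bit positions that were received correctly."""
    if len(sent) != len(received):
        raise ValueError("bit sequences must have equal length")
    correct = sum(s == r for s, r in zip(sent, received))
    return correct / len(sent)


def effective_throughput(raw_rate, preamble_bits, payload_bits, acc):
    """Raw bit rate scaled by payload share of a frame and by accuracy."""
    payload_share = payload_bits / (preamble_bits + payload_bits)
    return raw_rate * payload_share * acc


acc = accuracy([1, 0, 1, 1], [1, 0, 0, 1])
print(f"accuracy: {acc:.0%}")
# Arbitrary raw rate; frame layout of 32-bit preamble and 200-bit payload from the slides.
print(effective_throughput(raw_rate=100.0, preamble_bits=32, payload_bits=200, acc=acc))
```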
○ Covert channels allow bypassing cloud isolation layers
○ We introduce the Bankrupt covert channel
○ See paper for mitigation strategies and other details
Source code available at: github.com/ease-lab/bankrupt
Contact details: dmitrii.ustiugov(at)ed.ac.uk
Authors thank ARM Center of Excellence at University of Edinburgh for their support