Nomad : Mitigating Arbitrary Cloud Side Channels via - - PowerPoint PPT Presentation


slide-1
SLIDE 1

Nomad: Mitigating Arbitrary Cloud Side Channels via Provider-Assisted Migration

Soo-Jin Moon, Vyas Sekar, Michael K. Reiter

slide-2
SLIDE 2
Co-residency side-channel attacks in clouds

Stealing secrets (e.g., keys)

  • Many different vectors (e.g., L2/L3 cache, storage, main memory)

Demonstrated side-channel attacks include, but are not limited to:

  • Y. Zhang et al., CCS 2012; T. Ristenpart et al., CCS 2009; F. Liu et al., Oakland 2015

[Figure: VMs co-located on shared machines]

slide-4
SLIDE 4
Limitations of Current Defenses

Layers: Hardware, Hypervisor, OS

Examples: new cache design, deterministic execution, noise injection

  • 1. Require significant/detailed upgrades
  • 2. Attack-specific

What about future side-channel attacks?

Proposed defenses include, but are not limited to: Y. Zhang et al., CCS 2013; T. Kim et al., USENIX Security 2012; F. Liu and R. Lee, MICRO 2014

slide-6
SLIDE 6

Ideal Properties

1) General 2) Immediately deployable

Single-tenancy?

slide-10
SLIDE 10

Nomad Ideas

1) General: tackle root cause → minimize co-residency
2) Immediately deployable: migration

slide-13
SLIDE 13

Nomad Vision: Migration-as-a-Service

[Figure: clients opt in to the service offered by the cloud provider; the cloud controller issues "Move VMs {…}" to machines hosting VMs]

  • Opt-in service
  • Provider-assisted
slide-17
SLIDE 17

Nomad Practical Challenges

[Figure: cloud controller managing three machines that host four VMs]

  • Logic: characterize information leakage due to co-residency
  • Scalable design: e.g., can Amazon EC2 run this?
  • Practical impact (cloud): minimal modifications?
  • Practical impact (applications): 1) advancement of VM migration techniques; 2) many cloud workloads with built-in resilience to migration

slide-21
SLIDE 21

Our Work

  • 1. Idea: general side-channel defense via migration
  • 2. Logic: characterize information leakage due to co-residency
  • 3. Scalable Design: scalable VM migration strategy that can handle large cloud deployments
  • 4. Practical Impact: practical OpenStack implementation with minimal modifications
slide-25
SLIDE 25

Threat Model

Objective: Extract secrets via co-residency

  • Can use any kind of resource
  • Can launch/terminate VMs at will
  • VMs of a given client can collaborate
  • Cannot control VM placement
  • No info. sharing across distinct clients

Provider:

  • Does not know which clients are malicious

slide-30
SLIDE 30

Information Leakage (InfoLeak) Model

InfoLeak between clients is characterized along two axes:

  • Replicated? (R or NR)
  • Collaborating? (C or NC)

[Figure: VM-level view of clients with VMs B1, B2 and R1, R2, illustrating the R, NR, C, and NC cases]

slide-31
SLIDE 31

Information Leakage (InfoLeak) Model

  • Replicated? NR or R
  • Collaborating? NC or C

Resulting models: <NR,NC>, <R,NC>, <NR,C>, <R,C>

[Figure: the four models placed on a scale from least InfoLeak to most InfoLeak]
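
The slides name the four deployment models but do not say how InfoLeak is aggregated under each. The sketch below is one plausible reading, not the authors' definition: it assumes leakage is proportional to co-residency epochs, that a replicated client leaks the same secret from every VM (so leakage adds up across its VMs), and that collaborating adversary VMs pool what they learn. The `info_leak` function and its input format are hypothetical.

```python
from typing import Dict, Tuple

# Hypothetical input: epochs of co-residency per (victim VM, adversary VM) pair.
CoResidency = Dict[Tuple[str, str], int]


def info_leak(co: CoResidency, replicated: bool, collaborating: bool) -> int:
    """Aggregate per-VM-pair co-residency into a client-pair InfoLeak value.

    Assumed aggregation rules (not taken from the slides):
      <NR,NC>: max over single VM pairs
      <R,NC> : max over adversary VMs of the sum over victim VMs
      <NR,C> : max over victim VMs of the sum over adversary VMs
      <R,C>  : sum over all VM pairs
    """
    if not co:
        return 0
    victims = sorted({v for v, _ in co})
    adversaries = sorted({a for _, a in co})

    def epochs(v: str, a: str) -> int:
        return co.get((v, a), 0)

    if replicated and collaborating:
        return sum(co.values())
    if replicated:
        return max(sum(epochs(v, a) for v in victims) for a in adversaries)
    if collaborating:
        return max(sum(epochs(v, a) for a in adversaries) for v in victims)
    return max(co.values())


# Made-up co-residency counts for two victim VMs and two adversary VMs.
example = {("v1", "a1"): 3, ("v1", "a2"): 1, ("v2", "a1"): 2, ("v2", "a2"): 4}
for r in (False, True):
    for c in (False, True):
        label = f"<{'R' if r else 'NR'},{'C' if c else 'NC'}>"
        print(label, info_leak(example, r, c))
```

Under these assumed rules <NR,NC> never exceeds the other three values and <R,C> is never below them, which matches the least-to-most ordering the slide indicates.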

slide-34
SLIDE 34

System Overview

[Figure: clients tell the cloud provider whether they opt in and which deployment model they use (e.g., <NR,NC>); the cloud controller issues "Move VMs {…}" to machines hosting VMs]

slide-36
SLIDE 36

Operational Timeline

  • 1 epoch = D time units
  • Run placement algorithm every epoch
  • Sliding window of ∆ epochs

Side-channel parameters:

  • K: information leakage rate (i.e., bits per time unit)
  • P: secret length (i.e., bits)

[Figure: timeline, x-axis: Time (epoch)]

slide-37
SLIDE 37

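Operational Timeline

  • 1 epoch = D time units
  • Run placement algorithm every epoch
  • Sliding window of ∆ epochs

Provider chooses D and ∆ to at least satisfy:

D * ∆ * K < P

Here D * ∆ * K is the number of secret bits extracted if two VMs are co-resident for ∆ epochs, with K the leakage rate (bits per time unit) and P the secret length (bits).

[Figure: timeline, x-axis: Time (epoch)]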
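
The constraint D * ∆ * K < P can be checked mechanically. The following is a minimal sketch, assuming integer parameters; the function names and the numeric values are made up for illustration and do not come from the talk.

```python
def satisfies_bound(D: int, delta: int, K: int, P: int) -> bool:
    """True if fewer than P bits leak when two VMs stay co-resident for delta epochs."""
    return D * delta * K < P


def max_window_epochs(D: int, K: int, P: int) -> int:
    """Largest integer delta with D * delta * K < P (0 if even one epoch leaks too much)."""
    if D <= 0 or K <= 0:
        raise ValueError("D and K must be positive")
    # D * delta * K < P  <=>  D * delta * K <= P - 1 for integers.
    return max((P - 1) // (D * K), 0)


# Hypothetical numbers: 60 time units per epoch, 2 bits leaked per time unit,
# and a 1024-bit secret.
D, K, P = 60, 2, 1024
delta = max_window_epochs(D, K, P)
print(delta, satisfies_bound(D, delta, K, P), satisfies_bound(D, delta + 1, K, P))
```

With these made-up values each epoch of co-residency leaks 120 bits, so the window can span at most 8 epochs before a full secret could be extracted.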

slide-41
SLIDE 41

Placement Algorithm

Inputs: recent VM placements; deployment model (e.g., <NR,NC>); client workloads & constraints

Output: VM placement

Goal (per epoch): minimize the global sum of client-pair InfoLeak across the past ∆ epochs, i.e.,

$\sum_{c,c'} \mathrm{InfoLeak}_{c \to c'}([t - \Delta, t])$

subject to a fixed migration budget.

  • InfoLeak: F(Deployment Model)
  • Migration budget: F(Network Capacity)
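
To make the objective concrete, here is a rough sketch that evaluates the global sum for a placement history. It assumes the simplest possible leakage measure, one unit per epoch for every ordered pair of co-resident VMs owned by different clients; the real InfoLeak depends on the deployment model, and the names `pair_leak`, `history`, and `owner` are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple

Placement = Dict[str, str]  # VM id -> machine id, for one epoch


def pair_leak(history: List[Placement], owner: Dict[str, str],
              delta: int) -> Counter:
    """Co-residency epochs per ordered client pair (c, c') over the last delta epochs."""
    leak: Counter = Counter()
    if delta <= 0:
        return leak
    for placement in history[-delta:]:
        # Group the VMs of this epoch by the machine they run on.
        by_machine: Dict[str, List[str]] = {}
        for vm, machine in placement.items():
            by_machine.setdefault(machine, []).append(vm)
        # Count every ordered pair of co-resident VMs from different clients.
        for vms in by_machine.values():
            for v in vms:
                for a in vms:
                    if owner[v] != owner[a]:
                        leak[(owner[v], owner[a])] += 1
    return leak


owner = {"a1": "A", "a2": "A", "b1": "B"}
history = [
    {"a1": "m1", "b1": "m1", "a2": "m2"},
    {"a1": "m1", "b1": "m2", "a2": "m2"},
    {"a1": "m1", "a2": "m2", "b1": "m3"},
]
objective = sum(pair_leak(history, owner, delta=3).values())
print(objective)
```

A placement algorithm in this setting would compare candidate placements for the next epoch by the value this sum would take once the candidate is appended to the history.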

slide-47
SLIDE 47

Placement Algorithm

Inputs → VM Placement

Challenge: Scalability

Should handle tens of thousands of servers

  • ILP (Integer Linear Programming): for 40 machines, D > 1 day
  • Basic Greedy: for 400 machines, D > 1 day
  • Basic Greedy with our optimizations

slide-51
SLIDE 51

Why is Basic Greedy not scalable?

Greedy loop:

1. Generate moves (Free Insert: 1 -> M1; Pairwise Swap: 1-2 -> 2-1; N-way Swap: …)
2. Compute benefit (total reduction in InfoLeak)
3. Pick best move
4. Make move
5. totalNumMove > Budget? Yes: exit. No: go back to step 1.

Bottlenecks:

  • Bottleneck #1: Large search space
  • Bottleneck #2: Computing InfoLeak across all clients
  • Bottleneck #3: Re-generating move table after each move
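
The loop above can be sketched in a few lines, which also makes the three bottlenecks visible. This is an illustrative simplification, not Nomad's code: it considers only free-insert moves, scores a placement by the number of cross-client VM pairs currently sharing a machine (ignoring the sliding window), and stops early when no move helps. All names are hypothetical.

```python
from collections import Counter
from typing import Dict, Tuple


def cost(placement: Dict[str, str], owner: Dict[str, str]) -> int:
    """Number of unordered cross-client VM pairs that share a machine."""
    vms = list(placement)
    return sum(1 for i, v in enumerate(vms) for a in vms[i + 1:]
               if placement[v] == placement[a] and owner[v] != owner[a])


def basic_greedy(placement: Dict[str, str], owner: Dict[str, str],
                 capacity: Dict[str, int], budget: int) -> Tuple[Dict[str, str], int]:
    placement = dict(placement)
    moves_made = 0
    while moves_made < budget:
        base = cost(placement, owner)
        load = Counter(placement.values())
        best_move, best_benefit = None, 0
        # Generate moves: the whole table is rebuilt on every iteration,
        # and it has one entry per (VM, machine) combination.
        for vm in placement:
            for machine in capacity:
                if machine == placement[vm] or load[machine] >= capacity[machine]:
                    continue
                trial = dict(placement)
                trial[vm] = machine
                # Compute benefit: a full cost recomputation per candidate.
                benefit = base - cost(trial, owner)
                if benefit > best_benefit:
                    best_move, best_benefit = (vm, machine), benefit
        if best_move is None:  # assumption: stop when nothing improves
            break
        vm, machine = best_move
        placement[vm] = machine  # make the best move
        moves_made += 1
    return placement, moves_made


owner = {"a1": "A", "b1": "B", "b2": "B"}
start = {"a1": "m1", "b1": "m1", "b2": "m2"}
final, used = basic_greedy(start, owner, {"m1": 2, "m2": 2, "m3": 2}, budget=2)
print(final, used, cost(final, owner))
```

Each iteration touches every VM and machine combination and recomputes the full cost for each candidate, which is why this form does not scale to tens of thousands of servers.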

slide-52
SLIDE 52

Our Approach

Bottleneck → Our approach:

  • Large search space → Prune search space
  • Computing InfoLeak across all clients → Incremental benefit computation
  • Re-generating move table after each move → Intra-epoch lazy evaluation

slide-55
SLIDE 55

Prune #1: Pruning Move Space

  • Set of all moves: Insert (1 -> M1), Pairwise Swap (1-2 -> 2-1), N-way Swap (…), ...
  • Nomad's set of moves: Free Insert (1 -> M1), Pairwise Swap (1-2 -> 2-1)

slide-59
SLIDE 59

Prune #2: Hierarchical Decomposition

  • Set of all free inserts: clients C1 … C1000 × machines M1 … M50000
  • Nomad's set of free inserts: clients C1 … C1000 × clusters Cluster1 … Cluster25, then within a cluster (e.g., C1 in Cluster1) machines M1 … M2000
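
A quick count shows why the decomposition helps, using the figures on the slide (1000 clients, 50000 machines, 25 clusters of 2000 machines). The sketch assumes each client is first matched to one cluster and then only that cluster's machines are searched; that reading of the diagram, and the function names, are assumptions.

```python
def flat_candidates(num_clients: int, num_machines: int) -> int:
    """Free-insert candidates when every client is tried on every machine."""
    return num_clients * num_machines


def hierarchical_candidates(num_clients: int, num_clusters: int,
                            machines_per_cluster: int) -> int:
    """Candidates when a client is matched to a cluster first, then to a
    machine inside that single cluster (assumed two-level search)."""
    return num_clients * (num_clusters + machines_per_cluster)


flat = flat_candidates(1000, 50000)
hier = hierarchical_candidates(1000, 25, 2000)
print(flat, hier, flat / hier)
```

Per client the search drops from 50000 machines to 25 clusters plus 2000 machines, so the candidate table is roughly 25 times smaller under this assumption.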

slide-61
SLIDE 61

System Implementation

[Figure: the cloud controller runs a separate placement algorithm per cluster (Cluster 1 … Cluster N); each takes the clients in its cluster as input and produces VM placements for that cluster]

  • OpenStack, v. Icehouse
slide-63
SLIDE 63

System Implementation (One Cluster)

  • General Placement Computation: custom C++, ~2000 LOC
  • OpenStack-specific Migration Engine: OpenStack Icehouse, ~200 LOC in Controller Scheduler code

[Figure: the Cluster 1 placement algorithm producing a VM placement]

Requires minimal modifications to existing deployments

slide-64
SLIDE 64

Key Evaluation Questions

  • Information leakage resilience
  • Scalability
  • Impact on cloud applications
  • Benefit/Cost of each design idea
  • Resilience to strategic adversary
slide-65
SLIDE 65

Information Leakage resilience

<R,C>: problem size of 20 machines

Metric: $\mathrm{InfoLeak}_{c \to c'}([t - \Delta, t])$

Nomad brings a ~4.5x reduction in InfoLeak at the 98th percentile compared to static placement, w.r.t. ILP.

slide-66
SLIDE 66

Scalability

Nomad placement algorithm is scalable to large deployments

slide-67
SLIDE 67

Impact on cloud applications

Replicated web-server (Wikibench)

  • Each client: 3 replicated web servers, 1 worker

– In one epoch, at least 1 server migrates

$\text{Norm. Throughput (Norm. T)} = \frac{T_{w/o} - T_{w}}{T_{w/o}} \times 100$

(reading $T_{w/o}$ and $T_{w}$ as throughput without and with migration)

  • Overhead (Norm. T)

– ~0% for 95th Norm. T
– 0.096% for 50th (median) Norm. T
– 1.8% for 5th Norm. T
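
The overhead metric is a plain percentage drop in throughput. A minimal sketch follows, reading T_w/o as throughput without migration and T_w as throughput with migration; the function name and the sample throughput numbers are invented for illustration and are not measurements from the talk.

```python
def norm_throughput(t_without: float, t_with: float) -> float:
    """Norm. T = (T_w/o - T_w) / T_w/o * 100, i.e. percent throughput lost."""
    if t_without <= 0:
        raise ValueError("baseline throughput must be positive")
    return (t_without - t_with) / t_without * 100


# Hypothetical throughputs in requests per second.
print(norm_throughput(1000.0, 982.0))   # a run that loses about 1.8%
print(norm_throughput(1000.0, 1000.0))  # a run with no measurable loss
```

A value near 0 means migration had no visible effect on the workload, while larger values mean more throughput was lost during the epoch.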

slide-68
SLIDE 68

Discussion

  • Fast side-channel attacks

– Need out-of-band defense
– e.g., introduce cache noise, refresh secret

  • Network Impact

– With techniques like incremental diffs, the transfer size is much less than the base VM image

  • Incentives for adoption

– Security-conscious clients opt in
– Providers have new revenue streams

  • More opportunities

– Fairness across clients

slide-69
SLIDE 69

Conclusions

  • Co-residency side-channel attacks: real/growing threats
  • Nomad achieves:

– Information leakage resilience close to the ILP
– Scalable VM placement algorithm
– Practical system atop OpenStack with minimal modifications

Current World: No Migration

  • 1. Per-attack fixes
  • 2. Require significant upgrades

Nomad: “Migration-as-a-Service”

  • 1. General solution
  • 2. Needs minimal changes