SLIDE 1

Securing Passive Replication Through Verification

Bruno Vavala (1,2), Nuno Neves (1), Peter Steenkiste (2)

(1) University of Lisbon (Portugal), (2) Carnegie Mellon University (U.S.)

IEEE Symposium on Reliable and Distributed Systems, 2015
SLIDE 2

Outline

  • Motivation and background
  • Goals
  • Architecture Design & System Operations
  • Evaluation
  • Takeaways
SLIDE 3

Fault-Tolerance

  • Service continuity has to be ensured in case of failure
  • Components have to be replicated
  • Replicas must be coordinated

[Figure: replication and coordination]

SLIDE 4

Fault-Tolerance

  • Service continuity has to be ensured in case of failure
  • Components have to be replicated
  • Replicas must be coordinated
  • Arbitrary failures require more replicas and more coordination

[Figure: replication and coordination]

SLIDE 5

Replication

Two main design choices: Active Replication (State Machine Replication) vs. Passive Replication

SLIDE 6

Active Replication (AR)

State Machine approach:

  • 1. System receives the requests
  • 2. Requests are ordered (“many” messages)
  • 3. Enough replicas execute them
  • 4. Each replica returns an answer
  • 5. Answers are voted

[Figure: client C and replicas R1-R4 running a request ordering protocol, steps 1-5]
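The voting in step 5 can be sketched as below. This is a minimal illustration, assuming the client accepts an answer once f+1 replicas report it, so that at least one correct replica vouches for it; the function `vote` is hypothetical, and real protocols also authenticate every reply.

```python
from collections import Counter
from typing import Hashable, Optional, Sequence


def vote(replies: Sequence[Hashable], f: int) -> Optional[Hashable]:
    """Return an answer reported by at least f+1 replicas, else None.

    With at most f faulty replicas, f+1 identical replies include at least
    one reply from a correct replica.
    """
    if not replies:
        return None
    answer, count = Counter(replies).most_common(1)[0]
    return answer if count >= f + 1 else None


# Replies from R1..R4 with f = 1; R3 is faulty.
print(vote(["42", "42", "13", "42"], f=1))  # 42
print(vote(["42", "13"], f=1))              # None (wait for more replies)
```

Because faulty replicas contribute at most f replies, no wrong answer can reach the f+1 threshold.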

SLIDE 7

Passive Replication (PR)

  • 1. Primary receives the requests
  • 2. Requests are executed
  • 3. State updates are broadcast
  • 4. Backups apply updates and return ACKs
  • 5. Primary votes on ACKs
  • 6. Primary replies to client

[Figure: client C and replicas R1-R4, steps 1-6]
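The six PR steps can be sketched in a single process as follows, assuming a service modeled as a function that returns a reply plus a state update; `Primary`, `Backup`, `acks_needed` and `increment` are hypothetical names. The backups apply whatever the primary sends, which is exactly why plain PR does not tolerate a Byzantine primary.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

State = Dict[str, int]
Operation = Callable[[State], Tuple[str, State]]  # state -> (reply, update)


@dataclass
class Backup:
    state: State = field(default_factory=dict)

    def apply(self, update: State) -> bool:
        # Step 4: apply the update (no re-execution) and return an ACK.
        self.state.update(update)
        return True


@dataclass
class Primary:
    backups: List[Backup]
    acks_needed: int
    state: State = field(default_factory=dict)

    def handle(self, op: Operation) -> str:
        reply, update = op(self.state)        # Steps 1-2: receive and execute
        self.state.update(update)
        acks = sum(b.apply(update) for b in self.backups)  # Steps 3-4
        if acks < self.acks_needed:           # Step 5: vote on the ACKs
            raise RuntimeError("not enough ACKs; reply withheld")
        return reply                          # Step 6: reply to the client


def increment(state: State) -> Tuple[str, State]:
    value = state.get("counter", 0) + 1
    return f"counter={value}", {"counter": value}


primary = Primary(backups=[Backup(), Backup(), Backup()], acks_needed=2)
print(primary.handle(increment))  # counter=1
```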

SLIDE 8

Current BFT Solutions

  • PBFT (OSDI’99): seminal practical SMR work
  • Correia et al. (SRDS’04): hybrid model with TTCB
  • Zyzzyva (SOSP’07): speculative execution
  • Prime (DSN’08): bounded-delay guarantee
  • MinBFT (TC’11): fewer replicas in the hybrid model
  • CheapBFT (EuroSys’12): hybrid model, passive replicas activated upon failures
  • BFT-SMaRt (DSN’14): high performance
  • …and many others!
SLIDE 9

Why no PR solutions?

SLIDE 10

Why no PR solutions?

[Figure (AR): replicas R1-R4 feed a voter system, which returns the correct answer to the client ✔︎]

  • Enough redundancy to extract the correct answer
SLIDE 11

Why no PR solutions?

[Figure: AR (replicas R1-R4, voter system, correct answer to the client ✔︎) vs. PR (replicas R1-R3; is the answer the client gets correct?)]

  • Challenge: how to verify the result efficiently?
  • Trivial but inefficient solution: re-execute the service
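The trivial solution amounts to re-executing the request on a private copy of the state and comparing. A minimal sketch with hypothetical names (`verify_by_reexecution`, `deposit`, `draw_ticket`), assuming a service modeled as a function from state to (reply, update): each verifier pays a full execution, and a non-deterministic service can fail the check even when the primary is honest.

```python
import random
from typing import Callable, Dict, Tuple

State = Dict[str, int]
Operation = Callable[[State], Tuple[str, State]]  # state -> (reply, update)


def verify_by_reexecution(op: Operation, state: State,
                          claimed: Tuple[str, State]) -> bool:
    """Re-run `op` on a copy of the state and compare with the claimed result."""
    return op(dict(state)) == claimed


def deposit(state: State) -> Tuple[str, State]:
    balance = state.get("balance", 0) + 10
    return f"balance={balance}", {"balance": balance}


def draw_ticket(state: State) -> Tuple[str, State]:
    # Non-deterministic: the result depends on a random choice.
    ticket = random.randint(1, 10**6)
    return f"ticket={ticket}", {"last_ticket": ticket}


state = {"balance": 5}
print(verify_by_reexecution(deposit, state, deposit(dict(state))))      # True
print(verify_by_reexecution(deposit, state, ("balance=1000", {"balance": 1000})))  # False
# An honest non-deterministic result almost never matches a re-execution.
print(verify_by_reexecution(draw_ticket, state, draw_ticket(dict(state))))
```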

SLIDE 12

Pros & Cons

                   AR                    PR
Byzantine FT       ✔︎                     ✗
Replicas           2f+1                  2f+1
Re-computations    O(n)                  O(1)
Message size       |request| + |input|   |reply| + |update|
Non-determinism    ✗                     ✔

“While some consensus algorithms, such as Paxos […] have started to find their way into those systems, their uses are limited mostly to the maintenance of the global configuration information in the system, not for the actual data replication.” – L. Lamport et al.
SLIDE 13

Outline

  • Motivation and background
  • Goals
  • Architecture Design & System Operations
  • Evaluation
  • Takeaways
SLIDE 14

Goals

Fault-tolerant, resource-efficient and simple replicated architecture for unmodified services

Challenges

  • Protect the service results from malicious failures
  • Verify the results efficiently
  • Ensure that state updates are correctly propagated
  • Ensure that the client gets correct and consistent results
SLIDE 15

Outline

  • Motivation and background
  • Goals
  • Architecture Design & System Operations
  • Evaluation
  • Takeaways
SLIDE 16

V-PR

Verified Passive Replication

SLIDE 17

Best of Both Worlds

                                  AR                    PR                   V-PR
Byzantine FT                      ✔︎                     ✗                    ✔
Replicas (w/ trust assumptions)   2f+1                  2f+1                 2f+1
Executions                        O(n)                  O(1)                 O(1)
Message size                      |request| + |input|   |reply| + |update|   |reply| + |update|
Non-determinism                   ✗                     ✔                    ✔

SLIDE 18

TCC Overview

  • Trusted Computing Component (TCC)
  • It performs actual general-purpose computation
  • It provides trusted services (TPM-like)
  • It has internal registers that store the identity (i.e., hash) of the running code
  • Primitives
    • put(data, ID)/get(data, ID): TCC-backed, ID-based secure external storage. Only the same ID can store and retrieve the data
    • execute(code, input): TCC-backed isolated execution of arbitrary code. The running code is identified for ID-based operations
    • attest(): TCC signature that can carry information on the running code and its results
    • create/get/incr_counter(ID, name): access-controlled trusted counters. Only ID can read or modify them
    • verify(): checks the validity of an attestation through the manufacturer certificate

No additional assumptions with respect to previous work, just a more powerful TCC!
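To make the primitives concrete, here is a software mock of them. It is a hypothetical stand-in, not the V-PR or XMHF-TrustVisor API: the ID argument is modeled as the identity register set by `execute()`, and attestations use an HMAC key instead of a manufacturer-certified signing key, so `verify()` here only works on the same mock instance.

```python
import hashlib
import hmac
import secrets
from typing import Callable, Dict, Tuple, TypeVar

T = TypeVar("T")
Report = Tuple[str, bytes, bytes]  # (code identity, attested data, tag)


class MockTCC:
    """Hypothetical software mock of the TCC primitives (illustration only)."""

    def __init__(self) -> None:
        # A real TCC keeps key material in hardware; this key is a stand-in.
        self._key = secrets.token_bytes(32)
        self._storage: Dict[Tuple[str, str], bytes] = {}
        self._counters: Dict[Tuple[str, str], int] = {}
        self._running_id = ""  # identity register: hash of the running code

    def execute(self, code: bytes, entry: Callable[["MockTCC"], T]) -> T:
        """Measure `code` into the identity register, then run `entry`."""
        self._running_id = hashlib.sha256(code).hexdigest()
        try:
            return entry(self)
        finally:
            self._running_id = ""

    def _caller(self) -> str:
        if not self._running_id:
            raise PermissionError("TCC primitives are only usable inside execute()")
        return self._running_id

    def put(self, name: str, data: bytes) -> None:
        # Secure storage bound to the running code's identity.
        self._storage[(self._caller(), name)] = data

    def get(self, name: str) -> bytes:
        # Raises KeyError if the same identity never stored `name`.
        return self._storage[(self._caller(), name)]

    def create_counter(self, name: str) -> None:
        self._counters.setdefault((self._caller(), name), 0)

    def get_counter(self, name: str) -> int:
        return self._counters[(self._caller(), name)]

    def incr_counter(self, name: str) -> int:
        key = (self._caller(), name)
        self._counters[key] += 1
        return self._counters[key]

    def _tag(self, code_id: str, data: bytes) -> bytes:
        return hmac.new(self._key, code_id.encode() + data, hashlib.sha256).digest()

    def attest(self, data: bytes) -> Report:
        code_id = self._caller()
        return code_id, data, self._tag(code_id, data)

    def verify(self, report: Report) -> bool:
        code_id, data, tag = report
        return hmac.compare_digest(tag, self._tag(code_id, data))


def manager(tcc: MockTCC) -> Report:
    # Hypothetical Manager step: bump a trusted counter and attest its value.
    tcc.create_counter("seq")
    seq = tcc.incr_counter("seq")
    return tcc.attest(f"seq={seq}".encode())


tcc = MockTCC()
report = tcc.execute(b"manager-v1", manager)
print(report[1], tcc.verify(report))  # b'seq=1' True
```

Keying storage and counters by the code hash is what makes them "ID-based": different code gets a different, independent namespace.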

SLIDE 19

Model

  • TCC is crash-only; the rest of the system can fail arbitrarily (Byzantine)
  • TCC only usable through its primitives
  • Correct majority of replicas
  • Asynchronous model for safety, partially synchronous otherwise (i.e., for liveness)
  • Model does not consider:
    • Denial-of-Service attacks
    • Physical tampering (at least not with the TCC hardware)
    • Service vulnerabilities
SLIDE 20

V-PR Architecture

[Figure: V-PR architecture. Client: service client, Security MW (SMW), OS. Primary: Service, Manager, U-Manager, OS, TCC. Backup: Update Svc, Manager, U-Manager, OS, TCC. All connected through the network.]

SLIDE 21

V-PR Architecture

  • Core components: SMW, Manager, U-Manager
  • Update service only applies state updates

[Figure: V-PR architecture diagram (client, primary, backup)]

SLIDE 22

V-PR Architecture

  • Service Client and Service are not modified
  • Significant effort to make V-PR service-oblivious

[Figure: V-PR architecture diagram (client, primary, backup)]

SLIDE 23

V-PR Architecture

[Figure: V-PR architecture diagram, with trusted and untrusted environments marked on the primary and the backup]

  • Dual failure model (crash + Byzantine)
  • Two execution environments with different trust assumptions
  • Entry point: execute(Manager) to invoke the TCC service
SLIDE 24

Read Requests

[Figure: V-PR architecture diagram; read path: 1. client request/reply, 2. execute]

  • The client SMW can verify the primary’s execution and establish a session key with the Manager

  • No state updates => read request
  • 2 messages
SLIDE 25

Write Requests

[Figure: V-PR architecture diagram; write path: 3. state updates/ACKs, 4. trusted updates, 6. check ACKs]

  • State update available => write request
  • 4 steps (of message passing) overall
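The read/write rule from the two slides above can be sketched as a dispatcher on the primary, assuming the service returns a reply plus a (possibly empty) state update. All names are hypothetical, and V-PR's TCC executions, attestations and session-key checks are deliberately left out; the sketch only shows how the presence of an update selects the 2-message read path or the 4-step write path.

```python
from typing import Callable, Dict, List, Tuple

State = Dict[str, str]
Operation = Callable[[State], Tuple[str, State]]  # state -> (reply, update)
BackupFn = Callable[[State], bool]                # applies an update, returns ACK


def make_backup() -> BackupFn:
    """Hypothetical backup: keeps its own copy and ACKs every update."""
    replica_copy: State = {}

    def apply_update(update: State) -> bool:
        replica_copy.update(update)
        return True

    return apply_update


def serve(op: Operation, state: State, backups: List[BackupFn],
          acks_needed: int) -> Tuple[str, List[str]]:
    """Run one request; return the reply and the message steps taken."""
    steps = ["client -> primary: request"]
    reply, update = op(state)
    if not update:
        # No state update: read request, answered directly (2 messages).
        steps.append("primary -> client: reply")
        return reply, steps
    # State update available: write request, propagated before replying.
    state.update(update)
    steps.append("primary -> backups: state update")
    acks = sum(1 for apply_update in backups if apply_update(update))
    steps.append("backups -> primary: ACKs")
    if acks < acks_needed:
        raise RuntimeError("insufficient ACKs; reply withheld")
    steps.append("primary -> client: reply")
    return reply, steps


def read_k(state: State) -> Tuple[str, State]:
    return state.get("k", ""), {}


def write_k(state: State) -> Tuple[str, State]:
    return "ok", {"k": "v2"}


db: State = {"k": "v"}
replicas = [make_backup() for _ in range(2)]
print(len(serve(read_k, db, replicas, acks_needed=2)[1]))   # 2
print(len(serve(write_k, db, replicas, acks_needed=2)[1]))  # 4
```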

SLIDE 26

Outline

  • Motivation and background
  • Goals
  • Architecture Design & System Operations
  • Evaluation
  • Takeaways
SLIDE 27

Evaluation

SLIDE 28

Implementation

[Figure: software stack: Hardware, XMHF, TrustVisor; Manager and Service in the trusted environment (TCC)]

  • Message passing with ZeroMQ
  • TCC with XMHF-TrustVisor (S&P’10, S&P’13)
  • Full SQLite database engine
  • VPR-ed SQLite
  • OS-free implementation
  • Very small TCB
  • Against recent AR schemes:
    • BFT-SMaRt (IEEE DSN’14)
    • Prime (IEEE TDSC’11)
SLIDE 29

Performance

  • Overhead comparison among BFT-SMaRt, Prime and V-PR

[Figure: read latency (ms) vs. batch size for BFT-SMaRt and V-PR; write latency (ms) vs. batch size for BFT-SMaRt, V-PR and Prime]

SLIDE 30

VPR-ed SQLite

[Figure: read and write latency (ms) vs. batch size for VPR-ed SQLite]

  • Realistic trusted executions are the bottleneck
  • 2 TCC executions at the primary (for write requests)
  • In pessimistic runs, 1 more TCC execution at the backups

SLIDE 31

Outline

  • Motivation and background
  • Goals
  • Architecture Design & System Operations
  • Evaluation
  • Takeaways
SLIDE 32

Takeaways

  • Easy to design fault-tolerant protocols using hardware-based security

  • V-PR is the first fully-passive replication scheme that tolerates Byzantine failures
  • No additional assumptions (compared to previous literature)
  • Linear factor reduction in executing replicas
  • Non-determinism supported by design
  • Main limitation is the current technology
  • …but it’s making progress, check out Intel SGX
SLIDE 33

Thanks.

SLIDE 36

System Initialization

  • Need to form a secure group
  • If other replicas participate, they could later be shut down (state loss)
  • Share a unique key K (use TCC secure storage for confidentiality)
  • Start from same initial state

[Figure: initialization message sequence among MPrimary, MBackup and Admin; labels: attested JOIN, check attestation, attested ACCEPT + encr.{K}, check ACKs, install initial state, ACK, initial state, TCC cert., check attestation]

SLIDE 37

Primary Change

  • Primary identified through local view counter
  • Each replica answers to only one specific primary
  • Detect the primary’s failure through timeouts (partial synchrony)
  • Start the primary change protocol, but keep answering the primary’s updates
  • Exchange messages to increment the view counter
  • Eventually, no progress => new primary
  • Extreme cases
    • Multiple primaries: safe, because only one can make progress
    • Only one view increment:
      • the replica waits for the others to change primary
      • the replica can make progress through consecutive updates anyway
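The view-counter logic can be sketched for a single replica as follows. This is a hypothetical illustration, not the V-PR protocol: it assumes a round-robin choice of the primary per view and a switch to view v+1 once a majority of replicas has asked for it, in line with the correct-majority assumption. Until the switch, the replica keeps accepting the current primary's updates.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class Replica:
    """Hypothetical view-change state of one replica (illustration only)."""
    replica_id: int
    n: int                                         # number of replicas
    view: int = 0                                  # local view counter
    votes: Set[int] = field(default_factory=set)   # who asked for view + 1

    def primary(self) -> int:
        # Assumption: the primary of view v is replica v mod n.
        return self.view % self.n

    def accepts_update_from(self, sender: int) -> bool:
        # Keep answering the current primary's updates, and only those.
        return sender == self.primary()

    def on_timeout(self) -> int:
        """Suspect the primary: vote for the next view and return its number."""
        self.votes.add(self.replica_id)
        return self.view + 1

    def on_view_change_vote(self, sender: int, target_view: int) -> bool:
        """Count a vote for view + 1; switch once a majority has voted."""
        if target_view != self.view + 1:
            return False
        self.votes.add(sender)
        if len(self.votes) > self.n // 2:
            self.view += 1
            self.votes.clear()
            return True
        return False


r = Replica(replica_id=1, n=3)
target = r.on_timeout()                   # r votes for view 1 itself
print(r.on_view_change_vote(2, target))   # True: 2 of 3 replicas voted
print(r.primary())                        # 1
```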
SLIDE 38

Implementation

  • Message passing w/ high-performance library ZeroMQ
  • TCC with XMHF (S&P’13) and TrustVisor (S&P’10)
  • Full SQLite database engine
  • VPR-ed SQLite

[Figure: client, primary and backup communicating over the network through a client broker and a replica broker]

SLIDE 39

Implementation

[Figure: software stack: Hardware, XMHF, TrustVisor; Manager and Service in the trusted environment (TCC)]

  • Some challenges addressed:
  • Extending the hypervisor to provide dynamic resource management and trusted counters

  • Running the service in the trusted environment (no OS support, no access to devices such as the disk): created custom APIs (memory allocation, debugging, etc.) and a custom filesystem (as a module, so no modification to SQLite)

SLIDE 40

Reducing TCC Demand

[Figure: V-PR architecture diagram; 4. untrusted updates]

  • Speculative update: the backup validates it and sends an ACK
  • No TCC execution at the backups => 1 active TCC, the rest are passive
  • Backup ACKs required: 2f+1 (yes, all of them, so that at least one correct backup is always available)

SLIDE 41

Blinder

[Figure: V-PR architecture diagram; 2. execute/blind reply, 5. unblind reply]

  • The reply’s authenticator is blinded during the update
  • The U-Manager cannot send it back to the client and break consistency
  • The reply is unblinded after the ACKs are validated
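One way to realize such blinding is sketched below, assuming an HMAC authenticator under the client session key and a one-time XOR pad that only the trusted side holds. The slides do not specify the scheme, so this construction and all names (`authenticate`, `blind`, `unblind`, `client_accepts`) are hypothetical.

```python
import hashlib
import hmac
import secrets
from typing import Tuple


def authenticate(session_key: bytes, reply: bytes) -> bytes:
    """Reply authenticator the client SMW can check (HMAC-SHA256)."""
    return hmac.new(session_key, reply, hashlib.sha256).digest()


def blind(auth: bytes) -> Tuple[bytes, bytes]:
    """Mask the authenticator with a one-time pad; the pad stays trusted."""
    pad = secrets.token_bytes(len(auth))
    return bytes(a ^ p for a, p in zip(auth, pad)), pad


def unblind(blinded: bytes, pad: bytes) -> bytes:
    """XOR with the same pad restores the original authenticator."""
    return bytes(b ^ p for b, p in zip(blinded, pad))


def client_accepts(session_key: bytes, reply: bytes, auth: bytes) -> bool:
    return hmac.compare_digest(auth, authenticate(session_key, reply))


key = secrets.token_bytes(32)  # session key shared by client SMW and Manager
reply = b"row updated"
blinded, pad = blind(authenticate(key, reply))  # step 2: execute/blind reply
# The untrusted U-Manager only holds `blinded`; forwarding it early is rejected.
print(client_accepts(key, reply, blinded))                # False (unless the pad is all zeros)
# Step 5: once the ACKs are validated, the unblinded authenticator is released.
print(client_accepts(key, reply, unblind(blinded, pad)))  # True
```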

SLIDE 42

Code size

[Figure: code size (KSLoC) for AR, VPR Primary and VPR Backup (components: Update, Network, SQLite); V-PR average vs. AR]

  • Actively used code in fault-free scenario
  • KSLoC=thousand lines of source code
  • VPR Backup’s code is independent of the implemented service
  • Measurement of service code is not included