Tiered Fault Tolerance for Long‐Term Integrity, Byung‐Gon Chun (Intel Research Berkeley)
SLIDE 1

Tiered Fault Tolerance for Long‐Term Integrity

Byung‐Gon Chun (Intel Research Berkeley) Joint work with Petros Maniatis (Intel Research Berkeley), Scott Shenker (UC Berkeley, ICSI), and John Kubiatowicz (UC Berkeley)

SLIDE 2

Long‐term applications

write(x‐file, ) read(x‐file)

SLIDE 3

Near‐term solutions do not fit

  • BFT replicated systems: correct only if the number of faulty replicas is always less than some fixed threshold (1/3 of the replicas)

SLIDE 4

Near‐term solutions do not fit

[Figure: four replica nodes]

SLIDE 5

A new approach to designing long‐term applications

  • The reliability of a system's components over long spans of time can vary dramatically
  • Consider this differentiation for long‐term applications => tiered fault‐tolerant system framework
  • Apply the framework to construct Bonafide, a long‐term key‐value store

SLIDE 6

Roadmap

  • Tiered fault tolerance framework
  • Bonafide: a long‐term key‐value store
    – Tiers: Trusted, Semi‐trusted, Untrusted
  • Evaluation
SLIDE 7

Monolithic fault‐tolerant system model

[Figure: four replica nodes, all in a single fault tier]

SLIDE 8

Tiered fault‐tolerant system model

[Figure: four replica nodes, with components split across fault tiers]

SLIDE 9

Sources of differentiation

  • Different assurance practices
    – Formally verified components vs. type‐unsafe software
  • Care in the deployment of a system
    – Tight physical access controls and responsive system administration vs. an unreliable organization
  • Rolling procurement of hardware and software
    – A trusted logical component vs. a less trusted component
  • Limited exposure
    – Mostly offline vs. online

SLIDE 10

Reallocation of dependability budget

  • Use differentiation to refactor systems into multiple components in different fault tiers
  • Different operational practices for each component class

High‐trust component: formally verified, limited functionality, run infrequently/briefly
Low‐trust component: buggier, larger, run continuously

SLIDE 11

Roadmap

  • Tiered fault tolerance framework
  • Bonafide: a long‐term key‐value store
    – Tiers: Trusted, Semi‐trusted, Untrusted
  • Evaluation
SLIDE 12

Bonafide

  • A key‐value store designed to provide long‐term integrity using the tiered fault‐tolerance framework
    – Non‐self‐certifying data
    – A naming service for self‐certifying archival storage
  • Simple interface:
    – Add(key, value)
    – Get(key) ‐> value
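As a rough illustration of the two calls, here is a hypothetical in‐memory stand‐in for the interface (single node, no replication, proofs, or attestations, all of which real Bonafide adds); bindings are assumed write‐once, since Bonafide adds key‐value bindings rather than overwriting them:

```python
# Hypothetical single-node stand-in for Bonafide's Add/Get interface.
# Real Bonafide replicates this state across N = 3f+1 nodes and attaches
# integrity proofs to replies; this sketch only shows the call semantics.

class KeyValueStore:
    def __init__(self):
        self._bindings = {}

    def add(self, key, value):
        """Bind value to key; each key is bound at most once."""
        if key in self._bindings:
            raise KeyError(f"key {key!r} is already bound")
        self._bindings[key] = value

    def get(self, key):
        """Return the value bound to key, or None if unbound."""
        return self._bindings.get(key)

store = KeyValueStore()
store.add("x-file", b"contents of x-file")
print(store.get("x-file"))  # b'contents of x-file'
```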

SLIDE 13

Design Rationale

  • Refactor the functionality of the service into
    – A more reliable fault tier for state changes
    – A less reliable fault tier for read‐only state queries
  • Isolation between these two tiers
    – Trusted component for protecting state during execution of the unreliable tier
    – Use an algorithm to protect large service state with the component
  • Mask faults of the component in the more reliable tier
    – Use a BFT replicated state machine
    – Mostly offline, executed in a synchronized fashion

SLIDE 14

Operation of Bonafide

[Timeline: N = 3f+1 nodes alternate over time between S (Service) phases and U (Update) phases]

SLIDE 15

Components in Bonafide and their associated fault tiers

Fault bound      Component                       When       How used
Trusted          Watchdog                        Periodic   Invoked
Trusted          MAS (Moded Attested Storage)    S phase    Read
                                                 U phase    Written/Read
1/3 Byzantine    Update                          U phase    Replicate store, serve Adds
Unbounded        Service                         S phase    Serve Gets, buffer Adds, Audit/Repair

SLIDE 16

Guarantees

  • Guarantees integrity of returned data under our tiered fault assumption
  • Ensures liveness of S phases with fewer than 2/3 faulty replicas during S phases
  • Ensures durability if the system creates copies of data faster than they are lost
SLIDE 17

Bonafide replica state and process

[Diagram: Moded‐Attested Storage (MAS) in trusted storage; an Authenticated Search Tree (AST) and an Add buffer in untrusted storage. Get, Add, and Audit/Repair operate in the S phase; Update runs in the U phase]

SLIDE 18

Top tier: trusted

  • Cryptography and trusted hardware
  • Watchdog: time source, periodic reboot, sets the mode bit of MAS
  • MAS: a mode bit, a set of storage slots, a signing key
    – Store(q, v): store value v at slot q, only in U phases
    – Lookup(q, z) ‐> value v of slot q and a fresh attestation (nonce z)
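A minimal sketch of the MAS interface described above, assuming HMAC‐SHA256 as a stand‐in for the signing key's attestation scheme; the real component would be a small piece of trusted hardware/software, and the message encoding here is purely illustrative:

```python
import hmac
import hashlib

class ModedAttestedStorage:
    """Toy sketch of MAS: a mode bit, storage slots, and a signing key."""

    def __init__(self, signing_key: bytes):
        self._key = signing_key
        self._slots = {}
        self.u_phase = False  # mode bit, flipped by the trusted watchdog

    def store(self, q, v: bytes):
        """Store value v at slot q; allowed only during U phases."""
        if not self.u_phase:
            raise PermissionError("Store is allowed only in U phases")
        self._slots[q] = v

    def lookup(self, q, z: bytes):
        """Return the value at slot q plus a fresh attestation.

        The attestation covers (slot, value, nonce); the caller-supplied
        nonce z is what makes the attestation provably fresh.
        """
        v = self._slots.get(q, b"")
        msg = repr((q, v, z)).encode()
        attestation = hmac.new(self._key, msg, hashlib.sha256).digest()
        return v, attestation
```

A verifier holding the corresponding key recomputes the HMAC over the same (slot, value, nonce) tuple and compares.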

SLIDE 19

Bottom tier: get

[Protocol diagram: Get operation (S phase). The client sends <Get, k, z> to all replicas and accepts after collecting f+1 (=2) valid matching <Reply, k, v, proof, <rd, z>> responses]
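The f+1 matching-replies rule used by Get can be sketched as a client-side check. This is a simplified sketch: `is_valid` stands in for verifying the reply's AST proof and MAS attestation, which are not modeled here.

```python
from collections import Counter

def accept_reply(replies, f, is_valid):
    """Return a value once f+1 valid, matching replies have arrived.

    `replies` is an iterable of (value, proof) pairs from distinct
    replicas. With at most f faulty replicas, f+1 matching valid replies
    must include one from a correct replica, so the value is trustworthy.
    """
    counts = Counter()
    for value, proof in replies:
        if not is_valid(value, proof):
            continue  # drop replies whose proof fails verification
        counts[value] += 1
        if counts[value] >= f + 1:
            return value
    return None  # not enough matching valid replies (yet)

# With N = 4 replicas and f = 1, one bad reply cannot outvote the quorum:
replies = [(b"wrong", "bad"), (b"v", "ok"), (b"v", "ok")]
print(accept_reply(replies, 1, lambda v, p: p == "ok"))  # b'v'
```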

SLIDE 20

Bottom tier: add

[Protocol diagram: Add operation (S phase). The client sends <Add, k, v> to all replicas and waits for f+1 (=2) valid matching <Reply, k, v, proof, <rd, z>> responses; replies carrying the MAS attestation are sent only after the following U phase]

SLIDE 21

Bottom tier: audit and repair

[Diagram: the Audit/Repair process fetches stored data and checks it against MAS]
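The audit/repair step can be illustrated with a simplified sketch. Bonafide actually audits an authenticated search tree against a MAS-held root; this toy version checks a flat map of per-key digests instead, which is the same idea in miniature, and `fetch_from_peers` is a hypothetical helper that retrieves a good copy from other replicas.

```python
import hashlib

def audit_and_repair(local_store, trusted_digests, fetch_from_peers):
    """Scan local data, detect corrupt or missing entries, repair them.

    local_store:     dict of key -> bytes (untrusted local data)
    trusted_digests: dict of key -> hex SHA-256 digest (integrity-protected)
    fetch_from_peers(key): hypothetical helper returning a fresh copy
    """
    repaired = []
    for key, digest in trusted_digests.items():
        value = local_store.get(key)
        if value is None or hashlib.sha256(value).hexdigest() != digest:
            local_store[key] = fetch_from_peers(key)  # repair bad entry
            repaired.append(key)
    return repaired

store = {"a": b"alpha", "b": b"CORRUPTED"}
digests = {"a": hashlib.sha256(b"alpha").hexdigest(),
           "b": hashlib.sha256(b"beta").hexdigest()}
print(audit_and_repair(store, digests, lambda k: b"beta"))  # ['b']
```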

SLIDE 22

Middle tier: update process

[Timeline: U phase, consisting of a reboot, 2f+1 (=3) PBFT agreements, and an AST update/checkpoint]

SLIDE 23

Evaluating the performance of the Bonafide implementation

  • A prototype built with the sfslite, PBFT, and Berkeley DB libraries
    – Server: Add/Get, Audit/Repair, and Update processes
    – Client proxy process
  • Experiment setup
    – Four replica nodes (outdated P4 PCs) running Fedora in a LAN
    – 1 million key‐value pairs initially populated
    – Measured: Add/Get time, Audit/Repair time, U phase duration

SLIDE 24

Performance evaluation

Get/Add time:
Operation   Time (ms), mean (std)
Get         3.1 (0.24)
Add         1.0 (0.21)

Audit/Repair time:
Data loss (%)   Time (s), mean (std)
0               554.5 (54.6)
1               612.9 (30.3)
10              1147.6 (33.3)
100             3521.5 (201.6)

U phase duration:
Action                  Time (s), mean (std)
Reboot                  86.6 (2.1)
Proposal creation       8.0 (4.0)
Agreement               5.2 (1.0)
AST update/Checkpoint   271.1 (24.8)
Total                   370.9 (24.0)

SLIDE 25

Availability

[Plot: availability (0.95 to 1) vs. U phase duration (1 to 9 minutes), with one curve each for U phase periods of 9 hours, 6 hours, and 3 hours]
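The plotted curves follow from a simple model in which the service is assumed to be unavailable only while a U phase runs, so availability is roughly 1 minus the fraction of each period spent in the U phase. A quick check with the measured total U phase duration of about 371 s:

```python
def availability(u_duration_s, u_period_hours):
    """Fraction of time the service is up, assuming it is unavailable
    only while a U phase runs (a simplification of the talk's model)."""
    period_s = u_period_hours * 3600
    return 1 - u_duration_s / period_s

# Measured total U phase duration was ~370.9 s (about 6.2 minutes):
print(round(availability(370.9, 9), 4))  # 0.9886 with a 9-hour period
print(round(availability(370.9, 3), 4))  # 0.9657 with a 3-hour period
```

Both values fall in the 0.95 to 1 band of the plot, and shorter U phase periods predictably cost more availability.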

SLIDE 26

Related work

  • BFT systems
    – PBFT, PBFT‐PR, COCA
    – BFT‐2F, A2M‐PBFT, A2M
    – BFT erasure‐coded storage
  • Differentiating trust levels
    – Hybrid system model (wormholes)
    – Hybrid fault model
    – Different fault thresholds for different sites or clusters
  • Long‐term stores
    – Self‐certifying bitstore
    – Antiquity, OceanStore, Pergamum, Glacier, etc.
    – LOCKSS, POTSHARDS, CATS

SLIDE 27

Conclusion

  • Present a tiered fault‐tolerant system framework
    – A2M (SOSP '07), Bonafide (FAST '09), TrInc (NSDI '09)
  • Build Bonafide, a safer key‐value store (of non‐self‐certifying data) for long‐term integrity with the framework

SLIDE 28

Thank you!