Tiered Fault Tolerance for Long-Term Integrity
Byung-Gon Chun (Intel Research Berkeley)
Joint work with Petros Maniatis (Intel Research Berkeley), Scott Shenker (UC Berkeley, ICSI), and John Kubiatowicz (UC Berkeley)
Long-term applications
write(x‐file, ) read(x‐file)
Near-term solutions do not fit
- BFT replicated systems: correct only if the number of faulty replicas is always less than some fixed threshold (1/3 of the replicas)
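The threshold arithmetic behind this assumption can be made concrete with a short sketch (the function name here is illustrative, not from the talk):

```python
# With n replicas, BFT agreement protocols such as PBFT tolerate at most
# f Byzantine replicas, where n >= 3f + 1 must hold at all times.
def max_faulty(n: int) -> int:
    """Largest f such that n >= 3f + 1."""
    return (n - 1) // 3
```

For the four-replica configurations used later in the talk this gives f = 1, and the bound must hold at every moment of the system's lifetime, which is exactly what long-term services cannot promise.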
Near-term solutions do not fit
[Diagram: four replica nodes under a single fault threshold]
A new approach to designing long-term applications
- The reliability of a system's components can vary dramatically over long spans of time
- Consider this differentiation for long-term applications => tiered fault-tolerant system framework
- Apply the framework to construct Bonafide, a long-term key-value store
Roadmap
- Tiered fault tolerance framework
- Bonafide: a long‐term key‐value store
– Tiers: Trusted, Semi‐trusted, Untrusted
- Evaluation
Monolithic fault‐tolerant system model
[Diagram: four nodes, all under one uniform fault assumption]
Tiered fault‐tolerant system model
[Diagram: four nodes whose components are split across different fault tiers]
Sources of differentiation
- Different assurance practices
– Formally verified components vs. type-unsafe software
- Care in the deployment of a system
– Tight physical access controls and responsive system administration vs. an unreliable organization
- Rolling procurement of hardware and software
– A trusted logical component vs. a less trusted component
- Limited exposure
– Mostly offline vs. online
Reallocation of dependability budget
- Use differentiation to refactor systems into multiple components in different fault tiers
- Different operational practices for each component class

High-trust component: formally verified, limited functionality, run infrequently/briefly
Low-trust component: buggier, larger, run continuously
Roadmap
- Tiered fault tolerance framework
- Bonafide: a long‐term key‐value store
– Tiers: Trusted, Semi‐trusted, Untrusted
- Evaluation
Bonafide
- A key-value store designed to provide long-term integrity using the tiered fault framework
– Non-self-certifying data
– A naming service for self-certifying archival storage
- Simple interface:
– Add(key, value)
– Get(key) -> value
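As a rough, unreplicated sketch of these semantics (the class name and the write-once behavior are assumptions here, chosen to fit a naming service for immutable archival data):

```python
# Minimal local model of the Add/Get interface. Keys are assumed write-once:
# a key binds to one value forever, matching a long-term naming service.
class KeyValueStore:
    def __init__(self):
        self._store = {}

    def add(self, key, value):
        if key in self._store:
            raise KeyError("key is already bound")  # bindings never change
        self._store[key] = value

    def get(self, key):
        return self._store[key]
```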
Design rationale
- Refactor the functionality of the service into
– a more reliable fault tier for state changes
– a less reliable fault tier for read-only state queries
- Isolation between these two tiers
– Trusted component for protecting state during execution of the unreliable tier
– Use an algorithm to protect large service state with the component
- Mask faults of the component in the more reliable tier
– Use a BFT replicated state machine
– Mostly offline, execute in a synchronized fashion
Operation of Bonafide

[Timeline diagram: nodes 1..N (N = 3f+1) alternate over time between S (Service) and U (Update) phases]
Components in Bonafide and their associated fault tiers

Fault bound     Component                       When        How used
Trusted         Watchdog                        Periodic    Invoked
Trusted         MAS (Moded Attested Storage)    S phase     Read
                                                U phase     Written/Read
1/3 Byzantine   Update                          U phase     Replicate store, serve Adds
Unbounded       Service                         S phase     Serve Gets, buffer Adds, Audit/Repair
Guarantees
- Guarantees integrity of returned data under our tiered fault assumption
- Ensures liveness of S phases with fewer than 2/3 faulty replicas during S phases
- Ensures durability if the system creates copies of data faster than they are lost
Bonafide replica state and process

[Diagram: trusted storage holds the Moded Attested Storage (MAS); untrusted storage holds the Authenticated Search Tree (AST) and an Add buffer. Gets, Adds, and Audit/Repair run in the S phase; the Update process runs in the U phase]
Top tier: trusted
- Cryptography and trusted hardware
- Watchdog: time source, periodic reboot, sets a mode bit of MAS
- MAS: a mode bit, a set of storage slots, a signing key
– Store(q, v): store value v at slot q, only in U phases
– Lookup(q, z) -> value v of slot q and a fresh attestation (nonce z)
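A toy sketch of the MAS interface under these rules, where HMAC stands in for the real signing key (the structure, not the crypto, is the point, and all names are illustrative):

```python
import hashlib
import hmac

# Illustrative sketch of Moded Attested Storage: a mode bit gates writes,
# and every lookup returns a nonce-fresh attestation over the slot contents.
class MAS:
    def __init__(self, signing_key: bytes, nslots: int):
        self._key = signing_key
        self._slots = [None] * nslots
        self.u_phase = False  # mode bit, flipped by the trusted watchdog

    def store(self, q: int, v: bytes) -> None:
        if not self.u_phase:
            raise PermissionError("Store is allowed only in U phases")
        self._slots[q] = v

    def lookup(self, q: int, z: bytes):
        """Return the slot value and an attestation fresh w.r.t. nonce z."""
        v = self._slots[q] or b""
        att = hmac.new(self._key, bytes([q]) + v + z, hashlib.sha256).digest()
        return v, att
```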
Bottom tier: Get
[Get operation (S phase): the client sends <Get, k, z> to all replicas; each replies with <Reply, k, v, proof, <rd, z>>; the client accepts after f+1 (= 2) valid matching responses]
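The client-side acceptance rule can be sketched as follows (a sketch only: the validity check, which would verify the proof and attestation against nonce z, is stubbed out as a callback):

```python
# Accept a Get result once f+1 replicas return valid, matching values.
def collect_get_result(replies, f, is_valid):
    """replies: iterable of (replica_id, value) pairs, in arrival order.
    Returns the first value confirmed by f+1 valid matching replies."""
    counts = {}
    for _, value in replies:
        if not is_valid(value):
            continue  # bad proof or stale attestation: discard
        counts[value] = counts.get(value, 0) + 1
        if counts[value] >= f + 1:
            return value
    return None  # not enough matching valid replies
```

With f+1 matching replies, at least one comes from a correct replica, so a quorum of faulty replicas alone cannot forge a result.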
Bottom tier: Add
[Add operation (S phase): the client sends <Add, k, v> to all replicas; replies <Reply, k, v, proof, <rd, z>> carrying MAS attestations are sent after the following U phase; the client accepts after f+1 (= 2) valid matching responses]
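On the replica side, the deferred-reply behavior for Adds might look like this sketch (names are illustrative, not from the implementation):

```python
# Adds received in an S phase are only buffered; they become durable in the
# next U phase, after which attested replies can finally be sent.
class AddBuffer:
    def __init__(self):
        self.pending = []   # adds accepted during the current S phase
        self.applied = {}   # durable state, updated only in U phases

    def on_add(self, key, value):
        """S phase: record the request, send no attested reply yet."""
        self.pending.append((key, value))

    def run_u_phase(self):
        """U phase: apply buffered adds, then return them for acknowledgment."""
        for key, value in self.pending:
            self.applied.setdefault(key, value)  # first binding wins
        acked = list(self.pending)
        self.pending.clear()
        return acked  # replies with MAS attestations go out now
```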
Bottom tier: audit and repair
[Diagram: audit checks stored state against the MAS; repair fetches correct data from other replicas]
Middle tier: update process
[Timeline: reboot, then 2f+1 (= 3) PBFT agreements, then AST update/checkpoint]
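The per-replica U-phase sequence can be sketched with the concrete steps passed in as callbacks (an illustrative sketch under the talk's step names, not the actual implementation):

```python
# One U phase: reboot to a clean state, propose the buffered adds, reach
# PBFT agreement among 2f+1 replicas, then update the AST and checkpoint.
def u_phase(reboot, create_proposal, agree, update_ast_and_checkpoint, f=1):
    reboot()                       # watchdog-forced clean restart
    proposal = create_proposal()   # batch of adds buffered in the S phase
    decided = agree(proposal, quorum=2 * f + 1)  # PBFT agreement
    update_ast_and_checkpoint(decided)  # apply to AST, checkpoint via MAS
    return decided
```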
Evaluating the performance of the Bonafide implementation
- A prototype built with the sfslite, PBFT, and Berkeley DB libraries
– Server Add/Get, Audit/Repair, and Update processes
– Client proxy process
- Experiment setup
– Four replica nodes (outdated P4 PCs) running Fedora in a LAN
– 1 million key-value pairs initially populated
– Measured: Add/Get time, Audit/Repair time, U phase duration
Performance evaluation

Get/Add time:
Operation   Time (ms), mean (std)
Get         3.1 (0.24)
Add         1.0 (0.21)

Audit/Repair time:
Data loss (%)   Time (s), mean (std)
0               554.5 (54.6)
1               612.9 (30.3)
10              1147.6 (33.3)
100             3521.5 (201.6)

U phase duration:
Action                  Time (s), mean (std)
Reboot                  86.6 (2.1)
Proposal creation       8.0 (4.0)
Agreement               5.2 (1.0)
AST update/Checkpoint   271.1 (24.8)
Total                   370.9 (24.0)
Availability
[Plot: availability (0.95-1.0) vs. U phase duration (1-9 minutes), for U phase periods of 3, 6, and 9 hours]
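These curves follow from a simple model, assuming Gets are unavailable only while a U phase runs, so availability is roughly 1 - duration/period:

```python
# Availability under the assumption that the service is down only during
# U phases: fraction of each period not spent in the U phase.
def availability(u_duration_min: float, u_period_hours: float) -> float:
    return 1.0 - u_duration_min / (u_period_hours * 60.0)
```

For example, a 6-minute U phase every 9 hours gives about 98.9% availability, consistent with the plotted range.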
Related work
- BFT systems
– PBFT, PBFT-PR, COCA
– BFT2F, A2M-PBFT, A2M
– BFT erasure-coded storage
- Differentiating trust levels
– Hybrid system model (wormholes)
– Hybrid fault model
– Different fault thresholds for different sites or clusters
- Long-term stores
– Self-certifying bitstore
– Antiquity, OceanStore, Pergamum, Glacier, etc.
– LOCKSS, POTSHARDS, CATS
Conclusion
- Presented a tiered fault-tolerant system framework
– A2M (SOSP '07), Bonafide (FAST '09), TrInc (NSDI '09)
- Built Bonafide, a safer key-value store (of non-self-certifying data)