Prophecy: Using History for High Throughput Fault Tolerance
Siddhartha Sen
Joint work with Wyatt Lloyd and Mike Freedman
Princeton University
Non-crash failures happen
Model as Byzantine (malicious)
Mask Byzantine faults
Clients → Replicated service
- Throughput
- Linearizability (strong consistency)
Byzantine fault tolerance (BFT)
- Low throughput
- Modifies clients
- Long-lived sessions
Prophecy
- High throughput + good consistency
- No free lunch:
– Read-mostly workloads
– Slightly weakened consistency
Byzantine fault tolerance (BFT)
- Low throughput → addressed by D-Prophecy
- Modifies clients, long-lived sessions → addressed by Prophecy
Traditional BFT reads
Clients send each read to the replica group; every replica runs the application, and the client checks whether the responses agree.
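A minimal sketch of the client-side "Agree?" check, assuming 3f + 1 replicas of which at most f are Byzantine, and assuming the client accepts a response once f + 1 replicas return identical bytes, so at least one correct replica vouches for it (PBFT's read-only optimization waits for 2f + 1 matching replies instead). The function name `bft_read_result` is hypothetical.

```python
from collections import Counter
from typing import Optional, Sequence


def bft_read_result(responses: Sequence[bytes], f: int) -> Optional[bytes]:
    """Return a response vouched for by at least f + 1 replicas, else None.

    Assumes at most f Byzantine replicas: any f + 1 identical responses
    must include one from a correct replica.
    """
    if not responses:
        return None
    # Find the most common response and how many replicas sent it.
    value, count = Counter(responses).most_common(1)[0]
    return value if count >= f + 1 else None
```

Because every replica executes every read, adding replicas adds fault tolerance but not read throughput.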
A cache solution
Place a cache of past responses between the clients and the replica group, and check whether a new response agrees with the cached one.
Problems:
- Huge cache
- Invalidation
A compact cache
Rather than storing full request/response pairs (req1 → resp1, req2 → resp2, req3 → resp3), store only their sketches: sketch(req1) → sketch(resp1), sketch(req2) → sketch(resp2), sketch(req3) → sketch(resp3).
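The compact cache can be sketched as a table from request sketches to response sketches. SHA-256 as the sketch function and the `CompactCache` class are assumptions for illustration; the slides do not name a hash.

```python
import hashlib
from typing import Dict, Optional


def sketch(data: bytes) -> bytes:
    """Fixed-size sketch of a request or response (SHA-256 is an assumption)."""
    return hashlib.sha256(data).digest()


class CompactCache:
    """Hypothetical table sketch(request) -> sketch(response); no full responses kept."""

    def __init__(self) -> None:
        self._table: Dict[bytes, bytes] = {}

    def put(self, request: bytes, response: bytes) -> None:
        self._table[sketch(request)] = sketch(response)

    def matches(self, request: bytes, response: bytes) -> Optional[bool]:
        """True/False if a sketch is cached for this request, None if unknown."""
        cached = self._table.get(sketch(request))
        if cached is None:
            return None
        return cached == sketch(response)

    def stored_bytes(self) -> int:
        """Bytes held in keys and values (ignores dict overhead)."""
        return sum(len(k) + len(v) for k, v in self._table.items())
```

With 32-byte digests, a 1 MB response costs 32 bytes in the table, plus 32 bytes for its request key, which is what makes the cache small enough to keep.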
A sketcher
The compact cache becomes a sketcher placed between the clients and the replica group.
Executing a read
The sketcher stores a sketch of each webpage it has already seen. To execute a read:
- Send the request to one replica, which executes it and returns the full webpage.
- Sketch the response and check whether it agrees with the cached sketch. If it does, return it: fast, load-balanced reads.
- If it does not, execute the read across the whole replica group and update the cached sketch: maintain a fresh cache.
The replicated service need not be a web server; it can equally be a key-value store or any replicated state machine.
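A minimal sketch of this read path for a per-client sketcher, as in D-Prophecy. Replicas are modeled as plain functions, `bft_read` is a hypothetical stand-in for a read agreed on by the whole replica group, and SHA-256 sketches are likewise an assumption.

```python
import hashlib
import random
from typing import Callable, Dict, List

Replica = Callable[[bytes], bytes]  # hypothetical: executes a read, returns the response


def sketch(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


class ClientSketcher:
    """Hypothetical per-client sketcher illustrating fast and slow reads."""

    def __init__(self, replicas: List[Replica],
                 bft_read: Callable[[bytes], bytes],
                 rng: random.Random) -> None:
        self.replicas = replicas
        self.bft_read = bft_read                # stand-in for an agreed group read
        self.rng = rng
        self.cache: Dict[bytes, bytes] = {}     # sketch(request) -> sketch(response)
        self.fast = 0
        self.slow = 0

    def read(self, request: bytes) -> bytes:
        key = sketch(request)
        cached = self.cache.get(key)
        if cached is not None:
            # Fast path: a single randomly chosen replica executes the read.
            response = self.rng.choice(self.replicas)(request)
            if sketch(response) == cached:
                self.fast += 1
                return response
        # Slow path: agreed read, then refresh the cached sketch.
        response = self.bft_read(request)
        self.cache[key] = sketch(response)
        self.slow += 1
        return response
```

Note that the fast-path check cannot distinguish a current page from a replayed old one whose sketch is still cached.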
Did we achieve linearizability? No!
Executing a read (continued): suppose the webpage changes after its sketch was cached. A faulty replica can still return the old webpage, whose sketch agrees with the cached one, so the client accepts it without involving the rest of the group.
Fast reads may be stale.
Load balancing
Each fast read is sent to a randomly chosen replica, so a client rarely sees stale responses many times in a row:
$\Pr(k \text{ stale}) = g^k$
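To see where $\Pr(k \text{ stale}) = g^k$ comes from, assume each fast read independently picks a replica uniformly at random and that g is the probability a pick returns a stale response (for example, the fraction of faulty replicas; the slide does not define g, so this reading is an assumption). Then k stale reads in a row is the product of k independent events. The sketch computes the closed form and checks it by simulation; both function names are hypothetical.

```python
import random


def prob_k_stale(g: float, k: int) -> float:
    """Closed form: probability that k consecutive fast reads are all stale."""
    return g ** k


def simulate_k_stale(g: float, k: int, trials: int, seed: int = 1) -> float:
    """Monte Carlo estimate: each fast read is independently stale with probability g."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        if all(rng.random() < g for _ in range(k)):
            hits += 1
    return hits / trials
```

For instance, with g = 1/4 (one faulty replica out of four, an assumed setting), three stale reads in a row happen with probability 1/64.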
D-Prophecy vs. BFT
Traditional BFT:
- Each replica executes read
- Linearizability
D-Prophecy:
- One replica executes read
- “Delay-once” linearizability
Key-exchange overhead
(Chart; labeled values: 11% and 3%.)
Internet services
Clients → Replica Group
A proxy solution
Consolidate the per-client sketchers into one sketcher on a proxy that sits between the clients and the replica group.
Sketcher must be fail-stop (trusted):
- Services already trust middleboxes
- The sketcher is small and simple
Executing a read
Clients send ordinary requests q to the trusted sketcher, which keeps a Req/Resp table of sketches: for each request sketch s(q), the sketch of the corresponding response. The sketcher then executes the read against the replica group on the client's behalf, as in D-Prophecy.
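A sketch of the trusted proxy, assuming the same fast/slow read logic as D-Prophecy but with one Req/Resp table shared by all clients and keyed by s(q). `ProxySketcher`, `agreed_read`, and SHA-256 for s(·) are hypothetical. Because the table is shared, one client's slow read warms the cache for every other client, and clients stay unmodified: they only send ordinary requests.

```python
import hashlib
import random
from typing import Callable, Dict, List, Tuple

Replica = Callable[[bytes], bytes]  # hypothetical replica interface


def s(data: bytes) -> bytes:
    """Sketch function s(.); SHA-256 is an assumption."""
    return hashlib.sha256(data).digest()


class ProxySketcher:
    """Hypothetical trusted proxy holding one Req/Resp sketch table for all clients."""

    def __init__(self, replicas: List[Replica],
                 agreed_read: Callable[[bytes], bytes], seed: int = 0) -> None:
        self.replicas = replicas
        self.agreed_read = agreed_read           # stand-in for an agreed group read
        self.rng = random.Random(seed)
        self.table: Dict[bytes, bytes] = {}      # s(q) -> s(response)
        self.log: List[Tuple[str, str]] = []     # (client, "fast" or "slow")

    def handle_read(self, client: str, q: bytes) -> bytes:
        cached = self.table.get(s(q))
        if cached is not None:
            # Fast path: one randomly chosen replica executes the read.
            resp = self.rng.choice(self.replicas)(q)
            if s(resp) == cached:
                self.log.append((client, "fast"))
                return resp
        # Slow path: agreed read, then update the shared table.
        resp = self.agreed_read(q)
        self.table[s(q)] = s(resp)
        self.log.append((client, "slow"))
        return resp
```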
Prophecy
Prophecy keeps D-Prophecy's fast, load-balanced reads, with the same caveat: fast reads may be stale.
Delay-once linearizability
Read-after-write property
〈 W, R, W, W, R, R, W, R 〉
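The slides do not spell out the formal definition. As a rough illustration only, the sketch below checks one simplified, single-register reading of "delay once": each read may return the value of the most recent write or of the write just before it, never anything older. This reading is our assumption, not the paper's definition, and it does not model the read-after-write property. The history encoding and `delay_once_ok` are hypothetical.

```python
from typing import List, Tuple

Op = Tuple[str, int]  # ("W", version written) or ("R", version returned); 0 = initial state


def delay_once_ok(history: List[Op]) -> bool:
    """Simplified check: every read returns the latest version or the one before it."""
    latest = 0
    for kind, version in history:
        if kind == "W":
            if version != latest + 1:
                raise ValueError("writes must carry versions 1, 2, 3, ... in order")
            latest = version
        elif kind == "R":
            # Allowed: current value, or delayed by exactly one write.
            if version not in (latest, max(latest - 1, 0)):
                return False
        else:
            raise ValueError(f"unknown operation {kind!r}")
    return True
```

One possible labeling of the slide's sequence 〈W, R, W, W, R, R, W, R〉 is W1, R1, W2, W3, R2, R3, W4, R4; the first read after W3 returns version 2, delayed once, and the history passes this check.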
Example application
- Upload embarrassing photos:
- 1. Remove colleagues from ACL
- 2. Upload photos
- 3. (Refresh)
- Weak consistency may reorder these operations
- Delay-once preserves their order
Implementation
- Modified PBFT
– PBFT is stable, complete
– Competitive with Zyzzyva et al.
- C++, Tamer async I/O
– Sketcher: ∼2000 LOC
– PBFT library: ∼1140 LOC
– PBFT client: ∼1000 LOC
Evaluation
- Prophecy vs. proxied PBFT
– Proxied systems
- D-Prophecy vs. PBFT
– Non-proxied systems
- We will study:
– Performance on “null” workloads
– Performance with real replicated service
– Where system bottlenecks, how to scale
Basic setup
Clients (100, concurrent) → Sketcher → Replica Group (PBFT)
Fraction of failed fast reads
Alexa top sites: < 15%
Small benefit on null reads
Apache webserver setup
Clients → Sketcher → Replica Group
Large benefit on real workload
(Chart; labeled speedups: 3.7x and 2.0x.)
Benefit grows with work
(Chart; Apache marked at 94 μs.)
Null workloads are misleading!
Single sketcher bottlenecks
Scaling out
Scales linearly with replicas
Summary
- Prophecy good for Internet services
– Fast, load-balanced reads
- D-Prophecy good for traditional services
- Prophecy scales linearly while PBFT stays flat
- Limitations:
– Read-mostly workloads (measurement study corroborates)
– Delay-once linearizability (useful for many apps)
Thank You
Additional slides
Transitions
- Prophecy good for read-mostly workloads
- Are transitions rare in practice?
Measurement study
- Alexa top sites
- Access main page every 20 sec for 24 hrs
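The sampling schedule fixes how many snapshots each site contributes; assuming every fetch succeeds, that is:

```latex
\frac{24\,\text{h} \times 3600\,\text{s/h}}{20\,\text{s}}
  = \frac{86400\,\text{s}}{20\,\text{s}}
  = 4320 \text{ fetches per site}
```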
Mostly static content
(Chart; labeled value: 15%.)
Dynamic content
- Rabin fingerprinting on transitions
- 43% differ by single contiguous change
- Sampled 4000 of them, over half due to: