SLIDE 1

Tolerating Latency in Replicated State Machines through Client Speculation

April 22, 2009

Benjamin Wester¹, James Cowling², Edmund B. Nightingale³, Peter M. Chen¹, Jason Flinn¹, Barbara Liskov²

¹University of Michigan, ²MIT CSAIL, ³Microsoft Research

SLIDE 2

Simple Service Configuration

NSDI'09 Benjamin Wester University of Michigan CSE

[Figure: a client sends the request ++x to a single server; diagram labels: reply 1, server state x=1.]

SLIDE 3

Replicated State Machines (RSM)

[Figure: a client sends ++x to a group of replicas that each hold x=1; every replica applies the request, moves to x=2, and replies 2.]

  • Agree on request
  • All non-faulty replies are identical

SLIDE 4

RSMs have high latency

  • 1. Need many replies
  • 2. Agreement
  • 3. Geographic Distribution

SLIDE 5

Hide the Latency

  • Use speculative execution inside RSM
  • Speculate before consensus is reached

– Without faults, any reply predicts the consensus value

– Let the client continue after receiving one reply
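
The prediction step above can be illustrated with a short Python sketch. It is not the authors' implementation: the function names `predict` and `consensus_value` are hypothetical, and the acceptance rule of f + 1 matching replies is the usual one for a Byzantine fault tolerant RSM, assumed here for illustration.

```python
from collections import Counter

def predict(replies):
    """Speculative value: whatever the first reply to arrive says."""
    return replies[0]

def consensus_value(replies, f):
    """Value backed by at least f + 1 matching replies, or None if none yet."""
    if not replies:
        return None
    value, count = Counter(replies).most_common(1)[0]
    return value if count >= f + 1 else None

# Fault-free run with f = 1: every replica answers 2, so the first reply
# already equals the value that consensus later confirms.
replies = [2, 2, 2, 2]
guess = predict(replies)
confirmed = consensus_value(replies, f=1)
speculation_ok = (guess == confirmed)

# If a faulty replica happens to answer first, the prediction is wrong
# and the client would have to undo its speculative work.
bad_replies = [7, 2, 2, 2]
bad_speculation_ok = (predict(bad_replies) == consensus_value(bad_replies, f=1))
```

In the fault-free case the client loses nothing by continuing after the first reply; the second case shows why the prediction still has to be checked against the agreed value.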

SLIDE 6

Overview

  • Introduction
  • Improving RSMs with speculation
  • Application to PBFT
  • Performance
  • Conclusion

SLIDE 7

Speculative Execution in RSM

  • Continue processing while waiting

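[Figure: client timeline. Without speculation the client is blocked until the reply arrives. With speculation it takes a checkpoint, predicts the reply (Predict: 1), speculates, and commits when the actual reply x=1 arrives.]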
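
The checkpoint, predict, speculate, commit sequence can be sketched as a toy Python client. Everything here is an assumption made for illustration (the class `SpeculativeClient`, its methods, and the use of `copy.deepcopy` as a checkpoint); the slides do not describe how the real system implements checkpoints.

```python
import copy

class SpeculativeClient:
    """Toy client that keeps running on a predicted reply."""

    def __init__(self, state):
        self.state = state
        self.checkpoint = None
        self.predicted = None

    def speculate(self, predicted_reply, continuation):
        # Save a checkpoint, then continue as if the prediction were final.
        self.checkpoint = copy.deepcopy(self.state)
        self.predicted = predicted_reply
        continuation(self.state, predicted_reply)

    def resolve(self, actual_reply, continuation):
        # Commit if the prediction held; otherwise roll back to the
        # checkpoint and re-run the continuation with the actual reply.
        if actual_reply == self.predicted:
            committed = True
        else:
            self.state = self.checkpoint
            continuation(self.state, actual_reply)
            committed = False
        self.checkpoint = None
        self.predicted = None
        return committed

def record_counter(state, reply):
    # Work the client performs after the request returns.
    state["seen"].append(reply)

client = SpeculativeClient({"seen": []})
client.speculate(1, record_counter)      # first reply predicts x=1
ok = client.resolve(1, record_counter)   # agreed reply matches: commit
```

A mismatch in `resolve` discards the speculative work, which is the cost paid only when the first reply turns out to be wrong.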

SLIDE 8

Critical path: first reply

  • Completion latency less relevant
  • First reply latency sets critical path

– Speed

– Accuracy

  • Other desirable properties

– Throughput

– Stability under contention

– Smaller number of replicas

SLIDE 9

Requests while speculative

  • 1. Hold request

– Bad performance

  • 2. Distributed commit/rollback

– State tracking complex

    while not check_lottery():
        submit_tps()
    buy_corvette()

[Figure: the client sends win? and the first reply is yes. It predicts win? = yes and speculatively sends buy.]

What do we do with this?

SLIDE 10

Resolve speculations on the replicas

  • Explicitly encode dependencies as predicates
  • No special request handling needed
  • Replicas need to log past replies
  • Local decision at replicas matches client

    while not check_lottery():
        submit_tps()
    buy_corvette()

[Figure: the client sends win? and receives yes. It predicts win? = yes and sends the predicated request "if win?=yes: buy". The replicas, whose own reply to win? was yes, execute buy.]
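
A minimal Python sketch of predicated requests, assuming a toy `Replica` class whose names, request ids, and reply strings are all hypothetical: each replica logs the replies it has sent, and a later request carries the replies the client assumed, so every replica can decide locally whether to execute it.

```python
class Replica:
    """Toy replica that logs past replies and checks request predicates."""

    def __init__(self, lottery_won):
        self.lottery_won = lottery_won
        self.reply_log = {}   # request id -> reply this replica sent
        self.corvettes = 0

    def check_lottery(self, request_id):
        reply = "yes" if self.lottery_won else "no"
        self.reply_log[request_id] = reply
        return reply

    def buy_corvette(self, request_id, predicate):
        # predicate maps earlier request ids to the replies the client assumed.
        for earlier_id, assumed in predicate.items():
            if self.reply_log.get(earlier_id) != assumed:
                reply = "predicate failed"   # speculation was wrong: do nothing
                break
        else:
            self.corvettes += 1
            reply = "bought"
        self.reply_log[request_id] = reply
        return reply

# The client speculates on the first reply and sends "if win?=yes: buy".
replicas = [Replica(lottery_won=True) for _ in range(4)]
first_reply = replicas[0].check_lottery("r1")
for r in replicas[1:]:
    r.check_lottery("r1")
results = [r.buy_corvette("r2", {"r1": first_reply}) for r in replicas]

# If the client had wrongly predicted "yes", every replica rejects the request.
losers = [Replica(lottery_won=False) for _ in range(4)]
for r in losers:
    r.check_lottery("r1")
rejected = [r.buy_corvette("r2", {"r1": "yes"}) for r in losers]
```

Because the predicate travels with the request, the replicas need no separate commit or rollback protocol: a request built on a failed speculation simply has no effect.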

SLIDE 11

Overview

  • Introduction
  • Improving RSMs with speculation
  • Application to PBFT
  • Performance
  • Conclusion

SLIDE 12

Practical BFT

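[Figure: PBFT message flow between the client, the primary, and the other replicas for f=1, with the client-speculation (CS) variant marked.]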

[Castro and Liskov 1999]
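
For reference, the f=1 configuration corresponds to the standard PBFT sizes, sketched below. The helper `pbft_sizes` is hypothetical and the figures are those of the basic protocol; optimizations such as tentative execution change how many replies a client waits for.

```python
def pbft_sizes(f):
    """Standard PBFT sizes for tolerating f Byzantine replicas."""
    return {
        "replicas": 3 * f + 1,       # total replicas required
        "quorum": 2 * f + 1,         # replicas that must agree in a phase
        "matching_replies": f + 1,   # matching replies a client accepts
    }

sizes = pbft_sizes(1)
```

With f=1 this gives four replicas, so a non-speculative client must wait for more than one reply before it can continue.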

SLIDE 13

Additional Details

  • Tentative execution

– PBFT/PBFT-CS complete in 4 phases

  • Read-only optimization

– Accurate answer from backup replica

  • Failure threshold

– Bound worst-case failure

  • Correctness

SLIDE 14

Overview

  • Introduction
  • Improving RSMs with speculation
  • Application to PBFT
  • Performance
  • Conclusion

SLIDE 15

Benchmarks

  • Shared counter

– Simple checkpoint

– No computation

  • NFS: Apache httpd build

– Complex checkpoint

– Significant computation

SLIDE 16

Topology

[Figure: client and replicas connected by links with delays of 2.5 or 15 ms; the primary is marked.]

  • 1. Primary-local
  • 2. Primary-remote
  • 3. Uniform

SLIDE 17

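Base case: no replication

[Figure: client and a single unreplicated server, with link delays of 2.5 or 15 ms, shown for each topology.]

  • 1. Primary-local
  • 2. Primary-remote
  • 3. Uniform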

SLIDE 18

Shared Counter

[Plot: Run Time (sec) vs. Network Delay (ms); series: PBFT, PBFT-CS, No replication.]

Primary-local topology

SLIDE 19

Shared Counter

[Plot: Run Time (sec) vs. Network Delay (ms); series: PBFT, PBFT-CS, No replication, Zyzzyva.]

Primary-local topology

[Kotla et al. 07]

SLIDE 20

Shared Counter

[Plot: Run Time (sec) vs. Network Delay (ms); series: PBFT, PBFT-CS, No replication.]

Uniform & Primary-remote topology

SLIDE 21

Shared Counter

[Plot: Run Time (sec) vs. Network Delay (ms); series: PBFT, PBFT-CS, No replication, Zyzzyva.]

Uniform & Primary-remote topology

SLIDE 22

NFS: Apache build

[Plot: Run Time (min) vs. Network Delay (ms); series: PBFT, PBFT-CS, No replication.]

Primary-local topology

SLIDE 23

NFS: Apache build

[Plot: Run Time (min) vs. Network Delay (ms); series: PBFT, PBFT-CS, No replication.]

Uniform topology

SLIDE 24

NFS: Apache build

[Plot: Run Time (min) vs. Network Delay (ms); series: PBFT, PBFT-CS, No replication.]

Primary-remote topology

SLIDE 25

NFS: With Failure

[Plot: Run Time (min) vs. Network Delay (ms); series: PBFT, PBFT-CS, No replication, PBFT-CS (1% fail).]

Primary-local topology

SLIDE 26

Throughput (Shared Counter)

[Plot: throughput (KOps/sec) vs. Number of Clients (1 to 100); series: PBFT, PBFT-CS, Zyzzyva.]

LAN topology

SLIDE 27

Conclusion

  • Integrate client speculation within RSMs
  • Predicated requests: performance without complexity
  • Clients less sensitive to latency between replicas
  • 5x speedup over non-speculative protocol

Makes WAN deployments more practical
