Performance of UDP-based Byzantine Fault Tolerant Consensus (Final Talk)



SLIDE 1

Chair of Network Architectures and Services Department of Informatics Technical University of Munich

Performance of UDP-based Byzantine Fault Tolerant Consensus

Final talk for the Bachelor’s Thesis by

Lucas Mair

advised by Richard von Seck, M. Sc., and Johannes Schleger, M. Sc.

Wednesday 8th July, 2020

SLIDE 2

Structure

  • 1. Introduction
      1.1 State Machine Replication
      1.2 Byzantine Fault Tolerance
      1.3 HotStuff
  • 2. Research Questions
  • 3. Related Work
  • 4. Problem Analysis
  • 5. Approach and Implementation
      5.1 Library Structure
      5.2 Implementation Approach
      5.3 Conduction of Measurements
  • 6. Experiment Evaluation
  • 7. Conclusion

Lucas Mair — BFT Consensus 2

SLIDE 3

Introduction

State Machine Replication (SMR)

State machine replication provides a fault-tolerant service for processing client requests.

Figure 1: Example system using SMR


SLIDE 4

Introduction

State Machine Replication (SMR)

  • The input order of multiple client requests matters
  • Each replica generates a separate output
  • The system has to agree on a single output
  • The system functions correctly as long as the number of faulty replicas stays below a threshold:
      • Fault tolerance for crash failures: at least 2f + 1 replicas for f faulty ones (majority decision)
      • Fault tolerance for Byzantine failures: at least 3f + 1 replicas for f faulty ones


SLIDE 5

Introduction

Byzantine Fault Tolerance

  • Arbitrary behaviour (e.g. sending wrong output) of faulty replicas instead of crashing
  • Decision has to be reached with N − f results (for N replicas)
  • Those results could include f faulty ones ⇒ at least f + 1 correct results are needed as well:

N − f ≥ f + f + 1 ⇒ N ≥ 3f + 1

  • Consensus protocols facilitate the agreement process using those results
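The two thresholds can be condensed into a short sketch (illustrative Python helpers, not code from the thesis or from libhotstuff):

```python
# Replica counts needed to tolerate f faulty replicas, per the slides above.

def min_replicas_crash(f: int) -> int:
    """Crash faults: a majority must survive, so N >= 2f + 1."""
    return 2 * f + 1

def min_replicas_byzantine(f: int) -> int:
    """Byzantine faults: from N - f >= 2f + 1 it follows that N >= 3f + 1."""
    return 3 * f + 1

def byzantine_quorum(n: int) -> int:
    """Results a replica waits for before deciding: N - f, with f = (N - 1) // 3."""
    f = (n - 1) // 3
    return n - f
```

For the configurations measured later, this gives quorums of 3 votes for n = 4 replicas and 5 votes for n = 7 replicas.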


SLIDE 6

Introduction

HotStuff

The thesis focuses on the BFT consensus protocol "HotStuff" [8], which improves upon the complexity bounds of previous algorithms such as PBFT.

  • Improves the footprint of authentication messages per consensus round from O(n³) in PBFT to O(n²) in the case of O(n) consecutive leader failures

  • Proposed as feasible for large-scale applications (100 or more replicas)
  • Example implementation in C++ freely accessible on GitHub [7]
  • Uses TCP and TLS for network communication
  • Communication model assumes reliable point-to-point connections


SLIDE 7

Research Questions

SLIDE 8

Research Questions

Optimizing HotStuff using UDP

Central question of the thesis: Is TCP necessary for HotStuff's network communication?
⇒ Potential for optimization by using UDP instead.
⇒ The analysis, optimization, and benchmarks are based on the example implementation.

  • Which TCP features are actually used or required by HotStuff?
  • Will a UDP-based implementation speed up the consensus process even if TCP features such as retransmissions are omitted?
  • Quantify the tradeoff between the UDP speedup and failed consensus rounds due to a potentially higher transmission error rate.
  • Compare measurements to the benchmarks of the HotStuff authors and analyze the results.


SLIDE 9

Related Work

SLIDE 10

Related Work

Previous Analysis of UDP and Other Optimization Approaches

Previous UDP-based protocol PBFT [3]:

  • Criticized by Chondros et al. [4] regarding packet loss
  • Problematic behaviour of UDP-based client ⇔ replica communication

Comparison of a TCP and UDP implementation by Aublin et al. (RBFT [1]):

  • Similar performance, lower latency when using UDP
  • Identification of cryptographic operations as actual bottleneck

Optimizations based on:

  • Reduction of cryptographic operations and speculative execution (Zyzzyva [5])
  • Usability and versatility, including dynamic reconfiguration (BFT-SMaRt [2])
  • Parallel consensus rounds with multiple leaders (Mir-BFT [6])


SLIDE 11

Problem Analysis

SLIDE 12

Problem Analysis

Usage of TCP Features in HotStuff

Current state of HotStuff, using TCP-based message transmission:

  • Acknowledgements cause overhead.
  • Ordered data transfer is not necessary.
  • Flow control and congestion control are largely redundant:
    ⇒ Consensus rounds automatically slow down while waiting for messages.
    ⇒ The message transmission frequency drops on its own (no bandwidth or payload changes).
  • The TCP handshake signals connection establishment, but causes overhead.
  • Client ⇔ replica communication must stay TCP-based (command order & reliability).
  • Retransmission after message loss is important.


SLIDE 13

Problem Analysis

Resulting Changes from Using UDP

Effects of UDP-based message transmission in HotStuff:

  • Socket startup on the receiving side has to be checked separately:
    ⇒ ICMP Destination Unreachable can serve as an indicator.
  • Message arrival can only be inferred from successful progress of the consensus round.
  • Possible consensus failure due to lost messages without retransmission.
  • DTLS requires differentiating server and client roles.
  • Custom retransmission may be needed.
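The ICMP indicator can be sketched with a connected UDP socket. This is an illustrative Python helper, not libhotstuff code, and it assumes Linux behaviour: an ICMP Port Unreachable answer on a connected UDP socket surfaces as ConnectionRefusedError on a subsequent send/recv.

```python
import socket

def probe_peer(addr, payload=b"ping", timeout=0.5):
    """Best-effort check whether a peer's UDP socket has started.

    probe_peer, its payload, and its timeout are hypothetical; the point is
    only how the ICMP Destination Unreachable error becomes visible.
    """
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    s.settimeout(timeout)
    try:
        s.connect(addr)      # no handshake: connect() only pins the peer address
        s.send(payload)
        try:
            s.recv(1)        # a further syscall is needed for the queued ICMP error to surface
        except socket.timeout:
            pass             # silence is not an error: the peer may simply not answer probes
        return True
    except ConnectionRefusedError:
        return False         # ICMP Port Unreachable: receiving socket not up yet
    finally:
        s.close()
```

Note the asymmetry: a False result is a reliable "not started", while True only means no error was observed within the timeout.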


SLIDE 14

Problem Analysis

Discussion on Retransmission

Possible model for custom retransmission:

  • Estimate an optimal retransmission timer based on the Round Trip Time (RTT).
    ⇒ Send the message again if the consensus round has made no progress when the timer expires.

Problems:

  • Only the leader can properly estimate the RTT with the current message types.
  • Unnecessary retransmissions are possible.

Solutions:

  • Estimate and update the RTT timer for replicas using a ping mechanism.
  • Implement explicit acknowledgement messages.
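One way to realize the proposed timer is to borrow TCP's smoothed-RTT estimator (RFC 6298). The class below is a sketch under that assumption, not code from the thesis; constants follow the RFC (alpha = 1/8, beta = 1/4, minimum granularity 10 ms).

```python
class RetransmitTimer:
    """RTO estimator in the style of TCP's SRTT/RTTVAR (RFC 6298).

    Each completed round (or ping) feeds one RTT sample; a message is resent
    if the round makes no progress within the current `rto` (seconds).
    """

    def __init__(self, initial_rto=1.0):
        self.srtt = None          # smoothed RTT
        self.rttvar = None        # RTT variance estimate
        self.rto = initial_rto    # conservative default before the first sample

    def sample(self, rtt):
        if self.srtt is None:
            # first measurement initializes both estimators
            self.srtt = rtt
            self.rttvar = rtt / 2
        else:
            self.rttvar = 0.75 * self.rttvar + 0.25 * abs(self.srtt - rtt)
            self.srtt = 0.875 * self.srtt + 0.125 * rtt
        self.rto = self.srtt + max(4 * self.rttvar, 0.010)
        return self.rto
```

Because the variance term shrinks as samples agree, the timer tightens on a stable testbed link and widens again when latency fluctuates, which limits the unnecessary retransmissions noted above.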


SLIDE 15

Approach and Implementation

SLIDE 16

Approach and Implementation

The libhotstuff library structure

[Diagram: replicas receive commands and send output via the ClientNetwork and communicate with each other via the PeerNetwork. Components: libhotstuff (replica implementation, with the ClientNetwork, PeerNetwork, and HotStuff classes), which uses salticidae (network implementation) and secp256k1 (signature implementation), plus hotstuff_client and hotstuff_app.]

Figure 2: Diagram showing the library structure of libhotstuff and the communication between client and replicas.


SLIDE 17

Approach and Implementation

Implementation Approach

  • Create a single UDP socket for sending and receiving messages.
  • Add the UDP implementation via additional functions (e.g. _recv_data → _recv_data_udp).
  • Call the new functions only for replica ⇔ replica communication, not for client messages:
    ⇒ Modify the PeerNetwork class and check the newly introduced enable_tls option.
  • Implement DTLS using OpenSSL and assign deterministic server and client roles.
  • No retransmission implemented due to time constraints.
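The shape of the datagram variants (e.g. _recv_data → _recv_data_udp) can be illustrated as follows. The real helpers live in salticidae and are C++; the names, signatures, and MAX_DGRAM constant below are hypothetical stand-ins.

```python
import socket

MAX_DGRAM = 65507  # largest UDP payload over IPv4

def make_peer_socket(bind_addr):
    """One socket per replica, used for both sending and receiving."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind(bind_addr)
    return sock

def send_data_udp(sock, msg, peer_addr):
    # UDP preserves message boundaries, so the stream-style length-prefix
    # framing of the TCP path is not needed here
    sock.sendto(msg, peer_addr)

def recv_data_udp(sock):
    """Return one whole message and the sender's address."""
    return sock.recvfrom(MAX_DGRAM)
```

The single-socket design also explains why a separate startup check (previous slides) is needed: there is no accept() event to signal that the peer is listening.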


SLIDE 18

Approach and Implementation

Conduction of Measurements

  • Measurements on physical machines (testbed) with 4 and 7 replicas (tolerating 1 and 2 faults, respectively)
  • Benchmarks modeled on the original paper for comparison purposes: measuring throughput (Kops/sec) vs. latency (ms)
  • Different batch sizes (100, 400, 800) with 0/0 payload (request/reply)
  • Different payload sizes (0/0, 128/128, 1024/1024 bytes) with batch size 400
  • Robustness benchmark using the average time difference between a successful consensus round and a consensus round with a single leader failure
  • Benchmark with 1% packet loss


SLIDE 19

Experiment Evaluation

SLIDE 20

Experiment Evaluation

Batchsize Experiments

Figure 3: TCP (a,b) vs. UDP (c,d) results for a batchsize of 100. Panels (a)/(c): throughput (Kops/sec) over time; panels (b)/(d): latency (ms) vs. throughput.


  • Throughput: +23%
  • Latency: -17%
SLIDE 21

Experiment Evaluation

Batchsize Experiments

batchsize:                  100          400          800
TCP   throughput:           26 Kops/s    106 Kops/s   146 Kops/s
      latency:              19.1 ms      18.8 ms      27.4 ms
UDP   throughput:           32 Kops/s    120 Kops/s   141 Kops/s
      latency:              15.7 ms      16.5 ms      28.4 ms
UDP throughput difference:  +23%         +13%         −3%

Table 1: Average throughput and latency for varying batchsizes.


SLIDE 22

Experiment Evaluation

Payload Experiments

Figure 4: TCP (a,b) vs. UDP (c,d) results for a payload size of 128/128 bytes and a batchsize of 400. Panels (a)/(c): throughput (Kops/sec) over time; panels (b)/(d): latency (ms) vs. throughput.


  • Throughput: +2%
  • Latency: -3%
SLIDE 23

Experiment Evaluation

Payload Experiments

payload size:               0/0 bytes    128/128 bytes   1024/1024 bytes
TCP   throughput:           106 Kops/s   99 Kops/s       35 Kops/s
      latency:              18.8 ms      20.4 ms         57.4 ms
UDP   throughput:           120 Kops/s   101 Kops/s      35 Kops/s
      latency:              16.5 ms      19.9 ms         57.3 ms
UDP throughput difference:  +13%         +2%             0%

Table 2: Average throughput and latency for varying client request/response payload sizes with a batchsize of 400.


SLIDE 24

Experiment Evaluation

Scaling Experiments

Figure 5: TCP (a,b) and UDP (c,d) results for batchsize = 400 with n = 7 replicas and a payload size of 0/0. Panels (a)/(c): throughput (Kops/sec) over time; panels (b)/(d): latency (ms) vs. throughput.


  • Throughput: +6%
  • Latency: -7%
SLIDE 25

Experiment Evaluation

Scaling Experiments

number of replicas:         n = 4        n = 7       relative change
TCP   throughput:           106 Kops/s   79 Kops/s   −26%
      latency:              18.8 ms      25.6 ms     +36%
UDP   throughput:           120 Kops/s   84 Kops/s   −30%
      latency:              16.5 ms      23.8 ms     +44%
UDP throughput difference:  +13%         +6%

Table 3: Average throughput and latency for differing replica counts (batchsize 400, payload 0/0).


SLIDE 26

Experiment Evaluation

Loss Experiment

Figure 6: UDP throughput (Kops/sec) over time for a batchsize of 1 with a loss rate of 1%.

Robustness benchmark: average processing overhead of a failed round = 4.1 ms (averaged over 250 samples, measured with a 3-second timeout).


  • Very low throughput.
  • Successive failed consensus rounds.
  • Reduction of throughput over time.

⇒ Not usable for Internet deployment in its current state.

SLIDE 27

Conclusion

SLIDE 28

Conclusion

Project Conclusion

  • The impact of using UDP is higher for low batchsizes, low payload sizes, and small numbers of replicas.
  • At higher values (batchsize = 800 or payload = 1024/1024) TCP has superior performance and equal latency.
  • Batchsize affects the time required for serialization and cryptographic operations.
  • Payload size affects the time required for serialization and hashing.


SLIDE 29

Conclusion

Project Conclusion

  • Cryptographic operations are still a limiting factor for scaling.
    ⇒ Threshold signatures are not implemented yet.
  • No direct comparison to the benchmarks in the HotStuff paper is possible.
    ⇒ Missing parameter information.
  • A retransmission mechanism is required for environments with possible packet loss.


SLIDE 30

Questions?

SLIDE 31

Backup Slides

SLIDE 32

Backup Slides

DTLS Server and Client Assignment

Replica s0 — 1: client, 2: server, 3: client
Replica s1 — 0: server, 2: client, 3: server
Replica s2 — 0: client, 1: server, 3: client
Replica s3 — 0: server, 1: client, 2: server

Figure 7: Replica sockets determine their role based on their own index and the order of replica entries. Each client initiates the DTLS handshake with the corresponding server.

The formula is: (index + created sockets) mod 2 = 0 → client, 1 → server
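The assignment rule can be checked with a short sketch. dtls_roles is an illustrative helper reconstructing the rule from Figure 7, not code from the thesis.

```python
def dtls_roles(own_index, peer_indices):
    """DTLS role per peer socket.

    Sockets are created in the order the peers appear in the replica list;
    the role is (own index + sockets created so far) mod 2, with
    0 -> client and 1 -> server.
    """
    roles = {}
    for created, peer in enumerate(peer_indices):
        roles[peer] = "client" if (own_index + created) % 2 == 0 else "server"
    return roles
```

With replicas listed in ascending index order, every pair of replicas ends up with complementary roles, so exactly one side of each connection initiates the DTLS handshake.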


SLIDE 33

Backup Slides

Original HotStuff Benchmark Plot

Figure 8: A typical benchmark plot from the HotStuff paper [8].


SLIDE 34

Backup Slides

Phases of HotStuff

[Diagram: one consensus round passes through the Prepare, Pre-Commit, Commit, and Decide phases. The leader sends prepare, pre-commit, commit, and decide messages; replicas N1–N4 answer with new-view, prepare-vote, pre-commit-vote, and commit-vote messages. A Next-View interrupt after the timeout period triggers New-View (v+1) at the next leader.]

Figure 9: Diagram showing the phases HotStuff goes through during one consensus round.


SLIDE 35

Bibliography

[1] P. Aublin, S. B. Mokhtar, and V. Quéma. RBFT: Redundant Byzantine Fault Tolerance. In 2013 IEEE 33rd International Conference on Distributed Computing Systems, pages 297–306, 2013.

[2] A. Bessani, J. Sousa, and E. E. P. Alchieri. State Machine Replication for the Masses with BFT-SMaRt. In 2014 44th Annual IEEE/IFIP International Conference on Dependable Systems and Networks, pages 355–362, 2014.

[3] M. Castro and B. Liskov. Practical Byzantine Fault Tolerance. In Proceedings of the Third Symposium on Operating Systems Design and Implementation, OSDI '99, pages 173–186. USENIX Association, 1999.

[4] N. Chondros, K. Kokordelis, and M. Roussopoulos. On the Practicality of Practical Byzantine Fault Tolerance. In P. Narasimhan and P. Triantafillou, editors, Middleware 2012, pages 436–455. Springer, Berlin, Heidelberg, 2012.

[5] R. Kotla, L. Alvisi, M. Dahlin, A. Clement, and E. Wong. Zyzzyva: Speculative Byzantine Fault Tolerance. In Proceedings of Twenty-First ACM SIGOPS Symposium on Operating Systems Principles, SOSP '07, pages 45–58. Association for Computing Machinery, 2007.

[6] C. Stathakopoulou, T. David, and M. Vukolić. Mir-BFT: High-Throughput BFT for Blockchains, 2019.

[7] M. Yin. libhotstuff: General-purpose BFT state machine replication library, 2019. GitHub repository: https://github.com/hot-stuff/libhotstuff.


SLIDE 36

Bibliography

[8] M. Yin, D. Malkhi, M. K. Reiter, G. G. Gueta, and I. Abraham. HotStuff: BFT Consensus in the Lens of Blockchain. arXiv preprint arXiv:1803.05069, v6, July 2019.
