Summary of etcd benchmarks: Exploring etcd as a potential back-end for the O2 Configuration module (PowerPoint presentation transcript)


SLIDE 1 / 18

Summary of etcd benchmarks

Pascal Boeschoten

Exploring etcd as a potential back-end for the O2 Configuration module

SLIDE 2 / 18

Outline

  • Objective
  • Test setup
  • Results
  • Next steps
  • Notes

SLIDE 3 / 18

Objective

  • Evaluate the performance of etcd as a store for configuration parameters
    – Using representative workloads
    – In "single-server" and "cluster" modes
    – With multiple parameter data structures

SLIDE 4 / 18

Test setup – Workload

  • At the start of a run, thousands of processes retrieve their parameters
    – Large burst of GET requests from clients
    – High load on server(s) for a short period

Key workload estimations:

  Metric                           Value
  Number of client processes       7k to 70k
  Parameters per client process    100
  Average size per parameter       100 bytes
  Total configuration data volume  700 MB
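The total data volume in the table follows directly from the other three figures; a quick sanity check in plain Python, using the worst-case numbers from the slide:

```python
# Workload figures from the slide (worst case: 70k client processes).
client_processes = 70_000
params_per_process = 100
avg_param_size_bytes = 100

total_requests = client_processes * params_per_process   # GETs in the burst
total_bytes = total_requests * avg_param_size_bytes      # payload volume

print(total_requests)  # 7,000,000 parameters ("7 Mp" in later slides)
print(total_bytes)     # 700,000,000 bytes = 700 MB
```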

SLIDE 5 / 18

Test setup – Parameter organization

  • The structure of the data varies per client process, so several
    approaches were explored:

  Name        Description                                       Query
  Fragmented  Separate key-value pairs                          One GET per parameter
  Blob        Key-value pairs stored in a single blob           One GET for the whole blob
  Flat        Key-value pairs grouped in one directory          Recursive GET
  Tree        Key-value pairs grouped in nested directories,    Recursive GET
              structured like a binary tree, with five
              parameters per directory
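The four organizations above can be sketched as key layouts. The `/test/...` prefix mirrors the JSON snippet on the last slide; the exact paths and the breadth-first directory assignment below are assumptions for illustration, not the benchmark's actual layout code:

```python
PREFIX = "/test"

def fragmented_keys(n):
    """One key-value pair per parameter -> one GET each."""
    return [f"{PREFIX}/frag/key{i}" for i in range(n)]

def blob_key():
    """All parameters serialized into a single value -> one GET."""
    return f"{PREFIX}/blob"

def flat_keys(n):
    """All pairs directly under one directory -> one recursive GET."""
    return [f"{PREFIX}/flat/key{i}" for i in range(n)]

def tree_dirs(n, per_dir=5):
    """Binary tree of directories, five parameters per directory.
    Returns the directory path assigned to each parameter."""
    dirs, frontier, remaining = [], [f"{PREFIX}/tree"], n
    while remaining > 0:
        d = frontier.pop(0)                      # breadth-first fill
        dirs.extend([d] * min(per_dir, remaining))
        remaining -= per_dir
        frontier += [f"{d}/L", f"{d}/R"]         # two child directories
    return dirs

print(len(set(tree_dirs(100))))  # 20 directories for 100 parameters
```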

SLIDE 6 / 18

Results – Single server

  • Fragmented structure: low performance
  • Other structures: much better

[Chart: Time to pull parameters, fragmented structure – 1 server, 43 client
hosts, 16 processes per client; series: 100 ppc / 7 Mp and 200 ppc / 14 Mp;
x: total number of parameters, y: seconds]

[Chart: Time to pull parameters, varying structures – 1 server, 42 client
hosts; series: Blob, Flat, Tree; x: parameters per process / total
parameters, y: seconds]

SLIDE 7 / 18

Results – Cluster

  • With higher server counts, performance quickly improves to acceptable
    levels (except for fragmented)
  • Fragmented: 13.7 s (other fragmented results omitted for chart
    readability)

[Chart: Time to pull parameters, various data organizations and cluster
sizes – 70k client processes across 42 hosts; series: Blob, Flat, Tree at
7 Mp and 14 Mp for 1, 3, 5, 7 and 9 servers; y: seconds]

SLIDE 8 / 18

Results – Cluster

  • Roughly linear relationship between server count and performance → good!

[Chart: Time to pull 7M parameters as a function of 1/servers – 70k client
processes across 42 hosts; series: Blob, Flat, Tree, plus per-process
averages for each; x: 1/servers, y: seconds]
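The slide plots pull time against 1/servers and observes a roughly linear relationship, i.e. time ≈ a·(1/n) + b. A minimal least-squares fit in plain Python; the data points below are illustrative (they follow t = 10/n + 0.5 exactly), not measurements from the benchmark:

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a*x + b."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
        sum((x - mx) ** 2 for x in xs)
    return a, my - a * mx

servers = [1, 3, 5, 7, 9]                 # cluster sizes from the slides
inv = [1 / s for s in servers]
times = [10 * x + 0.5 for x in inv]       # hypothetical, exactly linear

a, b = fit_line(inv, times)
print(round(a, 3), round(b, 3))           # recovers a ~ 10, b ~ 0.5
```

On real measurements the intercept b would capture the per-run overhead that does not shrink with more servers.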

SLIDE 9 / 18

Results – JSON overhead

  • Significant JSON response size overhead for flat and tree
    – Tree overhead is "exponential"
  • A newer (not yet stable) etcd API uses Google Protocol Buffers for
    serialization
    – The binary format might reduce overhead

  Parameter structure  Usable size (bytes)  JSON response size (bytes)  Overhead
  100 blob             10000                10892                       8.92 %
  100 flat             10000                18098                       81.0 %
  100 tree             10000                21214                       112 %
  200 blob             20000                21796                       8.92 %
  200 flat             20000                36198                       81.0 %
  200 tree             20000                44214                       121 %
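The overhead column is (response size − usable size) / usable size. Reproducing a few of the reported percentages from the sizes in the table:

```python
def overhead_pct(usable, response):
    """Relative JSON framing overhead, in percent."""
    return (response - usable) / usable * 100

rows = {
    "100 blob": (10000, 10892),
    "100 flat": (10000, 18098),
    "100 tree": (10000, 21214),
    "200 tree": (20000, 44214),
}
for name, (usable, response) in rows.items():
    print(name, round(overhead_pct(usable, response), 2), "%")
# 100 blob -> 8.92 %, 100 flat -> 80.98 %, 100 tree -> 112.14 %,
# 200 tree -> 121.07 % (the slide rounds these to 81.0 %, 112 %, 121 %)
```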

SLIDE 10 / 18

Results – Timeouts

  • Connection timeouts occur when the server is overloaded
    – Harmful to performance with 1 server
    – Practically disappear with >= 3 servers
  • Of the tested structures, "100 blob" is the best case and "200 tree"
    the worst case (not counting fragmented)

[Chart: Frequency of number of timeouts per process – 9 servers, 42 clients,
70k processes, 250 ms timeout; series: 100 Blob, 200 Tree]

[Chart: Frequency of number of timeouts per process – 7M parameters in
"200 tree" structure, 42 clients, 70k processes, 10 s timeout; series: 1, 3,
5, 7 and 9 servers]
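A client that hits the connection timeout has to retry, which is what makes timeouts costly at 1 server. A transport-agnostic sketch of such retry logic: `fetch` is any callable that raises `TimeoutError` (e.g. an HTTP GET with a 250 ms timeout); the function name and retry policy are assumptions, not the benchmark's actual client code:

```python
def get_with_retry(fetch, retries=3):
    """Call fetch(), retrying on TimeoutError.
    Returns (value, number_of_timeouts_encountered)."""
    timeouts = 0
    for _ in range(retries + 1):
        try:
            return fetch(), timeouts
        except TimeoutError:
            timeouts += 1
    raise TimeoutError(f"gave up after {timeouts} timeouts")

# Stub standing in for an etcd GET: times out twice, then succeeds.
attempts = {"n": 0}
def flaky_fetch():
    attempts["n"] += 1
    if attempts["n"] <= 2:
        raise TimeoutError
    return "value"

print(get_with_retry(flaky_fetch))  # ('value', 2)
```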

SLIDE 11 / 18

Results – Resource usage

  • Not ideal, but OK
    – Large spikes
    – Overall underutilization

[Chart: etcd server resource usage with "200 tree" structure – 9 servers,
42 client hosts, 70k processes, 14M parameters; series: CPU usage (%),
Tx MB, Rx MB over time (seconds)]

SLIDE 12 / 18

Results – Resource usage (2)

  • Spiking apparent even during very short runs
  • Cause unknown
  • But is tuning necessary? “Good enough” already?

[Chart: etcd server resource usage with "200 blob" structure – 9 servers,
42 client hosts, 70k processes, 14M parameters; series: CPU usage (%),
Tx MB, Rx MB over time (seconds)]

SLIDE 13 / 18

Results – Resource usage (3)

  • Resource usage with fragmented structure:

– Similar CPU, much lower transmission.

[Chart: etcd server resource usage with "50 fragmented" structure –
9 servers, 42 client hosts, 70k processes, 3.5M parameters; series:
CPU usage (%), Tx MB, Rx MB over time (seconds)]

SLIDE 14 / 18

Next steps

  • Investigate a structure in between fragmented and tree
    – Multiple recursive GETs per process
  • Multiple etcd instances per physical server
    – May improve resource utilization
  • Compare with other back-ends
    – Consul, ZooKeeper?
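The first next step, a layout between "fragmented" (one GET per parameter) and "tree" (one recursive GET), could split each process's parameters across a few directories and issue one recursive GET per directory. A hypothetical sketch of such a key layout (path names and the round-robin assignment are assumptions):

```python
def chunked_layout(n_params, n_dirs, prefix="/test/chunked"):
    """Assign parameters round-robin to n_dirs directories, so a
    process issues n_dirs recursive GETs instead of n_params GETs."""
    return {
        f"{prefix}/dir{i % n_dirs}/key{i}": f"value{i}"
        for i in range(n_params)
    }

layout = chunked_layout(100, 10)
dirs = {k.rsplit("/", 1)[0] for k in layout}
print(len(dirs))  # 10 directories -> 10 recursive GETs for 100 parameters
```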

SLIDE 15 / 18

Notes

  • More details are attached to the JIRA ticket:

https://alice.its.cern.ch/jira/browse/OCONF-3

SLIDE 16 / 18

Reference slides

SLIDE 17 / 18

Machine specifications

  Hostname                       Specifications
  aidrefpc001/45                 2x E5520, 8 threads, 2.26 GHz
  aidrefsrv10                    2x E5-2650 v2, 16 threads, 2.6 GHz
  aido2qc10/13, aido2qc40/43,    2x E5-2640 v3, 16 threads, 2.6 GHz
  aido2web1

SLIDE 18 / 18

JSON snippet

{
  "action": "get",
  "node": {
    "key": "/test/tree100",
    "dir": true,
    "nodes": [
      {
        "key": "/test/tree100/key3",
        "value": "value00...03",
        "modifiedIndex": 103,
        "createdIndex": 103
      },
      {
        "key": "/test/tree100/key4",
        "value": "value00...04",
        "modifiedIndex": 104,
        "createdIndex": 104
      },
      {
        "key": "/test/tree100/dirA",
        "dir": true,
        "nodes": [
          {
            "key": "/test/tree100/dirA/key8",
            "value": "value00...08",
            "modifiedIndex": 78,
            "createdIndex": 78
          },
          ...

Formatted snippet of a “200 tree” JSON response
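An etcd v2 recursive GET returns nested "node"/"nodes" objects like the snippet above. A small walker that flattens such a tree into (key, value) pairs, run here on a self-contained sample in the same shape (values shortened):

```python
def leaves(node):
    """Yield (key, value) for every non-directory node, recursively."""
    if node.get("dir"):
        for child in node.get("nodes", []):
            yield from leaves(child)
    else:
        yield node["key"], node["value"]

sample = {
    "action": "get",
    "node": {
        "key": "/test/tree100", "dir": True,
        "nodes": [
            {"key": "/test/tree100/key3", "value": "v3"},
            {"key": "/test/tree100/dirA", "dir": True,
             "nodes": [{"key": "/test/tree100/dirA/key8", "value": "v8"}]},
        ],
    },
}
print(dict(leaves(sample["node"])))
# {'/test/tree100/key3': 'v3', '/test/tree100/dirA/key8': 'v8'}
```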