Measuring Performance on OpenBSD Alexander Bluhm bluhm@openbsd.org

SLIDE 1

What did exist before? How does it work? What are the findings? What is the Conclusion?

Measuring Performance on OpenBSD

Alexander Bluhm

bluhm@openbsd.org

EuroBSDCon, September 2019

SLIDE 2

Agenda

1. What did exist before?
2. How does it work?
3. What are the findings?
4. What is the Conclusion?

SLIDE 3

genua Firewall Testbed HPF

SLIDE 4

Multi User, Multi Purpose Hardware Setup

[Diagram: Source 1 to Source n, Switch, Drain 1 to Drain n, Target 1 to Target n; links labelled 1 Gbit and 10 Gbit]

SLIDE 5

Performance Hardware Design

Target Target 10 Gbit

SLIDE 6

Existing Regression Tests

SLIDE 7

Regression History for i386

history severity

SLIDE 8

Agenda

1. What did exist before?
2. How does it work?
3. What are the findings?
4. What is the Conclusion?

SLIDE 9

Performance Goals

  • history
  • reproducible
  • details available
  • drill down
  • automatic

SLIDE 10

Performance History

  • install release
  • checkout at date
  • compile kernel
  • run tests
  • advance by step
  • collect results
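The "checkout at date" and "advance by step" part of this loop can be sketched as follows. This is only a minimal illustration, assuming a fixed step in days; the function name `checkout_dates` and the example dates are hypothetical and not taken from the actual test scripts.

```python
from datetime import date, timedelta


def checkout_dates(begin, end, step_days):
    """Return the checkout dates from begin to end (inclusive), advancing by step."""
    if step_days <= 0:
        raise ValueError("step_days must be positive")
    dates = []
    current = begin
    while current <= end:
        dates.append(current)
        current += timedelta(days=step_days)
    return dates


# Hypothetical weekly run. For each date one would check out the source,
# compile the kernel, run the tests, and collect the results.
for day in checkout_dates(date(2017, 11, 1), date(2017, 11, 29), 7):
    print("checkout at", day.isoformat())
```

A smaller step over a shorter date range gives the drill-down view of the same history.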

SLIDE 11

Performance Tests Overview

run history run detail test command run date install version cvs checkout steps

SLIDE 12

Performance Run at Date

run and install log average numbers unstable results checkout date kernel commits build quirks

SLIDE 13

Performance Repeat at CVS Checkout

reboot log single result repeat count outlier standard deviation
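How repeated results at one checkout might be summarized can be sketched like this. The outlier rule (more than 5 % away from the median) and the sample values are assumptions for illustration; the slides do not state which rule the real tooling applies.

```python
from statistics import mean, median, stdev


def summarize(results, tolerance=0.05):
    """Summarize repeated measurements of one test at one checkout.

    A value is flagged as an outlier when it deviates from the median by more
    than `tolerance` (a fraction). This rule is an assumption for illustration.
    """
    mid = median(results)
    outliers = [r for r in results if abs(r - mid) > tolerance * mid]
    return {
        "count": len(results),
        "mean": mean(results),
        "stddev": stdev(results) if len(results) > 1 else 0.0,
        "outliers": outliers,
    }


# Hypothetical bits/sec values from five repetitions.
runs = [3.30e9, 3.31e9, 3.29e9, 3.32e9, 2.90e9]
print(summarize(runs))
```

Using the median as the reference keeps a single bad run from shifting the threshold.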

SLIDE 14

Weekly from 6.2 to 6.3

[Plot: TCP Performance; x-axis: Checkout (date), 2017-11-01 to 2018-03-01; y-axis: bits/sec, 5 × 10^8 to 4.5 × 10^9; quirk markers A to J]

SLIDE 15

Quirks from 6.2 to 6.3

A  OpenBSD/amd64 6.2 release
B  fix cvs vendor branch checkout
C  clang update LLVM to 5.0.0
D  pfctl pf packet rate matching
E  move kernel source file dwiic.c
F  pfctl pf divert type
G  sysctl struct vfsconf
H  clang update LLVM to 5.0.1
I  pfctl pf syncookies
J  OpenBSD/amd64 6.3 release

SLIDE 16

Build Quirks

  • checkout at date
  • compile kernel
  • run tests
  • advance by step
  • checkout userland
  • build toolchain

SLIDE 17

Performance Hardware

Target 1 Target 1 Target 2 Target 2 Linux

SLIDE 18

Performance Master

web master console local target remote target publish control configure and compile install release

SLIDE 19

Agenda

1. What did exist before?
2. How does it work?
3. What are the findings?
4. What is the Conclusion?

SLIDE 20

Drilldown from Week to Days

[Plot: TCP Performance; x-axis: Checkout (date), 2017-12-28 to 2018-01-09; y-axis: bits/sec, 5 × 10^8 to 4.5 × 10^9]

SLIDE 21

Reproduce and Reboot

  • compile kernel
  • run tests
  • run again
  • reboot machine
  • relink kernel

SLIDE 22

6.5, 1 Day, 5 Tests, Keep Machine Running

[Plot: TCP Performance; x-axis: Checkout (date), 2019-08-01 to 2019-08-02; y-axis: bits/sec, 3 × 10^9 to 3.5 × 10^9]

Annotations: tpmr, octeon, unveil

SLIDE 23

6.5, 1 Day, 5 Tests, Reboot Machine

[Plot: TCP Performance; x-axis: Checkout (date), 2019-08-01 to 2019-08-02; y-axis: bits/sec, 3 × 10^9 to 3.5 × 10^9]

Annotations: tpmr, octeon, unveil

SLIDE 24

6.5, 1 Day, 5 Tests, Link and Reorder Kernel

[Plot: TCP Performance; x-axis: Checkout (date), 2019-08-01 to 2019-08-02; y-axis: bits/sec, 3 × 10^9 to 3.5 × 10^9]

Annotations: tpmr, octeon, unveil

SLIDE 25

KARL Kernel Address Randomized Link

locore0, gap, obj3, obj4, obj1, obj2

start() boot main() shuffle random unmap
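The layout on this slide (locore0 first, then a gap, then the objects in shuffled order) can be mimicked with a toy model. This is a sketch under assumptions, not the real relink procedure: the function name `karl_link_order`, the object names, and the maximum gap size are all hypothetical.

```python
import random


def karl_link_order(objects, max_gap, rng):
    """Return (gap, order): a random gap size and a shuffled object order.

    locore0 stays first and all other objects are shuffled. Toy model only.
    """
    gap = rng.randrange(max_gap)
    shuffled = list(objects)
    rng.shuffle(shuffled)
    return gap, ["locore0"] + shuffled


rng = random.Random()  # unseeded, so every relink yields a new layout
gap, order = karl_link_order(["obj1", "obj2", "obj3", "obj4"], 4096, rng)
print(gap, order)
```

Because the layout changes on every relink, two kernels built from identical source can place hot code differently, which would be consistent with the run-to-run variation seen in the reboot and relink measurements.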

SLIDE 26

6.5, 1 Day, 5 Tests, Sort Objects, Fixed Gap

[Plot: TCP Performance; x-axis: Checkout (date), 2019-08-01 to 2019-08-02; y-axis: bits/sec, 3 × 10^9 to 3.5 × 10^9]

Annotations: tpmr, octeon, unveil

SLIDE 27

6.5, 1 Day, 5 Tests, Sort Objects, Random Gap

[Plot: TCP Performance; x-axis: Checkout (date), 2019-08-01 to 2019-08-02; y-axis: bits/sec, 3 × 10^9 to 3.5 × 10^9]

Annotations: tpmr, octeon, unveil

SLIDE 28

6.5, 1 Day, 5 Tests, Align Sorted Objects, Fixed Gap

[Plot: TCP Performance; x-axis: Checkout (date), 2019-08-01 to 2019-08-02; y-axis: bits/sec, 3 × 10^9 to 3.5 × 10^9]

Annotations: tpmr, octeon, unveil

SLIDE 29

Kernel Symbol Table

nm bsd, diff, diffstat

          sort             align
unveil    +169 -169        +13 -13
tpmr      +25997 -25983    +28731 -28717
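The kind of count shown in this table can be reproduced in miniature by diffing two symbol listings and counting added and removed lines, roughly what `diffstat` reports. The symbol lines below are invented for illustration, and the helper `changed_lines` is a hypothetical name.

```python
import difflib


def changed_lines(old_symbols, new_symbols):
    """Count added and removed lines between two symbol listings."""
    added = removed = 0
    for line in difflib.unified_diff(old_symbols, new_symbols, lineterm="", n=0):
        if line.startswith("+++") or line.startswith("---"):
            continue  # file header lines of the unified diff
        if line.startswith("+"):
            added += 1
        elif line.startswith("-"):
            removed += 1
    return added, removed


# Hypothetical nm-style lines: address, type, symbol name.
old = [
    "ffffffff81000000 T start",
    "ffffffff81001000 T main",
    "ffffffff81002000 T tcp_input",
]
new = [
    "ffffffff81000000 T start",
    "ffffffff81001040 T main",
    "ffffffff81002040 T tcp_input",
]
print(changed_lines(old, new))
```

A small count means few symbols moved between two kernels; a large count means most of the kernel text was shifted.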

SLIDE 30

6.4, 15 Days, 5 Tests, 2 CPU Sockets, Keep running

[Plot: TCP Performance; x-axis: Checkout (date), 2018-10-12 to 2018-10-24; y-axis: bits/sec, 5 × 10^8 to 4.5 × 10^9; quirk markers Y, Z, a, b, c]

SLIDE 31

2 CPU Sockets, Repeat, Keep running

second cycle

SLIDE 32

from 6.2 to 6.3, 173 Days, Reorder

[Plot: TCP Performance; x-axis: Checkout (date), 2017-11-01 to 2018-03-01; y-axis: bits/sec, 5 × 10^8 to 4.5 × 10^9; quirk markers A to J]

tx mitigation

SLIDE 33

from 6.2 to 6.3, 173 Days, Make Kernel

[Plot: MAKE Performance; x-axis: Checkout (date), 2017-11-01 to 2018-03-01; y-axis: sec, 100 to 500; quirk markers A to J]

clang 5.0.0 clang 5.0.1 Meltdown

SLIDE 34

from 6.3 to 6.4, 202 Days, Reorder

[Plot: TCP Performance; x-axis: Checkout (date), 2018-04-01 to 2018-10-01; y-axis: bits/sec, 5 × 10^8 to 4.5 × 10^9; quirk markers K to Y]

retpoline witness retguard witness

SLIDE 35

from 6.3 to 6.4, 202 Days, Reorder

[Plot: MAKE Performance; x-axis: Checkout (date), 2018-04-01 to 2018-10-01; y-axis: sec, 100 to 500; quirk markers K to Y]

clang 6.0.0 DRM retpoline witness retguard witness time

SLIDE 36

from 6.4 to 6.5, 185 Days, Reorder

[Plot: TCP Performance; x-axis: Checkout (date), 2018-11-01 to 2019-04-01; y-axis: bits/sec, 5 × 10^8 to 4.5 × 10^9; quirk markers Y, Z, a to l]

save args

SLIDE 37

from 6.4 to 6.5, 185 Days, Make Kernel

[Plot: MAKE Performance; x-axis: Checkout (date), 2018-11-01 to 2019-04-01; y-axis: sec, 100 to 500; quirk markers Y, Z, a to l]

llvm ld clang 7.0.1 libllvm stack protector

SLIDE 38

from 6.5, 154 Days, Align

[Plot: TCP Performance; x-axis: Checkout (date), 2019-05-01 to 2019-09-01; y-axis: bits/sec, 5 × 10^8 to 4.5 × 10^9; quirk marker m]

tx mitigation checksum

SLIDE 39

OpenBSD CVS Log

OpenBSD cvs log

created: 2019-04-20T18:30:24Z
begin: 2019-04-16T00:00:00Z
end: 2019-04-17T00:00:00Z
path: src/sys
commits: 8

date: 2019-04-16T04:04:19Z
author: dlg
files: src/sys/net/if.c, src/sys/net/if_var.h, src/sys/net/ifq.c, src/sys/net/ifq.h
message:

have another go at tx mitigation

the idea is to call the hardware transmit routine less since in a lot of cases posting a producer ring update to the chip is (very) expensive. it's better to do it for several packets instead of each packet, hence calling this tx mitigation.

this diff defers the call to the transmit routine to a network taskq, or until a backlog of packets has built up.

dragonflybsd uses 16 as the size of it's backlog, so i'm copying them for now.

i've tried this before, but previous versions caused deadlocks. i discovered that the deadlocks in the previous version was from ifq_barrier calling taskq_barrier against the nettq. interfaces generally hold NET_LOCK while calling ifq_barrier, but the tq might already be waiting for the lock we hold.

this version just doesnt have ifq_barrier call taskq_barrier. it instead relies on the IFF_RUNNING flag and normal ifq serialiser barrier to guarantee the start routine wont be called when an interface is going down. the taskq_barrier is only used during interface destruction to make sure the task struct wont get used in the future, which is already done without the NET_LOCK being held.

tx mitigation provides a nice performanace bump in some setups. up to 25% in some cases.

tested by tb@ and hrvoje popovski (who's running this in production).
ok visa@

date: 2019-04-16T09:40:21Z
author: dlg
files: src/sys/dev/pci/if_myx.c
message:

make sff page reads work on little endian archs too. like amd64.

some modules seem to need more time when waiting for bytes while here.

hrvoje popovski hit the endian issue

date: 2019-04-16T11:42:56Z
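The batching idea from the tx mitigation commit message above can be illustrated with a toy model that counts how often the hardware transmit routine would be called. This is not the kernel implementation: the class `TxQueue` and its methods are hypothetical, and only the backlog size of 16 comes from the commit message.

```python
class TxQueue:
    """Toy model of tx mitigation: batch packets before kicking the hardware."""

    def __init__(self, backlog_limit=16):
        self.backlog_limit = backlog_limit
        self.pending = 0
        self.hardware_kicks = 0

    def enqueue(self):
        """Queue one packet; start transmission once the backlog has built up."""
        self.pending += 1
        if self.pending >= self.backlog_limit:
            self.start()

    def start(self):
        """Stand-in for the hardware transmit routine (one ring update)."""
        if self.pending:
            self.hardware_kicks += 1
            self.pending = 0


queue = TxQueue()
for _ in range(100):
    queue.enqueue()
queue.start()  # the deferred task flushes whatever is left
print(queue.hardware_kicks, "kicks for 100 packets")
```

Without batching, the same 100 packets would cost 100 calls to the transmit routine; the model ignores latency, locking, and the taskq scheduling discussed in the commit message.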

SLIDE 40

UDP Throughput, from 6.5, 154 Days, Align

[Plot: UDP Performance; x-axis: Checkout (date), 2019-05-01 to 2019-09-01; y-axis: bits/sec, 5 × 10^8 to 2 × 10^9; quirk marker m]

tx mitigation MDS TSC TSC checksum

SLIDE 41

UDP and Timecounter

iperf3

timecounter    UDP Mbits
tsc            924
acpihpet0      739
acpitimer0     395
i8254          306
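To make the differences in this table easier to compare, each result can be expressed relative to the tsc timecounter. The numbers are the ones from the slide; the helper name `relative_to` is hypothetical.

```python
# UDP throughput in Mbit/s per timecounter, as listed on the slide.
udp_mbits = {"tsc": 924, "acpihpet0": 739, "acpitimer0": 395, "i8254": 306}


def relative_to(baseline, results):
    """Return each result as a percentage of the baseline entry."""
    base = results[baseline]
    return {name: round(100 * value / base, 1) for name, value in results.items()}


for name, percent in relative_to("tsc", udp_mbits).items():
    print(f"{name:<12}{percent:>6}%")
```

With i8254 the measured UDP throughput is about a third of the tsc value, so the choice of timecounter alone can dominate this benchmark.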

SLIDE 42

iperf3 UDP

send packet in iperf3 loop

  • 1 write
  • 2 gettimeofday
  • 1 select
  • 2 gettimeofday

SLIDE 43

Agenda

1. What did exist before?
2. How does it work?
3. What are the findings?
4. What is the Conclusion?

SLIDE 44

Insights

  • measuring sucks
  • multi socket CPUs suck
  • reproducing is hard
  • do not trust your numbers
  • keep it stupid simple

SLIDE 45

Future Ideas

  • forwarding throughput
  • Linux client and server
  • testing patches
  • historic releases
  • file system performance

SLIDE 46

Thanks

  • Jan Klemkow for Hardware Administration
  • Moritz Buhl for Visualization
  • genua for Hosting and Worktime

SLIDE 47

Links

http://bluhm.genua.de/
http://bluhm.genua.de/regress/results/regress.html
http://bluhm.genua.de/perform/results/perform.html
http://bluhm.genua.de/perform/results/gnuplot/test.data
https://github.com/bluhm/regress-all
https://github.com/bluhm/udpbench
https://github.com/younix/testmaster
https://github.com/bluhm/talk-perform

SLIDE 48

Questions