SLIDE 2

Patterns for High Performance C#

Federico Lois

Twitter: @federicolois Github: redknightlois Repo: performance-course

SLIDE 3

corvalius.com

“The best programs are written so that computing machines can perform them quickly and so that human beings can understand them clearly.”

Donald Knuth

SLIDE 4

Asymptotic Notation

“Big-O notation is a mathematical notation that describes the limiting behavior of a function when the argument tends towards a particular value or infinity.” Bachmann–Landau notation

SLIDE 5

Big-O notation

  • Instruction counting (Turing model)
    • Simple, effective for most problems.
  • Cache-oblivious (based on the RAM model)
    • Incorporates a simple cache into the model.
    • Doesn't explicitly model its size.
    • It can include the tall-cache assumption.
  • Cache-aware (based on the RAM model)
    • Explicitly models cache size, structure, and eviction policy.
    • Theoretical analysis is pretty complex.
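The difference between these models is easiest to see in code. A sketch (mine, not from the deck): both loops below execute the same instructions, so a pure instruction-counting analysis rates them equal, yet the strided version runs noticeably slower on real hardware because of cache misses — exactly the behavior the cache-based models try to capture.

```csharp
using System;
using System.Diagnostics;

public static class CacheDemo
{
    public static long SumRowMajor(int[,] m)
    {
        long sum = 0;
        // Sequential access: consecutive elements share cache lines.
        for (int i = 0; i < m.GetLength(0); i++)
            for (int j = 0; j < m.GetLength(1); j++)
                sum += m[i, j];
        return sum;
    }

    public static long SumColumnMajor(int[,] m)
    {
        long sum = 0;
        // Strided access: every read lands on a different cache line.
        for (int j = 0; j < m.GetLength(1); j++)
            for (int i = 0; i < m.GetLength(0); i++)
                sum += m[i, j];
        return sum;
    }

    public static void Main()
    {
        var m = new int[4096, 4096];
        var sw = Stopwatch.StartNew();
        long a = SumRowMajor(m);
        long rowMs = sw.ElapsedMilliseconds;
        sw.Restart();
        long b = SumColumnMajor(m);
        long colMs = sw.ElapsedMilliseconds;
        // Identical instruction counts, very different wall-clock time.
        Console.WriteLine($"row: {rowMs} ms, col: {colMs} ms, equal: {a == b}");
    }
}
```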

SLIDE 6

Big O in practice

  • Useful to evaluate general behavior.
    • Not necessarily a deal-breaker.
    • Guides your hypothesis.
  • Usually will not represent the behavior
    • until sizes are big enough to dominate,
    • which may never happen.
  • Simple models add uncertainty.
    • Our job is to adjust those variables.

SLIDE 7

Performance Bounds

  • Compute bound
  • Memory bound
  • Input/Output (I/O) bound

SLIDE 8

20% of the code consumes 80% of the resources

Pareto Rule (80-20)

…especially bad when they are in the critical path

SLIDE 9

Pareto

20% of the code consumes 80% of the CPU/Memory/IO

SLIDE 10

Pareto

20% of the code consumes 80% of the resources

SLIDE 11

20% of the 20% of the code consumes 64% of the resources

Pareto²

…around 4% of the code.

SLIDE 12

20% of the 20% of the 20% of the code consumes 51% of the resources

Pareto³

…roughly 0.8% of the code.

SLIDE 13

Pareto

Architecture/Network/Algorithm

Optimization Land

SLIDE 14

Pareto

  • Choosing the wrong algorithm/data structure.
  • Systems outgrowing design parameters.
  • Chatty network interfaces: nano-services
  • Physical (and not so physical) distance.
  • CPU is doing nothing, nichts, nada!

SLIDE 15

Pareto²

Algorithm Time Optimization Land

SLIDE 16

Pareto²

  • Doing things more than once.
  • CPU is doing stuff, just nothing useful (for you)!
  • Memory pressure on the GC or allocators.
  • Thread state hand-off.
  • Using data structures wrong.
    • I'm looking at you, int.GetHashCode() and long.GetHashCode().

SLIDE 17

Pareto³

Micro-Optimization Land

SLIDE 18

Pareto³

Voodoo Land

SLIDE 19

Pareto³

… function calls will hurt you
… code alignment will hurt you
… useless instructions will hurt you
… false sharing will hurt you
… cache-line pollution will hurt you
… memory layout will hurt you
… loop size in bytes will hurt you

You get the idea 

SLIDE 20

Secret Sauce for High Performance (???)

  • Adopt laziness as a way of life.
    • Why do things twice when you can do them once?
  • Choose the right data structures/algorithms.
  • Avoid being chatty over the network (aka I/O).
  • Design for no less than 20x your expected requirements.
  • Avoid allocations (like the plague).

SLIDE 21

Measure, measure, and when you are sure,

Measure again!

(just in case, you know!)

SLIDE 22

The End!

…of not talking about C#

SLIDE 23

High Performance C#

(even though most of it applies to other platforms / frameworks / languages out there…)

SLIDE 24

IF-Switch

SLIDE 25

IF-Switch

  • If you know the statistical distribution,
    • IF tends to be more efficient, except when
      • you face a uniform distribution, or
      • you face a non-tail distribution.
  • SWITCH builds a perfect hash
    • unless the values are consecutive (then a jump table is used).
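The deck's code for this slide is not in the transcript; a sketch of the trade-off (my example): when one case dominates, an if chain that tests the hot value first is a single compare-and-branch, while consecutive case values let the compiler lower switch to a jump table.

```csharp
using System;

public static class Dispatch
{
    // If, say, ~90% of operators are '+', testing it first makes the
    // common path one compare-and-branch.
    public static int WithIf(char op, int a, int b)
    {
        if (op == '+') return a + b;  // hot path first
        if (op == '-') return a - b;
        return a * b;
    }

    // Consecutive case values let the compiler emit a jump table
    // instead of a chain of compares (or a computed hash otherwise).
    public static string WithSwitch(int code)
    {
        switch (code)
        {
            case 0: return "ok";
            case 1: return "warn";
            case 2: return "error";
            case 3: return "fatal";
            default: return "unknown";
        }
    }

    public static void Main()
    {
        Console.WriteLine(WithIf('+', 2, 3));  // 5
        Console.WriteLine(WithSwitch(2));      // error
    }
}
```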

SLIDE 26

Try-Catch

SLIDE 27

Try-Catch

CanThrow
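The code screenshots for these Try-Catch slides did not survive the transcript. The pattern they usually illustrate (a hedged reconstruction; the Guard/ThrowOutOfRange names are mine) is the throw-helper: keep the throw out of the hot method so that method stays small and inlineable, and mark the helper as never-inlined so the cold path stays cold.

```csharp
using System;
using System.Runtime.CompilerServices;

public static class Guard
{
    // The hot method contains no throw statement; the cold path lives
    // in a separate, never-inlined helper.
    public static int Double(int[] values, int index)
    {
        if ((uint)index >= (uint)values.Length)
            ThrowOutOfRange(index);          // cold path
        return values[index] * 2;            // hot path
    }

    [MethodImpl(MethodImplOptions.NoInlining)]
    static void ThrowOutOfRange(int index) =>
        throw new ArgumentOutOfRangeException(nameof(index), index, null);
}

public static class Program
{
    public static void Main()
    {
        Console.WriteLine(Guard.Double(new[] { 1, 2, 3 }, 1)); // 4
        try { Guard.Double(new int[0], 0); }
        catch (ArgumentOutOfRangeException) { Console.WriteLine("caught"); }
    }
}
```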

SLIDE 28

Try-Catch

SLIDE 29

Try-Catch

Without Try-Catch

SLIDE 30

Interfaces vs Class vs Struct

  • Stack allocation vs heap allocation
    • At least in C#, etc.
  • Accessing a struct via an interface will allocate on the heap.
    • Aka boxing.
  • Structs are subject to special optimizations.
  • You can abuse the dead-code-elimination mechanism to do simple metaprogramming.
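A sketch of the boxing point above (my example): the struct is copied to the heap the moment it is viewed through the interface, while a generic method constrained to the interface stays allocation-free, because the JIT specializes it per value type.

```csharp
using System;

public interface ICounter { int Next(); }

public struct Counter : ICounter
{
    int _value;
    public int Next() => ++_value;
}

public static class BoxingDemo
{
    // Interface parameter: passing a struct here boxes it (heap copy),
    // and the mutation happens on the box, not the caller's copy.
    public static int ViaInterface(ICounter c) => c.Next();

    // Generic with constraint: no boxing; the JIT specializes this
    // method for Counter, and the call can even be inlined.
    public static int ViaGeneric<T>(T c) where T : ICounter => c.Next();

    public static void Main()
    {
        var counter = new Counter();
        Console.WriteLine(ViaInterface(counter)); // 1 (mutates a boxed copy)
        Console.WriteLine(ViaGeneric(counter));   // 1 (mutates a local copy)
    }
}
```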

SLIDE 31

Allocations

  • Pooling
    • Generalized
    • Contextual
    • Per operation
  • Stack allocations
    • Fixed
    • Structs
  • Ref/Out trick
  • Ref return (C# 7)
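Two of the techniques listed, sketched (my example, not from the deck): `stackalloc` with `Span<int>` for small, scope-bound buffers that never touch the GC, and `ArrayPool<T>` as a generalized pool for larger ones.

```csharp
using System;
using System.Buffers;

public static class Buffers
{
    public static int SumSmall()
    {
        // Stack allocation: no GC involvement, freed on return.
        Span<int> buffer = stackalloc int[8];
        for (int i = 0; i < buffer.Length; i++) buffer[i] = i;
        int sum = 0;
        foreach (int v in buffer) sum += v;
        return sum;
    }

    public static int SumLarge(int count)
    {
        // Generalized pooling: rent, use, and always return.
        int[] rented = ArrayPool<int>.Shared.Rent(count);
        try
        {
            for (int i = 0; i < count; i++) rented[i] = 1;
            int sum = 0;
            for (int i = 0; i < count; i++) sum += rented[i];
            return sum;
        }
        finally
        {
            ArrayPool<int>.Shared.Return(rented);
        }
    }

    public static void Main()
    {
        Console.WriteLine(SumSmall());        // 28 (0+1+...+7)
        Console.WriteLine(SumLarge(100_000)); // 100000
    }
}
```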

SLIDE 32

Allocations

SLIDE 33

Inlining

  • Compilers do it, and allow you to suggest targets to them.
  • Avoiding the call helps you because it diminishes:
    • instruction cache misses,
    • push/pop on the stack,
    • the number of retired instructions,
    • call-context changes at the processor.
  • Avoiding the call increases:
    • caller size in bytes,
    • locality of reference,
    • collateral effects (among others):
      • dead code elimination,
      • constant propagation.
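In C#, "suggesting targets" means `MethodImplOptions.AggressiveInlining`; a minimal sketch (my example):

```csharp
using System;
using System.Runtime.CompilerServices;

public static class Bits
{
    // A hint that this is worth inlining even if the JIT's size
    // heuristics might otherwise skip it (a hint, not a guarantee).
    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    public static bool IsPowerOfTwo(uint x) => x != 0 && (x & (x - 1)) == 0;
}

public static class InliningDemo
{
    public static void Main()
    {
        // Once inlined, the call disappears: no push/pop, no call/ret,
        // and the constant argument can feed constant propagation.
        Console.WriteLine(Bits.IsPowerOfTwo(64)); // True
        Console.WriteLine(Bits.IsPowerOfTwo(96)); // False
    }
}
```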

SLIDE 34

Inlining – Call Cost

SLIDE 35

Inlining – Call Cost

SLIDE 36

Inlining - Virtual Calls

  • They can't be removed without devirtualization.
  • The cost of a virtual call is higher than that of static ones.
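One practical way to help the JIT devirtualize (my example, not from the deck): seal the class. A sealed type cannot be overridden further, so calls through a reference of that exact type can become direct, inlinable calls.

```csharp
using System;

public abstract class Shape
{
    public abstract double Area();
}

// sealed: the JIT knows Area() has no further overrides, so calls
// through a Circle-typed reference can be devirtualized.
public sealed class Circle : Shape
{
    public double Radius;
    public override double Area() => Math.PI * Radius * Radius;
}

public static class DevirtDemo
{
    public static void Main()
    {
        Circle c = new Circle { Radius = 1 };
        // Exact type known here: direct call, inlinable.
        Console.WriteLine(c.Area() > 3.14);  // True

        Shape s = c;
        // Through the base type the call stays virtual unless the JIT
        // can prove the concrete type (e.g., guarded devirtualization).
        Console.WriteLine(s.Area() > 3.14);  // True
    }
}
```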

SLIDE 37

Constant Propagation
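The code for these Constant Propagation slides is not in the transcript; a hedged sketch of the idea (names are mine): a `const` is baked into every use site, and once a small method is inlined with a constant argument, the JIT can fold the arithmetic and eliminate the dead branch below it.

```csharp
using System;

public static class ConstProp
{
    // A const is embedded in the IL at every use site; nothing is
    // loaded at run time.
    public const int PageSize = 4096;

    public static int SlotsPerPage(int slotSize)
    {
        // With slotSize known at the call site (after inlining),
        // the JIT can fold this division to a constant.
        return PageSize / slotSize;
    }

    public static void Main()
    {
        // 4096 / 64 folds to 64; the comparison then folds to true,
        // and the untaken branch can be eliminated entirely.
        if (SlotsPerPage(64) == 64)
            Console.WriteLine("folded");
    }
}
```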

SLIDE 38

Constant Propagation

SLIDE 39

Constant Propagation

SLIDE 40

Simple Metaprogramming
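The Simple Metaprogramming slide's code did not survive the transcript. The dead-code-elimination trick mentioned earlier works roughly like this (hedged reconstruction; the marker-struct names are mine): the JIT compiles a separate body for each value-type generic argument, so a `typeof(T)` test against a marker struct is a JIT-time constant and the untaken branch is eliminated for free.

```csharp
using System;

// Marker structs select behavior at JIT time; they carry no data.
public struct Validated { }
public struct Unvalidated { }

public static class Parser
{
    public static int ParseDigit<TMode>(char c) where TMode : struct
    {
        // The JIT specializes this method per TMode, so this
        // comparison is a constant and one branch is dead code.
        if (typeof(TMode) == typeof(Validated))
        {
            if (c < '0' || c > '9')
                throw new FormatException("not a digit");
        }
        return c - '0';
    }
}

public static class MetaDemo
{
    public static void Main()
    {
        // Checked variant: keeps the range test.
        Console.WriteLine(Parser.ParseDigit<Validated>('7'));   // 7
        // Unchecked variant: the compiled body has no branch at all.
        Console.WriteLine(Parser.ParseDigit<Unvalidated>('7')); // 7
    }
}
```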

SLIDE 41

Comparers
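The Comparers slides' code is likewise missing from the transcript. The usual pattern here (a hedged sketch; type and method names are mine): pass a struct comparer as a generic type argument instead of an `IEqualityComparer<T>` reference, so the comparison is a direct, inlinable call with no interface dispatch and no allocation.

```csharp
using System;
using System.Collections.Generic;

// A zero-size struct comparer: the generic instantiation below gets
// specialized for it, so the Equals call is direct rather than virtual.
public struct OrdinalIgnoreCaseComparer : IEqualityComparer<string>
{
    public bool Equals(string x, string y) =>
        string.Equals(x, y, StringComparison.OrdinalIgnoreCase);
    public int GetHashCode(string s) =>
        StringComparer.OrdinalIgnoreCase.GetHashCode(s);
}

public static class Search
{
    // TComparer is a struct: no boxing, and the JIT can devirtualize
    // and inline the comparer calls.
    public static bool Contains<TComparer>(string[] items, string needle, TComparer cmp)
        where TComparer : struct, IEqualityComparer<string>
    {
        foreach (var item in items)
            if (cmp.Equals(item, needle))
                return true;
        return false;
    }

    public static void Main()
    {
        var words = new[] { "Alpha", "Beta" };
        Console.WriteLine(Contains(words, "beta", default(OrdinalIgnoreCaseComparer)));  // True
        Console.WriteLine(Contains(words, "gamma", default(OrdinalIgnoreCaseComparer))); // False
    }
}
```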

SLIDE 42

Comparers

SLIDE 43

Comparers

SLIDE 44

Code Flow

SLIDE 45

SLIDE 46

SLIDE 47

But is all this work really worth the trouble?

SLIDE 48

SLIDE 49

Thanks for coming!