Patterns for High Performance C#
Federico Lois
Twitter: @federicolois Github: redknightlois Repo: performance-course
corvalius.com
“The best programs are written so that computing machines can perform them quickly and so that human beings can understand them clearly.”
Donald Knuth
Asymptotic Notation
“Big-O notation is a mathematical notation that describes the limiting behavior of a function when the argument tends towards a particular value or infinity.” Bachmann–Landau notation
Big-O Notation
- Instruction Counting. (Turing model)
- Simple, effective for most problems.
- Cache-Oblivious (based on RAM model)
- Incorporates a simple cache to the model.
- Doesn’t explicitly model its size.
- It can include the tall-cache assumption.
- Cache-Aware (based on RAM model)
- It explicitly models size, structure, and eviction policy.
- Theoretical analysis is pretty complex.
Big O in practice
- Useful to evaluate general behavior.
- Not necessarily a deal-breaker
- Guides your hypothesis.
- Usually will not represent the behavior
- Until sizes are big enough to dominate
- Which may never happen
- Simple models add uncertainty.
- Our job is to adjust those variables.
Performance Bounds
- Compute Bound
- Memory Bound
- Input/Output Bound
20% of the code consumes 80% of the resources
Pareto Rule (80-20)
…especially bad when they are in the critical path
Pareto
20% of the code consumes 80% of the CPU/Memory/IO
Pareto²
20% of the 20% of the code consumes 64% of the resources
…around 4% of the code.
Pareto³
20% of the 20% of the 20% of the code consumes 51% of the resources
…roughly 0.8% of the code.
Pareto
Architecture/Network/Algorithm Optimization Land
Pareto
- Choosing the wrong algorithm/data structure.
- Systems outgrowing design parameters.
- Chatty network interfaces: nano-services
- Physical (and not so physical) distance.
- CPU is doing nothing, nichts, nada!
Pareto²
Algorithm Time Optimization Land
Pareto²
- Doing things more than once.
- CPU is doing stuff, just nothing useful (for you)!
- Memory pressure on GC or allocators
- Thread state hand-off
- Using data structures the wrong way.
- I’m looking at you, int.GetHashCode() and long.GetHashCode()!
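The hash remark above can be seen directly. A minimal sketch, assuming .NET’s default implementations (int.GetHashCode() returns the value itself; long.GetHashCode() XORs the high and low 32-bit halves, so any value whose halves are equal hashes to 0):

```csharp
using System;
using System.Linq;

class HashDemo
{
    static void Main()
    {
        // int.GetHashCode() is the identity function.
        Console.WriteLine(42.GetHashCode());          // 42

        // long.GetHashCode() XORs the two 32-bit halves: every value
        // of the form (v << 32) | v collides on hash code 0.
        int collisions = Enumerable.Range(1, 1000)
            .Select(v => ((long)v << 32) | (uint)v)
            .Count(x => x.GetHashCode() == 0);
        Console.WriteLine(collisions);                // 1000
    }
}
```

Patterned 64-bit keys (timestamps, packed ids) can therefore degenerate a hash table into a linked list.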
Pareto³
Micro Optimization Land
Pareto³
Voodoo Land
Pareto³
… function calls will hurt you
… code alignment will hurt you
… useless instructions will hurt you
… false sharing will hurt you
… cache line pollution will hurt you
… memory layout will hurt you
… loop size in bytes will hurt you
You get the idea.
Secret Sauce for High Performance (???)
- Adopt laziness as a way of life.
- Why do things twice when you can do them once?
- Choose the right data structures/algorithms
- Avoid being chatty over the network (aka IO)
- Design for no less than 20x your expected requirements
- Reduce allocations (avoid them like the plague)
Measure, Measure and when you are sure,
Measure Again!!
(just in case, you know!)
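For the measuring itself, a minimal Stopwatch sketch (the workload and iteration counts here are arbitrary placeholders; for serious work a harness such as BenchmarkDotNet handles warm-up, outliers, and statistics for you):

```csharp
using System;
using System.Diagnostics;

class MeasureDemo
{
    public static long Work()
    {
        long sum = 0;
        for (int i = 0; i < 1_000_000; i++) sum += i;
        return sum;
    }

    static void Main()
    {
        Work(); // warm-up: let the JIT compile the method first

        var sw = Stopwatch.StartNew();
        for (int i = 0; i < 100; i++) Work();
        sw.Stop();

        // Report the average per operation, not a single noisy run.
        Console.WriteLine($"{sw.Elapsed.TotalMilliseconds / 100:F3} ms/op");
    }
}
```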
The End!
…of the part that wasn’t about C#
High Performance C#
(even though most of it applies to other platforms / frameworks / languages out there…)
IF-Switch
- If you know the statistical distribution
- IF tends to be more efficient, except when
- You face a uniform distribution.
- You face a non-tail distribution.
- Switch builds a perfect hash
- Unless the values are consecutive, in which case it becomes a jump table.
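A sketch of both shapes (the opcode values and the 90% frequency are hypothetical):

```csharp
using System;

class Dispatch
{
    // Skewed distribution: order the if-chain by expected frequency so
    // the common case exits after a single well-predicted branch.
    public static int HandleSkewed(int op)
    {
        if (op == 0) return 10;   // assume ~90% of calls hit this
        if (op == 1) return 20;
        return 30;
    }

    // Uniform distribution over dense, consecutive values: a switch
    // lets the compiler emit a jump table instead of a branch chain.
    public static int HandleUniform(int op)
    {
        switch (op)
        {
            case 0: return 10;
            case 1: return 20;
            case 2: return 30;
            default: return 0;
        }
    }

    static void Main() =>
        Console.WriteLine(HandleSkewed(0) + HandleUniform(2)); // 40
}
```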
Try-Catch
(code comparison slides: “CanThrow” vs “Without Try-Catch”)
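The original comparison code lives in the slide images, so here is only a hedged sketch of the underlying point: keep the try region around the hot loop rather than inside it, so the loop body carries no protected-region constraints and the exceptional path costs nothing until it actually fires.

```csharp
using System;

class TryCatchDemo
{
    // One try region around the whole loop, not one per iteration.
    public static long SumParsed(string[] values)
    {
        long sum = 0;
        try
        {
            for (int i = 0; i < values.Length; i++)
                sum += long.Parse(values[i]);
        }
        catch (FormatException)
        {
            return -1; // exceptional path, paid only when it happens
        }
        return sum;
    }

    static void Main()
    {
        Console.WriteLine(SumParsed(new[] { "1", "2", "3" }));  // 6
        Console.WriteLine(SumParsed(new[] { "1", "oops" }));    // -1
    }
}
```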
Interfaces vs Class vs Struct
- Stack Allocation vs Heap Allocation
- At least in C#, etc.
- Accessing a struct via an interface will allocate on the heap.
- Aka Boxing
- Struct is subject to special optimization.
- You can abuse the Dead Code Elimination mechanism to do simple metaprogramming.
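The boxing point can be shown in a few lines; the `IShape`/`Square` types are made up for illustration. Calling through the interface boxes the struct onto the heap, while a generic method constrained to the interface keeps it on the stack:

```csharp
using System;

interface IShape { double Area(); }

struct Square : IShape
{
    public double Side;
    public double Area() => Side * Side;
}

class BoxingDemo
{
    // Generic constraint instead of an interface parameter: the JIT
    // specializes this method per struct, so no boxing occurs.
    public static double AreaOf<T>(T shape) where T : IShape => shape.Area();

    static void Main()
    {
        var s = new Square { Side = 2 };

        double direct = s.Area();   // direct call, no allocation

        IShape boxed = s;           // boxing: a heap copy is made here
        double viaIface = boxed.Area();

        double viaGeneric = AreaOf(s); // no allocation

        Console.WriteLine($"{direct} {viaIface} {viaGeneric}"); // 4 4 4
    }
}
```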
Allocations
- Pooling
- Generalized
- Contextual
- Per operation
- Stack Allocations
- Fixed
- Structs
- Ref/Out Trick
- Ref Return (C# 7)
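Two of the techniques above in one sketch, assuming a runtime where `System.Buffers.ArrayPool<T>` and `Span<T>` are available (.NET Core / recent .NET Framework with the NuGet packages):

```csharp
using System;
using System.Buffers;

class AllocDemo
{
    static void Main()
    {
        // Generalized pooling: rent a buffer instead of allocating a
        // fresh array, and always return it.
        byte[] buffer = ArrayPool<byte>.Shared.Rent(4096);
        try
        {
            // ... use buffer (note: it may be larger than requested) ...
        }
        finally
        {
            ArrayPool<byte>.Shared.Return(buffer);
        }

        // Stack allocation: short-lived scratch space, zero GC pressure.
        Span<int> scratch = stackalloc int[16];
        for (int i = 0; i < scratch.Length; i++) scratch[i] = i * i;
        Console.WriteLine(scratch[3]); // 9
    }
}
```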
Inlining
- Compilers do it, and let you suggest inlining targets to them.
- Avoiding the call helps you because it diminishes:
- Instruction cache misses
- Push/pop traffic on the stack
- Number of retired instructions
- Call context changes at the processor
- Avoiding the call increases:
- Caller size in bytes
- Locality of reference
- Collateral effects (among others):
- Dead code elimination
- Constant propagation
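In C# the suggestion mechanism is `[MethodImpl]`. A small sketch (the helper names are invented): inline the tiny hot path, and keep the cold throw path out of the caller so it does not inflate the caller’s size in bytes.

```csharp
using System;
using System.Runtime.CompilerServices;

static class InlineDemo
{
    // Suggest inlining to the JIT; tiny hot helpers benefit the most.
    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    public static int Square(int x) => x * x;

    // Cold path: force it to stay a separate call so the happy path
    // of every caller remains small (and itself inlinable).
    [MethodImpl(MethodImplOptions.NoInlining)]
    static void ThrowNegative() =>
        throw new ArgumentOutOfRangeException();

    public static int CheckedSquare(int x)
    {
        if (x < 0) ThrowNegative();
        return Square(x);
    }

    static void Main() => Console.WriteLine(CheckedSquare(6)); // 36
}
```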
Inlining – Call Cost
Inlining - Virtual Calls
- They can’t be removed without devirtualization.
- The cost of a virtual call is higher than static ones.
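One common way to help the JIT devirtualize, sketched with invented types: seal the concrete class so the exact call target is provable, making inlining possible again.

```csharp
using System;

abstract class Codec
{
    public abstract int Encode(int value);
}

// Sealed: the JIT knows no further override can exist, so calls made
// through this concrete type can be devirtualized (and then inlined).
sealed class ShiftCodec : Codec
{
    public override int Encode(int value) => value << 1;
}

class DevirtDemo
{
    static void Main()
    {
        ShiftCodec known = new ShiftCodec();  // exact type is statically known
        Console.WriteLine(known.Encode(21));  // 42

        Codec unknown = known;                // through the abstract type the
        Console.WriteLine(unknown.Encode(21)); // call usually stays virtual
    }
}
```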
Constant Propagation
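The code from these slides lives in the images; as a minimal sketch of the idea (the `EnableChecks`/`Validate` names are invented), a `const` flag is propagated at compile time and the dead branch disappears entirely from the emitted code:

```csharp
using System;

class ConstDemo
{
    const bool EnableChecks = false;

    static void Validate(int x)
    {
        if (x < 0) throw new ArgumentOutOfRangeException(nameof(x));
    }

    public static int Process(int x)
    {
        // EnableChecks is a compile-time constant: the compiler
        // propagates it and eliminates the whole branch, so neither
        // the test nor the call to Validate is ever emitted.
        if (EnableChecks)
            Validate(x);
        return x * 2;
    }

    static void Main() => Console.WriteLine(Process(21)); // 42
}
```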
Simple Metaprogramming
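A sketch of the struct-generic trick (all type names here are invented): encode a compile-time choice as a struct type parameter. The JIT specializes generic code per value type, so the flag becomes a constant and the untaken branch is dead-code eliminated, giving you two variants of a method from one body.

```csharp
using System;

interface IBoundsCheck { bool Enabled { get; } }
struct WithChecks : IBoundsCheck { public bool Enabled => true; }
struct NoChecks : IBoundsCheck { public bool Enabled => false; }

static class Table
{
    public static int Get<TCheck>(int[] data, int index)
        where TCheck : struct, IBoundsCheck
    {
        // default(TCheck).Enabled is a constant per instantiation:
        // the branch either always runs or is eliminated entirely.
        if (default(TCheck).Enabled)
        {
            if ((uint)index >= (uint)data.Length)
                throw new IndexOutOfRangeException();
        }
        return data[index];
    }
}

class MetaDemo
{
    static void Main()
    {
        var data = new[] { 1, 2, 3 };
        Console.WriteLine(Table.Get<WithChecks>(data, 1)); // 2
        Console.WriteLine(Table.Get<NoChecks>(data, 2));   // 3
    }
}
```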
Comparers
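The comparer slides are images; as a hedged sketch of the pattern (names invented), pass the comparer as a struct type parameter instead of an `IEqualityComparer<T>` reference, so the comparison calls are specialized and inlinable rather than interface-dispatched:

```csharp
using System;
using System.Collections.Generic;

// A value-type comparer: no heap allocation, no interface dispatch
// when used through a struct-constrained type parameter.
struct OrdinalStringComparer : IEqualityComparer<string>
{
    public bool Equals(string x, string y) =>
        string.Equals(x, y, StringComparison.Ordinal);
    public int GetHashCode(string s) => s.GetHashCode();
}

static class Search
{
    public static int IndexOf<T, TComparer>(T[] items, T value, TComparer comparer)
        where TComparer : struct, IEqualityComparer<T>
    {
        for (int i = 0; i < items.Length; i++)
            if (comparer.Equals(items[i], value)) // specialized, no boxing
                return i;
        return -1;
    }
}

class ComparerDemo
{
    static void Main() =>
        Console.WriteLine(Search.IndexOf(new[] { "a", "b", "c" }, "b",
                                         new OrdinalStringComparer())); // 1
}
```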
Code Flow
But is all this work really worth the trouble?