Top Performance Myths and Folklore
Martin Thompson - @mjpt777
Top 10 Performance Mistakes
10
Not Upgrading
9
Duplicated Work
Database Tuning?
Where is the real issue?
8
Data Dependent Loads
Aka “Pointer Chasing”
Are all memory operations equal?
Sequential Access
- Average time in ns/op to sum all longs in a 1GB array?
Access Pattern Benchmark
Benchmark                Mode   Score     Error  Units
testSequential           avgt   0.832 ±   0.006  ns/op
~1 ns/op
Really??? Less than 1ns per operation?
Random walk per OS Page
- Average time in ns/op to sum all longs in a 1GB array?
Access Pattern Benchmark
Benchmark                Mode   Score     Error  Units
testSequential           avgt   0.832 ±   0.006  ns/op
testRandomPage           avgt   2.703 ±   0.025  ns/op
~3 ns/op
Data dependent walk per OS Page
- Average time in ns/op to sum all longs in a 1GB array?
Access Pattern Benchmark
Benchmark                Mode   Score     Error  Units
testSequential           avgt   0.832 ±   0.006  ns/op
testRandomPage           avgt   2.703 ±   0.025  ns/op
testDependentRandomPage  avgt   7.102 ±   0.326  ns/op
~7 ns/op
Random heap walk
- Average time in ns/op to sum all longs in a 1GB array?
Access Pattern Benchmark
Benchmark                Mode   Score     Error  Units
testSequential           avgt   0.832 ±   0.006  ns/op
testRandomPage           avgt   2.703 ±   0.025  ns/op
testDependentRandomPage  avgt   7.102 ±   0.326  ns/op
testRandomHeap           avgt  19.896 ±   3.110  ns/op
~20 ns/op
Data dependent heap walk
- Average time in ns/op to sum all longs in a 1GB array?
Access Pattern Benchmark
Benchmark                Mode   Score     Error  Units
testSequential           avgt   0.832 ±   0.006  ns/op
testRandomPage           avgt   2.703 ±   0.025  ns/op
testDependentRandomPage  avgt   7.102 ±   0.326  ns/op
testRandomHeap           avgt  19.896 ±   3.110  ns/op
testDependentRandomHeap  avgt  89.516 ±   4.573  ns/op
~90 ns/op
Need to ADD 40+ ns/op for NUMA access on a server!!!
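The difference between a sequential walk and a data dependent walk can be sketched in a few lines. This is not the benchmark that produced the numbers above; `AccessPatterns`, `sumDependent` and `randomCycle` are illustrative names, and timing it properly would need a harness such as JMH and a much larger array. The point is only the shape of the two loops: in the second one the next index is not known until the current load has completed.

```java
import java.util.Random;

final class AccessPatterns {
    // Sequential walk: every address is known in advance, so hardware prefetchers can run ahead.
    static long sumSequential(long[] data) {
        long sum = 0;
        for (int i = 0; i < data.length; i++) {
            sum += data[i];
        }
        return sum;
    }

    // Data dependent walk: the next index is loaded from memory ("pointer chasing"),
    // so each step has to wait for the previous load of next[index].
    static long sumDependent(int[] next, long[] data) {
        long sum = 0;
        int index = 0;
        for (int i = 0; i < data.length; i++) {
            sum += data[index];
            index = next[index];
        }
        return sum;
    }

    // Builds a random permutation that forms one single cycle (Sattolo's algorithm),
    // so following next[] from index 0 visits every element exactly once.
    static int[] randomCycle(int length, long seed) {
        int[] next = new int[length];
        for (int i = 0; i < length; i++) {
            next[i] = i;
        }
        Random random = new Random(seed);
        for (int i = length - 1; i > 0; i--) {
            int j = random.nextInt(i); // j is in [0, i)
            int tmp = next[i];
            next[i] = next[j];
            next[j] = tmp;
        }
        return next;
    }
}
```

Both methods compute the same sum; only the order in which memory is touched differs.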
What does this mean for data structures?
[Diagram: hash buckets referencing chained entries, each with Hash, Key, Value and Next fields; entries 1 EUR/USD, 2 GBP/EUR and 3 GBP/USD are added one at a time]
[Diagram: buckets indexing into a single entries table with columns Key, Value, Hash and Next; entries 1 EUR/USD, 2 GBP/EUR and 3 GBP/USD are added one at a time]
.NET Dictionary is >10X faster than HashMap for 2+ GB of data
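A layout where buckets hold indices into flat arrays, rather than references to separately allocated nodes, can be sketched as follows. `FlatIntMap` is a hypothetical, fixed-capacity illustration and not the implementation of either library; it drops the stored hash column because the key is an `int`, and the `String` values are still references, which is part of why value types would help in Java.

```java
import java.util.Arrays;

// Hypothetical fixed-capacity map from int keys to String values.
final class FlatIntMap {
    private final int[] buckets;   // index of the first entry in each bucket, or -1 if empty
    private final int[] keys;
    private final String[] values;
    private final int[] next;      // index of the next entry in the same bucket, or -1
    private int count;

    FlatIntMap(int capacity) {
        buckets = new int[capacity];
        Arrays.fill(buckets, -1);
        keys = new int[capacity];
        values = new String[capacity];
        next = new int[capacity];
    }

    void put(int key, String value) {
        int bucket = Math.floorMod(Integer.hashCode(key), buckets.length);
        for (int i = buckets[bucket]; i >= 0; i = next[i]) {
            if (keys[i] == key) {
                values[i] = value; // overwrite an existing key
                return;
            }
        }
        if (count == keys.length) {
            throw new IllegalStateException("map is full");
        }
        keys[count] = key;
        values[count] = value;
        next[count] = buckets[bucket]; // chain by index into the same arrays, not by pointer
        buckets[bucket] = count++;
    }

    String get(int key) {
        int bucket = Math.floorMod(Integer.hashCode(key), buckets.length);
        for (int i = buckets[bucket]; i >= 0; i = next[i]) {
            if (keys[i] == key) {
                return values[i];
            }
        }
        return null;
    }
}
```

Entries are appended in insertion order, so a collision chain is followed through a small set of arrays instead of through nodes scattered across the heap.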
Understand object relationships and then choose appropriate data structures
Java desperately needs Value Types on the stack and Aggregates on the heap
Data Structures are becoming ever more important again!
7
Too Much Allocation
“Allocation is free…”
Reclamation is NOT free!
Remember Data Dependent Loads?
Too much allocation or copying will wash out your cache
6
Going Parallel
http://www.frankmcsherry.org/assets/COST.pdf
Amdahl’s Law
[Chart: Speedup (2 to 20) against Processors (1 to 1024); series: Amdahl]
Universal Scalability Law
[Chart: Speedup (2 to 20) against Processors (1 to 1024); series: Amdahl, USL]
Universal Scalability Law
C(N) = N / (1 + α(N – 1) + βN(N – 1))
C = capacity or throughput
N = number of processors
α = contention penalty
β = coherence penalty
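The formula is easy to evaluate directly. The sketch below assumes illustrative values for α and β (they are not taken from the slides) and treats Amdahl's Law as the special case β = 0, which follows from setting the coherence term to zero in the formula.

```java
final class Scalability {
    // Universal Scalability Law: C(N) = N / (1 + alpha * (N - 1) + beta * N * (N - 1))
    static double usl(int n, double alpha, double beta) {
        return n / (1.0 + alpha * (n - 1) + beta * n * (n - 1));
    }

    // Amdahl's Law: the same formula with no coherence penalty (beta = 0).
    static double amdahl(int n, double alpha) {
        return usl(n, alpha, 0.0);
    }

    public static void main(String[] args) {
        double alpha = 0.05;   // assumed contention penalty
        double beta = 0.0001;  // assumed coherence penalty
        for (int n = 1; n <= 1024; n *= 2) {
            System.out.printf("N=%4d  Amdahl=%6.2f  USL=%6.2f%n",
                n, amdahl(n, alpha), usl(n, alpha, beta));
        }
    }
}
```

With any β greater than zero the N(N – 1) term eventually dominates, so throughput peaks and then falls as processors are added, whereas Amdahl's curve only flattens.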
Shared mutable state is Evil!
“You can have a second computer once you’ve shown you know how to use the first one” – Paul Barham
“You can have a second CPU once you’ve shown you know how to use the first one” – Martin Thompson
5
Not Understanding TCP
TCP – Sequenced Flow 1
[Sequence diagram between Client and Server, built up one message at a time: SYN; SYN, ACK; ACK; Data == MSS; Delayed ACK; Data < MSS]
TCP – Sequenced Flow – TCP_NODELAY
[Sequence diagram between Client and Server, built up one message at a time: SYN; SYN, ACK; ACK; Data == MSS; ACK; Data < MSS]
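For context: Nagle's algorithm, which is on by default, holds back a segment smaller than the MSS while earlier data is still unacknowledged, and a receiver that delays its ACK makes that wait longer. TCP_NODELAY switches Nagle's algorithm off. A minimal sketch of setting the option in Java follows; `NoDelayExample` is an illustrative name, and a real client would go on to connect the socket rather than close it.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.Socket;

final class NoDelayExample {
    // Creates an unconnected socket, enables TCP_NODELAY and reads the option back.
    static boolean enableNoDelay() {
        try (Socket socket = new Socket()) {
            socket.setTcpNoDelay(true); // disable Nagle's algorithm for this socket
            return socket.getTcpNoDelay();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Disabling Nagle's algorithm moves the responsibility for batching small writes to the application.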
4
Synchronous Communications
[Diagram: Client and Server]
Asynchronous Communications
[Diagram: Client and Server]
Synchronous Communications is the crystal meth of distributed computing
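One way to read the contrast is request pipelining: a synchronous caller waits for each response before sending the next request, while an asynchronous caller keeps several requests in flight. The sketch below assumes that reading; `request` is a hypothetical stand-in for a remote call, and a real client would complete the future from network I/O.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;

final class AsyncRequests {
    // Hypothetical stand-in for a remote call.
    static CompletableFuture<Integer> request(int id) {
        return CompletableFuture.supplyAsync(() -> id * 2);
    }

    // Synchronous style: one round trip is paid in full before the next request is sent.
    static List<Integer> callSynchronously(int count) {
        List<Integer> responses = new ArrayList<>();
        for (int id = 0; id < count; id++) {
            responses.add(request(id).join());
        }
        return responses;
    }

    // Asynchronous style: send every request first, then collect the responses in order.
    static List<Integer> callAsynchronously(int count) {
        List<CompletableFuture<Integer>> pending = new ArrayList<>();
        for (int id = 0; id < count; id++) {
            pending.add(request(id));
        }
        List<Integer> responses = new ArrayList<>();
        for (CompletableFuture<Integer> future : pending) {
            responses.add(future.join());
        }
        return responses;
    }
}
```

Both methods return the same responses; the second overlaps the waiting instead of paying for each round trip in sequence.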
3
Text Encoding
“But it’s human readable...”
“Binary is hard to work with...”
while (i >= 0) {
    int remainder = quotient % 10;
    quotient = quotient / 10;
    results[i--] = (byte)('0' + remainder);
}
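The loop above is the kind of work text encoding needs for every number written. For comparison, here is a sketch of the same value encoded as ASCII digits and as a fixed-width binary field; `Encodings` and its method names are illustrative, not from the slides.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

final class Encodings {
    // Text: one byte per decimal digit, plus a sign byte for negative values.
    static byte[] encodeAsText(long value) {
        return Long.toString(value).getBytes(StandardCharsets.US_ASCII);
    }

    // Binary: always 8 bytes, big-endian by default for ByteBuffer.
    static byte[] encodeAsBinary(long value) {
        return ByteBuffer.allocate(Long.BYTES).putLong(value).array();
    }

    static long decodeText(byte[] bytes) {
        return Long.parseLong(new String(bytes, StandardCharsets.US_ASCII));
    }

    static long decodeBinary(byte[] bytes) {
        return ByteBuffer.wrap(bytes).getLong();
    }
}
```

The text form varies in length with the value and has to be parsed digit by digit; the binary form has a fixed size and is read back with a single `getLong`.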
Communications Battery life and bandwidth?
2
API Design
public void characters(
    char[] ch, int start, int length)
    throws SAXException

public void startElement(
    String uri, String localName, String qName, Attributes atts)
    throws SAXException
API Design can be composed for usability vs performance trade offs
public String[] split(String regex)

public Iterable<String> split(String regex)

public void split(
    String regex, Collection<String> dst)
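The third signature lets the caller own the destination, and the convenient array-returning form can then be composed on top of it. A minimal sketch, assuming a single delimiter character instead of a regex to keep it short; `Splitter` is an illustrative name.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

final class Splitter {
    // Lower-level form: the caller supplies the destination, so it can be cleared and reused.
    // Simplification: splits on one character; unlike String.split, trailing empty strings are kept.
    static void split(String input, char delimiter, Collection<String> dst) {
        int start = 0;
        for (int i = 0; i < input.length(); i++) {
            if (input.charAt(i) == delimiter) {
                dst.add(input.substring(start, i));
                start = i + 1;
            }
        }
        dst.add(input.substring(start));
    }

    // Convenience form composed from the lower-level one.
    static String[] split(String input, char delimiter) {
        List<String> parts = new ArrayList<>();
        split(input, delimiter, parts);
        return parts.toArray(new String[0]);
    }
}
```

Callers who care about allocation use the first method with a reused collection; everyone else keeps the familiar array-returning call.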
selector.selectNow();
Set<SelectionKey> selectedKeys = selector.selectedKeys();
Iterator<SelectionKey> iter = selectedKeys.iterator();
while (iter.hasNext()) {
    SelectionKey key = iter.next();
    if (key.isReadable()) {
        key.attachment(); // do work
    }
    iter.remove();
}
// Keep and reuse
List<SelectionKey> keys = new ArrayList<>();
selector.selectNow(keys, READABLE);
keys.forEach(keyHandler);
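The `selectNow(keys, READABLE)` call above reads as a proposed API rather than an existing `java.nio` method. Java 11 added a related shape: `Selector.select`, `select` with a timeout and `selectNow` each have an overload taking a `Consumer<SelectionKey>`, so the caller no longer manages the selected-key set or its iterator. A sketch of that overload, using a `Pipe` as an assumed stand-in for a socket so that it runs without a network peer:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.Pipe;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;

final class SelectWithCallback {
    // Returns how many readable keys were passed to the callback.
    static int countReadable() {
        try (Selector selector = Selector.open()) {
            Pipe pipe = Pipe.open();
            try {
                pipe.source().configureBlocking(false);
                pipe.source().register(selector, SelectionKey.OP_READ);
                pipe.sink().write(ByteBuffer.wrap(new byte[] {1})); // make the source readable

                int[] readable = {0};
                // Java 11+: the action is invoked once per selected key.
                selector.select(key -> {
                    if (key.isReadable()) {
                        readable[0]++; // do work
                    }
                }, 1000);
                return readable[0];
            } finally {
                pipe.sink().close();
                pipe.source().close();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```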
1
[Chart: Average (Mean) Logging Duration; axes: Time (nanoseconds), 20,000 to 160,000; 1 to 8]
Why do we Log?
- Recording Events
- Recording Errors
- Instrumentation
- Debugging
Recording Events
Big Data
Recording Errors
public class DistinctErrorLog { public void record(Throwable observation)
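The slide shows only the signature. One possible reading is a log that stores each distinct error once and counts repeats instead of writing a line per occurrence. The sketch below assumes that reading; `DistinctErrorCounter` is a hypothetical class, not the `DistinctErrorLog` from the slide, and it treats two errors as the same when their class name and message match.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical sketch; not the DistinctErrorLog referenced on the slide.
final class DistinctErrorCounter {
    private final ConcurrentHashMap<String, LongAdder> counts = new ConcurrentHashMap<>();

    private static String keyOf(Throwable observation) {
        return observation.getClass().getName() + ": " + observation.getMessage();
    }

    // Records an observation. Returns true only the first time a distinct error is seen;
    // repeats just increment a counter.
    boolean record(Throwable observation) {
        String key = keyOf(observation);
        LongAdder counter = counts.get(key);
        boolean isNew = false;
        if (counter == null) {
            LongAdder fresh = new LongAdder();
            counter = counts.putIfAbsent(key, fresh);
            if (counter == null) {
                counter = fresh;
                isNew = true;
            }
        }
        counter.increment();
        return isNew;
    }

    long count(Throwable observation) {
        LongAdder counter = counts.get(keyOf(observation));
        return counter == null ? 0 : counter.sum();
    }

    int distinctCount() {
        return counts.size();
    }
}
```

A caller could write the full stack trace only when `record` returns true and rely on the count afterwards.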
Instrumentation
systemCounters.get(FAILED_LOGIN).increment();
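A counter lookup like the one above can be backed by preallocated counters, so that instrumenting an event costs an increment rather than a formatted log line. A minimal sketch, assuming a hypothetical `Counter` enum and `LongAdder` as the counter type; neither is confirmed by the slides.

```java
import java.util.EnumMap;
import java.util.concurrent.atomic.LongAdder;

final class SystemCounters {
    // Hypothetical set of counter identifiers.
    enum Counter { FAILED_LOGIN, SUCCESSFUL_LOGIN }

    // Populated once in the constructor and never modified afterwards.
    private final EnumMap<Counter, LongAdder> counters = new EnumMap<>(Counter.class);

    SystemCounters() {
        for (Counter counter : Counter.values()) {
            counters.put(counter, new LongAdder()); // preallocate every counter up front
        }
    }

    LongAdder get(Counter counter) {
        return counters.get(counter);
    }
}
```

Usage mirrors the slide: `systemCounters.get(SystemCounters.Counter.FAILED_LOGIN).increment();`, with `sum()` read by whatever reports the counters.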
Debugging
Byte Buddy
In Closing…
Where are you spending your Computing Resource Budget?
Run a profiler regularly!!!
Blog: http://mechanical-sympathy.blogspot.com/
Twitter: @mjpt777
“Any intelligent fool can make things bigger, more complex, and more violent. It takes a touch of genius, and a lot of courage, to move in the opposite direction.” - Albert Einstein