Production Profiling: What, Why and How (Richard Warburton)




slide-1
SLIDE 1

Production Profiling: What, Why and How

Richard Warburton (@richardwarburto) Sadiq Jaffer (@sadiqj) https://www.opsian.com

slide-2
SLIDE 2

Why Performance Matters
Development isn’t Production
Profiling vs Monitoring
Production Profiling
Conclusion

slide-3
SLIDE 3

Customer Experience

slide-4
SLIDE 4

Amazon: 100ms of latency costs 1% of sales
Google: 500ms of extra search page generation time drops traffic by 20%

Responsive Applications make more Money

slide-5
SLIDE 5

Stop Costly Downtime

slide-6
SLIDE 6

Reduce Costs

slide-7
SLIDE 7

Why Performance Matters
Development isn’t Production
Profiling vs Monitoring
Production Profiling
Conclusion

slide-8
SLIDE 8

Development isn’t Production

Performance testing in development can be easier
May not have access to production
Tooling often desktop-based
Not representative of production

slide-9
SLIDE 9

Unrepresentative Hardware

vs

slide-10
SLIDE 10

Unrepresentative Software

slide-11
SLIDE 11

Unrepresentative Workloads

vs

slide-12
SLIDE 12

The JVM may have very different behaviour in production

HotSpot does adaptive optimisation
Production may optimise differently

slide-13
SLIDE 13
slide-14
SLIDE 14

Why Performance Matters
Development isn’t Production
Profiling vs Monitoring
Production Profiling
Conclusion

slide-15
SLIDE 15

Ambient/Passive/System Metrics

Preconfigured numerical measures about the system
CPU Time Usage / Page-load Times
Cheap and sometimes effective
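A minimal sketch of what collecting such ambient metrics can look like on the JVM, using only the standard `java.lang.management` beans. The class name `AmbientMetrics` and the metric names are assumptions made for illustration; they are not from the talk.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;
import java.lang.management.OperatingSystemMXBean;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper: reads a few preconfigured JVM and OS metrics. */
class AmbientMetrics {

    /** Returns one snapshot of cheap, always-available numeric measures. */
    static Map<String, Double> snapshot() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();

        Map<String, Double> metrics = new LinkedHashMap<>();
        metrics.put("availableProcessors", (double) os.getAvailableProcessors());
        // Negative when the platform cannot report a load average.
        metrics.put("systemLoadAverage", os.getSystemLoadAverage());
        metrics.put("heapUsedBytes", (double) heap.getUsed());
        metrics.put("uptimeMillis", (double) ManagementFactory.getRuntimeMXBean().getUptime());
        return metrics;
    }

    public static void main(String[] args) {
        snapshot().forEach((name, value) -> System.out.println(name + " = " + value));
    }
}
```

Numbers like these are cheap to collect, but they say that the system is busy, not which code is responsible.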

slide-16
SLIDE 16

Logging

Records arbitrary events emitted by the system being monitored
log4j/slf4j/logback
Logs of GC events
Often manual, aids system understanding, expensive

slide-17
SLIDE 17

Coarse Grained Instrumentation

Measures time within some instrumented section of the code
Time spent inside the controller layer of your web-app or performing SQL queries
More detailed and actionable though expensive
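As an illustration of coarse-grained instrumentation, here is a minimal sketch that charges wall-clock time to a named section of code. `SectionTimer`, `timed` and the section names are hypothetical; real applications would more likely use a metrics library or a framework interceptor.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;
import java.util.function.Supplier;

/** Hypothetical helper: accumulates wall-clock time per named code section. */
class SectionTimer {
    private static final Map<String, LongAdder> TOTAL_NANOS = new ConcurrentHashMap<>();
    private static final Map<String, LongAdder> CALLS = new ConcurrentHashMap<>();

    /** Runs the body and charges its elapsed wall-clock time to the section. */
    static <T> T timed(String section, Supplier<T> body) {
        long start = System.nanoTime();
        try {
            return body.get();
        } finally {
            long elapsed = System.nanoTime() - start;
            TOTAL_NANOS.computeIfAbsent(section, k -> new LongAdder()).add(elapsed);
            CALLS.computeIfAbsent(section, k -> new LongAdder()).increment();
        }
    }

    /** Number of completed calls recorded for the section. */
    static long calls(String section) {
        LongAdder adder = CALLS.get(section);
        return adder == null ? 0 : adder.sum();
    }

    /** Total nanoseconds recorded for the section. */
    static long totalNanos(String section) {
        LongAdder adder = TOTAL_NANOS.get(section);
        return adder == null ? 0 : adder.sum();
    }
}
```

A call such as `SectionTimer.timed("sql", () -> runQuery())` (where `runQuery` is a placeholder) only reports on the sections someone thought to wrap. Time spent outside them stays invisible.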

slide-18
SLIDE 18

Production Profiling

What methods use up CPU time?
What lines of code allocate the most objects?
Where are your CPU cache misses coming from?
Automatic, can be cheap but often isn’t

slide-19
SLIDE 19

Where Instrumentation can be blind in the Real World

Problem: Every 5 seconds an HTTP endpoint would be really slow.
Instrumentation: on the servlet request, didn’t even show the pause!
Cause: Tomcat expired its resources cache every 5 seconds; on load, one resource scanned the entire classpath.

slide-20
SLIDE 20
slide-21
SLIDE 21

Surely a better way?

Not just Metrics - Actionable Insights
Diagnostics aren’t Diagnosis
What about Profiling?

slide-22
SLIDE 22

Why Performance Matters
Development isn’t Production
Profiling vs Monitoring
Production Profiling
Conclusion

slide-23
SLIDE 23

How to use Production Profilers

1) Extract relevant time period and apps/machines
2) Choose a type of profile: CPU Time/Wallclock Time/Memory
3) View results to tell you what the dominant consumer of a resource is
4) Fix biggest bottleneck
5) Deploy / Iterate

slide-24
SLIDE 24

CPU Time vs Wallclock Time
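The difference can be seen in a minimal sketch: a thread that sleeps accumulates wall-clock time but almost no CPU time. The class `CpuVsWallclock` is hypothetical. It relies on `ThreadMXBean.getCurrentThreadCpuTime()`, which not every JVM supports.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

/** Hypothetical demo: measures wall-clock and CPU time for the same task. */
class CpuVsWallclock {

    /** Returns {wallNanos, cpuNanos} for running the task on the current thread. */
    static long[] measure(Runnable task) {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        long wallStart = System.nanoTime();
        long cpuStart = threads.getCurrentThreadCpuTime();
        task.run();
        long cpu = threads.getCurrentThreadCpuTime() - cpuStart;
        long wall = System.nanoTime() - wallStart;
        return new long[] { wall, cpu };
    }

    /** Sleeps without consuming CPU; restores the interrupt flag if interrupted. */
    static void sleepQuietly(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        long[] t = measure(() -> sleepQuietly(200));
        System.out.println("wall ms: " + t[0] / 1_000_000 + ", cpu ms: " + t[1] / 1_000_000);
    }
}
```

A wall-clock profile would attribute the whole sleep to this thread, while a CPU-time profile would show next to nothing. Which view is useful depends on whether the problem is computation or waiting.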

slide-25
SLIDE 25

Profiling Hotspots
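A hotspot view can be thought of as a ranking of methods by how often they were on top of the stack when a sample was taken. The following sketch assumes samples are already available as lists of frame names, leaf frame first. The `Hotspots` class and this data layout are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical aggregation: ranks methods by self-sample count. */
class Hotspots {

    /**
     * Each sample is one stack, leaf frame first.
     * Returns (method, self-sample count) pairs, highest count first.
     */
    static List<Map.Entry<String, Integer>> selfCounts(List<List<String>> samples) {
        Map<String, Integer> counts = new HashMap<>();
        for (List<String> stack : samples) {
            if (!stack.isEmpty()) {
                // Only the leaf frame was actually executing when sampled.
                counts.merge(stack.get(0), 1, Integer::sum);
            }
        }
        List<Map.Entry<String, Integer>> sorted = new ArrayList<>(counts.entrySet());
        sorted.sort(Map.Entry.<String, Integer>comparingByValue().reversed());
        return sorted;
    }
}
```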

slide-26
SLIDE 26

Profiling Treeviews

slide-27
SLIDE 27

Profiling Flamegraphs
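Flamegraph tools commonly take "folded" stacks as input: one line per distinct stack, frames joined by `;` from root to leaf, followed by a sample count. A minimal sketch of producing that format, assuming samples arrive as root-first lists of frame names (the `FoldedStacks` class is hypothetical):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical converter: collapses stack samples into folded-stack lines. */
class FoldedStacks {

    /** Each sample is one stack, root frame first. Output lines are sorted by stack. */
    static List<String> fold(List<List<String>> samples) {
        Map<String, Integer> counts = new TreeMap<>();
        for (List<String> stack : samples) {
            counts.merge(String.join(";", stack), 1, Integer::sum);
        }
        List<String> lines = new ArrayList<>();
        counts.forEach((stack, count) -> lines.add(stack + " " + count));
        return lines;
    }
}
```

The width of each box in the rendered graph is proportional to these counts, so wide boxes mark where samples concentrate.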

slide-28
SLIDE 28

Instrumenting Profilers

Add instructions to collect timings (e.g. JVisualVM Profiler)
Inaccurate - modifies the behaviour of the program
High overhead - more than 2x slower

slide-29
SLIDE 29

Sampling/Statistical Profilers

WebServerThread.run()
Controller.doSomething()
Controller.next()
Repo.readPerson()
new Person()
View.printHtml()
???
???
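To make the idea concrete, here is a deliberately naive sampling profiler: it periodically snapshots every thread's stack and counts the top frame of each runnable thread. `NaiveSampler` is a hypothetical class. It uses `Thread.getAllStackTraces()`, which on HotSpot gathers stacks at a safepoint, so this sketch is itself subject to safepoint bias and is not how a low-overhead production profiler would work.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical naive sampler built on Thread.getAllStackTraces(). */
class NaiveSampler {

    /**
     * Takes up to `samples` snapshots, `intervalMillis` apart, and counts how
     * often each method was the top frame of a RUNNABLE thread.
     */
    static Map<String, Integer> sample(int samples, long intervalMillis) {
        Map<String, Integer> counts = new HashMap<>();
        for (int i = 0; i < samples; i++) {
            for (Map.Entry<Thread, StackTraceElement[]> e : Thread.getAllStackTraces().entrySet()) {
                Thread thread = e.getKey();
                StackTraceElement[] stack = e.getValue();
                // Skip the sampler itself, idle threads, and threads with no Java frames.
                if (thread == Thread.currentThread()
                        || thread.getState() != Thread.State.RUNNABLE
                        || stack.length == 0) {
                    continue;
                }
                StackTraceElement top = stack[0];
                counts.merge(top.getClassName() + "." + top.getMethodName(), 1, Integer::sum);
            }
            try {
                Thread.sleep(intervalMillis);
            } catch (InterruptedException ex) {
                Thread.currentThread().interrupt();
                break; // stop sampling early if interrupted
            }
        }
        return counts;
    }
}
```

Because only a periodic sample is taken, overhead scales with the sampling rate rather than with how much code runs. The price is that results are statistical and only as unbiased as the sampling mechanism.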

slide-30
SLIDE 30

Safepoint Bias after Inlining

WebServerThread.run()
Controller.doSomething()
Controller.next()
Repo.readPerson()
new Person()
View.printHtml()
???

slide-31
SLIDE 31

Time to Safepoint

-XX:+PrintSafepointStatistics

Diagram labels: Threads, Safepoint poll, VM Operation

slide-32
SLIDE 32

Advanced Statistical Profiling in Java

OS Signals to interrupt threads on resource consumption threshold
JVM’s signal handler-safe AsyncGetCallTrace to walk the stack

slide-33
SLIDE 33

People are put off by practical as much as technical issues

slide-34
SLIDE 34

Barriers to Ad-Hoc Production Profiling

Generally requires access to production
Process involves manual work - hard to automate
Low-overhead open source profilers unsupported

slide-35
SLIDE 35

What if we profiled all the time?

slide-36
SLIDE 36

Historical Data

Allows for post-hoc incident analysis
Enables correlation with other data/metrics
Performance regression analysis

slide-37
SLIDE 37

Putting Samples in Context

Application version
Environment parameters (machine type, CPU, location, etc.)
With ad-hoc profiling we can’t do this
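One way to picture samples with context is as records that carry the application version and environment alongside the stack frame, so that the same method can be compared across releases. The types `ContextualSample` and `VersionComparison` below are hypothetical illustrations and do not describe any particular product's data model.

```java
import java.util.List;

/** Hypothetical sample: a top stack frame plus the context it was captured in. */
final class ContextualSample {
    final String appVersion;
    final String machineType;
    final String topFrame;

    ContextualSample(String appVersion, String machineType, String topFrame) {
        this.appVersion = appVersion;
        this.machineType = machineType;
        this.topFrame = topFrame;
    }
}

/** Hypothetical query: how much of a version's samples fall in one method. */
class VersionComparison {

    /** Fraction of the given version's samples whose top frame is `method`. */
    static double share(List<ContextualSample> samples, String appVersion, String method) {
        int total = 0;
        int hits = 0;
        for (ContextualSample s : samples) {
            if (!s.appVersion.equals(appVersion)) {
                continue;
            }
            total++;
            if (s.topFrame.equals(method)) {
                hits++;
            }
        }
        return total == 0 ? 0.0 : (double) hits / total;
    }
}
```

If a method's share jumps between two versions, that is a candidate performance regression worth investigating.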

slide-38
SLIDE 38

Opsian - Continuous Profiling

Diagram labels: JVM Agents, Opsian Aggregation service, Web Reports

slide-39
SLIDE 39

Summary

We can profile in production with low overhead
To overcome practical issues we can profile production all the time
Profiling all the time opens up new capabilities

slide-40
SLIDE 40

Why Performance Matters
Development isn’t Production
Profiling vs Monitoring
Production Profiling
Conclusion

slide-41
SLIDE 41

Performance Matters
Development isn’t Production
Metrics can be unactionable
Instrumentation has high overhead
Continuous Profiling provides insight

slide-42
SLIDE 42

We need an attitude shift on profiling + monitoring

slide-43
SLIDE 43

Continuous
Proactive not Reactive
Systematic not Ad Hoc

slide-44
SLIDE 44

Please do Production Profiling. All the time.

slide-45
SLIDE 45

Any Questions?

https://www.opsian.com/

slide-46
SLIDE 46

The End