slide-1
SLIDE 1

djigger

An open-source performance analysis solution

slide-2
SLIDE 2
  • Performance Testing & Analysis @ several companies
  • Depending on the project: often no tools, or tools that can’t be used
  • Thread dumps are available: while true; do kill -3 PID; done
  • Analyzing thread dumps manually is a pain

Let’s build our own thread dump analyzer!

Context

2012

slide-3
SLIDE 3

Development

  • 2012: Thread Dump Analyzer (aggregate events)
  • 2013: Sampler (no more kill -3)
  • 2014: Collector (24/7 archiving)
  • 2015: Agent (instrument)
  • 2016: Full APM (distributed tracing)

public release

~ 10 companies use djigger in France and Switzerland

slide-4
SLIDE 4

About performance analysis

slide-5
SLIDE 5

Performance Analysis : gathering and interpreting necessary & sufficient data to understand and optimize a system or solve a performance problem.

My definition

slide-6
SLIDE 6

Performance Analysis: gathering and interpreting necessary & sufficient data to understand and optimize a system or solve a performance problem.

  • Necessary: without the necessary data, we can neither understand nor solve the problem
  • Sufficient: runtimes are complex and we can’t afford to harvest every detail

Necessary and sufficient conditions

slide-7
SLIDE 7

Many factors affect our ability to do this correctly, not just tooling

It’s not just about tools

slide-8
SLIDE 8

Many factors are at play...

  • Knowledge of the stack
  • Problem inputs
  • Permissions & environment
  • Monitoring maturity

slide-9
SLIDE 9

Many factors are at play...

  • Knowledge of the stack
  • Problem inputs
  • Permissions & environment: who owns the code? May I access the system? May I change things?
  • Monitoring maturity

slide-10
SLIDE 10

Many factors are at play...

  • Knowledge of the stack
  • Problem inputs
  • Permissions & environment
  • Monitoring maturity: do we have proper tooling? Are all environments monitored? Do I have the necessary data?

slide-11
SLIDE 11

Many factors are at play...

  • Knowledge of the stack: have I already seen this pattern? Are components closed/proprietary? Can I understand this runtime?
  • Problem inputs
  • Permissions & environment
  • Monitoring maturity

slide-12
SLIDE 12

Many factors are at play...

  • Knowledge of the stack
  • Problem inputs: what’s the occurrence pattern? What’s the desired behaviour? What are the actual symptoms?
  • Permissions & environment
  • Monitoring maturity

slide-13
SLIDE 13

About metrics

slide-14
SLIDE 14

There’s a ton of metrics out there

User CPU, Kernel CPU, Memory, Net I/O, Disk I/O, Pool usage, Queue size, Cache size, Cache hit ratio, Logs, AWR / v$, Heap dumps

?

slide-15
SLIDE 15

I don’t play the elimination game (anymore)

User CPU, Kernel CPU, Memory, Net I/O, Disk I/O, Pool usage, Queue size, Cache size, Cache hit ratio, Logs, AWR / v$, Heap dumps

slide-16
SLIDE 16

What are the main actors of a program’s execution?

Let’s look at what the program is doing

slide-17
SLIDE 17

What are the main actors of a program’s execution? Threads.

What’s the most important information about a thread?

Let’s look at what the program is doing

slide-18
SLIDE 18

What are the main actors of a program’s execution? Threads.

What’s the most important information about a thread? Its stack state (in particular, method calls).

...but what are Java stacks blind to?

Let’s look at what the program is doing

slide-19
SLIDE 19

What are the main actors of a program’s execution? Threads.

What’s the most important information about a thread? Its stack state (in particular, method calls).

...but what are Java stacks blind to? GC pauses.

Let’s look at what the program is doing

slide-20
SLIDE 20

I check thread stacks and GC overhead first.

Look at what the program is doing
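
To make the "GC overhead" half of that check concrete, here is a minimal sketch that reads cumulative collection time from the standard `java.lang.management` MXBeans of the current JVM. The class and method names (`GcOverhead`, `totalGcTimeMillis`, `overheadRatio`) are illustrative and not part of djigger. The reported collection time is approximate, and with concurrent collectors it is not necessarily pure pause time.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

/** Rough GC overhead of the current JVM, read from the platform MXBeans. */
final class GcOverhead {

    /** Accumulated collection time over all collectors since JVM start, in milliseconds. */
    static long totalGcTimeMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime(); // -1 when the collector does not report a time
            if (t > 0) {
                total += t;
            }
        }
        return total;
    }

    /** Collection time divided by elapsed time; 0.0 when no time has elapsed. */
    static double overheadRatio(long gcMillis, long uptimeMillis) {
        if (uptimeMillis <= 0) {
            return 0.0;
        }
        return (double) gcMillis / uptimeMillis;
    }

    public static void main(String[] args) {
        long uptime = ManagementFactory.getRuntimeMXBean().getUptime();
        long gc = totalGcTimeMillis();
        System.out.printf("GC time: %d ms over %d ms uptime (%.2f%%)%n",
                gc, uptime, 100.0 * overheadRatio(gc, uptime));
    }
}
```

Because the counters are cumulative, comparing two readings taken a few seconds apart gives the overhead for that interval rather than for the whole life of the JVM.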

slide-21
SLIDE 21

Analysis process

slide-22
SLIDE 22

A 3-step approach to analyzing latency issues: WHAT, WHERE, WHY

slide-23
SLIDE 23

A 3-step approach to analyzing latency issues

  • WHAT (ex.: a servlet call)

slide-24
SLIDE 24

A 3-step approach to analyzing latency issues

  • WHAT (ex.: a servlet call)
  • WHERE (ex.: time is spent in DB)

slide-25
SLIDE 25

A 3-step approach to analyzing latency issues

  • WHAT (ex.: a servlet call)
  • WHERE (ex.: time is spent in DB)
  • WHY (ex.: 1-n pattern and query can be cached)

slide-26
SLIDE 26

A 3-step approach to analyzing latency issues

  • WHAT: find out which events are problematic (transaction, method, click...). Ex.: a servlet call
  • WHERE: identify top consumers in the execution trees. Ex.: time is spent in DB
  • WHY: read stacks & object data to identify faulty or optimizable behaviour. Ex.: 1-n pattern and query can be cached

slide-27
SLIDE 27

sampling instrumentation

Collecting events

slide-28
SLIDE 28

Collecting events

  • sampling: thread-dump events, approximation of reality
  • instrumentation: concrete measurements and object capture

slide-29
SLIDE 29

Collecting events

  • sampling (without agent): thread-dump events, approximation of reality
  • instrumentation (with agent, for BCI): concrete measurements and object capture

slide-30
SLIDE 30

Stacktrace Sampling

slide-31
SLIDE 31

A dummy thread at runtime

[Figure: the stack of one thread over time. Frames shown: mypackage.MyClass.main(), MyClass.doStuff(), MyClass.doMoreStuff(), put(), get() (five times), acquireConnection(), Object.wait(), socketRead()]

slide-32
SLIDE 32

A dummy thread at runtime

[Figure: the stack of one thread over time. Horizontal axis: time. Vertical axis: stacked methods. Frames shown: mypackage.MyClass.main(), MyClass.doStuff(), MyClass.doMoreStuff(), put(), get() (five times), acquireConnection(), Object.wait(), socketRead()]

slide-33
SLIDE 33

A random thread dump

[Figure: the same thread timeline, with one thread dump taken at a random point in time]

at java.lang.Object.wait()
at mypackage.datasource.acquireConnection()
at mypackage.MyClass.doMoreStuff()
at mypackage.MyClass.main()

slide-34
SLIDE 34

Sampling = periodical thread dumps

[Figure: the same thread timeline, with thread dumps taken at regular intervals]
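
As a rough illustration of "sampling = periodical thread dumps", the sketch below takes snapshots from inside the JVM with `Thread.getAllStackTraces()` instead of `kill -3`. It is not djigger's sampler: the class `StackSampler`, its `Sample` type and the interval values are assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Minimal in-process stack sampler: each sample is one thread dump of the current JVM. */
final class StackSampler {

    /** One sampled event: which thread, when, and its stack (innermost frame first). */
    static final class Sample {
        final String threadName;
        final long timestampMillis;
        final StackTraceElement[] stack;

        Sample(String threadName, long timestampMillis, StackTraceElement[] stack) {
            this.threadName = threadName;
            this.timestampMillis = timestampMillis;
            this.stack = stack;
        }
    }

    /** Takes a single snapshot of all live threads. */
    static List<Sample> sampleOnce() {
        long now = System.currentTimeMillis();
        List<Sample> samples = new ArrayList<>();
        for (Map.Entry<Thread, StackTraceElement[]> e : Thread.getAllStackTraces().entrySet()) {
            if (e.getValue().length > 0) { // skip threads that expose no Java frames
                samples.add(new Sample(e.getKey().getName(), now, e.getValue()));
            }
        }
        return samples;
    }

    /** Repeats sampleOnce() count times, sleeping intervalMillis between snapshots. */
    static List<Sample> sample(int count, long intervalMillis) throws InterruptedException {
        List<Sample> all = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            all.addAll(sampleOnce());
            Thread.sleep(intervalMillis);
        }
        return all;
    }

    public static void main(String[] args) throws InterruptedException {
        List<Sample> events = sample(5, 100);
        System.out.println(events.size() + " events collected");
    }
}
```

A shorter interval gives a closer approximation of what the threads are doing, at the cost of more overhead on the sampled JVM.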

slide-35
SLIDE 35

Time-based events

[Figure: a row of sampled events (e) along the time axis]

slide-36
SLIDE 36

Time-based aggregation

[Figure: the tree aggregator merges the sampled events into a call tree; node labels: 43%, 57%, 14%, 28%, 43%, 43%]
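
The aggregation idea can be sketched as follows: every sampled stack is walked from the outermost frame inward, and each node of a call tree counts how many samples pass through it. This is an illustrative sketch, not djigger's aggregator. `TreeAggregator`, `Node` and the sample data in `main` are hypothetical, and the percentages it prints are not the ones on the slide.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Merges sampled stacks (outermost frame first) into a call tree with hit counts. */
final class TreeAggregator {

    static final class Node {
        final String method;
        int hits; // number of samples whose stack passes through this node
        final Map<String, Node> children = new LinkedHashMap<>();

        Node(String method) {
            this.method = method;
        }

        /** Returns the child for this method name, creating it on first use. */
        Node child(String name) {
            return children.computeIfAbsent(name, Node::new);
        }
    }

    /** Builds the tree; the synthetic root counts every sample. */
    static Node aggregate(List<List<String>> stacks) {
        Node root = new Node("<root>");
        for (List<String> stack : stacks) {
            root.hits++;
            Node current = root;
            for (String frame : stack) {
                current = current.child(frame);
                current.hits++;
            }
        }
        return root;
    }

    /** Appends each node below the given one with its share of all samples. */
    static void print(Node node, int total, String indent, StringBuilder out) {
        for (Node c : node.children.values()) {
            out.append(String.format("%s%s %d%%%n",
                    indent, c.method, Math.round(100.0 * c.hits / total)));
            print(c, total, indent + "  ", out);
        }
    }

    public static void main(String[] args) {
        // Hypothetical samples of the dummy thread, outermost frame first.
        List<List<String>> stacks = Arrays.asList(
                Arrays.asList("main", "doStuff", "get"),
                Arrays.asList("main", "doStuff", "get"),
                Arrays.asList("main", "doMoreStuff", "acquireConnection", "wait"),
                Arrays.asList("main", "doMoreStuff", "socketRead"));
        Node root = aggregate(stacks);
        StringBuilder out = new StringBuilder();
        print(root, root.hits, "", out);
        System.out.print(out);
    }
}
```

Stack traces returned by the JVM list the innermost frame first, so they would need to be reversed before being passed to `aggregate`.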

slide-37
SLIDE 37

Thread-based aggregation

[Figure: samples from Thread 1, Thread 2 and Thread 3 are merged by the tree aggregator into one call tree whose nodes carry percentages (X%, Y%, Z%, A% to D%)]

slide-38
SLIDE 38

What does it look like in djigger?


slide-39
SLIDE 39

3-step approach with sampling (without agent): WHAT, WHERE, WHY

  • search aggregated events
  • read stacks and stats
  • drill-down locally

slide-40
SLIDE 40

Example

slide-41
SLIDE 41

Example

slide-42
SLIDE 42

Example

slide-43
SLIDE 43

Instrumentation

slide-44
SLIDE 44

A dummy thread at runtime (again)

[Figure: the stack of one thread over time. Frames shown: mypackage.MyClass.main(), MyClass.doStuff(), MyClass.doMoreStuff(), put(), get() (five times), acquireConnection(), Object.wait(), socketRead()]

slide-45
SLIDE 45

Subscriptions

[Figure: the same thread timeline, with the subscribed methods marked by a start event and an end event]

Active subscriptions: Start event: End event:

slide-46
SLIDE 46

Subscription-based events

begin = 11:38:20.243, method = doMoreStuff, duration = 1223 ms
begin = 11:38:20.252, method = acquireConnection, duration = 613 ms
begin = 11:38:20.271, method = wait, duration = 599 ms
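
The sketch below imitates subscription-based events by hand: a call is timed only if its method name has been subscribed to, and one event with a begin timestamp and a duration is emitted when the call returns. A real agent would inject the equivalent logic through bytecode instrumentation. Here the wrapping is explicit, and `Subscriptions`, `Event` and `call` are hypothetical names for illustration only.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

/** Manual stand-in for instrumentation: emits one event per subscribed call. */
final class Subscriptions {

    static final class Event {
        final String method;
        final long beginMillis;   // wall-clock start of the call
        final long durationNanos; // elapsed time measured with System.nanoTime()

        Event(String method, long beginMillis, long durationNanos) {
            this.method = method;
            this.beginMillis = beginMillis;
            this.durationNanos = durationNanos;
        }
    }

    private final Set<String> subscribed = ConcurrentHashMap.newKeySet();
    private final List<Event> events = Collections.synchronizedList(new ArrayList<>());

    void subscribe(String method) {
        subscribed.add(method);
    }

    /** Runs body; if the method is subscribed, records an event when the call ends. */
    <T> T call(String method, Supplier<T> body) {
        if (!subscribed.contains(method)) {
            return body.get(); // unsubscribed methods run without any measurement
        }
        long begin = System.currentTimeMillis();
        long start = System.nanoTime();
        try {
            return body.get();
        } finally {
            events.add(new Event(method, begin, System.nanoTime() - start));
        }
    }

    /** Snapshot of the events recorded so far, in completion order. */
    List<Event> events() {
        return new ArrayList<>(events);
    }

    public static void main(String[] args) {
        Subscriptions s = new Subscriptions();
        s.subscribe("doMoreStuff");
        s.subscribe("acquireConnection");
        String result = s.call("doMoreStuff",
                () -> s.call("acquireConnection",
                        () -> s.call("wait", () -> "connection")));
        for (Event e : s.events()) {
            System.out.printf("begin = %d, method = %s, duration = %d ms%n",
                    e.beginMillis, e.method, e.durationNanos / 1_000_000);
        }
        System.out.println("result = " + result);
    }
}
```

Unlike sampling, every subscribed call is measured exactly, but only the subscribed methods are visible, so the choice of subscriptions determines what can be analyzed.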

slide-47
SLIDE 47

Transaction flags

tId = 1fa23

begin = 11:38:20.243, method = doMoreStuff, duration = 1223 ms
begin = 11:38:20.252, method = acquireConnection, duration = 613 ms
begin = 11:38:20.271, method = wait, duration = 599 ms
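
One common way to flag all events of a transaction with the same id is a thread-local value set at the entry point and read wherever an event is created. The sketch below shows that pattern under the assumption that a transaction stays on one thread. `TransactionContext` and its methods are hypothetical, and the slides do not say how djigger generates or propagates its `tId`.

```java
import java.util.UUID;
import java.util.function.Supplier;

/** Thread-local transaction id, set at the entry point and visible to nested calls. */
final class TransactionContext {

    private static final ThreadLocal<String> CURRENT = new ThreadLocal<>();

    /** Id of the transaction the calling thread is in, or null outside of one. */
    static String currentId() {
        return CURRENT.get();
    }

    /** Runs body as one transaction; nested calls join the enclosing transaction. */
    static <T> T inTransaction(Supplier<T> body) {
        if (CURRENT.get() != null) {
            return body.get(); // already inside a transaction: keep its id
        }
        // Short random id, chosen only for readability in this example.
        CURRENT.set(UUID.randomUUID().toString().substring(0, 5));
        try {
            return body.get();
        } finally {
            CURRENT.remove(); // do not leak the id to the next task on this thread
        }
    }

    public static void main(String[] args) {
        String seen = inTransaction(() -> "tId = " + currentId());
        System.out.println(seen);
        System.out.println("outside: " + currentId());
    }
}
```

An event recorder can call `currentId()` when it creates an event, so that all events of one request can later be grouped by id.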

slide-48
SLIDE 48

Object capture

executeQuery(“SELECT * FROM MYTABLE”)

tId = 1fa23

begin = 11:38:20.243, …, data = “SELECT * FROM MYTABLE”
begin = 11:38:20.252, method = acquireConnection, duration = 613 ms
begin = 11:38:20.271, method = wait, duration = 599 ms

slide-49
SLIDE 49

Distributed transactions

JVM 1 (tId = 1fa23):
begin = 11:38:20.243, ...
begin = 11:38:20.252, ...
begin = 11:38:20.271, ...

JVM 2 (tId = 87e01):
begin = 11:38:20.252, ...
begin = 11:38:20.301, ...

drill-down

slide-50
SLIDE 50

3-step approach with instrumentation (with agent): WHAT, WHERE, WHY

refine search, entry point events, drill-down across JVMs, capture object data
slide-51
SLIDE 51

What does it look like in djigger?

slide-52
SLIDE 52

What does it look like in djigger?

[Screenshot: stack from handleRequest() through repeated invoke() frames down to executeQuery()]

slide-53
SLIDE 53

What does it look like in djigger?

slide-54
SLIDE 54

Component overview

slide-55
SLIDE 55
slide-56
SLIDE 56

connectors: JMX, -javaagent, kill -3, jstack, process attach, ...

events

slide-57
SLIDE 57

PROFILER MODE

client: harvest & analyze

connectors: JMX, -javaagent, kill -3, jstack, process attach, ...

events

slide-58
SLIDE 58

APM MODE

  • collector: harvest events
  • store: persist events
  • client: analyze events

connectors: JMX, -javaagent, kill -3, jstack, process attach, ...

slide-59
SLIDE 59

Download and try out djigger!

slide-60
SLIDE 60

Download djigger at http://denkbar.io

slide-61
SLIDE 61

Thanks for your attention