Java One 2015 Deep Dive: Top Performance Mistakes And other Tips - PowerPoint PPT Presentation



slide-1
SLIDE 1

Java One 2015 – Deep Dive: Top Performance Mistakes

And other Tips & Tricks to make you a “Performance Expert”

More on http://blog.dynatrace.com

Andreas Grabner - @grabnerandi
slide-2
SLIDE 2

Safe Harbor 

slide-3
SLIDE 3

AND MANY MORE

slide-4
SLIDE 4

0.02ms

0.01ms

slide-5
SLIDE 5
slide-6
SLIDE 6
slide-7
SLIDE 7

15 Years: That’s why I ended up talking about performance

slide-8
SLIDE 8

Where do your Stories come from?

slide-9
SLIDE 9

#1: Real Life & Real User Stories

slide-10
SLIDE 10

#2: http://bit.ly/onlineperfclinic

slide-11
SLIDE 11

#3: http://bit.ly/sharepurepath

slide-12
SLIDE 12
slide-13
SLIDE 13

20% 80%

slide-14
SLIDE 14
slide-15
SLIDE 15

Frontend Performance

We are getting FATer!

slide-16
SLIDE 16
slide-17
SLIDE 17
slide-18
SLIDE 18

Example of a “Bad” Web Deployment

282! Objects on that page

9.68MB Page Size

8.8s Page Load Time

Most objects are images delivered from your main domain

Very long Connect time (1.8s) to your CDN

slide-19
SLIDE 19

Mobile landing page of Super Bowl ad

434 Resources in total on that page: 230 JPEGs, 75 PNGs, 50 GIFs, …

Total size of ~ 20MB

slide-20
SLIDE 20

Fifa.com during the World Cup

Source: http://apmblog.compuware.com/2014/05/21/is-the-fifa-world-cup-website-ready-for-the-tournament/

slide-21
SLIDE 21

8MB background image for STPCon (WordPress)

slide-22
SLIDE 22

Make F12 or Browser Agent your friend!

slide-23
SLIDE 23

Compare yourself Online!

slide-24
SLIDE 24

Key Metrics

  • # of Resources
  • Size of Resources
  • Total Size of Content
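The three frontend key metrics can be computed from whatever resource list your tooling exports (for example the F12 network tab). The following is a minimal sketch, assuming each resource is reduced to a content type and a transferred size; the `Resource` class and the sample values are hypothetical, not taken from the pages above.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Sketch: # of resources, resources per type, and total content size of one page.
// The Resource type and the sample page below are hypothetical.
public class PageWeight {

    /** One loaded resource: its content type and transferred size in bytes. */
    public static final class Resource {
        final String type;
        final long bytes;

        public Resource(String type, long bytes) {
            this.type = type;
            this.bytes = bytes;
        }
    }

    /** Number of resources per content type, sorted by type name. */
    public static Map<String, Integer> countByType(List<Resource> resources) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Resource r : resources) {
            counts.merge(r.type, 1, Integer::sum);
        }
        return counts;
    }

    /** Total transferred bytes across all resources on the page. */
    public static long totalBytes(List<Resource> resources) {
        long total = 0;
        for (Resource r : resources) {
            total += r.bytes;
        }
        return total;
    }

    public static void main(String[] args) {
        // Made-up sample page, not measured data
        List<Resource> page = List.of(
                new Resource("image/jpeg", 250_000),
                new Resource("image/jpeg", 180_000),
                new Resource("image/png", 40_000),
                new Resource("text/javascript", 90_000));
        System.out.println("# of resources: " + page.size());
        System.out.println("resources by type: " + countByType(page));
        System.out.println("total bytes: " + totalBytes(page));
    }
}
```

Tracking these numbers per build makes a jump like the 282-object or 434-resource pages above visible before it reaches production.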

slide-25
SLIDE 25
Tooling

  • Browser Built-In Developer Tools
  • Extensions such as YSlow, PageSpeed
  • Online Tools
    • WebPageTest
    • Google PageSpeed Insights
    • Dynatrace Performance Center
    • ...
  • Automate!! With Selenium, WebDriver, Cucumber, ...
slide-26
SLIDE 26

Frontend Availability

Back to Basics Please!

slide-27
SLIDE 27
slide-28
SLIDE 28
slide-29
SLIDE 29
slide-30
SLIDE 30

Online Services for you: Is it down right now?

slide-31
SLIDE 31

Online Services for you: Outage Analyzer

slide-32
SLIDE 32

Tip for handling Spike Load: GO LEAN!!

Response time improved 4x

1h before Super Bowl Kickoff

1h after Game ended

slide-33
SLIDE 33

Key Metrics

  • HTTP 3xx, 4xx, 5xx
  • # of Domains
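A minimal sketch of both availability metrics, assuming you already have the URL and HTTP status of every request a page made (for example from a synthetic check). The `Request` class and the example.com/.net/.org hosts are hypothetical placeholders.

```java
import java.net.URI;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Sketch: count responses per HTTP status class and the number of distinct domains.
// Request and the sample URLs are hypothetical.
public class AvailabilityMetrics {

    /** One request issued while loading a page. */
    public static final class Request {
        final String url;
        final int status;

        public Request(String url, int status) {
            this.url = url;
            this.status = status;
        }
    }

    /** Responses per status class, e.g. "2xx", "3xx", "4xx", "5xx". */
    public static Map<String, Integer> countByStatusClass(List<Request> requests) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Request r : requests) {
            counts.merge((r.status / 100) + "xx", 1, Integer::sum);
        }
        return counts;
    }

    /** Number of distinct hosts the page depends on (each one can fail independently). */
    public static int countDomains(List<Request> requests) {
        Set<String> hosts = new HashSet<>();
        for (Request r : requests) {
            String host = URI.create(r.url).getHost();
            if (host != null) {
                hosts.add(host.toLowerCase());
            }
        }
        return hosts.size();
    }

    public static void main(String[] args) {
        // Hypothetical requests captured by one synthetic check
        List<Request> page = List.of(
                new Request("https://www.example.com/", 200),
                new Request("https://www.example.com/old-logo.png", 301),
                new Request("https://cdn.example.net/app.js", 404),
                new Request("https://tracking.example.org/pixel", 503));
        System.out.println("status classes: " + countByStatusClass(page));
        System.out.println("# of domains: " + countDomains(page));
    }
}
```

Every extra domain is one more DNS lookup, connection, and potential outage, which is why the domain count sits next to the error codes.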

slide-34
SLIDE 34
Online Services

  • Dynatrace Synthetic
  • Ruxit Synthetic
  • NewRelic Synthetic
  • AppDynamics
  • Pingdom
  • ... Just Google for “Synthetic Monitoring”

slide-35
SLIDE 35

Backend Performance

The Usual Suspects

slide-36
SLIDE 36
Project: Online Room Reservation System

  • Symptoms
    • HTML takes between 60 and 120s to render
    • High GC Time
  • Developer Assumptions
    • Bad GC Tuning
    • Probably bad Database Performance as rendering was simple
  • Result: 2 Years of finger-pointing between Dev and DBA

slide-37
SLIDE 37

Developers built their own monitoring:

    void roomreservationReport(int officeId) {
        long startTime = System.currentTimeMillis();
        Object data = loadDataForOffice(officeId);
        // Wall-clock time of the whole data-loading step, not of individual queries
        long dataLoadTime = System.currentTimeMillis() - startTime;
        generateReport(data, officeId);
    }

Result:

  • Avg. Data Load Time: 45s!

DB Tool says:

  • Avg. SQL Query: <1ms!
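Both measurements above can be true at the same time: a per-query average below 1ms adds up to tens of seconds once the call count reaches the tens of thousands. The missing metric is the number of calls. A minimal sketch, assuming every database call can be routed through a wrapper; `QueryStats` is a hypothetical helper, not part of any framework, and it is not thread-safe (one instance per request thread).

```java
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

// Sketch: count and time each individual database call instead of only the whole load step.
// QueryStats is hypothetical; use one instance per request thread (no synchronization).
public class QueryStats {
    private long calls;
    private long totalNanos;

    /** Runs one database call and records that it happened and how long it took. */
    public <T> T record(Supplier<T> dbCall) {
        long start = System.nanoTime();
        try {
            return dbCall.get();
        } finally {
            calls++;
            totalNanos += System.nanoTime() - start;
        }
    }

    public long calls() {
        return calls;
    }

    public double avgMillis() {
        return calls == 0 ? 0.0 : totalNanos / 1_000_000.0 / calls;
    }

    public long totalMillis() {
        return TimeUnit.NANOSECONDS.toMillis(totalNanos);
    }

    public static void main(String[] args) {
        QueryStats stats = new QueryStats();
        // Simulated access pattern: one query for the ids, then one query per id
        int[] ids = stats.record(() -> new int[] {1, 2, 3, 4, 5});
        for (int id : ids) {
            stats.record(() -> "room-" + id);
        }
        System.out.printf("calls=%d avg=%.3fms total=%dms%n",
                stats.calls(), stats.avgMillis(), stats.totalMillis());
    }
}
```

Reporting `calls()` next to the average is what would have ended the Dev vs. DBA argument: the average alone looks healthy.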
slide-38
SLIDE 38

#1: Loading too much data

24889! Calls to the Database API!

High CPU and High Memory Usage to keep all data in Memory

slide-39
SLIDE 39

#2: On individual connections

12444! individual connections

Classical N+1 Query Problem

Individual SQL really <1ms
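To make the N+1 pattern concrete, here is a minimal sketch against an in-memory stand-in for the database that only counts round trips. `FakeRoomDb` and its methods are hypothetical; the comments describe the kind of SQL each method stands for, not the project's actual queries.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch: 1+N queries versus one batched lookup, counted against a hypothetical in-memory "database".
public class NPlusOneDemo {

    public static final class FakeRoomDb {
        private final Map<Integer, String> rooms = new HashMap<>();
        public int queries;

        public FakeRoomDb(int roomCount) {
            for (int id = 1; id <= roomCount; id++) {
                rooms.put(id, "Room " + id);
            }
        }

        /** Stands for one query returning all room ids of an office. */
        public List<Integer> roomIds() {
            queries++;
            return new ArrayList<>(rooms.keySet());
        }

        /** Stands for one query per room: ... WHERE id = ? */
        public String roomName(int id) {
            queries++;
            return rooms.get(id);
        }

        /** Stands for one batched query: ... WHERE id IN (...) */
        public Map<Integer, String> roomNames(List<Integer> ids) {
            queries++;
            Map<Integer, String> result = new HashMap<>();
            for (int id : ids) {
                result.put(id, rooms.get(id));
            }
            return result;
        }
    }

    /** 1 query for the ids + N queries for the details. */
    public static int loadOneByOne(FakeRoomDb db) {
        List<String> names = new ArrayList<>();
        for (int id : db.roomIds()) {
            names.add(db.roomName(id));
        }
        return names.size();
    }

    /** 1 query for the ids + 1 batched query for all details. */
    public static int loadBatched(FakeRoomDb db) {
        return db.roomNames(db.roomIds()).size();
    }

    public static void main(String[] args) {
        FakeRoomDb oneByOne = new FakeRoomDb(1000);
        loadOneByOne(oneByOne);
        FakeRoomDb batched = new FakeRoomDb(1000);
        loadBatched(batched);
        System.out.println("one-by-one queries: " + oneByOne.queries); // 1001
        System.out.println("batched queries: " + batched.queries);     // 2
    }
}
```

Each of the 1001 queries may well be "<1ms" on the database side; round trips and connection handling are what add up. Very large `IN` lists usually need to be split into chunks, and a join can replace the second query entirely.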

slide-40
SLIDE 40

#3: Putting all data in temp Hashtable

Lots of time spent in Hashtable.get

Called from their Entity Objects

slide-41
SLIDE 41
Lessons Learned – Don’t Assume …

  • … you know what the code you inherited is doing!!
  • … you are not making mistakes like this
  • Explore the Right Tools
    • Built-In Database Analysis Tools
    • “Logging” options of Frameworks such as Hibernate, …
    • JMX, Perf Counters, … of your Application Servers
    • Performance Tracing Tools: Dynatrace, Ruxit, NewRelic, AppDynamics, Your Profiler of Choice …

slide-42
SLIDE 42

Key Metrics

  • # of SQL Calls
  • # of same SQL Execs (1+N)
  • # of Connections
  • Rows/Data Transferred
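The first two metrics can be derived from a plain list of executed statements, for example as captured by a framework's SQL logging. A minimal sketch, assuming statements use bind parameters (`?`) so that repeated executions share the same text; with literal values inlined you would first have to normalize them. The table names and the threshold are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch: # of SQL calls and # of executions of the same SQL text, to spot 1+N candidates.
// Assumes parameterized statements; table names and threshold are hypothetical.
public class SqlMetrics {

    /** Executions per distinct SQL text, in first-seen order. */
    public static Map<String, Integer> executionsPerStatement(List<String> executedSql) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String sql : executedSql) {
            counts.merge(sql.trim(), 1, Integer::sum);
        }
        return counts;
    }

    /** Statements executed at least `threshold` times within one use case. */
    public static Map<String, Integer> repeatedStatements(List<String> executedSql, int threshold) {
        Map<String, Integer> repeated = new LinkedHashMap<>();
        executionsPerStatement(executedSql).forEach((sql, n) -> {
            if (n >= threshold) {
                repeated.put(sql, n);
            }
        });
        return repeated;
    }

    public static void main(String[] args) {
        List<String> executed = new ArrayList<>();
        executed.add("SELECT id FROM room WHERE office_id = ?");
        for (int i = 0; i < 200; i++) {
            executed.add("SELECT * FROM reservation WHERE room_id = ?");
        }
        System.out.println("# of SQL calls: " + executed.size());
        System.out.println("distinct statements: " + executionsPerStatement(executed).size());
        System.out.println("1+N suspects: " + repeatedStatements(executed, 50));
    }
}
```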

slide-43
SLIDE 43

Backend Performance

Architectural Mistakes with “Migrating” to (Micro)Services

slide-44
SLIDE 44

26.7s Execution Time

33! Calls to the same Web Service

171! SQL Queries through LINQ by this Web Service – request similar data for each call

Architecture Violation: Direct access to the DB from frontend logic
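The slide does not say how this was fixed. One possible mitigation for "same service, similar data, 33 times" is a request-scoped cache in front of the remote call, sketched below under the assumption that the data does not change during a single request. `RequestScopedCache` and the pricing stand-in are hypothetical; often the better fix is a coarser service API that returns everything in one call.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Sketch: remember remote-call results for the lifetime of one request.
// Hypothetical helper; not thread-safe, create one instance per request.
public class RequestScopedCache<K, V> {
    private final Map<K, V> cache = new HashMap<>();
    private final Function<K, V> remoteCall;
    private int remoteCalls;

    public RequestScopedCache(Function<K, V> remoteCall) {
        this.remoteCall = remoteCall;
    }

    /** Returns the cached value, or performs the remote call once for this key. */
    public V get(K key) {
        return cache.computeIfAbsent(key, k -> {
            remoteCalls++;
            return remoteCall.apply(k);
        });
    }

    public int remoteCalls() {
        return remoteCalls;
    }

    public static void main(String[] args) {
        // Stand-in for a web service call; a real one would go over the network
        RequestScopedCache<String, String> prices =
                new RequestScopedCache<>(sku -> "price-of-" + sku);
        for (int i = 0; i < 33; i++) {
            prices.get("SKU-" + (i % 3)); // 33 lookups, only 3 distinct keys
        }
        System.out.println("remote calls: " + prices.remoteCalls()); // 3
    }
}
```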

slide-45
SLIDE 45

21671! Calls to Oracle

3136! Calls to H2, mostly executed on async background threads

33! Different connections used

DB Exceptions on both Databases

40! internal Web Service Calls that do all these DB Updates

slide-46
SLIDE 46

Key Metrics

  • # of Service Calls
  • Payload of Service Calls
  • # of Involved Threads
  • 1+N Service Call Pattern!
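A minimal sketch of these four metrics for one request, assuming a tracing tool or hand-written instrumentation already gives you, per outgoing call, the service name, payload size, and calling thread. The `Call` class, service names, and the threshold of 5 calls are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Sketch: service-call key metrics for a single request; all names and thresholds are hypothetical.
public class ServiceCallMetrics {

    /** One outgoing service call seen while handling a single request. */
    public static final class Call {
        final String service;
        final long payloadBytes;
        final String thread;

        public Call(String service, long payloadBytes, String thread) {
            this.service = service;
            this.payloadBytes = payloadBytes;
            this.thread = thread;
        }
    }

    /** # of service calls, grouped by service name. */
    public static Map<String, Integer> callsPerService(List<Call> calls) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Call c : calls) {
            counts.merge(c.service, 1, Integer::sum);
        }
        return counts;
    }

    /** Total payload transferred by all service calls of the request. */
    public static long totalPayloadBytes(List<Call> calls) {
        long total = 0;
        for (Call c : calls) {
            total += c.payloadBytes;
        }
        return total;
    }

    /** Number of distinct threads that issued service calls. */
    public static int involvedThreads(List<Call> calls) {
        Set<String> threads = new HashSet<>();
        for (Call c : calls) {
            threads.add(c.thread);
        }
        return threads.size();
    }

    /** Services called more than maxCallsPerService times: 1+N suspects. */
    public static List<String> onePlusNSuspects(List<Call> calls, int maxCallsPerService) {
        List<String> suspects = new ArrayList<>();
        callsPerService(calls).forEach((service, n) -> {
            if (n > maxCallsPerService) {
                suspects.add(service);
            }
        });
        return suspects;
    }

    public static void main(String[] args) {
        List<Call> request = new ArrayList<>();
        request.add(new Call("UserService", 2_000, "http-1"));
        for (int i = 0; i < 33; i++) {
            // Same service called once per item, partly from a background thread
            request.add(new Call("CatalogService", 15_000, i % 2 == 0 ? "http-1" : "async-1"));
        }
        System.out.println("calls per service: " + callsPerService(request));
        System.out.println("payload bytes: " + totalPayloadBytes(request));
        System.out.println("involved threads: " + involvedThreads(request));
        System.out.println("1+N suspects: " + onePlusNSuspects(request, 5));
    }
}
```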

slide-47
SLIDE 47
Tooling

  • Dynatrace
  • Ruxit
  • NewRelic
  • AppDynamics
  • Any Profiler that can trace across tiers
  • Google for Tracing or APM (Application Performance Management)
slide-48
SLIDE 48

Logging

WE CAN LOG THIS!!

LOG

slide-49
SLIDE 49

Log Hotspots in Frameworks!

callAppenders: clear CPU and I/O Hotspot

Excessive logging through Spring Framework

slide-50
SLIDE 50

Debug Log and outdated log4j library

#1: Top Problem: log4j.callAppenders -> 71% Sync Time

#2: Most of logging done from fillDetail method

#3: Doing “DEBUG” log output: Is this necessary?
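Even when DEBUG output is switched off, eagerly built log messages still cost CPU. A minimal sketch of lazy message construction, using `java.util.logging` instead of log4j so it runs without dependencies; `fillDetailEager`, `fillDetailLazy`, and `buildDetailDump` are hypothetical stand-ins for the `fillDetail` code on the slide, and `FINE` plays the role of DEBUG. Lazy construction saves message building only; it does not fix the synchronization inside an outdated appender.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Sketch: eager vs. lazy (Supplier-based) debug messages with java.util.logging.
// Method names are hypothetical; FINE stands in for log4j's DEBUG level.
public class LazyDebugLogging {
    private static final Logger LOG = Logger.getLogger(LazyDebugLogging.class.getName());
    public static int expensiveBuilds;

    /** Simulates an expensive message such as a dump of a whole entity. */
    static String buildDetailDump(int item) {
        expensiveBuilds++;
        return "detail of item " + item;
    }

    /** Eager: the message string is built even when FINE is disabled. */
    public static void fillDetailEager(int item) {
        LOG.fine("filled " + buildDetailDump(item));
    }

    /** Lazy: the Supplier only runs if FINE is actually loggable. */
    public static void fillDetailLazy(int item) {
        LOG.fine(() -> "filled " + buildDetailDump(item));
    }

    public static void main(String[] args) {
        LOG.setLevel(Level.INFO); // production-like level: FINE (debug) is off
        for (int i = 0; i < 1000; i++) {
            fillDetailEager(i);
        }
        System.out.println("eager message builds: " + expensiveBuilds); // 1000
        expensiveBuilds = 0;
        for (int i = 0; i < 1000; i++) {
            fillDetailLazy(i);
        }
        System.out.println("lazy message builds: " + expensiveBuilds);  // 0
    }
}
```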
slide-51
SLIDE 51

Key Metrics

  • # of Log Entries
  • Size of Logs per Use Case
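One way to get both numbers per use case is a handler that counts instead of writing. A minimal sketch with `java.util.logging`, assuming each use case logs through its own logger name; `usecase.checkout` is a hypothetical name, and the byte count is the size of the record as formatted by `SimpleFormatter`, not of any real log file.

```java
import java.nio.charset.StandardCharsets;
import java.util.logging.Handler;
import java.util.logging.LogRecord;
import java.util.logging.Logger;
import java.util.logging.SimpleFormatter;

// Sketch: a Handler that counts log entries and their formatted size for one use case.
public class LogVolumeCounter extends Handler {
    private long entries;
    private long bytes;

    public LogVolumeCounter() {
        setFormatter(new SimpleFormatter());
    }

    @Override
    public synchronized void publish(LogRecord record) {
        if (!isLoggable(record)) {
            return;
        }
        entries++;
        bytes += getFormatter().format(record).getBytes(StandardCharsets.UTF_8).length;
    }

    @Override
    public void flush() {
        // nothing buffered
    }

    @Override
    public void close() {
        // nothing to release
    }

    public synchronized long entries() {
        return entries;
    }

    public synchronized long bytes() {
        return bytes;
    }

    public static void main(String[] args) {
        Logger logger = Logger.getLogger("usecase.checkout"); // hypothetical use-case logger
        logger.setUseParentHandlers(false);                   // keep the demo off the console
        LogVolumeCounter counter = new LogVolumeCounter();
        logger.addHandler(counter);
        for (int i = 0; i < 10; i++) {
            logger.info("processed item " + i);
        }
        System.out.println(counter.entries() + " entries, " + counter.bytes() + " bytes");
        logger.removeHandler(counter);
    }
}
```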

slide-52
SLIDE 52

Response Time is not the only Performance Indicator

Look at Resources as well

slide-53
SLIDE 53

Is this a successful new Build?

slide-54
SLIDE 54

Look at Resource Usage: CPU, Memory, …

slide-55
SLIDE 55

Memory? Look at Heap Generations

slide-56
SLIDE 56

Root Cause: Dependency Injection

slide-57
SLIDE 57

Prevent: Monitor Memory Metrics for every Build

slide-58
SLIDE 58

#1: Eden Space stays constant. Objects being propagated to Survivor Space

#2: GC Activity in Young Generation ultimately moves objects into Old Generation

#3: Growing “Old Gen” is a good indicator for a Mem Leak

#4: Heavy GC kicks in when Old Generation is full!

#5: Throughput of Application goes to 0 due to no memory available
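Point #3 can be turned into an automated check. A minimal sketch, assuming you sample Old Generation usage right after each major GC (where it reflects live data rather than garbage); the "strictly increasing and at least 1.5x growth" rule and the sample values are hypothetical thresholds, not a standard.

```java
// Sketch of the "growing Old Gen" heuristic on post-GC Old Generation samples.
// The growth rule and sample values are hypothetical.
public class OldGenTrend {

    /** True if every post-GC sample is higher than the previous one and overall growth >= minGrowthRatio. */
    public static boolean suspectLeak(long[] oldGenAfterGcBytes, double minGrowthRatio) {
        if (oldGenAfterGcBytes.length < 3) {
            return false; // too few samples to call it a trend
        }
        for (int i = 1; i < oldGenAfterGcBytes.length; i++) {
            if (oldGenAfterGcBytes[i] <= oldGenAfterGcBytes[i - 1]) {
                return false;
            }
        }
        long first = oldGenAfterGcBytes[0];
        long last = oldGenAfterGcBytes[oldGenAfterGcBytes.length - 1];
        return first > 0 && (double) last / first >= minGrowthRatio;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long[] healthy = {300 * mb, 310 * mb, 295 * mb, 305 * mb};
        long[] leaking = {300 * mb, 420 * mb, 560 * mb, 700 * mb};
        System.out.println("healthy: " + suspectLeak(healthy, 1.5)); // false
        System.out.println("leaking: " + suspectLeak(leaking, 1.5)); // true
    }
}
```

Real measurements are noisy, so in practice you would tolerate small dips or fit a trend line instead of requiring strict growth.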

slide-59
SLIDE 59

Key Metrics

  • # of Objects per Generation
  • # of GC Runs
  • Total Impact of GC
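The JVM exposes most of these numbers through the standard `java.lang.management` beans, which is an easy way to record them for every build. A minimal sketch; note that the memory pool beans report used bytes per pool, not object counts, and pool and collector names depend on the garbage collector in use. With concurrent collectors the reported collection time may not equal pause time, so treat the "% of uptime" figure as an approximation.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;

// Sketch: heap usage per pool (generation), # of GC runs, and accumulated GC time via JMX.
public class GcMetrics {

    /** # of GC runs summed over all collectors (young and old). */
    public static long totalGcRuns() {
        long runs = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = gc.getCollectionCount(); // -1 if undefined for this collector
            if (count > 0) {
                runs += count;
            }
        }
        return runs;
    }

    /** Accumulated GC time in milliseconds, summed over all collectors. */
    public static long totalGcMillis() {
        long millis = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long time = gc.getCollectionTime(); // -1 if undefined for this collector
            if (time > 0) {
                millis += time;
            }
        }
        return millis;
    }

    public static void main(String[] args) {
        // Used bytes per heap pool, e.g. "G1 Eden Space", "G1 Survivor Space", "G1 Old Gen"
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            MemoryUsage usage = pool.getUsage(); // null if the pool is no longer valid
            if (pool.getType() == MemoryType.HEAP && usage != null) {
                System.out.printf("%-30s used=%d KB%n", pool.getName(), usage.getUsed() / 1024);
            }
        }
        long uptimeMillis = ManagementFactory.getRuntimeMXBean().getUptime();
        long gcMillis = totalGcMillis();
        double gcShare = uptimeMillis > 0 ? 100.0 * gcMillis / uptimeMillis : 0.0;
        System.out.printf("GC runs=%d, GC time=%d ms (%.2f%% of uptime)%n",
                totalGcRuns(), gcMillis, gcShare);
    }
}
```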

slide-60
SLIDE 60

Tips & Tricks

And more Metrics of course 

slide-61
SLIDE 61

Tip: Layer Breakdown over Time

With increasing load: Which LAYER doesn’t SCALE?

slide-62
SLIDE 62

Tip: Exceptions and Log Messages

How are # of EXCEPTIONS evolving over time?

How many SEVERE LOG messages do we write in relation to Exceptions?

slide-63
SLIDE 63

Tip: Failed Transactions

Are more TRANSACTIONS FAILING (HTTP 5xx, 4xx, …) under heavier load?

slide-64
SLIDE 64

Tip: Database Activity

Do we see an increase in the AVG # of SQL Executions over Time?

Does the TOTAL # of SQL Executions increase with load? Shouldn’t it flatten due to CACHES?

slide-65
SLIDE 65

Tip: Database History Dashboard

How many SQL Statements are PREPARED?

What’s the overall Execution Time of different SQL Types (SELECT, INSERT, DELETE, …)?

slide-66
SLIDE 66

Tip: DB Connection Pool Utilization

Do we have enough DB CONNECTIONS per pool?

slide-67
SLIDE 67

For more Key Metrics

http://blog.dynatrace.com
http://blog.ruxit.com

slide-68
SLIDE 68

We want to get from here …

slide-69
SLIDE 69

To here!
slide-70
SLIDE 70

Use these application metrics as additional Quality Gates

slide-71
SLIDE 71

Quality Metrics in your CI

What you currently measure:

  • # Test Failures
  • Overall Duration

What you should measure:

  • Execution Time per test
  • # calls to API
  • # executed SQL statements
  • # Web Service Calls
  • # JMS Messages
  • # Objects Allocated
  • # Exceptions
  • # Log Messages
  • # HTTP 4xx/5xx
  • Request/Response Size
  • Page Load/Rendering Time
  • …
slide-72
SLIDE 72

Connecting your Tests with Quality

Test Framework Results (Build #, Test Case, Status) next to Architectural Data (# SQL, # Excep, CPU):

Build #  | Test Case    | Status | # SQL | # Excep | CPU
Build 17 | testPurchase | OK     | 12    |         | 120ms
Build 17 | testSearch   | OK     | 3     | 1       | 68ms
Build 18 | testPurchase | FAILED | 12    | 5       | 60ms
Build 18 | testSearch   | OK     | 3     | 1       | 68ms
Build 19 | testPurchase | OK     | 75    |         | 230ms
Build 19 | testSearch   | OK     | 3     | 1       | 68ms
Build 20 | testPurchase | OK     | 12    |         | 120ms
Build 20 | testSearch   | OK     | 3     | 1       | 68ms

Build 18: We identified a regression. Exceptions are probably the reason for the failed test.
Build 19: Problem fixed, but now we have an architectural regression.
Build 20: Problem solved. Now we have the functional and architectural confidence.

Let’s look behind the scenes
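The Build 19 case (all tests green, but testPurchase jumps from 12 to 75 SQL statements and from 120ms to 230ms CPU compared with Build 17) is exactly what a metrics-based quality gate should catch. A minimal sketch using those slide numbers; the `TestMetrics` type and the 1.5x tolerance are hypothetical choices, and a real setup would pull the per-test metrics from your tracing tool into the CI server.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: fail (or mark unstable) a build when per-test architectural metrics regress
// against a baseline build. TestMetrics and the tolerance are hypothetical.
public class ArchitecturalGate {

    public static final class TestMetrics {
        final String test;
        final int sqlCount;
        final long cpuMillis;

        public TestMetrics(String test, int sqlCount, long cpuMillis) {
            this.test = test;
            this.sqlCount = sqlCount;
            this.cpuMillis = cpuMillis;
        }
    }

    /** One message per metric that exceeds baseline * tolerance for the same test. */
    public static List<String> regressions(List<TestMetrics> baseline, List<TestMetrics> current,
                                           double tolerance) {
        List<String> found = new ArrayList<>();
        for (TestMetrics cur : current) {
            for (TestMetrics base : baseline) {
                if (!base.test.equals(cur.test)) {
                    continue;
                }
                if (cur.sqlCount > base.sqlCount * tolerance) {
                    found.add(cur.test + ": # SQL " + base.sqlCount + " -> " + cur.sqlCount);
                }
                if (cur.cpuMillis > base.cpuMillis * tolerance) {
                    found.add(cur.test + ": CPU " + base.cpuMillis + "ms -> " + cur.cpuMillis + "ms");
                }
            }
        }
        return found;
    }

    public static void main(String[] args) {
        // Numbers from the slide: Build 17 (last good) vs. Build 19
        List<TestMetrics> build17 = List.of(
                new TestMetrics("testPurchase", 12, 120),
                new TestMetrics("testSearch", 3, 68));
        List<TestMetrics> build19 = List.of(
                new TestMetrics("testPurchase", 75, 230),
                new TestMetrics("testSearch", 3, 68));
        List<String> problems = regressions(build17, build19, 1.5);
        problems.forEach(System.out::println);
        if (!problems.isEmpty()) {
            System.out.println("Build 19: architectural regression, mark build unstable");
        }
    }
}
```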

slide-73
SLIDE 73

#1: Analyzing each Test

#2: Metrics for each Test

#3: Detecting Regression based on Measure

slide-74
SLIDE 74

Quality-Metrics based Build Status

slide-75
SLIDE 75

Pull data into Jenkins, Bamboo ...

slide-76
SLIDE 76

Making Quality a first-class citizen

“Too hard”

“We’ll get round to this later”

“Not cool enough”

slide-77
SLIDE 77

Questions and/or Demo

Slides: slideshare.net/grabnerandi
Get Tools: bit.ly/dttrial
YouTube Tutorials: bit.ly/dttutorials
Contact Me: agrabner@dynatrace.com
Follow Me: @grabnerandi
Read More: blog.dynatrace.com

slide-78
SLIDE 78

Andreas Grabner

Dynatrace Developer Advocate @grabnerandi http://blog.dynatrace.com