Continuous Innovation through DevOps Pipelines (Andreas Grabner)


slide-1
SLIDE 1

Continuous Innovation through DevOps Pipelines

Andreas Grabner: @grabnerandi, andreas.grabner@dynatrace.com Slides: http://www.slideshare.net/grabnerandi Podcast: https://www.spreaker.com/show/pureperformance

slide-2
SLIDE 2

@grabnerandi

The Story started in 2009

slide-3
SLIDE 3

@grabnerandi

slide-4
SLIDE 4

@grabnerandi

“The stuff we did when we were a Start Up and we All were Devs, Testers and Ops”

Quote from Andreas Grabner back in 2013 @ DevOps Boston

slide-5
SLIDE 5

@grabnerandi

slide-6
SLIDE 6

Goal: Optimize Lead Time

[Diagram: timeline from Feature to Users; minimize the Feature Lead Time]

slide-7
SLIDE 7

24 “Features in a Box”

  • Ship the whole box!
  • Very late feedback

slide-8
SLIDE 8

  • „1 Feature at a Time“
  • „Optimize before Deploy“
  • „Immediate Customer Feedback“

Continuous Innovation and Optimization

slide-9
SLIDE 9

DevOps Adoption

slide-10
SLIDE 10

  • 700 deployments / YEAR
  • 10+ deployments / DAY
  • 50 – 60 deployments / DAY
  • Every 11.6 SECONDS

Innovators (aka Unicorns): Deliver value at the speed of business
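The deployment rates on this slide are quoted in different units, so a quick normalization to deployments per day makes them comparable. This is only a back-of-the-envelope sketch: it assumes a 365-day year, deployments spread evenly around the clock, and the midpoint of the "50 – 60" range; the function names are made up for illustration.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def per_day_from_yearly(deployments_per_year: float, days_per_year: int = 365) -> float:
    """Convert a yearly deployment count into an average daily rate."""
    return deployments_per_year / days_per_year


def per_day_from_interval(seconds_between_deployments: float) -> float:
    """Convert 'one deployment every N seconds' into a daily rate."""
    return SECONDS_PER_DAY / seconds_between_deployments


rates = {
    "700 / year": per_day_from_yearly(700),
    "10+ / day": 10.0,
    "50-60 / day": 55.0,  # assumption: midpoint of the quoted range
    "every 11.6 s": per_day_from_interval(11.6),
}

for label, rate in rates.items():
    print(f"{label:>14}: {rate:8.1f} deployments/day")
```

Under these assumptions the slowest and fastest figures on the slide differ by more than three orders of magnitude.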

slide-11
SLIDE 11
slide-12
SLIDE 12

@grabnerandi

slide-13
SLIDE 13

“We Deliver High Quality Software, Faster and Automated using New Stack“

„Shift-Left Performance to Reduce Lead Time“

Adam Auerbach, Sr. Dir DevOps

https://github.com/capitalone/Hygieia & https://www.spreaker.com/user/pureperformance

“… deploy some of our most critical production workloads on the AWS platform …”, Rob Alexander, CIO

slide-14
SLIDE 14

2011: 2 major releases/year; customers deploy & operate on-prem

2016: 26 major releases/year; 170 prod deployments/day; self-service online sales; SaaS & Managed

slide-15
SLIDE 15


mobile, browser, network, multi-geo, 3rd parties, cloud, containers, services, code, hosts, synthetic, logs, business transaction, applications, sdn, relax

full-stack, broad, hyper-scale

slide-16
SLIDE 16

@grabnerandi

https://dynatrace.github.io/ufo/

“In Your Face” Data!

slide-17
SLIDE 17

@grabnerandi

Availability dropped to 0%

#1: Availability -> Brand Impact

slide-18
SLIDE 18

@grabnerandi

  • New Deployment + Mkt Push
  • Increase # of unhappy users!
  • Decline in Conversion Rate
  • Overall increase of Users!

#2: User Experience -> Conversion

Spikes in FRUSTRATED Users!

slide-19
SLIDE 19

@grabnerandi

#3: Resource Consumption -> Cost per Feature

4x $$$ to IaaS

slide-20
SLIDE 20

@grabnerandi

#4: Performance -> Behavior

slide-21
SLIDE 21

@grabnerandi

Not every Sprint ends without bruises!

slide-22
SLIDE 22

@grabnerandi

slide-23
SLIDE 23

Understanding Code Complexity

  • 4 Million Lines of Monolith Code
  • Partially coded and commented in Russian

From Monolith to Microservice

  • Initial devs no longer with company
  • What to extract without breaking it?

Shift Left Quality & Performance

  • No automated testing in the pipeline
  • Bad builds just made it into production

Cross Application Impacts

  • Shared Infrastructure between Apps
  • No consolidated monitoring strategy
slide-24
SLIDE 24

@grabnerandi

Scaling an Online Sports Club Search Service

[Chart: Users and Response Time over time: 20xx, 2014, 2015, 2016+]

  1) 2-Man Project
  2) Limited Success
  3) Start Expansion
  4) Performance Slows Growth
  5) Potential Decline?

slide-25
SLIDE 25

@grabnerandi

Early 2015: Monolith Under Pressure

Can't scale vertically endlessly!

April: 0.52s

May: 2.68s, 94.09% CPU Bound

slide-26
SLIDE 26

@grabnerandi

From Monolith to Services in a Hybrid-Cloud

Front End in Geo-Distributed Cloud

Scale Backend in Containers On Premise

slide-27
SLIDE 27

@grabnerandi

Go live – 7:00 a.m.

slide-28
SLIDE 28

@grabnerandi

Go live – 12:00 p.m.

slide-29
SLIDE 29

What Went Wrong?

slide-30
SLIDE 30

@grabnerandi

Single search query end-to-end

  • 26.7s Load Time
  • 5kB Payload
  • 33! Service Calls
  • 99kB - 3kB for each call!
  • 171! Total SQL Count
  • Architecture Violation: Direct access to DB from frontend service

slide-31
SLIDE 31

Understanding Code Complexity

  • Existing 10 year old code & 3rd party
  • Skills: Not everyone is a perf expert or born architect

From Monolith to Microservice

  • Service usage in the End-to-End Scenarios?
  • Will it scale? Or is it just a new monolith?

Understand Deployment Complexity

  • When moving to Cloud/Virtual: Costs, Latency …
  • Old & new patterns, e.g: N+1 Query, Data

Understand Your End Users

  • What they like and what they DON'T like!
  • It's a priority list & input for other teams, e.g. testing
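The N+1 Query pattern named on this slide can be illustrated with a small, self-contained sketch. Everything here is hypothetical: the `club` and `venue` tables, the `CountingConnection` helper, and the five sample rows are invented for illustration and are not taken from the application discussed in the talk. The point is only that both functions return the same data while issuing very different numbers of SQL statements.

```python
import sqlite3


class CountingConnection:
    """Wraps a connection and counts every statement sent through query()."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self.conn = conn
        self.statements = 0

    def query(self, sql: str, params: tuple = ()) -> list:
        self.statements += 1
        return self.conn.execute(sql, params).fetchall()


def make_db(clubs: int = 5) -> sqlite3.Connection:
    """Build a tiny in-memory database with a hypothetical schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE club (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute("CREATE TABLE venue (club_id INTEGER, city TEXT)")
    for i in range(1, clubs + 1):
        conn.execute("INSERT INTO club VALUES (?, ?)", (i, f"club-{i}"))
        conn.execute("INSERT INTO venue VALUES (?, ?)", (i, f"city-{i}"))
    return conn


def search_n_plus_one(db: CountingConnection) -> list:
    """One query for the list, then one more query per row (the N+1 pattern)."""
    rows = db.query("SELECT id, name FROM club ORDER BY id")
    result = []
    for club_id, name in rows:
        city = db.query("SELECT city FROM venue WHERE club_id = ?", (club_id,))[0][0]
        result.append((name, city))
    return result


def search_joined(db: CountingConnection) -> list:
    """The same data fetched with a single JOIN."""
    return db.query(
        "SELECT club.name, venue.city FROM club "
        "JOIN venue ON venue.club_id = club.id ORDER BY club.id"
    )


conn = make_db()
slow, fast = CountingConnection(conn), CountingConnection(conn)
same_result = search_n_plus_one(slow) == search_joined(fast)
print("Same result:", same_result)
print("N+1 statements:", slow.statements)   # 1 list query + 1 per club
print("JOIN statements:", fast.statements)
```

With N rows the first variant issues N+1 statements, which is why the SQL count per use case is a useful architectural metric to track.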
slide-32
SLIDE 32

@grabnerandi

The fixed end-to-end use case

“Re-architect” vs. “Migrate” to Service-Orientation

2.5s (vs 26.7)

5kB Payload

1! (vs 33!) Service Call

5kB (vs 99) Payload!

3! (vs 177) Total SQL Count

slide-33
SLIDE 33

@grabnerandi

slide-34
SLIDE 34

@grabnerandi

You measure it! from Dev (to) Ops

slide-35
SLIDE 35

@grabnerandi

Continuous Innovation and Optimization

Scenario: Monolithic App with 2 Key Features

Re-architecture into „Services“ + Performance Fixes

Column groups: Use Case Tests and Monitors (Build #, Use Case, Stat); Service & App Metrics (# APICalls, # SQL, Payload, CPU); Ops (#ServInst, Usage, RT)

Build # | Use Case      | Stat | # APICalls | # SQL | Payload | CPU   | #ServInst | Usage | RT
17      | testNewsAlert | OK   | 1          | 5     | 2kb     | 70ms  | 1         | 0.5%  | 7.2s
17      | testSearch    | OK   | 1          | 35    | 5kb     | 120ms | 1         | 63%   | 5.2s
25      | testNewsAlert | OK   | 1          | 4     | 1kb     | 60ms  |           |       |
25      | testSearch    | OK   | 34         | 171   | 104kb   | 550ms |           |       |
26      | testNewsAlert | OK   | 1          | 4     | 1kb     | 60ms  | 1         | 0.6%  | 3.2s
26      | testSearch    | OK   | 2          | 3     | 10kb    | 150ms | 6         | 75%   | 2.5s
35      | testNewsAlert | -    | -          | -     | -       | -     | -         | -     | -
35      | testSearch    | OK   | 2          | 3     | 7kb     | 100ms | 4         | 80%   | 2.0s

slide-36
SLIDE 36

Where to Start? Where to Go?

slide-37
SLIDE 37

@grabnerandi

slide-38
SLIDE 38

„Always seek to Increase Flow“

Ensure Success in The First Way

Removing Bottlenecks

Eliminating Technical Debt

Enable Successful Cloud & Microservices Migration

Shift-Left Quality

Reduce Code Complexity

slide-39
SLIDE 39

Manual Code/Architectural Bottleneck Detection

  • Blog & YouTube Tutorial:
  • http://apmblog.dynatrace.com/2016/06/23/automatic-problem-detection-with-dynatrace/
  • http://bit.ly/dttutorials
  • Metrics
  • # SQL, # of Same SQLs, # Threads, # Web Service/API Calls, # Exceptions, # of Logs
  • # Bytes Transferred, Total Page Load, # of JavaScript/CSS/Images ...
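A few of the metrics listed above (# SQL, # of Same SQLs, # API Calls, # Exceptions, # of Logs) can be derived from a recorded request trace. The sketch below assumes a made-up trace format, a list of `(kind, detail)` tuples; it does not reflect how Dynatrace or any other tool stores its data.

```python
from collections import Counter


def summarize(trace: list[tuple[str, str]]) -> dict[str, int]:
    """Count events per kind and find the most often repeated SQL statement."""
    kinds = Counter(kind for kind, _ in trace)
    sql_counts = Counter(detail for kind, detail in trace if kind == "sql")
    return {
        "sql": kinds["sql"],
        # Highest number of executions of one identical statement;
        # a large value hints at an N+1 query pattern.
        "same_sql_max": max(sql_counts.values(), default=0),
        "api_calls": kinds["api"],
        "exceptions": kinds["exception"],
        "logs": kinds["log"],
    }


# Hypothetical trace of a single search request.
trace = [
    ("api", "GET /search"),
    ("sql", "SELECT * FROM club WHERE id = ?"),
    ("sql", "SELECT * FROM club WHERE id = ?"),
    ("sql", "SELECT * FROM club WHERE id = ?"),
    ("sql", "SELECT count(*) FROM club"),
    ("log", "search done"),
    ("exception", "TimeoutError"),
]

summary = summarize(trace)
print(summary)
```

Tracking these numbers per use case and per build is what makes a regression visible before it reaches production.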
slide-40
SLIDE 40

Automatic Bottleneck Root Cause Information

slide-41
SLIDE 41

Manual Database Bottleneck Detection

  • Blog & YouTube Tutorial:
  • http://apmblog.dynatrace.com/2016/02/18/diagnosing-java-hotspots/
  • http://bit.ly/dttutorials -> Database Diagnostics
  • Patterns
  • N+1 Query, Unprepared SQL, Slow SQL, Database Cache, Indices, Loading Too Much Data ...
slide-42
SLIDE 42

Automated Database Bottleneck Detection

slide-43
SLIDE 43

Automated Code/Architecture Bottleneck Detection

slide-44
SLIDE 44

“To Deliver High Quality Working Software Faster“

„We have to Shift-Left Performance to Optimize Pipelines“

http://apmblog.dynatrace.com/2016/10/04/scaling-continuous-delivery-shift-left-performance-to-improve-lead-time-pipeline-flow/

slide-45
SLIDE 45

= Functional Result (passed/failed)
+ Web Performance Metrics (# of Images, # of JavaScript, Page Load Time, ...)
+ App Performance Metrics (# of SQL, # of Logs, # of API Calls, # of Exceptions ...)

Fail the build early!
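One way to "fail the build early" is a small gate that compares the metrics of the current build against a baseline build. The following is a minimal sketch under stated assumptions: the `UseCaseResult` type, the `gate` function and the 20% tolerance are invented for illustration, and the sample numbers are only loosely modeled on the testSearch figures quoted for Build 17 and Build 25 in this deck.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UseCaseResult:
    name: str
    passed: bool        # functional result
    api_calls: int      # app performance metrics
    sql_count: int
    payload_kb: float


def gate(baseline: UseCaseResult, current: UseCaseResult,
         tolerance: float = 0.2) -> list[str]:
    """Return a list of violations; an empty list means the build may proceed."""
    violations = []
    if not current.passed:
        violations.append(f"{current.name}: functional test failed")
    for metric in ("api_calls", "sql_count", "payload_kb"):
        before = getattr(baseline, metric)
        after = getattr(current, metric)
        # Flag any metric that grew by more than the allowed tolerance.
        if after > before * (1 + tolerance):
            violations.append(f"{current.name}: {metric} rose from {before} to {after}")
    return violations


def exit_code(violations: list[str]) -> int:
    """A CI step could pass this value to sys.exit() to break the build."""
    return 1 if violations else 0


baseline = UseCaseResult("testSearch", True, api_calls=1, sql_count=35, payload_kb=5)
current = UseCaseResult("testSearch", True, api_calls=34, sql_count=171, payload_kb=104)

violations = gate(baseline, current)
for line in violations:
    print("VIOLATION:", line)
print("exit code:", exit_code(violations))
```

Note that the functional test passes in both builds; only the architectural metrics reveal the regression, which is the argument for combining them in one verdict.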

slide-46
SLIDE 46

Reduce Lead Time: Stop 80% of Performance Issues in your Integration Phase

CI/CD: Test Automation (Selenium, Appium, Cucumber, Silk, ...) to detect functional and architectural (performance, scalability) regressions

Perf: Performance Test (JMeter, LoadRunner, Neotys, Silk, ...) to detect tough performance issues

slide-47
SLIDE 47

Shift-Left Performance results in Reduced Lead Time powered by Dynatrace Test Automation

http://apmblog.dynatrace.com/2016/10/04/scaling-continuous-delivery-shift-left-performance-to-improve-lead-time-pipeline-flow/

slide-48
SLIDE 48

Faster Lead Times to User Value! Results in Business Success!

slide-49
SLIDE 49

Questions

  • Slides: slideshare.net/grabnerandi
  • Get Tools: bit.ly/dtpersonal
  • Watch: bit.ly/dttutorials
  • Follow Me: @grabnerandi
  • Read More: blog.dynatrace.com
  • Listen: http://bit.ly/pureperf
  • Mail: andreas.grabner@dynatrace.com

slide-50
SLIDE 50

Andreas Grabner

Dynatrace Developer Advocate @grabnerandi http://blog.dynatrace.com

slide-51
SLIDE 51

@grabnerandi

  • „Always seek to Increase Flow“
  • „Understand and Respond to Outcome“
  • „Culture of Continual Experimentation“

slide-52
SLIDE 52

@grabnerandi

Increased Flow of High Quality Value

  • Test Driven Development
  • Automated Deployments
  • Shift-Left Performance
  • Break the Monolith
  • Infrastructure as Code
  • Migrate to Virtual/Cloud/PaaS

Remove Bottlenecks

slide-53
SLIDE 53

@grabnerandi

Fast Response to Outcome: Address Deployment Impact

  • User Experience, Conversion Rate
  • Costs and Efficiency
  • Availability

slide-54
SLIDE 54

@grabnerandi

Real User Feedback: Building the RIGHT thing RIGHT!

  • Experiment & innovate on new ideas
  • Optimizing what is not perfect
  • Removing what nobody needs

slide-55
SLIDE 55

Remove Database Bottlenecks

88% cite the database as the most common challenge or issue with application performance

slide-56
SLIDE 56

Automatic Bottleneck Root Cause Information

slide-57
SLIDE 57

Manual Service Bottleneck Detection

  • Blogs:
  • http://apmblog.dynatrace.com/2016/06/08/diagnosing-common-bad-micro-service-call-patterns/
  • http://apmblog.dynatrace.com/2015/08/26/monolith-to-microservices-key-architectural-metrics-to-watch/
  • Patterns
  • N+1, High Payload, Lack of Caching, Thread & Connection Pool Shortage, Excessive Async Calls
slide-58
SLIDE 58

Automated Service Bottleneck Detection

slide-59
SLIDE 59

Automated Large Scale Service Monitoring and Bottleneck Detection

slide-60
SLIDE 60

Automatic Bottleneck Root Cause Information

slide-61
SLIDE 61

Manual Deployment Bottleneck Detection

  • Blogs:
  • http://apmblog.dynatrace.com/2016/07/07/measure-frequent-successful-software-releases/
  • http://apmblog.dynatrace.com/2015/08/04/hybris-performance-review-10-system-health-checks/
  • Patterns
  • Load Distribution, # HTTP 3xx/4xx/5xx, # of Exceptions, Stuck Threads, Timeouts, ...
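The "# HTTP 3xx/4xx/5xx" pattern can be checked with very little code once status codes are available, for example from an access log. This sketch assumes the codes have already been parsed into integers; the function names and the before/after sample data are hypothetical.

```python
from collections import Counter


def status_classes(status_codes: list[int]) -> dict[str, int]:
    """Group HTTP status codes into 2xx/3xx/4xx/5xx buckets."""
    counts = Counter(f"{code // 100}xx" for code in status_codes)
    return {cls: counts[cls] for cls in ("2xx", "3xx", "4xx", "5xx")}


def server_error_rate(status_codes: list[int]) -> float:
    """Share of responses that are server errors (5xx)."""
    if not status_codes:
        return 0.0
    return sum(1 for code in status_codes if code >= 500) / len(status_codes)


# Hypothetical samples taken before and after a deployment.
before = [200, 200, 200, 301, 404, 200]
after = [200, 500, 503, 301, 404, 200]

print("before:", status_classes(before), f"{server_error_rate(before):.0%} 5xx")
print("after: ", status_classes(after), f"{server_error_rate(after):.0%} 5xx")
```

Comparing the two distributions right after a release gives a quick signal on whether the deployment itself introduced errors.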
slide-62
SLIDE 62

Automated Deployment Bottleneck Detection

slide-63
SLIDE 63

Automatic Bottleneck Root Cause Information