I’m a Performance Geek!!! Designed and Implemented Monitoring - PowerPoint PPT Presentation



slide-1
SLIDE 1
slide-2
SLIDE 2

Introduction

  • I’m a Performance Geek!!!
  • Designed and Implemented Monitoring Architecture for Wachovia Investment Bank and Wells Fargo Managed Services
  • I’ve used many of the enterprise-class monitoring tools in existence.
  • I currently live, work, and play in Idaho, USA

2

slide-3
SLIDE 3

3

This is Iowa, I don’t live here. This is Idaho, I live here. Right Here!

slide-4
SLIDE 4

Agenda

  • Big Dumb Data
  • Smart Data Defined
  • Shifting DR to PR
  • Smart Data Strategies
  • Examples
  • Questions

slide-5
SLIDE 5

Big Dumb Data

5

slide-6
SLIDE 6

Why do monitoring tools exist anyway?

To quickly identify and remediate the business impact of performance and stability issues.

6

slide-7
SLIDE 7

What is Business Impact?

7

slide-8
SLIDE 8

Big Data = Enterprise Data Bloating

  • Business Data
  • Log Files
  • Monitoring Data
  • Business Intelligence Data
  • Legal Data
  • Regulatory Compliance Data
  • Email
  • Etc…

8

slide-9
SLIDE 9

Keep Everything?

9

slide-10
SLIDE 10

Keeping Too Little is Also Bad

10

slide-11
SLIDE 11

Keep Just What You Need

11

slide-12
SLIDE 12

True Story: Oops, that got expensive.

  • 5-7 years ago, installed and operated 3 monitoring tools
  • BTM, APM, and Predictive Analytics
  • ~80 Applications
  • Ended up with ~50 Management Servers
  • And 5-10 TB of data

Explore the hidden costs before you decide to implement.

slide-13
SLIDE 13

The Digital Hoarders are Winning

13

slide-14
SLIDE 14

Gartner Survey

  • Network Bandwidth: 36%
  • System Performance: 37%
  • Data Storage: 47%

slide-15
SLIDE 15

False Pretense That Storage is Cheap

  • 5 Year Storage Costs: 80% OpEx, 20% CapEx (2009 IBM Study)

  • IT Budgets: Up To 40% Spent on Storage
  • $5-25/GB/month Fully Loaded Cost

– $61,440 - $307,200 Per Year Per TB
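The per-TB range above follows directly from the per-GB rate. A minimal sketch of the arithmetic, assuming 1 TB = 1024 GB and 12 billing months per year (the function name is only illustrative):

```python
GB_PER_TB = 1024        # assumption: binary terabyte
MONTHS_PER_YEAR = 12


def yearly_cost_per_tb(cost_per_gb_month: float) -> float:
    """Convert a fully loaded $/GB/month rate into $/TB/year."""
    return cost_per_gb_month * GB_PER_TB * MONTHS_PER_YEAR


low = yearly_cost_per_tb(5)    # low end of the $5-25/GB/month range
high = yearly_cost_per_tb(25)  # high end of the range
print(f"${low:,.0f} - ${high:,.0f} per TB per year")
```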

15

slide-16
SLIDE 16

Smart Data Defined

16

slide-17
SLIDE 17

Data must be turned into information to be useful.

  • Heart Rate = 150 bpm
  • Blood Pressure = 200 over 100

Is the person performing well or not?

17

slide-18
SLIDE 18

Are we talking about this guy?

18

slide-19
SLIDE 19

19

Or this guy?

slide-20
SLIDE 20

Data must be turned into information to be useful.

  • Eye Color = Brown
  • Weight = 207 lbs (94 kg)

Is the person performing well or not?

  • Distance Run = 100 meters
  • Time = 9.58s
  • World Record Time = 9.69s

slide-21
SLIDE 21

Correlation + Analytics Turned Data Into Information

21

slide-22
SLIDE 22

Traditional Monitoring Tools Are Misleading

22

Resource Spikes May or May Not Cause Business Impact

slide-23
SLIDE 23

Having a lot of data causes a false sense of security.

23

Your needle is somewhere in there, good luck finding it anytime soon.

slide-24
SLIDE 24

We’ve become addicted to metrics!

24

How Much Is Enough???

slide-25
SLIDE 25

What do these charts tell us about application performance or business impact?

25

slide-26
SLIDE 26

This is better, but still not good enough.

26

Average Response Time of ProcessOrder Transaction with Historical Baseline

slide-27
SLIDE 27

True Story: Wasted Time.

  • Called onto conference line to help with Sev 1
  • Confident I had all of the data I needed to figure out the problem
  • Searched charts for hours
  • The problem wasn’t on my servers in the first place

slide-28
SLIDE 28

We need our monitoring platforms to do the heavy lifting for us if we want MTTR < 30 minutes.

28

  • Monitor my application from the user AND IT perspective.
  • Determine what is normal by observation and analytics.
  • Show me what my application looks like right now using correlation.
  • Alert me if anything above changes for the worse.
  • Have the data I need to solve the problem and lead me to the answer quickly.
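"Determine what is normal" and "alert me if anything changes for the worse" can be sketched as a baseline check. This is a minimal illustration, not how any particular monitoring product works: it assumes a simple mean and standard deviation baseline over recent response times, and the function name, sample values, and threshold of 3 standard deviations are all hypothetical.

```python
from statistics import mean, stdev


def is_abnormal(history, current, threshold=3.0):
    """Return True if `current` is much slower than the observed baseline.

    `history` is a list of past response times (ms) treated as "normal".
    Only deviations for the worse (slower) are flagged.
    """
    baseline = mean(history)
    spread = stdev(history)
    if spread == 0:
        # Perfectly flat history: any slowdown counts as a change.
        return current > baseline
    return (current - baseline) / spread > threshold


# Hypothetical response times for one transaction, in milliseconds.
history_ms = [120, 118, 125, 130, 122, 119, 127, 124]

print(is_abnormal(history_ms, 126))  # close to the baseline
print(is_abnormal(history_ms, 450))  # far slower than the baseline
```

A real platform would use a richer baseline (time of day, seasonality), but the idea is the same: the tool, not the person on the conference line, decides what counts as abnormal.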

slide-29
SLIDE 29

Disaster Recovery (DR) Needs to Shift to Problem Recovery (PR)

29

slide-30
SLIDE 30

We spend too much time planning for what will probably never happen.

30

slide-31
SLIDE 31

We spend too little time planning for what happens all too often.

31

slide-32
SLIDE 32

What is Problem Recovery Planning?

PR is a strategy and an organizational mindset. It’s the idea that monitoring is critical to managing applications and ensuring an optimal user experience.

It’s the practical implementation of a well-defined monitoring architecture.

32

slide-33
SLIDE 33

Monitoring is an afterthought too often.

slide-34
SLIDE 34

When a problem occurs…

  • Do we have monitoring?
  • What kind?
  • What are we collecting?
  • How long do we have history?

34

slide-35
SLIDE 35

Think about what you need ahead of time.

35

DB Network Infra Log App

slide-36
SLIDE 36

True Story: Investment Bank Blues

36

  • 40-50 Sev 1 Incidents Per Month
  • MTTR ~2 hours
  • Executive Mandate to Cut Incidents to Single Digits
  • Executive Mandate of 15 Minute or Less MTTR for All Trading Applications

slide-37
SLIDE 37

37

Had It Already

  • Infrastructure Monitoring
  • NPM – Network Performance Monitoring
  • Periodic Database Monitoring

Missing

  • APM – Application Performance Monitoring
  • Log Monitoring and Analytics
  • Always On Database Monitoring
  • Predictive Analytics
slide-38
SLIDE 38

38

Added

  • APM – Application Performance Monitoring
  • Predictive Analytics
  • Always On Database Monitoring
  • Business/IT Master Dashboard

Significant Results

  • Reduced Sev 1s from 45/month to 4/month
  • Improved key transaction speeds by 10x
  • Reduced MTTR from 3 hrs to 30 mins
  • Detected and repaired problems before impact

slide-39
SLIDE 39

Cloud Computing is driving the need for PR planning

  • Cloud apps are highly distributed so they can take advantage of dynamic scaling
  • Highly distributed applications are much harder to troubleshoot
  • Use of APM is the fastest way to identify and fix application problems in the cloud

slide-40
SLIDE 40

Smart Data Strategies

40

slide-41
SLIDE 41

41

slide-42
SLIDE 42

The costs add up.

  • Single High Traffic Application
  • Transmit and store up to 40 TB of monitoring data per year! (Keep Everything)
  • Cloud Bandwidth = ~$5000 per year per application. Charged $.12 per GB of data out of cloud.
  • Storage Costs = $204,800 per month by end of year 1. Using $5 per GB per month.

~1.3 Million USD spent at end of 1st year.
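The figures on this slide can be roughly reproduced with a small cost model. This is a sketch under stated assumptions, not the author's actual calculation: it assumes 1 TB = 1024 GB, that all 40 TB leaves the cloud once, and that stored data grows in 12 equal monthly increments with storage billed on the amount held each month. The function and parameter names are hypothetical.

```python
GB_PER_TB = 1024  # assumption: binary terabyte


def first_year_cost(tb_per_year=40, egress_per_gb=0.12, storage_per_gb_month=5.0):
    """Return (bandwidth cost, final month's storage bill, total first-year cost)."""
    total_gb = tb_per_year * GB_PER_TB

    # One-time transfer of every GB out of the cloud.
    bandwidth = total_gb * egress_per_gb

    # Assumed linear growth: after month m, m/12 of the yearly volume is stored.
    monthly_growth_gb = total_gb / 12
    storage_bills = [
        monthly_growth_gb * month * storage_per_gb_month for month in range(1, 13)
    ]

    return bandwidth, storage_bills[-1], bandwidth + sum(storage_bills)


bandwidth, last_month, total = first_year_cost()
print(f"Bandwidth:          ${bandwidth:,.0f}")
print(f"Storage, month 12:  ${last_month:,.0f}")
print(f"Total, first year:  ${total:,.0f}")
```

Under these assumptions the bandwidth comes to a little under $5,000, the month-12 storage bill to $204,800, and the first-year total to a bit over $1.3 million, in line with the slide.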

slide-43
SLIDE 43

We need to save THE RIGHT data

43

Analytics Archive Aggregation Correlation Control Application

slide-44
SLIDE 44

EUE – Key Performance Indicators (KPIs)

EUE – Pages, response time, network time, render time, location performance, etc…

44

slide-45
SLIDE 45

EUE – Key Performance Indicators (KPIs)

EUE – Pages, response time, network time, render time, location performance, etc…

45

slide-46
SLIDE 46

Business Transaction KPIs

46

BTs – Response time, count, rate, errors, CPU Used, CPU Block, CPU Wait, etc…

slide-47
SLIDE 47

Application Flow KPIs

47

Application Flow – Active nodes, active tiers, node response time, tier response time, external service response times, etc…

slide-48
SLIDE 48

Deep Diagnostics – We don’t need to save these forever.
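One way to act on this is a per-category retention policy: keep the KPI data for a long time and let deep diagnostics expire quickly. A minimal sketch; the category names and retention periods (365 and 14 days) are hypothetical placeholders, not recommendations from the talk.

```python
from datetime import datetime, timedelta

# Hypothetical retention periods per data category.
RETENTION = {
    "kpi": timedelta(days=365),              # EUE, business transaction, app flow KPIs
    "deep_diagnostics": timedelta(days=14),  # call stacks, snapshots, traces
}


def prune(records, now):
    """Keep only the records that are still inside their category's retention window."""
    return [r for r in records if now - r["timestamp"] <= RETENTION[r["kind"]]]


now = datetime(2024, 1, 31)
records = [
    {"kind": "kpi", "timestamp": datetime(2023, 6, 1)},
    {"kind": "deep_diagnostics", "timestamp": datetime(2023, 6, 1)},
    {"kind": "deep_diagnostics", "timestamp": datetime(2024, 1, 25)},
]

kept = prune(records, now)
print([r["kind"] for r in kept])
```

The old KPI record survives while the old diagnostic record is dropped, which is the point of the slide: the expensive, detailed data only needs to live long enough to solve the problem.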

48

slide-49
SLIDE 49

Don’t be this guy…

49

slide-50
SLIDE 50

Plan ahead, anticipate your needs, keep your organization nimble, powerful and purpose built.

50

slide-51
SLIDE 51

Example

51

slide-52
SLIDE 52

Netflix

  • Video Streaming
  • AWS Deployment
  • Highly dynamic environment
  • ~10,000 JVM Nodes
  • Doing it right

52

slide-53
SLIDE 53

Netflix

Collecting over 1 million metrics per minute.

53

slide-54
SLIDE 54

What’s the point(s)?

  • Big data isn’t a bad thing as long as it is serving a purpose.
  • Big monitoring data slows down MTTR and drives up both OpEx and CapEx.
  • Focusing on Problem Recovery will help you figure out your architecture, tools, and process.
  • Don’t be a digital hoarder!!!

54

slide-55
SLIDE 55

Questions???

55

slide-56
SLIDE 56

Thank You