Scalability: Pushing the Limits (PNSQC Presentation, October 2014)




SLIDE 1

PNSQC Presentation, October 2014

Scalability: Pushing the Limits

Neha Rai, Tim Schooley, Tejas Patil

SLIDE 2

SLIDE 3

So what is “Scalability”?

“Scalability is the ability of a system to successfully handle an increasing workload, or its ability to be expanded without major architectural changes, or detriment.”

For a good read, check out “Characteristics of Scalability and Their Impact on Performance”, André B. Bondi, AT&T Labs

SLIDE 4

Once upon a time, there was…

SLIDE 5

  • Policy
  • User authentication
  • Auditing
  • Reports
  • Key escrow

SLIDE 6

Mission Critical

(Photographs for example only; not indicative of actual customers)

[1] By Brian Snelson (originally posted to Flickr as Final assembly) [CC-BY-2.0 (http://creativecommons.org/licenses/by/2.0)], via Wikimedia Commons
[2] By U.S. Navy photo by Lt. Arwen Chisholm [Public domain], via Wikimedia Commons

SLIDE 7

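[Diagram: McAfee ePolicy Orchestrator with a database (DB) and Agent Handler(s) (AH). Clients run the McAfee Agent and Drive Encryption, and communicate with the Agent Handler(s) at the Agent-Server Communication Interval (ASCI).]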

SLIDE 8

Effects of changing the ASCI, with 100,000 clients

[Chart: average number of client requests per second (axis ticks 5 to 30) against Agent-Server Communication Interval in hours (axis ticks 1 to 16)]
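
The relationship plotted on this slide can be approximated with simple arithmetic. As a rough sketch, assuming every client contacts the server exactly once per ASCI and that check-ins are spread evenly over the interval (both are simplifying assumptions, not statements about how the McAfee Agent schedules itself), the average request rate is the client count divided by the interval in seconds. The function name is made up for illustration.

```python
def average_requests_per_second(clients: int, asci_hours: float) -> float:
    """Average request rate, assuming each client checks in once per ASCI."""
    if asci_hours <= 0:
        raise ValueError("asci_hours must be positive")
    return clients / (asci_hours * 3600)


if __name__ == "__main__":
    # Print the estimated rate for a few example intervals.
    for hours in (1, 2, 4, 8, 16):
        rate = average_requests_per_second(100_000, hours)
        print(f"ASCI {hours:>2} h -> {rate:5.1f} requests/second")
```

Under these assumptions, 100,000 clients on a 1-hour ASCI produce roughly 28 requests per second, and on a 4-hour ASCI roughly 7, which is in line with the "~7" quoted on the results slide.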

SLIDE 9

So we integrated into ePO...

  • Policy
  • Users
  • Auditing
  • Reports
  • Key escrow

SLIDE 10

  • Does it meet our scalability expectation?

We had a number in mind, based on existing ePO scalability guidelines (goal of 100,000).

  • Will it work for existing customers?

Mission critical. It has to work.

  • Does it meet our quality goals?

Do we know what happens when the system reaches its limits?

Are we ready to roll it out?

SLIDE 11

Without testing the limits, bad things™ can happen.

[Chart: [confidence in] ability to meet demand, plotted against investment ($) in pushing the limits]

SLIDE 12

Key take-away #1: Understand the risks of not doing Scalability Testing

(this will help you determine if you need to do it)

SLIDE 13

What to test?

  • Covers many components
  • High impact failure case
  • Simple result to interpret
  • Covers high complexity code
  • Covers a very common use case

[Diagram labels: DB, AH, AH, “5_G>I’N^O!”, 1.5x ASCI]

SLIDE 14

OK, where are you going to get all the clients from?

(Note: this will depend on your architecture)

You might not have one of these!

[1] By David B. Gleason from Chicago, IL (The Pentagon) [CC-BY-SA-2.0 (http://creativecommons.org/licenses/by-sa/2.0)], via Wikimedia Commons
[2] By Rev Stan, Harry Potter studio tour: The cupboard under the stairs [CC-BY-SA-2.0 (http://creativecommons.org/licenses/by-sa/2.0)], via Flickr

SLIDE 15

[Diagram labels: ePolicy Orchestrator, DB, AH, AH, N nodes, N nodes, “5_G>I’N^O!”]

SLIDE 16

So why did we have to simulate?

(Optimization) x 100

Not testing Steve’s true ability to cook under heavy demand.

SLIDE 17

So why did we have to simulate?

Meaningful data helps uncover the limitations of the system. (for us, it was user data)
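
One way to picture the difference: if every simulated node sends the same payload, the server may be able to optimize or cache its way through the test. The following is a minimal sketch of generating distinct, reproducible per-node user data. The `SimulatedNode` type, the field names, the ID formats, and the range of users per node are all hypothetical and are not taken from the product or from the authors' test harness.

```python
import random
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class SimulatedNode:
    node_id: str
    users: List[str]  # hypothetical: user accounts assigned to this node


def make_nodes(count: int, max_users_per_node: int = 5, seed: int = 0) -> List[SimulatedNode]:
    """Build `count` simulated nodes, each with its own unique set of users."""
    rng = random.Random(seed)  # seeded so a test run can be repeated exactly
    nodes = []
    for i in range(count):
        user_count = rng.randint(1, max_users_per_node)
        # Embed the node index in each user name so no two nodes share a user.
        users = [f"user-{i:06d}-{u}" for u in range(user_count)]
        nodes.append(SimulatedNode(node_id=f"node-{i:06d}", users=users))
    return nodes


if __name__ == "__main__":
    nodes = make_nodes(1000)
    print(len(nodes), "nodes,", sum(len(n.users) for n in nodes), "users in total")
```

Seeding the generator keeps the data set identical between runs, so results from before and after an optimization can be compared on the same input.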

SLIDE 18

Example causes of limitations

  • Larger calculations
  • Cache memory
  • Connection pools
  • Contention
  • Disk IO
  • Network IO

Recommendation: keep the hardware consistent, and don’t use virtualization unless you expect your customers to use it.

SLIDE 19

SLIDE 20

SLIDE 21

Key take-away #2: Define your test scenarios sensibly.

  • Aim for broad coverage
  • Keep acceptance criteria simple
  • Target complex areas
  • Use suitable tools for gathering results

SLIDE 22

So how did we run the tests?

(the goal was 100k, but we needed to find the limit)

[Chart axes: # nodes, # requests/second; annotation: increasing cost (setup time)]
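
Raising the node count step by step until a run fails can be sketched as a simple loop. This is an illustrative outline only: `run_test` is a hypothetical callback standing in for whatever drives the simulated nodes and reports pass or fail, and the step sizes and the 150,000-node stand-in limit are invented for the example.

```python
from typing import Callable, Iterable, Optional


def find_limit(run_test: Callable[[int], bool], node_counts: Iterable[int]) -> Optional[int]:
    """Return the largest node count that passed before the first failure.

    `run_test(n)` is assumed to run one scalability test with n nodes and
    return True on success. Returns None if even the first step fails.
    """
    last_passing = None
    for n in node_counts:
        if not run_test(n):
            break  # stop at the first failing step
        last_passing = n
    return last_passing


if __name__ == "__main__":
    # Stand-in for a real test run: pretend the system copes with up to 150,000 nodes.
    def fake_run(n: int) -> bool:
        return n <= 150_000

    steps = [10_000, 50_000, 100_000, 150_000, 200_000]
    print("Largest passing node count:", find_limit(fake_run, steps))
```

Because each larger step costs more setup time, the choice of steps is a trade-off between how precisely the limit is located and how long the campaign takes.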

SLIDE 23

What were our findings?

(bearing in mind this was a new integration)

  • The first scalability tests were fireworks.
    – Crashes, memory leaks, deadlocks.
    – All uncovering high severity defects.
  • We identified bottlenecks, then optimized.
    – Expensive calculations.
    – Expensive SQL transactions.
  • We finally obtained a level of confidence.
    – Now we’re ready to sell it.

SLIDE 24

The results

ePO, Agent Handler and SQL server hardware:

  • Dell PowerEdge R515, 2.6 GHz 6C, 8 GB, 7.2K SATA
  • Dell PowerEdge R715, 2x 2.0 GHz 8C, 8 GB, 15K SAS

ASCI: 4 hours
Nodes: 100,000
Average requests per second (to DB): ~7

All tests passed on this configuration.

Notes: no other point products were installed. These results are advisory only.

SLIDE 25

How might this apply elsewhere?

SLIDE 26

Cost vs Gain

Law of diminishing returns

[Chart: [confidence in] ability to meet demand, plotted against investment ($) in pushing the limits]

SLIDE 27

Key take-away #3: Invest in Scalability appropriately (it’s a bottomless pit, if you want it to be)

SLIDE 28

Summary

  • Understand the risks of your system not meeting its Scalability requirements.

  • Define your test scenarios sensibly.
  • Invest appropriately in Scalability testing.
  • Have fun, and enjoy the fireworks!

SLIDE 29

Questions?

Neha_Rai@McAfee.com
Tim_Schooley@McAfee.com
Tejas_Patil@McAfee.com

Remember to take the in-app Presentation Survey!

SLIDE 30