

SLIDE 1

International Conference On Software Test Automation
March 25-28, 2002, San Jose, CA, USA
Wednesday, March 27, 2002, 11:30 AM

MEASURING THE EFFECTIVENESS OF AUTOMATED FUNCTIONAL TESTING

Ross Collard

Collard & Company

SLIDE 2

Ross Collard

Ross Collard is president of Collard & Company, a consulting firm located in Manhattan. His experience includes several software testing & QA projects; strategic planning for technology; and managing large software projects. His consulting and training clients have included: ADP, American Express, Anheuser-Busch, AT&T, Banamex, Bank of America, Baxter Healthcare, Bechtel, Blue Cross/Blue Shield, Boeing, British Airways, the CIA, Ciba Geigy, Cisco, Citibank, Computer Associates, Dayton Hudson, DEC, Dell, EDS, Exxon, General Electric, Goldman Sachs, GTE, the Federal Reserve Bank, Ford, Fujitsu, Hewlett-Packard, Hughes Aircraft, Intel, Johnson & Johnson, JP Morgan, Lucent, McGraw Hill, MCI, Merck, Microsoft, Motorola, NASA, Nortel, Novell, Oracle, Procter & Gamble, Prudential, IBM, Swiss Bank and the U.S. Air Force.

  • Mr. Collard has conducted seminars on business and information technology topics for businesses, governments and universities, including George Washington, Harvard and New York Universities, MIT and U.C. Berkeley. He has lectured in the U.S.A., Europe, the Middle East, the Far East, South America and the South Pacific. He has a BE in Electrical Engineering from the University of New Zealand (where he grew up), an MS in Computer Science from the California Institute of Technology and an MBA from Stanford University. He can be reached at rcollar@attglobal.net.

SLIDE 3

MEASURING THE EFFECTIVENESS OF AUTOMATED FUNCTIONAL TESTING

Ross Collard, Collard & Company

With thanks to James Bach, Elfriede Dustin, Dot Graham, Sam Guckenheimer, and Bret Pettichord.

Overview – The Issues We Will Discuss

  • What problem are we trying to solve, and what questions do we want to ask about our test automation?
  • What information do we need to develop credible answers?
  • Where do we get the information, and how trustworthy and precise does it need to be?
  • How do we derive conclusions from the information?
  • What problems are we likely to encounter, and how do we handle them?

SLIDE 4

What Problem are We Trying to Solve?

  • Justify past expenditures on test automation (a post-mortem).
  • Lobby for future investments (in tools, equipment, staff, training, etc.).

  • Assess the current status of our automation.
    - Before-and-after comparisons within our organization.
    - Benchmarks to industry peers.
  • Set direction. (Where do we go from here?)
    - Change focus.
    - Retrench / abandon.
    - Champion and encourage further automation.
SLIDE 5

A Caution

Although test automation effectiveness has been studied relatively little, information technology (IT) investment effectiveness has been widely measured. Large-scale studies from MIT, Morgan Stanley and others have concluded that the level of IT investment is NOT a predictor of company profitability or growth. Why? The issue is HOW organizations invest. Low-performing companies use IT to automate existing manual business processes, while the high performers make IT a catalyst to change the business processes to improve customer value. This is technology adoption at work: by the time there is sufficient trustworthy data to form valid conclusions, the questions are moot.

SLIDE 6

Reasons NOT to Measure

  • Data can be difficult or impossible to acquire.
  • The effort often is time consuming, distracting from other activities.

  • Assumptions are needed about unknowns.
  • The data analysis can take some weird turns – can the conclusions logically be supported by the data?

  • The results can lack credibility and be subject to criticism.
  • The results could be embarrassing.
SLIDE 7

Reasons that We Must Measure

  • You can’t manage what you can’t measure.
  • Without findings and conclusions, you’re just another opinion.
  • In the hustle and bustle of test activities, it is hard to know the situation without standing back and reflecting.
  • It is unprofessional not to objectively evaluate and report back to the investors in test automation (managers, clients).
  • An honest effort to evaluate our effectiveness will win kudos and respect.

SLIDE 8

Characteristics which Influence the Effectiveness of Test Automation

  • Many organizations have unrealistic goals for test automation, sometimes aided by vendor snake oil.
  • Many organizations have vague goals for automation.
  • Many organizations fail with automation, though it all depends on how we define success and failure. For example, some organizations abandon using the tools.
  • The time required to develop and maintain automated test cases often goes way up.
SLIDE 9

Common Characteristics (continued)

  • The time to execute automated test cases goes way down.
  • The elapsed time for testing decreases, sometimes dramatically.
  • The number of problems found increases, sometimes dramatically.
  • The tool acquisition or development cost usually is only 5% to 15% of TCO (total cost of ownership, including training, internal and external support, tool upgrades, etc.).
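
To make that rule of thumb concrete, here is a minimal back-of-the-envelope sketch in Python. All figures are hypothetical assumptions for illustration, not data from this presentation.

    # Back-of-the-envelope TCO estimate from the 5%-15% rule of thumb.
    # All figures are hypothetical assumptions.
    tool_acquisition_cost = 50_000   # assumed license/development cost, in dollars
    tool_share_of_tco = 0.10         # assume the tool itself is 10% of TCO

    estimated_tco = tool_acquisition_cost / tool_share_of_tco
    hidden_costs = estimated_tco - tool_acquisition_cost  # training, support, upgrades, etc.

    print(f"Estimated TCO:            ${estimated_tco:,.0f}")   # $500,000
    print(f"Costs beyond acquisition: ${hidden_costs:,.0f}")    # $450,000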

SLIDE 10

Common Characteristics (continued)

  • Many of the costs are overhead and hidden unless the organization has finely tuned cost accounting systems, in areas like training, test library maintenance, and centralized support for test automation on decentralized projects.
  • The savings from test automation generally do not come from tester headcount reduction.
  • Testers’ morale can increase – there’s less scut work – or decrease, because they don’t like automation or the tools.
  • Testers’ skills often need extensive upgrading, especially if the tools or test environment are quirky.

SLIDE 11

Common Characteristics (continued)

  • Re-use of test cases for regression testing often rises significantly with automation, and re-test time is cut significantly too.
  • There can be long learning curves and long lead times before the pay-off from test automation is realized, as with the building of any other kind of infrastructure. Don’t measure too early. (Though interim progress reports are important to keep the faithful believing.)

SLIDE 12

Testers’ (and Managers’) Questions about Test Automation

(1) To Evaluate Effectiveness:

  • Can we show justification for the investments we’ve made?
  • What were our original goals for test automation?
  • Do the benefits realized match expectations?
  • How much have we spent on automation?
  • How much would we have spent on comparable testing without automation: what’s the equivalent manual test effort (Dot Graham’s EMTE)? (See the sketch after this list.)

  • Is test automation helping or hurting?
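
As an illustration of how EMTE can be reported, here is a minimal sketch. The hours and run counts are invented assumptions, not figures from the talk.

    # Equivalent Manual Test Effort (EMTE): the manual effort that the
    # automated runs stand in for. All figures are hypothetical.
    manual_hours_per_cycle = 40    # assumed hours to execute the suite by hand once
    automated_cycles_run = 25      # number of times the automated suite was run

    emte_hours = manual_hours_per_cycle * automated_cycles_run   # 1000 hours
    automation_effort_hours = 300  # assumed hours to build and maintain the suite

    print(f"EMTE: {emte_hours} hours of equivalent manual testing")
    print(f"EMTE per hour of automation effort: {emte_hours / automation_effort_hours:.1f}")
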
SLIDE 13

Testers’ and Managers’ Questions (continued)

  • How effective is our automated testing in finding problems?
  • How many problems are being found in testing?
    - Manual vs. automated.
    - By type of defect.
    - By level of severity.
  • What are the levels of (a) irreproducible and (b) false test results?
    - Manual vs. automated.
  • What costs have been avoided by the defects found not causing failures?
  • What types of problems are being MISSED with automated testing?

SLIDE 14

Testers’ and Managers’ Questions (continued)

  • How reliable are the test results?
  • What is the test coverage? (Even if more defects are not being found, higher coverage is an indicator of higher confidence in the test results.)
    - Manual vs. automated.
    - As measured by coverage tools.
    - As assessed subjectively (if coverage tools are not available).
  • Have testing practices become more organized and consistent with automation (fewer vagaries)?

SLIDE 15

Testers’ and Managers’ Questions (continued)

  • How effective is our automated testing in encouraging test case and test facility re-use?
  • What percentage of test cases are re-run in regression testing?
    - Manual vs. automated.
    - Our experience vs. industry norms.
  • What percentage of test cases are re-used across multiple test projects?
    - Manual vs. automated.
SLIDE 16

Testers’ and Managers’ Questions (continued)

  • How effective is our automated testing in speeding delivery?
  • How quickly is the testing completed?
    - Manual vs. automated.
  • What impact does automated testing have on delivery time (e.g., by reducing re-work)?
  • How do we compare with industry norms?
  • What is the impact on user satisfaction?
  • Periodic user satisfaction surveys.
    - By customer segment or user group.
    - By system or version of a system.
    - Areas tested by manual vs. automated means.
SLIDE 17

Testers’ and Managers’ Questions (continued)

  • What are the costs of testing?
  • How much elapsed time and how many tester hours does each suite of test cases require (for automated test cases vs. the manual equivalents)?
    - Test case development.
    - Test case maintenance.
    - Test execution.
    - Results evaluation and follow-up.
  • What is the cost of the equipment tied up in testing?
    - Test case development and maintenance.
    - Test execution.
  • How many times does a test case need to run to break even (earn back its development and maintenance costs)? (See the sketch after this list.)
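
One common way to frame that break-even question: automation pays for itself after N runs, where N x (manual cost per run) equals the development cost plus N x (automated cost per run, including upkeep). A minimal sketch, with assumed hourly figures that are illustrative only:

    import math

    # Break-even point for one automated test case. All figures are assumptions.
    development_cost = 8.0      # hours to automate the test case
    maintenance_per_run = 0.1   # hours of script upkeep, averaged per run
    manual_run_cost = 1.0       # hours to execute the case by hand
    automated_run_cost = 0.05   # hours of tester attention per automated run

    saving_per_run = manual_run_cost - (automated_run_cost + maintenance_per_run)
    break_even_runs = math.ceil(development_cost / saving_per_run)
    print(f"Break-even after {break_even_runs} runs")   # 10 runs with these numbers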

SLIDE 18

Testers’ and Managers’ Questions (continued)

  • What is the relative effectiveness of the tools for us?
  • For an organization which is using multiple tools, which tools are working better in that organization?
  • Comparatively, how do we treat our testers?
  • Are there salary (and perks) differences?
    - Manual vs. automated.
  • Are there differences in the turnover of test staff?
    - Manual vs. automated.
  • How do we compare with industry patterns in human resources practices?
  • How’s morale? Do the testers like how the organization has automated and how it has impacted them?

SLIDE 19

Testers’ and Managers’ Questions (continued)

(2) To Set Further Direction:

  • Where are the most fruitful areas for us to focus our test automation?
  • Where is the majority of the testers’ manual effort being expended?
  • Are there areas which have been automated but where the automation currently is not very effective?
  • What are the major bottlenecks and areas of project delays?
  • Are we testing the right things?
    - Defects found in testing vs. operation.
    - What are the defect densities (defects per test case, etc.)?

SLIDE 20

Testers’ and Managers’ Questions (continued)

(3) For Newcomers to Automation:

  • Are we ready for test automation (or to escalate our investment in automation)?
  • What is the maturity of our test processes?
  • What are the success and failure rates of other similar organizations with test automation?
  • What is the expected duration to pay back the investment?
  • Are there reality checks for newcomers to automation, to counter inflated claims?

SLIDE 21

Testers’ and Managers’ Questions (continued)

  • How will automation impact our test group?
  • Tools?
    - Which tool or set of tools is most likely to meet our needs?
    - Should we build our tool(s) or use commercially available ones?
  • People?
    - Increase or reduce tester headcount?
    - Change in the mix of skills needed in the test group?
    - Change in the mix of work activities in the test group?
  • Test equipment?
  • Test procedures?
SLIDE 22

Another Caution …

We’ll probably never be able to answer all these questions: it is a long laundry list. Being overly ambitious and trying to measure the world condemns many metrics projects to failure. We need to zero in on what is really critical to us in the prior list of “nice to know” items, and decide what is feasible based on our available information sources.

SLIDE 23

Information Sources

The Budget as a Source of Test Automation Cost Data

Realistic budgets should be developed, and ideally costs should be tracked in these categories:

  • Tool acquisition costs.
  • Vendor maintenance and support.
  • Training.
  • External consulting support.
  • Internal support.
  • Additional test equipment and facilities (above what would be needed for manual testing).

  • Testers’ costs (salaries and overheads).
  • Debugging and fixing costs.
SLIDE 24

The Budget as a Source (continued)

People like cost-based analyses because SOME costs are relatively easy to collect and measure. We should distrust cost numbers for two reasons: (1) Test managers do not have much control over costs. The management game is often to get whatever budget you can and then figure out how to use it most effectively. (2) Measurable cost savings may be non-existent or negligible, compared to “intangibles” such as faster system delivery, better test coverage and more regression testing.

SLIDE 25

Information Sources (continued)

Problem Databases and Test Logs

These can be a treasure house of data that’s trustworthy, pertinent, and easy to extract and use (a small mining sketch follows this list):

  • Number of problems, by type and level of severity.
  • Cross-referenced to test cases.
  • Readily categorized: manual vs. automated.
  • Timing – when problems were found (early or late).
    - This one is crucial – automation finds bugs faster.
  • Time to test, debug and fix (maybe).
  • Test case data.
  • Re-use of test cases.
  • Irreproducible and false test results.
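
As a sketch of that kind of mining, the snippet below tallies problems by detection method and severity. The record layout and field names are invented for illustration; real problem databases will differ.

    from collections import Counter

    # Hypothetical problem-log records; the field names are assumptions.
    problems = [
        {"id": 101, "severity": "high",   "found_by": "automated", "phase": "early"},
        {"id": 102, "severity": "medium", "found_by": "manual",    "phase": "late"},
        {"id": 103, "severity": "high",   "found_by": "automated", "phase": "early"},
    ]

    by_method_and_severity = Counter((p["found_by"], p["severity"]) for p in problems)
    early_automated = sum(1 for p in problems
                          if p["found_by"] == "automated" and p["phase"] == "early")

    print(by_method_and_severity)
    print(f"Problems found early by automation: {early_automated}")
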
SLIDE 26

Information Sources (continued)

Defect Density

One way to assess test effectiveness is to count the number of defects found per test case (the defect density). This measure is really useful only for comparing the effectiveness of different test suites (e.g., manual vs. automated) when they are applied to the same system. A test suite which works well for one system may not work well with another: the conclusions from the assessment of effectiveness may not be portable across different systems or versions of the same system. The counted defects should be weighted by level of severity.
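
A minimal sketch of the severity-weighted comparison, assuming both suites were run against the same system. The weights, findings and test-case counts are arbitrary assumptions.

    # Severity-weighted defect density: weighted defects per test case executed.
    # Weights and sample data are arbitrary assumptions.
    severity_weights = {"high": 5, "medium": 2, "low": 1}

    def weighted_density(defect_severities, test_case_count):
        """Severity-weighted defects found per test case executed."""
        weighted = sum(severity_weights[s] for s in defect_severities)
        return weighted / test_case_count

    manual_defects = ["high", "low", "low"]                # hypothetical findings
    automated_defects = ["high", "high", "medium", "low"]  # hypothetical findings

    print(f"Manual suite:    {weighted_density(manual_defects, 120):.3f}")
    print(f"Automated suite: {weighted_density(automated_defects, 400):.3f}")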

SLIDE 27

Information Sources (continued)

User Satisfaction Surveys

Preferably, we’ll piggyback on existing surveys. Typical questions (asked of users of systems with manual vs. automated testing):

  1. Information Technology (IT) understands our business issues.
  2. IT has the right systems and services to meet our needs.
  3. IT has a clear and consistent technology direction.
  4. The IT organization is customer-focused and effective.
  5. IT projects are generally successful in meeting our objectives and contain few surprises.
  6. IT costs are fair and competitive.
SLIDE 28

User Satisfaction Surveys (continued)

  7. IT uses its resources productively.
  8. IT provides reliable services.
  9. IT is sufficiently responsive to our requests for services.
  10. IT provides timely and effective responses to systems problems and enhancement requests.
  11. IT's technical staff is competent.
  12. IT is an effective partner in our business strategy and planning.
  13. IT includes us as appropriate in decisions that affect us.
  14. IT work policies and standards are clear and effective.
  15. IT is proactive in suggesting how we can apply technology for competitive advantage.

Many factors influence the answers besides test automation.

SLIDE 29

Information Sources (continued)

Coverage Analysis

Coverage provides a measure of the completeness of testing, which is not necessarily the same as its effectiveness. It can be measured as black-box feature coverage (the percentage of features tested, and how thoroughly they were tested). This approach requires a comprehensive listing of the features and a mapping of the test cases to the features, and is not precise. In addition, or as an alternative, branch or statement coverage can be tracked at the source code level by automated coverage tools. This is precise, but the implications of the measurements can be hard to discern.
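
A sketch of the black-box feature-coverage bookkeeping described above. The feature names and test-case mapping are invented for illustration.

    # Black-box feature coverage: which features do our test cases touch?
    # Feature and test-case names are hypothetical.
    features = {"login", "search", "checkout", "reporting", "admin"}

    test_case_features = {
        "tc_01": {"login", "search"},
        "tc_02": {"search", "checkout"},
        "tc_03": {"login"},
    }

    covered = set().union(*test_case_features.values())
    coverage_pct = 100.0 * len(covered) / len(features)

    print(f"Feature coverage: {coverage_pct:.0f}% ({len(covered)} of {len(features)})")
    print(f"Untested features: {sorted(features - covered)}")   # ['admin', 'reporting']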

SLIDE 30

Information Sources (continued)

Publicly Available Metrics (Industry Norms)

  • There’s no central repository of data – we have to hunt and peck for it.

  • The validity and applicability of the data is often doubtful.
  • Nevertheless, many “rules of thumb” are available.
  • Speakers at a conference like this are a good source.
  • Vendors and consultants may have axes to grind.
SLIDE 31

Information Sources (continued)

Areas which need Intelligent Speculation

  • Areas where measurements can’t be taken.
  • E.g., what costs have been avoided by the defects found through automation?

  • Areas where observations can be made but the findings are subjective, not quantitative.
  • E.g., has the testers’ morale improved? Has developers’ respect for testers improved?

In assessing test effectiveness, it is a good idea to supplement the bare numbers with impressionistic descriptions (“stories”).

SLIDE 32

Potential Problems

There is a danger that if too much attention is paid to gathering the metrics, the measurement process will get in the way of the test automation. (I mentioned before that measurement is time consuming.) Metrics projects can take on a holy mission in themselves and occasionally overshadow everything else. Common sense needs to be applied in deciding how much data can be gathered, and reasonable guesstimates will sometimes have to stand in for hard measured data.

The formulation of conclusions is only as good as the assumptions which underlie them – we need to recognize them and get buy-in.

SLIDE 33

Potential Problems (continued)

There are often lots of difficulties in data collection, and compromises are required. Baselines (the “before” set of metrics for before-and-after comparisons) are often informal or absent. Unless fairly sophisticated metrics programs are already in place, establishing the measures, collecting and validating the data, and developing the baseline can take months – or have to be worked around. In addition, if the environment is not very stable, this baseline will shift significantly anyway over the next 6 to 12 months, regardless of whether the testing is automated or not.
SLIDE 34

Potential Problems (continued)

Measurements often have imprecision, biases and credibility issues. People can form different opinions from the same data.

Difficulties occur in developing conceptual models of what’s really going on. The lack of a suitable model leads to a failure to interpret, analyze and correctly use the data once it has been collected.

Backwards vision: the measurements tend to look in the rear-view mirror.

Measuring and reporting conclusions in technical terms, not the business terms which the decision-makers can relate to.

SLIDE 35

In Summary – Helpful Hints

  • First understand the “why” of your measurement goals – what you want to accomplish and how you will use the results.
  • Gain an understanding of the information sources available to you and what compromises have to be made.
  • Define a path to get “there” (your final presentation of findings and conclusions) from “here”.
  • Plan ahead and decide early what to measure – you can’t go back and record data in the past.
  • Subject your analysis to the test of common sense.