Software Estimation Concepts
Presented for BlueCross BlueShield of South Carolina By The Rushing Center Furman University
Seagull Systems Analysis Series TM
Module 1: Overview of Estimating
2
The typical software organization is not struggling to improve its estimates from plus-or-minus 10% accuracy to plus-or-minus 5%.
The typical software organization is struggling to avoid estimates that are wrong by 100% or more.
Complex formulas aren't necessarily better. Software projects are influenced by numerous factors that undermine the predictive power of precise formulas.
This seminar emphasizes rules of thumb, procedures, and simple formulas that are accurate enough for most estimation purposes.
3
When executives ask for an estimate, they often are really asking for a commitment, or for a plan to meet a target.
4
“These functions need to be completed by July 1 so …” A statement like this expresses a business target, not an estimate.
5
The goal is accuracy
6
“This project will take 14 weeks”
Usually a target masquerading as an estimate.
7
Single-point estimates assume 100% probability of the actual outcome equaling the planned outcome. This isn't realistic.
8
They must be supported by effective project control.
9
This definition is the foundation of the discussion
10
If the planned progress is realistic (that is, based on accurate estimates), the project can be controlled to meet its targets.
Accurate estimates also help avoid schedule-stress-related quality problems.
When schedule pressure is extreme, about four times as many defects are reported as under normal conditions.
11
Parkinson’s Law (work expands to fill available time)
“Gold plating”
Procrastination
12
Reduced effectiveness of project plans. When planning assumptions are wrong by a large margin, plans based on them become ineffective.
Statistically reduced chance of on-time completion. Developers typically underestimate by 20-30% of their actual effort.
Poor technical foundation leads to worse-than-nominal results.
A low estimate can lead to too little time spent on upstream activities such as requirements and design.
13
Destructive late-project dynamics make the project worse than nominal:
More status meetings to discuss how to get the project back on track.
Frequent reestimation late in the project to determine just when the project will be done.
Losing face with key customers for missing delivery dates.
Preparing interim releases to support customer requirements.
More discussions about which requirements absolutely must be included.
Fixing problems from quick and dirty workarounds.
14
You must force the Cone to narrow by removing sources of variability from the project.
Defining requirements, including what you are not going to do, removes variability.
Designing the user interface helps to reduce the risk of misunderstood requirements.
If the product isn't really defined, or if the definition changes later, the Cone widens again.
25
1. Start with a "most likely" single-point estimate and then compute a range around it using predefined multipliers for the current project phase.
26
27
Source: Adapted from Software Estimation with Cocomo II (Boehm et al. 2000).
28
Feasibility: +100%, -50%
Requirements: +50%, -25%
Design: +20%, -10%
Coding: +10%, -5%
Testing: +5%, -2.5%
Example: your current estimate is 30 staff months and you have just completed design (+20%, -10%).
You'd say that the expected range is between 36 staff months on the high side and 27 staff months on the low side.
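The phase-multiplier arithmetic above can be sketched in a few lines. This is an illustrative helper, not a standard function; the multipliers are the ones listed on this slide.

```python
# Sketch: applying Cone of Uncertainty phase multipliers to a
# single-point estimate. Multipliers come from the slide above.

CONE_MULTIPLIERS = {
    # phase: (high-side factor, low-side factor)
    "feasibility":  (2.00, 0.50),   # +100%, -50%
    "requirements": (1.50, 0.75),   # +50%,  -25%
    "design":       (1.20, 0.90),   # +20%,  -10%
    "coding":       (1.10, 0.95),   # +10%,  -5%
    "testing":      (1.05, 0.975),  # +5%,   -2.5%
}

def estimate_range(most_likely, phase):
    """Convert a most-likely estimate into a (low, high) range."""
    high_factor, low_factor = CONE_MULTIPLIERS[phase]
    return most_likely * low_factor, most_likely * high_factor

# A 30 staff-month estimate at the end of design:
low, high = estimate_range(30, "design")
print(f"Expected range: {low:.0f} to {high:.0f} staff months")  # 27 to 36
```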
29
As a first step, fixing the chaos is more important than improving the estimates.
30
Requirements that weren't investigated well in the first place
Lack of end-user involvement in gathering and validating requirements
Poor designs that lead to numerous errors in the code
Poor coding practices that give rise to extensive bug fixing
Inexperienced personnel
Incomplete or unskilled project planning
Abandoning planning under pressure
Developer gold-plating
Lack of automated source code control
31
NASA's Software Engineering Laboratory plans on a standard allowance for requirements growth when it estimates its projects.
32
One of the most common sources of estimation error is omitting necessary work from the estimate.
One study found that developers tended to estimate only the work they could readily foresee.
They tended to overlook 20% to 30% of the necessary tasks.
Omitted work falls into three general categories:
Missing requirements
Missing software-development activities
Missing non-software-development activities
33
34
35
36
Unfounded Optimism
Don't reduce developer estimates; they're probably too optimistic already.
Subjectivity and Bias
The belief that the more places there are to tweak the estimate, the more accurate the estimate will be.
The reality is the opposite: more adjustable inputs mean more chances for subjectivity and bias to creep in.
37
Intuition and guessing in project estimates were both found to be less accurate than structured estimation techniques.
Even a 15-minute estimate will be more accurate than an off-the-cuff guess.
Note the difference between accuracy and precision. A measurement can be precise without being accurate, and accurate without being precise.
Airline schedules are precise to the minute, but they are not very accurate.
Measuring people's heights in whole meters might be accurate, but it would not be precise.
Project stakeholders tend to assume that a precise-looking estimate is also accurate, so don't express an estimate with more precision than its accuracy warrants.
38
For example, assuming the project team will be able to focus on the project without interruptions.
Be especially wary of adding together a set of "best case" estimates or assumptions.
Be especially wary of estimates that require final budget approval in the early stages of the project.
39
40
41
The largest driver in an estimate is the size of the software being built.
Organizations routinely violate this fundamental fact in two ways:
Costs, effort, and schedule are estimated without knowing how big the software will be.
Costs, effort, and schedule are not adjusted when the size of the software changes.
42
43
Effort for a 1,000,000-LOC system is more than 10 times the effort for a 100,000-LOC system.
Larger projects require coordination among larger groups of people, which increases overhead.
The larger the system becomes, the greater the cost of each unit of output.
44
45
There isn't a simple technique in the "art" of estimation that will account for diseconomies of scale.
When estimating a project of a significantly different size than your organization has built before:
Use estimation software that applies the "science" of estimation to account for the diseconomy of scale.
If the new project you're estimating will be similar in size to past projects, it is safe to use a simple ratio, such as lines of code per staff month, to estimate the new project.
46
Life-critical software requires far more effort than a similarly sized business system.
47
LOC per Staff Month: Low-High (Nominal)
Type of Software | 10,000-LOC Project | 100,000-LOC Project | 250,000-LOC Project
Business Systems | 800–18,000 (3,000) | 200–7,000 (600) | 100–5,000 (500)
Internet Systems (public) | 600–10,000 (1,500) | 100–2,000 (300) | 100–1,500 (200)
Intranet Systems (internal) | 1,500–18,000 (4,000) | 300–7,000 (800) | 200–5,000 (600)
Real-Time | 100–1,500 (200) | 20–300 (50) | 20–300 (40)
Scientific Systems/Engineering Research | 500–7,500 (1,000) | 100–1,500 (300) | 80–1,000 (200)
Shrink-wrap/Packaged Software | 400–5,000 (1,000) | 100–1,000 (200) | 70–800 (200)
Systems Software/Drivers | 200–5,000 (600) | 50–1,000 (100) | 40–800 (90)
Telecommunications | 200–3,000 (600) | 50–600 (100) | 40–500 (90)
Notice the ranges are large, typically spanning a factor of 10 or more.
Using your own organization's historical data will automatically incorporate the development factors specific to your environment.
This is by far the best approach. (We'll discuss this in more depth later.)
48
49
The table on the next slide lists the Cocomo II ratings for each adjustment factor.
The Very Low column represents the adjustment you would apply to the estimate if the project rated Very Low on that factor.
For example, if a team had very low "Applications Experience," the nominal effort estimate would be adjusted upward accordingly.
We will discuss COCOMO II in more depth later.
50
51
This graph presents
The factors represented
More observations are
52
53
Cocomo II Factor: Observation
Applications (Business Area) Experience: Teams that aren't familiar with the project's business area need significantly more time. This shouldn't be a surprise.
Architecture and Risk Resolution: The more actively the project attacks risks, the lower the effort and cost will be. This is one of the few Cocomo II factors that is controllable by the project manager.
Database Size: Large, complex databases require more effort project-wide. Total influence is moderate.
Developed for Reuse: Software that is developed with the goal of later reuse can increase costs as much as 31%. This doesn't say whether the initiative actually succeeds. Industry experience has been that forward-looking reuse programs often fail.
Personnel Continuity (turnover): Project turnover is expensive, in the top one-third of influential factors.
Process Maturity: Projects that use more sophisticated development processes take less effort than projects that use unsophisticated processes. Cocomo II uses an adaptation of the CMM process maturity model to apply this criterion to a specific project.
Product Complexity: Product complexity (software complexity) is the single most significant adjustment factor in the Cocomo II model. Product complexity is largely determined by the type of software you're building.
Requirements Analyst Capability: The single largest personnel factor. Good requirements capability makes a factor of 2 difference in the effort for the entire project. Competency in this area has the potential to reduce a project's overall effort from nominal more than any other factor.
Requirements Flexibility: Projects that allow the development team latitude in how they interpret requirements take less effort than projects that insist on rigid, literal interpretations of all requirements.
Time Constraint: Minimizing response time increases effort across the board. This is one reason that systems projects and real-time projects tend to consume more effort than other projects of similar sizes.
Use of Software Tools: Advanced tool sets can reduce effort significantly.
54
Is the estimate driven by the features to be delivered, or driven by budget and schedule?
Small projects
Medium projects (5 to 25 people, 3 to 12 months)
Large projects (team of 25 people or more, 6 to 12 months or more)
55
Both usually start with top-down or statistically based techniques.
Both eventually migrate toward bottom-up techniques. Iterative projects transition to refining their estimates with project data sooner than sequential projects do.
56
This seminar defines development stages as follows: Early
Middle
Sequential project:
Iterative projects:
Late
57
Many of the following slides will begin with a table that summarizes when each estimation technique applies.
58
59
60
Karl had the historical data of knowing how many people each banquet table seated.
He counted the number of tables and then computed his answer.
Lucy based her estimate on the documented fact of the room's rated capacity.
She used her judgment to estimate the room was 70 percent full.
The least accurate estimate came from Bill.
He used only judgment to create the answer.
61
Count something that will be a strong indicator of the software's final size or effort, for example:
Number of marketing requirements
Number of engineering requirements
Function points (more about these later)
Number of Web pages
Test cases
62
Create a rough estimate based on a count of a proxy.
Then tighten up the estimate later based on a more refined count.
63
A sample of at least 20 items is needed for the average to be statistically meaningful.
Be sure the same assumptions apply to the count that applied to your historical data.
(For example, counting attendance with a ticket scanner vs. manually counting everyone at the gate.)
64
65
Quantities you can count, with the historical data needed to convert each count to an estimate:
Marketing requirements
Features
Use cases (average effort or calendar time per use case)
Software requirements
Function points
Change requests (average effort per small, medium, and large change request)
Web pages
Reports
Dialog boxes
66
67
68
69
Calibration is used to convert counts to estimates:
lines of code to effort
requirements to number of test cases
etc.
Calibration can use three kinds of data:
Industry data: data from other organizations that develop the same kind of software
Historical data: data from the organization that will conduct the project
Project data: data generated earlier in the same project
Historical data and project data are highly useful because they support creation of highly accurate estimates.
70
71
How complex is the software, and what are its execution time constraints?
Can the organization commit to stable requirements, or must the project cope with requirements changes?
Is the project manager free to remove a problem team member from the project?
Is the team free to concentrate on the current project, or are its members frequently interrupted by other work?
72
Can the organization add team members to the new project when it plans to?
Does the organization support the use of effective design, construction, and quality assurance practices?
Does the organization operate in a regulated environment with mandated documentation?
Can the project manager depend on team members staying with the project until it is complete?
73
Size
Effort (staff months)
Time (calendar months)
Defects (classified by severity)
74
Convert the data to a model, for example:
Developers average X lines of code per staff month.
Our team is averaging X staff hours per use case to create the design.
Our testers create test cases at a rate of X hours per test case.
In our environment, we average X lines of code per function point.
On this project so far, defect correction work has averaged X hours per defect.
Models like these are all linear.
The math is the same whether you're building a 10,000-LOC system or a much larger one, so watch out for diseconomies of scale.
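A linear calibration model of this kind reduces to one division and one multiplication. The function and the numbers below are illustrative, not from the seminar's data.

```python
# Sketch of a linear calibration model: historical productivity
# converts a count into an effort estimate. Numbers are illustrative.

def effort_from_count(count, historical_count, historical_effort):
    """Linear model: effort scales with count at the historical rate."""
    rate = historical_effort / historical_count  # e.g., staff hours per use case
    return count * rate

# Suppose 30 use cases historically took 360 staff hours (12 hours each).
# Estimate a new project with 45 use cases:
print(effort_from_count(45, 30, 360))  # 540.0 staff hours
```

Remember the caveat above: the model is linear, so it is only trustworthy when the new project is similar in size to the projects that produced the historical rate.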
75
Collect data from your current project and use that as a basis for estimating the remainder of the project.
The goal should be to switch from using organizational or industry-average data to project data as soon as possible.
The more iterative your project is, the sooner you'll be able to make that switch.
76
Your organization might be at the top end of the industry-average productivity range, or at the bottom.
Using your own data will reduce variability in your estimate arising from that uncertainty.
77
78
Technique: Use of Structured Process | Use of Estimation Checklist | Estimating Task Effort in Ranges | Comparing Task Estimates to Actuals
What's estimated: Effort, Schedule, Features | Effort, Schedule, Features | Size, Effort, Schedule, Features | Size, Effort, Schedule, Features
Size of project: S M L | S M L | S M L | S M L
Development stage: Early–Late | Early–Late | Early–Late | Middle–Late
Iterative or sequential: Both | Both | Both | Both
Accuracy possible: High | High | High | N/A
Individual expert judgment is the most common estimation technique.
In one survey, 83% of estimators used "informal analogy" as their primary estimation technique.
Being expert in the technology or in software development is not the same as being expert at estimating.
The time needed to code and debug a particular feature or task is best estimated by the person who will do the work.
So have the people who will actually do the work create the estimates.
Estimates prepared by people who aren't doing the work are less accurate and
more likely to underestimate than estimates prepared by the estimator-developers themselves.
79
The result is that a one-line entry on the schedule can conceal a large amount of unaccounted-for work.
80
Decompose estimates into tasks that will require no more than about two days of effort each.
Tasks larger than that will contain too many places where unexpected work can hide.
Ending up with estimates at the half-day or full-day level of granularity is ideal.
Create both Best Case and Worst Case estimates to stimulate thinking about the full range of possible outcomes.
81
Estimate the Most Likely Case using expert judgment.
82
83
Keep a list of your estimates and compare them to actual outcomes.
Compute the Magnitude of Relative Error (MRE) for each estimate: MRE = |ActualResult - EstimatedResult| / ActualResult.
84
Estimated Days to Complete
Feature | Best Case | Worst Case | Expected Case | Actual Outcome | MRE | In Range from Best Case to Worst Case?
Feature 1 | 1.25 | 2 | 1.54 | 2 | 23% | Yes
Feature 2 | 1.5 | 2.5 | 1.83 | 2.5 | 27% | Yes
Feature 3 | 2 | 3 | 2.33 | 1.25 | 87% | No
Feature 4 | 0.75 | 2 | 1.13 | 1.5 | 25% | Yes
Feature 5 | 0.5 | 1.25 | 0.79 | 1 | 21% | Yes
Feature 6 | 0.25 | 0.5 | 0.46 | 0.5 | 8% | Yes
TOTAL | 10.50 | 18.25 | 13.625 | 16.25 | 80% | Yes
Average MRE: 29%
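The MRE column above is a direct application of the formula MRE = |actual - estimate| / actual. A minimal sketch:

```python
# Magnitude of Relative Error (MRE):
#   MRE = |actual - estimate| / actual

def mre(estimate, actual):
    return abs(actual - estimate) / actual

# Feature 1 from the table: expected 1.54 days, actual 2 days
print(f"{mre(1.54, 2):.0%}")  # 23%
```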
Technique: Decomposition by Feature or Task | Decomposition by Work Breakdown Structure (WBS) | Computing Best and Worst Cases from Standard Deviation
What's estimated: Size, Effort, Features | Effort | Effort, Schedule
Size of project: S M L | S M L | S M L
Development stage: Early–Late (small projects); Middle–Late (medium and large projects) | Early–Middle | Early–Late (small projects); Middle–Late (medium and large projects)
Iterative or sequential: Both | Both | Both
Accuracy possible: Medium–High | Medium | Medium
85
The errors on the high side and the errors on the low side tend to cancel each other out when many small estimates are summed.
86
Decomposing a project via an activity-based work breakdown structure (WBS) helps ensure that no major activity is forgotten.
The table on the next slide shows a generic, activity-based WBS for software projects.
The left column lists categories of activities, such as planning, requirements, and coding.
The other columns list the kinds of work within each category.
87
88
Combine the column descriptions with the categories—for example, Create/Do Planning, Manage Planning, Review Planning, Create/Do Requirements Work, Manage Requirements Work, Review Requirements Work, Create/Do Coding, Manage Coding, Review Coding, and so on. These combinations represent the most common ones. You will probably need to extend the list to include at least a few additional entries related to specifics of your organization's software-development approach. You might also decide to exclude some of this WBS's categories.
To deliver both the first task and the second task on time, you have to hit both single-point estimates.
Statistically, those odds are multiplied together, so the odds of delivering both on time are 1/4 times 1/4, or 1 in 16.
To complete 10 tasks on time you have to multiply the 1/4s 10 times, which gives odds of less than 1 in 1,000,000.
The odds of 1 in 4 might not seem so bad at the individual task level, but the compounded odds across many tasks are devastating.
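The compounding arithmetic is worth seeing directly:

```python
# Compounding the odds: if each task has a 1-in-4 chance of finishing
# on time, the chance that ALL tasks finish on time shrinks fast.

per_task_odds = 0.25
for n in (1, 2, 10):
    print(f"{n} tasks on time: {per_task_odds ** n:.2e}")

# 0.25**10 is about 9.5e-07: less than 1 chance in a million.
```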
89
This is based on the assumption that the minimum is
90
91
92
Weeks to Complete
Feature | Best Case (25% Likely) | Most Likely Case | Worst Case (75% Likely) | Expected Case (50% Likely)
Feature 1 | 1.6 | 2.0 | 3.0 | 2.10
Feature 2 | 1.8 | 2.5 | 4.0 | 2.63
Feature 3 | 2.0 | 3.0 | 4.2 | 3.03
Feature 4 | 0.8 | 1.2 | 1.6 | 1.20
Feature 5 | 3.8 | 4.5 | 5.2 | 4.50
Feature 6 | 3.8 | 5.0 | 6.0 | 4.97
Feature 7 | 2.2 | 2.4 | 3.4 | 2.53
Feature 8 | 0.8 | 1.2 | 2.2 | 1.30
Feature 9 | 1.6 | 2.5 | 3.0 | 2.43
Feature 10 | 1.6 | 4.0 | 6.0 | 3.93
TOTAL | 20.0 | 28.3 | 38.6 | 28.62
Percentage Confident | Calculation
2% | Expected case - (2 x StandardDeviation)
10% | Expected case - (1.28 x StandardDeviation)
16% | Expected case - (1 x StandardDeviation)
20% | Expected case - (0.84 x StandardDeviation)
25% | Expected case - (0.67 x StandardDeviation)
30% | Expected case - (0.52 x StandardDeviation)
40% | Expected case - (0.25 x StandardDeviation)
50% | Expected case
60% | Expected case + (0.25 x StandardDeviation)
70% | Expected case + (0.52 x StandardDeviation)
75% | Expected case + (0.67 x StandardDeviation)
80% | Expected case + (0.84 x StandardDeviation)
84% | Expected case + (1 x StandardDeviation)
90% | Expected case + (1.28 x StandardDeviation)
98% | Expected case + (2 x StandardDeviation)
93
94
If this percentage of your actual outcomes fall within your estimation range... | ...use this number as the divisor in the standard deviation calculation for individual estimates
10% | 0.25
20% | 0.51
30% | 0.77
40% | 1.0
50% | 1.4
60% | 1.7
70% | 2.1
80% | 2.6
90% | 3.3
99.7% | 6.0
Don't divide the range from best case to worst case by 6 unless your best and worst cases really do bracket 99.7% of actual outcomes.
Choose a divisor based on the demonstrated accuracy of your estimation history instead.
Focus on making your Expected Case estimates accurate.
If the individual estimates are accurate, aggregation will not create problems.
If the individual estimates are not accurate, aggregation will compound the inaccuracy.
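The whole pipeline, from the PERT expected case through standard-deviation aggregation to a percent-confident estimate, can be sketched using the feature table above. The divisor of 2.1 assumes roughly 70% of past actuals fell inside the estimation range (per the divisor table); substitute your own history.

```python
import math

# Sketch: PERT expected case plus standard-deviation aggregation for
# the feature table above (best/worst = 25%/75% points).

features = [  # (best, most_likely, worst), in weeks
    (1.6, 2.0, 3.0), (1.8, 2.5, 4.0), (2.0, 3.0, 4.2), (0.8, 1.2, 1.6),
    (3.8, 4.5, 5.2), (3.8, 5.0, 6.0), (2.2, 2.4, 3.4), (0.8, 1.2, 2.2),
    (1.6, 2.5, 3.0), (1.6, 4.0, 6.0),
]

DIVISOR = 2.1  # assumes ~70% of past actuals fell in the estimate range

def expected(best, likely, worst):
    return (best + 4 * likely + worst) / 6  # PERT formula

total_expected = sum(expected(b, m, w) for b, m, w in features)

# Individual SDs combine as the square root of the sum of squares,
# not as a simple sum.
total_sd = math.sqrt(sum(((w - b) / DIVISOR) ** 2 for b, m, w in features))

print(f"Expected case: {total_expected:.2f} weeks")  # ~28.63
print(f"75% confident: {total_expected + 0.67 * total_sd:.2f} weeks")
```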
95
Technique: Estimation by Analogy
What's estimated: Size, Effort, Schedule, Features
Size of project: S M L
Development stage: Early–Late
Iterative or sequential: Both
Accuracy possible: Medium
96
97
98
99
100
AccSellerator 1.0 (actual sizes):
Database: 10 tables
User interface: 14 Web pages
Graphs and reports: 10 graphs + 8 reports
Foundation classes: 15 classes
Business rules: ???
Triad 1.0 (estimated sizes):
Database: 14 tables
User interface: 19 Web pages
Graphs and reports: 14 graphs + 16 reports
Foundation classes: 15 classes
Business rules: ???
101
Subsystem | Actual Size of AccSellerator 1.0 | Estimated Size of Triad 1.0 | Multiplication Factor
Database | 10 tables | 14 tables | 1.4
User interface | 14 Web pages | 19 Web pages | 1.4
Graphs and reports | 10 graphs + 8 reports | 14 graphs + 16 reports | 1.7
Foundation classes | 15 classes | 15 classes | 1.0
Business rules | ??? | ??? | 1.5
The factors of 1.4 for database, 1.4 for user interface, and 1.0 for foundation classes seem straightforward. The factor of 1.7 for graphs and reports is a little tricky. Should graphs be weighted the same as reports? Maybe. Graphs might require more work than reports, or vice versa. If we had access to the code base for AccSellerator 1.0, we could check whether graphs and reports should be weighted equally or whether one should be weighted more heavily than the other. In this case, we'll just assume they're weighted equally. We should document this assumption so that we can retrace our steps later, if we need to. The business rules entry is also problematic. The team in the case study didn't find anything they could count. For sake of the example, we'll just accept their claim that the business rules for Triad will be about 50% more complicated than the business rules were in AccSellerator.
102
Computing Size of Triad 1.0 Based on Comparison to AccSellerator 1.0
Subsystem | Code Size of AccSellerator 1.0 | Multiplication Factor | Estimated Code Size of Triad 1.0
Database | 5,000 | 1.4 | 7,000
User interface | 14,000 | 1.4 | 19,600
Graphs and reports | 9,000 | 1.7 | 15,300
Foundation classes | 4,500 | 1.0 | 4,500
Business rules | 11,000 | 1.5 | 16,500
TOTAL | 43,500 | | 62,900
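The table's arithmetic, old subsystem size times per-subsystem multiplication factor, is easy to reproduce:

```python
# Sketch: Triad 1.0's estimated size from AccSellerator 1.0's actual
# sizes and the per-subsystem multiplication factors in the table.

subsystems = {  # name: (AccSellerator 1.0 LOC, multiplication factor)
    "Database":           (5_000, 1.4),
    "User interface":     (14_000, 1.4),
    "Graphs and reports": (9_000, 1.7),
    "Foundation classes": (4_500, 1.0),
    "Business rules":     (11_000, 1.5),
}

estimated = {name: loc * factor for name, (loc, factor) in subsystems.items()}
for name, size in estimated.items():
    print(f"{name}: {size:,.0f} LOC")
print(f"TOTAL: {sum(estimated.values()):,.0f} LOC")  # 62,900
```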
103
Look for the following major sources of inconsistency:
Significantly different sizes between the old and new projects—that is, diseconomies of scale.
Different technologies (for example, one project in C# and the other in Java).
Significantly different team members (for small teams) or team capabilities.
Significantly different kinds of software.
104
Should we fudge the business rules number upward to be conservative? No.
The focus of the estimate should be on accuracy, not conservatism. Once you move the estimate's focus away from accuracy, bias can creep in.
Fudging upward would produce an effort range of 38 to 49 staff months rather than the range the data actually supports.
105
106
"That feature will require exactly 253 lines of code.“ Difficult to directly estimate how many test cases your
How many defects to expect How many classes you'll end up with etc.
107
If you want to estimate a number of test cases, you might count a related quantity, such as the number of requirements.
If you want to estimate size in lines of code (LOC), a proxy can be counted earlier and more easily than the code itself.
108
Once you find your proxy: estimate or count the number of proxy items, and then use a calibration value to convert the count into the estimate you need.
We will discuss some of the most useful proxy-based techniques.
The theme of these techniques is that the whole has greater statistical validity than the parts.
Proxy techniques are useful for creating whole-project or whole-iteration estimates.
They are not good for creating detailed task-by-task or feature-by-feature estimates.
109
Then use historical data about how many lines of code the average feature of each size requires.
The table below shows an example of how such an estimate might be computed.
110
The Average Lines of Code per feature should be based on your historical data and should be fixed before the estimation begins.
The result should be presented as "96,000 lines of code," or even "100,000 lines of code," with no more precision than the technique supports.
As a rule of thumb, the differences in size between adjacent categories should be at least a factor of 2.
To calibrate the categories, look at past systems and classify each feature as Very Small, Small, Medium, Large, or Very Large.
Then count the total number of lines of code for the features in each category.
Divide that by the number of features to arrive at the average lines of code per feature for that category.
111
Size | Number of Features | Count of Total LOC | Average LOC
Very Small | 117 | 14,859 | 127
Small | 71 | 17,963 | 253
Medium | 56 | 28,000 | 500
Large | 169 | 171,366 | 1,014
Very Large | 119 | 237,762 | 1,998
The sizes of individual Small features could range from 50 lines of code to several hundred, so individual feature estimates will not be accurate.
Although the rolled-up estimate produced by fuzzy logic can be quite accurate, the estimates of individual features are not.
If you don't have at least 20 total features to estimate, the statistical basis of the approach breaks down.
112
Fuzzy logic can also be used to estimate effort if you have historical data on average effort per feature in each size category.
113
Size | Average Staff Days per Feature | Number of Features | Estimated Effort (Staff Days)
Very Small | 4.2 | 22 | 92.4
Small | 8.4 | 15 | 126
Medium | 17 | 10 | 170
Large | 34 | 30 | 1,020
Very Large | 67 | 27 | 1,809
TOTAL | | | 3,217
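The fuzzy-logic effort roll-up above is just a sum of products, historical average effort per size category times the number of features in that category:

```python
# Sketch: fuzzy-logic effort estimate from the table above.
# Average staff days per size come from historical data; the
# feature counts come from classifying the new project's features.

AVG_STAFF_DAYS = {"VS": 4.2, "S": 8.4, "M": 17, "L": 34, "VL": 67}
feature_counts = {"VS": 22, "S": 15, "M": 10, "L": 30, "VL": 27}

total = sum(AVG_STAFF_DAYS[size] * n for size, n in feature_counts.items())
print(f"Estimated effort: {total:,.1f} staff days")  # 3,217.4
```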
Use the standard components approach to estimate size if you build similar systems repeatedly.
Find relevant elements to count in your previous systems:
Dynamic Web pages
Static Web pages
Business rules
Screens, dialogs, reports, etc.
Compute the average lines of code per component for your past systems.
Estimate the number of standard components you'll have in the new system.
Compute the size of the new program based on past sizes.
114
115
Customers will say, "How can I know whether I want that feature if I don't know what it costs?"
A good estimator will say, "I can't tell you exactly what it will cost, but I can tell you how its cost compares to the costs of other features."
116
Nontechnical stakeholders are typically not asking for an estimate in staff months or lines of code.
They are asking about the general size of a specific feature.
Developers classify each feature's size relative to other features; business staff classify each feature's business value the same way.
These two sets of entries are then combined, as shown in the tables that follow.
117
118
119
Approximate net business value by Business Value (rows) and Development Cost (columns):
Business Value | Cost: Extra Large | Large | Medium | Small
Extra Large | | 4 | 6 | 7
Large | –4 | | 2 | 3
Medium | –6 | –2 | | 1
Small | –7 | –3 | –1 |
Feature | Business Value | Development Cost | Approximate Net Business Value
Feature A | L | S | 3
Feature F | L | M | 2
Feature C | L | L |
Feature D | M | M |
Feature G | S | S |
Feature ZZ | S | S |
...
Feature B | S | L | –3
120
Each team member estimates pieces of the project individually.
Then the team meets to compare estimates.
Work until reaching consensus on the high and low ends of the range.
Don't just average the estimates and accept that.
Compute the average, but it's important to discuss the differences between the individual estimates.
Don't just take the calculated average automatically.
Arrive at a consensus estimate that the whole group accepts.
If there is an impasse, you can't vote; you must discuss the differences and obtain buy-in from all group members.
121
Studies have found that the use of 3 to 5 experts with different backgrounds produces better estimates than reliance on a single expert.
122
123
124
125
126
127
Estimation techniques that rely on averaging individual estimates assume that the individual errors will cancel out.
In studies, however, 20% of the groups' initial estimation ranges did not include the actual outcome at all.
This means that averaging their initial estimates cannot, by itself, produce an accurate result.
Using Wideband Delphi, one-third of the groups whose initial ranges had excluded the correct answer produced final estimates that included it.
In other words, the Wideband Delphi discussion can produce an estimate better than any of the individual estimates it started from.
128
a combined need for uncommon usability,
algorithmic complexity,
exceptional performance,
intricate business rules,
etc.
It's useful for flushing out estimation assumptions.
129
130
Features
Requirements
Use cases
Function points
Web pages
GUI components (windows, dialog boxes, reports, and so on)
Database tables
Interface definitions
Classes
Functions/subroutines
Lines of code
131
Using lines of code for software estimation is similar to Churchill's verdict on democracy:
the LOC measure is a terrible way to measure software size, except that all the other ways are worse.
The LOC measure is the norm of software estimation.
132
Disadvantages of using lines of code:
Simple models such as "lines of code per staff month" are error-prone because of differences among languages, programmers, and projects.
LOC can't be used as a basis for estimating an individual's task assignments.
A project that requires more code complexity than the projects behind the historical data can undermine the estimate.
Using the LOC measure as the basis for estimating requirements work is problematic, because requirements are written before any code exists.
Lines of code are difficult to estimate directly, and so must usually be estimated through a proxy.
What exactly constitutes a line of code must be defined carefully.
133
One alternative to the LOC measure is function points.
A function point is a synthetic measure of program size based on the program's inputs, outputs, queries, and files.
Function points are easier to calculate from a requirements specification than lines of code are.
They provide a basis for computing size in lines of code.
Many different methods for counting function points exist.
The standard for function-point counting is maintained by the International Function Point Users Group (IFPUG).
134
The number of function points in a program is based on the number and complexity of each of the following items:
External Inputs - screens, forms, dialog boxes, or control signals through which an end user or another program adds, deletes, or changes a program's data.
External Outputs - screens, reports, graphs, or control signals that the program generates for use by an end user or another program.
135
External Queries - input/output combinations in which an input results in an immediate, simple output.
Internal Logical Files - major logical groups of end-user data or control information that are completely controlled by the program.
External Interface Files - files controlled by other programs with which the program being counted interacts.
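Counting the five element types and weighting each by complexity yields an unadjusted function-point total. The weights below are the commonly published IFPUG low/medium/high complexity weights, included here for illustration; verify them against the current IFPUG counting manual before relying on them.

```python
# Sketch: unadjusted function-point count from the five element types
# described above. Weights are the commonly published IFPUG values
# (an assumption here, not taken from this seminar).

WEIGHTS = {  # element type: (low, medium, high complexity weight)
    "external_inputs":          (3, 4, 6),
    "external_outputs":         (4, 5, 7),
    "external_queries":         (3, 4, 6),
    "internal_logical_files":   (7, 10, 15),
    "external_interface_files": (5, 7, 10),
}

def unadjusted_fp(counts):
    """counts: {element type: (n_low, n_medium, n_high)}"""
    return sum(
        n * w
        for kind, ns in counts.items()
        for n, w in zip(ns, WEIGHTS[kind])
    )

# Hypothetical program:
example = {
    "external_inputs":          (6, 2, 0),
    "external_outputs":         (7, 7, 0),
    "external_queries":         (0, 2, 4),
    "internal_logical_files":   (5, 2, 0),
    "external_interface_files": (9, 0, 2),
}
print(unadjusted_fp(example))  # 241
```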
136
137
138
The Microsoft Windows NT project produced code at a much lower rate than typical business systems,
because it was a systems software project rather than an applications project.
139
Approximate your productivity by using industry-average data for your type of software.
140
141
If your historical data is for projects within a narrow size range, and the new project falls within that range,
use a linear model to compute the effort estimate for the new project.
142
If the historical data included effort only for development and testing, the estimate it produces will cover only development and testing.
If the historical data also included effort for requirements, project management, and similar activities, the estimate will cover those too.
Industry-average figures are inconsistent about what they include: some exclude management work, some exclude requirements while including all other development work.
That inconsistency is part of why industry-average data varies as much as it does.
143
144
145
The Basic Schedule Equation
A rule of thumb is that you can estimate schedule from effort: ScheduleInMonths = 3.0 x StaffMonths^(1/3)
Sometimes the 3.0 coefficient is a 2.0, 2.5, 3.5, or 4.0, depending on the organization.
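The equation also implies an average team size, effort divided by schedule. A minimal sketch, with the coefficient left adjustable as the slide suggests:

```python
# Sketch of the Basic Schedule Equation:
#   schedule (months) = coefficient * staff_months ** (1/3)
# The 3.0 coefficient varies by organization (roughly 2.0 to 4.0).

def basic_schedule(staff_months, coefficient=3.0):
    months = coefficient * staff_months ** (1 / 3)
    avg_team_size = staff_months / months
    return months, avg_team_size

# A hypothetical 65-staff-month project:
months, team = basic_schedule(65)
print(f"~{months:.1f} months with an average team of ~{team:.1f} people")
```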
146
Switch to some other technique once you know more about the project and its planned staffing.
147
148
149
If the nominal schedule is 12 months with a team of 7 developers, you can't simply cut it in half by doubling the team.
Larger teams require more coordination and management overhead.
Larger teams introduce more communication paths, which introduce more chances for miscommunication.
Shorter schedules require more work to be done in parallel, and the more work done in parallel, the more rework arises from dependencies among the parallel tasks.
150
The consensus of researchers is that schedule compression of more than about 25% from the nominal schedule is not possible.
Not by working harder, not by working smarter, and not by finding a silver bullet.
151
152
160