Software cost estimation aims to predict the most realistic effort required to finish a software project, and it is therefore critical to the success of software project management. It affects nearly all management activities, including project bidding, resource allocation and project planning. It is influenced by a number of factors, such as implementation efficiency and the cost of the various reviews and studies completed prior to the software development stage. Accurate cost estimation helps to complete a project on time and within budget, and this importance has led to extensive research into software cost estimation methods. Some important software cost estimation methods are studied in this research work. In addition, we set out our own criteria, which are used to compare the selected methods. We also give a score for each evaluation criterion, so that the different cost estimation methods can be compared numerically. Our observations show that it is best to use a number of different estimating techniques or cost models, compare the results, and then determine the reasons for any large variations. None of the methods is necessarily better or worse than the others. We found, in fact, that their strengths and weaknesses often complement each other. Therefore, the main conclusion is that no single technique is best for every situation, and the results of a number of different approaches need to be carefully considered to discover which is most likely to produce realistic estimates.
Estimating the costs of software projects is a critical activity that requires proper methods and techniques in order to achieve a good estimate. It is a challenging task that poses many obstacles. The size of the software, and how accurately that size is known, has a great effect on the accuracy of the estimate. Project management also plays a vital role in guiding the estimation process. Much research has been carried out that reflects the rising demand for high-quality software through effective cost estimation [
Software engineers have to apply theories, tools and methods in a software project in order to solve a problem. However, they must also work within predefined financial constraints. A vital issue that is closely related to the financial aspects of a software project is the accurate estimation of the software cost involved. This helps to keep a software project within its set budget [
Software cost estimation is a very challenging activity in software project management because predicting the cost is difficult at the early stage of the software’s development [
It is important to state that software cost estimation is a continuous activity that begins at the proposal stage and carries on throughout the life of the project. Project cost management includes the processes required to ensure that the project is finished within the approved budget. The main processes include [
・ Estimating the costs (including top-down and bottom-up estimates, parametric modelling, etc.);
・ Determining the budget (the cost baseline);
・ Controlling the costs (Earned Value Management (EVM)).
Many approaches to the software cost estimation process have been proposed by scientists and researchers trying to create an accurate estimation technique. This research work gives an extensive overview. It addresses five fundamental software cost estimation approaches and compares them on the basis of evaluation criteria, which are thoroughly examined and used throughout the work.
It is essential to point out that the novelties of this work include: studying important software cost estimation methods, setting out basic criteria (i.e., ease of use, adaptability, accuracy, consistency, interpretability, automatability, tool support, empirical validation, sensitivity, and handling of imprecision and uncertainty), comparing selected cost estimation methods based on these evaluation criteria, and giving a score for each evaluation criterion in order to compare the different cost estimation methods numerically. Moreover, this work provides an implementation of one of the well-known software cost estimation models, which indicates both the time and the effort required to complete a software project of a specific size.
The paper is structured as follows. Section 2 lists and summarizes some of the existing approaches to software cost estimation. Section 3 lists the evaluation characteristics, and Section 4 provides a discussion and comparison of the different approaches. Finally, Section 5 gives the conclusion and future directions.
In this section, we list and summarize some of the existing approaches to software cost estimation. For each approach, we also describe its mechanisms and features. Cost estimation techniques can be divided into two broad categories: those that use source lines of code (SLOC) as their input and those that do not.
COCOMO is considered a very important model for calculating a software cost estimate. It uses an algorithmic formula to estimate the software’s cost [
The first model, the Basic one, computes software effort and cost as a function of program size. It is primarily used for small to medium-sized software projects in order to produce a quick, rough estimate. Basic COCOMO is, therefore, considered effective in circumstances where only a rough effort estimate is required. The equation for estimating the software effort for this basic model is:
$$\mathrm{Effort} = a \times (\mathrm{SIZE})^{b} \tag{1}$$
SIZE in this equation is measured in thousands of delivered source instructions (KLOC, i.e., thousands of lines of code). The coefficients a and b are the productivity coefficient and the scale factor coefficient, respectively. The values of the coefficients depend on the mode of the project. Boehm proposed three project modes: organic, semi-detached and embedded. The organic mode is used for small projects of 2 - 50 KLOC. The semi-detached mode is for medium-sized projects of 50 - 300 KLOC. The embedded mode is for complex, large projects that are typically over 300 KLOC.
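To make Equation (1) concrete, the following minimal Python sketch evaluates Basic COCOMO for the three modes. The coefficient values are not given in this paper; they are the commonly cited COCOMO 81 constants and should be read as an assumption here, as should the names `BASIC_COEFFICIENTS` and `basic_cocomo_effort`.

```python
# Basic COCOMO effort, Equation (1): Effort = a * SIZE**b, with SIZE in KLOC.
# The (a, b) pairs below are the commonly cited COCOMO 81 constants.
# They are an assumption: this paper does not list them.
BASIC_COEFFICIENTS = {
    "organic": (2.4, 1.05),
    "semi-detached": (3.0, 1.12),
    "embedded": (3.6, 1.20),
}


def basic_cocomo_effort(size_kloc: float, mode: str) -> float:
    """Return the estimated effort in person-months for a given size and mode."""
    if size_kloc <= 0:
        raise ValueError("size_kloc must be positive")
    a, b = BASIC_COEFFICIENTS[mode]
    return a * size_kloc ** b


# Compare the three modes for a hypothetical 32 KLOC project.
for mode in BASIC_COEFFICIENTS:
    print(f"{mode:>13}: {basic_cocomo_effort(32.0, mode):.1f} person-months")
```

Because the exponent b grows from the organic to the embedded mode, the same size yields a progressively larger effort estimate.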
The intermediate model computes the effort as a function of the program’s size and a set of cost drivers. It differs from the Basic model in that it takes the software’s development environment into account, which Basic COCOMO fails to do. Intermediate COCOMO has 15 cost drivers that add a level of accuracy to Basic COCOMO. These cost drivers fall into four classes: computer attributes, product attributes, project attributes and personnel attributes.
The equation that is used for estimating the software Effort for the intermediate model is [
$$\mathrm{Effort} = a \times (\mathrm{SIZE})^{b} \times m(X) \tag{2}$$
In this equation, m(X) represents the effort adjustment factor, which is the product of the 15 effort multipliers. The third model (the detailed model) has two more capabilities: phase-sensitive effort multipliers and a three-level product hierarchy. The three levels are module, subsystem and system, and they are used to derive a more accurate estimate.
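Equation (2) can be sketched in a few lines of Python. The function name, the multiplier ratings and the coefficients a = 3.2 and b = 1.05 used in the example are illustrative assumptions, not values taken from this paper.

```python
from math import prod


def intermediate_cocomo_effort(size_kloc, a, b, effort_multipliers):
    """Equation (2): Effort = a * SIZE**b * m(X).

    m(X) is the effort adjustment factor, i.e. the product of the
    effort multipliers (one per cost driver).
    """
    eaf = prod(effort_multipliers)  # effort adjustment factor m(X)
    return a * size_kloc ** b * eaf


# Hypothetical ratings: 13 nominal drivers (1.0) and two non-nominal ones.
multipliers = [1.0] * 13 + [1.15, 0.91]
# a and b are assumed coefficients, chosen only for illustration.
effort = intermediate_cocomo_effort(32.0, a=3.2, b=1.05, effort_multipliers=multipliers)
print(f"Adjusted effort: {effort:.1f} person-months")
```

When every driver is rated nominal, m(X) equals 1 and the estimate reduces to the unadjusted form a × SIZE^b.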
The authors in [
This paragraph briefly describes the architecture of the system for software cost estimation that the authors proposed. Its main parameters, which are used as inputs for the proposed methods, are the size, the cost factors and the scale factors. These parameters all come from the actual dataset that was collected, as per the project specification. The second step is to apply principal component analysis (PCA). This is done by calculating the correlation coefficient matrix and its eigenvalues, after which the number of principal components can be determined. These components are fed as input into the neural network in order to train it on the dataset. The output layer then sends the size, effort multiplier and scale factor values to COCOMO II, which uses these inputs to estimate the software cost. This result was based on the COCOMO sample dataset, which is widely used by researchers. It consists of over 161 historical projects collected from various countries all over the world. The results show that the hybrid technique provides a more accurate cost estimate than the same type of algorithm when PCA and the neural network are not applied.
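The text does not say how the number of principal components is chosen once the eigenvalues are known. One common rule, shown here purely as an assumption, keeps the smallest number of components whose eigenvalues account for a chosen share of the total variance. The function name and the 90% default threshold are hypothetical.

```python
def choose_component_count(eigenvalues, min_explained=0.90):
    """Return the smallest k such that the k largest eigenvalues explain
    at least `min_explained` of the total variance.

    This cumulative-variance rule is an assumed selection criterion;
    the surveyed work may have used a different one.
    """
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    if total <= 0:
        raise ValueError("eigenvalues must have a positive sum")
    cumulative = 0.0
    for k, value in enumerate(ordered, start=1):
        cumulative += value
        if cumulative / total >= min_explained:
            return k
    return len(ordered)


# Hypothetical eigenvalues of a correlation coefficient matrix.
print(choose_component_count([4.0, 3.0, 2.0, 1.0], min_explained=0.85))
```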
The Putnam/SLIM estimating method was developed in the late 1970s by Larry Putnam of Quantitative Software Management, as is highlighted in the references [
$$\frac{B^{1/3} \times \mathrm{Size}}{\mathrm{Productivity}} = \mathrm{Effort}^{1/3} \times \mathrm{Time}^{4/3} \tag{3}$$
In practical use, the software equation is solved for effort when making an estimate for a software task [
$$\mathrm{Effort} = \left[ \frac{\mathrm{Size}}{\mathrm{Productivity} \times \mathrm{Time}^{4/3}} \right]^{3} \times B \tag{4}$$
The inputs are the estimated size of the software at project completion and the process productivity of the organisation. The Time-Effort Curve is obtained by plotting effort as a function of time, and the points along the curve represent the estimated total effort needed to complete the project [
This method of estimating is quite sensitive to uncertainty in both the size and the process productivity. Putnam advocates obtaining the process productivity through calibration [
$$\mathrm{ProcessProductivity} = \frac{\mathrm{Size}}{\left[ \mathrm{Effort} / B \right]^{1/3} \times \mathrm{Time}^{4/3}} \tag{5}$$
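Equations (4) and (5) are two rearrangements of the same software equation, which the following Python sketch illustrates with a round trip. The function names and the sample inputs are assumptions, and the units of time and effort are left unspecified because the text does not state them.

```python
from math import isclose


def putnam_effort(size, productivity, time, b):
    """Equation (4): Effort = B * (Size / (Productivity * Time**(4/3)))**3."""
    return b * (size / (productivity * time ** (4.0 / 3.0))) ** 3


def process_productivity(size, effort, time, b):
    """Equation (5): calibrate productivity from a completed project."""
    return size / ((effort / b) ** (1.0 / 3.0) * time ** (4.0 / 3.0))


# Hypothetical project: estimate the effort, then recover the productivity.
size, productivity, time, b = 50_000, 13.0, 2.0, 0.000005
effort = putnam_effort(size, productivity, time, b)
recovered = process_productivity(size, effort, time, b)
print(isclose(recovered, productivity))  # calibration inverts the estimate

# Effort scales with Time**-4, so doubling the schedule divides it by 16.
print(effort / putnam_effort(size, productivity, 2 * time, b))
```

The fourth-power dependence on time is what makes the model so sensitive to schedule compression.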
SLIM’s Advantages
・ It utilizes linear programming in order to consider the development constraints of both the cost and effort required.
・ One of the Putnam model’s distinguishing features is that the total effort decreases as the time taken to finish the project extends. This is usually represented by a schedule relaxation parameter in other parametric models.
・ SLIM needs fewer parameters to generate an estimate than either COCOMO’81 or COCOMO II.
SLIM’s Drawbacks
・ This model depends on either knowing or being able to accurately estimate the size of the software (in lines of code) to be developed. There is frequently a lot of uncertainty about the size of the software, which can result in an inaccurate cost estimate. SLIM’s error percentage is 772.87% [
・ This model is extremely sensitive to development time, as decreasing this can greatly increase the number of people and months that are required for development.
・ It is not suitable for small projects.
Algorithmic models, such as COCOMO and Putnam, need the number of source lines of code (SLOC) to be estimated in order to obtain both the man-month and the duration estimates. Function Point Analysis is another method, which quantifies both the size and the complexity of a software system in terms of the functions that it delivers to the user. Allan Albrecht at IBM developed the Function Points Measurement method, which was first published in 1976 [
・ Counting the various user functions
・ Making adjustments for processing complexity
Currently, the five user function categories are: external output types, external input types, external interface file types, logical internal file types and external inquiry types. Albrecht recognized that the effort needed to provide a given level of functionality could vary, depending on environmental factors. For example, it is harder to provide input transactions to a program if much emphasis has been placed on either system throughput or end-user convenience. Therefore, Albrecht listed 14 processing complexity characteristics. Each is rated on a scale from 0 (no influence) to 5 (strong influence). In the next step, all the processing complexity points that have been assigned are summed. This sum is multiplied by 0.01 and then added to 0.65 to obtain the following weighting:
$$\mathrm{PCA} = 0.65 + 0.01 \times \left( \sum_{i=1}^{14} c_i \right) \tag{6}$$
where PCA is the processing complexity adjustment and c_i are the complexity factors.
As a result, the Function Points can vary by ± 35 percent from the original Function Counts. Once they have been computed, the Function Points can be used to compare the size of the proposed project with that of previous projects.
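The adjustment in Equation (6), and the ± 35 percent range it implies, can be checked with a short Python sketch. The function names and the sample function count are assumptions made for illustration.

```python
def processing_complexity_adjustment(ratings):
    """Equation (6): PCA = 0.65 + 0.01 * sum of the 14 ratings (each 0..5)."""
    if len(ratings) != 14:
        raise ValueError("exactly 14 complexity ratings are expected")
    if any(not 0 <= r <= 5 for r in ratings):
        raise ValueError("each rating must lie between 0 and 5")
    return 0.65 + 0.01 * sum(ratings)


def adjusted_function_points(function_count, ratings):
    """Scale the unadjusted function count by the complexity adjustment."""
    return function_count * processing_complexity_adjustment(ratings)


# The extremes of the rating scale give the -35% and +35% bounds.
print(processing_complexity_adjustment([0] * 14))
print(processing_complexity_adjustment([5] * 14))

# Hypothetical project: 100 unadjusted function counts, every factor rated 3.
print(adjusted_function_points(100, [3] * 14))
```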
There are a number of advantages of a function point analysis based model, and these are [
・ The function points can be estimated from either the requirements specifications or the design specifications, which makes it possible to estimate the development costs in the development’s early phases.
・ These function points are independent of the language, tools or methodologies that have been used for implementation.
・ Non-technical users are able to obtain a better understanding of what the function points are measuring, as the function points have been based on the system user’s own external view of the system.
The authors employed Wavelet Neural Network (WNN) in [
The WNN that was used in this study is made up of a total of three interconnected layers. These are the input layer, the hidden layer and an output layer that has a single unit, as shown in
acceptance is determined by deterministic criteria, instead of a probabilistic approach.
The idea is that the forward pass of back propagation remains undisturbed, while the backward step is carried out by threshold acceptance (TA), which updates the weights by treating all of them as a vector of decision variables. The authors adopted this TA concept in order to train the WNN, which is why the method is called the Threshold Acceptance Wavelet Neural Network (TAWNN) learning algorithm. The objective function of this training algorithm is given as:
$$\mathrm{MSE} = \sum_{k=1}^{n_p} (V_k - V_a)^2 \tag{7}$$
The study’s results demonstrate that the four WNN models used in these experiments produce better results than the other techniques. In terms of the mean magnitude of relative error (MMRE), both WNN-Morlet and WNN-Gaussian compare favourably with TAWNN-Morlet and TAWNN-Gaussian on the IBMDS and CF datasets.
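For readers unfamiliar with the metric, MMRE is conventionally computed as the average of |actual − predicted| / actual over all projects. The Python sketch below follows that standard definition. The function name and sample values are hypothetical and are not data from the study.

```python
def mmre(actual, predicted):
    """Mean magnitude of relative error: mean of |actual - predicted| / actual."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    if any(a == 0 for a in actual):
        raise ValueError("actual values must be non-zero")
    errors = [abs(a - p) / abs(a) for a, p in zip(actual, predicted)]
    return sum(errors) / len(errors)


# Hypothetical efforts in person-months; a lower MMRE means a better fit.
actual_effort = [100.0, 200.0, 400.0]
predicted_effort = [110.0, 180.0, 400.0]
print(f"MMRE = {mmre(actual_effort, predicted_effort):.3f}")
```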
Having described the software cost estimation approaches in the previous section, we now list the evaluation characteristics that we use to compare them (see
This refers to how simple and easy it is to use a certain technique or approach. The effort needed to estimate the cost of software development should be minimal. The approach used should preferably be simple enough to be applied in a reasonable amount of time. If a software cost estimation approach uses a complex formula and algorithm, it is said to have higher complexity and so might be undesirable.
The ability of a model or method to adjust to a new environment and fit the incremental style of development practices is called the adaptability of the model [
The definition of accuracy is how close a result is to the correct value [
Models that have been developed in different environments require calibration to work well. A model that consistently overestimates or underestimates is easier to calibrate than an inconsistent one. Alongside accuracy, consistency is therefore an important feature for estimation models [
The results of a modelling technique have to be interpretable. For example, if a modelling technique that produces hard-to-interpret results were identified as the best one, it would not be a useful recommendation. This is because project managers would, in practice, be unlikely to apply a model that they could not understand. This excludes techniques like Artificial Neural Networks [
As many techniques need intensive computation for accuracy, a technique that could be substantially automated [
Software cost estimation tools are able to improve accuracy by carrying out an automated calculation for the project. The tool-support characteristic indicates whether the proposed approach is supported by a tool. If it is, the major characteristics of this tool, such as its usability or efficiency, are highlighted [
The evaluation and validation of a model or a general approach is vital. If the model can be validated, the criterion for validation and the dataset on which it is validated are considered. Industry datasets are considered to be more reliable than student datasets or those from open sources [
The responsiveness of a model to an input stimulus is called sensitivity. In software development, a model is called sensitive when a small change in the input values produces a large change in the estimated effort. It is desirable to have low sensitivity in effort/time estimation.
It is common for all software development practices to take into account both the imprecision and the uncertainty associated with the processes. There is considerable imprecision in estimating the software’s size and much uncertainty in predicting the various factors associated with developing the software [
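One way to quantify sensitivity, offered here only as an illustrative assumption, is the ratio of the relative change in the estimate to the relative change in one input. In the Python sketch below, the helper name, the 1% step and the example model coefficients are all hypothetical.

```python
def relative_sensitivity(model, x, relative_step=0.01):
    """Relative change in model output divided by relative change in input.

    A value near 1 means the output moves in proportion to the input;
    larger values indicate a more sensitive model.
    """
    base = model(x)
    perturbed = model(x * (1.0 + relative_step))
    return ((perturbed - base) / base) / relative_step


def power_law_effort(size_kloc):
    """Hypothetical power-law effort model of the form a * SIZE**b."""
    return 2.4 * size_kloc ** 1.05


# For a power law, the sensitivity to size is close to the exponent b.
print(f"{relative_sensitivity(power_law_effort, 32.0):.3f}")
```

For a power-law model the result is nearly independent of the size at which it is evaluated, which makes the exponent a convenient summary of sensitivity to size.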
Ease of use is an important criterion for evaluating the various approaches. It captures the degree of simplicity of a given approach, that is, how easy the approach is to use. The COCOMO Model is easy to use on small projects. However, it can be difficult to use on large projects because of their complexity and the number of unknown variables involved. Thus, this approach scores 12 out of 15 for this characteristic. For the hybrid approach (both algorithmic and non-algorithmic methods) proposed by the authors of [
Adaptability is another important characteristic. It measures the degree to which a given approach can be adjusted to new changes and environments. The COCOMO Model adapts to changes, particularly for small projects. In addition, because it is a mathematical model, we can simply re-compute the values when changes occur. It is also important to take the project’s size into account, as this can affect the COCOMO Model’s adaptability. It therefore has a score of 8 out of 10. Similarly, for [
The COCOMO Model is a mathematical model, so it is very accurate as long as we feed it with almost correct values. However, it will produce the wrong result if we feed it with the wrong values for the variables. In the first case in our study, when we feed the COCOMO Model with almost correct values for the variables, its score for the accuracy characteristic is 12 out of 15. For [
An important feature of any estimation model is the consistency of its results. During our evaluation, we focus on whether or not we are able to determine the consistency level. The COCOMO Model, Rina et al. [
We assign a score for this characteristic based on how easily the results of the approach can be interpreted. The COCOMO Model gets 10 out of 10, as it produces numbers that everyone can both understand and interpret. It is, therefore, a useful approach. It is recommended that project managers apply such a model, as they will be able to understand the outputs. For [
When it comes to Automatable and “Supportability”, we did not find any tool from the survey that is automated fully for the COCOMO Model. Therefore, we did not discover a tool that could produce the results from A-Z directly and automatically. We can utilise a tool at a specific stage to calculate a particular value or parameter.
Validation is another important characteristic of an approach or model. COCOMO is considered a validated model because it is a mathematical model that can be run on datasets or in an empirical study in order to validate it. It therefore has a score of 10 out of 10 for this characteristic. The approach for [
Sensitivity is also an extremely important characteristic. A model is sensitive when a small change in the input values produces a big change in the estimated effort. The COCOMO Model is considered a sensitive model because it is a mathematical one: if we alter a power/exponent in the function, the difference between the old and new values will be significant. For example, 100^1 = 100, whereas 100^2 = 10,000. This is a very big difference for a very small change, from power 1 to power 2. COCOMO is, therefore, a sensitive model. Similarly, the Rina et al. [
This is a vital characteristic when different software cost estimation approaches are compared. In our evaluation, we determine whether a given model takes imprecision into consideration during the development of a software project. The COCOMO Model is a static model with several variables that must be known before the overall cost can be estimated. It also takes the software’s size into consideration. However, the model might not correctly handle uncertainty if the project is large and complex, because there will be unknown variables. Our score for both the COCOMO Model and Rina et al. [
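The per-criterion scores can be combined into a single figure for numerical comparison. The Python sketch below sums the awarded and maximum points. It includes only the COCOMO scores that are stated explicitly in this section, so the total is partial, and simple summation is an assumed aggregation rule rather than one prescribed in the text.

```python
# (awarded, maximum) scores stated in this section for the COCOMO Model.
# Criteria whose scores are not given in the surviving text are omitted.
cocomo_scores = {
    "ease of use": (12, 15),
    "adaptability": (8, 10),
    "accuracy": (12, 15),
    "interpretability": (10, 10),
    "empirical validation": (10, 10),
}


def aggregate_score(scores):
    """Sum awarded and maximum points (an assumed aggregation rule)."""
    awarded = sum(points for points, _ in scores.values())
    maximum = sum(limit for _, limit in scores.values())
    return awarded, maximum


awarded, maximum = aggregate_score(cocomo_scores)
print(f"COCOMO (partial): {awarded}/{maximum} = {awarded / maximum:.0%}")
```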
Software cost estimation is an essential activity that requires the right methods and techniques in order to achieve a good estimate. This is why we studied several cost estimation approaches in this work and then evaluated and compared five of them. These five approaches are the Constructive Cost Model (COCOMO), a feed-forward neural network with Principal Component Analysis (PCA), the Putnam model/SLIM, Function Point Analysis and the Wavelet Neural Network (WNN).
It is important to note here that we introduced different evaluation characteristics (i.e., ease of use, adaptability, accuracy, consistency, interpretability, automatability, tool support, empirical validation, sensitivity, and handling of imprecision and uncertainty) in order to compare these five software cost estimation approaches.
Our observations indicated that it is best for the project manager to use a number of different estimating techniques or cost models and then compare the results, before determining the reasons for large variations and documenting any assumptions that were made while making the estimates.
Keshta, I.M. (2017) Software Cost Estimation Approaches: A Survey. Journal of Software Engineering and Applications, 10, 824-842. https://doi.org/10.4236/jsea.2017.1010046
As one of the first algorithmic cost models, the Putnam estimating method [
$$\mathrm{Effort} = \left[ \frac{\mathrm{Size}}{\mathrm{Productivity} \times \mathrm{Time}^{4/3}} \right]^{3} \times B \tag{8}$$
By plotting effort as a function of time, the Time-Effort Curve is established. The various points along this curve represent the estimated total effort that is made in order to complete the project. A distinguishing feature of the model is that the total effort decreases as the time that is taken to complete the project gets extended. Other parametric models usually represent this with a schedule relaxation parameter.
The model is used as a case study in order to see the effect of a factor on the Time-Effort curve. The size in the first case was assumed to be 50,000 source lines of code, while factor B was 0.000005. The curves were plotted by utilising the productivity values of 15, 13 and 11. As productivity decreases, the time that is taken to finish the project increases.
In another case, which involved the same number of source lines of code and the same three productivity factors, factor B was increased in order to see its effect on the curve. The three sets of curves (from left to right) are for B values of 0.000005, 0.00005 and 0.0005, respectively. As factor B increases, the time taken to finish the project also increases.
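The two cases described above can be reproduced numerically with a minimal Python sketch of Equation (8), which is not necessarily the implementation used for the figures. The size, B and productivity values are those stated in the text. The function names and the target effort of 100 are assumptions, and the units of time and effort are left unspecified.

```python
def putnam_effort(size, productivity, time, b):
    """Equation (8): Effort = B * (Size / (Productivity * Time**(4/3)))**3."""
    return b * (size / (productivity * time ** (4.0 / 3.0))) ** 3


def time_for_effort(size, productivity, effort, b):
    """Equation (8) solved for Time at a given effort."""
    return (size / (productivity * (effort / b) ** (1.0 / 3.0))) ** 0.75


SIZE = 50_000          # source lines of code, as in the case study
TARGET_EFFORT = 100.0  # hypothetical effort level used to compare curves

for b in (0.000005, 0.00005, 0.0005):
    for productivity in (15, 13, 11):
        t = time_for_effort(SIZE, productivity, TARGET_EFFORT, b)
        print(f"B={b:<8} productivity={productivity:>2} time={t:.2f}")

# Points on one Time-Effort curve: effort falls as the schedule extends.
for t in (1.0, 2.0, 3.0, 4.0):
    print(t, round(putnam_effort(SIZE, 15, t, 0.000005), 2))
```

At a fixed effort level, the computed time grows both when productivity falls and when B rises, matching the shifts of the curves described in the two cases.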
As can be seen from the two figures above (
The Putnam model is based on knowing, or being able to accurately estimate, the size (in lines of code) of the software that is to be developed. There is frequently much uncertainty about the software’s size, which can result in an inaccurate cost estimate. In addition, this model is unsuitable for very small projects.