Open Access Library Journal
Vol.05 No.08(2018), Article ID:86677,13 pages
10.4236/oalib.1104767
Bayesian Predictive Analyses for Logarithmic Non-Homogeneous Poisson Process in Software Reliability
Nickson Cheruiyot, Luke Akong’o Orawo, Ali Salim Islam
Department of Mathematics, Egerton University, Egerton, Kenya
Copyright © 2018 by authors and Open Access Library Inc.
This work is licensed under the Creative Commons Attribution International License (CC BY 4.0).
http://creativecommons.org/licenses/by/4.0/
Received: July 10, 2018; Accepted: August 12, 2018; Published: August 15, 2018
ABSTRACT
This paper discusses the Bayesian approach to estimation and prediction of the reliability of software systems during the testing process. A Non-Homogeneous Poisson Process (NHPP) arising from the Musa-Okumoto (1984) software reliability model is proposed for the software failures. The Musa-Okumoto NHPP reliability model consists of two components, the execution time component and the calendar time component, and is a popular model in software reliability analysis. Predictive analyses of software reliability models are of great importance for modifying and debugging software and for determining when to terminate the development testing process. However, Bayesian and classical predictive analyses of the Musa-Okumoto (1984) NHPP model are missing from the literature. This paper addresses four software reliability issues in single-sample prediction that are closely associated with the development testing program. A Bayesian approach based on non-informative priors was adopted to develop explicit solutions to these problems. Examples based on both real and simulated data are presented to illustrate the developed theoretical prediction results.
Subject Areas:
Mathematical Statistics
Keywords:
Non-Informative Priors, Non-Homogeneous Poisson Process, Bayesian Approach, Intensity Function, Software Reliability Model
1. Introduction
Software has become a driver for everything in the 21st century, from elementary education to genetic engineering. Because of this high dependency, the size and complexity of computer systems have grown, which poses a great problem for their reliability, as failures are prone to happen during operation. To avoid failures and faults, the reliability of software needs to be studied during development so as to produce reliable software. Software reliability is therefore of great concern to developers.
Software reliability is defined as the probability of failure-free software operation for a specified period of time in a specified environment [1] . With the increasing need for software with zero defects, predicting the reliability of software systems is gaining more and more importance [2] . Software reliability is achieved through testing during the software development stage [3] . Software reliability modeling is done to estimate the form of the failure rate curve by statistically estimating the parameters associated with the selected model. In most cases, the reliability development of a complex system takes place by testing the system until it fails, then making repairs and design changes and testing it again. This process continues until a desired level of reliability is achieved [4] . The purpose of this measure is to estimate the extra execution time during test required to meet a specified reliability objective and to identify the expected reliability of the software when the product is released. During reliability modeling, the software systems are tested in an environment that resembles the operational environment [5] .
Over the past decades, many software reliability models that can be used for predictive analyses have been proposed by different authors [6] [7] . The Musa-Okumoto reliability model, also known as the logarithmic model, was developed by Musa and Okumoto in 1984, who reported it to be more accurate than the exponential model. The Musa-Okumoto software reliability model is a non-homogeneous Poisson process software model with the intensity function given by
(1)
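As a point of reference, the logarithmic (Musa-Okumoto) process can be sketched numerically. The sketch below assumes the parameterization m(t) = alpha * ln(1 + beta * t) for the mean value function, so that the intensity is alpha * beta / (1 + beta * t). This is one common way of writing the model and is an assumption here; the symbols `alpha` and `beta` and the function names are illustrative and may not match the notation of Equation (1).

```python
import math

def mean_value(t, alpha, beta):
    """Expected cumulative number of failures by time t.

    Assumed form: m(t) = alpha * ln(1 + beta * t).
    """
    return alpha * math.log1p(beta * t)

def intensity(t, alpha, beta):
    """Failure intensity, the derivative of the mean value function.

    Assumed form: lambda(t) = alpha * beta / (1 + beta * t).
    """
    return alpha * beta / (1.0 + beta * t)
```

Under this form the intensity decreases monotonically in t, which reflects reliability growth as faults are removed.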
The model is based on the following assumptions: failures observed during execution time are caused by faults remaining in the software; whenever a failure is observed, an instantaneous effort is made to find its cause, and the faults are removed prior to future tests; and, unlike in other models, each repair reduces the number of future faults. A good reliability model has two main aspects: it must remain stable during the entire testing period for any particular testing environment, and it must provide a reasonably accurate prediction of reliability [8] . The Musa-Okumoto (1984) model has been used in various testing environments and, in many instances, provides good estimation and prediction of software reliability. When tested on industrial data sets, the Musa-Okumoto model was the best performer among the compared models in terms of fit and predictive capability [5] .
The Musa-Okumoto software reliability growth model has been widely applied, as it is one of the best predictive models and belongs to the models selected in the AIAA recommended practice standard on software reliability [9] [10] . The Musa-Okumoto model has also been used in software cost estimation models with high accuracy [11] [12] [13] . Critical reviews and categorizations of software reliability models have been carried out by many researchers [14] [15] . Predictive analyses for this model are missing from the literature, and this paper presents Bayesian single-sample predictive inference for the Musa-Okumoto software reliability model.
2. Bayesian Methodology
The Bayesian method owes its name to the fundamental role of Bayes' theorem. In Bayesian reasoning, uncertainty is attributed not only to the data but also to the parameters. Therefore, all parameters are modelled by distributions. Before any data are obtained, the knowledge about the parameters of a problem is expressed in the prior distribution of the parameters. Given actual data, the prior distribution and the data are combined into the posterior distribution of the parameters. The posterior distribution summarizes our knowledge about the parameters after observing the data.
In this paper we assume that reliability growth testing is performed on a computer software system and that the number of failures in the time interval $(0, t]$, denoted by $N(t)$, is observed. We also assume that $N(t)$ follows the NHPP with intensity given in Equation (1). Let $t_1 < t_2 < \cdots$ be the successive failure times. When testing stops after a pre-determined number n of failures is observed, the failure data are said to be failure-truncated. We denote the n failure times by where . Time-truncated data arise when testing is observed for a fixed time t. We denote the corresponding observed data by , where .
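The two data-collection schemes can be illustrated with a small simulation. The sketch below is not taken from the paper: it assumes the mean value function m(t) = alpha * ln(1 + beta * t) (an assumed parameterization) and uses the standard time-transformation property of an NHPP, under which m(t_1), m(t_2), ... are the arrival times of a unit-rate homogeneous Poisson process. All function names are hypothetical.

```python
import math
import random

def _inverse_mean_value(s, alpha, beta):
    """Solve alpha * ln(1 + beta * t) = s for t (assumed mean value function)."""
    return math.expm1(s / alpha) / beta

def simulate_failure_truncated(n, alpha, beta, rng):
    """Failure times when testing stops at the n-th failure."""
    s, times = 0.0, []
    for _ in range(n):
        s += rng.expovariate(1.0)  # next arrival of a unit-rate Poisson process
        times.append(_inverse_mean_value(s, alpha, beta))
    return times

def simulate_time_truncated(T, alpha, beta, rng):
    """Failure times when testing stops at a fixed time T."""
    limit = alpha * math.log1p(beta * T)  # m(T) on the transformed time scale
    s, times = 0.0, []
    while True:
        s += rng.expovariate(1.0)
        if s > limit:
            return times
        times.append(_inverse_mean_value(s, alpha, beta))
```

For example, `simulate_failure_truncated(20, 5.0, 0.1, random.Random(1))` returns 20 increasing failure times, while the time-truncated version returns a random number of failure times, none exceeding T.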
2.1. Issues in Prediction
In this paper we present four issues in single-sample prediction, listed below as 1) to 4), which are closely associated with the development testing program of a software system. Here, we consider one software system and assume that its cumulative failure times obey the Musa-Okumoto software reliability growth model, with observed data that are either failure-truncated or time-truncated. Based on the observed data, we are interested in the following problems:
1) What is the probability that at most k software failures will occur in the future time period with ?
2) Given that the pre-determined target value for the failure rate of the software undergoing development testing is not achieved at time T, what is the probability that the target value will be achieved at time ?
3) Suppose that the target value for the software failure rate is not achieved at time T, how long will it take so that the software failure rate will be attained at ?
4) What is the upper prediction limit (UPL) of with level . being a pre-determined value greater than T?
2.2. Prior, Posterior and Predictive Distributions
Let represent or . The joint density of is therefore :
(2)
Case 1: , the shape parameter is known, we adopt the following non-informative prior distribution for :
(3)
The posterior distribution of is thus given by;
(4)
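To make the known-shape case concrete, the following sketch works under two assumptions that are not confirmed by the text above: the mean value function is m(t) = alpha * ln(1 + beta * t) with beta known, and the non-informative prior for alpha is proportional to 1/alpha. Under those assumptions, for testing stopped at time T with n observed failures, the posterior of alpha is a gamma distribution with shape n and rate ln(1 + beta * T). The function names are hypothetical.

```python
import math

def posterior_rate(T, beta):
    """Rate of the gamma posterior of alpha (assumed model and prior)."""
    return math.log1p(beta * T)

def posterior_mean(n, T, beta):
    """Posterior mean of alpha: shape / rate."""
    return n / posterior_rate(T, beta)

def posterior_pdf(alpha, n, T, beta):
    """Gamma(shape=n, rate=ln(1 + beta*T)) density evaluated at alpha > 0."""
    rate = posterior_rate(T, beta)
    log_pdf = (n * math.log(rate) + (n - 1) * math.log(alpha)
               - rate * alpha - math.lgamma(n))
    return math.exp(log_pdf)
```

The density is evaluated on the log scale to avoid overflow for larger n.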
Let be the random variable being predicted. The predictive density of is;
(5)
Hence, the Bayesian UPL of with level , denoted as , must satisfy
(6)
Case 2: The shape parameter is unknown; we consider the following joint prior distribution of and where both parameters are assumed to be independent.
(7)
Thus the corresponding joint posterior distribution for and is given as;
(8)
Equation (8) is similar to Equation (4), let be the random variable predicted. The predictive density of is;
(9)
and the Bayesian UPL denoted by of with level similar to Equation (6) is;
(10)
3. Main Results for Prediction Using Non-Informative Priors
In this section we address the four issues stated in Section 2.1 using the Bayesian approach. The main results are presented as propositions, and their proofs are given in the Appendix. Below, we use to represent the percentage point of the chi-square distribution with n degrees of freedom such that , and define Poisson and gamma . In all subsequent propositions, the prior is that of Equation (3) when the shape parameter is known and that of Equation (7) when it is unknown.
Proposition 1 (issue 1)
The probability that at most k failures will occur in the time interval with is
(11)
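A numerical sketch of the first issue for the known-shape case is given here. It assumes m(t) = alpha * ln(1 + beta * t) and a gamma posterior for alpha with shape n and rate ln(1 + beta * T); both are assumptions, not the paper's stated formulas. Under them, the number of failures in (T, tau] is Poisson given alpha, and averaging over the posterior gives a negative binomial distribution. The name `prob_at_most_k` is hypothetical.

```python
import math

def prob_at_most_k(k, n, T, tau, beta):
    """P(at most k failures in (T, tau] | data), shape parameter beta known.

    Assumes alpha | data ~ Gamma(shape=n, rate=c) with c = ln(1 + beta*T), and
    N(tau) - N(T) | alpha ~ Poisson(alpha * delta),
    where delta = ln(1 + beta*tau) - c.
    """
    c = math.log1p(beta * T)
    delta = math.log1p(beta * tau) - c
    p = c / (c + delta)  # negative binomial "success" probability
    return sum(math.comb(n + j - 1, j) * p ** n * (1.0 - p) ** j
               for j in range(k + 1))
```

The probabilities are nondecreasing in k and tend to 1, which gives a quick sanity check on any implementation.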
Proposition 2 (issue 2)
The probability that the target value will be achieved at time ( ) is
(12)
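For the known-shape case, the second issue reduces to a gamma distribution function, which for integer shape can be written as a finite Poisson sum (the gamma-Poisson relationship used in the Appendix). The sketch below assumes the intensity alpha * beta / (1 + beta * t) and a Gamma(n, ln(1 + beta * T)) posterior for alpha; these forms and all names are illustrative assumptions.

```python
import math

def gamma_cdf(x, n):
    """P(G <= x) for G ~ Gamma(shape=n, rate=1), n a positive integer.

    Uses P(G <= x) = 1 - P(Poisson(x) <= n - 1).
    """
    term = math.exp(-x)
    total = term
    for j in range(1, n):
        term *= x / j
        total += term
    return 1.0 - total

def prob_target_achieved(lam_target, n, T, tau, beta):
    """P(lambda(tau) <= lam_target | data), shape parameter beta known."""
    c = math.log1p(beta * T)                            # posterior rate of alpha
    alpha_max = lam_target * (1.0 + beta * tau) / beta  # lambda(tau) <= target iff alpha <= alpha_max
    return gamma_cdf(c * alpha_max, n)
```

Because `alpha_max` grows with `tau`, the returned probability increases as the future time moves further out.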
Proposition 3 (issue 3)
For a given level , the time required to attain is
(13)
Remark 1: For the second part of Equation (13), is the solution to the equation
. (14)
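The third issue can be sketched for the known-shape case by inverting the gamma distribution function. A Gamma(n, 1) quantile is half the corresponding chi-square quantile with 2n degrees of freedom, which is why a chi-square percentage point appears in this kind of result. The code assumes the intensity alpha * beta / (1 + beta * t) and a Gamma(n, ln(1 + beta * T)) posterior for alpha; `level` is taken to be the posterior probability with which the target should be met. These choices and all names are assumptions.

```python
import math

def gamma_cdf(x, n):
    """P(G <= x) for G ~ Gamma(shape=n, rate=1), n a positive integer."""
    term = math.exp(-x)
    total = term
    for j in range(1, n):
        term *= x / j
        total += term
    return 1.0 - total

def gamma_quantile(level, n):
    """Solve gamma_cdf(q, n) = level for q by bisection."""
    lo, hi = 0.0, 1.0
    while gamma_cdf(hi, n) < level:
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if gamma_cdf(mid, n) < level:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def extra_time_to_target(lam_target, level, n, T, beta):
    """Additional testing time beyond T so that P(lambda(tau) <= target) = level."""
    c = math.log1p(beta * T)
    alpha_upper = gamma_quantile(level, n) / c  # P(alpha <= alpha_upper | data) = level
    tau = (alpha_upper * beta / lam_target - 1.0) / beta
    return tau - T  # negative if the target is already met at time T
```

When the shape parameter is unknown no such closed form is available, and the defining equation has to be solved numerically.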
Proposition 4 (issue 4)
The Bayesian UPL of with level is
(15)
Remark 2: For the second part of Equation (15), is the solution to
(16)
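A sketch of an upper prediction limit for the known-shape case follows, read here as a limit on the failure rate at a future time tau. It assumes the intensity alpha * beta / (1 + beta * t) and a Gamma(n, ln(1 + beta * T)) posterior for alpha, and it defines the limit as the value that the future failure rate stays below with posterior probability `level`. This reading, the formulas, and the names are all assumptions made for illustration.

```python
import math

def gamma_cdf(x, n):
    """P(G <= x) for G ~ Gamma(shape=n, rate=1), n a positive integer."""
    term = math.exp(-x)
    total = term
    for j in range(1, n):
        term *= x / j
        total += term
    return 1.0 - total

def gamma_quantile(level, n):
    """Solve gamma_cdf(q, n) = level for q by bisection."""
    lo, hi = 0.0, 1.0
    while gamma_cdf(hi, n) < level:
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if gamma_cdf(mid, n) < level:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def upper_prediction_limit(level, n, T, tau, beta):
    """Value u with P(lambda(tau) <= u | data) = level, beta known."""
    c = math.log1p(beta * T)
    alpha_upper = gamma_quantile(level, n) / c  # posterior quantile of alpha
    return alpha_upper * beta / (1.0 + beta * tau)
```

The limit shrinks as `tau` increases and widens as `level` approaches 1.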
4. Real Example
We used the time-between-failures data described in [16] to illustrate the developed methodologies for single-sample Bayesian predictive analysis. We conducted the goodness-of-fit test presented in [17] and found that the data follow the Musa-Okumoto process. On the basis of this data set, the maximum likelihood estimates of the two parameters of the Musa-Okumoto growth model were obtained.
1) Suppose we are interested in the probability that at most k failures will occur in the future time period (180, 250]. a) When the shape parameter is known, we take its maximum likelihood estimate as its true value and obtain the probabilities for each k from the first formula in Equation (11). b) When the shape parameter is unknown, we obtain the corresponding probabilities from the second formula in Equation (11). Figure 1 shows the graph of the desired probabilities when the shape parameter is known and when it is unknown. From the graph it can be seen that the probability that at most 15 failures will occur during that time interval is higher when the shape parameter is unknown than when it is known.
Figure 1. The graph of the probabilities that at most k failures will occur in the time interval (180, 250] for the cases of known and unknown.
2) Suppose the target value for the failure rate is chosen arbitrarily. At time T, the MLE of the achieved failure rate for this software is greater than the target value; thus the target has not been achieved at time T and development testing will continue. Suppose we want to find the probability that the target value will be achieved at a later time. a) When the shape parameter is known, the probability obtained from the first formula in Equation (12) is very small, and hence the target value will not be achieved. b) When the shape parameter is unknown, the probability is computed from the second formula in Equation (12) by the Monte Carlo method of integration based on a simulated sample. This shows that, when the shape parameter is unknown, there is a possibility of achieving the target value at that time.
3) Since the target value was not achieved at time T, we want to know how long it will take for the target value to be achieved. a) When the shape parameter is known, the first formula in Equation (13) shows that it will take another 538.7523 hours to achieve the desired failure rate. b) When the shape parameter is unknown, the second formula in Equation (13) and Remark 1 show that it takes another 414 hours to achieve the desired failure rate, a substantial reduction in time compared with the case where the shape parameter is known.
4) For a given future time and level, the Bayesian UPL is obtained from the first formula in Equation (15).
5. Conclusions
In software development, predictive analysis is very important, as it helps the software developer to make trade-off decisions at the right time. In this paper, explicit solutions to predictive issues that may arise during the development process were derived using the Bayesian approach. These solutions are helpful to software developers in many instances, such as allocating resources, deciding when to terminate the testing process, and identifying modifications needed in the software before termination.
The study used the Bayesian approach with non-informative priors. In all cases where the shape parameter was known, the posterior and predictive distributions had closed forms, while when it was unknown they had no closed forms and the study used Markov Chain Monte Carlo (MCMC). The Bayesian approach was used because it has advantages over the classical approach: it is applicable to small sample sizes, allows the input of prior information about the reliability growth process, and provides full posterior and predictive distributions [6] .
It would be interesting to look at two-sample prediction for the Musa-Okumoto (1984) model, considering the procedures used in [3] . The procedures presented in this paper can also be extended to other NHPP models, such as the Cox-Lewis process and the delayed S-shaped process. This is left open for future research.
Conflicts of Interest
The authors declare no conflicts of interest regarding the publication of this paper.
Cite this paper
Cheruiyot, N., Orawo, L.A. and Islam, A.S. (2018) Bayesian Predictive Analyses for Logarithmic Non-Homogeneous Poisson Process in Software Reliability. Open Access Library Journal, 5: e4767. https://doi.org/10.4236/oalib.1104767
References
- 1. Nuria, T.R. (2011) Stochastic Comparisons and Bayesian Inference in Software Reliability. Ph.D. Thesis, Universidad Carlos III de Madrid, Madrid.
- 2. Sonia, D. and Renu, D. (2014) A Study of Various Reliability Growth Models. International Journal of Advanced Research in Computer Science and Software Engineering, 4, 1213-1219.
- 3. Daniel, R.J. and Hoang, P. (2001) On the Maximum Likelihood Estimates for the Goel-Okumoto Software Reliability Model. The American Statistician, 55, 219-222. https://doi.org/10.1198/000313001317098211
- 4. Muralidharan, K., Rupal, S. and Deepak, H.D. (2008) Future Reliability Estimation Based on Predictive Distribution in Power Law Process. Quality Technology & Quantitative Management, 5, 193-201. https://doi.org/10.1080/16843703.2008.11673396
- 5. Ullah, N., Morisio, M. and Vetro, A. (2013) A Comparative Analysis of Software Reliability Growth Models Using Defects Data of Closed and Open Source Software. In: 35th Annual IEEE Software Engineering Workshop, Heraclion, 12-13 October 2012, 187-192.
- 6. Yu, J.-W., Tian, G.-L. and Tang, M.-L. (2007) Predictive Analyses for Non-Homogeneous Poisson Processes with Power Law Using Bayesian Approach. Computational Statistics & Data Analysis, 51, 4254-4268. https://doi.org/10.1016/j.csda.2006.05.010
- 7. Akuno, A.O., Orawo, L.A. and Islam, A.S. (2014) One-Sample Bayesian Predictive Analyses for an Exponential Non-Homogeneous Poisson Process in Software Reliability. Open Journal of Statistics, 4, 402-411. https://doi.org/10.4236/ojs.2014.45039
- 8. Kapur, P.K., Pham, H., Gupta, A. and Jha, P.C. (2011) Software Reliability Assessment with OR Applications. Springer Series in Reliability Engineering, Springer-Verlag, London, 58. https://doi.org/10.1007/978-0-85729-204-9
- 9. Lyu, M. (1996) Handbook of Software Reliability Engineering. McGraw-Hill, New York.
- 10. Malaiya, Y.K. and Denton, J. (1997) What Do the Software Reliability Growth Model Parameters Represent? Technical Report CS-97-115, Department of Computer Science, Colorado State University, Fort Collins.
- 11. Xia, W., Capretz, L.F. and Ho, D. (2008) A Neuro-Fuzzy Model for Function Point Calibration. WSEAS Transactions on Information Science and Applications, 5, 22-30.
- 12. Nassif, A.B., Capretz, L.F. and Ho, D. (2010) Software Estimation in the Early Stages of the Software Life Cycle. In: International Conference on Emerging Trends in Computer Science, Communication and Information Technology, Nanded, 9-11 January, 5-13.
- 13. Nassif, A.B., Ho, D. and Capretz, L.F. (2013) Towards an Early Software Estimation Using Log-Linear Regression and a Multilayer Perceptron Model. Journal of Systems and Software, 86, 144-160. https://doi.org/10.1016/j.jss.2012.07.050
- 14. Yadav, A. and Khan, R.A. (2009) Critical Review on Software Reliability Models. International Journal of Recent Trends in Engineering, 2, 114-116.
- 15. Sheakh, T.H., Quadri, S.M.K. and Singh, V. (2012) Critical Review of Software Reliability Model. International Journal of Emerging Technology and Advance Engineering, 2, 496-499.
- 16. Xie, M., Goh, T.N. and Ranjan, P. (2002) Some Effective Control Chart Procedures for Reliability Monitoring. Reliability Engineering & System Safety, 77, 143-150. https://doi.org/10.1016/S0951-8320(02)00041-8
- 17. Zhao, J. and Wang, J. (2005) A New Goodness-of-Fit Test Based on the Laplace Statistic for a Large Class of NHPP Models. Communications in Statistics - Simulation and Computation, 34, 725-736. https://doi.org/10.1081/SAC-200068389
Appendix: Proofs of Propositions 1-4
We first state the following identity without proof:
(A.1)
where m is any positive integer, a and b are two real numbers such that , is an increasing and differentiable function and
.
Proof of Proposition 1
The probability that at most k failures will occur in the interval is . When is known, we have
. (A.2)
where is given by Equation (4) and
(A.3)
From Equation (2), we have , and
Thus Equation (A.3) becomes
(A.4)
And hence Equation (A.2) becomes
(A.5)
The integral part of Equation (A.5) integrates to 1 since it is a gamma distribution with parameters j and and hence Equation (A.5) reduces to
. (A.6)
This is the first formula of Equation (11).
When is unknown, noting that and are given by Equation (A.4) and Equation (8) respectively, we obtain
(A.7)
Since the summation over k runs from n to and the two uses of k are not the same, we replace the letter k with d in Equation (A.7), where is as used in Equation (8). Equation (A.7) implies the second formula in Equation (11).
Proof of Proposition 2
Let denote the posterior of . Hence, the probability that the target value will be achieved at time is given by
(A.8)
When is known, making the transformation , we have and . Consequently, the posterior density of is
(A.9)
From Equation (A.9), it can easily be noted that has gamma distribution with parameters n and . Noting that gamma and Poisson distributions have a relationship defined as
. (A.10)
By substituting Equation (A.9) and Equation (A.10) into Equation (A.8), we obtain the first formula of Equation (12).
When is unknown, making transformation and , we obtain and . Note that the Jacobian is . From Equation (8), the joint posterior density of is
.
(A.11)
By substituting Equation (A.10) and Equation (A.11) into Equation (A.8), we obtain the second formula of Equation (12).
Proof of Proposition 3
For given level , the time required to attain the target value is , where satisfies Equation (44). When is known, from Equation (46), it can easily be seen that
follows a chi-square distribution with 2n degrees of freedom. Thus we have
. (A.12)
and Equation (13) follows immediately.
The time required to attain the target with level when is unknown is where is the solution to
. (A.13)
Proof of Proposition 4
For a pre-determined , the Bayesian upper prediction limit for with level is satisfying . From Equation (A.8) and Equation (A.12), we have , thus follows Equation (15). The second part follows similarly.