Open Journal of Statistics
Vol.4 No.5(2014), Article ID:49084,10 pages DOI:10.4236/ojs.2014.45039

One-Sample Bayesian Predictive Analyses for an Exponential Non-Homogeneous Poisson Process in Software Reliability

Albert Orwa Akuno, Luke Akong’o Orawo, Ali Salim Islam

Department of Mathematics, Egerton University, Egerton, Kenya

Email: orwaakuno@gmail.com, orawo2000@yahoo.com, asislam54@yahoo.com

Copyright © 2014 by authors and Scientific Research Publishing Inc.

This work is licensed under the Creative Commons Attribution International License (CC BY).

http://creativecommons.org/licenses/by/4.0/

Received 21 May 2014; revised 28 June 2014; accepted 12 July 2014

Abstract

The Goel-Okumoto software reliability model, also known as the Exponential Nonhomogeneous Poisson Process, is one of the earliest software reliability models to be proposed. The literature shows that most work on the Goel-Okumoto software reliability model concerns parameter estimation by the maximum likelihood (ML) method and model fit. Predictive analysis is widely known to be useful for modifying and debugging software and for determining when to terminate the software development testing process. However, literature on both classical and Bayesian predictive analyses of the model is conspicuously absent. This paper presents some results on predictive analyses for the Goel-Okumoto software reliability model. Driven by the need for highly reliable software in computers embedded in automotive, mechanical and safety control systems, industrial and quality process control, real-time sensor networks, aircraft and nuclear reactors, among others, we address four issues in single-sample prediction that are closely associated with the software development process. We adopt Bayesian methods based on non-informative priors to develop explicit solutions to these problems. An example with real data in the form of times between software failures is used to illustrate the developed methodologies.

Keywords: Nonhomogeneous Poisson Process, Non-Informative Priors, Software Reliability Models, Bayesian Approach

1. Introduction

Over the last decade of the 20th century and the first few years of the 21st century, the demand for complex software systems has risen: computers are now embedded in automotive, mechanical and safety control systems, industrial and quality process control, real-time sensor networks, aircraft, nuclear reactors, hospital healthcare and air traffic control systems, among others, and computer systems have become an indispensable component of modern society. Consequently, the reliability of the software used in these systems has become a major concern and requirement. Software reliability is defined as the probability of failure-free software operation for a specified period of time in a specified environment [1]. A single software defect can cause system failure, and avoiding such failures requires reliable software. Software reliability is achieved through testing during the software development testing stage [2]. The usual way of removing bugs is to run test cases that exercise the software in the way users will operate it in their particular environment. However, emulating the end-user environment during the test interval is difficult and time-consuming, especially when there are multiple types of end-users, and business pressure to release a software system within a tight market window constrains the amount of time that can be spent on testing. Software reliability modeling addresses this dilemma. As indicated by [3], software reliability modeling can provide the basis for planning reliability growth tests, monitoring progress, estimating current reliability, and forecasting and predicting future reliability improvements. Forecasting and prediction are achieved through predictive analyses. In particular, predictive analyses are useful in determining when to terminate the development process of software or hardware. Often, a prediction interval is constructed to provide the time frame in which a future failure will be observed with a pre-determined confidence level [4].

Many software reliability models have been developed by various authors and researchers in the past three decades. Among them, the Exponential Nonhomogeneous Poisson Process (NHPP) with intensity function

$$\lambda(t) = \alpha\beta e^{-\beta t}, \quad \alpha > 0,\ \beta > 0,\ t > 0 \qquad (1)$$

was developed by Goel and Okumoto in 1979 and is one of the earliest software reliability models. In the literature, this NHPP is called the Goel-Okumoto (1979) model.
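As a minimal sketch, assuming the usual Goel-Okumoto parameterization with intensity $\lambda(t) = \alpha\beta e^{-\beta t}$ and mean value function $m(t) = \alpha(1 - e^{-\beta t})$, the two functions can be evaluated as follows. The function names and the parameter values are hypothetical and serve only as an illustration.

```python
import math


def go_intensity(t, alpha, beta):
    """Failure intensity lambda(t) = alpha * beta * exp(-beta * t)."""
    return alpha * beta * math.exp(-beta * t)


def go_mean_value(t, alpha, beta):
    """Expected cumulative number of failures m(t) = alpha * (1 - exp(-beta * t))."""
    return alpha * (1.0 - math.exp(-beta * t))


if __name__ == "__main__":
    # Hypothetical parameter values, not estimates from the paper's data.
    alpha, beta = 30.0, 0.01
    for t in (0.0, 100.0, 500.0):
        print(t, go_intensity(t, alpha, beta), go_mean_value(t, alpha, beta))
```

Under this form the intensity decreases monotonically in t, and $\alpha$ is the expected total number of failures as $t \to \infty$.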

As noted by [5], the Goel-Okumoto (1979) model has been applied to a number of software testing environments, and its use in assessing and detecting software failures has been investigated by various authors. For instance, the Goel-Okumoto model has been used to develop a statistical control mechanism for detecting whether a software process is statistically under control. ML estimation of the parameters of the Goel-Okumoto (1979) model has been studied and, in particular, it has been shown that the ML estimates of the parameters are not consistent as the testing period extends to infinity. [6] presented an empirical method for selecting software reliability growth models for release decision-making, in which they iteratively applied several software reliability growth models, namely the Goel-Okumoto (1979), Delayed S-shaped, Gompertz and Yamada exponential models, to weekly cumulative software failure data collected during system test in order to determine the number of failures expected to remain in the software after release. [7] also performed parameter estimation for the Goel-Okumoto, Yamada S-shaped and Inflection S-shaped software reliability growth models, and established a necessary and sufficient condition on the software failure data which, if satisfied, ensures that the MLE method returns unique, positive and finite estimates of the unknown parameters of the Goel-Okumoto and Yamada S-shaped models. [8] presented software failure data in which the failure rate, i.e. the number of failures per hour, appeared to decrease with time, an indication that a Nonhomogeneous Poisson Process with mean value function $m(t) = \alpha(1 - e^{-\beta t})$, which corresponds to that of the Goel-Okumoto software reliability model, was a reasonable model for the failure process.
From the literature, it is evident that most of the work done on the Goel-Okumoto software reliability model concerns parameter estimation, especially by the MLE method, and model fit. Literature on both classical and Bayesian predictive analyses of the model is conspicuously absent.

This paper focuses on single-sample predictive inference for the Goel-Okumoto (1979) software reliability model using a Bayesian approach. In Section 2, we first identify four issues in single-sample prediction that are closely associated with the development testing process of software, and then derive the corresponding predictive distributions. The main results for single-sample prediction are presented in Section 3. In Section 4, a real example based on secondary software failure data, in the form of execution times between successive software failures, is used to illustrate the developed methodologies. A discussion is given in Section 5, and the mathematical proofs are given in the Appendix.

2. Predictive Issues and Bayesian Method

During the development testing stage of software, statisticians and engineers are interested in various predictive problems whose solutions are important for modifying and debugging the software and for determining when to terminate the development testing process. In this section, we present four issues closely associated with the software development testing process and derive the predictive distributions using a Bayesian approach. For the purposes of the four predictive issues, we assume that reliability growth testing is performed on a piece of software and that the cumulative number of failures of the software in the time interval $(0, t]$, denoted by $N(t)$, is observed. We further assume that $N(t)$ follows the NHPP with intensity function given in Equation (1). Let $t_i$ $(i = 1, 2, \ldots)$ be the observed failure times. Failure data are said to be failure-truncated when testing stops after a predetermined number n of failures has occurred. We denote the n failure times by $t_1, t_2, \ldots, t_n$, where $0 < t_1 < t_2 < \cdots < t_n$. Failure data are said to be time-truncated if testing stops at a predetermined time t. In that case the observed failure times satisfy $0 < t_1 < t_2 < \cdots < t_n \le t$.

A prediction interval is a confidence interval for a future observation or for a function of some future observations. Specifically, a double-sided (bilateral) prediction interval for a future quantity $y$ with confidence level $\gamma$ is defined by $[y_L, y_U]$ such that $P(y_L \le y \le y_U) = \gamma$. Similarly, a single-sided (unilateral) lower or upper prediction limit for $y$ with level $\gamma$ is defined by $y_L$ (or $y_U$), which satisfies $P(y \ge y_L) = \gamma$ (or $P(y \le y_U) = \gamma$). Both $y_L$ and $y_U$ depend only on a single sample (or a single piece of software) and are called single-sample prediction limits. Prediction limits involving two samples (or two pieces of software) can be defined similarly and are called two-sample prediction limits.

2.1. Issues in Single-Sample Software Reliability Prediction

Here, we consider one piece of software and assume that its cumulative inter-failure times obey the Goel-Okumoto (1979) software reliability model, with the observed data being either failure-truncated or time-truncated. Based on the observed data, we are interested in the following problems:

Issue A: what is the probability that at most k software failures will occur in the future time period $(T, \tau]$ with $\tau > T$?

Issue B: suppose that the pre-determined target value for the failure rate of the software undergoing development testing is not achieved at time T; what is the probability that the target value will be achieved at a future time $\tau > T$?

Issue C: suppose that the target value for the software failure rate is not achieved at time T; how long will it take until the target failure rate is attained?

Issue D: what is the upper prediction limit (UPL) of the failure rate at time $\tau$ with level $\gamma$, $\tau$ being a predetermined value greater than T?

2.2. Posterior and Predictive Distributions

Let represent or. The joint density of is therefore

(2)

Case 1: When the shape parameter $\beta$ is known, we adopt the following non-informative prior distribution of $\alpha$:

. (3)

The posterior distribution of is thus given by

(4)

Let be the random variable being predicted. Then the posterior predictive distribution of is given as

(5)

Hence the Bayesian UPL of with level denoted as must satisfy

. (6)

Case 2: When the shape parameter $\beta$ is unknown, we consider the following non-informative joint prior density for $\alpha$ and $\beta$ (we assume that $\alpha$ and $\beta$ are independent):

(7)

Hence the corresponding joint posterior density is given as

(8)

where

. (9)

Similar to Equation (5) and Equation (6), let denote the Bayesian UPL of with level, then

(10)

and

(11)

3. Main Results for the Prediction Problems

In this section, we address the four single-sample prediction issues raised in Section 2.1 using Bayesian approach. The following propositions are considered as the main results with proofs being given in the Appendix. In the subsequent results, we use to represent the percentage point of the chi-square distribution with n degrees of freedom and we also assume the priors to be Equation (3) and Equation (7).

Proposition 1 (for issue A): The probability that at most k software failures will occur in the future time period $(T, \tau]$ with $\tau > T$ is

(12)
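For the known-$\beta$ case, the predictive probability can be sketched numerically. The sketch below is not taken from Equation (12); it rests on two labeled assumptions: the intensity is $\alpha\beta e^{-\beta t}$, and the non-informative prior on $\alpha$ is proportional to $1/\alpha$. Under these assumptions the posterior of $\alpha$ given n failures in $(0, T]$ is a gamma distribution with shape n and rate $1 - e^{-\beta T}$, and the number of failures in $(T, \tau]$ is then negative binomial with success probability $p = (1 - e^{-\beta T})/(1 - e^{-\beta \tau})$. The function name and the numerical inputs are hypothetical.

```python
import math


def prob_at_most_k(k, n, T, tau, beta):
    """P(at most k failures in (T, tau]) given n failures in (0, T], beta known.

    Assumption (not confirmed by the source): prior on alpha proportional to
    1/alpha, so alpha | data ~ Gamma(shape=n, rate=1 - exp(-beta*T)). Mixing the
    Poisson count over this posterior gives a negative binomial distribution.
    """
    p = (1.0 - math.exp(-beta * T)) / (1.0 - math.exp(-beta * tau))
    return sum(
        math.comb(n + j - 1, j) * p**n * (1.0 - p) ** j for j in range(k + 1)
    )


if __name__ == "__main__":
    # Hypothetical inputs: 30 failures observed by T = 180, beta assumed known.
    n, T, tau, beta = 30, 180.0, 240.0, 0.005
    for k in range(6):
        print(k, prob_at_most_k(k, n, T, tau, beta))
```

The probabilities are non-decreasing in k and tend to 1, as a cumulative distribution should.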

Proposition 2 (for issue B): Suppose that the pre-determined target value for the failure rate of the software undergoing development testing is not achieved at time T. The probability that the target value will be achieved at time $\tau$ is

(13)

Remark 1: Let be i.i.d. sample from, we can approximate the second part of (13) via MCMC method.
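The known-$\beta$ part of Proposition 2 can be illustrated with a short sketch. It is not a transcription of Equation (13); it assumes the failure rate at time $\tau$ is $\alpha\beta e^{-\beta\tau}$ and that the posterior of $\alpha$ is a gamma distribution with integer shape n and rate $1 - e^{-\beta T}$ (which follows from an assumed prior proportional to $1/\alpha$). The gamma distribution function is evaluated through a finite Poisson sum, in the spirit of the gamma-Poisson relationship used in the Appendix. All names and inputs are hypothetical.

```python
import math


def gamma_cdf_int_shape(x, n, rate):
    """CDF of Gamma(shape=n, rate) for integer n, via a finite Poisson sum."""
    if x <= 0:
        return 0.0
    mu = rate * x
    term, total = math.exp(-mu), 0.0
    for j in range(n):
        total += term          # adds exp(-mu) * mu**j / j!
        term *= mu / (j + 1)
    return 1.0 - total


def prob_target_achieved(lam_target, tau, n, T, beta):
    """P(alpha * beta * exp(-beta * tau) <= lam_target | data), beta known.

    Assumes alpha | data ~ Gamma(shape=n, rate=1 - exp(-beta*T)).
    """
    rate = 1.0 - math.exp(-beta * T)
    alpha_max = lam_target * math.exp(beta * tau) / beta
    return gamma_cdf_int_shape(alpha_max, n, rate)


if __name__ == "__main__":
    # Hypothetical inputs for illustration only.
    for tau in (240.0, 400.0, 800.0):
        print(tau, prob_target_achieved(0.01, tau, 30, 180.0, 0.005))
```

The probability increases with $\tau$, since the failure rate of the model decays over time.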

Proposition 3 (for issue C): For given level, the time required to attain is

(i) (14)

(ii) (15)

where is the solution to the following equation:

(16)

Proposition 4 (for issue D): The Bayesian UPL of with level is

(i)  (17)

(ii) (unknown) such that  (18)

(19)
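When $\beta$ is known, the time needed to reach a target failure rate and the upper prediction limit of the failure rate both reduce to a posterior quantile of $\alpha$. The sketch below assumes, as a labeled assumption rather than a statement of Equations (14) and (17), that the posterior of $\alpha$ is gamma with shape n and rate $r = 1 - e^{-\beta T}$, so that its $\gamma$-quantile equals the chi-square percentage point with 2n degrees of freedom divided by $2r$. The quantile is found by bisection so that only the standard library is needed; all names and inputs are hypothetical.

```python
import math


def gamma_cdf_int_shape(x, n, rate):
    """CDF of Gamma(shape=n, rate) for integer n, via a finite Poisson sum."""
    if x <= 0:
        return 0.0
    mu = rate * x
    term, total = math.exp(-mu), 0.0
    for j in range(n):
        total += term
        term *= mu / (j + 1)
    return 1.0 - total


def gamma_quantile(level, n, rate):
    """Quantile of Gamma(shape=n, rate) at the given level, by bisection."""
    lo, hi = 0.0, 1.0
    while gamma_cdf_int_shape(hi, n, rate) < level:
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if gamma_cdf_int_shape(mid, n, rate) < level:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def time_to_target(lam_target, level, n, T, beta):
    """Time tau at which P(failure rate <= lam_target | data) equals level."""
    q = gamma_quantile(level, n, 1.0 - math.exp(-beta * T))
    return math.log(beta * q / lam_target) / beta


def failure_rate_upl(tau, level, n, T, beta):
    """Upper prediction limit of alpha * beta * exp(-beta * tau) at level."""
    q = gamma_quantile(level, n, 1.0 - math.exp(-beta * T))
    return beta * math.exp(-beta * tau) * q


if __name__ == "__main__":
    # Hypothetical inputs for illustration only.
    tau_star = time_to_target(0.01, 0.90, 30, 180.0, 0.005)
    print(tau_star, failure_rate_upl(tau_star, 0.90, 30, 180.0, 0.005))
```

The two functions are inverse views of the same relation: evaluating the upper prediction limit at the time returned by `time_to_target` recovers the target rate. A returned time smaller than T would indicate that the target is already met at the stated level.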

4. Example

In this section, a real example based on the time-between-failures data given by [9] is used to illustrate the developed methodologies for single-sample Bayesian predictive analysis. Table 1 gives the times between failures.

The study has used the cumulative time between failures as failure times where. These data obey the Goel-Okumoto (1979) software reliability model [10] . The MLEs of the parameters of the software reliability model based on the data are and. In the illustration of the developed methodologies, the study has used these MLEs.

1) Suppose that we are interested in the probability that at most k failures will occur in the future time period (180, 240]. a) When $\beta$ is known (say), using the first formula in Equation (12), we have, , , , , , , 0.7193, , , , , , , and b) When $\beta$ is unknown, from the second formula in Equation (12) we obtain, , , , , , , , , , , , , , , ,.

Figure 1 shows the graph of the desired probabilities for the cases of known and unknown $\beta$.

2) Suppose that the target value is given by. At time, the MLE of the achieved failure rate for this software is which is greater than i.e. it cannot be achieved at time. Thus the development testing will continue. Suppose we want to predict the probability that the target value will be achieved at time. a) When $\beta$ is known, say, from the first formula in Equation (13) we obtain. Thus we can conclude that the target value (failure rate) will not be achieved. b) When $\beta$ is unknown, from the second formula in Equation (13) and Remark 1, we obtain where the Monte Carlo sample size is.

Table 1. Time between failures data.

Figure 1. Comparison of the probabilities γk that at most k failures will occur in the time interval (180, 240] for the cases of known and unknown β.

3) Since the target value was not achieved at time, we want to know how long it will take to attain. a) When $\beta$ is known (i.e.), let, from Equation (14) we obtain. In other words, it will take another 268.6116 h to achieve the desired failure rate. b) When $\beta$ is unknown, from Equation (15) and Equation (16), we obtain. In other words, it will take another 770.79 h to achieve the desired failure rate when $\beta$ is unknown.

4) Given, a) When $\beta$ is known, from Equation (17), the Bayesian upper prediction limit (UPL) of with level 0.90 is given by. b) When $\beta$ is unknown, from Equation (18) and Equation (19), the Bayesian UPL of with level 0.90 is given by.

5. Discussion

Several prediction problems arise during the development of any software, especially when the Goel-Okumoto (1979) software reliability model is used to model the failure process. We have used a Bayesian approach with non-informative priors to address some of the prediction problems that may arise during the software development testing stage. We have obtained explicit solutions to these problems, which may prove useful for modifying and debugging the software and for deciding when to terminate the development testing process.

Adopting the Bayesian approach to derive the solutions is advantageous in that the approach remains applicable when sample sizes are small [11] [12]. Another advantage of the Bayesian approach is that it allows the input of prior information about the reliability growth process and provides full posterior and predictive distributions.

In this paper, we have used non-informative priors to derive the methodologies to address the said prediction problems. However, informative priors can similarly be used in place of non-informative priors. The same procedures presented in this paper can also be applied to other NHPPs such as the delayed S-shaped process and the Cox-Lewis process.

References

  1. Nuria, T.R. (2011) Stochastic Comparisons and Bayesian Inference in Software Reliability. Ph.D. Thesis, Universidad Carlos III de Madrid, Madrid.
  2. Daniel, R.J. and Hoang, P. (2001) On the Maximum Likelihood Estimates for the Goel-Okumoto Software Reliability Model. The American Statistician, 55, 219-222. http://dx.doi.org/10.1198/000313001317098211
  3. Meth, M. (1992) Reliability Growth Myths and Methodologies: A Critical View. Proceedings of the Annual Reliability and Maintainability Symposium, New York, 21-23 January 1992, 230-238.
  4. Yu, J.-W., Tian, G.-L. and Tang, M.-L. (2007) Predictive Analyses for Nonhomogeneous Poisson Processes with Power Law Using Bayesian Approach. Computational Statistics & Data Analysis, 51, 4254-4268. http://dx.doi.org/10.1016/j.csda.2006.05.010
  5. Kapur, P.K., Pham, H., Gupta, A. and Jha, P.C. (2011) Software Reliability Assessment with OR Applications. Springer Series in Reliability Engineering, Springer-Verlag London Limited, London. http://dx.doi.org/10.1007/978-0-85729-204-9
  6. Stringfellow, C. and Amschler, A.A. (2002) An Empirical Method for Selecting Software Reliability Growth Models. Empirical Software Engineering, 7, 319-343. http://dx.doi.org/10.1023/A:1020515105175
  7. Meyfroyt, P.H.A. (2012) Parameter Estimation for Software Reliability Models. Ph.D. Thesis, Universidad Carlos III de Madrid, Madrid.
  8. Razeef, M. and Mohsin, N. (2012) Software Reliability Growth Models: Overview and Applications. Journal of Emerging Trends in Computing and Information Sciences, 3, 1309-1320.
  9. Xie, M., Goh, T.N. and Ranjan, P. (2002) Some Effective Control Chart Procedures for Reliability Monitoring. Reliability Engineering and System Safety, 77, 143-150. http://dx.doi.org/10.1016/S0951-8320(02)00041-8
  10. Satya, P., Bandla, S.R. and Kantham, R.R.L. (2011) Assessing Software Reliability Using Inter Failures Time Data. International Journal of Computer Applications, 18, 975-978.
  11. Phillips, M.J. (2000) Bootstrap Confidence Regions for the Expected ROCOF of a Repairable System. IEEE Transactions on Reliability, 49, 204-208. http://dx.doi.org/10.1109/24.877339
  12. Quigley, J. and Walls, L. (2003) Confidence Intervals for Reliability-Growth Models with Small Sample-Sizes. IEEE Transactions on Reliability, 52, 257-262. http://dx.doi.org/10.1109/TR.2003.811865

Appendix (Proofs of Propositions 1 - 4)

In order to prove the propositions, we first give an identity without proof. The identity is

(A.1)

where m is any positive integer, a and b are two real numbers, is an increasing and differentiable function, and

Proof of Proposition 1: The probability that at most k failures will occur in the interval is. When is known, we have

(A.2)

where is given by Equation (4) and

(A.3)

From Equation (2) we have and

(A.4)

Hence (A.3) becomes

(A.5)

And (A.2) becomes

(A.6)

Equation (A.6) implies the first formula of Equation (12).

When is unknown, noting that and are given by Equation (A.3) and Equation (8) respectively, we have

(A.7)

Equation (A.7) implies the second formula of Equation (12).

Proof of Proposition 2: Let denote the posterior density of. Hence the probability that the target value will be achieved at time is given by

. (A.8)

When is known, making the transformation , we have and. Consequently, the posterior density of is. This implies that which after simplification reduces to

(A.9)

We note that from Equation (A.9) follows a gamma distribution with parameters n and Noting the relationship between gamma and Poisson distributions as

(A.10)

and from Equations (A.8), (A.9) and (A.10), we obtain the first formula of Equation (13).
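The relationship between the gamma and Poisson distributions invoked here can be checked numerically. In its standard form, for a positive integer n, the distribution function of a gamma variable with shape n and rate r evaluated at x equals the probability that a Poisson variable with mean $rx$ is at least n. The sketch below compares a direct numerical integration of the gamma density with the Poisson tail; the function names and test values are hypothetical, and the exact form of Equation (A.10) in the paper may be written differently.

```python
import math


def gamma_cdf_numeric(x, n, rate, steps=20000):
    """Integrate the Gamma(shape=n, rate) density over [0, x], midpoint rule."""
    h = x / steps
    norm = rate**n / math.factorial(n - 1)
    total = 0.0
    for i in range(steps):
        u = (i + 0.5) * h
        total += norm * u ** (n - 1) * math.exp(-rate * u)
    return total * h


def poisson_tail(n, mu):
    """P(Poisson(mu) >= n) = 1 - sum_{j < n} exp(-mu) * mu**j / j!."""
    term, total = math.exp(-mu), 0.0
    for j in range(n):
        total += term
        term *= mu / (j + 1)
    return 1.0 - total


if __name__ == "__main__":
    # Hypothetical values for illustration only.
    n, rate, x = 5, 0.8, 6.0
    print(gamma_cdf_numeric(x, n, rate), poisson_tail(n, rate * x))
```

The two printed numbers agree up to the discretization error of the midpoint rule, which is why a gamma posterior probability can be written as a finite Poisson sum.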

When is unknown, making the transformation and, we obtain and. Note that the Jacobian is . From Equation (8), the joint posterior density of is given as

(A.11)

From Equation (A.8), Equation (A.10) and Equation (A.11) we obtain

(A.12)

Equation (A.12) implies the second formula of Equation (13).

Proof of Proposition 3: For given level, the time required to attain the target value is where satisfies Equation (A.8). When is known, from Equation (A.9), it can easily be seen that

follows a chi-square distribution with degrees of freedom. Therefore, we have

(A.13)

and Equation (14) follows immediately. We can obtain (ii) by following similar arguments given in the proof for the second part of Proposition 2.

Proof of Proposition 4: For a pre-determined , the Bayesian Upper Prediction Limit (UPL) for with level is satisfying. From Equation (A.8) and Equation (A.13)

we have. This implies that

(A.14)

Making the subject from Equation (A.14) we arrive at

(A.15)

Equation (A.15) is the exact formula in Equation (17).

The formula in Equation (18) can be obtained by similar arguments.