Journal of Software Engineering and Applications, 2013, 6, 65-73 http://dx.doi.org/10.4236/jsea.2013.62011 Published Online February 2013 (http://www.scirp.org/journal/jsea) 65 Using Genetic Algorithm as Test Data Generator for Stored PL/SQL Program Units Mohammad A. Alshraideh, Basel A. Mahafzah, Hamzeh S. Eyal Salman, Imad Salah The Department of Computer Science, The University of Jordan, Amman, Jordan. Email: mshridah@ju.edu.jo Received December 8th, 2012; revised January 9th, 2013; accepted January 20th, 2013 ABSTRACT PL/SQL is the most common language for ORACLE database application. It allows the developer to create stored pro- gram units (Procedures, Functions, and Packages) to improve software reusability and hide the complexity of the execu- tion of a specific operation behind a name. Also, it acts as an interface between SQL database and DEVELOPER. Therefore, it is important to test these modules that consist of procedures and functions. In this paper, a new genetic algorithm (GA), as search technique, is used in order to find the required test data according to branch criteria to test stored PL/SQL program units. The experimental results show that this was not fully achieved, such that the test target in some branches is not reached and the coverage percentage is 98%. A problem rises when target branch is depending on data retrieved from tables; in this case, GA is not able to generate test cases for this branch. Keywords: Genetic Algorithms; SQL Stored Program Units; Test Data; Structural Testing; SQL Exceptions 1. Introduction PL/SQL is an imperative third generation language (3GL) that was designed specifically for the processing of SQL commands. It provides specific syntax for this purpose and supports exactly the same data types as SQL. ORA- CLE database can be accessed by calling PL/SQL named block that include functions and procedures. Therefore, they must be executed properly in order to guarantee a reliable and confidence database system [1]. Software testing is an important stage of software de- velopment life cycle (SDLC). It is an activity that helps finding out bugs and errors in a software system that is under development in order to provide a bug free and reliable system/solution to the customer [2]. Testing has two main types based on the knowledge of the system: black box testing (functional) and white box testing (structural) [2-4]. The functional testing deals with the system as a black-box that does not explicitly use knowl- edge of the internal structure; which means it usually makes sure that the system is working according to the system requirements, while the structural testing gener- ates the test data depend on the knowledge of internal code of the system. During structural testing, the goal is to generate a test data which satisfy a given testing crite- rion to cover given elements of the program. In this pa- per, the branch coverage criterion is considered, where each branch of the program should be reached by some test data [2]. Generally, structural testing techniques are classified into two categories: static testing (manual) and dynamic testing (automatic). In the static testing, a code reviewer reads the source code statement by statement and visu- ally follows the logical program flow by feeding an in- put, so it is costly. In contrast, dynamic testing tech- niques execute the program under test on test input data and then simply observe the results. Consequently, dy- namic testing reduces the cost of software development and maintenance [2]. Search-based software testing is an example of dynamic method used to generate test set that can be successfully applied in structural testing. It relies on a cost function that can be used to compare candidate test data [5]. Genetic algorithms (GA) have been very interesting area of study in many disciplines, such as optimization, automatic programming, economics, immune systems, ecology and social systems. In this paper we apply the GA as a search technique to find test data to test named block in ORACLE; specifically, IF-statement and While- statement and their combinations are considered [6]. The rest of the paper is organized as follows: Section 2 presents background and related work. Section 3 presents a strategy for applying GA to test named block in ORA- CLE. Section 4 presents experimental environment and Section 5 presents experimental results. Finally, Section 6 concludes the paper. Copyright © 2013 SciRes. JSEA
Using Genetic Algorithm as Test Data Generator for Stored PL/SQL Program Units 66 2. Background and Related Work This section presents an overview of evolutionary algo- rithms; such as random test data generation and Hill Climbing, and meta-heuristic search algorithms which proposed a potential better alternative for developing test data generators [7,8]. Efficient existing meta-heuristic search algorithms include Simulated Annealing, Tabu Search, GA and Ant Colony Optimization. Each of these search algorithms has its own advantages and disadvan- tages over the others. They are strongly domain depend- ent problem, because they use domain dependent knowl- edge or heuristics related to the problem domain under consideration. Also in this section, stored program units are explained and an overview about Jordan University Hospital Computer systems is presented. 2.1. Random Test Data Generation Random test data generation is a technique based on se- lection test data randomly until the suitable test data is found. It only explores the search space by randomly selecting solutions and evaluating their fitness. This is quite an unintelligent strategy but it does not take much effort to be implemented [9]. 2.2. Hill Climbing Hill Climbing is a well known local search algorithm. Hill Climbing works to improve one solution, with an initial solution randomly chosen from the search space as a starting point. The neighborhood of this solution is in- vestigated. If a better solution is found, then the current solution is replaced. The neighborhood of the new solu- tion is then investigated. If a better solution is found, the current solution is replaced again, and so on, until no improved neighbors can be found for the current solution. Hill climbing is simple and gives fast results. However, it is easy for the search to yield sub-optimal results when the Hill Climbing leads to a solution that is locally opti- mal, but not globally [10]. 2.3. Simulated Annealing Simulated Annealing (SA) extends Hill Climbing such that it accepts poor solutions with low probability. SA allows for less restricted movement around the search space. The probability of acceptance (p) of an inferior solution changes as the search progresses, and is calcu- lated as in Equation (1) [11,12]. et p (1) where (δ) represents the difference in the objective value between the current solution and the neighboring inferior solution being considered, and (t) is a control parameter known as the temperature. The temperature is cooled according to a cooling schedule. Initially the temperature is high, in order to allow free movement around the search space. As the search progresses, the temperature decreases. However, if cooling is too rapid, not enough of the search space will be explored, and the chance of the search becoming stuck in the local optima is in- creased [12]. 2.4. The Principles of Genetic Algorithms The basic concepts of GAs are developed by Holland [13]. GAs is commonly applied to a variety of problems involving searching and optimization. GAs search meth- ods are rooted in the mechanisms of evolution and natu- ral genetics. GAs draw inspiration from the natural sea- rch and selection processes leading to the survival of the fittest individuals. GAs generates a sequence of popula- tions by using a selection mechanism, and use crossover and mutation as search mechanisms [14-16]. The principle behind GAs is that they create and main- tain a population of individuals represented by chromo- somes (essentially a character string analogous to the chromosomes appearing in DNA). These chromosomes are typically encoded solutions to a problem. The chro- mosomes then undergo a process of evolution according to the rules of selection, mutation and reproduction. Each individual in the environment (represented by a chromo- some) receives a measure of its fitness. Reproduction selects individuals with high fitness values in the popula- tion, and through crossover and mutation of such indi- viduals, a new population is derived in which individuals may be even better fitted to their environment. The proc- ess of crossover involves two chromosomes swapping chunks of data (genetic information) and is analogous to the process of sexual reproduction. Mutation introduces slight changes into a small proportion of the population and is representative of an evolutionary step. The struc- ture of a simple GA is given in Figure 1. The algorithm in Figure 1 will iterate until the population has evolved to form a solution to the problem, or until a maximum number of iterations have occurred. Simple Genetic algorithm ( ) { initialize population; evaluate population; while terminationcriterion not reached { select solutions for next population; perform crossover and mutation; evaluate population; } } Figure 1. The structure of a simple GA. Copyright © 2013 SciRes. JSEA
Using Genetic Algorithm as Test Data Generator for Stored PL/SQL Program Units 67 2.5. Stored PL/SQL Program Units There are three types of stored program units in PL/SQL; procedures, functions, and packages. Every stored pro- gram unit has a declarative part, an executable part or body and an exception handling part which is optional [1]. Declarative part contains variable declarations. Body of the named block contains executable statements of SQL and PL/SQL. Statements to handle exceptions are written in exception part. However, subprograms provide the following advantages [13]: They allow you to write PL/SQL program that meets our need. They allow you to break the program into manageable modules. They provide reusability and maintainability for the code. Procedure is a subprogram used to perform a specific action. A procedure contains two parts; specification and body. Procedure specification begins with the procedure name and ends with parameters list. Procedures that do not take parameters are written without a parenthesis. The body of the procedure starts after the reserved word (IS) or (AS) and ends with keyword END [17]. A func- tion is PL/SQL Block which is similar to a procedure. The major difference between a procedure and a function is the function must always return a value, but a proce- dure does not return a value [1,17]. A package is an en- capsulated collection of related program objects (for example, procedures, functions, variables, constants, cur- sors, and exceptions) stored together in the database. Using packages is an alternative to creating procedures and functions as standalone schema objects. Packages have many advantages over standalone procedures and functions. For example, they: Let you organize your application development more efficiently. Let you grant privileges more efficiently. Let you modify package objects without recompiling dependent schema objects. Enable Oracle Database to read multiple package ob- jects into memory at once. Can contain global variables and cursors that are available to all procedures and functions in the pack- age. Let you overload procedures or functions. Overload- ing a procedure means creating multiple procedures with the same name in the same package, each taking arguments of different number or data type. 2.6. Jordan University Hospital Computer System The core of Jordan University Hospital (JUH) informa- tion system is bought in 1994, and then the JUH IT team developed the Hospital Information System (HIS) using Oracle forms, and upgrades it to Oracle 10 g. HIS devel- oped to provide best medical services for patients and physicians. Delivering these services require hospitals to review the way they manage their business processes and supply more efficient features to physicians, patients, and hospitals officials as well as other decision makers. In order to provide such services, the health facility must focus on developing a solution to connect all its re- sources and makes it available to all who needs utilizing it using latest technology. This kind of solution will en- hance the performance and optimize the efficiency and will reduce the cost of ownership. IT department in JUH creates a solution suite that transforms the hospital to a community allowing the ac- cess to all resources and data as needed. HIS is a com- prehensive solution developed specifically for health facilities in the region. It is flexible, comprehensive, multilingual, integrated and secured solution that sup- ports clinical, financial, administration and higher man- agement needs. In general, a hospital management system can be sub categorize into the following groups (Figure 2): Medical Information System (Administrative and Clinical). Enterprise Resource Planning (ERP) (Material, Fi- nancial and Human Resources). Support System. Medical systems are developed to deliver all needed services to the hospital community (Physicians, Patients and Administration). The systems manage all patients’ data and information during their treatment episode in a professional and efficient manner. Medical systems stra- tegically support a full range of hospital functions. It contains a repository of all patients’ clinical, billing and Me dic al Sys tem Support Sys tem Medi cal System (Clinical) Enterprise Res ource Planning (ERP) Figure 2. Hospital management system sub-categorizes. Copyright © 2013 SciRes. JSEA
Using Genetic Algorithm as Test Data Generator for Stored PL/SQL Program Units 68 demographic data, reducing paper work, manual effort and errors. Furthermore; it allows for better staff utiliza- tion allowing for more time to focus on planning and goals achievements. This enables the hospital to provide better quality and more efficient services, needed by pa- tients and physicians. Medical systems are integrated with financial, administration, human resources, and ma- terial management systems. It contains vast collection of data including patient data, treatment data, hospital visit data, patient transactions data, hospital data, and statisti- cal information. HIS medical systems provide many key functions in- cluding: Medical administrative including: Patient master index Admission, discharge and transfer Scheduling and appointments Medical records Medical reports Medical statistics Catering Order entry and results communication Medical clinical including: Out-patient clinics Accidents and emergency Operation theater Maternity Doctors desktop Nurse station Laboratory Radiology Pharmacy Patient accounting including: Pricing and package deals Patient billing Insurance contract management Claims management In order to test our system in this research we will se- lect different procedures and functions, which will be described later in this paper. 3. A Strategy for Applying GA to Test Stored PL/SQL Program Units In general, the process of automatic structural test data generation for branch coverage consists of three major steps [7,18,19]: 1) Construction of control logic graph, e.g. control flow graph (CFG) or control dependency graph (CDG). 2) Selection the target according to branch coverage criterion. 3) Finding out a set of test data that satisfies the se- lected adequacy criterion. In order to use GA for solving an optimization prob- lem, there are multiple issues must be considered such as: how to build the fitness function and how to represent the problem in a chromosome expression (individual), i.e. sort of a sequence of binary digits that resembles the chromosome sequence, which GA can understand and manipulate. GA works on this encoded problem and de- livers the result as the problem solution; hence, the user should provide the meaning of the encoded problem [20, 21]. In this paper integer vector and binary string repre- sentation will be considered. 3.1. Branch Cost Functions To use a control dependency path or any set of branches as a search goal, it is necessary to determine the cost values for each branch predicate. To accomplish this, each conditional node in the program is associated with a real-valued predicate cost function that is evaluated whenever the conditional node is executed. This predi- cate cost function returns a positive value whenever the predicate is false and a negative value if the predicate is true. The cost of an evaluation of a logical negation of a predicate is the arithmetic negation of the cost of the evaluation of the predicate. Each reached branch main- tains two cost values, both derived from the associated predicate cost function. One cost value is the cost that all attempts to execute the branch are successful. This is called the cumulative and-cost. The other cost value is the cost when any attempt is successful, is called the cu- mulative or-cost. These costs can be illustrated with an example showing three failed and two successful at- tempts to execute the predicate a ≤ b for various integer values of a and b (Table 1). The predicate cost function is a – b when the predicate is false and a – b – 1 when the predicate is true. The cost function of or-cost and and-cost are shown in Table 1, where a and b are posi- tive (fal se), and a` and b` are negative (true), also a and b are never zero. The cost values produced by relational predicates are normalized, but the un-normalized values are used in Table 2. The cost of a conjunction of two false costs is the sum of the costs of the conjuncts. However, the cost of a disjunction of two false costs (Costd) is shown in Equation (2), where P and Q are the disjunct costs. Costd PQ PQ (2) Table 1. Logical or-cost and logical and-cost table. a b or-cost and-cost a b (ab)/(a + b) a + b a b` b` a a` b a` b a` b` a` + b` (a`b`)/(a` + b`) Copyright © 2013 SciRes. JSEA
Using Genetic Algorithm as Test Data Generator for Stored PL/SQL Program Units 69 Table 2. Cumulative or-cost and and-cost for the predicate a ≤ b for the listed values. a b cost or-cost and-cost 8 3 5 5 5 6 3 3 15/8 8 5 3 2 30/31 10 3 3 −1 −1 10 1 3 −3 −4 10 These costs can be illustrated with an example show- ing three failed and two successful attempts to execute the predicate a ≤ b for various integer values of a and b as shown in Table 2. Note that when both branches at a conditional node have been executed, the and-cost is positive and the or-cost is negative. Moreover, the mag- nitude of the and-cost is an indication of the number and magnitude of the failures to satisfy the predicate. A high and-cost indicates that the predicate has hardly been sat- isfied. A low and-cost indicates that the predicate has not been satisfied. The cost of a search goal is calculated, according to Equation (2), as the conjunction of the indi- vidual branch goal costs. Each individual branch cost is either a branch or-cost or a branch and-cost. Using this method, there is a disadvantage that a single large branch cost may dominate the overall cost value. Normalization or costs reduces this risk. The cost values produced by relational predicates (Costr) are normalized to lie within [−1, 1] using Equation (3), where c is the branch distance value. 1 1if 1 1 Cost1if0 1 0other r c c c c 0 wise (3) An alternative method, not used in the work reported here, is to compute a cost consisting of two components. One component, counts the branch goals that have yet to be satisfied. This cost component is the analogue to the ones used by Wegener et al. [22]. The second component is applicable only if the first component is nonzero and it is calculated as the disjunction of the unsatisfied branch goals. For each branch, there are two associated branch search goals that may be specified to guide a search, namely branch-or (on at least one occasion that the branch is reached, and it is executed) and branch -and (on every occasion that the branch is reached, and it is executed). A branch goal is satisfied if the associated or- cost or and-cost is negative. If the execution of a branch is required to satisfy branch coverage or a control dependency condition then branch- or is the relevant branch goal. Note that the goal of sat- isfying the branch or-cost is not adopted primarily to avoid creating an excessive number of sub-goals. More- over, if it is necessary to find an input that executes both branches at a predicate, it is hoped that such an input will be found during the search for an input to satisfy the and-cost. Recall that, the and-cost is adopted as the branch goal after one branch at a predicate has already been executed. The above rules are applied to non-loop branches only. Loops are treated different than if-state- ment, because for programs that terminate loop entries are eventually followed by a loop exit. For this reason, a sub- goal that specifies a loop predicate is always true is not sensible and thus the two possible branch goals at a loop condition are loop entry and no loop entry. 4. Experimental Environment An experimental study was designed to feature test goals that cause problems for evolutionary testing. The exper- imental study featured JUH real six test objects. These objects are drawn from the system applied at JUH hospi- tal. 4.1. Test Objects This section describes the test objects and the input do- main sizes used. The following are source code for the test objects: OutPricing: Determines the pricing of treatments at outpatient clinics. Depending on his insurance the pa- tient, this function calculates the amount of money the patient has to pay (depending on the type of in- surance, the patient pays different ratios for his treat- ment) and the amount of money the insurance has to pay for the patient’s treatment. InPricing: This procedure calculates the invoice value of the patient inside the hospital based on the type of patient insurance and the type of medical procedure offered to patients (accommodation, scout- ing, doctors’ fees, operations, laboratory, radiology, medicine, etc.). Also, this program calculates the per- centage paid by the patient and the percentage paid by the insurance company, if any. Moreover, this func- tion bills the patient with the amount of money he has to pay and bills the insurance company with the amount of the money it has to pay. JU-Med-fees-deduction: This package used for Jor- dan University staff, where there is an allocated ac- count number for each staff in the system of JUH. It calculates bill value based on Jordan University in- surance. Then deported the total amount of bill after deduct the hand-collect from the patients into tables to be used in Jordan University financial department Copyright © 2013 SciRes. JSEA
Using Genetic Algorithm as Test Data Generator for Stored PL/SQL Program Units 70 later on. Pat-info-ibr: This function calculates the invoice val- ue for private patients (in patients and out patients), then bills the patients with the amount of money he or she has to pay. Lab-interface: The main goal of this function is to transfer the results from medical machines (lab de- vices) to HIS system automatically (without user in- teraction). So, the function receives the message from medical devices then converts it to be entered to HIS system. Salup_new_calc_all: This procedure calculates staff incentives as follows: it selects the category that owns the nursing, administrative, officer or a medical tech- nician, by the department and qualifications. Then it determines the share of the incentives that the em- ployee is entitled, as his career (Branch Chief, Chief, Division of, etc.). It discounts days leave without pay from the employee share incentives. 4.2. Hardware and Software Environment In this section, the specifications of the experimental environment utilized by this work are presented. These specifications include both hardware and software mod- ules used in implementing the simulator. More specifi- cally, the hardware specifications that are used in the experiments include a Dual-Core Intel Processor (CPU 2.66 GHz), 2 MB L2 Cache per CPU, and 1 GB RAM. Moreover, the software specifications that are used in the experiments include windows XP. Also, the tested pro- grams that have been used to evaluate this algorithm are described in this section. Moreover, in order to assess the reliability of the cost functions introduced in the previous Section 4.1, an em- pirical investigation was done. A number of test pro- grams were assembled from JUH system including func- tions and procedures. These programs are described in Table 3. The size of each program is given as Lines of Code (LOC), number of branches, where the number of input variables ranges from 3 to 21, as shown in Table 3. The programs have been selected from JUH HIS system. Figure 3 shows a cyclomatic complexity for each pro- gram. The cyclomatic complexity metric is described by Watson and McCabe [23], which provides an objective measure of the complexity of a given module of a pro- gram code by examining its decision structure. Cyclo- matic complexity is calculated as e – n + 2, where n is the number of nodes in a graph and e is the number of edges between nodes, or we can calculate cyclomatic complexity as P + 1, where P is the number of predicate nodes in the flow graph (While and If statements). Pre- dicate nodes are those representing control structures and have one or more edges emanating from them. Cyc- Table 3. The functions used for empirical investigation. Program name Lines of code Number of branches Number of input variables OutPricing 295 116 14 InPricing 362 148 13 JU-Med-fees-deduction 307 92 4 Pat-info-ibr 259 48 3 Lab-interface 1389 538 21 Salup_new_calc_all 707 326 6 Figure 3. A cyclomatic complexity of the test programs. lomatic complexity gives an upper bound on the number of test cases required to cover all feasible branches if collateral coverage is taken into account. Each program has special characteristics to investigate the performance of GA as test data generator in order to test named block in ORACLE. Figure 3 shows a cyclomatic complexity of the test programs. The range of program’s cyclomatic complexity is between 25 and 270. These programs are available from the authors on re- quest. Each of the cost functions and associated search operators were implemented in a prototype test data gen- eration tool. The tool has been constructed by modifying the JScript (JavaScript) language compiler within the Shared Source Common Language Infrastructure (SSCLI) and can therefore be used to test PL/SQL Unit by passing test cases as parameters to these units, while these pro- grams are connected to JUH HIS. The program must in- clude directives to specify any input domain constraints that are to be applied. The tool then inserts instrumenta- tion code at each branch in the function. This instrument- tation code calculates the cost of each branch predicate whenever it is executed. The cost of each relational predicate expression was calculated according to the cost functions given in the previous Section 3.1. Where bran- ch predicate expressions consist of two or more relational predicates joined by logical connectives, and, or and not, Copyright © 2013 SciRes. JSEA
Using Genetic Algorithm as Test Data Generator for Stored PL/SQL Program Units Copyright © 2013 SciRes. JSEA 71 any of the programs are higher of those typical programs that would used in unit testing, although the number of branches is probably higher than usual, which is why it were selected for the experiment. and the cost values were combined according to the sch- eme given in Bottaci [24]. The search was directed to generate data for one bran- ch at a time. The order in which the branches of the pro- gram were targeted was arbitrary, except that no nested branch was targeted before the containing branch. 5. Experimental Results and Discussions A steady-state style genetic algorithm, similar to Geni- tor [25], was used in this work. The cost function values computed for each candidate input were used to rank candidates within the population in which no duplicate genotypes are allowed. A probabilistic selection function selected parent candidates from the population with a probability based on their rank, where the highest rank- ing having the highest probability. More specifically, for a population of size n, the probability of selection (Ps) is shown in Equation (4). This section presents the results of the experiments that have been carried out to evaluate the effectiveness of our GA. Table 4 shows the number of subject program exe- cutions required by each genetic algorithm over 20 trials, where >50,000 means that the cost function not able to cover all the branches within this criteria. Also, in this table branch coverage ratio is shown, which is defined as the following: number ofbranch executed total numberofbranches 2rank1 1 s n Pnn (4) The branch coverage ratio ranged from 94% in JU- Med-fees-deduction program to 100% in Pat-info-ibr program. The total average of branch coverage for all programs is: In this work, a fixed population size of 100 was used. This parameter was not “tuned” to suit any particular program under test. In a steady state update style of ge- netic algorithms (as used in this work); new individuals that are sufficiently fit are inserted in the population as soon as they are created. Full branch coverage was at- tempted for each of the programs under test. Each branch was taken as individual target of the search, unless it was fortuitously covered during the search for test data for another branch. Genetic algorithms search generates in- puts for the function containing the current structural target. A vector of floating point, integer, characters, and string variable values corresponding to the input data is optimized. The ranges of each variable are specified. The test subject is then called with this input data. The crite- rion to stop the search was set up to terminate the search after 50,000 executions of the program under test, when only if full coverage was not achieved. Individuals were recombined using binary and real-valued (one-point and uniform) recombination, and mutated using real-valued mutation. Real-valued mutation was performed using “Gaussian distribution” and “number creep”. The size of 1242 98% 1268 Analyzing our results, we found that a 100% of condi- tion-decision coverage is impossible to reach in some test programs because there are conditions that cannot be true or false in certain situations. So, a weakness of GA could be observed in the generation of test cases to cover branch, especially when there is a strong dependency in data (records) retrieved from table, such that a specific order is required. As long as there are only few records that play an important role to satisfy a certain condition, it is possible to find adequate test scenarios. For example, this could be observed while applying GA on OutPric- ing function, as shown in Figure 4, in line 21 insur ance_status to be equal to 7, this only happen if line 18executed and this happen only if line 16 execute branch to be true (execute insert command and insur- ance_status = 7) this happen only when dummy variable is equal to “p” in line 2 and this is depends on the value Table 4. The number of branch covered and uncovered with 50,000 executions of each program. Program Name Number of branch covered Number of branch uncovered Branch coverage ratio Number of subject program executions OutPricing 112 4 96% >50,000 InPricing 144 2 99% >50,000 JU-Med-fees-deduction 87 5 94% >50,000 Pat-info-ibr 48 0 100% 11,680 Lab-interface 527 11 98% >50,000 salup_new_calc_all 321 5 98% >50,000
Using Genetic Algorithm as Test Data Generator for Stored PL/SQL Program Units 72 …[1] select p_insur_type into dummy from pricing where … [2] …[3] select decode(p_insur_type, '1', prc_limit_out_e, '2', prc_limit_out_f) [4] into p_max_cov[5] from prc_limits [6] where prc_group = p_group_id[7] and prc_division = p_div; [8] p_max_cov := nvl(p_max_cov, 99999);[9] exception [10] when no_data_found then[11] p_error_no := 1; raise exit_proc; [12] when others then[13] p_error_no := 11; raise exit_proc; [14] ….[15] if dummy = 'p' then [16] insert into ….[17] insurance_status :=7; [18] end if;[19] …[20] if insurance_status =7 then [21] update out_invoice[22] set … [23] end if;[24] when exit_proc then [25] if p_error_no = 1 then[26] raise_application_error( -20001,' Coverage Limits do not Exits'); [27] elsif p_error_no = 2 then[28] raise_application_error( -20003, ' Rate pricing does not exits for this materials!!'); [29] elsif p_error_no = 3 then[30] raise_application_error( -20004, ' Material is not Defined in the Table Price !!'); [31] elsif p_error_no = 5 then[32] raise_application_error( -20001, 'Pricing Data is incomplte for this patient!!'); [33] elsif p_error_no = 11 then[34] … [35] end if; [36] Figure 4. OutPricing program fragment. retrieved from pricing table, also in line 20 to execute branch to be true this happen only when line 12 is exe- cuted. In this case, where the target of the search is node 20 to be true, the fact that p_error_no needs to be 1 at line 12 and this happen only when the select statement in line 4 executes and exception no_data_found rose. This also is applied to lines 22, 24, 26, 28, 30 and 32. These situations are occurred in all other programs apart from Pat-info-ibr, where GA generates test data for all branch. In these cases, the test generator cannot reach the 100% of coverage due to the test program itself. With respect to the program Pat-info-ibr, there are branch conditions depend on data retrieved from table, but by coincidence, these branches are covered. This problem become more difficult when there is more than branch depends on un- covered branch. In Figure 5, we notice that all branches are covered, the number of execution of program ranged from 11,680 for Pat-info-ibr to 36,134 for salup_new_ calc_all. 6. Conclusions and Future Work In this paper we present how GA can be used as test data generator to find suitable test data according to branch criteria to test stored program units (procedures, func- tions, and packages) in ORACLE. Selected procedures, functions, and packages from Jordan University Hospital Figure 5. The number of executions required to find test data to achieve branch coverage after excluding uncovered branches in Table 4. Information System are used to test GA. The experi- mental results show that the test target in all programs under test is not reached and that the average coverage ratio percentage is 98%. A problem occurs when the tar- get branch depends on data retrieved from oracle tables. That GA cannot generate test data to execute the target branch that depends on data retrieved from tables. The future work will be focused on testing SQL exceptions, SQL statements, and combinations between branch coverage criteria and SQL commands testing try- Copyright © 2013 SciRes. JSEA
Using Genetic Algorithm as Test Data Generator for Stored PL/SQL Program Units 73 ing to increase the coverage ratio to 100%. REFERENCES [1] M. G. Alshraideh and L. Bottaci, “Search-Based Software Test Data Generation for String Data Using Program- Specific Search Operators,” Special Issue of Software Testing, Verification and Reliability Devoted to Extended Papers from the Third UK Testing Conference (UKTest 2005), Vol. 16, No. 3, 2006, pp. 175-203. [2] M. Alshraideh, B. A. Mahafzah and S. Al-Sharaeh, “A Multiple-Population Genetic Algorithm for Branch Cov- erage Test Data Generation,” Software Quality Control, Vol. 19, No. 3, 2011, pp. 489-513. doi:10.1007/s11219-010-9117-4 [3] M. Alshraideh, L. Bottaci and B. A. Mahafzah, “Using Program Data-State Scarcity to Guide Automatic Test Data Generation,” Software Quality Control, Vol. 18, No. 1, 2010, pp. 109-144. doi:10.1007/s11219-009-9083-x [4] A. Baresel, H. Pohlheim and S. Sadeghipour, “Structural and Functional Sequence Test of Dynamic and State- Based Software with Evolutionary Algorithms,” Pro- ceedings of the Genetic and Evolutionary Computation Conference, Chicago, 12-16 July 2003, pp. 2428-2441. [5] B. Korel, “Automated Test Generation for Programs with Procedures,” Proceedings of the International Symposium on Software Testing and Analysis, San Diego, 8-10 Janu- ary 1996, pp. 209-215. [6] S. N. Sivanandam and S. N. Deepa, “Introduction to Ge- netic Algorithms,” 1st Edition, Springer, New York, 2010. [7] C. C. Michael, G. E. McGraw and M. A. Schatz, “Gener- ating Software Test Data by Evolution,” IEEE Transac- tions on Software Engineering, Vol. 27, No. 12, 2001, pp. 1085-1110. doi:10.1109/32.988709 [8] B. Korel, “Automated Software Test Data Generation,” IEEE Transactions on Software Engineering, Vol. 16, No. 8, 1990, pp. 870-879. doi:10.1109/32.57624 [9] J. A. Edvardsson, “Survey on Automatic Test Data Gen- eration,” Proceedings of the Second Conference on Com- puter Science and Engineering, Linkoping, 21-22 October 1999, pp. 21-28. [10] J. Duran and S. Ntafos, “An Evaluation of Random Test- ing,” IEEE Transactions on Software Engineering, Vol. 10, No. 4, 1984, pp. 438-444. doi:10.1109/TSE.1984.5010257 [11] M. Harman and P. McMinn, “A Theoretical & Empirical Analysis of Evolutionary Testing and Hill Climbing for Structural Test Data Generation,” Proceedings of the 2007 International Symposium on Software Testing and Analysis, London, 9-12 July 2007, pp. 73-83. doi:10.1145/1273463.1273475 [12] P. McMinn, “Search-Based Software Test Data Genera- tion: A Survey: Research Articles,” Software Testing, Verification & Reliability, Vol. 14, No. 2, 2004, pp. 105- 156. doi:10.1002/stvr.294 [13] H. S. Eyal Salman, “Using Genetic Algorithm in Test Data Generation for ORACLE Named Block,” Master Thesis, The University of Jordan, Amman, 2010. [14] H. W. Arthur and J. Thomas, “Structured Testing: A Test- ing Methodology Using the Cyclomatic Complexity Met- ric,” National Institute of Standards, Gaithersburg, 1996. [15] M. Mitchell, “An Introduction to Genetic Algorithms,” 1st Edition, Massachusetts Institute of Technology, Cam- bridge, London 1996. [16] M. Srinivas and L. M. Patnaik, “Genetic Algorithms: A Survey,” IEEE Computer, Vol. 27, No. 6, 1994, pp. 17- 26. doi:10.1109/2.294849 [17] S. Kirkpatrick, C. D. Gellat and M. P. Vecchi, “Optimiza- tion by Simulated Annealing,” Science, Vol. 220, No. 4598, 1983, pp. 671-680. doi:10.1126/science.220.4598.671 [18] S. Urman, R. Hardman and M. McLaughlin, “Oracle Database 10g Pl/SQL Programming,” 1st Edition, McGraw- Hill, New York, 2004. [19] A. H. Watson and T. J. McCabe, “Structured Testing: A Testing Methodology Using the Cyclomatic Complexity Metric,” NIST Special Publication, No. 500-235. Natio- nal Institute of Standards and Technology, Gaithersburg, 1996. [20] N. J. Tracey, J. Clark, K. Mander and J. McDermid, “An Automated Framework for Structural Test-Data Genera- tion,” Proceedings 13th IEEE Conference in Automated Software Engineering, Hawaii, 13-16 October 1998, pp. 285-288. [21] R. Pargas, M. Harrold and R. Peck, “Test-Data Genera- tion Using Genetic Algorithms,” Software Testing, Veri- fication and Reliability, Vol. 9, No. 4, 1999, pp. 263-282. doi:10.1002/(SICI)1099-1689(199912)9:4<263::AID-ST VR190>3.0.CO;2-Y [22] J. Wegener, A. Baresel and H. Sthamer, “Evolutionary Test Environment for Automatic Structural Testing,” In- formation and Software Technology, Vol. 43, No. 14, 2001, pp. 841-854. doi:10.1016/S0950-5849(01)00190-2 [23] J. Holland, “Adaptation in Natural and Artificial Sys- tems,” University of Michigan Press, Ann Arbor, 1975. [24] L. Bottaci, “Predicate Expression Cost Functions to Guide Evolutionary Search for Test Data,” Genetic and Evolutionary Computation Conference (GECCO 2003), Chicago, 12-16 July 2003, pp. 2455-2464. doi:10.1007/3-540-45110-2_149 [25] D. Whitley, “The genitor Algorithm and Selective Pres- sure: Why Rank-Based Allocation of Reproductive Trials Is Best,” Proceedings of the Third International Confer- ence on Genetic Algorithms (ICGA-89), 1989, Morgan Kaufmann Publishers Inc., San Francisco, pp. 116-121. Copyright © 2013 SciRes. JSEA
|