Journal of Software Engineering and Applications, 2013, 6, 65-73
http://dx.doi.org/10.4236/jsea.2013.62011 Published Online February 2013 (http://www.scirp.org/journal/jsea)
65
Using Genetic Algorithm as Test Data Generator for
Stored PL/SQL Program Units
Mohammad A. Alshraideh, Basel A. Mahafzah, Hamzeh S. Eyal Salman, Imad Salah
The Department of Computer Science, The University of Jordan, Amman, Jordan.
Email: mshridah@ju.edu.jo
Received December 8th, 2012; revised January 9th, 2013; accepted January 20th, 2013
ABSTRACT
PL/SQL is the most common language for ORACLE database application. It allows the developer to create stored pro-
gram units (Procedures, Functions, and Packages) to improve software reusability and hide the complexity of the execu-
tion of a specific operation behind a name. Also, it acts as an interface between SQL database and DEVELOPER.
Therefore, it is important to test these modules that consist of procedures and functions. In this paper, a new genetic
algorithm (GA), as search technique, is used in order to find the required test data according to branch criteria to test
stored PL/SQL program units. The experimental results show that this was not fully achieved, such that the test target in
some branches is not reached and the coverage percentage is 98%. A problem rises when target branch is depending on
data retrieved from tables; in this case, GA is not able to generate test cases for this branch.
Keywords: Genetic Algorithms; SQL Stored Program Units; Test Data; Structural Testing; SQL Exceptions
1. Introduction
PL/SQL is an imperative third generation language (3GL)
that was designed specifically for the processing of SQL
commands. It provides specific syntax for this purpose
and supports exactly the same data types as SQL. ORA-
CLE database can be accessed by calling PL/SQL named
block that include functions and procedures. Therefore,
they must be executed properly in order to guarantee a
reliable and confidence database system [1].
Software testing is an important stage of software de-
velopment life cycle (SDLC). It is an activity that helps
finding out bugs and errors in a software system that is
under development in order to provide a bug free and
reliable system/solution to the customer [2]. Testing has
two main types based on the knowledge of the system:
black box testing (functional) and white box testing
(structural) [2-4]. The functional testing deals with the
system as a black-box that does not explicitly use knowl-
edge of the internal structure; which means it usually
makes sure that the system is working according to the
system requirements, while the structural testing gener-
ates the test data depend on the knowledge of internal
code of the system. During structural testing, the goal is
to generate a test data which satisfy a given testing crite-
rion to cover given elements of the program. In this pa-
per, the branch coverage criterion is considered, where
each branch of the program should be reached by some
test data [2].
Generally, structural testing techniques are classified
into two categories: static testing (manual) and dynamic
testing (automatic). In the static testing, a code reviewer
reads the source code statement by statement and visu-
ally follows the logical program flow by feeding an in-
put, so it is costly. In contrast, dynamic testing tech-
niques execute the program under test on test input data
and then simply observe the results. Consequently, dy-
namic testing reduces the cost of software development
and maintenance [2]. Search-based software testing is an
example of dynamic method used to generate test set that
can be successfully applied in structural testing. It relies
on a cost function that can be used to compare candidate
test data [5].
Genetic algorithms (GA) have been very interesting
area of study in many disciplines, such as optimization,
automatic programming, economics, immune systems,
ecology and social systems. In this paper we apply the
GA as a search technique to find test data to test named
block in ORACLE; specifically, IF-statement and While-
statement and their combinations are considered [6].
The rest of the paper is organized as follows: Section 2
presents background and related work. Section 3 presents
a strategy for applying GA to test named block in ORA-
CLE. Section 4 presents experimental environment and
Section 5 presents experimental results. Finally, Section
6 concludes the paper.
Copyright © 2013 SciRes. JSEA
Using Genetic Algorithm as Test Data Generator for Stored PL/SQL Program Units
66
2. Background and Related Work
This section presents an overview of evolutionary algo-
rithms; such as random test data generation and Hill
Climbing, and meta-heuristic search algorithms which
proposed a potential better alternative for developing test
data generators [7,8]. Efficient existing meta-heuristic
search algorithms include Simulated Annealing, Tabu
Search, GA and Ant Colony Optimization. Each of these
search algorithms has its own advantages and disadvan-
tages over the others. They are strongly domain depend-
ent problem, because they use domain dependent knowl-
edge or heuristics related to the problem domain under
consideration. Also in this section, stored program units
are explained and an overview about Jordan University
Hospital Computer systems is presented.
2.1. Random Test Data Generation
Random test data generation is a technique based on se-
lection test data randomly until the suitable test data is
found. It only explores the search space by randomly
selecting solutions and evaluating their fitness. This is
quite an unintelligent strategy but it does not take much
effort to be implemented [9].
2.2. Hill Climbing
Hill Climbing is a well known local search algorithm.
Hill Climbing works to improve one solution, with an
initial solution randomly chosen from the search space as
a starting point. The neighborhood of this solution is in-
vestigated. If a better solution is found, then the current
solution is replaced. The neighborhood of the new solu-
tion is then investigated. If a better solution is found, the
current solution is replaced again, and so on, until no
improved neighbors can be found for the current solution.
Hill climbing is simple and gives fast results. However, it
is easy for the search to yield sub-optimal results when
the Hill Climbing leads to a solution that is locally opti-
mal, but not globally [10].
2.3. Simulated Annealing
Simulated Annealing (SA) extends Hill Climbing such
that it accepts poor solutions with low probability. SA
allows for less restricted movement around the search
space. The probability of acceptance (p) of an inferior
solution changes as the search progresses, and is calcu-
lated as in Equation (1) [11,12].
et
p
(1)
where (δ) represents the difference in the objective value
between the current solution and the neighboring inferior
solution being considered, and (t) is a control parameter
known as the temperature. The temperature is cooled
according to a cooling schedule. Initially the temperature
is high, in order to allow free movement around the
search space. As the search progresses, the temperature
decreases. However, if cooling is too rapid, not enough
of the search space will be explored, and the chance of
the search becoming stuck in the local optima is in-
creased [12].
2.4. The Principles of Genetic Algorithms
The basic concepts of GAs are developed by Holland
[13]. GAs is commonly applied to a variety of problems
involving searching and optimization. GAs search meth-
ods are rooted in the mechanisms of evolution and natu-
ral genetics. GAs draw inspiration from the natural sea-
rch and selection processes leading to the survival of the
fittest individuals. GAs generates a sequence of popula-
tions by using a selection mechanism, and use crossover
and mutation as search mechanisms [14-16].
The principle behind GAs is that they create and main-
tain a population of individuals represented by chromo-
somes (essentially a character string analogous to the
chromosomes appearing in DNA). These chromosomes
are typically encoded solutions to a problem. The chro-
mosomes then undergo a process of evolution according
to the rules of selection, mutation and reproduction. Each
individual in the environment (represented by a chromo-
some) receives a measure of its fitness. Reproduction
selects individuals with high fitness values in the popula-
tion, and through crossover and mutation of such indi-
viduals, a new population is derived in which individuals
may be even better fitted to their environment. The proc-
ess of crossover involves two chromosomes swapping
chunks of data (genetic information) and is analogous to
the process of sexual reproduction. Mutation introduces
slight changes into a small proportion of the population
and is representative of an evolutionary step. The struc-
ture of a simple GA is given in Figure 1. The algorithm
in Figure 1 will iterate until the population has evolved
to form a solution to the problem, or until a maximum
number of iterations have occurred.
Simple Genetic algorithm ( )
{
initialize population;
evaluate population;
while terminationcriterion not reached
{
select solutions for next population;
perform crossover and mutation;
evaluate population;
}
}
Figure 1. The structure of a simple GA.
Copyright © 2013 SciRes. JSEA
Using Genetic Algorithm as Test Data Generator for Stored PL/SQL Program Units 67
2.5. Stored PL/SQL Program Units
There are three types of stored program units in PL/SQL;
procedures, functions, and packages. Every stored pro-
gram unit has a declarative part, an executable part or
body and an exception handling part which is optional
[1]. Declarative part contains variable declarations. Body
of the named block contains executable statements of
SQL and PL/SQL. Statements to handle exceptions are
written in exception part. However, subprograms provide
the following advantages [13]:
They allow you to write PL/SQL program that meets
our need.
They allow you to break the program into manageable
modules.
They provide reusability and maintainability for the
code.
Procedure is a subprogram used to perform a specific
action. A procedure contains two parts; specification and
body. Procedure specification begins with the procedure
name and ends with parameters list. Procedures that do
not take parameters are written without a parenthesis.
The body of the procedure starts after the reserved word
(IS) or (AS) and ends with keyword END [17]. A func-
tion is PL/SQL Block which is similar to a procedure.
The major difference between a procedure and a function
is the function must always return a value, but a proce-
dure does not return a value [1,17]. A package is an en-
capsulated collection of related program objects (for
example, procedures, functions, variables, constants, cur-
sors, and exceptions) stored together in the database.
Using packages is an alternative to creating procedures
and functions as standalone schema objects. Packages
have many advantages over standalone procedures and
functions. For example, they:
Let you organize your application development more
efficiently.
Let you grant privileges more efficiently.
Let you modify package objects without recompiling
dependent schema objects.
Enable Oracle Database to read multiple package ob-
jects into memory at once.
Can contain global variables and cursors that are
available to all procedures and functions in the pack-
age.
Let you overload procedures or functions. Overload-
ing a procedure means creating multiple procedures
with the same name in the same package, each taking
arguments of different number or data type.
2.6. Jordan University Hospital Computer
System
The core of Jordan University Hospital (JUH) informa-
tion system is bought in 1994, and then the JUH IT team
developed the Hospital Information System (HIS) using
Oracle forms, and upgrades it to Oracle 10 g. HIS devel-
oped to provide best medical services for patients and
physicians. Delivering these services require hospitals to
review the way they manage their business processes and
supply more efficient features to physicians, patients, and
hospitals officials as well as other decision makers. In
order to provide such services, the health facility must
focus on developing a solution to connect all its re-
sources and makes it available to all who needs utilizing
it using latest technology. This kind of solution will en-
hance the performance and optimize the efficiency and
will reduce the cost of ownership.
IT department in JUH creates a solution suite that
transforms the hospital to a community allowing the ac-
cess to all resources and data as needed. HIS is a com-
prehensive solution developed specifically for health
facilities in the region. It is flexible, comprehensive,
multilingual, integrated and secured solution that sup-
ports clinical, financial, administration and higher man-
agement needs.
In general, a hospital management system can be sub
categorize into the following groups (Figure 2):
Medical Information System (Administrative and
Clinical).
Enterprise Resource Planning (ERP) (Material, Fi-
nancial and Human Resources).
Support System.
Medical systems are developed to deliver all needed
services to the hospital community (Physicians, Patients
and Administration). The systems manage all patients’
data and information during their treatment episode in a
professional and efficient manner. Medical systems stra-
tegically support a full range of hospital functions. It
contains a repository of all patients’ clinical, billing and
Me dic al
Sys tem
Support
Sys tem
Medi cal
System
(Clinical)
Enterprise
Res ource
Planning
(ERP)
Figure 2. Hospital management system sub-categorizes.
Copyright © 2013 SciRes. JSEA
Using Genetic Algorithm as Test Data Generator for Stored PL/SQL Program Units
68
demographic data, reducing paper work, manual effort
and errors. Furthermore; it allows for better staff utiliza-
tion allowing for more time to focus on planning and
goals achievements. This enables the hospital to provide
better quality and more efficient services, needed by pa-
tients and physicians. Medical systems are integrated
with financial, administration, human resources, and ma-
terial management systems. It contains vast collection of
data including patient data, treatment data, hospital visit
data, patient transactions data, hospital data, and statisti-
cal information.
HIS medical systems provide many key functions in-
cluding:
Medical administrative including:
Patient master index
Admission, discharge and transfer
Scheduling and appointments
Medical records
Medical reports
Medical statistics
Catering
Order entry and results communication
Medical clinical including:
Out-patient clinics
Accidents and emergency
Operation theater
Maternity
Doctors desktop
Nurse station
Laboratory
Radiology
Pharmacy
Patient accounting including:
Pricing and package deals
Patient billing
Insurance contract management
Claims management
In order to test our system in this research we will se-
lect different procedures and functions, which will be
described later in this paper.
3. A Strategy for Applying GA to Test
Stored PL/SQL Program Units
In general, the process of automatic structural test data
generation for branch coverage consists of three major
steps [7,18,19]:
1) Construction of control logic graph, e.g. control
flow graph (CFG) or control dependency graph (CDG).
2) Selection the target according to branch coverage
criterion.
3) Finding out a set of test data that satisfies the se-
lected adequacy criterion.
In order to use GA for solving an optimization prob-
lem, there are multiple issues must be considered such
as: how to build the fitness function and how to represent
the problem in a chromosome expression (individual),
i.e. sort of a sequence of binary digits that resembles the
chromosome sequence, which GA can understand and
manipulate. GA works on this encoded problem and de-
livers the result as the problem solution; hence, the user
should provide the meaning of the encoded problem [20,
21]. In this paper integer vector and binary string repre-
sentation will be considered.
3.1. Branch Cost Functions
To use a control dependency path or any set of branches
as a search goal, it is necessary to determine the cost
values for each branch predicate. To accomplish this,
each conditional node in the program is associated with a
real-valued predicate cost function that is evaluated
whenever the conditional node is executed. This predi-
cate cost function returns a positive value whenever the
predicate is false and a negative value if the predicate is
true. The cost of an evaluation of a logical negation of a
predicate is the arithmetic negation of the cost of the
evaluation of the predicate. Each reached branch main-
tains two cost values, both derived from the associated
predicate cost function. One cost value is the cost that all
attempts to execute the branch are successful. This is
called the cumulative and-cost. The other cost value is
the cost when any attempt is successful, is called the cu-
mulative or-cost. These costs can be illustrated with an
example showing three failed and two successful at-
tempts to execute the predicate a b for various integer
values of a and b (Table 1). The predicate cost function
is a – b when the predicate is false and a – b – 1 when
the predicate is true. The cost function of or-cost and
and-cost are shown in Table 1, where a and b are posi-
tive (fal se), and a` and b` are negative (true), also a and b
are never zero.
The cost values produced by relational predicates are
normalized, but the un-normalized values are used in
Table 2. The cost of a conjunction of two false costs is the
sum of the costs of the conjuncts. However, the cost of a
disjunction of two false costs (Costd) is shown in Equation
(2), where P and Q are the disjunct costs.
Costd
PQ
PQ
(2)
Table 1. Logical or-cost and logical and-cost table.
a b or-cost and-cost
a b (ab)/(a + b) a + b
a b` b` a
a` b a` b
a` b` a` + b` (a`b`)/(a` + b`)
Copyright © 2013 SciRes. JSEA
Using Genetic Algorithm as Test Data Generator for Stored PL/SQL Program Units 69
Table 2. Cumulative or-cost and and-cost for the predicate
a b for the listed values.
a b cost or-cost and-cost
8 3 5 5 5
6 3 3 15/8 8
5 3 2 30/31 10
3 3 1 1 10
1 3 3 4 10
These costs can be illustrated with an example show-
ing three failed and two successful attempts to execute
the predicate a b for various integer values of a and b
as shown in Table 2. Note that when both branches at a
conditional node have been executed, the and-cost is
positive and the or-cost is negative. Moreover, the mag-
nitude of the and-cost is an indication of the number and
magnitude of the failures to satisfy the predicate. A high
and-cost indicates that the predicate has hardly been sat-
isfied. A low and-cost indicates that the predicate has not
been satisfied. The cost of a search goal is calculated,
according to Equation (2), as the conjunction of the indi-
vidual branch goal costs. Each individual branch cost is
either a branch or-cost or a branch and-cost. Using this
method, there is a disadvantage that a single large branch
cost may dominate the overall cost value. Normalization
or costs reduces this risk. The cost values produced by
relational predicates (Costr) are normalized to lie within
[1, 1] using Equation (3), where c is the branch distance
value.
1
1if
1
1
Cost1if0
1
0other
r
c
c
c
c

 
0
wise
(3)
An alternative method, not used in the work reported
here, is to compute a cost consisting of two components.
One component, counts the branch goals that have yet to
be satisfied. This cost component is the analogue to the
ones used by Wegener et al. [22]. The second component
is applicable only if the first component is nonzero and it
is calculated as the disjunction of the unsatisfied branch
goals. For each branch, there are two associated branch
search goals that may be specified to guide a search,
namely branch-or (on at least one occasion that the
branch is reached, and it is executed) and branch -and
(on every occasion that the branch is reached, and it is
executed). A branch goal is satisfied if the associated or-
cost or and-cost is negative.
If the execution of a branch is required to satisfy branch
coverage or a control dependency condition then branch-
or is the relevant branch goal. Note that the goal of sat-
isfying the branch or-cost is not adopted primarily to
avoid creating an excessive number of sub-goals. More-
over, if it is necessary to find an input that executes both
branches at a predicate, it is hoped that such an input will
be found during the search for an input to satisfy the
and-cost. Recall that, the and-cost is adopted as the
branch goal after one branch at a predicate has already
been executed. The above rules are applied to non-loop
branches only. Loops are treated different than if-state-
ment, because for programs that terminate loop entries are
eventually followed by a loop exit. For this reason, a sub-
goal that specifies a loop predicate is always true is not
sensible and thus the two possible branch goals at a loop
condition are loop entry and no loop entry.
4. Experimental Environment
An experimental study was designed to feature test goals
that cause problems for evolutionary testing. The exper-
imental study featured JUH real six test objects. These
objects are drawn from the system applied at JUH hospi-
tal.
4.1. Test Objects
This section describes the test objects and the input do-
main sizes used. The following are source code for the
test objects:
OutPricing: Determines the pricing of treatments at
outpatient clinics. Depending on his insurance the pa-
tient, this function calculates the amount of money
the patient has to pay (depending on the type of in-
surance, the patient pays different ratios for his treat-
ment) and the amount of money the insurance has to
pay for the patient’s treatment.
InPricing: This procedure calculates the invoice
value of the patient inside the hospital based on the
type of patient insurance and the type of medical
procedure offered to patients (accommodation, scout-
ing, doctors’ fees, operations, laboratory, radiology,
medicine, etc.). Also, this program calculates the per-
centage paid by the patient and the percentage paid by
the insurance company, if any. Moreover, this func-
tion bills the patient with the amount of money he has
to pay and bills the insurance company with the
amount of the money it has to pay.
JU-Med-fees-deduction: This package used for Jor-
dan University staff, where there is an allocated ac-
count number for each staff in the system of JUH. It
calculates bill value based on Jordan University in-
surance. Then deported the total amount of bill after
deduct the hand-collect from the patients into tables
to be used in Jordan University financial department
Copyright © 2013 SciRes. JSEA
Using Genetic Algorithm as Test Data Generator for Stored PL/SQL Program Units
70
later on.
Pat-info-ibr: This function calculates the invoice val-
ue for private patients (in patients and out patients),
then bills the patients with the amount of money he or
she has to pay.
Lab-interface: The main goal of this function is to
transfer the results from medical machines (lab de-
vices) to HIS system automatically (without user in-
teraction). So, the function receives the message from
medical devices then converts it to be entered to HIS
system.
Salup_new_calc_all: This procedure calculates staff
incentives as follows: it selects the category that owns
the nursing, administrative, officer or a medical tech-
nician, by the department and qualifications. Then it
determines the share of the incentives that the em-
ployee is entitled, as his career (Branch Chief, Chief,
Division of, etc.). It discounts days leave without pay
from the employee share incentives.
4.2. Hardware and Software Environment
In this section, the specifications of the experimental
environment utilized by this work are presented. These
specifications include both hardware and software mod-
ules used in implementing the simulator. More specifi-
cally, the hardware specifications that are used in the
experiments include a Dual-Core Intel Processor (CPU
2.66 GHz), 2 MB L2 Cache per CPU, and 1 GB RAM.
Moreover, the software specifications that are used in the
experiments include windows XP. Also, the tested pro-
grams that have been used to evaluate this algorithm are
described in this section.
Moreover, in order to assess the reliability of the cost
functions introduced in the previous Section 4.1, an em-
pirical investigation was done. A number of test pro-
grams were assembled from JUH system including func-
tions and procedures. These programs are described in
Table 3. The size of each program is given as Lines of
Code (LOC), number of branches, where the number of
input variables ranges from 3 to 21, as shown in Table 3.
The programs have been selected from JUH HIS system.
Figure 3 shows a cyclomatic complexity for each pro-
gram. The cyclomatic complexity metric is described by
Watson and McCabe [23], which provides an objective
measure of the complexity of a given module of a pro-
gram code by examining its decision structure. Cyclo-
matic complexity is calculated as en + 2, where n is
the number of nodes in a graph and e is the number of
edges between nodes, or we can calculate cyclomatic
complexity as P + 1, where P is the number of predicate
nodes in the flow graph (While and If statements). Pre-
dicate nodes are those representing control structures and
have one or more edges emanating from them. Cyc-
Table 3. The functions used for empirical investigation.
Program name Lines of
code
Number of
branches
Number of
input variables
OutPricing 295 116 14
InPricing 362 148 13
JU-Med-fees-deduction 307 92 4
Pat-info-ibr 259 48 3
Lab-interface 1389 538 21
Salup_new_calc_all 707 326 6
Figure 3. A cyclomatic complexity of the test programs.
lomatic complexity gives an upper bound on the number
of test cases required to cover all feasible branches if
collateral coverage is taken into account. Each program
has special characteristics to investigate the performance
of GA as test data generator in order to test named block
in ORACLE. Figure 3 shows a cyclomatic complexity of
the test programs. The range of program’s cyclomatic
complexity is between 25 and 270.
These programs are available from the authors on re-
quest. Each of the cost functions and associated search
operators were implemented in a prototype test data gen-
eration tool. The tool has been constructed by modifying
the JScript (JavaScript) language compiler within the
Shared Source Common Language Infrastructure (SSCLI)
and can therefore be used to test PL/SQL Unit by passing
test cases as parameters to these units, while these pro-
grams are connected to JUH HIS. The program must in-
clude directives to specify any input domain constraints
that are to be applied. The tool then inserts instrumenta-
tion code at each branch in the function. This instrument-
tation code calculates the cost of each branch predicate
whenever it is executed. The cost of each relational
predicate expression was calculated according to the cost
functions given in the previous Section 3.1. Where bran-
ch predicate expressions consist of two or more relational
predicates joined by logical connectives, and, or and not,
Copyright © 2013 SciRes. JSEA
Using Genetic Algorithm as Test Data Generator for Stored PL/SQL Program Units
Copyright © 2013 SciRes. JSEA
71
any of the programs are higher of those typical programs
that would used in unit testing, although the number of
branches is probably higher than usual, which is why it
were selected for the experiment.
and the cost values were combined according to the sch-
eme given in Bottaci [24].
The search was directed to generate data for one bran-
ch at a time. The order in which the branches of the pro-
gram were targeted was arbitrary, except that no nested
branch was targeted before the containing branch. 5. Experimental Results and Discussions
A steady-state style genetic algorithm, similar to Geni-
tor [25], was used in this work. The cost function values
computed for each candidate input were used to rank
candidates within the population in which no duplicate
genotypes are allowed. A probabilistic selection function
selected parent candidates from the population with a
probability based on their rank, where the highest rank-
ing having the highest probability. More specifically, for
a population of size n, the probability of selection (Ps) is
shown in Equation (4).
This section presents the results of the experiments that
have been carried out to evaluate the effectiveness of our
GA. Table 4 shows the number of subject program exe-
cutions required by each genetic algorithm over 20 trials,
where >50,000 means that the cost function not able to
cover all the branches within this criteria. Also, in this
table branch coverage ratio is shown, which is defined as
the following:
number ofbranch executed
total numberofbranches


2rank1
1
s
n
Pnn
(4) The branch coverage ratio ranged from 94% in JU-
Med-fees-deduction program to 100% in Pat-info-ibr
program. The total average of branch coverage for all
programs is:
In this work, a fixed population size of 100 was used.
This parameter was not “tuned” to suit any particular
program under test. In a steady state update style of ge-
netic algorithms (as used in this work); new individuals
that are sufficiently fit are inserted in the population as
soon as they are created. Full branch coverage was at-
tempted for each of the programs under test. Each branch
was taken as individual target of the search, unless it was
fortuitously covered during the search for test data for
another branch. Genetic algorithms search generates in-
puts for the function containing the current structural
target. A vector of floating point, integer, characters, and
string variable values corresponding to the input data is
optimized. The ranges of each variable are specified. The
test subject is then called with this input data. The crite-
rion to stop the search was set up to terminate the search
after 50,000 executions of the program under test, when
only if full coverage was not achieved. Individuals were
recombined using binary and real-valued (one-point and
uniform) recombination, and mutated using real-valued
mutation. Real-valued mutation was performed using
“Gaussian distribution” and “number creep”. The size of
1242 98%
1268
Analyzing our results, we found that a 100% of condi-
tion-decision coverage is impossible to reach in some test
programs because there are conditions that cannot be true
or false in certain situations. So, a weakness of GA could
be observed in the generation of test cases to cover
branch, especially when there is a strong dependency in
data (records) retrieved from table, such that a specific
order is required. As long as there are only few records
that play an important role to satisfy a certain condition,
it is possible to find adequate test scenarios. For example,
this could be observed while applying GA on OutPric-
ing function, as shown in Figure 4, in line 21 insur
ance_status to be equal to 7, this only happen if line
18executed and this happen only if line 16 execute
branch to be true (execute insert command and insur-
ance_status = 7) this happen only when dummy variable
is equal to “p” in line 2 and this is depends on the value
Table 4. The number of branch covered and uncovered with 50,000 executions of each program.
Program Name Number of branch
covered
Number of branch
uncovered
Branch
coverage ratio
Number of subject
program executions
OutPricing 112 4 96% >50,000
InPricing 144 2 99% >50,000
JU-Med-fees-deduction 87 5 94% >50,000
Pat-info-ibr 48 0 100% 11,680
Lab-interface 527 11 98% >50,000
salup_new_calc_all 321 5 98% >50,000
Using Genetic Algorithm as Test Data Generator for Stored PL/SQL Program Units
72
[1]
select p_insur_type into dummy from pricing where …
[2]
[3]
select decode(p_insur_type, '1', prc_limit_out_e, '2', prc_limit_out_f)
[4]
into p_max_cov[5]
from prc_limits
[6]
where prc_group = p_group_id[7]
and prc_division = p_div;
[8]
p_max_cov := nvl(p_max_cov, 99999);[9]
exception
[10]
when no_data_found then[11]
p_error_no := 1; raise exit_proc;
[12]
when others then[13]
p_error_no := 11; raise exit_proc;
[14]
….[15]
if dummy = 'p' then
[16]
insert into ….[17]
insurance_status :=7;
[18]
end if;[19]
[20]
if insurance_status =7 then
[21]
update out_invoice[22]
set …
[23]
end if;[24]
when exit_proc then
[25]
if p_error_no = 1 then[26]
raise_application_error( -20001,' Coverage Limits do not Exits');
[27]
elsif p_error_no = 2 then[28]
raise_application_error( -20003, ' Rate pricing does not exits for this materials!!');
[29]
elsif p_error_no = 3 then[30]
raise_application_error( -20004, ' Material is not Defined in the Table Price !!');
[31]
elsif p_error_no = 5 then[32]
raise_application_error( -20001, 'Pricing Data is incomplte for this patient!!');
[33]
elsif p_error_no = 11 then[34]
[35]
end if;
[36]
Figure 4. OutPricing program fragment.
retrieved from pricing table, also in line 20 to execute
branch to be true this happen only when line 12 is exe-
cuted. In this case, where the target of the search is node
20 to be true, the fact that p_error_no needs to be 1 at
line 12 and this happen only when the select statement in
line 4 executes and exception no_data_found rose. This
also is applied to lines 22, 24, 26, 28, 30 and 32. These
situations are occurred in all other programs apart from
Pat-info-ibr, where GA generates test data for all branch.
In these cases, the test generator cannot reach the 100%
of coverage due to the test program itself. With respect to
the program Pat-info-ibr, there are branch conditions
depend on data retrieved from table, but by coincidence,
these branches are covered. This problem become more
difficult when there is more than branch depends on un-
covered branch.
In Figure 5, we notice that all branches are covered,
the number of execution of program ranged from 11,680
for Pat-info-ibr to 36,134 for salup_new_ calc_all.
6. Conclusions and Future Work
In this paper we present how GA can be used as test data
generator to find suitable test data according to branch
criteria to test stored program units (procedures, func-
tions, and packages) in ORACLE. Selected procedures,
functions, and packages from Jordan University Hospital
Figure 5. The number of executions required to find test
data to achieve branch coverage after excluding uncovered
branches in Table 4.
Information System are used to test GA. The experi-
mental results show that the test target in all programs
under test is not reached and that the average coverage
ratio percentage is 98%. A problem occurs when the tar-
get branch depends on data retrieved from oracle tables.
That GA cannot generate test data to execute the target
branch that depends on data retrieved from tables.
The future work will be focused on testing SQL
exceptions, SQL statements, and combinations between
branch coverage criteria and SQL commands testing try-
Copyright © 2013 SciRes. JSEA
Using Genetic Algorithm as Test Data Generator for Stored PL/SQL Program Units 73
ing to increase the coverage ratio to 100%.
REFERENCES
[1] M. G. Alshraideh and L. Bottaci, “Search-Based Software
Test Data Generation for String Data Using Program-
Specific Search Operators,” Special Issue of Software
Testing, Verification and Reliability Devoted to Extended
Papers from the Third UK Testing Conference (UKTest
2005), Vol. 16, No. 3, 2006, pp. 175-203.
[2] M. Alshraideh, B. A. Mahafzah and S. Al-Sharaeh, “A
Multiple-Population Genetic Algorithm for Branch Cov-
erage Test Data Generation,” Software Quality Control,
Vol. 19, No. 3, 2011, pp. 489-513.
doi:10.1007/s11219-010-9117-4
[3] M. Alshraideh, L. Bottaci and B. A. Mahafzah, “Using
Program Data-State Scarcity to Guide Automatic Test
Data Generation,” Software Quality Control, Vol. 18, No.
1, 2010, pp. 109-144. doi:10.1007/s11219-009-9083-x
[4] A. Baresel, H. Pohlheim and S. Sadeghipour, “Structural
and Functional Sequence Test of Dynamic and State-
Based Software with Evolutionary Algorithms,” Pro-
ceedings of the Genetic and Evolutionary Computation
Conference, Chicago, 12-16 July 2003, pp. 2428-2441.
[5] B. Korel, “Automated Test Generation for Programs with
Procedures,” Proceedings of the International Symposium
on Software Testing and Analysis, San Diego, 8-10 Janu-
ary 1996, pp. 209-215.
[6] S. N. Sivanandam and S. N. Deepa, “Introduction to Ge-
netic Algorithms,” 1st Edition, Springer, New York, 2010.
[7] C. C. Michael, G. E. McGraw and M. A. Schatz, “Gener-
ating Software Test Data by Evolution,” IEEE Transac-
tions on Software Engineering, Vol. 27, No. 12, 2001, pp.
1085-1110. doi:10.1109/32.988709
[8] B. Korel, “Automated Software Test Data Generation,”
IEEE Transactions on Software Engineering, Vol. 16, No.
8, 1990, pp. 870-879. doi:10.1109/32.57624
[9] J. A. Edvardsson, “Survey on Automatic Test Data Gen-
eration,” Proceedings of the Second Conference on Com-
puter Science and Engineering, Linkoping, 21-22 October
1999, pp. 21-28.
[10] J. Duran and S. Ntafos, “An Evaluation of Random Test-
ing,” IEEE Transactions on Software Engineering, Vol.
10, No. 4, 1984, pp. 438-444.
doi:10.1109/TSE.1984.5010257
[11] M. Harman and P. McMinn, “A Theoretical & Empirical
Analysis of Evolutionary Testing and Hill Climbing for
Structural Test Data Generation,” Proceedings of the
2007 International Symposium on Software Testing and
Analysis, London, 9-12 July 2007, pp. 73-83.
doi:10.1145/1273463.1273475
[12] P. McMinn, “Search-Based Software Test Data Genera-
tion: A Survey: Research Articles,” Software Testing,
Verification & Reliability, Vol. 14, No. 2, 2004, pp. 105-
156. doi:10.1002/stvr.294
[13] H. S. Eyal Salman, “Using Genetic Algorithm in Test
Data Generation for ORACLE Named Block,” Master
Thesis, The University of Jordan, Amman, 2010.
[14] H. W. Arthur and J. Thomas, “Structured Testing: A Test-
ing Methodology Using the Cyclomatic Complexity Met-
ric,” National Institute of Standards, Gaithersburg, 1996.
[15] M. Mitchell, “An Introduction to Genetic Algorithms,”
1st Edition, Massachusetts Institute of Technology, Cam-
bridge, London 1996.
[16] M. Srinivas and L. M. Patnaik, “Genetic Algorithms: A
Survey,” IEEE Computer, Vol. 27, No. 6, 1994, pp. 17-
26. doi:10.1109/2.294849
[17] S. Kirkpatrick, C. D. Gellat and M. P. Vecchi, “Optimiza-
tion by Simulated Annealing,” Science, Vol. 220, No.
4598, 1983, pp. 671-680.
doi:10.1126/science.220.4598.671
[18] S. Urman, R. Hardman and M. McLaughlin, “Oracle
Database 10g Pl/SQL Programming,” 1st Edition, McGraw-
Hill, New York, 2004.
[19] A. H. Watson and T. J. McCabe, “Structured Testing: A
Testing Methodology Using the Cyclomatic Complexity
Metric,” NIST Special Publication, No. 500-235. Natio-
nal Institute of Standards and Technology, Gaithersburg,
1996.
[20] N. J. Tracey, J. Clark, K. Mander and J. McDermid, “An
Automated Framework for Structural Test-Data Genera-
tion,” Proceedings 13th IEEE Conference in Automated
Software Engineering, Hawaii, 13-16 October 1998, pp.
285-288.
[21] R. Pargas, M. Harrold and R. Peck, “Test-Data Genera-
tion Using Genetic Algorithms,” Software Testing, Veri-
fication and Reliability, Vol. 9, No. 4, 1999, pp. 263-282.
doi:10.1002/(SICI)1099-1689(199912)9:4<263::AID-ST
VR190>3.0.CO;2-Y
[22] J. Wegener, A. Baresel and H. Sthamer, “Evolutionary
Test Environment for Automatic Structural Testing,” In-
formation and Software Technology, Vol. 43, No. 14,
2001, pp. 841-854. doi:10.1016/S0950-5849(01)00190-2
[23] J. Holland, “Adaptation in Natural and Artificial Sys-
tems,” University of Michigan Press, Ann Arbor, 1975.
[24] L. Bottaci, “Predicate Expression Cost Functions to
Guide Evolutionary Search for Test Data,” Genetic and
Evolutionary Computation Conference (GECCO 2003),
Chicago, 12-16 July 2003, pp. 2455-2464.
doi:10.1007/3-540-45110-2_149
[25] D. Whitley, “The genitor Algorithm and Selective Pres-
sure: Why Rank-Based Allocation of Reproductive Trials
Is Best,” Proceedings of the Third International Confer-
ence on Genetic Algorithms (ICGA-89), 1989, Morgan
Kaufmann Publishers Inc., San Francisco, pp. 116-121.
Copyright © 2013 SciRes. JSEA