User Session-Based Test Case Generation and Optimization Using Genetic Algorithm

doi:10.4236/jsea.2010.36062

Paper Menu >>

Journal Menu >>

J. Software Engineering & Applications, 2010, 3, 541-547

doi:10.4236/jsea.2010.36062 Published Online June 2010 (http://www.SciRP.org/journal/jsea)

User Session-Based Test Case Generation and

Optimization Using Genetic Algorithm*

Zhongsheng Qian

School of Information Technology, Jiangxi University of Finance & Economics, Nanchang, China.

Email: changesme@163.com

Received March 22nd, 2010; revised April 12th, 2010; accepted April 13th, 2010.

ABSTRACT

An approach to generating and optimizing test cases is proposed for Web application testing based on user sessions using

genetic algorithm. A large volume of meaningful user sessions are obtained after purging their irrelevant information by

analyzing user logs on the Web server. Most of the redundant user sessions are also removed by the reduction process.

For test reuse and test concurrency, it divides the user sessions obtained into different groups, each of which is called a

test suite, and then prioritizes the test suites and the test cases of each test suite. So, the initial test suites and test cases,

and their initial executing sequences are achieved. However, the test scheme generated by the elementary prioritization is

not much approximate to the best one. Therefore, genetic algorithm is employed to optimize the results of grouping and

prioritization. Meanwhile, an approach to generating new test cases is presented using crossover. The new test cases can

detect faults caused by the use of possible conflicting data shared by different users.

Keywords: User Session, Genetic Algorithm, Test Case, Test Suite, Reduction, Prioritization

1. Introduction

Incapable Web applications can have far-ranging conse-

quences on businesses, economies, scientific progress,

health, and so on. Therefore, all the entities of a Web

application, in essence, must be tested adequately to en-

sure that the application meets its original design speci-

fications.

In Web application testing, some test methods and

techniques were presented [1-4]. Kung, et al. [2] depicted

an object-oriented Web Test Model to support Web ap-

plication testing. Hieatt, et al. [1] introduced a method of

acceptance testing, and developed a testing tool to show

the system operations and the expected output results in

XML. The approaches proposed by Kung and Hieatt, et al.,

however, focus primarily on unit testing without con-

cerning the whole testing for Web applications. Liu, et al.

[4] presented a data flow-based approach to testing Web

applications. Most of these methods of testing Web ap-

plications are achieved through extending the testing

methods for traditional software. Additionally, none of

these methods [1-4] yields test data according to user

sessions.

Elbaum, et al. [5] demonstrated the fault detection ca-

pabilities and cost-effectiveness of user session-based

testing. Increment concept analysis [6] is used to analyze

user sessions dynamically and minimize continuously the

number of user sessions maintained. Khor, et al. [7] com-

bined the genetic algorithm with formal concept analysis

to trace the relationship between test data and corres-

ponding test run.

Sthamer [8] analyzed deeply the test case optimization

efficiency of different coding schemes and fitness func-

tions of genetic algorithm for different structure-based

software in his doctoral dissertation. Some researchers

also studied on test case generation techniques using

genetic algorithm [9-10]. However, they aimed for sim-

plex optimization focusing on one-off optimization com-

putation, while the testing is continuous and iterative. So,

dynamically continuous optimization computation is

more propitious to improve test performance. Jia, et al.

[11] discussed the key problems of producing test data of

covering designated paths using genetic algorithm, and

introduced deeply the factors of influencing the gene-

This research has been partly supported by the National High-

Tech Research and Development Plan of China under Grant No. 2007

AA01Z144; the Science and Technology Plan Project of the Education

Department of Jiangxi Province of China under Grant Nos. GJJ10120

and GJJ10117; Shanghai Leading Academic Discipline Project of Chi-

na with Project No. J50103; and the School Foundation of Jiangxi Uni-

versity of Finance & Economics of China under Grant No. 04722015.

tic algorithm’s efficiency through experimental results.

Berndt and Watkins [12] summarized the advancement in

generating data for generic algorithm-based testing re-

cently. These studies [5,7-12] used genetic algorithm to

User Session-Based Test Case Generation and Optimization Using Genetic Algorithm

542

analyze test problems with the exception of [5], but not

focusing on Web application testing. In [5], the authors

discussed user session-based Web application testing, but

not concerning deeply the optimization of test cases.

Different from them, this paper investigates a key pro-

blem in Web application testing: test case generation and

optimization. It delves into an approach to testing and

optimizing Web applications based on user sessions us-

ing genetic algorithm. When converting a user session to

a corresponding test case, we preserve the user input

data.

2. Collecting and Reducing User Sessions

In the logs on each Web server, each access record cor-

responds to a request by a user each time. The contents of

the record include request source (user IP address), re-

quest time, request mode (such as GET, POST), the URL

of requested information, data transport protocol (such as

HTTP), status code, the number of bytes transferred and

the type of client, etc. It needs to scan the logs only once

to resolve the original active historic records from current

access logs. However, it is difficult to organize these

original records directly, which must be preprocessed. We

first purge irrelevant data including the records whose

status codes are erroneous (the code 200 for success, 400

for error), embedded resources such as script files and

multimedia files whose extension names are .gif, .jpeg

or .css, etc., to obtain the set of user sessions for primitive

analysis. Then, we create user sessions through scanning

the logs on Web servers. Once a new IP address occurs, a

new user session is created. The sequential requests sent

from this IP address are appended to the new session

under the condition that the time interval of two con-

tinuous requests is not greater than max-session-idle-time

pre-determined, or else another new user session begins.

The set of all the user sessions is finally achieved.

There are often large volumes of user sessions collected

and a user session is converted to a test case transmitted to

Web server. Therefore, we can eliminate redundant user

sessions using reduction techniques, and then preserve

necessary user sessions. For the convenience of discus-

sion, some important concepts are given first.

Definition 1 (URL trace). URL trace is the URL se-

quence requested by a user session.

Let α be a URL trace. Its length is the number of URLs

requested in the trace, denoted by |α|.

Definition 2 (prefix). A trace α is the prefix of another

trace β, if and only if α is the subsequence of β and they

have the same initial symbol.

Definition 3 (common prefix). If a trace is the prefix of

several traces, then this trace becomes their common prefix.

Definition 4 (greatest common prefix). The longest

common prefix in all the common prefixes of two traces is

their greatest common prefix.

The longer the greatest common prefix of two traces is,

the more similar the two traces are, i.e., their similarity is

higher. Let we have four traces that are γ1 = abcdefg, γ2 =

abcdeh, γ3 = abcd and γ4 = cde. Then, γ3 is the common

prefix of γ1 and γ2, but not the greatest one. The greatest

common prefix of γ1 and γ2 is abcde. Although γ4 is the

subsequence of γ1 and γ2, it is not their prefix, for the

initial symbol of γ4 is not the same as that of γ1 and γ2. We

define a function isPrefix(α, β), which decides whether or

not α is the prefix of β, i.e., whether or not the URL trace

requested by a user session is redundant corresponding to

that requested by another user session. If there is a prefix,

then the function returns a BOOLEAN value TRUE, or

else returns FALSE. The URL trace-based user session

algorithm ReduceUSession is shown in Figure 1. Note

that, an HTTP request here is regarded as a symbol of a

trace.

The ReduceUSession algorithm decides whether or

not a URL trace α requested by a user session is the pre-

fix of β requested by another user session. If it is true,

then the user session corresponding to α is removed. The

number of user sessions obtained using this algorithm

will be reduced greatly.

The ReduceUSession algorithm is different from other

Algorithm: ReduceUSession

input:

The set of user sessions Λ = {s1, …, sk}, where k is the number

of user sessions;

The URL trace U1, …, Uk, which are requested by s1, …, sk

respectively;

output:

The reduced set of user sessions denoted by Γ;

begin

Γ = Ф;

while (another user session that is not marked in Λ exists)

tag1 = FALSE;

tag2 = FALSE;

Select a user session si that is not marked in Λ, and then

mark it with “USED”;

for (the URL trace Uj requested by each user session sj in

Γ)

if isPrefix(Uj, Ui) //the URLs requested by si is

//more, so sj is redundant

Γ = Γ-{sj};

tag1 = TRUE;

endif;

if isPrefix(Ui, Uj) //here, Γ keeps unchanged,

//and si is redundant

tag2 = TRUE;

break; //exit for cycle

endif;

endfor;

if tag1 || (!tag1 && !tag2) //si is necessary

Γ = Γ∪{si};

endif;

endwhile;

Output the reduced set of user sessions Γ;

end.

Figure 1. The URL trace-based user session reduction algo-

rithm ReduceUSession

User Session-Based Test Case Generation and Optimization Using Genetic Algorithm

543

user session reduction algorithms. Most other reduction

algorithms analyze each test case in the test suites one by

one and remove those test cases that cannot change or

affect test requirements, i.e., they distinguish redundant

and necessary test cases, as is too difficult to manage in

practice, for it is very intractable to discriminate that the

test requirements satisfied by some test cases (i.e., re-

dundant ones) are also satisfied by other test cases before

the test execution using the existential algorithms. While

our ReduceUSession algorithm decides whether or not a

URL trace requested by a user session is the prefix of

another URL trace requested by another user session.

Based on this, we can identify the redund an t test cases,

as can be easily done in practice. Besides, it covers all

the URLs requested by the original set of user sessions

and keeps the sequence of URL requests, i.e., it guaran-

tees that the original test requirements are satisfied.

3. Grouping and Prioritizing User Sessions

The user sessions reduced by the ReduceUSession algo-

rithm are divided into subgroups, each of which is re-

garded as a test suite. The goal of grouping is to reuse test

cases and the testing can be executed at different plat-

forms in parallel (or concurrently), to lessen test time and

improve test efficiency. Moreover, the interacting test can

be conducted between a pair of user sessions in each

group (see Subsection 4.4 for the details). We try our best

to keep the property for the user sessions in the same

group that the URL traces requested bear greatest com-

mon prefix of a certain length. Several discontinuous

integral threshold values, denoted by ζ1, ζ2, …ζk (ζi ≥ 1, 1

≤ i ≤ k), are defined. The user sessions, the lengths of

whose greatest common prefix of URL traces requested

fall in between a certain threshold values, are grouped

together. Fox example, let we have three threshold values

ζ1 = 2, ζ2 = 4 and ζ3 = 7, then the user sessions are divided

into four groups that are S1, S2, S3 and S4, the lengths of

whose greatest common prefix (denoted by α) are |α| ≤ 2,

2 < |α| ≤ 4, 4 < |α| ≤ 7 and |α| > 7 respectively. The test

cases in these four groups can be executed at different

platforms in parallel (or concurrently), and the greatest

common prefix is also reused in each group. Notice that

this is not the unique grouping way. For example, in the

four traces γ1 = abcdefg, γ2 = abcdeh, γ3 = abcd and γ4 =

cde, we can divide them into two groups {γ1, γ2} and {γ3,

γ4}, the lengths of whose greatest common prefix are 4 <

|α| ≤ 7 and |α| ≤ 2 respectively, or another two groups {γ1,

γ2, γ3} and {γ4}, the lengths of whose greatest common

prefix are 2 < |α| ≤ 4 and |α| ≤ 2 respectively. Of course, it

is unnecessary to require that the greatest common prefix

of URL traces requested by any two user sessions in each

group fall in between a certain threshold values; it is

recommended that most of them satisfy this property (a

percentage can be pre-designed or generated randomly for

the measurement). A compromise way needs to be found

to classify these URL traces (or test suites), i.e., a tradeoff

should be found between grouping and concurrent testing

and test reuse. Suppose that N user sessions are divided

into K groups. In general, there is almost the same number

of user sessions in one group as that in another, i.e., the

number equals approximately to





. Additionally, this

way of grouping is preparatory, called elementary

grouping.

The common prefix indicates the users’ common

events, or the same or similar operations. It also shows

that the users bear the same or similar interests. The

longer the common prefix is, the more evident it is, as is

the case of most users. In addition, there is a special group

of user sessions, the length of whose greatest common

prefix of URL traces requested is shortest. This group of

user sessions often indicates different URL requests,

which represent distinct requirements for a Web applica-

tion. In these sessions, many aberrant events often occur

with unwonted input data. They belong to boundary cases,

which are very easy to go wrong for the Web application.

Herein, we prioritize test suites. The test suite with short-

est length of common prefix ranks first, then all the other

test suites are arranged according to their lengths of

common prefix in descending order. So, the test suite in

final position is that whose length of common prefix is

last but one. In the test suites S1, S2, S3 and S4, the lengths

of whose common prefixes are |α| ≤ 2, 2 < |α| ≤ 4, 4 < |α| ≤

7 and |α| > 7 respectively, if we prioritize them, then the

test executing sequence for those test suites are S1, S4, S3,

S2. That is to say, the test cases in S1 are executed first,

then the test cases in S4 and S3 are executed respectively

and finally, the test cases in S2 are executed. In each test

suite, the test cases are prioritized according to the cov-

erage ratios of URLs requested, i.e., the test case with

longer URL trace requested is executed earlier. If the

lengths of URL traces of several test cases in the same test

suite are equal, then they are randomly executed.

Our approach of prioritization is different from others.

The existing methods of prioritization add the strongest

test case, which is of the maximal use for the coverage

ratio of test requirements, to the new test suite. Those

methods aim to execute earlier the test cases of high pri-

ority than those of lower priority, to satisfy some test re-

quirements as soon as possible. These methods, however,

are often difficult to find the (nearly) strongest test case

each time before test run, while our approach divides test

cases into several groups according to the idea of common

prefix before prioritizing them. It is more convenient for

the testers to selectively execute the test cases of some

group by grouping all the test cases first, to detect some

User Session-Based Test Case Generation and Optimization Using Genetic Algorithm

544

types of faults, for the same or similar types of errors are

often detected by those test cases in the same groups. In

addition, test cases can be reused by grouping and the

testing can be executed at different platforms in parallel

(or concurrently), to lessen test time and improve test

efficiency. Moreover, we have considered a special type

of user sessions, the length of whose greatest common

prefix of URL traces is shortest. This group of user ses-

sions often contains specific requests, which are the pri-

mary source for errors.

4. Testing Web Applications Using Genetic

Algorithm

After the process of grouping and prioritization above, we

obtain several initial test suites and test cases with the

initial executing sequences. However, the test scheme

generated by the elementary prioritization is not fast in

finding faults and can not satisfy the requirements earlier,

i.e., it is not much approximate to the best one. Therefore,

genetic algorithm is employed further to optimize the

grouping and prioritization.

In 1975, an American professor Holland first proposed

the idea of genetic algorithm systematically [13]. It has

attracted a large number of researchers and extended into

those aspects of optimization, search and machine learn-

ing with a solid theoretic foundation. The genetic algo-

rithm focuses on all the individuals in one population, and

uses random techniques to search efficiently for a coded

parameter space. Selection, crossover and mutation are

the basic operators in genetic algorithm; parameter coding,

the setting of initial population, the design of fitness

function, the selection of genetic operations and control

parameters consist of the critical part of genetic algorithm.

As a global optimization search algorithm of high effi-

ciency, the genetic algorithm has distinctive advantage in

solving the difficult problems in the domains of big space,

multiple peak, nonlinearity and parallel processing, etc.

In the following, we test Web applications using genetic

algorithm to further optimize the initial test suites and test

cases, and their initial executing sequences, in order to

achieve better test suites and test cases that satisfy test

requirements. The process of yielding a population of next

generation using three basic operators that are selection,

crossover and mutation once is called an iteration. To

obtain a good result, much iteration is repeated. The fol-

lowing introduces the process of selection, crossover and

mutation.

4.1 Selection

A pair of individuals is selected from a parent population

with the probability of ps. The probability that an indi-

vidual is selected is in direct proportion to its fitness

value, as is often implemented using the strategy of rou-

lette wheel [13]. In selecting, the individual of high fit-

ness value is duplicated into the population of next gen-

eration directly. The higher the fitness value of an indi-

vidual is, the higher the probability of yielding its off-

spring is, as shows that it is more appropriate to the ex-

pected result. Let we get K test suites (constituting the

initial population), which are S1, S2, …, SK of the de-

scending order according to the prioritization technique.

In practice, we combine the error coverage ratios of test

suites and the cost of test run to design fitness function.

The fitness value is listed as f1, f2, …, fK from high to

low, where fi is the fitness of Si (1≤i≤K). The probability

that an individual is selected equals to the resulting value

that its fitness value divides the sum of fitness values of

all the individuals, i.e., the selected probability of Si,

denoted by ps

Si, is





. Obviously, the sum of the

selected probabilities of all the K test suites equals to 1.

According to the discussion above, the fitness f1 and f2

corresponds to two special test suites S1 and S2, the

lengths of whose greatest common prefixes are shortest

and longest respectively. S1 and S2 are selected to be the

individuals of next generation directly (in practical use,

we can also select more than two individuals of high

fitness to become the next individuals); in case they do

not be selected. Now, we randomly yield two numbers

that are g1, g2∈[0, 1], and randomly select two test suites

that are Si and Sj whose probabilities are not less than g1

and g2 respectively. The two new test suites Si’ and Sj’

(the individuals of next generation) are generated throu-

gh crossovering and mutating their parents Si and Sj (see

Subsections 4.2 and 4.3). Repeat the process of selection,

until adequate test suites of next generation are yielded.

One point should be emphasized that some test suites of

higher probabilities may not be selected as parents, while

others of lower probabilities are selected. This case is

reasonable and accords with the theory of biological

evolution in nature, because any thing has its necessity

and occasionality at the same time.

4.2 Crossover

The genes chain of two parent individuals selected are

crossovered with the probability of pc using the

TSCrossover algorithm, where pc is a system control

parameter. The new individuals after crossovering are

used to replace their parents. The TSCrossover algorithm

is shown in Figure 2.

Let we have two test suites Si = < c1, c2, c3, c4, c5, c6, c7,

c8, c9 >, which contains 9 test cases and Sj = < c10, c11, c12,

c13, c14, c15, c16 >, which contains 7 test cases. If the posi-

tion of crossovering is 3, then two new individuals of

next generation after crossovering Si and Sj are: Si’ = < c1,

c2, c12, c13, c14, c15, c16, c8, c9 > and Sj’ = < c10, c11, c3, c4,

c5, c6, c7 >, where the test cases indicated in bold face are

those interchanged one by one from corresponding test

cases in parent test suites.

User Session-Based Test Case Generation and Optimization Using Genetic Algorithm

545

4.3 Mutation

Each bits of the genes chain of new individuals are mu-

tated with the probability of pm using the TSMutation

algorithm, where pm is a system control parameter. The

probability of mutation is often small, or else it will raise

questions over the system stability. The populations can

be prevented from stagnating through mutation. If there

is no mutation, then the test data of new populations will

be confined to the initial values. The mutation process is

conducted respectively using the TSMutation algorithm

for the test suites Si’ and Sj’ crossovered using the

TSCrossover algorithm. The new individuals after muta-

ting are employed to replace those before mutating. The

TSMutation algorithm is shown in Figure 3.

Let pm = 0.015 and Si’ = <c1, c2, c12, c13, c14, c15, c16, c8,

c9>, which is one of the test suites after using the

TSCrossover algorithm. We randomly generate some

data: g(c1) = 0.246, g(c2) = 0.580, g(c12) = 0.025, g(c13) =

0.012, g(c14) = 0.632, g(c15) = 0.073, g(c16) = 0.431, g(c8)

= 0.193 and g(c9) = 0.059. For pm is not less than g(c13),

c13 is mutated, i.e., the positions of c13 and c12 are inter-

changed. The test suites Si’ after mutating becomes < c1,

c2, c13, c12, c14, c15, c16, c8, c9 >.

4.4 The Interacting Testing among User Sessions

Some new test cases, which contain the requested infor-

mation of different users, can also be generated using

crossover. The goal of the new test cases is to detect er-

rors caused by the use of possible conflicting data shared

by different users. The crossover way shows the idea of

information interchange among different user sessions.

Let S be a test suite, the steps of generating a test case

using crossover are:

1) for a test case c in S, randomly generate a number g

∈[0, 1], if a given crossover probability is not less than

g, then

a) copy the requests from r1 to ri in c to an interim test

case, where i is a random number, i∈[1, |c|]; |c| is the

length of URL trace in c;

b) select a test case d (different from c) in S randomly,

and then search rj in d reversely, which is the first one to

have the same URL as ri. If not found, then another test

case (also denoted as d) is selected, until it is found or up

to a given time;

c) if d is found, then all the requests after rj in d are

appended to the interim test case, which is the new test

case generated; otherwise go 2);

2) repeat 1) until each test case in the test suite is

processed.

We search the requests in test case d with an inverse

search method, for the length of greatest common prefix

of two test cases c and d in the same test suite is often

greater than 1; when the corresponding URL trace from r1

to ri is the prefix of this greatest common prefix, we can

not obtain different test cases if using the sequential search

method. For example, given two URL traces c =

u1u2u4u3u5u6u8u7 and d = u1u2u4u3u7u4u5u9, we randomly

copy u1u2u4 from c to an interim test case t firstly (i.e., t

equals to u1u2u4 temporarily). Here, I = 3 and ri = u4. Now,

we search rj in d reversely, which is the first one to have

the same URL as ri and then we have j = 6 and rj = u4. So,

all the requests after r6 in d (i.e., u5u9) are appended to t

such that t = u1u2u4u5u9. If searching the requests in d with

a sequential search method, we have j = 3 and rj = u4. This

time, if copying all the requests after r3 in d (i.e.,

u3u7u4u5u9) to t, we have t = u1u2u4u3u7u4u5u9, which

equals to d itself. And no new test case is generated. The

Algorithm: TSCrossover

input:

Parent test suites Si and Sj;

The crossover probability of pc;

output:

Si’ and Sj’ of next generation;

begin

Si’, Sj’←Si, Sj;

Randomly generate a number g∈[0, 1];

if pc is not less than g

Randomly generate an integral number g’(0, min(∈|Si’|,

|Sj’|)]; //|Si’| and |Sj’| are the number

//of test cases in Si’ and Sj’ respectively

Interchange the test cases in Si’ and Sj’ from the position of

g’ one by one,

until no more test case needs to be interchanged in any of

them;

endif;

Output Si’ and Sj’;

end.

Figure 2. The crossover algorithm TSCrossover for test suites

Algorithm: TSMutation

input:

The test suites Si’ and Sj’ after using TSCrossover algorithm;

The mutation probability of pm;

output;

The test suites Si’ and Sj’ after mutating;

begin

for each S in {Si’, Sj’}

for each c in S

Randomly generate a number g∈[0, 1];

if pm is not less than g

Interchange the positions of c and the test case

before c;

endif;

endfor

Output S;

endfor

end.

Figure 3. The mutation algorithm TSMutation for test suites

User Session-Based Test Case Generation and Optimization Using Genetic Algorithm

546

reason is that the corresponding URL trace (i.e., u1u2u4)

from r1 to ri (i = 3) in c is the prefix of greatest common

prefix (i.e., u1u2u4u3) of c and d.

In addition, it is often not to mutate a test case, for the

new generated test case after exchanging the two adja-

cent requests ri and ri+1 makes no sense. The possible

reason is that the former request ri-1 may never reach the

request ri+1 (note that, this time ri becomes the next re-

quest of ri+1).

5. Experimental Analysis

Consider a typical miniature Web application developed

to demonstrate our approach: the SWLS (Simple Web

Starting at the first page (indicated by a dashed arrow,

reasonably, a blank page can be used to request for the

first page of a Web application), i.e., a home page (p1),

the user can enter into the news page (p2) to list the news

by clicking on the view link, or enter into the login page

(p3) by pressing the login button. In page p3, the user

enters the userid and password, and presses the submit

button. Upon this pressing, the userid and password are

sent to the Web server for authentication. A logged page

(p4) will be loaded if both userid and password are co-

rrect. On the contrary, an error page (p7) containing an

error message is displayed if at least one of the submitted

values for userid and password is wrong. From the

logged page, it is possible to go to info page (p5) for se-

cure information viewing by just clicking on the browse

link. The user can click on the intra-page link continue to

view the different parts of the same page p5 if it is too

long. A logout page (p6) will be displayed when the user

presses the exit or logout button. Then, the user may

come back to the home page for login again. Note that

each time the login page is displayed; both the userid

and password fields should be initialized to be empty.

We inject only one fault in each page respectively, so

at least 7 faults exist totally (there may be other faults in

the Web application originally). A set of user sessions

are obtained after scanning user logs on the Web server

and purging their irrelevant information; then 89 mean-

ingful user sessions are finally created after scanning

them again. Only 17 user sessions are used to generate

test cases after the reduction of the meaningful user ses-

sions using the URL trace-based ReduceUSession algo-

rithm; and the reduction ratio reaches 80.9%. Imperson-

ally, more user sessions there are for a given Web appli-

cation, much higher is the reduction efficiency, for more

user sessions mean higher possibility that the URL trace

requested by a user session is the prefix of that requested

by another in the view of statistics. The set of 17 user

sessions† (or test cases) is denoted by Γ1. After grouping

and prioritizing Γ1, an initial executing sequence <S1, S2,

S3> of test suites is obtained, which is denoted by Γ2,

where S1, S2 and S3 contain initial executing sequences

of 7, 6 and 4 test cases respectively. We achieve the final

executing sequence <S1’, S2’, S3’> of test suites using

genetic algorithm for further processing, which is de-

noted by Γ3, i.e., S1’, S2’ and S3’ is obtained through

crossovering and mutating S1, S2 and S3.

Now, we run the test suites (and test cases) in Γ1, Γ2

and Γ3 for the Web application respectively, and find all

the 7 faults injected as well as an additional fault (i.e.,

the user may browse p2 just by directly entering its ad-

dress in the URL address bar). The executing time of Γ2

and Γ3, however, is much shorter than that of Γ1, and it

takes less time of Γ3 than that of Γ2 too. This means that

the test suites (and test cases) grouped and prioritized run

faster than those, which are not grouped and prioritized;

and that the test suites (and test cases) processed further

by genetic algorithm are much faster. It is predictive that

the test approach proposed in this paper will yield more

evident positive effects for larger Web applications.

6. Concluding Remarks and Future Work

Generating test cases of high quality is the premise of

Web application testing. The approach in this paper cap-

tures a series of user events, i.e., the sequences of URLs

and name-value pairs in Web server (s). It then employs

the reduction, grouping, prioritization and genetic algo-

rithm to yield test cases and optimizes them. Compared

to the methods of capturing user events in clients, our

approach is very effective when a large volume of users

exist, and it is a Web application testing method of high

efficiency. The main contributions include:

1) an approach to generating and optimizing test cases

is proposed for Web application testing based on user

sessions using genetic algorithm.

2) several important definitions such as URL trace,

prefix, common prefix, greatest common prefix, are

given. These definitions are convenient for reducing and

grouping user sessions.

3) a URL trace-based reduction algorithm is designed.

The user sessions acquired are lessened greatly using the

view

browse

in continue

logout

return submit

submit

exit

home

page

logged

page

logout

page

error

page

info

page

home

page

news

page

† For the convenience of test run, we have preprocessed the user sessions

appropriately. Figure 4. A simple web login system

User Session-Based Test Case Generation and Optimization Using Genetic Algorithm

547

algorithm. However, it covers all the URLs requested by

the original set of user sessions and keeps the sequence

of URL requests.

4) an approach to grouping and prioritizing user sessions

is presented, as can improve the efficiency of test run.

5) an approach to generating new test cases is pro-

posed using crossover. The new test cases generated in-

clude information requested by different users and can

detect errors caused by the use of possible conflicting

data shared by different users. That’s to say, the cross-

over indicates the idea of information interchange among

different user sessions, which helps to test the interacting

of user sessions.

6) a strategy of test reuse and concurrence is provided,

as can decrease the time of test run greatly as well as le-

ssen test cost.

Web application testing is much complex systems engi-

neering. It is not easy to acquire an effective and practical

test scheme. Our approach only evaluates a test case ac-

cording to its test coverage ratio. However, many factors

need to be considered, such as the running cost of each test

case itself (for example, CPU time), including the actual

running time, loading time, time to save test state, and the

influence of different test criteria on a test case. All these

questions need to be answered in future research.

REFERENCES

[1] E. Hieatt and R. Mee, “Going Faster: Testing the Web Ap-

plication,” IEEE Software, Vol. 19, No. 2, 2002, pp. 60-65.

[2] D. C. Kung, C. H. Liu and P. Hsia, “An Object-Oriented

Web Test Model for Testing Web Applications,”

Proceedings of the 1st Asia-Pacific Conference on Web

Applications, New York, 2000, pp. 111-120.

[3] J. Offutt, Y. Wu. and X. Du, et al., “Bypass Testing of

Web Applications,” Proceedings of the 15th IEEE Inter-

national Symposium on Software Reliability Engineering,

Bretagne, November 2004.

[4] C. H. Liu, D. C. Kung and P. Hsia, “Object-Based Data

Flow Testing of Web Applications,” Proceedings of the

1st Asia-Pacific Conference on Quality Software, Hong

Kong, 2000, pp. 7-16.

[5] C. Elbaum, G. Rothermel and S. Karre, et al., “Leveraging

User Session Data to Support Web Application Testing,”

IEEE Transaction on Software Engineering, California,

May 2005.

[6] R. Godin, R. Missaoui and H. Alaoui, “Incremental

Concept Formation Algorithms Based on Galois (Concept)

Lattices,” Computational Intelligence, Vol. 11, No. 2,

1995, pp. 246-267.

[7] S. Khor and P. Grogono, “Using a Genetic Algorithm and

Formal Concept Analysis to Generate Branch Coverage

Test Data Automatically,” Proceedings of the Inter-

national Conference on Automated Software Engin-

eering, Austria, 2004, pp. 346-349.

[8] H. H. Sthamer, “The Automatic Generation of Software

Test Data Using Genetic Algorithms,” PhD. Dissertation,

University of Glamorgan, Wales, 1996.

[9] D. Berndt, J. Fisher and L. Johnson, et al., “Breeding

Software Test Cases with Genetic Algorithms,” Pro-

ceedings of the 36th Hawaii International Conference on

System Sciences, 2003, pp. 17-24.

[10] R. P. Pargas and M. J. Harrold, “Test-Data Generation

Using Genetic Algorithms,” Journal of Software Testing,

Verification and Reliability, Vol. 9, No. 4, 1999, pp.

263-282.

[11] X. X. Jia, J. Wu, M. Z. Jin, et al., “Some Experiment

Analysis of Using Generic Algorithm in Automatic Test

Data Generation,” in Chinese, Journal of Chinese Com-

puter Systems, Vol. 28, No. 3, 2007, pp. 520-525.

[12] D. J. Berndt and A. Watkins, “Investigating the Perfor-

mance of Genetic Algorithm-Based Software Test Case

generation,” Proceedings of the International Symposium

on High Assurance Systems Engineering, Florida, 2004,

pp. 261-262.

[13] J. H. Holland, “Adaptation in Natural and Artificial

System,” University of Michigan Press, Michigan, 1975.