DARM: Decremental Association Rules Mining

doi:10.4236/jilsa.2011.33019

Paper Menu >>

Journal Menu >>

Journal of Intelligent Learning Systems and Applications, 2011, 3, 181-189

doi:10.4236/jilsa.2011.33019 Published Online August 2011 (http://www.SciRP.org/journal/jilsa)

181

DARM: Decremental Association Rules Mining

Mohamed Taha1, Tarek F. Gharib2,3, Hamed Nassar4

1Faculty of Computers & Informatics, Benha University, Benha, Egypt; 2Faculty of Computer & Information Sciences, Ain Shams

University, Cairo, Egypt; 3Faculty of Computing and Information Technology, King Abdulaziz University, Jeddah, Saudi Arabia;

4Faculty of Computers & Informatics, Suez Canal University, Ismailia, Egypt.

Email: mohamed.taha@fci.bu.edu.eg, tfgharib@cis.asu.edu.eg, nassar@ci.suez.edu.eg

Received December 29th, 2010; revised April 1st, 2011; accepted April 8th, 2011.

ABSTRACT

Frequent item sets mining plays an important role in association rules mining. A variety of algorithms for finding fre-

quent item sets in very large transaction databases have been developed. Although many techniques were proposed for

maintenance of the discovered rules when new transactions are added, little work is done for maintaining the discov-

ered rules when some transactions are deleted from the database. Updates are fundamental aspect of data management.

In this paper, a decremental association rules mining algorithm is present for updating the discovered association rules

when some transactions are removed from the original data set. Extensive experiments were conducted to evaluate the

performance of th e p ropo sed algo rithm. The resu lts sh o w tha t the proposed algo rithm is efficien t a nd ou tp erforms other

well-know n al g ori t h ms.

Keywords: Decremental Mining, Association Rules Maintenance, Upda ting Association Rules

1. Introduction

Association rules mining is a data mining technique

which discovers strong associations or correlation rela-

tionships among data [1]. Recently, it has attracted much

attention in data mining research because of its wide ap-

plicability in many areas, including decision support,

market strategy and financial forecast. An association

rule is a relation between items in a set of transactions.

This rule must have two measures to express its statisti-

cal significance: the support and the confidence [2].

There are many mining algorithms to find out associa-

tion rules in large databases: the Apriori and its varia-

tions, Tree-Projection algorithm, FP-growth and others

[3-5]. The Apriori algorithm is the first successful algo-

rithm for mining association rules. The main idea of Ap-

riori-based algorithms is to run number of iterations. In

each iteration, a set of candidate itemsets is generated

then the database is scanned to count the number of

transactions that contain each candidate set. The candi-

dates with support counts greater than or equal to the mi-

nimum support count are the frequent itemsets. These

mining algorithms differ in the techniques used to create

the candidate sets. The smaller the number of candidate

sets is, the faster the algorithm would be.

The Apriori-based algorithms belong to a generate-

and-test paradigm. They compute frequent itemsets by

generating candidates and checking their frequencies a-

gainst the transaction database. Another paradigm is

based on a test-only approach. It does not generate can-

didates and only tests for frequency. The FP-growth al-

gorithm belongs to this paradigm. It uses a tree structure

to represent all the transactions of the database. The fre-

quent itemsets are generated with only two passes over

the database and without any candidate generation proc-

ess [6].

Due to advances in information technology, a large

amount of data could be collected easily. Databases may

be frequently or occasionally updated. Such updates may

change the characteristic of the database and hence in-

validate the existing discovered rules (some of the rules

may still be valid and new rules may appear). A brute

force approach to solve the update problem is to re- mine

the entire updated database to find the new association

rules. However, this approach is not efficient because it

ignores the previous computation that has been per-

formed. The present approach to the update problem is to

use incremental and decremental mining algorithms. In-

cremental mining refers to mining the database when

new transactions are added to it while decremental min-

ing refers to mining the database when some obsolete

transactions are deleted from it [7,8]. The main purpose

of these mining algorithms is to update the previously

discovered rules by benefiting from the previously dis-

covered information without repeating all the work done

DARM: Decremental Association Rules Mining

182

previously. Much effort has been devoted to the problem

of incremental mining and several algorithms have been

already proposed to update the association rules after

adding new transactions but decremental mining is not as

fortunate.

In this paper, a decremental updating algorithm called

Decremental Association Rules Mining algorithm (DARM)

has been introduced which aims to store and maintain the

itemsets that are not frequent at present but may be fre-

quent after updating the database. Therefore the process-

ing cost of the updated database can be reduced.

The rest of the paper is organized as follows. In the

next section, some preliminaries in association rules

mining are illustrated. Section 3 surveys the related work.

In Section 4, the DARM algorithm is presented. Section

5 shows the experimental results. Finally, conclusions are

presented in Section 6.

2. Preliminaries

Let I = {i1, i

2,, im} be a set of items and DB be a

transaction database, where each transaction T is a set of

items such that T



I. For an itemset A



I, a transaction T

contains A if and only if A



T. An association rule is

defined as an implication of the form A



B, where A



I and A



B =



. The association rule A



B has

support s in DB if s is the percentage of transactions in DB

contains A



B. The association rule A



B holds in DB

with confidence c if c is the percentage of transactions in

DB that contain A also contain B. The problem of mining

association rules is to extract all the association rules

whose support and confidence are not less than given

threshold values called min_sup and min_conf. The asso-

ciation rules that satisfy both the minimum support thre-

shold and minimum confidence threshold are called

strong rules to distinguish them from the weak ones. The

support of an itemset A is the percentage of transactions in

DB that contain A. An itemset A is frequent if its support is

not less than a minimum support threshold min_sup [9-

12].

Let L be the set of frequent itemsets in the database DB,

D be the number of transactions in DB, s be the minimum

support and X.support be the support count for an itemset

X which is the number of transactions in DB containing X.

Assume that for each X



L, X.support is available. After

some updates, a set of old transactions db (decrement

database) is deleted from DB, and d is the number of

transactions in db. Using the same minimum support s, an

itemset X is frequent in the updated database (DB – db) if

the support of X in (DB – db) is not less than or equal to s,

i.e., X.support  s  (D – d) [9,11]. Therefore, the problem

of updating association rules can be defined as finding the

set L` of frequent itemsets in DB – db.

3. Related Work

In the literature, the majority of the research in updating

the discovered association rules is focused on the incre-

mental mining. This is due to updating databases by

adding new transactions has many applications such as

web log records, stock market data, grocery sales data,

transactions in electronic commerce, and daily weather/

traffic records, to name a few [9]. Several incremental

mining algorithms are proposed such as Fast UPdate al-

gorithm (FUP) [10], The Update Large Itemsets algo-

rithm (ULI) [13], Negative Border with Partitioning

(NBP) [2], Update with Early Pruning algorithm (UWEP)

[11], New Fast UPdate algorithm (NFUP) [14] and

Granular Computing Based Incremental Frequent Itemset

Mining in distorted databases (GrC-IFIM) [15]. Unfor-

tunately, little work has addressed the decremental min-

ing although deletion is one of the most frequently op-

erations in many database systems. In many applications,

the user may want to remove the out-of-date data from

the database.

Cheung et al. proposed the first decremental mining

algorithm called FUP2 [12]. In fact, it is an extension of

the incremental algorithm FUP to update the discovered

association rules when transactions are deleted from the

database. Mainly, the framework of FUP2 is similar to

that of Apriori. It contains number of iterations. At each

iteration, the frequent itemsets of the same size are found.

However, FUP2 has two main problems; it generates

large number of candidate itemsets and it requires multi-

ple scans of the database.

Lee et al. proposed a Sliding-Window Filtering tech-

nique (SWF) for updating the discovered association

rules [9]. The SWF technique partitions the transaction

database into a set of partitions and employs the mini-

mum support threshold in each partition to filter unnec-

essary candidate itemsets. The algorithm uses a scan re-

duction technique in which it generates all the candidate

itemsets of all sizes from the candidate 2-itemsets to re-

duce the number of database scans. However, this leads

to generating larger number of candidates. In fact, SWF

does not utilize previously discovered frequent itemsets

which are used in the other algorithms to improve incre-

mental or decremental mining. Since the SWF is based

on partitioning the database, it requires the data to be

uniformly distributed to work well. Chang et al. extended

the SWF algorithm by incorporating previously discov-

ered information and proposed two algorithms to en-

hance the performance for incremental mining [16]. How-

ever, they are only suitable for cases in which a whole

partition of data is deleted from a database at a time. In

practice, situations requiring the deletion of continuous

blocks of data from databases are not very common [17].

DARM: Decremental Association Rules Mining

183

Lian et al., [18] addressed the relationships between

old and new maximal frequent itemsets and proposed an

algorithm called IMFI, which is based on these relation-

ships to reuse previously discovered knowledge. The al-

gorithm follows a top-down mechanism rather than tradi-

tional bottom-up methods to produce fewer candidates.

Moreover, they integrated SG-tree into IMFI to improve

the counting efficiency.

Zhang et al. proposed the Decrement Updating Algo-

rithm (DUA) [7,8]. The algorithm tries to find those

itemsets that are most likely to change their frequentness

in the remaining database by: 1) finding the frequent

itemsets of the decrement database using a threshold

value less than the minimum support. 2) finding the fre-

quent itemsets of a random set of transactions from the

original database. One important drawback of this algo-

rithm is that its accuracy drops when a lower minimum

support threshold is used because it can not find all the

frequent itemsets in the updated database. Another draw-

back is that the DUA algorithm is efficient as long as the

size of the decrement database is less than 30% of the

total data in the original database [7].

Also, Zhang et al. proposed another algorithm called

Efficient Dynamic Database Updating Algorithm (EDUA)

[17]. It is similar to the SWF in adopting the scan reduc-

tion technique. The EDUA consists of two procedures:

the pre-mining procedure which uses the scan reduction

technique to calculate all the frequent itemsets of the

original database and store all the candidate 2-itemsets

for the following procedure; the dynamic maintaining

procedure in which all the candidate 2-itemsets in the

deleted database are calculated with their support counts

and subtracted from their corresponding 2-itemsets in the

original database then the scan reduction technique is

used again to generate all the candidates of all sizes. Fi-

nally, the new candidate itemsets are checked against the

updated database in order to discover the new frequent

itemsets. The main advantage of the EDUA over the

SWF is that it can deal with the case in which transac-

tions are deleted from any part of the database, while the

SWF can only function when the whole first partition is

deleted from the database. Like SWF, EDUA does not

benefit from the previously discovered frequent itemsets.

A pruning technique was also proposed in [17] to en-

hance the performance of the EDUA that tries to use

heuristic knowledge of the prior mining results.

Mengling Feng et al., [19] addressed the maintenance

of the frequent patterns space of both incremental and

decremental updates. The author’s conducted an in-depth

investigation on how the frequent pattern space evolves

under both incremental and decremental updates. Based

on the evolution analysis, a new data structure, Genera-

tor-Enumeration Tree (GE-tree), is developed to facilitate

the maintenance of the frequent pattern space. With the

concept of GE-tree, Mengling Feng et al., proposed two

novel algorithms, Pattern Space Maintainer+ (PSM+)

and Pattern Space Maintainer− (PSM−), for the incre-

mental and decremental maintenance of frequent pat-

terns.

4. The DARM Algorithm

The main idea of the proposed algorithm (DARM) is to

store and maintain the itemsets that are not frequent at

present but may be frequent after deleting a set of trans-

actions. We will call these itemsets the semi-frequent

itemsets and denote them by SF. In practice, the decre-

mental algorithm is not invoked every time a transaction is

deleted from the database. However, it is invoked after a

non-trivial number of transactions are deleted. Assume

that the algorithm will be invoked every time d transac-

tions are deleted from the database, so an itemset X is

semi-frequent if it satisfies the following condition:

s  D > X.support  s  (D – d)

This means that the semi-frequent itemsets are those

itemsets that may be frequent after d transactions are

deleted from the database where their support counts are

less than the minimum support count and greater than or

equal to s  (D – d).

When deleting a set of transactions from a database, all

the itemsets in the updated database can be categorized

into four classes as illustrated in Ta ble 1. The itemsets in

classes A and D need some processing because they may

be frequent or infrequent in the updated database. How-

ever, the itemsets in classes B and C are straight forward.

If an itemset is frequent in the original database and in-

frequent in the decrement database, it will be frequent in

the updated database. Also, if an itemset is infrequent in

the original database and frequent in the decrement data-

base, it will be infrequent in the updated database.

In order to find the new frequent itemsets of the up-

dated database, the DARM algorithm must handle all the

itemsets in the four classes. The itemsets of class A, B

and C can be obtained easily by using the frequent item-

sets of the original and the decrement databases. How-

Table 1. The four classe s for an item set in the updated data-

base.

Class Original Database

(DB) Decrement Database

(db) Updated Database

(DB – db)

AFrequent Frequent Frequent or Infrequent

BFrequent Infrequent Frequent

CInfrequent Frequent Infrequent

DInfrequent Infrequent Frequent or Infrequent

DARM: Decremental Association Rules Mining

184

ever, the itemsets of class D need further consideration

that is why the semi-frequent itemsets are stored. Table 2

shows the definitions of symbols used in the pseudo-code

of the DARM algorithm. The actual steps of the DARM

algorithm are shown in Figure 1. The first step of the

algorithm is to find the frequent itemsets of the decre-

ment database with their support counts. Then, it sub-

tracts the support counts of the frequent itemsets of the

decrement database from their corresponding frequent

itemsets of the original database. The new support counts

of these itemsets are then compared to the minimum

support to determine which itemsets will be still frequent

and which will not. By this way, the algorithm handles

those itemsets falling in class A.

To determine the itemsets of class D, the algorithm

eliminates each frequent itemset in the decrement data-

base from SF. After this elimination, the remaining item-

sets in SF are the infrequent itemsets in both the original

and the decrement database (class D). These itemsets

require a scan of the decrement database to update their

support counts then the algorithm verifies them against

the minimum support. The rest of the itemsets that are

frequent in the original database and they do not occur in

the frequent itemsets of the decrement database will be

frequent in the updated database and do not need to be

verified (class B).

However, the DARM algorithm needs a scan to the

decrement database for these itemsets to update their

support counts for further runs. Hence, the algorithm

scans the decrement database only once to handle all the

itemsets in class B and class D. Also, the rest of frequent

itemsets of decrement database that do not have a corre-

sponding frequent itemsets in the original database will

be infrequent so that they do not need any processing at

all (class C).

For the subsequent runs of the algorithm, the semi-

Table 2. The list of symbols used in the DARM Algorithm.

Symbol Meaning

DB The original database

D The number of transactions in DB

db The decrement database

d The number of transactions in db

DB – db The updated database



The number of transactions in the updated database

S The minimum support

SF The set of semi-frequent itemsets

L The set of frequent itemsets in DB

Ldec The set of frequent itemsets in db



The set of frequent itemsets in DB – db

X An itemset

X.support The number of transactions containing X in the data-

base

X.supportDB The number of transactions containing X in DB

X.supportdb The number of transactions containing X in db

X.supportDB–db The number of transactions containing X in DB – db

Figure 1. The DARM algorithm.

123

Figure 2. The update procedure of SF.

frequent itemsets need to be updated. Figure 2 shows the

Update_SF procedure that maintains the itemsets in SF.

First, the procedure computes the negative border of new

frequent itemsets of the updated database. Then, it com-

putes the support count of each itemset in the negative

border in the updated database and checks if the itemset

satisfies the condition of semi-frequent itemsets.

Example 4.1 Consider the database DB presented in

Figure 3(a) with ten transactions indexed from T1 to

T10. All the frequent and semi-frequent itemsets of DB

are known in advance and shown in Figure 3(b) with a

Inputs:

DB, db, L, SF and s

Output:

L (L is initially set to ) and SF

Algorithm:

1- Find the set Ldec of frequent itemsets in the decrement data-

base db.

2- For all X  L

If X  Ldec then

X.supportDB-db = X.supportDB – X.supportdb

If X.supportDB-db  s  (D – d) then

L = L  {X}

Remove X from L and Ldec

3- For all X  SF

If X  Ldec then

Remove X from SF

4- Scan db and compute X.supportdb for all X  L  SF

5- For all X  L  SF

X.supportDB–db = X.supportDB – X.supportdb

If X  L then

L = L  {X}

Else

If X.supportDB–db  s  (D – d) then

L = L  {X}

6- Update_SF (L) //a procedure to maintain the itemsets in

7- Return L

Function Update_SF (L



)

1- Compute Negative border of L NBd (L)

2- Scan DB – db and compute X.supportDB-db for all X  NBd

(L)

3- For X  NBd (L)

If X.supportDB-db <s



D && X.supportDB-db





D

SF = SF  {X}

End Function

DARM: Decremental Association Rules Mining

185

minimum support 30%. After a period of time, four

transactions are deleted from DB: T1, T7, T9 and T10.

To find the new frequent itemsets of the updated data-

base, the algorithm computes the frequent itemsets of the

decrement database and the result is shown in Ldec in

Figure 3(b).

After that, it begins to handle the itemsets of class A

Original database (DB) Decrement database (db)

TID Transaction

T1 A B E F T1 A B E F

T2 B C F T7 C E F

T3 A C E F T9 B C E F

T4 B D E T10 B F

T5 A B C E

T6 A B D E

T7 C E F

T8 A B F

T9 B C E F

T10 B F

(a)

L Ldec SF

Itemset Count Itemset Count Itemset Count

A 5 B 3 D 2

B 8 C 2 AC 2

C 5 E 3 ABF 2

E 7 F 4 AEF 2

F 7 BE 2 BCE 2

AB 4

BF 3 BCF 2

AE 4

CE 2 BEF 2

AF 3

CF 2

BC 3

EF 3

BE 5 BEF 2

BF 5 CEF 2

CE 4

CF 4

EF 4

ABE 3

CEF 3

(b)

Class A L

Itemset Count Itemset Count

B 5 B 5

C 3 C 3

E 4 E 4

F 3 F 3

BE 3 BE 3

BF 2 BF 2

CE 2 CE 2

CF 2 CF 2

EF 1

CEF 1

(c)

Class B Class C Class D

Itemset Count Itemset Count Itemset Count

A 5 BEF 2 D 2

AB 4 AC 2

AE 4 ABF 2

AF 3 AEF 2

BC 3 BCE 2

ABE 3 BCF 2

(d)

Class B Class D L

Itemset Count ItemsetCount Itemset Count

A 4 D 2 A 4

AB 3 AC 2 B 5

AE 3 ABF 1 C 3

AF 2 AEF 1 D 2

BC 2 BCE 1 E 4

ABE 2 BCF 1 F 3

AB 3

AC 2

AE 3

AF 2

BC 2

BE 3

BF 2

CE 2

CF 2

ABE 2

(e)

Figure 3. An illustrative example.

which are the common itemsets in L and Ldec. All the

support counts of the frequent itemsets in Ldec are sub-

tracted from their corresponding frequent itemsets in L

(the highlighted itemsets in Figure 3(b)) and these item-

sets are removed from the two lists. The results of the

subtraction are shown in class A in Figure 3(c). The

itemsets in class A for which the support count is not less

than (6 × 30% = 1.8 ≈ 2) are moved to L' as shown in

Figure 3(c). Note that the itemsets EF and CEF have not

enough support count so they are not moved to L'. The

remaining itemsets in L and Ldec are shown in class B and

class C in Figure 3(d) respectively. By default, all the

itemsets in class B will be frequent in the updated data-

base and there is no need to be verified but their support

counts need to be updated. Only their support counts in

DB are known so they need a db scan to determine their

support counts. The DARM algorithm postpones this

scan until it determines the itemsets of class D to handle

both of them in one scan. Using the itemsets in SF and in

Ldec, the algorithm can determine the itemsets of class D

by subtracting the itemsets of Ldec from SF. Here, the

itemset BEF marked by an oval in Figure 3(b) is elimi-

nated from SF because it appears in Ldec and all the re-

maining itemsets in SF will belong to class D as shown

in Figure 3(d). Then, the algorithm scans db to update

the support counts of the itemsets of class B and D. Fig-

ure 3(e) shows the itemsets of class B and D with their

new support counts in the updated database. All the

itemsets of class B are then moved to L' without any

verification while the itemsets of class D are verified

against the minimum support count of the updated data-

base. The itemsets D and AC are the only itemsets from

class D that are moved to L'.

5. Experimental Results

Several experiments are designed to assess the perform-

DARM: Decremental Association Rules Mining

186

ance of the proposed algorithms and compare it with the

performance of two other decremental algorithms: DUA

and EDUA. These two algorithms are chosen because

they are the most recent algorithms for updating associa-

tion rules when some transactions are deleted from the

database. The comparisons are based on the following

metrics: run time, minimum support, original database

size and decrement database size.

In this section, the experiments of the DARM algo-

rithm are reported. All the programs are implemented

using C# and run on a PC with 1.8 GHz Intel Core 2 Duo

processor and 1GB memory running on Windows XP

Professional. In the experiments of data mining, it is

common to use the notation [Tx-Iy-Dm-dn] to represent

a dataset in which x is the mean size of a transaction, y is

the mean size of maximal frequent itemsets, m is the

number of transactions in the original database (in thou-

sands) and n is the number of transactions in the decre-

ment database (in thousands).

The experiments carried out performed using the syn-

thetic data used in the experimental results of the previ-

ously mining algorithms introduced in [7-9,15,16].

Readers are referred to these papers for a detailed de-

scription. In general, several different transaction data-

bases are generated from a set of potentially frequent

itemsets using the parameters mentioned in the notion

[Tx-Iy-Dm-dn]. These transactions mimic the transac-

tions in the retailing environment.

The first experiment measures the relative run time of

the DARM, DUA and EDUA algorithms. The three al-

gorithms are tested against different datasets with mini-

mum support thresholds varying from 0.1% to 1%. The

results are shown in Figures 4-6 with original database

sizes 50 k, 100 k and 150 k respectively. The size of the

decrement database is equal to 20% of the transactions in

the original database. It is clear from these figures that

the DARM algorithm reduces the run time required to

Figure 4. The relative performance of the decremental al-

gorithms and the DARM Algorithm.

Figure 5. The relative performance of the decremental al-

gorithms and the DARM Algorithm.

Figure 6. The relative performance of the decremental al-

gorithms and the DARM Algorithm.

perform the update operation than other two decremental

algorithms.

By examining these figures more carefully, it can be

noticed that at low support thresholds, the performance

difference among the three algorithms is significant while

it decreases as the support thresholds increase. This is

because at low support thresholds, large numbers of fre-

quent itemsets are produced hence the performance dif-

ferences among different algorithms are prominent. How-

ever at high support thresholds, there is a restricted num-

ber of frequent itemsets so the run time curves of differ-

ent algorithms approach each other in this region. Also, if

we examine these figures more carefully, we can notice

sharper differences in the run times of EDUA at high

support thresholds than other two algorithms. To under-

stand this phenomenon, recall that EDUA generates the

entire candidate itemsets of all sizes from the candidate

2-itemsets to reduce the number of database scans. How-

ever, the DUA and the DARM algorithm are based on

handling the frequent itemsets.

Figure 7 shows the average speedup ratio achieved by

DARM: Decremental Association Rules Mining

187

the DARM algorithm with respect to DUA and EDUA

algorithms. It can be seen that the average speed is 1.14

with the DUA and 1.62 with the EDUA.

In the second experiment, the frequent itemsets gener-

ated by each algorithm are verified to determine its ac-

curacy. The accuracy is defined as the ratio of the number

of frequent itemsets found by the algorithm against the

number of frequent itemsets found by running Apriori on

the updated database [7,8]. Table 3 shows the number of

frequent itemsets generated by each algorithm. The ac-

curacy of the DUA algorithm is 89.77%, 99.69% and

95.95% for the datasets 50 k, 100 k and 150 k respectively

while the accuracy of the DARM algorithm and EDUA is

equal to 100% for all datasets. The reason for the accuracy

difference among these algorithms is in the way they

handle the itemsets of class D. The DUA tries to find the

class D itemsets by reducing the minimum support by

which the mining is performed on the decrement database

and by finding the frequent itemsets of a random set of

transactions from the original database. However, this

does not guarantee to find all itemsets of class D. For the

EDUA, it depends on storing all candidate 2-itemsets and

generating all the candidates of all sizes from them so no

frequent itemset will be missed. For the DARM algorithm,

only the semi-frequent itemsets are stored and this guar-

antees finding all frequent itemsets without any miss.

The next experiment is performed to find out how the

size of decrement database affects the performance of the

DARM algorithm. In this experiment, the size of original

database is fixed while the size of the decrement database

is changing. As shown in Figure 8, the results are very

Figure 7. The speedup ratio.

Table 3. Accuracy comparsion of the decremental algori-

thms.

Number of frequent itemsets generated

Datasets DUA EDUA DARM Apriori

T10-I4-D50-d10 13,404 14,931 14,931 14,931

T10-I4-D100-d20 27,377 27,461 27,461 27,461

T10-I4-D150-d30 22,470 23,419 23,419 23,419

Accuracy 89.77% to

99.69% 100% 100%

encouraging. It shows that the DARM algorithm not only

can work with small decrement, but also can work very

well in case of large decrement. In general, the larger the

decrement is, the longer it would take to do the update.

Another similar experiment is done to know the influ-

ence of changing the size of the original database. Figure

9 shows the scalability of the DARM algorithm with re-

spect to changing the size of the original database. It

shows that the proposed algorithm is adaptive to size

increase and can be applied to very large databases.

The last experiment evaluates the reduction in the total

number of candidates generated by the DARM algorithm.

In this experiment, the number of candidates generated

by the DARM algorithm is counted and compared with

the number of candidates generated by the other algo-

rithms. The results are shown in Table 4. Note the sig-

nificant reduction in total number of candidate itemsets

compared to the other algorithms. The DARM algorithm

achieves a reduction rate equal to 34.98% relative to the

DUA and 47.73% relative to the EDUA. Figure 10 shows

the candidate reduction rates of the DARM algorithm

with respect to the DUA and EDUA algorithms.

6. Conclusions

The maintenance of the frequent pattern is a challenging

task in data mining and machine learning. In this paper, a

Figure 8. Scalability with the number of transactions in db.

Figure 9. Scalability with the number of transactions in db.

DARM: Decremental Association Rules Mining

188

Table 4. Comparing the number of candidates.

Number of candidates generated

Itemsets DUA EDUA DARM

1 2,427 2,000 1,611

2 33,459 37,801 20,530

3 24,433 30,852 15,877

4 19,147 23,341 12,667

5 11,431 15,203 7,890

6 5,335 9,612 3,838

7 1,965 2,757 1,409

8 543 1,002 377

9 100 359 67

10 9 36 6

Total 98,849 122,963 64,272

Figure 10. Candidate reduction.

decremental association rules mining algorithm (DARM)

is presented. The proposed algorithm studied the prob-

lems of pattern maintenance when some transactions are

removed from a large database.

The main idea of the DARM algorithm is to store and

maintain the itemsets that are not frequent at present but

may be frequent after deleting a set of transactions. There-

fore the processing cost of the original database can be

reduced. The DARM algorithm does not require rescan-

ning the original database and can determine the new

frequent itemsets by scanning the decrement database

only. Experiments have shown that the DARM algorithm

not only attain highly accurate mining results, but also

run significant faster than existing algorithms for updat-

ing the frequent itemsets. The DARM algorithm outper-

forms the DUA in terms of both run time and in the ac-

curacy.

REFERENCES

[1] Z. Q. Zhou and C. I. Ezeife, “A Low-Scan Incremental

Association Rule Maintenance Method,” Proceedings of

the 14th Canadian Conference on Artificial Intelligence

(AI 2001), Ottawa, 2001, pp. 26-35.

[2] R. Kashef and Y. El-Sonbaty, “NBP: Negative Border

with Partitioning Algorithm for Incremental Mining of

Association Rules,” Proceedings of the International

Conference on Intelligent Systems Design and Applica-

tions (ISDA’04), Budapest, Hungary, August 2004.

[3] G. Grahne and J. Zhu, “Fast Algorithms for Frequent

Itemset Mining Using FP-Trees,” IEEE Transactions on

Knowledge and Data Engineering, Vol. 17, No. 10, 2005,

pp. 1347-1362. doi:10.1109/TKDE.2005.166

[4] R. C. Agarwal, C. C. Aggarwal and V. V. V. Prasad, “A

Tree Projection Algorithm for Generation of Frequent

Item Sets,” Parallel and Distributed Computing Journal,

Vol. 61, No. 3, 2001, pp. 350-371.

doi:10.1006/jpdc.2000.1693

[5] J. Han, J. Pei and Y. Yin, “Mining Frequent Patterns

without Candidate Generation,” Proceedings of ACM-

SIGMOD International Conference on Management of

Data, Dallas, 2000, pp. 1-12.

doi:10.1145/342009.335372

[6] A. Ceglar and J. F. Roddick, “Association Mining,” ACM

Computing Surveys Journal, Vol. 38, No. 2, 2006, Article

[7] S. C. Zhang, J. L. Zhang and Z. Jin, “A Decremental Al-

gorithm of Frequent Itemset Maintenance for Mining

Updated Databases,” The Journal of Expert Systems with

Applications, Vol. 36, No. 8, 2009, pp. 10890-10895.

[8] S. C. Zhang, X. D. Wu, J. L. Zhang and C. Q. Zhang, “A

Decremental Algorithm for Mining Dynamic Databases,”

Proceedings of 7th International Conference on Data

Warehousing and Knowledge Discovery (DaWak 2005),

Copenhagen, 2005, pp. 305-314.

[9] C. H. Lee, C. R. Lin and M. S. Chen, “Sliding-Window

Filtering: An Efficient Algorithm for Incremental Min-

ing,” Proceedings of 10th International Conference on

Information and Knowledge Management, Atlanta, 2001,

pp. 263-270.

[10] D. W. Cheung, J. Han, V. T. Ng and C. Y. Wong, “Main-

tenance of Discovered Association Rules in Large Data-

bases: An Incremental Updating Technique,” Proceedings

of 12th International Conference on Data Engineering,

New Orleans, 1996, pp. 106-114.

[11] N. F. Ayan, A. U. Tansel and E. Arkun, “An Efficient

Algorithm to Update Large Itemsets with Early Pruning,”

Proceedings of 5th ACM SIGKDD International Confer-

ence on Knowledge Discovery and Data Mining, San

Diego, 1999, pp. 287-291.

[12] D. W. Cheung, S. D. Lee and B. Kao, “A General Incre-

mental Technique for Maintaining Discovered Associa-

tion Rules,” Proceedings of 5th International Conference

on Database Systems for Advanced Applications, Mel-

bourne, 1997, pp. 185-194.

doi:10.1142/9789812819536_0020

[13] S. Thomas, S. Bodagala, K. Alsabti and S. Ranka, “An

Efficient Algorithm for the Incremental Updating of As-

sociation Rules in Large Database,” Proceedings of 3rd

International Conference on Data Mining and Knowledge

Discovery, Newport Beach, 1997, pp. 263-266.

[14] C.-C. Chang, Y.-C. Li and J.-S. Lee, “An Efficient Algo-

rithm for Incremental Mining of Association Rules,”

Proceedings of 15th International IEEE Workshop on

Research Issues in Data Engineering: Stream Data Min-

DARM: Decremental Association Rules Mining

189

ing and Applications (RIDE-SDMA’05), Tokyo, 2005.

[15] C. Xu and J. Wang, “An Efficient Incremental Algorithm

for Frequent Itemsets Mining in Distorted Databases with

Granular Computing,” Proceedings of 2006 IEEE/WIC/

ACM International Conference on Web Intelligence, Hong

Kong, 2006, pp. 913-918.

[16] C.-H. Chang and S.-H. Yang, “Enhancing SWF for In-

cremental Association Mining by Itemset Maintenance,”

Proceedings of 7th Pacific-Asia Conference on Advances

in Knowledge Discovery and Data Mining (PAKDD

2003), Seoul, 2003, pp. 301-312.

[17] S. C. Zhang, J. L. Zhang and C. Q. Zhang, “EDUA: An

Efficient Algorithm for Dynamic Database Mining,” In-

formation Sciences Journal, Vol. 177, No. 13, 2007, pp.

2756-2767. doi:10.1016/j.ins.2007.01.034

[18] W. Lian, D. W. Cheung and S. M. Yiu, “Maintenance of

Maximal Frequent Itemsets in Large Databases,” Pro-

ceedings of 2007 ACM Symposium on Applied Computing

(SAC’07), Seoul, 2007, pp. 388-392.

[19] M. L. Feng, G. Z. Dong, J. Y. Li, Y.-P. Tan and L. Wong,

“Pattern Space Maintenance for Data Updates and Inter-

active Mining,” Computational Intelligence, Vol. 26, No.

3, 2010, pp. 282-316.

doi:10.1111/j.1467-8640.2010.00360.x