Intelligent Information Management, 2012, 4, 217-224
http://dx.doi.org/10.4236/iim.2012.45032 Published Online September 2012 (http://www.SciRP.org/journal/iim)
A Study on Associated Rules and Fuzzy Partitions for Classification
Yeu-Shiang Huang, Jyi-Feng Yao
Department of Industrial and Information Management, National Cheng Kung University, Chinese Taipei
Email: yshuang@mail.ncku.edu.tw
Received May 12, 2012; revised June 22, 2012; accepted July 2, 2012
ABSTRACT
The amount of data available for decision making has increased tremendously in the age of the digital economy. Decision makers who fail to handle these data proficiently may make incorrect decisions and thereby harm their business. Extracting and classifying useful information efficiently and effectively from huge amounts of data is therefore of special importance. In this paper, we consider data whose attributes may be both crisp and fuzzy. Suitable partial data are first examined to form segments for the different classes. A multithreaded computation then generates crisp rules where possible, and finally the fuzzy partition technique handles the fuzzy attributes for classification. The rules generated in classifying the overall data can be used to gain further knowledge from the collected data.
Keywords: Data Mining; Fuzzy Partition; Partial Classification; Association Rule; Knowledge Discovery
1. Introduction
Due to the rapid development of information technology, such as databases and networks, modern industry can easily gather and store large amounts of data. The collected data may include product sales, manufacturing records, supplier profiles, and customer information, and they are used for transaction processing, information management, and decision support. This enormous growth in data can cause information overload, which leaves decision makers unable to manipulate the data efficiently and effectively. The result may be incorrect decisions that harm their business. Some customer-oriented companies have started to realize that they need to pay more attention to customers and their preferences. Effective manipulation of large amounts of customer-related information has become a critical success factor for corporations, because these useful (but often unorganized) data, usually stored in data warehouses, may contain information that matters for managerial decision making. Discovering the knowledge buried deep in data warehouses is therefore a crucial task and a source of strength for decision makers. The concept of knowledge discovery in databases (KDD) may offer a good solution to such problems, because it can transform data into valuable knowledge that supports decision analyses and improves customer relationships. Recent research into knowledge discovery has drawn on statistical methods, artificial intelligence, neural networks, genetic algorithms, and fuzzy inference, among others.
Conventional classification approaches generally focus on one specific type of data, typically numerical or nominal data. This limitation is impractical when the attributes of the collected data are both numerical and nominal, and also both crisp and fuzzy. Association rules can be used to extract classification knowledge by analyzing, matching, and combining the data, but such classification usually yields only crisp rules. Some valuable attributes, however, can only be represented in fuzzy form and must be taken into account as well. Fuzzy inference and fuzzy rule generation can therefore play important roles in dealing with such problems.
In this paper, we use association rules to construct the classification knowledge, and we employ a multithreaded approach to speed up the generation of crisp rules. When the crisp rules generated are not sufficient for classification, fuzzy rules are used to handle the fuzzy attributes and to generate the final rules for classification. We expect this approach to require considerably less computation than alternatives such as neural networks or genetic algorithms. The objective of this paper is therefore to extract the classification knowledge buried in the data warehouse, using association rules and fuzzy partitions to cope with both crisp and fuzzy data. We also study how some important factors affect the classification results, such as the number of classes and the minimal supports. Notably, the multithreaded technique can substantially reduce the processing time required to reach the desired results. The rest of this paper is organized as follows. Section 2 reviews the concepts of data association and classification for dealing with collected data. Section 3 presents the proposed algorithms. Section 4 validates the proposed scheme on simulated data. Section 5 draws concluding remarks.
Copyright © 2012 SciRes. IIM
2. Literature Review
Knowledge discovery has been a vital theme for corporations since 1991. The knowledge discovery process includes data cleansing, integration, selection, transformation, mining, and interpretation [7,20]. KDD is a cyclic process for gaining valuable knowledge, and data mining plays an important role in it [8]. Fayyad et al. [7] stated that data mining is “the nontrivial process of identifying valid, novel, potentially useful, and ultimately understandable patterns in data.” Data mining can extract information through data association, classification, clustering, visualization, and templates, among other approaches. Its research areas include statistics, machine learning, data visualization, knowledge management, and database techniques [9]. Decision trees, neural networks, and genetic algorithms are often used to cope with the complicated classification and prediction problems that arise in this process. This paper handles both crisp and fuzzy data, using association and classification techniques to gain effective knowledge for better managerial decision making.
The major task of data association is to find patterns of association among data and then to generate association rules according to predefined confidences and supports [1,5,10,22]. Han et al. [10] used supermarket transaction data and found an interesting (and now famous) association rule: customers who bought diapers usually bought beer at the same time. This seemed odd at first, since no one had expected that placing diapers and beer on nearby shelves could raise their sales, but it eventually turned out to be a great idea. Data association is therefore a valuable technique for mining deeply buried, otherwise inconceivable information and making it available for managerial decision making. The whole set of association rules can be obtained by mining the database with the data association technique, but the crucial task then becomes pruning such large rule sets. The minimum coverage (support) of these rules in the database is the threshold for retaining useful association rules. The general procedure of data association consists of two steps [1]. The first step finds the large (frequent) itemsets, i.e., the sets of items whose supports are not less than the preset minimum support. The second step generates association rules from these large itemsets. Mannila et al. [18] and Park et al. [19] followed these basic steps and proposed revised association procedures to improve the effectiveness of association mining.
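To make the two quantities behind these thresholds concrete, the following Python sketch computes support and confidence over a toy set of transactions. It also carries out the second step of the procedure, splitting a frequent itemset into rules that meet a minimum confidence. The transactions and function names are hypothetical illustrations. They are not data or code from the studies cited above.

```python
from itertools import combinations

# Hypothetical toy transactions, for illustration only.
TRANSACTIONS = [
    {"diapers", "beer", "milk"},
    {"diapers", "beer"},
    {"diapers", "bread"},
    {"beer", "chips"},
    {"diapers", "beer", "chips"},
]


def support_count(itemset, transactions):
    """Number of transactions containing every item of `itemset`."""
    items = set(itemset)
    return sum(1 for t in transactions if items <= t)


def support(itemset, transactions):
    """Fraction of transactions containing every item of `itemset`."""
    return support_count(itemset, transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """count(X and Y) / count(X) for the rule X -> Y (0.0 if X never occurs)."""
    base = support_count(antecedent, transactions)
    both = support_count(set(antecedent) | set(consequent), transactions)
    return both / base if base else 0.0


def rules_from_itemset(itemset, transactions, min_conf):
    """Step 2: split one frequent itemset into X -> Y rules that meet min_conf."""
    items = sorted(itemset)
    rules = []
    for size in range(1, len(items)):
        for lhs in combinations(items, size):
            rhs = tuple(i for i in items if i not in lhs)
            conf = confidence(lhs, rhs, transactions)
            if conf >= min_conf:
                rules.append((lhs, rhs, conf))
    return rules


if __name__ == "__main__":
    print(support({"diapers", "beer"}, TRANSACTIONS))        # 0.6
    print(confidence({"diapers"}, {"beer"}, TRANSACTIONS))   # 0.75
    print(rules_from_itemset({"diapers", "beer"}, TRANSACTIONS, 0.7))
```

The code divides integer counts, so the confidence comes out as exactly 3/4. Dividing the two rounded support fractions could land just below a 0.75 threshold.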
Data clustering is an unsupervised method. Data classification, by contrast, is a supervised method that gives decision makers guidelines for decision analysis or outcome prediction by identifying rules from actual events [4,6,14]. Unlike data association, data classification yields results that are interpretable and meaningful, and also acceptable and understandable to decision makers. Its techniques include decision tree induction, Bayesian classification, classification based on association rules, case-based reasoning, and fuzzy set approaches. The data are normally separated into two parts: training data and test data. Data classification has two major phases, learning and classifying. The learning phase uses the training data to generate classification rules. The classification phase then classifies the test data with the rules generated during learning.
Han and Kamber [9] stated that the major concerns of data classification are predictive accuracy, speed, robustness, scalability, and interpretability. One crucial problem is that the amount of data is usually remarkably large, which makes classification very time consuming. Some research has used heuristic approaches to reduce the processing time. Bayardo [3] proposed a brute-force mining technique that generates classification rules with high confidence. Ali et al. [2] suggested partial classification using association rules to avoid excessive computation. The major benefit of partial classification over traditional classification is that it classifies data more accurately and efficiently. Ali et al. [2] also stated that inefficient classification may stem from too many attributes, missing attribute values, non-uniformly distributed classes, interdependent attributes, and too many training examples. These problems, however, can be alleviated with the data association technique. Liu et al. [17] proposed an approach based on class association rules (CARs) and claimed that it could reduce the overly large number of classification rules. Because CARs can only handle discrete data, continuous data have to be preprocessed and transformed into discrete data first. Li et al. [16] proposed another method that improves on CARs, classification based on multiple class-association rules (CMAR). They showed empirically that the average accuracy rates of C4.5 [21], classification based on associations (CBA), and CMAR were about 83.3%, 84.7%, and 85.2%, respectively. Classification by association therefore appears to enhance classification accuracy.
The attributes of collected data are usually not all crisp. Fuzzy attributes do exist in real-world situations, such as customer satisfaction and customer loyalty. The fuzziness that comes from human cognitive processes can often be modeled with a fuzzy approach, i.e., with fuzzy numbers and membership functions [11,12]. Kruse et al. [15] indicated that fuzzy systems can reason over semantic data by transforming them into mathematical structures. Ishibuchi et al. [13] proposed a classification method that uses fuzzy rules to analyze classification knowledge. They described two basic voting methods for rule-based fuzzy systems: 1) voting by multiple fuzzy if-then rules in a single fuzzy rule-based classification system; and 2) voting by multiple fuzzy rule-based classification systems.
Fuzzy classification runs into one of two problems. Relatively wide fuzzy partitions lead to low accuracy, and relatively narrow fuzzy partitions generate a large number of rules. Neither outcome is ideal for decision makers. Yen [23] proposed three approaches to fuzzy partitioning: grid, scatter, and tree partitions. In this paper, the grid partition is combined with the classification method proposed by Ishibuchi et al. [13], since both produce understandable results and are relatively straightforward to implement.
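The following minimal sketch builds a grid partition. It assumes K evenly spaced triangular fuzzy sets on each attribute, with the attribute normalized to [0, 1], and it combines two attributes with the product operator. The product is the usual choice in Ishibuchi-style classifiers, but the text does not specify the operator, so treat it as an assumption. With K = 4 per attribute, two fuzzy attributes give a 4 × 4 grid of 16 cells.

```python
def triangular_partition(K):
    """K evenly spaced triangular membership functions covering [0, 1].

    The k-th function peaks at k / (K - 1) and falls to zero at the
    neighbouring peaks, so the K functions sum to 1 everywhere in [0, 1].
    """
    if K < 2:
        raise ValueError("a grid partition needs at least 2 fuzzy sets")
    width = 1.0 / (K - 1)

    def make(center):
        # A factory avoids Python's late binding of `center` inside the lambda.
        return lambda x: max(1.0 - abs(x - center) / width, 0.0)

    return [make(k * width) for k in range(K)]


def grid_cell_memberships(x, y, K):
    """Compatibility of the point (x, y) with each of the K x K grid cells,
    combining the two one-dimensional memberships with the product."""
    mfs = triangular_partition(K)
    return {(i, j): mfs[i](x) * mfs[j](y) for i in range(K) for j in range(K)}


cells = grid_cell_memberships(0.31, 0.75, 4)
print(len(cells))                  # 16 cells for a 4 x 4 grid
print(max(cells, key=cells.get))   # index of the best-matching cell
```

Narrowing the partition (a larger K) multiplies the number of cells. This is the trade-off between accuracy and rule count described above.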
3. The Proposed Approach
The data collected for decision making usually have both crisp and fuzzy attributes. To cope with this, we developed an approach that combines association rules and fuzzy rules with the multithreaded technique into an integrated classification system. Suppose that a database $D = \{d_1, d_2, \ldots, d_N\}$ has $N$ transaction records. Each record has $r$ crisp items and $s$ fuzzy items, drawn from $C = \{C_1, C_2, \ldots, C_r\}$ and $F = \{F_1, F_2, \ldots, F_s\}$, respectively. We further assume that the $i$th crisp item $C_i$ has $p_i$ different categories and the $j$th fuzzy item $F_j$ has $q_j$ different categories, i.e., $C_i: (c_{i1}, c_{i2}, \ldots, c_{ip_i})$ and $F_j: (f_{j1}, f_{j2}, \ldots, f_{jq_j})$. The data set is assumed to contain $M$ different classes, i.e., $T: (T_1, T_2, \ldots, T_M)$. If we select $n$ records from $D$ to form the training data set $D_T = \{d_{T1}, d_{T2}, \ldots, d_{Tn}\}$, the training data take the form shown in Table 1, where T_ID denotes the transaction identification.
Figure 1 shows the framework of the two-stage classification process, and Figure 2 shows the details of rule generation.
Generating the association rules involves the following main tasks: partitioning the training data by class, finding 1-ruleitem rules, generating association rules with the multithreaded technique, creating the data for fuzzy classification, and generating fuzzy association rules. We discuss these tasks in detail in the following paragraphs. The training data are partitioned into classes with simple SQL statements, and the results are shown in Table 2.
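The per-class partitioning is a simple group-by. The Python sketch below mirrors what the SQL in Appendix A.1 does, assuming records are dictionaries with a "Class" key. The column names and sample rows, adapted from Table 4, are hypothetical. The final check confirms that every record lands in exactly one class table, so the table sizes add up to the number of training records.

```python
from collections import defaultdict

# Hypothetical training records shaped like Table 4 (column names are assumptions).
TRAINING_DATA = [
    {"Gender": "F", "Income": "25K~30K", "Loyalty": 0.31, "Satisfaction": 0.75, "Class": "A"},
    {"Gender": "F", "Income": "Over 35K", "Loyalty": 0.52, "Satisfaction": 0.84, "Class": "D"},
    {"Gender": "M", "Income": "20K~25K", "Loyalty": 0.23, "Satisfaction": 0.76, "Class": "B"},
    {"Gender": "F", "Income": "25K~30K", "Loyalty": 0.78, "Satisfaction": 0.69, "Class": "A"},
    {"Gender": "M", "Income": "30K~35K", "Loyalty": 0.65, "Satisfaction": 0.54, "Class": "E"},
]


def partition_by_class(records, class_key="Class"):
    """Split the training data into one table per class (cf. Appendix A.1)."""
    tables = defaultdict(list)
    for record in records:
        tables[record[class_key]].append(record)
    return dict(tables)


tables = partition_by_class(TRAINING_DATA)
# Every record lands in exactly one class table, so the sizes add up to n.
assert sum(len(t) for t in tables.values()) == len(TRAINING_DATA)
print({label: len(rows) for label, rows in tables.items()})
```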
Table 1. Training transaction data.
T_ID   C1    C2    ···   Cr    F1    F2    ···   Fs    Class
1      c11   c21   ···   cr1   f11   f21   ···   fs1   T1
2      c12   c22   ···   cr2   f12   f22   ···   fs2   T2
·      ·     ·     ···   ·     ·     ·     ···   ·     ·
·      ·     ·     ···   ·     ·     ·     ···   ·     ·
n      c1n   c2n   ···   crn   f1n   f2n   ···   fsn   TM
Table 2. The data table of Class Ti.
T_ID   C1   C2   ···   Cr   F1   F2   ···   Fs   Class
1      ·    ·    ···   ·    ·    ·    ···   ·    Ti
2      ·    ·    ···   ·    ·    ·    ···   ·    Ti
·      ·    ·    ···   ·    ·    ·    ···   ·    ·
·      ·    ·    ···   ·    ·    ·    ···   ·    ·
ni     ·    ·    ···   ·    ·    ·    ···   ·    Ti
[Figure 1 components: Database; Selecting Training Data; Training Data; Rule Generation (Generate Classification Tables, Generate Crisp Rules, Generate Fuzzy Rules); Rule Base; Selecting Test Data; Test Data; Classification System; Classification Results]
Figure 1. The two-stage process of classification.
[Figure 2 components: Training Data; 1-RuleItem Generation; 1-RuleItem Rules Base; Delete Data; Store Rules; Generate Rule? (Y/N); Un-Processed Item Sets; Multithreaded Processing of k-RuleItem Generation and k-RuleItem Rules Generation; k-RuleItem Rules Base; Stop]
Figure 2. The rule generation process.
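As shown in Table 2, each class has its own table of transaction data, and each table can be used to generate the association rules for that class. Note that every training record belongs to exactly one class table, so that $\sum_{i=1}^{M} n_i = n$, where $n_i$ is the number of records in the table of class $T_i$.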
We first process the crisp rules. After the 1-ruleitem rules have been generated, the association rules can be found with the approach proposed by Han and Kamber [9], revised to fit our problem. The approach has four parts: the main routine, the ruleitem set generation routine, the routine for deleting infrequent ruleitems, and the routine for finding frequent 1-itemrules. Each run of this program handles a single class, $T_i$. To make classification more efficient, the multithreaded technique can run the program in parallel for each of the $M$ classes. Once all $M$ runs have completed, crisp rules are available for each class, if any exist.
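The sketch below runs one rule-generation job per class table concurrently with Python's `concurrent.futures.ThreadPoolExecutor`. It is an analogue of the thread-per-class design in Appendix A.4, not the authors' code. The per-class worker here only counts frequent 1-ruleitems and stands in for the full routine. The attribute names are hypothetical. In CPython, threads overlap only when the work releases the global interpreter lock (for example, while waiting on database queries). For pure-Python counting, `ProcessPoolExecutor` is the usual substitute.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Hypothetical crisp attribute names; the real schema has r crisp items.
CRISP_ATTRS = ("Gender", "Area")


def frequent_1_ruleitems(class_table, min_count):
    """Per-class worker: (attribute, value) pairs occurring at least min_count times."""
    counts = Counter((a, r[a]) for r in class_table for a in CRISP_ATTRS)
    return {item: c for item, c in counts.items() if c >= min_count}


def generate_rules_in_parallel(tables, min_count, max_workers=None):
    """Run the per-class worker for all M class tables concurrently."""
    labels = list(tables)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = pool.map(lambda lbl: frequent_1_ruleitems(tables[lbl], min_count), labels)
        # map() yields results in input order; consuming it waits for every worker.
        return dict(zip(labels, results))


tables = {
    "A": [{"Gender": "F", "Area": "North"}, {"Gender": "F", "Area": "South"}],
    "B": [{"Gender": "M", "Area": "East"}],
}
print(generate_rules_in_parallel(tables, min_count=2))   # {'A': {('Gender', 'F'): 2}, 'B': {}}
```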
Note that the programs for different classes may produce association rules that have the same values for the crisp attributes but assign them to different classes. Such conflicts arise because only the crisp attributes have been processed so far. Conflicting rules have to be refined into rational rules by further examining the fuzzy attributes. We store these conflicting rules in the fuzzy processing rule table (FPRT), where they wait for fuzzy classification.
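A minimal sketch of how such conflicts could be detected follows. It groups the training records by their tuple of crisp values and keeps every tuple that occurs in two or more classes. Flagging a conflict between any two classes is an assumption here; Appendix A.5 expresses the step as a chain of joins over the class tables. The attribute names and records are hypothetical.

```python
from collections import defaultdict


def find_conflicts(records, crisp_attrs, class_key="Class"):
    """Crisp-value combinations that occur in more than one class.

    Returns {crisp_tuple: set_of_classes}; these combinations are the
    candidates that would go into the FPRT for fuzzy processing.
    """
    classes_by_key = defaultdict(set)
    for r in records:
        key = tuple(r[a] for a in crisp_attrs)
        classes_by_key[key].add(r[class_key])
    return {k: v for k, v in classes_by_key.items() if len(v) > 1}


records = [
    {"Gender": "F", "Area": "North", "Class": "A"},
    {"Gender": "F", "Area": "North", "Class": "C"},
    {"Gender": "M", "Area": "East", "Class": "B"},
]
print(find_conflicts(records, ("Gender", "Area")))
```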
To obtain the transaction data corresponding to the fuzzy items, SQL statements join the two tables $D_T$ and FPRT. Table 3 shows an example of the resulting table. In it, a certain number of records (e.g., $n_{l*}$) have identical categories for all their crisp items (e.g., $(c_{1*}, c_{2*}, \ldots, c_{r*})$) but belong to different classes (e.g., $T_{i*}, \ldots, T_{j*}$). In such cases, further processing that considers the fuzzy items is necessary.
We use the approach proposed by Ishibuchi et al. [13] for fuzzy classification. Triangular membership functions represent four linguistic terms of human judgment: small, medium, medium large, and large. A fuzzy rule $R_j$ takes the following form:
$$R_j:\ \text{IF } F_1 \text{ is } f_{1q_1} \text{ and } F_2 \text{ is } f_{2q_2} \text{ and } \cdots \text{ and } F_s \text{ is } f_{sq_s} \text{ THEN Class } T_{M_j} \text{ with } CF = CF_j,$$
where $T_{M_j} \in \{T_1, \ldots, T_M\}$ is the consequent class and $CF_j$ denotes the grade of certainty of rule $R_j$ [13]. The algorithms used in this section are presented in the Appendix.
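The sketch below shows one way to compute the consequent class and $CF_j$ of such a rule. It follows the certainty-grade definition commonly used in Ishibuchi's fuzzy classification work. $\beta_T$ sums the compatibility grades (the product of memberships) of the class-$T$ training patterns with the rule's antecedent. The consequent is the class with the largest $\beta$. Then $CF_j = (\beta_{\max} - \bar{\beta}) / \sum_T \beta_T$, where $\bar{\beta}$ averages the $\beta$ values of the other classes. Treat these details as an assumption rather than a transcription of [13]. The function names are hypothetical.

```python
def triangular(center, width):
    """Triangular membership function peaking at `center`."""
    return lambda x: max(1.0 - abs(x - center) / width, 0.0)


def consequent_and_cf(antecedent_mfs, patterns, classes):
    """Return (consequent class, CF) for one fuzzy rule, or (None, 0.0).

    antecedent_mfs: one membership function per fuzzy attribute F_1..F_s.
    patterns: list of (fuzzy_values, class_label) training pairs.
    classes: all M class labels.
    """
    beta = {c: 0.0 for c in classes}
    for values, label in patterns:
        mu = 1.0
        for mf, v in zip(antecedent_mfs, values):
            mu *= mf(v)                        # product compatibility grade
        beta[label] += mu
    total = sum(beta.values())
    top = max(beta.values())
    winners = [c for c in classes if beta[c] == top]
    if total == 0.0 or len(winners) > 1:       # uncovered or tied: no rule
        return None, 0.0
    winner = winners[0]
    others = [beta[c] for c in classes if c != winner]
    beta_bar = sum(others) / len(others) if others else 0.0
    return winner, (beta[winner] - beta_bar) / total


small, large = triangular(0.0, 1.0), triangular(1.0, 1.0)   # 2 fuzzy sets per attribute
patterns = [((0.0, 0.0), "A"), ((0.5, 0.5), "B")]
print(consequent_and_cf((small, small), patterns, ["A", "B"]))   # ('A', 0.6)
```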
Table 3. A processing table for fuzzy rules.
T_ID   C1    C2    ···   Cr    F1      F2      ···   Fs      Class
1      c1*   c2*   ···   cr*   f11     f21     ···   fs1     Ti*
2      c1*   c2*   ···   cr*   f12     f22     ···   fs2     ·
·      ·     ·     ···   ·     ·       ·       ···   ·       ·
·      ·     ·     ···   ·     ·       ·       ···   ·       ·
nl*    c1*   c2*   ···   cr*   f1nl*   f2nl*   ···   fsnl*   Tj*
4. Numerical Investigation
A computer program simulated a data set of customers' creditworthiness for classification. For simplicity, and without loss of generality, the data have six crisp attributes, two fuzzy attributes, and one class attribute used to classify them. Sample data are shown in Table 4.
The crisp items are gender: {Female, Male}; annual income: {under 15K, 15K~20K, 20K~25K, 25K~30K, 30K~35K, over 35K}; loan: {None, 15K, 20K, 25K, 30K}; saving: {under 15K, 15K~20K, 20K~25K, 25K~30K, 30K~35K, over 35K}; housing: {Yes, No}; and area: {East, West, South, North, Central}. The fuzzy items are customer loyalty and customer satisfaction, both taking values in [0, 1]. We assume that the customers have been preassigned to 21 classes, Class A to Class U, according to their credit records. The crisp items admit 2 × 6 × 5 × 6 × 2 × 5 = 3,600 possible combinations. For each conflict case mentioned in the previous section, the number of possible fuzzy rules equals the number of cells in the fuzzy partition of the two fuzzy items, i.e., 4 × 4 = 16. A total of 33,000 customer records were randomly simulated so that they contain 10 predefined crisp rules. Each predefined rule is supported by either 300 to 500 or 1,000 to 2,000 records. Embedding predefined crisp rules in the data set lets us verify that the proposed scheme can actually find hidden crisp rules. We set the minimal support to 0.2%, 0.4%, and 0.6% to evaluate how it affects the accuracy of identifying the predefined crisp rules, and we set the minimal confidence to 60%. We also used 25%, 50%, and 75% of the data set as training data to study the effect of the training-set size.
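The arithmetic below shows what these percentages mean in record counts. It assumes that the minimal support is taken relative to the whole training set. The text does not say whether it is instead relative to the per-class tables. The constant and function names are hypothetical.

```python
TOTAL_RECORDS = 33_000
TRAIN_PERCENTS = (25, 50, 75)
MIN_SUPPORT_PER_MILLE = (2, 4, 6)      # 0.2%, 0.4%, 0.6% expressed per mille


def training_size(percent, total=TOTAL_RECORDS):
    return total * percent // 100


def min_support_count(n_train, per_mille):
    """Smallest record count that meets the minimal support: ceil(n * s / 1000)."""
    return -(-n_train * per_mille // 1000)   # integer ceiling, no float rounding


for p in TRAIN_PERCENTS:
    n = training_size(p)
    counts = [min_support_count(n, s) for s in MIN_SUPPORT_PER_MILLE]
    print(f"{p}% -> {n} training records, minimum support counts {counts}")
```

The training sizes match the 8,250, 16,500, and 24,750 records reported in Table 5. Under the stated assumption, a rule needs at least [17, 33, 50], [33, 66, 99], and [50, 99, 149] supporting records at the three minimal supports for the three training sets.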
Running the classification process yields two types of rules. The first is the crisp rule, which can be generated from the crisp items alone and needs no further processing. An example is IF “gender = F AND annual income = 25K ~ 30K AND loan = None AND saving = Under 15K AND housing = No AND area = North” THEN “class = A”. The second is the fuzzy rule, which consists of both crisp and fuzzy items. It is needed when the crisp items alone cannot decide the class, so the fuzzy items must also be considered. An example is IF “gender = F AND annual income = 25K ~ 30K AND loan = None AND saving = Under 15K AND housing = No AND area = North” AND “customer loyalty = Small AND customer satisfaction = Large” THEN “class = A” WITH CF = 0.7.
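The minimal sketch below applies the two kinds of rules to a new record. The crisp rule table is consulted first. Only when the record's crisp values fall into a conflict case are its fuzzy rules used, with the single-winner choice (the largest membership × CF) from Ishibuchi-style classifiers. The shortened two-attribute schema, the rule tables, and the function names are hypothetical.

```python
def triangular(center, width):
    return lambda x: max(1.0 - abs(x - center) / width, 0.0)


SMALL, LARGE = triangular(0.0, 1.0), triangular(1.0, 1.0)
CRISP_ATTRS = ("Gender", "Area")                 # hypothetical, shortened schema
FUZZY_ATTRS = ("Loyalty", "Satisfaction")

# Crisp rules: crisp tuple -> class.
CRISP_RULES = {("F", "North"): "A"}
# Fuzzy rules for a conflict case: crisp tuple -> [(membership functions, class, CF)].
FUZZY_RULES = {("M", "East"): [((SMALL, LARGE), "A", 0.7), ((LARGE, LARGE), "B", 0.5)]}


def classify(record):
    key = tuple(record[a] for a in CRISP_ATTRS)
    if key in CRISP_RULES:                       # decided by the crisp items alone
        return CRISP_RULES[key]
    best_label, best_score = None, 0.0
    for mfs, label, cf in FUZZY_RULES.get(key, []):
        mu = 1.0
        for mf, attr in zip(mfs, FUZZY_ATTRS):
            mu *= mf(record[attr])
        if mu * cf > best_score:                 # single winner: max(membership x CF)
            best_label, best_score = label, mu * cf
    return best_label                            # None if no rule covers the record


print(classify({"Gender": "M", "Area": "East", "Loyalty": 0.2, "Satisfaction": 0.9}))   # A
```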
Table 5 shows the classification results for the different minimal supports. The accuracy rate is the percentage of predefined rules that the generated rules correctly identify. Table 5 shows that the proposed approach accurately identifies the predefined crisp rules in most cases: 100% in six of the nine settings and 90% in the other three. However, increasing the minimal support tends to reduce the number of correct rules generated and the accuracy rates. This is expected, since the minimal support is the threshold for rule generation [10].
Table 4. Simulated customer transaction data.
Gender  Annual income  Loan  Saving     Housing  Area   Customer loyalty  Customer satisfaction  Class
F       25K ~ 30K      None  Under 15K  No       North  0.31              0.75                   A
F       Over 35K       20K   25K ~ 30K  Yes      East   0.52              0.84                   D
M       20K ~ 25K      15K   20K ~ 25K  Yes      East   0.23              0.76                   B
F       25K ~ 30K      20K   25K ~ 30K  No       South  0.78              0.69                   A
M       30K ~ 35K      25K   30K ~ 35K  Yes      West   0.65              0.54                   E
···     ···            ···   ···        ···      ···    ···               ···                    ···
Table 5. The results of classification for crisp rules.
Percentage of   Number of       Minimal   Number of         Number of          Accuracy
training data   training data   support   rules generated   predefined rules   rate
25%             8250            0.2%      11                10                 100%
                                0.4%      10                10                 100%
                                0.6%      10                10                 100%
50%             16500           0.2%      12                10                 100%
                                0.4%      10                10                 100%
                                0.6%       9                10                  90%
75%             24750           0.2%      12                10                 100%
                                0.4%      10                10                  90%
                                0.6%       9                10                  90%
Table 6. Accuracy of fuzzy classification.
Number of classes   20                       30                       50
Fuzzy partitions    2 × 2   5 × 5   10 × 10  2 × 2   5 × 5   10 × 10  2 × 2   5 × 5   10 × 10
Training data 25%   87%     60%     48%      76%     41%     28%      56%     43%     24%
Training data 50%   89%     69%     56%      79%     48%     33%      59%     45%     30%
Training data 75%   93%     73%     59%      82%     50%     36%      63%     56%     35%
Fuzzy classification was then performed for the rules that the crisp items alone cannot decide. We preset the threshold of CF (i.e., the grade of certainty) to 25%. Suppose, for example, that after training, a fuzzy rule states that some customer data can be classified as Class A with CF_A = 30%, Class B with CF_B = 25%, Class C with CF_C = 20%, Class D with CF_D = 15%, and Class E with CF_E = 10%. A test record covered by this rule is then counted as correctly classified if its class is A or B, the classes whose CF reaches the 25% threshold. Otherwise (e.g., if its class is C, D, or E), it is counted as incorrectly classified. Table 6 shows the results of fuzzy classification.
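The evaluation rule in this example can be written as a small Python sketch. The CF values are kept as integer percentages so that the comparison with the 25% threshold is exact. The function names are hypothetical.

```python
def accepted_classes(cf_by_class, threshold=25):
    """Classes whose CF (in percent) reaches the preset threshold."""
    return {c for c, cf in cf_by_class.items() if cf >= threshold}


def is_correct(actual_class, cf_by_class, threshold=25):
    return actual_class in accepted_classes(cf_by_class, threshold)


def accuracy(test_cases, threshold=25):
    """test_cases: list of (actual_class, cf_by_class) for covered test records."""
    hits = sum(is_correct(actual, cfs, threshold) for actual, cfs in test_cases)
    return hits / len(test_cases)


cf = {"A": 30, "B": 25, "C": 20, "D": 15, "E": 10}        # the example above, in percent
print(sorted(accepted_classes(cf)))                         # ['A', 'B']
print(accuracy([("A", cf), ("B", cf), ("C", cf), ("E", cf)]))   # 0.5
```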
Table 6 shows that for our simulated data with 50 classes and 10 × 10 fuzzy partitions, the accuracy rates (24% to 35%) are not very encouraging. The reason is that there are too many classes and too many fuzzy partitions of the fuzzy attributes. The more fuzzy partitions there are, the fewer records fall into each partition, and such sparsely populated partitions tend to produce CF values close to 1. In effect, each customer record may produce a rule of its own. The chance of finding a matching record in the test data is then very small, and the accuracy rate is correspondingly low. Improving classification accuracy in such cases requires a large amount of data. We also ran several experiments with fewer classes and fewer fuzzy partitions, and Table 6 includes these results. The accuracy rate increases as both the number of classes and the number of fuzzy partitions decrease, reaching 87% to 93% with 20 classes and 2 × 2 partitions. The percentage of training data also matters: the more data used to generate classification rules, the more statistically reliable the rules are.
5. Conclusion
This paper deals with hybrid systems that contain both crisp and fuzzy attributes. Association rules over the crisp items are used first to generate the crisp rules, and the multithreaded technique is employed at this stage. When crisp rules cannot be generated for some data, further processing with the fuzzy items is required to classify them. This combined approach can make classification more efficient and effective. However, problems remain with a large number of classes and a large number of categories for the fuzzy items. These are important issues for further investigation.
REFERENCES
[1] R. Agrawal, T. Imielinski and A. Swami, “Mining Association Rules between Sets of Items in Large Databases,” Proceedings of the ACM SIGMOD Conference on Management of Data, Washington DC, 26-28 May 1993, pp. 207-216.
[2] K. Ali, S. Manganaris and R. Srikant, “Partial Classification Using Association Rules,” Proceedings of the 3rd International Conference on Knowledge Discovery and Data Mining (KDD-97), Newport Beach, 14-17 August 1997, pp. 115-118.
[3] R. J. Bayardo, “Brute-Force Mining of High Confidence Classification Rules,” Proceedings of the 3rd International Conference on Knowledge Discovery and Data Mining (KDD-97), Newport Beach, 14-17 August 1997, pp. 123-126.
[4] M. S. Chen, J. Han and P. S. Yu, “Data Mining: An Overview from a Database Perspective,” IEEE Transactions on Knowledge and Data Engineering, Vol. 8, No. 6, 1996, pp. 866-883. doi:10.1109/69.553155
[5] F. Coenen, P. Leng and L. Zhang, “Threshold Tuning for Improved Classification Association Rule Mining,” Advances in Knowledge Discovery and Data Mining, Vol. 3518, 2005, pp. 303-305. doi:10.1007/11430919_27
[6] A. Evfimievski, R. Srikant, R. Agrawal and J. Gehrke, “Privacy Preserving Mining of Association Rules,” Information Systems, Vol. 29, No. 4, 2004, pp. 343-364. doi:10.1016/j.is.2003.09.001
[7] U. M. Fayyad, G. Piatetsky-Shapiro and P. Smyth, “From Data Mining to Knowledge Discovery: An Overview,” In: U. M. Fayyad, G. Piatetsky-Shapiro, P. Smyth and R. Uthurusamy, Eds., Advances in Knowledge Discovery and Data Mining, AAAI Press, Menlo Park, 1996, pp. 1-34.
[8] A. A. Freitas, “A Survey of Evolutionary Algorithms for Data Mining and Knowledge Discovery,” Advances in Evolutionary Computing, Natural Computing Series, Part II, Springer-Verlag, New York, 2003, pp. 819-845.
[9] J. Han and M. Kamber, “Data Mining: Concepts and Techniques,” Morgan Kaufmann, San Francisco, 2001.
[10] J. Han, J. Pei and Y. Yin, “Mining Frequent Patterns without Candidate Generation,” Computing Science Technical Report, TR-99-12, Simon Fraser University, Burnaby, 1999.
[11] Y.-C. Hu, R.-S. Chen and G.-H. Tzeng, “Finding Fuzzy Classification Rules Using Data Mining Techniques,” Pattern Recognition Letters, Vol. 24, No. 3, 2003, pp. 509-519. doi:10.1016/S0167-8655(02)00273-8
[12] H. Ishibuchi and T. Yamamoto, “Fuzzy Rule Selection by Multi-Objective Genetic Local Search Algorithms and Rule Evaluation Measures in Data Mining,” Fuzzy Sets and Systems, Vol. 141, No. 1, 2004, pp. 59-88. doi:10.1016/S0165-0114(03)00114-3
[13] H. Ishibuchi, T. Nakashima and T. Morisawa, “Voting in Fuzzy Rule-Based Systems for Pattern Classification Problems,” Fuzzy Sets and Systems, Vol. 103, No. 2, 1999, pp. 223-238. doi:10.1016/S0165-0114(98)00223-1
[14] M. Kantarcioglu and C. Clifton, “Privacy-Preserving Distributed Mining of Association Rules on Horizontally Partitioned Data,” IEEE Transactions on Knowledge and Data Engineering, Vol. 16, No. 9, 2004, pp. 1026-1037. doi:10.1109/TKDE.2004.45
[15] R. Kruse, J. Gebhardt and F. Klawonn, “Foundations of Fuzzy Data,” John Wiley & Sons, Chichester, 1994.
[16] W. Li, J. Han and J. Pei, “CMAR: Accurate and Efficient Classification Based on Multiple Class-Association Rules,” Proceedings of the 2001 International Conference on Data Mining (ICDM’01), San Jose, 2001.
[17] B. Liu, W. Hsu and Y. M. Ma, “Integrating Classification and Association Rule Mining,” Proceedings of the 4th International Conference on Knowledge Discovery and Data Mining (KDD-98), New York, 27-31 August 1998, pp. 80-86.
[18] H. Mannila, H. Toivonen and I. Verkamo, “Efficient Algorithms for Discovering Association Rules,” AAAI Workshop on Knowledge Discovery in Databases, Seattle, 1994.
[19] J. S. Park, M. Chen and P. S. Yu, “An Effective Hash Based Algorithm for Mining Association Rules,” Proceedings of the ACM SIGMOD International Conference on Management of Data, New York, 1995, pp. 175-186.
[20] G. Piatetsky-Shapiro and W. J. Frawley, “Knowledge Discovery in Databases: An Overview,” MIT Press, Cambridge, 1991.
[21] J. R. Quinlan, “C4.5: Programs for Machine Learning,” Morgan Kaufmann, California, 1993.
[22] F. Thabtah, “A Review of Associative Classification Mining,” The Knowledge Engineering Review, Vol. 22, No. 1, 2007, pp. 37-65. doi:10.1017/S0269888907001026
[23] J. Yen, “Fuzzy Logic: A Modern Perspective,” IEEE Transactions on Knowledge and Data Engineering, Vol. 11, No. 1, 1999, pp. 153-165. doi:10.1109/69.755624
Appendix
A.1 SQL Statements for Partitioning the Training Data
INSERT INTO Table_i
SELECT * FROM DT
WHERE Class = 'Ti';
A.2 The Generation of 1-RuleItem Rules
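for i = 1 to r
{ m = number of categories of C_i (i.e., p_i);
  for j = 1 to m
  { Cij.Count = number of records with C_i = c_ij;
    Cij.Class.Count = number of those records that belong to the class;
    if Cij.Count = Cij.Class.Count then
    { Rule_ij = <C_i = c_ij, Class>;
      delete from DT the records with C_i = c_ij; } } }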
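For readers who want to run the 1-ruleitem step, here is a Python sketch of A.2 under stated assumptions. Records are dictionaries. A value C_i = c_ij yields a rule when every remaining training record with that value has the same class, and the covered records are then deleted. Like A.2 as written, it applies no minimum-support check, so a practical version would probably add one. The function and attribute names are hypothetical.

```python
def one_ruleitem_rules(records, crisp_attrs, class_key="Class"):
    """Sketch of A.2: emit <C_i = c_ij, class> when all remaining records with
    that value share one class, then delete those records."""
    rules = []
    remaining = list(records)
    for attr in crisp_attrs:
        for value in sorted({r[attr] for r in remaining}):
            covered = [r for r in remaining if r[attr] == value]
            classes = {r[class_key] for r in covered}
            if len(classes) == 1:                     # Cij.Count == Cij.Class.Count
                rules.append(((attr, value), classes.pop()))
                remaining = [r for r in remaining if r[attr] != value]
    return rules, remaining


records = [
    {"Gender": "F", "Area": "North", "Class": "A"},
    {"Gender": "F", "Area": "East", "Class": "B"},
    {"Gender": "M", "Area": "North", "Class": "A"},
]
rules, remaining = one_ruleitem_rules(records, ("Gender", "Area"))
print(rules)
```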
A.3 The Generation of k-Ruleitem Rules
Generate_itemrules_of_ClassTi()
{ L1 = find_frequent_1-itemrules(DT);
  for (k = 2; L(k-1) ≠ ∅; k++)
  { Ck = apriori_gen(L(k-1), minsup);
    for each transaction t ∈ DT
    { Ct = subset(Ck, t);               // candidates contained in t
      for each candidate c ∈ Ct
        c.count++; }
    Lk = {c ∈ Ck | c.count ≥ minsup}; }
  return L = ∪k Lk; }

apriori_gen(L(k-1), minsup)
{ for each itemset l1 ∈ L(k-1)
    for each itemset l2 ∈ L(k-1)
      if (l1[1] = l2[1] ∧ l1[2] = l2[2] ∧ ... ∧ l1[k-2] = l2[k-2] ∧ l1[k-1] < l2[k-1]) then
      { c = l1 ⋈ l2;                     // join step
        if has_infrequent_subset(c, L(k-1)) then delete c;   // prune step
        else add c to Ck; }
  return Ck; }

has_infrequent_subset(c, L(k-1))
{ for each (k-1)-subset s of c
    if s ∉ L(k-1) then return TRUE;
  return FALSE; }

find_frequent_1-itemrules(DT)
{ for i = 1 to r
    for j = 1 to p_i
      if support(C_i.c_ij) ≥ minsup and C_i.c_ij ∉ L1 then add [C_i.c_ij] to L1;
  return L1; }
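A runnable Python rendering of the A.3 routines follows, offered as a sketch. Itemsets are sorted tuples of items (for example, strings such as "Gender=F"), and min_count is an absolute support count. The subset-and-count step uses a plain scan rather than a hash tree. The 0-based slice l1[:k-2] corresponds to the 1-based conditions l1[1] = l2[1] ∧ ... ∧ l1[k-2] = l2[k-2] in apriori_gen. The data below are hypothetical.

```python
from itertools import combinations


def apriori_gen(prev_level, k):
    """Join + prune step: build k-itemset candidates from sorted (k-1)-itemsets."""
    prev = set(prev_level)
    candidates = []
    for i, l1 in enumerate(prev_level):
        for l2 in prev_level[i + 1:]:
            # same first k-2 items, and l1's last item precedes l2's last item
            if l1[:k - 2] == l2[:k - 2] and l1[k - 2] < l2[k - 2]:
                c = l1 + (l2[k - 2],)
                # prune: every (k-1)-subset must itself be frequent
                if all(s in prev for s in combinations(c, k - 1)):
                    candidates.append(c)
    return candidates


def apriori(transactions, min_count):
    """Return {itemset_tuple: count} for all itemsets with count >= min_count."""
    transactions = [frozenset(t) for t in transactions]
    counts = {}
    for t in transactions:
        for item in t:
            counts[(item,)] = counts.get((item,), 0) + 1
    level = {s: n for s, n in counts.items() if n >= min_count}       # L1
    frequent = dict(level)
    k = 2
    while level:                                                       # L(k-1) not empty
        candidates = apriori_gen(sorted(level), k)                     # Ck
        counts = dict.fromkeys(candidates, 0)
        for t in transactions:
            for c in candidates:
                if t.issuperset(c):
                    counts[c] += 1                                     # c.count++
        level = {c: n for c, n in counts.items() if n >= min_count}   # Lk
        frequent.update(level)
        k += 1
    return frequent


baskets = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
print(sorted(apriori(baskets, min_count=3)))
```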
A.4 The Algorithm of Multithreaded Computing for M Classes
using System.Threading;

void Multithread()
{
    ThreadStart generate1 = new ThreadStart(GenerateRule1);
    ThreadStart generate2 = new ThreadStart(GenerateRule2);
    ···
    ThreadStart generateM = new ThreadStart(GenerateRuleM);
    Thread g1 = new Thread(generate1);
    Thread g2 = new Thread(generate2);
    ···
    Thread gM = new Thread(generateM);
    g1.Start();
    g2.Start();
    ···
    gM.Start();
    // wait until rule generation has finished for every class
    g1.Join();
    g2.Join();
    ···
    gM.Join();
}
A.5 The Generation of FPRT
INSERT INTO FPRT
SELECT Table_1.*
FROM Table_1
  INNER JOIN Table_2 ON (Table_1.C1 = Table_2.C1 AND ... AND Table_1.Cr = Table_2.Cr)
  INNER JOIN Table_3 ON (Table_2.C1 = Table_3.C1 AND ... AND Table_2.Cr = Table_3.Cr)
  ...
  INNER JOIN Table_M ON (Table_M-1.C1 = Table_M.C1 AND ... AND Table_M-1.Cr = Table_M.Cr);
A.6 SQL Statements to Join Two Data Tables
INSERT INTO FUZZY_ITEMSET_TABLE
SELECT DT.C1, ..., DT.Cr, DT.F1, ..., DT.Fs, DT.Class
FROM DT INNER JOIN FPRT
  ON (DT.C1 = FPRT.C1 AND ... AND DT.Cr = FPRT.Cr);
A.7 SQL Statements for Finding Fuzzy RuleItems and Corresponding Classes
SELECT [F1], [F2], ..., [Fs], [Class]
FROM FUZZY_ITEMSET_TABLE;

SELECT [F1], [F2], ..., [Fs], [Class]
FROM FUZZY_ITEMSET_TABLE
WHERE F1 = f1q1 AND F2 = f2q2 AND ... AND Fs = fsqs;