Journal of Software Engineering and Applications, 2013, 6, 30-33
doi:10.4236/jsea.2013.67B006 Published Online July 2013 (http://www.scirp.org/journal/jsea)
The Application of Book Intelligent Recommendation
Based on the Association Rule Mining of Clementine
Jia Lina, Mao Zhiyong
Graduated Scho o l , L i ao ning Technica l University, Hu Ludao, China.
Email: jelena1988@sina.cn
Received May, 2013
ABSTRACT
The traditional library can’t provide the service of personalized recommendation for users. This paper used Clementine
to solve this problem. Firstly, model of K-means clustering analyze the initial data to delete the redundant data. It can
avoid scanning the database repeatedly and producing a large number of false rules. Secondly, the paper used clustering
results to perform association rule mining. It can obtain valuable information and achieve the service of intelligent
recommendation.
Keywords: Data Mining; Association Rules; Clustering; Intelligent Recommenda tion; Clementine
1. Introduction
The recommended service plays an important role in the
process of the digital library gradually toward personal-
ization and intelligent. Th e syste m can recommend book s
to the readers by the relev ant information which is found
from the readers’ lending behavior and preferences from
data mining. Relevance information mining is association
rules mining[1]. This question has been paid attention
and studied by many international researchers after it has
been put forward by Rakesh Agrawal and researchers
also raise many kinds of algorithms.
Association rules are put fo rward to break the transac-
tion limit. To find the relationship between different
transactions so that to predict events that users interest
reasonably. It will be a long time to do the data mining
and the rules will be a lot with false rules when transac-
tion analysis is carried out on the larg e database. And the
mining efficiency is reduced. Based on it, this paper uses
the data mining software Clementine to clustering analy-
sis on the reader firstly, and cluster the behavior of bor-
rowing book s for high frequency, mediu m frequency and
low frequency[2]. To do the association rule mining to
the books which is borrowed by readers who borrow by
high frequency and medium frequency? Finally, transfer
the mining result to the client user by Web service.
Choose the books borrowed by users which are the high
frequency and medium frequency to have the association
rule mining is because the amount of borrowed books is
huge and the association rule is strong. So it narrows the
amount of data involved in association rule, save scan-
ning time, and then to improve the quality of mining.
2. Clementine Software Introduction
Clementine is data mining software developed by SPSS
company. It puts clustering, association rules, decision
trees, neural network and many kinds of data mining
technology to integrate in the intuitive visual graphic
interface. Clementine combine with business technology
to build the data model quickly to apply it to business
activity and help people to improve the decision making
process. The paper applies clustering and association
rules mining in Clementine 12.0 to book intelligent rec-
ommendation service[3].
2.1. Characteristics of Clementine
1) It provides that visual, strong and easy-to-use data
mining platform. The process of user modeling is to
connect each no de. It can be built the data min ing model
without programming so that user can be more focused
on the solving specific business problems by using data
mining rather than the use of tools.
2) Fully follow the CRISP-DM standards to establish.
Clementine provides good project management function.
And it can manage overall process effectively from
business understanding to result release.
3) It provides steady and strong release function.
Clementine can release data mining model or the whole
flow of data mining to improve efficiency of operations.
4) High flexibility and extensibility. Clementine has
open database interface. It provides almost all the rela-
Copyright © 2013 SciRes. JSEA
The Application of Book Intelligent Recommendation Based on the Association Rule Mining of Clementine 31
tionship database. Meanwhile, it owns extended function.
2.2. Six Stages of CRISP-DM Process Model
1) Business understanding. It is the most important
stage in data mining. It includes that confirm business
object, estimate situation, confirm target of data mining
and set out engineering plan.
2) Data understanding. It provides materials of data
mining to realize data characteristics of data source. It
includes that collect initial data, describe data, clean data,
and check the quality of data.
3) Data preparation. Classify the data source from data
mining. It includes that data selection, cleaning, structure,
integration and formattin g.
4) Modeling. It is the core part of data mining. It in-
cludes that choose modeling technology, generate test
design and structure and evaluation model.
5) Model evaluation. It can evaluate result of data
mining that can help to realize business target after
choosing the model. It includes that result, view the pro-
cess of data mining and confirm the next step.
6) Result deploys. It can combine the new knowledge
with daily business flow to solve initial business prob-
lems. It includes that plan deploy, monitoring, maintain,
produce final report and review the project[4].
3. Library Data Mining Based Clementine
The information requests and forms of users in library
are diversified. It provides personalized recommendation
service based on the requests and interests of readers.
The paper clusters analysis to the times of readers. It can
be divided into three types: high frequency, medium fre-
quency and low frequency. And then association rules
analysis to the books which are borrowed by high and
medium frequency readers to realize personalized rec-
ommendation service[10].
3.1. Data Acquisition
The data in this paper is from lib rary in Liao Ning Tech-
nical University. The total amount of reader borrowing
books is 62261 from Nov 7th, 2011 to Mar 7th, 2012.
And extract 3108 from it to serve as the experimental
subject.
3.2. Data Pre-Processing
The paper gets to the Excel table to import SQL Server
2000 database to do the data pre-processing. The data
pre-processing mainly reprocess data in previous stage to
check the integrity of data and consistency of data. It
includes noise immunization, deduce to calculate missing
data, remove duplicate record and complete data type
transfer. In preprocessing stage, delete “dirty data” which
is redundancy vacancy data, not completing, noise in-
formation. It establishes the foundation for data mining
in next step and improves the digging efficiency and dig-
ging quality[7].
3.3. Modeling Based on Clementine
3.3.1. Clus ter Mo del i n g
Input the data which is collected after preprocessing into
cluster modeling in SPSS Clementine to cluster modeling
analysis. The paper uses K-means algorithm to cluster
modeling for the reader’s borrowing behavior.
K-means[15] algorithm is a process of iterating to calcu-
late “centroid” and being based on the distance between
sample and centroid to appoint every sample to cluster.
The following is the process[5].
1) Make sure initial centroid. Select the first sample as
the first centroid. And calculate th e distance and Squared
Euclidean distance between it and centroid for every
sample. Define centroid vector and a
sample vector

12
,Q
Ccc c
12
,Q
X
xx x
q, Q is the amount of prop-
erties in data set.
x
is the first q attribute values,
1, 2,,qQ
. So the following is computational formula
of Euclidean distance between sample and centroid:

2
1
Q
qq
q
dxc

After the initial K centroids are gen-
erated, the algorithm begins to iterate and appoint[14].
Select the biggest sample of Euclidean distance to be as
another centroid. And repeat it till K centroids are all
identified.
2) Appoint sample. During every iteration, each of the
samples is appointed to the cluster which is nearest to
itself. The distance is defined by the square of the Eu-
clidean distance so the distance between sample I and
centroid j:
2
2
1
Q
ijijqi qj
q
dXC xc
 
i
X
is vec-
tor which is constituent by attribute values of sample i,
C is centroid vector of cluster j,Q is the amount of
property,
qj
qi
x
is the number q property value of number
i sample, c is the number of q property value of the
centroid in cluster j. Begin to update every centroid of
cluster after all the records are all appointed.
qj
3) Update centroid. Some samples in one cluster may
be transferred into other clusters in the process of ap-
pointing samples. So it needs to recount centroid of every
cluster. Establish mis the sample amount of number j
cluster after appointing sample. So the vector of recount
the centroid of cluster is:
j

12
, ,...,,
jjjQj
Xxx x num-
ber
1,2,...,qq Q in vector and component qi
x
is:

1
j
m
qi
i
qj j
x
j
xm
,
qi
j is the number q property value
Copyright © 2013 SciRes. JSEA
The Application of Book Intelligent Recommendation Based on the Association Rule Mining of Clementine
32
in sample i of cluster j[11].
4) Stopping criterion. Firstly, “the max iterations”
controls that the algorighm search stable cluster. The
algorighm will repeat “appoint sample-update centroid”
until “the max iterations”[13]. It will generate final mod-
el after it reaches the limitation and the algorighm will
stop to update cluster. And “Tolerance of differences”
provides another way to control algorighm to be stopped.
Calculate distance in centroid space after every iteration
finish. Such as, iteration after t times finish, the distance
in centroid space in number j cluster is:
 
1
jj
Ct Ct, is centroid vector of number
j cluster of iteration in t times, is the centroid
vector of number j cluster when the last iteration. So
there are k results that produced by k clusters. Select the
max in it:

j
Ct
 
1
j
Ct
max 1Ct t
jj
J, if the max is less than C
Tolerance of differences which is predefined. So the al-
gorighm will stop. If not, it will go on.
Through these steps, the following Figure 1 is view of
cluster model.
The result shows that it divides it into three classifies:
high frequency (cluster 2), medium frequency (cluster 3),
and low frequency (cluster 1). Extract the high and me-
dium users because their borrowing amount is huge and
the association rules in the books are strong. The cluster1
is regarded as noisy data to delete so that the association
rules are more typical.
3.3.2. Assoc i a tion Rules Min i ng
Regard the clustering analysis as the pretreatment part of
association rules mining. It can find association rules
efficiently and avoid generating the false rules[6]. It can
Figure 1. Model view.
make data more illustrative, pertinency, veracity. Extract
reader data in Cluster 3 and Cluster 2 are totally 764.
Query the 764 students’ borrowing information from
database to save as data sheet. Use Apriori note in
Clementine to do association rules mining. The process
is:
1) Generate frequent item set. Based on
1k
frequent item sets to make up gather , and generate
all candidate k-item-set , and prune , and calcu-
1k
LC
k
Ck
lated support in every item-set w : support =
k
Ci
N
N,
i is the amount of transaction of including item-set w.
N is amount of all the transaction. Put item set of support
into item-set k in frequently k-. Find the
frequently k- item-set and k is less than max which is
predefined by user. Repeat above steps and search the
frequently item-set
N
min_ supL,k
1k
.
2) After getting all the frequently item-set L, the al-
gorighm will generate association rules based on fre-
quently item-set. Firstly, generate l’s all nonvoid subset
based on frequently item-set l of L. Secondly, for very
nonvoid subset A, if it content valuation criterion
(
  
sup min_,sup and sup are
sup
port lconfport lportA
port A
item-set l and A ‘s support), and then the output role is
A
A”,and -
A
lA[12].
So the association rules is Figure 2, Figure 3
The call number of library is Chinese Library Classi-
fication. From picture Figure 2, the reader who borrows
B83-09/13(historical pedigree and theoretic finality) also
want to borrow B83/20 = 3(aesthetics introduction. re-
vised edition), it can be the reason for reader recommend.
From Figure 3, it can be clearly shown the association
rules among books. And the association rules with thick
line are stronger th an fine line.
Figure 2. Model view.
Copyright © 2013 SciRes. JSEA
The Application of Book Intelligent Recommendation Based on the Association Rule Mining of Clementine
Copyright © 2013 SciRes. JSEA
33
4. Realize Intelligent Recommendation recommending service by digital library development in
the direction of intelligence[9]. The paper views the
cluster as the data pre-processing of association rules
mining to make the rules more accurate. The paper
shows that the subject is effective and viable.
By the data mining process, transfer the association rules
to readers through agent. When there is a request from
client to Web server, transfer the request to the reader
recommended agent to match. And transfer the matching
recommended rules to Web server. Finally, transfer it to
the user in client[8]. This can give readers more selec-
tions, and improve the use ratio of books. Figure 4 is
mode pattern of books intelligent recommendation.
REFERENCES
[1] C. G. Yuan, “Data Mining Theory and SPSS Clementine
Application,” Beijing: Electronic industry publishing,
2009, pp.547-578
5. Conclusions [2] F. Y. You,“Data Mining and digital library application
Office automation magazine,2007, pp.5152
It is important to provide flexible and targeted books [3] C. H. Bao, “Data warehouse and Data Mining,” Beijing:
Tsinghua University Press, 2006.
[4] J. Han, M. Kamber, “Data Mining and Technology” M.
Fan, Translator, Beijing: China Machine Press, 2001,
pp.10-33.
[5] H. Y. Chen, “Based on Weighting Association Rules and
Browse Behavioral Personality,” Chongqing University,
2005.
[6] W. Wang, “Reader Behavior Analysis Based on Data
Mining,” Modern Library and information technology,
2006, pp.51-54.
[7] H. Y. Cai, “The Application in University Library System
for Data Mining about Association Rules,” NUT College
Journal, 2005, pp.85-88.
[8] W. H. Li, “Personality Information Recommend System
in Digital Library,” 2007, pp.109-110.
[9] W. W. Chen, “Data Mining Research about Reader Be-
havior,” Chongqing Southwest University, 2007.
Figure 3. Association rules webs.
[10] B. C. Xie, “Data Mining Clementine Application,” Bei-
jing: THU press, 2008, pp.213-215.
History acc ess
data
Data
Preparation Ass o ciation ru le
mining
Readers r ecom m en d Agent
User current ly access dat a
WebServer Client
Recommended
rules after
matching
[11] J. Bao, S. W. Fan, “The Data Pre-processing for Data
Mining,” Library and Information Science, Vol. 26, No. 2,
2008, pp. 31-33.
[12] Z. G. Li, G. Ma, “DW and DM Application,” Beijing:
Higher Education Press, 2008, pp.150-170.
[13] Q. H. Xiao, “Data Mining Apply in Information Server,”
Library forum, Vol. 24, No. 1, 2004, pp.140-142.
[14] B. H. Wang, “Data Mining and Application,” Statistics
and decision, 2006, pp.122- 123.
[15] X. Li, C. H. Yang, “K-means Cluster Application,” Li-
brary and information Science, Vol. 25, No. 2, 2009,
pp.15-17
Figure 4. Mode pattren of intelligent recommendation.