A Personalized Recommendation Algorithm Based on Associative Sets

doi:10.4236/jssm.2009.24048

Paper Menu >>

Journal Menu >>

J. Service Science & Management, 2009, 2: 400-403

doi:10.4236/jssm.2009.24048 Published Online December 2009 (www.SciRP.org/journal/jssm)

A Personalized Recommendation Algorithm Based

on Associative Sets

Guorui JIANG, Hai QING, Tiyun HUANG

School of Economics and Management, Beijing University of Technology, Beijing, China.

Email: jianggr@bjut.edu.cn

Received August 11, 2009; revised September 13, 2009; accepted October 24, 2009.

ABSTRACT

During the process of personalized recommendation, some items evaluated by users are performed by accident, in other

words, they have little correlation with users’ real preferences. These irrelevant items are equa l to noise data, and often

interfere with the effectiveness of collaborative filtering. A personalized recommendation algorithm based on Associa-

tive Sets is proposed in this paper to solve this problem. It uses frequent item sets to filter out noise data, and makes

recommendations according to users’ real preferences, so as to enhance the accuracy of recommending results. Test

results have proved the superiority of this algorithm.

Keywords: Frequent Itemsets, Associative Sets, Collaborative Filtering, Recommendation Technology

1. Introduction

How to help users quickly and effectively access to the

information they really need when facing abundant re-

sources becomes a challenging task and also a hot topic

of current academic study. Personalized recommenda-

tion system is one of the effective tools to solve this

problem. A helpful method is to develop intelligent

recommendation system to provide personalized service

[1], that is to recommend products to users according to

their preferences or demands, so as to help them finish

th e p ur c h asing pro cess.

Nearest neighbor collaborative filtering approach is a

recommendation technique that is the most widely used

right now [2]. Its basic idea is to generate recommenda-

tions for target users according to the rating data of near-

est neighbors that have given similar ratings. As items’

(movies, music, etc.) ratings g iven by the nearest nei g hbo rs

are quite similar to those given by target users, items’

ratings given by target users can be estimated by the w e ight -

ed average of the ratings given by the nearest neighbors.

The advantage of collaborative filtering approach is that

it can adapt to the rapid updating of users’ information. It

caculates the tightness among users according to the lat-

est data every time, so as to make recommendations. H ow -

ever, the consequent disadvantage is that it is quite slow

to get K nearest neighbors within large amounts of data.

Meanwhile, results would not be satisfactory when sparse

data is dealt with, especially for new products and new users.

At the same time, its scalability is not very good [3].

On the basis of traditional collaborative filtering algo-

rithm, our paper proposes a personalized recommendation

algorithm based on Associative Sets. This algorithm first

supposes user rating matrix as transaction sets, while ev ery

transaction is a user’s rating set. Then it generates freq ue nt

itemsets through frequent itemsets generation algorithm,

puts frequent itemsets into a series of Associative Sets

according to one user’s rating record, and performs col-

laborative filtering among Associative Sets so as to im-

prove the accuracy and scalability of the algorithm.

2. Traditional Collaborative Filtering

Algorithm and Its Analysis

2.1 Traditional Collaborative Filtering

Algorithm

Collaborative filtering algorithm is the most widely used

approach in personalized recommendations, which can

forecast target users’ interests and preferences according

to neighbor users’ interests and preferences. It first finds

neighbors that have the same preferences with target us-

ers under the help of statistical techniques, and then

makes recommendations to target users according to their

neighbors’ preferences. It includes three stages [4]:

1) Representation

Inputting dat a can usual l y be expressed as an m×n user

rating matrix, where m represents the number of users, n

represents the number of items, and Rij represents the

rating given by user i to item j. Such ratings can have

several scales just as Table 1.

A Personalized Recommendation Algorithm Based on Associative Sets401

Table 1. User/item rating matrix

Item

User

I1 I2 … Ij … In

U1 R11 R12 … R1j … R1n

U2 R21 R22 … R2j … R2n

… … … … … … …

Ui Ri1 Ri2 … Rij … Rin

… … … … … … …

Um Rm1 Rm2 … Rmj … Rmn

2) Neighbor Generation

For user u, generate a “nearest neighbor” set according

to the level of similarity between neighbors. The calcula-

tion of similarity values between neighbors can be per-

formed through vector space similarity calculation me thods

that are widely used currently, such as cosine method,

pearson similarity method and so on. There are two ways

to determine neighbors, one is to determine the similarity

threshold through cosine method first, and then select

users whose similarity values are greater than the simi-

larity threshold as neighbor users; the other is to deter-

mine the number of neighbor users N first, and then se-

lect the first N users whose similarity values are greater

as neighbor users.

3) Recommendation

As “nearest neighbor” set is generated, we can forecast

one certain user’s rating for each item, and then make

recommendations to that user according to the level of

forecasting ratings.

2.2 Problem Analysis

Traditional collaborati ve filtering al gorithm considers users’

entire historical information as its preference in formation

and uses such information to find its nearest neighbors.

However, users’ preferences are often formed exploringly

and progressively in reality. It is a historical progress,

and during this progress, users often try many times and

become stable gradually, so as to form their real interests.

Even though users’ interests have already been formed,

they would sometimes try other items in daily search

process for various reasons. Such items cannot be seen as

their interests and the supporting evidences for recom-

mendations. Therefore, if we want to gain real preference

information of users, we must filter out the occasional

search information to reduce interference. Find nearest

neighbors according to users’ real preference information,

and then make recommendations, while the results of

recommendations can become better.

3. A Personalized Recommendation

Algorithm Based on Associative Sets

As we have analyzed above, we can gain associative

items through frequent itemsets. These associative items

constitute the foundations of different interests. As for

current users, we use their entire information to filter

associative items, and then merge the associative items

after filtering to form their interest sets.

3.1 Algorithm Descriptions

1) Set all items as 123

{,, ,..., }

III I



, n as the number

of items. See every user’s rating record as one item of

transaction, , which represents the rating set of

user i, wherein

TI

{1,2,3,..., }im



, m represents the num-

ber of users. So, user rating matrix can be seen as trans-

action set 123

, ,...,}

TT T{ ,TT



2) Use Apriori algorithm to generate the frequent item-

sets F of transaction set T, whose support level is support.

 In the first iteration of the algorithm, each item of I

is a member of the set of candidate 1-itemsets. The algo-

rithm simply scans all of the transactions T in order to

count the number of occurrences of each item.

 Select the candidate 1-itemsets, which satisfies

minimum support support, to consist the set of frequent

1-itemsets 1

 Use 11



to generate a candidate set of 2-itemsets,

and prune using apriori property---A ll nonempty su b s e ts o f

a frequent itemset must be frequent. Then, scan all of the

transactions T in order to count the number of occur-

rences of each item in candidate set of 2-itemsets.

 Select the candidate 2-itemsets, which satisfies min-

imum support support, to consist the set of frequent

2-itemsets 2

 Constantly use 11kk





to generate a candidate

set of k-itemsets, and prune it. Then, scan all of the

transactions T in order to count the occurrence of each

A Personalized Recommendation Algorithm Based on Associative Sets

402

item in candidate set of k-itemsets. Select the candidate

k-itemsets, which satisfies minimum support support, to

consist the set of frequent k-itemsets k

 If the candidate set of k-itemsets is null, all frequent

itemsets are gained.

3) Gain rating items *

of current user a, and merger

the frequent itemsets, which contains some items of *

and also the number it contains is more than parameter

num in F, as associative sets C.

4) Use Pearson correlation coefficient algorithm to cal-

culate the similarity between user a and any other user b

in associative sets C.

()(



)

(,)[( )][( )]

aj a bb

aj ab

jC jC

RR R

wab RR R











(1)

where j is the item in associative sets C, is the rat-

ing given by user a to item j, is the rating given by

user b to item j,

R and C

R are average ratings of

user a and user b separately.

5) For a, arrange all the users according to the value of

, and select the first M users that have greater

values as neighbor users of user a.

(,)

wab

6) Forecast the rating of u ser a to item j. The forecast-

ing formul a is:

Neighbor

)



(,)()

b Neighbor

aj aC

b Neighbor

wab R

PR wa







 (2)

where is the forecasting rating of user a to item j,

is the rating of user b to item j,

,aj

,bj

R and b

R are

average ratings of user a and user b to all items.

7) Arrange items according to the value of , and

select the first N items that have greater as recom-

mendatory items.

,aj

3.2 Algorithm Explanations

1) In addition to the original rating records, there are four

other parameters in this algorithm: support for the calcula-

tion of frequent itemsets support, threshold for the selec-

tion of associative sets from frequent itemsets num, num-

ber of nearest neighbors M and number of items that are

recommended to users N. Wherein, support and num are

used to determine Associative Sets, but what is the right

combination needs to be tested. Usually, different data sets

have different proper combination of support and num.

Therefore, it will take more time to learn this algorithm.

2) Step 1 and step 2 in algorithm description are mai-

nly used to generate frequent itemsets, which will take

much more time. However, as it is performed offline,

instant recommendations cannot be influenced.

3) As frequent itemsets have to be merged (Step 3)

before collaborative filtering, it will take more time online

than traditional algorithm will take, but its accuracy can

be improved greatly. As the duration of merging frequent

itemsets relies on the number of frequent items, to reduce

frequent itemsets through offline activities can shortern

online duration. In addition, because associative filtering

items for every user are somewhat less than all the items,

the duration of collaborative filtering process itself will

be reduced. Through optimization, online duration of

algorithm can be reduced accordingly.

4) Frequent itemsets include the complete set of fre-

quent itemsets, the closed frequent itemsets, maximal fre-

quent itemsets and so on [5]. The frequent itemsets used

in this paper are maximal frequent itemsets, which can

reduce the number of frequent itemsets greatly. If other

frequent itemsets are used, we can calculate the impor-

tance of different items when calculating nearest neig-

hbors with the help of support when merging frequent

itemsets, which can improve the accuracy further more.

4. Test Process

4.1 Data Set and Evaluation Standard

Data set MovieLens is used to test this algorithm, which

is provided by the GroupLens research lab at the Univer-

sity of Minnesota. The data was collected through the

MovieLens web site (movielens.umn.edu) during a

seven-month period. MovieLens includes 100000 records

of ratings given by 943 users to 1682 movies. A rating is

a number from 1 to 5, optionally supplemented by the

number of seconds which the user spent reading the

movie. Users are encouraged to assign ratings based on

how much they liked the movie, with 5 highest and 1

lowest. Each user has given ratings to 20 moves at least.

You can get the date set at www.grouplens.org.

Average Abs olute Error (MAE) is used to evaluate the

forecasting accuracy of this algorithm. MAE is the devia-

tion average of the actual value and the predictive value of

the ratings given by all users to the items. The lower the

value of MAE is, the better the recommendations are. Sup-

posing user rating set is , and the actual user

rating set is , MAE is defined as follows [6]:

{ ,,...,}

pp p

}

{, ,...,qq

MAE N







 (3)

4.2 Test Results and Remarks

We will compare Associative Sets Based Collaborative

Filtering (ASBCF) and traditional User Based Collabora-

tive Filtering (UBCF) during the test process. In order to

verify the results, we will test in two dimensions.

One is to test in different sparse degrees. Here we se-

lect the first 200 users and first 500 items in MovieLens

rating records, and then deduct some records every time

randomly. In the end, we gain rating records under

A Personalized Recommendation Algorithm Based on Associative Sets

403

200*500, 13270 pieces, 7976 pieces, 4525 pieces and 21 6 2

pieces separately, and their sparse degrees are 0.13, 0.08,

0.043, 0.021662 separately, see Figure 1.

fectively than UBCF, and when the number of neighbors

increases, users that are a little further from current user

are also selected, which can increase error. That is to say,

ASBCF is more effective than UBCF. From the results, we can see that in different sparse

degrees, MAE of ASBCF is lower than that of UBCF,

5.2206% in average. That is to say, ASBCF performs

better in every sparse degree than UBCF. However, as

the data is sparse extremely, ASBCF’s accuracy will be

reduced accordingly.

5. Conclusions

This paper proposes a personalized recommendation (col-

laborative filtering) algorithm based on Associative Sets. It

generates a series of frequent itemsets through frequent

itemsets generation algorithm, and then filters out some

noise items that have little relevence with users by merg-

ing, so as to make collaborative filtering algorithm more

effective. It is proved that this algorithm is better than tra-

ditional algorithm in recommendation accuracy. Although

it takes more time to generate frequent items, it will not

influence instant recommendations, as the generation can

be performed offline. Support of frequent itemsets owns

one kind of new information, which represents different

items’ importance. If such information is used in collabo-

rative filtering, forecasting accuracy will be improved, and

this is the breakthrough point for further research.

The other is to compare values of MAE with different

numbers of nearest neighbors. Here we select 10, 20, 30,

40 and 50 nearest neighbors, and the test results can be

seen in Figure 2.

From Figure 2, we can see that ASBCF also performs

better than UBCF with all kinds of nearest neighbor

numbers. However, along with the increasing of neighbor

number, their gap becomes smaller and smaller. Maybe it

is because ASBCF can find nearest neighbors more ef-

0.58

0.6

0.62

0.64

0.66

0.68

0.7

0.72

0.74

0.76

0.13 0.08 0.043 0.02162

Sparse of Data

MAE

ASBCF

UBCF

REFERENCES

[1] J. B. Schafer, J. A. Konstan, and J. Riedl, “E-Commerce

recommendation applications,” Journal of Data Mining

and Knowledge Discovery, pp. 115–153, 2001.

[2] J. Breese, D. Hecherman, and C. Kadie, “Empirical

analysis of predictive algorithms for collaborative

filtering,” In Proceedings of the 14th Conference on

Uncertainty in Artificial Intelligence (UAI-98), pp. 43–52,

1998.

Figure 1. MAE of ASBCF and UBCF under different sparse

degrees [3] L. Zhao, N. J. Hu, and S. Z. Zhang, “Algorithm design for

personalization recommendation systems,” Journal of

Com- puter Research and Development, pp. 986–991,

August 2002.

0.71

0.62

0.63

0.64

0.65

0.66

0.67

0.68

0.69

0.7

10 20 3040 50

umber of Neighbors

MAE

ASB C F

UBCF

[4] Y. Li, L. Liu, and X. F. Li, “Research on personalized

recommendation algorithm for user’s multiple interests,”

Journal of Computer Integrated Manufacturing Systems,

pp. 1610–1615, December 2004.

[5] J. W. Han and M. Kamber, “Data mining concepts and

techniques (Second Edition),” Ming Fang, Xiaofeng

Meng translated, China Machine Press, pp. 149–161,

March 2007.

[6] B. Sarwar, G. Karypis, J. Konstan, and J. Riedl, “Item

based collaborative filtering recommendation algo-

rithms,” In Proceedings of the Tenth International World

Wide Web Conference, pp. 285–295, 2001.

Figure 2. MAE of ASBCF and UBCF with different num-

bers of nearest neighbors