### Paper Menu >>

### Journal Menu >>

J. Service Science & Management, 2009, 2: 400-403 doi:10.4236/jssm.2009.24048 Published Online December 2009 (www.SciRP.org/journal/jssm) Copyright © 2009 SciRes JSSM A Personalized Recommendation Algorithm Based on Associative Sets Guorui JIANG, Hai QING, Tiyun HUANG School of Economics and Management, Beijing University of Technology, Beijing, China. Email: jianggr@bjut.edu.cn Received August 11, 2009; revised September 13, 2009; accepted October 24, 2009. ABSTRACT During the process of personalized recommendation, some items evaluated by users are performed by accident, in other words, they have little correlation with users’ real preferences. These irrelevant items are equa l to noise data, and often interfere with the effectiveness of collaborative filtering. A personalized recommendation algorithm based on Associa- tive Sets is proposed in this paper to solve this problem. It uses frequent item sets to filter out noise data, and makes recommendations according to users’ real preferences, so as to enhance the accuracy of recommending results. Test results have proved the superiority of this algorithm. Keywords: Frequent Itemsets, Associative Sets, Collaborative Filtering, Recommendation Technology 1. Introduction How to help users quickly and effectively access to the information they really need when facing abundant re- sources becomes a challenging task and also a hot topic of current academic study. Personalized recommenda- tion system is one of the effective tools to solve this problem. A helpful method is to develop intelligent recommendation system to provide personalized service [1], that is to recommend products to users according to their preferences or demands, so as to help them finish th e p ur c h asing pro cess. Nearest neighbor collaborative filtering approach is a recommendation technique that is the most widely used right now [2]. Its basic idea is to generate recommenda- tions for target users according to the rating data of near- est neighbors that have given similar ratings. As items’ (movies, music, etc.) ratings g iven by the nearest nei g hbo rs are quite similar to those given by target users, items’ ratings given by target users can be estimated by the w e ight - ed average of the ratings given by the nearest neighbors. The advantage of collaborative filtering approach is that it can adapt to the rapid updating of users’ information. It caculates the tightness among users according to the lat- est data every time, so as to make recommendations. H ow - ever, the consequent disadvantage is that it is quite slow to get K nearest neighbors within large amounts of data. Meanwhile, results would not be satisfactory when sparse data is dealt with, especially for new products and new users. At the same time, its scalability is not very good [3]. On the basis of traditional collaborative filtering algo- rithm, our paper proposes a personalized recommendation algorithm based on Associative Sets. This algorithm first supposes user rating matrix as transaction sets, while ev ery transaction is a user’s rating set. Then it generates freq ue nt itemsets through frequent itemsets generation algorithm, puts frequent itemsets into a series of Associative Sets according to one user’s rating record, and performs col- laborative filtering among Associative Sets so as to im- prove the accuracy and scalability of the algorithm. 2. Traditional Collaborative Filtering Algorithm and Its Analysis 2.1 Traditional Collaborative Filtering Algorithm Collaborative filtering algorithm is the most widely used approach in personalized recommendations, which can forecast target users’ interests and preferences according to neighbor users’ interests and preferences. It first finds neighbors that have the same preferences with target us- ers under the help of statistical techniques, and then makes recommendations to target users according to their neighbors’ preferences. It includes three stages [4]: 1) Representation Inputting dat a can usual l y be expressed as an m×n user rating matrix, where m represents the number of users, n represents the number of items, and Rij represents the rating given by user i to item j. Such ratings can have several scales just as Table 1. A Personalized Recommendation Algorithm Based on Associative Sets401 Table 1. User/item rating matrix Item User I1 I2 … Ij … In U1 R11 R12 … R1j … R1n U2 R21 R22 … R2j … R2n … … … … … … … Ui Ri1 Ri2 … Rij … Rin … … … … … … … Um Rm1 Rm2 … Rmj … Rmn 2) Neighbor Generation For user u, generate a “nearest neighbor” set according to the level of similarity between neighbors. The calcula- tion of similarity values between neighbors can be per- formed through vector space similarity calculation me thods that are widely used currently, such as cosine method, pearson similarity method and so on. There are two ways to determine neighbors, one is to determine the similarity threshold through cosine method first, and then select users whose similarity values are greater than the simi- larity threshold as neighbor users; the other is to deter- mine the number of neighbor users N first, and then se- lect the first N users whose similarity values are greater as neighbor users. 3) Recommendation As “nearest neighbor” set is generated, we can forecast one certain user’s rating for each item, and then make recommendations to that user according to the level of forecasting ratings. 2.2 Problem Analysis Traditional collaborati ve filtering al gorithm considers users’ entire historical information as its preference in formation and uses such information to find its nearest neighbors. However, users’ preferences are often formed exploringly and progressively in reality. It is a historical progress, and during this progress, users often try many times and become stable gradually, so as to form their real interests. Even though users’ interests have already been formed, they would sometimes try other items in daily search process for various reasons. Such items cannot be seen as their interests and the supporting evidences for recom- mendations. Therefore, if we want to gain real preference information of users, we must filter out the occasional search information to reduce interference. Find nearest neighbors according to users’ real preference information, and then make recommendations, while the results of recommendations can become better. 3. A Personalized Recommendation Algorithm Based on Associative Sets As we have analyzed above, we can gain associative items through frequent itemsets. These associative items constitute the foundations of different interests. As for current users, we use their entire information to filter associative items, and then merge the associative items after filtering to form their interest sets. 3.1 Algorithm Descriptions 1) Set all items as 123 {,, ,..., } n I III I , n as the number of items. See every user’s rating record as one item of transaction, , which represents the rating set of user i, wherein i TI {1,2,3,..., }im , m represents the num- ber of users. So, user rating matrix can be seen as trans- action set 123 , ,...,} m TT T{ ,TT . 2) Use Apriori algorithm to generate the frequent item- sets F of transaction set T, whose support level is support. In the first iteration of the algorithm, each item of I is a member of the set of candidate 1-itemsets. The algo- rithm simply scans all of the transactions T in order to count the number of occurrences of each item. Select the candidate 1-itemsets, which satisfies minimum support support, to consist the set of frequent 1-itemsets 1 L. Use 11 LL to generate a candidate set of 2-itemsets, and prune using apriori property---A ll nonempty su b s e ts o f a frequent itemset must be frequent. Then, scan all of the transactions T in order to count the number of occur- rences of each item in candidate set of 2-itemsets. Select the candidate 2-itemsets, which satisfies min- imum support support, to consist the set of frequent 2-itemsets 2 L. Constantly use 11kk LL to generate a candidate set of k-itemsets, and prune it. Then, scan all of the transactions T in order to count the occurrence of each Copyright © 2009 SciRes JSSM A Personalized Recommendation Algorithm Based on Associative Sets 402 item in candidate set of k-itemsets. Select the candidate k-itemsets, which satisfies minimum support support, to consist the set of frequent k-itemsets k L. If the candidate set of k-itemsets is null, all frequent itemsets are gained. 3) Gain rating items * I of current user a, and merger the frequent itemsets, which contains some items of * I and also the number it contains is more than parameter num in F, as associative sets C. 4) Use Pearson correlation coefficient algorithm to cal- culate the similarity between user a and any other user b in associative sets C. 22 ()( CC j bj R R ) (,)[( )][( )] aj a bb jC C CC aj ab jC jC RR R wab RR R (1) where j is the item in associative sets C, is the rat- ing given by user a to item j, is the rating given by user b to item j, aj R bj R C a R and C b R are average ratings of user a and user b separately. 5) For a, arrange all the users according to the value of , and select the first M users that have greater values as neighbor users of user a. (,) C wab a 6) Forecast the rating of u ser a to item j. The forecast- ing formul a is: Neighbor , ) bj R b , (,)() (, a a Cb b Neighbor aj aC b Neighbor wab R PR wa (2) where is the forecasting rating of user a to item j, is the rating of user b to item j, ,aj P ,bj Ra R and b R are average ratings of user a and user b to all items. 7) Arrange items according to the value of , and select the first N items that have greater as recom- mendatory items. ,aj P ,aj P 3.2 Algorithm Explanations 1) In addition to the original rating records, there are four other parameters in this algorithm: support for the calcula- tion of frequent itemsets support, threshold for the selec- tion of associative sets from frequent itemsets num, num- ber of nearest neighbors M and number of items that are recommended to users N. Wherein, support and num are used to determine Associative Sets, but what is the right combination needs to be tested. Usually, different data sets have different proper combination of support and num. Therefore, it will take more time to learn this algorithm. 2) Step 1 and step 2 in algorithm description are mai- nly used to generate frequent itemsets, which will take much more time. However, as it is performed offline, instant recommendations cannot be influenced. 3) As frequent itemsets have to be merged (Step 3) before collaborative filtering, it will take more time online than traditional algorithm will take, but its accuracy can be improved greatly. As the duration of merging frequent itemsets relies on the number of frequent items, to reduce frequent itemsets through offline activities can shortern online duration. In addition, because associative filtering items for every user are somewhat less than all the items, the duration of collaborative filtering process itself will be reduced. Through optimization, online duration of algorithm can be reduced accordingly. 4) Frequent itemsets include the complete set of fre- quent itemsets, the closed frequent itemsets, maximal fre- quent itemsets and so on [5]. The frequent itemsets used in this paper are maximal frequent itemsets, which can reduce the number of frequent itemsets greatly. If other frequent itemsets are used, we can calculate the impor- tance of different items when calculating nearest neig- hbors with the help of support when merging frequent itemsets, which can improve the accuracy further more. 4. Test Process 4.1 Data Set and Evaluation Standard Data set MovieLens is used to test this algorithm, which is provided by the GroupLens research lab at the Univer- sity of Minnesota. The data was collected through the MovieLens web site (movielens.umn.edu) during a seven-month period. MovieLens includes 100000 records of ratings given by 943 users to 1682 movies. A rating is a number from 1 to 5, optionally supplemented by the number of seconds which the user spent reading the movie. Users are encouraged to assign ratings based on how much they liked the movie, with 5 highest and 1 lowest. Each user has given ratings to 20 moves at least. You can get the date set at www.grouplens.org. Average Abs olute Error (MAE) is used to evaluate the forecasting accuracy of this algorithm. MAE is the devia- tion average of the actual value and the predictive value of the ratings given by all users to the items. The lower the value of MAE is, the better the recommendations are. Sup- posing user rating set is , and the actual user rating set is , MAE is defined as follows [6]: 12 { ,,...,} N pp p } N q 12 {, ,...,qq 1 N ii i pq MAE N (3) 4.2 Test Results and Remarks We will compare Associative Sets Based Collaborative Filtering (ASBCF) and traditional User Based Collabora- tive Filtering (UBCF) during the test process. In order to verify the results, we will test in two dimensions. One is to test in different sparse degrees. Here we se- lect the first 200 users and first 500 items in MovieLens rating records, and then deduct some records every time randomly. In the end, we gain rating records under Copyright © 2009 SciRes JSSM A Personalized Recommendation Algorithm Based on Associative Sets Copyright © 2009 SciRes JSSM 403 200*500, 13270 pieces, 7976 pieces, 4525 pieces and 21 6 2 pieces separately, and their sparse degrees are 0.13, 0.08, 0.043, 0.021662 separately, see Figure 1. fectively than UBCF, and when the number of neighbors increases, users that are a little further from current user are also selected, which can increase error. That is to say, ASBCF is more effective than UBCF. From the results, we can see that in different sparse degrees, MAE of ASBCF is lower than that of UBCF, 5.2206% in average. That is to say, ASBCF performs better in every sparse degree than UBCF. However, as the data is sparse extremely, ASBCF’s accuracy will be reduced accordingly. 5. Conclusions This paper proposes a personalized recommendation (col- laborative filtering) algorithm based on Associative Sets. It generates a series of frequent itemsets through frequent itemsets generation algorithm, and then filters out some noise items that have little relevence with users by merg- ing, so as to make collaborative filtering algorithm more effective. It is proved that this algorithm is better than tra- ditional algorithm in recommendation accuracy. Although it takes more time to generate frequent items, it will not influence instant recommendations, as the generation can be performed offline. Support of frequent itemsets owns one kind of new information, which represents different items’ importance. If such information is used in collabo- rative filtering, forecasting accuracy will be improved, and this is the breakthrough point for further research. The other is to compare values of MAE with different numbers of nearest neighbors. Here we select 10, 20, 30, 40 and 50 nearest neighbors, and the test results can be seen in Figure 2. From Figure 2, we can see that ASBCF also performs better than UBCF with all kinds of nearest neighbor numbers. However, along with the increasing of neighbor number, their gap becomes smaller and smaller. Maybe it is because ASBCF can find nearest neighbors more ef- 0.58 0.6 0.62 0.64 0.66 0.68 0.7 0.72 0.74 0.76 0.13 0.08 0.043 0.02162 Sparse of Data MAE ASBCF UBCF REFERENCES [1] J. B. Schafer, J. A. Konstan, and J. Riedl, “E-Commerce recommendation applications,” Journal of Data Mining and Knowledge Discovery, pp. 115–153, 2001. [2] J. Breese, D. Hecherman, and C. Kadie, “Empirical analysis of predictive algorithms for collaborative filtering,” In Proceedings of the 14th Conference on Uncertainty in Artificial Intelligence (UAI-98), pp. 43–52, 1998. Figure 1. MAE of ASBCF and UBCF under different sparse degrees [3] L. Zhao, N. J. Hu, and S. Z. Zhang, “Algorithm design for personalization recommendation systems,” Journal of Com- puter Research and Development, pp. 986–991, August 2002. 0.71 0.62 0.63 0.64 0.65 0.66 0.67 0.68 0.69 0.7 10 20 3040 50 N umber of Neighbors MAE ASB C F UBCF [4] Y. Li, L. Liu, and X. F. Li, “Research on personalized recommendation algorithm for user’s multiple interests,” Journal of Computer Integrated Manufacturing Systems, pp. 1610–1615, December 2004. [5] J. W. Han and M. Kamber, “Data mining concepts and techniques (Second Edition),” Ming Fang, Xiaofeng Meng translated, China Machine Press, pp. 149–161, March 2007. [6] B. Sarwar, G. Karypis, J. Konstan, and J. Riedl, “Item based collaborative filtering recommendation algo- rithms,” In Proceedings of the Tenth International World Wide Web Conference, pp. 285–295, 2001. Figure 2. MAE of ASBCF and UBCF with different num- bers of nearest neighbors |