Short texts, built on the Web 2.0 platform, have developed rapidly in a relatively short time. Recommendation systems that analyze users’ interests from short texts are becoming increasingly important. Collaborative filtering is one of the most promising recommendation technologies. However, existing collaborative filtering methods do not consider the drift of users’ interests, which often leads to a large gap between the recommendation results and users’ real demands. In this paper, building on the traditional collaborative filtering algorithm, a new personalized recommendation algorithm is proposed. It traces users’ interests by using the Ebbinghaus Forgetting Curve. Experimental results show that the new algorithm helps discard users’ overdue interests and discover their real-time interests, giving more accurate recommendations.
In recent years, short texts, as found on services like Facebook and Twitter, have become very popular in the social field all over the world. One of the most prominent short-text forms in China is the micro-blog (weibo). Thanks to its brevity and its real-time nature in sharing, spreading and acquiring information, weibo has grown sharply and has begun to influence people’s lives and their way of thinking. In July 2014, 34th China Internet network development state statistic report [
With the influx of large numbers of users, the volume of weibo surged in a short time, and people are already lost in an ocean of micro-blog information. In today’s fast-paced world, how to obtain the most accurate information users need in the shortest time has become a hot issue.
At present, there are two main recognized ways to solve the problem of information overload: information retrieval and information filtering. Information retrieval technology, represented by Google and Yahoo, has indeed achieved great success. However, it requires that users be able to describe their personal needs accurately. When users cannot describe their demands well, search quality cannot be guaranteed, and the search results are often undesirable. Information filtering technology handles this problem well. As an important application of information filtering, the recommendation system has become an indispensable part of personalized information services in the new generation of Web applications. The collaborative filtering algorithm (CFA) is one of the most effective recommendation algorithms at present. CFA analyzes a user’s interests, finds other users who share those interests, and then combines these similar users’ evaluations of items to form recommendations for the user. It locates users’ interests quite precisely. It can also filter concepts that are complex and hard to describe, which is an obvious advantage over other algorithms. However, CFA cannot distinguish well between real-time interest and overdue interest, which results in unsatisfactory precision. This paper presents a new algorithm, the time weight algorithm (TWA), which identifies a user’s real-time interest and improves the precision of recommendation.
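The user-based collaborative filtering idea described above can be sketched as follows. This is a minimal illustration, not the CFA implementation used in this paper: the cosine similarity measure, the neighbourhood size `k`, and the toy `ratings` data are all assumptions made for the example.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two sparse rating dicts (item -> rating)."""
    shared = set(a) & set(b)
    dot = sum(a[i] * b[i] for i in shared)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def predict_scores(target, ratings, k=2):
    """Score items the target user has not rated, using the k most similar users."""
    neighbours = sorted(
        ((cosine_similarity(ratings[target], ratings[u]), u)
         for u in ratings if u != target),
        reverse=True,
    )[:k]
    scores, weights = {}, {}
    for sim, user in neighbours:
        if sim <= 0:
            continue  # users with no overlapping interest contribute nothing
        for item, rating in ratings[user].items():
            if item in ratings[target]:
                continue  # only recommend unseen items
            scores[item] = scores.get(item, 0.0) + sim * rating
            weights[item] = weights.get(item, 0.0) + sim
    # Similarity-weighted average of the neighbours' ratings.
    return {item: scores[item] / weights[item] for item in scores}

# Hypothetical toy data: user -> {theme: rating}.
ratings = {
    "alice": {"economy": 5, "it": 4},
    "bob": {"economy": 4, "it": 5, "car": 4},
    "carol": {"tourism": 5, "health": 4},
}
predictions = predict_scores("alice", ratings)
print(predictions)
```

Note that nothing in this sketch depends on when a rating was made, which is the limitation the time weight algorithm is meant to address.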
The rest of the paper is organized as follows: Section 2 presents the state of research at home and abroad. Section 3 gives the preliminary concepts regarding the forgetting curve and the details of TWA. Section 4 analyzes the experimental results. Section 5 concludes and points to future work.
As a new phenomenon, weibo filtering has not yet attracted wide attention. Western scholar Ernesto [
Against this background, this paper proposes TWA based on the Ebbinghaus Forgetting Curve [
The German psychologist H. Ebbinghaus carefully and systematically studied the phenomenon of memory loss and drew a forgetting curve from the results of experiments on meaningless syllables and letters. This is the famous Ebbinghaus Forgetting Curve, shown in
Among
A person’s weibo behavior reflects his or her psychology, so changes in a user’s interest in publishing, forwarding and commenting on weibo also follow this forgetting law. Because the shape of the forgetting curve closely matches an exponential function, we use an exponential function to simulate how a user’s interest changes over time. Yu Hong [
where,
Following the human tendency to forget, we divide a person’s interests into real-time interest, which includes long-term interest and recent interest, and overdue interest. We then use TWA, based on the Forgetting Curve, to better explore a user’s real-time interest and discard overdue interest, improving the precision of personalized recommendation. The TWA formula is as follows:
where,
We analyze the rationality of formula (2) from three aspects.
1) The user has engaged frequently with the theme
situation describes the user’s interest correctly.
2) The user has engaged frequently with the theme k both in the past and recently. This shows that the user has been interested in
3) The user has engaged frequently with the theme
all published recently, have short time intervals from the current time, that is to say,
smaller than in the first situation, so
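To make the intuition behind the three situations concrete, the sketch below weights each post by an exponential function of its age and sums the weights per theme. It is an illustration under stated assumptions rather than formula (2) itself: the decay form `exp(-decay_rate * age)`, the 30-day time unit, and the example post ages are hypothetical. The default `decay_rate=0.4` echoes the value of m reported in the experiments, but whether m plays exactly this role in formula (2) is an assumption.

```python
import math

def time_weight(age_days, decay_rate=0.4, unit_days=30.0):
    """Exponential forgetting weight in (0, 1]; recent posts weigh more.

    decay_rate and unit_days are illustrative assumptions, not the paper's values.
    """
    return math.exp(-decay_rate * age_days / unit_days)

def theme_interest(post_ages_days, decay_rate=0.4):
    """Sum of time weights over all posts a user made on one theme."""
    return sum(time_weight(age, decay_rate) for age in post_ages_days)

# Hypothetical post ages (days before now) on a single theme for three users.
histories = {
    "only_past": [300, 310, 320, 330, 340, 350],   # frequent, but long ago
    "past_and_recent": [300, 310, 320, 5, 10, 15], # frequent then and now
    "only_recent": [5, 10, 15],                    # fewer posts, all recent
}

scores = {name: theme_interest(ages) for name, ages in histories.items()}
for name, score in scores.items():
    print(f"{name}: {score:.3f}")
```

Under these assumptions, a theme that was only popular with the user long ago receives a weight close to zero, while recent activity dominates the score, so overdue interest is suppressed even when the raw post count is high.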
The algorithm of improved ITC [
mation of the term in a category
been demonstrated to be very useful and efficient for short text classification. In our algorithm, it is used to obtain a preliminary weight for each item. The process of our new algorithm is described as follows:
The flow chart of this algorithm is given in
The data set in this paper is crawled from the open platform [
For a weibo that a user re-tweets or comments on, we regard the original weibo together with the user’s comment as its real content. We then pre-process the data. Pre-processing removes stop words and function words, such as “haha”, “too”, “also” and so on. We treat any weibo with fewer than 5 words after pre-processing as pointless and remove it. In addition, this experiment covers only Chinese content, so a weibo written entirely in a foreign language is removed as well. After pre-processing, our data set contains 4981 weibo. We then randomly select 14 of the 15 users and use the 4707 weibo they posted in the most recent 3 months as the training set. The 274 weibo published by the remaining user form the test set. The composition of the training set and test set is shown in
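The pre-processing rules can be sketched as a small filter. This is a hedged illustration, not the authors’ pipeline: it assumes each weibo has already been segmented into word tokens (the segmentation tool is not specified here), the stop list contains only the three examples quoted in the text, and "completely foreign language" is approximated as "contains no CJK ideograph".

```python
import re

# Only the examples quoted in the text; a real stop list would be much longer.
STOP_WORDS = {"haha", "too", "also"}
MIN_WORDS = 5
# Assumption: a token counts as Chinese if it has a CJK Unified Ideograph.
CJK = re.compile(r"[\u4e00-\u9fff]")

def preprocess(tokens):
    """Return the cleaned token list, or None if the weibo should be discarded."""
    # Discard weibo that contain no Chinese characters at all.
    if not any(CJK.search(tok) for tok in tokens):
        return None
    kept = [tok for tok in tokens if tok.lower() not in STOP_WORDS]
    # Discard weibo that are too short to carry a theme after cleaning.
    if len(kept) < MIN_WORDS:
        return None
    return kept

# Hypothetical, already-segmented weibo.
samples = [
    ["今天", "股市", "大涨", "经济", "形势", "haha"],
    ["hello", "world", "this", "is", "english", "text"],
    ["今天", "haha", "also"],
]
cleaned = [preprocess(tokens) for tokens in samples]
print(cleaned)
```

Only the first sample survives: the second has no Chinese content and the third falls below the five-word minimum once stop words are removed.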
We manually annotate the weibo in the training set to distinguish users’ real-time interest from overdue interest. After training and calculation, we find that when m takes the value 0.4, the result simulated by formula (2) is closest to our manual annotation. Furthermore, we set a threshold based on the statistics. Then, we bring
According to TWA based on the forgetting curve, we compare the weight of each weibo with the threshold. If the weight is greater than the threshold, we set the result for that weibo to 1; otherwise, we set it to −1. Then put all weibo under a theme
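The thresholding rule can be written in a few lines. The weights and the threshold value below are hypothetical placeholders, since the actual threshold is derived from the training statistics.

```python
def label_by_threshold(weights, threshold):
    """Map each weibo weight to 1 if it exceeds the threshold, otherwise -1."""
    return [1 if w > threshold else -1 for w in weights]

# Hypothetical weights and threshold, for illustration only.
labels = label_by_threshold([0.82, 0.10, 0.45, 0.30], threshold=0.30)
print(labels)
```

Because the comparison is strict, a weight exactly equal to the threshold is labeled −1, matching the "greater than" wording of the rule.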
In this paper, we take Precision, Recall and MAPE as evaluation criteria.
Precision is the ratio of the number of relevant documents retrieved to the number of all documents retrieved. It measures the accuracy of a recommendation system. Recall is the ratio of the number of relevant documents retrieved to the number of all relevant documents. It measures the coverage of a recommendation system. Their formulas are as follows:
where,
MAPE (Mean Absolute Percentage Error) measures the precision of an algorithm by computing the mean absolute percentage error between the predicted values and the true values. The smaller the MAPE, the smaller the gap between predicted and true values, which means the prediction is closer to the user’s true choice and the recommendation is more precise. We make prediction score set of user
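The three criteria can be computed as in the sketch below. It uses the standard textbook definitions, which match the verbal descriptions above; the exact normalization of MAPE in this paper may differ, and the function names and toy inputs are assumptions for illustration.

```python
def precision_recall(retrieved, relevant):
    """Precision and recall of a retrieved set against the relevant set."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

def mape(predicted, actual):
    """Mean absolute percentage error; every actual value must be non-zero."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("predicted and actual must be non-empty and equal in length")
    return sum(abs((a - p) / a) for p, a in zip(predicted, actual)) / len(actual)

# Hypothetical example: 4 items recommended, 3 items actually relevant.
p, r = precision_recall(retrieved=[1, 2, 3, 4], relevant=[2, 4, 6])
print(f"precision={p:.3f} recall={r:.3f}")

# Hypothetical predicted vs. true scores.
error = mape(predicted=[4, 2], actual=[5, 2])
print(f"MAPE={error:.3f}")
```

The guard against empty inputs in `precision_recall` returns 0.0 rather than raising, which is a design choice for the sketch; `mape` raises instead because an empty or mismatched score list usually signals a data error.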
Experiment one shows the difference between TWA and improved ITC on the traditional measures Precision and Recall. We run two classifications in this experiment. The result is shown in
For a more intuitive view, we give histograms of
Set | Number of users | Number of weibo |
---|---|---|
Data set | 15 | 4981 |
Training set | 14 | 4707 |
Test set | 1 | 274 |
Category | Economy | Tourism | Health | Education | IT | Car |
---|---|---|---|---|---|---|
Economy | 39 | 1 | 2 | 0 | 1 | 1 |
Tourism | 0 | 43 | 2 | 0 | 1 | 2 |
Health | 1 | 0 | 50 | 1 | 0 | 1 |
Education | 2 | 3 | 3 | 14 | 0 | 0 |
IT | 2 | 2 | 1 | 1 | 11 | 0 |
Car | 0 | 2 | 2 | 0 | 0 | 29 |
Recall | 0.830 | 0.811 | 0.769 | 0.778 | 0.714 | 0.829 |
Precision | 0.886 | 0.843 | 0.833 | 0.875 | 0.769 | 0.879 |
Category | Economy | Tourism | Health | Education | IT | Car |
---|---|---|---|---|---|---|
Economy | 35 | 1 | 2 | 0 | 1 | 1 |
Tourism | 0 | 42 | 2 | 0 | 1 | 2 |
Health | 1 | 0 | 50 | 1 | 0 | 1 |
Education | 2 | 3 | 3 | 12 | 0 | 0 |
IT | 2 | 2 | 1 | 1 | 11 | 0 |
Car | 0 | 2 | 2 | 0 | 0 | 23 |
Recall | 0.814 | 0.808 | 0.769 | 0.750 | 0.714 | 0.793 |
Precision | 0.875 | 0.840 | 0.833 | 0.857 | 0.769 | 0.852 |
Experiment two shows the difference between improved ITC and TWA in capturing users’ real-time interest. For MAPE, we set
From
Algorithm | MAPE |
---|---|
Improved ITC | 0.333 |
TWA | 0.167 |
Traditional collaborative filtering algorithms have not sufficiently considered changes in users’ interests. This leads to a large gap between the recommendation results and users’ real demands. In this context, we propose TWA, which is based on the collaborative filtering algorithm. The experimental results suggest that TWA outperforms the improved ITC algorithm in precision (a MAPE of 0.167 against 0.333) and can noticeably improve the quality of recommendation, giving users more effective personalized recommendations. However, owing to the limitations of the Sina Micro-blog open platform, our access privileges are so low that we can only test 15 users at a time, which inevitably makes the set of experimental subjects rather narrow. We hope that the Sina Micro-blog open platform will grant more user privileges in the future, so that we can trace and test more users in real time. This would further improve the precision of personalized recommendation and will be the focus of our future work.