The rising popularity of online social networks (OSNs), such as Twitter, Facebook, MySpace, and LinkedIn, has sparked great interest in sentiment analysis of their data. While many methods exist for identifying sentiment in OSNs, such as communication pattern mining and classification based on emoticons and parts of speech, the majority of them use a batch mode learning approach that is suboptimal for analyzing large amounts of real-time data. As an alternative, we present a stream algorithm using Modified Balanced Winnow for sentiment analysis on OSNs. Tested on three real-world network datasets, the performance of our sentiment predictions is close to that of batch learning, with the added ability to detect important features dynamically for sentiment analysis in data streams. These top features reveal key words important to the analysis of sentiment.
Since the early 1990s the Internet has exploded as a means by which people communicate information in various forms. Between 1990 and 2000, it was estimated that Internet traffic was doubling every three to four months [
Due to its large volume of data flow, data mining in social networks has become a popular research field, with sentiment analysis being an area of particular interest. The users of a social network can frequently be split into distinct groups based on common interests. By distinguishing between these groups it is possible to model their overall sentiment as representative of a larger population, using the subgroup within a particular OSN as a sample. Sentiment analysis inspects data presented by individuals within the larger groups and, given a sample, allows for the determination of the overall attitude or opinion of that group towards certain topics.
Pfitzner, Garas, and Schweitzer [
In a study by Garas, Garcia, Skowron, and Schweitzer [
The language of tweets is unique due to the 140-character limit imposed upon individual posts, causing users to often utilize shorthand notation as well as emoticons in sentiment expression [
Most closely related to the research done in this paper is the work of Aston, Liddle, and Hu [
Similarly, we aim to apply the Modified Balanced Winnow (MBW) algorithm to social network sentiment classification in data streams and to implement online feature selection in conjunction with it, in order to account for data that changes over time.
Three datasets were used for sentiment analysis on data streams. We also incorporated online feature selection to improve runtime and reveal the change in feature importance over time.
Traditional studies in sentiment analysis have primarily used batch mode learning to repeatedly traverse relatively small datasets. However, in most real-world applications of algorithms designed to analyze sentiment from OSNs, the datasets are much larger and constantly changing due to the nature of online social networking. The combination of the sheer size and dynamic nature of OSNs makes batch learning an impractical solution [
In the case of OSN sentiment classification, using a streaming algorithm is a more efficient solution. Analyzing data in a stream allows posts to be processed in real time as they appear on social networks. Running algorithms in a data stream environment means that only a single pass over the data is viable. This method results in decreased accuracy, but yields a considerable improvement in runtime. For these reasons, as well as the ability to handle small changes in data streams over time [
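The single-pass, test-then-train pattern described above can be sketched as follows. The "classifier" here is a trivial majority-class baseline, a placeholder rather than the MBW classifier used in this work, included only to illustrate the loop shape: every instance is scored before it is used for training, and no instance is revisited.

```python
# Minimal sketch of single-pass, test-then-train stream processing.
# The "classifier" is a majority-class baseline -- a placeholder, not the
# MBW classifier used in this work -- included only to show the loop shape.
from collections import Counter

def prequential_accuracy(stream):
    """Each (features, label) pair is tested once, then used for training."""
    class_counts = Counter()
    correct = total = 0
    for features, label in stream:
        # Test first: predict with whatever has been seen so far.
        prediction = class_counts.most_common(1)[0][0] if class_counts else 0
        correct += int(prediction == label)
        total += 1
        # Then train: a single pass means each instance is seen exactly once.
        class_counts[label] += 1
    return correct / total

stream = [({"good": 1}, 1), ({"nice": 1}, 1), ({"bad": 1}, 0), ({"great": 1}, 1)]
acc = prequential_accuracy(stream)
```

This "prequential" style of evaluation is what makes the accuracy numbers reported later comparable to batch learning despite each post being seen only once.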
For the purpose of this research we utilized three public datasets: Sanders Corpus [
The Sanders Corpus consists of 5513 manually classified tweets. Because a large number of these have become unavailable since the creation of the dataset, and because certain tweets were irrelevant to our study of sentiment analysis, the total number used was reduced to 3320. We were interested in two subsets of the reduced Sanders Corpus, specifically tweets with positive sentiment and tweets with negative sentiment. To form this positive/negative subset, all tweets with a neutral label were removed from the dataset, and the remaining tweets were analyzed for overall positive or negative sentiment.
This dataset contains 2034 tweets, which are hand-labeled according to their sentiment, either positive or negative. They are assigned sentiment values of either 0 or 4 based on how negative (0) or positive (4) the tweet is. This dataset annotates the tweets and entities (target subjects) separately, allowing for finer-grained sentiment analysis of tweets containing annotated entities.
This dataset is composed of 4242 posts from 5 different social media sites (MySpace, YouTube, Digg, BBC, Runner's World) which have been manually labeled according to their sentiment, with a score given for both the positive and negative sentiment appearing in the post. Each post is assigned two scores, one from −1 to −5 representing how negative the post is with −5 being the most negative, and another from 1 to 5 representing how positive the post is with 5 being the most positive.
Because we are representing the posts as character n-grams, each one has up to 95^n possible features. As a result of the large number of features per post, classification takes a great deal of time, which is problematic in a data stream environment. For this reason it is advantageous to reduce the number of features that must be considered when performing classification. Beyond time consumption, many features of each post do not play a significant role in classifying its sentiment. Thus we performed online feature selection to discover the top features of each gram representation [
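A minimal character n-gram extractor illustrates where this feature space comes from. The assumption here (that features are overlapping character n-grams drawn from the 95 printable ASCII characters, which yields the 95^n count) and the function name are illustrative, not taken from the paper's implementation.

```python
# Sketch of character n-gram feature extraction. The 95^n figure assumes
# features are n-grams drawn from the 95 printable ASCII characters; the
# function name and counting scheme here are illustrative assumptions.
from collections import Counter

def char_ngrams(post, n):
    """Count the overlapping character n-grams appearing in a post."""
    return Counter(post[i:i + n] for i in range(len(post) - n + 1))

features = char_ngrams("great phone", 3)  # trigrams such as "gre", " ph"
```

Even this short post yields nine trigram features, which is why feature selection matters once posts number in the thousands.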
The Winnow algorithm is a mistake driven machine learning technique that uses a promotion parameter α in or- der to train on pre-labeled instances [
The instances are labeled with Boolean values and classified as either 0 or 1. Each instance is input to the learner, classified, and finally compared to the actual class label. The process of the Winnow algorithm is outlined briefly in the Winnow Algorithm listing below.
Winnow Algorithm:
For each instance i:
1. If prediction(i) = true class, continue.
2. If prediction(i) = 0 and true class = 1, multiply weight matrix entries by α.
3. Else if prediction(i) = 1 and true class = 0, multiply weight matrix entries by 0.
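The pseudocode above corresponds roughly to the following sketch. The default initial weight of 1.0, the α = 2.0 default, and the representation of an instance as a set of active features are illustrative assumptions.

```python
# Sketch of the Winnow update above. Initial weights of 1.0, alpha=2.0,
# and theta are illustrative defaults; instances are sets of active features.

def winnow_update(weights, features, label, alpha=2.0, theta=1.0):
    """One mistake-driven step: promote on a false negative, zero out on a
    false positive, leave weights untouched on a correct prediction."""
    score = sum(weights.get(f, 1.0) for f in features)
    prediction = 1 if score >= theta else 0
    if prediction == 0 and label == 1:        # step 2: promotion
        for f in features:
            weights[f] = weights.get(f, 1.0) * alpha
    elif prediction == 1 and label == 0:      # step 3: elimination
        for f in features:
            weights[f] = 0.0
    return prediction

weights = {}
pred = winnow_update(weights, {"good", "happy"}, 1, theta=4.0)
```

The multiplicative updates are what make Winnow mistake-driven: weights change only when the learner is wrong.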
Balanced Winnow extends the Winnow algorithm by adding a demotion parameter β. It utilizes a promotion parameter α > 1, a demotion parameter 0 < β < 1, and a threshold θ. For each training instance, if the instance is predicted correctly the training step is skipped; if it is predicted as 1 with an actual class of 0, the weight matrix is multiplied by the demotion parameter; and if it is predicted as 0 with an actual class of 1, the weight matrix is multiplied by the promotion parameter.
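Balanced Winnow is commonly implemented with paired positive and negative weights per feature; the following is a hedged sketch under that assumption. The initial weights (2.0 and 1.0) and parameter values are common illustrative choices, not the paper's settings.

```python
# Hedged sketch of one Balanced Winnow step, assuming the common paired
# positive/negative weight formulation; the initial weights (2.0 and 1.0)
# and parameter values are illustrative choices, not the paper's settings.

def balanced_winnow_update(w_pos, w_neg, features, label,
                           alpha=1.5, beta=0.5, theta=1.0):
    score = sum(w_pos.get(f, 2.0) - w_neg.get(f, 1.0) for f in features)
    prediction = 1 if score >= theta else 0
    if prediction != label:
        for f in features:
            if label == 1:   # promote positive weights, demote negative ones
                w_pos[f] = w_pos.get(f, 2.0) * alpha
                w_neg[f] = w_neg.get(f, 1.0) * beta
            else:            # demote positive weights, promote negative ones
                w_pos[f] = w_pos.get(f, 2.0) * beta
                w_neg[f] = w_neg.get(f, 1.0) * alpha
    return prediction

w_pos, w_neg = {}, {}
pred = balanced_winnow_update(w_pos, w_neg, {"love"}, 0)
```

Keeping two weight vectors lets the effective weight (w_pos − w_neg) go negative, which plain Winnow's non-negative weights cannot express.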
We used MBW to classify posts. Like Balanced Winnow, MBW requires a promotion parameter α and a demotion parameter β; it differs from Balanced Winnow in its use of larger margins and a minor change to the update rules for the weight matrices [14]. MBW is outlined in detail in the figure Modified Balanced Winnow. We initially trained the MBW classifier on the first 100 instances. Each following instance was first classified using MBW; if the classification was correct we updated the correct count and continued to the next instance. If it was incorrect we updated the incorrect count and updated the weight matrix.
[Figure: Modified Balanced Winnow algorithm]
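The train-then-stream procedure described above can be sketched as follows. The update rule here is a margin-based Balanced Winnow variant standing in for MBW's exact rules (which reference [14] specifies); the margin, initial weights, and parameter values are illustrative assumptions.

```python
# Sketch of the train-then-stream loop described in the text: train on the
# first 100 instances, then classify each subsequent instance and update the
# weights only on mistakes. The update rule is a margin-based Balanced
# Winnow variant standing in for MBW's exact rules (given in [14]); the
# margin, initial weights, and parameter values are illustrative assumptions.

def run_stream(stream, alpha=1.5, beta=0.5, margin=1.0, warmup=100):
    w_pos, w_neg = {}, {}

    def score(features):
        return sum(w_pos.get(f, 2.0) - w_neg.get(f, 1.0) for f in features)

    def update(features, label):
        signed = score(features) * (1 if label == 1 else -1)
        if signed <= margin:                  # wrong, or margin too small
            for f in features:
                if label == 1:
                    w_pos[f] = w_pos.get(f, 2.0) * alpha
                    w_neg[f] = w_neg.get(f, 1.0) * beta
                else:
                    w_pos[f] = w_pos.get(f, 2.0) * beta
                    w_neg[f] = w_neg.get(f, 1.0) * alpha

    correct = incorrect = 0
    for i, (features, label) in enumerate(stream):
        if i < warmup:                        # initial training phase
            update(features, label)
            continue
        prediction = 1 if score(features) >= 0 else 0
        if prediction == label:
            correct += 1                      # correct: just count it
        else:
            incorrect += 1                    # mistake: count and retrain
            update(features, label)
    return correct, incorrect

toy_stream = [({"good"}, 1), ({"bad"}, 0)] * 3
result = run_stream(toy_stream, warmup=2)
```

Updating only on mistakes (after the warm-up phase) keeps per-instance cost low, which is the point of using MBW in a stream setting.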
Once data had been collected and a data stream simulation was set up, we used the MBW algorithm in conjunction with Massive Online Analysis (MOA) to train and run a classifier. We present two sets of results, one on sentiment analysis and the other on feature selection.
The Sanders corpus, SentiStrength, and STS_Gold datasets were all analyzed for sentiment. We achieved the highest accuracy on STS_Gold, with the lowest accuracy on the Sanders corpus.
We performed dynamic online feature selection on each of the utilized datasets. For each feature we calculated the importance score as
MBW requires four different user-defined parameters (promotion, demotion, feature selection, good feature selection), and poorly chosen values may decrease accuracy. Therefore we performed extensive checks over these parameters to reveal good ranges for each. It was determined that the larger the gram size used, the more accurate our predictions became, since larger gram sizes capture more of each word than smaller ones. Also, when using feature selection and good feature selection, we noticed that accuracy was greatest when more features and good features were used, as shown in
Our division created 22 segments (or timestamps) for Sanders.
High accuracy is seen when the α and β values increase together from around (α, β) = (1, 1) onward. By incorporating dynamic feature selection in our MBW we achieved an accuracy of 73.3% while [
The work of [
Sentiment prediction accuracy of Sanders 5-grams, checked over percent of features used (rows) versus percent of good features used (columns). Color ranges with accuracy from low to high (red to green).
| | 0.1 | 0.2 | 0.3 | 0.4 | 0.5 | 0.6 | 0.7 | 0.8 | 0.9 | 1.0 |
|---|---|---|---|---|---|---|---|---|---|---|
| 0.1 | 58.3 | 61.6 | 64.2 | 65.4 | 66.6 | 67.4 | 66.9 | 68.8 | 69.8 | 71.8 |
| 0.2 | 58.9 | 61.7 | 64.4 | 65.4 | 67.1 | 66.7 | 68.4 | 68.6 | 69.8 | 72.6 |
| 0.3 | 60.2 | 62.9 | 64.4 | 66.5 | 67.5 | 67.7 | 68.9 | 68.4 | 69.4 | 71.9 |
| 0.4 | 59.6 | 62.7 | 64.0 | 66.1 | 66.8 | 67.6 | 68.8 | 69.8 | 70.3 | 73.3 |
| 0.5 | 60.9 | 62.5 | 65.1 | 66.1 | 66.9 | 67.9 | 68.4 | 70.7 | 69.3 | 71.7 |
| 0.6 | 61.2 | 63.6 | 65.0 | 64.5 | 67.0 | 68.1 | 69.9 | 68.0 | 68.8 | 72.6 |
| 0.7 | 61.4 | 63.2 | 64.3 | 65.3 | 68.0 | 69.1 | 67.3 | 69.0 | 69.9 | 72.2 |
| 0.8 | 63.4 | 65.2 | 64.5 | 65.5 | 67.7 | 66.3 | 68.4 | 69.2 | 69.3 | 72.6 |
| 0.9 | 64.8 | 66.5 | 66.7 | 65.4 | 68.2 | 65.6 | 67.8 | 68.3 | 69.5 | 72.0 |
| 1.0 | 67.4 | 67.4 | 67.4 | 67.4 | 67.4 | 67.4 | 67.4 | 67.4 | 67.4 | 67.4 |
[Figure: Sanders 5-grams top 20 features as they change over time. The X axis denotes the timestamps and the Y axis denotes the importance of each feature. Note: a box (□) denotes a space character.]
[Figure: SentiStrength 5-grams top 20 features as they change over time. The X axis denotes the timestamps and the Y axis denotes the importance of each feature. Note: a box (□) denotes a space character.]
The SentiStrength dataset was divided into 112 sets of 100 instances each. We achieved an accuracy of 73.6% using promotion and demotion values of 1.4 and 0.7 respectively. Features such as “love”, “you”, “happy”, “thank”, and “good” carry the highest importance throughout this dataset, shown in
The promotion (α) and demotion (β) table of the SentiStrength 3-gram representation revealed that the values of α and β which optimize prediction accuracy seem to follow a logarithmic pattern for α in the range 1–3 and β in the range 0.2–0.9. In the STS_Gold dataset, high accuracy is seen when α is greater than 1 but less than 3 and β is less than α.
In terms of accuracy with feature selection, it is best to use all features. In Sanders and STS_Gold, the difference in accuracy between using all features and using 10% of the features is around 10%, while with SentiStrength the difference is only around 2% to 3%.
We discovered that the promotion and demotion values and the feature selection contribute independently to sentiment prediction accuracy. The varying range of accuracy resulting from differing percentages of selected features makes it difficult to determine a good feature selection percentage that avoids a significant decrease in accuracy. Our MBW achieved its highest accuracy of 87.5% with STS_Gold on the 5-gram representation, compared with only 73.3% on 3-grams for Sanders and 73.6% on 5-grams for SentiStrength. Our results for Sanders are close to the accuracy of [
We thank Houghton College for its financial support.