The rapid growth of social networks has produced an unprecedented amount of user-generated data, which provides an excellent opportunity for text mining. Sentiment analysis, an important branch of text mining, attempts to infer an author's opinion from the content and structure of a text. Such information is particularly valuable for determining the overall opinion of a large number of people, with applications such as predicting box office sales or stock prices. One of the most accessible sources of user-generated data is Twitter, which makes the majority of its user data freely available through its data access API. In this study we seek to predict a sentiment value for stock-related tweets on Twitter, and to demonstrate a correlation between this sentiment and the movement of a company's stock price in a real-time streaming environment. Both n-gram and "word2vec" textual representation techniques are used alongside a random forest classification algorithm to predict the sentiment of tweets. These values are then evaluated for correlation between stock price and Twitter sentiment for each company. There are significant correlations between price and sentiment for several individual companies. Some companies, such as Microsoft and Walmart, show strong positive correlation, while others, such as Goldman Sachs and Cisco Systems, show strong negative correlation. This suggests that consumer-facing companies are affected differently than other companies. Overall this appears to be a promising field for future research.
Over the last several years there has been an explosion of growth and new activity in social networking. Companies such as Facebook, LinkedIn, Reddit, Pinterest, and Twitter have grown exponentially in recent years. The amount of data exchanged between users on these sites is staggering. On Facebook alone, on an average day in 2014, there were 4.75 billion items shared, 4.5 billion items "liked", and 300 million photographs uploaded. That translates to over 500 terabytes of data generated by Facebook users in a single day [
Sentiment analysis focuses on determining the opinion of a speaker on the particular topic about which he is speaking. The most basic unit for sentiment analysis is a single word; unfortunately, because of sentence structure and words with context-dependent meanings, techniques that ignore sentence structure, such as bag-of-words models, often fail on smaller texts. A solution to this is constructing parse trees, which identify the structure of a sentence as a binary tree by separating distinct phrases. Combining the sentiment of each word in the tree can then take into account clause structures and the possibility of multiple meanings. In cases where a larger text must be analyzed, it can be treated as a collection of smaller phrases, or as a larger bag of words. Opinions are usually classified somewhere between positive and negative, often with some stratification between the two. This can be done numerically or categorically. When the division is categorical, it usually distinguishes between positive, negative, and sometimes neutral sentiments; a numerical classification instead falls somewhere on a continuum between positive and negative. These classifications can be used to determine and aggregate the sentiment of a large number of authors on a given topic.
Because sarcasm and even simple negations can completely reverse the predicted sentiment from the actual opinion expressed, parse trees are among the most accurate methods of determining the sentiment of sentences. One of the most recent methods of sentiment analysis, published by Stanford [
This Stanford model makes use of deep neural networks, an expansion on early neural networks such as the Perceptron. Advances in hardware processing speeds, particularly in graphics processing units, as well as increasing interest in parallel processing, have brought about a resurgence in the use of artificial neural networks by enabling the addition of hidden layers of neurons and backpropagation. The additional layers allow these models to become more highly non-linear, fitting the data more closely, while backpropagation enhances training efficiency on labeled data in deep neural networks. Such networks have won numerous pattern and image recognition competitions over the last five years, and appear to hold great promise for improving classification accuracy in most areas of machine learning.
One topic of user sentiment that can be easily checked for correlations between public opinion and public behavior is that of stock price prediction. There are two basic methods for predicting whether the price of a given stock will rise or fall: fundamental analysis and technical analysis. Fundamental analysis relies on the financial data of the company to make assessments of financial stability, growth potential, and inherent value. This value can be matched against the current market price. If the estimated real value is higher than the current market price, it is believed that the company is undervalued and that the stock is more likely to rise than to fall. Similarly, if the estimated value is lower than the market price, it is assumed that the stock is overvalued and that it is more likely to fall than to rise [
Technical analysis takes a different route. Rather than focusing on financial data, technical analysis uses historical price data to make predictions about the expected direction of price change in the future. Frequently observed patterns such as head and shoulders or double tops, as well as recent trends such as channels and uptrends, are used to predict future prices. Another way technical analysis seeks to predict prices is through observing the behavior of others. One factor in technical analysis is buying and selling at the same time as company insiders, and buying and selling opposite odd-lot traders. The idea behind this is that company insiders are best acquainted with the company's prospects, and make the most educated buying and selling decisions. Meanwhile odd-lot traders are almost always individual rather than professional traders and generally lack a strong investing background. Their trades are negatively correlated with trades by company insiders, and generally result in buying when prices are high and selling when prices are low. By buying and selling opposite them, one can often buy when prices are low and sell when prices are high [
Text analysis, and more specifically sentiment detection, could provide insight into investor and general public opinion on a company and its stock price on a large scale. This insight could provide more information for use in analysis techniques similar to those currently supported by technical analysis. This could be a promising method for determining the relationship between human evaluations and stock price, apart from the underlying company values uncovered by fundamental analysis.
There has been a significant amount of research into text analysis, including sentiment analysis, as well as some interest in utilizing these tools for prediction through Twitter; however, until now these projects have primarily treated text analysis and sentiment prediction in general terms. This is one of the unique difficulties of detecting investor sentiment on Twitter: tweets expressing clear sentiment about a stock can look either objective or simply noisy to general models. For example, one collected tweet reads "$MSFT bullish…", which has little natural-language meaning; in the context of the jargon particular to securities markets, however, this tweet expresses clear positive sentiment towards Microsoft Corporation. Such difficulties necessitate the construction of a sentiment classifier particular to this field of study. General models such as the Stanford NLP sentiment classifier discussed in the introduction can still be immensely valuable, though, in providing a basic framework for a context-specific classifier.
There are three primary means of representing text in statistical textual analysis: n-grams, vector space models, and character streams. The first of these techniques, n-gram representation, has been around for decades and provides the simplest and most straightforward method of representing text, based on simple word or character sequence counts. Vector space models are a more complex and far more recent development, with the most popular implementation, "word2vec", having been created within the last few years. Finally, character stream techniques are the most recent development in the field, with the first viable model having been published mere months before this writing. As such this final technique, though an extremely promising avenue for progress in the field, has been omitted from this research. Its likely impact, however, is significant enough to justify inclusion in any overview of textual representation techniques.
N-gram representations are based on simple character or word sequence counts. In these techniques a full corpus of related text is parsed, and every appearing character or word sequence of length n is extracted to form a dictionary of words and phrases. For example the text "the quick brown fox jumps over the lazy dog" has the following 5-gram word features: "the quick brown fox jumps", "quick brown fox jumps over", "brown fox jumps over the", "fox jumps over the lazy", and "jumps over the lazy dog". Similarly the text "g2g ttyl" has the following 5-gram character features: "g2g t", "2g tt", "g tty", and " ttyl". Every text in the corpus can then easily be encoded as a vector by counting the number of times each phrase in the dictionary occurs in the text. The main advantages of this technique are its simplicity and its flexibility to specifically match the corpus of text being studied [
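The extraction and counting steps described above can be sketched in a few lines; the function names here are illustrative, not taken from the study:

```python
def ngrams(seq, n):
    """Return every contiguous subsequence of length n."""
    return [seq[i:i + n] for i in range(len(seq) - n + 1)]

def word_ngrams(text, n):
    """Word-level n-grams, joined back into phrases."""
    return [" ".join(g) for g in ngrams(text.split(), n)]

def char_ngrams(text, n):
    """Character-level n-grams, including spaces."""
    return ngrams(text, n)

def count_vector(text, dictionary, n):
    """Encode a text as counts of each dictionary phrase."""
    grams = word_ngrams(text, n)
    return [grams.count(phrase) for phrase in dictionary]
```

Running `word_ngrams` on the example sentence reproduces the five phrases listed above, and `char_ngrams("g2g ttyl", 5)` yields the four character features, spaces included.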
Vector space techniques require a substantial set of text cleaned so as to include only the words of the language. The spatial relationship between the words is then analyzed as described by Mikolov et al. [
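As a rough illustration of how such word vectors can represent a whole tweet, one common approach (not necessarily the exact one used in this study) is to average the vectors of the tweet's known words. The two-dimensional toy vectors below are invented for the example; real word2vec embeddings have hundreds of dimensions learned from a large corpus:

```python
# Toy two-dimensional "word vectors" for illustration only; real
# embeddings are learned from a corpus and are much higher-dimensional.
TOY_VECTORS = {
    "bullish": [0.9, 0.1],
    "bearish": [-0.8, 0.2],
    "stock":   [0.0, 0.6],
}

def tweet_vector(text, vectors, dims=2):
    """Represent a tweet as the mean of its known word vectors."""
    known = [vectors[w] for w in text.lower().split() if w in vectors]
    if not known:
        return [0.0] * dims
    return [sum(v[d] for v in known) / len(known) for d in range(dims)]
```

A tweet such as "$MSFT bullish stock" would then map to the average of the "bullish" and "stock" vectors, with the unknown ticker token simply skipped.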
Character stream modeling requires only a dictionary of valid characters to learn from a text. This enables it to be language agnostic to the point that the same algorithm can work effectively on languages as diverse as English and Chinese when provided adequate training data. This is a huge advancement over previous methods which often maintain rigid requirements for the language they are designed to model. So far this technique demonstrates superior predictive power to other methods albeit at the cost of increased computational complexity [
In studying sentiment analysis on noisy and biased data, it was found that a multilevel classification model can provide more robust and accurate predictions in difficult data sets [
There have also been several deep learning approaches to sentiment classification on Twitter that have been specialized to account for the relatively limited data available in a text with a maximum of one hundred forty characters. These studies have found that a combination of specially chosen metadata and textual features, along with more traditional analyses such as n-grams, can provide a more accurate classification model than simple features alone [
There is some precedent for using aggregated sentiment analysis from Twitter data to make predictions about the behavior of a population. Previous research has demonstrated a significant connection between the overall sentiment of Twitter users towards new movie releases and the box office sales figures [
In order to perform a comparison between stock price changes and Twitter sentiment it was necessary to collect data on both trading values and company-related tweets, with timestamps to properly match the two into a consistent stream. To narrow down the range of data needing collection, this study focused on the thirty companies of the Dow Jones Industrial Average. Data was collected from both sources over a period of several months, from November 2014 through March 2015, utilizing specialized APIs. Only the data collected between February 6 and February 18 was fully evaluated, due to computational limitations and gaps in tweet data caused by throttling and inconsistent network connectivity.
Twitter provides its own API through which developers may obtain limited streams of live tweets. This interface was used through the Twitter4j Java library to filter all available live tweets for any containing a complete company name or ticker symbol on the Dow Jones Industrial Average. All tweets matching the filter were saved along with all available metadata, including timestamp, sender, geotagging, retweet status, etc. The number of tweets per day and the distribution of tweets between companies are shown in
Because all of this information was collected in a real-time streaming environment, with very brief time windows and no modifications to the data, this approach lends itself well to the type of moment-by-moment analysis that must be conducted for technical stock analysis.
A sampling of these tweets was selected and manually labeled with sentiment values. According to this sample the distribution of sentiments can be found in
After collecting the raw text data it was also necessary to compute the sentiment value for each tweet, since it is this sentiment which may relate closely to the stock price. Initially the Stanford NLP Sentiment Classifier was used to predict the sentiment of each tweet. In order to evaluate the accuracy of these predictions it was necessary to prepare a set of tweets labeled with true sentiment values. This was done manually to ensure accuracy, and as such the labeled set consisted of only one thousand tweets. Because the parser was trained primarily on movie reviews and newspaper articles, it proved inadequate for this task, performing with approximately 30% accuracy.
In response to this we constructed our own classification models, one using n-gram and the other "word2vec" textual representation techniques, to preprocess raw text before using a standard random forest model for classification. Each of these models achieved accuracy between 60% and 70% on the labeled data set, an acceptably high level for textual sentiment analysis on such short texts.
Since there are only two variables involved for each company, namely sentiment and price, Pearson's Correlation Coefficient can readily demonstrate a connection between the two. In order to calculate the correlation between these values, sentiment values had to be aggregated over five minute increments. This allows a pairing of the sentiment over a five minute period with the value of the stock after those five minutes. Unfortunately these numbers are of very different kinds: the sentiment value takes into account only the previous five minutes, while the stock value at a specific moment takes into account everything that has occurred beforehand. In order to create a more proper comparison, two techniques were used. The first changes price values to account only for the last five minutes by using the price change since the previous measurement, while the second uses a running total of sentiment so that sentiment values aggregate beyond their five minute periods. Each of these techniques still leaves values in very different ranges; to correct this, both sequences were normalized to fall between zero and one. Once this normalization was complete it was possible to calculate correlations between these series for each company. Unfortunately there remained a significant amount of noise, because of the limited data available for any given five minute interval and the variability of readings. To correct this, a moving average was used with a window length of one day and a step size of one hour. This significantly smoothed both curves and made results far more readable, making it possible to plot these variables over the recorded time range and visually evaluate the relationship between price and sentiment for each company. A sampling of these calculations and charts may be found in the next section.
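The aggregation steps above can be sketched as follows; the helper names are illustrative, and the moving-average window is expressed in samples rather than in calendar time:

```python
def normalize(series):
    """Rescale a series to fall between zero and one."""
    lo, hi = min(series), max(series)
    return [(x - lo) / (hi - lo) for x in series]

def moving_average(series, window, step):
    """Smooth a series with a sliding window advanced by a fixed step."""
    return [sum(series[i:i + window]) / window
            for i in range(0, len(series) - window + 1, step)]

def pearson(xs, ys):
    """Pearson's correlation coefficient between two equal-length series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)
```

Note that normalization does not change the Pearson coefficient itself (it is invariant to linear rescaling); it serves here to make the smoothed sentiment and price curves visually comparable on one chart.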
This section describes the results obtained through the methodologies described in the preceding section. The foundational problem to this study was the sentiment classification which was utilized by all subsequent testing methods. The confusion matrices of the classification using n-gram and vector space representations are shown in
These matrices show that, using the n-gram representation, predictions on positive, neutral, and negative tweets had accuracies of 55.4%, 84.6%, and 34% respectively. The same accuracies using the word2vec representation were 42.4%, 88.4%, and 12.1%. The preference for neutral sentiment is due to the overall probability of a given tweet having neutral sentiment: since most tweets are neutral, the classification model errs towards neutral predictions. The model is also biased towards positive predictions, since positive tweets outnumber negative ones in the sample data. Given sufficient training data this would not be an issue, as the predictions remain in proportion with the true labels of the training data.
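The per-class accuracies quoted above follow directly from the confusion matrices: each is the diagonal count divided by the row total for that true label. A brief check, with the values copied from the n-gram matrix:

```python
def per_class_accuracy(matrix):
    """Fraction of each true class (row) predicted correctly (diagonal)."""
    return [row[i] / sum(row) for i, row in enumerate(matrix)]

# Rows are true positive, neutral, negative; columns in the same order.
NGRAM_CONFUSION = [
    [179, 126, 18],
    [70, 461, 14],
    [17, 70, 45],
]
```

Applying `per_class_accuracy` to `NGRAM_CONFUSION` recovers the 55.4%, 84.6%, and 34% figures.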
Results of the correlation analysis were very mixed across companies, as may be seen in
| | | Predicted Label | | |
|---|---|---|---|---|
| | | positive | neutral | negative |
| True Label | positive | 179 | 126 | 18 |
| | neutral | 70 | 461 | 14 |
| | negative | 17 | 70 | 45 |
| | | Predicted Label | | |
|---|---|---|---|---|
| | | positive | neutral | negative |
| True Label | positive | 137 | 173 | 13 |
| | neutral | 47 | 479 | 16 |
| | negative | 18 | 98 | 16 |
| Company (symbol) | word2vec representation correlation | n-gram representation correlation | Company (symbol) | word2vec representation correlation | n-gram representation correlation |
|---|---|---|---|---|---|
| WMT | 0.848094 | 0.847447 | XOM | −0.41886 | −0.45982 |
| MSFT | 0.856233 | 0.844525 | MRK | −0.4751 | −0.51843 |
| UTX | 0.841241 | 0.839601 | KO | −0.53817 | −0.52242 |
| UNH | 0.796927 | 0.815517 | MCD | −0.52038 | −0.52473 |
| GE | 0.787573 | 0.796093 | INTC | −0.51058 | −0.62937 |
| IBM | 0.902387 | 0.745618 | T | −0.69758 | −0.68944 |
| DD | 0.725684 | 0.698454 | MMM | −0.73474 | −0.71081 |
| AXP | 0.58826 | 0.668651 | NKE | −0.73293 | −0.73339 |
| PFE | 0.796572 | 0.598452 | BA | −0.73247 | −0.75375 |
| CVX | 0.40015 | 0.424398 | JPM | −0.79279 | −0.75673 |
| TRV | 0.489148 | 0.38869 | HD | −0.78832 | −0.78254 |
| VZ | 0.376637 | 0.380748 | DIS | −0.86191 | −0.81751 |
| JNJ | 0.334807 | 0.286778 | CSCO | −0.91567 | −0.89665 |
| PG | −0.28735 | −0.33896 | GS | −0.90957 | −0.90044 |
| V | −0.39149 | −0.37326 | | | |
This is important because high correlations are not necessarily meaningful when both series are simply moving in the same direction throughout.
In a study of correlation between Twitter sentiment and stock price, three outcomes are possible: positive, negative, and neutral correlation. The main contribution of our work is the identification of the companies that fall into each of these three categories during the time period of our investigation. The correlation has been shown to be strongly positive for several companies, particularly Walmart and Microsoft, which are primarily consumer-facing corporations. There is, of course, not a uniform connection between sentiment and price across all companies. Based on promising results in sentiment-to-price correlations on company groups
from previous studies and on the strong correlation between sentiment and price for certain companies in this study, we believe that further research on the correlation between sentiment and stock prices is warranted.
We would like to thank Houghton College for its financial support.