Twitter mood predicts the stock market

Authors: Johan Bollen, Huina Mao, Xiaojun Zeng

Publication, Year: Journal of Computational Science, 2011

Link to Paper

Notes by: Matthew R. DeVerna



Overall Findings

Intro

It is therefore reasonable to assume that the public mood and sentiment can drive stock market values as much as news.

Data and Methods Overview

Data

Phases of Analysis

1. Mood assessment

2. Test Hypothesis, "Public mood is predictive of future DJIA values"

3. Can existing prediction models improve DJIA prediction?

Generating Public Mood Time Series: OpinionFinder and GPOMS

OpinionFinder (OF)

Google-Profile of Mood States (GPOMS)

The score of each POMS mood dimension is thus determined as the weighted sum of the co-occurrence weights of each tweet term that matched the GPOMS lexicon.
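The scoring step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the `LEXICON` contents, the weights, and the function name `mood_scores` are all hypothetical, since the actual GPOMS lexicon is not reproduced in these notes.

```python
from collections import defaultdict

# Hypothetical lexicon: term -> {mood dimension: co-occurrence weight}.
# The real GPOMS lexicon and its weights are NOT reproduced here.
LEXICON = {
    "relaxed": {"Calm": 0.8},
    "peaceful": {"Calm": 0.5},
    "cheerful": {"Happy": 0.9},
}

def mood_scores(tweet_terms, lexicon=LEXICON):
    """Sum the co-occurrence weights of every tweet term found in the lexicon."""
    scores = defaultdict(float)
    for term in tweet_terms:
        # Terms that do not match the lexicon contribute nothing.
        for dimension, weight in lexicon.get(term, {}).items():
            scores[dimension] += weight
    return dict(scores)

scores = mood_scores(["so", "relaxed", "and", "peaceful", "and", "cheerful"])
```

A term that appears twice in a tweet is counted twice in this sketch; whether the original tool does the same is an assumption.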

Normalization

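For example, the z-score of time series $X_t$, denoted $Z_{X_t}$, is defined as:

$$Z_{X_t} = \frac{X_t - \bar{x}(X_{t \pm k})}{\sigma(X_{t \pm k})}$$

Where $\bar{x}(X_{t \pm k})$ and $\sigma(X_{t \pm k})$ represent the mean and standard deviation of the time series within the period $[t-k, t+k]$.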
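A windowed z-score of this kind might be computed as in the sketch below. The half-width `k`, the truncation of the window at the ends of the series, the use of the population standard deviation, and the choice to return 0.0 for a constant window are all assumptions made for illustration; the notes do not say how the paper handles these cases.

```python
from statistics import mean, pstdev

def local_zscores(series, k):
    """z-score each point against the mean/std of its local window [t-k, t+k]."""
    result = []
    for t in range(len(series)):
        # Assumption: the window is truncated at the series boundaries.
        window = series[max(0, t - k): t + k + 1]
        mu, sigma = mean(window), pstdev(window)
        # Assumption: a constant window (sigma == 0) maps to a z-score of 0.
        result.append((series[t] - mu) / sigma if sigma > 0 else 0.0)
    return result

z = local_zscores([1.0, 2.0, 3.0], k=1)
```

Because each point is scaled against its own neighbourhood, two series with very different ranges end up on a comparable scale.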

The mentioned z-score normalization is intended to provide a common scale for comparisons of the OF and GPOMS time series. However, to avoid so-called "in-sample" bias, we do not apply z-score normalization to the mood and DJIA time series that are used to test the prediction accuracy of our Self-Organizing Fuzzy Neural Network. This analysis and our prediction results rest on the raw values for both time series and the DJIA.

Cross-validating OF and GPOMS against large socio-cultural events

Based on the results of our Granger causality analysis (shown in Table 2), we can reject the null hypothesis that the mood time series do not predict DJIA values with a high level of confidence. However, this result only applies to 1 GPOMS mood dimension. We observe that $X_1$ (i.e. Calm) has the highest Granger causality relation with DJIA for lags ranging from 2 to 6 days (p-values < 0.05). The other four mood dimensions from GPOMS do not have significant causal relations with changes in the stock market, and neither does the OpinionFinder series.
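For reference, a Granger causality test of this kind is usually framed as a comparison of two nested linear regressions. The form below is the textbook bivariate version, written with assumed symbols ($D_t$ for the DJIA series, $X_t$ for one mood series, $n$ for the number of lags); it is not necessarily the exact notation or specification used in the paper.

$$
\begin{aligned}
L_1:\quad D_t &= \alpha + \sum_{i=1}^{n} \beta_i D_{t-i} + \varepsilon_t \\
L_2:\quad D_t &= \alpha + \sum_{i=1}^{n} \beta_i D_{t-i} + \sum_{i=1}^{n} \gamma_i X_{t-i} + \varepsilon_t
\end{aligned}
$$

In this formulation the null hypothesis is that every $\gamma_i = 0$, meaning the lagged mood values add no explanatory power beyond the lagged DJIA values. A small p-value means the second model fits significantly better than the first, which is a statement about predictive information and not about actual causation.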

Non-linear models for emotion-based stock prediction

$DJIA_{t-3,2,1}$ represents the DJIA values and $X_{1,t-3,2,1}$ represents the values of GPOMS mood dimension 1, at time $t-3$, $t-2$, and $t-1$. According to the same notation, $I_{1,3}$, $I_{1,4}$, $I_{1,5}$, and $I_{1,6}$ represent combinations of historical DJIA with mood dimensions 3, 4, 5 and 6 at time $t-3$, $t-2$, and $t-1$. For example, $I_{1,6}$ represents a set of inputs that includes the DJIA values at $t-3$, $t-2$, and $t-1$, and mood dimensions 1 and 6 at the same times.
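The construction of these input sets can be sketched as below: each input vector holds the DJIA values from the three preceding days, followed by the same three lags for each selected mood dimension. The function name `lagged_inputs`, the dictionary layout of `moods`, and the sample numbers are hypothetical and are not data from the paper.

```python
def lagged_inputs(djia, moods, dims, t, lags=(3, 2, 1)):
    """Build the input vector for day t from values at t-3, t-2 and t-1."""
    if t < max(lags):
        raise ValueError("not enough history before day t")
    # Historical DJIA values come first.
    features = [djia[t - lag] for lag in lags]
    # Then the same lags for each selected mood dimension.
    for d in dims:
        features += [moods[d][t - lag] for lag in lags]
    return features

# Made-up illustrative values, indexed by day.
djia = [100.0, 101.0, 99.0, 102.0, 103.0]
moods = {1: [0.1, 0.2, 0.3, 0.4, 0.5], 6: [1.0, 0.9, 0.8, 0.7, 0.6]}

i_0 = lagged_inputs(djia, moods, dims=(), t=3)        # DJIA history only
i_1_6 = lagged_inputs(djia, moods, dims=(1, 6), t=3)  # DJIA plus mood dimensions 1 and 6
```

Keeping the DJIA-only vector as a baseline makes it possible to check whether adding a mood dimension actually improves the prediction.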

Findings

Discussion

Ignored factors / limitations:

  1. Not designed to limit the analysis by geographical location
  2. Though some form of cross-validation was conducted on the mood measurements, there is no known "ground truth" for public mood states, so a leap of faith is being taken here
  3. The study does not address any causative mechanism connecting online public mood states with DJIA values, despite the predictive correlation it shows between public mood states measured from Twitter feeds and the DJIA.

 


1 E. Gilbert, K. Karahalios, Widespread worry and the stock market, in: Fourth International AAAI Conference on Weblogs and Social Media, Washington, DC, 2010, pp. 58–65, http://www.aaai.org/ocs/index.php/ICWSM/ICWSM10/paper/download/1513/1833.