CAN THE SOUTH AFRICAN TWITTER MOOD BE USED TO PREDICT STOCK EXCHANGE INDEX MOVEMENT? A TEST USING THE JOHANNESBURG STOCK EXCHANGE ALL SHARE INDEX

A Research Paper presented to

The Department of Information Systems

University of Cape Town

By

Stiaan Maree

MRXSTI001

in partial fulfillment of the requirements of the INF4024W Information Systems Course

1 October 2012

Plagiarism Declaration

1. I know that plagiarism is wrong. Plagiarism is to use another’s work and pretend that it is one’s own.

2. I have used the APA convention for citation and referencing. Each contribution to, and quotation in, this paper “CAN THE SOUTH AFRICAN TWITTER MOOD BE USED TO PREDICT STOCK EXCHANGE INDEX MOVEMENT? A TEST USING THE JOHANNESBURG STOCK EXCHANGE ALL SHARE INDEX”, from the work(s) of other people, has been attributed and has been cited and referenced.

3. This paper “CAN THE SOUTH AFRICAN TWITTER MOOD BE USED TO PREDICT STOCK EXCHANGE INDEX MOVEMENT? A TEST USING THE JOHANNESBURG STOCK EXCHANGE ALL SHARE INDEX”, is my own work.

4. I have not allowed, and will not allow, anyone to copy my work with the intention of passing it off as his or her own work.

5. I acknowledge that copying someone else’s assignment or essay, or part of it, is wrong, and declare that this is my own work.

6. I have not falsified or manufactured any data, and declare that all data was ethically collected.

Signature:

Date: 1 October 2012

Name: Stiaan Maree

TABLE OF CONTENTS

LIST OF FIGURES

List of Acronyms

ABSTRACT

1 INTRODUCTION

1.1 Purpose/Aim of the study

1.1.1 Main problem

1.1.2 Sub-problem

1.2 Rationale

1.3 Research Method

1.4 Context of the study

1.5 Assumptions and limitations

1.6 Ethical considerations

1.7 Outline of the study

2 Literature review

2.1 Introduction

2.2 Definitions and background

2.2.1 Twitter

2.2.2 Profile of mood states

2.2.3 Twitter as a gauge for public mood

2.2.4 Stock exchange prediction

2.2.5 Johannesburg Stock Exchange (JSE)

2.2.6 Event study

2.3 Twitter mood predicts the stock market

2.3.1 Hypothesis 1

2.4 Twitter mood classified according to XPOMS predicts the stock market

2.4.1 Hypotheses 2 to 7

2.5 Conclusion of literature review

3 Research methodology

3.1 Research philosophy

3.2 Research approach

3.3 Purpose of the research

3.4 Research timeframe

3.5 Research method

3.6 Research process

3.6.1 Population

3.6.2 Sampling

3.6.3 Research instrument

3.6.4 Data collection

3.6.5 Data preparation

3.6.6 Data analysis

3.7 Ethics

3.8 Limitations/Risks

3.9 Validity and reliability

3.9.1 External validity

3.9.2 Internal validity

3.9.3 Reliability

4 RESEARCH ANALYSIS, FINDINGS AND DISCUSSION

4.1 Description of sample data

4.2 Analysis according to XPOMS (H2 – H7)

4.2.1 Data overview – basic statistics

4.2.2 Event study analysis

4.2.3 Correlation

4.2.4 Causality

4.2.5 Prediction

4.3 Analysis of Twitter mood (H1)

4.4 Other analysis

4.5 Summary of findings

4.6 Discussion

5 CONCLUSION

5.1 Summary

5.2 Discussion

5.3 Recommendations

References

Appendix A: Dataset used for analysis

Appendix B1: Mood, POMS and XPOMS in English

Appendix B2: Mood, POMS and XPOMS in Afrikaans

LIST OF TABLES

Table 1: Hypothesis 1

Table 2: Hypotheses 2 to 7

Table 3: DateAll table

Table 4: DateWeek table

Table 5: Feeling terms: English and Afrikaans

Table 6: Mood score count calculation

Table 7: Score per day per mood

Table 8: Mood score per mood per day and JSE ALSI

Table 9: Addition of JSE lag

Table 10: Four levels of tests

Table 11: Descriptive statistics

Table 12: Descriptive statistics: first 19 days

Table 13: Descriptive statistics: last 20 days

Table 14: Means correlation: first 19 days, last 20 days, all (p < 0,05 in red)

Table 15: Standard deviation: first 19 days, last 20 days, all (p < 0,05 in red)

Table 16: Dataset Spearman rank correlation (p < 0,05 in red)

Table 17: Dataset Spearman rank correlation (p < 0,01 in red)

Table 18: Four null accepted hypotheses

Table 19: MATLAB Depression JSE Granger causality p-value

Table 20: Depression null hypothesis accepted

Table 21: MATLAB Fatigue JSE Granger causality p-value

Table 22: Fatigue hypothesis accepted

Table 23: Neural network results (TRUE = up)

Table 24: Twitter mood hypothesis accepted

Table 25: Spearman rank correlations of weekday data (p < 0,05 in red)

Table 26: Sub-theme research summary

Table 27: Summary of all hypotheses

LIST OF FIGURES

Figure 1: An example of a Tweet

Figure 2: Dependent and independent variables

Figure 3: Model of research process

Figure 4: Twitter Streaming API classes

Figure 5: Twitter Search API classes

Figure 6: Mood score calculation SQL query

Figure 7: Weighted score selection SQL query

Figure 8: Depression histogram

Figure 9: Tension histogram

Figure 10: Anger histogram

Figure 11: Vigour histogram

Figure 12: Fatigue histogram

Figure 13: Confusion histogram

Figure 14: News events mapped on Depression mood

Figure 15: Comparison of four moods and JSE ALSI

Figure 16: Depression mood and JSE ALSI

Figure 17: Fatigue mood and 1 day lag JSE ALSI

Figure 18: MATLAB Depression Granger causality

Figure 19: R Depression Granger causality

Figure 20: MATLAB Fatigue Granger causality

Figure 21: R Fatigue Granger causality

Figure 22: Neural network input, key and prediction variables

Figure 23: Neural network case table mapping

Figure 24: Neural network tables

Figure 25: Neural network model

List of Acronyms

ALSI    All Share Index

API     Application Programming Interface

CSV     Comma-separated values

DJIA    Dow Jones Industrial Average

EMH     Efficient Market Hypothesis

GB      Gigabyte

GPOMS   Google Profile of Mood States

JSE     Johannesburg Stock Exchange

JSON    JavaScript Object Notation

POMS    Profile of Mood States

RAM     Random-access Memory

REST    Representational State Transfer

RT      Re-Tweet

SOFNN   Self-Organising Fuzzy Neural Network

SQL     Structured Query Language

STF     Socionomic Theory of Finance

UCT     University of Cape Town

XPOMS   Extended Profile of Mood States

ABSTRACT

Several theories of stock market prediction have emerged and lost popularity over the years. The research question is: “Could one or more of the South African Twitter moods be used to predict the movement of the Johannesburg Stock Exchange (JSE) All Share Index (ALSI)?” The research is based on a paper published in 2011 by Bollen, Mao and Zeng, who used the Dow Jones Industrial Average. The research is done in a South African context, analysing Tweets from South Africa, and focussing on the JSE ALSI. The research method was to download secondary data from Twitter’s Application Programming Interface (API), develop a model called Extended Profile of Mood States (XPOMS) to extract public mood, and search for a causal effect of the mood on the closing values of the JSE ALSI.

Four of the moods showed no significant correlation with the JSE ALSI. The mood Depression had a significant negative correlation with the same day’s JSE ALSI values, but the research found that the causal relationship instead runs from the JSE to the Depression mood. The major finding of the research was a highly significant positive correlation between the Fatigue mood and the next day’s closing value of the JSE ALSI. A significant causal relationship was also found from the Fatigue mood to the JSE ALSI values, meaning that the movement of the JSE ALSI can indeed be predicted using Twitter mood. The theoretical significance of the findings was that Bollen et al.’s 2011 results were replicated, although on different moods. The findings also supported the behavioural finance theory, which states that public mood can influence the stock market.

1 INTRODUCTION

“Our results indicate that the accuracy of Dow Jones Industrial Average (DJIA) predictions can be significantly improved by the inclusion of specific public mood dimensions but not others” (Bollen, Mao, & Zeng, 2011, p. 1).

The prospect of easy profits by predicting the stock market motivates researchers to formulate new models and methodologies (Atsalakis, Dimitrakakis, & Zopounidis, 2011). Bollen et al. (2011) found that the movement of the DJIA can be predicted with 86,7% accuracy using Twitter mood. This research was based on that of Bollen et al. (2011) and was done in a South African context, using Tweets from within South Africa and the Johannesburg Stock Exchange (JSE) All Share Index (ALSI). The title of the research is: “Can the South African Twitter mood be used to predict stock exchange index movement? A test using the Johannesburg Stock Exchange All Share Index”.

1.1 Purpose/Aim of the study

The aim of the research was to find out whether one or more of the South African Twitter moods could be used to predict the movement of the JSE ALSI. The research was based on prior research done by Bollen et al. (2011). To do this, a model needed to be developed to extract the moods from South African Tweets. The moods as defined by the model were then mapped against JSE ALSI values to see whether it was possible to use one or more of the South African Twitter moods to predict the movement of the ALSI.

1.1.1 Main problem

The main problem was to determine if one or more of the South African Twitter moods could be used to predict the movement of the JSE ALSI.

1.1.2 Sub-problem

The sub-problem was to determine if one or more of the South African Twitter moods, as classified according to the Extended Profile of Mood States (XPOMS), could be used to predict the movement of the JSE ALSI.

1.2 Rationale

The study filled a gap identified by Bollen et al. (2011), who called for further research on using Twitter moods to predict stock market movements in a specific geographical area. The study will benefit investors who are trying to make better decisions about stock market predictions, as well as academics who study the predictability of the stock market or who study Twitter as a gauge for public mood.

1.3 Research Method

The research was done by analysing secondary data downloaded using two of Twitter’s Application Programming Interfaces (APIs) and Sanlam’s iTrade website. A model called XPOMS, based on the Profile of Mood States (POMS), was developed to extract the mood from Twitter data, and quantitative analysis was done on the data in order to accept or reject the hypotheses drawn up after the literature review.
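The quantitative step can be illustrated with a lagged rank correlation, in which a mood score on one day is compared with the index close a fixed number of days later. The following is a minimal sketch only, not the analysis code used in the study: the function names are hypothetical, it uses only the Python standard library, and it returns the Spearman coefficient without a p-value.

```python
from statistics import mean


def ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def lagged_spearman(mood, index, lag=1):
    """Correlate the mood score on day t with the index close on day t + lag."""
    if lag == 0:
        return spearman(mood, index)
    return spearman(mood[:-lag], index[lag:])
```

Calling `lagged_spearman(mood_scores, closing_values, lag=1)` on two equal-length daily series (both names are placeholders) would compare each day's mood with the next day's close, while `lag=0` compares same-day values.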

1.4 Context of the study

Stock market prediction has attracted much research, but none has yet been able to fully predict the market (Schumaker & Chen, 2009). Various approaches have been studied, including the use of neural networks (Zhang & Wu, 2009). Bollen et al. (2011) successfully predicted the movement of the Dow Jones Industrial Average (DJIA) index by implementing a Self-Organising Fuzzy Neural Network (SOFNN) that included Twitter mood data.

1.5 Assumptions and limitations

The research is based on prior research done by Bollen et al. (2011). One of the limitations of the study is that none of the Bollen et al. (2011) datasets or models were available for use. The researcher and supervisor tried to contact all three authors of the Bollen et al. (2011) paper to ask for the use of the information, but received no feedback. Testing the researcher’s XPOMS model on Bollen et al.’s (2011) data, and Bollen et al.’s (2011) Google Profile of Mood States (GPOMS) model on the researcher’s data, could have added another dimension of reliability and validity to the research.

1.6 Ethical considerations

The data used for the research included Tweets available publicly through some of Twitter’s APIs, and closing values of the JSE ALSI, which are also publicly available. The University of Cape Town (UCT) Faculty of Commerce Ethics in Research Committee approved the ethics application of the research.

1.7 Outline of the study

The introduction (Chapter 1) is followed by a literature review (Chapter 2) which forms the theoretical basis for the hypotheses. Twitter, POMS, Twitter as a gauge for public mood, stock exchange prediction, the JSE and event study are discussed, followed by the main theme and sub-themes. The hypotheses are developed from the main and sub-themes. The research methodology is discussed in Chapter 3, which details the research philosophy, approach, method and process (which contains the population and sampling discussion). Ethics and limitations are also detailed, followed by a discussion on validity and reliability of the data. Chapter 4 contains the examination of the research findings, analysis and discussion. Quantitative analysis is done on the data and the hypotheses are tested using basic statistics, Spearman rank correlation tests, Granger causality tests and neural networks. A discussion on the interpretation of the findings ends Chapter 4. Chapter 5 concludes the research with a summary of the findings and recommendations for future research.

2 Literature review

2Literature review

The reason for the literature review was to investigate the current state of knowledge on using Twitter mood to predict stock market movements, its limitations, and how the topic fits into its wider context (Saunders, Lewis & Thornhill, 2009). This led to the development of the hypotheses that formed the basis for the research.

2.1 Introduction

Since its inception in 2006, Twitter has grown tremendously, with 140 million users posting 340 million Tweets daily (Rios & Lin, 2012). Users Tweet on everything, including the weather, sports results and their own feelings. Behavioural finance has shown that financial decisions are driven by mood and emotions (Bollen et al., 2011; Subrahmanyam, 2007). Bollen et al.’s (2011) research set out to find out if public mood, as measured by Twitter, influences stock market movements and whether, in fact, Twitter mood can be used to predict these stock market movements.

In terms of the structure of the literature review, the first discussion addressed definitions and background. Twitter was briefly discussed, as well as POMS, Twitter as a gauge for public mood and general stock market predictions. The JSE and event study were examined to provide detailed background. The first research problem was then addressed, namely to find out whether Twitter mood can be used to predict the stock market. The second level of research problems was then discussed, which included classifying Twitter mood according to XPOMS, and finding out if these moods have any effect on the stock market.

The search for this literature was conducted in the following way. It started with an article titled “Twitter mood predicts stock market” by Bollen et al. (2011). This article was found on Google Scholar when simply searching for the term “Twitter”. Bollen et al. (2011) did their research on the Dow Jones, using Tweets from around the world. Bollen et al. (2011) called for future studies to factor in location and language. Other articles were found from the reference section of the Bollen et al. (2011) article. Articles that referenced the Bollen et al. (2011) article were also found on Google Scholar. Twitter’s website was also searched for information that could be useful for the study.

2.2 Definitions and background

2.2.1 Twitter

“Twitter is an Internet social-network and micro-blogging platform with both mass and interpersonal communication features for sharing 140-character messages, called Tweets, with other people, called followers” (Chen, 2011, p. 755). There are 140 million active users on Twitter, posting 340 million Tweets a day globally (Rios & Lin, 2012). In South Africa there are 1,1 million registered Twitter users (Vermeulen, 2012). Figure 1 shows an example of a Tweet.

Figure 1: An example of a Tweet

2.2.2 Profile of mood states

The POMS is a low-cost, user-friendly instrument whose factor-analytical structure has been validated numerous times, has been used in hundreds of studies and has been normed for various populations (Pepe & Bollen, 2008). The POMS questionnaire (McNair, Lorr, & Droppelman, 1971) measures six dimensions of mood, namely tension-anxiety, depression-dejection, anger-hostility, vigor-activity, fatigue-inertia and confusion-bewilderment (Pepe & Bollen, 2008).
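To make the idea of six mood dimensions concrete, the sketch below counts how often words associated with each dimension occur in a set of short messages. It is an illustration only: the term lists are invented placeholders rather than the POMS adjectives or the XPOMS lexicon, and the function names are hypothetical.

```python
import re
from collections import Counter

# Hypothetical term lists, for illustration only; these are NOT the
# POMS questionnaire items or the XPOMS lexicon.
MOOD_TERMS = {
    "tension": {"tense", "nervous", "anxious"},
    "depression": {"sad", "hopeless", "miserable"},
    "anger": {"angry", "furious", "annoyed"},
    "vigour": {"energetic", "lively", "cheerful"},
    "fatigue": {"tired", "exhausted", "weary"},
    "confusion": {"confused", "muddled", "forgetful"},
}


def score_tweet(text):
    """Count the mood terms in one message, per mood dimension."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter()
    for word in words:
        for mood, terms in MOOD_TERMS.items():
            if word in terms:
                counts[mood] += 1
    return counts


def daily_scores(tweets):
    """Sum the per-message counts into one score per mood dimension."""
    total = Counter({mood: 0 for mood in MOOD_TERMS})
    for tweet in tweets:
        total.update(score_tweet(tweet))
    return dict(total)
```

A simple count like this ignores negation, sarcasm and word weighting, so it should be read as a starting point rather than a validated mood measure.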

Various causes exist for the different moods. For example, late-life depression can be caused by financial strain (Arean, 2012), and financial problems are also mentioned by Lauber, Falcato, Nordt and Rossler (2003) as the seventh highest cause of depression, listed higher than isolation, unemployment and phobia. The moods, classified according to POMS, can also be treated, for example by exercise, which has a positive effect on all six mood states (Berger & Motl, 2000). The moods can also influence other behaviour, for instance financial decisions, as described in behavioural finance (Subrahmanyam, 2007).

2.2.3 Twitter as a gauge for public mood

Mood can be tracked with the use of large-scale public surveys, but the accuracy is limited to the degree with which the indicators correlate with public mood (Bollen et al., 2011). Great improvements in the use of social media to track public mood have been made recently, using Twitter (Bollen et al., 2011).

2.2.4 Stock exchange prediction

A generation ago, academic financial economists accepted the Efficient Market Hypothesis (EMH) as the leading theory for stock market prediction (Malkiel, 2003). “This hypothesis is associated with the view that stock market price movements approximate those of a random walk. If new information develops randomly, then so will market prices, making the stock market unpredictable apart from its long-run uptrend” (Malkiel, 2005, p. 1). The implication of the EMH is that stock market trends can only be predicted with 50% accuracy (Bollen et al., 2011).
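The 50% figure can be illustrated with a small simulation. The sketch below is an assumption-laden toy, not part of the study: it generates a driftless random walk with equally likely up and down steps, then measures how often a naive rule (“tomorrow moves in the same direction as today”) is correct. The function names and the starting price are hypothetical.

```python
import random


def simulate_random_walk(n_days, seed=0):
    """Generate prices that move up or down by 1 with equal probability."""
    rng = random.Random(seed)
    price = 100.0  # arbitrary starting level
    prices = [price]
    for _ in range(n_days):
        price += rng.choice((-1.0, 1.0))
        prices.append(price)
    return prices


def momentum_hit_rate(prices):
    """Share of days on which the next move repeats the current move."""
    moves = [later > earlier for earlier, later in zip(prices, prices[1:])]
    hits = sum(today == tomorrow for today, tomorrow in zip(moves, moves[1:]))
    return hits / (len(moves) - 1)
```

Over a long simulated series, `momentum_hit_rate(simulate_random_walk(10000))` should land close to 0,5, because each step is independent of the previous one. A predictor that consistently scores well above that level on real data would therefore count as evidence against a pure random walk.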

Lately, however, economists doubt the efficiency of EMH (Malkiel, 2005), and Bollen et al. (2011) identified two issues with EMH. The first problem with it is that several studies have concluded that stock prices can, to a certain degree, be predicted and do not follow a random walk (Bollen et al., 2011; Malkiel, 2003). The second problem is that, although news is unpredictable, early indicators can be extracted from social media to predict economic indicators (Bollen et al., 2011).

Other theories of stock market prediction have since emerged, such as the Socionomic Theory of Finance (STF), behavioural economics and behavioural finance (Bollen et al., 2011). Subrahmanyam (2007) ties mood to stock market changes through behavioural finance, and Edmans, Garcia and Norli (2007) noticed that stock market changes can be influenced by sporting events, which affect the country as a whole. “This suggests that investor mood (ostensibly negative on cloudy days) affects the stock market” (Subrahmanyam, 2007, p. 17).

Behavioural finance can be defined as follows: “Behavioural finance combines behavioural and financial theory with the aim of analyzing the psychology, behaviour and mood involved in financial decision-making, meaning the results of such research fall within the realms of both psychology and finance” (Wang, Lin & Lin, 2012, p. 696).

2.2.5 Johannesburg Stock Exchange (JSE)

The Johannesburg Stock Exchange (JSE), established in 1887, is South Africa’s stock market and the biggest in Africa (Eita, 2012). The market capitalisation grew from $151 billion in 1998 to $182,6 billion in 2004, with 427 listed companies, making it the seventeenth largest in the world (Eita, 2012). Initiatives to improve the JSE’s functioning were introduced in the late 1990s, including the Stock Exchanges Control Act, an electronic clearing and settlement system and a real-time stock exchange news service (Eita, 2012). The research focused on the ALSI of the JSE. “The JSE All Share Index is calculated based on component share prices that are averaged according to specific rules which are impacted by stock splits and dividends” (Campbell, 2011, p. 4).