A Study of the Relationships Between Jump Statistics and Trading Volume

Pongpitch Amatyakul

Econ 201FS

Spring 2009

*I have adhered to the Duke Community Standard in completing this report

1. Introduction

Jumps are rare financial events in which asset prices move drastically from one moment to the next. Ever since Merton (1981) showed that asset pricing should incorporate not only the continuous price process but also the jumps that occur, jump detection has been an active area of research, particularly in the past decade as high-frequency data became more accessible and computer processors improved.

This paper aims to find the relationship between trading volume and jumps. Trading volume and price quotes are probably the two indicators that are easiest to observe in the stock market. Relating an easily observed variable such as volume to jumps could provide groundwork for interesting future applications. According to Tauchen and Pitts (1983), volume and price changes are related. Most of the estimators used in jump tests are based on price changes, so there could be some relationship between jumps and volume. Different jump tests from the literature will be used and compared. Each jump test calculates a statistic that is compared to a threshold in order to detect jumps on each day. That statistic will be regressed against volume in order to find any existing relationships.

The remainder of this paper is organized as follows. Section 2 describes the high-frequency stock data and daily volume data that are used. Section 3 discusses market microstructure noise and the sampling interval used in this paper. Section 4 describes the jump tests and the corresponding statistics used to find correlation with volume. Section 5 introduces the volume data and key trends related to volume. Section 6 describes the regressions used to test the relationship between volume and jumps. Section 7 presents the results of the regressions from section 6. Finally, section 8 provides concluding remarks.

2. Data

The data used for this paper are based on minute-by-minute price quotes from the commercial vendor price-data.com. The stocks chosen were the 10 largest companies (based on market capitalization) in the S&P 100 index as of 31 December 2008 that have the full set of data for the last 12 years. The ten stocks are Procter and Gamble (symbol: PG), General Electric (symbol: GE), AT&T (symbol: T), Johnson and Johnson (symbol: JNJ), Microsoft (symbol: MSFT), Pfizer (symbol: PFE), JP Morgan Chase (symbol: JPM), International Business Machines (symbol: IBM), Cisco Systems (symbol: CSCO), and Coca-Cola (symbol: KO). The data cover 9:35am to 4:00pm on every trading day from April 1997 to January 2009. The number of days ranges from 2921 to 2925 across the ten stocks.

The volume data were obtained from Google Finance. Volume is the number of shares traded per day, for every day covered by the price quotes.

3.1 Minimizing market microstructure noise

In Hansen and Lunde (2006), the market microstructure noise is defined to be

$u(t) \equiv p(t) - p^{*}(t)$ (1)

where p*(t) is the latent real log price and p(t) is the observable log price in the market at time t. For many of the estimators used in high-frequency analysis, such as the realized variance, we are interested in the change in observable prices over a sampling interval $\Delta$.

$p(t) - p(t-\Delta) = [p^{*}(t) - p^{*}(t-\Delta)] + [u(t) - u(t-\Delta)]$ (2)

The magnitude of the change in the latent real log price decreases as $\Delta$ approaches zero, but the change in the noise is independent of $\Delta$. This means that as $\Delta$ gets closer and closer to zero, the observed price change consists almost entirely of the change in noise.

Market microstructure noise poses a serious problem for high-frequency data. With noise, estimators such as the realized variance and bipower variation can be biased. According to Merton (1981), the best way to estimate the realized variance is to observe continuous price movements without noise. In theory, if there is no noise, it is best to sample at the highest possible frequency. In reality, however, market microstructure noise plays such a big role at the highest frequencies that the resulting bias is not tolerable.

A simple graphical tool called the ‘volatility signature plot’ was created by Andersen, Bollerslev, Diebold, and Labys (1999). The idea used here is that variance is independent of the sampling frequency at which prices are observed. In a volatility signature plot, the average realized variance (or the equivalent annualized standard deviation) is plotted against different sampling frequencies. The plotted values should be the same at every frequency if noise does not play a role. Andersen et al (2000) found that if the stocks are very liquid, as is the case for all of the stocks used in this paper, then the volatility signature plot will be downward sloping. Realized variance is the sum of the squared returns. The returns are represented by equation 2. As the sampling interval lengthens, the change in the latent price dominates the change in the noise, and the bias in RV becomes small.

Figure 1 is an example of a volatility signature plot for one of the ten stocks chosen for this paper, Coca-Cola. The average of each day’s realized variance is converted into an annualized standard deviation and plotted against the sampling interval, which ranges from the finest possible given the data (1 minute) to 20 minutes. Based on this and several other plots for other stocks, a sampling interval of about 10 minutes was chosen in order to reduce the noise while still capturing the basic price movements.
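
The construction of a volatility signature plot can be sketched in a few lines of Python. This is only an illustration on synthetic data, not the code used for Figure 1: the function names, the 252-day annualization factor, and the simulated prices (a random walk plus independent noise) are all assumptions.

```python
import numpy as np

def realized_variance(log_prices, step):
    """Sum of squared log returns when sampling every `step` observations."""
    returns = np.diff(log_prices[::step])
    return float(np.sum(returns ** 2))

def signature_curve(days, steps, trading_days=252):
    """Average daily RV at each sampling step, as annualized std dev in percent."""
    curve = {}
    for step in steps:
        mean_rv = np.mean([realized_variance(day, step) for day in days])
        curve[step] = 100.0 * np.sqrt(trading_days * mean_rv)
    return curve

# Synthetic one-minute log prices: latent random walk plus i.i.d. noise.
# The day length (385 observations) and both volatilities are arbitrary choices.
rng = np.random.default_rng(0)
days = []
for _ in range(50):
    latent = np.cumsum(rng.normal(0.0, 0.0005, size=385))
    noise = rng.normal(0.0, 0.0005, size=385)
    days.append(latent + noise)

curve = signature_curve(days, steps=[1, 2, 5, 10, 20])
for step, vol in curve.items():
    print(f"{step:>2}-minute sampling: {vol:.2f}% annualized")
```

With noisy prices of this kind the curve falls as the sampling interval grows, which is the downward-sloping shape described above.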

3.2 Tick Size and Signature volatility

This section examines whether the sampling interval should remain constant throughout the sample. Tick size is the minimum increment by which the stock price can move. In theory, the maximum possible error due to rounding is one half of the tick size. If the real price is uniformly distributed and can take any value within the increment, then the average error due to rounding is one fourth of the tick size.
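
The one-half and one-fourth figures can be checked with a short simulation. This is a sketch under the assumptions stated in the text (prices uniformly distributed, quotes rounded to the nearest tick); the price range, sample size, and function name are arbitrary choices made for the example.

```python
import numpy as np

def rounding_errors(prices, tick):
    """Absolute error from rounding each price to the nearest multiple of `tick`."""
    rounded = np.round(prices / tick) * tick
    return np.abs(rounded - prices)

rng = np.random.default_rng(1)
tick = 1.0 / 16.0                                # one sixteenth of a dollar
prices = rng.uniform(40.0, 41.0, size=200_000)   # hypothetical latent prices
errors = rounding_errors(prices, tick)

print("tick size:        ", tick)
print("largest error:    ", errors.max())   # bounded by tick / 2
print("average error:    ", errors.mean())  # close to tick / 4
```

Repeating the calculation with a tick of 0.01 shrinks both numbers in proportion, which is why the rounding component of the noise is expected to be smaller after decimalization.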

Hansen and Lunde (2006) found that noise plays a much smaller role after the tick size dropped from 1/16 of a dollar to decimals in 2001. There were two tick size changes during the sample period. The first came only two months into the data, in June 1997, when the tick size was changed from 1/8 of a dollar to 1/16 of a dollar. The second was in January 2001, when the tick size was decreased to 1 cent.

This naturally divides the data into three parts. Coca-Cola was again chosen, along with Procter and Gamble, for plotting volatility signature plots over the different time periods. It is difficult to compare the three plots, even on the same graph, because of the natural change in volatility across the three periods. Table 1 therefore tabulates the change in volatility at each sampling interval relative to the volatility at the one-minute sampling interval.

Table 1 shows that the drop-off in volatility is most pronounced in the period when the tick size is largest and that the drop-off is much smaller in the last 8 years, when the tick size is in cents. For Coca-Cola, just changing from sampling at 1-minute intervals to 5-minute intervals decreased volatility by 37 percent in 1997, compared to only about 10 percent for the whole data set. This could be misleading because the first period contains the smallest sample, only 36 days, compared to over 1900 days in the period from 2001 to 2009. For PG, the difference in drop-off between the period when the tick size was 1/8 of a dollar and the period when it was 1/16 of a dollar is very slight, and the drop-off is greater for 1/16 of a dollar, again possibly due to the small sample size in the first period. It is nonetheless relatively clear that the smallest changes in volatility across all the sampling frequencies occur after the tick size was reduced to 1 cent.

Although it might be better to compute estimators such as RV at a higher sampling frequency after each tick size change, market microstructure noise consists of more than just rounding error and tick size changes. Other components, such as the bid-ask spread and information asymmetry, are not fully captured. Therefore, in this paper, a sampling interval of 10 minutes was used unless stated otherwise.

4. Jump Test and Jump Test Statistics

In recent years, with greater computing power and readily accessible high-frequency data, several tests have been developed to determine whether stock price movements contain discontinuous processes. The most notable ones include Mancini (2006), Lee and Mykland (2006), Barndorff-Nielsen and Shephard (2006), Jiang and Oomen (2008), and Aït-Sahalia and Jacod (2008). The last three were selected for this paper to explore the relationship between volume and jumps.

First, let us define some variables that will be used throughout the paper. Let s(ti) be the real price at time ti, and let p(ti) be the logarithmic price.

$p(t_i) = \log s(t_i)$ (3)

The return over each time period is defined as follows:

$r(t_i) = p(t_i) - p(t_{i-1})$ (4)

4.1 Barndorff-Nielsen and Shephard Test

The Barndorff-Nielsen and Shephard test (2004, 2006) is one of the most commonly used jump tests in the literature. It uses the relationship between realized variance and bipower variation to detect rare jumps in stock prices. If there are no jumps, bipower variation and realized variance both approach the integrated variance asymptotically as the sampling frequency goes to infinity.

If the sampled observations are indexed by i and there are M sampled observations per day, the daily realized variance is defined as follows.

$RV = \sum_{i=1}^{M} r(t_i)^2$ (5)

In the limit, the realized variance goes to the integrated variance plus the jump component, κ.

(6)

Bipower variation is defined to be the following.

(7)

In the limit, this bipower variation goes just to the integrated variance.

(8)

These asymptotic properties only hold at high sampling frequencies, as is the case in the data for this paper. Our interest lies in the jump component. To isolate the jump term, it is natural to take the difference between the realized variance and the bipower variation. Huang and Tauchen (2005) scale this difference by the realized variance and call the resulting variable the relative jump.

$RJ = \frac{RV - BV}{RV}$ (9)
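
As a rough illustration of how these quantities behave, the sketch below computes realized variance, bipower variation, and the relative jump for one day of returns. The bipower scaling used here (the squared reciprocal of E|x| for a standard normal x, times M/(M-1)) is the form commonly used in this literature and is an assumption about what the paper's definition looks like; the simulated returns and the size of the artificial jump are hypothetical.

```python
import numpy as np

MU1 = np.sqrt(2.0 / np.pi)  # E|x| for a standard normal x

def realized_variance(r):
    """Sum of squared intraday returns."""
    return float(np.sum(r ** 2))

def bipower_variation(r):
    """Scaled sum of products of adjacent absolute returns (assumed scaling)."""
    m = len(r)
    adjacent = np.abs(r[1:]) * np.abs(r[:-1])
    return float(MU1 ** -2 * (m / (m - 1)) * np.sum(adjacent))

def relative_jump(r):
    """Share of realized variance not accounted for by bipower variation."""
    rv = realized_variance(r)
    return (rv - bipower_variation(r)) / rv

rng = np.random.default_rng(2)
quiet_day = rng.normal(0.0, 0.001, 38)  # 38 hypothetical 10-minute returns
jump_day = quiet_day.copy()
jump_day[20] += 0.02                    # add one large artificial jump

print("relative jump, quiet day:", relative_jump(quiet_day))
print("relative jump, jump day: ", relative_jump(jump_day))
```

Because a single large return enters the realized variance squared but enters the bipower variation only through products with its small neighbors, the relative jump rises sharply on the day with the artificial jump.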

In order to normalize this value so that it can be compared across different days, the integrated quarticity has to be estimated. Barndorff-Nielsen and Shephard (2006) recommended using quadpower quarticity.

(10) where

In the limit

(11)

With this information, Huang and Tauchen proposed the ratio max-adjusted test statistic, which is asymptotically normal with variance one and is applied as a one-sided test.

(12)

This is the test statistic that will be used later to find the relationship between volume and jump days. The higher the statistic, the higher the probability that the day contains a jump.
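
A minimal sketch of a ratio max-adjusted statistic is given below. It follows the form usually attributed to Huang and Tauchen (2005), with quad-power quarticity in the denominator; the exact constants and scaling factors are an assumption here and may differ in detail from equation 12. The simulated returns and the 0.1 percent significance level are hypothetical choices for the example.

```python
import numpy as np
from statistics import NormalDist

MU1 = np.sqrt(2.0 / np.pi)  # E|x| for a standard normal x

def ratio_max_adjusted_z(r):
    """Ratio max-adjusted jump statistic for one day of returns (assumed form)."""
    r = np.asarray(r, dtype=float)
    m = len(r)
    a = np.abs(r)
    rv = np.sum(r ** 2)
    bv = MU1 ** -2 * (m / (m - 1)) * np.sum(a[1:] * a[:-1])
    # Quad-power quarticity: products of four adjacent absolute returns.
    qp = m * MU1 ** -4 * (m / (m - 3)) * np.sum(a[3:] * a[2:-1] * a[1:-2] * a[:-3])
    theta = (np.pi / 2.0) ** 2 + np.pi - 5.0
    denom = np.sqrt(theta / m * max(1.0, qp / bv ** 2))
    return float(((rv - bv) / rv) / denom)

rng = np.random.default_rng(2)
quiet_day = rng.normal(0.0, 0.001, 38)  # 38 hypothetical 10-minute returns
jump_day = quiet_day.copy()
jump_day[20] += 0.03                    # add one large artificial jump

critical = NormalDist().inv_cdf(0.999)  # one-sided threshold, chosen for illustration
for name, day in [("quiet", quiet_day), ("jump", jump_day)]:
    z = ratio_max_adjusted_z(day)
    print(name, "day: z =", round(z, 2), "flagged:", z > critical)
```

The regressions later in the paper use the statistic itself rather than the flag, so the threshold only matters when days are classified as jump days.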

4.2 Jiang and Oomen Test

Based partially on the work of Barndorff-Nielsen and Shephard, Jiang and Oomen (2008) devised their own method to test for jump days. It is based on a variable that they call ‘swap variance.’ It is the equivalent of a delta-hedged log contract when there is no discontinuity.

Let’s define some variables. Capitalized R is the geometric return of the real price.

(13)

Swap variance is defined to be

(14)

where S_T is the price at the end of the day and S_0 is the opening price.

In the probability limit, swap variance and realized variance should be equal if there is no discontinuity. The sign of the difference between swap variance and realized variance is also significant. A negative result means that a negative jump occurred and a positive result means that a positive jump occurred.
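
The sign property can be illustrated with a small sketch. It assumes the usual form of swap variance, twice the sum of the differences between simple returns and log returns, which is the same as twice the summed simple returns minus twice the log of the closing price over the opening price. The price paths are made up for the example.

```python
import numpy as np

def swap_variance(prices):
    """Twice the sum of (simple return - log return) over the day (assumed form)."""
    prices = np.asarray(prices, dtype=float)
    simple = prices[1:] / prices[:-1] - 1.0
    log_ret = np.diff(np.log(prices))
    return float(2.0 * np.sum(simple - log_ret))

def realized_variance(prices):
    """Sum of squared log returns over the day."""
    log_ret = np.diff(np.log(np.asarray(prices, dtype=float)))
    return float(np.sum(log_ret ** 2))

# Hypothetical price paths that are flat apart from a single jump.
up_jump = [100.0, 100.0, 100.0, 105.0, 105.0]
down_jump = [100.0, 100.0, 100.0, 95.0, 95.0]

for name, path in [("up", up_jump), ("down", down_jump)]:
    diff = swap_variance(path) - realized_variance(path)
    print(name, "jump: SwV - RV =", diff)
```

For small continuous moves the difference is of third order in the return, so it is dominated by any large jump, and its sign follows the sign of that jump.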

The problem is again to normalize the test statistic so that it is normally distributed, which makes it easy to test whether a given day is a jump day. Jiang and Oomen proposed three simple test statistics that are asymptotically standard normal. This paper uses only one of the three, the ratio test. The formula of the ratio test is as follows.

(15)

The formula takes the ratio of realized variance to swap variance and, since this ratio should equal 1 when there are no jumps, subtracts it from 1. The result is normalized by multiplying by the bipower variation and dividing by the omega term, defined below.

(16)

where μa = E(|x|^a) and x is normally distributed with mean 0 and variance 1.

The test statistic used to find the relationship between volume and jumps is the absolute value of equation 15, since a jump is detected if the absolute value of the statistic is greater than some threshold.

4.3 Aït-Sahalia and Jacod Test

Aït-Sahalia and Jacod (2008) introduce a new and different method to detect jumps. They raise the absolute returns to a high power and sum them. This paper uses a power of 4. The test also uses two different sampling intervals: a base interval and k times the base interval. This paper sets the base interval to 5 and k to 2.

The estimator B is defined to be:

(17)

The test statistic is:

(18)

This test statistic S should converge to 1 if one or more jumps are present and to k^(p/2-1), or 2 for the specification used in this paper, if there are none. The null hypothesis of no jumps is rejected when the test statistic is less than a critical value c. In order to build the c statistic, it is easiest to define some more variables.
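
The two limits can be seen in a short simulation. This is a sketch that assumes B is the sum of absolute returns raised to the power p at a given sampling interval and that S is the ratio of B at the coarse interval to B at the base interval; the function names and both simulated paths are hypothetical.

```python
import numpy as np

def power_variation(log_prices, step, p):
    """Sum of |return|**p when sampling every `step` observations."""
    returns = np.diff(log_prices[::step])
    return float(np.sum(np.abs(returns) ** p))

def asj_statistic(log_prices, base_step=5, k=2, p=4):
    """Ratio of power variation at the coarse interval to the base interval."""
    coarse = power_variation(log_prices, k * base_step, p)
    fine = power_variation(log_prices, base_step, p)
    return coarse / fine

rng = np.random.default_rng(5)

# A long continuous path (random walk) with no jumps.
continuous = np.cumsum(rng.normal(0.0, 0.0003, 200_001))

# A path that is flat except for one jump.
pure_jump = np.zeros(381)
pure_jump[203:] = 0.02

print("no jumps:  S =", asj_statistic(continuous))
print("one jump:  S =", asj_statistic(pure_jump))
```

The continuous path is made far longer than a trading day only so that the ratio settles near its limit of 2; on a single day of data the statistic is much noisier, which is why a critical value c is needed.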

The variable O is defined as:

(19)

is defined the same way as in Equation 16, , where both X and Y are standard normals independent of each other. The following variable A is the equivalent of Barndorff-Nielsen and Shephard’s multipower variation

(20)

where v=p/(p+1) and q=p+1.

Another variable V is defined to be:

(21)

With those variables defined, we can now define c

(22)

If the statistic S in equation 18 is less than c in equation 22, then a jump is detected. The statistic chosen to find the relationship between volume and jump days is therefore c-S. The higher the c-S statistic, the more likely it is that the stock jumped on that day.

5. Notes on volume

Volume was plotted against time for all 10 stocks. Volume is normally in the range of millions to tens of millions of shares per day. This means that all the stocks are highly liquid, as expected since they are the top stocks in the S&P 100. One of the trends observed is that volume spikes at the end of 2008 and in early 2009. This is true for all the stocks except IBM. Overall, most of the stocks show an upward trend through the sample.

There are a few things to note for individual stocks. Pfizer and AT&T seemed to have a rather large upward trend in the data set. IBM’s volumes spike up the most around 2000 and never reach that height again. Microsoft and Cisco’s volume trends are very similar; they have very high volatilities and look the same throughout the sample except for the last few months.

It was of interest to see what the volume trends were for each day of the week. These are plotted in figure 2 for 4 randomly selected stocks: Coca-Cola, Johnson and Johnson, Procter and Gamble, and AT&T. The main trend is that volume is lowest on Monday. Volume appears to be randomly distributed across the other days. Simple regressions of volume on day-of-the-week dummies were run for several of the stocks, and the results seemed to confirm this. The only coefficient that is almost always statistically significant is the one for Monday, and it is always negative. This will play a role in the regressions that follow.
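
A day-of-the-week regression of this kind can be sketched as follows. The data are synthetic, with a Monday effect built in by construction, and the choice of log volume as the dependent variable, Friday as the omitted baseline, and the helper name are all assumptions made for the example.

```python
import numpy as np

def ols_with_tstats(X, y):
    """Ordinary least squares coefficients and their t-statistics."""
    n, k = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (n - k)          # residual variance
    cov = sigma2 * np.linalg.inv(X.T @ X)     # coefficient covariance matrix
    return beta, beta / np.sqrt(np.diag(cov))

rng = np.random.default_rng(3)
n_days = 2920
weekday = np.arange(n_days) % 5               # 0 = Monday ... 4 = Friday, no holidays
# Synthetic log volume that is lower on Mondays by construction.
log_volume = 16.0 - 0.08 * (weekday == 0) + rng.normal(0.0, 0.3, n_days)

# Intercept plus dummies for Monday to Thursday; Friday is the omitted baseline.
dummies = [(weekday == d).astype(float) for d in range(4)]
X = np.column_stack([np.ones(n_days)] + dummies)
beta, tstats = ols_with_tstats(X, log_volume)

for name, b, t in zip(["const", "Mon", "Tue", "Wed", "Thu"], beta, tstats):
    print(f"{name:>5}: coef = {b:+.4f}, t = {t:+.2f}")
```

With log volume on the left-hand side, each dummy coefficient is approximately the percentage difference in volume relative to the baseline day.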

6. The regressions

Now that all of the statistics have been defined, it is time to find the relationships between the jump statistics and trading volume. The first model is a simple one-variable regression of the log of volume on each jump statistic.

(23)

(24)

(25)

This first model probably suffers from omitted variable bias. More factors affect volume, and daily volumes are not independent of each other. The plots in section 5 show that volume tends to cluster: high-volume days tend to follow high-volume days. It was also argued in section 5 that Monday probably has an effect on volume and should be included as one of the regressors. As a result, a new model incorporating these variables is created. This model includes the volume of the day before, the volume of the week before, and a dummy (0 or 1) variable for Monday, as follows:

(26)

(27)

(28)
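
The two specifications can be compared on synthetic data with the sketch below. It is not the estimation code used in this paper: the jump statistic is a random placeholder, the lagged volumes are assumed to enter in logs, "the week before" is taken to mean five trading days earlier, and every coefficient in the simulation is made up.

```python
import numpy as np

def ols(X, y):
    """Return OLS coefficients and the adjusted R-squared."""
    n, k = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
    return beta, 1.0 - (1.0 - r2) * (n - 1) / (n - k)

rng = np.random.default_rng(4)
n = 3000
stat = rng.normal(0.0, 1.0, n)                  # placeholder jump statistic
monday = (np.arange(n) % 5 == 0).astype(float)  # Monday dummy, no holidays

# Simulated log volume with persistence, a Monday effect, and a jump-statistic effect.
log_v = np.full(n, 10.0)
for t in range(5, n):
    log_v[t] = (2.0 - 0.05 * stat[t] + 0.6 * log_v[t - 1]
                + 0.2 * log_v[t - 5] - 0.08 * monday[t]
                + rng.normal(0.0, 0.2))

y = log_v[5:]
ones = np.ones(n - 5)
X_simple = np.column_stack([ones, stat[5:]])
X_full = np.column_stack([ones, stat[5:], log_v[4:-1], log_v[:-5], monday[5:]])

beta_simple, adj_simple = ols(X_simple, y)
beta_full, adj_full = ols(X_full, y)
print("simple model: stat coef =", beta_simple[1], "adj R2 =", adj_simple)
print("full model:   stat coef =", beta_full[1], "adj R2 =", adj_full)
print("full model:   Monday coef =", beta_full[4])
```

Both models are fitted on the same sample (days 5 onward) so that their adjusted R-squared values are comparable.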

7. Results

The results of the simple regressions corresponding to equations 23 through 25 are tabulated in table 2. For the Barndorff-Nielsen and Shephard test, five of the ten stocks have a negative relationship between volume and jump statistic that is significant at the 5 percent level. The other five are not significant. For the Jiang-Oomen swap variance test, all the coefficients are significant except for Microsoft. Seven showed a negative relationship between the probability of jumping and volume on that day, and two showed a positive relationship. For the last test, the Aït-Sahalia-Jacod test, nine of the relationships were significant and all pointed towards a negative correlation between jump statistic and volume.

The simple regressions therefore suggest a negative relationship between jump statistic and volume. However, as stated in section 6, this model is hardly complete. A model with more regressors was introduced in equations 26-28, and the results of that multiple regression are presented in table 3. The first thing to note is that the adjusted R-squared is considerably higher than in the previous model: it is at least ten times the R-squared of the simple model. Next, consider the coefficients on the newly included variables. The Monday variable is significant at the 5% level for all of the stocks except Microsoft and Cisco. As noted earlier, these two stocks have unique volume trends, with the largest day-to-day variability in the sample. For every other stock in the sample, trading volume tends to be 7 to 10 percent lower if the day is a Monday.