Spearman Rank Correlation

Introduction

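What is Spearman's Rank Correlation Coefficient?

Spearman's Rank Correlation Coefficient measures the strength of the link between two sets of data. In this example we will look at the strength of the link between the distance across a meander and the depth of the river. Written in mathematical notation, the Spearman Rank formula is:

R = 1 - (6 × Σd²) / (n³ - n)

where d is the difference between the two ranks given to each site, Σd² is the sum of the squared differences, and n is the number of sites. R lies between -1 (a perfect negative link) and +1 (a perfect positive link).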

Hypothesis

The hypothesis must be written in a clear and concise way so that other people can easily understand the aims of the investigation.

We would expect to find that the depth of the water increases as distance from the inner bank of the meander increases. In other words, the deeper water will be found on the outside of the meander.

What Can Go Wrong?

Having decided upon the wording of the hypothesis, the researcher should consider whether there are any other factors that may influence the study.

Looking again at the example of river depth and distance from the bank, the following additional factors could be considered. This is not a full list, just a group of examples.

a) Is this a 'natural' river or has it been subjected to human intervention such as flood control, dredging or water extraction?

b) What is the river bed made of? Rivers running on solid rock are less likely to demonstrate depth changes due to erosional forces than those running over softer materials such as alluvium or pebbles.

c) How far apart are the measurements going to be? A spacing of 50cm was used in the example since this is often a good compromise between work load and amount of detail. Spacing of 10cm would give more detailed information, but would require many more measurements.

The researcher should always mention in their project any such factors which may influence the results of the investigation.

Gathering The Data

The hypothesis is written, the study area has been chosen and as many as possible of the potential problems have been solved. The practical part of the research can now begin.

Before you start gathering data, be sure you know exactly what you need to record. Decide how you will write down your results, and make sure that you write them down immediately. Remember that conditions can change from day to day, so try to collect all your data under similar conditions. This is especially true when dealing with rivers, where a change in weather can dramatically alter the data being gathered.

Presenting The Data

Getting Results

The first thing to do is to enter the data you have recorded.
Distance across the river from the bank is easy since it is a progression in 50cm jumps. The depth readings for each location are then entered in the depth column.

Ranking is achieved by giving the ranking '1' to the biggest number in a column, '2' to the second biggest value and so on. The smallest value in the column gets the last place in the ranking, which is the largest rank number. This should be done for both sets of measurements.
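The ranking rule can be sketched in a few lines of Python. This is only an illustration: the function name rank_descending and the depth readings are made up for the example and are not the article's data. It also assumes that no two readings are equal, since tied values would need special handling.

```python
def rank_descending(values):
    """Return the rank of each value, where rank 1 marks the largest value.

    Assumes every value is different (ties are not handled here).
    """
    ordered = sorted(values, reverse=True)  # biggest value first
    return [ordered.index(v) + 1 for v in values]


# Hypothetical depth readings in cm, invented for illustration only.
depths_cm = [12, 30, 25, 41]
print(rank_descending(depths_cm))  # [4, 2, 3, 1]
```

The deepest reading (41) is ranked 1 and the shallowest (12) is ranked 4, matching the rule described above.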

The remainder of the table can then be filled in. Let's look at each entry in turn.

1) Difference in ranks: This is the difference between the ranks of the two values on each row of the table. The rank of the second value (depth) is subtracted from the rank of the first (distance from the bank).
Using our example table and looking at the values recorded 300cm from the bank, you can see that the distance is ranked 5 and the depth is ranked 6.
This gives a difference in ranks of 5 - 6 = -1.

2) Difference²: Square each difference in ranks and enter the result in the next column. Squaring removes any negative numbers, so -1 becomes 1.
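As a rough sketch, the difference and squared-difference columns could be filled in like this in Python. The function name squared_rank_differences is an assumption made for the example. The only ranks used are those quoted in the text for the reading taken 300cm from the bank.

```python
def squared_rank_differences(distance_ranks, depth_ranks):
    """Return a (d, d squared) pair for each row.

    d is the distance rank minus the depth rank, as described in the text.
    """
    return [(a - b, (a - b) ** 2) for a, b in zip(distance_ranks, depth_ranks)]


# The 300cm row from the text: distance ranked 5, depth ranked 6.
print(squared_rank_differences([5], [6]))  # [(-1, 1)]
```

Squaring turns the difference of -1 into 1, so the sign of the difference no longer matters when the column is added up.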

a) Now to put all these values into the formula, starting with the top line.

Find the sum of the d² values (Σd²) by adding up all the values in the Difference² column. In our example this is 4. Multiplying this by 6 gives 24.

b) Now for the bottom line of the equation.

The value n is the number of sites at which you took measurements. In our example this is 10. Substituting this value into n³ - n we get 1000 - 10 = 990.

c) We now have a formula like this: R = 1 - (24 / 990), which gives a value for R of approximately 0.976.
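The arithmetic in steps a) to c) can be checked with a short Python sketch. The function name spearman_from_sum is an assumption for illustration. The inputs are the two figures given in the text: a Difference² total of 4 and 10 measurement sites.

```python
def spearman_from_sum(sum_d_squared, n):
    """Apply the Spearman Rank formula: 1 - (6 * sum of d squared) / (n cubed - n)."""
    top_line = 6 * sum_d_squared   # 6 x 4 = 24 in the example
    bottom_line = n ** 3 - n       # 1000 - 10 = 990 in the example
    return 1 - top_line / bottom_line


coefficient = spearman_from_sum(4, 10)
print(round(coefficient, 3))  # 0.976
```

A result close to +1 like this one points to a strong positive link between the two sets of ranks.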

This value doesn't mean much on its own. It must be looked up in a Spearman Rank significance table. In our example, with the number of sites in our sample taken into account, a value of about 0.976 gives a significance level of better than 0.05. That means the probability of a link this strong arising by chance is less than 1 in 20, so you can be over 95% confident that this result was not obtained by chance.
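Looking a value up in a significance table amounts to a simple comparison, sketched below in Python. The critical value of 0.648 is an assumption: it is the figure commonly tabulated for 10 pairs at the 0.05 level (two-tailed), so check it against the table you are actually using. The function and constant names are also made up for this example.

```python
def is_significant(coefficient, critical_value):
    """Return True if the size of the coefficient reaches the critical value."""
    return abs(coefficient) >= critical_value


# Assumed table entry for n = 10 at the 0.05 level (two-tailed).
# Confirm this against your own significance table before relying on it.
CRITICAL_VALUE_N10_P05 = 0.648

coefficient = 1 - 24 / 990  # the worked example from the text
print(is_significant(coefficient, CRITICAL_VALUE_N10_P05))  # True
```

Because the example coefficient is well above the assumed critical value, the result counts as significant at the 0.05 level.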