Problem Solving and Analysis Tools
CORRELATION
QUALITY TOOLS
Description of Correlation
Correlation is a statistical measure of the strength and direction of the relationship between two or more quantities (variables) that vary together over a period. It shows whether, and how strongly, pairs of variables are related. Correlation coefficients range from +1 to -1. Correlation does not prove or disprove any cause-and-effect (causal) relationship between the variables.
Although some correlations are fairly obvious, data may also contain unsuspected correlations. One may also suspect that correlations exist without knowing which are the strongest. An intelligent correlation analysis can lead to a greater understanding of the data.
Types
There are three types of correlations that are identified:
1. Positive correlation:
When an increase in one variable is accompanied by an increase in the other, and a decrease in one by a decrease in the other. For example, the amount of money that a person possesses might correlate positively with the number of cars that person owns.
2. Negative correlation:
When an increase in one variable is accompanied by a decrease in the other, and vice versa. For example, the level of education might correlate negatively with crime. This means that countries with a higher level of education tend to have lower crime. Note that this doesn't mean that a lack of education causes crime. It could be, for example, that both lack of education and crime have a common cause: poverty.
3. No correlation:
Two variables are uncorrelated when a change in one is not accompanied by a consistent change in the other, and vice versa. For example, among millionaires, happiness is found to be uncorrelated with money. This means that more money does not go together with more happiness.
Techniques for determining correlation:
· Pearson Product Moment Correlation (or "r"), the most commonly used method, assumes the two variables being considered are measured on continuous scales (like the numbers 1, 2, 3, 4, 5, 6, 7 or height or weight).
· Spearman Rank Order Correlation (or "rho") and Kendall's Tau-b (or "tau") correlation are used when the variables are measured as ranks (from highest to lowest or lowest to highest).
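The three techniques listed above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not a reference implementation: the function names (pearson_r, average_ranks, spearman_rho, kendall_tau_b) and the sample data are assumptions made for this example. Spearman's rho is computed here as Pearson's r applied to ranks, and Kendall's tau-b by counting concordant and discordant pairs.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def average_ranks(values):
    """Rank values from 1 to n; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman rank-order correlation: Pearson's r computed on the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))


def kendall_tau_b(x, y):
    """Kendall's tau-b from counts of concordant, discordant and tied pairs."""
    concordant = discordant = ties_x = ties_y = 0
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):
            dx = x[i] - x[j]
            dy = y[i] - y[j]
            if dx == 0 and dy == 0:
                continue  # tied on both variables: excluded from every count
            elif dx == 0:
                ties_x += 1
            elif dy == 0:
                ties_y += 1
            elif dx * dy > 0:
                concordant += 1
            else:
                discordant += 1
    pairs = concordant + discordant
    return (concordant - discordant) / sqrt((pairs + ties_x) * (pairs + ties_y))


# Hypothetical data: y always rises with x, but not along a straight line.
x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]
print("Pearson r:    ", round(pearson_r(x, y), 3))
print("Spearman rho: ", round(spearman_rho(x, y), 3))
print("Kendall tau-b:", round(kendall_tau_b(x, y), 3))
```

Because the sample data always rise together but not along a straight line, the two rank-based measures reach 1 while Pearson's r stays slightly below 1.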
How to use Correlation
Like all statistical techniques, correlation is only appropriate for certain kinds of data. Correlation works for quantifiable data in which numbers are meaningful, usually quantities of some sort. It cannot be used for purely categorical data, such as gender, brands purchased, or favorite color.
Rating Scales
Rating scales are a controversial middle case. The numbers in rating scales have meaning, but that meaning isn't very precise. They are not like quantities. With a quantity (such as dollars), the difference between 1 and 2 is exactly the same as between 2 and 3. With a rating scale, that isn't really the case. You can be sure that your respondents think a rating of 2 is between a rating of 1 and a rating of 3, but you cannot be sure they think it is exactly halfway between. This is especially true if you labeled the mid-points of your scale (you cannot assume "good" is exactly halfway between "excellent" and "fair").
Correlation coefficient:
The main result of a correlation study is called the correlation coefficient (r). It ranges from -1.0 to +1.0. A value close to +1 indicates a strong positive correlation, while a value close to -1 indicates a strong negative correlation. A value near zero shows that the variables are uncorrelated.
[Figure: graphical presentation of positive correlation coefficients (r)]
[Figure: graphical presentation of negative correlation coefficients (r)]
[Figure: graphical presentation of no correlation (r near zero)]
While correlation coefficients are normally reported as r = (a value between -1 and +1), squaring them makes them easier to understand. The square of the coefficient (r squared) is the proportion of the variation in one variable that is related to the variation in the other; multiplying it by 100 gives a percentage. An r of .5 means 25% of the variation is related (.5 squared = .25). An r of .7 means 49% of the variation is related (.7 squared = .49).
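The arithmetic above can be written as a small helper. This is only a sketch; the function name shared_variation_percent is an assumption made for this example.

```python
def shared_variation_percent(r):
    """Return r squared as a percentage of related variation."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and +1")
    return (r ** 2) * 100


# The two values discussed in the text, plus a negative coefficient:
# the sign of r disappears when it is squared.
for r in (0.5, 0.7, -0.5):
    print(f"r = {r:+.1f} -> {shared_variation_percent(r):.0f}% of the variation is related")
```

Note that a negative coefficient gives the same percentage as a positive one of the same size, since squaring removes the sign.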
The following guidelines have been proposed for interpreting Pearson's correlation coefficient.
Strength of association / Coefficient r (positive) / Coefficient r (negative)
Small / 0.1 to 0.3 / -0.1 to -0.3
Medium / 0.3 to 0.5 / -0.3 to -0.5
Large / 0.5 to 1.0 / -0.5 to -1.0
Remember that these values are guidelines; whether an association counts as strong also depends on what is being measured.
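The guideline table can be turned into a simple lookup. The sketch below makes two assumptions that the guidelines themselves do not settle: a value exactly on a boundary (0.3 or 0.5) is placed in the higher category, and coefficients smaller than 0.1 in absolute value are labelled "negligible". The function name association_strength is also hypothetical.

```python
def association_strength(r):
    """Classify a Pearson coefficient using the small/medium/large guidelines."""
    size = abs(r)  # the guidelines are symmetric for positive and negative r
    if size > 1.0:
        raise ValueError("r must lie between -1 and +1")
    if size >= 0.5:
        return "large"
    if size >= 0.3:  # assumption: a boundary value goes to the higher category
        return "medium"
    if size >= 0.1:
        return "small"
    return "negligible"  # assumption: the guidelines give no label below 0.1


for r in (0.2, -0.4, 0.75, 0.05):
    print(r, association_strength(r))
```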
When to use Correlation
· Correlation is used to find a linear relationship between two variables. It can be used with a causal as well as an associative research hypothesis, but it can't be used with an attributive research hypothesis because such a hypothesis is univariate.
· Correlation is used for testing in within-groups studies.
· In economic theory and business, which study relationships between variables, correlation analysis helps in deriving precisely the degree and direction of such relationships.
· Correlation reduces the uncertainty of predictions, making them more reliable and closer to reality.
Tips on use of Correlation
· The variables must be either interval or ratio measurements.
· The variables must be approximately normally distributed.
· There is a linear relationship between the two variables.
· Outliers are either kept to a minimum or are removed entirely.
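The tip about outliers can be illustrated with a short sketch. The data below are made up for this example, and pearson_r is a hand-written helper rather than a library function. Five points with a weak negative relationship are compared with the same points plus one extreme observation.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Hypothetical measurements with a weak negative relationship.
x = [1, 2, 3, 4, 5]
y = [3, 5, 1, 4, 2]
r_without_outlier = pearson_r(x, y)

# The same data with a single extreme point added.
r_with_outlier = pearson_r(x + [20], y + [20])

print("r without outlier:", round(r_without_outlier, 2))
print("r with outlier:   ", round(r_with_outlier, 2))
```

In this made-up data set, one extreme point is enough to turn a weak negative coefficient into a strong positive one, which is why outliers should be examined before Pearson's r is reported.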
Applications of Correlation
· Relationship between height and weight
· Relationship between amount of rainfall and wheat yield
· Relationship between price and demand of a commodity
· Relationship between dose of insulin and blood sugar level
Examples
Height and weight are related; taller people tend to be heavier than shorter people. The relationship isn't perfect. People of the same height vary in weight, and you can easily think of two people you know where the shorter one is heavier than the taller one. Nonetheless, the average weight of people 5'5'' is less than the average weight of people 5'6'', and their average weight is less than that of people 5'7'', etc. Correlation can tell you just how much of the variation in people's weights is related to their heights.
An example of a curvilinear relationship is age and health care. They are related, but the relationship doesn't follow a straight line. Young children and older people both tend to use much more health care than teenagers or young adults. Multiple regression can be used to examine curvilinear relationships.
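A small sketch shows why a linear correlation coefficient can miss a curvilinear relationship. The ages and usage figures below are invented for illustration and deliberately form a symmetric U shape. The helper pearson_r and the choice of a squared, centered age term are assumptions of this example, not a prescribed method.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Hypothetical data: health care use is high for the young and the old.
age = [10, 20, 30, 40, 50, 60, 70]
usage = [9, 4, 1, 0, 1, 4, 9]

# Linear correlation between age and usage.
r_linear = pearson_r(age, usage)

# Correlation with a curved term: squared distance from the middle age.
mid_age = sum(age) / len(age)
age_squared_term = [(a - mid_age) ** 2 for a in age]
r_curved = pearson_r(age_squared_term, usage)

print("r with age:          ", round(r_linear, 2))
print("r with squared term: ", round(r_curved, 2))
```

For this invented data the linear coefficient is zero even though usage depends entirely on age, while the squared term captures the relationship. Adding such a term as an extra predictor is one way a regression model can represent a curve.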
Two scatter plots are given below showing the amount of sleep needed per day by age and its correlation, estimated with a line of best fit. It can be seen that as one grows older, less sleep is needed, but obviously a 40-year-old needs more than 2 hours of sleep per day. This example shows that prediction may be carried out up to a certain time but not for all