Suppose you have two different stocks, which we'll call S and T. What we need to know is how closely the prices of S and T track each other. That is, when S goes up, does T go up, go down, or stay the same? We'll start with an artificial and easy example. Below are some curves. We'll pretend that these curves are price curves for a few stocks. Of course, real stock price charts are a lot more jagged and random than these curves, but these curves will do for now.
There are four curves above, colored blue (1), pink (2), yellow (3), and green (4). Let's look at them for a while and see what stands out. The first thing we notice is that the blue and yellow curves have exactly the same shape, except the yellow curve goes up higher and down lower. These curves move together perfectly. In mathematics we say they are perfectly correlated, with a correlation of 1. If they were the prices of two stocks, in terms of risk and reward there would be no point in buying both of the stocks. You would make or lose money on either stock at exactly the same time.
The green curve is also just like the blue and yellow curves, except it goes down when the other two go up and up when the other two go down. We say it has a correlation of -1. You can predict any of these three curves once you know the value of one of the curves. If these were stocks, we could use this fact to protect ourselves from price fluctuations. If we bought a bunch of yellow stock and a bunch of green stock, then the price fluctuations would cancel each other out and we would never lose money. The way these curves are set up, we would never make money either. However, here we see that we can potentially select stocks so that they cancel out each other's risk. Things in the real market are never this clean and perfect, but there are opportunities in the market to locate pairs of investments where a substantial amount of risk cancels out and you are left with relatively risk-free profit potential.
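A minimal sketch of this cancellation, using made-up sine-wave prices rather than the curves in the chart. One assumption to note: the yellow stand-in here swings 2.5 times as far as the green one, so the holdings have to be sized to match (2 shares of yellow for every 5 of green) for the cancellation to be exact.

```python
import math

# Stand-in price curves (an assumption, not the chart's actual data):
# twelve samples over one full cycle.
n = 12
xs = [2 * math.pi * i / n for i in range(n)]
yellow = [50 + 25 * math.sin(x) for x in xs]   # big swings
green = [50 - 10 * math.sin(x) for x in xs]    # smaller swings, opposite direction

# 2 shares of yellow swing by +/-50 in total; 5 shares of green swing by -/+50.
portfolio = [2 * y + 5 * g for y, g in zip(yellow, green)]

# Distance between the best and worst day of the combined holding.
spread = max(portfolio) - min(portfolio)
print(f"yellow alone ranges over {max(yellow) - min(yellow):.2f}")
print(f"combined holding ranges over {spread:.2f}")
```

The combined holding stays at essentially the same value every day, which is the "never lose money, never make money" situation described above.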
Finally, the pink curve seems to be completely uncorrelated with any of the other three curves. In fact, we'll see in a few minutes that the correlation of the pink curve with any of the other three is zero. This is again useful as a way to reduce risk - if we bought some pink stock and some blue stock, the risks would frequently cancel out.
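As a rough illustration of how uncorrelated holdings reduce risk, the sketch below splits money evenly between two stand-in curves and compares the standard deviations. The curves are invented sine waves (an assumption, not the chart's data), and `std_dev` is a helper written for this example.

```python
import math


def std_dev(prices):
    """Root mean square distance of the prices from their average."""
    a = sum(prices) / len(prices)
    return math.sqrt(sum((p - a) ** 2 for p in prices) / len(prices))


# Stand-in curves: 200 evenly spaced samples over one full cycle of blue.
n = 200
xs = [2 * math.pi * i / n for i in range(n)]
blue = [50 + 10 * math.sin(x) for x in xs]
pink = [50 + 10 * math.sin(2 * x) for x in xs]   # moves to its own rhythm

# Half the money in each stock.
half_and_half = [(b + p) / 2 for b, p in zip(blue, pink)]

print(f"blue alone:    {std_dev(blue):.2f}")
print(f"pink alone:    {std_dev(pink):.2f}")
print(f"half and half: {std_dev(half_and_half):.2f}")
```

With these particular curves the blend fluctuates noticeably less than either stock alone, but not zero: uncorrelated risks cancel only part of the time, unlike the perfectly opposed pair.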
We can see correlations with our eyes. Now, the question is, can we calculate a number that expresses what we see? Yes. We remember from the previous page that we can calculate a number for a single stock called the variance. To calculate this we subtracted the average price from each daily price, multiplied that by itself, and then averaged the results over all n days:
Variance = Σ (p(i) - A)*(p(i) - A) / n.
For two different stocks we can easily calculate something called the covariance. We'll call their prices S(i) and T(i), and their average prices SA and TA.
Covariance( S, T ) = Σ (S(i) - SA)*(T(i) - TA) / n.
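The two formulas above can be sketched in Python as follows. The function names and the five-day price lists are made up for illustration; they are not real market data.

```python
def mean(prices):
    """Average price: the A (or SA, TA) in the formulas above."""
    return sum(prices) / len(prices)


def variance(prices):
    """Average squared distance of each price from the average price."""
    a = mean(prices)
    return sum((p - a) * (p - a) for p in prices) / len(prices)


def covariance(s, t):
    """Average product of the two stocks' distances from their own averages."""
    if len(s) != len(t):
        raise ValueError("both stocks need the same number of daily prices")
    sa, ta = mean(s), mean(t)
    return sum((si - sa) * (ti - ta) for si, ti in zip(s, t)) / len(s)


# Hypothetical daily prices, invented for this example.
s_prices = [10.0, 12.0, 11.0, 13.0, 14.0]
t_prices = [20.0, 24.0, 22.0, 26.0, 28.0]   # always exactly twice S

print(variance(s_prices))               # 2.0
print(covariance(s_prices, t_prices))   # 4.0
print(covariance(s_prices, s_prices))   # 2.0, same as the variance of S
```

Note that the covariance of a stock with itself is just its variance, since the two formulas then become identical.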
We won't ever use the covariance itself, but just as the variance led us to the far more useful standard deviation, the covariance will lead us to the far more useful correlation.
Our covariance formula has a problem. If we had two stocks which fluctuated around their averages by a large amount, say up and down $50 over the course of a year, and another two stocks which fluctuated around their averages only a little, say up and down $10 over the course of a year, the two pairs would have very different covariances, even if each pair moved together equally well. The first would include terms as large as $2500, whereas the second would have terms no larger than $100. What we would like is a number which is 1 for perfect correlation, -1 for perfect anti-correlation, and 0 for no correlation. We can get these numbers easily by simply scaling the price fluctuations.
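To make the scaling problem concrete, here is a small sketch with two hypothetical pairs of stocks. Within each pair the two stocks move in perfect lockstep; the only difference is the size of the swings ($50 versus $10). All prices are invented for the example.

```python
def covariance(s, t):
    """Average product of each stock's distance from its own average price."""
    n = len(s)
    sa = sum(s) / n
    ta = sum(t) / n
    return sum((si - sa) * (ti - ta) for si, ti in zip(s, t)) / n


# Hypothetical prices, each fluctuating around an average of $100.
big_a = [50.0, 100.0, 150.0, 100.0]      # swings of $50
big_b = [50.0, 100.0, 150.0, 100.0]
small_a = [90.0, 100.0, 110.0, 100.0]    # swings of $10
small_b = [90.0, 100.0, 110.0, 100.0]

print(covariance(big_a, big_b))       # 1250.0
print(covariance(small_a, small_b))   # 50.0
```

Both pairs track each other perfectly, yet the covariances differ by a factor of 25, so the raw covariance tells us more about the size of the swings than about how well the stocks move together.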
The price fluctuations, (p(i) - A), have an average value, but it is always zero, because A is itself the average price. More interestingly, in statistics they have a root mean square value, which we call the standard deviation. If we divide the price fluctuations by the standard deviation, then the price fluctuations of various stocks will all be scaled to a common size. So, here's the formula for correlation:
Correlation( S, T ) = Σ (S(i) - SA)*(T(i) - TA) / (σS * σT * n).
The correlation is just the covariance divided by the two standard deviations. This fits our criteria: the correlation of blue and yellow is 1. If you think about it, the correlation between any curve and itself is obviously 1: in that case the top of the formula is the variance, which is the standard deviation squared, and we divide it by the standard deviation squared. The correlation of green and yellow or green and blue is -1. The difference in size between blue and yellow no longer matters. The correlation between pink and any of the other three curves is 0. There is no correlation between pink and any of the others; pink marches to the beat of a different drummer.
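The correlation formula can be tried out with a short Python sketch. The four curves below are stand-ins built from sine waves (an assumption; they are not the data behind the chart), chosen so that yellow is a taller copy of blue, green is a mirror image, and pink follows a different rhythm. The `correlation` function is written for this example.

```python
import math


def correlation(s, t):
    """Covariance of s and t divided by both standard deviations."""
    n = len(s)
    sa = sum(s) / n
    ta = sum(t) / n
    cov = sum((si - sa) * (ti - ta) for si, ti in zip(s, t)) / n
    sigma_s = math.sqrt(sum((si - sa) ** 2 for si in s) / n)
    sigma_t = math.sqrt(sum((ti - ta) ** 2 for ti in t) / n)
    return cov / (sigma_s * sigma_t)


# Stand-in curves: 200 evenly spaced samples over one full cycle of blue.
n = 200
xs = [2 * math.pi * i / n for i in range(n)]
blue = [50 + 10 * math.sin(x) for x in xs]
yellow = [50 + 25 * math.sin(x) for x in xs]     # same shape, bigger swings
green = [50 - 10 * math.sin(x) for x in xs]      # up when blue is down
pink = [50 + 10 * math.sin(2 * x) for x in xs]   # its own rhythm

curves = {"blue": blue, "yellow": yellow, "green": green, "pink": pink}
for name, curve in curves.items():
    print(f"blue vs {name}: {correlation(blue, curve):.3f}")
```

Up to floating-point rounding, this gives 1 for blue against itself and against yellow, -1 against green, and 0 against pink, regardless of how large each curve's swings are. Note that the function would divide by zero for a perfectly flat price series, since its standard deviation is zero.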