So finally I understand how the Pearson Product Moment Correlation Coefficient, also known simply as the correlation coefficient, works.
Firstly, you need to understand the definition of the term correlation. A correlation is a statistical association or relationship between two variables.
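The Pearson correlation coefficient measures the strength and direction of a linear (think: straight-line graph) relationship between two variables. Its value ranges from −1 (a perfect negative straight line) to +1 (a perfect positive straight line), with 0 meaning no linear relationship at all. Here it is calculated using population data.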
How do we get the correlation coefficient?
So, down to business: the correlation coefficient r is the sum of the products of the paired z-scores, divided by the number of pairs, n.
In other words, it is:

$$r = \frac{\sum z_X z_Y}{n}$$
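As a minimal sketch of this formula, the function below takes two lists of z-scores that are assumed to be already computed (one list per variable, in matching order) and averages their pairwise products. The name `r_from_z_scores` and the sample z-scores are illustrative choices, not part of any library.

```python
def r_from_z_scores(z_x, z_y):
    """Average of the pairwise products of two equal-length lists of z-scores."""
    if len(z_x) != len(z_y) or not z_x:
        raise ValueError("need two non-empty lists of the same length")
    n = len(z_x)
    return sum(zx * zy for zx, zy in zip(z_x, z_y)) / n


# Made-up z-scores: whenever one variable is above its mean, the other is
# below its mean by the same number of standard deviations.
print(r_from_z_scores([-1.0, 1.0], [1.0, -1.0]))  # -1.0
```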
What does this mean?
Well, firstly, the standard deviation measures how spread out the values in a dataset are. It is the square root of the variance.
Assuming we know every value in the population (the whole dataset rather than a sample of it), the variance is found by taking the deviation of each value from the mean (x − x̄) and squaring it (so that positive and negative deviations don't cancel each other out). The squared deviations are then summed, and the sum is divided by the number of values, n.
In other words, the standard deviation is defined like so:

$$\sigma = \sqrt{\frac{\sum (x - \bar{x})^2}{n}}$$
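The steps above translate almost line for line into code. This is a small sketch assuming the data are the whole population, so it divides by n; the function names and the sample data are illustrative.

```python
from math import sqrt


def population_variance(values):
    """Mean of the squared deviations from the mean (divides by n)."""
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / n


def population_sd(values):
    """Square root of the population variance."""
    return sqrt(population_variance(values))


data = [2, 4, 4, 4, 5, 5, 7, 9]
print(population_variance(data))  # 4.0
print(population_sd(data))        # 2.0
```

Python's standard library performs the same calculation in `statistics.pstdev`; `statistics.stdev` instead divides by n − 1, which is the usual estimate when the data are only a sample.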
Now, a z-score tells you how many standard deviations a value lies above or below the mean of a particular dataset.
You calculate the z-score of a value by subtracting the mean (x̄) from the value (x), which tells you how far the value is from the mean, and then dividing by the standard deviation (σ).
Thus, the z-score is:

$$z = \frac{x - \bar{x}}{\sigma}$$
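Here is a short sketch that converts a whole dataset to z-scores, again assuming population data (dividing by n). The function name `z_scores` is illustrative.

```python
from math import sqrt


def z_scores(values):
    """Convert each value to (value - mean) / population standard deviation."""
    n = len(values)
    mean = sum(values) / n
    sd = sqrt(sum((v - mean) ** 2 for v in values) / n)
    return [(v - mean) / sd for v in values]


print(z_scores([2, 4, 4, 4, 5, 5, 7, 9]))
# [-1.5, -0.5, -0.5, -0.5, 0.0, 0.0, 1.0, 2.0]
```

For example, 9 is 4 above the mean of 5, and the standard deviation is 2, so its z-score is 2.0. A full set of z-scores always has a mean of 0 and a population standard deviation of 1.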
So if you take the sum of the products of the paired z-scores of the two variables and divide by the number of pairs, you get:

$$r = \frac{\sum z_X z_Y}{n}$$
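Now if you expand the z-scores for the two variables (writing µX and µY for their means, and σX and σY for their standard deviations), you get:

$$r = \frac{\sum \left(\frac{X - \mu_X}{\sigma_X}\right)\left(\frac{Y - \mu_Y}{\sigma_Y}\right)}{n}$$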
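Since σX and σY are the same for every term, they can be taken outside the sum, so this is the same as:

$$r = \frac{\sum (X - \mu_X)(Y - \mu_Y)}{n\,\sigma_X \sigma_Y}$$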
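To check numerically that pulling σX and σY out of the sum changes nothing, this sketch computes r both ways on a small made-up dataset. The function names are illustrative, and the two results may differ in the last few decimal places because of floating-point rounding.

```python
from math import sqrt


def pearson_r_from_z(xs, ys):
    """Average product of z-scores (population standard deviations)."""
    n = len(xs)
    mu_x, mu_y = sum(xs) / n, sum(ys) / n
    sigma_x = sqrt(sum((x - mu_x) ** 2 for x in xs) / n)
    sigma_y = sqrt(sum((y - mu_y) ** 2 for y in ys) / n)
    return sum(((x - mu_x) / sigma_x) * ((y - mu_y) / sigma_y)
               for x, y in zip(xs, ys)) / n


def pearson_r_factored(xs, ys):
    """Same quantity with sigma_x and sigma_y taken outside the sum."""
    n = len(xs)
    mu_x, mu_y = sum(xs) / n, sum(ys) / n
    sigma_x = sqrt(sum((x - mu_x) ** 2 for x in xs) / n)
    sigma_y = sqrt(sum((y - mu_y) ** 2 for y in ys) / n)
    cross = sum((x - mu_x) * (y - mu_y) for x, y in zip(xs, ys))
    return cross / (n * sigma_x * sigma_y)


xs = [1, 2, 3, 4, 5]
ys = [2, 1, 4, 3, 5]
print(pearson_r_from_z(xs, ys))    # approximately 0.8
print(pearson_r_factored(xs, ys))  # approximately 0.8
```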
Now take the standard deviation of X (and do the same for Y):

$$\sigma_X = \sqrt{\frac{\sum (X - \mu_X)^2}{n}}, \qquad \sigma_Y = \sqrt{\frac{\sum (Y - \mu_Y)^2}{n}}$$
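Then substitute these into the equation (while doing so, also changing µX to X̄ and µY to Ȳ). The factors of n cancel, and you get:

$$r = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{n\sqrt{\frac{\sum (X - \bar{X})^2}{n}}\sqrt{\frac{\sum (Y - \bar{Y})^2}{n}}} = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sqrt{\sum (X - \bar{X})^2}\,\sqrt{\sum (Y - \bar{Y})^2}}$$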