Random technical stuff: Pearson Product Moment Correlation Coefficient

Thursday, August 12, 2010

Pearson Product Moment Correlation Coefficient

So finally I understand how the Pearson Product Moment Correlation Coefficient, also known simply as the correlation coefficient, works.

Firstly, you need to understand the definition of the term correlation. A correlation is a statistical association or relationship between two variables.

A correlation coefficient measures the strength and direction of a linear (think: straight line graph) relationship between two variables using population data.

How do we get the correlation coefficient?

So down to business: the correlation coefficient is the sum of the products of the paired z-scores, divided by the number of data pairs.

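In other words, it is:

$$ r = \frac{1}{N} \sum_{i=1}^{N} z_{x_i} z_{y_i} $$

where N is the number of (x, y) pairs, and z_{x_i} and z_{y_i} are the z-scores of the i-th x value and the i-th y value.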

What does this mean?

Well, firstly, the standard deviation allows us to see the variability of data in a dataset. It is the square root of the variance.

Assuming that we know all the values (that is, we have the whole population rather than a sample drawn from it), the variance is found by taking the deviation of each value from the mean (x - x̄) and squaring it (so that positive and negative deviations don't cancel each other out). The squared deviations are then summed, and the sum is divided by the number of values, N.

In other words, the standard deviation is defined like so:

$$ \sigma = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (x_i - \bar{x})^2} $$
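To make the standard deviation steps concrete, here is a minimal Python sketch. It assumes the data is the whole population (so it divides by the number of values). The function name `population_std` and the example numbers are made up for illustration.

```python
import math


def population_std(values):
    """Population standard deviation: square root of the mean squared deviation."""
    n = len(values)
    mean = sum(values) / n
    # Square each deviation from the mean so the signs cannot cancel out,
    # then average the squared deviations to get the variance.
    variance = sum((x - mean) ** 2 for x in values) / n
    return math.sqrt(variance)


data = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up example values, mean is 5
print(population_std(data))  # 2.0
```

Python's standard library has `statistics.pstdev`, which computes the same population standard deviation, so the hand-rolled version here is only meant to show the steps.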

Now a z-score is a way of saying how many standard deviations a value is above or below the mean of a particular data set.

You calculate the z-score of a value by taking the value (x), subtracting the mean (x̄) from it (this tells you how far from the mean the value is, and in which direction), and then dividing by the standard deviation (σ).

Thus, the z-score is:

$$ z = \frac{x - \bar{x}}{\sigma} $$
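Putting the pieces together, the sketch below turns each data set into z-scores and then averages the products of the paired z-scores, which is the definition of the correlation coefficient given earlier. The names `z_scores` and `pearson_r` and the sample data are my own, and the code assumes population data with some spread in each variable (a standard deviation of zero would cause a division by zero).

```python
import math


def z_scores(values):
    """Return the z-score of every value, using the population standard deviation."""
    n = len(values)
    mean = sum(values) / n
    sigma = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    # Assumes sigma > 0, i.e. the values are not all identical.
    return [(v - mean) / sigma for v in values]


def pearson_r(xs, ys):
    """Correlation coefficient: sum of the paired z-score products divided by N."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    zx = z_scores(xs)
    zy = z_scores(ys)
    return sum(a * b for a, b in zip(zx, zy)) / len(xs)


xs = [1, 2, 3, 4, 5]    # made-up example data
ys = [2, 4, 6, 8, 10]   # exactly 2 * x, a perfect straight line
print(pearson_r(xs, ys))        # approximately 1.0
print(pearson_r(xs, ys[::-1]))  # approximately -1.0 (line sloping downwards)
```

A value near +1 means the points lie close to an upward-sloping straight line, a value near -1 means a downward-sloping one, and a value near 0 means there is little or no linear relationship.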