Random technical stuff: August 2010

Tuesday, August 31, 2010

My app in Ubuntu crashed, but I didn't have debug symbols...

Say you've enabled Apport on Ubuntu, and you dutifully report segfaults to Launchpad.

Well, normally they won't do you much good without debug symbols. But the bug is already filed! What do you do?

Easy.

$ sudo apt-get install apport-retrace
$ sudo apport-retrace -o /tmp/trash \
> /var/crash/_usr_lib_openoffice_program_soffice.bin.1000.crash

Saturday, August 14, 2010

Active Directory trusts

An excellent article on AD trusts can be found at Tech Republic. Another one can be found at Redmond Magazine.

Friday, August 13, 2010

For posterity

The following is a very interesting Slashdot post on the issues of the NBN and Gigabit speeds...

Expensive, if done right.

Thursday, August 12, 2010

Pearson Product Moment Correlation Coefficient

So finally I understand how the Pearson product-moment correlation coefficient, also known simply as the correlation coefficient, works.

Firstly, you need to understand the definition of the term correlation. A correlation is a statistical association or relationship between two variables.

A correlation coefficient measures the strength and direction of a linear (think: straight line graph) relationship between two variables using population data.

How do we get the correlation coefficient?

So down to business: the correlation coefficient is the sum of the product of the z-scores over the number of samples.

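In other words, writing z_X and z_Y for the z-scores of the two variables and N for the number of values, it is:

$$r = \frac{\sum z_X z_Y}{N}$$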

What does this mean?

Well, firstly, the standard deviation allows us to see the variability of data in a dataset. It is the square root of the variance.

Assuming that we know every value in the data set (that is, we have the whole population rather than a sample drawn from it), the variance is found by taking the deviation of each value from the mean (x - x̄) and squaring it, so that positive and negative deviations don't cancel each other out. The squared deviations are then summed, and the sum is divided by the number of values.

In other words, with N the number of values, the standard deviation is defined like so:

$$\sigma = \sqrt{\frac{\sum (x - \bar{x})^2}{N}}$$
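As a rough sketch of that definition in Python, assuming the population form (dividing by N rather than N - 1); the function name `population_std` and the sample data are made up for illustration:

```python
import math

def population_std(values):
    """Population standard deviation: square root of the mean squared deviation."""
    n = len(values)
    mean = sum(values) / n
    # Square each deviation from the mean so the signs cannot cancel out.
    variance = sum((x - mean) ** 2 for x in values) / n
    return math.sqrt(variance)

data = [2, 4, 4, 4, 5, 5, 7, 9]  # illustrative data; the mean is 5
print(population_std(data))  # 2.0
```

The squared deviations here sum to 32, so the variance is 32 / 8 = 4 and the standard deviation is 2.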

Now a z-score is a way of saying how many standard deviations a value lies above or below the mean of a particular data set.

You calculate the z-score of a value by taking the value (x) and subtracting the mean (x̄) from it (this tells you how far the value is from the mean, and in which direction), and then dividing by the standard deviation (σ).

Thus, the z-score is:

$$z = \frac{x - \bar{x}}{\sigma}$$
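A minimal sketch of the z-score calculation, again assuming the population standard deviation; `z_scores` is a hypothetical helper name and the data is illustrative:

```python
import math

def z_scores(values):
    """Return each value's distance from the mean, in standard deviations."""
    n = len(values)
    mean = sum(values) / n
    sigma = math.sqrt(sum((x - mean) ** 2 for x in values) / n)
    # Subtract the mean from the value, then divide by the standard deviation.
    return [(x - mean) / sigma for x in values]

print(z_scores([2, 4, 4, 4, 5, 5, 7, 9]))
# [-1.5, -0.5, -0.5, -0.5, 0.0, 0.0, 1.0, 2.0]
```

A negative z-score means the value is below the mean, a positive one means it is above. Note that the sketch divides by zero if every value is identical, since the standard deviation is then 0.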

So if you take the sum of the products of the z-scores of the two variables and divide by the number of values (N), you get:

$$r = \frac{\sum z_X z_Y}{N}$$
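The same calculation can be sketched in Python. The names `z_scores` and `pearson_r` are illustrative, and the sketch assumes paired lists of equal length with population standard deviations:

```python
import math

def z_scores(values):
    """Z-scores using the population standard deviation (divide by N)."""
    n = len(values)
    mean = sum(values) / n
    sigma = math.sqrt(sum((x - mean) ** 2 for x in values) / n)
    return [(x - mean) / sigma for x in values]

def pearson_r(xs, ys):
    """Correlation coefficient as the mean product of paired z-scores."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same number of values")
    zx, zy = z_scores(xs), z_scores(ys)
    return sum(a * b for a, b in zip(zx, zy)) / len(xs)

xs = [1, 2, 3, 4, 5]
ys = [2, 4, 6, 8, 10]    # exactly 2 * x: a perfect positive linear relationship
print(pearson_r(xs, ys))  # approximately 1.0 (floating point may leave it a hair off)
```

When both z-scores in a pair have the same sign the product is positive, which pushes r towards +1. Opposite signs push it towards -1.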

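Now if you expand the z-scores for the two variables, writing µ_X and µ_Y for the means and σ_X and σ_Y for the standard deviations, then you get:

$$r = \frac{1}{N} \sum \left(\frac{X - \mu_X}{\sigma_X}\right) \left(\frac{Y - \mu_Y}{\sigma_Y}\right)$$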

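Because σ_X and σ_Y are the same for every term in the sum, they can be moved outside it, so this is the same as:

$$r = \frac{\sum (X - \mu_X)(Y - \mu_Y)}{N \sigma_X \sigma_Y}$$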

Now take the definition of the standard deviation for the variable X (and apply the same definition to the variable Y):

$$\sigma_X = \sqrt{\frac{\sum (X - \mu_X)^2}{N}}$$

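And then substitute this into the equation (and while doing so also change µ_X to X̄ and µ_Y to Ȳ). The N outside the square roots cancels against the two 1/N factors inside them, and you get:

$$r = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sqrt{\sum (X - \bar{X})^2 \sum (Y - \bar{Y})^2}}$$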
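As a quick numerical sanity check, here is a sketch that computes r both ways: as the mean product of z-scores, and as the sum of products of deviations divided by the square root of the product of the summed squared deviations. The function names and the data are made up for illustration:

```python
import math

def r_from_z_scores(xs, ys):
    """r as the sum of products of z-scores, divided by N."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sigma_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / n)
    sigma_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / n)
    return sum(((x - mean_x) / sigma_x) * ((y - mean_y) / sigma_y)
               for x, y in zip(xs, ys)) / n

def r_from_deviations(xs, ys):
    """r after substituting the standard deviations: no explicit N remains."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sum_sq_x = sum((x - mean_x) ** 2 for x in xs)
    sum_sq_y = sum((y - mean_y) ** 2 for y in ys)
    return numerator / math.sqrt(sum_sq_x * sum_sq_y)

xs = [1, 2, 3, 4, 5]
ys = [2, 1, 4, 3, 5]
print(r_from_z_scores(xs, ys))    # approximately 0.8
print(r_from_deviations(xs, ys))  # approximately 0.8
```

For this data the deviation products sum to 8 and each sum of squared deviations is 10, so r = 8 / sqrt(10 * 10) = 0.8 either way.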

And down the rabbit hole I go...

Currently I am reading the book "Statistics for the Utterly Confused". So far it's been excellent - for instance, I now have a good understanding of standard deviation, variability, percentiles and even box-plots.

And so now I'm starting to learn about bivariate data, which of course brings me to the correlation coefficient. Or, more specifically, the Pearson product-moment correlation coefficient, for which they present the following spectacular and complex formula:

Of course, there is no attempt to explain why this formula works. So now I'm looking up Wikipedia on this topic. Sadly, Wikipedia does not have a good broad overview of this that I can see, and assumes prior knowledge, so I'm now in the process of following a trail of topics (from bottom to top):

And, as Wikipedia doesn't seem to want to explain Vectors well, I'm also reading the following introduction to vectors.

Update: A Google search turned up a much simpler version of this formula (who knew?), which is explained here.

Tuesday, August 10, 2010

Undocumented optimizer feature in SQL Server 2005/2008

An interesting series of posts has been done by Paul White...

This is the most interesting...
