Tuesday, December 28, 2010

My greatest claim to fame

Sadly, my greatest claim to fame is that I invented the [citation needed] tag. I'll stop sobbing in a corner one day.

P.S. if anyone writes [Citation needed], then [Citation provided].

Markov Chain generator

As it turns out, with a sufficiently dense input text, a Markov Chain generator is actually reasonably easy to implement.

From page 62 of The Practice of Programming:

set w1 and w2 to the first two words in the text
print w1 and w2
loop:
    randomly choose w3, one of the successors of prefix w1 w2 in the text
    print w3
    replace w1 and w2 by w2 and w3
    repeat loop
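
For what it's worth, here is a minimal Python sketch of that loop. The function names (build_table, generate), the max_words cut-off and the sample sentence are my own choices for illustration and are not from the book; the sketch simply stops when the current prefix has no successor.

```python
import random
from collections import defaultdict


def build_table(words):
    """Map each two-word prefix to the list of words that follow it."""
    table = defaultdict(list)
    for w1, w2, w3 in zip(words, words[1:], words[2:]):
        table[(w1, w2)].append(w3)
    return table


def generate(words, max_words=50, rng=random):
    """Emit up to max_words words by walking the prefix table."""
    if len(words) < 2:
        return list(words)
    table = build_table(words)
    w1, w2 = words[0], words[1]
    output = [w1, w2]
    while len(output) < max_words:
        successors = table.get((w1, w2))
        if not successors:  # this prefix only occurs at the end of the text
            break
        w3 = rng.choice(successors)
        output.append(w3)
        w1, w2 = w2, w3  # replace w1 and w2 by w2 and w3
    return output


text = "the quick brown fox jumps over the quick brown dog"
print(" ".join(generate(text.split(), max_words=20)))
```

Keeping repeated successors in the list (rather than a set) means a word that follows a prefix more often is proportionally more likely to be chosen.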

Sunday, December 26, 2010

Sargability of monotonic functions in SQL

An excellent writeup by Quassnoi about search arguments of monotonic functions and how most query engines don't handle them, and how they should.

Wednesday, December 1, 2010

SQL Server Management Studio 2008 R2

While it's a great idea to include a debugger to step through T-SQL code, it's not very well done. I frequently get a window handle error, and then I either need to reboot the client or restart SSMS.

Microsoft - do some QA testing before releasing cool new features like this. It's tantamount to baiting me!

Thursday, October 21, 2010

Sequence tables in SQL Server 2005

Interesting post by Paul White about Sequence Tables in SQL Server.

Tuesday, October 19, 2010

Mistake in Wikipedia article on the Builder pattern

The following article on the Builder Design Pattern on Wikipedia is sort of OK, except that the Java example provided is inaccurate.

The abstract PizzaBuilder class uses abstract functions. That's not really recommended - if you subclass this then it means that you have to implement all the functions. But really, the pattern is meant to be more flexible than that - if you don't want to use the function then you shouldn't have to use it. Therefore, as the Gang of Four say in their book on page 101, the build methods should be left intentionally empty and not declared as abstract functions, "letting clients override only the operations they are interested in".
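
A rough sketch of the idea, in Python rather than Java to keep it short. The method names (build_dough, build_sauce, build_topping, get_pizza), the PlainBaseBuilder subclass and the construct function are made up for illustration; only the PizzaBuilder name comes from the Wikipedia example. The point is that the base class gives every build step an empty body, so a subclass overrides only what it cares about.

```python
class PizzaBuilder:
    """Base builder: every build step defaults to doing nothing."""

    def __init__(self):
        self.parts = []

    def build_dough(self):
        pass  # intentionally empty, not abstract

    def build_sauce(self):
        pass

    def build_topping(self):
        pass

    def get_pizza(self):
        return list(self.parts)


class PlainBaseBuilder(PizzaBuilder):
    """Only cares about dough; the sauce and topping steps stay no-ops."""

    def build_dough(self):
        self.parts.append("thin dough")


def construct(builder):
    """Director: calls every step, whatever the builder chose to override."""
    builder.build_dough()
    builder.build_sauce()
    builder.build_topping()
    return builder.get_pizza()


print(construct(PlainBaseBuilder()))  # ['thin dough']
```

Had the three steps been declared abstract, PlainBaseBuilder would have needed two pointless empty overrides just to be instantiable.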

Monday, October 18, 2010

ICloneable: Microsoft's big mistake

So here's the deal: if you implement ICloneable in a public API, you are making a big mistake. Why? Because there's no way of knowing if the Clone() function will do a deep or shallow copy.

As it turns out, even Microsoft know that they have made a mistake with this interface. Oopsy!
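
To see why the ambiguity matters, here is a small illustration in Python (not C#) using the standard copy module, which at least makes the caller pick between copy.copy and copy.deepcopy. The Order class and its fields are hypothetical. A Clone() whose contract doesn't say which of these two behaviours you get leaves the caller guessing.

```python
import copy


class Order:
    def __init__(self, customer, items):
        self.customer = customer
        self.items = items  # a mutable list, shared by any shallow copy


original = Order("alice", ["widget"])
shallow = copy.copy(original)    # new Order, same underlying items list
deep = copy.deepcopy(original)   # new Order, its own copy of the items list

original.items.append("gadget")

print(shallow.items)  # ['widget', 'gadget'] - the change leaks through
print(deep.items)     # ['widget']
```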

Wednesday, September 15, 2010

Poor man's LAG() and LEAD()

How to do LAG() and LEAD() in SQL Server 2008.

Recursive queries vs. nonrecursive queries; and consistent reads vs. current reads

Recursive statements vs. non-recursive statements: best to quote Tom Kyte on this one!

non-recursive statements are statements issued by the client to the server.

recursive statements are statements executed by that non-recursive call.

non-recursive statements can be sql, or plsql - they are just "a statement submitted by client to server"

recursive statements can be plsql, or sql - they are just "a statement executed by the statement submitted by the client to the server"

Consistent reads vs. current reads - got the following from the Oracle-l mailing list from Mark Boback:

A 'db block get' is a current mode get. That is, it's the most up-to-date
copy of the data in that block, as it is right now, or currently. There
can only be one current copy of a block in the buffer cache at any time.
Db block gets generally are used when DML changes data in the database.
In that case, row-level locks are implicitly taken on the updated rows.
There is also at least one well-known case where a select statement does
a db block get, and does not take a lock. That is, when it does a full
table scan or fast full index scan, Oracle will read the segment header
in current mode (multiple times, the number varies based on Oracle version).

A 'consistent get' is when Oracle gets the data in a block which is consistent
with a given point in time, or SCN. The consistent get is at the heart of
Oracle's read consistency mechanism. When blocks are fetched in order to
satisfy a query result set, they are fetched in consistent mode. If no
block in the buffer cache is consistent to the correct point in time, Oracle
will (attempt to) reconstruct that block using the information in the rollback
segments. If it fails to do so, that's when a query errors out with the
much dreaded, much feared, and much misunderstood ORA-1555 "snapshot too old".

Tuesday, August 31, 2010

My app in Ubuntu crashed, but I didn't have debug symbols...

Say you've enabled Apport on Ubuntu, and you dutifully report segfaults to Launchpad.

Well, normally they won't do you much good without debug symbols. But the bug is already filed! What do you do?


$ sudo apt-get install apport-retrace
$ sudo apport-retrace -o /tmp/trash \
> /var/crash/_usr_lib_openoffice_program_soffice.bin.1000.crash

Saturday, August 14, 2010

Active Directory trusts

An excellent article on AD trusts can be found at Tech Republic. Another one can be found at Redmond Magazine.

Friday, August 13, 2010

For posterity

The following is a very interesting slashdot post on the issues of the NBN and Gigabit speeds...

Expensive, if done right.

Thursday, August 12, 2010

Pearson Product Moment Correlation Coefficient

So finally I understand how the Pearson Product Moment Correlation Coefficient, also known simply as the correlation coefficient, works.

Firstly, you need to understand the definition of the term correlation. A correlation is a statistical association or relationship between two variables.

A correlation coefficient measures the strength and direction of a linear (think: straight line graph) relationship between two variables using population data.

How do we get the correlation coefficient?

So down to business: the correlation coefficient is the sum of the products of the z-scores over the number of samples.

In other words, it is:

$$r = \frac{\sum z_X z_Y}{N}$$

What does this mean?

Well, firstly, the standard deviation allows us to see the variability of data in a dataset. It is the square root of the variance.

Assuming that we know all the values in the sample, then the variance is actually given by taking the deviation of each value from the mean (x - x̄) and squaring it (to ensure that the signs don't cancel themselves out...). Each squared deviation is then summed and the sum is then divided by the number of values in the sample.

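In other words, the standard deviation is defined like so (writing µ for the mean and N for the number of values):

$$\sigma = \sqrt{\frac{\sum (x - \mu)^2}{N}}$$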

Now a z score is a way of saying how many standard deviations a value is above or below the mean value of a particular data set.

You calculate the z score of a value by taking the value (x) and subtracting the mean (µ) from it (this gives you how far from the mean the value is) and then dividing by the standard deviation (σ).

Thus, the z-score is:

$$z = \frac{x - \mu}{\sigma}$$

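So if you take the sum of the products of the z scores of the two variables and divide by the number of values, you get:

$$r = \frac{\sum z_X z_Y}{N}$$

Now if you expand the z scores for the two variables, then you get:

$$r = \frac{1}{N} \sum \left(\frac{X - \mu_X}{\sigma_X}\right)\left(\frac{Y - \mu_Y}{\sigma_Y}\right)$$

This is the same as:

$$r = \frac{\sum (X - \mu_X)(Y - \mu_Y)}{N \sigma_X \sigma_Y}$$

Now if you take the standard deviation of the variable X (and apply the same to the variable Y):

$$\sigma_X = \sqrt{\frac{\sum (X - \mu_X)^2}{N}}$$

And then substitute this into the equation (and while doing so also change µX to X̄ and µY to Ȳ), then you get:

$$r = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sqrt{\sum (X - \bar{X})^2 \sum (Y - \bar{Y})^2}}$$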
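
As a sanity check, the two ways of getting the correlation coefficient (averaging the products of the z scores, and the expanded form with the standard deviations substituted in) can be compared numerically. This is just a sketch: the function names and the sample data are made up, and it uses the population standard deviation (dividing by N) as described above.

```python
import math


def mean(values):
    return sum(values) / len(values)


def population_std(values):
    """Square root of the mean squared deviation from the mean."""
    m = mean(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / len(values))


def pearson_from_z_scores(xs, ys):
    """Sum of the products of the z scores, divided by the number of values."""
    mx, my = mean(xs), mean(ys)
    sx, sy = population_std(xs), population_std(ys)
    zx = [(x - mx) / sx for x in xs]
    zy = [(y - my) / sy for y in ys]
    return sum(a * b for a, b in zip(zx, zy)) / len(xs)


def pearson_direct(xs, ys):
    """Expanded form: sum of cross-deviations over the root of the
    product of the summed squared deviations."""
    mx, my = mean(xs), mean(ys)
    numerator = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    denominator = math.sqrt(
        sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys)
    )
    return numerator / denominator


xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
print(pearson_from_z_scores(xs, ys), pearson_direct(xs, ys))
```

Both calls should print the same number (up to floating point rounding), which is the whole point of the substitution.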

And down the rabbit hole I go...

Currently I am reading the book "Statistics for the Utterly Confused". So far it's been excellent - for instance, I now have a good understanding of standard deviation, variability, percentiles and even box-plots.

And so now I start learning about bivariate data, and of course I get to learning about the correlation coefficient. Or more specifically the Pearson product-moment correlation coefficient, of which they present the following spectacular and complex formula:

Of course, there is no attempt to explain why this formula works. So now I'm looking up Wikipedia on this topic. Sadly, Wikipedia does not have a good broad overview of this that I can see, and assumes prior knowledge, so I'm now in the process of following a trail of topics (from bottom to top):
And, as Wikipedia doesn't seem to want to explain Vectors well, I'm also reading the following introduction to vectors.

Update: When I did a Google search, as it turns out there is actually a much simpler version of this formula (who knew?) that is explained here.

Tuesday, August 10, 2010

Undocumented optimizer feature in SQL Server 2005/2008

An interesting series of posts has been done by Paul White...

This is the most interesting...

Saturday, July 31, 2010

Wow, I got that wrong!

I wrote the following:

A relational database is so named because the data it stores relates to each other. In SQL, the relationships between data is expressed in a query by what is known as a join.

What I meant was that the tables relate to each other.

That is wrong.

A relational database is so named because of the mathematical concept of the relation.

Friday, July 30, 2010

How to debug segfaults in Ubuntu

Actually, turns out it's not that hard!

You need to do the following:

  1. Firstly, create a build folder.
    chris@ubuntu:~/src$ mkdir sqlitebuild

  2. Now download the source package:
    chris@ubuntu:~/src$ cd sqlitebuild
    chris@ubuntu:~/src/sqlitebuild$ sudo apt-get source sqlite3

  3. Now you need to install the package's build dependencies:
    chris@ubuntu:~/src/sqlitebuild$ sudo apt-get build-dep sqlite3

  4. Now extract the source files from the dsc file:
    chris@ubuntu:~/src/sqlitebuild$ dpkg-source -x foo_version-revision.dsc

  5. We don't want to strip the debug symbols, we don't really need to worry about build tests, and we don't want optimizations (they get in the way of the backtrace), so we need to set the DEB_BUILD_OPTIONS:
    chris@ubuntu:~/src/sqlitebuild$ cd sqlite3-3.6.22
    chris@ubuntu:~/src/sqlitebuild/sqlite3-3.6.22$ sudo \
    > DEB_BUILD_OPTIONS="nocheck noopt nostrip" \
    > fakeroot debian/rules binary

  6. It will now spit out lots of compilation info.

Now you need to actually install the package, which can be tricky if you've already got it installed.

  1. Most Ubuntu/Debian packages have dependencies. You'll need to force the uninstallation of the package:
    chris@ubuntu:~/src/sqlitebuild/sqlite3-3.6.22$ cd ..
    chris@ubuntu:~/src/sqlitebuild$ sudo dpkg --purge --force-depends sqlite3
    [sudo] password for chris:
    (Reading database ... 333636 files and directories currently installed.)
    Removing sqlite3 ...
    Processing triggers for man-db ...
    chris@ubuntu:~/src/sqlitebuild$ sudo dpkg --purge --force-depends libsqlite3
    dpkg: warning: ignoring request to remove libsqlite3 which isn't installed.
    chris@ubuntu:~/src/sqlitebuild$ sudo dpkg --purge --force-depends libsqlite3-0
    dpkg: libsqlite3-0: dependency problems, but removing anyway as you requested:
    libsqlite3-0-dbg depends on libsqlite3-0 (= 3.6.22-1).
    libmono-sqlite2.0-cil depends on libsqlite3-0 (>= 3.6.13).
    libsqlite3-dev depends on libsqlite3-0 (= 3.6.22-1).
    evolution-indicator depends on libsqlite3-0 (>= 3.6.22); however:
    Package libsqlite3-0 is to be removed.
    libmono-sqlite1.0-cil depends on libsqlite3-0 (>= 3.6.13).
    evolution-couchdb depends on libsqlite3-0 (>= 3.6.22).
    libpackagekit-glib2-12 depends on libsqlite3-0 (>= 3.6.22); however:
    Package libsqlite3-0 is to be removed.
    evolution depends on libsqlite3-0 (>= 3.6.22); however:
    Package libsqlite3-0 is to be removed.
    libedata-book1.2-2 depends on libsqlite3-0 (>= 3.6.22).
    libwebkit-1.0-2 depends on libsqlite3-0 (>= 3.6.22).
    evolution-plugins depends on libsqlite3-0 (>= 3.6.22); however:
    Package libsqlite3-0 is to be removed.
    evolution-exchange depends on libsqlite3-0 (>= 3.6.22).
    libcamel1.2-14 depends on libsqlite3-0 (>= 3.6.22).
    libnss3-1d depends on libsqlite3-0 (>= 3.6.22).
    libedataserverui1.2-8 depends on libsqlite3-0 (>= 3.6.22).
    libopensync0 depends on libsqlite3-0 (>= 3.6.16).
    python2.6-dbg depends on libsqlite3-0 (>= 3.6.22).
    libqt4-webkit depends on libsqlite3-0 (>= 3.6.22).
    python2.6 depends on libsqlite3-0 (>= 3.6.22).
    libsvn1 depends on libsqlite3-0 (>= 3.6.16).
    packagekit depends on libsqlite3-0 (>= 3.6.22); however:
    Package libsqlite3-0 is to be removed.
    evolution-data-server depends on libsqlite3-0 (>= 3.6.22).
    libebook1.2-9 depends on libsqlite3-0 (>= 3.6.22).
    bibledit depends on libsqlite3-0 (>= 3.6.16).
    xulrunner-1.9.2 depends on libsqlite3-0 (>= 3.6.22).
    libaprutil1-dbd-sqlite3 depends on libsqlite3-0 (>= 3.6.22).
    telepathy-gabble depends on libsqlite3-0 (>= 3.6.22).
    libgpod4 depends on libsqlite3-0 (>= 3.6.22).
    libqt4-sql-sqlite depends on libsqlite3-0 (>= 3.6.22).
    libgda-4.0-4 depends on libsqlite3-0 (>= 3.6.22).
    libsoup-gnome2.4-1 depends on libsqlite3-0 (>= 3.6.22).
    banshee depends on libsqlite3-0 (>= 3.6.22).
    python-gpod depends on libsqlite3-0 (>= 3.6.22).
    opensyncutils depends on libsqlite3-0 (>= 3.6.16).
    (Reading database ... 333630 files and directories currently installed.)
    Removing libsqlite3-0 ...
    Purging configuration files for libsqlite3-0 ...
    Processing triggers for libc-bin ...
    ldconfig deferred processing now taking place
    /sbin/ldconfig.real: /usr/lib/debug/usr/lib/libpython2.6.so.1.0-gdb.py is not an ELF file - it has the wrong magic bytes at the start.
  2. Next, you install the newly built debs:
    chris@ubuntu:~/src/sqlitebuild$ sudo dpkg -i libsqlite3-0_3.6.22-1_i386.deb
    Selecting previously deselected package libsqlite3-0.
    (Reading database ... 333625 files and directories currently installed.)
    Unpacking libsqlite3-0 (from libsqlite3-0_3.6.22-1_i386.deb) ...
    Setting up libsqlite3-0 (3.6.22-1) ...

    Processing triggers for libc-bin ...
    ldconfig deferred processing now taking place
    /sbin/ldconfig.real: /usr/lib/debug/usr/lib/libpython2.6.so.1.0-gdb.py is not an ELF file - it has the wrong magic bytes at the start.

    chris@ubuntu:~/src/sqlitebuild$ sudo dpkg -i libsqlite3-0-dbg_3.6.22-1_i386.deb
    (Reading database ... 333632 files and directories currently installed.)
    Preparing to replace libsqlite3-0-dbg 3.6.22-1 (using libsqlite3-0-dbg_3.6.22-1_i386.deb) ...
    Unpacking replacement libsqlite3-0-dbg ...
    Setting up libsqlite3-0-dbg (3.6.22-1) ...
    chris@ubuntu:~/src/sqlitebuild$ sudo dpkg -i sqlite3_3.6.22-1_i386.deb
    Selecting previously deselected package sqlite3.
    (Reading database ... 333627 files and directories currently installed.)
    Unpacking sqlite3 (from sqlite3_3.6.22-1_i386.deb) ...
    Setting up sqlite3 (3.6.22-1) ...
    Processing triggers for man-db ...
    chris@ubuntu:~/src/sqlitebuild$ sudo dpkg -i sqlite3-doc_3.6.22-1_all.deb
    (Reading database ... 333633 files and directories currently installed.)
    Preparing to replace sqlite3-doc 3.6.22-1 (using sqlite3-doc_3.6.22-1_all.deb) ...
    Unpacking replacement sqlite3-doc ...
    Setting up sqlite3-doc (3.6.22-1) ...

Now to debug the segmentation fault. I'm currently looking into why Firefox is segfaulting on me every minute and a half or so, and I've worked out that, because I have a corrupted places.sqlite (an index has gone bad), it's actually sqlite3 that is having the problem.

I've actually applied this same procedure to Firefox, and I determined that sqlite wasn't happy with the following query:

select h.url, v.visit_date, h.hidden, 0 AS whole_entry FROM moz_places h JOIN moz_historyvisits v ON h.id = v.place_id where v.visit_date < '2000-01-01';

From here, it's pretty easy to hook up gdb to get the full backtrace, or even step through the code.

Update: My efforts have resulted in a bugfix from the sqlite team - yay! Basically, the issue was that Firefox was segfaulting every minute and a half or so. It turns out that my places.sqlite datafile was corrupted, and the sqlite guys have now developed a fix.

If anyone else has a similar issue to this one, incidentally, then they can't go wrong with the following:

chris@ubuntu:~/.mozilla/firefox/1u64q3v3.default$ for i in *.sqlite; do echo "Reindexing $i"; echo "reindex;" | sqlite3 "$i"; done
Reindexing content-prefs.sqlite
Reindexing cookies.sqlite
Reindexing downloads.sqlite
Reindexing formhistory.sqlite
Reindexing permissions.sqlite
Reindexing places.sqlite
Reindexing search.sqlite
Reindexing signons.sqlite
Reindexing urlclassifier3.sqlite

Friday, July 23, 2010



Friday, July 2, 2010

True geek heaven!

So I just found a place with a lot of free and legal textbooks.


It's quite unbelievable!

Monday, June 28, 2010

Yay powersets!

Boo! Arial can display this correctly, why is blogger doing this? Boo!

Friday, June 25, 2010

Java classpath issues

So I had a strange issue today. I was getting classpath exceptions. Turns out turning off the JQS on Windows XP fixed it.

But it got me wondering how this works. Here are some links for my own reference:

Tuesday, May 18, 2010

Youtube clips I have posted to my Facebook profile

My, there are quite a few!

Monday, May 17, 2010

Cross browser code

Interesting post from Microsoft:

Same Markup: Writing Cross-Browser Code

Update: and damn... look what they've done. meta tags and http headers!

Friday, May 14, 2010

Giano quote file


Have you read this [4] are you seriously endorsing it? you thik that is the standards of writing and lower middle class behaviour we should all aspir to? we would be a laughing stock over all of Europe.

Giano on vulgarity


You have deliberatly and very poorly attempted to misconstrue and portray me in a bad light - but don't wory I expect no less from you - and I think you are the loser for it. However, there is one thing that I do admire about you Mr Wales, your spelling.

Giano on spelling.


Thanks, I always handle all things to my complete satisfaction, there is only one thing to do with a peanut and I do it. People love to hate me, probably because I have ths iritating habbit of generally being proved right. Sometimes, it takes a while, but sooner or later the editor concerned bites the dust. I put it down to frequent bridge playing and yacht racing, one learns to evaluate one's assets and take few risks.

Giano on his importance and ability


Not at all, but a reference is a reference and a fact is a fact and a FA is a FA.

Giano explaining references, facts and featured articles


I suspect this user has an agenda, and I cannot be bothered to discover it.

Giano on suspicious editors


The civility police behave like a lot of ancient old ladies, sitting knitting mishapen garments, waiting dribbling for someone to use the word "fuck" so they can leap out of their chairs in exitement.

Giano on old ladies, some time before this request.


Jimbo, sweetheart, this thread has nothing to do with me, please go to the top and start again...

Giano expresses his affection for Jimbo


Sorry, I find that I have sufficient elderly aunts of my own to protect and keep out of trouble. Anyway, do you think the goat ridden mountains of Sicily a suitable environment for an elderly aunt?"

Giano on aunties


Tuesday, April 27, 2010

Running total in SQL Server...

Not really sure how good this is... but seems to do the job!

;with theData (rowNum, GroupA, GroupRowNum, theValue) as
(
   select row_number() over (order by MajorGroup, GroupOrder),
      MajorGroup,
      row_number() over (partition by MajorGroup
         order by MajorGroup, GroupOrder),
      theValue   -- assumed name of the value column in DataTable
   from DataTable
)
-- running total = this row's value plus the sum of the earlier rows in its group
select X.GroupA, X.GroupRowNum, X.theValue +
      isnull((select sum(theValue)
      from theData Y
      where Y.GroupA=X.GroupA and Y.GroupRowNum < X.GroupRowNum), 0)
from theData X

For some reason I can't get an execution plan as I get an error - I've filed a Connect bug with Microsoft.

Wednesday, March 17, 2010

How many members should be in the Australian House of Representatives?

You find this by the following formula:

Quota = Number of people in Australia / (Number of senators in the Senate * 2)

Members of the House of Representatives for each state = floor(Number of people in the state / Quota) + Optional

Optional is worked out as follows:
  • If mod(Number of people in the state, Quota) > (Quota / 2) then Optional = 1
  • Else Optional = 0
However, Members of the House of Representatives for each state must be equal to or greater than 5.
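
The rule above can be sketched in a few lines of Python. The function names and the population figures below are invented purely to exercise the arithmetic (they are not real census numbers), and the sketch only covers the states as described here.

```python
def quota(total_population, senators):
    """Quota = population / (number of senators * 2)."""
    return total_population / (senators * 2)


def seats_for_state(state_population, q, minimum=5):
    """Whole quotas, plus one if the remainder exceeds half a quota,
    but never fewer than the minimum."""
    whole, remainder = divmod(state_population, q)
    optional = 1 if remainder > q / 2 else 0
    return max(int(whole) + optional, minimum)


# Made-up figures: 1,440,000 people and 72 senators give a quota of 10,000.
q = quota(1_440_000, 72)
for population in (256_000, 254_000, 30_000):
    print(population, seats_for_state(population, q))
```

With these figures, 256,000 people is 25 quotas with 6,000 left over, which is more than half a quota, so the state gets 26 seats; 254,000 gets 25; and 30,000 is lifted from 3 to the minimum of 5.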