Sadly, my greatest claim to fame is that I invented the [citation needed] tag. I'll stop sobbing in a corner one day.
P.S. if anyone writes [Citation needed], then [Citation provided].
Tuesday, December 28, 2010
Markov Chain generator
As it turns out, with a sufficiently dense input text, a Markov Chain generator is actually reasonably easy to implement.
From page 62 of The Practice of Programming:
- set w1 and w2 to the first two words in the text
- print w1 and w2
- loop:
- randomly choose w3, one of the successors of prefix w1 w2 in the text
- print w3
- replace w1 and w2 by w2 and w3
- repeat loop
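As a rough sketch of how little code this needs, here's a minimal Java version of that algorithm (the whole-input read, the 100-word output cap, and the class name are my choices, not from the book):
import java.util.*;

public class Markov {
    public static void main(String[] args) {
        // Read all of standard input and split it into words.
        String[] words = new Scanner(System.in).useDelimiter("\\A").next().split("\\s+");
        if (words.length < 2) return;  // need at least the initial two-word prefix

        // Build the table: each two-word prefix maps to the list of words
        // that follow it somewhere in the text.
        Map<String, List<String>> successors = new HashMap<>();
        for (int i = 0; i + 2 < words.length; i++) {
            String prefix = words[i] + " " + words[i + 1];
            successors.computeIfAbsent(prefix, k -> new ArrayList<>()).add(words[i + 2]);
        }

        // Set w1 and w2 to the first two words in the text, and print them.
        String w1 = words[0], w2 = words[1];
        System.out.print(w1 + " " + w2);

        Random random = new Random();
        for (int i = 0; i < 100; i++) {
            List<String> choices = successors.get(w1 + " " + w2);
            if (choices == null) break;  // fell off the end of the text
            // Randomly choose w3, one of the successors of prefix w1 w2.
            String w3 = choices.get(random.nextInt(choices.size()));
            System.out.print(" " + w3);
            // Replace w1 and w2 by w2 and w3.
            w1 = w2;
            w2 = w3;
        }
        System.out.println();
    }
}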
Labels:
algorithm,
markovchain
Sunday, December 26, 2010
Sargability of monotonic functions in SQL
An excellent writeup by Quassnoi about search arguments (SARGs) on monotonic functions: how most query engines don't handle them, and how they should.
Wednesday, December 1, 2010
SQL Server Management Studio 2008 R2
While it's a great idea to include a debugger for stepping through T-SQL code, it's not very well done. I frequently get a window handle error, and then I need to either reboot the client or restart SSMS.
Microsoft - do some QA testing before releasing cool new features like this. It's tantamount to baiting me!
Labels:
fail,
sqlserver2008
Thursday, October 21, 2010
Sequence tables in SQL Server 2005
Interesting post by Paul White about Sequence Tables in SQL Server.
Labels:
databases,
sequencetables,
sqlserver2005,
sqlserver2008
Tuesday, October 19, 2010
Mistake in Wikipedia article on the Builder pattern
The following article on the Builder design pattern on Wikipedia is sort of OK, except that the Java example provided is inaccurate.
The abstract PizzaBuilder class uses abstract functions. That's not really recommended - if you subclass it, you have to implement all of the functions. But the pattern is meant to be more flexible than that: if you don't need a given build step, you shouldn't have to implement it. Therefore, as the Gang of Four say in their book on page 101, the build methods should be left intentionally empty rather than declared as abstract functions, "letting clients override only the operations they are interested in".
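Here's a minimal sketch of what the GoF are recommending, reusing the article's PizzaBuilder name (the MargheritaBuilder class and the Pizza methods are made up for illustration):
class Pizza {
    private String sauce = "none";
    void setSauce(String sauce) { this.sauce = sauce; }
}

// GoF-style builder: concrete no-op defaults instead of abstract methods,
// so subclasses override only the operations they are interested in.
abstract class PizzaBuilder {
    protected Pizza pizza = new Pizza();

    public void buildDough()   { }  // intentionally left empty
    public void buildSauce()   { }  // intentionally left empty
    public void buildTopping() { }  // intentionally left empty

    public Pizza getPizza() { return pizza; }
}

class MargheritaBuilder extends PizzaBuilder {
    @Override
    public void buildSauce() { pizza.setSauce("tomato"); }
    // No dough or topping overrides needed: the inherited no-ops apply.
}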
Labels:
designpatterns,
wikipedia
Monday, October 18, 2010
ICloneable: Microsoft's big mistake
So here's the deal: if you implement ICloneable in a public API, you are making a big mistake. Why? Because there's no way of knowing whether the Clone() function will do a deep or a shallow copy.
As it turns out, even Microsoft know that they have made a mistake with this interface. Oopsy!
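Java's Cloneable has much the same ambiguity, as it happens. Here's a minimal sketch (the Person and Address classes are made up for illustration) of why a caller can't tell from the interface alone what they'll get:
class Address {
    String city = "Melbourne";
}

class Person implements Cloneable {
    Address address = new Address();

    @Override
    public Person clone() {
        try {
            return (Person) super.clone();  // shallow copy: the Address is shared
        } catch (CloneNotSupportedException e) {
            throw new AssertionError(e);    // can't happen: we implement Cloneable
        }
    }
}

class Demo {
    public static void main(String[] args) {
        Person a = new Person();
        Person b = a.clone();
        b.address.city = "Sydney";
        // Prints "Sydney": mutating the "copy" changed the original,
        // and nothing in the clone() signature warned us.
        System.out.println(a.address.city);
    }
}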
Labels:
dotnet,
interfaces
Monday, October 11, 2010
SQL Server indexing...
There's a series of articles called SQL Server Index Black Ops... and indeed they are!
Labels:
databases,
microsoft,
sqlserver2005,
sqlserver2008
Wednesday, September 15, 2010
Recursive queries vs. non-recursive queries; and consistent reads vs. current reads
Recursive statements vs. non-recursive statements: best to quote Tom Kyte on this one!
non-recursive statements are statements issued by the client to the server.
recursive statements are statements executed by that non-recursive call.
non-recursive statements can be sql, or plsql - they are just "a statement submitted by client to server"
recursive statements can be plsql, or sql - they are just "a statement executed by the statement submitted by the client to the server"
Consistent reads vs. current reads - got the following from Mark Boback on the Oracle-l mailing list:
A 'db block get' is a current mode get. That is, it's the most up-to-date
copy of the data in that block, as it is right now, or currently. There
can only be one current copy of a block in the buffer cache at any time.
Db block gets generally are used when DML changes data in the database.
In that case, row-level locks are implicitly taken on the updated rows.
There is also at least one well-known case where a select statement does
a db block get, and does not take a lock. That is, when it does a full
table scan or fast full index scan, Oracle will read the segment header
in current mode (multiple times, the number varies based on Oracle version).
A 'consistent get' is when Oracle gets the data in a block which is consistent
with a given point in time, or SCN. The consistent get is at the heart of
Oracle's read consistency mechanism. When blocks are fetched in order to
satisfy a query result set, they are fetched in consistent mode. If no
block in the buffer cache is consistent to the correct point in time, Oracle
will (attempt to) reconstruct that block using the information in the rollback
segments. If it fails to do so, that's when a query errors out with the
much dreaded, much feared, and much misunderstood ORA-1555 "snapshot too old".
Tuesday, August 31, 2010
My app in Ubuntu crashed, but I didn't have debug symbols...
Say you've enabled Apport on Ubuntu, and you dutifully report segfaults to Launchpad.
Well, normally they won't do you much good without debug symbols. But the bug is already filed! What do you do?
Easy.
$ sudo apt-get install apport-retrace
$ sudo apport-retrace -o /tmp/trash \
> /var/crash/_usr_lib_openoffice_program_soffice.bin.1000.crash
Saturday, August 14, 2010
Active Directory trusts
An excellent article on AD trusts can be found at Tech Republic. Another one can be found at Redmond Magazine.
Labels:
activedirectory,
microsoft
Friday, August 13, 2010
For posterity
The following is a very interesting Slashdot post on the issues of the NBN and gigabit speeds...
Expensive, if done right.
Labels:
nbn,
networking,
slashdot
Thursday, August 12, 2010
Pearson Product Moment Correlation Coefficient
So finally I understand how the Pearson product-moment correlation coefficient, also known simply as the correlation coefficient, works.
Firstly, you need to understand the definition of the term correlation. A correlation is a statistical association or relationship between two variables.
A correlation coefficient measures the strength and direction of a linear (think: straight line graph) relationship between two variables using population data.
How do we get the correlation coefficient?
So down to business: the correlation coefficient is the sum of the product of the z-scores over the number of samples.
In other words, it is:

r = ( Σ z_x · z_y ) / n
What does this mean?
Well, firstly, the standard deviation allows us to see the variability of data in a dataset. It is the square root of the variance.
Assuming that we know all the values in the sample, the variance is given by taking the deviation of each value from the mean (x - x̄) and squaring it (to ensure that the signs don't cancel each other out). Each squared deviation is then summed, and the sum is divided by the number of values in the sample.
In other words, the standard deviation is defined like so:

σ = √( Σ(x - x̄)² / n )

Now a z-score is a way of saying how many standard deviations a value is above or below the mean value of a particular data set.
You calculate the z-score of a value by taking the value (x) and subtracting the mean (x̄) from it - this gives you how far from the mean the value is - and then dividing by the standard deviation (σ).
Thus, the z-score is:

z = (x - x̄) / σ
So if you take the sum of the product of the z-scores of the two variables and divide by the number of values, you get:

r = ( Σ z_x · z_y ) / n

Now if you expand the z-scores for the two variables, then you get:

r = (1/n) · Σ [ ((x - x̄) / σ_x) · ((y - ȳ) / σ_y) ]

This is the same as:

r = Σ (x - x̄)(y - ȳ) / (n · σ_x · σ_y)

Now take the standard deviation for the variable x (and apply the same for the variable y):

σ_x = √( Σ(x - x̄)² / n )

And then substitute this into the equation (and while doing so also change µx to x̄ and µy to ȳ), then you get:

r = Σ (x - x̄)(y - ȳ) / ( √Σ(x - x̄)² · √Σ(y - ȳ)² )
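To check that I've got this right, here's a minimal Java sketch of the z-score form (the sample data is made up, and it uses population standard deviations, as above):
public class Correlation {
    // Pearson's r as the mean of the products of z-scores:
    // r = ( sum of z_x * z_y ) / n
    static double pearson(double[] x, double[] y) {
        int n = x.length;
        double meanX = mean(x), meanY = mean(y);
        double sdX = stdDev(x, meanX), sdY = stdDev(y, meanY);
        double sum = 0;
        for (int i = 0; i < n; i++) {
            sum += ((x[i] - meanX) / sdX) * ((y[i] - meanY) / sdY);
        }
        return sum / n;
    }

    static double mean(double[] v) {
        double sum = 0;
        for (double d : v) sum += d;
        return sum / v.length;
    }

    // Population standard deviation: square root of the mean squared deviation.
    static double stdDev(double[] v, double mean) {
        double sum = 0;
        for (double d : v) sum += (d - mean) * (d - mean);
        return Math.sqrt(sum / v.length);
    }

    public static void main(String[] args) {
        double[] x = {1, 2, 3, 4, 5};
        double[] y = {2, 4, 5, 4, 5};
        System.out.println(pearson(x, y));  // ~0.775 - a fairly strong positive correlation
    }
}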
Labels:
math,
statistics
And down the rabbit hole I go...
Currently I am reading the book "Statistics for the Utterly Confused". So far it's been excellent - for instance, I now have a good understanding of standard deviation, variability, percentiles and even box-plots.
And so now I start learning about bivariate data, and of course I get to the correlation coefficient - or more specifically, the Pearson product-moment correlation coefficient, of which they present the following spectacular and complex formula:

r = [ n·Σxy - (Σx)(Σy) ] / √( [n·Σx² - (Σx)²] · [n·Σy² - (Σy)²] )
Of course, there is no attempt to explain why this formula works. So now I'm looking up Wikipedia on this topic. Sadly, Wikipedia does not have a good broad overview of this that I can see, and assumes prior knowledge, so I'm now in the process of following a trail of topics (from bottom to top):
- Pearson product-moment correlation coefficient
- Covariance
- Random Variable
- Second moment
- Expected value
- Random vector
- Vector space
- Plane
- Euclidean vector
Update: when I did a Google search, it turned out that there is actually a much simpler version of this formula (who knew?), which is explained here.
Labels:
math,
statistics,
wikipedia
Tuesday, August 10, 2010
Saturday, July 31, 2010
Wow, I got that wrong!
I wrote the following:
A relational database is so named because the data it stores relates to each other. In SQL, the relationships between data are expressed in a query by what is known as a join.
What I meant was that the tables relate to each other.
That is wrong.
A relational database is so named because of the mathematical concept of the binary relation.
Friday, July 30, 2010
How to debug segfaults in Ubuntu
Actually, turns out it's not that hard!
You need to do the following:
- Firstly, create a build folder.
chris@ubuntu:~/src$ mkdir sqlitebuild
- Now download the source package:
chris@ubuntu:~/src$ cd sqlitebuild
chris@ubuntu:~/src/sqlitebuild$ sudo apt-get source sqlite3
- Now you need to build any package dependencies:
chris@ubuntu:~/src/sqlitebuild$ sudo apt-get build-dep sqlite3
- Now extract the source files from the dsc file:
chris@ubuntu:~/src/sqlitebuild$ dpkg-source -x foo_version-revision.dsc
- We don't want to strip the debug symbols, we don't really need to worry about build tests, and we don't want optimizations (they get in the way of the backtrace), so we need to set the DEB_BUILD_OPTIONS:
chris@ubuntu:~/src/sqlitebuild$ cd sqlite3-3.6.22
chris@ubuntu:~/src/sqlitebuild/sqlite3-3.6.22$ sudo \
> DEB_BUILD_OPTIONS="nocheck noopt nostrip" \
> fakeroot debian/rules binary
- It will now spit out lots of compilation info.
- Most Ubuntu/Debian packages have dependencies. You'll need to force the uninstallation of the package:
chris@ubuntu:~/src/sqlitebuild/sqlite3-3.6.22$ cd ..
chris@ubuntu:~/src/sqlitebuild$ sudo dpkg --purge --force-depends sqlite3
[sudo] password for chris:
(Reading database ... 333636 files and directories currently installed.)
Removing sqlite3 ...
Processing triggers for man-db ...
chris@ubuntu:~/src/sqlitebuild$ sudo dpkg --purge --force-depends libsqlite3
dpkg: warning: ignoring request to remove libsqlite3 which isn't installed.
chris@ubuntu:~/src/sqlitebuild$ sudo dpkg --purge --force-depends libsqlite3-0
dpkg: libsqlite3-0: dependency problems, but removing anyway as you requested:
libsqlite3-0-dbg depends on libsqlite3-0 (= 3.6.22-1).
libmono-sqlite2.0-cil depends on libsqlite3-0 (>= 3.6.13).
libsqlite3-dev depends on libsqlite3-0 (= 3.6.22-1).
evolution-indicator depends on libsqlite3-0 (>= 3.6.22); however:
Package libsqlite3-0 is to be removed.
libmono-sqlite1.0-cil depends on libsqlite3-0 (>= 3.6.13).
evolution-couchdb depends on libsqlite3-0 (>= 3.6.22).
libpackagekit-glib2-12 depends on libsqlite3-0 (>= 3.6.22); however:
Package libsqlite3-0 is to be removed.
evolution depends on libsqlite3-0 (>= 3.6.22); however:
Package libsqlite3-0 is to be removed.
libedata-book1.2-2 depends on libsqlite3-0 (>= 3.6.22).
libwebkit-1.0-2 depends on libsqlite3-0 (>= 3.6.22).
evolution-plugins depends on libsqlite3-0 (>= 3.6.22); however:
Package libsqlite3-0 is to be removed.
evolution-exchange depends on libsqlite3-0 (>= 3.6.22).
libcamel1.2-14 depends on libsqlite3-0 (>= 3.6.22).
libnss3-1d depends on libsqlite3-0 (>= 3.6.22).
libedataserverui1.2-8 depends on libsqlite3-0 (>= 3.6.22).
libopensync0 depends on libsqlite3-0 (>= 3.6.16).
python2.6-dbg depends on libsqlite3-0 (>= 3.6.22).
libqt4-webkit depends on libsqlite3-0 (>= 3.6.22).
python2.6 depends on libsqlite3-0 (>= 3.6.22).
libsvn1 depends on libsqlite3-0 (>= 3.6.16).
packagekit depends on libsqlite3-0 (>= 3.6.22); however:
Package libsqlite3-0 is to be removed.
evolution-data-server depends on libsqlite3-0 (>= 3.6.22).
libebook1.2-9 depends on libsqlite3-0 (>= 3.6.22).
bibledit depends on libsqlite3-0 (>= 3.6.16).
xulrunner-1.9.2 depends on libsqlite3-0 (>= 3.6.22).
libaprutil1-dbd-sqlite3 depends on libsqlite3-0 (>= 3.6.22).
telepathy-gabble depends on libsqlite3-0 (>= 3.6.22).
libgpod4 depends on libsqlite3-0 (>= 3.6.22).
libqt4-sql-sqlite depends on libsqlite3-0 (>= 3.6.22).
libgda-4.0-4 depends on libsqlite3-0 (>= 3.6.22).
libsoup-gnome2.4-1 depends on libsqlite3-0 (>= 3.6.22).
banshee depends on libsqlite3-0 (>= 3.6.22).
python-gpod depends on libsqlite3-0 (>= 3.6.22).
opensyncutils depends on libsqlite3-0 (>= 3.6.16).
(Reading database ... 333630 files and directories currently installed.)
Removing libsqlite3-0 ...
Purging configuration files for libsqlite3-0 ...
Processing triggers for libc-bin ...
ldconfig deferred processing now taking place
/sbin/ldconfig.real: /usr/lib/debug/usr/lib/libpython2.6.so.1.0-gdb.py is not an ELF file - it has the wrong magic bytes at the start.
- Next, you install the newly built debs:
chris@ubuntu:~/src/sqlitebuild$ sudo dpkg -i libsqlite3-0_3.6.22-1_i386.deb
Selecting previously deselected package libsqlite3-0.
(Reading database ... 333625 files and directories currently installed.)
Unpacking libsqlite3-0 (from libsqlite3-0_3.6.22-1_i386.deb) ...
Setting up libsqlite3-0 (3.6.22-1) ...
Processing triggers for libc-bin ...
ldconfig deferred processing now taking place
/sbin/ldconfig.real: /usr/lib/debug/usr/lib/libpython2.6.so.1.0-gdb.py is not an ELF file - it has the wrong magic bytes at the start.
chris@ubuntu:~/src/sqlitebuild$ sudo dpkg -i libsqlite3-0-dbg_3.6.22-1_i386.deb
(Reading database ... 333632 files and directories currently installed.)
Preparing to replace libsqlite3-0-dbg 3.6.22-1 (using libsqlite3-0-dbg_3.6.22-1_i386.deb) ...
Unpacking replacement libsqlite3-0-dbg ...
Setting up libsqlite3-0-dbg (3.6.22-1) ...
chris@ubuntu:~/src/sqlitebuild$ sudo dpkg -i sqlite3_3.6.22-1_i386.deb
Selecting previously deselected package sqlite3.
(Reading database ... 333627 files and directories currently installed.)
Unpacking sqlite3 (from sqlite3_3.6.22-1_i386.deb) ...
Setting up sqlite3 (3.6.22-1) ...
Processing triggers for man-db ...
chris@ubuntu:~/src/sqlitebuild$ sudo dpkg -i sqlite3-doc_3.6.22-1_all.deb
(Reading database ... 333633 files and directories currently installed.)
Preparing to replace sqlite3-doc 3.6.22-1 (using sqlite3-doc_3.6.22-1_all.deb) ...
Unpacking replacement sqlite3-doc ...
Setting up sqlite3-doc (3.6.22-1) ...
Now to debug the segmentation fault. I'm currently looking into why Firefox is segfaulting on me every one and a half minutes or so, and I've worked out that, because I have a corrupted places.sqlite (an index has gone bad), it's actually sqlite3 that is having the problem.
I've actually applied this same procedure to Firefox, and I determined that sqlite wasn't happy with the following query:
select h.url, v.visit_date, h.hidden, 0 AS whole_entry FROM moz_places h JOIN moz_historyvisits v ON h.id = v.place_id where v.visit_date < '2000-01-01';
From here, it's pretty easy to hook up gdb to get the full backtrace, or even step through the code.
Update: My efforts have resulted in a bugfix from the sqlite team - yay! Basically, the issue was that Firefox was segfaulting every minute and a half or so. It turned out that my places.sqlite datafile was corrupted; the sqlite guys have now developed a fix.
If anyone else has a similar issue to this one, incidentally, then they can't go wrong with the following:
chris@ubuntu:~/.mozilla/firefox/1u64q3v3.default$ for i in *.sqlite; do echo "Reindexing $i"; echo "reindex;" | sqlite3 $i; done
Reindexing content-prefs.sqlite
Reindexing cookies.sqlite
Reindexing downloads.sqlite
Reindexing formhistory.sqlite
Reindexing permissions.sqlite
Reindexing places.sqlite
Reindexing search.sqlite
Reindexing signons.sqlite
Reindexing urlclassifier3.sqlite
chris@ubuntu:~/.mozilla/firefox/1u64q3v3.default$
Friday, July 23, 2010
Friday, July 2, 2010
True geek heaven!
So I just found a place with a lot of free and legal textbooks.
http://www.freebookcentre.net
It's quite unbelievable!
Labels:
freebooks
Monday, June 28, 2010
Friday, June 25, 2010
Java classpath issues
So I had a strange issue today. I was getting classpath exceptions. Turns out turning off the JQS on Windows XP fixed it.
But it got me wondering how this works. Here are some links for my own reference:
- Dustin's Java development tools blog post
- The jps tool.
- Jconsole
- Actually using Jconsole
- ClassNotFoundException vs. NoClassDefFoundError
- IBM DeveloperWorks articles on Classpath:
- Demystifying class loading problems, Part 1: An introduction to class loading and debugging tools
- Demystifying class loading problems, Part 2: Basic class loading exceptions
- Demystifying class loading problems, Part 3: Tackling more unusual class loading problems
- Demystifying class loading problems, Part 4: Deadlocks and constraints
Tuesday, May 18, 2010
Monday, May 17, 2010
Cross browser code
Interesting post from Microsoft:
Same Markup: Writing Cross-Browser Code
Update: and damn... look what they've done. Meta tags and HTTP headers!
Labels:
microsoft,
webbrowsers
Friday, May 14, 2010
Giano quote file
- Have you read this [4] are you seriously endorsing it? you thik that is the standards of writing and lower middle class behaviour we should all aspir to? we would be a laughing stock over all of Europe.
- You have deliberatly and very poorly attempted to misconstrue and portray me in a bad light - but don't wory I expect no less from you - and I think you are the loser for it. However, there is one thing that I do admire about you Mr Wales, your spelling.
- Thanks, I always handle all things to my complete satisfaction, there is only one thing to do with a peanut and I do it. People love to hate me, probably because I have ths iritating habbit of generally being proved right. Sometimes, it takes a while, but sooner or later the editor concerned bites the dust. I put it down to frequent bridge playing and yacht racing, one learns to evaluate one's assets and take few risks.
- Not at all, but a reference is a reference and a fact is a fact and a FA is a FA.
- I suspect this user has an agenda, and I cannot be bothered to discover it.
- The civility police behave like a lot of ancient old ladies, sitting knitting mishapen garments, waiting dribbling for someone to use the word "fuck" so they can leap out of their chairs in exitement.
- Giano on old ladies, some time before this request.
- Jimbo, sweetheart, this thread has nothing to do with me, please go to the top and start again...
- Sorry, I find that I have sufficient elderly aunts of my own to protect and keep out of trouble. Anyway, do you think the goat ridden mountains of Sicily a suitable environment for an elderly aunt?"
Tuesday, April 27, 2010
Running total in SQL Server...
Not really sure how good this is... but seems to do the job!
For some reason I can't get an execution plan as I get an error - I've filed a Connect bug with Microsoft.
;with theData (rowNum, GroupA, GroupRowNum, theValue) as
(
    select row_number() over (order by MajorGroup, GroupOrder),
           MajorGroup,
           row_number() over (partition by MajorGroup
                              order by GroupOrder),
           DataValue
    from DataTable
)
-- Running total: each row's value plus the sum of all earlier
-- values in the same group.
select X.GroupA, X.GroupRowNum, X.theValue +
    coalesce(
        (select sum(Y.theValue)
         from theData Y
         where Y.GroupA = X.GroupA and Y.GroupRowNum < X.GroupRowNum), 0
    ) as runningTotal
from theData X
Labels:
sqlserver2005
Wednesday, March 17, 2010
How many members should be in the Australian House of Representatives?
You find this by the following formula:
Quota = Number of people in Australia / (Number of senators in the Senate * 2)
Members of the House of Representatives for each state = Number of people in the state / Quota + (Optional)
Optional is worked out as follows:
- If mod(Number of people in the state / Quota) > (Quota / 2) then Optional = 1
- Else Optional = 0
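A quick Java sketch of that calculation (the population and senator figures below are placeholders for illustration, not real census numbers):
public class SeatQuota {
    public static void main(String[] args) {
        // Placeholder figures - substitute real census and Senate numbers.
        double nationalPopulation = 22_000_000;
        int senators = 76;
        double statePopulation = 5_500_000;

        // Quota = number of people in Australia / (number of senators * 2)
        double quota = nationalPopulation / (senators * 2);

        // Members for the state = state population / quota, rounded up
        // when the remainder exceeds half a quota.
        int members = (int) (statePopulation / quota);
        if (statePopulation % quota > quota / 2) {
            members++;
        }

        System.out.printf("Quota: %.1f, members for the state: %d%n", quota, members);
    }
}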
Labels:
australia