Random technical stuff: January 2016

Friday, January 22, 2016

Why I love hacking at LibreOffice

The LibreOffice codebase is, to be frank, messy. This isn't a criticism of previous developers - it's still an amazing product and an amazing feat of programming given the number of platforms it runs on. The StarView guys, and later OpenOffice.org development team, did a great job. For instance, I was reading up on the font mapping code and I often saw Herbert Duerr's name, and I've got nothing but respect for the work that he did and his dedication to the project.

Unfortunately, the messiness is the result of a codebase that was first created in 1988 and has been actively worked on and changed directions a number of times. So... what is it that I love about working on LibreOffice?

In a word: the people. I've never met any of my fellow hackers in person, only a few phone calls to Michael Meeks and listened in on some Collabora meetings when I did some work for them. Mostly I stick around the IRC channel and keep an eye on the mailing list.

On the IRC channel, however, I've noticed the good humour and tolerance of the developers towards those not as expert as themselves. For example, Stephan Bergman (sberg) is the go to guy in terms of anything C++ related - and the other day he helped me out with a tricky bit of code (well, I thought it was tricky) related to the equality operator. Tor Lillqvist, as another example, is a diamond in the rough IMO and has often pointed me in the right direction when I'm online. And I cannot forget to mention Norbert Thiebaud (shm_get on IRC) for his constant maintenance of Jenkins (a thankless task!) and gerrit. These are literally just some of the folks who I interact with all the time and just a small sample of the folks who work on the LibreOffice project.

Without the goodwill and encouragement of the folks who hack away daily on the LibreOffice project, I doubt I'd be able to work on the small part of the codebase that I have chosen to improve. In fact, there is an active outreach program currently being run lead by Jan Iverson (JanIV on IRC) and there are quite a few commits coming in from newbies of all stripes. This is leading to some real improvements in the code, for instance new developer Dipankar Niranjan did a whole series of commits to remove the "dog-tag" code from the VCL module (the "dog-tags" was a way of attempting to ensure that Windows weren't deleted at the wrong time, but was a total hack and has been solved by VclPtr) - this has recently allowed sberg to remove ImplSVEvent::maDelData. Seeing the code evolve is also highly enjoyable :-)

There are so many more examples of this it's just not possible for me to list them in this post. It's interesting that LibreOffice has made so much traction in the last year in terms of code quality - whilst the UI hasn't changed hugely (though that is going to change I suspect in the next year!) the incremental improvements have been increasing at a rate of knots and folks are beginning to notice the positive effects. One user, who will remain nameless but is an active follower of the LibreOffice development process, told me that he noticed that the 5.x release was very much more responsive and stable that the 4.x series... and amusingly commented that "it's remarkable how interesting it is to watch the development of such an intrinsically boring type of software".

So there you have it, the reason I love hacking at LibreOffice is the people who work on it! May it ever be this way.

Thursday, January 7, 2016

Restricting Doxygen indexing to particular LibreOffice modules

I have committed a code change to the LibreOffice tree to allow developers to specify what modules they want Doxygen to index. I have done this because indexing every module is quite time consuming, and you may only want to review a subset of modules (in my case I am interested primarily in the vcl module).

To index specific modules only, you now export the $INPUT_PROJECTS environment variable and run make docs. For example, if you want to index only the vcl and svtools modules, you would do the following:

chris@libreoffice-ia64:~/repos/libreoffice$ export INPUT_PROJECTS="vcl svtools"
chris@libreoffice-ia64:~/repos/libreoffice$ make docs
chris@libreoffice-ia64:~/repos/libreoffice$ unset INPUT_PROJECTS

You can(not) have a void namespace in C++...

Update: So this was discussed on IRC, and it turns out I've got this all entirely wrong. It's actually much simpler - it turns out you don't need a space before the scope operator thus you can legally have:

void::TheClass::TheFunction(int n1, int n2) {}

This is the same as:

void ::TheClass::TheFunction(int n1, int n2) {}

and:

void ::TheClass ::TheFunction(int n1, int n2) {}

--------

I ran doxygen over the vcl module in LibreOffice and I got a lot of warnings and errors. I am now making a concerted effort to fix these warnings because I actually find doxygen generates good info on the classes, and I rather like the collaboration and inheritance diagrams it produces.

However, a strange error that it produced was:

/home/chris/repos/libreoffice/vcl/source/control/combobox.cxx:1322 warning: no uniquely matching class member found for void::ComboBox::Impl::ImplUserDrawHandler(UserDrawEvent *pEvent)

"A void namespace in the LibreOffice codebase?" I thought. How could this be? How is this even possible?

Sure enough, the definition was void::ComboBox::Impl:ImplUserDrawHandler(UserDrawEvent*)!

In an effort to work out if this was what was intended (which I was very doubtful of) I tracked down the commit, which converted the old style IMPL_LINK macro to a boost signal. It was definitely not intended!

I then tracked down where the C++ standard is held and reviewed the C++14 draft (I can't afford to spend US$216 on the official ISO standard). Section 7.3 deals with namespaces:

Amazingly (to my mind) there is no restriction on the namespace identifier, thus it is entirely possible to accidentally create a void, int, float or any other reserved C++ keyword as a namespace!

I wonder if it might be worthwhile for compilers to spit out a warning if they see this? Perhaps we need to write a clang plugin to detect this sort of thing in LibreOffice. The only reason this hasn't had an impact is that it doesn't appear that anything is currently using user-defined draw signals on comboboxes.

I pushed commit 86338fbb94324ed to fix this.

Tuesday, January 5, 2016

Possible 15 year old regression

A few days ago, I filed bug 96826 ("Typewriter attribute not given enough weight when finding font based on attributes") against LibreOffice. Basically, I've been refactoring VCL font code and I was very interested to see what my predecessors had been doing before me. This meant that I actually read pretty much all of the commits I could find all the way back to the year 2000 for that file, which held much of the font code for VCL.

One of the functions was originally in a class called ImplFontCache, which had a get() function that worked as a font mapper. It basically does a running total of a number of weighted values depending on the "strength" or "quality" of the font matching attribute. One of those attributes was to check if the font is a fixed-width font, then check to see if the font is a typewriter style font, giving a deliberately higher weighting to typewriter style fonts.

Anyway, on the 29th June 2001 someone with the initials th (no idea who this was, nor is it really important) did some tweaks to the function in an attempt to improve font mapping in commit 925806de6. To ensure that the typewriter style was given a weighted value, they have multiplied the value by 2. However, they appear to have accidentally removed a zero from the weighting, thus reducing the weighting of the typewriter style fonts, not increased it!

As I was reading the code, I thought "whoops, that was unfortunate" and assumed I'd see the error picked up somewhere down the track and fixed. Now either the weighting has little effect on these fonts, or not many people have problems with typewriter fonts not being cleanly mapped by the underlying platform, but this has never been fixed.

The code has since migrated its way into PhysicalFontCollection::FindFontFamilyByAttributes(), and it's still there!

    610         // test MONOSPACE+TYPEWRITER attributes
    611         if( nSearchType & ImplFontAttrs::Fixed )
    612         {
    613             if( nMatchType & ImplFontAttrs::Fixed )
    614                 nTestMatch += 1000000*2;
    615             // a typewriter attribute is even better
    616             if( ImplFontAttrs::None == ((nSearchType ^ nMatchType) & ImplFontAttrs::Typewriter) )
    617                 nTestMatch += 10000*2;
    618         }

As with any codebase with a 30 year pedigree, I've not committed a fix because I just don't know if this is something I want to touch. Hence I've logged the bug to see if anyone can shed any light on it. I'll probably follow up on the mailing list in a few weeks time to ping any more experienced developers, but until then this potential bug, which is in the middle of going through puberty, will remain.

Random technical stuff