Friday, October 17, 2014

Refactoring LibreOffice: VCL FontCharMap

I have been looking at the VCL in LibreOffice, which is its cross-platform widget and window library. Whilst reading the SalGraphics class (more on this in a future post) I noticed a class called ImplFontCharMap . Curious, I looked more into it. Why the "Impl"-prefix? What about FontCharMap?

As it turns out, ImplFontCharMap is a Pimpl for FontCharMap. Now normally a Pimpl has very little code and is not directly accessible by any class other than the class that uses it. A Pimpl allows for better builds in C++, and a number of other reasons. In this case ImplFontCharMap was doing a LOT.

A font character map in VCL allows a font to be mapped to Unicode codepoints. The VCL font charmap allows you to find a character in the font based on the Unicode codepoint, find the next and previous character supported by the font (these are not necessarily contiguous) and the first and last characters supported. There is also a default charmap, which maps the BMP plane, including surrogates.

ImplFontCharMap had an internal reference counting mechanism to optimise sharing of charmaps. However, this was better changed to boost's intrusive_ptr, because frankly that implementation is far more well engineered and tested, not to mention I'm not a fan of maintaining code that isn't really specifically addressing VCL needs. (incidentally, the best rundown I found of intrusive_ptr can be found at this man's blog) The commit can be found here. You can see that I've been able to immediately remove AddReference() and DeReference().

You will notice, however, that there are a few classes who rely on ImplFontCharMap (now ImplFontCharMapPtr, a typedef to an intrusive_ptr) directly. In particular, SalGraphics was relying on returning the Pimpl! Frankly, that's madness in my view. As I've said, Pimpls are really intended to be tightly coupled to a particular class and should never be used directly. The class that needs the Pimpl should be used! You can see other side effects, because the Pimpl is really duplicating code that should be in FontCharMap. This is clearly a bit if a code maintenance nightmare.

Given that a Pimpl is one of the very few concepts in C++ that relies on tightly coupling two classes, I made FontCharMap a friend class of ImplFontCharMap and moved most public functions from ImplFontCharMap to FontCharMap. I kept the destructor and the functions getDefaultCharMap() and isCharMap() but you'll notice I made them private, hence the lowercase first letter of the function names. I do NOT want VCL based code to access the Pimpl! I thought this was a necessary compromise because the logic really was more entwined with the data itself. There is a function FontCharMap::GetDefaultCharMap() although it's not normally necessary as the default charmap is shared via intrusive_ptr and the default FontCharMap constructor just returns a reference to the default charmap. I have provided it because you can get a default font charmap that excludes symbols.

I realised at this point, after a chat with Kendy on IRC, that I had dealt with managing the Pimpl for FontCharMap, but now I was returning raw FontCharMap pointers. This was defeating my refactor, so I made the typedef FontCharMapPtr, which is an intrusive_ptr to a FontCharMap instance. I then refactored the code to use this instead of the raw pointer.

The second commit that implemented this can be found here.

Finally, I have to have a big shout-out to Caolán McNamara from RedHat who found a rather embarassing bug I introduced and fixed it. The issue was that I didn't initialize the ref counter variables, except where I did where I set it to 1... which was rather silly of me, though I was reliably informed by Michael Meeks that he has done the same thing in the past.

Anyway, this is how I refactored a relatively small class. It actually took a lot of effort to do. In the end, I created a unit test to ensure that FontCharMap was still working correctly.

Sunday, March 23, 2014

How to build only vcl modules from scratch in LibreOffice

After reviewing the component diagram on the LibreOffice wiki, the following will make the VCL module in LibreOffice:

make sal salhelper store cppuhelper cppu xmlreader registry unoidl dtrans \
     binaryurp dtrans animations jvmfwk jvmaccess javaunohelper stoc i18nlangtag \
     ucbhelper comphelper basegfx tools unotools i18nutil i18npool sot svl vcl

Update: this doesn't always work. Turns out that there is some sort of circular dependency between i18npool and another module, which make sorts out itself. 

I'm now trying:

make CppunitTest_i18npool_test_breakiterator ucb configmgr vcl

Saturday, March 8, 2014

Google+ names policy discriminates against Native Americans

I find it absurd that Google+ doesn't recognize certain names. The latest outrage was when they rejected Elaine Yellow Horse. It was only after BuzzFeed contacted Google that they decided that, gee whiz, that is a real name! Who would have thought it.

So I thought I might let Google know a few things about Anthroponymy. Given Google has a whole division who work out if you've given a real name or not, you would hope they would be well-versed on what can constitute a name. It appears not, even in their own country they don't recognize the name of the original native people who inhabited the land.

 Here are a few things that Google+ administrators might like to brush up on when it comes to names of people around the world:

  • You don't have to have two names. You can be named with just one name - a mononym. This is quite common all over the world.
  • There are those who prefer to be named after their children - this is called teknonymy. As an example, I knew a Lebanese man whose first son was named George. We knew him as Abu George, literally "father of George".
  • People can have names that are... unusual. Let's give you a few:

    Jellyfish McSaveloy
    Heavenly Hiraani Tiger Lily Hutchence
    Apple Blythe Alison Martin
    Baby Hospital
    Yahoo Serious
    Goveg.com


  • There are a number of people whose names may be mistaken for rudeness:

    Argelico Fucks
    Dick Assman
    Lucious Pusey (changed to Lucious Seymour)
    Rusty Kuntz

    So let's see - Google will be banning all the Fucks, Assman, Pusey and Kuntz of the world, even though these are their legal names.
Google, pick up your act. It's getting really old. Frankly, the fact that Elaine Yellow Horse appealed your policy three times and it was only once the media got involved that you did the right thing shows that your staff are either incompetent, or racist. I hope for incompetence. Either is bad, but racism is much worse than incompetence.