Wednesday, March 16, 2016

Tips for LibreOffice newbies

Li Haoyi has written an excellent blog post entitled "Diving Into Other People's Code" about diving into an unfamiliar codebase (HN discussion here).

I think this is very helpful for anyone who wants to look at the LibreOffice source for the first time. Many of the things he mentions apply directly to LibreOffice - in particular, getting your dev environment set up.

Getting Started

To develop in LibreOffice, you really need some level of C++ skill, an ability to read other people's code, and a willingness to learn and improve over time. However, you can still contribute Java and Python code to parts of the project (though in terms of unit tests, we really encourage converting Java unit tests to C++ tests eventually).

Any newbies should start here:


and then have a look at:

https://wiki.documentfoundation.org/Development/How_to_build

One easy way of setting up your development environment is to use lode, which is a project that helps you set up a base environment to build LibreOffice:

https://wiki.documentfoundation.org/Development/lode

Once it is installed, check out how to use gerrit:


Be warned: depending on your setup, this can take quite a bit of time and can at times be frustrating. We do our best to make things as easy as possible, but building LibreOffice alone can take hours to complete.

Learn git

A bit of git experience is very helpful - luckily you only ever have to do a git clone once, but you should learn about branching, rebasing, pulling and pushing. I would strongly suggest using the logerrit scripts. I personally use the following to pull in changes onto my current working branch:

./g pull -r
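If you are curious what a rebasing pull does to your history, here is a minimal sketch using plain git in throwaway repositories. It assumes that ./g pull -r behaves like git pull -r for this purpose; the repository names (upstream, mine), the file names and the identity are made up for the demo.

```shell
#!/bin/sh
set -e

# Work in a temporary directory so nothing real is touched.
work=$(mktemp -d)
cd "$work"

# Helper (hypothetical): a throwaway identity so commits work on any machine.
ident() {
    git config user.name "Example Dev"
    git config user.email "dev@example.invalid"
}

# "upstream" plays the role of the shared repository.
git init -q upstream
(cd upstream && ident && echo base > base.txt && git add base.txt && git commit -q -m "base")

# "mine" is your clone, with one local commit on top.
git clone -q upstream mine
(cd mine && ident && echo mine > mine.txt && git add mine.txt && git commit -q -m "my local change")

# Meanwhile somebody else lands a commit upstream.
(cd upstream && echo theirs > theirs.txt && git add theirs.txt && git commit -q -m "upstream change")

# Pull with rebase: fetch the new upstream commit, then replay the
# local commit on top of it instead of creating a merge commit.
cd mine
git pull -q -r

git log --oneline
top=$(git log -1 --format=%s)
merges=$(git rev-list --merges --count HEAD)
```

Afterwards the local commit sits on top of the upstream one and the history contains no merge commits, which is what keeps the log linear.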

I strongly recommend spending a few hours on the following graphical tutorial on git, Learn Git Branching. This is well worth doing regardless of whether you contribute to LibreOffice - knowing how git works to the degree that this shows you is a skill that is immensely valuable. This tutorial demystified a lot of git concepts for me, and helped me "get" how git does things. Eventually you will get stuck in some git mess, so knowing how to reset is very valuable. 

I also recommend learning how to use interactive rebase in git, which is done via:

git rebase -i

What this does is take your changes and change the "base" of the commit in question. You can move the commit up or down the version history, reword specific commit messages, "squash" commits into the parent commit (be very careful about doing this) or edit the commit itself. When you run git rebase -i it opens up your text editor - just read what it states in the editor to learn how to use it.
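To try this safely, the following sketch builds a throwaway repository and folds one commit into its parent. Normally you would edit the todo list by hand; here sed stands in for the editor so the example can run unattended. The file name, commit messages and identity are invented for the demo, and the sed -i form assumes GNU sed.

```shell
#!/bin/sh
set -e

# Throwaway repository so nothing real is touched.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example Dev"
git config user.email "dev@example.invalid"

# Three small commits to play with.
for n in 1 2 3; do
    echo "change $n" >> notes.txt
    git add notes.txt
    git commit -q -m "commit $n"
done

# "git rebase -i HEAD~2" lists the last two commits, oldest first,
# each on a line starting with "pick". GIT_SEQUENCE_EDITOR replaces
# the interactive editor: sed rewrites the second line to "fixup",
# which folds commit 3 into commit 2 and keeps commit 2's message.
GIT_SEQUENCE_EDITOR="sed -i -e '2s/^pick/fixup/'" git rebase -i HEAD~2

git log --oneline
count=$(git rev-list --count HEAD)
```

Two commits remain, and notes.txt still contains all three changes. Using "squash" instead of "fixup" does the same fold but also opens the editor so you can combine the two commit messages.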

If you do ever mess up a rebase, incidentally, you can use the git reflog to back out of your change. Check out the Atlassian reflog tutorial on this; it could be a lifesaver. Another tutorial I used before I found the Atlassian one can be found here, but try the Atlassian one first.
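As a rough illustration of the recovery, the sketch below loses two commits on purpose and then gets them back through the reflog. The accident is simulated with a hard reset rather than a real rebase, to keep the reflog short; after an actual rebase you would read the git reflog output and pick the entry from just before the rebase started. The file name, commit messages and identity are invented for the demo.

```shell
#!/bin/sh
set -e

# Throwaway repository so nothing real is touched.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example Dev"
git config user.email "dev@example.invalid"

for n in 1 2 3; do
    echo "change $n" >> notes.txt
    git add notes.txt
    git commit -q -m "commit $n"
done
good=$(git rev-parse HEAD)

# Simulate the accident: throw away the last two commits.
git reset -q --hard HEAD~2

# The reflog still records every position HEAD has had.
git reflog

# HEAD@{1} means "where HEAD pointed one move ago",
# which here is the state just before the bad reset.
git reset -q --hard 'HEAD@{1}'
restored=$(git rev-parse HEAD)
```

After the second reset, HEAD is back on the original third commit and the working tree matches it again.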

Hot tip for vim fans

If you are a vim fan, then I suggest the following setup for your .vimrc file:

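" Use filetype-specific indentation rules
filetype indent on
" Indent by 4 columns and insert spaces instead of tab characters
set tabstop=4
set shiftwidth=4
set expandtab
" Highlight trailing whitespace in red
highlight ExtraWhitespace ctermbg=red guibg=red
match ExtraWhitespace /\s\+$/

" Strip trailing whitespace from C and C++ files just before they are written
augroup trailing_whitespace
  autocmd!
  autocmd BufWritePre *.c :%s/\s\+$//e
  autocmd BufWritePre *.h :%s/\s\+$//e
  autocmd BufWritePre *.cxx :%s/\s\+$//e
  autocmd BufWritePre *.hxx :%s/\s\+$//e
augroup END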

We don't use tabs; we indent with 4 spaces. Some time ago I discovered a way of showing extra whitespace at the end of lines directly in vim, and later, after some discussion on the LibreOffice developer mailing list, I discovered how to safely remove trailing whitespace on write.

Start Hacking!

When you are modifying the LibreOffice codebase, you'll find that you spend more time reading and grokking the code than actually modifying it. I'd consider this normal, so if you find this is occurring don't fret - we all have to do this. The LibreOffice codebase is massive: from memory, I think it has more lines of code than the Linux codebase, and it has been in continuous development since 1988.

When I started on LibreOffice hacking, I found I was most interested in finding the code entry point. Frankly, I was only initially reading the code for pleasure (yes, I'm strange that way) and so I used http://opengrok.libreoffice.org to browse the code. 

However, many years before I had actually attempted to compile OpenOffice.org - and failed rather spectacularly as I ran out of disk space. Whilst this was occurring I did a lot of reading of OpenOffice.org developer manuals - I really recommend reading OpenOffice.org's Developer Guide. Don't be put off by the fact that it is hosted on the competing Apache OpenOffice site: it's really very high quality, and whilst the LibreOffice codebase is more actively developed, there is still a lot of really useful information on the Apache OpenOffice sites.

Once you have read enough, have a look at our EasyHacks list. This is a list of things that need to be done in the LibreOffice codebase that are designed for newbies - and if you ever want to participate in Google Summer of Code you'll need to demonstrate that you've worked on a few bugs or submitted a few commits to deal with easy hacks. You'll also know that you've contributed to a codebase that literally millions of people use every day, and your name will be in the commit history. :-)

Join in with the development community

LibreOffice is a very friendly environment. We mostly go out of our way to help out newbies, as we want to encourage as many participants as possible. As such, you can join us on the #libreoffice-dev IRC channel on Freenode. We largely don't answer user questions there (for that there is #libreoffice), but if you are getting stuck on something feel free to ask on the channel.

The other way of joining in is to subscribe to the developer mailing list. You can either subscribe to the digest, or get every message directly - in which case I suggest setting up a filter or else you might drown in emails. If you do get digest messages, please don't reply to the digest directly but copy the message you want to respond to into the email, and retain the original subject line, prefixed with "Re:". We suggest setting up the Reply-To header on your mail client, but it's good practice to ensure that the original mail recipients are cc'ed into your responses.

(Also, please don't top-post. I'm a chronic offender myself. Don't be me.)

Commit messages

A final tip: commit messages are really important. I personally tend to write longer messages detailing the work that I've done, but that's just my own style. The most important thing about commit messages is they must be to the point and explain what you have done, not what the code used to do. If your code change is complicated or not necessarily easy to follow, then it's best to try to leave an explanation of what you are doing. The commit title should be clear enough that someone browsing the git log in cgit can understand the purpose of the commit. 

If your commit is easy to understand or minor, then only a commit message title is needed. If you need to explain things in more detail in the commit message body, as I said, you should state what you did. 

An example that shows both what you should and shouldn't do is the commit where I fixed an issue where the PPI wasn't being exported to JPEGs correctly.

Good things about my commit message
  • Note that as I fixed a bug, I prefixed the commit message short title with tdf#85761. There is a bot that reviews the git logs and if it finds this in the message title it automatically updates the Bugzilla bug. You can only reference one bug, however.
  • I summarised my change that we don't hardcode JPEGs to 96 DPI any more but use mapmode pref size to get the DPI, and moved the scaling function from the EPS filter code to the MapMode class. 
  • It explains what the problem actually is, as it was a bit difficult to understand the cause of the issue from the original bug report. 
Bad thing:
  • I still spent too much time explaining what the code originally did (thus violating my own recommendation I give in this blog post).
FWIW, it can still be useful in explaining what the code did, but I'd leave that to a note in the last paragraph - and I recommend you make it brief.
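
To make the title/body split concrete, here is a small sketch in a throwaway repository. The wording of the message is illustrative, paraphrased from the summary above, and is not the text of the actual commit; the file name and identity are invented too.

```shell
#!/bin/sh
set -e

# Throwaway repository so nothing real is touched.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example Dev"
git config user.email "dev@example.invalid"

echo "placeholder change" > export.txt
git add export.txt

# The first -m becomes the title line (bug prefix first, then a short
# summary of what the commit does). The second -m becomes the body,
# separated from the title by a blank line.
git commit -q \
    -m "tdf#85761 get JPEG export DPI from the MapMode pref size" \
    -m "JPEG export no longer hardcodes 96 DPI. The scaling function moves from the EPS filter code to the MapMode class."

# %s prints only the title, which is what a one-line log view shows;
# %b prints only the body.
title=$(git log -1 --format=%s)
body=$(git log -1 --format=%b)
printf 'title: %s\nbody:  %s\n' "$title" "$body"
```

For a minor change you would drop the second -m and let the title stand on its own.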