Planet.documentfoundation.org

LibreOffice's under-the-hood progress in 4.1.0 (beta)

2013-06-13

Rather soon we will be releasing LibreOffice 4.1; currently we're in
a Beta phase of that, and we appreciate people getting stuck in and helping
with testing. You can download builds from the pre-releases page or,
if you like, some up-to-the-hour builds from
dev-builds.

We're still building our list of features and credits. We have a
number of new visible
features of course with credits against them. Cor has made a pair of beautiful
blog entries highlighting UI improvement and the Photo Album features in 4.1.
That made me think of the many developers who
have been working extremely hard on things that are under the covers and not
so easily seen, but are still really important. I'd like to explain just some
highlights of that here (crediting the developers' employer, where there is one, at
the first mention). Often these are tasks that are easy to get involved with,
and may seem trivial in isolation but cumulatively add up to a code-base that is
far easier to understand and to contribute to.

Build system: configure / make

One of the tasks that has most irritated new developers, and distracted them
from doing interesting feature work on the code-base over many years, has been
our build system.
At the start of LibreOffice, there was an incomplete transition to using GNU
make, which required us to use both the horrible old dmake tool and
gnumake. Configure used a Perl script to generate a shell script
that set a number of environment variables, which had to be sourced into your
shell in order to compile (making it impossible to re-configure
from that shell). On top of that, a Perl build script batched compilation with two
layers of parallelism, forcing you to over- or undercommit on any modern
builder. It looked something like this:

Thanks to the sterling efforts of Björn Michaelsen (Canonical), David Tardon (Red Hat),
Peter Foley, Norbert Thiebaud, Michael Stahl (Red Hat), Matúš Kukan,
Tor Lillqvist (SUSE), Stephan Bergmann (Red Hat), Luboš Luňák (SUSE),
Caolán McNamara (Red Hat), Mathias Bauer (Oracle), Jan Holesovsky (SUSE),
Andras Timar (SUSE), David Ostrovsky, Hans-Joachim Lankenau (Oracle)
and more (more details), the
126 thousand targets and 1700 makefiles are now fully converted to GNU make,
so we have the significantly simpler:

No shell pollution, no 'bootstrap' script, no Perl build wrapper, no
obsolete 'dmake' required, just plain GNU make files—and incredible build
parallelism—after generating headers, we could utilize a thousand CPUs.
This is a clean-cut task with a clear boundary; like the process of removing
dead code in previous releases, it is now complete—freeing up developers
for more interesting things.

Build system: make dev-install

LibreOffice, in contrast to much other software, is fully
relocatable: you can plonk it down where you like, and run it from there.
As such we use
make dev-install to create an install set in install/
that you can run in the build tree. This process has traditionally been
performed by a Perl script using a convoluted set of pre-processed rules, to
achieve what is (mostly) a copying operation. David Tardon has made some
great progress moving this to use much simpler file-lists that we auto-generate.
So—nowadays we have an instdir/ top-level (on which these file-lists
operate) that starts to mirror the install—the hope being to do away with the
make install phase for running inside the build tree. So far we have
more than 250 file lists, handling nearly 20k files.
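As a rough illustration of why file-lists are simpler than pre-processed rules, here is a minimal Python sketch of a file-list driven copy. It is not the real build logic: the file-list format (one relative path per line, `#` comments), the function names and the example paths are all assumptions made up for this sketch.

```python
import shutil
import tempfile
from pathlib import Path
from typing import List


def install_from_filelist(filelist: Path, source_root: Path, instdir: Path) -> List[Path]:
    """Copy every file named in `filelist` from `source_root` into `instdir`.

    Assumed format: one relative path per line; blank lines and lines
    starting with '#' are ignored. The real file-lists may differ.
    """
    copied = []
    for line in filelist.read_text().splitlines():
        rel = line.strip()
        if not rel or rel.startswith("#"):
            continue
        target = instdir / rel
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(source_root / rel, target)
        copied.append(target)
    return copied


def demo_install() -> List[str]:
    """Build a tiny fake tree in a temp directory and install it via a file-list."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        src = root / "workdir"
        (src / "program").mkdir(parents=True)
        (src / "program" / "soffice").write_text("binary placeholder")
        (src / "share").mkdir()
        (src / "share" / "readme.txt").write_text("docs placeholder")

        filelist = root / "example.filelist"  # hypothetical file-list name
        filelist.write_text("# hypothetical list\nprogram/soffice\nshare/readme.txt\n")

        instdir = root / "instdir"
        copied = install_from_filelist(filelist, src, instdir)
        return sorted(p.relative_to(instdir).as_posix() for p in copied)
```

Adding or removing a file from the install set then amounts to editing one line of a list, which is the property the paragraph above is after.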

This initiative makes it significantly easier to add or remove files
from install, and removes lots of zipping and un-zipping of sets of files that
used to happen during the build, thus making packaging a build faster: the
SDK packaging went from 90s to 30s or so, while also dropping lots of
scp2/ rules. The hope is that, when this is complete, we will have
an office suite that is runnable out of the box after a make, without an extra
install phase.

Code cleanup / linting

A huge amount has been done here to make the code-base easier to
understand. Doing this makes it easier and quicker for us to read the code,
check it is correct, understand the flow—and so to add features or fixes.

sane includes

In the bad old days each module used to have an inc/ directory inside itself
where its external include files were concealed. During the build
of each module, these were copied to a separate artifacts directory
(the 'solver') and the next module was compiled against those copies.
This led to a number of problems:
debuggers identifying copies of headers, newbies editing the
wrong (solver) headers, performance issues on Windows, and more.
So, thanks to Bjoern Michaelsen, Matúš Kukan and
Michael Stahl for moving all the headers to
a single top-level include/ directory and de-crufting
the makefiles to make that nice.

tools cleanup

The tools/ module has a lot of
duplicate functionality that is not needed; in this cycle we removed
a complete duplicate file-system abstraction by writing it out of
the code, thanks to Tomas Turek, Krisztian Pinter, Thomas Arnhold,
Marcos Paulo de Souza & Andras Timar. It is always good for
security to remove yet another duplicate, cross-platform,
safe temporary file creation code-path.

String cleanups

We continued to make good progress on the
removal of the obsolete UniString class, with a couple more method
removals from Jean-Noël Rouvignac & Caolán McNamara.
In addition Lubos Lunak did a mass removal of redundant rtl::
namespace prefixes all across the code for OUString and OString,
making the code more readable, along with a number of other significant
performance and cleanliness improvements. Large numbers of call
sites were upgraded from UniString to OUString, had their redundant
RTL_USTRING_CONSTASCII macro bloat removed, and
used faster ways of concatenating strings. Thanks to:
Olivier Hallot, Christina Rossmanith, Stephan Bergmann,
Chris Sherlock, Peter Foley, Marcos Paulo de Souza, José
Guilherme Vanz, Jean-Noël Rouvignac, Markus Mohrhard,
Ricardo Montania, Donizete Waterkemper, Sean Young, Thomas Arnhold,
Rodolfo Ribeiro Gomes, Lionel Elie Mamane, Matteo Casalin, Janit Anjaria,
Noel Grandin, Tomaž Vajngerl, Krisztian Pinter, Fridrich Strba (SUSE),
Gergő Mocsi, Prashant, Ádám Csaba Király, Kohei Yoshida—and more
I missed in the log (mail me).

component service registration

Noel Grandin continued his indomitable
work to clean up all call-sites so that they create components with new-style
service constructors, with lots of other associated improvements: around
two hundred and fifty new commits in 4.1.

Code quality work

Perhaps the least visible kind of improvement is crasher bugs that
are not there anymore. Clearly the goal is never crashing, but how do we get there?
Markus Mohrhard worked on a lovely set of automated
tests to load over twenty-four thousand files of the most evil and twisted
kind, i.e. the contents of all the bugzillas we could scrape. Thanks to some great work
from Markus, Fridrich Strba (SUSE), Michael Stahl and Eike Rathke (Red Hat) in
fixing the results, we hope users will enjoy fewer sightings of our ugly crash
dialog.

Another source of significant improvement was the use of static
checking tools to increase code quality, and hence reliability. In this release a
team started systematically going through the Coverity data. This yielded nearly three hundred commits, thanks to
Markus Mohrhard, Julien Nabet, Norbert Thiebaud, Caolán McNamara,
Marc-André Laverdière (TCS), and others. In addition Julien Nabet got
over sixty fixes from the cppcheck tool included. Lastly, lint-wise, we continue to
use Clang and Lubos' nice plugins
to find and remove questionable code as it appears.

Another great tool that has improved here is bibisect, which gives us
a git repository with binaries from every few dozen previous commits included
inside it. This allows
end-user testers to find very precisely where a given bug was introduced into
the product using bisection of lots of binary builds crammed into a single git
repository. Thanks to Bjoern Michaelsen & Canonical's QA labs for more
build hardware here.
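The idea behind bisecting over binary builds can be sketched in a few lines of Python. This is only an illustration of the search strategy, not the bibisect tooling itself; the build names and the `is_bad` callback (standing in for a tester launching a build and checking for the bug) are hypothetical.

```python
from typing import Callable, List, Sequence


def first_bad_build(builds: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Return the earliest build that shows the bug.

    Assumes `builds` is ordered oldest to newest, that the newest build is
    bad, and that once the bug appears it stays in every later build.
    """
    lo, hi = 0, len(builds) - 1  # invariant: builds[hi] is known (or assumed) bad
    while lo < hi:
        mid = (lo + hi) // 2
        if is_bad(builds[mid]):
            hi = mid          # bug already present: look at older builds
        else:
            lo = mid + 1      # bug not yet present: look at newer builds
    return builds[lo]


def demo_bisect() -> tuple:
    """Bisect 100 hypothetical builds where the bug first appears in build 37."""
    builds: List[str] = [f"build-{i:03d}" for i in range(100)]
    probes: List[str] = []

    def is_bad(build: str) -> bool:
        probes.append(build)  # record each build a tester would have to try
        return int(build.split("-")[1]) >= 37

    return first_bad_build(builds, is_bad), len(probes)
```

With a hundred builds in the repository, a tester only has to try a handful of them to narrow a regression down to a small range of commits.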

We also built and executed more unit tests with LibreOffice
4.1 to avoid regressions as we change the code. These are rather hard to
measure, since people like to pile up new tests inside existing unit test
modules. By grepping for the CPPUNIT_TEST registration macro we
can see that we added around a hundred such tests to 4.1; the majority
of these were added to calc, with significant gains in writer, chart2,
connectivity and impress. Thanks to Miklos Vajna (SUSE), Kohei
Yoshida (SUSE), Noel Power (SUSE), Markus Mohrhard, Luboš Luňák, Stephan
Bergmann, Michael Stahl, Noel Grandin, Eike Rathke, Julien Nabet, Caolán
McNamara, Jan Holesovsky, Thomas Arnhold, Tor Lillqvist, David Ostrovsky,
Pierre-Eric Pelloux-Prayer (Lanedo), Christina Rossmanith and others for
working on the tests.
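The counting approach described above can be approximated as follows. This is a hedged sketch rather than the script actually used for the numbers in this post: the file paths and test names are invented, and the "module" is simply assumed to be the first path component.

```python
import re
from collections import Counter
from typing import Dict

# Matches CPPUNIT_TEST(...) registrations but not CPPUNIT_TEST_SUITE(...)
# or CPPUNIT_TEST_SUITE_END(), because the name must be followed directly
# by optional whitespace and an opening parenthesis.
CPPUNIT_TEST_RE = re.compile(r"\bCPPUNIT_TEST\s*\(")


def count_registered_tests(sources: Dict[str, str]) -> Counter:
    """Count CPPUNIT_TEST registrations per top-level module.

    `sources` maps a repository-relative path to the file's text.
    """
    counts: Counter = Counter()
    for path, text in sources.items():
        module = path.split("/", 1)[0]
        counts[module] += len(CPPUNIT_TEST_RE.findall(text))
    return counts


# Hypothetical file contents, for illustration only.
EXAMPLE_SOURCES = {
    "sc/qa/unit/example.cxx": (
        "CPPUNIT_TEST_SUITE(Test);\n"
        "CPPUNIT_TEST(testFirst);\n"
        "CPPUNIT_TEST(testSecond);\n"
        "CPPUNIT_TEST_SUITE_END();\n"
    ),
    "sw/qa/core/example.cxx": (
        "CPPUNIT_TEST_SUITE(Test);\n"
        "    CPPUNIT_TEST(testThird);\n"
        "CPPUNIT_TEST_SUITE_END();\n"
    ),
}
```

Running the same count over two release branches and subtracting gives a per-module estimate of newly registered tests, with the caveat already mentioned that it says nothing about how much each test covers.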

Calc core refactoring

One of the reasons why Calc gained so many badly needed, systematic
unit tests for previously un-covered code was the very
significant re-factoring work going on in the core. For many years, calc was
architected under the delusion that a spreadsheet is composed of cells,
which created some serious scalability and performance problems. The end goal
of this work is to kill ScBaseCell completely, and move to storage
of spans of contiguous data of uniform type down a column. Some of the initial
work for this is in place in 4.1, but the full benefit will have to wait
at least until 4.2 or even later versions, when we can make further adjustments to
take full advantage of the new cell storage structure. The aim with 4.1 is to have
no visible performance regression, perhaps some minor speedups and memory footprint
reductions in some areas, but more importantly, better code maintainability thanks
to the separation of the cell broadcaster mechanism from the cell storage itself.
Thanks to Kohei Yoshida for his great work here.
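To make "spans of contiguous data of uniform type down a column" a little more concrete, here is a toy Python sketch. It is only an illustration of the storage idea, not Calc's actual data structure; the block layout and function names are assumptions for this example.

```python
from typing import Any, List, Tuple

# One block: (first row index, type name, values for the consecutive rows)
Block = Tuple[int, str, List[Any]]


def to_blocks(cells: List[Any]) -> List[Block]:
    """Group a column's values into runs of adjacent rows sharing a type."""
    blocks: List[Block] = []
    for row, value in enumerate(cells):
        kind = "empty" if value is None else type(value).__name__
        if blocks and blocks[-1][1] == kind:
            blocks[-1][2].append(value)   # extend the current run
        else:
            blocks.append((row, kind, [value]))  # start a new run
    return blocks


def sum_numeric(blocks: List[Block]) -> float:
    """Sum a column by visiting only its numeric blocks."""
    return sum(sum(values) for _, kind, values in blocks if kind == "float")


EXAMPLE_COLUMN = [1.0, 2.0, 3.0, "total", None, None, 4.5]
```

The point of the layout is that an operation such as a sum can skip whole text or empty spans and run over plain arrays of numbers, instead of inspecting one cell object at a time.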

German Comment Translation

It is always encouraging to build the metrics: in the last release cycle
we lost nearly five thousand lines of German comments, translated into
English. That helps new developers get started on the code, understand it
and get developing faster. The rough graph of this (which unfortunately
includes a number of false positives for lines of German) looks like this:

With many thanks to Urs Fässler, Christian M. Heller,
Philipp Weissenbacher, Luc Castermans, David Verrier, Chris Sherlock,
Joren De Cuyper, Thomas Arnhold, Philipp Riemer, and others. Help
appreciated from German speakers with translating the last sixteen-thousand
lines—it's a matter of checking
the code out and running bin/find-german-comments on a module,
translating a few lines and mailing a git diff to libreoffice At
lists.freedesktop.org (no subscription required).

Completed Wizard conversion to python

Java remains an excellent, if not the preferred, environment for
writing cross-platform extensions. All the existing Java support and
APIs remain as before. Having said that, on some platforms Java is not
available, and as such using our bundled, internal python runtime makes
good sense for built-in features.

In this release we completed porting the Java wizards, which can be
used from the File->Wizards menu, to Python. This should give a better
experience for Windows users who are not lucky enough to have a JRE
installed. Many thanks to Xisco Fauli and Javier Fernandez (Igalia).

Linking & startup

One of the key features required to get the LibreOffice prototypes
running on Android and iOS was to be able to link nearly all our code into a
single shared library (Android) or executable (iOS). This work is re-used with an
--enable-mergelibs configure option—which aggregates much of
the common code of LibreOffice into a huge, single shared library: much as is
done with Mozilla. This is increasingly the default choice for Linux distribution
builds, and should yield improved seek and hence cold-start performance. Work
remains to be done on code re-ordering, and PGO to further improve startup
performance. Many thanks to Matúš Kukan (for the Raspberry Pi Foundation)
and Tor Lillqvist for working on this.

Another startup performance feature kindly funded by the Raspberry Pi
Foundation is to reduce the amount of configuration data pointlessly parsed
during startup. One nice win in this area was removing fourteen thousand lines
of data for printing sheets of labels from our configuration, and deferring
that parsing until someone wants to print a label; thanks to Matúš Kukan
for that too.
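Deferring a parse until first use is a general pattern, and a minimal Python sketch of it looks like this. The class, the loader function and the label data are all hypothetical stand-ins, not LibreOffice's configuration code.

```python
from typing import Callable, Dict, List, Optional


class LazyConfigSection:
    """Holds a loader and only runs it the first time the data is requested."""

    def __init__(self, loader: Callable[[], Dict[str, List[str]]]) -> None:
        self._loader = loader
        self._data: Optional[Dict[str, List[str]]] = None
        self.parse_count = 0  # only here so the sketch can show when parsing happens

    def get(self) -> Dict[str, List[str]]:
        if self._data is None:
            self._data = self._loader()
            self.parse_count += 1
        return self._data


def parse_label_definitions() -> Dict[str, List[str]]:
    """Stand-in for an expensive parse of label sheet data (invented values)."""
    return {"Example Brand": ["Sheet A", "Sheet B"]}


labels = LazyConfigSection(parse_label_definitions)
# Nothing has been parsed at "startup"; the cost is paid on the first get().
```

Startup pays nothing for the label data, and a user who never prints labels never pays for it at all.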

New type format

The programming interfaces that are used in LibreOffice require type
information to inform their work, particularly for scripting. In the past this
was stored in some ancient, inefficient, legacy binary database. Thanks to
Stephan Bergmann (Red Hat) we now have a new, more efficient and compressed
binary format, with our main offapi.rdb shrinking ten-fold from
6.5MB to 0.65MB; more details are in his Well Typed Uno talk at FOSDEM.
So far this format is used only for private, internal type information, and we
plan to remain fully backwards compatible for extensions that provide old-style
type information. Documentation of the format is available in the source tree:
nowadays we have increasingly detailed structural / overview documentation in
each module's README file.

Miscellaneous

Other areas showed some great improvements:

Time

The resolution of the time-related datatypes in UNO (LibreOffice's API)
has been increased to nanoseconds, from centi- and milliseconds.
This is mainly useful in Base,
where LibreOffice will no longer truncate
times and timestamps to centiseconds,
nor durations to milliseconds,
in user data (Lionel Elie Mamane).
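A small worked example shows what the old resolutions would discard. The helper below is purely illustrative arithmetic in Python, not UNO API code, and the sample value is made up.

```python
NS_PER_MILLISECOND = 1_000_000
NS_PER_CENTISECOND = 10_000_000


def truncate_ns(nanoseconds: int, unit_ns: int) -> int:
    """Drop everything finer than `unit_ns`, keeping the result in nanoseconds."""
    return (nanoseconds // unit_ns) * unit_ns


# A hypothetical sub-second fraction stored in a database: 0.123456789 s
fraction_ns = 123_456_789

as_centiseconds = truncate_ns(fraction_ns, NS_PER_CENTISECOND)  # 0.12 s
as_milliseconds = truncate_ns(fraction_ns, NS_PER_MILLISECOND)  # 0.123 s
lost_at_centiseconds = fraction_ns - as_centiseconds            # 3,456,789 ns dropped
```

With nanosecond resolution the value round-trips unchanged, which matters when a database column stores finer-grained timestamps than the old types could hold.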

Base

In a form, DatabaseListBox now exposes the selected value(s)
(as opposed to the selected display strings)
to the scripting interface (Lionel Elie Mamane).

UI migration to Glade XML

The UI migration to Glade layout based XML files continued
apace with contributions from many individuals. We managed to go from
64 .ui dialog descriptions in 4.0 to 230
in the 4.1 branch (so far): quite a jump towards completeness at five
hundred dialogs. Thanks to Caolán McNamara, Krisztian Pinter,
Jack Leigh, Alia Almusaireae (KACST), Katarina 'Bubli' Behrens,
Abdulaziz A Alayed (KACST),
Jan Holesovsky, Faisal M. Al-Otaibi (KACST), Abdulmajeed Ahmed (KACST),
Andras Timar, Manal Alhassoun (KACST), Albert Thuswaldner,
Olivier Hallot, Miklos Vajna, Abdulelah Alarifi (KACST),
Gokul Swaminathan (KACST), Rene Engelhard, and
others. It is also worth mentioning the great work done by
translators to check & update strings here. The most significant
benefit of the UI migration is finally making it extremely painless
to tweak and improve the user interface.

Debugging output

There are new SAL_INFO and SAL_DEBUG macros which make it easy to add
filtered, or temporary debugging output. Our git hooks warn if you
leave any SAL_DEBUG statements around on commit too.

Gallery building

LibreOffice has been lumbered with
a rather hideous format in which to store galleries. We generally ship
the gallery images as standalone files, but have a set of binary
resources containing thumbnails of these, and unique integers to refer
to their translated names, which ship in the LibreOffice binary. In
4.1 we build most of these on each platform during compile, making them
easy to extend (and avoiding having impenetrable binaries in git), and
we translate the theme name with a new .desktop syntax
file alongside. This should also make it easy for users to build their
own galleries as extensions and ship them with translated names.

Intermediate SDF removal

While we removed SDF from our developer-facing translation flow
for 4.0, we still generated some SDF files as temporary build
intermediates. Thanks to Tamás Zolnai for moving us to a pure
.po solution.

Getting involved

I hope you get the idea that more developers continue to find a home
at LibreOffice and work together to complete some rather significant work both
under the hood, and also on the trim. I've enjoyed hacking on several of these
improvements. Our hope is that as the on-ramp to the project gets less
precipitous, people will join us, and find out how fun, and how much easier
it is to improve the code these days. You'll also be in good company—first in
terms of the number of code contributors to collaborate with:

And also in terms of diversity of code commits: we love to see
the unaffiliated volunteers' contribution by volume, though clearly the volume
and balance change with the season, release cycle, and time available for
mentoring:

Of course, we maintain a list of small, bite-sized tasks which you
can use to get involved at our Easy Hacks
page, with simple build /
setup instructions. We now have a cleaner, and safer environment to work
on improving the code.

One of the easiest things to do is to help out with bug reporting and
bug triage (confirming
and quality checking other people's bug reports). You can be an effective triager
with little experience, and good bug reports really help developers out. Just
grab and install a pre-release and
you're ready to contribute alongside the rest of the development team. Even
better, you could get involved with the fun QA
Bug Triage Contest and win a prize.

Conclusion

LibreOffice 4.1 will be another milestone, and we hope a yet-higher
watermark for code-quality, design improvement, and incrementally more solid
foundations for improving the best office suite in the world. Of course, with
so much changing, we really appreciate early testing of our betas and
release candidates, which (we hope) should be useful for doing work with,
though save regularly and generationally. If you haven't time to test our betas
or release candidates, our time-based
release plan predicts our final release date at the very end of July.
Thank you for supporting LibreOffice.