Planet.documentfoundation.org

Collabora Online Development Edition 2.0


2016-11-02

Today we release CODE 2.0, which includes Collaborative Editing. We've done a huge amount
of work since CODE 1.0 - and many of these improvements have been back-ported
for our customers & community, but it is perhaps well to credit the authors
in one place and survey progress over the last six months.

There have been 1200 commits to 'online' and around 500 to the LibreOffice
core since we branched for 1.0. While we celebrate and credit the fantastic work
of the LibreOffice community without which there could be no LibreOffice Online - it
is clear that Collabora is the architect and driving force behind putting LibreOffice
in the Cloud - we have 14 online commits from non-collaborans. On the flip side we're
eager to change that - there is lots of low-hanging fruit here for web developers,
and we love contributions. Anyhow - to the details:

Testing & Debugging

On the way to 2.0 our unit tests improved very significantly with 230
commits from ~all developers, catching all manner of horrible corner-case
errors from crashing lokits to out-of-disk-space conditions.

Another very powerful tool created by Laszlo Nemeth is the
client JS debugging mode pictured:

This has a large number of really useful development features:

Showing where the last series of invalidations
occurred, the most recent in block red, older with red
borders - notice how LibreOffice has quite efficient
invalidation / re-rendering behaviour.

Can't be seen in this shot - but rendering in blue
invalidated tiles waiting for refresh, and showing which
tiles are served from the cache in yellow.

Lots of measures of latency, count of tiles rendered etc.

Logging of all protocol messages to and fro to the JS
console when this mode is enabled

a 'Typing' check-box, you can use to get stock text
typed in at your cursor for stress testing.

These invaluable debugging features make isolating the locus of
a problem far easier.

WOPI operation timing and measurement improvements - to
allow us to improve load performance thanks to Pranav Kant along
with improved logging to better capture the volume of diagnostic
output we want thanks to Jan 'Kendy' Holesovsky & Ashod Nakashian.

Session record and replay - an extremely useful
diagnostic approach for logging and reproducing user problems, of
course with horrible privacy implications but great for debugging,
and also benchmarking:

Concurrency

While each LibreOffice Kit process is fundamentally single-threaded
and isolated into its own jail, we serve many hundreds of clients per WSD
daemon involving a certain amount of concurrency excitement. In
2.0 Ashod Nakashian removed a per-connected-client thread count
from the slave kit processes, significantly simplifying things; he
also did a lot of lock auditing, session cleanup and admin console state
synchronisation improvements.

Meanwhile Pranav Kant ensured that users' WOPI credentials
are tracked such that the right user is tagged as having hit the explicit
'save' (of course we also auto-save regularly).

Other cleanups and fixes here include switching to un-named pipes
thanks to Tor Lillqvist, catching and fixing
races around tile rendering (Ash) and avoiding a race when loading
the same document multiple times concurrently thanks to Kendy.

Performance & Memory

Performance and interactivity are significantly better in 2.0
thanks to a large number of improvements. Tile requests are now prioritized
such that tiles near active cursors are rendered and returned before
others thanks to Ash and Kendy who also re-wrote the tile request,
combining, de-combining, and re-aggregating in the core to ensure that
we render the largest area possible to avoid the costs of repeated
in-document image scaling. Ash also added a tile versioning
scheme to ensure that we always get the latest, correct tile under
heavy load and avoid duplicate painting. Kendy and Tor added
a prioritisation scheme for presentation slide thumbnailing to improve
interactivity.
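
The cursor-proximity prioritisation described above can be sketched roughly as follows. This is a minimal illustration and not the loolwsd implementation: the `Tile` class, the 256-unit tile size and the squared-distance metric are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tile:
    """A pending tile request (hypothetical: top-left corner plus edge size)."""
    x: int
    y: int
    size: int = 256

    def center(self):
        return (self.x + self.size / 2, self.y + self.size / 2)


def distance_to_nearest_cursor(tile, cursors):
    """Squared distance from the tile centre to the closest active cursor."""
    cx, cy = tile.center()
    return min((cx - ux) ** 2 + (cy - uy) ** 2 for ux, uy in cursors)


def prioritize(tiles, cursors):
    """Order pending tile requests so tiles near any active cursor come first."""
    if not cursors:
        return list(tiles)  # nothing to prioritise against: keep arrival order
    # sorted() is stable, so equally distant tiles keep their arrival order.
    return sorted(tiles, key=lambda t: distance_to_nearest_cursor(t, cursors))
```

With several users editing, `cursors` would hold one position per active view, so a tile next to anybody's cursor jumps the queue.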

Other improvements include using LD_BIND_NOW to have
all the linking performed by the forkit process once and not again.
Ash also improved the forkit child spawning and lifecycle
management.
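
As a rough illustration of the LD_BIND_NOW point: the variable asks the dynamic linker to resolve all symbols when a program is loaded rather than lazily on first call, so a long-lived parent that later forks its children pays that cost once. The sketch below only shows how such a parent might be launched with the variable set; the helper name is hypothetical and this is not how loolwsd itself starts forkit.

```python
import os
import subprocess
import sys


def spawn_with_eager_binding(argv):
    """Run a child with LD_BIND_NOW set so symbols are resolved at load time."""
    env = dict(os.environ)
    env["LD_BIND_NOW"] = "1"  # any non-empty value enables eager binding
    return subprocess.run(argv, env=env, capture_output=True, text=True, check=True)


if __name__ == "__main__":
    # Stand-in for the real parent process: just echo the variable back.
    result = spawn_with_eager_binding(
        [sys.executable, "-c", "import os; print(os.environ.get('LD_BIND_NOW'))"]
    )
    print(result.stdout.strip())
```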

Pranav profiled and optimized the WOPI GetFileInfo usage
to reduce it from three calls to a single one, as some implementations of
this are unexpectedly expensive. Marco Cecchetti
made SSL support run-time configurable to make it easier to
configure SSL off-load / acceleration in large setups, and Ash
added compiled-in limits to avoid misleading enterprise users that CODE
is suitable for deployment at scale without support. CODE
remains firmly targeted at home users, and Henry added a warning for too many users/connections.

User Experience

The most obvious improvement in 2.0 is collaborative editing,
which has a lot of UX elements to enable users to see other users' cursors,
selections, with name popups when they are active, or mouse-overed, and more
thanks to Henry Castro. There are also a number of nice new elements
such as the user-list pop-up in the bottom toolbar containing the
colors of users' cursors thanks to Pranav - you can also click to
jump to that user's position in the document, which is neat.
Pranav also dunged out lots of the shared editing pieces from 1.0.
Meanwhile Miklos Vajna did lots of heavy-lifting around Undo & Redo
which gets significantly more complex with multiple users. Our eventual solution
is to build an infrastructure for simple de-confliction, while also
allowing a Repair Document power mode to unwind the rare
Ctrl-a, … issues. This seems to work
really well under our normal collaborative editing conditions.

Various general cleanups have been implemented such as re-locating,
and anti-aliasing the progress-bar and spinner thanks to Pranav,
who also added git hashes to help->about dialog - with Andras
adding cgit web links to make QA versioning trivial.
Andras also did lots of l10n enablement work to the Admin Console,
slide layouts, status bar, colour pickers, menus and context menus, while
Pranav added context sensitive enablement to menubar items.
Meanwhile Kendy switched us to use larger icons, and
improved the toolbar look, while Andras made it easy to escape
menus with the Escape key. Meanwhile Henry added status bar items to all
of the components - with much requested features like calc selection quick
summing, and powerful word-count in writer.

Calc Improvements

One beautiful win for calc users is Henry's tweak to highlight
row and column headers to show cursor and selection positions. He also took
the time to implement lots of other nice wins around row & column headers
from drag & drop re-sizing, double-click to optimal size, as well as
adding context menus to show & hide rows & columns.

Meanwhile Marco fixed a pernicious range issue with
page-up/down to make editing smoother while Pranav fixed some
formula-bar ergonomics and Henry tackled the interactive auto-sum
functionality. Henry also extended the toolbars and menus to include
sorting, simple number formats, merge & center, wrap text and more.
Meanwhile, for this release, Andras disabled zooming in calc -
interestingly our competition don't do this either - and the non-linear
co-ordinate space caused by row height rounding was causing real issues.

Writer Improvements

Meanwhile in writer Andras significantly extended the format
menu allowing simple page size, line-spacing, alignment and other simple
formatting. Pranav improved the context menus exposed
on in-document comments, to allow replying and deleting, while Andras
added foot & end-notes, page & column break insertion, as well as
Table of Contents and image wrap & anchor context menu items for one of
our customers.

General UX bits

Lots of other miscellaneous but useful wins went in such as Henry
fixing Ctrl-f to focus the 'find' toolbar and Pranav mapping
Ctrl-s to .uno:Save. A volunteer Feyza Yavuz added the insert
comment toolbar button. Faruk Uzun implemented a draggable insert table
toolbar button grid, and Pranav customized a nice read-only mode for
users without edit permission.

Collaboration

Collaboration was the largest, and the most substantial part of the 2.0 work.
It is hard to list everything that was done there though - it is
infrastructure work, many parts of that are implementation details, it was
done incrementally, and also with some trial & error on the way; really
more of a research task than just plain development.

Conceptually, we wanted to reuse as much of existing code as possible - as we
always do - which was luckily possible for the collaboration too. LibreOffice has
a feature that allows users to open multiple views of the same document: Try
Window -> New Window in your desktop LibreOffice. You will see
that this already gives you multiple cursors, and multiple selections.
We "just" mapped the multiple document views to the multiple users. That
means, on the server, there is only one document open, but with many views -
one view for one user.

Of course, the hard part was to make it all fit together: Each user (view)
needs to see the cursors and selections of the other users (views) which is
something that LibreOffice did not provide previously. There were many bugs
where an update in one view did not trigger an update of the other view (which is
probably because the feature itself has not been much used until now). Lots
of code had to be updated and fixed. And also, lots of decisions had to be made
about the updates: which changes have to be broadcast to all the views, which
changes apply only to the current view, and which apply to views other than
the current one. Not to mention all the corner cases when one of the views
is closed in the middle of the broadcast to all the views, etc. You get
the picture.
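
To make the "one document, many views" idea a little more concrete, here is a small sketch of the three kinds of notification discussed above, including the corner case of a view closing in the middle of a broadcast. The `Scope` and `Document` names and the callback signature are invented for illustration; the real LibreOfficeKit API differs.

```python
from enum import Enum


class Scope(Enum):
    ALL = "all"        # every view, e.g. a change to document content
    SELF = "self"      # only the originating view, e.g. its own cursor
    OTHERS = "others"  # every view but the originator, e.g. "someone else's cursor"


class Document:
    """One loaded document shared by many views (one view per user)."""

    def __init__(self):
        self._views = {}  # view id -> callback taking (event, payload)
        self._next_id = 0

    def create_view(self, callback):
        view_id = self._next_id
        self._next_id += 1
        self._views[view_id] = callback
        return view_id

    def destroy_view(self, view_id):
        self._views.pop(view_id, None)

    def notify(self, origin, scope, event, payload):
        # Iterate over a snapshot: a callback may close a view mid-broadcast.
        for view_id, callback in list(self._views.items()):
            if view_id not in self._views:
                continue  # this view went away while we were broadcasting
            if scope is Scope.SELF and view_id != origin:
                continue
            if scope is Scope.OTHERS and view_id == origin:
                continue
            callback(event, payload)
```

The snapshot plus the membership re-check is the simplest way to stay safe when a view disappears during the loop; a threaded server would additionally need locking around the view table.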

To mention the most important parts of this work:

Extension of the multiple-view rendering infrastructure and the
appropriate changes in the LibreOfficeKit API had to be done before anything
could be exposed to the daemon and JavaScript parts. (Miklos, Ash)

Then it was necessary to broadcast the events (like selection
changes) accordingly, and make the JavaScript UI render it. (Miklos, Ash,
Pranav, Henry, Marco)

To optimize the amount of messages that are sent from the server
to the client, work had to be done to elide redundant invalidations. On the
other hand, there were cases where the invalidates were missing, and had to be
added. (Miklos)

In the UI, tracking & rendering of multiple, colored cursors
had to be implemented. (Pranav, Ash)

Another tough nut to crack was Calc - with its concept of cell
cursors; multiple cursors, concurrent cell editing had to be implemented or
fixed in many areas. (Marco)
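
Eliding redundant invalidations, mentioned in the list above, can be pictured with a sketch like the following: a new invalidation rectangle is dropped if a pending one already covers it, and it replaces any pending rectangles that it covers itself. The `Rect` and `InvalidationQueue` types are assumptions for the example, and the real logic in the core is more involved.

```python
from typing import NamedTuple


class Rect(NamedTuple):
    x: int
    y: int
    width: int
    height: int

    def contains(self, other):
        """True if `other` lies entirely inside this rectangle."""
        return (self.x <= other.x and self.y <= other.y
                and self.x + self.width >= other.x + other.width
                and self.y + self.height >= other.y + other.height)


class InvalidationQueue:
    """Pending invalidation rectangles for one view, without redundant entries."""

    def __init__(self):
        self.pending = []

    def add(self, rect):
        # Already covered by something the client will be told about: drop it.
        if any(p.contains(rect) for p in self.pending):
            return False
        # The new rectangle makes any smaller pending ones redundant.
        self.pending = [p for p in self.pending if not rect.contains(p)]
        self.pending.append(rect)
        return True

    def flush(self):
        """Return the rectangles to send to the client and reset the queue."""
        rects, self.pending = self.pending, []
        return rects
```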

Miscellaneous

To communicate between the main webpage that embeds the iframe
and the content of the iframe, the postmessage API had to be much improved.
(Pranav)

Due to the asynchronous nature of everything, new requests
may come at any time - so it is necessary to stop accepting new
requests during termination. (Ash)

A new API for change tracking & colors had to be implemented to
be able to have the document cursor always black for the current user, but
other colors synchronized for the other editors. (Miklos, Pranav)

The WOPI discovery XML was extended to provide read-only access
for formats we cannot export, e.g. Visio, and many more such formats
were added. (Miklos)

Sometimes it is necessary to cancel the tile requests - for
example when the user scrolls quickly, and far away from the location where he
or she was editing previously. But there is an exception - we must not cancel
requests for slide thumbnails. (Tor)

For the JavaScript part of the solution, we use several JavaScript
libraries, but in order to be self-contained, we cannot just reference them -
instead, they are shrinkwrapped and packed all together. We also made our CSS
much more browser-friendly - bundled and browserify'ed it. (Pranav)

One of the conditions that can negatively affect the entire
solution is when the server running Online gets low on storage space - a
warning was implemented to handle that. (Tor, Pranav)

All websocket URLs were unified so that it is easy to create HA
balancing rules. (Pranav)

As browsers cache JavaScript and CSS, it was necessary to
start versioning all served assets, like l10n, images etc. in the path name
to avoid problems when upgrading from one version of Online to the
new one. (Pranav)

Packaging pieces were updated, and various debian rules
improved. (Andras, Katarina Behrens (CIB))

As the amount of code grows, the documentation becomes
increasingly important. Now all the classes have their doxygen
documentation. (Ash)

To improve the user experience, the JavaScript dialogs were
migrated from simplemodal to vex (Pranav)

Debuggability is important too. Due to the nature of the solution,
it is non-trivial to run the loolwsd daemon under valgrind; consequently a
make rule, make run_valgrind, was introduced to make its execution
under valgrind easier. (Ash)

All the work that uses LibreOfficeKit was implemented in the
gtktiledviewer first, because it is much easier to implement it there: It
avoids most of the asynchronicity that is necessary in the communication
between the server and the JavaScript part. Thanks to that, gtktiledviewer
(and the underlying Gtk+ widget) was tremendously improved. There is a
large amount of functionality exposed that could be used to improve GNOME or
KDE document viewers and simple editors. (Miklos)
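
The asset versioning item in the list above boils down to putting the release identifier into the URL path, so that a browser holding cached files from the previous release simply never asks for the old URLs. A minimal sketch, with a made-up `/static` prefix and version string rather than Online's actual layout:

```python
import posixpath

VERSION = "2.0.0"  # hypothetical release identifier; a git hash would work too


def versioned_url(asset_path, version=VERSION):
    """Embed the version in the path so a new release yields new URLs."""
    return posixpath.join("/static", version, asset_path.lstrip("/"))


def resolve_asset(url, version=VERSION):
    """Map a versioned URL back to the asset path, or None for another release."""
    prefix = posixpath.join("/static", version) + "/"
    if not url.startswith(prefix):
        return None
    return url[len(prefix):]
```

Because the version is part of the path rather than a query string, every asset type (JavaScript, CSS, l10n files, images) is covered by the same rule.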

Bug fixes

Of course, on top of the new features, various people have contributed a
tremendous amount of bugfixes. To list the most important ones:

Different views (for users who edit the same document concurrently)
can have different zoom levels, which is obvious from the user point of view,
but has many corner cases and usage scenarios that may go wrong - now the
status bar zoom levels are correct across different views (Ash)

It is important to handle permissions correctly. Now the
integrators can set whether the document should be read-only or read/write
easily, because we honour the WOPI UserCanWrite parameter (Pranav)

The most substantial feature of Online is that the documents
are rendered the same with the same look and feel as in the desktop version.
To be able to do that, there were myriad tile rendering / invalidation fixes
needed (Ash, Miklos)

The iframe with the menu, toolbar, document itself and the status
bar is non-trivial, and for the flawless user experience, many focus problems
were fixed. (Henry)

Good logging and error reporting is essential both for admins
(so that they know what to fix during the setup) and for users (so that they
know how to get their work done even in non-ideal conditions - like an
unstable network). That led to improved logging & user notification of
various errors (Tor)

The loolwsd daemon needs quite a lot of control over the system,
because we spawn lots of child processes, create chroots for them, etc. But for
security reasons, we don't want to use setuid of course - instead we set
only the capabilities that are missing; and had to check and warn when
capabilities are missing. Unfortunately Ubuntu 14.04 LTS doesn't support
these file-system capabilities. (Tor)

Fixed X-WOPIOverride, a header that is used in the WOPI response.
(Aleksander Machniak (Kolab)), and handle unexpected WOPI URLs including
spaces for a customer. (Pranav)

Calc (spreadsheet) and Impress (presentations) use a concept
of "parts" - they are the separate sheets in Calc, or the slides in
Impress. Writer has pages - but they are only a flow of text that can easily
run from one to another, so we don't use that concept there, and it was
necessary to unwind Writer parts invalidation, and rendering. (Ash)

The entire solution is highly asynchronous - the server must not
wait for the JavaScript clients that may easily go away (on connection loss,
user's action), and many race conditions had to be fixed - including races
between sheet switch and part setting etc. (Marco)

Another race fixed was between rendering and invalidation
that caused missing last character typed from time to time. (Kendy)

When the user does not edit the document, it goes inactive - the
changes are not sent until the user clicks the document again. This is to
save bandwidth and the server CPU. But of course, the views need a refresh on
UI re-activation. (Ash)

The time before the document becomes inactive needed tuning for a
good user experience, so the idle time was increased after testing (Ash)

We expect that potential attackers might want to try to inject
unknown or bad commands, and use that for the communication with the server,
so the input sanitisation was improved. (Pranav)

Another related bugfix was related to potential guessing of the
location of the chroots, and the random directory creation was improved. (Ash)

Resizing of the toolbars would sometimes trigger an infinite
layout loop - this was fixed. (Pranav)
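
For the chroot location item in the list above, the general technique is to name each directory with enough randomness from a cryptographic source that it cannot be guessed, and to create it with restrictive permissions. The sketch below shows that idea only; the function name is hypothetical and it says nothing about how loolwsd actually names or populates its jails.

```python
import os
import secrets
import tempfile


def create_jail_dir(parent):
    """Create a private directory with an unpredictable name under `parent`."""
    # 16 bytes from the OS CSPRNG give a 32-character hex name.
    name = secrets.token_hex(16)
    path = os.path.join(parent, name)
    os.mkdir(path, mode=0o700)  # raises FileExistsError rather than reusing a path
    return path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as parent:
        print(create_jail_dir(parent))
```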

Summary

While 99% of the code comes from Collabora, many thanks again to
everyone who contributed including the LibreOffice translation
community. As announced at the LibreOffice conference - we're
working hard to get a first up-stream release of LibreOffice included
in LibreOffice 5.3 - and to make daily image builds available for
testing as we move towards that.

We've done a huge amount of work here; this is just a short
summary of it, and it is an amazing privilege to be able to work with
such a talented set of developers at Collabora and across the wider
community around LibreOffice. Why not head to and
get involved with LibreOffice, or deploy the stable 1.0 version of Collabora Online in your
organization - while we harden Collabora Online 2.0 for your delectation.