Just about three years ago, I created the “automatic rule-based time
tracker” (arbtt), a tool that quietly runs in the background, records
your current windows and their titles, and allows you to later process
this data using, well, rules. By now, I have almost half a million
records, spanning 317 days of me working at the computer. Processing
this amount of data has become very memory-hungry: It takes 1G of RAM to
process my data. Other users, with a presumably higher sampling rate,
report even worse performance, which finally made me approach the problem again.
Naturally, one wonders why it is such a memory hog, after all it
takes just one run over the data to produce a report. The data itself is
a lazy list so the garbage collector should the be able to drop the
seen elements immediately.
But arbtt is able to print several reports in one run; there is even
an option (--each-category) that creates a number of reports based on
the data itself. The code was structured that first a number of values
that are possibly used by several reports are first created one and then
re-used by the reports. And then there is the issue of the filters: The
user specifies a what predicates are interesting (e.g., those where he
was active) and some reports need to know not only which samples were
matched, but also distinguish the consecutive runs of selected samples.
Lots of reasons that made it hard to not use the list of of samples
several times, which causes it to be completely retained in memory. The
idea of completely rewriting everything in a much more imperative way
was not appealing. But luckily Haskell’s good abstraction possibilities can help here.
What is the “correct” way to traverse a lazy list and calculate some information about it? It is a strict left fold, i.e. an algorithm that can be represented as arguments to
For those who are new to folds: The first argument is a function that processes one element from the list, returning the new state of the processing. The second argument is the initial state. And foldl' can apply such an algorithm to a list to return a final state.
So obviously I had to express my reports as left folds. To think clearer about it, I gave them a name. I also added a finishing step to the mix, which is handy:
A value of type LeftFold x a is then an algorithm that traverses a list of xes and returns something of type a. The forall s. means that the user of such a processor must not make any assumptions about the type of the state. I also have a function that runs such a list processor, simply by invoking foldl' with the arguments stored in the LeftFold constructor:
Now, as a Haskell programmer, I am searching for monads everywhere, and it really would have been nice if this were a monad, as they are very easy to compose. But unfortunately, they do not form a monad. If they did then I could decide after running one processor and looking at the result how I would want to process the list now, but this obviously contradicts with the desire to traverse the list only once.
But there is the monad’s smaller ugly sibling, the applicative functor. One can still somewhat nicely compose them to form more complex algorithms, just with more restrictions. Here is my instance:
The :!: operator is a strict tuple constructor. This is important, as otherwise the fold would be come a lazy one, I would be building up huge numbers of thunks and performance would be very bad. For the same reason every list processor should make sure that its state does not build thunks once it is in weak head normal form.
Now I am at a abstraction level that I feel comfortable working in, by writing functions that combine the list processors in various ways. Probably the most important one for me is already present in the libraries (here with a specialized type):
This is the whole trick: Now I can, based on the user input, run any number and selection of reports in one single list traversal. I find it very elegant that “under the hood”, the strict tuples are nested at precisely the right depth depending on the length of the list.
As hinted at before, I do some interesting stuff that was previously represented as lists of lists of values. Expressing these as left folds was a bit tricky, but again, the complexity is contained in the combinators that I wrote (although it still gets confusing, check out processIntervalReport in Stats.hs):
As a result, the program now only requires 10MB and runs 18% faster.
If you wonder why, in the definition of runLeftFold, I do not apply the finish function strictly: That is intentional; I am actually passing the result of a processor that calculates some often required data (e.g. total time recorded) as an argument to all the list processors. (Near the end of the main function in stats-main.hs) This “loop” works fine as long as I am using these only in the finish functions.
There is one downside to this approach: I cannot easily calculate stuff on demand and share it between two reports. Previously, I could just bind this to a variable, pass it to all reports, only if at least one is interested it would be evaluated and if several are interested, the result would be shared. I do not have a good solution for this yet.
The code above is in the self-contained module LeftFold.hs in the arbtt sources.
Several People have had this idea before, here is one blog post by Max Rabkin.
Flattr this post