2014-05-16

In a recent post on his blog,
Matthew Horsfall explained a few of the optimizations he recently contributed to Perl.
On this site, we reported on the
fantastic work that Dave Mitchell is doing.
The two of them inspired me to try my hand at the optimization game:
at a recent hackathon at work, I set out to improve the Perl optimizer.

But first, let's discuss how the Perl compiler works and what it produces.

The output of the Perl compiler is two-fold. The first part is
an in-memory data structure that resembles a traditional compiler's
Abstract Syntax Tree. This is called
the OP tree, since it consists of struct OPs that generally represent some
OPeration to be performed. The second is a linked list that connects (some of) these OPs
in the order in which they are to be executed[1].

So for the code my $a = $b + 3 (with $b a lexical declared earlier), we have a tree roughly like this:

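    sassign
        add
            padsv[$b]
            const[3]
        padsv[$a]
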
where sassign indicates scalar assignment, and padsv is a lexical variable access.
It helps to think of perl as a stack machine; the linked list for execution would then be
roughly this sequence (with annotations on what each OP actually does):
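
    padsv[$b]    # push the value of $b onto the stack
    const[3]     # push the constant 3 onto the stack
    add          # pop two values, push their sum
    padsv[$a]    # push the fresh lexical $a, the assignment target
    sassign      # pop value and target, perform the assignment

If you want to see the real thing, the core B::Concise module will dump both views for you: perl -MO=Concise -e 'my $b; my $a = $b + 3' prints the tree, and perl -MO=Concise,-exec -e 'my $b; my $a = $b + 3' prints the execution order.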

Things get just a tiny bit more complicated with operations
that take a variable number of items as parameters, such as taking a slice
of an array. Consider @x[1,2], whose tree looks roughly like this:

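    aslice
        pushmark
        list
            pushmark
            const[1]
            const[2]
        padav[@x]
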
The array slice needs two pieces of information: a list of indexes to slice
from the array, and the array itself. In Perl, the way to pass an
arbitrary number of items to an operation is to remember the stack
position (which is what pushmark does[2]) and then simply push the
items onto the stack. The recipient simply looks at the most recently
remembered stack position and pops its parameters off the stack until it
reaches this position. aslice is one such operation taking
a variable number of parameters, so the first thing to do is to
execute a pushmark. The execution order of the operations here is:

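    pushmark     # remember the stack position for aslice
    pushmark     # remember the stack position for list
    const[1]     # push index 1
    const[2]     # push index 2
    list         # pop the inner mark; in list context, leave the items alone
    padav[@x]    # push the array @x
    aslice       # pop @x and the indexes down to the mark, push the slice
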
This is where it gets a bit weird. One would
expect to simply execute the two consts and the padav now. That
would correspond to this tree

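    aslice
        pushmark
        const[1]
        const[2]
        padav[@x]
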
and this execution order.
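
    pushmark     # remember the stack position for aslice
    const[1]     # push index 1
    const[2]     # push index 2
    padav[@x]    # push the array @x
    aslice       # pop everything down to the mark, push the slice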

The list operation has a very simple purpose. It looks up the
most recently remembered stack position and then checks which context
it was invoked in (context in the usual Perl sense).
If it was invoked in list context, it will simply do nothing and
leave the items on the stack. If it was invoked in scalar context,
it will leave only the last, topmost item on the stack.
Let that sink in for a moment.
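
You can observe the scalar-context behavior of list from plain Perl code, no OP trees required:

    my @a = (1, 2, 3);    # list context: @a receives all three items
    my $s = (1, 2, 3);    # scalar context: only the last item survives, so $s is 3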

Indeed, in list context, list simply undoes the effect of one preceding
pushmark: the inner pushmark and the list
cancel each other out entirely. This is one of the places where the OP
tree's dual nature shows through: it is both an AST and a data structure
driving the actual execution of the program.

And here is where my optimization comes into play:
in Perl 5.20, the optimizer is just a tiny bit smarter because it can
detect pushmark/list pairs that will be a no-op. It modifies the
execution chain to skip them while leaving the AST intact
for decompilers and similar tooling. For the slice example above, the
optimized execution order is pushmark, const[1], const[2], padav, aslice:
exactly the naive order we expected in the first place.

Because the internet has informed me
that optimizations without benchmarks to "prove" their effectiveness
are meaningless, let's indulge in some micro-benchmarking while investing
some effort into making the result somewhat meaningful. What we'll
benchmark is simply the array slice @y = @x[3,7].
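
As a quick sanity check, here is a minimal sketch of such a micro-benchmark using the core Benchmark module (the measurements shown below were done with more careful repeat-measurement tooling, not this exact script):

    use strict;
    use warnings;
    use Benchmark qw(timethese);

    my @x = (1 .. 10);
    my @y;

    # Run the slice repeatedly for at least 5 CPU-seconds and report the rate.
    timethese(-5, {
        slice => sub { @y = @x[3, 7] },
    });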

The results are shown as a box plot, a type of graph that displays
the measurement results together with information on the variability
of the repeated measurements. "after" refers to a perl built from commit
7d3c8a6837b55fff
whereas "before" is a perl built from the commit just prior to that.
This particular example benchmark shows a statistically significant speed-up of 8%
in slicing arrays. You can see similar speed-ups in hash slices, certain map
blocks, and a few other list-type operations. For a day's work on a very mature
code base, that's not too shabby at all!

[1]
Technically speaking, there are many such linked lists since a logical
operation will create a branch. So it's really a tree. But many
stretches of code will be contained in linear sections, so please
excuse me for the slight oversimplification.

[2]
The name pushmark stems from the fact that remembering multiple
stack positions across nested operations is done with another stack, the
MARK stack. So pushmark does exactly what it says on the tin.
