Ceylon-lang.org

Abstracting over functions

2013-01-21

Last week, I was finally able to write the functions compose() and curry()
in Ceylon and compile them with the Ceylon compiler. These functions are
general purpose higher order functions that operate on other functions at a
very high level of abstraction. In most programming languages, it's very easy
to define a compose() function that works for functions with just one
parameter. For example, in Ceylon:

In this function signature, f() and g() are functions with one parameter where
the return type of g() is the same as the parameter type of f(). The resulting
function is also a function with one parameter, and has the same return type
as f() and the same parameter type as g().

Now, things get trickier when g() has more than one parameter. Imagine the
following definition of format():

This function has the type String(String,Float|Integer), which can't be a
Y(Z), no matter what argument types we choose or infer for Y and Z. To
see this more clearly, and to see how we can come up with a more powerful
definition of compose(), we're going to have to understand how Ceylon really
represents function types. But first we're going to need to understand tuples.

Tuples

Ceylon recently grew tuples. I've heard all sorts of arguments for including
tuples in the language, but what finally convinced me was realizing that they
were a simpler way to represent function types.

So a major goal of adding tuples to the language was to add zero additional
complexity to the type system. So tuples are instances of an ordinary class,
Tuple, which you can see defined
here.

A tuple is a linked list where each link in the list encodes the static type
of the element. Without syntax sugar, we can write:

I bet you're wondering what type point has. Well, if you really have to
know, it's:

Phew! Fortunately we never see this type, because Ceylon lets us abbreviate it
to [Float,Float,String], and the IDE always shows us the abbreviated form.
We can also use the square brackets to instantiate tuples, letting us write:

(Yep, tuples are square in Ceylon. Don't worry, you'll quickly get used to
that.)

A tuple is a sequence (Tuple is a subclass of Sequence), so the following
is well-typed:

Now, what makes a tuple special is that we can access its elements without
losing their static type information. We can write:

I should emphasize that we're not seeing any magic here. You could write your
own MyTuple class and reproduce the exact same behavior!

I bet you're wondering what happens if I type point.rest.rest.rest.first.
Well, take a closer look at the verbose type of point again. The chain of
"links" is terminated by Empty, Ceylon's empty sequence type. And the type
of first on Empty is Null. So we have:

That is, the expression is provably null. Provable within the type system,
that is.

Of course, we don't want to make you write out suff like point.rest.rest.first
whenever you need to get something out of a tuple. A tuple is a sequence, so
you can access its elements using the index operator, for example:

But, as a special convenience, when the index is a literal integer, the compiler
will treat it as if it were a chain of calls to rest and first:

A chain of Tuple instances doesn't need to be terminated by an Empty. It may,
alternatively, be terminated by a nonempty sequence, any instance of Sequence.
We can write the following:

The type [Float, Float, String*] is an abbreviation for:

It represents a sequence of two Floats, followed by an unknown number of
Strings. So the following is well-typed:

We'll now see how all this is useful for abstracting over function parameter
types.

Function types

An instance of the interface Callable is a function.

Going back to our function format() above, its type is:

You can see that we've represented the parameter types of the function using a
tuple type. That is to say, Ceylon views a function as accepting a tuple of
arguments, and producing a single value. Indeed, Ceylon even lets use write
that explicitly:

Here we "spread" the elements of the tuple args across the parameters of the
function format(), just like you can do in some dynamically typed languages!

Now consider a function with a variadic argument:

This function accepts a String, followed by any number of Floats and
Integers. We can represent its type as follows:

We usually abbreviate function types, writing String(String,Float|Integer)
or String(String,Float|Integer*) for the function types above.

Ceylon also has defaulted parameters. The function type String(String,String=)
means Callable<String,[String]|[String,String]> and represents a function whose
second parameter has a default value.

Notice how, given the definitions we've just seen, assignability between function
types works out correctly:

a String(Object) is an instance of Object(String),

a String(String=) is an instance of String() and of String(String), and

a String(String*) is an instance of String(), of String(String), and of
String(String,String).

Function composition and currying

Finally we have the machinery we need to define compose() and curry().

The signature of compose() is:

Notice how the type parameter Args abstracts over all possible parameter lists,
just as it does in the definition of Callable. To actually implement compose(),
I'm going to need to resort to two native functions that I can't yet implement
within the language. I can write down their signatures, however, so this isn't
a limitation of the type system itself:

If you're not extremely interested, you can skip over these declarations, and
just look at an example of what they do:

That is, unflatten() takes any function with any parameter list, and returns a
function that accepts a single tuple of the same length as the original parameter
list. On the other hand, flatten() does the opposite. It takes a function that
accepts a single tuple, and returns a function with a parameter list of the same
length as the tuple.

OK, now we can implement compose():

Perhaps I should unpack this slightly for readability:

The definition of curry() is similarly dense:

This function accepts a function with at least one parameter and returns a function
with two parameter lists, the first parameter list has the first parameter of the
original function, and the second parameter list has the rest of the parameters.

Great! Now we can write:

A final word

This post has been long and quite involved. I hope some of you made it this far.
This isn't typical Ceylon code we're looking at here. We're starting to move into
the field of metaprogramming, which is just becoming possible in Ceylon. What
I'm really trying to demonstrate here is how Ceylon layers sophisticated constructs
as pure syntax sugar over a relatively simple type system. I'm trying to show that
you can have things like tuple and function types without defining them primitively
as part of the type system, or resorting to inelegant hacks like F1, F2, F3,
Tuple1, Tuple2, Tuple3, etc. And indeed it works out better this way, I
believe.