Planet.lisp.org

Christophe Rhodes: HTTP content negotiation and generalized specializers

2014-03-13

I promised a non-trivial example of a use for generalized specializers
a while ago. Here it is: automatic handling of HTTP (RFC 2616)
Content-Type negotiation in computed responses.

In RESTful services and things of that ilk, a client indicates that it
wants to apply a verb (GET, for example) to a particular resource
(named by a URN, or possibly identified by a URI). This resource is a
conceptual object; it will have zero or more concrete manifestations,
and content negotiation provides a way for the client to indicate
which of those manifestations it would prefer to receive.

That's all a bit abstract. To take a concrete example, consider the
woefully incomplete list of books in my living room at openlibrary. A
user operating a
graphical web browser to access that resource is presumed to want to
retrieve HTML and associated resources, in order to view a shiny
representation of the information associated with that resource (a
"web page", if you will). But the human-oriented representation of
the information is not the only possible one, and it is common
practice in some circles to provide machine-readable representations
as well as human-oriented ones, at the same URL; for example, try:

and observe the difference between that and visiting the same URL in a
graphical browser.

How does the web server know which representation to send? Well, the
example has given away the punchline (if the links above to RFC
sections haven't already). The graphical web browser will send an
Accept header indicating that it prefers to receive objects with
presentational content types - text/html, image/* and so on; the
browser I have to hand sends
text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8 as
its Accept header, meaning "please give me text/html or
application/xhtml+xml, or failing that application/xml, or failing
that anything else". If the server has more than one representation
for a given resource, it can use this client-supplied information to
provide the best possible content; if the client has particular
requirements - for example, attempting to find machine-readable
content for further processing - it can declare this by specifying
particular acceptable content-types in its Accept header.
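To make the reading of that header concrete, here is a minimal sketch
in Python (not the Lisp used in this post) of turning an Accept header
into media ranges with quality values. The function name parse_accept
is hypothetical, and the handling of a malformed q value is an
assumption rather than anything the RFC or this post prescribes.

```python
def parse_accept(header):
    """Parse an Accept header into (media_range, q) pairs, best first."""
    entries = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media_range = fields[0].lower()
        if not media_range:
            continue
        q = 1.0  # RFC 2616: a missing q parameter means q=1
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # assumption: unparsable q means "not acceptable"
        entries.append((media_range, q))
    # sorted() is stable, so entries with equal q keep their header order
    return sorted(entries, key=lambda entry: entry[1], reverse=True)


browser = "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8"
for media_range, q in parse_accept(browser):
    print(media_range, q)
```

For the browser header quoted above, text/html and
application/xhtml+xml come first with q=1, then application/xml at
0.9, then the */* wildcard at 0.8.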

For a resource for which more than one representation exists, then,
the server must dispatch between them based on the client Accept
header. And this is exactly a non-standard dispatch of the kind I've
been discussing. Consider a resource http://foo.example/ which is
implemented by sending the return value of the generic function foo
back to the client:

The default behaviour is somewhat a matter of taste, but one
reasonable choice is that if no content-type matches we should use the
defined HTTP status code, 406 Not Acceptable, to indicate that the
responses we could generate are not acceptable to the client:

Maybe we have a couple of presentational representations for the
resource:

And we might have some machine-readable representations:

(I apologize to any fans of XML/RDF if I have mangled that).

Now a graphical web browser sending an Accept header of
text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8 as
above will cause the server to send the HTML version, as that is the
most specific method applicable to that Accept string. Given this,
it is perfectly possible to construct specialized clients with
alternative preferences expressed in the Accept header. A
terminal-based client might prioritize text/plain over text/html
(though in fact neither w3m nor lynx does that, at least in the
versions I have installed). A client for the Semantic Web might
instead accept data in serialized RDF, preferring more modern
serializations, by sending an Accept string of
text/turtle,application/rdf+xml;q=0.9. All these clients could each be
served the resource in their preferred format.
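The dispatch described above can be imitated without any generic
function machinery. The following is only a rough Python sketch, not
the accept-specializer implementation from this post: the
REPRESENTATIONS table, its placeholder bodies and the names quality
and respond are all hypothetical. It scores each representation by the
q value of the most specific matching media range and falls back to
406 when nothing is acceptable.

```python
def parse_accept(header):
    """Return (media_range, q) pairs from an Accept header."""
    ranges = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        if not fields[0]:
            continue
        q = 1.0  # a missing q parameter means q=1
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                q = float(value)
        ranges.append((fields[0].lower(), q))
    return ranges


def quality(content_type, ranges):
    """q for content_type, taken from the most specific matching range."""
    main_type = content_type.split("/")[0]
    best_specificity, best_q = -1, 0.0
    for media_range, q in ranges:
        if media_range == content_type:
            specificity = 2          # exact match, e.g. text/html
        elif media_range == main_type + "/*":
            specificity = 1          # subtype wildcard, e.g. text/*
        elif media_range == "*/*":
            specificity = 0          # full wildcard
        else:
            continue
        if specificity > best_specificity:
            best_specificity, best_q = specificity, q
    return best_q


# Hypothetical representations of one resource; bodies are placeholders.
REPRESENTATIONS = {
    "text/html": lambda: "<!DOCTYPE html><title>Foo</title>",
    "text/plain": lambda: "Foo",
    "text/turtle": lambda: "@prefix foo: <http://example.org/ns#> .",
}


def respond(accept_header):
    """Return (status, content_type, body) for the best representation."""
    ranges = parse_accept(accept_header)
    scored = [(quality(ct, ranges), ct) for ct in REPRESENTATIONS]
    best_q, best_type = max(scored, key=lambda pair: pair[0])
    if best_q <= 0:
        return 406, None, None       # Not Acceptable
    return 200, best_type, REPRESENTATIONS[best_type]()


print(respond("text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8"))
print(respond("text/turtle,application/rdf+xml;q=0.9"))
print(respond("image/png"))
```

The browser header selects the HTML representation, the Semantic Web
client's header selects Turtle, and a client that accepts only
image/png gets a 406.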

The case of serving one of a set of alternative files hosted on the
filesystem in response to a request with an arbitrary accept string is
different; in this case, it doesn't make sense to do the dispatch
through methods. If we were to try to do so, it would look something
like

but we would need to define one such method per possible mime-type we
might want to serve: doable, but unnecessary compared with the
alternative strategy of finding all content-types servable for a given
url, then choosing the one with the highest score:

(The set of files on the filesystem effectively already defines a set
of methods for a given url; it doesn't make sense to try to mirror
that as a set of reified methods on a generic function. Also, I've
written this out using do* largely to keep the do*-is-not-that-bad
society alive.)
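The highest-score strategy for files described above might look
roughly like this in Python (again a sketch, not this post's Lisp).
The names accept_quality and choose_file are hypothetical, the list of
candidate file names is passed in rather than read from a real
directory, and content types are guessed from file extensions with the
standard mimetypes module, which is an assumption about how a server
would label its files.

```python
import mimetypes


def accept_quality(content_type, accept_header):
    """q for content_type from the most specific matching media range."""
    main_type = content_type.split("/")[0]
    specificities = {content_type: 2, main_type + "/*": 1, "*/*": 0}
    best_specificity, best_q = -1, 0.0
    for part in accept_header.split(","):
        fields = [f.strip().lower() for f in part.split(";")]
        q = 1.0  # a missing q parameter means q=1
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip() == "q":
                q = float(value)
        specificity = specificities.get(fields[0])
        if specificity is not None and specificity > best_specificity:
            best_specificity, best_q = specificity, q
    return best_q


def choose_file(candidates, accept_header):
    """Pick the candidate file whose guessed type scores highest."""
    best_name, best_type, best_q = None, None, 0.0
    for name in candidates:
        content_type, _encoding = mimetypes.guess_type(name)
        if content_type is None:
            continue  # unknown extension: nothing to negotiate on
        q = accept_quality(content_type, accept_header)
        if q > best_q:  # strict comparison: earlier candidates win ties
            best_name, best_type, best_q = name, content_type, q
    return best_name, best_type


files = ["foo.html", "foo.txt", "foo.json"]
print(choose_file(files, "application/json,text/html;q=0.5"))
print(choose_file(files, "image/png"))
```

A result of (None, None) means no file is acceptable to the client, at
which point a server would presumably answer with 406 as before.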

Anyway. There's an interesting detail I've elided so far; not only do
response-generating functions have to generate the content they wish
to send in the response; they also have to indicate what
content-type they are actually sending. Our accept-generic-function
already handles dispatching on content-type; can it also take
responsibility for setting the content-type of the response?

Why yes! The way to do this is using a method combination; it might
look something like this:

This behaves just like the built-in or method-combination, except that
when calling a primary method whose specializer for the first argument
is one of our accept-specializers, the content-type of the specializer
is stored in a special variable; the last thing the effective method
does is to call the new handle-content-type generic function, passing
it the original generic function's first argument.
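As a loose analogue of that idea in Python (which has no method
combinations, so this is not a translation of the Lisp), one can wrap
an ordered list of handlers so that the first one to produce a result
wins and its content type is passed to a hook afterwards. The Response
class, or_with_content_type and the handler list are all hypothetical;
a local variable stands in for the special variable, and the handlers
are assumed to arrive already ordered by the client's preferences.

```python
class Response:
    """Hypothetical stand-in for a server's response object."""

    def __init__(self):
        self.headers = {}


def handle_content_type(response, content_type):
    # Plays the role of the handle-content-type hook described above.
    response.headers["Content-Type"] = content_type


def or_with_content_type(handlers):
    """Combine (content_type, handler) pairs, most preferred first."""

    def effective(response):
        chosen_type = None  # stands in for the special variable
        result = None
        for content_type, handler in handlers:
            result = handler(response)
            if result is not None:  # or-style: first non-None result wins
                chosen_type = content_type
                break
        if chosen_type is not None:
            handle_content_type(response, chosen_type)
        return result

    return effective


foo = or_with_content_type([
    ("text/html", lambda response: None),   # declines to answer
    ("text/plain", lambda response: "Foo"),
])
response = Response()
body = foo(response)
print(body, response.headers)
```

Here the text/html handler declines, so the text/plain handler's
result is returned and the response is labeled text/plain without the
handler itself having to set the header.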

Now let's redefine our foo generic function to have the new method
combination, and a method on handle-content-type:

and now, finally, we can try it all out:

OK, but by what magic do these accept-specializer objects exist and
function? I wrote a paper about that, with Jan Moringen and David
Lichteblau: as part of my ongoing open access experimentation, the
version we submitted to the European Lisp Symposium is viewable at
Goldsmiths' e-prints repository or on arXiv. The ELS Chairs have just
announced a deadline extension, so there's still time (until March 23)
for anyone to submit technical papers or abstracts for tutorials and
demonstration sessions: please do, and I hope to see many of my
readers there.