2011-06-09

This has been cycling in my head for a while now, and I think it's close
to becoming something concrete.

Let me summarize the idea: most web services do the same pre- and
post-processing tasks over and over, and there should be a way to
describe those tasks via a DSL.

Nothing revolutionary here, but what if Nginx could handle all the
boring parts for you and let you just handle the meat of your services?
Having a DSL to describe web services potentially allows such
delegation.

Anatomy of a Web Service

A Web Service is basically doing these four steps:

1. [pre-processing] Check the request body and headers, and
potentially reject it. Rejection can be due to a Basic
Authentication failure, an unexpected value for the request body,
etc.
2. [routing] Find what code or application should be called to
build the response. This is usually computed with the path
information and sometimes some headers.
3. [execution] Invoke the code to build the response.
4. [post-processing] Return the response built and maybe do some
post-processing or post-assertions, like converting the content
type.

Steps 1., 2. and 4. could be delegated to a proxy as long
as it has enough details on what should be done.
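
To make the split concrete, here is a minimal sketch of the four steps as one function. Everything in it is hypothetical (the Request stand-in, handle, require_body, the routes mapping); a real application would use something like a WebOb request instead. The point is only that step 3 is the one piece that has to live in the application.

```python
from dataclasses import dataclass, field


@dataclass
class Request:
    # Hypothetical stand-in for a real request object (e.g. WebOb's Request).
    method: str
    path: str
    headers: dict = field(default_factory=dict)
    body: bytes = b""


def handle(request, pre_checks, routes, post_steps):
    """Run the four steps and return (status, body)."""
    # 1. pre-processing: each check returns None, or a status to reject with
    for check in pre_checks:
        status = check(request)
        if status is not None:
            return status, b""
    # 2. routing: find the callable from the method and the path
    func = routes.get((request.method, request.path))
    if func is None:
        return 404, b""
    # 3. execution: the only application-specific part
    body = func(request)
    # 4. post-processing: transform the response body
    for step in post_steps:
        body = step(body)
    return 200, body


def require_body(request):
    # Example pre-check (an assumption): reject empty bodies with a 400.
    return 400 if not request.body else None


routes = {("POST", "/hello"): lambda request: b"hello " + request.body}
```

Calling handle(Request("POST", "/hello", body=b"bob"), [require_body], routes, [bytes.upper]) runs all four steps, while an empty body is rejected before routing even happens.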

In Python, when you build web services using a WSGI framework like
Pylons, Pyramid, or simply Routes + WebOb, all of these steps happen in
your code. You define the routing using Routes descriptions, or
using more clever dispatching systems like what Pyramid offers, then
delegate the execution to a controller class or a simple function,
after a potential pre-processing step. The pre-processing part is
often merged with the execution part, though, because they are closely
related.

For instance, if you have a web service that requires a JSON mapping in
the request body, you could write something that looks like:

def my_webservice(request):
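
The snippet above is cut short, so here is a sketch of what such a function could look like; it is not the original code. It assumes a request object with a body attribute (a SimpleNamespace stands in for a real WebOb request) and returns a (status, payload) tuple, both of which are my own conventions for the example.

```python
import json
from types import SimpleNamespace


def my_webservice(request):
    # pre-processing mixed into the service: the body must be a JSON mapping
    try:
        data = json.loads(request.body)
    except ValueError:
        return 400, "request body is not valid JSON"
    if not isinstance(data, dict):
        return 400, "request body must be a JSON mapping"
    # execution: the actual work of the service (here, just list the keys)
    return 200, sorted(data)


# SimpleNamespace is only a stand-in for a real request object
good = my_webservice(SimpleNamespace(body='{"b": 1, "a": 2}'))
bad = my_webservice(SimpleNamespace(body="[1, 2]"))
```

Half of the function is validation, which is exactly the part that could move out of the service.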

Of course you can always generalize this by using a decorator to
clearly separate the pre-processing part:

@if_not_json(400)
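
Again the snippet is truncated, so this is only a guess at how an if_not_json decorator could be written. The name comes from the line above; the request.json attribute and the (status, payload) return value are assumptions made for the example.

```python
import functools
import json
from types import SimpleNamespace


def if_not_json(status):
    """Reject with `status` unless the request body is a JSON mapping."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(request):
            try:
                data = json.loads(request.body)
            except ValueError:
                return status, None
            if not isinstance(data, dict):
                return status, None
            # hand the parsed mapping over so the service does not parse twice
            request.json = data
            return func(request)
        return wrapper
    return decorator


@if_not_json(400)
def my_webservice(request):
    # only the execution part is left in the service
    return 200, sorted(request.json)
```

With this in place, my_webservice(SimpleNamespace(body="oops")) is answered by the decorator and the service body never runs.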

Same thing for the post-processing step:

@if_not_json(400)
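
The post-processing decorator is missing from the truncated snippet, so the as_json name below is purely hypothetical. It is a sketch of one possible post-processing step: serialize whatever the service returns and set the content type.

```python
import functools
import json


def as_json(func):
    """Hypothetical post-processing step: serialize the result as JSON."""
    @functools.wraps(func)
    def wrapper(request):
        status, result = func(request)
        headers = {"Content-Type": "application/json"}
        return status, headers, json.dumps(result)
    return wrapper


@as_json
def my_webservice(request):
    # the service returns plain Python data and knows nothing about JSON
    return 200, {"hello": "world"}
```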

Err... well, in some frameworks, the routing itself is expressed as
a decorator:

@route('/here/is/my/webservice')
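
For completeness, a route decorator of this kind usually boils down to a registry. The sketch below is not taken from any particular framework; ROUTES and dispatch are names I made up, and real frameworks match patterns rather than exact paths.

```python
ROUTES = {}  # hypothetical module-level registry: path -> callable


def route(path):
    """Register the decorated function as the handler for `path`."""
    def decorator(func):
        ROUTES[path] = func
        return func
    return decorator


@route('/here/is/my/webservice')
def my_webservice(request):
    return 200, "hello"


def dispatch(path, request=None):
    # routing step: exact match only, to keep the sketch short
    handler = ROUTES.get(path)
    if handler is None:
        return 404, None
    return handler(request)
```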

It turns out that there are a lot of pre/post steps that can be pushed
to a meta level.

Delegation of pre- and post-processing steps

A web application is most of the time accessed through a proxy. At
Mozilla Services, we use Nginx for all our Python applications.
Nginx is here, among other things, to pool incoming requests and
dispatch them to our Python application. The proxying job is pretty dumb
right now, as everything that comes in is directly sent to the Python
backend.

What if we were able to delegate all the pre- and post-processing we've
seen earlier to Nginx?

There would be some benefits, like a faster rejection of bad requests:
no need to invoke the Python application anymore and spend CPU cycles in
a backend for this. If some prerequisites are not met, we can return a
400 right away.
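
The real thing would live in the proxy, not in Python, but the idea can be shown with a plain WSGI middleware standing in for it. This is a sketch under assumptions: the prerequisite checked here (a JSON content type) and all the names are mine. What matters is that the backend is never called for a bad request.

```python
def reject_early(app, required_content_type="application/json"):
    """Stand-in for the proxy: answer 400 without ever calling `app`."""
    def middleware(environ, start_response):
        content_type = environ.get("CONTENT_TYPE", "")
        if not content_type.startswith(required_content_type):
            start_response("400 Bad Request", [("Content-Type", "text/plain")])
            return [b"unexpected content type\n"]
        return app(environ, start_response)
    return middleware


calls = []  # records every request that actually reaches the backend


def backend(environ, start_response):
    calls.append(environ["PATH_INFO"])
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"ok\n"]


statuses = []


def start_response(status, headers):
    statuses.append(status)


app = reject_early(backend)
bad = app({"PATH_INFO": "/a", "CONTENT_TYPE": "text/html"}, start_response)
good = app({"PATH_INFO": "/a", "CONTENT_TYPE": "application/json"}, start_response)
```

After both calls, calls contains a single entry: only the valid request cost the backend anything.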

Having all the pre-processing at the proxy level also makes it simpler
to modify it without touching the web service code itself. Of course,
that can also be a drawback in some cases: your application logic is
split in two parts and this can be hard to follow. But as long as the
full description of the web service is in a single place, I think it's
fine.

Last, we've talked about Python, but each piece could be implemented in
a different language, as long as Nginx is able to invoke it. Using Lua
for all the pre-processing part is not a bad idea.

The DSL

The last time I talked about this topic, someone mentioned
SPORE, which is indeed quite similar to what I want to achieve. I
guess the biggest difference is that SPORE focuses on providing a DSL to
build clients that can interact with an existing set of server APIs.

What I want on my side is to provide a DSL that API developers can use
to create web services, and eventually have a proxy like Nginx use it to
run the application.

A developer ideally would:

1. describe her web services in a DSL file
2. implement the execution part
3. test them in a development environment where a web server would load
the DSL and the code
4. deploy the web service in production with Nginx

I could start off with SPORE but I want to experiment with my own DSL
and build it little by little.

I started to build it the other weekend, and it looks like this:

define path hello (

Some details:

- python:demo.post means that the code to be invoked is located in
the "demo.post" callable. That's the fully qualified name to reach
it, so demo can be a package or a module.
- unless type is xxx return xxx is an integral part of the DSL, a
recognized structure. When the file is parsed, it's loaded into an AST
and executed on each request against the body.
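
To give an idea of what "loaded in an AST and executed on each request" can mean, here is a sketch of a node for the unless rule. It is not the prototype's code: the UnlessType class, the TYPE_CHECKERS table, and the choice of json and 400 as the two values are all assumptions for the example.

```python
import json
from dataclasses import dataclass


def is_json(body):
    try:
        json.loads(body)
    except ValueError:
        return False
    return True


# hypothetical type checkers, keyed by the name used in the DSL
TYPE_CHECKERS = {"json": is_json}


@dataclass
class UnlessType:
    """AST node for `unless type is <type_name> return <status>`."""
    type_name: str
    status: int

    def evaluate(self, body):
        # returns the rejection status, or None when the body is acceptable
        if TYPE_CHECKERS[self.type_name](body):
            return None
        return self.status


# what the parser could produce for "unless type is json return 400"
rule = UnlessType("json", 400)
```

The node is built once when the file is parsed, and rule.evaluate(body) is all that runs per request.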

Implementation details

I used PLY to read the DSL files. The parser checks many aspects of
the DSL file, like:

- make sure the method is known (GET/POST/etc)
- make sure the HTTP codes used are valid ones
- control that the URL path is valid
- etc.
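
These checks are simple once the file is parsed. The sketch below shows the three of them on already-extracted values, leaving the PLY grammar out entirely. The validate function, the method list, and the path pattern are assumptions of mine, not what the prototype does.

```python
import re
from http import HTTPStatus

KNOWN_METHODS = {"GET", "HEAD", "POST", "PUT", "DELETE", "OPTIONS", "PATCH"}
# assumption: a path is a leading slash followed by simple segments
PATH_RE = re.compile(r"^/[A-Za-z0-9_\-./{}]*$")
VALID_CODES = {status.value for status in HTTPStatus}


def validate(method, path, status_codes):
    """Return a list of error messages; an empty list means all is fine."""
    errors = []
    if method.upper() not in KNOWN_METHODS:
        errors.append(f"unknown method: {method}")
    if not PATH_RE.match(path):
        errors.append(f"invalid path: {path}")
    for code in status_codes:
        if code not in VALID_CODES:
            errors.append(f"invalid HTTP code: {code}")
    return errors
```

For instance, validate("POST", "/hello", [200, 400]) returns an empty list, while an unknown method or a code like 799 produces one message each.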

I wrote a small Python application that loads the DSL file into an AST
at startup. Then it provides a web server that does the
pre/post-processing, and eventually delegates the execution to some
Python code, by passing a request object using WebOb. URL matching for
now uses a simple regexp pattern backed by Routes.
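
Delegating to a location such as python:demo.post means turning that string into a callable. A minimal way to do it with the standard library, assuming the "python:" prefix followed by a dotted name (the resolve function is my own naming, not the prototype's):

```python
import importlib


def resolve(location):
    """Turn 'python:package.module.callable' into the callable itself."""
    scheme, _, dotted = location.partition(":")
    if scheme != "python":
        raise ValueError(f"unsupported scheme: {scheme!r}")
    module_name, _, attr = dotted.rpartition(".")
    if not module_name:
        raise ValueError(f"not a fully qualified name: {dotted!r}")
    # import the module part, then fetch the callable from it
    module = importlib.import_module(module_name)
    return getattr(module, attr)


# demo.post is not importable here, so a stdlib callable is used instead
dumps = resolve("python:json.dumps")
```

The lookup can be done once at startup, so a broken location fails early instead of on the first request.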

I've also written:

- a script that validates a DSL file
- a __doc__ web page in the small Python server, that displays the
available web services
- an __api__ page that just publishes the DSL file for client-side
discovery (fwiw).

You can have a look at the ugly code of the prototype here:
https://bitbucket.org/tarek/redbarrel

The next step in this experiment will be to rewrite one of our small
Services apps with it, and see how it turns out.

All in all, I am having a lot of fun doing this, and it's eating some
of the free time I have when I should really be doing some work on
Python packaging... meh ...
