2016-11-14

One year ago I was introduced by my friend Roberto Ciatti to the concept of Clean Architecture, as it is called by Robert Martin. The well-known Uncle Bob talks a lot about this concept at conferences and wrote some very interesting posts about it. What he calls "Clean Architecture" is a way of structuring a software system, a set of consideration (more than strict rules) about the different layers and the role of the actors in it.

As he clearly states in a post aptly titled The Clean Architecture, the idea behind this design is not new, being built on a set of concepts that have been pushed by many software engineers over the last 3 decades. One of the first implementations may be found in the Boundary-Control-Entity model proposed by Ivar Jacobson in his masterpiece "Object-Oriented Software Engineering: A Use Case Driven Approach" published in 1992, but Martin lists other more recent versions of this architecture.

I will not repeat here what he had already explained better than I can do, so I will just point out some resources you may check to start exploring these concepts:

The Clean Architecture a post by Robert Martin that concisely describes the goals of the architecture. It also lists resources that describe similar architectures.

The Open Closed Principle a post by Robert Martin not strictly correlated with the Clean Architecture concept but important for the separation concept.

Hakka Labs: Robert "Uncle Bob" Martin - Architecture: The Lost Years a video of Robert Martin from Hakka Labs.

DDD & Testing Strategy by Lauri Taimila

(upcoming) Clean Architecture by Robert Martin, published by Prentice Hall.

The purpose of this post is to show how to build a web service in Python from scratch using a clean architecture. One of the main advantages of this layered design is testability, so I will develop it following a TDD approach. The project was initially developed from scratch in around 3 hours of work. Given the toy nature of the project some choices have been made to simplify the resulting code. Whenever meaningful I will point out those simplifications and discuss them.

If you want to know more about TDD in Python read the posts in this category.

Project overview

The goal of the "Rent-o-matic" project (fans of Day of the Tentacle may get the reference) is to create a simple search engine on top of a dataset of objects which are described by some quantities. The search engine shall allow to set some filters to narrow the search.

The objects in the dataset are storage rooms for rent described by the following quantities:

An unique identifier

A size in square meters

A renting price in euros/day

Latitude and longitude

As pushed by the clean architecture model, we are interested in separating the different layers of the system. The architecture is described by four layers, which however can be implemented by more than four actual code modules. I will give here a brief description of those layers.

Entities

This is the level in which the domain models are described. Since we work in Python, I will put here the class that represent my storage rooms, with the data contained in the database, and whichever data I think is useful to perform the core business processing.

It is very important to understand that the models in this layer are different from the usual models of framework like Django. These models are not connected with a storage system, so they cannot be directly saved or queried using methods of their classes. They may however contain helper methods that implement code related to the business rules.

Use cases

This layer contains the use cases implemented by the system. In this simple example there will be only one use case, which is the list of storage rooms according to the given filters. Here you would put for example a use case that shows the detail of a given storage room or every business process you want to implement, such as booking a storage room, filling it with goods, etc.

Interface Adapters

This layer corresponds to the boundary between the business logic and external systems and implements the APIs used to exchange data with them. Both the storage system and the user interface are external systems that need to exchange data with the use cases and this layer shall provide an interface for this data flow. In this project the presentation part of this layer is provided by a JSON serializer, on top of which an external web service may be built. The storage adapter shall define here the common API of the storage systems.

External interfaces

This part of the architecture is made by external systems that implement the interfaces defined in the previous layer. Here for example you will find a web server that implements (REST) entry points, which access the data provided by use cases through the JSON serializer. You will also find here the storage system implementation, for example a given database such as MongoDB.

API and shades of grey

The word API is of uttermost importance in a clean architecture. Every layer may be accessed by an API, that is a fixed collection of entry points (methods or objects). Here "fixed" means "the same among every implementation", obviously an API may change with time. Every presentation tool, for example, will access the same use cases, and the same methods, to obtain a set of domain models, which are the output of that particular use case. It is up to the presentation layer to format data according to the specific presentation media, for example HTML, PDF, images, etc. If you understand plugin-based architectures you already grasped the main concept of a separate, API-driven component (or layer).

The same concept is valid for the storage layer. Every storage implementation shall provide the same methods. When dealing with use cases you shall not be concerned with the actual system that stores data, it may be a MongoDB local installation, a cloud storage system or a trivial in-memory dictionary.

The separation between layers, and the content of each layer, is not always fixed and immutable. A well-designed system shall also cope with practical world issues such as performances, for example, or other specific needs. When designing an architecture it is very important to know "what is where and why", and this is even more important when you "bend" the rules. Many issues do not have a black-or-white answer, and many decisions are "shades of grey", that is it is up to you to justify why you put something in a given place.

Keep in mind however, that you should not break the structure of the clean architecture, in particular you shall be inflexible about the data flow (see the "Crossing boundaries" section in the original post of Robert Martin). If you break the data flow, you are basically invalidating the whole structure. Let me stress it again: never break the data flow. A simple example of breaking the data flow is to let a use case output a Python class instead of a representation of that class such as a JSON string.

Project structure

Let us take a look at the final structure of the project

The global structure of the package has been built with Cookiecutter, and I will run quickly through that part. The rentomatic directory contains the following subdirectories: domain, repositories, REST, serializers, use_cases. Those directories reflect the layered structure introduced in the previous section, and the structure of the tests directory mirrors this structure so that tests are easily found.

Source code

You can find the source code in this GitHub repository. Feel free to fork it and experiment, change it, and find better solutions to the problem I will discuss in this post. The source code contains tagged commits to allow you to follow the actual development as presented in the post. You can find the current tag in the Git tag: <tag name> label under the section titles. The label is actually a link to the tagged commit on GitHub, if you want to see the code without cloning it.

Project initialization

Git tag: step01

I usually like maintaining a Python virtual environment inside the project, so I will create a temporary virtualenv to install cookiecutter, create the project, and remove the virtualenv. Cookiecutter is going to ask you some questions about you and the project, to provide an initial file structure. We are going to build our own testing environment, so it is safe to answer no to use_pytest. Since this is a demo project we are not going to need any publishing feature, so you can answer no to use_pypi_deployment_with_travis as well. The project does not have a command line interface, and you can safely create the author file and use any license.

Now answer the questions, then finish creating the project with the following code

Get rid of the requirements_dev.txt file that Cookiecutter created for you. I usually store virtualenv requirements in different hierarchical files to separate production, development and testing environments, so create the requirements directory and the relative files

The test.txt file will contain specific packages used to test the project. Since to test the project you also need to install the packages for the production environment the file will first include the production one.

The dev.txt file will contain packages used during the development process and shall install also test and production package

(taking advantage of the fact that test.txt already includes prod.txt).

Last, the main requirements.txt file of the project will just import requirements/prod.txt

Obviously you are free to find the project structure that better suits your need or preferences. This is the structure we are going to use in this project but nothing forces you to follow it in your personal projects.

This separation allows you to install a full-fledged development environment on your machine, while installing only testing tools in a testing environment like the Travis platform and to further reduce the amount of dependencies in the production case.

As you can see, I am not using version tags in the requirements files. This is because this project is not going to be run in a production environment, so we do not need to freeze the environment.

Miscellaneous configuration

The pytest testing library needs to be configured. This is the pytest.ini file that you can create in the root directory (where the setup.py file is located)

To run the tests during the development of the project just execute

If you want to check the coverage, i.e. the amount of code which is run by your tests or "covered", execute

If you want to know more about test coverage check the official documentation of the Coverage.py and the pytest-cov packages.

I strongly suggest the use of the flake8 package to check that your Python code is PEP8 compliant. This is the flake8 configuration that you can put in your setup.cfg file

To check the compliance of your code with the PEP8 standard execute

Flake8 documentation is available here.

Note that every step in this post produces tested code and a of coverage of 100%. One of the benefits of a clean architecture is the separation between layers, and this guarantees a great degree of testability. Note however that in this tutorial, in particular in the REST sections, some tests have been omitted in favour of a simpler description of the architecture.

Domain models

Git tag: step02

Let us start with a simple definition of the StorageRoom model. As said before, the clean architecture models are very lightweight, or at least they are lighter than their counterparts in a framework.

Following the TDD methodology the first thing that I write are the tests. Create the tests/domain/test_storageroom.py and put this code inside it

With these two tests we ensure that our model can be initialized with the correct values and that can be created from a dictionary. In this first version all the parameters of the model are required. Later we could want to make some of them optional, and in that case we will have to add the relevant tests.

Now let's write the StorageRoom class in the rentomatic/domain/storageroom.py file. Do not forget to create the __init__.py file in the subdirectories of the project, otherwise Python will not be able to import the modules.

The model is very simple, and requires no further explanation. One of the benefits of a clean architecture is that each layer contains small pieces of code that, being isolated, shall perform simple tasks. In this case the model provides an initialization API and stores the information inside the class.

The from_dict method comes in handy when we have to create a model from data coming from another layer (such as the database layer or the query string of the REST layer).

One could be tempted to try to simplify the from_dict function, abstracting it and providing it through a Model class. Given that a certain level of abstraction and generalization is possible and desirable, the initialization part of the models shall probably deal with various different cases, and thus is better off being implemented directly in the class.

The DomainModel abstract base class is an easy way to categorize the model for future uses like checking if a class is a model in the system. For more information about this use of Abstract Base Classes in Python see this post.

Serializers

Git tag: step03

Our model needs to be serialized if we want to return it as a result of an API call. The typical serialization format is JSON, as this is a broadly accepted standard for web-based API. The serializer is not part of the model, but is an external specialized class that receives the model instance and produces a representation of its structure and values.

To test the JSON serialization of our StorageRoom class put in the tests/serializers/test_storageroom_serializer.py file the following code

Put in the rentomatic/serializers/storageroom_serializer.py file the code that makes the test pass

Providing a class that inherits from json.JSONEncoder let us use the json.dumps(room, cls=StorageRoomEncoder) syntax to serialize the model.

There is a certain degree of repetition in the code we wrote, and this is the annoying part of a clean architecture. Since we want to isolate layers as much as possible and create lightweight classes we end up somehow repeating certain types of actions. For example the serialization code that assigns attributes of a StorageRoom to JSON attributes is very similar to that we use to create the object from a dictionary. Not exactly the same, obviously, but the two functions are very close.

Use cases (part 1)

Git tag: step04

It's time to implement the actual business logic our application wants to expose to the outside world. Use cases are the place where we implement classes that query the repository, apply business rules, logic, and whatever transformation we need for our data, and return the results.

With those requirements in mind, let us start to build a use case step by step. The simplest use case we can create is one that fetches all the storage rooms from the repository and returns them. Please note that we did not implement any repository layer yet, so our tests will mock it.

This is the skeleton for a basic test of a use case that lists all the storage rooms. Put this code in the tests/use_cases/test_storageroom_list_use_case.py

The test is straightforward. First we mock the repository so that is provides a list() method that returns the list of models we created above the test. Then we initialize the use case with the repo and execute it, collecting the result. The first thing we check is if the repository method was called without any parameter, and the second is the effective correctness of the result.

This is the implementation of the use case that makes the test pass. Put the code in the rentomatic/use_cases/storageroom_use_case.py

With such an implementation of the use case, however, we will soon experience issues. For starters, we do not have a standard way to transport the call parameters, which means that we do not have a standard way to check for their correctness either. The second problem is that we miss a standard way to return the call results and consequently we lack a way to communicate if the call was successful of if it failed, and in the latter case what are the reasons of the failure. This applies also to the case of bad parameters discussed in the previous point.

We want thus to introduce some structures to wrap input and outputs of our use cases. Those structures are called request and response objects.

Requests and responses

Git tag: step05

Request and response objects are an important part of a clean architecture, as they transport call parameters, inputs and results from outside the application into the use cases layer.

More specifically, requests are objects created from incoming API calls, thus they shall deal with things like incorrect values, missing parameters, wrong formats, etc. Responses, on the other hand, have to contain the actual results of the API calls, but shall also be able to represent error cases and to deliver rich information on what happened.

The actual implementation of request and response objects is completely free, the clean architecture says nothing about them. The decision on how to pack and represent data is up to us.

For the moment we just need a StorageRoomListRequestObject that can be initialized without parameters, so let us create the file tests/use_cases/test_storageroom_list_request_objects.py and put there a test for this object.

While at the moment this request object is basically empty, it will come in handy as soon as we start having parameters for the list use case. The code of the StorageRoomListRequestObject is the following and goes into the rentomatic/use_cases/request_objects.py file

The response object is also very simple, since for the moment we just need a successful response. Unlike the request, the response is not linked to any particular use case, so the test file can be named tests/shared/test_response_object.py

and the actual response object is in the file rentomatic/shared/response_object.py

Use cases (part 2)

Git tag: step06

Now that we have implemented the request and response object we can change the test code to include those structures. Change the tests/use_cases/test_storageroom_list_use_case.py to contain this code

The new version of the rentomatic/use_case/storageroom_use_cases.py file is the following

Let us consider what we have achieved with our clean architecture up to this point. We have a very lightweight model that can be serialized to JSON and which is completely independent from other parts of the system. The code also contains a use case that, given a repository that exposes a given API, extracts all the models and returns them contained in a structured object.

We are missing some objects, however. For example, we have not implemented any unsuccessful response object or validated the incoming request object.

To explore these missing parts of the architecture let us improve the current use case to accept a filters parameter that represents some filters that we want to apply to the extracted list of models. This will generate some possible error conditions for the input, forcing us to introduce some validation for the incoming request object.

Requests and validation

Git tag: step07

I want to add a filters parameter to the request. Through that parameter the caller can add different filters by specifying a name and a value for each filter (for instance {'price_lt': 100} to get all results with a price lesser than 100).

The first thing to do is to change the request object, starting from the test. The new version of the tests/use_cases/test_storageroom_list_request_objects.py file is the following

As you can see I added the assert req.filters is None check to the original two tests, then I added 5 tests to check if filters can be specified and to test the behaviour of the object with an invalid filter parameter.

To make the tests pass we have to change our StorageRoomListRequestObject class. There are obviously multiple possible solutions that you can come up with, and I recommend you to try to find your own. This is the one I usually employ. The file rentomatic/use_cases/request_object.py becomes

import collections

class InvalidRequestObject(object):

def __init__(self):
self.errors = []

def add_error(self, parameter, message):
self.errors.append({'parameter': parameter, 'message': message})

def has_errors(self):
return len(self.errors) > 0

def __nonzero__(self):
return False

__bool__ = __nonzero__

class ValidRequestObject(object):

@classmethod
def from_dict(cls, adict):
raise NotImplementedError

def __nonzero__(self):
return True

__bool__ = __nonzero__

class StorageRoomListRequestObject(ValidRequestObject):

def __init__(self, filters=None):
self.filters = filters

@classmethod
def from_dict(cls, adict):
invalid_req = InvalidRequestObject()

if 'filters' in adict and not isinstance(adict['filters'], collections.Map

Show more