Codingthearchitecture.com

Mapping software architecture to code

2013-04-08

One of the things I'm currently doing with a number of software teams is teaching them how to draw pictures. As an industry we've got really good at visualising the way that we work using things like Kanban boards and story walls, but we've forgotten how to visualise the software that we're building. In a nutshell, many teams are trying to move fast but they struggle to create a shared vision that the whole team can work from, which ultimately slows them down. And few people use UML nowadays, which just exacerbates the problem. I've written an article about this and it's due for publication soon (I'll come back and add a link), plus it's covered in my Software Architecture for Developers ebook and in a number of talks that I'm doing around Europe (ITARC, IASA UK, Mix-IT) and the US (SATURN) during April. Here are the slides from Agile software architecture sketches - NoUML! that I presented a few weeks ago in Dublin.

The TL;DR version

The TL;DR version of this post is simply this ... if you're building monolithic software systems but think of them as being made up of a number of smaller components, ensure that your codebase reflects this. Consider organising your code by component (rather than by layer or feature) to make the mapping between software architecture and code explicit. If it's hard to explain the structure of your software system, change it.

Decomposition into components

For the purpose of this post, let's assume visualising a software system isn't a problem and that you're sketching some ideas related to the software architecture for a new system you've been tasked to build. An important aspect of "just enough" software architecture is to understand how the significant elements of a software system fit together. For me, this means going down to the level of components, services or modules. It's worth stressing this isn't about understanding low-level implementation details, it's about performing an initial level of decomposition. The Wikipedia page for Component based development has a good summary, but essentially a component might be something like a risk calculator, audit logger, report generator, data importer, etc. The simplest way to think about a component is that it's a set of related behaviours behind an interface, which may be implemented using one or more collaborating classes. Good components share a number of characteristics with good classes. They should have high cohesion, low coupling, a well-defined public interface, good encapsulation, etc.
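As a rough illustration of "a set of related behaviours behind an interface", here is a minimal Java sketch of an audit logger component. All of the names (AuditComponentSketch, AuditLogger, EntryFormatter, InMemoryAuditLogger) are hypothetical, and nesting everything inside one class is only a way to keep the example in a single file. In a real codebase the same hiding would more likely be done with package-scoped classes.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only: none of these names come from a real codebase.
public class AuditComponentSketch {

    /** The public interface: the only thing other components depend on. */
    public interface AuditLogger {
        void record(String user, String action);
        List<String> entries();
    }

    /** A collaborating class that stays hidden inside the component. */
    private static final class EntryFormatter {
        String format(String user, String action) {
            return user + " -> " + action;
        }
    }

    /** The implementation, also hidden; callers obtain it via the factory method. */
    private static final class InMemoryAuditLogger implements AuditLogger {
        private final EntryFormatter formatter = new EntryFormatter();
        private final List<String> entries = new ArrayList<>();

        @Override
        public void record(String user, String action) {
            entries.add(formatter.format(user, action));
        }

        @Override
        public List<String> entries() {
            // Return an immutable copy so callers cannot modify internal state.
            return List.copyOf(entries);
        }
    }

    /** Factory method, so that callers never name the implementation class. */
    public static AuditLogger newAuditLogger() {
        return new InMemoryAuditLogger();
    }

    public static void main(String[] args) {
        AuditLogger logger = newAuditLogger();
        logger.record("alice", "generated risk report");
        System.out.println(logger.entries());
    }
}
```

Callers only ever see AuditLogger, so the formatter and the storage can change without affecting anything outside the component.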

There are a number of benefits to thinking about a software system in terms of components, but essentially it allows us to think and talk about the software as a small number of high-level abstractions rather than the hundreds and thousands of individual classes that make up most enterprise systems. The photo below shows a typical component diagram produced during the training classes we run. Groups are asked to design a simple financial risk system that needs to pull in some data, perform some calculations and generate an Excel report as the output.

This sketch includes the major components you would expect to see for a system that is importing data, performing risk calculations and generating a report. These components provide us with a framework for partitioning the behaviour within the boundary of our system and it should be relatively easy to trace the major use cases/user stories across them. This is a really useful starting point for the software development process and can help to create a shared vision that the team can work towards. But it's also very dangerous at the same time. Without technology choices (or options), this diagram looks like the sort of thing an ivory tower architect might produce and it can seem very "conceptual" for many people with a technical background.

Talk about components, write classes

People generally understand the benefit of thinking about software as higher level building blocks and you'll often hear people talking in terms of components when they're having architecture discussions. This often isn't reflected in the codebase though. Take a look at your own codebase. Can you clearly see components or does your codebase reflect some other structure? More often than not it's the latter, simply because of the way that the code has been organised. Mark Needham has a great post called Coding: Packaging by vertical slice that talks about one approach to code organisation and a Google search for "package by feature vs package by layer" will throw up lots of other discussions on the same topic. The architectural view of a software system and the code are often very different. This is sometimes why you'll see people ignore architecture diagrams (or documentation) and say "the code is the only single point of truth".

Auto-generating architecture diagrams

To change tack slightly, I was in Dublin a few weeks ago and I met Chris Chedgey, who is part of the inspiration behind this post. Chris is the co-founder of a company called Headway Software and they have a product called Structure101. You should take a look if you've not seen it before; they have some cool stuff in the pipeline. I won't do their product any justice by trying to summarise what it does, but one of its many features is helping you to visualise and understand an existing codebase.

When I teach people how to visualise their software systems, we create a number of simple NoUML sketches at different levels of abstraction. These are the context, containers and components diagrams. This context, containers and components approach is basically just a tree structure. A system is made up of containers (e.g. a web server, application server, database, etc), each of which is further made up of components. You can see some example diagrams on Flickr and in my book.
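To make the tree structure concrete, the following Java sketch models a system, its containers and their components as three nested types and prints them as an indented outline. The type names, the render method and the example data (loosely based on the financial risk system exercise) are all assumptions made for illustration; they don't come from any existing tool.

```java
import java.util.List;

// A sketch only: all type names and the example data are illustrative assumptions.
public class ArchitectureTreeSketch {

    /** Leaf of the tree: a component living inside a container. */
    public record Component(String name) {}

    /** A container (e.g. a web server, application server or database). */
    public record Container(String name, List<Component> components) {}

    /** Root of the tree: the software system as a whole. */
    public record SoftwareSystem(String name, List<Container> containers) {}

    /** Renders the tree as text, indenting by two spaces per level. */
    public static String render(SoftwareSystem system) {
        StringBuilder out = new StringBuilder(system.name()).append('\n');
        for (Container container : system.containers()) {
            out.append("  ").append(container.name()).append('\n');
            for (Component component : container.components()) {
                out.append("    ").append(component.name()).append('\n');
            }
        }
        return out.toString();
    }

    /** Hypothetical model of a small financial risk system. */
    public static SoftwareSystem example() {
        return new SoftwareSystem("Financial Risk System", List.of(
                new Container("Batch Process", List.of(
                        new Component("Data Importer"),
                        new Component("Risk Calculator"),
                        new Component("Report Generator"))),
                new Container("Database", List.of())));
    }

    public static void main(String[] args) {
        System.out.print(render(example()));
    }
}
```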

Given this is really just a tree structure, it should be fairly straightforward to auto-generate these diagrams from an existing codebase. And perhaps there is a tool out there that can do this, but I've never seen one that has worked really well. Microsoft Visual Studio can generate some layer diagrams but I've never met anybody that really raves about the architecture diagram support. Most tools generate diagrams showing dependencies between packages or classes but they don't tend to show components. And what's a component anyway? Is any class that implements an interface a component? If you're using inversion of control, perhaps everything that you inject is a component?

There are a number of reasons why auto-generating such diagrams is tricky but, once we start coding, much of the semantics associated with "containers" (runtime environments, process boundaries, etc) and "components" becomes lost in the sea of classes that make up the typical codebase. Many developers break their systems up into a number of projects within their IDEs to represent reusable libraries and deployable units but external tools often don't have access to this information if they are solely working from a bunch of JAR files or DLLs (for example). In essence, the information related to the abstract structural elements isn't adequately represented within a codebase. If you take a look at most codebases, I'm fairly sure that you could come up with a set of rules as to what defines a component but perhaps it would be easier to simply make these concepts explicit. Some techniques already exist to do this (e.g. architecture description languages) but I've never seen them used in the corporate world.
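One such rule might be a package naming convention. The Java sketch below assumes that the package segment immediately after a hypothetical root package (com.example.risk.component) names the component, and uses that rule to collapse class-level dependencies into component-level ones. The class-level dependency map is hand-written here; in practice it would have to come from some other source, such as a bytecode analysis tool. Every name in the example is made up for illustration.

```java
import java.util.Map;
import java.util.Optional;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Sketch of a convention-based rule; the root package and class names are hypothetical.
public class ComponentDependencySketch {

    // Assumed convention: the package segment right after this prefix names the component.
    static final String COMPONENT_ROOT = "com.example.risk.component.";

    /** Maps a fully qualified class name to its component, if the convention applies. */
    public static Optional<String> componentOf(String className) {
        if (!className.startsWith(COMPONENT_ROOT)) {
            return Optional.empty();
        }
        String rest = className.substring(COMPONENT_ROOT.length());
        int dot = rest.indexOf('.');
        // A class sitting directly in the root package belongs to no component.
        return dot < 0 ? Optional.empty() : Optional.of(rest.substring(0, dot));
    }

    /** Collapses class-to-class dependencies into component-to-component dependencies. */
    public static Map<String, Set<String>> componentDependencies(
            Map<String, Set<String>> classDependencies) {
        Map<String, Set<String>> result = new TreeMap<>();
        classDependencies.forEach((from, targets) -> {
            Optional<String> source = componentOf(from);
            if (source.isEmpty()) {
                return; // the depending class is outside any component
            }
            for (String to : targets) {
                Optional<String> target = componentOf(to);
                // Ignore dependencies within a component and on non-component classes.
                if (target.isPresent() && !target.equals(source)) {
                    result.computeIfAbsent(source.get(), k -> new TreeSet<>())
                          .add(target.get());
                }
            }
        });
        return result;
    }

    /** Hand-written class-level dependencies for a hypothetical risk system. */
    public static Map<String, Set<String>> exampleClassDependencies() {
        return Map.of(
                COMPONENT_ROOT + "calculator.RiskCalculatorImpl", Set.of(
                        COMPONENT_ROOT + "importer.TradeDataImporter",
                        COMPONENT_ROOT + "calculator.Formula",
                        "java.util.List"),
                COMPONENT_ROOT + "report.ExcelReportGenerator", Set.of(
                        COMPONENT_ROOT + "calculator.RiskCalculator"));
    }

    public static void main(String[] args) {
        System.out.println(componentDependencies(exampleClassDependencies()));
    }
}
```

The weakness of a rule like this is that it lives in the tool rather than in the code, so every tool has to be told about the convention separately.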

Packaging by component

To bring this discussion back to code, the organisation of the codebase can really help or hinder here. Organising a codebase by layer makes it easy to see the overall structure of the software but there are trade-offs. For example, you need to delve inside multiple layers (e.g. packages, namespaces, etc) in order to make a change to a feature or user story. Also, many codebases end up looking eerily similar given the fairly standard approach to layering within enterprise systems. Uncle Bob Martin says that if you're looking at a codebase, it should scream something about the business domain. Organising your code by feature rather than by layer gives you this, but again there are trade-offs. A variation I've been experimenting with is organising code explicitly by component. The following screenshot shows an example of this in the codebase for my techtribes.je website (a content aggregator and portal for Jersey's digital sector).

This is similar to packaging by feature, but it's more akin to the "micro services" that Mark Needham talks about in his blog post. Each sub-package of je.techtribes.component houses a separate component, complete with its own internal layering and Spring configuration. As far as possible, all of the internals are package scoped. You could potentially pull each component out and put it in its own project or source code repository to be versioned separately. This approach will likely seem familiar to you if you're building something that has a very explicit loosely coupled architecture such as a distributed messaging system made up of loosely coupled components. I'm fairly confident that most people are still building something more monolithic in nature though, despite thinking about their system in terms of components. I've certainly packaged *parts* of monolithic codebases using a similar approach in the past but it's tended to be fairly ad hoc. Let's be honest, organising code into packages isn't something that gets a lot of brain-time, particularly given the refactoring tools that we have at our disposal. Organising code by component lets you explicitly reflect the concept of "a component" from the architecture into the codebase. If your software architecture diagram screams something about your business domain (and it should), this will be reflected in your codebase too.

The structural elements of software

We could create a convention here to say that all sub-packages of je.techtribes.component are components, but it would be much easier to explicitly mark components using metadata. In Java, we could use annotations to do this, attributes in .NET, etc. If we used the same approach for other structural elements of software (e.g. services, layers, containers, etc), tool vendors could use this metadata to generate meaningful and *simple* architecture diagrams automatically. Plus, they could also use this structural information to generate dependency diagrams that focus on components rather than classes. I've started experimenting with annotations as a way to do this and I've created a GitHub repo to store whatever I come up with.
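As a minimal sketch of what marking components with metadata could look like in Java, the example below defines a runtime-retained annotation and uses reflection to pick out the annotated types. The annotation name (Component), its description attribute and the example interfaces are assumptions made for illustration, not the contents of the repo mentioned above. Note that this hypothetical annotation is unrelated to Spring's own @Component.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch: the annotation and the annotated types are hypothetical.
public class ComponentAnnotationSketch {

    /** Marks a type as an architectural component; kept at runtime so tools can read it. */
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.TYPE)
    public @interface Component {
        String description() default "";
    }

    @Component(description = "Imports trade data")
    public interface DataImporter {}

    @Component(description = "Calculates risk")
    public interface RiskCalculator {}

    /** Deliberately unannotated, so it should not show up as a component. */
    public interface NotAComponent {}

    /** Returns "Name: description" for every candidate type marked as a component. */
    public static List<String> describeComponents(Class<?>... candidates) {
        List<String> result = new ArrayList<>();
        for (Class<?> candidate : candidates) {
            Component marker = candidate.getAnnotation(Component.class);
            if (marker != null) {
                result.add(candidate.getSimpleName() + ": " + marker.description());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        describeComponents(DataImporter.class, NotAComponent.class, RiskCalculator.class)
                .forEach(System.out::println);
    }
}
```

A real tool would need some way to find the candidate types in the first place (by scanning the classpath, for example); here they are simply passed in by hand.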

The major caveat to all of this is that designing a software system based around components isn't "the only way". It's a nice approach to think about software systems that are more monolithic in nature and it's a great stepping stone to designing loosely coupled architectures. But it isn't a silver bullet. Regardless of how you design software, I do hope this post has got you thinking about the mapping between software architecture and how it's reflected in the code.

Software architecture and coding are often seen as mutually exclusive disciplines and there's often very little mapping from the architecture into the code and back again. Effectively and efficiently visualising a software architecture can help to create a good shared vision within the team, which can help it go faster. Having a simple and explicit mapping from the architecture to the code can help even further, particularly when you start looking at collaborative design and collective code ownership. Furthermore, it helps bring software architecture firmly back into the domain of the development team, which is ultimately where it belongs.