Geekswithblogs.net

Lightweight Software Architecture on Napkins

2014-03-16

Originally posted on: http://geekswithblogs.net/theArchitectsNapkin/archive/2014/03/16/lightweight-software-architecture-on-napkins.aspx

Doing explicit software architecture is nothing for the faint of heart, it seems. Software architecture carries the nimbus of heavy weight, of something better left to specialists – or to be eschewed altogether. Both views are highly counterproductive, though. Leaving architecture to specialists builds walls where there should be none; it hampers feedback and flexibility. And not explicitly architecting (or just designing) software leads to less-than-desirable evolvability and collective software ownership.

So the question is how to do “just enough” software design up front. What's the right amount? What's an easy way to do it?

Since 2005 I have been on a quest to find answers to these questions. And I'm glad I have found some – at least for me ;-) I've lost my “fear of the blank flipchart” when confronted with a requirements document. No longer do I hesitate to start designing software. I've shrugged off the UML shackles, and I've left the misleading path of object-oriented dogma.

This is not to say there is no value in some UML diagrams or features of object-oriented technologies. Of course there is – as long as they help ;-)

But as with many practices, one never reaches the finish line when it comes to software architecture. Although I feel comfortable attacking just about any requirements challenge, it's one thing to feel confident – and an altogether different one to actually live up to the challenge. So I'm constantly on the lookout for software architecture exercises to further hone my skills. That means applying my method – which is a sampling of many approaches with some added idiosyncrasies – plus reflecting on the process and the outcome.

At the Coding Dojo of the Clean Code Developer School I've compiled more than 50 such exercises of different sizes (from small code/function katas to architecture katas). If you like, try them out yourself or with your team (some German language skills required, though ;-).

And recently I stumbled across another fine architecture kata. It's from Simon Brown, whose book “Software Architecture for Developers” I have read. At first the exercise was only in the book, but then he made it public. I included it in the Coding Dojo and added line and page numbers. Find the full text of the requirements for the Financial Risk System here.

Since Simon included pictures of architectural designs for this exercise from participants in his trainings, I thought I should view that as a challenge and try it myself. If I'm confident my approach to software architecture is helpful, too, then why not expose myself with it? Maybe there are interesting similarities to be discovered – maybe there are differences that could be discussed.

In the following I'll approach the architecture of the Financial Risk System (FRS) my way. This will show how I approach a design task, but it might lack some explanation of the principles underlying the approach. There's not enough room here to lay out my whole thought framework. But I'm working on a book… ;-) It's called “The Architect's Napkin – The Cheat Sheet”. But beware: the first version is in German.

Why it's called the Architect's Napkin you'll see in a minute – or read here. Just let me say: if software architecture is to become a discipline for every developer, it needs to be lightweight. And design drawings should fit on an ordinary napkin. Anything else will tend to be too complicated and hard to communicate.

And now for some design practice. Here's my toolbox for software architects:

Basic duality: system vs environment

Every software project should start by focusing on what its job is, and what it is not. Its job is to build a software system. That's what has to be at the center of everything. By putting something at the center, though, everything else is not at the center. That's the environment of what's at the center. So at the beginning of a software project there is a duality: a system to build vs. the environment (or context) of the system.

Furthermore, not all parts of the system to build are created equal. Within the system there is a core to be distinguished from the rest: the domain. That's the most important part of any software system and what we need to focus on most. It's for this core that a customer wants the system in the first place.

That's a simple diagram. But it's an important one, as you'll see. Best of all: you can draw it right away when your boss wants you to start a new software project ;-) It even looks the same for all software systems. Sure, it's very, very abstract. But that helps as long as you don't have a clue what the requirements are about.

Spotting actors

With the system in the focus of my attention, I go through the requirements and try to spot who's actually going to use it. Who are the actors; who is actively influencing the system? I'm not looking for individual persons, but for roles. And these roles might be played by non-human actors, i.e. other software systems.

The first actor I encounter is such a non-human actor: the risk calculation scheduler. It requests that the software system run and produce a risk report. Lines 8 and 9 in the requirements document allude to this actor.

Then on page 2, line 54f, the second actor is described: the risk report consumer. These are the business users who want to read the reports.

Line 56f on the same page reveals a third actor: the calculation configurator. This might be the same business user reading a report, but it's a different role he's playing when he changes calculation parameters.

Finally, lines 111ff allude to a fourth actor: the monitoring scheduler. It starts the monitoring that observes the risk calculation.

The risk calculation scheduler and the monitoring scheduler are special insofar as they are non-human actors. They represent some piece of software that initiates some behavior in the software system.

Here's the result of my first pass through the requirements:

Now I know who's putting demands on the software system. All functionality is there to serve some actor. Actors need the software system; they trigger it in order to produce results [1].

Compiling resources

During my second pass through the requirements I focus on what the software system needs. Almost all software depends on resources in the environment to do its job. That might be a database or just the file system. Or it might be some server, a printer, a webcam, or some other hardware. Here are the resources I found:

Page 1, line 10f: the existing Trade Data System (TDS)

Page 1, line 11: the existing Reference Data System (RDS)

Page 2, line 52f: the risk report (RR)

Page 2, line 54f: some means to distribute the RR to the risk report consumers (risk report distribution, RRD)

Page 2, line 54f: a business user directory (BUD)

Page 2, line 56f: external parameters (EP) for the risk calculation

Page 3, line 92ff: an audit log (AL)

Page 4, line 111ff: SNMP

Page 4, line 117f: an archive (Arc)

Nine resources support the whole software system. And each of them requires some kind of special technology (library, framework, API) to use it.
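
To make the harvest tangible, the environment found in the first two passes can be written down as plain data. The following is a minimal Python sketch of my own, not a prescribed notation; the names Actor, Resource and SystemEnvironment are hypothetical. It just records the four actors and nine resources so the diagram can later be walked like a checklist.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class Actor:
    """A role that triggers the system; it may be played by a person or by software."""
    name: str
    human: bool


@dataclass(frozen=True)
class Resource:
    """Something in the environment the system depends on to do its job."""
    abbreviation: str
    name: str


@dataclass
class SystemEnvironment:
    """The system-environment diagram: actors and resources around the domain core."""
    actors: list[Actor] = field(default_factory=list)
    resources: list[Resource] = field(default_factory=list)

    def non_human_actors(self) -> list[str]:
        return [a.name for a in self.actors if not a.human]


frs = SystemEnvironment(
    actors=[
        Actor("Risk calculation scheduler", human=False),
        Actor("Risk report consumer", human=True),
        Actor("Calculation configurator", human=True),
        Actor("Monitoring scheduler", human=False),
    ],
    resources=[
        Resource("TDS", "Trade Data System"),
        Resource("RDS", "Reference Data System"),
        Resource("RR", "Risk report"),
        Resource("RRD", "Risk report distribution"),
        Resource("BUD", "Business user directory"),
        Resource("EP", "External parameters"),
        Resource("AL", "Audit log"),
        Resource("SNMP", "SNMP"),
        Resource("Arc", "Archive"),
    ],
)

print(len(frs.actors), len(frs.resources))  # 4 9
print(frs.non_human_actors())  # ['Risk calculation scheduler', 'Monitoring scheduler']
```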

The diagram as a checklist

What I've done so far was just analysis, not design. I just harvested two kinds of environmental aspects: actors/roles and resources. But they are important for three reasons. Firstly, they help structure the software system, as you'll see later. Secondly, they guide further analysis of the requirements. And last but not least, they function as a checklist for the architect.

Each environmental element poses questions, some specific to its kind, some general. And by observing how easy or difficult it is to answer them, I get a feeling for the risk associated with them.

My third pass through the requirements is not along the document, but around the circle of environmental aspects identified. As I focus on each, I try to understand it better. Here are some questions the “visual checklist” prompts me to ask:

Actor “Risk calculation scheduler”: How should the risk calculation be started automatically each day? Which operating system is it running on anyway? Windows offers the Task Scheduler; on Linux there is cron. But there are certainly more options. Some more research is needed – and talking to the customer.

Actor “Risk report consumer”: How should consumers view reports? They need to be able to use Excel, but is that all? Maybe Excel is just a power tool for report analysis, and a quick overview should be easier to get? Maybe it's sufficient to send them the reports via email. Or maybe just a notification of a new report should be sent via email, and the report itself can be opened with Excel from a file share? I need to talk to the customer.

Actor “Calculation configurator”: What kind of UI should the configurator be using? Is a text editor enough to access a text file containing the parameters – protected by operating system access permissions? Or is a dedicated GUI editor needed? I need to talk to the customer.

Actor “Monitoring scheduler”: The monitoring aspect is pretty vague to me. I don't really have an idea yet how to do it. I feel this is an area of quite some risk. Maybe monitoring should be done by some permanently running Windows service/daemon that checks for new reports every day and can be pinged with a heartbeat by the risk calculation? Some research is required here, I guess. Plus talking to the customer about how important monitoring is compared to other aspects.

Resource “TDS”:

What's the data format? XML

What's the data structure? A simple table; see page 1, section 1.1 for details.

What's the data volume? In the range of 25000 records per day within the next 5 years (see page 3, section 3.b).

What's the quality of the data, what's the reliability of data delivery? No information found in the requirements.

How to access the data? It's delivered each day as a file which can be read by some XML API. Sounds easy.

When is it available? Each day at 17:00h New York time (GMT-4 during daylight saving time, GMT-5 in winter).
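
Reading the TDS file really does look easy. Here is a minimal Python sketch using the standard library's xml.etree.ElementTree, assuming a hypothetical layout of one trade element per record with id, date, value-usd and counterparty attributes. The actual record structure is in section 1.1 of the requirements and will likely differ; TDSRecord and import_tds are made-up names.

```python
from __future__ import annotations

import io
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class TDSRecord:
    # Hypothetical fields; the real layout is described in section 1.1.
    trade_id: str
    date: str
    value_usd: Decimal
    counterparty_id: str


def import_tds(source) -> list[TDSRecord]:
    """Read a daily TDS export; `source` may be a file name or a binary file object.

    Assumed layout: <trades><trade id=".." date=".." value-usd=".." counterparty=".."/></trades>
    """
    root = ET.parse(source).getroot()
    return [
        TDSRecord(
            trade_id=t.get("id"),
            date=t.get("date"),
            value_usd=Decimal(t.get("value-usd")),  # Decimal avoids float rounding of money
            counterparty_id=t.get("counterparty"),
        )
        for t in root.iter("trade")
    ]


sample = io.BytesIO(
    b'<trades>'
    b'<trade id="T1" date="2014-03-14" value-usd="1500.25" counterparty="C7"/>'
    b'<trade id="T2" date="2014-03-14" value-usd="-320.00" counterparty="C9"/>'
    b'</trades>'
)
records = import_tds(sample)
print(len(records), records[0].value_usd)  # 2 1500.25
```

At some 25000 records per day, loading the whole file into memory should be unproblematic; for much larger files ElementTree.iterparse would allow streaming instead.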

Resource “RDS”:

What's the data format? XML

What's the data structure? Not specified; need to talk to the customer.

What's the data volume? Some 20000 records per day (see page 3, section 3.b).

What's the quality of the data, what's the reliability of data delivery? No information found in the requirements.

How to access the data? It's delivered each day as a file which can be read by some XML API. Sounds easy – but the record structure needs to be clarified.

Resource “Risk report”:

What's the data format? It needs to be Excel-compatible; that could mean CSV is ok. At least CSV would be easy to produce. Need to ask the customer. If more than CSV is needed, e.g. Excel XML or worse, then research time has to be allotted, because I'm not familiar with the appropriate APIs.

What's the data structure? No information has been given. Need to talk to the customer.

What's the data volume? I presume it depends on the number of TDS records, so we're talking about some 25000 records of unknown size. Need to talk to the customer.

How to deliver the data? As already said above, I'm not sure whether the risk report should just be stored as a data file or be sent to the consumers via email. For now I'll go with sending it via email. That should deliver on the availability requirement (page 3, section 3.c) as well as on the security requirements (page 3, section 3.e, lines 82f, 84f, 89f).

When must it be ready? By 09:00h Singapore time (GMT+8) on the day following TDS file production.
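
If CSV turns out to be acceptable, producing the report needs nothing beyond the standard library. A minimal sketch, assuming hypothetical columns (counterparty, exposure_usd, risk), since the report structure is still an open question; write_risk_report is a made-up name.

```python
import csv
import io
from decimal import Decimal


def write_risk_report(rows, out) -> int:
    """Write one CSV line per counterparty to the text stream `out`; return the row count.

    The columns are hypothetical: the real report structure is still an open question.
    """
    writer = csv.writer(out)
    writer.writerow(["counterparty", "exposure_usd", "risk"])
    count = 0
    for counterparty, exposure, risk in rows:
        writer.writerow([counterparty, exposure, risk])
        count += 1
    return count


buffer = io.StringIO()
written = write_risk_report(
    [("C7", Decimal("1500.25"), 0.12), ("C9", Decimal("-320.00"), 0.03)],
    buffer,
)
print(written)
print(buffer.getvalue())
```

When writing to an actual file instead of a buffer, the csv module documentation recommends opening it with newline="".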

Resource “Risk report distribution”: This could be done with SMTP. The requirements don't state how the risk reports should be accessed, so I need to clarify this with the customer.
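
Should the reports go out via SMTP, Python's standard library covers both building and sending the mail. This sketch assumes the report is attached as a CSV file; sender, recipients, the relay host and the function names are placeholders. Only building the message is exercised here, since the company's mail infrastructure is unknown.

```python
from __future__ import annotations

import smtplib
from email.message import EmailMessage


def build_report_mail(sender: str, recipients: list[str], report_csv: str, report_date: str) -> EmailMessage:
    """Compose a mail with the risk report attached as a CSV file."""
    msg = EmailMessage()
    msg["Subject"] = f"Risk report {report_date}"
    msg["From"] = sender
    msg["To"] = ", ".join(recipients)
    msg.set_content(f"Please find the risk report for {report_date} attached.")
    # A str payload with subtype "csv" becomes a text/csv attachment.
    msg.add_attachment(report_csv, subtype="csv", filename=f"risk-report-{report_date}.csv")
    return msg


def send_report_mail(msg: EmailMessage, host: str, port: int = 25) -> None:
    """Hand the message to an SMTP relay (not exercised here)."""
    with smtplib.SMTP(host, port) as smtp:
        smtp.send_message(msg)


mail = build_report_mail(
    "frs@example.com",
    ["analyst1@example.com", "analyst2@example.com"],
    "counterparty,risk\nC7,0.12\n",
    "2014-03-17",
)
attachment = mail.get_payload()[1]
print(mail["To"], attachment.get_content_type(), attachment.get_filename())
```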

Resource “Business user directory”: This could be an LDAP server or some RDBMS or whatever. The only thing I'd like to assume is that the BUD contains all business users who should receive the risk reports as well as those who have permission to change the configuration parameters. I would like to run a query on the BUD to retrieve the former and use it to authenticate and authorize the latter. Need to talk to the customer for more details.

Resource “External parameters”: No details on the parameters are given. But I assume it's not much data. The simplest thing probably would be to store them in a text file (XML, JSON…). That could be protected by encryption and/or file system permissions, so only the configurator role can access it. Need to ask the customer if that would be ok.
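
A sketch of the simplest option, assuming JSON and POSIX file permissions: the loader refuses a parameter file that group or others could modify. The parameter name confidence_level and the function load_parameters are made up, and a permission check like this is only a crude guard, not a replacement for proper access control.

```python
import json
import os
import stat
import tempfile


def load_parameters(path: str) -> dict:
    """Load external calculation parameters from a JSON file.

    Refuses the file if group or others may write to it (POSIX permission bits),
    so that only the owning configurator account can have changed it.
    """
    mode = os.stat(path).st_mode
    if mode & (stat.S_IWGRP | stat.S_IWOTH):
        raise PermissionError(f"{path} is writable by group or others")
    with open(path, encoding="utf-8") as f:
        return json.load(f)


# Demo with a throwaway file; "confidence_level" is a made-up parameter name.
demo_path = os.path.join(tempfile.mkdtemp(), "external-parameters.json")
with open(demo_path, "w", encoding="utf-8") as f:
    json.dump({"confidence_level": 0.99}, f)

os.chmod(demo_path, 0o600)  # owner only: accepted
params = load_parameters(demo_path)

os.chmod(demo_path, 0o666)  # world-writable: rejected
try:
    load_parameters(demo_path)
    rejected = False
except PermissionError:
    rejected = True

print(params, rejected)
```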

Resource “Audit log”:

Can the AL be used for what should be logged according to page 3, sections 3.f and 3.g?

Is there logging infrastructure already in place which could be used for the AL?

What are the access constraints for the AL? Does it need special protection?
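
If no logging infrastructure exists, a dedicated logger writing to its own file would be a starting point. A minimal sketch with Python's logging module, assuming a logger named frs.audit and made-up event texts; whether the AL needs tamper protection is exactly one of the open questions above.

```python
import logging
import os
import tempfile


def make_audit_logger(path: str) -> logging.Logger:
    """Return a dedicated audit logger that writes to its own file."""
    logger = logging.getLogger("frs.audit")
    logger.setLevel(logging.INFO)
    logger.propagate = False  # keep audit entries out of the general application log
    if not logger.handlers:  # avoid duplicate handlers on repeated calls
        handler = logging.FileHandler(path, encoding="utf-8")
        handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
        logger.addHandler(handler)
    return logger


audit_path = os.path.join(tempfile.mkdtemp(), "audit.log")
audit = make_audit_logger(audit_path)
audit.info("risk report generated for %s", "2014-03-14")  # hypothetical event text
audit.info("parameter %s changed by %s", "confidence_level", "jdoe")

# FileHandler flushes after every record, so the file can be read right away.
with open(audit_path, encoding="utf-8") as f:
    audit_lines = f.read().splitlines()
print(audit_lines)
```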

Resource “SNMP”: I don't have any experience with SNMP. But it sounds like SNMP traps can be sent via an API even from C# (which I'm most familiar with). Need to do some research here. The most important question is how to detect the need to send a trap (see the monitoring scheduler above).

Resource “Archive”:

Is there any Arc infrastructure already in place?

What are the access constraints for the Arc? Does it need special protection?

My current idea would be to store the TDS file and the RDS file together with the resulting risk report file in a zip file and put that on some file server (or upload it into a database server). But I guess I need to talk to the customer about this.
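
The zip idea can be sketched with Python's zipfile module. The file names and the function name archive_run below are hypothetical, and where the resulting archive ends up (file server or database) is left open:

```python
import os
import tempfile
import zipfile


def archive_run(archive_path: str, *files: str) -> list:
    """Bundle the inputs and the resulting report of one run into a single zip file."""
    with zipfile.ZipFile(archive_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for path in files:
            zf.write(path, arcname=os.path.basename(path))  # store without directories
    with zipfile.ZipFile(archive_path) as zf:
        return zf.namelist()


# Demo: create three small stand-in files for one day's run.
work = tempfile.mkdtemp()
inputs = []
for name, content in [
    ("tds-2014-03-14.xml", "<trades/>"),
    ("rds-2014-03-14.xml", "<counterparties/>"),
    ("risk-report-2014-03-14.csv", "counterparty,risk\n"),
]:
    path = os.path.join(work, name)
    with open(path, "w", encoding="utf-8") as f:
        f.write(content)
    inputs.append(path)

archived_names = archive_run(os.path.join(work, "frs-2014-03-14.zip"), *inputs)
print(archived_names)
```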

In addition to the environmental aspects there is the domain to ask questions about, too:

How are risks calculated anyway? The requirements don't say anything about that. Need to talk to the customer, because that's an important driver for the functional design.

How long will it take to calculate risks? The requirements document doesn't say that either; is it more like a few msec per risk, or several seconds, or even minutes? Need to talk to the customer, because that's an important driver for the architectural design (which is concerned with qualities/non-functional requirements).

When the TDS file is produced at 17:00h NYC time (GMT-5) on some day n, it's 22:00h GMT the same day and 06:00h on day n+1 in Singapore (GMT+8). This gives the risk calculation some 3 hours to finish before the 09:00h deadline. (During daylight saving time, GMT-4, the window is 4 hours, so winter time is the tighter case.) Can this be done with a single process? That needs to be researched. The goal is, of course, to keep the overall solution as simple as possible.
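
To double-check the arithmetic, here is a small Python sketch that computes the calculation window for both New York offsets. It deliberately uses the fixed GMT offsets given in the requirements rather than a time zone database; the function name calculation_window is hypothetical. A production version would more likely use zoneinfo with the key "America/New_York" so daylight saving switches are handled automatically.

```python
from datetime import datetime, timedelta, timezone

NYC_WINTER = timezone(timedelta(hours=-5))  # GMT-5
NYC_SUMMER = timezone(timedelta(hours=-4))  # GMT-4, daylight saving time
SINGAPORE = timezone(timedelta(hours=8))    # GMT+8, no daylight saving


def calculation_window(tds_ready: datetime) -> timedelta:
    """Time from TDS availability (day n, New York) to 09:00 Singapore time on day n+1."""
    next_day = tds_ready.date() + timedelta(days=1)
    deadline = datetime(next_day.year, next_day.month, next_day.day, 9, 0, tzinfo=SINGAPORE)
    # Subtracting aware datetimes compares the underlying UTC instants.
    return deadline - tds_ready


winter = calculation_window(datetime(2014, 1, 15, 17, 0, tzinfo=NYC_WINTER))
summer = calculation_window(datetime(2014, 7, 15, 17, 0, tzinfo=NYC_SUMMER))
print(winter, summer)  # 3:00:00 4:00:00
```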

So much for a first run through the visual checklist the system-environment diagram provides.

The purpose of this was to further understand the requirements – and identify areas of uncertainty. This way I got a feeling for the risks lurking in the requirements, e.g.

The largest risk to me currently is with the domain: I don't know how the risk calculation is done, which has a huge impact on the architecture of the core.

Then there is quite some risk in conjunction with infrastructure. What kind of infrastructure is available? What are the degrees of freedom in choosing new infrastructure? How rigid are security requirements?

Finally there is some risk in technologies I don´t have any experience with.

Here´s a color map of the risk areas identified:

With such a map in my hand, I´d like to talk to the customer. It would give us a guideline in our discussion. And it´s a list of topics for further research. Which means it´s kind of a backlog for things to do and get feedback on.

But alas, the customer is not available. Sounds familiar? ;-) So what can I do? Probably the wisest thing to do would be to stop further design and wait for the customer. But this would spoil the exercise :-) So I'll continue to design, tentatively. And hopefully this does not turn out to be waste in the end.

Refining to applications – Design for agility

The FRS is too big to be implemented, or even discussed and designed further, as a whole. It needs to be chopped up :-) I call that slicing, in contrast to the usual layering. At this point I'm not interested in the more technical details that layers represent. I'd like to view the system through the eyes of the customer/users. So the next step for me is to find increments that make sense to the customer and can be focused on in turn.

For this slicing I let myself be guided by the actors of the system-environment diagram. I'd like to slice the system so that each actor gets its own entry point into it. I call such an entry point an application (or app for short).

Each app is a smaller software system in itself. That's why I use the same symbol for them as for the whole software system. Together the apps make up the complete FRS. But each app can be delivered separately and provides some value to the customer. Or I work on some app for a while without finishing it, then switch to another to move it forward, then switch to yet another etc. Round and round it can go ;-) Always listening to what the customer finds most important at the moment – or where I think I need feedback most.

As you can see, each app serves a single actor. That means, each app can and should be implemented in a way to serve this particular actor best. No need to use the same platform or UI technologies for all apps.

Also the diagram shows how I think the apps share resources:

The Risk Report Calculation app needs to read config data from EP and produces a report RR to be sent to business users listed in BUD. Progress or hindrances are logged to AL.

The Config Editor also needs to access BUD to check who's authorized to edit the data in EP. Certain events are logged to AL.

The Report Reader just needs to access the report file. Authorization is implicit: since the business user is allowed to access the RR folder on some file share, he can load the report file with Excel. But if need be, the Report Reader could be more sophisticated and require the business user to authenticate himself. Then the report reader would also need access to BUD.

The Monitor app checks the report folder each day to see whether a new report file has arrived. In addition, the Monitor app could serve as a resource to the Report Calculation, which could send it a heartbeat to signal that it's doing well.
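
Both monitoring ideas are cheap to sketch. The following Python sketch assumes a hypothetical file naming scheme for reports and an injectable clock, so the heartbeat timeout can be demonstrated without waiting; report_arrived and HeartbeatMonitor are made-up names, and actually raising an alert (e.g. an SNMP trap) is left out.

```python
import os
import tempfile
import time
from datetime import date


def report_arrived(report_folder: str, report_date: date) -> bool:
    """Daily check: does the report file for the given date exist yet?"""
    name = f"risk-report-{report_date.isoformat()}.csv"  # hypothetical naming scheme
    return os.path.isfile(os.path.join(report_folder, name))


class HeartbeatMonitor:
    """Receives heartbeats from the calculation and flags it as stalled after a timeout."""

    def __init__(self, timeout_seconds: float, clock=time.monotonic):
        self.timeout = timeout_seconds
        self.clock = clock
        self.last_beat = clock()  # the clock starts when monitoring starts

    def beat(self) -> None:
        self.last_beat = self.clock()

    def stalled(self) -> bool:
        # In a real system this is where an alert would be raised.
        return self.clock() - self.last_beat > self.timeout


folder = tempfile.mkdtemp()
missing_before = report_arrived(folder, date(2014, 3, 14))
open(os.path.join(folder, "risk-report-2014-03-14.csv"), "w").close()
present_after = report_arrived(folder, date(2014, 3, 14))

now = [0.0]  # fake clock for the demo
monitor = HeartbeatMonitor(timeout_seconds=600, clock=lambda: now[0])
now[0] = 300.0
monitor.beat()
now[0] = 800.0
stalled_at_800 = monitor.stalled()   # 500 s since the last beat
now[0] = 1000.0
stalled_at_1000 = monitor.stalled()  # 700 s since the last beat
print(missing_before, present_after, stalled_at_800, stalled_at_1000)
```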

The other resources are used just by the Report Calculation.
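
The sharing described above can be captured in a small table, and the resources used by more than one app fall out of it. Those are the places where apps have to agree on formats and access rules. A minimal Python sketch; the dictionary simply transcribes the description above, and shared_resources is a made-up helper.

```python
# Resources per application, transcribed from the description above.
APP_RESOURCES = {
    "Risk Report Calculation": {"TDS", "RDS", "EP", "RR", "RRD", "BUD", "AL", "SNMP", "Arc"},
    "Config Editor": {"BUD", "EP", "AL"},
    "Report Reader": {"RR"},  # plus BUD if readers had to authenticate
    "Monitor": {"RR"},        # watches the report folder
}


def shared_resources(app_resources: dict) -> dict:
    """Map every resource used by more than one app to the sorted list of those apps."""
    users = {}
    for app, resources in app_resources.items():
        for resource in resources:
            users.setdefault(resource, []).append(app)
    return {r: sorted(apps) for r, apps in sorted(users.items()) if len(apps) > 1}


shared = shared_resources(APP_RESOURCES)
for resource, apps in shared.items():
    print(resource, apps)
```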

Now that I have sliced up the whole software system into applications, I can focus on them in turn. What's the most important one? Where should I zoom in?

Hosting applications – Design for quality

Zooming in on those applications can mean two things: I could try to slice them up further. That would mean I pick an application and identify its dialogs and then the interactions within each dialog. That way I'd reach the function level of a software system, where each function represents an increment. Such slicing would further structure the software system from the point of view of the domain. It would be agile design, since the resulting structure would match the view of the customer. Applications, dialogs, and interactions are of concern to him.

Or I could zoom in from a technical angle. I'd leave the agile domain dimension of design, which focuses on functionality. But then which dimension should I choose? There are two technical dimensions, in my view. One is concerned with non-functional requirements or qualities (e.g. performance, scalability, security, robustness); I call it the host dimension. Its elements describe the runtime structure of software. The other is concerned with evolvability and production efficiency (jointly called the “security of investment” aspect of requirements); I call it the container dimension. Its elements describe the design time structure of software.

So, which dimension to choose? I opt for the host dimension. And I focus on the Risk Report Calculation application. That's the most important app of the software system, I guess.

Whereas the domain dimension of my software architecture approach decomposes a software system into ever smaller slices called applications, dialogs, and interactions, the host dimension provides decomposition levels like device, operating system process, or thread.

Focusing on one app the questions thus are: How many devices are needed to run the app so it fulfills the non-functional requirements? How many processes, how many threads?

What are the non-functional requirements determining the number of devices for the calculation app of the FRS? It needs to run in the background (page 1, line 7f), it needs to generate the report within 3 hours (line 28 + line 63), it needs to be able to log certain events (page 3, line 94) and be monitored (page 4, lines 111ff).

How many devices need to be involved to run the calculation strongly depends on how long the calculations take. If they are not too complicated, then a single server will do.

And how many processes should make up the calculation on this server? Reading section 2, lines 43ff, I think a single process will be sufficient. It can be started automatically by the operating system, and it is fast enough to do the import, calculation, and notification within 3 hours. It can have access to the AL resource and can be monitored (one way or the other).

At least that's what I want to assume, lacking further information as noted above.

Of course this host diagram again is a checklist for me. For each new element – device, process – I should ask appropriate questions, e.g.

Application server device:

Which operating system?

How much memory?

Device name, IP address?

How can an app be deployed to it?

Application process:

How can it be started automatically?

Which runtime environment to use?

What permissions are needed to access the resources?

Should the application code own the process or should it run inside an application server?

The device host diagram and the process host diagram look pretty much the same. That's because both contain only a single element. In other cases, though, a device is decomposed into several processes. Or there are several devices, each with more than one process.

Also, these are only the processes that seem necessary to fulfill the quality requirements. More might be added to improve evolvability.

Nevertheless drawing those diagrams is important. Each host level (device, process – plus three more) should be checked. Each can help to fulfill certain non-functional requirements, for example: devices are about scalability and security, processes are about robustness and performance and security, threads are about hiding or lowering latency.

Separating containers – Design for security of investment

Domain slices are driven directly by the functional requirements. Hosts are driven by quality concerns. But what drives elements like classes or libraries? They belong to the container dimension of my design framework. And they can only partly be derived directly from requirements. I don't believe in starting software design with classes. They don't provide any value to the user. Rather I start with functions (see below) – and then find abstractions on higher container levels for them.

Nevertheless, the system-environment diagram already hints at some containers. On which level of the container hierarchy they should reside is an altogether different question. But at least separate classes (in OO languages) should be defined to represent them. Usually separate libraries are warranted, too, for even more decoupling.

Here´s the simple rule for the minimum number of containers in any design:

communication with each actor is encapsulated in a container

communication with each resource is encapsulated in a container

the domain of course needs its own container – at least one, probably more

a dedicated container to integrate all functionality; usually I don't draw this one; it's implicit and always present, but if you like, take the large circle of the system as its representation

Each container has its own symbol: the actor facing container is called a portal and drawn as a rectangle, the resource facing containers are called providers and drawn as triangles, and the domain is represented as a circle.

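That way I know the calculation app will consist of 9+1+1+1=12 containers: nine providers (one per resource), one portal for the risk calculation scheduler, one domain container, and the integration. It's a simple and mechanical separation of concerns. And it serves evolvability as well as production efficiency.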

By encapsulating each actor/resource communication in its own container, it can more easily be replaced, tested, and implemented in parallel. This also decouples the domain containers from the environment.

Interestingly, though, there is no dependency between these containers! At least in my world ;-) None knows of the others. Not even the domain knows about them. This makes my approach different from the onion architecture or clean architecture, I guess. They prescribe certain dependency orientations. But I don't. There are simply no dependencies :-) Unfortunately I can't elaborate on this further right here. Maybe some other time… To help you remember this strange approach, here is the dependency diagram for the containers:

There are no dependencies between the “workhorses”, but the integration knows them all. Such high efferent coupling is not dangerous, though. The integration does not contain any logic; it does not itself depend on the operations of the other containers. Integration is a special responsibility of its own.
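
To illustrate the “no dependencies” idea in code, here is a minimal Python sketch with hypothetical class names. Portal, providers and domain know nothing about each other; only the integration function wires them together, and it contains no logic of its own. The netting in the domain class is a placeholder, since how risks are actually calculated is unknown.

```python
from __future__ import annotations


class TdsProvider:
    """Provider: hides how trades are read from the TDS (stubbed with fixed data)."""

    def read_trades(self) -> list[dict]:
        return [
            {"counterparty": "C7", "value_usd": 1500.0},
            {"counterparty": "C7", "value_usd": -200.0},
            {"counterparty": "C9", "value_usd": 320.0},
        ]


class RiskCalculation:
    """Domain: knows nothing about files, mail, or schedulers."""

    def calculate(self, trades: list[dict]) -> dict[str, float]:
        # Placeholder logic: net value per counterparty, not a real risk measure.
        exposure: dict[str, float] = {}
        for trade in trades:
            cp = trade["counterparty"]
            exposure[cp] = exposure.get(cp, 0.0) + trade["value_usd"]
        return exposure


class ReportProvider:
    """Provider: hides where reports are stored (kept in memory here)."""

    def __init__(self) -> None:
        self.stored = []

    def store(self, report: dict[str, float]) -> None:
        self.stored.append(report)


class SchedulerPortal:
    """Portal: turns the scheduler's trigger into a call the system understands."""

    def __init__(self, on_run) -> None:
        self.on_run = on_run

    def trigger(self) -> None:
        self.on_run()


def integrate(tds: TdsProvider, domain: RiskCalculation, reports: ReportProvider) -> SchedulerPortal:
    """Integration: the only unit that knows all the others; it just wires outputs to inputs."""

    def run() -> None:
        reports.store(domain.calculate(tds.read_trades()))

    return SchedulerPortal(on_run=run)


reports = ReportProvider()
portal = integrate(TdsProvider(), RiskCalculation(), reports)
portal.trigger()
print(reports.stored)  # [{'C7': 1300.0, 'C9': 320.0}]
```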

Although I know a minimal set of containers, I don't know much about their contracts. Each encapsulates some API through which it communicates with a resource/actor. But how each container offers its service to its environment is not clear yet. I could speculate about it, but most likely that would violate the YAGNI principle.

There is a way, however, to learn more about those contracts – and maybe find more containers…

Zooming in – Design for functionality

So far I've identified quite a few structural elements. But how does this all work together? Functionality is the first requirement that needs to be fulfilled – although it's not the most important one [2].

I switch dimensions once again in order to answer this question. Now it's the flow dimension I'm focusing on. How does data flow through the software system and get processed?

On page 2, section 2, the requirements document gives some hints. I would translate them into a flow design like this. It's a diagram of what's going on in the calculation app process [3]:

Each shape is a functional unit that does something [4]. The rectangle at the top left is the starting point. It represents the portal. That's where the actor “pours in” its request. The portal transforms it into something understandable within the system. The request flows to Import, which produces the data that Calculate then transforms into a risk report.

That's the overall data flow. But it's too coarse-grained to be implemented. So I refine it:

Zooming into Import reveals two separate import steps – which could run in parallel – plus a Join producing the final output.

Zooming into Calculate reveals several processing steps. First, the input from Import is transformed into risk data. Then the risk data is transformed into the actual risk report, which the business users are then informed about. Finally, the TDS/RDS input data (as well as the risk report) gets archived.
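
The refined flow can be sketched as plain functions. This is an illustration under assumptions, not a design decision: the two imports run in parallel via concurrent.futures.ThreadPoolExecutor, the join enriches trades with counterparty names, and the risk step is a placeholder. All function names and data shapes are hypothetical.

```python
from __future__ import annotations

from concurrent.futures import ThreadPoolExecutor


def import_tds() -> list[dict]:
    # Stand-in for reading the daily TDS file.
    return [
        {"trade_id": "T1", "counterparty": "C7", "value_usd": 1500.0},
        {"trade_id": "T2", "counterparty": "C9", "value_usd": -320.0},
    ]


def import_rds() -> dict[str, dict]:
    # Stand-in for reading the daily RDS file, keyed by counterparty id.
    return {"C7": {"name": "Acme"}, "C9": {"name": "Globex"}}


def join(trades: list[dict], reference: dict[str, dict]) -> list[dict]:
    # Join: enrich every trade with its counterparty's reference data.
    return [{**t, "counterparty_name": reference[t["counterparty"]]["name"]} for t in trades]


def import_all() -> list[dict]:
    # Import: run both import steps in parallel, then join their results.
    with ThreadPoolExecutor(max_workers=2) as pool:
        trades = pool.submit(import_tds)
        reference = pool.submit(import_rds)
        return join(trades.result(), reference.result())


def to_risk_data(rows: list[dict]) -> list[tuple[str, float]]:
    # Placeholder for the unknown risk calculation.
    return [(r["counterparty_name"], abs(r["value_usd"])) for r in rows]


def generate_report(risk_data: list[tuple[str, float]]) -> str:
    return "\n".join(f"{name},{risk}" for name, risk in risk_data)


def calculate(rows: list[dict], notify, archive) -> str:
    # Calculate: risk data -> report -> notify users -> archive inputs and report.
    report = generate_report(to_risk_data(rows))
    notify(report)
    archive(rows, report)
    return report


sent, archived = [], []
report = calculate(import_all(), notify=sent.append,
                   archive=lambda rows, rep: archived.append((len(rows), rep)))
print(report)
```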

The small triangles hint at some resource access within a processing step. Whether the functional unit itself would do that or whether it should be further refined, I won't ponder here. I just wanted to quickly show this final dimension of my design approach.

For Import TDS and Import RDS I guess I could derive some details about the respective container contracts. Both seem to need just one function, e.g. TDSRecord[] Import(string tdsFilename).
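
In Python the same contract could be expressed as a typing.Protocol. Since import is a keyword there, the sketch uses the hypothetical name import_records; TdsImport, FakeTdsImport and count_trades are made up as well. A fake implementation satisfies the contract structurally, which shows how a provider can be replaced in tests without touching the file system:

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class TDSRecord:
    trade_id: str  # hypothetical, trimmed-down fields
    counterparty_id: str


class TdsImport(Protocol):
    """Contract of the TDS provider: one function, file name in, records out."""

    def import_records(self, tds_filename: str) -> list[TDSRecord]:
        ...


class FakeTdsImport:
    """Test double; satisfies TdsImport structurally, without any file access."""

    def __init__(self, records: list[TDSRecord]) -> None:
        self.records = records
        self.requested: list[str] = []

    def import_records(self, tds_filename: str) -> list[TDSRecord]:
        self.requested.append(tds_filename)
        return list(self.records)


def count_trades(importer: TdsImport, tds_filename: str) -> int:
    # Depends only on the contract, so a real or a fake provider can be passed in.
    return len(importer.import_records(tds_filename))


fake = FakeTdsImport([TDSRecord("T1", "C7"), TDSRecord("T2", "C9")])
print(count_trades(fake, "tds-2014-03-14.xml"))  # 2
```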

The other functional units hint at some more containers to consider. Report generation (as opposed to report storage) looks like a different responsibility than calculating risks, for example. Also, Import and Calculate have a special responsibility: integration. They see to it that the functional units form an appropriate flow.

The domain thus is decomposed into at least three containers:

integration

calculation

report generation

Each responsibility warrants its own class, I'd say. Replacing the single domain container counted before with these three makes 12-1+3=14 containers for the calculation application.

Do you see how those containers are a matter of abstraction? I did not start out with them; rather I discovered them by analyzing the functional structure, the processing flow.

Retrospective

So much for my architectural design monologue ;-) A monologue it had to be, since the customer was not at hand to answer my many questions. Nevertheless, I hope you got an impression of my approach to software design. The steps would not have been different if a customer had been available. However, the resulting structure might look different.

Result #1: Make sure the customer is close by for questions when you start your design.

I did not find the exercise topic itself particularly challenging. The interesting part was missing ;-) There is no information on what “calculating risks” means. But what became clear to me once more was:

Result #2: Infrastructure is nasty

There are so many risks lurking in infrastructure technologies and security constraints and deployment and monitoring. Therefore it's even more important to isolate those aspects in the design. That's what portals and providers are for.

Writing all this up took me a couple of hours. But the design itself was maybe only half an hour of effort. So I would not call it “big design up-front” :-)

Nonetheless, I find it very informative. I would not start coding with less. Now I can talk about focus and priorities with the customer. Now I can split up work between developers. (Ok, some more discussion and design would be needed to make the contracts of the containers clearer.)

And what about the data model? What about the domain model? You might be missing the obligatory class diagram linking data heavy aggregates and entities together.

Well, I don't see much value in that in this case – at least from the information given in the requirements. The domain consists of importing data, calculating risks and generating a report. That's the core of the software system. And that's represented in all diagrams: the host diagram shows a process which does exactly this, the container diagram shows a domain responsible for this, and the flow diagram shows how several domain containers play together to form this core.

Result #3: A domain model is just a tool, not a goal.

All in all, this exercise went well, I'd say. Not only did I use flows to design part of the system, I also felt my work flowing nicely. There was no moment of uncertainty about how to move on.

Endnotes

[1] Arguably the non-human actors in this scenario don't really need the software system. But as you'll see, it helps to put them into the picture as agents that cause reactions of the software system.

[2] The most important requirements are the primary qualities. It's for them that software gets commissioned. Most often that's performance, scalability, and usability: some operations should be executed faster, under more load, and more easily with software than without it. But of course, before some operation can become faster, it needs to be present first.

[3] If this reminds you of UML activity diagrams, that's ok ;-)

[4] Please note my use of symbols for relationships. I used lines with dots at the end to denote dependencies. In UML, arrows are used for that purpose. However, I reserve arrows for data flow. That's what they are best at: showing from where to where data flows.