2016-03-02

Introduction
SOA, or Service Oriented Architecture, has been one of the buzzwords among architects, senior developers and job descriptions for the last few years. However, most definitions of SOA online are riddled with formal words, such as this one from OASIS: "A paradigm for organizing and utilizing distributed capabilities that may be under the control of different ownership domains. It provides a uniform means to offer, discover, interact with and use capabilities to produce desired effects consistent with measurable preconditions and expectations."

Though the above definition is precise, it is too abstract for a developer. This post tries to explain what constitutes a [micro-]service oriented architecture and how it differs from the traditional monolithic approach a programmer may be accustomed to. It is introductory material aimed at beginners to SOA who have already done some monolithic projects. Experts in SOA could validate the facts mentioned and suggest alternatives.

Microservices

A very informal way to understand microservices is this: if we split every class in our design into an HTTP-accessible webservice of its own, we end up with a bunch of services which together constitute a microservices-based architecture. The difference between SOA and microservices is just the level of granularity to which you decompose the classes of a monolithic application into independent HTTP services. The more minimal each service's functionality is, the closer it comes to being called a microservice.

Splitting a single application into multiple services imposes a few restrictions on our coding, but in turn gives us a lot of flexibility and power in scaling. Let us look at some of the coding/design constraints.

Stateless Systems

The fundamental difference from a monolithic design is in how state is maintained. Individual classes that earlier interacted via global variables (locks, mutexes, config variables, etc.) can no longer rely on them.

Let us take a simple example: we are building a rudimentary Shopping application that stores just one type of item. The application has two parts, an Inventory part that adds new items and a Sales part that removes them. Let us consider the following pseudo-code:
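(The original snippet is not preserved here; below is a minimal Go sketch consistent with the description in the next paragraph. The names stockItemCount, mu, AddToStock and UpdateStock come from that description; everything else is illustrative.)

```go
package main

import (
	"fmt"
	"sync"
)

// Global state shared by both classes/types.
var (
	stockItemCount int
	mu             sync.Mutex
)

type Inventory struct{}

// AddToStock adds newly arrived items to the global count.
func (i *Inventory) AddToStock(n int) {
	mu.Lock()
	defer mu.Unlock()
	stockItemCount += n
}

type Sales struct{}

// UpdateStock removes sold items from the global count.
func (s *Sales) UpdateStock(n int) {
	mu.Lock()
	defer mu.Unlock()
	stockItemCount -= n
}

func main() {
	inv, sales := &Inventory{}, &Sales{}
	inv.AddToStock(10)
	sales.UpdateStock(3)
	fmt.Println(stockItemCount) // 7
}
```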

In the above code snippet (trivialized for brevity), we have a global variable stockItemCount, which is protected by a mutex mu. The AddToStock function of the Inventory class/type adds to this global variable, whereas the UpdateStock function of the Sales class/type subtracts from it. The mu lock synchronizes access, so that each function has exclusive access to the global variable while executing.

In a SOA, the Inventory and Sales classes become their own individual HTTP webservices. These new services, viz. SalesService and InventoryService, may now run on different machines.

Inter-Service Co-ordination
So how do these different services, potentially running on different machines, share and synchronize access to common data? The solution is simple: we move away from the global-variable-plus-mutex pattern and implement a publish-subscribe or queueing pattern. What does that mean?

We move the stockItemCount management into a separate StockService, which is accessible to both the InventoryService and the SalesService (the classes/types considered earlier). Let us take a look at a sample pseudo-code:
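(Again as a Go sketch, following the description in the next paragraph. A buffered channel stands in for the distributed queue Q, which in reality would be a separate piece of infrastructure such as Kafka or SQS.)

```go
package main

import "fmt"

// Operation is the message both services put on the queue.
type Operation struct {
	Type string // "Add" or "Remove"
}

// Q stands in for a distributed queue infrastructure; a buffered
// channel keeps this sketch self-contained and runnable.
var Q = make(chan Operation, 100)

type InventoryService struct{}

// AddToStock publishes an "Add" operation instead of touching shared state.
func (i *InventoryService) AddToStock() {
	Q <- Operation{Type: "Add"}
}

type SalesService struct{}

// UpdateStock publishes a "Remove" operation instead of touching shared state.
func (s *SalesService) UpdateStock() {
	Q <- Operation{Type: "Remove"}
}

// StockService is the only stateful component: it owns the count.
type StockService struct {
	count int
}

// ProcessQ loops over the queue, applying each operation to the count.
func (st *StockService) ProcessQ() {
	for op := range Q {
		switch op.Type {
		case "Add":
			st.count++
		case "Remove":
			st.count--
		}
	}
}

func main() {
	inv, sales, stock := &InventoryService{}, &SalesService{}, &StockService{}
	inv.AddToStock()
	inv.AddToStock()
	sales.UpdateStock()
	close(Q) // let ProcessQ's loop terminate for this demo
	stock.ProcessQ()
	fmt.Println(stock.count) // 1
}
```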

As seen above, the two classes are converted into services (SalesService and InventoryService), and a new third service named StockService is added. We also use a Q (a distributed queue infrastructure). We have an Operation class/type with a Type string, instances of which we add to the Q. The AddToStock function of the InventoryService puts a new Operation of type "Add" on the queue, whereas the UpdateStock function of the SalesService puts one of type "Remove". The StockService has a ProcessQ function that loops forever, fetching items from the Q and, based on the Type of the operation, either adding to or subtracting from the value.

It should be clear now that the SalesService and the InventoryService are totally stateless. They just use the Q to communicate with the StockService. The meticulous among the readers will have observed that the StockService is still stateful: we still maintain the count as a global variable. In any large-scale system there may be some components that cannot be made completely stateless, and such stateful parts come with drawbacks, which we will discuss in a later section.

The Q forms a very central part of the above architecture. The Q could be implemented by the programmer manually, and deployed on a totally different set of machines from those running the three services. However, there are stable queue implementations that we could use instead of reinventing the wheel. Apache Kafka and RabbitMQ are popular open-source systems. If you want a hosted solution, Amazon SQS is offered by AWS and Cloud Pub/Sub by Google. These systems could be called messaging middleware.

There are projects where a massive amount of data is generated (say from sensors instead of humans) and realtime processing of streaming data is needed. For those we could use specialized streaming middleware such as Apache Storm, or a hosted solution such as Amazon Kinesis.

Benefits of SOA
As we just saw, what was a simple single process with two classes became three different services and a queueing system, with four different processes across four (or more) different machines, to accommodate SOA. Why should a programmer put up with so much complexity? What do we get in return? We will see some of the benefits in this section.

Horizontal Scalability
Say we have a server with 4 GB RAM serving 100k requests per second for our Shopping site, and due to an upcoming holiday season we estimate an increase in visitor count that will require us to serve 400k parallel requests per second. We could do one of two things. (1) Buy more expensive hardware, say a 16 GB RAM machine, move our site to this bigger machine for the holiday season, and move back to the old system later. (2) Launch three more 4 GB RAM machines and spread the increased load across them. The former is called Vertical Scaling and the latter Horizontal Scaling.

Vertical scaling is appealing for small workloads but becomes costly, as we have to provision huge machines. Even if you rent high-end VMs in the cloud, the pricing is not too friendly. Horizontal scaling is cheaper on your wallet, provides more throughput and allows for more dynamism.

Auto-Scaling
In our Shopping application, the SalesService and InventoryService are stateless, so we can horizontally scale them individually. For example, we could launch three new instances of the SalesService to handle holiday traffic while keeping a single machine for the InventoryService. This kind of flexibility would not have been possible with our earlier monolithic design. Note, however, that the StockService is stateful and so cannot be horizontally scaled; this is the drawback of having stateful components in your architecture.

Once we know that a system can be horizontally scaled, the next logical progression is to make the scaling automatic. Systems like AWS Elastic Beanstalk and, to a certain extent, Google App Engine (with vendor lock-in) allow your application to scale horizontally by launching new instances automatically whenever demand is higher. The new instances are automatically shut down when the burst of traffic subsides. This removes huge IT administration overheads. We can have such nice features only because our application architecture is composed of stateless services.

Serverless Systems
The next step in the evolution of auto-scaling is code that automatically decides how many servers it should run on, without us having to provision anything. To quote Dr. Werner Vogels, CTO of Amazon: "No server is easier to manage than no server." We are clearly moving in this direction with serverless webapps. AWS Lambda brings this functional-programming dream to life. Google is not far behind and has recently launched Cloud Functions (though not as rich as AWS Lambda yet, imho). There are already frameworks for building entire suites of applications without servers, using these services.

Polyglot Development
As we deploy each service independently, we can use different programming languages, frameworks and technologies for each service. For example, a CPU-intensive service could be written in a performant language like Go, while the front-end code is written in parallel in React or Node.js.

Mobile First
Since we have developed proper HTTP APIs for our application, any mobile client can use our webservices in addition to the webclient. These days many companies start with a mobile-first or mobile-only strategy and do not require a webclient at all. Some pro-monolith engineers argue that the first iteration of development should be monolithic, re-engineering for SOA at a later stage, since development speed is faster in a monolithic design. Personally, I disagree. If we start with SOA in mind from scratch, with our modern-day development stack, we can plumb existing pieces together instead of reinventing the wheel and finish projects faster. There are frameworks and techniques to auto-generate a lot of code once the APIs are finalized. Having built web applications from scratch both as monoliths and as SOA, I have been happier with the SOA code every time; YMMV.

Auxiliary Parts
If we are building a SOA-based system, we need a lot more auxiliary support systems. Without them, it will be very difficult to measure, debug or optimize. Different companies implement different subsets of the parts below, based on their business needs and deadlines.

Performance Metrics
The most important auxiliary aspect of SOA is having precise performance metrics for each of the services. SOA without performance metrics is as ineffective as trying to do bodybuilding or weight loss without watching what we eat. Without measurement we cannot rate-limit requests, prevent DoS attacks or understand the health of a service. The measurement can be done in two ways: (1) measure performance and show metrics through realtime event monitoring; (2) log various events, errors, response times, etc., then aggregate and batch-process these logs later to understand the health of the various components. Any large-scale system will need a combination of both approaches.

Luckily there are plenty of tools, services and libraries available for this. AWS API Gateway is perhaps the easiest way to register your APIs and monitor the endpoints. However, we may need more fine-grained measurements too (how long the calls to the database take, which user is causing the most load, at what times the load is high, etc.). There are various tools we could use, such as StatsD, Ganglia and Nagios, and various companies offer hosted solutions too, such as Sematext, SignalFx and New Relic.

Distributed Tracing
Tracing is a concept supplementary to metrics and performance measurement. When a new request comes to a service, it may in turn use 3-4 other services to serve the original request, and those services may in turn call 3-4 others. Tracing helps us find out, on a per-request basis, the map of which services were used to serve it, how long it took at each point, where the request got stuck if it could not be serviced, and so on.

We can achieve tracing by assigning a unique id / context object to each new incoming request in the outermost API that receives it, and passing it along with every further API request until the final response is finished. This context could be passed as a parameter in the webservice calls. The monitoring of the tracing events can again be realtime, or deduced from log aggregation.

Dapper is a paper released by Google summarizing how tracing is done at Google. Twitter has released Zipkin, a FOSS implementation of the Dapper paper, which it runs in production.

Pagination
Assume we expose an API in our StockService to list all the items we have, along with their RetailPrice. If we have, say, a billion products, the response will be huge, and so will the server-side resources needed to build it: fetching a billion items from the database will thrash the caches, clog the network, and so on. To avoid all this, any API that could potentially list a lot of items should paginate its response by a page number, i.e., the call should take a page number as a parameter and return only M items per page. The value of M can be decided based on the size of each item in the response. Optionally, we can also accept the number of results the user wants as an HTTP parameter.

For example:

http://127.0.0.1:8080/posts/label/tech  - Returns the first 10 blog posts with label "tech"

http://127.0.0.1:8080/posts/label/tech/1 - Same as above

http://127.0.0.1:8080/posts/label/tech/2 - Returns blog posts 11 -> 20 with label "tech"

http://127.0.0.1:8080/posts/label/tech/?limit=5 - Returns the first 5 blog posts with label "tech"

http://127.0.0.1:8080/posts/label/tech/?start=15&limit=5 - Returns blog posts 15 -> 19 with label "tech"
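A Go sketch of such a paginated endpoint, with hypothetical defaults of start 1 and 10 items per page:

```go
package main

import (
	"fmt"
	"net/http"
	"strconv"
)

// paginate returns one page of items, given optional start (1-based)
// and limit parameters, defaulting to the first 10 items.
func paginate(items []string, startParam, limitParam string) []string {
	start, limit := 1, 10
	if n, err := strconv.Atoi(startParam); err == nil && n >= 1 {
		start = n
	}
	if n, err := strconv.Atoi(limitParam); err == nil && n >= 1 {
		limit = n
	}
	lo := start - 1
	if lo >= len(items) {
		return nil // page past the end of the data
	}
	hi := lo + limit
	if hi > len(items) {
		hi = len(items)
	}
	return items[lo:hi]
}

func main() {
	posts := make([]string, 100)
	for i := range posts {
		posts[i] = fmt.Sprintf("post-%d", i+1)
	}
	http.HandleFunc("/posts/label/tech/", func(w http.ResponseWriter, r *http.Request) {
		q := r.URL.Query()
		fmt.Fprintln(w, paginate(posts, q.Get("start"), q.Get("limit")))
	})
	http.ListenAndServe(":8080", nil)
}
```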

API Versioning
If software never changed, we software engineers would be out of jobs, so it is good that software evolves. However, we need contracts/APIs so that changes happen smoothly and do not bring down the entire ecosystem. Once we have exposed an API outside our developer team, it is wiser never to change its request/response parameters.

In our StockService example (that we discussed a few paragraphs ago), we could have the following API:

http://stockservice/items/ - Returns all the items.

Later, someone figures out that it is not wise to always return all the items and decides to change the behavior to return only the first 10. This change will break all existing clients, which will now assume there are only 10 items in total, while in reality a billion more items may be waiting to be paginated.

The easiest way to regulate API changes is to add versions to the APIs. For example, if the original API had a version param, we could just increment it:

http://stockservice/V1/items/ - Returns all the items
http://stockservice/V2/items - Returns the top 10 items

The version need not always be part of the URL; we could also take it as an extra HTTP header instead of creating a new URL endpoint. It is a matter of taste, and each approach has its own pros and cons.
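A Go sketch of serving both versions side by side (the handler functions and sample inventory are made up for illustration):

```go
package main

import (
	"fmt"
	"net/http"
)

// itemsV1 keeps the original contract: every item, forever.
func itemsV1(items []string) []string { return items }

// itemsV2 introduces the new behavior (top 10 only) without
// breaking clients that still call V1.
func itemsV2(items []string) []string {
	if len(items) > 10 {
		return items[:10]
	}
	return items
}

func main() {
	items := []string{"pen", "pencil", "eraser"} // sample inventory
	http.HandleFunc("/V1/items/", func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprintln(w, itemsV1(items))
	})
	http.HandleFunc("/V2/items/", func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprintln(w, itemsV2(items))
	})
	http.ListenAndServe(":8080", nil)
}
```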

Circuit Breaking
Once we have multiple components in a system, there is a high chance that some part of it will occasionally be down, say for updates. When that happens, a service that knows its dependency is failing can choose to wait for some time before retrying, instead of hammering it with attempts that are bound to fail. Martin Fowler has written about this CircuitBreaker pattern in detail, which is a good read.

Service Discovery

In a large-scale system architected with a microservices-based design, you will have plenty of services. Each of these services may want to know the location (URL, IP address + port, etc.) of the services it depends on, so we need some kind of centralized service registry where all this information is stored and maintained.

The easiest and probably most widely used way to locate services is through DNS. However, there are plenty of other tools available for this purpose. ZooKeeper from Apache and etcd from CoreOS are strongly consistent, distributed datastores that can be used for service discovery. Consul from HashiCorp and Eureka from Netflix are dedicated service-discovery software. All of the above are FOSS projects. If your application has fewer than a dozen services, it may make sense to just read from a file shared across the services instead of deploying a complex suite of software; but keep in mind that this won't scale as you grow, so it is better to start with good practices from the beginning.
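The shared-file approach can be sketched in Go as follows (the one-service-per-line file format here is made up for illustration; a registry like Consul or DNS would replace it as you grow):

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// parseRegistry reads a simple shared-file format, one service per
// line: "servicename addr1 addr2 ...", and returns a map from
// service name to its current addresses.
func parseRegistry(contents string) map[string][]string {
	reg := map[string][]string{}
	sc := bufio.NewScanner(strings.NewReader(contents))
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		if len(fields) < 2 {
			continue // skip blank or malformed lines
		}
		reg[fields[0]] = fields[1:]
	}
	return reg
}

func main() {
	// In practice this string would be read from the shared file.
	reg := parseRegistry("stockservice 10.0.0.5:8080 10.0.0.6:8080\nsalesservice 10.0.0.7:9090\n")
	fmt.Println(reg["stockservice"]) // [10.0.0.5:8080 10.0.0.6:8080]
}
```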

SDKs
A new TCP connection takes time to establish because of the initial handshake, so it would be foolish not to reuse connections. HTTP calls also inherently need retries before giving up when things fail. And some programmers simply do not like writing HTTP client code. For all these reasons, it is often recommended to release SDKs for the APIs we publish, so that programmers can consume them easily. For example, a Python programmer can merely import our SDK's classes to add an item to our StockService, instead of writing HTTP retry code.
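A Go sketch of what such an SDK call might hide from its user: connection reuse via a shared client, plus a simple retry loop with backoff. The endpoint name and retry policy are illustrative, not a real SDK.

```go
package main

import (
	"fmt"
	"net/http"
	"time"
)

// client is shared so TCP connections are kept alive and reused
// across calls, instead of paying the handshake cost every time.
var client = &http.Client{Timeout: 5 * time.Second}

// AddItem is the kind of function an SDK would expose: the HTTP
// plumbing and retry logic are hidden behind one friendly call.
func AddItem(baseURL string) error {
	var lastErr error
	for attempt := 0; attempt < 3; attempt++ {
		resp, err := client.Post(baseURL+"/V1/items/", "application/json", nil)
		if err == nil && resp.StatusCode < 500 {
			resp.Body.Close()
			return nil
		}
		if err == nil {
			resp.Body.Close()
			lastErr = fmt.Errorf("server error: %d", resp.StatusCode)
		} else {
			lastErr = err
		}
		// linear backoff between attempts: 100ms, 200ms, ...
		time.Sleep(time.Duration(attempt+1) * 100 * time.Millisecond)
	}
	return lastErr
}

func main() {
	if err := AddItem("http://stockservice"); err != nil {
		fmt.Println("giving up:", err)
	}
}
```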

In the past we had technologies like DCOM, CORBA and RMI that aimed at distributed computing within walled gardens of technology. They lost market share to the simplicity of REST services, where HTTP verbs (GET, PUT, POST, DELETE) perform remote operations without the need for complex, mostly platform-specific stubs and skeletons.

There is a middle ground where the best of both worlds can be had. The most notable framework here is gRPC, an open-source project started by Google and adopted by many companies (most recently CoreOS), which provides a web API for which client SDK generation is also made easy. It supports HTTP/2 as well. If I were starting a new project today, I would give it serious thought.

Further Information

A very good read on the need for SOA is Steve Yegge's Platforms rant.

Read the tech blogs of companies that are moving to SOA (not just those that have already moved).

Talk to engineers from Netflix or Amazon Web Services if you know someone. Sadly, neither company has a presence in India (as of 2016), even though both services are available there.

Follow the Netflix techblog: http://techblog.netflix.com/

Watch AWS re:Invent videos and, if you have a chance, attend the event (instead of events like Google I/O, which are more business driven).

Other Notes:

If you like this post, share it with your friends.

Please send any comments/feedback regarding the language or content. I am planning to use this as teaching material for a one-hour talk in a college shortly. Should some other topics be covered?

All opinions expressed are purely personal.
