We have had a few posts in the past about the coding philosophy here at MetaBroadcast (e.g. minimalist Java). Like most companies, we have certain ways we prefer to do things, and those tend to influence the design and architectural choices we make. One of the “tenets”, if you will, of that philosophy is that rather than trying to use one-size-fits-all solutions, we favour narrowing and separating our various use cases and then using the right tool for each of them.
This is the philosophy that led us to introduce tools like Apache Cassandra and ElasticSearch, to mention two examples, in order to better fit our architecture to the types of queries we need to tackle. It is not, however, a decision to be taken lightly. Adding new tools to our stack adds complexity to our architecture and means we have more to support and maintain. On the flip side, it also saves us from the square-peg-in-a-round-hole issues that arise when a tool is shoehorned into a use case it was not designed for. In the end, as always in engineering, there is a tradeoff to be made.
In the case of graph databases, this decision-making process started around 2008 with a suggestion by @glen_ford to use Neo4j. Since then we have had to satisfy ourselves that Neo4j would cover a real need, that it would do so better than our existing tools, and that it is mature enough for production use and can deliver the levels of performance we need. Many tests were written and a lot of documentation was read before we could give Neo4j the go-ahead and this blog post could be written.
node_a -[:is_related_to]-> node_b
One of the sources of complexity when dealing with our data is the relationships between various pieces of content. A piece of content could be an episode, which belongs to a series, which in turn belongs to a brand. The content could have multiple broadcasts, and there could also be pieces of content from other providers that it is equivalent to. Once we want to execute arbitrary queries across these relationships, it is easy to see how the complexity of dealing with this data structure can increase quite quickly.
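To make those relationships concrete, here is a minimal sketch of how such content might be created as a graph in Cypher. The labels (Brand, Series, Episode, Broadcast), the relationship types (BELONGS_TO, HAS_BROADCAST, EQUIVALENT_TO) and every property and value are hypothetical names chosen for illustration, not our actual schema.

```cypher
// Illustrative only: labels, relationship types, properties and values
// are assumptions, not the real MetaBroadcast data model.

// The content hierarchy: an episode, its series and its brand.
CREATE (brand:Brand {title: 'Example Brand'})
CREATE (series:Series {title: 'Series 1'})
CREATE (episode:Episode {id: 'ep-1', title: 'Episode 1'})

// The same episode as described by another (hypothetical) provider.
CREATE (other:Episode {id: 'other-ep-1', title: 'Episode 1'})

// One broadcast of the episode.
CREATE (broadcast:Broadcast {channel: 'Example Channel'})

// The relationships that tie the pieces of content together.
CREATE (series)-[:BELONGS_TO]->(brand)
CREATE (episode)-[:BELONGS_TO]->(series)
CREATE (episode)-[:HAS_BROADCAST]->(broadcast)
CREATE (episode)-[:EQUIVALENT_TO]->(other)
```

Each piece of content becomes a node and each relationship becomes a typed, directed edge, so a question such as "which brand does this episode belong to" turns into following edges rather than joining tables.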
This data, however, can be much easier to reason about if it is represented in the form of a graph. So, in keeping with the “right tool for the job” philosophy, we have been experimenting with Neo4j to represent this data and to allow us to express arbitrary queries against it.
To give you an example of using the Neo4j Cypher language I’ll write a query to find a single episode, the series and brand it belongs to as well as any broadcasts it may have.
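A minimal sketch of such a query could look something like the following. The labels, relationship types and the id property are hypothetical names used for illustration only, so treat this as one possible shape for the query rather than the exact one we run.

```cypher
// Illustrative only: Episode, Series, Brand, Broadcast, BELONGS_TO,
// HAS_BROADCAST and the id value 'ep-1' are assumed names.

// Find one episode and walk up to its series and brand.
MATCH (episode:Episode {id: 'ep-1'})-[:BELONGS_TO]->(series:Series)-[:BELONGS_TO]->(brand:Brand)

// Broadcasts are optional: the episode is still returned if it has none.
OPTIONAL MATCH (episode)-[:HAS_BROADCAST]->(broadcast:Broadcast)

// collect() gathers the broadcasts into a single list per episode
// (an empty list when there are none).
RETURN episode, series, brand, collect(broadcast) AS broadcasts
```

OPTIONAL MATCH is used for the broadcasts because the episode "may" have them; a plain MATCH would drop any episode that has never been broadcast.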
This would return a graph like the following:
This is just a trivial example, but the Cypher language allows for queries to be easily expanded to cover more sophisticated and complicated use cases.
That is all for this week. This was a short introduction to our use of Neo4j. We will continue this series with more in-depth posts about some of the more interesting, fun, and challenging things we do with it.