Mikesmithers.wordpress.com

The Anti-Pattern – EAV(il) Database Design ?

2013-12-23

Early evening TV in our house is Soap time. Deb annexes the remote control, after which we are treated to an

assortment of angry women being angry with each other in a variety of accents originating from the North of England.

It could be worse, I suppose. We could be subjected to the offering on the other main channel ( angry London women being angry at each other in accents originating from the South East of England).

Then again, either is preferrable to an angry Welsh woman being angry at you in a Welsh accent.

Ok then, how do you make a database professional hot under the collar ? Mention the EAV design pattern.

This pattern goes by many names, most commonly :

EAV – Entity-Attribute-Value

OTLT – One True Lookup Table

Open Schema

Diabolically Enticing Method Of Data Storage (DEMONS)

OK. I made that last one up.

It is with some trepidation ( and having donned precautionary flame-proof underpants) that I am embarking on an exploration on the nature of EAV and whether it can ever be appropriate for use in a Database. Before we go any further though, I’d like to take a moment to clarify exactly what the term “database” means in the context of this discussion

NoSQL, No comment

We’re talking here about Relational Database Management Systems (RDBMS).

There are a number of NoSQL databases out there, which are another matter entirely.

The discussion here pertains to RDBMSs only ( and Oracle in particular).

Therefore, the most relevant comparison is between EAV and the traditional Relational Design approach.

Now that’s out of the way, it’s time to introduce the protagonists….

In the Red Corner…Relational Design

Supplied with the Oracle database is the example HR schema.

This has been designed, as you might expect, using the relational approach. The Physical Data Model looks like this :

A table relationship diagram, just like Mother used to make

Entities such as Employees and Departments are implemented as discreet physical tables. Attributes of these entities are implemented as columns on those tables.

Primary Keys for each entity are identified ( sometimes implementing a synthetic key for convenience).

Relationships between entities are implemented as Referrential Integrity Constraints.

This structure is easy to follow. Well, it’s easy to follow when done properly.The TRD provides a map as to how the data in the application hangs together.

In the Blue Corner…The EAV Pattern

The EAV pattern dictates a rather simpler approach to database design. Rather than having a discreet table for each entity and it’s attributes, it holds everything together in what amounts to a name/value pair.

The resulting data structure, as the name of the pattern suggests, holds details of the entity, the attribute of that entity, and the value of that attribute.

It will also hold a unique key to identify the instance of the entity to which the value relates.

There are variations on this particular theme. EAV can be implemented with a catalogue of entities and attributes and used to define them to a greater or lesser extent. Separate tables can be implemented to account for different value datatypes. It can even be implemented for specific groups of Entities within an application.

For illustrative purposes, I’m going to use a fairly simple three table model containing :

ENTITIES table – a list of valid entities for the application

ATTRIBUTES table – a list of valid attributes for a given entity

ATTRIBUTE_VALUES table – the values for each entity/attribute combination

The resulting DDL is :

So, we have a list of unique entities ( ENTITIES table). Attributes of those entities ( also unique within an entity) are stored in the ATTRIBUTES table.

For ease of reference, we’ve created a synthetic key on ATTRIBUTES (attr_id).

This is used as a Foreign Key in ATTRIBUTE_VALUES.

We could make the model more complex to allow values to be stored in their native datatype by either having an ATTRIBUTE_VALUES table for each type, or by having a value column in the table for each type. For the purposes of this discussion however, I’ve decided to keep it simple.

Adding Entities and Attributes in EAV

Unlike the Relational Model, if we want to add entities and attributes to our EAV model, we simply need to insert the requisite details :

Normally in Oracle, you’d use a sequence to generate the attr_id. Once again, I’ve left this out for brevity as it’s not relevant to the discussion at hand.

The comparative ease with which attributes can be added is what leads proponents of this pattern to characterise it as flexible.

Entering Data

Inserting records into the EAV implementation is a little different to the Relational one.

Adding a Department and a couple of employee records in the relational model would require a single insert statement for each record :

To save the same information in the EAV model would require something like :

Getting Data Out

As you’d expect from the name, retrieving an array of name-value pairs using EAV is reasonably straightforward :

Nothing too scary there. OK, what if we want to get the department name instead of the department id.

In a relational model, the query would be fairly trivial :

The EAV query, however, is more of a challenge :

Techniques employed to mitigate this complexity are typically either to create views (as in this example) to present the data in a more SQL-friendly manner, or to programatically generate queries on-the-fly.

Such code generation is perfectly possible, and the sophistication of the generated queries will depend on the meta-data being stored in the ENTITY and ATTRIBUTE tables.

Interestingly, some proponents of EAV see this issue as being a problem with SQL’s lack of flexibility rather than with the pattern itself.

We’ll come back to that in a bit.

The Attractions of EAV

Flexibility

As mentioned above, the main technical attraction of EAV is the flexibility it provides in terms of defining new attributes…and entities, come to that.

Not only are such changes an insert statement away, it also allows attribute and entity names of any reasonable length you desire. In Oracle 11g ( and since time immemorial), the maximum length of a table or column is 30 characters.

In the above example code, we have the facility to define entity and attribute names of up to 100 characters.

A less technical, but still pertinent point – in large corporations, changes to the data model tend to have to go through some exacting (and, in some cases, painfully beaurocratic) approval process. The EAV approach can be seen as a way of by-passing this tangle of red tape.

Simplicity

The simplicity of the data model ( in our case, create three tables, job done), means that modelling does not take a great deal of time.

Most of the world seems to be going Agile these days ( or at least, using Agile terms). Having that data model sticky move across the Sprint Board in a single sprint is a sight to warm the cockles of a Scrum Master’s heart.

In other contexts, it may be the Web Designer, who needs to develop a data-driven site, who sees the advantages of EAV. Minimal messing about with the database design so they can get on with what they do best.

Reusability

You can use this model to store more-or-less anything you want. You never have to design a database ever again.

I mean, let’s face it, what could possibly go wrong ?

The nature of Relational Databases

SQL

It is an eternal truth that the biggest factor affecting the performance of an application running on an RDBMS is the design of it’s data model.

SQL is intrinsic to an RDBMS in the same way that your brain is intrinsic to your body.

Whatever code you write to do anything with data in an RDBMS, at some level a SQL statement will be executed.

SQL has it’s roots in the same relational theory that gave rise to relational databases.

Furthermore, SQL is a declarative language. You tell it what you want it to do, it has to work out how best to do it.

You can help SQL become more efficient by giving the optimizer tools to help it.

These tools include indexes, histograms, even RI constraints.

These tools are, at best, of limited value in an EAV model.

Furthermore, distribution of data becomes somewhat problematic as most of it is stored in one (or few) tables.

Of course, this can be mitigated to an extent by means of partitioning. However, in Oracle, partitioning is an additional cost feature of the database Enterprise Edition, which is not what you would call cheap.

The Catalog

Part of the definition of a Relational Database is that it retains a catalog of all of the objects held within it.

In Oracle, this catalog is known as the Data Dictionary.

If you create an object, it’s automatically saved in tables in the Data Dictionary.

If you create a table, you can find it in the USER_TABLES data dictionary view. A table’s columns can be found in the USER_TAB_COLS data dictionary view.

To the developer, all of this meta-data is “for free”. No special code needs to be written to ensure that this meta data is stored, it just happens automatically.

The problem with EAV…

It’s at this point where we can spell out the catch. Or catches.

Relational Database Management Systems tend to operate most efficiently when application data is arranged in broadly relational structures.

An EAV approach is likely to run into a number of challenges.

Performance degradation over time

Once the amount of data in the application grows beyond a certain size, the retrieval and manipulation of that data is likely to become less and less efficient.

Applications based on the relational approach are also likely to hit this problem. The difference is that the tools available to deal with these issues are largely unavailable for the EAV model.

When tuning problem statements, the first task is usually to look at index usage. Should there be an index ? Is there an index which is not being used where it should ? etc.

Indexes are built on one or more columns of a table. You can’t add an index to an attribute in our attribute_values table.

Reporting

We’ve already looked at the challenges presented in getting reports out of the EAV design.

Whatever the original requirements an application has been built to, it’s quite likely that at some point, someone will ask for some reporting/analysis on the data in the application.

This is likely to prove painful both in terms of complexity and, once again, performance.

Data Integrigy

As I’ve already observed, data integrity checks that are intrinsic to a Relational Database when using the Relational Design approach are largely absent in the EAV model.

It is possible to implement them “manually” but this requires a fair amount of effort.

The sort of things we’re talking aobut here include :

datatype of input data

maximum allowable size of input data ( length of a VARCHAR, number of decimal places in a numeric value etc)

uniqueness of a record with in an entity( there’s only one Deb !)

parent-child relationships between attributes (e.g. an Employee must be a member of an existing Department)

All of that additional code will add complexity to your application.

Of course, there is a design decision to be made in terms of how much of this you implement in EAV.

It’s often the case that a trade-off occurs between effort and data integrity.

The upshot is that EAV applications can be susceptible to data corruption ( orphaned records, duplicate records etc).

The Learning Curve

As I’ve said before, a Table Relationship Diagram for a Relational data model is a bit like a map. Following the relations between tables makes it relatively easy to learn how the data in an application hangs together.

No such map is readily available with EAV. As a result, the learning curve for new developers or support analysts tends to be significantly steeper.

This issue is exacerbated where the application employs a significant amount of dynamic SQL.

Such code is much harder to understand and debug as, before anything else, you have to figure out the SQL statement that is actually executed.

EAV as a replacement for Relational Design

The EAV pattern looks very appealing at first glance.

At first glance is quite appropriate here. The upsides of EAV are usually most apparent during design and development.

The downsides don’t tend to manifest themselves until after the application has gone live.

There are some problem domains, such as online catalogs, where EAV is put forward as the best solution for holding volatile, sparsely populated attributes.

Based upon my own first-hand experience of supporting applications in which EAV has been employed, I would be inclined to treat such assertions with some scepticism.

I think that the key point to bear in mind is that no system is perfect out-of-the-box.

Whatever design pattern you employ, be it Relational or EAV, there will be gaps – stuff you just haven’t thought of or anticipated. This could be in the model itself, the reporting requirements, or the data and usage volumes.

In such circumstances EAV would appear to provide a much smaller margin of error. If you don’t get it spot-on first time, you’re in big trouble. Even if you do, the scope for the evolution of your application is far more limited by it’s design than would be the case for the Relational approach.

I’m not the only one to have had direct and painful experience of supporting an EAV designed application on a Releational Database. This particular horror story is an extreme example, but does encapsulate some of the problems commonly encountered with this pattern.

So, is there absolutely no place for EAV in a Relational Database ? Well, I wouldn’t say that….

The Thingy Table

One of the first ( and best) Oracle books I read, many years ago, was Ed Koch’s Oracle 7 – Complete Reference.

In the book, Koch made the point that sometimes, you have an item of data that just doesn’t relate to anything else.

To store this data, Koch advanced the concept of the “Junk Drawer” table.

An example of such a data item might be the Company Registration Number, which goes on the footer of all correspondence generated from the application.

Rather than hard-coding this value in every letter generation program, you could simply store it in a table.

Other instances where such a table would be useful might include values that can be looked up from stored program units.

Some companies have a shutdown period over the Christmas and New Year period. The dates of the shutdown vary from year to year. Storing the start and end dates for the shutdown in a table would enable any scheduled jobs over the period to reference these values and then simply terminate if they were running during the shutdown.

Of course, such items could just be declared in a PL/SQL package header. However, changing them would cause invalidations of any database stored program unit dependent on the package in question.

With the table approach, this is not an issue.

Such a table would probably look somewhat familiar to the EAV attribute_values table :

To be clear, this is not the same as taking an OTLT ( One True Lookup Table) approach where a number of entities, each with only a few rows, are lumped together in one big table.

This is more a home for these, rare, data items that don’t relate to anything else.

Care needs to be taken about what data is held in this table.

There is always a danger that inappropriate data items will be added and the table can quickly grow from a few application configuration values into several thousand rows.

When this happens, the advantage of having this table is lost and there is a high probability that all of the problems associated with EAV will manifest themselves.

Conclusion

EAV is one of those oh-so-tempting but oh-so-dangerous design patterns in the context of RDBMS.

Extreme caution ( and some serious analysis) needs to be done before implementing it in your database.

If you decide it really is the solution to all of your problems, then consideration needs to be given as to whether an RDBMS is what you should be using in the first place.

I’m in the good books (with Deb, at least) as I’ve volunteered to cook Christmas Dinner, and do the washing up afterwards.

Hopefully that take long enough to avoid the Soap Christmas Special extravaganza that’s otherwise coming my way.

Filed under: Oracle, PL/SQL, SQL Tagged: database design, eav pattern, in-line views, otlt, relational model, thingy table