2012-09-20

I am working on updated our websites product database. It’s built in MySQL but this is more of a general database design pattern question.

I’m planning on switching to a Supertype/Subtype pattern. Our current/previous DB was mainly a single table that had data on a single type of product. We’re looking at expanding our product offering to include dissimilar products.

This new draft design is like this:

The area that I question is the product attributes tables. The idea is that a product can have a list of given attributes like color: red, green, blue, etc or material: plastic, wood, chrome, aluminum, etc.

This list would be stored in a table and the PK for that attribute item will be used in the specific product table as a FK.

(Martin Fowler's book Patterns of Enterprise Application Architecture call this "Foreign Key Mapping")

This allows a website interface to pull the list of attributes for a given attribute type and spit it out in a drop-down select menu or some other UI element. This list can be considered an "authorized" list of attribute values.

The thing that bothers me about this is simply the number of joins that ends up happening when pulling a specific product. You have to join every product attribute table to the product so you can get that attributes fields. Commonly, that field might simply be nothing more than a string (varchar) for its name.

This design pattern ends up creating a large number of tables as well as you end up with a table for each attribute. One idea to conteract this would be to create something more of a “grab bag” table for all product attributes. Something like this:

This way, your table might look like this:

This could help table creep but it doesn’t cut down the number of joins and it feels a little wrong combining so many different types into a single table. But you would be able to get all the available “color” attributes pretty easily.

However, there might be an attribute that has more fields than just "name" such as a colors RGB value. This would require that specific attribute to possible have another table or to have a single field for name:value pair (which has it's own drawbacks and frowny-faces).

The last design pattern I can think of is storing the actual attribute value in the specific product table and not have an “attribute table” at all. Something like this:

Instead of a Foreign Key to another table, it would contain the actual value such as:

This would eliminate joins and prevent table creep (maybe?). However, this prevents having an “authorized list” of attributes. You could return all the currently entered values for a given field (ie: color) but this also eliminates the idea of having an “authorized list” of values for a given attribute.

To have that list, you would have to still create a “grab bag” attribute table or have multiple tables (table creep) for each attribute.

This creates the bigger drawback (and why I’ve never did this approach) of now having the product name in multiple locations.

If you have the color value of “red” in the “master attribute table” and also store it in the “product_[type]” table, an update to the “master” table will cause a potential data integrity issue if the application doesn’t update all the records with the old value in the “product_type” table as well.

So, after my long winded explanation and analysis of this scenario, my realization is that this can’t be an uncommon scenario and there might even a name for this type of situation.

My question is, is there a generally accepted/used solution to this design challenge? Is the potentially large number of joins acceptable if the tables are relatively small? Is storing the attribute name, instead of a attribute PK acceptable under some situation? Is there another solution I’m not thinking about?

Thanks for any help in advance!

Please let me know if this is a poor question and how I could improve it!

A few notes about this product database / application:

Products aren’t frequently updated/added/removed

Attributes aren’t frequently updated/added/removed

The table is most frequently queried for reading / returning information

Server side caching is enabled to cache the result of a given query/result

I plan on starting with just one product type and extending/adding others over time and will have potentially 10+ different types

Show more