2008-10-20

This idea that there is generality in the specific is of far-reaching importance.
— Douglas Hofstadter, Gödel, Escher, Bach

Note: Today's entry is a technical article: it isn't funny. At least not intentionally.

Update, Oct 20th 2008: I've added an Updates section, where I'll try to track significant responses, at least for a week or so. There are three entries so far.

Contents

Introduction

Three Great Schools of Software Modeling

Class Modeling

Relational Modeling

XML Modeling

Other schools

Finding the sweet spot

Property Modeling

Brains and Thoughts

Who uses the Properties Pattern?

Eclipse

JavaScript

Pushing it even further

The pattern takes shape...

Wyvern

Lisp

XML revisited

Bigtable

Properties Pattern high-level overview

Representations

Keys

Quoting

Missing keys

Data structures

Inheritance

The deletion problem

Read/write asymmetry

Read-only plists

Performance

Interning strings

Perfect hashing

Copy-on-read caching

Refactoring to fields

Refrigerator

REDACTED

Rolling your own

Transient properties

The deletion problem (remix)

Persistence

Query strategies

Backfills

Type systems

Toolkits

Problems

Further reading

New Updates

Final thoughts

Introduction

Today I thought I'd talk about a neat design pattern that doesn't seem to get much love: the Properties Pattern. In its fullest form it's also sometimes called the Prototype Pattern.

People use this pattern all over the place, and I'll give you a nice set of real-life examples in a little bit. It's a design pattern that's useful in every programming language, and as we'll see shortly, it's also pretty darn useful as a general-purpose persistence strategy.

But even though this pattern is near-universal, people don't talk about it very often. I think this is because while it's remarkably flexible and adaptable, the Properties Pattern has a reputation for not being "real" design or "real" modeling. In fact it's often viewed as a something of a shameful cheat, particularly by overly-zealous proponents of object-oriented design (in language domains) or relational design (in database domains.) These well-meaning folks tend to write off the Properties Pattern as "just name/value pairs" – a quick hack that will just get you into trouble down the road.

I hope to offer a different and richer perspective here. With luck, this article might even help begin the process making the Properties Pattern somewhat fashionable again. Time will tell.

Three Great Schools of Software Modeling

Before I tell you anything else about the Properties Pattern, let's review some of the most popular techniques we programmers have for modeling problems.

I should point out that none of these techniques is tied to "static typing" or "dynamic typing" per se. Each of these modeling techniques can be used with or without static checking. The modeling problem is orthogonal to static typing, so regardless of your feelings about static checking, you should recognize the intrinsic value in each of these techniques.

Class Modeling

You know all about this one. Class-based OO design is the 800-pound gorilla of domain modeling these days. Its appeal is that it's a natural match for the way we already model things in everyday life. It can take a little practice at first, but for most people class modeling quickly becomes second nature.

Although the industry loves OO design, it's not especially well liked as an academic topic. This is because OO design has no real mathematical foundation to support it — at least, not until someone comes along and creates a formal model for side effects. The concepts of OOP stem not from mathematics but from fuzzy intuition.

This in some sense explains its popularity, and it also explains why OOP has so many subtly different flavors in practice: whether (and how) to support multiple inheritance, static members, method overloading vs. rich signatures, and so on. Industry folks can never quite agree on what OOP is, but we love it all the same.

Relational Modeling

Relational database modeling is a bit harder and takes more practice, because its strength stems from its mathematical foundation. Relational modeling can be intuitive, depending on the problem domain, but most people would agree that it is not necessarily so: it takes some real skill to learn how to model arbitrary problems as relational schemas.

Object modeling and relational modeling produce very different designs, each with its strengths and weaknesses, and one of the trickiest problems we face in our industry has always been the object-relational mapping (ORM) problem. It's a big deal. Some people may have let you to believe that it's simple, or that it's automatically handled by frameworks such as Rails or Hibernate. Those who know better know just how hard ORM is in real-world production schemas and systems.

XML Modeling

XML provides yet another technique for modeling problems. Usually XML is used to model data, but it can also be used to model code. For instance, XML-based frameworks such as Apache Ant and XSLT offer computational facilities: loops or recursion, conditional expressions and setting variables.

In many domains, programmers will decide on an XML representation before they've thought much about the class model, because for those domains XML actually offers the most convenient way of thinking about the problem.

The kinds of data that work well with XML modeling tend to be poorly suited for relational modeling, and vice-versa, with the practical result that XML/relational mapping is almost as infamously thorny as O/R mapping.

And as for XML/OO mapping, most of us tend to treat it as a more or less solved problem. However, in practice there are several competing ways of doing XML/OO mapping. The W3C DOM and SAX enjoy the broadest use, but they are both sufficiently cumbersome that alternatives such as JDom and REXML (among others) have gained significant followings.

I mention this not to start a fight, but only to illustrate that XML is a third modeling technique in its own right. It has both natural resonances and surfaces of friction with both relational design and OO design, as one might expect.

Other schools

I'm not claiming that these three modeling schools are the only schools out there – far from it! Two other obvious candidates are Functional modeling (in the sense of Functional Programming, with roots in the lambda calculus) and Prolog-style logical modeling. Both are mature problem-modeling strategies, each with its pros and cons, and each having varying degrees of overlap with other strategies. And there are still other schools, perhaps dozens of them.

The important takeaway is that none of these modeling schools is "better" than its peers. Each one can model essentially any problem.

There are tradeoffs involved with each school, by definition — otherwise all but one would have disappeared by now.

Finding the sweet spot

Sometimes it makes sense to use multiple modeling techniques in the same problem space. You might do a mixed XML/relational data design, or a class-based OO design with Functional aspects, or embed a rules engine in a larger system.

Choosing the right technique comes down to convenience. For any given real-world problem, one or two modeling schools are likely to be the most convenient approaches. Exactly which one or two depends entirely on the particulars of the problem.

By convenient, I mean something different from what you might be thinking. To me, a convenient design is one that is convenient for the users of the design. And it should also be convenient to express, in the sense of minimalism: all else being equal, a smaller design beats a big one. One way of looking at this is that the design should be convenient for itself!

Unfortunately, most programmers (myself included) tend to use exactly the wrong definition of convenience: they choose a modeling technique that is convenient for themselves. If they only have experience in one or two schools, guess which techniques they'll jump to for every problem they face?

This problem rears its head throughout computing. There's always a "best" tool for any job, but if programmers don't know how to use it, they'll choose an inferior tool because they think their schedule doesn't permit a learning curve. In the long run they're hurting their schedules, but it's hard to see that when you're down in the trenches.

Modeling schools are just like programming languages, web frameworks, editing environments and many other tools: you won't know how to pick the right one unless you have a reasonably good understanding of all of them, and preferably some practice with each.

The important thing to remember is that all modeling schools are "first class" in the sense of being able to represent any problem, and no modeling school is ideal for every situation. Just because you are most comfortable solving a problem using a particular strategy does not mean that it is the ideal solution to the problem. The best programmers aim to master all available techniques, giving them a better chance at making the right choices.

Property Modeling

With this context in mind, I claim that the Properties Pattern is yet another kind of domain modeling, with its own unique strengths and tradeoffs, distinct from all the other modeling schools I've mentioned. It is their first-class peer, inasmuch as it is capable of modeling the same broad set of problem domains.

After we've finished talking about the Properties Pattern in exhausting detail, I think I'll have convinced you of the pattern's status as a major school of modeling. Hopefully you'll also start to have a feel for the kinds of problems it's well-suited to solve – sometimes more so than other schools, even your current favorite.

But before we dive into technical details, let's take a brief peek at a fascinating comparison of Property-based modeling to class-based OO design. It's a non-technical argument that I think has some real force behind it.

Brains and Thoughts

Douglas Hofstadter has spent a lifetime thinking about the way we think. He's written about it perhaps more than anyone else in the past century. Even if someone out there has beaten him in sheer quantity of words on the subject, nobody has come close to rivaling his style or his impact on programmers everywhere.

All of his books are wonderfully imaginative and are loads of fun to read, but if you're a programmer and you haven't yet read Gödel, Escher, Bach: An Eternal Golden Braid (usually known as "GEB"), then I envy you: you're in for a real treat. Get yourself a copy and settle in for one of the most interesting, maddening, awe-inspiring and just plain fun books ever written. The Pulitzer Prize it won doesn't nearly do it justice. It's one of the greatest and most unique works of imagination of all time.

Hofstadter made a compelling argument in GEB (thirty years ago!) that property-based modeling is fundamental to the way our brains work. In Chapter XI ("Brains and Thoughts"), there are three little sections titled Classes and Instances, The Prototype Principle, and The Splitting-off of Instances from Classes that together form the conceptual underpinnings of the Properties Pattern. In these little discussions Hofstadter explains how the Prototype Principle relates to classic class-based modeling.

I wish I could reproduce his discussion in full here — it's only three pages — but I'll have to just encourage you to go read it instead. His thesis is this:

The most specific event can serve as a general example of a class of events.

Hofstadter offers several supporting examples for this thesis, but I'll paraphrase one of my all-time favorites. It goes more or less as follows.

Imagine you're listening to announcers commenting on an NFL (American football) game. They're talking about a new rookie player that you don't know anything about. At this point, the rookie – let's say his name is L.T. – is just an instance of the class "football player" with no differentiation.

The announcers mention that L.T. is a running back: a bit like Emmitt Smith in that he has great speed and balance, and he's great at finding holes in the defense.

At this point, L.T. is basically an "instance" of (or a clone of) Emmitt Smith: he just inherited all of Emmitt's properties, at least the ones that you're familiar with.

Then the announcers add that L.T. is also great at catching the ball, so he's sometimes used as a wide receiver. Oh, and he wears a visor. And he runs like Walter Payton. And so on.

As the announcers add distinguishing attributes, L.T. the Rookie gradually takes shape as a particular entity that relies less and less on the parent class of "football player". He's become a very, very specific football player.

But here's the rub: even though he's a specific instance, you can now use him as a class! If Joe the Rookie comes along next season, the announcers might say: "Joe's a lot like L.T.", and just like that, Joe has inherited all of L.T.'s properties, each of which can be overridden to turn Joe into his own specific, unique instance of a football player.

This is called prototype-based modeling: Emmitt Smith was a prototype for L.T., and L.T. became a prototype for Joe, who in turn can serve as the prototype for someone else. Hofstadter says of The Prototype Principle:

"This idea that there is generality in the specific is of far-reaching importance."

Again, it's a three-page discussion that I've just skimmed here. You should go read it for yourself. Heck, you should read the whole book: it's one of the greatest books ever written, period, and every programmer ought to be familiar with it.

Hopefully my little recap of Hofstadter's argument has convinced you that my calling it the "Universal Design Pattern" might just possibly be more than a marketing trick to get you to read my blog, and that the rest of this article is worth a look.

The Properties Pattern is unfortunately big enough to deserve a whole book. Calling an entire school of modeling a "design pattern" is actually selling it short by a large margin.

Hence, if this article seems excruciatingly long, it's because I've tried to cram a whole book into a blog, as I often do. But look on the bright side! I've saved you a bunch of time this way. This will go much faster than reading a whole book.

Even so, don't feel bad if it takes a few sittings to get through it all. It's still a lot of information. I considered splitting it into 3 articles, but instead I just cut about half of the material out. (Jeff and Joel: seriously. I cut 50%.)

Who uses the Properties Pattern?

I assume I've already convinced you that this pattern is worth learning about, or you'd have left by now. I'm showing you these use cases not to draw you in, but to show you some of the very different ways the pattern can be used.

We could probably find hundreds of examples, but I'll focus on just a handful of real-world uses that I hope will illustrate just how widespread and flexible it is.

Before we start, there are two things to keep in mind. The first is that people have different names for this pattern, because even though it's quite commonplace, there hasn't been much literature on it. One paper calls it Do-It-Yourself Reflection. Another article calls it Adaptive Object Modeling. See Further reading for the few links I could dig up.

Whatever name you use, once you know how to look for it, you'll start seeing this pattern everywhere, wearing various disguises. Now that we have a common name for it, it should be easier to spot in the wild.

The second thing to keep in mind is that the Properties Pattern scales up: you can choose how much to use it in your system. It can be anything from simple property lists attached to a few of your classes to make them user-annotatable, up through a full-fledged prototype-based framework that serves as the foundation for modeling everything in your system.

So our examples will range from small ones to very big ones.

Eclipse

One nice small-scale example of the pattern is the Eclipse Java Development Tools (JDT): a set of classes that model the Java programming language itself, including the abstract syntax tree, the symbol graph and other metadata. This is used by the Eclipse backend to do all the neat magic it does with your Java code, by treating Java source code as a set of data structures.

You can view the javadoc for this class hierarchy at help.eclipse.org. Click on JDT Plug-in Developer Guide, then Programmer's Guide, then JDT Core, then org.eclipse.jdt.core.dom. This package defines strongly-typed classes and interfaces that Eclipse uses for modeling the Java programming language itself.

If you click through any class inheriting from ASTNode you'll see that it has a property list. ASTNode's javadoc comment says:

"Each AST node is capable of carrying an open-ended collection of client-defined properties. Newly created nodes have none. getProperty and setProperty are used to access these properties."

I like this example for several reasons. First, it's a very simple use of the Properties pattern. It doesn't muck around with prototypes, serialization, metaprogramming or many of the other things I'll talk about in a little bit. So it's a good introduction.

Second, it's placed smack in the middle of a very, very strongly-typed system, showing that the Properties pattern and conventional statically-typed classed-based modeling are by no means mutually exclusive, and can complement one another nicely.

And third, their property system itself is fairly strongly typed: they define a set of support classes such as StructuralPropertyDescriptor, SimplePropertyDescriptor, and ChildListPropertyDescriptor to help place some constraints on client property values. I'm not a huge fan of this approach myself, since I feel it makes their API fairly heavyweight. But it's a perfectly valid stylistic choice, and it's useful for you to know that you can implement the pattern this way if you so choose.

JavaScript

At the other end of the "how far to go with it" spectrum we have the JavaScript programming language, which places the Prototype Principle and Properties Pattern at the very core of the language.

People love to lump dynamic languages together, and they'll often write off JavaScript as some sort of inferior version of Perl, Python or Ruby. I was guilty of this myself for over a decade.

But JavaScript is substantively different from most other dynamic languages (even Lisp), because it has made the Properties Pattern its central modeling mechanism. It borrowed this heritage largely from a language called Self, and some other modern languages (notably Io and one other language that I'll talk about below) have also chosen prototypes and properties over traditional classes.

In JavaScript, every user-interactible object in the system inherits from Object, which has a built-in property list. Prototype inheritance (think back to our example of the Emmitt Smith instance having been the prototype for the L.T. instance) is a first-class language mechanism, and JavaScript offers several kinds of syntactic support for accessing properties and declaring property lists (as "object literals").

JavaScript is often accurately described as the world's most misunderstood programming language. Armed with our newfound knowledge, we can start to see JavaScript in a new light. To use JavaScript effectively, you need to gain experience with a whole new School of Modeling. If you simply try to use JavaScript as a substitute for (say) Java or Python, you'll encounter tremendous friction.

Since most of us have precious little actual experience with property-based modeling, this is exactly what happens, and it's no wonder JavaScript gets a bad rap.

Pushing it even further

In addition to how centrally you want to use the Properties pattern in your system, you can also decide how recursive to make it: do your properties have explicit meta-properties? Do you have metaprogramming hooks? How much built-in reflection do you offer?

JavaScript offers a few metaprogramming hooks. One such hook is the recently-introduced __noSuchMethod__, which lets you intercept a failed attempt to invoke a nonexistent function-valued property on an object.

Unfortunately JavaScript does not offer as many hooks as I'd like. For instance, there is no corresponding __noSuchField__ hook, which limits the overall flexibility somewhat. And there are no standard mechanisms for property-change event notification, nor any reasonable way to provide such a mechanism. So JavaScript gets it mostly right, but it stops short, possibly for performance reasons, of offering a fully-extensible metaprogramming system such as those offered by SmallTalk and to some extent, Ruby.

The pattern takes shape...

Before we move on to other uses of the Property Pattern, let's put JavaScript (and its central use of the pattern) into perspective, by comparing it to another successful language.

First: JavaScript is not my favorite language. I've done a lot of JavaScript programming over the past 2 years or so, both client-side and server-side, so I'm as familiar with it as I am with any other language I've used.

JavaScript in its current incarnation is not the best tool for many tasks. For instance, it's not great for building APIs, and it's not great for Unix scripting the way Perl and Ruby are. It has no library or package system, no namespaces, and is missing many other modern conveniences. If you're looking for a general-purpose language, JavaScript leaves you wanting.

But JavaScript is the best tool for many other tasks. As just one example, JavaScript is an outstanding language for writing unit tests — both for itself, and also for testing code in other languages. Being able to use the Properties Pattern to treat every object (and class) as a bag of properties makes the creation of mock objects a dream come true. The syntactic support for object literals makes it even better. You don't need any of the silly frameworks you see coming from Java, C++ or even Python.

And JavaScript is one of the two best scripting languages on the planet, in the most correct sense of the term "scripting language": namely, languages that were designed specifically to be embedded in larger host systems and then used to manipulate or "script" objects in the host system. This is what JavaScript was designed to do. It's reasonably small with some optional extensions, it has a reasonably tight informal specification, and it has a carefully crafted interface for surfacing host-system objects transparently in JavaScript.

In contrast, Perl, Python and Ruby are huge sprawls, all trying (like C++ and Java) to be the best language for every task. The only other mainstream language out there that competes with JavaScript for scripting arbitrary host systems is Lua, famous for being the scripting language of choice for the game industry.

And wouldn't you know it, Lua is also a language that uses the Properties Pattern as its central design. Its central Table structure is remarkably similar to JavaScript's built-in Object, and Lua also uses prototypes rather than classes.

So the world's two most successful scripting languages are prototype-based systems. Is this just a cosmic coincidence? Or is it possible that a suitably designed class-based language could have been just as successful?

It's hard to say. I've used Jython as an embedded scripting language for a long time, and it's worked pretty well. But I've personally come to believe that the Properties Pattern is actually better suited for extensibility than class-based modeling, and that prototype-based languages make better extension languages than class-based languages. That's effectively what's happening with embedded scripting: the end-users are growing and extending the host system.

In fact I was convinced of it before I even knew JavaScript. Let's take a look at another interesting "Who uses it?" example: Wyvern.

Wyvern

My multiplayer game Wyvern takes the Properties Pattern quite far as well, although in some different directions than what we've discussed so far. I designed Wyvern long before I'd heard of Self or Lua, and before I'd learned anything about JavaScript. In retrospect it's amazing how similar my design was to theirs.

Wyvern is implemented in Java, but the root GameObject class has a property list, much like JavaScript's Object base class. Wyvern has prototype inheritance, but since I'd never heard of prototypes before, I called them archetypes. In Wyvern, any game object can be the archetype for any other game object, and property lookup and inheritance work more or less identically to the way they work in JavaScript.

I arrived at this design after scratching my head for months (in late 1996) over how to build the ultimate extensible game. I wanted all the game content to be created by players, and I came up with dozens upon dozens of detailed use cases, in all of which I wanted players to be able to extend the game functionality in surprising new ways. In the end I arrived at a set of interleaved design patterns, including a rich command system, a rich hooks/advice system, and several other subsystems I'd love to document someday.

But the core data model was the Properties Pattern.

In some ways, Wyvern's implementation is more full-featured than JavaScript's. Wyvern offers more metaprogramming facilities, such as vetoable property change notifications, which gives in-game objects tremendous flexibility in responding to their environment. Wyvern also supports both transient and persistent properties, a scheme I'll discuss below.

On other ways, Wyvern just made different decisions. One big one is that Wyvern's property values are statically typed. The property names are always strings, just like in JavaScript, but the values can be various leaf types (ints, longs, booleans, strings, etc.), or functions (a trick that wasn't easy in Java), or even archetypes.

But despite the differences, Wyvern's core property-list infrastructure is a lot like that of JavaScript, Self and Lua. And it's been a design I've been fundamentally happy with for over ten years. It's met or exceeded all my original expectations for enabling end-user extensibility, particularly in its ability to let people extend the in-game behavior on the fly, without needing to reboot. This has proven extraordinarily powerful (and popular with the players.)

Where Wyvern clearly got the pattern wrong was in its lack of support for syntax. As soon as I decided to use the Properties pattern centrally in my game, I should have decided to use a programming language better suited for implementing the pattern: ideally, one that supports it from the ground up.

I eventually wound up using Python (actually, Jython) for a ton of my code, and it was far more succinct and flexible than anything I wrote in Java. But I was foolishly worried about performance, and as a result I wound up writing at least half the high-level game logic in Java and piling on hundreds of thousands of lines of getProperty and setProperty code. And now the system is hard to optimize; it would have been much easier if I'd had a cleaner separation of game-engine infrastructure from "scripty" game code.

Even if I'd done the whole game in Python, I'd still have had to implement a prototype inheritance framework to enable any object to be able to serve as the prototype for any other object.

I realize I haven't really explained why prototype inheritance works so well, except for my brief mention of mock objects for unit testing. But to keep this article tractable, I had to delete several pages of detailed examples, such as "Chieftain Monsters" that could be programmatically constructed by adding a few new properties to any existing monster

When I told you this pattern was big enough for a book, I meant a big book. Without the examples handy, all I can do is say that using JavaScript/Rhino (or Lua, once it became available on the JVM) might have made my life easier. Or heck, writing my own language might have been the best choice for a system that large and ambitious.

In any case, live and learn. It's a lot of code, but Wyvern is still a properties-based, prototype-based system, and it has amazing open-ended flexibility as a result.

We've been through the two big examples now (Wyvern and JavaScript). I'll close this "Who Uses It" section with just a few more key examples.

Lisp

Lisp features a small-scale example of the Properties Pattern: it has property lists for symbols. Symbols are first-class entities in Lisp. They're effectively the names in your current namespace, like Java's fully-qualified class names.

If Java classes all had property lists, it would still be a small-scale instantiation of the Properties pattern, but it would open up an awful lot of new design possibilities to Java programmers. Similarly, Lisp stops short of making everything have a property list, but to the extent it offers property lists they're exceptionally useful design tools.

Emacs Lisp actually makes heavy use of the Properties Pattern, inasmuch as essentially every one of its thousands of configuration settings is a property in the global namespace. It supports the notion of transient vs. persistent properties, and it offers a limited form of property inheritance via its buffer-local variables.

Unfortunately Emacs doesn't support any notion of prototypes, and in fact it doesn't have any object-orientation facilities at all. Sometimes I want to model things in Emacs using objects with flexible property lists, and at such times I find myself wishing I were using JavaScript. But even without prototypes, Emacs gains significant extensibility from its use of properties for data representation.

Keep in mind that there are, of course, big tradeoffs to make when you're deciding how much to use the Properties pattern; I'll discuss them in a bit. I'm not criticizing any of the systems or languages here for the choices they've made; all of them have been improved by their use of this pattern, regardless of how far they decided to take it.

XML revisited

Earlier I described XML as a first-class modeling school. Now that we have more context, it's possible to view XML as being an instantiation of the Properties Pattern, inasmuch as it uses the pattern as part of its fundamental structure.

XML's view of the pattern is two-dimensional: properties can take the form of either attributes or elements, each kind having different syntactic and semantic restrictions. Many folks have criticized XML for the unnecessary complexity of this redundant pair of property subsystems, but in the end it doesn't really matter much, since two ways to model properties is still better than zero ways.

So far we've seen various policies around static checking: Eclipse (strong/mandatory), JavaScript/Lua (very little), and Wyvern (moderate).

XML offers what I think is the ideal policy, which is that it lets you decide for yourself. During your initial domain modeling (the "prototyping" phase — a term now loaded with Even More Delicious Meaning), you can go with very weak typing, opting for nothing more than simple well-formedness checks. And for many problems, this is as much static checking as you'll ever need, so it's nice that you have this option.

As some of your models become more complex, you can choose to use a DTD for extra validation. And if you need a really heavy-duty constraint system, you can migrate up to a full-fledged XML Schema or Relax NG schema, depending on your needs.

XML has proven to be a very popular modeling tool for Java programmers in particular — more so than for the dynamic language communities. This is because Java offers essentially zero support for the Properties Pattern. When Java programmers need access to the pattern, the easiest approach is currently to use XML.

The Java/XML combination has proven reasonably powerful, despite the lack of syntactic integration and numerous other impedance mismatches. Using XML is still often preferable to modeling things with Java classes. Even Eclipse's AST property lists might have been better modeled using XML and a DOM: it would have been less work, and the interface would have been more familiar. And as for Apache Ant: JSON-style JavaScript objects for build files would have been exactly what they needed, but by the time they'd realized they needed a plug-in system, the damage was done.

As Mozilla Rhino becomes better documented, and as more Java programmers begin to appreciate the usefulness of JSON as a lightweight alternative to XML, JavaScript may begin to close the gap. Rhino provides Java with the Properties Pattern much more seamlessly than any XML solution. I've already mentioned that it's superior (even to XML) for unit tests and representing mock test data.

But it goes deeper than unit testing. Every sufficiently large Java program, anything beyond medium-sized, needs a scripting engine, whether the authors realize it or not. Programs often have to grow to the size of browsers or spreadsheets or word processors before the authors finally realize they need to offer scripting facilities, but in practice, even small programs can immediately benefit from scripting. And XML doesn't fit the bill. It's yet another example of programmers choosing a School of Modeling because they know it, rather than learning how to use the right tool for the job.

So it goes.

Bigtable

Last example: Google's Bigtable, which provides a massively scalable, high performance data store for many Google applications (some of which are described in the paper – click the link if you're curious.) This particular instantiation of the Properties Pattern is a multidimensional table structure, where the keys are simple strings, and the leaf values are opaque blobs.

Hardcore relational data modelers will sometimes claim that large systems will completely degenerate in the absence of strong schema constraints, and that such systems will also fail to perform adequately. Bigtable provides a nice counterexample to these claims.

That said, explicit schemas are useful for many domains, and I'll talk more about how they relate to the Properties Pattern in a bit.

This would probably be a good time to mention Amazon's Simple Storage Service, but I don't know anything about it. I've heard they use name-value pairs.

In any case, I hope these examples (Eclipse AST classes, JavaScript, Wyvern game objects, Lisp symbols, XML and HTML, and Bigtable) have convinced you that the Properties pattern is ubiquitous, powerful, and multifaceted, and that it should be part of any programmer or designer's lineup.

Let's look in more depth at how it's implemented, its trade-offs, and other aspects of this flexible design strategy.

Properties Pattern high-level overview

At a high level, every implementation of the Properties Pattern has the same core API. It's the core API for any collection that maps names to values:

get(name)

put(name, value)

has(name)

remove(name)

There are typically also ways to iterate over the properties, optionally with a filter of some sort.

So the simplest implementation of the Properties Pattern is a Map of some sort. The objects in your system are Maps, and their elements are Properties.

The next step in expressive power is to reserve a special property name to represent the (optional) parent link. You can call it "parent", or "class", or "prototype", or "mommy", or anything you like. If present, it points to another Map.

Now that you have a parent link, you can enhance the semantics of get, put, has and remove to follow the parent pointer if the specified property isn't in the object's list. This is largely straightforward, with a few catches that we'll discuss below. But you should be able to envision how you'd do it without too much thought.

At this point you have a full-fledged Prototype Pattern implementation. All it took was a parent link!

From here the pattern can expand in many directions, and we'll cover a few of the interesting ones in the remainder of this article.

Representations

There are two main low-level implementation considerations: how to represent property keys, and what data structure to use for storing the key/value pairs.

Keys

The Properties pattern almost always uses String keys. It's possible to use arbitrary objects, but the pattern becomes more useful with string keys because it trivially enables prototype inheritance. (It's tricky to "inherit" a property whose key is some opaque blob - we usually think of inheritance as including a set of named fields from the parent.)

JavaScript permits you to use arbitrary objects as keys, but what's really happening under the covers is that they're being cast to strings, and they lose their unique identity. This means JavaScript Object property lists cannot be used as be a general-purpose hashtable with arbitrary unique objects for keys.

Some systems permit both strings and numbers as keys. If your keys are positive integers, then your Map starts looking an awful lot like an Array. If you think about it, Arrays and Maps share the same underlying formalism (a surjection, not to put too fine a point on it), and in some languages, notably PHP, there isn't a user-visible difference between them.

JavaScript permits numeric keys, and allows you to specify them as either strings or numbers. If your object is of type Array, you can access the numeric keys via array-indexing syntax.

Quoting

JavaScript syntax is especially nice (compared to Ruby and Python) because it allows you to use unquoted keys. For instance, you can say

and the symbols name, age and favorite_days are NOT treated as identifiers and resolved via the symbol table. They're treated exactly as if you'd written:

You also have to decide whether to require quoting values. It can go either way. For instance, XML requires attribute values to be quoted, but HTML does not (assuming the value has no whitespace in it).

Missing keys

You will need to decide how to represent "property not present". In the simplest case, if the key isn't in the list, the property is not there (but see Inheritance further on).

If a property is frequently removed and re-added, it may make sense to leave the key in the list with a null value. In some systems, you may need null to be a valid property value, in which case you'd need to use some other distinguished (and reserved) value for this micro-optimization to work.

Data structures

The simplest property-list implementation is a linked list. You can either have the alternating elements be the keys and values (Lisp does this), or you can have each element be a struct containing pointers to the key and value.

The linked list implementation is appropriate when:

you're just using the pattern to allow user annotations on object instances

you don't expect many such annotations on any given instance

you're not incorporating inheritance, serialization or meta-properties into your use of the pattern

Logically a property list is an unordered set, not a sequential list, but when the set size is small enough a linked list can yield the best performance. The performance of a linked list is O(N), so for long property lists the performance can deteriorate rapidly.

The next most common implementation choice is a hashtable, which yields amortized constant-time find/insert/remove for a given list, albeit at the cost of more memory overhead and a higher fixed per-access cost (the cost of the hash function.)

In most systems, a hashtable imposes too much overhead when objects are expected to have only a handful of properties, up to perhaps two or three dozen. A common solution is to use a hybrid model, in which the property list begins life as a simple array or linked list, and when it crosses some predefined threshold (perhaps 40 to 50 items), the properties are moved into a hashtable.

Note that we'll often refer to property sets as "property lists" (or "plists" for short), because they're so often implemented as lists. But it's fairly unusual for the order to matter. In the rare cases when it matters, there are usually two possibilities: the names need to be kept in insertion order, or they need to be sorted.

If you need constant-time access and want to maintain the insertion order, you can't do better than a LinkedHashMap, a truly wonderful data structure. The only way it could possibly be more wonderful is if there were a concurrent version. But alas.

If you need to impose a sort order on property names, you'll want to use an ordered-map implementation, typically an ordered binary tree such as a splay tree or red/black tree. A splay tree can be a good choice because of the low fixed overhead for insertion, lookup and deletion, but with the tradeoff that its theoretical worst-case performance is that of a linked list. A splay tree can be especially useful when properties are not always accessed uniformly: if a small subset M of an object's N properties are accessed most often, the amortized performance becomes O(log M), making it a bit like an LRU cache.

Note that you can get a poor-man's splay tree (at least, the LRU trick of bubbling recent entries to the front of the list) using a linked list by simply moving any queried element to the front of the list, a constant-time operation. It's surprising that more implementations don't take this simple step: an essentially free speedup over the lifetime of most property lists.

Inheritance

With prototype inheritance, each property list can have a parent list. When you look for a property on an object, first you check the object's "local" property list, and then you look in its parent list, on up the chain.

As I described in the Overview, the simplest approach for implementing inheritance is to set aside a name for the property pointing to the parent property list: "prototype", "parent", "class" and "archetype" are all common choices.

It's unusual (but possible) to have a multiple-inheritance strategy in the Properties pattern. In this case the parent link is a list rather than a single value, and it's up to you to decide the rules for traversing the parent chains during lookups.

The algorithm for inherited property lookup is simple: look in my list, and if the property isn't there, look in my parent's list. If I have no parent, return null. This can be accomplished recursively with less code, and but it's usually wiser to do it iteratively, unless your language supports tail-recursion elimination. Property lookups can be the most expensive bottleneck in a Properties Pattern system, so thinking about their performance is (for once) almost never premature.

The deletion problem

If you delete a property from an object, you usually want subsequent checks for the property to return "not found". In non-inheritance versions of the pattern, to delete a property you simply remove its key and value from the data structure.

In the presence of inheritance the problem gets trickier, because a missing key does not mean "not found" – it means "look in my parent to see if I've inherited this property."

To make the problem clearer, assume you have a prototype list called Cat with a property named "friendly-to-dogs", whose value defaults to the boolean true. Let's say you have a specific cat instance named Morris, whose prototype is Cat:

Let's say Morris has a nasty run-in with a dog, and now he hates all dogs, so we want to make a runtime update to his friendly-to-dogs property. Our first idea might be to delete the property, since a missing key or null value are often interpreted as false in a boolean context. (This is true even in class-based languages like C++ or Java, in which a hasFooBar function will return true if the internal fooBar field is non-null.)

However, Morris does not have a copy of "friendly-to-dogs" in his local list: he inherits it from Cat. So if your deleteProperty method does nothing but delete the property from the local list, he will continue to inherit "friendly-to-dogs", which will irk him (and you) endlessly until you figure out where the bug is.

You can't delete "friendly-to-dogs" from the Cat property list, or all of your cats will suddenly become dog-haters, and you'll have outright war on your hands. (Note that in some settings this is exactly what you want to do, illustrating the inherent universal trade-off between flexibility and safety.)

The solution for Morris is to have a special "NOT_PRESENT" property value that deleteProperty sets when you delete a property that would otherwise be inherited. This object should be a flyweight value so that you can check it with a pointer comparison.

So to account for deletion of inherited properties, we have to modify our property-lookup algorithm to look in the local list for (a) a missing key, (b) a null value, or (c) the NOT_PRESENT tag. If any of these apply, the property is considered not present on the object. [Note: the semantics of null values are up to the system designer. You don't have to make null values mean "not there." Either way is fine.]

Read/write asymmetry

One logical consequence of prototype inheritance as we've defined it is that reads and writes work differently. In particular, if you read an inherited property, it gets the value from an ancestor in the prototype chain. But if you write an inherited property, it sets the value in the object's local list, not in the ancestor.

To illustrate, let's add a "night-vision" property to our Cat prototype. Its value is expected to be an integer representing how well the cat can see in the dark. Let's say that the default value is 5, but our hero Morris has been eating his carrots, so we want to set his "night-vision" property value to 7.

The setProperty code does not need to check the parent chain: it simply adds the key/value pair {"night-vision", 7} to Morris's local property list. If we set the property on Cat, then all cats would have Morris's super-vision, which isn't what we want.

This asymmetry is normal. Back in our L.T. / Emmitt Smith example, when we were adding properties to L.T., we didn't want to modify Emmitt! It's just how the pattern works: you override inherited values by adding local properties, even when the override is a deletion.

Read-only plists

Many implementations of the pattern offer "frozen" property lists. Sometimes (e.g. for debugging) it's useful to flag an entire property list as read-only. Ruby supports this via the "freeze" method on the built-in root Object class. In any sufficiently large, robust implementation of the Properties pattern, you should include the option to freeze your property lists.

If you offer a "freeze" function, you should think about whether you want to offer a "thaw" as well. The decision depends on whether you want to offer programmers additional protection, or you just want to lock them up and throw away the key.

My personal view is that Java programmers tend to overuse the "freeze" function when they start with Ruby. For that matter, they tend to overuse "final" in Java. I mentioned before the trade-off between flexibility and safety. When you use the Properties Pattern, you're consciously choosing flexibility over safety, and in many domains this is the right choice. In fact, safety can be viewed as a kind of optimization: something that should ideally be layered on, functioning behind the scenes rather than being interleaved with the user APIs and flexible data model.

A nice (and simple to implement) compromise on safety and flexibility is to offer a ReadOnly property attribute, as JavaScript does. There are certain properties (such as the parent pointer) that are less likely to need to change as the system evolves, so it's probably OK to lock them down early on. Doing this on a property-by-property basis is much less draconian. Even better, you should consider making the ReadOnly property attribute non-inheritable, so that subtypes can choose their own policies without compromising the integrity of the supertypes.

We're more or less done with inheritance: it's not very complicated. There are a few other inheritance-related design issues that I'll cover in upcoming sections.

Performance

Performance is one of the biggest trade-offs of using the Properties Pattern. Many engineers are so concerned with performance (and its attendant paradoxes and fallacies) that they refuse to consider using the Properties pattern, regardless of the situation.

As it happens, the pattern's performance can be improved and mitigated in several clever ways. I won't cover all of them here, but I'll touch on some of the classics and one or two new approaches.

Interning strings

Make sure your string keys are interned. Most languages provide some facility for interning strings, since it's such a huge performance win. Interning means replacing strings with a canonical copy of the string: a single, immutable shared instance. Then the lookup algorithm can use pointer equality rather than string contents comparison to check keys, so the fixed overhead is much lower.

The only downside of interning is that it doesn't help much when you're constructing a property name on the fly, since you still need to hash the string to intern it.

That's not much of a downside, so as a rule, you should always intern your keys. A large percentage of property names in any system are accessed as string literals from the source code (or are read from a configuration file and can be interned all at once when the file is read), and interning works in these common cases.

Corollary: don't use case-insensitive keys. It's performance suicide. Case-insensitive string comparison is really slow, especially in a Unicode environment.

Perfect hashing

If you know all the properties in a given plist at compile-time (or at runtime early on in the life of the process), then you might consider using a "perfect hash function generator" to create an ideal hash function just for that list. It's almost certainly more work than it's worth unless your profiler shows that the list is eating a significant percentage of your cycles. But such generators (e.g. gperf) do exist, and are tailor-made for this situation.

Perfect hashing doesn't conflict with the extensible-system nature of the Properties pattern. You may have a particular set of prototype objects (such as your built-in monsters, weapons, armor and so on) that are well-defined and that do not typically change during the course of a system session. Using a perfect hash function generator on them can speed up lookups, and then if any of them is modified at runtime, you just fall back to your normal hashing scheme for that property list.

Copy-on-read caching

If you have lots of memory, and your leaf objects are inheriting from prototype objects that are unlikely to change at runtime, you might try copy-on-read caching. In its simplest form, whenever you read a property from the parent prototype chain, you copy its value down to the object's local list.

The main downside to this approach is that if the prototype object from which you copied the property ever changes, your leaf objects will have the now-incorrect old value for the property.

Let's call copy-on-read caching "plundering" for this discussion, for brevity. If Morris caches his prototype Cat's copy of the "favorite-food" property (value: "9Lives"), then Morris is the "plunderer" and Cat is the plundered object.

The most common workaround to the stale-cache problem is to keep a separate data structure mapping plundered objects to their plunderers. It should use weak references so as not to impede garbage collection. (If you're writing this in C++, then may God have mercy on your soul.) Whenever a plundered object changes, you need to go through the plunderers and remove their cached copy of the property, assuming it hasn't since then changed from the original inherited value.

That's a lot of stuff to keep track of, so plundering is a strategy best used only in the direst of desperation. But if performance is your key issue, and nothing else works, then plundering may help.

Refactoring to fields

Another performance speedup technique is to take your N most commonly used properties and turn them into instance variables. Note that this technique is only available in languages that differentiate between instance variables and properties, so it would work in Java but not in JavaScript.

This optimization may sound amazingly attractive, especially during your first round of performance optimizations. Be warned: this approach is fraught with pitfalls, so (as with nearly all performance optimizations) you should only use it if your profiler proves that the benefits will outweigh the costs.

The first cost is API incompatibility: suddenly instead of accessing all properties uniformly through a single getProperty/setProperty interface, you now have specific fields that have their own getters and setters, which could potentially have massive impact in your system (since, after all, these are the most commonly-accessed properties). And unless you're using Emacs, your refactoring editor probably isn't smart enough to do rename-method constrained on argument value.

You can mitigate the API problem by continuing to go through the get/setProperty interface, and have them check the argument to see if it's one of the hardwired fields. This will result in an increasingly large switch-statement (or equivalent), so you're trading API code maintenance for API simplicity. It also slows down the field access considerably, which offsets the performance gain from using a field.

The next cost is system complexity: you have twice as many code paths through every part of your system that deals with the Properties pattern. Does inheritance still work the same? What about serialization? Transient properties? Metaprogramming? What about your user interface for accessing and modifying property lists? You face a huge code-bloat problem when you split property access into two classes: plist properties and instance variables.

The next cost is accidentally getting it wrong: how do you know what the most-accessed properties are? You may do some runtime profiling and see that it's one set, but over time the characteristics of your system might change in such a way that it's an entirely different set. Realistically you will have to instrument your system to keep track, on a regular basis, of which properties are accessed at a rate beyond your tolerance threshold, so you can convert them to fields as well. This isn't a one-off optimization.

But all these costs pale in comparison to the big one, which is extensibility: instance variables are set in stone. It will be a vast amount of work to try to give them parity with your plist properties: change-list notifications, the ability to override them or remove them, and so on. It's likely that you will wind up sacrificing at least some flexibility for these fields.

So use this optimization with extreme caution.

Refrigerator

The last performance optimization I'll mention is more about conserving memory than CPU. If you're worried about the overhead of a per-object property-list field, even if it's usually null, then you can implement property lists using a separate, external data structure.

I don't know what it's normally called, so I'm calling it the Refrigerator, since you're basically putting yellow stickies all over it. The idea is that you don't need to pay for the overhead of property lists when very few of the objects in your system will ever have one. Instead of using a field in each class with a property list, you maintain a global hashtable whose keys are object instances, and whose values are property lists.

To fetch the property list of an object, you go look to see if it's on the Refrigerator. The property list itself can follow any of the implementation schemes I discussed in Representations above.

I first heard this idea from Damien Conway in roughly 2001 at a talk he gave. He said he was considering using it for Perl 6, and I thought it was pretty clever. I don't remember what he called it, and I don't know if he wound up using it, but consider this idea to be his gift to you. Thanks, Damien!

REDACTED

Brendan Eich came up with astoundingly clever performance optimization for the Properties Pattern, which he told me about back in January. I was ready to publish this article, but I told him I'd hold off until he blogged about his optimization. Every once in a while he'd ping me and tell me "any day now."

Brendan, it's October, dammit!

Rolling your own

Needless to say, I've only scratched the surface on performance optimization of the Prope

Show more