Mortoray.com

The true cost of zero cost exceptions

2013-09-12

Exceptions are a cornerstone of programming. If we intend to use them prolifically, as in C++, we need an efficient implementation. Thus was born the zero-cost exception model. It offers zero runtime overhead for an error-free execution, which surpasses even that of C return-value error handling. As a trade-off, the propagation of an exception has a large overhead.

This article gives an overview of how the zero-cost model is achieved, and what actual costs are involved. It is not intended to be a discussion of the good and the bad aspects of exceptions.

How it works

GCC and CLang follow the Itanium ABI for exception handling. From the limited information I found, Windows structured exception handling appears to use a similar scheme (in 64-bit versions). As needed I will use the Itanium/GCC terminology. These approaches comprise two aspects: the static compile-time generation of exceptions tables; and the runtime searching and unwinding.

The exception specification is not limited to a single language. Multiple languages use the same framework and are intended to be mixed together via libraries. I’m not sure to what extent this happens in practice. I’ll write as though it does, using examples from different languages. The concepts are the same.

Exception Frames

Exceptions are handled in frames. The simplest of these frames is a ‘try/catch’ block in the source code. The ‘try’ opens an exception frame and the ‘catch’ defines the handler for that frame. Exceptions within that frame flow to the handler: either they match and the exception is handled, or they do not match and the exception is propagated further.

In addition to ‘catch’ handlers, are cleanup handlers. This is code that must be called during exception propagation but doesn’t stop the exception. This includes any ‘finally’ blocks and C++ destructors. Explicit ‘finally’ blocks are matched with a ‘try’, so it is easy to see where their frame is defined. Destructors create implicit frames, starting at the point of instantiation (variable declaration), ending with the enclosing scope. Note that every object with a destructor creates its own exception frame.

Languages with ‘defer’ statements, like Leaf, also introduce exception frames. These are very similar to destructors: the ‘defer’ statement opens a frame which ends with the enclosing scope. Any language feature which requires code to execute during an exception will require an exception frame.

Here is the same code with the implicit frames replaced with explicit try blocks. Pay attention to where the try block starts in relation to a variable declaration: the destructor is only called if construction was successful.

It’s important to see all of these frames; compilers used to generate code at each ‘try’ statement to handle exceptions. One technique was ‘setjmp/longjmp’: each ‘try’ adds a jump address to an exception handler stack and the ‘finally’ clauses remove it. Even if no exception was thrown the work of adding/removing from the handler stack would still be done. One of the key goals of zero-cost exceptions was to get rid of this needless setup/teardown work.

Refer to building blocks for more information on flow handling.

Exception Tables

Enter the exception table. At runtime the system knows the present code location: the program counter (PC) is how the CPU knows what to execute next. It’s of course an address relative to the compiled code, not the source code. These addresses are all calculated and predictable though, so we could work backwards to source code. All we need is a translation table.

The exception table, also known as an unwind table, is just a range map: a series of begin and end keys. At runtime the PC is located in this map. There is obviously a bit of overhead here: it is a search operation. I have not looked at the fine details, but being a search we can assume it is roughly logarithmic with the number of entries in the table (at least one per exception frame).

What value is being retrieved? In the source code, any exception within a frame must land at the handler for that frame. At the level of compilation where the tables are being created, the concept of a frame is kind of lost. Each time an exception could be thrown, each place a function is called, the exception handler must be explicitly provided. Two key details are stored: the personality routine and the language specific data area (LSDA). A personality routine is a function that is called whenever the function call results in an exception. The LSDA is block of data used by the personality routine.

Exception Propagation

A ‘throw’ statement in code starts exception propagation. First the object being thrown is created, possibly a ‘std::exception’ in C++. It is then passed to a wrapper function that creates an ‘_Unwind_Exception’ structure, comprising: a cleanup routine, an exception class, and two private fields. The cleanup routine is a function responsible for freeing up any exception resources once the propagation is complete. The exception class identifies the type of the exception and is usually a language-specific identifier. One of the private fields is used to store a pointer the language specific exception (the object passed to ‘throw’).

For Leaf, which is still written in C++ for now, I have code that looks something like this:

Once created, the exception structure is passed to ‘_Unwind_RaiseException’, starting the ABI specified exception propagation. __(On Linux this function is defined in ‘libgcc_s’. This is often the case of ABI support: it is implemented in user-space. The kernel doesn’t really care what is being used as exceptions do not propagate through it. But to be usable, all programs and libraries must conform to a common set of definitions. On Linux x86_64 these standards are essentially whatever GCC does. For now let’s just assume this is all well defined and well documented and continue with our discussion.)__

It is the responsibility of the ‘_Unwind_RaiseException’ function to do the actual work of propagating the exception. This is where those exception frames come in. The PC is used to search the exception tables and locate the appropriate exception frame. The personality routine of this frame is then called to determine how the exception is handled: this frame doesn’t handle the exception, this frame catches and handles the exception, or this frame needs to do some cleanup but does not handle the exception.

If the frame does not handle the exception then the ‘_Unwind_RaiseException’ function must search further for another handler. Here it will walk the stack to find the next PC, then exception frame and personality handler. The process repeats until one of those handlers says it handles the exception. Now, for reasons unknown to me, it actually does this process twice: once to find the correct handler, and once to execute each cleanup frame and the final handler.

Note that cleanup handlers could result from ‘finally’ blocks. At a high-level these blocks of code are executed during an exception but do not block that exception. I say “could” since a compiler doesn’t need to use cleanup frames. It could instead use a normal handler and then rethrow the exception when it is done. This is what I do in Leaf at the moment.

The personality routine

Even with all this preparatory work the personality routine still has a lot to do. It will be provided with the ‘_Unwind_Exception’ and an ‘_Unwind_Context’. With the context it can call ‘_Unwind_GetLanguageSpecificData’ to get the LSDA for this frame. Now that the LSDA is needed we can look at what it stores.

High level ‘catch’ statements allow you to specify which exceptions are being caught, like ‘catch( runtime_error )’. The personality routine needs this information to decide whether it actually handles an exception. It can’t generally be known at compile-time which types make it to this handler, thus it must be checked at runtime. The compiler stores the catch details in the LSDA. How exactly a runtime exception type is mapped to this static information is left as an exercise for the personality routine. In C++ this can be rather involved as it traverses the type hierarchy looking for matching base classes.

One could, in theory, store anything in the LSDA, but something like LLVM allows only a very limited amount of data. It seems like just enough to support C++ style exceptions. The LSDA is also not raw data, instead it is encoded to save space. GCC and LLVM use DWARF encoding. The personality routine is required to decode this data (surprisingly there is no standard library nor function which does this).

Recall this happens twice. In the first round ‘_Unwind_RaiseException’ is just looking for the appropriate handler, so the personality routine just returns a result code. In the second phase the handler code will actually be called. The personality handler does this by calling a few ‘_Unwind_Set’ functions and returning an execution code.

Recap for the costs

The term “zero-cost exception” is a bit of misnomer: it refers only to the runtime cost when exceptions are not being thrown. The actual overhead in throwing an exception is quite high. It can be considered from two aspects: size and speed. First, a quick recap on the vital points.

Data:

* try/finally/catch/destructors translate into exception frames stored in exception tables

* each frame specifies a personality routine and an LSDA

* catch specifications are stored in the LSDA, DWARF encoded

Execution:

* a native language exception is created (like ‘std::exception’)

* this is wrapped in an ‘_Unwind_Exception’ and passed to ‘_Unwind_RaiseException’

* the PC (program counter) is used to locate an entry in the exception tables

** a handler is sought

*** the personality routine is called with the exception details

*** the LSDA is located and decoded

*** the exception type is compared

*** return: handles exception, doesn’t handler, or cleanup

** the previous is done again, and for each cleanup and the final handler:

*** the execution context is setup with ‘_Unwind_Set*’

*** the native language code is executed

* ultimately the ‘exception_cleanup’ function is called

Size

The exception tables can get quite large for a program. It isn’t wasted data however, they are compactly encoded, and generally don’t store unnecessary data. This information captures the error handling logic of the program. I would generally not be concerned about the raw size of it, just like we aren’t normally concerned about the size of our code.

One potential issue comes from its separation from the main code. The assumption is that exceptions aren’t used often, thus we don’t want to waste the CPUs time loading unneeded data. This definitely improves the non-exception execution times. Should an exception occur however, this data must be loaded. This will likely involve a lot of cache misses and loads from main memory. If exceptions do happen frequently, the data will end up closer in the caches and the cost be somewhat minimized (at the very least the second iteration over the handlers will not need to reload the data). There might be a few domains where the cold loading time is an issue, but I’d suspect they are quite rare.

I’ve seen some comparisons, or rather complaints, about the size of exception data compared to not using exceptions. I’m not sure on the validity of most of these complaints. If one compares to a program which simply aborts on error, then obviously the ‘abort’ based code will be smaller. I will of course argue that it isn’t really doing error handling. My guess is that a robust return-value style handling, achieving the same program logic as exceptions, would be a comparable size.

Speed

The clearer cost for zero-cost exceptions is the runtime execution speed. Take a look at the list of steps again: it is quite involved! None of the individual steps is too costly, but they aren’t trivial, and the search loop could be run for a lot of exception frames. The DWARF encoding is also an issue here. It uses a variable length encoding that is relatively slow to decode. On the plus side, the encoding minimizes the data size and thus reduces the time the CPU waits to load it from memory.

Most programs are not likely impacted by this speed. Exceptions are often tied to events that already have a variable, or high time overhead. For example, the time of error detection from a user dialog is entirely irrelevant compared to the time it takes the user to complete the form. A less obvious case involves system calls, which tend to have a high overhead. For example, attempting to open a file is also a very costly operation, adding an exception to report missing files should not significantly alter this time. The same also applies to network activity, where the time overhead of the reading/writing and protocol handling renders the exception speed irrelevant.

Where exceptions are used heavily, without intervening input, they will become a speed bottleneck. This relates to discussions about misusing exceptions and the infamous phrase “exceptions are exceptional”. It results in an aversion to using exceptions, even in places where they might make sense. For example, a lot of libraries still use return values to indicate basic errors, reserving exceptions for when things “really go wrong”. (In fairness, a lot of the aversion is more likely related to bulky syntax rather than speed concerns.) This leaves me without a clear example of when the propagation speed is a genuine issue.

Feedback

Please let me know if you have real examples of problems with exception costs. That is, you have code where using exceptions resulted in a performance issue. Concrete examples would help explain the potential problems. It would also help me with Leaf, where I intend on unifying all error handling.