2017-01-22

While I was at POPL the last few days, I was reminded of an idea
regarding how to bring more structure to the unsafe code guidelines
process that I’ve been kicking around lately, but which I have yet to
write about publicly. The idea is fresh on my mind because while at
POPL I realized that there is an interesting opportunity to leverage
the “blame” calculation techniques from gradual typing research. But
before I get to blame, let me back up and give some context.

The guidelines should be executable

I’ve been thinking for some time that, whatever guidelines we choose,
we need to adopt the principle that they should be automatically
testable. By this I mean that we should be able to compile your
program in a special mode (“sanitizer mode”) which adds in extra
assertions and checks. These checks would dynamically monitor what
your program does to see if it invokes undefined behavior: if they
detect UB, then they will abort your program with an error message.

Plenty of sanitizers or sanitizer-like things exist for C, of course.
My personal favorite is valgrind, but there are a number of
other examples (the
data-race detector for Go
also falls in a similar category). However, as far as I know, none of
the C sanitizers is able to detect the full range of undefined
behavior. Partly this is because C UB includes untestable (and, in my
opinion, overly aggressive) rules like “every loop should do I/O or
terminate”. I think we should strive for a sound and complete
sanitizer, meaning that we guarantee that if there is undefined
behavior, we will find it, and that we have no false positives. We’ll
see if that’s possible. =)

The really cool thing about having the rules be executable (and
hopefully efficiently executable) is that, in the (paraphrased)
words of John Regehr, it changes
the problem of verifying safety from a formal one into a matter of
test coverage, and the latter is much better understood. My ultimate
goal is that, if you are the developer of an unsafe library, all you
have to do is to run cargo test --unsafe (or some such thing), and
all of the normal tests of your library will run but in a special
sanitizer mode where any undefined behavior will be caught and flagged
for you.

But I think there is one other important side-effect. I have been (and
remain) very concerned about the problem of programmers not
understanding (or even being aware of) the rules regarding correct
unsafe code. This is why I originally wanted a system like the Tootsie
Pop rules, where programmers have to learn as few things as possible.
But having an easy and effective way of testing for violations changes
the calculus here dramatically: I think we can likely get away with
much more aggressive rules if we can test for violations. To play on
John Regehr’s words, this changes the problem from being one of having
to learn a bunch of rules to having to interpret error messages. But
for this to work well, of course, the error messages have to be
good. And that’s where this idea comes in.

Proof of concept: miri

As it happens, there is an existing project that is already doing a
limited form of the kind of checks I have in mind: miri, the MIR
interpreter created by Scott Olson and now with
significant contributions by Oliver Schneider. If you haven’t seen
or tried miri, I encourage you to do so. It is very cool and
surprisingly capable – in particular, miri can not only execute safe
Rust, but also unsafe Rust (e.g., it is able to interpret the
definition of Vec).

The way it does this is to simulate the machine at a reasonably
low level. So, for example, when you allocate memory, it stores that
as a kind of blob of bytes of a certain size. But it doesn’t only
store bytes; rather, it tracks additional metadata about what has been
stored into various spots. For example, it knows whether memory has
been initialized or not, and it knows which bits are pointers (which
are stored opaquely, not with an actual address). This allows it to
interpret a lot of unsafe code, but it also allows it to detect
various kinds of errors.
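To give a feel for what that metadata might look like, here is a minimal sketch, using hypothetical types rather than miri’s actual data structures, of an allocation whose bytes are stored abstractly rather than as raw machine bytes:

```rust
// Illustrative only: these are not miri's real types, just a sketch of
// the idea that every byte carries metadata the interpreter can check.

#[derive(Clone, Debug)]
enum AbstractByte {
    Uninitialized,        // allocated but never written
    Raw(u8),              // an ordinary initialized byte
    PointerFragment {     // one byte of a pointer, stored opaquely:
        alloc: usize,     //   which allocation it points into,
        which_byte: u8,   //   and which byte of the pointer this is,
    },                    //   rather than a concrete machine address
}

#[derive(Debug)]
struct Allocation {
    bytes: Vec<AbstractByte>,
}

impl Allocation {
    fn new(size: usize) -> Self {
        Allocation { bytes: vec![AbstractByte::Uninitialized; size] }
    }
}

fn main() {
    // A fresh 4-byte allocation: every byte starts out uninitialized,
    // so any attempt to read it can be flagged as an error.
    let a = Allocation::new(4);
    println!("{:?}", a.bytes);
}
```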

An example

Let’s start with a simple example of some bogus unsafe code.
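Something along these lines (a reconstruction for illustration; the exact names and types are not important, only the shape):

```rust
fn main() {
    let mut b = Box::new(22);
    innocent_looking_fn(&b); // claims to only borrow `b`...
    *b += 1;                 // ...but by now the box has been freed
    println!("{}", b);
}

fn innocent_looking_fn(b: &Box<usize>) {
    // The safe-looking signature hides the fact that the unsafe block
    // reads the box out from behind the shared reference, producing a
    // second owner that frees the heap memory when it is dropped.
    unsafe {
        let _owned: Box<usize> = std::ptr::read(b);
    }
}
```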

The problem here is that this “innocent looking function” claims to
borrow the box b but it actually frees it. So now when main()
comes along to execute *b += 1, the box b has been freed. This
situation is often called a “dangling pointer” in C land. We might
expect, then, that when we execute this program, something dramatic
will happen, but that is not (necessarily) the case:
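Here is roughly what a session looks like (the file name is made up, and the exact behavior can vary by platform and allocator, but often the program simply prints 23 as if all were well):

```
$ rustc innocent.rs
$ ./innocent
23
```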

As you can see, I got no error or any other indication that something
went awry. This is because, internally, freeing the box just throws
its address on a list for later re-use. Therefore when I later make
use of that address, it’s entirely possible that the memory is still
sitting there, waiting for me to use it, even if I’m not supposed to.
This is part of what makes tracking down a “use after free” bug
incredibly frustrating: oftentimes, nothing goes wrong! (Until it
does.) It’s also why we need some kind of sanitizer mode that will
do additional checks beyond what really happens at runtime.

Detecting errors with miri

But what happens when I run this through miri?
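The exact wording of miri’s diagnostics has varied over time, so the output below is only illustrative of the shape of the report: an undefined-behavior error plus a trace pointing at the offending access.

```
error: dangling pointer was dereferenced
 --> innocent.rs:4:5
  |
4 |     *b += 1;
  |     ^^^^^^^
  |
  = note: inside `main` at innocent.rs:4:5
```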

(First, before going further, let’s just take a minute to be impressed
by the fact that miri bothered to give us a nice stack trace here. I
had heard good things about miri, but before I started poking at it
for this blog post, I expected something a lot less polished. I’m
impressed.)

You can see that, unlike the real computer, miri detected that *b
was freed when we tried to access it. It was able to do this because
when miri is interpreting your code, it does so with respect to a more
abstract model of how a computer works. In particular, when memory is
freed in miri, miri remembers that the address was freed, and if there
is a later attempt to access it, an error is thrown. (This is very
similar to what tools like valgrind and electric fence do as well.)

So even just using miri out of the box, we are starting to get a
certain set of sanitizer checks. Whatever the unsafe code
guidelines turn out to be, one can be sure that they will declare it
illegal to access freed memory. As this example demonstrates, running
your code through miri could help you detect a violation.

Blame

This example also illustrates another interesting point about a
sanitizer tool. The point where the error is detected is not
necessarily telling you which bit of code is at fault. In this
case, the error occurs in the safe code, but it seems clear that the
fault lies in the unsafe block in innocent_looking_fn(). That
function was supposed to present a safe interface, but it failed to do
so. Unfortunately, for us to figure that out, we have to trawl through
the code, executing backwards and trying to figure out how this freed
pointer got into the variable b. Speaking as someone who has spent
years of his life doing exactly that, I can tell you it is not fun.
Anything we can do to get a more precise notion of what code is at
fault would be tremendously helpful.

It turns out that there is a large body of academic work that I think
could be quite helpful here. For some time, people have been exploring
gradual typing systems. This
is usually aimed at the software development process: people want to
be able to start out with a dynamically typed bit of software, and
then add types gradually. But it turns out when you do this, you have
a similar problem: your statically typed code is guaranteed to be
internally consistent, but the dynamically typed code might well feed
it values of the wrong types. To address this, blame systems
attempt to track where you crossed between the static and dynamic
typing worlds so that, when an error occurs, the system can tell you
which bit of code is at fault.

Traditionally this blame tracking has been done using proxies and
other dynamic mechanisms, particularly around closures. For example,
Jesse Tov’s Alms language allocated stateful proxies to allow
for owned types to flow into a language that didn’t understand
ownership (this is sort of roughly analogous to dynamically wrapping a
value in a RefCell). Unfortunately, introducing proxies doesn’t seem
like it would really work so well for a “no runtime” language like
Rust. We could probably get away with it in miri, but it would never
scale to running arbitrary C code.

Interestingly, at this year’s POPL, I saw a paper that seemed to
present a solution to this problem. In
Big types in little runtime, Michael Vitousek, Cameron
Swords (ex-Rust intern!), and Jeremy Siek describe a system for doing
gradual typing in Python that works even without modifying the Python
runtime – this rules out proxies, because the runtime would have to
know about them. Instead, the statically typed code keeps a log “on
the side” which tracks transitions to and from the unsafe code and
other important events. When a fault occurs, they can read this log
and reconstruct which bit of code is at fault. This seems eminently
applicable to this setting: we have control over the safe Rust code
(which we are compiling in a special mode), but we don’t have to
modify the unsafe code (which might be in Rust, but might also be in
C). Exciting!
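To make the log idea concrete, here is a toy sketch with entirely made-up event types; neither miri nor the paper is structured this way, but it shows how a log kept at the safe/unsafe boundary lets a fault detected in safe code be traced back to the unsafe region that caused it:

```rust
// Purely illustrative: instrumented safe code appends events as
// control and values cross the safe/unsafe boundary; when a fault is
// detected, we walk the log backwards to assign blame.

#[derive(Debug)]
enum Event {
    EnteredUnsafe { location: &'static str },
    FreedAlloc { alloc: usize, location: &'static str },
    ExitedUnsafe { location: &'static str },
}

fn blame_for_use_after_free(log: &[Event], faulting_alloc: usize) -> Option<&'static str> {
    // The most recent unsafe region that freed this allocation is the
    // natural suspect, even though the fault itself fired in safe code.
    log.iter().rev().find_map(|event| match event {
        Event::FreedAlloc { alloc, location } if *alloc == faulting_alloc => Some(*location),
        _ => None,
    })
}

fn main() {
    let log = vec![
        Event::EnteredUnsafe { location: "innocent_looking_fn" },
        Event::FreedAlloc { alloc: 0xdead, location: "innocent_looking_fn" },
        Event::ExitedUnsafe { location: "innocent_looking_fn" },
    ];
    // Later, safe code faults while touching allocation 0xdead:
    println!("{:?}", blame_for_use_after_free(&log, 0xdead));
    // => Some("innocent_looking_fn")
}
```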

Conclusion

This post has two purposes, in a way. First, I want to advocate for
the idea that we should define the unsafe code guidelines in an
executable way. Specifically, I think we should specify predicates
that must hold at various points in the execution. In this post we saw
a simple example: when you dereference a pointer, it must point to
memory that has been allocated and not yet freed. (Note that this
particular rule only applies to the moment at which the pointer is
dereferenced; at other times, the pointer can have any value you want,
though it may wind up being restricted by other rules.) It’s much more
interesting to think about assertions that could be used to enforce
Rust’s aliasing rules, but that’s a good topic for another post.
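As a small sketch of what an “executable” rule could look like, the pointer-dereference rule above might be phrased as a predicate over a hypothetical model of the interpreter’s memory state (the types here are illustrative, not a proposal):

```rust
use std::collections::HashMap;

struct Pointer { alloc: usize, offset: usize }

struct MemoryState {
    // Size of each live allocation, keyed by allocation id; freed
    // allocations are simply removed from the map.
    live: HashMap<usize, usize>,
}

/// Must hold at the moment a pointer is dereferenced (and only then;
/// at other times the pointer may hold any value at all).
fn deref_is_valid(mem: &MemoryState, ptr: &Pointer, len: usize) -> bool {
    match mem.live.get(&ptr.alloc) {
        Some(&size) => ptr.offset.checked_add(len).map_or(false, |end| end <= size),
        None => false, // never allocated, or already freed
    }
}

fn main() {
    let mut mem = MemoryState { live: HashMap::new() };
    mem.live.insert(1, 8); // allocation #1 is 8 bytes long
    assert!(deref_is_valid(&mem, &Pointer { alloc: 1, offset: 4 }, 4));
    mem.live.remove(&1);   // free it
    assert!(!deref_is_valid(&mem, &Pointer { alloc: 1, offset: 4 }, 4));
    println!("pointer-dereference rule checks passed");
}
```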

Probably the best way for us to do this is to start out with a minimal
“operational semantics” for a representative subset of MIR (basically a
mathematical description of what MIR does) and then specify rules by
adding side-clauses and conditions into that semantics. I have been
talking to some people who might be interested in doing that, so I
hope to see progress here.

That said, it may be that we can instead do this exploratory work by
editing miri. The codebase seems pretty clean and capable, and a lot of the
base work is done.

In the long term, I expect we will want to instead target a platform
like valgrind, which would allow us to apply these rules even to
unsafe C code. I’m not sure if that’s really feasible, but it seems
like the ideal.

The second purpose of the post is to note the connection with gradual
typing and the opportunity to apply blame research to the problem. I
am very excited about this, because I’ve always felt that guidelines
based simply on undefined behavior were going to be difficult for
people to use, since errors are often detected in code that is
quite disconnected from the origin of the problem.
