2014-07-23

There was an interesting /r/DailyProgrammer challenge this week
to write a program that properly hashes passwords for storage in an
account database. It's a necessary part of any system that needs to
securely authenticate users. Storing user passwords in plain text is
not just bad practice, it's irresponsible. If the database is
compromised, the attacker learns every user's password. Since people
are likely to re-use passwords on different websites, the database
could be used to infiltrate accounts on other sites.

The solution to this problem is to run the password through a one-way
cryptographic hash function before storing it in the database. When
the database is compromised, it's more difficult to work backwards to
recover the passwords. Examples of one-way hash functions are MD5,
SHA-1, and the SHA-2 family. Block ciphers can also be converted into
hash functions (e.g. bcrypt from Blowfish), though it must be done
carefully since block ciphers were generally designed with different
goals in mind.

However, these particular functions (SHA-1 and SHA-2) alone are poor
password hashing functions: they're much too fast! An offline attacker
can mount a rapid brute-force attack on these kinds of hashes. They
also don't include a salt, a unique, non-secret, per-hash value used
as additional hash function input. Without this, an attacker could
prepare the entire attack ahead of time -- a rainbow table. Once the
hashes are obtained, reversing them is just a matter of looking them
up in the table.

Good password hashing needs to be slow and it needs support for salt.
Examples of algorithms well-suited for this purpose are PBKDF2,
bcrypt, and scrypt. These are the functions you'd want to use in a
real application today. Each of these are also more generally key
derivation functions. They can strengthen a relatively short
human-memorable passphrase by running it through a long, slow
procedure before making use of it as an encryption key. An brute-force
attacker would need to perform this slow procedure for each individual
guess.

Alternatively, if you're stuck using a fast hash function anyway, it
could be slowed down by applying the function thousands or even
millions of times recursively to its own output. This is what I did
in order to strengthen my GPG passphrase. However, you're still
left with the problem of applying the salt. The naive approach would
be a plain string concatenation with the password, but this likely to
be vulnerable to a length extension attack. The proper approach
would be to use HMAC.

RC4 Solution

For my solution to the challenge, I wasn't looking for something
strong enough to do key derivation. I just need a slow hash function
that properly handles a salt. Another important goal was to keep the
solution small enough to post as a reddit comment, and I wanted to do
it without using any external crypto libraries. If I'm using a
library, I might as well just include/import/require PBKDF2 and make
it a 2-liner, but that would be boring. I wanted it to be a reasonably
short C program with no external dependencies other than standard
libraries. Not counting the porcelain, the final result weighs in at
115 lines of C, so I think I achieved all my goals.

So what's the smallest, modern cryptographic algorithm I'm aware of?
That would be RC4, my favorite random generator! Unlike
virtually every other cryptographic algorithm, it's easy to commit to
memory and to implement from scratch without any reference
documentation. Similarly, this password hashing function can be
implemented entirely from memory (if you can imagine yourself in some
outlandish sitation where that would be needed).

Unfortunately, RC4 has had a lot of holes punched in it over the
years. The initial output has been proven to be biased, leaking key
material, and there's even good reason to believe it may already be
broken by nation state actors. Despite this, RC4 remains the most
widely used stream cipher today due to its inclusion in TLS. Most
importantly here, almost none of RC4's weaknesses apply to this
situation -- we're only using a few bytes of output -- so it's still a
very strong algorithm. Besides, what I'm developing is a proof of
concept, not something to be used in a real application. It would be
interesting to see how long it takes for someone to break this (maybe
even decades).

Before I dive into the details, I'll link to the source repository. As
of this writing there are C and Elisp implementations of the
algorithm, and they will properly validate each other's hashes. I call
it RC4HASH.

https://github.com/skeeto/rc4hash

Here are some example hashes for the password "foobar". It's different
each time because each has a unique salt. Notice the repeated byte
12 in the 5th byte position of the hash.

Each also validates as correct.

The Algorithm

RC4 is a stream cipher, which really just means it's a fancy random
number generator. How can we turn this into a hash function? The
content to be hashed can be fed to the key schedule algorithm in
256-byte chunks, as if it were a key. The key schedule is a cipher
initialization stage that shuffles up the cipher state without
generating output. To put all this in terms of C, here's what the RC4
struct and initialization looks like: a 256-element byte array
initialized with 0-255, and two array indexes.

The key schedule shuffles this state according to a given key.

Notice it doesn't touch the array indexes. It can be called over and
over with different key material to keep shuffling the state. This is
how I'm going to mix the salt into the password. The key schedule will
first be run on the salt, then again on the password.

To produce the hash output, emit the desired number of bytes from the
cipher. Ta da! It's now an RC4-based salted hash function.

Both password and salt are the inputs to hash function. In order to
validate a password against a hash, we need to keep track of the salt.
The easiest way to do this is to concatenate it to the hash itself,
making it part of the hash. Remember, it's not a secret value, so this
is safe. For my solution, I chose to use a 32-bit salt and prepend it
to 20 bytes of generator output, just like an initialization vector
(IV). To validate a user, all we need is a hash and a password
provided by the user attempting to authenticate.

Fixing a Flaw

Right now there's a serious flaw. If you want to find it for yourself,
stop reading here. It will need to get fixed before this hash function
is any good.

It's trivial to find a collision, with is the death knell for any
cryptographic hash function. Certain kinds of passwords will collapse
down to the simplest case.

Notice a pattern? Take a look at the RC4 key schedule function. Using
modular arithmetic, the password wraps around repeating itself over
256 bytes. This means passwords with repeating patterns will mutate
the cipher identically regardless of the number of repeats, so they
result in the same hash. A password "abcabcabc" will be accepted as
"abc".

The fix is to avoid wrapping the password. Instead, the RC4 generator,
seeded only by the salt, is used to pad the password out to 256 bytes
without repeating.

This should also help mix the RC4 state up a bit more before
generating the output. I'm no cryptanalyst, though, so I don't know if
it's worth much.

Slowing It Down

The next problem is that this is way too fast! It shuffles bytes
around for a few microseconds and it's done. So far it also doesn't
address the problems of biases in RC4's initial output. We'll kill two
birds with one stone for this one.

To fix this we'll add an adaptive difficulty factor. It will be a
value that determines how much work will be done to compute the hash.
It's adaptive because the system administrator can adjust it at any
time without affecting previous hashes. To accomplish this, like the
salt, the difficulty factor will be appended to the hash output. All
the required information will come packaged together in the hash.

The difficulty factor comes into play in two areas. First, it
determines how many times the key schedule is run. This is the same
modification CipherSaber-2 uses in order to strengthen RC4's weak
key schedule. However, rather than run it on the order of 20 times,
our hash function will be running it hundreds of thousands of times.
Second, the difficulty will also determine how many initial bytes of
output are discarded before we start generating the hash.

I decided on an unsigned 8-bit value for the difficulty. The number of
key schedules will be 1 shifted left by this number of bits (i.e.
pow(2, difficulty)). This makes the minimum number of key schedules
1, since any less doesn't make sense. The number of bytes skipped is
the same bitshift, minus 1, times 64 ((pow(2, difficulty) - 1) * 64),
the muliplication is so that it can skip large swaths output.
Therefore the implementation so far has a difficulty of zero: one key
schedule round and zero bytes of output skipped.

The dynamic range of the difficulty factor (0-255) puts the the time
needed on a modern computer to compute an RC4HASH between a few
microseconds (0) to the billions of years (255). That should be a more
than sufficient amount of future proofing, especially considering that
we're using RC4, which will likely be broken before the difficulty
factor ever tops out.

I won't show the code to do this since that's how it's implemented in
the final version, so go look at the repository instead. The final
hash is 26 bytes long: a 208-bit hash. The first 4 bytes are the salt
(grabbed from /dev/urandom in my implementations), the byte is the
difficulty factor, and the final 21 bytes are RC4 output.

In the example hashes above, the 12 constant byte is the difficulty
factor. The default difficulty factor is 18 (0x12). I've considered
XORing this with some salt-seeded RC4 output just to make the hash
look nice, but that just seems like arbitrary complexity for no real
gains. With the default difficulty, it takes almost a second for my
computers to compute the hash.

I believe RC4HASH should be quite resistant to GPGPU attacks. RC4 is
software oriented, involving many random array reads and writes rather
than SIMD-style operations. GPUs are really poor at this sort of
thing, so they should take a significant performance hit when running
RC4HASH.

Break My Hash!

For those interested in breaking RC4HASH, here are a couple of hashes
of English language passphrases. Each is about one short sentence in
length ([a-zA-Z !.,]+). I'm not keeping track of the sentences I used,
so the only way to get them will be to break the hash, even for me.

If you can find a string that validates with these hashes, especially
if it's not the original passphrase, you win! I don't have any prizes
in mind right now, but perhaps some Bitcoin would be in order if your
attack is interesting.

Show more