2016-10-07

This is a summary of the talk I gave at GrrCon '16.

We're always told: don't roll your own crypto!

This has always felt like a kind of abstinence-only education to me. Of course,
it's correct: if you decide to use your own encryption mechanism instead of,
say, TLS, you'll almost certainly do a worse job than the IETF. You'll certainly
fail at making a better block cipher than Daemen and Rijmen did. But there was
always a sort of "don't even learn about it" tone to this recommendation.

Is this recommendation effective? That is, do people or companies actually roll
their own crypto? Are the crypto systems they made horribly broken? I decided
to find out.

The Survey

I needed an area to survey to answer this question. Where can I find lots of
examples of custom cryptography? Can I find a lot of common issues in those
implementations?

It turns out that there is a lot of custom cryptography in one particular
place: custom single sign-on implementations. I found 21 of them, from
companies that offer some kind of custom single sign-on for their products.

Custom Single Sign-on

Single sign-on is any system which grants access to other systems by virtue of
being authenticated against it. For instance, Facebook Connect is a popular
single sign-on mechanism for many websites. Instead of registering with every
website you use, you can sign in with Facebook and the website will get the
user information it needs from Facebook directly on your behalf. OAuth2 and
SAML 2.0 are examples of open standards that provide single sign-on.

But what if that's not quite what you want?

What if you want "a few lines of PHP" in order to have users be authenticated
against your site? Best if it works with WordPress and whatever weird Java 6
system some of your enterprise customers use. No need to worry about what a
bearer token is and why you'd want to refresh it.

What if instead you made your own little crypto function that combined some
secret and gave it to your customers, who could then authenticate their users
to your service?

For instance, say Alice has a TODO list service that her customers buy. Alice
buys Bob's helpdesk software so that her customers can file support tickets
when they have a problem.

When one of Alice's customers wants to file a support ticket because their
TODOs were missing, Alice computes something like this:

Where H is some kind of HMAC or hash function (or even something terrible
only dreamt of in nightmares), and shared secret is a secret shared by Alice
and Bob.

Alice then redirects the user to Bob's website, with the result of that
computation and the user's email, like:

Bob then uses the same email address and the same shared secret, and hopefully
comes up with the same hash value, 59bcc3ad6775562f845953cf01624225. If so,
the user is successfully authenticated to Alice's support site, hosted by Bob.
The user didn't need to register on Bob's website, so, to the user, it was
seamless.

Since the user doesn't know the shared secret, the user can't compute the hash
value themselves.
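To make the flow concrete, here is a minimal Java sketch of the naive scheme, assuming H is plain MD5 over the shared secret followed by the email. The 32-hex-digit value above is the length of an MD5 digest, but the exact construction, the class and method names, and the support.example.com redirect URL are all hypothetical. This is the kind of construction the rest of this post picks apart, not a recommendation.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical sketch of a naive custom SSO token: H(sharedSecret + email), with H = MD5 (assumed).
// Shown only to illustrate the flow; the following sections explain why this is weak.
final class NaiveSso {
    // Both Alice and Bob compute this; the user never sees sharedSecret.
    static String token(String sharedSecret, String email) {
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            byte[] digest = md5.digest((sharedSecret + email).getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b)); // 16 bytes -> 32 hex characters
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 is required on every Java platform", e);
        }
    }

    // Alice: redirect the user to Bob with the email and the token (hypothetical host and parameters).
    static String redirectUrl(String sharedSecret, String email) {
        return "https://support.example.com/sso?email="
                + URLEncoder.encode(email, StandardCharsets.UTF_8)
                + "&hash=" + token(sharedSecret, email);
    }

    // Bob: recompute the token from the email and compare in constant time.
    static boolean verify(String sharedSecret, String email, String presentedHash) {
        byte[] expected = token(sharedSecret, email).getBytes(StandardCharsets.UTF_8);
        byte[] presented = presentedHash.getBytes(StandardCharsets.UTF_8);
        return MessageDigest.isEqual(expected, presented);
    }
}
```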

Common Flaws

The good thing about these custom single sign-on implementations is that they're
simple. The bad thing is that they're often dangerously insecure. For example,
a bug reported in Freshdesk resulted from the name and the
email being concatenated. There are plenty of tricky little bugs that can
impact these systems.
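The concatenation problem is easy to reproduce in general. This hypothetical Java sketch (not Freshdesk's actual code) shows that gluing fields together loses the boundary between them, so two different (name, email) pairs can yield the same message and therefore the same hash. Prefixing each field with its length, as the second method does, is one common way to make the encoding unambiguous.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration of the field-concatenation flaw (not any vendor's real code).
final class FieldEncoding {
    // Naive: name + email. The boundary between the two fields is lost.
    static String naiveMessage(String name, String email) {
        return name + email;
    }

    // Safer: prefix each field with its UTF-8 byte length and a ':' so the split point is unambiguous.
    static String lengthPrefixedMessage(String... fields) {
        StringBuilder sb = new StringBuilder();
        for (String field : fields) {
            int len = field.getBytes(StandardCharsets.UTF_8).length;
            sb.append(len).append(':').append(field);
        }
        return sb.toString();
    }
}
```

With naive concatenation, ("John", "doe@example.com") and ("Johnd", "oe@example.com") both become Johndoe@example.com; with length prefixes they become 4:John15:doe@example.com and 5:Johnd14:oe@example.com.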

For this study, I picked seven flaws that I thought would be common problems
with these custom SSO solutions, and examined each solution's publicly
available documentation and example code for the problems. I didn't do a deep
inspection of each implementation, but rather just enough to determine if the
flaw was present or not.

No HMAC

Essentially, these single sign-on implementations are trying to pass an
authenticated message through an untrusted third party, the user. The best way
to do that is with a message authentication code (MAC).

An HMAC combines a hash function, a secret key, and a message in a
secure way that resists length extension attacks and
provides preimage resistance. Not using an HMAC or any kind of real message authentication opens
up the SSO implementation to many different kinds of attacks.
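The Java standard library already ships HMAC. A minimal sketch using HMAC-SHA256 through javax.crypto.Mac, with a constant-time comparison on the receiving side; the class name and the choice of SHA-256 are illustrative assumptions, and what goes into the message (email, timestamp, and so on) is up to the protocol.

```java
import java.nio.charset.StandardCharsets;
import java.security.InvalidKeyException;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

// Sketch: authenticate an SSO message with HMAC-SHA256 instead of a bare hash.
final class HmacSso {
    // Sender (Alice): compute a 32-byte tag over the message with the shared key.
    static byte[] mac(byte[] key, String message) {
        try {
            Mac hmac = Mac.getInstance("HmacSHA256");
            hmac.init(new SecretKeySpec(key, "HmacSHA256"));
            return hmac.doFinal(message.getBytes(StandardCharsets.UTF_8));
        } catch (NoSuchAlgorithmException | InvalidKeyException e) {
            throw new IllegalStateException(e);
        }
    }

    // Receiver (Bob): recompute and compare in constant time, so timing does not leak matching bytes.
    static boolean verify(byte[] key, String message, byte[] tag) {
        return MessageDigest.isEqual(mac(key, message), tag);
    }
}
```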

Uses Obsolete Crypto Primitives

Does the implementation use known bad crypto primitives? For this, I counted
HMAC-MD5 as bad, as MD5 is known to be broken, even though there are no known
practical attacks against HMAC-MD5 specifically. As with some other flaws I
studied, not all of the problems I wanted to identify were critical. I also
wanted to study less important flaws to understand how quickly new crypto
primitives, like SHA-3, were being adopted.

Spoiler alert: no one used SHA-3.
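For what it's worth, adopting SHA-3 is not hard: on Java 9 and later, SHA3-256 is available through the standard MessageDigest API. A minimal sketch; the class name is hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

final class Sha3Example {
    // Returns the 32-byte SHA3-256 digest of the UTF-8 bytes of the input (Java 9+).
    static byte[] sha3_256(String input) {
        try {
            return MessageDigest.getInstance("SHA3-256").digest(input.getBytes(StandardCharsets.UTF_8));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA3-256 needs Java 9 or later", e);
        }
    }
}
```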

Short Keys

Shared secret keys are often distributed in hexadecimal, like this:

If you're not paying attention, you might do something like this, in Java:

This does not give you a 16-byte array like [0x35, 0xf7, … 0x3a]. Instead, it
gives you a 32-byte array of the UTF-8 representation of the string, like
this: [0x33, 0x35, … 0x61], which is almost certainly not what you meant.

If your cipher takes only the number of bytes it needs, it will leave some of
the key material out! The first 16 bytes of that array are just the ASCII codes
of the first 16 hex digits, which carry only 64 bits of key material. So if
you're using a 128-bit key, it could effectively be a 64-bit key. That's a
massive reduction in the number of possible keys.
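A minimal sketch of the mistake and the fix, assuming the key arrives as a hex string. The key used in the usage is made up, and the class and method names are hypothetical. The wrong version takes the bytes of the key's text; the right version decodes each pair of hex digits into one byte.

```java
import java.nio.charset.StandardCharsets;

// Sketch of the short-key bug and its fix.
final class HexKey {
    // WRONG: the UTF-8 bytes of the hex *text*, e.g. "35" -> {0x33, 0x35}. 32 hex chars -> 32 bytes.
    static byte[] wrong(String hexKey) {
        return hexKey.getBytes(StandardCharsets.UTF_8);
    }

    // RIGHT: decode each pair of hex digits into one byte. 32 hex chars -> 16 bytes.
    static byte[] decode(String hexKey) {
        if (hexKey.length() % 2 != 0) {
            throw new IllegalArgumentException("hex key must have an even number of digits");
        }
        byte[] out = new byte[hexKey.length() / 2];
        for (int i = 0; i < out.length; i++) {
            int hi = Character.digit(hexKey.charAt(2 * i), 16);
            int lo = Character.digit(hexKey.charAt(2 * i + 1), 16);
            if (hi < 0 || lo < 0) {
                throw new IllegalArgumentException("not a hex digit near position " + (2 * i));
            }
            out[i] = (byte) ((hi << 4) | lo);
        }
        return out;
    }
}
```

On Java 17 and later, java.util.HexFormat.of().parseHex(hexKey) does the same decoding. Taking the first 16 bytes of the wrong array gives you the ASCII codes of only the first 16 hex digits, which is exactly the 64-bit key described above.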

Replay Attacks

Since the "authenticated message" is being passed to a potentially untrusted
user, it's important to make sure that the message has some kind of expiration.
One simple way is just to attach a timestamp that expires soon after the
message is generated. Another way would be to use a nonce, which
ensures that the message cannot be used more than once.

If a user can execute a replay attack, they could use another user's
compromised SSO URL, or use an older URL of their own to stay logged in.
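A minimal sketch of both defenses on the receiving side, assuming the timestamp and nonce are inside the MAC'd message (otherwise an attacker could simply edit them) and that clocks are roughly synchronized. The class name and the 60-second window in the usage are hypothetical choices.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical replay check, run only after the MAC has verified.
final class ReplayGuard {
    private final long maxAgeSeconds;
    private final Set<String> seenNonces = new HashSet<>();

    ReplayGuard(long maxAgeSeconds) {
        this.maxAgeSeconds = maxAgeSeconds;
    }

    // Accepts a token only while it is fresh, and only once per nonce.
    // A real deployment would tolerate a little clock skew and evict nonces
    // older than the window so this set does not grow forever.
    synchronized boolean accept(long issuedAtEpochSeconds, String nonce, long nowEpochSeconds) {
        long age = nowEpochSeconds - issuedAtEpochSeconds;
        if (age < 0 || age > maxAgeSeconds) {
            return false; // expired, or claims to be issued in the future
        }
        return seenNonces.add(nonce); // add() returns false if the nonce was already used
    }
}
```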

Static Initialization Vector

Block ciphers have modes. These modes make it possible to use block ciphers on
more than one block of data. These modes typically require an initialization
vector (IV), which often must be random or at least unique. Some modes, like
CTR and CBC, require that the IV isn't reused, otherwise they leak information.
In CTR mode, IV reuse is particularly catastrophic, so much so that some crypto
experts are recommending against CTR mode.

For SSO implementations that used a block cipher, I wanted to see if they made
this classic error.
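A small Java demonstration of why a static IV is so bad in CTR mode: with the same key and IV, AES-CTR produces the same keystream, so XORing two ciphertexts cancels it out and yields the XOR of the two plaintexts, without the attacker ever touching the key. The all-zero key and IV in the usage are deliberately bad placeholders, and the class name is hypothetical.

```java
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import javax.crypto.Cipher;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;

final class CtrIvReuse {
    private static final SecureRandom RNG = new SecureRandom();

    // AES-CTR encryption; with a fixed (key, iv) pair the keystream is identical every time.
    static byte[] encryptCtr(byte[] key, byte[] iv, byte[] plaintext) {
        try {
            Cipher cipher = Cipher.getInstance("AES/CTR/NoPadding");
            cipher.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(key, "AES"), new IvParameterSpec(iv));
            return cipher.doFinal(plaintext);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Byte-wise XOR over the shorter of the two inputs.
    static byte[] xor(byte[] a, byte[] b) {
        byte[] out = new byte[Math.min(a.length, b.length)];
        for (int i = 0; i < out.length; i++) {
            out[i] = (byte) (a[i] ^ b[i]);
        }
        return out;
    }

    // The fix: a fresh random 16-byte IV per message, sent alongside the ciphertext.
    static byte[] freshIv() {
        byte[] iv = new byte[16];
        RNG.nextBytes(iv);
        return iv;
    }
}
```

The IV does not need to be secret, only never reused with the same key. An authenticated mode such as AES-GCM has the same no-reuse rule for its nonce, but also protects integrity.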

Known Plaintext

Usually it's best to limit what the attacker knows. Like the secret key. Best
not to share that with your attacker. But sometimes even knowing (or
controlling) the plaintext can help the attacker. With well designed crypto
systems, this shouldn't matter at all. Attackers could encode whatever messages
they want and not learn anything about other messages or the key. But many
crypto systems are not well designed, so I kept track of which implementations
had plaintexts that the attacker knew or controlled.
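As a toy illustration of why known plaintext matters for badly designed systems (this is not one of the surveyed implementations), consider repeating-key XOR: since C = P XOR K, an attacker who knows P for any stretch of ciphertext simply computes K = P XOR C. The class and method names are hypothetical.

```java
// Toy example: repeating-key XOR, broken by a single known (plaintext, ciphertext) pair.
final class XorKnownPlaintext {
    // "Encrypts" (and decrypts) by XORing each byte with the key, cycling through the key.
    static byte[] xorWithKey(byte[] data, byte[] key) {
        byte[] out = new byte[data.length];
        for (int i = 0; i < data.length; i++) {
            out[i] = (byte) (data[i] ^ key[i % key.length]);
        }
        return out;
    }

    // Since C = P XOR K, the keystream is K = P XOR C at every position the attacker knows.
    static byte[] recoverKeystream(byte[] knownPlaintext, byte[] ciphertext) {
        byte[] ks = new byte[Math.min(knownPlaintext.length, ciphertext.length)];
        for (int i = 0; i < ks.length; i++) {
            ks[i] = (byte) (knownPlaintext[i] ^ ciphertext[i]);
        }
        return ks;
    }
}
```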

Random Crap

This category is a bit tongue-in-cheek, and actually came about after reviewing
the implementations. I noticed a lot of weird stuff that absolutely has no
effect on the crypto guarantees (or lack thereof) of the system. Twiddling
bits, reversing strings, taking the MD5 of the SHA-1 of the MD5 of the SHA-1
of the key, and so on.

Survey Results

Here are the aggregate problems found in the 21 custom single sign-on
implementations studied:

Several of the implementations that had the short-key problem used an HMAC,
which does not truncate the key, so in those cases it is less a vulnerability
than sloppy programming. Similarly, using obsolete primitives is not always
immediately exploitable, but it is no longer best practice.

One implementation used a block cipher (AES) in a mode that requires the IV to
be used only once, and it failed to do so.

Only one implementation was free from all problems studied.

The response from vendors was disappointing. Of the 20 implementations that had
problems, nearly half did not acknowledge my vulnerability report. Two claimed
that the problems I found were not bugs. Only one implementation fixed the bugs
I reported.

Custom cipher

Interestingly, one implementation decided that even traditional cryptography
primitives, like MD5 or SHA-1 or AES were too fancy for them and wanted to make
their own. Here it is, edited for clarity:

This code has several obvious flaws. It operates on a per-character basis,
which means there's no avalanche effect. It naively adds together a hex
character (0-9, a-f) and the plaintext character, and then base 36 encodes the
sum. For some reason it then reverses the resulting two characters, probably to
add some mystery.

Here's a table of what a few iterations of this function do for a plaintext
of ASCII zeroes and the key "hello":

Plaintext   Key   Val   Adder   Addition   Base 36
0           a     48    97      145        41
0           a     48    97      145        41
0           f     48    102     150        46
0           4     48    52      100        2S
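A hypothetical Java reconstruction of the cipher from the description and table above (not the vendor's code): add the ASCII code of each plaintext character to the ASCII code of a hex key character, write the sum in uppercase base 36, and swap the two resulting characters. The table appears to show the base 36 values before that swap. How the key lines up with longer plaintexts is not shown, so cycling through it is an assumption.

```java
import java.util.Locale;

// Hypothetical reconstruction of the custom cipher from its description; not the original code.
final class ToyCipher {
    static String encrypt(String plaintext, String hexKey) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < plaintext.length(); i++) {
            int val = plaintext.charAt(i);                  // "Val": ASCII code of the plaintext character
            int adder = hexKey.charAt(i % hexKey.length()); // "Adder": ASCII code of a hex key character (assumed to cycle)
            int sum = val + adder;                          // "Addition"
            // "Base 36": printable ASCII plus a hex digit gives 80..228, always two base 36 digits.
            String pair = Integer.toString(sum, 36).toUpperCase(Locale.ROOT);
            out.append(pair.charAt(1)).append(pair.charAt(0)); // the pointless swap of the two characters
        }
        return out.toString();
    }
}
```

Encrypting four ASCII zeroes with the key characters a, a, f, 4 gives the table's pairs 41, 41, 46, 2S, each swapped, so the output is 141464S2.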

To reverse this, it's a simple matter of taking the ciphertext and the plaintext and undoing the operations to recover the secret key. "Val" is the ASCII value of the plaintext character, "Decimal" is the decimal value of the base 36 number, "Subtract" is Decimal minus Val, and "Key" is the character with that ASCII value.

Plaintext   Base 36   Val   Decimal   Subtract   Key
0           41        48    145       97         a
0           41        48    145       97         a
0           46        48    150       102        f
0           2S        48    100       52         4

Here's the code for exactly that:

There's probably a cleverer ciphertext-only attack, given how bad this cipher is, but I didn't bother, because in this scenario the attacker has access to both the plaintext and the ciphertext. For anyone who has done any cryptanalysis or CTFs, this "encryption" function is a joke.
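For illustration, here is a hypothetical Java sketch of that known-plaintext key recovery, assuming the cipher works as described above (sum of ASCII codes, base 36, the two characters swapped) rather than reproducing the vendor's exact code: un-swap each ciphertext pair, parse it as base 36 ("Decimal"), and subtract the plaintext's ASCII code ("Val") to get the key character.

```java
// Hypothetical known-plaintext key recovery; not the original attack code.
final class ToyCipherKeyRecovery {
    static String recoverKey(String plaintext, String ciphertext) {
        StringBuilder key = new StringBuilder();
        for (int i = 0; i < plaintext.length(); i++) {
            char first = ciphertext.charAt(2 * i);
            char second = ciphertext.charAt(2 * i + 1);
            // Undo the swap, then parse the pair as base 36 ("Decimal"); parseInt accepts either letter case.
            int decimal = Integer.parseInt(String.valueOf(new char[] {second, first}), 36);
            int subtract = decimal - plaintext.charAt(i); // "Subtract": Decimal minus Val
            key.append((char) subtract);                  // "Key": the hex key character
        }
        return key.toString();
    }
}
```

Feeding it the four zeroes and the swapped ciphertext 141464S2 returns aaf4, the key characters from the table.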

When the attacker gains the shared secret key, the attacker can then impersonate any user, including admins. This is a classic privilege escalation attack, done over single sign-on.

Takeaways

Should you roll your own crypto? No.

Do people roll their own anyway? Yes.

The standard recommendation from the security community about learning and
implementing your own cryptography has been to avoid it. Let cryptographers do
cryptography. However, it's clear from these results that people will implement
their own crypto regardless.

This strikes me as similar to abstinence-only education. We've tried telling
everyone not to write custom crypto code, but because of product demands,
ignorant developers, or hubris, that hasn't been successful. I think it's time
we try a different tactic: teach everyone crypto. Make it a standard part of
being a software engineer. Everyone should at least know what an HMAC is and
why it should be used, what an initialization vector is and how to handle them,
and how to securely generate random numbers (hint: use /dev/urandom).
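In Java, the equivalent advice is to use java.security.SecureRandom rather than java.util.Random; on Linux its default configuration typically draws on the operating system's generator such as /dev/urandom, though the exact source depends on the JDK setup. A minimal sketch of generating a 256-bit shared secret and hex-encoding it for distribution; the class and method names are hypothetical.

```java
import java.security.SecureRandom;

// Sketch: generate a random shared secret and hex-encode it.
final class SecretGen {
    private static final SecureRandom RNG = new SecureRandom();

    // numBytes = 32 gives a 256-bit secret, encoded as 64 lowercase hex characters.
    static String newHexSecret(int numBytes) {
        byte[] key = new byte[numBytes];
        RNG.nextBytes(key);
        StringBuilder hex = new StringBuilder(numBytes * 2);
        for (byte b : key) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }
}
```

Whoever receives this string should hex-decode it back into 32 bytes rather than calling getBytes() on it.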

There should be simple, imitable implementations of various real world problems
that cryptography solves. No one wants to make insecure systems. We need to
make it harder to mess up.

And, should you learn cryptography? Yes.

Resources for learning cryptography

I covered this in an older blog post of mine,
"So, you want to crypto", but it needs updating.
So here's the updated version:

Courses

A good place to start is Cryptography I at Coursera. Your local
university's cryptography course is also handy. Formal courses are good for a
foundation in some of the mathematics behind cryptography.

Books

My favorite practical cryptography book is far and away
Cryptography Engineering by Ferguson et al. It combines the best
aspects of theoretical knowledge and practical experience. However, there are
many good cryptography books. Some, like
Introduction to Cryptography with Coding Theory, are entirely
on the theoretical/mathematical side, but also cover some historical ciphers,
which sometimes pop up in CTFs.

Learn to break it

Learning to break cryptography is perhaps the most effective way. After you
get a solid foundation from an introductory course or book, I'd recommend doing
the Cryptopals Crypto Challenges. This will get you thinking
about how crypto systems break, which is arguably the most important way to
design and examine them.

Learn by imitation

Look at pre-existing solutions to problems. Most crypto protocols have had
several versions where they fixed critical bugs. That's interesting because you
can learn from their mistakes. Implementations that I'd recommend are:
AWS Authentication, OAuth2, and Double Ratchet. Why were
they designed the way they were? What flaws do they have? Is their complexity
necessary? Are they too simple?
