2015-07-08

Someone asked: why does the Express session middleware add a hash suffix to the session id cookie? A great question.

But first the obligatory disclaimer: like any security advice from someone who doesn’t know the specifics of your own system, this is for educational purposes only. Security is a complex and very specific area, and if you are concerned about the security of your system you should hire an expert who can review it along with a threat analysis and provide the appropriate advice.

Brute Force

Brute force attacks are those in which the attacker tries to gain access to the system by making repeated requests using different credentials (until one works). The most common example is an attacker who tries guessing a user password. This is why passwords should be long and avoid dictionary words: both make them harder to guess. Properly designed systems keep track of failed authentication requests and escalate the issue when it appears an attack is in progress.

Passwords are not the only credential used in web authentication. The most common implementation includes a login page which upon successful authentication sets a session cookie on the client. The session cookie acts as a bearer token – whoever shows up with the token is considered to be the authenticated user. Setting a session cookie removes the need to enter your username and password on every page. However, this session cookie now acts as the sole authentication key and anyone who gains access to this key will gain access to the system. Cookies are, after all, just a simple string of characters.

A session id guessing attack is a type of brute force attack. Instead of trying to guess the password, the attacker tries to guess the session id and forge the authentication cookie. The attacker generates session ids and makes requests using those ids, in the hope that they will match actual active sessions. For example, if a web application’s session ids are generated in sequence, an attacker can look up their own session id and, based on that, forge requests using nearby session id values. To protect against this attack we need to make guessing session ids impractical. Note I’m saying “impractical,” not “impossible”.

Impracticality

The first step is to make sure session ids are sufficiently long and non-sequential. Just like passwords, the longer the session id is, the harder it is to find a valid one by guessing. It is also critical that session ids are not generated using a predictable algorithm such as a counter because if such logic exists, the attacker is no longer guessing but generating session ids. Using a cryptographically secure random number generator to produce sufficiently long session ids is the best common practice. What’s “sufficiently long”? Well, that depends on the nature of your system. The size has to translate into an impractical effort to guess a valid session id.
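To make the idea concrete, here is a minimal sketch in Python (chosen only for illustration; it is not how the Express middleware does it). The 16-byte size and the function names are my assumptions, not a recommendation: the right size depends on your system. The second function gives a rough estimate of how many guesses an attacker needs, on average, before hitting any one active session.

```python
import secrets

# Assumption for illustration: 16 random bytes (128 bits). Pick a size
# that fits your own threat analysis.
SESSION_ID_BYTES = 16


def new_session_id(num_bytes: int = SESSION_ID_BYTES) -> str:
    """Return a URL-safe session id drawn from the OS-level secure generator."""
    return secrets.token_urlsafe(num_bytes)


def expected_guesses(num_bytes: int, active_sessions: int) -> float:
    """Rough average number of random guesses needed to hit any active session.

    Each guess succeeds with probability active_sessions / 2**bits, so the
    expected number of guesses is about the inverse of that probability.
    """
    return 2 ** (8 * num_bytes) / active_sessions


if __name__ == "__main__":
    print(new_session_id())
    print(expected_guesses(SESSION_ID_BYTES, 1_000_000))
```

The more active sessions a system has, the more targets the attacker has to hit, which is one reason “sufficiently long” is a system-specific number.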

Another way to prevent an attacker from guessing session ids is to build integrity into the token by adding a hash or signature to the session cookie. The way the Express session middleware does this is by calculating a hash over the combination of the session id and a secret. Since calculating the hash requires possession of the secret, an attacker will not be able to generate valid session ids without guessing the secret (or just trying to guess the hash). Just like strong random session ids, the hash size must match the security requirements of the specific application it is meant to protect. This is because, in the end, the session cookie is still just a string and open to guessing attacks.
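Here is a minimal sketch of the hash-suffix idea, again in Python for illustration. It is not the middleware’s code: the use of HMAC-SHA256 as the keyed hash, the `.` separator, and the `sign`/`unsign` names are assumptions I made for the example.

```python
import base64
import hashlib
import hmac
from typing import Optional


def sign(session_id: str, secret: bytes) -> str:
    """Append a keyed hash of the session id, separated by a dot."""
    mac = hmac.new(secret, session_id.encode("utf-8"), hashlib.sha256).digest()
    suffix = base64.urlsafe_b64encode(mac).rstrip(b"=").decode("ascii")
    return f"{session_id}.{suffix}"


def unsign(cookie_value: str, secret: bytes) -> Optional[str]:
    """Return the session id if the suffix matches, otherwise None."""
    session_id, sep, _ = cookie_value.rpartition(".")
    if not sep:
        return None  # no suffix at all
    expected = sign(session_id, secret)
    # Constant-time comparison avoids leaking how many characters matched.
    if hmac.compare_digest(expected.encode("utf-8"), cookie_value.encode("utf-8")):
        return session_id
    return None
```

Without the secret, an attacker who wants a valid cookie has to guess both a live session id and its matching suffix.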

Session ids must be sufficiently long and impractical to guess. There are a few ways to accomplish this. The randomness and hashing techniques above are the two most common ways but not the only ones.

Layers

If we generate strong random session ids, do we still need the hash? Absolutely!

The core security principle is layering. This is also known as not putting all your eggs in one basket. If you rely on a single source of security, you end up with no security at all if that single source fails. For example, what if someone finds a bug in your random number generator? What if they find a way to hack that part of your system and replace it? There are countless known attacks exploiting exactly this: the generation of random numbers that turn out not to be so random after all.

Combining a strong random session id with a hash for integrity will protect against flaws in the random number generator. It will also protect against developer errors such as using the wrong random number generator function (e.g. the not so random method every system offers alongside the strong method). We all write bad code no matter how great our process is or how experienced we are. It is part of software engineering. This is why it is so important to layer your security. A moat is not enough; you also want a wall behind it, and probably some guards behind the wall.
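The “wrong random function” mistake is easy to show. In this Python sketch (illustrative only; the function names and the seed value are made up), the general-purpose generator produces ids that anyone who learns or recovers its seed can reproduce, while the `secrets` module draws from the operating system’s secure source.

```python
import random
import secrets


def weak_session_id(rng: random.Random) -> str:
    """DON'T: general-purpose generator, reproducible from its seed."""
    return f"{rng.getrandbits(128):032x}"


def strong_session_id() -> str:
    """DO: 128 bits from the operating system's secure generator."""
    return secrets.token_hex(16)


if __name__ == "__main__":
    # Hypothetical scenario: the attacker learns or guesses the seed.
    server = random.Random(1234)
    attacker = random.Random(1234)
    print(weak_session_id(server) == weak_session_id(attacker))
    print(strong_session_id())
```

Both functions return 32 hex characters, so the mistake is invisible in the output. A hash suffix keyed with a secret keeps the weak version from being directly forgeable while the bug goes unnoticed.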

If you think using the wrong random function or a deep bug in OpenSSL are the only two issues here, consider the common practice of monkey patching code in JavaScript and other dynamic languages. If someone anywhere in an entire application deployment messes with the global random facilities (for testing, logging, etc.) and breaks them (or it is part of a malicious code injection), session ids relying solely on randomness are no longer secure.

Alarms

An important difference between guessing passwords and guessing session ids is the fact that passwords are associated with an account (e.g. username). The account-password pair makes it easier to keep track of brute force attacks because it provides a relatively straightforward way to keep track of failed attempts. However, when it comes to session ids, it is not as simple because sessions expire and do not include an account context. This means an invalid session id could come from an expired session or from an attacker, but without additional data (e.g. IP address) it would be hard to tell the difference in a large scale system.

By including an integrity component in the session id (via a hash or signature), the server can immediately tell the difference between an expired session, an unallocated session id, and an invalid session. Even if you just log invalid authentication attempts (and you should), you would want to log an expired session differently than an invalid one. Besides the security value of knowing the difference, it will also provide useful insight about how your users behave.

Hygiene

Credentials should expire and therefore session ids should have a finite lifespan (where duration is very much a system-specific value). While cookies come with an expiration policy, there is no way to ensure it is actually obeyed. An attacker can set the cookie expiration to any value without the server being able to detect it. A common best practice is to include a timestamp in every credential issued, which can be as simple as adding a timestamp suffix to the randomly generated session id. However, in order to rely on this timestamp, we must be able to verify it was not tampered with, and the way to accomplish that is with a hash or signature.

Adding a timestamp to the session id allows the server to quickly handle expired sessions without having to make an expensive database lookup. While this might sound unrelated to security, it actually is very much core to maintaining a secure application.
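Here is one way this could look, as a Python sketch under my own assumptions: the token layout `<id>.<issued_at>.<mac>`, HMAC-SHA256 as the keyed hash, and the `issue`/`check` names are all made up for the example. The point is that both checks run on the cookie value alone, before any session store is touched.

```python
import hashlib
import hmac
import time
from typing import Optional


def issue(session_id: str, secret: bytes, now: Optional[float] = None) -> str:
    """Build "<id>.<issued_at>.<mac>", where the mac covers id and timestamp."""
    issued_at = int(time.time() if now is None else now)
    payload = f"{session_id}.{issued_at}"
    mac = hmac.new(secret, payload.encode("utf-8"), hashlib.sha256).hexdigest()
    return f"{payload}.{mac}"


def check(token: str, secret: bytes, max_age: int,
          now: Optional[float] = None) -> str:
    """Return "forged", "expired", or "ok" without any database lookup."""
    payload, sep, mac = token.rpartition(".")
    expected = hmac.new(secret, payload.encode("utf-8"), hashlib.sha256).hexdigest()
    if not sep or not hmac.compare_digest(mac.encode("utf-8"),
                                          expected.encode("utf-8")):
        return "forged"
    # The mac matched, so the timestamp is the one this server wrote.
    _, _, issued_at = payload.rpartition(".")
    current = time.time() if now is None else now
    if current - int(issued_at) > max_age:
        return "expired"
    return "ok"
```

Editing the timestamp to extend the session changes the payload, so the hash no longer matches and the token is rejected as forged rather than accepted as fresh.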

A denial of service attack (or DoS) is an attack in which the attacker makes repeated requests with the sole purpose of consuming too many resources on the server and either shutting it down or making it inaccessible to others. If every request authentication requires a full database lookup at the application tier, an attacker can use forged session ids to stage a DoS attack with ease. By including an integrity component in the cookie, the server can immediately identify forged or expired credentials without any backend lookup cost.

Kill Switch

Sometimes things go wrong. And when they go very wrong, you need to have a way to immediately invalidate entire classes of sessions. Because generating a hash or signature requires a server-side secret or key, replacing the secret will immediately cause all session ids to fail validation. By using different secrets for different types of session ids, entire classes of sessions can be segregated and managed. Without such a mechanism, the application itself has to make a computed decision about the state of each session or perform mass database updates.
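A small Python sketch of the kill switch, with made-up session classes ("web" and "mobile"), placeholder secrets, and a token layout of `<class>.<id>.<mac>`; none of these come from the Express middleware.

```python
import hashlib
import hmac
from typing import Dict


def sign(kind: str, session_id: str, secrets_by_kind: Dict[str, bytes]) -> str:
    """Build "<class>.<id>.<mac>" using the secret for that session class."""
    payload = f"{kind}.{session_id}"
    mac = hmac.new(secrets_by_kind[kind], payload.encode("utf-8"),
                   hashlib.sha256).hexdigest()
    return f"{payload}.{mac}"


def is_valid(token: str, secrets_by_kind: Dict[str, bytes]) -> bool:
    """Verify the token against the current secret for its class."""
    payload, sep, mac = token.rpartition(".")
    kind, sep2, _ = payload.partition(".")
    if not sep or not sep2 or kind not in secrets_by_kind:
        return False
    expected = hmac.new(secrets_by_kind[kind], payload.encode("utf-8"),
                        hashlib.sha256).hexdigest()
    return hmac.compare_digest(mac.encode("utf-8"), expected.encode("utf-8"))


if __name__ == "__main__":
    # Placeholder secrets for illustration; real ones come from configuration.
    keys = {"web": b"web-secret-v1", "mobile": b"mobile-secret-v1"}
    web, mobile = sign("web", "s1", keys), sign("mobile", "s2", keys)
    keys["web"] = b"web-secret-v2"  # pull the kill switch on web sessions only
    print(is_valid(web, keys), is_valid(mobile, keys))
```

Replacing one secret invalidates every session of that class at once, without touching a single session record.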

In addition, in large distributed systems with database replication over different geographic locations, invalidating a session record in one location can take seconds and even minutes to replicate. This means the session stays active until the full system is back in sync. Compared with that, the benefits of a self-describing and self-validating session id are obvious.

General Purpose

An important feature of the Express session middleware is its support for user-generated session ids. This allows developers to deploy the middleware in an existing environment where session ids are generated by an existing entity which might reside on a completely different platform. Without adding a hash to the user-provided session ids, the burden of building a secure system moves from the expert (the module author) to the user (who is likely to be a security novice). Applying a hash is a much better approach than forcing an internal session id generator.

Crocodiles

Adding a hash to a strong random session id is not all you should do. Whether your moat can benefit from crocodiles is, again, a castle-specific decision. Without going too far from the topic, there are plenty of other layers you can add to your session management tier. For example, you can use two session credentials, one long lived (lasts as long as the session) and another short lived (good for minutes or hours). You use the long lived credential only to refresh the short lived one, which reduces the exposure of the long lived credential on the network (especially when not using TLS).

Another common practice is to have one cookie with general information about the user (e.g. first name, recently viewed items, etc.) alongside the session and to then include something from that cookie in the hash to create a binding between the user’s active state and the authentication. It’s a way of bringing back the “username” into the workflow.
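As a sketch of that binding (Python, illustrative; the `bind`/`verify` names and the choice of HMAC-SHA256 are my assumptions), the hash can cover both the session id and a value taken from the user cookie. Each field is length-prefixed so that moving characters from one field to the other produces a different hash.

```python
import hashlib
import hmac


def bind(session_id: str, user_value: str, secret: bytes) -> str:
    """Keyed hash over the session id and a value from the user cookie."""
    parts = (session_id.encode("utf-8"), user_value.encode("utf-8"))
    # Length-prefix each field so ("ab", "c") and ("a", "bc") differ.
    message = b"".join(len(p).to_bytes(4, "big") + p for p in parts)
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def verify(session_id: str, user_value: str, mac: str, secret: bytes) -> bool:
    """True only if the session id and the user value match the hash."""
    expected = bind(session_id, user_value, secret)
    return hmac.compare_digest(mac.encode("utf-8"), expected.encode("utf-8"))
```

A session id lifted from one user and presented alongside another user’s cookie fails verification, even though the id itself is valid.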

To go even further, hashing can be replaced with signatures and cookie content can be encrypted (and then hashed or signed on top). The security verbosity must match the threat.

Last Words

If you take one thing away from this post I hope it is the concept of layering security. While math plays a significant role in security, it is far from the only tool. Measuring the odds of guessing a session id as the sole source of security is failing to recognize that security comes from a combination of defenses. I would also strongly advise against having the kind of academic debates focusing on a single aspect of a secure system in public (at least without the proper disclaimers). It is extremely misleading to narrow down the question to the point where it causes confusion and misinformation.

Asking “is there a statistical benefit to hashing a strong random session id?” is harmful because it creates the false impression that this is the only consideration. It moves the discussion from the real world to that of an incomplete abstraction. As I hope I demonstrated above, there are a lot of reasons for including a hash beyond just making guessing impractical.
