Arup.blogspot.com

What to Learn from LinkedIn Password Hack as an Oracle DBA

2012-06-07

One of the major news stories today was the hacking and subsequent publication of LinkedIn passwords. Didn't hear about it? Well, read it here. In summary, someone smart but with head screwed on a little askew decided to pull passwords from LinkedIn accounts, reportedly using a little-known flaw in the LinkedIn iOS app. LinkedIn later confirmed the leak and asked users to change their passwords. This created a major ripple effect all over the world. The news competed for attention with other stories such as Spain's economic reforms; but in the end it managed to rise to the top, since many professionals and executives are members of the LinkedIn site and were affected.

Well, what does that have to do with being an Oracle DBA - you may ask. Fair question. You see, there is a very important lesson to be learned from this incident - a lesson commonly ignored by many DBAs, developers, architects and pretty much all users of an Oracle database. Let's see what that is.

Mechanics of the LinkedIn Password Hack

Adversaries (a.k.a. hackers) obtain passwords in many ways. Some use the brute force approach of guessing the password and trying to log in until they succeed. However, many systems employ a simple mechanism of locking out the user when more than a threshold number of incorrect attempts are made, so that approach is not very effective. There is another type of attack, where the adversary simply gets the passwords stored on the servers of the site. But, wait, shouldn't those passwords be encrypted?

Yes, passwords generally are. But here is where a twist comes. The passwords may not really be "encrypted", but merely "hashed". There is a huge difference. Encryption generally requires a key that is used to encrypt a value and, later, to decrypt it; hashing does not. Hashing transforms the value in a way that is not meant to be undone; so you can't reverse the hashing process to get the source value. Let's see how it works.

Hashing

Here is a simple example. Suppose you are negotiating a rate with your baby sitter and you agree on an amount - $123. Now you ask the sitter to tell your spouse that amount. Well, how do you make sure she will mention that very amount? After all, she has an incentive to say a higher amount, doesn't she? She could say that you agreed on $125 or even $150; your spouse will not be able to ascertain that. (Imagine for a moment that you don't have access to normal modern technology, such as a cellphone, to communicate directly with your spouse.) So you develop a simple strategy - you come up with a formula that creates a number from the amount. It could be as simple as, say, the total of all digits. So your amount, $123, becomes:

1 + 2 + 3 = 6

You write that down on a piece of paper, seal it in an envelope and ask the sitter to give it to your spouse in addition to mentioning the agreed-upon amount. You and your spouse both know this formula; but the sitter doesn't. Suppose she fudges the amount you agreed on to make it, say, $125. When she tells your spouse that amount, your spouse computes the magic number:

1 + 2 + 5 = 8

Your spouse will compare this with the number inside the sealed envelope and immediately come to the conclusion that the amount you agreed on was something different, not $125. The value the sitter mentioned is now definitively established to be false.

This process is called hashing and this magic number is called a hash value. Of course the hashing process is much more complex than merely adding the digits. I just wanted to show the concept with a very simple example. The mechanics of the process, which here was simply adding up the digits, are known as the hashing algorithm.

Here are some properties of this hashing process:

(1) The process is one-way. You can determine the hashvalue by adding the digits (1+2+3); but you can't determine the source number from the hashvalue (6). Your spouse can't determine from the hashvalue in the envelope what amount you agreed on. So, it's not the same as encryption, which allows you to decrypt and come up with the source number.

(2) The purpose is not to store values. It's merely to establish the authenticity. In this example, your spouse determines that the amount mentioned by the baby sitter ($125) must be wrong, because its hashvalue would have been 8, not 6. After that authenticity is established (or rejected, as in this case) the purpose of the hashvalue ceases to exist.

(3) The hashing function is deterministic, i.e. it will always come up with the same value every time it is invoked against the same source value.

(4) What if the baby sitter had mentioned $150? The hashvalue, in that case, would have been 1+5+0 = 6, exactly the hashvalue computed by you. In that case, your spouse would have determined the value $150 to be authentic, which would have been wrong. So, it's important that the hashvalue is somewhat unique, to reduce the possibility of two different numbers producing the same result. This is known as a "collision" of hashvalues.
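The baby sitter example can be sketched in a few lines of Python. This is only the toy digit-sum formula from the story, not a real hashing algorithm, and the function name digit_sum_hash is made up for this illustration.

```python
def digit_sum_hash(amount: int) -> int:
    """Toy hash from the baby sitter story: add up the decimal digits."""
    return sum(int(digit) for digit in str(amount))


agreed = digit_sum_hash(123)     # the number sealed in the envelope
fudged = digit_sum_hash(125)     # what the spouse computes from the sitter's claim
collision = digit_sum_hash(150)  # a different amount that produces the same hash

print(agreed, fudged, collision)  # prints: 6 8 6
```

The same input always gives the same result (property 3), and 123 and 150 both give 6, which is the collision described in property 4.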

The algorithm is the key to making sure the possibility of collisions is reduced. There are several algorithms in use. Two very common ones are MD5 (Message Digest 5) and SHA-1 (Secure Hash Algorithm 1).
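To get a feel for what these algorithms produce, here is a small sketch using Python's standard hashlib module. The input value is just the amount from the story, chosen for illustration.

```python
import hashlib

message = b"123"

# Both functions return a fixed-length digest no matter how long the input is.
md5_digest = hashlib.md5(message).hexdigest()
sha1_digest = hashlib.sha1(message).hexdigest()

print(len(md5_digest) * 4)   # 128 (bits in an MD5 digest)
print(len(sha1_digest) * 4)  # 160 (bits in a SHA-1 digest)

# A one-character change in the input gives a completely different digest.
print(sha1_digest)
print(hashlib.sha1(b"125").hexdigest())
```

With 160 bits to work with, two different amounts landing on the same value is far less likely than with the digit-sum formula, where the result can only be a small number.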

Since the source value can't be computed back from hash value, this is considered by some as a more secure process than encryption. This process is useful in situations where the reverse computation of values is not necessary; merely the matching of hashvalues is needed. One such example is passwords. If you want to establish that the password entered by the user matches the stored password, all you have to do is generate the hashvalue and match that with the hashvalue stored in the database. If they match, you establish that the password is correct; if not then, well, it's not. This has an inherent security advantage. If someone somehow manages to read the passwords, all that will be exposed will be the hashvalues of the passwords; not the actual values of the passwords themselves. As we saw earlier, it will be impossible to decipher the original password from the hashvalue. That's why it is common in password storage.
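The password check described above can be sketched as follows. This is a minimal illustration, assuming plain SHA-1 as the hash function; the names hash_password, stored_hash and password_matches are made up for the example and do not come from any particular product.

```python
import hashlib


def hash_password(password: str) -> str:
    """Return the SHA-1 hashvalue of a password as a hex string."""
    return hashlib.sha1(password.encode("utf-8")).hexdigest()


# What the system keeps: only the hashvalue, never the password itself.
stored_hash = hash_password("Secret")


def password_matches(attempt: str, stored: str) -> bool:
    """Hash the attempt and compare it with the stored hashvalue."""
    return hash_password(attempt) == stored


print(password_matches("Secret", stored_hash))  # True
print(password_matches("secret", stored_hash))  # False
```

Nothing in the stored value needs to be reversed; the login check only ever compares two hashvalues.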

Salt
So, that's great, with a somewhat higher degree of security for the password store. What's the problem?

The problem is that the hashvalues are way too predictable. Recall from the previous section that the hashvalue of a specific input value is always the same value. Considering the simple hash function (adding digits), the input value $123 will always return 6 as the hashvalue. Consider this: an adversary can see the hashvalue and guess the value, as shown below.

Is the input value $120? The hash value is 1+2+0 = 3, which does not match "6", so it must not be the correct number.

Is it $121? Hash value of 121 is 1+2+1=4, different from 6; so this is not correct either.

Is it $122? Hashvalue of 122 is 5; so not correct.

Is it $123? Hashvalue is 6. Bingo! The adversary now knows the input value.

In just 4 attempts the adversary figured out the input value from the hashvalue. Consider this scenario for passwords. The adversary can see the password hash (from which he can't decipher the password); but he can generate hashes from multiple input strings and check which one matches the stored password hashvalue. Using the computing power of modern computers this becomes almost trivial. So a hash value is not inherently secure.

What is the solution, then? What if the hash value was not as predictable? If the hash value generated from the same input value were different each time, it would be impossible to match it against a precomputed value. This element of randomness is brought to an otherwise deterministic function by introducing a modifier to the process, called a "salt". Like its real-life namesake, salt adds spice to the hashvalue to give it a unique "flavor", or a different value. Here is an example where we are storing the password value "Secret":
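The four guesses above can be replayed in code. This sketch reuses the toy digit-sum formula; the function names and the starting point of 120 are taken from the walk-through and are otherwise arbitrary.

```python
def digit_sum_hash(amount: int) -> int:
    """Toy hash: add up the decimal digits."""
    return sum(int(digit) for digit in str(amount))


def guess_amount(target_hash: int, start: int, stop: int):
    """Try each amount in turn until one produces the target hashvalue.

    Returns (amount, attempts), or (None, attempts) if nothing matched.
    """
    attempts = 0
    for candidate in range(start, stop + 1):
        attempts += 1
        if digit_sum_hash(candidate) == target_hash:
            return candidate, attempts
    return None, attempts


print(guess_amount(6, 120, 200))  # (123, 4)
```

The adversary never reverses the hash. He only runs the same one-way function forward over many candidates until the outputs agree.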

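hash("Secret") = "X"

hash("Secret" + salt1) = "Y"

hash("Secret" + salt2) = "Z"

Every time a different salt is added, a different value is produced. The salt is stored along with the hashvalue so that the correct password can still be verified, but hashes precomputed from a list of words will no longer match the stored values.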
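Here is a minimal sketch of salting in Python, assuming a random 16-byte salt that is mixed into the input before hashing and then kept next to the hashvalue. The function names are made up for the example, and SHA-1 is used only to stay consistent with the rest of this post.

```python
import hashlib
import os
from typing import Tuple


def hash_with_salt(password: str, salt: bytes) -> str:
    """Hash the salt together with the password."""
    return hashlib.sha1(salt + password.encode("utf-8")).hexdigest()


def new_password_record(password: str) -> Tuple[bytes, str]:
    """Pick a fresh random salt and return (salt, hashvalue) for storage."""
    salt = os.urandom(16)
    return salt, hash_with_salt(password, salt)


def verify(attempt: str, record: Tuple[bytes, str]) -> bool:
    """Re-hash the attempt with the stored salt and compare."""
    salt, stored_hash = record
    return hash_with_salt(attempt, salt) == stored_hash


record_a = new_password_record("Secret")
record_b = new_password_record("Secret")

print(record_a[1] == record_b[1])  # False: same password, different hashvalues
print(verify("Secret", record_a))  # True: the right password still checks out
```

Because each record carries its own salt, a table of hashes computed from a word list ahead of time is useless; the adversary would have to redo the work for every single record.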

In the case of LinkedIn, the passwords were stored without salt. Therefore it was easy for the adversary to guess the passwords by creating SHA-1 hash values from known words and comparing them against the stored value. Here is some rough pseudo-code:

for w in ( ... list of words ... ) loop
   l_hash := hash(w);
   -- a match means w is the password
   if l_hash = stored_value then
       show 'Bingo! The password is ' || w;
       exit;
   end if;
end loop;
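The same loop can be written as a runnable Python sketch. The word list and the "leaked" value below are invented for the illustration; a real attack would simply use a much longer list.

```python
import hashlib


def sha1_hex(word: str) -> str:
    """Unsalted SHA-1 hashvalue of a word, as a hex string."""
    return hashlib.sha1(word.encode("utf-8")).hexdigest()


def crack(stored_hash: str, words):
    """Return the first word whose hashvalue equals stored_hash, else None."""
    for word in words:
        if sha1_hex(word) == stored_hash:
            return word
    return None


word_list = ["password", "letmein", "linkedin", "Secret"]
leaked_hash = sha1_hex("linkedin")  # stands in for a value read from a leaked file

print(crack(leaked_hash, word_list))             # linkedin
print(crack(sha1_hex("Xk#9!qLz"), word_list))    # None: not in the word list
```

The second call shows why a password that is not on anybody's list is so much harder to recover this way.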

Lesson for Oracle DBAs

In Oracle Database (as of 11g R2 and all prior versions), the passwords are stored in the database in a table called USER$. There is a column called PASSWORD which stores the hashvalue of the password. Using the algorithm mentioned above, an adversary can run through a very long list of words, perhaps the entire Oxford English Dictionary, and crack open the password. You may argue that this process is cumbersome and time consuming. Actually, it's quite trivial for a reasonably fast computer.

In Oracle, the passwords are not hashed alone. The userid is combined with the password to produce the hash. For example, suppose SCOTT's password is TIGER. The hash function is applied as:

hash('SCOTTTIGER')
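The effect of mixing the userid into the input can be illustrated with a stand-in function. To be clear, this is not Oracle's actual algorithm: SHA-1 and the upper-casing of the combined string are assumptions made purely for the sketch, and the resulting values will not match anything in a real data dictionary.

```python
import hashlib


def user_password_hash(username: str, password: str) -> str:
    """Stand-in for a hash over userid + password (NOT Oracle's real algorithm)."""
    combined = (username + password).upper()  # e.g. 'SCOTTTIGER'
    return hashlib.sha1(combined.encode("utf-8")).hexdigest()


scott = user_password_hash("SCOTT", "TIGER")
hr = user_password_hash("HR", "TIGER")

print(scott == hr)                                     # False: same password, different users
print(scott == user_password_hash("scott", "tiger"))   # True: same pair, same hashvalue
```

The userid makes the hashvalue differ from one user to the next, but for a given userid and password the result is the same every time and on every database.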

Let's see how Oracle stores the password with an example. Take a look at the password column in the view DBA_USERS:

The password is hashed and thus undecipherable, but we know that SCOTT's password is "tiger". Therefore, the hash value for "tiger" when the userid is "SCOTT" is F894844C34402B67. Now, if SCOTT's password changes, this hash value also changes. You can then check in the view DBA_USERS whether SCOTT's password matches this hash value, which will verify the password as "tiger".

So how can an adversary use this information? It's simple. If he creates the user SCOTT with the password TIGER, he will come to know the hash value stored in the password column. Then he can build a table of such accounts and the hashed values of their passwords and compare them against the password hashes stored in the data dictionary. What's worse: he can create this user in any Oracle database; not necessarily the one he is attacking right now.
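The table-building approach can be sketched like this. Again, the hash function is only a stand-in (SHA-1 over the upper-cased userid and password, an assumption and not Oracle's algorithm), and the account list and the APPUSER row are invented for the illustration.

```python
import hashlib


def user_password_hash(username: str, password: str) -> str:
    """Stand-in hash over userid + password (NOT Oracle's real algorithm)."""
    return hashlib.sha1((username + password).upper().encode("utf-8")).hexdigest()


# Built once, on any machine the adversary controls.
DEFAULT_PAIRS = [
    ("SCOTT", "TIGER"),
    ("SYS", "CHANGE_ON_INSTALL"),
    ("SYSTEM", "MANAGER"),
]
precomputed = {user_password_hash(u, p): (u, p) for u, p in DEFAULT_PAIRS}


def find_known_passwords(stored_rows):
    """Look up each (username, hashvalue) row in the precomputed table."""
    found = {}
    for username, stored_hash in stored_rows:
        hit = precomputed.get(stored_hash)
        if hit is not None and hit[0] == username.upper():
            found[username] = hit[1]
    return found


# Rows as they might be read from the target's data dictionary.
target_rows = [
    ("SCOTT", user_password_hash("SCOTT", "TIGER")),
    ("APPUSER", user_password_hash("APPUSER", "x9$Lq!72")),
]

print(find_known_passwords(target_rows))  # {'SCOTT': 'TIGER'}
```

No guessing loop is needed at attack time; each stored hashvalue is a single dictionary lookup.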

This is why you must never use default passwords and easily guessed passwords.

Protection
Now that you know how adversaries use the password hash to guess passwords, you should identify all such users and expire them, or force them to change passwords. How can you get a list of such users?

In Oracle Database 11g, this is easy, almost to the point of being trivial. The database has a special view, dba_users_with_defpwd, that lists the usernames with the default passwords. Here is an example usage:

The output clearly shows the usernames that have the default password. You can join this view with DBA_USERS to check on the status of the users:

Oracle 10g
What if you don't have Oracle 11g?

In January 2006, Oracle made a downloadable utility available for identifying default passwords and their users. This utility, delivered as patch 4926128, is available on My Oracle Support, as described in document ID 361482.1. As of this writing, the utility checks a handful of default accounts in a manner similar to that described above; by the time you read this, however, its functionality may well have expanded.

Security expert Pete Finnigan has done an excellent job of collecting all such default accounts created during various Oracle and third-party installations, which he has exposed for public use on his website, petefinnigan.com. Rather than reinventing the wheel, we will use Pete's work and thank him profusely. I have changed his original approach a little bit, though.

First, create the table to store the default accounts and default passwords:

Then you can load the table using data collected by Pete Finnigan from many sources. (Download the script here.) After the table is loaded, you are ready to search for default passwords. I use a very simple SQL statement to find out the users:

Here you can see some of the most vulnerable of situations, especially the last line, where the username is SYS and the password is "ORACLE" (as is that of SYSTEM)! It may not be "change_on_install", but it's just as predictable.
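The idea behind the check, joining the list of known default hashes to the database's own user list, can be tried out without an Oracle database. The sketch below uses Python with an in-memory SQLite database. Everything in it is an assumption made for the illustration: the table names default_accounts and db_users, the sample rows, and the stand-in hash function, which is not Oracle's algorithm.

```python
import hashlib
import sqlite3


def stand_in_hash(username: str, password: str) -> str:
    """Stand-in hash over userid + password (NOT Oracle's real algorithm)."""
    return hashlib.sha1((username + password).upper().encode("utf-8")).hexdigest()


conn = sqlite3.connect(":memory:")
conn.execute("create table default_accounts (username text, password text, hash_value text)")
conn.execute("create table db_users (username text, password_hash text, account_status text)")

# Known default userid/password pairs with their precomputed hashvalues.
defaults = [("SCOTT", "TIGER"), ("SYS", "ORACLE"), ("SYSTEM", "ORACLE")]
conn.executemany(
    "insert into default_accounts values (?, ?, ?)",
    [(u, p, stand_in_hash(u, p)) for u, p in defaults],
)

# What the database being audited holds; APPUSER has a non-default password.
conn.executemany(
    "insert into db_users values (?, ?, ?)",
    [
        ("SCOTT", stand_in_hash("SCOTT", "TIGER"), "OPEN"),
        ("SYS", stand_in_hash("SYS", "ORACLE"), "OPEN"),
        ("APPUSER", stand_in_hash("APPUSER", "t7#Vq0z!"), "OPEN"),
    ],
)

# A user is at risk when both the name and the stored hashvalue match a default.
risky = conn.execute(
    """
    select u.username, d.password, u.account_status
    from db_users u
    join default_accounts d
      on d.username = u.username
     and d.hash_value = u.password_hash
    order by u.username
    """
).fetchall()

print(risky)  # [('SCOTT', 'TIGER', 'OPEN'), ('SYS', 'ORACLE', 'OPEN')]
```

Only the accounts whose stored hashvalue equals a known default show up; APPUSER, with its unlisted password, does not.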

Action Items

Now that you know how one adversary used the salt-less hashing algorithm to guess passwords, you have some specific actions to take.

(1) Advocate the use of non-dictionary words. Remember, the adversary can create passwords and compare the resultant hash against the stored hash to see if they match. Making it impossible for him to guess the list of such input values makes it impossible to generate matching hash values.
(2) Immediately check in the database for users with default passwords. Either change the passwords, or expire and lock the accounts.
(3) Whenever you use hashing (and not encryption), use salt, to make sure it is difficult, if not impossible, for the adversary to guess.