Programming | By Shamus | May 3, 2011 | 117 comments
So, Sony Online was hacked. Again. The data stolen includes: name, address (city, state, zip, country), email address, gender, birthdate, phone number, login name and hashed password. So the passwords were hashed, which indicates they aren't completely incompetent.
Here is what hashing means:
Hashing takes a chunk of data and, using the magical machinations of math, turns it into a big jumble of nonsense. For example, there is a particular hashing algorithm called MD5. In PHP, you can run a common string of characters through MD5 and get a nice random-looking hash out of it. If I give it the word "password", it gives the following string:
5f4dcc3b5aa765d61d8327deb882cf99
If I give it “passwordx”, then it gives me a different hash:
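The post does this in PHP; as a minimal sketch, the same MD5 hashes can be produced with Python's standard `hashlib` module. The helper name `md5_hex` is just an illustrative name, not anything from the post.

```python
import hashlib

def md5_hex(text: str) -> str:
    """Return the MD5 digest of a string as 32 hexadecimal characters."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()

print(md5_hex("password"))   # 5f4dcc3b5aa765d61d8327deb882cf99
print(md5_hex("passwordx"))  # a completely different 32-character string
```

Adding one letter to the input changes the whole output, not just the last few characters, which is why the two hashes look unrelated.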
The idea is that a properly secured system will never, ever store the user’s password to disk. As soon as you create your account on a new forum, the password is hashed, then stored. This is why a forum can’t tell you your password if you lose it. It will only allow you to set a new one. It can’t tell you the old password because it doesn’t know it.
Beware of any system that can tell you your own password. That means they don’t hash their passwords. If they get hacked, the hacker will have your real password and can use it elsewhere.
When you log in, the password you entered is hashed. That hash is compared to the hashed password in the database. If they match, then the original words or phrases must have matched, and therefore you entered the correct password. This is how the forum can verify that YOU have the proper password, even if the forum itself doesn’t have it.
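Here is a toy version of that signup-and-login flow, as a sketch only: the `users` table and the login name `alice` are hypothetical, and it uses plain unsalted MD5 purely to match the example above. The comparison uses `hmac.compare_digest`, which compares in constant time; that choice is an addition, not something the post describes.

```python
import hashlib
import hmac

def hash_password(password: str) -> str:
    # Only this hex digest is ever written to the "database"
    return hashlib.md5(password.encode("utf-8")).hexdigest()

# Hypothetical user table: login name -> stored hash (never the plain password)
users = {"alice": hash_password("password")}

def check_login(name: str, attempt: str) -> bool:
    stored = users.get(name)
    if stored is None:
        return False
    # Hash the attempt the same way, then compare the two hashes
    return hmac.compare_digest(stored, hash_password(attempt))

print(check_login("alice", "password"))   # True
print(check_login("alice", "passwordx"))  # False
```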
Now, you CAN reverse a hash. Go here, type in one of the hashes above, and you'll get the original password back out. Since it's math, you can get the original by reversing the steps. So it's not completely secure. HOWEVER, for reasons of mathematics that go completely over my head, these algorithms are "easy" in one direction and "hard" in the other. It will take a small number of CPU cycles to hash, and a large number to un-hash. That way your web server can keep up with people logging in, but a hacker will need a lot of computing power if they want all the passwords.
EDIT: No, you CAN’T reverse a hash. See the comment from Joshua Kronengold below. Learn something new every day.
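Even though the hash can't be run backward, an unsalted hash can often still be matched: hash a list of likely passwords ahead of time, then look the stolen hash up in the result. Sites that appear to "reverse" hashes generally work along these lines, though the specific site linked above may differ. A minimal sketch of such a lookup, where the word list is a made-up example:

```python
import hashlib
from typing import Optional

def md5_hex(text: str) -> str:
    return hashlib.md5(text.encode("utf-8")).hexdigest()

# Hypothetical list of common passwords an attacker might try
common_passwords = ["123456", "password", "qwerty", "letmein", "passwordx"]

# Precompute hash -> password once; afterwards each lookup is instant
lookup = {md5_hex(p): p for p in common_passwords}

def crack(stolen_hash: str) -> Optional[str]:
    """Return the password behind an unsalted MD5 hash if it is in the table, else None."""
    return lookup.get(stolen_hash)

print(crack("5f4dcc3b5aa765d61d8327deb882cf99"))       # password
print(crack(md5_hex("correct horse battery staple")))  # None
```

Nothing here undoes the math; it only works because the same input always produces the same hash, and that is exactly what salt is meant to break.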
But still. So what if the hacker has to wait five seconds for each password? Big deal. That’s not much of a deterrent. Their consumer-level PC will churn them out at the rate of 17,280 a day, which is probably faster than anyone can put them to use. Sure, getting all one million users will take either a long time or a lot of computers, but saying “they can only hack seventeen thousand people a day” doesn’t really inspire confidence. No, we need something more. We need salt.
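The arithmetic behind that figure, assuming the made-up five seconds per password and a round one million accounts:

```python
SECONDS_PER_DAY = 24 * 60 * 60   # 86,400
seconds_per_guess = 5            # the illustrative figure from the post
users = 1_000_000

passwords_per_day = SECONDS_PER_DAY // seconds_per_guess
days_for_everyone = users / passwords_per_day

print(passwords_per_day)            # 17280
print(round(days_for_everyone, 1))  # 57.9
```

So one PC would get through every account in about two months, which is the "long time or a lot of computers" trade-off described above.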
See, those strings of numbers above are really huge numbers. Yes, I know they have letters in them. Look, it's complicated, okay? The point is, those 32 digits represent a 128-bit value. If you were to write the number out like normal people, it would have up to 39 digits in it. It would look like this:
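For the curious: the letters are there because the hash is written in hexadecimal (base 16), where a through f stand for the digits 10 through 15. Each hex character carries 4 bits, so 32 characters make 128 bits. This sketch converts the hash of "password" to an ordinary base-10 number:

```python
import hashlib

digest_hex = hashlib.md5(b"password").hexdigest()  # 32 hex characters
as_number = int(digest_hex, 16)                    # the same value in base 10

print(len(digest_hex))                # 32
print(as_number.bit_length() <= 128)  # True
print(len(str(2**128 - 1)))           # 39: the most decimal digits a 128-bit value can need
print(as_number)
```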
So… big, is what I'm saying. The cool thing is that since it's a number, you can do number-stuff to it. Let's say we pick a nice 38-digit value and add that sucker to our hash, and store that. This number we added in is called "salt".
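A literal sketch of the add-the-salt idea as described here, treating the hash as a 128-bit integer. Wrapping around modulo 2^128 is an assumption made so the stored value stays hash-sized. This mirrors the simplified description only: in common practice, salt is combined with the password before hashing rather than added to the finished hash.

```python
import hashlib
import secrets

MOD = 2**128  # assumption: keep the result the same size as an MD5 hash

def md5_int(text: str) -> int:
    return int(hashlib.md5(text.encode("utf-8")).hexdigest(), 16)

# A random 38-digit salt: somewhere from 10^37 up to 10^38 - 1
salt = secrets.randbelow(9 * 10**37) + 10**37

stored = (md5_int("password") + salt) % MOD  # what goes in the database
recovered = (stored - salt) % MOD            # subtracting the salt back out

print(recovered == md5_int("password"))  # True
```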
Now the hacker needs to know what salt was added to the hash. They need to subtract it back out before they try to un-hash it, or they won't get the original password. They can experiment with trial-and-error, but suddenly that 5-second computation time starts looking really, really formidable. Five seconds times all of the possible 38-digit numbers is… a long time. I'm not going to work it out, but it's a safe guess that by the time you find it the PlayStation Network will have gone dark for good, as well as the sun.
(Note that I pulled that five second figure out of the air. I’m sure you can make attempts faster than that with most computers still in operation, but the point is: It will be slow compared to the task at hand.)
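Working it out anyway, using the same made-up five seconds per attempt and a 365.25-day year:

```python
salts = 9 * 10**37                        # every 38-digit number: 10^37 .. 10^38 - 1
seconds = salts * 5                       # five seconds per attempt
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60  # 31,557,600

years = seconds / SECONDS_PER_YEAR
print(f"{years:.1e}")  # 1.4e+31
```

That is roughly 10^31 years, against roughly five billion (5 × 10^9) years left for the sun, so the guess above is safe by about twenty orders of magnitude.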
Now, in a really good system the designers will use different salt on each and every user account. So, even if the hackers break one password, it gives them no help at all in breaking the next one. You could derive the salt from some other source, and you can do more than just add it. Maybe use the timestamp of when the account was created, and make another MD5 hash out of that. Then combine it with the hashed password in some unexpected way. (Like, more unexpected than simply "add them".) Maybe take the resulting answer and crank it through the same mathematical process a few times. Just make sure it's a process you can repeat exactly when the user logs in, so you can compare the results.
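A minimal sketch of per-user salt plus repeated hashing, with two swaps named plainly: it uses a random salt from `os.urandom` instead of an account timestamp, and it uses PBKDF2 (`hashlib.pbkdf2_hmac`), a standard function that feeds the salted password through a hash many times, instead of a home-made MD5 combination. The iteration count is an assumption to tune per server. The salt is not secret; it is stored next to the hash, and login simply repeats the same computation.

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000  # assumption: tune so one hash takes a noticeable fraction of a second

def make_record(password: str) -> tuple[bytes, bytes]:
    """Create a per-user salt and a stretched hash; store both with the account."""
    salt = os.urandom(16)  # fresh random salt for every account
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest

def verify(password: str, salt: bytes, digest: bytes) -> bool:
    """Repeat the exact same process at login and compare the results."""
    attempt = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return hmac.compare_digest(attempt, digest)

salt_a, hash_a = make_record("password")
salt_b, hash_b = make_record("password")
print(hash_a != hash_b)                     # True: same password, different salt
print(verify("password", salt_a, hash_a))   # True
print(verify("passwordx", salt_a, hash_a))  # False
```

Because two users with the same password end up with different stored hashes, a precomputed lookup table is useless and each account has to be attacked separately.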
Now the only way to break open all passwords is for them to steal the source code used to run the system. A good sysadmin will make sure that stealing the source would be difficult (duh) and also make sure it would be an entirely separate job from hacking your user database.
We don't know how complex or simple the hashing was on Sony's passwords, and we don't actually want Sony to tell us. Those passwords might be weakly hashed, or they might be (in a practical sense) uncrackable. We just want Sony to make sure this doesn't happen again.
EDIT: Read also the correction from Jabor below. I thought I knew enough about this to give the layman's rundown, but I was missing a few key pieces. Follow the thread below if you want the full story. Hopefully, this explanation will demystify hashing despite my procedural errors.