Hashing

By Shamus Posted Tuesday May 3, 2011

Filed under: Programming 128 comments

splash_lock.jpg

So, Sony Online was hacked. Again. The data stolen includes: Name, address (city, state, zip, country), email address, gender, birthdate, phone number, login name and hashed password. So the passwords were hashed, which indicates they aren’t completely incompetent.

Here is what hashing means:

Hashing takes a chunk of data and, using the magical machinations of math, turns it into a big jumble of nonsense. For example, there is a particular hashing algorithm called MD5. In PHP, you can run a common string of characters through MD5 and get a nice random-looking hash out of it. If I give it the word “password”, it gives the following string:

5f4dcc3b5aa765d61d8327deb882cf99

If I give it “passwordx”, then it gives me a different hash:

3bd27bf850cc36a34ce3b7f0cca6d6b0

The idea is that a properly secured system will never, ever store the user’s password to disk. As soon as you create your account on a new forum, the password is hashed, then stored. This is why a forum can’t tell you your password if you lose it. It will only allow you to set a new one. It can’t tell you the old password because it doesn’t know it.
Beware of any system that can tell you your own password. That means they don’t hash their passwords. If they get hacked, the hacker will have your real password and can use it elsewhere.

When you log in, the password you entered is hashed. That hash is compared to the hashed password in the database. If they match, then the original words or phrases must have matched, and therefore you entered the correct password. This is how the forum can verify that YOU have the proper password, even if the forum itself doesn’t have it.

Now, you CAN reverse a hash. Go here, type in one of the hashes above, and you’ll get the original password back out. Since it’s math, you can get the original by reversing the steps. So it’s not completely secure. HOWEVER, for reasons of mathematics that goes completely over my head these algorithms are “easy” in one direction and “hard” in the other. It will take a small number of CPU cycles to hash, and a large number to un-hash. That way your web server can keep up with people logging in, but a hacker will need a lot of computing power if they want all the passwords.

EDIT: No, you CAN’T reverse a hash. See the comment from Joshua Kronengold below. Learn something new every day.

But still. So what if the hacker has to wait five seconds for each password? Big deal. That’s not much of a deterrent. Their consumer-level PC will churn them out at the rate of 17,280 a day, which is probably faster than anyone can put them to use. Sure, getting all one million users will take either a long time or a lot of computers, but saying “they can only hack seventeen thousand people a day” doesn’t really inspire confidence. No, we need something more. We need salt.

See, those strings of numbers above are really huge numbers. Yes, I know they have letters in them. Look, it’s complicated, okay? The point is, those 32 digits represent a 128 bit value. If you were to write the number out like normal people, it would have about 38 digits in it. It would look like this:

34,028,236,700,000,000,000,000,000,000,000,000,000

So… big, is what I’m saying. The cool thing is that since it’s a number, you can do number-stuff to it. Let’s say we pick a nice 38 digit value and add that sucker to our hash, and store that. This number we added in is called “salt”.

Now the hacker needs to know what salt was added to the hash. They need to subtract it back out before they try to un-hash it, or they won’t get the original password. They can experiment with trial-and-error, but suddenly that 5-second computation time starts looking really, really formidable. Five seconds times all of the possible 38-digit numbers is… a long time. I’m not going to work it out, but it’s a safe guess that by the time you find it the Playstation Network will have gone dark for good, as well as the sun.

(Note that I pulled that five second figure out of the air. I’m sure you can make attempts faster than that with most computers still in operation, but the point is: It will be slow compared to the task at hand.)

Now, in a really good system the designers will use different salt on each and every user account. So, even if the hackers break one password, it gives them no help at all in breaking the next one. You could derive the salt from some other source, and you can do more than just add it. Maybe use the timestamp of when the account was created, and make another MD5 hash out of that. Then combine it with the hashed password in some unexpected way. (Like, more unexpected than simply “add them”.) Maybe take the resulting answer and crank it through the same mathematical process a few times. Just make it a process you can reverse when the user logs in and you have to compare the passwords.

Now the only way to break open all passwords is for them to steal the source code used to run the system. A good sysadmin will make sure that stealing the source would be difficult (duh) and also make sure it would be an entirely separate job from hacking your user database.

We don’t know how complex or simple the hashing was on Sony’s passwords, and we don’t actually want Sony to tell us. Those passwords might be lightly encrypted, or they might be (in a practical sense) uncrackable. We just want Sony to make sure this doesn’t happen again.

EDIT: Read also the correction from Jabor below. I thought I knew enough about this to give the Layman’s rundown, but I was missing a few key pieces. Follow the thread below if you want the full story. Hopefully, this explanation will de-mystify hashing despite my procedural errors.

 


From The Archives:
 

128 thoughts on “Hashing

  1. JPH says:

    Ah. You know, I’ve always wondered why forums can’t just tell me my old password…

  2. No, you cannot reasonably reverse a good hash–at least, not with current computing tech. Yes, including MD5.

    Sites like http://md5.gromweb.com/ work by memorizing hashes and then returning the memorized result! That’s why there’s that “Known MD5 sums : 45,538,019” at the bottom of the gromweb page — there are 45 million hashes they can reverse, because that’s how many they know (and every time you give them a new thing to hash, they now know how to reverse it).

    Of course, you can use password guessing techniques to attempt to brute force a giant list of hashed passwods — see “crack”. But if your password is hard to guess (being longer than 8 characters helps here) it’s unlikely to be guessed by such tools while you’re alive.

    1. Shamus says:

      Thanks. Corrected.

      1. James Schend says:

        Might also be worth mentioning that when you “reverse” a hash, you might actually receive a different password than the original– it’s possible for multiple passwords to share the same hash, since the space of possible hashes is much smaller than the space of possible passwords.

        1. pneuma08 says:

          True, but highly, highly unlikely if the hash function is devised well.

          1. Stephen says:

            If by “highly unlikely” you mean “completely possible”, then yes.

            A hash function maps an infinite set of inputs to a relatively small set of outputs. There are an infinite number of passwords that map to the same hash.

            It doesn’t matter how good the function is, that’s always true.

            The trick is that it’s very hard to find another password that matches – for a good hash function you just have to try a lot of inputs until you get lucky; for a bad one you can generate them faster than chance (that’s the flaw in MD5).

    2. GTRichey says:

      Fun fact regarding password length. I tend to use a password manager for anything that goes beyond forums etc. I was shocked when a government website here requires that your password be between 6 and 8 characters. Oddly enough this same government website requires more information to establish your account than it takes to get a driver’s license or passport (or other similar document that effectively establishes your identity generally beyond any doubt). Why anyone would restrict the security on something so hard to establish to be easily brute forced is just insane. Interesting tidbit.

      1. DanMan says:

        What I never understood is why is there an upper limit on password length? Is it just that the hash is going to be so long and the database field is configured to only hold a certain size? Is there some other reason?

        I understand that “123!@#abc” is a more secure password than “abcdefghi” because the ability to have numbers and special characters increases the number of combinations exponentially, but “123!@#abc” is less secure than “thisisapasswordthatiamprobablygoingtoforgetbecauseitssolongandhardtoremember”

        1. Jabor says:

          The password hash is always the same size regardless of the input. Personally, I am extremely suspicious of anything with a maximum length, as well as anything which disallows characters like “,(,),’,` etc., because it generally means whoever developed that has no clue about security and is probably storing stuff in plaintext.

          1. Abnaxis says:

            I wrote my own hashing code once, and while it is less secure, it can be made much more efficient if you have a maximum character length restriction. You can (in the example with the 6-8 characters above) write three different classes for multi-byte ints, of 6, 7, and 8 bytes each, and optimize them. Doing it with more arbitrary-length passwords requires you to use more generalized data structures, and doing math with that usually requires you to traipse through the password one character at a time, instead of all the bytes at once. Keeping the max under 8 lets you merge all the letters into a single big number and do math with it more quickly.

            At least, that’s how it was in my code. I’m not entirely sure there isn’t someone who’s done it better already.

            I have no idea why some passwords don’t allow non-alphanumeric characters, though I would guess it’s also to simplify the math. While most characters are represented by a byte (8 bits), more than half of that is characters you’ll never find on a keybord. Altogether, there are 62 alphanumeric characters, and you can fit exactly 64 characters into 6 bits.

            1. K says:

              And what would the point be of optimizing password hashes? They are checked against once per login, and need a disk access to do so anyway, which will add at least 3 milliseconds to the process (assuming we’re talking SSD on the local machine, and not the way more common “reliable 5400rpm drive on a remote NAS through an ODBC-connection”, where 250ms is much more realistic). You cannot even measure that performance gain any more!

              Premature optimisation is the root of all evil.

              There is also not the slightest reason to disallow anything but UTF-8 (Unicode). Just transform the thing somehow into bytes (you don’t even need to care how it’s done, as long as you do it the same very time) and interpret it as a number, then calculate MD5 from it. Done.

              1. K says:

                Addendum: And even if you only use the first 8 letters for your hashes (which would be very bad), there is no reason to limit the user. You can just cut off the excessive letters before you do the hashing. For the user, he could input “mypasswordislikethis” or “mypasswo” and both would allow him to log in. It doesn’t make it any more safe, but it is more convenient.

                1. Abnaxis says:

                  I can’t imagine this ever being a good idea. If you are only using 8 letters, you shouldn’t tell the user you are using more. What happens when their account gets compromised, you tell them to change their password, and they only change the last few characters? Not good.

                2. Blake says:

                  Better option:
                  ThisIsALongPasswordButTheInternalAlgorithmOnlyUses8Characters
                  ThisIsAL
                  +
                  ongPassw
                  XOR
                  ordButTh

                  etc etc.
                  Can still be rainbow tabled as easily, but at least it’s got less chance of being an easy password like ‘password’.

              2. Abnaxis says:

                Interpret as a number, which you have no way of knowing the size of and therefore need a math function which takes an arbitrary number of bytes to perform operations, rather than the optimised ones that already exist in the [insert language here] library.

                And remember we’re talking about government servers here, which are likely cheap, obsolete, and at the same time could see way more traffic than a server run by a private company (if there is a disaster or something).

                I’m not saying this is the way things should be done, and my own personal code didn’t work this way. I’ve just noticed, while writing my own code, that there is a significant increase in computation time when you go from a set number of bytes per password to an arbitrary length password. Is that increase significant compared to disk access time or is it enough to clog a fifteen year old server when there’s a SARS outbreak? I dunno, but I can see the justification for optimizing things just in case. Especially since the code is easier to write.

                1. Blake says:

                  Except you don’t want a password hashing algorithm to be quick.

                  Quick algorithms mean quicker brute force attacks or rainbow table generation.

                  Ideally you want to run the hash a secret number of times on itself (say 26), now if the hacker wants to generate a rainbow table (even if they know what steps your algorithm takes, it’ll take 26 times longer to generate the table.

                  The time it takes to run the algorithm 26 (or even 100) times will still be far quicker than the user will notice, will making a huge difference to would be hackers.

        2. Heron says:

          When sites have a small maximum password length, they’re inevitably storing it in plaintext on some ancient mainframe — and this is especially likely if their passwords are case-insensitive.

          Yes, it happens.

          1. Klay F. says:

            Its funny because in the top-right corner it says, “This site is secure”.

            I see what they did there….they’re trolling, they MUST be.

          2. WJS says:

            Strictly speaking, case-insensitive passwords doesn’t have to mean that it’s stored in plaintext. I can’t think of why you would want to, but you could just call toLowerCase(password) before hashing and it wouldn’t matter what case you typed the password in.

            Now one trick that I’ve seen that does seem like a good idea is what Facebook (IIRC) does with their passwords: they calculate both hash(password) and hash(invertCase(password)), and if either match, then the login succeeds. Unlike total case insensitivity, this has merit if the user accidentally tries to log in with capslock on (websites can’t warn you “hey, you have capslock on, you really wanna type your password like that?” like desktop OSes do).

    3. X2-Eliah says:

      So basically there is a site that lets users enter a passwrod to see how it hashes, and then it stores the password and the hash by the MD5 algorithm?

      That means if, for example, I used a ‘ponyPower’ password on my accounts – many of which use MD5, if another random person entered ‘ponyPower’ into their hasher, then that site now technically has my password AND it’s hashed look?

      What’s stopping the crackers to compare the hashed strings they grabbed with the hash-password matched pairs stored on that site?!

      Sure, for most there will be no correlation, but on the other hand – there are 45 million sums already deciphered, just like that – a lot of which might as well be legit passwords.

      1. Rosseloh says:

        That’s what the salts are for.

      2. psivamp says:

        As Rosseloh said, “That’s what the salts are for.”

        Also, if you make ANY change to the hashing algorithm, the entire lookup table is made useless. The table can only be used if you’re security is made by a lazy incompetent who didn’t modify the algo or salt it. And you still have to know or guess which hash function he used.

        1. Heron says:

          It takes experts years to be sure their new hashing algorithms are secure — NIST has a contest right now for developing the next hashing standard, with people like Bruce Schneier competing, and even his team has had to make some changes when vulnerabilities were found.

          Modifying the hashing algorithm itself (rather than simply choosing a different standard algorithm) is an absolutely terrible idea, because even if you’re an expert, you’ll most likely be making the algorithm weaker.

          Per-user salting is the way to go.

          1. WJS says:

            Worth noting that for the purposes of rainbow tables, “modifying the algorithm” includes such things as “hashing the passwords 97 times instead of 100”, which shouldn’t have the obvious security issues that “rolling your own hash function rather than using a crypto library” would.

      3. bucaneer says:

        Also, you have other problems if you enter your password in some random site just for kicks.

      4. Klay F. says:

        I hope to God most people don’t use MD5 nowadays, as its almost disgustingly outdated.

        I don’t know the particulars of hashing algorithms, but I never understood why people don’t using something a little more secure like WHIRLPOOL.

        1. Heron says:

          MD5 hashes are 128-bit values and brute-forcing something like that takes a really, really long time. There are some vulnerabilities, but none of them are practical for taking a list of hashes and “reversing” them. Brute-forcing MD5 is still impractical.

          MD5 also has other good uses such as file integrity checking (making sure that the file you downloaded from the server is intact and uncorrupted).

          However, if you’re protecting important things like access to financial data you’d want to pick something that’s more collision-resistant than MD5, like SHA-256 or SHA-512, which are standard secure hashing algorithms.

          What I’m trying to say is that “old” does not mean “broken”, it just means “old” :)

          1. Klay F. says:

            Again, I’m not super knowledgeable regarding cryptography, but wouldn’t an algorithm that isn’t collision-resistant at all be a terrible choice for hashing?

            1. Heron says:

              All hash algorithms have collisions, the only difference is whether we know how to generate collisions and under what circumstances. Yes, MD5 has some collision vulnerabilities, but it’s not as easy as running a program and waiting twenty minutes. The collision attacks on MD5 generally rely on already knowing one plaintext in order to generate another plaintext with a colliding hash — in other words, the hacker would need to know your password first.

              If all the hacker has is the hash of your password, they’re pretty much going to have to brute-force it unless it’s short enough to be in a lookup table somewhere.

              MD5 is not the right choice for everything, but it’s good enough for blogs and forums :)

              1. Eltanin says:

                I somehow missed this post when it came out which is killing me because this is a topic very near and dear to my heart.

                Heron, you are spot on of course. However, my other major pastime in life is to be an annoying nitpicker.

                Therefore, in response to your “Old doesn’t mean ‘broken'” comment, I just have to say: DES encryption.

                Yeah yeah, I know that my comment doesn’t really have any merit. That’s why it’s nitpicking! Yay!

              2. WJS says:

                While it’s true that collision vulnerability isn’t the same thing as reversibility, it’s really better not to use MD5 for anything, even if in some cases it might not be vulnerable. Don’t ask “Why shouldn’t I use this function with known vulnerabilities?”, ask “Why should I?”.

    4. Alexander The 1st says:

      That still leaves the case that a hacker could compare hashed passwords against the known MD5 sums – it slows down the process a bit, but for example, if “password” is a known MD5 hash, that’s…going to make a hacker’s job a lot easier.

      If I’m not mistaken, this is what a dictionary attack is then?

      1. mneme says:

        This is almost what a dictionary attack is; as a dictionary involves guessing passwords in bulk, not known hashes. Grab a dictionary, combine words and check them against your set of crypted strings, then start using typical transformations people perform on words to make them into passwords (eg, letter-number subsitution, letter-word substitution) to make more words. A decent sized salt mean you have to do the work over for every single password — rather than being able to compute a hash and compare it against your whole list (or a sizable subset of same, if, say, you’ve got 25 million hashes and the salt’s only got a space of 256), but get enough passwords together in a pile and a bunch of them will be trivial or easy to guess.

    5. David says:

      According to How To Safely Store A Password, you can literally test all lowercase, alphabetic passwords [SHA1/MD5] which are ≤7 characters in less than 2 seconds. And you can now rent the hardware which makes this possible to the tune of less than $3/hour.

      Best to use a hashing algorithm that takes a long time.

      1. Heron says:

        In fact it’s *not* best to use a hashing algorithm that takes a long time; it’s best to require longer, more complex passwords, and use a good salt. The guy that wrote that page is being quite misleading when he says salts do not help against brute-forcing.

        See, if you don’t salt, then the attacker only needs to do a single brute-force run and then look at your entire database to find matches. But if you *do* salt, then the attacker has to do an entire brute-force run for every password in your database, and that obviously drastically increases the amount of time it will take to brute-force the entire list of passwords.

        I wrote a brute-forcing program in college for an assignment, and I can tell you that brute-forcing nontrivial passwords is, well, nontrivial. Saying “short lowercase alphabetic passwords are easy to brute-force” is, frankly, an invalid argument against md5/sha; it’s just an argument against using short lowercase alphabetic passwords.

        He calls brute-force attacks on md5 and SHA “cheap and effective”, but he’s pretending that you can’t make those attacks too expensive by simply requiring more complex passwords. If you want your password hashes to be secure, don’t choose a slow hashing algorithm; instead, enforce a sane minimum password length and complexity. (In particular, using bcrypt is not a substitute for requiring sufficiently complex passwords.)

        There’s another reason to choose a fast hashing algorithm: how many times have you been annoyed that the server takes six seconds to log you on to your favorite Lawn Care Enthusiasts forum? Slow hashing algorithms just make that problem worse.

        1. Jordi says:

          Obviously there are things that are more important than the speed of your hashing algorithm, but the only real argument for a fast one I can see here is that you get verified faster. But it seems to me that the hashing speed is probably not really a bottleneck here (although you can of course overdo it).

          The obvious advantage of having a slowish hashing procedure is that it will simply take longer to crack a password. If I call the md5 function 10 times, login time is not going to increase noticeably, but I did just make cracking passwords 10 times slower. Why wouldn’t you want to do that? (Of course, having better passwords is more important, but there seems to be no reason not to have both.)

          1. Kayle says:

            There are many uses for hashes, and being slow just for being slow has little appeal. One example is SSL and IP Security, which applies a (cryptographic keyed) hash to every packet; a gateway may need to hash billions of bytes per second for tens of millions of packets per second.

            The way to make a brute force attack difficult is to increase the number of bits hashed, each additional bit doubles the amount of work needed for brute forcing, for human remembered passwords, salting is the traditional way combined with other technical means.

      2. Shamus says:

        I remember in 1999 my company used a program for brute-forcing passworded zip files. (We were hacking our own stuff. It’s a long story.) It had a dictionary it would run through and then begin cranking away at random strings. It started with 1 character passwords and worked its way up. It could do everything in under 5 characters in a couple seconds. 5 and 6 took a couple of minutes. 7 took about an hour. 8 took a few hours. 9 was over a day and 10 was almost a month.

        Now, zips use a very different encryption, but if anything I’d expect them to be LESS intensive. Finding out you can attack all <7 passwords in just 2 seconds blows my mind.

        1. Heron says:

          It’s <= 7 characters in 2 seconds if it's only lowercase letters ;) If you add in uppercase, numbers, and a handful of symbols, the complexity (and time) goes up rather quickly. 7 characters can take hours depending on the character space you're searching.

        2. scragar says:

          In 1999 zip was highly unsecure, with knowledge of the first 10 or so bytes of the encrypted file(most filetypes have a well recognised header with only a few changes, for a source file this would likely be part of a comment) you could find the first half of the generated password hash it used, given the predictable RNG it was then possible to reverse engineer this to a few possible password.

          All this would take about an hour(using the source provided by PK themselves :P) I know because I did it to a file around the same time(I had mistyped the password setting it up and now couldn’t access it).

          In 2001 they tried to stop this by adopting AES as the default encryption, and in 2004 the original system was made read only.

  3. Jabor says:

    Actually, that’s not how salting works.

    The critical point is you add the salt to the password before you hash it – thus the same password, with two different salts, will hash to markedly different things.

    If the attacker compromises your server and can get the password hashes, he has the salts as well (if your salts are more protected than your password hashes, why didn’t you protect your password hashes better?). What salts do is prevent the attacker from brute-forcing the hash once, and then looking for matches in your database – they have to run the brute-force attack separately for each individual user.

    1. Shamus says:

      Oh yeah. I’m going great today. We’re three comments in and I have two major errors in my post. Awesome. Just… awesome.

      1. X2-Eliah says:

        Take all criticism with a grain of salt, you know.

        1. Jabor says:

          Indeed. Crypto is hard, and even smart developers get it very wrong. For instance, even using salted MD5 or SHA-1 for password hashes is bad (though it is better than using unsalted hashes or cleartext passwords, obviously). MD5 and SHA-1 are designed to be fast, which is the exact opposite of what you want in a secure password system. And as computers get faster and cheaper, “fast” hashes get even worse.

          The correct choice is either an adaptive hash (i.e. scrypt) which you tune so that it stays slow and remains a strong hash even as computer hardware gets faster and cheaper, or you use something like SRP

          1. Peter H. Coffin says:

            The important thing for MD5 and SHA-1 is that they’re pretty easy to implement, and therefore they’re pervasive. Even SQL RDBMS systems have the functions thrown into them.

            (Which gets you to the point of being about to pull tricks like being able to write your password authentication as a stored procedure on the database system, that your web-facing systems have execution authorization to but do not have manipulation authority to. Same kind of ring that happens when those web servers have rules built into that that “root does not log into this machine from the network”.)

          2. Matt says:

            Also keep in mind that typically MD5-based password hashing (as well as SHA) is not the same as calculating a simple MD5 hash:

            http://en.wikipedia.org/wiki/Crypt_(Unix)#MD5-based_scheme

            That section describes how MD5 was used as the basis for what was at the time a much more computationally expensive algorithm, with the goal of making it difficult to brute-force acquired password hashes. One version of the code is available here:

            http://cpansearch.perl.org/src/LUISMUNOZ/Crypt-PasswdMD5-1.3/PasswdMD5.pm

            if you’re curious. Note that among other weirdness, the algorithm uses an MD5 library in a loop that iterates 1000 times.

        2. MrWhales says:

          A trans-dimensional Ruts has invaded the comments with his trans-dimensional puns. Damnit.

      2. Nick Bell says:

        This is why I come here, and why I read all the comments. No offense, but the collective intelligence of Twenty Sided is far greater than you alone. And better for it.

        1. Mistwraithe says:

          Resistance was futile. You have been assimilated.

  4. Thaddeus Lange says:

    I’m not a security expert, but I don’t believe MD5 can actually be reversed. It is typically broken by using a rainbow table, which is a basically just a lookup table of precomputed passwords and hashes. You do a dictionary attack once to populate your rainbow table, which will probably take a long time, but then you have a lookup table so you can quickly crack other passwords using the same hashing scheme. Salting the passwords is effective against this attack because it would require the hacker to figure out your salt and compute a new rainbow table (in the case of a single salt for all users) or create a rainbow table for each password(in the case of user specific salts).

    1. Nathon says:

      MD5 is actually broken, but only because it’s possible to create collisions given the original hashed data. That makes it worthless for things like securely signing documents and executables. However, you’re right. The whole point of hashing passwords is that the attackers don’t have access to the original password, so it’s not easy to create a value that collides with it.

    2. MrPyro says:

      This is pretty much my understanding of it as well.

      Hashing algorithms cannot be reversed because they discard data; it’s not like compression or encryption where all the data is being kept but concealed/altered. All MD5 sums are the same length, irrespective of the length of what they are hashing; you can take a hash of any file/string and it will still give you a short string in return: for example

      4778da397d5cb51c157fea1ba1092736 is the hash of a 100Mb binary file*

      c4aa42e361b5ef678e67bbd4908b66b8 is the hash of an 88Gb binary file*

      No way you’re getting the original data back from them: total violation of Shannon entropy

      Given that you can get an MD5 sum of anything, and MD5 sums are not infinitely long, this means that there are lots of distinct data items that will have the same hash (these are generally known as ‘collisions’). This isn’t normally an issue with passwords though; since passwords are so short, there is generally a 1-to-1 mapping between passwords and MD5 sums.

      * real strings taken from a backup procedure I handle for work

      1. bbot says:

        Seconded. You can find collisions of hashes, (which, essentially, is how rainbow tables work) but you can’t reverse them.

        Collisions are the reason why you can’t reverse a hash. Since you can hash an arbitrary amount of data, you can’t prove that any given hash corresponds to an input string. “Dog” may be SHA-1 to 17c5809c8f88df906f58e2cd529a1c9bd7a8db47, but Dog is not the only string which does so.

        Not a big issue with password cracking, since all you care about is the output hash; and because most people don’t have multi-thousand character passwords.

        1. WJS says:

          Heh. I’m an advocate of long password limits, but that to me means 100 or 128 bytes, not kilobyte lengths!

  5. gurugaspar says:

    Nice explanation, I have read a little about this kind of thing, but that clears some of my questions up.
    It seems like all the news I see about this is on video game review sites and blogs. I am sure there are a lot of people who may/will be affected, but don’t even know anything is wrong.

  6. DanMan says:

    Until Quantum Computing is finally figured out, we won’t be able to unhash by brute force. There are simply too many combinations.

    The number of combinations of a good 256-bit encryption would be more than the theorized number of atoms in the universe. WAY too huge to brute force.

    People do know the hash for “Password”, however. That’s why people say “don’t set your password to your name or just ‘password'” It’s not so much that people can necessarily guess it, but that if they do hack in and get your name, they can hash the name and try that as the password.

    1. Klay F. says:

      However with Quantum Computing also come huge advances in quantum crypto. The most obvious example being quantum key distribution. Any attempts to learn about a quantum key will instantly be known.

      There are plenty of ways to encrypt data using quantum mechanic. There seems to be new methods being discovered all the time.

  7. Interesting read. That explains a lot.

    What doesn’t explain much at all is this new Spoiler Warning related thing I drew. That’s because it’s less an explanation, more a prediction or sorts. Like when I say “the sun will rise tomorrow” or “Family Guy will continue to not be funny.

    I forget the point I was making. Enjoy.

    1. Shamus says:

      I swiped that and made it the Splash image for season 5. I hope that’s okay. :)

      1. Awesome.
        I’ve got no problem with that – I kind of figured you might want to anyway.
        In fact I’ll go ahead and say if I draw anything Spoiler Warning related, do what you will with it – based on your show after all.

        I’m just looking forward to the depicted events actually happening – lots of walking so far. Actually, I don’t think the dam breaking can occur, but I’m pretty sure I know what will happen then anyway. In any case, I guess it’s to it’s credit that Fallout 3 put you in basically a straight line to your goal, then did away with walking almost entirely, so there was a lot more opportunity to cause mayhem…

  8. psivamp says:

    Public key encryption operates the way you described hashing. If you have the decryption key, you have an “easy” calculation to do to turn your cryptotext into something that makes sense. If you don’t have the decryption key, you can’t easily reverse the algorithm. Most operate on the multiples of primes. It’s easy to take two prime numbers and multiply them, but it’s not as easy to take a large number and figure out which two primes made it.

    The problem with most encryption schemes is that their security is based on unproven number theory and upon the idea that quantum computing can’t happen. When quantum computing starts to actually work on a real scale, we have to rethink encryption. The other thing about encryption is that basically nothing is secure forever. We try to make things secure for longer than they’re “required” to be secure. A guaranteed-success brute force attack on 128-bit RSA might take a year*, but as long as the information you protect with that encryption is safe to disclose by then, we don’t really care.

    *This time is made up and probably the wrong order of magnitude. I’m not at home so I can’t reference my crypto book.

    1. DanMan says:

      The other problem with encrypting instead of hashing is that encryption does need to be able to come back out in a readable way. As you mentioned, you need a key to do that. People don’t always put their keys in a smart place.

      All locks need a key to be opened. All keys can be stolen.

      If the hacker gets their hands on the key, it doesn’t matter how good your encryption is.

      1. psivamp says:

        Yeah, that’s why there’s so much time spent on the key-exchange and signing stuff whereas encryption has kind of stagnated recently. You can use 512-bit RSA encryption with a bullet-proof implementation but if someone does a man-in-the-middle attack on you and pretends to be your partner the whole time from key exchange, you’re still completely hosed.

        Which is a good reason not to connect to random ad-hoc wireless networks…

  9. The Schwarz says:

    About what Thaddeus said: Rainbow tables are built around the principle of reversing a standard function. The idea is, you make one huge effort for creating a giant table of something-or-other (if you want to know the particulars, there’s always Wikipedia), and after that, every time you want to reverse that function, you can do it in a relatively small time, using the tables you built.

    So if, in this case, MD5 is your “standard” function, then you can have your rainbow tables for MD5 all ready and shining, but if someone makes even a minor change to the function (e.g. adding a “salt”), then even if you KNOW the exact change they’ve done, you’re gonna have to make all the huge effort creating the tables from scratch.

    1. K says:

      It is also important to point out that you cannot make a rainbow table for MD5 for any decent length of passes. An entry will take you around 100 bytes (that’s around 25 letters), give or take a magnitude of 10.

      For just eight letters, you’d need 26^8 entries. That’s (hold on, need a calculator) about 2 GB of data. But if you allow numbers and capitalisation, you’re up to (26+26+10)^8 entries. Now we’re at 2 TB. Let’s just go with 9 letters instead of 8. That’s 56 times more, so 100 TB.

      Now imagine 12 letters, using the full Unicode range, from a,b,C,D,à¶,$,ひらがな and so on. And that is why a strong password helps you, because it will not show up in a rainbow table any more (except for collisions, which is another topic).

  10. krellen says:

    All of this ignores the easiest way to crack a network: be inside it. If I had a dollar for every time someone at my workplace has asked me “Do you want me to give you my password?”, I’d buy myself Portal 2.

    I’m sure most of your readers know this, but it bears repeating: Never, ever, ever, ever tell anyone your password.

    1. MrPyro says:

      Ah yes, good old social engineering.

      Swiped from a friend of mine on Twitter:

      ‘The field of Information Security captured in a single symbolic image. http://i.imgur.com/rGtgr.jpg

      1. Zak McKracken says:

        Thank you sir, you just make me laugh, a lot!

      2. Peter H. Coffin says:

        And never, ever underestimate the power of a Ben Franklin or 2318008 attack.

      3. WJS says:

        That works on several layers, because it would actually keep most people out.

    2. Joe Cool says:

      My password is 12345.

      1. krellen says:

        I’d make a crack about idiots and luggage, but I’m pretty sure we’ve all already heard it.

  11. Nick-B says:

    I thought I’d read somewhere that the recent “hack” that got into SOE was actually perpetrated a day or two BEFORE the PSN hack (which itself occurred mid-April). They only took until now to realize it was also compromised after an internal audit from the very public PSN intrusion.

    EDIT: The following was an update sent out regarding the “new hack”

    “While the two systems are distinct and operated separately, given that they are both under the SONY umbrella, there is some degree of architecture that overlaps. The intrusions were similar in nature. This is NOT a second attack; new information has been discovered as part of our ongoing investigation of the external intrusion in April.”

    Seems the data was “about 10,700 direct debit records of certain customers in Austria, Germany, the Netherlands, and Spain” were lost, apparently from “an outdated database from 2007.””

  12. Zak McKracken says:

    Alright, so I’m not an expert, but Shamus is saying that an important part of security is making sure noone has access to the source code.
    Now, as far as I’m informed, that translates to “security by obscurity”, and it’s a very faulty concept. The most secure algorithms and also the source code for them, are public! The security should not be in “you don’t know what I’m doing” (because when I eventually find out, I’ll have the key to everything), but “here’s what I’m doing, and I know, and every cryptographer on earth can verify it, that this procedure is irreversible”.
    That is why GnuPG is secure, and Skype isn’t. Because if you want to compromise GnuPG (or PGP — the algorithm is also published properly) you gain nothing by knowing the code, because the secret is in the data, and it cannot be recovered except by knowing the secret. And that cannot be recovered without knowing the passphrase for the secret. While Skype … well, they claim to have a secure algorithm, but they won’t let anyone look at it, and they have deals with government agencies, so … not secure.
    Does anyone know more? I’d be worried if keeping the source to your account management software secret was a requirement.

    1. K says:

      If you know how the salt is calculated, you still cannot use it without the password in plain text, because it is added to the password to generate the hash.

      The hashes you have cannot be compared (two users with “password” as password will have different salts, and therefore different hashes). And if you know the salt, you still cannot use that to do the comparison, because the hashing algorithm only works one way. It looks like obscurity, but in reality, the company could actually tell people what they use as a salt.

      Or let me phrase it result-oriented: Every salt requires the hackers to write a rainbow table for that explicit salt.

      1. WJS says:

        Typically, the salt is unique per-user and stored in the database (thus the attacker will have it). This is actually a good thing, because the alternative is a single salt for all users. If the attacker gets his hands on that, then he can build a table for the whole site, as opposed to having to brute-force each and every password individually.

    2. CrushU says:

      This.

      Security by Obscurity isn’t a good thing… I think what Shamus was going after is that if you have code to generate a hash, and someone else gets that code, it becomes easier to unhash. (“easier” in this case means order of Years instead of order of Decades. Or some such.)

      The best thing, I would think, would be for the code to have user input for the salt to use. Thus, the information for the salt wouldn’t be in the code at all, so the full algorithm would never be known.

      1. K says:

        Nope, that doesn’t help. All you’re doing is asking for two passwords instead of one. The idea behind salts is simple, really: Instead of inventing your own (complex and possibly unsafe) hash function, you use one that’s good, and salt it with your own value, just to make the function’s results unique.

        The obscurity effect isn’t the target, it’s a beneficial side-effect.

      2. krellen says:

        Security by Obscurity is bad when it is your only form of security. Security by Obscurity is good when it is one in a suite of security protections.

        1. Zak McKracken says:

          Is it? My information may be slightly one-sided, but still: Of course putting the hashed passwords in public would not be a good thing, and the same holds for the salt or whatever data you have. But if your security (or parts of it) rely on noone knowing the code it is implemented with, then you’re in danger, for two reasons:
          1. Bad security is worse than none, because you’ll still rely on it, to some degree.
          2. If noone knows your code, then how many people have tested it properly? Checked it for errors and vulnerabilities, proven that the algorithm itself is safe, etc.? The most secure way of doing things is choosing something that has been published, peer-reviewed, found good, replicated, still found good, picked apart by the whole community, had a bug or two squashed and finally found to be solid.
          3. Any software maker that says “this is sooo secure we can’t even tell you which method it uses” is either stupid or trying to screw you, or both.

          Of course, it is an obstacle for a potential hacker if he needs to find out which software you’re using in the first place. So if you have several choices that satisfy your demands, then it might slow attacks down if you do not announce which implementation of which algorithm you’re using, but that is not something that can actually keep attackers out if they know what they’re doing. The attackers who don’t know what they’re doing are not the ones you should adapt your system to.

          The above is roughly what I know about the obscurity thing. Feel free to complement my knowledge.

          1. andy_k says:

            I will complement your knowledge – it seems good :)

            This reflects the current state of the art thinking in terms of security of IT systems. Bruce Schneier, a widely respected and outspoken IT security guy, basically preaches this approach and most serious players accept this in theory if not in practice.

          2. K says:

            This is correct, but you ignore one last possibility. When you arrive at the point where you use strong hashing, salting, enforce complex passwords, prevent brute forces through time-outs and whatever else you can imagine, you don’t tell the hackers what you do. Technically, that is obscurity, but it’s just the practical way to do it.

          3. krellen says:

            The only thing better than a Fort Knox-like facility is a Fort Knox-like facility no one knows about.

    3. James Schend says:

      Security needs to be layered. Obscurity is a perfectly fine layer– for example, it’s a lot harder to crack a password (on a weak/crappy system) when you don’t know what hashing algorithm was used than it is if someone flat-out tells you it uses SHA1. Hell, the most secure system would be one people don’t even know exists, and that’s all obscurity.

      The only problem is when people use it as the *only* layer of security. “Oh nobody will find my admin page, because the URL is just a random string of gibberish”. Sure, then you IM or email it to your buddy.

    4. Shamus says:

      Oh, I agree that obscurity by itself is TERRIBLE security.

      Of course, in my example you only needed to protect the source because I was adding the salt in the wrong spot. (The dangers of not knowing what you don’t know. In my defense, I would have hit the books if I’d been writing actual software instead of a blog post.) Done properly, the source wouldn’t matter.

      1. Zak McKracken says:

        Obscurity in the form of “you don’t know my salt” I can understand. Obscurity in the form of “noone can know the source code” I don’t
        Also, as far as I understood (this “salting” thing is new for me), if you have a unique salt for every user, it doesn’t matter if the potential attacker can recreate it, that will still not help him recover the passwords, because he’d still need new rainbow tables for each salt.

        1. anna says:

          Zak, that’s exactly right, and is exactly the point of salting each user’s password. The salt defeats rainbow tables, since the table would have to be generated with that particular salt in mind. In Linux, at least, the salt is different for each user, and readily available to an attacker who compromises the shadow file. Here is an example password hash from Linux:

          $1$yc8OZf4G$cYVYj0mqv/ZWRmGf4eukp0

          ‘$’ acts as a field separator, so the data in this is:

          1
          yc8OZf4G
          cYVYj0mqv/ZWRmGf4eukp0

          1 tells you which hashing algorithm was used (1 is md5). The second field is the salt, and the 3rd field is the hash of the actual password+salt. This is still brute force-able (given enough cycles), but it is resistant to a rainbow table unless you just HAPPEN to have a rainbow table for the salt yc8OZf4G. Generating rainbow tables for every possible salt would be… let’s see.

          8 character salt * 62 possible characters (I think they are alphanumeric only) * 30 GB (a lowballed figure for a rainbow table for md5 that goes up to about 16 characters, and has fairly short chains so that lookups are relatively fast) = 14.8 TB

          So, you need some serious disk space, not to mention the time it would take to generate all those tables initially, to break a salted md5sum. Interestingly, though, algorithms with larger keyspaces, like sha1, don’t need larger rainbow tables.

          Also, 14 TB will be entirely plausible in the near future. So, these attacks may become more common, and longer salts may be needed. I believe sha-512 uses a 16-byte salt, which would mean an attacker would need 30 TB of storage to store their rainbow tables.

          1. WJS says:

            You have a major error in your calculation here; the number of possible salts isn’t 8 * 62, it’s 62 ^ 8. Multiply that by 30GB and you get about 6.5 petabytes. Increase your salt length to 16 bytes, and you’re looking at over a million yottabytes. Safe to say that won’t be within anybodies ability for a while…

    5. dman says:

      I’d like to observe that the entire concept of passwords is – at its core – “security by obscurity” :-)

      The entire security layer is based on me knowing something the intruder doesn’t.
      Unlike having a physical key or performing biometrics (assuming either of them actually work as promised), having a password or a keycode or a PIN is all just about having a secret.
      Secret codeword, salt or secret hashing algorithm, lose either and a layer of security is ripped open.

      If the “security by obscurity is no security” mantra is taken to its logical conclusion, then it is saying that “having a secret that is no longer secret once someone knows it” is useless.
      Which is pretty much a tautology … but if applied to passwords … would successfully undermine the whole idea.
      And – maybe it does, when you think about it.

      So, security by obscurity has its place – so long as it is sufficiently obscure :-B

      1. WJS says:

        Most physical keys aren’t really very secure. A typical lock might have 5 levers with 7 possible depths each. That’s under 17 thousand unique keys (there are probably many people in your town with the same key as you unless it’s a really small town). An experienced locksmith (or “locksmith”) can probably guess the code just by looking at it, and that’s enough to have a duplicate made. That’s an entirely different attack model, obviously, but the idea that physical locks are substantially more secure than encryption is just fantasy. (Although not being internet-facing obviously helps a great deal, not all encrypted systems are online either)

    6. Alexander The 1st says:

      “That is why GnuPG is secure, and Skype isn't. Because if you want to compromise GnuPG (or PGP “” the algorithm is also published properly) you gain nothing by knowing the code, because the secret is in the data, and it cannot be recovered except by knowing the secret. And that cannot be recovered without knowing the passphrase for the secret.”

      So…That just means hackers go straight for the code, find out where it accesses the passphrase, trace it through your code to find the secret, and then use the secret to access the data. A professional hacker won’t be stalled long by this. If they literally can’t access the source, then they don’t even know how to start. If they don’t know you how you start the process, they don’t know where to start.

      If you want a truly secure system, you make it difficult for the hacker to even start the process. No matter how secure your system is – it’s a matter of time; the more you can force it to take longer to hack, the better.

      1. anna says:

        Except that’s not how it works. Your ‘secret’ is not stored anywhere in the code. It is stored in the brain of the person using gpg – the passphrase that decrypts their private key.

        The reason that open standards and source works better for security is this: attackers can still see the closed source code – they may have to disassemble the binary, but it is very possible. If everyone can see your algorithms, though, everyone can help find and point out the flaws in them – before the attackers. This is, more often than not, exactly what happens.

        To sum up the principle differently: I can only create an encryption algorithm that is as smart as I am; my enemy may be smarter. So, the more smart people I show my algorithm to, the better chance I have that my algorithm is strong.

  13. evilmrhenry says:

    I’m throwing my hat into the Hashing explanation ring.

    To start, you can just store the password in plain text. The downside is that any hackers will end up with everybody’s passwords if they ever break in. Don’t do this.

    The next step up is to hash the password. The weakness here is that Rainbow Tables exist, which are simply a mapping from hash values back to potential passwords. (Made by computing the hash values for all strings up to N characters, then storing the results.) This results in a little more work for hackers, but isn’t that much more secure.

    The next step up is to salt the passwords. Create a random string, store it on the site, then store the concatenation of SALT and password. This is significantly more secure than a plain hash, but the hackers can still make a custom rainbow table for the site, even if they don’t know the salt. (How? Create an account on the site before hacking it, then brute-force it, using the known password to help.)

    The next step up is to create a different salt for each password. That means you store SALT and CONCAT(SALT, password) in the database for each user. This is the first level of security where rainbow tables become useless. If a company is using a level of security lower than this, I’m willing to call them incompetent.

    This can also be enhanced with a global salt as above, so CONCAT(SALT, GLOBAL_SALT, password) is stored instead. This doesn’t increase real security all that much (see above), but can make things even more annoying to hackers, and isn’t that hard.

    If done properly, the only thing a hacker can do with a stolen user database is to brute-force each password individually. If the password is “12345”, the hackers will still find it, but secure passwords will require significant processor time.

  14. evilmrhenry says:

    You could derive the salt from some other source, and you can do more than just add it. Maybe use the timestamp of when the account was created, and make another MD5 hash out of that. Then combine it with the hashed password in some unexpected way. (Like, more unexpected than simply “add them”.) Maybe take the resulting answer and crank it through the same mathematical process a few times. Just make it a process you can reverse when the user logs in and you have to compare the passwords.

    Notes: Grabbing other data for use in the hash is just a variation on using a salt.

    Using combining methods other than concatenate is technically security through obscurity. It doesn’t increase real security, but can be nice to have when used in addition to real security.

  15. Amarsir says:

    Moving back to the issue of credit cards though, a one-directional encryption wouldn’t be useful. The whole point is for them to pass on the card so you don’t have to re-enter it. That certainly doesn’t justify plaintext, but it means hackers just need to figure out the technique not reverse the unreversible.

    1. Jabor says:

      Ideally you’d use some form of public-key encryption for this, with your webservers knowing how to encrypt credit card numbers, but only the payment processor knowing how to decrypt them again.

      I’m not sure whether any payment processors actually support this sort of thing, but it does seem like an obvious way to go.

      1. Aquarion says:

        Bit late to this, but for the readers in the future:

        All the payment processors I’ve used for recurring credit card systems tell you not to store the details you get from the user for the card (Number, expiry, etc.) but forget them once sent (we store the last four digits for “Oh, it was *that* card” identification purposes. This is because the first four digits identify the bank it came from, which is less useful). Then when you submit the first payment they give you back an abstract ID, which you then use to authenticate future things. All over SSL.

        This is kind of medium-tier stuff. Low tier you redirect your users to the Payment Processor’s site and they redirect you back afterwards (a-la paypal). Medium tier you do this kind of thing. Higher up you can start doing it more directly.

  16. MrWhales says:

    So instead of all this nerd-gasm nonsense. I know this would be rather silly, and possibly fool-hardy in large-scale, to implement, but what if a website were to forget this hashing and salt nonsense, and took a password and coded it.(I can’t remember if that is the word for it, and sorry if that is a pun to anyone) Like A’s are changed to 100, and a’s are 101. I’m sure you could spice up the numbering, take the before example and change the respective numbers to 4 and 6554, maybe. And then store those. Then have a seperate script run through that when password attempts are made. I still don’t think i would give passwords back on the page, email them like it is now, but surely this system could work with tweaking?

    1. Jabor says:

      If someone compromises your database, you have to assume they’ve got your source code as well. If having the source code makes it trivial to retrieve passwords, then it’s a weak password system and there are plenty of good ones out there.

      Ideally, your password verification should be easy to check if a password matches, but very difficult to find a correct password given just the verifier. Actually, it should be moderately difficult to verify (password verification isn’t generally a bottleneck, so you can slow it down with significantly impinging on performance), because that makes it all-but-infeasible to brute-force.

    2. LintMan says:

      Basically, you’re saying they should have a secret code that they store the passwords in. The problem with this is that it purely relies on the secret code remaining secret: “security through obscurity”.

      Even if your encoding technique remains a secret, the code can still likely be cracked. Particularly if the hackers have created a few acccounts with known passwords. And in general, code breaking is a thing computers have excelled at since the days of WWII.

    3. Amarsir says:

      Another problem with a custom algorithm like that is how unique the result ends up being. As stated, sometimes different passwords can provide the same hashed value but it’s a low occurrence. However, let’s take a simplified version of that algorithm:

      A = 11
      B = 12

      K = 21

      n = 111
      o = 112

      Now I use the algorithm to change characters to numbers and concatenate them, and the result is “112111”. Is that a “AKA”, or “on”? In truth it would match both. Now you can say you’d pick harder numbers, but you have to consider all possible combinations too, and you don’t want to create a situation where there are hundreds of passwords that match because not only is it irreversible, it’s thereby easier to brute force.

      While there’s value in “security by obscurity”, most custom-made algorithms are not as secure as the major ones. So the best technique is a mix of both.

      1. MrWhales says:

        Well, there a 100 numbers in each digits, surely something could be worked out to fit the allowed characters in there. And /maybe/ have your sequence-codes change every-so-often, randomly ideally i’d think. The idea for random switches would be to throw off anyone who breaks in and just watches to see how things change and work it back that way.

        I do realize this is a somewhat weak system the more i think about it, but it would be a rather interesting “experiment”, have a handful of sites switch to some system like this, and see if it acts as a more meta way of salt. After all, even if they ended up resolving brute-forcing their way through, theres a chance they use the wrong method and have to do the other(s). It’s more of a delay, but i’m sure that anybody will agree that if you have more time, you could close the holes in your system and get passwords changed.

        1. Alexander The 1st says:

          It could be a potential opening encryption, then salt and hash as second pass over it, then hash overtop of that, perhaps.

          Then, it’s difficult to know when they have the right password.

          To solve the 112121 stuff, you would just use a leading zero in the database per small character – i.e. “010” instead of just “10”. Then you can use “111”, and the only rule is to parse by each three characters. Also, don’t formalise a standard for this that all systems would use – let it be determined as an original setup per company. As this would be something that would require it to always be standardised per database, require that the source code that accesses it to hold the hardcoded values of translation, INSTEAD of putting the values in the database.

          Now, when they *do* hack you table, and even if they DO hash it, they are left with an extra level of protection which, if not worked around by looking at the source, can only be used to hack that particular service.

          EDIT: Depending on database size, you can do more by making them larger values.

    4. Aldowyn says:

      That’s a form of encrypting, I think. The problem with that is you basically have a “key” in storage, and they only need to find each character once in a single password to figure out what it is in any other password.

      You could basically do something that follows the same principle as the individual salting, but that would require a different key for every user, which would be prohibitively inefficient A salt is, what, a byte? The key would be a heck of a lot more than that, per user. Hashing and salting is much faster and works better anyway.

      The point is hashing and salting, done right, isn’t that hard to do and ridiculously hard to crack. Case in point: All the hackers got were the hashed passwords, so if they were WELL-hashed passwords, they don’t.

      If I’m wrong, please tell me. I’m not a programmer, so I’m not quite sure if I screwed something up, and I’d appreciate being corrected.

      *edit* Not a programmer YET.

  17. RCN says:

    That was an interesting read.

    You did better than you give yourself credit. I really just understand the basic of basic of programing, but this made sense to me.

    Good to know that the passwords aren’t that up in the air for picking. But if Hashing is so effective and nearly impossible to reverse how come so many World of Warcraft accounts and passwords are stolen?

    1. Shamus says:

      The passwords are hashed on the server, but of course the end-user enters it in plain text. In aggregate, users are careless, clueless, and reckless. Most password theft is done by attacking the user’s machine and nabbing the password when they type it in.

      1. mneme says:

        Oh, yes, how could I forget–key loggers are of course a very useful tool (for criminals).

      2. Zak McKracken says:

        That’s something I’ve been wondering about:
        If I enter a password for a website or something, isn’t the password also being transmitted unencrypted? As long as the site isn’t using https, that is.
        So the classical man-in-the-middle would just be able to snatch it, wouldn’t he?

        1. Shamus says:

          To my knowledge, yes.

        2. MrPyro says:

          Yes, when you enter your password into a web-form, it is sent as plain text; the hashing will occur on the server side. As you said, https (or the SSL encrypted version of whatever protocol you are using, like POP3S, IMAPS, etc) will protect against this, but otherwise that information is available to sniffers.

        3. Abnaxis says:

          I heard on the radio recently, that there’s actually a popular iPhone app for that. You take it into an internet cafe and anytime someone logs into a site while not using an hhtps connection, it will automatically pick it up and store their username and credentials. Forgot the name of it, though…

        4. Aquarion says:

          Yup, exactly that, which is why logins to anything important should happen over HTTPS.

          There are some neat systems that use one-time salts, sessions and javascript to hash passwords on the local browser before they get sent, but in general it would be both more reliable and quicker to buy a cheap HTTPS cert and force logins (at the very least) to go though that.

          For more stories on how this is exploited, google firesheep.

    2. mneme says:

      I don’t know of WoW ever getting hacked.

      That doesn’t stop people from brute forcing or social engineering a passowrd out of someone — or doing what people tend to fear happening with things like the PSN issue — cracking passwords and then using them to hack other accounts by the same users on other sites.

    3. Dee_Dubs says:

      “Hi, I’m Mike from Blizzard support. We think there is a problem with your account and we need you to verify your username and password.”
      Or something to that effect. You may laugh, but there are a surprising number of people who would fall for something as simple as this.

      You can make the data you store as secure as you like, but there is almost always going to be a weak link in the form of a user. They write their password down or give it to someone who shouldn’t have it, and next thing you know your account is in someone else’s hands. Social engineering is so much more effective than a brute force attack.

      1. K says:

        They’ve gotten a lot more clever than that obvious one.

        “Dear Customer, for reasons of security, we ask that you review your account details / your credit card has expired / … [link]”

        And then the link points to something like worldafwarcraft.com (notice the typo) and has an identical copy of the login page on it, which of course works. They just snag the password you type in and ignore your other inputs, and then log into your real account with that.

        Simple way to not fall for it: When you get a mail, go to the webpage by using your own bookmarks. Never trust links in a mail, anyone can make a link for anything.

    4. Khizan says:

      Generally speaking, when somebody from WoW or Rift or whichever internet game says “My account got hacked”, they’re using that as shorthand for “I did something stupid and compromised my account security.”

      Sometimes it means they fell to a phishing scam(fake emails, tells from “BIizzardstaff” sending them to “battIe.net”, etc). Sometimes they’ve been infected with a keylogger that catches their account name and password as plaintext when they’re entering them. Sometimes it means they’ve used the same name/password combo elsewhere(wow fan site, etc) and that site wasn’t secure. Sometimes it means they bought gold or used a powerleveling service

      What it pretty much never means, though, is that their account was truly “hacked”.

      1. RCN says:

        Yeah, that explains a lot. I did receive a lot of e-mails like this regarding my account but they’re so obvious I never thought they were actually successful.

        Although my sister did follow one of those links once (she also has an WoW account), but even then she got a security warning that it might be a phishing site so she backed away and asked for help.

        1. WJS says:

          If those emails never worked, they wouldn’t keep sending them. The same with annoying adverts; someone, somewhere, is clicking on them. If only we could find him and make him stop.

  18. andy_k says:

    edit – this pretty much paraphrases evilmrhenry’s comment. sorry.

    When you generate a hash of the password, you cannot get back to the original, because as Shamus pointed out, MD5 is generating a number that is 38 digits long based on you hash. We will call this number eleventy hajillion, or H for short. Since there are infinite possible passwords (and infinity is bigger than H) then, in theory, some passwords will end up with the same hash. This, in the jargon, is called a collision. But since H is really quite large, the number of passwords you have to have to get a collision is a number that is only slightly smaller than H, and so billions of billions of billions.

    If you restrict password length, to say, 1000 characters typable on a keyboard, you can probably make the risk of collision so close to 0 as to not notice it. So it is not a valid risk for a real hash. As hashes get larger (MD5 is 128 bits, SHA allows up to 512 bits) the possibility of collisions gets smaller.

    ‘Rainbow tables’ of passwords and their corresponding hashes allow you to look up the password if you have the hash. This is a trade off – you hash every password you can think off and stick it in a table. Then, when you get the hash you look to see if it is in the table, and if so, you have the password. Yay! (Or Boo if it was your password and you didn’t want people to know it)

    This is why you are encouraged to use a password that is long and complicated. Making a rainbow table for MD5 using between 1 and 4 alphabet characters only takes a few minutes and is only a few hundred meg long, so is very easy. 8 characters pushes the table into gigabytes, and hours to generate, 16 into petabytes, with days to generate, and so on.

    By adding a salt to the password, even if you advertise what the salt is, all existing rainbow tables (and there are a bunch on the intertubes) are less useful since they were generated without the salt added to the password. And that is a good thing for the user (bad for the crackers.)

    Edit: As evilmrhenry pointed out above – custom salts on a per user basis make life even more difficult.

    Another simple way to make life even more difficult for intruders is to take a hash of the password (with a salt added to the password) then take a hash on that.

    Now generating rainbow tables becomes a tad bit harder, the rainbow tables will take twice as long to generate (since you have to hash everything *twice*) and the resultant rainbow table will only be useful for your system.

    At this point, even if the user database is compromised, considerable effort will be required to compromise passwords. Ofcourse, by this time the user database has been compromised, so it may be a little bit late to do much. One case where this is useful is for ensuring that someone is who they say they are, since even if the user databsase is compromised the password will remain relatively difficult to crack.

    In this case though, using a public / private key system might be a better fit though.

    1. WJS says:

      I really doubt that eliminating collisions has anything to do with password length limits. Password collisions aren’t that big a deal. If someone is brute-forcing a site, they’re going to try my actual 20 byte password before a hypothetical 20MB one that happens to hash to the same thing.
      As I note above, you probably have the same front door key as several other people in your town, but that isn’t considered a security risk because there’s no attack associated with it. (I don’t consider a guy walking down the street trying his key in every lock hoping it fits a serious threat)

  19. Orteil says:

    Crypto looks like an interesting topic (it better be, being what launched computers and the Internet during World War II). When I was in high school, dabbling in PHP, I wrote a small text MMO; the user passwords were stored as raw strings. I know this is awfully irresponsible but I couldn’t resist peeking in the database to see my friends’ passwords. Of course, they knew me well, so half of them were insults. Good times…

    Could you post something on cyphers too ? Like, reversible hashes with a key and stuff. I’ve had to write my own cypher algo for a game project recently (to store the game levels), and I’d love to learn how bad were the mistakes I made.

    1. Alexander The 1st says:

      “I know this is awfully irresponsible but I couldn't resist peeking in the database to see my friends' passwords. Of course, they knew me well, so half of them were insults. Good times…”

      And the other half you used to access their Facebook accounts today, right?

  20. Eltanin says:

    This whole thread makes me happy. Yay for security wonks!

    That is all.

  21. froogger says:

    This post and all its relevant replies thrills my left brainhalf to no end. It’s typically one of my favorite things about this blog – geekpower!

  22. Joris Wildenbeest says:

    I wish I didn’t know this: the other day I had forgotten my Nintendo Club password and clicked the “forgotten my password” link. Voila, I get an e-mail with my password in bright bold purple letters. Good to know Sony isn’t alone in terrible security measures.

    1. Alexander The 1st says:

      Er, Nintendo’s security is worse then. Sony Online actually had *hashed* passwords, which wouldn’t allow a recovery.

  23. WJS says:

    I just want to say that I love reading Shamus talk about stuff even when I actually know the topic a lot better than he does (he obviously screwed this one up pretty badly, but it was still a fun read and still close enough to give a layman an idea about the basics of authentication – particularly how a website telling you your password if you forgot it is a bad bad bad thing)

Thanks for joining the discussion. Be nice, don't post angry, and enjoy yourself. This is supposed to be fun. Your email address will not be published. Required fields are marked*

You can enclose spoilers in <strike> tags like so:
<strike>Darth Vader is Luke's father!</strike>

You can make things italics like this:
Can you imagine having Darth Vader as your <i>father</i>?

You can make things bold like this:
I'm <b>very</b> glad Darth Vader isn't my father.

You can make links like this:
I'm reading about <a href="http://en.wikipedia.org/wiki/Darth_Vader">Darth Vader</a> on Wikipedia!

You can quote someone like this:
Darth Vader said <blockquote>Luke, I am your father.</blockquote>

Leave a Reply to Nick Bell Cancel reply

Your email address will not be published.