If you haven’t reinstalled recently, chances are you’re using MD5-based passwords. However, the password hashes you find in /etc/shadow look nothing like what md5sum
returns.
Here’s an example:
/etc/shadow:
$1$J7iYSKio$aEY4anysz.gtXxg7XlL6v1
md5sum:
7c6483ddcd99eb112c060ecbe0543e86
What’s the difference in generating these hashes? Why are they different at all?
Just running md5sum on a password and storing that is just marginally more secure than storing the plaintext password.
Thanks to GPGPUs, a modern gaming rig can easily try 5 billion such passwords per second, or go over the entire 8-character alphanumeric space in a day. With rainbow tables, a beautiful time–space tradeoff, you can do pretty much the same in 15 minutes.
MD5-crypt employs salting to make precomputational attacks exponentially more difficult. Additionally, it uses stretching to make brute force attacks harder (but just linearly so).
As an aside, these techniques were used in the original crypt from 1979, so there’s really no excuse to do naive password hashing anymore. However, at that time the salt was 12 bits and the number of rounds 25 — quite adorable in comparison with today’s absolute minimum of 64 bits and 1000 rounds.
The original crypt was DES based, but used a modified algorithm to prevent people from using existing DES cracking hardware. MD5-crypt doesn’t do any such tricks, and can be implemented in terms of any MD5 library, or even the md5sum util.
As regular reads might suspect, I’ve written a shell script to demonstrate this: md5crypt. There are a lot of workarounds for Bash’s inability to handle NUL bytes in strings. It takes 10 seconds to generate a hash, and is generally awful..ly funny!
Let’s first disect a crypt hash. man 3 crypt has some details.
If salt is a character string starting with the characters
"$id$" followed by a string terminated by "$":
$id$salt$encrypted
then instead of using the DES machine, id identifies the
encryption method used and this then determines how the
rest of the password string is interpreted. The following
values of id are supported:
ID | Method
-------------------------------------------------
1 | MD5
2a | Blowfish (on some Linux distributions)
5 | SHA-256 (since glibc 2.7)
6 | SHA-512 (since glibc 2.7)
Simple and easy. Split by $, and then your fields are Algorithm, Salt and Hash.
md5-crypt is a function that takes a plaintext password and a salt, and generate such a hash.
To set a password, you’d generate a random salt, input the user’s password, and write the hash to /etc/shadow. To check a password, you’d read the hash from /etc/shadow, extract the salt, run the algorithm on this salt and the candidate password, and then see if the resulting hash matches what you have.
md5-crypt can be divided into three phases. Initialization, loop, and finalization. Here’s a very high level description of what we’ll go through in detail:
- Generate a simple md5 hash based on the salt and password
- Loop 1000 times, calculating a new md5 hash based on the previous hash concatenated with alternatingly the password and the salt.
- Use a special base64 encoding on the final hash to create the password hash string
Put like this, it relatively elegant. However, there are a lot of details that turn this from elegant to eyerolling.
Here’s the real initialization.
- Let “password” be the user’s ascii password, “salt” the ascii salt (truncated to 8 chars), and “magic” the string “$1$”
- Start by computing the Alternate sum,
md5(password + salt + password)
- Compute the Intermediate0 sum by hashing the concatenation of the following strings:
- Password
- Magic
- Salt
- length(password) bytes of the Alternate sum, repeated as necessary
- For each bit in length(password), from low to high and stopping after the most significant set bit
- If the bit is set, append a NUL byte
- If it’s unset, append the first byte of the password
I know what you’re thinking, and yes, it’s very arbitrary. The latter part was most likely a bug in the original implementation, carried along as UNIX issues often are. Remember to stay tuned next week, when we’ll compare this to SHA512-crypt as used on new installations!
From this point on, the calculations will only involve the password, salt, and Intermediate0 sum. Now we loop 1000 times, to stretch the algorithm.
You will now have ended up with Intermediate1000. Let’s call this the Final sum. Since MD5 is 128bit, this is 16 bytes long.
The bytes will be rearranged, and then encoded as 22 ascii characters with a special base64-type encoding. This is not the same as regular base64:
Normal base64 set:
ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/
Crypt base64 set:
./0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz
Additionally, there is no padding. The leftover byte will be encoded into 2 base64 ascii characters.
- Output the magic
- Output the salt
- Output a “$” to separate the salt from the encrypted section
- Pick out the 16 bytes in this order: 11 4 10 5 3 9 15 2 8 14 1 7 13 0 6 12.
- For each group of 6 bits (there are 22 groups), starting with the least significant
- Output the corresponding base64 character with this index
Congratulations, you now have a compatible md5-crypt hash!
As you can see, it’s quite far removed from a naive md5(password)
attempt.
Fortunately, one will only ever need this algorithm for compatibility. New applications can use the standard PBKDF2 algorithm, implemented by most cryptography libraries, which does the same thing only in a standardized and parameterized way.
As if this wasn’t bad enough, the next post next week will be more of the same, but with SHA512-crypt!