Secure password hashing: requirements and design choices

This is the promised follow-up to my first post on secure password hashing. We now focus on the security requirements and design choices, which will be of interest to designers of hashing schemes for the upcoming Password Hashing Competition (PHC). Some elements of this post will also appear in my talks at PasswordsCon and Black Hat 2013.

Cryptanalytic security

The foremost requirement of a cryptographic function is security. Although password hashing schemes (PHS) differ from general-purpose cryptographic hash functions, they do qualify as cryptographic functions because they protect secrets and thus have to satisfy a number of security requirements.

Following the API function required in the PHC call, a PHS should at least take as input

  • A password encoded as a string of variable bit length p
  • A salt of fixed bit length s

and it should produce a hash value of h bits.

Ideally, a PHS H() should behave as a random function from {0,1}^(p+s) to {0,1}^h. This means that it should not have any particular “structure”, and in particular that

  • Finding a collision takes approximately 2^(h/2) evaluations of H()
  • Finding a preimage of a random h-bit value takes approximately 2^h evaluations of H()
  • Finding an input such that output has a specific pattern is no easier than for an “ideal” function (like Keccak or BLAKE2); an example of such a pattern is given by the CICO problem

For designers this means that the password and the salt do not play different roles in a PHS. One may thus create an algorithm that takes as input a single string, and then sets this string to a concatenation of the password and salt with proper domain separation between the two and non-ambiguous encoding of each value.
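As an illustration, here is a minimal sketch of such an encoding in Python; the length-prefixing convention and the function names are mine, not part of the PHC API, and SHA-256 merely stands in for a real PHS:

    import hashlib
    import struct

    def encode_input(password: bytes, salt: bytes) -> bytes:
        # Length-prefix each field so that, e.g., (b"ab", b"c") and
        # (b"a", b"bc") map to distinct strings (non-ambiguous encoding).
        return (struct.pack("<Q", len(password)) + password +
                struct.pack("<Q", len(salt)) + salt)

    def toy_phs(password: bytes, salt: bytes) -> bytes:
        # The core algorithm sees a single string; the password and the
        # salt play no distinct roles inside it.
        return hashlib.sha256(encode_input(password, salt)).digest()

    assert toy_phs(b"ab", b"c") != toy_phs(b"a", b"bc")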

The hash value doesn’t have to be extremely long; it just has to be long enough. 128-bit hashes are more than sufficient, because security is determined not only by the hash length but mainly by the distribution of passwords (whose entropy is typically much lower than 128 bits). The attackers’ job is then to define a sampling method as close as possible to the real distribution.

Leakage resilience

Adding features often means processing more information and doing more operations, which in turn increases the attack surface. For example,

  • If passwords of arbitrary length are supported, the execution time has to depend on the password’s length, whereas a hash supporting only (say) 128-byte passwords can be constructed as a single compression function rather than an iterated construction.
  • If a large amount of memory is used to make unpredictable—that is, input-dependent—accesses (both reads and writes) in an attempt to minimize the return on investment of attackers, cache-timing attacks may leak information on the password.

In both cases an attacker monitoring execution times (such as another unprivileged user sharing the same machine, or another VM on the same host) may obtain information on the password, especially if he can affect the CPU and/or shared memory load.

A simple countermeasure to mitigate cache-timing attacks is to first pass the password through a low-memory one-way transform, so that an attacker may at best recover information on that intermediate hash. However this doesn’t really solve the problem if that hash can be used as a reference for dictionary attacks.
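A rough sketch of that countermeasure, assuming a hypothetical memory-hard core whose accesses depend on its input; all names and parameters below are illustrative, not an actual scheme:

    import hashlib

    def prehash(password: bytes, salt: bytes) -> bytes:
        # Low-memory, data-independent pass: its timing and memory
        # accesses depend only on the password's length, not its value.
        return hashlib.sha512(salt + password).digest()

    def memory_hard_core(seed: bytes, blocks: int = 1 << 14) -> bytes:
        # Toy stand-in for the memory-hard part: fill a buffer, then
        # perform seed-dependent (hence cache-timing-sensitive) reads.
        block = hashlib.sha512(seed).digest()
        memory = []
        for _ in range(blocks):
            block = hashlib.sha512(block).digest()
            memory.append(block)
        acc = block
        for _ in range(blocks):
            j = int.from_bytes(acc[:8], "little") % blocks
            acc = hashlib.sha512(acc + memory[j]).digest()
        return acc[:32]

    def hash_password(password: bytes, salt: bytes) -> bytes:
        # A cache-timing attacker observing memory_hard_core() learns at
        # best something about the intermediate hash, not the password
        # itself; as noted above, that intermediate hash still allows
        # offline dictionary attacks if it leaks.
        return memory_hard_core(prehash(password, salt))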

As explained in the PHC FAQ, memory “hardness” is not only driven by the total storage required but also by the number of reads (loads), the number of writes (stores), the size of data blocks read and written, the order of address accesses, the predictability of addresses, etc. All these factors may be combined in a way that maximizes the penalty for attackers and minimizes the impact on the defender.

Parallelism

The password cracking problem is clearly embarrassingly parallel, which means that the speedup scales linearly with the number of cores available: using N cores makes cracking N times as fast as with a single core. If defenders and attackers use the same hardware, with one core per defender and many per attacker, a parallelizable hash offers several trade-offs to the attacker but does not provide a fundamental benefit: if a password hashing scheme supports N-way parallelism, then with N cores an attacker can either use one core per instance or N cores per instance, but the overall throughput (in terms of hashes per unit of time) remains constant, as long as the evaluation of the PHS uses negligible memory.

Now assume that the defender has several cores available (server CPUs can have up to 8 or 16 physical cores), and imagine two password hashing schemes:

  • Algorithm A cannot be parallelized, and the server uses a single core of its 8-core CPU to hash passwords (since using 2, 3, or more cores would not speed up hashing).
  • Algorithm B can be parallelized such that with 4 cores it is 4 times as fast as with a single core. The server takes advantage of this by using all 4 cores of its Haswell CPU to hash a password as fast as algorithm A, while performing many more operations per unit of time.

Let’s look at the attackers: with N cores they evaluate N instances of algorithm A in parallel but only N/4 instances of algorithm B in the same amount of time.
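A back-of-the-envelope calculation with made-up but representative numbers (one defender-tuned hash per 100 ms, and an attacker with 1024 cores of the same type):

    # Defender tunes both algorithms to the same wall-clock time per hash.
    target_time = 0.1           # seconds per hash on the defender's side
    attacker_cores = 1024

    # Algorithm A: sequential; the attacker runs one independent
    # instance per core.
    a_throughput = attacker_cores / target_time          # 10240 hashes/s

    # Algorithm B: 4-way parallel, so in the same 0.1 s the defender does
    # roughly 4x the work; each guess now costs the attacker 4 cores
    # (or 4x the time), dividing throughput by the same factor.
    b_throughput = (attacker_cores / 4) / target_time    # 2560 hashes/s

    print(a_throughput, b_throughput)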

Of course, reality is more complicated: attackers don’t use the same “cores” as defenders; allocating several cores of a busy server to password hashing is not necessarily a good idea; parallel evaluation of an algorithm might require concurrent access to shared resources; etc. Different applications have different requirements. For example, applications with a low rate of hashes, such as full-disk encryption or protection of private keys, can generally afford to use more than one core.

Extreme approaches

We conclude with some ideas that go beyond “being slow” and “using large RAM”.

Programmable hashes

This is an idea that I discussed with Samuel Neves, but other people have probably thought of it before.

The idea is to make the program of a PHS depend on the password and/or on the salt being hashed. For example, with credentials veorq:12345678 the PHS calls AES with custom S-boxes as its underlying primitive, whereas with veorq:87654321 it uses a different version of MixColumns and other S-boxes, and with Adr13n:BIND it’s a completely different algorithm similar to Salsa20. That is, each unique username:password pair would lead to a unique hashing algorithm.

The goal of this approach is to defeat ASICs (that have to hard-code the algorithm) and also FPGAs and GPUs (which would take significant time to reconfigure with optimized code). This may be realized by generating a pseudorandom (yet consistent) sequence of CPU instructions, preferably optimized for the defender’s platform (for example, AVX2 instructions for Haswell CPUs). Generally, this approach can be seen as the production of pseudorandom bytecode for a custom virtual machine (or custom processor). It may dramatically reduce the effectiveness of GPUs and FPGAs while requiring only minimal memory on the defender’s side.
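Here is a minimal sketch of the bytecode variant of this idea; the tiny instruction set, the constants, and the way the program is derived from the credentials are all made up for illustration and carry no cryptographic rationale:

    import hashlib

    def generate_program(seed: bytes, length: int = 64):
        # Derive a pseudorandom but deterministic sequence of
        # "instructions": each pair of bytes from the expanded seed picks
        # an operation and an operand for a tiny register machine.
        stream = b""
        counter = 0
        while len(stream) < 2 * length:
            stream += hashlib.sha256(seed + counter.to_bytes(4, "little")).digest()
            counter += 1
        return [(stream[2*i] % 4, stream[2*i + 1]) for i in range(length)]

    def run_program(program, state: int) -> int:
        # Interpret the derived bytecode on a 64-bit accumulator.
        mask = (1 << 64) - 1
        for op, operand in program:
            if op == 0:
                state = (state + operand * 0x9E3779B97F4A7C15) & mask
            elif op == 1:
                state = (state ^ operand * 0xC2B2AE3D27D4EB4F) & mask
            elif op == 2:
                r = (operand % 63) + 1
                state = ((state << r) | (state >> (64 - r))) & mask
            else:
                state = (state * (operand | 1)) & mask
        return state

    def programmable_hash(username: bytes, password: bytes, salt: bytes) -> bytes:
        # Each username:password pair yields its own program, which an
        # ASIC cannot hard-code in advance.
        program = generate_program(hashlib.sha256(username + b":" + password).digest())
        state = int.from_bytes(hashlib.sha256(salt + password).digest()[:8], "little")
        return run_program(program, state).to_bytes(8, "little")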

Issues with this approach include

  • The expected complexity of the scheme constructed,
  • The difficulty of building programs of consistent performance across platforms and inputs,
  • The dependence on a specific processor architecture.

Nevertheless, this approach may prove useful in specific environments, for example where a custom VM or processor is readily available.

Security through obesity

The name comes from Reddit, but the idea is due to Jeremy Spilman and has been insightfully commented on by Ethan Heilman.

The basic idea of security through obesity is to pollute the hash database with dummy entries indistinguishable from those corresponding to real users, and not to store usernames (aka “blind hashing”) along with hashes in the database. Although this makes collisions possible, with long enough hash values “servers are more likely to be hit by meteors than someone ever successfully logging in with the wrong password.” To compromise a given user, an attacker thus has to check each candidate hash against all of the stored hashes, which significantly slows down dictionary attacks.
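A sketch of the verification side under this scheme; the table layout is my own simplification of Jeremy’s proposal (in particular, per-entry salting and the full terabyte scale are omitted):

    import hashlib
    import os

    def build_table(real_passwords, n_dummies=1_000_000):
        # Real hashes are mixed with random dummy entries; no usernames
        # are stored, so real and fake rows are indistinguishable.
        table = {hashlib.sha256(pw).digest() for pw in real_passwords}
        while len(table) < len(real_passwords) + n_dummies:
            table.add(os.urandom(32))
        return table

    def check_login(table, candidate_password: bytes) -> bool:
        # A login succeeds if the candidate's hash appears anywhere in
        # the table; with 256-bit hashes, accidentally matching a dummy
        # entry is astronomically unlikely.
        return hashlib.sha256(candidate_password).digest() in table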

Jeremy proposes to use databases as large as a terabyte, which has the direct effect of slowing down the download of the data, as discussed by Ethan: he gives the example of downloading a 1 TB database, which takes approximately 44 hours with a T3 line (about 50 Mbits per second).

ROM-port-hardness

ROM-port-hardness is an idea by Solar Designer (from John the Ripper) inspired by scrypt’s notion of sequential memory-hardness and by security through obesity. The same document also discusses some issues with blind hashing, like the possibility of creating more compact data structures to represent a large database.

The idea of ROM-port-hardness is to require a huge array whose content does not depend on the username or password, and that can thus be stored in non-volatile memory such as SSDs. This forces the hash function (and thus attackers) to store gigabytes of data even if each evaluation accesses only a handful of values. The attacker’s cost per password is thus driven by the number, bandwidth, and latency of memory accesses.
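A rough sketch of the idea, with a small in-RAM array standing in for the multi-gigabyte ROM; the block count, block size, and round count are placeholders I chose for illustration:

    import hashlib

    ROM_BLOCKS = 1 << 16     # stand-in for a multi-gigabyte array on SSD
    # The ROM is fixed and independent of any username or password; it
    # would be generated once and shared by all hash computations.
    ROM = [hashlib.sha512(i.to_bytes(8, "little")).digest()
           for i in range(ROM_BLOCKS)]

    def rom_hard_hash(password: bytes, salt: bytes, rounds: int = 10_000) -> bytes:
        acc = hashlib.sha512(salt + password).digest()
        for _ in range(rounds):
            # Indices depend on the running state, so which few blocks
            # will be needed cannot be predicted: the whole ROM must be
            # kept at hand, or each access pays the latency of fetching
            # a block on demand.
            j = int.from_bytes(acc[:8], "little") % ROM_BLOCKS
            acc = hashlib.sha512(acc + ROM[j]).digest()
        return acc[:32]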

Honeywords

Proposed by Ari Juels and Ron Rivest, the idea of honeywords is to produce hashes in such a way that several meaningful preimages exist: not only the genuine password registered by the user but also “fake” passwords that will serve as baits to inform the server that hashes have leaked (and have been attacked). Ari and Ron propose a number of techniques to securely implement the idea, which rely on a “honeychecker” server to distinguish genuine from fake passwords.
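A minimal sketch of the split between the hash database and the honeychecker; the decoy generator below is deliberately naive, precisely because producing convincing decoys is the hard part (see the next paragraph):

    import hashlib
    import random

    def register(password: str, n_decoys: int = 19):
        # Naive decoys for illustration only: real honeyword generation
        # must produce passwords indistinguishable from human choices.
        decoys = set()
        while len(decoys) < n_decoys:
            cand = password[:-2] + f"{random.randrange(100):02d}"
            if cand != password:
                decoys.add(cand)
        words = [password] + sorted(decoys)
        random.shuffle(words)
        hashes = [hashlib.sha256(w.encode()).hexdigest() for w in words]
        # The main database stores only the hashes; the index of the
        # genuine password lives on the separate honeychecker.
        return hashes, words.index(password)

    def login(hashes, genuine_index, attempt: str) -> str:
        h = hashlib.sha256(attempt.encode()).hexdigest()
        if h not in hashes:
            return "reject"
        # Ask the honeychecker whether the matching entry is genuine.
        if hashes.index(h) == genuine_index:
            return "accept"
        return "alarm: honeyword submitted, hashes likely compromised"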

As argued by Per Thorsheim and others, the challenges include generating fake passwords indistinguishable from genuine ones (distinguishing them may be straightforward if details of the generation method are disclosed to attackers), the risk of DoS by deliberately using fake passwords, and the increase in memory requirements.
