Real Hash Functions: Structural Weakness

Let's learn about real-time hash functions and their security in this lesson.

Structural weakness

The main problem of the hash functions based on Merkle-Damgärd construction is that they’re all vulnerable to length extension attacks. As the message xx is split into kk blocks x=x1,x2,,xkx=x_{1}, x_{2}, \ldots, x_{k} (whereas xkx_{k} contains the padding) and then hashed to the value h(x)=Hkh(x)=H_{k}, we can choose a message x=x,xk+1=x1,x2,,xk,xk+1x^{\prime}=x, x_{k+1}=x_{1}, x_{2}, \ldots, x_{k}, x_{k+1}. Since the first kk blocks of xx^{\prime} and xx are identical, the final hash value of xx, i.e., H(x)=HkH(x)=H_{k}, is identical to the internal hash value of H(x)H\left(x^{\prime}\right) after kk blocks. Because of the iterated formula Hi=g(Hi1,xi)H_{i}=g\left(H_{i-1}, x_{i}\right), we can conclude that H(x)=g(Hkxk+1)=g(H(x)xk+1)H\left(x^{\prime}\right)=g\left(H_{k} \| x_{k+1}\right)=g\left(H(x) \| x_{k+1}\right).

The fact that H(x)H(x) provides direct information about the inner state of H(x)H\left(x^{\prime}\right) after kk blocks are highly problematic. Suppose Alice and Bob communicate with each other, whereas Alice sends the message xx to Bob and wants to authenticate it by sending H(Kx)H(K|| x), where KK is a secret key that is only known by Alice and Bob, and xx is a known message. If HH was an ideal hash function, H(Kx)H(K|| x) would be a proper authentication system. But since HH is prone to length extensions, Eve is now able to extend the message to xxx \| x^{\prime} and forge the authentication code to H(Kxx)H\left(K\|x\| x^{\prime}\right) without knowing the secret KK.

In order to make length extension attacks impossible, Ferguson and Schneier (2003) proposed the following fix for the hash function:

Iterative hash function

Let HH be an iterative hash function. The hash function HdH_{d} is defined by Hd=H(H(x))H_{d}=H(H(x)) and has the security level of min(k,n/2)\min (k, n / 2), where kk is the security level of HH and nn is the size of the hash result.

Bitcoin uses this fix as they use double-SHA256 as their Proof-of-Work function.

Security of real-life dedicated hash functions

Today’s practical implementations of hash functions are based on dedicated hash functions, which are widely in use, such as MD5 or SHA-2. In this section, we’ll give a brief overview of the most common hash functions and their security (Mohammad A. AlAhmad et al. (2013)Mohammad A. AlAhmad and Imad Fakhri Alshaikhli. Broad view of cryptographic hash functions, 2013.).

The MD4 hash function was designed by Rivest in 1990Ronald L. Rivest. The MD4 message-digest algorithm. In Proceedings of the 10th Annual International Cryptology Conference on Advances in Cryptology, CRYPTO’90, pages 303-11, London, UK, 1991. Springer-Verlag.. The first collision was found by Dobbertin (1996)Hans Dobbertin. Cryptanalysis of MD4. In Fast Software Encryption, Third International Workshop, Cambridge, UK, February 21-23, 1996, Proceedings, pages 53-69, 1996. with a complexity of 2202^{20} MD4 hash operations. This result was further improved by Wang et al. (2005)Xiaoyun Wang and Hongbo Yu. How to break MD5 and other hash functions. In Proceedings of the 24th Annual International Conference on Theory and Applications of Cryptographic Techniques, EUROCRYPT’05, pages 19-35, Berlin, Heidelberg,2005. Springer-Verlag., who reduced the complexity to 282^{8} hash computations such that collisions can be found in a few microseconds. Because of these attacks, the MD4 is no longer collision-resistant. Leurent (2008)Gaëtan Leurent. Fast software encryption. Chapter MD4 is Not One-Way, pages 41228. Berlin, Heidelberg, 2008. Springer-Verlag. demonstrated the first preimage attack on MD4 with a complexity of 21022^{102}, which is far below the expected 21282^{128} computations (see this definition :Security_Goal_of_Hash_Function ).

MD5 is a further development of MD4 and was presented by Rivest in 1992Ronald L. Rivest. The MD5 message-digest algorithm, 1992. Available at https://www. rfc-editor.org/rfc/rfcl321.txt.. Den Boer and Bosselaers (1993)Bert den Boer and Antoon Bosselaers. Collisions for the compression function of MD5. In Tor Helleseth, editor, Advances in Cryptology - EUROCRYPT '93, pages 293-304, Berlin, Heidelberg, 1994. Springer. have shown collisions in the underlying compression function, whereas Wang et al. (2005)Xiaoyun Wang and Hongbo Yu. How to break MD5 and other hash functions. In Proceedings of the 24th Annual International Conference on Theory and Applications of Cryptographic Techniques, EUROCRYPT’05, pages 19-35, Berlin, Heidelberg, 2005. Springer-Verlag. introduced the first efficient practical collision attack on the full MD5 hash function. Further improvements by Klíma (2005)Vlastimil Klíma. Finding MD5 collisions - a toy for a notebook. IACR Cryptology ePrint Archive, 2005:75, 2005. have shortened the search time for full collisions significantly, i.e., down to less than 4 hours using an ordinary notebook PC (Intel Pentium 1.6GHz1.6 \mathrm{GHz} ). This attack was developed by Stevens (2006)Marc Stevens. Fast collision attack on MD5. IACR Cryptology ePrint Archive, 2006:104, 2006., such that collisions could be found in only minutes using a 3GHz3 \mathrm{GHz} Intel Pentium 4. A further improvement by Stevens (2007)Marc Stevens. On collisions for MD5. Master’s thesis, Eindhoven University of Technology, 62007 . allowed them to find collisions in about 2242^{24} compressions such that an attack takes approximately six seconds on a 2.6GHz2.6 \mathrm{GHz} Intel Pentium 4. Furthermore, Sasaki and Aoki (2009) proposed a preimage attack with a computational complexity O(2123.4)\mathcal{O}\left(2^{123.4}\right).

The most popular dedicated hash functions from the Secure Hash Algorithm (SHA) family were published by the National Institute of Standard and Technology (NIST) as a U.S. Federal Information Processing Standard FIPS PUB 180X180-X, including SHA-0, SHA-1, SHA-2, and SHA-3. SHA-0, SHA-1, and SHA-2 rely on the Merkle-Damgård construction and were designed by the United States National Security Agency (NSA). SHA-0 was released in 1993 but was later replaced by SHA-1 in 1995 because of a significant weakness due to a design flaw, which is why SHA-0 plays no role in practice.

SHA-1 was released in 1995 as a 160-bit message digest function. However, many standards recommend it to be replaced after 2010 by SHA-2 or SHA-3, as Wang et al. (2005)Xiaoyun Wang and Hongbo Yu. How to break MD5 and other hash functions. In Proceedings of the 24th Annual International Conference on Theory and Applications of Cryptographic Techniques, EUROCRYPT’05, pages 19-35, Berlin, Heidelberg,2005. Springer-Verlag. proposed the first theoretical collision attack on SHA-1 that required 2692^{69}, and thus less than the expected 2802^{80} hash computations.

In 2017, Google and the Cryptology Group at Centrum Wiskunde & Informatica (CWI) Amsterdam announced the first practical collision by generating a hash collision for two different PDF documents, as shown in the figure given below. The attack requires roughly 263.12^{63.1} hash evaluations, which equals approximately 6500 single-CPU years. The corresponding PDF files can be found on the official project homepage.

Figure 1

The two PDF documents yield a collision for SHA-1, but not for SHA-256

Get hands-on with 1400+ tech skills courses.