Fred Cohen *

In this paper, we examine the two major cryptanalyses of a fast cryptographic checksum algorithm used for maintaining the integrity of files in an information system. The core of those attacks appears to be the non-uniform distribution of intermediate variables, resulting in a reduced dependence on earlier plaintext blocks, a statistical attack on some bits of the key, and susceptibility to an adaptive chosen plaintext attack. We attempt to shore this system up yet again through the use of pseudo-random plaintext scheduling and an additional last step to rejoin all of the intermediate values into the final result. Although we believe that this step succeeds in fending off the present attacks, we are by no means certain of the future of this system, and suspect that the fundamental weakness created by this non-uniformity may be unavoidable in any such system.

* This work was funded by a research grant from Advanced Software Protection, PO Box 90069, Pittsburgh, PA 15213

The increased use of electronic transmission and storage of information has given rise to a new requirement for security and authentication. For the purpose of integrity, a cryptographic checksum is usually associated with the transmitted or stored information. When information is modified unintentionally or maliciously, such changes have a high probability of being detected after computation of the cryptographic checksum. In principle, the cryptographic checksum can be implemented using the block chaining mode of a strong block cipher system such as the RSA [1] or DES [2] to cipher the plaintext. The final enciphered block (or a part of it) is used as the checksum.

The main problems with these checksum schemes are that they require additional hardware or, in the absence of additional hardware, a large amount of time. As a result, some faster checksum schemes have been proposed [3,4]. These schemes employ functions to compress the plaintext in concert with cryptographic transforms to dramatically reduce computation time while providing a high degree of assurance against forgery. A problem can arise if the compressive functions are not well considered, in that an attacker may find a weak point which can be exploited to forge an illicit modification.

In this paper, we cryptanalyze the scheme found in [3], point out two potential drawbacks with the proposed scheme, and propose a modification to overcome these problems.

The scheme proposed in [3], referred to below as Algorithm 1, is as follows:

Step 1: Select an RSA key K=(ke,M) and a user specified seed S
Step 2: Set Q=S
Step 3: Set Q=RSA((Q XOR filename),K)
Step 4: Set Q=1+[C(B) mod (Q-1)]
        where: B is the next block of the plaintext file P
               C(B) is the compressed version of B
Step 5: Set Q=RSA(Q,K)
Step 6: If not done, goto 4; else, Q is the checksum
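As a concrete illustration, Algorithm 1 can be sketched in Python using deliberately tiny, hypothetical RSA parameters (M=3233=61*53, public exponent ke=17) and a toy stand-in for the compression function C(B). These values are our own choices, not from [3]; a real implementation would require a full-sized RSA modulus.

```python
# Hypothetical toy RSA key for illustration only (far too small for real use):
# M = 61 * 53 = 3233, public exponent ke = 17.
KE, M = 17, 3233

def rsa(x):
    # RSA(x, K) with K = (ke, M): modular exponentiation by the public key.
    return pow(x, KE, M)

def compress(block):
    # Hypothetical stand-in for the compression function C(B) of [3].
    return int.from_bytes(block, "big") % M

def checksum(filename_val, blocks, seed):
    q = seed                                # Step 2
    q = rsa(q ^ filename_val)               # Step 3
    for b in blocks:                        # Step 6: loop over blocks of P
        q = 1 + (compress(b) % (q - 1))     # Step 4
        q = rsa(q)                          # Step 5
    return q                                # the checksum
```

Here filename_val is the filename encoded as an integer. With a modulus this small an intermediate Q can in principle collide with 1 and break Step 4; the sketch ignores that edge case.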

As indicated in [3], there is no reason to keep K secret. Since the deciphering key (kd,M) is left ungenerated and the RSA is quite strong against all known attacks which attempt to derive kd from ke and M, no one can attack the scheme by deciphering the RSA using current results. Unfortunately, the algorithm as it stands can be attacked in another way.

We can see from Algorithm 1 that each result of an RSA is taken as the modulus of the next block of text. Because the seed is secret, each intermediate modulus is unknown to the forger, but the final modulus is left unguarded because it is used as the checksum. Hence a forger can append to the plaintext and generate a new valid checksum as follows:

Step 1: Assume the checksum of the original plaintext P is Q
Step 2: Choose any block B and append it to P
Step 3: Apply Step 4 of Algorithm 1 to B to produce a new modulus Q'
Step 4: Replace the stored copy of the checksum with Q'
Step 5: Repeat Steps 2-4 as many times as desired
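The appending attack can be sketched with hypothetical toy parameters (M=3233, ke=17) and a stand-in compression function, none of which come from [3]. Note that the forger uses only the public key and the stored checksum, never the secret seed:

```python
KE, M = 17, 3233                            # hypothetical toy RSA key (ke, M)

def rsa(x):
    return pow(x, KE, M)

def compress(block):
    # Hypothetical stand-in for the compression function C(B).
    return int.from_bytes(block, "big") % M

def checksum(filename_val, blocks, seed):
    # Algorithm 1 -- the defender's computation, which needs the seed.
    q = rsa(seed ^ filename_val)
    for b in blocks:
        q = 1 + (compress(b) % (q - 1))
        q = rsa(q)
    return q

def forge_append(stored_checksum, new_block):
    # The attacker's computation: Steps 4 and 5 of Algorithm 1, seeded
    # with the exposed final checksum instead of a secret intermediate.
    q = 1 + (compress(new_block) % (stored_checksum - 1))
    return rsa(q)
```

The forged value matches the checksum the defender would compute for the extended file, so the modification goes undetected.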

Since the only knowledge used to generate checksums for subsequent blocks is the final result from the previous blocks and non-secret information, forgery is straight forward. Depending on implementation, this attack may only work when the final block of plaintext ends on an even block boundary. Although this reduces the probability of successful attack to the likelihood of an even block sized file, this provides almost no protection relative to the original intent of the proposed method.

Even when the length of the file is not an even multiple of the block size, there remains a second attack on Algorithm 1 which may be much more devastating. When the value of the final block of P is less than the value of the corresponding modulus, the text can be forged and the attacker can readily tell when this attack will work. Detecting whether this attack will work is done as follows:

Step 1: Set Z=RSA(C(B),K)
Step 2: If Z=Q, this attack will work

When Z=Q, the modulus formed in Step 4 of Algorithm 1 is irrelevant to the final result of that block, and thus that block and all subsequent blocks of the checksum are unrelated to the original seed S. We can attack Algorithm 1 under these conditions by replacing the last block with whatever we choose, and replacing the stored checksum with the checksum generated by taking RSA(C(B),K) as the final value of Q.
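Under the same kind of toy assumptions (a hypothetical M=3233, ke=17, and a stand-in C(B), all our own choices), the detection test can be sketched as follows. Because Step 4 of Algorithm 1 computes 1+[C(B) mod (Q-1)], the value fed to the final RSA when C(B) is below the modulus is 1+C(B), so this sketch includes the +1:

```python
KE, M = 17, 3233                            # hypothetical toy RSA key (ke, M)

def rsa(x):
    return pow(x, KE, M)

def compress(block):
    # Hypothetical stand-in for the compression function C(B).
    return int.from_bytes(block, "big") % M

def attack_works(final_block, stored_checksum):
    # If C(B) of the final block was below its (secret) modulus, Step 4
    # reduces to 1 + C(B), and the stored checksum equals RSA(1+C(B),K) --
    # a value the attacker can compute without knowing the seed S.
    return rsa(1 + compress(final_block)) == stored_checksum
```

When this predicate is true the final block (and anything after it) can be replaced freely, exactly as described above.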

This also means that ANY block with a low enough value of C(B) produces a checksum for all subsequent blocks that is unrelated to S. We can simply check each block of the program for a value of C(B) that is low enough to make it unlikely to be more than the modulus for that block. All further blocks are subject to attack as follows:

Step 1: Find a block with C(B) low enough
Step 2: Replace all subsequent blocks of P
Step 3: Perform Algorithm 1 for all subsequent blocks of P, assuming that Q for the block in Step 1 is C(B)
Step 4: Replace the stored checksum with the final value for Q
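The full substitution attack can be sketched the same way, again with hypothetical toy parameters (M=3233, ke=17) and a stand-in C(B) of our own choosing. Following the 1+[...] form of Step 4 of Algorithm 1, the intermediate value the attacker recovers at the pivot block is RSA(1+C(B),K):

```python
KE, M = 17, 3233                            # hypothetical toy RSA key (ke, M)

def rsa(x):
    return pow(x, KE, M)

def compress(block):
    # Hypothetical stand-in for the compression function C(B).
    return int.from_bytes(block, "big") % M

def checksum(filename_val, blocks, seed):
    # Algorithm 1 -- requires the secret seed.
    q = rsa(seed ^ filename_val)
    for b in blocks:
        q = 1 + (compress(b) % (q - 1))
        q = rsa(q)
    return q

def forge_suffix(pivot_block, replacement_blocks):
    # Step 1 found pivot_block with C(B) below its modulus, so the running
    # value after that block is RSA(1 + C(B), K) -- computable without S.
    q = rsa(1 + compress(pivot_block))
    for b in replacement_blocks:            # Steps 2-3: rerun Algorithm 1
        q = 1 + (compress(b) % (q - 1))
        q = rsa(q)
    return q                                # Step 4: the forged checksum
```

The forged checksum agrees with the one the defender would compute over the modified file, so every block after the pivot can be replaced at will.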

Finding a block in Step 1 turns out to be quite straightforward. The maximum modulus is known to be M-1, the maximum value that can be yielded by the RSA. If we assume that the RSA is a good pseudo-random number generator [3], a modulus has equal probability of being any value between 1 and M-1. Thus the probability of C(B) for any particular block being below the modulus for that block is given by 1-[C(B)/(M-1)] when C(B) < M-1, so a block with a small compressed value is almost certain to be usable.
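To make the estimate concrete, take a toy modulus M=3233 (our hypothetical value, not from [3]) and assume the modulus for a block is uniformly distributed on 1..M-1:

```python
M = 3233                                    # hypothetical toy RSA modulus

def usable_probability(c):
    # Probability that a modulus drawn uniformly from 1..M-1 exceeds C(B),
    # i.e. that the block can serve as the pivot in Step 1 of the attack.
    return 1 - c / (M - 1)
```

Under these assumptions usable_probability(100) is about 0.97 while usable_probability(3000) is under 0.08, so an attacker simply scans the file for a block with a small compressed value.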

A simple improvement of Algorithm 1 would be to include the length of P in the checksum computation. This might prevent the first attack, since an attacker could not easily append to the file and produce an identical checksum. It would not, however, defend against the second attack. The main reason that the second attack works is that a block can have a value C(B) less than the corresponding modulus Q. If this happens, the attacker can easily find all of the subsequent moduli. Furthermore, it is easy for the attacker to determine if and when this attack applies and to begin substituting blocks at that point.

Both attacks can be prevented by introducing a variable that is unknown to the attacker into the computation of C(B). Since the seed S is random and unknown to the attacker, it is a logical variable to use for this purpose. We propose the following change to Algorithm 1:

Step 1: Select an RSA key K=(ke,M) and a user specified seed S
Step 2: Set Q=S
Step 3: Set Q=RSA((Q XOR filename),K)
Step 4: Set Q=1+[{C(B) XOR S} mod (Q-1)]
        where: B is the next block of the plaintext file P
               C(B) is the compressed version of B
Step 5: Set Q=RSA(Q,K)
Step 6: If not done, goto 4; else, Q is the checksum
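The modified algorithm can be sketched under the same hypothetical assumptions (toy parameters M=3233, ke=17 and a stand-in C(B), all our own choices). The old appending forgery, which knows C(B) but not S, now produces the wrong checksum:

```python
KE, M = 17, 3233                            # hypothetical toy RSA key (ke, M)

def rsa(x):
    return pow(x, KE, M)

def compress(block):
    # Hypothetical stand-in for the compression function C(B).
    return int.from_bytes(block, "big") % M

def checksum_v2(filename_val, blocks, seed):
    # Modified algorithm: Step 4 now uses {C(B) XOR S}.
    q = rsa(seed ^ filename_val)
    for b in blocks:
        q = 1 + ((compress(b) ^ seed) % (q - 1))
        q = rsa(q)
    return q

def naive_append(stored_checksum, new_block):
    # The first attack against Algorithm 1, which has C(B) but not S.
    return rsa(1 + (compress(new_block) % (stored_checksum - 1)))
```

Because {C(B) XOR S} and C(B) generally differ modulo Q-1, and the RSA is a bijection on the residues, the naively forged checksum no longer matches the one computed with the seed.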

In Step 4, we have changed C(B) to {C(B) XOR S}. The result is that the attacker cannot know {C(B) XOR S} without inverting the RSA, trying every possible S, or finding another weakness in the algorithm. Even though the attacker knows C(B), it is impossible to append a block as in the first attack and produce a valid checksum, since this requires knowledge of S. Furthermore, S is no more at risk of exposure than it was under Algorithm 1, because it is still covered by an RSA encryption.

The attacker faces the same problem when attempting to replace a block B with another block B'. Since S is unknown, it is impossible to tell whether {C(B) XOR S} < Q, and thus it is impossible to know whether the RSA will be a factor in the computation of the next Q. There is no way to judge whether replacing this block will work or not without trying it. Since S is not available for these trials, we can never determine whether or not the attack will work.

We have identified a problem with a previous fast cryptographic checksum algorithm and provided an improvement that appears to resolve the difficulty. It appears that the new algorithm satisfies all of the constraints of the previous algorithm while eliminating all known attacks.

[1] R. Rivest, A. Shamir, L. Adleman, "A Method for Obtaining Digital Signatures and Public-Key Cryptosystems", Communications of the ACM, V21#2 pp120-126 (Feb 1978)

[2] "Data Encryption Standard", FIPS PUB 46, National Bureau of Standards, Washington, D.C. (Jan 1977)

[3] F. Cohen, "A Cryptographic Checksum for Integrity Protection", Computers and Security, V6#6 (Dec. 1987) pp505-510

[4] R. Jueneman, "A High Speed Manipulation Detection Code", Proceedings of Crypto '86, pp327-346 (1986)