Cure of Computer Viruses

Copyright(c), 1984, Fred Cohen - All Rights Reserved

Since prevention of computer viruses may be infeasible if widespread sharing is desired, the biological analogy leads us to the possibility of cure as a means of protection. Cure in biological systems depends on the ability to detect a virus and find a way to overcome it. A similar possibility exists for computer viruses. We now examine the potential for detection and removal of a computer virus.

Detection of Viruses

In order to determine that a given program 'P' is a virus, it must be determined that P infects other programs. This is undecidable since P could invoke the decision procedure 'D' and infect other programs if and only if D determines that P is not a virus. We conclude that a program that precisely discerns a virus from any other program by examining its appearance is infeasible. In the following modification to program V, we use the hypothetical decision procedure D which returns "true" iff its argument is a virus, to exemplify the undecidability of D.

program contradictory-virus:=
{main-program:=
   {if ~D(contradictory-virus) then
      {infect-executable;
      if trigger-pulled then do-damage;}
   goto next;}
}

Contradiction of the Decidability of a Virus "CV"

By modifying the main-program of V, we have assured that if the decision procedure D determines CV to be a virus, CV will not infect other programs, and thus will not act as a virus. If D determines that CV is not a virus, CV will infect other programs, and thus be a virus. Therefore, the hypothetical decision procedure D is self contradictory, and precise determination of a virus by its appearance is undecidable.
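The self-contradiction above can be run directly. The following is a minimal sketch in modern Python, with a stub decision procedure standing in for the hypothetical D and a list standing in for infection; the names `make_cv`, `D`, and the "victim" marker are illustrative, not part of the original construction:

```python
# Sketch of the CV construction: whatever answer a hypothetical decision
# procedure D gives about CV, CV behaves so as to contradict it.
# D and the simulated "infection" are illustrative stand-ins.

def make_cv(D):
    """Build a CV-like program that consults D about itself."""
    infected = []                      # records simulated infections

    def cv():
        if not D(cv):                  # D claims "not a virus" ...
            infected.append("victim")  # ... so CV infects, acting as a virus
        # if D claims "virus", CV does nothing and is not a virus
        return infected

    return cv

# Case 1: D claims CV is a virus -> CV infects nothing, contradicting D.
cv1 = make_cv(lambda p: True)
assert cv1() == []

# Case 2: D claims CV is not a virus -> CV infects, again contradicting D.
cv2 = make_cv(lambda p: False)
assert cv2() == ["victim"]
```

Either answer D gives is falsified by CV's resulting behavior, which is the heart of the undecidability argument.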

Evolutions of a Virus

In our experiments, some viruses took less than 4000 bytes to implement on a general purpose computer. Since we could interleave any program that terminates in finite time and doesn't overwrite the virus or any of its state variables, and still have a virus, the number of possible variations on a single virus is clearly very large. In this example of an evolutionary virus EV, we augment V by allowing it to add random statements between any two necessary statements.

program evolutionary-virus:=
{subroutine print-random-statement:=
   {print random-variable-name, " = ", random-variable-name;
   loop: if random-bit = 0 then
      {print random-operator, random-variable-name;
      goto loop;}
   print semicolon;}

subroutine copy-virus-with-random-insertions:=
   {loop: copy evolutionary-virus to virus till semicolon-found;
   if random-bit = 1 then print-random-statement;
   if ~end-of-input-file goto loop;}

main-program:=
   {copy-virus-with-random-insertions;
   infect-executable;
   if trigger-pulled do-damage;
   goto next;}
}

An Evolutionary Virus "EV"
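The evolution step of EV, copying a program statement by statement and inserting random harmless statements between them, can be sketched as follows. This is an illustrative model, not EV itself: the statement lists, variable names, and insertion probability are assumptions made for the example.

```python
import random

# Sketch of EV's evolution step: copy a program statement by statement,
# inserting random no-op assignments between statements. A real EV would
# copy its own source; here we copy an arbitrary statement list.

def random_statement(rng):
    """Build a harmless random assignment like 'v3 = v1 + v2'."""
    def var():
        return "v%d" % rng.randint(0, 9)
    expr = var()
    while rng.random() < 0.5:          # random-bit: maybe extend the expression
        expr += " %s %s" % (rng.choice("+-*"), var())
    return "%s = %s" % (var(), expr)

def evolve(statements, rng):
    """Copy 'statements', inserting random statements between them."""
    out = []
    for stmt in statements:
        out.append(stmt)
        if rng.random() < 0.5:         # random-bit = 1 -> insert a statement
            out.append(random_statement(rng))
    return out

original = ["v0 = 0", "v1 = v0 + 1"]
child = evolve(original, random.Random(1))
# The necessary statements survive, in order, whatever was inserted.
assert [s for s in child if s in original] == original
```

Each call to `evolve` yields a different but functionally equivalent descendant, which is why the number of variations on a single virus is so large.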

In general, proof of the equivalence of two evolutions of a program 'P' ('P1' and 'P2') is undecidable because any decision procedure 'D' capable of finding their equivalence could be invoked by P1 and P2. If found equivalent they perform different operations, and if found different they act the same, and are thus equivalent. This is exemplified by the following modification to program EV in which the decision procedure D returns "true" iff two input programs are equivalent.

program undecidable-evolutionary-virus:=
{subroutine copy-with-undecidable-assertion:=
   {copy undecidable-evolutionary-virus to file till line-starts-with-zzz;
   if file = P1 then print "if D(P1,P2) then print 1;";
   if file = P2 then print "if D(P1,P2) then print 0;";
   copy undecidable-evolutionary-virus to file till end-of-input-file;}

main-program:=
   {if random-bit = 0 then file = P1 otherwise file = P2;
   copy-with-undecidable-assertion;
   zzz: null-statement;
   infect-executable;
   if trigger-pulled do-damage;
   goto next;}
}

Undecidable Equivalence of Evolutions of a Virus "UEV"

The program UEV evolves into one of two types of programs, P1 or P2. If the program type is P1, the statement labeled "zzz" will become:

   if D(P1,P2) then print 1;

while if the program type is P2, the statement labeled "zzz" will become:

   if D(P1,P2) then print 0;

The two evolutions each call decision procedure D to decide whether they are equivalent. If D indicates that they are equivalent, then P1 will print a 1 while P2 will print a 0, and D will be contradicted. If D indicates that they are different, neither prints anything; since they are otherwise identical, D is again contradicted. Therefore, the hypothetical decision procedure D is self contradictory, and the precise determination of the equivalence of these two programs by their appearance is undecidable.

Since both P1 and P2 are evolutions of the same program, the equivalence of evolutions of a program is undecidable, and since they are both viruses, the equivalence of evolutions of a virus is undecidable. Program UEV also demonstrates that two unequivalent evolutions can both be viruses. The evolutions are equivalent in terms of their viral effects, but may have slightly different side effects.
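The UEV argument can also be run as a sketch. Below, a stub stands in for the hypothetical equivalence decider D, and the two evolutions differ only in the statement guarded by D's answer; `make_pair` and the printed values are illustrative assumptions:

```python
# Sketch of the UEV argument: two evolutions P1 and P2 are identical except
# for the statement labeled zzz, which consults a hypothetical equivalence
# decider D(P1, P2). Whatever D answers, its answer is contradicted.

def make_pair(D):
    def p1():
        out = []
        if D("P1", "P2"):
            out.append(1)   # the statement labeled zzz in P1
        return out

    def p2():
        out = []
        if D("P1", "P2"):
            out.append(0)   # the statement labeled zzz in P2
        return out

    return p1, p2

# If D says "equivalent", P1 prints 1 and P2 prints 0: they differ.
p1, p2 = make_pair(lambda a, b: True)
assert p1() != p2()

# If D says "different", neither prints anything: they are equivalent.
p1, p2 = make_pair(lambda a, b: False)
assert p1() == p2()
```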

An alternative to detection by appearance, is detection by behavior. A virus, just as any other program, acts as a surrogate for the user in requesting services, and the services used by a virus are legitimate in legitimate uses. The behavioral detection question then becomes one of defining what is and is not a legitimate use of a system service, and finding a means of detecting the difference.

As an example of a legitimate virus, a compiler that compiles a new version of itself is in fact a virus by the definition given here. It is a program that 'infects' another program by modifying it to include an evolved version of itself. Since the viral capability is in most compilers, every use of a compiler is a potential viral attack. The viral activity of a compiler is only triggered by particular inputs, and thus in order to detect triggering, one must be able to detect a virus by its appearance. Since precise detection by behavior in this case depends on precise detection by the appearance of the inputs, and since we have already shown that precise detection by appearance is undecidable, it follows that precise detection by behavior is also undecidable.
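The self-reference a self-compiling compiler exploits is the same one behind a quine, a program whose output is its own source. The following minimal Python quine is offered only as an illustration of benign self-reproduction; it is not drawn from the original paper:

```python
# A minimal self-reproducing program (a quine): its output is exactly its
# own source text. %r inserts the string's repr; %% is a literal %.
src = 'src = %r\nprint(src %% src)'
print(src % src)
```

Running it prints the two lines of the program itself, demonstrating that self-reproduction alone requires no illegitimate service.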

Limited Viral Protection

A limited form of virus has been designed [Thompson84] in the form of a special version of the C compiler that can detect the compilation of the login program and add a Trojan horse that lets the author login. Thus the author could access any Unix system with this compiler. In addition, the compiler can detect compilations of new versions of itself and infect them with the same Trojan horse. Whether or not this has actually been implemented is unknown (although many say the NSA has a working version of it).

As a countermeasure, we can devise a new login program (and C compiler) sufficiently different from the original as to make its equivalence very difficult to determine. If the 'best AI program of the day' would be incapable of detecting their equivalence in a given amount of time, and the compiler performed its task in less than that much time, it could be reasonably assumed that the virus could not have detected the equivalence, and therefore would not have propagated itself. If the exact nature of the detection were known, it would likely be quite simple to work around it.

Although we have shown that in general it is impossible to detect viruses, any particular virus can be detected by a particular detection scheme. For example, virus V could easily be detected by looking for 1234567 as the first line of an executable. If the executable were found to be infected, it would not be run, and would therefore not be able to spread. The following program is used in place of the normal run command, and refuses to execute programs infected by virus V:

program new-run-command:=
{file = name-of-program-to-be-executed;
if first-line-of-file = 1234567 then
   print "the program has a virus";
otherwise run file;}

Protection from Virus V "PV"
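A runnable sketch of PV's signature check follows. Representing a program as a list of source lines is an assumption made for the example, as is the return-value interface:

```python
# Sketch of PV: refuse to execute a program whose first line is the
# signature 1234567. Programs are modeled as lists of lines.
SIGNATURE = "1234567"

def new_run_command(lines):
    """Return the action taken for a program given as a list of lines."""
    if lines and lines[0].strip() == SIGNATURE:
        return "the program has a virus"
    return "run"

assert new_run_command(["1234567", "payload"]) == "the program has a virus"
assert new_run_command(["print('hello')"]) == "run"
# A trivially mutated virus V' slips past the fixed signature:
assert new_run_command(["123456", "payload"]) == "run"
```

The last assertion previews the point made below: any fixed detection scheme can be circumvented by a virus built with that scheme in mind.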

Similarly, any particular detection scheme can be circumvented by a particular virus. As an example, if an attacker knew that a user was using the program PV as protection from viral attack, the virus V could easily be replaced by a virus V' whose first line was 123456 instead of 1234567. Much more complex defense schemes and viruses can be examined. What becomes quite evident is analogous to the old western saying: "ain't a horse that can't be rode, ain't a man that can't be throwed". No infection can exist that cannot be detected, and no detection mechanism can exist that can't be infected.

This result leads to the idea that a balance of coexistent viruses and defenses could exist, such that a given virus could only do damage to a given portion of the system, while a given protection scheme could only protect against a given set of viruses. If each user and attacker used identical defenses and viruses, there could be an ultimate virus or defense. It makes sense from both the attacker's point of view and the defender's point of view to have a set of (perhaps incompatible) viruses and defenses.

In the case where viruses and protection schemes didn't evolve, this would likely lead to some set of fixed survivors, but since programs can be written to evolve, the program that evolved into a difficult to attack program would more likely survive as would a virus that was more difficult to detect. As evolution takes place, balances tend to change, with the eventual result being unclear in all but the simplest circumstances. This has very strong analogies to biological theories of evolution [Dawkins78], and might relate well to genetic theories of diseases. Similarly, the spread of viruses through systems might well be analyzed by using mathematical models used in the study of infectious diseases [Baily57].

Since we cannot precisely detect a virus, we are left with the problem of defining potentially illegitimate use in a decidable and easily computable way. We might be willing to detect many programs that are not viruses, and even miss some viruses, in order to detect a large number of viruses. If an event is relatively rare in 'normal' use, it has high information content when it occurs, and we can define a threshold at which reporting is done. If sufficient instrumentation is available, flow lists can be kept which track all users who have affected any given file. Users that appear in many incoming flow lists could be considered suspicious. The rate at which users enter incoming flow lists might also be a good indicator of a virus.
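The flow-list idea can be sketched as follows. The event log, user names, and threshold value are illustrative assumptions; a real system would derive them from its own instrumentation:

```python
from collections import defaultdict

# Sketch of flow-list instrumentation: for each file, record every user
# who has affected it; a user appearing in many incoming flow lists
# (i.e. who has touched many files) is flagged as suspicious.

def build_flow_lists(writes):
    """writes: iterable of (user, file) events -> {file: {users}}."""
    flows = defaultdict(set)
    for user, fname in writes:
        flows[fname].add(user)
    return flows

def suspicious_users(flows, threshold):
    """Users appearing in at least 'threshold' flow lists."""
    counts = defaultdict(int)
    for users in flows.values():
        for user in users:
            counts[user] += 1
    return {u for u, n in counts.items() if n >= threshold}

log = [("alice", "a.exe"), ("bob", "b.exe"),
       ("mallory", "a.exe"), ("mallory", "b.exe"), ("mallory", "c.exe")]
flows = build_flow_lists(log)
# mallory appears in three flow lists; with a threshold of 3, only
# mallory is flagged.
assert suspicious_users(flows, 3) == {"mallory"}
```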

This type of measure could be of value if the services used by viruses are rarely used by other programs, but presents several problems. If the threshold is known to the attacker, the virus can be made to work within it. An intelligent thresholding scheme could adapt so the threshold could not be easily determined by the attacker. Although this 'game' can clearly be played back and forth, the frequency of dangerous requests might be kept low enough to slow the undetected virus without interfering significantly with legitimate use.

Several systems were examined for their abilities to detect viral attacks. Surprisingly, none of these systems even include traces of the owner of a program run by other users. Marking of this sort must almost certainly be used if even the simplest of viral attacks are to be detected.

Once a virus is implanted, it may not be easy to fully remove. If the system is kept running during removal, a disinfected program could be reinfected. This presents the potential for infinite tail chasing. Without some denial of services, removal is likely to be impossible unless the program performing removal is faster at spreading than the virus being removed. Even in cases where the removal is slower than the virus, it may be possible to allow most activities to continue during removal without having the removal process be very fast. For example, one could isolate a user or subset of users and cure them without denying services to other users.
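The tail-chasing condition can be illustrated with a toy model. The pass structure, rates, and cutoff are assumptions of the sketch, not a model from the paper: each pass cleans a fixed number of infected files, and any survivors then spread to new ones.

```python
# Toy model of removal racing reinfection: each pass cleans clean_rate
# infected files; surviving infections then spread to spread_rate new
# files. Removal terminates only if cleaning outpaces spreading.

def disinfect(infected, clean_rate, spread_rate, max_passes=1000):
    """Return the number of passes to disinfect, or None if the cure
    never catches up within max_passes."""
    for passes in range(1, max_passes + 1):
        infected = max(0, infected - clean_rate)
        if infected == 0:
            return passes
        infected += spread_rate
    return None

# Cleaning faster than the virus spreads eventually wins...
assert disinfect(10, 3, 1) is not None
# ...but a virus that spreads as fast as it is cleaned is never removed.
assert disinfect(10, 1, 1) is None
```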

In general, precise removal depends on precise detection, because without precise detection, it is impossible to know precisely whether or not to remove a given object. In special cases, it may be possible to perform removal with an inexact algorithm. As an example, every file written after a given date could be removed in order to remove any virus started after that date.
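The date-based removal heuristic can be sketched as below. The directory layout and cutoff are illustrative; note that, being inexact, this removes legitimate work written after the cutoff along with any virus:

```python
import os
import tempfile
import time

# Sketch of inexact removal: delete every file modified after a cutoff
# time, which removes any virus started after the cutoff (and, being
# inexact, any legitimate file written since then as well).

def remove_after(root, cutoff):
    """Delete files under 'root' modified after 'cutoff'; return names."""
    removed = []
    for dirpath, _, names in os.walk(root):
        for name in names:
            path = os.path.join(dirpath, name)
            if os.path.getmtime(path) > cutoff:
                os.remove(path)
                removed.append(name)
    return removed

with tempfile.TemporaryDirectory() as root:
    old = os.path.join(root, "old.txt")
    new = os.path.join(root, "new.txt")
    for p in (old, new):
        open(p, "w").close()
    cutoff = time.time() - 60
    os.utime(old, (cutoff - 60, cutoff - 60))   # backdate the old file
    assert remove_after(root, cutoff) == ["new.txt"]
    assert os.path.exists(old) and not os.path.exists(new)
```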

One concern that has been expressed and is easily laid to rest is the chance that a virus could be spontaneously generated. This is strongly related to the question of how long it will take N monkeys at N keyboards to create a virus, and is laid to rest with similar dispatch.