Prevention of Computer Viruses

Prevention of Computer Viruses

Copyright(c), 1984, Fred Cohen - All Rights Reserved

We have introduced the concept of viruses to the reader, and actual viruses to systems. Having planted the seeds of a potentially devastating attack, it is appropriate to examine protection mechanisms which might help defend against it. We examine here prevention of computer viruses.

Basic Limitations

In order for users of a system to be able to share information, there must be a path through which information can flow from one user to another. We make no differentiation between a user and a program acting as a surrogate for that user since a program always acts as a surrogate for a user in any computer usage, and we are ignoring the covert channel through the user. In order to use a Turing machine model for computation, we must consider that if information can be read by a user with Turing capability, then it can be copied, and the copy can then be treated as data on a Turing machine tape.

Given a general purpose system in which users are capable of using information in their possession as they wish and passing such information as they see fit to others, it should be clear that the ability to share information is transitive. That is, if there is a path from user A to user B, and there is a path from user B to user C, then there is a path from user A to user C with the witting or unwitting cooperation of user B.

Finally, there is no fundamental distinction between information that can be used as data, and information that can be used as program. This can be clearly seen in the case of an interpreter that takes information edited as data, and interprets it as a program. In effect, information only has meaning in that it is subject to interpretation.

In a system where information can be interpreted as a program by its recipient, that interpretation can result in infection as shown above. If there is sharing, infection can spread through the interpretation of shared information. If there is no restriction on the transitivity of information flow, then the information can reach the transitive closure of information flow starting at any source. Sharing, transitivity of information flow, and generality of interpretation thus allow a virus to spread to the transitive closure of information flow starting at any given source.

Clearly, if there is no sharing, there can be no dissemination of information across information boundaries, and thus no external information can be interpreted, and a virus cannot spread outside a single partition. This is called 'isolationism'. Just as clearly, a system in which no program can be altered and information cannot be used to make decisions cannot be infected since infection requires the modification of interpreted information. We call this a 'fixed first order functionality' system. We should note that virtually any system with real usefulness in a scientific or development environment will require generality of interpretation, and that isolationism is unacceptable if we wish to benefit from the work of others. Nevertheless, these are solutions to the problem of viruses which may be applicable in limited situations.

Partition Models

Two limits on the paths of information flow can be distinguished, those that partition users into closed proper subsets under transitivity, and those that don't. Flow restrictions that result in closed subsets can be viewed as partitions of a system into isolated subsystems. These limit each infection to one partition. This is a viable means of preventing complete viral takeover at the expense of limited isolationism, and is equivalent to giving each partition its own computer.

The integrity model [Biba77] is an example of a policy that can be used to partition systems into closed subsets under transitivity. In the Biba model, an integrity level is associated with all information. The strict integrity properties are the dual of the Bell-LaPadula properties; no user at a given integrity level can read an object of lower integrity or write an object of higher integrity. In Biba's original model, a distinction was made between read and execute access, but this cannot be enforced without restricting the generality of information interpretation since a high integrity program can write a low integrity object, make low integrity copies of itself, and then read low integrity input and produce low integrity output.

If the integrity model and the Bell-LaPadula model coexist, a form of limited isolationism results which divides the space into closed subsets under transitivity. If the same divisions are used for both mechanisms (higher integrity corresponds to higher security), isolationism results since information moving up security levels also moves up integrity levels, and this is not permitted. When the Biba model has boundaries within the Bell-LaPadula boundaries, infection can only spread from the higher integrity levels to lower ones within a given security level. Finally, when the Bell-LaPadula boundaries are within the Biba boundaries, infection can only spread from lower security levels to higher security levels within a given integrity level. There are actually 9 cases corresponding to all pairings of lower boundaries with upper boundaries, but the three shown graphically below are sufficient for understanding.

  Same Divisions	 Biba within B-L	 B-L within Biba
  --------------	 ---------------	 ---------------
Biba   B-L   Result	Biba   B-L   Result	Biba   B-L   Result
----   ----   ----	----   ----   ----	----   ----   ----
|\\|   |//|   |XX|	|\\|   |//|   |XX|	|\\|   |//|   |XX|
|\\|   |//|   |XX|	|\\|   |  |   |\\|	|  |   |//|   |//|
|  | + |  | = |  |	|  | + |  | = |  |	|  | + |  | = |  |
|//|   |\\|   |XX|	|//|   |  |   |//|	|  |   |\\|   |\\|
|//|   |\\|   |XX|	|//|   |\\|   |XX|	|//|   |\\|   |XX|
----   ----   ----	----   ----   ----	----   ----   ----
\\ = can't write   // = can't read    XX = no access    \ + / = X

Biba's work also included two other integrity policies, the 'low water mark' policy which makes output the lowest integrity of any input, and the 'ring' policy in which users cannot invoke everything they can read. The former policy tends to move all information towards lower integrity levels, while the latter attempts to make a distinction that cannot be made with generalized information interpretation.

Just as systems based on the Bell-LaPadula model tend to cause all information to move towards higher levels of security by always increasing the level to meet the highest level user, the Biba model tends to move all information towards lower integrity levels by always reducing the integrity of results to that of the lowest incoming integrity. We also know that a precise system for integrity is NP-complete (just as its dual is NP-complete).

The most trusted programmer is (by definition) the programmer that can write programs executable by the most users. In order to maintain the Bell-LaPadula policy, high level users cannot write programs used by lower level users. This means that the most trusted programmers must be those at the lowest security level. This seems contradictory. When we mix the Biba and Bell-LaPadula models, we find that the resulting isolationism secures us from viruses, but doesn't permit any user to write programs that can be used throughout the system. Somehow, just as we allow encryption or declassification of data to move it from higher security levels to lower ones, we should be able to use program testing and verification to move information from lower integrity levels to higher ones.

Another commonly used policy used to partition systems into closed subsets is the category policy used in typical military applications. This policy partitions users into categories, with each user only able to access information required for their duties. If every user in a strict category system has access to only one category at a time, the system is secure from viral attack across category boundaries because they are isolated. Unfortunately, in current systems, users may have simultaneous access to multiple categories. In this case, infection can spread across category boundaries to the transitive closure of information flow.

Flow Models

In policies that don't partition systems into closed proper subsets under transitivity, it is possible to limit the extent over which a virus can spread. The 'flow distance' policy implements a distance metric by keeping track of the distance (number of sharings) over which data has flowed. The rules are; the distance of output information is the maximum of the distances of input information, and the distance of shared information is one more than the distance of the same information before sharing. Protection is provided by enforcing a threshold above which information becomes unusable. Thus a file with distance 8 shared into a process with distance 2 increases the process to distance 9, and any further output will be at at least that distance.

As an example, we show the flow allowed to information in a distance metric system with the threshold set at 1 and each user (A-E) able to communicate with only the 2 nearest neighbors. Notice that information starting at C can only flow to user B or user D, but cannot transit to A or E even with the cooperation of B and D.

  D(output) = max(D(input))
  D(shared input)=1+D(unshared input)
  Information is accessible iff D < const

   A     B     C     D     E
  +-+   +-+   +-+   +-+   +-+
  +-+   +-+   +-+   +-+   +-+

A Distance Metric with a Threshold of 1

The 'flow list' policy maintains a list of all users who have had an effect on each object. The rule for maintaining this list is; the flow list of output is the union of the flow lists of all inputs (including the user who causes the action). Protection takes the form of an arbitrary boolean expression on flow lists which determines accessibility. This is a very general policy, and can be used to represent any of the above policies by selecting proper boolean expressions.

The following figure shows an example of a flow list system implementing different restrictions (indicated by A and B) for different users (at row,col 1,3 and 2,5). Notice that although information is allowed to get to 1,5, it can't actually get there because there is no path from its source at 1,3. As in the distance metric system, transitivity of information flow does not hold, so that even if information indicated by B were able to reach 2,3, it could not transit any further.

  Information is accessible iff B(F)=1
   1     2     3     4     5     6
  +-+   +-+   +-+   +-+   +-+   +-+
1 |A|---|A|---|A|---| |---|A|---|B|
  +-+   +-+   +-+   +-+   +-+   +-+
   |     |     |     |     |     |
  +-+   +-+   +-+   +-+   +-+   +-+
2 | |---| |---|A|---| |---|B|---|B|
  +-+   +-+   +-+   +-+   +-+   +-+
   |     |     |     |     |     |
  +-+   +-+   +-+   +-+   +-+   +-+
3 |B|---|B|---|B|---|B|---|B|---|B|
  +-+   +-+   +-+   +-+   +-+   +-+

    A Sample Flow List System

The above example uses a fairly simple boolean function, but in general, very complex conditionals can be used to determine accessibility. As an example, user A could only be allowed to access information written by users (B and C) or (B and D), but not information written by B, C, or D alone. This can be used to enforce certification of information by B before C or D can pass it to A. The flow list system can also be used to implement the Biba and the distance models. As an example, the distance model can be realized as follows: @center[OR(users <= distance 1) AND NOT(OR(users > distance 1))]

In a system with unlimited information paths, limited transitivity may have an effect if users don't use all available paths, but since there is always a direct path between any two users, there is always the possibility of infection. As an example, in a system with transitivity limited to a distance of 1 it is 'safe' to share information with any user you 'trust' without having to worry about whether that user has incorrectly trusted another user.

Limited Interpretation

With limits on the generality of interpretation less restrictive than fixed first order interpretation, the ability to infect is an open question because infection depends on the functions permitted. Certain functions are required for infection. The ability to write is required, but any useful program must have output. It is possible to design a set of operations that don't allow infection in even the most general case of sharing and transitivity, but it is not known whether any such set includes non fixed first order functions.

As an example, a system with only the function 'display-file' can only display the contents of a file to a user, and cannot possibly modify any file. In fixed database or mail systems, this may have practical applications, but certainly not in a development environment. In many cases, computer mail is a sufficient means of communications, and so long as the computer mail system is partitioned from other applications so that no information can flow between them except in the covert channel through the user, this may be used to prevent infection.

Although no fixed interpretation scheme can itself be infected, a high order fixed interpretation scheme can be used to infect programs written to be interpreted by it. As an example, the microcode of a computer may be fixed, but code in the machine language it interprets can still be infected. LISP, APL, and Basic are all examples of fixed interpretation schemes that can interpret information in general ways. Since their ability to interpret is general, it is possible to write a program in any of these languages that infects programs in any or all of these languages.

In limited interpretation systems, infections cannot spread any further than in general interpretation systems, because every function in a limited system must also be able to be performed in a general system. The previous results therefore provide upper bounds on the spread of a virus in systems with limited interpretation.

Precision Problems

Although isolationism and limited transitivity offer solutions to the infection problem, they are not ideal in the sense that widespread sharing is generally considered a valuable tool in computing. Of these policies, only isolationism can be precisely implemented in practice because tracing exact information flow requires NP-complete time, and maintaining markings requires large amounts of space [Denning82]. This leaves us with imprecise techniques. The problem with imprecise techniques is that they tend to move systems towards isolationism. This is because they use conservative estimates of effects in order to prevent potential damage. The philosophy behind this is that it is better to be safe than sorry.

The problem is that when information has been unjustly deemed unreadable by a given user, the system becomes less usable for that user. This is a form of denial of services in that access to information that should be accessible is denied. Such a system always tends to make itself less and less usable for sharing until it either becomes completely isolationist or reaches a stability point where all estimates are precise. If such a stability point existed, we would have a precise system for that stability point. Since we know that any precise stability point besides isolationism requires the solution to an NP-complete problem, we know that any non NP-complete solution must tend towards isolationism.

Summary and Conclusions

The following table summarizes the limits placed on viral spreading by the preventative protection just examined. Unknown is used to indicate that the specifics of specific systems are known, but that no theory has been shown to predict limitations in these categories.

                     Limits of viral infection

         general interpretation           limited interpretation
Sharing\  limited      general             limited      general
        |-----------|-----------|       |-----------|-----------|
general | unlimited | unlimited |       | unknown   |  unknown  |
        |-----------|-----------|       |-----------|-----------|
limited | arbitrary |  closure  |       | arbitrary |  closure  |
        |-----------|-----------|       |-----------|-----------|