1. A sequence of symbols which, when interpreted by an the environment in which it operates, replicates and/or evolves. 2. A member of the viral set with respect to an environment. 3. See  for the first formal mathematical definition. Also see , , and  for some alternate definitions.
The first computer virus probably appeared in the late 1950s or early 1960s as an experiment in operating systems design. Mathematical theories of self-replicating automata has a fairly long history prior to that time. In the 1970s, the Xerox worm program  demonstrated the ability to propagate through a network. The worm accidentally caused denial of services in an isolated instance, apparently due to a bit error. The mathematical definition of viruses  is also general enough to cover other classes of living systems. We refer the reader to  and a good book on theory of computation for more references on these sorts of systems. The term virus has also been used in conjunction with an augmentation to APL in which the author places a call at the beginning of each user defined APL function to invoke a preprocessor which augments the default APL interpreter .
Throughout the 1960s and 1970s, there were a few experiments with computer viruses and similar phenomena in a piecemeal fashion, but the first scientific work to concentrate on the protection aspects of viruses and the difficulties in defending against them was a paper in 1984 . Since that time, there have been a flurry of papers and developments, and a number of widely spread computer virus attacks.
The game of "core wars"  was invented in the mid 1980s to allow programs to "do battle" with each other in a benign environment. Other variations on this theme have been reported by many unpublished authors. Efficient parallel processing algorithms are now under study using virus replication to spread throughout a network and limited forms of evolution to work towards optimal resource usage and problem solutions. There is now an emerging research field on "artificial life" concentrated in this direction.
As of this writing, there are over 100 well known real-world computer viruses, affecting every type of computer and operating system, and according to IBM's high integrity computing laboratory , new viruses are being found in the environment at a rate of about one every 6 days. Several computer viruses have been distributed by well known computer software manufacturers in legitimate "factory" distributions. In one attack, about 500,000 users were affected by a single computer virus in a matter of days . Most analysts agree that the problem is getting worse and that if protection does not improve, the number of affected users will increase dramatically from year to year.
Early Scientific Work:
A "computer virus" [1,13] was (informally) defined as a program that can "infect" other programs by modifying them to include a possibly evolved version of itself. With this infection property, a virus can spread to the transitive closure of information flow, from program to program, from user to user, from computer to computer, and from network to network, corrupting the integrity of information as it spreads . Given the widespread use of sharing in current computer systems, the threat of a virus causing widespread integrity corruption is significant [2,3].
A considerable amount of early work was done on protection policies to prevent from the illicit dissemination of information [4,5], and several systems have been implemented to provide protection from this sort of attack [6-9], but little work was done on protecting the integrity of information except in very limited domains [10,11]. With the advent of computer viruses [1,13], this situation has changed dramatically (see [12-32]). Several conferences on viruses are now held yearly, many universities are now offering graduate classes on computer viruses even though they have no other protection related course offerings in their curriculum, and integrity protection has become a global protection priority.
Unique Security Implications:
The major difference between previous self-replicating program research and the more recent computer virus research is the property of "infection". With this infection property, many major issues arise, most notably :
Viruses spread transitively, and thus have a tremendous range relative to other attacks. A typical PC based virus spreads to thousands of unconnected computers in a matter of weeks. In more well connected networks, transitive information flow has reached tens of thousands of computers in a matter of hours .
Viruses persist indefinitely, and may be accidentally saved on backup tapes, floppy disks, and other media in an infected host. They may be accidentally or purposely revived long after they appear to have been eradicated. In several cases, known viruses have returned from other media after weekly eradication from a network over a period of a year or more .
Viruses tend to persist longer than any other program in the operating environment, continuing across updates in user programs and operating system upgrades. Attacks with long latency before damage and/or very subtle damage tend to spread further and persist longer because they tend to become interwoven with legitimate activities of the system .
Viruses tend to spread very rapidly. As in biological viruses, the more the infection spreads, the more copies there are available for spreading it. A typical time to attain all privileges of all users on a timesharing system is 30 minutes . A typical network attack takes over the entire network in a few hours . In one experiment, a 60 machine IBM PC-network was completely taken over in 30 seconds .
Viruses can act as carriers for any other information, and thus can be used to cause arbitrary effects . They can carry other attacks along with them, and thus bypass many of the protection mechanisms that would otherwise be in place against the embedded attacks. Thus a virus can be used to introduce a covert channel, modify operational controls, or do almost any other damage .
Since the level of indirection between initial infection and any given instance of a virus may be so great, it may be very difficult to trace an infection to its initial source. Thus the chance of a serious attacker getting caught launching a viral attack is substantially less than for most other attacks. Only 2 out of over 100 virus attacks that did not explicitly name the author have been traced to a source, and neither of these used the infection property to spread.
Most of the problems related to detecting a virus are undecidable. It is undecidable whether a program is infected, whether a sequence of instructions is a virus, whether a known virus evolves into another known virus, etc. Evolving viruses are easily written and often far harder to detect and eradicate than simple viruses .
Infection may be performed on any information that is interpreted, and may infect any other interpretable information. Protecting binary executable programs will not prevent infection of spreadsheet files or database files. Protecting source programs, does not protect against infection of intermediate compilation files, computer mail, or load libraries. We have seen real world viruses in spreadsheets, databases, several command languages, source programs, executable files, computer mail, library files, editor macros, and bootstraps. The integrity of information must be protected at every level.
Models of Viral Spread:
Biological models of prevention, detection, and cure has been pursued to a small degree [13,30], but no significant progress has yet been made towards a systematic defense under this model. More recently , agricultural models have been applied by treating viruses as 'pests' infesting plants. These models tend to provide insight into managing the virus problem rather than as technical defense mechanisms, and are therefore most advantageous to high level decision makers.
Mathematical models of viral spread have been investigated by a number of researchers, and we now have a reasonable means of predicting the spread of typical viruses given an accurate enough characterization of a system and no defense in place .
The potential threat of a widespread protection problem has been examined , and the potential damage to government, financial, business, and academic institutions is extreme. In addition, these institutions tend to use ad-hoc protection mechanisms in response to specific threats rather than sound techniques . To help clarify the potential damage, we now describe some examples of the sort of damage a virus could do.
- Older military protection systems depend to a large degree on isolationism, but newer systems allow 'multilevel' usage, in which users of various levels of trustworthiness simultaneously coexist and communicate within a single system . In most of these systems, the user with the lowest security clearance is the greatest threat from the standpoint of viral attack, because higher security level users can run programs written at the lowest security level, and thus become infected [1,13].
- Corruption in a multilevel secure system can be exploited to leak secrets at a high rate. This attack exploits so called "Covert Channels", channels for information flow not designed for that purpose. In any system that shares resources in a non-fixed fashion, there are covert channels. Furthermore, Shannon's information theory shows that any amount of information can be reliably sent through such a channel regardless of the presence of noise, with bandwidth limited by the signal and noise characteristics of the system . Some simple examples of covert channels are the amount of disk space, the time to complete a task, printing delay, etc. In the most secure current systems, there are hundreds of covert channels with bandwidths in the range of 50 bits/sec each. Thus a cooperating pair of processes at different secrecy levels could potentially leak thousands of bits per second from the best multilevel computer security systems currently available. To exploit this vulnerability using a virus, we spread the virus from the least to the most trusted user [1,13], and then leak secrets using the covert channels.
- A data diddling virus is a virus that performs damage by changing data files . A typical scenario is that every infected program changes one pseudo-randomly chosen bit in one pseudo-randomly chosen data file once every week. This causes two major problems; we would typically have hundreds of copies of the infection per computer, causing hundreds of random data changes per week; and we typically have no way to determine that data files are changing because we don't cover these changes with any protection techniques. Almost no organization in the world today is capable of detecting this sort of attack until the data begins to yield very wrong results (and few would be able to tell this for a long period of time). Once we detect these changes, we still have the problem of fixing them. In many systems, data tends to change legitimately very often, and we may not be able to tell how far back to go into backups to correct the problem. Sequential transaction files are critical to recovery from this sort of attack in a transaction system. Once we recover, we may still be unaware of the presence of the virus, and may run into the same problem again and again.
- A random deletion virus is similar to a data diddling virus in that it spreads throughout a network and causes only indirect harm. The damage in this case is the deletion of files that have not been accessed in a long time . For example, a virus that occasionally deletes files that have not been accessed in over a month is unlikely to be detected as a result of this damage. It turns out (empirically) that a file that has not been accessed over the last month is unlikely to be accessed in the next month. The longer the time since the last access, the lower the chances are of an access in the near future. By deleting the least recently used file first, the attacker minimizes the chances of being caught. This is especially severe in systems where files are moved off line periodically as a method of saving space. By deleting 10% of the files that are about to be moved off-line for each user, the attacker may fool the user into thinking that any removed file is simply off-line, causing an extensive search of backups to no avail. This might ultimately result in a breakdown of trust in the backup system, and sets of accusations flying back and forth between users and administrators.
- A production destruction virus is designed to be used against a competitor in a manufacturing or production environment. The idea is to cause the quality of the manufactured product to be reduced slightly by making subtle variations in computer controlled parameters. For example, if you could cause a competitor's aluminum rolling mill to have too fast a cooling process when the hour of the day is the same as the day of the month minus the month of the year and only the standard users are on the system, you could probably cause reduced quality over a substantial period of time. When systems maintenance was being done or when a researcher was trying to diagnose the problem, the system would behave correctly. Meanwhile, the competition sells at a lower price or claims higher quality.
- A protection code changing virus is a virus that changes the protection codes of information to allow access that should be denied and/or deny access that should be allowed . This can cause a multitude of operational problems as well as security breaches, but it also points out a major failure in most modern protection systems. They fail to provide adequate tools to detect or correct erroneous alteration of the protection state, whether malicious or not. There are protection packages that provide this sort of checking and correction, but almost no organization currently uses these techniques.
- A name changing virus is a virus that changes file names or record headers, so that information retrieved is not as requested. This can be done at any level within a system, from changing directory names to changing names of fields within database files. There is at least one instance of a VMS program that caused names to disappear until logoff at which point they would reappear. Thus when brought to the attention of the security officer, the user would login in front of the officer and everything would appear correct. Until this was tracked down, there were a lot of misunderstandings .
- A network deadlock virus is a virus that overloads a network . The Internet virus of 1988 was just such a case , as was the Christmass card of 1987 . Each of these caused network failures which crossed national and continental boundaries. Although neither virus was apparently designed with this purpose in mind, the exponential growth of self replicating programs quickly caused this side effect. Most virus writers apparently underestimate the effect of exponential growth.
- An executive error virus is a virus designed to affect decisions made by "higher-ups" to discredit them or otherwise cause organizational changes . In a typical scenario, a vice president (perhaps of R+D) places a virus in a spread sheet. As the virus spreads from spread sheet to spread sheet, it eventually enters the CEO's spread sheets where it is triggered to randomly change a few cells when they are displayed on executive office terminals. As the CEO makes incorrect decisions based on the misinformation, the loss of credibility might eventually dislodge the CEO, perhaps making the VP of R+D the next CEO. An attack of this sort was reported in 1987, but no substantiation has yet been produced.
An important point that shows up from exploring these scenarios is that in most cases, viruses are more severe and harder to track down if the harmful effects are only indirectly linked to the infection mechanism. In many of these cases, the virus never causes any apparent failure in an infected program. This tends to extend the lifetime of the attack and cause defenders to look in the wrong places to find the cause of corruptions if and when they notice corruption taking place. Without a means of noticing corruption before it causes substantial harm, we are in a very difficult situation.
To demonstrate the feasibility of viral attack and the degree to which it is a threat, we performed several experiments in 1983 and 1984 . In each case, experiments were performed with the knowledge and consent of systems administrators. Implementation flaws were avoided because it was critical that these experiments be based on fundamental flaws in the current nature of computer use.
Experiments have been performed on a variety of systems to demonstrate feasibility and determine the ease of implementation. The following table summarizes the results of some representative experiments to date. The systems and languages are across the horizontal axis (Unix in C, Bell-LaPadula, Instrumentation, Unix shell, VMS command language, Basic, IBM token ring personal computer network, and a personal computer in the DOS command language). The vertical axis indicates measures of performance (time to write the program, infection time, number of lines of code, number of experiments performed, minimum time to takeover, average time to takeover, and maximum time to takeover) where time to takeover indicates that all rights are granted to the attacker within that delay from introducing the virus. N/A entries indicate cases where results are either not applicable or not available. Most of these results have been confirmed by researchers world wide.
There are several cases where experimenters have accidentally released viruses, presumably through carelessness. This may seem to be irresponsible, but the blame cannot always be laid entirely on the experimenter. Very few researchers maintain isolated systems for their experiments, even though this is clearly indicated by the early research in this field . In some cases this is because of budgetary limits and in some cases it is simply a matter of ignorance or ego. Most of the broad research community is unaware of the basic results. This ignorance of virus research (and protection research in general) is common among the 'open research community'. Most researchers also think that they are good enough to adequately control their experiments regardless of the environment, and forgo adequate precautions in favor of expediency.
Another major source of viruses appears to be business associations that distribute viruses to customers to demonstrate how they work. Most legitimate researchers consider this irresponsible behavior, and this contention is widely supported by numerous real incidents. In one case, a member of one of these associations released viruses in its defense product distributions! Clearly the virus defense industry must take precautions more seriously than this.
In 1987, a research virus in Silicone Valley was accidentally released and eventually infected Hewlett Packard and Apple Computer. Another incident happened in 1989, with a university researcher. Some of the widely known viruses also appear to be generated by researchers, but we have no substantial evidence of wrongdoing to be certain.
Very few organizations or individuals have adequate facilities and/or knowledge at this time to safely perform virus experiments. Of the hundreds of companies in the virus defense industry, only 3 that I know of have laboratories well enough protected to prevent accidental releases in their products. Of the non-product oriented researchers, I only know of 3 groups who have adequately protected laboratories for this purpose. Improved research precautions and standards are clearly called for, and there is a move in the research community towards a set of global standards for this research.
Well over 100 different viruses have apparently been released intentionally, and according to members of the global research community , more than 1 new virus is now discovered in the environment per week. The vast majority of the detected attacks are very simplistic and perform damage in a very obvious and rapid fashion, but there are a number of interesting cases that represent the changing nature of attacks.
One of the earliest attacks was apparently started at Lehigh University in the fall of 1987. In this PC based virus, an infected operating system disk causes itself to be replicated into the operating system of other disks it comes in contact with. After 4 replications, it destroys the file systems on all accessible disks. This virus infected thousands of disks and caused hundreds of systems to lose all of the data on their disks before it was detected. Several hundred more systems were infected before a defense was designed. A trivial change to this virus to wait till 100 infections before performing damage would almost certainly damage tens of thousands of computers worldwide. Only the rapid onset of damage and the rapid response by Lehigh University's computing community stopped this attack from being a widespread disaster.
Another very damaging virus is the 'Scores' virus which attacks Apple MacIntosh computers. In this case, the virus attacks programs written by particular authors at a particular company, and it is thought to have been written by a disgruntled ex-employee. It causes systems crashes and prevents saving information before the crash. This virus waits several days between infection and damage, and thus gets embedded into backups and other media removed from the system. Several sites have reported weekly reinfection after cleanup over a one year period.
On the very widely used "Compuserve" network, the 'MacMag' virus was planted to infect the initialization files of the Apple MacIntosh. This virus was designed to put an advertisement on the screen on a particular date. It eventually got into a major software house and was distributed in about 10,000 copies of legitimate shrink wrapped software. As far as we can tell, this was the first major release of a virus in a legitimate software distribution.
Although many people claim that as a defense, you should stay away from 'shareware', 'public domain software', and 'bulletin boards', the fact is that only one widely known virus has ever been launched through a bulletin board, and none have ever appeared in shareware or public domain software distributions from legitimate dealers. Viruses have been released in legitimate software distributions from major vendors, from universities, from private companies, and from a variety of unknown sources. Maintenance crews, employees, external consultants, and EDP auditors, have all accidentally brought viruses from other organizations back into their environment, and then carried them into their clients' systems.
We don't want to leave the impression that only personal computers have been attacked in this fashion. In fact several large computer companies have been successfully attacked by viruses that spread throughout their timesharing systems, even where the most stringent protection is provided. In a secure computer system development environment at AT+T, an internal experimental virus got into a secure development system during maintenance . The 1987 Christmas card virus went through IBM compatible mainframes worldwide, affecting about 500,000 users and bringing three networks to a halt for several hours. In 1988, the 'internet' virus entered about 60,000 engineering workstations and timesharing systems in a matter of hours, remained in about 6,000 of them, and caused worldwide disruptions for several days. Virtually every major operating system has now been successfully attacked by computer viruses.
More recently, attackers have raised the stakes by launching a series of more sophisticated attacks designed to circumvent current defenses. In 1989, the AIDS virus was mailed to tens of thousands of users on a PC mailing list, causing major disruptions for many companies. Although not very virulent, this attack was apparently well funded, and demonstrated widespread inadequacies in technical and procedural defenses. The 'Datacrime' virus of 1989 caused loss of data in many organizations in Europe, even though it was detected well in advance of any damage and its presence was broadcast on worldwide news programs. An evolutionary virus launched in late 1989 was designed to exploit a weakness in the vast majority of pattern matching programs, thus making most of them impotent against this attack. Another attack in late 1989 was designed to make infected programs appear clean when 'read' from disk, but not when 'loaded' from disk. If this trend continues, we will see the vast majority of current defensive products fail in the next few years, leaving only sound defenses effective.
Experiments and real attacks have shown that viruses spread throughout the most secure systems we have, even when we use the best available technologies circa 1983 . In an industry and government survey in 1988 , 40% of respondents had detected known viruses in their systems and said they need more defenses than they currently have.
Viral Defenses With Major Flaws:
AUDIT TRAILS - Audit trails as they exist today are not capable of tracking even the simplest viral attacks involving infection , and despite their success in the Internet and IBM attacks , are rarely capable of even tracking simple self-replicating behavior. Furthermore, most of the improvements necessary for tracking infectious viruses in computer systems are NP-Complete .
VACCINES - Many viruses place information in infected programs to prevent repeated reinfection. Defenders have created special purpose vaccines that emulate infection by placing the same information in the same locations. This class of defenses is easily defeated by a pair of viruses that look for different information in the same location. The result is alternation between two or more viruses that compete for which programs are infected. By emulating any one of them, you only create a temporary imbalance in the equilibrium which is automatically compensated for by the other viruses in the viral set .
SOFTWARE SELF DEFENSE - In 1985 , self defense was explored as a means for programs to defend themselves against attack. As an alternative to system wide defensive techniques against viruses, the survival of the fittest analogy provides an interesting means for self defense by individual programs. Every programmer could design an independent self defense mechanism, or a compiler might generate defenses automatically for programs compiled with it. A typical self defense involves detection and correction of internal errors . The advantage of such a system is that it is easily implementable in a very short period of time [23,27]. It was once thought that this method could systematically cause the complexity of undetected attack against programs and their data files to be made very large by the use of an evolutionary cryptographic checksumming technique [13,23,27], and as a result, several products are on the market that provide this sort of detection. Unfortunately, there is a generic attack against any such mechanism :
- Any such mechanism cannot make checks until it is interpreted. If the virus is inserted so that it is interpreted before the defense, it can forge any aspect of the environment the defense uses to detect corruption. This particular vulnerability can be exploited very generally without any knowledge of the defense in use. An experimental attack of this sort has been demonstrated in Unix, and it successfully defeats any such mechanism. An MS-DOS virus in late 1989 demonstrated the same capability for the DOS operating system. In this case, we cannot be assured of getting a legitimate DOS operating system unless we use some hardware protection mechanism on the operating system. Thus, we must initialize a system from a write protected floppy disk with the operating system in order to be certain our system is not corrupted from within .
PROPER PROTECTION STATE - Protection mechanisms on most systems are poorly maintained by systems administrators. By simply setting the protection to files so that other users cannot execute them, the paths of outgoing infection are limited. Rather than allowing access to a large group of people, individual permission for other users to use programs, reduces the channels of spread. Although this does not prevent viral attack in general, it makes implementation of viruses less trivial, and can even provide real limits on information flow if the operating system protection is sound and the protection state is properly set [17-22]. Unfortunately, in almost all current systems, these conditions are not met. Automated tools for this purpose are now in limited use, and this will likely expand in the near future.
USER NOTIFICATION AND AWARENESS - User awareness is a critical aspect of protection. If users are aware of the threats against them, they may be better prepared to protect themselves. As an example, every time a file is modified, the operating system could notify the user of the change. This would make the user more aware of strange side effects of programs, and allow detection of many attacks by the user being attacked. Unfortunately, any subtle attack is unlikely to be detected by normal user vigilance, and most users tend to ignore this sort of behavior, because as Shannon's information theory points out , the rarer a warning is, the more likely it is to be noticed. As the number of user decisions grows, the tendency to use default responses increases, until eventually, the user simply defaults everything and attacks become indifferentiable from normal operation.
PROGRAM CONFINEMENT - The confinement of programs to limited authorization spaces has been shown infeasible because of covert channels , and precise tracking is NP-complete . Nevertheless, there are confinement strategies more general than Lampson's program confinement that are effective.
INSTRUMENTATION - Two types of system instrumentation were considered for detecting and tracing viral attacks. Although no technique will catch all viruses, many techniques may catch most viruses. The 'flow list' method whereby every user who might have had an effect on a file whether indirectly or directly is kept in an 'incoming flow list', and the statistical behavior method whereby the behavior of normal operation is observed and compared to the behavior under viral attacks to find good measures with which to detect viruses. Each was shown to be an improvement over systems with no detection mechanisms, but they are far from ideal. Flow lists tend to become large rapidly, and are NP-complete to precisely maintain. Statistical techniques tend to identify non-viruses, and if the nature of the technique is known, can always be avoided [1,13].
NORMAL OPERATING PARAMETER CHECKERS - One interesting idea for detecting some classes of attacks is checking operating parameters against expected values to see if overly errant behavior is taking place. For example, if files are changing too rapidly, or if processes are spawning to quickly, etc. Although this may be a good idea, it does not provide protection against viruses that do not operate outside the normal expectations of average programs.
PROGRAMS THAT LOOK FOR VIRUS LIKE CODE - Programs that examine other programs' structure and behavior have been under investigation for quite some time for detecting cheating in computer science classes, and extensions of these principles are being explored for viral detection. We know that detection of a virus by examination is undecidable , and that the long term hope for this is therefore unsound. The present techniques are very limited in their application, and offer little assurance. Even more importantly, such techniques may lead to a false sense of security, take a great deal of time, and are poor relative to other existing techniques.
SOFTWARE FAULT TOLERANCE - The problem of computer viruses can be thought of as a reliability problem in which we have 'N' system users, 'M' of which are reliable. The problem we are faced with is that of getting reliable answers from a system with only 'M of N' reliable components. We have performed mathematical analysis of the effects of collusions on the spread of integrity corruption [17,18,19,21], and are presently considering issues related to redundant POset networks. Research in N-version programming has shown that it is very difficult to get multiple versions of the same program to operate with the same behavioral characteristics , and that even two correctly written programs may give different results if there specification is not sufficiently complete .
When we run a multiversion program we specify N executables, all written by different authors. The operating system automatically invokes the N programs in parallel, and performs voting on all inputs and outputs [43-45]. The user is notified of disagreements, and the answer that is held by the majority of the programs (assuming there is one) is considered correct. This result is also fed into the other programs so that they can proceed on the basis of this result for further processing. If there is no majority, the program is aborted as having too severe an error for continuation.
Two major problems with the use of N-version programs are that it requires considerably more effort to program and more hardware for the same system performance. The requirement of identical I/O behavior presents several problems. Two different methods of performing a calculation may yield very close, but differing results. The decision to take the average or disregard all results may have severe ramifications. The program as a whole can proceed no faster than the slowest algorithm, and transient faults may propagate through a program, thus giving it the appearance of a permanent fault. More reliability can be gained by increasing N, but if performance is to remain the same, this requires a factor of (slightly more than) N increase in hardware and software costs. Finally, just as simple checksums and other standard fault tolerant computing methods fail under deliberate attack, we cannot be certain about the value of N-version programs until we perform deeper analysis.
LOOKING FOR KNOWN VIRUSES - There are a number of pattern matching programs on the market to detect known viruses, and they are quite successful at this, but they present some problems. The first problem is that they give a false sense of security. Many users check for the viruses listed on the provided disk, but would not notice if another virus appeared, and would thus feel secure even though they were in fact at great risk. The second problem is that these programs fail miserably at detecting evolving viruses. It is simple to write an evolving virus with millions of evolutions in the viral set , and real world attacks have followed the theoretical predictions. Since each may be very different in appearance from the others, a single evolving attack may require millions of patterns to be checked. The third problem is that many viruses in the world today are modifications of an existing virus by an attacker wishing to change the damage or improve the effectiveness. We can detect larger classes of variations by having less accurate patterns, but this also increases the likelihood of false positives. Finally, pattern matchers only work against known attack patterns, and thus only detect viruses we already know about.
What We Know About Sound Computer Virus Defenses:
Sound defenses seem to concentrate in three areas; prevention, detection, and cure. Prevention as we will see is rarely feasible in the current computing environment. Change detection has become quite good in the last several years thanks to the development of better cryptographic checksum systems and integrity shells. Cure is very limited, especially in untrusted systems where widespread corruption is very easily attained by an attacker.
PREVENTION - There are only three things we can do to prevent viral infection in a computer system [1,13]; limit sharing, limit transitivity, and limit functionality.
In general, we can limit sharing by limiting information flow so as to form a POset of communicating information domains in a network or system [18,19,21,22]. In such a system, we can guarantee that a virus will spread only to those domains which are in the transitive flow path from its initial source . The POset policy has advantages over the Bell-LaPadula  policy, the Biba  policy, and the lattice policy , in that it is more general than these, and encompasses properties of information networks as well as single processors. Furthermore, the flow model relates more easily to most real world situations than its predecessors. We have demonstrated methods by which collusions of multiple attackers can be limited in their damage , administrative methods by which such a network can be properly managed , methods for implementing distributed administration and management of such systems , a physical device for implementing these systems with provable implementation of policies , and have implemented an actual system which has these properties .
In a system with unlimited information paths, limited transitivity may have an effect if users don't use all available paths, but since there is always a direct path between any two users, there is always the possibility of infection. As an example, in a system with transitivity limited to a distance of 1 it is 'safe' to share information with any user you 'trust' without having to worry about whether that user has incorrectly trusted another user . We have generalized this principle to arbitrary subsets of users and to arbitrary sequences of user actions . In general, this problem becomes as complex as precise maintenance of information flow which has been proven NP-complete . Furthermore, no practical use for such a system has yet been found.
Although isolationism and limited transitivity offer solutions to the infection problem, they are not ideal in the sense that widespread sharing is generally considered a valuable tool in computing. Furthermore, only isolationism can be precisely implemented in practice because tracing exact information flow requires NP-complete time, and maintaining records of the flow requires large amounts of space . This leaves us with imprecise techniques. The problem with imprecise techniques is that they tend to move systems towards isolationism because they use conservative estimates in order to prevent potential damage. When information has been unjustly deemed unreadable by a given user, the system becomes less usable for that user. This is a form of denial of services in that access to information that should be accessible is denied. Such a system always tends to make itself less and less usable for sharing until it either becomes completely isolationist or reaches a stability point where all estimates are precise. If such a stability point existed, we would have a precise system for that stability point. Since we know that any precise stability point besides isolationism requires the solution to an NP-complete problem, we know that any non NP-complete solution must tend towards isolationism [1,13].
The third option for absolute prevention is limited function. By removing Turing capability  from our application level software, we can substantially reduce the threat of viral attack, limiting it to a finite number of interpretation methods. Turing capability is sufficient for viral infection, but not necessary. Even with limited function, it may be possible to create viruses. Thus we must be careful to design limited function systems so as to be virus free. There is no simple way to do this, and the theory is not well enough developed to be able to automatically evaluate whether an environment can support viruses, but we think this can be done fairly easily by knowledgeable experts. Most real world users do not exploit the general purpose capabilities provided to them, and it would be a substantial advantage for the defender if limited function could be applied. Unfortunately, almost all modern software allows general purpose function, including most applications programs such as databases, spreadsheets, editors, mail systems, etc.
A fourth option for prevention is sound change control. Change control is used throughout the computing industry as a means to assure the propriety of changes in production environments. Even though the practice is widespread, it is rarely done effectively. Sound change control lead us to the problems of high cost, lack of flexibility, and slow rate of change, which can be prohibitive .
According to the few organizations that do sound change control, two change control experts are needed for each programmer making changes. In other words, sound change control triples programming costs. Less sound change control is less expensive, and less effective.
To be certain that change control is effective, it must be used for every programming change, including spread sheet, database, and editor macros, computer mail in mail systems with formatting, and many other 'programs' that we don't always think of as programs. In essence, sound change control can only be used where the production environment is limited in its function, or in cases where the production environment is only used for creating distributions, and not for processing.
Change controlled systems are also relatively slow to change. In a sound change controlled environment, if the production system fails due to a software error, we cannot make a quick fix to get it operating again. We have to put that fix through change control in the same fashion as any other change. This tends to take valuable time and may create more problems that it resolves.
DETECTION - An alternative to detection of viruses is more general change detection. Change detection is decidable, and is relatively easy to do. The problem then becomes one of differentiating legitimate changes from illegitimate ones. Since legitimacy is a function of intent, we must describe intent to the system in order for the system to make proper decisions. Intent is normally described either by user action, or by automated methods. A system to address automated decision making in such an environment has been devised and is beginning to receive widespread acceptance in real world applications . By checking information just before it is interpreted, we can detect all primary infection, prevent all secondary infection, and optimize time spent in checking [48,49]. Systems of this sort are called integrity shells, and are available for Unix and DOS based systems as of this writing. Finding an effective change detection method against purposeful attack is not as easy as finding one against random corruption, but substantial progress has been made in this area using cryptographic checksums [23,55].
CURE - Cure of viruses has been examined in some depth, and appears to present little hope in the general sense. The tail chasing problem indicates that the cure of viruses can only be performed while services are denied unless detection and cure are faster than infection . This is similar to illness in biological systems. In the real world, cure has often been far more difficult than detection because viruses get into the legitimate or illegitimate backup system.
Almost no known viruses are eradicated by unweaving the virus from the infected host, as this is quite difficult to do reliably even in simple cases, but there is one notable exception by a British manufacturer who has hard-coded virus removal for almost 50 known viruses. The predominant strategy is to abandon infected code and replace it with uninfected versions. Since we cannot detect viruses in general, we can never be certain that a previous version does not contain a progenitor of the detected infection, however, with most of the intentionally launched viruses detected to date, we believe that there is no subtle generation process leading to the detected version. We are by no means certain of this.
Just as the redundancy produced by self replication makes viruses so hard to eradicate, redundancy is the only hope for the cure of a virus (or other corruption) once detected. The only method in use today for providing this redundancy is backups. Off-line backups provide a slow and expensive, but effective way to restore the state of a system (but of course do not assure that the restored state is a sound one). On-line backups are used in situations where speed and availability are more important than space. Since on-line backups are usually subject to the same corruption as other on-line information, they are normally checked for integrity before use and are less reliable than off-line backups. A mix is usually most appropriate. In one integrity shell, on-line backups can be automatically restored for cure when a corruption is detected, thus making the whole process of detection and cure essentially transparent, with only a minor reduction in the time required to begin interpretation [50,53,54]. Graceful degradation is also provided when corruption is too severe for restoration if the rest of the system is still operable.
THE COMBINED DEFENSE - A combined defense is very sound against computer viruses, but has a cost that few current organizations are willing to pay. As a result, most organizations trade protection for time, space, or other costs to optimize their situation.
In the combined defense, we use flow control to limit sharing wherever feasible, change control and integrity shells to detect and correct corruption as quickly as possible, and backups to provide a disaster recovery capability. By exercising strong control over changes and testing backups to assure that they can be restored and meet the change control requirements of the original system, we provide a very strong protection environment against corruption.
REMAINING PROBLEMS - The problems that remain are mainly the granularity issue and the level of coverage issue. Granularity becomes a problem in covering large databases and similar entities that change very often or from numerous sources. The level of coverage deals with the problem of embedded interpreters . Does Basic provide coverage for Basic programs? Do databases cover their own records? These problems must be solved on an individual basis depending on the tradeoffs involved, but we do have tools and techniques to help solve them. Ultimately, we are faced with an engineering and management task involving these tradeoffs . This is almost always the case in information protection.
Legal Issues in Integrity Protection:
There are significant legal impediments to achieving integrity in computer systems. Historically, the basic premise of intellectual property rights is that in exchange for providing the public with access to intellectual property, recognition of authorship and exclusive rights to license are guaranteed. Recent changes in US copyright law, have made the copyright of software without a public record of the source code permissible. This makes it infeasible for other authors to identify violations of their rights, and effectively prevents the buyer from independently assuring the integrity of software. The provider's rights to intellectual property are protected, but the society does not benefit from the advances of its individuals.
Another legal impediment to integrity is the exclusion of software from the implied warranties of sale (i.e. suitability for a particular purpose, and merchantability) which is usually associated with products unless explicitly waived. The fact that most purchased software explicitly waives any warranty of any kind, implies that the author refuses to take responsibility for the contents. When combined with the special copyright provisions granted to software, this provides a situation in which the provider of a service need not grant access to assure the integrity of the provided service, need not assure the consumer of the suitability of that service, need not reveal the techniques used in that service to others wishing to augment it, and need not take any responsibility for any ill effects of that service . This is a sure recipe for disaster.
Although a significant number of information related laws have arisen over the last several years, simply writing and launching a virus attack may not be illegal. Sufficient notice is required on computer systems in order to get civil judgements or criminal indictments in some cases. To the extent that a virus spreads passively, it's spread may not be the legal responsibility of its creator. Some viruses are quite useful, and viral evolution is as powerful as Turing machine computation as a computing tool , so their complete elimination may be a misguided venture. The legal situation is certainly complex, and has not been adequately addresses either in individual countries, or in the international community.
Social Issues in Integrity Maintenance:
THE OPEN RESEARCH COMMUNITY - The research community as a whole, and the 'open' research community in the US in particular has been irresponsible in its blatant disregard for information protection. This appears to come from a widely held misconception that protection means government control and limiting the open exchange of information. In fact, the opposite is true. The purpose of protection is to assure that the right information gets to the right place at the right time. Fulfilling this purpose is central to the proper operation of our research community as well as our society as a whole.
The closed minded view of the open research community falls apart even in a cursory examination. If you ask the members of this community whether they would mind other researchers being able to examine their pre-publication papers without their knowledge or permission, they say that privacy is central to the legitimacy of the research process . The 60,000 or so computer users who's files were searched to track down the Internet attack apparently did not have a reasonable expectation of privacy. The same process has been used to identify 'subversive' viewpoints. It was also used to look for offensive words (e.g. 'virus') and deny access to users who used them .
This closed minded attitude by the open community is self propagating in the educational system (a social virus ) . The open community has few refereed journals on protection issues, so protection researchers cannot move up the ladder in the educational community as well as those in other fields . The US National Science Foundation reviews of protection research proposals have no designated place to be sent, and are often routed to inappropriate reviewers who do not know the subject at all. NSF proposals in protection are legally required to be sent through the National Security Agency for review, unlike other research proposals. Computer virus researchers have a very hard time finding environments where legitimate experiments are allowed, and several early feasibility studies were shut down without a reason being given . Protection researchers are often accused of launching attacks whenever a system fails, but historically, without protection research, systems would fail far more often. Almost no legitimate research environments are available in the open community for studying attacks and defenses. The Internet attack of 1988 would almost certainly never have happened if a legitimate research facility were available to its author.
If you ask the open research community whether others should be allowed to change their stored information without permission, they say no. If you ask whether attackers should be able to cause their computers to stop processing, they say no. If you ask whether others should be privy to their trade secrets without paying consulting fees, they say no. The examples go on and on, but if you ask whether research in information protection is appropriate in their environment, they virtually all answer that it is not. Timesharing cannot work without protection being addressed, information theory came from protection research, protection issues dominate parallel processing architecture, and the whole field of fault tolerant computing is based on protection from random faults . The open community is in fact quite closed to certain ideas, and this has had a devastating effect on integrity in our information systems.
PUBLIC PERCEPTION - The general public has historically had the perception that computers are perfect, and that perception has been promulgated to a large extent by the computing community. If a computer says 1+1=3, we tend to believe it, but if a person made the same error, it would be intolerable. The early IBM PCs had such a flaw in an arithmetic operation, but were trusted nonetheless. Once the flaw was detected, a software fix was provided, but nobody examined the side effects of this problem, and any resulting corruptions will likely never be corrected.
Informal surveys indicate that about 50% of all telephone bills have at least one error per year . Fraudulent use of computer based financial systems cost billions of dollars per year. Minor bank transaction errors are commonplace. According to a major world insurance broker , over 4% of the GNP of most industrialized nations is lost each year due to protection problems in computers. Over 5% of the corporate computing budget in most major firms is spent on information protection, while less than 0.01% of the time spent in computer education is on that topic . Each of these are integrity problems that substantially harm the public at large and that are not widely known.
LACK OF ADEQUATE TOOLS - Protection ignorance is widespread, and the lack of adequate research support has resulted in inadequate tools for maintaining protection even when protection capabilities are available. According to recent surveys , over 80% of the protection systems in industry and over 90% of the protection systems in government are not used properly. Deeper investigation shows that the tools provided to a typical secretary for word processing are far more sophisticated and user friendly than the tools provided to a systems administrator who is responsible for maintaining protection for 100,000 users .
Even with the best tools, we will have major problems until we learn how to use them. Protection is something that you do, not something that you buy. You cannot keep your house dry forever by buying the best available roof. You must maintain that roof, or it will ultimately leak. The same is true in information protection . Without a deep understanding of the issues and ongoing effort, we effectively guarantee ongoing corruption, violations of privacy, denial of services, and lack of accountability.
We must change as a society to demand more integrity from our automated systems and we must change as a technical community to provide it. The field of information protection must no longer be treated as a trivial and nontechnical field by the open research community. In the field of information protection, ignorance is not bliss, it is suicide.
Summary, Conclusions, and Further Work:
Viral attacks are easy to develop in a very short time, leave few if any traces in most current systems, and are effective against modern security policies for multilevel usage. Their potential for harm is severe, and they can spread very quickly through a computer system or network. They present a widespread and immediate threat to current systems and networks.
The proper use of recently developed techniques can dramatically reduce the risks of computer virus damage, and tools exist for exploiting these results. Although the technical aspects of viral defense have some hope of prevailing, the social problems of integrity protection seem much more difficult to resolve. The present situation is a recipe for disaster, but the social climate is slowly changing as a result of recent events. Viruses and other corruption mechanisms will continue to take their toll until integrity is viewed as the major requirement in information systems. It is only by making integrity a social priority, that we can hope to eradicate the corruptions that are running rampant throughout modern information systems.
 F. Cohen, "Computer Viruses - Theory and Experiments", DOD/NBS 7th Conference on Computer Security, originally appearing in IFIP-sec 84, also appearing in IFIP-TC11 "Computers and Security", V6(1987), pp22-35 and other publications in several languages
 J. P. Anderson, "Computer Security Technology Planning Study", USAF Electronic Systems Division, #ESD-TR-73-51, Oct 1972, (Cited in Denning)
 R. R. Linde, "Operating System Penetration", AIFIPS National Computer Conference, pp 361-368, 1975
 D. E. Bell and L. J. LaPadula, "Secure Computer Systems: Mathematical Foundations and Model", The Mitre Corporation, 1973 (cited in many papers)
 D. E. Denning, "Cryptography and Data Security", Addison Wesley, 1982
 E. J. McCauley and P. J. Drongowski, "KSOS - The Design of a Secure Operating System", AIFIPS National Computer Conference, pp 345-353, 1979
 G.J. Popek, M. Kampe, C.S. Kline, A. Stoughton, M. Urban, and E.J. Walton, "UCLA Secure Unix", AIFIPS, National Computer Conference 1979, pp355-364
 B. D. Gold, R. R. Linde, R. J. Peeler, M. Schaefer, J.F. Scheid, and P.D. Ward, "A Security Retrofit of VM/370", AIFIPS National Computer Conference, pp335-344, 1979
 C. E. Landwehr, "The Best Available Technologies for Computer Security", IEEE Computer, V16#7, July, 1983
 B. W. Lampson, "A note on the Confinement Problem", Communications of the ACM V16(10) pp613-615, Oct, 1973
 K. J. Biba, "Integrity Considerations for Secure Computer Systems", USAF Electronic Systems Division (cited in Denning), 1977
.bp  K. Thompson, "Reflections on Trusting Trust", Turing award lecture, 1984, CACM, Aug, 1984
 F. Cohen, "Computer Viruses", PhD Dissertation, University of Southern California, 1986, ASP Press (PO Box 81270, Pittsburgh, PA 15217 USA)
 A. Dewdney, "Computer Recreations", Scientific American, 1986
 F. Cohen, "Computer Security Methods and Systems", 1984 Conference on Information Systems and Science, Princeton University, 1984
 M. Pozzo and T. Gray, "Managing Exposure to Potentially Malicious Programs", Proceedings of the 9th National Computer Security Conference, Sept. 1986
 F. Cohen, "A Secure Computer Network Design", IFIP-TC11 Computers and Security, V4#3, Sept. 1985, pp 189-205, also appearing in AFCEA Symp. and Expo. on Physical and Electronic Security, Aug. 1985
 F. Cohen, "Protection and Administration of Information Networks Under Partial Orderings", IFIP-TC11 Computers and Security, V6(1987) pp118-128
 F. Cohen, "Design and Administration of Distributed and Hierarchical Information Networks Under Partial Orderings", IFIP-TC11 Computers and Security, V6(1987), 15 pages
 M. Pozzo and T. Gray, "Computer Virus Containment in Untrusted Computing Environments", IFIP/SEC 4th International Conference on Computers and Security, Dec. 1986
 F. Cohen, "Design and Administration of an Information Network Under a Partial Ordering - A Case Study", IFIP-TC11 Computers and Security, V6(1987) pp332-338
 F. Cohen, "Designing Provably Correct Information Networks with Digital Diodes", IFIP-TC11 Computers and Security, 1988
 F. Cohen, "A Cryptographic Checksum for Integrity Protection in Untrusted Computer Systems", IFIP-TC11 Computers and Security, V6(1987).
 F. Cohen, "Two Secure Network File Servers", IFIP-TC11, Computers and Security, 1987
 M. Pozzo and T. Gray, "An Approach to Containing Computer Viruses", IFIP-TC11 Computers and Security, 1987
 B. Cohen and F. Cohen, "Error Prevention at a Radon Measurement Service Laboratory", Radiation Protection Management, V6#1, pp43-47, Jan. 1989
 F. Cohen, "A Complexity Based Integrity Maintenance Mechanism", Conference on Information Sciences and Systems, Princeton University, March 1986
 F. Cohen, "Recent Results in Computer Viruses", Conference on Information Sciences and Systems, Johns Hopkins University, March 1985
 F. Cohen, "Maintaining a Poor Person's Integrity", IFIP-TC11 Computers and Security, 1987
 W. Murray, "The Application of Epidemiology to Computer Viruses", Computers and Security, Computers and Security, 1989.
 H. Highland - ED, Special Issue of "Computers and Security", April, 1988, IFIP TC-11
 V. McLellan, "Computer Systems Under Seige", The New York Times, Sunday, Jan. 31, 1988.
 J F Shoch and J A Hupp, "The 'Worm' Programs - Early Experience with a Distributed Computation", CACM pp172-180, March, 1982
 A. Dewdney, "Computer Recreations", Scientific American, V250#5, pp14-22, May, 1984
 J.B. Gunn, "Use of Virus Functions to Provide a Virtual APL Interpreter Under User Control", CACM, pp163-168, July, 1974
 L. J. Hoffman, "Impacts of information system vulnerabilities on society", AIFIPS National Computer Conference, pp461-467, 1982
 Kaplan, [U.S. Dept. of Justice, Bureau of Justice Statistics] "Computer Crime - Computer Security Techniques", U.S. Government Printing Office, Washington, DC, 1982
 M. H. Klein, "Department of Defense Trusted Computer System Evaluation Criteria", Department of Defense Computer Security Center, Fort Meade, Md. 20755, 1983 DOD-CSC-84-001
 A. Turing, "On Computable Numbers, with an Application to the Entscheidungsproblem", London Math Soc Ser 2, 1936
 S. Yau and R. Cheung, "Design of Self Checking Software", Conference on Reliable Software, IEEE, 1975, pp450-457
 J. Kelly and A. Avizienis, "A Specification Oriented Multi-Version Software Experiment", IEEE Symposium on Fault Tolerant Computing pp120-126, 1983
 R. Scott, J. Gault, D. McAllister, and J. Wiggs, "Experimental Validation of Six Fault Tolerant Software Reliability Models", IEEE Symposium on Fault Tolerant Computing, pp102-107, 1984
 L. Chen and A. Avizienis, "N-version programming: a fault tolerance approach to reliability of software operation", FTCS-8, pp 3-9, June, 1978
 L. Chen, "Improving Software Reliability by N-version Programming", UCLA Computer Science Dept, UCLA-ENG-7843, 1978
 Randell, "System Structure for Software Fault Tolerance", IEEE Transactions on Software Engineering, June 1975, pp220-223, Vol.SE-1
 M. Harrison, W. Ruzzo, and J. Ullman, "Protection in Operating Systems", CACM V19#8, Aug 1976, pp461-471
 AT+T, "The Unix System Programmer's Reference Manual"
 M. Cohen, "A New Integrity Based Model for Limited Protection Against Computer Viruses", Masters Thesis, The Pensylvania State University, College Park, PA 1988.
 F. Cohen, "Models of Practical Defenses Against Computer Viruses", IFIP-TC11, "Computers and Security", V7#6, December, 1988.
 F. Cohen, "Automated Integrity Maintenance for Viral Defense", IFIP-TC11, "Computers and Security", 1990
 DPMA 2nd annual computer virus symposium, New York, NY, 1989.
 S. Jones and C. White Jr., "The IPM Model of Computer Virus Management", IFIP-TC11, (submitted 1989).
 F. Cohen, "The ASP 3.0 Technical Users Manual", ASP Press, 1990 (PO Box 81270, Pittsburgh PA 15217, USA)
 F. Cohen, "ASP 3.0 - The Integrity Shell", Information Protection, V1#1, January 1990, ASP Press, 1990 (PO Box 81270, Pittsburgh PA 15217, USA)
 Y. J. Huang and F. Cohen, "Some Weak Points of One Fast Cryptographic Checksum Algorithm and its Improvement", IFIP-TC11 "Computers and Security", V8#1, February, 1989
 M. Prew, "Minimizing the Impact of Computer Crime on your Earnings", Wigham Poland (Corporation of Lloyds), 1984
 H. Highland, "Computer Virus Handbook", Elsevier, 1990.
 S. White, "A Status Report on IBM Computer Virus Research", Italian Computer Virus Conference, 1990.
 L. Adleman, "An Abstract Theory of Computer Viruses", "Lecture Notes in Computer Science", V403, Advances in Computing - Proceedings of Crypto-88, S. Goldwasser, Ed., Springer-Verlag, 1990
 F. Cohen, "A Short Course on Computer Viruses", ASP Press, 1990 (PO Box 81270, Pgh PA 15217, USA)
 E. Spafford, "Crisis and Aftermath", CACM, V32#6, June, 1989.
 J. Rochlis and M. Eichin, "With Microscope and Tweezers: The Worm from MIT's Perspective", CACM, V32#6, June, 1989.
 Newspaper and personal accounts of the IBM computer virus attack of 1987.
 F. Cohen, "Some Simple Advances in Protection Tools", Information Protection, V1#5-9, June-Oct 1990, ASP Press (PO Box 81270, Pittsburgh PA 15217, USA)
 W. Gleissner, "A Mathematical Theory for the Spread of Computer Viruses", "Computers and Security", IFIP TC-11, V8#1, Jan. 1989 pp35-41.
 Tipet, "The Tipet Theory of Computer Virus Propagation", Foundationware, USA.
 C. Shannon, "A Mathematical Theory od Communications", Bell Systems Technical Journal, 1949.
 F. Cohen, "Current Best Practice Against PC Integrity Corruption", Information Protection, V1#1, January 1990, ASP Press (PO Box 81270, Pittsburgh PA 15217)
 H. Pilewski and F. Cohen, "Copyright Laws", Information Protection, V1#9, Aug. 1990, ASP Press, (PO Box 81270, Pittsburgh PA 15217)
 F. Cohen, "Computer Viruses", chapter in "Computers Under Attack", ACM/Addison Wesley (1990)
 F. Cohen, "The Impact of Information Protection on Computer Engineering", Information Protection, V1#4, April, 1990, ASP Press, (PO Box 81270, Pittsburgh PA 15217)
 F. Cohen, "How To Do Sound Change Control and What It Costs", Information Protection, V1#6, June, 1990, ASP Press, (PO Box 81270, Pittsburgh PA 15217)
 F. Cohen, "A German Survey on Computer Security", Information Protection, V1N8, August, 1990.
 H. Gliss, "The Security of Information Resources - Results of a Survey", The Oxbridge Sessions, Holland, 1990.