In the wake of an incident, several actions should take place. These actions can be summarized as follows:
All four steps should provide feedback to the site security policy committee, leading to prompt re-evaluation and amendment of the current policy.
Removing all vulnerabilities once an incident has occurred is difficult. The key to removing vulnerabilities is knowledge and understanding of the breach. In some cases, it is prudent to remove all access or functionality as soon as possible, and then restore normal operation in limited stages. Bear in mind that removing all access while an incident is in progress will obviously notify all users, including the alleged problem users, that the administrators are aware of a problem; this may have a deleterious effect on an investigation. However, allowing an incident to continue may also open the likelihood of greater damage, loss, aggravation, or liability (civil or criminal).
If it is determined that the breach occurred due to a flaw in the systems' hardware or software, the vendor (or supplier) and the CERT should be notified as soon as possible. Including relevant telephone numbers (also electronic mail addresses and fax numbers) in the site security policy is strongly recommended. To aid prompt acknowledgment and understanding of the problem, the flaw should be described in as much detail as possible, including details about how to exploit the flaw.
As soon as the breach has occurred, the entire system and all its components should be considered suspect. System software is the most probable target. Preparation is key to recovering from a possibly tainted system. This includes checksumming all tapes from the vendor using a checksum algorithm which (hopefully) is resistant to tampering . (See sections 184.108.40.206, 220.127.116.11.) Assuming original vendor distribution tapes are available, an analysis of all system files should commence, and any irregularities should be noted and referred to all parties involved in handling the incident. It can be very difficult, in some cases, to decide which backup tapes to recover from; consider that the incident may have continued for months or years before discovery, and that the suspect may be an employee of the site, or otherwise have intimate knowledge or access to the systems. In all cases, the pre-incident preparation will determine what recovery is possible. At worst-case, restoration from the original manufactures' media and a re-installation of the systems will be the most prudent solution.
Review the lessons learned from the incident and always update the policy and procedures to reflect changes necessitated by the incident.
Before cleanup can begin, the actual system damage must be discerned. This can be quite time consuming, but should lead into some of the insight as to the nature of the incident, and aid investigation and prosecution. It is best to compare previous backups or original tapes when possible; advance preparation is the key. If the system supports centralized logging (most do), go back over the logs and look for abnormalities. If process accounting and connect time accounting is enabled, look for patterns of system usage. To a lesser extent, disk usage may shed light on the incident. Accounting can provide much helpful information in an analysis of an incident and subsequent prosecution.
Once the damage has been assessed, it is necessary to develop a plan for system cleanup. In general, bringing up services in the order of demand to allow a minimum of user inconvenience is the best practice. Understand that the proper recovery procedures for the system are extremely important and should be specific to the site.
It may be necessary to go back to the original distributed tapes and recustomize the system. To facilitate this worst case scenario, a record of the original systems setup and each customization change should be kept current with each change to the system.
Once you believe that a system has been restored to a "safe" state, it is still possible that holes and even traps could be lurking in the system. In the follow-up stage, the system should be monitored for items that may have been missed during the cleanup stage. It would be prudent to utilize some of the tools mentioned in section 18.104.22.168 (e.g., COPS) as a start. Remember, these tools don't replace continual system monitoring and good systems administration procedures.
As discussed in section 5.6, a security log can be most valuable during this phase of removing vulnerabilities. There are two considerations here; the first is to keep logs of the procedures that have been used to make the system secure again. This should include command procedures (e.g., shell scripts) that can be run on a periodic basis to recheck the security. Second, keep logs of important system events. These can be referenced when trying to determine the extent of the damage of a given incident.
After an incident, it is prudent to write a report describing the incident, method of discovery, correction procedure, monitoring procedure, and a summary of lesson learned. This will aid in the clear understanding of the problem. Remember, it is difficult to learn from an incident if you don't understand the source.
Security is a dynamic, not static process. Sites are dependent on the nature of security available at each site, and the array of devices and methods that will help promote security. Keeping up with the security area of the computer industry and their methods will assure a security manager of taking advantage of the latest technology.
Keep an on site collection of books, lists, information sources, etc., as guides and references for securing the system. Keep this collection up to date. Remember, as systems change, so do security methods and problems.
Form a subgroup of system administration personnel that will be the core security staff. This will allow discussions of security problems and multiple views of the site's security issues. This subgroup can also act to develop the site security policy and make suggested changes as necessary to ensure site security.
If an incident is based on poor policy, and unless the policy is changed, then one is doomed to repeat the past. Once a site has recovered from and incident, site policy and procedures should be reviewed to encompass changes to prevent similar incidents. Even without an incident, it would be prudent to review policies and procedures on a regular basis. Reviews are imperative due to today's changing computing environments.
A problem reporting procedure should be implemented to describe, in detail, the incident and the solutions to the incident. Each incident should be reviewed by the site security subgroup to allow understanding of the incident with possible suggestions to the site policy and procedures.