Fri Apr 8 06:49:41 PDT 2016
Management: Testing: What should the protection testing function do and cover?
Option 1: Protection testing provides verification that protection does what it is supposed to do.
Option 2: Fault models are used to generate and evaluate tests.
Option 3: Coverage of tests is measured against the fault model.
Option 4: Testing periods are based on system risk levels.
Option 5: Systems containing authoritative high-valued content are NOT tested during operational periods.
The following table indicates the advised approaches for each risk level.
Coverage of the protection testing program
| Issue | High risk | Medium risk | Low risk |
|---|---|---|---|
| Protection testing provides verification that protection does what it is supposed to do. | Yes | Yes | No |
| Fault models are used to generate and evaluate tests. | Yes | Yes | No |
| Coverage of tests is measured against the fault model. | Yes | Yes | No |
| Testing periods are based on system risk levels. | Yes | Yes | Yes |
| Systems containing authoritative high-valued content are NOT tested during operational periods. | Yes | No | No |
Protection testing provides verification that protection does what it is supposed to do.
Protection testing has the objective of matching the defined
goals of the controls in place with the reality of the controls in
place. As such, its purpose is not to determine whether the program is
what it should be, but rather to try to refute the assertion that the
protection program does what it claims to do. The other approaches to
"protection testing" are not in fact testing at all. They are
typically something like known vulnerability scanning, verification,
and so forth. All are of value in their own right, but they are often
mislabeled as testing.
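As a hedged illustration of testing-as-refutation (all names here are invented, not from the source), a test set for an access control should consist of attempts to make the control violate its stated claim, rather than demonstrations that it works on friendly inputs:

```python
# Hypothetical sketch: protection testing as attempted refutation.
# "check_access" is an invented stand-in for a real control; each test
# case tries to refute the claim "requests without a valid token are denied".

VALID_TOKENS = {"tok-123"}

def check_access(token):
    """The control under test: claims to deny any request lacking a valid token."""
    return token in VALID_TOKENS

def refutation_attempts():
    """Inputs that should all be denied; any that succeed refute the claim."""
    attempts = [None, "", "tok-999", "TOK-123", "tok-123 "]
    return [t for t in attempts if check_access(t)]  # non-empty => claim refuted

if __name__ == "__main__":
    failures = refutation_attempts()
    if failures:
        print("claim refuted by:", failures)
    else:
        print("claim not refuted")
```

Note that a run with no failures does not prove the control correct; it only means these particular refutation attempts failed.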
Fault models are used to generate and evaluate tests.
Fault models are developed to create the basis for identifying
the difference between a desired and undesired test outcome and to
identify the class of faults that tests might be able to
uncover. Without a fault model, testing is shooting in the dark
without a clear target. With a fault model, it is possible to
determine whether the tests are meaningful or redundant, and to
what extent they provide "coverage".
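A minimal sketch of this idea, with invented fault classes and checks (not from the source): the fault model enumerates what tests should be able to uncover, and each test exists to expose one modeled fault class, so gaps and aimless tests become visible:

```python
# Illustrative sketch only; fault classes and test names are invented.
# The fault model is the set of faults the tests should be able to uncover.

FAULT_MODEL = {
    "default-allow on malformed input",
    "expired credential accepted",
    "denied action not logged",
}

# Tests derived from the model: test description -> fault class it targets.
TESTS = {
    "send malformed request, expect denial": "default-allow on malformed input",
    "present expired credential, expect denial": "expired credential accepted",
}

def evaluate(tests, fault_model):
    """Flag modeled faults no test targets, and tests that target no modeled fault."""
    targeted = set(tests.values())
    return {
        "untargeted_faults": fault_model - targeted,  # gaps in the test set
        "aimless_tests": targeted - fault_model,      # tests with no clear target
    }

result = evaluate(TESTS, FAULT_MODEL)
print(result["untargeted_faults"])  # the modeled fault no test addresses
```

Without the explicit model, the missing logging test above would simply never come up.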
Coverage of tests is measured against the fault model.
Coverage is a measurement against the fault model used to express
the percentage of modeled faults that the tests would detect if
present, or confirm absent if not present. As such, it
allows the tester to gain and provide clarity around the diagnostic
utility of the tests for determining that the controls are in fact
working as desired.
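The arithmetic behind this is straightforward; the following sketch (with invented fault identifiers) shows why faults detectable by the tests but outside the model contribute nothing to coverage:

```python
# Hedged sketch of coverage measured against a fault model.
# Fault identifiers are invented for illustration.

def coverage(fault_model, detectable_faults):
    """Percentage of modeled faults the test set can detect or rule out."""
    covered = fault_model & detectable_faults  # only modeled faults count
    return 100.0 * len(covered) / len(fault_model)

model = {"f1", "f2", "f3", "f4"}
detectable = {"f1", "f3", "f999"}  # f999 is outside the model, so it is ignored

print(coverage(model, detectable))  # 50.0
```

The same computation makes redundancy visible: two tests that detect only `f1` add nothing to coverage over either alone.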
Testing periods are based on system risk levels.
The time taken to perform a test depends on the coverage of the
test, the size of the test set, and the time per test. Since complete
coverage of most fault models in most cases takes a very long time,
periodicity of testing is traded off with coverage and test
complexity. The tradeoff is inherently limited by the risk of the
control failing without that failure being noticed. Hence, the
periodicity of the test process is driven by the exposure from
undetected control failure, which in turn limits the achievable
coverage of the fault model and the time available for tests.
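This tradeoff can be sketched numerically; the figures below are invented for illustration, not from the source. The risk level bounds how long a control failure may go unnoticed, which bounds the test period, which at a fixed time per test bounds how many tests fit, and hence the affordable coverage:

```python
# Hedged sketch of the periodicity/coverage tradeoff; all numbers invented.

def affordable_coverage(max_undetected_hours, hours_per_test, faults_in_model):
    """Coverage achievable when the full test pass must fit in the window
    during which an undetected control failure is tolerable (assuming,
    for simplicity, one test per modeled fault)."""
    tests_that_fit = int(max_undetected_hours // hours_per_test)
    covered = min(tests_that_fit, faults_in_model)
    return 100.0 * covered / faults_in_model

# High risk: failures must be caught within 24 hours -> partial coverage.
print(affordable_coverage(24, 0.5, 200))   # 48 of 200 faults: 24.0

# Low risk: 30 days (720 hours) is tolerable -> full coverage fits.
print(affordable_coverage(720, 0.5, 200))  # 100.0
```

In practice one would also prioritize which faults the affordable tests cover, but the shape of the constraint is the same.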
Systems containing authoritative high-valued
content are NOT tested during operational periods.
Because systems with high consequences of failure can fail as a result of a test,
testing is often limited to test systems that are as close as possible
to operational systems (for validity) or limited to testing during non
usage periods such as maintenance windows (when the consequences
cannot be induced). It is also important that after testing the unit
under test be put back into its proper operating (i.e., original)
condition and that such condition be properly verified before going
operational. Otherwise, residual effects of the test may produce
potentially serious negative consequences.
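One way to verify restoration to the original condition is to fingerprint the unit's configuration before testing and compare after restoration; this is a minimal sketch of an assumed workflow, not a procedure from the source:

```python
# Minimal sketch (assumed workflow): fingerprint configuration state before
# testing, then refuse to return to operation unless the restored state
# matches. Configuration keys and values are invented for illustration.

import hashlib
import json

def fingerprint(config):
    """Stable SHA-256 hash of a configuration dictionary."""
    blob = json.dumps(config, sort_keys=True).encode()
    return hashlib.sha256(blob).hexdigest()

original = {"firewall": "enabled", "audit": "on", "mode": "production"}
before = fingerprint(original)

# ... tests run here and may alter state ...

restored = {"firewall": "enabled", "audit": "on", "mode": "production"}
assert fingerprint(restored) == before, "do not go operational: state not restored"
```

A real deployment would fingerprint more than a dictionary (files, rules, firmware), but the verify-before-operational step is the point.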
Here is an interesting paradox (restated from Randy Pratt's
original version) of protection testing live systems with potentially
serious negative consequences:
- The purpose of a test is to determine if the system is "safe" or
"unsafe" under the test conditions.
- To authorize the test on a live system, we have to assume the
test is "safe". Otherwise it is "unsafe" because we may induce the
potentially serious negative consequences.
- If we assume it is "safe", the test has no value. If we assume it
is "unsafe", we cannot perform the test. If we don't know whether it
is "safe" or "unsafe", we cannot authorize the test.
Copyright(c) Fred Cohen, 1988-2015 - All Rights Reserved