All.Net

Fri Apr 8 06:47:17 PDT 2016

Redundancy: Fault model: What fault model is assumed for analysis of redundancy?

Options:

Option 1: No fault model analysis
Option 2: Single point of failure analysis
Option 3: Multiple-failure modes analysis
Option 4: Common mode failure analysis
Option 5: Cascade failures and interdependency analysis
Option 6: Fault and failure graph analysis

Option A: Hidden faults are accounted for
Option B: Within faults are considered
Option C: Independent faults are considered
Option D: Limited interdependent faults are considered
Option E: All interdependent faults are considered

Basis:

Fault models are assumptions that form the basis for analysis and decisions regarding the use of redundancy in systems. As such, they lie at the heart of any analytical process involving the need for and use of redundancy.

No fault model:
In many cases, no fault model is used for analytical purposes, leading to an approach in which judgment and estimates are made without detailed analysis. In low consequence situations where the analysis may be more costly than the consequences of faults, or where system failures are not important enough to justify fault analysis, this is reasonable, although it leads to less reliable operational systems.

Single point of failure only analysis:
Most fault analysis is based on identification and selective elimination of single points of failure, depending on the cost of mitigation and the consequence of the failure. Methods like fault tree analysis are used and fault assumptions like "stuck-at", "bridging" and/or "transient" faults are made for analytical purposes.
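
As a minimal sketch of single point of failure identification, the following fails one component at a time and checks whether the system still works. The component names and the system_works function describe a hypothetical system (two redundant servers behind one switch and one power feed); they are assumptions for illustration, not part of any standard method.

```python
def single_points_of_failure(components, system_works):
    """Return the components whose lone failure brings the system down."""
    spofs = []
    for candidate in components:
        # Everything is up except the one component under test.
        state = {name: True for name in components}
        state[candidate] = False
        if not system_works(state):
            spofs.append(candidate)
    return spofs


# Hypothetical system, assumed for illustration only.
COMPONENTS = ["server_a", "server_b", "switch", "power_feed"]


def example_system_works(up):
    # Either server can carry the load, but both need the switch and the feed.
    return (up["server_a"] or up["server_b"]) and up["switch"] and up["power_feed"]


SPOFS = single_points_of_failure(COMPONENTS, example_system_works)
print(SPOFS)  # ['switch', 'power_feed']
```

The servers do not appear in the result because each covers for the other. Whether to mitigate the switch and the power feed then depends on the cost of mitigation and the consequence of the failure.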

Multiple-failure modes analysis:
Analysis of and compensation for multiple failures is essentially never done for a complete system. However, multiple failure modes are analyzed for some medium and many high consequence subsystems, such as select control systems on aircraft. This sort of analysis is usually limited to specific fault assumptions for specific subsystems and specific classes of common-mode failures.
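
One way to picture multiple-failure analysis for a single subsystem is to enumerate combinations of simultaneous faults up to a chosen size and keep the minimal combinations that bring the subsystem down. The sketch below does this for a hypothetical control subsystem with three voting channels and two pumps; the names, the 2-of-3 voting rule, and the max_faults limit are all assumptions made for illustration.

```python
from itertools import combinations


def minimal_failure_sets(components, system_works, max_faults):
    """Return minimal sets of simultaneous faults that cause system failure."""
    found = []
    for size in range(1, max_faults + 1):
        for combo in combinations(components, size):
            failed = set(combo)
            # Skip supersets of a failure set that was already recorded.
            if any(previous <= failed for previous in found):
                continue
            state = {name: name not in failed for name in components}
            if not system_works(state):
                found.append(failed)
    return found


# Hypothetical subsystem, assumed for illustration only.
COMPONENTS = ["chan_1", "chan_2", "chan_3", "pump_1", "pump_2"]


def control_works(up):
    channels_up = sum(up[c] for c in ("chan_1", "chan_2", "chan_3"))
    # 2-of-3 channel voting, plus at least one working pump.
    return channels_up >= 2 and (up["pump_1"] or up["pump_2"])


FAILURE_SETS = minimal_failure_sets(COMPONENTS, control_works, max_faults=2)
print(len(FAILURE_SETS))  # 4
```

The number of combinations to check grows rapidly with the number of components and the number of simultaneous faults allowed, which is likely one reason this kind of analysis is confined to selected subsystems.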

Common mode failure analysis:
Common mode failures occur when some commonality between otherwise unrelated components is exercised such that it causes these otherwise unrelated components to fail simultaneously. For example, a fiber optic cable in proximity to a copper cable may experience a common mode failure when the same backhoe cuts both of them. The unlimited nature of potential commonality between all sets of things makes it infeasible to anticipate and protect against all common mode failures, but many such failures may be easily avoided once identified. For example, redundant communications cables should not run through common wire runs and should be separated by some distance associated with the size of holes dug by backhoes.
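
Once candidate commonalities have been named, checking a redundancy group for them is mechanical. The following sketch assumes each redundant component is described by a small set of attributes (the attribute names and values here, such as route and trench_7, are hypothetical) and reports any attribute value shared by two or more members of the group.

```python
from collections import defaultdict


def shared_attributes(group):
    """Map each (attribute, value) pair to the members that share it.

    Only pairs shared by two or more members are returned, since those
    are the candidate common modes.
    """
    seen = defaultdict(list)
    for name, attributes in group.items():
        for attribute, value in attributes.items():
            seen[(attribute, value)].append(name)
    return {key: names for key, names in seen.items() if len(names) > 1}


# Hypothetical redundant links, assumed for illustration only.
REDUNDANT_LINKS = {
    "fiber_link": {"route": "trench_7", "carrier": "carrier_x"},
    "copper_link": {"route": "trench_7", "carrier": "carrier_y"},
}

COMMON_MODES = shared_attributes(REDUNDANT_LINKS)
print(COMMON_MODES)  # {('route', 'trench_7'): ['fiber_link', 'copper_link']}
```

A check like this only finds commonalities that someone thought to record as attributes, which reflects the point above that not all common mode failures can be anticipated.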

Cascade failures and interdependency analysis:
Analysis of and compensation for cascade failures is based on identification of interdependencies that may produce sequences of events in which dependent systems fail because of failures in the systems they depend upon. This is done recursively until it either reaches the underlying physics of the world or exhausts the willingness of the organization to consider further. Generally, analysis may include {internal and/or external} x {limited | comprehensive} x {recursive to level} interdependencies.
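
The recursive-to-level idea can be sketched as a walk over a dependency map. The example below assumes the simplest possible rule, that a system fails as soon as anything it depends on has failed (no redundancy), and uses a hypothetical dependency map; max_level stands in for the point at which the organization stops looking further.

```python
def cascade(depends_on, initial_failures, max_level=None):
    """Return {system: level} for every system reached by the cascade.

    Level 0 is the initial failure, level 1 is whatever depends directly
    on it, and so on. Assumes a system fails if any dependency fails.
    """
    level_of = {system: 0 for system in initial_failures}
    frontier = set(initial_failures)
    level = 0
    while frontier and (max_level is None or level < max_level):
        level += 1
        newly_failed = set()
        for system, dependencies in depends_on.items():
            if system not in level_of and any(d in frontier for d in dependencies):
                newly_failed.add(system)
        for system in newly_failed:
            level_of[system] = level
        frontier = newly_failed
    return level_of


# Hypothetical interdependencies, assumed for illustration only.
DEPENDS_ON = {
    "web_app": ["database", "dns"],
    "database": ["storage"],
    "storage": ["grid_power"],
    "dns": ["grid_power"],
}

FULL = cascade(DEPENDS_ON, {"grid_power"})
LIMITED = cascade(DEPENDS_ON, {"grid_power"}, max_level=1)
print(FULL)
print(LIMITED)
```

With no level limit the external power failure reaches the web application at level 2; limiting the analysis to one level stops at storage and DNS and misses it.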

Fault and failure graph analysis:
To the extent that more comprehensive understanding is desired, the generalization of fault modeling and analysis is to consider all event sequences with potentially serious negative consequences and to model the sequential system behavior in this context. "All sequences" includes all fault models associated with all components of the composite, along with identified numbers of {simultaneous / sequential} events with timing. Such analysis is, in general, too complex ever to be performed thoroughly; however, simulation methods are sometimes used to provide runs through the space at a defined level of granularity, particularly to compare architectural or design alternatives. This may be generalized to a more real-time view of model-based situation anticipation and constraint. This method, if properly undertaken, subsumes all of the other analysis techniques into the overall graph approach.
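
As a rough illustration of using simulation runs to compare design alternatives, the sketch below samples random fault states for two hypothetical designs and estimates how often each one fails. It is deliberately coarse: each run is a single snapshot with independent faults, so sequencing and timing are ignored, and the fault probabilities are invented for the example.

```python
import random


def estimate_failure_rate(system_works, fault_prob, runs, seed=0):
    """Estimate the fraction of sampled runs in which the system fails."""
    rng = random.Random(seed)  # fixed seed so the comparison is repeatable
    failures = 0
    for _ in range(runs):
        # A component is up in this run unless its fault is sampled.
        state = {name: rng.random() >= p for name, p in fault_prob.items()}
        if not system_works(state):
            failures += 1
    return failures / runs


# Two hypothetical designs, assumed for illustration only.
def single_link_works(up):
    return up["link_a"] and up["router"]


def dual_link_works(up):
    return (up["link_a"] or up["link_b"]) and up["router"]


SINGLE = estimate_failure_rate(
    single_link_works, {"link_a": 0.05, "router": 0.01}, runs=20000
)
DUAL = estimate_failure_rate(
    dual_link_works, {"link_a": 0.05, "link_b": 0.05, "router": 0.01}, runs=20000
)
print(SINGLE, DUAL)
```

The estimates are only as good as the fault model and granularity behind them, but even a coarse run like this shows the second link helping while the shared router continues to dominate the remaining failures.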

Hidden faults are accounted for:
Hidden faults are faults that are not normally exposed because redundancy covers them. They can lie undetected until a second fault occurs, at which point the system fails because the redundancy that was planned for is no longer in place. To account for them, it is usually necessary to expose them through testing or otherwise find ways to identify or mitigate them.
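
The following toy model shows how a fault in a standby unit stays hidden until it is deliberately exercised. The RedundantPair class and its method names are hypothetical, invented for this illustration.

```python
class RedundantPair:
    """Toy primary/standby pair in which standby faults are not observable."""

    def __init__(self):
        self.healthy = {"primary": True, "standby": True}
        self.known_faults = set()

    def inject_fault(self, unit):
        self.healthy[unit] = False
        # A primary fault is noticed at once because service switches over.
        # A standby fault changes nothing observable, so it stays hidden.
        if unit == "primary":
            self.known_faults.add(unit)

    def service_up(self):
        return any(self.healthy.values())

    def hidden_faults(self):
        faulty = {unit for unit, ok in self.healthy.items() if not ok}
        return faulty - self.known_faults

    def exercise_standby(self):
        """Force load onto the standby, exposing any latent fault in it."""
        if not self.healthy["standby"]:
            self.known_faults.add("standby")
        return self.healthy["standby"]


pair = RedundantPair()
pair.inject_fault("standby")
print(pair.service_up(), pair.hidden_faults())  # True {'standby'}
print(pair.exercise_standby())                  # False, the fault is now known
```

Without the exercise step, the first visible sign of the standby fault would be a full outage when the primary later fails.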

Within faults are considered:
Faults within the area being reviewed are in scope. For example, if an enterprise is being considered, systems and mechanisms within the enterprise are within this scope.

Independent faults are considered:
Independent and seemingly unrelated faults may combine to cause failures. Analysis in this case should consider independent faults occurring together by pure coincidence, such as simultaneous power failures of two completely unrelated systems with no link between their power sources or mechanisms.
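
For faults that really are independent, the chance of a coincidence in a given period is the product of the individual probabilities. The short calculation below uses an invented figure of a 0.001 chance of power failure per hour for each of two systems; the numbers are assumptions chosen only to show the arithmetic.

```python
from math import prod


def coincidence_probability(per_period_fault_probs):
    """Probability that all of the independent faults occur in one period."""
    return prod(per_period_fault_probs)


# Hypothetical per-hour fault probabilities for two unrelated systems.
P_BOTH = coincidence_probability([0.001, 0.001])

# Expected number of coinciding hours over ten years of operation.
HOURS_PER_DECADE = 24 * 365 * 10
EXPECTED_PER_DECADE = P_BOTH * HOURS_PER_DECADE
print(P_BOTH, EXPECTED_PER_DECADE)  # about 1e-06 and about 0.09
```

The product rule holds only if the faults are truly independent; any shared cause puts the case under common mode analysis instead.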

Limited interdependent faults are considered:
Interdependent faults, such as the cascade failures identified above, are related but typically not all within the specific scope of the review. For example, if an enterprise is being reviewed, selected external interdependencies, such as the DNS hierarchy and the external power supply, are considered.

All interdependent faults are considered:
In this case, the review attempts to cover interdependencies completely. This ranges from the instantaneous to the long-term strategic (e.g., the education system is not producing enough experts, so that in 30 years there will not be enough experts in power systems to operate the regional power grid).