Fault Tolerant Design
In computing, a fault-tolerant design enables a system to keep operating, possibly at a reduced level, rather than failing completely when one of its parts fails. In such cases the system may deliver reduced performance. This controlled loss of performance is known as graceful degradation.
Fault-tolerant design in IT often adopts a distributed-system approach. A distributed system is a non-centralized network of multiple computers that communicate with one another but appear to users as a single system, such as a single storage system. This model incorporates key fault-tolerance techniques such as redundancy, replication and diversity.
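As an illustration of how redundancy and replication let several machines appear as one store, here is a minimal Python sketch. The class names Replica and ReplicatedStore, the node names and the "alive" flag are hypothetical and chosen only for this example; the sketch does not describe any particular product.

```python
class Replica:
    """A hypothetical storage node that can be marked as failed."""

    def __init__(self, name):
        self.name = name
        self.data = {}
        self.alive = True

    def write(self, key, value):
        if not self.alive:
            raise ConnectionError(f"{self.name} is down")
        self.data[key] = value

    def read(self, key):
        if not self.alive:
            raise ConnectionError(f"{self.name} is down")
        return self.data[key]


class ReplicatedStore:
    """Presents several replicas to the caller as one logical store."""

    def __init__(self, replicas):
        self.replicas = replicas

    def write(self, key, value):
        # Replication: send every write to all reachable replicas.
        written = 0
        for replica in self.replicas:
            try:
                replica.write(key, value)
                written += 1
            except ConnectionError:
                continue
        if written == 0:
            raise RuntimeError("no replica accepted the write")
        return written

    def read(self, key):
        # Redundancy: fall back to the next replica if one is down.
        for replica in self.replicas:
            try:
                return replica.read(key)
            except ConnectionError:
                continue
        raise RuntimeError("all replicas are down")


nodes = [Replica("node-a"), Replica("node-b"), Replica("node-c")]
store = ReplicatedStore(nodes)
store.write("volume-1", "block data")
nodes[0].alive = False            # simulate a single node failure
result = store.read("volume-1")   # still answered by a surviving replica
```

The sketch deliberately leaves out resynchronization: a replica that was down during a write would need to catch up before it could safely serve reads again.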
Most fault-tolerant systems are ‘single-point tolerant,’ meaning they can survive the failure of any one component, typically because each component has a single backup. In such systems, failed parts can be replaced with new ones while the system is still operational. This process is known as ‘hot swapping.’
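The idea of replacing a failed part without stopping the system can be sketched with a two-disk mirror. This is a simplified model under stated assumptions: Disk, MirroredPair and hot_swap are hypothetical names, and the rebuild is an instant in-memory copy, whereas a real array rebuilds in the background.

```python
class Disk:
    """A hypothetical disk holding blocks in memory."""

    def __init__(self, label):
        self.label = label
        self.blocks = {}
        self.failed = False


class MirroredPair:
    """Two-disk mirror that tolerates any single disk failure."""

    def __init__(self, disk_a, disk_b):
        self.disks = [disk_a, disk_b]

    def healthy(self):
        return [d for d in self.disks if not d.failed]

    def write(self, block, data):
        targets = self.healthy()
        if not targets:
            raise RuntimeError("both disks failed")
        for disk in targets:
            disk.blocks[block] = data

    def read(self, block):
        targets = self.healthy()
        if not targets:
            raise RuntimeError("both disks failed")
        return targets[0].blocks[block]

    def hot_swap(self, slot, new_disk):
        # Copy the surviving disk's contents onto the replacement, then
        # install it in the slot. The mirror is never taken offline.
        survivors = [
            d for i, d in enumerate(self.disks) if i != slot and not d.failed
        ]
        if not survivors:
            raise RuntimeError("no healthy disk to rebuild from")
        new_disk.blocks = dict(survivors[0].blocks)
        self.disks[slot] = new_disk


mirror = MirroredPair(Disk("disk-0"), Disk("disk-1"))
mirror.write(0, "payload")
mirror.disks[0].failed = True         # single fault
during_failure = mirror.read(0)       # served by disk-1
mirror.hot_swap(0, Disk("disk-2"))    # replace the failed disk while running
mirror.disks[1].failed = True         # a later, second fault
after_swap = mirror.read(0)           # served by the rebuilt disk-2
```

Because the swap restores the second copy, the mirror can absorb another single failure afterwards, which is what keeps a single-point-tolerant system running over long periods.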
Fault-tolerant design increases the reliability of a system. After a partial failure, the system can usually remain operational, although the resulting performance degradation may show up as reduced throughput or increased response time.
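The effect of a partial failure on throughput and response time can be illustrated with simple arithmetic. The sketch below assumes a hypothetical four-node cluster in which each node serves 100 requests per second and load is spread evenly; the function names and figures are illustrative only.

```python
def cluster_throughput(node_capacities, failed):
    """Total requests per second that the healthy nodes can serve."""
    return sum(
        capacity
        for name, capacity in node_capacities.items()
        if name not in failed
    )


def batch_seconds(requests, throughput):
    """Time needed to work through a batch at the given throughput."""
    if throughput == 0:
        raise RuntimeError("no healthy nodes left")
    return requests / throughput


# Assumed capacities, in requests per second.
capacities = {"node-a": 100, "node-b": 100, "node-c": 100, "node-d": 100}

full = cluster_throughput(capacities, failed=set())
degraded = cluster_throughput(capacities, failed={"node-c"})

print(full, batch_seconds(1200, full))          # 400 3.0
print(degraded, batch_seconds(1200, degraded))  # 300 4.0
```

With one of four nodes down, the cluster keeps serving requests at three quarters of its normal throughput, and the same batch takes a third longer to complete.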
One of the best-known examples of fault-tolerant systems is the NonStop line built by Tandem Computers. These systems were single-point tolerant, and they achieved uptimes measured in years.
Fault-tolerant designs are not appropriate for every application. They are typically implemented in critical computing applications that demand high availability. Fault-tolerant systems are expensive, and because they mask failures they may also interfere with fault detection in other components of the system.


