Increasing Reliability by the Use of Redundant Machines
Abstract
The improvement of reliability and availability through redundancy of entire machines, rather than of components, is investigated. An attempt is made to break down the cost of operating a digital computer and to determine the relationship between cost and system failure. Three specific cases are discussed.
Case 1: n machines are operated independently, processing the same input data. The output is taken from a single one of them; if this machine fails, the output is promptly switched to a machine that is operating properly. As soon as repairs can be completed, the machine that had failed is returned to operation. System failure occurs only when all n machines are in the failed condition at the same time. A penalty cost is assessed for system failure, this cost being proportional to the system down-time.
Case 2: n machines are operated as in Case 1, except that any machine that fails is not returned to operation until the beginning of the next operating period. The penalty cost for system failure is assessed in the same way as in Case 1.
Case 3: n machines are operated as in Case 2, but the penalty cost for system failure is a fixed amount, independent of the resulting down-time.
COPYRIGHT © 1959, THE INSTITUTE OF RADIO ENGINEERS, INC.
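The three cases can be illustrated with a minimal numerical sketch. It is not taken from the paper: it assumes that machines fail independently with exponentially distributed times to failure, and, for Case 1, that each machine is repaired independently of the others. The function names and the parameters `mtbf`, `mttr`, `rate`, and `period` are hypothetical, chosen only for illustration.

```python
import math


def all_down_fraction(mtbf, mttr, n):
    """Case 1 sketch: long-run fraction of time that all n machines are down.

    Assumes independent machines, each with mean time between failures
    `mtbf` and mean time to repair `mttr` (illustrative assumptions).
    """
    unavailability = mttr / (mtbf + mttr)  # one machine's down fraction
    return unavailability ** n             # all n down at the same time


def expected_downtime_no_repair(rate, period, n, steps=10_000):
    """Case 2 sketch: expected system down-time in one operating period.

    With no repair during the period, the system is down at time t only
    if every machine has failed by t, which has probability
    (1 - exp(-rate * t)) ** n under the exponential assumption.
    The integral over [0, period] is approximated by the midpoint rule.
    """
    dt = period / steps
    total = 0.0
    for i in range(steps):
        t = (i + 0.5) * dt
        total += (1.0 - math.exp(-rate * t)) ** n
    return total * dt


def failure_probability_no_repair(rate, period, n):
    """Case 3 sketch: probability that all n machines fail within a period."""
    return (1.0 - math.exp(-rate * period)) ** n


def penalty_costs(penalty_rate, fixed_penalty, mtbf, mttr, period, n):
    """Expected penalty cost per operating period for the three cases."""
    rate = 1.0 / mtbf  # exponential failure rate (assumption)
    case1 = penalty_rate * period * all_down_fraction(mtbf, mttr, n)
    case2 = penalty_rate * expected_downtime_no_repair(rate, period, n)
    case3 = fixed_penalty * failure_probability_no_repair(rate, period, n)
    return case1, case2, case3
```

Calling `penalty_costs` for increasing n shows how quickly the expected penalty falls as machines are added; that penalty can then be weighed against the cost of operating each additional machine, which this sketch does not model.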