...

The error handling policy can be applied at run time with the adjust command, even while the resource is in operation.

Info

Based on experience operating replication services, I/O errors happen more often than expected. Because BSR depends on the lower disk layer, and errors in that layer (the standard SCSI layer) can occur at any point in time, BSR replication must be able to cope flexibly with disk I/O errors on the replication side as well. The detach policy, which has been provided as a disk failure policy, unilaterally stops replication at a certain point in time. This is disadvantageous from a service operation point of view: it is difficult to recover after the fact, and it works against continuing the service. We devised the passthrough policy in response to these issues and set it as the default policy for BSR. When an I/O error occurs, the passthrough policy records an OOS for the block and forwards the failed I/O result to the filesystem.
If the filesystem then retries the write to the failed block and succeeds, the OOS is cleared, and the transient disk-layer error is overcome by the filesystem on its own. Even if the filesystem's behavior does not completely resolve the OOS, some of the remaining OOS can be resolved by resynchronization, for example by retrying the connection. In other words, the passthrough policy encourages the filesystem to resolve the error block by itself or through synchronization, and essentially guarantees that the service will continue to operate even if there is a disk I/O problem.
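The behavior described above can be illustrated with a small toy model. This is only a sketch under stated assumptions, not BSR's implementation: the names `FlakyDisk`, `PassthroughVolume`, `write`, and `resync` are hypothetical, the disk is simulated in memory, and resynchronization is reduced to rewriting the OOS blocks from a peer's copy.

```python
from __future__ import annotations

from dataclasses import dataclass, field


class DiskIOError(Exception):
    """Raised by the simulated lower disk layer (hypothetical)."""


@dataclass
class FlakyDisk:
    """Hypothetical in-memory disk; listed blocks fail a fixed number of writes."""
    failures_left: dict[int, int] = field(default_factory=dict)
    blocks: dict[int, bytes] = field(default_factory=dict)

    def write(self, block: int, data: bytes) -> None:
        if self.failures_left.get(block, 0) > 0:
            self.failures_left[block] -= 1
            raise DiskIOError(f"write failed on block {block}")
        self.blocks[block] = data


@dataclass
class PassthroughVolume:
    """Toy model of the passthrough idea: mark OOS, report failure, keep running."""
    disk: FlakyDisk
    oos: set[int] = field(default_factory=set)

    def write(self, block: int, data: bytes) -> bool:
        try:
            self.disk.write(block, data)
        except DiskIOError:
            self.oos.add(block)   # record the block as out of sync
            return False          # forward the failed result to the caller
        self.oos.discard(block)   # a successful rewrite clears the OOS mark
        return True

    def resync(self, peer_blocks: dict[int, bytes]) -> None:
        """Retry every OOS block using the peer's copy (simplified resync)."""
        for block in sorted(self.oos):  # sorted() copies, so the set may change
            if block in peer_blocks:
                self.write(block, peer_blocks[block])


# Usage: block 7 fails once, then the caller (the "filesystem") retries.
volume = PassthroughVolume(FlakyDisk(failures_left={7: 1}))
first = volume.write(7, b"data")   # fails; block 7 is marked OOS
second = volume.write(7, b"data")  # retry succeeds; OOS mark is cleared
```

In this model the volume never stops accepting writes after an error. A failed block stays marked until either the caller's retry or a later `resync` succeeds, which mirrors the two recovery paths described above.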

Temporary error handling

...