Info

When multiple split-brains (SBs) occur, the victim nodes do not synchronize with each other, so the situation can be resolved by establishing a connection with --discard-my-data from every victim node.

Fault

FSR defines the following error conditions as failures. When such a failure occurs, the follow-up actions must be performed manually.

...

This section describes how to respond to the following failure situations:

  • Disk failure
  • File I/O errors in the FSR engine

Disk failure

Failures can occur on the replication target disk itself, such as when the volume on which the replication target resides is unintentionally unmounted during operation, or when the storage medium itself fails due to physical damage. In this case, the user should take steps to repair the replication target so that the volume device is up and running again. Once the manual repair is complete, replication must be restarted with a full synchronization.

Monitoring disk health

FSR periodically monitors the health of the disks to detect problems with them. The monitoring is based on S.M.A.R.T. (Self-Monitoring, Analysis, and Reporting Technology), and the monitoring period can be specified as follows.

Info
"disk": {
    "health": {
        "period": 10
    }
},


Engine file I/O errors

File I/O can fail in a variety of situations, including file path issues and account permission problems. Although I/O errors are infrequent, during service operation they are commonly caused by unintended changes in the environment or by applications that are not written to respond flexibly to exceptional situations. When an I/O error occurs, the application is expected to handle the exception, and the subsequent behavior depends on the application. For this reason, file I/O errors caused by source-side applications are viewed as normal file I/O errors that can occur at any time and are not considered failures.

However, if an error occurs in the file I/O performed by the FSR engine, it is a failure. If FSR is unable to perform file I/O, mirroring is essentially impossible and replication stops immediately.

The error codes for I/O errors encountered by the FSR engine are logged, and the cause of an error can be estimated from its error code. From there, the administrator must manually repair the failure and normalize FSR's I/O. Once the environment is normalized, the resources are restarted and a full sync is performed to resume replication.


Check Disk

Physical errors on disk volumes are difficult to repair because they are caused by damage to the media itself, but logical errors at the filesystem level can be checked or repaired with a utility (chkdsk on Windows, fsck on Linux).

It is usually safe to use these utilities after the volume has been unmounted. If logical faults are detected and subsequently repaired during this check, the volume must be brought up again as a replication resource and given a full synchronization to ensure consistency with the target.
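As a rough illustration of the check described above, the following sketch only builds the command line for each platform and does not run it. The function name build_check_command and the example volume names (/dev/sdb1, D:) are hypothetical and not part of FSR. The flags are the commonly documented ones: chkdsk only reports unless /F is given, and for many filesystem-specific checkers fsck -n reports without repairing while -y attempts repairs automatically.

```python
import platform
from typing import List, Optional


def build_check_command(volume: str, repair: bool = False,
                        system: Optional[str] = None) -> List[str]:
    """Build (but do not run) a filesystem check command for one volume.

    `system` defaults to the current platform; pass "Windows" or "Linux"
    explicitly to preview the command for another platform.
    """
    system = system or platform.system()
    if system == "Windows":
        # chkdsk only reports by default; /F asks it to fix errors.
        return ["chkdsk", volume] + (["/F"] if repair else [])
    if system == "Linux":
        # -n: report problems without repairing; -y: attempt repairs.
        return ["fsck", "-y" if repair else "-n", volume]
    raise ValueError(f"no check utility known for platform: {system}")


if __name__ == "__main__":
    # Hypothetical volume names, for illustration only.
    print(build_check_command("/dev/sdb1", system="Linux"))
    print(build_check_command("D:", repair=True, system="Windows"))
```

The resulting list could be passed to subprocess.run once the volume has been unmounted. Starting with the report-only form shows whether a repair, and therefore a full synchronization afterwards, is needed at all.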


File Lock


File Handle Closing Error

During the file locking process, there is a procedure that cleans up the handles of any files to be replicated that are already open. This section explains what to do if the following error message appears during that procedure.

Info

ERR handle closed error="attach: operation not permitted" exec=handle group= key=2 name=/opt/data/b/1234.txt node=b pid=76716 resource=r0

The above error occurs only on Linux. It is caused by ptrace not being permitted to attach to the process, and resolving it requires adjusting the system's permission settings. If /proc/sys/kernel/yama/ptrace_scope is set to 3, change it to a value between 0 and 2 and reboot after adjusting the setting. If you are unable to change the ptrace_scope setting on your system, you will need to manually terminate all processes that have the files open.
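The triage steps above can be sketched as a small helper. This is a minimal sketch, not part of FSR: the function names are hypothetical, the log line is assumed to consist of space-separated key=value pairs with optional double quotes (as in the sample message), and the pid field is assumed to identify the process holding the file. The meanings of the ptrace_scope values follow the Linux kernel's Yama documentation, which also notes that a value of 3 cannot be changed again until reboot.

```python
import shlex
from pathlib import Path

# Path quoted in the text above; present only on Linux kernels with Yama.
PTRACE_SCOPE_PATH = Path("/proc/sys/kernel/yama/ptrace_scope")

# Meanings as given in the Linux kernel's Yama documentation.
SCOPE_MEANINGS = {
    0: "classic ptrace permissions",
    1: "restricted ptrace (descendants or declared tracers only)",
    2: "admin-only attach (CAP_SYS_PTRACE required)",
    3: "no attach (cannot be changed until reboot)",
}


def parse_log_fields(line):
    """Collect key=value pairs from a log line; quoted values may hold spaces."""
    fields = {}
    for token in shlex.split(line):
        key, sep, value = token.partition("=")
        if sep:  # skip bare words such as "ERR", "handle", "closed"
            fields[key] = value
    return fields


def describe_ptrace_scope(raw):
    """Return (value, meaning, blocks_all_attach) for the raw file content."""
    value = int(raw.strip())
    if value not in SCOPE_MEANINGS:
        raise ValueError(f"unexpected ptrace_scope value: {value}")
    return value, SCOPE_MEANINGS[value], value == 3


def read_ptrace_scope(path=PTRACE_SCOPE_PATH):
    """Read the live setting, or return None if it is unavailable."""
    try:
        return describe_ptrace_scope(path.read_text())
    except (OSError, ValueError):
        return None


if __name__ == "__main__":
    line = ('ERR handle closed error="attach: operation not permitted" '
            'exec=handle group= key=2 name=/opt/data/b/1234.txt '
            'node=b pid=76716 resource=r0')
    fields = parse_log_fields(line)
    print("pid reported in the log:", fields["pid"])
    print("file:", fields["name"])
    print("ptrace_scope:", read_ptrace_scope())
```

If the third element of the returned tuple is true, the setting is 3 and can only be lowered through the system's persistent sysctl configuration followed by a reboot. Otherwise, the pid and name fields from the log indicate which process and file to look at when terminating processes by hand.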