Operations

Once the configuration file is ready, you can begin operating replication. This section walks through starting and stopping resources, synchronization, replication, and failover, in that order.

Replication is operated on a per-resource basis.

Resource Up

Before starting a resource for the first time, you must initialize it by creating its metadata. Metadata initialization is performed only once, before the resource is first started.

fsradm meta create [resource name] {--force | -f}

Attach the resource whose metadata was created so that it is loaded as a replication target.

fsradm attach [resource name]

An attached resource is in a neutral state: it has not yet tried to connect to the peer node. Use the connect command to initiate the replication connection.

fsradm connect [resource name] [peer node name]

The up command performs attach and connect in sequence. It is the usual way to start a resource.

fsradm up [resource name]
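The startup steps above can be scripted. The following is a minimal sketch that only assembles the command lines described in this section; the helper name `startup_commands` and its `first_start` and `peer` parameters are illustrative and not part of FSR.

```python
def startup_commands(resource, peer=None, first_start=False):
    """Return the fsradm command lines (as argv lists) for starting a resource."""
    commands = []
    if first_start:
        # Metadata is created only once, before the very first start.
        commands.append(["fsradm", "meta", "create", resource])
    if peer is None:
        # 'up' performs attach and connect in sequence.
        commands.append(["fsradm", "up", resource])
    else:
        # Explicit two-step start against a named peer node.
        commands.append(["fsradm", "attach", resource])
        commands.append(["fsradm", "connect", resource, peer])
    return commands
```

Each returned list can be handed to a process launcher such as Python's `subprocess.run`.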

Resource Down

You can disconnect the resource with the disconnect command.

fsradm disconnect [resource name] [peer node name]

Detach the disconnected resource.

fsradm detach [resource name]

The down command performs disconnect and detach in sequence. It is the usual way to stop a resource.

fsradm down [resource name]

Synchronization

Initial Sync

When the resources on both the source and target nodes are started and the replication connection is established, the nodes wait without starting synchronization. This is a balanced state in which the direction of initial synchronization has not yet been decided. Initial synchronization starts when you promote the resource on the node that is to be the source to the Primary role. Once synchronization has started, any change to the source-side data is also replicated in real time; FSR performs synchronization and replication simultaneously.

The command to promote a resource is:

fsradm primary [resource name]

Before initial synchronization, local files default to the Inconsistent state on both nodes, so promotion is refused by default. For the initial promotion, the user must explicitly declare which node is to be the source by forcing the promotion with the -f option.

c:\>fsradm primary r0
declined
  r0: not up to date

c:\>fsradm primary r0 -f
done

When the forced promotion is successful, the source node changes its file status to UpToDate and starts initial synchronization with target nodes connected to it.
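The promotion step can be wrapped in a small helper. This is a sketch under stated assumptions: it treats a first output line of `done` as success, as in the transcript above, and it does not rely on exit codes, which this section does not describe. The command runner is passed in as the hypothetical `run` parameter so the helper stays independent of how the command is launched.

```python
def promote(resource, run, force=False):
    """Run `fsradm primary`; return True if the first output line is 'done'."""
    args = ["fsradm", "primary", resource]
    if force:
        # -f declares this node as the source for initial synchronization.
        args.append("-f")
    lines = run(args).strip().splitlines()
    return bool(lines) and lines[0].strip() == "done"
```

Forcing is left as an explicit argument rather than an automatic retry, because choosing the source node is a decision the operator must make.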

Initial synchronization covers the entire file set. Any later synchronization is partial and covers only the changes made on the source side. For example, if the replication connection is dropped and re-established after initial synchronization, partial synchronization is performed.

While synchronization is in progress, the target's file status is Inconsistent; when it completes, both source and target are UpToDate. Because Inconsistent data is not up to date, keep the time spent in the Inconsistent state as short as possible.

Manual Sync

If you need to synchronize manually during operation, use the invalidate-remote command. It synchronizes the peer nodes using the local node as the source.

c:\>fsradm invalidate-remote r0

The invalidate command does the opposite: it synchronizes the local node using the peer node as the source.

c:\>fsradm invalidate r0

Replication

Once a node is promoted and synchronization starts, any real-time change to the data on the source node is replicated in parallel with the synchronization. Replication is the act of applying changes to local data to a target in real time, and it always flows from the Primary node to the Secondary node.

Even during synchronization and replication, you can change each node's role manually by command. Replication stops when the Primary node is demoted.

The command to demote the promoted resource is as follows.

fsradm secondary [resource name]

Replication is sourced from the node promoted to the Primary role, whereas synchronization runs whenever it is needed, regardless of role. Even if there are no changes to replicate, or replication has stopped because of a demotion, a synchronization that was already in progress continues until it completes.

Missing Files

During replication after synchronization has completed, files that did not previously exist in the replication target path may suddenly appear in it. These are called missing files, and they can occur in the following situations.

When a file on the same volume, but outside the replication target path, is moved into the replication target path by a file move operation
When a file that was excluded by an exclusion pattern is included in the replication target again because the exclusion pattern policy changed

In the first case, FSR cannot capture file system I/O for the file; it only sees the rename of the file path, so the file cannot be handled through normal replication. FSR therefore keeps the replication state as it is and, at the same time, synchronizes the missing files individually. In the second case, an exclusion pattern change alters only the replication target without any file system I/O, so it is handled by resynchronization.

Orphaned Files

Unlike missing files, orphaned files are files left behind in the target's replication path with no counterpart in the replicated data. They do not occur in normal replication, but can arise from unintended file manipulation when the target files are not protected.

When an orphaned file is found, it is handled according to FSR's orphaned-file policy. By default, it is backed up to a specific path on the target. You can also set an option to delete it immediately without a backup.

Failover

Failover usually refers to a procedure for recovering from a failure. The failover described here is a planned failover (switchover): the source node in the replication cluster is demoted, and then the target node is promoted to the source role so that its data becomes available for service.

Demote the resource on the source node.

c:\>fsradm secondary r0
done

Promote the target node's resource.

c:\>fsradm primary r0
done

If the promotion succeeds, the failover is complete.

Considerations

Before switching over, make sure that the file status of the resource on the target node is UpToDate. If the target does not have the latest data because the replication connection was dropped, or if the target's resource is Inconsistent because synchronization is still in progress, its data does not match the source, and you must not switch over.
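This precondition can be checked from the target's resource line in the `fsradm status` output. The sketch below assumes the line has the `name key:value ...` form shown in the status examples later in this document; the function name `ready_for_switchover` is illustrative.

```python
def ready_for_switchover(resource_line):
    """Return True if a resource status line reports file:up_to_date."""
    tokens = resource_line.split()[1:]  # skip the resource name
    attrs = {}
    for token in tokens:
        # Split on the first colon only, so values may contain colons.
        key, _, value = token.partition(":")
        attrs[key] = value
    return attrs.get("file") == "up_to_date"
```

Run the check against the status reported on the node that is about to be promoted.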

File Lock

Files replicated to the target must be protected from any write I/O other than the replication data received from the source. Otherwise, the consistency of the replica cannot be guaranteed. In particular, in an HA configuration, the file lock on the Secondary must be enabled to protect the data.

As a target file protection feature, the file lock is generally enabled on the Secondary and disabled on the Primary, following the role of the resource.

The file lock can be set automatically according to the resource role through the auto_file_lock option in the nodes section of the resource, or enabled and disabled manually with the fsradm lock and unlock commands.

Auto Lock

The auto_file_lock option is enabled by default. When a resource is demoted, its files are locked automatically. To unlock locked files, promote the resource or unlock them with the unlock command.

Locking is automatic, but unlocking is not.

Manual Lock

You can also operate the file lock manually by disabling the auto_file_lock option. In that case, run the lock command and the demotion command separately, in the following order.

c:\>fsradm lock r0
done
c:\>fsradm secondary r0
done

With the -l option, the two commands above are combined into a single demotion command. The order is the same as above: lock first, then demote.

c:\>fsradm secondary -l r0
done

Conversely, during promotion, the lock is released after the primary command.

c:\>fsradm primary r0
done
c:\>fsradm unlock r0
done

With the -u option, this is done in a single promotion command.

c:\>fsradm primary -u r0
done

When the file lock is enabled, write I/O to the replicated file set is blocked. Therefore, terminate all related applications and services so that no further I/O reaches the files, and only then apply the lock. Otherwise, writes may be blocked while I/O is in flight, causing I/O errors, or an application may fail to flush its cache, losing important writes. When switching over, make sure the files are locked only after the applications have completely shut down.
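The ordering constraints above can be written down as an explicit plan. This sketch assumes manual locking (auto_file_lock disabled) and uses the combined -l and -u forms from this section; the function name, the node names, and the free-text first step are illustrative placeholders, since stopping applications is site-specific.

```python
def switchover_plan(resource, old_source, new_source):
    """Return ordered (node, action) steps for a planned failover with manual locking."""
    return [
        # Applications must be closed before the files are locked.
        (old_source, "stop applications writing to the replicated files"),
        # -l locks the files first, then demotes the resource.
        (old_source, f"fsradm secondary -l {resource}"),
        # -u promotes the resource first, then releases the lock.
        (new_source, f"fsradm primary -u {resource}"),
    ]
```

Keeping the steps in one ordered list makes it harder to lock files while an application is still writing.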

Query

Status Query

You can query the status of FSR with the fsradm status command.

λ fsradm status all
r0 role:primary file:up_to_date pending:0 locked:false
  node2 state:repl_source peer-state:repl_target role:secondary file:up_to_date
    last-synced:2019-10-24T15:30:12+09:00
  node3 state:connecting peer-state:unknown role:secondary file:unknown
    last-synced:none

r1 role:secondary file:inconsistent pending:0 locked:false
  node2 state:connecting peer-state:unknown role:secondary file:unknown
    last-synced:none

Use the verbose output option to query more status information.

λ fsradm status -v
r0:node1 role:primary file:up_to_date pending:0 locked:false
  last-promoted:2020-06-10T09:40:32+09:00
  node2 state:repl_source peer-state:repl_target role:secondary file:up_to_date
    repl-started:2020-06-10T09:40:32+09:00 last-synced:2020-06-10T09:40:33+09:00
  node3 state:connecting peer-state:unknown role:secondary file:unknown
    repl-started:2020-04-09T09:50:38+09:00 last-synced:2020-04-09T09:50:53+09:00

To keep watching the status, use the --watch (-w) and --interval (-i) options to monitor it continuously.

λ fsradm status all -w -i 1
r0 role:secondary file:inconsistent locked:false
  node2 state:established peer-state:established role:secondary file:inconsistent
    last-synced:none
  node3 state:connecting peer-state:unknown role:secondary file:unknown
    last-synced:none

r1 role:secondary file:inconsistent locked:false
  node2 state:connecting peer-state:unknown role:secondary file:unknown
    last-synced:none

update every 1.0s. current executions: 84
press 'q' or 'ctrl+c' to quit...

File State

Indicates the replication state of the files being replicated.

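unknown Unknown state. Represents the file state of a peer node that is not connected and therefore cannot be determined.

fileless The replication target is not attached. The attach command moves it to the attaching state.

attaching The replication target is being attached. If attaching fails, the state becomes failed; when attaching completes, it becomes consistent or inconsistent.

detaching The replication target is being detached. When detaching completes, the state becomes fileless.

failed Indicates a failure, either in setting up replication or because a file I/O error occurred.

inconsistent The ordering of the data cannot be guaranteed, or the files belong to a synchronization target. Promotion is refused by default (forced promotion is possible).

consistent The ordering of the data is guaranteed. This is an intermediate state that finally transitions to outdated or up_to_date.

outdated Stale data. The state of a replication target that can no longer receive the latest data because of a disconnection, a pause, or a similar event. Promotion is refused by default (forced promotion is possible).

up_to_date Latest data. The state of a Primary or of a replication target.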

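Connection/Replication State

States up to the point where the two nodes are connected are defined as connection states; states after the connection is established are defined as replication states. The following states are defined.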

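standalone Neutral state. No connection is attempted; this is the initial connection state of a resource. The connect command moves it to the connecting state.

disconnecting The connection has been dropped and is being cleaned up. Transitions to standalone or connecting.

connecting A connection is being attempted. If an error occurs during the attempt, the state becomes standalone; if the connection succeeds, it becomes connected. In practice, accept and connect are attempted simultaneously at the socket layer.

connected The connection succeeded and the replication network is being authenticated. If authentication succeeds, the state becomes established; if it fails, the state becomes standalone.

established Replication authentication is complete. This is the state right after connecting and the default state once a connection between Secondaries is complete. It does not move straight into synchronization or replication. Promoting the local node in this state makes it sync_source or repl_source; if the peer is promoted, the local node becomes sync_target or repl_target.

sync_source Synchronization source. Pausing synchronization leads to sync_source_paused; completing synchronization leads to repl_source. When synchronization between Secondaries completes, the state becomes established.

sync_source_paused Synchronization source, paused. Resuming synchronization leads to sync_source.

sync_target Synchronization target. Pausing synchronization leads to sync_target_paused; completing synchronization leads to repl_target. When synchronization between Secondaries completes, the state becomes established.

sync_target_paused Synchronization target, paused. Resuming synchronization leads to sync_target.

repl_source Replication source. Demoting in this state leads to established, pausing leads to repl_source_paused, and starting synchronization leads to sync_source.

repl_source_paused Replication source, paused. Resuming replication leads to repl_source.

repl_target Replication target. If the peer is demoted in this state, it becomes established; pausing leads to repl_target_paused, and starting synchronization leads to sync_target.

repl_target_paused Replication target, paused. Resuming replication leads to repl_target.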
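The transitions listed above can be collected into a lookup table, for example to sanity-check a recorded sequence of states. This sketch transcribes only the transitions named in this section; the engine may allow others (the list does not say which states can enter disconnecting, for instance), so an unlisted transition is not necessarily an error. The names `TRANSITIONS`, `is_listed_transition`, and `first_unlisted` are illustrative.

```python
# Successor states for each connection/replication state, as listed above.
TRANSITIONS = {
    "standalone": {"connecting"},
    "disconnecting": {"standalone", "connecting"},
    "connecting": {"standalone", "connected"},
    "connected": {"established", "standalone"},
    "established": {"sync_source", "repl_source", "sync_target", "repl_target"},
    "sync_source": {"sync_source_paused", "repl_source", "established"},
    "sync_source_paused": {"sync_source"},
    "sync_target": {"sync_target_paused", "repl_target", "established"},
    "sync_target_paused": {"sync_target"},
    "repl_source": {"established", "repl_source_paused", "sync_source"},
    "repl_source_paused": {"repl_source"},
    "repl_target": {"established", "repl_target_paused", "sync_target"},
    "repl_target_paused": {"repl_target"},
}


def is_listed_transition(current, new):
    """Return True if moving from `current` to `new` appears in the table."""
    return new in TRANSITIONS.get(current, set())


def first_unlisted(states):
    """Return the first (from, to) pair in a sequence that is not listed, or None."""
    for current, new in zip(states, states[1:]):
        if not is_listed_transition(current, new):
            return (current, new)
    return None
```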

Performance Query

You can query performance with the fsradm perfmon command.

c:\>fsradm perfmon r0

Performance results can be printed to the console for direct inspection or saved to a file. The following options are available.

--json <filename> Path of the JSON output file
--csv <filename> Path of the CSV output file
--display Print to the console
--watch Monitoring mode
--interval Query interval

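Performance Metrics

Events

With the event subscription command, you can receive notifications of the events that FSR defines. Subscribing lets you track state changes of files, connections, and so on in real time.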

λ fsradm events r0
2020-06-12T12:42:39.295379 type=rpc state=connected
2020-06-12T12:42:41.685784 type=state node=node2 peer=node1 resource=r0 value=standalone
2020-06-12T12:42:41.685784 type=added node=node2 resource=r0
2020-06-12T12:42:41.685784 type=role node=node2 resource=r0 role=secondary
2020-06-12T12:42:41.685784 type=file_state node=node2 resource=r0 value=fileless
2020-06-12T12:42:41.728821 type=file_state node=node2 resource=r0 value=attaching
2020-06-12T12:42:41.744835 type=file_state node=node2 resource=r0 value=outdated
2020-06-12T12:42:41.774378 type=state node=node2 peer=node1 resource=r0 value=connecting

To make events easier to interpret, JSON output is supported. Additional options are available for monitoring synchronization status (--sync) and performance statistics (--perf).

λ fsradm events --json r0
{"type":"rpc","timestamp":"2020-06-12T03:43:56.152358300Z","datas":{"state":"connected"}}
{"type":"state","timestamp":"2020-06-12T03:43:58.396422300Z","datas":{"node":"node2","peer":"node1","resource":"r0","value":"standalone"}}
{"type":"added","timestamp":"2020-06-12T03:43:58.396422300Z","datas":{"node":"node2","resource":"r0"}}
{"type":"role","timestamp":"2020-06-12T03:43:58.396422300Z","datas":{"node":"node2","resource":"r0","role":"secondary"}}
{"type":"file_state","timestamp":"2020-06-12T03:43:58.396422300Z","datas":{"node":"node2","resource":"r0","value":"fileless"}}
{"type":"file_state","timestamp":"2020-06-12T03:43:58.437426600Z","datas":{"node":"node2","resource":"r0","value":"attaching"}}
{"type":"file_state","timestamp":"2020-06-12T03:43:58.452638800Z","datas":{"node":"node2","resource":"r0","value":"outdated"}}
{"type":"state","timestamp":"2020-06-12T03:43:58.479433800Z","datas":{"node":"node2","peer":"node1","resource":"r0","value":"connecting"}}
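Because each JSON event is a single line, the stream is easy to fold into a current-state view. The sketch below assumes the field names seen in the sample above (`type`, `datas`, `node`, `resource`, `value`, `role`) and ignores any other event types; the function name `track_states` is illustrative.

```python
import json


def track_states(lines):
    """Fold JSON event lines into the latest known values per (node, resource)."""
    latest = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        event = json.loads(line)
        data = event.get("datas", {})
        key = (data.get("node"), data.get("resource"))
        if event["type"] in ("state", "file_state"):
            # Connection state and file state both carry their value in "value".
            latest.setdefault(key, {})[event["type"]] = data["value"]
        elif event["type"] == "role":
            latest.setdefault(key, {})["role"] = data["role"]
    return latest
```

Timestamps are left as strings here because the sample uses nine fractional digits, which not every date parser accepts.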

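For details on event types, see the command section of the appendix.

Consistency Check

The following command performs a data consistency check between the source and the target. The check is requested from the target, not the source, by issuing verify as follows.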

λ fsradm verify r0

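The consistency check operates in different modes depending on whether replication is running. If both the source and the target are Secondary, it runs in the normal verify mode. If one side is Primary and replication is active, the data on the source and the target can differ during the check; to handle this, it runs in advanced-verify mode, which waits for the data sequence of the replicated changes. The engine chooses between the normal verify mode and the advanced-verify mode automatically, so the user does not need to select one, but should be aware that the two modes differ.

The consistency check assumes that it is comparing UpToDate data. If either side does not have the latest data, or if the state changes during the check (for example, synchronization starts or the replication state changes), the check is canceled.

The check identifies files that differ by comparing hashes. After the check finishes, you can view the results with the result command.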

λ fsradm result r0
{
  "id": "r0",
  "result": {
    "summary": {
      "start_time": "2019-09-09T06:22:26.6958913Z",
      "end_time": "2019-09-09T06:22:27.4653424Z",
      "peer_node": "node2"
    },
    "totals": {
      "diff_dir": "3",
      "diff_file": "1",
      "diff_bytes": "14",
      "orphaned_dir": "0",
      "orphaned_file": "0",
      "orphaned_bytes": "0",
      "missing_dir": "0",
      "missing_file": "0",
      "missing_bytes": "0",
      "synced_bytes": "0"
    },
    "files": [
      {
        "type": "different",
        "name": "G:\\Temp\\test1\\conf\\drbd.d",
        "is_dir": true,
        "out_of_sync": "0",
        "synced": "0",
        "flags": 4,
        "properties": {
          "mod_time": {
            "local": "2019-09-06T13:26:59.1427926+09:00",
            "remote": "2019-09-02T07:24:39.161996Z"
          }
        }
      },
      {
        "type": "different",
        "name": "G:\\Temp\\test1\\conf\\drbd.d\\1",
        "is_dir": true,
        "out_of_sync": "0",
        "synced": "0",
        "flags": 4,
        "properties": {
          "mod_time": {
            "local": "2019-09-06T13:26:54.0042751+09:00",
            "remote": "2019-09-02T07:24:39.3341577Z"
          }
        }
      },
      {
        "type": "different",
        "name": "G:\\Temp\\test1\\conf",
        "is_dir": true,
        "out_of_sync": "0",
        "synced": "0",
        "flags": 4,
        "properties": {
          "mod_time": {
            "local": "2019-09-06T13:26:59.0677748+09:00",
            "remote": "2019-08-07T02:15:58.4057437Z"
          }
        }
      },
      {
        "type": "different",
        "name": "G:\\Temp\\test1\\contributors.txt",
        "out_of_sync": "14",
        "synced": "0",
        "flags": 5,
        "properties": {
          "mod_time": {
            "local": "2019-09-09T14:00:05.6379239+09:00",
            "remote": "2018-12-12T04:42:50.6605579Z"
          },
          "size": {
            "local": 9,
            "remote": 15
          }
        }
      }
    ],
    "file_count": 4
  }
}
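The result is plain JSON, so it can be summarized with a few lines of code. This sketch assumes the layout shown above: counters under `result.totals` are numeric strings, and directory entries carry `"is_dir": true` while regular files omit the field. The function name `summarize_verify` is illustrative.

```python
import json


def summarize_verify(text):
    """Return (totals as ints, names of reported entries that are not directories)."""
    report = json.loads(text)["result"]
    # Counters are emitted as strings such as "3", so convert them.
    totals = {key: int(value) for key, value in report["totals"].items()}
    # Entries without "is_dir" are treated as regular files.
    files = [entry["name"] for entry in report["files"] if not entry.get("is_dir", False)]
    return totals, files
```

Applied to the sample above, the only regular file reported is contributors.txt; the other three entries are directories whose modification times differ.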

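Reconfiguration

If an unexpected environmental problem occurs during replication operation, such as physical disk damage, a procedure is needed to respond to it and restore normal replication. In general, when such a problem occurs, you must replace the disk and configure replication again.

Follow the steps below to reconfigure replication and resynchronize.

Stop the running resource.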

c:\>fsradm down r0
done

Perform the recovery work, such as replacing the disk.

Recreate the metadata. If the configuration has changed, write a new configuration file and then recreate the metadata.

c:\>fsradm meta create r0
done

Start the resource.

c:\>fsradm up r0
done

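Synchronization starts once the connection with the source node is established.

Backup

File Deletion

FSR provides backup for file deletions. This feature temporarily stores files that are deleted unintentionally in a specific path on the target, and it is configured with the archive property. The archive property is disabled by default; you can specify the backup path and the retention period.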

Working