...

Replication status

  • Off. The connection to the other node is down, or replication is not in progress.

  • Established. The connection is established and data mirroring is enabled.

  • StartingSyncS. The local node is the source, and full synchronization has been initiated by the user. Next status: SyncSource or PausedSyncS

  • StartingSyncT. The local node is the target, and full synchronization has been started by the user. Next status: WFSyncUUID

  • WFBitMapS. Partial synchronization begins. Next status: SyncSource or PausedSyncS

  • WFBitMapT. Partial synchronization begins. Next status: WFSyncUUID

  • WFSyncUUID. The state in which synchronization is about to start. Next status: SyncTarget or PausedSyncT

  • SyncSource. The local node is the source and synchronization is in progress.

  • SyncTarget. The local node is the target and synchronization is in progress.

  • VerifyS. The local node is the source, and on-line device verification is running.

  • VerifyT. The local node is the target, and on-line device verification is running.

  • PausedSyncS. The local node is the source, and synchronization is paused, either because it depends on another synchronization operation completing or because of a manual command (bsradm pause-sync).

  • PausedSyncT. The local node is the target, and synchronization is paused, either because it depends on another synchronization operation completing or because of a manual command (bsradm pause-sync).

  • Ahead. The local node has reached the network congestion state and cannot transmit replication data. (It sends out-of-sync (OOS) information to the peer node.)

  • Behind. The peer node has reached the network congestion state, so replicated data cannot be received. (The state switches to SyncTarget afterward.)
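
The "Next status" notes in the list above can be collected into a small lookup table. The following Python sketch is only an illustration built from those notes; the names NEXT_STATUS and can_follow are hypothetical and are not part of bsr, and states without a documented next status are deliberately left out.

```python
# Transitions taken only from the "Next status" notes in the list above.
# This is not a complete state machine; undocumented transitions are omitted.
NEXT_STATUS = {
    "StartingSyncS": {"SyncSource", "PausedSyncS"},
    "StartingSyncT": {"WFSyncUUID"},
    "WFBitMapS": {"SyncSource", "PausedSyncS"},
    "WFBitMapT": {"WFSyncUUID"},
    "WFSyncUUID": {"SyncTarget", "PausedSyncT"},
    "Behind": {"SyncTarget"},
}


def can_follow(current: str, following: str) -> bool:
    """Return True if `following` is a documented next status of `current`."""
    return following in NEXT_STATUS.get(current, set())


if __name__ == "__main__":
    print(can_follow("WFBitMapT", "WFSyncUUID"))  # documented transition
    print(can_follow("WFBitMapT", "SyncSource"))  # not documented
```

A monitoring script could use such a table to flag a status change that the list does not describe, rather than to enforce behavior.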

...

Info

C:\Program Files\bsr\bin>bsrsetup events2 --now r0
exists resource name:r0 role:Secondary suspended:no
exists connection name:r0 peer-node-id:1 conn-name:remote-host connection:Connected role:Secondary
exists device name:r0 volume:0 minor:7 disk:UpToDate
exists device name:r0 volume:1 minor:8 disk:UpToDate
exists peer-device name:r0 peer-node-id:1 conn-name:remote-host volume:0
replication:Established peer-disk:UpToDate resync-suspended:no
exists peer-device name:r0 peer-node-id:1 conn-name:remote-host volume:1
replication:Established peer-disk:UpToDate resync-suspended:no
exists -
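
The records in the output above are made of space-separated key:value fields, so they are easy to inspect from a script. The following Python sketch is a minimal illustration; the function name parse_event_line is hypothetical, the field layout is inferred only from the sample output shown here, and the sketch assumes that a record wrapped across two lines has been joined back into one line first.

```python
def parse_event_line(line: str) -> dict:
    """Split one `exists ...` record into its object type and key:value fields."""
    tokens = line.split()
    record = {
        "event": tokens[0] if tokens else "",
        "object": tokens[1] if len(tokens) > 1 else "",
    }
    for token in tokens[2:]:
        key, sep, value = token.partition(":")
        if sep:  # keep only well-formed key:value tokens
            record[key] = value
    return record


# One peer-device record from the sample output, joined into a single line.
sample = (
    "exists peer-device name:r0 peer-node-id:1 conn-name:remote-host volume:0 "
    "replication:Established peer-disk:UpToDate resync-suspended:no"
)
fields = parse_event_line(sample)
print(fields["replication"], fields["peer-disk"])
```

Checking fields["replication"] against the status names listed earlier is one way to poll the replication status of each volume.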

...

Efficient synchronization

bsr provides several features for efficient synchronization: FastSync, checksum-based synchronization, truck-based synchronization, and bitmap clear synchronization.

Fast Synchronization

bsr replaces the conventional full synchronization of the entire disk area with Fast Synchronization (FastSync), which synchronizes only the area used by the file system. For FastSync, bsr collects the file system's usage information, records the used areas as out-of-sync (OOS), and then synchronizes them.

FastSync is applied when initial synchronization is started with the bsradm primary --force command, when invalidate or invalidate-remote is run, and during online verification. Users do not need to set any special options for FastSync to work.

Checksum-based synchronization

Checksum (hash digest) comparison can further improve the efficiency of bsr's synchronization algorithm. Before synchronizing a block, checksum-based synchronization reads it, computes a hash digest of its current on-disk contents, and compares that digest with one computed from the same sectors on the other node. If the digests match, the rewrite of the block is skipped; if they do not match, the synchronization data is transmitted. This can outperform the conventional method of simply overwriting every block to be synchronized, and if the file system rewrote the same contents to a sector while the nodes were disconnected, resynchronization is skipped for that sector. Overall, the synchronization time becomes shorter.
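
The comparison described above can be illustrated with a small simulation. This Python sketch is not bsr's implementation: the block size, the use of SHA-256, and the names digest and checksum_sync are assumptions chosen for illustration, and both "disks" are plain in-memory byte arrays.

```python
import hashlib

BLOCK_SIZE = 4096  # illustrative block size, not a bsr setting


def digest(block: bytes) -> bytes:
    """Hash digest of one block (SHA-256 is an arbitrary choice here)."""
    return hashlib.sha256(block).digest()


def checksum_sync(source: bytearray, target: bytearray, candidates: list[int]) -> int:
    """Resync candidate blocks, skipping those whose digests already match.

    Returns the number of blocks actually transferred.
    """
    transferred = 0
    for index in candidates:
        start, end = index * BLOCK_SIZE, (index + 1) * BLOCK_SIZE
        if digest(source[start:end]) == digest(target[start:end]):
            continue  # identical content: the rewrite is skipped
        target[start:end] = source[start:end]
        transferred += 1
    return transferred


source = bytearray(b"A" * BLOCK_SIZE + b"B" * BLOCK_SIZE + b"C" * BLOCK_SIZE)
target = bytearray(b"A" * BLOCK_SIZE + b"x" * BLOCK_SIZE + b"C" * BLOCK_SIZE)
# All three blocks are marked out-of-sync, but only block 1 really differs.
sent = checksum_sync(source, target, [0, 1, 2])
print(sent, source == target)  # prints: 1 True
```

In a real deployment only the digest, not the block, would cross the network for matching blocks, which is where the saving comes from.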

Truck-based synchronization

Truck-based synchronization, in which the data is initialized by physically transporting and attaching disks, is suitable for the following situations.

  • The amount of data to be synchronized initially is very large (hundreds of gigabytes or more).

  • The rate of change of the data to be copied is expected to be small compared to the total data size.

  • The available network bandwidth between the source and target sites is limited.

In these situations, initializing with the normal device synchronization method instead of truck-based synchronization would take a very long time.

Consider the following situation. The local node is in the Primary role but disconnected. That is, the device configuration is complete and identical copies of bsr.conf exist on both nodes. The commands for initial resource promotion have been executed on the local node, but the remote node is not connected yet.

  • Run the following command on the local node.

    Code Block
    bsradm new-current-uuid --clear-bitmap <resource>

  • Create an identical copy of the data to be replicated and of its metadata. For example, you could pull a hot-swappable drive from a RAID-1 mirror. In that case you would, of course, have to add a new drive to the RAID set to continue mirroring, but the drive you removed is a verbatim copy that can be used elsewhere. If your local block device supports snapshot copies, you can use that feature instead.

  • Run the following command on the local node. Note that this second invocation does not use the --clear-bitmap option.

    Code Block
    bsradm new-current-uuid <resource>

  • Physically transport the copy of the original data to the remote site and make it available to the remote node.

  • You can either attach the transported disk directly, or copy its contents bit for bit onto a disk that the remote node already has. This must be done for the metadata as well as for the replicated data. If this procedure is not acceptable, this method cannot be used.

  • Start the bsr resource on the remote node.

    Code Block
    bsradm up <resource>

When the two nodes connect, they do not initiate a full device synchronization. Instead, synchronization starts automatically only for the blocks that have changed since the bsradm new-current-uuid --clear-bitmap command was invoked.

Even if nothing has changed, a brief synchronization may still occur, depending on the areas covered by the Activity Log that is rolled back on the new Secondary node.
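
The effect of clearing the bitmap before the copy is taken can be pictured with a toy model. The Python sketch below is a conceptual illustration only: the class OutOfSyncBitmap and its methods are hypothetical, and it does not model UUIDs, metadata, or the Activity Log.

```python
class OutOfSyncBitmap:
    """Toy model of an out-of-sync bitmap: a set of dirty block numbers."""

    def __init__(self, total_blocks: int) -> None:
        self.total_blocks = total_blocks
        # Before any synchronization, every block counts as out of sync.
        self.dirty = set(range(total_blocks))

    def clear(self) -> None:
        """Model of clearing the bitmap at the moment the copy is taken."""
        self.dirty.clear()

    def record_write(self, block: int) -> None:
        """A local write after the copy was taken marks the block dirty."""
        self.dirty.add(block)

    def blocks_to_sync(self) -> list[int]:
        """Blocks that still have to be sent once the nodes connect."""
        return sorted(self.dirty)


bitmap = OutOfSyncBitmap(total_blocks=1000)
bitmap.clear()                   # corresponds to the --clear-bitmap step
for block in (12, 7, 12, 900):   # local writes while the disk is in transit
    bitmap.record_write(block)
print(bitmap.blocks_to_sync())   # prints: [7, 12, 900]
```

Only three of the 1000 blocks remain to be sent, which is why the network transfer after the disk arrives is small.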

Bitmap clear synchronization

You can use the option that clears the bitmap (--clear-bitmap) to reach the replicating state quickly, without a lengthy initial full synchronization. The following is an example of this kind of operation.

The option can be used to skip the initial synchronization by creating a new Current UUID and clearing the Bitmap UUID. This use case works only with metadata that has just been created.

  1. On both nodes, initialize the metadata and configure the device. Command: bsradm -- --force create-md res

  2. Start the resource on both nodes so that they learn each other's volume size during the initial handshake. Command: bsradm up res

  3. While both nodes are connected as Secondary / Secondary, Inconsistent / Inconsistent, create a new UUID and clear the bitmap. Command: bsradm new-current-uuid --clear-bitmap res

  4. Both nodes are now Secondary / Secondary, UpToDate / UpToDate. Promote one side to Primary and create a file system. Commands: bsradm primary res, followed by mkfs -t fs-type $(bsradm sh-dev res)

One obvious side effect of this approach is that the replica is full of old garbage (unless you make both sides identical by other means), so online verification is expected to report a number of out-of-sync blocks. Never use this method on a volume that already contains data. It may appear to work at first, but because the existing data was never replicated, the data will be broken once you switch over to the other node.

Adjust sync speed

While synchronization runs in the background, the data on the target is temporarily inconsistent. This inconsistent state should be kept as short as possible, so a sufficiently high synchronization speed is desirable. However, replication and synchronization share the same network bandwidth, and the more bandwidth is given to synchronization, the less remains for replication. Reducing the replication bandwidth increases local I/O latency and consequently lowers the local I/O performance of the active server. Because either one can crowd out the other, bsr implements variable-rate synchronization, which adjusts the synchronization bandwidth to the replication load while preserving as much replication bandwidth as possible. bsr uses this as its default policy. Fixed-rate synchronization, by contrast, reserves synchronization bandwidth regardless of replication, which can degrade local I/O performance on a server in operation. It is generally not recommended and should be used only in special situations.

Info

Replication and synchronization

  • Replication reflects disk changes that occur locally onto the target in real time. It is performed in the same context in which the I/O is written to the local disk, and therefore affects local I/O latency.

  • Synchronization brings the data on the target disk into agreement with the data on the source disk by copying the out-of-sync areas of the disk volume. The volume is processed sequentially, from sector 0 to the last sector.

To keep this distinction clear, bsr always describes replication and synchronization separately.

Info

It is pointless to set the synchronization speed to a value higher than the maximum disk write speed of the standby node. Since the standby node is the target of device synchronization in progress, the synchronization speed cannot be faster than the write speed of the I/O subsystem that the standby node allows. For the same reason, setting the sync speed to a value higher than the bandwidth available on the replication network makes no sense.

Fixed rate synchronization

The maximum bandwidth used for background resynchronization is determined by the resource's resync-rate option. It is set, together with the related options, in the disk section of the resource configuration in /etc/bsr.conf as follows:

Code Block
resource <resource> {
  disk {
    resync-rate 40M;  
    c-min-rate 40M;  
    c-plan-ahead 0;  
    ...
  }
  ...
}

The resync-rate and c-min-rate settings are rates in bytes per second. The default unit is the kibibyte, so a value of 4096 is interpreted as 4 MiB per second.
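
The unit rule above can be checked with a short conversion helper. This Python sketch is illustrative only: the function name rate_to_bytes_per_second is hypothetical, and treating the K, M and G suffixes as binary multiples (KiB, MiB, GiB) is an assumption; only the default unit (kibibyte) is stated in the text.

```python
# Assumption: suffixes are binary multiples; only the default unit (KiB)
# is confirmed by the text above.
UNIT_FACTORS = {"": 1024, "K": 1024, "M": 1024**2, "G": 1024**3}


def rate_to_bytes_per_second(value: str) -> int:
    """Convert a rate such as '4096', '250k' or '40M' to bytes per second."""
    text = value.strip().rstrip(";")
    if not text:
        raise ValueError("empty rate value")
    suffix = text[-1].upper() if text[-1].isalpha() else ""
    number = text[:-1] if suffix else text
    if suffix not in UNIT_FACTORS:
        raise ValueError(f"unknown unit suffix: {suffix!r}")
    return int(number) * UNIT_FACTORS[suffix]


print(rate_to_bytes_per_second("4096"))  # 4194304, i.e. 4 MiB per second
print(rate_to_bytes_per_second("40M"))   # 41943040 under the assumption above
```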

Info

Important 

  • If the c-plan-ahead parameter is set to a positive value, the synchronization speed is adjusted dynamically. The default value is 20; set it to 0 for fixed-rate synchronization.

  • c-min-rate sets the minimum synchronization speed when replication and synchronization run at the same time. The default value is 250k; to guarantee a fixed synchronization speed, set it to the same value as resync-rate.

Variable rate synchronization

Fixed-rate synchronization is not optimal when multiple resources share one replication/synchronization network. Because the network is shared, if one resource's replication channel occupies bandwidth for synchronization, the other resources cannot be guaranteed their fixed synchronization rates. Variable-rate synchronization mitigates this by dynamically adjusting the synchronization rate of each replication channel. In this mode, bsr determines an initial synchronization speed and then continuously adjusts it through an automatic control loop algorithm. This algorithm ensures sufficient bandwidth for foreground replication and greatly reduces the impact of background synchronization on foreground I/O.

The optimal configuration for variable-rate synchronization depends on the available network bandwidth, the application I/O pattern, and replication link congestion, and it also depends on whether a replication accelerator (DRX) is used.

Info

Synchronization speed estimation

You can estimate the synchronization time with the following formula.

t_resync = D / R

  • t_resync is the estimated synchronization time.

  • D is the amount of data to be synchronized. You can rarely influence this value (it is, for example, the data modified while the network link was broken).

  • R is the tunable synchronization rate, whose upper limit depends on the replication network environment and the throughput of the I/O subsystem.
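
As a worked example of the formula above, the following Python sketch estimates the time for a hypothetical case. The figures (200 GiB of out-of-sync data, a rate of 40 MiB per second) and the function name estimate_resync_seconds are assumptions chosen purely for illustration.

```python
GIB = 1024**3
MIB = 1024**2


def estimate_resync_seconds(data_bytes: int, rate_bytes_per_second: int) -> float:
    """Estimated synchronization time: t_resync = D / R."""
    if rate_bytes_per_second <= 0:
        raise ValueError("rate must be positive")
    return data_bytes / rate_bytes_per_second


# Hypothetical figures: D = 200 GiB, R = 40 MiB/s.
seconds = estimate_resync_seconds(200 * GIB, 40 * MIB)
print(f"{seconds:.0f} s (~{seconds / 3600:.2f} h)")  # prints: 5120 s (~1.42 h)
```

The estimate assumes the rate R is sustained for the whole run; with variable-rate synchronization the effective rate changes over time, so the result is only a rough guide.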

Congestion mode

Info

Used only in asynchronous replication.

In an environment where the replication bandwidth is variable (a WAN replication environment), the replication link can sometimes become congested. If the Primary node's I/O has to wait because of this, local I/O performance degrades, which is undesirable. You can configure bsr to suspend replication when it detects such congestion. While replication is suspended, the Primary's data set is ahead of the Secondary's (the Ahead state), and the blocks written in the meantime are recorded as out-of-sync (OOS). When the congestion clears, the recorded OOS is resolved through background resynchronization. The following is an example of setting the congestion policy.

In the resource configuration file, the on-congestion option sets the congestion mode, and the congestion-fill option sets the threshold at which congestion is recognized.

Code Block
resource <resource> {
  net {
    sndbuf-size 20M;
    on-congestion pull-ahead;
    congestion-fill 2G;
    congestion-extents 2000;
    ...
  }
  ...
}

The pull-ahead option is used together with congestion-fill and congestion-extents. The recommended values are:

  • When a replication accelerator (DRX) is linked, set congestion-fill to about 90% of the DRX buffer size.

  • If DRX is not linked, set congestion-fill to 90% of sndbuf-size.

  • The recommended value for congestion-extents is 90% of the al-extents setting.
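
The 90% guidelines above are simple to compute. The following Python sketch is illustrative only: the function names are hypothetical, the al-extents value of 2000 is an assumed figure rather than a documented default, and reading 20M as 20 MiB is an assumption about the unit suffix.

```python
MIB = 1024**2


def recommended_congestion_fill(buffer_bytes: int) -> int:
    """90% of the relevant buffer: the DRX buffer if linked, else sndbuf-size."""
    return buffer_bytes * 90 // 100


def recommended_congestion_extents(al_extents: int) -> int:
    """90% of the al-extents setting."""
    return al_extents * 90 // 100


# Without DRX, with sndbuf-size 20M (assumed here to mean 20 MiB):
print(recommended_congestion_fill(20 * MIB) // MIB)  # prints: 18
# With a hypothetical al-extents setting of 2000:
print(recommended_congestion_extents(2000))          # prints: 1800
```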

Disk flush

If the target node suddenly goes down because of a power failure during replication, data can be lost if the disk cache is not protected by a battery-backed write cache (BBWC). To prevent this, bsr always performs a flush after writing data to the target's disk, so that the data reaches the media.

Storage devices equipped with BBWC do not need the disk flush operation, so bsr provides options to disable flushing, as follows.

Code Block
resource <resource> {
  disk {
    disk-flushes no;
    md-flushes no;
    ...
  }
  ...
}

You should disable device flushing only when running bsr on devices with a battery-backed write cache (BBWC). Most storage controllers automatically disable the write cache and switch to write-through mode when the battery is exhausted.

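Integrity verification

Integrity verification checks, based on hash digests and block by block, whether the data on the source and the target match completely. It can be performed in real time on the replication traffic during replication, or over an entire disk volume.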

...