...
Attaches a lower-level device as a replicated device.
connect { connection}
Activates an existing connection to a peer. The connection must first have been created with the new-peer command, and at least one path must have been created with the new-path command.
create-md { device}
Initializes the metadata of a device. This is required before a replicated device can be attached for the first time.
cstate { connection}
Shows the current connection state.
detach { device}
Detaches the lower-level device of a replicated device.
disconnect { connection}
Removes the connection to a peer host.
disk-options { device}
Changes the disk options of an attached device.
down { resource}
Takes a resource down: removes all volumes and connections, and then the resource itself.
dstate { device}
Shows the current disk state of the lower-level device.
dump { resource}
Parses the configuration file and dumps it to stdout.
dump-md { device}
Dumps the metadata of a device in text form, including the bitmap and the activity log.
get-gi { peer_device}
Shows the generation identifiers (GIs) of a device on a particular connection. Uses bsrsetup for attached devices and bsrmeta for unattached devices.
...
Shows all commands which are not explicitly documented.
invalidate { peer_device}
Resynchronizes the local data of a device from a peer's data.
invalidate-remote { peer_device}
Resynchronizes the data of a peer device from the local node's device.
net-options { connection}
Changes the net options of an existing connection.
new-current-uuid { device}
Generates a new current UUID.
outdate { device}
Marks the data of a lower-level device as outdated.
pause-sync { peer_device}
Stops resynchronization between a local device and a peer device by setting the local pause flag.
primary { resource}
Changes the role of a node in a resource to primary.
resize { device}
Adjusts the size of the lower-level devices of a replicated device on all nodes. This combines the check-resize and resize lower-level commands.
resource-options { resource}
Changes the resource options of an existing resource.
resume-sync { peer_device}
Allows resynchronization to resume by clearing the local sync pause flag.
role { resource}
Shows the current role of a resource.
secondary { resource}
Changes (demotes) the role of a resource to Secondary. On Linux, this command fails if the replicated device of the resource is in use; on Windows, the resource is demoted regardless of whether the device is in use.
show-gi { peer_device}
Shows the data generation identifiers of a device on a particular connection, together with explanatory text.
up { resource}
Takes a resource up by performing the following steps (see the sketch after this command list):
Apply the activity log of all volumes: bsrmeta apply-al
Create the resource: bsrsetup new-resource
Create the replicated devices: bsrsetup new-device, bsrsetup new-minor
Attach the lower-level devices: bsrsetup attach
Connect to all peers: bsrsetup connect
verify { peer_device}
Starts or stops online verification, or designates a specific part of the device to be verified.
wait-connect {[ device] | [connection] | [resource]}
Waits until a device on a peer, all devices over a connection, or all devices on all peers are visible.
wait-sync {[ device] | [connection] | [resource]}
Waits until a device is connected and any pending resync operation has finished. Also available at the connection and resource level.
wipe-md { device}
Wipes the bsr metadata of a device.
forget-peer { connection}
Completely removes any reference to an unconnected peer from the metadata.
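A minimal command-level sketch of the "up" steps listed above, assuming a resource named r0 with minor 0, internal metadata on /dev/sdb1, and a single peer with node-id 1 (all names, numbers, and addresses are illustrative; the new-resource/new-minor steps are omitted because their exact argument forms are not shown on this page):

  bsrsetup attach 0 /dev/sdb1 /dev/sdb1 internal         # attach the lower-level device
  bsrsetup new-peer r0 1                                 # create the peer object first
  bsrsetup new-path r0 1 10.1.1.10:7789 10.1.1.11:7789   # add at least one network path
  bsrsetup connect r0 1                                  # then activate the connection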
...
Both commands refer to the replicated device by its minor number. lower_dev is the name of the lower-level device. meta_data_dev is the name of the device containing the metadata; it may be the same as lower_dev. meta_data_index is either a numeric metadata index, the keyword internal for internal metadata, or the keyword flexible for variable-size external metadata. Available options:
--al-extents extents
bsr automatically maintains a "hot" or "active" disk area likely to be written to again soon, based on recent write activity. The "active" disk area can be written to immediately, while "inactive" disk areas must first be "activated", which requires a meta-data write. We also refer to this active disk area as the "activity log". The activity log saves meta-data writes, but the whole log must be resynced upon recovery of a failed node. The size of the activity log is therefore a major factor in how long a resync takes after a primary crash and how fast a replicated disk becomes consistent again.
The activity log consists of a number of 4-MiB segments; the al-extents parameter determines how many of those segments can be active at the same time. The default value for al-extents is 6001, with a minimum of 7 and a maximum of 65536. Note that the effective maximum may be smaller, depending on how you created the device metadata; see also bsrmeta(8). The effective maximum is 919 * (available on-disk activity-log ring-buffer area / 4KiB - 1); with the default 32KiB ring buffer this yields a maximum of 6433 (covering more than 25 GiB of data). We recommend keeping this well within the amount your backend storage and replication link can resync in about five minutes. Changing al-extents requires the resource to be stopped (down).
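As a worked example of the effective-maximum formula above, plus a hedged invocation (the minor number 0 is illustrative):

  # Effective maximum for the default 32KiB on-disk ring buffer:
  #   919 * (32768 / 4096 - 1) = 919 * 7 = 6433 extents (> 25 GiB of data)
  # Illustrative invocation; remember that changing al-extents requires
  # the resource to be down first:
  bsrsetup disk-options 0 --al-extents=6001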
--al-updates {yes | no}
With this parameter, the activity log can be turned off entirely (see the al-extents parameter). This speeds up writes because fewer meta-data writes are necessary, but the entire device needs to be resynchronized upon recovery of a failed primary node. The default value for al-updates is yes.
--disk-barrier, --disk-flushes, --disk-drain bsr has three methods of handling the ordering of dependent write requests:
disk-barrier
Use disk barriers to make sure that requests are written to disk in the right order. Barriers ensure that all requests submitted before a barrier make it to the disk before any requests submitted after the barrier. This is implemented using 'tagged command queuing' on SCSI devices and 'native command queuing' on SATA devices. Only some devices and device stacks support this method; the device mapper (LVM) supports barriers only in some configurations. Note that on systems which do not support disk barriers, enabling this option can lead to data loss or corruption. Until bsr 8.4.1, disk-barrier was turned on if the I/O stack below bsr supported barriers, but kernels since linux-2.6.36 (or 2.6.32 RHEL6) no longer allow detecting whether barriers are supported. Since bsr-8.4.2, this option is off by default and needs to be enabled explicitly.
disk-flushes
Force a disk flush after write I/O so that all data is committed to the disk. Flush implementations can differ between platforms and drive vendors. Historically this was a cache-bypassing technique referred to as 'force unit access' by drive vendors; current implementations instead guarantee the write by emptying the disk cache. This option is enabled by default.
disk-drain
Wait for the request queue to "drain" (that is, wait for the requests to finish) before submitting a dependent write request. This method requires that requests are stable on disk when they finish. Before bsr 8.0.9, this was the only method implemented. It used to be enabled by default, but it is no longer the default.
From these three methods, bsr will use the first that is enabled and supported by the backing storage device. If all three of these options are turned off, bsr will submit write requests without bothering about dependencies. Depending on the I/O stack, write requests can be reordered, and they can be submitted in a different order on different cluster nodes. This can result in data loss or corruption. Therefore, turning off all three methods of controlling write ordering is strongly discouraged.
A general guideline for configuring write ordering is to use disk barriers or disk flushes when using ordinary disks (or an ordinary disk array) with a volatile write cache. On storage without cache or with a battery backed write cache, disk draining can be a reasonable choice.
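A hedged example of applying this guideline on an attached device (the minor number 0 is illustrative, and the key=yes/no boolean syntax is an assumption; the flags are the disk options documented above):

  # Volatile write cache: rely on flushes, leave barriers and draining off.
  bsrsetup disk-options 0 --disk-flushes=yes --disk-barrier=no --disk-drain=no
  # Battery-backed or cache-less storage: draining can be a reasonable choice.
  bsrsetup disk-options 0 --disk-flushes=no --disk-barrier=no --disk-drain=yes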
--disk-timeout If the lower-level device on which a bsr device stores its data does not finish an I/O request within the defined disk-timeout, bsr treats this as a failure. The lower-level device is detached, and the device's disk state advances to Diskless. If bsr is connected to one or more peers, the failed request is passed on to one of them. This option is dangerous and may lead to kernel panic!
"Aborting" requests, or force-detaching the disk, is intended for completely blocked/hung local backing devices which no longer complete requests at all, not even with error completions. In this situation, usually a hard-reset and failover is the only way out. By "aborting", basically faking a local error completion, we allow for a more graceful switchover by cleanly migrating services. Still, the affected node has to be rebooted "soon". By completing these requests, we allow the upper layers to re-use the associated data pages. If the local backing device later "recovers", and now DMAs some data from disk into the original request pages, in the best case it will just put random data into unused pages; but typically it will corrupt meanwhile completely unrelated data, causing all sorts of damage. This means that delayed successful completion, especially for READ requests, is a reason to panic(). We assume that a delayed *error* completion is OK, though we still complain noisily about it.
The default value of disk-timeout is 0, which stands for an infinite timeout. Timeouts are specified in units of 0.1 seconds. This option is available since bsr 8.3.12.
--md-flushes Enable disk flushes and disk barriers on the meta-data device. This option is enabled by default. See the disk-flushes parameter.
--on-io-error handler Configure how bsr reacts to I/O errors on a lower-level device. The following policies are defined:
passthrough Mark the failed blocks as out-of-sync (OOS) in the bitmap and pass the error on to the upper layers. The upper layers will usually retry the I/O; if a retry succeeds, the OOS records are resolved naturally, otherwise they remain. This is the default for bsr.
call-local-io-error Call the local-io-error handler (see the handlers section).
detach Detach the lower-level device and continue in diskless mode. No I/O can be performed in the diskless state, and an immediate failover is required.
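For instance, to switch the policy away from the default (the minor number 0 is illustrative):

  bsrsetup disk-options 0 --on-io-error=detach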
--read-balancing policy Distribute read requests among cluster nodes as defined by policy. The supported policies are prefer-local (the default), prefer-remote, round-robin, least-pending, when-congested-remote, 32K-striping, 64K-striping, 128K-striping, 256K-striping, 512K-striping and 1M-striping. This option is available since bsr 8.4.1.
--resync-after minor Define that a device should only resynchronize after the specified other device. By default, no order between devices is defined, and all devices resynchronize in parallel. Depending on the configuration of the lower-level devices and the available network and disk bandwidth, this can slow down the overall resync process. This option can be used to form a chain or tree of dependencies among devices.
--size size Specify the size of the lower-level device explicitly instead of determining it automatically. The device size must be determined once and is remembered for the lifetime of the device. In order to determine it automatically, all the lower-level devices on all nodes must be attached, and all nodes must be connected. If the size is specified explicitly, this is not necessary. The size value is assumed to be in units of sectors (512 bytes) by default.
--discard-zeroes-if-aligned {yes | no} There are several aspects to discard/trim/unmap support on linux block devices. Even if discard is supported in general, it may fail silently, or may partially ignore discard requests. Devices also announce whether reading from unmapped blocks returns defined data (usually zeroes), or undefined data (possibly old data, possibly garbage).
If, on different nodes, bsr is backed by devices with differing discard characteristics, discards may lead to data divergence (old data or garbage left over on one backend, zeroes due to unmapped areas on the other backend). Online verify would then potentially report tons of spurious differences. While probably harmless for most use cases (fstrim on a file system), bsr cannot have that. To play safe, we have to disable discard support if our local backend (on a Primary) does not support "discard_zeroes_data=true". We also have to translate discards to explicit zero-out on the receiving side, unless the receiving side (Secondary) supports "discard_zeroes_data=true", thereby allocating areas that were supposed to be unmapped.
There are some devices (notably the LVM/DM thin provisioning) that are capable of discard, but announce discard_zeroes_data=false. In the case of DM-thin, discards aligned to the chunk size will be unmapped, and reading from unmapped sectors will return zeroes. However, unaligned partial head or tail areas of discard requests will be silently ignored. If we now add a helper to explicitly zero-out these unaligned partial areas, while passing on the discard of the aligned full chunks, we effectively achieve discard_zeroes_data=true on such devices.
Setting discard-zeroes-if-aligned to yes will allow bsr to use discards, and to announce discard_zeroes_data=true, even on backends that announce discard_zeroes_data=false. Setting discard-zeroes-if-aligned to no will cause bsr to always fall back to zero-out on the receiving side, and to not even announce discard capabilities on the Primary, if the respective backend announces discard_zeroes_data=false. We used to ignore the discard_zeroes_data setting completely. To not break established and expected behaviour, and to avoid suddenly causing fstrim on thin-provisioned LVs to run out of space instead of freeing up space, the default value is yes. This option is available since 8.4.7.
--rs-discard-granularity bytes When rs-discard-granularity is set to a non-zero, positive value, bsr tries to perform resync operations in requests of this size. If such a block contains only zero bytes on the sync source node, the sync target node will issue a discard/trim/unmap command for the area. The value is constrained by the discard granularity of the backing block device. If rs-discard-granularity is not a multiple of the discard granularity of the backing block device, bsr rounds it up. The feature only becomes active if the backing block device reads back zeroes after a discard command. The default value is 0. This option is available since 8.4.7.
bsrsetup peer-device-options resource peer_node_id volume
These are options that affect the peer's device.
--c-delay-target delay_target,
--c-fill-target fill_target,
--c-max-rate max_rate,
--c-plan-ahead plan_time Dynamically control the resync speed. This mechanism is enabled by setting the c-plan-ahead parameter to a positive value. The goal is to either fill the buffers along the data path with a defined amount of data if c-fill-target is defined, or to have a defined delay along the path if c-delay-target is defined. The maximum bandwidth is limited by the c-max-rate parameter. The c-plan-ahead parameter defines how fast bsr adapts to changes in the resync speed. It should be set to five times the network round-trip time or more. Common values for c-fill-target for "normal" data paths range from 4K to 100K. If a proxy (drx) is used, it is advised to use c-delay-target instead of c-fill-target. The c-delay-target parameter is used if the c-fill-target parameter is undefined or set to 0. The c-delay-target parameter should be set to five times the network round-trip time or more. The c-max-rate option should be set to either the bandwidth available between the bsr hosts and the machines hosting the proxy, or to the available disk bandwidth. The default values of these parameters are: c-plan-ahead = 20 (in units of 0.1 seconds), c-fill-target = 0 (in units of sectors), c-delay-target = 1 (in units of 0.1 seconds), and c-max-rate = 102400 (in units of KiB/s). Dynamic resync speed control is available since bsr 8.3.9.
--c-min-rate min_rate A node which is primary and sync-source has to schedule application I/O requests and resync I/O requests. The c-min-rate parameter limits how much bandwidth is available for resync I/O; the remaining bandwidth is used for application I/O. A c-min-rate value of 0 means that there is no limit on the resync I/O bandwidth, which can slow down application I/O significantly. Use a value of 1 (1 KiB/s) for the lowest possible resync rate. The default value of c-min-rate is 250, in units of KiB/s.
--resync-rate rate Define how much bandwidth bsr may use for resynchronizing. bsr allows "normal" application I/O even during a resync. If the resync takes up too much bandwidth, application I/O can become very slow. This parameter allows you to avoid that. Please note that this option only works when the dynamic resync controller is disabled.
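A hedged tuning sketch for the dynamic resync controller described above, using the documented peer-device-options form (resource r0, peer node-id 1, and volume 0 are illustrative; the values are the defaults except for c-delay-target and c-max-rate):

  # Enable the dynamic controller and cap resync at ~400 MiB/s, while
  # c-min-rate keeps resync I/O from starving application I/O entirely.
  bsrsetup peer-device-options r0 1 0 \
      --c-plan-ahead=20 --c-fill-target=0 --c-delay-target=10 \
      --c-max-rate=409600 --c-min-rate=250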
bsrsetup check-resize minor
Remember the current size of the lower-level device of the specified replicated device. Used by bsradm. The size information is stored in the file /var/lib/bsr/bsr-minor-minor.lkbd.
bsrsetup new-peer resource peer_node_id,
bsrsetup net-options resource peer_node_id
The new-peer command creates a connection within a resource. The resource must have been created with bsrsetup new-resource. The net-options command changes the network options of an existing connection. Before a connection can be activated with the connect command, at least one path needs to be added with the new-path command. Available options:
--after-sb-0pri policy Define how to react if a split-brain scenario is detected and none of the two nodes is in primary role. (We detect split-brain scenarios when two nodes connect; split-brain decisions are always between two nodes.) The defined policies are:
disconnect No automatic resynchronization; simply disconnect.
discard-younger-primary,
discard-older-primary Resynchronize from the node which became primary first (discard-younger-primary) or last (discard-older-primary). If both nodes became primary independently, the discard-least-changes policy is used.
discard-zero-changes If only one of the nodes wrote data since the split brain situation was detected, resynchronize from this node to the other. If both nodes wrote data, disconnect.
discard-least-changes Resynchronize from the node with more modified blocks.
discard-node-nodename Always resynchronize to the named node.
--after-sb-1pri policy Define how to react if a split-brain scenario is detected, with one node in primary role and one node in secondary role. (We detect split-brain scenarios when two nodes connect, so split-brain decisions are always among two nodes.) The defined policies are:
disconnect No automatic resynchronization, simply disconnect.
consensus Discard the data on the secondary node if the after-sb-0pri algorithm would also discard the data on the secondary node. Otherwise, disconnect.
violently-as0p Always take the decision of the after-sb-0pri algorithm, even if it causes an erratic change of the primary's view of the data. This is only useful if a single-node file system (i.e., not OCFS2 or GFS) with the allow-two-primaries flag is used. This option can cause the primary node to crash, and should not be used.
discard-secondary Discard the data on the secondary node.
call-pri-lost-after-sb Always take the decision of the after-sb-0pri algorithm. If the decision is to discard the data on the primary node, call the pri-lost-after-sb handler on the primary node.
...
--after-sb-2pri policy Define how to react if a split-brain scenario is detected and both nodes are in primary role. (We detect split-brain scenarios when two nodes connect, so split-brain decisions are always among two nodes.) The defined policies are:
disconnect No automatic resynchronization; simply disconnect.
violently-as0p See the violently-as0p policy for after-sb-1pri.
call-pri-lost-after-sb Call the pri-lost-after-sb helper program on one of the machines unless that machine can demote to secondary. The helper program is expected to reboot the machine, which brings the node into a secondary role. Which machine runs the helper program is determined by the after-sb-0pri strategy.
--allow-two-primaries The most common way to configure bsr devices is to allow only one node to be primary (and thus writable) at a time. In some scenarios it is preferable to allow two nodes to be primary at once; a mechanism outside of bsr then must make sure that writes to the shared, replicated device happen in a coordinated way. This can be done with a shared-storage cluster file system like OCFS2 and GFS, or with virtual machine images and a virtual machine manager that can migrate virtual machines between physical machines. The allow-two-primaries parameter tells bsr to allow two nodes to be primary at the same time. Never enable this option when using a non-distributed file system; otherwise, data corruption and node crashes will result!
--always-asbp Normally the automatic after-split-brain policies are only used if current states of the UUIDs do not indicate the presence of a third node. With this option you request that the automatic after-split-brain policies are used as long as the data sets of the nodes are somehow related. This might cause a full sync, if the UUIDs indicate the presence of a third node. (Or double faults led to strange UUID sets.)
--connect-int time As soon as a connection between two nodes is configured with bsrsetup connect, bsr immediately tries to establish the connection. If this fails, bsr waits for connect-int seconds and then repeats. The default value of connect-int is 10 seconds.
--cram-hmac-alg hash-algorithm Configure the hash-based message authentication code (HMAC) or secure hash algorithm to use for peer authentication. The kernel supports a number of different algorithms, some of which may be loadable as kernel modules. See the shash algorithms listed in /proc/crypto. By default, cram-hmac-alg is unset. Peer authentication also requires a shared-secret to be configured.
--csums-alg hash-algorithm Normally, when two nodes resynchronize, the sync target requests a piece of out-of-sync data from the sync source, and the sync source sends the data. With many usage patterns, a significant number of those blocks will actually be identical. When a csums-alg algorithm is specified, when requesting a piece of out-of-sync data, the sync target also sends along a hash of the data it currently has. The sync source compares this hash with its own version of the data. It sends the sync target the new data if the hashes differ, and tells it that the data are the same otherwise. This reduces the network bandwidth required, at the cost of higher cpu utilization and possibly increased I/O on the sync target. The csums-alg can be set to one of the secure hash algorithms supported by the kernel; see the shash algorithms listed in /proc/crypto. By default, csums-alg is unset.
--csums-after-crash-only Enabling this option (and csums-alg, above) makes it possible to use the checksum-based resync only for the first resync after a primary crash, but not for later "network hiccups". In most cases, blocks that are marked as need-to-be-resynced are in fact changed, so calculating checksums, and both reading and writing the blocks on the resync target, is all effective overhead. The advantage of checksum-based resync is mostly after primary crash recovery, where the recovery marked larger areas (those covered by the activity log) as need-to-be-resynced, just in case. Introduced in 8.4.5.
--data-integrity-alg alg bsr normally relies on the data integrity checks built into the TCP/IP protocol, but if a data integrity algorithm is configured, it will additionally use this algorithm to make sure that the data received over the network match what the sender has sent. If a data integrity error is detected, bsr will close the network connection and reconnect, which will trigger a resync. The data-integrity-alg can be set to one of the secure hash algorithms supported by the kernel; see the shash algorithms listed in /proc/crypto. By default, this mechanism is turned off. Because of the CPU overhead involved, we recommend not to use this option in production environments. Also see the notes on data integrity below.
--fencing fencing_policy Fencing is a preventive measure to avoid situations where both nodes are primary and disconnected. This is also known as a split-brain situation. bsr supports the following fencing policies:
dont-care No fencing actions are taken. This is the default policy.
resource-only If a node becomes a disconnected primary, it tries to fence the peer. This is done by calling the fence-peer handler. The handler is supposed to reach the peer over an alternative communication path and call 'bsradm outdate minor' there.
resource-and-stonith If a node becomes a disconnected primary, it freezes all its IO operations and calls its fence-peer handler. The fence-peer handler is supposed to reach the peer over an alternative communication path and call 'bsradm outdate minor' there. In case it cannot do that, it should stonith the peer. IO is resumed as soon as the situation is resolved. In case the fence-peer handler fails, I/O can be resumed manually with 'bsradm resume-io'.
--ko-count number If a secondary node fails to complete a write request in ko-count times the timeout parameter, it is excluded from the cluster. The primary node then sets the connection to this secondary node to Standalone. To disable this feature, you should explicitly set it to 0; defaults may change between versions.
--max-buffers number Limits the memory usage per bsr minor device on the receiving side, or for internal buffers during resync or online-verify. Unit is PAGE_SIZE, which is 4 KiB on most systems. The minimum possible setting is hard coded to 32 (=128 KiB). These buffers are used to hold data blocks while they are written to/read from disk. To avoid possible distributed deadlocks on congestion, this setting is used as a throttle threshold rather than a hard limit. Once more than max-buffers pages are in use, further allocation from this pool is throttled. You want to increase max-buffers if you cannot saturate the IO backend on the receiving side.
--max-epoch-size number Define the maximum number of write requests bsr may issue before issuing a write barrier. The default value is 2048, with a minimum of 1 and a maximum of 20000. Setting this parameter to a value below 10 is likely to decrease performance.
--on-congestion policy,
--congestion-fill threshold,
--congestion-extents threshold By default, bsr blocks when the TCP send queue is full. This prevents applications from generating further write requests until more buffer space becomes available again. When bsr is used together with bsr-proxy, it can be better to use the pull-ahead on-congestion policy, which can switch bsr into ahead/behind mode before the send queue is full. bsr then records the differences between itself and the peer in its bitmap, but it no longer replicates them to the peer. When enough buffer space becomes available again, the node resynchronizes with the peer and switches back to normal replication. This has the advantage of not blocking application I/O even when the queues fill up, and the disadvantage that peer nodes can fall behind much further. Also, while resynchronizing, peer nodes will become inconsistent. The available congestion policies are block (the default) and pull-ahead. The congestion-fill parameter defines how much data is allowed to be "in flight" in this connection. The default value is 0, which disables this mechanism of congestion control, with a maximum of 10 GiBytes. The congestion-extents parameter defines how many bitmap extents may be active before switching into ahead/behind mode, with the same default and limits as the al-extents parameter. The congestion-extents parameter is effective only when set to a value smaller than al-extents. Ahead/behind mode is available since bsr 8.3.10.
--ping-int interval When the TCP/IP connection to a peer is idle for more than ping-int seconds, bsr will send a keep-alive packet to make sure that a failed peer or network connection is detected reasonably soon. The default value is 10 seconds, with a minimum of 1 and a maximum of 120 seconds. The unit is seconds.
--ping-timeout timeout Define the timeout for replies to keep-alive packets. If the peer does not reply within ping-timeout, bsr will close and try to reestablish the connection. The default value is 0.5 seconds, with a minimum of 0.1 seconds and a maximum of 3 seconds. The unit is tenths of a second.
--socket-check-timeout timeout In setups involving a bsr-proxy and connections that experience a lot of buffer-bloat, it might be necessary to set ping-timeout to an unusually high value. By default, bsr uses the same value to wait to see if a newly established TCP connection is stable. Since the bsr-proxy is usually located in the same data center, such a long wait time may hinder bsr's connect process. In such setups, socket-check-timeout should be set to at least the round-trip time between bsr and bsr-proxy, i.e. in most cases to 1. The default unit is tenths of a second; the default value is 0 (which causes bsr to use the value of ping-timeout instead). Introduced in 8.4.5.
--protocol name Use the specified protocol on this connection. The supported protocols are:
A Writes to the bsr device complete as soon as they have reached the local disk and the TCP/IP send buffer.
B Writes to the bsr device complete as soon as they have reached the local disk, and all peers have acknowledged the receipt of the write requests.
C Writes to the bsr device complete as soon as they have reached the local and all remote disks.
--rcvbuf-size size Configure the size of the TCP/IP receive buffer. A value of 0 (the default) causes the buffer size to adjust dynamically. This parameter usually does not need to be set, but it can be set to a value up to 10 MiB. The default unit is bytes.
--rr-conflict policy This option helps to solve the cases when the outcome of the resync decision is incompatible with the current role assignment in the cluster. The defined policies are:
disconnect No automatic resynchronization, simply disconnect.
violently Resync to the primary node is allowed, violating the assumption that data on a block device are stable for one of the nodes. Do not use this option, it is dangerous.
call-pri-lost Call the pri-lost handler on one of the machines. The handler is expected to reboot the machine, which puts it into secondary role.
...
--sndbuf-size size Configure the size of the TCP/IP send buffer. Since bsr 8.0.13 / 8.2.7, a value of 0 (the default) causes the buffer size to adjust dynamically. Values below 32 KiB are harmful to the throughput on this connection. Large buffer sizes can be useful especially when protocol A is used over high-latency networks; the maximum value supported is 10 MiB.
--tcp-cork By default, bsr uses the TCP_CORK socket option to prevent the kernel from sending partial messages; this results in fewer and bigger packets on the network. Some network stacks can perform worse with this optimization. On these, the tcp-cork parameter can be used to turn this optimization off.
--timeout time Define the timeout for replies over the network: if a peer node does not send an expected reply within the specified timeout, it is considered dead and the TCP/IP connection is closed. The timeout value must be lower than connect-int and lower than ping-int. The default is 6 seconds; the value is specified in tenths of a second.
--use-rle Each replicated device on a cluster node has a separate bitmap for each of its peer devices. The bitmaps are used for tracking the differences between the local and peer device: depending on the cluster state, a disk range can be marked as different from the peer in the device's bitmap, in the peer device's bitmap, or in both bitmaps. When two cluster nodes connect, they exchange each other's bitmaps, and they each compute the union of the local and peer bitmap to determine the overall differences. Bitmaps of very large devices are also relatively large, but they usually compress very well using run-length encoding. This can save time and bandwidth for the bitmap transfers. The use-rle parameter determines if run-length encoding should be used. It is on by default since bsr 8.4.0.
--verify-alg hash-algorithm Online verification (bsradm verify) computes and compares checksums of disk blocks (i.e., hash values) in order to detect if they differ. The verify-alg parameter determines which algorithm to use for these checksums. It must be set to one of the secure hash algorithms supported by the kernel before online verify can be used; see the shash algorithms listed in /proc/crypto. We recommend to schedule online verifications regularly during low-load periods, for example once a month. Also see the notes on data integrity below.
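As a combined example, a conservative automatic split-brain policy plus checksum-based resync could be configured with net-options (resource r0 and peer node-id 1 are illustrative; the chosen algorithm must appear among the shash algorithms in /proc/crypto):

  bsrsetup net-options r0 1 \
      --after-sb-0pri=discard-zero-changes \
      --after-sb-1pri=consensus \
      --after-sb-2pri=disconnect \
      --csums-alg=md5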
bsrsetup new-path resource peer_node_id local-addr remote-addr
...
The connect command activates a connection. That means that the bsr driver will bind and listen on all local addresses of the connection's paths. It will begin to try to establish one or more paths of the connection. Available options:
--tentative Only determine if a connection to the peer can be established and if a resync is necessary (and in which direction) without actually establishing the connection or starting the resync. Check the system log to see what bsr would do without the --tentative option.
--discard-my-data Discard the local data and resynchronize with the peer that has the most up-to-date data. Use this option to manually recover from a split-brain situation.
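A hedged split-brain recovery sequence based on the options above, run on the node whose data should be discarded (resource r0 and peer node-id 1 are illustrative):

  # First check what bsr would decide, without actually connecting:
  bsrsetup connect r0 1 --tentative
  # Then connect for real, discarding the local modifications:
  bsrsetup connect r0 1 --discard-my-data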
bsrsetup del-peer resource peer_node_id
...
Show the current state of all configured bsr objects, followed by all changes to the state. The output format is meant to be human as well as machine readable. The line starts with a word that indicates the kind of event: exists for an existing object; create, destroy, and change if an object is created, destroyed, or changed; or call or response if an event handler is called or it returns. The second word indicates the object the event applies to: resource, device, connection, peer-device, helper, or a dash (-) to indicate that the current state has been dumped completely. The remaining words identify the object and describe the state that the object is in. Available options:
--now Terminate after reporting the current state. The default is to continuously listen and report state changes.
--statistics Include statistics in the output.
bsrsetup get-gi resource peer_node_id volume
...
The new-resource command creates a new resource. The resource-options command changes the resource options of an existing resource. Available options:
--auto-promote bool-value A resource must be promoted to primary role before any of its devices can be mounted or opened for writing. Before bsr 9, this could only be done explicitly ("bsradm primary"). Since bsr 9, the auto-promote parameter makes it possible to automatically promote a resource to primary role when one of its devices is mounted or opened for writing. As soon as all devices are unmounted or closed with no more remaining users, the role of the resource changes back to secondary. Automatic promotion only succeeds if the cluster state allows it (that is, if an explicit bsradm primary command would succeed). Otherwise, mounting or opening the device fails as it already did before bsr 9: the mount(2) system call fails with errno set to EROFS (Read-only file system); the open(2) system call fails with errno set to EMEDIUMTYPE (wrong medium type). Irrespective of the auto-promote parameter, if a device is promoted explicitly (bsradm primary), it also needs to be demoted explicitly (bsradm secondary). The auto-promote parameter is available since bsr 9.0.0, and defaults to yes.
--cpu-mask cpu-mask Set the cpu affinity mask for bsr kernel threads. The cpu mask is specified as a hexadecimal number. The default value is 0, which lets the scheduler decide which kernel threads run on which CPUs. CPU numbers in cpu-mask which do not exist in the system are ignored.
--on-no-data-accessible policy Determine how to deal with I/O requests when the requested data is not available locally or remotely (for example, when all disks have failed). The defined policies are:
io-error System calls fail with errno set to EIO.
suspend-io The resource suspends I/O. I/O can be resumed by (re)attaching the lower-level device, by connecting to a peer which has access to the data, or by forcing bsr to resume I/O with bsradm resume-io res. When no data is available, forcing I/O to resume will result in the same behavior as the io-error policy. This setting is available since bsr 8.3.9; the default policy is io-error.
--peer-ack-window value On each node and for each device, bsr maintains a bitmap of the differences between the local and remote data for each peer device. For example, in a three-node setup (nodes A, B, C) each with a single device, every node maintains one bitmap for each of its peers. When nodes receive write requests, they know how to update the bitmaps for the writing node, but not how to update the bitmaps between themselves. In this example, when a write request propagates from node A to B and C, nodes B and C know that they have the same data as node A, but not whether or not they both have the same data. As a remedy, the writing node occasionally sends peer-ack packets to its peers which tell them which state they are in relative to each other. The peer-ack-window parameter specifies how much data a primary node may send before sending a peer-ack packet. A low value causes increased network traffic; a high value causes less network traffic but higher memory consumption on secondary nodes and higher resync times between the secondary nodes after primary node failures. (Note: peer-ack packets may be sent due to other reasons as well, e.g. membership changes or expiry of the peer-ack-delay timer.) The default value for peer-ack-window is 2 MiB, the default unit is sectors. This option is available since 9.0.0.
--peer-ack-delay expiry-time If after the last finished write request no new write request gets issued for expiry-time, then a peer-ack packet is sent. If a new write request is issued before the timer expires, the timer gets reset to expiry-time. (Note: peer-ack packets may be sent due to other reasons as well, e.g. membership changes or the peer-ack-window option.) This parameter may influence resync behavior on remote nodes: peer nodes need to wait until they receive a peer-ack before releasing a lock on an AL-extent, and resync operations between peers may need to wait for these locks. The default value for peer-ack-delay is 100 milliseconds; the default unit is milliseconds. This option is available since 9.0.0.
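An illustrative resource-options invocation combining the parameters above (the resource name r0 is an assumption; the hexadecimal mask 0x3 pins bsr kernel threads to CPUs 0 and 1):

  bsrsetup resource-options r0 \
      --auto-promote=yes --cpu-mask=3 --on-no-data-accessible=suspend-io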
bsrsetup outdate minor
Mark the data on a lower-level device as outdated. This is used for fencing, and prevents the resource the device is part of from becoming primary in the future. See the --fencing disk option.
...
Change the role of a node in a resource to primary. This allows the replicated devices in this resource to be mounted or opened for writing. Available options:
--overwrite-data-of-peer This option is an alias for the --force option.
--force Force the resource to become primary even if some devices are not guaranteed to have up-to-date data. This option is used to turn one of the nodes in a newly created cluster into the primary node, or when manually recovering from a disaster. Note that this can lead to split-brain scenarios. Also, when forcefully turning an inconsistent device into an up-to-date device, it is highly recommended to use any integrity checks available (such as a filesystem check) to make sure that the device can at least be used without crashing the system. Note that bsr usually only allows one node in a cluster to be in primary role at any time; this allows bsr to coordinate access to the devices in a resource across nodes. The --allow-two-primaries network option changes this; in that case, a mechanism outside of bsr needs to coordinate device access.
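For example, promoting the first node of a newly created cluster whose devices are still Inconsistent (resource name r0 is illustrative):

  bsrsetup primary r0 --force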
bsrsetup resize minor
Reexamine the size of the lower-level devices of a replicated device on all nodes. This command is called after the lower-level devices on all nodes have been grown to adjust the size of the replicated device. Available options:
--assume-peer-has-space Resize the device even if some of the peer devices are not connected at the moment. bsr will try to resize the peer devices when they next connect. It will refuse to connect to a peer device which is too small.
--assume-clean Do not resynchronize the added disk space; instead, assume that it is identical on all nodes. This option can be used when the disk space is uninitialized and differences do not matter, or when it is known to be identical on all nodes. See the bsrsetup verify command.
--size val This option can be used to online shrink the usable size of a bsr device. It is the user's responsibility to make sure that a file system on the device is not truncated by that operation.
--al-stripes val These options may be used to change the layout of the activity log online. In case of internal metadata this may involve shrinking the user-visible size at the same time (using --size) or increasing the available space on the backing devices.
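A hedged example of growing a replicated device after the backing devices were enlarged on all nodes (the minor number 0 is illustrative; use --assume-clean only if the added area is known to be identical):

  bsrsetup resize 0 --assume-clean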
bsrsetup resume-io minor
Resume I/O on a replicated device. See the --fencing net option.
...
Show the current configuration of a resource, or of all resources. Available options:
--show-defaults Show all configuration parameters, even the ones with default values. Normally, parameters with default values are not shown.
bsrsetup show-gi resource peer_node_id volume
...
Show the status of a resource, or of all resources. The output consists of one paragraph for each configured resource. Each paragraph contains one line for each resource, followed by one line for each device, and one line for each connection. The device and connection lines are indented. The connection lines are followed by one line for each peer device; these lines are indented against the connection line. Long lines are wrapped around at terminal width, and indented to indicate how the lines belong together. Available options:
--verbose Include more information in the output even when it is likely redundant or irrelevant.
--statistics Include data transfer statistics in the output.
--color={always | auto | never} Colorize the output. With --color=auto, bsrsetup emits color codes only when standard output is connected to a terminal.
For example, the non-verbose output for a resource with only one connection and only one volume could look like this:
...
Start online verification, change which part of the device will be verified, or stop online verification. The command requires the specified peer to be connected. Online verification compares each disk block on the local and peer node. Blocks which differ between the nodes are marked as out-of-sync, but they are not automatically brought back into sync. To bring them into sync, the resource must be disconnected and reconnected. Progress can be monitored in the output of bsrsetup status --statistics. Available options:
--start position Define where online verification should start. This parameter is ignored if online verification is already in progress. If the start parameter is not specified, online verification will continue where it was interrupted (if the connection to the peer was lost while verifying), after the previous stop sector (if the previous online verification has finished), or at the beginning of the device (if the end of the device was reached, or online verify has not run before). The position on disk is specified in disk sectors (512 bytes) by default.
--stop position Define where online verification should stop. If online verification is already in progress, the stop position of the active online verification process is changed. Use this to stop online verification. The position on disk is specified in disk sectors (512 bytes) by default. Also see the notes on data integrity in the bsr.conf(5) manual page.
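A hedged example: verify roughly the first 1 GiB of the device and then watch the result counters (resource r0, peer node-id 1, and volume 0 are illustrative; positions are in 512-byte sectors, and 2097152 sectors * 512 bytes = 1 GiB):

  bsrsetup verify r0 1 0 --start=0 --stop=2097152
  bsrsetup status r0 --statistics    # out-of-sync blocks show up here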
bsrsetup wait-connect-volume resource peer_node_id volume,
...
The wait-connect-* commands wait until a device on a peer is visible. The wait-sync-* commands wait until a device on a peer is up to date. Available options for both commands:
--degr-wfc-timeout timeout Define how long to wait until all peers are connected in case the cluster consisted of a single node only when the system went down. This parameter is usually set to a value smaller than wfc-timeout. The assumption here is that peers which were unreachable before a reboot are less likely to be reachable after the reboot, so waiting is less likely to help. The timeout is specified in seconds. The default value is 0, which stands for an infinite timeout. Also see the wfc-timeout parameter.
--outdated-wfc-timeout timeout Define how long to wait until all peers are connected if all peers were outdated when the system went down. This parameter is usually set to a value smaller than wfc-timeout. The assumption here is that an outdated peer cannot have become primary in the meantime, so we don't need to wait for it as long as for a node which was alive before. The timeout is specified in seconds. The default value is 0, which stands for an infinite timeout. Also see the wfc-timeout parameter.
--wait-after-sb This parameter causes bsr to continue waiting in the init script even when a split-brain situation has been detected, and the nodes therefore refuse to connect to each other.
--wfc-timeout timeout Define how long the init script waits until all peers are connected. This can be useful in combination with a cluster manager which cannot manage bsr resources: when the cluster manager starts, the bsr resources will already be up and running. With a more capable cluster manager such as Pacemaker, it makes more sense to let the cluster manager control bsr resources. The timeout is specified in seconds. The default value is 0, which stands for an infinite timeout. Also see the degr-wfc-timeout parameter.
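An illustrative init-time wait on a single peer device, combining the timeouts above (resource r0, peer node-id 1, and volume 0 are assumptions):

  # Wait at most 60 seconds for the peer; 20 seconds if the peer was outdated.
  bsrsetup wait-connect-volume r0 1 0 --wfc-timeout=60 --outdated-wfc-timeout=20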
bsrsetup forget-peer resource peer_node_id
...