First, we'll look at the necessary components of a BSR and explain how it is configured with an example.
Components
To build replication, you need to configure nodes (hosts), volumes to be replicated, and a network for communication channels between replication nodes. You define a replication cluster by describing these components as a single resource unit in a configuration file.
Node
Basically, you need to prepare a production node and a standby node; the standby side can be scaled out to N nodes. At least two nodes are required for replication.
Node is a distinct term from host, but we do not make a strict distinction here: we say host only where the distinction matters, and node otherwise.
Volume
Data Volume
You must prepare storage devices of the same size on both cluster nodes. If the volumes differ in size, the volume on the target node must be at least as large as the volume on the source node, but configuring volumes of different sizes is not recommended. (Partition them so that the sizes on both sides match exactly.)
In BSR, the size of a replica volume means the partition size (in bytes). If the source and target differ in size, even by 1 byte, the replication connection will fail.
Getting Partition Size
In Windows PowerShell:
gwmi -Query "SELECT * from Win32_DiskPartition"
On the Linux command line:
fdisk -l
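As a sketch of the sizing rule above (the byte values are hypothetical Size fields as reported by the commands, not taken from a real system):

```python
def sizes_compatible(source_bytes: int, target_bytes: int) -> bool:
    """Compare partition sizes in bytes: the target must be at least
    as large as the source, and an exact match is recommended."""
    return target_bytes >= source_bytes

# Hypothetical partition sizes (bytes) read from the commands above.
source = 250057064448
target = 250057064448
print(sizes_compatible(source, target))      # exact match: OK
print(sizes_compatible(source, target - 1))  # target 1 byte smaller: connection fails
```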
Volumes must be formatted with a filesystem appropriate to the operating system, such as NTFS/ReFS on Windows or ext/xfs on Linux. Depending on how they are partitioned, volumes can be logical drives or devices in MBR, GPT, or extended partitions, and can include dynamic disks in any RAID layout, including spanned, striped, or mirrored. If a volume is already formatted and contains critical data, you can use the existing volume as-is.
A volume used for replication must not have a paging file configured on it for virtual memory; if a paging file is present, the volume cannot be unmounted.
The maximum size of a replication volume supported by bsr is theoretically 1 PB, and volumes larger than 10 TB are typically considered large.
Space reclamation in thin-provisioning environments is not compatible with BSR. To deploy on thin-provisioned storage, you must disable space reclamation.
Meta Volume
BSR keeps the additional information needed to operate replication in a separate, non-volatile storage space and reads and writes this data in real time during replication. This additional information is called metadata, and the storage volume that holds it is called the meta volume. Meta volumes should be sized in a 1:1 correspondence to the replication volumes, requiring approximately 33 MB of space per 1 TB of replica volume per replication peer. For example, in a 1:2 replication, a 3 TB replica volume requires a meta volume of 2 × 3 × 33 MB = 198 MB.
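The sizing rule above can be sketched as follows (a hypothetical helper for illustration, not part of BSR):

```python
def meta_volume_mb(replica_tb: int, peer_nodes: int, mb_per_tb: int = 33) -> int:
    # Roughly 33 MB of metadata per 1 TB of replica volume, per replication peer.
    return peer_nodes * replica_tb * mb_per_tb

# 1:2 replication of a 3 TB replica volume: 2 x 3 x 33 MB = 198 MB
print(meta_volume_mb(replica_tb=3, peer_nodes=2))  # -> 198
```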
The metadata is called internal meta if it resides on the same disk device as the replica volume, and external meta if it resides on a separate disk. Internal meta has the advantage of not requiring a separate disk device, but external meta performs slightly better because metadata I/O goes to a different disk. Internal meta is declared with the INTERNAL keyword in the configuration file, as in the following example; this causes BSR to partition the replica volume at initialization time and automatically generate the metadata in a reserved meta zone. The INTERNAL keyword is only available in Linux environments.
Linux Internal Meta
resource r0 {
...
meta-disk internal;
...
}
External meta disks can be specified in the configuration file in several ways depending on the operating system: as a mount point on Windows, or as a disk device name on Linux. BSR also supports meta volumes on virtual disk devices, so you can use a virtual volume as a meta volume even without a separate physical disk. The virtual volume must be prepared separately, as a VHD on Windows or a loop device on Linux.
The following are examples of external meta disk specifications.
External Meta - Windows Letter Mount Point
resource r0 {
...
meta-disk m;
...
}
External Meta - Windows GUID Mount Point
resource r0 {
...
meta-disk "\\?\Volume{d41d41d8-17fb-11e6-bb93-000c29ac57ee}\";
...
}
External Meta - Windows GUID Mount Point + VHD
resource r0 {
...
meta-disk "\\?\Volume{ed8a8f02-18b3-11e6-91b4-000c29ac57f8}\" "c:\r0_meta.vhd";
...
}
A virtual disk is a kind of file-backed disk: even after it is configured and mounted once, it is not automatically remounted when the system restarts. BSR therefore uses the absolute path of the virtual disk file described in the configuration file to remount it automatically on system restart. This is handled automatically by the BSR service and scripts.
External Meta - Linux device name
resource r0 {
...
meta-disk /dev/sdc1;
...
}
External Meta - Linux loop device
resource r0 {
...
meta-disk /dev/loop0 /bsr_meta/r0_meta;
...
}
The meta-disk volume must be left RAW (unformatted); do not format it with a regular filesystem.
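For the Linux loop-device case above, the backing file can be prepared as a RAW file with dd. A minimal sketch follows; the /tmp path is illustrative, and the 198 MB size assumes the 1:2, 3 TB sizing example given earlier:

```shell
# Create a 198 MB RAW backing file (no filesystem on it).
dd if=/dev/zero of=/tmp/r0_meta bs=1M count=198 status=none

# Then attach it as a loop device (requires root), e.g.:
#   losetup /dev/loop0 /tmp/r0_meta
# and reference /dev/loop0 with the backing-file path in the resource file.
```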
Connection
BSR recommends dedicated lines for replication links, but this is not absolute. Dedicated lines, back-to-back connections, and Gigabit Ethernet are the most reasonable choices; when replicating across switched equipment and routers, you must consider performance factors such as throughput and latency.
Resources in bsr typically use TCP listen ports of 7788 or higher. Each resource must be assigned a different port, and the local firewall must allow the ports assigned to each resource. You must also prevent other applications from using bsr's TCP ports.
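A quick way to check that a port you plan to assign to a resource is not already taken by another local application (a generic sketch, not a BSR utility):

```python
import socket

def port_free(port: int, host: str = "127.0.0.1") -> bool:
    # If we can bind the port, no other local application is listening on it.
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        try:
            s.bind((host, port))
            return True
        except OSError:
            return False

# Pick the first free port in bsr's usual range for a new resource.
port = next(p for p in range(7788, 7800) if port_free(p))
```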
You will typically configure your connections in the following order:
The hosts (bsr-active, bsr-standby) use the dedicated network interface eth1 and assign IP addresses 10.1.1.31 and 10.1.1.32.
TCP ports 7788 through 7799 are used by bsr.
On the local firewall, allow both inbound and outbound ports between the hosts.
Configuration Files
The components described above are written into configuration files: node (host) information, volume information, and connection information are described with keywords matching the attributes of each zone (section).
All BSR configuration files are located in the etc subdirectory of the installation path, and the BSR utilities load them from there.
First, create bsr.conf in the etc directory. The general contents of a bsr.conf file are shown below.
include "bsr.d/global_common.conf";
include "bsr.d/*.res";
By convention, the global and common sections of BSR go in the /etc/bsr.d/global_common.conf file, and we recommend including all .res files so that the configuration can be separated by resource.
Properties are described within a few predefined zones, which we call sections. The top-level sections are Global, Common, and Resource, and each section contains property-specific subsections. Here we describe only the major sections and some basic properties. For more information about configuration files, see Configuration Files in the Appendix.
Global Section
This section can be used only once globally and is typically found in the /etc/bsr.d/global_common.conf file. If you use a single configuration file, write it at the top of that file.
The settings in this section are options related to the user interface, such as command timeouts and IP verification.
Common Section
This section provides settings for properties common to all resources and is usually written in /etc/bsr.d/global_common.conf. Of course, you can also define each option on an individual resource basis.
A <Common> section is not mandatory, but it is recommended if you use more than one resource; otherwise the configuration becomes cluttered with repeated options. For example, if you set <net> { protocol C; } in the <Common> section, all resources inherit this option unless otherwise specified.
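For example, the inherited default just described could be written as follows (a minimal sketch of a global_common.conf; the comments are illustrative):

```
global {
    // globally unique options (command timeouts, IP verification, ...)
}

common {
    net {
        protocol C;   // inherited by every resource unless overridden
    }
}
```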
Resource Section
A single resource configuration file is typically named in the form /etc/bsr.d/<resource>.res. The resource name used here must be specified within the resource file. The name is arbitrary but should be identifiable; it must be US-ASCII and must not contain space characters. Every resource configuration must also have at least one <host> subsection. All other settings are inherited from the Common section or take bsr's default values. Options whose values are common to both hosts can be specified once in the parent <resource> section rather than in each <host>, as in the following example.
Specifying a node-id for each node is mandatory.
resource r0 {
    disk d;
    meta-disk f;
    on alice {
        address 10.1.1.31:7789;
        node-id 0;
    }
    on bob {
        address 10.1.1.32:7789;
        node-id 1;
    }
}
Configuration type
BSR provides flexible redundancy for your organization's critical data in a variety of configurations. Configurations that replicate within the local network are commonly referred to as mirror configurations, while those that replicate between remote locations are called disaster recovery (DR) configurations.
Local Mirror
Replication protocols are typically configured synchronously, which is a common way to configure mirroring for redundancy within a local network.
/etc/bsr.d/r0.res
resource r0 {
    net {
        protocol C;
    }
    on alice {
        disk d;
        address 10.1.1.31:7789;
        meta-disk f;
        node-id 0;
    }
    on bob {
        disk d;
        address 10.1.1.32:7789;
        meta-disk f;
        node-id 1;
    }
}
Local 1:N Mirror
An N-node replication configuration within the local network that extends a 1:1 mirror to N nodes. The replication protocol is typically configured as synchronous, but you may want to consider an asynchronous configuration if N-node replication causes performance degradation. The replication protocol defaults to synchronous (C) if omitted from the configuration.
resource r0 {
    //net {
    //    protocol C;
    //}
    device e minor 2;
    disk e;
    meta-disk f;
    on store1 {
        address 10.1.10.1:7100;
        node-id 0;
    }
    on store2 {
        address 10.1.10.2:7100;
        node-id 1;
    }
    on store3 {
        address 10.1.10.3:7100;
        node-id 2;
    }
    connection-mesh {
        hosts store1 store2 store3;
    }
}
Remote DR(Disaster Recovery)
When configuring disaster recovery replication over a WAN segment, use the asynchronous protocol by default, and configure the send buffer for buffering as well as a mode for when the buffer becomes congested. For more information about congestion mode, see Congestion mode.
resource r0 {
    net {
        protocol A;
        sndbuf-size 1G;
        on-congestion pull-ahead;
        congestion-fill 900M;
    }
    on main_server {
        disk d;
        address 10.1.1.31:7789;
        meta-disk f;
        node-id 0;
    }
    on dr_server {
        disk d;
        address 10.1.1.32:7789;
        meta-disk f;
        node-id 1;
    }
}
You can also maximize replication throughput by integrating a replication accelerator (DRX).
MDR (Local Mirror & Remote DR)
A mixed configuration of mirroring on the local network and remote disaster recovery replication across the WAN. The local mirror is synchronous and the remote WAN replication is asynchronous. Replication across the WAN requires a send buffer and congestion mode to be configured, and the use of a replication accelerator (DRX) is recommended.
resource r0 {
    volume 0 {
        device e minor 2;
        disk e;
        meta-disk f;
    }
    on store1 {
        node-id 1; // Active
    }
    on store2 {
        node-id 2; // Standby
    }
    on store3 {
        node-id 3; // DR
    }
    connection {
        net {
            protocol C;
        }
        host store1 address 10.10.0.240:7789; // Active
        host store2 address 10.10.0.241:7789; // Standby
    }
    connection {
        net {
            protocol A;
            sndbuf-size 1G;
            on-congestion pull-ahead;
            congestion-fill 900M;
        }
        host store2 address 10.10.0.241:7789; // Standby
        host store3 address 20.20.0.253:7789; // DR
    }
    connection {
        net {
            protocol A;
            sndbuf-size 1G;
            on-congestion pull-ahead;
            congestion-fill 900M;
        }
        host store1 address 10.10.0.240:7789; // Active
        host store3 address 20.20.0.253:7789; // DR
    }
}
SDR (Shared Disk & Remote DR)
This method replicates across the WAN with the shared disk of the main production site as the source and a disaster recovery node as the target. The two nodes that access the shared disk at the primary site are configured as Active/Standby, and the DR node is configured as a second standby node. The Active and Standby nodes are assigned the same virtual IP address (VIP) to mutually exclude resource operation between them, and the DR node communicates with this VIP to receive data from the Active/Standby side.
As a WAN disaster recovery configuration, asynchronous protocol operation and replication accelerator (DRX) interworking should be considered.
resource r0 {
    net {
        protocol A;
        verify-alg crc32;
        sndbuf-size 1G;       # max buffer size is 1/8 of physical RAM
        on-congestion pull-ahead;
        congestion-fill 950M;
    }
    floating 10.20.210.4:7788 { // Use VIP
        options {
            svc-autostart no; // Required for the Active/Standby side of an SDR configuration: no resource auto-start
        }
        device e minor 2;
        disk e;
        meta-disk "\\\\?\\Volume{d4006597-e3d1-4685-a91b-b23a669499f4}\\"; // storage (RAW) volume concurrently accessible from both servers
        node-id 0;
    }
    floating 10.20.210.3:7788 { // DR IP
        options {
            svc-autostart yes; // The DR side of an SDR configuration may auto-start resources
        }
        device e minor 2;
        disk e;
        meta-disk "\\\\?\\Volume{58f21aac-2b90-464e-9cea-42a25846fd56}\\"; // internal or storage (RAW) volume
        node-id 1;
    }
}
N:1 Mirror
This is a way of designating one node as the target node for replication of resources located on different nodes. It is a 1:1 mirror configuration in terms of individual resources, but is defined as an N:1 mirror in terms of overall operations.
Specify the target node as store3 on the source node store1.
resource r0 {
    device e minor 2;
    disk e;
    meta-disk f;
    on store1 {
        node-id 0;
    }
    on store3 {
        node-id 2;
    }
    connection {
        host store1 address 10.10.0.245:7789;
        host store3 address 10.10.0.247:7789;
    }
}
Similarly, specify target node store3 on source node store2; combined with the configuration above, this yields the N:1 configuration in which store3 is the target for both store1 and store2.
resource r1 {
    device e minor 2;
    disk e;
    meta-disk f;
    on store2 {
        node-id 1;
    }
    on store3 {
        node-id 2;
    }
    connection {
        host store2 address 10.10.0.246:7790;
        host store3 address 10.10.0.247:7790;
    }
}
Target node store3 accepts configurations from both store1 and store2.
resource r0 {
    device e minor 2;
    disk e;
    meta-disk f;
    on store1 {
        node-id 0;
    }
    on store3 {
        node-id 2;
    }
    connection {
        host store1 address 10.10.0.245:7789;
        host store3 address 10.10.0.247:7789;
    }
}

resource r1 {
    device g minor 4;
    disk g;
    meta-disk h;
    on store2 {
        node-id 1;
    }
    on store3 {
        node-id 2;
    }
    connection {
        host store2 address 10.10.0.246:7790;
        host store3 address 10.10.0.247:7790;
    }
}
Floating Peer config.
You can configure based on IP address without specifying a hostname.
resource r0 {
    floating 200.200.200.6:7788 {
        device d minor 1;
        disk d;
        meta-disk n;
        node-id 0;
    }
    floating 200.200.200.7:7788 {
        device d minor 1;
        disk d;
        meta-disk n;
        node-id 1;
    }
}
resource r0 {
    floating 10.10.0.251:7788 {
        device e minor 2;
        disk e;
        meta-disk f;
        node-id 0;
    }
    floating 10.10.0.252:7788 {
        device e minor 2;
        disk e;
        meta-disk f;
        node-id 1;
    }
    floating 10.10.0.253:7788 {
        device e minor 2;
        disk e;
        meta-disk f;
        node-id 2;
    }
    connection {
        address 10.10.0.251:7788;
        address 10.10.0.252:7788;
    }
    connection {
        address 10.10.0.251:7788;
        address 10.10.0.253:7788;
    }
    connection {
        address 10.10.0.252:7788;
        address 10.10.0.253:7788;
    }
}
Mixed Config.
You can configure a mix of Windows and Linux nodes. This is useful for DR deployments, backups, and similar scenarios.
resource r0 {
    floating-on-linux 200.200.200.6:7788 {
        disk /dev/sdb1;
        device /dev/bsr0;
        meta-disk internal;
        node-id 0;
    }
    floating-on-windows 200.200.200.7:7788 {
        device d minor 1;
        disk d;
        meta-disk n;
        node-id 1;
    }
}
Cautions
Windows
Volume
Replica volumes must be online (mounted) and assigned a drive letter.
Meta-disk volumes must be assigned a drive letter or a GUID mount point, and must be prepared in RAW format. Formatting them with a filesystem (such as NTFS) causes initialization errors due to permission issues when the meta volume is initialized.
Disk volume size
The volume on the target node must be the same size as, or larger than, the volume on the source node.
The size of a volume here means the size of the partition, not the size of the filesystem after formatting; it can be obtained from the PowerShell command line as follows.
Windows PowerShell
Copyright (C) 2014 Microsoft Corporation. All rights reserved.
PS C:\Users\sekim> gwmi -Query "SELECT * from Win32_DiskPartition"
NumberOfBlocks : 488392704
BootPartition : False
Name : Disk 4, Partition 0
PrimaryPartition : True
Size : 250057064448
Index : 0
…
Node
The hostname must be described in the host section of the configuration file (except when using the floating peer method).
A node-id entry must be described in the configuration file host section.
Connection
You must add a local firewall exception policy for the mirroring address and port.
The network address set on the NIC must be described as the IP address in the net section.
IP address configuration residual errors
Depending on the operating situation, an error such as "There are multiple host sections for this node" may occur when a resource is started after changing previously configured IP address information. This is an IP-address recognition error caused by Windows leaving the IP address previously set on the LAN card in the registry; it can be resolved by manually editing the following registry entries.
Under the HKEY_LOCAL_MACHINE\SYSTEM\ControlSet001\Services\Tcpip\Parameters\Interfaces key, change any leftover IP address in the relevant Interface subkey to the IP address you intend to use.
Linux
The I/O redirection implemented in Linux DRBD has been removed from bsr. Therefore, legacy DRBD's diskless mode is not supported (only the diskless state, with the volume detached, is defined).
With the removal of diskless mode, local read I/O bypasses replication by default; read I/O is not redirected.