First, let's look at the configuration types bsr supports and then explain the components required for each configuration.
Configuration type
bsr provides a flexible way to replicate your company's critical data in a variety of ways.
1:1 Mirror
This is the typical mirroring configuration for redundancy within a local network. The synchronous protocol is commonly used in this configuration, but there is no restriction on the protocol setting.
1:N Mirror
This is a configuration in which a 1:1 mirror is extended to N replicated nodes within the local network. The synchronous protocol is common here as well, but there are no restrictions on the protocol setting.
1:1 DR
For a disaster recovery (DR) replication configuration over a WAN, the asynchronous protocol should be used, and the transmission buffer and congestion mode must be set. WAN replication can maximize replication performance by working with a replication accelerator (DRX).
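As a sketch, a WAN DR resource would set the asynchronous protocol together with the transmission buffer and congestion options in its net section. The option names below follow the DRBD-style configuration syntax bsr uses, and the values are purely illustrative; verify both against your bsr version.

```
resource r0 {
  ...
  net {
    protocol A;                # asynchronous replication for the WAN link
    sndbuf-size 1G;            # transmission buffer (illustrative value)
    on-congestion pull-ahead;  # congestion mode (illustrative choice)
    congestion-fill 900M;      # buffer fill level that triggers congestion handling
  }
  ...
}
```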
1:N Mirror & DR
This is a mixed configuration: local-network mirroring combined with remote disaster recovery replication over a WAN. The local mirror is synchronous, and the WAN remote replication is asynchronous. For the WAN replication, the transmission buffer and congestion mode must be set, and interworking with a replication accelerator (DRX) is recommended.
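An illustrative sketch of such a mixed topology is shown below. The host names and addresses are hypothetical, and the per-connection net options follow the DRBD-style syntax bsr uses; verify the exact syntax against your bsr version.

```
resource r0 {
  ...
  connection {   # LAN mirror: synchronous
    host store1 address 10.1.10.1:7789;
    host store2 address 10.1.10.2:7789;
    net { protocol C; }
  }
  connection {   # WAN DR: asynchronous
    host store1 address 10.1.10.1:7790;
    host store3 address 200.200.200.3:7790;
    net { protocol A; }
  }
}
```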
Shared-Disk DR
In this method, the shared disk of the main operation site is configured as the source and the target is configured as a disaster recovery node, replicating over the WAN. The two nodes accessing the shared disk at the main site are configured as Active-Standby, and the DR node is configured as a second standby node. The Active and Standby nodes set the same virtual IP address (VIP) so that bringing the resource up is mutually exclusive, and the DR-side node receives data from the Active or Standby node of the primary site through this VIP.
Since this is a WAN disaster recovery configuration, asynchronous protocol operation and replication accelerator (DRX) interworking should be considered.
N:1 Mirror
In this method, resources located on different source nodes all use a single node as their target. Each individual resource is a 1:1 mirror, but in terms of the overall topology it is defined as an N:1 mirror.
Local Migration
This configuration replicates a local source volume to a local target volume through a multi-resource configuration within the local host. It is used for live migration.
Configuration component
To establish replication, you must configure the source and target nodes, the volumes to be replicated on those nodes, and the network communication channel between the nodes (hosts). These components are then described in a configuration file as a resource unit to define a replication cluster.
Node
Basically, an operation node and a standby node must be prepared, and there can be up to N standby nodes. At least two nodes are required for replication.
Strictly speaking, node and host are distinct terms, but they are not strictly distinguished here; host is used only where the distinction matters.
Volume
Data volume
Storage units of the same size must be prepared on all cluster nodes. If volumes of different sizes are configured, the target node's volume must be at least as large as the source node's volume.
The volume must be formatted with a file system appropriate to the operating system, such as NTFS/ReFS on Windows or ext/xfs on Linux. Depending on the partitioning method, the volume can be a logical drive or device on an MBR, GPT, or extended partition, and dynamic disks in RAID formats such as span, stripe, and mirror are also supported. If the volume is already formatted and contains important data, you can of course use the existing volume as it is without reformatting it.
The replication volume should not contain a paging file for virtual memory. If a paging file is configured on the volume, the volume cannot be unmounted.
The maximum replication volume size supported by bsr is theoretically 1 PB, and volumes larger than 10 TB are generally considered large.
Thin provisioning in a virtualized environment is not suitable for replication. To maintain consistency, replication requires continuous tracking of data changes over the entire area of the volume, but in a thin-provisioned environment the volume's physical space is dynamically grown and shrunk, so a replication agent installed in the guest OS cannot continuously track the entire volume. For this reason, configuring replication on thin-provisioned volumes in a virtualized environment can be problematic.
The alternative, thick provisioning, allocates the entire volume area in a fixed manner and therefore matches the existing concept of replication operation. When configuring volumes in a virtualized environment, use only thick provisioning.
Meta Volume
bsr keeps additional information needed to operate replication in a separate non-volatile storage area and reads and writes this data continuously during replication. This additional information is called meta data, and the volume that stores it is called the meta volume. A meta volume must be prepared in 1:1 correspondence with each replication volume, and it requires about 33 MB per 1 TB of replication volume per replicated node. For example, for a 1:2 replication of a 3 TB volume, you need a meta volume of 2 * 3 * 33 MB = 198 MB.
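The sizing rule above (about 33 MB per 1 TB of replication volume, per replicated peer) can be expressed as a small helper. This is an illustrative snippet, not part of bsr; the function name is hypothetical.

```python
def meta_volume_size_mb(volume_tb: float, peer_count: int) -> float:
    """Approximate meta volume size: ~33 MB per 1 TB of replication
    volume, per replicated peer node."""
    MB_PER_TB_PER_PEER = 33
    return peer_count * volume_tb * MB_PER_TB_PER_PEER

# Example from the text: 1:2 replication of a 3 TB volume
print(meta_volume_size_mb(volume_tb=3, peer_count=2))  # → 198
```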
Meta data is classified as internal meta when it resides on the same disk device as the replication volume, or external meta when it resides on a separate disk device.
Internal meta has the advantage of not requiring a separate disk device, but in terms of performance the external meta method, which performs I/O to a different disk, is more advantageous. Internal meta is specified with the internal keyword in the configuration file, as in the following example; bsr partitions the replication volume at initialization time and automatically creates the meta data within the delimited meta area. The internal keyword is supported only on Linux.
Internal Meta
resource r0 {
...
meta-disk internal;
...
}
The device to be used as the external meta disk can be specified in the configuration file in several ways, depending on the operating system: as a mount point on Windows, or as the device name of the disk device on Linux. In addition, bsr supports virtual disk devices as meta volumes, so a virtual volume can be used as a meta volume even when no separate physical disk device is available. The virtual volume device must be prepared separately, as a VHD on Windows or a loop device on Linux, and is used in the same way as an external meta disk.
The following are configuration examples for an external meta disk.
External Meta - Windows Letter Mount Point
resource r0 {
...
meta-disk m;
...
}
External Meta - Windows GUID Mount Point
resource r0 {
...
meta-disk "\\\\?\\Volume{d41d41d8-17fb-11e6-bb93-000c29ac57ee}\\";
...
}
External Meta - Windows GUID Mount Point + VHD
resource r0 {
...
meta-disk "\\\\?\\Volume{ed8a8f02-18b3-11e6-91b4-000c29ac57f8}\\" "c:\r0_meta.vhd";
...
}
A virtual disk is a type of file-backed disk: even if it is configured and mounted once, it is not automatically remounted when the system restarts. Therefore, bsr remounts it automatically at system restart using the absolute path of the virtual disk file described in the configuration file. This is handled automatically by the bsr service and scripts.
External Meta - Linux device name
resource r0 {
...
meta-disk /dev/sdc1;
...
}
External Meta - Linux loop device
resource r0 {
...
meta-disk /dev/loop0 /bsr_meta/r0_meta;
...
}
Meta disk volumes should be prepared in a RAW state, not formatted with a regular file system.
Network
A dedicated line, a back-to-back connection, or a Gigabit Ethernet connection is the most reasonable option. When replicating across a switch device, performance issues such as throughput and latency at the router must be considered.
bsr resources usually use TCP listening ports 7788 and higher. Each resource must use a different port, and the port set for each resource must be allowed in the local firewall. You must also ensure that no other application uses bsr's TCP ports.
The following is an example of network-related settings.
The bsr hosts use the dedicated network interface eth1, with the IP addresses 10.1.1.31 and 10.1.1.32 assigned.
bsr normally uses TCP ports 7788 to 7799.
Allow both inbound and outbound traffic on these ports between the hosts in the local firewall.
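Applied to a resource, these settings correspond to address entries like the following. This is a minimal illustrative fragment; the host names are hypothetical.

```
resource r0 {
  ...
  on alice {
    address 10.1.1.31:7788;   # IP assigned to the dedicated eth1 interface
    ...
  }
  on bob {
    address 10.1.1.32:7788;
    ...
  }
}
```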
Create Resource
Resource creation is the process of describing the components mentioned above in the configuration file. That is, node (host) information, volumes, and connection information are described in designated areas (sections) of the configuration file using keywords that match their attributes.
All bsr configuration files are located in the etc directory under the installation path, and the bsr commands load them by default. First, create bsr.conf in the etc directory. The typical contents of the bsr.conf file are as follows.
include "bsr.d/global_common.conf";
include "bsr.d/*.res";
By convention, the global and common sections of bsr are described in the /etc/bsr.d/global_common.conf file. Also, by including all .res files, the configuration for each resource can be kept in a separate file and managed individually.
Properties are described within several predefined areas called sections. At the top level there are the Global, Common, and Resource sections, and each section has sub-sections. Only the main sections and some basic properties are described here; for more details about the configuration file, refer to the configuration file appendix.
Global section
This section can be used only once in the entire configuration and is usually located in the /etc/bsr.d/global_common.conf file. If the configuration consists of a single file, it should be written at the top of that file.
The settings in this section are options related to the user interface, such as the command-line timeout or IP validation.
Common section
This section provides settings that apply as common properties to all resources and is usually written in /etc/bsr.d/global_common.conf. Of course, each option can also be defined individually per resource.
The <Common> section is not required, but it is recommended when you use more than one resource; otherwise, repeating the same options for each resource can become complicated. For example, if you set <net> {protocol C;} in the <Common> section, all resources inherit this option unless they specify one of their own.
Resource section
A resource configuration file is usually created as /etc/bsr.d/<resource>.res, and the resource name used must be specified in that file. Names may be chosen arbitrarily, but they must be in US-ASCII format and must not contain spaces. Every resource configuration must also have at least two <host> sub-sections. All other settings are inherited from the Common section or take bsr's defaults. Options whose values are common to both hosts can be specified once in the <resource> section, the parent of <host>, which simplifies the configuration as in the following example.
Specifying the node id (node-id) of each node is mandatory.
resource r0 {
  disk d;
  meta-disk f;
  on alice {
    address 10.1.1.31:7789;
    node-id 0;
  }
  on bob {
    address 10.1.1.32:7789;
    node-id 1;
  }
}
Configuration examples
Simple configuration
The following is an example of a Windows bsr configuration file configured with minimal settings.
/etc/bsr.d/global_common.conf
global {
}
common {
  net {
    protocol C;
  }
}
/etc/bsr.d/r0.res
resource r0 {
  on alice {
    disk d;
    address 10.1.1.31:7789;
    meta-disk f;
    node-id 0;
  }
  on bob {
    disk d;
    address 10.1.1.32:7789;
    meta-disk f;
    node-id 1;
  }
}
1:2 Mesh
The following is an example of a 1:2 mirror configuration. The connection-mesh section specifies that all connections between the 3 nodes should be established.
resource r0 {
  device e minor 2;
  disk e;
  meta-disk f;
  on store1 {
    address 10.1.10.1:7100;
    node-id 0;
  }
  on store2 {
    address 10.1.10.2:7100;
    node-id 1;
  }
  on store3 {
    address 10.1.10.3:7100;
    node-id 2;
  }
  connection-mesh {
    hosts store1 store2 store3;
  }
}
1:2 individual connection configuration
This is an example of a 1:2 mirror configuration in which properties can be set individually for each connection.
resource r0 {
  volume 0 {
    device e minor 2;
    disk e;
    meta-disk f;
  }
  on store1 {
    node-id 0;
  }
  on store2 {
    node-id 1;
  }
  on store3 {
    node-id 2;
  }
  connection {
    host store1 address 10.10.0.245:7789;
    host store2 address 10.10.0.252:7789;
  }
  connection {
    host store2 address 10.10.0.252:7789;
    host store3 address 10.10.0.247:7789;
  }
  connection {
    host store1 address 10.10.0.251:7789;
    host store3 address 10.10.0.247:7789;
  }
}
Floating peer
You can configure replication based on IP addresses without specifying host names.
resource r0 {
  floating 200.200.200.6:7788 {
    device d minor 1;
    disk d;
    meta-disk n;
    node-id 0;
  }
  floating 200.200.200.7:7788 {
    device d minor 1;
    disk d;
    meta-disk n;
    node-id 1;
  }
}
resource r0 {
  floating 10.10.0.251:7788 {
    device e minor 2;
    disk e;
    meta-disk f;
    node-id 0;
  }
  floating 10.10.0.252:7788 {
    device e minor 2;
    disk e;
    meta-disk f;
    node-id 1;
  }
  floating 10.10.0.253:7788 {
    device e minor 2;
    disk e;
    meta-disk f;
    node-id 2;
  }
  connection {
    address 10.10.0.251:7788;
    address 10.10.0.252:7788;
  }
  connection {
    address 10.10.0.251:7788;
    address 10.10.0.253:7788;
  }
  connection {
    address 10.10.0.252:7788;
    address 10.10.0.253:7788;
  }
}
2:1 configuration
On the source node store1, the target node is specified as store3.
resource r0 {
  device e minor 2;
  disk e;
  meta-disk f;
  on store1 {
    node-id 0;
  }
  on store3 {
    node-id 2;
  }
  connection {
    host store1 address 10.10.0.245:7789;
    host store3 address 10.10.0.247:7789;
  }
}
On the source node store2, the target node is likewise specified as store3. Together, store1 and store2 configured as above form an N:1 configuration with store3 as the target.
resource r1 {
  device e minor 2;
  disk e;
  meta-disk f;
  on store2 {
    node-id 1;
  }
  on store3 {
    node-id 2;
  }
  connection {
    host store2 address 10.10.0.246:7790;
    host store3 address 10.10.0.247:7790;
  }
}
Target node store3 accepts both store1 and store2 configurations.
resource r0 {
  device e minor 2;
  disk e;
  meta-disk f;
  on store1 {
    node-id 0;
  }
  on store3 {
    node-id 2;
  }
  connection {
    host store1 address 10.10.0.245:7789;
    host store3 address 10.10.0.247:7789;
  }
}
resource r1 {
  device g minor 4;
  disk g;
  meta-disk h;
  on store2 {
    node-id 1;
  }
  on store3 {
    node-id 2;
  }
  connection {
    host store2 address 10.10.0.246:7790;
    host store3 address 10.10.0.247:7790;
  }
}
Precautions
The following describes precautions for each platform.
Windows
volume
Replication volumes must be online (mounted) and have a drive letter assigned.
The meta disk volume must be specified by drive letter or GUID and must be prepared in RAW format. Formatting it with a specific file system (e.g. NTFS) causes initialization errors due to permission issues when the meta volume is initialized.
Disk volume size
The target node's volume must be the same size as or larger than the source node's volume.
Here, the size of a volume means the size of the partition, not the size of the file system after formatting. The partition size can be obtained from the PowerShell command line as follows.
Windows PowerShell
Copyright (C) 2014 Microsoft Corporation. All rights reserved.
PS C:\Users\sekim>gwmi -Query "SELECT * from Win32_DiskPartition"
NumberOfBlocks : 488392704
BootPartition : False
Name : disk 4, partition 0
PrimaryPartition : True
Size : 250057064448
Index : 0
NumberOfBlocks : 716800
BootPartition : True
Name : disk 0, partition 0
PrimaryPartition : True
Size : 367001600
Index : 0
NumberOfBlocks : 487675904
BootPartition : False
Name : disk 0, partition 1
PrimaryPartition : True
Size : 249690062848
Index : 1
NumberOfBlocks : 976766976
BootPartition : False
Name : disk 5, partition 0
PrimaryPartition : True
Size : 500104691712
Index : 0
NumberOfBlocks : 1953519616
BootPartition : False
Name : disk 2, partition 0
PrimaryPartition : True
Size : 1000202043392
Index : 0
NumberOfBlocks : 976766976
BootPartition : False
Name : disk 3, partition 0
PrimaryPartition : True
Size : 500104691712
Index : 0
NumberOfBlocks : 488392704
BootPartition : False
Name : disk 1, partition 0
PrimaryPartition : True
Size : 250057064448
Index : 0
Node
The host name must be specified in the host section of the configuration file (except for the floating peer method).
The node-id entry should be described in the host section of the configuration file.
Network
You must add a local firewall exception policy for mirroring addresses and ports.
The network address set on the NIC should be described as the IP address in the net section.
IP address garbage value error on registry
Depending on the operating situation, an error such as "There are multiple host sections for this node" may occur when starting a resource after changing previously configured IP address information. This happens because the IP address information set on the LAN card in Windows remains in the registry, causing IP address recognition errors; it can be resolved by manually editing the following registry entries.
Under the HKEY_LOCAL_MACHINE\SYSTEM\ControlSet001\Services\Tcpip\Parameters\Interfaces key, change the residual IP address set in the interface subkey to the IP address you want to use.
Linux
The I/O redirection implemented in Linux DRBD has been removed from bsr. Therefore, operating in DRBD's diskless mode is not possible.
Since diskless mode has been removed, local read I/O bypasses the replication layer by default, and read I/O is not redirected either.