Overview
bsr implements a block device that replicates data from the local node to all other nodes in the cluster. The actual data and its metadata are stored separately (typically when external metadata is used) on a “generic” block device volume on each cluster node. By default, a replicated block device must be named in the /dev/bsr<minor> format, or referred to directly by a symbolic link (drive letter) to the device. Devices are grouped into resources, one or more per resource, and each device is replicated in parallel. A device inside a resource is called a volume, and a resource can be replicated between two or more cluster nodes. Connections between cluster nodes are point-to-point links and use the TCP protocol.
bsr consists of the basic component bsradm, which parses and processes the configuration files, and the low-level components bsrsetup, bsrmeta, and bsrcon.
The basic bsr configuration consists of /etc/bsr.conf and any additional files it includes (typically global_common.conf and all *.res files in the /etc path). It is usually useful to define each resource in a separate *.res file in /etc/bsr.d/. The configuration files are designed so that every cluster node holds an identical copy of the entire cluster configuration. This is not a strict requirement, however, because some setups need configuration files whose contents differ from node to node.
...
This example defines a resource r0 that contains a single replicated device, the volume with drive letter e. The resource replicates between the hosts alice and bob, which have the IPv4 addresses 10.1.1.31 and 10.1.1.32 and the node identifiers 0 and 1, respectively. The actual data is stored on volume e, and the metadata is stored on volume f. Protocol C is used for connections between the hosts.
File Format
The configuration file consists of sections, which contain other sections and parameters depending on the section type. Each section consists of one or more keywords, sometimes a section name, an opening brace ("{"), the contents of the section, and a closing brace ("}"). A parameter within a section consists of a keyword followed by one or more keywords or values and a terminating semicolon (";"). Some parameter values have a default scale (e.g. Kilo) that applies when a plain number is specified. The default scale can be overridden with a suffix (e.g. M for Mega). Common suffixes are K = 2^10 = 1024, M = 1024 K, and G = 1024 M. Comments begin with a hash sign ("#") and extend to the end of the line. Any section can be prefixed with the keyword skip, in which case the section and all of its subsections are ignored. Additional files can be included with an include file-pattern statement. Include statements are only allowed outside of sections.
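The scale rules above can be sketched in a few lines of Python. This is only an illustration: parse_scaled is a hypothetical helper, not part of bsr, and accepting lowercase suffixes is an assumption.

```python
# Hypothetical helper (not part of bsr): resolve a numeric parameter value
# to base units using the K/M/G suffixes described above.
SCALE = {"": 1, "K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}


def parse_scaled(text: str, default_scale: str = "") -> int:
    """Return the value in base units.

    A bare number uses the parameter's default scale; an explicit
    suffix overrides it.
    """
    text = text.strip()
    # Assumption: suffixes are matched case-insensitively.
    suffix = text[-1].upper() if text and text[-1].isalpha() else None
    number = text[:-1] if suffix is not None else text
    if suffix is None:
        suffix = default_scale
    if suffix not in SCALE:
        raise ValueError(f"unknown scale suffix: {suffix!r}")
    return int(number) * SCALE[suffix]


print(parse_scaled("4", default_scale="K"))   # bare number, default scale K
print(parse_scaled("4M", default_scale="K"))  # suffix M overrides the default
```

With a default scale of K, the bare value 4 resolves to 4 × 1024, while 4M ignores the default and resolves to 4 × 1024 × 1024.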
...
Sections in parentheses affect other parts of the configuration. The contents of the common section apply to all resources. A disk section in a resource section applies to all volumes of that resource, and a net section in a resource section applies to all connections of that resource. This eliminates the need to repeat the same option for each resource, connection, or volume. Options can be overridden in a more specific resource, connection, or volume section. The peer-device options are resync-rate, c-plan-ahead, c-delay-target, c-fill-target, c-max-rate and c-min-rate. For backward compatibility, they can also be specified in any disk section, in which case they are inherited by all relevant connections. If they are given in a connection section, they are inherited by all volumes on that connection. A peer-device-options section begins with the disk keyword.
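The lookup order implied by this inheritance (most specific section first, common last) can be modeled as follows. This is a sketch of the precedence only, not of how bsradm is implemented, and the option values are hypothetical.

```python
from collections import ChainMap

# Hypothetical option values, chosen only to illustrate the lookup order.
common_disk = {"al-extents": "1237", "resync-rate": "100M"}
resource_disk = {"resync-rate": "50M"}   # overrides the common section
volume_disk = {"al-extents": "3389"}     # overrides resource and common

# ChainMap searches its maps left to right, so the most specific comes first.
effective = ChainMap(volume_disk, resource_disk, common_disk)

print(effective["resync-rate"])  # taken from the resource section
print(effective["al-extents"])   # taken from the volume section
```

An option that is set only in the common section would still be found, because the lookup falls through to the last map.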
Sections
common
This section can contain one each of a disk, handlers, net, options, and startup section. All resources inherit the parameters in these sections as their default values.
...
Define a volume within a resource. The volume numbers in the various volume sections of a resource define which devices on which hosts form a replicated device.
Section connection Parameters
host name [address [address-family] address] [port port-number]
Defines an endpoint for a connection. Each host statement refers to an on section in a resource. If a port number is defined, this endpoint will use the specified port instead of the port defined in the on section. Each connection section must contain exactly two host parameters. Instead of two host parameters the connection may contain multiple path sections.
Section path Parameters
host name [address [address-family] address] [port port-number]
Defines an endpoint for a connection. Each host statement refers to an on section in a resource. If a port number is defined, this endpoint will use the specified port instead of the port defined in the on section. Each path section must contain exactly two host parameters.
Section connection-mesh Parameters
hosts name...
Defines all nodes of a mesh. Each name refers to an on section in a resource. The port that is defined in the on section will be used.
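As an illustration, and assuming that a mesh means one connection between every pair of listed hosts, the set of connections implied by a hosts line can be enumerated like this. The function name is hypothetical.

```python
from itertools import combinations


def mesh_connections(hosts):
    """Return one (host_a, host_b) pair for every two distinct hosts."""
    return list(combinations(hosts, 2))


# alice and bob appear in the overview example; charlie is a hypothetical
# third node added for illustration.
for a, b in mesh_connections(["alice", "bob", "charlie"]):
    print(f"connection between {a} and {b}")
```

Under this assumption, n hosts yield n × (n − 1) / 2 connections, which is why a single connection-mesh section is shorter than writing every connection section by hand.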
Section disk Parameters
al-extents extents
bsr tracks the active, recently written areas of the disk based on recent write operations. When write I/O occurs, an area that is already active can be written to disk immediately, but an inactive area must be activated first, which requires a metadata write. This set of active disk areas is called the activity log.
...
Define that a device should only resynchronize after the specified other device. By default, no order between devices is defined, and all devices will resynchronize in parallel. Depending on the configuration of the lower-level devices, and the available network and disk bandwidth, this can slow down the overall resync process. This option can be used to form a chain or tree of dependencies among devices.
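The effect of such a dependency chain or tree can be sketched as a topological ordering. The sketch below uses Python 3.9+ and hypothetical device names; it models only the ordering constraint, not the resync itself.

```python
from graphlib import TopologicalSorter


def resync_waves(resync_after):
    """Group devices into waves; a wave starts once the previous one is done.

    resync_after maps each device to the set of devices it must wait for.
    """
    sorter = TopologicalSorter(resync_after)
    sorter.prepare()  # raises graphlib.CycleError if dependencies form a loop
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # all predecessors have finished
        waves.append(ready)
        sorter.done(*ready)
    return waves


# Hypothetical device names: r1/0 and r2/0 wait for r0/0, r3/0 waits for r1/0.
dependencies = {"r1/0": {"r0/0"}, "r2/0": {"r0/0"}, "r3/0": {"r1/0"}}
for number, wave in enumerate(resync_waves(dependencies), start=1):
    print(f"wave {number}: {', '.join(wave)}")
```

Devices in the same wave have no ordering between them and could resynchronize in parallel, while a circular dependency is rejected outright.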
Section peer-device-options Parameters
Please note that you open the section with the disk keyword.
c-delay-target delay_target,
...
Defines the bandwidth available for resynchronization. bsr allows normal application I/O even during resynchronization. If resynchronization takes up too much bandwidth, application I/O can become very slow; this parameter can be used to avoid that. This option only works if the dynamic resync controller is disabled.
Section global Parameters
dialog-refresh time
The device can be configured and started using the bsr init script, which may involve waiting for other cluster nodes. While waiting, the init script shows the remaining wait time. The dialog-refresh parameter defines the number of seconds between updates of that countdown and defaults to 1. A value of 0 turns the countdown off.
...
Provides the ability to aggregate usage statistics, but is not used by bsr.
Section handlers Parameters
after-resync-target cmd
Called on a resync target when a node state changes from Inconsistent to Consistent when a resync finishes. This handler can be used for removing the snapshot created in the before-resync-target handler.
...
bsr has detected a split-brain situation which could not be resolved automatically. Manual recovery is necessary. This handler can be used to call for administrator attention.
Section net Parameters
after-sb-0pri policy
Defines how to respond when a split-brain scenario is detected and neither of the two nodes is in the Primary role. (A split-brain scenario is detected when two nodes connect; a split-brain decision is always made between two nodes.) The defined policies are:
...
Online verification (bsradm verify) computes and compares checksums of disk blocks (i.e., hash values) in order to detect whether they differ. The verify-alg parameter determines which algorithm to use for these checksums. It must be set to one of the secure hash algorithms supported by the kernel before online verification can be used; see the shash algorithms listed in /proc/crypto. We recommend scheduling online verification regularly during low-load periods, for example once a month. Also see the notes on data integrity below.
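The idea of comparing per-block checksums can be sketched in user space as follows. This is not how bsr performs verification; the block size, the function names, and the use of Python's hashlib with sha1 are assumptions made for the illustration.

```python
import hashlib

BLOCK_SIZE = 4096  # illustrative block size, not a bsr constant


def block_digests(data: bytes, algorithm: str = "sha1"):
    """Hash each fixed-size block of data with the named algorithm."""
    return [
        hashlib.new(algorithm, data[i:i + BLOCK_SIZE]).digest()
        for i in range(0, len(data), BLOCK_SIZE)
    ]


def differing_blocks(local: bytes, peer: bytes, algorithm: str = "sha1"):
    """Return the indices of blocks whose digests differ.

    Both inputs are assumed to have the same length.
    """
    pairs = zip(block_digests(local, algorithm), block_digests(peer, algorithm))
    return [index for index, (a, b) in enumerate(pairs) if a != b]


local = bytes(3 * BLOCK_SIZE)   # three zero-filled blocks
peer = bytearray(local)
peer[BLOCK_SIZE + 10] = 0xFF    # corrupt one byte in the second block
print(differing_blocks(local, bytes(peer)))
```

Only the digests need to be compared to locate the mismatching block, which is the point of verifying with checksums rather than with the full data.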
Section on Parameters
address [address-family] address:port
...
Defines the unique node identifier for a node in the cluster. Node identifiers are used to identify individual nodes in the network protocol, and to assign bitmap slots to nodes in the metadata. Node identifiers can only be reassigned in a cluster when the cluster is down. It is essential that the node identifiers in the configuration and in the device metadata are changed consistently on all hosts. To change the metadata, dump the current state with bsrmeta dump-md, adjust the bitmap slot assignment, and update the metadata with bsrmeta restore-md. The node-id parameter must be set. Its value ranges from 0 to 16; there is no default.
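The two constraints stated here (each identifier is unique, and lies in the range 0 to 16) can be checked with a small script before a configuration is distributed. check_node_ids is a hypothetical helper, not a bsr tool.

```python
def check_node_ids(node_ids: dict) -> None:
    """Raise ValueError unless every host has a unique node-id in 0..16."""
    seen = {}
    for host, node_id in node_ids.items():
        if not 0 <= node_id <= 16:
            raise ValueError(f"{host}: node-id {node_id} is outside 0..16")
        if node_id in seen:
            raise ValueError(f"{host} and {seen[node_id]} share node-id {node_id}")
        seen[node_id] = host


# Host names and identifiers taken from the overview example.
check_node_ids({"alice": 0, "bob": 1})
print("node identifiers are valid")
```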
Section options Parameters (Resource Options)
auto-promote bool-value
Not supported by bsr.
...
If no new write request is issued for expiry-time after the last finished write request, a peer-ack packet is sent. If a new write request is issued before the timer expires, the timer is reset to expiry-time. (Note: peer-ack packets may be sent for other reasons as well, e.g. membership changes or the peer-ack-window option.) This parameter may influence resync behavior on remote nodes. Peer nodes need to wait until they receive a peer-ack before releasing a lock on an AL-extent. Resync operations between peers may need to wait for these locks. The default value for peer-ack-delay is 100 milliseconds; the default unit is milliseconds.
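The timer behavior can be sketched as a small simulation. It is a simplified model: it treats the completion of a write and the issue of the next write as the same kind of event, assumes a write arriving exactly at expiry comes too late to reset the timer, and ignores the other reasons for sending a peer-ack. The function name is hypothetical.

```python
def peer_ack_times(write_done_ms, delay_ms=100):
    """Return the times (ms) at which the idle timer would send a peer-ack.

    write_done_ms: ascending times at which write requests finished.
    """
    acks = []
    for index, finished in enumerate(write_done_ms):
        expiry = finished + delay_ms
        is_last = index == len(write_done_ms) - 1
        # The timer fires only if no later write arrives before it expires.
        if is_last or write_done_ms[index + 1] >= expiry:
            acks.append(expiry)
    return acks


# Writes at 0, 50 and 120 ms keep resetting the timer; the gap before the
# write at 400 ms is long enough for it to expire.
print(peer_ack_times([0, 50, 120, 400]))
```

In this model a burst of closely spaced writes produces a single peer-ack after the burst ends, so a larger delay means fewer packets but longer-held AL-extent locks on the peers.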
Section startup Parameters
The parameters in this section define the behavior of bsr at system startup time, in the bsr init script. They have no effect once the system is up and running.
...
Defines how long the init script waits for all peers to connect. This can be useful in combination with a cluster manager that cannot manage bsr resources: when the cluster manager starts, the bsr resources are already up and running. The timeout is specified in seconds. The default is 0, which means an infinite timeout. See also the degr-wfc-timeout parameter.
Section volume Parameters
device /dev/bsr<minor-number>
...