drbd.conf - DRBD Configuration Files

INTRODUCTION

DRBD implements block devices which replicate their data to all nodes of a cluster. The actual data and associated metadata are usually stored redundantly on "ordinary" block devices on each cluster node.Replicated block devices are called /dev/drbdminor by default. They are grouped into resources, with one or more devices per resource. Replication among the devices in a resource takes place in chronological order. With DRBD, we refer to the devices inside a resource as volumes.In DRBD 9, a resource can be replicated between two or more cluster nodes. The connections between cluster nodes are point-to-point links, and use TCP or a TCP-like protocol. All nodes must be directly connected.DRBD consists of low-level user-space components which interact with the kernel and perform basic operations ( drbdsetup, drbdmeta), a high-level user-space component which understands and processes the DRBD configuration and translates it into basic operations of the low-level components ( drbdadm), and a kernel component.The default DRBD configuration consists of /etc/drbd.conf and of additional files included from there, usually global_common.conf and all *.res files inside /etc/drbd.d/. It has turned out to be useful to define each resource in a separate *.res file.The configuration files are designed so that each cluster node can contain an identical copy of the entire cluster configuration. The host name of each node determines which parts of the configuration apply ( uname -n). It is highly recommended to keep the cluster configuration on all nodes in sync by manually copying it to all nodes, or by automating the process with csync2 or a similar tool.

EXAMPLE CONFIGURATION FILE

Code Block

global {
	usage-count yes;
	udev-always-use-vnr;
}
resource r0 {
      net {
	      cram-hmac-alg sha1;
	      shared-secret "FooFunFactory";
      }
      volume 0 {
	      device    /dev/drbd1;
	      disk      /dev/sda7;
	      meta-disk internal;
      }
      on alice {
	      node-id   0;
	      address   10.1.1.31:7000;
      }
      on bob {
	      node-id   1;
	      address   10.1.1.32:7000;
      }
      connection {
	      host      alice  port 7000;
	      host      bob    port 7000;
	      net {
			protocol C;
	      }
      }
}

...

This example defines a resource r0 which contains a single replicated device with volume number 0. The resource is replicated among hosts alice and bob, which have the IPv4 addresses 10.1.1.31 and 10.1.1.32 and the node identifiers 0 and 1, respectively. On both hosts, the replicated device is called /dev/drbd1, and the actual data and metadata are stored on the lower-level device /dev/sda7. The connection between the hosts uses protocol C.Please refer to the DRBD User's Guide[1] for more examples.

FILE FORMAT

DRBD configuration files consist of sections, which contain other sections and parameters depending on the section types. Each section consists of one or more keywords, sometimes a section name, an opening brace (“{”), the section's contents, and a closing brace (“}”). Parameters inside a section consist of a keyword, followed by one or more keywords or values, and a semicolon (“;”).Some parameter values have a default scale which applies when a plain number is specified (for example Kilo, or 1024 times the numeric value). Such default scales can be overridden by using a suffix (for example, M for Mega). The common suffixes K = 2^10 = 1024, M = 1024 K, and G = 1024 M are supported.Comments start with a hash sign (“#”) and extend to the end of the line. In addition, any section can be prefixed with the keyword skip, which causes the section and any sub-sections to be ignored.Additional files can be included with the include file-pattern statement (see glob(7) for the expressions supported in file-pattern). Include statements are only allowed outside of sections.The following sections are defined (indentation indicates in which context):

...

Sections in brackets affect other parts of the configuration: inside the common section, they apply to all resources. A disk section inside a resource or on section applies to all volumes of that resource, and a net section inside a resource section applies to all connections of that resource. This allows to avoid repeating identical options for each resource, connection, or volume. Options can be overridden in a more specific resource, connection, on, or volume section.peer-device-options are resync-rate, c-plan-ahead, c-delay-target, c-fill-target, c-max-rate and c-min-rate. Due to backward comapatibility they can be specified in any disk options section as well. They are inherited into all relevant connections. If they are given on connection level they are inherited to all volumes on that connection. A peer-device-options section is started with the disk keyword.

Sections

common

This section can contain each a disk, handlers, net, options, and startup section. All resources inherit the parameters in these sections as their default values.

...

Define a volume within a resource. The volume numbers in the various volume sections of a resource define which devices on which hosts form a replicated device.

Section connection Parameters

host name [address [address-family] address] [port port-number]

...

Defines an endpoint for a connection. Each host statement refers to an on section in a resource. If a port number is defined, this endpoint will use the specified port instead of the port defined in the on section. Each connection section must contain exactly two host parameters. Instead of two host parameters the connection may contain multiple path sections.

Section path Parameters

host name [address [address-family] address] [port port-number]

...

Defines an endpoint for a connection. Each host statement refers to an on section in a resource. If a port number is defined, this endpoint will use the specified port instead of the port defined in the on section. Each path section must contain exactly two host parameters.

Section connection-mesh Parameters

hosts name...

Defines all nodes of a mesh. Each name refers to an on section in a resource. The port that is defined in the on section will be used.

Section disk Parameters

al-extents extents

DRBD automatically maintains a "hot" or "active" disk area likely to be written to again soon based on the recent write activity. The "active" disk area can be written to immediately, while "inactive" disk areas must be "activated" first, which requires a meta-data write. We also refer to this active disk area as the "activity log". The activity log saves meta-data writes, but the whole log must be resynced upon recovery of a failed node. The size of the activity log is a major factor of how long a resync will take and how fast a replicated disk will become consistent after a crash. The activity log consists of a number of 4-Megabyte segments; the al-extents parameter determines how many of those segments can be active at the same time. The default value for al-extents is 1237, with a minimum of 7 and a maximum of 65536. Note that the effective maximum may be smaller, depending on how you created the device meta data, see also drbdmeta(8) The effective maximum is 919 * (available on-disk activity-log ring-buffer area/4kB -1), the default 32kB ring-buffer effects a maximum of 6433 (covers more than 25 GiB of data) We recommend to keep this well within the amount your backend storage and replication link are able to resync inside of about 5 minutes.

...

There are several aspects to discard/trim/unmap support on linux block devices. Even if discard is supported in general, it may fail silently, or may partially ignore discard requests. Devices also announce whether reading from unmapped blocks returns defined data (usually zeroes), or undefined data (possibly old data, possibly garbage). If on different nodes, DRBD is backed by devices with differing discard characteristics, discards may lead to data divergence (old data or garbage left over on one backend, zeroes due to unmapped areas on the other backend). Online verify would now potentially report tons of spurious differences. While probably harmless for most use cases (fstrim on a file system), DRBD cannot have that. To play safe, we have to disable discard support, if our local backend (on a Primary) does not support "discard_zeroes_data=true". We also have to translate discards to explicit zero-out on the receiving side, unless the receiving side (Secondary) supports "discard_zeroes_data=true", thereby allocating areas what were supposed to be unmapped. There are some devices (notably the LVM/DM thin provisioning) that are capable of discard, but announce discard_zeroes_data=false. In the case of DM-thin, discards aligned to the chunk size will be unmapped, and reading from unmapped sectors will return zeroes. However, unaligned partial head or tail areas of discard requests will be silently ignored. If we now add a helper to explicitly zero-out these unaligned partial areas, while passing on the discard of the aligned full chunks, we effectively achieve discard_zeroes_data=true on such devices. Setting discard-zeroes-if-aligned to yes will allow DRBD to use discards, and to announce discard_zeroes_data=true, even on backends that announce discard_zeroes_data=false. Setting discard-zeroes-if-aligned to no will cause DRBD to always fall-back to zero-out on the receiving side, and to not even announce discard capabilities on the Primary, if the respective backend announces discard_zeroes_data=false. We used to ignore the discard_zeroes_data setting completely. To not break established and expected behaviour, and suddenly cause fstrim on thin-provisioned LVs to run out-of-space instead of freeing up space, the default value is yes. This option is available since 8.4.7.

Section peer-device-options Parameters

Please note that you open the section with the disk keyword.c-delay-target delay_target,

...

Define how much bandwidth DRBD may use for resynchronizing. DRBD allows "normal" application I/O even during a resync. If the resync takes up too much bandwidth, application I/O can become very slow. This parameter allows to avoid that. Please note this is option only works when the dynamic resync controller is disabled.

Section global Parameters

dialog-refresh time

The DRBD init script can be used to configure and start DRBD devices, which can involve waiting for other cluster nodes. While waiting, the init script shows the remaining waiting time. The dialog-refresh defines the number of seconds between updates of that countdown. The default value is 1; a value of 0 turns off the countdown.

...

If you define this parameter in the global section, drbdadm will always add the .../VNR part, and will not care for whether the volume definition was implicit or explicit. For legacy backward compatibility, this is off by default, but we do recommend to enable it.

Section handlers Parameters

after-resync-target cmd

Called on a resync target when a node state changes from Inconsistent to Consistent when a resync finishes. This handler can be used for removing the snapshot created in the before-resync-target handler.

...

DRBD has detected a split-brain situation which could not be resolved automatically. Manual recovery is necessary. This handler can be used to call for administrator attention.

Section net Parameters

after-sb-0pri policy

Define how to react if a split-brain scenario is detected and none of the two nodes is in primary role. (We detect split-brain scenarios when two nodes connect; split-brain decisions are always between two nodes.) The defined policies are:disconnectNo automatic resynchronization; simply disconnect.discard-younger-primary, discard-older-primaryResynchronize from the node which became primary first ( discard-younger-primary) or last (discard-older-primary). If both nodes became primary independently, the discard-least-changes policy is used.discard-zero-changesIf only one of the nodes wrote data since the split brain situation was detected, resynchronize from this node to the other. If both nodes wrote data, disconnect.discard-least-changesResynchronize from the node with more modified blocks.discard-node-nodenameAlways resynchronize to the named node.

...

Online verification (drbdadm verify) computes and compares checksums of disk blocks (i.e., hash values) in order to detect if they differ. The verify-alg parameter determines which algorithm to use for these checksums. It must be set to one of the secure hash algorithms supported by the kernel before online verify can be used; see the shash algorithms listed in /proc/crypto. We recommend to schedule online verifications regularly during low-load periods, for example once a month. Also see the notes on data integrity below.

Section on Parameters

address [address-family] address: port

...

Defines the unique node identifier for a node in the cluster. Node identifiers are used to identify individual nodes in the network protocol, and to assign bitmap slots to nodes in the metadata. Node identifiers can only be reasssigned in a cluster when the cluster is down. It is essential that the node identifiers in the configuration and in the device metadata are changed consistently on all hosts. To change the metadata, dump the current state with drbdmeta dump-md, adjust the bitmap slot assignment, and update the metadata with drbdmeta restore-md. The node-id parameter exists since DRBD 9. Its value ranges from 0 to 16; there is no default.

Section options Parameters (Resource Options)

auto-promote bool-value

A resource must be promoted to primary role before any of its devices can be mounted or opened for writing. Before DRBD 9, this could only be done explicitly ("drbdadm primary"). Since DRBD 9, the auto-promote parameter allows to automatically promote a resource to primary role when one of its devices is mounted or opened for writing. As soon as all devices are unmounted or closed with no more remaining users, the role of the resource changes back to secondary. Automatic promotion only succeeds if the cluster state allows it (that is, if an explicit drbdadm primary command would succeed). Otherwise, mounting or opening the device fails as it already did before DRBD 9: the mount(2) system call fails with errno set to EROFS (Read-only file system); the open(2) system call fails with errno set to EMEDIUMTYPE (wrong medium type). Irrespective of the auto-promote parameter, if a device is promoted explicitly ( drbdadm primary), it also needs to be demoted explicitly (drbdadm secondary). The auto-promote parameter is available since DRBD 9.0.0, and defaults to yes.

...

By default DRBD freezes IO on a device, that lost quorum. By setting the on-no-quorum to io-error it completes all IO operations with an error if quorum ist lost. The on-no-quorum options is available starting with the DRBD kernel driver version 9.0.8.

Section startup Parameters

The parameters in this section define the behavior of DRBD at system startup time, in the DRBD init script. They have no effect once the system is up and running.degr-wfc-timeout timeout

...

Define how long the init script waits until all peers are connected. This can be useful in combination with a cluster manager which cannot manage DRBD resources: when the cluster manager starts, the DRBD resources will already be up and running. With a more capable cluster manager such as Pacemaker, it makes more sense to let the cluster manager control DRBD resources. The timeout is specified in seconds. The default value is 0, which stands for an infinite timeout. Also see the degr-wfc-timeout parameter.

Section volume Parameters

device /dev/drbdminor-number

...

Define where the metadata of a replicated block device resides: it can be internal, meaning that the lower-level device contains both the data and the metadata, or on a separate device. When the index form of this parameter is used, multiple replicated devices can share the same metadata device, each using a separate index. Each index occupies 128 MiB of data, which corresponds to a replicated device size of at most 4 TiB with two cluster nodes. We recommend not to share metadata devices anymore, and to instead use the lvm volume manager for creating metadata devices as needed. When the index form of this parameter is not used, the size of the lower-level device determines the size of the metadata. The size needed is 36 KiB + (size of lower-level device) / 32K * (number of nodes - 1). If the metadata device is bigger than that, the extra space is not used. This parameter is required if a disk other than none is specified, and ignored if disk is set to none. A meta-disk parameter without a disk parameter is not allowed.

NOTES ON DATA INTEGRITY

DRBD supports two different mechanisms for data integrity checking: first, the data-integrity-alg network parameter allows to add a checksum to the data sent over the network. Second, the online verification mechanism ( drbdadm verify and the verify-alg parameter) allows to check for differences in the on-disk data.Both mechanisms can produce false positives if the data is modified during I/O (i.e., while it is being sent over the network or written to disk). This does not always indicate a problem: for example, some file systems and applications do modify data under I/O for certain operations. Swap space can also undergo changes while under I/O.Network data integrity checking tries to identify data modification during I/O by verifying the checksums on the sender side after sending the data. If it detects a mismatch, it logs an error. The receiver also logs an error when it detects a mismatch. Thus, an error logged only on the receiver side indicates an error on the network, and an error logged on both sides indicates data modification under I/O.The most recent example of systematic data corruption was identified as a bug in the TCP offloading engine and driver of a certain type of GBit NIC in 2007: the data corruption happened on the DMA transfer from core memory to the card. Because the TCP checksum were calculated on the card, the TCP/IP protocol checksums did not reveal this problem.

Versions Compared

Old Version 4

New Version 5

Key