Troubleshooting
This section describes how to troubleshoot issues that may arise while configuring DRX.
Install errors
- Errors installing the Visual C++ Redistributable Package for Visual Studio 2013 (hereafter "VS2013 Redistributable Package") when installing DRX for Windows
- Symptom
- An error occurs while installing the VS2013 Redistributable Package, which is installed automatically after the DRX installation.
- Cause: An inherent flaw in the VS2013 Redistributable Package.
Solution
- Windows Server 2012 R2
- Description: The VS2013 Redistributable Package requires update KB2883200 (Windows Update) on Windows Server 2012 R2.
- Solution: Ensure that update KB2883200 is installed. If it is not, install it through Windows Update.
- Windows Server 2008 R2 SP1
- Description: Signature verification of the payload fails with error 0x800b010a, as in the following log.
[0AD8:05C0][2018-07-26T15:33:04]e000: Error 0x800b010a: Failed authenticode verification of payload: C:\ProgramData\Package Cache\.unverified\vcRuntimeMinimum_x64
[0AD8:05C0][2018-07-26T15:33:04]e000: Error 0x800b010a: Failed to verify signature of payload: vcRuntimeMinimum_x64
[0AD8:05C0][2018-07-26T15:33:04]e310: Failed to verify payload: vcRuntimeMinimum_x64 at path: C:\ProgramData\Package Cache\.unverified\vcRuntimeMinimum_x64, error: 0x800b010a. Deleting file.
- Solution: Run Windows Update and install the additional updates offered for ".NET Framework 3.5.1".
Unable to start a resource
- Configuration files fail to load because they are saved as UTF-8 with BOM
Symptom
Failed to read drx.conf
DRX log:
E1120 16:37:02.690660 t42053 config] Failed to load [/opt/DRX/drx.conf]. /opt/DRX/drx.conf(1): '=' character not found in line
Failed to read BSR settings
DRX log:
E1120 16:37:52.810044 t42132 config] Failed to get drbd configuration: Can't get drbd configuration. (exit_code: 2560)
E1120 16:37:52.810068 t42132 config] Output: drbd.d/1/r0.res:1: Parse error: 'global | common | resource | skip | include' expected,
E1120 16:37:52.810070 t42132 config] Output: but got '▒'
- Cause: The parser fails on the byte order mark (BOM) at the start of the configuration file.
Solution
- CentOS 6, 7
- Check the file's encoding with the file command.
[root@drxdev1 test]# file r1.res
r1.res: UTF-8 Unicode (with BOM) text, with CRLF line terminators
- Open the file with vi, type the following, and save.
:set nobomb
- Windows
- Open the file in Notepad and change the encoding to 'ANSI' via 'Save As'.
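On Linux, the BOM check and removal described above can also be scripted, which helps when several resource files need the same fix. The following is a minimal sketch, not part of DRX: the function names has_bom and strip_bom are hypothetical, and it only handles the three-byte UTF-8 BOM (EF BB BF). It does not change CRLF line terminators.

```shell
# Return 0 if the file starts with the UTF-8 byte order mark (EF BB BF).
has_bom() {
    [ "$(head -c 3 "$1" | od -An -tx1 | tr -d ' \n')" = "efbbbf" ]
}

# Remove the BOM in place if it is present; all other bytes are left untouched.
# The file is rewritten through a temporary copy so its permissions are kept.
strip_bom() {
    if has_bom "$1"; then
        tmp="$1.nobom.$$"
        tail -c +4 "$1" > "$tmp" && cat "$tmp" > "$1" && rm -f "$tmp"
    fi
}
```

For example, `strip_bom /opt/DRX/drx.conf` would rewrite the file only if a BOM is found. Back up the file first.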
Unable to connect
A DRX connection can fail to establish for many reasons, so check each item carefully in the order of the replication connection configuration procedure. The sequence below is written for Linux; the same sequence applies to Windows.
Network environment
- Verify that the BSR IP and the DRX IP are in the allowlist of the node's firewall policy. If the IPs and ports used by the resource are not allowed, do the following.
CentOS 6
Add the rule you need to the /etc/sysconfig/iptables file.
-A INPUT -p tcp -s {source ip} -d {destination ip} --dport {allowed port} -j ACCEPT
CentOS 7
firewall-cmd --permanent --zone=public --add-port={port to allow}/tcp
firewall-cmd --reload
firewall-cmd --zone=public --list-all
- Check the loopback ping
- If there is a ping response to the loopback address (127.0.0.1) but no ping response to the local IP address, there is a problem with the configuration of your network environment. In this case, you should contact your network administrator.
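To confirm that a CentOS 6 rules file already contains an ACCEPT rule of the form shown above, a small text check can help. This is a rough sketch under assumptions: has_accept_rule is a hypothetical helper, it only matches single-port INPUT rules written with -p tcp and --dport, and it inspects the file rather than the rules currently loaded in the kernel.

```shell
# Return 0 if the rules file ($1) has an INPUT rule that accepts TCP traffic
# to the given destination port ($2). Commented-out lines are ignored because
# the pattern is anchored to lines that begin with "-A INPUT".
has_accept_rule() {
    grep -Eq -- "^-A INPUT .*-p tcp .*--dport $2( |\$).*-j ACCEPT" "$1"
}
```

For example, `has_accept_rule /etc/sysconfig/iptables 7791 || echo "port 7791 is not allowed"`, where 7791 is only a sample port number.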
Versions
Check for version compatibility.
- drbd 8.4.8 or later
- drbd util 8.9.10 or later
- fsr 1.4 or later
- bsr 1.0 or later
- Verify that the local DRX and remote DRX have the same version
[root@c65-3 build_files]# lsmod | grep drbd
drbd 374888 3
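The "or later" checks in the list above are comparisons of dotted version numbers, and a plain string comparison gets them wrong (for example, 8.4.11 is later than 8.4.8). The sketch below shows one way to compare them field by field. version_ge is a hypothetical helper, and the caller must supply the installed version string, because this guide does not say how each component reports its version.

```shell
# Return 0 if version $1 is greater than or equal to version $2.
# Versions are compared numerically, one dot-separated field at a time;
# missing fields count as 0, so 1.4 and 1.4.0 are treated as equal.
version_ge() {
    awk -v a="$1" -v b="$2" 'BEGIN {
        na = split(a, x, "."); nb = split(b, y, ".")
        n = (na > nb) ? na : nb
        for (i = 1; i <= n; i++) {
            xi = (i <= na) ? x[i] + 0 : 0
            yi = (i <= nb) ? y[i] + 0 : 0
            if (xi > yi) exit 0
            if (xi < yi) exit 1
        }
        exit 0
    }'
}
```

For example, `version_ge "$installed" 8.4.8 || echo "drbd is older than 8.4.8"`, where installed holds the version you looked up.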
Replication configuration
- Ensure that the resource configuration file is saved in ANSI or UTF-8 format (UTF-8 with BOM is not supported).
- If you changed the hostname, make sure the change is also applied to the configuration file.
- Verify that no communication port is duplicated in the configuration file.
- Verify with bsrsetup show that the IP loaded into the BSR is the same as the IP set in the resource file.
- Check whether wfc-timeout is set in the global entry. If it is not set, set wfc-timeout to 1.
- Add ping-timeout to the "net" entry of the resource. The default is 500 ms; set it to 30 (3 seconds, since the value is in tenths of a second) to allow more headroom.
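The duplicate-port check in the list above can be approximated by listing every IPv4 ip:port endpoint that appears more than once across the resource files. This is a minimal sketch, assuming endpoints are written as ip:port in the files. find_duplicate_endpoints is a hypothetical helper, and a repeated endpoint is only a hint to look closer, not proof of a misconfiguration.

```shell
# Print each IPv4 ip:port endpoint that occurs more than once in the given files.
# The same port on different IPs is not reported, because only the full
# ip:port pair is compared.
find_duplicate_endpoints() {
    grep -ohE '[0-9]{1,3}(\.[0-9]{1,3}){3}:[0-9]+' "$@" | sort | uniq -d
}
```

For example, `find_duplicate_endpoints drbd.d/1/*.res` prints nothing when every endpoint is unique. The path is taken from the log sample in this guide and may differ on your system.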
Check connections step by step
Connection between local and remote DRX
- Put every BSR resource into the STANDALONE state, for example: bsradm disconnect r0
- Install DRX and start the DRX service to connect both DRXs.
- Check the connection status of the DRX IP/port in netstat (it should be ESTABLISHED).
- If the connection is normal, the connection status of the resource is BRIDGED.
- At this point, DRX changes to the CONNECTING/WAITING state while it tries to connect to the BSR, and the BSR is still STANDALONE.
- If both DRXs remain in the BRIDGING state, they are still attempting to connect. If nothing changes after a while, check connectivity on the WAN leg first.
- ICMP ping is likely to be blocked by firewall policies, so do not rely on ping to determine connectivity. Use a network connectivity checker, such as the drxsim tool included with DRX, to check TCP connectivity between local and remote.
- Change the BSR resource configuration so that the BSRs connect directly, without DRX, and see whether they connect normally. If they do, the problem is with the DRX connection.
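Because ping may be blocked, the TCP reachability check mentioned above can be sketched without extra tools. The example below does not use drxsim, whose options are not covered in this guide. Instead it substitutes bash's built-in /dev/tcp redirection and the coreutils timeout command, and so assumes both are available. tcp_check is a hypothetical helper.

```shell
# Return 0 if a TCP connection to host ($1) and port ($2) succeeds within the
# timeout in seconds ($3, default 3); return non-zero if it is refused or
# times out. The connection is closed as soon as the inner bash exits.
tcp_check() {
    host=$1; port=$2; secs=${3:-3}
    timeout "$secs" bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$host" "$port" 2>/dev/null
}
```

For example, `tcp_check 192.168.100.2 7793 && echo reachable || echo unreachable`, using a sample address and port from the netstat output in this guide. A failure does not distinguish between a firewall drop and a service that is not listening.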
Connection between BSR and DRX
- Change the state of the BSR resource from STANDALONE to CONNECTING (bsradm connect).
- In the normal case, the connection between BSR and DRX becomes ESTABLISHED.
- If the BSR stays in the CONNECTING state and the connection is not established, check the netstat output to see whether the BSR IP is in the LISTEN state.
- Verify that the local DRX is attempting to connect to the local BSR IP (SYN_SENT state).
- Because TCP state changes can happen quickly, a single netstat run may not catch the SYN_SENT state.
- Monitor the netstat output continuously with a loop such as the following.
$> while (true); do date; netstat -nap | grep 779 | sort -k 3; sleep 1; clear; done
Thu Aug 23 08:51:23 PDT 2018
tcp 0 0 192.168.100.3:35814 192.168.100.3:7792 ESTABLISHED -
tcp 0 0 192.168.100.3:7791 0.0.0.0:* LISTEN -
tcp 0 0 192.168.100.3:7792 192.168.100.3:35814 ESTABLISHED 8033/drx
tcp 0 0 192.168.100.3:7793 192.168.100.2:60676 ESTABLISHED 8033/drx
tcp 0 0 192.168.100.3:7795 0.0.0.0:* LISTEN 8033/drx
tcp 0 0 192.168.100.3:7796 192.168.100.2:43684 ESTABLISHED 8033/drx
tcp 0 1 10.10.0.182:50460 31.1.1.2:7793 SYN_SENT 8033/drx
tcp 0 1 10.10.0.182:57966 31.1.1.2:7796 SYN_SENT 8033/drx
unix 3 [ ] STREAM CONNECTED 18779 2477/gconfd-2
unix 3 [ ] STREAM CONNECTED 20779 2512/gnome-panel
- Once the BSR and DRX are connected, verify in the netstat output that the connection between the resource's BSR IP and the DRX IP is in the ESTABLISHED state.
- Verify that the DRX logs contain no failure entries (e.g. connection refused).
- If the problem persists at this stage, collect the support files to obtain the logs and have them analyzed.
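When reading netstat output during the steps above, it can help to reduce it to a count of connections per TCP state for the ports in question, so that stuck SYN_SENT entries stand out. The following is a sketch, assuming the Linux `netstat -nap` column layout shown in this guide. summarize_states is a hypothetical helper that reads netstat output on standard input.

```shell
# Read `netstat -nap` output on stdin and print "STATE count" for every TCP
# line whose local or foreign address matches the regular expression in $1.
# Non-TCP lines (for example unix sockets) are skipped.
summarize_states() {
    awk -v pat="$1" '
        $1 ~ /^tcp/ && ($4 ~ pat || $5 ~ pat) { count[$6]++ }
        END { for (s in count) print s, count[s] }
    ' | sort
}
```

For example, `netstat -nap | summarize_states ':779[0-9]$'` summarizes ports 7790 to 7799, the range used in the sample output. Adjust the pattern to the ports configured for your resources.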
VIP unreachable
If both the Active and Standby nodes bind sockets to the same VIP, the two nodes may interfere with each other's communication. When DRX is used together with a VIP (SDR, MDR, etc.), the DRX on the Standby node must be stopped.
On failover to the Standby node the order is reversed: bring down the DRX on the Active node and start the DRX on the Standby node before the resources of the Active are started (up), so that the connection is established smoothly.