Troubleshooting


This section describes how to troubleshoot issues that may arise while configuring DRX.


Install errors

  • Failure to install the Visual C++ Redistributable Package for Visual Studio 2013 (hereafter VS2013 Redistributable Package) when installing DRX for Windows
    • Symptom
      • Installation of the "VS2013 Redistributable Package", which is installed automatically after the DRX installation, fails.
      • Cause: An inherent flaw in the VS2013 Redistributable Package.
    • Solution

      • Windows Server 2012 R2

        • Description: On Windows Server 2012 R2, the "VS2013 Redistributable Package" requires Windows Update KB2883200.
        • Solution: Check that Windows Update KB2883200 is installed. If it is not, install it through Windows Update.
      • Windows Server 2008 R2 SP1
        • Description: Error 0x800b010a occurs, as in the following log.
          • [0AD8:05C0][2018-07-26T15:33:04]e000: Error 0x800b010a: Failed authenticode verification of payload: C:\ProgramData\Package Cache\.unverified\vcRuntimeMinimum_x64
            [0AD8:05C0][2018-07-26T15:33:04]e000: Error 0x800b010a: Failed to verify signature of payload: vcRuntimeMinimum_x64
            [0AD8:05C0][2018-07-26T15:33:04]e310: Failed to verify payload: vcRuntimeMinimum_x64 at path: C:\ProgramData\Package Cache\.unverified\vcRuntimeMinimum_x64, error: 0x800b010a. Deleting file.
        • Solution: Run Windows Update and install the additional updates listed for ".NET Framework 3.5.1".

Unable to start a resource

  • Settings cannot be read because the file is saved as UTF-8 with BOM
    • Symptom

      • Failed to read drx.conf

        DRX log
        E1120 16:37:02.690660 t42053 config] Failed to load [/opt/DRX/drx.conf]. /opt/DRX/drx.conf(1): '=' character not found in line
      • Failed to read BSR settings

        DRX log
        E1120 16:37:52.810044 t42132 config] Failed to get drbd configuration: Can't get drbd configuration. (exit_code: 2560)
        E1120 16:37:52.810068 t42132 config] Output: drbd.d/1/r0.res:1: Parse error: 'global | common | resource | skip | include' expected,
        E1120 16:37:52.810070 t42132 config] Output: but got '▒'
      • Cause: The configuration file cannot be parsed because of the byte order mark (BOM) at the start of the file.
    • Solution

      • CentOS 6, 7
        • Check the file's encoding with the file command.

          [root@drxdev1 test]# file r1.res
          r1.res: UTF-8 Unicode (with BOM) text, with CRLF line terminators
        • Open the file with vi, type the following, and save.
          :set nobomb
      • Windows
        • Open the file with Notepad, choose 'Save As', and change the encoding to 'ANSI'.
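On Linux, the BOM check and removal described above can also be scripted instead of done by hand in vi. The following is a minimal sketch, assuming GNU sed and coreutils; the function names has_bom and strip_bom are hypothetical helpers and are not part of DRX or BSR.

```shell
# Hypothetical helpers (not part of DRX/BSR); assume GNU sed and coreutils.

# has_bom FILE: succeeds if FILE starts with the UTF-8 BOM bytes EF BB BF.
has_bom() {
    [ "$(head -c 3 "$1" | od -An -tx1 | tr -d ' \n')" = "efbbbf" ]
}

# strip_bom FILE: removes a leading UTF-8 BOM from FILE in place.
# LC_ALL=C makes sed treat the file as raw bytes.
strip_bom() {
    LC_ALL=C sed -i '1s/^\xEF\xBB\xBF//' "$1"
}

# Example usage (paths taken from the log messages above):
#   has_bom /opt/DRX/drx.conf && strip_bom /opt/DRX/drx.conf
```

Back up the configuration file before editing it in place, and re-run the file command afterwards to confirm that "(with BOM)" is no longer reported.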

Unable to connect

A DRX connection can fail to establish for many reasons, so check each item carefully, following the order of the replication connection configuration procedure. The sequence below is described for Linux; the same sequence applies to Windows.

Network environment

  1. Verify that the BSR IP and the DRX IP are in the allowlist of the node's firewall policy. If the IP and port used by the resource are not allowed, do the following:
    1. CentOS 6

      Add the required rule to the /etc/sysconfig/iptables file.

      -A INPUT -p tcp -s {source ip} -d {destination ip} --dport {allowed port} -j ACCEPT
    2. CentOS 7

      firewall-cmd --permanent --zone=public --add-port={allowed port}/tcp
      firewall-cmd --reload
      firewall-cmd --zone=public --list-all
  2. Check the loopback ping
    1. If the loopback address (127.0.0.1) responds to ping but the local IP address does not, the network environment is misconfigured. In this case, contact your network administrator.
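Because the firewall rules above are set per port, a TCP-level probe can confirm whether a given port is actually reachable. The following is a minimal sketch that uses bash's /dev/tcp pseudo-device and the coreutils timeout command rather than any DRX tool; the function name check_tcp is a hypothetical helper, and the probe only succeeds if something is listening on the target port.

```shell
# Hypothetical helper (not part of DRX/BSR).
# check_tcp HOST PORT: succeeds if a TCP connection to HOST:PORT
# can be opened within 3 seconds.
check_tcp() {
    timeout 3 bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$1" "$2" 2>/dev/null
}

# Example usage (replace with the IP and port used by your resource):
#   if check_tcp 192.168.100.2 7793; then echo "reachable"; else echo "blocked or closed"; fi
```

A failure means either that the port is blocked or that no process is listening on it, so run the probe while the peer service is up.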

Versions

Check for version compatibility.

  • drbd 8.4.8 or later
  • drbd util 8.9.10 or later
  • fsr 1.4 or later
  • bsr 1.0 or later
  • Verify that the local DRX and remote DRX have the same version
[root@c65-3 build_files]# lsmod | grep drbd
drbd                  374888  3 
[root@c65-3 build_files]# 
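
The minimum versions listed above can be compared in a script. The following is a minimal sketch, assuming GNU sort with the -V (version sort) option; version_ge is a hypothetical helper, and obtaining the installed version with modinfo assumes the drbd kernel module exposes a version field.

```shell
# Hypothetical helper (not part of DRX/BSR); assumes GNU sort -V.
# version_ge A B: succeeds if version A is greater than or equal to version B.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Example usage (assumes the drbd module reports its version via modinfo):
#   version_ge "$(modinfo -F version drbd)" 8.4.8 && echo "drbd version OK"
```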


Replication configuration

  • Ensure that the resource configuration file is saved in ANSI or UTF-8 format (UTF-8 with BOM is not supported).
  • If you made any changes to the hostname, make sure they are also applied to the configuration file.
  • Verify that there are no duplicate communication ports in the configuration file.
  • Use bsrsetup show to verify that the IP loaded into the BSR is the same as the IP set in the resource file.
  • Check whether wfc-timeout is set in the global entry. If it is not, set wfc-timeout to 1.
  • Add ping-timeout to the "net" entry of the resource. The default is 500 ms; set it to 30 (3 seconds, the value being in units of 0.1 second) to allow some headroom.
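
The duplicate-port check above can be approximated with a script. The following is a minimal sketch, assuming the resource files contain IPv4 endpoints written as ip:port (as in DRBD-style address lines); find_duplicate_endpoints is a hypothetical helper, and an endpoint listed by it is a candidate for review rather than a confirmed error.

```shell
# Hypothetical helper (not part of DRX/BSR); assumes GNU grep.
# find_duplicate_endpoints FILE...: prints every IPv4 "ip:port" endpoint
# that appears more than once across the given files.
find_duplicate_endpoints() {
    grep -ohE '[0-9]{1,3}(\.[0-9]{1,3}){3}:[0-9]+' "$@" | sort | uniq -d
}

# Example usage (the directory name follows the drbd.d path in the log above;
# adjust it to where your resource files are stored):
#   find_duplicate_endpoints drbd.d/*.res
```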


Check connections step by step


  1. Connection between local and remote DRX

    1. Change all resources in the BSR to STANDALONE: bsradm disconnect r0
    2. Install DRX and start the DRX service to connect both DRXs.
    3. Check the connection status of the DRX IP/port in netstat (the connection status should be ESTABLISHED).
    4. If the connection is normal, the connection status of the resource is BRIDGED.
      1. At this point, DRX changes to the CONNECTING/WAITING state and tries to connect to the BSR, while the BSR remains STANDALONE.
    5. If both DRXs remain in the BRIDGING state, they are still attempting to connect. If nothing changes after a while, check connectivity on the WAN leg first.
      1. ICMP ping is often blocked by firewall policies, so do not rely on ping to determine connectivity. Use a network connectivity checker, such as drxsim included with DRX, to check TCP connectivity between the local and remote nodes.
    6. Change the BSR resource configuration so that the BSRs connect directly, without going through DRX, and check whether they connect normally. If they do, the problem lies in the DRX connection.
  2. Connection between BSR and DRX
    1. Change the state of the BSR resource from STANDALONE to CONNECTING (bsradm connect).
      1. Normally, the connection between the BSR and DRX becomes ESTABLISHED.
    2. If the BSR stays in CONNECTING and the connection is not established, check the netstat output to see whether the BSR IP is in the LISTEN state.
    3. Verify that the local DRX is attempting to connect to the local BSR IP (SYN_SENT state).
      1. Because TCP state changes can happen quickly, netstat may not capture the SYN_SENT state.
      2. Monitor the netstat output continuously with a loop such as the following.
        $> while true; do date; netstat -nap | grep 779 | sort -k 3; sleep 1; clear; done
        Thu Aug 23 08:51:23 PDT 2018
        tcp        0      0 192.168.100.3:35814         192.168.100.3:7792          ESTABLISHED -                  
        tcp        0      0 192.168.100.3:7791          0.0.0.0:*                   LISTEN      -                  
        tcp        0      0 192.168.100.3:7792          192.168.100.3:35814         ESTABLISHED 8033/drx           
        tcp        0      0 192.168.100.3:7793          192.168.100.2:60676         ESTABLISHED 8033/drx           
        tcp        0      0 192.168.100.3:7795          0.0.0.0:*                   LISTEN      8033/drx           
        tcp        0      0 192.168.100.3:7796          192.168.100.2:43684         ESTABLISHED 8033/drx           
        tcp        0      1 10.10.0.182:50460           31.1.1.2:7793               SYN_SENT    8033/drx           
        tcp        0      1 10.10.0.182:57966           31.1.1.2:7796               SYN_SENT    8033/drx           
        unix  3      [ ]         STREAM     CONNECTED     18779  2477/gconfd-2      
        unix  3      [ ]         STREAM     CONNECTED     20779  2512/gnome-panel   
    4. Once the BSR and DRX are connected, verify that the resource's BSR IP and DRX IP are in the ESTABLISHED state in the netstat output.
    5. Verify that the DRX logs contain no failure messages (e.g. connection refused).
  3. If the problem persists after all of the steps above, collect the support files and have the logs analysed.
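
When following the netstat checks above, it can help to reduce the output to a count of TCP states per port range, so that transient states such as SYN_SENT stand out. The following is a minimal sketch, assuming the Linux netstat -nap column layout shown in the sample output; summarize_states is a hypothetical helper and is not part of DRX.

```shell
# Hypothetical helper (not part of DRX/BSR).
# summarize_states PREFIX: reads `netstat -nap` output on stdin and prints
# "STATE COUNT" for TCP lines whose local or remote port starts with PREFIX.
summarize_states() {
    awk -v p="$1" '
        $1 ~ /^tcp/ && ($4 ~ (":" p "[0-9]*$") || $5 ~ (":" p "[0-9]*$")) { n[$6]++ }
        END { for (s in n) print s, n[s] }
    ' | sort
}

# Example usage (779 matches the ports 7791-7796 used in the sample above):
#   netstat -nap 2>/dev/null | summarize_states 779
```

The unix socket lines that grep 779 picks up by accident (for example inode 18779) are excluded, because only lines whose first column starts with tcp are counted.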


VIP unreachable

If DRX on both the Active and Standby nodes binds sockets to the same VIP, the two nodes may interfere with each other's communication. When DRX works together with a VIP (SDR, MDR, etc.), the DRX on the Standby node must be stopped.

When failing over to the Standby node, the order is reversed: stop (down) the DRX on the Active node and start (up) the DRX on the Standby node before starting (up) the resources of the Active, to ensure a smooth connection.