Troubleshooting guides for issues that may arise in the process of configuring DRX.


Install errors

Unable to start a resource

Unable to connect

Because a DRX connection can fail for many reasons, follow the order of the replication connection configuration procedure and check each step carefully. The sequence below is described for Linux and applies equally to Windows.

Network environment

  1. Verify that the BSR and DRX IP addresses are in the node's firewall allowlist. If the IP and port used by the resource are not allowed, do the following:
    1. CentOS 6

      Add the rule you need to the /etc/sysconfig/iptables file.

      -A INPUT -p tcp -s {source ip} -d {destination ip} --dport {allowed port} -j ACCEPT


    2. CentOS 7

      firewall-cmd --permanent --zone=public --add-port={port to allow}/tcp
      firewall-cmd --reload
      firewall-cmd --zone=public --list-all


  2. Check the loopback ping
    1. If there is a ping response to the loopback address (127.0.0.1) but no ping response to the local IP address, there is a problem with the configuration of your network environment. In this case, you should contact your network administrator.
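
For the firewall step above, the CentOS 6 rule lines can be generated rather than typed by hand. The following is a minimal sketch: iptables_rule is a hypothetical helper name, and the addresses and ports are example values borrowed from the netstat sample later in this guide, not required settings.

```shell
# Hypothetical helper: print one rule line in the format used by
# /etc/sysconfig/iptables (source IP, destination IP, destination port).
iptables_rule() {
    printf -- '-A INPUT -p tcp -s %s -d %s --dport %s -j ACCEPT\n' "$1" "$2" "$3"
}

# Example values only: peer 192.168.100.2, local 192.168.100.3.
for port in 7793 7796; do
    iptables_rule 192.168.100.2 192.168.100.3 "$port"
done
```

Review the printed lines before adding them to the file.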

Versions

Check for version compatibility.

[root@c65-3 build_files]# lsmod | grep drbd
drbd                  374888  3 
[root@c65-3 build_files]# 


Replication configuration


Check connections step by step


  1. Connection between local and remote DRX

    1. Change all resources in the BSR to STANDALONE: bsradm disconnect r0
    2. Install DRX and start the DRX service to connect both DRXs.
    3. Check the connection status of the drx ip/port in netstat (the connection status is ESTABLISHED).
    4. If normal, the connection status of the resource is BRIDGED.
      1. At this point, the DRX will change to CONNECTING/WAITING state, trying to connect to the BSR, and the BSR is still STANDALONE.
    5. If both DRX instances remain in the BRIDGING state, they are still attempting to connect. If nothing changes after a while, check connectivity on the WAN leg first.
      1. ICMP ping is often blocked by firewall policy, so do not rely on ping to judge connectivity. Use a network connectivity checker, such as drxsim included with DRX, to check TCP connectivity between local and remote.
    6. Change the BSR resource configuration to connect directly between the BSRs without involving DRX to see if it connects normally. If it connects normally, the problem is with the DRX connection.
  2. Connection between BSR and DRX
    1. Change the state of the BSR resource from STANDALONE to CONNECTING (bsradm connect).
      1. Normally, the BSR and DRX connection becomes ESTABLISHED.
    2. If the BSR status is CONNECTING and the connection is not established, check the netstat output to see whether the BSR IP is in the LISTEN state.
    3. Verify that the local DRX is attempting to connect (SYN_SENT) to the local BSR IP.
      1. Because TCP state changes can happen quickly, netstat may not catch the SYN_SENT state.
      2. Monitor the netstat output continuously with a loop such as the following.



        $> while true; do date; netstat -nap | grep 779 | sort -k 3; sleep 1; clear; done
        Thu Aug 23 08:51:23 PDT 2018
        tcp        0      0 192.168.100.3:35814         192.168.100.3:7792          ESTABLISHED -                  
        tcp        0      0 192.168.100.3:7791          0.0.0.0:*                   LISTEN      -                  
        tcp        0      0 192.168.100.3:7792          192.168.100.3:35814         ESTABLISHED 8033/drx           
        tcp        0      0 192.168.100.3:7793          192.168.100.2:60676         ESTABLISHED 8033/drx           
        tcp        0      0 192.168.100.3:7795          0.0.0.0:*                   LISTEN      8033/drx           
        tcp        0      0 192.168.100.3:7796          192.168.100.2:43684         ESTABLISHED 8033/drx           
        tcp        0      1 10.10.0.182:50460           31.1.1.2:7793               SYN_SENT    8033/drx           
        tcp        0      1 10.10.0.182:57966           31.1.1.2:7796               SYN_SENT    8033/drx           
        unix  3      [ ]         STREAM     CONNECTED     18779  2477/gconfd-2      
        unix  3      [ ]         STREAM     CONNECTED     20779  2512/gnome-panel   



    4. Once the BSR and DRX are connected, verify that the resource's BSR IP and the DRX IP are in the ESTABLISHED state in the netstat output.
    5. Verify that the DRX logs contain no failure messages (e.g. connection refused).
  3. If the problem persists at this stage, collect the support files and have the logs analyzed.
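
The netstat checks in the steps above (LISTEN, SYN_SENT, ESTABLISHED) can be summarized per port instead of read line by line. The sketch below assumes the Linux netstat column layout shown in the sample output (local address in column 4, foreign address in column 5, state in column 6). port_states is a hypothetical helper name, and port 7793 is only an example. Unlike a plain grep for 779, it matches the port number exactly, so unrelated unix socket lines are ignored.

```shell
# Hypothetical helper: read netstat output on stdin and count TCP states
# for connections whose local or foreign port equals the given port.
port_states() {
    awk -v p="$1" '
        $1 ~ /^tcp/ {
            nl = split($4, l, ":")   # local address, port is the last field
            nf = split($5, f, ":")   # foreign address, port is the last field
            if (l[nl] == p || f[nf] == p) print $6
        }' | sort | uniq -c | awk '{ print $2 "=" $1 }'
}

# Usage (port 7793 is an example value):
#   netstat -nat | port_states 7793
```

Each output line has the form STATE=count, so a port that shows only SYN_SENT entries over several runs points to a connection that is being attempted but not completed.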


VIP unreachable

If both the Active and Standby nodes bind sockets to the same VIP, communication between the two nodes may be disrupted. When operating with a VIP (SDR, MDR, etc.), the DRX on the standby node must be stopped.

When failing over to the standby node, the reverse applies: stop the DRX on the Active node and start the DRX on the Standby node before the resources of the Active are started, to ensure a smooth connection.