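  1. Verify that the BSR and DRX IP addresses are included in the node's firewall allowlist. If the IP addresses and ports used by the resource are not allowed, add rules as follows:
    1. CentOS 6

      Add an ACCEPT rule to /etc/sysconfig/iptables for each allowed source IP, destination IP, and port, then apply it with service iptables restart. Place the rule before any REJECT rule in the INPUT chain, otherwise it never matches.

      Code Block
      -A INPUT -p tcp -s {source ip} -d {destination ip} --dport {allowed port} -j ACCEPT


    2. CentOS 7

      Open the port in the public zone, reload the configuration, and list the zone to confirm the change.

      Code Block
      firewall-cmd --permanent --zone=public --add-port={allowed port}/tcp
      firewall-cmd --reload
      firewall-cmd --zone=public --list-all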


  2. Check the loopback ping.
    1. If the loopback address (127.0.0.1) responds to ping but the local IP address does not, the problem lies in the network configuration. In this case, contact your network administrator.
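The loopback check in step 2 can be scripted. The following is a minimal sketch that assumes the Linux iputils ping (-c is the packet count, -W is the reply timeout in seconds). check_ping is a hypothetical helper name, and the local IP is whatever address your resource uses.

```shell
# check_ping LOCAL_IP
# Pings 127.0.0.1 and LOCAL_IP once each and prints a short diagnosis.
# Returns 0 only if both addresses respond.
check_ping() {
  local local_ip="$1"
  local loop_ok=0 local_ok=0
  ping -c 1 -W 1 127.0.0.1 >/dev/null 2>&1 && loop_ok=1
  ping -c 1 -W 1 "$local_ip" >/dev/null 2>&1 && local_ok=1

  if [ "$loop_ok" -eq 1 ] && [ "$local_ok" -eq 1 ]; then
    echo "OK: loopback and $local_ip both respond"
    return 0
  elif [ "$loop_ok" -eq 1 ]; then
    # Loopback works but the local IP does not: network configuration issue.
    echo "NETWORK CONFIG: loopback responds but $local_ip does not; contact your network administrator"
    return 1
  else
    echo "LOOPBACK FAIL: 127.0.0.1 does not respond"
    return 1
  fi
}
```

For example, `check_ping 192.168.100.3` on the node whose local IP is 192.168.100.3.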


Versions

Check for version compatibility.

  • drbd 8.4.8 or later
  • drbd-utils 8.9.10 or later
  • fsr 1.4 or later
  • bsr 1.0 or later
  • Verify that the local DRX and remote DRX have the same version.

To check whether the drbd kernel module is loaded:

Code Block
[root@c65-3 build_files]# lsmod | grep drbd
drbd                  374888  3
[root@c65-3 build_files]#


Checking the DRX version

Make sure the DRX version on the local node matches the DRX version on the remote node. DRX is backward compatible between versions, but we recommend running the same DRX version on both nodes whenever possible.
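The version checks above can be scripted with GNU sort -V, which orders dotted version strings numerically. This is a sketch only. The minimum versions are copied from the list above, the helper names are hypothetical, and you supply the installed version strings yourself. For the drbd kernel module, `modinfo -F version drbd` reports the version of the installed module.

```shell
# version_ge HAVE NEED: succeed if version HAVE >= NEED (GNU sort -V order).
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# min_version NAME: print the minimum version taken from the list above.
min_version() {
  case "$1" in
    drbd)       echo 8.4.8 ;;
    drbd-utils) echo 8.9.10 ;;
    fsr)        echo 1.4 ;;
    bsr)        echo 1.0 ;;
    *)          return 1 ;;
  esac
}

# check_min NAME HAVE: report whether installed version HAVE meets the minimum.
check_min() {
  local name="$1" have="$2" need
  need=$(min_version "$name") || { echo "unknown component: $name"; return 2; }
  if version_ge "$have" "$need"; then
    echo "$name $have OK (>= $need)"
  else
    echo "$name $have TOO OLD (need >= $need)"
    return 1
  fi
}

# check_same_drx LOCAL REMOTE: DRX should run the same version on both nodes.
check_same_drx() {
  if [ "$1" = "$2" ]; then
    echo "DRX versions match ($1)"
  else
    echo "DRX versions differ: local $1, remote $2"
    return 1
  fi
}
```

For example, `check_min drbd "$(modinfo -F version drbd)"` checks the drbd module against the minimum.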

Check resource settings


Replication configuration

  • Ensure that the resource configuration file is saved in ANSI or UTF-8 format. UTF-8 with BOM is not supported.

BSR Configuration

  • If you changed the hostname, make sure the change is also applied to the configuration file.
  • Verify that there are no duplicate communication ports in the configuration file.
  • Use bsrsetup show to verify that the IP loaded into BSR matches the IP set in the resource file.
  • Check whether wfc-timeout is set in the global entry. If it is not set, set wfc-timeout to 1.
  • Add ping-timeout to the "net" entry of the resource. The value is in tenths of a second, and the default is 500 ms. Set it to 30 (3 seconds) to allow a generous margin.
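Some of the configuration checks above can be scripted. The following is a rough sketch. It assumes DRBD-style resource file syntax ("address IP:PORT;", "option value;"), which may not match your BSR files exactly. The helper names are hypothetical.

```shell
# has_utf8_bom FILE: succeed if FILE starts with the UTF-8 BOM (EF BB BF),
# which this guide says is not supported.
has_utf8_bom() {
  [ "$(head -c 3 "$1" | od -An -tx1 | tr -d ' \n')" = "efbbbf" ]
}

# list_duplicate_endpoints FILE: print every IP:PORT pair that appears in
# more than one "address" line. The same port on different IPs is not
# reported, because both nodes of one resource often share a port.
list_duplicate_endpoints() {
  grep -Eo 'address[[:space:]]+(ipv4[[:space:]]+)?[0-9.]+:[0-9]+' "$1" \
    | awk '{print $NF}' \
    | sort | uniq -d
}

# has_option FILE NAME: succeed if option NAME (e.g. wfc-timeout or
# ping-timeout) appears outside a "#" comment. This is a coarse check:
# it does not verify which section the option is in.
has_option() {
  sed 's/#.*//' "$1" | grep -E "(^|[[:space:]{;])$2[[:space:]]" >/dev/null
}
```

For example, `has_utf8_bom "$res_file" && echo "re-save without BOM"`, where `res_file` is the path to your resource file.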

DRX Configuration

Connection between DRXs

Check connections step by step


  1. Connection between local and remote DRX

    1. Change all resources in BSR to STANDALONE, for example: bsradm disconnect r0
    2. Install DRX, start the DRX service (drxsvc), and check the connection between the two DRXs.
    3. In the netstat output, check the state of the DRX IP and port (LISTEN, ESTABLISHED, or TIME_WAIT). When the DRXs are connected, the state is ESTABLISHED.
    4. If the connection is normal, the connection status of the resource is BRIDGED.
      1. At this point BSR is still STANDALONE, and DRX switches to CONNECTING/WAITING while it tries to connect to BSR.
    5. If both DRXs are still BRIDGING, they are trying to connect to each other. If nothing changes after a while, check connectivity on the WAN segment first.
      1. ICMP ping is often blocked by firewall policies, so do not rely on ping to judge connectivity. Use a TCP connectivity checker, such as drxsim (included with DRX), to check whether a TCP connection can be made between the local and remote nodes.
    6. Change the BSR resource configuration to connect the BSRs directly, without DRX, and check whether they connect normally. If they do, the problem lies in the DRX connection.
  2. Connection between BSR and DRX
    1. Change the state of the BSR resource from STANDALONE to CONNECTING with bsradm connect (for example, bsradm connect r0).
      • Check the output of cat /proc/kmsg to confirm that the resource state changes to Connecting.
      • Normally, BSR and DRX then connect and the connection is ESTABLISHED.
    2. If the BSR status stays CONNECTING and the connection is not established, check the netstat output to see whether the BSR IP is in the LISTEN state.
    3. Verify that the local DRX is attempting to connect to the local BSR IP (SYN_SENT).
      1. Because TCP states can change quickly, netstat may miss the SYN_SENT state.
      2. Monitor the netstat output continuously with a loop such as the following.


        Info


        $> while true; do date; netstat -nap | grep 779 | sort -k 3; sleep 1; clear; done
        Thu Aug 23 08:51:23 PDT 2018
        tcp        0      0 192.168.100.3:35814     192.168.100.3:7792      ESTABLISHED -
        tcp        0      0 192.168.100.3:7791      0.0.0.0:*               LISTEN      -
        tcp        0      0 192.168.100.3:7792      192.168.100.3:35814     ESTABLISHED 8033/drx
        tcp        0      0 192.168.100.3:7793      192.168.100.2:60676     ESTABLISHED 8033/drx
        tcp        0      0 192.168.100.3:7795      0.0.0.0:*               LISTEN      8033/drx
        tcp        0      0 192.168.100.3:7796      192.168.100.2:43684     ESTABLISHED 8033/drx
        tcp        0      1 10.10.0.182:50460       31.1.1.2:7793           SYN_SENT    8033/drx
        tcp        0      1 10.10.0.182:57966       31.1.1.2:7796           SYN_SENT    8033/drx
        unix  3      [ ]         STREAM     CONNECTED     18779  2477/gconfd-2
        unix  3      [ ]         STREAM     CONNECTED     20779  2512/gnome-panel
    4. Once BSR and DRX are connected, verify in the netstat output that the resource's BSR IP and the DRX IP are in the ESTABLISHED state.
    5. Check the DRX log for failure messages (for example, connection refused).
  3. If you reach this stage, collect the support files so the logs can be analyzed. Include the following:
    1. The output of cat /etc/sysconfig/network-scripts/ifcfg-*
    2. /var/log/messages
    3. The output of service iptables status
    4. The output of ip a
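Parts of the step-by-step checks above can be scripted. The following is a sketch with hypothetical helper names:

- tcp_check is a rough stand-in for a TCP reachability test. It uses bash's /dev/tcp redirection and coreutils timeout. It is not drxsim and does not replace it.
- conn_state filters `netstat -nap` output by port and TCP state. This helps spot the brief SYN_SENT entries described in step 2.
- collect_logs gathers the items listed in step 3 into a tar file.

```shell
# tcp_check HOST PORT [TIMEOUT]: succeed if a TCP connection to HOST:PORT
# opens within TIMEOUT seconds (default 3).
tcp_check() {
  local host="$1" port="$2" secs="${3:-3}"
  timeout "$secs" bash -c ': < "/dev/tcp/$0/$1"' "$host" "$port" 2>/dev/null
}

# conn_state PORT STATE: read `netstat -nap` output on stdin and print the
# TCP lines whose local (4th column) or foreign (5th column) address ends
# in :PORT and whose state (6th column) is STATE, e.g. SYN_SENT or LISTEN.
conn_state() {
  awk -v port="$1" -v state="$2" '
    $1 ~ /^tcp/ && $6 == state && ($4 ~ (":" port "$") || $5 ~ (":" port "$"))'
}

# collect_logs OUTDIR: save the step 3 items under OUTDIR and pack them as
# OUTDIR.tar. Missing commands or files do not stop the function; their
# errors go to the matching output file (or are discarded for messages).
collect_logs() {
  local out="$1"
  mkdir -p "$out" || return 1
  cat /etc/sysconfig/network-scripts/ifcfg-* > "$out/ifcfg.txt" 2>&1
  cp /var/log/messages "$out/messages" 2>/dev/null
  service iptables status > "$out/iptables_status.txt" 2>&1
  ip a > "$out/ip_a.txt" 2>&1
  tar -cf "$out.tar" -C "$(dirname "$out")" "$(basename "$out")"
}
```

For example, `tcp_check 31.1.1.2 7793 && echo reachable` tests the remote DRX port, and `netstat -nap | conn_state 7793 SYN_SENT` shows pending connection attempts to that port.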


VIP unreachable

If both the Active and Standby nodes bind sockets to the same VIP, the two nodes can interfere with each other's communication. When DRX is used with a VIP (SDR, MDR, etc.), DRX on the Standby node must be stopped.

When failing over to the Standby node, the reverse applies. To ensure a smooth connection, stop (down) DRX on the Active node and start (up) DRX on the Standby node before the resources are started (up) on the node taking the Active role.
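The failover ordering above can be written as a short script. This is a sketch only, and every command in it is an assumption. The service name drxsvc comes from the connection steps earlier in this guide. The ssh transport, the host names, and `bsradm up r0` as the command that brings the resource up are hypothetical placeholders to adjust for your installation.

```shell
# Hypothetical commands; override them to match your installation.
DRX_STOP_CMD=${DRX_STOP_CMD:-"service drxsvc stop"}
DRX_START_CMD=${DRX_START_CMD:-"service drxsvc start"}
RES_UP_CMD=${RES_UP_CMD:-"bsradm up r0"}

# run_on HOST CMD: run CMD on HOST (assumes passwordless ssh between nodes).
run_on() {
  ssh "$1" "$2"
}

# switch_drx OLD_ACTIVE NEW_ACTIVE: apply the failover order described above
# and stop at the first failing step.
switch_drx() {
  local old="$1" new="$2"
  run_on "$old" "$DRX_STOP_CMD"  || return 1  # 1. DRX down on the old Active
  run_on "$new" "$DRX_START_CMD" || return 1  # 2. DRX up on the node taking over
  run_on "$new" "$RES_UP_CMD"                 # 3. only then bring resources up
}
```

The key design point is the ordering: DRX must never run on both nodes at once while they share the VIP. For that reason, each step aborts the sequence if the previous one failed.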