This guide covers problems that may arise while configuring DRX.

Installation errors

  • The Visual C++ Redistributable Package for Visual Studio 2013 (hereafter "VS2013 Redistributable Package") fails to install when installing DRX for Windows
    • Symptom
      • The VS2013 Redistributable Package, which is installed automatically after DRX is installed, fails with an installation error.
      • Cause: An inherent issue in the VS2013 Redistributable Package.
    • Solution

      • Windows Server 2012 R2

        • Description: On Windows Server 2012 R2, the VS2013 Redistributable Package requires Windows Update KB2883200.
        • Solution: Ensure that Windows Update KB2883200 is installed. If it is not, install it through Windows Update.
      • Windows Server 2008 R2 SP1
        • Description: Error 0x800b010a occurs.

          • Code Block
            [0AD8:05C0][2018-07-26T15:33:04]e000: Error 0x800b010a: Failed authenticode verification of payload: C:\ProgramData\Package Cache\.unverified\vcRuntimeMinimum_x64
            [0AD8:05C0][2018-07-26T15:33:04]e000: Error 0x800b010a: Failed to verify signature of payload: vcRuntimeMinimum_x64
            [0AD8:05C0][2018-07-26T15:33:04]e310: Failed to verify payload: vcRuntimeMinimum_x64 at path: C:\ProgramData\Package Cache\.unverified\vcRuntimeMinimum_x64, error: 0x800b010a. Deleting file.


        • Solution: Install the additional ".NET Framework 3.5.1" updates through Windows Update.

...

Unable to start a resource

  • Configuration files fail to load because they are saved as UTF-8 with BOM
    • Symptom

      • Failed to read drx.conf.

        Code Block
        DRX log
        E1120 16:37:02.690660 t42053 config] Failed to load [/opt/DRX/drx.conf]. /opt/DRX/drx.conf(1): '=' character not found in line


      • Failed to read BSR settings.

        Code Block
        DRX log
        E1120 16:37:52.810044 t42132 config] Failed to get drbd configuration: Can't get drbd configuration. (exit_code: 2560)
        E1120 16:37:52.810068 t42132 config] Output: drbd.d/1/r0.res:1: Parse error: 'global | common | resource | skip | include' expected,
        E1120 16:37:52.810070 t42132 config] Output: but got '▒'


      • Cause: Parsing of the configuration file fails because of the BOM (byte order mark) at the start of the file.
    • Solution

      • CentOS 6, 7
        • Check the file's encoding with the file command.

          Code Block
          [root@drxdev1 test]# file r1.res
          r1.res: UTF-8 Unicode (with BOM) text, with CRLF line terminators


        • Re-encoding via vi
          Open the file with vi, type the following, and save it.
          :set nobomb
      • Windows
        • Open the file with notepad and change the encoding to 'ANSI' via 'Save As'.
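
The same check and fix can also be scripted. The following is a minimal sketch, assuming the only problem is a leading UTF-8 byte order mark (bytes EF BB BF). The function names are illustrative and are not part of DRX or BSR.

```python
import codecs
from pathlib import Path


def has_utf8_bom(path):
    """Return True if the file starts with the UTF-8 byte order mark."""
    with open(path, "rb") as f:
        return f.read(len(codecs.BOM_UTF8)) == codecs.BOM_UTF8


def strip_utf8_bom(path):
    """Rewrite the file without its leading BOM.

    Returns True if a BOM was found and removed, False if the file was
    left untouched. Line endings (for example CRLF) are not changed.
    """
    data = Path(path).read_bytes()
    if not data.startswith(codecs.BOM_UTF8):
        return False
    Path(path).write_bytes(data[len(codecs.BOM_UTF8):])
    return True
```

For example, `strip_utf8_bom("/opt/DRX/drx.conf")` would remove the BOM from the file named in the log above. Back up the file first, since it is rewritten in place.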

Unable to connect

There are many possible reasons why a DRX connection might not be established. Follow the order of the replication connection configuration procedure and check each of the following items carefully. The sequence below is written for Linux, but it applies equally to Windows environments.

Network environment

  1. Verify that the BSR and DRX IP addresses are on the allowlist of the node's firewall policy. If the policy has not been applied to the IP and port used by the resource, do the following.
    1. CentOS 6

      Add the required rule to the /etc/sysconfig/iptables file.

      Code Block
      -A INPUT -p tcp -s {source IP} -d {destination IP} --dport {allowed port} -j ACCEPT


    2. CentOS 7

      Code Block
      Command to add a port: firewall-cmd --permanent --zone=public --add-port={allowed port}/tcp
      Command to reload the firewall: firewall-cmd --reload
      Command to list open ports: firewall-cmd --zone=public --list-all


  2. Check ping
    1. If the loopback address (127.0.0.1) responds to ping but the local IP address does not, the network environment is misconfigured. In this case, contact your network administrator.
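
After the firewall rule is in place, a plain TCP connect attempt is one way to confirm that the resource's IP and port are reachable. This also works where ICMP ping is filtered. The sketch below uses only the Python standard library. The function name and the address in the usage comment are illustrative assumptions.

```python
import socket


def tcp_port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout.

    A refused connection, a timeout, or an unreachable host all return
    False, so a False result only says that the port could not be reached
    from this node, not why.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Usage (hypothetical remote address and port):
#   tcp_port_open("192.168.100.2", 7793)
```

Run the check from the node that initiates the connection, because firewall rules are usually direction-specific.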

...

Version

  1. drbd: 8.4.8 or later
  2. drbd-utils: 8.9.10 or later
  3. Verify that the drbd kernel driver module is loaded with the lsmod | grep drbd command.


      Code Block
      [root@c65-3 build_files]# lsmod | grep drbd
      drbd                  374888  3 
      [root@c65-3 build_files]# 


  4. fsr: 1.2 or later

  5. bsr: all versions are supported
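
Version strings should be compared numerically, component by component, because a plain string comparison would rank "8.4.10" below "8.4.8". The following is a minimal sketch of such a comparison using the minimums listed above. The helper names and the `MINIMUMS` table are illustrative, and how the installed version string is obtained is left out.

```python
import re

# Minimum versions taken from the list above.
MINIMUMS = {"drbd": "8.4.8", "drbd-utils": "8.9.10", "fsr": "1.2"}


def parse_version(text):
    """Turn the leading dotted number of a version string into a tuple of ints.

    "8.4.8" -> (8, 4, 8); a suffix such as "-1" is ignored.
    """
    match = re.match(r"\s*(\d+(?:\.\d+)*)", text)
    if match is None:
        raise ValueError("no version number found in %r" % text)
    return tuple(int(part) for part in match.group(1).split("."))


def meets_minimum(installed, minimum):
    """Return True if the installed version is at least the minimum version."""
    a, b = parse_version(installed), parse_version(minimum)
    width = max(len(a), len(b))
    # Pad the shorter tuple with zeros so that "1.2" equals "1.2.0".
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b
```

For example, `meets_minimum("8.4.10", MINIMUMS["drbd"])` is true, while `meets_minimum("8.4.7", MINIMUMS["drbd"])` is false.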

Checking the DRX version

Ensure that the DRX versions on the local node and the remote node are the same. Although DRX provides backward compatibility between versions, it is recommended that you run the same version of DRX on both nodes whenever possible.

...

Check resource settings

  1. Check that the resource configuration file is saved in ANSI or UTF-8 format. UTF-8 with BOM is not supported.
  2. Check hostname settings: When changing the hostname, make sure that the change has been applied correctly.
  3. Check that each resource uses its own ports: Make sure that no port is used more than once.
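
The duplicate-port check can be sketched as a small scan over the resource files. This sketch assumes that endpoints appear as DRBD-style `address <ip>:<port>;` lines, which may not match every BSR or DRX configuration, so treat the pattern as a starting point. Commented-out lines are not filtered, and all names are illustrative.

```python
import re
from collections import defaultdict

# Assumed endpoint syntax: "address 192.168.100.3:7791;" (IPv4 only).
ADDRESS_RE = re.compile(
    r"address\s+(?:ipv4\s+)?(\d{1,3}(?:\.\d{1,3}){3}):(\d+)\s*;"
)


def find_duplicate_endpoints(resources):
    """Find IP:port pairs that appear in more than one resource.

    resources maps a resource name to the text of its configuration file.
    Returns {(ip, port): [resource names]} for every shared endpoint.
    """
    users = defaultdict(set)
    for name, text in resources.items():
        for ip, port in ADDRESS_RE.findall(text):
            users[(ip, int(port))].add(name)
    return {
        endpoint: sorted(names)
        for endpoint, names in users.items()
        if len(names) > 1
    }
```

An empty result means that no endpoint matching the assumed pattern is shared between resources.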

...


BSR Configuration

  1. Change the BSR resource configuration so that the BSR nodes connect directly to each other, without going through DRX, and verify that the connection works.
  2. Use bsrsetup show to verify that the IP loaded in BSR is the same as the IP set in the resource file.
  3. Check whether wfc-timeout is set in the global entry. If it is not set, set the wfc-timeout value to 1.
  4. Add ping-timeout to the "net" entry of the resource. The default value is 500 ms. We recommend a generous setting of 30 (3 seconds).


DRX Configuration

  1. Connections between DRX instances
    1. Change all BSR resources to the standalone state: bsradm disconnect r0
    2. Install DRX, start drxsvc, and check the connection between the DRX instances.
    3. In the netstat output, check whether the DRX IP and port are in the LISTEN, ESTABLISHED, or TIME_WAIT state.
    4. If everything is normal, the connection state of the resource is bridged.
      1. At this point BSR is in the standalone state, and DRX switches to the connecting / waiting state in order to connect to BSR.
    5. If DRX on both nodes is in the bridging state, the DRX instances are still trying to connect to each other. If the state does not change after some time, check the connection over the WAN section first.
      1. ICMP ping is often blocked by firewall policy, so use drxsim or a similar tool to check whether a TCP connection between the local and remote nodes is possible.
  2. Connections between BSR and DRX
    1. Change the BSR resource from the standalone state to the connecting state with the bsradm connect command.
      • Check the output of cat /proc/kmsg to confirm that the resource state changes to Connecting.
    2. Normally, once BSR and DRX are connected, the connection state becomes established.
    3. If the BSR status is connecting and the connection is not established, check the netstat output to see whether the BSR IP is in the LISTEN state.
    4. Verify that the local DRX attempts a connection (SYN_SENT) to the local BSR IP.
      1. Because TCP states can change quickly, the SYN_SENT state may not show up in the netstat output.
      2. Monitor the netstat output continuously with a script such as the following.

        Code Block
        $> while(true); do date; netstat -nap | grep 779 | sort -k 3; sleep 1; clear; done
        Thu Aug 23 08:51:23 PDT 2018
        tcp        0      0 192.168.100.3:35814         192.168.100.3:7792          ESTABLISHED -                   
        tcp        0      0 192.168.100.3:7791          0.0.0.0:*                   LISTEN      -                   
        tcp        0      0 192.168.100.3:7792          192.168.100.3:35814         ESTABLISHED 8033/drx            
        tcp        0      0 192.168.100.3:7793          192.168.100.2:60676         ESTABLISHED 8033/drx            
        tcp        0      0 192.168.100.3:7795          0.0.0.0:*                   LISTEN      8033/drx            
        tcp        0      0 192.168.100.3:7796          192.168.100.2:43684         ESTABLISHED 8033/drx            
        tcp        0      1 10.10.0.182:50460           31.1.1.2:7793               SYN_SENT    8033/drx            
        tcp        0      1 10.10.0.182:57966           31.1.1.2:7796               SYN_SENT    8033/drx            
        unix  3      [ ]         STREAM     CONNECTED     18779  2477/gconfd-2       
        unix  3      [ ]         STREAM     CONNECTED     20779  2512/gnome-panel    


    5. Once BSR and DRX are connected, check the netstat output to confirm that the resource's BSR IP and DRX IP are in the ESTABLISHED state.
    6. Check whether the DRX log contains failure messages (e.g., connection refused).
  3. Log collection (collect the following command output and files)
    1. cat /etc/sysconfig/network-scripts/ifcfg-*
    2. /var/log/messages
    3. service iptables status
    4. ip a

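Unable to connect via VIP

If the Active and Standby nodes both bind sockets to the same VIP, the two nodes can interfere with each other's communication. Therefore, when DRX is connected through a VIP (SDR, MDR, etc.), DRX on the standby node must be stopped. After a failover to the standby node, start DRX before bringing the resource up so that the connection can be established smoothly.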