...

  1. MCCS behaves the same way whether the server terminates normally or abnormally: when the active server fails, MCCS fails over to the standby server.
    Select the server in the node management menu on the right side of the screen. The 'Resource Status' and 'Resource Dependency' screens show the details of the failure.
    • Normal termination of a system
      The user selected 'system shutdown' in the operating system.
    • Abnormal termination of a system
      The system was terminated or rebooted because of an unexpected situation or a blue screen.
    [Figure] Failure in Active Server

  2. Because data cannot be replicated while the server is down, the failure is shown on the mirror disk resource.
  3. The server operator investigates the failure and restores the server to normal.
  4. After the failed server reboots, check the mirror role of both servers, then set the failed server as the replication target and run a partial resync.

...

  1. MCCS shows the failure when the standby server fails.
  2. Data replication is paused until the standby server is back to normal.
    [Figure] Failure in Standby Server

  3. If I/O continues, the data cannot be replicated and the mirror disk is shown as 'Paused' (the icon changes).
  4. If there is no I/O, the mirror disk icon does not change, but failure messages related to the mirror disk are written to the MCCS log.
  5. A standby server failure does not affect operation. However, because there is no server to fail over to, the server operator must check the problem in the MCCS web console and make sure the standby server is restored promptly.
  6. When the standby server is back to normal, the mirror disk recovers from 'Paused' to 'Normal' and the icon disappears.
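
The relationship between a standby failure, ongoing I/O, and what the mirror disk shows can be restated as a short Python sketch. The function name, its parameters, and the tuple it returns are hypothetical and are not part of MCCS; only the 'Paused' and 'Normal' labels come from the steps above.

```python
from typing import Tuple


def mirror_disk_view(standby_up: bool, io_since_failure: bool) -> Tuple[str, bool]:
    """Return (state shown on the mirror disk, failure present in the MCCS log).

    Hypothetical helper that only restates the rules listed above.
    """
    if standby_up:
        # Standby is healthy: replication runs and nothing is logged.
        return ("Normal", False)
    if io_since_failure:
        # Writes cannot be replicated, so the mirror disk shows 'Paused'.
        return ("Paused", True)
    # No I/O: the icon does not change, but the log still records the failure.
    return ("Normal", True)
```

The last branch is the case that is easy to miss: with no I/O the console looks healthy, so the MCCS log is the only place the standby failure shows up for the mirror disk.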

...

  • MonitorInterval
    The interval at which the resource is monitored. (Default Value=10sec)
  • MonitorTimeout
    If there is no reply within this time, the resource is considered to have failed. (Default Value=10sec)
  • RestartLimit
    The number of times MCCS tries to restart the application resource. (Default Value=0)
  • OnlineTrustTime
    The time after which the resource's restart count is reset. (Default Value=600sec)
    These attributes are set when the resource is added. Users can check or change the values in the Resource Attribute view of the MCCS console.
    [Figure] Resource attribute value Edit

  1. MCCS monitors the resource periodically, at the interval set in 'MonitorInterval'.
  2. If there is no response within the time set in 'MonitorTimeout', the resource is considered to have failed.
  3. If the resource still does not respond after MCCS has sent the restart command the number of times set in 'RestartLimit', MCCS fails over the group the resource belongs to.
  4. If the resource stays in a normal state for the time set in 'OnlineTrustTime', MCCS resets the restart count tracked against 'RestartLimit'. This ensures the full number of restart attempts is available the next time the resource fails.
  5. If a failover occurs because of a resource failure, the server operator investigates the problem and restores the resource to normal.
  6. In the MCCS web console, a user can see where the trouble occurred. After checking the trouble area, the user must clear the Trouble sign so that the failover function is activated again.
  7. After the failed server reboots, check the mirror role of both servers, then set the failed server as the replication target and run a partial resync.

    [Figure] Failure in Resource Clear
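
How the four attributes interact can be sketched in Python. This is an illustrative model, not MCCS code: the class names, method names, and return strings are hypothetical, and the reading that a RestartLimit of 0 leads straight to failover is an assumption drawn from the steps above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ResourcePolicy:
    """Hypothetical container for the four attributes (defaults from the text)."""
    monitor_interval: float = 10.0     # MonitorInterval, seconds between probes
    monitor_timeout: float = 10.0      # MonitorTimeout, seconds to wait for a reply
    restart_limit: int = 0             # RestartLimit, restart attempts before failover
    online_trust_time: float = 600.0   # OnlineTrustTime, seconds of health before reset


class RestartTracker:
    """Tracks restart attempts for one resource (illustrative only)."""

    def __init__(self, policy: ResourcePolicy) -> None:
        self.policy = policy
        self.restarts_used = 0
        self.healthy_since: Optional[float] = None

    def on_probe(self, now: float, healthy: bool) -> str:
        """Return 'ok', 'restart', or 'failover' for one monitor result.

        'healthy' means a reply arrived within monitor_timeout seconds.
        """
        if healthy:
            if self.healthy_since is None:
                self.healthy_since = now
            # Stable for OnlineTrustTime: forget earlier restart attempts.
            if now - self.healthy_since >= self.policy.online_trust_time:
                self.restarts_used = 0
            return "ok"
        self.healthy_since = None
        if self.restarts_used < self.policy.restart_limit:
            self.restarts_used += 1
            return "restart"
        # Restart attempts are exhausted: the group must fail over.
        return "failover"
```

With `restart_limit=2`, two consecutive failed probes each return 'restart' and the third returns 'failover'. A resource that then stays healthy for 600 seconds gets its full restart budget back.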

...

  • Service Network Failure

    If a failure occurs in the service network of the active server, a fault mark is shown on the network interface card resource or the IP address of the node in the MCCS UI, and MCCS fails over to the standby server.

    [Figure] Failure in Network Interface Card


...

  • Heartbeat Network Fault

    The heartbeat network should be redundant (dual) because it plays a critical role in synchronizing status between nodes and in determining whether a failure has occurred. If one of the redundant heartbeat networks fails, the details of the failure are displayed in the log window.
    The MCCS web console shows no change, however, because neither the operation server nor the standby server has a problem.

    If the active server then fails and a failover to the standby server is needed, MCCS uses the remaining healthy heartbeat network to fail over.
    If all redundant heartbeat networks are disconnected, MCCS uses the service network as the heartbeat line.

    [Figure] Failure in Heartbeat
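
The fallback order described above (any healthy heartbeat link first, then the service network) can be sketched as a small selection function. The function name and the returned labels are hypothetical; MCCS does not expose such an API in the text.

```python
from typing import Optional, Sequence


def pick_heartbeat_path(heartbeat_links_up: Sequence[bool],
                        service_network_up: bool) -> Optional[str]:
    """Pick the path used to exchange heartbeats (illustrative only).

    heartbeat_links_up holds one flag per redundant heartbeat network.
    """
    for index, link_up in enumerate(heartbeat_links_up):
        if link_up:
            # A dedicated heartbeat network is still healthy: use it.
            return f"heartbeat-{index}"
    if service_network_up:
        # All heartbeat networks are down: fall back to the service network.
        return "service-network"
    # No path is left between the nodes.
    return None
```

For example, `pick_heartbeat_path([False, True], True)` selects the second heartbeat link, while `pick_heartbeat_path([False, False], True)` falls back to the service network.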

...

  • Replication (Mirroring) Network Failure

    When a failure occurs in the replication network, data cannot be replicated and the mirror disk resource is shown as 'Paused' in the MCCS console.

    [Figure] Failure in Replicated Network


...

  • Single Network Switch Fault

    When the public network is configured with a single network switch and that switch fails, all resources on the active and standby servers are taken offline, and the resources where the failure occurred are shown as 'fault'.

    [Figure] Failure in Network Switch


...

  • Source Disk Failure

    If a failure occurs in the disk resource of the active server, the MCCS GUI shows the failure. Because the disk can no longer be read or written, MCCS fails over to the standby server.

    [Figure] Failure in Mirror Disk


...

  • Target Disk Failure

    If a failure occurs in the disk resource of the standby server (the replication target), the MCCS UI shows the mirror disk as 'Paused'. It does not affect the service running on the active server.

    [Figure] Failure in Target Disk


...

  1. Check the resource attribute view.
    [Figure] Verify SplitBrain of MirrorDisk


  2. Check the mirror management view.

    [Figure] Checking Mirror Disk Split Brains

     

    Warning

    1) On both nodes, MirrorRole is Source and MirrorState is MIRROR_PAUSED.
    2) Check the mirror disk's TimeAquiredSourceRole. (TimeAquiredSourceRole is taken from the system time, so it is not an absolute value for deciding which node holds the latest data.)
    3) When a split brain occurs, the following log is recorded.
    (Windows event error: An invalid attempt to establish a mirror occurred. Both systems were found to be Source. 
    Local Volume: F Remote system: 200.200.124.49 Remote Volume: F The mirror has been paused, or left in its current non-mirroring state. 
    Use the DataKeeper User Inteface to resolve this Split Brain condition.)
    4) In the mirror management window, the mirror condition is set to 'SPLIT'.




  3. In the Group tab of the configuration tree, right-click the mirror disk resource and place the cursor on "Resolve Split Brain" to select the source node.
    [Figure] Split Brain Resolving Selected

  4. A window explaining split brain is displayed.
    [Figure] Checking the Source Node Selection

  5. Select the source node.
    [Figure] Source Role Node Selection


  6. Recheck the selected source node.
    [Figure] Rechecking the Source Node Selection


  7. The split brain is being resolved.
    [Figure] Split Brain Resolved


  8. The split brain has been resolved.
    [Figure] Resolving Split Brain Finished


  9. The selected node becomes the source node and the mirror disk state changes to MIRRORING.
    [Figure] Split Brain Resolved


    Warning

    All data changed on node B (the node not selected as the source) will be overwritten.
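
The split-brain condition listed in the warning under step 2 (both nodes report MirrorRole Source and MirrorState MIRROR_PAUSED) can be expressed as a simple check. The class and function below are a hypothetical sketch. The attribute values "Source", "MIRROR_PAUSED", and "MIRRORING" come from the text, while "Target" and the node names are assumed for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MirrorNodeStatus:
    """Mirror attributes of one node as read from the resource attribute view."""
    node: str
    mirror_role: str    # "Source", or "Target" (assumed name for the other role)
    mirror_state: str   # e.g. "MIRRORING" or "MIRROR_PAUSED"


def is_split_brain(a: MirrorNodeStatus, b: MirrorNodeStatus) -> bool:
    """True when both nodes claim the Source role and the mirror is paused."""
    both_source = a.mirror_role == "Source" and b.mirror_role == "Source"
    both_paused = (a.mirror_state == "MIRROR_PAUSED"
                   and b.mirror_state == "MIRROR_PAUSED")
    return both_source and both_paused
```

A check like this only detects the condition. Choosing which node keeps its data remains a manual decision, because TimeAquiredSourceRole is based on system time and cannot reliably identify the latest data.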

     

...

When the external disk fails or its connection path is faulty, the disk cannot be read or written, so MCCS displays the failure sign and proceeds with a failover.

[Figure] Failure in Shared Disk

...

  1. In the MCCS web console, click 'File' on the menu bar to collect support files.
    [Figure] Support file Collect Icon 1

  2. Support files can also be collected by clicking the toolbar icon shown in the figure below.
    [Figure] Support File collect icon 2

  3. Select the node to collect support files from, or retrieve a previously collected support file.
    [Figure] Support File Node Selection and Previous Support File Selection

  4. Click the 'OK' button to collect the support files.
    [Figure] Support Files Being Collected

     

    Info

    It may take several minutes depending on the log file capacity and the network condition.

  5. As shown below, you can download the support file from the download window.
    [Figure] Support Files Collection Checked



...