...
The following is part of the consolidated web-based dashboard of the EMS server.
Servers with failures are shown in red, servers whose failure has already been reported to the server operators are shown in yellow, and servers that are operating normally are shown in blue.
Only users registered in the EMS server can monitor the dashboard.
[Figure] Redundant server monitoring view of EMS system
[Figure] Statistic view of EMS system
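The color rule described above can be written as a small lookup. This is only an illustrative sketch, not EMS code: the names `ServerState` and `dashboard_color` and the two boolean inputs are hypothetical.

```python
from enum import Enum


class ServerState(Enum):
    """Hypothetical server states, named after the three cases in the text."""
    NORMAL = "normal"
    FAILED = "failed"
    FAILED_NOTIFIED = "failed_notified"


# Color rule taken from the dashboard description.
DASHBOARD_COLOR = {
    ServerState.FAILED: "red",
    ServerState.FAILED_NOTIFIED: "yellow",
    ServerState.NORMAL: "blue",
}


def dashboard_color(has_failure: bool, operators_notified: bool) -> str:
    """Return the dashboard color for one server."""
    if not has_failure:
        state = ServerState.NORMAL
    elif operators_notified:
        state = ServerState.FAILED_NOTIFIED
    else:
        state = ServerState.FAILED
    return DASHBOARD_COLOR[state]
```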
...
- MCCS shows the failure when a failure occurs in the standby server.
- Data replication is paused until the standby server is back to normal.
[Figure] Failure in Standby Server
- If I/O continues, data cannot be replicated and the mirror disk is shown as 'Paused'.
- If there is no I/O, the mirror disk icon does not change, but failure messages related to the mirror disk are recorded in the MCCS log.
- A standby server failure does not affect operation. However, because there is no server to fail over to, the server operator must check the problem in the MCCS web console and make sure that the standby server is restored in time.
- When the standby server is back to normal, the mirror disk recovers from 'Paused' to 'Normal' and the icon disappears.
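The display behavior described in the list above can be modeled as a simple function. This is a sketch under stated assumptions: `mirror_disk_display` and its parameters are hypothetical names, and the function models only the state shown in the console, not MCCS internals.

```python
def mirror_disk_display(standby_ok: bool, io_active: bool,
                        current: str = "Normal") -> str:
    """Return the mirror disk state shown in the console (illustrative model)."""
    if standby_ok:
        # Standby is back to normal: the state returns to 'Normal'.
        return "Normal"
    if io_active:
        # Standby failed and writes are arriving: replication cannot proceed.
        return "Paused"
    # Standby failed but no I/O: the icon is unchanged, and the failure is
    # visible only in the MCCS log.
    return current
```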
...
Service Network Failure
If a failure occurs in the service network of the active server, a fault mark is shown on the network interface card resource or the IP address resource of the node in the MCCS web console, and MCCS fails over to the standby server.
[Figure] Failure in Network Interface Card
...
Replication (Mirroring) Network Failure
When a failure occurs in the replication network, data cannot be replicated and the mirror disk resource is shown as 'Paused' in the MCCS web console.
[Figure] Failure in Replication Network
- The replication network failure history can be checked in the MCCS log and the Windows System log. If a failure occurs in the replication network, the server operator should check the TCP/IP settings of the server and verify the physical connection of the replication network with a ping test.
- If the result is abnormal, check the network card and the cable connection (including a disconnected cable) and clear the cause of the failure.
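The ping test mentioned above can be scripted. The following is a minimal sketch, not an MCCS tool: the function names are hypothetical, and the peer address passed in is whatever replication IP address your servers use. It treats exit code 0 from `ping` as success; on Windows, a "Destination host unreachable" reply can still produce exit code 0, so check the command output when in doubt.

```python
import platform
import subprocess
from typing import Callable, List, Optional


def build_ping_command(address: str, count: int = 3, system: str = "") -> List[str]:
    """Build a ping command line; Windows uses -n for the count, other systems use -c."""
    system = system or platform.system()
    count_flag = "-n" if system == "Windows" else "-c"
    return ["ping", count_flag, str(count), address]


def _run_quietly(cmd: List[str]) -> int:
    """Run the command, discard its output, and return its exit code."""
    return subprocess.run(
        cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
    ).returncode


def replication_link_ok(address: str,
                        run: Optional[Callable[[List[str]], int]] = None) -> bool:
    """Ping the peer's replication address and report whether ping succeeded."""
    runner = run if run is not None else _run_quietly
    return runner(build_ping_command(address)) == 0
```

Calling `replication_link_ok("<peer replication IP>")` on each server shows from which side the link is broken. The `run` parameter exists only so that the check can be exercised without a network.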
...
Source Disk Failure
If a failure occurs in the disk resource of the active server, the MCCS web console shows the failure. MCCS fails over to the standby server because the disk can no longer be read or written.
[Figure] Failure in Mirror Disk
- MCCS monitors disk availability as follows.
- It performs a periodic read/write test on the disk.
- It determines whether the disk has a drive letter.
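The two checks listed above can be approximated as follows. This is a sketch of the general technique only; how MCCS actually performs its test is not documented here, and the function names and the payload are assumptions.

```python
import os
import tempfile


def drive_exists(root: str) -> bool:
    """True if the drive root (for example E:/) is present."""
    return os.path.isdir(root)


def read_write_test(root: str, payload: bytes = b"disk-check") -> bool:
    """Write a small temporary file under root, read it back, and compare."""
    try:
        fd, path = tempfile.mkstemp(dir=root)
    except OSError:
        return False
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(payload)
            f.flush()
            os.fsync(f.fileno())  # force the data out to the disk
        with open(path, "rb") as f:
            return f.read() == payload
    except OSError:
        return False
    finally:
        # Always remove the temporary test file.
        try:
            os.remove(path)
        except OSError:
            pass


def disk_available(root: str) -> bool:
    """Combine both checks: the drive must exist and pass the read/write test."""
    return drive_exists(root) and read_write_test(root)
```

A monitor would call `disk_available` at a fixed interval and report a failure when it returns `False`.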
- Disk failure can be caused by the following.
- Disk controller problems or H/W problems, which should be fixed by the manufacturer.
- Physical disk problems or H/W problems, which should be fixed by the manufacturer.
- After these issues are resolved, the OS detects the replaced disk again, and Datakeeper then proceeds with resynchronization.
- If the mirror disk does not synchronize, delete the mirror disk resource and create it again. When you delete the resource, you must also delete the created mirror and create it again.
Target Disk Failure
If a failure occurs in the disk resource of the standby server, the MCCS web console shows the mirror disk as 'Paused'. This does not affect the service running on the active server.
[Figure] Failure in Target Disk
- To detect a failure of the target disk, MCCS only checks whether the disk has a drive letter.
- Disk failure can be caused by the following.
- Disk controller problems or H/W problems, which should be fixed by the manufacturer.
- Physical disk problems or H/W problems, which should be fixed by the manufacturer.
- After these issues are resolved, the OS detects the replaced disk again, and Datakeeper then proceeds with resynchronization.
- If the mirror disk does not synchronize, delete the mirror disk resource and create it again. When you delete the resource, you must also delete the created mirror and create it again.
Split Brain of Mirror Disk Resource
In rare cases, the mirror disk is identified as the source on both servers. This happens when, during a role change, the existing source cannot be changed to target.
Both servers then try to synchronize the data, which causes the split brain. Split brain occurs in the following sequence.
- A failover occurs because of a failure in the source server (A).
- MCCS sends a 'SWITCHOVERVOLUME' command.
- The target server (B) is switched to source without the existing source volume being deleted.
- The original source server (A) is rebooted.
- After the original source server (A) boots, the role of the target server (B) is checked.
- If the target server (B) is source, the failover is judged to have completed normally, and MCCS tries to switch the original server (Node A) to target.
- The role cannot be checked because of a problem in the mirror network or another error. (The two preceding steps fail.)
- The mirror role of both servers becomes source.
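The sequence above can be traced with a toy model. This is a simulation for illustration only and does not call MCCS or Datakeeper; the `Node` class and the function names are hypothetical. Only the role values 'Source' and 'Target' come from the text.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    mirror_role: str  # 'Source' or 'Target'


def failover(failed_source: Node, target: Node) -> None:
    """Promote the target to source; the failed node's role is left untouched."""
    target.mirror_role = "Source"


def rejoin(original: Node, peer: Node, peer_reachable: bool) -> None:
    """After reboot, demote the original source if the peer's role can be read."""
    if peer_reachable and peer.mirror_role == "Source":
        original.mirror_role = "Target"
    # Otherwise the role check failed and the original node stays 'Source'.


def is_split_brain(a: Node, b: Node) -> bool:
    """Split brain: both nodes report the 'Source' role."""
    return a.mirror_role == "Source" and b.mirror_role == "Source"


# Normal case: the mirror network works, so A becomes target.
a, b = Node("A", "Source"), Node("B", "Target")
failover(a, b)
rejoin(a, b, peer_reachable=True)
normal_case = is_split_brain(a, b)

# Failure case: the role check fails, so both nodes remain source.
a2, b2 = Node("A", "Source"), Node("B", "Target")
failover(a2, b2)
rejoin(a2, b2, peer_reachable=False)
failure_case = is_split_brain(a2, b2)
```

In this model `normal_case` is `False` and `failure_case` is `True`, which matches the last two steps of the sequence.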
In this case, the mirror disk resource icons are shown overlapping each other in the MCCS web console, and the 'MirrorRole' attribute value on both servers is 'Source'.
When this happens, the role of the mirror disk must be changed manually. After the change, resynchronization takes place.
The role of the mirror disk can be changed using the MCCS web console.
How to resolve the split brain issues by using the MCCS web console
...
Remove the reservation key and the registration key using the scsicmd.cmd -c or -cf command, and then set them again. Before registering a resource, check whether a key is already registered. If there is one, remove it before registering.
Note that the current key is set automatically from the MAC address of the first network adapter. This key is automatically recorded in the setting file. If the key does not exist in the setting file, a new key is not created.
...