1. Installation Prolog
MCCS (Mantech Continuous Cluster Server) is a high-availability solution for mission-critical computing systems developed by Mantech.
It provides a continuous computing environment in the face of unplanned downtime, such as natural disasters, system failures, or unexpected service interruptions caused by terrorism, sabotage, or human error, and it minimizes planned downtime such as periodic maintenance.
Companies today conduct their normal business on computers, and most of the core information involved must be available 24/7. Mission-critical systems require a server infrastructure with no downtime, yet many types of service failure can occur, each with a negative impact on the business.
MCCS protects your mission-critical applications from these failure and disaster scenarios and keeps them always on.
Features of MCCS
MCCS has the following features.
Service Level Fault Detection and Automatic Failover
MCCS detects faults not only in hardware but also in resources such as network connections, applications, platforms, and disks.
If a managed resource is not working properly, it is either restarted or failed over to the standby server according to the defined action policies.
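The restart-or-failover decision described above can be pictured with a short sketch. This is only an illustration, not the MCCS policy engine: the names `Action`, `ActionPolicy`, `restart_limit`, and `decide_action` are hypothetical, and the idea that a resource is restarted a fixed number of times before it is failed over is an assumption made for the example.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    NONE = "none"          # resource is healthy, nothing to do
    RESTART = "restart"    # try to recover on the current node
    FAILOVER = "failover"  # move the service to the standby server


@dataclass(frozen=True)
class ActionPolicy:
    # Hypothetical policy field: local restart attempts allowed before failover.
    restart_limit: int = 1


def decide_action(healthy: bool, restarts_so_far: int, policy: ActionPolicy) -> Action:
    """Pick an action for one monitored resource (illustrative logic only)."""
    if healthy:
        return Action.NONE
    if restarts_so_far < policy.restart_limit:
        return Action.RESTART
    return Action.FAILOVER
```

With `ActionPolicy(restart_limit=1)`, a first failure yields `Action.RESTART` and a second failure yields `Action.FAILOVER`.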
Support External Shared Storage and Local Disk Replication Environment
MCCS supports both local disk replication and shared disk environments. This gives users the flexibility to work with either type of storage.
Support Failover Group
A failover group contains resources that can run on only one node at a time.
When a critical resource in a failover group fails, the application is switched over to the standby node.
The resources in the group are online on only one available node at a time.
For example, an IP address cannot be online on the active node and the standby node at the same time.
Therefore, when the IP address fails, it and its related resources fail over to the standby node.
Support Parallel Group
A parallel group is a group of resources that are brought online on more than one node at the same time. A parallel group cannot fail over.
Its opposite is the failover group, which can be online on only one node in a cluster at a time.
With a parallel group, however, the specified application can be brought online, monitored, and terminated on multiple nodes in a cluster at the same time.
For example, if a user wants to run backup software on both nodes under MCCS management, the backup software must be online on both nodes at the same time.
You can use a parallel group instead of a failover group for this.
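The difference between the two group types comes down to how many nodes a group may be online on at once. The sketch below states that rule in code. `GroupType`, `placement_is_valid`, and `can_fail_over` are hypothetical names chosen for illustration and are not part of any MCCS interface.

```python
from enum import Enum


class GroupType(Enum):
    FAILOVER = "failover"
    PARALLEL = "parallel"


def placement_is_valid(group_type: GroupType, online_nodes: set[str]) -> bool:
    """Check the online-node constraint for a group (illustrative only)."""
    if group_type is GroupType.FAILOVER:
        # A failover group may be online on at most one node at a time.
        return len(online_nodes) <= 1
    # A parallel group may be online on several nodes simultaneously.
    return True


def can_fail_over(group_type: GroupType) -> bool:
    # Per the text above, only failover groups move to the standby node.
    return group_type is GroupType.FAILOVER
```

For the backup software example, `placement_is_valid(GroupType.PARALLEL, {"node1", "node2"})` holds, while the same placement is rejected for a failover group.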
Support Multi Group
A group is a set of interdependent resources (e.g., network card, IP address, disk, application) that MCCS manages together for failure monitoring, recovery, and failover. It is also the unit of failover.
MCCS can configure more than one group in a cluster, which makes both Active/Passive and Active/Active configurations possible. (Active/Active means that different services run on different servers.)
You can configure parallel and failover groups at the same time. In a multi-group environment, MCCS manages each group independently.
Cluster between Heterogeneous Platforms
MCCS is not greatly affected by hardware compatibility within a cluster.
Thus, the nodes in a cluster do not need to use exactly the same hardware.
Remote Disaster Recovery
MCCS is a high-availability solution that can be configured with no distance limitation between nodes.
This means that a standby server can easily be configured at a remote site without extra charge.
Web based management console
MCCS allows users to control MCCS through a web browser from a local or remote server.
From any server, you can connect to the server where MCCS is configured. There is no need to configure an additional environment file.
(The default access URL is 'http://ipaddress:10080/main'.)
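As a small illustration of the default URL pattern, the sketch below builds the console address for a given host. The helper name `console_url` is hypothetical. The bracketing of IPv6 literals follows the general URL syntax rules (RFC 3986); whether the console is reachable over IPv6 in a particular deployment is an assumption, not something stated here.

```python
import ipaddress

DEFAULT_CONSOLE_PORT = 10080  # port taken from the default URL quoted above


def console_url(host: str, port: int = DEFAULT_CONSOLE_PORT) -> str:
    """Build the default console URL for a host name or IP address."""
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        # Not an IP literal, so treat the value as a host name.
        return f"http://{host}:{port}/main"
    if addr.version == 6:
        # IPv6 literals must be enclosed in brackets inside a URL.
        return f"http://[{addr}]:{port}/main"
    return f"http://{addr}:{port}/main"
```

For example, `console_url("192.168.0.10")` gives `http://192.168.0.10:10080/main`.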
Support IPv6 network environment
MCCS supports both IPv4 and IPv6 network environment configurations.
Therefore, an existing IPv4 or IPv6 network environment or disaster recovery system can be used as it is.
MCCS Configuration
MCCS configurations are distinguished by how the data disks are configured, as listed below.
Generic Application Environment
In a generic application environment, the data needed for the service does not reside on the clustered systems but on another system.
Since no data exists on the cluster nodes, there is no need to configure a disk resource. In general, only the application program exists in this environment.
[Figure] Basic Application Environment
Shared Disk Environment
A shared disk (SCSI or SAN) is physically connected to both servers.
MCCS attaches the file system on the shared disk to the active server only and locks it on the standby server to prevent file system corruption.
[Figure] Shared Disk Environment
Mirror Disk Environment
Each server has its own local disk, and data is replicated in real time to the disk of the other node.
The smallest unit of disk replication is a partition. (Recommended unit: LUN)
[Figure] Mirror Disk Environment
MCCS Component
MCCS basically has the following components.
Cluster
A cluster is a set of nodes that exchange state information for redundancy.
MCCS configures a cluster of two systems. An additional Ethernet network connection, called the heartbeat, is used for communication between the two systems, and TCP/IP is used to send status information.
Node
A node is a physical or virtual computer system on which MCCS is installed and running.
The node name does not need to match the system's hostname exactly; it can be set as an alias in the cluster.
The mapping between the alias and the hostname is determined by the node's heartbeat network address.
Node state is tied to the state of the MCCS engine. When MCCS is running, node state information is collected from MCCS; when MCCS is not running, it is collected through an ICMP test.
Heartbeat
Nodes are configured as a cluster through a communication path called the heartbeat.
Once the nodes have synchronized their status through the heartbeat, the active and standby roles for service operation are decided by the node state and resource state of each node, and commands are sent through events.
Synchronizing State Information
The nodes in a cluster exchange the state and attribute information of their resources and resource groups.
Communication Port
MCCS uses four TCP ports for the heartbeat connection.
The four TCP ports registered in the environment configuration (default: 14321~14324) and ICMP Echo Requests are used.
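When troubleshooting connectivity between nodes, it can help to confirm that the default port range is reachable. The sketch below is a generic TCP reachability probe, not an MCCS tool: `tcp_port_open` and `check_heartbeat_ports` are hypothetical helper names, and the probe only confirms that something accepts a TCP connection on each port. It does not speak the heartbeat protocol.

```python
import socket

# 14321..14324, the default heartbeat ports given above.
DEFAULT_HEARTBEAT_PORTS = range(14321, 14325)


def tcp_port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_heartbeat_ports(host: str, ports=DEFAULT_HEARTBEAT_PORTS) -> dict[int, bool]:
    """Probe each port on the peer node and report whether it is reachable."""
    return {port: tcp_port_open(host, port) for port in ports}
```

Calling `check_heartbeat_ports("192.168.0.11")` (a placeholder address) returns a mapping from each port to `True` or `False`, which also helps reveal firewall rules that block the range.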
Check the state of a node
MCCS relies on the heartbeat connections to determine whether a remote node has failed.
If all heartbeat connections to the remote node are disconnected, the node is considered to have failed.
Therefore, redundant heartbeat connections are strongly recommended, because a failure of the heartbeat network can be interpreted as a node failure.
Please refer to "4. Node of MCCS Manual" for more details about node state.
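The rule for judging a remote node can be written as a one-line check, which also shows why redundancy matters. This is a minimal sketch of the rule as described in this section; `remote_node_failed` is a hypothetical name, and the real engine may apply timeouts or additional checks that are not covered here.

```python
def remote_node_failed(heartbeat_links_up: list[bool]) -> bool:
    """A remote node counts as failed only when every heartbeat link is down."""
    if not heartbeat_links_up:
        raise ValueError("at least one heartbeat link must be configured")
    return not any(heartbeat_links_up)
```

With a single link, `remote_node_failed([False])` is `True` even if only the cable is broken. With two links, `remote_node_failed([False, True])` is `False`, so one network fault alone is not mistaken for a node failure.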
Sending command between nodes
Commands that change a resource's configuration or control resources and groups can be issued through the web console or the CLI. MCCS uses the heartbeat line to send these commands to the other node.
Resource
Resources are the hardware and software components managed by MCCS, such as network interface cards (NICs), IP addresses, applications, and disks.
MCCS can monitor the state of resources and can bring them online, take them offline, and enable or disable them.
MCCS has two categories of resources: 'General' (OnOFF) and 'MonitorOnly' (None). Most resources, such as IP addresses, disks, processes, and services, are 'General' resources, which MCCS brings online and takes offline based on management policies.
In contrast, a 'MonitorOnly' resource such as a NIC cannot be brought online or taken offline by MCCS; only its status and operation are monitored.
Resource Group
A resource group is a set of resources with dependencies among them, and MCCS fails over all the resources in a group together.
For example, to manage an ORACLE service with MCCS, the IP address that clients connect to, the network interface card, the disks where the database is stored, the ORACLE listener, and the ORACLE server service must be combined into a single unit called a group.
To provide proper service, the group should be configured so that the IP address depends on the network interface card, the ORACLE server depends on the disk where the database is stored, and the ORACLE listener depends on the ORACLE server.
An IP address cannot be assigned without a network interface card, and the ORACLE processes can be brought online only after the disk is online. If more than one group is defined in a cluster, one group can fail over without affecting the other groups.
There are two types of group, parallel and failover, depending on the service.
Please refer to "5. Resource Group of MCCS Manual" for more details about parallel and failover group.
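The ORACLE example amounts to a dependency graph, and a valid order for bringing resources online is any order in which each resource follows the resources it depends on. The sketch below derives one such order with Python's standard `graphlib` module. The resource names and the `online_order` helper are illustrative and are not MCCS identifiers or configuration syntax.

```python
from graphlib import TopologicalSorter

# Each resource maps to the set of resources it depends on,
# following the ORACLE example in this section.
dependencies = {
    "nic": set(),
    "ip_address": {"nic"},
    "disk": set(),
    "oracle_server": {"disk"},
    "oracle_listener": {"oracle_server"},
}


def online_order(deps: dict[str, set[str]]) -> list[str]:
    """Return one order in which every resource comes after its dependencies."""
    return list(TopologicalSorter(deps).static_order())
```

Several orders can satisfy the same graph, so the result should be read as one valid sequence rather than the only one. A circular dependency makes `static_order()` raise `graphlib.CycleError`, which is a useful check on a group definition.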
Resource Type
Resources managed by MCCS are classified as follows.
Network Interface Card
MCCS monitors TCP/IP-based network connectivity. It detects a network disconnection, an Ethernet adapter failure, or a cable failure.
IP Address
MCCS manages a virtual IP address and subnet mask that are assigned to a NIC and can be switched from node to node. The virtual IP address behaves in the same manner as the node's real IP address.
A static real IP address must be set on the NIC before a virtual IP address can be added.
Process
The Process resource is used to register a single executable file. MCCS detects failures by checking whether the process name exists in the operating system's process table.
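The name-based check described above can be sketched as follows. This is a rough illustration of the idea, not the way MCCS reads the process table: both function names are hypothetical, and `snapshot_process_names` assumes a POSIX system where `ps -e -o comm=` is available (on Windows a different source would be needed).

```python
import subprocess


def snapshot_process_names() -> list[str]:
    """List command names from the process table (POSIX `ps` only)."""
    result = subprocess.run(
        ["ps", "-e", "-o", "comm="],
        capture_output=True, text=True, check=True,
    )
    # Note: the exact form of each name (truncated name or full path) varies by OS.
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]


def process_is_running(process_name: str, process_names: list[str]) -> bool:
    """Return True if the given name appears in the process table snapshot."""
    return process_name in process_names
```

A monitor would call `process_is_running("myapp", snapshot_process_names())` at each polling interval, where `myapp` is a placeholder name. Matching on the name alone cannot tell a healthy process from a hung one.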
Application
It is similar to Process but more complex. An Application resource works with several applications or scripts, as with Tomcat.
MCCS does not only detect the executable file; it also brings the process online, takes it offline, and monitors it by using predefined scripts.
Shared Disk
MCCS monitors the status of the I/O path for a node connected to external shared storage.
Mirror Disk
When there is no shared disk in a cluster, data is saved on local or direct-attached storage.
In such an environment, TCP/IP-based mirror components are used to copy changed data.
Service
A Service is a component managed by the SCM (Service Control Manager) of the Windows OS.
Virtual Name
It manages the Windows based NetBIOS names.
Shared disk DR
It mirrors the shared disk in a cluster to an off-site DR node over a WAN, so that mission-critical services stay available through disasters.
Mirror disk DR
It is a second mirror set of the cluster's mirror disk, replicated to an off-site DR node over a WAN, so that mission-critical services stay available through disasters.
VxVM disk group
It is used to manage a disk group of Veritas Volume Manager.
The sub volumes of a disk group are not individually managed.
Oracle database
This resource manages the Oracle database.
It detects the Oracle database installed on the system, controls the service and monitors the status.