
MCCS (Mantech Continuous Cluster Server) is a high-availability solution for mission-critical computing systems, developed by Mantech.
It keeps services available when they would otherwise be interrupted, whether by common failures such as natural disasters and system faults or by unexpected events such as terrorism, forced occupation, or operator error.
Today most business operations run on computers, and core business information must be served continuously, 24 hours a day, 365 days a year. The more important a workload is, the more heavily it is used, which in turn raises the likelihood of system downtime.
For a company that must deliver high-quality service without interruption, a system failure causes serious problems, including large opportunity-cost losses.
MCCS addresses these problems by providing high availability for mission-critical systems.

 


 

Features of MCCS

MCCS has the following features:

Service Level Fault Detection and Automatic Failover 

MCCS detects not only hardware failures but also failures of resources such as network connections, applications, platforms, and disks.
If a managed resource is not working properly, it is either restarted or failed over to the standby server, according to the defined action policies.
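The restart-or-failover decision described above can be sketched as follows. This is a minimal illustration, not MCCS's actual policy engine: the `ResourcePolicy` class, its `restart_limit` field, and the `decide_action` function are hypothetical names, and the rule "restart locally a limited number of times, then fail over" is an assumed policy.

```python
from dataclasses import dataclass


@dataclass
class ResourcePolicy:
    """Hypothetical action policy; not an actual MCCS setting."""
    restart_limit: int  # assumed: local restart attempts before failing over


def decide_action(policy: ResourcePolicy, restarts_so_far: int) -> str:
    """Return 'restart' or 'failover' for a resource detected as failed."""
    if restarts_so_far < policy.restart_limit:
        return "restart"
    return "failover"


if __name__ == "__main__":
    policy = ResourcePolicy(restart_limit=2)
    for attempt in range(3):
        print(attempt, decide_action(policy, attempt))
```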

...

MCCS supports both local disk replication and shared disk environments, so users have the flexibility to use any type of storage.

...

Support Failover Group

 

A failover group contains resources that can run on only one node at a time.
When a critical resource in a failover group fails, the application is switched over to the standby node.
The resources included in the group are online only on a single available node.

For example, an IP address cannot be online on the active node and the standby node at the same time.
Therefore, when the IP address fails, it and its related resources are failed over to the standby node.

...

A parallel group is a group of resources that are brought online on more than one node at the same time. A parallel group cannot fail over.
It is the opposite of a failover group, which can be online on only one node in a cluster at a time.
When an application is configured in a parallel group, it can be brought online, monitored, and terminated on multiple nodes in the cluster at the same time.
For example, if a user wants to run backup software on both nodes under MCCS management, the backup software must be online on both nodes at the same time.
In that case, use a parallel group instead of a failover group.
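The difference between the two group types can be expressed as a small placement rule. The sketch below is illustrative only; `GroupType` and `can_bring_online` are hypothetical names and do not correspond to an MCCS API.

```python
from enum import Enum


class GroupType(Enum):
    FAILOVER = "failover"
    PARALLEL = "parallel"


def can_bring_online(group_type: GroupType, online_nodes: set, target_node: str) -> bool:
    """Check whether a group may be brought online on target_node."""
    if target_node in online_nodes:
        return False  # already online on that node
    if group_type is GroupType.FAILOVER:
        # A failover group may be online on only one node at a time.
        return len(online_nodes) == 0
    # A parallel group may be online on several nodes at once.
    return True
```

With this rule, backup software placed in a parallel group can be started on the second node while it is already running on the first, whereas a failover group is refused until it is offline everywhere else.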

...

A group is a set of interdependent resources (for example, network card, IP address, disk, and application) that MCCS manages together for failure monitoring, recovery, and failover. It is also the unit of failover.
MCCS can configure more than one group in a cluster, which makes both Active/Passive and Active/Active configurations possible. (Active/Active means that different services run on different servers.)
Parallel and failover groups can be configured at the same time. In a multi-group environment, MCCS manages each group independently.

...

MCCS is largely unaffected by hardware compatibility within a cluster.
Thus, the nodes in a cluster do not need to use exactly the same hardware.

...

MCCS is a high-availability solution that can be configured with no distance limitation between nodes.
This means that a standby server can easily be configured at a remote site at no extra charge.


...

Offers Remote Management Web Console

MCCS lets users control MCCS through a web browser from a local or remote server.
From any server, you can connect to the server where MCCS is configured. There is no need to configure an additional environment file.
(The default access URL is 'http://ipaddress:10080/main'.)
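As a small illustration, the default console address can be built from a node's IP address as shown below. The helper name `console_url` is hypothetical, and 192.0.2.10 is a documentation-only example address; only the port 10080 and the '/main' path come from the text above.

```python
def console_url(ip_address: str, port: int = 10080) -> str:
    """Build the web console URL from the default pattern given in the text."""
    return f"http://{ip_address}:{port}/main"


if __name__ == "__main__":
    print(console_url("192.0.2.10"))
```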



MCCS Configuration

MCCS configurations are distinguished by how the data disks are configured, as listed below.

...

In a generic application environment, the data needed for the service is not on the clustered systems but on another system.
Because the data is not local, there is no need to configure a disk resource. In general, only the application program exists in this environment.

[Figure] Basic Application Environment

Shared Disk Environment

A shared disk (SCSI or SAN) is physically connected to both servers.
MCCS attaches the file system on the shared disk only to the active server and locks it on the standby server to prevent file system corruption.


[Figure] Shared Disk Environment

Mirror Disk Environment

Each server has its own local disk, and data is replicated in real time to the disk of the other node.
The smallest unit of disk replication is a partition. (Recommended unit: LUN)


[Figure] Mirror Disk Environment

...

A cluster is a set of nodes that exchange state information for redundancy.
MCCS configures a cluster with two systems. An additional Ethernet network connection, called the heartbeat, is used for communication between the two systems, and TCP/IP is used to send the status information.

...

A node is a physical or virtual computer system on which MCCS is installed and running.
The node name does not need to match the hostname of the system exactly and can be set as an alias in the cluster.
The mapping between the alias and the hostname is determined by the node's heartbeat network address.
Node state is tied to the state of the MCCS engine. When MCCS is running, node state information is collected from MCCS; when MCCS is not running, it is collected through an ICMP test.

...

MCCS uses four TCP ports for the heartbeat connection.
Four TCP ports based on the port registered when the environment is configured (default: 14321~14324) and ICMP Echo Requests are used.
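When preparing firewall rules, it can help to list the heartbeat ports and check that they are reachable. The sketch below assumes the four ports are consecutive starting from the registered base port, which matches the default range 14321~14324 but is not confirmed for other settings. The function names are hypothetical, and the probe is an ordinary TCP connect test, not an MCCS tool.

```python
import socket

DEFAULT_BASE_PORT = 14321  # first port of the default range given in the text
PORT_COUNT = 4


def heartbeat_ports(base_port: int = DEFAULT_BASE_PORT) -> list:
    """Assumption: the four heartbeat ports are consecutive from base_port."""
    return list(range(base_port, base_port + PORT_COUNT))


def probe_tcp(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    peer = "192.0.2.20"  # documentation-only example address
    for port in heartbeat_ports():
        print(port, probe_tcp(peer, port))
```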

Check the state of a node

MCCS relies on the heartbeat connections to determine whether a remote node has failed.
If all heartbeat connections to the remote node are disconnected, the node is considered to have failed.
Redundant heartbeat connections are therefore strongly recommended, because a failure of the heartbeat network can be treated as a node failure.
Please refer to "4. Node" of the MCCS User Guide for more details about node state.
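The rule above, that a remote node counts as failed only when every heartbeat connection is down, can be sketched in a few lines. The function name and the dictionary layout are hypothetical and only illustrate why a second heartbeat link avoids a false node failure.

```python
def remote_node_failed(heartbeat_links: dict) -> bool:
    """heartbeat_links maps a link name to True while that link is connected.

    The node is treated as failed only if no link is connected.
    Note: an empty mapping also yields True, since no link is up.
    """
    return not any(heartbeat_links.values())


if __name__ == "__main__":
    # One link down, one still up: the node is not considered failed.
    print(remote_node_failed({"hb1": False, "hb2": True}))
    # A single heartbeat link that goes down looks like a node failure.
    print(remote_node_failed({"hb1": False}))
```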

...

Operations such as changing a resource's configuration or controlling resources and groups can be performed through the GUI web console or the CLI. MCCS uses the heartbeat line to send commands to the other node.

...

Resources are hardware and software components managed by MCCS, such as network interface cards (NICs), IP addresses, applications, and disks.
MCCS monitors the state of resources and can bring them online, take them offline, and enable or disable them.
MCCS has two categories of resources: 'PERSISTENT' and 'ON-OFF'. Most resources, such as IP addresses, disks, processes, and services, are ON-OFF resources, which MCCS brings online and takes offline based on management policies.
In contrast, a PERSISTENT resource such as a NIC cannot be brought online or taken offline by MCCS; MCCS only monitors its status and operation.
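The two resource categories differ in which operations apply to them. The sketch below models that distinction; the `Category` enum, the operation names, and `is_allowed` are hypothetical and are not MCCS identifiers.

```python
from enum import Enum


class Category(Enum):
    ON_OFF = "ON-OFF"
    PERSISTENT = "PERSISTENT"


# Operations assumed for illustration: ON-OFF resources can be switched,
# PERSISTENT resources (for example a NIC) can only be monitored.
ALLOWED_OPERATIONS = {
    Category.ON_OFF: {"online", "offline", "monitor"},
    Category.PERSISTENT: {"monitor"},
}


def is_allowed(category: Category, operation: str) -> bool:
    """Return True if the operation applies to resources of this category."""
    return operation in ALLOWED_OPERATIONS[category]
```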

Resource Type

Resources managed by MCCS are classified as follows.

Network Interface Card

MCCS monitors TCP/IP-based network connectivity. It detects unplugged network connections, Ethernet adapter failures, and cable failures.

IP Address

MCCS configures a virtual IP address and subnet mask on a NIC of a node and monitors it. The virtual IP address can be switched from node to node and behaves in the same way as the node's real IP address.
The NIC to which a virtual IP address is added must already have a static real IP address.

Process

The Process resource is used to register a single executable file. MCCS detects failures by checking whether the process name exists in the process table of the operating system.
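The name-based check described above can be sketched as follows. The function receives the process names as a plain list so the example stays self-contained; how MCCS actually reads the process table is not described here, and the case-insensitive comparison is an assumption.

```python
def process_is_running(process_name: str, process_table) -> bool:
    """Return True if process_name appears in the given process names.

    process_table is an iterable of process names taken from the operating
    system's process table. Case-insensitive matching is assumed.
    """
    wanted = process_name.lower()
    return any(name.lower() == wanted for name in process_table)


if __name__ == "__main__":
    snapshot = ["System", "svchost.exe", "myapp.exe"]  # example data only
    print(process_is_running("MyApp.exe", snapshot))
    print(process_is_running("other.exe", snapshot))
```

A check of this kind only confirms that the process exists, not that it is responding, which is why script-based monitoring is used for more complex applications.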

Application

The Application resource is similar to Process but covers more complex cases: applications that consist of several processes, or applications that must be started by a script, such as Tomcat.
MCCS does not only detect the executable file; it also brings the process online, takes it offline, and monitors it by using predefined scripts.

Shared Disk

MCCS monitors the status of the I/O path of a node connected to external shared storage.

Mirror Disk

When there is no shared disk in a cluster, data is saved on local or directly attached storage.
In such an environment, TCP/IP-based mirror components are used to replicate the changed data.

Service

The Service resource manages processes that are managed by the SCM (Service Control Manager) of the Windows operating system.

Virtual Name

It manages Windows-based NetBIOS names.

SCSI-LOCK

SCSI-LOCK uses a SAN protocol called SCSI-3 PR (Persistent Reservation), which the storage must support, to manage locks at the LUN level.
In an environment where disk volumes are shared between nodes, this prevents data corruption by ensuring that only one node can access a disk volume.

Shared disk DR

The disk shared between the two MCCS nodes is replicated to a remote server so that data is protected from disasters.
MCCS is not installed on the remote server, so you need to establish a separate service recovery plan.

VxVM disk group

It is used to manage a disk group of Veritas Windows Volume Manager.
The sub volumes of a disk group are not individually managed.

Oracle database

This resource controls the Oracle database.
It detects the Oracle database installed on the system, controls the service and monitors the status.


Resource Group

A resource group is a set of resources that depend on one another, and it is the basic unit of failover: MCCS fails over all the resources in a group together.
For example, to manage an ORACLE service with MCCS, the IP address that clients connect to, the network interface card, the disks where the database is stored, the ORACLE listener, and the ORACLE server service must be combined into a single unit called a group.
To provide the service properly, the group should be configured so that the IP address depends on the network interface card, the ORACLE server depends on the disk where the database is stored, and the ORACLE listener depends on the ORACLE server.
This is because an IP address cannot be assigned without a network interface card, and ORACLE-related processes can be brought online only after the disk is online. If more than one group is defined in a cluster, one group can fail over without affecting the other groups.
Groups are of two types, parallel and failover, depending on whether the service fails over.
Please refer to "5. Resource Group" of the MCCS User Guide for more details about parallel and failover groups.
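The dependencies in the ORACLE example imply an order in which resources can be brought online. The sketch below derives such an order with a topological sort from Python's standard `graphlib` module (Python 3.9 or later). The resource names and the `online_order` and `offline_order` helpers are hypothetical, and taking resources offline in the reverse order is an assumption rather than documented MCCS behavior.

```python
from graphlib import TopologicalSorter

# Each resource maps to the set of resources it depends on,
# following the ORACLE example in the text.
DEPENDS_ON = {
    "ip_address": {"nic"},
    "oracle_server": {"disk"},
    "oracle_listener": {"oracle_server"},
}


def online_order(depends_on: dict) -> list:
    """Return an order in which every dependency comes before its dependents."""
    return list(TopologicalSorter(depends_on).static_order())


def offline_order(depends_on: dict) -> list:
    """Assumption: resources are taken offline in the reverse of online order."""
    return list(reversed(online_order(depends_on)))


if __name__ == "__main__":
    print(online_order(DEPENDS_ON))
    print(offline_order(DEPENDS_ON))
```

Several valid orders exist for this group, since the network side and the disk side do not depend on each other; the sort only guarantees that each dependency precedes the resource that needs it.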