13. Agent.
An agent is a program that manages resources.
Each resource type has a corresponding agent. Agents control resources according to their state: an agent detects the state of each resource and reports it to MCCS.
A resource cannot be brought online or taken offline without an agent, and the actions required to do either differ from one resource type to another.
For example, to bring a disk resource online, its filesystem must be mounted and unlocked and must have write access. To bring a process resource online, by contrast, the agent executes the program.
MCCS agents are multithreaded, which means a single agent manages multiple resources of the same resource type on a node.
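As a rough illustration of the points above, the following Python sketch assumes a hypothetical agent interface with `online`, `offline`, and `monitor` methods. Each resource type gets its own agent class, and one agent instance monitors every resource of its type concurrently, one thread per resource. `Agent`, `ProcessAgent`, `monitor_all`, and the idea of identifying a process resource by its command line are assumptions made for the example; this is not the MCCS agent API.

```python
import shlex
import subprocess
import threading
from abc import ABC, abstractmethod


class Agent(ABC):
    """Hypothetical base class: one agent per resource type (not the MCCS API)."""

    @abstractmethod
    def online(self, resource: str) -> None: ...

    @abstractmethod
    def offline(self, resource: str) -> None: ...

    @abstractmethod
    def monitor(self, resource: str) -> str:
        """Return "Online" or "Offline" for the given resource."""

    def monitor_all(self, resources: list[str]) -> dict[str, str]:
        """Monitor several resources of this type concurrently, one thread each."""
        results: dict[str, str] = {}
        lock = threading.Lock()

        def worker(res: str) -> None:
            state = self.monitor(res)
            with lock:  # protect the shared results dict
                results[res] = state

        threads = [threading.Thread(target=worker, args=(r,)) for r in resources]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
        return results


class ProcessAgent(Agent):
    """Hypothetical process agent: bringing a process online means starting it."""

    def __init__(self) -> None:
        self._procs: dict[str, subprocess.Popen] = {}

    def online(self, resource: str) -> None:
        # Assumption: the resource is identified by its command line, e.g. "sleep 30".
        self._procs[resource] = subprocess.Popen(shlex.split(resource))

    def offline(self, resource: str) -> None:
        proc = self._procs.pop(resource, None)
        if proc is not None:
            proc.terminate()
            proc.wait()

    def monitor(self, resource: str) -> str:
        proc = self._procs.get(resource)
        # poll() returns None while the process is still running.
        return "Online" if proc is not None and proc.poll() is None else "Offline"
```

For example, after `agent.online("sleep 30")`, `agent.monitor("sleep 30")` reports `"Online"` while the process runs, and `"Offline"` once `agent.offline("sleep 30")` has terminated it. Giving each resource its own thread keeps one slow monitor call from delaying the others; a real agent would also need timeouts and error handling.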
MCCS provides the following agents.
Agent State
The current agent state is shown as the agent state value in the resource state view of the detailed information panel.
Agent State | Description |
Detached | Disabled state: the agent thread has not started yet, so the resource is not monitored. |
Opening | The resource has been enabled and the agent is preparing to monitor it. |
Probing | The agent is probing the resource to verify that it is ready for use. |
Online | The resource is online and is being monitored. |
Offline | The resource is offline and is being monitored. |
GoingOffline | The agent is taking an online resource offline. |
GoingOfflineWait | The offline command has completed and the agent is waiting for the resource to reach the Offline state. |
GoingOnline | The agent is bringing an offline resource online. |
GoingOnlineWait | The online command has completed and the agent is waiting for the resource to reach the Online state. |
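The states in the table can be represented as a simple enumeration. This Python sketch copies the state names from the table; the `AgentState` class and its two helper properties are hypothetical conveniences, not part of MCCS. `is_transitional` flags the 'Going~' states, which the agent passes through only while MCCS is carrying out an online or offline operation.

```python
from enum import Enum


class AgentState(Enum):
    """Agent states from the table (names from MCCS; the enum is a sketch)."""

    DETACHED = "Detached"
    OPENING = "Opening"
    PROBING = "Probing"
    ONLINE = "Online"
    OFFLINE = "Offline"
    GOING_OFFLINE = "GoingOffline"
    GOING_OFFLINE_WAIT = "GoingOfflineWait"
    GOING_ONLINE = "GoingOnline"
    GOING_ONLINE_WAIT = "GoingOnlineWait"

    @property
    def is_transitional(self) -> bool:
        """True for the 'Going~' states, entered only during MCCS operations."""
        return self.value.startswith("Going")

    @property
    def is_monitored(self) -> bool:
        """True for the steady states in which the resource is being monitored."""
        return self in (AgentState.ONLINE, AgentState.OFFLINE)
```

A state string read from the detailed information panel can be looked up by value, for example `AgentState("GoingOfflineWait").is_transitional`, which is `True`.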
Agent State Change
When a resource is enabled, its agent probes the resource and then starts monitoring it.
The agent state then changes according to the monitoring results and the commands the agent receives.
The figure below shows the cycle of agent state changes.
[Figure] Cycle of Agent State
- When a resource is first added, it is in the 'Disabled' state and its agent is in the 'Detached' state.
- When the resource is 'Enabled', the agent goes through 'Opening' and 'Probing' and then starts monitoring it.
- When an online resource is taken offline, the agent enters 'GoingOffline'; when the offline command finishes, it enters 'GoingOfflineWait'.
- If the monitoring result is Offline, the agent state becomes Offline. If the result is still Online, the agent keeps monitoring at the configured interval until the resource goes offline. If the offline command takes longer than the value of the 'OfflineTimeout' attribute, the offline process is canceled and the resource returns to the online state.
- When an offline resource is brought online, the agent enters 'GoingOnline'.
- When the online command finishes, the agent enters 'GoingOnlineWait' and monitoring begins. Monitoring repeats at the 'OfflineMonitorInterval' interval, as many times as the 'OnlineWaitLimit' attribute allows, until the resource comes online. If the monitoring result is Online, the agent state becomes Online; if the resource does not come online within 'OnlineWaitLimit', the attempt is considered a failure.

The agent state changes and cycles as shown in the figure above according to the resource state. However, this cycle does not apply when a resource state is changed for external reasons rather than by an operation through MCCS.
- If a resource state changes directly from Online to Offline without an MCCS operation, it is considered a failure or an abnormal state. When MCCS performs an operation, it sends a command and keeps monitoring, so the agent state passes through the 'Going~' states; these states occur only for operations performed through MCCS.
- For example, if an online application is terminated by an external error, this is considered a failure. Likewise, in a failover mode group, if a resource is forcibly brought online on a standby node while the group containing it is online on the active node, MCCS treats this as an abnormal state and takes the resource offline on the standby node.
- Any state change that does not follow the cycle shown in the figure above is considered a failure or an abnormal state.
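The cycle and the two wait phases described above can be approximated in a few functions. This Python sketch rests on several assumptions: the transition set is read from the description rather than taken from MCCS, 'OnlineWaitLimit' is treated as a maximum number of monitor cycles spaced `interval` seconds apart (the 'OfflineMonitorInterval' value), and the 'OfflineTimeout' clock starts when the wait begins. `monitor` stands for a hypothetical callback returning `"Online"` or `"Offline"`; `sleep` and `clock` are injectable so the logic can be exercised without real delays.

```python
import time
from typing import Callable

# Agent-state transitions read from the described cycle (an interpretation,
# not an MCCS data structure). Disabling a resource and other paths the text
# does not describe are left out of this sketch.
EXPECTED_TRANSITIONS = {
    ("Detached", "Opening"),
    ("Opening", "Probing"),
    ("Probing", "Online"),
    ("Probing", "Offline"),
    ("Online", "GoingOffline"),
    ("GoingOffline", "GoingOfflineWait"),
    ("GoingOfflineWait", "Offline"),
    # OfflineTimeout exceeded: offline canceled, resource back to online.
    # The text does not say which phase the timeout covers, so both are allowed.
    ("GoingOffline", "Online"),
    ("GoingOfflineWait", "Online"),
    ("Offline", "GoingOnline"),
    ("GoingOnline", "GoingOnlineWait"),
    ("GoingOnlineWait", "Online"),
}


def classify(prev: str, new: str) -> str:
    """Label an observed agent-state change as "expected" or "abnormal"."""
    return "expected" if (prev, new) in EXPECTED_TRANSITIONS else "abnormal"


def wait_for_online(
    monitor: Callable[[], str],
    online_wait_limit: int,
    interval: float,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """GoingOnlineWait: True if the resource is seen Online within
    online_wait_limit monitor cycles, False if the attempt counts as a failure."""
    for attempt in range(online_wait_limit):
        if monitor() == "Online":
            return True
        if attempt < online_wait_limit - 1:
            sleep(interval)  # wait one monitor interval before the next cycle
    return False


def wait_for_offline(
    monitor: Callable[[], str],
    offline_timeout: float,
    interval: float,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """GoingOfflineWait: return "Offline" once monitoring reports it, or
    "Online" if offline_timeout elapses first and the offline is canceled."""
    start = clock()
    while clock() - start <= offline_timeout:
        if monitor() == "Offline":
            return "Offline"
        sleep(interval)
    return "Online"
```

For instance, `classify("Online", "Offline")` returns `"abnormal"`, matching the rule that a direct Online to Offline change without an MCCS operation is a failure, while `classify("GoingOnlineWait", "Online")` returns `"expected"`.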