Main Features

Synchronization

Before FSR starts replicating data, it copies all data from the source node to the target node so that both nodes hold identical data. This process is called synchronization.

The initial synchronization copies the entire data set to the target. If resynchronization is required after the initial synchronization has completed, FSR performs a partial synchronization that transfers only the changes made on the source node, which keeps synchronization efficient. Resynchronization takes place when the replication connection returns to normal, for example when the replication network is disconnected and then reconnected during replication, or when a node is stopped (rebooted) and then reconnects. In other words, after the initial synchronization, FSR automatically performs partial synchronization whenever synchronization is required.
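The difference between initial and partial synchronization can be sketched as follows. This is a minimal illustration under stated assumptions, not FSR's implementation: the `plan_sync` helper and the idea of tracking changed paths in a set are hypothetical and exist only for this example.

```python
def plan_sync(all_files, changed_files, initial_done):
    """Return the sorted list of paths to transfer from source to target.

    Hypothetical helper: until the initial synchronization has completed,
    every file is transferred; afterwards only the paths recorded as
    changed on the source are transferred.
    """
    if not initial_done:
        # Initial (full) synchronization.
        return sorted(all_files)
    # Partial synchronization: only the recorded changes.
    return sorted(changed_files)


files = {"a.txt", "b.txt", "c.txt"}
full_plan = plan_sync(files, set(), initial_done=False)
partial_plan = plan_sync(files, {"b.txt"}, initial_done=True)
print(full_plan)
print(partial_plan)
```

In this sketch the cost of a resynchronization depends on the number of changed paths, not on the size of the whole data set.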

Users can also start synchronization manually, for example by issuing a sync command or by configuring periodic synchronization.

Replication

Reflecting data changes to the target in real time is called replication. Data can change even while synchronization is in progress, so synchronization and replication are handled separately. Both reflect data from the source to the target, but FSR treats them as distinct operations.

Replication is categorized as synchronous or asynchronous depending on how it is handled.

The synchronous method fully guarantees that the target matches the source, because each write I/O is applied to both the source disk and the target disk before it completes. However, the replication response time of the target node adds to local I/O latency, which limits the use of synchronous replication for performance-critical services.
The asynchronous method considers replication complete once the write I/O has been applied to the local disk and the replication data has been copied to the transfer buffer. It does not fully guarantee that the target matches the source, because data still held in the buffer may be lost if the source node fails suddenly. In return, it delivers high performance because replication does not add to local I/O latency, and it is used for long-distance replication because local writes are not constrained by transmission bandwidth.
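The two completion rules described above can be contrasted with a toy model. This is only a sketch under assumptions: the `Replicator` class is hypothetical, plain dictionaries stand in for the source and target disks, and an in-memory queue stands in for the transfer buffer. It does not reflect FSR's internal design.

```python
import queue
import threading


class Replicator:
    """Toy model: `local` and `remote` are dicts standing in for disks."""

    def __init__(self, synchronous, buffer_size=1024):
        self.local, self.remote = {}, {}
        self.synchronous = synchronous
        # Transfer buffer, used only in asynchronous mode.
        self.buffer = queue.Queue(maxsize=buffer_size)
        self.sender = threading.Thread(target=self._drain, daemon=True)
        self.sender.start()

    def write(self, key, data):
        self.local[key] = data
        if self.synchronous:
            # The write completes only after the target also holds the data.
            self.remote[key] = data
        else:
            # The write completes once the data is buffered; the target lags.
            self.buffer.put((key, data))

    def _drain(self):
        # Background sender: applies buffered writes to the target in order.
        while True:
            key, data = self.buffer.get()
            self.remote[key] = data
            self.buffer.task_done()

    def flush(self):
        """Block until every buffered write has reached the target."""
        self.buffer.join()


sync_rep = Replicator(synchronous=True)
sync_rep.write("file", b"v1")

async_rep = Replicator(synchronous=False)
async_rep.write("file", b"v1")
async_rep.flush()
```

In this model, anything still in `buffer` when the process dies never reaches `remote`, which mirrors the data-loss window of asynchronous replication. A full buffer makes `write` block, which is one reason the buffer size matters.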

FSR supports asynchronous replication by default. Asynchronous replication buffers data internally to minimize the impact on local I/O latency, so the buffer size should be set appropriately for your environment. FSR provides both memory buffers and file buffers, and their sizes are set within the limits of the memory and disk space available on the system.

Online File Verification

Online File Verification computes file-level hashes for the replicated file sets on the source and target nodes, lists them, and compares them in real time. If the comparison finds a difference, FSR informs the user, and the difference can be resolved by resynchronization.
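A file-level hash comparison of this kind can be sketched as follows. The sketch makes several assumptions that the manual does not state: SHA-256 as the hash, two local directories standing in for the source and target nodes, and the hypothetical helper names `hash_tree` and `compare_trees`.

```python
import hashlib
import tempfile
from pathlib import Path


def hash_tree(root):
    """Map each file's path relative to `root` to its SHA-256 hex digest."""
    root = Path(root)
    digests = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            digests[path.relative_to(root).as_posix()] = hashlib.sha256(
                path.read_bytes()
            ).hexdigest()
    return digests


def compare_trees(source_root, target_root):
    """Return relative paths that are missing on one side or differ in content."""
    src, dst = hash_tree(source_root), hash_tree(target_root)
    return sorted(p for p in src.keys() | dst.keys() if src.get(p) != dst.get(p))


# Demonstration with two temporary directories.
with tempfile.TemporaryDirectory() as src, tempfile.TemporaryDirectory() as dst:
    Path(src, "same.txt").write_text("hello")
    Path(dst, "same.txt").write_text("hello")
    Path(src, "changed.txt").write_text("v2")
    Path(dst, "changed.txt").write_text("v1")
    Path(src, "missing.txt").write_text("only on source")
    differences = compare_trees(src, dst)

print(differences)
```

The paths returned by `compare_trees` are the candidates that a resynchronization would need to repair.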

Under normal operating conditions, FSR does not need to verify the integrity of the source and target. Online File Verification is useful in the following situation:
When an unintended change must be resolved, for example when the target's files are not protected and data has been modified or deleted, and you need to compare the source and target to find the differences.

For more information, refer to Online Verification.

Split-brain detection

If an unintended abnormal situation occurs during replication, measures are required to protect the data on the source side. One such situation is split-brain: at some point after a disconnection, two nodes both become primary, which can cause data loss. This condition must be detected automatically and replication must be blocked; this feature is called split-brain detection. For more information, see Split-brain in Troubleshooting.

Snapshot

Failover may fail because of a fault during replication, or recovery from the replication target may be impossible because of a security incident such as a malware infection. In general, you should prepare for such cases by backing up your data. FSR supports recovery to a specific point in time through its snapshot feature and provides all the interfaces needed for snapshot-based point-in-time recovery, including creating and managing snapshots and adjusting the space in the snapshot storage. For more information, see the snapshots section.

Terms

Node

A node is a generic term for a device connected to a network; computer nodes are called hosts. The terms node and host are often used interchangeably, and this manual uses node without distinguishing it from host.

Cluster

A cluster is a collection of computer nodes grouped for a particular purpose. In this manual, a cluster is a replication cluster: the set of source and target nodes configured to replicate with each other. FSR represents a replication cluster on a per-resource basis.

Resource

A resource is a replication resource, the unit that provides a replication service. A resource consists of nodes, connections, and a set of replication filesets, and it is described by an FSR configuration file. FSR reads the replication environment and settings from the configuration file and performs replication accordingly.

Source (Primary)

The node that holds the source data in a replication cluster is called the source node, or simply the source. Its role in the replication cluster is Primary.

Terms such as primary, source, and active nodes are used differently depending on the replication or HA environment, but are not usually strictly distinguished.

Target (Secondary)

The node that receives the original or incremental data in a replication cluster and maintains a copy is called the target node, or simply the target. Its role in the replication cluster is Secondary.

The terms Secondary, Target, and Standby Node are used differently depending on the replication or HA environment, but are not usually strictly distinguished.

For replication, the Primary is always the source and the Secondary is always the target. For synchronization, however, a Secondary can also act as the source: even when only Secondary nodes exist in the replication cluster, they must be able to synchronize from the Secondary that holds the latest data.

File Set

A fileset is a unit within a resource that describes what is replicated. It specifies the files or directories to be replicated and contains an exclusion filter. An exclusion filter is a policy that excludes some files or directories from replication. It is based on regular expressions such as wildcards.
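The effect of an exclusion filter can be illustrated with shell-style wildcards. This is a sketch only: the `select_files` helper, the sample paths, and the pattern syntax (Python's `fnmatch`) are assumptions for the example and are not FSR's actual filter syntax.

```python
from fnmatch import fnmatch


def select_files(paths, exclude_patterns):
    """Keep the paths that match none of the exclusion patterns."""
    return [
        p for p in paths
        if not any(fnmatch(p, pattern) for pattern in exclude_patterns)
    ]


paths = ["data/a.db", "data/a.tmp", "data/cache/x.bin", "data/logs/app.log"]
selected = select_files(paths, ["*.tmp", "data/cache/*"])
print(selected)
```

Here temporary files and everything under `data/cache/` are left out, and the remaining paths form the set that would be replicated.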

Consistency

Consistency represents data integrity: the data on the source and the target is exactly the same. File replication ensures byte-level data consistency.

Split-Brain

A split brain is a condition in which two or more nodes in the replication cluster hold the primary role at some point, potentially causing data loss. When a split brain occurs, the user decides which primary node's data to sacrifice and then resolves the split brain to return replication to normal.
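The definition above can be expressed as a simple check over node roles. This sketch only restates the definition; the `find_split_brain` function and the role strings are hypothetical, and FSR's actual detection (which also involves the RID) is not shown.

```python
def find_split_brain(roles):
    """Return the sorted names of the nodes holding the Primary role when
    two or more nodes hold it; otherwise return an empty list."""
    primaries = sorted(node for node, role in roles.items() if role == "Primary")
    return primaries if len(primaries) >= 2 else []


healthy = find_split_brain({"node1": "Primary", "node2": "Secondary"})
conflict = find_split_brain({"node1": "Primary", "node2": "Primary"})
print(healthy)
print(conflict)
```

A non-empty result lists the nodes among which the user must choose the one whose data is kept.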

RID

FSR maintains and tracks a ULID-based unique identifier that represents the file state of a fileset to be replicated. This value is called the revision identifier (RID). FSR uses the RID to determine the direction of synchronization and to identify a split brain.
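A standard ULID is a 26-character Crockford Base32 string whose first 10 characters encode a 48-bit millisecond timestamp, so ULIDs sort lexicographically by creation time. The sketch below uses that property to pick a synchronization source. It rests on assumptions: that an RID has the standard ULID layout, and that the node with the newest RID is the source. The manual states neither rule, and `pick_sync_source` is a hypothetical helper.

```python
CROCKFORD = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"


def ulid_timestamp_ms(ulid):
    """Decode the 48-bit millisecond timestamp from the first 10 characters."""
    value = 0
    for ch in ulid[:10].upper():
        value = value * 32 + CROCKFORD.index(ch)
    return value


def pick_sync_source(rids):
    """Assumed rule: the node whose RID sorts last (newest) is the source."""
    return max(rids, key=lambda node: rids[node])


# Sample values with the standard ULID layout, used here as stand-in RIDs.
rids = {
    "node1": "01ARZ3NDEKTSV4RRFFQ69G5FAV",
    "node2": "01BX5ZZKBKACTAV9WEVGEMMVRZ",
}
source = pick_sync_source(rids)
print(source)
```

Because string order and timestamp order agree for ULIDs, the comparison needs no decoding; `ulid_timestamp_ms` is included only to make the embedded time visible.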

Topology

FSR can configure various connections between nodes depending on the replication configuration. It typically operates in a mesh or star topology.
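The two topologies differ in which node pairs are connected. The following sketch enumerates the connections for each, assuming that mesh means a full mesh; the function names and node names are illustrative only.

```python
from itertools import combinations


def mesh_connections(nodes):
    """Full mesh: every node is connected to every other node."""
    return list(combinations(nodes, 2))


def star_connections(hub, nodes):
    """Star: every non-hub node is connected only to the hub."""
    return [(hub, n) for n in nodes if n != hub]


nodes = ["node1", "node2", "node3", "node4"]
mesh = mesh_connections(nodes)
star = star_connections("node1", nodes)
print(len(mesh), len(star))
```

For n nodes, a full mesh needs n(n-1)/2 connections while a star needs n-1, which is the trade-off between redundancy and configuration size.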
