Stable Sync

1:N replication establishes replication connections between all N nodes, so the network topology becomes an increasingly complicated mesh as the number of nodes grows. If such a configuration is operated without a consistent rule, the directions of replication and synchronization between nodes are decided arbitrarily and may not agree with each other, which can cause data inconsistency.

This chapter describes the consistent bsr synchronization policy required in such a mesh network replication environment.

The following figure represents the case where the direction of replication and synchronization between nodes is the same or different.

If the direction of replication and the direction of synchronization are the same, there is no problem. The problem arises when they differ. If the replication source node and the synchronization source node are different and a single target node receives both replication data and synchronization data, data integrity between the nodes may be broken. Replication data received from the Primary is always up-to-date, but data blocks received from another Secondary acting as the synchronization source may be out-of-date. If data from the two sources falls in an overlapping block area, an up-to-date block may be overwritten with old data.
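
The overlap problem can be illustrated with a small model. This is only a sketch under assumptions: a disk is represented as a dictionary of block number to data version, and `apply_write` is a hypothetical helper, not part of bsr.

```python
# Hypothetical model: a disk maps block number -> data version.
primary = {0: "v2", 1: "v2", 2: "v2"}          # replication source, up-to-date
stale_secondary = {0: "v2", 1: "v1", 2: "v2"}  # sync source, block 1 is out-of-date
target = {0: "v1", 1: "v1", 2: "v1"}           # node receiving from both sources


def apply_write(disk, block, data):
    """Write one block; whichever write arrives last wins."""
    disk[block] = data


# Replication from the Primary updates block 1 first.
apply_write(target, 1, primary[1])

# Synchronization from the stale Secondary then covers the same block.
for block, data in stale_secondary.items():
    apply_write(target, block, data)

# Blocks on the target that no longer match the Primary.
mismatched = sorted(b for b in target if target[b] != primary[b])
print(mismatched)  # prints [1]
```

Block 1 ends up holding the old version even though the target had already received the current version through replication.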

We solved this problem by actively mediating the direction of replication and synchronization between nodes, which ensures consistency in the mesh network. Before explaining the synchronization policy, you need to understand some of the concepts behind it.

A stable node is a Primary node, or a Secondary node that is neither a SyncTarget nor connected to the Primary node.
An unstable node is a Secondary node that is connected to the Primary node or is a SyncTarget. It is called unstable because its data may be changing constantly.
An authoritative node is the peer node that caused a node to become unstable. An authoritative node is itself a stable node.
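
The definitions above can be expressed as a simple predicate. This is a minimal sketch of one reading of them, not bsr's actual implementation. The `NodeState` type and its field names are hypothetical, and treating every node that is not stable as unstable is an assumption of the sketch.

```python
from dataclasses import dataclass


@dataclass
class NodeState:
    role: str                   # "Primary" or "Secondary"
    is_sync_target: bool        # currently receiving synchronization data
    connected_to_primary: bool  # has a connection to a Primary peer


def is_stable(node: NodeState) -> bool:
    """Assumed reading: a Primary is stable; a Secondary is stable only if
    it is neither a SyncTarget nor connected to a Primary."""
    if node.role == "Primary":
        return True
    return not node.is_sync_target and not node.connected_to_primary


# A Secondary that is receiving replication from a Primary is not stable.
print(is_stable(NodeState("Secondary", False, True)))  # prints False
```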

The synchronization policy below is built on these concepts.

A stable node can be a SyncSource. An unstable node cannot be a SyncSource.
When a SyncSource node becomes unstable, the synchronization in progress is stopped. Depending on the situation, the stopped synchronization is resumed when the node returns to the stable state.
When a node becomes Primary, the interrupted synchronization is resumed with the Primary node as the SyncSource. Synchronization continues from the block at which it was interrupted.
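
The stop and resume behavior of the policy above can be sketched as a small state model. The `ResyncSession` class, its methods, and the node names are hypothetical and only illustrate the rules; they do not reflect bsr's internal structures.

```python
from dataclasses import dataclass


@dataclass
class ResyncSession:
    source: str           # name of the current SyncSource
    total_blocks: int
    next_block: int = 0   # first block that has not been synchronized yet
    paused: bool = False

    def copy_blocks(self, count: int) -> None:
        """Synchronize up to `count` blocks; does nothing while paused."""
        if self.paused:
            return
        self.next_block = min(self.next_block + count, self.total_blocks)

    def source_became_unstable(self) -> None:
        """An unstable node cannot be a SyncSource, so stop the session."""
        self.paused = True

    def resume_from(self, stable_source: str) -> None:
        """Continue with a stable source from the interrupted block."""
        self.source = stable_source
        self.paused = False

    @property
    def done(self) -> bool:
        return self.next_block >= self.total_blocks


session = ResyncSession(source="node-b", total_blocks=100)
session.copy_blocks(40)
session.source_became_unstable()   # e.g. node-a was promoted to Primary
session.copy_blocks(40)            # ignored while the session is stopped
progress_while_paused = session.next_block
session.resume_from("node-a")      # the Primary takes over as SyncSource
session.copy_blocks(60)
print(progress_while_paused, session.next_block, session.done)
```

Keeping `next_block` across the source change is what allows synchronization to continue from the interrupted block instead of starting over.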

The above process is illustrated in the figure below.

A policy that keeps the directions of synchronization and replication aligned in this way is called the stable synchronization policy.

Reconciliation Resync

For example, suppose the Primary node is replicating in real time to two Secondary nodes, and the Primary crashes so that only the two Secondary nodes remain. The two Secondary nodes cannot be guaranteed to hold exactly the same data. This is a natural result, since each node operates independently. However, it is not appropriate to keep operating the two remaining nodes as UpToDate without taking any action. At a minimum, their data should be made to match through synchronization between the two nodes.

In this situation, bsr determines which of the Secondary nodes has the most recent data and synchronizes the other node from it, so that the data on the two nodes becomes consistent. This is called reconciliation resynchronization.
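
The selection step can be sketched as follows. How bsr actually decides which node holds the most recent data is not described here, so the per-node `last_applied` counter, the `plan_reconciliation` function, and the node names are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple


def plan_reconciliation(last_applied: Dict[str, int]) -> Tuple[str, List[str]]:
    """Pick the node with the newest data as SyncSource.

    `last_applied` maps a node name to a hypothetical counter of the last
    write it applied. Nodes that are behind the source become SyncTargets.
    """
    source = max(last_applied, key=last_applied.get)
    newest = last_applied[source]
    targets = [name for name, seq in last_applied.items() if seq < newest]
    return source, targets


# The Primary has crashed; two Secondary nodes remain.
source, targets = plan_reconciliation({"node-b": 1041, "node-c": 1038})
print(source, targets)  # prints node-b ['node-c']
```

If both nodes report the same counter, the target list is empty and no synchronization is needed in this model.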

The data on the two nodes must be made to match regardless of whether the crashed Primary node later restarts and rejoins the cluster. In this way, data consistency between the nodes in the cluster can be guaranteed in all operational situations.
