Snapshot


FSR 1.2 or higher

Overview

A snapshot backs up data by capturing a storage file system at a specific point in time, much like taking a picture. Replication alone cannot help when the latest data is damaged by an accident during replication, or when data integrity is compromised by a security incident such as a malware infection, because the damage is replicated along with the data. If you have taken snapshots in advance, you can avoid the worst outcome by restoring the data to an earlier point in time. In other words, the FSR snapshot function complements replication.

Snapshots are stored as images on each node's disk volumes and are controlled and processed independently on each node. As a result, snapshots are not interchangeable between cluster nodes: a snapshot taken on one node can only be restored on that node. If you take snapshots on each node, restoration is likewise performed node by node.


Configuration


Environment

Before using snapshots, first decide on the volume where they will be stored. Snapshots can be kept on the replication volume itself or stored externally on another disk volume. Base this choice on the used and free space of the replication volume: if free space is limited, it is recommended to store snapshots on an external volume.

Sizing the space for snapshots requires care. Some documentation recommends reserving free space of several tens of percent of the total volume. However, that figure is not the maximum capacity snapshots can consume, and it sometimes misleads users about how much snapshot space to configure.

Snapshots are based on copy-on-write (COW): when data is about to be changed, the original data is saved first. Shortly after a snapshot is created, it is therefore small, because little data has changed. As time passes, more data changes, and if every area of the data is eventually changed, the snapshot must hold a copy of all of the original data. In other words, the maximum capacity a snapshot can require equals the capacity of a full backup of your data.
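The copy-on-write behavior can be made concrete with a small in-memory model. The Python sketch below is a toy for illustration only, not how VSS actually stores data: the volume is a list of blocks, and the snapshot saves a block's original contents only the first time that block is overwritten. The class and method names are hypothetical.

```python
class CowVolume:
    """Toy block volume with one copy-on-write snapshot (illustration only)."""

    def __init__(self, blocks):
        self.blocks = list(blocks)  # live data, one entry per block
        self.saved = None           # block index -> original data; None means no snapshot

    def take_snapshot(self):
        # A new snapshot starts empty because nothing has changed yet.
        self.saved = {}

    def write(self, index, data):
        # Copy-on-write: preserve the original block before its first overwrite.
        if self.saved is not None and index not in self.saved:
            self.saved[index] = self.blocks[index]
        self.blocks[index] = data

    def snapshot_size(self):
        # Snapshot space in use = number of original blocks preserved so far.
        return 0 if self.saved is None else len(self.saved)

    def read_snapshot(self, index):
        # Point-in-time view: the preserved original if the block changed, else live data.
        if self.saved is None:
            return self.blocks[index]
        return self.saved.get(index, self.blocks[index])


vol = CowVolume(["a", "b", "c", "d"])
vol.take_snapshot()
print(vol.snapshot_size())   # 0: right after creation
vol.write(1, "B")
vol.write(1, "BB")           # a second write to the same block saves nothing new
print(vol.snapshot_size())   # 1
for i in range(4):
    vol.write(i, "x")
print(vol.snapshot_size())   # 4: every block changed, the size of a full copy
print([vol.read_snapshot(i) for i in range(4)])   # ['a', 'b', 'c', 'd']
```

In this model the snapshot can never hold more than one saved copy per block, which mirrors the full-backup upper bound described above.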

For example, if 100GB of a 1TB volume is in use, a snapshot may require up to 100GB. If the used space later grows from 100GB to 150GB, a new snapshot may require up to 150GB.

The following are specific configuration examples for operating snapshots.

  • Replication volume 1TB, used space 300GB
  • One snapshot per day, with each snapshot kept for 1 week (7 days)

In this example, 7 snapshot images must be kept for one week, so the volume that stores the snapshots needs at least 300GB * 7 = 2.1TB and at most 1TB * 7 = 7TB. Right now each snapshot needs up to as much capacity as the currently used space, but used space can grow over time, so plan for the maximum capacity.
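The sizing rule reduces to simple arithmetic. The Python helper below is a hypothetical sketch that applies the worst-case rule from this section (each retained snapshot may need as much space as the used data today, and at most the whole volume); it assumes decimal units, with 1TB taken as 1000GB as in the example.

```python
def snapshot_space_gb(used_gb, volume_gb, retained_snapshots):
    """Return (minimum, maximum) snapshot storage in GB under the worst-case rule.

    Each retained snapshot may grow to the size of the data it protects:
    today that is the used space, and at most the whole volume.
    """
    if not 0 <= used_gb <= volume_gb:
        raise ValueError("used space must be between 0 and the volume size")
    return used_gb * retained_snapshots, volume_gb * retained_snapshots


# Daily snapshots kept for 7 days on a 1TB volume with 300GB used.
low, high = snapshot_space_gb(used_gb=300, volume_gb=1000, retained_snapshots=7)
print(f"at least {low / 1000:.1f}TB, at most {high / 1000:.1f}TB")
# at least 2.1TB, at most 7.0TB
```

When sizing the snapshot volume for long-term operation, use the second value.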

  • FSR snapshots are subject to the limits of the Windows Volume Shadow Copy Service (VSS).
    • The maximum volume size supported by VSS is 64TB. Snapshots are not supported on volumes larger than 64TB.
    • Up to 64 snapshot images can be stored per volume. When more snapshots are recorded, the oldest snapshot is deleted.
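The two VSS limits above can be modeled in a few lines. This Python sketch only illustrates the stated behavior (a volume size check and oldest-first deletion once 64 images exist); the names are hypothetical and nothing here calls VSS.

```python
from collections import deque

MAX_VOLUME_TB = 64   # VSS volume size limit stated above
MAX_SNAPSHOTS = 64   # snapshot images per volume stated above


def supports_snapshots(volume_size_tb):
    # Volumes larger than 64TB cannot be snapshotted.
    return volume_size_tb <= MAX_VOLUME_TB


class SnapshotRetention:
    """Toy model of the image limit: recording one more drops the oldest."""

    def __init__(self, limit=MAX_SNAPSHOTS):
        self.images = deque(maxlen=limit)  # a full deque discards from the left on append

    def record(self, snapshot_id):
        full = len(self.images) == self.images.maxlen
        dropped = self.images[0] if full else None
        self.images.append(snapshot_id)
        return dropped  # the snapshot that was deleted, if any


store = SnapshotRetention()
for day in range(1, 66):
    dropped = store.record(f"day{day}")
print(len(store.images), store.images[0], dropped)   # 64 day2 day1
```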



Snapshot type

To use snapshots, you need to specify the snapshot type. Depending on the recording method, the type can be copy, full, differential, or incremental; it is set with the type item in the snapshot section of the configuration file.

  • Copy: Backs up all data and application logs. This is the default.
  • Full: Backs up all data.
  • Differential: Backs up all changes since the last full backup.
  • Incremental: Backs up the changes since the last full or incremental backup.

FSR uses the copy type by default, so unless you specify otherwise, snapshots are recorded as copy backups. However, copy backups cannot serve as the base for differential or incremental backups, so to use differential or incremental snapshots you may first need to record a full snapshot. Whether this is required depends on the backup types the application supports.
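The relationship between the types can be illustrated with a small Python model that tracks which changed files each type would capture. It is a hypothetical sketch of the rules listed above, not FSR or VSS code, and it assumes an application that requires a full backup before any differential or incremental backup.

```python
class BackupChain:
    """Toy model of which files each snapshot type captures."""

    def __init__(self, files):
        self.files = set(files)
        self.since_full = set()   # changed since the last full backup
        self.since_last = set()   # changed since the last full or incremental backup
        self.has_full = False

    def modify(self, name):
        self.files.add(name)
        self.since_full.add(name)
        self.since_last.add(name)

    def backup(self, kind):
        if kind == "copy":
            # Copies everything but resets no change tracking,
            # so it cannot serve as a base for later backups.
            return set(self.files)
        if kind == "full":
            self.has_full = True
            self.since_full.clear()
            self.since_last.clear()
            return set(self.files)
        if not self.has_full:
            raise ValueError(f"{kind} backup requires a previous full backup")
        if kind == "differential":
            return set(self.since_full)   # base stays the last full backup
        if kind == "incremental":
            captured = set(self.since_last)
            self.since_last.clear()       # the next incremental starts from here
            return captured
        raise ValueError(f"unknown backup type: {kind}")


chain = BackupChain({"a", "b", "c"})
chain.backup("copy")                          # works, but establishes no base
chain.backup("full")
chain.modify("a")
print(sorted(chain.backup("incremental")))    # ['a']
chain.modify("b")
print(sorted(chain.backup("incremental")))    # ['b']
print(sorted(chain.backup("differential")))   # ['a', 'b']
```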


Pre/Post Processing

FSR snapshots aim for application consistency. To obtain an application-consistent snapshot, the following steps must be performed.

  1. Before recording the snapshot:
    1. Suspend application I/O and flush the application's memory buffers so that the disk holds the latest data.
    2. Flush the file system cache for the volume.
  2. Record the snapshot.
  3. Resume application I/O.

As the procedure shows, steps are needed both before and after the snapshot is recorded. FSR's pre-/post-snapshot handlers let users control the application at these points with scripts.
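The procedure and the role of the handlers can be sketched in Python as follows. Everything here is hypothetical: FSR's handler configuration is not described in this section, so the sketch simply takes plain callables. Resuming I/O sits in a finally clause so the application is not left suspended if the snapshot fails, and when no application hooks are supplied only the file system flush runs, which yields a file-system-consistent rather than an application-consistent snapshot.

```python
def take_consistent_snapshot(record_snapshot, flush_fs_cache,
                             suspend_app=None, resume_app=None):
    """Run the pre/post steps around a snapshot (hypothetical sketch).

    suspend_app: pre-snapshot hook that pauses application I/O and
                 flushes the application's buffers to disk.
    resume_app:  post-snapshot hook that resumes application I/O.
    """
    if suspend_app is not None:
        suspend_app()              # step 1.1
    try:
        flush_fs_cache()           # step 1.2
        return record_snapshot()   # step 2
    finally:
        if resume_app is not None:
            resume_app()           # step 3, even if step 1.2 or 2 failed


calls = []
result = take_consistent_snapshot(
    record_snapshot=lambda: calls.append("snapshot") or "ok",
    flush_fs_cache=lambda: calls.append("flush"),
    suspend_app=lambda: calls.append("suspend"),
    resume_app=lambda: calls.append("resume"),
)
print(calls, result)   # ['suspend', 'flush', 'snapshot', 'resume'] ok
```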

To make such application-consistent snapshots possible, Windows VSS has applications implement a VSS writer. When VSS receives a snapshot request, it works with the application's VSS writer to perform the steps above in order, producing an application-consistent snapshot. Therefore, if the target application implements a VSS writer, you do not need to write pre/post handlers. The following are representative programs that support VSS writers.

In practice, however, most applications other than these programs do not implement a VSS writer.

If the application cannot be controlled through the procedure above, at least flush the file system cache so that the snapshot is recorded with file system consistency.


Control

The FSR snapshot function is an add-on that backs up the replication target while replication is running. Snapshots are therefore managed and controlled per replication resource and operate as subordinate objects of that resource.

The control commands are as follows.


Check environment support

First, check if your current environment supports the snapshot feature.

λ fsradm status -v
r2:node1 role:secondary file:up_to_date fs-type:ntfs pending:0 lock:off
  last-promoted:2022-11-25T14:06:36+09:00 snapshot:available
  node2 state:established peer-state:established role:secondary file:outdated pending:0
    repl-started:2022-11-25T14:06:36+09:00 last-synced:2022-11-25T14:06:37+09:00 out-of-sync:none

In the status output above, if the resource's snapshot status (the snapshot item) is unavailable, the current environment does not support snapshots.
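When scripting around FSR, this check can be automated by reading the snapshot field of the status output. The Python sketch below parses the sample output shown above; it assumes the layout stays as in this example (resource header lines unindented, detail lines indented) and only reads text, so run fsradm status -v yourself and pass its output in. The function name is hypothetical.

```python
import re

SAMPLE = """\
r2:node1 role:secondary file:up_to_date fs-type:ntfs pending:0 lock:off
  last-promoted:2022-11-25T14:06:36+09:00 snapshot:available
  node2 state:established peer-state:established role:secondary file:outdated pending:0
    repl-started:2022-11-25T14:06:36+09:00 last-synced:2022-11-25T14:06:37+09:00 out-of-sync:none
"""


def snapshot_states(status_text):
    """Return {resource: snapshot status} parsed from fsradm status -v text."""
    states = {}
    resource = None
    for line in status_text.splitlines():
        header = re.match(r"([^\s:]+):", line)   # unindented "r2:node1 ..." header lines
        if header:
            resource = header.group(1)
        found = re.search(r"\bsnapshot:(\S+)", line)
        if found and resource is not None:
            states[resource] = found.group(1)
    return states


states = snapshot_states(SAMPLE)
print(states)                            # {'r2': 'available'}
print(states.get("r2") == "available")   # True: snapshots are supported
```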


create and delete

Snapshots can be created once the replication resource has been configured, its metadata has been created (meta-create), and the resource has been started. A created snapshot is recorded in FSR's metadata and remains associated with the FSR resource until that metadata is deleted. Deleting the resource does not delete the snapshot images; explicitly deleting the snapshot or clearing the metadata disassociates the snapshot from the resource.


Creating a snapshot requires a resource and a snapshot ID, which the user chooses. In the example below, r2 is the resource and test is the snapshot ID.

λ fsradm snapshot create r2 test
done

Snapshot creation is performed asynchronously, so check its progress with status queries.

λ fsradm status -v
r2:node1 role:secondary file:up_to_date fs-type:ntfs pending:0 lock:off
  last-promoted:2022-11-25T14:06:36+09:00 snapshot:creating
  doing snapshot set...
  node2 state:established peer-state:established role:secondary file:outdated pending:0
    repl-started:2022-11-25T14:06:36+09:00 last-synced:2022-11-25T14:06:37+09:00 out-of-sync:none

When creation is complete, the snapshot status changes back to available.
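Because creation is asynchronous, a script that creates a snapshot usually has to wait for the status to return to available. A minimal Python polling sketch follows. It shells out to the fsradm status -v command shown above and reads the snapshot field, assuming the output format matches these examples; the helper names, the 5-second interval, and the 1-hour timeout are arbitrary choices for illustration. The demonstration feeds canned output so no real fsradm call is made.

```python
import re
import subprocess
import time


def fsradm_status():
    # Run the status command shown above and return its text output.
    return subprocess.run(["fsradm", "status", "-v"],
                          capture_output=True, text=True, check=True).stdout


def snapshot_status(text, resource):
    """Return the snapshot:<state> value reported for one resource, or None."""
    current = None
    for line in text.splitlines():
        header = re.match(r"([^\s:]+):", line)   # unindented "r2:node1 ..." header lines
        if header:
            current = header.group(1)
        found = re.search(r"\bsnapshot:(\S+)", line)
        if found and current == resource:
            return found.group(1)
    return None


def wait_until_available(resource, get_status=fsradm_status,
                         interval=5.0, timeout=3600.0, sleep=time.sleep):
    """Poll until the resource's snapshot status is 'available'; return seconds waited."""
    waited = 0.0
    while True:
        state = snapshot_status(get_status(), resource)
        if state == "available":
            return waited
        if state in (None, "unavailable"):
            raise RuntimeError(f"{resource}: snapshot status is {state!r}")
        if waited >= timeout:
            raise TimeoutError(f"{resource}: snapshot still {state!r} after {waited}s")
        sleep(interval)
        waited += interval


# Demonstration with canned status output instead of a real fsradm call.
CREATING = ("r2:node1 role:secondary\n"
            "  last-promoted:2022-11-25T14:06:36+09:00 snapshot:creating\n"
            "  doing snapshot set...\n")
AVAILABLE = ("r2:node1 role:secondary\n"
             "  last-promoted:2022-11-25T14:06:36+09:00 snapshot:available\n")
outputs = iter([CREATING, CREATING, AVAILABLE])
print(wait_until_available("r2", get_status=lambda: next(outputs),
                           interval=1, sleep=lambda s: None))   # 2.0
```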

Delete unnecessary snapshots with the following command.

λ fsradm snapshot delete r2 test
done


check

λ fsradm snapshot list r2
r2:node1 count:1
  snapshot-id:test created:2022-11-28T14:37:59+09:00 state:available

The list command shows the created snapshots, and the state field shows whether each snapshot image is valid. If a snapshot's images have been deleted, its state is shown as not_exists. Additional information can be checked with the detailed output option or by querying an individual snapshot with the show command.

λ fsradm snapshot show r2 test
created:2022-11-28T14:37:59+09:00 state:available
directories:
  C:\r2
    recursive:true
images:
  index:0
    guid:{09DFE010-BAE8-4581-BC9E-836A9F556ACA}
    mount-path:C:\
    volume:\\?\Volume{d0c8016a-dc90-11ec-80b3-806e6f6e6963}\
    shadow-volume:\\?\GLOBALROOT\Device\HarddiskVolumeShadowCopy8
    created:2022-11-28T14:37:54+09:00
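For scripting, the state field of the list output can be used to find snapshots whose images are gone, which are the ones the cleanup command removes in bulk. The Python sketch below assumes the list format shown above; in its sample input, the first entry is the documented output and the second is a hypothetical entry added only to show the not_exists case. The function name is hypothetical.

```python
import re

# First entry: the sample list output above. Second entry: hypothetical,
# added only to illustrate a snapshot whose images were deleted.
SAMPLE_LIST = """\
r2:node1 count:2
  snapshot-id:test created:2022-11-28T14:37:59+09:00 state:available
  snapshot-id:old created:2022-11-21T14:37:59+09:00 state:not_exists
"""


def snapshot_entries(list_text):
    """Return [(snapshot_id, state), ...] from fsradm snapshot list output."""
    # '.' does not cross newlines, so each match stays within one entry line.
    return re.findall(r"snapshot-id:(\S+).*?\bstate:(\S+)", list_text)


entries = snapshot_entries(SAMPLE_LIST)
stale = [sid for sid, state in entries if state == "not_exists"]
print(entries)   # [('test', 'available'), ('old', 'not_exists')]
print(stale)     # ['old']
```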


restore

Snapshot recovery must be preceded by the following operations.

  • Demote: A running (primary) node cannot perform snapshot recovery, so demote it first.

  • Disconnect: To keep the node from becoming a replication target, terminate its connection so that it enters the Standalone state.

  • File locking: Apply at least a read-only file lock to prevent files from changing during recovery.

λ fsradm snapshot recovery r2 test
done

Like snapshot creation, recovery runs asynchronously, so verify its progress with status queries.

export

λ fsradm snapshot export r2 test e:\test
done

The command above copies the contents of the snapshot to the specified path. If the resource has multiple replication paths, you can export them by specifying multiple destination paths. To export all replication paths into one path, specify the --combine option with a single destination path.

expose

Exposing a snapshot mounts it at a specified path so that you can access its files directly. Because a snapshot can contain more than one image, you must specify the image index (number). The image list can be checked with the snapshot show command. In the example below, 0 is the image index and y: is the path.

λ fsradm snapshot expose r2 test 0 y:
done

When you're done using it, you can disconnect from the path like so:

λ fsradm snapshot unexpose r2 test 0
done

If you do not remember which snapshots you have exposed, you can list only the exposed snapshots.

λ fsradm snapshot list-exposed
r2:node1 count:1
  snapshot-id:test created:2022-11-28T14:37:59+09:00 state:available
  directories:
    C:\r2
      recursive:true
  images:
    index:0
      guid:{09DFE010-BAE8-4581-BC9E-836A9F556ACA}
      mount-path:C:\
      volume:\\?\Volume{d0c8016a-dc90-11ec-80b3-806e6f6e6963}\
      shadow-volume:\\?\GLOBALROOT\Device\HarddiskVolumeShadowCopy8
      created:2022-11-28T14:37:54+09:00
      expose-path:y:\
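Forgetting to unexpose leaves the image mounted. A Python context manager wrapping the expose and unexpose commands shown above makes sure the unexpose step runs even if the work in between fails. This is a sketch: the wrapper names are hypothetical, it assumes fsradm is on PATH and reports failure through a non-zero exit code, and the demonstration substitutes a fake runner so that no command is actually executed.

```python
import subprocess
from contextlib import contextmanager


def fsradm(*args, run=subprocess.run):
    # Run one fsradm command; check=True raises CalledProcessError on a non-zero exit.
    return run(["fsradm", *args], capture_output=True, text=True, check=True)


@contextmanager
def exposed_snapshot(resource, snapshot_id, index, path, run=subprocess.run):
    """Expose one snapshot image at path and always unexpose it afterwards."""
    fsradm("snapshot", "expose", resource, snapshot_id, str(index), path, run=run)
    try:
        yield path
    finally:
        fsradm("snapshot", "unexpose", resource, snapshot_id, str(index), run=run)


# Demonstration with a fake runner that only records the commands.
issued = []


def fake_run(cmd, **kwargs):
    issued.append(cmd)


with exposed_snapshot("r2", "test", 0, "y:", run=fake_run) as root:
    pass  # read or copy files under root here

for cmd in issued:
    print(" ".join(cmd))
# fsradm snapshot expose r2 test 0 y:
# fsradm snapshot unexpose r2 test 0
```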


cleanup

Snapshots that are no longer valid because their images have been deleted can be removed in bulk with the following command.

λ fsradm snapshot cleanup r2
done


Troubleshooting


Windows

Backup type not supported error

If the backup type specified in the configuration file is not supported by a VSS writer installed on the operating system, an error can occur during snapshot creation. The following error occurs when FSR is configured for incremental backups but SqlServerWriter, the VSS writer for MS SQL Server, does not support incremental backups.

writer "SqlServerWriter" only supports (full, copy, differential): backup type not available

Change the snapshot type to one of the backup types listed in the error, then create the snapshot again.
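If snapshots are created by a script, the supported types can be read from the error message itself. The Python sketch below parses the error format shown above; it assumes the message text keeps this form, and the preference order in pick_type is an arbitrary example policy. The chosen value still has to be written to the type item of the configuration file by hand or by your own tooling.

```python
import re

ERROR = ('writer "SqlServerWriter" only supports (full, copy, differential): '
         'backup type not available')


def supported_types(message):
    """Extract (writer, [types]) from a 'backup type not available' message."""
    match = re.search(r'writer "([^"]+)" only supports \(([^)]*)\)', message)
    if match is None:
        return None
    return match.group(1), [t.strip() for t in match.group(2).split(",")]


def pick_type(message, preferred=("incremental", "differential", "full", "copy")):
    """Choose the first preferred type the writer accepts (hypothetical policy)."""
    parsed = supported_types(message)
    if parsed is None:
        return None
    _, allowed = parsed
    return next((t for t in preferred if t in allowed), None)


print(supported_types(ERROR))   # ('SqlServerWriter', ['full', 'copy', 'differential'])
print(pick_type(ERROR))         # differential
```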


Writer error

If a VSS writer encounters an error during snapshot creation, only an error code, without details, is passed to the VSS requester (FSR). For example, when an error occurs in the MS SQL Server VSS writer, the following error is output:

writer "SqlServerWriter": VSS_E_WRITERERROR_NONRETRYABLE

The types of errors that can occur include:

  • VSS_E_WRITERERROR_INCONSISTENTSNAPSHOT

    • Only part of the volume was backed up. Such a snapshot cannot be used for restoration, so it is treated as a failure. Determine the cause from the event log or the log of the corresponding VSS writer.

  • VSS_E_WRITERERROR_OUTOFRESOURCES

    • Insufficient memory or other system resources. You will need to free up resources and try again.

  • VSS_E_WRITERERROR_TIMEOUT

    • An operation in the writer timed out. Another application may be consuming too many resources.

  • VSS_E_WRITERERROR_RETRYABLE

    • FSR retries the operation automatically when this error occurs. If this error is reported, the operation has failed with the same error after a total of five retries. Determine the cause from the event log or the log of the corresponding VSS writer.

  • VSS_E_WRITERERROR_NONRETRYABLE

    • The writer's operation failed with an error that would recur if retried. Determine the cause from the event log or the log of the corresponding VSS writer.
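The retry behavior described for VSS_E_WRITERERROR_RETRYABLE can be pictured with the Python sketch below. It is not FSR's implementation: the exception class, the delay, and the reading of "a total of five retries" as one initial attempt plus five retries are assumptions for illustration. Errors other than the retryable one are raised immediately.

```python
import time


class WriterError(Exception):
    """Hypothetical error carrying a VSS writer error code."""

    def __init__(self, code):
        super().__init__(code)
        self.code = code


RETRYABLE = {"VSS_E_WRITERERROR_RETRYABLE"}


def create_with_retry(create, retries=5, delay=1.0, sleep=time.sleep):
    """Call create(); retry only retryable writer errors, up to `retries` more times."""
    for attempt in range(retries + 1):
        try:
            return create()
        except WriterError as err:
            if err.code not in RETRYABLE or attempt == retries:
                raise  # non-retryable, or retries exhausted: report the error
            sleep(delay)


attempts = []


def flaky():
    # Fails twice with a retryable error, then succeeds.
    attempts.append(1)
    if len(attempts) < 3:
        raise WriterError("VSS_E_WRITERERROR_RETRYABLE")
    return "created"


print(create_with_retry(flaky, sleep=lambda s: None), len(attempts))   # created 3
```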