Backup on volume snapshots
The initial release of volume snapshots (version 1.21.0) only supported cold backups, which required fencing of the instance. This limitation has been lifted starting with version 1.21.1. Given the minimal impact of the change on the code, the maintainers decided to backport this feature immediately instead of waiting for version 1.22.0, and to make online backups the default behavior on volume snapshots as well. If you plan to rely on cold backups instead, make sure you follow the instructions below.
As noted in the backup document, a cold snapshot explicitly set to target the primary will result in the primary being fenced for the duration of the backup, rendering the cluster read-only during that time. For safety, in a cluster already containing fenced instances, a cold snapshot is rejected.
CloudNativePG is one of the first known cases of database operators that directly leverages the Kubernetes native Volume Snapshot API for both backup and recovery operations, in an entirely declarative way.
About standard Volume Snapshots
Volume snapshotting was first introduced in Kubernetes 1.12 (2018) as alpha, promoted to beta in 1.17 (2019), and moved to GA in 1.20 (2020). It's now stable, widely available, and standard, providing 3 custom resource definitions (CRDs): VolumeSnapshot, VolumeSnapshotContent, and VolumeSnapshotClass.
This Kubernetes feature defines a generic interface for:
- the creation of a new volume snapshot, starting from a PVC
- the deletion of an existing snapshot
- the creation of a new volume from a snapshot
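For illustration, the following is a minimal sketch of the standard resource at the center of this interface: a VolumeSnapshot that requests a snapshot of an existing PVC. The names pgdata-snapshot, cluster-example-1, and csi-snapclass are hypothetical placeholders, not values taken from this document:

```yaml
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: pgdata-snapshot                     # hypothetical snapshot name
spec:
  volumeSnapshotClassName: csi-snapclass    # a snapshot class served by your CSI driver
  source:
    persistentVolumeClaimName: cluster-example-1   # the existing PVC to snapshot
```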
Kubernetes delegates the actual implementation to the underlying CSI drivers (not all of which support volume snapshots). Normally, storage classes that provide volume snapshotting support incremental and differential block-level backups in a way that is transparent to the application, which can delegate the complexity and the independent management down the stack, including cross-cluster availability of the snapshots.
For Volume Snapshots to work with a CloudNativePG cluster, you need to ensure that each storage class used to dynamically provision the PostgreSQL volumes (namely, the storage and walStorage sections) supports volume snapshots.
Given that instructions vary from storage class to storage class, please refer to the documentation of the specific storage class and related CSI drivers you have deployed in your Kubernetes system.
Normally, it is the CSI driver that is responsible for ensuring that snapshots can be taken from persistent volumes of a given storage class, and managed as VolumeSnapshot and VolumeSnapshotContent resources.
It is your responsibility to verify with the third-party vendor that volume snapshots are supported. CloudNativePG only interacts with the Kubernetes API on this matter, and we cannot support issues at the storage level for each specific CSI driver.
How to configure Volume Snapshot backups
CloudNativePG allows you to configure a given Postgres cluster for Volume Snapshot backups through the backup.volumeSnapshot stanza of the cluster resource. Please refer to the VolumeSnapshotConfiguration section in the API reference for a full list of options.
A generic example with volume snapshots (assuming that PGDATA and WALs share the same storage class) is the following:
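A minimal sketch of such a manifest follows; the cluster name, storage class, volume snapshot class, and object store destination are placeholders to adapt to your environment:

```yaml
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: snapshot-cluster            # placeholder name
spec:
  instances: 3

  storage:
    storageClass: <storage-class-name>
    size: 10Gi
  walStorage:
    storageClass: <storage-class-name>
    size: 10Gi

  backup:
    # Volume snapshot backups
    volumeSnapshot:
      className: <volume-snapshot-class-name>
    # WAL archive
    barmanObjectStore:
      destinationPath: s3://<bucket-name>/
      # credentials and other object store options go here
```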
As you can see, the backup section contains both the volumeSnapshot stanza (controlling physical base backups on volume snapshots) and the barmanObjectStore one (controlling the WAL archive).
Once you have defined the
barmanObjectStore, you can decide to use
both volume snapshot and object store backup strategies simultaneously
to take physical backups.
The volumeSnapshot.className option allows you to reference the default
VolumeSnapshotClass object used for all the storage volumes you have
defined in your PostgreSQL cluster.
In case you are using a different storage class for the WAL files, you can specify a separate VolumeSnapshotClass for that volume through the walClassName option (which defaults to the same value as className).
Once a cluster is defined for volume snapshot backups, you need to define a ScheduledBackup resource that requests such backups on a periodic basis.
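For example, a minimal sketch of such a ScheduledBackup, assuming a cluster named snapshot-cluster and a daily schedule (both placeholder values):

```yaml
apiVersion: postgresql.cnpg.io/v1
kind: ScheduledBackup
metadata:
  name: snapshot-cluster-backup     # placeholder name
spec:
  # six-field cron-like schedule (with seconds): daily at midnight
  schedule: '0 0 0 * * *'
  method: volumeSnapshot
  cluster:
    name: snapshot-cluster          # placeholder cluster name
```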
Hot and cold backups
By default, CloudNativePG requests an online/hot backup on volume snapshots, using the PostgreSQL defaults of the low-level API for base backups:
- it doesn't request an immediate checkpoint when starting the backup procedure
- it waits for the WAL archiver to archive the last segment of the backup when terminating the backup procedure
The default values are suitable for most production environments. Hot backups are consistent and can be used to perform snapshot recovery, as we ensure WAL retention from the start of the backup through a temporary replication slot. However, our recommendation is to rely on cold backups for that purpose.
You can explicitly change the default behavior through the following options in the .spec.backup.volumeSnapshot stanza of the Cluster resource:
- online: whether to request an online/hot backup or a cold one, accepting true (default) or false as a value
- onlineConfiguration.immediateCheckpoint: whether you want to request an immediate checkpoint before you start the backup procedure or not; technically, it corresponds to the fast argument you pass to the pg_start_backup() function in PostgreSQL, accepting true or false (default)
- onlineConfiguration.waitForArchive: whether you want to wait for the archiver to process the last segment of the backup or not; technically, it corresponds to the wait_for_archive argument you pass to the pg_stop_backup() function in PostgreSQL, accepting true (default) or false
If you want to change the default behavior of your Postgres cluster to take
cold backups by default, all you need to do is add the
online: false option
to your manifest, as follows:
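For example, as a sketch showing only the relevant part of the cluster spec:

```yaml
spec:
  backup:
    volumeSnapshot:
      className: <volume-snapshot-class-name>
      # request cold backups (fencing the target instance) by default
      online: false
```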
If you are instead requesting an immediate checkpoint as the default behavior, you can add this section:
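Again, as a sketch of the relevant fragment only:

```yaml
spec:
  backup:
    volumeSnapshot:
      className: <volume-snapshot-class-name>
      onlineConfiguration:
        # request an immediate checkpoint when the backup starts
        immediateCheckpoint: true
```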
Overriding the default behavior
You can change the default behavior defined in the cluster resource by setting different values for online and, if needed, onlineConfiguration in the Backup or ScheduledBackup objects.
For example, in case you want to issue an on-demand cold backup, you can define a Backup object with online: false:
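A minimal sketch of such an on-demand Backup, with placeholder names:

```yaml
apiVersion: postgresql.cnpg.io/v1
kind: Backup
metadata:
  name: on-demand-cold-backup       # placeholder name
spec:
  cluster:
    name: snapshot-cluster          # placeholder cluster name
  method: volumeSnapshot
  # override the cluster default and take a cold backup
  online: false
```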
Similarly, for the ScheduledBackup:
schedule: "0 0 0 * * *"
Persistence of volume snapshot objects
By default, VolumeSnapshot objects created by CloudNativePG are retained after the deletion of the Backup object that originated them, or of the Cluster they refer to.
Such behavior is controlled by the snapshotOwnerReference option in the .spec.backup.volumeSnapshot stanza, which accepts the following values (see the example after this list):
- none: no ownership is set, meaning that VolumeSnapshot objects persist after the Backup and/or the Cluster resources are removed
- backup: the VolumeSnapshot object is owned by the Backup resource that originated it, and when the backup object is removed, the volume snapshot is also removed
- cluster: the VolumeSnapshot object is owned by the Cluster resource that is backed up, and when the Postgres cluster is removed, the volume snapshot is also removed
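For example, to have the volume snapshots removed together with the Backup resources that generated them, a sketch of the relevant fragment of the cluster spec:

```yaml
spec:
  backup:
    volumeSnapshot:
      className: <volume-snapshot-class-name>
      # tie the lifecycle of the VolumeSnapshot objects to the Backup resources
      snapshotOwnerReference: backup
```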
In case a VolumeSnapshot is deleted, the deletionPolicy specified in the VolumeSnapshotContent is evaluated:
- if set to Retain, the VolumeSnapshotContent object is kept
- if set to Delete, the VolumeSnapshotContent object is removed as well
VolumeSnapshotContent objects do not keep all the information regarding the
backup and the cluster they refer to (like the annotations and labels that
are contained in the
VolumeSnapshot object). Although possible, restoring
from just this kind of object might not be straightforward. For this reason,
our recommendation is to always back up the VolumeSnapshot definitions, even using a Kubernetes-level data protection solution.
The value of deletionPolicy in the VolumeSnapshotContent is determined by the deletionPolicy set in the corresponding VolumeSnapshotClass definition, which is referenced in the .spec.backup.volumeSnapshot.className option.
Please refer to the Kubernetes documentation on Volume Snapshot Classes for details on this standard behavior.
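As an illustration of that standard behavior, the following is a sketch of a VolumeSnapshotClass that keeps the underlying VolumeSnapshotContent objects when snapshots are deleted; the class name and driver are placeholders:

```yaml
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshotClass
metadata:
  name: example-snapclass           # placeholder name
driver: <csi-driver-name>           # the CSI driver deployed in your cluster
deletionPolicy: Retain              # keep VolumeSnapshotContent after VolumeSnapshot deletion
```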
Example
The following example shows how to configure volume snapshot base backups on an EKS cluster on AWS using the ebs-sc storage class and a volume snapshot class defined for the EBS CSI driver.
If you are interested in testing the example, please read "Volume Snapshots" for the Amazon Elastic Block Store (EBS) CSI driver for detailed instructions on the installation process for the storage class and the snapshot class.
The following manifest creates a Cluster that is ready to be used for volume snapshots and that stores the WAL archive in an S3 bucket via IAM Role for the Service Account (IRSA, see AWS S3):
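A sketch of such a manifest follows; the cluster name, bucket, volume snapshot class, and IAM role ARN are placeholders, and the exact object store and IRSA settings must be adapted to your environment:

```yaml
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: eks-snapshot-cluster        # placeholder name
spec:
  instances: 3

  storage:
    storageClass: ebs-sc
    size: 10Gi
  walStorage:
    storageClass: ebs-sc
    size: 10Gi

  backup:
    # Volume snapshot backups through the EBS CSI driver
    volumeSnapshot:
      className: <volume-snapshot-class-name>
    # WAL archive on S3, authenticated via IRSA
    barmanObjectStore:
      destinationPath: s3://<bucket-name>/
      s3Credentials:
        inheritFromIAMRole: true

  serviceAccountTemplate:
    metadata:
      annotations:
        eks.amazonaws.com/role-arn: arn:aws:iam::<account-id>:role/<role-name>
---
apiVersion: postgresql.cnpg.io/v1
kind: ScheduledBackup
metadata:
  name: eks-snapshot-cluster-backup # placeholder name
spec:
  # daily at midnight, with one backup requested immediately after creation
  schedule: '0 0 0 * * *'
  immediate: true
  method: volumeSnapshot
  cluster:
    name: eks-snapshot-cluster
```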
The last resource defines daily volume snapshot backups at midnight, requesting one immediately after the cluster is created.