Backup

CloudNativePG natively supports online/hot backup of PostgreSQL clusters through continuous physical backup and WAL archiving. This means that the database is always up (no downtime required) and that you can recover at any point in time from the first available base backup in your system. The latter is normally referred to as "Point In Time Recovery" (PITR).

The operator can orchestrate a continuous backup infrastructure that is based on the Barman tool. Instead of using the classical architecture with a Barman server, which backs up many PostgreSQL instances, the operator relies on the barman-cloud-wal-archive, barman-cloud-check-wal-archive, barman-cloud-backup, barman-cloud-backup-list, and barman-cloud-backup-delete tools. As a result, base backups will be tarballs. Both base backups and WAL files can be compressed and encrypted.

This requires an image that includes barman-cli-cloud. You can use the ghcr.io/cloudnative-pg/postgresql image for this purpose, as it combines the community PostgreSQL image with the latest barman-cli-cloud package.
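
For example, here is a minimal sketch of pointing a Cluster at that image via spec.imageName (the tag is just a placeholder; pick the version you actually need):

apiVersion: postgresql.cnpg.io/v1
kind: Cluster
[...]
spec:
  imageName: ghcr.io/cloudnative-pg/postgresql:16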

Important

Always ensure that you are running the latest version of the operands in your system to take advantage of the improvements introduced in Barman cloud (as well as improve the security aspects of your cluster).

A backup is performed from a primary or a designated primary instance in a Cluster (please refer to replica clusters for more information about designated primary instances), or alternatively on a standby.

Common object stores

If you are looking for instructions for a specific object store, such as AWS S3, Microsoft Azure Blob Storage, Google Cloud Storage, MinIO Gateway, or a compatible provider, please refer to Appendix A - Common object stores.

On-demand backups

To request a new backup, you need to create a new Backup resource like the following one:

apiVersion: postgresql.cnpg.io/v1
kind: Backup
metadata:
  name: backup-example
spec:
  cluster:
    name: pg-backup
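
Assuming the manifest above is saved as backup-example.yaml, you can submit it with:

kubectl apply -f backup-example.yaml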

The operator will start to orchestrate the cluster to take the required backup using barman-cloud-backup. You can check the backup status using the plain kubectl describe backup <name> command:

Name:         backup-example
Namespace:    default
Labels:       <none>
Annotations:  API Version:  postgresql.cnpg.io/v1
Kind:         Backup
Metadata:
  Creation Timestamp:  2020-10-26T13:57:40Z
  Self Link:         /apis/postgresql.cnpg.io/v1/namespaces/default/backups/backup-example
  UID:               ad5f855c-2ffd-454a-a157-900d5f1f6584
Spec:
  Cluster:
    Name:  pg-backup
Status:
  Phase:       running
  Started At:  2020-10-26T13:57:40Z
Events:        <none>

When the backup has been completed, the phase will be set to completed, as in the following example:

Name:         backup-example
Namespace:    default
Labels:       <none>
Annotations:  API Version:  postgresql.cnpg.io/v1
Kind:         Backup
Metadata:
  Creation Timestamp:  2020-10-26T13:57:40Z
  Self Link:         /apis/postgresql.cnpg.io/v1/namespaces/default/backups/backup-example
  UID:               ad5f855c-2ffd-454a-a157-900d5f1f6584
Spec:
  Cluster:
    Name:  pg-backup
Status:
  Backup Id:         20201026T135740
  Destination Path:  s3://backups/
  Endpoint URL:      http://minio:9000
  Phase:             completed
  s3Credentials:
    Access Key Id:
      Key:   ACCESS_KEY_ID
      Name:  minio
    Secret Access Key:
      Key:      ACCESS_SECRET_KEY
      Name:     minio
  Server Name:  pg-backup
  Started At:   2020-10-26T13:57:40Z
  Stopped At:   2020-10-26T13:57:44Z
Events:         <none>

Important

This feature will not back up the secrets for the superuser and the application user. The secrets are expected to be backed up as part of the standard backup procedures for the Kubernetes cluster.
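
As an illustration only, a sketch of exporting those secrets with kubectl (the names pg-backup-app and pg-backup-superuser follow the operator's default <cluster>-app and <cluster>-superuser convention and may differ, or not both exist, in your setup):

kubectl get secret pg-backup-app pg-backup-superuser -o yaml > pg-backup-secrets.yaml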

Scheduled backups

You can also schedule your backups periodically by creating a resource named ScheduledBackup. The latter is similar to a Backup but with an added field, called schedule.

This field allows you to define a six-term cron schedule specification, which includes seconds, as expressed in the Go cron package format.

This is an example of a scheduled backup:

apiVersion: postgresql.cnpg.io/v1
kind: ScheduledBackup
metadata:
  name: backup-example
spec:
  schedule: "0 0 0 * * *"
  backupOwnerReference: self
  cluster:
    name: pg-backup

The above example will schedule a backup every day at midnight, because the schedule specifies zero for the second, minute, and hour, and a wildcard, meaning all, for the day of the month, month, and day of the week.

In Kubernetes CronJobs, the equivalent expression is 0 0 * * * because seconds are not included.

Hint

Backup frequency might impact your recovery time objective (RTO) after a disaster which requires a full or Point-In-Time recovery operation. Our advice is that you regularly test your backups by recovering them, and then measuring the time it takes to recover from scratch, so that you can refine your RTO predictability. Recovery time is influenced by the size of the base backup and the amount of WAL files that need to be fetched from the archive and replayed during recovery (remember that WAL archiving is what enables continuous backup in PostgreSQL!). Based on our experience, a weekly base backup is more than enough for most cases, while it is extremely rare to schedule backups more frequently than once a day.
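
Following that advice, a sketch of a weekly schedule (midnight every Sunday, using the six-field format described above and a hypothetical resource name) could be:

apiVersion: postgresql.cnpg.io/v1
kind: ScheduledBackup
metadata:
  name: backup-weekly-example
spec:
  schedule: "0 0 0 * * 0"
  cluster:
    name: pg-backup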

ScheduledBackups can be suspended, if needed, by setting .spec.suspend: true. This will prevent any new backup from being scheduled for as long as the option is set to true.
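
For example, assuming the ScheduledBackup above is named backup-example, one way to suspend it is a patch along these lines:

kubectl patch scheduledbackup backup-example \
  --type merge -p '{"spec":{"suspend":true}}'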

If you want a backup to be issued as soon as the ScheduledBackup resource is created, you can set .spec.immediate: true.

Note

.spec.backupOwnerReference indicates which ownerReference should be put inside the created backup resources.

  • none: no owner reference for created backup objects (same behavior as before the field was introduced)
  • self: sets the ScheduledBackup object as the owner of the backup
  • cluster: sets the Cluster as the owner of the backup

WAL archiving

WAL archiving is enabled as soon as you choose a destination path and you configure your cloud credentials.
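
A minimal sketch of such a configuration, assuming an S3-compatible store and a secret named aws-creds (both placeholders), could look like:

apiVersion: postgresql.cnpg.io/v1
kind: Cluster
[...]
spec:
  backup:
    barmanObjectStore:
      destinationPath: "s3://backups/"
      s3Credentials:
        accessKeyId:
          name: aws-creds
          key: ACCESS_KEY_ID
        secretAccessKey:
          name: aws-creds
          key: ACCESS_SECRET_KEY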

If required, you can choose to compress WAL files as soon as they are uploaded and/or encrypt them:

apiVersion: postgresql.cnpg.io/v1
kind: Cluster
[...]
spec:
  backup:
    barmanObjectStore:
      [...]
      wal:
        compression: gzip
        encryption: AES256

You can configure the encryption directly in your bucket, and the operator will use it unless you override it in the cluster configuration.

PostgreSQL implements a sequential archiving scheme, where the archive_command will be executed sequentially for every WAL segment to be archived.

Important

By default, CloudNativePG sets archive_timeout to 5min, ensuring that WAL files, even in case of low workloads, are closed and archived at least every 5 minutes, providing a deterministic time-based value for your Recovery Point Objective (RPO). Even though you can change the value of the archive_timeout setting in the PostgreSQL configuration, our experience suggests that the default value set by the operator is suitable for most use cases.
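
If you do need a different value, a sketch of overriding it through the PostgreSQL configuration (here with a hypothetical 10-minute timeout) is:

apiVersion: postgresql.cnpg.io/v1
kind: Cluster
[...]
spec:
  postgresql:
    parameters:
      archive_timeout: "10min"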

When the bandwidth between the PostgreSQL instance and the object store allows archiving more than one WAL file in parallel, you can use the parallel WAL archiving feature of the instance manager like in the following example:

apiVersion: postgresql.cnpg.io/v1
kind: Cluster
[...]
spec:
  backup:
    barmanObjectStore:
      [...]
      wal:
        compression: gzip
        maxParallel: 8
        encryption: AES256

In the previous example, the instance manager optimizes the WAL archiving process by archiving in parallel at most eight ready WALs, including the one requested by PostgreSQL.

When PostgreSQL requests the archiving of a WAL file that the instance manager has already archived as part of this optimization, the archive request is simply dismissed with a positive status.

Backup from a standby

Taking a base backup requires reading the whole data content of the PostgreSQL instance on disk, possibly resulting in I/O contention with the actual workload of the database.

For this reason, CloudNativePG allows you to take advantage of a feature which is directly available in PostgreSQL: backup from a standby.

By default, backups will run on the most aligned replica of a Cluster. If no replicas are available, backups will run on the primary instance.

Info

Although the standby might not always be up to date with the primary, in the time continuum from the first available backup to the last archived WAL this is normally irrelevant. The base backup indeed represents the starting point from which to begin a recovery operation, including PITR. Similarly to what happens with pg_basebackup, when backing up from a standby we do not force a switch of the WAL on the primary. This might produce unexpected results in the short term (before archive_timeout kicks in) in deployments with low write activity.

If you prefer to always run backups on the primary, you can set the backup target to primary as outlined in the example below:

apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  [...]
spec:
  backup:
    target: "primary"

When the backup target is set to prefer-standby, this policy ensures that backups are run on the most up-to-date available secondary instance or, if no other instance is available, on the primary instance.
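
For example, a sketch that makes this behavior explicit in the Cluster definition:

apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  [...]
spec:
  backup:
    target: "prefer-standby"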

By default, when not otherwise specified, target is automatically set to prefer-standby, taking backups from a standby whenever possible.

The backup target specified in the Cluster can be overridden in the Backup and ScheduledBackup types, like in the following example:

apiVersion: postgresql.cnpg.io/v1
kind: Backup
metadata:
  [...]
spec:
  cluster:
    name: [...]
  target: "primary"

In the previous example, CloudNativePG will invariably choose the primary instance even if the Cluster is set to prefer replicas.

Retention policies

CloudNativePG can manage the automated deletion of backup files from the backup object store, using retention policies based on the recovery window.

Internally, the retention policy feature uses barman-cloud-backup-delete with --retention-policy "RECOVERY WINDOW OF {{ retention policy value }} {{ retention policy unit }}".
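
As an illustration only, with the 30-day policy shown below and the destination path and server name from the earlier backup status, the resulting invocation would be roughly equivalent to:

barman-cloud-backup-delete --retention-policy "RECOVERY WINDOW OF 30 DAYS" s3://backups/ pg-backup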

For example, you can define your backups with a retention policy of 30 days as follows:

apiVersion: postgresql.cnpg.io/v1
kind: Cluster
[...]
spec:
  backup:
    barmanObjectStore:
      destinationPath: "<destination path here>"
      s3Credentials:
        accessKeyId:
          name: aws-creds
          key: ACCESS_KEY_ID
        secretAccessKey:
          name: aws-creds
          key: ACCESS_SECRET_KEY
    retentionPolicy: "30d"

There's more ...

The recovery window retention policy is focused on the concept of Point of Recoverability (PoR), a moving point in time determined by current time - recovery window. The first valid backup is the first available backup before PoR (in reverse chronological order). CloudNativePG must ensure that we can recover the cluster at any point in time between PoR and the latest successfully archived WAL file, starting from the first valid backup. Base backups that are older than the first valid backup will be marked as obsolete and permanently removed after the next backup is completed.

Compression algorithms

CloudNativePG by default archives backups and WAL files in an uncompressed fashion. However, it also supports the following compression algorithms via barman-cloud-backup (for backups) and barman-cloud-wal-archive (for WAL files):

  • bzip2
  • gzip
  • snappy

The compression settings for backups and WALs are independent. See the DataBackupConfiguration and WALBackupConfiguration sections in the API reference.
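
For instance, a sketch of independent settings for the two (field names as in the API reference) could be:

apiVersion: postgresql.cnpg.io/v1
kind: Cluster
[...]
spec:
  backup:
    barmanObjectStore:
      [...]
      data:
        compression: bzip2
      wal:
        compression: gzip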

It is important to note that archival time, restore time, and size change between the algorithms, so the compression algorithm should be chosen according to your use case.

The Barman team has performed an evaluation of the performance of the supported algorithms for Barman Cloud. The following table summarizes a scenario where a backup is taken on a local MinIO deployment. The Barman GitHub project includes a deeper analysis.

Compression   Backup Time (ms)   Restore Time (ms)   Uncompressed size (MB)   Compressed size (MB)   Approx ratio
None          10927              7553                395                      395                    1:1
bzip2         25404              13886               395                      67                     5.9:1
gzip          116281             3077                395                      91                     4.3:1
snappy        8134               8341                395                      166                    2.4:1

Tagging of backup objects

Barman 2.18 introduces support for tagging backup resources when saving them in object stores via barman-cloud-backup and barman-cloud-wal-archive. As a result, if your PostgreSQL container image includes Barman with version 2.18 or higher, CloudNativePG enables you to specify tags as key-value pairs for backup objects, namely base backups, WAL files and history files.

You can use two properties in the .spec.backup.barmanObjectStore definition:

  • tags: key-value pair tags to be added to backup objects and archived WAL files in the backup object store
  • historyTags: key-value pair tags to be added to archived history files in the backup object store

The excerpt of a YAML manifest below provides an example of usage of this feature:

apiVersion: postgresql.cnpg.io/v1
kind: Cluster
[...]
spec:
  backup:
    barmanObjectStore:
      [...]
      tags:
        backupRetentionPolicy: "expire"
      historyTags:
        backupRetentionPolicy: "keep"