Replica clusters
A replica cluster is a CloudNativePG Cluster
resource designed to
replicate data from another PostgreSQL instance, ideally also managed by
CloudNativePG.
Typically, a replica cluster is deployed in a different Kubernetes cluster in another region. These clusters can be configured to perform cascading replication and can rely on object stores for data replication from the source, as detailed further down.
There are primarily two use cases for replica clusters:
- Disaster Recovery and High Availability: Enhance disaster recovery and, to some extent, high availability of a CloudNativePG cluster across different Kubernetes clusters, typically located in different regions. In CloudNativePG terms, this is known as a "Distributed Topology".
- Read-Only Workloads: Create standalone replicas of a PostgreSQL cluster for purposes such as reporting or Online Analytical Processing (OLAP). These replicas are primarily for read-only workloads. In CloudNativePG terms, this is referred to as a "Standalone Replica Cluster".
For example, the diagram in the "Architecture" section illustrates a distributed PostgreSQL topology spanning two Kubernetes clusters, with a symmetric replica cluster primarily serving disaster recovery purposes.
Basic Concepts
CloudNativePG builds on the PostgreSQL replication framework, allowing you to create and synchronize a PostgreSQL cluster from an existing source cluster using the replica cluster feature — described in this section. The source can be a primary cluster or another replica cluster (cascading replication).
About PostgreSQL Roles
A replica cluster operates in continuous recovery mode, meaning no changes to
the database, including the catalog and global objects like roles or databases,
are permitted. These changes are deferred until the Cluster
transitions to
primary. During this phase, global objects such as roles remain as defined in
the source cluster. CloudNativePG applies any local redefinitions once the
cluster is promoted.
If you are not planning to promote the cluster (e.g., for read-only workloads), or if you intend to detach completely from the source cluster once the replica cluster is promoted, you don't need to take any action. This is typically the case with a "Standalone Replica Cluster".
If you are planning to promote the cluster at some point, CloudNativePG will manage the following roles and passwords when transitioning from replica cluster to primary:
- the application user
- the superuser (if you are using it)
- any role defined using the declarative interface
If you want these roles and passwords to remain unchanged across the transition, you need to define the corresponding secrets in each Cluster. This is typically the case with a "Distributed Topology".
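For example, here is a minimal, hypothetical sketch of a declarative role definition that you would keep identical, together with the Secret it references, in every Cluster of the topology (the role and secret names below are illustrative, not part of the examples in this page):
# Hypothetical sketch: keep this stanza, and the referenced Secret,
# identical in every Cluster participating in the distributed topology
managed:
  roles:
    - name: app_reporting              # illustrative role name
      ensure: present
      login: true
      passwordSecret:
        name: app-reporting-password   # copy this Secret to every Kubernetes cluster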
Bootstrapping a Replica Cluster
The first step is to bootstrap the replica cluster using one of the following methods:
- Streaming replication via pg_basebackup
- Recovery from a volume snapshot
- Recovery from a Barman Cloud backup in an object store
For detailed instructions on cloning a PostgreSQL server using pg_basebackup
(streaming) or recovery (volume snapshot or object store), refer to the
"Bootstrap" section.
Configuring Replication
Once the base backup for the replica cluster is available, you need to define how changes will be replicated from the origin using PostgreSQL continuous recovery. There are three main options:
- Streaming Replication: Set up streaming replication between the replica cluster and the source. This method requires configuring network connections and implementing appropriate administrative and security measures to ensure seamless data transfer.
- WAL Archive: Use the WAL (Write-Ahead Logging) archive stored in an object store. WAL files are regularly transferred from the source cluster to the object store, from where the barman-cloud-wal-restore utility retrieves them for the replica cluster.
- Hybrid Approach: Combine both streaming replication and WAL archive methods. PostgreSQL can manage and switch between these two approaches as needed to ensure data consistency and availability.
Defining an External Cluster
When configuring the external cluster, you have the following options:
- barmanObjectStore section:
  - Enables use of the WAL archive, with CloudNativePG automatically setting the restore_command in the designated primary instance.
  - Allows bootstrapping the replica cluster from an object store using the recovery section if volume snapshots are not feasible.
- connectionParameters section:
  - Enables bootstrapping the replica cluster via streaming replication using the pg_basebackup section.
  - CloudNativePG automatically sets the primary_conninfo option in the designated primary instance, initiating a WAL receiver process to connect to the source cluster and receive data.
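As an illustration, the following is a minimal sketch of an externalClusters entry that combines both options, enabling the hybrid approach described above (the cluster name, endpoint, and secret names are hypothetical):
externalClusters:
  - name: cluster-origin                      # hypothetical source cluster
    # WAL archive: used to set restore_command on the designated primary
    barmanObjectStore:
      destinationPath: s3://cluster-origin/
      endpointURL: http://minio:9000
      s3Credentials:
        accessKeyId:
          name: origin-backup-creds
          key: ACCESS_KEY_ID
        secretAccessKey:
          name: origin-backup-creds
          key: ACCESS_SECRET_KEY
    # Streaming: used to set primary_conninfo on the designated primary
    connectionParameters:
      host: cluster-origin-rw.default.svc
      user: streaming_replica
      sslmode: verify-full
      dbname: postgres
      sslKey:
        name: cluster-origin-replication
        key: tls.key
      sslCert:
        name: cluster-origin-replication
        key: tls.crt
      sslRootCert:
        name: cluster-origin-ca
        key: ca.crt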
Backup and Symmetric Architectures
The replica cluster can perform backups to a reserved object store from the designated primary, supporting symmetric architectures in a distributed environment. This architectural choice is crucial as it ensures the cluster is prepared for promotion during a controlled data center switchover or a failover following an unexpected event.
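For instance, the replica cluster might point its backups and WAL archiving at its own, reserved object store, mirroring the configuration of the primary. The following is a hedged sketch; the bucket, endpoint, and secret names are hypothetical:
# Hypothetical sketch: the replica cluster archives its own backups and WALs
# to a reserved object store, keeping the architecture symmetric
backup:
  barmanObjectStore:
    destinationPath: s3://cluster-replica-backups/
    endpointURL: https://s3.example.com
    s3Credentials:
      accessKeyId:
        name: replica-backup-creds
        key: ACCESS_KEY_ID
      secretAccessKey:
        name: replica-backup-creds
        key: ACCESS_SECRET_KEY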
Distributed Architecture Flexibility
You have the flexibility to design your preferred distributed architecture for a PostgreSQL database, choosing from:
- Private Cloud: Spanning multiple Kubernetes clusters in different data centers.
- Public Cloud: Spanning multiple Kubernetes clusters in different regions.
- Hybrid Cloud: Combining private and public clouds.
- Multi-Cloud: Spanning multiple Kubernetes clusters across different regions and Cloud Service Providers.
Setting Up a Replica Cluster
To set up a replica cluster from a source cluster, follow these steps to create a cluster YAML file and configure it accordingly:
- Define External Clusters:
  - In the externalClusters section, specify the replica cluster.
  - For a distributed PostgreSQL topology aimed at disaster recovery (DR) and high availability (HA), this section should be defined for every PostgreSQL cluster in the distributed database.
- Bootstrap the Replica Cluster:
  - Streaming Bootstrap: Use the pg_basebackup section for bootstrapping via streaming replication.
  - Snapshot/Object Store Bootstrap: Use the recovery section to bootstrap from a volume snapshot or an object store.
- Continuous Recovery Strategy: Define this in the .spec.replica stanza:
  - Distributed Topology: Configure using the primary, source, and self fields along with the distributed topology defined in externalClusters. This allows CloudNativePG to declaratively control the demotion of a primary cluster and the subsequent promotion of a replica cluster using a promotion token.
  - Standalone Replica Cluster: Enable continuous recovery using the enabled option and set the source field to point to an externalClusters name. This configuration is suitable for creating replicas primarily intended for read-only workloads.
Both the Distributed Topology and the Standalone Replica Cluster strategies for continuous recovery are thoroughly explained below.
Distributed Topology
Important
The Distributed Topology strategy was introduced in CloudNativePG 1.24.
Planning for a Distributed PostgreSQL Database
As Dwight Eisenhower famously said, "Planning is everything", and this holds true for designing PostgreSQL architectures in Kubernetes.
First, conceptualize your distributed topology on paper, and then translate it into a CloudNativePG API configuration. This configuration primarily involves:
- The externalClusters section, which must be included in every Cluster definition within your distributed PostgreSQL setup.
- The .spec.replica stanza, specifically the primary, source, and (optionally) self fields.
For example, suppose you want to deploy a PostgreSQL cluster distributed across two Kubernetes clusters located in Southern Europe and Central Europe.
In this scenario, assume you have CloudNativePG installed in the Southern
Europe Kubernetes cluster, with a PostgreSQL Cluster
named cluster-eu-south
acting as the primary. This cluster has continuous backup configured with a
local object store. This object store is also accessible by the PostgreSQL
Cluster
named cluster-eu-central
, installed in the Central European
Kubernetes cluster. Initially, cluster-eu-central
functions as a replica
cluster. Following a symmetric approach, it also has a local object store for
continuous backup, which needs to be read by cluster-eu-south
. The recovery
in this setup relies solely on WAL shipping, with no streaming connection
between the two clusters.
Here’s how you would configure the externalClusters
section for both
Cluster
resources:
# Distributed topology configuration
externalClusters:
- name: cluster-eu-south
barmanObjectStore:
destinationPath: s3://cluster-eu-south/
# Additional configuration
- name: cluster-eu-central
barmanObjectStore:
destinationPath: s3://cluster-eu-central/
# Additional configuration
The .spec.replica
stanza for the cluster-eu-south
PostgreSQL primary
Cluster
should be configured as follows:
replica:
primary: cluster-eu-south
source: cluster-eu-central
Meanwhile, the .spec.replica
stanza for the cluster-eu-central
PostgreSQL
replica Cluster
should be configured as:
replica:
primary: cluster-eu-south
source: cluster-eu-south
In this configuration, when the primary
field matches the name of the
Cluster
resource (or .spec.replica.self
if a different one is used), the
current cluster is considered the primary in the distributed topology.
Otherwise, it is set as a replica from the source
(in this case, using the
Barman object store).
This setup allows you to efficiently manage a distributed PostgreSQL architecture across multiple Kubernetes clusters, ensuring both high availability and disaster recovery through controlled switchover of a primary PostgreSQL cluster using declarative configuration.
Controlled switchover in a distributed topology is a two-step process involving:
- Demotion of a primary cluster to a replica cluster
- Promotion of a replica cluster to a primary cluster
These processes are described in the next sections.
Important
Before you proceed, ensure you review the "About PostgreSQL Roles" section
above and use identical role definitions, including secrets, in all
Cluster
objects participating in the distributed topology.
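For example, one way to copy the application user secret from cluster-eu-south to the Kubernetes cluster hosting cluster-eu-central is sketched below. The kubectl context names are hypothetical, jq is assumed to be available, and the same namespace is assumed on both sides; adjust the secret name if the target Cluster expects a different one:
kubectl get secret cluster-eu-south-app --context eu-south -o json |
  jq 'del(.metadata.resourceVersion, .metadata.uid, .metadata.creationTimestamp,
          .metadata.ownerReferences, .metadata.managedFields)' |
  kubectl apply --context eu-central -f -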
Demoting a Primary to a Replica Cluster
CloudNativePG provides the functionality to demote a primary cluster to a
replica cluster. This action is typically planned when transitioning the
primary role from one data center to another. The process involves demoting the
current primary cluster (e.g., cluster-eu-south
) to a replica cluster and
subsequently promoting the designated replica cluster (e.g.,
cluster-eu-central
) to primary when fully synchronized.
Provided you have defined an external cluster in the current primary Cluster
resource that points to the replica cluster that's been selected to become the
new primary, all you need to do is change the primary
field as follows:
replica:
primary: cluster-eu-central
source: cluster-eu-central
When the primary PostgreSQL cluster is demoted, write operations are no longer possible. CloudNativePG then:
- Archives the WAL file containing the shutdown checkpoint as a .partial file in the WAL archive.
- Generates a demotionToken in the status, a base64-encoded JSON structure containing relevant information from pg_controldata, such as the system identifier, the timestamp, timeline ID, REDO location, and REDO WAL file of the latest checkpoint.
The first step is necessary when demotion and promotion rely solely on the WAL archive (without streaming replication) to feed the continuous recovery process.
The second step, generation of the .status.demotionToken
, will ensure a
smooth demotion/promotion process, without any data loss and without rebuilding
the former primary.
At this stage, the former primary has transitioned to a replica cluster,
awaiting WAL data from the new global primary: cluster-eu-central
.
To proceed with promoting the other cluster, you need to retrieve the
demotionToken
from cluster-eu-south
using the following command:
kubectl get cluster cluster-eu-south \
-o jsonpath='{.status.demotionToken}'
Alternatively, you can obtain the demotionToken using the cnpg plugin by checking the cluster's status: the token is listed under the Demotion token section.
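Since the token is a base64-encoded JSON document, you can optionally inspect its content, for example:
kubectl get cluster cluster-eu-south \
  -o jsonpath='{.status.demotionToken}' | base64 -d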
Note
The demotionToken
obtained from cluster-eu-south
will serve as the
promotionToken
for cluster-eu-central
.
You can verify the role change using the cnpg
plugin, checking the status of
the cluster:
kubectl cnpg status cluster-eu-south
Promoting a Replica to a Primary Cluster
To promote a PostgreSQL replica cluster (e.g., cluster-eu-central
) to a
primary cluster and make the designated primary an actual primary instance,
you need to apply the following changes simultaneously:
- Set the .spec.replica.primary to the name of the current replica cluster to be promoted (e.g., cluster-eu-central).
- Set the .spec.replica.promotionToken with the value obtained from the former primary cluster (refer to "Demoting a Primary to a Replica Cluster").
The updated replica
section in cluster-eu-central
's spec should look like
this:
replica:
primary: cluster-eu-central
promotionToken: <PROMOTION_TOKEN>
source: cluster-eu-south
Warning
It is crucial to apply the changes to the primary
and promotionToken
fields simultaneously. If the promotion token is omitted, a failover will be
triggered, necessitating a rebuild of the former primary.
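One way to update both fields atomically is with a single merge patch against cluster-eu-central, as sketched below; replace the placeholder with the token retrieved from the former primary:
# Run against the Kubernetes cluster hosting cluster-eu-central
kubectl patch cluster cluster-eu-central --type merge -p \
  '{"spec":{"replica":{"primary":"cluster-eu-central","promotionToken":"<PROMOTION_TOKEN>","source":"cluster-eu-south"}}}'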
After making these adjustments, CloudNativePG will initiate the promotion of
the replica cluster to a primary cluster. Initially, CloudNativePG will wait
for the designated primary cluster to replicate all Write-Ahead Logging (WAL)
information up to the specified Log Sequence Number (LSN) contained in the
token. Once this target is achieved, the promotion process will commence. The
new primary cluster will switch timelines, archive the history file and new
WAL, thereby unblocking the replication process in the cluster-eu-south
cluster, which will then operate as a replica.
To verify the role change, use the cnpg
plugin to check the status of the
cluster:
kubectl cnpg status cluster-eu-central
This command will provide you with the current status of cluster-eu-central
,
confirming its promotion to primary.
By following these steps, you ensure a smooth and controlled promotion process, minimizing disruption and maintaining data integrity across your PostgreSQL clusters.
Standalone Replica Clusters
Important
Standalone Replica Clusters were previously known as Replica Clusters before the introduction of the Distributed Topology strategy in CloudNativePG 1.24.
In CloudNativePG, a Standalone Replica Cluster is a PostgreSQL cluster in continuous recovery with the following configurations:
- .spec.replica.enabled set to true
- A physical replication source defined via the .spec.replica.source field, pointing to an externalClusters name
When .spec.replica.enabled
is set to false
, the replica cluster exits
continuous recovery mode and becomes a primary cluster, completely detached
from the original source.
Warning
Disabling replication is an irreversible operation. Once replication is disabled and the designated primary is promoted to primary, the replica cluster and the source cluster become two independent clusters definitively.
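For example, a minimal sketch of detaching a standalone replica cluster by disabling replication (the cluster name is hypothetical):
# Irreversible: the replica cluster exits continuous recovery and its
# designated primary is promoted to primary
kubectl patch cluster cluster-replica-example --type merge \
  -p '{"spec":{"replica":{"enabled":false}}}'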
Important
Standalone replica clusters are suitable for several use cases, primarily involving read-only workloads. If you are planning to set up a disaster recovery solution, look into the "Distributed Topology" strategy above.
Main Differences with Distributed Topology
Although Standalone Replica Clusters can be used for disaster recovery purposes, they differ from the "Distributed Topology" strategy in several key ways:
- Lack of Distributed Database Concept: Standalone Replica Clusters do not support the concept of a distributed database, whether in simple forms (two clusters) or more complex configurations (e.g., three clusters in a circular topology).
- No Global Primary Cluster: There is no notion of a global primary cluster in Standalone Replica Clusters.
- No Controlled Switchover: A Standalone Replica Cluster can only be promoted to primary. The former primary cluster must be re-cloned, as controlled switchover is not possible.
Failover is identical in both strategies, requiring the former primary to be re-cloned if it ever comes back up.
Example of Standalone Replica Cluster using pg_basebackup
This first example defines a standalone replica cluster using streaming replication in both bootstrap and continuous recovery. The replica cluster connects to the source cluster using TLS authentication.
You can check the sample YAML
in the samples/
subdirectory.
Note the bootstrap
and replica
sections pointing to the source cluster.
bootstrap:
pg_basebackup:
source: cluster-example
replica:
enabled: true
source: cluster-example
The previous configuration assumes that the application database and its owning
user are set to the default, app
. If the PostgreSQL cluster being restored
uses different names, you must specify them as documented in Configure the application database.
You should also consider copying over the application user secret from
the original cluster and keeping it synchronized with the source.
See "About PostgreSQL Roles" for more details.
In the externalClusters
section, remember to use the right namespace for the
host in the connectionParameters
sub-section.
The -replication
and -ca
secrets should have been copied over if necessary,
in case the replica cluster is in a separate namespace.
externalClusters:
- name: <MAIN-CLUSTER>
connectionParameters:
host: <MAIN-CLUSTER>-rw.<NAMESPACE>.svc
user: streaming_replica
sslmode: verify-full
dbname: postgres
sslKey:
name: <MAIN-CLUSTER>-replication
key: tls.key
sslCert:
name: <MAIN-CLUSTER>-replication
key: tls.crt
sslRootCert:
name: <MAIN-CLUSTER>-ca
key: ca.crt
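As mentioned above, the -replication and -ca secrets must exist in the replica cluster's namespace. A minimal sketch of copying them is shown below; <SOURCE-NAMESPACE> is a hypothetical placeholder for the source cluster's namespace, and jq is assumed to be available:
for s in <MAIN-CLUSTER>-replication <MAIN-CLUSTER>-ca; do
  kubectl get secret "$s" -n <SOURCE-NAMESPACE> -o json |
    jq 'del(.metadata.namespace, .metadata.resourceVersion, .metadata.uid,
            .metadata.creationTimestamp, .metadata.ownerReferences,
            .metadata.managedFields)' |
    kubectl apply -n <NAMESPACE> -f -
done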
Example of Standalone Replica Cluster from an object store
The second example defines a replica cluster that bootstraps from an object
store using the recovery
section and continuous recovery using both streaming
replication and the given object store. For streaming replication, the replica
cluster connects to the source cluster using basic authentication.
You can check the sample YAML
for it in the samples/
subdirectory.
Note the bootstrap
and replica
sections pointing to the source cluster.
bootstrap:
recovery:
source: cluster-example
replica:
enabled: true
source: cluster-example
The previous configuration assumes that the application database and its owning
user are set to the default, app
. If the PostgreSQL cluster being restored
uses different names, you must specify them as documented in Configure the application database.
You should also consider copying over the application user secret from
the original cluster and keeping it synchronized with the source.
See "About PostgreSQL Roles" for more details.
In the externalClusters
section, take care to use the right namespace in the
endpointURL
and the connectionParameters.host
.
Also make sure that any necessary secrets have been copied over, and that
a backup of the source cluster has already been created.
externalClusters:
- name: <MAIN-CLUSTER>
barmanObjectStore:
destinationPath: s3://backups/
endpointURL: http://minio:9000
s3Credentials:
…
connectionParameters:
host: <MAIN-CLUSTER>-rw.default.svc
user: postgres
dbname: postgres
password:
name: <MAIN-CLUSTER>-superuser
key: password
Note
To use streaming replication between the source cluster and the replica cluster, you need to make sure there is network connectivity between the two clusters, and that all the necessary secrets holding passwords or certificates are created in advance.
Example using a Volume Snapshot
If you use volume snapshots and your storage class supports cross-cluster availability of snapshots, you can leverage that to bootstrap a replica cluster from a volume snapshot of the source cluster.
The third example defines a replica cluster that bootstraps
from a volume snapshot using the recovery
section. It uses
streaming replication (via basic authentication) and the object
store to fetch the WAL files.
You can check the sample YAML
for it in the samples/
subdirectory.
The example assumes that the application database and its owning
user are set to the default, app
. If the PostgreSQL cluster being restored
uses different names, you must specify them as documented in Configure the
application database.
You should also consider copying over the application user secret from
the original cluster and keeping it synchronized with the source.
See "About PostgreSQL Roles" for more details.
Delayed replicas
CloudNativePG supports the creation of delayed replicas through the
.spec.replica.minApplyDelay option, leveraging PostgreSQL's
recovery_min_apply_delay.
Delayed replicas are designed to intentionally lag behind the primary database
by a specified amount of time. This delay is configurable using the
.spec.replica.minApplyDelay
option, which maps to the underlying
recovery_min_apply_delay
parameter in PostgreSQL.
The primary objective of delayed replicas is to mitigate the impact of
unintended SQL statement executions on the primary database. This is especially
useful in scenarios where operations such as UPDATE
or DELETE
are performed
without a proper WHERE
clause.
To configure a delay in a replica cluster, adjust the
.spec.replica.minApplyDelay
option. This parameter determines how much time
the replicas will lag behind the primary. For example:
# ...
replica:
enabled: true
source: cluster-example
# Enforce a delay of 8 hours
minApplyDelay: '8h'
# ...
The above example helps safeguard against accidental data modifications by providing a buffer period of 8 hours to detect and correct issues before they propagate to the replicas.
Monitor and adjust the delay as needed based on your recovery time objectives and the potential impact of unintended primary database operations.
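As a quick sanity check of the actual delay, you can measure the time since the last replayed transaction on the designated primary of the replica cluster. The sketch below assumes a hypothetical pod name for that instance:
kubectl exec -ti cluster-delayed-1 -c postgres -- \
  psql -U postgres -c \
  "SELECT now() - pg_last_xact_replay_timestamp() AS replay_delay"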
The main use cases of delayed replicas can be summarized as:
- Mitigating human errors: reduce the risk of data corruption or loss resulting from unintentional SQL operations on the primary database.
- Recovery time optimization: facilitate quicker recovery from unintended changes by having a delayed replica that allows you to identify and rectify issues before changes are applied to other replicas.
- Enhanced data protection: safeguard critical data by introducing a time buffer that provides an opportunity to intervene and prevent the propagation of undesirable changes.
Warning
The minApplyDelay
option of delayed replicas cannot be used in
conjunction with promotionToken
.
By integrating delayed replicas into your replication strategy, you can enhance the resilience and data protection capabilities of your PostgreSQL environment. Adjust the delay duration based on your specific needs and the criticality of your data.
Important
Always measure your goals. Depending on your environment, it might be more efficient to rely on volume snapshot-based recovery for faster outcomes. Evaluate and choose the approach that best aligns with your unique requirements and infrastructure.