Scheduling
Scheduling, in Kubernetes, is the process responsible for placing a new pod on the best node possible, based on several criteria.
Kubernetes documentation
Please refer to the Kubernetes documentation for more information on scheduling, including all the available policies. On this page we assume you are familiar with concepts like affinity, anti-affinity, node selectors, and so on.
You can control how the CloudNativePG cluster's instances should be scheduled through the affinity section in the definition of the cluster, which supports:
- pod affinity/anti-affinity
- node selectors
- tolerations
Pod Affinity and Anti-Affinity
Kubernetes provides mechanisms to control where pods are scheduled using affinity and anti-affinity rules. These rules let you specify whether a pod should be scheduled on particular nodes (affinity) or kept away from specific nodes (anti-affinity), based on the workloads already running there. This capability is technically referred to as inter-pod affinity/anti-affinity.
By default, CloudNativePG configures cluster instances to preferably be scheduled on different nodes, while pgBouncer instances might still run on the same nodes.
For example, given the following Cluster specification:
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: cluster-example
spec:
  instances: 3
  imageName: ghcr.io/cloudnative-pg/postgresql:16.4
  affinity:
    enablePodAntiAffinity: true # Default value
    topologyKey: kubernetes.io/hostname # Default value
    podAntiAffinityType: preferred # Default value
  storage:
    size: 1Gi
The affinity configuration applied in the instance pods will be:
affinity:
  podAntiAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
      - podAffinityTerm:
          labelSelector:
            matchExpressions:
              - key: cnpg.io/cluster
                operator: In
                values:
                  - cluster-example
              - key: cnpg.io/podRole
                operator: In
                values:
                  - instance
          topologyKey: kubernetes.io/hostname
        weight: 100
With this setup, Kubernetes will prefer to schedule the three instances of the PostgreSQL cluster on three different nodes, assuming sufficient resources are available.
Requiring Pod Anti-Affinity
You can modify the default behavior by adjusting the settings mentioned above.
For example, setting podAntiAffinityType to required will enforce requiredDuringSchedulingIgnoredDuringExecution instead of preferredDuringSchedulingIgnoredDuringExecution.
However, be aware that this strict requirement may cause pods to remain pending if resources are insufficient. This is particularly relevant when using Cluster Autoscaler for automated horizontal scaling in a Kubernetes cluster.
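As a minimal sketch, the manifest below reuses the cluster-example specification shown earlier on this page and changes only the anti-affinity type to required. The image tag and storage size are carried over from that example for illustration and are not recommendations.

```yaml
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: cluster-example
spec:
  instances: 3
  imageName: ghcr.io/cloudnative-pg/postgresql:16.4
  affinity:
    enablePodAntiAffinity: true
    topologyKey: kubernetes.io/hostname
    # Instances of this cluster must land on distinct nodes;
    # a pod stays Pending if no such node is available.
    podAntiAffinityType: required
  storage:
    size: 1Gi
```

With three instances and a hostname topology key, this configuration needs at least three schedulable nodes before all pods can start.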
Inter-pod Affinity and Anti-Affinity
For more details, refer to the Kubernetes documentation.
Topology Considerations
In cloud environments, you might consider using topology.kubernetes.io/zone as the topologyKey to ensure pods are distributed across different availability zones rather than just nodes. For more options, see Well-Known Labels, Annotations, and Taints.
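The zone-based variant could look like the following fragment of a Cluster specification. It is a sketch that assumes your nodes carry the topology.kubernetes.io/zone label, which cloud providers typically set but which may be missing on self-managed nodes. All other fields of the Cluster are omitted.

```yaml
# Fragment of a Cluster spec (other fields omitted)
spec:
  affinity:
    enablePodAntiAffinity: true
    # Spread instances across availability zones instead of individual nodes
    topologyKey: topology.kubernetes.io/zone
    podAntiAffinityType: preferred
```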
Disabling Anti-Affinity Policies
If needed, you can disable the operator-generated anti-affinity policies by setting enablePodAntiAffinity to false.
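In a Cluster manifest, this setting amounts to a single field under the affinity section, as in this fragment (other fields omitted):

```yaml
# Fragment of a Cluster spec (other fields omitted)
spec:
  affinity:
    # No anti-affinity rules are generated by the operator
    enablePodAntiAffinity: false
```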
Fine-Grained Control with Custom Rules
For scenarios requiring more precise control, you can specify custom pod affinity or anti-affinity rules using the additionalPodAffinity and additionalPodAntiAffinity configuration attributes. These custom rules will be added to those generated by the operator, if enabled, or used directly if the operator-generated rules are disabled.
Note
When using additionalPodAntiAffinity or additionalPodAffinity, you must provide the full podAntiAffinity or podAffinity structure expected by the Pod specification. The following YAML example demonstrates how to configure only one instance of PostgreSQL per worker node, regardless of which PostgreSQL cluster it belongs to:
additionalPodAntiAffinity:
  requiredDuringSchedulingIgnoredDuringExecution:
  - labelSelector:
      matchExpressions:
      - key: postgresql
        operator: Exists
        values: []
    topologyKey: "kubernetes.io/hostname"
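For the affinity counterpart, a sketch along the same lines might ask the scheduler to prefer nodes that already run pods of a given application. The app: my-app label and the weight of 50 are hypothetical values chosen for illustration; the structure is the standard Kubernetes podAffinity one.

```yaml
# Fragment of a Cluster spec (other fields omitted)
spec:
  affinity:
    additionalPodAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
        - weight: 50 # illustrative weight (valid range is 1-100)
          podAffinityTerm:
            labelSelector:
              matchLabels:
                app: my-app # hypothetical label of the co-located workload
            topologyKey: kubernetes.io/hostname
```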
Node selection through nodeSelector
Kubernetes lets you use nodeSelector to provide a list of labels (defined as key-value pairs) that select the nodes on which a pod can run. Specifically, a node must carry each indicated key-value pair as a label for the pod to be scheduled and run on it.
Similarly, CloudNativePG allows you to define a nodeSelector in the affinity section, so that you can request a PostgreSQL cluster to run only on nodes that have those labels.
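As an illustrative fragment, the following restricts the instances of a cluster to nodes labeled disktype=ssd. The label is a hypothetical example; replace it with a label that actually exists on your nodes.

```yaml
# Fragment of a Cluster spec (other fields omitted)
spec:
  affinity:
    nodeSelector:
      disktype: ssd # hypothetical node label
```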
Tolerations
Kubernetes allows you to specify (through taints) whether a node should repel all pods that do not explicitly tolerate (through tolerations) its taints.
By setting tolerations on a workload that match a specific node's taints, you allow the Kubernetes scheduler to consider the tainted node when deciding where to schedule the workload.
Tolerations can be configured for all the pods of a Cluster through the .spec.affinity.tolerations section, which accepts the usual Kubernetes syntax for tolerations.
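A minimal sketch of such a section follows. It assumes a node tainted with the hypothetical key-value pair dedicated=postgres and the NoSchedule effect; adapt the key, value, and effect to the taints defined in your environment.

```yaml
# Fragment of a Cluster spec (other fields omitted)
spec:
  affinity:
    tolerations:
      - key: dedicated # hypothetical taint key
        operator: Equal
        value: postgres # hypothetical taint value
        effect: NoSchedule
```

Note that a toleration only permits scheduling on the tainted node; it does not by itself attract pods to it.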
Taints and Tolerations
More information on taints and tolerations can be found in the Kubernetes documentation.
Isolating PostgreSQL workloads
Important
Before proceeding, please ensure you have read the "Architecture" section of the documentation.
While you can deploy PostgreSQL on Kubernetes in various ways, we recommend following these essential principles for production environments:
- Exploit Availability Zones: If possible, take advantage of availability zones (AZs) within the same Kubernetes cluster by distributing PostgreSQL instances across different AZs.
- Dedicate Worker Nodes: Allocate specific worker nodes for PostgreSQL workloads through the node-role.kubernetes.io/postgres label and taint, as detailed in the Reserving Nodes for PostgreSQL Workloads section.
- Avoid Node Overlap: Ensure that no instances from the same PostgreSQL cluster are running on the same node.
As explained in greater detail in the previous sections, CloudNativePG provides the flexibility to configure pod anti-affinity, node selectors, and tolerations.
Below is a sample configuration to ensure that a PostgreSQL Cluster is deployed on postgres nodes, with its instances distributed across different nodes:
# <snip>
  affinity:
    enablePodAntiAffinity: true
    topologyKey: kubernetes.io/hostname
    podAntiAffinityType: required
    nodeSelector:
      node-role.kubernetes.io/postgres: ""
    tolerations:
    - key: node-role.kubernetes.io/postgres
      operator: Exists
      effect: NoSchedule
# <snip>
Despite its simplicity, this setup distributes and isolates PostgreSQL workloads effectively, which benefits performance and reliability in your production environment.