Scheduling

Scheduling, in Kubernetes, is the process responsible for placing a new pod on the best node possible, based on several criteria.

Kubernetes documentation

Please refer to the Kubernetes documentation for more information on scheduling, including all the available policies. On this page we assume you are familiar with concepts like affinity, anti-affinity, node selectors, and so on.

You can control how the instances of a CloudNativePG cluster are scheduled through the affinity section of the cluster definition, which supports:

  • pod affinity/anti-affinity
  • node selectors
  • tolerations

Pod Affinity and Anti-Affinity

Kubernetes provides mechanisms to control where pods are scheduled using affinity and anti-affinity rules. These rules allow you to specify whether a pod should be scheduled on particular nodes (affinity) or avoided on specific nodes (anti-affinity) based on the workloads already running there. This capability is technically referred to as inter-pod affinity/anti-affinity.

By default, CloudNativePG prefers to schedule cluster instances on different nodes, while pgBouncer instances may still run on the same nodes.

For example, given the following Cluster specification:

apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: cluster-example
spec:
  instances: 3
  imageName: ghcr.io/cloudnative-pg/postgresql:16.4

  affinity:
    enablePodAntiAffinity: true # Default value
    topologyKey: kubernetes.io/hostname # Default value
    podAntiAffinityType: preferred # Default value

  storage:
    size: 1Gi

The resulting affinity configuration applied to the instance pods will be:

affinity:
  podAntiAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
      - podAffinityTerm:
          labelSelector:
            matchExpressions:
              - key: cnpg.io/cluster
                operator: In
                values:
                  - cluster-example
              - key: cnpg.io/podRole
                operator: In
                values:
                  - instance
          topologyKey: kubernetes.io/hostname
        weight: 100

With this setup, Kubernetes will prefer to schedule a 3-node PostgreSQL cluster across three different nodes, assuming sufficient resources are available.

Requiring Pod Anti-Affinity

You can modify the default behavior by adjusting the settings mentioned above.

For example, setting podAntiAffinityType to required will enforce requiredDuringSchedulingIgnoredDuringExecution instead of preferredDuringSchedulingIgnoredDuringExecution.

However, be aware that this strict requirement may cause pods to remain pending if resources are insufficient—this is particularly relevant when using Cluster Autoscaler for automated horizontal scaling in a Kubernetes cluster.
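For reference, here is a sketch of the earlier affinity section with strict anti-affinity enforced; only podAntiAffinityType changes from its default:

  # <snip>
  affinity:
    enablePodAntiAffinity: true # Default value
    topologyKey: kubernetes.io/hostname # Default value
    podAntiAffinityType: required # Enforces requiredDuringSchedulingIgnoredDuringExecution
  # <snip>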

Inter-pod Affinity and Anti-Affinity

For more details, refer to the Kubernetes documentation.

Topology Considerations

In cloud environments, you might consider using topology.kubernetes.io/zone as the topologyKey to ensure pods are distributed across different availability zones rather than just nodes. For more options, see Well-Known Labels, Annotations, and Taints.
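For instance, assuming your nodes expose the standard topology.kubernetes.io/zone label, a sketch of an affinity section that spreads instances across availability zones could look like:

  # <snip>
  affinity:
    enablePodAntiAffinity: true
    topologyKey: topology.kubernetes.io/zone # Spread instances across availability zones
    podAntiAffinityType: preferred
  # <snip>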

Disabling Anti-Affinity Policies

If needed, you can disable the operator-generated anti-affinity policies by setting enablePodAntiAffinity to false.
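For example, a cluster that opts out of the operator-generated anti-affinity rules would set:

  # <snip>
  affinity:
    enablePodAntiAffinity: false # Disable operator-generated anti-affinity rules
  # <snip>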

Fine-Grained Control with Custom Rules

For scenarios requiring more precise control, you can specify custom pod affinity or anti-affinity rules using the additionalPodAffinity and additionalPodAntiAffinity configuration attributes. These custom rules will be added to those generated by the operator, if enabled, or used directly if the operator-generated rules are disabled.

Note

When using additionalPodAntiAffinity or additionalPodAffinity, you must provide the full podAntiAffinity or podAffinity structure expected by the Pod specification. The following YAML example demonstrates how to configure only one instance of PostgreSQL per worker node, regardless of which PostgreSQL cluster it belongs to:

    additionalPodAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
          - key: postgresql
            operator: Exists
            values: []
        topologyKey: "kubernetes.io/hostname"

Node selection through nodeSelector

Kubernetes allows you to use nodeSelector to provide a set of labels (key-value pairs) that select the nodes on which a pod can run: a node must carry every indicated key-value pair as a label for the pod to be scheduled and run there.

Similarly, CloudNativePG allows you to define a nodeSelector in the affinity section, so that you can request a PostgreSQL cluster to run only on nodes that have those labels.
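As an illustration, assuming your PostgreSQL-ready worker nodes carry a hypothetical disktype: ssd label, the following affinity section would restrict the cluster's pods to those nodes:

  # <snip>
  affinity:
    nodeSelector:
      # "disktype: ssd" is a hypothetical node label used for illustration
      disktype: ssd
  # <snip>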

Tolerations

Kubernetes allows you to mark a node with taints so that it repels all pods that do not explicitly tolerate those taints (through tolerations).

By setting tolerations on a workload that match a specific node's taints, the Kubernetes scheduler will also consider the tainted node when deciding where to schedule that workload. Tolerations can be configured for all the pods of a Cluster through the .spec.affinity.tolerations section, which accepts the standard Kubernetes syntax for tolerations.
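For example, assuming some nodes have been tainted with a hypothetical dedicated=postgres:NoSchedule taint, a sketch of a matching tolerations section could be:

  # <snip>
  affinity:
    tolerations:
    # "dedicated=postgres:NoSchedule" is a hypothetical taint used for illustration
    - key: dedicated
      operator: Equal
      value: postgres
      effect: NoSchedule
  # <snip>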

Taints and Tolerations

More information on taints and tolerations can be found in the Kubernetes documentation.

Isolating PostgreSQL workloads

Important

Before proceeding, please ensure you have read the "Architecture" section of the documentation.

While you can deploy PostgreSQL on Kubernetes in various ways, we recommend following these essential principles for production environments:

  • Exploit Availability Zones: If possible, take advantage of availability zones (AZs) within the same Kubernetes cluster by distributing PostgreSQL instances across different AZs.
  • Dedicate Worker Nodes: Allocate specific worker nodes for PostgreSQL workloads through the node-role.kubernetes.io/postgres label and taint, as detailed in the Reserving Nodes for PostgreSQL Workloads section.
  • Avoid Node Overlap: Ensure that no instances from the same PostgreSQL cluster are running on the same node.

As explained in greater detail in the previous sections, CloudNativePG provides the flexibility to configure pod anti-affinity, node selectors, and tolerations.

Below is a sample configuration to ensure that a PostgreSQL Cluster is deployed on postgres nodes, with its instances distributed across different nodes:

  # <snip>
  affinity:
    enablePodAntiAffinity: true
    topologyKey: kubernetes.io/hostname
    podAntiAffinityType: required
    nodeSelector:
      node-role.kubernetes.io/postgres: ""
    tolerations:
    - key: node-role.kubernetes.io/postgres
      operator: Exists
      effect: NoSchedule
  # <snip>

Despite its simplicity, this setup ensures optimal distribution and isolation of PostgreSQL workloads, leading to enhanced performance and reliability in your production environment.