Scheduling

Scheduling, in Kubernetes, is the process responsible for placing a new pod on the best node possible, based on several criteria.

Kubernetes documentation

Please refer to the Kubernetes documentation for more information on scheduling, including all the available policies. On this page we assume you are familiar with concepts like affinity, anti-affinity, node selectors, and so on.

You can control how the CloudNativePG cluster's instances should be scheduled through the affinity section in the definition of the cluster, which supports:

  • pod affinity/anti-affinity
  • node selectors
  • tolerations

Info

CloudNativePG does not support pod templates for finer control on the scheduling of workloads. While they were part of the initial concept, the development team decided to postpone their introduction in a newer version of the API (most likely v2 of CNPG).

Pod affinity and anti-affinity

Kubernetes allows you to control which nodes a pod should (affinity) or should not (anti-affinity) be scheduled, based on the actual workloads already running in those nodes. This is technically known as inter-pod affinity/anti-affinity.

CloudNativePG by default will configure the cluster's instances preferably on different nodes, resulting in the following affinity definition:

affinity:
  podAntiAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
      - podAffinityTerm:
          labelSelector:
            matchExpressions:
              - key: postgresql
                operator: In
                values:
                  - cluster-example
          topologyKey: kubernetes.io/hostname
        weight: 100

As a result of the following Cluster spec:

apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: cluster-example
spec:
  instances: 3
  imageName: ghcr.io/cloudnative-pg/postgresql:15.4

  affinity:
    enablePodAntiAffinity: true #default value
    topologyKey: kubernetes.io/hostname #defaul value
    podAntiAffinityType: preferred #default value

  storage:
    size: 1Gi

Therefore, Kubernetes will prefer to schedule a 3-node PostgreSQL cluster over 3 different nodes - resources permitting.

The aforementioned default behavior can be changed by tweaking the above settings.

podAntiAffinityType can be set to required: resulting in requiredDuringSchedulingIgnoredDuringExecution being used instead of preferredDuringSchedulingIgnoredDuringExecution. Please, be aware that such a strong requirement might result in pending instances in case resources are not available (which is an expected condition when using Cluster Autoscaler for automated horizontal scaling of a Kubernetes cluster).

Inter-pod affinity and anti-affinity

More information on this topic is in the Kubernetes documentation.

Another possible value for topologyKey in a cloud environment can be topology.kubernetes.io/zone, to be sure pods will be spread across availability zones and not just nodes. Please refer to "Well-Known Labels, Annotations and Taints" for more options.

You can disable the operator's generated anti-affinity policies by setting enablePodAntiAffinity to false.

Additionally, in case a more fine-grained control is needed, you can specify a list of custom pod affinity or anti-affinity rules via the additionalPodAffinity and additionalPodAntiAffinity configuration attributes. These rules will be added to the ones generated by the operator, if enabled, or passed transparently otherwise.

Note

You have to pass to additionalPodAntiAffinity or additionalPodAffinity the whole content of podAntiAffinity or podAffinity that is expected by the Pod spec (please look at the following YAML as an example of having only one instance of PostgreSQL running on every worker node, regardless of which PostgreSQL cluster they belong to).

    additionalPodAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
          - key: postgresql
            operator: Exists
            values: []
        topologyKey: "kubernetes.io/hostname"

Node selection through nodeSelector

Kubernetes allows nodeSelector to provide a list of labels (defined as key-value pairs) to select the nodes on which a pod can run. Specifically, the node must have each indicated key-value pair as labels for the pod to be scheduled and run.

Similarly, CloudNativePG consents you to define a nodeSelector in the affinity section, so that you can request a PostgreSQL cluster to run only on nodes that have those labels.

Tolerations

Kubernetes allows you to specify (through taints) whether a node should repel all pods not explicitly tolerating (through tolerations) their taints.

So, by setting a proper set of tolerations for a workload matching a specific node's taints, Kubernetes scheduler will now take into consideration the tainted node, while deciding on which node to schedule the workload. Tolerations can be configured for all the pods of a Cluster through the .spec.affinity.tolerations section, which accepts the usual Kubernetes syntax for tolerations.

Taints and Tolerations

More information on taints and tolerations can be found in the Kubernetes documentation.