Scheduling, in Kubernetes, is the process responsible for placing a new pod on the best node possible, based on several criteria.
Please refer to the Kubernetes documentation for more information on scheduling, including all the available policies. On this page we assume you are familiar with concepts like affinity, anti-affinity, node selectors, and so on.
You can control how the CloudNativePG cluster's instances should be
scheduled through the `affinity`
section in the definition of the cluster, which supports:
- pod affinity/anti-affinity
- node selectors
- tolerations
CloudNativePG does not support pod templates for finer control over the scheduling of workloads. While they were part of the initial concept, the development team decided to postpone their introduction to a newer version of the API (most likely v2 of CNPG).
Pod affinity and anti-affinity
Kubernetes allows you to control on which nodes a pod should (affinity) or should not (anti-affinity) be scheduled, based on the workloads already running on those nodes. This is technically known as inter-pod affinity/anti-affinity.
CloudNativePG by default will configure the cluster's instances
preferably on different nodes, resulting in the following affinity definition:

```yaml
affinity:
  podAntiAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
    - podAffinityTerm:
        labelSelector:
          matchExpressions:
          - key: postgresql
            operator: In
            values:
            - cluster-example
        topologyKey: kubernetes.io/hostname
      weight: 100
```
As a result of the following Cluster spec:
```yaml
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: cluster-example
spec:
  instances: 3
  imageName: ghcr.io/cloudnative-pg/postgresql:15.3

  affinity:
    enablePodAntiAffinity: true # default value
    topologyKey: kubernetes.io/hostname # default value
    podAntiAffinityType: preferred # default value

  storage:
    size: 1Gi
```
Therefore, Kubernetes will prefer to schedule a 3-node PostgreSQL cluster over 3 different nodes - resources permitting.
The aforementioned default behavior can be changed by tweaking the above settings.
`podAntiAffinityType` can be set to `required`, resulting in
`requiredDuringSchedulingIgnoredDuringExecution` being used instead of
`preferredDuringSchedulingIgnoredDuringExecution`. Be aware that such a
strong requirement might result in pending instances if resources are not
available (which is an expected condition when using the Cluster Autoscaler
for automated horizontal scaling of a Kubernetes cluster).
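For instance, a minimal sketch of a Cluster that requires anti-affinity across nodes (the cluster name below is illustrative) could look like this:

```yaml
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: cluster-example-required # illustrative name
spec:
  instances: 3
  affinity:
    enablePodAntiAffinity: true
    topologyKey: kubernetes.io/hostname
    podAntiAffinityType: required # instances stay Pending if no suitable node is available
  storage:
    size: 1Gi
```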
Inter-pod affinity and anti-affinity
More information on this topic is in the Kubernetes documentation.
Another possible value for
`topologyKey` in a cloud environment is
`topology.kubernetes.io/zone`, to make sure pods are spread across
availability zones and not just nodes. Please refer to
"Well-Known Labels, Annotations and Taints"
for more options.
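As a sketch, assuming your nodes carry the standard `topology.kubernetes.io/zone` label, the `affinity` section of the Cluster spec could be set as follows:

```yaml
affinity:
  enablePodAntiAffinity: true
  podAntiAffinityType: preferred
  topologyKey: topology.kubernetes.io/zone # spread instances across zones rather than nodes
```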
You can disable the operator's generated anti-affinity policies by setting
enablePodAntiAffinity to false.
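A minimal sketch of turning the generated anti-affinity off:

```yaml
affinity:
  enablePodAntiAffinity: false # the operator will not add podAntiAffinity terms to the pods
```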
Additionally, if more fine-grained control is needed, you can specify a
list of custom pod affinity or anti-affinity rules via the
`additionalPodAffinity` and `additionalPodAntiAffinity` configuration
attributes. These rules are added to the ones generated by the operator,
if enabled, or passed transparently otherwise.
You have to pass to `additionalPodAntiAffinity` or `additionalPodAffinity`
the whole content of
`podAntiAffinity` or `podAffinity` that is expected by the
Pod spec (please look at the following YAML as an example of having only one
instance of PostgreSQL running on every worker node, regardless of which
PostgreSQL cluster it belongs to).
```yaml
additionalPodAntiAffinity:
  requiredDuringSchedulingIgnoredDuringExecution:
  - labelSelector:
      matchExpressions:
      - key: postgresql
        operator: Exists
        values: []
    topologyKey: "kubernetes.io/hostname"
```
Node selection through nodeSelector

Kubernetes allows `nodeSelector` to provide a list of labels (defined as
key-value pairs) to select the nodes on which a pod can run. Specifically,
the node must have each indicated key-value pair as a label for the
pod to be scheduled and run.
Similarly, CloudNativePG lets you define a
`nodeSelector` in the
`affinity` section, so that you can request a PostgreSQL cluster to run only
on nodes that have those labels.
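For example, assuming the target nodes carry a hypothetical `workload: postgres` label, the `affinity` section could look like this:

```yaml
affinity:
  nodeSelector:
    workload: postgres # hypothetical label; only nodes carrying it are eligible
```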
Tolerations

Kubernetes allows you to specify (through
taints) whether a node should repel
all pods that do not explicitly tolerate (through tolerations) those taints.

So, by setting a proper set of
`tolerations` for a workload matching a specific node's
`taints`, the Kubernetes scheduler will take the tainted node into
consideration when deciding on which node to schedule the workload.
Tolerations can be configured for all the pods of a Cluster through the
`.spec.affinity.tolerations` section, which accepts the usual Kubernetes syntax
for tolerations.
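For instance, assuming a node has been tainted with a hypothetical `postgres=dedicated:NoSchedule` taint, a matching toleration could be declared as follows:

```yaml
affinity:
  tolerations:
  - key: postgres # hypothetical taint key
    operator: Equal
    value: dedicated
    effect: NoSchedule
```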
Taints and Tolerations
More information on taints and tolerations can be found in the Kubernetes documentation.