Scheduling
In Kubernetes, scheduling is the process responsible for placing a new pod on the best possible node, based on several criteria.
Kubernetes documentation
Please refer to the Kubernetes documentation for more information on scheduling, including all the available policies. On this page we assume you are familiar with concepts like affinity, anti-affinity, node selectors, and so on.
You can control how the CloudNativePG cluster's instances are scheduled through the affinity section in the definition of the cluster, which supports:
- pod affinity/anti-affinity
- node selectors
- tolerations
Info
CloudNativePG does not support pod templates for finer control over the scheduling of workloads. While they were part of the initial concept, the development team decided to postpone their introduction to a later version of the API (most likely v2 of CNPG).
Pod affinity and anti-affinity
Kubernetes allows you to control on which nodes a pod should (affinity) or should not (anti-affinity) be scheduled, based on the workloads already running on those nodes. This is technically known as inter-pod affinity/anti-affinity.
By default, CloudNativePG configures the cluster's instances to preferably run on different nodes, while pgBouncer pods may still run on the same nodes, resulting in the following affinity definition:
affinity:
  podAntiAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
    - podAffinityTerm:
        labelSelector:
          matchExpressions:
          - key: cnpg.io/cluster
            operator: In
            values:
            - cluster-example
          - key: cnpg.io/podRole
            operator: In
            values:
            - instance
        topologyKey: kubernetes.io/hostname
      weight: 100
This definition is generated as a result of the following Cluster spec:
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: cluster-example
spec:
  instances: 3
  imageName: ghcr.io/cloudnative-pg/postgresql:16.3
  affinity:
    enablePodAntiAffinity: true # default value
    topologyKey: kubernetes.io/hostname # default value
    podAntiAffinityType: preferred # default value
  storage:
    size: 1Gi
Therefore, Kubernetes will prefer to schedule the three PostgreSQL instances on three different nodes, resources permitting.
You can change this default behavior by tweaking the settings above. For example, podAntiAffinityType can be set to required, so that requiredDuringSchedulingIgnoredDuringExecution is used instead of preferredDuringSchedulingIgnoredDuringExecution. Be aware that such a strong requirement might result in pending instances if resources are not available (an expected condition when using the Cluster Autoscaler for automated horizontal scaling of a Kubernetes cluster).
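For instance, a minimal sketch of a Cluster requesting required anti-affinity across nodes (the name, image, and storage size below are illustrative):

apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: cluster-required-anti-affinity # illustrative name
spec:
  instances: 3
  imageName: ghcr.io/cloudnative-pg/postgresql:16.3
  affinity:
    enablePodAntiAffinity: true
    topologyKey: kubernetes.io/hostname
    podAntiAffinityType: required # instances stay Pending if no suitable node is available
  storage:
    size: 1Gi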
Inter-pod affinity and anti-affinity
More information on this topic is in the Kubernetes documentation.
In a cloud environment, another possible value for topologyKey is topology.kubernetes.io/zone, which ensures that pods are spread across availability zones and not just nodes. Please refer to "Well-Known Labels, Annotations and Taints" in the Kubernetes documentation for more options.
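As a sketch, assuming your nodes carry the standard topology labels, the affinity section of the Cluster spec could look like:

affinity:
  enablePodAntiAffinity: true
  podAntiAffinityType: preferred
  topologyKey: topology.kubernetes.io/zone # spread instances across availability zones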
You can disable the anti-affinity policies generated by the operator by setting enablePodAntiAffinity to false.
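For example, this fragment of the affinity section turns off the generated anti-affinity rules entirely, leaving scheduling to any custom rules you provide (see below):

affinity:
  enablePodAntiAffinity: false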
Additionally, if you need more fine-grained control, you can specify a list of custom pod affinity or anti-affinity rules via the additionalPodAffinity and additionalPodAntiAffinity configuration attributes. These rules are added to the ones generated by the operator, if enabled, or passed through as-is otherwise.
Note
You have to pass to additionalPodAntiAffinity or additionalPodAffinity the whole content of the podAntiAffinity or podAffinity field expected by the Pod spec. The following YAML, for example, ensures that at most one PostgreSQL instance runs on each worker node, regardless of which PostgreSQL cluster it belongs to.
additionalPodAntiAffinity:
  requiredDuringSchedulingIgnoredDuringExecution:
  - labelSelector:
      matchExpressions:
      - key: postgresql
        operator: Exists
        values: []
    topologyKey: "kubernetes.io/hostname"
Node selection through nodeSelector
Kubernetes allows a nodeSelector to provide a list of labels (defined as key-value pairs) to select the nodes on which a pod can run. Specifically, a node must have each of the indicated key-value pairs as labels for the pod to be scheduled and run on it.
Similarly, CloudNativePG allows you to define a nodeSelector in the affinity section, so that you can request a PostgreSQL cluster to run only on nodes that have those labels.
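As a sketch, assuming your database nodes carry a hypothetical workload: postgres label, the affinity section would contain:

affinity:
  nodeSelector:
    workload: postgres # hypothetical label; use labels actually present on your nodes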
Tolerations
Kubernetes allows you to specify, through taints, that a node should repel all pods that do not explicitly tolerate its taints through matching tolerations. By setting a proper set of tolerations for a workload that match a specific node's taints, the Kubernetes scheduler will also take the tainted node into consideration when deciding where to schedule the workload.
Tolerations can be configured for all the pods of a Cluster through the .spec.affinity.tolerations section, which accepts the usual Kubernetes syntax for tolerations.
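For example, if some nodes were reserved for databases with a hypothetical postgres=true:NoSchedule taint, the cluster could tolerate it with a fragment like this sketch:

affinity:
  tolerations:
  - key: postgres # hypothetical taint key
    operator: Equal
    value: "true"
    effect: NoSchedule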
Taints and Tolerations
More information on taints and tolerations can be found in the Kubernetes documentation.