Kubernetes Services Optimization Techniques

Ritul Rai
Feb 28, 2021

Kubernetes (K8s) is a scalable, performant engine for orchestrating containers in a server environment. Although it is highly optimized by default and scales nicely, there are plenty of customizations for end users to define. Since server costs can increase quickly and performance can suffer from poor design, we have to find ways to increase infrastructure utilization and performance while reducing costs, to get the most out of our environments. The good practices below help optimize performance.

1. Configure Deployment Resources

Kubernetes orchestrates containers at scale, and the heart of this mechanism is the efficient scheduling of pods into nodes. We can help the Kubernetes scheduler do this by specifying resource constraints.

In other words, we can define requests and limits for resources such as CPU, memory, or Linux HugePages.

For example, let’s say we have a Java microservice acting as a REST endpoint. We can assign it the following resource profile:
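
Here is a minimal sketch of such a profile; the Deployment name, image, and values are illustrative assumptions rather than recommendations:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: java-rest-api              # hypothetical name
spec:
  replicas: 2
  selector:
    matchLabels:
      app: java-rest-api
  template:
    metadata:
      labels:
        app: java-rest-api
    spec:
      containers:
      - name: api
        image: example.com/java-rest-api:1.0   # placeholder image
        resources:
          requests:                # what the scheduler reserves on a node
            cpu: "250m"
            memory: "512Mi"
          limits:                  # hard ceiling enforced at runtime
            cpu: "500m"
            memory: "1Gi"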

When we clearly define resource requirements in the deployment descriptor, we make it easier for the scheduler to place each pod on the best available node, which improves runtime performance.

2. Configure Container-Optimized Images

For performance tuning of our Kubernetes cluster and services, it’s very important to optimize the images we use. Containerized applications built to run on virtual machines carry overhead that isn’t necessary in a container environment. A container-optimized image greatly reduces the container image size, which lets Kubernetes pull the image faster and run the resulting container more efficiently.

A container-optimized image should:

  • Contain a single application or perform one thing (e.g. a single web server or endpoint).
  • Be small, because big images are bulky and difficult to pull and ship.
  • Use a container-friendly operating system (e.g. Alpine), because such systems are more resistant to misconfiguration.
  • Leverage multistage builds, so that only the compiled application is deployed and not the dev sources it was built from.
  • Expose health and readiness check endpoints, enabling Kubernetes to take appropriate action if the container is down (see the probe sketch after this list).
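
As a sketch of the last point, liveness and readiness probes in the container spec tell Kubernetes how to call those endpoints; the paths, port, and timings below are illustrative assumptions:

# Fragment of a container spec; /healthz and /ready are hypothetical endpoints.
livenessProbe:
  httpGet:
    path: /healthz
    port: 8080
  initialDelaySeconds: 10
  periodSeconds: 15      # restart the container if this check keeps failing
readinessProbe:
  httpGet:
    path: /ready
    port: 8080
  periodSeconds: 5       # stop routing traffic to the pod while this fails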

3. Configure Taints and Tolerations

It’s always important to be able to ensure that Kubernetes does not deploy certain containers to specific nodes. This is the role of taints, which act as the opposite of affinity rules: they give Kubernetes rules that prevent scheduling, such as keeping a certain set of pods off particular zones or nodes. To apply a taint to a node, we use the taint option of kubectl, specifying a key and value followed by a taint effect such as NoExecute or NoSchedule:

$ kubectl taint nodes <node-name> pool=high-mem:NoSchedule

We can later remove that taint:

$ kubectl taint nodes <node-name> pool=high-mem:NoSchedule-

Or we can provide an exception for certain pods by including a toleration in the PodSpec. This is useful when we have tainted a node so that nothing is scheduled on it, but we now want to run specific jobs or services there and nothing else. We can schedule those jobs or services on the tainted nodes by adding the following fields to the PodSpec:
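
A sketch of the matching toleration; the key, value, and effect mirror the taint applied above:

tolerations:
- key: "pool"          # matches the taint key from the command above
  operator: "Equal"
  value: "high-mem"
  effect: "NoSchedule"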

Since this matches the tainted node, any pod with this toleration can be scheduled onto the high-mem node pool.

While taints and tolerations do give operators very fine-grained control over performance, there is a cost in the effort required to initially configure them.

Please note that this example is based on Google Kubernetes Engine.

4. HorizontalPodAutoscaler (HPA) Best Practices

HPA scales the number of pods in a replication controller, deployment, replica set, or stateful set based on CPU utilization, memory utilization, or custom and external metrics.

HPA is a great way to ensure that critical applications are elastic and can scale out to meet increasing demand, as well as scale down to ensure optimal resource usage. Below are some points to consider for better optimization.

  • All Pods should have Resource Requests Configured

HPA makes scaling decisions based on the observed CPU utilization values of pods that are part of a Kubernetes controller. Utilization values are calculated as a percentage of the resource requests of individual pods. Missing resource request values for some containers can throw off the HPA controller’s utilization calculations, leading to suboptimal operation and poor scaling decisions.

A best practice, therefore, is to ensure that resource request values are configured for all containers of each individual pod that is part of the Kubernetes controller being scaled by HPA. Below is an example snippet of the HorizontalPodAutoscaler kind for an application:
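
Here is a minimal sketch of such a manifest; the names and thresholds are illustrative assumptions, and newer clusters would use the stable autoscaling/v2 API instead:

apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: java-rest-api-hpa          # hypothetical name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: java-rest-api            # hypothetical target Deployment
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70     # scale out above 70% of the CPU request

Because averageUtilization is a percentage of each container’s CPU request, this only works reliably when requests are set on every container, which is exactly the point above.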

  • Configure and Install metrics-server

HPA makes scaling decisions based on per-pod resource metrics retrieved from the resource metrics API provided by the metrics-server. A best practice therefore is to launch metrics-server in your Kubernetes cluster as a cluster add-on.
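
As a sketch, on many clusters metrics-server can be installed from its official release manifest (managed offerings such as GKE often ship it preinstalled, so check your provider’s docs first):

$ kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml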

In addition to this, another best practice is to set --horizontal-pod-autoscaler-use-rest-clients to true or leave it unset. This is important because setting the flag to false reverts to the deprecated Heapster.

  • Configure and install Custom or External Metrics

The HPA can also make scaling decisions based on custom or external metrics. There are two types of custom metrics supported: pod and object metrics. Pod Metrics are averaged across all pods and as such only support target type of AverageValue. Object metrics can describe any other object in the same namespace and support target types of both Value and AverageValue.

A best practice when configuring custom metrics is to ensure that the correct target type is used for pod and object metrics.

External metrics allow HPA to autoscale applications based on metrics provided by third party monitoring systems. External metrics support target types of both Value and AverageValue.
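
As a sketch, both kinds can appear in an HPA’s metrics list; the metric names below are illustrative assumptions and require a metrics adapter (e.g. prometheus-adapter) to be installed:

metrics:
- type: Pods
  pods:
    metric:
      name: http_requests_per_second   # hypothetical custom pod metric
    target:
      type: AverageValue               # pod metrics support only AverageValue
      averageValue: "100"
- type: External
  external:
    metric:
      name: queue_messages_ready       # hypothetical external metric
    target:
      type: AverageValue               # external metrics also support Value
      averageValue: "30"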

  • Configure Custom Metrics over External Metrics whenever Possible

A best practice when deciding between custom and external metrics (when such a choice is possible) is to prefer custom metrics. One reason is that the external metrics API takes much more effort to secure than the custom metrics API and could potentially allow access to all metrics.

  • Configure Cooldown Period

The dynamic nature of the metrics evaluated by the HPA may at times lead to scaling events in quick succession without a pause between them. This results in thrashing, where the number of replicas fluctuates frequently, which is not desirable.

To get around this and specify a cooldown period, a best practice is to configure the --horizontal-pod-autoscaler-downscale-stabilization flag passed to kube-controller-manager. This flag has a default value of 5 minutes and specifies how long HPA waits after a downscale event before initiating another downscale operation.
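
For example, adding the following flag to the kube-controller-manager invocation (usually in its static pod manifest) lengthens the window to 10 minutes; this assumes you run your own control plane, as managed providers often do not expose these flags:

--horizontal-pod-autoscaler-downscale-stabilization=10m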

Kubernetes admins should also take into account the unique requirements of their applications when deciding on an optimal value for this duration.

By default, the HPA tolerates a 10% deviation in the ratio of desired to actual metrics before scaling. Depending on application requirements, this value can be changed via the --horizontal-pod-autoscaler-tolerance flag. Other configuration flags include --horizontal-pod-autoscaler-cpu-initialization-period, --horizontal-pod-autoscaler-initial-readiness-delay, and --horizontal-pod-autoscaler-sync-period. All of these can be configured based on unique cluster or application requirements.
