Kubernetes Autoscaling Guide

Cluster Autoscaling

The cluster autoscaler allows us to scale cluster nodes when they become full.
I would recommend learning about scaling your cluster nodes before scaling pods.
Video here

Kubernetes cluster auto scaling

Horizontal Pod Autoscaling

HPA allows us to scale pods when their resource utilisation goes over a threshold.

Pod auto scaling

Requirements

A Cluster

  • For both autoscaling guides, we'll need a cluster.
  • For Cluster Autoscaler, you need a cloud-based cluster that supports the cluster autoscaler.
  • For HPA, we'll use kind.

Cluster Autoscaling - Creating an AKS Cluster

# azure example

NAME=aks-getting-started
RESOURCEGROUP=aks-getting-started
SERVICE_PRINCIPAL=
SERVICE_PRINCIPAL_SECRET=

az aks create -n $NAME \
--resource-group $RESOURCEGROUP \
--location australiaeast \
--kubernetes-version 1.16.10 \
--nodepool-name default \
--node-count 1 \
--node-vm-size Standard_F4s_v2  \
--node-osdisk-size 250 \
--service-principal $SERVICE_PRINCIPAL \
--client-secret $SERVICE_PRINCIPAL_SECRET \
--output none \
--enable-cluster-autoscaler \
--min-count 1 \
--max-count 5
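
If the cluster already exists, the autoscaler can also be switched on or tuned after creation; a minimal sketch using azure-cli (same flag family as above):

# enable the autoscaler on an existing cluster
az aks update -n $NAME \
--resource-group $RESOURCEGROUP \
--enable-cluster-autoscaler \
--min-count 1 \
--max-count 5

# or adjust the bounds once it's already enabled
az aks update -n $NAME \
--resource-group $RESOURCEGROUP \
--update-cluster-autoscaler \
--min-count 1 \
--max-count 10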

Horizontal Pod Autoscaling - Creating a Kind Cluster

My node has 6 CPU cores for this demo.

kind create cluster --name hpa --image kindest/node:v1.18.4
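
Once it's created, it's worth confirming kubectl is pointed at the new cluster; kind names the context kind-<cluster-name>, so kind-hpa here:

kubectl cluster-info --context kind-hpa
kubectl get nodes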

Metric Server

  • For Cluster Autoscaler - on cloud-based clusters, Metric Server may already be installed.
  • For HPA - we're using kind, so we'll need to install Metric Server ourselves.

Metric Server provides container resource metrics for use in autoscaling pipelines.

Because I run Kubernetes 1.18 in kind, the Metric Server version I need is 0.3.7, so that's the version we'll deploy.
I used the components.yaml from the release page link above.

Important Note: For demo clusters (like kind), you will need to disable TLS.
You can disable TLS by adding the following to the metrics-server container args.

For production, make sure you remove the following:

- --kubelet-insecure-tls
- --kubelet-preferred-address-types="InternalIP"
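
For context, this is roughly where those args sit in the metrics-server Deployment inside components.yaml; a sketch of the relevant fragment, not the full manifest (the image tag matches the 0.3.7 release):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: metrics-server
  namespace: kube-system
spec:
  template:
    spec:
      containers:
      - name: metrics-server
        image: k8s.gcr.io/metrics-server/metrics-server:v0.3.7
        args:
        - --cert-dir=/tmp
        - --secure-port=4443
        # demo-only flags - remove both for production
        - --kubelet-insecure-tls
        - --kubelet-preferred-address-types="InternalIP"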

Deployment:

cd kubernetes\autoscaling
kubectl -n kube-system apply -f .\components\metric-server\metricserver-0.3.7.yaml

# test
kubectl -n kube-system get pods

#note: wait for metrics to populate!
kubectl top nodes
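
If kubectl top nodes still returns nothing after a couple of minutes, a quick sanity check (not part of the original steps) is to confirm the metrics API is registered and serving:

kubectl get apiservice v1beta1.metrics.k8s.io
kubectl get --raw /apis/metrics.k8s.io/v1beta1/nodes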

Example Application

For all autoscaling guides, we'll need a simple app that generates some CPU load.

  • Build the app
  • Push it to a registry
  • Ensure resource requirements are set
  • Deploy it to Kubernetes
  • Ensure metrics are visible for the app

# build

cd kubernetes\autoscaling\components\application
docker build . -t aimvector/application-cpu:v1.0.0

# push
docker push aimvector/application-cpu:v1.0.0
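
Note: aimvector is the registry namespace used in this series; if you can't push there, tag the image for your own registry and point deployment.yaml at that image instead.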

# resource requirements
resources:
  requests:
    memory: "50Mi"
    cpu: "500m"
  limits:
    memory: "500Mi"
    cpu: "2000m"
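
Note: HPA utilisation is measured against these requests, not the limits; the 95% target we set later works out to roughly 475m of the 500m CPU request per pod.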

# deploy 
kubectl apply -f deployment.yaml

# metrics
kubectl top pods

Cluster Autoscaler

For cluster autoscaling, you should be able to scale the pods manually and watch the cluster scale, as in the sketch below.
Cluster autoscaling stops there.
For Pod Autoscaling (HPA), continue to the next section.
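
A minimal sketch (the replica count is arbitrary; pick one that won't fit on the current nodes):

# scale the pods past what the existing nodes can hold
kubectl scale deploy/application-cpu --replicas 10

# watch the cluster autoscaler add nodes for the pending pods
kubectl get nodes -w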

Generate some traffic

Let's deploy a simple traffic generator pod

cd kubernetes\autoscaling\components\application
kubectl apply -f .\traffic-generator.yaml

# get a terminal to the traffic-generator
kubectl exec -it traffic-generator -- sh

# install wrk
apk add --no-cache wrk

# simulate some load: 5 connections (-c), 5 threads (-t), for 99999 seconds (-d)
wrk -c 5 -t 5 -d 99999 -H "Connection: Close" http://application-cpu

# you can scale the pods manually and see that roughly 6-7 pods will satisfy the resource requests
kubectl scale deploy/application-cpu --replicas 2
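
While wrk is running, kubectl top pods should show each pod's CPU climbing toward its 500m request, which is what drives the scaling behaviour below.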

Deploy an autoscaler

# scale the deployment back down to 2
kubectl scale deploy/application-cpu --replicas 2

# deploy the autoscaler
kubectl autoscale deploy/application-cpu --cpu-percent=95 --min=1 --max=10

# pods should scale to roughly 6-7 to bring average utilisation back under the 95% target (about 475m of the 500m CPU request per pod)

kubectl get pods
kubectl top pods
kubectl get hpa/application-cpu -o wide

kubectl describe hpa/application-cpu 
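
The kubectl autoscale command above is imperative shorthand; a declarative equivalent, roughly what the command creates (autoscaling/v1, the API version current on 1.18-era clusters):

apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: application-cpu
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: application-cpu
  minReplicas: 1
  maxReplicas: 10
  targetCPUUtilizationPercentage: 95

You can also watch the HPA react in real time with kubectl get hpa/application-cpu -w.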

Vertical Pod Autoscaling

The vertical pod autoscaler allows us to automatically set request values on our pods
based on recommendations. This helps us tune the request values based on actual CPU and memory usage.
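
A minimal sketch of a VPA object in recommendation-only mode, assuming the VPA components from the kubernetes/autoscaler repo are installed in the cluster:

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: application-cpu
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: application-cpu
  updatePolicy:
    updateMode: "Off"   # recommend only; don't evict pods to apply changes

Once the recommender has collected some metrics, the suggested request values show up under kubectl describe vpa application-cpu.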

More here