2025-01-19 07:56:41 +11:00
..
2025-01-19 07:56:41 +11:00
2025-01-19 07:56:41 +11:00

Running PostgreSQL in Kubernetes (Primary and Standby)

In chapters one, two and three we've managed to stand up a Primary and Stand-By PostgreSQL instances using containers.

We've learnt the fundamentals of how to persist and store data, how to configure instances and how to setup streaming replication from a Primary container to a Stand-by container.

In chapter four we lifted everything we've learnt so far into a Kubernetes cluster.

We ended with a basic PostgreSQL running in Kubernetes with persistence, but without high availability.

The challenges - a reminder

We have encountered a few challenges along the way, but running PostgreSQL in a container is pretty similar to just running it on a server outside of a container.

Kubernetes will add a bunch more complexity which we'll cover in this chapter.
A few points to note:

  • If you are not familiar with running PostreSQL in a container, this chapter is not for you. Please go back to Chapter 1
  • If you are not familiar with configuration of PostreSQL, do not attempt to run it in Kubernetes. Please go back to Chapter 2
  • If you are not familiar with Streaming Replication, Do not attempt to run PostreSQL in Kubernetes. Please go back to Chapter 3
  • If you are not familiar with StatefulSets, Do not attempt to run PostreSQL in Kubernetes
  • We will not be using Popular PostgreSQL controllers\operators or Helm charts in this guide. Operators and controllers simply automate things, and those open source tooling assumes you understand all the above mentioned tech.

One caveat to think of before running PostgreSQL in Kubernetes, or any database for that matter, is how would you handle cluster upgrades?
Most cloud providers uprade by rolling new nodes and deleting old nodes, meaning your primary server may be deleted and start on a new node without any data.
If you don't have a strategy here, you will lose your data.

If something goes wrong and you're using operators or controllers and don't have a background in how PostgreSQL works, you will lose data.

And finally - The work in this guide has not been tested for Production workloads and written purely for educational purposes.

Create a Kubernetes cluster

In this chapter, we will start by creating a test Kubernetes cluster using kind

kind create cluster --name postgresql --image kindest/node:v1.23.5

kubectl get nodes
NAME                       STATUS   ROLES                  AGE   VERSION
postgresql-control-plane   Ready    control-plane,master   31s   v1.23.5

Setting up our PostgreSQL environment

Deploy a namespace to hold our resources:

kubectl create ns postgresql

In Chapter 3, we defined a few environment variables in our docker run command.
Some of those values are sensitive, so in Kubernetes we'll place sensitive values in a Kubernetes secret.

Create our secret for our first PostgreSQL instance:

kubectl -n postgresql create secret generic postgresql `
  --from-literal POSTGRES_USER="postgresadmin" `
  --from-literal POSTGRES_PASSWORD='admin123' `
  --from-literal POSTGRES_DB="postgresdb" `
  --from-literal REPLICATION_USER="replicationuser" `
  --from-literal REPLICATION_PASSWORD='replicationPassword'

Review our first PostgreSQL instance

In Chapter four we created a basic PostgreSQL running in Kubernetes by combining everything we've learnt so far in the series annd creating a statefulset.yaml file.

Create our Primary PostgreSQL instance

Now what we will be doing is copying this file as it will be our starting point and call it postgres-1.yaml.

We'll run through the file and will need to introduce a few things:

  • Rename postgres to postgres-1 to simbolize our first instance which will act as a Primary server
  • Utilise /docker-entrypoint-initdb.d/ initialization scripting
  • Automate adding of the replication user

Postgres Initialization

The Docker version of PostgreSQL has an initialization feature that will run any scripts in the folder /docker-entrypoint-initdb.d/ as explained on Docker Hub.
It will only run when the data directory of PostgreSQL has not been initialized, making it perfect for things we want to setup only once at the first start and not upon every restart.

To add this, we need a volume mount for out init step so the initContainer can do things and share files with the database container:

We will need this in both init and database container:

volumeMounts:
- mountPath: /docker-entrypoint-initdb.d
  name: initdb

Then we need a volume as an emptyDir which is just an empty directory shared between the two containers

volumes:
- name: initdb
  emptyDir: {}

Now our initContainer can generate a script that creates our replication user. PostgreSQL will run this on the first initialization of the data directory

#create a init template
echo "CREATE USER #REPLICATION_USER REPLICATION LOGIN ENCRYPTED PASSWORD '#REPLICATION_PASSWORD';" > init.sql

# add credentials
sed -i 's/#REPLICATION_USER/'${REPLICATION_USER}'/g' init.sql
sed -i 's/#REPLICATION_PASSWORD/'${REPLICATION_PASSWORD}'/g' init.sql

mkdir -p /docker-entrypoint-initdb.d/
cp init.sql /docker-entrypoint-initdb.d/init.sql

The init container will also need the replication user secrets to be mounted to environment variables:

env:
- name: REPLICATION_USER
  valueFrom:
    secretKeyRef:
      name: postgresql
      key: REPLICATION_USER
      optional: false
- name: REPLICATION_PASSWORD
  valueFrom:
    secretKeyRef:
      name: postgresql
      key: REPLICATION_PASSWORD
      optional: false

Now the initContainer will create our replication user!

Deploy our Primary PostgreSQL instance

Deploy our first instance which will act as our primary:

kubectl -n postgresql apply -f storage/databases/postgresql/5-k8s-replication/yaml/postgres-1.yaml

Check our installation

kubectl -n postgresql get pods

# check the initialization logs (should be clear)
kubectl -n postgresql logs postgres-1-0 -c init

# check the database logs
kubectl -n postgresql logs postgres-1-0

kubectl -n postgresql exec -it postgres-1-0 -- bash

# login to postgres
psql --username=postgresadmin postgresdb

# see our replication user created
\du

#create a table
CREATE TABLE customers (firstname text, customer_id serial, date_created timestamp);

#show the table
\dt

# quit out of postgresql
\q

# check the data directory
ls -l /data/pgdata

# check the archive 
ls -l /data/archive

Create our Standby Server

To create our Stand-By server, we will create a copy of postgres-1.yaml and rename it to postgres-2.yaml

Now there are a few things we'll do here:

  • Carefully rename postgres-1 to postgres-2
  • Setup replication streaming from the Primary instance.

Setting up our Stand by server automatically

As we learnt in previous chapters, a stand-by server is setup using pg_basebackup utility.
We'll need to automate this as part of our initialization process.

I'd like to keep our PostgreSQL YAML the same eventually so it could be packaged as a Helm chart.

This will allow us to use the same YAML for Primary and Stand-by instances.

To achieve this, we'll introduce a STANDBY_MODE variable which we can turn "on" if we want a Stand-by, and "off" if we want a Primary instance

if [ ${STANDBY_MODE} == "on" ]; 
then
  # initialize from backup if data dir is empty
  if [ -z "$(ls -A ${PGDATA})" ]; then
    export PGPASSWORD=${REPLICATION_PASSWORD}
    pg_basebackup -h ${PRIMARY_SERVER_ADDRESS} -p 5432 -U ${REPLICATION_USER} -D ${PGDATA} -Fp -Xs -R
  fi
else
  # previous logic here
fi

We we'll also need to introduce more environment variables

- name: STANDBY_MODE
  value: "on"
- name: PRIMARY_SERVER_ADDRESS
  value: "postgres-1"
- name: PGDATA
  value: "/data/pgdata"

Deploy our Standby Server

Let's deploy the instance and see what happens:

kubectl -n postgresql apply -f storage/databases/postgresql/5-k8s-replication/yaml/postgres-2.yaml

Check our stand-by instance:

kubectl -n postgresql get pods

# check the initialization logs (you'll see `No such file or directory` because its not initialized yet!)
kubectl -n postgresql logs postgres-2-0 -c init

# check the database logs
kubectl -n postgresql logs postgres-2-0

kubectl -n postgresql exec -it postgres-2-0 -- bash

# login to postgres
psql --username=postgresadmin postgresdb

# see our replication user created
\du

#create a table
CREATE TABLE customers (firstname text, customer_id serial, date_created timestamp);

#show the table
\dt

# quit out of postgresql
\q

# check the data directory
ls -l /data/pgdata

Failover

Now lets say postgres-1 fails.
PostgreSQL does not have built-in automated failver and recovery and requires tooling to perform this.

When postgres-1 fails, we would use a utility called pg_ctl to promote our stand-by server to a new primary server.

Then we have to build a new stand-by server just like we did in this guide.
We would also need to configure replication on the new primary, the same way we did in this guide.

Let's stop the primary server to simulate failure:

kubectl -n postgresql delete sts postgres-1

# notice the failure in replication from postgres-1 
 kubectl -n postgresql logs postgres-2-0

Then log into postgres-2 and promote it to primary:

kubectl -n postgresql exec -it postgres-2-0 -- bash
psql --username=postgresadmin postgresdb

# confirm we cannot create a table as its a stand-by server
CREATE TABLE customers (firstname text, customer_id serial, date_created timestamp);

# quit out of postgresql
\q 

# run pg_ctl as postgres user (cannot be run as root!)
runuser -u postgres -- pg_ctl promote

# confirm we can create a table as its a primary server
CREATE TABLE customers (firstname text, customer_id serial, date_created timestamp);

Setup Replication on the new Primary

Reconfigure the postgres-2 instance as a primary:

env:
- name: STANDBY_MODE
  value: "off"
- name: PRIMARY_SERVER_ADDRESS
  value: ""

Deploy the changes:

kubectl -n postgresql apply -f storage/databases/postgresql/5-k8s-replication/yaml/postgres-2.yaml

# check our new instance
kubectl -n postgresql get pods
kubectl -n postgresql logs postgres-2-0 -c init
kubectl -n postgresql logs postgres-2-0
kubectl -n postgresql exec -it postgres-2-0 -- bash

If we wanted to deploy a new stand-by server, all we needed to do is create postgres-3 with STANDBY_MODE="on" and set the PRIMARY_SERVER_ADDRESS="postgres-2 and we would have a new stand-by server

kubectl -n postgresql apply -f storage/databases/postgresql/5-k8s-replication/yaml/postgres-3.yaml

That's it for chapter chapter five!