Running PostgreSQL in Kubernetes (Primary and Standby)
In chapters one, two and three we've managed to stand up a Primary and Stand-By PostgreSQL instances using containers.
We've learnt the fundamentals of how to persist and store data, how to configure instances and how to setup streaming replication from a Primary container to a Stand-by container.
In chapter four we lifted everything we've learnt so far into a Kubernetes cluster.
We ended with a basic PostgreSQL running in Kubernetes with persistence, but without high availability.
The challenges - a reminder
We have encountered a few challenges along the way, but running PostgreSQL in a container is pretty similar to just running it on a server outside of a container.
Kubernetes will add a bunch more complexity which we'll cover in this chapter.
A few points to note:
- If you are not familiar with running PostreSQL in a container, this chapter is not for you. Please go back to Chapter 1
- If you are not familiar with configuration of PostreSQL, do not attempt to run it in Kubernetes. Please go back to Chapter 2
- If you are not familiar with Streaming Replication, Do not attempt to run PostreSQL in Kubernetes. Please go back to Chapter 3
- If you are not familiar with StatefulSets, Do not attempt to run PostreSQL in Kubernetes
- We will not be using Popular PostgreSQL controllers\operators or Helm charts in this guide. Operators and controllers simply automate things, and those open source tooling assumes you understand all the above mentioned tech.
One caveat to think of before running PostgreSQL in Kubernetes, or any database for that matter, is how would you handle cluster upgrades?
Most cloud providers uprade by rolling new nodes and deleting old nodes, meaning your primary server may be deleted and start on a new node without any data.
If you don't have a strategy here, you will lose your data.
If something goes wrong and you're using operators or controllers and don't have a background in how PostgreSQL works, you will lose data.
And finally - The work in this guide has not been tested for Production workloads and written purely for educational purposes.
Create a Kubernetes cluster
In this chapter, we will start by creating a test Kubernetes cluster using kind
kind create cluster --name postgresql --image kindest/node:v1.23.5
kubectl get nodes
NAME STATUS ROLES AGE VERSION
postgresql-control-plane Ready control-plane,master 31s v1.23.5
Setting up our PostgreSQL environment
Deploy a namespace to hold our resources:
kubectl create ns postgresql
In Chapter 3, we defined a few environment variables in our docker run
command.
Some of those values are sensitive, so in Kubernetes we'll place sensitive values in a Kubernetes secret.
Create our secret for our first PostgreSQL instance:
kubectl -n postgresql create secret generic postgresql `
--from-literal POSTGRES_USER="postgresadmin" `
--from-literal POSTGRES_PASSWORD='admin123' `
--from-literal POSTGRES_DB="postgresdb" `
--from-literal REPLICATION_USER="replicationuser" `
--from-literal REPLICATION_PASSWORD='replicationPassword'
Review our first PostgreSQL instance
In Chapter four we created a basic PostgreSQL running in Kubernetes by combining everything we've learnt so far in the series annd creating a statefulset.yaml file.
Create our Primary PostgreSQL instance
Now what we will be doing is copying this file as it will be our starting point and call it postgres-1.yaml
.
We'll run through the file and will need to introduce a few things:
- Rename
postgres
topostgres-1
to simbolize our first instance which will act as a Primary server - Utilise
/docker-entrypoint-initdb.d/
initialization scripting - Automate adding of the replication user
Postgres Initialization
The Docker version of PostgreSQL has an initialization feature that will run any scripts in the folder /docker-entrypoint-initdb.d/
as explained on Docker Hub.
It will only run when the data directory of PostgreSQL has not been initialized, making it perfect for things we want to setup only once at the first start and not upon every restart.
To add this, we need a volume mount for out init
step so the initContainer can do things and share files with the database container:
We will need this in both init and database container:
volumeMounts:
- mountPath: /docker-entrypoint-initdb.d
name: initdb
Then we need a volume
as an emptyDir
which is just an empty directory shared between the two containers
volumes:
- name: initdb
emptyDir: {}
Now our initContainer
can generate a script that creates our replication user. PostgreSQL will run this on the first initialization of the data directory
#create a init template
echo "CREATE USER #REPLICATION_USER REPLICATION LOGIN ENCRYPTED PASSWORD '#REPLICATION_PASSWORD';" > init.sql
# add credentials
sed -i 's/#REPLICATION_USER/'${REPLICATION_USER}'/g' init.sql
sed -i 's/#REPLICATION_PASSWORD/'${REPLICATION_PASSWORD}'/g' init.sql
mkdir -p /docker-entrypoint-initdb.d/
cp init.sql /docker-entrypoint-initdb.d/init.sql
The init container will also need the replication user secrets to be mounted to environment variables:
env:
- name: REPLICATION_USER
valueFrom:
secretKeyRef:
name: postgresql
key: REPLICATION_USER
optional: false
- name: REPLICATION_PASSWORD
valueFrom:
secretKeyRef:
name: postgresql
key: REPLICATION_PASSWORD
optional: false
Now the initContainer
will create our replication user!
Deploy our Primary PostgreSQL instance
Deploy our first instance which will act as our primary:
kubectl -n postgresql apply -f storage/databases/postgresql/5-k8s-replication/yaml/postgres-1.yaml
Check our installation
kubectl -n postgresql get pods
# check the initialization logs (should be clear)
kubectl -n postgresql logs postgres-1-0 -c init
# check the database logs
kubectl -n postgresql logs postgres-1-0
kubectl -n postgresql exec -it postgres-1-0 -- bash
# login to postgres
psql --username=postgresadmin postgresdb
# see our replication user created
\du
#create a table
CREATE TABLE customers (firstname text, customer_id serial, date_created timestamp);
#show the table
\dt
# quit out of postgresql
\q
# check the data directory
ls -l /data/pgdata
# check the archive
ls -l /data/archive
Create our Standby Server
To create our Stand-By server, we will create a copy of postgres-1.yaml
and rename it to postgres-2.yaml
Now there are a few things we'll do here:
- Carefully rename
postgres-1
topostgres-2
- Setup replication streaming from the Primary instance.
Setting up our Stand by server automatically
As we learnt in previous chapters, a stand-by server is setup using pg_basebackup
utility.
We'll need to automate this as part of our initialization process.
I'd like to keep our PostgreSQL YAML the same eventually so it could be packaged as a Helm chart.
This will allow us to use the same YAML for Primary and Stand-by instances.
To achieve this, we'll introduce a STANDBY_MODE
variable which we can turn "on"
if we want a Stand-by, and "off"
if we want a Primary instance
if [ ${STANDBY_MODE} == "on" ];
then
# initialize from backup if data dir is empty
if [ -z "$(ls -A ${PGDATA})" ]; then
export PGPASSWORD=${REPLICATION_PASSWORD}
pg_basebackup -h ${PRIMARY_SERVER_ADDRESS} -p 5432 -U ${REPLICATION_USER} -D ${PGDATA} -Fp -Xs -R
fi
else
# previous logic here
fi
We we'll also need to introduce more environment variables
- name: STANDBY_MODE
value: "on"
- name: PRIMARY_SERVER_ADDRESS
value: "postgres-1"
- name: PGDATA
value: "/data/pgdata"
Deploy our Standby Server
Let's deploy the instance and see what happens:
kubectl -n postgresql apply -f storage/databases/postgresql/5-k8s-replication/yaml/postgres-2.yaml
Check our stand-by instance:
kubectl -n postgresql get pods
# check the initialization logs (you'll see `No such file or directory` because its not initialized yet!)
kubectl -n postgresql logs postgres-2-0 -c init
# check the database logs
kubectl -n postgresql logs postgres-2-0
kubectl -n postgresql exec -it postgres-2-0 -- bash
# login to postgres
psql --username=postgresadmin postgresdb
# see our replication user created
\du
#create a table
CREATE TABLE customers (firstname text, customer_id serial, date_created timestamp);
#show the table
\dt
# quit out of postgresql
\q
# check the data directory
ls -l /data/pgdata
Failover
Now lets say postgres-1
fails.
PostgreSQL does not have built-in automated failver and recovery and requires tooling to perform this.
When postgres-1
fails, we would use a utility called pg_ctl to promote our stand-by server to a new primary server.
Then we have to build a new stand-by server just like we did in this guide.
We would also need to configure replication on the new primary, the same way we did in this guide.
Let's stop the primary server to simulate failure:
kubectl -n postgresql delete sts postgres-1
# notice the failure in replication from postgres-1
kubectl -n postgresql logs postgres-2-0
Then log into postgres-2
and promote it to primary:
kubectl -n postgresql exec -it postgres-2-0 -- bash
psql --username=postgresadmin postgresdb
# confirm we cannot create a table as its a stand-by server
CREATE TABLE customers (firstname text, customer_id serial, date_created timestamp);
# quit out of postgresql
\q
# run pg_ctl as postgres user (cannot be run as root!)
runuser -u postgres -- pg_ctl promote
# confirm we can create a table as its a primary server
CREATE TABLE customers (firstname text, customer_id serial, date_created timestamp);
Setup Replication on the new Primary
Reconfigure the postgres-2
instance as a primary:
env:
- name: STANDBY_MODE
value: "off"
- name: PRIMARY_SERVER_ADDRESS
value: ""
Deploy the changes:
kubectl -n postgresql apply -f storage/databases/postgresql/5-k8s-replication/yaml/postgres-2.yaml
# check our new instance
kubectl -n postgresql get pods
kubectl -n postgresql logs postgres-2-0 -c init
kubectl -n postgresql logs postgres-2-0
kubectl -n postgresql exec -it postgres-2-0 -- bash
If we wanted to deploy a new stand-by server, all we needed to do is create postgres-3
with STANDBY_MODE="on"
and set the PRIMARY_SERVER_ADDRESS="postgres-2
and we would have a new stand-by server
kubectl -n postgresql apply -f storage/databases/postgresql/5-k8s-replication/yaml/postgres-3.yaml
That's it for chapter chapter five!