# Running PostgreSQL in Kubernetes (Primary and Standby) In chapters [one](../1-introduction/README.md), [two](../2-configuration/README.md) and [three](../3-replication/README.md) we've managed to stand up a Primary and Stand-By PostgreSQL instances using containers.
We've learnt the fundamentals of how to persist and store data, how to configure instances and how to setup streaming replication from a Primary container to a Stand-by container.
In chapter [four](../4-k8s-basic/README.md) we lifted everything we've learnt so far into a Kubernetes cluster.
We ended with a basic PostgreSQL running in Kubernetes with persistence, but without high availability.
## The challenges - a reminder We have encountered a few challenges along the way, but running PostgreSQL in a container is pretty similar to just running it on a server outside of a container.
Kubernetes will add a bunch more complexity which we'll cover in this chapter.
A few points to note: * If you are not familiar with running PostreSQL in a container, this chapter is not for you. Please go back to [Chapter 1](../1-introduction/README.md) * If you are not familiar with configuration of PostreSQL, do not attempt to run it in Kubernetes. Please go back to [Chapter 2](../2-configuration/README.md) * If you are not familiar with Streaming Replication, Do not attempt to run PostreSQL in Kubernetes. Please go back to [Chapter 3](../3-replication/README.md) * If you are not familiar with [StatefulSets](https://kubernetes.io/docs/concepts/workloads/controllers/statefulset/), Do not attempt to run PostreSQL in Kubernetes * We will not be using Popular PostgreSQL controllers\operators or Helm charts in this guide. Operators and controllers simply automate things, and those open source tooling assumes you understand all the above mentioned tech. One caveat to think of before running PostgreSQL in Kubernetes, or any database for that matter, is how would you handle cluster upgrades?
Most cloud providers uprade by rolling new nodes and deleting old nodes, meaning your primary server may be deleted and start on a new node without any data.
If you don't have a strategy here, you will lose your data.
If something goes wrong and you're using operators or controllers and don't have a background in how PostgreSQL works, you will lose data.
And finally - The work in this guide has not been tested for Production workloads and written purely for educational purposes.
## Create a Kubernetes cluster In this chapter, we will start by creating a test Kubernetes cluster using [kind](https://kind.sigs.k8s.io/)
``` kind create cluster --name postgresql --image kindest/node:v1.23.5 kubectl get nodes NAME STATUS ROLES AGE VERSION postgresql-control-plane Ready control-plane,master 31s v1.23.5 ``` ## Setting up our PostgreSQL environment Deploy a namespace to hold our resources: ``` kubectl create ns postgresql ``` In [Chapter 3](../3-replication/README.md), we defined a few environment variables in our `docker run` command.
Some of those values are sensitive, so in Kubernetes we'll place sensitive values in a Kubernetes [secret](https://kubernetes.io/docs/concepts/configuration/secret/).
Create our secret for our first PostgreSQL instance: ``` kubectl -n postgresql create secret generic postgresql ` --from-literal POSTGRES_USER="postgresadmin" ` --from-literal POSTGRES_PASSWORD='admin123' ` --from-literal POSTGRES_DB="postgresdb" ` --from-literal REPLICATION_USER="replicationuser" ` --from-literal REPLICATION_PASSWORD='replicationPassword' ``` ## Review our first PostgreSQL instance In [Chapter four](../4-k8s-basic/README.md) we created a basic PostgreSQL running in Kubernetes by combining everything we've learnt so far in the series annd creating a [statefulset.yaml](./yaml/statefulset.yaml) file.
## Create our Primary PostgreSQL instance Now what we will be doing is copying this file as it will be our starting point and call it `postgres-1.yaml`.
We'll run through the file and will need to introduce a few things: * Rename `postgres` to `postgres-1` to simbolize our first instance which will act as a Primary server * Utilise `/docker-entrypoint-initdb.d/` initialization scripting * Automate adding of the replication user ### Postgres Initialization The Docker version of PostgreSQL has an initialization feature that will run any scripts in the folder `/docker-entrypoint-initdb.d/` as explained on [Docker Hub](https://hub.docker.com/_/postgres).
It will only run when the data directory of PostgreSQL has not been initialized, making it perfect for things we want to setup only once at the first start and not upon every restart.
To add this, we need a volume mount for out `init` step so the initContainer can do things and share files with the database container: We will need this in both init and database container: ``` volumeMounts: - mountPath: /docker-entrypoint-initdb.d name: initdb ``` Then we need a `volume` as an `emptyDir` which is just an empty directory shared between the two containers ``` volumes: - name: initdb emptyDir: {} ``` Now our `initContainer` can generate a script that creates our replication user. PostgreSQL will run this on the first initialization of the data directory ``` #create a init template echo "CREATE USER #REPLICATION_USER REPLICATION LOGIN ENCRYPTED PASSWORD '#REPLICATION_PASSWORD';" > init.sql # add credentials sed -i 's/#REPLICATION_USER/'${REPLICATION_USER}'/g' init.sql sed -i 's/#REPLICATION_PASSWORD/'${REPLICATION_PASSWORD}'/g' init.sql mkdir -p /docker-entrypoint-initdb.d/ cp init.sql /docker-entrypoint-initdb.d/init.sql ``` The init container will also need the replication user secrets to be mounted to environment variables: ``` env: - name: REPLICATION_USER valueFrom: secretKeyRef: name: postgresql key: REPLICATION_USER optional: false - name: REPLICATION_PASSWORD valueFrom: secretKeyRef: name: postgresql key: REPLICATION_PASSWORD optional: false ``` Now the `initContainer` will create our replication user! ## Deploy our Primary PostgreSQL instance Deploy our first instance which will act as our primary: ``` kubectl -n postgresql apply -f storage/databases/postgresql/5-k8s-replication/yaml/postgres-1.yaml ``` ## Check our installation ``` kubectl -n postgresql get pods # check the initialization logs (should be clear) kubectl -n postgresql logs postgres-1-0 -c init # check the database logs kubectl -n postgresql logs postgres-1-0 kubectl -n postgresql exec -it postgres-1-0 -- bash # login to postgres psql --username=postgresadmin postgresdb # see our replication user created \du #create a table CREATE TABLE customers (firstname text, customer_id serial, date_created timestamp); #show the table \dt # quit out of postgresql \q # check the data directory ls -l /data/pgdata # check the archive ls -l /data/archive ``` ## Create our Standby Server To create our Stand-By server, we will create a copy of `postgres-1.yaml` and rename it to `postgres-2.yaml` Now there are a few things we'll do here: * Carefully rename `postgres-1` to `postgres-2` * Setup replication streaming from the Primary instance. ### Setting up our Stand by server automatically As we learnt in previous chapters, a stand-by server is setup using `pg_basebackup` utility.
We'll need to automate this as part of our initialization process.
I'd like to keep our PostgreSQL YAML the same eventually so it could be packaged as a Helm chart.
This will allow us to use the same YAML for Primary and Stand-by instances.
To achieve this, we'll introduce a `STANDBY_MODE` variable which we can turn `"on"` if we want a Stand-by, and `"off"` if we want a Primary instance ``` if [ ${STANDBY_MODE} == "on" ]; then # initialize from backup if data dir is empty if [ -z "$(ls -A ${PGDATA})" ]; then export PGPASSWORD=${REPLICATION_PASSWORD} pg_basebackup -h ${PRIMARY_SERVER_ADDRESS} -p 5432 -U ${REPLICATION_USER} -D ${PGDATA} -Fp -Xs -R fi else # previous logic here fi ``` We we'll also need to introduce more environment variables ``` - name: STANDBY_MODE value: "on" - name: PRIMARY_SERVER_ADDRESS value: "postgres-1" - name: PGDATA value: "/data/pgdata" ``` ## Deploy our Standby Server Let's deploy the instance and see what happens: ``` kubectl -n postgresql apply -f storage/databases/postgresql/5-k8s-replication/yaml/postgres-2.yaml ``` Check our stand-by instance: ``` kubectl -n postgresql get pods # check the initialization logs (you'll see `No such file or directory` because its not initialized yet!) kubectl -n postgresql logs postgres-2-0 -c init # check the database logs kubectl -n postgresql logs postgres-2-0 kubectl -n postgresql exec -it postgres-2-0 -- bash # login to postgres psql --username=postgresadmin postgresdb # see our replication user created \du #create a table CREATE TABLE customers (firstname text, customer_id serial, date_created timestamp); #show the table \dt # quit out of postgresql \q # check the data directory ls -l /data/pgdata ``` ## Failover Now lets say `postgres-1` fails.
PostgreSQL does not have built-in automated failver and recovery and requires tooling to perform this.
When `postgres-1` fails, we would use a utility called [pg_ctl](https://www.postgresql.org/docs/current/app-pg-ctl.html) to promote our stand-by server to a new primary server.
Then we have to build a new stand-by server just like we did in this guide.
We would also need to configure replication on the new primary, the same way we did in this guide.
Let's stop the primary server to simulate failure: ``` kubectl -n postgresql delete sts postgres-1 # notice the failure in replication from postgres-1 kubectl -n postgresql logs postgres-2-0 ``` Then log into `postgres-2` and promote it to primary: ``` kubectl -n postgresql exec -it postgres-2-0 -- bash psql --username=postgresadmin postgresdb # confirm we cannot create a table as its a stand-by server CREATE TABLE customers (firstname text, customer_id serial, date_created timestamp); # quit out of postgresql \q # run pg_ctl as postgres user (cannot be run as root!) runuser -u postgres -- pg_ctl promote # confirm we can create a table as its a primary server CREATE TABLE customers (firstname text, customer_id serial, date_created timestamp); ``` ## Setup Replication on the new Primary Reconfigure the `postgres-2` instance as a primary: ``` env: - name: STANDBY_MODE value: "off" - name: PRIMARY_SERVER_ADDRESS value: "" ``` Deploy the changes: ``` kubectl -n postgresql apply -f storage/databases/postgresql/5-k8s-replication/yaml/postgres-2.yaml # check our new instance kubectl -n postgresql get pods kubectl -n postgresql logs postgres-2-0 -c init kubectl -n postgresql logs postgres-2-0 kubectl -n postgresql exec -it postgres-2-0 -- bash ``` If we wanted to deploy a new stand-by server, all we needed to do is create `postgres-3` with `STANDBY_MODE="on"` and set the `PRIMARY_SERVER_ADDRESS="postgres-2` and we would have a new stand-by server
``` kubectl -n postgresql apply -f storage/databases/postgresql/5-k8s-replication/yaml/postgres-3.yaml ``` That's it for chapter chapter five!