2023-03-09 13:03:40 +11:00

How to configure PostgreSQL

This is part 2 of our PostgreSQL series.
In this chapter, we learn the fundamentals of PostgreSQL configuration.

Many people make the mistake of relying directly on Kubernetes PostgreSQL controllers and Helm charts without any understanding of databases.

Let's start where we left off, and review our simple PostgreSQL database:

Run a simple PostgreSQL database (docker)

cd storage/databases/postgresql/2-configuration
docker run -it --rm --name postgres `
  -e POSTGRES_PASSWORD=admin123 `
  -v ${PWD}/pgdata:/var/lib/postgresql/data `
  -p 5000:5432 `
  postgres:15.0

Environment Variables

Many settings can be specified using environment variables.
I generally recommend not relying on default values and instead setting as many settings explicitly as possible.

I personally prefer to keep most or all settings in a configuration file, so it can be committed to source control.
Environment variables complement this well, because we can inject secrets through them and keep passwords out of our configuration files and out of source control.

This will be important in Kubernetes later on.

We will not learn all or even most of the configurations in this chapter, as PostgreSQL has a lot of depth. So we will only learn what we need, one step at a time.

Let's start with a few basic settings:

Environment Variable   Meaning
POSTGRES_USER          Username for the Postgres Admin
POSTGRES_PASSWORD      Password for the Postgres Admin
POSTGRES_DB            Default database for your Postgres Server
PGDATA                 Path where data is stored
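
To make the idea of injecting secrets more concrete, here is a minimal sketch for a POSIX shell (so it uses `\` for line continuation rather than the PowerShell backtick used elsewhere in this guide). The function name run_postgres_with_secret and the password file it reads are hypothetical. The part that matters is that passing -e POSTGRES_PASSWORD without a value makes Docker copy the variable from the calling environment, so the password itself never appears in the command.

```shell
# Hypothetical helper: start PostgreSQL with the admin password read from a file
# that is kept out of source control.
# $1: path to a file containing only the password (an assumption for this sketch)
run_postgres_with_secret() {
  # The variable is set only for this docker invocation.
  # "-e POSTGRES_PASSWORD" (no value) forwards it into the container.
  POSTGRES_PASSWORD="$(cat "$1")" docker run -it --rm --name postgres \
    -e POSTGRES_USER=postgresadmin \
    -e POSTGRES_PASSWORD \
    -e POSTGRES_DB=postgresdb \
    -p 5000:5432 \
    postgres:15.0
}
```

It could then be called as run_postgres_with_secret ./secrets/postgres_password, where the path is whatever location you keep the secret in.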

Configuration files

If we take a look at our docker mount that we defined in our docker run command:

-v ${PWD}/pgdata:/var/lib/postgresql/data

The ${PWD}/pgdata folder that we have mounted contains not only data, but also some default configuration files that we can explore.

Three files are important here:

Configuration file   Meaning                          Documentation
pg_hba.conf          Host Based Authentication file   Official Documentation
pg_ident.conf        User Mappings file               Official Documentation
postgresql.conf      PostgreSQL main configuration
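
As a starting point for our own copies, the three files can be pulled out of the data folder into a separate config folder. The sketch below is for a POSIX shell; the function name copy_default_configs is hypothetical, and the config folder name simply matches the mount used later in this chapter. On Linux the data folder is owned by the container's postgres user, so reading it may need sudo.

```shell
# Hypothetical helper: copy the generated default config files out of the
# data directory so they can be edited and committed to source control.
# $1: data directory (default: pgdata), $2: target directory (default: config)
copy_default_configs() {
  src="${1:-pgdata}"
  dst="${2:-config}"
  mkdir -p "$dst"
  for f in postgresql.conf pg_hba.conf pg_ident.conf; do
    if [ -f "$src/$f" ]; then
      cp "$src/$f" "$dst/$f"
    else
      # The files only exist after the container has initialized the data directory
      echo "$src/$f not found; run the container once so it is generated" >&2
    fi
  done
}

copy_default_configs pgdata config
```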

The pg_hba.conf File

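We'll start this guide with the host-based authentication file.
As we can see, this file is created automatically in the data directory.
We should create a copy of this file and configure it ourselves.

It controls who can access our PostgreSQL server.
PostgreSQL checks the records in this file from top to bottom and uses the first one that matches the incoming connection, so the order of the lines matters.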
Let's refer to the official documentation as well as walk through the config.
The config file itself has a great description of the contents.

As mentioned in the previous chapter, it's always good not to rely on default configurations. So let's create our own pg_hba.conf file.

We can grab the content from the default configuration and edit it as we go.

# TYPE  DATABASE        USER            ADDRESS                 METHOD

# "local" is for Unix domain socket connections only
local   all             all                                     trust
# IPv4 local connections:
host    all             all             127.0.0.1/32            trust
# IPv6 local connections:
host    all             all             ::1/128                 trust
# Allow replication connections from localhost, by a user with the
# replication privilege.
local   replication     all                                     trust
host    replication     all             127.0.0.1/32            trust
host    replication     all             ::1/128                 trust

host    all             all             all                     scram-sha-256
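
To check which rules the running server has actually parsed from this file, we can query the pg_hba_file_rules system view. The sketch below is for a POSIX shell and assumes a container named postgres, as in our docker run command; the function name show_hba_rules is hypothetical. The user and database default to postgres, which is what the image uses when POSTGRES_USER and POSTGRES_DB are not set.

```shell
# Hypothetical helper: list the host-based authentication rules as the server sees them.
# $1: database user (default: postgres), $2: database name (default: postgres)
show_hba_rules() {
  docker exec postgres psql -U "${1:-postgres}" -d "${2:-postgres}" -c \
    "SELECT line_number, type, database, user_name, address, auth_method FROM pg_hba_file_rules;"
}
```

With the custom admin user used later in this chapter, the call would be show_hba_rules postgresadmin postgresdb.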

The pg_ident.conf File

This config file is a mapping file between system users and database users.
Let's refer to the official documentation and walk through the config.
This is not a feature that we will need in this series, so we will skip this config for the time being.

The postgresql.conf File

This configuration file is the main one for PostgreSQL.
As you can see, this is a large file with in-depth tuning and customization capabilities.

File Locations

Let's set our data directory location as well as the config file locations.
Our volume mount path in the container is also short and simple.
Note that we also split config from data, so we have separate paths:

data_directory = '/data'
hba_file = '/config/pg_hba.conf'
ident_file = '/config/pg_ident.conf'

Connection and Authentication

The shared_buffers parameter determines how much memory is dedicated to the server for caching data. A common guideline is to set it to between 15% and 25% of the machine's total RAM. For example, if your machine has 32 GB of RAM, the upper end of that range (25%) gives a shared_buffers value of 8 GB. The configuration below keeps the default of 128MB.

We will take a look at WAL (Write Ahead Log), Archiving, Primary, and Standby configurations in a future chapter on replication.
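
The arithmetic behind that guideline can be sketched in a few lines of POSIX shell. The function name suggest_shared_buffers_mb is hypothetical, and the result is only a starting point based on the percentage rule above, not a tuned value.

```shell
# Hypothetical helper: suggest a shared_buffers value from total RAM.
# $1: total RAM in MB, $2: percentage of RAM to use (default: 25)
suggest_shared_buffers_mb() {
  total_mb="$1"
  percent="${2:-25}"
  echo $(( total_mb * percent / 100 ))
}

# A machine with 32 GB (32768 MB) of RAM, using 25%
echo "shared_buffers = $(suggest_shared_buffers_mb 32768)MB"

# On Linux, total RAM can be read from /proc/meminfo (reported in kB)
if [ -r /proc/meminfo ]; then
  total_kb="$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)"
  echo "shared_buffers = $(suggest_shared_buffers_mb $(( total_kb / 1024 )))MB"
fi
```

For the 32 GB case this prints shared_buffers = 8192MB, which is the 8 GB from the example above.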

port = 5432
listen_addresses = '*'
max_connections = 100
shared_buffers = 128MB
dynamic_shared_memory_type = posix
max_wal_size = 1GB
min_wal_size = 80MB
log_timezone = 'Etc/UTC'
datestyle = 'iso, mdy'
timezone = 'Etc/UTC'

# locale settings
lc_messages = 'en_US.utf8'    # locale for system error messages
lc_monetary = 'en_US.utf8'    # locale for monetary formatting
lc_numeric = 'en_US.utf8'     # locale for number formatting
lc_time = 'en_US.utf8'        # locale for time formatting

default_text_search_config = 'pg_catalog.english'

We can also include other configurations from other locations with the include_dir and include options.
We will skip these for the sake of keeping things simple.
Nested configurations can overcomplicate a setup and make it hard to troubleshoot when issues occur.

Specifying Custom Configuration

If we run on Linux, we need to ensure that the postgres user, which has a user ID of 999 by default, has access to the configuration files.

sudo chown 999:999 config/postgresql.conf
sudo chown 999:999 config/pg_hba.conf
sudo chown 999:999 config/pg_ident.conf

There is another important gotcha here.
The PGDATA variable tells PostgreSQL where our data directory is.
Similarly, we've learnt that our configuration file also has a data_directory setting, which tells PostgreSQL the same thing.

However, the latter is only read by PostgreSQL after initialization has occurred.
PostgreSQL's initialization phase sets up directory permissions on the data directory.
If we leave out PGDATA, then we will get errors that the data directory is invalid.
Hence PGDATA is important here.

Running our PostgreSQL

Finally, we can run our database with our custom configuration files:

docker run -it --rm --name postgres `
  -e POSTGRES_USER=postgresadmin `
  -e POSTGRES_PASSWORD=admin123 `
  -e POSTGRES_DB=postgresdb `
  -e PGDATA="/data" `
  -v ${PWD}/pgdata:/data `
  -v ${PWD}/config:/config `
  -p 5000:5432 `
  postgres:15.0 -c 'config_file=/config/postgresql.conf'
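
Once the container is up, it is worth confirming that PostgreSQL really picked up our files and data directory. The sketch below is for a POSIX shell and assumes the container name, user and database from the command above; the function name verify_postgres_config is hypothetical. It asks the server for the paths it is using through the SHOW command.

```shell
# Hypothetical helper: print the config file, HBA file and data directory
# that the running server reports.
verify_postgres_config() {
  docker exec postgres psql -U postgresadmin -d postgresdb \
    -c "SHOW config_file;" \
    -c "SHOW hba_file;" \
    -c "SHOW data_directory;"
}
```

If everything is wired up as configured in this chapter, we would expect the paths under /config and /data to come back.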

That's it for chapter two!
In chapter 3, we will take a look at Replication and how to replicate our data to another PostgreSQL instance for better availability.