Pitfalls and quirks of logical replication in Postgres 12

Logical replication, in a nutshell, is the process of “replaying” data changes in another database. The first attempts to implement replication in Postgres (Slony, Bucardo) were logical replication solutions. Compared with binary (streaming) replication, it offers greater flexibility: data can be replicated selectively, covering only the relevant databases and tables; the replica server remains a fully functional instance that can have its own users, security rules and non-replicated databases; and in some cases performance is better.

However, logical replication is not as foolproof as binary replication, and without proper care it can lead to a primary server crash. I’d like to share some thoughts after setting up logical replication in a large-ish (one terabyte) production database.

Do not proceed without monitoring

Seriously, if you don’t have a robust monitoring solution that will warn you about system abnormalities, especially running out of disk space, set it up and test it before any attempt to implement logical replication. Logical replication can break silently, without any SQL exceptions; the WAL files then pile up, fill the entire disk and bring the primary server down. At least two things have to be monitored:

disk usage,
errors in Postgres log.

Only with such an early warning system will you be able to fix issues before they cause an outage.
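As a complement to plain disk-usage alerts, a monitoring check can ask the primary how much WAL each replication slot is holding back. The query below is a minimal sketch that uses only the standard pg_replication_slots view and built-in functions; the alert threshold is left to you, and it is an assumption that your monitoring tool can run SQL against the primary.

```sql
-- Run on the primary. Shows, per slot, how much WAL cannot be recycled yet.
-- An inactive slot with a steadily growing retained_wal is the warning sign.
SELECT slot_name,
       slot_type,
       active,
       pg_size_pretty(
           pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)
       ) AS retained_wal
FROM pg_replication_slots
ORDER BY pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn) DESC NULLS LAST;
```

A slot that shows active = false for longer than a few minutes usually means the subscriber has stopped consuming changes, which is the moment to check the Postgres log on both sides.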

Replicate to the same server? Possible, but…

Logical replication can be used to replicate data between databases in a single Postgres cluster. It’s a perfectly valid setup, but it requires special treatment: you have to create the logical replication slot first and, with the slot already in place, create a subscription pointing to that slot. If you try to set up replication the default way, with automatic slot creation, the CREATE SUBSCRIPTION command will hang. No errors or warnings, just a hung statement.
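The two-step setup for a single cluster could look roughly like this. The names source_db, my_pub, same_cluster_sub and same_cluster_slot are made up for illustration, and the sketch assumes the publication already exists in the source database.

```sql
-- Step 1: connected to the SOURCE database (hypothetical name: source_db),
-- create the slot manually with the built-in pgoutput plugin.
SELECT pg_create_logical_replication_slot('same_cluster_slot', 'pgoutput');

-- Step 2: connected to the TARGET database in the same cluster,
-- create the subscription and tell it to reuse the existing slot.
CREATE SUBSCRIPTION same_cluster_sub
    CONNECTION 'dbname=source_db'
    PUBLICATION my_pub
    WITH (create_slot = false, slot_name = 'same_cluster_slot');
```

With create_slot = false the command no longer tries to create the slot itself, which is what avoids the hang.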

Be careful with PUBLICATION FOR ALL TABLES

Like all previous logical replication solutions, native replication doesn’t replicate data structure changes. If any DDL command is issued (CREATE, ALTER, DROP), it must be executed manually on both the primary and the replica(s). The FOR ALL TABLES modifier of a CREATE PUBLICATION statement doesn’t change this limitation. Instead, it will cause the replication to stop as soon as you (or your application, or a Postgres extension) issue a CREATE TABLE statement. Also, FOR ALL TABLES will automatically include any tables created by extensions (like spatial_ref_sys from PostGIS) and tables that don’t have a primary key or replica identity. Both cases are problematic.
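One way to sidestep these surprises is to list the published tables explicitly. This is only a sketch: app_pub and the table names are hypothetical, and whether an explicit list is practical depends on how many tables you replicate.

```sql
-- On the primary: publish an explicit list instead of FOR ALL TABLES.
-- app_pub, public.customers and public.orders are illustrative names.
CREATE PUBLICATION app_pub FOR TABLE public.customers, public.orders;

-- Check whether any existing publication was created FOR ALL TABLES.
SELECT pubname
FROM pg_publication
WHERE puballtables;
```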

Spatial is special

The spatial_ref_sys table is part of the PostGIS extension, and it’s populated by the CREATE EXTENSION postgis; statement. More often than not, it shouldn’t be replicated, as every PostGIS instance populates it itself. If you have to replicate it (for example, you work with coordinate systems that aren’t part of the EPSG registry), remember to TRUNCATE the spatial_ref_sys table on the replica before creating the subscription.
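If spatial_ref_sys does have to be replicated, the order of operations on the replica might look like the following. The subscription name, publication name and connection string are placeholders, not values from a real setup.

```sql
-- On the replica, BEFORE subscribing: remove the rows that
-- CREATE EXTENSION postgis inserted, so the initial copy does not
-- collide with them on the primary key.
TRUNCATE spatial_ref_sys;

-- Then create the subscription (all names and the host are hypothetical).
CREATE SUBSCRIPTION gis_sub
    CONNECTION 'host=primary.example.com dbname=gis_db'
    PUBLICATION gis_pub;
```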

Review the primary keys carefully

A table eligible for logical replication must have a primary key constraint or replica identity; that’s the rule. It’s nothing new, as it was the same with previous logical replication solutions, but its enforcement in native Postgres replication is odd, to say the least. You are allowed to add a table without a PK or replica identity to a publication, and it won’t cause any error, but… it will block UPDATE and DELETE statements on that table! (INSERT keeps working.)
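Before adding tables to a publication (or right after creating one FOR ALL TABLES), it may be worth listing the published tables that have no usable replica identity. The query below is a sketch built on the standard catalogs; it does not cover the rarer case of REPLICA IDENTITY USING INDEX where the index has since been dropped.

```sql
-- On the primary: published tables on which UPDATE/DELETE would fail.
-- relreplident: 'd' = default (primary key), 'n' = nothing,
--               'f' = full, 'i' = index.
SELECT pt.pubname, pt.schemaname, pt.tablename
FROM pg_publication_tables AS pt
JOIN pg_class AS c
  ON c.oid = format('%I.%I', pt.schemaname, pt.tablename)::regclass
WHERE c.relreplident = 'n'
   OR (c.relreplident = 'd'
       AND NOT EXISTS (SELECT 1
                       FROM pg_index AS i
                       WHERE i.indrelid = c.oid
                         AND i.indisprimary));
```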

To unlock write access, you will need to add a PK as soon as possible. If you can’t afford an exclusive lock for the duration of the unique index build, REPLICA IDENTITY FULL will do just fine, although it is less performant.
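Both ways out can be sketched as follows. The table public.events and its id column are hypothetical, and the second option assumes id is already declared NOT NULL; if it is not, the ALTER TABLE has to scan the table while holding its lock.

```sql
-- Option 1: quick unblock. The whole old row is logged for every
-- UPDATE/DELETE, so WAL volume and apply cost go up.
ALTER TABLE public.events REPLICA IDENTITY FULL;

-- Option 2: build the unique index without blocking writes, then
-- promote it to a primary key. CREATE INDEX CONCURRENTLY cannot run
-- inside a transaction block.
CREATE UNIQUE INDEX CONCURRENTLY events_pkey_idx ON public.events (id);

-- This still takes an exclusive lock, but only briefly, because the
-- index already exists.
ALTER TABLE public.events
    ADD CONSTRAINT events_pkey PRIMARY KEY USING INDEX events_pkey_idx;

-- If Option 1 was used as a stopgap, switch back to the default identity.
ALTER TABLE public.events REPLICA IDENTITY DEFAULT;
```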

After adding a new table, refresh publication

It’s not enough to execute “ALTER PUBLICATION … ADD TABLE …” on the primary server in order to add a new table to replication. You also have to log in to the replica database and execute the following statement:

ALTER SUBSCRIPTION … REFRESH PUBLICATION;
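Put together, the whole sequence might look like this. The names app_pub, app_sub and public.invoices are made up for the example, and the sketch assumes the table has already been created on the replica with a matching structure.

```sql
-- On the primary: add the new table to the publication.
ALTER PUBLICATION app_pub ADD TABLE public.invoices;

-- On the replica: pick up the new table and start its initial copy.
ALTER SUBSCRIPTION app_sub REFRESH PUBLICATION;

-- On the replica: watch the per-table sync state.
-- srsubstate: 'i' = initialize, 'd' = data is being copied,
--             's' = synchronized, 'r' = ready (normal replication).
SELECT srrelid::regclass AS table_name, srsubstate
FROM pg_subscription_rel
ORDER BY 1;
```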

Summary

Logical replication is great and has many use cases: separating transactional and analytical workloads, aggregating changes from multiple databases, and so on. It is not as simple as it looks at first glance, though. Follow the rules and use it with caution, and you should enjoy fast, flexible data replication.