When to Use Logical vs Physical Replication for Read Scaling

Operational question: Your read traffic is outpacing the primary and you need to add replica capacity β€” but your DBA is asking whether to set wal_level = replica or wal_level = logical. The wrong choice locks you into an upgrade path you cannot easily change without downtime.

Physical replication copies WAL segments byte-for-byte, maintaining exact binary parity between primary and standby nodes. Logical replication decodes those same WAL records at the row-and-transaction level, enabling schema divergence, selective table sync, and routing to heterogeneous target engines. Both protocols can scale reads, but they expose different failure surfaces and impose different operational constraints.


Symptom Identification

Look for these signals when deciding which protocol is appropriate β€” or when diagnosing a misconfigured replica fleet:

  • Read queries time out despite normal primary CPU and memory metrics, suggesting the bottleneck is read capacity, not compute.
  • pg_stat_replication shows state = streaming but write_lag or replay_lag is continuously growing β€” the standby is falling behind under read-heavy WAL generation.
  • A logical subscriber’s health check is mis-wired: SELECT pg_is_in_recovery() always returns false on a subscriber, causing load balancers to treat it as a primary.
  • pg_replication_slots shows active = false and restart_lsn frozen in the past β€” a disconnected slot is retaining WAL and disk is filling on the primary.
  • Schema migration on the primary breaks a logical subscriber: ALTER TABLE that is not replicated causes subscriber apply errors logged in pg_stat_subscription.last_error_msg.

Root Cause Analysis

Why physical replication is simple but inflexible

Physical standbys replay WAL records as raw binary blocks. Because they apply exactly the same byte sequence the primary wrote, the standby must run the same PostgreSQL major version and the same CPU architecture. There is no decoding overhead β€” the WAL receiver writes blocks directly to the data directory. This makes physical replication the lowest-latency, highest-throughput path for full-cluster read scaling.

The rigidity is also the constraint: you cannot run a physical standby on PostgreSQL 15 against a PostgreSQL 16 primary, and you cannot replicate only selected tables to reduce replica disk footprint.

Why logical replication is flexible but higher overhead

Logical replication runs a WAL decoder process (pgoutput plugin by default) on the primary. The decoder reads WAL and reconstructs row-level INSERT / UPDATE / DELETE events, which the subscriber applies via SQL. Because the wire format is row-level rather than binary blocks, the subscriber can be a different major version, run on a different OS, or be a non-PostgreSQL engine.

The overhead is real: decoding is CPU-bound on the primary, each UPDATE and DELETE on a table without a primary key writes the full old row into WAL (REPLICA IDENTITY FULL), and replication slots retain WAL until the subscriber confirms receipt β€” a stalled subscriber can exhaust primary disk.

Replication lag under logical replication is measured differently from physical: pg_stat_subscription.latest_end_time reports when the subscriber last confirmed a batch, not a byte offset.


Protocol Decision Matrix

Map your workload to the right protocol before provisioning infrastructure.

Workload Characteristic Recommended Protocol Key Constraint
Full-cluster read scaling, identical PostgreSQL versions Physical wal_level = replica; same major version required
Partial table sync, reporting subset only Logical wal_level = logical; explicit publication table list
Cross-version upgrade with zero-downtime cutover Logical Test schema compatibility; no DDL replication by default
Cross-engine routing (Postgres β†’ analytics target) Logical Target must implement logical replication protocol or use pglogical
Sub-second lag, high-throughput OLTP Physical Lower decoding overhead; pair with synchronous_commit = remote_write if RPO matters
Schema evolution on replicas (different indexes, partitioning) Logical Subscriber schema managed independently

Architecture: How the Two Protocols Diverge

The diagram below shows where the WAL stream splits between physical and logical standbys, and how connection routing differs for each.

Logical vs Physical Replication Data Flow Diagram showing WAL flowing from primary PostgreSQL through two paths: physical WAL streaming to physical standbys, and WAL logical decoding to logical subscribers, with a connection router distributing reads to both pools. Primary PostgreSQL WAL writer WAL stream WAL level: logical binary blocks row-level events Physical Standby pg_is_in_recovery() = true Logical Subscriber pg_is_in_recovery() = false Connection Router PgBouncer / HAProxy / ProxySQL replay_lag check pg_stat_subscription check

Key takeaway from the diagram: logical subscribers do not enter recovery mode β€” they run as independent primaries. Health checks must use pg_stat_subscription, not pg_is_in_recovery(). If you wire a logical subscriber into a load balancer using the physical standby health check, the router will classify it as a primary and may route writes to it.


Step-by-Step Configuration Runbook

Step 1 β€” Audit schema for primary keys (logical replication only)

Tables lacking an explicit primary key or unique constraint must be configured before replication starts. Find affected tables:

sql
SELECT schemaname, tablename
FROM pg_tables
WHERE schemaname NOT IN ('pg_catalog', 'information_schema')
  AND tablename NOT IN (
    SELECT tc.table_name
    FROM information_schema.table_constraints tc
    WHERE tc.constraint_type IN ('PRIMARY KEY', 'UNIQUE')
  );

For each table without a primary key, set replica identity:

sql
ALTER TABLE <table> REPLICA IDENTITY FULL;

REPLICA IDENTITY FULL writes the complete old row to WAL on every UPDATE and DELETE. On tables with wide rows or high update rates, this meaningfully increases WAL volume β€” add indexes on frequently-filtered columns to offset the decode cost on the subscriber.

Inline verification:

sql
SELECT relname, relreplident
FROM pg_class
WHERE relkind = 'r' AND relreplident != 'd';
-- 'f' = FULL, 'd' = DEFAULT (primary key), 'n' = NOTHING, 'i' = INDEX

Step 2 β€” Configure postgresql.conf parameters

Both protocols share some parameters; logical requires a higher wal_level:

ini
# postgresql.conf β€” primary node
wal_level = logical          # 'replica' suffices for physical-only; 'logical' is a superset
max_wal_senders = 12         # num_physical_standbys + num_logical_subs + 2 headroom
max_replication_slots = 10   # one slot per standby/subscriber + 2 headroom
wal_keep_size = 1GB          # retain WAL during brief subscriber disconnection

Changes to wal_level, max_wal_senders, and max_replication_slots require a PostgreSQL restart.

Inline verification after restart:

sql
SHOW wal_level;
SHOW max_wal_senders;
SHOW max_replication_slots;

Step 3 β€” Initialize the replication stream

Physical (streaming replication):

bash
pg_basebackup \
  -h primary_host \
  -D /var/lib/postgresql/data \
  --wal-method=stream \
  -R \            # writes standby.signal + primary_conninfo automatically
  -P \            # progress reporting
  -v

The -R flag creates standby.signal and writes primary_conninfo to postgresql.auto.conf. Start the standby with pg_ctl start after the base backup completes.

Logical (publication / subscription):

sql
-- On the primary: define which tables to publish
CREATE PUBLICATION read_scaling_pub
  FOR TABLE users, orders, products
  WITH (publish = 'insert, update, delete');

-- On the subscriber: connect and subscribe
CREATE SUBSCRIPTION read_scaling_sub
  CONNECTION 'host=primary_host port=5432 dbname=app user=replicator password=secret'
  PUBLICATION read_scaling_pub;

The subscriber copies initial table data then streams ongoing changes. Track initial sync completion:

sql
-- On the subscriber
SELECT subname, srrelid::regclass AS table_name, srsubstate
FROM pg_subscription_rel
JOIN pg_subscription ON srsubid = oid;
-- srsubstate 'r' = ready (initial sync complete), 'i' = initializing, 'd' = data-copy

Step 4 β€” Configure connection routing

Deploy connection pool routing with protocol-appropriate health checks.

PgBouncer β€” physical standbys:

ini
[databases]
app_primary = host=primary_host port=5432 dbname=app
app_read    = host=standby-lb.internal port=5432 dbname=app

[pgbouncer]
pool_mode = transaction
server_check_query = SELECT pg_is_in_recovery()
server_check_delay = 10

PgBouncer β€” logical subscribers:

Logical subscribers always return false for pg_is_in_recovery(), so use a lag-based check instead:

ini
server_check_query = SELECT CASE WHEN extract(epoch from (now() - max(latest_end_time))) < 30 THEN 1 ELSE NULL END FROM pg_stat_subscription

A NULL return causes PgBouncer to mark the backend unhealthy. Tune the < 30 threshold to match your replication lag SLA thresholds.

Inline verification:

sql
-- Physical standby lag (bytes behind primary)
SELECT
  pid,
  client_addr,
  state,
  write_lag,
  flush_lag,
  replay_lag,
  pg_wal_lsn_diff(sent_lsn, replay_lsn) AS lag_bytes
FROM pg_stat_replication;

-- Logical subscriber lag (seconds since last confirmed batch)
SELECT
  subname,
  extract(epoch from (now() - latest_end_time)) AS lag_seconds,
  last_error_msg
FROM pg_stat_subscription;

Step 5 β€” Validate end-to-end

Write a row on the primary and confirm it appears on the replica within your expected lag window:

sql
-- On primary
INSERT INTO orders (id, status, created_at)
VALUES (gen_random_uuid(), 'test', now());

-- On replica (physical or logical): confirm row is present
SELECT id, status, created_at FROM orders ORDER BY created_at DESC LIMIT 1;

Verification and Rollback

Confirming the replication is healthy

sql
-- Physical standby: confirm lag_bytes is trending toward 0
SELECT pg_wal_lsn_diff(pg_last_wal_receive_lsn(), pg_last_wal_replay_lsn()) AS lag_bytes;

-- Logical subscriber: confirm no error messages and lag is within SLA
SELECT subname, latest_end_lsn, last_error_msg FROM pg_stat_subscription;

-- Both: confirm replication slots are active and not retaining excessive WAL
SELECT slot_name, active, pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn) AS retained_bytes
FROM pg_replication_slots;

A slot with active = false and growing retained_bytes is a disk-exhaustion risk. Drop inactive slots that no longer have a live subscriber:

sql
SELECT pg_drop_replication_slot('slot_name');
-- Warning: drops undelivered changes. Confirm the downstream subscriber is fully decommissioned first.

Rollback: reverting a physical standby to standalone

  1. Stop the standby: pg_ctl stop -D /var/lib/postgresql/data -m fast
  2. Remove recovery configuration: delete standby.signal; remove primary_conninfo from postgresql.auto.conf (PostgreSQL 12+). On PostgreSQL 11 and earlier, remove recovery.conf.
  3. Rebuild from a fresh base backup if consistency is in question:
bash
rm -rf /var/lib/postgresql/data/*
pg_basebackup -h primary_host -D /var/lib/postgresql/data \
  --wal-method=stream -R -P -v
  1. Confirm clean state after restart:
sql
SELECT
  pg_last_wal_replay_lsn() AS replay_lsn,
  pg_last_wal_receive_lsn() AS receive_lsn,
  pg_wal_lsn_diff(pg_last_wal_receive_lsn(), pg_last_wal_replay_lsn()) AS lag_bytes;
-- lag_bytes = 0 before re-enabling automated read routing

Rollback: removing a logical subscription

sql
-- On the subscriber
DROP SUBSCRIPTION read_scaling_sub;

-- On the primary: confirm the slot was cleaned up
SELECT slot_name FROM pg_replication_slots WHERE slot_name = 'read_scaling_sub';
-- Should return no rows. If it does, drop it manually.
SELECT pg_drop_replication_slot('read_scaling_sub');

Edge Cases and Gotchas

Cascading physical replication and wal_level

If you run a cascading standby (a physical standby that itself has physical standbys), wal_level on the intermediate node must also be set to replica (or logical). A cascading node set to minimal cannot forward WAL to downstream standbys. The intermediate node must have its own max_wal_senders headroom for the downstream standbys it serves.

Schema migrations on logical replication

Logical replication does not replicate DDL. When you run ALTER TABLE ... ADD COLUMN on the primary, you must run the same DDL on the subscriber manually before the corresponding DML arrives β€” otherwise the subscriber apply worker exits with a column mismatch error. A safe sequence is:

  1. Run the DDL on the subscriber.
  2. Run the DDL on the primary.
  3. Confirm pg_stat_subscription.last_error_msg is null.

For NOT NULL columns, add the column as nullable first, backfill, then apply the constraint.

Multi-region latency amplifying logical decoding overhead

In a multi-region read replica topology, the round-trip from primary to distant subscriber increases the time between a change being decoded and the subscriber confirming receipt. This lengthens the window during which the primary’s replication slot retains WAL. Set wal_keep_size conservatively high (2 GB or more) in multi-region logical setups, and monitor retained_bytes per slot continuously. A slot in a distant region that loses connectivity for more than a few minutes can trigger disk pressure on the primary.


Operational Checklist


FAQ

Can I use logical replication without primary keys?

Yes, but you must run ALTER TABLE <table> REPLICA IDENTITY FULL. Without it, UPDATE and DELETE statements cannot be replicated correctly β€” the subscriber cannot identify which row to modify. REPLICA IDENTITY FULL writes the entire old row into WAL on every update, which significantly increases WAL volume on busy tables. Prefer adding a proper primary key where the application schema allows it.

Does pg_is_in_recovery() work on logical replication subscribers?

No. A logical subscriber is an independent primary instance; pg_is_in_recovery() always returns false. Using it as the health check for a logical subscriber will cause load balancers to classify the subscriber as a writable primary. Use a custom lag query against pg_stat_subscription.latest_end_time for subscriber health checks.

Can physical and logical standbys coexist on the same primary?

Yes. Set wal_level = logical (a superset of replica) and provision both pg_basebackup standbys and logical subscriptions simultaneously. Each consumes its own replication slot and WAL sender. The combined slot count must stay within max_replication_slots.


← Back to Understanding Synchronous vs Asynchronous Replication