When to Use Logical vs Physical Replication for Read Scaling
Operational question: Your read traffic is outpacing the primary and you need to add replica capacity β but your DBA is asking whether to set wal_level = replica or wal_level = logical. The wrong choice locks you into an upgrade path you cannot easily change without downtime.
Physical replication copies WAL segments byte-for-byte, maintaining exact binary parity between primary and standby nodes. Logical replication decodes those same WAL records at the row-and-transaction level, enabling schema divergence, selective table sync, and routing to heterogeneous target engines. Both protocols can scale reads, but they expose different failure surfaces and impose different operational constraints.
Symptom Identification
Look for these signals when deciding which protocol is appropriate β or when diagnosing a misconfigured replica fleet:
- Read queries time out despite normal primary CPU and memory metrics, suggesting the bottleneck is read capacity, not compute.
pg_stat_replicationshowsstate = streamingbutwrite_lagorreplay_lagis continuously growing β the standby is falling behind under read-heavy WAL generation.- A logical subscriberβs health check is mis-wired:
SELECT pg_is_in_recovery()always returnsfalseon a subscriber, causing load balancers to treat it as a primary. pg_replication_slotsshowsactive = falseandrestart_lsnfrozen in the past β a disconnected slot is retaining WAL and disk is filling on the primary.- Schema migration on the primary breaks a logical subscriber:
ALTER TABLEthat is not replicated causes subscriber apply errors logged inpg_stat_subscription.last_error_msg.
Root Cause Analysis
Why physical replication is simple but inflexible
Physical standbys replay WAL records as raw binary blocks. Because they apply exactly the same byte sequence the primary wrote, the standby must run the same PostgreSQL major version and the same CPU architecture. There is no decoding overhead β the WAL receiver writes blocks directly to the data directory. This makes physical replication the lowest-latency, highest-throughput path for full-cluster read scaling.
The rigidity is also the constraint: you cannot run a physical standby on PostgreSQL 15 against a PostgreSQL 16 primary, and you cannot replicate only selected tables to reduce replica disk footprint.
Why logical replication is flexible but higher overhead
Logical replication runs a WAL decoder process (pgoutput plugin by default) on the primary. The decoder reads WAL and reconstructs row-level INSERT / UPDATE / DELETE events, which the subscriber applies via SQL. Because the wire format is row-level rather than binary blocks, the subscriber can be a different major version, run on a different OS, or be a non-PostgreSQL engine.
The overhead is real: decoding is CPU-bound on the primary, each UPDATE and DELETE on a table without a primary key writes the full old row into WAL (REPLICA IDENTITY FULL), and replication slots retain WAL until the subscriber confirms receipt β a stalled subscriber can exhaust primary disk.
Replication lag under logical replication is measured differently from physical: pg_stat_subscription.latest_end_time reports when the subscriber last confirmed a batch, not a byte offset.
Protocol Decision Matrix
Map your workload to the right protocol before provisioning infrastructure.
| Workload Characteristic | Recommended Protocol | Key Constraint |
|---|---|---|
| Full-cluster read scaling, identical PostgreSQL versions | Physical | wal_level = replica; same major version required |
| Partial table sync, reporting subset only | Logical | wal_level = logical; explicit publication table list |
| Cross-version upgrade with zero-downtime cutover | Logical | Test schema compatibility; no DDL replication by default |
| Cross-engine routing (Postgres β analytics target) | Logical | Target must implement logical replication protocol or use pglogical |
| Sub-second lag, high-throughput OLTP | Physical | Lower decoding overhead; pair with synchronous_commit = remote_write if RPO matters |
| Schema evolution on replicas (different indexes, partitioning) | Logical | Subscriber schema managed independently |
Architecture: How the Two Protocols Diverge
The diagram below shows where the WAL stream splits between physical and logical standbys, and how connection routing differs for each.
Key takeaway from the diagram: logical subscribers do not enter recovery mode β they run as independent primaries. Health checks must use pg_stat_subscription, not pg_is_in_recovery(). If you wire a logical subscriber into a load balancer using the physical standby health check, the router will classify it as a primary and may route writes to it.
Step-by-Step Configuration Runbook
Step 1 β Audit schema for primary keys (logical replication only)
Tables lacking an explicit primary key or unique constraint must be configured before replication starts. Find affected tables:
SELECT schemaname, tablename
FROM pg_tables
WHERE schemaname NOT IN ('pg_catalog', 'information_schema')
AND tablename NOT IN (
SELECT tc.table_name
FROM information_schema.table_constraints tc
WHERE tc.constraint_type IN ('PRIMARY KEY', 'UNIQUE')
);For each table without a primary key, set replica identity:
ALTER TABLE <table> REPLICA IDENTITY FULL;REPLICA IDENTITY FULL writes the complete old row to WAL on every UPDATE and DELETE. On tables with wide rows or high update rates, this meaningfully increases WAL volume β add indexes on frequently-filtered columns to offset the decode cost on the subscriber.
Inline verification:
SELECT relname, relreplident
FROM pg_class
WHERE relkind = 'r' AND relreplident != 'd';
-- 'f' = FULL, 'd' = DEFAULT (primary key), 'n' = NOTHING, 'i' = INDEXStep 2 β Configure postgresql.conf parameters
Both protocols share some parameters; logical requires a higher wal_level:
# postgresql.conf β primary node
wal_level = logical # 'replica' suffices for physical-only; 'logical' is a superset
max_wal_senders = 12 # num_physical_standbys + num_logical_subs + 2 headroom
max_replication_slots = 10 # one slot per standby/subscriber + 2 headroom
wal_keep_size = 1GB # retain WAL during brief subscriber disconnectionChanges to wal_level, max_wal_senders, and max_replication_slots require a PostgreSQL restart.
Inline verification after restart:
SHOW wal_level;
SHOW max_wal_senders;
SHOW max_replication_slots;Step 3 β Initialize the replication stream
Physical (streaming replication):
pg_basebackup \
-h primary_host \
-D /var/lib/postgresql/data \
--wal-method=stream \
-R \ # writes standby.signal + primary_conninfo automatically
-P \ # progress reporting
-vThe -R flag creates standby.signal and writes primary_conninfo to postgresql.auto.conf. Start the standby with pg_ctl start after the base backup completes.
Logical (publication / subscription):
-- On the primary: define which tables to publish
CREATE PUBLICATION read_scaling_pub
FOR TABLE users, orders, products
WITH (publish = 'insert, update, delete');
-- On the subscriber: connect and subscribe
CREATE SUBSCRIPTION read_scaling_sub
CONNECTION 'host=primary_host port=5432 dbname=app user=replicator password=secret'
PUBLICATION read_scaling_pub;The subscriber copies initial table data then streams ongoing changes. Track initial sync completion:
-- On the subscriber
SELECT subname, srrelid::regclass AS table_name, srsubstate
FROM pg_subscription_rel
JOIN pg_subscription ON srsubid = oid;
-- srsubstate 'r' = ready (initial sync complete), 'i' = initializing, 'd' = data-copyStep 4 β Configure connection routing
Deploy connection pool routing with protocol-appropriate health checks.
PgBouncer β physical standbys:
[databases]
app_primary = host=primary_host port=5432 dbname=app
app_read = host=standby-lb.internal port=5432 dbname=app
[pgbouncer]
pool_mode = transaction
server_check_query = SELECT pg_is_in_recovery()
server_check_delay = 10PgBouncer β logical subscribers:
Logical subscribers always return false for pg_is_in_recovery(), so use a lag-based check instead:
server_check_query = SELECT CASE WHEN extract(epoch from (now() - max(latest_end_time))) < 30 THEN 1 ELSE NULL END FROM pg_stat_subscriptionA NULL return causes PgBouncer to mark the backend unhealthy. Tune the < 30 threshold to match your replication lag SLA thresholds.
Inline verification:
-- Physical standby lag (bytes behind primary)
SELECT
pid,
client_addr,
state,
write_lag,
flush_lag,
replay_lag,
pg_wal_lsn_diff(sent_lsn, replay_lsn) AS lag_bytes
FROM pg_stat_replication;
-- Logical subscriber lag (seconds since last confirmed batch)
SELECT
subname,
extract(epoch from (now() - latest_end_time)) AS lag_seconds,
last_error_msg
FROM pg_stat_subscription;Step 5 β Validate end-to-end
Write a row on the primary and confirm it appears on the replica within your expected lag window:
-- On primary
INSERT INTO orders (id, status, created_at)
VALUES (gen_random_uuid(), 'test', now());
-- On replica (physical or logical): confirm row is present
SELECT id, status, created_at FROM orders ORDER BY created_at DESC LIMIT 1;Verification and Rollback
Confirming the replication is healthy
-- Physical standby: confirm lag_bytes is trending toward 0
SELECT pg_wal_lsn_diff(pg_last_wal_receive_lsn(), pg_last_wal_replay_lsn()) AS lag_bytes;
-- Logical subscriber: confirm no error messages and lag is within SLA
SELECT subname, latest_end_lsn, last_error_msg FROM pg_stat_subscription;
-- Both: confirm replication slots are active and not retaining excessive WAL
SELECT slot_name, active, pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn) AS retained_bytes
FROM pg_replication_slots;A slot with active = false and growing retained_bytes is a disk-exhaustion risk. Drop inactive slots that no longer have a live subscriber:
SELECT pg_drop_replication_slot('slot_name');
-- Warning: drops undelivered changes. Confirm the downstream subscriber is fully decommissioned first.Rollback: reverting a physical standby to standalone
- Stop the standby:
pg_ctl stop -D /var/lib/postgresql/data -m fast - Remove recovery configuration: delete
standby.signal; removeprimary_conninfofrompostgresql.auto.conf(PostgreSQL 12+). On PostgreSQL 11 and earlier, removerecovery.conf. - Rebuild from a fresh base backup if consistency is in question:
rm -rf /var/lib/postgresql/data/*
pg_basebackup -h primary_host -D /var/lib/postgresql/data \
--wal-method=stream -R -P -v- Confirm clean state after restart:
SELECT
pg_last_wal_replay_lsn() AS replay_lsn,
pg_last_wal_receive_lsn() AS receive_lsn,
pg_wal_lsn_diff(pg_last_wal_receive_lsn(), pg_last_wal_replay_lsn()) AS lag_bytes;
-- lag_bytes = 0 before re-enabling automated read routingRollback: removing a logical subscription
-- On the subscriber
DROP SUBSCRIPTION read_scaling_sub;
-- On the primary: confirm the slot was cleaned up
SELECT slot_name FROM pg_replication_slots WHERE slot_name = 'read_scaling_sub';
-- Should return no rows. If it does, drop it manually.
SELECT pg_drop_replication_slot('read_scaling_sub');Edge Cases and Gotchas
Cascading physical replication and wal_level
If you run a cascading standby (a physical standby that itself has physical standbys), wal_level on the intermediate node must also be set to replica (or logical). A cascading node set to minimal cannot forward WAL to downstream standbys. The intermediate node must have its own max_wal_senders headroom for the downstream standbys it serves.
Schema migrations on logical replication
Logical replication does not replicate DDL. When you run ALTER TABLE ... ADD COLUMN on the primary, you must run the same DDL on the subscriber manually before the corresponding DML arrives β otherwise the subscriber apply worker exits with a column mismatch error. A safe sequence is:
- Run the DDL on the subscriber.
- Run the DDL on the primary.
- Confirm
pg_stat_subscription.last_error_msgis null.
For NOT NULL columns, add the column as nullable first, backfill, then apply the constraint.
Multi-region latency amplifying logical decoding overhead
In a multi-region read replica topology, the round-trip from primary to distant subscriber increases the time between a change being decoded and the subscriber confirming receipt. This lengthens the window during which the primaryβs replication slot retains WAL. Set wal_keep_size conservatively high (2 GB or more) in multi-region logical setups, and monitor retained_bytes per slot continuously. A slot in a distant region that loses connectivity for more than a few minutes can trigger disk pressure on the primary.
Operational Checklist
FAQ
Can I use logical replication without primary keys?
Yes, but you must run ALTER TABLE <table> REPLICA IDENTITY FULL. Without it, UPDATE and DELETE statements cannot be replicated correctly β the subscriber cannot identify which row to modify. REPLICA IDENTITY FULL writes the entire old row into WAL on every update, which significantly increases WAL volume on busy tables. Prefer adding a proper primary key where the application schema allows it.
Does pg_is_in_recovery() work on logical replication subscribers?
No. A logical subscriber is an independent primary instance; pg_is_in_recovery() always returns false. Using it as the health check for a logical subscriber will cause load balancers to classify the subscriber as a writable primary. Use a custom lag query against pg_stat_subscription.latest_end_time for subscriber health checks.
Can physical and logical standbys coexist on the same primary?
Yes. Set wal_level = logical (a superset of replica) and provision both pg_basebackup standbys and logical subscriptions simultaneously. Each consumes its own replication slot and WAL sender. The combined slot count must stay within max_replication_slots.
Related
β Back to Understanding Synchronous vs Asynchronous Replication
- How to Calculate Replication Lag Thresholds for SLA Compliance β pair lag measurement queries with the routing thresholds configured here
- Designing Multi-Region Read Replica Topologies β how physical and logical replication interact with cross-AZ latency and failover topology
- Connection Pool Architecture for Read Replicas β size PgBouncer pools correctly for replica fleets mixing physical and logical targets
- Detecting and Handling Replication Lag in Real Time β production monitoring patterns for both protocol types