← Back to Database Replication Fundamentals & Architecture
Synchronous vs Asynchronous Database Replication
Problem Framing
Every production read replica deployment forces a binary choice at the commit path: does the primary wait for the replica to acknowledge receipt of the Write-Ahead Log (WAL) record before returning success to the client, or does it return immediately and let replication trail behind? The wrong answer costs either write throughput or data durability — and both failures show up during incidents, not benchmarks.
Synchronous replication eliminates Recovery Point Objective (RPO) at the cost of write latency that scales linearly with network round-trip time. Asynchronous replication maximises write throughput at the cost of potential data loss equal to the current replication lag at the moment of primary failure. Most production systems do not pick one globally — they use synchronous commit for the primary standby that anchors failover and asynchronous commit for read-serving replicas that handle analytical load.
Getting this architecture wrong exposes you to three concrete SLA risks:
- Silent data loss — async primary failover promotes a standby that has not applied the last N transactions.
- Write stall — a synchronous primary whose only synchronous standby becomes unreachable blocks all commits until the connection times out or a DBA intervenes.
- Stale read violations — routing queries to async replicas without lag-aware session pinning exposes users to read-your-writes anomalies.
Concept Definition and Scope
Synchronous replication means the primary’s commit protocol requires at least one replica to write (and optionally flush) the WAL record to disk before the primary acknowledges the transaction. PostgreSQL implements this via synchronous_commit and synchronous_standby_names. The replica does not need to have applied (replayed) the record — only written it to its WAL receive buffer — so the guarantee is durability at the replica, not visibility.
Asynchronous replication means the primary writes the WAL locally, acknowledges the client, and streams the record to standbys as a background operation. The replica may lag anywhere from milliseconds to minutes depending on write throughput, network bandwidth, and disk I/O on the standby.
Adjacent concepts to distinguish:
- Replication mode (sync vs async) is a commit-protocol setting, not a physical topology setting. You can have a physical streaming replica that receives data synchronously.
- Logical vs physical replication describes the WAL decoding layer, not the acknowledgment timing. Logical replication can be synchronous or asynchronous.
- Replica lag is the measured delay between primary WAL position and replica apply position — the observable symptom of async replication under load.
Commit Path: How Each Mode Works Internally
Synchronous commit flow
The diagram below shows the WAL acknowledgment path for a synchronous standby. The primary’s backend process blocks at the XLogFlush barrier until the standby’s WAL receiver confirms it has written (or flushed, depending on synchronous_commit level) the record.
The primary holds open the client connection and blocks new commit acknowledgments for subsequent transactions until the standby LSN advances past the current record. Under normal conditions this adds one network RTT plus the standby’s fsync latency to each write.
Asynchronous commit flow
In async mode the primary writes WAL locally, increments its commit LSN, and immediately returns success to the client. The WAL sender process streams records to standbys in the background. The standby may be anywhere from zero to many seconds behind the primary LSN — the gap is the current replication lag.
The critical implication: if the primary crashes between the client’s SUCCESS and the standby applying the record, those transactions are lost on promotion.
Trade-Off Comparison Table
| Dimension | Synchronous | Asynchronous |
|---|---|---|
| RPO on primary failure | Zero (committed = durable on standby) | Non-zero: equals lag at crash time |
| Write latency added | +1 network RTT + standby fsync | Near-zero (local WAL write only) |
| Write throughput | Reduced under high concurrency | Full primary throughput |
| Stale read risk | None (replica is always current) | Present: depends on lag |
| Primary write stall risk | High if standby disconnects without quorum fallback | None |
| Operational complexity | Higher: quorum config, network dependency | Lower: simpler config surface |
| Best-fit workload | Financial transactions, inventory writes, auth tokens | Analytics, reporting, cache warm-up, bulk reads |
| Cross-region deployment | Costly: RTT of 30–150 ms adds directly to p99 write latency | Cost-free: lag absorbs RTT without blocking |
Configuration Runbook
PostgreSQL: synchronous commit modes
PostgreSQL exposes five synchronous_commit levels that trade durability scope for latency:
# postgresql.conf
# Level: on (default)
# Primary and standby both flush WAL before ack. Zero RPO.
synchronous_commit = on
# Level: remote_write
# Standby has written WAL to OS buffer (not flushed to disk).
# Lower latency than 'on'; standby OS crash can lose data.
# synchronous_commit = remote_write
# Level: remote_apply
# Standby has also replayed the WAL (data is visible on standby).
# Highest latency; required for synchronous hot-standby reads.
# synchronous_commit = remote_apply
# Level: local
# Only the primary flushes; standby is effectively async.
# synchronous_commit = local
# Level: off
# Neither primary nor standby flushes before ack. Maximum throughput.
# Risk: up to wal_writer_delay ms of data loss on primary crash.
# synchronous_commit = off
# Quorum standby selection — requires ANY or FIRST keyword:
# ANY 1: ack from any one of the listed standbys satisfies the quorum.
# FIRST 1: only the first (highest-priority) standby in the list.
synchronous_standby_names = 'ANY 1 (replica_east, replica_west)'
# Timeouts: when a standby goes silent, the sender marks it stale.
wal_receiver_timeout = 60s
wal_sender_timeout = 60sFailure-state annotation: If all standbys listed in synchronous_standby_names disconnect and ANY quorum cannot be met, the primary blocks all new commits. To prevent a complete write stall:
- Always configure
ANY Nwith N+1 standbys so one can fail without losing quorum. - Add at least one async standby outside
synchronous_standby_namesto absorb reads during a sync standby outage.
PostgreSQL: asynchronous replication baseline
# postgresql.conf — asynchronous read-replica baseline
synchronous_commit = local # Primary flushes; replicas are async
wal_level = replica # Minimum level needed for streaming replication
max_wal_senders = 10 # Headroom for replicas + monitoring connections
wal_keep_size = 1GB # Buffer for slow replicas; prevents slot WAL deletion
max_replication_slots = 10
# Tune wal_sender process buffering for high-throughput writes:
wal_sender_timeout = 60sProxySQL: lag-aware read routing (MySQL)
-- ProxySQL routing configuration
-- Hostgroup 10 = primary (writes + strict-consistency reads)
-- Hostgroup 20 = async read pool (analytics, reporting)
INSERT INTO mysql_servers
(hostgroup_id, hostname, port, max_connections, max_replication_lag, comment)
VALUES
(10, 'primary.db.internal', 3306, 500, 0, 'Primary — writes + sync reads'),
(20, 'async-read-01.db.internal',3306, 1000, 5, 'Async — analytics'),
(20, 'async-read-02.db.internal',3306, 1000, 5, 'Async — analytics');
-- Route SELECTs to the async pool; writes to primary
INSERT INTO mysql_query_rules
(rule_id, active, match_digest, destination_hostgroup, apply)
VALUES
(100, 1, '^SELECT.*', 20, 1),
(101, 1, '^(INSERT|UPDATE|DELETE).*', 10, 1);
LOAD MYSQL SERVERS TO RUNTIME; SAVE MYSQL SERVERS TO DISK;
LOAD MYSQL QUERY RULES TO RUNTIME; SAVE MYSQL QUERY RULES TO DISK;max_replication_lag (seconds): ProxySQL automatically removes endpoints from the read pool when lag exceeds this threshold. When the entire async pool is lagging, reads fall back to the primary — watch for primary CPU spikes as the signal.
PgBouncer: circuit-breaker via connection limits
PgBouncer has no native circuit-breaker, but connection caps at the pgbouncer layer protect the primary during replica catch-up storms:
; pgbouncer.ini
[databases]
primary_db = host=primary port=5432 dbname=app pool_mode=transaction
async_read = host=replica01 port=5432 dbname=app pool_mode=transaction
[pgbouncer]
max_client_conn = 2000
default_pool_size = 50
server_idle_timeout = 30 ; reclaim idle backend connections quickly
server_connect_timeout = 5 ; fail fast on replica network flaps
server_lifetime = 3600 ; recycle long-lived connections to prevent resource leaksImplement true circuit breaking at the application or sidecar layer (Envoy, Resilience4j, or the Python middleware below). PgBouncer enforces capacity ceilings; your application code enforces routing logic.
Application-layer read-your-writes routing (Python / SQLAlchemy)
import time
class ConsistencyRouter:
"""
Routes reads to the async replica unless a write has occurred
recently in this session (read-your-writes guarantee).
Falls back to primary if replica lag exceeds the threshold.
"""
def __init__(
self,
primary_engine,
async_engine,
lag_threshold_ms: int = 2000,
pin_window_ms: int = 5000,
):
self.primary = primary_engine
self.async_engine = async_engine
self.lag_threshold = lag_threshold_ms / 1000.0
self.pin_window = pin_window_ms / 1000.0
self._session_pins: dict[str, float] = {}
def get_engine(self, session_id: str, is_write: bool = False):
if is_write:
self._session_pins[session_id] = time.monotonic()
return self.primary
pin_time = self._session_pins.get(session_id, 0.0)
if time.monotonic() - pin_time < self.pin_window:
return self.primary # enforce read-your-writes
if self._replica_lag_seconds() < self.lag_threshold:
return self.async_engine
return self.primary # safety fallback when replica is lagging
def _replica_lag_seconds(self) -> float:
# Query pg_stat_replication.write_lag or a heartbeat table
raise NotImplementedErrorCritical parameters:
pin_window_ms— session pinning window after a write. Target 1.5–2x your 99th-percentile async replica apply latency.lag_threshold_ms— application-side consistency boundary; align with ProxySQL’smax_replication_lag.- Degraded-state: If heavy write load pins all sessions to the primary, connection pool exhaustion follows. Implement a circuit breaker that temporarily disables session pinning and serves mildly stale reads rather than letting the primary saturate.
Monitoring and Alerting Signals
The three metrics that matter most are WAL sender lag, apply delay, and the downstream proxy routing health:
# prometheus/rules/replication.yml
groups:
- name: replication_sla
rules:
# Primary: replica is falling behind the commit LSN
- alert: ReplicaWriteLagExceedsSLA
expr: pg_replication_lag_seconds > 3.0
for: 2m
labels:
severity: critical
annotations:
summary: "Replica {{ $labels.instance }} write lag > 3 s"
description: >
Automated traffic drain should begin.
Check WAL sender/receiver buffer sizes and disk IOPS on the standby.
# WAL apply throughput drop — sustained disk or CPU constraint on standby
- alert: WALApplyRateDegradation
expr: rate(pg_stat_replication_write_lag_bytes[5m]) < 10485760
for: 5m
labels:
severity: warning
annotations:
summary: "WAL apply rate on {{ $labels.instance }} < 10 MB/s"
description: "Check standby disk IOPS, fsync latency, and wal_receiver_status."
# Sync standby disconnect — primary may enter write-blocking state
- alert: SynchronousStandbyDisconnected
expr: pg_stat_replication_state{sync_state="sync"} == 0
for: 30s
labels:
severity: critical
annotations:
summary: "No synchronous standbys connected to {{ $labels.instance }}"
description: >
If synchronous_standby_names has no async fallback,
the primary will block all commits. Investigate network
and standby process health immediately.Key PostgreSQL diagnostic queries:
-- Current replication state and lag per standby
SELECT
application_name,
sync_state,
state,
pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), sent_lsn)) AS send_lag,
pg_size_pretty(pg_wal_lsn_diff(sent_lsn, write_lsn)) AS write_lag,
pg_size_pretty(pg_wal_lsn_diff(write_lsn, flush_lsn)) AS flush_lag,
pg_size_pretty(pg_wal_lsn_diff(flush_lsn, replay_lsn)) AS replay_lag
FROM pg_stat_replication
ORDER BY sync_state DESC;
-- WAL slot retention pressure (async logical replication)
SELECT
slot_name,
active,
pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), confirmed_flush_lsn)) AS retained_wal
FROM pg_replication_slots
ORDER BY retained_wal DESC;Set health_check_interval on your proxy to 1–2 seconds for high-traffic systems to prevent stale routing decisions. Configure drain_timeout long enough for in-flight queries to complete but short enough to avoid prolonged primary overload.
Failure Modes and Recovery Steps
1. Synchronous primary write stall (standby disconnect)
Root cause: All standbys in synchronous_standby_names become unreachable. The primary blocks new commit acknowledgments indefinitely.
Diagnosis:
-- Check if any sync standbys are connected
SELECT application_name, sync_state, state
FROM pg_stat_replication
WHERE sync_state = 'sync';
-- Empty result means the primary is or will soon be blockedRecovery:
-- Option A: Temporarily degrade to async (accept RPO risk)
ALTER SYSTEM SET synchronous_commit = 'local';
SELECT pg_reload_conf();
-- Option B: Remove the missing standby from the sync list
ALTER SYSTEM SET synchronous_standby_names = 'ANY 1 (replica_west)';
SELECT pg_reload_conf();
-- After the standby recovers, restore the original config:
ALTER SYSTEM SET synchronous_standby_names = 'ANY 1 (replica_east, replica_west)';
SELECT pg_reload_conf();2. Async replica lag breach (proxy routing fallback overloads primary)
Root cause: The async replica’s apply rate falls below WAL generation rate due to disk I/O saturation or a write spike. ProxySQL removes it from the read pool; all reads route to the primary.
Diagnosis:
# ProxySQL: check which servers are OFFLINE due to lag
mysql -h 127.0.0.1 -P 6032 -e \
"SELECT hostgroup_id, hostname, status, max_replication_lag, replication_lag
FROM runtime_mysql_servers;"Recovery:
- Increase standby disk IOPS (or reduce
synchronous_commiton the replica side if you are usingremote_apply). - Temporarily raise
max_replication_lagin ProxySQL to allow the replica to re-enter the pool at a higher lag tolerance while it catches up. - Once lag drops below the original threshold, reset
max_replication_lag.
3. Logical replication slot WAL accumulation (disk-full risk)
Root cause: A logical replication subscriber falls behind or disconnects. PostgreSQL retains all WAL since the slot’s confirmed_flush_lsn, eventually filling the data volume.
Prevention config:
# postgresql.conf (PostgreSQL 13+)
max_slot_wal_keep_size = 10GB # Invalidate slot rather than fill diskRecovery:
-- Identify the bloated slot
SELECT slot_name, active,
pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), confirmed_flush_lsn))
FROM pg_replication_slots ORDER BY 3 DESC;
-- If the subscriber is gone, drop the slot immediately
SELECT pg_drop_replication_slot('slot_name_here');
-- Restore the subscriber from a fresh base backup rather than
-- letting it catch up — catching up from a large WAL backlog
-- generates sustained I/O pressure on the primary.4. Read-your-writes violation (async replica serves stale data post-write)
Root cause: Session pinning window expired before the replica applied the write, or the pin was never implemented.
Diagnosis: Users report seeing old data immediately after saving. Correlate with pg_stat_replication.replay_lsn vs the transaction’s commit LSN.
Recovery:
- Extend
pin_window_msin the application router to cover the measured 99th-percentile apply latency. - For critical paths, force reads to the primary using a
/*+ read_primary */hint or an explicit connection-selection flag in your ORM. - Consider managing sticky sessions in distributed database reads for session-scoped primary affinity.
5. Logical replication transactional ordering conflict
Root cause: logical_decoding_work_mem is undersized for the write throughput, causing WAL decoding to spill to disk. The subscriber’s apply latency spikes during bulk writes.
Fix:
# postgresql.conf on the publisher
logical_decoding_work_mem = 256MB # Default 64 MB is insufficient for bulk loadsVerify the change under load: SHOW logical_decoding_work_mem; on the publisher, then watch pg_stat_replication.write_lag during the next bulk insert window.
Protocol Selection: Physical vs Logical Replication for Read Scaling
The replication mode (sync vs async) is orthogonal to the replication protocol (physical vs logical). Choosing the right protocol affects how you design multi-region read replica topologies and which consistency models for distributed reads you can guarantee.
Physical replication (streaming) copies the raw WAL byte-for-byte. The standby must be on the same PostgreSQL major version and identical platform. Lower CPU overhead; no decoding latency. Works with synchronous and asynchronous commit. Best for HA standbys and full-instance read replicas.
Logical replication decodes WAL into a row-change stream. Supports selective table replication, cross-version upgrades, and schema divergence between publisher and subscriber. Higher serialization CPU; requires careful slot retention management.
-- PostgreSQL logical replication setup
-- On the publisher (primary):
ALTER SYSTEM SET wal_level = 'logical';
SELECT pg_reload_conf(); -- requires restart if changing from 'replica'
-- Publish only the tables needed for read-scaling
CREATE PUBLICATION read_scaling_pub FOR TABLE users, orders, sessions;
-- On the subscriber (analytics replica):
CREATE SUBSCRIPTION read_scaling_sub
CONNECTION 'host=primary port=5432 dbname=app user=replicator password=secret'
PUBLICATION read_scaling_pub
WITH (copy_data = true, create_slot = true);See when to use logical vs physical replication for read scaling for a full decision matrix including cross-version migration paths and heterogeneous topology patterns.
Child Pages
- How to calculate replication lag thresholds for SLA compliance — WAL generation rate analysis, network RTT measurement, and the formula for setting business-safe lag budgets.
- When to use logical vs physical replication for read scaling — decision matrix for protocol selection including cross-version upgrade paths and slot retention trade-offs.
FAQ
Does synchronous replication guarantee zero data loss on primary failure?
Yes — when synchronous_commit = on and at least one synchronous standby is connected, every committed transaction has been flushed to the standby WAL before the primary returns success. If the primary crashes immediately after that acknowledgment, the standby holds all committed data.
What happens to writes when the synchronous standby becomes unreachable?
Without a quorum fallback in synchronous_standby_names, the primary blocks all new commits indefinitely. Configure ANY 1 (replica_a, replica_b) so the primary can satisfy the quorum from whichever standby remains reachable, or add at least one asynchronous standby to keep reads available during a sync standby outage.
How large should the session pin window be for read-your-writes consistency?
Set pin_window_ms to 1.5–2x the 99th-percentile replica apply latency measured under peak write throughput. Too short causes stale reads; too long routes all reads to the primary and starves replicas. Monitor with pg_stat_replication.write_lag and adjust the window weekly during growth phases.
When should you choose logical replication over physical replication for read scaling?
Choose logical replication when you need to replicate a subset of tables, support schema divergence between primary and replica, or run the replica on a different PostgreSQL major version. Use physical replication when you need the lowest possible CPU overhead and full-instance consistency — it has lower serialization latency and simpler slot management.
Related
- ← Database Replication Fundamentals & Architecture — parent section covering WAL internals, topology patterns, and read-scaling decisions.
- Designing Multi-Region Read Replica Topologies — network RTT constraints that determine whether synchronous cross-region replication is viable.
- Evaluating Consistency Models for Distributed Reads — session-consistency guarantees, monotonic reads, and how replication mode shapes your consistency level options.
- Detecting and Handling Replication Lag in Real Time — instrumentation patterns and alerting strategies for the lag produced by async replication.
- Implementing Read/Write Splitting at the Proxy Layer — ProxySQL and PgBouncer configurations that enforce the routing topology described on this page.