Back to Database Replication Fundamentals & Architecture

Synchronous vs Asynchronous Database Replication

Problem Framing

Every production read replica deployment forces a binary choice at the commit path: does the primary wait for the replica to acknowledge receipt of the Write-Ahead Log (WAL) record before returning success to the client, or does it return immediately and let replication trail behind? The wrong answer costs either write throughput or data durability — and both failures show up during incidents, not benchmarks.

Synchronous replication eliminates Recovery Point Objective (RPO) at the cost of write latency that scales linearly with network round-trip time. Asynchronous replication maximises write throughput at the cost of potential data loss equal to the current replication lag at the moment of primary failure. Most production systems do not pick one globally — they use synchronous commit for the primary standby that anchors failover and asynchronous commit for read-serving replicas that handle analytical load.

Getting this architecture wrong exposes you to three concrete SLA risks:

  1. Silent data loss — async primary failover promotes a standby that has not applied the last N transactions.
  2. Write stall — a synchronous primary whose only synchronous standby becomes unreachable blocks all commits until the connection times out or a DBA intervenes.
  3. Stale read violationsrouting queries to async replicas without lag-aware session pinning exposes users to read-your-writes anomalies.

Concept Definition and Scope

Synchronous replication means the primary’s commit protocol requires at least one replica to write (and optionally flush) the WAL record to disk before the primary acknowledges the transaction. PostgreSQL implements this via synchronous_commit and synchronous_standby_names. The replica does not need to have applied (replayed) the record — only written it to its WAL receive buffer — so the guarantee is durability at the replica, not visibility.

Asynchronous replication means the primary writes the WAL locally, acknowledges the client, and streams the record to standbys as a background operation. The replica may lag anywhere from milliseconds to minutes depending on write throughput, network bandwidth, and disk I/O on the standby.

Adjacent concepts to distinguish:

  • Replication mode (sync vs async) is a commit-protocol setting, not a physical topology setting. You can have a physical streaming replica that receives data synchronously.
  • Logical vs physical replication describes the WAL decoding layer, not the acknowledgment timing. Logical replication can be synchronous or asynchronous.
  • Replica lag is the measured delay between primary WAL position and replica apply position — the observable symptom of async replication under load.

Commit Path: How Each Mode Works Internally

Synchronous commit flow

The diagram below shows the WAL acknowledgment path for a synchronous standby. The primary’s backend process blocks at the XLogFlush barrier until the standby’s WAL receiver confirms it has written (or flushed, depending on synchronous_commit level) the record.

Synchronous replication commit flow Sequence diagram showing a client sending a COMMIT to the primary, the primary writing WAL and forwarding it to the standby, the standby writing WAL and sending an acknowledgment back, and the primary then returning success to the client. Client Primary Standby COMMIT Write WAL WAL stream Write WAL ACK (LSN) Unblock SUCCESS primary blocked here

The primary holds open the client connection and blocks new commit acknowledgments for subsequent transactions until the standby LSN advances past the current record. Under normal conditions this adds one network RTT plus the standby’s fsync latency to each write.

Asynchronous commit flow

In async mode the primary writes WAL locally, increments its commit LSN, and immediately returns success to the client. The WAL sender process streams records to standbys in the background. The standby may be anywhere from zero to many seconds behind the primary LSN — the gap is the current replication lag.

The critical implication: if the primary crashes between the client’s SUCCESS and the standby applying the record, those transactions are lost on promotion.


Trade-Off Comparison Table

Dimension Synchronous Asynchronous
RPO on primary failure Zero (committed = durable on standby) Non-zero: equals lag at crash time
Write latency added +1 network RTT + standby fsync Near-zero (local WAL write only)
Write throughput Reduced under high concurrency Full primary throughput
Stale read risk None (replica is always current) Present: depends on lag
Primary write stall risk High if standby disconnects without quorum fallback None
Operational complexity Higher: quorum config, network dependency Lower: simpler config surface
Best-fit workload Financial transactions, inventory writes, auth tokens Analytics, reporting, cache warm-up, bulk reads
Cross-region deployment Costly: RTT of 30–150 ms adds directly to p99 write latency Cost-free: lag absorbs RTT without blocking

Configuration Runbook

PostgreSQL: synchronous commit modes

PostgreSQL exposes five synchronous_commit levels that trade durability scope for latency:

ini
# postgresql.conf

# Level: on (default)
# Primary and standby both flush WAL before ack. Zero RPO.
synchronous_commit = on

# Level: remote_write
# Standby has written WAL to OS buffer (not flushed to disk).
# Lower latency than 'on'; standby OS crash can lose data.
# synchronous_commit = remote_write

# Level: remote_apply
# Standby has also replayed the WAL (data is visible on standby).
# Highest latency; required for synchronous hot-standby reads.
# synchronous_commit = remote_apply

# Level: local
# Only the primary flushes; standby is effectively async.
# synchronous_commit = local

# Level: off
# Neither primary nor standby flushes before ack. Maximum throughput.
# Risk: up to wal_writer_delay ms of data loss on primary crash.
# synchronous_commit = off

# Quorum standby selection — requires ANY or FIRST keyword:
# ANY 1: ack from any one of the listed standbys satisfies the quorum.
# FIRST 1: only the first (highest-priority) standby in the list.
synchronous_standby_names = 'ANY 1 (replica_east, replica_west)'

# Timeouts: when a standby goes silent, the sender marks it stale.
wal_receiver_timeout = 60s
wal_sender_timeout   = 60s

Failure-state annotation: If all standbys listed in synchronous_standby_names disconnect and ANY quorum cannot be met, the primary blocks all new commits. To prevent a complete write stall:

  • Always configure ANY N with N+1 standbys so one can fail without losing quorum.
  • Add at least one async standby outside synchronous_standby_names to absorb reads during a sync standby outage.

PostgreSQL: asynchronous replication baseline

ini
# postgresql.conf — asynchronous read-replica baseline

synchronous_commit = local          # Primary flushes; replicas are async
wal_level          = replica        # Minimum level needed for streaming replication
max_wal_senders    = 10             # Headroom for replicas + monitoring connections
wal_keep_size      = 1GB            # Buffer for slow replicas; prevents slot WAL deletion
max_replication_slots = 10

# Tune wal_sender process buffering for high-throughput writes:
wal_sender_timeout = 60s

ProxySQL: lag-aware read routing (MySQL)

sql
-- ProxySQL routing configuration
-- Hostgroup 10 = primary (writes + strict-consistency reads)
-- Hostgroup 20 = async read pool (analytics, reporting)

INSERT INTO mysql_servers
  (hostgroup_id, hostname,                  port, max_connections, max_replication_lag, comment)
VALUES
  (10,           'primary.db.internal',     3306, 500,             0,                  'Primary — writes + sync reads'),
  (20,           'async-read-01.db.internal',3306, 1000,           5,                  'Async — analytics'),
  (20,           'async-read-02.db.internal',3306, 1000,           5,                  'Async — analytics');

-- Route SELECTs to the async pool; writes to primary
INSERT INTO mysql_query_rules
  (rule_id, active, match_digest,                destination_hostgroup, apply)
VALUES
  (100,     1,      '^SELECT.*',                 20,                    1),
  (101,     1,      '^(INSERT|UPDATE|DELETE).*', 10,                    1);

LOAD MYSQL SERVERS TO RUNTIME;   SAVE MYSQL SERVERS TO DISK;
LOAD MYSQL QUERY RULES TO RUNTIME; SAVE MYSQL QUERY RULES TO DISK;

max_replication_lag (seconds): ProxySQL automatically removes endpoints from the read pool when lag exceeds this threshold. When the entire async pool is lagging, reads fall back to the primary — watch for primary CPU spikes as the signal.

PgBouncer: circuit-breaker via connection limits

PgBouncer has no native circuit-breaker, but connection caps at the pgbouncer layer protect the primary during replica catch-up storms:

ini
; pgbouncer.ini

[databases]
primary_db   = host=primary   port=5432 dbname=app pool_mode=transaction
async_read   = host=replica01 port=5432 dbname=app pool_mode=transaction

[pgbouncer]
max_client_conn       = 2000
default_pool_size     = 50
server_idle_timeout   = 30    ; reclaim idle backend connections quickly
server_connect_timeout = 5    ; fail fast on replica network flaps
server_lifetime       = 3600  ; recycle long-lived connections to prevent resource leaks

Implement true circuit breaking at the application or sidecar layer (Envoy, Resilience4j, or the Python middleware below). PgBouncer enforces capacity ceilings; your application code enforces routing logic.

Application-layer read-your-writes routing (Python / SQLAlchemy)

python
import time

class ConsistencyRouter:
    """
    Routes reads to the async replica unless a write has occurred
    recently in this session (read-your-writes guarantee).
    Falls back to primary if replica lag exceeds the threshold.
    """
    def __init__(
        self,
        primary_engine,
        async_engine,
        lag_threshold_ms: int = 2000,
        pin_window_ms: int = 5000,
    ):
        self.primary      = primary_engine
        self.async_engine = async_engine
        self.lag_threshold = lag_threshold_ms / 1000.0
        self.pin_window    = pin_window_ms / 1000.0
        self._session_pins: dict[str, float] = {}

    def get_engine(self, session_id: str, is_write: bool = False):
        if is_write:
            self._session_pins[session_id] = time.monotonic()
            return self.primary

        pin_time = self._session_pins.get(session_id, 0.0)
        if time.monotonic() - pin_time < self.pin_window:
            return self.primary  # enforce read-your-writes

        if self._replica_lag_seconds() < self.lag_threshold:
            return self.async_engine
        return self.primary     # safety fallback when replica is lagging

    def _replica_lag_seconds(self) -> float:
        # Query pg_stat_replication.write_lag or a heartbeat table
        raise NotImplementedError

Critical parameters:

  • pin_window_ms — session pinning window after a write. Target 1.5–2x your 99th-percentile async replica apply latency.
  • lag_threshold_ms — application-side consistency boundary; align with ProxySQL’s max_replication_lag.
  • Degraded-state: If heavy write load pins all sessions to the primary, connection pool exhaustion follows. Implement a circuit breaker that temporarily disables session pinning and serves mildly stale reads rather than letting the primary saturate.

Monitoring and Alerting Signals

The three metrics that matter most are WAL sender lag, apply delay, and the downstream proxy routing health:

yaml
# prometheus/rules/replication.yml
groups:
  - name: replication_sla
    rules:

      # Primary: replica is falling behind the commit LSN
      - alert: ReplicaWriteLagExceedsSLA
        expr: pg_replication_lag_seconds > 3.0
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: "Replica {{ $labels.instance }} write lag > 3 s"
          description: >
            Automated traffic drain should begin.
            Check WAL sender/receiver buffer sizes and disk IOPS on the standby.

      # WAL apply throughput drop — sustained disk or CPU constraint on standby
      - alert: WALApplyRateDegradation
        expr: rate(pg_stat_replication_write_lag_bytes[5m]) < 10485760
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "WAL apply rate on {{ $labels.instance }} < 10 MB/s"
          description: "Check standby disk IOPS, fsync latency, and wal_receiver_status."

      # Sync standby disconnect — primary may enter write-blocking state
      - alert: SynchronousStandbyDisconnected
        expr: pg_stat_replication_state{sync_state="sync"} == 0
        for: 30s
        labels:
          severity: critical
        annotations:
          summary: "No synchronous standbys connected to {{ $labels.instance }}"
          description: >
            If synchronous_standby_names has no async fallback,
            the primary will block all commits. Investigate network
            and standby process health immediately.

Key PostgreSQL diagnostic queries:

sql
-- Current replication state and lag per standby
SELECT
    application_name,
    sync_state,
    state,
    pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), sent_lsn))  AS send_lag,
    pg_size_pretty(pg_wal_lsn_diff(sent_lsn, write_lsn))             AS write_lag,
    pg_size_pretty(pg_wal_lsn_diff(write_lsn, flush_lsn))            AS flush_lag,
    pg_size_pretty(pg_wal_lsn_diff(flush_lsn, replay_lsn))           AS replay_lag
FROM pg_stat_replication
ORDER BY sync_state DESC;

-- WAL slot retention pressure (async logical replication)
SELECT
    slot_name,
    active,
    pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), confirmed_flush_lsn)) AS retained_wal
FROM pg_replication_slots
ORDER BY retained_wal DESC;

Set health_check_interval on your proxy to 1–2 seconds for high-traffic systems to prevent stale routing decisions. Configure drain_timeout long enough for in-flight queries to complete but short enough to avoid prolonged primary overload.


Failure Modes and Recovery Steps

1. Synchronous primary write stall (standby disconnect)

Root cause: All standbys in synchronous_standby_names become unreachable. The primary blocks new commit acknowledgments indefinitely.

Diagnosis:

sql
-- Check if any sync standbys are connected
SELECT application_name, sync_state, state
FROM pg_stat_replication
WHERE sync_state = 'sync';
-- Empty result means the primary is or will soon be blocked

Recovery:

sql
-- Option A: Temporarily degrade to async (accept RPO risk)
ALTER SYSTEM SET synchronous_commit = 'local';
SELECT pg_reload_conf();

-- Option B: Remove the missing standby from the sync list
ALTER SYSTEM SET synchronous_standby_names = 'ANY 1 (replica_west)';
SELECT pg_reload_conf();

-- After the standby recovers, restore the original config:
ALTER SYSTEM SET synchronous_standby_names = 'ANY 1 (replica_east, replica_west)';
SELECT pg_reload_conf();

2. Async replica lag breach (proxy routing fallback overloads primary)

Root cause: The async replica’s apply rate falls below WAL generation rate due to disk I/O saturation or a write spike. ProxySQL removes it from the read pool; all reads route to the primary.

Diagnosis:

bash
# ProxySQL: check which servers are OFFLINE due to lag
mysql -h 127.0.0.1 -P 6032 -e \
  "SELECT hostgroup_id, hostname, status, max_replication_lag, replication_lag
   FROM runtime_mysql_servers;"

Recovery:

  1. Increase standby disk IOPS (or reduce synchronous_commit on the replica side if you are using remote_apply).
  2. Temporarily raise max_replication_lag in ProxySQL to allow the replica to re-enter the pool at a higher lag tolerance while it catches up.
  3. Once lag drops below the original threshold, reset max_replication_lag.

3. Logical replication slot WAL accumulation (disk-full risk)

Root cause: A logical replication subscriber falls behind or disconnects. PostgreSQL retains all WAL since the slot’s confirmed_flush_lsn, eventually filling the data volume.

Prevention config:

ini
# postgresql.conf (PostgreSQL 13+)
max_slot_wal_keep_size = 10GB   # Invalidate slot rather than fill disk

Recovery:

sql
-- Identify the bloated slot
SELECT slot_name, active,
       pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), confirmed_flush_lsn))
FROM pg_replication_slots ORDER BY 3 DESC;

-- If the subscriber is gone, drop the slot immediately
SELECT pg_drop_replication_slot('slot_name_here');

-- Restore the subscriber from a fresh base backup rather than
-- letting it catch up — catching up from a large WAL backlog
-- generates sustained I/O pressure on the primary.

4. Read-your-writes violation (async replica serves stale data post-write)

Root cause: Session pinning window expired before the replica applied the write, or the pin was never implemented.

Diagnosis: Users report seeing old data immediately after saving. Correlate with pg_stat_replication.replay_lsn vs the transaction’s commit LSN.

Recovery:

  1. Extend pin_window_ms in the application router to cover the measured 99th-percentile apply latency.
  2. For critical paths, force reads to the primary using a /*+ read_primary */ hint or an explicit connection-selection flag in your ORM.
  3. Consider managing sticky sessions in distributed database reads for session-scoped primary affinity.

5. Logical replication transactional ordering conflict

Root cause: logical_decoding_work_mem is undersized for the write throughput, causing WAL decoding to spill to disk. The subscriber’s apply latency spikes during bulk writes.

Fix:

ini
# postgresql.conf on the publisher
logical_decoding_work_mem = 256MB   # Default 64 MB is insufficient for bulk loads

Verify the change under load: SHOW logical_decoding_work_mem; on the publisher, then watch pg_stat_replication.write_lag during the next bulk insert window.


Protocol Selection: Physical vs Logical Replication for Read Scaling

The replication mode (sync vs async) is orthogonal to the replication protocol (physical vs logical). Choosing the right protocol affects how you design multi-region read replica topologies and which consistency models for distributed reads you can guarantee.

Physical replication (streaming) copies the raw WAL byte-for-byte. The standby must be on the same PostgreSQL major version and identical platform. Lower CPU overhead; no decoding latency. Works with synchronous and asynchronous commit. Best for HA standbys and full-instance read replicas.

Logical replication decodes WAL into a row-change stream. Supports selective table replication, cross-version upgrades, and schema divergence between publisher and subscriber. Higher serialization CPU; requires careful slot retention management.

sql
-- PostgreSQL logical replication setup

-- On the publisher (primary):
ALTER SYSTEM SET wal_level = 'logical';
SELECT pg_reload_conf();  -- requires restart if changing from 'replica'

-- Publish only the tables needed for read-scaling
CREATE PUBLICATION read_scaling_pub FOR TABLE users, orders, sessions;

-- On the subscriber (analytics replica):
CREATE SUBSCRIPTION read_scaling_sub
  CONNECTION 'host=primary port=5432 dbname=app user=replicator password=secret'
  PUBLICATION read_scaling_pub
  WITH (copy_data = true, create_slot = true);

See when to use logical vs physical replication for read scaling for a full decision matrix including cross-version migration paths and heterogeneous topology patterns.


Child Pages


FAQ

Does synchronous replication guarantee zero data loss on primary failure?

Yes — when synchronous_commit = on and at least one synchronous standby is connected, every committed transaction has been flushed to the standby WAL before the primary returns success. If the primary crashes immediately after that acknowledgment, the standby holds all committed data.

What happens to writes when the synchronous standby becomes unreachable?

Without a quorum fallback in synchronous_standby_names, the primary blocks all new commits indefinitely. Configure ANY 1 (replica_a, replica_b) so the primary can satisfy the quorum from whichever standby remains reachable, or add at least one asynchronous standby to keep reads available during a sync standby outage.

How large should the session pin window be for read-your-writes consistency?

Set pin_window_ms to 1.5–2x the 99th-percentile replica apply latency measured under peak write throughput. Too short causes stale reads; too long routes all reads to the primary and starves replicas. Monitor with pg_stat_replication.write_lag and adjust the window weekly during growth phases.

When should you choose logical replication over physical replication for read scaling?

Choose logical replication when you need to replicate a subset of tables, support schema divergence between primary and replica, or run the replica on a different PostgreSQL major version. Use physical replication when you need the lowest possible CPU overhead and full-instance consistency — it has lower serialization latency and simpler slot management.