Understanding Synchronous vs Asynchronous Replication
Commit Path Mechanics & Acknowledgment Flows
At the core of Database Replication Fundamentals & Architecture, the commit acknowledgment path dictates system behavior under load and failure. Synchronous replication enforces a blocking protocol where the primary transaction commit waits for replica Write-Ahead Log (WAL) receipt and acknowledgment before returning control to the client. This guarantees zero Recovery Point Objective (RPO) but adds write latency of at least one network round-trip time (RTT) plus WAL fsync latency on the standby. Asynchronous replication decouples the client commit return from replica acknowledgment, prioritizing write throughput while accepting potential data divergence during primary failure or network partitions.
Production deployments must explicitly tune acknowledgment boundaries to prevent primary thread starvation. In PostgreSQL, this is governed by synchronous_commit and synchronous_standby_names. When network RTT exceeds acceptable thresholds, the primary transaction queue backs up, triggering connection pool exhaustion.
# postgresql.conf - Synchronous Commit Configuration
synchronous_commit = on
synchronous_standby_names = 'ANY 1 (replica_east, replica_west)'
wal_receiver_timeout = 6s    # standby side: declare the primary link dead after 6s of silence
wal_sender_timeout = 6s      # primary side: drop standby connections that stall for 6s
Critical Parameters & Degraded-State Behavior:
- synchronous_commit = on: Forces a WAL flush on both the primary and the synchronous standby before the commit returns. If a standby becomes unreachable, the primary blocks indefinitely unless synchronous_standby_names uses ANY or FIRST with enough candidate standbys to keep the quorum satisfiable.
- wal_receiver_timeout / wal_sender_timeout: Define the heartbeat window. When breached, the primary marks the link as stale and terminates it. Note that PostgreSQL never silently downgrades to asynchronous mode: if no remaining candidate in synchronous_standby_names can satisfy the quorum, commits halt entirely, which is exactly the behavior strict zero-RPO enforcement demands.
- Degraded State: During a network partition, a synchronous primary queues committing transactions until the timeout expires. If synchronous_standby_names lacks a quorum fallback, the database becomes effectively read-only, preventing acknowledged writes from being lost on failover. The quorum state can be verified with the probe sketched after this list.
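The quorum state the last bullet describes is visible in pg_stat_replication. Below is a minimal probe sketch, assuming psycopg2 and a placeholder DSN; the column names (sync_state, flush_lag, replay_lag) are the actual fields of that view:
# Hypothetical Sync-Quorum Probe (Python/psycopg2)
import psycopg2

SYNC_QUERY = """
SELECT application_name,
       sync_state,                                   -- 'sync', 'quorum', 'potential', or 'async'
       EXTRACT(EPOCH FROM flush_lag)  AS flush_lag_s,
       EXTRACT(EPOCH FROM replay_lag) AS replay_lag_s
FROM pg_stat_replication;
"""

def check_sync_quorum(dsn, required=1):
    """Raise if fewer than `required` standbys are acknowledging synchronously."""
    with psycopg2.connect(dsn) as conn, conn.cursor() as cur:
        cur.execute(SYNC_QUERY)
        rows = cur.fetchall()
    in_quorum = [r for r in rows if r[1] in ("sync", "quorum")]
    if len(in_quorum) < required:
        raise RuntimeError(
            f"sync quorum degraded: {len(in_quorum)}/{required} standbys acknowledging"
        )
    return rows  # (name, sync_state, flush_lag_s, replay_lag_s) per standby
Running this probe on a schedule, and paging before commits start blocking, turns the silent degraded state into an actionable signal.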
Connection Routing & Topology Implications
Connection routers must dynamically classify endpoints based on replication mode to prevent stale reads and write routing violations. When Designing Multi-Region Read Replica Topologies, synchronous nodes are typically reserved for critical transactional paths requiring immediate consistency, while asynchronous endpoints handle bulk analytical queries, background workers, and cache warm-up routines. Proxy configurations require explicit routing rules that enforce read-after-write consistency by pinning user sessions to the primary or a low-lag sync replica during active transactions.
ProxySQL and PgBouncer implementations rely on endpoint tagging and health-aware routing tables (the ProxySQL example below uses its MySQL-oriented admin tables to illustrate the pattern; PostgreSQL deployments typically pair PgBouncer with an SQL-aware router). Misconfigured routing rules cause “read-your-writes” violations, where a user updates a record but immediately queries an async replica that has not yet applied the WAL.
-- ProxySQL Routing Rules Example (hostnames and ports are placeholders)
INSERT INTO mysql_servers (hostgroup_id, hostname, port, max_connections, max_replication_lag, comment)
VALUES
  (10, 'primary.db.internal', 5432, 500, 0, 'Write/Strict Sync'),
  (20, 'async-read-01.db.internal', 5432, 1000, 5, 'Async Read - Analytics'),
  (20, 'async-read-02.db.internal', 5432, 1000, 5, 'Async Read - Analytics');
-- Lower rule_id matches first: send locking reads to the writer hostgroup
INSERT INTO mysql_query_rules (rule_id, active, match_digest, destination_hostgroup, apply)
VALUES
  (99, 1, '^SELECT.*FOR UPDATE', 10, 1),
  (100, 1, '^SELECT.*', 20, 1),
  (101, 1, '^INSERT|^UPDATE|^DELETE', 10, 1);
Critical Parameters & Degraded-State Behavior:
- max_replication_lag: Threshold in seconds (ProxySQL checks the replica's reported lag against it). Proxies automatically remove endpoints from the read pool when lag exceeds this value and restore them once they recover.
- sticky_session_timeout: Pins a client connection to a specific hostgroup for a defined window after a write, ensuring read-after-write consistency without global sync requirements.
- Degraded State: When all async replicas breach max_replication_lag, the proxy must either route reads to the primary (increasing write contention) or reject read queries with 503 Service Unavailable. Proper configuration requires a fallback_to_primary flag with strict rate limiting to prevent primary overload during replica catch-up storms; a rate-limiter sketch follows this list.
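Neither ProxySQL nor PgBouncer ships a fallback_to_primary rate limiter natively, so the cap typically lives in a routing sidecar or the application layer. A minimal token-bucket sketch, assuming a hypothetical read budget for traffic spilled to the primary:
# Hypothetical Fallback-to-Primary Rate Limiter (Python)
import time

class PrimaryFallbackLimiter:
    """Token bucket capping reads spilled to the primary during replica lag storms."""
    def __init__(self, max_reads_per_sec=100, burst=200):
        self.rate = max_reads_per_sec
        self.capacity = burst
        self.tokens = float(burst)
        self.last_refill = time.monotonic()

    def allow_primary_read(self):
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at burst capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last_refill) * self.rate)
        self.last_refill = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True   # route this read to the primary
        return False      # reject with 503 or serve stale from cache
Reads denied by the limiter are rejected or served stale, which keeps the primary's connection pool available for writes while the replicas catch up.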
Consistency Models & Application-Level Handling
Application developers must align query execution with the chosen replication mode to avoid consistency anomalies. Evaluating Consistency Models for Distributed Reads demonstrates that async replicas require explicit session pinning or application-level read routing to maintain user-facing consistency. Connection pools should implement transaction-scoped routing and fallback mechanisms that automatically redirect reads to the primary when replica lag exceeds acceptable thresholds.
Strong consistency across async topologies is impossible without coordination overhead. Instead, implement read-your-writes guarantees via application middleware that tracks transaction boundaries and routes subsequent reads to the primary for a configurable window.
# Application-Level Routing Middleware (Python/SQLAlchemy)
import time

class ConsistencyRouter:
    """Routes reads to an async replica unless read-your-writes pinning applies."""
    def __init__(self, primary_engine, replica_engine,
                 lag_threshold_ms=2000, pin_window_ms=5000):
        self.primary = primary_engine
        self.replica = replica_engine          # 'async' is a reserved word in Python 3.7+
        self.lag_threshold = lag_threshold_ms
        self.pin_window = pin_window_ms
        self._session_pins = {}                # session_id -> last write timestamp

    def get_engine(self, session_id, is_write=False):
        if is_write:
            self._session_pins[session_id] = time.time()
            return self.primary
        last_write = self._session_pins.get(session_id, 0)
        if time.time() - last_write < self.pin_window / 1000:
            return self.primary                # enforce read-your-writes
        if self._check_replica_lag() < self.lag_threshold:
            return self.replica                # lag is healthy; offload the read
        return self.primary                    # safety fallback

    def _check_replica_lag(self):
        # Placeholder: in production, poll the proxy health endpoint or run
        # SELECT EXTRACT(EPOCH FROM now() - pg_last_xact_replay_timestamp()) * 1000
        # against the replica. Returning the threshold forces the safety fallback.
        return self.lag_threshold
Critical Parameters & Degraded-State Behavior:
- pin_window_ms: Duration a session remains routed to the primary post-write. Too short causes stale reads; too long starves async replicas of traffic.
- lag_threshold_ms: Application-side consistency boundary. Must align with proxy thresholds.
- Degraded State: If the primary is under heavy write load and the pin_window forces all reads to it, connection pool exhaustion occurs. Mitigation requires implementing a circuit breaker that temporarily disables session pinning and serves slightly stale data with explicit cache-control headers (Cache-Control: max-age=0, stale-while-revalidate=5); a breaker sketch follows this list.
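A minimal circuit-breaker sketch for that mitigation. The pool-utilization probe is an assumption here; with SQLAlchemy it could be derived from the pool's checkedout() and size() methods:
# Hypothetical Pin-Disabling Circuit Breaker (Python)
import time

class PinCircuitBreaker:
    """Disables read-your-writes pinning when the primary pool is saturated."""
    def __init__(self, utilization_probe, trip_at=0.85, cooldown_s=30):
        self.probe = utilization_probe   # callable returning 0.0-1.0 pool utilization
        self.trip_at = trip_at
        self.cooldown_s = cooldown_s
        self.tripped_until = 0.0

    def pinning_enabled(self):
        now = time.monotonic()
        if now < self.tripped_until:
            return False                 # open: serve possibly-stale reads from replicas
        if self.probe() >= self.trip_at:
            self.tripped_until = now + self.cooldown_s
            return False                 # trip and start the cooldown window
        return True                      # closed: normal read-your-writes pinning
When the breaker is open, ConsistencyRouter.get_engine would skip the pin check and return the replica, while the HTTP layer attaches the stale-while-revalidate header so clients know the read may trail the write.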
Observability & SLA Threshold Management
Operationalizing async replication requires precise lag tracking and automated routing adjustments. How to calculate replication lag thresholds for SLA compliance involves measuring WAL generation rates, network RTT, and transaction volume to define business-safe boundaries. SREs must configure proxy health endpoints to continuously monitor apply delays, automatically marking replicas as unhealthy and draining traffic when thresholds are breached.
Lag is not a static metric; it is a derivative of write throughput and network capacity. Monitoring must track both replication_lag_seconds and wal_apply_rate to distinguish between transient network blips and sustained replication bottlenecks.
# Prometheus Alerting Rules for Replication SLA
# Metric names assume a postgres_exporter deployment with custom lag queries.
groups:
  - name: replication_sla
    rules:
      - alert: ReplicaLagExceedsSLA
        expr: pg_replication_lag_seconds > 3.0
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: "Replica {{ $labels.instance }} lag exceeds 3s SLA"
          description: "Automated traffic drain initiated. Verify WAL sender/receiver buffers."
      - alert: WALApplyRateDegradation
        # Fires when the 5m apply rate falls below 10 MiB/s (10485760 bytes).
        expr: rate(pg_stat_replication_write_lag_bytes[5m]) < 10485760
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Replica apply rate degraded"
          description: "Check disk IOPS and fsync latency on standby."
Critical Parameters & Degraded-State Behavior:
- health_check_interval: Frequency of proxy lag probes. Set to 1s-2s for high-traffic systems to prevent stale routing decisions.
- drain_timeout: Grace period before a lagging replica is removed from the pool. Allows in-flight queries to complete without abrupt TCP RST errors.
- Degraded State: When lag breaches SLA, the proxy initiates a graceful drain. If the replica fails to catch up within drain_timeout, it is marked OFFLINE_HARD. Automated recovery scripts should trigger a pg_rewind or snapshot restore rather than attempting infinite catch-up, which can cause primary I/O starvation. A worked threshold calculation follows this list.
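To make the 3s threshold in the alert above defensible rather than arbitrary, derive it from measured rates. A back-of-the-envelope sketch under a simplified steady-state model; all input numbers are illustrative:
# Hypothetical SLA Lag-Threshold Calculation (Python)
def max_safe_lag_seconds(wal_bytes_per_sec, replica_apply_bytes_per_sec,
                         staleness_budget_s, network_rtt_s):
    """Largest backlog (in seconds of primary WAL generation) the replica can
    carry and still recover within the business staleness budget."""
    headroom = replica_apply_bytes_per_sec - wal_bytes_per_sec
    if headroom <= 0:
        return 0.0  # replica can never catch up; lag grows without bound
    recovery_window = staleness_budget_s - network_rtt_s
    # Backlog the replica can burn through within the recovery window,
    # expressed as seconds of primary WAL generation.
    return (headroom * recovery_window) / wal_bytes_per_sec

# Example: 20 MiB/s WAL, replica applies 50 MiB/s, 5 s staleness budget, 30 ms RTT
print(max_safe_lag_seconds(20 << 20, 50 << 20, 5.0, 0.03))  # ~7.5 s of backlog
Since lag in seconds is itself the staleness a reader observes, the alert threshold must also sit below the staleness budget with margin, which is how a 3.0 s threshold pairs with a 5 s budget.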
Protocol Selection & Scaling Tradeoffs
The underlying replication protocol directly influences routing flexibility and scaling overhead. When to use logical vs physical replication for read scaling determines whether you can route heterogeneous queries, apply row-level filtering, or support cross-version upgrades. Physical replication offers lower CPU overhead but restricts routing to identical schema endpoints, while logical replication enables targeted read distribution at the cost of higher serialization latency and WAL decoding overhead.
Logical replication decouples the physical storage format from the replication stream, allowing selective table routing and schema divergence. However, it introduces transactional ordering complexities and requires careful slot retention management.
# postgresql.conf - Logical Replication Tuning (publisher side)
wal_level = logical
max_replication_slots = 10
max_wal_senders = 10
logical_decoding_work_mem = 64MB

-- Publication runs on the publisher; the subscription runs on the subscriber,
-- and its CONNECTION string must point back at the publisher.
CREATE PUBLICATION read_scaling_pub FOR TABLE users, orders, sessions;
CREATE SUBSCRIPTION read_scaling_sub
    CONNECTION 'host=primary.db.internal port=5432 dbname=app'
    PUBLICATION read_scaling_pub
    WITH (copy_data = true, synchronous_commit = off, create_slot = true);
Critical Parameters & Degraded-State Behavior:
- logical_decoding_work_mem: Memory allocated per WAL sender for decoding changes before spilling to disk. Insufficient allocation causes disk spilling and rapid lag growth during bulk writes.
- synchronous_commit = off (on the subscriber): Acceptable for async read scaling but risks losing the most recent transactions on a subscriber crash. Never use it for financial ledger replication.
- Degraded State: If a logical replication slot falls behind, WAL files accumulate on the primary, eventually triggering disk-full errors. Set max_slot_wal_keep_size to cap retention and force slot invalidation rather than primary saturation, and route traffic away from the affected subscriber until a fresh base backup is applied. A retention probe is sketched below.
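Slot retention is directly measurable from pg_replication_slots, so the disk-full failure mode can be caught early. A minimal probe sketch, assuming psycopg2 and an illustrative 16 GiB retention budget; pg_wal_lsn_diff and pg_current_wal_lsn are the actual PostgreSQL functions:
# Hypothetical Slot-Retention Probe (Python/psycopg2)
import psycopg2

SLOT_QUERY = """
SELECT slot_name, active,
       pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn) AS retained_bytes
FROM pg_replication_slots;
"""

def slots_over_budget(dsn, budget_bytes=16 << 30):
    """Return (slot_name, retained_bytes) for slots holding more WAL than budgeted."""
    with psycopg2.connect(dsn) as conn, conn.cursor() as cur:
        cur.execute(SLOT_QUERY)
        return [(name, retained)
                for name, active, retained in cur.fetchall()
                if retained is not None and retained > budget_bytes]
Alerting on this probe before max_slot_wal_keep_size invalidates the slot gives operators a window to pause the offending subscriber or rebuild it from a base backup.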
Failure Modes & Write Amplification Mitigation
During network partitions or heavy sync operations, async replicas can trigger cascading write amplification and primary saturation. Preventing write amplification during heavy read replica syncs requires tuning WAL sender/receiver buffers, implementing exponential backoff in connection routers, and isolating bulk ingestion workloads from real-time read paths. Proper circuit breakers in the routing layer prevent connection pool exhaustion and ensure graceful degradation during catch-up storms.
Write amplification occurs when retry storms, connection pool flapping, and replica catch-up queries compete for primary I/O and network bandwidth. Without isolation, a single lagging replica can degrade the entire cluster.
# PgBouncer / Connection Pool Circuit Breaker Config
[databases]
; per-database caps use max_db_connections (max_client_conn is global-only)
primary_db = host=primary port=5432 dbname=app pool_mode=transaction max_db_connections=500

[pgbouncer]
max_client_conn = 2000
default_pool_size = 50
server_idle_timeout = 30
server_lifetime = 3600
server_connect_timeout = 5
; server_login_retry is a retry delay in seconds, not an attempt count
server_login_retry = 3
; The two settings below are NOT native PgBouncer options; they document the
; thresholds a routing sidecar or application driver should enforce.
;circuit_breaker_threshold = 0.65
;circuit_breaker_timeout = 30000
Critical Parameters & Degraded-State Behavior:
- server_connect_timeout / server_login_retry: Prevent connection pool threads from blocking indefinitely during replica network flaps (server_login_retry is the delay in seconds before a failed login is retried).
- circuit_breaker_threshold: Fraction of failed/timed-out connections before the pool temporarily stops routing to the affected hostgroup.
- retry_backoff_ms: Exponential delay applied by application drivers. Must be jittered to prevent thundering-herd effects; see the backoff sketch after this list.
- Degraded State: When the circuit breaker trips, the routing layer sheds traffic from the degraded replica and routes reads to the primary or a healthy secondary. Bulk sync operations should be paused, by suspending the subscription or advancing the slot with pg_replication_slot_advance, until primary I/O utilization drops below 70%. Once stabilized, replicas rejoin with copy_data = false to avoid full table scans, and the circuit breaker resets after a cooldown period to prevent rapid re-tripping.
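A minimal full-jitter backoff sketch for the driver side; the base, cap, and attempt count are illustrative, and the caught exception type should be narrowed to your driver's connection error:
# Hypothetical Jittered Exponential Backoff (Python)
import random
import time

def backoff_delays(base_ms=100, cap_ms=30000, max_attempts=8):
    """Yield full-jitter delays: uniform in [0, min(cap, base * 2^attempt))."""
    for attempt in range(max_attempts):
        ceiling = min(cap_ms, base_ms * (2 ** attempt))
        yield random.uniform(0, ceiling) / 1000.0   # seconds

def connect_with_backoff(connect_fn):
    """Retry a connection factory with jittered exponential backoff."""
    last_exc = None
    for delay in backoff_delays():
        try:
            return connect_fn()
        except ConnectionError as exc:   # substitute your driver's error class
            last_exc = exc
            time.sleep(delay)
    raise last_exc
Full jitter spreads reconnect attempts uniformly across the window, so a replica coming back online sees a trickle of connections instead of a synchronized stampede.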