Within the broader replication lag and consistency management framework, one of the most operationally risky gaps is the absence of an explicit freshness contract at the query level. Without it, every read is implicitly “best effort” — engineers assume staleness is acceptable until a financial discrepancy, a double-spend, or a vanished inventory item proves otherwise. The SLA risk is asymmetric: reads that tolerate eventual consistency are cheap to serve from replicas, but reads that silently downgrade from a strict freshness requirement cause data correctness incidents that are expensive to debug and may be externally visible.

This page covers how to close that gap: classifying read operations into discrete freshness tiers, annotating queries so routing infrastructure can act on those classifications, and wiring the routing layer to real-time lag signals so decisions remain accurate under changing replica health. Failure to enforce freshness routing also over-provisions primary capacity — unannotated queries frequently default to the primary “to be safe”, defeating the horizontal read scaling that read replicas are meant to deliver.

Concept definition and scope

Freshness-aware query routing is the practice of directing each read query to the database node whose current replication lag satisfies the maximum staleness window the query can tolerate. It is distinct from:

Load balancing, which distributes queries across replicas purely for throughput without consulting lag state.
Sticky sessions, which bind a client to a specific node for session consistency (covered in managing sticky sessions in distributed database reads).
Read/write splitting, which separates SELECT from DML without further differentiation among reads (see implementing read/write splitting at the proxy layer).

Freshness-aware routing adds a third axis — temporal correctness — to the routing decision. It requires both a classification scheme (tiering queries by staleness tolerance) and a feedback loop (measuring actual lag to determine which nodes satisfy each tier at any given moment).

Freshness tier model

Adopt a three-tier model covering the vast majority of production read patterns:

Tier	Annotation	Max acceptable lag	Typical use cases
Strict	`FRESHNESS: STRICT`	0 ms (primary or sync standby only)	Financial ledgers, inventory deduction, user session state, payment idempotency keys
Near-real-time	`FRESHNESS: NEAR_RT`	≤ 500 ms	User profiles, activity feeds, dashboard metrics, order status
Eventual	`FRESHNESS: EVENTUAL`	Seconds to minutes	Historical reporting, search index reads, batch analytics, audit logs

Unannotated queries must not silently route to replicas. Route them to a quarantine pool for review — or fail them with a descriptive error — rather than allowing them to bypass freshness checks. Ambiguous defaults are how staleness violations reach production.

Mechanism deep-dive

Step 1 — Classify data domains at design time

Map each data domain to a tier before writing routing infrastructure. Classify based on the consequence of serving a stale read, not on expected lag:

Zero-consequence staleness → EVENTUAL
Visible to user, recoverable → NEAR_RT
Financial, security, or irreversible operation → STRICT

Encode the classification in a data dictionary or service contract, not just in code comments. Teams that skip this step end up with per-developer routing decisions that drift over time.

Step 2 — Inject freshness annotations at the query layer

The most portable annotation mechanism is a SQL comment injected by a middleware layer the application cannot bypass:

python

# Django ORM interceptor — injects freshness hint before execution
from django.db.backends.base.operations import DatabaseOperations

class FreshnessAnnotationMiddleware:
    TIER_HINTS = {
        "strict":   "/* FRESHNESS: STRICT */",
        "near_rt":  "/* FRESHNESS: NEAR_RT */",
        "eventual": "/* FRESHNESS: EVENTUAL */",
    }

    def execute(self, sql, params, many, context):
        tier = context.get("freshness_tier", "strict")  # default to strict
        hint = self.TIER_HINTS[tier]
        annotated_sql = f"{hint} {sql}"
        return super().execute(annotated_sql, params, many, context)

For services that cannot modify the ORM layer, inject hints via a connection string parameter that the proxy interprets, or use a session variable (SET @freshness_tier = 'eventual') that the proxy reads before each statement.

Step 3 — Configure proxy routing rules

ProxySQL evaluates mysql_query_rules in rule_id order. Anchor rules with ^ to prevent partial matches misclassifying writes as reads, and set apply=1 to short-circuit further rule evaluation:

sql

-- Hostgroup layout
-- HG 10 = writer (primary)
-- HG 20 = low-lag replicas (max_replication_lag ≤ 1s)
-- HG 30 = relaxed replicas (max_replication_lag ≤ 10s)

INSERT INTO mysql_query_rules
  (rule_id, active, match_digest, destination_hostgroup, apply, comment)
VALUES
  (10, 1, '^\/\* FRESHNESS: STRICT \*\/',   10, 1, 'Strict → primary'),
  (20, 1, '^\/\* FRESHNESS: NEAR_RT \*\/',  20, 1, 'Near-RT → low-lag replicas'),
  (30, 1, '^\/\* FRESHNESS: EVENTUAL \*\/', 30, 1, 'Eventual → any replica'),
  (99, 1, '^SELECT',                         10, 1, 'Unannotated reads → primary (quarantine)');

LOAD MYSQL QUERY RULES TO RUNTIME;
SAVE MYSQL QUERY RULES TO DISK;

Step 4 — Wire lag detection into routing weights

Static hostgroup assignment is insufficient — a replica can drift from HG 20 eligibility during write bursts. ProxySQL’s max_replication_lag field per server causes automatic weight reduction to zero when lag is exceeded, but this binary approach is too coarse for near-real-time workloads. Pair it with an external lag monitor that adjusts weights dynamically:

python

# Lag monitor — polls every 250ms, adjusts weights
import pymysql, time

THRESHOLDS = {"near_rt": 0.5, "eventual": 10.0}

def adjust_replica_weights(admin_conn, lag_by_host: dict):
    for host, lag_s in lag_by_host.items():
        if lag_s <= THRESHOLDS["near_rt"]:
            weight = 100
        elif lag_s <= THRESHOLDS["eventual"]:
            weight = 30   # still eligible for EVENTUAL
        else:
            weight = 0    # circuit open
        admin_conn.execute(
            "UPDATE mysql_servers SET weight=%s WHERE hostname=%s AND hostgroup_id=20",
            (weight, host)
        )
    admin_conn.execute("LOAD MYSQL SERVERS TO RUNTIME")

Beware of thundering herd effects when multiple replicas simultaneously breach a threshold: implement jittered backoff between polling cycles and stagger weight adjustments across replicas to prevent oscillating routing states during transient spikes.

Step 5 — Define deterministic fallback paths

Every tier must have a declared fallback. When the circuit opens, the routing engine must not silently downgrade — it must execute a predefined action:

Tier	Primary action	Fallback when lag > threshold
STRICT	Route to primary	Already at primary; no fallback needed
NEAR_RT	Route to low-lag replica	503 to caller, or primary with rate limit cap
EVENTUAL	Route to any replica	Route to least-lagged replica; only fail if all replicas are unreachable

Architecture overview

The diagram below shows the routing decision flow from an annotated application query through the proxy layer to the appropriate database node.

Trade-off comparison table

Routing layer	Implementation	Zero-code-change	Lag visibility	Failure surface	Best fit
Transparent middleware	ProxySQL, HAProxy, Envoy	Yes	Proxy-level only	Routing loops during proxy health-check failures	High-throughput analytics, reporting
ORM/datasource switching	Django DB Routers, Spring `AbstractRoutingDataSource`	No	Full application context	Connection pool exhaustion under burst traffic	Transactional services with complex routing logic
Service mesh sidecar	Istio/Linkerd, Envoy cluster routing	Yes	Infrastructure metadata	Metadata cache staleness during rapid topology changes	Multi-service platforms with mTLS requirements
Application-level cursor	SQLAlchemy session factories, Prisma client routing	No	Full query context	Developer discipline required; easy to bypass	Greenfield services where ORM adoption is uniform

For production systems handling mixed workloads, the most operationally resilient pattern is a hybrid: route bulk analytics and reporting through a transparent proxy, while transactional services use explicit ORM-level datasource switching. Ensure connection pools are isolated per freshness tier to prevent noisy-neighbour effects — a surge in EVENTUAL queries must not exhaust the connection budget for STRICT queries.

Configuration runbook

ProxySQL — full freshness-aware setup

sql

-- 1. Define hostgroups with explicit lag constraints
INSERT INTO mysql_replication_hostgroups
  (writer_hostgroup, reader_hostgroup, comment)
VALUES
  (10, 20, 'Primary → Read Replicas');

-- 2. Register servers with per-server lag thresholds (seconds)
--    weight=0 means the server is registered but not eligible until lag drops
INSERT INTO mysql_servers
  (hostgroup_id, hostname, port, max_replication_lag, weight, comment)
VALUES
  (10, 'primary.db.internal',    3306, 0,  100, 'Writer node'),
  (20, 'replica-01.db.internal', 3306, 1,  100, 'HG20: low-lag replica'),
  (20, 'replica-02.db.internal', 3306, 1,   80, 'HG20: low-lag replica'),
  (30, 'replica-03.db.internal', 3306, 10, 100, 'HG30: relaxed replica'),
  (30, 'replica-04.db.internal', 3306, 10,  60, 'HG30: relaxed replica');

-- 3. Query rules — rule_id order matters; lower = higher priority
--    apply=1 short-circuits further evaluation
INSERT INTO mysql_query_rules
  (rule_id, active, match_digest, destination_hostgroup, apply, comment)
VALUES
  (10, 1, '^\/\* FRESHNESS: STRICT \*\/',   10, 1, 'Strict → primary'),
  (20, 1, '^\/\* FRESHNESS: NEAR_RT \*\/',  20, 1, 'Near-RT → HG20'),
  (30, 1, '^\/\* FRESHNESS: EVENTUAL \*\/', 30, 1, 'Eventual → HG30'),
  (99, 1, '^SELECT',                         10, 1, 'Unannotated reads → primary (quarantine)');

-- 4. Monitoring and timeout tuning
UPDATE global_variables SET variable_value = '250'
  WHERE variable_name = 'mysql-monitor_replication_lag_interval';    -- poll every 250ms
UPDATE global_variables SET variable_value = '200'
  WHERE variable_name = 'mysql-monitor_read_only_timeout';           -- replica health timeout
UPDATE global_variables SET variable_value = '1000'
  WHERE variable_name = 'mysql-connect_timeout_server';              -- connection timeout
UPDATE global_variables SET variable_value = '2'
  WHERE variable_name = 'mysql-query_retries_on_failure';            -- retry budget

LOAD MYSQL VARIABLES TO RUNTIME; SAVE MYSQL VARIABLES TO DISK;
LOAD MYSQL SERVERS TO RUNTIME;   SAVE MYSQL SERVERS TO DISK;
LOAD MYSQL QUERY RULES TO RUNTIME; SAVE MYSQL QUERY RULES TO DISK;

Failure-state annotation: When max_replication_lag is breached for a server, ProxySQL reduces its effective weight to 0, shifting traffic to the writer hostgroup. To prevent primary overload, cap mysql-max_connections and implement application-level circuit breakers that return cached responses or 503 when the primary connection queue depth exceeds 80%.

PostgreSQL — lag detection query for routing agents

sql

-- Run on each replica to measure true application-level lag
-- Requires a heartbeat table populated by the primary every 250ms
SELECT
  now() - MAX(written_at) AS replica_lag_seconds,
  pg_is_in_recovery()     AS is_replica
FROM replication_heartbeat;

-- Cross-check with streaming replication stats (on primary)
SELECT
  application_name,
  client_addr,
  EXTRACT(EPOCH FROM replay_lag) AS replay_lag_seconds,
  state
FROM pg_stat_replication
ORDER BY replay_lag_seconds DESC;

Monitoring and alerting signals

Instrument the routing layer with distributed tracing spans that capture freshness decision metadata alongside normal latency signals.

Key metrics to export:

Metric	Type	Alert threshold
`replica_lag_seconds{host, tier}`	Gauge	NEAR_RT: > 0.5s for 30s; EVENTUAL: > 30s for 2m
`routing_fallback_total{tier}`	Counter	Any STRICT fallback triggers PagerDuty
`routing_decision_latency_ms`	Histogram	p99 > 5ms indicates proxy saturation
`consistency_violation_rate{service}`	Counter	> 0 for STRICT tier is a P1 incident
`circuit_breaker_open{hostgroup}`	Gauge	Alert immediately when HG20 circuit opens

Prometheus alerting rules:

yaml

groups:
  - name: freshness_routing
    rules:
      - alert: NearRTReplicaLagExceeded
        expr: replica_lag_seconds{tier="near_rt"} > 0.5
        for: 30s
        labels:
          severity: warning
        annotations:
          summary: "Replica {{ $labels.host }} exceeds 500ms lag for NEAR_RT tier"
          action: "Verify replication threads; check network latency; prepare manual weight override"

      - alert: StrictRoutingFallbackTriggered
        expr: increase(routing_fallback_total{tier="strict"}[5m]) > 0
        labels:
          severity: critical
        annotations:
          summary: "STRICT-tier query fell back — investigate primary or routing rule misconfiguration"

      - alert: CircuitBreakerOpen
        expr: circuit_breaker_open{hostgroup="20"} == 1
        for: 0m
        labels:
          severity: critical
        annotations:
          summary: "HG20 circuit open — NEAR_RT traffic routing to primary"

Heartbeat table setup (MySQL/PostgreSQL):

sql

-- Create heartbeat table on primary
CREATE TABLE replication_heartbeat (
  id          INT PRIMARY KEY DEFAULT 1,
  written_at  TIMESTAMP(6) NOT NULL DEFAULT NOW(),
  source_host VARCHAR(128)
);

-- Populate from a cron / sidecar every 250ms
-- MySQL
INSERT INTO replication_heartbeat (id, written_at, source_host)
VALUES (1, NOW(6), @@hostname)
ON DUPLICATE KEY UPDATE written_at = NOW(6), source_host = @@hostname;

-- PostgreSQL
INSERT INTO replication_heartbeat (id, written_at, source_host)
VALUES (1, clock_timestamp(), inet_server_addr()::text)
ON CONFLICT (id) DO UPDATE
  SET written_at = clock_timestamp(),
      source_host = inet_server_addr()::text;

Heartbeat-based lag measurement captures application-level latency — including cases where Seconds_Behind_Source (MySQL) or replay_lag from pg_stat_replication (PostgreSQL) reports zero during a replication thread stall that has not yet disconnected. Complement both signals with GTID/LSN gap tracking to detect silent halts.

Failure modes and recovery steps

1. Unannotated queries silently routed to replicas

Root cause: A catch-all SELECT rule exists without requiring freshness annotations, so all reads reach replicas regardless of staleness tolerance.

Diagnosis:

sql

-- Identify unannotated read traffic in ProxySQL's query digest
SELECT hostgroup, digest_text, count_star
FROM stats_mysql_query_digest
WHERE digest_text LIKE '%SELECT%'
  AND digest_text NOT LIKE '%FRESHNESS:%'
ORDER BY count_star DESC
LIMIT 20;

Remediation: Change the catch-all rule to route unannotated reads to the writer hostgroup (hostgroup 10) and raise an application-level alert. Do not silently accept them on replicas.

2. Thundering herd when multiple replicas simultaneously exceed the threshold

Root cause: A write spike or network hiccup causes several replicas to breach the lag threshold at the same time, collapsing all replica weight to zero and flooding the primary.

Diagnosis: Sudden spike in primary Threads_running (MySQL) or active_connections (PostgreSQL), combined with circuit_breaker_open firing for multiple hostgroups simultaneously.

Remediation:

python

# Stagger weight reductions with jitter to avoid simultaneous collapse
import random, time

def degrade_replica(admin_conn, host, current_lag):
    delay = random.uniform(0.05, 0.3)  # 50–300ms jitter
    time.sleep(delay)
    new_weight = compute_weight(current_lag)
    admin_conn.execute(
        "UPDATE mysql_servers SET weight=%s WHERE hostname=%s", (new_weight, host)
    )
    admin_conn.execute("LOAD MYSQL SERVERS TO RUNTIME")

Also apply max_connections caps on the primary to prevent cascade overload.

3. Proxy rule regex misclassifying writes as reads

Root cause: Unanchored regex patterns (missing ^) match write queries containing SELECT subqueries, routing them to read replicas.

Diagnosis:

sql

-- Check query rule match counts in ProxySQL
SELECT rule_id, match_digest, hits
FROM stats_mysql_query_rules
ORDER BY hits DESC;

-- Inspect matched queries that were writes
SELECT hostgroup, digest_text
FROM stats_mysql_query_digest
WHERE digest_text LIKE '%INSERT%' OR digest_text LIKE '%UPDATE%'
  AND hostgroup != 10;  -- HG10 = writer

Remediation: Anchor all regex rules with ^. For MySQL queries containing comment hints, ensure the comment appears at the very start of the SQL string before the DML keyword.

4. Statistics lag causing suboptimal execution plans on replicas

Root cause: Replicas receive bulk writes that alter data distribution, but optimizer statistics are not refreshed, causing full-table scans on queries that should use an index.

Diagnosis: High Innodb_buffer_pool_reads / Innodb_buffer_pool_read_requests ratio on MySQL, or low heap_blks_hit / (heap_blks_hit + heap_blks_read) in pg_statio_user_tables on PostgreSQL.

Remediation:

sql

-- MySQL: refresh statistics on affected table
ANALYZE TABLE orders;

-- PostgreSQL: refresh without locking
ANALYZE orders;

-- Add an optimizer hint for critical read paths where statistics lag is unavoidable
-- MySQL
SELECT /*+ INDEX(o idx_orders_created_at) */ * FROM orders WHERE created_at > NOW() - INTERVAL 1 HOUR;

-- PostgreSQL
/*+ IndexScan(o idx_orders_created_at) */
SELECT * FROM orders WHERE created_at > NOW() - INTERVAL '1 hour';

Also enforce query complexity limits: max_execution_time in MySQL and statement_timeout in PostgreSQL to reject unbounded scans on replica endpoints.

5. Incident response — freshness guarantees cannot be met

When multiple replicas breach thresholds simultaneously and primary rate limits are hit, execute this sequence:

Open the circuit: Force all NEAR_RT and EVENTUAL traffic to the primary with a strict connection cap (mysql-max_connections or pg_hba.conf limit for replica users).
Drain long-running replica queries: SELECT pg_terminate_backend(pid) FROM pg_stat_activity WHERE state = 'idle in transaction' AND query_start < NOW() - INTERVAL '30 seconds' (PostgreSQL) or KILL QUERY <id> for blocking threads (MySQL).
Validate replication state: Run heartbeat delta checks and GTID/LSN gap analysis on each replica.
Restore gradually: Increase replica weights at 10% increments, monitoring buffer pool hit ratio and query latency p99 between each increment.

Child pages

Configuring MySQL read-only routing with lag thresholds — Step-by-step ProxySQL hostgroup mapping, query rule priority ordering, and fallback syntax for enforcing per-tier lag constraints in MySQL environments.

Operational checklist

All data domains mapped to a freshness tier in the service contract or data dictionary
Freshness annotations injected at the ORM or middleware layer — not left to individual developers
Unannotated SELECT queries route to the primary (quarantine), not silently to replicas
ProxySQL query rules anchored with ^ to prevent write queries matching read rules
Heartbeat table populated every 250ms; delta measured on each replica before routing decisions
GTID/LSN gap tracking enabled alongside time-based lag metrics
Dynamic weight adjustment logic includes jitter to prevent thundering herd
Circuit breaker fallback paths declared for every tier before go-live
Prometheus alerts active for lag thresholds, fallback counts, and circuit breaker state
ANALYZE scheduled on replicas after bulk load operations
Connection pool sizing independent per freshness tier (no noisy-neighbour risk)
Incident runbook (drain → validate → restore) rehearsed in staging under simulated lag

FAQ

How do I prevent strict queries from silently reading stale data?

Tag STRICT queries with a SQL comment annotation and route them exclusively to a writer hostgroup or synchronous standby. Configure ProxySQL query rules with apply=1 so the rule short-circuits before any replica hostgroup is evaluated. Pair this with application-level assertions that verify result freshness against a heartbeat table timestamp.

What is the safest fallback when all replicas exceed the lag threshold?

Open the circuit for NEAR_RT traffic: return a 503 or cached response rather than silently routing to a stale replica. Apply strict rate limiting when falling back to the primary to prevent write amplification. For EVENTUAL traffic, allow continued routing to the least-lagged replica rather than failing outright.

Should I use a proxy or application-layer routing for freshness-aware queries?

Proxies (ProxySQL, PgBouncer, HAProxy) offer zero code changes and centralised control but are opaque to application logic. ORM-level routing (Django DB Routers, Spring AbstractRoutingDataSource) gives explicit developer control and is easier to test in isolation. Most high-throughput systems use a hybrid: transparent proxy for bulk analytics and explicit ORM switching for transactional services.

Replication Lag & Consistency Management — parent section covering the full lag detection, measurement, and mitigation landscape.
Detecting and handling replication lag in real-time — the lag measurement signals (heartbeat tables, GTID gap tracking, pg_stat_replication) that feed routing decisions on this page.
Fallback strategies when replicas fall behind — what to do when the circuit opens and freshness guarantees cannot be met.
Eventual consistency patterns for read-heavy workloads — when to accept staleness and how to design application flows that tolerate it safely.
Implementing read/write splitting at the proxy layer — the foundational proxy routing setup that freshness-aware rules build on top of.
Connection pool architecture for read replicas — how to size and isolate connection pools per freshness tier to prevent resource contention.

Routing Queries Based on Data Freshness Requirements

Problem framing

Concept definition and scope

Freshness tier model

Mechanism deep-dive

Step 1 — Classify data domains at design time

Step 2 — Inject freshness annotations at the query layer

Step 3 — Configure proxy routing rules

Step 4 — Wire lag detection into routing weights

Step 5 — Define deterministic fallback paths

Architecture overview

Trade-off comparison table

Configuration runbook

ProxySQL — full freshness-aware setup

PostgreSQL — lag detection query for routing agents

Monitoring and alerting signals

Failure modes and recovery steps

1. Unannotated queries silently routed to replicas

2. Thundering herd when multiple replicas simultaneously exceed the threshold

3. Proxy rule regex misclassifying writes as reads

4. Statistics lag causing suboptimal execution plans on replicas

5. Incident response — freshness guarantees cannot be met

Child pages

Operational checklist

FAQ