Replication: empty `limited_get_log_entries()` workaround hides storage corruption and can stall catch-up indefinitely #3

New issue

Closed

opened 2026-05-23 16:16:35 +02:00 by buildagent · 1 comment

buildagent commented

2026-05-23 16:16:35 +02:00

Member

Summary

ReplicationCore::send_log_entries() treats an empty limited_get_log_entries(start, end) result for a non-empty range as a heartbeat plus 10ms sleep. This avoids the old panic, but it also hides a RaftLogReader contract violation and can stall replication forever if the storage layer has a hole or a buggy reader.

The current behavior turns a correctness/storage problem into slow liveness degradation with only a warning.

Code pointers

openraft/src/replication/mod.rs: when start != end and logs.is_empty(), the code logs a warning, sleeps 10ms, and returns LogIdRange::new(rng.prev, rng.prev) as if it were a heartbeat.
openraft/src/storage/v2.rs: RaftLogStorage requires no holes and serialized writes. RaftLogReader::limited_get_log_entries() says it must not return empty for non-empty input.
IXT currently relies on this path because its SqliteLogStore does not override limited_get_log_entries() and its try_get_log_entries() can return empty for a bounded non-empty range if the first requested key is missing.

Why this matters

A leader trying to catch up a follower can repeatedly request a range that the log reader should provide. If storage returns empty, OpenRaft now sends a heartbeat-sized AppendEntries and sleeps, never advancing the follower and never surfacing a fatal storage error. This is especially risky because IXT's fork documentation explicitly says the branch was backported as a panic workaround.

Expected behavior

For production builds, a non-empty requested log range returning empty should be treated as a storage/contract error, not as successful heartbeat progress. It should be visible in metrics and fail loudly enough that the node/operator can repair storage.

Suggested fixes

Replace the heartbeat workaround with a ReplicationError::StorageError or a new explicit contract-violation error.
If the workaround is still needed for compatibility, gate it behind a feature flag and emit a counter/metric that fails CI in IXT.
Add a storage conformance test suite that exercises limited_get_log_entries() for present, partially missing, purged, and hole ranges.

Suggested regression test

Use a test RaftLogReader whose limited_get_log_entries(1, 2) returns Ok(vec![]); assert replication reports a storage/contract error instead of retrying forever or reporting heartbeat-like progress.

## Summary `ReplicationCore::send_log_entries()` treats an empty `limited_get_log_entries(start, end)` result for a non-empty range as a heartbeat plus 10ms sleep. This avoids the old panic, but it also hides a `RaftLogReader` contract violation and can stall replication forever if the storage layer has a hole or a buggy reader. The current behavior turns a correctness/storage problem into slow liveness degradation with only a warning. ## Code pointers - `openraft/src/replication/mod.rs`: when `start != end` and `logs.is_empty()`, the code logs a warning, sleeps 10ms, and returns `LogIdRange::new(rng.prev, rng.prev)` as if it were a heartbeat. - `openraft/src/storage/v2.rs`: `RaftLogStorage` requires no holes and serialized writes. `RaftLogReader::limited_get_log_entries()` says it must not return empty for non-empty input. - IXT currently relies on this path because its `SqliteLogStore` does not override `limited_get_log_entries()` and its `try_get_log_entries()` can return empty for a bounded non-empty range if the first requested key is missing. ## Why this matters A leader trying to catch up a follower can repeatedly request a range that the log reader should provide. If storage returns empty, OpenRaft now sends a heartbeat-sized AppendEntries and sleeps, never advancing the follower and never surfacing a fatal storage error. This is especially risky because IXT's fork documentation explicitly says the branch was backported as a panic workaround. ## Expected behavior For production builds, a non-empty requested log range returning empty should be treated as a storage/contract error, not as successful heartbeat progress. It should be visible in metrics and fail loudly enough that the node/operator can repair storage. ## Suggested fixes - Replace the heartbeat workaround with a `ReplicationError::StorageError` or a new explicit contract-violation error. - If the workaround is still needed for compatibility, gate it behind a feature flag and emit a counter/metric that fails CI in IXT. - Add a storage conformance test suite that exercises `limited_get_log_entries()` for present, partially missing, purged, and hole ranges. ## Suggested regression test Use a test `RaftLogReader` whose `limited_get_log_entries(1, 2)` returns `Ok(vec![])`; assert replication reports a storage/contract error instead of retrying forever or reporting heartbeat-like progress.

dhoyer referenced this issue from a commit

2026-05-23 16:55:10 +02:00

fix(replication): empty limited_get_log_entries → loud StorageError (issue #3)

dhoyer referenced this issue from a commit

2026-05-23 18:29:36 +02:00

fix(replication): vote-gate StorageError so a stale stream can't FATAL a healthy node (issue #3 follow-up)

buildagent commented

2026-05-23 21:30:44 +02:00

Author

Member

Fixed on `ixt-stable` (CI green, run #2179, 14/14)

Implemented the "treat empty-for-non-empty-range as a storage/contract error" behavior, plus the suggested feature gate and conformance suite.

`3d6da188` — empty `limited_get_log_entries()` → loud `StorageError`

Replaced the heartbeat + 10ms-sleep workaround in ReplicationCore::send_log_entries(). A non-empty requested range that returns empty is now a ReplicationError::StorageError by default — the node fails loudly instead of silently stalling catch-up.
Compatibility escape hatch: gated the old tolerant behavior behind a new opt-in cargo feature tolerate-empty-log-range (OFF by default, and in no CI feature combo — so CI never compiles it; verify manually if you enable it).
Extended Suite::limited_get_log_entries storage-conformance test (openraft/src/testing/suite.rs) to exercise present / partially-missing ranges, so a non-conforming reader is caught by the conformance suite.

Why flipping the default to FATAL is safe (verified in code, not assumed): ProgressEntry::next_send clamps start to the purge boundary and routes below-purge reads to snapshot, and try_purge_log postpones purging in-flight ranges. So an empty read for a non-empty range is a genuine RaftLogReader contract violation, never a benign purge race.

`0789a571` — follow-up: vote-gate the `StorageError` (regression fix)

Flipping the default surfaced that Response::StorageError was the only Notify::Network arm in raft_core.rs not gated on vote freshness. On a leadership change with an uncommitted-log revert (normal under loosen-follower-log-revert), an in-flight replication stream from the old epoch reads the truncated range and its StorageError would have FATAL'd the now-healthy ex-leader.

Fix: tag StorageError with sender_vote and discard it in RaftCore unless the vote matches (mirrors the existing HigherVote gating).
Deterministic regression test tests/.../replication/t70_stale_session_storage_error.rs, backed by a new BlockOperation::LimitedGetLogEntries memstore delay hook — proven to fail without the gate and pass with it.

⚠️ Operational caveat for downstream stores (the one thing outside this repo)

The default now goes FATAL on a genuine contract violation. IXT's SqliteLogStore (separate repo) is the reason the old workaround existed — its try_get_log_entries() could return empty for a bounded non-empty range. Before pinning this rev, that store must either (a) uphold the contract — ideally override limited_get_log_entries(), or at minimum never return empty for a range overlapping present entries (the extended conformance test will catch it), or (b) run with tolerate-empty-log-range as a documented break-glass during transition.

Closing as fixed. The SqliteLogStore decision above is tracked downstream.

## Fixed on `ixt-stable` (CI green, run #2179, 14/14) Implemented the "treat empty-for-non-empty-range as a storage/contract error" behavior, plus the suggested feature gate and conformance suite. ### `3d6da188` — empty `limited_get_log_entries()` → loud `StorageError` - Replaced the heartbeat + 10ms-sleep workaround in `ReplicationCore::send_log_entries()`. A non-empty requested range that returns empty is now a `ReplicationError::StorageError` by default — the node fails loudly instead of silently stalling catch-up. - Compatibility escape hatch: gated the old tolerant behavior behind a new opt-in cargo feature **`tolerate-empty-log-range`** (OFF by default, and in **no** CI feature combo — so CI never compiles it; verify manually if you enable it). - Extended `Suite::limited_get_log_entries` storage-conformance test (`openraft/src/testing/suite.rs`) to exercise present / partially-missing ranges, so a non-conforming reader is caught by the conformance suite. **Why flipping the default to FATAL is safe** (verified in code, not assumed): `ProgressEntry::next_send` clamps `start` to the purge boundary and routes below-purge reads to snapshot, and `try_purge_log` postpones purging in-flight ranges. So an empty read for a non-empty range is a genuine `RaftLogReader` contract violation, never a benign purge race. ### `0789a571` — follow-up: vote-gate the `StorageError` (regression fix) Flipping the default surfaced that `Response::StorageError` was the **only** `Notify::Network` arm in `raft_core.rs` not gated on vote freshness. On a leadership change with an uncommitted-log revert (normal under `loosen-follower-log-revert`), an in-flight replication stream from the *old* epoch reads the truncated range and its `StorageError` would have FATAL'd the now-healthy ex-leader. - Fix: tag `StorageError` with `sender_vote` and discard it in RaftCore unless the vote matches (mirrors the existing `HigherVote` gating). - Deterministic regression test `tests/.../replication/t70_stale_session_storage_error.rs`, backed by a new `BlockOperation::LimitedGetLogEntries` memstore delay hook — proven to fail without the gate and pass with it. ### ⚠️ Operational caveat for downstream stores (the one thing outside this repo) The default now goes FATAL on a genuine contract violation. IXT's `SqliteLogStore` (separate repo) is the reason the old workaround existed — its `try_get_log_entries()` could return empty for a bounded non-empty range. **Before pinning this rev, that store must either** (a) uphold the contract — ideally override `limited_get_log_entries()`, or at minimum never return empty for a range overlapping present entries (the extended conformance test will catch it), **or** (b) run with `tolerate-empty-log-range` as a documented break-glass during transition. Closing as fixed. The `SqliteLogStore` decision above is tracked downstream.

buildagent closed this issue

2026-05-23 21:30:53 +02:00