Pre-Vote: stale grants can promote to a real election after a same-leader heartbeat refreshes the lease #4
Reference: h-dv/openraft#4
Summary
The current Pre-Vote implementation keeps `Engine::pre_candidate` alive across same-vote leader heartbeats. That avoids starving a pre-vote round, but it creates a stale-grant window: an old pre-vote quorum can still promote into `Engine::elect()` after a live leader has refreshed this node's leader lease with AppendEntries. This weakens the main Pre-Vote guarantee from the Raft dissertation (§9.6): a node that has recently heard from a valid leader should not start a disruptive real election.
Code pointers
- `openraft/src/engine/handler/vote_handler/mod.rs`: `update_vote()` clears `pre_candidate` only when `vote > self.state.vote_ref()`. Same-vote committed leader heartbeats call `touch()` and leave the pre-vote round intact.
- `openraft/src/engine/engine_impl.rs`: `handle_pre_vote_resp()` promotes on quorum by clearing `pre_candidate` and calling `self.elect()` without re-checking current leader lease / election timer freshness.
- `openraft/src/core/raft_core.rs`: `handle_notify()` routes `resp.pre_vote` to `handle_pre_vote_resp()` purely by matching the stored prospective vote.

Plausible failure sequence
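- The node starts a Pre-Vote round (`pre_elect()`) and receives grants one short of quorum.
- A same-vote leader heartbeat (AppendEntries) refreshes `vote_last_modified` / leader lease.
- `pre_candidate` survives.
- A delayed grant completes the quorum, and `handle_pre_vote_resp()` calls `elect()` immediately, incrementing/persisting a real vote despite the fresh leader lease.

The current unit/integration tests cover lease rejection when answering Pre-Vote and isolated-follower non-disruption, but they do not cover this stale-response-after-heartbeat interleaving.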
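The following minimal toy model in Rust makes the interleaving concrete. It is not openraft's code. `ToyNode`, `on_heartbeat`, and the other names are hypothetical. Votes are simplified to `(term, node_id)` pairs compared lexicographically, and time is a logical `u64`. The model reproduces only the two behaviors listed under the code pointers: an equal vote merely refreshes the lease, and promotion on quorum never looks at the lease.

```rust
// Toy model of the stale-grant window (NOT openraft code). All names are
// hypothetical; votes are simplified to (term, node_id) and compared
// lexicographically, and time is a logical u64 passed in explicitly.
use std::collections::BTreeSet;

#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub struct Vote {
    pub term: u64,
    pub node_id: u64,
}

#[derive(Debug)]
pub struct PreCandidate {
    pub prospective: Vote,
    pub granted: BTreeSet<u64>,
}

#[derive(Debug)]
pub struct ToyNode {
    pub id: u64,
    pub cluster_size: usize,
    pub vote: Vote,
    pub lease_len: u64,
    pub lease_until: u64,
    pub pre_candidate: Option<PreCandidate>,
    pub real_elections: u32,
}

impl ToyNode {
    /// Start a pre-vote round for term + 1; the node grants itself.
    pub fn pre_elect(&mut self) {
        let prospective = Vote { term: self.vote.term + 1, node_id: self.id };
        self.pre_candidate = Some(PreCandidate {
            prospective,
            granted: BTreeSet::from([self.id]),
        });
    }

    /// Like update_vote() in the issue: only a strictly greater vote clears
    /// the pre-vote round; an equal vote just refreshes the lease (touch()).
    pub fn on_heartbeat(&mut self, leader_vote: Vote, now: u64) {
        if leader_vote > self.vote {
            self.vote = leader_vote;
            self.pre_candidate = None;
        }
        if leader_vote == self.vote {
            self.lease_until = now + self.lease_len;
        }
    }

    /// Buggy promotion: reaching quorum starts a real election without
    /// re-checking whether the leader lease is still valid.
    pub fn handle_pre_vote_resp(&mut self, from: u64, prospective: Vote) {
        let quorum = self.cluster_size / 2 + 1;
        let Some(pc) = self.pre_candidate.as_mut() else { return };
        if pc.prospective != prospective {
            return; // response to some other round
        }
        pc.granted.insert(from);
        if pc.granted.len() >= quorum {
            let next = pc.prospective;
            self.pre_candidate = None;
            self.vote = next; // stands in for elect(): persist a real vote
            self.real_elections += 1;
        }
    }
}
```

Run the model through the sequence above: one grant, then a same-vote heartbeat at t=50 that extends the lease to t=150, then the delayed grant. It still produces one real election.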
Expected behavior
Any accepted AppendEntries from the current leader after a Pre-Vote round starts should make old pre-vote grants unable to trigger a real election unless the election timer/lease has expired again.
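The expected behavior can be stated as an invariant: a pre-vote grant may count toward promotion only if no heartbeat from the current leader has been accepted since its round began. One way to enforce it is to tag each round with an election-timer generation that responses echo back. The sketch below assumes exactly that. `EpochNode`, `timer_generation`, and `PreVoteResp::generation` are hypothetical names, and openraft's `Notify::VoteResponse` is not claimed to carry such a field.

```rust
// Hypothetical sketch (NOT openraft code): each pre-vote round remembers the
// election-timer generation it started in, and responses echo it back.
// EpochNode, timer_generation and PreVoteResp::generation are invented names.
use std::collections::BTreeSet;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct PreVoteResp {
    pub from: u64,
    pub granted: bool,
    /// Echoed from the request that belongs to the round being answered.
    pub generation: u64,
}

#[derive(Debug, Default)]
pub struct EpochNode {
    pub cluster_size: usize,
    /// Bumped every time an accepted leader heartbeat resets the timer.
    pub timer_generation: u64,
    /// (generation the round started in, voters that granted so far)
    pub round: Option<(u64, BTreeSet<u64>)>,
    pub promoted: bool,
}

impl EpochNode {
    /// Start a pre-vote round; returns the generation to put in requests.
    pub fn start_pre_vote(&mut self, self_id: u64) -> u64 {
        self.round = Some((self.timer_generation, BTreeSet::from([self_id])));
        self.timer_generation
    }

    /// Accepted heartbeat from the current leader: advance the generation.
    /// The round is kept (so it is not starved), but it can no longer complete.
    pub fn on_leader_heartbeat(&mut self) {
        self.timer_generation += 1;
    }

    pub fn handle_pre_vote_resp(&mut self, resp: PreVoteResp) {
        let quorum = self.cluster_size / 2 + 1;
        let current = self.timer_generation;
        let Some((round_gen, granted)) = self.round.as_mut() else { return };
        // Ignore denials, responses from an older round, and any response
        // once the timer was refreshed after this round started.
        if !resp.granted || resp.generation != *round_gen || *round_gen != current {
            return;
        }
        granted.insert(resp.from);
        if granted.len() >= quorum {
            self.round = None;
            self.promoted = true; // stands in for elect()
        }
    }
}
```

The stale round is deliberately not cleared. It cannot complete, and the next real timeout replaces it with a fresh round, which can still win.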
Suggested fixes
- Add a `pre_vote_epoch` / election timer generation and carry it through `Notify::VoteResponse`, rejecting responses from old rounds.
- Clear `pre_candidate` on accepted AppendEntries/heartbeat from the current committed leader, but avoid the starvation noted in `docs/design/pre-vote-spike.md` by restarting only after the next real timeout.
- At minimum, re-check the local leader lease/election timeout immediately before promotion in `handle_pre_vote_resp()`.

Suggested regression test
Engine-level deterministic test:
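1. Call `pre_elect()` and receive one grant short of quorum.
2. Deliver an AppendEntries heartbeat from the same committed leader, refreshing the leader lease.
3. Deliver the delayed grant that completes the quorum.
4. Assert no `SaveVote`, no real `SendVote{pre_vote:false}`, and that the state remains follower until the next election timeout.

This is consensus-critical because it can turn Pre-Vote back into a disruptive election under message reordering.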
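The regression test steps above can be run against a small deterministic harness, sketched here under some assumptions. `ToyEngine` and `Cmd` are hypothetical stand-ins that only mimic an engine turning inputs into output commands. `Cmd::SaveVote` and `Cmd::SendVote` borrow their names from the test description, not from openraft's actual command types. Time is a logical `u64`. The minimal fix (re-checking the lease right before promotion) is already applied, so the stale interleaving must emit no real vote.

```rust
// Hypothetical harness (NOT openraft's engine): state in, commands out,
// with explicit logical time so the interleaving is fully deterministic.
use std::collections::BTreeSet;

#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Cmd {
    SaveVote { term: u64 },
    SendVote { term: u64, pre_vote: bool },
}

pub struct ToyEngine {
    pub id: u64,
    pub voters: usize,
    pub term: u64,
    pub lease_until: u64,
    pub is_candidate: bool,
    pub pre_granted: Option<BTreeSet<u64>>,
    pub output: Vec<Cmd>,
}

impl ToyEngine {
    pub fn new(id: u64, voters: usize, term: u64) -> Self {
        ToyEngine {
            id,
            voters,
            term,
            lease_until: 0,
            is_candidate: false,
            pre_granted: None,
            output: Vec::new(),
        }
    }

    /// Start a pre-vote round: self-grant plus a pre-vote broadcast.
    pub fn pre_elect(&mut self) {
        self.pre_granted = Some(BTreeSet::from([self.id]));
        self.output.push(Cmd::SendVote { term: self.term + 1, pre_vote: true });
    }

    /// Same-vote leader heartbeat: refresh the lease, keep the pre-vote round.
    pub fn heartbeat(&mut self, now: u64, lease: u64) {
        self.lease_until = now + lease;
    }

    pub fn handle_pre_vote_resp(&mut self, from: u64, now: u64) {
        let quorum = self.voters / 2 + 1;
        let Some(granted) = self.pre_granted.as_mut() else { return };
        granted.insert(from);
        if granted.len() < quorum {
            return;
        }
        self.pre_granted = None;
        // The fix: abandon the round if the leader lease is still valid.
        if now < self.lease_until {
            return;
        }
        self.term += 1;
        self.is_candidate = true;
        self.output.push(Cmd::SaveVote { term: self.term });
        self.output.push(Cmd::SendVote { term: self.term, pre_vote: false });
    }
}
```

The test drives the steps in order: grant at t=0, heartbeat at t=10 (lease until t=110), delayed grant at t=20. It then checks that a fresh round after the lease expires still promotes.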
Fixed on `ixt-stable` (CI green, run #2179, 14/14) in 4d4d715f: re-check leader lease before promoting a Pre-Vote round.

Implemented the "at minimum, re-check the local leader lease/election timeout immediately before promotion" suggestion (the lowest-risk of the three options; no wire/protocol change):
- Extracted `Engine::leader_lease_is_unexpired()` from `would_grant_vote` (step 1, the same lease check used when answering a vote).
- `handle_pre_vote_resp()` now calls it immediately before promotion. If a leader lease was refreshed after the pre-vote round started, the stale round is abandoned (`pre_candidate = None; return`) instead of calling `elect()`. This closes the stale-grant window from the failure sequence in the issue: a delayed grant can no longer ratchet a real vote after a same-vote leader heartbeat has refreshed the lease.

This restores the §9.6 guarantee (a node that recently heard from a valid leader does not start a disruptive real election). It avoids the starvation risk noted in `pre-vote-spike.md` because the round is only abandoned and the election timer is not reset.

Regression tests (`openraft/src/engine/tests/pre_vote_test.rs`):
- `test_handle_pre_vote_resp_stale_grant_blocked_by_refreshed_lease`: the exact 5-step interleaving from the issue (grant short of quorum → same-leader heartbeat refreshes lease → delayed grant completes quorum); asserts no promotion / no real vote.
- `test_handle_pre_vote_resp_promotes_when_lease_genuinely_expired`: positive liveness check; when the lease has expired, quorum still promotes normally.

Closing as fixed.
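The fix can be sketched as one lease predicate shared by the "answer a vote" path and the "promote a pre-vote round" path. Only three things come from the fix description: the names `leader_lease_is_unexpired` and `would_grant_vote`, and the abandon-without-resetting-the-timer behavior. `FollowerState`, its fields, `on_pre_vote_quorum`, and the logical-time signatures are hypothetical. Only the lease step of `would_grant_vote` is modeled.

```rust
// Hypothetical sketch (NOT openraft code) of sharing one lease check between
// granting votes and promoting a pre-vote round. Time is a logical u64.
pub struct FollowerState {
    /// Logical time until which a known leader is considered alive, if any.
    pub leader_lease_until: Option<u64>,
    /// Next election timeout; deliberately untouched when a round is abandoned.
    pub election_deadline: u64,
    pub pre_candidate: bool,
    pub elected: bool,
}

impl FollowerState {
    /// The single lease predicate used by both paths below.
    pub fn leader_lease_is_unexpired(&self, now: u64) -> bool {
        matches!(self.leader_lease_until, Some(until) if now < until)
    }

    /// Only the lease step is modeled: refuse to grant while our leader is live.
    pub fn would_grant_vote(&self, now: u64) -> bool {
        !self.leader_lease_is_unexpired(now)
    }

    /// Called once a pre-vote quorum has been reached.
    pub fn on_pre_vote_quorum(&mut self, now: u64) {
        if !self.pre_candidate {
            return;
        }
        self.pre_candidate = false; // the round ends either way
        if self.leader_lease_is_unexpired(now) {
            return; // stale round: abandon it, keep election_deadline as-is
        }
        self.elected = true; // stands in for elect()
    }
}
```

Leaving `election_deadline` unchanged is what sidesteps the starvation concern. If the leader really disappears, the node still times out on schedule and starts a fresh round.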