single-term-leader: IXT is parking it (advanced mode preferred) — failover liveness regression persists after Pre-Vote + back-off #2

Closed
opened 2026-05-23 15:00:17 +02:00 by buildagent · 3 comments
Member

Decision (IXT side)

IXT is disabling the single-term-leader feature for now and running OpenRaft's default advanced (term, node_id) mode. No action required from the openraft team — this is a heads-up + a parked open question. Follow-up to #1 (which you fully delivered).

Why

After pinning the combined rev you published (ixt-stable @ 7a74abbe: single-term-leader + Pre-Vote cfb20a8c + greater-log back-off d49ef291 + I189/change_membership), IXT's leader-failover chaos gate (3-voter, hard-kill leader, in-process) gave these results:

| config | test_leader_failover | test_data_integrity_under_chaos |
|---|---|---|
| single-term-leader + Pre-Vote + back-off (7a74abbe) | ~86% (19/22) | ~91% (20/22) |
| advanced mode + same rev | 6/6 deterministic (~2.6s) | 6/6 |

The correctness bug from #1 (term-thrash livelock) is fixed — thank you. What remains is liveness/latency: single-term failover is variable (2.6–8.3s, tail past our 15s budget), because with no (term, node_id) tiebreak two equal-log survivors split-vote and rely entirely on randomized-timeout convergence (+ Pre-Vote's extra round). Advanced mode's leader-id ordering sidesteps that, giving deterministic fast failover.
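To make the tiebreak argument concrete, here is a minimal sketch of the two orderings as we understand them. It is a simplified model, not openraft's actual types: `Vote`, `advanced_cmp` and `single_term_cmp` are hypothetical names introduced only for illustration.

```python
from typing import NamedTuple, Optional


class Vote(NamedTuple):
    """Hypothetical stand-in for a leader id: the term plus the candidate's node id."""
    term: int
    node_id: int


def advanced_cmp(a: Vote, b: Vote) -> int:
    """Total order on (term, node_id): returns -1, 0 or 1.

    Two candidates in the same term are always comparable, so one of them wins.
    """
    return (a > b) - (a < b)


def single_term_cmp(a: Vote, b: Vote) -> Optional[int]:
    """Partial order: terms are compared, but two different candidates in the
    same term are incomparable (None), so neither can displace the other."""
    if a.term != b.term:
        return (a.term > b.term) - (a.term < b.term)
    return 0 if a.node_id == b.node_id else None


# Two equal-log survivors both campaign in term 5.
print(advanced_cmp(Vote(5, 2), Vote(5, 1)))     # 1: node 2 outranks node 1
print(single_term_cmp(Vote(5, 2), Vote(5, 1)))  # None: stalemate until a later term
```

Under this model, the advanced ordering lets a same-term conflict resolve immediately in favor of the higher node id, while the single-term ordering leaves it to a later attempt with a higher term, which is where the timer-driven convergence comes in.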

For IXT this is a clear trade: single-term-leader's "textbook one-leader-per-term" is a purity property we don't need (advanced mode is equally safe — no split-brain, it's the OpenRaft default and what we'd run anyway), and it would also mean IXT permanently carries more fork divergence (the migration + Pre-Vote + back-off as load-bearing) — the opposite of our "reduce bespoke Raft surface" goal that drove this whole effort.

What IXT keeps

  • Stays pinned at 7a74abbe with single-term-leader off. Pre-Vote + back-off + I189 + change_membership remain active and benefit advanced mode too — they're welcome regardless.
  • No need to revert anything in the fork. The guard-removal / single-term-leader support can stay in ixt-stable for future use; IXT just won't enable the feature.

Parked open question (only if IXT ever revisits single-term-leader)

Is the residual failover variance fundamental to single-term-leader (no leader-id tiebreak → split-vote convergence is purely timer-driven), or is there a timer/pre-vote profile that makes single-term failover deterministic enough for a tight CI budget? Not blocking — parking it here so the context isn't lost.

cc IXT: main will be un-blocked by disabling the feature (advanced mode = 6/6); no fork change needed from you.

Author
Member

Update — un-parking this: single-term-leader is REVIVED and green

This issue reported single-term failover variance (~86–91%) on 7a74abbe and IXT's plan to run advanced mode instead. The parked open question — is the residual variance fundamental, or is there a timer/pre-vote profile that makes single-term failover deterministic enough? — is answered by 590558b8: the per-attempt election-timeout re-randomization (4cd6b553 + c88a2ef9) removes the split-vote tail.

On 590558b8 IXT's leader-failover chaos gate went 86–91% → 16/16 locally, and IXT main-integration is now fully green on main with single-term-leader ENABLED (run #2182: main-test + distributed-tests + benchmark all pass).

So IXT is keeping single-term-leader on, not parking it — the trade reversed once the timer fix landed. Thank you; this can be closed as resolved.

Author
Member

Closing — parked question answered; single-term-leader revived and green

The parked open question (is the single-term failover variance fundamental, or fixable via the timer/pre-vote profile?) was answered: roughly 90% of it was a fixable bug. openraft froze the election timeout at node creation and never re-randomized it per attempt. The fix (4cd6b553/c88a2ef9, standard Raft §5.2 per-attempt re-randomization) collapsed the split-vote tail.
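The difference between a frozen timeout and per-attempt re-randomization can be illustrated with a toy simulation. This is a sketch under stated assumptions, not openraft code: two candidates, a split vote whenever their timeouts land within `window` ms of each other, and hypothetical values for the timeout range (`lo`, `hi`), `window` and `max_attempts`.

```python
import random


def attempts_until_elected(rng, rerandomize, lo=150, hi=300, window=10, max_attempts=50):
    """Toy model: return the attempt on which a leader is elected, or None.

    Two candidates split the vote whenever their election timeouts are within
    `window` ms of each other. All parameter values are illustrative.
    """
    a, b = rng.randint(lo, hi), rng.randint(lo, hi)
    for attempt in range(1, max_attempts + 1):
        if abs(a - b) > window:
            return attempt  # one candidate times out clearly first and wins
        if rerandomize:
            # Per-attempt re-randomization: draw fresh timeouts after a split vote.
            a, b = rng.randint(lo, hi), rng.randint(lo, hi)
        # Frozen timeouts: a and b never change, so the same collision repeats.
    return None


def stuck_fraction(rerandomize, trials=2000, seed=0):
    """Fraction of trials that never elect a leader within max_attempts."""
    rng = random.Random(seed)
    stuck = sum(attempts_until_elected(rng, rerandomize) is None for _ in range(trials))
    return stuck / trials


print("frozen timeouts:       ", stuck_fraction(rerandomize=False))
print("re-randomized timeouts:", stuck_fraction(rerandomize=True))
```

In this model a frozen pair either succeeds on the first attempt or collides on every attempt, whereas a re-randomized pair gets an independent chance each time, so being stuck for many attempts becomes vanishingly unlikely.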

Per IXT's confirmation above (comment 1321): the chaos gate went 86–91% → 16/16, IXT is keeping single-term-leader enabled, and main-integration is green on main (run #2182). Closing as resolved.

Author
Member

Answered + closing: single-term failover cannot be made deterministic at 7 voters; IXT moved to advanced mode

The parked question here was "can single-term-leader failover be made deterministic?" We now have hard data at the production control-plane size (7 voters), measured on the current pin (v0.9.24, which includes the 6ff217da single-term de-flake). test_50_node_leader_failover (kill leader → poll for re-election), 10 runs each:

| | single-term-leader | advanced mode |
|---|---|---|
| re-election times | 3, 7, 10, 10, 28, 42, 56s + 3× >60s | 2s, then 3.0s ×9 |
| failures (>60s) | 3/10 (30%) | 0/10 |

Conclusion: no. Even with the 0.9.24 de-flake, single-term failover at 7 voters is non-deterministic with a severe tail (30% don't elect within 60s), because there's no (term,node_id) tiebreak — a split vote among the survivors is broken only by per-attempt randomized election-timeout, a geometric tail that's fine at 3 voters (where the chaos gate lives) but pathological at 7. Advanced mode's tiebreak structurally prevents the stalemate → deterministic ~3s.
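The "geometric tail" can be sketched numerically. The model below assumes each election attempt succeeds independently with the same probability; `tail_probability` and `expected_attempts` are hypothetical helpers, and the probabilities in the usage lines are illustrative values, not measurements from the runs above.

```python
def tail_probability(p_success: float, attempts: int) -> float:
    """P(no leader after `attempts` rounds) when each round independently
    succeeds with probability `p_success` (geometric distribution)."""
    if not 0.0 < p_success <= 1.0:
        raise ValueError("p_success must be in (0, 1]")
    if attempts < 0:
        raise ValueError("attempts must be non-negative")
    return (1.0 - p_success) ** attempts


def expected_attempts(p_success: float) -> float:
    """Mean number of rounds until the first successful election."""
    if not 0.0 < p_success <= 1.0:
        raise ValueError("p_success must be in (0, 1]")
    return 1.0 / p_success


# Illustrative only: a high per-attempt success rate versus a low one,
# both given a budget of 10 attempts.
print(tail_probability(0.5, 10))  # about 0.001
print(tail_probability(0.1, 10))  # about 0.35
```

The point of the sketch is that the tail is very sensitive to the per-attempt success probability: if more survivors make a clean win less likely on any one attempt, the chance of blowing a fixed time budget grows quickly even though every individual attempt is still random.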

IXT decision: the control plane runs advanced mode (the single-term-leader feature is retired — h-dv/ixt PR #157). Advanced is openraft's default anyway, so this also reduces our fork divergence. Single-term had been adopted twice (ixt #143/#150) on 3-voter chaos evidence that never exercised the 7-voter control plane; this settles it.

No action needed on the fork — keeping the single-term-leader feature available is fine for anyone who wants it; IXT just won't enable it. Thanks for the de-flake work along the way. Closing openraft#2.

Reference
h-dv/openraft#2