h-dv/code-index

Primary MCP server attaches to a linked project's daemon → all tools fail with -32603 "sending <op>" #7

Closed

opened 2026-06-08 15:09:06 +02:00 by buildagent · 3 comments

buildagent commented

2026-06-08 15:09:06 +02:00

Member

Summary

In a workspace with a linked dependency project, the primary MCP server attached to the linked project's daemon instead of its own. Every tool call then returned a JSON-RPC -32603 internal error (sending stats, sending file_outline, sending search_symbols, …) because the daemon was serving a different repo root than the one being queried. Reconnecting the MCP server resolved it.

Two distinct defects:

1. Daemon misattachment: the primary server connected to the wrong (linked) daemon even though its own daemon was alive and listening.
2. Swallowed errors / logging gap: the internal failures were never written to the server's stderr, so they're undiagnosable without process forensics.

Version

- code-index-mcp 0.3.1
- code-index-daemon 0.3.1 (installed at /usr/local/bin/)
- MCP protocol 2025-11-25; client claude-code 2.1.168
- Newer build v0.4.1 was available but not installed (not yet tested against this bug).

Symptoms

All code-index tools failed identically. Verbatim errors (JSON-RPC -32603 = Internal error):

project_overview  →  MCP error -32603: sending stats
file_outline      →  MCP error -32603: sending file_outline
search_symbols    →  MCP error -32603: sending search_symbols

The message is always sending <operation>. Nothing was logged to the server's stderr for these — only startup lines appear in the MCP client log.

Root cause

Workspace layout (primary + one linked dependency):

| Project | Root | Daemon PID | Port | Listening |
|---------|------|------------|------|-----------|
| primary (`ixt-iam`) | `…/modules/ixt-iam` | 94831 | 45569 | yes (fd=10) |
| dependency (`ixt`) | `…/rust/ixt` | 2630659 | 40899 | yes (fd=14) |

The primary server's startup log:

INFO starting MCP server root=/home/.../modules/ixt-iam ...
INFO attached to existing daemon port=40899 pid=2630659   ← WRONG: that is the ixt daemon
INFO linked project ready name=ixt root=/home/.../rust/ixt

Port 40899 / pid 2630659 belongs to the linked ixt project (DB …/rust/ixt/.code-index/index.db). The primary server should have attached to port 45569 / pid 94831 — its own daemon, alive, listening, and correctly recorded in …/ixt-iam/.code-index/daemon.toml. Because it queried a daemon scoped to a different root, every request errored with -32603.

Both daemons were confirmed alive and listening (ss -ltnp). Timing: the primary daemon had ~34s uptime while the linked daemon had ~1 day — consistent with a race where the newly-spawned primary daemon was ignored in favor of an already-listening daemon during discovery.

Reproduction (suspected)

1. Open project A (primary) that declares a [[links]] dependency on project B.
2. Have project B's daemon already running/listening.
3. Start the MCP server for project A.
4. Observe that it attaches to B's daemon port; all A-scoped tool calls return -32603 sending <op>.

Expected

The MCP server must attach to (or spawn) the daemon whose root/db matches its own primary root, never a linked project's daemon. Daemon discovery should key on root, not just "first listening daemon found."
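The expectation above can be sketched as a selection function keyed on root. This is a minimal illustration, not the project's discovery code: `DaemonInfo` and `select_daemon` are hypothetical names, the real lockfile schema is not shown in this issue, and the `/work/...` path prefix in the usage note is made up because the report elides the real one.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical record for one discovered daemon (not the real lockfile schema).
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct DaemonInfo {
    pub root: PathBuf,
    pub port: u16,
    pub pid: u32,
}

/// Return the daemon whose root equals the primary root.
/// Daemons serving any other root are ignored, even if they are listening.
pub fn select_daemon<'a>(
    primary_root: &Path,
    candidates: &'a [DaemonInfo],
) -> Option<&'a DaemonInfo> {
    candidates
        .iter()
        .find(|daemon| daemon.root.as_path() == primary_root)
}
```

With the two daemons from the table, calling `select_daemon` with the primary root would pick port 45569 regardless of the order in which the daemons were found, and it would return `None` (so a caller could spawn a fresh daemon) when no root matches.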

Workaround

Reconnect/restart the MCP server (or kill both daemons so they respawn). After reconnect, the primary log read attached to existing daemon port=45569 and all tools returned correct primary-scoped data.

Suggested fixes

- Match daemon by primary root (and/or db path) during discovery; reject/ignore daemons whose root differs.
- Surface internal errors: log the underlying cause to stderr instead of returning only -32603 sending <op> with no server-side trace.
- Re-test under v0.4.1 to confirm whether either defect is already addressed.
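For the second suggestion, one possible shape is to keep the short client-facing message but write the full cause chain to stderr first. The sketch below uses only the standard library; `SendError`, `cause_chain`, and `report` are hypothetical names chosen for illustration and are not taken from the code-index source.

```rust
use std::error::Error;
use std::fmt;

/// Hypothetical wrapper: the operation that failed plus the underlying I/O cause.
#[derive(Debug)]
pub struct SendError {
    pub op: &'static str,
    pub source: std::io::Error,
}

impl fmt::Display for SendError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // This is the short text the client currently sees.
        write!(f, "sending {}", self.op)
    }
}

impl Error for SendError {
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        Some(&self.source)
    }
}

/// Join an error and every error reachable through `source()` into one line.
pub fn cause_chain(err: &dyn Error) -> String {
    let mut out = err.to_string();
    let mut next = err.source();
    while let Some(cause) = next {
        out.push_str(": ");
        out.push_str(&cause.to_string());
        next = cause.source();
    }
    out
}

/// Log the full chain to stderr, then return the short message for the client.
pub fn report(err: &dyn Error) -> String {
    eprintln!("ERROR {}", cause_chain(err));
    err.to_string()
}
```

The client-visible text stays `sending <op>`, while stderr gains the underlying reason (for example a broken pipe), which is the piece that was missing during diagnosis.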


buildagent commented

2026-06-08 15:45:14 +02:00

Author

Member

Investigation update (reviewed against current `v0.4.2`)

Thanks for the unusually thorough report — the daemon table and logs made this tractable. After a deep review of the current code, the diagnosis is partially corrected:

The "primary attached to the linked daemon" reading is a misattribution

With a correct primary lockfile (port 45569, exactly as you documented), the attach path reads <primary_root>/.code-index/daemon.toml and can only ever reach 45569 — it is structurally incapable of attaching to the linked daemon at 40899. The attached to existing daemon port=40899 line you saw was almost certainly the linked project's own attach log: that log line carries only port+pid and no project tag, so it reads identically for primary and links. That observability gap is the first thing we're fixing.
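To make the ambiguity concrete, here is a small sketch of an attach log line that carries a role and a root in addition to port and pid. The `Role` enum, the `attach_log_line` function, and the exact field names are assumptions for illustration; the real log format is only partially quoted in this thread.

```rust
use std::path::Path;

/// Hypothetical tag saying which project an attach line belongs to.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Role {
    Primary,
    Link,
}

/// Build an attach log line that stays unambiguous when several daemons are in play.
pub fn attach_log_line(role: Role, root: &Path, port: u16, pid: u32) -> String {
    let role = match role {
        Role::Primary => "primary",
        Role::Link => "link",
    };
    format!(
        "attached to existing daemon role={} root={} port={} pid={}",
        role,
        root.display(),
        port,
        pid
    )
}
```

With a tag like this, the port=40899 line from the report would have been readable as the link's own attach rather than the primary's.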

What actually caused the outage

RpcIndex caches a single persistent TCP connection at startup with no reconnect and no health re-check. When the daemon that connection points to dies, restarts, or idle-times-out, every subsequent tool call writes to a dead socket → -32603 sending <op> indefinitely, until the MCP server is restarted. That precisely matches your symptoms (all tools fail identically; reconnecting the server fixes it). This is the real root cause and is high severity.

Confirmed defects we're fixing (target: `v0.5.0`)

1. RpcIndex never reconnects: a dead/restarted daemon bricks all calls until MCP restart. (the real trigger)
2. Handshake has no root-identity check: Stats/Lockfile carry no root, so attach can't verify the daemon on a given port serves the expected repo. Latent here, but it's the genuine "wrong daemon" hazard, and it's what makes a safe reconnect possible. We're adding an optional root to the stats handshake (wire-compatible) and rejecting mismatches.
3. Observability gaps: no project tag on attach logs; internal errors stringified with zero tracing; daemon-side write failures logged at debug! only. Both the silent failure and this misdiagnosis trace back to this.
4. Defense-in-depth: root in the lockfile payload + nested-link handling.

All wire changes are backward/forward compatible (new optional fields, old daemons still accepted). A regression test will kill a daemon under a live MCP connection and assert tool calls recover.
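The optional-root handshake described above could look roughly like the following. This is a sketch under assumptions: `StatsReply`, `Identity`, and `check_identity` are hypothetical names, and the real wire format is not shown here. The point it illustrates is that an absent `root` (an older daemon) is accepted as unverified, while a present but different `root` is rejected.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical handshake reply; `root` is `None` when an older daemon omits the field.
pub struct StatsReply {
    pub root: Option<PathBuf>,
}

/// Outcome of comparing the daemon's reported root with the expected one.
#[derive(Debug, PartialEq, Eq)]
pub enum Identity {
    /// The daemon reported the expected root.
    Verified,
    /// The daemon did not report a root (older build); accepted for compatibility.
    Unverified,
    /// The daemon serves a different project and must be refused.
    Mismatch { expected: PathBuf, actual: PathBuf },
}

pub fn check_identity(expected: &Path, reply: &StatsReply) -> Identity {
    match reply.root.as_deref() {
        None => Identity::Unverified,
        Some(actual) if actual == expected => Identity::Verified,
        Some(actual) => Identity::Mismatch {
            expected: expected.to_path_buf(),
            actual: actual.to_path_buf(),
        },
    }
}
```

Treating the missing field as a distinct `Unverified` outcome, rather than folding it into `Verified`, leaves room for the caller to log that the check was skipped.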

Will update here when v0.5.0 ships.


buildagent referenced this issue from a commit

2026-06-08 16:00:37 +02:00

fix(daemon): self-healing reconnect + root-identity handshake (issue #7)

buildagent referenced this issue from a pull request that will close it

2026-06-08 16:02:07 +02:00

fix(daemon): self-healing reconnect + root-identity handshake (issue #7) #8

buildagent closed this issue

2026-06-08 16:05:43 +02:00

buildagent referenced this issue from a commit

2026-06-08 16:05:43 +02:00

Merge pull request 'fix(daemon): self-healing reconnect + root-identity handshake (issue #7)' (#8) from fix/issue-7-daemon-reconnect-identity into master

buildagent commented

2026-06-08 16:10:25 +02:00

Author

Member

Verified fixed. After reconnecting the MCP server, the primary server attaches to its own daemon and all previously-failing tools return correct primary-scoped results:

- project_overview → 133 files / 1610 symbols, entry point src/lib.rs, 0 parse errors (correct primary index, not the linked ixt project)
- search_symbols → resolves primary symbols, tagged project: primary
- file_outline → returns structure cleanly

No more -32603 sending <op> errors. Closing.


buildagent commented

2026-06-08 16:15:17 +02:00

Author

Member

Fixed in v0.5.0 ✅

Released: v0.5.0 (https://git.h-dv.de/h-dv/code-index/releases/tag/v0.5.0, PR #8). Binaries published for linux x86_64 (glibc + musl), linux aarch64, and windows x86_64, each with a .sha256.

What landed:

- Self-healing reconnect: code-index-mcp now re-establishes its daemon connection on a transport failure (re-reads the lockfile, reconnects to the current port, retries once) instead of bricking every call with -32603 sending <op> until you restart it. This is the fix for your actual outage.
- Root-identity handshake: the daemon now reports its root in the stats response and in the lockfile payload; the MCP server verifies it on both attach and reconnect and refuses a daemon serving a different project. Closes the "wrong daemon" hazard at its source.
- Observability: internal failures now log the full cause chain to stderr (the -32603 sending <op> you saw will now have a server-side trace), and attach logs are tagged with root so the primary's daemon is distinguishable from a link's in the log.

All wire changes are backward/forward compatible, so a v0.5.0 MCP server works against an older daemon and vice versa.
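The reconnect behaviour in the first bullet (reconnect on a transport failure, then retry once) can be sketched as follows. This is not the shipped implementation: the `Transport` trait, the `Client` type, and the `reconnect` closure are hypothetical stand-ins, and the closure is where re-reading the lockfile and dialing the current port would happen.

```rust
use std::io;

/// Hypothetical transport abstraction; the real `RpcIndex` internals are not shown in this thread.
pub trait Transport {
    fn send(&mut self, op: &str) -> io::Result<String>;
}

/// A client that holds one connection and knows how to build a replacement.
pub struct Client<T, F> {
    conn: T,
    reconnect: F,
}

impl<T, F> Client<T, F>
where
    T: Transport,
    F: FnMut() -> io::Result<T>,
{
    pub fn new(conn: T, reconnect: F) -> Self {
        Client { conn, reconnect }
    }

    /// Send `op`; on failure, replace the connection and retry exactly once.
    pub fn call(&mut self, op: &str) -> io::Result<String> {
        match self.conn.send(op) {
            Ok(reply) => Ok(reply),
            Err(_first_failure) => {
                // Assumed to re-read the lockfile and connect to the current port.
                self.conn = (self.reconnect)()?;
                // A second failure is returned to the caller without further retries.
                self.conn.send(op)
            }
        }
    }
}
```

Retrying only once keeps a genuinely dead daemon from causing an unbounded loop, while a daemon that merely restarted on a new port is picked up transparently on the next call.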

Note on the original diagnosis: as covered above, the "primary attached to the linked daemon" reading was a misattribution — the per-root lockfile discovery can't structurally reach a link's port, and the un-tagged attach log made the link's own line look like the primary's. The real trigger was the missing reconnect. Both the symptom and the latent identity gap are now fixed.

To pick it up: install the v0.5.0 code-index-mcp and code-index-daemon (kill any running daemons so they respawn on the new build). Thanks again for the detailed report — the daemon/port table is what made this fast to pin down.
