The dashboard banner 'Hermes agent is not responding' fires on every
multi-container deployment that doesn't set 'pid: "service:hermes-agent"'
in compose, because get_running_pid() relies on fcntl.flock and
os.kill(pid, 0) — both PID-namespace-scoped and invisible across container
boundaries.
Fix: when get_running_pid() returns None, fall back to a freshness check on
gateway_state.json. The gateway already writes that file on every tick with
gateway_state == 'running' and an aware ISO-8601 updated_at timestamp, so a
recent (<= 120s) timestamp is an equivalent live-process signal that needs
only a shared volume — no PID namespace, no compose workaround, no extra
HTTP probe URL.
Behavior preserved:
- In-namespace deployments still hit the PID-based path first; payload shape
unchanged (no 'reason' key) so #716 contract holds.
- Cross-container alive path adds reason='cross_container_freshness' so
support diagnostics can tell which signal succeeded.
- Stale updated_at, non-running gateway_state, malformed/naive/missing
timestamps, and timestamps far in the future all still report 'down' — the
fallback never produces a false positive.
- Same redaction rules: argv/command/executable/env/raw pid never leak.
Tests: 15 new cases in test_issue1879_cross_container_gateway_liveness.py
covering the cross-container alive path, every refusal case, clock-skew
tolerance, and backward compat with the #716 PID path. Existing #716
heartbeat tests (8) continue to pass.
Two bugs in get_available_models() conspired to duplicate the active
provider's auto-detected models under a phantom 'Custom' group whenever
custom_providers was also declared in config.yaml:
1. custom:* PIDs not in _named_custom_groups (e.g. stale slugs left from
prior configs) fell through to the auto_detected_models fallback, copying
the active provider's whole catalog into a phantom Custom: <slug> group.
Fix: continue unconditionally for ANY custom:* PID — the named-group
branch is the only legitimate population path.
2. The bare 'custom' PID, with the active provider being concrete (e.g.
ai-gateway), hit 'elif auto_detected_models: copy.deepcopy(...)' and
built a duplicate Custom group of the active provider's models with
mismatched provider prefixes. Fix: when pid == 'custom' and the active
provider is non-custom, leave models_for_group empty.
The reporter also suggested a third fix gating resolve_model_provider() on
config_provider — that's intentionally NOT applied because it conflicts with
the long-standing model-specific-override semantics covered by
test_model_resolver.py::test_custom_provider_*_routes_to_named_custom_provider
(custom_providers entries explicitly override the active provider's routing
when the user opted-in). The reporter's symptom (duplicate UI group) lives
entirely in get_available_models()'s group construction and is fully fixed
by the two changes above.
Tests: 6 new regression tests (3 in #1881 file + reuse), 774 broader
tests still green (model/provider/custom/config domain).
Per Opus pre-release verdict on PR #1843: the four handle_kanban_*
entry points declare '-> bool' but actually return True | None | False
(after PR #1843 made the False-vs-None distinction load-bearing for
the caller's '_kanban_unknown_endpoint' decision). Update the type
annotations to 'bool | None' and add a docstring on handle_kanban_get
(with cross-references on the three siblings) so a future contributor
adding a new return path doesn't accidentally produce a 0/'' value
that would silently revert the double-404 fix.
Test-only verification: kanban tests pass (49/49). Production behavior
unchanged. Cheap defensive cleanup per Nathan's standing absorb-in-release
default for ≤20-LOC documentation/type-annotation fixes.
PR #1837's new `_kanban_unknown_endpoint` wrapper was triggered for any
falsy bridge return — but `handle_kanban_*` returns `None` (not `True`)
when an inner handler calls `bad(...)` to send an error response. The
wrapper then sent a SECOND 404 on top of the bridge's response, producing
concatenated JSON bodies on the wire.
Concrete reproducer (caught by behavioural harness, not the merged tests):
GET /api/kanban/tasks/<missing-id>/log
→ '{"error":"task not found"}{"error":"unknown Kanban endpoint: GET ..."}'
This affected every `bad(...)`-shaped error path in the bridge:
- task-not-found returns from `_task_log_payload` / `_task_detail_payload`
- exception handlers for ImportError (503), LookupError (404),
ValueError (400), RuntimeError (409) across all four method handlers
- the `_handle_events_sse_stream` board-resolution failure path
The fix: distinguish an explicit `False` (truly unmatched path) from
`None` (handled, response already sent). Only `False` should trigger
the unknown-endpoint diagnostic.
Adds a regression test that exercises the task-not-found path through
`routes.handle_get` and asserts only one JSON body is on the wire.
Follow-on to #1837 (already merged into master at v0.51.20).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Note: PR #1827 was branched before v0.51.19 shipped #1812, which
introduced an initial (pure live-fetch) Codex provider card hook in
api/providers.py at the same line range. The contributor's PR was
filed AFTER #1812 shipped but their diff didn't yet account for it.
Stage 314 absorbs the contributor's intent (visible Codex cache
merge for gpt-5.3-codex-spark visibility) by replacing the v0.51.19
hook with the richer merged version directly in stage. Production
code change ≡ what the contributor's PR would have produced if
rebased onto current master. Test file + pr-media adopted verbatim.
Marker commit so the stage log makes the absorption visible.
Two in-stage fixes for v0.51.19 batch:
1) api/config.py — add resolve_alias=False param to
_resolve_configured_provider_id() and pass it from
resolve_model_provider(). The PR #1818 swap from
_resolve_provider_alias() to _resolve_configured_provider_id()
was correct for active-provider/badge surfaces but broke #1625's
local-server-provider literal-preservation contract: 'ollama' →
'custom' and 'lm-studio' → 'lmstudio' alias-collapse caused
_LOCAL_SERVER_PROVIDERS membership check to miss, breaking the
model-id full-path preservation for LM Studio/Ollama. The new
flag preserves the raw provider value when called from
resolve_model_provider, and named-custom-slug + base-url
fallback both still run unchanged.
2) tests/test_bootstrap_discover_agent.py — pin Path.home() in
_isolate_discover_agent_dir so the hard-coded
'Path.home() / .hermes / hermes-agent' / 'Path.home() /
hermes-agent' candidates in discover_agent_dir() can't pick up
the dev machine's real install. The original PR #1817 isolation
helper covered HERMES_HOME, HERMES_WEBUI_AGENT_DIR, and
REPO_ROOT but missed the Path.home() leak.
Both surfaced on full pytest pre-release gate, fixed in stage,
ship in v0.51.19. Tests: full suite green.
PR #1762 fixed the rsplit grammar collision for plain @openrouter:model:free
qualifiers, but skipped the fallback whenever the provider hint started with
'custom:' on the assumption that custom providers route directly. That left
'@custom:my-key:some-model:free' broken: rsplit yields
provider='custom:my-key:some-model', bare='free' → custom guard skips the
split-fallback → returns provider='custom:my-key:some-model', model='free'.
Detect the over-split structurally instead of using a known-suffix allowlist:
custom hints carry exactly one segment after 'custom:' (constructed at
api/config.py:1363 as 'custom:' + entry_name). So any rsplit result of
'custom:<a>:<b>' with bare model '<c>' has eaten one model segment — peel
it back with a second rsplit and prepend it to the bare model.
This is robust for :free / :beta / :thinking / :preview / any future
OpenRouter suffix without an allowlist to maintain.
Adds 5 regression tests covering the matrix (free/beta/thinking/preview/
slashed-model). All 7 existing #1744 tests still pass; #1228 tests
unaffected.
Co-authored-by: Cake <51058514+Sanjays2402@users.noreply.github.com>
The bridge module docstring still described the API as 'deliberately
read-only' but it now exposes full CRUD (tasks, boards, comments,
links, SSE). Updated to list the supported operations.
For _board_counts_for_slug (the hot path for the board-switcher badge),
added a board_exists() early-out that mirrors the agent's own helper
in plugin_api.py (path.exists() before connect()). This avoids a
redundant init_db()+connect() schema pass per board per list refresh.
connect() already handles auto-init for fresh databases via its
needs_init check, so the extra init_db was unnecessary overhead on
the hot path that scales linearly with board count.
Tests:
- test_board_counts_returns_empty_for_nonexistent_board: verifies the
early-out (no connect() call, returns {})
- test_board_counts_returns_real_counts_for_populated_board: verifies
actual per-status counts are returned for existing boards
Issue #1764 asked for a much larger surface (Reveal + Copy-path on
every UI surface that references a file path, plus Rename in session
menus). Per Nathan's curation we ship only the three highest-leverage
pieces in this PR — they cover the three concrete user-visible
frictions Cygnus reported, and leave the broader sweep for follow-up.
## 1. Copy file path in workspace tree right-click menu
The tree's right-click already had Rename and Reveal in File Manager.
Reveal is slow when the user just wants the path string for a
terminal/editor — and there was no Copy-path action anywhere.
Added "Copy file path" between Reveal and Delete. It POSTs to a new
`/api/file/path` endpoint that resolves the relative tree-rooted path
into the absolute on-disk path (the frontend can't compute it because
only the server knows the workspace root) and writes the result to
the OS clipboard via `navigator.clipboard.writeText()`. Falls back to
the legacy execCommand pattern on browsers where the modern Clipboard
API is gated.
The new endpoint deliberately does NOT require the target to exist:
copy-path on a recently-deleted file is still useful (paste into a
terminal to investigate). `safe_resolve` continues to gate path
traversal — the test suite pins this with a `../../../../../etc/passwd`
attempt that 400s.
## 2. Rename in session three-dot menu
Cygnus's specific ask: double-click rename in the sidebar is timing-
sensitive — the first click frequently registers as "open the chat"
before the second click arrives, so users open the conversation when
they meant to rename it. Putting Rename in the menu eliminates the
timing entirely.
Added Rename as the FIRST item in `_openSessionActionMenu` (above
Pin). It reuses the existing `startRename` closure attached to each
session row — no duplicated state, no second API call out of band
with the double-click path. Mechanism: the row builder now stores
`el._startRename = startRename` and `el.dataset.sid = s.session_id`,
so the menu can find the row by data-sid and call its closure
directly. This keeps all the `_renamingSid`/`oldTitle`/`applyTitle`
bookkeeping single-sourced.
Read-only imported sessions skip the menu item via the same
`_isReadOnlySession` gate the closure already uses.
## 3. Reveal-failed toast includes the resolved server-side path
Cygnus posted a screenshot of a "Failed to reveal: not found" toast
that dropped the path entirely. Without it the user can't tell which
file the system expected — useful when a stale session row still
references a deleted file.
Server-side fix in `_handle_file_reveal`: instead of returning
`bad(handler, "File not found", 404)`, return
`bad(handler, f"File not found: {target}", 404)` where target is the
resolved absolute path. Frontend toast also defends against err with
no .message: `(err.message||err)` instead of `err.message` alone.
Verified live: a missing-file reveal now produces:
Failed to reveal: File not found: /home/hermes/workspace/missing-xyz.txt
Cygnus's exact diagnostic-friction is gone.
## Tests
* tests/test_1764_context_menu_essentials.py (new)
- 13 source-level pinning tests
- 6 live HTTP behaviour tests against the conftest test server
* tests/test_1466_sidebar_cancel_clarify.py
- Two assertion-window bumps (3200→4400, 3600→4800) to accommodate
the new Rename action prepended to _openSessionActionMenu. The
test relied on a fixed-byte-window function-body slice — comments
added explaining why the bumps were needed.
* All 9 locales got translations for the 5 new keys
(copy_file_path, path_copied, path_copy_failed, session_rename,
session_rename_desc) — locale parity tests pass.
## Verification
Full pytest suite: 4671 passed, 2 skipped, 3 xpassed (matches
pre-change baseline).
Live browser verification on port 8789:
- Right-click .git folder in workspace tree → menu shows
Rename / Reveal in File Manager / Copy file path / Delete (red).
- Click Copy file path → clipboard gets "/home/hermes/workspace/.git",
toast confirms "File path copied to clipboard".
- Open session three-dot menu → Rename conversation appears first
with pencil icon, followed by Pin / Move / Archive / Duplicate /
Delete in the same order as before.
- Trigger reveal on a non-existent file → toast reads
"Failed to reveal: File not found: /home/hermes/workspace/<filename>".
The resolved server-side path is now visible in the failure.
Refs nesquena/hermes-webui#1764.
The previous approach of prepending 'openrouter/' to the model ID in the
catalog was incorrect — it only masked the symptom while regressing the
config_provider=openrouter codepath.
The root cause is in resolve_model_provider(): rsplit(':', 1) on
'@openrouter:tencent/hy3-preview:free' yields provider='openrouter:tencent/hy3-preview'
and model='free', because the ':free' suffix collides with the @provider:model
grammar.
Fix: after rsplit, validate that the extracted provider hint is a known
provider (in _PROVIDER_MODELS, _PROVIDER_DISPLAY, or starts with 'custom:').
If not, fall back to split(':', 1) so trailing suffixes stay attached to
the model ID.
This fixes all current and future OR models with colon-suffixed tags
(:free, :beta, :thinking, :nitro, etc.) without catalog changes.
Also adds regression tests for the affected models and edge cases.
Co-authored-by: nesquena-hermes <nesquena-hermes@users.noreply.github.com>
- Backend: return `configured` field alongside `running`. When
alive=None (no gateway metadata), configured=false with fallback to
identity_map heuristic.
- Frontend: amber "Gateway not configured" when configured=false,
red "Gateway not running" only when configured but process is down,
green "Running" when both true.
- Replace dead try/except fallback with explicit tri-state check on
health["alive"].
- Add regression test for last_active guard when alive=true and
identity_map is empty.
All 87 gateway-related tests pass.
Use agent_health.build_agent_health_payload() as the authoritative
running signal instead of bool(identity_map). An empty identity_map
means zero connected messaging platforms, not that the gateway is down.
Falls back to identity_map heuristic when agent_health module is unavailable
(e.g. WebUI-only deployments).
macOS Finder's 'Copy as Pathname' (Cmd+Option+C) wraps paths in single
quotes by default — '/Users/x/Documents/foo' — and users routinely paste
those quoted strings into the Add Space input expecting them to work.
Other shells and OS file managers do similar things with double quotes.
Today the path is taken via .strip() only, so the literal quote
characters become part of the resolved Path and the validator rejects
the result as 'not a directory'. cygnus reported this on Discord
(2026-05-01) — she had to manually un-quote her paths to register a
new Space.
Fix:
- New api.workspace._strip_surrounding_quotes() helper. Removes only
the outermost paired single or double quotes; preserves unpaired or
mismatched quotes (a path may legitimately contain a literal quote).
- validate_workspace_to_add() calls it before resolution so every
code path that registers a workspace benefits, not just the HTTP
route.
- _handle_workspace_add() also calls it at the route entry so the
blocked-system-path check and the duplicate-detection check both
see the cleaned form.
14 regression tests pin the behavior matrix:
- Unwrapped path unchanged
- Single quotes stripped
- Double quotes stripped
- Whitespace outside quotes handled (trim-then-strip)
- Only outermost pair removed (internal quotes preserved)
- Unpaired / mismatched quotes preserved
- Empty string + just-a-pair edge cases
- Validate_workspace_to_add accepts quoted form for existing dir
4610 tests pass (+14 from this PR), 0 regressions, ~2:27 full suite.
Reported by Cygnus on Discord, May 1 2026.