Per Opus advisor on stage-299:
1. Bounded WIKI_PATH walk + forbidden-root guard (api/routes.py)
- _LLM_WIKI_MAX_FILES = 10000 caps rglob iteration (prevents hangs on
symlink loops or pathologically-large trees)
- _LLM_WIKI_FORBIDDEN_ROOTS blocklist refuses '/' '/etc' '/usr' '/var'
'/opt' '/sys' '/proc' even if WIKI_PATH is misconfigured to point
at them
- Self-DoS prevention: /api/wiki/status fires on every Insights tab
open via Promise.all, and unbounded rglob would block the endpoint
2. URL-scheme guard for docs_url interpolation (static/panels.js)
- rawDocsUrl is regex-validated against /^https?:\/\//i before being
interpolated into the <a href=> attribute
- esc() HTML-escapes but doesn't validate URL scheme; docs_url is
server-controlled today but the contributor scaffolded it for
potential config-driven use, so future-proof against javascript:
scheme XSS
6 regression tests in tests/test_stage299_opus_fixes.py pin both fixes.
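The bounded walk and forbidden-root guard described above can be sketched roughly as follows. This is a minimal illustration, not the code in api/routes.py: the helper names are hypothetical, the exact-match comparison against the blocklist is an assumption, and only the cap value and the root list come from the commit text.

```python
from pathlib import Path

# Values quoted from the commit text; names are illustrative stand-ins.
MAX_FILES = 10000
FORBIDDEN_ROOTS = {"/", "/etc", "/usr", "/var", "/opt", "/sys", "/proc"}


def is_forbidden_root(path: Path) -> bool:
    """True when the resolved path is exactly one of the blocked roots (assumed exact match)."""
    return str(path.resolve()) in FORBIDDEN_ROOTS


def count_wiki_files(root: Path, limit: int = MAX_FILES) -> tuple[int, bool]:
    """Count files under root, visiting at most `limit` entries.

    Returns (file_count, truncated) so callers can report a partial result
    instead of blocking on a huge or looping tree.
    """
    if is_forbidden_root(root):
        raise ValueError(f"refusing to walk forbidden root: {root}")
    count = 0
    visited = 0
    truncated = False
    for entry in root.rglob("*"):
        if visited >= limit:
            truncated = True
            break
        visited += 1
        if entry.is_file():
            count += 1
    return count, truncated
```

Because rglob is a lazy generator, breaking out of the loop stops the traversal itself rather than just the counting.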
Two SHOULD-FIX items from the Opus advisor pass on PR #1675:
1. **PATCH/DELETE handler routing asymmetry**. The /boards/<slug> path
match was running AFTER ?board= resolution, so a stray ?board=ghost
on a 'PATCH /api/kanban/boards/experiments?board=ghost' would 404 on
the missing 'ghost' board instead of editing 'experiments'. POST
already routed /boards first; PATCH/DELETE now mirror that structure.
The ?board= query is still resolved for the task-scoped routes that
actually need it.
2. **SSE event frames now emit 'id: <event_id>' lines**. EventSource
stores Last-Event-ID and sends it on auto-reconnect; without an 'id:'
field on each frame the browser couldn't resume cleanly across
connection drops, forcing the server to re-stream up to
_KANBAN_SSE_BATCH_LIMIT=200 events the client already had. The
handler now (a) emits 'id: <cursor>' on every events frame, and
(b) reads Last-Event-ID from the request headers as a fallback when
?since= is absent.
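A hedged sketch of the two SSE behaviours in item 2: emitting an 'id:' line per frame and falling back to Last-Event-ID when ?since= is missing. The function names, the 'events' event name, and the plain-dict header lookup are assumptions for illustration; a real handler would use its own case-insensitive header object.

```python
import json
from typing import Mapping


def format_sse_frame(event_id: int, payload: dict, event: str = "events") -> str:
    """Build one SSE frame; the 'id:' line lets EventSource remember the cursor."""
    return f"id: {event_id}\nevent: {event}\ndata: {json.dumps(payload)}\n\n"


def resolve_cursor(query: Mapping[str, str], headers: Mapping[str, str]) -> int:
    """Prefer ?since=, fall back to the Last-Event-ID header, else start at 0."""
    raw = query.get("since")
    if raw is None:
        raw = headers.get("Last-Event-ID")
    if raw is None:
        return 0
    try:
        return max(0, int(raw))
    except ValueError:
        return 0  # malformed cursor: treat as a fresh subscription
```

With the id line present, a reconnecting browser resumes from the last frame it saw instead of receiving the whole batch again.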
+4 regression tests:
- test_handle_kanban_patch_routes_boards_slug_before_board_query_param
- test_handle_kanban_delete_routes_boards_slug_before_board_query_param
- test_sse_emits_id_lines_so_browser_can_resume_via_last_event_id
- test_sse_honours_last_event_id_header_when_since_absent
Total kanban tests: 67 -> 68 (CSS-injection fix in 60874db) -> 72 (this).
Co-authored-by: ai-ag2026 <ai-ag2026@users.noreply.github.com>
Closes the remaining gaps to first-party Hermes Agent dashboard parity:
multi-board CRUD on /api/kanban/boards and a real-time event stream over
Server-Sent Events. Builds on top of #1660 (review-feedback hardening).
== Multi-board ==
Five new endpoints mirror the agent dashboard plugin contract verbatim
(plugins/kanban/dashboard/plugin_api.py) so the CLI, gateway slash
command, dashboard, and WebUI all share the same active-board pointer:
GET /api/kanban/boards
POST /api/kanban/boards
PATCH /api/kanban/boards/<slug>
DELETE /api/kanban/boards/<slug>
POST /api/kanban/boards/<slug>/switch
All existing endpoints accept ?board=<slug> (and writes also accept
'board' in the JSON body) — query takes precedence over body. The slug
travels through the kanban_db library which already had multi-board
support; the bridge is mostly thin wrappers around create_board /
remove_board / list_boards / set_current_board / get_current_board.
The default board is protected from deletion. Slugs are normalised
through kb._normalize_board_slug() with path-traversal rejection.
Archive is the default for DELETE; ?delete=1 hard-deletes.
Frontend gets a 'Default ▾' switcher pill in the panel header. The menu
lists every board (current first), per-status total badges, plus three
actions (New / Rename / Archive). Create + rename use the same modal
with a slug auto-derived from the name. Archive routes through the
existing showConfirmDialog with a clear 'tasks remain on disk and the
board can be restored from kanban/boards/_archived/' message.
Active-board state is persisted to localStorage so a refresh stays put.
The on-disk pointer in kanban/current is the cross-process source of
truth, kept in sync via POST /boards/<slug>/switch.
== SSE event stream ==
GET /api/kanban/events/stream is a long-lived Server-Sent Events feed
that mirrors the agent dashboard's WebSocket /events contract. The
WebUI uses SSE rather than WebSocket because (1) the existing transport
is BaseHTTPServer, not async — WS would require a significant refactor
or a hijack-the-socket hack; (2) SSE is the right tool for unidirectional
server-pushed event streams; (3) browsers auto-reconnect on drop;
(4) the existing /api/approval/stream and /api/clarify/stream patterns
are proven and easy to copy.
The handler polls task_events at 300ms (matching the agent dashboard's
WebSocket poll cadence) so write-to-receive latency is identical.
Heartbeats every 15s prevent proxy/CDN reaping. Hard cap of 200 events
per batch.
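The poll/heartbeat/batch-cap behaviour above might look roughly like the generator below. It is a sketch under stated assumptions: fetch_new, should_stop, and the injected clock/sleep are hypothetical stand-ins, and only the three numeric constants come from the text.

```python
import json
import time
from typing import Callable, Iterator, List, Tuple

POLL_INTERVAL = 0.3        # seconds between task_events polls
HEARTBEAT_INTERVAL = 15.0  # seconds of silence before a keepalive comment
BATCH_LIMIT = 200          # hard cap on events per frame

Event = Tuple[int, dict]   # (event_id, payload); shape assumed for the sketch


def sse_poll_loop(
    fetch_new: Callable[[int, int], List[Event]],
    cursor: int,
    should_stop: Callable[[], bool],
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> Iterator[str]:
    """Yield SSE frames: event batches when available, heartbeats when idle."""
    last_sent = clock()
    while not should_stop():
        batch = fetch_new(cursor, BATCH_LIMIT)
        if batch:
            cursor = batch[-1][0]  # advance to the newest delivered event id
            body = json.dumps([payload for _, payload in batch])
            yield f"id: {cursor}\ndata: {body}\n\n"
            last_sent = clock()
        elif clock() - last_sent >= HEARTBEAT_INTERVAL:
            yield ": heartbeat\n\n"  # SSE comment line, ignored by EventSource
            last_sent = clock()
        sleep(POLL_INTERVAL)
```

Injecting the clock and sleep functions keeps the loop testable without waiting 15 real seconds for a heartbeat.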
Frontend uses EventSource by default and falls back to 30s HTTP polling
after 3 SSE failures. A 250ms debounce coalesces bursts of N events
into a single board re-fetch. Stream is torn down when the user leaves
the Kanban panel.
== Bugs fixed during build ==
(1) read_only=True legacy lie. _board_payload, _events_payload,
_task_log_payload, and the no-change short-circuit all hardcoded
read_only=True from the read-only-bridge era of #1645. Bridge has
been writable since #1649 — flag now matches reality.
(2) Modal + dropdown menu transparent backgrounds. The PR stack used
var(--panel) which is undefined in the WebUI design system (uses
--surface, --bg, gradient panels). Replaced with the same gradient
+ accent border pattern used by the .app-dialog overlay.
(3) Archive race. kb.connect(board=<slug>) auto-materialises the
directory + sqlite on first call, so any in-flight SSE poll on a
board mid-archive would silently un-archive it by re-creating the
directory. Two-layer fix: (a) frontend stops the SSE stream BEFORE
the DELETE call, restarts on failure; (b) bridge's _kanban_sse_fetch_new
checks kb.board_exists() before connect(), returning empty results
when the board is gone.
(4) Save vs. Cancel button visual hierarchy. Both rendered as identical
secondary buttons in the modal. Save now uses the .primary class
with accent-tinted gold styling.
(5) Mobile viewport gaps. Added 9 rules under @media (max-width: 640px)
covering the switcher button (smaller padding/font), name truncation
(max-width:140px), menu sizing (min(280px, 100vw - 24px)), modal
padding, and inline-row stacking.
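The bridge-side half of the archive-race fix in (3) can be illustrated with a small stand-in. Nothing here is the real kanban_db API: board_exists, connect, and fetch_new_events are hypothetical helpers that only model "connect creates the directory on first use" and "check existence before connecting".

```python
from pathlib import Path
from typing import List


def board_exists(boards_root: Path, slug: str) -> bool:
    """Stand-in existence check: a board is a directory under boards_root."""
    return (boards_root / slug).is_dir()


def connect(boards_root: Path, slug: str) -> Path:
    """Stand-in for a connect() that materialises the board directory on first call."""
    board_dir = boards_root / slug
    board_dir.mkdir(parents=True, exist_ok=True)
    return board_dir


def fetch_new_events(boards_root: Path, slug: str) -> List[dict]:
    """Poll for events without resurrecting an archived or deleted board."""
    if not board_exists(boards_root, slug):
        return []  # board is gone: return empty instead of re-creating it
    connect(boards_root, slug)
    return []  # a real bridge would query task_events here
```

The guard matters because the poller runs on its own schedule; without it, a single late poll is enough to re-create the directory that the archive just moved away.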
== Tests ==
+45 new tests across two files. Bridge tests: 18 covering board CRUD
endpoints, slug validation, default-board protection, dispatcher routing,
board isolation (verified via connect() spy), and 3 SSE tests including
a worker-thread integration test with threading.Event watchdog. UI static
tests: 11 covering switcher markup, modal markup, JS handler presence,
REST verb usage, board-param plumbing, localStorage persistence,
showConfirmDialog usage, EventSource subscription, polling fallback,
panel-switch teardown, and 250ms debouncing.
Bridge tests: 18 → 36 (+18 multi-board, +3 SSE)
UI static tests: 15 → 26 (+11)
Total kanban: 33 → 63
Full repo test suite: 4351 passed, 0 regressions.
== Live verification ==
End-to-end browser walkthrough on port 8789:
- Create Sprint 12 + Backlog via modal: switcher updates ✓
- Switch between boards: count isolation correct ✓
- Add task on Sprint 12 via API: SSE delivers in 400ms ✓
- 5-task burst: 250ms debounce coalesces to single render ✓
- Rename board via modal: switcher label updates ✓
- Archive board: confirm dialog → board moved to _archived/, no zombie
directory (race fix verified) ✓
- Zero JS errors throughout 11-step flow
Co-authored-by: ai-ag2026 <ai-ag2026@users.noreply.github.com>
Four follow-up issues found in the combined-stack live verification:
(1) handle_kanban_get had no exception handler, so ImportError (webui-only
deploy without hermes_cli), ValueError, LookupError, and RuntimeError bubbled
up as 500s. It is now wrapped in the same exception cascade as POST/PATCH/DELETE.
(2) ImportError on any verb now returns 503 "kanban unavailable: <reason>"
instead of 500. Frontend's existing try/catch surfaces a clean toast.
(3) The 'Read-only view' banner (legacy of read-only PR #1645) was always
visible regardless of actual board state. Default-hidden in HTML;
loadKanban() toggles based on _kanbanBoard.read_only.
(4) .btn / .btn.secondary class names were referenced in 4 places (Bulk
action / Nudge dispatcher / New task / Back to board) but no matching
CSS shipped — buttons rendered as browser-default beveled controls
that clashed with the dark theme. Added scoped CSS rules under the
kanban-* parent containers.
+4 behavioral + static UI tests covering the contracts.
Co-authored-by: ai-ag2026 <ai-ag2026@users.noreply.github.com>
The PATCH /api/kanban/tasks/:id endpoint allowed any status-to-any-status
transition for the non-claim/complete/block/archive set via raw
`UPDATE tasks SET status = ?`. This let UI users (or any client) flip a
task to 'running' without going through kb.claim_task(), bypassing
claim_lock + claim_expires + started_at + worker_pid. The dispatcher
treats such a phantom-claimed task as orphaned and may reclaim, hide, or
double-dispatch it.
Match the agent dashboard plugin's contract
(plugins/kanban/dashboard/plugin_api.py update_task):
- status='running' via PATCH → ValueError (HTTP 400)
- status='ready' from currently-blocked → kb.unblock_task() (fires
'unblocked' event)
- status='ready' from anything else, plus status in {'todo', 'triage'}
→ new _set_status_direct() helper that nulls claim fields when leaving
'running', closes any active run with outcome='reclaimed', and
appends a 'status' event row to task_events
- status='done', 'blocked', 'archived' → unchanged (already structured)
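The routing rules in the list above reduce to a small decision function. This sketch only names which path a request would take; plan_status_change and its return labels are hypothetical, and the real handler calls into the kanban library rather than returning strings.

```python
def plan_status_change(current: str, requested: str) -> str:
    """Return a label for the code path a PATCH status change should take."""
    if requested == "running":
        # Only the claim path may set 'running'; the handler maps this to HTTP 400.
        raise ValueError("status 'running' can only be set by claiming the task")
    if requested == "ready" and current == "blocked":
        return "unblock_task"
    if requested in {"ready", "todo", "triage"}:
        return "set_status_direct"
    if requested in {"done", "blocked", "archived"}:
        return "structured"
    raise ValueError(f"unknown status: {requested!r}")
```

For example, plan_status_change("blocked", "ready") selects the unblock path, while plan_status_change("running", "todo") selects the direct helper that clears claim fields.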
Frontend changes:
- Drop 'running' from the .kanban-status-actions button row in the task
detail pane (clicking it would always 400 anyway).
- allowKanbanDrop() refuses the 'running' column as a drop target with
dropEffect='none' so users see immediate visual feedback that the
dispatcher/claim path owns running.
Tests added (3, all passing):
- test_patch_status_running_is_rejected_to_protect_dispatcher_contract
- test_patch_status_done_to_running_is_rejected
- test_patch_status_blocked_to_ready_routes_through_unblock_task
Existing 12 tests still pass.
Co-authored-by: ai-ag2026 <ai-ag2026@users.noreply.github.com>
The Codex OAuth onboarding worker introduced in #1652 had a cancel-vs-worker
race: a `cancel_onboarding_oauth_flow` request that arrived while the worker
was mid-network-call (between the `live = dict(...)` snapshot and the next
status check) would be silently overridden:
1. User clicks Cancel → server sets flow.status = "cancelled" and drops
sensitive lifecycle fields under the lock.
2. Worker is mid-`_poll_codex_authorization` / `_exchange_codex_authorization`
using the local `live` snapshot it captured before the cancel.
3. Worker calls `_persist_codex_credentials(...)` — auth.json gets written.
4. Worker calls `_set_flow_status(flow_id, "success")` — overrides the
cancelled status.
Net effect: the user's explicit cancel is ignored, credentials are persisted,
and the UI reports success. Reproduced with a behavioural harness that drove
a real worker thread against patched network helpers and confirmed:
pre-fix : flow status `success`, auth.json written despite cancel
post-fix: flow status `cancelled`, auth.json NOT written
The fix re-checks the flow status under `_OAUTH_FLOWS_LOCK` after the token
exchange completes and before persisting. If the status is no longer
`pending`, the worker exits without persisting credentials and without
overwriting the terminal status.
Regression test `test_cancel_during_token_exchange_does_not_persist_credentials`
drives the worker against threading.Event-gated network stubs to reproduce
the race deterministically and lock the new invariant.
Trace verified against fresh hermes-agent tarball — credential_pool entry
shape (`auth_type=oauth`, `source=manual:device_code`, `priority=0`, base_url)
remains compatible with `agent.credential_pool.load_pool("openai-codex")` and
the agent CLI's `_save_codex_tokens` legacy fallback path.
Tests:
- 10/10 in tests/test_issue1362_codex_oauth_onboarding.py
- Full suite: 4230 passed, 57 skipped, 3 xpassed, 0 failed in 33.82s
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Constituent PRs:
#1637 by @Michaelyklam: protect raw pre from glued-bold lift (closes #1451)
#1639 by @bergeouss: macOS auto-scroll race + custom:* provider list (closes #1360, #1619)
#1642 by @nesquena-hermes: YAML/JSON/diff code block newlines (closes #1618, #1463)
Opus advisor SHIP verdict on stage-295. One observation absorbed:
- api/config.py:2533 dead-code comment per Opus (defensive belt-and-braces
for #1619 fallback; load-bearing fix is in routes.py /api/models/live)
PR #1641 (Michaelyklam parallel-discovery duplicate of #1642) closed as
superseded; UI media adopted with co-author trailer.
4245 → 4255 tests passing (+10).
#1360 — On macOS WKWebView, trackpad momentum scrolling fires scroll
events that interleave with the _programmaticScroll setTimeout(0) guard.
A mid-momentum scroll event either gets swallowed (_programmaticScroll
still true) or falsely reports nearBottom (momentum hasn't settled),
keeping _scrollPinned=true and snapping the viewport back down.
Fix: rAF-debounce the scroll listener so the nearBottom check runs at
the next paint frame when the browser's scroll position has settled.
Added a hysteresis counter requiring 2 consecutive near-bottom samples
before re-pinning, preventing accidental re-pin during deceleration.
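The hysteresis idea is language-agnostic, so here is a small Python model of it (the actual listener lives in frontend JavaScript). The class name and the rule that any far-from-bottom sample unpins immediately are assumptions; only the "2 consecutive near-bottom samples" threshold comes from the text.

```python
class PinHysteresis:
    """Re-pin only after `required` consecutive near-bottom samples."""

    def __init__(self, required: int = 2) -> None:
        self.required = required
        self.streak = 0
        self.pinned = True

    def sample(self, near_bottom: bool) -> bool:
        """Feed one per-frame reading and return the resulting pinned state."""
        if near_bottom:
            self.streak += 1
            if self.streak >= self.required:
                self.pinned = True
        else:
            # Any far-from-bottom reading resets the streak and unpins (assumed).
            self.streak = 0
            self.pinned = False
        return self.pinned
```

A single stray near-bottom reading during momentum deceleration therefore leaves the viewport unpinned; it takes two in a row to snap back.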
#1619 — When a custom:* provider (e.g. custom:relay via custom_providers)
has models that overlap with auto-detected models from base_url /v1/models,
the dedup logic at config.py:2263 skipped them all. The named custom
group ended up empty, and the continue at line 2334 silently discarded
the auto-detected models. Result: only the default model appeared.
Fix 1 (config.py): When custom:* named group has 0 models after dedup,
fall back to auto_detected_models_by_provider instead of dropping them.
Fix 2 (routes.py): Extended /api/models/live fallback to handle
custom:* slugs (not just bare "custom") for both custom_providers
config lookup and base_url live fetch.
Closes #1633. STATE_DIR/models_cache.json was persisted across server
restarts without any version stamp, so a Docker container update from
version A to B read the cache file written by version A — users saw
stale picker contents (missing models, phantom provider groups) for
up to 24 hours until either the TTL expired, an unrelated provider
edit triggered invalidate_models_cache(), or they manually deleted
the file.
Reporter Deor (Discord) updated to v0.50.292 — which contained fixes
for #1538, #1539, and #1568 — did a hard refresh and cleared site
data, and still saw byte-for-byte identical picker contents because
the server kept reading the v0.50.281 cache file off the host-mounted
state volume.
Fix:
* _save_models_cache_to_disk() stamps payloads with _webui_version
(resolved lazily from api.updates.WEBUI_VERSION via sys.modules
lookup to avoid the api.config <-> api.updates circular import)
and _schema_version = 2.
* New _is_loadable_disk_cache() validator checks both stamps in
addition to shape. Mismatch on either field rejects the load.
* _load_models_cache_from_disk() calls the new validator and
strips the disk-only metadata before returning, so the rest of
the code sees the same shape it always did.
* _is_valid_models_cache() kept loose (shape-only) so in-memory
cache writes that never touch disk don't fail validation.
Schema version is independent of the WebUI version stamp so future
cache-shape changes can invalidate older releases without relying
on a tag bump alone.
Early-init edge case (api.updates not yet loaded) skips the version
check rather than wedging the boot — at worst an unstamped file is
written once and rejected on the next call.
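The stamp-and-validate round trip can be sketched as below. The function names are hypothetical simplifications of the helpers listed above; passing webui_version=None models the early-init case where the version check is skipped.

```python
import json
from pathlib import Path
from typing import Optional

SCHEMA_VERSION = 2
_STAMP_KEYS = ("_webui_version", "_schema_version")


def save_cache(path: Path, cache: dict, webui_version: Optional[str]) -> None:
    """Write the cache with disk-only version stamps added."""
    payload = dict(cache, _webui_version=webui_version, _schema_version=SCHEMA_VERSION)
    path.write_text(json.dumps(payload))


def load_cache(path: Path, webui_version: Optional[str]) -> Optional[dict]:
    """Return the cache without stamps, or None if it is missing, corrupt, or stale."""
    try:
        payload = json.loads(path.read_text())
    except (OSError, ValueError):
        return None
    if not isinstance(payload, dict):
        return None
    if payload.get("_schema_version") != SCHEMA_VERSION:
        return None
    # Early init: version not known yet, so skip the version comparison.
    if webui_version is not None and payload.get("_webui_version") != webui_version:
        return None
    return {k: v for k, v in payload.items() if k not in _STAMP_KEYS}
```

Stripping the stamps on load keeps the in-memory shape identical to what callers saw before the change.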
Updated existing tests/test_model_cache_metadata.py to use subset/
round-trip semantics rather than byte-for-byte equality, since the
disk payload now has additional stamps. The four response-shape
fields still round-trip verbatim; the load result is unchanged
(stamps stripped). 19 new regression tests.
4180 -> 4199 tests pass.
SHOULD-FIX: rate-limit _repair_stale_pending repair-firing telemetry. Switch
from an unconditional logger.warning to an age-keyed level: WARNING when
pending_age < 5min (the diagnostically valuable race window, i.e. actual
leak-path candidates that slipped past the grace guard) and DEBUG for the
long tail (orphaned sidecars from prior process lifetimes). This prevents
reconnect loops on stuck sessions from flooding the log while preserving the
diagnostic signal we want for tuning _REPAIR_STALE_PENDING_GRACE_SECONDS
empirically.
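The age-keyed level selection amounts to a one-line threshold. In this sketch the constant and function names are hypothetical; the 5-minute boundary is the one stated above.

```python
import logging

RACE_WINDOW_SECONDS = 5 * 60  # repairs younger than this are worth a WARNING


def repair_log_level(pending_age_seconds: float) -> int:
    """Pick WARNING inside the race window, DEBUG for long-orphaned sidecars."""
    if pending_age_seconds < RACE_WINDOW_SECONDS:
        return logging.WARNING
    return logging.DEBUG
```

A caller would then log with something like logger.log(repair_log_level(age), "repaired stale pending turn").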
NIT: _LOCAL_SERVER_PROVIDERS expanded with lm-studio (hyphenated alias used
in some custom_providers configs and already recognized at api/config.py:2189
for SSRF host trust) and localai (LocalAI project). Test parametrize expanded
from 7 to 11 names, also covering pre-existing koboldcpp and textgen for
symmetry. +4 regression tests.
NIT (docs): CHANGELOG callout for the RFC1918 behavior change. Internal-
network OpenAI-compatible proxies now preserve the model prefix on private-IP
base_urls. Documented the migration path: configure as a custom_providers
entry to bypass the local-server detection.
NIT (deferred, optional): narrowing the heuristic to is_loopback only is
left as future work; the broader scope was an explicit goal in the bug
body and Opus flagged it as SHOULD-DISCUSS-but-not-block.
4184 -> 4188 passing. 0 regressions. ~10 LOC absorbed total.
Closes #1623: lower SSE app heartbeat from 30s to 5s at every long-lived
handler (main agent, terminal, gateway-watcher, approval-poller, clarify-poller).
Kernel TCP keepalive declares peer dead at 25s worst-case (10s KEEPIDLE +
5s KEEPINTVL * 3 KEEPCNT, added v0.50.289 #1581). 30s app heartbeat let the
kernel tear sockets down on flaky networks before the app sent its first
keepalive byte — drops at ~10s during long thinking phases. New named
constant _SSE_HEARTBEAT_INTERVAL_SECONDS=5; regression test pins the
inequality (app_heartbeat * 2 <= kernel_window) so future tuning can't
re-introduce the misalignment.
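The arithmetic behind the alignment check, written out as a sketch. The constant names mirror the socket options mentioned above but are plain local values here, and the two helper functions are hypothetical.

```python
# Keepalive settings as quoted in the text (seconds, seconds, probe count).
TCP_KEEPIDLE = 10
TCP_KEEPINTVL = 5
TCP_KEEPCNT = 3
SSE_HEARTBEAT_INTERVAL_SECONDS = 5


def kernel_dead_peer_window(idle: int, interval: int, count: int) -> int:
    """Worst-case seconds before the kernel declares the peer dead."""
    return idle + interval * count


def heartbeat_is_aligned(heartbeat: int, window: int) -> bool:
    """The pinned inequality: two heartbeats must fit inside the kernel window."""
    return heartbeat * 2 <= window
```

With these values the window is 10 + 5 * 3 = 25s, so a 5s heartbeat satisfies the inequality (10 <= 25) while the old 30s value does not (60 > 25).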
Closes #1624: add 30s grace period to _repair_stale_pending() trigger.
Without it, any narrow race between the streaming thread clearing
pending_user_message and STREAMS.pop(stream_id) produces a false-positive
'Previous turn did not complete.' marker on a turn that finished correctly
(reproducible after every command-approval turn). Defense-in-depth, not
the root-cause fix — the actual streaming-thread leak path is tracked
separately. Falsy pending_started_at (legacy sidecars) treated as
'old enough' so legitimate legacy-data recovery still works. Plus
logger.warning telemetry on every legitimate repair so the next batch of
user reports tells us whether the underlying race still fires.
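The grace check itself is small. This sketch isolates only the age decision, with hypothetical names; it assumes pending_started_at is an epoch timestamp and leaves out the surrounding repair logic.

```python
from typing import Optional

GRACE_SECONDS = 30


def pending_is_old_enough(pending_started_at: Optional[float], now: float) -> bool:
    """True when a pending turn is past the grace window and may be repaired."""
    if not pending_started_at:
        # Legacy sidecars carry no timestamp: treat them as old enough.
        return True
    return now - pending_started_at >= GRACE_SECONDS
```

A turn that started 3 seconds ago is therefore skipped, while a sidecar with no timestamp at all is still recovered.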
Closes #1625: local model servers (LM Studio, Ollama, llama.cpp, vLLM,
TabbyAPI, koboldcpp, textgen-webui) now keep the full HuggingFace-style
model id (e.g. 'qwen/qwen3.6-27b' instead of stripped 'qwen3.6-27b'). New
_LOCAL_SERVER_PROVIDERS set + _base_url_points_at_local_server() loopback/
RFC1918 heuristic — either signal triggers no-strip. Backward compat
preserved for OpenAI-compatible proxies on public hosts (LiteLLM at
litellm.example.com still strips openai/gpt-5.4 -> gpt-5.4). Updated the
existing #230/#433 test to reflect that #1625 supersedes the strip-on-custom
rule for loopback hosts (see api/config.py and test_model_resolver.py
docstring update). Reported by @akarichan8231 in Discord on 2026-05-04.
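One way the loopback/RFC1918 heuristic and the no-strip decision could be written with the standard library is shown below. This is an approximation, not the code in api/config.py: the provider set is a small illustrative sample, treating the literal hostname "localhost" as local is an assumption, and the function names drop the leading underscore to mark them as stand-ins.

```python
import ipaddress
from urllib.parse import urlparse

# Illustrative sample only; the real set is larger.
LOCAL_SERVER_PROVIDERS = {"ollama", "lm-studio", "localai", "koboldcpp", "textgen"}

_RFC1918_NETWORKS = [
    ipaddress.ip_network(net)
    for net in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")
]


def base_url_points_at_local_server(base_url: str) -> bool:
    """True for loopback or RFC1918 hosts; hostnames other than localhost are not resolved."""
    host = urlparse(base_url).hostname
    if not host:
        return False
    if host == "localhost":
        return True
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        return False  # a DNS name such as litellm.example.com
    return addr.is_loopback or any(addr in net for net in _RFC1918_NETWORKS)


def resolve_model_id(model_id: str, provider: str, base_url: str) -> str:
    """Keep the full id for local servers; strip the vendor prefix otherwise."""
    if provider in LOCAL_SERVER_PROVIDERS or base_url_points_at_local_server(base_url):
        return model_id
    return model_id.split("/", 1)[-1]
```

Either signal alone is enough to keep the prefix, which matches the "either signal triggers no-strip" rule above.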
42 regression tests across:
tests/test_issue1623_sse_heartbeat_alignment.py (3)
tests/test_issue1624_repair_stale_pending_grace.py (9)
tests/test_issue1625_local_server_model_id_preservation.py (30)
4142 -> 4184 passing. 0 regressions.
SHOULD-FIX #1 (renamed-root client cross-alias): drop strict-equality client
filter at static/sessions.js:1853. Server-side _profiles_match cross-aliases
'default'-tagged rows to a renamed root 'kinni'; the strict-equality client
would reject them, dropping every legacy session for renamed-root users. The
server is now solely authoritative for profile scoping.
SHOULD-FIX #2 (messaging-source dedupe ordering): _keep_latest_messaging_session_per_source
now runs AFTER the profile filter at api/routes.py:2078. Before, it ran on
the merged cross-profile list with profile-blind keys, so when two profiles
shared a messaging identity the older row was discarded before the scope
filter ran. If the surviving row belonged to another profile, the active
profile was left with zero rows for that identity.
NIT #3: _projects_migrated flag now set only AFTER successful save_projects.
NIT #4: cleaned dead test code in test_is_root_profile_invalidation_drops_stale.
NIT #5: _create_profile_fallback's clone_from=='default' literal now routes
through _is_root_profile() for parity with the 5 other callsites.
+2 regression tests pin the SHOULD-FIX shapes:
- test_keep_latest_messaging_runs_after_profile_filter (source-string ordering)
- test_static_sessions_js_trusts_server_profile_scoping (no client re-filter)
4173 -> 4175 tests pass. 0 regressions.
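Why the ordering in SHOULD-FIX #2 matters can be shown with a toy model. The session dict shape (profile, source, updated_at) and both function names are assumptions for illustration; the exact-match profile filter here also ignores the renamed-root aliasing that the real server performs.

```python
from typing import Dict, List


def keep_latest_per_source(sessions: List[dict]) -> List[dict]:
    """Keep only the most recently updated session for each messaging source."""
    latest: Dict[str, dict] = {}
    for session in sessions:
        key = session["source"]
        if key not in latest or session["updated_at"] > latest[key]["updated_at"]:
            latest[key] = session
    return list(latest.values())


def visible_sessions(sessions: List[dict], active_profile: str) -> List[dict]:
    """Scope to the active profile first, then dedupe within that scope."""
    scoped = [s for s in sessions if s["profile"] == active_profile]
    return keep_latest_per_source(scoped)
```

With one "work" row and one newer "home" row sharing a source, deduping first keeps only the "home" row, and the subsequent "work" filter returns nothing; filtering first keeps the "work" row.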