Opus stage-339 review SHOULD-FIX items:
1. server.py: drop 'unsafe-eval' from CSP report-only policy.
Verified by grepping all production JS — zero matches for eval(),
new Function(), or string-form setTimeout/setInterval. Keeping it
was a gratuitous privilege.
2. server.py: add https://cdn.jsdelivr.net to script-src + style-src.
index.html loads Prism/xterm/katex from this CDN with SRI hashes —
without the allowance every page load fires known-good CSP violations
that drown out real signal once a collector is wired.
3. api/commands.py: sanitize plugin command error. Previously returned
f'Plugin command error: {exc}' which would leak paths/env from
FileNotFoundError('/etc/something/secret.key') etc. Now returns only
the exception type name; full traceback goes to server log.
Test asserts updated to match the new policy shape.
Co-authored-by: Opus advisor <opus-advisor@hermes.local>
- replace navigator.clipboard.writeText with _copyText (has textarea fallback)
- add severity filter dropdown (All / Errors / Warnings+)
- add _severityForLine and _filteredLogsLines helpers
- add logsSeverityFilter HTML element + CSS class hooks
- add 5 new i18n keys across all 8 locales
- update test_logs_ui_static.py to match new implementation
Closes#2081
Add test_kanban_locale_parity to test_kanban_ui_static.py that asserts
every kanban_* i18n key in the English locale exists in all non-English
locale blocks. Pattern follows test_lineage_segment_locale_keys_are_defined_for_sidebar_locales.
CI's pytest invocation imports conftest twice (once via the standard
tests/ discovery, once via repo-root rootdir discovery), producing two
distinct function objects with the same __qualname__ but different `is`
identity. The strict identity assertion failed because each import
created a fresh closure. Switch to __qualname__ substring check — same
guarantee (default-on state has the wrapper installed; fixture restores
the real one) without the multi-import sensitivity.
CI on Python 3.11 still failed test_allow_outbound_network_fixture_*
because the previous module-global toggle (_ALLOW_OUTBOUND=True/False)
was unreliable on the runner — the wrapper's global lookup at call time
sometimes saw False even after the fixture's True assignment.
Switch to monkeypatch-based fixture: instead of toggling a global that
the wrapper checks, restore socket.create_connection and
socket.socket.connect to their REAL captured implementations for the
duration of the test. Pytest's monkeypatch fixture handles teardown so
the wrappers are reinstalled automatically.
Rewrote the two paired tests to check function identity
(socket.create_connection is _hermes_blocked_create_connection vs. is
_REAL_CREATE_CONNECTION) instead of attempting a live outbound to
8.8.8.8:53 — direct identity check is hermetic and doesn't depend on
whether the CI runner has any outbound network access at all.
Two low-severity follow-ups from Opus regrounding review:
1. The IPv6 unique-local fc00::/7 check was `h.startswith('fc') or
h.startswith('fd')` — too loose. It would also classify hostnames
like 'food.example.com' or 'fdsa.test' as 'local' and silently let
them through the block. Tightened to a regex match for canonical
IPv6 syntax (`f[cd][0-9a-f]{0,2}:`) so only actual IPv6 addresses
match. Same fix in both tests/conftest.py and server.py.
2. test_allow_outbound_network_fixture_unblocks was technically
self-passing: it tried to connect to a *.invalid hostname, which is
in the allow-list, so the real socket.create_connection would run
regardless of whether the fixture toggled the block. Replaced with
a public-IP-based test that actually proves the toggle works, plus
a paired test_block_is_active_outside_the_fixture sanity test that
proves the block is on without the fixture.
Both follow-ups noted by Opus advisor as 'defer-OK' but trivial fixes
so landing them in this batch.
CI on Python 3.13 (clean editable install, no hermes_cli package) was still
failing the 3 lmstudio tests after the first fix attempt. Root cause: the
outer try/except in the lmstudio branch was catching ImportError from
`from hermes_cli.models import provider_model_ids`, hijacking the whole
branch and silently skipping the urlopen fallback.
Restructured into two independent tiers:
1. hermes_cli lookup in its own try/except — ImportError logs at DEBUG
and continues with lm_ids=[].
2. urlopen fallback runs unconditionally when lm_ids is empty, including
after hermes_cli import failure.
New regression test `test_lmstudio_fallback_works_when_hermes_cli_unavailable`
explicitly blocks hermes_cli via sys.meta_path and verifies the lmstudio
group still populates from the urlopen fallback. Without this test, the
CI-vs-local divergence (local env had hermes_cli installed, CI didn't)
would keep slipping through.
All 12 lmstudio-related tests pass, including the 3 #1527 tests that
broke on stage-337.
PR #2053 added worktree-backed session creation. PR #2041 (shipped in
v0.51.42) added state.db sidecar reconciliation that rebuilds a missing
<sid>.json sidecar from the canonical state.db row when the JSON file is
gone (failed save, manual rm, restore-from-backup with mismatched dirs).
The two interact silently. `_state_db_row_to_sidecar()` was hard-coding
`'workspace': ''` and never propagating the four worktree_* fields from
the row to the rebuilt sidecar dict. So a worktree-backed session that
loses its sidecar and gets rebuilt from state.db:
- loses `worktree_path` → matches the empty-session sidebar filter at
`api/models.py:1067/1107` (which spares worktree-backed empty sessions
via `not s.get('worktree_path')`) → session disappears from the
sidebar even though the worktree directory still exists on disk.
- loses `workspace` → downstream tools (terminal panels, file pickers
that use `s.workspace`) operate on empty string instead of the original
worktree path.
- always reports `message_count == 0` → contributes to the empty-session
filter even for sessions that have messages in `state.db.messages`.
Fix:
1. `_read_state_db_missing_sidecar_rows()` SELECT now includes
`workspace, worktree_path, worktree_branch, worktree_repo_root,
worktree_created_at, message_count` (each gated by
`_sql_optional_col()` so older state.db schemas without those columns
continue to work — recovery degrades gracefully rather than 500ing).
2. `_state_db_row_to_sidecar()` propagates each field. workspace comes
from the row if it's a string, otherwise '' (matching pre-fix behavior
for non-worktree sessions). message_count comes from the row if
it's an int, otherwise falls back to `len(messages)` so the rebuilt
sidecar always has a coherent count.
3 new regression tests in tests/test_state_db_worktree_recovery.py
exercise:
- worktree session with messages → all four worktree_* fields preserved.
- non-worktree session → worktree_* fields all None (no spurious
propagation), workspace=''.
- empty worktree session (the worst case) → confirms the rebuilt sidecar
does NOT match the empty-session-exempt filter, so it stays visible
in the sidebar.
Caught by Opus advisor during stage-337 review (the cross-PR interaction
between #2053 and the previously-shipped #2041 wasn't exercised by either
PR's individual test suite).
PR #1970 added a dedicated `elif pid == "lmstudio":` branch in
`get_available_models()` that fetches the live /v1/models list when the
hermes_cli helper doesn't have ids cached. The fallback path inside that
branch only looked at `cfg["providers"]["lmstudio"]["base_url"]`, missing
the historical config shape where the URL lives under `cfg["model"]`:
model:
provider: lmstudio
base_url: http://192.168.1.22:1234/v1 ← here, not under providers.lmstudio
providers:
lmstudio:
api_key: local-key
3 pre-existing tests in tests/test_issue1527_lmstudio_base_url_classification
broke on stage-337 because of this — they passed on master, failed after
the PR #1970 merge.
The simpler fix is to enhance the already-introduced `_get_provider_base_url()`
helper so it falls back to `cfg["model"]["base_url"]` when
`cfg["model"]["provider"] == provider_id`, then use the helper inside the
lmstudio branch instead of a direct lookup. This keeps the previous
behaviour (where the generic configured-provider branch handled lmstudio
via the model block) while preserving PR #1970's live-discovery additions.
Belt-and-suspenders: `_get_provider_base_url()` explicitly does NOT inherit
model.base_url for providers other than the active one — if a user's config
says `model.provider: anthropic` and they have `providers.openai` configured
without a base_url, openai must still resolve to None (use SDK default),
not to the anthropic proxy URL.
6 new regression tests in tests/test_pr1970_lmstudio_base_url_fallback.py
lock the two-location lookup, the precedence rule (explicit providers entry
wins over model fallback), trailing-slash stripping, and the negative case
(model.base_url MUST NOT leak to non-active providers).
All 51 tests in the existing model-resolver + custom-provider banks still
pass.
Caught by maintainer review on stage-337 (full pytest with the new network
isolation in place surfaced the regression that the fork-CI mock-server path
would have hidden).
Tests should not reach the public internet. Before this commit, an
accidentally-leaking outbound socket from the test_server fixture (real
TLS handshakes to Anthropic / Amazon / OpenRouter, sometimes triggered
by SDK-init paths that found a credential the credential-strip allowlist
missed) was adding 60+s of wall-time to a 100s test run and creating a
class of flaky failures.
This installs a default-deny socket-block at two layers:
1. Pytest process, via tests/conftest.py module-level monkey-patch on
socket.create_connection + socket.socket.connect. Loopback / RFC1918
private / link-local / RFC2606 reserved-TLD destinations pass through;
anything else raises OSError("hermes test network isolation: outbound
to ... blocked"). Tests that legitimately need real outbound opt back
in via the new `allow_outbound_network` fixture (no current callers).
2. Test_server subprocess (server.py), via a HERMES_WEBUI_TEST_NETWORK_BLOCK=1
environment-variable-gated guard at the top of server.py. tests/conftest.py
sets the env var on every test_server spawn. Without this, the subprocess
could make outbound that the pytest-side block can't see (which is exactly
what was happening — verified via `ss -tnp` showing the server.py child
with established ESTAB sockets to [2607:6bc0::10]:443).
In production the env var is unset, so the guard is a no-op.
Companion changes:
- test_dns_resolution_failure refactored to mock socket.getaddrinfo
raising gaierror, instead of relying on a real DNS lookup of a
*.invalid hostname. The test was the one outlier that genuinely
exercised real DNS; mocking matches what every other probe-error test
in the same file already does.
- New tests/test_conftest_network_isolation.py with 9 adversarial
tests proving the block fires for public IPs (including the exact
Anthropic IPv6 and Amazon IPv4 destinations we observed leaking),
the allow-list passes loopback / RFC1918 / link-local / reserved-TLDs,
and the opt-in fixture re-enables real outbound when needed.
Test suite: 5,120 → 5,192 (+72 net new from this commit + the regression
tests in the companion commits). Wall time: 161s → 95s on the same
hardware. No remaining outbound from any test path.
`_isDesktopWidth()` in boot.js gates every collapse path on
`matchMedia('(min-width:641px)')` — matching where the rail itself becomes
visible. The CSS rules driving the actual visual collapse were nested inside
the workspace-panel block at `@media(min-width:901px)` — a threshold copied
from the right-panel collapse but with no functional reason to apply here.
Behavioural consequence in the 641–900 px band (tablet portrait + small
laptop windows):
- Rail is visible, user clicks the active icon
- JS adds `.layout.sidebar-collapsed` and writes localStorage='1'
- JS sets aria-expanded='false' on the active rail button
- CSS at min-width:901px does NOT apply → sidebar stays at 300 px width
- User sees no visual change; screen reader announces collapsed state for
a sidebar that is still visible; localStorage silently persists
- Resize to ≥901 px later → sidebar suddenly collapses (surprise state)
Fix: hoist the three `.sidebar-collapsed` / flash-prevention rules out of
the workspace-panel @media block and into their own `@media(min-width:641px)`
block. The rail visibility breakpoint, the JS gate, and the CSS gate now
all agree.
`:not(.mobile-open)` is preserved on both selectors so the mobile slide-in
overlay (handled in the `max-width:640px` block) is never targeted — the
new @641 boundary doesn't change that contract.
Verified breakpoint matrix end-to-end (Node harness over real boot.js +
style.css):
Width | JS desktop | CSS applies | Effect
------|------------|-------------|------------
640 | no | no | no-op (mobile overlay)
641 | yes | yes | collapses ✓
700 | yes | yes | collapses ✓
768 | yes | yes | collapses ✓
900 | yes | yes | collapses ✓
1024 | yes | yes | collapses ✓
Regression test added: `test_css_breakpoint_matches_js_isdesktopwidth`
parses boot.js for the `_isDesktopWidth` matchMedia query, walks CSS to
find the @media block enclosing `.layout.sidebar-collapsed`, and asserts
the thresholds match. Locks the invariant so a future refactor can't
re-introduce the asymmetric-band silent-state-leak.
Test counts:
- tests/test_sidebar_collapse_toggle.py: 35/35 pass (was 34, +1 regression)
- Full suite (Python 3.14, local): 5040 passed, 0 failed
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two test-infrastructure fixes surfaced while running the full suite on
this branch. Both prevent accidental outbound network calls from the
pytest process — a class of bug that doesn't show up as test failures
but corrupts timing, leaks credentials, and was responsible for a recent
10× slowdown observation.
## 1. AWS_EC2_METADATA_DISABLED for the whole pytest session
When hermes-agent's bedrock_adapter / botocore credential chain is
imported during tests (e.g. via api/config.py provider-catalog imports),
botocore probes the EC2 Instance Metadata Service at 169.254.169.254
looking for an instance role. On VPS hosts where IMDS is reachable but
rate-limited (HTTP 429) or non-responsive, those probes dominate wall
time — a 161s test run was observed extending to 600+s.
Set `AWS_EC2_METADATA_DISABLED=true` at module load (before any test-file
imports trigger botocore initialisation). This is the documented AWS-
supported way to silence the probe and matches the guard the agent's own
`hermes_cli/doctor.py` already uses inside its parallel-probe block.
Also explicitly re-set the var on the spawned test-server env so it
can't be accidentally cleared by a later `env.update(...)`.
## 2. Expanded credential-strip allowlist
The original strip list covered 6 providers (OpenRouter, OpenAI,
Anthropic, Google, DeepSeek, Xiaomi). Several others leaked through
into the test server subprocess:
- `MEM0_API_KEY`, `XAI_API_KEY`, `MISTRAL_API_KEY`, `OLLAMA_API_KEY`,
`GROQ_API_KEY`, `TOGETHER_API_KEY`, …
- AWS credentials (`AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`,
`AWS_SESSION_TOKEN`, `AWS_PROFILE`, `AWS_BEARER_TOKEN_BEDROCK`)
- Messaging bot tokens (`TELEGRAM_BOT_TOKEN`, `DISCORD_BOT_TOKEN`,
`SLACK_BOT_TOKEN`, `SIGNAL_API_TOKEN`, `WHATSAPP_API_TOKEN`)
- Memory providers (`HONCHO_API_KEY`, `SUPERMEMORY_API_KEY`)
- Search / browser / image-gen (`FIRECRAWL_API_KEY`, `FAL_KEY`,
`TAVILY_API_KEY`, `SERPER_API_KEY`, `BRAVE_API_KEY`)
- GitHub tokens (`GH_TOKEN`, `GITHUB_TOKEN`)
- Azure OpenAI (`AZURE_OPENAI_API_KEY`, `AZURE_OPENAI_ENDPOINT`)
A real outbound TLS connection to a provider's IPv6 endpoint was
observed during a test run on this host before the strip was expanded.
The test server uses a mock config and has no business making real API
calls.
## Test status
5,151 passed / 11 skipped / 1 xfailed / 2 xpassed / 0 regressions in
139s on Python 3.11. Down from 147s before the fixes (and from
intermittent 10×-slowdowns on IMDS-rate-limited hosts). All API/feature
contracts unchanged.
## Security audit of remaining test-suite host references
Every IP / URL / hostname referenced in `tests/**.py` was classified:
- Loopback (127.0.0.1, localhost, ::1, 0.0.0.0)
- RFC1918 private (10.*, 172.16-31.*, 192.168.*)
- RFC 5737 TEST-NET-3 documentation (203.0.113.*)
- RFC 2606 reserved docs domains (*.example.com, *.example.local,
*.example.test)
- Security-attack input strings used only as parser/validator input
(evil.com, attacker, evil.example.com — never resolved or contacted)
- Real provider/CDN endpoints used only as `base_url` config strings
or CSP-allowlist assertions — never actually fetched
- 8.8.8.8 used only as a "non-loopback example" in `_is_local_from_handler()`
unit tests
No suspicious egress destinations.
Lets desktop users collapse the session-list sidebar to maximise the chat
area, without adding any visible UI affordance. Default appearance is
identical to master — only users who actively try to toggle (or know the
keyboard shortcut) ever see a difference.
## Behaviour (desktop only, ≥641px)
| State | Action | Result |
|------------------------------------|-----------------------|-----------------------------------------|
| Sidebar open, click active rail | Toggle | Sidebar collapses to width:0 |
| Sidebar open, click different rail | Normal switch | **Sidebar stays open** (no surprise) |
| Sidebar collapsed, click any rail | Expand + switch | Sidebar expands, then panel switches |
| Anywhere, Cmd/Ctrl+B | Toggle | Same as same-active-rail click |
| Mobile (<641px), any of the above | No-op | Mobile overlay behaviour unchanged |
Two discoverability paths, both opt-in. **No new visible buttons.** Users
who never click the active rail icon see zero UI change vs. master.
## Surface-minimal design
The behaviour is contained behind one extra arg on the rail/sidebar-nav
onclick: `switchPanel('chat',{fromRailClick:true})`. Without that flag the
function preserves master's behaviour exactly — every programmatic
`switchPanel(name)` callsite (commands, deeplinks, internal state changes)
is unaffected. The guard chain inside `switchPanel`:
opts.fromRailClick && _isDesktopWidth() && (
_isSidebarCollapsed() ? expandSidebar() :
prevPanel === nextPanel ? (toggleSidebar(true); return false))
is the ONLY new code path that can cause a collapse. Cross-panel clicks
fall through to the existing switch logic untouched.
## Polish from both source PRs
- **Click-active gesture** as the primary toggle (#1884 @jasonjcwu — the
genuine UX innovation; no extra button needed)
- **Cmd/Ctrl+B keyboard shortcut** (#1924 @spektro33; VS Code convention).
Guarded against firing when typing in INPUT / TEXTAREA / contenteditable
so the shortcut never steals from in-progress text editing.
- **Inline flash-prevention `<script>`** in `<head>` (#1924) sets
`data-sidebar-collapsed='1'` on `<html>` BEFORE the stylesheet loads,
so cold loads with a persisted-collapsed state paint correctly from
frame 0 with no flicker. Cleared by JS once the class system takes over.
- **Smooth slide animation** via `.24s cubic-bezier(.22,1,.36,1)`
(#1924, mirrors the existing workspace-panel collapse on the right)
- **`aria-expanded` mirrored** on the active rail button (#1884) so
screen readers announce open/collapsed transitions.
- **`body.resizing` transition-suppression** (#1884) keeps the drag-resize
cursor instant — no animation during a width-resize gesture.
- **bfcache `pageshow` re-sync** (#1884) — if another tab toggled the
sidebar while this page was frozen, bring it in line on restore.
## Drops vs. #1924
- No persistent rail "toggle sidebar" button (Nathan: keep the UI stealth)
- No close-X button in chat panel head (same reason)
- No i18n keys for the dropped buttons
## What did NOT change
- 22 rail/sidebar-nav `onclick` handlers gained the `{fromRailClick:true}`
arg — function-call shape, invisible to users
- 1 inline `<script>` in `<head>` (flash prevention) — invisible
- 5 lines of CSS — invisible unless someone collapses
That's the entire visible-UI delta. **23 ins / 22 del on `index.html`,
all string-replace.**
## Verification
- 5,151 pytest passing including a new 34-test structural suite covering
every contract (CSS rules, JS functions, fromRailClick guard, legacy
proxy forwarding, flash-prevention `<script>` ordering, mobile
exclusion via :not(.mobile-open) selector, aria-expanded sync).
- Live browser walkthrough at 1280px verified:
- Default boot state identical to master (sidebar open, width 300px)
- Click active rail → collapse (width 1, opacity 0, translateX -14px,
localStorage='1', aria-expanded=false). Panel unchanged.
- Click active rail again → expand back to width 300, aria=true
- Click DIFFERENT rail → normal switch, sidebar stays open (legacy-
preserving case, verified explicitly)
- Click rail while collapsed → expand + switch in one gesture
- Cmd+B toggles correctly
- Cmd+B inside `<textarea>` → suppressed (defaultPrevented=false)
- Reload with collapsed state persisted → restores without flash
- Mobile simulation (matchMedia returns false for min-width:641px):
same-active-rail click is no-op, Cmd+B is no-op, sidebar stays at 300px
Co-authored-by: jasonjcwu <jasonjcwu@users.noreply.github.com>
Co-authored-by: spektro33 <spektro33@users.noreply.github.com>
Closes#1884Closes#1924
Two follow-ups from Opus pre-release review of stage-336:
1. tests/conftest.py — autouse session fixture that removes
HERMES_WEBUI_SKIP_ONBOARDING from os.environ for the whole pytest run, and
restores it after. Hosting providers and isolated harnesses set this var
to short-circuit the onboarding wizard, but it leaked into pytest and
caused tests that exercise apply_onboarding_setup() to fail with cryptic
FileNotFoundError. Tests that specifically validate the short-circuit
behavior can opt back in with monkeypatch.setenv. Surgical per-test
delenv calls remain as defense-in-depth but are now redundant.
2. docs/rfcs/README.md — one-line note that first-time contributor RFCs
should be discussed in an issue before opening a PR. Gates drive-by
design-doc PRs without us having to decline them on contribution.
Verified: 96 onboarding-related tests pass with HERMES_WEBUI_SKIP_ONBOARDING=1
exported in the test runner env (would have failed before this fixture).
1. test_issue1362_codex_oauth_onboarding.py::test_anthropic_onboarding_setup_allows_linked_oauth_without_api_key
Pre-existing env-collision bug, surfaced when HERMES_WEBUI_SKIP_ONBOARDING=1
is in the test runner env (set by hosting providers and by isolated test
harnesses). `apply_onboarding_setup()` short-circuits without writing the
config file when SKIP_ONBOARDING is set, but the test asserts the file was
written, so it fails with FileNotFoundError on read_text().
Fix: `monkeypatch.delenv("HERMES_WEBUI_SKIP_ONBOARDING", raising=False)` —
matches the convention already used in test_issue1499_keyless_onboarding.py
and test_issue1500_lmstudio_env_var_alignment.py.
2. test_issue1800_file_html_interactions.py::test_media_html_inline_keeps_csp_sandbox
Slicing-based source-string assertion (4000-char window after `def _handle_media`)
broke because PR #2044's MEDIA_ALLOWED_ROOTS parsing was inserted earlier in
the function and pushed the CSP block to offset 4211. Widened window to 5000.
Assertion content is structural (CSP sandbox string present), not positional.