mirror of https://github.com/nesquena/hermes-webui.git
synced 2026-05-25 11:10:18 +00:00
6967965782bc45ac10ade67ea190a96a7df72467
10 Commits

6a26e82c22
fix(bootstrap): address Opus pre-merge review feedback (#1478)

Three changes from the pre-merge Opus review:

**MUST-FIX**: XPC_SERVICE_NAME false positive on macOS Terminal

macOS launchd sets `XPC_SERVICE_NAME` in EVERY Terminal-spawned shell, not just in real services. Typical noise values are `"0"` (truthy in Python!) and `"application.com.apple.Terminal.<UUID>"`. A bare `os.environ.get(name)` existence check would auto-promote interactive `./start.sh` runs to foreground mode on every Mac dev machine. That silently breaks the most common installation path: no /health probe, no browser open, no log file, and a hanging shell.

Fix: a new `_is_real_supervisor_value()` helper that filters noise. For `XPC_SERVICE_NAME` specifically, it rejects `"0"` and any `"application.*"` prefix. Real launchd plists use the reverse-DNS Label form (`com.<rdns>.<svc>`), which still triggers detection correctly.

7 new tests in `TestXPCServiceNameNoiseFilter`:
- 4 noise values (`0`, Terminal.app, iTerm2, VSCode) → no detection
- 3 real Label forms → correct detection
- Mixed env with XPC noise + a real INVOCATION_ID → falls through to systemd

**SHOULD-FIX 1**: Test env leakage

The original `clean_env` fixture stripped the supervisor-detection env vars but not the resolved bootstrap vars (HERMES_WEBUI_HOST/PORT/AGENT_DIR) that `main()` mutates onto `os.environ`. After `test_foreground_exports_resolved_env_vars` ran, later tests would import bootstrap with polluted defaults (DEFAULT_HOST="0.0.0.0" instead of "127.0.0.1"). Existing assertions still passed because they were tautological against DEFAULT_*, but this was a footgun for future tests.

Fix: extend `clean_env` to also `delenv` the three resolved vars before each test.

**SHOULD-FIX 2**: Pre-execv executability guard

If `discover_launcher_python` returns a path that doesn't exist or isn't executable, the following chain occurs: `os.execv` raises OSError → the wrapper catches it → SystemExit(1) → the supervisor restarts → loop forever. That is exactly the failure mode this PR is supposed to eliminate.

Fix: an `os.access(python_exe, os.X_OK)` check before execv converts the infinite supervisor loop into a single visible RuntimeError. 1 new test in `TestForegroundExecutabilityGuard` pins that the guard fires before execv when the python path is non-executable.

**Docs**: supervisor.md updates
- New section explaining the XPC_SERVICE_NAME noise filter and which values do and don't trigger detection
- New section listing supervisors that are NOT auto-detected (runit, daemontools, PM2, Foreman/Honcho, custom shell-script supervisors), with an explicit recommendation to set HERMES_WEBUI_FOREGROUND=1

Verification
- 3820 tests pass (+9 from this commit's new tests vs. 3811 on the original PR push)
- Filter manually verified end-to-end with the live os.environ: XPC=0 → None, XPC=application.* → None, XPC=com.example.foo → triggers
- run-browser-tests.sh ALL CHECKS PASSED on the worktree

Items deferred from the Opus review
- #4 chdir target may not exist: REPO_ROOT comes from __file__.resolve(), so it is stable; not a real concern in practice
- #6 two startup messages in foreground mode: cosmetic, and useful for diagnostics
- #7 stricter explicit-only mode: the user can already override it by simply not passing --foreground (current behavior)
- #8 test stub return value: trivial; can be fixed later if it becomes a regression surface
- #9 argparse positional-after-option ordering: the test reads fine

These can become follow-up issues if anyone hits them.

f84b6a4e2f
fix(bootstrap): add --foreground mode for process supervisors (#1458 Bug #1)

Issue #1458 reports persistent-host crashes (≥1/day) when running the WebUI under launchd KeepAlive on macOS. Root cause: `bootstrap.py` calls `subprocess.Popen([python, "server.py"], start_new_session=True)`, probes /health, then exits 0. Under any process supervisor (launchd, systemd, supervisord, runit, s6), the supervisor sees its tracked PID exit, marks the program as "completed," and respawns it. The new bootstrap fails to bind port 8787 because the orphaned server still holds it, exits non-zero, and the supervisor respawns again. The loop continues until the orphan crashes for some other reason and the next respawn finds the port free.

This PR addresses Bug #1 of the three failure modes tracked in #1458: the `bootstrap.py` double-fork breaking process supervisors. Bug #2 (state.db FD leak) and Bug #3 (HTTP-unhealthy wedge) remain open under the same issue; they need diagnosis data before a fix can land.

Changes
-------

1. `bootstrap.py`:
   - New `--foreground` argparse flag with help text mentioning launchd / systemd / supervisord.
   - New `_detect_supervisor()` that returns the env var name for any supervisor it detects: `INVOCATION_ID` / `JOURNAL_STREAM` / `NOTIFY_SOCKET` (systemd, s6), `XPC_SERVICE_NAME` (launchd), `SUPERVISOR_ENABLED` (supervisord), or `HERMES_WEBUI_FOREGROUND` for the explicit user opt-in. Truthy values for the explicit opt-in: `1` / `true` / `yes` / `on` (case-insensitive).
   - `main()` branches on `args.foreground or _detect_supervisor()`:
     - **Foreground path:** chdir to `agent_dir or REPO_ROOT`, then `os.execv(python, [python, server_path])` to replace the bootstrap process image with the server. The supervisor sees the long-lived server as the original child. There is no `wait_for_health` probe; the supervisor's KeepAlive / Restart=on-failure handles liveness.
     - **Default path:** unchanged. Spawn the server as a detached child via `Popen + start_new_session=True`, probe /health, return 0. This still works for interactive `bash start.sh` invocations.
   - Resolved env vars (HOST/PORT/STATE_DIR/AGENT_DIR) are now set on `os.environ` directly instead of on a local `env` copy, so they are inherited across `os.execv`.
2. `docs/supervisor.md` (new): runnable launchd plist, systemd .service, and supervisord conf examples, plus a diagnostic recipe (`lsof` + ppid chain) for catching the orphan loop in production.
3. `.gitignore`: allowlist `docs/supervisor.md`. The directory uses an opt-in pattern; this matches the existing `!docs/docker.md` precedent.
4. `tests/test_bootstrap_foreground.py` (new): 35 regression tests covering the argparse flag, `_detect_supervisor()` behavior across all five supervisor env vars, the explicit opt-in's truthy/falsy values, and `main()`'s execv-vs-Popen routing decision under each input combination. `os.execv` is monkeypatched in the routing tests. The tests pin the structural choice (which call is made, with which args, in which cwd, with which env), not the post-exec behavior.

Why this scope and no more
--------------------------

Bug #2 (state.db FD leak) lists 5 candidate paths and asks the reporter for `lsof -p <pid> | sort | uniq -c | sort -rn | head -20` output to disambiguate. Until that data lands, any "fix" would be speculative, and it is explicitly out of scope per the contributor-pickup comment on the issue.

Bug #3 (launchd-running, port-listening, HTTP-unhealthy) was added in @stefanpieter's reply comment. Diagnosis is in flight and there is no concrete fix shape yet, so it is also out of scope.

Running locally end-to-end verifies the behavior:

```
[bootstrap] Starting Hermes Web UI on http://127.0.0.1:8789 (foreground mode: --foreground)
$ pgrep -af 'server.py'
2997632 /home/.../python /tmp/wt-fix-1458/server.py
$ ps -o ppid -p 2997632
2997581   ← bash that ran bootstrap.py; same PID as the original bootstrap
$ ps -p 2997581 -o cmd
... bootstrap.py ...   ← but exec'd into server.py
```

The same PID that bash forked for `bootstrap.py` is now `server.py`. A supervisor watching that PID would correctly observe the long-lived server. No double-fork.

Verification
------------

- 3811 tests pass (`pytest tests/`, full suite; +51 from this PR plus the master merge-in)
- All 35 new bootstrap-foreground tests pass
- `bash scripts/run-browser-tests.sh` PASS (HTTP API checks against the worktree)
- `bash scripts/webui_qa_agent.sh 8789` PASS (23/23 visual QA)
- Live verified: the server starts cleanly under both `--foreground` and `HERMES_WEBUI_FOREGROUND=1`; PID lineage confirms no double-fork

Closes #1458 (Bug #1 only). Bugs #2 and #3 remain tracked under the issue.

4ee9368464
Opus pre-release follow-ups for PR #1445

REQUIRED:
- `_fully_unquote_path` range(3) -> range(10): defense-in-depth so a quadruple-encoded `..` is rejected by the validator instead of slipping through (not exploitable, but a contract violation)
- docs/EXTENSIONS.md trust-model callout moved to the top of the file, with explicit "don't enable in an untrusted env / don't point at a user-writable dir" guidance

NICE-TO-HAVE (taken since Nathan asked for all fixes, big and small):
- URL list capped at `_MAX_URL_LIST=32` to avoid pathological rendering
- One-shot WARNING log for rejected URLs (the silent drop is now visible to the admin)
- One-shot WARNING log for URL list truncation
- MIME map: ttf (font/ttf), otf (font/otf), wasm (application/wasm)

5 regression tests in tests/test_pr1445_opus_followups.py pin all invariants.

9de61a0b9a
feat: add opt-in webui extension hooks

b57525241b
v0.50.260: Docker reliability batch - PR #1428 + broader UX/docs improvements + Opus advisor fixes

Combines PR #1428 (UID/GID alignment) with a broader Docker reliability pass that addresses recurring user reports about compose files not working.

Constituent PR:
- #1428 sunnysktsang - Align agent UID/GID with webui (fixes #1399). The two- and three-container compose files ran the agent at UID 10000 (image default) and the webui at UID 1000 (WANTED_UID default), causing permission-denied errors on the shared hermes-home volume. All services now use `${UID:-1000}`.

Plus a broader Docker UX overhaul:
- All 3 compose files document the HERMES_SKIP_CHMOD/HERMES_HOME_MODE escape hatches inline (the v0.50.254 fix wasn't surfaced for Docker users).
- New .env.docker.example template covering UID/GID, paths, password, and permission handling. Per the Opus advisor, UID/GID are uncommented with placeholder values so macOS users don't skim past them.
- New docs/docker.md, a comprehensive guide: 5-min quickstart, failure-mode table with one-line fixes, bind-mount migration, multi-container architecture diagram, macOS Docker Desktop VirtioFS note, and a link to the community sunnysktsang/hermes-suite all-in-one image.
- README Docker section rewritten: clearer quickstart, failure-mode table, link to docs/docker.md. Stale /root/.hermes references removed.

Plus the Opus pre-release advisor MUST-FIX:
- HERMES_HOME_MODE has DIFFERENT semantics in the WebUI and in the agent image. In the WebUI it is a credential-file mode threshold (0640 allows group bits). In the agent it is the HERMES_HOME directory mode (default 0700). 0640 on a directory has no owner-execute bit, so the agent can't traverse its own home and bricks. My initial draft recommended HERMES_HOME_MODE=0640 in the agent service blocks; this is corrected to 0750 across all 4 surfaces (compose files, .env.docker.example, docs/docker.md). 3 regression tests pin the asymmetry.

12 regression tests total in test_v050260_docker_invariants.py. Full suite: 3627 passed, 0 failed.

Nathan explicitly authorized merging with my own review + Opus only; no independent review needed.
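The HERMES_HOME_MODE asymmetry in b57525241b comes down to what the execute bit means on a directory: `x` is what allows a process to enter it. The minimal check below uses the standard `stat` constants. `credential_mode_ok()` is a hypothetical model of a WebUI-style threshold, not the project's actual check.

```python
import stat


def dir_traversable_by_owner(mode: int) -> bool:
    """The owner can enter the directory only if the owner-execute bit is set."""
    return bool(mode & stat.S_IXUSR)


def credential_mode_ok(mode: int, threshold: int) -> bool:
    """Hypothetical threshold check: the file may not grant any bit outside `threshold`."""
    return (mode & 0o777 & ~threshold) == 0


for m in (0o640, 0o700, 0o750):
    print(f"{m:04o} as a HERMES_HOME dir -> owner can traverse: {dir_traversable_by_owner(m)}")
# 0640 as a HERMES_HOME dir -> owner can traverse: False
# 0700 as a HERMES_HOME dir -> owner can traverse: True
# 0750 as a HERMES_HOME dir -> owner can traverse: True
```

The same number, 0640, is therefore an acceptable credential-file threshold but an unusable directory mode. That is why the agent service blocks need 0750.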

f14280e2c4
fix(#1195): route sessions to profile dir even when dir doesn't exist yet (#1373)

When a user switched profiles and created a new session, the session was saved to the default profile directory instead of the active profile directory. This happened because get_hermes_home_for_profile() silently fell back to _DEFAULT_HERMES_HOME when the profile directory didn't exist on disk yet.

Root cause: api/profiles.py:156 had `if profile_dir.is_dir(): return profile_dir; return _DEFAULT_HERMES_HOME`. New profiles (no session yet, so no dir) routed every session back to the default.

Fix: remove the is_dir() guard and return the profile path unconditionally. The profile directory is created on first use by the agent/session layer.

5 regression tests in tests/test_issue1195_session_profile_routing.py: existing-profile, non-existent-profile (the core fix), None, empty-string, and 'default' all return the expected path.

Co-authored-by: bergeouss <bergeouss@users.noreply.github.com>

dca8624454
fix(ui): restore rail-era app titlebar state (v0.50.226) (#1163)

Merged as v0.50.226. The integration branch absorbed @aronprins's original PR #1141 with one reviewer fix from @nesquena (`1d11646`: the queue hide tooltip now references the queue pill, not the removed titlebar badge).

**Full gate results:**
- 2595 tests passing ✅
- Browser QA 21/21 (desktop 1440×900 + mobile iPhone 14) ✅
- Independent review: APPROVED by @nesquena ✅

Thank you @aronprins for the clean PR; the titlebar is properly restored.

76e602af25
feat: remove bubble_layout setting end-to-end (#777)

Removes the bubble_layout toggle from Settings, along with all of its persistence, CSS, i18n strings, and the UI docs demo. The CSS was already effectively dead. Users with a saved bubble_layout value in settings.json get a clean migration via _SETTINGS_LEGACY_DROP_KEYS.

Credit: @aronprins (PR #760 / #777)

Co-authored-by: aronprins <aronprins@users.noreply.github.com>
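A drop-list migration like `_SETTINGS_LEGACY_DROP_KEYS` can work roughly as sketched below when settings.json is loaded. The constant name comes from 76e602af25. Its exact contents and the `migrate_settings()` helper are assumptions.

```python
import json
from typing import Any, Dict

# Keys older versions wrote to settings.json that no longer mean anything.
_SETTINGS_LEGACY_DROP_KEYS = frozenset({"bubble_layout"})


def migrate_settings(raw: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of the settings without legacy keys; the input is not modified."""
    return {k: v for k, v in raw.items() if k not in _SETTINGS_LEGACY_DROP_KEYS}


saved = json.loads('{"theme": "dark", "bubble_layout": "left"}')
print(migrate_settings(saved))  # {'theme': 'dark'}
```

Filtering at load time keeps a stale saved value from reaching code paths that no longer handle it, without requiring users to edit settings.json by hand.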

9a3dc10d93
feat: redesign chat transcript + fix streaming/persistence lifecycle — v0.50.70 (PR #587 by @aronprins)

Squash-merges PR #587 by @aronprins (Aron Prins). Full credit to @aronprins for all feature and fix work.

Transcript redesign: unified --msg-rail/--msg-max CSS variables, user turns as tinted cards, thinking cards as bordered panels, error card treatment, day-change separators, composer fade.

Approval/clarify as composer flyouts: cards slide up from behind the composer top; overflow:hidden + a translateY clip keep the slide-in travel from being visible; focus({preventScroll:true}).

Streaming lifecycle: DOM order user→thinking→tool cards→response, with no mid-stream jump. Live tool cards are inserted before [data-live-assistant].

Persistence: reasoning attached before s.save(), _restore_reasoning_metadata on reload, role=tool rows preserved in S.messages, CLI-session tool-result fallback.

Workspace panel FOUC fix: [data-workspace-panel] set at parse time.

Docs: docs/ui-ux/index.html + two-stage-proposal.html.

Maintainer additions (433b867): CHANGELOG v0.50.70, version badge, usage badge loop simplification.

Reviewed and approved by @nesquena (independent review). 1361 tests passing.

57a4f573f6
docs: HERMES.md deep-dive, Why Hermes in README, screenshot layout

* docs: add HERMES.md deep-dive, Why Hermes section in README, and screenshot layout

- HERMES.md: full why-Hermes document -- assistant vs. agent mental model, three pillars (memory/scheduling/reach), four-category taxonomy of AI tools, per-tool comparison sections with tables (Claude Code, Codex CLI, OpenCode, Cursor/Copilot, Claude.ai), compounding advantage, who it's for, what it's not, quick reference
- README: hero screenshot stays full-width; two new UI screenshots in a side-by-side HTML table with captions below
- README: new Why Hermes section with a 6-bullet summary, comparison table, and link to HERMES.md
- README: HERMES.md added to Docs section
- docs/images/: two UI screenshots (workspace browser, sessions view)

* docs: fact-check and update all comparisons; add Open Interpreter section

Researched the current state of each tool before updating:

Claude Code:
- Scheduled jobs: now Partial (has /loop session-scoped, cloud-managed /schedule via claude.ai/code, and desktop app automations); updated table to reflect this, with footnotes distinguishing self-hosted cron
- Persistent memory: Partial (CLAUDE.md, MEMORY.md, rolling auto-memory, but not full automatic cross-session recall)
- Provider-agnostic: No -- supports Bedrock/Vertex but Claude models only
- Web UI: Yes, but Anthropic-hosted (not self-hosted)

Codex CLI:
- Persistent memory: Partial (session history + AGENTS.md since v0.100.0)
- Scheduled jobs: Partial (desktop app Automations only; CLI has no native scheduling as of early 2026, open feature request)
- Provider-agnostic: Yes (10+ providers)

OpenCode:
- Web UI: now Yes (embedded in binary + official desktop app)
- Persistent memory: Partial (SQLite sessions + AGENTS.md, not semantic)
- Messaging: community Telegram bot only, not first-party

Open Interpreter: added as a new comparison section
- Most common 'why not just use this' question; addressed head-on
- Session-scoped, no persistent memory by their own docs, no scheduler, no messaging integration; powerful for one-shot tasks, not always-on

README Why Hermes table: updated to include an Open Interpreter column, fixed the Claude Code self-hosted row (No -- scheduling runs on Anthropic cloud), added footnotes for partial entries

* docs: add OpenClaw comparison; update category framework and quick reference table

OpenClaw (openclaw.ai, MIT, 347k stars) is the most direct Hermes competitor -- both are open-source, self-hosted, always-on agents with persistent memory, cron, and messaging integration. Added:
- Full OpenClaw section in HERMES.md with an honest comparison: where it wins (15+ messaging platforms incl. iMessage/WeChat, native Chrome CDP browser control, voice wake words, ClawHub marketplace) and where Hermes differs (self-improving skills system, Python/ML ecosystem, web UI, multi-profile, sub-agent orchestration)
- Category 4 framework updated: now lists both Hermes and OpenClaw, with the key architectural distinction called out
- Quick reference table expanded to include an OpenClaw column (now 8 tools)
- New rows added: self-improving skills, browser/computer control, Python/ML ecosystem
- README Why Hermes table updated: OpenClaw replaces the OpenCode column, a self-improving skills row replaces the generic skills row, and a callout line at the bottom addresses OpenClaw head-on

* docs: major accuracy pass -- OpenClaw deep-dive, Claude Code corrections, drop Open Interpreter

OpenClaw:
- Expanded the comparison from a table to a full prose section with a 'Where OpenClaw wins' / 'Where Hermes wins' structure
- Honest about OpenClaw strengths: 15+ messaging platforms, native Chrome CDP browser control, voice wake words, 13k+ ClawHub skills
- Hermes advantages called out clearly: self-improving skills as a first-class automatic loop (vs. a marketplace-install model), stability (documented OpenClaw update regressions, Telegram breakage in early 2026, WhatsApp protocol instability), security (156 CVEs and 1,184 malicious skills found in a ClawHub audit vs. Hermes having no marketplace attack surface), Python/ML ecosystem, full web UI vs. dashboard-only, and first-class multi-profile support
- Category 4 framework updated to name both Hermes and OpenClaw
- Table updated: added stability/security rows, corrected the web UI row (OpenClaw has a gateway dashboard but not a full chat UI)

Claude Code corrections (researched against the official docs at code.claude.com):
- Skills/Hooks: changed from No to Yes -- has a full Hooks system (13 event types, 4 handler types) and a Plugin/Skills marketplace since v2.0.12; unified with slash commands in v2.1.0
- Messaging: changed from No to Partial -- Channels feature (Telegram, Discord, iMessage, Webhooks) in research preview since v2.1.80; deep Slack integration that triggers cloud sessions and creates PRs
- Added Claude Cowork row: a separate product with 38+ connectors (Slack, Gmail, Teams, Notion, Jira, Salesforce, etc.)
- Scheduling footnote updated: cloud-managed has a 1-hour minimum interval
- Provider-agnostic clarified: routes through Bedrock/Vertex but always Claude models; cannot swap to GPT or Gemini

Open Interpreter removed:
- Less relevant comparison than OpenClaw for the 'always-on agent' frame
- Kept coverage focused on the tools people actually compare Hermes to

Quick reference table:
- Now 7 tools wide (added OpenClaw; kept Claude Code, Codex, OpenCode, Cursor, Claude.ai, Hermes)
- New rows: self-improving skills, browser/computer control, stability
- Updated: Claude Code messaging to Partial, OpenClaw web UI to 'Dashboard only', skills rows differentiated by type

* docs: apply full editorial pass from hermes-edit-list.md

Writing patterns fixed:
- Em dashes reduced by ~80%; replaced with commas, periods, parens
- All 'Not X, it's Y' negative parallelism rewritten as positive statements; 'What Hermes Is Not' section renamed 'Scope and Limits' and reframed positively throughout
- 'It compounds.' standalone flourish removed
- 'meaningfully' removed everywhere (was appearing 3+ times)
- 'leverages' -> 'uses' in README
- 'remembers everything' softened to 'retains context across sessions'
- Bolded Hermes column in Quick Reference table un-bolded (only genuine differentiator cells kept bold: self-improving skills, always-on, orchestrates other agents)
- 'The honest summary' framing removed from OpenClaw section
- 'Hermes is different.' cliche transition cut from README
- Rule-of-three slogans trimmed (e.g. 'Same agent, same memory...')
- 'tired of re-explaining' -> 'don't want to re-explaining'

Duplicate content removed:
- 'day one / day one hundred' comparison kept only in the Compounding Advantage section; removed from Pillar 1

Factual accuracy fixes:
- Claude.ai comparison updated: memory is now auto-generated from history (not just user-curated); code execution and file read/write noted as sandboxed (Artifacts), not a flat No
- Category 2: Windsurf framed as 'earliest' on memory, Copilot as 'catching up'; removed the overconfident 'most mature' claim
- Category 4 qualifier: 'as of early 2026' added
- '1-hour minimum' for Claude Code cloud scheduling softened to 'minimum interval applies' (specific claim unverified)
- Claude Code scheduling table note: 'cloud or desktop-app only' (was just 'cloud-managed or session-scoped')
- README claim 'No other open-source tool combines...' removed; it was false because OpenClaw does combine all three
- OpenClaw self-improving skills: 'No' -> 'Partial' with clarification
- README OpenClaw callout: 'relies on a marketplace' softened to 'skill system centers on a community marketplace'
- 'meaningfully more stable' -> 'more stable'; 'supply chain issues' -> 'security incidents involving malicious skills'
- OpenClaw star count: '347k+' -> '~347k' (moving fast)
- Stability row added to OpenClaw table; bold removed from table

---------

Co-authored-by: Hermes <hermes@localhost>