hermes-webui/docs/supervisor.md
Hermes Bot 6a26e82c22 fix(bootstrap): address Opus pre-merge review feedback (#1478)
Three changes from the pre-merge Opus review:

**MUST-FIX** — XPC_SERVICE_NAME false-positive on macOS Terminal

macOS launchd sets `XPC_SERVICE_NAME` in EVERY Terminal-spawned shell, not
just real services. Typical noise values: `"0"` (truthy in Python!) and
`"application.com.apple.Terminal.<UUID>"`. A bare `os.environ.get(name)`
existence check would auto-promote interactive `./start.sh` runs to
foreground mode on every Mac dev machine — silently breaking the most
common installation path (no /health probe, no browser open, no log file,
hanging shell).

Fix: new `_is_real_supervisor_value()` helper that filters noise. For
`XPC_SERVICE_NAME` specifically, reject `"0"` and any `"application.*"`
prefix. Real launchd plists use reverse-DNS Label form (`com.<rdns>.<svc>`)
which still triggers correctly.

7 new tests in `TestXPCServiceNameNoiseFilter`:
- 4 noise values (`0`, Terminal.app, iTerm2, VSCode) → no detection
- 3 real Label forms → correct detection
- Mixed env with XPC noise + real INVOCATION_ID → falls through to systemd

**SHOULD-FIX 1** — Test env leakage

The original `clean_env` fixture stripped supervisor-detection env vars
but not the resolved bootstrap vars (HERMES_WEBUI_HOST/PORT/AGENT_DIR)
that `main()` mutates onto `os.environ`. After
`test_foreground_exports_resolved_env_vars` ran, later tests would import
bootstrap with polluted defaults (DEFAULT_HOST="0.0.0.0" instead of
"127.0.0.1"). Existing assertions still passed (tautological vs DEFAULT_*),
but it was a footgun for future tests.

Fix: extend `clean_env` to also `delenv` the three resolved vars before
each test.

**SHOULD-FIX 2** — Pre-execv executability guard

If `discover_launcher_python` returns a path that doesn't exist or isn't
executable, `os.execv` raises OSError → wrapper catches → SystemExit(1)
→ supervisor restarts → loop forever. That's exactly the failure mode
this PR is supposed to eliminate.

Fix: `os.access(python_exe, os.X_OK)` check before execv. Converts
infinite supervisor loop into a single visible RuntimeError.

1 new test in `TestForegroundExecutabilityGuard` pinning that the guard
fires before execv when the python path is non-executable.

**Docs** — supervisor.md updates

- New section explaining the XPC_SERVICE_NAME noise filter and what
  values trigger / don't trigger detection
- New section listing supervisors that are NOT auto-detected (runit,
  daemontools, PM2, Foreman/Honcho, custom shell-script supervisors)
  with explicit recommendation to set HERMES_WEBUI_FOREGROUND=1

Verification

- 3820 tests pass (+9 from this commit's new tests vs the original PR
  push of 3811)
- Filter manually verified end-to-end with the live os.environ:
  XPC=0 → None, XPC=application.* → None, XPC=com.example.foo → triggers
- run-browser-tests.sh ALL CHECKS PASSED on the worktree

Items deferred from the Opus review

- #4 chdir target may not exist: REPO_ROOT comes from __file__.resolve()
  so it's stable; not a real concern in practice
- #6 two startup messages in foreground mode: cosmetic, useful for
  diagnostics
- #7 stricter explicit-only mode: leaves user the override of just not
  passing --foreground (current behavior)
- #8 test stub return value: trivial, can fix later if a regression surfaces
- #9 argparse positional-after-option ordering: test reads fine

These can be follow-up issues if anyone hits them.
2026-05-02 17:52:13 +00:00

Running Hermes Web UI under a process supervisor

Use a process supervisor (launchd, systemd, supervisord, runit, s6) when you want the Web UI to start at boot, restart on crash, or be managed alongside other services.

TL;DR

Pass --foreground to bootstrap.py (or bash start.sh):

bash start.sh --foreground

Or set HERMES_WEBUI_FOREGROUND=1 in the environment. The Web UI will auto-detect launchd / systemd / supervisord even without the flag, but being explicit is safer.

Why --foreground matters

Without it, bootstrap.py does this:

  1. Spawn server.py as a detached subprocess (start_new_session=True)
  2. Probe /health until the server is up
  3. Exit 0

That works for an interactive shell run: ./start.sh returns to your prompt with the server alive in the background. It breaks under any process supervisor. The supervisor sees its tracked PID exit, marks the job as completed, and respawns bootstrap.py. The respawn fails to bind port 8787 because the orphaned server still holds it, exits non-zero, and the supervisor respawns it again, in a loop.
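The three default-mode steps can be sketched roughly as follows. This is a minimal illustration, not the actual bootstrap.py: the function names, retry counts, and the shape of the health check are assumptions made for the example.

```python
import subprocess
import time
import urllib.request
from typing import Callable, Sequence


def probe_health(url: str, timeout: float = 1.0) -> bool:
    """Return True when a GET on `url` answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # Connection refused, timeout, or an HTTP error status: not healthy yet.
        return False


def wait_until_healthy(probe: Callable[[], bool],
                       attempts: int = 30, delay: float = 0.5) -> bool:
    """Call `probe` up to `attempts` times, sleeping `delay` seconds between tries."""
    for attempt in range(attempts):
        if probe():
            return True
        if attempt < attempts - 1:
            time.sleep(delay)
    return False


def start_detached(server_argv: Sequence[str], health_url: str) -> int:
    """Hypothetical default (non-foreground) flow; returns the launcher's exit code."""
    # Step 1: start_new_session=True puts the child in its own session,
    # so it keeps running after this launcher process exits.
    subprocess.Popen(list(server_argv), start_new_session=True,
                     stdin=subprocess.DEVNULL)
    # Step 2: poll the health endpoint until the server answers.
    healthy = wait_until_healthy(lambda: probe_health(health_url))
    # Step 3: exit 0 once healthy. A supervisor tracking this launcher now
    # sees its child exit while the real server lives on untracked.
    return 0 if healthy else 1
```

The comment on step 3 is the whole problem in one line: the PID the supervisor was watching is gone, while the process it actually cares about is no longer its child.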

In foreground mode, bootstrap.py does its setup work and then calls os.execv to replace its own process with server.py. The supervisor sees the long-lived server as the original child. KeepAlive=true / Restart=always work correctly.
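The handoff can be sketched as below, including a check that the interpreter path is executable before the exec, so that a bad path fails once with a visible error instead of feeding a restart loop. The function names and error text are hypothetical; only the use of os.access(..., os.X_OK) ahead of os.execv is taken from the description of the change.

```python
import os
from typing import Sequence


def check_executable(python_exe: str) -> None:
    """Raise RuntimeError unless `python_exe` is an existing, executable file."""
    if not (os.path.isfile(python_exe) and os.access(python_exe, os.X_OK)):
        raise RuntimeError(
            f"interpreter is missing or not executable: {python_exe}"
        )


def exec_server(python_exe: str, server_script: str,
                extra_args: Sequence[str] = ()) -> None:
    """Replace the current process with the server; returns only by raising."""
    check_executable(python_exe)
    # execv keeps the PID, so the supervisor's tracked child becomes the server.
    os.execv(python_exe, [python_exe, server_script, *extra_args])
```

Because os.execv never returns on success, any code placed after the call runs only if the exec itself failed.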

launchd (macOS)

~/Library/LaunchAgents/com.example.hermes-webui.plist:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
    <key>Label</key>
    <string>com.example.hermes-webui</string>

    <key>ProgramArguments</key>
    <array>
        <string>/bin/bash</string>
        <string>/Users/yourname/hermes-webui/start.sh</string>
        <string>--foreground</string>
    </array>

    <key>WorkingDirectory</key>
    <string>/Users/yourname/hermes-webui</string>

    <key>RunAtLoad</key>
    <true/>

    <key>KeepAlive</key>
    <true/>

    <key>StandardOutPath</key>
    <string>/Users/yourname/.hermes/webui/launchd-stdout.log</string>

    <key>StandardErrorPath</key>
    <string>/Users/yourname/.hermes/webui/launchd-stderr.log</string>

    <key>EnvironmentVariables</key>
    <dict>
        <key>HOME</key>
        <string>/Users/yourname</string>
        <key>PATH</key>
        <string>/usr/local/bin:/usr/bin:/bin</string>
    </dict>
</dict>
</plist>

Load:

launchctl load ~/Library/LaunchAgents/com.example.hermes-webui.plist
launchctl print gui/$(id -u)/com.example.hermes-webui   # check state

Reload after editing the plist:

launchctl unload ~/Library/LaunchAgents/com.example.hermes-webui.plist
launchctl load   ~/Library/LaunchAgents/com.example.hermes-webui.plist

launchd sets XPC_SERVICE_NAME to the job's Label (com.example.hermes-webui in this plist), so the Web UI auto-promotes to foreground mode even without the --foreground argument. The flag is still recommended as documentation of intent.

systemd (Linux)

~/.config/systemd/user/hermes-webui.service:

[Unit]
Description=Hermes Web UI
After=network.target

[Service]
Type=simple
WorkingDirectory=%h/hermes-webui
ExecStart=/bin/bash %h/hermes-webui/start.sh --foreground
Restart=on-failure
RestartSec=5

# Optional: route stdout/stderr to journald instead of files
StandardOutput=journal
StandardError=journal

[Install]
WantedBy=default.target

Enable + start:

systemctl --user daemon-reload
systemctl --user enable --now hermes-webui.service
journalctl --user -u hermes-webui.service -f

systemd sets INVOCATION_ID and JOURNAL_STREAM (when stdio is wired to the journal), both of which auto-promote to foreground mode.

supervisord (cross-platform)

/etc/supervisor/conf.d/hermes-webui.conf:

[program:hermes-webui]
command=/bin/bash /home/youruser/hermes-webui/start.sh --foreground
directory=/home/youruser/hermes-webui
user=youruser
autostart=true
autorestart=true
stopsignal=TERM
stopwaitsecs=10
stdout_logfile=/var/log/hermes-webui.out.log
stderr_logfile=/var/log/hermes-webui.err.log
environment=HOME="/home/youruser",PATH="/usr/local/bin:/usr/bin:/bin"

Reload + start:

sudo supervisorctl reread
sudo supervisorctl update
sudo supervisorctl status hermes-webui

supervisord sets SUPERVISOR_ENABLED, which auto-promotes to foreground mode.

Auto-detected env vars (full list)

These trigger --foreground behavior even when the flag is not passed:

| Env var | Set by | Notes |
| --- | --- | --- |
| INVOCATION_ID | systemd | Set on every service activation |
| JOURNAL_STREAM | systemd | Set when stdio is wired to journald |
| NOTIFY_SOCKET | systemd Type=notify / s6 | sd_notify-style notification socket |
| XPC_SERVICE_NAME | launchd | Set to the plist Label; only Label-style values such as com.<rdns>.<svc> count (see below) |
| SUPERVISOR_ENABLED | supervisord | Always set under supervisord |
| HERMES_WEBUI_FOREGROUND | you | Explicit opt-in; accepts 1 / true / yes / on |
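A rough sketch of how such a lookup might work is shown here. It is an illustration under stated assumptions rather than the project's code: the function name, the check order (explicit opt-in first), and the case-insensitive, whitespace-trimmed parsing of HERMES_WEBUI_FOREGROUND are all guesses. XPC_SERVICE_NAME is deliberately left out because its value has to be filtered, not just tested for presence.

```python
from typing import Mapping, Optional

# Variables whose non-empty presence signals a supervisor, per the table.
PRESENCE_VARS = (
    "INVOCATION_ID",
    "JOURNAL_STREAM",
    "NOTIFY_SOCKET",
    "SUPERVISOR_ENABLED",
)

# Accepted opt-in values for HERMES_WEBUI_FOREGROUND, per the table.
TRUTHY = {"1", "true", "yes", "on"}


def foreground_trigger(env: Mapping[str, str]) -> Optional[str]:
    """Return the name of the variable that would trigger foreground mode, else None."""
    # Assumption: the explicit opt-in is checked first and compared
    # case-insensitively after trimming whitespace.
    if env.get("HERMES_WEBUI_FOREGROUND", "").strip().lower() in TRUTHY:
        return "HERMES_WEBUI_FOREGROUND"
    for name in PRESENCE_VARS:
        if env.get(name):
            return name
    return None
```

Calling foreground_trigger(os.environ) inside a service and logging the result is a quick way to see which variable, if any, is reaching the process.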

XPC_SERVICE_NAME noise filter

macOS launchd sets XPC_SERVICE_NAME in every Terminal-spawned shell, not just real services. Typical noise values:

  • 0 — set on launchd descendants generally
  • application.com.apple.Terminal.<UUID> — Terminal.app shells
  • application.com.googlecode.iterm2 — iTerm2
  • application.com.microsoft.VSCode — VSCode integrated terminal

A bare existence check on this var would auto-promote interactive ./start.sh runs to foreground mode on every Mac dev machine, breaking the most common installation path. Detection is therefore narrowed: the value 0 and anything starting with application. are ignored, while launchd Label-style names (typically reverse-DNS, like com.example.foo) still trigger foreground mode. Give your plist a Label in that form. If XPC_SERVICE_NAME=0 ever shows up in your service environment, auto-detect will ignore it, so set HERMES_WEBUI_FOREGROUND=1 or pass --foreground explicitly to be safe.
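The filter can be pictured with a sketch like the one below. The function name echoes the _is_real_supervisor_value() helper named in the change description, but the body is an assumption reconstructed from the rules stated there (reject 0 and any application. prefix for XPC_SERVICE_NAME); the real helper may differ.

```python
def is_real_supervisor_value(name: str, value: str) -> bool:
    """Return True if `value` looks like a genuine supervisor-set value for `name`."""
    if not value:
        return False
    if name == "XPC_SERVICE_NAME":
        # "0" and "application.<bundle id>..." appear in ordinary terminal
        # shells on macOS, so they say nothing about running as a service.
        if value == "0" or value.startswith("application."):
            return False
    return True
```

With this rule, com.example.hermes-webui passes while application.com.apple.Terminal.<UUID> does not. A Label that itself begins with application. would also be rejected, which is one more reason to keep the explicit flag.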

Supervisors that are NOT auto-detected

The following set no env var that we can reliably detect. Pass --foreground (or HERMES_WEBUI_FOREGROUND=1) explicitly:

  • runit (without sd_notify) — pure runit chains
  • daemontools / svc
  • PM2 (Node.js process manager occasionally repurposed for Python)
  • Foreman / Honcho (Procfile-style)
  • Docker with a custom CMD entrypoint that doesn't already use exec
  • Custom shell-script supervisors that fork-and-wait

If your supervisor isn't in the auto-detect list and you see the orphan-PID respawn loop, set HERMES_WEBUI_FOREGROUND=1 in the service environment.

Diagnostic recipe

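If the Web UI keeps getting respawned and you suspect the orphan-server respawn loop:

# Find the PID listening on the server port
lsof -iTCP:8787 -sTCP:LISTEN

# Show the server and its parent; the parent should be the supervisor.
# "command" is used instead of "cmd" because BSD ps on macOS does not accept "cmd".
PID=$(lsof -tiTCP:8787 -sTCP:LISTEN)
ps -p "$PID" -o pid,ppid,command
ps -p "$(ps -o ppid= -p "$PID" | tr -d ' ')" -o pid,ppid,command

A healthy foreground-mode setup under a supervisor that runs as its own process (supervisord, a systemd --user instance) looks like this, with the repeated header line omitted:

  PID  PPID COMMAND
12345  6789 /path/to/python /path/to/server.py
 6789     1 /usr/lib/systemd/systemd --user     # or supervisord, etc.

If the server's PPID is 1 when the supervisor has a different PID, the orphan-server loop is happening: re-check that --foreground (or one of the env vars) is reaching the process. When the supervisor itself is PID 1 (launchd on macOS, or a system-level systemd unit), a PPID of 1 is expected and proves nothing; compare the listening PID with the PID the supervisor reports for the job instead (launchctl print, systemctl status).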