Running Hermes Web UI under a process supervisor
Use a process supervisor (launchd, systemd, supervisord, runit, s6) when you want the Web UI to start at boot, restart on crash, or be managed alongside other services.
TL;DR
Pass --foreground to bootstrap.py (or bash start.sh):
bash start.sh --foreground
Or set HERMES_WEBUI_FOREGROUND=1 in the environment. The Web UI will
auto-detect launchd / systemd / supervisord even without the flag, but being
explicit is safer.
Why --foreground matters
Without it, bootstrap.py does this:
- Spawn server.py as a detached subprocess (start_new_session=True)
- Probe /health until the server is up
- Exit 0
That works for an interactive shell run (./start.sh returns to your
prompt with the server alive in the background). It breaks under any
process supervisor: the supervisor sees its tracked PID exit, marks the job
as completed, and respawns bootstrap.py. The respawn fails to bind port
8787 because the orphaned server still holds it, so it exits non-zero and
the supervisor respawns it again, indefinitely.
In foreground mode, bootstrap.py does its setup work and then calls
os.execv to replace its own process with server.py. The supervisor
sees the long-lived server as the original child. KeepAlive=true /
Restart=always work correctly.
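The difference between the two launch paths can be sketched as follows. This is a minimal illustration, not the actual bootstrap.py: the helper name `launch` and its arguments are hypothetical, and the setup work and /health probe are left out.

```python
import os
import subprocess
import sys


def launch(argv, foreground):
    """Start the server detached, or in place of the current process.

    Hypothetical helper for illustration; not the real bootstrap.py.
    """
    if foreground:
        # Replace this process image with the server. The PID stays the
        # same, so a supervisor keeps tracking the long-lived process.
        # os.execv does not return on success; it raises OSError on failure.
        os.execv(argv[0], argv)
    # Default path: start the server in its own session and hand back its
    # PID. The caller can then exit while the server keeps running.
    proc = subprocess.Popen(argv, start_new_session=True)
    return proc.pid
```

With `foreground=False` the caller gets a PID back and is free to exit, which is exactly what confuses a supervisor. With `foreground=True` the call never returns, because the calling process has become the server.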
launchd (macOS)
~/Library/LaunchAgents/com.example.hermes-webui.plist:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
  <key>Label</key>
  <string>com.example.hermes-webui</string>
  <key>ProgramArguments</key>
  <array>
    <string>/bin/bash</string>
    <string>/Users/yourname/hermes-webui/start.sh</string>
    <string>--foreground</string>
  </array>
  <key>WorkingDirectory</key>
  <string>/Users/yourname/hermes-webui</string>
  <key>RunAtLoad</key>
  <true/>
  <key>KeepAlive</key>
  <true/>
  <key>StandardOutPath</key>
  <string>/Users/yourname/.hermes/webui/launchd-stdout.log</string>
  <key>StandardErrorPath</key>
  <string>/Users/yourname/.hermes/webui/launchd-stderr.log</string>
  <key>EnvironmentVariables</key>
  <dict>
    <key>HOME</key>
    <string>/Users/yourname</string>
    <key>PATH</key>
    <string>/usr/local/bin:/usr/bin:/bin</string>
  </dict>
</dict>
</plist>
Load:
launchctl load ~/Library/LaunchAgents/com.example.hermes-webui.plist
launchctl print gui/$(id -u)/com.example.hermes-webui # check state
Reload after editing the plist:
launchctl unload ~/Library/LaunchAgents/com.example.hermes-webui.plist
launchctl load ~/Library/LaunchAgents/com.example.hermes-webui.plist
launchd sets XPC_SERVICE_NAME automatically, so even without the
--foreground argument the Web UI will auto-promote to foreground mode.
The flag is still recommended as documentation of intent.
systemd (Linux)
~/.config/systemd/user/hermes-webui.service:
[Unit]
Description=Hermes Web UI
After=network.target
[Service]
Type=simple
WorkingDirectory=%h/hermes-webui
ExecStart=/bin/bash %h/hermes-webui/start.sh --foreground
Restart=on-failure
RestartSec=5
# Optional: route stdout/stderr to journald instead of files
StandardOutput=journal
StandardError=journal
[Install]
WantedBy=default.target
Enable + start:
systemctl --user daemon-reload
systemctl --user enable --now hermes-webui.service
journalctl --user -u hermes-webui.service -f
systemd sets INVOCATION_ID and JOURNAL_STREAM (when stdio is wired to
the journal), both of which auto-promote to foreground mode.
supervisord (cross-platform)
/etc/supervisor/conf.d/hermes-webui.conf:
[program:hermes-webui]
command=/bin/bash /home/youruser/hermes-webui/start.sh --foreground
directory=/home/youruser/hermes-webui
user=youruser
autostart=true
autorestart=true
stopsignal=TERM
stopwaitsecs=10
stdout_logfile=/var/log/hermes-webui.out.log
stderr_logfile=/var/log/hermes-webui.err.log
environment=HOME="/home/youruser",PATH="/usr/local/bin:/usr/bin:/bin"
Reload + start:
sudo supervisorctl reread
sudo supervisorctl update
sudo supervisorctl status hermes-webui
supervisord sets SUPERVISOR_ENABLED, which auto-promotes to foreground
mode.
Auto-detected env vars (full list)
These trigger --foreground behavior even when the flag is not passed:
| Env var | Set by | Notes |
|---|---|---|
| INVOCATION_ID | systemd | Set on every service activation |
| JOURNAL_STREAM | systemd | Set when stdio is wired to journald |
| NOTIFY_SOCKET | systemd Type=notify / s6 | sd_notify-style notification socket |
| XPC_SERVICE_NAME | launchd | Set to the plist Label; narrowed to com.<rdns>.<svc> form (see below) |
| SUPERVISOR_ENABLED | supervisord | Always set under supervisord |
| HERMES_WEBUI_FOREGROUND | you | Explicit opt-in; accepts 1 / true / yes / on |
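As a rough sketch of how such a check could be written (not the project's actual code), the function below treats HERMES_WEBUI_FOREGROUND as an explicit opt-in and the other variables as presence checks. The names `foreground_requested`, `PRESENCE_VARS`, and `TRUTHY` are hypothetical, and case-insensitive matching of the opt-in value is an assumption. XPC_SERVICE_NAME is left out on purpose because it needs the extra filtering described in the next section.

```python
import os

# Variables whose presence alone suggests a supervisor is in charge.
# XPC_SERVICE_NAME is excluded here; it needs a noise filter first.
PRESENCE_VARS = (
    "INVOCATION_ID",
    "JOURNAL_STREAM",
    "NOTIFY_SOCKET",
    "SUPERVISOR_ENABLED",
)

# Accepted opt-in values; comparing case-insensitively is an assumption.
TRUTHY = {"1", "true", "yes", "on"}


def foreground_requested(environ=None):
    """Return True if the environment asks for foreground mode."""
    env = os.environ if environ is None else environ
    explicit = env.get("HERMES_WEBUI_FOREGROUND", "").strip().lower()
    if explicit in TRUTHY:
        return True
    # Any non-empty supervisor variable counts as a request.
    return any(env.get(name) for name in PRESENCE_VARS)
```

Passing a plain dict instead of os.environ keeps the check easy to exercise in isolation, for example `foreground_requested({"INVOCATION_ID": "abc"})`.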
XPC_SERVICE_NAME noise filter
macOS launchd sets XPC_SERVICE_NAME in every Terminal-spawned shell,
not just real services. Typical noise values:
- 0: set on launchd descendants generally
- application.com.apple.Terminal.<UUID>: Terminal.app shells
- application.com.googlecode.iterm2: iTerm2
- application.com.microsoft.VSCode: VSCode integrated terminal
A bare existence check on this var would auto-promote interactive
./start.sh runs to foreground mode on every Mac dev machine, breaking
the most common installation path. We narrow detection to launchd
Label-style names (typically reverse-DNS like com.example.foo).
Real launchd plists conventionally use this form. If your service
environment has XPC_SERVICE_NAME=0, or a Label that is not in this form,
the auto-detect will ignore it; set HERMES_WEBUI_FOREGROUND=1 or pass
--foreground explicitly to be safe.
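One way to express such a filter is sketched below. The exact rule the Web UI applies is not spelled out here, so this is an assumption: reject empty values, 0, and anything beginning with application., then require at least three non-empty dot-separated components. The function name `looks_like_launchd_label` is hypothetical.

```python
def looks_like_launchd_label(value):
    """Return True if an XPC_SERVICE_NAME value resembles a plist Label.

    Assumed heuristic for illustration, not the project's actual rule.
    """
    if not value or value == "0":
        return False
    # Terminal emulators and GUI apps get names starting with "application.".
    if value.startswith("application."):
        return False
    parts = value.split(".")
    # Reverse-DNS style: at least three non-empty components.
    return len(parts) >= 3 and all(parts)
```

Under this heuristic com.example.hermes-webui is accepted, while 0 and application.com.apple.Terminal.<UUID> are rejected.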
Supervisors that are NOT auto-detected
The following set no env var that we can reliably detect. Pass
--foreground (or HERMES_WEBUI_FOREGROUND=1) explicitly:
- runit (without sd_notify): pure runit chains
- daemontools / svc
- PM2 (Node.js process manager occasionally repurposed for Python)
- Foreman / Honcho (Procfile-style)
- Docker with a custom CMD entrypoint that doesn't already use exec
- Custom shell-script supervisors that fork-and-wait
If your supervisor isn't in the auto-detect list and you see the orphan-PID
respawn loop, set HERMES_WEBUI_FOREGROUND=1 in the service environment.
Diagnostic recipe
If the Web UI keeps getting respawned and you suspect the orphan-server respawn loop:
# Check the running PID for the server
lsof -iTCP:8787 -sTCP:LISTEN
# Get its parent — should be the supervisor itself, NOT init (PID 1)
PID=$(lsof -tiTCP:8787 -sTCP:LISTEN)
# "command" is accepted by both macOS and Linux ps; "cmd" is Linux-only
ps -p "$PID" -o pid,ppid,command
ps -p "$(ps -o ppid= -p "$PID" | tr -d ' ')" -o pid,ppid,command
A healthy foreground-mode setup looks like:
PID PPID COMMAND
12345 6789 /path/to/python /path/to/server.py
6789 1 /sbin/launchd # or /usr/lib/systemd/systemd, etc.
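If the server's PPID is 1 (init) when it should be the supervisor, the
orphan-server loop is happening: re-check that --foreground (or one of the
env vars) is reaching the process. Where the supervisor itself runs as PID 1
(launchd, or systemd managing system-level units), a PPID of 1 is expected,
so confirm with launchctl print or systemctl status that the supervisor is
tracking the server's PID.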
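If you want to script this check, the sketch below parses ps output with pid, ppid, and command columns and reports whether the server's parent is the supervisor you expect. Both function names are hypothetical, and the sample PIDs are the illustrative ones used above.

```python
def parse_ps_rows(text):
    """Parse `ps -o pid,ppid,command` output into (pid, ppid, command) tuples."""
    rows = []
    for line in text.strip().splitlines()[1:]:  # skip the header row
        pid, ppid, command = line.split(None, 2)
        rows.append((int(pid), int(ppid), command))
    return rows


def tracked_by(rows, server_pid, supervisor_pid):
    """Return True if server_pid is a direct child of supervisor_pid."""
    for pid, ppid, _command in rows:
        if pid == server_pid:
            return ppid == supervisor_pid
    # Server PID not found in the listing at all.
    return False


sample = """\
  PID  PPID COMMAND
12345  6789 /path/to/python /path/to/server.py
"""
rows = parse_ps_rows(sample)
print(tracked_by(rows, server_pid=12345, supervisor_pid=6789))
```

A False result for the supervisor PID you expect means the server is no longer a direct child of the supervisor.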
HTTP watchdog / deep health
KeepAlive / Restart=always only recover a process that exits. If the
process is still listening on the port but request handling is wedged, pair your
supervisor with an HTTP probe and force a restart when the probe fails.
Hermes Web UI exposes two health levels:
- /health: cheap liveness probe with active_streams, uptime, and an accept_loop heartbeat counter.
- /health?deep=1: readiness probe that briefly acquires the stream lock, reads the sidebar/session path, reads projects state, and touches Hermes state.db if it exists. Use this for watchdogs.
At startup the server also tries to raise its file-descriptor soft limit to
4096 on platforms that support RLIMIT_NOFILE. That is defense in depth for
persistent hosts: leaks should still be fixed, but a higher soft limit gives
you more diagnostic headroom before request handling falls over.
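Raising the soft limit can be done with Python's standard resource module. The sketch below shows the general technique under the assumption of a best-effort policy; it is not the server's actual startup code, and the function name `raise_nofile_soft_limit` is hypothetical.

```python
import resource


def raise_nofile_soft_limit(target=4096):
    """Raise the RLIMIT_NOFILE soft limit toward `target`.

    Returns the soft limit in effect afterwards. Never lowers the limit.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if hard != resource.RLIM_INFINITY:
        # The soft limit can never exceed the hard limit.
        target = min(target, hard)
    if soft != resource.RLIM_INFINITY and soft < target:
        try:
            resource.setrlimit(resource.RLIMIT_NOFILE, (target, hard))
        except (ValueError, OSError):
            # Best effort: keep the existing limit if the platform refuses.
            pass
    return resource.getrlimit(resource.RLIMIT_NOFILE)[0]
```

The resource module is only available on Unix-like platforms, which matches the "platforms that support RLIMIT_NOFILE" caveat above.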
Minimal macOS launchd watchdog script:
#!/usr/bin/env bash
set -euo pipefail
LABEL="com.example.hermes-webui"
BASE="http://127.0.0.1:8787"
if ! curl -fsS --max-time 10 "$BASE/health?deep=1" >/dev/null; then
launchctl kickstart -k "gui/$(id -u)/$LABEL"
fi
Run it every few minutes from a separate StartInterval LaunchAgent. For
systemd, prefer a timer/service pair that runs the same curl probe and
systemctl --user restart hermes-webui.service on failure.
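The same probe-then-restart logic can be written in Python if you would rather not depend on curl. This is a sketch using only the standard library; `probe_ok` and `watchdog` are hypothetical names, and the restart command is whatever your supervisor expects.

```python
import http.client
import subprocess
import urllib.request


def probe_ok(url, timeout=10.0):
    """Return True if `url` answers with a 2xx status within `timeout` seconds."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (OSError, http.client.HTTPException):
        # Covers refused connections, timeouts, HTTP error statuses
        # (urllib raises HTTPError for 4xx/5xx) and malformed responses.
        return False


def watchdog(url, restart_cmd, timeout=10.0):
    """Run `restart_cmd` if the probe fails; return True if it was run."""
    if probe_ok(url, timeout):
        return False
    # check=False: a failing restart command should not crash the watchdog.
    subprocess.run(restart_cmd, check=False)
    return True
```

Under systemd, for example, this could be called as `watchdog("http://127.0.0.1:8787/health?deep=1", ["systemctl", "--user", "restart", "hermes-webui.service"])` from the service half of the timer/service pair.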
The accept_loop.requests_total value should increase when probes arrive. If
it stays flat while the process is still alive, the server accept loop is not
making progress; capture logs/thread samples before restarting if you are
collecting diagnostics for a bug report.
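A watchdog can compare the counter across two consecutive probes. The sketch below assumes the /health response is JSON with a nested accept_loop object containing requests_total; that payload shape is inferred from the dotted name above and is not confirmed, and the function name `accept_loop_stalled` is hypothetical.

```python
def accept_loop_stalled(previous, current):
    """Compare two decoded /health payloads taken one probe apart.

    Assumes the shape {"accept_loop": {"requests_total": <int>}}.
    Returns True if the counter did not advance between the samples.
    """
    before = previous["accept_loop"]["requests_total"]
    after = current["accept_loop"]["requests_total"]
    # The second probe is itself a request, so a healthy server should
    # report a strictly larger total.
    return after <= before
```

A True result while the process is still alive is the signal to capture logs and thread samples before restarting.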