Files
hermes-webui/ROADMAP.md
T
nesquena-hermes 1d7344c602 release: v0.51.31 — Release H (12-PR contributor batch)
CHANGELOG, ROADMAP, TESTING refresh for v0.51.31 stage release covering
12 contributor PRs:

Added (2 PRs):
- #1956 JKJameson — persistent composer draft (server-side, cross-client)
- #1957 hermes-gimmethebeans — configurable session TTL via env + settings

Fixed (10 PRs):
- #1939 ai-ag2026 — theme-color + sw cache regression coverage
- #1941 ai-ag2026 — preserve chat scroll across final render
- #1945 franksong2702 — localize session jump controls (#1938)
- #1947 happy5318 — show same model from different custom providers
  (Co-authored-by hacker1e7 for #1874 close)
- #1949 Sanjays2402 — close #1937 endless-scroll vs Start-jump race
  with generation-token + mutex
  (Co-authored-by franksong2702 + Michaelyklam)
- #1950 franksong2702 — mute stale stopped gateway heartbeat (#1944)
- #1951 amlyczz — gate goal hook on goal-related turns (#1932)
  (Co-authored-by franksong2702 for #1946 close)
- #1953 lucky-yonug — skip provider peel for custom host:port slugs
- #1960 Michaelyklam — translate hidden-files workspace label (#1841)
- #1961 sbe27 — respect image_input_mode (#1959)

Closed in favor of canonical: #1942, #1962, #1946, #1874, #1311.

Stage-326 hotfixes (per Opus advisor):
- CRITICAL #1951 PENDING_GOAL_CONTINUATION race fix (removed finally
  discard that race-erased the marker before consumer could read it)
- #1956 composer-draft input validation (50 KB text / 50 file clamp +
  type coercion to prevent unbounded session-JSON bloat)
- #1957 SESSION_TTL constant preserved as named fallback (existing
  regression tests pin it; #1957 originally deleted it)

Tests: 5006 → 5028 (+51 net new) — 0 regressions, 142.61s runtime.
2026-05-09 18:46:25 +00:00

350 lines
20 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
# Hermes Web UI — Roadmap
> Web companion to the Hermes Agent CLI. Same workflows, browser-native.
>
> Last updated: v0.51.31 (May 9, 2026) — 5028 tests collected — Release H 12-PR contributor batch (image-mode fix + race fixes + composer drafts + locale parity + custom-provider dedup + TTL config + heartbeat polish)
> Test source: `pytest tests/ --collect-only -q`
> Per-version detail: see [CHANGELOG.md](./CHANGELOG.md)
---
## Status snapshot
| Surface | Status |
|---|---|
| **Hermes CLI parity** | ✅ Complete — every CLI workflow has a web equivalent |
| **Streaming + tool transparency** | ✅ Live tool cards, reasoning cards, approval prompts, cancel |
| **Multi-provider model support** | ✅ Any provider configured in `config.yaml` shows in the picker |
| **Sessions + projects + search** | ✅ CRUD, content search, projects, tags, archive, fork, import |
| **Mobile + Docker + auth** | ✅ Hamburger nav, slide-overs, password auth, GHCR images |
| **Auxiliary surfaces** | ✅ Workspace tree + edit, cron CRUD, skills CRUD, memory write, MCP server UI |
| **Visual polish** | ✅ 8 themes (incl. light/system/OLED/Sienna), Mermaid, KaTeX, syntax highlighting |
| **Native distribution** | ✅ macOS desktop app (universal arm64+x86_64 DMG, signed) — separate repo |
Remaining gaps and forward work live in [Forward Work](#forward-work) below.
---
## Architecture
| Layer | Files | Status |
|---|---|---|
| Python server | `server.py` (~165 lines) + `api/` modules (~20k lines) | Thin shell + auth middleware + business logic |
| HTML template | `static/index.html` (~600 lines) | Served from disk |
| CSS | `static/style.css` (~3k lines) | Themes, mobile responsive, KaTeX, table styles |
| JavaScript | `static/{ui,sessions,messages,workspace,panels,boot,commands,icons,i18n,login,onboarding}.js` (~26k lines) | 11 modules served as static files |
| Service worker | `static/sw.js` | Offline shell cache, version-pinned assets |
| Docker | `Dockerfile`, `docker-compose.yml` | `python:3.12-slim`, multi-arch (amd64+arm64), HEALTHCHECK |
| CI/CD | `.github/workflows/release.yml` | Auto-release + GHCR publish on tag push |
| Test isolation | `tests/_pytest_port.py` | Per-worktree port + state-dir derivation, no collisions |
---
## Feature parity checklist
### Chat and streaming
- [x] Send messages, get SSE-streaming responses
- [x] Composer-scoped model picker (per-conversation model selection)
- [x] Multi-provider API support — OpenAI, Anthropic, Google, OpenRouter, xAI, GLM, DeepSeek, Mistral, MiniMax, Kimi, OpenCode, Nous Portal, custom OpenAI-compatible endpoints
- [x] Live custom-endpoint model discovery (Ollama, LM Studio, vLLM via `/v1/models`)
- [x] Free-form OpenRouter model name (autocomplete + custom input)
- [x] Tool progress shown inline via live tool cards
- [x] Approval card for dangerous commands (Allow once / session / always, Deny)
- [x] Approval polling + SSE-pushed approval events
- [x] Clarify dialog — agent can ask blocking clarifying questions
- [x] Subagent delegation cards in tool view
- [x] INFLIGHT guard: switch sessions mid-request without losing response
- [x] Session restores from localStorage on page load
- [x] Reconnect banner if page reloaded mid-stream
- [x] SSE auto-reconnect with stream replay
- [x] Token / cost estimate per message and per session
- [x] Context usage indicator (compact ring badge in composer footer)
- [x] Auto-compaction handling + `/compact` command
- [x] rAF-throttled token rendering (smooth, no DOM thrash)
- [x] Cancel / stop button in composer footer
- [x] Reasoning effort selector (low / medium / high / xhigh) + `/reasoning`
- [x] Pure-text streaming with crash-recovery — partial messages restored from localStorage on reload
### Conversation controls
- [x] Copy message to clipboard (hover icon on each bubble)
- [x] Edit last user message and regenerate
- [x] Regenerate last response
- [x] Clear conversation (wipe messages, keep session)
- [x] Branch / fork conversation from any message point (#465)
- [x] Pure-text + tool-call streams both recover
### Sessions
- [x] Create session (+ button or Cmd/Ctrl+K)
- [x] Load session (click in sidebar)
- [x] Delete session (hover trash, toast undo, fallback)
- [x] Auto-title from first user message + adaptive title refresh (configurable cadence)
- [x] LLM-generated titles via auxiliary route (configurable model)
- [x] Rename session inline (double-click, Enter saves, Escape cancels)
- [x] Title search (live filter)
- [x] Content search (full-text across all sessions)
- [x] Date group headers (Today / Yesterday / Earlier) with collapsible groups
- [x] Pin / star sessions to top
- [x] Duplicate session
- [x] Import / Export session as JSON (full messages + metadata)
- [x] Download as Markdown transcript
- [x] Tags (`#tag` extraction + filter chips)
- [x] Archive sessions (hidden by default, "Show N archived" toggle)
- [x] Projects / folders (chip filter bar, "Unassigned" filter)
- [x] Per-session profile tracking
- [x] Per-session toolset override (`/toolsets`)
- [x] Batch select mode (multi-select, bulk delete / move / archive)
- [x] CLI session bridge — read CLI sessions from state.db, import as WebUI sessions
### Workspace and files
- [x] Add workspace with path validation (existing directory, follows symlinks)
- [x] Remove / rename workspace
- [x] Quick-switch from topbar dropdown
- [x] Sidebar live workspace display (name + path)
- [x] New sessions inherit last-used workspace
- [x] Browse workspace directory tree with type icons
- [x] Tree view with expand / collapse + lazy load (#22)
- [x] Breadcrumb navigation in subdirectories
- [x] Preview text / code (read-only)
- [x] Preview markdown (rendered + tables + Mermaid + KaTeX)
- [x] Preview images (PNG, JPG, GIF, SVG, WEBP, AVIF inline)
- [x] Preview PDF / SVG / audio / video / Excalidraw / CSV / JSON / YAML
- [x] Edit files inline (Edit button, Enter saves, Escape cancels)
- [x] Create / rename / delete files and folders (in current directory)
- [x] Drag-drop / click / clipboard paste upload
- [x] Archive upload (zip / tar) with extraction
- [x] Syntax highlighted code preview (Prism.js, language-aware)
- [x] File preview auto-close on directory navigation
- [x] Right panel resizable (drag inner edge)
- [x] Embedded workspace terminal (`/api/terminal/{start,input,output}`)
- [x] Git branch + dirty status badge in workspace header
### Cron jobs
- [x] List all cron jobs (Tasks sidebar tab)
- [x] View job details (prompt, schedule, last run, output)
- [x] Run / pause / resume / delete
- [x] Create job from UI (name, schedule, prompt, delivery target)
- [x] Edit job inline (full create-form parity, including skills)
- [x] Skill picker in create + edit forms
- [x] Cron run history viewer (expandable per job)
- [x] Cron completion alerts (toast + badge)
- [x] Run-status tracking with live watch mode
### Skills
- [x] List all skills grouped by category
- [x] Search / filter by name, description, category
- [x] View full SKILL.md content
- [x] View skill linked files
- [x] Create / edit / delete skill
- [x] `/skills` slash command
### Memory
- [x] View personal notes (MEMORY.md) rendered as markdown
- [x] View user profile (USER.md) rendered as markdown
- [x] Last-modified timestamp per section
- [x] Add / edit memory entries inline
### Profiles
- [x] Multi-profile support — create, switch, delete (#28)
- [x] Topbar profile picker with gateway-status dots
- [x] Profile management panel (full CRUD)
- [x] Seamless switching (no server restart, refreshes models / skills / memory / cron / workspace)
- [x] Profile-local workspace storage
- [x] First-run onboarding wizard with provider config (OpenRouter / Anthropic / OpenAI / Custom)
- [x] In-app OAuth for Codex and Claude
### Configuration
- [x] Settings panel (default model, default workspace, send key, theme, voice, font size)
- [x] Send key preference (Enter or Ctrl+Enter)
- [x] Password authentication (off by default)
- [x] Per-session toolset override
- [x] Personality config via `config.yaml`
- [x] Reasoning effort persistence
### Notifications
- [x] Cron job completion alerts
- [x] Background agent error banner
- [x] Approval pending badge
- [x] Provider / model mismatch toast warning
### Slash commands
- [x] Command registry + autocomplete dropdown
- [x] Built-ins: `/help`, `/clear`, `/model`, `/workspace`, `/new`, `/usage`, `/theme`, `/compact`, `/queue`, `/interrupt`, `/steer`, `/goal`, `/btw`, `/reasoning`, `/skills`, `/toolsets`
- [x] Transparent pass-through for unrecognized commands
### Security
- [x] Password auth with signed HMAC HTTP-only cookies (24h TTL)
- [x] Security headers (X-Content-Type-Options, X-Frame-Options, Referrer-Policy)
- [x] CSRF protection (scheme-aware, port-normalized for reverse proxies)
- [x] PBKDF2 password hashing
- [x] Rate limiting on auth endpoints
- [x] Session ID validation
- [x] SSRF guard on `/api/models/live`, `cfg_base_url`, `custom_providers[]`
- [x] ENV_LOCK around env mutations
- [x] XSS sanitization on all rendered HTML
- [x] HMAC-signed signing keys (random per install)
- [x] Skills path-traversal guard
- [x] Secure cookie flags (HttpOnly, SameSite, Secure when HTTPS)
- [x] Error message sanitization (no stack traces in responses)
- [x] POST body size limit (20MB)
- [x] Upload path-traversal guard
- [x] Credential redaction in API responses
- [x] Profile `.env` secret isolation on switch
- [x] Auto-install gate (opt-in via `HERMES_WEBUI_AUTO_INSTALL=1`)
### Visual / UX
- [x] 8 themes — Dark, Light, System (auto-sync), Slate, Solarized, Monokai, Nord, OLED, Sienna
- [x] 2-axis appearance model (theme + skin) for community theme contributions
- [x] Mermaid diagram rendering
- [x] KaTeX math rendering with fence-before-math fix
- [x] Syntax highlighting (Prism.js, language-aware, YAML newline preservation)
- [x] Markdown image syntax `![alt](url)` and inline MEDIA: tokens render as `<img>`
- [x] Plain URL auto-linking
- [x] Inline markdown in table cells (bold, italic, code, links)
- [x] Code block copy button
- [x] Tool card expand / collapse toggle
- [x] Collapsible thinking / reasoning cards (Claude extended thinking, o3 reasoning tokens)
- [x] Message timestamps (subtle, full date on hover)
- [x] Empty composer hides send button (icon-circle with pop-in animation)
- [x] Pluggable Lucide SVG icons (no emoji rendering inconsistencies)
- [x] Composer-centric controls (v0.50.0 UI overhaul)
- [x] Hermes Control Center modal (centralized actions)
- [x] Workspace panel state machine (defaults closed, opens for browsing / preview)
- [x] PWA manifest + service worker (offline shell)
- [x] Favicon (SVG + PNG + ICO)
- [x] Branded onboarding wizard
### Voice
- [x] Voice input via Web Speech API (push-to-talk dictation)
- [x] Hands-free voice mode (turn-based conversation, opt-in via Settings → Preferences)
- [x] TTS playback of responses (configurable voice, rate, pitch)
### Mobile
- [x] Hamburger sidebar (slide-in overlay)
- [x] Bottom navigation bar (5-tab iOS-style)
- [x] Files slide-over (right panel as slide-over)
- [x] 44px minimum touch targets
- [x] Container queries on composer
- [x] Android Chrome compatibility fixes
- [x] PWA installation (manifest + icons + Android support)
### Internationalization
- [x] 9 locales — English, Japanese, Russian, Spanish, German, Chinese (zh + zh-Hant), Portuguese, Korean, French
- [x] Key-parity test ensures every locale has every key
- [x] Right-to-left and CJK input (IME composition fixes)
### Gateway integration
- [x] Real-time gateway sessions in sidebar (Telegram, Discord, Slack, Weixin) via SSE + DB polling
- [x] Cross-channel handoff dock — composer-docked flyout summarizing the live external session
- [x] Transcript-summary card at 10+ rounds
- [x] Sidebar dedup keying on per-conversation identity (distinct chats from same platform stay separate)
- [x] Gateway session sync skips dup / delete options for external sessions
- [x] LLM Gateway routing metadata display — assistant turns and session metadata show the served model/provider, failover path, and model-switch warnings when response metadata includes `used_provider`, `used_model`, or `routing` (#732)
### MCP integration
- [x] MCP server management UI (System Settings → MCP Servers)
- [x] Add / edit / delete MCP server entries
### Distribution
- [x] Docker support (multi-arch amd64 + arm64, HEALTHCHECK, UID/GID auto-detect)
- [x] Two-container Docker compose (webui + agent)
- [x] GHCR auto-publish on tag push
- [x] Subpath mount support (reverse proxy at `/hermes/`)
- [x] PWA installable from any browser
- [x] Native macOS app — universal Intel + Apple Silicon, signed + notarized DMG, Sparkle 2 auto-update — see `hermes-webui/hermes-swift-mac` repo
---
## Forward work
### Confirmed candidates (open feature requests with sprint-candidate or active interest)
| Theme | Tracking | Why |
|---|---|---|
| Persistent-host stability | #1458 | Bootstrap fork pattern crashes under launchd / systemd — partial fix shipped (foreground mode); state.db FD leak and HTTP-unhealthy wedge remain |
| Free-tier OpenRouter variants visible | #1426 | `:free` tool-support filter currently hides them from the picker |
| macOS scroll override regression | #1360 | Auto-scroll sometimes overrides user scroll on the desktop app |
| GLM dual-use (main + auxiliary) | #1291 | Currently mutually exclusive; same provider can't serve both surfaces |
| Auto-assign session to filtered project | #1468 | When user is filtering by project X, new session should default to project X |
| Update banner "What's new?" link | #1512 | Surface release highlights from the update banner |
| Sunset legacy `LMSTUDIO_API_KEY` env var | #1502 | Tracking issue — alias stays for one minor cycle, then removed |
| Hermes Agent dashboard cross-link | #1459 | Detect a running Hermes Agent and surface link in nav |
| Gateway status card in Settings | #1457 | Current gateway-status dots only on profile picker |
| Insights — daily token chart + per-model breakdown | #1456 | Existing usage badge is per-message; need rollup view |
| Logs tab — view agent / errors / gateway logs | #1455 | Currently requires terminal access to log files |
| Model picker collision handling | #1425 | Same-name models from different providers aren't disambiguated in dropdown |
| "Reveal in Finder" right-click on workspace | #1424 | macOS desktop app convenience |
| Configurable session persistence timing | #1406 | Currently every checkpoint, want operator control |
| Silent credential self-heal on 401 | #1401 | Gateway auth.json drift should resolve without user re-auth |
| LLM Wiki status panel | #1257 | On / off toggle for Wiki integration |
| Lightweight in-app Canvas editing | #1255 | Text canvas for prompt drafting / shared notes |
| Provider / Model source-of-truth alignment | #1240 | Reconcile WebUI vs CLI vs Gateway provider resolution |
| Built-in SearXNG web search | #1037 | Lightweight search tool with on / off toggle |
| Subagent session relationship view | #1004 | Show subagent hierarchy in sidebar with expand / collapse |
### Backlog (deferred, listed for visibility)
- **Insights / monitoring suite** — agent heartbeat + alerts (#716), quota / rate-limit display (#706), data tabs (#722), monitor dashboard concepts (#766, #721)
- **Native MCP server expose** — Hermes WebUI as an MCP server for direct agent integration (#733)
- **Teams / agents management panel** — editable names, roles, assignments (#719)
- **Web UI profile model alignment with Hermes runtime** — design parity (#749)
- **DOM windowing / message virtualization** — for sessions with hundreds of messages (#734)
- **Searchable global tool list** (#697)
- **Add agent / replace model modals** (#698)
- **Code execution inline cells** — Jupyter-style cell rendering inside chat
- **Sharing / public conversation URLs** — requires hosted backend with access control (out of scope for self-host)
### Intentionally not planned
- Full SwiftUI rewrite of the frontend — the WKWebView shell already gets 95% of native benefit
- App Store distribution — sandboxing breaks the local server model
- Real-time multi-user collaboration — single-user assumption throughout
- Plugin marketplace — Hermes skills cover this surface
- Anthropic / Claude proprietary features — Projects AI memory, Claude artifacts sync (not reproducible)
---
## Sprint history
Per-version detail lives in [CHANGELOG.md](./CHANGELOG.md). The table below is a high-level chronology of major sprint themes; individual PR / fix detail moved to CHANGELOG to keep this file readable.
| Range | Theme | Highlights |
|---|---|---|
| Sprints 16 | Foundations + workspace | server / static split, JS module split, workspace CRUD, file editor, message queue + INFLIGHT, isolated test environment |
| Sprint 7 | Wave 2 core | Cron / skill / memory CRUD, session content search, health endpoint, git init |
| Sprint 8 | Daily-driver finish line | Edit + regenerate, regenerate last response, clear conversation, Prism.js, queue + INFLIGHT polish |
| Sprints 910 | Codebase health + operational polish | `app.js` → 6 modules, server.py → `api/` modules, tool card UX, background task cancel, regression tests |
| Sprint 11 | Multi-provider models + streaming | Dynamic model dropdown, smooth scroll pinning, routes extracted to `api/routes.py` |
| Sprint 12 | Settings + reliability + session QoL | Settings panel, SSE auto-reconnect, pin sessions, JSON import |
| Sprint 13 | Alerts + polish | Cron alerts, background error banner, session duplicate, browser tab title |
| Sprint 14 | Visual polish + workspace ops | Mermaid, message timestamps, file rename, folder create, session tags, archive |
| Sprint 15 | Session projects + code copy | Projects / folders, code copy button, tool card expand / collapse |
| Sprint 16 | Sidebar visual polish | SVG icons, action dropdown, pin indicator, project border, safe HTML rendering |
| Sprint 17 | Workspace polish + slash commands | Breadcrumb nav, slash command autocomplete, send key setting (#26) |
| Sprint 18 | Thinking display + workspace tree | File preview auto-close, thinking / reasoning cards, expandable directory tree (#22) |
| Sprint 19 | Auth + security hardening | Password auth, login page, security headers, body limit (#23) |
| Sprint 20 | Voice input + send button | Web Speech API voice, send button polish |
| Sprint 21 | Mobile responsive + Docker | Hamburger sidebar, mobile nav, slide-over files, Docker support (#21, #7) |
| Sprint 22 | Multi-profile support | Profile picker, management panel, seamless switching, per-session tracking (#28) |
| Sprint 23 | Agentic transparency | Token / cost display, subagent cards, skill picker in cron, profile-local storage |
| Sprint 24 | Web polish | rAF streaming, git detection, collapsible date groups, context ring (#80, #81, #82, #83) |
| Sprint 25 | macOS desktop application | Native Swift + WKWebView shell, universal DMG, Sparkle 2 auto-update — separate repo |
| Sprint 26 | Pluggable themes | Light / Slate / Solarized / Monokai / Nord, settings unsaved-changes guard, `/theme` |
| Sprint 27 | Theme polish | 30+ hardcoded colors → CSS variables, light theme final polish |
| Sprint 28 | Security hardening | Env race fix, random signing key, upload traversal, PBKDF2 |
| Sprints 2932 | Model routing + custom endpoints + reasoning | Model routing by provider prefix, custom endpoint URL fix, OLED theme, top-level reasoning, message_count sync |
| Sprint 33 | Approval card + Lucide icons | Approval prompt surfaced, emoji → SVG, login CSP fix, update diagnostics |
| Sprint 34 | v0.50.0 UI overhaul | Composer-centric controls, Control Center modal, workspace state machine, collapsible date groups, rAF throttle, context ring |
| Sprints 3537 | Onboarding + i18n + Spanish | First-run wizard, OpenRouter / Anthropic / OpenAI / Custom config, Spanish locale, Docker two-container, mobile Profiles button |
| Sprints 3840 | Session + UI polish + Sprint 40 | Five-bug clean-up + sidebar timestamp + test port isolation |
| Sprints 4142 | Renderer hardening + KaTeX + handoff | Context ring live usage, renderMd link / image / code stash chain, MEDIA: image rendering, gateway handoff foundation |
| Sprints 43+ | Continuous contributor sprints | Custom providers, Russian locale, IME fixes, model-switch toast, approval queue multi-slot, profile polish, font-size CSS, contributor wave |
---
## Versioning conventions
- **Patch** (`v0.50.X`) — small batches, contributor PR releases, hotfixes
- **Minor** (`v0.X.0`) — sprint completion, new feature surface, architecture milestone
- **Major** (`v1.0.0`) — declared when CLI parity + Claude parity reach steady state and the feature surface stabilizes
Per-version detail and contributor attribution live in [CHANGELOG.md](./CHANGELOG.md).