Both files had drifted significantly from the actual current state of the project: ROADMAP.md previously contained: - A ~75-row 'sprint history' table that overlapped with CHANGELOG.md - A 'Wave 2 Core' section frozen at Sprint 7 progress - A 'Wave 2: Full CRUD' nested section repeating the same Wave 7 items - A 'User Requested Features' table that double-counted the same shipped issues - A 'Feature Parity Checklist' with many unchecked boxes that were actually shipped (branch/fork via #465, LLM-generated session titles via auto_title_refresh_every, workspace git detection at api/workspace.py:719, code execution and TTS reclassified, etc.) SPRINTS.md previously contained: - 1159 lines of historical sprint plans (Sprints 11-26) - Inline planning detail more appropriate for the private workspace - Stale 'as of v0.50.245' header with 'next sprint Sprint 24' reference - Track-A/B/C breakdowns from sprints already long-merged Rewrite: ROADMAP.md (now 397 lines, was 363): - Status snapshot table at the top - Architecture table reflecting current layout (api/ ~20k LOC, static/*.js ~26k LOC) - Feature parity checklist reorganized by surface (chat / sessions / workspace / cron / skills / memory / profiles / config / security / visual / voice / mobile / i18n / gateway / MCP / distribution) with every line currently in master correctly checked - 'Forward work' section split into confirmed candidates (with tracking issue numbers) vs deferred backlog vs intentionally not planned - 'Sprint history' compressed to a single chronological theme table — per-version detail explicitly redirects to CHANGELOG.md - Versioning conventions documented SPRINTS.md (now 165 lines, was 1159): - Forward-looking only — no historical sprint plans (those live in CHANGELOG.md) - Active sprint candidates table sourced from the sprint-candidate label - Planning principles section (phase-0 fit assessment, salvage over absorb, independent-review gate, per-PR release velocity, no feature creep mid-PR, pre-release gate) - Sprint shape table (typical 3-7 day sprint with phases) - Out-of-scope section centralized - Template for new sprint plans Also updates TESTING.md test count 3990 → 3995 to match actual pytest collect. No private workspace info, agent infra references, or contributor stipend content. References to the maintainer's private planning notes are acknowledged as 'in a private workspace' without further specifics — same disclosure pattern most open-source projects use.
7.4 KiB
Hermes Web UI — Sprint Planning
Forward-looking sprint plan and active queue.
Current state: v0.50.281 | 3995 tests | port 8787 Target A (CLI parity): ✅ Complete Target B (Claude parity): ~95% — full subagent transparency UI and code-execution cells remain
Per-version detail: CHANGELOG.md Sprint history (chronology): ROADMAP.md
How sprints work here
A sprint is a thematic batch — usually 3–8 PRs landed together as a release. Each sprint has:
- Theme — single-sentence framing of what changes
- Items — selected work, with tracking issue / PR
- Out of scope — what is explicitly deferred and why
- Risks — known sharp edges
- Retro — once shipped, what we learned
External contributor PRs that don't fit a planned sprint get released individually as patch versions (v0.50.X). Sprints are reserved for coherent batches where the items reinforce each other.
Active sprint candidates
These are the issues currently labeled sprint-candidate — the inbox the next sprint plan draws from. Each item already has a confirmed root cause or design direction.
| # | Title | Type | Sprint fit |
|---|---|---|---|
| #1458 | Persistent-host crashes — bootstrap fork pattern, state.db FD leak, HTTP-unhealthy wedge | bug, stability | Stability sprint candidate |
| #1426 | OpenRouter free-tier :free variants invisible (tool-support filter) |
bug + feat | Model picker sprint candidate |
| #1362 | In-app OAuth login for Codex and Claude (currently terminal-only) | feat, ux | Onboarding sprint candidate |
| #1360 | macOS desktop app — auto-scroll overrides user scroll (#677 regression) | bug, ux | Desktop app polish |
| #1291 | GLM mutually exclusive between main agent and auxiliary title generation | bug | Auxiliary-route sprint |
The active queue stays small — it's the next 1-2 sprints' worth of work. The broader backlog of feature requests lives in ROADMAP.md → Forward Work and on the GitHub enhancement label.
Planning principles
Phase-0 fit assessment. Every sprint candidate gets a marginal-benefit screen first: does this make the product noticeably better for real users, or does it add surface area that costs more to maintain than it earns? Five-question fit screen — need / shape / bloat / clutter / scope.
Salvage over absorb. When a contributor PR is partial or scoped wrong, prefer splicing the good parts into a maintainer-side PR with Co-authored-by attribution rather than asking for multiple rebase rounds. Only absorb whole when the PR is genuinely shippable as-is.
Independent-review gate. Self-built PRs need either (a) Opus advisor pass on the merged stage diff, or (b) independent review from a separate reviewer. High-risk batches (large LOC, security, locks, durability) get both.
Per-PR release velocity. When a single PR fixes a real bug and has a clean review, ship it as its own patch release the same day rather than waiting for a sprint batch. Friction-free is the goal — sprint batches exist for coherence, not for arbitrary grouping.
No feature creep mid-PR. If a contributor proposes a scope addition during review, file it as a separate PR and link from the original. The current PR keeps its original boundaries.
Pre-release gate (mandatory). Every release runs:
pytest tests/ -q --timeout=120clean- Browser sanity check (HTTP-level API tests against a test server)
- Opus advisor pass on the merged stage diff with a written brief
- CHANGELOG.md + ROADMAP.md + TESTING.md version stamp
- CI green on Python 3.11 / 3.12 / 3.13
Skipping any of these requires a documented "I'm doing an override" from the maintainer.
Sprint shape
A typical sprint runs 3-7 days end to end:
| Phase | Duration | Output |
|---|---|---|
| Triage | 0.5 day | Active queue → selected items + scope notes |
| Design / spike | 0.5–1 day | Design notes for items needing them; deferrals documented |
| Build | 2–4 days | Each item on its own branch + PR for independent review |
| Review | 0.5–1 day | Maintainer + Opus advisor passes; SHOULD-FIX absorbed in-release |
| Stage + ship | 0.5 day | Stage branch, full test suite, release PR, tag, deploy, verify live |
| Hygiene | 0.5 day | Close PRs / issues, GitHub release notes, docs sync, retro |
Smaller sprints (1–2 PRs) compress to 1–2 days end to end. Single-PR releases skip the stage branch and ship via a release PR directly off the contributor's branch (or a maintainer-rebased copy if the fork doesn't grant write access).
Sprint history
Per-version detail is in CHANGELOG.md. High-level theme chronology is in ROADMAP.md → Sprint History. A few notable sprints worth highlighting:
- Sprint 19 — auth + security hardening. First sprint that made the app safe to leave running beyond localhost.
- Sprint 21 — mobile responsive + Docker. Two-container compose enabled the first wave of self-host deployments.
- Sprint 22 — multi-profile support. Major CLI-parity unlock; profile switching is now seamless without server restart.
- Sprint 25 — macOS desktop application. Native Swift + WKWebView shell, universal Intel + Apple Silicon DMG, Sparkle 2 auto-update. Lives in the separate
hermes-webui/hermes-swift-macrepo. - Sprint 26 — pluggable themes. CSS-variable-driven 8-theme system that lets community contributors add themes as pure CSS.
- Sprint 34 — v0.50.0 UI overhaul. Composer-centric controls, Control Center modal, workspace state machine, rAF streaming throttle.
Out of scope (across all sprints)
These are intentionally not on the roadmap. Listing them here to save planning cycles.
- Multi-user collaboration — single-user assumption throughout the codebase. Refactoring would be a from-scratch architecture change.
- Sharing / public conversation URLs — requires hosted backend with access control + CDN. Out of scope for self-hosted.
- Plugin marketplace — Hermes skills already cover this surface.
- Anthropic / Claude proprietary features — Projects AI memory, Claude artifacts sync. Not reproducible.
- Linux / Windows native app wrappers — macOS done; demand on other platforms not yet established. Web UI works in any browser.
- App Store distribution — sandboxing breaks the local-server model.
- Auto-update mechanism for the Python webapp — Sparkle 2 covers the Mac app; the webapp updates via
git pull+ restart, which is the same as every other Python service.
Templates
When opening a new sprint plan, copy this structure:
# Sprint NN — <theme>
**Version target:** vX.Y.Z
**Theme:** <single sentence>
**Date started:** YYYY-MM-DD
**Status:** PLANNED | IN PROGRESS | COMPLETED — vX.Y.Z
## Items
| # | Issue | Title | Complexity | Files | PR |
|---|-------|-------|------------|-------|-----|
## Rationale
Why these items, why now.
## Build approach
Per-item branch + PR; or single combined branch if items are tightly coupled.
## Out of scope
What did NOT make this sprint and why.
## Known risks
Sharp edges to watch during review.
## PR Status
| Issue | PR | Status |
|-------|-----|--------|
## Retro (post-ship)
What worked, what we'd do differently.
The maintainer's planning notes for each sprint live in the workspace repo (private), not in this file. This file is the public-facing planning shape.