diff --git a/docs/rfcs/README.md b/docs/rfcs/README.md index 9f40371a..4e2d1d37 100644 --- a/docs/rfcs/README.md +++ b/docs/rfcs/README.md @@ -32,5 +32,8 @@ First-time contributor RFCs should be discussed in an issue before opening a PR. ## Current RFCs +- [`hermes-run-adapter-contract.md`](hermes-run-adapter-contract.md) — Event/control + compatibility contract and gap matrix for moving WebUI chat runs to Hermes-owned + runtime execution. - [`turn-journal.md`](turn-journal.md) — Crash-safe WebUI turn journal for recovering interrupted chat submissions. diff --git a/docs/rfcs/hermes-run-adapter-contract.md b/docs/rfcs/hermes-run-adapter-contract.md new file mode 100644 index 00000000..d6ff2b23 --- /dev/null +++ b/docs/rfcs/hermes-run-adapter-contract.md @@ -0,0 +1,315 @@ +# Hermes Run Adapter Compatibility Contract + +- **Status:** Proposed +- **Author:** @Michaelyklam +- **Created:** 2026-05-11 +- **Tracking issue:** [#1925](https://github.com/nesquena/hermes-webui/issues/1925) + +## Problem + +Hermes WebUI currently gives a rich workbench experience, but browser-originated +chat turns are still executed inside the WebUI server process. The WebUI path +creates process-local stream state, starts background agent threads, constructs or +reuses `AIAgent`, and owns callback queues for token, tool, reasoning, approval, +and clarify state. + +The target boundary from #1925 is: + +> WebUI should be thin in execution ownership, not thin in product scope. + +That means WebUI remains the full browser workbench for sessions, workspace +files, chat rendering, tools, approvals, status, diagnostics, and controls. The +change is that Hermes Agent must own run lifecycle, event ordering, replay, +approvals, clarify, cancellation, and terminal state. + +This document defines the first reviewable contract for a Hermes-owned run +adapter. It is intentionally a spec/gap matrix, not an implementation plan for a +new WebUI runtime surrogate. + +## Goals + +- Keep the browser-facing WebUI workbench contract stable while execution moves + out of the WebUI process. +- Define the minimum Hermes Runtime API / IPC v0 surface WebUI needs before it + can route new runs to Hermes-owned execution. +- Map current WebUI-owned runtime primitives to Hermes-owned APIs, WebUI + presentation state, or explicit temporary compatibility shims. +- Make restart/reattach the first meaningful success criterion, not merely + "basic chat streamed once." + +## Non-goals + +- Do not implement the adapter in this RFC. +- Do not create a new run-manager sidecar or broker requirement. +- Do not re-create `STREAMS`, cached `AIAgent` objects, approval queues, clarify + queues, or cancellation flags under new names inside WebUI. +- Do not reduce WebUI product scope. The rich workbench UX remains in WebUI. +- Do not require every event to be durably persisted on day one if the first + upstream runtime slice can still prove Hermes-owned execution and reconnect. + +## Ownership boundary + +### Hermes Agent owns + +- run creation and lifecycle +- run ids and session-to-active-run mapping +- ordered event stream and replay cursor +- terminal run state, final result, and error metadata +- model/provider/profile/toolset routing +- agent execution and tool dispatch +- command semantics and capability metadata +- approval and clarify lifecycle +- cancel, interrupt, queue, continue, steer, and goal control where supported +- durable runtime/session state needed for reconnect + +### WebUI owns + +- browser authentication and presentation-specific session routing +- chat layout, transcript rendering, tool cards, thinking/progress display +- approval and clarify widgets +- workspace/file-panel UX +- settings/admin/diagnostics presentation +- adapting Hermes runtime events into WebUI-compatible browser events +- temporary compatibility shims explicitly listed in this RFC + +## WebUI event/control compatibility contract + +The browser-facing contract should remain stable enough that the current WebUI +workbench can render either the legacy in-process runtime or the Hermes-owned run +adapter during migration. These are presentation events over Hermes runtime +truth, not a second source of truth. + +All events should include enough metadata for idempotent rendering and +reconnect: + +```json +{ + "event_id": "run_123:42", + "seq": 42, + "run_id": "run_123", + "session_id": "20260511_...", + "type": "tool.update", + "created_at": 1778540000.0, + "terminal": false, + "payload": {} +} +``` + +`event_id` may be an SSE `id:` value or an equivalent cursor token. `seq` is a +monotonic per-run cursor. Clients may send `Last-Event-ID` or `after_seq` on +reconnect. The runtime should treat replay as at-least-once delivery; WebUI must +deduplicate by `run_id` + `seq` / `event_id`. + +### Event families + +| WebUI event family | Required payload | Runtime source of truth | +|---|---|---| +| `run.started` / `status` | lifecycle state, controls available, session id, workspace/profile/model/toolset summary | Hermes run state | +| `token.delta` | assistant message id/segment id, delta text, optional content type | Hermes model output stream | +| `reasoning.delta` / `reasoning.done` | reasoning text or structured reasoning block, visibility metadata | Hermes reasoning callback/event stream | +| `progress` | concise status/progress text, optional phase/tool context | Hermes agent progress callbacks | +| `tool.started` | tool call id, tool name, sanitized arguments, start time | Hermes tool dispatch lifecycle | +| `tool.updated` | stdout/stderr/structured partial data, progress metadata | Hermes tool dispatch lifecycle | +| `tool.done` | result, exit/status, duration, error flag | Hermes tool dispatch lifecycle | +| `approval.requested` | approval id, command/action summary, risk metadata, available choices | Hermes approval queue/control plane | +| `approval.resolved` | approval id, choice, resulting status | Hermes approval queue/control plane | +| `clarify.requested` | clarify id, question, choices/input mode | Hermes clarify lifecycle | +| `clarify.resolved` | clarify id, answer metadata/status | Hermes clarify lifecycle | +| `title.updated` | title text, title source/confidence | Hermes session/title subsystem | +| `usage.updated` / `usage.final` | tokens, cost, model/provider, duration where available | Hermes usage accounting | +| `error` | stable error code, safe message, redacted diagnostic metadata, terminal flag | Hermes run terminal/error state | +| `done` | final lifecycle state, usage, terminal result/error summary, last seq | Hermes run terminal state | + +### Reconnect metadata + +Every active or terminal run must expose: + +- `run_id` +- `session_id` +- current `status`: `queued`, `running`, `awaiting_approval`, + `awaiting_clarify`, `paused`, `cancelling`, `cancelled`, `failed`, + `completed`, or `expired` +- last committed event cursor / `last_event_id` +- terminal state and final result/error when finished +- currently available controls +- pending approval/clarify ids, if any +- session-to-active-run mapping for the current WebUI session + +### Controls + +| WebUI control | Required semantics | Runtime endpoint / IPC | +|---|---|---| +| cancel | Request graceful cancellation of the current run; terminal event must follow | `cancel_run` / `interrupt` | +| queue / continue | Append follow-up work to a live, paused, or resumable run/session according to Hermes semantics | `queue_or_continue` | +| approval | Resolve a pending approval request with `allow_once`, `allow_session`, `always`, or `deny` where supported | `respond_approval` | +| clarify | Submit answer text or selected choice for a pending clarify request | `respond_clarify` | +| goal | Set/status/pause/resume/clear goal where Hermes exposes goal capability for this surface | command/capability API | +| observe | Attach to live events and replay from cursor | `observe_run` | +| status | Poll lifecycle state when SSE/WebSocket is unavailable | `get_run` | + +WebUI may keep local UI state such as which disclosure rows are expanded, but it +must not infer or privately mutate runtime state for these controls. + +## Hermes Runtime API / IPC v0 minimum + +The transport can be HTTP, stdio IPC, websocket, or another Hermes-owned local +protocol. The key requirement is the semantic contract: Hermes owns the run id, +lifecycle, event cursor, controls, pending human-interaction state, and terminal +state. + +### `start_run` + +Creates a Hermes-owned run. + +Input fields: + +- `session_id` or instruction to create one +- user message / queued input +- workspace context and attachments metadata +- profile/provider/model/toolset hints +- source/surface metadata, e.g. `source=webui` +- optional command intent, e.g. `/goal` if parsed by WebUI command UI +- idempotency key for duplicate browser submissions + +Output fields: + +- `run_id` +- `session_id` +- initial `status` +- `observe` cursor / first event id +- supported controls for this run + +### `observe_run` + +Streams ordered run events, with replay from a cursor. + +Required behavior: + +- support `after_seq` or `Last-Event-ID` +- emit events in monotonically increasing per-run order +- replay terminal `error` / `done` state for completed runs +- make duplicate delivery safe for reconnecting clients +- preserve enough history for short WebUI restarts and browser reloads + +### `get_run` + +Returns current lifecycle state without consuming the event stream. + +Required fields: + +- `run_id`, `session_id`, `status` +- `created_at`, `updated_at`, optional `completed_at` +- `last_seq` / `last_event_id` +- active controls +- pending approval/clarify summaries +- terminal result/error summary +- usage/model/provider/profile/toolset summary where available + +### `cancel_run` / interrupt + +Requests graceful run cancellation or interruption. Hermes owns the final state +transition and emits a terminal event. WebUI should not directly toggle a local +cancellation flag as the source of truth. + +### `queue_or_continue` + +Submits follow-up work for a live, paused, or resumable run/session. Semantics +must match Hermes-native queue/continue behavior so WebUI does not create a +parallel continuation model. + +### `respond_approval` + +Resolves a pending approval request by id. + +Required behavior: + +- validate the approval belongs to the run/session +- accept only supported choices +- emit `approval.resolved` +- continue, pause, or fail the run according to Hermes approval semantics + +### `respond_clarify` + +Resolves a pending clarification request by id. + +Required behavior: + +- validate the clarify request belongs to the run/session +- accept text or selected-choice payloads +- emit `clarify.resolved` +- continue or fail the run according to Hermes clarify semantics + +## Gap matrix + +| Current WebUI primitive | Current role | Hermes-owned target | Temporary shim allowed? | Notes / gap | +|---|---|---|---|---| +| `STREAMS` / `STREAMS_LOCK` | Process-local live stream registry and subscriber fan-out | Hermes run registry + `observe_run` replay/fan-out | Yes, adapter may keep per-browser SSE connections only | Shim must not be the run source of truth and must survive WebUI restart by re-observing Hermes. | +| `CANCEL_FLAGS` | Local cancellation signal checked by WebUI-owned agent thread | `cancel_run` / interrupt control | No, except translating button clicks into runtime calls | Cancellation result must come back as Hermes status/events. | +| `AGENT_INSTANCES` | Cached `AIAgent` objects inside WebUI process | Hermes Agent runtime owns agent construction/reuse | No | Keeping this in the adapter would recreate the runtime surrogate. | +| Partial text buffers | Reconstruct live assistant deltas for browser reconnect/render | Hermes event log/cursor plus WebUI renderer cache | Short-lived presentation cache only | Source should be replayed token events or persisted transcript, not WebUI-only execution state. | +| Reasoning buffers | Preserve streamed reasoning/thinking text | Hermes reasoning events + replay | Short-lived presentation cache only | Replay must rebuild the same thinking cards after refresh. | +| Tool buffers / live tool calls | Render tool cards and updates | Hermes tool lifecycle events + replay | Short-lived presentation cache only | WebUI owns card rendering, not tool execution state. | +| Approval callbacks and queues | Bridge WebUI buttons to a live Python callback | Hermes pending approval state + `respond_approval` | No private callback queue | Pending approval must be discoverable after WebUI restart. | +| Clarify callbacks and queues | Bridge WebUI form to a live Python callback | Hermes pending clarify state + `respond_clarify` | No private callback queue | Pending clarify must be discoverable after WebUI restart. | +| Command capability metadata | Decide which slash commands render/execute in WebUI | Hermes command registry/capability API with owner/surface metadata | WebUI may cache metadata | Unknown commands should not be reimplemented in WebUI by default. | +| Session-to-active-run mapping | Stored implicitly in WebUI session JSON / active stream ids | Hermes session/run mapping API | WebUI may cache last seen run id | Reopen session must rediscover active/completed run from Hermes. | +| Reconnect/replay behavior | Depends on WebUI process memory and session JSON | `observe_run(after_seq)` + `get_run` terminal state | Browser SSE adapter only | First milestone must prove WebUI restart does not orphan the run. | +| Usage/title/status events | Produced by WebUI streaming callbacks | Hermes usage/title/status events and run state | WebUI formatting only | WebUI can display and persist presentation copies after events arrive. | +| Goal / queue / continue hooks | Mixed WebUI command handling and streaming callbacks | Hermes command/control plane | Only UI affordance shim | Goal support should be driven by Hermes capabilities. | + +## Migration ladder + +1. **Inventory and contract**: keep this RFC current with the current WebUI-owned + runtime primitives and browser event/control contract. +2. **Hermes Runtime API / IPC v0**: add or stabilize upstream Hermes primitives + for `start_run`, `observe_run`, `get_run`, `cancel_run`, and replayable event + cursors. +3. **Read-only observation spike**: from WebUI, observe an existing Hermes-owned + run and adapt its events into WebUI-compatible event objects without starting + a WebUI-owned agent thread. +4. **Feature-flagged new-run path**: route new WebUI runs to Hermes-owned + `start_run` behind a flag while preserving the legacy path as fallback. +5. **Restart/reattach milestone**: prove a non-trivial WebUI-started run + survives a WebUI-only restart and browser reload with ordered replay. +6. **Controls migration**: move cancel, queue/continue, approval, clarify, and + goal controls to Hermes-owned endpoints/capabilities. +7. **Parity tests**: compare legacy and adapter event streams for synthetic + token, reasoning, tool, approval, clarify, error, and done scenarios. +8. **Retire runtime surrogate state**: remove normal WebUI chat ownership of + `AIAgent`, cancellation flags, callback queues, and process-local run truth + once parity and fallback criteria are satisfied. + +## First success criterion + +The first implementation milestone is not "basic chat streams through a new +endpoint." The first meaningful milestone is: + +1. Start a non-trivial chat run from WebUI through the Hermes-owned path. +2. Restart only `hermes-webui` while the run is active. +3. Reload or reopen the browser session. +4. Rediscover the same `run_id` from Hermes using `session_id` or last known run + metadata. +5. Replay events from the last cursor with no duplicate visible transcript + content. +6. Render the same token/reasoning/tool/approval/clarify state the workbench + would have rendered without the restart. +7. Cancel the run from WebUI and observe Hermes emit the terminal cancelled + state. + +If this works, WebUI is moving toward a protocol translator over Hermes-owned +execution instead of becoming another runtime with different variable names. + +## Open questions + +- Where should the normative Hermes Runtime API / IPC v0 spec live: in + `NousResearch/hermes-agent`, this WebUI RFC, or both with one designated + source of truth? +- What retention window is enough for v0 event replay: active-run memory only, + SQLite-backed event log, or transcript-derived reconstruction plus terminal + state? +- Should WebUI talk to Hermes over the existing API server, an embedded IPC + channel, or a profile-local runtime socket? +- How should multiple clients observing the same run coordinate controls and + pending approval/clarify prompts? +- Which slash commands need surface-specific capability metadata before WebUI + can safely delegate them to Hermes?