Commit Graph

121 Commits

Author SHA1 Message Date
nesquena-hermes e35c94bf55 Stage 393: PR #2615 2026-05-20 22:23:53 +00:00
dobby-d-elf 87527ff4f6 Fix state db legacy dedup repeat preservation 2026-05-20 14:18:47 -06:00
nesquena-hermes 3d34eef02d Stage 389: PR #2620 2026-05-20 16:41:45 +00:00
Isla Liu 2a303de2a3 fix(session): preserve retry budget while journal is still arriving 2026-05-20 20:55:07 +08:00
Isla Liu d5a185d9c6 fix(session): serialize lazy journal retry per session 2026-05-20 20:48:38 +08:00
manji ff0aa69d5f fix(session): use second-level timestamp granularity in legacy dedup key
The _normalized_message_timestamp_for_key helper was preserving
microsecond precision (%.6f). When the same message is persisted by
both the WebUI sidecar JSON writer and the Hermes agent state.db
writer, their timestamps can differ by a few microseconds, causing
_session_message_merge_key to produce different keys for the same
logical message and letting both copies survive the dedup pass in
merge_session_messages_append_only.

Truncating to second-level granularity collapses sub-second drift to
the same key, so the duplicate is suppressed correctly.

Fixes #2616
2026-05-20 07:13:55 +00:00
Lumen Yang b2c6af12f1 fix(webui): prefer sidecar counts over stale session index 2026-05-20 05:42:55 +00:00
Isla Liu 9870e8f111 fix(session): address Copilot review — scope tool-card dedupe by stream id + tighten docs
Four code-review comments from the automated Copilot reviewer on this PR:

1. `_journal_tool_already_present` dedupe was session-wide, so a
   legitimately-repeated tool (e.g. a second `terminal: ls` in an
   earlier turn) could cause the retry path to falsely skip
   materializing the recovered tool card.  The helper now takes a
   keyword `stream_id` argument; when supplied, a tool card whose
   `_recovered_stream_id` is set AND differs from the candidate is no
   longer treated as a duplicate.  Untagged tool cards (live tools, or
   tool cards carried over from a pre-tagging core transcript) still
   match, preserving the existing 'core transcript already has this
   tool, don't duplicate' invariant.  Two new tests in
   `TestJournalToolDedupeScoping` cover both legs of the rule.

2./3. The troubleshooting FAQ pointed at `~/.hermes/webui/sessions/session_<sid>.json`
   and `~/.hermes/_run_journal/...`.  The actual sidecar filename has
   no `session_` prefix and the run-journal lives under the WebUI
   sessions dir (`~/.hermes/webui/sessions/_run_journal/<sid>/<stream>.jsonl`,
   default).  Both paths fixed and an explicit note added about
   `HERMES_WEBUI_STATE_DIR` overriding the state root.

4. Drop unused `json` / `queue` / `Path` imports from
   `tests/test_session_lost_response_regression.py` so the file stops
   carrying noise that future linting would flag.
2026-05-20 12:18:03 +08:00
Isla Liu 75a26174aa fix(session): lazily retry run-journal recovery so the interrupted-turn marker self-heals
When the WebUI process restarts mid-stream and sidecar repair runs while
the run-journal for the dead stream is not yet visible on disk (WSL2 9p
/ DrvFs page-cache loss, un-fsynced journal tail on network FS, …),
`_append_journaled_partial_output()` returns False and the marker is
permanently baked with the "no agent output was recovered" wording even
though the journaled tokens appear on disk shortly afterwards.

This commit reframes the recovery contract so the read side can
self-heal:

  * `_interrupted_recovery_marker` gains a `pending_retry=True` mode
    that produces a third wording ("Recovering the partial output …
    reload this session to retry.") and stamps a
    `_pending_journal_recovery` flag.
  * `_apply_core_sync_or_error_marker` now writes that pending-retry
    marker (with `_journal_retry_stream_id`,
    `_journal_retry_attempts`, `_journal_retry_first_seen_ts` meta)
    whenever it cannot recover visible output AND the stream id is
    known. The legacy "no output" wording is reserved for the
    no-stream-id case. The core-sync branch leaves marker emission to
    the existing visible-output check (the core transcript itself is the
    canonical history in that branch).
  * A new `_retry_journal_recovery_in_place(session)` helper re-runs
    `_append_journaled_partial_output(…, dedupe_existing=True)` for the
    latest pending marker. On success the marker is promoted in place to
    the recovered-output wording, the journaled rows are reordered to
    sit above the marker (preserving chronological order), and all
    retry meta is stripped. On failure attempts is incremented; after
    _JOURNAL_RETRY_MAX_ATTEMPTS (12) or _JOURNAL_RETRY_GIVEUP_SECONDS
    (24h) the marker is demoted to a neutral "Partial output may have
    been lost." wording.
  * `get_session()` cheaply short-circuits via
    `_session_has_pending_journal_retry()` and invokes the helper on
    both cache-hit and cold-load paths when a pending marker is found.
    `metadata_only=True` skips the helper to keep sidebar refresh
    cheap. The retry call runs OUTSIDE the SESSIONS LOCK to avoid a
    deadlock with `session.save()` write paths.

No streaming write path or run_journal fsync behaviour is changed — the
fix is read-side only.
2026-05-20 11:58:26 +08:00
nesquena-hermes cc8ef201be Stage 387: PR #2600 2026-05-19 22:10:20 +00:00
nesquena-hermes 93727897b6 Stage 387: PR #2605
# Conflicts:
#	api/routes.py
2026-05-19 22:10:20 +00:00
nesquena-hermes e63de7c15f Stage 387: PR #2593
# Conflicts:
#	CHANGELOG.md
2026-05-19 22:08:56 +00:00
Lumen Yang dc5c8168d1 fix(webui): refresh active session on external sidecar updates 2026-05-19 21:34:08 +00:00
Lumen Yang 8d2b9d4a16 feat(webui): render indexed context metadata 2026-05-19 18:52:50 +00:00
nesquena-hermes 86f52f67b8 Stage 386: PR #2581
# Conflicts:
#	api/streaming.py
2026-05-19 18:20:47 +00:00
Michael Lam 0736e45485 fix: dedupe tool-only partial recovery markers 2026-05-19 11:16:21 -07:00
starship-s 2e9ca283dc fix: display canonical cache hit percentage 2026-05-19 02:27:12 -06:00
Lumen Yang 600bb48970 fix(webui): use active state db for metadata summary 2026-05-19 08:02:43 +00:00
Lumen Yang 6ca63e5815 perf(webui): keep external refresh metadata cheap 2026-05-19 08:02:43 +00:00
Lumen Yang a63ab310b5 fix(webui): preserve reconciled session invariants 2026-05-19 08:02:43 +00:00
Lumen Yang 467ef33a24 feat(webui): reconcile external session updates
When API server runs append messages directly to state.db, reconcile WebUI sidecar sessions with those canonical rows across API responses, model-facing streaming context, and active browser refresh.

Add append-only state.db merge helpers, metadata-only counts for refresh polling, and regression coverage for API visibility, context incorporation, and frontend refresh behavior.
2026-05-19 08:02:43 +00:00
Frank Song 4661a5e94e Recover journal output after core transcript sync 2026-05-17 12:28:05 +08:00
nesquena-hermes 8f98465024 Stage 374: PR #2427 — fix(streaming): recover journaled partial assistant output after WebUI restart by @franksong2702 (fixes #2423)
Co-authored-by: Frank Song <franksong2702@gmail.com>
2026-05-17 02:49:35 +00:00
nesquena-hermes 47c210899e Stage 374: PR #2421 — fix(cache-tokens): surface provider prompt-cache read/write tokens in WebUI usage by @Michaelyklam (fixes #2419)
Co-authored-by: Michael Lam <michael@example.local>
2026-05-17 02:49:34 +00:00
Hermes Agent 026a9957f4 Stage 368: PR #2385 — Keep fuller compression snapshots reachable in sidebar by @franksong2702 2026-05-16 17:19:05 +00:00
Frank Song 4899ae17b9 Keep fuller compression snapshots reachable 2026-05-16 20:58:44 +08:00
Frank Song c415c843df Update interrupted recovery comment wording 2026-05-16 20:05:47 +08:00
Frank Song 49bea3ad01 Clarify interrupted turn recovery marker 2026-05-16 14:29:58 +08:00
Frank Song 40f69a2b75 Keep recovered pending turns in context 2026-05-16 04:07:02 +00:00
Hermes Agent 4826a31fbc Merge pull request #2285 into stage-359
fix: hide pre-compression snapshots from sidebar (dso2ng, refs #2230)

# Conflicts:
#	CHANGELOG.md
2026-05-15 14:55:19 +00:00
Dennis Soong bfccdc5c94 fix: hide pre-compression snapshots from sidebar 2026-05-15 11:20:17 +08:00
ai-ag2026 5110005324 fix: load CLI continuation session transcripts 2026-05-14 23:48:49 +02:00
Dennis Soong 143d9d8ef7 fix: reconcile stale sidebar display titles 2026-05-14 16:18:53 +08:00
starship-s 4084c3cf56 perf(sessions): cache CLI session scans 2026-05-12 11:24:29 -06:00
Frank Song 186453ea0e Add worktree-backed session creation 2026-05-11 12:12:40 +08:00
nesquena-hermes c624770c63 Stage 331: PR #2015 — fix(sessions): stitch continued session transcripts by @Jellypowered 2026-05-10 17:09:21 +00:00
Jellypowered 8aed650b4c Stitch continued session transcripts in WebUI 2026-05-10 11:10:54 -05:00
Frank Song 1bec8070f2 fix(1833): persist compression anchor summary for reload UI 2026-05-10 16:45:16 +08:00
Minimax 08c4ef8d88 feat: persistent composer draft — server-side, cross-client, survives refresh
- Session.composer_draft field: {text, files} stored in session JSON
- POST+GET /api/session/draft endpoint for save/load
- loadSession: save draft before switch, restore from S.session.composer_draft
- textarea input: debounced 400ms auto-save to server
- send(): clear draft after message is sent
- lockComposerForClarify(): save draft before card locks composer
- _restoreComposerDraft: clears textarea when target has no draft, guards
  against stale responses racing new session loads, exact text comparison
- Session.compact(): includes composer_draft in response
- Fix: use handler.command instead of parsed.method (ParseResult has no .method)

Co-authored-by: Minimax <noreply@minimax.io>
2026-05-09 13:47:57 +01:00
Frank Song 7e2709e281 fix: add request wedge diagnostics 2026-05-08 15:37:08 +00:00
ai-ag2026 755c18bdf9 fix: persist generated title refresh marker 2026-05-08 01:36:10 +02:00
Michael Lam 0bd65ef0bf fix: preserve CLI session tool metadata 2026-05-07 02:47:19 +00:00
ai-ag2026 8b34a79f02 fix: preserve imported session lineage visibility 2026-05-05 22:32:19 +02:00
Michael Lam c94ec31dec feat: show LLM Gateway routing metadata 2026-05-05 02:26:55 +00:00
Frank Song 8981d33543 Fix CLI session CI compatibility 2026-05-05 01:52:42 +00:00
Frank Song 79d0762d8c Filter low-value CLI agent sessions 2026-05-05 01:52:42 +00:00
Michael Lam e54a0470f0 Add Claude Code session imports 2026-05-05 01:18:34 +00:00
Michael Lam 876a670387 feat: add session save mode config 2026-05-04 14:05:49 -07:00
nesquena-hermes 040cb8af70 Apply Opus pre-release SHOULD-FIX + NITs (in-PR per release policy)
SHOULD-FIX: rate-limit _repair_stale_pending repair-firing telemetry. Switch
from unconditional logger.warning to age-keyed: WARNING when pending_age <
5min (the diagnostically valuable race window — actual leak-path candidates
that slipped past the grace guard) and DEBUG for the long-tail (orphaned
sidecars from prior process lifetimes). Prevents reconnect loops on stuck
sessions from flooding the log while preserving the diagnostic signal we
want for tuning _REPAIR_STALE_PENDING_GRACE_SECONDS empirically.

NIT: _LOCAL_SERVER_PROVIDERS expanded with lm-studio (hyphenated alias used
in some custom_providers configs and already recognized at api/config.py:2189
for SSRF host trust) and localai (LocalAI project). Test parametrize expanded
from 7 to 11 names, also covering pre-existing koboldcpp and textgen for
symmetry. +4 regression tests.

NIT (docs): CHANGELOG callout for the RFC1918 behavior change. Internal-
network OpenAI-compatible proxies now preserve the model prefix on private-IP
base_urls. Documented the migration path: configure as a custom_providers
entry to bypass the local-server detection.

NIT (deferred, optional): narrowing the heuristic to is_loopback only is
left as future work; the broader scope was an explicit goal in the bug
body and Opus flagged it as SHOULD-DISCUSS-but-not-block.

4184 -> 4188 passing. 0 regressions. ~10 LOC absorbed total.
2026-05-04 16:50:22 +00:00
nesquena-hermes bea57beba9 fix(streaming): SSE heartbeat alignment, repair grace period, local-server model id preservation (#1623, #1624, #1625)
Closes #1623 — Lower SSE app heartbeat from 30s to 5s at every long-lived
handler (main agent, terminal, gateway-watcher, approval-poller, clarify-poller).
Kernel TCP keepalive declares peer dead at 25s worst-case (10s KEEPIDLE +
5s KEEPINTVL * 3 KEEPCNT, added v0.50.289 #1581). 30s app heartbeat let the
kernel tear sockets down on flaky networks before the app sent its first
keepalive byte — drops at ~10s during long thinking phases. New named
constant _SSE_HEARTBEAT_INTERVAL_SECONDS=5; regression test pins the
inequality (app_heartbeat * 2 <= kernel_window) so future tuning can't
re-introduce the misalignment.

Closes #1624 — Add 30s grace period to _repair_stale_pending() trigger.
Without it, any narrow race between the streaming thread clearing
pending_user_message and STREAMS.pop(stream_id) produces a false-positive
'Previous turn did not complete.' marker on a turn that finished correctly
(reproducible after every command-approval turn). Defense-in-depth, not
the root-cause fix — the actual streaming-thread leak path is tracked
separately. Falsy pending_started_at (legacy sidecars) treated as
'old enough' so legitimate legacy-data recovery still works. Plus
logger.warning telemetry on every legitimate repair so the next batch of
user reports tells us whether the underlying race still fires.

Closes #1625 — Local model servers (LM Studio, Ollama, llama.cpp, vLLM,
TabbyAPI, koboldcpp, textgen-webui) now keep the full HuggingFace-style
model id (e.g. 'qwen/qwen3.6-27b' instead of stripped 'qwen3.6-27b'). New
_LOCAL_SERVER_PROVIDERS set + _base_url_points_at_local_server() loopback/
RFC1918 heuristic — either signal triggers no-strip. Backward compat
preserved for OpenAI-compatible proxies on public hosts (LiteLLM at
litellm.example.com still strips openai/gpt-5.4 -> gpt-5.4). Updated the
existing #230/#433 test to reflect that #1625 supersedes the strip-on-custom
rule for loopback hosts (see api/config.py and test_model_resolver.py
docstring update). Reported by @akarichan8231 in Discord on 2026-05-04.

42 regression tests across:
  tests/test_issue1623_sse_heartbeat_alignment.py (3)
  tests/test_issue1624_repair_stale_pending_grace.py (9)
  tests/test_issue1625_local_server_model_id_preservation.py (30)

4142 -> 4184 passing. 0 regressions.
2026-05-04 16:49:43 +00:00