mirror of
https://github.com/nesquena/hermes-webui.git
synced 2026-05-25 11:10:18 +00:00
75a26174aa
When the WebUI process restarts mid-stream, sidecar repair can run before
the run-journal for the dead stream is visible on disk (WSL2 9p
/ DrvFs page-cache loss, un-fsynced journal tail on network FS, …).
In that case `_append_journaled_partial_output()` returns False and the
marker is permanently written with the "no agent output was recovered"
wording, even though the journaled tokens appear on disk shortly afterwards.
This commit reframes the recovery contract so the read side can
self-heal:
* `_interrupted_recovery_marker` gains a `pending_retry=True` mode
that produces a third wording ("Recovering the partial output …
reload this session to retry.") and stamps a
`_pending_journal_recovery` flag.
* `_apply_core_sync_or_error_marker` now writes that pending-retry
marker (with `_journal_retry_stream_id`,
`_journal_retry_attempts`, `_journal_retry_first_seen_ts` meta)
whenever it cannot recover visible output AND the stream id is
known. The legacy "no output" wording is reserved for the
no-stream-id case. The core-sync branch leaves marker emission to
the existing visible-output check (the core transcript itself is the
canonical history in that branch).
* A new `_retry_journal_recovery_in_place(session)` helper re-runs
`_append_journaled_partial_output(…, dedupe_existing=True)` for the
latest pending marker. On success the marker is promoted in place to
the recovered-output wording, the journaled rows are reordered to
sit above the marker (preserving chronological order), and all
retry meta is stripped. On failure `_journal_retry_attempts` is
incremented; after `_JOURNAL_RETRY_MAX_ATTEMPTS` (12) or
`_JOURNAL_RETRY_GIVEUP_SECONDS` (24h) the marker is demoted to a
neutral "Partial output may have been lost." wording.
* `get_session()` cheaply short-circuits via
`_session_has_pending_journal_retry()` and invokes the helper on
both cache-hit and cold-load paths when a pending marker is found.
`metadata_only=True` skips the helper to keep sidebar refresh
cheap. The retry call runs OUTSIDE the SESSIONS LOCK to avoid a
deadlock with `session.save()` write paths.
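
The lock ordering in the last bullet can be sketched as follows. This is a minimal illustration, not the actual `get_session()` code: `SESSIONS`, `SESSIONS_LOCK`, `has_pending_retry` and `retry_in_place` are hypothetical stand-ins, and sessions are plain dicts. The point is only that the lookup happens under the lock and the retry, which itself needs the lock to save, runs after the lock is released.

```python
import threading

# Hypothetical stand-ins; the real module's names and structure may differ.
SESSIONS = {}
SESSIONS_LOCK = threading.Lock()  # non-reentrant, like a plain mutex


def has_pending_retry(session):
    """Cheap check: does any message carry the pending-recovery flag?"""
    return any(m.get("_pending_journal_recovery") for m in session["messages"])


def retry_in_place(session):
    """Placeholder for the retry helper; it takes the lock to 'save'."""
    with SESSIONS_LOCK:  # would block forever if the caller still held it
        session["retried"] = session.get("retried", 0) + 1


def get_session(sid, metadata_only=False):
    with SESSIONS_LOCK:
        session = SESSIONS[sid]
    # The lock is released here, before any retry work starts.
    if not metadata_only and has_pending_retry(session):
        retry_in_place(session)
    return session
```

With a non-reentrant lock, moving the `retry_in_place` call inside the `with` block would hang on the first pending session, which is the deadlock the bullet describes.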
No streaming write path or run_journal fsync behaviour is changed; the
fix is read-side only.
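
The read-side retry described in the bullets above can be sketched roughly as below. This is a simplified model under stated assumptions, not the code in this commit: messages are plain dicts, `read_journal` is a hypothetical callable standing in for `_append_journaled_partial_output()`, the `state` key replaces the real marker wording, deduplication is omitted, and the `>=` comparisons for the give-up thresholds are a guess.

```python
import time

MAX_ATTEMPTS = 12            # mirrors _JOURNAL_RETRY_MAX_ATTEMPTS
GIVEUP_SECONDS = 24 * 3600   # mirrors _JOURNAL_RETRY_GIVEUP_SECONDS
RETRY_META = ("_pending_journal_recovery", "_journal_retry_stream_id",
              "_journal_retry_attempts", "_journal_retry_first_seen_ts")


def retry_pending_marker(messages, read_journal, now=None):
    """Retry recovery for the latest pending marker; return the new state."""
    now = time.time() if now is None else now
    # Find the most recent message that still carries the pending flag.
    idx = next((i for i in range(len(messages) - 1, -1, -1)
                if messages[i].get("_pending_journal_recovery")), None)
    if idx is None:
        return "none"
    marker = messages[idx]
    rows = read_journal(marker["_journal_retry_stream_id"])
    if rows:
        # Success: strip retry meta, promote the marker, and place the
        # recovered rows directly above it to keep chronological order.
        for key in RETRY_META:
            marker.pop(key, None)
        marker["state"] = "recovered"
        messages[idx:idx] = rows
        return "recovered"
    # Failure: count the attempt, then check both give-up thresholds.
    marker["_journal_retry_attempts"] = marker.get("_journal_retry_attempts", 0) + 1
    too_many = marker["_journal_retry_attempts"] >= MAX_ATTEMPTS
    too_old = now - marker["_journal_retry_first_seen_ts"] >= GIVEUP_SECONDS
    if too_many or too_old:
        for key in RETRY_META:
            marker.pop(key, None)
        marker["state"] = "lost"  # the neutral "may have been lost" wording
        return "lost"
    return "pending"
```

Passing `now` explicitly keeps the sketch deterministic in tests; a caller would normally leave it unset.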