Keep distinct generated summary categories, route update-summary generation through the configured auxiliary model first, disclose capped large-range summary input, and constrain long summary panels.
The FakeAgent in test_issue1857_usage_overwrite returned only 2 messages
(user + assistant) without the conversation history. The real agent always
returns the full history plus new messages. This mismatch caused the new
_has_new_assistant_reply helper (which checks only messages beyond the
pre-turn offset) to see len(result)==len(prev) and incorrectly flag the
turn as a silent failure.
Fix: prepend conversation_history to the FakeAgent's response so the
message list mirrors production behavior.
When a provider error (401/429/rate-limit) causes the agent to return
without producing a new assistant reply, the WebUI should emit an
apperror event so the user sees an inline error. However, the detection
logic scanned ALL messages in result['messages'] — which includes the
full conversation history. If any prior turn had an assistant response,
_assistant_added would be True and the apperror would be silently
skipped, leaving the user staring at a blank response.
Extract a helper _has_new_assistant_reply(all_messages, prev_count)
that only inspects messages beyond the pre-turn history offset. Apply
it to both the main detection path and the self-heal/retry path.
Tests: 15 new cases covering history masking, empty content, whitespace,
edge-case shrinks, and multi-assistant scenarios.
Opus identified that PR #2227's preservation block had two related bugs in
the parent_session_id handling:
1. During preservation save: code did
_old_parent = s.parent_session_id
s.parent_session_id = None
s.save(touch_updated_at=False, skip_index=True)
s.parent_session_id = _old_parent
The save persisted parent=None to disk. The in-memory restoration didn't
reach the disk copy. Result: a /branch fork session that subsequently
compressed lost its 'Forked from X' badge on the preserved old snapshot.
2. Stamping the continuation: code did
if not s.parent_session_id:
s.parent_session_id = old_sid
The 'if not' guard skipped the stamp when the session already had a
parent_session_id from a prior fork. Result: fork-of-fork compression
broke lineage — the continuation jumped back to the original fork parent
instead of the just-preserved immediate predecessor snapshot.
Fix (matches Opus's recommendation):
- Remove the parent clearing during preservation save (preserve as-is)
- Drop the 'if not' guard; always stamp continuation to old_sid
This makes the lineage chain consistent: new → old → old.parent → ... root.
Traversal from the continuation always walks through the just-preserved
snapshot to get to its parent's parent, never jumping over the snapshot.
Two new regression tests pin both invariants:
- test_parent_session_id_stamped_unconditionally (no 'if not' guard)
- test_old_session_parent_preserved_during_archive_save (no parent=None)
Both pass against the fix. All 8 tests in the file pass.
The previous implementation renamed old_sid.json → new_sid.json during
context compression, destroying the only persistent copy of the full
conversation history. If the summarisation LLM call also failed, the
user was left with zero recoverable messages.
Fix:
- Remove the destructive old_path.rename(new_path) call
- Preserve old_sid.json as an immutable pre-compression archive
- Create new_sid.json as a fresh file via s.save()
- Set parent_session_id on the continuation session for lineage
- Save in-memory messages to old_sid.json if they're newer than disk
Test: test_issue2223_compression_no_rename.py (6 tests, all passing)
Refs #2215 Fix A: replace plain dict _summary_cache with OrderedDict-based LRU capped at 16 entries to prevent unbounded memory growth from long-running update summary generations.
Add regression coverage for the bounded LRU behavior: cache hits refresh recency, a new entry at capacity evicts the least-recently used key, and cache size never exceeds the cap.
Refs #2215 Fix B: remove the mid-response stripping hazard without losing leading multi-line wrapper cleanup.
The pattern now strips only a leading 'the user is asking' wrapper line and preserves the visible answer that follows. Add regression coverage for both the leading-wrapper and mid-response prose cases.
perf(sessions): cache CLI session scans (starship-s)
Conflict resolution on api/routes.py:
(1) Master grew a new helper '_messages_include_tool_metadata()' that
pr-2149 doesn't have. Kept it (unrelated function — detects whether
returned messages contain tool metadata, used elsewhere).
(2) pr-2149 renames the CLI-metadata gate from '_needs_cli_session_metadata'
to '_session_requires_cli_metadata_lookup' AND broadens it to cover
legacy-imported sidecars with 'read_only=False' but persisted 'is_cli_session'
or session_source markers. The new gate is strictly more inclusive than
the master version — covers (a) is_cli_session, (b) read_only=True,
(c) session_source in {messaging, external_agent}, AND (d) source_tag,
raw_source, source, source_label, platform markers. All sessions that
previously took the slow path still do, plus a few more legacy shapes
that needed CLI metadata for correct display.
(3) Removed the obsolete '_needs_cli_session_metadata()' definition from
master (only consumer migrated to the new name).
29/29 tests pass across test_session_cli_scan_fast_path (new), claude_code
session import, session_index, and session_lineage_full_transcript.
Opus flagged that PR #2151's cancel-handler partial-dedup loop used a
substring check that was too broad: any short prior assistant reply
('OK', 'Here is the answer:') would dedup a longer new partial containing
it, silently dropping the partial and resurrecting the #893 data-loss bug.
Tightened to only dedup against actual prior _partial=True markers with
exact (whitespace-stripped) content match. Three new regression tests
added (short-non-partial-prefix-does-not-dedup, exact-partial-match-still-
dedups, same-content-non-partial-does-not-dedup).
10/10 partial-cancel tests pass after the fix. Also updated CHANGELOG with
the conflict-resolution notes for #2151 vs #2136 and the #2178 test-fix.
fix: clarify cancelled chat turn status (Jordan-SkyLF)
Conflict resolution on api/streaming.py:4549-4567 (the cancel-handler
ownership guard). Both this PR and the already-shipped PR #2136 add a
guard at the same site against stale stream writebacks, from different
angles:
- PR #2136 (HEAD): _stream_writeback_is_current(_cs, stream_id) — strictly
dominates by checking the active_stream_id token equality.
- PR #2151: 'worker won the race' check via (active_stream_id != stream_id
and not pending_user_message), with _emit_cancel_event = False to suppress
the terminal cancel event.
Resolution merges both: keep #2136's strictly-stronger condition for skip
detection, and adopt #2151's _emit_cancel_event = False semantic so the
cancel event isn't emitted in addition to skipping the writeback (when
client may have already received the successful done payload).
55/55 tests pass across cancelled-turn-status + stale-stream-writeback +
the four cancel/data-loss sibling test files.
Providers like Xiaomi MiMo, DeepSeek, and Kimi require reasoning_content
to be echoed back on every assistant message in multi-turn conversations
with tool calls. Omitting it causes HTTP 400: 'The reasoning_content in
the thinking mode must be passed back to the API.'
The WebUI's _sanitize_messages_for_api() strips all fields not in
_API_SAFE_MSG_KEYS before sending conversation history to the LLM API.
reasoning_content was not in this whitelist, so it was silently dropped.
The CLI path (run_agent.py) is unaffected because it has its own
_copy_reasoning_content_for_api() logic that operates on raw message
dicts without going through this filter. This is why the same session
works from CLI but fails from WebUI with HTTP 400.
The fix adds 'reasoning_content' to _API_SAFE_MSG_KEYS so the field
passes through sanitization intact.
HMAC length: create_session() now emits a full 64-char HMAC-SHA256 hex
digest instead of the truncated 32-char form. verify_session() accepts
both lengths during a transition window so existing sessions survive the
upgrade without a forced global logout. The legacy 32-char branch can be
removed once the default 30-day session TTL has elapsed.
Secure flag: introduce _is_secure_context(handler) to encapsulate the
env-var override and heuristic. Restores the getpeercert / X-Forwarded-Proto
heuristic that was present before this refactor, keeping the env-var
override (HERMES_WEBUI_SECURE) on top for proxy deployments that need
explicit control. The bare `return False` stub that the previous commit
left in place silently broke Secure-cookie delivery for all reverse-proxy
users who never set the env var.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Opus advisor flagged that PR #2171's credential prefilter only listed
specific DB scheme prefixes and form keys, letting OAuth callback URLs,
URL userinfo, signed-URL query params bypass the hard agent redactor.
Adding the generic '://' marker restores the WebUI-as-hard-safety-boundary
contract. Plain URLs without sensitive substrings still pass through
unchanged because the redactor itself only mutates sensitive substrings.
Regression-pinned with 5 new parametric cases in test_security_redaction.py
plus 1 negative-case companion. Verified test FAILS without the fix and
PASSES with it.
Concurrent failed logins raced on _login_attempts because no lock guarded
the dict. Add _LOGIN_ATTEMPTS_LOCK and wrap both _check_login_rate() and
_record_login_attempt() with it.
Extract _load_key() to de-duplicate key file I/O. Add _pbkdf2_key() that
loads .pbkdf2_key (separate from .signing_key) so PBKDF2 and HMAC signing
no longer share a key — key reuse across cryptographic primitives is unsafe.
Update _hash_password() to use _pbkdf2_key() as its default salt, with an
optional *salt* kwarg so verify_password() can try the legacy .signing_key
salt during transparent migration. When the old hash matches, save_settings()
re-hashes with _pbkdf2_key() and _invalidate_password_hash_cache() ensures
the next request sees the upgraded hash without a restart.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
fix(config): preserve nvidia/ prefix on NVIDIA NIM (closes#2177)
Self-built. nesquena APPROVED with extensive end-to-end trace including
cross-tool agent CLI verification and 12-shape behavioural harness.
Move the `_PORTAL_PROVIDERS` guard in `resolve_model_provider()` to run
BEFORE the `prefix == config_provider` strip branch. The guard was added
for NVIDIA (along with the Nous portal cases in #854 / #894) but was
placed after the strip, so it never fired when `config_provider == "nvidia"`
and the model id started with `nvidia/`.
For `model_id="nvidia/nemotron-3-super-120b-a12b"`,
`config_provider="nvidia"`:
- prefix = "nvidia", bare = "nemotron-3-super-120b-a12b"
- prefix == config_provider → True → strip branch returned bare name
- `_PORTAL_PROVIDERS` guard never reached
- bare "nemotron-3-super-120b-a12b" sent to NVIDIA NIM → HTTP 404
NIM requires the full namespaced path. The fix moves the portal guard
to run first, so all portal providers (Nous, OpenCode-Zen, OpenCode-Go,
NVIDIA NIM) always preserve the full `provider/model` id regardless of
whether the prefix happens to equal the provider name.
This also closes a latent symmetric bug for the Nous case if a
`nous/<model>` id ever existed in the catalog.
Test plan:
- New `tests/test_issue2177_nvidia_prefix_preservation.py` covers:
- nvidia/nemotron-... under nvidia (the reported case)
- cross-namespace qwen/ and meta/ under nvidia (regression pin)
- every static nvidia model in `_PROVIDER_MODELS` resolves to itself
- latent nous/<model> under nous (structural ordering pin)
- non-portal providers (anthropic) still strip — fix doesn't over-correct
- Existing portal-routing suites (test_nous_portal_routing.py,
test_issue895_894_nous_prefix.py) continue to pass.
- Full test suite: 5320 passed, 4 skipped, 3 xpassed.
Reported on Discord by @vishnu (Nathan forwarded as #2177).