Keep distinct generated summary categories, route update-summary generation through the configured auxiliary model first, disclose capped large-range summary input, and constrain long summary panels.
The FakeAgent in test_issue1857_usage_overwrite returned only 2 messages
(user + assistant) without the conversation history. The real agent always
returns the full history plus new messages. This mismatch caused the new
_has_new_assistant_reply helper (which checks only messages beyond the
pre-turn offset) to see len(result)==len(prev) and incorrectly flag the
turn as a silent failure.
Fix: prepend conversation_history to the FakeAgent's response so the
message list mirrors production behavior.
When a provider error (401/429/rate-limit) causes the agent to return
without producing a new assistant reply, the WebUI should emit an
apperror event so the user sees an inline error. However, the detection
logic scanned ALL messages in result['messages'] — which includes the
full conversation history. If any prior turn had an assistant response,
_assistant_added would be True and the apperror would be silently
skipped, leaving the user staring at a blank response.
Extract a helper _has_new_assistant_reply(all_messages, prev_count)
that only inspects messages beyond the pre-turn history offset. Apply
it to both the main detection path and the self-heal/retry path.
Tests: 15 new cases covering history masking, empty content, whitespace,
edge-case shrinks, and multi-assistant scenarios.
Opus identified that PR #2227's preservation block had two related bugs in
the parent_session_id handling:
1. During preservation save: code did
_old_parent = s.parent_session_id
s.parent_session_id = None
s.save(touch_updated_at=False, skip_index=True)
s.parent_session_id = _old_parent
The save persisted parent=None to disk. The in-memory restoration didn't
reach the disk copy. Result: a /branch fork session that subsequently
compressed lost its 'Forked from X' badge on the preserved old snapshot.
2. Stamping the continuation: code did
if not s.parent_session_id:
s.parent_session_id = old_sid
The 'if not' guard skipped the stamp when the session already had a
parent_session_id from a prior fork. Result: fork-of-fork compression
broke lineage — the continuation jumped back to the original fork parent
instead of the just-preserved immediate predecessor snapshot.
Fix (matches Opus's recommendation):
- Remove the parent clearing during preservation save (preserve as-is)
- Drop the 'if not' guard; always stamp continuation to old_sid
This makes the lineage chain consistent: new → old → old.parent → ... root.
Traversal from the continuation always walks through the just-preserved
snapshot to get to its parent's parent, never jumping over the snapshot.
Two new regression tests pin both invariants:
- test_parent_session_id_stamped_unconditionally (no 'if not' guard)
- test_old_session_parent_preserved_during_archive_save (no parent=None)
Both pass against the fix. All 8 tests in the file pass.
The previous implementation renamed old_sid.json → new_sid.json during
context compression, destroying the only persistent copy of the full
conversation history. If the summarisation LLM call also failed, the
user was left with zero recoverable messages.
Fix:
- Remove the destructive old_path.rename(new_path) call
- Preserve old_sid.json as an immutable pre-compression archive
- Create new_sid.json as a fresh file via s.save()
- Set parent_session_id on the continuation session for lineage
- Save in-memory messages to old_sid.json if they're newer than disk
Test: test_issue2223_compression_no_rename.py (6 tests, all passing)
Refs #2215 Fix A: replace plain dict _summary_cache with OrderedDict-based LRU capped at 16 entries to prevent unbounded memory growth from long-running update summary generations.
Add regression coverage for the bounded LRU behavior: cache hits refresh recency, a new entry at capacity evicts the least-recently used key, and cache size never exceeds the cap.
Refs #2215 Fix B: remove the mid-response stripping hazard without losing leading multi-line wrapper cleanup.
The pattern now strips only a leading 'the user is asking' wrapper line and preserves the visible answer that follows. Add regression coverage for both the leading-wrapper and mid-response prose cases.
perf(sessions): cache CLI session scans (starship-s)
Conflict resolution on api/routes.py:
(1) Master grew a new helper '_messages_include_tool_metadata()' that
pr-2149 doesn't have. Kept it (unrelated function — detects whether
returned messages contain tool metadata, used elsewhere).
(2) pr-2149 renames the CLI-metadata gate from '_needs_cli_session_metadata'
to '_session_requires_cli_metadata_lookup' AND broadens it to cover
legacy-imported sidecars with 'read_only=False' but persisted 'is_cli_session'
or session_source markers. The new gate is strictly more inclusive than
the master version — covers (a) is_cli_session, (b) read_only=True,
(c) session_source in {messaging, external_agent}, AND (d) source_tag,
raw_source, source, source_label, platform markers. All sessions that
previously took the slow path still do, plus a few more legacy shapes
that needed CLI metadata for correct display.
(3) Removed the obsolete '_needs_cli_session_metadata()' definition from
master (only consumer migrated to the new name).
29/29 tests pass across test_session_cli_scan_fast_path (new), claude_code
session import, session_index, and session_lineage_full_transcript.
Opus flagged that PR #2151's cancel-handler partial-dedup loop used a
substring check that was too broad: any short prior assistant reply
('OK', 'Here is the answer:') would dedup a longer new partial containing
it, silently dropping the partial and resurrecting the #893 data-loss bug.
Tightened to only dedup against actual prior _partial=True markers with
exact (whitespace-stripped) content match. Three new regression tests
added (short-non-partial-prefix-does-not-dedup, exact-partial-match-still-
dedups, same-content-non-partial-does-not-dedup).
10/10 partial-cancel tests pass after the fix. Also updated CHANGELOG with
the conflict-resolution notes for #2151 vs #2136 and the #2178 test-fix.
fix: clarify cancelled chat turn status (Jordan-SkyLF)
Conflict resolution on api/streaming.py:4549-4567 (the cancel-handler
ownership guard). Both this PR and the already-shipped PR #2136 add a
guard at the same site against stale stream writebacks, from different
angles:
- PR #2136 (HEAD): _stream_writeback_is_current(_cs, stream_id) — strictly
dominates by checking the active_stream_id token equality.
- PR #2151: 'worker won the race' check via (active_stream_id != stream_id
and not pending_user_message), with _emit_cancel_event = False to suppress
the terminal cancel event.
Resolution merges both: keep #2136's strictly-stronger condition for skip
detection, and adopt #2151's _emit_cancel_event = False semantic so the
cancel event isn't emitted in addition to skipping the writeback (when
client may have already received the successful done payload).
55/55 tests pass across cancelled-turn-status + stale-stream-writeback +
the four cancel/data-loss sibling test files.
Providers like Xiaomi MiMo, DeepSeek, and Kimi require reasoning_content
to be echoed back on every assistant message in multi-turn conversations
with tool calls. Omitting it causes HTTP 400: 'The reasoning_content in
the thinking mode must be passed back to the API.'
The WebUI's _sanitize_messages_for_api() strips all fields not in
_API_SAFE_MSG_KEYS before sending conversation history to the LLM API.
reasoning_content was not in this whitelist, so it was silently dropped.
The CLI path (run_agent.py) is unaffected because it has its own
_copy_reasoning_content_for_api() logic that operates on raw message
dicts without going through this filter. This is why the same session
works from CLI but fails from WebUI with HTTP 400.
The fix adds 'reasoning_content' to _API_SAFE_MSG_KEYS so the field
passes through sanitization intact.