Opus stage-360 review caught that the docstring at api/streaming.py:40-43
said 'around the entire agent run' which is no longer accurate after the
narrow-lock refactor. The lock is now held only briefly for the env-mutation
critical section; the agent runs outside the lock and the finally block
re-acquires to atomically restore env vars.
Docstring now points to both narrow-lock implementations as references:
- _run_agent_streaming at line ~2719 (the original pattern)
- profile_env_for_background_worker at api/profiles.py:715 (added stage-360)
#2299 introduced profile_env_for_background_worker() in api/profiles.py and
changed _ENV_LOCK from threading.Lock() to threading.RLock(). Both changes
were incorrect:
1. RLock masked rather than fixed the underlying deadlock. The QA
test_env_lock_is_non_reentrant test exists precisely to enforce
non-reentrance — RLock would let a single thread hold _ENV_LOCK across
nested critical sections, which hides bugs while still allowing
different-thread races.
2. The original context manager held _ENV_LOCK for the ENTIRE 'yield'
duration, meaning the lock was held for the full background worker's
runtime (title generation, compression, update summary — possibly
many seconds). That blocked ALL other sessions on _ENV_LOCK, which
the QA test_third_message_completes runtime test caught as a timeout
on the third sequential message.
Fix: mirror the narrow-lock pattern from _run_agent_streaming:
- Acquire _ENV_LOCK only for env mutation (set runtime_env + patch
skill modules)
- Release immediately, yield to worker (no lock held)
- Reacquire in finally to restore env + skill modules
Restored _ENV_LOCK back to threading.Lock(). All 20 QA tests now pass,
including test_third_message_completes (was timing out, now 35s).
Keep distinct generated summary categories, route update-summary generation through the configured auxiliary model first, disclose capped large-range summary input, and constrain long summary panels.
The FakeAgent in test_issue1857_usage_overwrite returned only 2 messages
(user + assistant) without the conversation history. The real agent always
returns the full history plus new messages. This mismatch caused the new
_has_new_assistant_reply helper (which checks only messages beyond the
pre-turn offset) to see len(result)==len(prev) and incorrectly flag the
turn as a silent failure.
Fix: prepend conversation_history to the FakeAgent's response so the
message list mirrors production behavior.
When a provider error (401/429/rate-limit) causes the agent to return
without producing a new assistant reply, the WebUI should emit an
apperror event so the user sees an inline error. However, the detection
logic scanned ALL messages in result['messages'] — which includes the
full conversation history. If any prior turn had an assistant response,
_assistant_added would be True and the apperror would be silently
skipped, leaving the user staring at a blank response.
Extract a helper _has_new_assistant_reply(all_messages, prev_count)
that only inspects messages beyond the pre-turn history offset. Apply
it to both the main detection path and the self-heal/retry path.
Tests: 15 new cases covering history masking, empty content, whitespace,
edge-case shrinks, and multi-assistant scenarios.