Pre-release Opus review caught three correctness bugs in the original
PR #1350 SSE wiring beyond the snapshot/subscribe race:
A) **Notify-ordering race (MUST-FIX A):** _approval_sse_notify took _lock
only for the subscriber-list snapshot, then released it before
put_nowait. With two parallel submit_pending calls, T2's notify
could fire before T1's, leaving the UI showing pending_count=1 while
the server actually had 2 queued.
C) **Trailing approval lost (MUST-FIX C):** _handle_approval_respond
never called _approval_sse_notify after popping. With parallel
tool-call approvals (#527), a second approval queued behind the one
being responded to was invisible until the next event ever fired —
in practice, the agent thread parked on it would appear hung.
D) **Payload showed tail not head (MUST-FIX D):** payload built from
the just-appended entry instead of queue[0]. /api/approval/pending
returns the head; SSE returned the tail. Diverging contracts.
Fix:
- Split into _approval_sse_notify_locked (caller holds _lock, no
internal locking) and _approval_sse_notify (convenience wrapper).
- submit_pending: call _locked variant inside the queue-mutation lock,
passing queue_list[0] as head.
- _handle_approval_respond: call _locked variant inside the pop lock,
passing the new head (or None/0 if queue is empty).
- Restore fallback poll to 1500ms (was bumped to 3000ms; degraded-mode
parity with v0.50.247 is more important than save 1.5s of polling).
New regression tests in tests/test_pr1350_sse_notify_correctness.py:
- test_second_submit_pending_sends_head_not_tail (D)
- test_respond_to_first_pushes_second_as_new_head (C)
- test_respond_to_only_pending_pushes_empty_state (C edge)
- test_pending_count_is_monotonic_under_contention (A)
Updated test_approval_sse.py to pin the new contract:
- _approval_sse_notify_locked(session_key, head, total)
- 1500ms fallback interval
Total: 3411 tests passing.
Co-authored-by: jasonjcwu <jasonjcwu@users.noreply.github.com>
Changes _pending from a single overwriting dict value to a list,
so parallel tool calls each get their own approval slot.
api/routes.py:
- Wraps submit_pending() to append to a list and assign a stable
approval_id (uuid4) to each entry.
- _handle_approval_pending() returns the first queued entry plus
pending_count so the UI can show '1 of N'.
- _handle_approval_respond() pops by approval_id (falls back to
oldest entry for backward-compat with old clients).
- Backward-compat: legacy single-dict values in _pending are
handled without crashing.
static/messages.js:
- respondApproval() sends approval_id in the POST body.
- showApprovalCard() accepts pendingCount, shows '1 of N pending'
counter when multiple approvals are queued.
- _approvalCurrentId tracks the approval_id of the displayed card.
- Poll loop passes pending_count to showApprovalCard.
static/index.html:
- Adds approvalCounter element for the '1 of N' display.
tests/test_approval_queue.py:
- 14 tests: static-analysis checks (Python + JS + HTML),
functional tests that inject two simultaneous approvals and
verify both are surfaced and independently resolvable.