mirror of
https://github.com/nesquena/hermes-webui.git
synced 2026-05-24 18:50:15 +00:00
58ad315dca
* fix(workspace): add .html/.htm to MIME_MAP so HTML preview renders correctly
MIME_MAP was missing entries for .html and .htm. The server fell back to
Content-Type: application/octet-stream, which browsers refuse to render as
HTML in an iframe — causing a blank white preview.
The rest of the pipeline was already correct: the iframe exists in
static/index.html, openFile() in static/workspace.js routes .html to
showPreview('html'), and _handle_file_raw() in api/routes.py sets the
correct CSP sandbox header when ?inline=1 is present. The only missing
piece was the MIME type.
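A minimal sketch of the lookup behaviour described above, assuming the map is a plain dict keyed by file extension. The map contents and the `guess_content_type` helper are illustrative names, not the project's actual api/config.py.

```python
import pathlib

# Hypothetical subset of an extension -> MIME type map.
MIME_MAP = {
    ".txt": "text/plain",
    ".html": "text/html",
    ".htm": "text/html",
}


def guess_content_type(filename: str) -> str:
    """Return the mapped MIME type, or the generic binary fallback."""
    ext = pathlib.Path(filename).suffix.lower()
    return MIME_MAP.get(ext, "application/octet-stream")
```

Without the two HTML entries, any .html file would hit the fallback branch, which is the Content-Type that left the preview iframe blank.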
* test(workspace): lock in MIME_MAP entry for .html/.htm
PR #1070 added .html/.htm → text/html to MIME_MAP in api/config.py
to fix the blank workspace HTML preview iframe. Without a direct
assertion on the MIME_MAP entries, the fix could silently regress
(the existing test_779_html_preview.py tests cover the iframe wiring,
the inline=1 query handling, and the CSP sandbox header — but none of
them touch MIME_MAP itself).
Add a single regression test that asserts MIME_MAP['.html'] and
MIME_MAP['.htm'] are both 'text/html' so any future removal of those
entries fails CI immediately.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(composer): raise .approval-card.visible z-index above .queue-card
.queue-card has z-index:2. .approval-card.visible had no z-index, so the
queue flyout would render on top of the approval card when both were visible
simultaneously — obscuring the Allow/Deny buttons.
Fix: add z-index:3 to .approval-card.visible so approvals always render
above the queue flyout. Approval is a blocking, security-relevant interaction
and must never be obscured by passive UI elements.
* test(composer): pin approval-card z-index > queue-card invariant
PR #1071 raises .approval-card.visible to z-index:3 so the security-
relevant Allow / Deny buttons stay clickable when the queue flyout is
also open. Without a regression test, a future CSS edit could silently
drop the z-index back below queue-card (z-index:2) and reintroduce the
bug — there is no automated UI test covering this stacking interaction.
Add a focused regex check that pins the invariant:
.approval-card.visible z-index must be strictly greater than
.queue-card z-index.
Modeled on the existing CSS-regex regression style in
tests/test_mobile_layout.py (test_profile_dropdown_not_clipped_by_overflow).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
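A regex check of this kind could look roughly like the sketch below. The CSS excerpt and the `z_index_of` helper are made up for illustration; the real test reads the project's stylesheet and may be structured differently.

```python
import re

# Hypothetical CSS excerpt; the real rules live in the project's stylesheet.
CSS = """
.queue-card{position:absolute;z-index:2;}
.approval-card.visible{display:block;z-index:3;}
"""


def z_index_of(css: str, selector: str) -> int:
    """Extract the z-index declared in the first rule for `selector`."""
    rule = re.search(re.escape(selector) + r"\s*\{([^}]*)\}", css)
    assert rule, f"{selector} rule not found"
    z = re.search(r"z-index\s*:\s*(-?\d+)", rule.group(1))
    assert z, f"{selector} has no z-index"
    return int(z.group(1))
```

Comparing the two extracted integers with a strict `>` pins the stacking order without hard-coding either value.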
* fix: intercept /steer /interrupt /queue before busy-mode routing in send()
Root cause: slash commands entered while the agent is busy never reached
the command dispatcher. send() enters the busy block and returns early at
line ~50, so the slash-command intercept (~line 56) is never reached.
The text was queued as a plain message. When it drained after the turn
ended, cmdSteer / cmdInterrupt ran on an idle session, saw no active stream,
and showed "No active task to stop."
Fix: at the top of the busy block, before checking busyMode, check if the
text starts with / and is one of the three control commands. If so, dispatch
the handler immediately and return. This lets the user type /steer, /interrupt,
or /queue at any time — including while the agent is mid-stream — and have
them execute against the live session.
Two new regression tests added:
- test_slash_commands_intercepted_before_busymode_routing: verifies the
intercept appears before the busyMode routing in the busy block
- test_steer_intercept_calls_handler_directly: verifies the intercept calls
_bc.fn(_pc.args) and returns, not queues
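The ordering fix can be sketched in Python as follows. This is a simplified model, not the real send() in the WebUI: the `handlers` dict, the `queue` list, and the return strings are assumptions, and busy-mode routing is reduced to plain queueing.

```python
# Hypothetical stand-ins for the WebUI's command handlers and message queue.
CONTROL_COMMANDS = {"steer", "interrupt", "queue"}


def send(text: str, busy: bool, handlers: dict, queue: list) -> str:
    """Route input; control commands bypass busy-mode queueing."""
    if busy:
        # Intercept first: control commands must act on the live session.
        if text.startswith("/"):
            name, _, args = text[1:].partition(" ")
            if name in CONTROL_COMMANDS:
                handlers[name](args)
                return "dispatched"
        # Everything else follows busy-mode routing (modelled here as a queue).
        queue.append(text)
        return "queued"
    return "sent"
```

The key point is that the slash check sits above the queueing branch, so a control command typed mid-stream never waits for the turn to end.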
* test(busy-intercept): pin sync input-clear before await in slash intercept
PR #1072's intercept clears the msg input before awaiting the handler.
Order matters: if the await happens first (or if the clear is moved
inside the handler), the input still shows '/steer foo' for the duration
of the await. A reflexive second Enter press during that window — common
while waiting for the toast — re-runs send(): either re-fires the
handler (double-steer) or, if the turn just ended, falls through to the
non-busy slash dispatcher and drops a confusing "No active task to stop."
Add test_steer_intercept_clears_input_before_await pinning the order so
this UX invariant cannot silently regress.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix: update steer i18n and settings copy — steer no longer interrupts
With the real /steer implementation (agent.steer() via /api/chat/steer),
steer injects a correction mid-turn WITHOUT interrupting the current stream.
The previous copy said "falls back to interrupt", "Steer (interrupt + send)",
etc. — accurate only for the old placeholder, not the real implementation.
Changes across all 6 locales (en/ru/es/de/zh/zh-Hant):
cmd_steer: "falls back to interrupt" removed
settings_busy_input_mode_steer: "interrupt + send" → "mid-turn correction"
cmd_steer_fallback: "interrupted" → "queued for next turn"
busy_steer_fallback: "interrupted instead" → "queued for next turn"
settings_desc_busy_input_mode: "currently falls back to interrupt" removed
Also:
static/index.html: inline fallback text updated to match
static/commands.js: internal comment clarified (fallback = queue+cancel,
not "interrupt mode" which implies the primary action)
* fix(renderer): group consecutive blockquote lines into single element
Root cause: the old rule `s.replace(/^> (.+)$/gm, ...)` had three bugs:
1. `.+` required at least one character — bare `>` lines (blank
continuation lines) did not match and passed through as literal `>`
2. Each matching line became its own `<blockquote>` element — a 10-line
blockquote produced 10 stacked `<blockquote>` tags with no grouping
3. When a fenced code block sat inside a blockquote, the fence-stash
pass consumed the code content and left orphaned `>` lines that the
old `.+` pattern could not match
Fix: replace the single-line regex with a group-based approach that matches
one or more consecutive `>` lines as a single block, strips the `>` prefix
from each line, passes each non-empty line through inlineMd(), turns blank
`>` lines into `<br>`, and wraps the entire group in one `<blockquote>`.
14 regression tests added covering:
- Single-line blockquotes (regression)
- Multi-line grouping (2 and 10 lines)
- Two separate blockquotes staying separate
- Bare `>` and `>text` (no space) edge cases
- Blank continuation lines → <br>
- Bold / italic / inline-code inside blockquotes
- Blockquote followed by normal paragraph
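The group-based approach can be sketched in Python like this. It is an illustration written for this note, not the project's JS rule or its Python test mirror: `inline_md` is a stand-in that only handles bold, and the grouping regex here deliberately stops before the trailing newline.

```python
import html
import re


def inline_md(text: str) -> str:
    """Stand-in for inlineMd(): escape HTML, then render **bold** only."""
    return re.sub(r"\*\*(.+?)\*\*", r"<strong>\1</strong>", html.escape(text))


def group_blockquotes(src: str) -> str:
    """Wrap each run of consecutive '>' lines in a single <blockquote>."""

    def render(match):
        body = []
        for line in match.group(0).split("\n"):
            # Strip the '>' marker plus one optional space or tab.
            stripped = re.sub(r"^>[ \t]?", "", line)
            body.append(inline_md(stripped) if stripped.strip() else "<br>")
        return "<blockquote>" + "\n".join(body) + "</blockquote>"

    # One match per run of lines that each start with '>'.
    return re.sub(r"^>[^\n]*(?:\n>[^\n]*)*", render, src, flags=re.MULTILINE)
```

Because the whole run is one match, a 10-line quote yields one element, and a bare `>` line inside the run becomes `<br>` instead of leaking through as literal text.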
* fix(renderer): drop empty trailing line from blockquote match
The new group-based blockquote rule introduced in this PR captures the
trailing newline in its (?:\n|$) clause. After block.split('\n') that
trailing newline produces an empty final element. The original filter
only dropped lone bare '>' artifacts on the last line, so the empty
final element survived, and the .map(blank → '<br>') step turned it
into a phantom <br> immediately before </blockquote>.
Visible symptom: any blockquote whose source ends with \n (the common
case — a quote followed by another paragraph or end-of-message) renders
with an extra blank line at the bottom of the quote.
Reproducer:
'> Hello\n\nThe rest of the message.'
→ '<blockquote>Hello\n<br></blockquote>\nThe rest of the message.'
(the <br> just before </blockquote> is the phantom one)
Fix: replace the single-line filter with a while-loop that pops trailing
lines while they are either empty OR a bare '>'. This matches the
intent the Python test mirror in tests/test_blockquote_rendering.py
already had (the mirror was correct; the JS was not — that's why
the original tests passed despite the bug).
Also add four new regression tests in TestNoPhantomTrailingBr that pin
the no-trailing-<br> invariant for the common shapes:
- input ending with \n
- quote followed by paragraph (the real-world case)
- multi-line quote ending with \n
- quote with blank continuation + trailing \n (internal <br> stays,
trailing <br> does not)
Verified end-to-end with node against the actual JS regex.
244 renderer-adjacent tests pass.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
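The difference between the two trailing-line strategies, sketched in Python. Both helper names are hypothetical and the "old" variant is a reading of the description above, not a copy of the original JS filter.

```python
from __future__ import annotations


def drop_trailing_single(lines: list[str]) -> list[str]:
    """Old-style filter: drops only a lone bare '>' on the final line."""
    if lines and lines[-1].strip() == ">":
        return lines[:-1]
    return lines


def drop_trailing_loop(lines: list[str]) -> list[str]:
    """Pops trailing lines while they are empty or a bare '>'."""
    out = list(lines)
    while out and out[-1].strip() in ("", ">"):
        out.pop()
    return out
```

Splitting `'> Hello\n'` on newlines leaves an empty final element; only the loop variant removes it, so only the loop variant avoids the phantom `<br>`.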
* feat(renderer): comprehensive markdown fixes — strikethrough, task lists, CRLF, nested blockquotes
Five additional fixes on top of the blockquote grouping from the initial commit:
1. CRLF normalisation: strip \r\n → \n at start of renderMd so Windows
line endings do not produce stray \r characters in rendered output
2. Strikethrough: ~~text~~ → <del>text</del> in both inlineMd() (for use
inside blockquotes/lists) and the outer pass (for plain paragraphs).
Added <del> to SAFE_TAGS and SAFE_INLINE so it is not HTML-escaped.
3. Task lists: - [x] / - [ ] items in unordered lists render as ✅/☐
via task-done/task-todo span wrappers. Checks [X] (uppercase) too.
4. Nested blockquotes: >> / >>> etc. now recurse so each level gets its
own <blockquote> element rather than passing through as literal >.
Implemented by extracting the blockquote rule into _applyBlockquotes()
which calls itself recursively on the stripped inner content.
5. Lists inside blockquotes: > - item now renders <ul><li> inside the
blockquote instead of a literal "- item" string. Task list items work
inside blockquotes too (> - [x] done → ✅ inside <blockquote><ul>).
Also fixed test_issue342.py search window (5000→10000 chars) — the CRLF
strip at the top of renderMd pushed the autolink regex past the old limit.
68 new tests in test_renderer_comprehensive.py + test_blockquote_rendering.py
covering all constructs, edge cases, and combinations.
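Three of the fixes above are small enough to sketch in Python. These helpers are illustrative only: the exact span markup for task items is an assumption based on the task-done/task-todo class names mentioned in the list, and the real logic lives in the JS renderer.

```python
import re


def normalise_newlines(src: str) -> str:
    """Convert Windows line endings so no stray carriage returns survive."""
    return src.replace("\r\n", "\n")


def strikethrough(text: str) -> str:
    """Render ~~text~~ as <del>text</del>."""
    return re.sub(r"~~(.+?)~~", r"<del>\1</del>", text)


def task_item(item: str) -> str:
    """Render '[x] done' / '[ ] todo' list-item text with a marker span."""
    m = re.match(r"\[([ xX])\]\s+(.*)", item)
    if not m:
        return item  # ordinary list item, left untouched
    done = m.group(1) in "xX"  # accepts lowercase and uppercase X
    cls, mark = ("task-done", "✅") if done else ("task-todo", "☐")
    return f'<span class="{cls}">{mark}</span> {m.group(2)}'
```

Running the newline normalisation first means every later regex can assume `\n`-only input.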
* fix(renderer): restore space in blockquote prefix-strip regex
Commit 04e7b53 changed the blockquote prefix-strip regex from
/^>[ \t]?/ (consume "> ", ">\t", or just ">")
to
/^>[\t]?/ (only consume ">\t" or just ">")
The space character was dropped from the character class. Since
practically every blockquote an LLM produces is "> " (greater-than
followed by a space), this leaves a leading space artifact on every
stripped blockquote line. Worse, the leading space breaks the
list-detection regex `^(?:  )?[-*+] ` inside the new `_applyBlockquotes`
helper — that regex requires either zero or two leading spaces, never
one — so the new "list inside blockquote" feature never fired for
the canonical input shape `> - item`.
Reproducer (against the actual ui.js via node, before the fix):
Input "> Hello world" rendered as:
  <blockquote> Hello world</blockquote>
  (phantom leading space after the opening tag)
Input "> Steps:" / "> - one" / "> - two" rendered as:
  <blockquote>Steps:
   - one
   - two</blockquote>
  (literal text, NOT a <ul>; lists-in-quote feature broken)
Input "> - [x] done" rendered as:
  a blockquote with literal "[x] done", no checkbox span
Tests passed despite the bug because tests/test_blockquote_rendering.py
and tests/test_renderer_comprehensive.py validate against a Python
mirror (`_apply_blockquotes`) whose strip regex is `^>[ \t]?` — i.e.
the mirror is correct, the JS is not, and the static-mirror tests
can't catch the divergence. Same shape of bug as commit 94d63d0
(phantom <br> in trailing line) where the mirror was right and the JS
was wrong.
Fix: restore the space character in the strip regex's character class.
Add tests/test_renderer_js_behaviour.py — 11 tests that drive the
ACTUAL renderMd via node and assert on rendered output for the most
common LLM shapes (single-line quote, multi-line quote, list inside
quote, task list inside quote, nested >>>, strikethrough inside and
outside quote, top-level task list, quote followed by heading,
multi-paragraph quote with list, CRLF normalisation).
Verified: the buggy regex makes 6 of those 11 tests fail; the corrected
regex makes all 11 pass.
Suite: 2354 passed, 0 new failures.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
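The effect of the dropped space can be reproduced with Python's `re` module, since these patterns behave the same way as in JS. The list pattern below is written with two spaces in the optional group, following the "zero or two leading spaces" description; the variable names are illustrative.

```python
import re

GOOD = re.compile(r"^>[ \t]?")   # consumes ">", "> " or ">\t"
BUGGY = re.compile(r"^>[\t]?")   # space missing from the character class
LIST_ITEM = re.compile(r"^(?:  )?[-*+] ")  # zero or two leading spaces

line = "> - item"
good = GOOD.sub("", line)    # marker and its space removed
buggy = BUGGY.sub("", line)  # marker removed, space left behind
```

With the buggy pattern the stripped line keeps one leading space, which is exactly the case the list pattern rejects.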
* Collapse agent session compression chains
* Restore upstream changelog entries
* fix(agent_sessions): bubble active compression chains to top by tip last_activity
The original PR merge kept the chain head's id/title/started_at and overrode
id/model/message_count/ended_at/end_reason from the tip — but did NOT override
last_activity. Since the projected list is sorted by last_activity DESC and
the WebUI sidebar surfaces updated_at = last_activity, an actively-used
compression chain whose tip is being edited NOW would sort by the ROOT's
old last_activity and fall below recently touched standalone sessions.
Reproducer (with the harness against actual code, before the fix):
- root: started 30 days ago, last msg 30 days ago
- tip: started 28 days ago (parent_session_id=root), last msg 5 seconds ago
- standalone: last msg 2 days ago
Sidebar order with original PR:
[0] standalone (48h ago)
[1] active_tip (last_activity=root's 720h ago) ← wrong
Sidebar order after fix:
[0] active_tip (last_activity=tip's 0h ago) ← correct
[1] standalone (48h ago)
This matches Hermes Agent's own list_sessions_rich projection at
hermes_state.py:903-909, which overrides "last_active" from the tip
exactly so that the agent CLI's session list orders the same way.
Add ``last_activity`` to the merge-from-tip key list, update the existing
test_compression_chain_collapses_to_latest_tip_in_sidebar assertion to
expect tip-derived updated_at, and add
test_compression_chain_bubbles_to_top_by_tip_activity locking in the
bubble-to-top invariant — without this regression test the previous
behaviour passed CI because no test exercised the sort order against a
mixed set of chains and standalone sessions.
The chain head's started_at (created_at) and title remain preserved, so
users can still find the conversation by its original date and name.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
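A simplified model of the projection and sort, assuming sessions are plain dicts with numeric timestamps. The `TIP_KEYS` tuple and both helpers are illustrative; the id handling of the real merge is left out here.

```python
from __future__ import annotations

# Keys copied from the chain tip onto the projected row (illustrative list).
TIP_KEYS = ("model", "message_count", "ended_at", "end_reason", "last_activity")


def collapse_chain(root: dict, tip: dict) -> dict:
    """Project a compression chain as one row: head identity, tip activity."""
    row = dict(root)  # keeps the head's title and started_at
    for key in TIP_KEYS:
        if key in tip:
            row[key] = tip[key]
    return row


def sidebar_order(rows: list[dict]) -> list[dict]:
    """Most recently active first, as in the sidebar."""
    return sorted(rows, key=lambda r: r["last_activity"], reverse=True)
```

Dropping "last_activity" from the key list reproduces the bug: the collapsed row would sort by the root's 30-day-old timestamp.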
* docs: v0.50.216 release notes and version bump
Compression chains, renderer fixes, HTML preview, approval z-index, /steer fix.
* chore: gitignore local-only review harness directory
Adds .local-review/ to .gitignore so renderer drivers, sample inputs,
fixture builders, and other reviewer scratch files do not accidentally
get committed. Nothing under that path is ever shared in the repo;
keeping the entry tracked makes the boundary explicit for any future
contributor who creates the directory locally.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* Keep reasoning chip visible for None effort
* test(reasoning): pin chip render output via node, not just source regex
The PR's static checks in test_reasoning_chip_btw_fixes.py validate the
shape of _applyReasoningChip (no display='none' literal, the right
classList.toggle call exists, the right label literals are in the
function body) but pass even if the runtime detail is wrong — for
example if `inactive` were inverted, _normalizeReasoningEffort
mishandled whitespace, or _formatReasoningEffortLabel returned the
wrong literal for an unknown input.
Add tests/test_reasoning_chip_js_behaviour.py — 11 tests that drive
the actual _applyReasoningChip() via node and assert on the rendered
DOM state for each effort value:
TestChipAlwaysVisible
- empty / null -> "Default" label, inactive=true
- "none" -> "None" label, inactive=true
- "low"/"high" -> verbatim label, inactive=false
TestNormalizationEdgeCases
- "NONE" -> normalises to "None"
- " none " -> trims and normalises
- unknown junk -> falls through visible, never hidden
TestTitleAttributeAccessibility
- title attribute carries the human-readable label for tooltip /
screen-reader use
Sanity-checked against master's pre-fix ui.js: 11/11 fail (bug caught).
Against this PR's ui.js: 11/11 pass.
This pattern (drive the actual JS via node) caught two regex-only
regressions in PR #1073 where the Python mirror was correct while the
JS was broken. Same protection added here so the chip-visibility
contract can't silently break in a future refactor.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
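The chip contract exercised by those tests can be modelled in a few lines of Python. This is a sketch of the described behaviour, not a port of ui.js: the function names and the returned dict shape are assumptions.

```python
def normalize_effort(value) -> str:
    """Trim and lowercase; None and blank input both become ''."""
    return (value or "").strip().lower()


def format_label(effort: str) -> str:
    """Human-readable label for a normalised effort value."""
    if not effort:
        return "Default"
    if effort == "none":
        return "None"
    return effort


def chip_state(value) -> dict:
    """Model of the rendered chip: always visible, muted when unset/none."""
    effort = normalize_effort(value)
    return {
        "visible": True,  # the chip is never hidden
        "label": format_label(effort),
        "inactive": effort in ("", "none"),
    }
```

Keeping visibility unconditional and expressing "off" through the inactive flag is what lets users still see and change the current level.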
* docs: add #1074 to v0.50.216 changelog, bump test count to 2428
* fix(i18n): restore broken Unicode in Russian and Spanish steer strings
Commit 56c7a14 (fix: update steer i18n and settings copy) accidentally
stripped the `\u` prefix from Unicode escape sequences in two locales,
producing garbled literal hex strings visible to users:
Spanish (es):
- cmd_steer: correcci00f3n → corrección
- cmd_steer_fallback: 2014 en cola → — en cola
- busy_steer_fallback: 2014 en cola → — en cola
- settings_desc_busy_input_mode: qu00e9, est00e1, correcci00f3n → qué, está, corrección
- settings_busy_input_mode_steer: correcci00f3n → corrección
Russian (ru):
- settings_desc_busy_input_mode: the entire Cyrillic string was
replaced with raw 4-hex-char code-points without the \u prefix
(041e043f... instead of actual Cyrillic). Decoded:
"Определяет поведение при отправке сообщения во время работы
агента. Очередь ждёт; Прерывание отменяет и начинает заново;
Steer внедряет коррекцию без прерывания."
Fix: write the correct characters directly (UTF-8 is the file encoding
so embedding them literally is cleaner than \u escapes for long text).
All other locales (en, de, zh, zh-Hant) were not affected — confirmed
by grepping for bare hex run-ons in the updated file.
Verified: node --check static/i18n.js passes; full pytest suite green
(2365 passed, 47 skipped).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
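How the stripped prefix produces the garbled strings, shown with Python string handling. The `decode_bare_hex` helper is hypothetical and only works on a run made up purely of 4-digit code points, as in the Russian case.

```python
# What the source file should contain: a literal backslash-u escape.
escaped = "correcci\\u00f3n"

# The accidental edit removed the prefix, leaving bare hex digits.
stripped = escaped.replace("\\u", "")

# Interpreting the intact escape yields the intended character.
decoded = escaped.encode("ascii").decode("unicode_escape")


def decode_bare_hex(run: str) -> str:
    """Decode a run of 4-hex-digit code points that lost their prefixes."""
    return "".join(chr(int(run[i:i + 4], 16)) for i in range(0, len(run), 4))
```

Mixed strings such as the Spanish ones cannot be repaired this mechanically, because bare hex digits are indistinguishable from ordinary text.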
* docs: remove duplicate compression chain entry from [Unreleased]
---------
Co-authored-by: nesquena-hermes <nesquena-hermes@users.noreply.github.com>
Co-authored-by: Nathan Esquenazi <nesquena@gmail.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: Frank Song <franksong2702@gmail.com>
356 lines
16 KiB
Python
"""Regression tests for PR #934 UI fixes.

Four invariants this file locks in place:

1. `#composerReasoningDropdown` lives OUTSIDE `.composer-left` (as a sibling of
   the other composer dropdowns), so it isn't clipped by that container's
   `overflow-y: hidden`. Regresses to invisible-dropdown if moved back.

2. The reasoning chip label uses an SVG icon (`stroke="currentColor"`) instead
   of the `🧠` emoji, matching every other composer chip.

3. `cmdReasoning()` calls `_applyReasoningChip(eff)` directly with the
   server-confirmed effort, not `syncReasoningChip()` which re-applies the
   stale cached value.

4. `attachBtwStream()` sets a `_streamDone` flag in `done`/`apperror` and
   gates `onerror`'s row removal on `!_streamDone` — otherwise the browser's
   post-`stream_end` error event wipes the just-rendered answer.
"""
from __future__ import annotations

import pathlib
import re


REPO = pathlib.Path(__file__).resolve().parent.parent
INDEX = (REPO / "static" / "index.html").read_text(encoding="utf-8")
UI_JS = (REPO / "static" / "ui.js").read_text(encoding="utf-8")
COMMANDS_JS = (REPO / "static" / "commands.js").read_text(encoding="utf-8")
MESSAGES_JS = (REPO / "static" / "messages.js").read_text(encoding="utf-8")
STYLE_CSS = (REPO / "static" / "style.css").read_text(encoding="utf-8")


# ── #1 dropdown escapes composer-left ─────────────────────────────────────────


class TestReasoningDropdownEscapesComposerLeft:
    """The dropdown must sit as a sibling of .composer-footer, not inside
    .composer-left which has overflow-y: hidden and clips absolute children."""

    def test_dropdown_lives_outside_composer_left(self):
        # Find the <div class="composer-left">...</div> block and confirm the
        # reasoning dropdown is NOT inside it.
        m = re.search(
            r'<div class="composer-left"[^>]*>(?P<body>[\s\S]*?)<div class="composer-footer-right"',
            INDEX,
        )
        # Some templates use different closing structures; fall back to a
        # coarser search that at least locates composer-left.
        if m:
            inner = m.group("body")
            assert 'id="composerReasoningDropdown"' not in inner, (
                "composerReasoningDropdown is still nested inside .composer-left — "
                "this is the exact bug #933 flagged: overflow-y: hidden clips "
                "upward-opening absolute dropdowns. Move it alongside "
                "#composerModelDropdown / #composerWsDropdown / #profileDropdown."
            )
        # Either way, check that the dropdown sits next to the other composer
        # dropdowns (reliable structural marker).
        assert '<div class="profile-dropdown" id="profileDropdown"></div>' in INDEX
        assert 'id="composerReasoningDropdown"' in INDEX

    def test_dropdown_is_sibling_of_other_composer_dropdowns(self):
        # The four composer-level dropdowns must appear contiguously — if one
        # of them is nested inside an overflow-hidden container, this would
        # typically split the group.
        positions = [
            ("profileDropdown", INDEX.find('id="profileDropdown"')),
            ("composerWsDropdown", INDEX.find('id="composerWsDropdown"')),
            ("composerReasoningDropdown", INDEX.find('id="composerReasoningDropdown"')),
            ("composerModelDropdown", INDEX.find('id="composerModelDropdown"')),
        ]
        for name, pos in positions:
            assert pos > -1, f"{name} not found in index.html"
        # They should all be in the same area of the document: within 2000 characters.
        window = [p for _, p in positions]
        assert max(window) - min(window) < 2000, (
            "composer dropdowns are no longer grouped — reasoning dropdown may "
            "have drifted back inside a nested container"
        )


# ── #2 monochrome SVG replaces emoji ──────────────────────────────────────────


class TestReasoningChipIcon:
    """The chip must render a currentColor SVG, not a 🧠 emoji, for cross-platform
    rendering consistency with the other composer chips."""

    def test_chip_button_contains_svg_with_currentColor(self):
        # Locate the chip button and confirm it contains a stroke="currentColor" SVG
        m = re.search(
            r'<button class="composer-reasoning-chip"[^>]*>([\s\S]*?)</button>',
            INDEX,
        )
        assert m, "composer-reasoning-chip button not found"
        btn_body = m.group(1)
        assert 'stroke="currentColor"' in btn_body, (
            "reasoning chip must use stroke='currentColor' SVG matching other chips"
        )
        assert '<svg' in btn_body, "reasoning chip must contain an <svg> icon"

    def test_apply_reasoning_chip_label_has_no_emoji(self):
        # Locate _applyReasoningChip and confirm the label assignment doesn't
        # concatenate a 🧠 emoji.
        m = re.search(
            r"function\s+_applyReasoningChip\b[\s\S]*?^\}",
            UI_JS,
            re.MULTILINE,
        )
        assert m, "_applyReasoningChip not found in ui.js"
        fn = m.group(0)
        assert "🧠" not in fn, (
            "_applyReasoningChip should not concatenate a 🧠 emoji into the label — "
            "the chip already has a monochrome SVG icon next to the label"
        )


# ── #1068 None/default reasoning chip stays visible ──────────────────────────


class TestReasoningChipNoneState:
    """Reasoning effort is a current setting like model selection. Setting it
    to None disables reasoning, but the chip must remain visible so users can
    see and change the current level."""

    def get_apply_reasoning_chip(self):
        m = re.search(
            r"function\s+_applyReasoningChip\b[\s\S]*?^}",
            UI_JS,
            re.MULTILINE,
        )
        assert m, "_applyReasoningChip not found in ui.js"
        return m.group(0)

    def test_none_and_default_do_not_hide_reasoning_chip(self):
        fn = self.get_apply_reasoning_chip()
        assert "wrap.style.display='';" in fn, (
            "_applyReasoningChip must show the reasoning chip even for empty/"
            "default or 'none' effort values"
        )
        assert "if(!eff" not in fn and "wrap.style.display='none'" not in fn, (
            "_applyReasoningChip must not use a truthy guard that hides the "
            "chip for the valid 'none' state"
        )
        assert "wrap.style.display='none'" not in fn, (
            "the None/default reasoning state should be visible, not hidden"
        )

    def test_none_and_default_have_visible_labels(self):
        assert "if(effort==='none') return 'None';" in UI_JS, (
            "the disabled reasoning state must render a visible 'None' label"
        )
        assert "if(!effort) return 'Default';" in UI_JS, (
            "the unset reasoning state must render a visible 'Default' label"
        )

    def test_none_and_default_are_visually_inactive_not_missing(self):
        fn = self.get_apply_reasoning_chip()
        assert "chip.classList.toggle('inactive',inactive)" in fn, (
            "None/default should be shown with an inactive visual treatment "
            "instead of removing the chip"
        )
        assert ".composer-reasoning-chip.inactive" in STYLE_CSS, (
            "the inactive chip state needs a CSS rule so the visible None/"
            "default state is intentionally muted"
        )


# ── #3 /reasoning immediately updates chip ────────────────────────────────────


class TestReasoningCommandUpdatesChip:
    """cmdReasoning must apply the SERVER-CONFIRMED effort, not the cached value."""

    def test_cmd_reasoning_calls_apply_not_sync(self):
        # Locate cmdReasoning and verify the success branch calls
        # _applyReasoningChip(eff) directly, not syncReasoningChip() which
        # would read stale _currentReasoningEffort.
        m = re.search(
            r"function\s+cmdReasoning\b[\s\S]*?(?=^function\s|\Z)",
            COMMANDS_JS,
            re.MULTILINE,
        )
        assert m, "cmdReasoning not found in commands.js"
        fn = m.group(0)
        assert "_applyReasoningChip(eff)" in fn, (
            "cmdReasoning must call _applyReasoningChip(eff) with the "
            "server-confirmed effort from the /api/reasoning POST response"
        )


# ── #4 /btw answer not wiped by onerror after clean close ─────────────────────


class TestBtwStreamDoneGuard:
    """attachBtwStream must guard onerror with a _streamDone flag so the
    browser's post-stream_end error event doesn't wipe the just-rendered row."""

    def get_attach_btw(self):
        m = re.search(
            r"function\s+attachBtwStream\b[\s\S]*?(?=^function\s|\Z)",
            MESSAGES_JS,
            re.MULTILINE,
        )
        assert m, "attachBtwStream not found in messages.js"
        return m.group(0)

    def test_stream_done_flag_declared(self):
        fn = self.get_attach_btw()
        assert "_streamDone" in fn, (
            "attachBtwStream must declare a _streamDone flag to distinguish "
            "clean server-closed streams from real errors"
        )

    def test_stream_done_set_in_done_handler(self):
        fn = self.get_attach_btw()
        # Inside the 'done' listener body, _streamDone must be set true.
        done_block_m = re.search(
            r"addEventListener\('done'[\s\S]*?(?=addEventListener\(')",
            fn,
        )
        assert done_block_m, "done handler not found in attachBtwStream"
        assert "_streamDone=true" in done_block_m.group(0) or \
               "_streamDone = true" in done_block_m.group(0), (
            "_streamDone must be set to true in the done handler so onerror "
            "knows the stream completed successfully"
        )

    def test_onerror_gated_on_stream_done(self):
        fn = self.get_attach_btw()
        # onerror must NOT unconditionally call btwRow.remove()
        m = re.search(r"src\.onerror\s*=\s*\(?\)?\s*=>\s*\{[^}]*\}", fn)
        assert m, "src.onerror assignment not found"
        handler = m.group(0)
        assert "_streamDone" in handler, (
            "src.onerror must check !_streamDone before removing the btw row — "
            "otherwise the browser's post-stream_end error fire wipes the "
            "answer that was just rendered by the done handler"
        )

    def test_ensure_btw_row_called_in_done(self):
        """The done handler must create the row even if no token events arrived
        (e.g., agent returned a non-streaming single-shot answer)."""
        fn = self.get_attach_btw()
        done_block_m = re.search(
            r"addEventListener\('done'[\s\S]*?(?=addEventListener\(')",
            fn,
        )
        assert done_block_m
        assert "_ensureBtwRow()" in done_block_m.group(0), (
            "done handler must call _ensureBtwRow() so the answer bubble exists "
            "even if no token events arrived before done"
        )

    def test_ensure_btw_row_gated_on_session_match(self):
        """Regression for PR #935: _ensureBtwRow reads $('msgInner') — the
        CURRENTLY-viewed session's container. If the user switched sessions
        during the /btw stream, creating a bubble here would put it in the
        wrong session. The done handler must guard on S.session.session_id
        matching the parent sid before creating a new bubble.
        """
        fn = self.get_attach_btw()
        done_block_m = re.search(
            r"addEventListener\('done'[\s\S]*?(?=addEventListener\(')",
            fn,
        )
        assert done_block_m
        block = done_block_m.group(0)
        # The _ensureBtwRow call must be guarded by a session-match check
        assert ("S.session" in block and "parentSid" in block), (
            "_ensureBtwRow() in the done handler must be gated on "
            "S.session.session_id === parentSid — otherwise a user who "
            "switched sessions during the /btw stream gets the answer "
            "bubble injected into the wrong session's container"
        )

    def test_stream_done_set_before_close_in_done(self):
        """Regression for PR #935 defensive ordering: _streamDone=true must be
        set BEFORE src.close() in the done handler. Setting it after works
        today because EventSource.close() is synchronous per spec and doesn't
        dispatch events, but the defensive-correct ordering is flag-first so
        no future browser quirk or event-queue race can bypass the guard.
        """
        fn = self.get_attach_btw()
        done_block_m = re.search(
            r"addEventListener\('done'[\s\S]*?(?=addEventListener\(')",
            fn,
        )
        assert done_block_m
        block = done_block_m.group(0)
        flag_pos = block.find("_streamDone=true")
        close_pos = block.find("src.close()")
        assert flag_pos > -1 and close_pos > -1
        assert flag_pos < close_pos, (
            "_streamDone=true must be set BEFORE src.close() in the done handler "
            "so any event the browser fires during close() sees the flag already set"
        )

    def test_stream_done_set_before_close_in_apperror(self):
        """Same defensive ordering as test_stream_done_set_before_close_in_done,
        applied to the apperror handler."""
        fn = self.get_attach_btw()
        apperror_m = re.search(
            r"addEventListener\('apperror'[\s\S]*?(?=addEventListener\(')",
            fn,
        )
        assert apperror_m
        block = apperror_m.group(0)
        flag_pos = block.find("_streamDone=true")
        close_pos = block.find("src.close()")
        assert flag_pos > -1 and close_pos > -1, (
            "apperror handler must both set _streamDone=true and call src.close()"
        )
        assert flag_pos < close_pos, (
            "_streamDone=true must be set BEFORE src.close() in the apperror handler"
        )

    def test_stream_end_sets_stream_done(self):
        """Regression for PR #935/#939/#942: stream_end handler must also set
        _streamDone=true. Today the ephemeral /btw path returns before emitting
        stream_end so this is moot, but if the server emits stream_end as a
        standalone terminator (e.g. for a non-ephemeral /btw variant), the
        subsequent browser-fired onerror would wipe the bubble without the flag.
        """
        fn = self.get_attach_btw()
        m = re.search(r"addEventListener\('stream_end'[\s\S]*?\}\s*\)", fn)
        assert m, "stream_end handler not found"
        block = m.group(0)
        assert "_streamDone=true" in block or "_streamDone = true" in block, (
            "stream_end handler must set _streamDone=true before closing the "
            "connection — consistent with the done/apperror handlers"
        )


# ── #5 resize handler symmetry (non-blocking polish) ─────────────────────────


class TestResizeHandlerSymmetry:
    """When the window resizes while either the model OR reasoning dropdown is
    open, the dropdown must be re-positioned so it stays aligned under its chip."""

    def test_resize_repositions_reasoning_dropdown(self):
        # The global resize handler must handle both composerModelDropdown AND
        # composerReasoningDropdown to keep them aligned when the window resizes.
        m = re.search(
            r"window\.addEventListener\(\s*['\"]resize['\"][\s\S]*?\}\s*\)\s*;",
            UI_JS,
        )
        assert m, "window resize handler not found in ui.js"
        handler = m.group(0)
        assert "composerReasoningDropdown" in handler, (
            "window resize handler must also re-position composerReasoningDropdown "
            "while it's open (symmetric with the existing model-dropdown branch)"
        )