_build_native_multimodal_message() unconditionally embedded images as
native image_url parts, bypassing the agent's image_input_mode config.
Add _resolve_image_input_mode(cfg) helper mirroring the agent's
decide_image_input_mode logic, and wire it into
_build_native_multimodal_message with a new cfg parameter.
When mode resolves to 'text' (explicit aux vision config, or
image_input_mode: text), returns plain string so the agent's
existing text-mode pipeline (vision_analyze) handles images.
Closes#1959
The two get_model_context_length() fallback callsites in api/streaming.py
(session save + SSE usage payload) were calling the resolver with only
model + base_url. When the agent's compressor reports 0 (fresh/cached/
transitioning agent), resolution fell through to the 256K DEFAULT_FALLBACK
even when users had set model.context_length: 1048576 in config.yaml.
For LCM users on 1M-context models, the wrong window cascaded into a
session-killing failure: auto-compression triggered at ~25% of the wrong
value, floods of compress requests, 429s, credential pool exhaustion,
fallback 429s, then 'API call failed after 3 retries'.
Reported by @AvidFuturist on Discord with deepseek-v4-flash. Reproduced 5x.
Both callsites now pass config_context_length, provider, and
custom_providers. The resolver consults these BEFORE probing, so the
config override wins. Both are wrapped in except TypeError blocks that
retry with the legacy 2-arg form for older hermes-agent builds whose
get_model_context_length signature pre-dates these kwargs.
Tests: 7 source-string regressions guarding both call shapes, the safe
config parse, the legacy fallback, and the per-profile config source.
Also bumped the line-distance assertion in test_pr1341 (the test
explicitly invites bumping when a new pre-save mutation block is added).
Closes#1896
Co-authored-by: Hermes Agent <agent@hermes.local>
Same-session profile switches reused cached AIAgent from previous profile,
silently leaking the old persona's SOUL.md / system prompt into the new
profile's turns. session_id stays stable across profile switches, and the
signature didn't include the active profile home, so every signature input
matched and the stale agent was returned from SESSION_AGENT_CACHE.
Append _profile_home to the signature blob so profile switches force a
cache miss and a fresh agent build under the new HERMES_HOME (which
triggers a fresh load_soul_md() call).
Tests: 3 source-string regressions guarding the signature contract,
ordering, and empty-home fallback.
Closes#1897
Co-authored-by: Hermes Agent <agent@hermes.local>
Read agent.max_turns when constructing streaming WebUI AIAgent instances, pass it as max_iterations when supported, and include it in the per-session agent cache signature so budget changes take effect.
Add regression coverage for the config read, constructor kwarg, and cache key.
Closes#1695.
@Patrick-81 reported the bare "AIAgent not available -- check that
hermes-agent is on sys.path" error on a symlinked install (~/Programmes/hermes-agent
linked to ~/hermes-agent). The maintainer's response — three diagnostic
commands plus `pip install -e .` in the agent dir — fixed it for them.
This PR captures both halves of that learning so the next user with the
same shape doesn't have to file a new issue:
1. **Error message diagnostic block.** New helper
`_aiagent_import_error_detail()` in api/streaming.py builds a multi-line
diagnostic when the import fails, including:
- the running Python interpreter
- HERMES_WEBUI_AGENT_DIR (set value, or "(not set)")
- sys.path entries that mention hermes/agent (or "no entries mention..."
— itself a strong diagnostic signal)
- the most-common fix (`pip install -e .` in the agent dir)
- a pointer to docs/troubleshooting.md
The original error message string is preserved as the FIRST line so
existing log scrapers and docs-search keep matching.
Helper is kept as a separate function so it stays out of the hot path
until we actually need to raise — building it on every successful import
would be wasted work.
2. **New docs/troubleshooting.md.** Symptom → Why → Diagnostic commands →
Fix → When-to-file-a-bug template, with one entry to start: the
"AIAgent not available" flow Patrick-81 walked through. Future
recurring failure modes follow the same template. Required a one-line
addition to .gitignore — docs/* is gitignored with an allowlist, and
the new file needed `!docs/troubleshooting.md` to be tracked.
3. **README link.** docs/troubleshooting.md added to the `## Docs` section
so users know where to look first.
13 regression tests in tests/test_1695_aiagent_import_error_detail.py:
9 for the helper output shape (preserves original message line, includes
running python, shows HERMES_WEBUI_AGENT_DIR set/unset both ways, includes
pip-install-e hint, points at troubleshooting doc, lists relevant sys.path
entries when present, says "no entries..." when absent, output is multi-line)
plus 4 for the docs-presence regression (file exists, has the AIAgent
section, includes pip install -e ., describes the diagnostic chain with
readlink + agent/__init__.py verification).
190 streaming/aiagent tests pass after the change. ast.parse on
api/streaming.py clean.
CI failure on prior push was due to the docs/* gitignore swallowing the
new troubleshooting.md file silently — this commit adds the allowlist
entry so the file is tracked.
Switch the per-turn duration fallback from `is not None` to a truthy check so
None, missing-attr, and an explicit 0 all uniformly fall back to time.time().
Without this, a 0 timestamp (e.g. via a buggy migration or manual file edit)
would yield `time.time() - 0` ≈ wall-clock-since-epoch, displaying nonsense
like 'Done in 56 years 4 months ...'. In practice pending_started_at is always
set via int(time.time()) so this is a hardening fix, not a live-bug fix.
Also drop the brittle source-string assertion in the regression test that
pinned the literal expression. The behavioural test
test_done_handler_persists_duration_on_last_assistant_message already proves
the duration field is set; pinning the source line broke twice during the
v0.50.290 release pipeline alone (Opus tightening + maintainer revert).
Fixes#1595
Signed-off-by: Sanjay Santhanam <51058514+Sanjays2402@users.noreply.github.com>
Spliced from #1531 by @Asunfly: take Change-1 only (the actual bug fix +
cache signature inclusion) and skip Change-2 (auxiliary title-route
extra_body change) which is a separate scope concern.
## What
Two surgical fixes in api/streaming.py:
1. Line 1820 — `_cfg.cfg.get(...)` → `_cfg.get(...)`. `get_config()` returns
a plain dict (not a wrapper exposing `.cfg`). The buggy line raised
AttributeError that the surrounding try/except swallowed, so
`_reasoning_config` was always None regardless of what `/reasoning
<level>` had been set to. Verified locally — `api/streaming.py:1959`
already correctly used `_cfg.get(...)` in the same function, so the
same `_cfg` was being read two different ways in one file.
2. Line 1888 — added `_reasoning_config or {}` to `_sig_blob`. Without
this, switching effort mid-session would fail to take effect because
the per-session agent cache key would still match the old entry.
Mirrors how `resolved_provider` / `resolved_base_url` already
participate in the signature.
## Why splice instead of merge #1531 directly
@Asunfly force-pushed a Change-2 onto #1531 after the original review
that removes `extra_body={"reasoning": {"enabled": False}}` from
`generate_title_raw_via_aux` (the auxiliary title-generation route).
That intent is reasonable (let operator-configured `extra_body.reasoning`
flow through to the title route) but it touches a different surface and
deserves its own PR.
The narrow concern is operators who selected a reasoning-capable
auxiliary title model without explicitly setting
`reasoning.enabled=False` in the task config — pre-Change-2 the WebUI
defended against accidental reasoning on the title hot path; post-Change-2
those configs would reason on every new conversation`s title, with cost
and latency implications.
## What is NOT in this PR
- The `generate_title_raw_via_aux` extra_body refactor (Change-2 from #1531).
- The `test_does_not_override_configured_reasoning_extra_body` test (guards
Change-2). Asunfly can re-open that as its own focused PR.
## Tests
Two new R17b/R17c regression assertions in tests/test_regressions.py:
- `test_streaming_reads_reasoning_effort_from_config_dict` — static-source
guard: `_cfg.cfg` must not return to streaming.py
- `test_streaming_agent_cache_signature_includes_reasoning_config` —
catches removal of `_reasoning_config` from `_sig_blob`
## Closes
- Closes#1531 (the Change-1 portion ships here; Asunfly can re-open
Change-2 as a separate PR if desired)
Co-authored-by: Asunfly <[email protected]>
Read configured max_tokens from config.yaml, pass it into WebUI-created AIAgent instances when supported, and include it in the agent cache signature. Also classify OpenRouter quota phrasing such as more credits, can only afford, and fewer max_tokens.
Adds regression coverage for max_tokens propagation, cache signature isolation, and quota error classification.
PR #1421 (SessionDB WAL handle leak fix on cached-agent reuse path) had a
sibling leak at the LRU eviction site that I caught during pre-review:
api/streaming.py SESSION_AGENT_CACHE.popitem(last=False) was discarding
the evicted entry with `evicted_sid, _ = ...`. The agent's _session_db
was dropped on the floor and only released when GC eventually finalized
the agent — which on a long-running server may be never (cyclic refs,
extension types holding C handles, etc.).
Same fix shape as #1421: capture the evicted entry, call
_evicted_agent._session_db.close() explicitly. SessionDB.close() is
idempotent + thread-safe (with self._lock: if self._conn:), so the
double-close-is-benign property still holds.
5 regression tests in test_v050259_sessiondb_fd_leak.py:
- Source-level: cached-agent reuse path closes before replace
- Source-level: LRU eviction path captures + closes evicted agent
- Behavioral: SessionDB.close() is idempotent (3 calls safe)
- Behavioral: cached-agent reuse with mock — close called exactly once
- Behavioral: LRU eviction with mock — only evicted agent's DB closes
Full suite: 3615 passed, 0 failed.
Nathan explicitly authorized 'just go ahead and merge it as a small release'
since the PR is 9 LOC, focused, has Opus pre-release follow-up + tests, and
matches the empirically-confirmed leak shape (73-handle leak at EMFILE).
SessionDB WAL handles leak when streaming.py creates a new SessionDB
instance per request and replaces the cached agent's _session_db without
closing the old one. Each orphaned connection holds 2 FDs (.db +
.db-wal), causing FD exhaustion and EMFILE crashes after ~73 messages.
Fix: close the previous _session_db before replacing it on cached
agents, mirroring the close-before-replace pattern used elsewhere in the
codebase.
Opus pre-release advisor caught a 5th issue not covered by my initial
follow-up sweep, this one CRITICAL: PR #1402#493 per-session toolset
override silently no-op'd every time.
Bug: api/streaming.py:1755 called _session_meta.get('enabled_toolsets') on
the result of Session.load_metadata_only(). It returns a Session INSTANCE,
not a dict. .get() raised AttributeError, which the surrounding bare
except swallowed silently. The toolset chip in the UI saved correctly to
disk, but the streaming agent always ran with global toolsets.
Fix: use getattr(_session_meta, 'enabled_toolsets', None).
Two new regression tests:
- Source-level: forbid the .get() / [] dict-access shape.
- Runtime: Session.load_metadata_only must return a Session instance.
Full suite: 3604 passed, 0 failed.
Persist session model_provider separately from model IDs so active/default provider selections like gpt-5.5 remain bare while routing through OpenAI Codex. Keep @provider:model for picker disambiguation and runtime bridging, and preserve explicit OpenRouter plus custom/proxy base_url routing.
Opus pass-2 review of v0.50.251 caught a critical regression in PR
#1375:
The cancel-partial message stored captured tool calls under the
'tool_calls' key. That key is whitelisted by _API_SAFE_MSG_KEYS so
_sanitize_messages_for_api forwarded the entries to the next-turn
LLM call. But the captured entries use the WebUI internal shape
({name, args, done, duration, is_error}) — they don't have the
OpenAI/Anthropic id + function: {name, arguments} envelope. Strict
providers (OpenAI, Anthropic, Z.AI/GLM) would 400 on the malformed
entries. Net effect: the very cancel-then-continue scenario PR
#1375 aimed to improve becomes a hard fail.
Fix:
- Rename the persisted key to '_partial_tool_calls' (underscore-
prefixed private key NOT in _API_SAFE_MSG_KEYS, so sanitize
correctly strips it).
- Update static/messages.js hasMessageToolMetadata check to also
recognize _partial_tool_calls for UI rendering.
- Update test_issue1361_cancel_data_loss.py assertion to check
_partial_tool_calls (and tool_calls as legacy fallback).
Plus 2 NIT fixes from the same Opus review:
NIT 1 (api/profiles.py:153): re.match → re.fullmatch for consistency
with other _PROFILE_ID_RE callers in the codebase. The trailing-
newline footgun ($ matches before final \n in re.match) is now
closed. Without #1373's is_dir() guard, a name like 'valid\n' would
have created a directory named 'valid\n' on Linux. Doesn't escape
<HERMES_HOME>/profiles/ via Path joining, but unintended.
NIT 2 (test_issue798.py): R19j coverage gaps — added trailing-
newline tests, length-boundary tests (64-char valid, 65-char
rejected), single-char minimum, and non-ASCII / Unicode-trick tests.
New regression test (tests/test_pr1375_partial_tool_calls_sanitize.py):
- test_partial_tool_calls_field_not_forwarded_to_llm: pins that
sanitize-for-API strips _partial_tool_calls + reasoning + does
NOT have tool_calls on a partial message
- test_legitimate_tool_calls_are_preserved_for_completed_turns:
pins that real OpenAI-shape tool_calls on completed turns survive
sanitize unchanged
Tests: 3486 passing (3484 → 3486, +2 sanitize tests).
Three distinct data-loss paths fixed:
§A — Reasoning text was accumulated in a thread-local _reasoning_text
inside _run_agent_streaming. cancel_stream() never saw it because it
went out of scope when the thread was interrupted. Now mirrored to a
new shared dict STREAM_REASONING_TEXT keyed by stream_id, populated
in on_reasoning() and the reasoning branch of on_tool(), read in
cancel_stream().
§B — Live tool calls in thread-local _live_tool_calls were similarly
invisible to cancel_stream(). Now mirrored to STREAM_LIVE_TOOL_CALLS
on tool.started + tool.completed.
§C — Reasoning-only streams produced no partial message because the
thinking-block regex strip returned empty string and the `if _stripped:`
guard skipped the append. Now appends the partial message when EITHER
content text, reasoning trace, OR tool calls exist.
Mirrors the existing STREAM_PARTIAL_TEXT pattern from #893 exactly:
same dict creation in _run_agent_streaming, same _live_config fallback
in cancel_stream, same cleanup in _periodic_checkpoint.
8 regression tests in tests/test_issue1361_cancel_data_loss.py
covering all three sections plus tools+text combinations.
Co-authored-by: bergeouss <bergeouss@users.noreply.github.com>
- api/streaming.py SSE payload now falls back to agent.model_metadata.get_model_context_length when compressor doesn't supply context_length (mirrors the session-save fallback shipped in v0.50.247).
- api/streaming.py also falls back to s.last_prompt_tokens to avoid using the cumulative input_tokens counter.
- static/ui.js tracks rawPct separately from pct and shows '(context exceeded)' tooltip when rawPct > 100 instead of misleading '100% used (0% left)'.
- static/messages.js clears 'Uploading...' composer status after upload completes.
Co-authored-by: nesquena-hermes <nesquena-hermes@users.noreply.github.com>
* fix(streaming): fallback to model_metadata for context_length when compressor missing (#1318 follow-up)
PR #1318 (shipped in v0.50.246 via PR #1341 + commit a5c10d5) persisted
context_length on the session so the context-ring indicator survives
page reloads. But the writer only fired when agent.context_compressor
was present and reported a non-zero value. Fresh agents, interrupted
streams, or compressors without the attribute would still leave
s.context_length=0 — and the indicator would still show 0% on reload.
This follow-up adds a fallback that calls
agent.model_metadata.get_model_context_length(model, base_url) when the
compressor didn't populate the value. The function returns a sensible
static context window for any known model (with a 256K default for
unknown models). Wrapped in a broad try/except because older
hermes-agent builds may not expose the helper.
Sourced from PR #1344 (@jasonjcwu) — extracted into this focused
follow-up after #1344 was closed as superseded by #1341.
Adds 6 structural tests covering: import + call presence, falsy-gate,
agent.model/base_url passing, exception swallowing, save() ordering,
result assignment.
Closes the data-flow gap in #1318 for the compressor-missing case.
* test: relax pr1341 block-size assertion to accommodate the new fallback
---------
Co-authored-by: nesquena-hermes <nesquena-hermes@users.noreply.github.com>
Pre-release Opus + nesquena review on v0.50.246 caught that PR #1341
added the data-structure scaffolding (Session.__init__ accepts the 3
fields, save() persists them, compact() exposes them, GET /api/session
returns them) but did NOT add the writer that actually populates them.
Without a writer, the user-visible bug (context-ring shows 0% after
page reload) was NOT fixed by #1341 alone — the fields stayed None
forever because nothing wrote to s.context_length anywhere.
Adds the writer at api/streaming.py:2188 (post-merge per-turn save block,
before s.save()) so the values from agent.context_compressor land on
disk and survive page reloads.
Also moves the SSE usage payload comment to clarify that the live SSE
payload and the session-level persistence are now distinct paths
(payload below, persistence above).
Adds tests/test_pr1341_context_window_persistence.py — 6 structural +
round-trip tests covering Session __init__/save/compact, the routes
response, and the streaming.py writer placement.
Closes#1318 (the actual user-visible bug, not just the scaffolding).
Pre-release Opus review on v0.50.246 caught a SHOULD-FIX in PR #1338's
cancel_stream synthesis: the symmetric substring guard
(_pending_user in _last_content OR _last_content in _pending_user) was too
loose. Common confirmation replies ("ok", "yes", "go") in the prior turn
would match longer follow-up prompts ("ok please continue"), the synthesis
would be skipped, and the user's typed text would be lost — exactly the
data-loss bug #1298 was supposed to fix.
The fix: gate the substring check on a timestamp comparison. Only treat
the latest user turn as 'already merged by the streaming thread' if its
timestamp is at or after pending_started_at. Earlier turns whose content
happens to be a substring of the pending must not short-circuit synthesis.
Also drops the symmetric (_last_content in _pending_user) branch — that
direction was the false-positive vector. Keeps the equality and prefix
match (workspace-prefix tolerance from the streaming thread).
Adds tests/test_issue1298_cancel_and_activity.py::
test_cancel_synthesizes_when_prior_turn_content_is_substring_of_pending —
regression for the exact 'ok' → 'ok please continue' scenario.
From PR #1338. Already independently APPROVED by nesquena before being absorbed into v0.50.246.
CHANGELOG entries from this PR were dropped during squash (the v0.50.245 section is already
shipped); they will be re-added under [v0.50.246] in the release commit.
Co-authored-by: nesquena-hermes <nesquena-hermes@users.noreply.github.com>