mirror of
https://github.com/NousResearch/hermes-agent.git
synced 2026-05-21 03:39:54 +00:00
4d7fc0f37cedeecb02a8bda05d2b6eb6987b7bbc
340 Commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
|
dd2d1ba5e6 |
refactor(reload-skills): queue note for next turn, drop cache invalidation + agent tool
Salvage-follow-up to @shannonsands's /reload-skills PR. Trims the feature to
match the design: user-initiated rescan, no prompt-cache reset, no new
schema surface, no phantom user turn, and the next-turn note carries each
added/removed skill's 60-char description (not just its name).
Changes vs the original PR:
* Drop the in-process skills prompt-cache clear in reload_skills(). Skills
are invoked at runtime via /skill-name, skills_list, or skill_view —
they don't need to live in the system prompt for the model to use them.
Keeping the cache intact preserves prefix caching across the reload so
/reload-skills pays no cache-reset cost. (MCP has to break the cache
because tool schemas must be known at conversation start; skills do not.)
* Drop the skills_reload agent tool and SKILLS_RELOAD_SCHEMA from
tools/skills_tool.py, plus the four skills_reload enumerations in
toolsets.py. No new schema surface — agents can already see a freshly-
installed skill via skill_view / skills_list the moment it's on disk.
* Replace the phantom 'role: user' turn injection with a one-shot queued
note. CLI uses self._pending_skills_reload_note (same pattern as
_pending_model_switch_note, prepended to the next API call and cleared).
Gateway uses self._pending_skills_reload_notes[session_key]. The note
is prepended to the NEXT real user message in this session, so message
alternation stays intact and nothing out-of-band is persisted to the
transcript.
* reload_skills() now returns added/removed as
[{'name': str, 'description': str}, ...] (description truncated to 60
chars — matches the curator / gateway adapter budget). The injected
next-turn note formats each entry as 'name — description' so the model
can actually reason about which new skills to call without running
skills_list first.
* Only emit the note when the diff is non-empty. On empty diff, print
'No new skills detected' and do nothing else.
* Tests rewritten to cover the queue semantics, the description payload,
and a regression guard that the prompt-cache snapshot is preserved.
|
||
|
|
7966560fb5 |
feat(skills): /reload-skills slash command + skills_reload agent tool
Adds a public reload path for the in-process skill caches so newly installed (or removed) skills become visible mid-session without a gateway restart. Mirrors the shape of /reload-mcp. Three surfaces: * /reload-skills slash command — CLI (cli.py) and gateway (gateway/run.py), with /reload_skills alias for Telegram autocomplete and an explicit Discord registration. * skills_reload agent tool (tools/skills_tool.py) — lets agents/subagents pick up freshly-installed skills via tool call. * agent.skill_commands.reload_skills() — shared helper that clears _skill_commands, _SKILLS_PROMPT_CACHE (in-process LRU), and the on-disk .skills_prompt_snapshot.json, then returns an added/removed diff plus the new total count. Tested: * tests/agent/test_skill_commands_reload.py (9 cases) * tests/cli/test_cli_reload_skills.py (3 cases) * tests/gateway/test_reload_skills_command.py (4 cases) Use case: NemoClaw / OpenShell-style sandboxed orchestrators that drop skills into ~/.hermes/skills mid-session, plus agentic flows where the agent itself installs a skill via the shell tool and needs it bound without a gateway restart. The Python helper clear_skills_system_prompt_cache(clear_snapshot=True) already exists internally — this PR just exposes it via slash command and tool. |
||
|
|
31f70d1f2a |
fix(ci): recover 38 failing tests on main (#17642)
CI Tests workflow has been red on main for 40+ consecutive runs. This commit recovers every failure visible in run 25130722163 (most recent completed run prior to this PR). Root causes, by group: Test-mock drift after product landed (fix: update mocks) - test_mcp_structured_content / test_mcp_dynamic_discovery (6 tests): product added _rpc_lock (#02ae15222) and _schedule_tools_refresh (#1350d12b0) without updating sibling test files. Install a real asyncio.Lock inside the fake run-loop and patch at _schedule_tools_refresh. - test_session.py: renamed normalize_whatsapp_identifier → canonical_ whatsapp_identifier upstream; keep a local alias so the legacy tests keep working. - test_run_progress_topics Slack DM test: PR #8006 made Slack default tool_progress=off; explicitly set it to 'all' in the test fixture so the progress-callback path still runs. Also read tool_progress_callback at call time rather than freezing it in FakeAgent.__init__ — production assigns it AFTER construction. - test_tui_gateway_server session-create/close race: session.create now defers _start_agent_build behind a 50ms timer — wait for the build thread to enter _make_agent before closing, otherwise the orphan- cleanup path never runs. - test_protocol session.resume: product get_messages_as_conversation now takes include_ancestors kwarg; accept **_kwargs in the test stub. - test_copilot_acp_client redaction: redactor is OFF by default (snapshots HERMES_REDACT_SECRETS at import); patch agent.redact._REDACT_ENABLED=True for the duration of the test. - test_minimax_provider: after #17171, dots in non-Anthropic model names stay dots even with preserve_dots=False. Assert the new invariant rather than the old 'broken for MiniMax' behavior. - test_update_autostash: updater now scans `ps -A` for dashboard PIDs; the test's catch-all subprocess.run stub needed stdout/stderr fields. - test_accretion_caps: read_timestamps dict is populated lazily when os.path.getmtime succeeds. Use .get("read_timestamps", {}) to tolerate CI filesystems where the stat races file creation. Change-detector tests (fix: rewrite as structural invariants) - test_credential_sources_registry_has_expected_steps: was a frozen set comparison that broke when minimax-oauth was added. Rewrite as an invariant check (every step has description, no dupes, core steps present) per AGENTS.md 'don't write change-detector tests'. xdist ordering / test pollution (fix: reset state, use module-local patches) - test_setup vercel: sibling test saved VERCEL_PROJECT_ID='project' to os.environ via save_env_value() and never cleared it. monkeypatch.delenv the VERCEL_* vars in the link-file test. - test_clipboard TestIsWsl: GitHub Actions is on Azure VMs whose real /proc/version often contains 'microsoft'. Patching builtins.open with mock_open didn't reliably intercept hermes_constants.is_wsl's call in xdist workers that had already cached _wsl_detected=True from an earlier test. Patch hermes_constants.open directly and add teardown_method to reset the cache after each test. Pytest-asyncio cancellation hangs (fix: bound product await with timeout) - test_session_split_brain_11016 (3 params) + test_gateway_shutdown cancel-inflight: under pytest-asyncio 1.3.0, 'await task' and 'asyncio.gather(cancelled_tasks)' can stall for 30s when the cancelled task's finally block awaits typing-task cleanup. Bound both with asyncio.wait_for(..., timeout=5.0) and asyncio.shield — the stragglers are released from adapter tracking and allowed to finish unwinding in the background. This is also a legitimate hardening: a wedged finally shouldn't stall the caller's dispatch or a gateway shutdown. Orphan UI config (fix: merge tiny tab into messaging category) - test_web_server test_no_single_field_categories: the telegram.reactions config field lived in its own 'telegram' schema category with no siblings. Fold it under 'discord' via _CATEGORY_MERGE so the dashboard doesn't render an orphan single-field tab. Local verification: 38/38 originally-failing tests pass; 4044/4044 gateway tests pass; 684/684 targeted subset (all 16 touched test files) passes. |
||
|
|
c5a5e586d7 | fix(gemini): nest OpenAI-compat thinking config under google | ||
|
|
fa3338c171 |
test(anthropic): regression guard for DeepSeek /anthropic thinking replay
Covers the #16748 fix: - unsigned thinking blocks synthesised from reasoning_content survive replay - non-latest assistant turns keep their thinking (DeepSeek validates every turn) - signed Anthropic blocks are stripped (DeepSeek can't validate them) - cache_control is stripped from thinking blocks - OpenAI-compat base (api.deepseek.com without /anthropic) is NOT matched - non-DeepSeek third parties (minimax) keep the generic strip-all behaviour |
||
|
|
0a5ee01e48 |
fix(hindsight): route flush-on-switch through writer queue, not raw thread
Follow-up to the cherry-picked PR #17447. The original flush spawned a bare threading.Thread for the buffer-flush path, overwriting self._sync_thread — which is aliased to the long-lived writer thread. Two consequences: 1. No serialization with the writer queue. If old-session retains were still queued in _retain_queue, the flush ran concurrently with the writer and both threads could call aretain_batch against the same document_id. 2. The pre-spawn 'self._sync_thread.join(timeout=5.0)' tried to join the long-lived writer, which never exits, so the join was a no-op that just timed out — never actually serialized anything. Fix: enqueue the flush closure on _retain_queue via _ensure_writer + put(). Natural FIFO ordering behind any pending retains, no new thread, no broken join. Shutdown-aware so it doesn't enqueue after teardown. Tests updated to drain via _retain_queue.join() instead of the stale _sync_thread.join(). Added regression guard test_flush_serializes_behind_pending_retains_via_writer_queue that blocks the writer mid-retain to prove the flush waits in FIFO behind the old retain. Also seeds _retain_queue / _shutting_down / stubbed _ensure_writer on the bare-object test helper in test_memory_session_switch.py so that path doesn't blow up under the new queue-enqueue. tests/plugins/memory/test_hindsight_provider.py + tests/agent/test_memory_session_switch.py: 103/103 passing. |
||
|
|
c38dac742b |
fix(hindsight): flush buffered turns and drop stale prefetch on session switch
Two data-loss / leak gaps in HindsightMemoryProvider.on_session_switch introduced by #17409. 1. Buffered turns silently lost when retain_every_n_turns > 1. on_session_switch unconditionally cleared _session_turns without flushing. Users who batched every N>1 turns and switched mid-batch (/reset, /new, /resume, /branch, or context compression) had those buffered turns disappear. Same data-loss class as the shutdown race, different lifecycle event. Note commit_memory_session() -> on_session_end() runs *before* on_session_switch on /reset, but Hindsight doesn't implement on_session_end so the buffer survives that step and dies at clear time. /resume, /branch, and compression skip commit_memory_session entirely so an on_session_end impl wouldn't help them anyway. Fix: snapshot the old _session_id, _document_id, _parent_session_id, _turn_index, and _session_turns; spawn one final retain that lands under the OLD document_id; then rotate state. Metadata is built synchronously against the old self._* so session_id / lineage tags on the flushed item all reference the prior session consistently. 2. Stale _prefetch_result leaks across switch. If queue_prefetch ran in the old session and the result hadn't been consumed by prefetch() yet, on_session_switch left the cached recall text in place. The next session's first prefetch() call would return text mined from the prior session's bank/query. Fix: join any in-flight _prefetch_thread (3s bounded — matches shutdown()), then clear _prefetch_result under _prefetch_lock before rotating session_id. Tests ----- - tests/plugins/memory/test_hindsight_provider.py (TestSessionSwitchBufferFlush): - buffered turns flushed under OLD document_id with OLD lineage tags - empty buffer => no spurious retain - _prefetch_result cleared on switch - in-flight prefetch thread is awaited before clear (no race) - tests/agent/test_memory_session_switch.py: factory extended to seed the attrs the new flush path reads (_retain_source, _platform, _bank_id, prefetch state, etc.) and stub _run_hindsight_operation so existing switch-state assertions keep passing without network setup. |
||
|
|
1bedc836b5 |
docs(onboarding): lead OpenClaw residue banner with migrate, warn that cleanup breaks OpenClaw (#17507)
The ~/.openclaw/ detection banner (#16327) had two problems flagged in #16629: 1. It only pitched 'hermes claw cleanup' (destructive archive) and never mentioned 'hermes claw migrate' — the actual non-destructive path that ports config/memory/skills into Hermes. 2. The copy anthropomorphized the bug ('the agent can still get confused', 'dutifully reads') and framed OpenClaw as a competitor to eliminate ('instead of Hermes's'). Rewrite so migrate leads, cleanup is a clearly-labelled follow-up with a warning that archiving breaks OpenClaw for users still running it. Closes #16629 |
||
|
|
83c288da01 |
fix(anthropic): broaden Kimi thinking-suppression to custom endpoints (#17455)
The guard that drops Anthropic's `thinking` kwarg for Kimi endpoints was matched on `https://api.kimi.com/coding` only. Users configuring a custom Kimi-compatible gateway (or an official Moonshot host) with `api_mode: anthropic_messages` fall through to the generic third-party path, which strips thinking blocks AND still sends `thinking={enabled,...}` → upstream rejects with HTTP 400 "reasoning_content is missing in assistant tool call message at index N" on the next request after a tool call. Replace `_is_kimi_coding_endpoint` callers (history replay + thinking kwarg gate) with `_is_kimi_family_endpoint(base_url, model)` that also matches the `api.kimi.com` / `moonshot.ai` / `moonshot.cn` hosts and Kimi/Moonshot family model names (`kimi-`, `moonshot-`, `k1.`, `k2.`, …) for custom / proxied endpoints. Keeps the UA-header check in `build_anthropic_client` URL-only — the `claude-code/0.1.0` header is an official-Kimi contract. Plumbs optional `model` through `convert_messages_to_anthropic` so the unsigned reasoning_content→thinking block synthesised for Kimi's history validation survives the third-party signature-stripping pass on custom hosts too. Closes #17057. |
||
|
|
ff687c019e |
fix(aux): skip kimi-coding in vision auto-detect (closes #17076) (#17451)
* docs(anthropic): correct OAuth scope to Max plan + extra usage credits only The previous docs pass (#17399) overstated what Anthropic OAuth works with. In practice Hermes can only route against a Claude Max plan that has purchased extra usage credits — the base Max allowance is not consumed, and Claude Pro is not supported at all. Without Max + extra credits, users must fall back to an ANTHROPIC_API_KEY (pay-per-token). Updates the four pages touched in #17399: - integrations/providers.md - user-guide/features/credential-pools.md - reference/environment-variables.md - getting-started/quickstart.md * fix(aux): skip kimi-coding in vision auto-detect (closes #17076) Kimi Coding Plan's /coding endpoint (Anthropic Messages wire) has no image_in capability — Kimi's own docs confirm and suggest switching to a vision-capable model. Vision lives on the separate Kimi Platform (api.moonshot.ai, OpenAI-wire, pay-as-you-go). When the user has kimi-coding as main provider and auxiliary.vision.provider=auto, resolve_vision_provider_client was handing back an AnthropicAuxiliaryClient wrapped around /coding which 404'd on every vision request. Add a _PROVIDERS_WITHOUT_VISION frozenset ({kimi-coding, kimi-coding-cn}) and gate the main-provider vision branch on membership. On a skip the auto-detect falls through to OpenRouter → Nous like any other main-provider-unavailable case. Explicit per-task overrides (auxiliary.vision.provider=kimi-coding) are unaffected — the skip only applies when the caller is in auto mode. Tests: 4 new targeted tests in TestVisionAutoSkipsKimiCoding covering the skip path, CN variant, explicit-override passthrough, and a guard against accidental skip-list widening. |
||
|
|
13683c0842 |
feat(memory): notify providers on mid-process session_id rotation (#17409)
Fixes #6672 Memory providers now receive on_session_switch() whenever AIAgent.session_id rotates mid-process — /resume, /branch, /reset, /new, and context compression. Before this, providers that cached per-session state in initialize() (Hindsight's _session_id, _document_id, accumulated _session_turns, _turn_counter) kept writing into the old session's record after the agent had moved on. MemoryProvider ABC ------------------ - New optional hook on_session_switch(new_session_id, *, parent_session_id='', reset=False, **kwargs) with no-op default for backward compat. reset=True signals /reset or /new — providers should flush accumulated per-session buffers. reset=False for /resume, /branch, compression where the logical conversation continues. MemoryManager ------------- - on_session_switch() fans the hook out to every registered provider. Isolated try/except per provider — one bad provider can't block others. - Empty/None new_session_id is a no-op to avoid corrupting provider state during shutdown paths. run_agent.py ------------ - _sync_external_memory_for_turn now passes session_id=self.session_id into sync_all() and queue_prefetch_all(). Providers with defensive session_id updates in sync_turn (Hindsight already had this at plugins/memory/hindsight/__init__.py:1199) now actually receive the current id. - Compression block at ~L8884 already notified the context engine of the rollover; now also calls _memory_manager.on_session_switch(reason='compression'). cli.py ------ - new_session() fires reset=True, reason='new_session' so providers flush buffers. - _handle_resume_command fires reset=False, reason='resume' with the previous session as parent_session_id. - _handle_branch_command fires reset=False, reason='branch' with the parent session_id already captured for the DB parent link. gateway/run.py -------------- - _handle_resume_command now evicts the cached AIAgent, mirroring /branch and /reset. The next message rebuilds a fresh agent whose memory provider initialize() runs with the correct session_id — matches the pattern the gateway already uses for provider state cross-session transitions. Hindsight reference implementation ---------------------------------- - plugins/memory/hindsight/__init__.py adds on_session_switch that: updates _session_id, mints a fresh _document_id (prevents vectorize-io/hindsight#1303 overwrite), and clears _session_turns / _turn_counter / _turn_index so in-flight batches don't flush under the new document id. parent_session_id only overwritten when provided (avoids clobbering on a bare switch). Tests ----- - tests/agent/test_memory_session_switch.py: new dedicated file. ABC default no-op, manager fan-out, failure isolation, empty-id no-op, session_id propagation through sync_all/queue_prefetch_all, Hindsight state transitions for every reset/non-reset case, parent preservation. - tests/cli/test_branch_command.py: new test verifying /branch fires the hook with correct parent_session_id + reset=False + reason. - tests/gateway/test_resume_command.py: new test verifying /resume evicts the cached agent. - tests/run_agent/test_memory_sync_interrupted.py: updated existing assertions to account for the session_id kwarg on sync_all and queue_prefetch_all. E2E verified (real imports, tmp HERMES_HOME): - /resume: session_id updates, doc_id fresh, buffers cleared, parent set - /branch: session_id forks, parent links to original - /new: reset=True clears accumulated state - compression: reason='compression' propagated, lineage preserved - Empty id: no-op, state preserved - Legacy provider without on_session_switch: no crash Reported by @nicoloboschi (Hindsight maintainer); related scope-widening comment by @kidonng extending coverage to compression. |
||
|
|
21676e80cc |
Revert "fix(anthropic): remove Claude Code fingerprinting from OAuth Messages API path (#16957)" (#17397)
This reverts commit
|
||
|
|
bc0d8a941e |
feat(curator): per-run reports — run.json + REPORT.md under logs/curator/ (#17307)
Every curator pass now emits a dated report directory under
`~/.hermes/logs/curator/{YYYYMMDD-HHMMSS}/` with two files:
- `run.json` — machine-readable full record (before/after snapshot,
state transitions, all tool calls, model/provider, timing, full LLM
final response untruncated, error if any)
- `REPORT.md` — human-readable markdown: model + duration header,
auto-transition counts, LLM consolidation stats, archived-this-run
list, new-skills-this-run list, state transitions, the full LLM
final summary, and a recovery footer pointing at the archive + the
`hermes curator restore` command
Reports live under `logs/curator/`, not inside `skills/` — they're
operational telemetry, not user-authored skill data, and belong
alongside `agent.log` / `gateway.log`.
Internals:
- `_run_llm_review()` now returns a dict (final, summary, model,
provider, tool_calls, error) instead of a bare truncated string so
the reporter has full fidelity
- Report writer is fully best-effort — any failure logs at DEBUG and
never breaks the curator itself. Same-second rerun gets a numeric
suffix so reports can't clobber each other
- Report path stamped into `.curator_state` as `last_report_path`
- `hermes curator status` surfaces a "last report:" line so users
can immediately open the latest run
Tests (all green):
- 7 new tests in tests/agent/test_curator_reports.py covering: report
location (logs not skills), both files written, run.json shape and
diff accuracy, markdown structure, error path still writes, state
transitions captured, same-second runs get unique dirs
- Existing test_run_review_synchronous_invokes_llm_stub updated to
stub the new dict-returning _run_llm_review signature
Live E2E: ran a synchronous pass against a 1-skill test collection
with a stubbed LLM; report written correctly, state stamped with
last_report_path, markdown human-readable, run.json machine-parseable.
|
||
|
|
fa9383d27b |
feat(curator): umbrella-first prompt, inherit parent config, unbounded iterations
Based on three live test runs against 346 agent-created skills on the author's own setup (~6.5 min, opus-4.7, 86 API calls), the curator prompt needed three sharpenings before it consistently produced real umbrella consolidation instead of passive audit output: **Umbrella-first framing.** The original 'decide keep/patch/archive/ consolidate' framing lets opus default to 'keep' whenever two skills aren't byte-identical. The new prompt explicitly tells the reviewer that pairwise distinctness is the wrong bar — the right question is 'would a human maintainer write this as N separate skills, or one skill with N labeled subsections?' Expect 10-25 prefix clusters; merge each into an umbrella via one of three methods. **Three concrete consolidation methods.** (a) Merge into an existing umbrella (patch the broadest skill, archive siblings); (b) Create a new umbrella SKILL.md (skill_manage action=create); (c) Demote session-specific detail into references/, templates/, or scripts/ under the umbrella via skill_manage action=write_file, then archive the narrow sibling. This matches the support-file vocabulary the review-prompt side already uses (PR #17213). **Two observed bailouts pre-empted:** 'usage counters are zero so I can't judge' (rule 4: judge on content, not use_count) and 'each has a distinct trigger' (rule 5: pairwise distinctness is the wrong bar). **Config-aware parent inheritance.** _run_llm_review() was building AIAgent() without explicit provider/model, hitting an auto-resolve path that returned empty credentials → HTTP 400 'No models provided' against OpenRouter. Fork now inherits the user's main provider and model (via load_config + resolve_runtime_provider) before spawning — runs on whatever the user is currently on, OAuth-backed or pool-backed included. **Unbounded iteration ceiling.** max_iterations=8 was way too low for an umbrella-build pass over hundreds of skills. A live pass takes 50-100 API calls (scanning, clustering, skill_view'ing candidates, patching umbrellas, mv'ing siblings). Raised to 9999 — the natural stopping criterion is 'no more clusters worth processing', not an arbitrary tool-call budget. **Tests updated:** test_curator_review_prompt_has_invariants accepts DO NOT / MUST NOT and drops 'keep' from the required-verb set (the umbrella-first prompt correctly deemphasizes 'keep' as a first-class decision label since passive keep-everything is the failure mode being prevented). Added test_curator_review_prompt_is_umbrella_first asserting the umbrella framing, class-level thinking, references/ + templates/ + scripts/ support-file mentions, and the 'use_count is not evidence of value' pre-emption. Added test_curator_review_prompt_offers_support_file_actions asserting skill_manage action=create and action=write_file are both named. **Live validation on author's setup:** - Run 1 (old prompt): 3 archives, stopped after surveying — typical passive outcome - Run 2 (consolidation prompt): 44 archives, 3 patches, surfaced the 50-skill mlops reorg duplicate bug but didn't umbrella - Run 3 (this prompt): 249 archives + 18 new class-level umbrellas created, reducing agent-created skills from 346 → 118 with every archived skill's content preserved as references/ under its umbrella. Pinned skill untouched. Full report in PR description. |
||
|
|
a12f7aa8bb |
fix(curator): default cycle is every 7 days, not 24 hours
Weekly is closer to how skill churn actually works — most agent-created skills don't change multiple times per day, so a daily review is pure cost without benefit. Bumping the default to 7 days reduces aux-model spend while still catching drift and staleness on the timescales that matter (30d stale, 90d archive). Changes: - DEFAULT_INTERVAL_HOURS: 24 -> 168 (7 days) - config.yaml default: interval_hours: 24 -> 24 * 7 - CLI status line renders as '7d' when interval is a whole-day multiple - Test `test_old_run_eligible` decoupled from the exact default: it now uses 2 * get_interval_hours() so future tweaks don't break it |
||
|
|
0d31864e3b |
fix(curator): defense-in-depth gates against bundled/hub skills
Previous invariants only gated the primary entry points
(apply_automatic_transitions, archive_skill, CLI pin). Several paths
were unprotected:
- bump_view / bump_use / bump_patch / set_state / set_pinned wrote
usage records unconditionally, which is confusing noise in
.usage.json even though the review list filtered them out
- restore_skill did not check whether a bundled skill now shadows
the archived name
- CLI unpin was asymmetric with CLI pin — it had no gate
Fixes:
- _mutate() (the shared counter / state writer) now drops silently
when the skill is not agent-created. .usage.json never gains a
record for a bundled or hub-installed skill.
- restore_skill() refuses to restore under a name that is now
bundled or hub-installed (would shadow upstream).
- CLI unpin gate matches CLI pin.
New tests:
- 5 provenance-guard tests on skill_usage (one per mutator)
- 1 end-to-end test that hammers every mutator at a bundled skill
and a hub skill, asserts both are untouched on disk, and asserts
the sidecar stays clean
- 2 CLI tests proving pin/unpin refuse bundled skills symmetrically
64/64 tests passing (29 skill_usage + 27 curator + 8 new guards).
|
||
|
|
c8b7e7268a |
refactor(curator): point review prompt at existing tools
The LLM review prompt mentioned bespoke `archive_skill` and `pin_skill` tools that are not registered as model tools. Swap the prompt to rely on the real surface: - skill_manage action=patch — for patching and consolidation - terminal — to `mv` skill dirs into .archive/ Also drop `pin` from the model's decision list — pinning is a user opt-out for `hermes curator pin <skill>`, not something the model should do autonomously. Decision list is now: keep / patch / consolidate / archive. Tests updated: prompt-invariant test now asserts the existing tools are referenced and that bespoke tool names do NOT appear. New test prevents `pin` from being re-added as a model decision. |
||
|
|
bc79e227e6 |
feat(curator): background skill maintenance (issue #7816)
Adds the Curator — an auxiliary-model background task that periodically
reviews AGENT-CREATED skills and keeps the collection tidy: tracks usage,
transitions unused skills through active → stale → archived, and spawns
a forked AIAgent to consolidate overlaps and patch drift.
Default: enabled, inactivity-triggered (no cron daemon). Runs on CLI
startup and gateway boot when the last run is older than interval_hours
(default 24) AND the agent has been idle for min_idle_hours (default 2).
Invariants (all load-bearing):
- Never touches bundled or hub-installed skills (.bundled_manifest +
.hub/lock.json double-filter)
- Never auto-deletes — archive only. Archives are recoverable
via `hermes curator restore <skill>`
- Pinned skills bypass all auto-transitions
- Uses the aux client; never touches the main session's prompt cache
New files:
- tools/skill_usage.py — sidecar .usage.json telemetry, atomic writes,
provenance filter
- agent/curator.py — orchestrator: config, idle gating, state-machine
transitions (pure, no LLM), forked-agent review prompt
- hermes_cli/curator.py — `hermes curator {status,run,pause,resume,
pin,unpin,restore}` subcommand
- tests/tools/test_skill_usage.py — 29 tests
- tests/agent/test_curator.py — 25 tests
Modified files (surgical patches):
- tools/skills_tool.py — bump view_count on successful skill_view
- tools/skill_manager_tool.py — bump patch_count on skill_manage
patch/edit/write_file/remove_file; forget record on delete
- hermes_cli/config.py — add curator: section to DEFAULT_CONFIG
- hermes_cli/commands.py — add /curator CommandDef with subcommands
- hermes_cli/main.py — register `hermes curator` subparser via
register_cli() from hermes_cli.curator
- cli.py — /curator slash-command dispatch + startup hook
- gateway/run.py — gateway-boot hook (mirrors CLI)
Validation:
- 54 new tests across skill_usage + curator, all passing in 3s
- 346 tests across all touched files' neighbors green
- 2783 tests across hermes_cli/ + gateway/test_run_progress_topics.py green
- CLI smoke: `hermes curator status/pause/resume` work end-to-end
Companion to PR #16026 (class-first skill review prompt) — together
they form a loop: the review prompt stops near-duplicate skill creation
at the source, and the curator prunes/consolidates what still accumulates.
Refs #7816.
|
||
|
|
01ad0aacaf | fix(tui): show correct context length | ||
|
|
214ca943ac | feat(agent): add lmstudio integration | ||
|
|
1d8b9e6458 |
fix(auxiliary): auto-detect Anthropic Messages transport for all aux clients (#17027)
Auxiliary tasks (title_generation, vision, compression, web_extract,
session_search) now pick the correct wire protocol based on the
endpoint, not just on which resolve_provider_client branch built the
client. Fixes 404s on Kimi Coding Plan and any other named provider
whose endpoint speaks Anthropic Messages.
Root cause: the 'api_key' branch of resolve_provider_client (and the
Step 2 fallback chain inside _resolve_auto) always built a plain
OpenAI client regardless of what the endpoint actually spoke. For
provider=kimi-coding + model=kimi-for-coding, that meant:
POST https://api.kimi.com/coding/v1/chat/completions
{ "model": "kimi-for-coding", ... }
→ 404 resource_not_found_error
The /coding route only accepts the Anthropic Messages shape (the main
agent already uses api_mode=anthropic_messages for it). Earlier fixes
(#16819, #22ddac4b1) patched the anonymous-custom, named-custom, and
external-process branches — but the named api_key branch (kimi-coding,
minimax, zai, future /anthropic providers) was the fourth sibling and
never got the same treatment.
Fix: one module-level helper _maybe_wrap_anthropic() that rewraps a
plain OpenAI client in AnthropicAuxiliaryClient when:
- api_mode is explicitly 'anthropic_messages', OR
- the URL ends in '/anthropic', OR
- the host is api.kimi.com + path contains '/coding', OR
- the host is api.anthropic.com.
Wired into _wrap_if_needed (covers all resolve_provider_client
branches that already go through it) and into the Step 2 api_key
fallback chain inside _resolve_auto. Explicit api_mode still wins:
passing api_mode='chat_completions' forces OpenAI wire, and already-
wrapped specialized adapters (Codex, Gemini native, CopilotACP) pass
through unchanged.
E2E verified:
- resolve_provider_client('kimi-coding', 'kimi-for-coding')
→ AnthropicAuxiliaryClient (was plain OpenAI, which 404'd)
- _resolve_auto Step 1 for kimi-coding runtime → AnthropicAuxiliaryClient
- resolve_provider_client('openrouter', ...) → plain OpenAI (no regression)
- api_mode='chat_completions' override → plain OpenAI (explicit wins)
Tests:
- tests/agent/test_auxiliary_transport_autodetect.py (new): 21 tests
covering URL detection, wrap decisions, and integration.
- 204/205 existing auxiliary tests pass (1 pre-existing failure on
main, unrelated to this change).
Co-authored-by: teknium1 <teknium@users.noreply.github.com>
|
||
|
|
391f1ca1f4 |
feat(aux): translate extra_body.reasoning into Codex Responses API (#17004)
Auxiliary callers that configure reasoning via
auxiliary.<task>.extra_body.reasoning were having that config silently
dropped by the Codex Responses adapter — it only forwarded
messages/model/tools through to responses.stream(), never translating
chat.completions-shaped reasoning hints into the Responses API's
top-level reasoning + include fields.
Mirror the main-agent translation from agent/transports/codex.py:
- extra_body.reasoning.effort → resp_kwargs.reasoning.{effort, summary:"auto"}
- 'minimal' → 'low' clamp (Codex backend rejects 'minimal')
- Always include ['reasoning.encrypted_content'] when reasoning is enabled
- {'enabled': False} → omit reasoning and include entirely
- Non-dict reasoning values are ignored defensively
Reported by @OP (Apr 26 feedback bundle).
## Changes
- agent/auxiliary_client.py: _CodexCompletionsAdapter.create() now reads
and translates extra_body.reasoning before calling responses.stream()
- tests/agent/test_auxiliary_client.py: 9 new tests covering all effort
levels, the minimal→low clamp, the disabled path, the no-op paths,
and defensive handling of wrong-shape inputs
Co-authored-by: teknium1 <teknium@users.noreply.github.com>
|
||
|
|
06164a7b28 |
fix(codex): resync pool entry from auth.json after reauth (#17001)
When openai-codex tokens expire or the ChatGPT account hits a 429
window, the pool entry gets marked STATUS_EXHAUSTED with
last_error_reset_at many hours in the future. If the user then runs
`hermes model` / `hermes auth openai-codex` to reauth, fresh tokens
land in ~/.hermes/auth.json but the pool entry stayed frozen behind
its reset_at — every request kept failing with 'credential pool: no
available entries (all exhausted or empty)' until the original window
elapsed.
_available_entries() already had auth.json/credentials-file resync
branches for anthropic/claude_code and nous/device_code; openai-codex
was missing. Added _sync_codex_entry_from_auth_store() mirroring the
nous version (reads state["tokens"][{access,refresh}_token] +
state["last_refresh"]) and wired it into the exhausted-entry resync
loop.
Also softens the 'codex CLI not found' doctor warning — native
device-code OAuth does not require the Codex binary, only
importing existing Codex CLI tokens does. Downgraded to an info line.
Reported on Discord by p1aceho1der: Codex stalled indefinitely after
a rate-limit reset, reauth didn't help, and doctor falsely warned
that the codex CLI was required.
Co-authored-by: teknium1 <teknium@users.noreply.github.com>
|
||
|
|
529eb29b6a |
fix(gemini): clamp Flash thinkingLevel to documented low/medium/high set
Gemini 3 Flash documents low/medium/high as the accepted thinkingLevel
values. The salvaged bridge was forwarding Hermes' "minimal" effort to
Flash verbatim, which is not a documented Gemini level and risks a 400
from the native adapter.
Clamp minimal->low on Flash (matching how Pro already clamps minimal+low
down), and funnel anything outside {low, medium, high} into medium to
keep the request valid by construction. No behaviour change for the
documented effort levels.
|
||
|
|
dbbe2d1973 | fix(gemini): bridge reasoning_config into thinking_config for chat-completions routes | ||
|
|
02ae152222 | fix(mcp): normalize nullable tool schemas | ||
|
|
37551ee53e |
test(bedrock): add model picker and region routing tests
25 new tests (all Bedrock API calls mocked, no real AWS creds needed):
tests/hermes_cli/test_bedrock_model_picker.py (20 tests):
- provider_model_ids("bedrock") uses live discovery, returns regional
model IDs, falls back gracefully on empty/exception, resolves all
bedrock aliases (aws, aws-bedrock, amazon-bedrock) to live discovery
- list_authenticated_providers() section 2: bedrock appears with AWS
creds, model list from discover_bedrock_models(), total_models
matches, is_current flag works, absent creds hides bedrock, discovery
failure does not crash, no duplicate entries
- Region routing: botocore profile eu-central-1 yields eu.* model IDs
end-to-end; env var takes priority over botocore profile
- providers.py overlay: exists with correct transport/auth_type, label
is non-empty, all aliases normalize to bedrock
tests/agent/test_bedrock_adapter.py (5 tests):
- resolve_bedrock_region() botocore profile fallback, botocore failure
fallback, us-east-1 hard fallback (with botocore mocked)
|
||
|
|
023f5c74b1 |
fix(anthropic): remove Claude Code fingerprinting from OAuth Messages API path (#16957)
* fix(anthropic): remove Claude Code fingerprinting from OAuth Messages API path
OAuth requests now identify as Hermes on the wire. Removed:
- "You are Claude Code, Anthropic's official CLI for Claude." system
prompt prepend
- Hermes Agent → Claude Code / Nous Research → Anthropic
system-prompt substitutions
- mcp_ tool-name prefix on outgoing tool schemas + message history
- Matching mcp_ strip on inbound tool_use blocks (strip_tool_prefix path
removed from AnthropicTransport.normalize_response, + all 5 call
sites in run_agent.py and auxiliary_client.py)
- user-agent: claude-cli/<v> (external, cli) and x-app: cli headers on
the Messages API client
Added:
- OAuth path strips context-1m-2025-08-07 — Anthropic rejects OAuth
requests carrying it with HTTP 400 'This authentication style is
incompatible with the long context beta header.'
Kept (auth plumbing, not identity spoofing):
- _is_oauth_token classifier and is_oauth flag threading
- Bearer vs x-api-key auth routing
- _OAUTH_ONLY_BETAS (claude-code-20250219, oauth-2025-04-20) — backend
requires these on the OAuth-gated Messages endpoint
- _OAUTH_CLIENT_ID (Claude Code's) — Anthropic doesn't issue OAuth
creds to third parties; this is the only way the login flow works
- claude-cli/<v> User-Agent on the OAuth token exchange + refresh
endpoints at platform.claude.com/v1/oauth/token — bare requests get
Cloudflare 1010 blocked
Verified live against api.anthropic.com with a fresh sk-ant-oat01-*
token:
- claude-haiku-4-5 simple message: HTTP 200, 'OK' response
- claude-haiku-4-5 tool call: HTTP 200, stop_reason=tool_use, tool
named 'terminal' (no mcp_ prefix) round-tripped correctly
- Outgoing wire: no user-agent, no x-app, real Hermes identity in
system prompt, real tool name in schema
Closes/supersedes #16820 (mcp_ PascalCase normalization patch — no longer
needed since the mcp_ round-trip is gone).
* fix(anthropic): resolve_anthropic_token() reads credential pool first
Close the gap where ~/.hermes/auth.json → credential_pool.anthropic
(where hermes login + dashboard PKCE flow write OAuth tokens) was not
in resolve_anthropic_token()'s source list.
Before: users who authed via hermes login got the token written into
the pool, but legacy fallback code paths (auxiliary_client, models
catalog fetch, explicit-runtime path) that call resolve_anthropic_token()
saw None and raised 'No Anthropic credentials found' — even though the
token was sitting in auth.json.
New priority 1: pool.select() with env-sourced entries skipped. Skipping
env:* entries preserves the existing env-var priority logic further
down the chain (static env OAuth → refreshable Claude Code upgrade via
_prefer_refreshable_claude_code_token).
Surfaced while writing the hermes-agent-dev skill playbook for
'finding a live OAuth token for an E2E test'.
---------
Co-authored-by: teknium1 <teknium@users.noreply.github.com>
|
||
|
|
e63364b8df |
revert: computer-use cua-driver (PR #16919) (#16927)
Reverts PR #16919 (commits |
||
|
|
f3371c39a4 |
fix(auxiliary): custom provider URL rewrite + main_runtime model for title gen
- auxiliary_client: apply _to_openai_base_url() to custom base_url
(fixes /anthropic → /v1 rewrite missing for provider="custom")
- auxiliary_client: use main_runtime.get("model") instead of _read_main_model()
so auxiliary tasks follow system default model changes
- title_generator: thread main_runtime through generate_title → auto_title_session → maybe_auto_title
- cli.py / gateway/run.py: pass main_runtime to maybe_auto_title
- tests: update mock assertions for new main_runtime parameter
|
||
|
|
dad10a78d0 |
feat(computer-use): cua-driver backend, universal any-model schema
Background macOS desktop control via cua-driver MCP — does NOT steal the user's cursor or keyboard focus, works with any tool-capable model. Replaces the Anthropic-native `computer_20251124` approach from the abandoned #4562 with a generic OpenAI function-calling schema plus SOM (set-of-mark) captures so Claude, GPT, Gemini, and open models can all drive the desktop via numbered element indices. - `tools/computer_use/` package — swappable ComputerUseBackend ABC + CuaDriverBackend (stdio MCP client to trycua/cua's cua-driver binary). - Universal `computer_use` tool with one schema for all providers. Actions: capture (som/vision/ax), click, double_click, right_click, middle_click, drag, scroll, type, key, wait, list_apps, focus_app. - Multimodal tool-result envelope (`_multimodal=True`, OpenAI-style `content: [text, image_url]` parts) that flows through handle_function_call into the tool message. Anthropic adapter converts into native `tool_result` image blocks; OpenAI-compatible providers get the parts list directly. - Image eviction in convert_messages_to_anthropic: only the 3 most recent screenshots carry real image data; older ones become text placeholders to cap per-turn token cost. - Context compressor image pruning: old multimodal tool results have their image parts stripped instead of being skipped. - Image-aware token estimation: each image counts as a flat 1500 tokens instead of its base64 char length (~1MB would have registered as ~250K tokens before). - COMPUTER_USE_GUIDANCE system-prompt block — injected when the toolset is active. - Session DB persistence strips base64 from multimodal tool messages. - Trajectory saver normalises multimodal messages to text-only. - `hermes tools` post-setup installs cua-driver via the upstream script and prints permission-grant instructions. - CLI approval callback wired so destructive computer_use actions go through the same prompt_toolkit approval dialog as terminal commands. - Hard safety guards at the tool level: blocked type patterns (curl|bash, sudo rm -rf, fork bomb), blocked key combos (empty trash, force delete, lock screen, log out). - Skill `apple/macos-computer-use/SKILL.md` — universal (model-agnostic) workflow guide. - Docs: `user-guide/features/computer-use.md` plus reference catalog entries. 44 new tests in tests/tools/test_computer_use.py covering schema shape (universal, not Anthropic-native), dispatch routing, safety guards, multimodal envelope, Anthropic adapter conversion, screenshot eviction, context compressor pruning, image-aware token estimation, run_agent helpers, and universality guarantees. 469/469 pass across tests/tools/test_computer_use.py + the affected agent/ test suites. - `model_tools.py` provider-gating: the tool is available to every provider. Providers without multi-part tool message support will see text-only tool results (graceful degradation via `text_summary`). - Anthropic server-side `clear_tool_uses_20250919` — deferred; client-side eviction + compressor pruning cover the same cost ceiling without a beta header. - macOS only. cua-driver uses private SkyLight SPIs (SLEventPostToPid, SLPSPostEventRecordTo, _AXObserverAddNotificationAndCheckRemote) that can break on any macOS update. Pin with HERMES_CUA_DRIVER_VERSION. - Requires Accessibility + Screen Recording permissions — the post-setup prints the Settings path. Supersedes PR #4562 (pyautogui/Quartz foreground backend, Anthropic- native schema). Credit @0xbyt4 for the original #3816 groundwork whose context/eviction/token design is preserved here in generic form. |
||
|
|
a7cdd4133c |
fix(bedrock): send context-1m-2025-08-07 beta so Opus 4.6/4.7 get 1M context (#16793)
On AWS Bedrock (and Azure AI Foundry), Claude Opus 4.6/4.7 and Sonnet 4.6 are capped at 200K context unless the request carries the `context-1m-2025-08-07` beta header. On native Anthropic (api.anthropic.com) 1M went GA so the header is a harmless no-op, but Bedrock/Azure still gate it as beta as of 2026-04. Hermes was advertising 1M in model_metadata.py (`claude-opus-4-7: 1000000`) while silently sending a request without the beta — so Bedrock users saw a 200K ceiling with no error message, and no config knob unblocked it. Claude Code sends this header by default, which is why the same Bedrock credentials worked there. - Add `context-1m-2025-08-07` to `_COMMON_BETAS` (alongside interleaved thinking and fine-grained tool streaming). - Strip it in `_common_betas_for_base_url` for MiniMax bearer-auth endpoints — they host their own models, not Claude, so Anthropic beta headers are irrelevant and could risk rejection. - Attach `_COMMON_BETAS` as `default_headers` on the AnthropicBedrock client. Previously that constructor passed no betas at all, so native Anthropic had the 1M unlock via default_headers but Bedrock didn't. - Fast-mode per-request `extra_headers` already rebuilds from `_common_betas_for_base_url`, so it picks up the 1M beta automatically. Reported by user 'Rodmar' on Discord: Bedrock Opus 4.7 stuck at 200K while same credentials worked in Claude Code. |
||
|
|
6ea5699e3f |
fix(compression): notify users when configured aux model fails even if main-model fallback recovers (#16775)
A misconfigured auxiliary.compression.model is a user-fixable problem that silent recovery would hide. The previous retry-on-main logic transparently swallowed aux-model failures whenever the fallback succeeded, leaving the user's broken config in place and racking up future failures.
Track the aux-model failure on the compressor alongside the existing fallback-placeholder fields:
- _last_aux_model_failure_model: str | None
- _last_aux_model_failure_error: str | None
Both are set at the moment the aux model errors (captured before summary_model is cleared for retry), regardless of whether the retry succeeds. Cleared at compress() start and on on_session_reset() so a clean run doesn't leak stale warnings.
Surface at three places:
- gateway hygiene auto-compress: ℹ note to the platform adapter (thread_id preserved)
- gateway /compress command: ℹ line appended to the reply
- CLI via _emit_warning: deduped on (model, error) so repeat compactions don't spam
Distinct from the existing ⚠️ dropped-turns warning — different severity, different emoji, explicit 'context is intact' reassurance.
|
||
|
|
94b26f3ec9 |
fix(compression): retry summary on main model for unknown errors before giving up (#16774)
The existing retry-on-main path in _generate_summary only fires for errors that match the _is_model_not_found heuristic (404/503, 'model_not_found', 'does not exist', 'no available channel'). Other misconfiguration errors — 400s from aggregators, provider-specific 'no route' strings, opaque rejections — fall straight through to the transient-cooldown branch, which drops N turns of context and inserts a static placeholder. Losing context is almost always worse than one extra summary attempt. Add a best-effort retry-on-main for the unknown-error branch, guarded by the same invariants as the existing fast-path retry: only when summary_model differs from main, and only once per compressor (_summary_model_fallen_back). Tests cover: 404 fast-path fallback still works, unknown 400 now falls back, same-model aux skips retry (no infinite loop), and a double-failure (aux + main) stops at 2 calls. |
||
|
|
dfdc4276e8 |
fix(compression): notify gateway users when summary generation fails
When auxiliary compression's summary LLM call fails (e.g. model 404,
auxiliary model misconfigured), the compressor still drops the selected
turns and inserts a static fallback placeholder — the dropped context
is unrecoverable.
Previously the only signal of this was a WARNING in agent.log. Gateway
users (Telegram/Discord/etc.) had no way to know context was lost
because the existing _emit_warning path requires a status_callback,
and the gateway hygiene path uses a temporary _hyg_agent with
quiet_mode=True and no callback wired up.
Changes:
- ContextCompressor: track _last_summary_fallback_used and
_last_summary_dropped_count on each compress() call. Cleared at the
start of compress() and on session reset.
- gateway/run.py hygiene: after auto-compress, inspect the temp
agent's compressor; if fallback was used, send a visible ⚠️ warning
to the user via the platform adapter (TG/Discord/etc.) including
dropped count and the underlying error.
- gateway/run.py /compress: append the same warning to the manual
compress reply so users running /compress see the failure too.
Acceptance:
- Summary success: no user-visible warning (unchanged).
- Summary failure on gateway hygiene: user receives a TG/Discord
message with dropped count + error + remediation hint.
- Summary failure on /compress: warning appended to the command reply.
- CLI status_callback / _emit_warning path is untouched.
- Test coverage: two new tests verify the tracking fields are set on
failure and cleared on subsequent success.
|
||
|
|
49e3a1d8ee | style: trim verbose comment blocks added by previous commit | ||
|
|
e553f6f3e4 |
fix(memory): narrow scrub surface to known wrapper boundaries
Reviewer pushback on the original boundary-hardening commits — three overreach points pulled plugin-specific policy into shared core paths: 1. gateway/run.py hardcoded a '## Honcho Context' literal split for vision-LLM output. Plugin-format heading in framework code; could truncate legitimate output naturally containing that header. Drop the literal split; keep generic sanitize_context (the wrapper strip is plugin-agnostic). Plugin-specific cleanup belongs at the provider boundary, not the shared gateway path. 2. run_agent.run_conversation scrubbed user_message and persist_user_message before the conversation loop. User text is sacred — if a user types a literal <memory-context> tag we must not silently delete it. The producer (build_memory_context_block) is the only legitimate emitter; user input should never need the reverse op. 3. _build_assistant_message scrubbed model output before persistence. Same hazard: would silently mutate legitimate documentation/code the model emits containing the literal markers. The streaming scrubber catches real leaks delta-by-delta before content is concatenated; persist-time scrub was redundant belt-and-suspenders. 4. _fire_stream_delta stripped leading newlines from every delta unless a paragraph break flag was set. Mid-stream '\n' is legitimate markdown — lists, code fences, paragraph breaks — and chunk boundaries are arbitrary. Narrow lstrip to the very first delta of the stream only (so stale provider preamble still gets cleaned on turn start, but mid-stream formatting survives). Plus: build_memory_context_block now logs a warning when its defensive sanitize_context strips something — surfaces buggy providers returning pre-wrapped text instead of silently double-fencing. Net architectural change: scrub surface collapses from 8 sites to 3 (StreamingContextScrubber on output deltas, plugin→backend send, build_memory_context_block input-validation). Plugin-specific strings stay out of shared runtime paths. User input and persisted assistant output are no longer mutated. Tests: rescoped TestMemoryContextSanitization (helper-correctness only, no source-inspection of removed call sites), updated vision tests to drop '## Honcho Context' literal-split assertions, updated _build_assistant_message persistence test to assert preservation. Added: cross-turn scrubber reset, build_memory_context_block warn-on- violation, mid-stream newline preservation (plain + code fence). |
||
|
|
3b2edb347d |
fix(gateway): scrub memory-context leaks from vision auto-analysis output
fixes #5719 The auxiliary vision LLM called by gateway._enrich_message_with_vision can echo its injected Honcho system prompt back into the image description. That description gets embedded verbatim into the enriched user message, so recalled memory (personal facts, dialectic output) surfaces into a user-visible bubble. Strips both forms of leak before embedding: - <memory-context>...</memory-context> fenced blocks (sanitize_context) - trailing '## Honcho Context' sections (header + everything after) Plus regression tests: - tests/agent/test_streaming_context_scrubber.py — 13 tests on the stateful scrubber (whole block, split tags, false-positive partial tags, unterminated span, reset, case-insensitivity) - tests/run_agent/test_run_agent_codex_responses.py — 2 new tests on _fire_stream_delta covering the realistic 7-chunk leak scenario and the cross-turn scrubber reset - tests/gateway/test_vision_memory_leak.py — 4 tests covering the vision auto-analysis boundary (clean pass-through, '## Honcho Context' header, fenced block, both patterns together) |
||
|
|
56724147ef |
fix(providers/gmi): post-salvage review fixes
- config.py: remove dead ENV_VARS_BY_VERSION[17] entry (current _config_version
is 22, so all users are past version 17 and would never be prompted for
GMI_API_KEY on upgrade — consistent with how arcee was added)
- auxiliary_client.py: use google/gemini-3.1-flash-lite-preview as GMI aux
model instead of anthropic/claude-opus-4.6 (matches cheap fast-model pattern
used by all other providers: zai→glm-4.5-flash, kimi→kimi-k2-turbo-preview,
stepfun→step-3.5-flash, kilocode→google/gemini-3-flash-preview)
- test_gmi_provider.py: fix malformed write_text() call in doctor test
(was: write_text("GMI_API_KEY=*** encoding="utf-8") → missing closing quote,
wrote literal string 'GMI_API_KEY=*** encoding=' to .env file)
- test_gmi_provider.py + test_auxiliary_client.py: update aux model assertions
to match new cheaper default
- docs/integrations/providers.md: add 'gmi' to inline 'Supported providers'
fallback list (was only in the table, not the inline list at line ~1181)
- docs/reference/cli-commands.md: add 'gmi' to --provider choices list
|
||
|
|
c53fcb0173 |
feat(providers): add GMI Cloud as a first-class API-key provider (#11955)
Add GMI Cloud (api.gmi-serving.com) as a full first-class API-key provider with built-in auth, aliases, model catalog, CLI entry points, auxiliary client routing, context length resolution, doctor checks, env var tracking, and docs. - auth.py: ProviderConfig for 'gmi' (api_key, GMI_API_KEY / GMI_BASE_URL) - providers.py: HermesOverlay with extra_env_vars for models.dev detection - models.py: curated slash-form model catalog; live /v1/models fetch - main.py: 'gmi' in _named_custom_provider_map and --provider choices - model_metadata.py: _URL_TO_PROVIDER, _PROVIDER_PREFIXES, dedicated context-length probe block (GMI's /models has authoritative data) - auxiliary_client.py: alias entries; _compat_model fix for slash-form models on cached aggregator-style clients; gmi aux default model - doctor.py: GMI in provider connectivity checks - config.py: GMI_API_KEY / GMI_BASE_URL in OPTIONAL_ENV_VARS - conftest.py: explicit GMI_BASE_URL clearing (not caught by _API_KEY suffix) - docs: providers.md, environment-variables.md, fallback-providers.md, configuration.md, quickstart.md (expands provider table) Co-authored-by: Isaac Huang <isaachuang@Isaacs-MacBook-Pro.local> |
||
|
|
8402ba150e |
fix(copilot): send vision header for Copilot vision requests
Thread a vision-request flag through auxiliary provider resolution so Copilot clients can include Copilot-Vision-Request only for vision tasks. This preserves normal text requests while ensuring Copilot vision payloads reach the vision-capable route. Add regression coverage for Copilot vision routing and keep cached text and vision clients separate so a text client without the header is not reused for vision. Co-authored-by: dhabibi <9087935+dhabibi@users.noreply.github.com> |
||
|
|
ec671c4154 |
feat(image-input): native multimodal routing based on model vision capability (#16506)
* feat(image-input): native multimodal routing based on model vision capability
Attach user-sent images as OpenAI-style content parts on the user turn when
the active model supports native vision, so vision-capable models see real
pixels instead of a lossy text description from vision_analyze.
Routing decision (agent/image_routing.py::decide_image_input_mode):
agent.image_input_mode = auto | native | text (default: auto)
In auto mode:
- If auxiliary.vision.provider/model is explicitly configured, keep the
text pipeline (user paid for a dedicated vision backend).
- Else if models.dev reports supports_vision=True for the active
provider/model, attach natively.
- Else fall back to text (current behaviour).
Call sites updated: gateway/run.py (all messaging platforms), tui_gateway
(dashboard/Ink), cli.py (interactive /attach + drag-drop).
run_agent.py changes:
- _prepare_anthropic_messages_for_api now passes image parts through
unchanged when the model supports vision — the Anthropic adapter
translates them to native image blocks. Previous behaviour
(vision_analyze → text) only runs for non-vision Anthropic models.
- New _prepare_messages_for_non_vision_model mirrors the same contract
for chat.completions and codex_responses paths, so non-vision models
on any provider get text-fallback instead of failing at the provider.
- New _model_supports_vision() helper reads models.dev caps.
vision_analyze description rewritten: positions it as a tool for images
NOT already visible in the conversation (URLs, tool output, deeper
inspection). Prevents the model from redundantly calling it on images
already attached natively.
Config default: agent.image_input_mode = auto.
Tests: 35 new (test_image_routing.py + test_vision_aware_preprocessing.py),
all existing tests that reference _prepare_anthropic_messages_for_api
still pass (198 targeted + new tests green).
* feat(image-input): size-cap + resize oversized images, charge image tokens in compressor
Two follow-ups that make the native image routing safer for long / heavy
sessions:
1) Oversize handling in build_native_content_parts:
- 20 MB ceiling per image (matches vision_tools._MAX_BASE64_BYTES,
the most restrictive provider — Gemini inline data).
- Delegates to vision_tools._resize_image_for_vision (Pillow-based,
already battle-tested) to downscale to 5 MB first-try.
- If Pillow is missing or resize still overshoots, the image is
dropped and reported back in skipped[]; caller falls back to text
enrichment for that image.
2) Image-token accounting in context_compressor:
- New _IMAGE_TOKEN_ESTIMATE = 1600 (matches Claude Code's constant;
within the realistic range for Anthropic/GPT-4o/Gemini billing).
- _content_length_for_budget() helper: sums text-part lengths and
charges _IMAGE_CHAR_EQUIVALENT (1600 * 4 chars) per image/image_url/
input_image part. Base64 payload inside image_url is NOT counted
as chars — dimensions don't matter, only image-presence.
- Both tail-cut sites (_prune_old_tool_results L527 and
_find_tail_cut_by_tokens L1126) now call the helper so multi-image
conversations don't slip past compression budget.
Tests: 9 new in test_image_routing.py (oversize triggers resize,
resize-fails-returns-None, oversize-skipped-reported), 11 new in
test_compressor_image_tokens.py (flat charge per image, multiple images,
Responses-API / Anthropic-native / OpenAI-chat shapes, no-inflation on
raw base64, bounds-check on the constant, integration test that an
image-heavy tail actually gets trimmed).
* fix(image-input): replace blanket 20MB ceiling with empirically-verified per-provider limits
The previous commit imposed a hardcoded 20 MB base64 ceiling on all
providers, triggering auto-resize on anything larger. This was wrong in
both directions:
* Too loose for Anthropic — actual limit is 5 MB (returns HTTP 400
'image exceeds 5 MB maximum' above that).
* Too strict for OpenAI / Codex / OpenRouter — accept 49 MB+ without
complaint (empirically verified April 2026 with progressive PNG
sizes).
New behaviour:
* _PROVIDER_BASE64_CEILING table: only anthropic and bedrock have a
ceiling (5 MB, since bedrock-on-Claude shares Anthropic's decoder).
* Providers NOT in the table get no ceiling — images attach at native
size and we trust the provider to return its own error if it
disagrees. A provider-specific 400 message is clearer than us
guessing wrong and silently degrading image quality.
* build_native_content_parts() gains a keyword-only provider arg;
gateway/CLI/TUI pass the active provider so Anthropic users get
auto-resize protection while OpenAI users don't pay it.
* Resize target dropped from 5 MB to 4 MB to slide safely under
Anthropic's boundary with header overhead.
Empirical measurements (direct API, no Hermes in the loop):
image b64 anthropic openrouter/gpt5.5 codex-oauth/gpt5.5
0.19 MB ✓ ✓ ✓
12.37 MB ✗ 400 5MB ✓ ✓
23.85 MB ✗ 400 5MB ✓ ✓
49.46 MB ✗ 413 ✓ ✓
Tests: rewrote TestOversizeHandling (5 tests): no-ceiling pass-through,
Anthropic resize fires, Anthropic skip on resize-fail, build_native_parts
routes ceiling by provider, unknown provider gets no ceiling. All 52
targeted tests pass.
* refactor(image-input): attempt native, shrink-and-retry on provider reject
Replace proactive per-provider size ceilings with a reactive shrink path
on the provider's actual rejection. All providers now attempt native
full-size attachment first; if the provider returns an image-too-large
error, the agent silently shrinks and retries once.
Why the previous design was wrong: hardcoding provider ceilings
(anthropic=5MB, others=unlimited) meant OpenAI users on a 10MB image
paid no tax, but Anthropic users lost quality on anything >5MB even
though the empirical behaviour at provider-reject time is the same
(shrink + retry). Baking the table into the routing layer also
requires updating Hermes every time a provider's limit changes.
Reactive design:
- image_routing.py: _file_to_data_url encodes native size, no ceiling.
build_native_content_parts drops its provider kwarg.
- error_classifier.py: new FailoverReason.image_too_large + pattern
match ("image exceeds", "image too large", etc.) checked BEFORE
context_overflow so Anthropic's 5MB rejection lands in the right
bucket.
- run_agent.py: new _try_shrink_image_parts_in_messages walks api
messages in-place, re-encodes oversized data: URL image parts
through vision_tools._resize_image_for_vision to fit under 4MB,
handles both chat.completions (dict image_url) and Responses
(string image_url) shapes, ignores http URLs (provider-fetched).
New image_shrink_retry_attempted flag in the retry loop fires the
shrink exactly once per turn after credential-pool recovery but
before auth retries.
E2E verified live against Anthropic claude-sonnet-4-6:
- 17.9MB PNG (23.9MB b64) attached at native size
- Anthropic returns 400 "image exceeds 5 MB maximum"
- Agent logs '📐 Image(s) exceeded provider size limit — shrank and
retrying...'
- Retry succeeds, correct response delivered in 6.8s total.
Tests: 12 new (8 shrink-helper shapes + 4 classifier signals),
replaces 5 proactive-ceiling tests with 3 simpler 'native attach works'
tests. 181 targeted tests pass. test_enum_members_exist in
test_error_classifier.py updated for the new enum value.
|
||
|
|
4a2ee6c162 |
fix(title-gen): surface auxiliary failures via _emit_auxiliary_failure
Closes #15775. Title generation swallowed exceptions at debug level and returned None, so a depleted auxiliary provider (e.g. OpenRouter 402) silently left sessions with NULL titles. Reporter observed 45 untitled sessions accumulated over 19 days with no user-visible indication. - agent/title_generator.py: accept optional failure_callback, bump log to WARNING, invoke callback on call_llm exception (swallowing callback errors so nothing can crash the fire-and-forget worker thread). - cli.py, gateway/run.py: pass agent._emit_auxiliary_failure as the callback so failures route through the existing user-visible warning channel. - tests: cover callback fires / errors are swallowed / no-callback legacy behavior / maybe_auto_title forwards kwarg to worker. |
||
|
|
943465235e |
fix(compressor): guard against bare-string items in multimodal content list
raw_content from message["content"] can be a list that contains bare
strings, not only dicts. The previous `p.get("text", "")` call raised
AttributeError on string items, crashing context compression for any
session that had a message with mixed content.
Guard with isinstance checks: dict → .get("text"), str → len(p),
fallback → len(str(p)). Adds a regression test covering the bare-string
case that would have AttributeError'd on the pre-fix code.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
cfc8befe65 |
fix(compressor): use text char sum for multimodal token estimation in _find_tail_cut_by_tokens
_find_tail_cut_by_tokens called len(content) to estimate message tokens.
When content is a list of blocks (multimodal: text + image_url), len()
returns block count (e.g. 2) rather than character count, so a message
with 500 chars of text was counted as ~10 tokens instead of ~135.
This caused the backward walk to exhaust all messages before hitting the
budget ceiling; the head_end safeguard then forced cut = n - min_tail,
shrinking the protected tail to the bare minimum and preventing effective
compression of long multimodal conversations.
Fix mirrors the existing pattern in _prune_old_tool_results (line 487):
sum(len(p.get("text", "")) for p in raw_content)
if isinstance(raw_content, list) else len(raw_content)
Tests: 3 new cases in TestTokenBudgetTailProtection — regression guard
(confirms the test fails with the bug), plain-string regression guard,
and image-only block edge case.
Fixes #16087.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
||
|
|
6c87371815 |
fix(openclaw-migration): case-preserving brand rewrite + one-time ~/.openclaw residue banner (#16327)
Two related fixes for OpenClaw-residue problems after an OpenClaw→Hermes migration (especially migrations done via OpenClaw's own tool, which doesn't archive the source directory). 1. optional-skills/migration/openclaw-migration/scripts/openclaw_to_hermes.py: rebrand_text() was rewriting ~/.openclaw/config.yaml → ~/.Hermes/config.yaml (capital H — a directory that doesn't exist). Now case-preserving: "OpenClaw" → "Hermes" (prose), but "openclaw" → "hermes" (so filesystem paths land on the real Hermes home). Regex logic unchanged — replacement function now checks if the matched text was all-lowercase and emits the replacement in the matching case. 2. agent/onboarding.py + cli.py: one-time startup banner the first time Hermes launches and finds ~/.openclaw/. Tells the user to run `hermes claw cleanup` to archive it, gated on the existing onboarding seen-flag framework (onboarding.seen.openclaw_residue_cleanup in config.yaml). Fires once per install; re-running requires wiping that flag or running cleanup directly. Tests: - 4 new TestDetectOpenclawResidue tests (present / absent / file-instead- of-dir / default-home smoke) - 2 TestOpenclawResidueHint tests (content check) - 2 TestOpenclawResidueSeenFlag tests (flag isolation + round-trip) - test_rebrand_text_preserves_filesystem_path_casing regression test with 4 scenarios including the exact ~/.openclaw/config.yaml case - Existing test_rebrand_text_* tests updated to the new case-preserving contract (lowercase input → lowercase output) Co-authored-by: teknium1 <teknium@noreply.github.com> |
||
|
|
e19854d893 |
fix(shell_hooks): parse hooks_auto_accept as strict bool/string, not bool() (#16322)
`_resolve_effective_accept()` used `return bool(cfg_val)` for the
`hooks_auto_accept` config key. In Python, `bool("false")` is `True`,
so a user setting `hooks_auto_accept: "false"` (quoted YAML string)
in `config.yaml` would silently enable auto-approval of every shell
hook, bypassing the consent prompt entirely.
Replace the coercion with the same type-aware parsing already used for
the HERMES_ACCEPT_HOOKS env var three lines above: bool passthrough,
strings checked against {1,true,yes,on} case-insensitively, everything
else (including "false", None, 0, ints) rejected.
Add TestHooksAutoAcceptParsing guarding the regression across all four
value shapes (bool, string-truthy, string-falsy, missing/None).
Reported by @sprmn24 in #16244.
|
||
|
|
635253b918 |
feat(busy): add 'steer' as a third display.busy_input_mode option (#16279)
Enter while the agent is busy can now inject the typed text via /steer — arriving at the agent after the next tool call — instead of interrupting (current default) or queueing for the next turn. Changes: - cli.py: keybinding honors busy_input_mode='steer' by calling agent.steer(text) on the UI thread (thread-safe), with automatic fallback to 'queue' when the agent is missing, steer() is unavailable, images are attached, or steer() rejects the payload. /busy accepts 'steer' as a fourth argument alongside queue/interrupt/status. - gateway/run.py: busy-message handler and the PRIORITY running-agent path both route through running_agent.steer() when the mode is 'steer', with the same fallback-to-queue safety net. Ack wording tells users their message was steered into the current run. Restart-drain queueing now also activates for 'steer' so messages aren't lost across restarts. - agent/onboarding.py: first-touch hint has a steer branch for both CLI and gateway. - hermes_cli/commands.py: /busy args_hint updated to include steer, and 'steer' is registered as a subcommand (completions). - hermes_cli/web_server.py: dashboard select widget offers steer. - hermes_cli/config.py, cli-config.yaml.example, hermes_cli/tips.py: inline docs updated. - website/docs/user-guide/cli.md + messaging/index.md: documented. - Tests: steer set/status path for /busy; onboarding hints; _load_busy_input_mode accepts steer; busy-session ack exercises steer success + two fallback-to-queue branches. Requested on X by @CodingAcct. Default is unchanged (interrupt). |
||
|
|
9a70260490 |
Revert "feat(onboarding): port first-touch hints to the TUI (#16054)" (#16062)
This reverts commit
|
||
|
|
ffd2621039 |
feat(onboarding): port first-touch hints to the TUI (#16054)
PR #16046 added /busy and /verbose hints to the classic CLI and the gateway runner but skipped the Ink TUI (and therefore the dashboard /chat page, which embeds the TUI via PTY). This extends the same latch to the TUI with TUI-native wording. The TUI's busy-input model is not the /busy knob from the CLI — single Enter while busy auto-queues, double Enter on an empty line interrupts. The new busy-input hint teaches THAT gesture instead of telling the user to flip a config that does not apply. Changes: - agent/onboarding.py — add busy_input_hint_tui() + tool_progress_hint_tui() - tui_gateway/server.py — onboarding.claim JSON-RPC (Ink triggers busy hint on enqueue) + _maybe_emit_onboarding_hint helper hooked into _on_tool_complete for the 30s/tool_progress=all path. Same config.yaml latch so each hint fires at most once per install across CLI, gateway, and TUI combined. - ui-tui/src/gatewayTypes.ts — OnboardingClaimResponse + onboarding.hint event - ui-tui/src/app/createGatewayEventHandler.ts — render the hint event as sys() - ui-tui/src/app/useSubmission.ts — claim busy_input_prompt on first busy enqueue - tests/agent/test_onboarding.py — +3 cases for TUI hint shape - tests/tui_gateway/test_protocol.py — +4 cases for onboarding.claim - website/docs/user-guide/tui.md — new 'Interrupting and queueing' section explaining the TUI's double-Enter model and the hints Validation: scripts/run_tests.sh tests/agent/test_onboarding.py \ tests/tui_gateway/test_protocol.py \ tests/gateway/test_busy_session_ack.py -> 66 passed npm --prefix ui-tui run type-check -> clean npm --prefix ui-tui run lint -> clean npm --prefix ui-tui run build -> clean |