Mirror of https://github.com/NousResearch/hermes-agent.git
synced 2026-05-21 03:39:54 +00:00
fc2754dbdff860cdeb8fe4ed5fe0464bb6295cbb
1339 Commits
bcca5ed34d
fix(deps): pin brotlicffi so aiohttp can decode Discord's Brotli attachments
Discord's CDN serves attachments with Content-Encoding: br. aiohttp's
compression_utils tries 'import brotlicffi as brotli' first and falls back
to google's Brotli, but Brotli<1.2.0's Decompressor.process() is 1-arg
while aiohttp calls it with 2 args (data, max_length). Result: every
.txt/.md/.doc uploaded to a Discord-gateway session fails to decode at
att.read() with 'Can not decode content-encoding: br' / 'TypeError:
process() takes exactly 1 argument (2 given)', the agent never sees the
bytes, and falls back to filesystem guessing.
Pin brotlicffi==1.2.0.1 in both surfaces:
- tools/lazy_deps.py 'platform.discord' tuple: Discord users on the
lazy-install path get it on first discord.py import.
- pyproject.toml [messaging] extra: users who explicitly install
hermes-agent[messaging] (skipping the lazy path) get it eagerly.
brotlicffi wins aiohttp's import race regardless of what else is
installed (try brotlicffi / except: import brotli), so existing setups
that already pulled google's Brotli transitively don't change behavior
beyond the bug fix. ~1.5 MB wheel, manylinux/macOS/Windows coverage.
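The import race described above is the usual optional-dependency fallback. The following is a minimal, generic sketch of the same idea. The helper `first_importable` is hypothetical; aiohttp itself uses an inline try/except rather than a helper like this.

```python
import importlib
from types import ModuleType


def first_importable(*names: str) -> ModuleType:
    """Return the first module in `names` that imports successfully.

    Mirrors the `try: import brotlicffi as brotli / except: import brotli`
    race: the first candidate wins whenever it is installed, regardless of
    what else is present in the environment.
    """
    errors = []
    for name in names:
        try:
            return importlib.import_module(name)
        except ImportError as exc:  # ModuleNotFoundError is a subclass
            errors.append(f"{name}: {exc}")
    raise ImportError("no candidate importable; " + "; ".join(errors))
```

Calling `first_importable("brotlicffi", "brotli")` returns brotlicffi whenever it is installed. This is why pinning brotlicffi is enough to take the incompatible Brotli<1.2.0 decompressor out of the decode path.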
E2E verified: round-trip decode of Brotli-compressed payload via
aiohttp.compression_utils.brotli succeeds with brotlicffi pinned; same
test against Brotli==1.1.0 alone reproduces the reported TypeError.
Credit to @Korkyzer for the original diagnosis and fix shape in #15744;
the lazy-deps gating layer was added on top to keep brotlicffi out of
the install path for users who don't run a Discord gateway.
Fixes #12511.
Closes #15744.
Co-authored-by: Korky <korkyzer@gmail.com>
5af672c753
chore: remove Atropos RL environments and tinker-atropos integration (#26106)
* chore: remove Atropos RL environments, tools, tests, skill, and tinker-atropos submodule
  Delete:
  - environments/ (43 files: base env, agent loop, tool call parsers, benchmarks)
  - rl_cli.py (standalone RL training CLI)
  - tools/rl_training_tool.py (all 10 rl_* tools)
  - tests: test_rl_training_tool, test_tool_call_parsers, test_managed_server_tool_support, test_agent_loop, test_agent_loop_vllm, test_agent_loop_tool_calling, test_terminalbench2_env_security
  - optional-skills/mlops/hermes-atropos-environments/
  - tinker-atropos git submodule + .gitmodules
* chore: remove RL/Atropos references from Python source
  - toolsets.py: remove rl toolset block + update comment
  - model_tools.py: remove rl_tools group + update async bridging comment
  - hermes_cli/tools_config.py: remove RL display entry, _DEFAULT_OFF_TOOLSETS, setup block, and rl_training post-setup handler
  - tools/budget_config.py: remove RL environment reference in docstring
  - tests/test_model_tools.py: remove rl_tools from expected groups
  - tests/run_agent/test_streaming_tool_call_repair.py: fix stale cross-reference
* chore: remove rl/yc-bench extras and tinker-atropos refs from pyproject.toml
  - Remove rl extra (atroposlib, tinker, fastapi, uvicorn, wandb)
  - Remove yc-bench extra
  - Remove rl_cli from py-modules
  - Remove [tool.ty.src] exclude for tinker-atropos
  - Remove [tool.ruff] exclude for tinker-atropos
  - Regenerate uv.lock
* chore: remove tinker-atropos from install/setup scripts
  - setup-hermes.sh: remove entire tinker-atropos submodule install block
  - scripts/install.sh: remove both tinker-atropos blocks (Termux + standard)
  - scripts/install.ps1: remove tinker-atropos block
  - nix/hermes-agent.nix: remove tinker-atropos pip install line
* chore: remove RL references from cli-config.yaml.example
* docs: remove Atropos/RL references from README, CONTRIBUTING, AGENTS.md
* docs: remove RL/Atropos references from website
  - Delete: environments.md, rl-training.md, mlops-hermes-atropos-environments.md
  - sidebars.ts: remove rl-training and environments sidebar entries
  - optional-skills-catalog.md: remove hermes-atropos-environments row
  - tools-reference.md: remove entire rl toolset section
  - toolsets-reference.md: remove rl row + update example
  - integrations/index.md: remove RL Training bullet
  - architecture.md: remove environments/ from tree + RL section
  - contributing.md: remove tinker-atropos setup
  - updating.md: remove tinker-atropos install + stale submodule update
* chore: remove remaining RL/Atropos stragglers
  - hermes_cli/config.py: remove TINKER_API_KEY + WANDB_API_KEY env var defs
  - hermes_cli/doctor.py: remove Submodules check section (tinker-atropos)
  - hermes_cli/setup.py: remove RL Training status check
  - hermes_cli/status.py: remove Tinker + WandB from API key status display
  - agent/display.py: remove both rl_* tool preview/activity blocks
  - website/docs: remove RL references from providers.md + env-variables.md
  - tests: remove TINKER_API_KEY from conftest, set_config_value, setup_script
* chore: remove RL training section from .env.example
4695d2716f
fix(browser): honor pre-set AGENT_BROWSER_ARGS and document the bypass
Follow-up to the sandbox-bypass env-var fix:
- Update the opt-out gate so a user-provided AGENT_BROWSER_ARGS is respected too, not just the legacy AGENT_BROWSER_CHROME_FLAGS. Previously the gate only checked the broken legacy var, so a user who pre-set AGENT_BROWSER_ARGS would still have it overwritten by Hermes's auto-injection.
- Document AGENT_BROWSER_ARGS in .env.example, the browser feature page, and the env var reference, with notes about the auto-injection on AppArmor-restricted systems (Ubuntu 23.10+, DGX Spark, containers).
- Add Anadi Jaggia to AUTHOR_MAP.
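The opt-out gate described above can be sketched as follows. The function name, the `restricted` flag (standing in for the AppArmor detection), and the flag list are all hypothetical. Only the two variable names and the comma-separated format come from these commits.

```python
from typing import MutableMapping

# Illustrative flag list: the messages above only name the --no-sandbox bypass.
_SANDBOX_BYPASS_FLAGS = ["--no-sandbox"]


def maybe_inject_sandbox_bypass(env: MutableMapping[str, str], restricted: bool) -> bool:
    """Set AGENT_BROWSER_ARGS on sandbox-restricted hosts unless the user already set browser args.

    Returns True when the environment was modified.
    """
    if not restricted:
        return False
    # Opt-out gate: a pre-set value in either the current or the legacy variable wins.
    if env.get("AGENT_BROWSER_ARGS") or env.get("AGENT_BROWSER_CHROME_FLAGS"):
        return False
    env["AGENT_BROWSER_ARGS"] = ",".join(_SANDBOX_BYPASS_FLAGS)
    return True
```

In practice this would be called on `os.environ` (or on the child process's env dict) before launching agent-browser.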
8ed2ef6f46
fix(browser): use correct env var for --no-sandbox bypass
AGENT_BROWSER_CHROME_FLAGS is not read by the agent-browser CLI. The correct env var is AGENT_BROWSER_ARGS, which takes comma-separated values. This fixes the Chrome 'No usable sandbox' crash on Ubuntu 23.10+ systems where AppArmor restricts unprivileged user namespaces. The detection logic was correct, but the fix used the wrong environment variable name and passed space-separated args instead of comma-separated ones.
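The format difference is easy to get wrong, so here is a small sketch of converting a flag list, or a legacy space-separated value, into the comma-separated form. Both helper names are hypothetical. Note that this format cannot express a flag whose value itself contains a comma.

```python
from typing import Iterable


def to_agent_browser_args(flags: Iterable[str]) -> str:
    """Join Chrome flags into the comma-separated form AGENT_BROWSER_ARGS expects."""
    return ",".join(flag.strip() for flag in flags if flag.strip())


def from_legacy_chrome_flags(value: str) -> str:
    """Convert a space-separated AGENT_BROWSER_CHROME_FLAGS-style value (hypothetical helper)."""
    return to_agent_browser_args(value.split())
```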
19071529f6
fix(lsp): shift baseline diagnostics into post-edit coordinates (#25978)
Pre-existing diagnostics below an edit point used to surface as 'LSP diagnostics introduced by this edit' whenever the edit deleted or inserted lines. The delta-filter key included the diagnostic's range, so the same logical error reported at a different line in the post-edit snapshot looked like a brand-new diagnostic.
Concrete case: deleting 14 lines in cli.py caused Pyright errors at lines 9873, 10590, 12413, and 13004 (unrelated to the edit) to be reported as introduced by it.
Fix: build a piecewise-linear line-shift map (via difflib's SequenceMatcher) from the pre- and post-edit content, and remap baseline diagnostics into post-edit coordinates before the set difference. Diagnostics in deleted regions drop out cleanly, diagnostics below the edit shift by the right amount, and diagnostics above it are untouched. The strict (range-aware) equality key stays, so a genuinely new instance of an identical error class at a different line still surfaces as new.
Pieces:
- agent/lsp/range_shift.py: build_line_shift, shift_diagnostic_range, shift_baseline. Pure functions, no LSP state.
- agent/lsp/manager.py: LSPService.get_diagnostics_sync gains an optional line_shift kwarg; the baseline is passed through shift_baseline before computing the seen-set. _diag_key keeps the strict range key.
- tools/file_operations.py: write_file captures pre_content for any LSP-handled extension (not just LINTERS_INPROC) and passes pre/post content to _maybe_lsp_diagnostics, which builds the shift map.
- New _lsp_handles_extension helper guards the pre_content read.
Trade-offs preserved:
- Genuinely new same-class errors at different lines still surface (a content-only key would have swallowed them).
- Pre-existing errors at unshifted positions still get filtered (covered by the strict-key path with no shift).
- Best-effort: when pre_content can't be captured (file didn't exist, permissions), the unshifted comparison still catches most pre-existing errors; the edge case it misses is a new file with a non-empty baseline, which is structurally impossible.
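The shift map can be sketched with `difflib.SequenceMatcher`. This is a simplified illustration rather than the code in agent/lsp/range_shift.py. Diagnostics are reduced to hypothetical `(line, message)` pairs instead of full LSP ranges. Lines in "replace" regions are treated like deleted lines, which is an assumption on my part.

```python
from difflib import SequenceMatcher
from typing import Callable, Iterable, Optional, Set, Tuple

Diagnostic = Tuple[int, str]  # (0-based line, message): simplified stand-in for an LSP diagnostic


def build_line_shift(pre: str, post: str) -> Callable[[int], Optional[int]]:
    """Map a pre-edit line to its post-edit line, or None if the line did not survive.

    Each "equal" opcode block is a constant offset, so the map is piecewise-linear.
    """
    matcher = SequenceMatcher(None, pre.splitlines(), post.splitlines(), autojunk=False)
    blocks = [(i1, i2, j1) for tag, i1, i2, j1, _ in matcher.get_opcodes() if tag == "equal"]

    def shift(line: int) -> Optional[int]:
        for i1, i2, j1 in blocks:
            if i1 <= line < i2:
                return j1 + (line - i1)
        return None  # deleted or replaced line

    return shift


def shift_baseline(baseline: Iterable[Diagnostic], shift: Callable[[int], Optional[int]]) -> Set[Diagnostic]:
    shifted = set()
    for line, message in baseline:
        new_line = shift(line)
        if new_line is not None:  # diagnostics in deleted regions drop out
            shifted.add((new_line, message))
    return shifted


def introduced(baseline: Iterable[Diagnostic], current: Iterable[Diagnostic], pre: str, post: str) -> Set[Diagnostic]:
    """Strict (line-aware) set difference, computed after remapping the baseline."""
    return set(current) - shift_baseline(baseline, build_line_shift(pre, post))
```

Keeping the line in the key is what lets a second copy of the same error at a new position still count as introduced.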
72b5dd8658
fix(update): refresh lazy-installed backends on hermes update (#25766)
Pyproject's [all] extra was slimmed down in May 2026 — ~20 optional
backends moved to tools/lazy_deps.py and only install on first use.
hermes update runs uv pip install -e .[all] which doesn't touch any of
them, so pin bumps in LAZY_DEPS (CVE response, transitive fixes) were
silently ignored on already-activated backends.
Two changes:
1. _is_satisfied() now parses the spec and checks the installed version
against the constraint via packaging.specifiers. Previously it
returned True the moment the package name was importable, which made
ensure() a name-presence gate rather than a version-pin gate.
2. New active_features() / refresh_active_features() pair: lists every
feature with at least one of its packages currently installed, then
re-runs ensure() on each. Refresh is invoked at the end of
_cmd_update_impl, right after the [all] install completes. Cold
backends (never activated) stay quiet — no churn for them.
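The active-feature discovery and refresh loop in item 2 can be sketched as follows. This is not the real tools/lazy_deps.py code. The dict shape, the injected `ensure` callable (assumed to return True when it reinstalled something), and the status strings are assumptions. The version check from item 1 (packaging.specifiers) is left out to keep the sketch dependency-free.

```python
import re
from typing import Callable, Dict, Iterable, List

_NAME = re.compile(r"^\s*([A-Za-z0-9][A-Za-z0-9._-]*)")


def spec_name(spec: str) -> str:
    """Extract the distribution name from a spec like 'honcho-ai>=1.2' (simplified, no PEP 503 normalization)."""
    match = _NAME.match(spec)
    if not match:
        raise ValueError(f"malformed spec: {spec!r}")
    return match.group(1).lower()


def active_features(lazy_deps: Dict[str, List[str]], installed: Iterable[str]) -> List[str]:
    """Features with at least one package currently installed; cold features are skipped."""
    have = {name.lower() for name in installed}
    return [feat for feat, specs in lazy_deps.items() if any(spec_name(s) in have for s in specs)]


def refresh_active_features(
    lazy_deps: Dict[str, List[str]],
    installed: Iterable[str],
    ensure: Callable[[str], bool],
    allow_installs: bool = True,
) -> Dict[str, str]:
    """Re-run ensure() on each active feature and report one status per feature."""
    report: Dict[str, str] = {}
    for feature in active_features(lazy_deps, installed):
        if not allow_installs:  # e.g. security.allow_lazy_installs=false
            report[feature] = "skipped"
            continue
        try:
            report[feature] = "refreshed" if ensure(feature) else "current"
        except Exception as exc:  # never abort the update; the old version stays installed
            report[feature] = f"failed: {exc}"
    return report
```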
Output during update is one summary block:
→ Refreshing 4 active lazy backend(s)...
↑ 1 refreshed: provider.anthropic
✓ 3 already current
or
⚠ memory.honcho failed to refresh: <pip stderr>
Failures never raise out of update — backends keep their previously-
installed version and we tell the user to rerun once upstream is fixed.
security.allow_lazy_installs=false is honored: features get marked
"skipped" with the reason shown.
Tests: 18 new unit tests covering version-aware satisfaction (exact pin,
range, extras blocks, missing package, malformed spec), active feature
discovery, and refresh status reporting. All 61 lazy_deps tests pass.
364ddd45e8
fix(terminal): prevent safety filter false positives on keywords inside quoted strings
The _foreground_background_guidance() function matched background-wrapper keywords (nohup/disown/setsid) anywhere in the command text, including inside quoted strings, Python -c code, commit messages, and PR body text.
Two-layer fix:
1. Strip single-quoted, double-quoted, and backtick-quoted content before pattern matching via a _strip_quotes() helper.
2. Tighten the regex to only match keywords at command-start positions (after ^, ;, &, &&, ||, or $(), not mid-argument.
Both layers are needed: quote stripping handles the common case of keywords in string literals, and the position-aware regex handles unquoted cases like 'export FOO=setsid' (a word-boundary match, but in the wrong position).
Fixes #20064
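The two layers can be sketched like this. `strip_quotes` and `uses_background_wrapper` are hypothetical names, and the patterns are my own reconstruction of the rules listed above, not the regexes in the terminal tool.

```python
import re

# Single-quoted, double-quoted (honoring backslash escapes) and backtick spans.
_QUOTED = re.compile(r"""'[^']*'|"(?:\\.|[^"\\])*"|`[^`]*`""")
# A keyword only counts at a command-start position: string start, ;, &&, ||, &, or $(.
_BG_WRAPPER = re.compile(r"(?:^|;|&&|\|\||&|\$\()\s*(nohup|disown|setsid)\b")


def strip_quotes(command: str) -> str:
    """Layer 1: replace every quoted span with an empty pair of quotes."""
    return _QUOTED.sub("''", command)


def uses_background_wrapper(command: str) -> bool:
    """Layer 2: position-aware keyword match on the quote-stripped command."""
    return _BG_WRAPPER.search(strip_quotes(command)) is not None
```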
1247ff2dca
fix: stop retrying initial MCP auth failures
f0e46c5e9e
fix: do not inherit api_mode when delegating across providers
Cross-provider delegation (e.g. MiniMax parent → DeepSeek child) must not inherit the parent's api_mode, because providers can use different API surfaces: MiniMax uses 'anthropic_messages' while DeepSeek uses 'chat_completions'. Inheriting the wrong mode causes 404 errors. When the effective provider differs from the parent's provider, derive api_mode from the target provider's defaults instead (passing None triggers re-derivation).
Refs: Bug #20558, PR #20563
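The rule fits in a few lines. In this sketch the function names and the lowercase provider keys are hypothetical. The two api_mode values are the ones quoted in the message, while the real defaults live in Hermes's provider configuration.

```python
from typing import Optional

# Illustrative defaults taken from the message above.
_DEFAULT_API_MODE = {
    "minimax": "anthropic_messages",
    "deepseek": "chat_completions",
}


def child_api_mode(parent_provider: str, parent_api_mode: Optional[str], child_provider: Optional[str]) -> Optional[str]:
    """api_mode a delegated child starts with; None means 're-derive from the target provider'."""
    effective = child_provider or parent_provider
    if effective == parent_provider:
        return parent_api_mode  # same provider: inheriting is safe
    return None


def resolve_api_mode(provider: str, api_mode: Optional[str]) -> Optional[str]:
    return api_mode if api_mode is not None else _DEFAULT_API_MODE.get(provider)
```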
4ca5e72444
fix(web): preserve top-level error envelope on unconfigured systems
Surfaced by local E2E behavior-parity testing of PR vs origin/main: the
plugin-migrated dispatchers were quietly changing the error envelope
shape returned to function-calling models on unconfigured systems.
Two findings, both from per-result error wrapping bleeding into the
pre-flight configuration error path:
1. **search**: ``firecrawl.search()`` caught the
``ValueError("Web tools are not configured...")`` from
``_get_firecrawl_client()`` and returned it as
``{"success": False, "error": ...}``, losing the legacy
``{"error": "Error searching web: ..."}`` envelope that
``tool_error()`` emits on main. Models that special-case the
``error`` key still detect the failure, but the prefix is part of
the legacy contract some users rely on.
2. **crawl**: ``firecrawl.crawl()`` caught the same pre-flight
``ValueError`` and wrapped it as a per-page error inside
``results[0]``. Main short-circuits on ``check_firecrawl_api_key()``
BEFORE dispatching, so its unconfigured response is
``{"success": False, "error": "web_crawl requires Firecrawl..."}``
at the top level. The PR's per-page burying hid the failure inside
``results[]`` where models that check ``result.get("error")`` would
miss it.
Fix:
- ``plugins/web/firecrawl/provider.py``: pull
``_get_firecrawl_client()`` outside the broad ``try`` in
``search()``. Pre-flight ``ValueError`` / ``ImportError`` propagate
to the dispatcher's top-level exception handler. In-flight SDK
errors still get wrapped as ``{"success": False, ...}``.
- ``tools/web_tools.py``: mirror main's upstream availability gate in
``web_crawl_tool``. When the resolved crawl provider is
``is_available()==False``, short-circuit BEFORE dispatching with the
same top-level error shape main emits.
- ``tests/tools/test_web_providers.py``: 2 regression tests
(``TestUnconfiguredErrorEnvelopeParity``) lock in the behavior so
future plugin work can't undo this.
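The search-side fix boils down to where the `try` starts. In this sketch all names are hypothetical stand-ins, and the error text is shortened from the real "Web tools are not configured..." message. It shows a pre-flight configuration error reaching the dispatcher's top-level handler while in-flight SDK errors stay wrapped.

```python
from typing import Any, Callable, Dict, Optional


def _get_client(api_key: Optional[str]) -> object:
    """Pre-flight check (hypothetical stand-in for _get_firecrawl_client)."""
    if not api_key:
        raise ValueError("Web tools are not configured")
    return object()  # placeholder SDK client


def provider_search(query: str, api_key: Optional[str], sdk_search: Callable[[object, str], Any]) -> Dict[str, Any]:
    client = _get_client(api_key)  # outside the try: configuration errors propagate
    try:
        return {"success": True, "data": sdk_search(client, query)}
    except Exception as exc:  # in-flight SDK errors are still wrapped
        return {"success": False, "error": str(exc)}


def web_search_tool(query: str, api_key: Optional[str], sdk_search: Callable[[object, str], Any]) -> Dict[str, Any]:
    try:
        return provider_search(query, api_key, sdk_search)
    except Exception as exc:  # top-level handler emits the legacy envelope
        return {"error": f"Error searching web: {exc}"}
```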
Verified via local subprocess-based parity test (14/14 scenarios match
origin/main shape exactly) and full 210/210 web test suite green.
21e3a863bb
feat(web): firecrawl plugin natively supports crawl; delete legacy inline path
The web-provider migration originally left firecrawl crawl as the only
provider-specific code remaining inline in tools/web_tools.py (~250
lines of Firecrawl-specific crawl orchestration that didn't fit the
plugin's existing surface). This commit closes that gap.
What this adds
--------------
1. plugins/web/firecrawl/provider.py: implement async ``crawl(url, **kwargs)``
- Accepts the same kwargs as the dispatcher passes to any crawl
provider (``instructions``, ``depth``, ``limit``); Firecrawl's
/crawl endpoint ignores ``instructions`` and ``depth`` so we log
and drop with a clear info message.
- Wraps the sync SDK ``crawl()`` call in asyncio.to_thread so the
gateway event loop isn't blocked on a multi-page crawl.
- Preserves the response-shape normalization across pydantic /
typed-object / dict variants that the legacy inline code did.
- Preserves per-page website-policy re-check (catches blocked
redirects after the SDK returns).
- Returns the same {"results": [...]} shape so the dispatcher's
shared LLM-summarization post-processing path works unchanged.
- Sets supports_crawl() to True so the dispatcher routes through
the plugin instead of the legacy fallthrough.
2. tools/web_tools.py: delete the entire legacy firecrawl crawl block
that used to run after "No registered provider supports crawl" —
~270 lines including:
- check_firecrawl_api_key gate + typed error
- inline SSRF + website-policy seed-URL gate (dispatcher already
does this)
- Firecrawl client setup with crawl_params
- 100+ lines of pydantic/dict/typed-object normalization
- Per-page LLM-processing loop (kept in the dispatcher's shared
post-processing path; that's where it always belonged)
- trimming + base64 image cleanup (still done in the dispatcher's
shared path)
Replaced with a single typed-error branch when no crawl-capable
provider is available: "web_crawl has no available backend. Set
FIRECRAWL_API_KEY (or FIRECRAWL_API_URL for self-hosted), or set
TAVILY_API_KEY for Tavily."
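The shape of the new async crawl method can be sketched without the Firecrawl SDK. `sdk_crawl` is a hypothetical stand-in for the blocking SDK call, and `is_allowed` is a hypothetical stand-in for the per-page website-policy check. The `asyncio.to_thread` offload, the dropped `instructions`/`depth` kwargs, and the `{"results": [...]}` return shape follow the description above.

```python
import asyncio
import logging
from typing import Any, Callable, Dict, List

logger = logging.getLogger(__name__)

_IGNORED_CRAWL_KWARGS = ("instructions", "depth")


async def crawl(
    url: str,
    sdk_crawl: Callable[..., List[Dict[str, Any]]],
    is_allowed: Callable[[str], bool] = lambda _url: True,
    **kwargs: Any,
) -> Dict[str, Any]:
    for key in _IGNORED_CRAWL_KWARGS:
        if kwargs.pop(key, None) is not None:
            logger.info("crawl: backend ignores %r; dropping it for %s", key, url)
    limit = kwargs.pop("limit", None)
    # Run the blocking SDK call in a worker thread so the event loop keeps serving.
    pages = await asyncio.to_thread(sdk_crawl, url, limit=limit)
    results: List[Dict[str, Any]] = []
    for page in pages:
        page_url = page.get("url", "")
        # Per-page policy re-check catches redirects to blocked hosts.
        if is_allowed(page_url):
            results.append(page)
        else:
            results.append({"url": page_url, "error": "Blocked by website policy"})
    return {"results": results}
```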
Test updates
------------
- tests/tools/test_website_policy.py:
- test_web_crawl_short_circuits_blocked_url: dispatcher seed-URL
gate still runs on web_tools.check_website_access (no change to
that patch), but the firecrawl client lockdown moved to the
plugin module — patch firecrawl_provider._get_firecrawl_client
instead of web_tools._get_firecrawl_client. The dispatcher
short-circuits before the plugin runs, so the test still passes.
- test_web_crawl_blocks_redirected_final_url: patch the per-page
policy gate at plugins.web.firecrawl.provider.check_website_access
(where it now runs) AND on web_tools (where the seed-URL gate
still runs). Patch firecrawl_provider._get_firecrawl_client for
the FakeCrawlClient injection. Both checks flow through the same
fake_check function.
- tests/plugins/web/test_web_search_provider_plugins.py:
- Update parametrized capability-flag spec: firecrawl supports_crawl
is now True.
- Add test_firecrawl_crawl_returns_error_dict_when_unconfigured —
verifies inspect.iscoroutinefunction(p.crawl) is True and that
the async crawl returns a per-page error dict (not a raise) when
FIRECRAWL_API_KEY is missing.
Verified
--------
- 218/218 web tests pass (was 173, +44 plugin tests + 1 new firecrawl
crawl test from this commit = 218 with the test deduplication).
- Compile-clean (py_compile passes on both files).
- Provider capabilities matrix confirmed end-to-end:
    name       search  extract  crawl  async-extract?  async-crawl?
    firecrawl  True    True     True   True            True
    tavily     True    True     True   False           False
Both crawl-capable providers exercise the dispatcher's
inspect.iscoroutinefunction async-or-sync detection.
Net diff
--------
- tools/web_tools.py: -254 lines (legacy inline crawl gone)
- plugins/web/firecrawl/provider.py: +185 lines (crawl method)
- test_website_policy.py: +14/-9 lines (patch locations)
- test_web_search_provider_plugins.py: +22/-1 lines (capability flag
+ new firecrawl crawl test)
- Total: -32 net LoC; tools/web_tools.py is now 1509 lines (was 1763
before this commit, 2227 before the migration started).
39b4ebfcea
refactor(web): delete legacy tools/web_providers/ directory + migrate ABC tests
Removes the legacy in-tree provider scaffolding that PR #25182 fully replaced with the plugin architecture:
- tools/web_providers/__init__.py (6 lines)
- tools/web_providers/base.py (89 lines: old ABCs)
- tools/web_providers/ARCHITECTURE.md (73 lines: old design doc)
These were the staging-ground ABCs and provider modules that the plugin migration absorbed. All seven web providers now implement the single :class:`agent.web_search_provider.WebSearchProvider` ABC and live under ``plugins/web/<vendor>/``. Nothing else in the tree imports ``tools.web_providers`` (verified via grep before deletion).
Test migration (tests/tools/test_web_providers.py)
--------------------------------------------------
Rewrote ``TestWebProviderABCs`` to test the new unified ABC at :mod:`agent.web_search_provider`:
- test_cannot_instantiate_abc_directly: abstract ``name`` + ``is_available``
- test_concrete_search_only_provider_works: exercises the default ``supports_extract=False`` / ``supports_crawl=False`` flags
- test_concrete_multi_capability_provider_works: exercises all three capabilities, async extract supported (declared sync here for simplicity; real plugins like parallel + firecrawl use async)
- test_search_only_provider_skips_extract_and_crawl: verifies that ``supports_*()`` flags default to False so search-only providers don't have to implement extract() or crawl()
The 9 other tests in the file (per-capability backend selection, DEFAULT_CONFIG merge, dispatcher routing) test public helpers in ``tools.web_tools`` that still exist and pass unchanged.
The agent/web_search_provider.py docstring is updated to reflect that the legacy ABCs no longer exist; the response-shape contract is preserved bit-for-bit so external consumers see no behavioral change.
Net diff
--------
- tools/web_providers/ removed (-168 lines)
- tests/tools/test_web_providers.py rewritten ABC section (+78/-30 net, same coverage, new API)
- agent/web_search_provider.py docstring (-3/+5 lines)
Verified
--------
- 173/173 targeted web tests pass
- 12/12 ABC contract tests pass with the new interface
- No remaining grep hits for ``tools.web_providers`` outside of intentional historical references in plugin docstrings.
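The contract these tests pin down (abstract `name` and `is_available`, with capability flags defaulting to False) can be sketched as a tiny ABC. This is an illustration, not the real agent.web_search_provider module. `EchoSearch` is a hypothetical toy provider, and its return value follows the `{success, data.web[]}` search contract mentioned elsewhere in this log.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict


class WebSearchProvider(ABC):
    """Minimal sketch of a unified provider ABC."""

    @property
    @abstractmethod
    def name(self) -> str:
        """Registry key, e.g. 'brave-free'."""

    @abstractmethod
    def is_available(self) -> bool:
        """True when the provider is configured and usable."""

    def supports_extract(self) -> bool:
        return False  # search-only providers need not implement extract()

    def supports_crawl(self) -> bool:
        return False  # ...or crawl()

    def search(self, query: str, limit: int = 5) -> Dict[str, Any]:
        raise NotImplementedError


class EchoSearch(WebSearchProvider):
    """Toy search-only provider: implements search() and nothing else."""

    @property
    def name(self) -> str:
        return "echo"

    def is_available(self) -> bool:
        return True

    def search(self, query: str, limit: int = 5) -> Dict[str, Any]:
        return {"success": True, "data": {"web": [{"title": f"{query} #{i}"} for i in range(limit)]}}
```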
748f3e016b
refactor(web): delete inline vendor helpers, re-export from plugins
Removes ~580 lines of dead code from tools/web_tools.py that were superseded by the plugin migration but kept around in the cutover commit to keep the diff focused. Replaces them with thin re-export shims so existing tests and external callers that reach for the legacy ``tools.web_tools.<name>`` paths continue to work transparently.
Deleted from tools/web_tools.py
--------------------------------
- Lazy Firecrawl SDK proxy (_load_firecrawl_cls, _FirecrawlProxy, _FIRECRAWL_CLS_CACHE, the Firecrawl singleton)
- Firecrawl client section (_get_direct_firecrawl_config, _get_firecrawl_gateway_url, _is_tool_gateway_ready, _has_direct_firecrawl_config, _raise_web_backend_configuration_error, _firecrawl_backend_help_suffix, _get_firecrawl_client)
- Parallel client section (_get_parallel_client, _get_async_parallel_client, _parallel_client, _async_parallel_client)
- Tavily client section (_TAVILY_BASE_URL, _tavily_request, _normalize_tavily_search_results, _normalize_tavily_documents)
- Generic SDK normalizers (_to_plain_object, _normalize_result_list, _extract_web_search_results, _extract_scrape_payload)
- Exa client section (_get_exa_client, _exa_client, _exa_search, _exa_extract)
- Parallel helpers (_parallel_search, _parallel_extract)
- Duplicate inline check_firecrawl_api_key
Net: tools/web_tools.py drops from 2227 → 1613 lines (-614 lines).
Re-exports added at top of tools/web_tools.py
---------------------------------------------
- From plugins.web.firecrawl.provider: Firecrawl, _FirecrawlProxy, _FIRECRAWL_CLS_CACHE, _load_firecrawl_cls, _get_direct_firecrawl_config, _get_firecrawl_gateway_url, _is_tool_gateway_ready, _has_direct_firecrawl_config, _firecrawl_backend_help_suffix, _raise_web_backend_configuration_error, _get_firecrawl_client, _to_plain_object, _normalize_result_list, _extract_web_search_results, _extract_scrape_payload, check_firecrawl_api_key
- From plugins.web.tavily.provider: _tavily_request, _normalize_tavily_search_results, _normalize_tavily_documents
- From plugins.web.parallel.provider: _get_parallel_client, _get_async_parallel_client
- From plugins.web.exa.provider: _get_exa_client
Plus retained module-level imports for backward compatibility with tests:
- httpx (tests patch tools.web_tools.httpx for tavily request mocking)
- build_vendor_gateway_url, _read_nous_access_token, resolve_managed_tool_gateway, managed_nous_tools_enabled, prefers_gateway (tests patch tools.web_tools.<name>)
Plugin indirection pattern (key technique)
------------------------------------------
For functions inside the firecrawl/parallel/exa plugins to honor unit-test patches that target ``tools.web_tools.<name>``, the plugin implementations now do ``import tools.web_tools as _wt`` at call time and read helper names through that module (``_wt._read_nous_access_token``, ``_wt.Firecrawl``, ``_wt.prefers_gateway``, etc.). This makes the existing test patches transparently reach the plugin code without any test changes.
The cached client globals (_firecrawl_client, _firecrawl_client_config, _parallel_client, _async_parallel_client, _exa_client) also now live on tools.web_tools, so existing test setup_method handlers that reset ``tools.web_tools._<vendor>_client = None`` between cases keep working. The plugins read/write the cache via getattr/setattr on the web_tools module.
Verified
--------
- 173/173 targeted web tests pass: test_web_providers.py, test_web_providers_brave_free.py, test_web_providers_ddgs.py, test_web_providers_searxng.py, test_web_tools_config.py, test_web_tools_tavily.py, test_website_policy.py, test_config_null_guard.py
- Compile-clean (py_compile.compile passes)
- All inline implementations now exist in exactly one place (plugins.web.<vendor>.provider)
Follow-up clean-up
------------------
- Drop _WEB_PLUGIN_SKIPLIST + hardcoded TOOL_CATEGORIES["web"] rows (next commit)
- Delete tools/web_providers/ directory entirely
- Add tests/plugins/web/ coverage
- Full tests/tools/ + tests/gateway/ regression sweep before promoting PR
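The indirection technique can be demonstrated with a throwaway module. In this sketch, `web_tools_demo` is a hypothetical stand-in for tools.web_tools, registered in `sys.modules`. The demo contrasts an early `from ... import` binding, which a later patch cannot reach, with a call-time module lookup, which it can. It also includes the getattr/setattr client cache that tests reset between cases.

```python
import sys
import types

# Hypothetical stand-in for tools.web_tools, registered so it can be imported by name.
facade = types.ModuleType("web_tools_demo")
facade.prefers_gateway = lambda: False
facade._demo_client = None
sys.modules["web_tools_demo"] = facade

# Early binding: copies the function object once, at plugin import time.
from web_tools_demo import prefers_gateway as _early_prefers_gateway


def backend_early_bound() -> str:
    return "gateway" if _early_prefers_gateway() else "direct"


def backend_call_time() -> str:
    import web_tools_demo as _wt  # looked up on every call, so patches are visible
    return "gateway" if _wt.prefers_gateway() else "direct"


def get_cached_client() -> object:
    import web_tools_demo as _wt
    client = getattr(_wt, "_demo_client", None)
    if client is None:
        client = object()  # placeholder for a real SDK client
        setattr(_wt, "_demo_client", client)
    return client
```

Patching `facade.prefers_gateway` (what `monkeypatch.setattr` on tools.web_tools does) changes `backend_call_time()` but not `backend_early_bound()`.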
5e54330e27
fix(web): preserve firecrawl crawl + website-policy gate after migration
Two regressions discovered by running the full tests/tools/ suite after
the dispatcher cutover, both fixed in this commit:
1. web_crawl_tool incorrectly errored "search-only" for firecrawl
---------------------------------------------------------------------
The cutover treated any provider with supports_crawl()==False as a
search-only backend and returned the typed search-only error. But
firecrawl can crawl via the legacy multi-page-extract path inside
web_crawl_tool — it just doesn't expose supports_crawl on the plugin
(adding native firecrawl crawl is a clean follow-up).
Fix: only emit the search-only error when the provider supports
NEITHER crawl NOR extract (brave-free / ddgs / searxng). When the
provider supports extract but not crawl (firecrawl), fall through to
the legacy firecrawl-via-extract path below.
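The corrected routing rule reduces to a three-way decision on the provider's two capability flags. The enum and function names below are hypothetical.

```python
from enum import Enum


class CrawlRoute(Enum):
    NATIVE_CRAWL = "native_crawl"
    EXTRACT_FALLBACK = "extract_fallback"
    SEARCH_ONLY_ERROR = "search_only_error"


def route_crawl(supports_crawl: bool, supports_extract: bool) -> CrawlRoute:
    """Pick a web_crawl path for the configured provider."""
    if supports_crawl:
        return CrawlRoute.NATIVE_CRAWL
    if supports_extract:
        # firecrawl at this commit: fall through to the legacy extract-based path
        return CrawlRoute.EXTRACT_FALLBACK
    # brave-free / ddgs / searxng: neither crawl nor extract
    return CrawlRoute.SEARCH_ONLY_ERROR
```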
2. firecrawl plugin's check_website_access wasn't patchable
---------------------------------------------------------------------
The plugin imported `from tools.website_policy import check_website_access`
INSIDE the extract() function body, so monkeypatching the name on
plugins.web.firecrawl.provider had no effect — the inner import re-bound
the name on every call.
Fix: hoist the import to module level. Cheap (website_policy itself
has no heavy deps) and makes the standard
monkeypatch.setattr(firecrawl_provider, "check_website_access", ...)
pattern work.
Test updates (tests/tools/test_website_policy.py — 4 tests):
- test_web_extract_short_circuits_blocked_url
- test_web_extract_blocks_redirected_final_url
Both: patch the gate at plugins.web.firecrawl.provider (where it
runs after migration) and force the firecrawl plugin to be the
active extract provider via FIRECRAWL_API_KEY.
- test_web_crawl_short_circuits_blocked_url
- test_web_crawl_blocks_redirected_final_url
Both: unchanged — the dispatcher-level gate at tools.web_tools.py
line 1651 still uses the imported `check_website_access` name and
the firecrawl-fallthrough path is exercised as before.
Verified: 22/22 tests/tools/test_website_policy.py pass.
b05253ceed
refactor(web): dispatch all three tools through web_search_registry
Cuts over web_search_tool, web_extract_tool, and web_crawl_tool in
tools/web_tools.py to dispatch through agent.web_search_registry
instead of the legacy hardcoded if-elif backend chains.
Per-tool changes:
web_search_tool (sync)
Replace 5 backend branches (parallel, exa, registry-3-providers,
tavily, firecrawl-fallthrough) with a single registry path:
1. _get_search_backend() resolves the configured name
2. _wsp_get_provider(name) for explicit-config-wins semantics
3. get_active_search_provider() fallback for typo / unknown name
4. provider.search(query, limit) — sync for all 7 providers
web_extract_tool (async)
Replace 4 backend branches (parallel-async, exa-sync, tavily-sync,
search-only-error, firecrawl-perurl-loop) with:
1. Same provider resolution as search.
2. When configured backend IS registered but doesn't support
extract (search-only providers like brave-free), surface a
typed "search-only" error matching the legacy text — tests
assert that wording.
3. inspect.iscoroutinefunction(provider.extract) detects sync vs
async: parallel + firecrawl are async; exa + tavily are sync.
Sync extracts run in asyncio.to_thread() so we don't block.
web_crawl_tool (async)
Replace tavily-specific branch + search-only-error block with:
1. _wsp_get_provider(backend) — explicit config first
2. Search-only typed error when the configured name doesn't
support crawl (matches legacy phrasing)
3. get_active_crawl_provider() fallback otherwise
4. provider.crawl(url, **kwargs) — async-or-sync dispatch as above
5. Response post-processing (LLM summarization, trimming) stays
unchanged — it's not provider-specific.
When no plugin advertises supports_crawl, falls through to the
existing Firecrawl-via-web-summarize path below (unchanged).
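The resolution and async-or-sync dispatch steps above can be sketched together. `_REGISTRY`, `register`, `resolve`, and `call_capability` are hypothetical stand-ins for agent.web_search_registry and the dispatcher internals. The explicitly configured provider is returned even if it lacks the capability, so the caller can still emit the typed search-only error.

```python
import asyncio
import inspect
from typing import Any, Dict, Optional

_REGISTRY: Dict[str, Any] = {}  # hypothetical stand-in for agent.web_search_registry


def register(provider: Any) -> None:
    _REGISTRY[provider.name] = provider


def resolve(configured: Optional[str], capability: str) -> Optional[Any]:
    """Explicit config wins; otherwise use the first available provider with the capability."""
    provider = _REGISTRY.get(configured or "")
    if provider is not None:
        return provider  # caller checks supports_<capability>() for the search-only error
    for candidate in _REGISTRY.values():
        supports = getattr(candidate, f"supports_{capability}", lambda: False)
        if candidate.is_available() and supports():
            return candidate
    return None


async def call_capability(method: Any, *args: Any, **kwargs: Any) -> Any:
    """Await async providers; run sync ones in a worker thread so the loop is not blocked."""
    if inspect.iscoroutinefunction(method):
        return await method(*args, **kwargs)
    return await asyncio.to_thread(method, *args, **kwargs)
```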
Test updates (2 tests in tests/tools/test_web_tools_config.py):
- test_web_search_clamps_limit_before_backend_call:
patch("tools.web_tools._parallel_search") -> patch the registry
provider returned by agent.web_search_registry.get_provider
- test_search_error_response_does_not_expose_diagnostics:
patch("tools.web_tools._get_firecrawl_client") -> same pattern
Tests unchanged (still pass):
- All TestXBackendWiring classes (test _get_backend / _is_backend_available
config-resolution, independent of dispatch)
- All TestXSearchOnlyErrors classes (test the search-only error path
via web_extract_tool / web_crawl_tool — error text preserved)
- 141 passing web tests total, 0 regressions.
Dead-code cleanup deferred to a follow-up commit so this diff stays
focused on the cutover. After this commit:
- tools.web_tools._exa_search / _exa_extract / _parallel_search /
_parallel_extract / _tavily_request / _normalize_tavily_* /
_get_firecrawl_client / _extract_web_search_results /
_extract_scrape_payload / _to_plain_object / _normalize_result_list
are no longer called by the dispatchers, but still exist.
- The config-resolution layer (_get_backend, _is_backend_available,
_is_tool_gateway_ready, _has_direct_firecrawl_config) IS still in
use and must stay.
- The Firecrawl proxy and check_firecrawl_api_key are still imported
by integration tests and patched by unit tests — must stay (or be
re-exported from the plugin).
6b219f5af6
refactor(web): remove legacy in-tree provider modules
Deletes tools/web_providers/{brave_free,ddgs,searxng}.py — the three
providers that moved to plugins/web/ in prior commits. tools/web_tools.py
no longer imports them (registry dispatch as of
6bd16a645b
refactor(web): dispatch brave-free/ddgs/searxng via web_search_registry
The three migrated providers (brave-free, ddgs, searxng) are now dispatched
through agent.web_search_registry.get_provider() instead of importing
their concrete classes directly. The four inline providers (parallel, exa,
tavily, firecrawl) keep their existing branches — they live in
tools/web_tools.py itself and aren't part of this spike's plugin extraction.
The legacy tools/web_providers/{brave_free,ddgs,searxng}.py modules are
still in place (untouched by this commit) — Task 10 deletes them once the
real migration PR is ready. Keeping them alive during the spike means
revertibility is trivial.
E2E verified:
1. Plugin discovery registers ['brave-free','ddgs','searxng']
2. Config web.search_backend: brave-free resolves to the plugin instance
3. Dispatch result matches the original {success, data.web[]} contract
4. compile OK; no new LSP errors beyond pre-existing ones in web_tools.py
d898e0eb7f
fix(gateway): complete lazy-install rebind for slack/feishu/matrix + add ensure_and_bind helper (#25038)
Fixes #25028. The lazy-install hooks added in #25014 installed packages correctly but failed to rebind module-level globals after install:
- Slack: missing aiohttp rebind → NameError on file uploads
- Feishu: none of the ~25 lark_oapi symbols rebound → TypeError on adapter instantiation
- Matrix: mautrix.types enums stayed as stubs → mismatched values at runtime
Introduces tools.lazy_deps.ensure_and_bind(), a DRY helper that combines ensure() + importer-callable + globals().update(). This eliminates the error-prone pattern of manually listing every global that needs updating after lazy-install. Each platform adapter now defines a single _import() function returning all bindings.
Also fixes: pyproject.toml [slack] extra was missing aiohttp (needed by slack-bolt's async path).
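The helper's three steps can be sketched as follows. The signature is hypothetical, since the message names only the ingredients. `ensure` is injected here, and the caller would pass its own `globals()` as `namespace` together with its `_import()` function.

```python
from typing import Any, Callable, Dict, MutableMapping


def ensure_and_bind(
    feature: str,
    importer: Callable[[], Dict[str, Any]],
    namespace: MutableMapping[str, Any],
    ensure: Callable[[str], None],
) -> Dict[str, Any]:
    """Install a lazy feature, import its symbols, and rebind them in one step.

    `ensure` installs the feature's packages (raising on failure); `importer` is the
    adapter's single _import()-style function returning every name to rebind.
    """
    ensure(feature)
    bindings = importer()
    namespace.update(bindings)  # replaces the stubs in the caller's globals()
    return bindings
```

Because every binding comes from one `_import()` return value, an adapter cannot install a package and then forget to rebind one of its symbols, which is the failure mode listed above for Slack, Feishu, and Matrix.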
52521c937a
fix(install): skip browser download when system chromium exists
7f08cb5941
fix(tts): align MiniMax TTS defaults with current API and add GroupId support
Follow-up on @pty819's t2a_v2 endpoint fix:
- Default model: speech-02 -> speech-02-hd (bare 'speech-02' is not in the supported enum; t2a_v2 rejects it with 400). Official enum: speech-01-hd, speech-01-turbo, speech-02-hd, speech-02-turbo, speech-2.6-hd/turbo, speech-2.8-hd/turbo.
- Default voice: female-shaonv -> English_expressive_narrator. The legacy speech-01-series short ID doesn't resolve cleanly on the speech-02+ models that are now the default.
- Default base URL: api.minimaxi.com -> api.minimax.io (matches the canonical host in the published docs; api-uw.minimax.io is the reduced-latency alternative).
- Add GroupId support via the tts.minimax.group_id config key or the MINIMAX_GROUP_ID env var. Some MiniMax accounts scope TTS requests by group; without it, requests fail with 401. The GroupId is only appended when it is not already present in the user's base_url.
Tests rewritten to cover both the default t2a_v2 path (hex-encoded audio in JSON, nested voice_setting/audio_setting) and the legacy text_to_speech path (raw audio bytes, flat payload). Adds coverage for GroupId config/env wiring and error surfacing.
Also adds an AUTHOR_MAP entry for pty819's GitHub noreply email.
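The "only appended when not already in base_url" rule can be sketched with `urllib.parse`. The sketch rests on two assumptions that the message does not state. First, GroupId is assumed to travel as a `GroupId` query parameter. Second, the config key is assumed to take precedence over the env var. Both helper names are hypothetical.

```python
from typing import Any, Mapping, Optional
from urllib.parse import parse_qs, urlencode, urlsplit, urlunsplit


def with_group_id(base_url: str, group_id: Optional[str]) -> str:
    """Append GroupId as a query parameter unless it is missing or already present."""
    if not group_id:
        return base_url
    parts = urlsplit(base_url)
    if "GroupId" in parse_qs(parts.query):
        return base_url  # the user already put it in base_url; leave it alone
    query = f"{parts.query}&" if parts.query else ""
    query += urlencode({"GroupId": group_id})
    return urlunsplit(parts._replace(query=query))


def resolve_group_id(config: Mapping[str, Any], env: Mapping[str, str]) -> Optional[str]:
    """tts.minimax.group_id from config first, then MINIMAX_GROUP_ID (precedence assumed)."""
    tts = config.get("tts") or {}
    minimax = tts.get("minimax") or {}
    return minimax.get("group_id") or env.get("MINIMAX_GROUP_ID")
```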
||
|
|
c875c0dc11 |
fix(tts): update MiniMax default model to speech-02 and correct API endpoint
The MiniMax TTS defaults were outdated:
- DEFAULT_MINIMAX_MODEL was 'speech-01' but MiniMax now uses 'speech-02'
- DEFAULT_MINIMAX_BASE_URL was 'https://api.minimax.chat/v1/text_to_speech', which no longer works; the correct endpoint is 'https://api.minimaxi.com/v1/t2a_v2'
Users who configured tts.provider: minimax were getting model-not-supported errors because the hardcoded defaults did not match available API permissions. |
||
|
|
9d42c2c286 |
feat(video_gen): unified video_generate tool with pluggable provider backends (#25126)
* feat(video_gen): unified video_generate tool with pluggable provider backends One core video_generate tool, every backend a plugin. Mirrors the image_gen + memory_provider + context_engine architecture: ABC, registry, plugin-context registration hook, and per-plugin model catalogs surfaced through hermes tools. Surface (one schema, every backend): - operation: generate / edit / extend - modalities: text-to-video (prompt only), image-to-video (prompt + image_url), video edit (prompt + video_url), video extend (video_url) - reference_image_urls, duration, aspect_ratio, resolution, negative_prompt, audio, seed, model override - Providers ignore unknown kwargs and declare what they support via VideoGenProvider.capabilities() — backend-specific quirks stay in the backend, the agent learns one tool Backends shipped: - plugins/video_gen/xai/ — Grok-Imagine, full generate/edit/extend + image-to-video + reference images (salvaged from PR #10600 by @Jaaneek, reshaped into the plugin interface) - plugins/video_gen/fal/ — Veo 3.1 (t2v + i2v), Kling O3 i2v, Pixverse v6 i2v with model-aware payload building that drops keys a model doesn't declare Wiring: - agent/video_gen_provider.py — VideoGenProvider ABC, normalize_operation, success_response / error_response, save_b64_video / save_bytes_video, $HERMES_HOME/cache/videos/ - agent/video_gen_registry.py — thread-safe register/get/list + get_active_provider() reading video_gen.provider from config.yaml - hermes_cli/plugins.py — PluginContext.register_video_gen_provider() - hermes_cli/tools_config.py — Video Generation category in hermes tools, plugin-only providers list, model picker per plugin, config write to video_gen.{provider,model} - toolsets.py — new video_gen toolset - tests: 31 new tests covering ABC, registry, tool dispatch, both plugins - docs: developer-guide/video-gen-provider-plugin.md (parallel to the image-gen guide), sidebar + toolsets-reference + plugin guides updated Supersedes: #25035 (FAL), #17972 (FAL), 
#14543 (xAI), #13847 (HappyHorse), #10458 (provider categories), #10786 (xAI media+search bundle), #2984 (FAL duplicate), #19086 (Google Veo standalone — easy port to plugin interface). Co-authored-by: Jaaneek <Jaaneek@users.noreply.github.com> * feat(video_gen): dynamic schema reflects active backend's capabilities Address the 'capability variance' question — instead of one tool with a static schema that lies about what every backend supports, the video_generate tool now rebuilds its description at get_definitions() time based on the configured video_gen.provider and video_gen.model. The agent sees backend-specific guidance up-front: - 'fal-ai/veo3.1/image-to-video': 'image-to-video only — image_url is REQUIRED; text-only prompts will be rejected' - 'fal-ai/veo3.1' (t2v): no image_url restriction shown - xAI grok-imagine-video: 'operations: generate, edit, extend; up to 7 reference_image_urls' - Backends without edit/extend: 'not supported on this backend — surface that they need to switch backends via hermes tools' This is the same pattern PR #22694 used for delegate_task self-capping — documented in the dynamic-tool-schemas skill. Cache invalidation is free: get_tool_definitions() already memoizes on config.yaml mtime, so a mid-session backend swap rebuilds the schema automatically. 
Tested: - Empirical FAL OpenAPI schema check confirms image-to-video models require image_url (FAL returns HTTP 422 otherwise) — client-side rejection in FALVideoGenProvider.generate() now prevents the wasted round-trip - Live E2E: fal-ai/veo3.1/image-to-video + prompt-only → clean missing_image_url error; fal-ai/veo3.1 + prompt-only → dispatches - 6 new tests cover the builder (no config / image-only / full-surface / text-only / unknown provider / registry wiring), all passing - 37/37 in the slice, 134/134 in the broader regression set * test(video_gen/xai): full surface integration tests + cleaner schema Verified end-to-end that the xAI plugin handles every documented mode from PR #10600's surface: text-to-video, image-to-video, reference-images-to-video, video edit, video extend (with and without prompt). All five modes route to the correct xAI endpoint (/videos/generations, /videos/edits, /videos/extensions) with the right payload shape (image / reference_images / video keys), and all five client-side rejections fire before the network: edit-without-prompt, extend-without-video_url, image+refs conflict, >7 references, and duration/aspect_ratio clamping. 15 new integration tests grouped into four classes (endpoint routing, modalities, validation, clamping). httpx is stubbed via a small fake AsyncClient that records POSTs so the tests assert the actual payload the plugin would send to xAI — not just the success/error envelope. Also cleaned up a description redundancy: when a model's operations match the backend's overall set, we no longer print the duplicate 'operations supported by this model' line. xAI's description now reads: Active backend: xAI . 
model: grok-imagine-video - operations supported by this backend: edit, extend, generate - modalities supported by this backend: image, reference_images, text - aspect_ratio choices: 16:9, 1:1, 2:3, 3:2, 3:4, 4:3, 9:16 - resolution choices: 480p, 720p - duration range: 1-15s - reference_image_urls: up to 7 images Co-authored-by: Jaaneek <Jaaneek@users.noreply.github.com> * feat(video_gen): collapse surface to t2v + i2v, family-based auto-routing Two design changes per Teknium: 1) Drop edit/extend from the tool surface entirely. Only text-to-video and image-to-video remain. The agent sees a clean tool with two modalities; backend-specific quirks like xAI's edit/extend endpoints stay out of the unified schema. 2) FAL: pick a model FAMILY once, the plugin routes between the family's text-to-video and image-to-video endpoints based on whether image_url was passed. Users no longer pick 'fal-ai/veo3.1' AND 'fal-ai/veo3.1/image-to-video' as separate options — they pick 'veo3.1', and the plugin handles the rest. Catalog rewritten as families: veo3.1 fal-ai/veo3.1 / fal-ai/veo3.1/image-to-video pixverse-v6 fal-ai/pixverse/v6/text-to-video / fal-ai/pixverse/v6/image-to-video kling-o3-standard fal-ai/kling-video/o3/standard/text-to-video / fal-ai/kling-video/o3/standard/image-to-video xAI uses a single endpoint (/videos/generations) for both modes, routed by the presence of the 'image' field in the payload — no edit/extend exposure. Schema changes: - VIDEO_GENERATE_SCHEMA: drop operation, drop video_url. Final params: prompt (required), image_url, reference_image_urls, duration, aspect_ratio, resolution, negative_prompt, audio, seed, model. - VideoGenProvider ABC: drop normalize_operation, VALID_OPERATIONS, DEFAULT_OPERATION. capabilities() drops 'operations' key. - success_response: add 'modality' field ('text' | 'image') so the agent and logs can see which endpoint was actually hit. 
Dynamic schema builder simplified — no operations bullet, no 'switch backends if you need edit/extend' guidance. When the active backend supports both modalities (the common case), description reads: Active backend: FAL . model: pixverse-v6 - supports both text-to-video (omit image_url) and image-to-video (pass image_url) - routes automatically - aspect_ratio choices: 16:9, 9:16, 1:1 - resolution choices: 360p, 540p, 720p, 1080p - duration range: 1-15s - audio: pass audio=true to enable native audio (pricing tier) - negative_prompt: supported Tests: 51 in the video_gen slice, 216 across the broader image+video sweep, all passing. New FAL routing tests prove pixverse-v6 + no image hits text-to-video endpoint, pixverse-v6 + image_url hits image-to-video endpoint, same for veo3.1 and kling-o3-standard. Docs updated: developer-guide page rewrites the 'model families' pattern as a first-class section so external plugin authors know the convention. toolsets-reference and toolsets.py descriptions match the new surface. Co-authored-by: Jaaneek <Jaaneek@users.noreply.github.com> * feat(video_gen/fal): expand catalog to 6 families, cheap + premium tiers Catalog now covers everything Teknium specced from FAL: Cheap tier: ltx-2.3 fal-ai/ltx-2.3-22b/text-to-video / image-to-video pixverse-v6 fal-ai/pixverse/v6/text-to-video / image-to-video Premium tier: veo3.1 fal-ai/veo3.1 / fal-ai/veo3.1/image-to-video seedance-2.0 bytedance/seedance-2.0/text-to-video / image-to-video kling-v3-4k fal-ai/kling-video/v3/4k/text-to-video / image-to-video happy-horse fal-ai/happy-horse/text-to-video / image-to-video DEFAULT_MODEL moved from veo3.1 (premium) to pixverse-v6 (cheap, sane defaults, both modalities) — better first-run UX for users who haven't explicitly picked a model. New family-entry knob: image_param_key. Kling v3 4K's image-to-video endpoint expects start_image_url instead of image_url; declaring image_param_key='start_image_url' on the family lets _build_payload remap correctly. 
Other families default to plain image_url. Per-family capability flags reflect each model's docs: - LTX 2.3 + Happy Horse: minimal payloads (no duration/aspect/resolution enum exposed by FAL — let endpoint apply defaults) - Seedance: 6 aspect ratios incl 21:9, durations 4-15, audio supported, negative prompts NOT supported per docs - Kling v3 4K: 16:9/9:16/1:1, 3-15s, audio + negative - Veo 3.1: unchanged, 16:9/9:16, 4/6/8s Tests: +5 covering the new families (full catalog, Kling 4K start_image_url remap, Seedance routing, LTX payload minimality, Happy Horse minimality). 56/56 in the slice green. Note: I did NOT add the FAL-hosted xAI Grok-Imagine variant. Hermes already has a direct xAI plugin that talks to xAI's own API; routing the same model through FAL's wrapper would duplicate the surface without adding capabilities. Users on FAL who want Grok-Imagine should use the xAI plugin directly; flag if you want both routes available. * test(video_gen): tool-surface routing matrix — every model x modality End-to-end matrix test driven through _handle_video_generate() — the actual function the agent's video_generate tool call lands in. Writes config.yaml, invokes the registered handler with a raw args dict, then asserts the outbound HTTP/SDK call hit the right endpoint with the right payload shape. Parametrized over FAL_FAMILIES.keys() so the matrix auto-discovers new families as they're added (add a family to FAL_FAMILIES and you get both modalities tested for free). 
Coverage:
- All 6 FAL families x {text-only, text+image} = 12 cases
- xAI x {text-only, text+image} = 2 cases
- tool-level model= arg overrides config = 2 cases
For each case, verifies:
- result['success'] is True
- result['modality'] matches input shape ('text' if no image_url, 'image' otherwise)
- outbound endpoint URL matches the family's text_endpoint or image_endpoint
- text-only payloads carry no image-shaped keys
- text+image payloads carry the family's image key (image_url for most, start_image_url for kling-v3-4k, wrapped 'image' object for xAI)
All 16 cases passing. Confirms the tool surface routes every (provider, model, modality) combination correctly with zero leakage.
* feat(video_gen): keep video_gen out of first-run setup, surface in status
Two changes:
1. video_gen joins _DEFAULT_OFF_TOOLSETS, so it is NOT pre-selected in the first-run toolset checklist. Video gen is niche, paid, and slow; most users don't want it nagging them during initial setup. Anyone who wants it opts in via 'hermes tools' -> Video Generation, which already routes to the provider+model picker.
2. The 'hermes setup' status panel learns about video_gen, but only shows the row when a plugin reports itself available. Users without FAL_KEY/XAI_API_KEY see nothing about video gen; users with one of those keys see 'Video Generation (FAL) ✓' as confirmation it's wired.
Verified live:
- Fresh install (no creds): zero video_gen mentions in wizard.
- With FAL_KEY: status row appears with active backend name.
- 160/160 in the setup + tools_config + video_gen test slice.
Rationale: image_gen is on by default because it's a featured creative tool used in casual chat (Telegram, etc.). Video gen is heavier: long waits and paid per-second pricing. Default-off matches user intent better.
---------
Co-authored-by: Jaaneek <Jaaneek@users.noreply.github.com> |
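The family-based auto-routing and the `image_param_key` remap described in this commit can be sketched as a small lookup. The dict layout is an assumption modelled on the field names in the message (`text_endpoint`, `image_endpoint`, `image_param_key`), not the real `FAL_FAMILIES` structure. The Kling image endpoint is expanded from the "text-to-video / image-to-video" shorthand, which is also an assumption.

```python
from typing import Any, Dict, Optional, Tuple

# Illustrative subset of a family catalog (structure assumed).
FAMILIES: Dict[str, Dict[str, str]] = {
    "veo3.1": {
        "text_endpoint": "fal-ai/veo3.1",
        "image_endpoint": "fal-ai/veo3.1/image-to-video",
    },
    "pixverse-v6": {
        "text_endpoint": "fal-ai/pixverse/v6/text-to-video",
        "image_endpoint": "fal-ai/pixverse/v6/image-to-video",
    },
    "kling-v3-4k": {
        # Endpoints expanded from the commit's shorthand (assumption).
        "text_endpoint": "fal-ai/kling-video/v3/4k/text-to-video",
        "image_endpoint": "fal-ai/kling-video/v3/4k/image-to-video",
        "image_param_key": "start_image_url",
    },
}


def route(family: str, prompt: str, image_url: Optional[str] = None) -> Tuple[str, str, Dict[str, Any]]:
    """Return (endpoint, modality, payload) for one video_generate call."""
    entry = FAMILIES[family]
    payload: Dict[str, Any] = {"prompt": prompt}
    if image_url is None:
        # Text-only: no image-shaped keys may leak into the payload.
        return entry["text_endpoint"], "text", payload
    # Most families take image_url; some declare a different key.
    key = entry.get("image_param_key", "image_url")
    payload[key] = image_url
    return entry["image_endpoint"], "image", payload
```

The user picks a family once, and the presence of `image_url` alone decides which endpoint is hit and what `modality` is reported back.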
||
|
|
59da8ec4ec |
fix(tools): refuse skill_view name collisions instead of guessing
skill_view ran the direct-path strategy across every skill dir before the recursive strategy, so a top-level skill in an external dir could silently shadow a same-named nested local skill. /skills correctly listed the local version (deduped local-first by _find_all_skills) but skill_view loaded the external one — confusing, and a real bug class for users with skills.external_dirs registered alongside categorized local skills. Pick a louder fix than @polkn's PR #6136 proposed: collect every match across all dirs (direct path, recursive by parent dir name, legacy flat <name>.md), and if there's more than one, refuse with an error that surfaces every matching path plus a hint to load by the categorized form. Local-first precedence would have replaced silent external-shadowing with silent same-name collisions between two externals, or made an externally-shadowed-by-local skill unreachable by bare name with no signal. Refusing forces the user to disambiguate once and never wonder which skill ran. Recovery: pass the full categorized path ("foundations/runtime/explore-codebase" instead of "explore-codebase"), or rename one of the colliding skills. Co-authored-by: pol <pol.kuijken@gmail.com> |
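A sketch of the "collect every match, refuse if more than one" strategy. It assumes skills live in `<name>/SKILL.md` directories with a legacy flat `<name>.md` layout; the file names and the choice of `ValueError` are illustrative assumptions, not the real skill_view code.

```python
from pathlib import Path
from typing import List, Sequence


def find_skill_matches(name: str, skill_dirs: Sequence[Path]) -> List[Path]:
    """Collect every candidate for `name` across all dirs (layout is assumed)."""
    found: List[Path] = []
    for root in skill_dirs:
        # 1) direct path: <root>/<name>/SKILL.md (also handles "a/b/name")
        found.append(root / name / "SKILL.md")
        # 2) recursive: any SKILL.md whose parent directory is named `name`
        found.extend(p for p in root.rglob("SKILL.md") if p.parent.name == name)
        # 3) legacy flat layout: <root>/<name>.md
        found.append(root / f"{name}.md")
    unique: List[Path] = []
    seen = set()
    for path in found:
        # Strategies 1 and 2 can hit the same file, so dedupe on the real path.
        if path.is_file() and path.resolve() not in seen:
            seen.add(path.resolve())
            unique.append(path)
    return unique


def resolve_skill(name: str, skill_dirs: Sequence[Path]) -> Path:
    matches = find_skill_matches(name, skill_dirs)
    if not matches:
        raise FileNotFoundError(f"no skill named {name!r}")
    if len(matches) > 1:
        listing = "\n".join(f"  - {m}" for m in matches)
        raise ValueError(
            f"skill name {name!r} is ambiguous; it matches:\n{listing}\n"
            "Load it by its categorized path (e.g. 'category/sub/name') or rename one."
        )
    return matches[0]
```

Refusing instead of ranking the directories means neither a local nor an external skill can silently win.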
||
|
|
d6c9711ba8 |
fix(security): reduce unnecessary shell=True in subprocess calls
- memory_setup.py: use shlex.split() for plugin dep checks instead of shell=True
- transcription_tools.py: avoid shell=True for auto-detected whisper commands (user-provided templates via env var still use shell=True for compatibility)
- cli.py: add comment clarifying intentional shell=True for user quick_commands
- Add test verifying auto-detected template is shlex-safe
Addresses CONTRIBUTING.md Priority #3 (Security hardening: shell injection). |
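A sketch of running an auto-detected command template without `shell=True`: split the template into argv tokens first, then substitute the paths into individual tokens, so a file name containing spaces or `;` stays a single argument. The template string itself is hypothetical. Note that `str.format` on each token assumes the template has no literal braces apart from the placeholders.

```python
import shlex
import subprocess
from typing import List

# Hypothetical auto-detected template; the real one may differ.
AUTO_WHISPER_TEMPLATE = "whisper {input_path} --model base --output_dir {output_dir}"


def build_argv(template: str, input_path: str, output_dir: str) -> List[str]:
    """Split first, substitute second, so each path stays exactly one argv item."""
    return [
        token.format(input_path=input_path, output_dir=output_dir)
        for token in shlex.split(template)
    ]


def run_auto_detected(input_path: str, output_dir: str) -> subprocess.CompletedProcess:
    argv = build_argv(AUTO_WHISPER_TEMPLATE, input_path, output_dir)
    # No shell=True: metacharacters in the file name are never interpreted.
    return subprocess.run(argv, capture_output=True, text=True, check=False)
```

Substituting before splitting would reintroduce the problem, because a quote or space in the path would change how the string tokenizes.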
||
|
|
5d90386baa |
fix(gateway): add lazy_deps.ensure() to slack, matrix, dingtalk, feishu adapters (#25014)
Only Discord and Telegram had lazy-install hooks in their check_*_requirements() functions. The remaining four platforms that were moved to lazy_deps (Slack, Matrix, DingTalk, Feishu) would just return False immediately if their packages weren't pre-installed — no attempt to install them at runtime. This means even with the .venv permissions fix (#24841), these four platforms would still fail to load in Docker (or any fresh install) unless the user manually ran pip install. Add the same lazy_deps.ensure() pattern to all four, matching the existing Discord/Telegram implementation. |
||
|
|
486b692ddd |
feat(nous): unified client=hermes-client-v<version> tag on every Portal request (#24779)
* feat(nous): unified client=hermes-client-v<version> tag on every Portal request
Every Hermes request to Nous Portal now carries the same client=hermes-client-v<__version__> tag (e.g. client=hermes-client-v0.13.0 on this release), sourced live from hermes_cli.__version__. The release script's regex bump auto-aligns it on every release.
Centralized in agent/portal_tags.py and wired into all four call sites:
- NousProfile.build_extra_body (main agent loop, every chat completion)
- auxiliary_client.NOUS_EXTRA_BODY + _build_call_kwargs (aux client)
- run_agent.py compression-summary fallback path
- tools/web_tools.py web_extract fallback
Replaces the client=aux marker added in #24194 with the unified version tag. Tests assert against the helper output (invariant) rather than the literal string, so they don't need updating on every release.
* feat(nous): cover /goal judge and kanban specify aux paths
Two aux-using surfaces bypassed call_llm by invoking client.chat.completions.create() directly without extra_body, so they were missing the unified Portal client tag:
- hermes_cli/goals.py: /goal standing-goal judge
- hermes_cli/kanban_specify.py: kanban triage specifier
Both now pass extra_body=get_auxiliary_extra_body() or None so they inherit the version tag when the aux client points at Nous Portal, and emit nothing otherwise (no tag leak to OpenRouter/Anthropic auxes). |
||
|
|
80374d4dd9 | fix: approval DELETE pattern DOTALL flag allows newline bypass | ||
|
|
420762f867 |
fix(tools): forward thread_id via metadata in _send_via_adapter live path
The live adapter path in _send_via_adapter called adapter.send() without
passing thread_id, while the standalone fallback path correctly forwarded
it. For plugin platforms (google_chat, teams, irc, line) running with the
gateway in-process, this caused every threaded reply to land as a new
top-level message instead of continuing the thread.
Matches the pattern already used by _send_matrix_via_adapter and
_send_feishu: build metadata={"thread_id": thread_id} and pass it through.
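A sketch of the live-path fix: the thread id travels inside `metadata`, as in the fallback path. `RecordingAdapter` is a stand-in, and the `send(chat_id, content, metadata=...)` signature is an assumption about the adapter interface.

```python
import asyncio
from typing import Any, Dict, List, Optional, Tuple


class RecordingAdapter:
    """Stand-in adapter; the real adapter.send() signature is an assumption."""

    def __init__(self) -> None:
        self.calls: List[Tuple[str, str, Optional[Dict[str, Any]]]] = []

    async def send(self, chat_id: str, content: str, metadata: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
        self.calls.append((chat_id, content, metadata))
        return {"success": True}


async def send_via_live_adapter(
    adapter: RecordingAdapter, chat_id: str, message: str, thread_id: Optional[str] = None
) -> Dict[str, Any]:
    # Same shape as the fallback path: thread_id rides in metadata, so a
    # threaded reply continues the thread instead of starting a new one.
    metadata = {"thread_id": thread_id} if thread_id else None
    return await adapter.send(chat_id, message, metadata=metadata)
```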
|
||
|
|
081f9368bc |
fix(voice_mode): detect audio in WSL when sd.query_devices() returns empty list but PULSE_SERVER is set
In WSL2, sounddevice.query_devices() returns [] even when the PulseAudio bridge is functional. The existing code already handled the case where the query itself raises an exception, but it missed the empty-list case. This change treats an empty device list as non-fatal in WSL when PULSE_SERVER is configured, matching the existing exception-handler behavior. Fixes: WSL users seeing 'No audio input/output devices detected' even though paplay/arecord work fine. |
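The decision logic can be sketched as a pure function, with the device query, environment, and WSL detection injected so it runs without `sounddevice`. How the real code detects WSL is not described here, so `in_wsl` is an assumed input.

```python
from typing import Any, Callable, Mapping, Sequence


def audio_devices_usable(
    query_devices: Callable[[], Sequence[Any]],
    env: Mapping[str, str],
    in_wsl: bool,
) -> bool:
    """Decide whether voice mode should treat audio I/O as available."""
    # On WSL2, a configured PULSE_SERVER means the PulseAudio bridge can work
    # even when PortAudio enumerates nothing.
    wsl_pulse = in_wsl and bool(env.get("PULSE_SERVER"))
    try:
        devices = query_devices()
    except Exception:
        return wsl_pulse  # pre-existing behavior when the query raises
    if len(devices) > 0:
        return True
    return wsl_pulse  # newly handled case: an empty list is not fatal in WSL
```

In the real module this would be called roughly as `audio_devices_usable(sd.query_devices, os.environ, in_wsl)`, so both failure shapes share the same WSL rule.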
||
|
|
6f92a21926 |
fix(web): add Bearer auth header for Tavily /crawl endpoint
Tavily's /crawl endpoint requires Authorization: Bearer <key> in the header, unlike /search and /extract which accept api_key in the JSON body. Without the header, crawl returns 401 Unauthorized. |
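A sketch that builds (but does not send) the request, putting the key in an `Authorization: Bearer` header for `/crawl` and in the JSON body for the other endpoints. The `https://api.tavily.com` base URL is an assumption, and so is the choice to send the key for `/crawl` only in the header.

```python
import json
import urllib.request
from typing import Any, Dict

TAVILY_API = "https://api.tavily.com"  # assumed base URL


def build_tavily_request(endpoint: str, api_key: str, body: Dict[str, Any]) -> urllib.request.Request:
    """Build (but do not send) a POST to a Tavily endpoint."""
    payload = dict(body)
    headers = {"Content-Type": "application/json"}
    if endpoint == "crawl":
        # /crawl authenticates via the header; without it the call returns 401.
        headers["Authorization"] = f"Bearer {api_key}"
    else:
        # /search and /extract accept the key in the JSON body.
        payload["api_key"] = api_key
    return urllib.request.Request(
        f"{TAVILY_API}/{endpoint}",
        data=json.dumps(payload).encode("utf-8"),
        headers=headers,
        method="POST",
    )
```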
||
|
|
a54d4b0e46 |
fix(send_message): recognize XMPP JIDs as explicit targets
_parse_target_ref() has no handler for XMPP JIDs (user@server or room@conference.server), so they fall through to the final `return None, None, False`. This causes send_message to fail when targeting an XMPP chat by JID, since the JID is not numeric and doesn't match any other platform pattern. Add an explicit check for XMPP targets containing '@', matching the existing Matrix pattern above it. |
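A simplified sketch of an explicit-target check that accepts JIDs. The `(chat_id, thread_id, is_explicit)` return shape mirrors the `return None, None, False` fallthrough quoted above. The `platform` argument, the numeric branch, and the deliberately loose JID regex are hypothetical.

```python
import re
from typing import Optional, Tuple

# Bare or full JID: local@domain[/resource]. Deliberately loose (assumption).
_JID_RE = re.compile(r"^[^@/\s]+@[^@/\s]+(?:/\S+)?$")


def parse_target_ref(platform: str, ref: str) -> Tuple[Optional[str], Optional[str], bool]:
    """Return (chat_id, thread_id, is_explicit); a simplified, hypothetical subset."""
    ref = ref.strip()
    if platform == "xmpp" and _JID_RE.match(ref):
        # user@server or room@conference.server: the JID itself is the chat id
        return ref, None, True
    if ref.lstrip("-").isdigit():
        return ref, None, True  # numeric ids, as on other platforms
    return None, None, False
```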
||
|
|
29c9ff9ba5 |
fix(lsp): typescript SDK install + tsc-missing skip + shellcheck warning (#24630)
Three follow-ups to PR #24168 found during live E2E testing on TS/bash files:
1. typescript-language-server now installs the typescript SDK (tsserver) alongside it. Without that sibling install, initialize() failed with "Could not find a valid TypeScript installation" and the server was marked broken, so no diagnostics ever reached the agent. A new extra_pkgs field on INSTALL_RECIPES makes that explicit and reusable for future peer-dep cases.
2. _check_lint now treats "linter command exists on PATH but cannot actually run" as skipped instead of error. The motivating case is npx tsc when typescript is not in node_modules: npx prints its "This is not the tsc command you are looking for" banner and exits non-zero, which previously blocked the LSP semantic tier (gated on success or skipped). Pattern-matched per base command (npx, rustfmt, go) so genuine lint errors still flow through normally.
3. hermes lsp status now surfaces a Backend warnings section when bash-language-server is installed but shellcheck is missing. The server itself spawns fine, but bash-language-server delegates diagnostics to shellcheck; without it on PATH the integration looks alive but never reports any problems. The same warning is logged once at server spawn time.
Validation:
- 12 new tests in tests/agent/lsp/test_install_and_lint_fixes.py:
  * recipe carries typescript SDK
  * _install_npm passes both pkg + extras to npm CLI
  * backwards compat: recipes without extras still work
  * _backend_warnings quiet when bash absent / both present
  * _backend_warnings fires when bash installed without shellcheck
  * status output includes the Backend warnings section
  * _looks_like_linter_unusable catches the npx tsc banner
  * real TS type errors not misclassified as unusable
  * unfamiliar linters fall through normally
  * _check_lint returns skipped on npx tsc unusable
  * _check_lint returns error on real tsc type errors
- Full lsp + file_operations test suite: 245/245 pass
- Live E2E:
  * try_install("typescript-language-server") installs both packages into node_modules
  * write_file(bad.ts, ...) returns lint=skipped + lsp_diagnostics with two real TS errors (was lint=error, no lsp_diagnostics)
  * hermes lsp status renders the shellcheck warning when bash is installed but shellcheck is not on PATH |
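A sketch of per-base-command "linter is unusable" matching. Only the npx banner text comes from the message above; the marker table, the function names, and the `ok`/`skipped`/`error` classification are illustrative assumptions. Entries for rustfmt and go would be added in the same way.

```python
import shlex
from typing import Dict, Tuple

# Per-base-command output signatures meaning "this linter cannot run here".
# Only the npx banner is quoted in the commit; other entries are omitted.
_UNUSABLE_MARKERS: Dict[str, Tuple[str, ...]] = {
    "npx": ("This is not the tsc command you are looking for",),
}


def looks_like_linter_unusable(command: str, output: str) -> bool:
    argv = shlex.split(command)
    if not argv:
        return False
    markers = _UNUSABLE_MARKERS.get(argv[0], ())  # unfamiliar linters: no markers
    return any(marker in output for marker in markers)


def classify_lint(command: str, exit_code: int, output: str) -> str:
    if exit_code == 0:
        return "ok"
    if looks_like_linter_unusable(command, output):
        return "skipped"  # lets the LSP semantic tier still run
    return "error"  # genuine lint errors flow through unchanged
```

Keying the markers on the base command keeps a coincidental phrase in some other linter's output from being misread as "unusable".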
||
|
|
29d7c244c5 |
feat(gateway): wire clarify tool with inline keyboard buttons on Telegram (#24199)
The clarify tool returned 'not available in this execution context' for every gateway-mode agent because gateway/run.py never passed clarify_callback into the AIAgent constructor. Schema actively encouraged calling it; users never saw the question. Changes: - tools/clarify_gateway.py — new event-based primitive mirroring tools/approval.py: register/wait_for_response/resolve_gateway_clarify with per-session FIFO, threading.Event blocking with 1s heartbeat slices (so the inactivity watchdog keeps ticking), and clear_session for boundary cleanup. - gateway/platforms/base.py — abstract send_clarify with a numbered-text fallback so every adapter (Discord, Slack, WhatsApp, Signal, Matrix, etc.) gets a working clarify out of the box. Plus an active-session bypass: when the agent is blocked on a text-awaiting clarify, the next non-command message routes inline to the runner's intercept instead of being queued + triggering an interrupt. Same shape as the /approve deadlock fix from PR #4926. - gateway/platforms/telegram.py — concrete send_clarify renders one inline button per choice plus '✏️ Other (type answer)'. cl: callback handler resolves numeric choices immediately, flips to text-capture mode for Other, with the same authorization guards as exec/slash approvals. - gateway/run.py — clarify_callback wired at the cached-agent per-turn callback assignment site (only the user-facing agent path; cron and hygiene-compress agents have no human attached). Bridges sync→async via run_coroutine_threadsafe, blocks with the configured timeout, and returns a '[user did not respond within Xm]' sentinel on timeout so the agent adapts rather than pinning the running-agent guard. Text- intercept added to _handle_message before slash-confirm intercept (skipping slash commands). clear_session called in the run's finally to cancel any orphan entries. - hermes_cli/config.py — agent.clarify_timeout default 600s. - website/docs/user-guide/messaging/telegram.md — Interactive Prompts section. 
Tests: - tests/tools/test_clarify_gateway.py (14 tests) — full primitive coverage: button resolve, open-ended auto-await, Other flip, timeout None, unknown-id idempotency, clear_session cancellation, FIFO ordering, register/unregister notify, config default. - tests/gateway/test_telegram_clarify_buttons.py (12 tests) — render paths (multi-choice/open-ended/long-label/HTML-escape/not-connected), callback dispatch (numeric resolve/Other flip/already-resolved/ unauthorized/invalid-token), and base-adapter text fallback. Out of scope: bot-to-bot, guest mode, checklists, poll media, live photos. Closes #24191. |
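A sketch of the event-based primitive described above: a per-session FIFO, a `threading.Event` wait split into short slices so a heartbeat can run between them, and `clear_session` waking orphaned waiters. Class and method names are hypothetical, and resolving "the oldest open question" is a simplification of resolving by clarify id.

```python
import threading
import time
from collections import deque
from dataclasses import dataclass, field
from typing import Callable, Deque, Dict, Optional


@dataclass(eq=False)  # identity equality, so deque.remove() finds this exact entry
class PendingClarify:
    session_key: str
    question: str
    event: threading.Event = field(default_factory=threading.Event)
    response: Optional[str] = None


class ClarifyRegistry:
    """Per-session FIFO of pending clarify questions (hypothetical sketch)."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._queues: Dict[str, Deque[PendingClarify]] = {}

    def register(self, session_key: str, question: str) -> PendingClarify:
        entry = PendingClarify(session_key, question)
        with self._lock:
            self._queues.setdefault(session_key, deque()).append(entry)
        return entry

    def resolve(self, session_key: str, response: str) -> bool:
        """Answer the oldest open question; False if nothing is pending."""
        with self._lock:
            queue = self._queues.get(session_key)
            if not queue:
                return False
            entry = queue.popleft()
            entry.response = response  # set under the lock, before waking
        entry.event.set()
        return True

    def clear_session(self, session_key: str) -> None:
        """Wake every waiter for the session with no answer (boundary cleanup)."""
        with self._lock:
            orphans = self._queues.pop(session_key, deque())
        for entry in orphans:
            entry.event.set()

    def wait_for_response(
        self,
        entry: PendingClarify,
        timeout: float,
        heartbeat: Optional[Callable[[], None]] = None,
        slice_s: float = 1.0,
    ) -> Optional[str]:
        """Block in short slices so an inactivity watchdog keeps ticking."""
        deadline = time.monotonic() + timeout
        while True:
            remaining = deadline - time.monotonic()
            if remaining <= 0:
                with self._lock:
                    queue = self._queues.get(entry.session_key)
                    if queue is not None and entry in queue:
                        queue.remove(entry)  # timed out: drop the stale entry
                return entry.response  # None unless an answer raced the deadline
            if entry.event.wait(min(slice_s, remaining)):
                return entry.response
            if heartbeat is not None:
                heartbeat()
```

A `None` return is what a caller would turn into the "[user did not respond within Xm]" sentinel.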
||
|
|
83b93898c2 |
feat(lsp): semantic diagnostics from real language servers in write_file/patch (#24168)
* feat(lsp): semantic diagnostics from real language servers in write_file/patch
Wire ~26 language servers (pyright, gopls, rust-analyzer, typescript-language-server,
clangd, bash-language-server, ...) into the post-write lint check used by write_file
and patch. The model now sees type errors, undefined names, missing imports, and
project-wide semantic issues introduced by its edits, not just syntax errors.
LSP is gated on git workspace detection: when the agent's cwd or the file being
edited is inside a git worktree, LSP runs against that workspace; otherwise the
existing in-process syntax checks are the only tier. This keeps users on
user-home cwds (Telegram/Discord gateway chats) from spawning daemons.
The post-write check is layered: in-process syntax check first (microseconds),
then LSP semantic diagnostics second when syntax is clean. Diagnostics are
delta-filtered against a baseline captured at write start, so the agent only
sees errors its edit introduced. A flaky/missing language server can never
break a write -- every LSP failure path falls back silently to the syntax-only
result.
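The baseline delta filter and the reporter's severity-1-only, 20-per-file rules can be sketched together. Diagnostics are modelled as plain dicts following LSP field names. Leaving the position out of the identity key, so that an error shifted by an edit above it is not reported as new, is an assumption about how the real filter matches.

```python
from collections import Counter
from typing import Any, Dict, Hashable, List, Sequence, Tuple

Diagnostic = Dict[str, Any]
ERROR = 1  # LSP DiagnosticSeverity.Error


def _identity(diag: Diagnostic) -> Tuple[Hashable, ...]:
    # Position deliberately excluded (assumption): line shifts from an edit
    # should not make a pre-existing error look new.
    return (diag.get("severity"), diag.get("code"), diag.get("source"), diag.get("message"))


def introduced_errors(
    baseline: Sequence[Diagnostic], current: Sequence[Diagnostic], max_per_file: int = 20
) -> List[Diagnostic]:
    remaining = Counter(_identity(d) for d in baseline)
    introduced: List[Diagnostic] = []
    for diag in current:
        if diag.get("severity") != ERROR:
            continue  # severity-1 only
        key = _identity(diag)
        if remaining[key] > 0:
            remaining[key] -= 1  # existed before the write: not the edit's fault
            continue
        introduced.append(diag)
    return introduced[:max_per_file]
```

A `Counter` rather than a set means that duplicating an existing error still counts as one new error.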
New module agent/lsp/ split into:
- protocol.py: Content-Length JSON-RPC framer + envelope helpers
- client.py: async LSPClient (spawn, initialize, didOpen/didChange,
ContentModified retry, push/pull diagnostic stores)
- workspace.py: git worktree walk-up + per-server NearestRoot resolver
- servers.py: registry of 26 language servers (extension match,
root resolver, spawn builder per language)
- install.py: auto-install dispatch (npm install --prefix, go install
with GOBIN, pip install --target) into HERMES_HOME/lsp/bin/
- manager.py: LSPService (per-(server_id, root) client registry, lazy
spawn, broken-set, in-flight dedupe, sync facade for tools layer)
- reporter.py: <diagnostics> block formatter (severity-1-only, 20-per-file)
- cli.py: hermes lsp {status,list,install,install-all,restart,which}
Wired into tools/file_operations.py:
- write_file/patch_replace now call _snapshot_lsp_baseline before write
- _check_lint_delta gains a third tier: LSP semantic diagnostics when
syntax is clean
- All LSP code paths swallow exceptions; write_file's contract unchanged
Config: 'lsp' section in DEFAULT_CONFIG with enabled (default true),
wait_mode, wait_timeout, install_strategy (default 'auto'), and per-server
overrides (disabled, command, env, initialization_options).
Tests: tests/agent/lsp/ -- 49 tests covering protocol framing (encode and
read_message round-trip, EOF/truncation/missing Content-Length), workspace
gate (git walk-up, exclude markers, fallback to file location), reporter
(severity filter, max-per-file cap, truncation), service-level delta filter,
and an in-process mock LSP server that exercises the full client lifecycle
including didChange version bumps, dedup, crash recovery, and idempotent
teardown.
Live E2E verified end-to-end through ShellFileOperations: pyright
auto-installed via npm into HERMES_HOME, baseline captured, type error
introduced, single delta diagnostic surfaced with correct line/column/code/
source, then patch fix removes the diagnostic from the output.
Docs: new website/docs/user-guide/features/lsp.md page covering supported
languages, configuration knobs, performance characteristics, and
troubleshooting; cli-commands.md updated with the 'hermes lsp' reference;
sidebar updated.
* feat(lsp): structured logging, backend gate, defensive walk caps
Cherry-picks the substantive ideas from #24155 (different scope, same
problem space) onto our PR.
agent/lsp/eventlog.py (new): dedicated structured logger
``hermes.lint.lsp`` with steady-state silence. Module-level dedup sets
keep a 1000-write session at exactly ONE INFO line ("active for
<root>") at the default INFO threshold; clean writes log at DEBUG so
they never reach agent.log under normal config. State transitions
(server starts, no project root for a file, server unavailable) fire
at INFO/WARNING once per (server_id, key); novel events (timeouts,
unexpected errors) fire WARNING per call. Grep recipe: ``rg 'lsp\\['``.
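A sketch of the steady-state-silence logging policy: state transitions log once per key, clean writes stay at DEBUG, and novel events warn every time. The logger name comes from the message. The `lsp[<server_id>]` line prefix is inferred from the grep recipe, and the function names are hypothetical.

```python
import logging
from typing import Set, Tuple

log = logging.getLogger("hermes.lint.lsp")
_announced: Set[Tuple[str, str, str]] = set()


def log_once(level: int, event: str, server_id: str, key: str, message: str) -> bool:
    """Emit a state-transition line once per (event, server_id, key)."""
    marker = (event, server_id, key)
    if marker in _announced:
        return False
    _announced.add(marker)
    log.log(level, "lsp[%s] %s", server_id, message)
    return True


def log_clean_write(server_id: str, path: str) -> None:
    # Steady-state events stay at DEBUG so they never reach an INFO-level log.
    log.debug("lsp[%s] clean: %s", server_id, path)


def log_timeout(server_id: str, path: str, seconds: float) -> None:
    # Novel events warn on every occurrence.
    log.warning("lsp[%s] timed out after %.1fs on %s", server_id, seconds, path)
```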
agent/lsp/manager.py: wire the eventlog into _get_or_spawn and
get_diagnostics_sync so users can answer "did LSP fire on this edit?"
with a single grep, plus surface "binary not on PATH" warnings once
instead of silently retrying every write.
tools/file_operations.py: backend-type gate. ``_lsp_local_only()``
returns False for non-local backends (Docker / Modal / SSH /
Daytona); ``_snapshot_lsp_baseline`` and ``_maybe_lsp_diagnostics``
now skip entirely on remote envs. The host-side language server
can't see files inside a sandbox, so this prevents pretending to
lint a file the host process can't open.
agent/lsp/protocol.py: 8 KiB cap on the header block in
``read_message``. A pathological server that streams headers
without ever emitting CRLF-CRLF would have looped forever consuming
bytes; now raises ``LSPProtocolError`` instead.
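A synchronous sketch of Content-Length framing with the 8 KiB header cap. The real `read_message` is async and its error types may differ. Raising `EOFError` for truncation, and reading the header one byte at a time from a binary stream, are assumptions made to keep the sketch short.

```python
import json
from typing import Any, BinaryIO, Dict

MAX_HEADER_BYTES = 8 * 1024  # 8 KiB cap on the header block


class LSPProtocolError(Exception):
    """Malformed framing from the language server."""


def encode_message(payload: Dict[str, Any]) -> bytes:
    body = json.dumps(payload, separators=(",", ":")).encode("utf-8")
    # Content-Length counts bytes of the UTF-8 body, not characters.
    return b"Content-Length: %d\r\n\r\n" % len(body) + body


def read_message(stream: BinaryIO) -> Dict[str, Any]:
    header = bytearray()
    while not header.endswith(b"\r\n\r\n"):
        chunk = stream.read(1)
        if not chunk:
            raise EOFError("stream closed inside the header block")
        header += chunk
        if len(header) > MAX_HEADER_BYTES:
            # A server streaming headers without CRLF-CRLF would loop forever.
            raise LSPProtocolError("header block exceeds 8 KiB without CRLF-CRLF")
    length = None
    for line in header.decode("ascii", errors="replace").split("\r\n"):
        name, sep, value = line.partition(":")
        if sep and name.strip().lower() == "content-length":
            try:
                length = int(value.strip())
            except ValueError:
                raise LSPProtocolError(f"bad Content-Length: {value.strip()!r}") from None
    if length is None:
        raise LSPProtocolError("missing Content-Length header")
    body = stream.read(length)
    if len(body) < length:
        raise EOFError("stream closed inside the message body")
    return json.loads(body.decode("utf-8"))
```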
agent/lsp/workspace.py: 64-step cap on ``find_git_worktree`` and
``nearest_root`` upward walks, plus try/except containment around
``Path(...).resolve()`` and child ``.exists()`` calls. Defensive
against pathological inputs (symlink loops, encoding errors,
permission failures mid-walk) — the lint hook is hot-path code and
must never raise.
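A sketch of a capped, never-raising upward walk in the spirit of `find_git_worktree`. The 64-step cap is from the message. The exact set of caught exceptions is an assumption: `resolve()` may raise `RuntimeError` or `OSError` on symlink loops depending on the Python version, and `ValueError` on embedded NUL bytes.

```python
from pathlib import Path
from typing import Optional, Union

MAX_WALK_STEPS = 64


def find_git_worktree(start: Union[str, Path]) -> Optional[Path]:
    """Walk up from `start` to the nearest dir containing .git, never raising."""
    try:
        current = Path(start).resolve()
        if current.is_file():
            current = current.parent
    except (OSError, RuntimeError, ValueError):
        return None  # symlink loops, bad encodings, permission errors
    for _ in range(MAX_WALK_STEPS):
        try:
            # .git is a directory in a normal clone and a file in a worktree.
            if (current / ".git").exists():
                return current
        except (OSError, ValueError):
            return None
        if current.parent == current:
            return None  # reached the filesystem root
        current = current.parent
    return None  # hit the step cap: give up rather than walk forever
```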
Tests:
- tests/agent/lsp/test_eventlog.py: 18 tests covering steady-state
silence (clean writes stay DEBUG), state-transition INFO-once
semantics (active for, no project root), action-required
WARNING-once (server unavailable), per-call WARNING (timeouts,
spawn failures), and the "1000 clean writes => 1 INFO" contract.
- tests/agent/lsp/test_backend_gate.py: 5 tests verifying
_lsp_local_only / snapshot_baseline / maybe_lsp_diagnostics skip
the LSP layer for non-local backends and route correctly for
LocalEnvironment.
- tests/agent/lsp/test_protocol.py: new test_read_message_rejects_runaway_header
exercising the 8 KiB cap.
Validation:
- 73/73 LSP tests pass (49 original + 18 eventlog + 5 backend-gate + 1 framer cap)
- 198/198 pass when run alongside existing file_operations tests
- Live E2E re-run with pyright still surfaces "ERROR [2:12] Type
... reportReturnType (Pyright)" through the full path, then patch
fix removes it on the next call.
* feat(lsp): atexit cleanup + separate lsp_diagnostics JSON field
Two improvements salvaged from #24414's plugin-form alternative,
keeping our core-integrated design:
1. atexit cleanup of spawned language servers
----------------------------------------------------------------
``agent/lsp/__init__.get_service`` now registers an ``atexit``
handler on first creation that tears down the LSPService on
Python exit. Without this, every ``hermes chat`` exit was
leaking pyright/gopls/etc. processes for a few seconds while
their stdout buffers drained -- they got reaped by the kernel
eventually but a watchful ``ps aux`` would catch them.
The handler runs once per process (gated by
``_atexit_registered``); idempotent ``shutdown_service``
ensures double-fire is a no-op. Errors during shutdown are
swallowed at debug level since by the time atexit fires the
user has already seen the agent's final response.
2. Separate ``lsp_diagnostics`` field on WriteResult / PatchResult
----------------------------------------------------------------
Previously the LSP layer folded its diagnostic block into the
``lint.output`` string, conflating the syntax-check tier with
the semantic tier. The agent (and any downstream parsers) now
read syntax errors and semantic errors as independent signals:
{
"bytes_written": 42,
"lint": {"status": "ok", "output": ""},
"lsp_diagnostics": "<diagnostics file=...>\nERROR [2:12] ..."
}
``_check_lint_delta`` returns to its original two-tier shape
(syntax check + delta filter); ``write_file`` and
``patch_replace`` independently fetch LSP diagnostics via
``_maybe_lsp_diagnostics`` and pass them into the new field.
``patch_replace`` propagates the inner write_file's
``lsp_diagnostics`` so the outer PatchResult carries the patch's
delta correctly.
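The include/omit behaviour and the forward propagation can be sketched with plain dataclasses. The field names `bytes_written`, `lint` and `lsp_diagnostics` follow the JSON example above. `PatchResult.diff`, `build_write_result`, `patch_result_from_write` and the `fetch_lsp` callable are hypothetical, and the real result types likely carry more fields.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, Optional


def _clean_lint() -> Dict[str, Any]:
    return {"status": "ok", "output": ""}


@dataclass
class WriteResult:
    bytes_written: int
    lint: Dict[str, Any] = field(default_factory=_clean_lint)
    lsp_diagnostics: Optional[str] = None  # semantic tier, kept apart from lint

    def to_dict(self) -> Dict[str, Any]:
        out: Dict[str, Any] = {"bytes_written": self.bytes_written, "lint": dict(self.lint)}
        if self.lsp_diagnostics:  # omit the key entirely when there is nothing to report
            out["lsp_diagnostics"] = self.lsp_diagnostics
        return out


@dataclass
class PatchResult:
    diff: str
    lint: Dict[str, Any] = field(default_factory=_clean_lint)
    lsp_diagnostics: Optional[str] = None


def build_write_result(nbytes: int, lint: Dict[str, Any],
                       fetch_lsp: Callable[[], Optional[str]]) -> WriteResult:
    # Only ask the language server when the syntax tier is clean.
    diags = fetch_lsp() if lint.get("status") == "ok" else None
    return WriteResult(bytes_written=nbytes, lint=lint, lsp_diagnostics=diags or None)


def patch_result_from_write(diff: str, inner: WriteResult) -> PatchResult:
    # Carry the inner write's fields forward so the outer result reports the patch's delta.
    return PatchResult(diff=diff, lint=inner.lint, lsp_diagnostics=inner.lsp_diagnostics)
```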
Tests: 19 new
- tests/agent/lsp/test_lifecycle.py (8 tests): atexit registration
fires once and only once across N get_service calls; the
registered callable is our internal shutdown wrapper;
shutdown_service is idempotent and safe when never started;
exceptions during shutdown are swallowed; inactive service is
cached so we don't rebuild on every check.
- tests/agent/lsp/test_diagnostics_field.py (11 tests): WriteResult
/ PatchResult dataclass shape, to_dict include/omit semantics,
channel separation (lint and lsp_diagnostics carry independent
signals), write_file populates the field via
_maybe_lsp_diagnostics only when the syntax tier is clean,
patch_replace propagates the field forward from its internal
write_file.
Validation:
- 92/92 LSP tests pass (73 prior + 8 lifecycle + 11 diagnostics field)
- 217/217 pass with file_operations + LSP combined
- Live E2E reverified: clean writes -> both fields empty/none; type
error introduced -> lint clean (parses), lsp_diagnostics carries
the pyright reportReturnType block; patch fix -> both fields
clean again.
* fix(lsp): broken-set short-circuit so a wedged server isn't paid every write
Discovered while auditing failure paths: a language server binary that
hangs (sleep forever, no LSP traffic on stdin/stdout) caused EVERY
subsequent write to re-pay the 8s snapshot_baseline timeout. Five
writes = ~64s of dead time (measured at ~12.85s per write).
The bug: ``_get_or_spawn`` adds the (server_id, root) pair to
``_broken`` inside its inner exception handler, but when the OUTER
``_loop.run`` timeout fires, it cancels the inner task before that
handler runs. The pair never makes it to broken-set, so the next
write re-enters the spawn path and re-pays the timeout.
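The failure mode can be reproduced in isolation with `asyncio.wait_for`. In this simplified illustration the inner handler catches `Exception`, which is an assumption about the real `_get_or_spawn`. Since Python 3.8 `asyncio.CancelledError` derives from `BaseException`, so the cancellation delivered by the outer timeout skips that handler and the key never reaches the broken-set.

```python
import asyncio

broken = set()


async def get_or_spawn(key):
    try:
        await asyncio.sleep(10)  # a wedged server that never answers
    except Exception:
        # Not reached on timeout: the outer wait_for cancels this task, and
        # CancelledError is a BaseException, not an Exception.
        broken.add(key)
        raise


async def snapshot_baseline(key, timeout):
    try:
        await asyncio.wait_for(get_or_spawn(key), timeout)
    except asyncio.TimeoutError:
        return "timed out"
    return "ok"


result = asyncio.run(snapshot_baseline(("pyright", "/proj"), 0.05))
print(result, broken)  # timed out set()
```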
Fix:
- New ``_mark_broken_for_file`` helper at the service layer marks
the (server_id, workspace_root) pair broken from the OUTSIDE when
the outer timeout fires. Called from the except branches in
``snapshot_baseline``, ``get_diagnostics_sync`` (asyncio.TimeoutError
+ generic Exception). Also kills any orphan client process that
survived the cancelled future, fire-and-forget with a 1s ceiling.
- ``enabled_for`` now consults the broken-set BEFORE returning True.
Files in already-broken (server_id, root) pairs short-circuit to
False, so the file_operations layer skips the LSP path entirely
with no spawn cost. A pair stays broken until the service is
restarted (``hermes lsp restart``) or the process exits.
- A single eventlog WARNING is emitted on first mark-broken so the
user knows which server gave up. Subsequent edits in the same
project stay silent.
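A sketch of the two halves of the fix: marking the pair broken from the outer `except` branch, and consulting the broken-set inside `enabled_for`. The `resolve` callable (path to `(server_id, root)`) and the `spawn` coroutine are hypothetical injection points; the orphan-process cleanup and real workspace-root detection are omitted.

```python
import asyncio
import logging
from typing import Awaitable, Callable, Optional, Set, Tuple

logger = logging.getLogger("lsp.sketch")

Key = Tuple[str, str]  # (server_id, workspace_root)


class BrokenSetService:
    def __init__(self, resolve: Callable[[str], Optional[Key]]):
        self._resolve = resolve
        self._broken: Set[Key] = set()

    def enabled_for(self, path: str) -> bool:
        key = self._resolve(path)
        if key is None:  # no server or no workspace: nothing to do
            return False
        # Short-circuit before any spawn cost if this pair already gave up.
        return key not in self._broken

    def _mark_broken_for_file(self, path: str) -> None:
        key = self._resolve(path)
        if key is None or key in self._broken:
            return
        self._broken.add(key)
        # Emitted once per pair; later edits in the same project stay silent.
        logger.warning("LSP server %s gave up for %s; skipping it", key[0], key[1])

    async def snapshot_baseline(self, path: str,
                                spawn: Callable[[str], Awaitable[str]],
                                timeout: float) -> Optional[str]:
        try:
            return await asyncio.wait_for(spawn(path), timeout)
        except Exception:  # includes asyncio.TimeoutError from the outer timeout
            # The inner task may have been cancelled before its own handler
            # ran, so record the failure from the outside.
            self._mark_broken_for_file(path)
            return None
```

Keying on `(server_id, root)` rather than on the file path is what makes sibling files in the same project skip the spawn while another project stays unaffected.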
Tests: 7 new in tests/agent/lsp/test_broken_set.py — covers the
key shape (server_id, per_server_root), enabled_for short-circuit,
sibling-file skip in same project, project isolation (broken in
A doesn't affect B), graceful no-op for missing-server / no-workspace,
and an end-to-end test that snapshots after a failure and verifies
the next ``enabled_for`` returns False.
Validation:
- Live retest of the wedged-binary scenario: 5 sequential writes,
first 8.88s (the one snapshot timeout), subsequent four ~0.84s
(no LSP cost). Down from 5x12.85s = 64s before this fix.
- 99/99 LSP tests pass (92 prior + 7 broken-set)
- 224/224 pass with file_operations + LSP combined
- Happy path E2E reverified — clean write, type error introduced,
patch fix all behave correctly with the new broken-set logic.
Note: the FIRST write to a wedged binary still pays 8s (the
snapshot_baseline timeout). We could shorten that, but pyright/
tsserver normally take 2-3s and slow CI rust-analyzer can need
5+ seconds, so 8s is the conservative ceiling. Subsequent writes
are instant.
d89553c2d6 |
fix(daytona): migrate legacy-sandbox lookup to cursor-based list() (#24587)
Daytona ships breaking SDK changes on June 10, 2026 — `list()` returns an iterator and the `page=` offset parameter is removed. We pin daytona==0.155.0 so we're past the May 24 hard-cutoff, but the legacy-sandbox resume path in DaytonaEnvironment still passes `page=1` and reads `.items` off the result. Switch to `next(iter(results), None)` against a single-result `list(labels=..., limit=1)` call. Update tests to use `iter([...])` and drop the `page=1` kwarg from list() assertions. |
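The migrated lookup reduces to "take the first item of whatever iterable `list()` returns". A sketch with a hypothetical test double standing in for the Daytona client; the `list(labels=..., limit=1)` call shape is taken from the commit message, not checked against the SDK reference.

```python
from typing import Any, Dict, List, Optional


def find_legacy_sandbox(client: Any, labels: Dict[str, str]) -> Optional[Any]:
    # No page= offset and no .items access: request one match and take the
    # first element of the iterator, or None when nothing matches.
    results = client.list(labels=labels, limit=1)
    return next(iter(results), None)


class FakeDaytona:
    """Hypothetical test double whose list() returns an iterator, like the new SDK."""

    def __init__(self, sandboxes: List[str]):
        self.sandboxes = sandboxes
        self.calls: List[Dict[str, Any]] = []

    def list(self, **kwargs: Any):
        self.calls.append(kwargs)
        return iter(self.sandboxes)
```

Wrapping the result in `iter()` keeps the lookup working whether the SDK hands back a list or an iterator.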
62fd905340 |
feat(browser): support externally managed Camofox sessions
Allow integrations to share a visible Camofox identity with Hermes and recover existing tabs without carrying local patches. Co-authored-by: Cursor <cursoragent@cursor.com> |
fc3fd6bb6b |
fix(dashboard): UI polish — modals, layout, consistency, test fixes
Dashboard UX polish pass: consolidates create forms into modals triggered from the page header, fixes layout inconsistencies, adds scroll-to navigation for the Keys page, and aligns the TokenBar with the design system.
Changes:
- App.tsx: add padding to sidebar header
- resolve-page-title.ts: add missing routes, better fallback title
- en.ts: fix nav labels (Profiles was 'profiles : multi agents')
- ModelsPage: two-col layout, auxiliary tasks modal, TokenBar redesign
- ProfilesPage: create button in header, form in modal, Checkbox component
- CronPage: create button in header, form in modal
- EnvPage: scroll-to sub-nav in header, fix text overflow
Modal and dialog standardization:
- Replace all native confirm()/window.confirm() with ConfirmDialog (OAuthProvidersCard, PluginsPage, ModelsPage, ConfigPage)
- Add useModalBehavior hook (Escape-to-close, scroll lock, focus restore)
- Apply hook to ProfilesPage, CronPage, AuxiliaryTasksModal
Component fixes (from PR review):
- Checkbox: fix controlled/uncontrolled mismatch, add focus-visible ring
- TokenBar: add rounded-full to legend dots, remove dead code
CI/test fixes:
- Fix TS unused imports (noUnusedLocals), type-narrow PickerTarget union
- Add windows-footgun suppression on platform-guarded os.killpg
- Fix 19 stale unit tests + 9 e2e tests broken by recent main changes
- Restore minimal example-dashboard plugin for plugin auth test
c1eb2dcda7 |
feat(security): supply-chain advisory checker + lazy-install framework + tiered install fallback (#24220)
* feat(security): supply-chain advisory checker + lazy-install framework + tiered install fallback
Three coordinated mitigations for the Mini Shai-Hulud worm hitting
mistralai 2.4.6 on PyPI (2026-05-12) and for the next single-package
compromise that follows.
# What this PR makes true
1. Users with the poisoned mistralai 2.4.6 in their venv get a loud
detection banner with copy-pasteable remediation steps the moment
they run hermes (and on every gateway startup).
2. One quarantined / yanked PyPI package can no longer silently demote
a fresh install to 'core only' — the installer keeps every other
extra and tells the user which tier landed.
3. Future opt-in backends (Mistral, ElevenLabs, Honcho, etc.) can
lazy-install on first use under a strict allowlist, instead of
eagerly pulling everything at install time.
# Detection: hermes_cli/security_advisories.py
- ADVISORIES catalog (one entry currently: shai-hulud-2026-05 for
mistralai==2.4.6). Adding the next one is a single dataclass.
- detect_compromised() uses importlib.metadata.version() — no pip
dependency, works in uv venvs that lack pip.
- Banner cache (~/.hermes/cache/advisory_banner_seen) rate-limits
the startup banner to once per 24h per advisory.
- Acks persisted to security.acked_advisories in config.yaml; never
re-banner after ack.
- Wired into:
* hermes doctor — runs first, prints full remediation block
* hermes doctor --ack <id> — dismisses an advisory
* cli.py interactive run() and single-query branches — short
stderr banner pointing at hermes doctor
* gateway/run.py startup — operator-visible warning in gateway.log
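The detection and rate-limit logic can be sketched in a few lines. `importlib.metadata.version()` and `PackageNotFoundError` are standard library; the `Advisory` fields, the `"*"` wildcard convention, `should_banner` and the injectable `version_of` callable are hypothetical, not the actual layout of `hermes_cli/security_advisories.py`.

```python
import time
from dataclasses import dataclass
from importlib import metadata
from typing import Callable, Collection, Dict, FrozenSet, List, Optional


@dataclass(frozen=True)
class Advisory:
    id: str
    package: str
    bad_versions: FrozenSet[str]  # "*" (hypothetical) matches any installed version


ADVISORIES = (
    Advisory("shai-hulud-2026-05", "mistralai", frozenset({"2.4.6"})),
)
BANNER_INTERVAL_S = 24 * 60 * 60


def installed_version(dist: str) -> Optional[str]:
    # Reads installed distribution metadata, so no pip is needed (works in uv venvs).
    try:
        return metadata.version(dist)
    except metadata.PackageNotFoundError:
        return None


def detect_compromised(advisories=ADVISORIES,
                       version_of: Callable[[str], Optional[str]] = installed_version) -> List[Advisory]:
    hits = []
    for adv in advisories:
        version = version_of(adv.package)
        if version is not None and ("*" in adv.bad_versions or version in adv.bad_versions):
            hits.append(adv)
    return hits


def should_banner(adv_id: str, acked: Collection[str], last_shown: Dict[str, float],
                  now: Optional[float] = None) -> bool:
    if adv_id in acked:  # never re-banner after an ack
        return False
    now = time.time() if now is None else now
    last = last_shown.get(adv_id)
    return last is None or now - last >= BANNER_INTERVAL_S
```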
# Lazy-install framework: tools/lazy_deps.py
- LAZY_DEPS allowlist maps namespaced feature keys (tts.elevenlabs,
memory.honcho, provider.bedrock, etc.) to pip specs.
- ensure(feature) installs missing deps in the active venv via the
uv → pip → ensurepip ladder (matches tools_config._pip_install).
- Strict spec safety regex rejects URLs, file paths, shell metas,
pip flag injection, control chars — only PyPI-by-name accepted.
- Gated on security.allow_lazy_installs (default true) plus the
HERMES_DISABLE_LAZY_INSTALLS env var for restricted/audited envs.
- Migrated three backends as proof of pattern:
* tools/tts_tool.py — _import_elevenlabs() calls ensure first
* plugins/memory/honcho/client.py — get_honcho_client lazy-installs
* tts.mistral / stt.mistral entries pre-registered for when PyPI
restores mistralai
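A sketch of the spec check and the gate in front of `ensure()`. The regex is a hypothetical example that accepts only `name[extras]==version` (the exact-pin shape the later pinning commit adopts), so URLs, paths, shell metacharacters, leading `-` flags and control characters fail to match. The allowlist entry and the `install`/`is_installed` callables are placeholders, the real uv → pip → ensurepip ladder is not shown, and treating any non-empty `HERMES_DISABLE_LAZY_INSTALLS` value as "disabled" is an assumption.

```python
import os
import re
from typing import Callable, Dict, Mapping, Sequence, Tuple

# name, optional [extras], then an exact ==version pin; nothing else.
_SAFE_SPEC = re.compile(
    r"[A-Za-z0-9](?:[A-Za-z0-9._-]*[A-Za-z0-9])?"
    r"(?:\[[A-Za-z0-9._-]+(?:,[A-Za-z0-9._-]+)*\])?"
    r"==[0-9][A-Za-z0-9.+!]*"
)

# Placeholder allowlist: feature key -> pip specs (not the real pins).
LAZY_DEPS: Dict[str, Tuple[str, ...]] = {
    "tts.example": ("example-sdk==1.2.3",),
}


def is_safe_spec(spec: str) -> bool:
    # fullmatch: the whole string must match, so a trailing "; cmd" or "\n" fails.
    return _SAFE_SPEC.fullmatch(spec) is not None


def lazy_installs_allowed(config: Mapping, environ: Mapping[str, str] = os.environ) -> bool:
    if environ.get("HERMES_DISABLE_LAZY_INSTALLS"):
        return False
    security = config.get("security") or {}
    return bool(security.get("allow_lazy_installs", True))  # default: allowed


def ensure(feature: str, config: Mapping,
           install: Callable[[Sequence[str]], None],
           is_installed: Callable[[str], bool],
           environ: Mapping[str, str] = os.environ) -> bool:
    if feature not in LAZY_DEPS:
        raise KeyError(f"unknown lazy-install feature: {feature}")
    specs = LAZY_DEPS[feature]
    if not all(is_safe_spec(s) for s in specs):
        raise ValueError(f"unsafe spec in allowlist for {feature}")
    missing = [s for s in specs if not is_installed(s)]
    if not missing:
        return True
    if not lazy_installs_allowed(config, environ):
        return False
    install(missing)
    # Re-check: an install can "succeed" and still leave the module missing.
    return all(is_installed(s) for s in specs)
```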
# Installer fallback tiers
scripts/install.sh, scripts/install.ps1, setup-hermes.sh:
- Centralised _BROKEN_EXTRAS list (currently: mistral). Edit one
array when a transitive breaks; users keep every other extra.
- New 'all minus known-broken' tier between [all] and the existing
PyPI-only-extras tier. Only kicks in when [all] fails resolve.
- All three tiers explicit: every fallback announces which tier
landed and prints a re-run hint when not on Tier 1.
- install.ps1 and install.sh both regenerate their tier specs from
the same _BROKEN_EXTRAS array so updates stay in sync.
Side effect: install.ps1 Tier 2 spec previously hardcoded 'mistral'
in its extra list — bug fixed by the refactor (mistral is filtered
out).
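Deriving Tier 2 from a single broken-extras list is what keeps the two installers in sync and removes the hardcoded `mistral`. A Python sketch of that derivation; the real logic lives in bash and PowerShell, the function name and the extras in the usage example are hypothetical, and the PyPI-only Tier 3 spec is not modelled.

```python
from typing import List, Sequence

BROKEN_EXTRAS = ("mistral",)  # mirrors the _BROKEN_EXTRAS array


def tier_specs(all_extras: Sequence[str], broken: Sequence[str] = BROKEN_EXTRAS) -> List[str]:
    """Return [Tier 1, Tier 2] install specs; Tier 2 is [all] minus known-broken extras."""
    skip = set(broken)
    kept = [extra for extra in all_extras if extra not in skip]
    return [".[all]", ".[" + ",".join(kept) + "]"]
```

For example, `tier_specs(["messaging", "mistral", "voice"])` yields `[".[all]", ".[messaging,voice]"]`: editing `BROKEN_EXTRAS` is the only change needed when another transitive breaks.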
# Config
hermes_cli/config.py — DEFAULT_CONFIG.security gains:
- acked_advisories: [] (advisory IDs the user has dismissed)
- allow_lazy_installs: True (security gate for ensure())
No config version bump needed — both keys nest under existing
security: block, and load_config's deep-merge picks up DEFAULT_CONFIG
defaults for users with older configs.
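The reason no version bump is needed: a recursive merge fills in missing nested keys from the defaults while keeping whatever the user already set. The sketch below shows a merge of that kind; it is not the actual `load_config` code, and `some_existing_key` in the test is a placeholder.

```python
import copy
from typing import Any, Dict

DEFAULT_CONFIG: Dict[str, Any] = {
    "security": {"acked_advisories": [], "allow_lazy_installs": True},
}


def deep_merge(defaults: Dict[str, Any], user: Dict[str, Any]) -> Dict[str, Any]:
    merged = copy.deepcopy(defaults)  # never mutate the shared defaults
    for key, value in user.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)  # recurse into nested blocks
        else:
            merged[key] = copy.deepcopy(value)  # user value wins
    return merged
```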
# Tests
tests/hermes_cli/test_security_advisories.py — 23 tests covering:
- detect_compromised matches/non-matches, wildcard frozenset
- ack persistence, idempotence, blank rejection, config-failure path
- banner cache rate limiting + 24h re-banner + ack-stops-banner
- short_banner_lines / full_remediation_text / render_doctor_section /
gateway_log_message
- shipped catalog well-formedness invariant
tests/tools/test_lazy_deps.py — 40 tests covering:
- spec safety: 11 safe parametrized + 18 unsafe parametrized
- allowlist: unknown-feature rejection, namespace.name shape,
every shipped spec passes the safety regex
- security gating: config flag, env var, default, fail-open
- ensure() happy/sad paths: already-satisfied, install success,
pip stderr surfaced on failure, install-succeeds-but-still-missing
- is_available, feature_install_command
Combined: 63 new tests, all passing under scripts/run_tests.sh.
# Validation
- scripts/run_tests.sh tests/hermes_cli/test_security_advisories.py
tests/tools/test_lazy_deps.py → 63/63 passing
- scripts/run_tests.sh tests/hermes_cli/test_doctor.py
tests/hermes_cli/test_doctor_command_install.py
tests/tools/test_tts_mistral.py tests/tools/test_transcription_tools.py
tests/tools/test_transcription_dotenv_fallback.py → 165/165 passing
- scripts/run_tests.sh tests/hermes_cli/ tests/tools/ →
9191 passed, 8 pre-existing failures (verified on origin/main
before this change)
- bash -n on install.sh and setup-hermes.sh → OK
- py_compile on all modified .py files → OK
- End-to-end smoke test of detect_compromised + render_doctor_section
+ gateway_log_message with mocked installed version → produces
copy-pasteable remediation output
# Community
Full advisory + remediation steps:
website/docs/community/security-advisories/shai-hulud-mistralai-2026-05.md
Short-form post drafts (Discord, GitHub pinned issue, README banner):
scripts/community-announcement-shai-hulud.md
Refs: PR #24205 (mistral disabled), Socket Security advisory
<https://socket.dev/blog/mini-shai-hulud-worm-pypi>
* build(deps): pin every direct dep to ==X.Y.Z (no ranges)
Companion to the supply-chain advisory work: replace every >=/</~= range
in pyproject.toml's [project.dependencies] and [project.optional-dependencies]
with an exact ==X.Y.Z pin sourced from uv.lock.
Why: ranges allow PyPI to ship a fresh version of any direct dep at any
time without a code review on our side. With ranges, the malicious
mistralai 2.4.6 release would have been pulled by every fresh
'pip install -e .[all]' for the hours between upload and PyPI's
quarantine — exactly the install window we got hit on. Exact pins close
that window: the only way a new package version reaches a user is via
an intentional update on our end.
What the user-facing change is: nothing, behavior-wise. Every package
resolves to the same version it was already resolving to via uv.lock —
the pins just remove the resolver's freedom to pick a different one.
Cost: any user installing Hermes alongside another package that requires
a newer pin gets a resolver conflict. Acceptable for our isolated-venv
install path; documented in the new comment block.
Build-system requires line (setuptools>=61.0) is intentionally left
as a range — pinning the build backend would block fresh pip from
bootstrapping the build on architectures where that exact wheel isn't
available.
mistral extra (mistralai==2.3.0) is pinned but stays out of [all]
(per PR #24205). 'uv lock' regeneration will fail until PyPI restores
mistralai; lockfile regeneration is gated behind that, NOT on every PR.
LAZY_DEPS in tools/lazy_deps.py also moved to exact pins so the lazy-
install pathway can never resolve a different version than the one
declared in pyproject.toml.
Validation:
- Cross-checked all 77 pinned direct deps in pyproject.toml against
uv.lock — every pin matches the resolved version exactly.
- Cross-checked all LAZY_DEPS specs against uv.lock — same.
- 'uv pip install -e .[all] --dry-run' resolves 205 packages cleanly.
- tests/tools/test_lazy_deps.py + tests/hermes_cli/test_security_advisories.py
→ 63/63 passing (every shipped spec passes the safety regex).
- Doctor + TTS + transcription targeted suite → 146/146 passing.
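The pin-vs-lockfile cross-check in the validation list can be scripted. A sketch, assuming `uv.lock` is TOML with one `[[package]]` table per resolved package carrying `name` and `version`. `pin_mismatches`, `load_lock_versions` and the `self_name` skip for `hermes-agent[...]` self-references are hypothetical helpers; `tomllib` needs Python 3.11+, so it is imported lazily.

```python
import re
from typing import Dict, Iterable, List

_NAME = re.compile(r"[A-Za-z0-9][A-Za-z0-9._-]*")
_PIN = re.compile(r"([A-Za-z0-9][A-Za-z0-9._-]*)(\[[^\]]*\])?==([^\s;,]+)")


def normalize(name: str) -> str:
    # PEP 503 name normalization, so "Foo_Bar" and "foo-bar" compare equal.
    return re.sub(r"[-_.]+", "-", name).lower()


def pin_mismatches(deps: Iterable[str], locked: Dict[str, str],
                   self_name: str = "hermes-agent") -> List[str]:
    problems = []
    for dep in deps:
        requirement = dep.split(";", 1)[0].strip()  # drop environment markers
        name_match = _NAME.match(requirement)
        if name_match and normalize(name_match.group(0)) == self_name:
            continue  # e.g. hermes-agent[messaging] inside [all]
        pin = _PIN.fullmatch(requirement)
        if pin is None:
            problems.append(f"not an exact pin: {dep}")
            continue
        name, version = normalize(pin.group(1)), pin.group(3)
        if locked.get(name) != version:
            problems.append(f"{name}=={version} != uv.lock {locked.get(name)}")
    return problems


def load_lock_versions(path: str = "uv.lock") -> Dict[str, str]:
    import tomllib  # standard library from Python 3.11

    with open(path, "rb") as fh:
        data = tomllib.load(fh)
    return {normalize(p["name"]): p["version"] for p in data.get("package", []) if "version" in p}
```

In use, `deps` would be `[project].dependencies` plus every list under `[project.optional-dependencies]`, and an empty result means every direct pin matches the lockfile.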
* build(deps): hash-verify transitives via uv.lock; remove unresolvable [mistral] extra
You asked: 'what about the dependencies the dependencies rely on?' —
correctly noting that exact-pinning direct deps in pyproject.toml does
NOT cover the transitive graph. `pip install` and `uv pip install` both
re-resolve transitives fresh from PyPI at install time, so a compromised
transitive (e.g. `httpcore` if it got worm-poisoned tomorrow) would
still hit our users even with every direct dep exact-pinned.
# What this commit fixes
1. **Both real installer scripts now prefer `uv sync --locked` as Tier 0.**
uv.lock records SHA256 hashes for every transitive — a compromised
package with a different hash gets REJECTED. Falls through to the
existing `uv pip install` cascade if the lockfile is missing or
stale, with a loud warning that the fallback path does NOT
hash-verify transitives. Previously only `setup-hermes.sh` (the dev
path) used the lockfile; `scripts/install.sh` and `scripts/install.ps1`
(the paths fresh users actually run) skipped it.
2. **Removed the `[mistral]` extra entirely.** The `mistralai` PyPI
project is fully quarantined right now — every version returns 404,
so any pin we wrote was unresolvable, which broke `uv lock --check`
in CI. Restoration is documented in pyproject.toml as a 5-step
checklist (verify, re-add extra, re-enable in 4 modules, regenerate
lock, optionally re-add to [all]).
3. **Regenerated uv.lock.** 262 packages, mistralai/eval-type-backport/
jsonpath-python pruned. `uv lock --check` now passes.
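The installer ladder with the new Tier 0 in front can be modelled as an ordered list of commands where only the lockfile tier counts as hash-verified. A Python sketch for clarity (the real scripts are bash and PowerShell); `run_ladder`, the tier labels and the exact argv lists are illustrative assumptions rather than what `install.sh` runs, and the runner is injectable so nothing is installed when testing it.

```python
import subprocess
from typing import Callable, List, Optional, Sequence, Tuple

# (label, argv, hash_verified)
Tier = Tuple[str, List[str], bool]

TIERS: List[Tier] = [
    ("Tier 0 (uv.lock)", ["uv", "sync", "--locked", "--all-extras"], True),
    ("Tier 1 ([all])", ["uv", "pip", "install", "-e", ".[all]"], False),
]


def run_ladder(tiers: Sequence[Tier],
               run: Callable[[List[str]], int] = subprocess.call) -> Optional[str]:
    """Try each tier in order; return the label that landed, or None if all failed."""
    for index, (label, argv, hash_verified) in enumerate(tiers):
        if run(argv) != 0:
            continue
        if not hash_verified:
            print(f"WARNING: {label} does not hash-verify transitive dependencies")
        if index > 0:
            print(f"Installed via fallback {label}; re-run the installer to retry earlier tiers")
        return label
    return None
```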
# Defense-in-depth view
| Layer | Where | Protects against |
|----------------------------|-------------------|-------------------------------------------|
| Exact pins in pyproject | direct deps | new mistralai 2.4.6-style direct compromise |
| uv.lock + `--locked` install | transitive graph | transitive worm injection |
| Tier-0 hash-verified path | install.sh / .ps1 | actually USE the lockfile in fresh installs |
| `uv lock --check` CI gate | every PR | drift between pyproject and lockfile |
| `hermes_cli/security_advisories.py` | runtime | cleanup for users who already got hit |
The exact pinning + hash verification together close the supply-chain
gap. Without the lockfile path, exact pins alone are theater.
# Validation
- `uv lock --check` → passes (262 packages resolved, no drift).
- `bash -n` on install.sh + setup-hermes.sh → OK.
- 209/209 tests passing across new + adjacent test files
(test_lazy_deps.py, test_security_advisories.py, test_doctor.py,
test_tts_mistral.py, test_transcription_tools.py).
- TOML parse OK.
* chore: remove community announcement drafts (PR body covers it)
* build(deps): lazy-install every opt-in backend (anthropic, search, terminal, platforms, dashboard)
Extends the lazy-install framework to cover everything that's not used by
every hermes session. Base install drops from ~60 packages to 45.
Moved out of the core `dependencies` list:
- anthropic (only when provider=anthropic native, not via aggregators)
- exa-py, firecrawl-py, parallel-web (search backends; only when picked)
- fal-client (image gen; only when picked)
- edge-tts (default TTS but still optional)
New extras in pyproject.toml: [anthropic] [exa] [firecrawl] [parallel-web]
[fal] [edge-tts]. All added to [all].
New LAZY_DEPS entries: provider.anthropic, search.{exa,firecrawl,parallel},
tts.edge, image.fal, memory.hindsight, platform.{telegram,discord,matrix},
terminal.{modal,daytona,vercel}, tool.dashboard.
Each import site now calls ensure() before importing the SDK. Where the
module had a top-level try/except (telegram, discord, fastapi), the
graceful-fallback pattern was extended to lazy-install on first
check_*_requirements() call and re-bind module globals.
Updated test_windows_native_support.py tzdata check from snapshot
(>=2023.3 literal) to invariant (any version + win32 marker).
Validation:
- Base install: 45 packages (was ~60); 6 newly-extracted packages absent
- uv lock --check: passes (262 packages, no drift)
- 209/209 lazy_deps + advisory + doctor + tts/transcription tests passing
- py_compile clean on all 12 modified modules
99ad2d1372 |
fix(deps): unbreak [all] install — drop mistralai while PyPI quarantined (#24205)
The `mistralai` PyPI package was quarantined on 2026-05-12 after a malicious 2.4.6 release. Every fresh resolve (AUR makepkg, Docker build, CI run, install.sh first-run) currently fails on `mistralai>=2.3.0,<3` because PyPI returns zero candidates. Existing users running `hermes update` mostly didn't notice: `hermes update` falls back from `.[all]` to per-extra retries and silently skips mistral with a warning that scrolls past. Fresh installs, however, hard-fail or lose every other extra.
Changes:
- pyproject.toml: drop `hermes-agent[mistral]` from `[all]` and `[termux-all]`. The `mistral` extra itself is preserved so users can opt back in once PyPI un-quarantines.
- hermes_cli/tools_config.py: hide Mistral Voxtral TTS from the `hermes tools` provider picker until restored.
- hermes_cli/web_server.py: drop "mistral" from dashboard STT options.
- tools/transcription_tools.py: explicit `provider: mistral` returns "none" with a clear status message; auto-detect skips mistral.
- tools/tts_tool.py: dispatcher returns a clear "temporarily disabled" error before any SDK import attempt (avoids cached-stale-package surprises).
- tests/tools/: update three test files to assert the new disabled behavior. Each test docstring records why and points at the rollback trigger (PyPI un-quarantines mistralai).
Restore plan: revert this commit once the package is available on PyPI again. The behavior change is intentional and documented in code comments + test docstrings to make the rollback trivial.
Validation:
- scripts/run_tests.sh tests/tools/ -k 'mistral or stt or tts' → 425/425 passing.
Refs: https://pypi.org/simple/mistralai/ (currently "pypi:project-status: quarantined").
271883447e |
feat: expose HERMES_SESSION_ID to agent tools via ContextVar + env (#23847)
Set HERMES_SESSION_ID using the existing session_context.py ContextVar system for concurrency safety (multiple gateway sessions in one process won't cross-talk). Also writes os.environ as a fallback for CLI mode.
Touchpoints:
- gateway/session_context.py: Add _SESSION_ID ContextVar + _VAR_MAP entry
- run_agent.py: Set both ContextVar and os.environ at init and on context-compression rotation
- tools/environments/local.py: Bridge ContextVars into subprocess env in _make_run_env() (ContextVars don't propagate to child processes)
- tests/run_agent/test_session_id_env.py: 3 tests covering env, provided ID, and ContextVar paths
The execute_code subprocess already passes HERMES_*-prefixed vars through _scrub_child_env (line 82: _SAFE_ENV_PREFIXES includes 'HERMES_').
Primary use case: webhook-triggered agents that need to include a `--resume <session_id>` takeover command in their output.
ce0f529cde |
chore: ruff auto-fix C401, C416, C408, PLR1722 (#23940)
C401: set(x for x in y) -> {x for x in y} (set comprehension)
C416: [(k,v) for k,v in d.items()] -> list(d.items()) (unnecessary listcomp)
C408: tuple()/dict() -> ()/{} (unnecessary collection call)
PLR1722: exit() -> sys.exit() (adds import sys where needed)
21 instances fixed, 0 remaining. 19 files, +40/-36.
2ec8d2b42f |
chore: ruff auto-fix PLR6201 — tuple → set in membership tests (#23937)
Replace `x in (a, b)` with `x in {a, b}` for all literal-tuple membership tests. Set lookup is O(1) vs O(n) for tuple, a consistent micro-optimization across the codebase. 608 instances fixed via `ruff --fix --unsafe-fixes`, 0 remaining. 133 files, +626/-626 (net zero).
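For hashable operands the two forms agree, which is what makes the rewrite mechanical. They can diverge for unhashable operands, a behaviour change of the kind that puts a rewrite like this behind `--unsafe-fixes`. The function names are illustrative.

```python
def is_terminal_tuple(state) -> bool:
    return state in ("done", "failed", "cancelled")  # before: linear scan of a tuple


def is_terminal_set(state) -> bool:
    return state in {"done", "failed", "cancelled"}  # after: hash lookup in a set


def set_form_or_none(state):
    # The set form hashes its operand, so an unhashable value raises TypeError
    # where the tuple form simply answered False.
    try:
        return is_terminal_set(state)
    except TypeError:
        return None
```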
657874460f |
chore: ruff auto-fixes — collapsible-else-if, if-stmt-min-max, dict.fromkeys (#23926)
PLR5501 (collapsible-else-if): 28 instances; else: if: → elif:
PLR1730 (if-stmt-min-max): 15 instances; if x<y: x=y → x=max(x,y)
C420 (dict.fromkeys): 2 instances; dictcomp → dict.fromkeys
PLR1704 (redefined-argument): 1 instance; reason → err_msg (shadow fix)
C414 (unnecessary-list): 1 instance; sorted(list(x)) → sorted(x)
28 files, -44 net lines. All mechanical, zero logic changes. 17,211 tests pass, zero regressions.
976d8e27ad |
fix(approval): catch sudo with stdin/askpass/shell privilege flags
Adds the only #17873 category not covered by the in-flight PRs #17962 (briandevans, reverse shell + download-execute) and #7993 (SHL0MS, credential reads + curl/wget exfiltration): sudo invocations that an LLM-driven agent can drive without TTY interaction.
The agent has no TTY, so the sudo forms that succeed without human involvement are those reading the password from stdin (`-S` / `--stdin`) or via an askpass helper (`-A` / `--askpass`). The shell-launch (`-s`) and BSD auth-type (`-a`) flags are also gated since they are privilege-relevant invocations the agent can chain after acquiring the password (e.g. read SUDO_PASSWORD from .env -> sudo -S -s -> root shell). Plain `sudo cmd` (no flag) is TTY-bound and excluded.
Two patterns:
1. Direct flag: `\bsudo\b[^;|&\n]*?\s+(?:-s\b|--stdin\b|-a\b|--askpass\b)`
The lazy `[^;|&\n]*?` consumes flag-arguments without spanning command separators, so `sudo -u root -S whoami` matches (a textbook offensive form that a strict `(?:\s+-[^\s]+)*` "leading flags only" pattern would have missed because `root` is a flag-value, not a flag).
2. Combined short flags: `\bsudo\b[^;|&\n]*?\s+-[a-z]*[sa][a-z]*\b`
Catches packed forms like `sudo -nS id` where multiple flags share a single `-X` token.
`_normalize_command_for_detection` lowercases input before pattern matching (tools/approval.py:340), so case variants of S/s and A/a collapse; both letter-pairs are gated since each is a privilege-relevant invocation.
Tests: 21 new cases in TestDetectSudoStdin (12 positive covering all flag-order permutations including herestring source and printf-piped forms; 9 negative including TTY-bound `sudo whoami`, interactive `sudo -i`, env-var reference `$SUDO_USER`, doc lookup `man sudo`, package install, and the `pseudosudo` word-boundary edge case). Empirical coverage: 11/11 attacks matched, 0/10 false positives.
Refs: #17873 category 4. Adjacent: #17962 (reverse shell + download-execute), #7993 (credential reads + curl/wget exfiltration).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
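The two patterns quoted above can be exercised directly against several of the cases the message names. This sketch uses `str.lower()` as a simplified stand-in for `_normalize_command_for_detection` (the real function may do more), and `is_noninteractive_sudo` is a hypothetical name.

```python
import re

# The two patterns from the commit message, applied to lowercased input.
_SUDO_FLAG = re.compile(r"\bsudo\b[^;|&\n]*?\s+(?:-s\b|--stdin\b|-a\b|--askpass\b)")
_SUDO_PACKED = re.compile(r"\bsudo\b[^;|&\n]*?\s+-[a-z]*[sa][a-z]*\b")


def is_noninteractive_sudo(command: str) -> bool:
    normalized = command.lower()  # simplified stand-in for the real normalizer
    return bool(_SUDO_FLAG.search(normalized) or _SUDO_PACKED.search(normalized))
```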
9520a1ccdf |
fix(terminal): block sudo -S password guessing when SUDO_PASSWORD is not set
Fixes #9590: Block explicit sudo -S (stdin password mode) commands when the SUDO_PASSWORD environment variable is not configured.
The attack vector: the LLM constructs 'echo guessedpass | sudo -S cmd' to brute-force sudo passwords, iterating based on sudo's error output ('Sorry, try again'). The existing _transform_sudo_command only injects -S when SUDO_PASSWORD exists; without it, the LLM's explicit sudo -S must be treated as a guessing attempt.
Changes:
- Add _check_sudo_stdin_guard() in approval.py: detects sudo -S when SUDO_PASSWORD is absent, anchored to command-start positions (^ ; && || | etc.) to avoid false positives on literal text
- Integrate into check_all_command_guards() above yolo/mode=off so the block is unconditional (like the hardline floor)
- Add 6 tests covering: detection, allow-list, SUDO_PASSWORD bypass, integration with check_all_command_guards, yolo non-bypass, container backend bypass
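A sketch of the guard's shape: skip entirely when `SUDO_PASSWORD` is configured, otherwise block `sudo -S` only when `sudo` sits at a command-start position, so the same words inside quoted text pass. `check_sudo_stdin_guard`, its regex and the message are hypothetical simplifications of `_check_sudo_stdin_guard()`; the container-backend bypass is not modelled.

```python
import os
import re
from typing import Mapping, Optional

# "sudo" counts only at a command-start position: start of input, or right
# after a newline or one of ; & | (which also covers && and ||).
_SUDO_STDIN = re.compile(r"(?:^|[;&|\n])\s*sudo\b[^;&|\n]*?\s(?:-S\b|--stdin\b)")


def check_sudo_stdin_guard(command: str,
                           env: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Return a block reason, or None to let the command through."""
    env = os.environ if env is None else env
    if env.get("SUDO_PASSWORD"):
        return None  # a configured password means -S is the sanctioned path
    if _SUDO_STDIN.search(command):
        return "sudo -S without SUDO_PASSWORD looks like password guessing; blocked"
    return None
```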
8ac998cb0c |
fix(send_message): allow kanban workers to call send_message
The kanban dispatcher sets HERMES_KANBAN_TASK on every spawned worker but launches it with the assignee profile's HERMES_HOME (e.g. ~/.hermes/profiles/<name>/), which has no gateway.pid file. The existing _check_send_message therefore returned False from the is_gateway_running() fallback, even though the parent gateway is alive and reachable.
Net effect: workers could call kanban_* tools (gated on HERMES_KANBAN_TASK in _check_kanban_mode) but not send_message. This breaks the natural pattern of "worker does the job, calls send_message to deliver rich content to the originating chat, then calls kanban_complete with a one-line summary" because the kanban notifier's payload_summary is hard-truncated to the first line (~200 chars) at gateway/run.py:3963; anything richer has to ship via send_message.
Honoring HERMES_KANBAN_TASK in _check_send_message, symmetric with _check_kanban_mode in kanban_tools.py:42, closes the gap. No new state, no new env var, no profile-config changes required.
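The fix amounts to checking the kanban marker before the pid-file fallback. A simplified sketch: `check_send_message` and the injected `is_gateway_running` callable are hypothetical, and the real `_check_send_message` may apply other checks first.

```python
import os
from typing import Callable, Mapping, Optional


def check_send_message(is_gateway_running: Callable[[], bool],
                       env: Optional[Mapping[str, str]] = None) -> bool:
    env = os.environ if env is None else env
    # Kanban workers run under the assignee profile's HERMES_HOME, which has no
    # gateway.pid, so trust HERMES_KANBAN_TASK as _check_kanban_mode does.
    if env.get("HERMES_KANBAN_TASK"):
        return True
    return is_gateway_running()
```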
ebf2ea584a |
feat(terminal,cli): docker_extra_args + display.timestamps
Two independent opt-in QoL toggles, both off by default.
terminal.docker_extra_args:
- List of extra flags appended verbatim to docker run after security defaults. Useful for adding capabilities (e.g. --cap-add SETUID) or other docker run options not exposed by existing config keys.
- Non-string entries are logged and skipped.
- Also available via TERMINAL_DOCKER_EXTRA_ARGS='[...]' env var.
display.timestamps:
- Appends [HH:MM] to the user input bullet and the assistant response box header. A single hub in _format_submitted_user_message_preview() covers both single-line and multi-line user previews; the assistant response label gets the timestamp at box-open time.
Closes #1569 (timestamps).
Co-authored-by: Mibayy <Mibayy@users.noreply.github.com>
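A sketch of how the extra flags might be sanitized and placed. The helper names and the `--cap-drop ALL` default in the test are hypothetical, and how the env var and the config key take precedence over each other is not modelled here.

```python
import json
import logging
from typing import Any, List, Sequence

logger = logging.getLogger("docker.sketch")


def sanitize_extra_args(value: Any) -> List[str]:
    """Keep string entries verbatim; log and skip anything else."""
    if not isinstance(value, list):
        return []
    args: List[str] = []
    for item in value:
        if isinstance(item, str):
            args.append(item)
        else:
            logger.warning("Skipping non-string docker_extra_args entry: %r", item)
    return args


def parse_env_extra_args(raw: str) -> List[str]:
    # TERMINAL_DOCKER_EXTRA_ARGS holds a JSON list, e.g. '["--cap-add", "SETUID"]'.
    try:
        return sanitize_extra_args(json.loads(raw))
    except json.JSONDecodeError:
        logger.warning("TERMINAL_DOCKER_EXTRA_ARGS is not valid JSON; ignoring it")
        return []


def build_docker_run(image: str, security_defaults: Sequence[str],
                     extra_args: Sequence[str]) -> List[str]:
    # Extra flags come after the security defaults but before the image,
    # because docker run treats anything after the image as the container command.
    return ["docker", "run", *security_defaults, *extra_args, image]
```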
62cfe79e93 |
fix(tools): clarify kanban_complete phantom-card retry guidance
When kanban_complete rejects a created_cards list as hallucinated, the
task is intentionally left in-flight (the gate runs before the write
txn) so the worker can retry with a corrected list or pass
created_cards=[] to skip the check. The retry path already worked, but
the previous error wording read like a terminal failure and workers
were observed abandoning the run instead of trying again.
Spell out the recovery path explicitly in the tool_error response
("Your task is still in-flight ... Retry kanban_complete with ...") and
add regression coverage at both the kernel and tool layers so the
retry contract — and the wording the worker depends on to discover
it — is pinned.
Fixes #22923
673418dfa1 |
fix(kanban): reject toolset names in task skills
d4b26df897 |
perf(browser): route browser_console eval through supervisor's persistent CDP WS (180x faster) (#23226)
Adds CDPSupervisor.evaluate_runtime() and wires it into _browser_eval as a fast path when a supervisor is alive for the current task_id. Replaces the ~180ms agent-browser subprocess fork+exec+Node-startup hop with a ~1ms Runtime.evaluate over the supervisor's already-connected WebSocket.
Falls through to the existing agent-browser CLI path when no supervisor is running (e.g. backends without CDP, or before the first browser_navigate attaches one), so behaviour is unchanged where it can't apply. JS-side exceptions surface directly without falling through to the subprocess (the subprocess would just re-raise the same error, slower); supervisor-side failures (loop down, no session) fall through cleanly.
Benchmark (30 iterations of `1 + 1` against headless Chrome):
- supervisor WS: mean=0.96ms, median=0.91ms
- agent-browser subprocess: mean=179.35ms, median=167.73ms
- mean speedup: 187x
Tests: 14 unit tests (mocked supervisor + response-shape coverage), 5 real-Chrome e2e tests in test_browser_supervisor.py (gated on Chrome being installed). Browser test suite: 355 passed, 1 skipped.
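The fall-through rules can be sketched as a small dispatcher: JS errors propagate, supervisor failures drop to the CLI path. `browser_eval`, the two exception classes, the `supervisors` mapping and `subprocess_eval` are hypothetical stand-ins; only `evaluate_runtime` is named in the commit, and whether the real code raises or returns an error payload for JS exceptions is not stated.

```python
from typing import Any, Callable, Dict


class SupervisorUnavailable(Exception):
    """Hypothetical: supervisor-side failure (loop down, no session)."""


class JSEvaluationError(Exception):
    """Hypothetical: the page's JavaScript threw."""


def browser_eval(task_id: str, expression: str, supervisors: Dict[str, Any],
                 subprocess_eval: Callable[[str], Any]) -> Any:
    supervisor = supervisors.get(task_id)
    if supervisor is not None:
        try:
            return supervisor.evaluate_runtime(expression)  # ~1ms over the open WS
        except JSEvaluationError:
            raise  # the subprocess would only re-raise the same error, slower
        except SupervisorUnavailable:
            pass  # fall through to the agent-browser CLI path
    return subprocess_eval(expression)
```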