Addresses findings from two self-review passes pre-merge.
First pass (3-agent parallel review):
1. plugins/browser/browser_use/provider.py: drop the
``_ = managed_nous_tools_enabled`` dead-import-hider in
_get_config_or_none(). The import was actively misleading — the
helper IS used in _get_config() (separate method, separate import),
not here. The "keep static analysis happy" comment was wrong about
what the helper does in this scope.
2. agent/browser_provider.py: drop ``pragma: no cover`` from
is_configured() / provider_name() backward-compat aliases. They ARE
covered by ``TestLegacyAbcAliases`` — the pragma would have masked
future regressions.
3. tools/browser_tool.py: refactor _is_legacy_provider_registry_overridden()
to compare against a module-frozen _DEFAULT_PROVIDER_REGISTRY snapshot
instead of hardcoded set of 3 keys. Future maintainers adding a 4th
built-in provider now just extend _PROVIDER_REGISTRY; the override
detection adapts automatically. Previously the hardcoded
``set(...) != {"browserbase", "browser-use", "firecrawl"}`` would flip
True forever on any 4-key registry, silently routing every install
onto the legacy fixture path.
4. tools/browser_tool.py: when explicit ``browser.cloud_provider`` is set
but the registry has no matching plugin (typo, uninstalled plugin,
discovery failure), emit a WARNING with actionable text instead of
silently falling through to auto-detect. Legacy code surfaced a typed
credentials error via direct class instantiation; this log restores
the signal in the post-migration path.
5. agent/browser_registry.py: trim the triple-redundant _LEGACY_PREFERENCE
documentation. Module docstring + 13-line block-comment + 5-line
inline comment was repeating the same point. Kept the docstring and
trimmed the block-comment to 5 lines.
6. agent/browser_registry.py: upgrade is_available()-raised logging from
DEBUG to WARNING with exc_info=True. A provider's availability check
throwing is unusual enough that users debugging "no cloud provider"
need the traceback in logs.
7. tests/plugins/browser/check_parity_vs_main.py: drop dead top-level
imports (os, shutil, tempfile — only referenced inside the
SUBPROCESS_SCRIPT string literal that runs in a child process).
Second pass (architecture + claim-verification review):
8. tools/browser_tool.py: rewrite the inline comment in _get_cloud_provider
auto-detect branch. Prior text claimed it "routes through the plugin
registry's legacy preference walk so third-party plugins still get a
chance to be selected when they're explicitly configured" — false on
both counts. The branch uses module-level legacy class aliases
(BrowserUseProvider / BrowserbaseProvider) directly; third-party
plugins are intentionally reachable only via explicit
``browser.cloud_provider``. Corrected comment now matches behaviour
and cross-references _LEGACY_PREFERENCE for the firecrawl gate
rationale.
9. tools/browser_tool.py + tests/tools/test_managed_browserbase_and_modal.py:
drop the unused ``get_active_browser_provider as
_registry_get_active_browser_provider`` alias from the
``from agent.browser_registry import ...`` block. It was never
referenced; matching test-stub line in the agent.browser_registry
SimpleNamespace also dropped. ``get_provider`` is still imported (used
by the explicit-config dispatch path at line 535).
10. plugins/browser/firecrawl/provider.py: align emergency_cleanup()
with the early-guard pattern used in browserbase + browser_use
plugins. Previously firecrawl tried the DELETE and relied on
``_headers()`` raising ValueError to trip a "missing credentials"
warning; same final outcome but a different control flow that read
like a bug to a maintainer skimming the three modules. Now: if
is_available() is False, log+return early — identical shape to the
other two providers.
Verification: 54/54 unit tests + 13/13 parity scenarios still pass.
Two changes that go together:
1. tools/browser_tool.py — add _ensure_browser_plugins_loaded() and call
it from _get_cloud_provider() before consulting the registry. Normally
model_tools triggers discover_plugins() as an import side-effect, but
_get_cloud_provider() can be reached from contexts that haven't gone
through model_tools (standalone scripts, certain unit-test paths, the
new parity-sweep harness). Without the defensive call, the registry is
empty and _registry_get_browser_provider() returns None — silently
downgrading users to local mode when they explicitly configured a
cloud provider with no credentials yet. The behavior-parity sweep
below caught this as 4 scenario regressions (explicit-X-no-creds for
all 3 providers, and explicit-firecrawl-with-creds).
2. tests/plugins/browser/check_parity_vs_main.py — subprocess harness
that pins one Python invocation to origin/main and one to this PR's
worktree via sys.path.insert(), runs _get_cloud_provider() across a
13-scenario config matrix, and diffs the reduced shape tuple
(is_local, provider_name, is_available). Provider_name pulls from
provider.provider_name() which is the legacy CloudBrowserProvider
API and remains as a backward-compat alias on the new BrowserProvider
ABC, so the comparison is apples-to-apples regardless of class
identity.
Final result: PARITY OK across 13 scenarios. The four observable
config/credential matrices that exercise the dispatcher all match
origin/main bit-for-bit:
- no-config + no-env → local
- explicit local + any env → local
- explicit BB / BU / FC + no creds → provider returned with
is_available()==False (so dispatcher surfaces typed credentials
error; matches main exactly)
- explicit BB / BU / FC + creds → provider returned with
is_available()==True
- no-config + BU creds → Browser Use
- no-config + BB creds → Browserbase
- no-config + both → Browser Use (legacy walk first hit)
- no-config + FC only → local (firecrawl NOT in legacy walk)
- no-config + FC + BB → Browserbase (legacy walk skips firecrawl)
Per the dev skill's "behavior-parity for refactor PRs" rule — without
this subprocess sweep, 31/31 unit tests pass while the production code
path is silently broken for users who type `browser.cloud_provider:
browserbase` and run a single browser command without prior model_tools
import. Caught + fixed before push.
The four files in tools/browser_providers/ (base.py, browserbase.py,
browser_use.py, firecrawl.py) have been migrated into
plugins/browser/<vendor>/provider.py over the previous commits. No
in-tree code references them anymore — the legacy class names
(BrowserbaseProvider / BrowserUseProvider / FirecrawlProvider) are
re-exported from tools.browser_tool as aliases to the plugin classes,
so existing test patches keep working.
Updates tests/tools/test_managed_browserbase_and_modal.py:
- Adds _load_plugin_module() helper next to _load_tool_module().
- Reroutes five _load_tool_module('tools.browser_providers.X', ...)
calls to _load_plugin_module('plugins.browser.X.provider', ...).
- Renames BrowserbaseProvider/BrowserUseProvider -> the new plugin
class names (BrowserbaseBrowserProvider / BrowserUseBrowserProvider).
- Updates is_configured() -> is_available() on the one assertion that
cared about the rename (the others stay on is_configured() via the
BrowserProvider ABC's backward-compat alias).
Net diff: -630 / +39 lines (tests + dead-code deletion). Verified
23/23 tests in test_browser_cloud_*.py + test_managed_browserbase_and_modal.py
still pass.
Closes the file-tree mismatch portion of #25214. Remaining work:
new plugin-level test coverage under tests/plugins/browser/, behaviour
parity subprocess sweep vs origin/main, and full tests/tools/ regression
sweep before opening the PR.
Switches tools.browser_tool's cloud-provider lookup from the hardcoded
_PROVIDER_REGISTRY class-instantiation pattern to the
agent.browser_registry singleton registry that plugins self-populate.
Changes:
- tools/browser_tool.py top imports: pull BrowserProvider from
agent.browser_provider (re-exported as CloudBrowserProvider for legacy
callers) and the three provider classes from plugins/browser/<vendor>/.
Legacy class names (BrowserbaseProvider, BrowserUseProvider, FirecrawlProvider)
remain on tools.browser_tool as re-export shims so existing test patches
(monkeypatch.setattr(browser_tool, 'BrowserUseProvider', ...)) keep working.
- _get_cloud_provider() now consults agent.browser_registry.get_provider()
for explicit-config lookups. The auto-detect fallback still uses
BrowserUseProvider() / BrowserbaseProvider() at the module level so the
cache-policy test fixtures (which patch those names) keep driving the
function. Test-time _PROVIDER_REGISTRY overrides are detected by class
identity and routed through the legacy factory-call path.
- agent/browser_provider.py: BrowserProvider grows is_configured() and
provider_name() as thin backward-compat aliases for the legacy
CloudBrowserProvider API. Subclasses MUST implement is_available() and
name; the aliases delegate. This keeps ~6 caller sites in browser_tool.py
working without churning them.
- tests/tools/test_managed_browserbase_and_modal.py: _install_fake_tools_package
grows stubs for agent.browser_provider / agent.browser_registry /
plugins.browser.<vendor>.provider so the test's spec-loader path
(sys.modules-reset + reload-tool-from-disk) can satisfy tools.browser_tool's
top-level imports.
Verified: all 23 existing tests in test_browser_cloud_*.py +
test_managed_browserbase_and_modal.py still pass post-cutover.
The legacy tools/browser_providers/ directory is NOT yet deleted; several
tests still _load_tool_module() those files via spec_from_file_location.
The deletion + test-path updates land in a later commit.
Six days after #23937 (608 fixes) the codebase had accumulated 241 new
PLR6201 violations. Same mechanical `x in (...)` → `x in {...}` fix,
same zero-risk profile: set lookup is O(1) vs O(n) for tuple and the
two are semantically equivalent for hashable scalar membership tests.
All 241 instances fixed via `ruff check --select PLR6201 --fix
--unsafe-fixes`, zero remaining. Every changed value is a hashable
scalar (str/int/None/enum/signal); no risk of unhashable runtime
errors. No behavior change.
Test plan:
- 119 files changed, +244/-244 (net zero) — exactly one-line edits
- `ruff check` clean afterward
- Compile checks pass on the largest touched files (cli.py, run_agent.py,
gateway/run.py, gateway/platforms/discord.py, model_tools.py)
- Subset broad test run on tests/gateway/ tests/hermes_cli/ tests/agent/
tests/tools/: 18187 passed, 59 pre-existing failures (verified against
origin/main with the same shape — identical failure count, identical
category — all xdist test-order flakes unrelated to this change)
Follows the same template as PR #23937 ([tracker: #23972](https://github.com/NousResearch/hermes-agent/issues/23972)).
When Hermes runs on a remote host over SSH, MCP OAuth loopback flows
silently fail: the OAuth provider redirects the user's browser to
http://127.0.0.1:<port>/callback, which reaches the callback server
on the *remote* machine — not the local machine where the browser is
running.
_redirect_handler already detected SSH (via _can_open_browser) and
printed "Headless environment detected — open the URL manually." but
gave no guidance on how to actually reach the callback server. Users
got silent timeouts or "Could not establish connection" errors.
This is the same bug fixed for xAI-oauth and Spotify in #26592, which
added _print_loopback_ssh_hint() in hermes_cli/auth.py. mcp_oauth.py
uses the identical loopback callback pattern (http://127.0.0.1:<port>/callback
via _configure_callback_port / _wait_for_callback) but was missing the hint.
Fix: when SSH_CLIENT or SSH_TTY is set and _oauth_port is available,
print the ssh -N -L port-forward command and the OAuth-over-SSH guide
URL to stderr, consistent with the rest of _redirect_handler's output.
Tests: 4 new cases in TestRedirectHandlerSshHint covering SSH_CLIENT,
SSH_TTY, local session (no hint), and missing _oauth_port (no hint).
Add creationflags=CREATE_NO_WINDOW to every Windows Popen call
across the terminal, process registry, code execution, and kanban
worker subsystems. Prevents visible CMD windows from flashing on
the user's desktop during agent operation.
Also adds the _IS_WINDOWS module constant to kanban_db.py where
it was missing, for consistency with the other patched files.
5 Popen sites across 4 files:
- tools/environments/local.py (terminal foreground spawn)
- tools/process_registry.py (background process spawn)
- tools/code_execution_tool.py (sandbox + interpreter probe)
- hermes_cli/kanban_db.py (kanban worker spawn)
- Remove unused from tools/tts_tool.py (dead code)
- Move _BUILTIN_DELIVER_PLATFORMS set from send() method to module
scope in gateway/platforms/webhook.py to avoid reallocation on
every call
Port from anomalyco/opencode#25019 ("fix: handle invalid mcp urls").
Previously: a typo in `config.yaml` (missing scheme, wrong scheme,
empty string, non-string value) slipped past `_is_http()` and hit
`httpx.URL(url)` or `streamablehttp_client(url, ...)` deep in the
transport layer. That raised a generic exception which went through
the reconnect-backoff loop, so a bad URL caused _MAX_INITIAL_CONNECT_RETRIES
attempts with doubling backoff — about a minute of pointless retries
plus an opaque error — before the server was marked failed.
Now: we validate the URL once, at the top of `run()`, before
entering the retry loop. A malformed URL raises `InvalidMcpUrlError`
(a `ValueError` subclass) with a message that names the offending
server and explains exactly what was wrong. `_ready` is set and
`_error` is populated, so `start()` re-raises and the server shows
up as failed in `hermes mcp list` without any backoff burn.
Validation rules:
- Must be a string (rejects None, dict, int)
- Must be non-empty (rejects '' and whitespace-only)
- Scheme must be http or https (rejects file://, ws://, stdio://)
- Must have a non-empty host (rejects http:///, http://:8080)
Tests (21 new cases in tests/tools/test_mcp_invalid_url.py):
- TestValidUrlsAccepted: http, https, IPv6, ports, paths, query strings
- TestInvalidUrlsRejected: every rejection path above + clear error text
- TestErrorIsValueError: downstream code catching ValueError still works
E2E verified: a misconfigured server with `url: not-a-valid-url`
now fails in <0.001s with the clear error, instead of minutes of retries.
Doesn't touch stdio servers (they use `command`, not `url`) — the
validator only fires when `_is_http()` returns True.
Closes#10695. Picks up the still-vulnerable Python pins on current main:
- aiohttp 3.13.3 -> 3.13.4 (messaging, slack, homeassistant, sms extras +
lazy_deps platform.slack) — CVE-2026-34513 (DNS cache exhaustion),
CVE-2026-34518 (cookie/proxy-auth leak on cross-origin redirect, relevant
for the gateway since it handles OAuth tokens), CVE-2026-34519 (response
reason injection), CVE-2026-34520 (null bytes in headers), CVE-2026-34525
(multiple Host headers).
- anthropic 0.86.0 -> 0.87.0 (anthropic extra + lazy_deps provider.anthropic)
— CVE-2026-34450 (memory tool files created mode 0o666),
CVE-2026-34452 (path-traversal in async local-filesystem memory tool).
Not directly exploitable since hermes-agent doesn't use the SDK's
filesystem memory tool, but the SDK is bumped for hygiene.
- cryptography pinned explicitly at 46.0.7 in core dependencies —
CVE-2026-39892 (buffer overflow on non-contiguous buffers). Previously
came in transitively via PyJWT[crypto]; the explicit floor keeps the
WeCom/Weixin crypto paths from drifting below the fix.
curl-cffi from the original issue is no longer in pyproject.toml or uv.lock,
so no action needed there.
uv.lock regenerated cleanly; only aiohttp / anthropic / cryptography moved.
Credit: original issue + scoping by @shaun0927 (#10695, #10701).
Floor analysis and packaging-surface audit by @gnanirahulnutakki (#10784),
adapted to current main's exact-pin style.
Co-authored-by: shaun0927 <shaun0927@users.noreply.github.com>
Co-authored-by: Gnani Rahul Nutakki <gnanirahulnutakki@users.noreply.github.com>
Port three hardening patches from Claude Code 2.1.113's expanded deny
rules to hermes' detect_dangerous_command() pattern list.
1. macOS /private/{etc,var,tmp,home} system paths
/etc, /var, /tmp, /home are symlinks to /private/<name> on macOS.
A write to /private/etc/sudoers works identically to /etc/sudoers
but bypassed the plain /etc/ pattern check. Extracted a shared
_SYSTEM_CONFIG_PATH fragment so /etc/ and the /private/ mirror
stay in sync across redirect / tee / cp / mv / install / sed -i
patterns.
2. killall -9 / -KILL / -SIGKILL / -s KILL / -r <regex>
Parallel to the existing pkill -9 pattern. killall -9 against
non-hermes processes was previously unprotected, and killall -r
can sweep unrelated processes matching a regex.
3. find -execdir rm
Same destructive effect as find -exec rm but ran in each match's
directory. The previous pattern required a literal '-exec ' so
-execdir slipped through.
Guarded by 32 new test cases in 4 test classes:
- TestMacOSPrivateSystemPaths (11 cases)
- TestKillallKillSignals (9 cases)
- TestFindExecdir (4 cases)
- TestEtcPatternsUnaffectedByRefactor (6 regression guards on
the existing /etc/ coverage after the _SYSTEM_CONFIG_PATH refactor)
Inspiration: https://github.com/anthropics/claude-code/releases
(Claude Code 2.1.113, April 17 2026 - "Enhanced deny rules" and
"Dangerous path protection")
Port from openai/codex#17667: MCP servers can now opt-in to parallel
tool execution by setting supports_parallel_tool_calls: true in their
config. This allows tools from the same server to run concurrently
within a single tool-call batch, matching the behavior already available
for built-in tools like web_search and read_file.
Previously all MCP tools were forced sequential because they weren't in
the _PARALLEL_SAFE_TOOLS set. Now _should_parallelize_tool_batch checks
is_mcp_tool_parallel_safe() which looks up the server's config flag.
Config example:
mcp_servers:
docs:
command: "docs-server"
supports_parallel_tool_calls: true
Changes:
- tools/mcp_tool.py: Track parallel-safe servers in _parallel_safe_servers
set, populated during register_mcp_servers(). Add is_mcp_tool_parallel_safe()
public API.
- run_agent.py: Add _is_mcp_tool_parallel_safe() lazy-import wrapper. Update
_should_parallelize_tool_batch() to check MCP tools against server config.
- 11 new tests covering the feature end-to-end.
- Updated MCP docs and config reference.
Subagent delegation hardcoded api_mode='chat_completions' for any
delegation.base_url that didn't match three specific hostnames
(chatgpt.com, api.anthropic.com, api.kimi.com/coding), and never
read delegation.api_mode from config. Azure AI Foundry's
https://foundry.services.ai.azure.com/anthropic endpoint fell through
and got chat_completions, causing 404s on every delegate_task call.
The main agent already handles this correctly via the shared
_detect_api_mode_for_url() helper (anything ending in /anthropic →
anthropic_messages); delegation reimplemented its own narrower check.
Reuse the shared detector and honor an explicit delegation.api_mode
when set so users can also force the transport on non-standard
endpoints the URL heuristic can't classify.
Fixes#10213.
Co-authored-by: HiddenPuppy <HiddenPuppy@users.noreply.github.com>
* feat(x_search): gated X (Twitter) search tool with OAuth-or-API-key auth
Salvages tools/x_search_tool.py from the closed PR #10786 (originally by
@Jaaneek) and reworks its credential resolution so the tool registers
when EITHER xAI credential path is available:
* XAI_API_KEY (paid xAI API key) is set in ~/.hermes/.env or the env, OR
* The user is signed in via xAI Grok OAuth — SuperGrok subscription —
i.e. hermes auth add xai-oauth has been run
Both paths route through xAI's built-in x_search Responses tool at
https://api.x.ai/v1/responses. When both credentials exist OAuth wins,
matching tools/xai_http.py's existing preference order (uses SuperGrok
quota instead of paid API spend).
The check_fn calls resolve_xai_http_credentials() which auto-refreshes
the OAuth access token if it's within the refresh skew window, so a
True return means the bearer is fetchable AND non-empty.
Wiring
- tools/x_search_tool.py — new tool, ~370 LOC. Schema gated by check_fn,
bearer resolved per-call so revoked OAuth surfaces a clean tool_error
rather than an HTTP 401.
- toolsets.py — "x_search" toolset def. NOT added to _HERMES_CORE_TOOLS;
users opt in via hermes tools.
- hermes_cli/tools_config.py — CONFIGURABLE_TOOLSETS entry + TOOL_CATEGORIES
block with two provider options (OAuth + API key) sharing the existing
xai_grok post_setup hook for credential bootstrap.
- hermes_cli/config.py — DEFAULT_CONFIG["x_search"] with model /
timeout_seconds / retries. Additive nested key; no version bump.
- tests/tools/test_x_search_tool.py — 13 tests covering HTTP shape,
handle validation, citation extraction, 4xx/5xx/timeout handling,
and the full credential-resolution matrix (OAuth-only, API-key-only,
both-set, neither-set, resolver-raises, config overrides, registry
registration).
- website/docs/guides/xai-grok-oauth.md — adds X Search to the
direct-to-xAI tools section with off-by-default note.
- website/docs/user-guide/features/tools.md — new row in the tools table.
Off by default — users enable via `hermes tools` → 🐦 X (Twitter) Search.
Schema only appears to the model when xAI credentials are configured.
Co-authored-by: Jaaneek <Jaaneek@users.noreply.github.com>
* docs(x_search): add dedicated feature page + reference entries
- website/docs/user-guide/features/x-search.md (new) — full feature
walkthrough: authentication, enablement, configuration, parameters,
returned fields, example, troubleshooting, see-also links.
- website/docs/reference/tools-reference.md — new "x_search" toolset
section with parameter docs and credential gating note.
- website/docs/reference/toolsets-reference.md — new row in the
toolset catalog table.
- website/sidebars.ts — wires the new feature page under
Media & Web, after web-search.
---------
Co-authored-by: Jaaneek <Jaaneek@users.noreply.github.com>
Adds _sanitize_tool_error() in model_tools and routes both error paths
through it: registry.dispatch's try/except (the primary path for tool
exceptions) and handle_function_call's outer except (defense in depth).
Stripping targets structural framing tokens that the model itself can
react to even though json.dumps already handles wire-layer escaping:
XML role tags (tool_call, function_call, result, response, output,
input, system, assistant, user), CDATA sections, and markdown code
fences. Caps message body at 2000 chars and wraps with [TOOL_ERROR]
prefix.
Defense-in-depth: a tool exception carrying '<tool_call>...' won't
break message framing (json escapes it), but the model still reads
those tokens and they nudge it toward role-confusion framing.
Ported from ironclaw#1639 (one piece of #3838's three-feature scout).
The truncated-tool-call (#1632) and empty-response-recovery (#1677,
#1720) pieces are skipped because main now implements both far more
thoroughly (run_agent.py L8147/L12209/L13012 for truncation retry +
length rewrite; L4500/L15090+ for empty-response scaffolding stripper,
multi-stage nudge, fallback model activation).
Plugins can now replace a built-in tool by passing override=True to
ctx.register_tool(). Without it, the registry rejects any registration
that would shadow an existing tool from a different toolset (unchanged
default behavior).
Unlocks the use case from #11049: drop-in replacement of browser/web
backends without forking core. Composes with the existing pre_tool_call
hook for runtime interception of any implementation.
The override is audit-logged at INFO so it surfaces in agent.log.
Tirith ships no Windows binary, so on every Windows CLI startup users
saw a scary 'tirith security scanner enabled but not available' banner
they could not act on. The banner suggested degraded security; in
reality pattern-matching guards still run and the message was pure noise.
Fix:
- New public is_platform_supported() helper in tools/tirith_security.py
that returns False when _detect_target() doesn't resolve (Windows, any
non-x86_64/aarch64 arch).
- ensure_installed(), _resolve_tirith_path(), and check_command_security()
short-circuit on unsupported platforms: cache _resolved_path =
_INSTALL_FAILED with reason 'unsupported_platform', skip PATH probes,
skip the background download thread, skip the disk failure marker, and
return allow with an empty summary from check_command_security so the
spawn loop never fires.
- Explicit user-configured tirith_path is still honored everywhere (a
user who built tirith themselves under WSL keeps that path).
- CLI banner in cli.py gated on is_platform_supported() — fires only on
platforms where tirith *should* work but isn't installed.
- Docs note tirith's supported-platform list and point Windows users at
WSL.
Tests: tests/tools/test_tirith_security.py +8 tests covering Linux
x86_64, Darwin arm64, Windows, and unknown-arch verdicts plus the
silent ensure_installed / check_command_security / _resolve_tirith_path
fast-paths and the explicit-path override.
test_tirith_security.py 75 passed (8 new + 67 pre-existing)
test_command_guards.py 19 passed
Two log-spam fixes surfaced by a Windows user (Git Bash + Python 3.11.9):
1. LocalEnvironment cwd warn spam
============================
Git Bash's `pwd -P` emits paths like `/c/Users/x`. The base-class
`_extract_cwd_from_output` was assigning this verbatim to `self.cwd`
without validation, then `_resolve_safe_cwd`'s `os.path.isdir(/c/...)`
returned False on Windows, triggering:
LocalEnvironment cwd '/c/Users/NVIDIA' is missing on disk;
falling back to '/' so terminal commands keep working.
...on every terminal call. The pre-existing Windows-path translation
inside `_run_bash` ran AFTER the safe-cwd check, so it could never
prevent the warning.
Fix:
- New `_msys_to_windows_path` helper (idempotent, no-op off Windows).
- `_resolve_safe_cwd` normalizes before `isdir`, so a valid MSYS path
is recognized as the real directory it points at.
- `LocalEnvironment._update_cwd` and a new override of
`_extract_cwd_from_output` translate + validate before mutating
`self.cwd`. Stale / non-existent marker paths roll back to the
previous cwd instead of clobbering it.
- The fallback warning still fires when the directory really is gone
(deletion-recovery scenario from #17558 still covered).
2. tirith spawn-failed warn spam
=============================
When tirith isn't installed (background install in flight, or marked
failed for the day) and the configured path stays as the bare string
`tirith`, every `subprocess.run([tirith_path, ...])` raises OSError
and logged:
tirith spawn failed: [WinError 2] The system cannot find the file specified
...on every command. fail_open=True means behaviour is correct, but
the log noise is severe.
Fix:
- `_warn_once(key, ...)` thread-safe dedupe helper.
- Three hot-path warnings (`tirith path resolved to None`,
`tirith spawn failed: ...`, `tirith timed out after Ns`) now log
once per (exception class, errno) / timeout-value / path-none key.
- Dedupe set is cleared on `_clear_install_failed` so a successful
install lets a subsequent failure surface again.
Tests
=====
- `tests/tools/test_local_env_windows_msys.py`: 12 tests covering the
MSYS→Windows translator, the resolve fast-path, update_cwd validation,
and extract_cwd_from_output rollback.
- `tests/tools/test_tirith_security.py`: 4 new dedupe tests (15 spawn
failures → 1 log line; distinct exc types → 2 lines; timeout dedupe;
path-None dedupe).
Targeted runs:
test_local_env_windows_msys.py 12 passed
test_local_env_cwd_recovery.py 7 passed (pre-existing, no regressions)
test_tirith_security.py 67 passed (63 pre-existing + 4 new)
test_base_environment + local_* 37 passed (no regressions)
test_local_env_blocklist + neighbours 114 passed
Reported via Hermes log capture: 19× cwd warnings + 15× tirith warnings
in a single short session.
On Windows (msvcrt path), _file_lock() first checked if the lock file
existed and wrote it with write_text(), then opened it with open('r+').
Between these two calls, another process could delete the file causing
open('r+') to raise FileNotFoundError — uncaught, leaving memory writes
to proceed without holding the lock, risking data corruption.
Replace the three-line sequence with a single open('a+', ...) call which
atomically creates the file if missing or opens it if it exists, closing
the TOCTOU window entirely. The existing fd.seek(0) before msvcrt.locking()
is preserved and sufficient for correct lock byte positioning.
Root cause: TOCTOU between lock_path.write_text() and open('r+')
Impact: concurrent memory writes on Windows could corrupt MEMORY.md
Pairs with the prior commit (start() now inside the try block). If
threading.Thread.start() itself raises (OS thread exhaustion under
heavy delegation fanout), the finally would call .join() on a
never-started thread, which raises RuntimeError("cannot join thread
before it is started") — trading one rare bug for another.
Thread.ident is None until start() succeeds, so gate the join on it.
_heartbeat_thread.start() was called before the try/finally block that
contains _heartbeat_stop.set(). If _register_subagent() or any code
between .start() and try: raised an exception, the finally block would
never run — leaving the heartbeat thread as an orphan that continues
calling _touch_activity() on the parent agent, incorrectly resetting
gateway timeout counters.
Move _heartbeat_thread.start() to be the first statement inside the
try block so the finally block always reaches _heartbeat_stop.set()
regardless of how the child run completes or fails.
Root cause: heartbeat start outside try/finally scope
Impact: orphan heartbeat thread incorrectly resets parent gateway timeouts
Before: missing node → hard exit; missing browser → FileNotFoundError.
After: both try ensure_dependency() first, which prompts interactively
and delegates installation to install.sh --ensure.
ripgrep and ffmpeg already degrade gracefully (grep fallback, skip
conversion) so they don't need wiring.
Also documents the design rationale in dep_ensure.py: detection and
prompting live in Python (portable, instant, UX-integrated); only
the actual installation delegates to install.sh (1900 lines of
battle-tested OS/package-manager logic).
Wraps every sync->async coroutine-scheduling site in the codebase with a
new agent.async_utils.safe_schedule_threadsafe() helper that closes the
coroutine on scheduling failure (closed loop, shutdown race, etc.)
instead of leaking it as 'coroutine was never awaited' RuntimeWarnings
plus reference leaks.
22 production call sites migrated across the codebase:
- acp_adapter/events.py, acp_adapter/permissions.py
- agent/lsp/manager.py
- cron/scheduler.py (media + text delivery paths)
- gateway/platforms/feishu.py (5 sites, via existing _submit_on_loop helper
which now delegates to safe_schedule_threadsafe)
- gateway/run.py (10 sites: telegram rename, agent:step hook, status
callback, interim+bg-review, clarify send, exec-approval button+text,
temp-bubble cleanup, channel-directory refresh)
- plugins/memory/hindsight, plugins/platforms/google_chat
- tools/browser_supervisor.py (3), browser_cdp_tool.py,
computer_use/cua_backend.py, slash_confirm.py
- tools/environments/modal.py (_AsyncWorker)
- tools/mcp_tool.py (2 + 8 _run_on_mcp_loop callers converted to
factory-style so the coroutine is never constructed on a dead loop)
- tui_gateway/ws.py
Tests: new tests/agent/test_async_utils.py covers helper behavior under
live loop, dead loop, None loop, and scheduling exceptions. Regression
tests added at three PR-original sites (acp events, acp permissions,
mcp loop runner) mirroring contributor's intent.
Live-tested end-to-end:
- Helper stress test: 1500 schedules across live/dead/race scenarios,
zero leaked coroutines
- Race exercised: 5000 schedules with loop killed mid-flight, 100 ok /
4900 None returns, zero leaks
- hermes chat -q with terminal tool call (exercises step_callback bridge)
- MCP probe against failing subprocess servers + factory path
- Real gateway daemon boot + SIGINT shutdown across multiple platform
adapter inits
- WSTransport 100 live + 50 dead-loop writes
- Cron delivery path live + dead loop
Salvages PR #2657 — adopts contributor's intent over a much wider site
list and a single centralized helper instead of inline try/except at
each site. 3 of the original PR's 6 sites no longer exist on main
(environments/patches.py deleted, DingTalk refactored to native async);
the equivalent fix lives in tools/environments/modal.py instead.
Co-authored-by: JithendraNara <jithendranaidunara@gmail.com>
Build on @aydnOktay's cronjob fix by routing the cronjob check through
the shared 'env_var_enabled' helper in utils.py (same truthy set:
1/true/yes/on) and applying the same semantics to the 8 sibling call
sites that read HERMES_INTERACTIVE / HERMES_GATEWAY_SESSION /
HERMES_EXEC_ASK / HERMES_CRON_SESSION with bare os.getenv() truthy
checks:
- tools/approval.py: _is_gateway_approval_context (2), check_command_safety (2),
check_all_command_guards (3) -- 7 sites total
- tools/terminal_tool.py: _handle_sudo_failure, sudo password prompt -- 2 sites
- tools/skills_tool.py: _is_gateway_surface -- 1 site
Without this, a user who exports HERMES_INTERACTIVE=0 in their shell
still gets interactive sudo prompts, approval prompts, and gateway
skill-install paths -- only the cronjob tool was hardened. Now all
consumers agree on the same false-like values.
Also drops the duplicate _is_truthy_env helper from cronjob_tools.py
in favour of the existing canonical utils.env_var_enabled.
Tests: extend the parametrized regression coverage to all three
session env vars (HERMES_INTERACTIVE / HERMES_GATEWAY_SESSION /
HERMES_EXEC_ASK) symmetrically. tests/tools/test_cronjob_tools.py:
60/60 pass; tests/tools/{approval,terminal_tool,skills_tool,
cron_approval_mode,hardline_blocklist}.py: 378/378 pass.
The new resolve_xai_http_credentials() resolver was using os.getenv()
for the XAI_API_KEY/XAI_BASE_URL fallback path, which dropped the
~/.hermes/.env contract guarded by PR #17140 / #17163. Users with
XAI_API_KEY in dotenv only would see "No xAI credentials found" even
though the key was configured.
Separately, _transcribe_xai started consulting creds["base_url"] (which
always returns at least the default https://api.x.ai/v1) ahead of the
public XAI_STT_BASE_URL env override, so the per-tool override stopped
working.
- tools/xai_http.py: add module-level get_env_value() wrapper that
reads ~/.hermes/.env first (via hermes_cli.config.get_env_value),
then os.environ. Resolver uses it for the API-key/base-url fallback.
- tools/transcription_tools.py: restore precedence so XAI_STT_BASE_URL
wins over creds["base_url"].
- tests/tools/test_transcription_dotenv_fallback.py +
tests/tools/test_tts_dotenv_fallback.py: repoint the per-call-site
patches at the new resolution point (tools.xai_http.get_env_value).
The end-to-end regression-guard test (which patches load_env) is
unchanged and still passes.
Adds a new authentication provider that lets SuperGrok subscribers sign
in to Hermes with their xAI account via the standard OAuth 2.0 PKCE
loopback flow, instead of pasting a raw API key from console.x.ai.
Highlights
----------
* OAuth 2.0 PKCE loopback login against accounts.x.ai with discovery,
state/nonce, and a strict CORS-origin allowlist on the callback.
* Authorize URL carries `plan=generic` (required for non-allowlisted
loopback clients) and `referrer=hermes-agent` for best-effort
attribution in xAI's OAuth server logs.
* Token storage in `auth.json` with file-locked atomic writes; JWT
`exp`-based expiry detection with skew; refresh-token rotation
synced both ways between the singleton store and the credential
pool so multi-process / multi-profile setups don't tear each other's
refresh tokens.
* Reactive 401 retry: on a 401 from the xAI Responses API, the agent
refreshes the token, swaps it back into `self.api_key`, and retries
the call once. Guarded against silent account swaps when the active
key was sourced from a different (manual) pool entry.
* Auxiliary tasks (curator, vision, embeddings, etc.) route through a
dedicated xAI Responses-mode auxiliary client instead of falling back
to OpenRouter billing.
* Direct HTTP tools (`tools/xai_http.py`, transcription, TTS, image-gen
plugin) resolve credentials through a unified runtime → singleton →
env-var fallback chain so xai-oauth users get them for free.
* `hermes auth add xai-oauth` and `hermes auth remove xai-oauth N` are
wired through the standard auth-commands surface; remove cleans up
the singleton loopback_pkce entry so it doesn't silently reinstate.
* `hermes model` provider picker shows
"xAI Grok OAuth (SuperGrok Subscription)" and the model-flow falls
back to pool credentials when the singleton is missing.
Hardening
---------
* Discovery and refresh responses validate the returned
`token_endpoint` host against the same `*.x.ai` allowlist as the
authorization endpoint, blocking MITM persistence of a hostile
endpoint.
* Discovery / refresh / token-exchange `response.json()` calls are
wrapped to raise typed `AuthError` on malformed bodies (captive
portals, proxy error pages) instead of leaking JSONDecodeError
tracebacks.
* `prompt_cache_key` is routed through `extra_body` on the codex
transport (sending it as a top-level kwarg trips xAI's SDK with a
TypeError).
* Credential-pool sync-back preserves `active_provider` so refreshing
an OAuth entry doesn't silently flip the active provider out from
under the running agent.
Testing
-------
* New `tests/hermes_cli/test_auth_xai_oauth_provider.py` (~63 tests)
covers JWT expiry, OAuth URL params (plan + referrer), CORS origins,
redirect URI validation, singleton↔pool sync, concurrency races,
refresh error paths, runtime resolution, and malformed-JSON guards.
* Extended `test_credential_pool.py`, `test_codex_transport.py`, and
`test_run_agent_codex_responses.py` cover the pool sync-back,
`extra_body` routing, and 401 reactive refresh paths.
* 165 tests passing on this branch via `scripts/run_tests.sh`.
Wrap requests.post() in create_session() for browser_use, browserbase,
and firecrawl providers with requests.RequestException handling.
Connection timeouts and DNS resolution failures now surface as clean
RuntimeError messages instead of raw requests exception tracebacks.
Browser Use managed-gateway mode preserves raw exception propagation
so the existing idempotency-key retry semantics keep working.
Closes#2746
Co-authored-by: teknium1 <127238744+teknium1@users.noreply.github.com>
Three asyncio.gather() calls in tools/web_tools.py ran without
return_exceptions=True. A single failing task (e.g. LLM rate limit on
one URL) would raise out of gather() and discard every other
successfully fetched/summarized result.
Pass return_exceptions=True and filter BaseException entries with a
warning log before unpacking. Affects:
- chunk summarization gather (large web_extract pages)
- firecrawl per-result LLM post-processing
- tavily crawl per-result LLM post-processing
Closes#2744
Remove redundant inner `import re` and regex recompilation on every call in
_interpolate_env_vars. Add module-level _ENV_VAR_PATTERN compiled once.
Replace the separate _interpolate_value() in mcp_config.py (which used \w+
and would silently fail on env vars containing hyphens or dots) with the
shared _ENV_VAR_PATTERN from mcp_tool.py. Remove now-unused import re.
Cron mutation operations (run/pause/resume/remove) and 'hermes cron edit'
now accept a job name in addition to the hex ID, with case-insensitive
matching. Before this, 'hermes cron run my_job_name' died with
'Job with ID my_job_name not found' and forced the user to look up the
hex ID first.
The original PR matched by name but silently picked the first match when
two jobs shared a name. This version refuses to act on an ambiguous name
and surfaces every matching job (id, name, schedule, next_run_at) so the
caller can pick a specific ID.
- cron/jobs.py:
- get_job() stays ID-only (preserves existing call-site semantics for
web_server/api_server/curator/scheduler/test code that always passes
real IDs).
- resolve_job_ref() is the new name-or-ID resolver, used by pause/
resume/trigger/remove_job. Exact ID match wins over a name match
even if a different job's name happens to equal that ID. Ambiguous
name match raises AmbiguousJobReference with all candidate IDs.
- tools/cronjob_tools.py: dispatch site uses resolve_job_ref, surfaces
ambiguous matches as a structured error with the matching IDs.
- hermes_cli/cron.py: 'cron edit' uses resolve_job_ref so editing by
name works and ambiguous names are reported with IDs.
- tests/cron/test_jobs.py: new TestResolveJobRef covering ID match,
case-insensitive name match, ID-wins-over-name, ambiguous refusal,
and that pause/resume/trigger/remove all refuse on ambiguity.
Closes#2627
When the in-tree FAL path has no API key (and no managed gateway), the
handler used to return a bare 'FAL_KEY environment variable not set'
error. Users had no idea where to get a key, that a managed Nous
gateway exists, or that plugin-registered providers are an option.
Now `image_generate_tool` returns a structured multi-line message:
- signup link (https://fal.ai)
- managed-gateway status (if Nous tools are enabled)
- pointer to `hermes tools` / `hermes plugins list` for alternate
backends, so users on a stale `image_gen.provider` know where to look
The schema is untouched — `check_fn` still gates the tool out of the
schema when no backend is reachable at startup, consistent with every
other conditional tool. This patch fixes the call-time failure modes:
managed-gateway 5xx, plugin provider disappearing mid-session, etc.
Inspired by #2546 / @Mibayy. The PR was ~5700 commits stale against
the new plugin-aware image_gen architecture, so this is a forward port
of the actionable-error idea rather than a cherry-pick.
Closes#2543
Co-authored-by: Mibayy <mibayy@users.noreply.github.com>
Adds Hugging Face's official skill catalog to the default GitHub taps and
classifies it as a trusted source alongside openai/skills and anthropics/skills.
- tools/skills_guard.py: huggingface/skills -> TRUSTED_REPOS
- tools/skills_hub.py: GitHubSource.DEFAULT_TAPS += huggingface/skills (skills/)
- website/docs: list it under default taps + trusted-source examples
Closes#2549.
Co-authored-by: teknium1 <127238744+teknium1@users.noreply.github.com>
Discord's CDN serves attachments with Content-Encoding: br. aiohttp's
compression_utils tries 'import brotlicffi as brotli' first and falls back
to google's Brotli, but Brotli<1.2.0's Decompressor.process() is 1-arg
while aiohttp calls it with 2 args (data, max_length). Result: every
.txt/.md/.doc uploaded to a Discord-gateway session fails to decode at
att.read() with 'Can not decode content-encoding: br' / 'TypeError:
process() takes exactly 1 argument (2 given)', the agent never sees the
bytes, and falls back to filesystem guessing.
Pin brotlicffi==1.2.0.1 in both surfaces:
- tools/lazy_deps.py 'platform.discord' tuple: Discord users on the
lazy-install path get it on first discord.py import.
- pyproject.toml [messaging] extra: users who explicitly install
hermes-agent[messaging] (skipping the lazy path) get it eagerly.
brotlicffi wins aiohttp's import race regardless of what else is
installed (try brotlicffi / except: import brotli), so existing setups
that already pulled google's Brotli transitively don't change behavior
beyond the bug fix. ~1.5 MB wheel, manylinux/macOS/Windows coverage.
E2E verified: round-trip decode of Brotli-compressed payload via
aiohttp.compression_utils.brotli succeeds with brotlicffi pinned; same
test against Brotli==1.1.0 alone reproduces the reported TypeError.
Credit to @Korkyzer for the original diagnosis and fix shape in #15744;
the lazy-deps gating layer was added on top to keep brotlicffi out of
the install path for users who don't run a Discord gateway.
Fixes#12511.
Closes#15744.
Co-authored-by: Korky <korkyzer@gmail.com>
Follow-up to the sandbox-bypass env-var fix:
- Update the opt-out gate so a user-provided AGENT_BROWSER_ARGS is also
respected, not just the legacy AGENT_BROWSER_CHROME_FLAGS. Previously
the gate only checked the broken legacy var, so a user who pre-set
AGENT_BROWSER_ARGS would still get clobbered by Hermes's auto-injection.
- Document AGENT_BROWSER_ARGS in .env.example, the browser feature page,
and the env var reference, with notes about the auto-injection on
AppArmor-restricted systems (Ubuntu 23.10+, DGX Spark, containers).
- Add Anadi Jaggia to AUTHOR_MAP.
AGENT_BROWSER_CHROME_FLAGS is not read by agent-browser CLI.
The correct env var is AGENT_BROWSER_ARGS, with comma-separated values.
This fixes Chrome 'No usable sandbox' crash on Ubuntu 23.10+ systems
where AppArmor restricts unprivileged user namespaces. The detection
logic was correct but the fix used the wrong environment variable name
and space-separated instead of comma-separated args.
Pre-existing diagnostics below an edit point used to surface as 'LSP
diagnostics introduced by this edit' whenever the edit deleted or
inserted lines. The delta-filter key included the diagnostic's
range, so the same logical error reported at a different line in
the post-edit snapshot looked like a brand new diagnostic.
Concrete case: deleting 14 lines in cli.py caused Pyright errors at
lines 9873, 10590, 12413, 13004 (unrelated to the edit) to be
reported as introduced by it.
Fix: build a piecewise-linear line-shift map (via difflib's
SequenceMatcher) from pre and post content, and remap baseline
diagnostics into post-edit coordinates before the set-difference.
Diagnostics in deleted regions drop out cleanly; diagnostics below
the edit shift by the right amount; diagnostics above are untouched.
The strict (range-aware) equality key stays — so a genuinely new
instance of an identical error class at a different line still
surfaces as new.
Pieces:
- agent/lsp/range_shift.py — build_line_shift, shift_diagnostic_range,
shift_baseline. Pure functions, no LSP state.
- agent/lsp/manager.py — LSPService.get_diagnostics_sync gains an
optional line_shift kwarg; baseline is shift_baseline'd before
computing the seen-set. _diag_key keeps the strict range key.
- tools/file_operations.py — write_file captures pre_content for any
LSP-handled extension (not just LINTERS_INPROC) and passes pre/post
to _maybe_lsp_diagnostics, which builds the shift map.
- New _lsp_handles_extension helper guards the pre_content read.
Trade-offs preserved:
- Genuinely new same-class errors at different lines still surface
(content-only key would have swallowed them).
- Pre-existing errors at unshifted positions still get filtered
(covered by the strict-key path with no shift).
- Best-effort: when pre_content can't be captured (file didn't
exist, permissions), the unshifted comparison still catches
most pre-existing errors; the edge case it misses is a new file
with a non-empty baseline, which is structurally impossible.
Pyproject's [all] extra was slimmed down in May 2026 — ~20 optional
backends moved to tools/lazy_deps.py and only install on first use.
hermes update runs uv pip install -e .[all] which doesn't touch any of
them, so pin bumps in LAZY_DEPS (CVE response, transitive fixes) were
silently ignored on already-activated backends.
Two changes:
1. _is_satisfied() now parses the spec and checks the installed version
against the constraint via packaging.specifiers. Previously it
returned True the moment the package name was importable, which made
ensure() a name-presence gate rather than a version-pin gate.
2. New active_features() / refresh_active_features() pair: lists every
feature with at least one of its packages currently installed, then
re-runs ensure() on each. Refresh is invoked at the end of
_cmd_update_impl, right after the [all] install completes. Cold
backends (never activated) stay quiet — no churn for them.
Output during update is one summary block:
→ Refreshing 4 active lazy backend(s)...
↑ 1 refreshed: provider.anthropic
✓ 3 already current
or
⚠ memory.honcho failed to refresh: <pip stderr>
Failures never raise out of update — backends keep their previously-
installed version and we tell the user to rerun once upstream is fixed.
security.allow_lazy_installs=false is honored: features get marked
"skipped" with the reason shown.
Tests: 18 new unit tests covering version-aware satisfaction (exact pin,
range, extras blocks, missing package, malformed spec), active feature
discovery, and refresh status reporting. All 61 lazy_deps tests pass.
The _foreground_background_guidance() function matched background-wrapper
keywords (nohup/disown/setsid) anywhere in the command text, including
inside quoted strings, Python -c code, commit messages, and PR body text.
Two-layer fix:
1. Strip single-quoted, double-quoted, and backtick-quoted content before
pattern matching via _strip_quotes() helper.
2. Tighten the regex to only match keywords at command-start positions
(after ^, ;, &, &&, ||, or $() — not mid-argument.
Both layers are needed: quote stripping handles the common case of keywords
in string literals, and the position-aware regex handles unquoted cases
like 'export FOO=setsid' (word boundary match, wrong position).
Fixes#20064
Cross-provider delegation (e.g. MiniMax parent → DeepSeek child) must not
inherit the parent's api_mode, because each provider uses a different API
surface: MiniMax uses 'anthropic_messages' while DeepSeek uses
'chat_completions'. Inheriting the wrong mode causes 404 errors.
When the effective provider differs from the parent's provider, derive
api_mode from the target provider's defaults instead (None triggers
re-derivation).
Refs: Bug #20558, PR #20563
Surfaced by local E2E behavior-parity testing of PR vs origin/main: the
plugin-migrated dispatchers were quietly changing the error envelope
shape returned to function-calling models on unconfigured systems.
Two findings, both from per-result error wrapping bleeding into the
pre-flight configuration error path:
1. **search**: ``firecrawl.search()`` caught the
``ValueError("Web tools are not configured...")`` from
``_get_firecrawl_client()`` and returned it as
``{"success": False, "error": ...}``, losing the legacy
``{"error": "Error searching web: ..."}`` envelope that
``tool_error()`` emits on main. Models that special-case the
``error`` key still detect the failure, but the prefix is part of
the legacy contract some users rely on.
2. **crawl**: ``firecrawl.crawl()`` caught the same pre-flight
``ValueError`` and wrapped it as a per-page error inside
``results[0]``. Main short-circuits on ``check_firecrawl_api_key()``
BEFORE dispatching, so its unconfigured response is
``{"success": False, "error": "web_crawl requires Firecrawl..."}``
at the top level. The PR's per-page burying hid the failure inside
``results[]`` where models that check ``result.get("error")`` would
miss it.
Fix:
- ``plugins/web/firecrawl/provider.py``: pull
``_get_firecrawl_client()`` outside the broad ``try`` in
``search()``. Pre-flight ``ValueError`` / ``ImportError`` propagate
to the dispatcher's top-level exception handler. In-flight SDK
errors still get wrapped as ``{"success": False, ...}``.
- ``tools/web_tools.py``: mirror main's upstream availability gate in
``web_crawl_tool``. When the resolved crawl provider is
``is_available()==False``, short-circuit BEFORE dispatching with the
same top-level error shape main emits.
- ``tests/tools/test_web_providers.py``: 2 regression tests
(``TestUnconfiguredErrorEnvelopeParity``) lock in the behavior so
future plugin work can't undo this.
Verified via local subprocess-based parity test (14/14 scenarios match
origin/main shape exactly) and full 210/210 web test suite green.
The web-provider migration originally left firecrawl crawl as the only
provider-specific code remaining inline in tools/web_tools.py (~250
lines of Firecrawl-specific crawl orchestration that didn't fit the
plugin's existing surface). This commit closes that gap.
What this adds
--------------
1. plugins/web/firecrawl/provider.py: implement async ``crawl(url, **kwargs)``
- Accepts the same kwargs as the dispatcher passes to any crawl
provider (``instructions``, ``depth``, ``limit``); Firecrawl's
/crawl endpoint ignores ``instructions`` and ``depth`` so we log
and drop with a clear info message.
- Wraps the sync SDK ``crawl()`` call in asyncio.to_thread so the
gateway event loop isn't blocked on a multi-page crawl.
- Preserves the response-shape normalization across pydantic /
typed-object / dict variants that the legacy inline code did.
- Preserves per-page website-policy re-check (catches blocked
redirects after the SDK returns).
- Returns the same {"results": [...]} shape so the dispatcher's
shared LLM-summarization post-processing path works unchanged.
- Sets supports_crawl() to True so the dispatcher routes through
the plugin instead of the legacy fallthrough.
2. tools/web_tools.py: delete the entire legacy firecrawl crawl block
that used to run after "No registered provider supports crawl" —
~270 lines including:
- check_firecrawl_api_key gate + typed error
- inline SSRF + website-policy seed-URL gate (dispatcher already
does this)
- Firecrawl client setup with crawl_params
- 100+ lines of pydantic/dict/typed-object normalization
- Per-page LLM-processing loop (kept in the dispatcher's shared
post-processing path; that's where it always belonged)
- trimming + base64 image cleanup (still done in the dispatcher's
shared path)
Replaced with a single typed-error branch when no crawl-capable
provider is available: "web_crawl has no available backend. Set
FIRECRAWL_API_KEY (or FIRECRAWL_API_URL for self-hosted), or set
TAVILY_API_KEY for Tavily."
Test updates
------------
- tests/tools/test_website_policy.py:
- test_web_crawl_short_circuits_blocked_url: dispatcher seed-URL
gate still runs on web_tools.check_website_access (no change to
that patch), but the firecrawl client lockdown moved to the
plugin module — patch firecrawl_provider._get_firecrawl_client
instead of web_tools._get_firecrawl_client. The dispatcher
short-circuits before the plugin runs, so the test still passes.
- test_web_crawl_blocks_redirected_final_url: patch the per-page
policy gate at plugins.web.firecrawl.provider.check_website_access
(where it now runs) AND on web_tools (where the seed-URL gate
still runs). Patch firecrawl_provider._get_firecrawl_client for
the FakeCrawlClient injection. Both checks flow through the same
fake_check function.
- tests/plugins/web/test_web_search_provider_plugins.py:
- Update parametrized capability-flag spec: firecrawl supports_crawl
is now True.
- Add test_firecrawl_crawl_returns_error_dict_when_unconfigured —
verifies inspect.iscoroutinefunction(p.crawl) is True and that
the async crawl returns a per-page error dict (not a raise) when
FIRECRAWL_API_KEY is missing.
Verified
--------
- 218/218 web tests pass (was 173, +44 plugin tests + 1 new firecrawl
crawl test from this commit = 218 with the test deduplication).
- Compile-clean (py_compile passes on both files).
- Provider capabilities matrix confirmed end-to-end:
name search extract crawl async-extract? async-crawl?
firecrawl True True True True True
tavily True True True False False
Both crawl-capable providers exercise the dispatcher's
inspect.iscoroutinefunction async-or-sync detection.
Net diff
--------
- tools/web_tools.py: -254 lines (legacy inline crawl gone)
- plugins/web/firecrawl/provider.py: +185 lines (crawl method)
- test_website_policy.py: +14/-9 lines (patch locations)
- test_web_search_provider_plugins.py: +22/-1 lines (capability flag
+ new firecrawl crawl test)
- Total: -32 net LoC; tools/web_tools.py is now 1509 lines (was 1763
before this commit, 2227 before the migration started).
Removes the legacy in-tree provider scaffolding that PR #25182 fully
replaced with the plugin architecture:
tools/web_providers/__init__.py (6 lines)
tools/web_providers/base.py (89 lines — old ABCs)
tools/web_providers/ARCHITECTURE.md (73 lines — old design doc)
These were the staging-ground ABCs and provider modules that the
plugin migration absorbed. All seven web providers now implement the
single :class:`agent.web_search_provider.WebSearchProvider` ABC and
live under ``plugins/web/<vendor>/``. Nothing else in the tree imports
``tools.web_providers`` — verified via grep before deletion.
Test migration (tests/tools/test_web_providers.py)
--------------------------------------------------
Rewrote ``TestWebProviderABCs`` to test the new unified ABC at
:mod:`agent.web_search_provider`:
- test_cannot_instantiate_abc_directly — abstract ``name`` + ``is_available``
- test_concrete_search_only_provider_works — exercise default
``supports_extract=False`` / ``supports_crawl=False`` flags
- test_concrete_multi_capability_provider_works — exercise all three
capabilities, async extract supported (declared sync here for
simplicity; real plugins like parallel + firecrawl use async)
- test_search_only_provider_skips_extract_and_crawl — verify
``supports_*()`` flags default to False so search-only providers
don't have to implement extract() or crawl()
The 9 other tests in the file (per-capability backend selection,
DEFAULT_CONFIG merge, dispatcher routing) test public helpers in
``tools.web_tools`` that still exist and pass unchanged.
agent/web_search_provider.py docstring updated to reflect that the
legacy ABCs no longer exist; the response-shape contract is preserved
bit-for-bit so external consumers see no behavioral change.
Net diff
--------
- tools/web_providers/ removed (-168 lines)
- tests/tools/test_web_providers.py rewritten ABC section (+78/-30 net,
same coverage, new API)
- agent/web_search_provider.py docstring (-3/+5 lines)
Verified
--------
- 173/173 targeted web tests pass
- 12/12 ABC contract tests pass with the new interface
- No remaining grep hits for ``tools.web_providers`` outside of
intentional historical references in plugin docstrings.
Removes ~580 lines of dead code from tools/web_tools.py that were
superseded by the plugin migration but kept around in the cutover commit
to keep the diff focused. Replaces them with thin re-export shims so
existing tests and external callers that reach for the legacy
``tools.web_tools.<name>`` paths continue to work transparently.
Deleted from tools/web_tools.py
--------------------------------
- Lazy Firecrawl SDK proxy (_load_firecrawl_cls, _FirecrawlProxy,
_FIRECRAWL_CLS_CACHE, the Firecrawl singleton)
- Firecrawl client section (_get_direct_firecrawl_config,
_get_firecrawl_gateway_url, _is_tool_gateway_ready,
_has_direct_firecrawl_config, _raise_web_backend_configuration_error,
_firecrawl_backend_help_suffix, _get_firecrawl_client)
- Parallel client section (_get_parallel_client,
_get_async_parallel_client, _parallel_client, _async_parallel_client)
- Tavily client section (_TAVILY_BASE_URL, _tavily_request,
_normalize_tavily_search_results, _normalize_tavily_documents)
- Generic SDK normalizers (_to_plain_object, _normalize_result_list,
_extract_web_search_results, _extract_scrape_payload)
- Exa client section (_get_exa_client, _exa_client, _exa_search,
_exa_extract)
- Parallel helpers (_parallel_search, _parallel_extract)
- Duplicate inline check_firecrawl_api_key
Net: tools/web_tools.py drops from 2227 → 1613 lines (-614 lines).
Re-exports added at top of tools/web_tools.py
---------------------------------------------
- From plugins.web.firecrawl.provider:
Firecrawl, _FirecrawlProxy, _FIRECRAWL_CLS_CACHE, _load_firecrawl_cls,
_get_direct_firecrawl_config, _get_firecrawl_gateway_url,
_is_tool_gateway_ready, _has_direct_firecrawl_config,
_firecrawl_backend_help_suffix, _raise_web_backend_configuration_error,
_get_firecrawl_client, _to_plain_object, _normalize_result_list,
_extract_web_search_results, _extract_scrape_payload,
check_firecrawl_api_key
- From plugins.web.tavily.provider:
_tavily_request, _normalize_tavily_search_results,
_normalize_tavily_documents
- From plugins.web.parallel.provider:
_get_parallel_client, _get_async_parallel_client
- From plugins.web.exa.provider:
_get_exa_client
Plus retained module-level imports for backward-compat with tests:
- httpx (tests patch tools.web_tools.httpx for tavily request mocking)
- build_vendor_gateway_url, _read_nous_access_token,
resolve_managed_tool_gateway, managed_nous_tools_enabled,
prefers_gateway (tests patch tools.web_tools.<name>)
Plugin indirection pattern (key technique)
------------------------------------------
For functions inside the firecrawl/parallel/exa plugins to honor
unit-test patches that target ``tools.web_tools.<name>``, the plugin
implementations now do ``import tools.web_tools as _wt`` at call time
and read helper names through that module (``_wt._read_nous_access_token``,
``_wt.Firecrawl``, ``_wt.prefers_gateway``, etc.). This makes the
existing test patches transparently reach the plugin code without any
test changes.
The cached client globals (_firecrawl_client, _firecrawl_client_config,
_parallel_client, _async_parallel_client, _exa_client) also now live on
tools.web_tools so existing test setup_method handlers that reset
``tools.web_tools._<vendor>_client = None`` between cases keep working.
The plugins read/write the cache via getattr/setattr on the web_tools
module.
Verified
--------
- 173/173 targeted web tests pass:
test_web_providers.py, test_web_providers_brave_free.py,
test_web_providers_ddgs.py, test_web_providers_searxng.py,
test_web_tools_config.py, test_web_tools_tavily.py,
test_website_policy.py, test_config_null_guard.py
- Compile-clean (py_compile.compile passes)
- All inline implementations now exist in exactly one place
(plugins.web.<vendor>.provider)
Follow-up clean-up
------------------
- Drop _WEB_PLUGIN_SKIPLIST + hardcoded TOOL_CATEGORIES["web"] rows
(next commit)
- Delete tools/web_providers/ directory entirely
- Add tests/plugins/web/ coverage
- Full tests/tools/ + tests/gateway/ regression sweep before promoting PR