Four kanban dashboard test failures, all from PR salvages that picked up
the test additions but dropped the corresponding implementations.
- BOARD_COLUMNS: add 'review' (status added by PR f55d94a1e but the
board API never grew the column → test_board_empty failed because
VALID_STATUSES - {archived} mismatched the rendered columns).
- update_task: enrich the 'ready' 409 detail with the blocking parent
list (id, title, status) and add _parents_blocking_ready helper.
Implementation lost in the #26744 salvage (commit e215558ba) which
pinned the test but not the server-side code.
- dist/index.js: add parseApiErrorMessage helper, wire it through the
drag/drop banner, add patchErr state to the TaskDrawer and surface
it inline by the action row. Lost in the same #26744 salvage.
- test_diagnostics_endpoint_severity_filter: update to at-or-above
semantics (PR a94ddd807 changed the filter from exact-match so the
warning filter now correctly includes error+critical too).
Salvages #28125 by @Jpalmer95. Adds:
- Drag-to-delete trash zone in the kanban dashboard
- Bulk delete endpoint with cascading delete_task cleanup
- Frontend updates (drag visual + drop handler)
- Confirmation prompt before delete
Resolved end-of-file test conflict by appending both halves.
Salvages #24533 by @roycepersonalassistant. Adds a first-class
'scheduled' Kanban status for time-delay follow-ups that aren't
waiting on human input.
- hermes kanban schedule <task_id> [reason] CLI command
- Dashboard/API transitions to/from Scheduled
- unblock_task() now releases both 'blocked' AND 'scheduled' tasks
(re-checking parent dependencies before moving to ready/todo)
- i18n + docs updates
Resolved conflicts: kept HEAD's failure-counter reset on unblock
alongside the PR's scheduled state, kept HEAD's 'running' direct-set
rejection, combined both bulk-status branches. Dropped the dist/
bundle changes (months-stale; would need rebuild from source).
Salvages #26745 by @nehaaprasaad. Exposes filtering for the existing
workflow_template_id and current_step_key columns:
- list_tasks() accepts workflow_template_id and current_step_key kwargs
- 'hermes kanban list' adds matching CLI flags
- dashboard plugin_api also exposes the filters
Resolved a small conflict in list_tasks signature alongside main's
session_id and order_by additions; combined all three into the single
filter list.
Adds three read-only endpoints to the kanban dashboard plugin so the
SwitchUI workspace (and any other dashboard consumer) can track
workers across tasks without N+1 round-trips through /tasks/{task_id}.
- GET /workers/active
Single SQL JOIN of task_runs + tasks where ended_at IS NULL,
worker_pid IS NOT NULL, status='running'. Returns
{workers: [...], count, checked_at}.
- GET /runs/{run_id}
Direct lookup of any task_run row by id. Reuses existing
kanban_db.get_run() helper and _run_dict() serialiser. 404 when
not found. Mirrors GET /tasks/{task_id} 404 shape.
- GET /runs/{run_id}/inspect
Live PID stats via psutil.Process.as_dict() — cpu_percent,
memory_rss_bytes, memory_vms_bytes, num_threads, num_fds, status,
create_time, cmdline. Short-circuits with alive:false when run
has ended, has no worker_pid, the pid is gone, or psutil is
unavailable. AccessDenied surfaces as alive:true with error
rather than a 500.
11 new tests in tests/plugins/test_kanban_worker_runs.py cover the
empty-board case, running-task case, ended-run filtering,
missing-pid filtering, 404 paths, already-ended inspect, no-pid
inspect, dead-pid inspect, and live-pid inspect (psutil mocked).
All pass.
Companion termination endpoint (POST /runs/{run_id}/terminate) is
intentionally out of scope here — opening a separate issue first
since the RBAC and dispatcher-mediated soft-cancel design needs
maintainer input before code.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Salvages #26431 by @LeonSGP43. Dashboard plugin_api list_diagnostics
was using exact-match (severity == filter), so '--severity warning'
hid 'error' and 'critical' diagnostics. Adds severity_at_or_above()
helper to kanban_diagnostics and uses it in the dashboard endpoint
(CLI already used SEVERITY_ORDER comparison correctly).
Salvages #24050 by @kronexoi. The single-task PATCH already rejects
direct status='running' since it bypasses the dispatcher/claim invariant,
but the bulk-update endpoint still accepted it. Aligns bulk with single
by emitting an error result row for any 'running' entry.
The checkbox label echoed its state ("Auto (default)" / "Manual") instead
of describing the action, so a checked box reading "Auto" parsed as a
status indicator rather than a control. The accompanying sub-description
was also static and started with "When on, ...", which read awkwardly
when the box was unchecked.
Replace the dynamic label with a static action label
("Auto-decompose triage tasks") and flip the sub-description between the
two modes so it stays accurate either way. The top-of-page Orchestration
pill is unchanged — that one is intentionally a status badge / toggle.
Fixes#28178
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
When the kanban auto-decomposer fans a triage task into child tasks,
recompute_ready() immediately promotes parent-free children to 'ready'
so the dispatcher picks them up. Some users want a manual workflow
where children stay in 'todo' for review before dispatch.
Add 'kanban.auto_promote_children' config key (default: true):
- false: children stay in 'todo' after decomposition
- true: existing behavior (auto-promote to 'ready')
Changes:
- kanban_db.py: decompose_triage_task() gains auto_promote param
- kanban_decompose.py: reads auto_promote_children from config
- kanban dashboard API: exposes the new setting in GET/PUT /orchestration
Closes#28016
Switch .hermes-kanban-columns from auto-fit CSS grid to a flex row with
overflow-x: auto and a hidden scrollbar (scrollbar-width / ::-webkit-
scrollbar), and pin .hermes-kanban-column to flex: 0 0 280px so columns
sit side-by-side at a fixed width instead of wrapping into a 2xN grid.
Page vertical scroll is unaffected: each column already caps at
max-height: calc(100vh - 220px), so the container never grows tall
enough to introduce its own vertical scrollbar.
The dashboard SDK's <Select> is a shadcn-style popup that fires
onValueChange(value), not native onChange({target:{value}}). The file
even has a selectChangeHandler() helper at L213 documenting this:
"Older plugin code calls onChange({target:{value}}) which silently
never fires."
#24547 already fixed the bulk-reassign, workspace-kind, and new-task
parent selects. This patch covers the two OrchestrationPanel selects
introduced later in #27572 that regressed onto the same broken pattern:
- OrchestrationPanel orchestrator_profile picker
- OrchestrationPanel default_assignee picker
Users opened the popup, picked an option, and the popup closed without
firing a PUT to /orchestration — so the orchestrator profile and
default assignee dropdowns appeared totally inert.
Uses the same selectChangeHandler helper as the other working Selects
in the file for consistency.
Reported by Exaario.
SDK Select fires onValueChange(value) not onChange({target:{value}}), so
all three bare onChange handlers silently received undefined from e.target.
Replace raw onChange with selectChangeHandler() — the existing helper that
wires both onValueChange and a guarded onChange — so selections register
regardless of which event the SDK Select dispatches.
Closes#24520
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds a 'triage_aux_unavailable' diagnostic for tasks stuck in triage when
neither the active aux helper slot nor the main-model auto fallback is usable.
Auto-decompose aware:
- kanban.auto_decompose=True (default): primary is auxiliary.kanban_decomposer,
triage_specifier is the fanout=false fallback.
- kanban.auto_decompose=False: primary is auxiliary.triage_specifier (manual
'hermes kanban specify' path).
Default aux slots use 'provider: auto' which falls back to the main model, so
this rule only fires when both the explicit slot config AND the main-model
auto fallback are absent. Quiet by default; informative when there is a real
config gap.
Also adds kd.config_from_runtime_config() that carries kanban + auxiliary +
model keys through to diagnostics, and updates CLI/dashboard call sites to
use it. config_from_kanban_config() is preserved for back-compat.
Reworks the original PR #25640 idea (@qWaitCrypto) to align with the new
auto-decompose dispatcher path landed in #27572. The original PR pointed only
at auxiliary.triage_specifier, which is now the fallback rather than the
primary helper.
Co-authored-by: qWaitCrypto <axmaiqiu@gmail.com>
* feat(kanban): orchestrator-driven auto-decomposition on triage
Closes the core gap in the kanban system: dropping a one-liner into Triage
now decomposes it into a graph of child tasks routed to specialist
profiles by description, matching teknium's original vision ("main
orchestrator splits/creates actual tasks, doles them out to each agent").
The build
---------
- hermes_cli/profiles.py: new `description` + `description_auto` fields
on ProfileInfo, persisted in <profile_dir>/profile.yaml. Helpers
read_profile_meta / write_profile_meta. `create_profile` accepts
optional description.
- hermes_cli/profile_describer.py: new module — auto-generate a 1-2
sentence description from a profile's skills + model + name via the
auxiliary LLM (`auxiliary.profile_describer`).
- hermes_cli/main.py: new `hermes profile create --description ...`
flag; new `hermes profile describe [name] [--text ... | --auto |
--all --auto]` subcommand.
- hermes_cli/kanban_db.py: new `decompose_triage_task` atomic helper —
creates N child tasks, links the root as a child of every leaf
(root waits for the whole graph), flips root `triage -> todo` with
orchestrator assignee, records an audit comment + `decomposed` event
in a single write_txn.
- hermes_cli/kanban_decompose.py: new module — calls the auxiliary LLM
(`auxiliary.kanban_decomposer`) with the profile roster + descriptions
to produce a JSON task graph, then invokes the DB helper. Rewrites
unknown assignees to the configured `kanban.default_assignee` (or
the active default profile) so a task NEVER lands with assignee=None.
Falls back to specify-style single-task promotion when the LLM
returns `fanout: false`.
- hermes_cli/kanban.py: new `hermes kanban decompose [task_id | --all]`
CLI verb.
- hermes_cli/config.py: new DEFAULT_CONFIG keys —
kanban.orchestrator_profile, kanban.default_assignee,
kanban.auto_decompose (default True), kanban.auto_decompose_per_tick
(default 3), auxiliary.kanban_decomposer, auxiliary.profile_describer.
- gateway/run.py: kanban dispatcher watcher now runs auto-decompose
before each `_tick_once`, capped by `auto_decompose_per_tick` so a
bulk-load of triage tasks doesn't burst-spend the aux LLM.
- plugins/kanban/dashboard/plugin_api.py: new endpoints —
GET /profiles (list roster + descriptions),
PATCH /profiles/<name> (set description, user-authored),
POST /profiles/<name>/describe-auto (LLM-generate),
POST /tasks/<id>/decompose (run decomposer),
GET/PUT /orchestration (orchestrator/default-assignee/auto-decompose
pickers, with resolved fallbacks echoed back).
- plugins/kanban/dashboard/dist/index.js: new OrchestrationPanel
collapsible — dropdowns for orchestrator profile and default
assignee, auto-decompose toggle, per-profile description editor with
Save and Auto-generate buttons. New ⚗ Decompose button next to
✨ Specify on triage-column task drawers.
Behavior
--------
- A task in Triage gets fanned out into a small DAG of child tasks.
Children with no internal parents flip to `ready` immediately
(parallel dispatch). Children with sibling parents wait. The root
stays alive as a parent of every child — when the whole graph
finishes, it promotes to `ready` and the orchestrator profile wakes
back up to judge completion (the "adds more tasks until done" part
of the original vision).
- `kanban.orchestrator_profile` unset -> falls back to the default
profile (whichever `hermes` launches with no -p flag).
- `kanban.default_assignee` unset -> same fallback. Tasks NEVER end
up unassigned.
- `kanban.auto_decompose=true` (default) runs the decomposer
automatically on dispatcher ticks; manual `hermes kanban decompose`
is always available.
Tests
-----
- tests/hermes_cli/test_kanban_decompose_db.py — 7 tests for the
atomic DB helper (status transitions, dep graph, audit trail,
validation errors).
- tests/hermes_cli/test_kanban_decompose.py — 6 tests for the
decomposer module (fanout, no-fanout fallback, unknown-assignee
rewrite, malformed-JSON resilience, no-aux-client path).
- tests/hermes_cli/test_profile_describer.py — 10 tests for
profile.yaml r/w + the LLM auto-describer (yaml corrupt tolerance,
user-vs-auto description protection, --overwrite, fallback parsing).
E2E
---
- CLI end-to-end: created profiles with descriptions, dropped a triage
task, mocked the aux LLM with a 3-task graph -> verified all three
children were created with the right assignees, the dependency
edges matched the LLM's graph, root flipped to todo gated by every
child, audit comment + `decomposed` event recorded.
- Dashboard end-to-end: started the dashboard against an isolated
HERMES_HOME, verified all four new endpoints via curl (profile
listing, PATCH for description, PUT for orchestration settings,
POST for decompose). Opened the UI in the browser, confirmed the
OrchestrationPanel renders with all three pickers + the per-profile
description editor, typed a description, clicked Save, verified
~/.hermes/profile.yaml was written. Clicked Decompose on the triage
card and confirmed the inline error message surfaced as designed
("no auxiliary client configured").
* feat(kanban): surface decompose mode (Auto/Manual) as a one-click pill
The auto/manual toggle already existed as kanban.auto_decompose (default
true), but it was buried inside the collapsed Orchestration settings
panel — users couldn't tell at a glance which mode they were in. This
hoists it to a pill at the top of the kanban page so the state is always
visible and one click flips it.
UX
- New "⚗ Decompose: AUTO|MANUAL" pill in the kanban header. Emerald
styling when Auto is on (the default), muted/gray when Manual.
- Pill is visible both in the collapsed AND expanded Orchestration
settings views so context is preserved when the user opens the panel.
- Tooltip explains both states + what clicking does.
- Renamed the in-panel "Auto-decompose on triage / Enabled" checkbox
to "Decompose mode / Auto (default) | Manual" for language parity
with the pill.
Behavior preserved
- Default remains Auto (kanban.auto_decompose=true).
- Manual mode restores pre-PR behavior: triage tasks stay in triage
until the user clicks ⚗ Decompose on each card (or runs
`hermes kanban decompose <id>`).
Implementation
- plugins/kanban/dashboard/dist/index.js: load /orchestration on mount
(not just on expand) so the collapsed pill reflects real state.
Render mode pill in both collapsed and expanded headers. Reuses the
existing PUT /api/plugins/kanban/orchestration endpoint — no new
backend, no new tests required.
E2E verified
- Pill renders as "⚗ Decompose: AUTO" on page load (default).
- One click flips to "⚗ Decompose: MANUAL" with muted styling.
- config.yaml on disk shows auto_decompose: false after the flip.
- Second click round-trips back to Auto; config.yaml flips to true.
* feat(kanban): rename mode pill to "Orchestration: Auto/Manual"
Per Teknium feedback — "Decompose" was too implementation-specific.
"Orchestration" is the user-facing concept (the whole pitch is the
orchestrator profile routing work), and the pill is the front door to it.
- Pill text: "Orchestration: Auto" / "Orchestration: Manual" (title case,
no ⚗ prefix, no SHOUTY-CAPS for the mode value)
- In-panel checkbox label: "Orchestration mode" (was "Decompose mode")
- Tooltips updated to match
- No behavior change
* docs(kanban): document decompose, profile descriptions, orchestration mode
Brings the docs site up to parity with the PR. English build verified
locally (npx docusaurus build --locale en) — clean, no new broken links
or anchors. Pre-existing broken-link warnings (rl-training, llms.txt,
step-by-step-checklist, fallback-model) untouched.
- website/docs/reference/cli-commands.md
+ `hermes kanban decompose` action row in the action table, with
pointer to the Auto vs Manual orchestration section.
- website/docs/reference/profile-commands.md
+ `--description "<text>"` flag on `hermes profile create`.
+ Full `hermes profile describe` section: read, --text, --auto,
--overwrite, --all flags with examples.
- website/docs/user-guide/features/kanban.md (the big one)
+ Triage column intro rewritten around the Auto-decompose default
behavior, with pointer to the new Auto vs Manual section.
+ Status action row updated to mention both ⚗ Decompose and
✨ Specify on triage cards.
+ New "Auto vs Manual orchestration" section explaining the two
modes, how to flip them (pill, config), how routing-by-description
works, the no-None-assignee guarantee, plus a config knob table
(auto_decompose, auto_decompose_per_tick, orchestrator_profile,
default_assignee) and the two new auxiliary slots
(kanban_decomposer, profile_describer).
+ REST surface table gains 6 new endpoint rows: /tasks/:id/decompose,
/profiles (GET), /profiles/:name (PATCH), /profiles/:name/describe-auto,
/orchestration (GET + PUT).
- website/docs/user-guide/features/kanban-tutorial.md
+ Triage column blurb updated for Auto by default + Manual via the
pill, with cross-link to the Auto vs Manual orchestration section.
- website/docs/user-guide/profiles.md
+ Blank-profile flow now mentions --description and points to the
kanban routing model for context.
- website/docs/user-guide/configuration.md
+ `kanban_decomposer` and `profile_describer` added to the
`hermes model -> Configure auxiliary models` menu listing.
PR #25580 was authored before #2746 landed on main, so its plugin
versions of browser_use/browserbase/firecrawl ship without the
requests.RequestException → RuntimeError wrapping that 13c72fb4 added
to the legacy tools/browser_providers/ files for #2746. Cherry-picking
the PR + git rm'ing the legacy files (the migration's intent) would
silently revert that network-error fix.
Port the same try/except pattern into the three plugin create_session()
methods. Browser Use managed-mode keeps its raw-exception propagation
(idempotency-key retry semantics).
Co-authored-by: nidhi-singh02 <nidhi2894@gmail.com>
Addresses findings from two self-review passes pre-merge.
First pass (3-agent parallel review):
1. plugins/browser/browser_use/provider.py: drop the
``_ = managed_nous_tools_enabled`` dead-import-hider in
_get_config_or_none(). The import was actively misleading — the
helper IS used in _get_config() (separate method, separate import),
not here. The "keep static analysis happy" comment was wrong about
what the helper does in this scope.
2. agent/browser_provider.py: drop ``pragma: no cover`` from
is_configured() / provider_name() backward-compat aliases. They ARE
covered by ``TestLegacyAbcAliases`` — the pragma would have masked
future regressions.
3. tools/browser_tool.py: refactor _is_legacy_provider_registry_overridden()
to compare against a module-frozen _DEFAULT_PROVIDER_REGISTRY snapshot
instead of hardcoded set of 3 keys. Future maintainers adding a 4th
built-in provider now just extend _PROVIDER_REGISTRY; the override
detection adapts automatically. Previously the hardcoded
``set(...) != {"browserbase", "browser-use", "firecrawl"}`` would flip
True forever on any 4-key registry, silently routing every install
onto the legacy fixture path.
4. tools/browser_tool.py: when explicit ``browser.cloud_provider`` is set
but the registry has no matching plugin (typo, uninstalled plugin,
discovery failure), emit a WARNING with actionable text instead of
silently falling through to auto-detect. Legacy code surfaced a typed
credentials error via direct class instantiation; this log restores
the signal in the post-migration path.
5. agent/browser_registry.py: trim the triple-redundant _LEGACY_PREFERENCE
documentation. Module docstring + 13-line block-comment + 5-line
inline comment was repeating the same point. Kept the docstring and
trimmed the block-comment to 5 lines.
6. agent/browser_registry.py: upgrade is_available()-raised logging from
DEBUG to WARNING with exc_info=True. A provider's availability check
throwing is unusual enough that users debugging "no cloud provider"
need the traceback in logs.
7. tests/plugins/browser/check_parity_vs_main.py: drop dead top-level
imports (os, shutil, tempfile — only referenced inside the
SUBPROCESS_SCRIPT string literal that runs in a child process).
Second pass (architecture + claim-verification review):
8. tools/browser_tool.py: rewrite the inline comment in _get_cloud_provider
auto-detect branch. Prior text claimed it "routes through the plugin
registry's legacy preference walk so third-party plugins still get a
chance to be selected when they're explicitly configured" — false on
both counts. The branch uses module-level legacy class aliases
(BrowserUseProvider / BrowserbaseProvider) directly; third-party
plugins are intentionally reachable only via explicit
``browser.cloud_provider``. Corrected comment now matches behaviour
and cross-references _LEGACY_PREFERENCE for the firecrawl gate
rationale.
9. tools/browser_tool.py + tests/tools/test_managed_browserbase_and_modal.py:
drop the unused ``get_active_browser_provider as
_registry_get_active_browser_provider`` alias from the
``from agent.browser_registry import ...`` block. It was never
referenced; matching test-stub line in the agent.browser_registry
SimpleNamespace also dropped. ``get_provider`` is still imported (used
by the explicit-config dispatch path at line 535).
10. plugins/browser/firecrawl/provider.py: align emergency_cleanup()
with the early-guard pattern used in browserbase + browser_use
plugins. Previously firecrawl tried the DELETE and relied on
``_headers()`` raising ValueError to trip a "missing credentials"
warning; same final outcome but a different control flow that read
like a bug to a maintainer skimming the three modules. Now: if
is_available() is False, log+return early — identical shape to the
other two providers.
Verification: 54/54 unit tests + 13/13 parity scenarios still pass.
Migrates the remaining two cloud browser providers to plugins:
plugins/browser/browser_use/ — dual auth (direct BROWSER_USE_API_KEY
or managed Nous gateway), idempotency-
key handling for retried managed-mode
creates, x-external-call-id capture.
plugins/browser/firecrawl/ — direct FIRECRAWL_API_KEY only;
distinct from plugins/web/firecrawl/
(same key, different endpoint).
Also drops the 'single-eligible shortcut' rule from
agent.browser_registry._resolve(). Was a copy-paste from
web_search_registry that would have introduced a real behavior change:
a user with only FIRECRAWL_API_KEY set (for web-extract) would silently
get routed to a paid Firecrawl cloud browser on a fresh install — not
matching origin/main, which only auto-detected between Browser Use and
Browserbase. Third-party browser plugins are subject to the same gate:
they require explicit `browser.cloud_provider` to take effect.
Verified end-to-end via plugin discovery:
- 3 plugins register (browser-use, browserbase, firecrawl)
- _resolve(None) with no creds: None (local mode)
- _resolve(None) with only FIRECRAWL_API_KEY: None (matches main)
- _resolve('firecrawl'): firecrawl (explicit wins)
- _resolve(None) with BU+firecrawl: browser-use (legacy walk first hit)
- _resolve(None) with all three: browser-use (legacy walk order)
Migrates tools/browser_providers/browserbase.py → plugins/browser/browserbase/.
Direct credentials only (BROWSERBASE_API_KEY + BROWSERBASE_PROJECT_ID); same
session-creation, 402-handling, and feature-flag logic as the legacy
implementation. Renames is_configured() → is_available() to match the new
BrowserProvider ABC.
The legacy module tools/browser_providers/browserbase.py is NOT yet deleted
and tools/browser_tool.py still references the in-tree class. The dispatcher
cutover happens in a later commit so the plugin migration and the dispatcher
switch land as separate reviewable units.
Verified via plugin-discovery E2E:
- browserbase registers as 'browserbase'
- is_available() correctly tracks BROWSERBASE_API_KEY + BROWSERBASE_PROJECT_ID
- _resolve('browserbase') returns the provider even when unavailable
(so dispatcher surfaces a typed credentials error)
- _resolve(None) returns the provider when it's the single eligible one
Six days after #23937 (608 fixes) the codebase had accumulated 241 new
PLR6201 violations. Same mechanical `x in (...)` → `x in {...}` fix,
same zero-risk profile: set lookup is O(1) vs O(n) for tuple and the
two are semantically equivalent for hashable scalar membership tests.
All 241 instances fixed via `ruff check --select PLR6201 --fix
--unsafe-fixes`, zero remaining. Every changed value is a hashable
scalar (str/int/None/enum/signal); no risk of unhashable runtime
errors. No behavior change.
Test plan:
- 119 files changed, +244/-244 (net zero) — exactly one-line edits
- `ruff check` clean afterward
- Compile checks pass on the largest touched files (cli.py, run_agent.py,
gateway/run.py, gateway/platforms/discord.py, model_tools.py)
- Subset broad test run on tests/gateway/ tests/hermes_cli/ tests/agent/
tests/tools/: 18187 passed, 59 pre-existing failures (verified against
origin/main with the same shape — identical failure count, identical
category — all xdist test-order flakes unrelated to this change)
Follows the same template as PR #23937 ([tracker: #23972](https://github.com/NousResearch/hermes-agent/issues/23972)).
_LineClient's five aiohttp.ClientSession() calls omit trust_env=True,
silently bypassing HTTP_PROXY / HTTPS_PROXY / ALL_PROXY. Result: every
LINE API call (reply, push, loading, fetch_content, get_bot_user_id)
ignores the system proxy.
Fix: add trust_env=True to all five session constructions. Symmetric
with the wecom and weixin adapters which already set this flag. No
behavior change for users not behind a proxy.
aiohttp.ClientSession defaults to trust_env=False, which silently ignores
HTTP_PROXY, HTTPS_PROXY, and ALL_PROXY environment variables. Users behind
a corporate or network proxy cannot reach external APIs on any of these
platforms — all outbound requests fail with connection errors.
Symmetric with wecom.py (line 276), weixin.py (lines 1055/1268/1274), and
matrix.py (no-proxy path) which already set this flag. Complements the
open LINE fix (#26635) with the remaining gateway and plugin adapters.
Changed:
- gateway/platforms/sms.py: persistent Twilio session (connect) + fallback
session (send) — both hit https://api.twilio.com
- gateway/platforms/slack.py: ephemeral response_url POST session —
hits https://hooks.slack.com/... callback URLs
- plugins/platforms/teams/adapter.py: standalone send session —
hits login.microsoftonline.com (token) + Bot Framework service URL
- plugins/platforms/google_chat/adapter.py: standalone send session —
hits https://chat.googleapis.com/v1/...
WhatsApp sessions are excluded: they connect to http://127.0.0.1:{port}
(local bridge) and must not be routed through a system proxy.
Closes#26924 (and supersedes #26926) in spirit.
DeepSeek was missing `default_aux_model` on its `ProviderProfile`, so
`_get_aux_model_for_provider("deepseek")` returned an empty string and
the compression / vision / session-search paths emitted
"No auxiliary LLM provider configured -- context compression will
drop middle turns without a summary."
on every DeepSeek session, even when the user had perfectly working
DeepSeek credentials.
Fix lands at the profile layer rather than the legacy
`_API_KEY_PROVIDER_AUX_MODELS_FALLBACK` dict the original PR targeted.
Every modern provider (gemini, zai, minimax, anthropic, kimi-coding,
stepfun, ollama-cloud, gmi, novita, kilocode, ai-gateway, opencode-zen)
sets `default_aux_model` on its `ProviderProfile`; the fallback dict
only exists for providers that predate the profiles system.
Tests added under `tests/plugins/model_providers/test_deepseek_profile.py`:
- `test_profile_advertises_deepseek_chat` -- pins the profile attribute
- `test_consumer_api_returns_deepseek_chat` -- pins the consumer API behavior
- `test_consumer_api_returns_non_empty` -- regression guard for the
symptom in the issue
Original diagnosis and aux-model choice from @kriscolab in PR #26926;
moved one layer up.
Co-authored-by: kriscolab <71590782+kriscolab@users.noreply.github.com>
The langfuse plugin is hooks-only (no toolsets), so it never appears in
`hermes tools` — that menu iterates `_get_effective_configurable_toolsets()`
(= `CONFIGURABLE_TOOLSETS` + plugin-registered toolsets), and "langfuse"
is in neither. The `TOOL_CATEGORIES["langfuse"]` setup wizard (with its
`post_setup: "langfuse"` hook that pip-installs the SDK and writes
`plugins.enabled`) was reachable only when a toolset key "langfuse" got
enabled, which can't happen — so it's been dead code, and the docs that
promised "Setup (interactive): hermes tools → Langfuse Observability"
were silently broken.
Right home for that wizard is `hermes plugins` (e.g. auto-running a
plugin's post-setup hook on enable), which is a generic plugin-setup
mechanism worth designing properly rather than shoehorning langfuse
back into `hermes tools`. Until that exists, point users at the
working manual flow.
Code:
- Delete `TOOL_CATEGORIES["langfuse"]` (24 lines) — unreachable.
- Delete the `post_setup_key == "langfuse"` branch in `_run_post_setup`
(29 lines) — only caller was the deleted TOOL_CATEGORIES entry.
Docs / comments (point at the manual flow + interactive `hermes plugins`):
- `plugins/observability/langfuse/README.md`: collapse the two-option
setup section to the single working flow.
- `plugins/observability/langfuse/plugin.yaml`: update `description`.
- `plugins/observability/langfuse/__init__.py`: update module docstring.
- `hermes_cli/config.py`: update inline comment above the LANGFUSE_*
env-var allow-list.
- `website/docs/user-guide/features/built-in-plugins.md`: collapse
"Setup (interactive)" + "Setup (manual)" into one accurate block.
- `website/docs/reference/environment-variables.md`: update the
cross-reference in the Langfuse env-vars section.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Ready column help and fallbacks now describe dependency-ready work; show a
badge on unassigned ready cards and fix the stale unassigned tooltip. Align
localized Ready help strings with the new semantics.
Co-authored-by: Cursor <cursoragent@cursor.com>
The cherry-picked PR #15251 from @tw2818 correctly identified the
DeepSeek 400 root cause but placed the fix in the legacy fallback path
of `build_kwargs`, which DeepSeek never reaches — DeepSeek has a
registered ProviderProfile and goes through `_build_kwargs_from_profile`
instead. The legacy-path block was therefore dead code.
This commit pivots the fix to where it actually fires:
- New `DeepSeekProfile` in `plugins/model-providers/deepseek/__init__.py`
overrides `build_api_kwargs_extras` to emit DeepSeek's expected wire
format (mirrors `KimiProfile`):
{"reasoning_effort": "<low|medium|high|max>",
"extra_body": {"thinking": {"type": "enabled" | "disabled"}}}
- Model gating: only `deepseek-v4-*` and `deepseek-reasoner` emit
thinking control. `deepseek-chat` (V3) is untouched — current behavior.
- Effort mapping: low/medium/high passthrough, xhigh/max → max, unset →
omitted (DeepSeek server applies its own default).
- Revert the legacy-path additions from PR #15251 — they were dead code,
and the `_copy_reasoning_content_for_api` strip block specifically
would have nullified the existing reasoning_content padding machinery
(`_needs_deepseek_tool_reasoning` → space-pad on replay) that the
active provider already relies on for replay correctness.
- Unit tests pin the wire-shape contract and the model gating rules
(26 tests, all passing). Existing transport + provider profile suites
(321 tests) continue to pass.
- AUTHOR_MAP: map twebefy@gmail.com → tw2818 for release notes credit.
Closes#15700, #17212, #17825.
Co-authored-by: tw2818 <twebefy@gmail.com>
Wraps every sync->async coroutine-scheduling site in the codebase with a
new agent.async_utils.safe_schedule_threadsafe() helper that closes the
coroutine on scheduling failure (closed loop, shutdown race, etc.)
instead of leaking it as 'coroutine was never awaited' RuntimeWarnings
plus reference leaks.
22 production call sites migrated across the codebase:
- acp_adapter/events.py, acp_adapter/permissions.py
- agent/lsp/manager.py
- cron/scheduler.py (media + text delivery paths)
- gateway/platforms/feishu.py (5 sites, via existing _submit_on_loop helper
which now delegates to safe_schedule_threadsafe)
- gateway/run.py (10 sites: telegram rename, agent:step hook, status
callback, interim+bg-review, clarify send, exec-approval button+text,
temp-bubble cleanup, channel-directory refresh)
- plugins/memory/hindsight, plugins/platforms/google_chat
- tools/browser_supervisor.py (3), browser_cdp_tool.py,
computer_use/cua_backend.py, slash_confirm.py
- tools/environments/modal.py (_AsyncWorker)
- tools/mcp_tool.py (2 + 8 _run_on_mcp_loop callers converted to
factory-style so the coroutine is never constructed on a dead loop)
- tui_gateway/ws.py
Tests: new tests/agent/test_async_utils.py covers helper behavior under
live loop, dead loop, None loop, and scheduling exceptions. Regression
tests added at three PR-original sites (acp events, acp permissions,
mcp loop runner) mirroring contributor's intent.
Live-tested end-to-end:
- Helper stress test: 1500 schedules across live/dead/race scenarios,
zero leaked coroutines
- Race exercised: 5000 schedules with loop killed mid-flight, 100 ok /
4900 None returns, zero leaks
- hermes chat -q with terminal tool call (exercises step_callback bridge)
- MCP probe against failing subprocess servers + factory path
- Real gateway daemon boot + SIGINT shutdown across multiple platform
adapter inits
- WSTransport 100 live + 50 dead-loop writes
- Cron delivery path live + dead loop
Salvages PR #2657 — adopts contributor's intent over a much wider site
list and a single centralized helper instead of inline try/except at
each site. 3 of the original PR's 6 sites no longer exist on main
(environments/patches.py deleted, DingTalk refactored to native async);
the equivalent fix lives in tools/environments/modal.py instead.
Co-authored-by: JithendraNara <jithendranaidunara@gmail.com>
Adds a new authentication provider that lets SuperGrok subscribers sign
in to Hermes with their xAI account via the standard OAuth 2.0 PKCE
loopback flow, instead of pasting a raw API key from console.x.ai.
Highlights
----------
* OAuth 2.0 PKCE loopback login against accounts.x.ai with discovery,
state/nonce, and a strict CORS-origin allowlist on the callback.
* Authorize URL carries `plan=generic` (required for non-allowlisted
loopback clients) and `referrer=hermes-agent` for best-effort
attribution in xAI's OAuth server logs.
* Token storage in `auth.json` with file-locked atomic writes; JWT
`exp`-based expiry detection with skew; refresh-token rotation
synced both ways between the singleton store and the credential
pool so multi-process / multi-profile setups don't tear each other's
refresh tokens.
* Reactive 401 retry: on a 401 from the xAI Responses API, the agent
refreshes the token, swaps it back into `self.api_key`, and retries
the call once. Guarded against silent account swaps when the active
key was sourced from a different (manual) pool entry.
* Auxiliary tasks (curator, vision, embeddings, etc.) route through a
dedicated xAI Responses-mode auxiliary client instead of falling back
to OpenRouter billing.
* Direct HTTP tools (`tools/xai_http.py`, transcription, TTS, image-gen
plugin) resolve credentials through a unified runtime → singleton →
env-var fallback chain so xai-oauth users get them for free.
* `hermes auth add xai-oauth` and `hermes auth remove xai-oauth N` are
wired through the standard auth-commands surface; remove cleans up
the singleton loopback_pkce entry so it doesn't silently reinstate.
* `hermes model` provider picker shows
"xAI Grok OAuth (SuperGrok Subscription)" and the model-flow falls
back to pool credentials when the singleton is missing.
Hardening
---------
* Discovery and refresh responses validate the returned
`token_endpoint` host against the same `*.x.ai` allowlist as the
authorization endpoint, blocking MITM persistence of a hostile
endpoint.
* Discovery / refresh / token-exchange `response.json()` calls are
wrapped to raise typed `AuthError` on malformed bodies (captive
portals, proxy error pages) instead of leaking JSONDecodeError
tracebacks.
* `prompt_cache_key` is routed through `extra_body` on the codex
transport (sending it as a top-level kwarg trips xAI's SDK with a
TypeError).
* Credential-pool sync-back preserves `active_provider` so refreshing
an OAuth entry doesn't silently flip the active provider out from
under the running agent.
Testing
-------
* New `tests/hermes_cli/test_auth_xai_oauth_provider.py` (~63 tests)
covers JWT expiry, OAuth URL params (plan + referrer), CORS origins,
redirect URI validation, singleton↔pool sync, concurrency races,
refresh error paths, runtime resolution, and malformed-JSON guards.
* Extended `test_credential_pool.py`, `test_codex_transport.py`, and
`test_run_agent_codex_responses.py` cover the pool sync-back,
`extra_body` routing, and 401 reactive refresh paths.
* 165 tests passing on this branch via `scripts/run_tests.sh`.
* fix(langfuse): reject placeholder credentials with one-shot warning
When operators leave HERMES_LANGFUSE_PUBLIC_KEY / HERMES_LANGFUSE_SECRET_KEY
at a template value like 'placeholder', 'test-key', or 'your-langfuse-key',
the Langfuse SDK silently accepts the credentials at construction time and
drops every trace at flush time. No warning, no error — just an empty
Langfuse dashboard the operator only notices hours later.
Add prefix-based validation in _get_langfuse() against the documented
'pk-lf-' / 'sk-lf-' prefixes that Langfuse always issues server-side.
Anything else fires a single warning naming the offending env var(s)
with a log-safe value preview (full string for short placeholders so the
operator knows which template they left in place; truncated for long
values so a real secret pasted into the wrong field never hits the log),
then short-circuits via the existing _INIT_FAILED cache so the warning
fires once per process, not once per hook invocation.
The check sits after the 'Langfuse is None' SDK-installed guard so hosts
without the optional langfuse SDK don't see misleading 'set real keys'
hints when the actionable fix is 'pip install langfuse'. Missing
credentials remains the documented opt-out path and stays silent — no
log noise for unconfigured installs.
Fixes#22763Fixes#23823
* fix(langfuse): use actual API request messages for generation input
on_pre_llm_request previously used the messages kwarg alone, which
could be None when Hermes passes the payload via request_messages,
conversation_history, or user_message instead. Add _coerce_request_messages
to pick the first available list across all variants, falling back to a
synthetic user message. Generations now show the real outbound payload
rather than an empty input.
* fix(langfuse): record tool call outputs in traces
Tool observations showed input (arguments) but output was always
undefined. Root cause: when tool_call_id is empty, pre_tool_call stored
observations under a unique time-based key that post_tool_call could
never reconstruct, so every tool span was closed without output by the
_finish_trace sweep.
Fix pre/post matching by routing empty-tool_call_id tools through a
per-name FIFO queue (pending_tools_by_name) instead of the time-based
key. Tools with a tool_call_id continue to use the id-keyed dict.
Also:
- Preserve OpenAI-style nested function shape in serialized tool calls
so Langfuse renders name/arguments correctly
- Keep name + tool_call_id on role:tool messages for proper pairing
- Backfill tool results onto the matching turn_tool_calls entry so the
generation's tool-call record carries the result alongside arguments
- Coerce request messages from whichever field the runtime provides
(request_messages, messages, conversation_history, user_message)
* fix(langfuse): salvage-review polish — drop dead is_first_turn, shallow-copy request_messages, real threaded FIFO test
Self-review of the combined #22345 + #23831 salvage surfaced three issues
worth fixing in the same PR rather than as follow-ups:
1. Drop is_first_turn from the pre_api_request hook. The boolean expression
`not bool(conversation_history)` was wrong: conversation_history is
reassigned to None mid-run after compression (5 sites in run_agent.py),
so the value flips False -> True mid-conversation on every post-compression
API call. The langfuse plugin never consumed it, so the kwarg was both
misleading AND dead.
2. Replace copy.deepcopy(request_messages) with shallow list() copy. The
pre_api_request hook contract discards return values (invoke_hook never
writes back to api_kwargs), and the langfuse plugin's _serialize_messages
already builds its own snapshot dicts via _safe_value. A deepcopy on every
API call would walk every tool result and base64 image — significant
overhead for no real isolation benefit. Shallow copy of the outer list
protects against later mutations of api_messages without paying for the
inner-dict walk.
3. Rename test_empty_tool_call_id_concurrent_fifo_order ->
test_empty_tool_call_id_observations_are_fifo_within_tool_name and add a
real test_threaded_post_calls_preserve_fifo_under_lock that spawns 8
threads behind a barrier to actually exercise _STATE_LOCK on the
pending_tools_by_name queue. The original test was sequential and only
validated Python list semantics; this one validates the lock discipline.
4. Fix stale 'Cleared by reset_cache_for_tests()' comment on _INIT_FAILED —
that function does not exist. Tests reload the module via sys.modules.pop
+ importlib.import_module instead.
Tests: 37 langfuse plugin tests pass, 658 plugin tests overall pass.
---------
Co-authored-by: xxxigm <tuancanhnguyen706@gmail.com>
Co-authored-by: Brian Conklin <brian@dralth.com>
SimpleX Chat (https://simplex.chat) is a private, decentralised messenger
with no persistent user IDs — every contact is identified by an opaque
internal ID generated at connection time. This adds it as a Hermes
gateway platform via the plugin system.
The adapter connects to a local simplex-chat daemon via WebSocket,
listens for inbound messages, and sends replies. Originally proposed in
PR #2558 as a core-modifying integration; reshaped here as a self-
contained plugin under plugins/platforms/simplex/ with no edits to any
core file. Discovery is filesystem-based (scanned by gateway.config),
and the platform identity is resolved on demand via Platform("simplex").
Plugin contract:
- check_requirements() requires SIMPLEX_WS_URL AND the websockets package
- validate_config() / is_connected() accept env or config.yaml input
- _env_enablement() seeds PlatformConfig.extra (ws_url + home_channel)
- _standalone_send() supports out-of-process cron delivery
- interactive_setup() provides a stdin wizard for hermes gateway setup
- register() wires the adapter into the registry with required_env,
install_hint, cron_deliver_env_var, allowed_users_env, and a
platform_hint for the LLM.
Lazy dependency: the websockets Python package is imported inside the
functions that need it. The plugin is importable and discoverable even
when websockets is missing — check_requirements() simply returns False
until `pip install websockets` is run. No new pyproject extras are
introduced.
Environment variables:
SIMPLEX_WS_URL WebSocket URL of the daemon (required)
SIMPLEX_ALLOWED_USERS Comma-separated allowed contact IDs
SIMPLEX_ALLOW_ALL_USERS Set true to allow all contacts
SIMPLEX_HOME_CHANNEL Default contact for cron delivery
SIMPLEX_HOME_CHANNEL_NAME Human label for the home channel
Closes#2557.
Add NovitaAI as a first-class provider with dedicated model selection
flow, live pricing, and authoritative context length resolution.
- Register provider in PROVIDER_REGISTRY, HERMES_OVERLAYS, and all
alias/label maps (ID: novita, aliases: novita-ai, novitaai)
- Add dedicated _model_flow_novita() with 3-tier model list fallback:
Novita API → models.dev → static curated list
- Fetch live pricing from /v1/models with correct unit conversion
(input_token_price_per_m is 0.0001 USD per Mtok)
- Add Novita-specific context length resolution (step 4b) in
get_model_context_length(), prioritized over models.dev/OpenRouter
- Register api.novita.ai in _URL_TO_PROVIDER to prevent early return
from the custom-endpoint code path
- Add models.dev mapping (novita → novita-ai)
- Add default auxiliary model (deepseek/deepseek-v3-0324)
- Add NOVITA_API_KEY to test isolation (conftest.py)
- Update docs: providers page, env vars reference, CLI reference,
.env.example, README, and landing page
Surfaced by local E2E behavior-parity testing of PR vs origin/main: the
plugin-migrated dispatchers were quietly changing the error envelope
shape returned to function-calling models on unconfigured systems.
Two findings, both from per-result error wrapping bleeding into the
pre-flight configuration error path:
1. **search**: ``firecrawl.search()`` caught the
``ValueError("Web tools are not configured...")`` from
``_get_firecrawl_client()`` and returned it as
``{"success": False, "error": ...}``, losing the legacy
``{"error": "Error searching web: ..."}`` envelope that
``tool_error()`` emits on main. Models that special-case the
``error`` key still detect the failure, but the prefix is part of
the legacy contract some users rely on.
2. **crawl**: ``firecrawl.crawl()`` caught the same pre-flight
``ValueError`` and wrapped it as a per-page error inside
``results[0]``. Main short-circuits on ``check_firecrawl_api_key()``
BEFORE dispatching, so its unconfigured response is
``{"success": False, "error": "web_crawl requires Firecrawl..."}``
at the top level. The PR's per-page burying hid the failure inside
``results[]`` where models that check ``result.get("error")`` would
miss it.
Fix:
- ``plugins/web/firecrawl/provider.py``: pull
``_get_firecrawl_client()`` outside the broad ``try`` in
``search()``. Pre-flight ``ValueError`` / ``ImportError`` propagate
to the dispatcher's top-level exception handler. In-flight SDK
errors still get wrapped as ``{"success": False, ...}``.
- ``tools/web_tools.py``: mirror main's upstream availability gate in
``web_crawl_tool``. When the resolved crawl provider is
``is_available()==False``, short-circuit BEFORE dispatching with the
same top-level error shape main emits.
- ``tests/tools/test_web_providers.py``: 2 regression tests
(``TestUnconfiguredErrorEnvelopeParity``) lock in the behavior so
future plugin work can't undo this.
Verified via local subprocess-based parity test (14/14 scenarios match
origin/main shape exactly) and full 210/210 web test suite green.
Self-review of the plugin migration surfaced one warning and a handful of
doc/dead-code cleanups. None affect production behaviour through the main
dispatcher (which always calls `tools.web_tools._get_backend()` first and
preserves the full 7-provider walk), but direct callers of
`agent.web_search_registry.get_active_*_provider()` previously diverged
from the legacy order and could return `None` for users with credentials
but no explicit `web.backend` config key.
Changes
-------
1. `_LEGACY_PREFERENCE` was shipped as a 4-tuple
`("brave-free", "firecrawl", "searxng", "ddgs")` while the PR
description and the legacy `_get_backend()` candidate order both
call for the 7-tuple
`(firecrawl, parallel, tavily, exa, searxng, brave-free, ddgs)`.
Replaced with the 7-tuple. Verified empirically: with TAVILY+EXA keys
and no config, `get_active_search_provider()` now returns tavily
(was None); with EXA+PARALLEL it returns parallel (was None); with
BRAVE+FIRECRAWL it returns firecrawl (was brave-free).
2. `agent/web_search_registry.py` — module docstring, `_resolve` step-3
docstring, and inline comment all listed the old 4-tuple and claimed
"brave-free first because it was the shipped default". The legacy
default is `"firecrawl"`. Rewritten to match the new ordering and
reference `tools.web_tools._get_backend()` as the source of truth.
3. `agent/web_search_registry.py` — `get_active_crawl_provider`
docstring said "only Tavily implements it among built-in providers".
Firecrawl also advertises `supports_crawl=True` after the previous
commit. Updated to "Tavily and Firecrawl".
4. `plugins/web/tavily/provider.py` — module docstring said "Tavily is
the only built-in backend that natively crawls". Updated.
5. `agent/web_search_provider.py` — ABC docstring mentioned only
`search` / `extract` capabilities. Added `crawl` for accuracy.
6. `plugins/web/{firecrawl,parallel,exa}/provider.py` — dead plugin-level
cache globals (`_firecrawl_client`, `_parallel_client`,
`_async_parallel_client`, `_exa_client`) were declared but never read
(all reads/writes go through `_wt.*` per the `extracting-inline-
helpers-to-plugins` recipe). Removed the dead declarations; the
reset-for-tests helpers in firecrawl + parallel now clear the
canonical `_wt._<name>` slots, matching the pattern exa already used.
Tests
-----
218/218 web-targeted tests still pass (no test changes needed). 4910/4910
in `tests/tools/` still green.
The web-provider migration originally left firecrawl crawl as the only
provider-specific code remaining inline in tools/web_tools.py (~250
lines of Firecrawl-specific crawl orchestration that didn't fit the
plugin's existing surface). This commit closes that gap.
What this adds
--------------
1. plugins/web/firecrawl/provider.py: implement async ``crawl(url, **kwargs)``
- Accepts the same kwargs as the dispatcher passes to any crawl
provider (``instructions``, ``depth``, ``limit``); Firecrawl's
/crawl endpoint ignores ``instructions`` and ``depth`` so we log
and drop with a clear info message.
- Wraps the sync SDK ``crawl()`` call in asyncio.to_thread so the
gateway event loop isn't blocked on a multi-page crawl.
- Preserves the response-shape normalization across pydantic /
typed-object / dict variants that the legacy inline code did.
- Preserves per-page website-policy re-check (catches blocked
redirects after the SDK returns).
- Returns the same {"results": [...]} shape so the dispatcher's
shared LLM-summarization post-processing path works unchanged.
- Sets supports_crawl() to True so the dispatcher routes through
the plugin instead of the legacy fallthrough.
2. tools/web_tools.py: delete the entire legacy firecrawl crawl block
that used to run after "No registered provider supports crawl" —
~270 lines including:
- check_firecrawl_api_key gate + typed error
- inline SSRF + website-policy seed-URL gate (dispatcher already
does this)
- Firecrawl client setup with crawl_params
- 100+ lines of pydantic/dict/typed-object normalization
- Per-page LLM-processing loop (kept in the dispatcher's shared
post-processing path; that's where it always belonged)
- trimming + base64 image cleanup (still done in the dispatcher's
shared path)
Replaced with a single typed-error branch when no crawl-capable
provider is available: "web_crawl has no available backend. Set
FIRECRAWL_API_KEY (or FIRECRAWL_API_URL for self-hosted), or set
TAVILY_API_KEY for Tavily."
Test updates
------------
- tests/tools/test_website_policy.py:
- test_web_crawl_short_circuits_blocked_url: dispatcher seed-URL
gate still runs on web_tools.check_website_access (no change to
that patch), but the firecrawl client lockdown moved to the
plugin module — patch firecrawl_provider._get_firecrawl_client
instead of web_tools._get_firecrawl_client. The dispatcher
short-circuits before the plugin runs, so the test still passes.
- test_web_crawl_blocks_redirected_final_url: patch the per-page
policy gate at plugins.web.firecrawl.provider.check_website_access
(where it now runs) AND on web_tools (where the seed-URL gate
still runs). Patch firecrawl_provider._get_firecrawl_client for
the FakeCrawlClient injection. Both checks flow through the same
fake_check function.
- tests/plugins/web/test_web_search_provider_plugins.py:
- Update parametrized capability-flag spec: firecrawl supports_crawl
is now True.
- Add test_firecrawl_crawl_returns_error_dict_when_unconfigured —
verifies inspect.iscoroutinefunction(p.crawl) is True and that
the async crawl returns a per-page error dict (not a raise) when
FIRECRAWL_API_KEY is missing.
Verified
--------
- 218/218 web tests pass (was 173, +44 plugin tests + 1 new firecrawl
crawl test from this commit = 218 with the test deduplication).
- Compile-clean (py_compile passes on both files).
- Provider capabilities matrix confirmed end-to-end:
name search extract crawl async-extract? async-crawl?
firecrawl True True True True True
tavily True True True False False
Both crawl-capable providers exercise the dispatcher's
inspect.iscoroutinefunction async-or-sync detection.
Net diff
--------
- tools/web_tools.py: -254 lines (legacy inline crawl gone)
- plugins/web/firecrawl/provider.py: +185 lines (crawl method)
- test_website_policy.py: +14/-9 lines (patch locations)
- test_web_search_provider_plugins.py: +22/-1 lines (capability flag
+ new firecrawl crawl test)
- Total: -32 net LoC; tools/web_tools.py is now 1509 lines (was 1763
before this commit, 2227 before the migration started).
Removes the seven hardcoded TOOL_CATEGORIES["web"] provider rows that
duplicated the plugin-registered providers, and deletes the
_WEB_PLUGIN_SKIPLIST that existed to prevent duplicate picker rows
during the migration. The Web Search & Extract category now derives its
provider rows entirely from agent.web_search_registry via
_plugin_web_search_providers(), matching how Spotify, Google Meet, and
the image_gen plugins are surfaced.
Removed (deduplicated against plugin schemas):
- Firecrawl Cloud → plugins.web.firecrawl
- Exa → plugins.web.exa
- Parallel → plugins.web.parallel
- Tavily → plugins.web.tavily
- SearXNG → plugins.web.searxng
- Brave Search (Free Tier) → plugins.web.brave_free
- DuckDuckGo (ddgs) → plugins.web.ddgs (post_setup hook preserved)
Retained in TOOL_CATEGORIES["web"]:
- Nous Subscription — requires requires_nous_auth +
managed_nous_feature + override_env_vars
to drive the managed-gateway UX. Not a
provider — a different *setup flow* for the
firecrawl backend.
- Firecrawl Self-Hosted — points firecrawl at a private Docker URL
via FIRECRAWL_API_URL only. Same reason:
UX setup-flow row, not a provider.
These two rows describe alternative auth/billing paths for the
firecrawl backend; they intentionally share web_backend="firecrawl"
with the plugin row but light up different env-var prompts.
Plugin schema extensions
------------------------
- ddgs plugin's get_setup_schema() now emits `post_setup: "ddgs"` so
selection still triggers the pip-install hook in _run_post_setup().
- _plugin_web_search_providers() passes `post_setup` through verbatim
when present in the schema (other future plugins like camofox / a
hypothetical playwright-web plugin can opt in the same way).
- Picker rows now carry both `web_backend` (legacy field consumed by
setup + selection helpers) and `web_search_plugin_name`
(informational marker), so behavior is identical between hardcoded
and plugin-registered rows.
Net diff
--------
- hermes_cli/tools_config.py: -141/+50 lines (~91 lines net)
- plugins/web/ddgs/provider.py: +7/-4 (post_setup field + badge polish)
Verified
--------
- Compile-clean for both files
- Picker shows: 2 hardcoded rows (Nous Subscription, Firecrawl
Self-Hosted) + 7 plugin rows (alphabetically: Brave Search,
DuckDuckGo, Exa, Firecrawl, Parallel, SearXNG, Tavily). DuckDuckGo
row carries post_setup="ddgs" for first-time install.
- 173 web-specific tests still pass.
Removes ~580 lines of dead code from tools/web_tools.py that were
superseded by the plugin migration but kept around in the cutover commit
to keep the diff focused. Replaces them with thin re-export shims so
existing tests and external callers that reach for the legacy
``tools.web_tools.<name>`` paths continue to work transparently.
Deleted from tools/web_tools.py
--------------------------------
- Lazy Firecrawl SDK proxy (_load_firecrawl_cls, _FirecrawlProxy,
_FIRECRAWL_CLS_CACHE, the Firecrawl singleton)
- Firecrawl client section (_get_direct_firecrawl_config,
_get_firecrawl_gateway_url, _is_tool_gateway_ready,
_has_direct_firecrawl_config, _raise_web_backend_configuration_error,
_firecrawl_backend_help_suffix, _get_firecrawl_client)
- Parallel client section (_get_parallel_client,
_get_async_parallel_client, _parallel_client, _async_parallel_client)
- Tavily client section (_TAVILY_BASE_URL, _tavily_request,
_normalize_tavily_search_results, _normalize_tavily_documents)
- Generic SDK normalizers (_to_plain_object, _normalize_result_list,
_extract_web_search_results, _extract_scrape_payload)
- Exa client section (_get_exa_client, _exa_client, _exa_search,
_exa_extract)
- Parallel helpers (_parallel_search, _parallel_extract)
- Duplicate inline check_firecrawl_api_key
Net: tools/web_tools.py drops from 2227 → 1613 lines (-614 lines).
Re-exports added at top of tools/web_tools.py
---------------------------------------------
- From plugins.web.firecrawl.provider:
Firecrawl, _FirecrawlProxy, _FIRECRAWL_CLS_CACHE, _load_firecrawl_cls,
_get_direct_firecrawl_config, _get_firecrawl_gateway_url,
_is_tool_gateway_ready, _has_direct_firecrawl_config,
_firecrawl_backend_help_suffix, _raise_web_backend_configuration_error,
_get_firecrawl_client, _to_plain_object, _normalize_result_list,
_extract_web_search_results, _extract_scrape_payload,
check_firecrawl_api_key
- From plugins.web.tavily.provider:
_tavily_request, _normalize_tavily_search_results,
_normalize_tavily_documents
- From plugins.web.parallel.provider:
_get_parallel_client, _get_async_parallel_client
- From plugins.web.exa.provider:
_get_exa_client
Plus retained module-level imports for backward-compat with tests:
- httpx (tests patch tools.web_tools.httpx for tavily request mocking)
- build_vendor_gateway_url, _read_nous_access_token,
resolve_managed_tool_gateway, managed_nous_tools_enabled,
prefers_gateway (tests patch tools.web_tools.<name>)
Plugin indirection pattern (key technique)
------------------------------------------
For functions inside the firecrawl/parallel/exa plugins to honor
unit-test patches that target ``tools.web_tools.<name>``, the plugin
implementations now do ``import tools.web_tools as _wt`` at call time
and read helper names through that module (``_wt._read_nous_access_token``,
``_wt.Firecrawl``, ``_wt.prefers_gateway``, etc.). This makes the
existing test patches transparently reach the plugin code without any
test changes.
The cached client globals (_firecrawl_client, _firecrawl_client_config,
_parallel_client, _async_parallel_client, _exa_client) also now live on
tools.web_tools so existing test setup_method handlers that reset
``tools.web_tools._<vendor>_client = None`` between cases keep working.
The plugins read/write the cache via getattr/setattr on the web_tools
module.
Verified
--------
- 173/173 targeted web tests pass:
test_web_providers.py, test_web_providers_brave_free.py,
test_web_providers_ddgs.py, test_web_providers_searxng.py,
test_web_tools_config.py, test_web_tools_tavily.py,
test_website_policy.py, test_config_null_guard.py
- Compile-clean (py_compile.compile passes)
- All inline implementations now exist in exactly one place
(plugins.web.<vendor>.provider)
Follow-up clean-up
------------------
- Drop _WEB_PLUGIN_SKIPLIST + hardcoded TOOL_CATEGORIES["web"] rows
(next commit)
- Delete tools/web_providers/ directory entirely
- Add tests/plugins/web/ coverage
- Full tests/tools/ + tests/gateway/ regression sweep before promoting PR
Two regressions discovered by running the full tests/tools/ suite after
the dispatcher cutover, both fixed in this commit:
1. web_crawl_tool incorrectly errored "search-only" for firecrawl
---------------------------------------------------------------------
The cutover treated any provider with supports_crawl()==False as a
search-only backend and returned the typed search-only error. But
firecrawl can crawl via the legacy multi-page-extract path inside
web_crawl_tool — it just doesn't expose supports_crawl on the plugin
(adding native firecrawl crawl is a clean follow-up).
Fix: only emit the search-only error when the provider supports
NEITHER crawl NOR extract (brave-free / ddgs / searxng). When the
provider supports extract but not crawl (firecrawl), fall through to
the legacy firecrawl-via-extract path below.
2. firecrawl plugin's check_website_access wasn't patchable
---------------------------------------------------------------------
The plugin imported `from tools.website_policy import check_website_access`
INSIDE the extract() function body, so monkeypatching the name on
plugins.web.firecrawl.provider had no effect — the inner import re-bound
the name on every call.
Fix: hoist the import to module level. Cheap (website_policy itself
has no heavy deps) and makes the standard
monkeypatch.setattr(firecrawl_provider, "check_website_access", ...)
pattern work.
Test updates (tests/tools/test_website_policy.py — 4 tests):
- test_web_extract_short_circuits_blocked_url
- test_web_extract_blocks_redirected_final_url
Both: patch the gate at plugins.web.firecrawl.provider (where it
runs after migration) and force the firecrawl plugin to be the
active extract provider via FIRECRAWL_API_KEY.
- test_web_crawl_short_circuits_blocked_url
- test_web_crawl_blocks_redirected_final_url
Both: unchanged — the dispatcher-level gate at tools.web_tools.py
line 1651 still uses the imported `check_website_access` name and
the firecrawl-fallthrough path is exercised as before.
Verified: 22/22 tests/tools/test_website_policy.py pass.
Migrates Firecrawl from inline code in tools/web_tools.py to a bundled
plugin at plugins/web/firecrawl/. By line count this is the largest of
the seven provider migrations: the firecrawl path captured most of the
file's vendor-specific complexity.
What moved into the plugin (all previously in tools/web_tools.py):
Lazy Firecrawl SDK proxy
- _load_firecrawl_cls() — caches the imported SDK class
- _FirecrawlProxy + Firecrawl singleton — defers ~200ms of SDK
imports until first construction or isinstance check.
Client construction (dual auth)
- _get_direct_firecrawl_config() — direct FIRECRAWL_API_KEY/URL path
- _get_firecrawl_gateway_url() — managed Nous tool-gateway URL
- _is_tool_gateway_ready() — gateway URL + Nous token check
- _has_direct_firecrawl_config() — direct config present?
- _get_firecrawl_client() — combined client construction
honoring web.use_gateway
- check_firecrawl_api_key() — top-level "is firecrawl usable"
- _firecrawl_backend_help_suffix() — managed-gateway help string
- _raise_web_backend_configuration_error() — typed misconfig error
Response shape normalization (vendor-specific)
- _to_plain_object(), _normalize_result_list() — SDK→dict helpers
- _extract_web_search_results() — handles SDK/direct/gateway shapes
- _extract_scrape_payload() — nested-data unwrap for scrape
Per-URL extract loop
- 60s asyncio.wait_for timeout per URL
- Pre-scrape website-policy gate
- Post-scrape redirect-aware SSRF re-check
- Format-aware content selection (markdown / html / auto)
- Per-URL errors returned as {"error": str} entries, no raises
Extract is declared `async def` — each URL is scraped in
asyncio.to_thread(...). This is the second async-extract plugin after
parallel.
The plugin re-exports `Firecrawl` (the lazy proxy) and
`check_firecrawl_api_key()` so existing tests doing
`patch("tools.web_tools.Firecrawl")` or
`monkeypatch.setattr(web_tools, "check_firecrawl_api_key", ...)` keep
working — tools/web_tools.py re-exports both names in the next
dispatcher-cutover commit.
Note: web_crawl_tool still has its own Firecrawl crawl path inline
(separate from extract); the Firecrawl SDK supports /crawl but we don't
expose supports_crawl=True on this plugin yet. Tavily handles crawl
today. Adding Firecrawl crawl is a clean follow-up.
Adds "firecrawl" to _WEB_PLUGIN_SKIPLIST.
E2E verified:
- All 7 providers register: brave-free, ddgs, exa, firecrawl,
parallel, searxng, tavily
- inspect.iscoroutinefunction(firecrawl.extract) -> True
- Firecrawl proxy is a callable lazy proxy at module level
- check_firecrawl_api_key reflects FIRECRAWL_API_KEY presence