hermes-agent

mirror of https://github.com/NousResearch/hermes-agent.git synced 2026-05-21 03:39:54 +00:00

Author	SHA1	Message	Date
Dusk1e	8db544b4d0	fix(clipboard): reject non-png clipboard images when png normalization fails	2026-05-13 22:54:21 -07:00
kshitijk4poor	4ca5e72444	fix(web): preserve top-level error envelope on unconfigured systems Surfaced by local E2E behavior-parity testing of PR vs origin/main: the plugin-migrated dispatchers were quietly changing the error envelope shape returned to function-calling models on unconfigured systems. Two findings, both from per-result error wrapping bleeding into the pre-flight configuration error path: 1. search: ``firecrawl.search()`` caught the ``ValueError("Web tools are not configured...")`` from ``_get_firecrawl_client()`` and returned it as ``{"success": False, "error": ...}``, losing the legacy ``{"error": "Error searching web: ..."}`` envelope that ``tool_error()`` emits on main. Models that special-case the ``error`` key still detect the failure, but the prefix is part of the legacy contract some users rely on. 2. crawl: ``firecrawl.crawl()`` caught the same pre-flight ``ValueError`` and wrapped it as a per-page error inside ``results[0]``. Main short-circuits on ``check_firecrawl_api_key()`` BEFORE dispatching, so its unconfigured response is ``{"success": False, "error": "web_crawl requires Firecrawl..."}`` at the top level. The PR's per-page burying hid the failure inside ``results[]`` where models that check ``result.get("error")`` would miss it. Fix: - ``plugins/web/firecrawl/provider.py``: pull ``_get_firecrawl_client()`` outside the broad ``try`` in ``search()``. Pre-flight ``ValueError`` / ``ImportError`` propagate to the dispatcher's top-level exception handler. In-flight SDK errors still get wrapped as ``{"success": False, ...}``. - ``tools/web_tools.py``: mirror main's upstream availability gate in ``web_crawl_tool``. When the resolved crawl provider is ``is_available()==False``, short-circuit BEFORE dispatching with the same top-level error shape main emits. - ``tests/tools/test_web_providers.py``: 2 regression tests (``TestUnconfiguredErrorEnvelopeParity``) lock in the behavior so future plugin work can't undo this. Verified via local subprocess-based parity test (14/14 scenarios match origin/main shape exactly) and full 210/210 web test suite green.	2026-05-13 22:31:28 -07:00
kshitijk4poor	21e3a863bb	feat(web): firecrawl plugin natively supports crawl; delete legacy inline path The web-provider migration originally left firecrawl crawl as the only provider-specific code remaining inline in tools/web_tools.py (~250 lines of Firecrawl-specific crawl orchestration that didn't fit the plugin's existing surface). This commit closes that gap. What this adds -------------- 1. plugins/web/firecrawl/provider.py: implement async ``crawl(url, **kwargs)`` - Accepts the same kwargs as the dispatcher passes to any crawl provider (``instructions``, ``depth``, ``limit``); Firecrawl's /crawl endpoint ignores ``instructions`` and ``depth`` so we log and drop with a clear info message. - Wraps the sync SDK ``crawl()`` call in asyncio.to_thread so the gateway event loop isn't blocked on a multi-page crawl. - Preserves the response-shape normalization across pydantic / typed-object / dict variants that the legacy inline code did. - Preserves per-page website-policy re-check (catches blocked redirects after the SDK returns). - Returns the same {"results": [...]} shape so the dispatcher's shared LLM-summarization post-processing path works unchanged. - Sets supports_crawl() to True so the dispatcher routes through the plugin instead of the legacy fallthrough. 2. tools/web_tools.py: delete the entire legacy firecrawl crawl block that used to run after "No registered provider supports crawl" — ~270 lines including: - check_firecrawl_api_key gate + typed error - inline SSRF + website-policy seed-URL gate (dispatcher already does this) - Firecrawl client setup with crawl_params - 100+ lines of pydantic/dict/typed-object normalization - Per-page LLM-processing loop (kept in the dispatcher's shared post-processing path; that's where it always belonged) - trimming + base64 image cleanup (still done in the dispatcher's shared path) Replaced with a single typed-error branch when no crawl-capable provider is available: "web_crawl has no available backend. Set FIRECRAWL_API_KEY (or FIRECRAWL_API_URL for self-hosted), or set TAVILY_API_KEY for Tavily." Test updates ------------ - tests/tools/test_website_policy.py: - test_web_crawl_short_circuits_blocked_url: dispatcher seed-URL gate still runs on web_tools.check_website_access (no change to that patch), but the firecrawl client lockdown moved to the plugin module — patch firecrawl_provider._get_firecrawl_client instead of web_tools._get_firecrawl_client. The dispatcher short-circuits before the plugin runs, so the test still passes. - test_web_crawl_blocks_redirected_final_url: patch the per-page policy gate at plugins.web.firecrawl.provider.check_website_access (where it now runs) AND on web_tools (where the seed-URL gate still runs). Patch firecrawl_provider._get_firecrawl_client for the FakeCrawlClient injection. Both checks flow through the same fake_check function. - tests/plugins/web/test_web_search_provider_plugins.py: - Update parametrized capability-flag spec: firecrawl supports_crawl is now True. - Add test_firecrawl_crawl_returns_error_dict_when_unconfigured — verifies inspect.iscoroutinefunction(p.crawl) is True and that the async crawl returns a per-page error dict (not a raise) when FIRECRAWL_API_KEY is missing. Verified -------- - 218/218 web tests pass (was 173, +44 plugin tests + 1 new firecrawl crawl test from this commit = 218 with the test deduplication). - Compile-clean (py_compile passes on both files). - Provider capabilities matrix confirmed end-to-end: name search extract crawl async-extract? async-crawl? firecrawl True True True True True tavily True True True False False Both crawl-capable providers exercise the dispatcher's inspect.iscoroutinefunction async-or-sync detection. Net diff -------- - tools/web_tools.py: -254 lines (legacy inline crawl gone) - plugins/web/firecrawl/provider.py: +185 lines (crawl method) - test_website_policy.py: +14/-9 lines (patch locations) - test_web_search_provider_plugins.py: +22/-1 lines (capability flag + new firecrawl crawl test) - Total: -32 net LoC; tools/web_tools.py is now 1509 lines (was 1763 before this commit, 2227 before the migration started).	2026-05-13 22:31:28 -07:00
kshitijk4poor	39b4ebfcea	refactor(web): delete legacy tools/web_providers/ directory + migrate ABC tests Removes the legacy in-tree provider scaffolding that PR #25182 fully replaced with the plugin architecture: tools/web_providers/__init__.py (6 lines) tools/web_providers/base.py (89 lines — old ABCs) tools/web_providers/ARCHITECTURE.md (73 lines — old design doc) These were the staging-ground ABCs and provider modules that the plugin migration absorbed. All seven web providers now implement the single :class:`agent.web_search_provider.WebSearchProvider` ABC and live under ``plugins/web/<vendor>/``. Nothing else in the tree imports ``tools.web_providers`` — verified via grep before deletion. Test migration (tests/tools/test_web_providers.py) -------------------------------------------------- Rewrote ``TestWebProviderABCs`` to test the new unified ABC at :mod:`agent.web_search_provider`: - test_cannot_instantiate_abc_directly — abstract ``name`` + ``is_available`` - test_concrete_search_only_provider_works — exercise default ``supports_extract=False`` / ``supports_crawl=False`` flags - test_concrete_multi_capability_provider_works — exercise all three capabilities, async extract supported (declared sync here for simplicity; real plugins like parallel + firecrawl use async) - test_search_only_provider_skips_extract_and_crawl — verify ``supports_*()`` flags default to False so search-only providers don't have to implement extract() or crawl() The 9 other tests in the file (per-capability backend selection, DEFAULT_CONFIG merge, dispatcher routing) test public helpers in ``tools.web_tools`` that still exist and pass unchanged. agent/web_search_provider.py docstring updated to reflect that the legacy ABCs no longer exist; the response-shape contract is preserved bit-for-bit so external consumers see no behavioral change. Net diff -------- - tools/web_providers/ removed (-168 lines) - tests/tools/test_web_providers.py rewritten ABC section (+78/-30 net, same coverage, new API) - agent/web_search_provider.py docstring (-3/+5 lines) Verified -------- - 173/173 targeted web tests pass - 12/12 ABC contract tests pass with the new interface - No remaining grep hits for ``tools.web_providers`` outside of intentional historical references in plugin docstrings.	2026-05-13 22:31:28 -07:00
kshitijk4poor	5e54330e27	fix(web): preserve firecrawl crawl + website-policy gate after migration Two regressions discovered by running the full tests/tools/ suite after the dispatcher cutover, both fixed in this commit: 1. web_crawl_tool incorrectly errored "search-only" for firecrawl --------------------------------------------------------------------- The cutover treated any provider with supports_crawl()==False as a search-only backend and returned the typed search-only error. But firecrawl can crawl via the legacy multi-page-extract path inside web_crawl_tool — it just doesn't expose supports_crawl on the plugin (adding native firecrawl crawl is a clean follow-up). Fix: only emit the search-only error when the provider supports NEITHER crawl NOR extract (brave-free / ddgs / searxng). When the provider supports extract but not crawl (firecrawl), fall through to the legacy firecrawl-via-extract path below. 2. firecrawl plugin's check_website_access wasn't patchable --------------------------------------------------------------------- The plugin imported `from tools.website_policy import check_website_access` INSIDE the extract() function body, so monkeypatching the name on plugins.web.firecrawl.provider had no effect — the inner import re-bound the name on every call. Fix: hoist the import to module level. Cheap (website_policy itself has no heavy deps) and makes the standard monkeypatch.setattr(firecrawl_provider, "check_website_access", ...) pattern work. Test updates (tests/tools/test_website_policy.py — 4 tests): - test_web_extract_short_circuits_blocked_url - test_web_extract_blocks_redirected_final_url Both: patch the gate at plugins.web.firecrawl.provider (where it runs after migration) and force the firecrawl plugin to be the active extract provider via FIRECRAWL_API_KEY. - test_web_crawl_short_circuits_blocked_url - test_web_crawl_blocks_redirected_final_url Both: unchanged — the dispatcher-level gate at tools.web_tools.py line 1651 still uses the imported `check_website_access` name and the firecrawl-fallthrough path is exercised as before. Verified: 22/22 tests/tools/test_website_policy.py pass.	2026-05-13 22:31:28 -07:00
kshitijk4poor	b05253ceed	refactor(web): dispatch all three tools through web_search_registry Cuts over web_search_tool, web_extract_tool, and web_crawl_tool in tools/web_tools.py to dispatch through agent.web_search_registry instead of the legacy hardcoded if-elif backend chains. Per-tool changes: web_search_tool (sync) Replace 5 backend branches (parallel, exa, registry-3-providers, tavily, firecrawl-fallthrough) with a single registry path: 1. _get_search_backend() resolves the configured name 2. _wsp_get_provider(name) for explicit-config-wins semantics 3. get_active_search_provider() fallback for typo / unknown name 4. provider.search(query, limit) — sync for all 7 providers web_extract_tool (async) Replace 4 backend branches (parallel-async, exa-sync, tavily-sync, search-only-error, firecrawl-perurl-loop) with: 1. Same provider resolution as search. 2. When configured backend IS registered but doesn't support extract (search-only providers like brave-free), surface a typed "search-only" error matching the legacy text — tests assert that wording. 3. inspect.iscoroutinefunction(provider.extract) detects sync vs async: parallel + firecrawl are async; exa + tavily are sync. Sync extracts run in asyncio.to_thread() so we don't block. web_crawl_tool (async) Replace tavily-specific branch + search-only-error block with: 1. _wsp_get_provider(backend) — explicit config first 2. Search-only typed error when the configured name doesn't support crawl (matches legacy phrasing) 3. get_active_crawl_provider() fallback otherwise 4. provider.crawl(url, *kwargs) — async-or-sync dispatch as above 5. Response post-processing (LLM summarization, trimming) stays unchanged — it's not provider-specific. When no plugin advertises supports_crawl, falls through to the existing Firecrawl-via-web-summarize path below (unchanged). Test updates (2 tests in tests/tools/test_web_tools_config.py): - test_web_search_clamps_limit_before_backend_call: patch("tools.web_tools._parallel_search") -> patch the registry provider returned by agent.web_search_registry.get_provider - test_search_error_response_does_not_expose_diagnostics: patch("tools.web_tools._get_firecrawl_client") -> same pattern Tests unchanged (still pass): - All TestXBackendWiring classes (test _get_backend / _is_backend_available config-resolution, independent of dispatch) - All TestXSearchOnlyErrors classes (test the search-only error path via web_extract_tool / web_crawl_tool — error text preserved) - 141 passing web tests total, 0 regressions. Dead-code cleanup deferred to a follow-up commit so this diff stays focused on the cutover. After this commit: - tools.web_tools._exa_search / _exa_extract / _parallel_search / _parallel_extract / _tavily_request / _normalize_tavily_ / _get_firecrawl_client / _extract_web_search_results / _extract_scrape_payload / _to_plain_object / _normalize_result_list are no longer called by the dispatchers, but still exist. - The config-resolution layer (_get_backend, _is_backend_available, _is_tool_gateway_ready, _has_direct_firecrawl_config) IS still in use and must stay. - The Firecrawl proxy and check_firecrawl_api_key are still imported by integration tests and patched by unit tests — must stay (or be re-exported from the plugin).	2026-05-13 22:31:28 -07:00
kshitijk4poor	6b219f5af6	refactor(web): remove legacy in-tree provider modules Deletes tools/web_providers/{brave_free,ddgs,searxng}.py — the three providers that moved to plugins/web/ in prior commits. tools/web_tools.py no longer imports them (registry dispatch as of `d8735963f`), so removing them is purely a cleanup pass. Also migrates the existing tests to the new import paths: tests/tools/test_web_providers_brave_free.py tests/tools/test_web_providers_ddgs.py tests/tools/test_web_providers_searxng.py Mechanical rewrites: - `from tools.web_providers.X import YSearchProvider` -> `from plugins.web.X.provider import YWebSearchProvider` - `.is_configured()` -> `.is_available()` (legacy method -> new method) - `.provider_name()` -> `.name` (legacy method -> new property) - `from tools.web_providers.base import WebSearchProvider` -> `from agent.web_search_provider import WebSearchProvider` (the subclass-check asserts membership in the new plugin-facing ABC) - `sys.modules.delitem("tools.web_providers.ddgs")` updated to point at `plugins.web.ddgs.provider` (cache-busting for lazy ddgs imports) The TestXBackendWiring / TestXSearchOnlyErrors classes (covering _is_backend_available, _get_backend, check_web_api_key, and the "search-only" error paths in web_extract/web_crawl) are untouched — those still test web_tools.py's backend-selection logic, which continues to recognize the names "brave-free" / "ddgs" / "searxng" even after the modules behind them moved to plugins. tools/web_providers/base.py is intentionally NOT deleted by this commit — it's the parent ABC of the legacy modules and shares its name with agent/web_search_provider.py::WebSearchProvider. Removing it surfaces the naming collision (see PR description Finding 0); the real migration PR deletes it in the same commit that drops the _WEB_PLUGIN_SKIPLIST guards in hermes_cli/tools_config.py. Test results: bash scripts/run_tests.sh tests/tools/test_web_providers_.py -> 65 passed in 3.41s (all rewritten unit tests + unchanged integration tests) bash scripts/run_tests.sh tests/tools/test_web_.py -> 141 passed in 4.70s (full web test set, post-deletion)	2026-05-13 22:31:28 -07:00
helix4u	52521c937a	fix(install): skip browser download when system chromium exists	2026-05-13 22:07:02 -07:00
Teknium	7f08cb5941	fix(tts): align MiniMax TTS defaults with current API and add GroupId support Follow-up on @pty819's t2a_v2 endpoint fix: - Default model: speech-02 -> speech-02-hd (bare 'speech-02' is not in the supported enum; t2a_v2 rejects it with 400). Official enum: speech-01-hd, speech-01-turbo, speech-02-hd, speech-02-turbo, speech-2.6-hd/turbo, speech-2.8-hd/turbo. - Default voice: female-shaonv -> English_expressive_narrator. The legacy speech-01-series short ID doesn't resolve cleanly on the speech-02+ models that are now the default. - Default base URL: api.minimaxi.com -> api.minimax.io (matches the canonical host in the published docs; api-uw.minimax.io is the reduced-latency alt). - Add GroupId support via tts.minimax.group_id config or MINIMAX_GROUP_ID env var. Some MiniMax accounts scope TTS requests by group; without it, requests 401. Only appended when not already in the user's base_url. Tests rewritten to cover both the default t2a_v2 path (hex-encoded audio in JSON, nested voice_setting/audio_setting) and the legacy text_to_speech path (raw audio bytes, flat payload). Adds coverage for GroupId config/env wiring and error surfacing. Also adds AUTHOR_MAP entry for pty819's GitHub-noreply email.	2026-05-13 22:04:28 -07:00
Teknium	9d42c2c286	feat(video_gen): unified video_generate tool with pluggable provider backends (#25126 ) * feat(video_gen): unified video_generate tool with pluggable provider backends One core video_generate tool, every backend a plugin. Mirrors the image_gen + memory_provider + context_engine architecture: ABC, registry, plugin-context registration hook, and per-plugin model catalogs surfaced through hermes tools. Surface (one schema, every backend): - operation: generate / edit / extend - modalities: text-to-video (prompt only), image-to-video (prompt + image_url), video edit (prompt + video_url), video extend (video_url) - reference_image_urls, duration, aspect_ratio, resolution, negative_prompt, audio, seed, model override - Providers ignore unknown kwargs and declare what they support via VideoGenProvider.capabilities() — backend-specific quirks stay in the backend, the agent learns one tool Backends shipped: - plugins/video_gen/xai/ — Grok-Imagine, full generate/edit/extend + image-to-video + reference images (salvaged from PR #10600 by @Jaaneek, reshaped into the plugin interface) - plugins/video_gen/fal/ — Veo 3.1 (t2v + i2v), Kling O3 i2v, Pixverse v6 i2v with model-aware payload building that drops keys a model doesn't declare Wiring: - agent/video_gen_provider.py — VideoGenProvider ABC, normalize_operation, success_response / error_response, save_b64_video / save_bytes_video, $HERMES_HOME/cache/videos/ - agent/video_gen_registry.py — thread-safe register/get/list + get_active_provider() reading video_gen.provider from config.yaml - hermes_cli/plugins.py — PluginContext.register_video_gen_provider() - hermes_cli/tools_config.py — Video Generation category in hermes tools, plugin-only providers list, model picker per plugin, config write to video_gen.{provider,model} - toolsets.py — new video_gen toolset - tests: 31 new tests covering ABC, registry, tool dispatch, both plugins - docs: developer-guide/video-gen-provider-plugin.md (parallel to the image-gen guide), sidebar + toolsets-reference + plugin guides updated Supersedes: #25035 (FAL), #17972 (FAL), #14543 (xAI), #13847 (HappyHorse), #10458 (provider categories), #10786 (xAI media+search bundle), #2984 (FAL duplicate), #19086 (Google Veo standalone — easy port to plugin interface). Co-authored-by: Jaaneek <Jaaneek@users.noreply.github.com> * feat(video_gen): dynamic schema reflects active backend's capabilities Address the 'capability variance' question — instead of one tool with a static schema that lies about what every backend supports, the video_generate tool now rebuilds its description at get_definitions() time based on the configured video_gen.provider and video_gen.model. The agent sees backend-specific guidance up-front: - 'fal-ai/veo3.1/image-to-video': 'image-to-video only — image_url is REQUIRED; text-only prompts will be rejected' - 'fal-ai/veo3.1' (t2v): no image_url restriction shown - xAI grok-imagine-video: 'operations: generate, edit, extend; up to 7 reference_image_urls' - Backends without edit/extend: 'not supported on this backend — surface that they need to switch backends via hermes tools' This is the same pattern PR #22694 used for delegate_task self-capping — documented in the dynamic-tool-schemas skill. Cache invalidation is free: get_tool_definitions() already memoizes on config.yaml mtime, so a mid-session backend swap rebuilds the schema automatically. Tested: - Empirical FAL OpenAPI schema check confirms image-to-video models require image_url (FAL returns HTTP 422 otherwise) — client-side rejection in FALVideoGenProvider.generate() now prevents the wasted round-trip - Live E2E: fal-ai/veo3.1/image-to-video + prompt-only → clean missing_image_url error; fal-ai/veo3.1 + prompt-only → dispatches - 6 new tests cover the builder (no config / image-only / full-surface / text-only / unknown provider / registry wiring), all passing - 37/37 in the slice, 134/134 in the broader regression set * test(video_gen/xai): full surface integration tests + cleaner schema Verified end-to-end that the xAI plugin handles every documented mode from PR #10600's surface: text-to-video, image-to-video, reference-images-to-video, video edit, video extend (with and without prompt). All five modes route to the correct xAI endpoint (/videos/generations, /videos/edits, /videos/extensions) with the right payload shape (image / reference_images / video keys), and all five client-side rejections fire before the network: edit-without-prompt, extend-without-video_url, image+refs conflict, >7 references, and duration/aspect_ratio clamping. 15 new integration tests grouped into four classes (endpoint routing, modalities, validation, clamping). httpx is stubbed via a small fake AsyncClient that records POSTs so the tests assert the actual payload the plugin would send to xAI — not just the success/error envelope. Also cleaned up a description redundancy: when a model's operations match the backend's overall set, we no longer print the duplicate 'operations supported by this model' line. xAI's description now reads: Active backend: xAI . model: grok-imagine-video - operations supported by this backend: edit, extend, generate - modalities supported by this backend: image, reference_images, text - aspect_ratio choices: 16:9, 1:1, 2:3, 3:2, 3:4, 4:3, 9:16 - resolution choices: 480p, 720p - duration range: 1-15s - reference_image_urls: up to 7 images Co-authored-by: Jaaneek <Jaaneek@users.noreply.github.com> * feat(video_gen): collapse surface to t2v + i2v, family-based auto-routing Two design changes per Teknium: 1) Drop edit/extend from the tool surface entirely. Only text-to-video and image-to-video remain. The agent sees a clean tool with two modalities; backend-specific quirks like xAI's edit/extend endpoints stay out of the unified schema. 2) FAL: pick a model FAMILY once, the plugin routes between the family's text-to-video and image-to-video endpoints based on whether image_url was passed. Users no longer pick 'fal-ai/veo3.1' AND 'fal-ai/veo3.1/image-to-video' as separate options — they pick 'veo3.1', and the plugin handles the rest. Catalog rewritten as families: veo3.1 fal-ai/veo3.1 / fal-ai/veo3.1/image-to-video pixverse-v6 fal-ai/pixverse/v6/text-to-video / fal-ai/pixverse/v6/image-to-video kling-o3-standard fal-ai/kling-video/o3/standard/text-to-video / fal-ai/kling-video/o3/standard/image-to-video xAI uses a single endpoint (/videos/generations) for both modes, routed by the presence of the 'image' field in the payload — no edit/extend exposure. Schema changes: - VIDEO_GENERATE_SCHEMA: drop operation, drop video_url. Final params: prompt (required), image_url, reference_image_urls, duration, aspect_ratio, resolution, negative_prompt, audio, seed, model. - VideoGenProvider ABC: drop normalize_operation, VALID_OPERATIONS, DEFAULT_OPERATION. capabilities() drops 'operations' key. - success_response: add 'modality' field ('text' \| 'image') so the agent and logs can see which endpoint was actually hit. Dynamic schema builder simplified — no operations bullet, no 'switch backends if you need edit/extend' guidance. When the active backend supports both modalities (the common case), description reads: Active backend: FAL . model: pixverse-v6 - supports both text-to-video (omit image_url) and image-to-video (pass image_url) - routes automatically - aspect_ratio choices: 16:9, 9:16, 1:1 - resolution choices: 360p, 540p, 720p, 1080p - duration range: 1-15s - audio: pass audio=true to enable native audio (pricing tier) - negative_prompt: supported Tests: 51 in the video_gen slice, 216 across the broader image+video sweep, all passing. New FAL routing tests prove pixverse-v6 + no image hits text-to-video endpoint, pixverse-v6 + image_url hits image-to-video endpoint, same for veo3.1 and kling-o3-standard. Docs updated: developer-guide page rewrites the 'model families' pattern as a first-class section so external plugin authors know the convention. toolsets-reference and toolsets.py descriptions match the new surface. Co-authored-by: Jaaneek <Jaaneek@users.noreply.github.com> * feat(video_gen/fal): expand catalog to 6 families, cheap + premium tiers Catalog now covers everything Teknium specced from FAL: Cheap tier: ltx-2.3 fal-ai/ltx-2.3-22b/text-to-video / image-to-video pixverse-v6 fal-ai/pixverse/v6/text-to-video / image-to-video Premium tier: veo3.1 fal-ai/veo3.1 / fal-ai/veo3.1/image-to-video seedance-2.0 bytedance/seedance-2.0/text-to-video / image-to-video kling-v3-4k fal-ai/kling-video/v3/4k/text-to-video / image-to-video happy-horse fal-ai/happy-horse/text-to-video / image-to-video DEFAULT_MODEL moved from veo3.1 (premium) to pixverse-v6 (cheap, sane defaults, both modalities) — better first-run UX for users who haven't explicitly picked a model. New family-entry knob: image_param_key. Kling v3 4K's image-to-video endpoint expects start_image_url instead of image_url; declaring image_param_key='start_image_url' on the family lets _build_payload remap correctly. Other families default to plain image_url. Per-family capability flags reflect each model's docs: - LTX 2.3 + Happy Horse: minimal payloads (no duration/aspect/resolution enum exposed by FAL — let endpoint apply defaults) - Seedance: 6 aspect ratios incl 21:9, durations 4-15, audio supported, negative prompts NOT supported per docs - Kling v3 4K: 16:9/9:16/1:1, 3-15s, audio + negative - Veo 3.1: unchanged, 16:9/9:16, 4/6/8s Tests: +5 covering the new families (full catalog, Kling 4K start_image_url remap, Seedance routing, LTX payload minimality, Happy Horse minimality). 56/56 in the slice green. Note: I did NOT add the FAL-hosted xAI Grok-Imagine variant. Hermes already has a direct xAI plugin that talks to xAI's own API; routing the same model through FAL's wrapper would duplicate the surface without adding capabilities. Users on FAL who want Grok-Imagine should use the xAI plugin directly; flag if you want both routes available. * test(video_gen): tool-surface routing matrix — every model x modality End-to-end matrix test driven through _handle_video_generate() — the actual function the agent's video_generate tool call lands in. Writes config.yaml, invokes the registered handler with a raw args dict, then asserts the outbound HTTP/SDK call hit the right endpoint with the right payload shape. Parametrized over FAL_FAMILIES.keys() so the matrix auto-discovers new families as they're added (add a family to FAL_FAMILIES and you get both modalities tested for free). Coverage: - All 6 FAL families x {text-only, text+image} = 12 cases - xAI x {text-only, text+image} = 2 cases - tool-level model= arg overrides config = 2 cases For each case, verifies: - result['success'] is True - result['modality'] matches input shape ('text' if no image_url, 'image' otherwise) - outbound endpoint URL matches the family's text_endpoint or image_endpoint - text-only payloads carry no image-shaped keys - text+image payloads carry the family's image key (image_url for most, start_image_url for kling-v3-4k, wrapped 'image' object for xAI) All 16 cases passing. Confirms the tool surface routes every (provider, model, modality) combination correctly with zero leakage. * feat(video_gen): keep video_gen out of first-run setup, surface in status Two changes: 1. video_gen joins _DEFAULT_OFF_TOOLSETS, so it is NOT pre-selected in the first-run toolset checklist. Video gen is niche, paid, and slow — most users don't want it nagging them during initial setup. Anyone who wants it opts in via 'hermes tools' -> Video Generation, which already routes to the provider+model picker. 2. The 'hermes setup' status panel learns about video_gen — but only shows the row when a plugin reports available. Users without FAL_KEY/XAI_API_KEY see nothing about video gen; users with one of those keys see 'Video Generation (FAL) ✓' as confirmation it's wired. Verified live: - Fresh install (no creds): zero video_gen mentions in wizard. - With FAL_KEY: status row appears with active backend name. - 160/160 in the setup + tools_config + video_gen test slice. Rationale: image_gen is on by default because it's a featured creative tool used in casual chat (telegrams, etc). Video gen is heavier — long wait, paid per-second pricing. Default-off matches user intent better. --------- Co-authored-by: Jaaneek <Jaaneek@users.noreply.github.com>	2026-05-13 16:39:41 -07:00
teknium1	59da8ec4ec	fix(tools): refuse skill_view name collisions instead of guessing skill_view ran the direct-path strategy across every skill dir before the recursive strategy, so a top-level skill in an external dir could silently shadow a same-named nested local skill. /skills correctly listed the local version (deduped local-first by _find_all_skills) but skill_view loaded the external one — confusing, and a real bug class for users with skills.external_dirs registered alongside categorized local skills. Pick a louder fix than @polkn's PR #6136 proposed: collect every match across all dirs (direct path, recursive by parent dir name, legacy flat <name>.md), and if there's more than one, refuse with an error that surfaces every matching path plus a hint to load by the categorized form. Local-first precedence would have replaced silent external-shadowing with silent same-name collisions between two externals, or made an externally-shadowed-by-local skill unreachable by bare name with no signal. Refusing forces the user to disambiguate once and never wonder which skill ran. Recovery: pass the full categorized path ("foundations/runtime/explore-codebase" instead of "explore-codebase"), or rename one of the colliding skills. Co-authored-by: pol <pol.kuijken@gmail.com>	2026-05-13 13:29:28 -07:00
iuyup	d6c9711ba8	fix(security): reduce unnecessary shell=True in subprocess calls - memory_setup.py: use shlex.split() for plugin dep checks instead of shell=True - transcription_tools.py: avoid shell=True for auto-detected whisper commands (user-provided templates via env var still use shell=True for compatibility) - cli.py: add comment clarifying intentional shell=True for user quick_commands - Add test verifying auto-detected template is shlex-safe Addresses CONTRIBUTING.md Priority #3 (Security hardening — shell injection).	2026-05-13 10:31:22 -07:00
Teknium	29d7c244c5	feat(gateway): wire clarify tool with inline keyboard buttons on Telegram (#24199 ) The clarify tool returned 'not available in this execution context' for every gateway-mode agent because gateway/run.py never passed clarify_callback into the AIAgent constructor. Schema actively encouraged calling it; users never saw the question. Changes: - tools/clarify_gateway.py — new event-based primitive mirroring tools/approval.py: register/wait_for_response/resolve_gateway_clarify with per-session FIFO, threading.Event blocking with 1s heartbeat slices (so the inactivity watchdog keeps ticking), and clear_session for boundary cleanup. - gateway/platforms/base.py — abstract send_clarify with a numbered-text fallback so every adapter (Discord, Slack, WhatsApp, Signal, Matrix, etc.) gets a working clarify out of the box. Plus an active-session bypass: when the agent is blocked on a text-awaiting clarify, the next non-command message routes inline to the runner's intercept instead of being queued + triggering an interrupt. Same shape as the /approve deadlock fix from PR #4926. - gateway/platforms/telegram.py — concrete send_clarify renders one inline button per choice plus '✏️ Other (type answer)'. cl: callback handler resolves numeric choices immediately, flips to text-capture mode for Other, with the same authorization guards as exec/slash approvals. - gateway/run.py — clarify_callback wired at the cached-agent per-turn callback assignment site (only the user-facing agent path; cron and hygiene-compress agents have no human attached). Bridges sync→async via run_coroutine_threadsafe, blocks with the configured timeout, and returns a '[user did not respond within Xm]' sentinel on timeout so the agent adapts rather than pinning the running-agent guard. Text- intercept added to _handle_message before slash-confirm intercept (skipping slash commands). clear_session called in the run's finally to cancel any orphan entries. - hermes_cli/config.py — agent.clarify_timeout default 600s. - website/docs/user-guide/messaging/telegram.md — Interactive Prompts section. Tests: - tests/tools/test_clarify_gateway.py (14 tests) — full primitive coverage: button resolve, open-ended auto-await, Other flip, timeout None, unknown-id idempotency, clear_session cancellation, FIFO ordering, register/unregister notify, config default. - tests/gateway/test_telegram_clarify_buttons.py (12 tests) — render paths (multi-choice/open-ended/long-label/HTML-escape/not-connected), callback dispatch (numeric resolve/Other flip/already-resolved/ unauthorized/invalid-token), and base-adapter text fallback. Out of scope: bot-to-bot, guest mode, checklists, poll media, live photos. Closes #24191.	2026-05-12 16:33:33 -07:00
Teknium	d89553c2d6	fix(daytona): migrate legacy-sandbox lookup to cursor-based list() (#24587 ) Daytona ships breaking SDK changes on June 10, 2026 — `list()` returns an iterator and the `page=` offset parameter is removed. We pin daytona==0.155.0 so we're past the May 24 hard-cutoff, but the legacy-sandbox resume path in DaytonaEnvironment still passes `page=1` and reads `.items` off the result. Switch to `next(iter(results), None)` against a single-result `list(labels=..., limit=1)` call. Update tests to use `iter([...])` and drop the `page=1` kwarg from list() assertions.	2026-05-12 16:31:46 -07:00
Dan Benyamin	62fd905340	feat(browser): support externally managed Camofox sessions Allow integrations to share a visible Camofox identity with Hermes and recover existing tabs without carrying local patches. Co-authored-by: Cursor <cursoragent@cursor.com>	2026-05-12 15:14:49 -07:00
Austin Pickett	fc3fd6bb6b	fix(dashboard): UI polish — modals, layout, consistency, test fixes Dashboard UX polish pass — consolidates create forms into modals triggered from the page header, fixes layout inconsistencies, adds scroll-to navigation for the Keys page, and aligns the TokenBar with the design system. Changes: - App.tsx: add padding to sidebar header - resolve-page-title.ts: add missing routes, better fallback title - en.ts: fix nav labels (Profiles was 'profiles : multi agents') - ModelsPage: two-col layout, auxiliary tasks modal, TokenBar redesign - ProfilesPage: create button in header, form in modal, Checkbox component - CronPage: create button in header, form in modal - EnvPage: scroll-to sub-nav in header, fix text overflow Modal and dialog standardization: - Replace all native confirm()/window.confirm() with ConfirmDialog (OAuthProvidersCard, PluginsPage, ModelsPage, ConfigPage) - Add useModalBehavior hook (Escape-to-close, scroll lock, focus restore) - Apply hook to ProfilesPage, CronPage, AuxiliaryTasksModal Component fixes (from PR review): - Checkbox: fix controlled/uncontrolled mismatch, add focus-visible ring - TokenBar: add rounded-full to legend dots, remove dead code CI/test fixes: - Fix TS unused imports (noUnusedLocals), type-narrow PickerTarget union - Add windows-footgun suppression on platform-guarded os.killpg - Fix 19 stale unit tests + 9 e2e tests broken by recent main changes - Restore minimal example-dashboard plugin for plugin auth test	2026-05-12 13:59:22 -04:00
Teknium	c1eb2dcda7	feat(security): supply-chain advisory checker + lazy-install framework + tiered install fallback (#24220 ) * feat(security): supply-chain advisory checker + lazy-install framework + tiered install fallback Three coordinated mitigations for the Mini Shai-Hulud worm hitting mistralai 2.4.6 on PyPI (2026-05-12) and for the next single-package compromise that follows. # What this PR makes true 1. Users with the poisoned mistralai 2.4.6 in their venv get a loud detection banner with copy-pasteable remediation steps the moment they run hermes (and on every gateway startup). 2. One quarantined / yanked PyPI package can no longer silently demote a fresh install to 'core only' — the installer keeps every other extra and tells the user which tier landed. 3. Future opt-in backends (Mistral, ElevenLabs, Honcho, etc.) can lazy-install on first use under a strict allowlist, instead of eagerly pulling everything at install time. # Detection: hermes_cli/security_advisories.py - ADVISORIES catalog (one entry currently: shai-hulud-2026-05 for mistralai==2.4.6). Adding the next one is a single dataclass. - detect_compromised() uses importlib.metadata.version() — no pip dependency, works in uv venvs that lack pip. - Banner cache (~/.hermes/cache/advisory_banner_seen) rate-limits the startup banner to once per 24h per advisory. - Acks persisted to security.acked_advisories in config.yaml; never re-banner after ack. - Wired into: * hermes doctor — runs first, prints full remediation block * hermes doctor --ack <id> — dismisses an advisory * cli.py interactive run() and single-query branches — short stderr banner pointing at hermes doctor * gateway/run.py startup — operator-visible warning in gateway.log # Lazy-install framework: tools/lazy_deps.py - LAZY_DEPS allowlist maps namespaced feature keys (tts.elevenlabs, memory.honcho, provider.bedrock, etc.) to pip specs. - ensure(feature) installs missing deps in the active venv via the uv → pip → ensurepip ladder (matches tools_config._pip_install). - Strict spec safety regex rejects URLs, file paths, shell metas, pip flag injection, control chars — only PyPI-by-name accepted. - Gated on security.allow_lazy_installs (default true) plus the HERMES_DISABLE_LAZY_INSTALLS env var for restricted/audited envs. - Migrated three backends as proof of pattern: * tools/tts_tool.py — _import_elevenlabs() calls ensure first * plugins/memory/honcho/client.py — get_honcho_client lazy-installs * tts.mistral / stt.mistral entries pre-registered for when PyPI restores mistralai # Installer fallback tiers scripts/install.sh, scripts/install.ps1, setup-hermes.sh: - Centralised _BROKEN_EXTRAS list (currently: mistral). Edit one array when a transitive breaks; users keep every other extra. - New 'all minus known-broken' tier between [all] and the existing PyPI-only-extras tier. Only kicks in when [all] fails resolve. - All three tiers explicit: every fallback announces which tier landed and prints a re-run hint when not on Tier 1. - install.ps1 and install.sh both regenerate their tier specs from the same _BROKEN_EXTRAS array so updates stay in sync. Side effect: install.ps1 Tier 2 spec previously hardcoded 'mistral' in its extra list — bug fixed by the refactor (mistral is filtered out). # Config hermes_cli/config.py — DEFAULT_CONFIG.security gains: - acked_advisories: [] (advisory IDs the user has dismissed) - allow_lazy_installs: True (security gate for ensure()) No config version bump needed — both keys nest under existing security: block, and load_config's deep-merge picks up DEFAULT_CONFIG defaults for users with older configs. # Tests tests/hermes_cli/test_security_advisories.py — 23 tests covering: - detect_compromised matches/non-matches, wildcard frozenset - ack persistence, idempotence, blank rejection, config-failure path - banner cache rate limiting + 24h re-banner + ack-stops-banner - short_banner_lines / full_remediation_text / render_doctor_section / gateway_log_message - shipped catalog well-formedness invariant tests/tools/test_lazy_deps.py — 40 tests covering: - spec safety: 11 safe parametrized + 18 unsafe parametrized - allowlist: unknown-feature rejection, namespace.name shape, every shipped spec passes the safety regex - security gating: config flag, env var, default, fail-open - ensure() happy/sad paths: already-satisfied, install success, pip stderr surfaced on failure, install-succeeds-but-still-missing - is_available, feature_install_command Combined: 63 new tests, all passing under scripts/run_tests.sh. # Validation - scripts/run_tests.sh tests/hermes_cli/test_security_advisories.py tests/tools/test_lazy_deps.py → 63/63 passing - scripts/run_tests.sh tests/hermes_cli/test_doctor.py tests/hermes_cli/test_doctor_command_install.py tests/tools/test_tts_mistral.py tests/tools/test_transcription_tools.py tests/tools/test_transcription_dotenv_fallback.py → 165/165 passing - scripts/run_tests.sh tests/hermes_cli/ tests/tools/ → 9191 passed, 8 pre-existing failures (verified on origin/main before this change) - bash -n on install.sh and setup-hermes.sh → OK - py_compile on all modified .py files → OK - End-to-end smoke test of detect_compromised + render_doctor_section + gateway_log_message with mocked installed version → produces copy-pasteable remediation output # Community Full advisory + remediation steps: website/docs/community/security-advisories/shai-hulud-mistralai-2026-05.md Short-form post drafts (Discord, GitHub pinned issue, README banner): scripts/community-announcement-shai-hulud.md Refs: PR #24205 (mistral disabled), Socket Security advisory <https://socket.dev/blog/mini-shai-hulud-worm-pypi> * build(deps): pin every direct dep to ==X.Y.Z (no ranges) Companion to the supply-chain advisory work: replace every >=/</~= range in pyproject.toml's [project.dependencies] and [project.optional-dependencies] with an exact ==X.Y.Z pin sourced from uv.lock. Why: ranges allow PyPI to ship a fresh version of any direct dep at any time without a code review on our side. With ranges, the malicious mistralai 2.4.6 release would have been pulled by every fresh 'pip install -e .[all]' for the hours between upload and PyPI's quarantine — exactly the install window we got hit on. Exact pins close that window: the only way a new package version reaches a user is via an intentional update on our end. What the user-facing change is: nothing, behavior-wise. Every package resolves to the same version it was already resolving to via uv.lock — the pins just remove the resolver's freedom to pick a different one. Cost: any user installing Hermes alongside another package that requires a newer pin gets a resolver conflict. Acceptable for our isolated-venv install path; documented in the new comment block. Build-system requires line (setuptools>=61.0) is intentionally left as a range — pinning the build backend would block fresh pip from bootstrapping the build on architectures where that exact wheel isn't available. mistral extra (mistralai==2.3.0) is pinned but stays out of [all] (per PR #24205). 'uv lock' regeneration will fail until PyPI restores mistralai; lockfile regeneration is gated behind that, NOT on every PR. LAZY_DEPS in tools/lazy_deps.py also moved to exact pins so the lazy- install pathway can never resolve a different version than the one declared in pyproject.toml. Validation: - Cross-checked all 77 pinned direct deps in pyproject.toml against uv.lock — every pin matches the resolved version exactly. - Cross-checked all LAZY_DEPS specs against uv.lock — same. - 'uv pip install -e .[all] --dry-run' resolves 205 packages cleanly. - tests/tools/test_lazy_deps.py + tests/hermes_cli/test_security_advisories.py → 63/63 passing (every shipped spec passes the safety regex). - Doctor + TTS + transcription targeted suite → 146/146 passing. * build(deps): hash-verify transitives via uv.lock; remove unresolvable [mistral] extra You asked: 'what about the dependencies the dependencies rely on?' — correctly noting that exact-pinning direct deps in pyproject.toml does NOT cover the transitive graph. `pip install` and `uv pip install` both re-resolve transitives fresh from PyPI at install time, so a compromised transitive (e.g. `httpcore` if it got worm-poisoned tomorrow) would still hit our users even with every direct dep exact-pinned. # What this commit fixes 1. Both real installer scripts now prefer `uv sync --locked` as Tier 0. uv.lock records SHA256 hashes for every transitive — a compromised package with a different hash gets REJECTED. Falls through to the existing `uv pip install` cascade if the lockfile is missing or stale, with a loud warning that the fallback path does NOT hash-verify transitives. Previously only `setup-hermes.sh` (the dev path) used the lockfile; `scripts/install.sh` and `scripts/install.ps1` (the paths fresh users actually run) skipped it. 2. Removed the `[mistral]` extra entirely. The `mistralai` PyPI project is fully quarantined right now — every version returns 404, so any pin we wrote was unresolvable, which broke `uv lock --check` in CI. Restoration is documented in pyproject.toml as a 5-step checklist (verify, re-add extra, re-enable in 4 modules, regenerate lock, optionally re-add to [all]). 3. Regenerated uv.lock. 262 packages, mistralai/eval-type-backport/ jsonpath-python pruned. `uv lock --check` now passes. # Defense-in-depth view \| Layer \| Where \| Protects against \| \|----------------------------\|-------------------\|-------------------------------------------\| \| Exact pins in pyproject \| direct deps \| new mistralai 2.4.6-style direct compromise \| \| uv.lock + `--locked` install \| transitive graph \| transitive worm injection \| \| Tier-0 hash-verified path \| install.sh / .ps1 \| actually USE the lockfile in fresh installs \| \| `uv lock --check` CI gate \| every PR \| drift between pyproject and lockfile \| \| `hermes_cli/security_advisories.py` \| runtime \| cleanup for users who already got hit \| The exact pinning + hash verification together close the supply-chain gap. Without the lockfile path, exact pins alone are theater. # Validation - `uv lock --check` → passes (262 packages resolved, no drift). - `bash -n` on install.sh + setup-hermes.sh → OK. - 209/209 tests passing across new + adjacent test files (test_lazy_deps.py, test_security_advisories.py, test_doctor.py, test_tts_mistral.py, test_transcription_tools.py). - TOML parse OK. * chore: remove community announcement drafts (PR body covers it) * build(deps): lazy-install every opt-in backend (anthropic, search, terminal, platforms, dashboard) Extends the lazy-install framework to cover everything that's not used by every hermes session. Base install drops from ~60 packages to 45. Moved out of core dependencies = []: - anthropic (only when provider=anthropic native, not via aggregators) - exa-py, firecrawl-py, parallel-web (search backends; only when picked) - fal-client (image gen; only when picked) - edge-tts (default TTS but still optional) New extras in pyproject.toml: [anthropic] [exa] [firecrawl] [parallel-web] [fal] [edge-tts]. All added to [all]. New LAZY_DEPS entries: provider.anthropic, search.{exa,firecrawl,parallel}, tts.edge, image.fal, memory.hindsight, platform.{telegram,discord,matrix}, terminal.{modal,daytona,vercel}, tool.dashboard. Each import site now calls ensure() before importing the SDK. Where the module had a top-level try/except (telegram, discord, fastapi), the graceful-fallback pattern was extended to lazy-install on first check_*_requirements() call and re-bind module globals. Updated test_windows_native_support.py tzdata check from snapshot (>=2023.3 literal) to invariant (any version + win32 marker). Validation: - Base install: 45 packages (was ~60); 6 newly-extracted packages absent - uv lock --check: passes (262 packages, no drift) - 209/209 lazy_deps + advisory + doctor + tts/transcription tests passing - py_compile clean on all 12 modified modules	2026-05-12 01:02:25 -07:00
Teknium	99ad2d1372	fix(deps): unbreak [all] install — drop mistralai while PyPI quarantined (#24205 ) The `mistralai` PyPI package was quarantined on 2026-05-12 after a malicious 2.4.6 release. Every fresh resolve (AUR makepkg, Docker build, CI run, install.sh first-run) currently fails on `mistralai>=2.3.0,<3` because PyPI returns zero candidates. Existing users running `hermes update` mostly didn't notice — `hermes update` falls back from `.[all]` to per-extra retries and silently skips mistral with a warning that scrolls past. But fresh installs hard-fail or lose every other extra. Changes: - pyproject.toml: drop `hermes-agent[mistral]` from `[all]` and `[termux-all]`. The `mistral` extra itself is preserved so users can opt back in once PyPI un-quarantines. - hermes_cli/tools_config.py: hide Mistral Voxtral TTS from the `hermes tools` provider picker until restored. - hermes_cli/web_server.py: drop "mistral" from dashboard STT options. - tools/transcription_tools.py: explicit `provider: mistral` returns "none" with a clear status message; auto-detect skips mistral. - tools/tts_tool.py: dispatcher returns a clear "temporarily disabled" error before any SDK import attempt (avoids cached-stale-package surprises). - tests/tools/: update three test files to assert the new disabled behavior. Each test docstring records why and points at the rollback trigger (PyPI un-quarantines mistralai). Restore plan: revert this commit once the package is available on PyPI again. The behavior change is intentional and documented in code comments + test docstrings to make the rollback trivial. Validation: - scripts/run_tests.sh tests/tools/ -k 'mistral or stt or tts' → 425/425 passing. Refs: https://pypi.org/simple/mistralai/ (currently "pypi:project-status: quarantined").	2026-05-11 23:02:15 -07:00
fr33d3m0n	976d8e27ad	fix(approval): catch sudo with stdin/askpass/shell privilege flags Adds the only #17873 category not covered by the in-flight PRs #17962 (briandevans, reverse shell + download-execute) and #7993 (SHL0MS, credential reads + curl/wget exfiltration): sudo invocations that an LLM-driven agent can drive without TTY interaction. The agent has no TTY, so the sudo forms that succeed without human involvement are those reading the password from stdin (`-S` / `--stdin`) or via an askpass helper (`-A` / `--askpass`). The shell-launch (`-s`) and list-privileges (`-a`) flags are also gated since they are privilege-relevant invocations the agent can chain after acquiring the password (e.g. read SUDO_PASSWORD from .env -> sudo -S -s -> root shell). Plain `sudo cmd` (no flag) is TTY-bound and excluded. Two patterns: 1. Direct flag: `\bsudo\b[^;\|&\n]?\s+(?:-s\b\|--stdin\b\|-a\b\|--askpass\b)` The lazy `[^;\|&\n]?` consumes flag-arguments without spanning command separators, so `sudo -u root -S whoami` matches (a textbook offensive form that a strict `(?:\s+-[^\s]+)` "leading flags only" pattern would have missed because `root` is a flag-value not a flag). 2. Combined short flags: `\bsudo\b[^;\|&\n]?\s+-[a-z][sa][a-z]\b` Catches packed forms like `sudo -nS id` where multiple flags share a single `-X` token. `_normalize_command_for_detection` lowercases input before pattern matching (tools/approval.py:340), so case variants of S/s and A/a collapse — both letter-pairs are gated since each is a privilege- relevant invocation. Tests: 21 new cases in TestDetectSudoStdin (12 positive covering all flag-order permutations including herestring source and printf-piped forms; 9 negative including TTY-bound `sudo whoami`, interactive `sudo -i`, env-var reference `$SUDO_USER`, doc lookup `man sudo`, package install, and the `pseudosudo` word-boundary edge case). Empirical coverage: 11/11 attacks matched, 0/10 false positives. Refs: #17873 category 4. Adjacent: #17962 (reverse shell + download- execute), #7993 (credential reads + curl/wget exfiltration). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-11 06:56:30 -07:00
OpenClaw Agent	9520a1ccdf	fix(terminal): block sudo -S password guessing when SUDO_PASSWORD is not set Fixes #9590: Block explicit sudo -S (stdin password mode) commands when the SUDO_PASSWORD environment variable is not configured. The attack vector: the LLM constructs 'echo guessedpass \| sudo -S cmd' to brute-force sudo passwords, iterates based on sudo's error output ('Sorry, try again'). The existing _transform_sudo_command only injects -S when SUDO_PASSWORD exists; without it, the LLM's explicit sudo -S must be treated as a guessing attempt. Changes: - Add _check_sudo_stdin_guard() in approval.py: detects sudo -S when SUDO_PASSWORD is absent, anchored to command-start positions (^ ; && \|\| \| etc.) to avoid false positives on literal text - Integrate into check_all_command_guards() above yolo/mode=off so the block is unconditional (like the hardline floor) - Add 6 tests covering: detection, allow-list, SUDO_PASSWORD bypass, integration with check_all_command_guards, yolo non-bypass, container backend bypass	2026-05-11 06:56:30 -07:00
kshitijk4poor	494824fb11	chore: remove unused sentinel in test_send_message_tool	2026-05-11 06:44:58 -07:00
Dominikh	379e7dd014	test(send_message): cover _check_send_message gating paths Adds a TestCheckSendMessage class with 7 focused tests pinning the four passing conditions and the failure modes: - HERMES_KANBAN_TASK grants access (the new branch) - HERMES_KANBAN_TASK short-circuits before consulting session_context or gateway.status (so workers don't depend on those import paths being healthy) - HERMES_SESSION_PLATFORM=telegram grants access - HERMES_SESSION_PLATFORM=local falls through to gateway check - is_gateway_running()=True grants access - All signals absent → False - gateway.status ImportError is swallowed → False Pinning the short-circuit (test #2) is the load-bearing one — it documents the contract that worker-side availability cannot regress to depending on gateway-side state lookups.	2026-05-11 06:44:58 -07:00
konsisumer	62cfe79e93	fix(tools): clarify kanban_complete phantom-card retry guidance When kanban_complete rejects a created_cards list as hallucinated, the task is intentionally left in-flight (the gate runs before the write txn) so the worker can retry with a corrected list or pass created_cards=[] to skip the check. The retry path already worked, but the previous error wording read like a terminal failure and workers were observed abandoning the run instead of trying again. Spell out the recovery path explicitly in the tool_error response ("Your task is still in-flight ... Retry kanban_complete with ...") and add regression coverage at both the kernel and tool layers so the retry contract — and the wording the worker depends on to discover it — is pinned. Fixes #22923	2026-05-10 16:14:43 -07:00
Teknium	d4b26df897	perf(browser): route browser_console eval through supervisor's persistent CDP WS (180x faster) (#23226 ) Adds CDPSupervisor.evaluate_runtime() and wires it into _browser_eval as a fast path when a supervisor is alive for the current task_id. Replaces the ~180ms agent-browser subprocess fork+exec+Node-startup hop with a ~1ms Runtime.evaluate over the supervisor's already-connected WebSocket. Falls through to the existing agent-browser CLI path when no supervisor is running (e.g. backends without CDP, or before the first browser_navigate attaches one), so behaviour is unchanged where it can't apply. JS-side exceptions surface directly without falling through to the subprocess (the subprocess would just re-raise the same error, slower); supervisor-side failures (loop down, no session) fall through cleanly. Benchmark — 30 iterations of `1 + 1` against headless Chrome: supervisor WS mean= 0.96ms median= 0.91ms agent-browser subprocess mean=179.35ms median=167.73ms → 187x speedup mean Tests: 14 unit tests (mocked supervisor + response-shape coverage), 5 real-Chrome e2e tests in test_browser_supervisor.py (gated on Chrome being installed). Browser test suite: 355 passed, 1 skipped.	2026-05-10 07:37:55 -07:00
Teknium	2704e7b67e	fix(kanban): restrict board routing tools to orchestrators Adapted from PR #20568 commit `ce3518578` (Eric Litovsky / @kallidean). Adds two-tier gating for the kanban tool surface so dispatcher-spawned workers see only task-lifecycle tools (show/complete/block/heartbeat/ comment/create/link) while orchestrator profiles with `toolsets: [kanban]` also see board-routing tools (kanban_list, kanban_unblock). Workers shouldn't be enumerating or unblocking the board — they should close their own task via the lifecycle tools. Hiding board-routing tools from worker schemas keeps the worker focused and the toolset-isolation contract honest. Plus inherited from the same upstream commit: - 50/200 row bound on kanban_list with `truncated` + `next_limit` metadata. - Belt-and-suspenders runtime guard `_require_orchestrator_tool()` inside the orchestrator handlers in case a stale registration ever routes a worker to one of them. - Tests for the new gate, the stricter bound, and the fact that even a worker with `toolsets: [kanban]` in config still doesn't see board routing. Co-authored-by: Eric Litovsky <elitovsky@zenproject.net>	2026-05-10 05:58:44 -07:00
Eric Litovsky	50d281495e	fix(kanban): parse triage flag explicitly	2026-05-10 05:58:44 -07:00
Eric Litovsky	26bf45f8c5	fix(kanban): parse include_archived explicitly	2026-05-10 05:58:44 -07:00
Eric Litovsky	236cbe16b6	feat(kanban): add orchestrator board tools	2026-05-10 05:58:44 -07:00
Teknium	3800972dd0	feat(vision): vision_analyze returns pixels to vision-capable models, not aux text (#22955 ) When the active main model has native vision and the provider supports multimodal tool results (Anthropic, OpenAI Chat, Codex Responses, Gemini 3, OpenRouter, Nous), vision_analyze loads the image bytes and returns them to the model as a multimodal tool-result envelope. The model then sees the pixels directly on its next turn instead of receiving a lossy text description from an auxiliary LLM. Falls back to the legacy aux-LLM text path for non-vision models and unverified providers. Mirrors the architecture used in OpenCode, Claude Code, Codex CLI, and Cline. All four converge on the same pattern: tool results carry image content blocks for vision-capable provider/model combinations. Changes - tools/vision_tools.py: _vision_analyze_native fast path + provider capability table (_supports_media_in_tool_results). Schema description updated to reflect new behaviour. - agent/codex_responses_adapter.py: function_call_output.output now accepts the array form for multimodal tool results (was string-only). Preflight validates input_text/input_image parts. - agent/auxiliary_client.py: _RUNTIME_MAIN_PROVIDER/_MODEL globals so tools see the live CLI/gateway override, not the stale config.yaml default. set_runtime_main()/clear_runtime_main() helpers. - run_agent.py: AIAgent.run_conversation calls set_runtime_main at turn start so vision_analyze's fast-path check sees the actual runtime. - tests/conftest.py: clear runtime-main override between tests. Tests - tests/tools/test_vision_native_fast_path.py: provider capability table, envelope shape, fast-path gating (vision-capable model uses fast path; non-vision model falls through to aux). - tests/run_agent/test_codex_multimodal_tool_result.py: list tool content becomes function_call_output.output array; preflight preserves arrays and drops unknown part types. Live verified - Opus 4.6 + Sonnet 4.6 on OpenRouter: model calls vision_analyze on a typed filepath, gets pixels back, reads exact text from images that no aux description could capture (font color irony, multi-line fruit-count list, etc.). PR replaces the closed prior efforts (#16506 shipped the inbound user- attached path; this PR closes the gap for tool-discovered images).	2026-05-09 21:06:19 -07:00
Teknium	08ec602770	fix(tool-result-storage): persist via stdin to bypass 128 KB exec-arg cap (#22913 ) Linux's MAX_ARG_STRLEN caps any single argv element at 128 KB (32 * PAGE_SIZE). The previous heredoc-in-the-command-string approach in _write_to_sandbox put the entire tool result inside the 'bash -c' arg, so any result over ~128 KB raised OSError [Errno 7] 'Argument list too long' before the heredoc ever ran. The caller logged a warning, but quiet_mode (CLI default) sets tools.* to ERROR — so the warning never reached agent.log either, and the agent saw a 1.5 KB preview tagged 'Full output could not be saved to sandbox'. Hits delegate_task with 3+ subagent outputs routinely now. Switch to passing content via env.execute(stdin_data=...). cmd is now just 'mkdir -p X && cat > Y' (under 1 KB), and the heavyweight payload travels through stdin where there is no argv-element limit. E2E reproduced the user's exact 144,778-char delegate_task envelope: old code OSError'd, new code round-trips cleanly to disk with all three task summaries intact.	2026-05-09 18:44:58 -07:00
Wesley Simplicio	116a1446a4	fix(terminal): bridge docker_env config to TERMINAL_DOCKER_ENV Problem: terminal.docker_env set in config.yaml was silently ignored. Docker containers never received the user-specified env vars. Root cause: docker_env was missing from all three config→env bridging maps (cli.py env_mappings, gateway/run.py _terminal_env_map, hermes_cli/config.py _config_to_env_sync) and from the terminal_tool _get_env_config() reader. _create_environment() consumed the key from container_config correctly, but it was always {} because TERMINAL_DOCKER_ENV was never set. Also extend the list-serialisation branches in cli.py and gateway/run.py to handle dict values via json.dumps (lists already used json.dumps; plain str() on a dict produces undecodable output). Fix: - cli.py: add "docker_env": "TERMINAL_DOCKER_ENV" to env_mappings; serialise dict values with json.dumps alongside existing list path - gateway/run.py: same additions to _terminal_env_map and serialisation - hermes_cli/config.py: add "terminal.docker_env": "TERMINAL_DOCKER_ENV" to _config_to_env_sync so `hermes config set terminal.docker_env …` persists to .env correctly - tools/terminal_tool.py: add docker_env key to _get_env_config() reading TERMINAL_DOCKER_ENV via _parse_env_var with default "{}" Tests: add test_docker_env_is_bridged_everywhere to tests/tools/test_terminal_config_env_sync.py — stash-verified: fails on origin/main, passes with fix. Fixes #20537	2026-05-09 17:53:35 -07:00
Wesley Simplicio	53ec32819c	fix(process_registry): kill orphaned Popen on post-spawn setup failure After Popen succeeds with os.setsid (detached process group), 5 things happen with no try/except: Thread construction, reader.start(), lock acquisition, prune+register, checkpoint write. If any raises, the Popen object goes unregistered and the detached process group leaks indefinitely. Wrap the post-spawn setup in try/except. On failure: - os.killpg(getpgid(pid), SIGKILL) takes down the entire process group (not just the shell - important because of detached PG + -lic shell wrapper that may have spawned children) - proc.kill() fallback for ProcessLookupError/PermissionError/OSError - proc.wait(timeout=5) reaps with a bound - re-raise to preserve original traceback Nested try/except around cleanup so a secondary failure can't mask the original. Closes #2749.	2026-05-09 17:53:24 -07:00
Wesley Simplicio	2245879af0	fix(checkpoint): guard _touch_project against non-dict project metadata Problem ======= `tools.checkpoint_manager._touch_project` reads the project metadata file with `json.loads(meta_path.read_text(...))`, then immediately does: meta["workdir"] = str(_normalize_path(working_dir)) The `except` block only catches `(OSError, ValueError)`. When the file parses successfully but returns a non-dict value (a list `[]`, `null`, or a scalar from a corrupted or hand-truncated write), `json.loads` succeeds without error and `meta` is set to, e.g., `[]`. The subsequent subscript assignment then raises `TypeError: list indices must be integers or slices, not str`, which is NOT caught by the narrow except clause. This TypeError propagates up through `_take` to `ensure_checkpoint`, where the broad `except Exception` safety net swallows it. The effect is that `ensure_checkpoint` silently returns False for the entire session — all checkpoints are skipped for the affected working directory without any user-visible error. Root cause ========== Missing `isinstance(meta, dict)` guard after `json.loads`, identical in pattern to bugs fixed in `cron/jobs.py` (#22569) and `tools/process_registry.py` (#22544). The same guard is already present one function below in `_list_projects` (line 506), but was inadvertently omitted in `_touch_project`. Fix === Add two lines after the try/except: ```python if not isinstance(meta, dict): meta = {} ``` This matches the existing guard in `_list_projects` and ensures a fresh empty dict is used whenever the persisted value is not a mapping — preserving the `created_at` semantics via `setdefault` on the next line. Tests ===== `TestTouchProjectMalformedMeta` covers four non-dict root values (`[]`, `null`, `42`, `"oops"`). Each writes a corrupted metadata file, calls `_touch_project`, and asserts: (a) no exception raised, (b) the metadata file is rewritten as a valid dict containing `last_touch` and `workdir`. All four fail on main with `TypeError`, pass with fix. Full `tests/tools/test_checkpoint_manager.py` regression: 77 passed.	2026-05-09 17:53:13 -07:00
heathley	0c5c4d1b8d	fix(skills-hub): cover remaining SSRF fetch paths after #10029	2026-05-09 17:52:12 -07:00
Henkey	b349ae1e4c	fix(acp): honor task cwd for foreground terminal commands	2026-05-09 14:46:34 -07:00
HenkDz	840ebe063e	fix: make session search initialize session db	2026-05-09 14:36:58 -07:00
Wesley Simplicio	ca13993217	fix(delegate): add explicit do-not-use guidance to acp_command/acp_args schema (carve-out of #22680 ) acp_command / acp_args descriptions previously primed the model to populate them — "Per-task ACP command override (e.g. 'copilot')" — even when no ACP CLI was installed. Models with weaker schema-following discipline would set them and the spawn would fail. Add explicit "Do NOT set unless the user has explicitly told you" guidance at both the top-level acp_command and the per-task override. Strengthen acp_args to mention it's empty unless acp_command is set. Adds 2 tests pinning the descriptions. Note: this is a cosmetic prompt-engineering fix — the params remain exposed in the schema. The fully-correct fix is to gate them behind a config flag or runtime ACP-CLI detection so the schema only emits them when an ACP harness is available. Tracked as a follow-up; this PR ships the low-cost stopgap. Salvage of #22680 (delegate schema only). The original PR also bundled unrelated fixes for #22548, #21944, #22150 — those need separate PRs since #22548 and #21944 are already addressed on main (#22780 + #22798 in flight) and #22150 deserves its own review. Closes #22013.	2026-05-09 13:37:30 -07:00
Wesley Simplicio	48bf0ea249	fix(browser_tool): fall through to autodetect on config read failure	2026-05-09 13:35:39 -07:00
Wesley Simplicio	3170c8d448	fix(browser_tool): do not cache transient None cloud provider resolution Problem: `_get_cloud_provider()` set `_cloud_provider_resolved = True` before resolution. If credentials were briefly unavailable on the first call (e.g. a managed Nous Portal token mid-refresh), the resolver pinned the entire process to local mode forever, even after credentials self-healed seconds later. Root cause: bookkeeping was set up-front, so any code path that fell through to `return _cached_cloud_provider` (config read failure, no credentials yet, explicit-provider instantiation failure) committed the transient `None` to the cache permanently. Fix: invert the bookkeeping. `_cloud_provider_resolved = True` is now set only when (a) the user explicitly chose `cloud_provider: local`, or (b) a provider was successfully resolved. All transient `None` paths return without poisoning the cache, so the next call retries. Explicit provider instantiation failures now log at warning level with stack trace so operators can diagnose them. Tests: 5 new cases in tests/tools/test_browser_cloud_provider_cache.py covering explicit local, successful resolution, no-credentials-yet, config read failure, and explicit provider instantiation failure. Stash-verify confirmed the 3 transient-None tests fail without the fix. All 320 existing browser tests still green. Closes #22324	2026-05-09 13:35:39 -07:00
Teknium	b959cfa056	fix: move pytest.importorskip below pytest import in skip-guarded tests The original PR placed 'pwd = pytest.importorskip("pwd")' on line 4 but 'import pytest' on line 9 — NameError on module load. Same for test_file_sync_back.py. Plus, the in-function 'pwd = pytest.importorskip' calls in test_auto_detected_root_is_rejected confused Python's scope analysis (later 'import pytest' made pytest local everywhere in the function) and caused UnboundLocalError. Drop the now-redundant in-function importorskip calls and rely on the module-level guard.	2026-05-09 11:12:03 -07:00
Wali Reheman	4e8b8573ca	tests: add Windows skip guards for UNIX-only stdlib imports	2026-05-09 11:12:03 -07:00
Teknium	b6ff96c057	fix(cron): allow quoted URL in github auth-header allowlist The github-pr-workflow skill wraps the URL in double-quotes ('curl -H ... "https://api.github.com/..."'), which the original allowlist regex (\s+https://api...) did not match. Without this, the bundled github-pr-workflow skill is still blocked at every cron tick despite #22605's fix landing for the bare-URL form. Make the leading quote optional and add a regression test pinning both single- and double-quoted forms.	2026-05-09 11:11:45 -07:00
qWaitCrypto	691778a08b	fix(cron): keep auth-header exfiltration blocked	2026-05-09 11:11:45 -07:00
qWaitCrypto	783d11717a	fix(cron): avoid github skill false positives in scanner	2026-05-09 11:11:45 -07:00
Teknium	1f4200debf	feat(delegate): show user's actual concurrency / spawn-depth limits in tool description (#22694 ) The delegate_task tool description hardcoded 'default 3' / 'default 2' for max_concurrent_children / max_spawn_depth, which misled the model on any install that raised these limits — the schema text said 'default 3' even when the user had set max_concurrent_children=15 / max_spawn_depth=3, so the model would self-cap at 3 and never use the headroom. Make the description dynamic. ToolEntry gains an optional dynamic_schema_overrides callable; registry.get_definitions() merges its output on top of the static schema before returning it. delegate_tool registers a builder that reads the current delegation.* config and emits: - 'up to N items concurrently for this user' (N = max_concurrent_children) - 'Nested delegation IS enabled / OFF for this user (max_spawn_depth=N)' - 'orchestrator children can themselves delegate up to M more level(s)' - 'orchestrator_enabled=false' when the kill switch is set The model_tools cache key already includes config.yaml mtime+size, so edits to delegation.* in config invalidate the cached tool definitions without an explicit hook. CLI_CONFIG staleness within a process is a pre-existing limitation of _load_config and out of scope here. Static description / tasks.description / role.description in DELEGATE_TASK_SCHEMA are placeholders so module import doesn't trigger cli.CLI_CONFIG load before the test conftest can redirect HERMES_HOME.	2026-05-09 11:07:53 -07:00
GodsBoy	93e25ceb13	feat(plugins): add standalone_sender_fn for out-of-process cron delivery Plugin platforms (IRC, Teams, Google Chat) currently fail with `No live adapter for platform '<name>'` when a `deliver=<plugin>` cron job runs in a separate process from the gateway, even though the platforms are eligible cron targets via `cron_deliver_env_var` (added in #21306). Built-in platforms (Telegram, Discord, Slack, etc.) use direct REST helpers in `tools/send_message_tool.py` so cron can deliver without holding the gateway in the same process; plugin platforms historically depended on `_gateway_runner_ref()` which returns `None` out of process. This change adds an optional `standalone_sender_fn` field to `PlatformEntry` so plugins can register an ephemeral send path that opens its own connection, sends, and closes without needing the live adapter. The dispatch site in `_send_via_adapter` falls through to the hook when the gateway runner is unavailable, with a descriptive error when neither path applies. The hook is optional, so existing plugins are unaffected. Reference migrations land in the same change for IRC, Teams, and Google Chat, exercising the hook across stdlib (asyncio + IRC protocol), Bot Framework OAuth client_credentials, and Google service-account flows respectively. Security hardening on the new code paths: * IRC: control-character stripping on chat_id and message body to block CRLF command injection; bounded nick-collision retries; JOIN before PRIVMSG so channels with the default `+n` mode accept the delivery. * Teams: TEAMS_SERVICE_URL validated against an allowlist of known Bot Framework hosts (`smba.trafficmanager.net`, `smba.infra.gov.teams.microsoft.us`) to block SSRF; chat_id and tenant_id constrained to the documented Bot Framework character set; per-request timeouts so a slow STS endpoint cannot starve the activity POST. * Google Chat: chat_id and thread_id validated against strict resource-name regexes; service-account refresh wrapped in `asyncio.wait_for` so a hung token endpoint cannot stall the scheduler. Test coverage: 20 new tests covering happy path, missing-config errors, network failure modes, and each defensive validation. Existing tests unchanged. `bash scripts/run_tests.sh tests/tools/test_send_message_tool.py tests/gateway/test_irc_adapter.py tests/gateway/test_teams.py tests/gateway/test_google_chat.py` reports 341 passed, 0 regressions. Documentation: new "Out-of-process cron delivery" section in website/docs/developer-guide/adding-platform-adapters.md and an entry in gateway/platforms/ADDING_A_PLATFORM.md naming the hook.	2026-05-09 02:56:29 -07:00
kshitijk4poor	e3ebaa19ba	test(kanban): cover kanban_comment author hardening + cross-task policy - Renames test_comment_custom_author -> test_comment_ignores_caller_supplied_author and inverts its assertion: an args['author'] override is silently ignored; the author always comes from HERMES_PROFILE. - Adds test_comment_schema_omits_author_override to assert the 'author' property is gone from KANBAN_COMMENT_SCHEMA so the forgery surface stays closed if someone re-adds the schema field by accident. - Adds test_worker_can_comment_on_foreign_task to pin the #19713 policy decision: cross-task commenting must remain unrestricted. Without this guard, a future change accidentally adding _enforce_worker_task_ownership to _handle_comment would close the documented handoff channel between tasks.	2026-05-09 02:32:16 -07:00
Bartok	326ca754ad	fix(delegate): accept JSON string batch tasks Recover delegate_task batch inputs when open-weight models emit tasks as a JSON-encoded array string, and return clear errors for malformed task lists. Co-authored-by: Cursor <cursoragent@cursor.com>	2026-05-09 02:18:57 -07:00
kshitij	ae005ec588	fix(send_message): map Telegram General topic id to None for forum groups (#22423 ) Telegram forum supergroups address the General topic as `message_thread_id="1"` on incoming updates, but the Bot API rejects sends with `message_thread_id=1` ("Message thread not found"). The gateway adapter has a `_message_thread_id_for_send` helper that maps "1" to None for that reason; the standalone `_send_telegram` helper used by the `send_message` tool never got the same mapping, so any `send_message` call to a Topics-enabled group's General topic (target shape `telegram:<chat_id>:1`) failed with "Message thread not found." Reuse the adapter's helper when available, with an explicit fallback to the same mapping for environments where the adapter import path fails (e.g. python-telegram-bot missing in this venv). Fixes #22267	2026-05-09 01:58:33 -07:00
helix4u	e407376c50	fix(cron): normalize partial job records	2026-05-09 01:11:41 -07:00

1 2 3 4 5 ...

811 Commits