hermes-agent

mirror of https://github.com/NousResearch/hermes-agent.git synced 2026-05-21 03:39:54 +00:00

Teknium 3800972dd0 feat(vision): vision_analyze returns pixels to vision-capable models, not aux text (#22955)

When the active main model has native vision and the provider supports
multimodal tool results (Anthropic, OpenAI Chat, Codex Responses, Gemini
3, OpenRouter, Nous), vision_analyze loads the image bytes and returns
them to the model as a multimodal tool-result envelope. The model then
sees the pixels directly on its next turn instead of receiving a lossy
text description from an auxiliary LLM.

Falls back to the legacy aux-LLM text path for non-vision models and
unverified providers.
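
The gating described above can be sketched roughly as follows. This is a minimal illustration, not the code in tools/vision_tools.py: the provider keys, the envelope field names, and the `vision_analyze_sketch` and `aux_describe` names are all assumptions made for the example.

```python
import base64
from typing import Callable, Dict, List, Union

# Assumed provider keys. The real capability table in tools/vision_tools.py
# may be keyed or shaped differently.
MEDIA_IN_TOOL_RESULTS = {"anthropic", "openai", "codex", "gemini", "openrouter", "nous"}


def supports_media_in_tool_results(provider: str) -> bool:
    """Return True if the provider is assumed to accept images in tool results."""
    return provider.lower() in MEDIA_IN_TOOL_RESULTS


def vision_analyze_sketch(
    image_bytes: bytes,
    mime_type: str,
    provider: str,
    model_has_vision: bool,
    aux_describe: Callable[[bytes], str],
) -> Union[str, List[Dict[str, str]]]:
    """Hypothetical gate: pixels for capable model/provider pairs, text otherwise."""
    if model_has_vision and supports_media_in_tool_results(provider):
        data = base64.b64encode(image_bytes).decode("ascii")
        # Hypothetical multimodal envelope: a list of content parts.
        return [
            {"type": "text", "text": "Image attached."},
            {"type": "image", "mime_type": mime_type, "data": data},
        ]
    # Legacy path: an auxiliary LLM turns the image into a text description.
    return aux_describe(image_bytes)
```

Both conditions must hold for the fast path, so a vision model on an unverified provider still gets the text description.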

Mirrors the architecture used in OpenCode, Claude Code, Codex CLI, and
Cline. All four converge on the same pattern: tool results carry image
content blocks for vision-capable provider/model combinations.

Changes
- tools/vision_tools.py: _vision_analyze_native fast path + provider
  capability table (_supports_media_in_tool_results). Schema description
  updated to reflect new behaviour.
- agent/codex_responses_adapter.py: function_call_output.output now
  accepts the array form for multimodal tool results (was string-only).
  Preflight validates input_text/input_image parts.
- agent/auxiliary_client.py: _RUNTIME_MAIN_PROVIDER/_MODEL globals so
  tools see the live CLI/gateway override, not the stale config.yaml
  default. set_runtime_main()/clear_runtime_main() helpers.
- run_agent.py: AIAgent.run_conversation calls set_runtime_main at turn
  start so vision_analyze's fast-path check sees the actual runtime.
- tests/conftest.py: clear runtime-main override between tests.
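
A rough sketch of the runtime-main override pattern from the list above. The global and helper names come from the commit message, but their signatures are assumptions, and `resolve_main` is a hypothetical lookup added only to show how a tool might prefer the live override over the config default.

```python
from typing import Optional, Tuple

# Module-level override, unset until a conversation turn starts.
_RUNTIME_MAIN_PROVIDER: Optional[str] = None
_RUNTIME_MAIN_MODEL: Optional[str] = None


def set_runtime_main(provider: str, model: str) -> None:
    """Record the provider/model actually in use (assumed signature)."""
    global _RUNTIME_MAIN_PROVIDER, _RUNTIME_MAIN_MODEL
    _RUNTIME_MAIN_PROVIDER = provider
    _RUNTIME_MAIN_MODEL = model


def clear_runtime_main() -> None:
    """Drop the override, e.g. between tests."""
    global _RUNTIME_MAIN_PROVIDER, _RUNTIME_MAIN_MODEL
    _RUNTIME_MAIN_PROVIDER = None
    _RUNTIME_MAIN_MODEL = None


def resolve_main(config_default: Tuple[str, str]) -> Tuple[str, str]:
    """Hypothetical lookup: use the live override if set, else the config default."""
    if _RUNTIME_MAIN_PROVIDER and _RUNTIME_MAIN_MODEL:
        return _RUNTIME_MAIN_PROVIDER, _RUNTIME_MAIN_MODEL
    return config_default
```

Because the override is process-global state, it leaks across tests unless it is reset, which is presumably why the conftest change clears it between tests.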

Tests
- tests/tools/test_vision_native_fast_path.py: provider capability
  table, envelope shape, fast-path gating (vision-capable model uses
  fast path; non-vision model falls through to aux).
- tests/run_agent/test_codex_multimodal_tool_result.py: list tool
  content becomes function_call_output.output array; preflight
  preserves arrays and drops unknown part types.
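
The preflight behaviour these tests describe might look roughly like the sketch below. It is an illustration under assumptions, not the adapter's code: `preflight_output` and `to_function_call_output` are hypothetical names, and only the `input_text`/`input_image` part types and the `function_call_output.output` field come from the commit message.

```python
from typing import Any, Dict, List, Union

ALLOWED_PART_TYPES = {"input_text", "input_image"}

Output = Union[str, List[Dict[str, Any]]]


def preflight_output(output: Any) -> Output:
    """Keep strings and arrays as they are, dropping parts of unknown type."""
    if isinstance(output, str):
        return output
    if isinstance(output, list):
        return [
            part
            for part in output
            if isinstance(part, dict) and part.get("type") in ALLOWED_PART_TYPES
        ]
    # Anything else is coerced to text so the request stays well-formed.
    return str(output)


def to_function_call_output(call_id: str, content: Any) -> Dict[str, Any]:
    """Build a function_call_output item whose output may be a string or an array."""
    return {
        "type": "function_call_output",
        "call_id": call_id,
        "output": preflight_output(content),
    }
```

For example, a list holding one `input_text` part, one part of an unrecognised type, and one `input_image` part would come out as a two-element array, while a plain string passes through unchanged.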

Live verified
- Opus 4.6 + Sonnet 4.6 on OpenRouter: model calls vision_analyze on a
  typed filepath, gets pixels back, reads exact text from images that
  no aux description could capture (font color irony, multi-line
  fruit-count list, etc.).

PR replaces the closed prior efforts (#16506 shipped the inbound
user-attached path; this PR closes the gap for tool-discovered images).

2026-05-09 21:06:19 -07:00

browser_providers

feat: ungate Tool Gateway — subscription-based access with per-tool opt-in

2026-04-16 12:36:49 -07:00

computer_use

fix(tools): install cua-driver when Computer Use is enabled via 'hermes tools' (#22765)

2026-05-09 13:02:25 -07:00

environments

feat(cross-platform): psutil for PID/process management + Windows footgun checker

2026-05-08 14:27:40 -07:00

neutts_samples

refactor(tts): replace NeuTTS optional skill with built-in provider + setup flow

2026-03-17 02:33:12 -07:00

web_providers

docs(web): fix SearXNG env configuration

2026-05-07 17:54:47 -07:00

__init__.py

Merge branch 'main' into rewbs/tool-use-charge-to-subscription

2026-03-31 08:48:54 +09:00

ansi_strip.py

fix: strip ANSI at the source — clean terminal output before it reaches the model

2026-03-23 07:43:12 -07:00

approval.py

fix(approval): cron jobs must not be treated as gateway context

2026-05-08 07:30:14 -07:00

binary_extensions.py

fix(tools): address PR review — remove _extract_raw_output, BudgetConfig everywhere, read_file hardening

2026-04-08 02:24:32 -07:00

browser_camofox_state.py

feat(browser): add persistent Camofox sessions and VNC URL discovery (salvage #4400) (#4419)

2026-04-01 04:18:50 -07:00

browser_camofox.py

refactor(config): add cfg_get() helper; migrate 20 nested-get call sites (#17304)

2026-04-28 23:17:39 -07:00

browser_cdp_tool.py

fix(async): replace get_event_loop() with get_running_loop() in async contexts

2026-05-09 02:34:19 -07:00

browser_dialog_tool.py

feat(browser): CDP supervisor — dialog detection + response + cross-origin iframe eval (#14540)

2026-04-23 22:23:37 -07:00

browser_supervisor.py

fix(browser_supervisor): verify thread and loop health before returning cached supervisor

2026-04-30 20:33:33 -07:00

browser_tool.py

fix(browser_tool): fall through to autodetect on config read failure

2026-05-09 13:35:39 -07:00

budget_config.py

fix: preserve existing thresholds, remove pre-read byte guard

2026-04-08 02:24:32 -07:00

checkpoint_manager.py

fix(checkpoint): guard _touch_project against non-dict project metadata

2026-05-09 17:53:13 -07:00

clarify_tool.py

refactor: add tool_error/tool_result helpers + read_raw_config, migrate 129 callsites

2026-04-07 13:36:38 -07:00

code_execution_tool.py

feat(cross-platform): psutil for PID/process management + Windows footgun checker

2026-05-08 14:27:40 -07:00

computer_use_tool.py

feat(computer-use): cua-driver backend, universal any-model schema

2026-05-08 11:07:38 -07:00

credential_files.py

fix(gateway): translate inbound document host paths to container paths for Docker backend

2026-05-07 05:02:26 -07:00

cronjob_tools.py

fix(cron): allow quoted URL in github auth-header allowlist

2026-05-09 11:11:45 -07:00

debug_helpers.py

refactor: codebase-wide lint cleanup — unused imports, dead code, and inefficient patterns (#5821)

2026-04-07 10:25:31 -07:00

delegate_tool.py

feat(openrouter): wire Pareto Code router with min_coding_score knob (#22838)

2026-05-09 14:47:00 -07:00

discord_tool.py

feat: add Discord message deletion action

2026-05-07 05:11:09 -07:00

env_passthrough.py

refactor(config): add cfg_get() helper; migrate 20 nested-get call sites (#17304)

2026-04-28 23:17:39 -07:00

feishu_doc_tool.py

perf(cli): cut ~19s from 'hermes' cold start (skills cache + lazy Feishu + no Nous HTTP) (#22138)

2026-05-08 16:39:32 -07:00

feishu_drive_tool.py

perf(cli): cut ~19s from 'hermes' cold start (skills cache + lazy Feishu + no Nous HTTP) (#22138)

2026-05-08 16:39:32 -07:00

file_operations.py

fix(windows): %1 install error, patch CRLF false-negative, SOUL.md BOM

2026-05-08 14:27:40 -07:00

file_state.py

feat(delegate): cross-agent file state coordination for concurrent subagents (#13718)

2026-04-21 16:41:26 -07:00

file_tools.py

fix(patch-tool): advertise per-mode required params in schema descriptions

2026-05-08 16:59:24 -07:00

fuzzy_match.py

fix(patch): gate 'did you mean?' to no-match + extend to v4a/skill_manage

2026-04-21 02:03:46 -07:00

homeassistant_tool.py

fix: clean up description escaping, add string-data tests

2026-04-13 04:45:07 -07:00

image_generation_tool.py

perf(image_gen): defer fal_client import to first generation request (#22859)

2026-05-09 17:45:09 -07:00

interrupt.py

fix(interrupt): propagate to concurrent-tool workers + opt-in debug trace (#11907)

2026-04-17 20:39:25 -07:00

kanban_tools.py

fix(security): drop caller-controlled author override in kanban_comment

2026-05-09 02:32:16 -07:00

managed_tool_gateway.py

fix(tools): add debug logging for token refresh and tighten domain check

2026-04-02 12:40:03 +11:00

mcp_oauth_manager.py

fix(mcp-oauth): persist OAuth server metadata across process restarts (#21226)

2026-05-07 05:35:33 -07:00

mcp_oauth.py

fix(mcp-oauth): persist OAuth server metadata across process restarts (#21226)

2026-05-07 05:35:33 -07:00

mcp_tool.py

fix(windows): os.kill(pid, 0) is NOT a no-op on Windows — route through new _pid_exists helper

2026-05-08 14:27:40 -07:00

memory_tool.py

feat(cross-platform): psutil for PID/process management + Windows footgun checker

2026-05-08 14:27:40 -07:00

microsoft_graph_auth.py

feat(msgraph): add auth and client foundation

2026-05-08 09:27:26 -07:00

microsoft_graph_client.py

fix(msgraph): stream download_to_file body instead of buffering

2026-05-08 09:27:26 -07:00

mixture_of_agents_tool.py

Fix (mixture_of_agents): replace deprecated Gemini model and forward max_tokens to OpenRouter (#6621)

2026-04-23 15:14:11 -07:00

neutts_synth.py

fix(tts): document NeuTTS provider and align install guidance (#1903)

2026-03-18 02:55:30 -07:00

openrouter_client.py

…

osv_check.py

feat: OSV malware check for MCP extension packages (#5305)

2026-04-05 12:46:07 -07:00

patch_parser.py

fix(patch): gate 'did you mean?' to no-match + extend to v4a/skill_manage

2026-04-21 02:03:46 -07:00

path_security.py

refactor: extract shared helpers to deduplicate repeated code patterns (#7917)

2026-04-11 13:59:52 -07:00

process_registry.py

fix(process_registry): kill orphaned Popen on post-spawn setup failure

2026-05-09 17:53:24 -07:00

registry.py

feat(delegate): show user's actual concurrency / spawn-depth limits in tool description (#22694)

2026-05-09 11:07:53 -07:00

rl_training_tool.py

codebase: add encoding='utf-8' to all bare open() calls (PLW1514)

2026-05-08 14:27:40 -07:00

schema_sanitizer.py

fix: strip Codex-hostile top-level schema combinators

2026-05-07 07:03:21 -07:00

send_message_tool.py

feat(plugins): add standalone_sender_fn for out-of-process cron delivery

2026-05-09 02:56:29 -07:00

session_search_tool.py

fix: make session search initialize session db

2026-05-09 14:36:58 -07:00

skill_manager_tool.py

fix: exclude hidden and archive dirs from _find_skill rglob

2026-05-07 05:15:28 -07:00

skill_provenance.py

fix(curator): only mark agent-created for background-review sediment (#19621)

2026-05-04 02:42:16 -07:00

skill_usage.py

feat(cross-platform): psutil for PID/process management + Windows footgun checker

2026-05-08 14:27:40 -07:00

skills_guard.py

feat(skills-guard): gate agent-created scanner on config.skills.guard_agent_created (default off)

2026-04-23 06:20:47 -07:00

skills_hub.py

fix(skills-hub): cover remaining SSRF fetch paths after #10029

2026-05-09 17:52:12 -07:00

skills_sync.py

refactor: consolidate symlink-safe atomic replace into shared helper

2026-04-28 04:58:22 -07:00

skills_tool.py

fix(skills): support category-qualified local skill names

2026-05-05 10:15:31 -07:00

slash_confirm.py

feat(gateway,cli): confirm /reload-mcp to warn about prompt cache invalidation

2026-04-29 21:56:47 -07:00

terminal_tool.py

fix(terminal): bridge docker_env config to TERMINAL_DOCKER_ENV

2026-05-09 17:53:35 -07:00

tirith_security.py

codebase: add encoding='utf-8' to all bare open() calls (PLW1514)

2026-05-08 14:27:40 -07:00

todo_tool.py

fix(tools): enforce ID uniqueness in TODO store during replace operations

2026-04-11 16:22:50 -07:00

tool_backend_helpers.py

fix(cli): coerce use_gateway config flags in tool routing

2026-04-26 19:02:55 -07:00

tool_output_limits.py

feat(skills): add design-md skill for Google's DESIGN.md spec (#14876)

2026-04-23 21:51:19 -07:00

tool_result_storage.py

fix(tool-result-storage): persist via stdin to bypass 128 KB exec-arg cap (#22913)

2026-05-09 18:44:58 -07:00

transcription_tools.py

fix(ci): stabilize main test suite regressions (#17660)

2026-04-29 23:18:55 -07:00

tts_tool.py

feat(cross-platform): psutil for PID/process management + Windows footgun checker

2026-05-08 14:27:40 -07:00

url_safety.py

fix(browser): enforce cloud-metadata SSRF floor in hybrid routing (#16234) (#21228)

2026-05-07 05:38:05 -07:00

vision_tools.py

feat(vision): vision_analyze returns pixels to vision-capable models, not aux text (#22955)

2026-05-09 21:06:19 -07:00

voice_mode.py

codebase: add encoding='utf-8' to all bare open() calls (PLW1514)

2026-05-08 14:27:40 -07:00

web_tools.py

perf(cli): cut ~19s from 'hermes' cold start (skills cache + lazy Feishu + no Nous HTTP) (#22138)

2026-05-08 16:39:32 -07:00

website_policy.py

refactor: codebase-wide lint cleanup — unused imports, dead code, and inefficient patterns (#5821)

2026-04-07 10:25:31 -07:00

xai_http.py

feat(xai): upgrade to Responses API, add TTS provider

2026-04-16 02:24:08 -07:00

yuanbao_tools.py

chore: remove unused imports and dead locals (ruff F401, F841) (#17010)

2026-04-28 06:46:45 -07:00