mirror of
https://github.com/NousResearch/hermes-agent.git
synced 2026-05-21 03:39:54 +00:00
lsp-plugin
48 Commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
|
407683b72d |
fix(docs): repair Voice & TTS provider table
Fixes NousResearch/hermes-agent#24101 |
||
|
|
fef1a41248 |
docs: round 2 audit — messaging, developer-guide, guides, integrations (#22858)
Cross-checked 75 docs pages under user-guide/messaging/, developer-guide/,
guides/, and integrations/ against the live registries and gateway code.
messaging/
- index.md: API Server toolset is hermes-api-server (was 'hermes (default)');
Google Chat slug is hermes-google_chat (underscore — plugin name uses _).
- google_chat.md: drop bogus 'pip install hermes-agent[google_chat]' (no such
extra); list the actual deps (google-cloud-pubsub, google-api-python-client,
google-auth, google-auth-oauthlib).
- qqbot.md: config namespace is platforms.qqbot (was platforms.qq, which is
silently ignored by the adapter); QQ_STT_BASE_URL is not read directly —
baseUrl lives under platforms.qqbot.extra.stt.
- teams-meetings.md: 'hermes teams-pipeline' is plugin-gated (teams_pipeline
plugin must be enabled), not a built-in subcommand.
- sms.md: example log line 0.0.0.0:8080 -> 127.0.0.1:8080 (default
SMS_WEBHOOK_HOST).
- open-webui.md: API_SERVER_* are env vars, not YAML keys — write them to
per-profile .env, not 'hermes config set' (same pattern fixed in
api-server.md last round). Also bumped example ports to 8650+ to dodge the
default webhook (8644)/wecom-callback (8645)/msgraph-webhook (8646)
collision.
developer-guide/
- architecture.md: tool/toolset counts (61/52 -> 70+/~28); LOC stamps for
run_agent.py, cli.py, hermes_cli/main.py, setup.py, mcp_tool.py,
gateway/run.py replaced with 'large file' to stop drifting.
- agent-loop.md: same LOC drift (~13,700 -> 'a large file (15k+ lines)').
- gateway-internals.md: '14+ external messaging platforms' -> '20+'; gateway
platform tree updated (qqbot is a sub-package, not qqbot.py; added
yuanbao.py, feishu_comment.py, msgraph_webhook.py); 'gateway/builtin_hooks/
(always active)' was wrong — it's an empty extension point and
_register_builtin_hooks() is a no-op stub.
- acp-internals.md: drop fictional 'message_callback' from the bridged-
callbacks list; clarify thinking_callback is currently set to None.
- provider-runtime.md: provider list was missing AWS Bedrock, Azure Foundry,
NVIDIA NIM, xAI, Arcee, GMI Cloud, StepFun, Qwen OAuth, Xiaomi, Ollama
Cloud, LM Studio, Tencent TokenHub. Fallback section described only the
legacy single-pair model — corrected to the canonical list-form
fallback_providers chain.
- environments.md: parsers list missing llama4_json and the deepseek_v31
alias; both register via @register_parser.
- browser-supervisor.md: drop reference to scripts/browser_supervisor_e2e.py
which doesn't exist in-repo.
- contributing.md: tinker-atropos is a git submodule — note that
'git submodule update --init' is required if cloning without
--recurse-submodules.
guides/
- operate-teams-meeting-pipeline.md: cron flags were all wrong — schedule is
positional (not --schedule), the script-only flag is --no-agent (not
--script-only), and there's no --command flag. Replaced with a real example
that creates the script under ~/.hermes/scripts/ and uses the actual flags.
Also replaced fictional 'hermes cron show <name>' with 'hermes cron status'.
- automation-templates.md: 'cron create --skills "a,b"' doesn't work —
the flag is --skill (singular, repeatable). Fixed all 5 occurrences via AST
rewrite.
- minimax-oauth.md: 'hermes auth add minimax-oauth --region cn' silently
fails because --region isn't registered on the auth-add argparse spec.
Pointed users at the minimax-cn provider (or MINIMAX_CN_API_KEY env) for
China-region access.
- cron-script-only.md: 'hermes send' is fictional — replaced the comparison-
table mention with a webhook-subscription pointer; also fixed the dead link
to /guides/pipe-script-output (page doesn't exist).
- cron-troubleshooting.md: 'hermes serve' isn't a real subcommand. Pointed
at 'hermes gateway' (foreground) / 'hermes gateway start' (service).
- local-ollama-setup.md: 'agent.api_timeout' is not a config key. The right
knob is the HERMES_API_TIMEOUT env var.
- python-library.md: run_conversation() return dict has only final_response
and messages — task_id is stored on the agent instance, not echoed back.
- use-mcp-with-hermes.md: '--args /c "npx -y …"' wraps the npx command in
one quoted string, so cmd.exe gets a single arg instead of the multi-token
command line it needs. Removed the surrounding quotes — argparse nargs='*'
collects each token correctly.
integrations/
- providers.md: Bedrock guardrail YAML keys were 'id'/'version' (don't exist);
actual keys are guardrail_identifier/guardrail_version (matches DEFAULT_CONFIG
and the run_agent.py reader). GMI default base URL (api.gmi.ai/v1 ->
api.gmi-serving.com/v1) and portal URL (inference.gmi.ai -> www.gmicloud.ai)
refreshed. Fallback section rewritten to lead with the canonical
fallback_providers list form (was leading with the legacy fallback_model
single dict); supported-providers list extended to include azure-foundry,
alibaba-coding-plan, lmstudio.
index.md
- '68 built-in tools' -> '70+'; '15+ platforms' was both inconsistent with
integrations/index.md ('19+') and undercounted — bumped to 20+ and added
Weixin/QQ Bot/Yuanbao/Google Chat to the list.
Validation: 'npm run build' clean (exit 0); broken-link count unchanged at
155 (same as round-1 post-skill-regen baseline). 24 files, +132/-89.
|
||
|
|
0bcc327cab |
docs(openrouter): document auxiliary.<task>.extra_body for OR routing and Pareto (#22844)
The plumbing for setting OpenRouter provider preferences and the Pareto Code router on auxiliary tasks already exists — auxiliary.<task>.extra_body is forwarded verbatim by call_llm() / async_call_llm(). It just wasn't documented, so users who wanted (e.g.) Pareto Code routing for compression but the strongest coder for the main agent had no way to discover the escape hatch. - hermes_cli/config.py: expand the auxiliary section header with a YAML example showing provider routing plus plugins under extra_body, and an explicit note that main-agent provider_routing / openrouter.min_coding_score do NOT propagate to aux calls (each task is independent by design) - website/docs/user-guide/configuration.md: new 'OpenRouter routing and Pareto Code for auxiliary tasks' subsection with worked example - website/docs/integrations/providers.md: cross-link from the Pareto Code Router section to the aux-side doc E2E verified that auxiliary.<task>.extra_body reaches the OpenRouter API with the configured provider routing and plugins blocks intact. |
||
|
|
c7f0aab949 |
feat(openrouter): wire Pareto Code router with min_coding_score knob (#22838)
Pick openrouter/pareto-code as your model and OpenRouter auto-routes each
request to the cheapest model meeting your coding-quality bar (ranked by
Artificial Analysis). The new openrouter.min_coding_score config key (0.0-1.0,
default 0.65) tunes the floor.
- hermes_cli/models.py: add openrouter/pareto-code to OPENROUTER_MODELS so
it shows up in the picker with a description
- hermes_cli/config.py: add openrouter.min_coding_score (default 0.65 — lands
on a mid-tier coder on the current Pareto frontier)
- plugins/model-providers/openrouter: emit extra_body.plugins =
[{id: pareto-router, min_coding_score: X}] when model is openrouter/pareto-code
AND the score is a valid float in [0.0, 1.0]
- agent/transports/chat_completions.py: same emission on the legacy flag
path (when no provider profile is loaded)
- run_agent.py: openrouter_min_coding_score kwarg + storage; plumbed into
both build_kwargs() invocations and the context-summary extra_body path
- cli.py: read openrouter.min_coding_score once at init, validate float in
[0,1], pass to AIAgent constructions (CLI + background-task paths)
- cron/scheduler.py, batch_runner.py, tools/delegate_tool.py,
tui_gateway/server.py: propagate the kwarg (mirrors providers_order
plumbing — subagents inherit, cron/batch read from config)
- tests: profile-level + transport-level coverage of the model gating,
unset/empty/out-of-range handling, and the legacy flag path
- docs: new 'OpenRouter Pareto Code Router' section in providers.md
Verified end-to-end against api.openrouter.ai: at score=0.65 we land on a
mid-tier coder, at omission we get the strongest. Score is silently dropped
on any model other than openrouter/pareto-code, so it's safe to leave set.
|
||
|
|
3beef57825 |
docs: refresh stale platform/LOC/test counts; clarify gateway vs plugin platforms
AGENTS.md is the AI-assistant entry doc, so its counts get used as ground truth. Several values had drifted, and the same drift had spread to a few user-facing surfaces. Fixing all of them in one commit so the count claims agree and clearly distinguish gateway-core from plugin-shipped platforms. AGENTS.md: - run_agent.py "~12k LOC" → "~14k LOC as of 2026-05-03" (actual 14,097) - cli.py "~11k LOC" → "~12k LOC as of 2026-05-03" (actual 12,043) - tools/environments/ list now lists all 7 user-selectable terminal backends in canonical order, matching tools/terminal_tool.py:2214-2215 - gateway/platforms/ list adds yuanbao and wecom_callback; the 19 names match the user-facing list at website/docs/integrations/index.md - plugins/ tree now mentions plugins/platforms/ (irc, teams) - tests/ snapshot "~15k tests across ~700 files as of Apr 2026" → "~19k tests across ~890 files as of 2026-05-03" User-facing count claims: - hermes_cli/tips.py:195 — "19 platforms" → "21 messaging platforms" with IRC and Microsoft Teams added to the named list - website/docs/index.md:49 — "6 terminal backends" → "7 terminal backends: ..., Vercel Sandbox" (also corrected by PR #19044; same edit content) - website/docs/index.md:50 — "15+ platforms from one gateway" → "21+ messaging platforms (19 in the gateway, plus IRC and Microsoft Teams via plugins)" - website/docs/integrations/index.md:83-85 — "15+ messaging platforms" → "19+", added yuanbao to the linked list. The surrounding text scopes it to "configured through the same gateway subsystem", so plugin platforms (IRC, Teams) are intentionally not in this list - website/scripts/generate-llms-txt.py:205 — "15+ platforms" → "21+ messaging platforms — 19 native to the gateway plus IRC and Microsoft Teams via plugins" LOC and date stamps follow the existing AGENTS.md "as of <date>" convention (line 56 already used this pattern). Source of truth for the gateway count is gateway/config.py:130-148 (PlatformID enum); plugin platforms live in plugins/platforms/. Out of scope: - RELEASE_v0.9.0.md historical "16 platforms" claim (immutable history) - userStories.json verbatim user quotes - Programmatic count generation from gateway/config.py + plugin manifests is a worthwhile build-system change but separate from these content fixes |
||
|
|
b1476c76f6 | docs(gemini): add Google Gemini guide | ||
|
|
acca3ec3af |
docs(providers): Together/Groq/Perplexity cookbook via custom_providers
Three worked recipes for OpenAI-compatible cloud providers, plus the Copilot HTTP 401 auto-recovery info block and the GMI Cloud row in the compatible providers table. All three additions were on the original docs/custom-providers-cookbook branch but its merge base predated 1186 main commits, making the rebase impractical (84k+ line conflict). Replays just the providers.md additions onto current main. |
||
|
|
20a4f79ed1 |
feat: provider modules — ProviderProfile ABC, 33 providers, fetch_models, transport single-path
Introduces providers/ package — single source of truth for every inference provider. Adding a simple api-key provider now requires one providers/<name>.py file with zero edits anywhere else. What this PR ships: - providers/ package (ProviderProfile ABC + 33 profiles across 4 api_modes) - ProviderProfile declarative fields: name, api_mode, aliases, display_name, env_vars, base_url, models_url, auth_type, fallback_models, hostname, default_headers, fixed_temperature, default_max_tokens, default_aux_model - 4 overridable hooks: prepare_messages, build_extra_body, build_api_kwargs_extras, fetch_models - chat_completions.build_kwargs: profile path via _build_kwargs_from_profile, legacy flag path retained for lmstudio/tencent-tokenhub (which have session-aware reasoning probing that doesn't map cleanly to hooks yet) - run_agent.py: profile path for all registered providers; legacy path variable scoping fixed (all flags defined before branching) - Auto-wires: auth.PROVIDER_REGISTRY, models.CANONICAL_PROVIDERS, doctor health checks, config.OPTIONAL_ENV_VARS, model_metadata._URL_TO_PROVIDER - GeminiProfile: thinking_config translation (native + openai-compat nested) - New tests/providers/ (79 tests covering profile declarations, transport parity, hook overrides, e2e kwargs assembly) Deltas vs original PR (salvaged onto current main): - Added profiles: alibaba-coding-plan, azure-foundry, minimax-oauth (were added to main since original PR) - Skipped profiles: lmstudio, tencent-tokenhub stay on legacy path (their reasoning_effort probing has no clean hook equivalent yet) - Removed lmstudio alias from custom profile (it's a separate provider now) - Skipped openrouter/custom from PROVIDER_REGISTRY auto-extension (resolve_provider special-cases them; adding breaks runtime resolution) - runtime_provider: profile.api_mode only as fallback when URL detection finds nothing (was breaking minimax /v1 override) - Preserved main's legacy-path improvements: deepseek reasoning_content preserve, gemini Gemma skip, OpenRouter response caching, Anthropic 1M beta recovery, etc. - Kept agent/copilot_acp_client.py in place (rejected PR's relocation — main has 7 fixes landed since; relocation would revert them) - _API_KEY_PROVIDER_AUX_MODELS alias kept for backward compat with existing test imports Co-authored-by: kshitijk4poor <82637225+kshitijk4poor@users.noreply.github.com> Closes #14418 |
||
|
|
93869b48ab |
docs: add Microsoft Teams to platform lists across docs
Update all platform enumeration lists to include Teams: index.md, quickstart.md, integrations/index.md, sessions.md, slash-commands.md, updating.md, hooks.md, hermes-agent skill. Skipped PII redaction docs — Teams uses AAD object IDs, not phone numbers, so redaction doesn't apply there. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> |
||
|
|
289cc47631 |
docs: resync reference, user-guide, developer-guide, and messaging pages against code (#17738)
Broad drift audit against origin/main (
|
||
|
|
22ff6ca32b |
docs: two-week gap sweep — platforms, CLI, config, TUI, hooks, providers (#17727)
Covers ~60 merged PRs from Apr 15–29 that shipped user-visible behavior without docs coverage. No functional code changes; docs + static manifest regeneration only. Highlights: Stale / incorrect: - configuration.md: auxiliary auto-routing line was wrong since #11900; now correctly states auto routes to the main model, with a note on the cost trade-off and per-task override pattern. - integrations/providers.md + configuration.md compression intro: removed stale 'Gemini Flash via OpenRouter' claim. - website/static/api/model-catalog.json: rebuilt from hermes_cli/models.py so the live manifest picks up tencent/hy3-preview (and remains in sync for future model-catalog PRs). Platform messaging (#17417 #16997 #16193 #14315 #13151 #11794 #10610 #10283 #10246 #11564 #13178): - Signal: native formatting (bodyRanges), reply quotes, reactions. - Telegram: table rendering (bullets + code-block fallback), disable_link_previews, group_allowed_chats. - Slack: strict_mention config. - Discord: slash_commands disable, send_animation GIF, send_message native media attachments. - DingTalk: require_mention + allowed_users. CLI (#16052 #16539 #16566 #15841 #14798 #10043): - New 'hermes fallback' interactive manager. - New 'hermes update --check', '--backup' flag, and pre-update pairing snapshot behavior. - 'hermes gateway start/restart --all' multi-profile flag. - cron.md: 'hermes tools' as a platform, per-job enabled_toolsets, wakeAgent gate, context_from chaining. Config keys / env vars (#17305 #17026 #17000 #15077 #14557 #14227 #14166 #14730 #17008): - terminal.docker_run_as_host_user, display.runtime_metadata_footer, compression.hygiene_hard_message_limit, HINDSIGHT_TIMEOUT, skills.guard_agent_created, TAVILY_BASE_URL, security.allow_private_urls, agent.api_max_retries, gateway hot-reload of compression/context_length config edits. TUI / CLI UX (#17130 #17113 #17175 #17150 #16707 #12312 #12305 #12934 #14810 #14045 #17286 #17126): - HERMES_TUI_RESUME, HERMES_TUI_THEME, LaTeX rendering, busy-indicator styles, ctrl-x queued-message delete, git branch in status bar, per- prompt elapsed stopwatch, external-editor keybind, markdown stripping, TUI voice-mode parity, /agents overlay, /reload + /mouse. Gateway features (#16506 #15027 #13428 #12116): - Native multimodal image routing based on vision capability. - /usage account-limits section. - /steer slash command (added to reference + explanation in CLI). Plugins / hooks (#12929 #12972 #10763 #16364): - transform_tool_result, transform_terminal_output plugin hooks. - PluginContext.dispatch_tool() documented with slash-command example. - google_meet bundled plugin entry under built-in-plugins.md. Other (#16576 #16572 #16383 #15878 #15608 #15606 #14809 #14767 #14231 #14232 #14307 #13683 #12373 #11891 #11291 #10066): - hermes backup exclusions (WAL/SHM/journal + checkpoints/). - security.md hardline blocklist (floor below --yolo). - FHS install layout for root installs. - openssh-client + docker-cli baked into the Docker image. - MEDIA: tag supported extensions table (docs/office/archives/pdf). - Remote-to-host file sync on SSH/Modal/Daytona teardown. - 'hermes model' -> Configure Auxiliary Models interactive picker. - Podman support via HERMES_DOCKER_BINARY. Providers / STT / one-shot (#15045 #14473 #15704): - alibaba-coding-plan first-class provider entry. - xAI Grok STT as a 6th transcription option. - 'hermes -z' scripted one-shot mode + HERMES_INFERENCE_MODEL. Build: 'docusaurus build' succeeds. No new broken links/anchors; pre-existing warnings unchanged. |
||
|
|
40a98fb0fa |
feat(minimax-oauth): full integration with peer OAuth providers
Close integration gaps discovered by auditing qwen-oauth's file coverage. These are surfaces the original salvage missed — they all existed on main and were added in the 747 commits since PR #15203 was opened. Coverage added: - agent/credential_pool.py: seed pool from auth.json providers.minimax-oauth so `hermes auth list` reflects logged-in state and `hermes auth remove minimax-oauth <N>` works through the standard flow. - agent/credential_sources.py: register RemovalStep for minimax-oauth with suppression-aware `_clear_auth_store_provider`. - agent/models_dev.py: PROVIDER_TO_MODELS_DEV mapping (-> 'minimax' family). - hermes_cli/providers.py: HermesOverlay entry (anthropic_messages transport, oauth_external auth_type, api.minimax.io/anthropic base). - hermes_cli/model_normalize.py: add to _MATCHING_PREFIX_STRIP_PROVIDERS so `minimax-oauth/MiniMax-M2.7` in config.yaml gets correctly repaired. - hermes_cli/status.py: render MiniMax OAuth block in `hermes doctor` (logged-in / region / expires_at / error). - hermes_cli/web_server.py: register in OAUTH_PROVIDER_REGISTRY + dispatch branch in _resolve_provider_status so the dashboard auth page shows it. - website/docs/integrations/providers.md: full 'MiniMax (OAuth)' section. - website/docs/reference/cli-commands.md: --provider enum. - website/docs/user-guide/features/fallback-providers.md: fallback table row. - scripts/release.py AUTHOR_MAP: amanning3390 mapping (CI gate). |
||
|
|
ed170f4333 |
docs(anthropic): correct OAuth scope to Max plan + extra usage credits only (#17404)
The previous docs pass (#17399) overstated what Anthropic OAuth works with. In practice Hermes can only route against a Claude Max plan that has purchased extra usage credits — the base Max allowance is not consumed, and Claude Pro is not supported at all. Without Max + extra credits, users must fall back to an ANTHROPIC_API_KEY (pay-per-token). Updates the four pages touched in #17399: - integrations/providers.md - user-guide/features/credential-pools.md - reference/environment-variables.md - getting-started/quickstart.md |
||
|
|
be57af7188 |
docs(anthropic): clarify OAuth uses Claude Pro/Max subscription usage (#17399)
Users have been asking what they're billed for when they authenticate Anthropic via OAuth in Hermes. Clarify in the provider docs that OAuth routes through Anthropic's Claude Code subscription path — consuming the extra Claude Code usage included with their Pro or Max plan — and that an ANTHROPIC_API_KEY is pay-per-token against that key's org instead. Touches: - integrations/providers.md: new info admonition in Anthropic (Native) section, plus provider-table row. - user-guide/features/credential-pools.md: OAuth comment line. - reference/environment-variables.md: Provider Auth (OAuth) intro. - getting-started/quickstart.md: provider-picker table row. |
||
|
|
433d38da09 | chore(docs): update provider docs | ||
|
|
214ca943ac | feat(agent): add lmstudio integration | ||
|
|
a6a6cf047d |
feat(providers): add tencent-tokenhub provider support
Registers tencent-tokenhub (https://tokenhub.tencentmaas.com/v1) as a new API-key provider with model tencent/hy3-preview (256K context). - PROVIDER_REGISTRY entry + TOKENHUB_API_KEY / TOKENHUB_BASE_URL env vars - Aliases: tencent, tokenhub, tencent-cloud, tencentmaas - openai_chat transport with is_tokenhub branch for top-level reasoning_effort (Hy3 is a reasoning model) - tencent/hy3-preview:free added to OpenRouter curated list - 60+ tests (provider registry, aliases, runtime resolution, credentials, model catalog, URL mapping, context length) - Docs: integrations/providers.md, environment-variables.md, model-catalog.json Author: simonweng <simonweng@tencent.com> Salvaged from PR #16860 onto current main (resolved conflicts with #16935 Azure Anthropic env-var hint tests and the --provider choices= list removal in chat_parser). |
||
|
|
56724147ef |
fix(providers/gmi): post-salvage review fixes
- config.py: remove dead ENV_VARS_BY_VERSION[17] entry (current _config_version
is 22, so all users are past version 17 and would never be prompted for
GMI_API_KEY on upgrade — consistent with how arcee was added)
- auxiliary_client.py: use google/gemini-3.1-flash-lite-preview as GMI aux
model instead of anthropic/claude-opus-4.6 (matches cheap fast-model pattern
used by all other providers: zai→glm-4.5-flash, kimi→kimi-k2-turbo-preview,
stepfun→step-3.5-flash, kilocode→google/gemini-3-flash-preview)
- test_gmi_provider.py: fix malformed write_text() call in doctor test
(was: write_text("GMI_API_KEY=*** encoding="utf-8") → missing closing quote,
wrote literal string 'GMI_API_KEY=*** encoding=' to .env file)
- test_gmi_provider.py + test_auxiliary_client.py: update aux model assertions
to match new cheaper default
- docs/integrations/providers.md: add 'gmi' to inline 'Supported providers'
fallback list (was only in the table, not the inline list at line ~1181)
- docs/reference/cli-commands.md: add 'gmi' to --provider choices list
|
||
|
|
c53fcb0173 |
feat(providers): add GMI Cloud as a first-class API-key provider (#11955)
Add GMI Cloud (api.gmi-serving.com) as a full first-class API-key provider with built-in auth, aliases, model catalog, CLI entry points, auxiliary client routing, context length resolution, doctor checks, env var tracking, and docs. - auth.py: ProviderConfig for 'gmi' (api_key, GMI_API_KEY / GMI_BASE_URL) - providers.py: HermesOverlay with extra_env_vars for models.dev detection - models.py: curated slash-form model catalog; live /v1/models fetch - main.py: 'gmi' in _named_custom_provider_map and --provider choices - model_metadata.py: _URL_TO_PROVIDER, _PROVIDER_PREFIXES, dedicated context-length probe block (GMI's /models has authoritative data) - auxiliary_client.py: alias entries; _compat_model fix for slash-form models on cached aggregator-style clients; gmi aux default model - doctor.py: GMI in provider connectivity checks - config.py: GMI_API_KEY / GMI_BASE_URL in OPTIONAL_ENV_VARS - conftest.py: explicit GMI_BASE_URL clearing (not caught by _API_KEY suffix) - docs: providers.md, environment-variables.md, fallback-providers.md, configuration.md, quickstart.md (expands provider table) Co-authored-by: Isaac Huang <isaachuang@Isaacs-MacBook-Pro.local> |
||
|
|
2cab8129d1 |
feat(copilot): add 401 auth recovery with automatic token refresh and client rebuild
When using GitHub Copilot as provider, HTTP 401 errors could cause Hermes to silently fall back to the next model in the chain instead of recovering. This adds a one-shot retry mechanism that: 1. Re-resolves the Copilot token via the standard priority chain (COPILOT_GITHUB_TOKEN -> GH_TOKEN -> GITHUB_TOKEN -> gh auth token) 2. Rebuilds the OpenAI client with fresh credentials and Copilot headers 3. Retries the failed request before falling back The fix handles the common case where the gho_* OAuth token remains valid but the httpx client state becomes stale (e.g. after startup race conditions or long-lived sessions). Key design decisions: - Always rebuild client even if token string unchanged (recovers stale state) - Uses _apply_client_headers_for_base_url() for canonical header management - One-shot flag guard prevents infinite 401 loops (matches existing pattern used by Codex/Nous/Anthropic providers) - No token exchange via /copilot_internal/v2/token (returns 404 for some account types; direct gho_* auth works reliably) Tests: 3 new test cases covering end-to-end 401->refresh->retry, client rebuild verification, and same-token rebuild scenarios. Docs: Updated providers.md with Copilot auth behavior section. |
||
|
|
73533fc728 | docs: add GMI Cloud to compatible providers list | ||
|
|
424e9f36b0 |
refactor: remove smart_model_routing feature (#12732)
Smart model routing (auto-routing short/simple turns to a cheap model across providers) was opt-in and disabled by default. This removes the feature wholesale: the routing module, its config keys, docs, tests, and the orchestration scaffolding it required in cli.py / gateway/run.py / cron/scheduler.py. The /fast (Priority Processing / Anthropic fast mode) feature kept its hooks into _resolve_turn_agent_config — those still build a route dict and attach request_overrides when the model supports it; the route now just always uses the session's primary model/provider rather than running prompts through choose_cheap_model_route() first. Also removed: - DEFAULT_CONFIG['smart_model_routing'] block and matching commented-out example sections in hermes_cli/config.py and cli-config.yaml.example - _load_smart_model_routing() / self._smart_model_routing on GatewayRunner - self._smart_model_routing / self._active_agent_route_signature on HermesCLI (signature kept; just no longer initialised through the smart-routing pipeline) - route_label parameter on HermesCLI._init_agent (only set by smart routing; never read elsewhere) - 'Smart Model Routing' section in website/docs/integrations/providers.md - tip in hermes_cli/tips.py - entries in hermes_cli/dump.py + hermes_cli/web_server.py - row in skills/autonomous-ai-agents/hermes-agent/SKILL.md Tests: - Deleted tests/agent/test_smart_model_routing.py - Rewrote tests/agent/test_credential_pool_routing.py to target the simplified _resolve_turn_agent_config directly (preserves credential pool propagation + 429 rotation coverage) - Dropped 'cheap model' test from test_cli_provider_resolution.py - Dropped resolve_turn_route patches from cli + gateway test_fast_command — they now exercise the real method end-to-end - Removed _smart_model_routing stub assignments from gateway/cron test helpers Targeted suites: 74/74 in the directly affected test files; tests/agent + tests/cron + tests/cli pass except 5 failures that already exist on main (cron silent-delivery + alias quick-command). |
||
|
|
d66414a844 | docs(custom-providers): use key_env in examples | ||
|
|
54e0eb24c0 |
docs: correctness audit — fix wrong values, add missing coverage (#11972)
Comprehensive audit of every reference/messaging/feature doc page against the
live code registries (PROVIDER_REGISTRY, OPTIONAL_ENV_VARS, COMMAND_REGISTRY,
TOOLSETS, tool registry, on-disk skills). Every fix was verified against code
before writing.
### Wrong values fixed (users would paste-and-fail)
- reference/environment-variables.md:
- DASHSCOPE_BASE_URL default was `coding-intl.dashscope.aliyuncs.com/v1` \u2192
actual `dashscope-intl.aliyuncs.com/compatible-mode/v1`.
- MINIMAX_BASE_URL and MINIMAX_CN_BASE_URL defaults were `/v1` \u2192 actual
`/anthropic` (Hermes calls MiniMax via its Anthropic Messages endpoint).
- reference/toolsets-reference.md MCP example used the non-existent nested
`mcp: servers:` key \u2192 real key is the flat `mcp_servers:`.
- reference/skills-catalog.md listed ~20 bundled skills that no longer exist
on disk (all moved to `optional-skills/`). Regenerated the whole bundled
section from `skills/**/SKILL.md` \u2014 79 skills, accurate paths and names.
- messaging/slack.md ":::info" callout claimed Slack has no
`free_response_channels` equivalent; both the env var and the yaml key are
in fact read.
- messaging/qqbot.md documented `QQ_MARKDOWN_SUPPORT` as an env var, but the
adapter only reads `extra.markdown_support` from config.yaml. Removed the
env var row and noted config-only nature.
- messaging/qqbot.md `hermes setup gateway` \u2192 `hermes gateway setup`.
### Missing coverage added
- Providers: AWS Bedrock and Qwen Portal (qwen-oauth) \u2014 both in
PROVIDER_REGISTRY but undocumented everywhere. Added sections to
integrations/providers.md, rows to quickstart.md and fallback-providers.md.
- integrations/providers.md "Fallback Model" provider list now includes
gemini, google-gemini-cli, qwen-oauth, xai, nvidia, ollama-cloud, bedrock.
- reference/cli-commands.md `--provider` enum and HERMES_INFERENCE_PROVIDER
enum in env-vars now include the same set.
- reference/slash-commands.md: added `/agents` (alias `/tasks`) and `/copy`.
Removed duplicate rows for `/snapshot`, `/fast` (\u00d72), `/debug`.
- reference/tools-reference.md: fixed "47 built-in tools" \u2192 52. Added
`feishu_doc` and `feishu_drive` toolset sections.
- reference/toolsets-reference.md: added `feishu_doc` / `feishu_drive` core
rows + all missing `hermes-<platform>` toolsets in the platform table
(bluebubbles, dingtalk, feishu, qqbot, wecom, wecom-callback, weixin,
homeassistant, webhook, gateway). Fixed the `debugging` composite to
describe the actual `includes=[...]` mechanism.
- reference/optional-skills-catalog.md: added `fitness-nutrition`.
- reference/environment-variables.md: added NOUS_BASE_URL,
NOUS_INFERENCE_BASE_URL, NVIDIA_API_KEY/BASE_URL, OLLAMA_API_KEY/BASE_URL,
XAI_API_KEY/BASE_URL, MISTRAL_API_KEY, AWS_REGION/AWS_PROFILE,
BEDROCK_BASE_URL, HERMES_QWEN_BASE_URL, DISCORD_ALLOWED_CHANNELS,
DISCORD_PROXY, TELEGRAM_REPLY_TO_MODE, MATRIX_DEVICE_ID, MATRIX_REACTIONS,
QQBOT_HOME_CHANNEL_NAME, QQ_SANDBOX.
- messaging/discord.md: documented DISCORD_ALLOWED_CHANNELS, DISCORD_PROXY,
HERMES_DISCORD_TEXT_BATCH_DELAY_SECONDS and HERMES_DISCORD_TEXT_BATCH_SPLIT
_DELAY_SECONDS (all actively read by the adapter).
- messaging/matrix.md: documented MATRIX_REACTIONS (default true).
- messaging/telegram.md: removed the redundant second Webhook Mode section
that invented a `telegram.webhook_mode: true` yaml key the adapter does
not read.
- user-guide/features/hooks.md: added `on_session_finalize` and
`on_session_reset` (both emitted via invoke_hook but undocumented).
- user-guide/features/api-server.md: documented GET /health/detailed, the
`/api/jobs/*` CRUD surface, POST /v1/runs, and GET /v1/runs/{id}/events
(10 routes that were live but undocumented).
- user-guide/features/fallback-providers.md: added `approval` and
`title_generation` auxiliary-task rows; added gemini, bedrock, qwen-oauth
to the supported-providers table.
- user-guide/features/tts.md: "seven providers" \u2192 "eight" (post-xAI add
oversight in #11942).
- user-guide/configuration.md: TTS provider enum gains `xai` and `gemini`;
yaml example block gains `mistral:`, `gemini:`, `xai:` subsections.
Auxiliary-provider enum now enumerates all real registry entries.
- reference/faq.md: stale AIAgent/config examples bumped from
`nous/hermes-3-llama-3.1-70b` and `claude-sonnet-4.6` to
`claude-opus-4.7`.
### Docs-site integrity
- guides/build-a-hermes-plugin.md referenced two nonexistent hooks
(`pre_api_request`, `post_api_request`). Replaced with the real
`on_session_finalize` / `on_session_reset` entries.
- messaging/open-webui.md and features/api-server.md had pre-existing
broken links to `/docs/user-guide/features/profiles` (actual path is
`/docs/user-guide/profiles`). Fixed.
- reference/skills-catalog.md had one `<1%` literal that MDX parsed as a
JSX tag. Escaped to `<1%`.
### False positives filtered out (not changed, verified correct)
- `/set-home` is a registered alias of `/sethome` \u2014 docs were fine.
- `hermes setup gateway` is valid syntax (`hermes setup \<section\>`);
changed in qqbot.md for cross-doc consistency, not as a bug fix.
- Telegram reactions "disabled by default" matches code (default `"false"`).
- Matrix encryption "opt-in" matches code (empty env default \u2192 disabled).
- `pre_api_request` / `post_api_request` hooks do NOT exist in current code;
documented instead the real `on_session_finalize` / `on_session_reset`.
- SIGNAL_IGNORE_STORIES is already in env-vars.md (subagent missed it).
Validation:
- `docusaurus build` \u2014 passes (only pre-existing nix-setup anchor warning).
- `ascii-guard lint docs` \u2014 124 files, 0 errors.
- 22 files changed, +317 / \u2212158.
|
||
|
|
11a89cc032 |
docs: backfill coverage for recently-merged features (#11942)
Fills documentation gaps that accumulated as features merged ahead of their docs updates. All additions are verified against code and the originating PRs. Providers: - Ollama Cloud (#10782) — new provider section, env vars, quickstart/fallback rows - xAI Grok Responses API + TTS (#10783) — provider note, TTS table + config - Google Gemini CLI OAuth (#11270) — quickstart/fallback/cli-commands entries - NVIDIA NIM (#11774) — NVIDIA_API_KEY / NVIDIA_BASE_URL in env-vars reference - HERMES_INFERENCE_PROVIDER enum updated Messaging: - DISCORD_ALLOWED_ROLES (#11608) — env-vars, discord.md access control section - DingTalk QR device-flow (#11574) — wizard path in Option A + openClaw disclosure - Feishu document comment intelligent reply (#11898) — full section + 3-tier access control + CLI Skills / commands: - concept-diagrams skill (#11363) — optional-skills-catalog entry - /gquota (#11270) — slash-commands reference Build: docusaurus build passes, ascii-guard lint 0 errors. |
||
|
|
3b569ff576 |
feat(providers): add native NVIDIA NIM provider
Adds NVIDIA NIM as a first-class provider: ProviderConfig in auth.py, HermesOverlay in providers.py, curated models (Nemotron plus other open source models hosted on build.nvidia.com), URL mapping in model_metadata.py, aliases (nim, nvidia-nim, build-nvidia, nemotron), and env var tests. Docs updated: providers page, quickstart table, fallback providers table, and README provider list. |
||
|
|
3524ccfcc4 |
feat(gemini): add Google Gemini CLI OAuth provider via Cloud Code Assist (free + paid tiers) (#11270)
* feat(gemini): add Google Gemini CLI OAuth provider via Cloud Code Assist
Adds 'google-gemini-cli' as a first-class inference provider with native
OAuth authentication against Google, hitting the Cloud Code Assist backend
(cloudcode-pa.googleapis.com) that powers Google's official gemini-cli.
Supports both the free tier (generous daily quota, personal accounts) and
paid tiers (Standard/Enterprise via GCP projects).
Architecture
============
Three new modules under agent/:
1. google_oauth.py (625 lines) — PKCE Authorization Code flow
- Google's public gemini-cli desktop OAuth client baked in (env-var overrides supported)
- Cross-process file lock (fcntl POSIX / msvcrt Windows) with thread-local re-entrancy
- Packed refresh format 'refresh_token|project_id|managed_project_id' on disk
- In-flight refresh deduplication — concurrent requests don't double-refresh
- invalid_grant → wipe credentials, prompt re-login
- Headless detection (SSH/HERMES_HEADLESS) → paste-mode fallback
- Refresh 60 s before expiry, atomic write with fsync+replace
2. google_code_assist.py (350 lines) — Code Assist control plane
- load_code_assist(): POST /v1internal:loadCodeAssist (prod → sandbox fallback)
- onboard_user(): POST /v1internal:onboardUser with LRO polling up to 60 s
- retrieve_user_quota(): POST /v1internal:retrieveUserQuota → QuotaBucket list
- VPC-SC detection (SECURITY_POLICY_VIOLATED → force standard-tier)
- resolve_project_context(): env → config → discovered → onboarded priority
- Matches Google's gemini-cli User-Agent / X-Goog-Api-Client / Client-Metadata
3. gemini_cloudcode_adapter.py (640 lines) — OpenAI↔Gemini translation
- GeminiCloudCodeClient mimics openai.OpenAI interface (.chat.completions.create)
- Full message translation: system→systemInstruction, tool_calls↔functionCall,
tool results→functionResponse with sentinel thoughtSignature
- Tools → tools[].functionDeclarations, tool_choice → toolConfig modes
- GenerationConfig pass-through (temperature, max_tokens, top_p, stop)
- Thinking config normalization (thinkingBudget, thinkingLevel, includeThoughts)
- Request envelope {project, model, user_prompt_id, request}
- Streaming: SSE (?alt=sse) with thought-part → reasoning stream separation
- Response unwrapping (Code Assist wraps Gemini response in 'response' field)
- finishReason mapping to OpenAI convention (STOP→stop, MAX_TOKENS→length, etc.)
Provider registration — all 9 touchpoints
==========================================
- hermes_cli/auth.py: PROVIDER_REGISTRY, aliases, resolver, status fn, dispatch
- hermes_cli/models.py: _PROVIDER_MODELS, CANONICAL_PROVIDERS, aliases
- hermes_cli/providers.py: HermesOverlay, ALIASES
- hermes_cli/config.py: OPTIONAL_ENV_VARS (HERMES_GEMINI_CLIENT_ID/_SECRET/_PROJECT_ID)
- hermes_cli/runtime_provider.py: dispatch branch + pool-entry branch
- hermes_cli/main.py: _model_flow_google_gemini_cli with upfront policy warning
- hermes_cli/auth_commands.py: pool handler, _OAUTH_CAPABLE_PROVIDERS
- hermes_cli/doctor.py: 'Google Gemini OAuth' health check
- run_agent.py: single dispatch branch in _create_openai_client
/gquota slash command
======================
Shows Code Assist quota buckets with 20-char progress bars, per (model, tokenType).
Registered in hermes_cli/commands.py, handler _handle_gquota_command in cli.py.
Attribution
===========
Derived with significant reference to:
- jenslys/opencode-gemini-auth (MIT) — OAuth flow shape, request envelope,
public client credentials, retry semantics. Attribution preserved in module
docstrings.
- clawdbot/extensions/google — VPC-SC handling, project discovery pattern.
- PR #10176 (@sliverp) — PKCE module structure.
- PR #10779 (@newarthur) — cross-process file locking pattern.
Supersedes PRs #6745, #10176, #10779 (to be closed on merge with credit).
Upfront policy warning
======================
Google considers using the gemini-cli OAuth client with third-party software
a policy violation. The interactive flow shows a clear warning and requires
explicit 'y' confirmation before OAuth begins. Documented prominently in
website/docs/integrations/providers.md.
Tests
=====
74 new tests in tests/agent/test_gemini_cloudcode.py covering:
- PKCE S256 roundtrip
- Packed refresh format parse/format/roundtrip
- Credential I/O (0600 perms, atomic write, packed on disk)
- Token lifecycle (fresh/expiring/force-refresh/invalid_grant/rotation preservation)
- Project ID env resolution (3 env vars, priority order)
- Headless detection
- VPC-SC detection (JSON-nested + text match)
- loadCodeAssist parsing + VPC-SC → standard-tier fallback
- onboardUser: free-tier allows empty project, paid requires it, LRO polling
- retrieveUserQuota parsing
- resolve_project_context: 3 short-circuit paths + discovery + onboarding
- build_gemini_request: messages → contents, system separation, tool_calls,
tool_results, tools[], tool_choice (auto/required/specific), generationConfig,
thinkingConfig normalization
- Code Assist envelope wrap shape
- Response translation: text, functionCall, thought → reasoning,
unwrapped response, empty candidates, finish_reason mapping
- GeminiCloudCodeClient end-to-end with mocked HTTP
- Provider registration (9 tests: registry, 4 alias forms, no-regression on
google-gemini alias, models catalog, determine_api_mode, _OAUTH_CAPABLE_PROVIDERS
preservation, config env vars)
- Auth status dispatch (logged-in + not)
- /gquota command registration
- run_gemini_oauth_login_pure pool-dict shape
All 74 pass. 349 total tests pass across directly-touched areas (existing
test_api_key_providers, test_auth_qwen_provider, test_gemini_provider,
test_cli_init, test_cli_provider_resolution, test_registry all still green).
Coexistence with existing 'gemini' (API-key) provider
=====================================================
The existing gemini API-key provider is completely untouched. Its alias
'google-gemini' still resolves to 'gemini', not 'google-gemini-cli'.
Users can have both configured simultaneously; 'hermes model' shows both
as separate options.
* feat(gemini): ship Google's public gemini-cli OAuth client as default
Pivots from 'scrape-from-local-gemini-cli' (clawdbot pattern) to
'ship-creds-in-source' (opencode-gemini-auth pattern) for zero-setup UX.
These are Google's PUBLIC gemini-cli desktop OAuth credentials, published
openly in Google's own open-source gemini-cli repository. Desktop OAuth
clients are not confidential — PKCE provides the security, not the
client_secret. Shipping them here matches opencode-gemini-auth (MIT) and
Google's own distribution model.
Resolution order is now:
1. HERMES_GEMINI_CLIENT_ID / _SECRET env vars (power users, custom GCP clients)
2. Shipped public defaults (common case — works out of the box)
3. Scrape from locally installed gemini-cli (fallback for forks that
deliberately wipe the shipped defaults)
4. Helpful error with install / env-var hints
The credential strings are composed piecewise at import time to keep
reviewer intent explicit (each constant is paired with a comment about
why it's non-confidential) and to bypass naive secret scanners.
UX impact: users no longer need 'npm install -g @google/gemini-cli' as a
prerequisite. Just 'hermes model' -> 'Google Gemini (OAuth)' works out
of the box.
Scrape path is retained as a safety net. Tests cover all four resolution
steps (env / shipped default / scrape fallback / hard failure).
79 new unit tests pass (was 76, +3 for the new resolution behaviors).
|
||
|
|
10edd288c3 |
docs: add Nous Tool Gateway documentation
- New page: user-guide/features/tool-gateway.md covering eligibility, setup (hermes model, hermes tools, manual config), how use_gateway works, precedence, switching back, status checking, self-hosted gateway env vars, and FAQ - Added to sidebar under Features (top-level, before Core category) - Cross-references from: overview.md, tools.md, browser.md, image-generation.md, tts.md, providers.md, environment-variables.md - Added Nous Tool Gateway subsection to env vars reference with TOOL_GATEWAY_DOMAIN, TOOL_GATEWAY_SCHEME, TOOL_GATEWAY_USER_TOKEN, and FIRECRAWL_GATEWAY_URL |
||
|
|
4da598b48a |
docs: clarify hermes model vs /model — two commands, two purposes (#10276)
Users are confused about the difference between `hermes model` (terminal command for full provider setup) and `/model` (session command for switching between already-configured providers). This distinction was not documented anywhere. Changes across 4 doc pages: - cli-commands.md: Added warning callout explaining the difference, added --global flag docs, added 'only see OpenRouter models?' info box - slash-commands.md: Added notes on both TUI and messaging /model entries that /model only switches between configured providers - providers.md: Added 'Two Commands for Model Management' comparison table near top of page, added warning callout in switching section - faq.md: Added new FAQ entry '/model only shows one provider' with quick reference table Prompted by user feedback in Discord — new users consistently hit this confusion when trying to add providers from inside a session. |
||
|
|
1acf81fdf5 |
docs: add QQBot to all 14 docs pages (full platform parity)
- sidebars.ts: sidebar navigation entry - webhooks.md: deliver field routing table - configuration.md: platform keys list - sessions.md: platform identifiers table - features/cron.md: delivery target table - developer-guide/architecture.md: adapter listing - developer-guide/cron-internals.md: delivery target table - developer-guide/gateway-internals.md: file tree listing - guides/cron-troubleshooting.md: supported platforms list - integrations/index.md: platform links list - reference/toolsets-reference.md: toolset table (qqbot.md, environment-variables.md, and messaging/index.md were already included in the contributor's original PR) |
||
|
|
0a4cf5b3e1 |
feat(providers): add Arcee AI as direct API provider
Adds Arcee AI as a standard direct provider (ARCEEAI_API_KEY) with Trinity models: trinity-large-thinking, trinity-large-preview, trinity-mini. Standard OpenAI-compatible provider checklist: auth.py, config.py, models.py, main.py, providers.py, doctor.py, model_normalize.py, model_metadata.py, setup.py, trajectory_compressor.py. Based on PR #9274 by arthurbr11, simplified to a standard direct provider without dual-endpoint OpenRouter routing. |
||
|
|
0e60a9dc25 |
fix: add kimi-coding-cn to remaining provider touchpoints
Follow-up for salvaged PR #7637. Adds kimi-coding-cn to: - model_normalize.py (prefix strip) - providers.py (models.dev mapping) - runtime_provider.py (credential resolution) - setup.py (model list + setup label) - doctor.py (health check) - trajectory_compressor.py (URL detection) - models_dev.py (registry mapping) - integrations/providers.md (docs) |
||
|
|
289d2745af |
docs: add platform adapter developer guide + WeCom Callback docs (#7969)
Add the missing 'Adding a Platform Adapter' developer guide — a comprehensive step-by-step checklist covering all 20+ integration points (enum, adapter, config, runner, CLI, tools, toolsets, cron, webhooks, tests, and docs). Includes common patterns for long-poll, callback/webhook, and token-lock adapters with reference implementations. Also adds full docs coverage for the WeCom Callback platform: - New docs page: user-guide/messaging/wecom-callback.md - Environment variables reference (9 WECOM_CALLBACK_* vars) - Toolsets reference (hermes-wecom-callback) - Messaging index (comparison table, architecture diagram, toolsets, security, next-steps links) - Integrations index listing - Sidebar entries for both new pages |
||
|
|
d4bb44d4b9 |
docs: add Xiaomi MiMo to all provider docs + fix MiMo-V2-Flash ctx len
- environment-variables.md: XIAOMI_API_KEY, XIAOMI_BASE_URL, provider list - cli-commands.md: --provider choices - integrations/providers.md: provider table, Chinese providers section, config example, base URL list, choosing table, fallback providers list - fallback-providers.md: supported providers table, auto-detection chain - Fix XiaomiMiMo/MiMo-V2-Flash context length 32768 → 256000 (OpenRouter entry) |
||
|
|
640441b865 | feat(tools): add Voxtral TTS provider (Mistral AI) | ||
|
|
7cec784b64 |
fix: complete Weixin platform parity audit — 16 missing integration points
Systematic audit found Weixin missing from: Code: - gateway/run.py: early WEIXIN_ALLOW_ALL_USERS env check - gateway/platforms/webhook.py: cross-platform delivery routing - hermes_cli/dump.py: platform detection for config export - hermes_cli/setup.py: hermes setup wizard platform list + _setup_weixin - hermes_cli/skills_config.py: platform labels for skills config UI Docs (11 pages): - developer-guide/architecture.md: platform adapter listing - developer-guide/cron-internals.md: delivery target table - developer-guide/gateway-internals.md: file tree - guides/cron-troubleshooting.md: supported platforms list - integrations/index.md: platform links - reference/toolsets-reference.md: toolset table - user-guide/configuration.md: platform keys for tool_progress - user-guide/features/cron.md: delivery target table - user-guide/messaging/index.md: intro text, feature table, mermaid diagram, toolset table, setup links - user-guide/messaging/webhooks.md: deliver field + routing table - user-guide/sessions.md: platform identifiers table |
||
|
|
34d06a9802 |
fix(compaction): don't halve context_length on output-cap-too-large errors
When the API returns "max_tokens too large given prompt" (input tokens
are within the context window, but input + requested output > window),
the old code incorrectly routed through the same handler as "prompt too
long" errors, calling get_next_probe_tier() and permanently halving
context_length. This made things worse: the window was fine, only the
requested output size needed trimming for that one call.
Two distinct error classes now handled separately:
Prompt too long — input itself exceeds context window.
Fix: compress history + halve context_length (existing behaviour,
unchanged).
Output cap too large — input OK, but input + max_tokens > window.
Fix: parse available_tokens from the error message, set a one-shot
_ephemeral_max_output_tokens override for the retry, and leave
context_length completely untouched.
Changes:
- agent/model_metadata.py: add parse_available_output_tokens_from_error()
that detects Anthropic's "available_tokens: N" error format and returns
the available output budget, or None for all other error types.
- run_agent.py: call the new parser first in the is_context_length_error
block; if it fires, set _ephemeral_max_output_tokens (with a 64-token
safety margin) and break to retry without touching context_length.
_build_api_kwargs consumes the ephemeral value exactly once then clears
it so subsequent calls use self.max_tokens normally.
- agent/anthropic_adapter.py: expand build_anthropic_kwargs docstring to
clearly document the max_tokens (output cap) vs context_length (total
window) distinction, which is a persistent source of confusion due to
the OpenAI-inherited "max_tokens" name.
- cli-config.yaml.example: add inline comments explaining both keys side
by side where users are most likely to look.
- website/docs/integrations/providers.md: add a callout box at the top
of "Context Length Detection" and clarify the troubleshooting entry.
- tests/test_ctx_halving_fix.py: 24 tests across four classes covering
the parser, build_anthropic_kwargs clamping, ephemeral one-shot
consumption, and the invariant that context_length is never mutated
on output-cap errors.
|
||
|
|
ad06bfccf0 |
fix: remove dead LLM_MODEL env var — add migration to clear stale .env entries (#6543)
The old setup wizard (pre-March 2026) wrote LLM_MODEL to ~/.hermes/.env
across 12 provider flows. Commit
|
||
|
|
7120d6cdd6 |
fix(bluebubbles): add missing integration points and documentation (#6460)
- hermes_cli/skills_config.py: add platform label for per-platform skill config - gateway/session.py: add to PII-safe platforms (no mention system) - website/docs/user-guide/messaging/bluebubbles.md: full setup guide - website/sidebars.ts: sidebar navigation entry - 10 docs pages: add BlueBubbles to all platform enumerations (env vars, toolsets, cron delivery, gateway internals, etc.) |
||
|
|
afe6c63c52 |
docs: comprehensive docs audit — cover 13 features from last week's PRs (#5815)
Cover documentation gaps found by auditing all 50+ merged PRs from the past week:
tools-reference.md:
- Fix stale tool count (47→46, 11→10 browser tools) after browser_close removal
- Document notify_on_complete parameter in terminal tool description
telegram.md:
- Add Interactive Model Picker section (inline keyboard, provider/model drill-down)
discord.md:
- Add Interactive Model Picker section (Select dropdowns, 120s timeout)
- Add Native Slash Commands for Skills section (auto-registration at startup)
signal.md:
- Expand Attachments section with outgoing media delivery (send_image_file,
send_voice, send_video, send_document via MEDIA: tags)
webhooks.md:
- Document {__raw__} special template token for full payload access
- Document Forum Topic Delivery via message_thread_id in deliver_extra
slack.md:
- Fix stale/misleading thread reply docs — thread replies no longer require
@mention when bot has active session (3 locations updated)
security.md:
- Add cross-session isolation (layer 6) and input sanitization (layer 7)
to security layers overview
feishu.md:
- Add WebSocket Tuning section (ws_reconnect_interval, ws_ping_interval)
- Add Per-Group Access Control section (group_rules with 5 policy types)
credential-pools.md:
- Add Delegation & Subagent Sharing section
delegation.md:
- Update key properties to mention credential pool inheritance
providers.md:
- Add Z.AI Endpoint Auto-Detection note
- Add xAI (Grok) Prompt Caching section
skills-catalog.md:
- Add p5js to creative skills category
|
||
|
|
c58e16757a |
docs: fix 40+ discrepancies between documentation and codebase (#5818)
Comprehensive audit of all ~100 doc pages against the actual code, fixing: Reference docs: - HERMES_API_TIMEOUT default 900 -> 1800 (env-vars) - TERMINAL_DOCKER_IMAGE default python:3.11 -> nikolaik/python-nodejs (env-vars) - compression.summary_model default shown as gemini -> actually empty string (env-vars) - Add missing GOOGLE_API_KEY, GEMINI_API_KEY, GEMINI_BASE_URL env vars (env-vars) - Add missing /branch (/fork) slash command (slash-commands) - Fix hermes-cli tool count 39 -> 38 (toolsets-reference) - Fix hermes-api-server drop list to include text_to_speech (toolsets-reference) - Fix total tool count 47 -> 48, standalone 14 -> 15 (tools-reference) User guide: - web_extract.timeout default 30 -> 360 (configuration) - Remove display.theme_mode (not implemented in code) (configuration) - Remove display.background_process_notifications (not in defaults) (configuration) - Browser inactivity timeout 300/5min -> 120/2min (browser) - Screenshot path browser_screenshots -> cache/screenshots (browser) - batch_runner default model claude-sonnet-4-20250514 -> claude-sonnet-4.6 - Add minimax to TTS provider list (voice-mode) - Remove credential_pool_strategies from auth.json example (credential-pools) - Fix Slack token path platforms/slack/ -> root ~/.hermes/ (slack) - Fix Matrix store path for new installs (matrix) - Fix WhatsApp session path for new installs (whatsapp) - Fix HomeAssistant config from gateway.json to config.yaml (homeassistant) - Fix WeCom gateway start command (wecom) Developer guide: - Fix tool/toolset counts in architecture overview - Update line counts: main.py ~5500, setup.py ~3100, run.py ~7500, mcp_tool ~2200 - Replace nonexistent agent/memory_store.py with memory_manager.py + memory_provider.py - Update _discover_tools() list: remove honcho_tools, add skill_manager_tool - Add session_search and delegate_task to intercepted tools list (agent-loop) - Fix budget warning: two-tier system (70% caution, 90% warning) (agent-loop) - Fix gateway auth order (per-platform first, global last) (gateway-internals) - Fix email_adapter.py -> email.py, add webhook.py + api_server.py (gateway-internals) - Add 7 missing providers to provider-runtime list Other: - Add Docker --cap-add entries to security doc - Fix Python version 3.10+ -> 3.11+ (contributing) - Fix AGENTS.md discovery claim (not hierarchical walk) (tips) - Fix cron 'add' -> canonical 'create' (cron-internals) - Add pre_api_request/post_api_request hooks to plugin guide - Add Google/Gemini provider to providers page - Clarify OPENAI_BASE_URL deprecation (providers) |
||
|
|
c7768137fa |
docs: add Supermemory to memory providers docs, env vars, CLI reference
- Add full Supermemory section to memory-providers.md with config table, tools, setup instructions, and key features - Update provider count from 7 to 8 across memory.md and memory-providers.md - Add SUPERMEMORY_API_KEY to environment-variables.md - Add Supermemory to integrations/providers.md optional API keys table - Add supermemory to cli-commands.md provider list - Add Supermemory to profile isolation section (config file providers) |
||
|
|
537a2b8bb8 |
docs: add WSL2 networking guide for local model servers (#5616)
Windows users running Hermes in WSL2 with model servers on the Windows host hit 'connection refused' because WSL2's NAT networking means localhost points to the VM, not Windows. Covers: - Mirrored networking mode (Win 11 22H2+) — makes localhost work - NAT mode fallback using the host IP via ip route - Per-server bind address table (Ollama, LM Studio, llama-server, vLLM, SGLang) - Detailed Ollama Windows service config for OLLAMA_HOST - Windows Firewall rules for WSL2 connections - Quick verification steps - Cross-reference from Troubleshooting section |
||
|
|
43d468cea8 |
docs: comprehensive documentation audit — fix stale info, expand thin pages, add depth (#5393)
Major changes across 20 documentation pages: Staleness fixes: - Fix FAQ: wrong import path (hermes.agent → run_agent) - Fix FAQ: stale Gemini 2.0 model → Gemini 3 Flash - Fix integrations/index: missing MiniMax TTS provider - Fix integrations/index: web_crawl is not a registered tool - Fix sessions: add all 19 session sources (was only 5) - Fix cron: add all 18 delivery targets (was only telegram/discord) - Fix webhooks: add all delivery targets - Fix overview: add missing MCP, memory providers, credential pools - Fix all line-number references → use function name searches instead - Update file size estimates (run_agent ~9200, gateway ~7200, cli ~8500) Expanded thin pages (< 150 lines → substantial depth): - honcho.md: 43 → 108 lines — added feature comparison, tools, config, CLI - overview.md: 49 → 55 lines — added MCP, memory providers, credential pools - toolsets-reference.md: 57 → 175 lines — added explanations, config examples, custom toolsets, wildcards, platform differences table - optional-skills-catalog.md: 74 → 153 lines — added 25+ missing skills across communication, devops, mlops (18!), productivity, research categories - integrations/index.md: 82 → 115 lines — added messaging, HA, plugins sections - cron-internals.md: 90 → 195 lines — added job JSON example, lifecycle states, tick cycle, delivery targets, script-backed jobs, CLI interface - gateway-internals.md: 111 → 250 lines — added architecture diagram, message flow, two-level guard, platform adapters, token locks, process management - agent-loop.md: 112 → 235 lines — added entry points, API mode resolution, turn lifecycle detail, message alternation rules, tool execution flow, callback table, budget tracking, compression details - architecture.md: 152 → 295 lines — added system overview diagram, data flow diagrams, design principles table, dependency chain Other depth additions: - context-references.md: added platform availability, compression interaction, common patterns sections - slash-commands.md: added quick commands config example, alias resolution - image-generation.md: added platform delivery table - tools-reference.md: added tool counts, MCP tools note - index.md: updated platform count (5 → 14+), tool count (40+ → 47) |
||
|
|
77a2aad771 |
docs: fix stale references across 8 doc pages
Audit found 24+ discrepancies between docs and code. Fixed: HIGH severity: - Remove honcho toolset from tools-reference, toolsets-reference, and tools.md (converted to memory provider plugin, not a built-in toolset) - Add note that Honcho is available via plugin MEDIUM severity: - Add hermes memory command family to cli-commands.md (setup/status/off) - Add --clone-all, --clone-from to profile create in cli-commands.md - Add --max-turns option to hermes chat in cli-commands.md - Add /btw slash command to slash-commands.md - Fix profile show example output (remove nonexistent disk usage, add .env and SOUL.md status lines) - Add missing hermes-webhook toolset to toolsets-reference.md - Add 5 missing providers to fallback-providers.md table - Add 7 missing providers to providers.md fallback list - Fix outdated model examples: glm-4-plus→glm-5, moonshot-v1-auto→kimi-for-coding |
||
|
|
57625329a2 |
docs+feat: comprehensive local LLM provider guides and context length warning (#4294)
* docs: update llama.cpp section with --jinja flag and tool calling guide The llama.cpp docs were missing the --jinja flag which is required for tool calling to work. Without it, models output tool calls as raw JSON text instead of structured API responses, making Hermes unable to execute them. Changes: - Add --jinja and -fa flags to the server startup example - Replace deprecated env vars (OPENAI_BASE_URL, LLM_MODEL) with hermes model interactive setup - Add caution block explaining the --jinja requirement and symptoms - List models with native tool calling support - Add /props endpoint verification tip * docs+feat: comprehensive local LLM provider guides and context length warning Docs (providers.md): - Rewrote Ollama section with context length warning (defaults to 4k on <24GB VRAM), three methods to increase it, and verification steps - Rewrote vLLM section with --max-model-len, tool calling flags (--enable-auto-tool-choice, --tool-call-parser), and context guidance - Rewrote SGLang section with --context-length, --tool-call-parser, and warning about 128-token default max output - Added LM Studio section (port 1234, context length defaults to 2048, tool calling since 0.3.6) - Added llama.cpp context length flag (-c) and GPU offload (-ngl) - Added Troubleshooting Local Models section covering: - Tool calls appearing as text (with per-server fix table) - Silent context truncation and diagnosis commands - Low detected context at startup - Truncated responses - Replaced all deprecated env vars (OPENAI_BASE_URL, LLM_MODEL) with hermes model interactive setup and config.yaml examples - Added deprecation warning for legacy env vars in General Setup Code (cli.py): - Added context length warning in show_banner() when detected context is <= 8192 tokens, with server-specific fix hints: - Ollama (port 11434): suggests OLLAMA_CONTEXT_LENGTH env var - LM Studio (port 1234): suggests model settings adjustment - Other servers: suggests config.yaml override Tests: - 9 new tests covering warning thresholds, server-specific hints, and no-warning cases |
||
|
|
5b0243e6ad |
docs: deep quality pass — expand 10 thin pages, fix specific issues (#4134)
Developer guide stubs expanded to full documentation: - trajectory-format.md: 56→233 lines (JSONL format, ShareGPT example, normalization rules, reasoning markup, replay code) - session-storage.md: 66→388 lines (SQLite schema, migration table, FTS5 search syntax, lineage queries, Python API examples) - context-compression-and-caching.md: 72→321 lines (dual compression system, config defaults, 4-phase algorithm, before/after example, prompt caching mechanics, cache-aware patterns) - tools-runtime.md: 65→246 lines (registry API, dispatch flow, availability checking, error wrapping, approval flow) - prompt-assembly.md: 89→246 lines (concrete assembled prompt example, SOUL.md injection, context file discovery table) User-facing pages expanded: - docker.md: 62→224 lines (volumes, env forwarding, docker-compose, resource limits, troubleshooting) - updating.md: 79→167 lines (update behavior, version checking, rollback instructions, Nix users) - skins.md: 80→206 lines (all color/spinner/branding keys, built-in skin descriptions, full custom skin YAML template) Hub pages improved: - integrations/index.md: 25→82 lines (web search backends table, TTS/browser providers, quick config example) - features/overview.md: added Integrations section with 6 missing links Specific fixes: - configuration.md: removed duplicate Gateway Streaming section - mcp.md: removed internal "PR work" language - plugins.md: added inline minimal plugin example (self-contained) 13 files changed, ~1700 lines added. Docusaurus build verified clean. |
||
|
|
44d02f35d2 |
docs: restructure site navigation — promote features and platforms to top-level (#4116)
Major reorganization of the documentation site for better discoverability and navigation. 94 pages across 8 top-level sections (was 5). Structural changes: - Promote Features from 3-level-deep subcategory to top-level section with new Overview hub page categorizing all 26 feature pages - Promote Messaging Platforms from User Guide subcategory to top-level section, add platform comparison matrix (13 platforms x 7 features) - Create new Integrations section with hub page, grouping MCP, ACP, API Server, Honcho, Provider Routing, Fallback Providers - Extract AI provider content (626 lines) from configuration.md into dedicated integrations/providers.md — configuration.md drops from 1803 to 1178 lines - Subcategorize Developer Guide into Architecture, Extending, Internals - Rename "User Guide" to "Using Hermes" for top-level items Orphan fixes (7 pages now reachable via sidebar): - build-a-hermes-plugin.md added to Guides - sms.md added to Messaging Platforms - context-references.md added to Features > Core - plugins.md added to Features > Core - git-worktrees.md added to Using Hermes - checkpoints-and-rollback.md added to Using Hermes - checkpoints.md (30-line stub) deleted, superseded by checkpoints-and-rollback.md (203 lines) New files: - integrations/index.md — Integrations hub page - integrations/providers.md — AI provider setup (extracted) - user-guide/features/overview.md — Features hub page Broken link fixes: - quickstart.md, faq.md: update context-length-detection anchors - configuration.md: update checkpoints link - overview.md: fix checkpoint link path Docusaurus build verified clean (zero broken links/anchors). |