Addresses findings from two self-review passes pre-merge.
First pass (3-agent parallel review):
1. plugins/browser/browser_use/provider.py: drop the
``_ = managed_nous_tools_enabled`` dead-import-hider in
_get_config_or_none(). The import was actively misleading — the
helper IS used in _get_config() (separate method, separate import),
not here. The "keep static analysis happy" comment was wrong about
what the helper does in this scope.
2. agent/browser_provider.py: drop ``pragma: no cover`` from
is_configured() / provider_name() backward-compat aliases. They ARE
covered by ``TestLegacyAbcAliases`` — the pragma would have masked
future regressions.
3. tools/browser_tool.py: refactor _is_legacy_provider_registry_overridden()
to compare against a module-frozen _DEFAULT_PROVIDER_REGISTRY snapshot
instead of hardcoded set of 3 keys. Future maintainers adding a 4th
built-in provider now just extend _PROVIDER_REGISTRY; the override
detection adapts automatically. Previously the hardcoded
``set(...) != {"browserbase", "browser-use", "firecrawl"}`` would flip
True forever on any 4-key registry, silently routing every install
onto the legacy fixture path.
4. tools/browser_tool.py: when explicit ``browser.cloud_provider`` is set
but the registry has no matching plugin (typo, uninstalled plugin,
discovery failure), emit a WARNING with actionable text instead of
silently falling through to auto-detect. Legacy code surfaced a typed
credentials error via direct class instantiation; this log restores
the signal in the post-migration path.
5. agent/browser_registry.py: trim the triple-redundant _LEGACY_PREFERENCE
documentation. Module docstring + 13-line block-comment + 5-line
inline comment was repeating the same point. Kept the docstring and
trimmed the block-comment to 5 lines.
6. agent/browser_registry.py: upgrade is_available()-raised logging from
DEBUG to WARNING with exc_info=True. A provider's availability check
throwing is unusual enough that users debugging "no cloud provider"
need the traceback in logs.
7. tests/plugins/browser/check_parity_vs_main.py: drop dead top-level
imports (os, shutil, tempfile — only referenced inside the
SUBPROCESS_SCRIPT string literal that runs in a child process).
Second pass (architecture + claim-verification review):
8. tools/browser_tool.py: rewrite the inline comment in _get_cloud_provider
auto-detect branch. Prior text claimed it "routes through the plugin
registry's legacy preference walk so third-party plugins still get a
chance to be selected when they're explicitly configured" — false on
both counts. The branch uses module-level legacy class aliases
(BrowserUseProvider / BrowserbaseProvider) directly; third-party
plugins are intentionally reachable only via explicit
``browser.cloud_provider``. Corrected comment now matches behaviour
and cross-references _LEGACY_PREFERENCE for the firecrawl gate
rationale.
9. tools/browser_tool.py + tests/tools/test_managed_browserbase_and_modal.py:
drop the unused ``get_active_browser_provider as
_registry_get_active_browser_provider`` alias from the
``from agent.browser_registry import ...`` block. It was never
referenced; matching test-stub line in the agent.browser_registry
SimpleNamespace also dropped. ``get_provider`` is still imported (used
by the explicit-config dispatch path at line 535).
10. plugins/browser/firecrawl/provider.py: align emergency_cleanup()
with the early-guard pattern used in browserbase + browser_use
plugins. Previously firecrawl tried the DELETE and relied on
``_headers()`` raising ValueError to trip a "missing credentials"
warning; same final outcome but a different control flow that read
like a bug to a maintainer skimming the three modules. Now: if
is_available() is False, log+return early — identical shape to the
other two providers.
Verification: 54/54 unit tests + 13/13 parity scenarios still pass.
Switches tools.browser_tool's cloud-provider lookup from the hardcoded
_PROVIDER_REGISTRY class-instantiation pattern to the
agent.browser_registry singleton registry that plugins self-populate.
Changes:
- tools/browser_tool.py top imports: pull BrowserProvider from
agent.browser_provider (re-exported as CloudBrowserProvider for legacy
callers) and the three provider classes from plugins/browser/<vendor>/.
Legacy class names (BrowserbaseProvider, BrowserUseProvider, FirecrawlProvider)
remain on tools.browser_tool as re-export shims so existing test patches
(monkeypatch.setattr(browser_tool, 'BrowserUseProvider', ...)) keep working.
- _get_cloud_provider() now consults agent.browser_registry.get_provider()
for explicit-config lookups. The auto-detect fallback still uses
BrowserUseProvider() / BrowserbaseProvider() at the module level so the
cache-policy test fixtures (which patch those names) keep driving the
function. Test-time _PROVIDER_REGISTRY overrides are detected by class
identity and routed through the legacy factory-call path.
- agent/browser_provider.py: BrowserProvider grows is_configured() and
provider_name() as thin backward-compat aliases for the legacy
CloudBrowserProvider API. Subclasses MUST implement is_available() and
name; the aliases delegate. This keeps ~6 caller sites in browser_tool.py
working without churning them.
- tests/tools/test_managed_browserbase_and_modal.py: _install_fake_tools_package
grows stubs for agent.browser_provider / agent.browser_registry /
plugins.browser.<vendor>.provider so the test's spec-loader path
(sys.modules-reset + reload-tool-from-disk) can satisfy tools.browser_tool's
top-level imports.
Verified: all 23 existing tests in test_browser_cloud_*.py +
test_managed_browserbase_and_modal.py still pass post-cutover.
The legacy tools/browser_providers/ directory is NOT yet deleted; several
tests still _load_tool_module() those files via spec_from_file_location.
The deletion + test-path updates land in a later commit.
Migrates the remaining two cloud browser providers to plugins:
plugins/browser/browser_use/ — dual auth (direct BROWSER_USE_API_KEY
or managed Nous gateway), idempotency-
key handling for retried managed-mode
creates, x-external-call-id capture.
plugins/browser/firecrawl/ — direct FIRECRAWL_API_KEY only;
distinct from plugins/web/firecrawl/
(same key, different endpoint).
Also drops the 'single-eligible shortcut' rule from
agent.browser_registry._resolve(). Was a copy-paste from
web_search_registry that would have introduced a real behavior change:
a user with only FIRECRAWL_API_KEY set (for web-extract) would silently
get routed to a paid Firecrawl cloud browser on a fresh install — not
matching origin/main, which only auto-detected between Browser Use and
Browserbase. Third-party browser plugins are subject to the same gate:
they require explicit `browser.cloud_provider` to take effect.
Verified end-to-end via plugin discovery:
- 3 plugins register (browser-use, browserbase, firecrawl)
- _resolve(None) with no creds: None (local mode)
- _resolve(None) with only FIRECRAWL_API_KEY: None (matches main)
- _resolve('firecrawl'): firecrawl (explicit wins)
- _resolve(None) with BU+firecrawl: browser-use (legacy walk first hit)
- _resolve(None) with all three: browser-use (legacy walk order)
Foundation commit for the browser-provider plugin migration (#25214).
Mirrors the architecture established by PR #25182 (web providers):
- agent/browser_provider.py — BrowserProvider ABC. Preserves the legacy
CloudBrowserProvider lifecycle contract bit-for-bit (create_session,
close_session, emergency_cleanup, session metadata shape) so the
dispatcher in tools/browser_tool.py becomes a pure registry lookup.
Renames is_configured() → is_available() for parity with WebSearchProvider.
- agent/browser_registry.py — selection registry with the same
three-rule resolution as web_search_registry:
1. Explicit config wins (returns even if is_available() == False so
the dispatcher surfaces a precise credentials error)
2. Single-eligible shortcut
3. Legacy preference walk: browser-use → browserbase, filtered by
availability. Firecrawl is intentionally NOT in the legacy walk
(matches pre-migration behaviour — Firecrawl was only reachable
via explicit browser.cloud_provider: firecrawl).
- hermes_cli/plugins.py — adds ctx.register_browser_provider() facade,
one-liner mirror of register_web_search_provider().
No plugins registered yet; no dispatcher cutover yet. The next commits
move browserbase/browser-use/firecrawl into plugins/browser/<vendor>/
and switch tools/browser_tool.py over to the registry.
agent/bedrock_adapter.py now calls lazy_deps to install boto3 and
botocore on first import, mirroring how other optional provider
adapters defer their heavy AWS dependencies until actually used.
Keeps the base install slim for users who don't run on Bedrock.
Both the `action=block` and `decision=block` branches in _parse_response
shared identical field-priority and type-validation logic. Extract it into
a single _block_message(primary, secondary) helper so the two branches are
one line each and the type guard lives in exactly one place.
No functional change: existing tests (TestParseResponse, 14 tests) all
pass unchanged, confirming identical behaviour.
Address code review feedback on _parse_response:
1. Restore isinstance(raw, str) guard so non-string message/reason values
(e.g. integers, lists) from a malformed hook response fall back to the
default rather than being forwarded as-is. This keeps the contract that
message in the returned dict is always a string.
2. Extract the repeated literal 'Blocked by shell hook.' into a module-level
constant _DEFAULT_BLOCK_MESSAGE to avoid duplication and make it easy to
change in one place.
Four new unit tests added to tests/agent/test_shell_hooks.py covering:
- action block with no message (uses default)
- decision block with no reason (uses default)
- action block with empty string message (uses default)
- action block with non-string message, e.g. integer (uses default)
_parse_response in agent/shell_hooks.py only forwarded a pre_tool_call
block directive if the hook also provided a non-empty message or reason.
When either field was missing the function returned None, causing Hermes
to treat the response as a no-op and execute the tool unconditionally.
This means a hook that outputs {"action": "block"} or {"decision": "block"}
without a reason string is silently ignored. The security boundary fails
open: tools the user intended to gate are executed anyway.
Fix: remove the message-presence guard. Honor the block unconditionally
and fall back to a default message when none is provided. Existing hooks
that already include a message or reason are unaffected.
qwen3.6-plus did not have an explicit entry in DEFAULT_CONTEXT_LENGTHS,
so the longest-substring fallback matched the generic 'qwen': 131072
catch-all. That dropped the effective context limit from 1,048,576
tokens to 131,072, prematurely lowered the compression threshold, and
produced misleading warnings about main/compression context mismatch
in long sessions.
Add an explicit 'qwen3.6-plus': 1048576 entry before the catch-all and
cover it with a regression test (bare, qwen/, and dashscope/ prefixes).
Note: PR #6599 also mentions touching model_metadata.py but the actual
diff only edits hermes_cli/models.py, so this fix is independent and
not duplicated by that PR.
Closes#27008
Six days after #23937 (608 fixes) the codebase had accumulated 241 new
PLR6201 violations. Same mechanical `x in (...)` → `x in {...}` fix,
same zero-risk profile: set lookup is O(1) vs O(n) for tuple and the
two are semantically equivalent for hashable scalar membership tests.
All 241 instances fixed via `ruff check --select PLR6201 --fix
--unsafe-fixes`, zero remaining. Every changed value is a hashable
scalar (str/int/None/enum/signal); no risk of unhashable runtime
errors. No behavior change.
Test plan:
- 119 files changed, +244/-244 (net zero) — exactly one-line edits
- `ruff check` clean afterward
- Compile checks pass on the largest touched files (cli.py, run_agent.py,
gateway/run.py, gateway/platforms/discord.py, model_tools.py)
- Subset broad test run on tests/gateway/ tests/hermes_cli/ tests/agent/
tests/tools/: 18187 passed, 59 pre-existing failures (verified against
origin/main with the same shape — identical failure count, identical
category — all xdist test-order flakes unrelated to this change)
Follows the same template as PR #23937 ([tracker: #23972](https://github.com/NousResearch/hermes-agent/issues/23972)).
Original commit 75e5d0f6b by hueilau targeted _build_api_kwargs in
pre-refactor run_agent.py. The body now lives in
agent/chat_completion_helpers.build_api_kwargs — re-applied there.
Also: switch the custom_providers forward (from 21078ebce) to use
getattr() — tests build a bare AIAgent via __new__ and would otherwise
hit AttributeError on _custom_providers.
Co-authored-by: hueilau <33933019+hueilau@users.noreply.github.com>
Original commit 8d756a421 by austrian_guy targeted __init__ in
pre-refactor run_agent.py. The body now lives in
agent/agent_init.init_agent — re-applied there.
Co-authored-by: austrian_guy <33156212+ether-btc@users.noreply.github.com>
Original commit 973f27e95 by Teknium targeted _spawn_background_review in
pre-refactor run_agent.py. The body now lives in
agent/background_review._spawn_background_review — re-applied there.
Co-authored-by: Teknium <127238744+teknium1@users.noreply.github.com>
Original commit 21078ebce by PaTTeeL targeted _try_activate_fallback in
pre-refactor run_agent.py. The body now lives in
agent/chat_completion_helpers.try_activate_fallback — re-applied there.
Co-authored-by: PaTTeeL <9150277+PaTTeeL@users.noreply.github.com>
Original commit 33528b428 by konsisumer targeted _restore_primary_runtime
in pre-refactor run_agent.py. The body now lives in
agent/agent_runtime_helpers.restore_primary_runtime — re-applied there.
Fixes#20465
Co-authored-by: konsisumer <der@konsi.org>
Original commit 2b193907d by Teknium added a new module-level
_StreamErrorEvent class and threaded its raise into
_run_codex_create_stream_fallback in pre-refactor run_agent.py.
- _StreamErrorEvent class → run_agent.py (module-level, next to
_qwen_portal_headers; class needs to be top-level for the codex
runtime to import it)
- The fallback event-loop's 'type=error' handler → agent/codex_runtime.py
where run_codex_create_stream_fallback now lives. Imports
_StreamErrorEvent lazily from run_agent to avoid circular import.
Co-authored-by: Teknium <127238744+teknium1@users.noreply.github.com>
Original commit e51d74ab9 by Maxim Esipov targeted _extract_api_error_context
and _recover_with_credential_pool in pre-refactor run_agent.py. Both bodies
now live in agent/agent_runtime_helpers.py — re-applied to that module:
- extract_api_error_context: payload.get('type') added to the reason
fallback chain (Codex error bodies use 'type' instead of 'code'/'error')
- recover_with_credential_pool: usage_limit_reached detection in the
rate_limit branch — skip the retry-once-then-rotate dance and rotate
immediately when the body says the per-account usage limit hit.
Co-authored-by: Maxim Esipov <maksesipov@gmail.com>
Original commits 4ded3ede3 (@konsisumer) + 374dc81c2 (Teknium) added a
413 hint to run_agent.py's agent loop. Final-state version (the sharpened
374dc81c2 wording) ported to agent/conversation_loop.py, where the
payload_too_large branch now lives.
The deprecation detection + _URL_TO_PROVIDER changes from both commits
landed in agent/copilot_acp_client.py and agent/model_metadata.py via
the prior merge.
Closes#10648
Co-authored-by: konsisumer <der@konsi.org>
Co-authored-by: Teknium <127238744+teknium1@users.noreply.github.com>
Original commit 395e9dd9e by Teknium targeted module-level _is_mcp_tool_parallel_safe
and _should_parallelize_tool_batch helpers in pre-refactor run_agent.py. Both
helpers now live in agent/tool_dispatch_helpers.py — re-applied to that
module.
The tools/mcp_tool.py portion (the public is_mcp_tool_parallel_safe API
+ _parallel_safe_servers tracking) merged cleanly from main via the prior
merge commit.
Co-authored-by: Teknium <127238744+teknium1@users.noreply.github.com>
Original commit 9c304a7f5 by helix4u targeted _flatten_exception_chain,
_summarize_api_error, and the _call streaming retry loop in pre-refactor
run_agent.py. Re-applied to:
- New _is_provider_stream_parse_error helper → run_agent.py (next
to _flatten_exception_chain in the AIAgent class)
- _summarize_api_error early-return for the malformed-streaming
ValueError → run_agent.py (kept method body)
- _call streaming retry: _is_stream_parse_err flag wired into
_is_transient AND the post-exhaustion branch + dedicated
malformed-streaming user-status string → agent/chat_completion_helpers.py
(the _call body now lives there)
Co-authored-by: helix4u <4317663+helix4u@users.noreply.github.com>
Original commit 97a32afdc by helix4u targeted _check_compression_model_feasibility
in pre-refactor run_agent.py. The function body now lives in
agent/conversation_compression.py — re-applied the configured-but-unavailable
provider message there.
Co-authored-by: helix4u <4317663+helix4u@users.noreply.github.com>
Collapses the four-commit xAI entitlement-403 chain to its final
on-main state, ported to the post-refactor module layout:
- Added _is_entitlement_failure on AIAgent (run_agent.py) — detects
Grok subscription-shape 403s on (401|403|None) status codes.
- Added entitlement-skip branch to recover_with_credential_pool
(agent/agent_runtime_helpers.py) — breaks the refresh-loop that
Don's 100-iteration trace exposed when a Premium+ user hit a real
entitlement issue.
- Removed _decorate_xai_entitlement_error and unwrapped its two
_summarize_api_error call sites — xAI's own body text already
points users at grok.com/?_s=usage so we surface that verbatim
(dffb602f3 reasoning: X Premium subs DO now work per xAI's
2026-05-16 announcement, so editorialising would misdirect).
- grok-4.3 1M context entry landed in agent/model_metadata.py
via the prior merge — no additional port needed.
Tests already on disk (tests/run_agent/test_codex_xai_oauth_recovery.py)
assert _is_entitlement_failure shape and verbatim body surfacing.
Closes#27110.
Co-authored-by: Teknium <127238744+teknium1@users.noreply.github.com>
Original commit 31ba2b0cb by Teknium targeted run_codex_stream() at
its pre-refactor location in run_agent.py. Re-applied:
- Prelude error retry/fallback → agent/codex_runtime.py (in
run_codex_stream where the body now lives)
- _decorate_xai_entitlement_error helper + _summarize_api_error
wrapping → run_agent.py (these methods remained on AIAgent
as @staticmethod's; cherry-pick applied them cleanly)
The xai-oauth provider gate, encrypted_content drop on replay, etc.
landed in agent/codex_responses_adapter.py via the prior merge from main.
Closes#8133, #14634
Co-authored-by: Teknium <127238744+teknium1@users.noreply.github.com>
Original commit 13c3d4b4e by kchantharuan touched __init__ and
_apply_client_headers_for_base_url in pre-refactor run_agent.py. Re-applied to:
- __init__: agent/agent_init.py (3 hunks — NVIDIA branch + _custom_headers
fallback in routed-client and fallback-client paths)
- _apply_client_headers_for_base_url: still in run_agent.py (1 hunk)
build_nvidia_nim_headers was already present in agent/auxiliary_client.py
from the prior merge — no additional port needed.
Co-authored-by: kchantharuan <kchantharuan@nvidia.com>
Original commit b62c99797 by Jaaneek targeted six locations in
pre-refactor run_agent.py. Re-applied to the extracted post-PR locations:
- api_mode dispatch → agent/agent_init.py
- is_xai_responses build_api_kwargs → agent/chat_completion_helpers.py
- codex_auth_retry block + 401 hint → agent/conversation_loop.py
- _try_refresh_codex_client_credentials body → run_agent.py (kept)
The non-run_agent.py portions of the commit (auxiliary_client, codex
transport, hermes_cli/auth, tools/xai_http, tests, docs) merged cleanly
from main via the prior merge commit.
Co-authored-by: Jaaneek <Jaaneek@users.noreply.github.com>
Original commit 4f8aaf104 by InB4DevOps targeted run_conversation() in
the pre-refactor run_agent.py. Re-applied to the extracted location in
agent/conversation_loop.py.
Co-authored-by: InB4DevOps <tolle.lege+github@gmail.com>
run_agent.py taken from HEAD (the extracted forwarder structure). The 25
run_agent.py fixes that landed on main during the PR's life need to be
ported into the agent/* extracted modules in follow-up commits.
_OPENROUTER_MODEL hardcoded 'google/gemini-3-flash-preview' which
returns 404 on OpenRouter, breaking all vision tasks for users who
rely on the OpenRouter default. Additionally, _try_openrouter()
ignored the user-configured auxiliary.vision.model entirely.
Changes:
- Update _OPENROUTER_MODEL default to google/gemini-2.5-flash (valid)
- Add optional 'model' parameter to _try_openrouter()
- Pass configured model from _resolve_strict_vision_backend() through
to _try_openrouter()
This allows users who set auxiliary.vision.model (e.g. x-ai/grok-4.3)
to have it actually used, while maintaining backward compatibility.
In resolve_provider_client(), the named custom provider code path at
~line 2914 only checked the ``key_env`` field when looking for an
environment-variable-based API key. The documented ``api_key_env``
snake_case alias was silently ignored, causing custom providers
configured with ``api_key_env`` to fall through to the
``no-key-required`` placeholder — which produces a confusing 401
(``****ired`` mask) on auth-required remote endpoints.
This mirrors the same fix already applied to run_agent.py in commit
6ddc48b05 (fix(fallback): resolve api_key_env in fallback chain entries).
Also adds a logger.warning() when the placeholder is reached, so
future alias gaps are easier to debug.
Closes#25091
Four fixes from PR #27248 review:
1. **__init__ forwarder is now keyword-forwarded** (daimon-nous review).
Previously the run_agent.AIAgent.__init__ wrapper forwarded all 64
params positionally to agent.agent_init.init_agent, so adding a
65th param on main would require three lockstep edits (signature,
init_agent signature, forwarder call) or silently shift every value.
Keyword forwarding makes this trivially safe — adding a param now
only needs the two signatures and one extra keyword line.
2. **Drop dead _ra() in agent/codex_runtime.py** (daimon-nous + Copilot).
The lazy run_agent reference was defined but never called inside
this module — the codex paths use agent.* accessors only.
3. **Drop unused imports in agent/codex_runtime.py** (Copilot):
contextvars, threading, time, uuid, Optional. Carried over from
run_agent.py during the original extraction.
4. **Tighten three source-introspection test guards** (Copilot):
- test_memory_nudge_counter_hydration.py — was scanning the
concatenated source of run_agent.py + agent/conversation_loop.py
and matching self.X or agent.X form. Now asserts the
hydration block lives in agent/conversation_loop.py specifically
with the agent.X form — the body never moves back, so if it
ever drifts a future re-introduction fails the guard.
- test_run_agent.py::TestMemoryNudgeCounterPersistence — anchor on
agent.iteration_budget = IterationBudget exactly (was just
iteration_budget = IterationBudget) so an unrelated identifier
ending in iteration_budget can't match.
- test_run_agent.py::TestMemoryProviderTurnStart — assert the
agent._user_turn_count form directly (the extracted body uses
agent.X, not self.X — accepting either was a transitional fudge).
- test_jsondecodeerror_retryable.py — scan agent/conversation_loop.py
only, not the concatenation.
Not addressed in this commit:
* Pre-existing bugs in agent/tool_executor.py (heartbeat index
mismatch when calls are blocked, _current_tool clobber in result
loop, blocked-counted-as-completed in spinner summary, dead
result_preview computation). These were preserved byte-for-byte from
the original _execute_tool_calls_concurrent — worth a separate
follow-up PR with proper tests.
* _OpenAIProxy.__instancecheck__ concern — pre-existing, not flagged
by any of the original test patches (nothing actually does
isinstance(x, OpenAI) against the proxy instance).
* agent_init.py:949 mem_config potential NameError — pre-existing;
only triggers if _agent_cfg.get('memory', {}) itself raises, which
it can't with a stock dict.
tests/run_agent/ + tests/agent/: 4313 passed, 1 pre-existing
test_auxiliary_client failure (unchanged).
run_agent.py: 3821 -> 3937 lines (+116 from the keyword-forwarded
init call's verbosity). Final: 16083 -> 3937 (-12146, 75% reduction).
build_skill_invocation_message() returns a non-empty placeholder string
('[Failed to load skill: ...]') when the skill exists in the command cache
but loading the actual SKILL.md payload fails. CLI/gateway callers treat
any truthy return value as success, so the failure is silently routed into
the model as if it were a valid skill prompt.
Return None instead, matching the existing behavior for unknown commands,
so callers using 'if msg:' can properly detect the failure.
The largest method left on AIAgent (60+ parameters, the entire startup
sequence — credential resolution, provider auto-detection, context
engine bootstrap, memory store hydration, plugin lifecycle hooks)
moves into agent/agent_init.py.
AIAgent.__init__ is now a thin wrapper that calls
agent.agent_init.init_agent(self, ...) with the original full
parameter list preserved.
Module-level run_agent names referenced in the body (_openrouter_prewarm_done,
_qwen_portal_headers, _routermint_headers, _hermes_home, OpenAI,
get_tool_definitions, check_toolset_requirements) are resolved through
_ra() so test patches on those names keep working. agent_init's logger
warnings are routed via _ra().logger so tests patching run_agent.logger
capture them (TestStringKSuffixContextLengthWarns,
TestCustomProvidersInvalidContextLengthWarns).
Live E2E reconfirmed on three model paths (openai/gpt-5.4,
anthropic/claude-sonnet-4.6, moonshotai/kimi-k2-thinking).
tests/run_agent/ + tests/agent/: 4313 passed (same pre-existing
test_auxiliary_client failure).
run_agent.py: 5944 -> 4564 lines (-1380).
Total reduction since baseline: 16083 -> 4564 (-11519, 72%).
The 3,877-line run_conversation body — the agent loop itself — moves out
of run_agent.py into a dedicated module. AIAgent.run_conversation is
now a thin forwarder that delegates to agent.conversation_loop.run_conversation
with the AIAgent instance as the first argument.
This is the largest single extraction in the run_agent.py refactor.
The body keeps all 163 self.X references intact (rewritten as agent.X),
all nested closures, all retry/backoff/compression machinery. Symbols
that tests or callers patch on run_agent (_set_interrupt,
handle_function_call, AIAgent class attrs) are resolved through _ra()
inside the extracted module so the patch surface is preserved.
Five tests doing inspect.getsource(AIAgent.run_conversation) updated to
scan agent.conversation_loop.run_conversation. Two source-introspection
tests (TestMemoryNudgeCounterPersistence, TestMemoryProviderTurnStart)
updated to accept either self.X (legacy) or agent.X (extracted
form) in the matched assertions.
Live E2E verified on three model paths:
* openai/gpt-5.4 (OpenAI chat completions via OpenRouter)
* anthropic/claude-sonnet-4.6 (Anthropic Messages via OpenRouter)
* moonshotai/kimi-k2-thinking (reasoning model, reasoning_content path)
Plus read_file tool execution, terminal tool, web_search.
tests/run_agent/ + tests/agent/: 4313 passed, 1 pre-existing failure
(test_auxiliary_client::test_custom_endpoint... — same as on main).
run_agent.py: 9800 -> 5944 lines (-3856).
Total reduction since baseline: 16083 -> 5944 (-10139, 63%).
The three big review-prompt strings (_MEMORY_REVIEW_PROMPT,
_SKILL_REVIEW_PROMPT, _COMBINED_REVIEW_PROMPT — 183 lines combined) move
out of the AIAgent class body and into agent/background_review.py where
they're consumed.
AIAgent re-exposes them as class attributes via 'from ... import' inside
the class body — Python binds those names into the class namespace so
existing AIAgent._MEMORY_REVIEW_PROMPT references keep working.
spawn_background_review_thread also falls back to the module-level
constants if an agent doesn't have the attribute (preserves the test
pattern of mocking these on the agent).
tests/run_agent/ + tests/agent/: 4313 passed (same pre-existing
test_auxiliary_client failure).
run_agent.py: 9986 -> 9800 lines (-186).
Move _interruptible_streaming_api_call out of run_agent.py — the biggest
single method in the file. Body lives next to interruptible_api_call
in agent/chat_completion_helpers.py so streaming + non-streaming code
share one home.
Nested closures (_call_chat_completions, _call_anthropic, the codex
stream branch) all come along with the body and still capture the
parent function's locals as expected.
AIAgent keeps a thin forwarder method. is_local_endpoint added to
the import block (used by the stream stale-timeout disable logic).
One source-introspection test in TestAnthropicInterruptHandler is
updated to scan agent.chat_completion_helpers.interruptible_streaming_api_call
instead of AIAgent._interruptible_streaming_api_call.
tests/run_agent/ + tests/agent/: 4312 passed (same pre-existing
test_auxiliary_client failure).
run_agent.py: 12277 -> 11385 lines (-892).
Move the two big tool-dispatch methods out of run_agent.py:
* execute_tool_calls_concurrent — 408-line concurrent path (interrupt
pre-flight, guardrail+plugin block, callback fan-out, ContextVar-
preserving ThreadPoolExecutor, periodic heartbeats for the gateway
inactivity monitor, per-tool result handling with subdir hints +
guardrail observations + checkpoint, /steer drain)
* execute_tool_calls_sequential — 441-line sequential path (the
original behavior used for single-tool batches and interactive
tools)
Both take the parent AIAgent as their first argument; AIAgent keeps
thin forwarders so call sites unchanged. handle_function_call is
routed through _ra() so tests that patch run_agent.handle_function_call
keep working. _set_interrupt likewise.
The AST guard in test_tool_executor_contextvar_propagation.py is
updated to scan both run_agent.py AND agent/tool_executor.py so it
still catches the executor.submit(_run_tool, ...) regression
regardless of which file the body lives in.
tests/run_agent/ + tests/agent/: 4313 passed (same pre-existing
test_auxiliary_client failure as before).
run_agent.py: 14309 -> 13461 lines (-848).
Move the background-review subsystem (the self-improvement loop — see the
README) out of run_agent.py into a dedicated module.
* summarize_background_review_actions — was the @staticmethod that builds
the user-facing action summary
* spawn_background_review_thread — builds the thread target + prompt;
the actual review loop body (forked AIAgent, runtime inheritance,
tool whitelist, suppression, teardown) lives in _run_review_in_thread
* build_memory_write_metadata — provenance for external memory mirrors
AIAgent keeps thin wrappers for backward compatibility AND because tests
patch run_agent.threading.Thread to assert lifecycle behavior — the
threading.Thread construction stays in AIAgent._spawn_background_review,
the inner work moves out.
tests/run_agent/ + tests/agent/: 4313 passed, 1 pre-existing failure
(test_auxiliary_client.py::test_custom_endpoint... — confirmed failing
on main before this change). 3 skipped.
run_agent.py: 15272 -> 14972 lines (-300).
Three small extractions into focused modules:
* agent/process_bootstrap.py — \_OpenAIProxy (lazy openai.OpenAI import),
\_SafeWriter (broken-pipe-resistant stdio wrapper), \_install_safe_stdio,
\_get_proxy_from_env, \_get_proxy_for_base_url. All process / IO bootstrap.
* agent/iteration_budget.py — IterationBudget class (thread-safe consume/
refund counter shared by parent agent and subagents).
run_agent re-exports every name so existing test patches like
patch('run_agent.OpenAI', ...) and 'from run_agent import IterationBudget'
keep working unchanged. Verified the patch-rebinding contract for OpenAI
explicitly.
tests/run_agent/ + tests/agent/test_gemini_fast_fallback.py:
1347 passed, 3 skipped.
run_agent.py: 15427 -> 15261 lines (-166).
Pull the 10 pure sanitization/repair helpers (\_sanitize_surrogates,
\_sanitize_structure_surrogates, \_sanitize_messages_surrogates,
\_escape_invalid_chars_in_json_strings, \_repair_tool_call_arguments,
\_strip_non_ascii, \_sanitize_messages_non_ascii, \_sanitize_tools_non_ascii,
\_strip_images_from_messages, \_sanitize_structure_non_ascii) and the
\_SURROGATE_RE constant out of run_agent.py into a new module.
These are stateless byte-walking helpers with no AIAgent dependency.
Backward compatibility: run_agent re-exports every name via a single
import block, so existing 'from run_agent import _sanitize_surrogates'
imports in tests and cli.py keep working unchanged. Same pattern the
file already uses for _summarize_user_message_for_log (codex_responses_adapter).
run_agent.py: 16077 -> 15682 lines (-395).
After context compression, the protected tail messages retain their
original image parts. When those include multi-MB pasted screenshots,
every subsequent API request re-ships the same base-64 blobs forever —
which can push the request past provider body-size limits and wedge the
session even though compression 'succeeded'.
Add _strip_historical_media() to agent/context_compressor.py. After the
summary is built, find the newest user message that carries an image
part and replace image parts in every earlier message with a short
text placeholder ('[Attached image — stripped after compression]').
The newest image-bearing user turn keeps its media so the model can
still analyse what the user just sent.
Handles all three multimodal shapes:
- OpenAI chat.completions image_url
- OpenAI Responses API input_image
- Anthropic native {type: image, source: ...}
Includes 27 unit tests covering the helpers and the end-to-end
compress() integration, plus a manual E2E check confirming a ~4MB
two-image conversation shrinks to ~2MB after compression.
Port from anomalyco/opencode#24730: Moonshot's JSON Schema validator rejects
two shapes that the rest of the JSON Schema ecosystem accepts:
1. $ref nodes with sibling keywords. Moonshot expands the reference before
validation and then rejects the node if keys like `description`, `type`,
or `default` appear alongside $ref. MCP-sourced tool schemas commonly
put a `description` on $ref-typed properties so the model sees the
field hint — which worked on every provider except Moonshot.
2. Tuple-style `items` arrays (positional element schemas). Moonshot's
engine requires ONE schema applied to every array element. Common in
tool schemas generated from Go/Protobuf that model fixed-length arrays
as `[{type:number}, {type:number}]`.
Repairs applied in `agent/moonshot_schema.py`:
- Rule 3: when a node has `$ref`, return `{"$ref": <value>}` only
(strip every sibling). The referenced definition still carries its own
description on the target node, which Moonshot accepts.
- Rule 4: when `items` is a list, collapse to the first element schema
(falling back to `{}` which is then filled by the generic missing-type
rule). Preserves `minItems` / `maxItems` / other siblings.
Tests: 10 new cases across TestRefSiblingStripping + TestTupleItems,
plus the existing TestMissingTypeFilled::test_ref_node_is_not_given_synthetic_type
still passes (it asserted plain $ref passes through; now it passes through
as exactly `{"$ref": "..."}` which is strictly compatible).
All 35 tests in test_moonshot_schema.py pass.