mirror of
https://github.com/NousResearch/hermes-agent.git
synced 2026-05-21 03:39:54 +00:00
544c31b50b
* perf(config): add load_config_readonly() fast path for hot agent loop
`load_config()` is called from the agent loop's per-API-call hot path via
`get_provider_request_timeout()` and `get_provider_stale_timeout()` —
both invoked once per turn from `_resolved_api_call_timeout()` in
run_agent.py.
Profiling a synthetic 20-tool-call agent run revealed:
- 21 invocations of `load_config()` totaling 56ms cumulative (~17% of agent loop)
- 34,398 deepcopy calls totaling 37ms (config defensive deepcopy + chain)
- 8,652 `_expand_env_vars` invocations (~412 per turn)
Microbench (cache-hit, real config.yaml present):
  load_config()            265us/call  (125us deepcopy + 140us infra)
  load_config_readonly()   138us/call  (~48% faster)
`load_config_readonly()` returns the cached dict directly without the
defensive deepcopy. Documented contract: caller must not mutate. Returns
plain dict (not MappingProxyType) so downstream `isinstance(x, dict)`
guards keep working — caught during initial implementation when
MappingProxyType broke get_provider_request_timeout's guard logic.
Wired into hermes_cli/timeouts.py (the two functions called per agent
turn). load_config() is unchanged for the 263 other call sites: those
that mutate the result before save_config(), those outside the hot path,
and those where the safety guarantee matters more than the performance.
Profile A/B (cached config, 21-turn agent loop):
                                  BEFORE    AFTER    delta
  get_provider_request_timeout    55ms      16ms     -71%
  total function calls            399k      160k     -60%
  deepcopy calls (in hotspots)    34,398    ~0       ~elim
Verified:
- isinstance(load_config_readonly(), dict) is True
- timeout/stale resolutions correct
- load_config() still returns isolated mutable deepcopies
- tests/hermes_cli/test_config*.py / test_timeouts.py: 102/102 pass
- tests/cli/ + tests/agent/test_auxiliary_client.py: 883/883 pass
* perf(redact): substring pre-screens skip non-matching regex chains
Every log record passes through `RedactingFormatter.format` which calls
`redact_sensitive_text`, which historically ran ALL 13 secret-pattern
regexes against every line — including DB connection strings, JWTs,
Discord mentions, Signal phone numbers, etc. — even for typical clean
log records like 'INFO run_agent: API call completed'.
Add cheap substring pre-checks before each regex pass. A false positive
merely runs the regex, which then matches nothing; false negatives are
impossible because every pattern needs its gating substring to be present
in order to match:
- `_PREFIX_RE` gated on any of 33 known credential prefix substrings
- `_ENV_ASSIGN_RE` gated on `=` in text
- `_JSON_FIELD_RE` gated on `:` and `"` in text
- `_AUTH_HEADER_RE` gated on `uthorization`/`UTHORIZATION` in text
- `_TELEGRAM_RE` gated on `:` in text
- `_PRIVATE_KEY_RE` gated on `BEGIN` and `-----`
- `_DB_CONNSTR_RE` gated on `://` in text
- `_JWT_RE` gated on `eyJ` in text
- URL userinfo/query gated on `://`
- `_redact_form_body` gated on `&` and `=`
- `_DISCORD_MENTION_RE` gated on `<@`
- `_SIGNAL_PHONE_RE` gated on `+`
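The gating idea in the list above can be sketched as follows. The two regexes and the replacement strings are simplified stand-ins written for this example, not the patterns in the redaction module; only the gating substrings (`eyJ` and `://`) are taken from the list.

```python
import re

# Hypothetical, simplified patterns; the real ones are stricter.
_JWT_RE = re.compile(r"eyJ[A-Za-z0-9_-]{8,}\.[A-Za-z0-9_-]{8,}\.[A-Za-z0-9_-]{8,}")
_DB_CONNSTR_RE = re.compile(r"(\w+://[^:/\s]+:)([^@\s]+)(@)")


def redact(text: str) -> str:
    """Run each regex only when its required substring is present."""
    # `in` on str is a plain substring scan, much cheaper than a regex pass.
    if "eyJ" in text:
        text = _JWT_RE.sub("[REDACTED_JWT]", text)
    if "://" in text:
        # Keep scheme and user, mask only the password group.
        text = _DB_CONNSTR_RE.sub(r"\1***\3", text)
    return text
```

A clean record such as 'INFO run_agent: API call completed' contains neither substring, so it is returned without any regex being executed. The gate is safe only as long as each pattern cannot match without its substring, which is the invariant the commit relies on.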
Microbench (5 typical log records, 20k iterations each):
                                    BEFORE    AFTER     delta
  redact_sensitive_text per call    5.63us    1.79us    -68%
Real-world impact: ~244 log records are emitted in a 30-turn agent loop,
so the pre-screens save ~1ms of CPU per conversation. The bigger win is
the reduction in regex execution and GC pressure during heavy logging
sessions (verbose logging, gateway message processing).
Security regression test: 30 secret-containing inputs (sk-/ghp_/JWT/DB
connstr/Auth-Bearer/private key/URL userinfo/Discord/Signal/etc.)
verified to produce identical redacted output before/after. All 75
existing tests/agent/test_redact.py cases pass.
The `?access_token=foo&code=bar` (bare query string, no scheme) case
that 'leaks' is pre-existing behavior — the URL query redaction
requires a well-formed URL with scheme+host. Not a regression.
* perf(run_agent): cache _needs_thinking_reasoning_pad result per (provider, model, base_url)
Profile of a 31-turn synthetic agent run shows `_needs_thinking_reasoning_pad`
fires 495 times (~16 per turn) and each call ran 3 helper methods, each
hitting `base_url_host_matches` 1-4 times via `urlparse`. Total cost:
3,342 base_url_host_matches calls + 3,373 urlparse calls accounting for
~36ms of agent-loop overhead (~7% of the entire post-network work).
Provider / model / base_url don't change during a conversation except via
`switch_model` and fallback activation — both of which already overwrite
those attributes atomically. Cache the result on a tuple key; since the
key is derived from the very fields that would change, the cache
auto-invalidates on the next read after a switch. No manual invalidation
needed in switch_model / _try_activate_fallback.
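The self-invalidating cache described above can be sketched like this. The `Agent` class, the `needs_pad` and `_compute_needs_pad` names, the `compute_calls` counter, and the host check are all hypothetical and exist only for illustration; the technique shown is the one from the commit message, a single cached result stored next to the tuple key it was computed for.

```python
from __future__ import annotations

from urllib.parse import urlparse


class Agent:
    """Hypothetical stand-in for the agent class; names are illustrative."""

    def __init__(self, provider: str, model: str, base_url: str) -> None:
        self.provider = provider
        self.model = model
        self.base_url = base_url
        # (key, result) pair from the last computation, or None if never run.
        self._pad_cache: tuple[tuple[str, str, str], bool] | None = None
        self.compute_calls = 0  # instrumentation for this example only

    def _compute_needs_pad(self) -> bool:
        # Placeholder for the expensive helper chain (urlparse, host checks).
        self.compute_calls += 1
        host = urlparse(self.base_url).hostname or ""
        return host.endswith("deepseek.com")  # assumed rule, for illustration

    def needs_pad(self) -> bool:
        # The key is rebuilt from the live attributes on every call, so a
        # model switch changes the key and the stale entry is ignored.
        key = (self.provider, self.model, self.base_url)
        cached = self._pad_cache
        if cached is not None and cached[0] == key:
            return cached[1]
        result = self._compute_needs_pad()
        self._pad_cache = (key, result)
        return result
```

Because the key is read from the same attributes a switch would overwrite, no code path that changes provider, model, or base_url has to know the cache exists.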
Profile A/B (31-turn cached-config agent run):
                                         BEFORE    AFTER    delta
  _needs_thinking_reasoning_pad cum      18ms      1ms      -94%
  _copy_reasoning_content_for_api cum    17ms      1ms      -94%
  base_url_host_matches calls            3,342     372      -89%
  urlparse calls                         3,373     403      -88%
  total function calls                   296k      223k     -25%
Verified:
- tests/run_agent/test_deepseek_reasoning_content_echo.py: 36/36 pass
- tests/run_agent/ (full): 1383/1383 pass + 3 skipped
83 lines
2.4 KiB
Python
from __future__ import annotations


def _coerce_timeout(raw: object) -> float | None:
    try:
        timeout = float(raw)
    except (TypeError, ValueError):
        return None
    if timeout <= 0:
        return None
    return timeout


def get_provider_request_timeout(
    provider_id: str, model: str | None = None
) -> float | None:
    """Return a configured provider request timeout in seconds, if any."""
    if not provider_id:
        return None

    try:
        from hermes_cli.config import load_config_readonly
        config = load_config_readonly()
    except Exception:
        return None

    providers = config.get("providers", {}) if isinstance(config, dict) else {}
    provider_config = (
        providers.get(provider_id, {}) if isinstance(providers, dict) else {}
    )
    if not isinstance(provider_config, dict):
        return None

    model_config = _get_model_config(provider_config, model)
    if model_config is not None:
        timeout = _coerce_timeout(model_config.get("timeout_seconds"))
        if timeout is not None:
            return timeout

    return _coerce_timeout(provider_config.get("request_timeout_seconds"))


def get_provider_stale_timeout(
    provider_id: str, model: str | None = None
) -> float | None:
    """Return a configured non-stream stale timeout in seconds, if any."""
    if not provider_id:
        return None

    try:
        from hermes_cli.config import load_config_readonly
        config = load_config_readonly()
    except Exception:
        return None

    providers = config.get("providers", {}) if isinstance(config, dict) else {}
    provider_config = (
        providers.get(provider_id, {}) if isinstance(providers, dict) else {}
    )
    if not isinstance(provider_config, dict):
        return None

    model_config = _get_model_config(provider_config, model)
    if model_config is not None:
        timeout = _coerce_timeout(model_config.get("stale_timeout_seconds"))
        if timeout is not None:
            return timeout

    return _coerce_timeout(provider_config.get("stale_timeout_seconds"))


def _get_model_config(
    provider_config: dict[str, object], model: str | None
) -> dict[str, object] | None:
    if not model:
        return None

    models = provider_config.get("models", {})
    model_config = models.get(model, {}) if isinstance(models, dict) else {}
    if isinstance(model_config, dict):
        return model_config
    return None