hermes-agent/tests/stress
kshitij 5fba236644 chore: ruff auto-fix PLR6201 resweep — tuple → set in membership tests (#27355)
Six days after #23937 (608 fixes) the codebase had accumulated 241 new
PLR6201 violations. Same mechanical `x in (...)` → `x in {...}` fix,
same zero-risk profile: set lookup is O(1) vs O(n) for tuple and the
two are semantically equivalent for hashable scalar membership tests.

All 241 instances fixed via `ruff check --select PLR6201 --fix
--unsafe-fixes`, zero remaining. Every changed value is a hashable
scalar (str/int/None/enum/signal); no risk of unhashable runtime
errors. No behavior change.

Test plan:
- 119 files changed, +244/-244 (net zero) — exactly one-line edits
- `ruff check` clean afterward
- Compile checks pass on the largest touched files (cli.py, run_agent.py,
  gateway/run.py, gateway/platforms/discord.py, model_tools.py)
- Subset broad test run on tests/gateway/ tests/hermes_cli/ tests/agent/
  tests/tools/: 18187 passed, 59 pre-existing failures (verified against
  origin/main with the same shape — identical failure count, identical
  category — all xdist test-order flakes unrelated to this change)

Follows the same template as PR #23937 ([tracker: #23972](https://github.com/NousResearch/hermes-agent/issues/23972)).
2026-05-17 02:29:41 -07:00

Stress / battle-test suite

Long-running tests that exercise the Kanban kernel under adversarial conditions. They are not run by scripts/run_tests.sh because each one can take 30+ seconds and they spawn real subprocesses.

Run manually:

./venv/bin/python -m pytest tests/stress/ -v -s
# or individual files:
./venv/bin/python tests/stress/test_concurrency.py
./venv/bin/python tests/stress/test_subprocess_e2e.py
./venv/bin/python tests/stress/test_property_fuzzing.py
./venv/bin/python tests/stress/test_benchmarks.py

What's covered

  • test_concurrency.py — 5 workers, 100 tasks, race-for-claim. Asserts no double-claims, no orphan runs, no SQLite errors escape retry.
  • test_concurrency_mixed.py — 10 workers + 1 reclaimer, 500 tasks, random ops (claim/complete/block/unblock/archive). Same invariants under adversarial scheduling.
  • test_concurrency_reclaim_race.py — the claim TTL is shorter than the work duration, so the reclaimer intentionally yanks tasks mid-work; verifies that the worker's late complete is refused cleanly (the compare-and-swap guard holds).
  • test_subprocess_e2e.py — dispatcher spawns real Python subprocess workers that heartbeat + complete via the CLI; crash detection against a real dead PID.
  • test_property_fuzzing.py — 500 random operation sequences, ~40k operations total, 9 invariant checks after each step.
  • test_atypical_scenarios.py — 28 scenarios covering atypical user inputs: unicode/emoji/RTL, 1 MB strings, SQL injection attempts, cycles, self-parents, wide fan-in/out, clock skew, HERMES_HOME with spaces/unicode/symlinks, 1000 runs on one task, idempotency-key race across processes, terminal-state resurrection attempts, dashboard REST with weird JSON.
  • test_benchmarks.py — latency at 100/1k/10k tasks for dispatch, recompute_ready, list_tasks, build_worker_context, etc. Results saved to JSON for regression diffing.
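
The "no double-claims" and "late complete is refused" invariants above can be sketched with conditional SQLite updates. This is only a minimal illustration, not the kernel's actual code: the table layout, the claim_gen column, and the claim/reclaim/complete helpers are all hypothetical, and the interleaving is simulated sequentially on one in-memory connection rather than with real concurrent workers.

```python
import sqlite3

# Hypothetical schema and helper names, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tasks ("
    " id INTEGER PRIMARY KEY,"
    " status TEXT NOT NULL,"
    " owner TEXT,"
    " claim_gen INTEGER NOT NULL DEFAULT 0)"
)
conn.execute("INSERT INTO tasks (id, status) VALUES (1, 'ready')")
conn.commit()


def claim(task_id, worker):
    """Claim a ready task; return the claim generation, or None if the claim lost."""
    cur = conn.execute(
        "UPDATE tasks SET status = 'running', owner = ?, claim_gen = claim_gen + 1"
        " WHERE id = ? AND status = 'ready'",
        (worker, task_id),
    )
    conn.commit()
    if cur.rowcount != 1:
        return None  # someone else already holds the task
    row = conn.execute("SELECT claim_gen FROM tasks WHERE id = ?", (task_id,)).fetchone()
    return row[0]


def reclaim(task_id):
    """Reclaimer: return a running task to 'ready' (for example after TTL expiry)."""
    cur = conn.execute(
        "UPDATE tasks SET status = 'ready', owner = NULL"
        " WHERE id = ? AND status = 'running'",
        (task_id,),
    )
    conn.commit()
    return cur.rowcount == 1


def complete(task_id, worker, gen):
    """Complete only if this worker still holds the same claim (compare-and-swap)."""
    cur = conn.execute(
        "UPDATE tasks SET status = 'done'"
        " WHERE id = ? AND status = 'running' AND owner = ? AND claim_gen = ?",
        (task_id, worker, gen),
    )
    conn.commit()
    return cur.rowcount == 1


gen_a = claim(1, "worker-a")            # first claim wins
lost = claim(1, "worker-b")             # second claim on the same task is refused
reclaimed = reclaim(1)                  # TTL expired: reclaimer yanks the task
gen_b = claim(1, "worker-b")            # another worker picks it up
late = complete(1, "worker-a", gen_a)   # stale worker's late complete is refused
ok = complete(1, "worker-b", gen_b)     # current owner completes normally
```

Each state change is a single UPDATE whose WHERE clause restates the expected prior state, so a caller can tell from the affected-row count whether it won or lost without a separate read-then-write step.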