diff --git a/.github/workflows/tests.yml b/.github/workflows/tests.yml index b16b1d57..c7649620 100644 --- a/.github/workflows/tests.yml +++ b/.github/workflows/tests.yml @@ -24,7 +24,14 @@ jobs: - name: Install dependencies run: | python -m pip install --upgrade pip - pip install pyyaml>=6.0 pytest pytest-timeout + pip install pyyaml>=6.0 pytest pytest-timeout pytest-asyncio + # Install the `mcp` package so tests/test_mcp_server.py runs in CI. + # The package is an optional runtime dep of mcp_server.py — users + # who run the MCP integration install it themselves; CI installs + # it so test coverage exists. If mcp install fails (Python 3.13 + # wheel not yet available, etc.), tests/test_mcp_server.py uses + # importorskip and the matrix stays green. + pip install mcp || echo "mcp install failed — test_mcp_server.py will importorskip" - name: Run tests run: pytest tests/ -v --timeout=60 diff --git a/CHANGELOG.md b/CHANGELOG.md index 44417dc0..b12f4a7f 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -1,6 +1,38 @@ # Hermes Web UI -- Changelog -## [v0.51.27] — 2026-05-08 — 4-PR contributor batch (Release E1: workspace-prefix sentinel hardening, custom named provider API key resolution, streaming chat scroll-pin retention, Kanban detail scrollable) +## [v0.51.28] — 2026-05-08 — 2-PR contributor batch (Release E2: MCP server Option A rewrite + WebUI /goal command) + +### Added (2 PRs) + +- **PR #1895** by @samuelgudi — MCP server Option A rewrite (#1616). Replaces the fragile MCP integration with a clean `mcp_server.py` (567 LOC) implementing project CRUD, session listing, and session mutations (rename/move) over Hermes's HTTP API. Imports `api.models` / `api.profiles` canonically rather than carrying duplicate slug-matching helpers. Relocates `_profiles_match` from `api/routes.py` into `api/profiles.py` as the single source of truth (mcp_server.py and api/routes.py both now import the canonical helper — re-introducing a local copy in either module trips a parity test immediately). Adds env-aware WEBUI_URL (`HERMES_WEBUI_HOST` / `HERMES_WEBUI_PORT`). New behaviour: `delete_project` REFUSES to touch session JSONs when `HERMES_WEBUI_PASSWORD` is unset, returning `{ok:true, unassigned_sessions:0, warning:"…"}` instead — preventing data-loss when an MCP client tries to delete a project on an unauthenticated server. 53-test coverage in `tests/test_mcp_server.py` (914 LOC) including HTTP wire-format integration tests, profile-scoped isolation, legacy untagged row visibility, and `--profile foo` CLI ordering regression. Closes #1616. + +- **PR #1866** by @Michaelyklam — WebUI `/goal` command for goal-tracking with budget enforcement and continuation prompts. New `api/goals.py` (489 LOC) implements goal lifecycle (set / pause / resume / clear / status), per-profile SQLite `SessionDB` cache, and `evaluate_goal_after_turn()` SSE hook that emits `goal` and `goal_continue` events from `api/streaming.py` after assistant turns. Wire-up: `api/routes.py` adds `/api/goal` endpoint (POST set/pause/resume/clear, GET status) and `_start_chat_stream_for_session()` extraction so kickoff prompts can run through the canonical streaming path; `static/commands.js` adds `/goal` autocomplete (cmdGoal handler) with i18n description; `static/messages.js` handles new SSE event types with continuation-toast UI; `static/i18n.js` adds 9 new strings across all locales. 4 documentation screenshots added under `docs/pr-media/{1866,1808}/`. Closes #1808. + +### Mid-stage absorbed fixes (test isolation, per blocker investigation) + +- **#1857 polluter root-cause** — `tests/test_issue1857_usage_overwrite.py` was using `mock.patch.dict(sys.modules, {...})`, which DELETES any keys added during the patched scope on `__exit__`. That silently evicted lazily-imported pydantic submodules (e.g. `pydantic.root_model`), producing `KeyError: 'pydantic.root_model'` in `test_mcp_server.py` downstream when the full pytest suite ran. Fixed by replacing with manual save/restore using a `_MISSING` sentinel. +- **#1895 module-attribute restoration** — `tests/test_mcp_server.py` mutates module-level constants on `api.config`/`api.models`/`mcp_server` (`STATE_DIR`, `SESSION_DIR`, `PROJECTS_FILE`, …) so the MCP server reads from a tmpdir. Without restoration, downstream tests (`test_pytest_state_isolation`, `test_provider_quota_status`, `test_provider_management`) read deleted tmpdirs from `api.config.STATE_DIR`. Fixed by snapshotting originals on first `_reimport_mcp()` call and restoring in `_cleanup_state_dir()`. +- **#1895 `_profiles_match` parity test parent-attribute leak** — `test_profiles_match_single_source_of_truth` pops `api.routes`/`api.profiles` from `sys.modules` and re-imports for the canonical-helper identity check. When restoring `sys.modules` only, fresh modules still leaked through because `import api.routes as r` resolves via `sys.modules['api'].routes` (parent-package attribute), NOT via `sys.modules['api.routes']` directly. Fixed by ALSO restoring parent-package attributes — without this, sibling tests (`test_plugins_panel`, `test_pr1350_sse_notify_correctness`, `test_version_badge`) that patch `api.routes.j` and call handlers via `import api.routes as routes` would fail because the patch hits one module object and the handler reads from another. + +### Tests + +4898 → **4947 collected, 4947 passing, 0 regressions** (+49 net new). Full suite ~140s on Python 3.11 (HERMES_HOME isolated). JS syntax check (`node -c`) clean on `static/commands.js`, `static/i18n.js`, `static/messages.js`. Browser API sanity harness (port 8789): all 11 endpoints + 20 QA tests PASS. Opus advisor pass: SHIP-READY, no blockers (2 follow-up items filed: goal hook firing on unrelated turns; English-only runtime strings in goal UI). + +### Pre-release verification + +- Full pytest under `HERMES_HOME` isolation: **4947 passed, 8 skipped, 1 xfailed, 2 xpassed, 8 subtests passed** in 140.41s. +- Browser API harness against stage-323 on port 8789: all 11 endpoints + 20 QA tests PASS (110.66s for QA phase). +- `node -c` on all 3 modified `static/*.js` files: clean. +- Stage diff: 16 files, +2692/-105. +- Opus advisor pass on stage-323 brief: VERDICT=SHIP-READY. No coexistence bugs between #1895 and #1866 (disjoint hunks in routes.py, SSE event names align, `_profiles_match` resolution unambiguous either way, no path collisions). +- v0.51.27 fixes verified preserved: `_strip_workspace_prefix` (callers at routes.py:1446/1485), `on_interim_assistant` (streaming.py:2120), `_max_iterations_cfg` (streaming.py:2331-2410), `if input_tokens > 0:` guard (streaming.py:2933). +- Pre-stamp re-fetch of #1866 (sha f2aacf4) + #1895 (sha 766c91e): both MERGEABLE, no force-push during Opus window. + +### Follow-up items (filed for next sweep) + +- **Goal hook fires on unrelated turns** — while a goal is `active`, every completed assistant turn runs `evaluate_goal_after_turn` and ticks `state.turns_used += 1`, even on user messages unrelated to the goal. UX surprise but not bug-broken; consider gating on `user_initiated` or a goal-context flag. +- **English-only runtime strings in goal UI** — `messages.js:889` ("Evaluating goal progress…"), `commands.js:651` ("Working toward goal…"), `messages.js:914` ("Continuing toward goal…" toast); also backend strings in `goals.py` (`status_line`, "⊙ Goal set …", "⏸ Goal paused …", "↻ Continuing …"). The `cmd_goal` autocomplete description IS localized across all 9 locales — only the runtime status strings are missed. + ### Fixed (4 PRs) diff --git a/ROADMAP.md b/ROADMAP.md index 41cfb98b..30080d0f 100644 --- a/ROADMAP.md +++ b/ROADMAP.md @@ -2,7 +2,7 @@ > Web companion to the Hermes Agent CLI. Same workflows, browser-native. > -> Last updated: v0.51.27 (May 8, 2026) — 4898 tests collected — 4-PR Release E1 batch (workspace-prefix sentinel hardening, custom named provider API key resolution, streaming chat scroll-pin, Kanban detail scrollable) +> Last updated: v0.51.28 (May 8, 2026) — 4947 tests collected — 2-PR Release E2 batch (MCP server Option A rewrite + WebUI /goal command) > Test source: `pytest tests/ --collect-only -q` > Per-version detail: see [CHANGELOG.md](./CHANGELOG.md) @@ -168,7 +168,7 @@ Remaining gaps and forward work live in [Forward Work](#forward-work) below. ### Slash commands - [x] Command registry + autocomplete dropdown -- [x] Built-ins: `/help`, `/clear`, `/model`, `/workspace`, `/new`, `/usage`, `/theme`, `/compact`, `/queue`, `/interrupt`, `/steer`, `/btw`, `/reasoning`, `/skills`, `/toolsets` +- [x] Built-ins: `/help`, `/clear`, `/model`, `/workspace`, `/new`, `/usage`, `/theme`, `/compact`, `/queue`, `/interrupt`, `/steer`, `/goal`, `/btw`, `/reasoning`, `/skills`, `/toolsets` - [x] Transparent pass-through for unrecognized commands ### Security diff --git a/TESTING.md b/TESTING.md index e906700c..d254ea4d 100644 --- a/TESTING.md +++ b/TESTING.md @@ -1835,8 +1835,8 @@ Bridged CLI sessions: --- -*Last updated: v0.51.27, May 8, 2026* -*Total automated tests collected: 4898* +*Last updated: v0.51.28, May 8, 2026* +*Total automated tests collected: 4947* *Regression gate: tests/test_regressions.py* *Run: pytest tests/ -v --timeout=60* *Source: /* diff --git a/api/goals.py b/api/goals.py new file mode 100644 index 00000000..32d0262d --- /dev/null +++ b/api/goals.py @@ -0,0 +1,489 @@ +"""WebUI bridge for Hermes persistent session goals.""" + +from __future__ import annotations + +import copy +import logging +import time +from pathlib import Path +from typing import Any, Dict, Optional + +logger = logging.getLogger(__name__) + +try: # Exposed as a module attribute so tests can monkeypatch it directly. + from hermes_cli.goals import ( # type: ignore + CONTINUATION_PROMPT_TEMPLATE, + DEFAULT_MAX_TURNS, + GoalManager as _NativeGoalManager, + GoalState, + judge_goal, + ) +except Exception: # pragma: no cover - depends on installed hermes-agent + CONTINUATION_PROMPT_TEMPLATE = "" # type: ignore + DEFAULT_MAX_TURNS = 20 # type: ignore + _NativeGoalManager = None # type: ignore + GoalState = None # type: ignore + judge_goal = None # type: ignore + +GoalManager = _NativeGoalManager # type: ignore + +_DB_CACHE: dict[str, Any] = {} + + +def _default_max_turns() -> int: + """Return the configured /goal turn budget, defaulting to Hermes' 20 turns.""" + try: + from api import config as _config + + cfg = getattr(_config, "cfg", {}) or {} + goals_cfg = cfg.get("goals", {}) if isinstance(cfg, dict) else {} + if not isinstance(goals_cfg, dict): + return int(DEFAULT_MAX_TURNS or 20) + return max(1, int(goals_cfg.get("max_turns", DEFAULT_MAX_TURNS or 20) or 20)) + except Exception: + return int(DEFAULT_MAX_TURNS or 20) + + +def _meta_key(session_id: str) -> str: + return f"goal:{session_id}" + + +def _profile_db(profile_home: str | Path): + """Return a SessionDB pinned to *profile_home*, without reading HERMES_HOME. + + The upstream Hermes GoalManager persists through hermes_cli.goals.load_goal(), + which resolves SessionDB from process-global HERMES_HOME. WebUI sessions are + profile-scoped and can run concurrently, so the WebUI bridge uses an explicit + state.db path whenever the caller provides the session's profile home. + """ + home = Path(profile_home).expanduser().resolve() + key = str(home) + cached = _DB_CACHE.get(key) + if cached is not None: + return cached + try: + from hermes_state import SessionDB # type: ignore + + db = SessionDB(db_path=home / "state.db") + except Exception as exc: # pragma: no cover - import/env dependent + logger.debug("GoalManager profile DB unavailable for %s: %s", home, exc) + return None + _DB_CACHE[key] = db + return db + + +class _ProfileGoalManager: + """Small WebUI-local GoalManager adapter with explicit profile persistence.""" + + def __init__(self, session_id: str, *, profile_home: str | Path, default_max_turns: int = 20): + if GoalState is None: + raise RuntimeError("Hermes goal state unavailable") + self.session_id = session_id + self.profile_home = Path(profile_home).expanduser().resolve() + self.default_max_turns = int(default_max_turns or DEFAULT_MAX_TURNS or 20) + self._state = self._load() + + @property + def state(self): + return self._state + + def _load(self): + db = _profile_db(self.profile_home) + if db is None or not self.session_id: + return None + try: + raw = db.get_meta(_meta_key(self.session_id)) + except Exception as exc: + logger.debug("GoalManager profile get_meta failed: %s", exc) + return None + if not raw: + return None + try: + return GoalState.from_json(raw) # type: ignore[union-attr] + except Exception as exc: + logger.warning("GoalManager profile state parse failed for %s: %s", self.session_id, exc) + return None + + def _save(self, state) -> None: + db = _profile_db(self.profile_home) + if db is None or not self.session_id or state is None: + return + try: + db.set_meta(_meta_key(self.session_id), state.to_json()) + except Exception as exc: + logger.debug("GoalManager profile set_meta failed: %s", exc) + + def is_active(self) -> bool: + return self._state is not None and self._state.status == "active" + + def has_goal(self) -> bool: + return self._state is not None and self._state.status in ("active", "paused") + + def status_line(self) -> str: + s = self._state + if s is None or s.status in ("cleared",): + return "No active goal. Set one with /goal ." + turns = f"{s.turns_used}/{s.max_turns} turns" + if s.status == "active": + return f"⊙ Goal (active, {turns}): {s.goal}" + if s.status == "paused": + extra = f" — {s.paused_reason}" if s.paused_reason else "" + return f"⏸ Goal (paused, {turns}{extra}): {s.goal}" + if s.status == "done": + return f"✓ Goal done ({turns}): {s.goal}" + return f"Goal ({s.status}, {turns}): {s.goal}" + + def set(self, goal: str, *, max_turns: Optional[int] = None): + goal = (goal or "").strip() + if not goal: + raise ValueError("goal text is empty") + state = GoalState( # type: ignore[operator] + goal=goal, + status="active", + turns_used=0, + max_turns=int(max_turns) if max_turns else self.default_max_turns, + created_at=time.time(), + last_turn_at=0.0, + ) + self._state = state + self._save(state) + return state + + def pause(self, reason: str = "user-paused"): + if not self._state: + return None + self._state.status = "paused" + self._state.paused_reason = reason + self._save(self._state) + return self._state + + def resume(self, *, reset_budget: bool = True): + if not self._state: + return None + self._state.status = "active" + self._state.paused_reason = None + if reset_budget: + self._state.turns_used = 0 + self._save(self._state) + return self._state + + def clear(self) -> None: + if self._state is None: + return + self._state.status = "cleared" + self._save(self._state) + self._state = None + + def evaluate_after_turn(self, last_response: str, *, user_initiated: bool = True) -> Dict[str, Any]: + state = self._state + if state is None or state.status != "active": + return { + "status": state.status if state else None, + "should_continue": False, + "continuation_prompt": None, + "verdict": "inactive", + "reason": "no active goal", + "message": "", + } + + state.turns_used += 1 + state.last_turn_at = time.time() + + if judge_goal is None: + verdict, reason = "continue", "goal judge unavailable" + else: + verdict, reason = judge_goal(state.goal, str(last_response or "")) + state.last_verdict = verdict + state.last_reason = reason + + if verdict == "done": + state.status = "done" + self._save(state) + return { + "status": "done", + "should_continue": False, + "continuation_prompt": None, + "verdict": "done", + "reason": reason, + "message": f"✓ Goal achieved: {reason}", + } + + if state.turns_used >= state.max_turns: + state.status = "paused" + state.paused_reason = f"turn budget exhausted ({state.turns_used}/{state.max_turns})" + self._save(state) + return { + "status": "paused", + "should_continue": False, + "continuation_prompt": None, + "verdict": "continue", + "reason": reason, + "message": ( + f"⏸ Goal paused — {state.turns_used}/{state.max_turns} turns used. " + "Use /goal resume to keep going, or /goal clear to stop." + ), + } + + self._save(state) + return { + "status": "active", + "should_continue": True, + "continuation_prompt": self.next_continuation_prompt(), + "verdict": "continue", + "reason": reason, + "message": f"↻ Continuing toward goal ({state.turns_used}/{state.max_turns}): {reason}", + } + + def next_continuation_prompt(self) -> Optional[str]: + if not self._state or self._state.status != "active": + return None + return CONTINUATION_PROMPT_TEMPLATE.format(goal=self._state.goal) + + +def _manager(session_id: str, *, profile_home: str | Path | None = None): + if GoalManager is None: + return None + if profile_home and GoalManager is _NativeGoalManager and GoalState is not None: + try: + return _ProfileGoalManager( + session_id=session_id, + profile_home=profile_home, + default_max_turns=_default_max_turns(), + ) + except Exception as exc: + logger.debug("Profile-scoped GoalManager unavailable: %s", exc) + return None + return GoalManager(session_id=session_id, default_max_turns=_default_max_turns()) + + +def _state_payload(state: Any) -> Optional[Dict[str, Any]]: + if state is None: + return None + return { + "goal": getattr(state, "goal", "") or "", + "status": getattr(state, "status", "") or "", + "turns_used": int(getattr(state, "turns_used", 0) or 0), + "max_turns": int(getattr(state, "max_turns", 0) or 0), + "last_verdict": getattr(state, "last_verdict", None), + "last_reason": getattr(state, "last_reason", None), + "paused_reason": getattr(state, "paused_reason", None), + } + + +def _payload( + *, + ok: bool = True, + action: str, + message: str, + state: Any = None, + error: str | None = None, + kickoff_prompt: str | None = None, + decision: Dict[str, Any] | None = None, +) -> Dict[str, Any]: + body: Dict[str, Any] = { + "ok": bool(ok), + "action": action, + "message": message, + "goal": _state_payload(state), + } + if error: + body["error"] = error + if kickoff_prompt: + body["kickoff_prompt"] = kickoff_prompt + if decision is not None: + body["decision"] = decision + return body + + +def goal_state_snapshot(session_id: str, *, profile_home: str | Path | None = None) -> Any: + """Return a deep copy of current goal state for rollback before kickoff.""" + mgr = _manager(str(session_id or ""), profile_home=profile_home) + if mgr is None: + return None + return copy.deepcopy(getattr(mgr, "state", None)) + + +def restore_goal_state(session_id: str, snapshot: Any, *, profile_home: str | Path | None = None) -> None: + """Restore a prior goal state after kickoff stream creation fails.""" + mgr = _manager(str(session_id or ""), profile_home=profile_home) + if mgr is None: + return + if snapshot is None: + try: + mgr.clear() + except Exception: + pass + return + if isinstance(mgr, _ProfileGoalManager): + mgr._state = snapshot + mgr._save(snapshot) + return + try: + from hermes_cli.goals import save_goal # type: ignore + + save_goal(str(session_id or ""), snapshot) + except Exception as exc: # pragma: no cover - native fallback only + logger.debug("Goal state restore failed for %s: %s", session_id, exc) + + +def goal_command_payload( + session_id: str, + args: str = "", + *, + stream_running: bool = False, + profile_home: str | Path | None = None, +) -> Dict[str, Any]: + """Return the WebUI response payload for a /goal command. + + Mirrors the gateway command semantics: + - /goal or /goal status shows status + - /goal pause pauses + - /goal resume resumes without auto-starting a turn + - /goal clear|stop|done clears + - /goal sets a new active goal and returns kickoff_prompt so the + caller can start the first normal user-role turn immediately. + """ + sid = str(session_id or "").strip() + if not sid: + return _payload(ok=False, action="error", error="missing_session", message="session_id required") + + mgr = _manager(sid, profile_home=profile_home) + if mgr is None: + return _payload(ok=False, action="error", error="unavailable", message="Goals unavailable on this session.") + + text = str(args or "").strip() + lower = text.lower() + + if not text or lower == "status": + return _payload(action="status", message=mgr.status_line(), state=getattr(mgr, "state", None)) + + if lower == "pause": + state = mgr.pause(reason="user-paused") + if state is None: + return _payload(ok=False, action="pause", error="no_goal", message="No goal set.") + return _payload(action="pause", message=f"⏸ Goal paused: {state.goal}", state=state) + + if lower == "resume": + state = mgr.resume() + if state is None: + return _payload(ok=False, action="resume", error="no_goal", message="No goal to resume.") + return _payload( + action="resume", + message=( + f"▶ Goal resumed: {state.goal}\n" + "Send a new message, or type continue, to kick it off." + ), + state=state, + ) + + if lower in ("clear", "stop", "done"): + had = bool(mgr.has_goal()) + mgr.clear() + return _payload( + action="clear", + message="Goal cleared." if had else "No active goal.", + state=getattr(mgr, "state", None), + ) + + if stream_running: + return _payload( + ok=False, + action="set", + error="agent_running", + message=( + "Agent is running — use /goal status / pause / clear mid-run, " + "or /stop before setting a new goal." + ), + ) + + try: + state = mgr.set(text) + except ValueError as exc: + return _payload(ok=False, action="set", error="invalid_goal", message=f"Invalid goal: {exc}") + + return _payload( + action="set", + message=( + f"⊙ Goal set ({state.max_turns}-turn budget): {state.goal}\n" + "I'll keep working until the goal is done, you pause/clear it, or the budget is exhausted.\n" + "Controls: /goal status · /goal pause · /goal resume · /goal clear" + ), + state=state, + kickoff_prompt=state.goal, + ) + + +def has_active_goal( + session_id: str, + *, + profile_home: str | Path | None = None, +) -> bool: + """Return True when the session has an active standing goal to evaluate.""" + sid = str(session_id or "").strip() + if not sid: + return False + mgr = _manager(sid, profile_home=profile_home) + if mgr is None: + return False + try: + return bool(mgr.is_active()) + except Exception as exc: + logger.debug("goal active-state check failed for session=%s: %s", sid, exc) + return False + + +def evaluate_goal_after_turn( + session_id: str, + last_response: str, + *, + user_initiated: bool = True, + profile_home: str | Path | None = None, +) -> Dict[str, Any]: + """Evaluate a completed turn against the standing goal, if any.""" + sid = str(session_id or "").strip() + if not sid: + return { + "status": None, + "should_continue": False, + "continuation_prompt": None, + "verdict": "inactive", + "reason": "missing session_id", + "message": "", + } + mgr = _manager(sid, profile_home=profile_home) + if mgr is None: + return { + "status": None, + "should_continue": False, + "continuation_prompt": None, + "verdict": "inactive", + "reason": "goals unavailable", + "message": "", + } + try: + if not mgr.is_active(): + return { + "status": getattr(getattr(mgr, "state", None), "status", None), + "should_continue": False, + "continuation_prompt": None, + "verdict": "inactive", + "reason": "no active goal", + "message": "", + } + decision = mgr.evaluate_after_turn(str(last_response or ""), user_initiated=user_initiated) + except Exception as exc: + logger.debug("goal evaluation failed for session=%s: %s", sid, exc) + return { + "status": None, + "should_continue": False, + "continuation_prompt": None, + "verdict": "error", + "reason": f"goal evaluation failed: {type(exc).__name__}", + "message": "", + } + if not isinstance(decision, dict): + decision = {} + decision.setdefault("should_continue", False) + decision.setdefault("continuation_prompt", None) + decision.setdefault("message", "") + return decision diff --git a/api/profiles.py b/api/profiles.py index f9744c27..9af9bcba 100644 --- a/api/profiles.py +++ b/api/profiles.py @@ -170,6 +170,33 @@ def _is_root_profile(name: str) -> bool: return name in _root_profile_name_cache +def _profiles_match(row_profile, active_profile) -> bool: + """Return True if a session/project row's profile matches the active profile. + + Treats both the literal alias 'default' and any renamed-root display name + (per _is_root_profile) as equivalent, so legacy rows tagged 'default' + still surface when the user has renamed the root profile to e.g. 'kinni', + and vice versa. + + A row with no profile (`None` or empty string) is treated as belonging to + the root profile — that's the convention used by the legacy backfill at + api/models.py::all_sessions, and matches the default seen in + `static/sessions.js` (`S.activeProfile||'default'`). + + Originally lived in api/routes.py; relocated here so both routes.py and + out-of-process consumers (mcp_server.py) can import the canonical helper + instead of duplicating the body. See #1614 for the visibility model. + """ + row = row_profile or 'default' + active = active_profile or 'default' + if row == active: + return True + # Cross-alias the renamed root. + if _is_root_profile(row) and _is_root_profile(active): + return True + return False + + def get_active_profile_name() -> str: """Return the currently active profile name. diff --git a/api/routes.py b/api/routes.py index 1fc35ad0..26bcd782 100644 --- a/api/routes.py +++ b/api/routes.py @@ -72,29 +72,11 @@ _STALE_MESSAGING_END_REASONS = {"session_reset", "session_switch"} # when the active profile is `'default'`. _is_root_profile() is the # canonical check. -def _profiles_match(row_profile, active_profile) -> bool: - """Return True if a session/project row's profile matches the active profile. - - Treats both the literal alias 'default' and any renamed-root display name - (per _is_root_profile) as equivalent, so legacy rows tagged 'default' - still surface when the user has renamed the root profile to e.g. 'kinni', - and vice versa. - - A row with no profile (`None` or empty string) is treated as belonging to - the root profile — that's the convention used by the legacy backfill at - api/models.py::all_sessions, and matches the default seen in - `static/sessions.js` (`S.activeProfile||'default'`). - """ - from api.profiles import _is_root_profile - - row = row_profile or 'default' - active = active_profile or 'default' - if row == active: - return True - # Cross-alias the renamed root. - if _is_root_profile(row) and _is_root_profile(active): - return True - return False +# Canonical helper now lives in api.profiles so out-of-process consumers +# (mcp_server.py) can import it without duplicating the visibility model. +# Re-exported here so existing `_profiles_match(...)` call sites in this +# module keep resolving without per-call-site refactors. +from api.profiles import _profiles_match # noqa: F401, E402 (re-export) def _all_profiles_query_flag(parsed_url) -> bool: @@ -4287,6 +4269,9 @@ def handle_post(handler, parsed) -> bool: if parsed.path == "/api/background": return _handle_background(handler, body) + if parsed.path == "/api/goal": + return _handle_goal_command(handler, body) + if parsed.path == "/api/chat/start": return _handle_chat_start(handler, body, diag=diag) @@ -6456,6 +6441,197 @@ def _prepare_chat_start_session_for_stream( s.save() +def _start_chat_stream_for_session( + s, + *, + msg: str, + attachments=None, + workspace: str, + model: str, + model_provider=None, + normalized_model: bool = False, + diag=None, +): + """Persist pending state, register an SSE channel, and start an agent turn.""" + attachments = attachments or [] + # Prevent duplicate runs in the same session while a stream is still active. + # This commonly happens after page refresh/reconnect races and can produce + # duplicated clarify cards for what appears to be a single user request. + diag.stage("active_stream_check") if diag else None + current_stream_id = getattr(s, "active_stream_id", None) + if current_stream_id: + diag.stage("active_stream_lock_wait") if diag else None + with STREAMS_LOCK: + current_active = current_stream_id in STREAMS + if current_active: + diag.stage("response_write") if diag else None + return { + "error": "session already has an active stream", + "active_stream_id": current_stream_id, + "_status": 409, + } + # Stale stream id from a previous run; clear and continue. + diag.stage("stale_stream_cleanup") if diag else None + _clear_stale_stream_state(s) + stream_id = uuid.uuid4().hex + session_lock = _get_session_agent_lock(s.session_id) + diag.stage("session_lock_wait") if diag else None + with session_lock: + diag.stage("save_pending_state") if diag else None + _prepare_chat_start_session_for_stream( + s, + msg=msg, + attachments=attachments, + workspace=workspace, + model=model, + model_provider=model_provider, + stream_id=stream_id, + ) + diag.stage("set_last_workspace") if diag else None + set_last_workspace(workspace) + diag.stage("stream_registration") if diag else None + stream = create_stream_channel() + with STREAMS_LOCK: + STREAMS[stream_id] = stream + diag.stage("worker_thread_start") if diag else None + thr = threading.Thread( + target=_run_agent_streaming, + args=(s.session_id, msg, model, workspace, stream_id, attachments), + kwargs={"model_provider": model_provider}, + daemon=True, + ) + thr.start() + response = { + "stream_id": stream_id, + "session_id": s.session_id, + "pending_started_at": s.pending_started_at, + } + if normalized_model: + response["effective_model"] = model + if model_provider: + response["effective_model_provider"] = model_provider + return response + + +def _handle_goal_command(handler, body): + """Handle WebUI /goal command controls and optional kickoff stream.""" + try: + require(body, "session_id") + except ValueError as e: + return bad(handler, str(e)) + try: + s = get_session(body["session_id"]) + except KeyError: + return bad(handler, "Session not found", 404) + + requested_profile = str(body.get("profile") or "").strip() + if requested_profile: + try: + from api.profiles import _PROFILE_ID_RE + + if requested_profile != "default" and not _PROFILE_ID_RE.fullmatch(requested_profile): + return bad(handler, "invalid profile", 400) + except ImportError: + requested_profile = "" + if requested_profile and not _profiles_match(getattr(s, "profile", None), requested_profile): + has_persisted_turns = bool( + getattr(s, "messages", None) + or getattr(s, "context_messages", None) + or getattr(s, "pending_user_message", None) + ) + if not has_persisted_turns: + s.profile = requested_profile + + current_stream_id = getattr(s, "active_stream_id", None) + stream_running = False + if current_stream_id: + with STREAMS_LOCK: + stream_running = current_stream_id in STREAMS + if not stream_running: + _clear_stale_stream_state(s) + + try: + from api.profiles import get_hermes_home_for_profile + + profile_home = get_hermes_home_for_profile(getattr(s, "profile", None)) + except Exception: + profile_home = None + + from api.goals import goal_command_payload, goal_state_snapshot, restore_goal_state + + goal_args = str(body.get("args", "") or body.get("text", "") or "") + goal_action = goal_args.strip().lower() + will_kickoff = bool( + goal_args.strip() + and goal_action not in ("status", "pause", "resume", "clear", "stop", "done") + and not stream_running + ) + workspace = model = model_provider = normalized_model = None + previous_goal_state = None + if will_kickoff: + try: + workspace = str(resolve_trusted_workspace(body.get("workspace") or s.workspace)) + except ValueError as e: + return bad(handler, str(e)) + requested_model = body.get("model") or s.model + requested_provider = ( + body.get("model_provider") + if "model_provider" in body + else getattr(s, "model_provider", None) + ) + model, model_provider, normalized_model = _resolve_compatible_session_model_state( + requested_model, + requested_provider, + ) + previous_goal_state = goal_state_snapshot(s.session_id, profile_home=profile_home) + + payload = goal_command_payload( + s.session_id, + goal_args, + stream_running=stream_running, + profile_home=profile_home, + ) + if not payload.get("ok", True): + status = 409 if payload.get("error") == "agent_running" else 400 + return j(handler, payload, status=status) + + kickoff_prompt = str(payload.get("kickoff_prompt") or "").strip() + if kickoff_prompt: + if workspace is None: + try: + workspace = str(resolve_trusted_workspace(body.get("workspace") or s.workspace)) + except ValueError as e: + return bad(handler, str(e)) + if model is None: + requested_model = body.get("model") or s.model + requested_provider = ( + body.get("model_provider") + if "model_provider" in body + else getattr(s, "model_provider", None) + ) + model, model_provider, normalized_model = _resolve_compatible_session_model_state( + requested_model, + requested_provider, + ) + stream_response = _start_chat_stream_for_session( + s, + msg=kickoff_prompt, + attachments=[], + workspace=workspace, + model=model, + model_provider=model_provider, + normalized_model=normalized_model, + ) + status = int(stream_response.pop("_status", 200) or 200) + payload.update(stream_response) + if status >= 400: + restore_goal_state(s.session_id, previous_goal_state, profile_home=profile_home) + payload["ok"] = False + return j(handler, payload, status=status) + + return j(handler, payload) + + def _handle_chat_start(handler, body, diag=None): try: diag.stage("validate_session_id") if diag else None @@ -6512,72 +6688,25 @@ def _handle_chat_start(handler, body, diag=None): requested_model, requested_provider, ) - # Prevent duplicate runs in the same session while a stream is still active. - # This commonly happens after page refresh/reconnect races and can produce - # duplicated clarify cards for what appears to be a single user request. - diag.stage("active_stream_check") if diag else None - current_stream_id = getattr(s, "active_stream_id", None) - if current_stream_id: - diag.stage("active_stream_lock_wait") if diag else None - with STREAMS_LOCK: - current_active = current_stream_id in STREAMS - if current_active: - diag.stage("response_write") if diag else None - return j( - handler, - { - "error": "session already has an active stream", - "active_stream_id": current_stream_id, - }, - status=409, - ) - # Stale stream id from a previous run; clear and continue. - diag.stage("stale_stream_cleanup") if diag else None - _clear_stale_stream_state(s) - stream_id = uuid.uuid4().hex - session_lock = _get_session_agent_lock(s.session_id) - diag.stage("session_lock_wait") if diag else None - with session_lock: - diag.stage("save_pending_state") if diag else None - _prepare_chat_start_session_for_stream( - s, - msg=msg, - attachments=attachments, - workspace=workspace, - model=model, - model_provider=model_provider, - stream_id=stream_id, - ) - diag.stage("set_last_workspace") if diag else None - set_last_workspace(workspace) - diag.stage("stream_registration") if diag else None - stream = create_stream_channel() - with STREAMS_LOCK: - STREAMS[stream_id] = stream - diag.stage("worker_thread_start") if diag else None - thr = threading.Thread( - target=_run_agent_streaming, - args=(s.session_id, msg, model, workspace, stream_id, attachments), - kwargs={"model_provider": model_provider}, - daemon=True, + response = _start_chat_stream_for_session( + s, + msg=msg, + attachments=attachments, + workspace=workspace, + model=model, + model_provider=model_provider, + normalized_model=normalized_model, + diag=diag, ) - thr.start() - response = { - "stream_id": stream_id, - "session_id": s.session_id, - "pending_started_at": s.pending_started_at, - } - if normalized_model: - response["effective_model"] = model - if model_provider: - response["effective_model_provider"] = model_provider + status = int(response.pop("_status", 200) or 200) diag.stage("response_write") if diag else None - return j(handler, response) + return j(handler, response, status=status) finally: if diag: diag.finish() + def _normalize_chat_attachments(raw_attachments): """Normalize attachment payloads from the browser. diff --git a/api/streaming.py b/api/streaming.py index c71fe074..d9daeb33 100644 --- a/api/streaming.py +++ b/api/streaming.py @@ -3227,6 +3227,64 @@ def _run_agent_streaming( }) except Exception: logger.debug("Failed to drain pending steer for session %s", session_id) + # /goal parity: after a successful assistant turn, run the Hermes + # GoalManager judge before terminal done/stream_end events. The + # frontend surfaces the status line and queues continuation_prompt as + # a normal next user message so /queue and user input keep priority. + try: + from api.goals import evaluate_goal_after_turn, has_active_goal + + if not has_active_goal(session_id, profile_home=_profile_home): + _goal_decision = {} + else: + _last_goal_response = '' + for _goal_msg in reversed(s.messages or []): + if not isinstance(_goal_msg, dict) or _goal_msg.get('role') != 'assistant': + continue + _goal_content = _goal_msg.get('content', '') + if isinstance(_goal_content, list): + _goal_parts = [] + for _goal_part in _goal_content: + if isinstance(_goal_part, dict): + _goal_text = _goal_part.get('text') or _goal_part.get('content') + if _goal_text: + _goal_parts.append(str(_goal_text)) + _last_goal_response = '\n'.join(_goal_parts) + else: + _last_goal_response = str(_goal_content or '') + break + put('goal', { + 'session_id': session_id, + 'state': 'evaluating', + 'message': 'Evaluating goal progress…', + }) + _goal_decision = evaluate_goal_after_turn( + session_id, + _last_goal_response, + user_initiated=True, + profile_home=_profile_home, + ) + decision = _goal_decision or {} + _goal_message = str(decision.get('message') or '').strip() + if _goal_message: + put('goal', { + 'session_id': session_id, + 'state': 'continuing' if decision.get('should_continue') else 'idle', + 'message': _goal_message, + 'decision': decision, + }) + if decision.get('should_continue'): + continuation_prompt = str(decision.get('continuation_prompt') or '').strip() + if continuation_prompt: + put('goal_continue', { + 'session_id': session_id, + 'continuation_prompt': continuation_prompt, + 'text': continuation_prompt, + 'message': _goal_message, + 'decision': decision, + }) + except Exception as _goal_exc: + logger.debug("Goal continuation hook failed for session %s: %s", session_id, _goal_exc) raw_session = s.compact() | {'messages': s.messages, 'tool_calls': tool_calls} put('done', {'session': redact_session_data(raw_session), 'usage': usage}) # Emit one last metering packet for the live message-header TPS label. diff --git a/docs/pr-media/1808/goal-autocomplete.png b/docs/pr-media/1808/goal-autocomplete.png new file mode 100644 index 00000000..e07e32a5 Binary files /dev/null and b/docs/pr-media/1808/goal-autocomplete.png differ diff --git a/docs/pr-media/1808/goal-command-set.png b/docs/pr-media/1808/goal-command-set.png new file mode 100644 index 00000000..f5ee966d Binary files /dev/null and b/docs/pr-media/1808/goal-command-set.png differ diff --git a/docs/pr-media/1808/goal-status-command.png b/docs/pr-media/1808/goal-status-command.png new file mode 100644 index 00000000..26eef14b Binary files /dev/null and b/docs/pr-media/1808/goal-status-command.png differ diff --git a/docs/pr-media/1866/goal-evaluating-status.png b/docs/pr-media/1866/goal-evaluating-status.png new file mode 100644 index 00000000..a411a43a Binary files /dev/null and b/docs/pr-media/1866/goal-evaluating-status.png differ diff --git a/mcp_server.py b/mcp_server.py new file mode 100644 index 00000000..53ff2ef4 --- /dev/null +++ b/mcp_server.py @@ -0,0 +1,567 @@ +#!/usr/bin/env python3 +""" +Hermes WebUI MCP Server — exposes project and session management +as MCP tools for any MCP-compatible agent. + +Option A rewrite (2026-05-08): imports api.models and api.profiles +directly from the webui codebase, using canonical helpers for +locking, profile scoping, index consistency, and validation. + + pip install mcp # one-time setup + python3 mcp_server.py # start via stdio + +MCP config for Hermes Agent (add to config.yaml): + mcp_servers: + hermes-webui: + command: /path/to/venv/bin/python3 + args: [/path/to/hermes-webui/mcp_server.py] + env: + HERMES_WEBUI_PASSWORD: your_password + +Profile override (optional): + args: [/path/to/hermes-webui/mcp_server.py, --profile, myprofile] + +AI-authoring disclosure: this file was rewritten by MILO (Hermes Agent) +under human direction, per maintainer guidelines for #1616. +""" + +import argparse +import json +import os +import re +import sys +import time +import uuid +from pathlib import Path + +from mcp.server import Server +from mcp.server.stdio import stdio_server +from mcp.types import Tool, TextContent + +# ── Ensure the repo root is on sys.path so api.* imports work ───────────── +_REPO_ROOT = Path(__file__).parent.resolve() +if str(_REPO_ROOT) not in sys.path: + sys.path.insert(0, str(_REPO_ROOT)) + +# ── CLI: optional --profile override ────────────────────────────────────── +_profile_arg: str | None = None +_parser = argparse.ArgumentParser(add_help=False) +_parser.add_argument("--profile", type=str, default=None) +_args, _unknown = _parser.parse_known_args() +_profile_arg = _args.profile + +# ── Import webui canonical modules (after path setup) ───────────────────── +import api.config as _cfg +from api.config import ( + STATE_DIR, SESSION_DIR, SESSION_INDEX_FILE, PROJECTS_FILE, HOME, +) +from api.models import load_projects, save_projects +from api.profiles import get_active_profile_name, _is_root_profile, _profiles_match + +# ── Apply --profile override before any module uses get_active_profile_name +if _profile_arg is not None: + import api.profiles as _profiles + _profiles._active_profile = _profile_arg + +# ── API auth state ───────────────────────────────────────────────────────── +# Mirror the env-var contract used by api/config.py:32-33 so a non-default +# WebUI port/host (e.g. when 8787 is held by another service on the host) +# Just Works without configuration drift between the WebUI process and MCP. +WEBUI_HOST = os.environ.get("HERMES_WEBUI_HOST", "127.0.0.1") +WEBUI_PORT = os.environ.get("HERMES_WEBUI_PORT", "8787") +WEBUI_URL = f"http://{WEBUI_HOST}:{WEBUI_PORT}" +_auth_cookie: str | None = None +_auth_expires: float = 0 # unix timestamp after which we re-auth + +server = Server("hermes-webui") + + +# ═══════════════════════════════════════════════════════════════════════════ +# Helpers — filesystem (project CRUD via canonical api.models) +# ═══════════════════════════════════════════════════════════════════════════ + +def _active_profile() -> str: + """Shorthand for the current profile name (--profile or auto-detected).""" + return get_active_profile_name() or 'default' + + +def _validate_color(color: str | None) -> str | None: + """Return an error string if color is invalid, else None.""" + if color is not None and not re.match(r"^#[0-9a-fA-F]{3,8}$", color): + return "Invalid color format (use #RGB, #RRGGBB, or #RRGGBBAA)" + return None + + +def _load_index() -> list: + """Read the session index. Falls back to empty list on failure.""" + if not SESSION_INDEX_FILE.exists(): + return [] + try: + return json.loads(SESSION_INDEX_FILE.read_text(encoding="utf-8")) + except Exception: + return [] + + +def _session_compact(row: dict) -> dict: + """Lightweight compact representation of a session index entry.""" + return { + "session_id": row.get("session_id"), + "title": row.get("title"), + "project_id": row.get("project_id"), + "workspace": row.get("workspace"), + "model": row.get("model"), + "message_count": row.get("message_count", 0), + "source_tag": row.get("source_tag"), + "is_cli_session": row.get("is_cli_session", False), + "profile": row.get("profile"), + } + + +# ═══════════════════════════════════════════════════════════════════════════ +# Helpers — HTTP API (for mutations that need cache sync) +# ═══════════════════════════════════════════════════════════════════════════ + +def _api_password() -> str | None: + """Return the plaintext webui password from HERMES_WEBUI_PASSWORD, or None. + + settings.json stores only the bcrypt hash, which the login endpoint cannot + accept — it calls verify_password(plaintext) against the stored hash. So + there's no usable fallback when the env var is unset; the MCP simply runs + in unauthenticated mode and any auth-protected mutation will fail clearly + with the server's 401 instead of silently sending an unusable hash. + """ + pw = os.environ.get("HERMES_WEBUI_PASSWORD", "").strip() + return pw or None + + +def _api_auth() -> str | None: + """Authenticate and return cookie value, or None if auth disabled/fails.""" + global _auth_cookie, _auth_expires + + pw = _api_password() + if not pw: + return None # auth not enabled — API calls will fail anyway + + # Reuse cookie if still valid (25 days — server issues 30-day cookies) + if _auth_cookie and time.time() < _auth_expires: + return _auth_cookie + + import urllib.request + + try: + req = urllib.request.Request( + f"{WEBUI_URL}/api/auth/login", + data=json.dumps({"password": pw}).encode(), + headers={"Content-Type": "application/json"}, + method="POST", + ) + resp = urllib.request.urlopen(req, timeout=5) + cookie = resp.headers.get("Set-Cookie", "") + if cookie: + _auth_cookie = cookie.split(";")[0] # "hermes_session=VALUE; ..." + _auth_expires = time.time() + 25 * 86400 # 25 days + return _auth_cookie + except Exception: + _auth_cookie = None + return None + + +def _api_post(endpoint: str, body: dict) -> dict: + """POST to webui API with auth cookie. Returns parsed JSON response.""" + import urllib.request + import urllib.error + + cookie = _api_auth() + headers = {"Content-Type": "application/json"} + if cookie: + headers["Cookie"] = cookie + + try: + req = urllib.request.Request( + f"{WEBUI_URL}{endpoint}", + data=json.dumps(body).encode(), + headers=headers, + method="POST", + ) + resp = urllib.request.urlopen(req, timeout=5) + return json.loads(resp.read()) + except urllib.error.HTTPError as e: + err_body = json.loads(e.read()) + return {"error": f"API {e.code}: {err_body.get('error', 'unknown')}"} + except Exception as e: + return {"error": f"API unreachable: {e}"} + + +# ═══════════════════════════════════════════════════════════════════════════ +# Tool handlers — read-only (filesystem, profile-aware) +# ═══════════════════════════════════════════════════════════════════════════ + +async def handle_list_projects(_arguments: dict) -> list[TextContent]: + """List all projects with session counts, scoped to active profile.""" + projects = load_projects() + active = _active_profile() + index = _load_index() + + # Session counts per project (from index) + counts: dict[str, int] = {} + for s in index: + pid = s.get("project_id") + if pid: + counts[pid] = counts.get(pid, 0) + 1 + + result = [] + for p in projects: + # Profile filter: legacy untagged rows are treated as 'default' by + # _profiles_match, so non-root profiles correctly hide them. + if not _profiles_match(p.get("profile"), active): + continue + entry = dict(p) + entry["session_count"] = counts.get(p["project_id"], 0) + result.append(entry) + + return [TextContent(type="text", text=json.dumps(result, ensure_ascii=False, indent=2))] + + +async def handle_list_sessions(arguments: dict) -> list[TextContent]: + """List sessions, optionally filtered by project or unassigned status.""" + project_id = arguments.get("project_id") + unassigned = arguments.get("unassigned", False) + limit = max(1, min(500, arguments.get("limit", 50))) + active = _active_profile() + + index = _load_index() + sessions = [_session_compact(s) for s in index if s.get("session_id")] + + # Filter by profile: legacy untagged rows are treated as 'default' by + # _profiles_match (canonical convention), so non-root profiles hide them. + sessions = [s for s in sessions if _profiles_match(s.get("profile"), active)] + + if unassigned: + sessions = [s for s in sessions if not s["project_id"]] + elif project_id: + sessions = [s for s in sessions if s["project_id"] == project_id] + + sessions = sessions[:limit] + return [TextContent(type="text", text=json.dumps(sessions, ensure_ascii=False, indent=2))] + + +# ═══════════════════════════════════════════════════════════════════════════ +# Tool handlers — project CRUD (canonical helpers, profile-scoped) +# ═══════════════════════════════════════════════════════════════════════════ + +async def handle_create_project(arguments: dict) -> list[TextContent]: + """Create a new project (profile-scoped, exact-match title collision).""" + name = arguments.get("name", "").strip()[:128] + if not name: + return [TextContent(type="text", text=json.dumps( + {"error": "name is required"}, ensure_ascii=False))] + + color = arguments.get("color") + color_err = _validate_color(color) + if color_err: + return [TextContent(type="text", text=json.dumps( + {"error": color_err}, ensure_ascii=False))] + + active = _active_profile() + projects = load_projects() + + # Title collision: exact match (consistent with ensure_cron_project) + if any(p.get("name") == name and _profiles_match(p.get("profile"), active) + for p in projects): + return [TextContent(type="text", text=json.dumps( + {"error": f"Project '{name}' already exists"}, ensure_ascii=False))] + + proj = { + "project_id": uuid.uuid4().hex[:12], + "name": name, + "color": color, + "profile": active, + "created_at": time.time(), + } + projects.append(proj) + save_projects(projects) + + proj["session_count"] = 0 + return [TextContent(type="text", text=json.dumps(proj, ensure_ascii=False, indent=2))] + + +async def handle_rename_project(arguments: dict) -> list[TextContent]: + """Rename a project and optionally change its color (profile-checked).""" + project_id = arguments.get("project_id") + name = arguments.get("name", "").strip()[:128] + if not project_id or not name: + return [TextContent(type="text", text=json.dumps( + {"error": "project_id and name are required"}, ensure_ascii=False))] + + color = arguments.get("color") + color_err = _validate_color(color) + if color_err: + return [TextContent(type="text", text=json.dumps( + {"error": color_err}, ensure_ascii=False))] + + active = _active_profile() + projects = load_projects() + proj = next((p for p in projects if p["project_id"] == project_id), None) + if not proj: + return [TextContent(type="text", text=json.dumps( + {"error": "Project not found"}, ensure_ascii=False))] + + # #1614: profile ownership check + if not _profiles_match(proj.get("profile"), active): + return [TextContent(type="text", text=json.dumps( + {"error": "Project not found"}, ensure_ascii=False))] + + proj["name"] = name + if color is not None: + proj["color"] = color + save_projects(projects) + return [TextContent(type="text", text=json.dumps(proj, ensure_ascii=False, indent=2))] + + +async def handle_delete_project(arguments: dict) -> list[TextContent]: + """Delete a project and unassign all its sessions (profile-checked).""" + project_id = arguments.get("project_id") + if not project_id: + return [TextContent(type="text", text=json.dumps( + {"error": "project_id is required"}, ensure_ascii=False))] + + active = _active_profile() + projects = load_projects() + proj = next((p for p in projects if p["project_id"] == project_id), None) + if not proj: + return [TextContent(type="text", text=json.dumps( + {"error": "Project not found"}, ensure_ascii=False))] + + # #1614: profile ownership check + if not _profiles_match(proj.get("profile"), active): + return [TextContent(type="text", text=json.dumps( + {"error": "Project not found"}, ensure_ascii=False))] + + projects = [p for p in projects if p["project_id"] != project_id] + save_projects(projects) + + # Unassign sessions only when we can do it cache-safely via the HTTP API. + # The previous filesystem fallback wrote session_data directly with + # os.replace(), which bypassed _write_session_index() in api/models.py + # and left _index.json holding the stale project_id — a running WebUI + # would still group those sessions under the deleted project until a + # subsequent re-compact. Even calling Session.save() in-process would + # not help because the WebUI's SESSIONS dict cache (a separate process) + # still has the old project_id and overwrites our update on its next + # save. The HTTP API is the only cache-safe path; without auth we + # refuse and surface the limitation so the operator can act. + has_auth = bool(_api_password()) + if not has_auth: + return [TextContent(type="text", text=json.dumps({ + "ok": True, + "deleted": proj["name"], + "unassigned_sessions": 0, + "warning": "Set HERMES_WEBUI_PASSWORD to unassign sessions; " + "without auth the session index cannot be safely " + "updated and direct filesystem writes would cause " + "index drift in a running WebUI.", + }, ensure_ascii=False))] + + unassigned = 0 + if SESSION_DIR.exists(): + for p in SESSION_DIR.glob("*.json"): + if p.name.startswith("_"): + continue + try: + session_data = json.loads(p.read_text(encoding="utf-8")) + if session_data.get("project_id") == project_id: + sid = p.stem + result = _api_post("/api/session/move", + {"session_id": sid, "project_id": None}) + if "ok" in result or "session" in result: + unassigned += 1 + except Exception: + pass + + return [TextContent(type="text", text=json.dumps({ + "ok": True, + "deleted": proj["name"], + "unassigned_sessions": unassigned, + }, ensure_ascii=False))] + + +# ═══════════════════════════════════════════════════════════════════════════ +# Tool handlers — mutations (HTTP API with auth, cache-safe) +# ═══════════════════════════════════════════════════════════════════════════ + +async def handle_rename_session(arguments: dict) -> list[TextContent]: + """Rename a session via the authenticated webui API (cache-safe).""" + session_id = arguments.get("session_id") + title = arguments.get("title", "").strip()[:80] + if not session_id or not title: + return [TextContent(type="text", text=json.dumps( + {"error": "session_id and title are required"}, ensure_ascii=False))] + + result = _api_post("/api/session/rename", + {"session_id": session_id, "title": title}) + if "error" in result: + return [TextContent(type="text", text=json.dumps(result, ensure_ascii=False))] + + session = result.get("session", {}) + return [TextContent(type="text", text=json.dumps({ + "ok": True, + "session_id": session_id, + "title": session.get("title", title), + "method": "api", + }, ensure_ascii=False, indent=2))] + + +async def handle_move_session(arguments: dict) -> list[TextContent]: + """Assign a session to a project via the authenticated webui API (cache-safe).""" + session_id = arguments.get("session_id") + project_id = arguments.get("project_id") # None/null = unassign + if not session_id: + return [TextContent(type="text", text=json.dumps( + {"error": "session_id is required"}, ensure_ascii=False))] + + # If project_id is provided, verify it exists and is profile-accessible + if project_id is not None: + projects = load_projects() + active = _active_profile() + target = next((p for p in projects if p["project_id"] == project_id), None) + if not target: + return [TextContent(type="text", text=json.dumps( + {"error": "Project not found"}, ensure_ascii=False))] + # #1614: refuse moves into projects owned by another profile + if not _profiles_match(target.get("profile"), active): + return [TextContent(type="text", text=json.dumps( + {"error": "Project not found"}, ensure_ascii=False))] + + result = _api_post("/api/session/move", + {"session_id": session_id, "project_id": project_id}) + if "error" in result: + return [TextContent(type="text", text=json.dumps(result, ensure_ascii=False))] + + session = result.get("session", {}) + return [TextContent(type="text", text=json.dumps({ + "ok": True, + "session_id": session_id, + "project_id": project_id, + "title": session.get("title"), + "method": "api", + }, ensure_ascii=False, indent=2))] + + +# ═══════════════════════════════════════════════════════════════════════════ +# MCP Server wiring +# ═══════════════════════════════════════════════════════════════════════════ + +TOOLS = [ + Tool( + name="list_projects", + description="List all session projects with their IDs, names, colors, and session counts (scoped to active profile).", + inputSchema={"type": "object", "properties": {}, "required": []}, + ), + Tool( + name="create_project", + description="Create a new project for organizing sessions (profile-scoped).", + inputSchema={ + "type": "object", + "properties": { + "name": {"type": "string", "description": "Project name (max 128 chars)"}, + "color": {"type": "string", "description": "Optional hex color (#RGB, #RRGGBB, or #RRGGBBAA)"}, + }, + "required": ["name"], + }, + ), + Tool( + name="rename_project", + description="Rename a project and optionally change its color (profile-checked).", + inputSchema={ + "type": "object", + "properties": { + "project_id": {"type": "string", "description": "12-char project ID"}, + "name": {"type": "string", "description": "New name (max 128 chars)"}, + "color": {"type": "string", "description": "Optional new hex color"}, + }, + "required": ["project_id", "name"], + }, + ), + Tool( + name="delete_project", + description="Delete a project and unassign all its sessions (profile-checked).", + inputSchema={ + "type": "object", + "properties": { + "project_id": {"type": "string", "description": "12-char project ID to delete"}, + }, + "required": ["project_id"], + }, + ), + Tool( + name="rename_session", + description="Rename a session (updates sidebar via authenticated API, cache-safe).", + inputSchema={ + "type": "object", + "properties": { + "session_id": {"type": "string", "description": "Session ID"}, + "title": {"type": "string", "description": "New title (max 80 chars)"}, + }, + "required": ["session_id", "title"], + }, + ), + Tool( + name="move_session", + description="Assign a session to a project. Pass project_id=null to unassign. Uses authenticated API for cache safety (profile-checked).", + inputSchema={ + "type": "object", + "properties": { + "session_id": {"type": "string", "description": "Session ID"}, + "project_id": {"type": ["string", "null"], "description": "Project ID (or null to unassign)"}, + }, + "required": ["session_id", "project_id"], + }, + ), + Tool( + name="list_sessions", + description="List sessions, optionally filtered by project or unassigned status (profile-scoped).", + inputSchema={ + "type": "object", + "properties": { + "project_id": {"type": "string", "description": "Filter sessions by project ID"}, + "unassigned": {"type": "boolean", "description": "Show only sessions with no project"}, + "limit": {"type": "integer", "description": "Max results (default: 50, max: 500)"}, + }, + "required": [], + }, + ), +] + +HANDLERS = { + "list_projects": handle_list_projects, + "create_project": handle_create_project, + "rename_project": handle_rename_project, + "delete_project": handle_delete_project, + "rename_session": handle_rename_session, + "move_session": handle_move_session, + "list_sessions": handle_list_sessions, +} + + +@server.list_tools() +async def list_tools() -> list[Tool]: + return TOOLS + + +@server.call_tool() +async def call_tool(name: str, arguments: dict) -> list[TextContent]: + handler = HANDLERS.get(name) + if not handler: + return [TextContent(type="text", text=json.dumps( + {"error": f"Unknown tool: {name}"}, ensure_ascii=False))] + return await handler(arguments) + + +async def main(): + async with stdio_server() as (read, write): + await server.run(read, write, server.create_initialization_options()) + + +if __name__ == "__main__": + import asyncio + asyncio.run(main()) diff --git a/static/commands.js b/static/commands.js index 90fc5aca..3d5aa9f0 100644 --- a/static/commands.js +++ b/static/commands.js @@ -18,6 +18,7 @@ const COMMANDS=[ {name:'personality', desc:t('cmd_personality'), fn:cmdPersonality, arg:'name', subArgs:'personalities'}, {name:'skills', desc:t('cmd_skills'), fn:cmdSkills, arg:'query'}, {name:'stop', desc:t('cmd_stop'), fn:cmdStop, noEcho:true}, + {name:'goal', desc:t('cmd_goal'), fn:cmdGoal, arg:'[status|pause|resume|clear|text]', subArgs:['status','pause','resume','clear']}, {name:'queue', desc:t('cmd_queue'), fn:cmdQueue, arg:'message', noEcho:true}, {name:'interrupt', desc:t('cmd_interrupt'), fn:cmdInterrupt, arg:'message', noEcho:true}, {name:'steer', desc:t('cmd_steer'), fn:cmdSteer, arg:'message', noEcho:true}, @@ -625,6 +626,53 @@ async function cmdStop(){ else showToast(t('cancel_unavailable')); } +async function cmdGoal(args){ + if(!S.session){await newSession();await renderSessionList();} + if(!S.session||!S.session.session_id){showToast(t('no_active_session'));return;} + const activeSid=S.session.session_id; + try{ + const r=await api('/api/goal',{method:'POST',body:JSON.stringify({ + session_id:activeSid, + args:args||'', + workspace:S.session.workspace, + model:S.session.model||($('modelSelect')&&$('modelSelect').value)||'', + model_provider:S.session.model_provider||null, + profile:S.activeProfile||S.session.profile||'default', + })}); + const msg=String((r&&r.message)||'').trim(); + if(msg){ + S.messages.push({role:'assistant',content:msg,_ts:Date.now()/1000,_goalStatus:true,_transient:true}); + renderMessages({preserveScroll:true}); + showToast(msg.split('\n')[0],2600); + } + if(!r||!r.stream_id)return; + S.toolCalls=[]; + if(typeof clearLiveToolCards==='function')clearLiveToolCards(); + appendThinking();setBusy(true); + setComposerStatus('Working toward goal…'); + S.activeStreamId=r.stream_id; + if(S.session&&S.session.session_id===activeSid){ + S.session.active_stream_id=r.stream_id; + if(typeof r.pending_started_at==='number')S.session.pending_started_at=r.pending_started_at; + if(r.effective_model)S.session.model=r.effective_model; + if(r.effective_model_provider)S.session.model_provider=r.effective_model_provider; + } + INFLIGHT[activeSid]={messages:[...S.messages],uploaded:[],toolCalls:[]}; + if(typeof markInflight==='function')markInflight(activeSid,r.stream_id); + if(typeof saveInflightState==='function')saveInflightState(activeSid,{streamId:r.stream_id,messages:INFLIGHT[activeSid].messages,uploaded:[],toolCalls:[]}); + startApprovalPolling(activeSid); + startClarifyPolling(activeSid); + if(typeof _fetchYoloState==='function')_fetchYoloState(activeSid); + attachLiveStream(activeSid,r.stream_id,[]); + if(typeof renderSessionList==='function')void renderSessionList(); + }catch(e){ + const err=String((e&&e.message)||e||'Goal command failed'); + S.messages.push({role:'assistant',content:`**Goal command failed:** ${err}`,_ts:Date.now()/1000,_error:true}); + renderMessages({preserveScroll:true}); + showToast(err,3000); + } +} + // ── Busy-input mode commands ────────────────────────────────────────────── // These commands let users override the default busy_input_mode setting for a // specific message. They are only meaningful while the agent is running. diff --git a/static/i18n.js b/static/i18n.js index ffe70f84..41c97275 100644 --- a/static/i18n.js +++ b/static/i18n.js @@ -183,6 +183,7 @@ const LOCALES = { theme_set: 'Theme: ', no_active_session: 'No active session', cmd_queue: 'Queue a message for the next turn', + cmd_goal: 'Set or inspect a persistent goal', cmd_interrupt: 'Cancel current turn and send a new message', cmd_steer: 'Inject a mid-turn correction without interrupting the agent', cmd_queue_no_msg: 'Usage: /queue ', @@ -1204,6 +1205,7 @@ const LOCALES = { theme_set: 'テーマ: ', no_active_session: 'アクティブなセッションがありません', cmd_queue: '次のターン用にメッセージをキュー', + cmd_goal: '永続ゴールを設定または確認', cmd_interrupt: '現在のターンをキャンセルして新規メッセージを送信', cmd_steer: 'エージェントを中断せずにターン中に修正を注入', cmd_queue_no_msg: '使い方: /queue <メッセージ>', @@ -2186,6 +2188,7 @@ const LOCALES = { theme_set: 'Тема: ', no_active_session: 'Нет активной сессии', cmd_queue: 'Поставить сообщение в очередь на следующий оборот', + cmd_goal: 'Задать или проверить постоянную цель', cmd_interrupt: 'Прервать текущий оборот и отправить новое сообщение', cmd_steer: 'Направить агента исправлением (переходит к прерыванию)', cmd_queue_no_msg: 'Использование: /queue <сообщение>', @@ -3173,6 +3176,7 @@ const LOCALES = { theme_set: 'Tema: ', no_active_session: 'No hay ninguna sesión activa', cmd_queue: 'Poner mensaje en cola para el siguiente turno', + cmd_goal: 'Definir o consultar un objetivo persistente', cmd_interrupt: 'Cancelar turno actual y enviar nuevo mensaje', cmd_steer: 'Inyectar una corrección a mitad del turno sin interrumpir al agente', cmd_queue_no_msg: 'Uso: /queue ', @@ -4109,6 +4113,7 @@ const LOCALES = { model_scope_advisory: 'Gilt für diesen Chat ab Ihrer nächsten Nachricht.', model_scope_toast: 'Gilt für diesen Chat ab Ihrer nächsten Nachricht.', cmd_queue: 'Nachricht f\u00fcr den n\u00e4chsten Durchgang einreihen', + cmd_goal: 'Ein dauerhaftes Ziel setzen oder prüfen', cmd_interrupt: 'Aktuellen Durchgang abbrechen und neue Nachricht senden', cmd_steer: 'Korrektursignal einf\u00fcgen ohne Unterbrechung', cmd_queue_no_msg: 'Verwendung: /queue ', @@ -5082,6 +5087,7 @@ const LOCALES = { theme_set: '\u4e3b\u9898\uff1a', no_active_session: '\u5f53\u524d\u6ca1\u6709\u6d3b\u52a8\u4f1a\u8bdd', cmd_queue: '\u5c06\u6d88\u606f\u52a0\u5165\u4e0b\u4e00\u8f6e\u7684\u961f\u5217', + cmd_goal: '设置或查看持久目标', cmd_interrupt: '\u53d6\u6d88\u5f53\u524d\u56de\u5408\u5e76\u53d1\u9001\u65b0\u6d88\u606f', cmd_steer: '\u7528\u7ea0\u6b63\u4fe1\u606f\u5f15\u5bfc\u4ee3\u7406\uff08\u56de\u9000\u4e3a\u4e2d\u65ad\uff09', cmd_queue_no_msg: '\u7528\u6cd5\uff1a/queue <\u6d88\u606f>', @@ -6459,6 +6465,7 @@ const LOCALES = { never: '\u5f9e\u4e0d', no_active_session: '\u7121\u6d3b\u8e8d\u6703\u8a71', cmd_queue: '\u5c07\u8a0a\u606f\u52a0\u5165\u4e0b\u4e00\u8f2a\u7684\u4f47\u5217', + cmd_goal: '設定或查看持久目標', cmd_interrupt: '\u53d6\u6d88\u7576\u524d\u56de\u5408\u4e26\u767c\u9001\u65b0\u8a0a\u606f', cmd_steer: '\u5728\u56de\u5408\u9032\u884c\u4e2d\u6ce8\u5165\u7d3a\u6b63\uff0c\u4e0d\u4e2d\u65b7\u4ee3\u7406', cmd_queue_no_msg: '\u7528\u6cd5\uff1a/queue <\u8a0a\u606f>', @@ -6972,6 +6979,7 @@ const LOCALES = { theme_set: 'Tema: ', no_active_session: 'Nenhuma sessão ativa', cmd_queue: 'Enfileirar mensagem para o próximo turno', + cmd_goal: 'Definir ou consultar uma meta persistente', cmd_interrupt: 'Cancelar turno atual e enviar nova mensagem', cmd_steer: 'Injetar correção no meio do turno sem interromper', cmd_queue_no_msg: 'Uso: /queue ', @@ -7869,6 +7877,7 @@ const LOCALES = { theme_set: 'Theme: ', no_active_session: '활성 세션 없음', cmd_queue: 'Queue a message for the next turn', + cmd_goal: '지속 목표를 설정하거나 확인', cmd_interrupt: 'Cancel current turn and send a new message', cmd_steer: 'Inject a mid-turn correction without interrupting the agent', cmd_queue_no_msg: 'Usage: /queue ', diff --git a/static/messages.js b/static/messages.js index 95ca37d5..3c196937 100644 --- a/static/messages.js +++ b/static/messages.js @@ -63,7 +63,7 @@ async function send(){ // cmdSteer / cmdInterrupt say "No active task to stop." if(text.startsWith('/')){ const _pc=typeof parseCommand==='function'&&parseCommand(text); - if(_pc&&['steer','interrupt','queue','terminal'].includes(_pc.name)){ + if(_pc&&['steer','interrupt','queue','terminal','goal'].includes(_pc.name)){ const _bc=COMMANDS.find(c=>c.name===_pc.name); if(_bc){ $('msg').value='';autoResize(); @@ -336,6 +336,8 @@ function attachLiveStream(activeSid, streamId, uploaded=[], options={}){ let assistantText=''; let reasoningText=''; let liveReasoningText=''; + let _latestGoalStatus=null; + let _pendingGoalContinuation=null; let assistantRow=null; let assistantBody=null; let segmentStart=0; // char offset in assistantText where current segment begins @@ -879,6 +881,41 @@ function attachLiveStream(activeSid, streamId, uploaded=[], options={}){ }catch(_){} }); + source.addEventListener('goal',e=>{ + try{ + const d=JSON.parse(e.data||'{}'); + if((d.session_id||activeSid)!==activeSid) return; + const goalState=String(d.state||'').trim(); + const goalEvaluatingMessage='Evaluating goal progress…'; + if(goalState==='evaluating'){ + setComposerStatus(goalEvaluatingMessage); + return; + } + const msg=String(d.message||'').trim(); + if(!msg)return; + _latestGoalStatus={message:msg,decision:d.decision||null,state:goalState||null}; + setComposerStatus(msg); + showToast(msg.split('\n')[0],2600); + }catch(_){} + }); + + source.addEventListener('goal_continue',e=>{ + try{ + const d=JSON.parse(e.data||'{}'); + const sid=d.session_id||activeSid; + const continuation_prompt=String(d.continuation_prompt||d.text||'').trim(); + if(!continuation_prompt||sid!==activeSid)return; + _pendingGoalContinuation={ + sid, + text:continuation_prompt, + model:S.session&&S.session.model||'', + model_provider:S.session&&S.session.model_provider||null, + profile:S.activeProfile||'default', + }; + showToast('Continuing toward goal…',2200); + }catch(_){} + }); + source.addEventListener('done',e=>{ _terminalStateReached=true; if(_persistTimer){clearTimeout(_persistTimer);_persistTimer=null;} @@ -992,6 +1029,15 @@ function attachLiveStream(activeSid, streamId, uploaded=[], options={}){ const lastUser=[...S.messages].reverse().find(m=>m.role==='user'); if(lastUser)lastUser.attachments=uploaded; } + if(_latestGoalStatus&&_latestGoalStatus.message){ + S.messages.push({ + role:'assistant', + content:String(_latestGoalStatus.message), + _ts:Date.now()/1000, + _goalStatus:true, + _transient:true, + }); + } clearLiveToolCards(); S.busy=false; // No-reply guard (#373): if agent returned nothing, show inline error @@ -1003,6 +1049,18 @@ function attachLiveStream(activeSid, streamId, uploaded=[], options={}){ // TTS auto-read: speak the last assistant response if enabled (#499) if(typeof autoReadLastAssistant==='function') setTimeout(()=>autoReadLastAssistant(), 300); } + if(isActiveSession&&_pendingGoalContinuation&&typeof queueSessionMessage==='function'){ + const _goalNext=_pendingGoalContinuation; + _pendingGoalContinuation=null; + queueSessionMessage(_goalNext.sid,{ + text:_goalNext.text, + files:[], + model:_goalNext.model, + model_provider:_goalNext.model_provider, + profile:_goalNext.profile, + }); + if(typeof updateQueueBadge==='function')updateQueueBadge(_goalNext.sid); + } if(isActiveSession) _queueDrainSid=activeSid; renderSessionList(); _setActivePaneIdleIfOwner(); diff --git a/tests/test_goal_command_webui.py b/tests/test_goal_command_webui.py new file mode 100644 index 00000000..e1bca59b --- /dev/null +++ b/tests/test_goal_command_webui.py @@ -0,0 +1,272 @@ +"""Regression tests for first-class WebUI /goal command parity.""" + +import io +import json +from pathlib import Path +from types import SimpleNamespace + +import pytest + +REPO_ROOT = Path(__file__).resolve().parents[1] +COMMANDS_JS = (REPO_ROOT / "static" / "commands.js").read_text(encoding="utf-8") +MESSAGES_JS = (REPO_ROOT / "static" / "messages.js").read_text(encoding="utf-8") +ROUTES_PY = (REPO_ROOT / "api" / "routes.py").read_text(encoding="utf-8") +STREAMING_PY = (REPO_ROOT / "api" / "streaming.py").read_text(encoding="utf-8") + + +def test_goal_command_payload_matches_gateway_controls(monkeypatch): + """The backend command helper mirrors gateway /goal status/pause/resume/clear/set.""" + from api import goals as webui_goals + + calls = [] + + class FakeState: + goal = "ship the feature" + status = "active" + turns_used = 0 + max_turns = 20 + last_verdict = None + last_reason = None + paused_reason = None + + class FakeGoalManager: + def __init__(self, session_id, default_max_turns=20): + calls.append(("init", session_id, default_max_turns)) + self.state = None + + def status_line(self): + return "No active goal. Set one with /goal ." + + def pause(self, reason="user-paused"): + calls.append(("pause", reason)) + return FakeState() + + def resume(self, reset_budget=True): + calls.append(("resume", reset_budget)) + return FakeState() + + def has_goal(self): + return True + + def clear(self): + calls.append(("clear",)) + + def set(self, goal): + calls.append(("set", goal)) + state = FakeState() + state.goal = goal + self.state = state + return state + + monkeypatch.setattr(webui_goals, "GoalManager", FakeGoalManager) + monkeypatch.setattr(webui_goals, "_default_max_turns", lambda: 20) + + status = webui_goals.goal_command_payload("sid-123", "status") + pause = webui_goals.goal_command_payload("sid-123", "pause") + resume = webui_goals.goal_command_payload("sid-123", "resume") + clear = webui_goals.goal_command_payload("sid-123", "clear") + set_goal = webui_goals.goal_command_payload("sid-123", "ship the feature") + + assert status["message"] == "No active goal. Set one with /goal ." + assert pause["message"] == "⏸ Goal paused: ship the feature" + assert resume["message"].startswith("▶ Goal resumed: ship the feature") + assert clear["message"] == "Goal cleared." + assert set_goal["action"] == "set" + assert set_goal["kickoff_prompt"] == "ship the feature" + assert "⊙ Goal set (20-turn budget): ship the feature" in set_goal["message"] + assert ("set", "ship the feature") in calls + + +def test_goal_command_payload_rejects_new_goal_while_stream_running(monkeypatch): + """Status/control subcommands are safe mid-run; replacing the goal is not.""" + from api import goals as webui_goals + + class FakeGoalManager: + def __init__(self, session_id, default_max_turns=20): + pass + + def status_line(self): + return "⊙ Goal (active, 1/20 turns): existing" + + monkeypatch.setattr(webui_goals, "GoalManager", FakeGoalManager) + monkeypatch.setattr(webui_goals, "_default_max_turns", lambda: 20) + + status = webui_goals.goal_command_payload("sid-123", "status", stream_running=True) + rejected = webui_goals.goal_command_payload("sid-123", "replace it", stream_running=True) + + assert status["ok"] is True + assert rejected["ok"] is False + assert rejected["error"] == "agent_running" + assert "use /goal status / pause / clear mid-run" in rejected["message"] + + +def test_has_active_goal_reports_only_active_state(monkeypatch): + """Streaming can avoid showing an evaluating spinner when no standing goal is active.""" + from api import goals as webui_goals + + class FakeGoalManager: + def __init__(self, session_id, default_max_turns=20): + self.session_id = session_id + + def is_active(self): + return self.session_id == "sid-active-goal" + + monkeypatch.setattr(webui_goals, "GoalManager", FakeGoalManager) + monkeypatch.setattr(webui_goals, "_default_max_turns", lambda: 20) + + assert webui_goals.has_active_goal("sid-active-goal") is True + assert webui_goals.has_active_goal("sid-idle-goal") is False + assert webui_goals.has_active_goal("") is False + + +def test_goal_continuation_decision_emits_status_and_normal_user_prompt(monkeypatch): + """Post-turn hook returns the visible status event plus a normal continuation prompt.""" + from api import goals as webui_goals + + class FakeGoalManager: + def __init__(self, session_id, default_max_turns=20): + self.session_id = session_id + + def is_active(self): + return True + + def evaluate_after_turn(self, last_response, user_initiated=True): + return { + "status": "active", + "should_continue": True, + "continuation_prompt": "[Continuing toward your standing goal]\nGoal: ship it", + "verdict": "continue", + "reason": "one step remains", + "message": "↻ Continuing toward goal (1/20): one step remains", + } + + monkeypatch.setattr(webui_goals, "GoalManager", FakeGoalManager) + monkeypatch.setattr(webui_goals, "_default_max_turns", lambda: 20) + + decision = webui_goals.evaluate_goal_after_turn("sid-123", "not done yet", user_initiated=False) + + assert decision["message"].startswith("↻ Continuing toward goal") + assert decision["should_continue"] is True + assert decision["continuation_prompt"].startswith("[Continuing toward your standing goal]") + + +def test_goal_endpoint_sets_goal_and_starts_kickoff_stream(monkeypatch, tmp_path): + """POST /api/goal uses GoalManager state and launches the first goal turn.""" + from api import goals as webui_goals + from api import routes + + class FakeState: + goal = "ship the feature" + status = "active" + turns_used = 0 + max_turns = 20 + last_verdict = None + last_reason = None + paused_reason = None + + class FakeGoalManager: + def __init__(self, session_id, default_max_turns=20): + self.session_id = session_id + self.default_max_turns = default_max_turns + + def set(self, goal): + state = FakeState() + state.goal = goal + return state + + class FakeSession: + session_id = "sid-goal-route" + profile = "default" + workspace = str(tmp_path) + model = "gpt-5.5" + model_provider = "openai-codex" + messages = [] + context_messages = [] + pending_user_message = None + active_stream_id = None + + monkeypatch.setattr(webui_goals, "GoalManager", FakeGoalManager) + monkeypatch.setattr(routes, "get_session", lambda sid: FakeSession()) + monkeypatch.setattr(routes, "resolve_trusted_workspace", lambda workspace: tmp_path) + monkeypatch.setattr( + routes, + "_resolve_compatible_session_model_state", + lambda model, provider: (model, provider, False), + ) + started = [] + + def fake_start(session, **kwargs): + started.append(kwargs) + return {"stream_id": "goal-stream", "session_id": session.session_id, "pending_started_at": 123.0} + + monkeypatch.setattr(routes, "_start_chat_stream_for_session", fake_start) + monkeypatch.setattr(routes, "j", lambda handler, payload, status=200, **kwargs: {"status": status, "payload": payload}) + + result = routes._handle_goal_command( + object(), + { + "session_id": "sid-goal-route", + "args": "ship the feature", + "workspace": str(tmp_path), + "model": "gpt-5.5", + "model_provider": "openai-codex", + }, + ) + + assert result["status"] == 200 + assert result["payload"]["action"] == "set" + assert result["payload"]["stream_id"] == "goal-stream" + assert started and started[0]["msg"] == "ship the feature" + assert started[0]["model_provider"] == "openai-codex" + + +def test_routes_register_goal_endpoint_and_kickoff_stream(): + assert 'if parsed.path == "/api/goal"' in ROUTES_PY + assert "return _handle_goal_command(handler, body)" in ROUTES_PY + assert "goal_command_payload" in ROUTES_PY + assert "kickoff_prompt" in ROUTES_PY + assert "_start_chat_stream_for_session" in ROUTES_PY + + +def test_streaming_post_turn_goal_hook_surfaces_and_continues(): + assert "evaluate_goal_after_turn" in STREAMING_PY + assert "put('goal'" in STREAMING_PY + assert "decision.get('should_continue')" in STREAMING_PY + assert "continuation_prompt" in STREAMING_PY + assert "put('goal_continue'" in STREAMING_PY + goal_idx = STREAMING_PY.find("evaluate_goal_after_turn") + done_idx = STREAMING_PY.find("put('done'", goal_idx) + assert goal_idx != -1 and done_idx != -1 + assert goal_idx < done_idx, "goal status should be emitted before the terminal done payload" + + +def test_streaming_goal_hook_emits_evaluating_state_before_judge(): + evaluating_idx = STREAMING_PY.find("'state': 'evaluating'") + judge_idx = STREAMING_PY.find("_goal_decision = evaluate_goal_after_turn") + done_idx = STREAMING_PY.find("put('done'", judge_idx) + assert evaluating_idx != -1, "goal hook should emit an evaluating state before judge round-trip" + assert judge_idx != -1 and done_idx != -1 + assert evaluating_idx < judge_idx < done_idx + assert "Evaluating goal progress…" in STREAMING_PY + assert "'state': 'continuing' if decision.get('should_continue') else 'idle'" in STREAMING_PY + + +def test_frontend_has_goal_slash_command_and_status_event_handler(): + assert "{name:'goal'" in COMMANDS_JS + assert "subArgs:['status','pause','resume','clear']" in COMMANDS_JS + assert "function cmdGoal" in COMMANDS_JS + assert "api('/api/goal'" in COMMANDS_JS + assert "stream_id" in COMMANDS_JS + assert "goal'" in MESSAGES_JS + assert "source.addEventListener('goal'" in MESSAGES_JS + assert "source.addEventListener('goal_continue'" in MESSAGES_JS + assert "['steer','interrupt','queue','terminal','goal'].includes(_pc.name)" in MESSAGES_JS + assert "queueSessionMessage" in MESSAGES_JS + + +def test_frontend_goal_evaluating_state_uses_calm_composer_indicator(): + assert "const goalState=String(d.state||'').trim();" in MESSAGES_JS + assert "const goalEvaluatingMessage='Evaluating goal progress…';" in MESSAGES_JS + assert "if(goalState==='evaluating')" in MESSAGES_JS + assert "setComposerStatus(goalEvaluatingMessage);" in MESSAGES_JS + assert "return;" in MESSAGES_JS diff --git a/tests/test_issue1857_usage_overwrite.py b/tests/test_issue1857_usage_overwrite.py index 5e392b56..948fd332 100644 --- a/tests/test_issue1857_usage_overwrite.py +++ b/tests/test_issue1857_usage_overwrite.py @@ -3,6 +3,9 @@ import sys import types from unittest import mock +# Sentinel for sys.modules save/restore — distinguishes "key wasn't there" from None. +_MISSING = object() + def test_stream_completion_overwrites_session_usage_with_latest_turn(cleanup_test_sessions): """#1857: completed turns must not add prompt tokens to stale session totals.""" @@ -125,27 +128,40 @@ def test_stream_completion_overwrites_session_usage_with_latest_turn(cleanup_tes fake_hermes_state = types.ModuleType("hermes_state") fake_hermes_state.SessionDB = mock.Mock(return_value=None) - with mock.patch.object(streaming, "get_session", return_value=fake_session), \ - mock.patch.object(streaming, "_get_ai_agent", return_value=UsageAgent), \ - mock.patch.object(streaming, "resolve_model_provider", return_value=("gpt-5.4", "openai", None)), \ - mock.patch("api.config.get_config", return_value={}), \ - mock.patch("api.config._resolve_cli_toolsets", return_value=[]), \ - mock.patch.dict( - sys.modules, - { - "hermes_cli": fake_hermes_cli, - "hermes_cli.runtime_provider": fake_runtime_module, - "hermes_state": fake_hermes_state, - }, - ): - streaming.STREAMS[fake_stream_id] = fake_queue - streaming._run_agent_streaming( - session_id=fake_session.session_id, - msg_text="new turn", - model="gpt-5.4", - workspace="/tmp", - stream_id=fake_stream_id, - ) + # NOTE: We deliberately avoid mock.patch.dict(sys.modules, ...) here. + # patch.dict tracks original keys at __enter__ and on __exit__ DELETES any + # keys added during the patch that weren't in the original snapshot. That + # silently evicts lazily-imported submodules (e.g. pydantic.root_model) + # that other tests rely on, producing KeyError: 'pydantic.root_model' in + # downstream tests (notably tests/test_mcp_server.py via fastmcp imports). + # Manual save/restore only touches the three keys we explicitly inject. + _injected = { + "hermes_cli": fake_hermes_cli, + "hermes_cli.runtime_provider": fake_runtime_module, + "hermes_state": fake_hermes_state, + } + _saved = {k: sys.modules.get(k, _MISSING) for k in _injected} + sys.modules.update(_injected) + try: + with mock.patch.object(streaming, "get_session", return_value=fake_session), \ + mock.patch.object(streaming, "_get_ai_agent", return_value=UsageAgent), \ + mock.patch.object(streaming, "resolve_model_provider", return_value=("gpt-5.4", "openai", None)), \ + mock.patch("api.config.get_config", return_value={}), \ + mock.patch("api.config._resolve_cli_toolsets", return_value=[]): + streaming.STREAMS[fake_stream_id] = fake_queue + streaming._run_agent_streaming( + session_id=fake_session.session_id, + msg_text="new turn", + model="gpt-5.4", + workspace="/tmp", + stream_id=fake_stream_id, + ) + finally: + for k, prev in _saved.items(): + if prev is _MISSING: + sys.modules.pop(k, None) + else: + sys.modules[k] = prev assert fake_session.input_tokens == 123 assert fake_session.output_tokens == 45 diff --git a/tests/test_mcp_server.py b/tests/test_mcp_server.py new file mode 100644 index 00000000..98e2a0b2 --- /dev/null +++ b/tests/test_mcp_server.py @@ -0,0 +1,925 @@ +"""Tests for mcp_server.py — Option A rewrite (Issue #1616). + +Covers: project CRUD, profile scoping, title collision, color validation, +session listing, cross-profile isolation. + +Uses HERMES_WEBUI_STATE_DIR env var to point to a temp directory, +so tests don't touch the real webui state. Module is re-imported +per test class to ensure clean state. +""" + +import json +import os +import sys +import tempfile +import uuid +from pathlib import Path + +import pytest + +# Skip the entire module when the optional `mcp` package isn't installed. +# CI runs with stdlib-only deps (pyyaml + pytest + pytest-timeout), and the +# `mcp` package is only required for users who actually run the MCP server. +# Locally with `pip install mcp pytest-asyncio` these tests run; on CI they +# skip cleanly without breaking the matrix. +pytest.importorskip("mcp", reason="mcp package not installed (optional MCP server dep)") + +# pytest-asyncio is also optional but always installed alongside mcp tests +# in our local runs. If absent, importorskip the asyncio plugin gracefully. +pytest.importorskip("pytest_asyncio", reason="pytest-asyncio required for MCP server tests") + +pytestmark = pytest.mark.asyncio + +# ── Ensure repo root on path ────────────────────────────────────────────── +_REPO = Path(__file__).parent.parent.resolve() +if str(_REPO) not in sys.path: + sys.path.insert(0, str(_REPO)) + + +# ═══════════════════════════════════════════════════════════════════════════ +# State-restore bookkeeping +# ═══════════════════════════════════════════════════════════════════════════ +# +# These tests mutate module-level constants on api.config / mcp_server / +# api.models (STATE_DIR, SESSION_DIR, PROJECTS_FILE, …) so the MCP server +# reads from a tmpdir. Without restoration, downstream tests in the full +# suite (test_pytest_state_isolation, test_provider_quota_status, +# test_provider_management, etc.) read the now-deleted tmpdir from +# api.config.STATE_DIR and fail. +# +# We snapshot the original values on first _reimport_mcp() call and restore +# them in _cleanup_state_dir() so the post-test module state matches pre-test. + +_MISSING_ENV = object() +_SAVED_CONSTANTS = {"captured": False} + + +# ═══════════════════════════════════════════════════════════════════════════ +# Helpers +# ═══════════════════════════════════════════════════════════════════════════ + +def _fresh_state_dir(): + """Create a clean temp state dir and set HERMES_WEBUI_STATE_DIR.""" + td = tempfile.mkdtemp() + state_dir = Path(td) + sessions_dir = state_dir / "sessions" + sessions_dir.mkdir(parents=True) + (state_dir / "projects.json").write_text("[]", encoding="utf-8") + (sessions_dir / "_index.json").write_text("[]", encoding="utf-8") + os.environ["HERMES_WEBUI_STATE_DIR"] = str(state_dir) + return state_dir + + + +def _cleanup_state_dir(state_dir: Path): + """Remove temp state dir, clear env var, and restore api.config/mcp_server + module constants to whatever they were before the fixture started. + + Without restoring, subsequent tests (test_pytest_state_isolation, + test_provider_quota_status, test_provider_management, etc.) read the + fixture's tmpdir from `api.config.STATE_DIR` and fail because the path + no longer exists or doesn't match their pytest-managed state dir.""" + import shutil + shutil.rmtree(state_dir, ignore_errors=True) + os.environ.pop("HERMES_WEBUI_STATE_DIR", None) + + # Restore api.config / mcp_server / api.models module constants. + saved = _SAVED_CONSTANTS + if saved.get("captured"): + import api.config as _cfg + for attr, val in saved["api.config"].items(): + setattr(_cfg, attr, val) + if "mcp_server" in sys.modules: + mcp_mod = sys.modules["mcp_server"] + for attr, val in saved["mcp_server"].items(): + setattr(mcp_mod, attr, val) + if "api.models" in sys.modules: + models_mod = sys.modules["api.models"] + for attr, val in saved["api.models"].items(): + setattr(models_mod, attr, val) + # Restore HERMES_BASE_HOME / HERMES_HOME if we changed them + for env_key, env_val in saved["env"].items(): + if env_val is _MISSING_ENV: + os.environ.pop(env_key, None) + else: + os.environ[env_key] = env_val + +def _reimport_mcp(): + """Re-point mcp_server's module-level STATE_DIR / SESSION_DIR / + SESSION_INDEX_FILE / PROJECTS_FILE constants at the current + HERMES_WEBUI_STATE_DIR. + + Returns (mcp_module, profiles_module) — profiles_module is the + live api.profiles reference. + + NOTE: Does NOT use `del sys.modules[...]` or `importlib.reload(...)`. + Both patterns trigger a chain re-import inside the FastMCP / pydantic + stack that corrupts pydantic's `_generics._GENERIC_TYPES_CACHE` + (manifests as `KeyError: 'pydantic.root_model'` in unrelated + downstream tests in the full suite). Instead, we mutate the + constants in-place after the first one-time import, which is + behaviorally equivalent for these tests since the constants are + module-level Path objects used only to compute STATE_DIR-rooted + paths at call time. + + Also normalizes HERMES_BASE_HOME / HERMES_HOME to point at a + directory whose `profiles/` subdirectory we control. This isolates + us from sibling test files (e.g. test_profile_path_security.py) + that mutate those env vars during their own setup and don't restore + them in the strict sense the active-profile path resolution needs. + """ + state_dir = Path(os.environ['HERMES_WEBUI_STATE_DIR']) + + # Sibling test files (e.g. test_profile_path_security.py) mutate + # HERMES_BASE_HOME / HERMES_HOME but only restore sys.modules — the + # env vars stay pointing at their tmpdir, which then breaks our + # active-profile path resolution. Re-anchor at a local home dir + # under our state_dir so other-profile scoping works. + isolated_home = state_dir.parent / "hermes-home" + (isolated_home / "profiles").mkdir(parents=True, exist_ok=True) + + # Snapshot env vars BEFORE we overwrite them, so _cleanup_state_dir + # can restore them at fixture exit. + if not _SAVED_CONSTANTS.get("captured"): + _SAVED_CONSTANTS["env"] = { + "HERMES_BASE_HOME": os.environ.get("HERMES_BASE_HOME", _MISSING_ENV), + "HERMES_HOME": os.environ.get("HERMES_HOME", _MISSING_ENV), + } + + os.environ["HERMES_BASE_HOME"] = str(isolated_home) + os.environ["HERMES_HOME"] = str(isolated_home) + + import api.config as cfg + import mcp_server as mod + + # First-time snapshot of module constants — captured AFTER the imports + # land their original values but BEFORE we mutate them below. + if not _SAVED_CONSTANTS.get("captured"): + _SAVED_CONSTANTS["api.config"] = { + attr: getattr(cfg, attr) + for attr in ("STATE_DIR", "SESSION_DIR", "WORKSPACES_FILE", + "SETTINGS_FILE", "LAST_WORKSPACE_FILE", "PROJECTS_FILE", + "SESSION_INDEX_FILE") + if hasattr(cfg, attr) + } + _SAVED_CONSTANTS["mcp_server"] = { + attr: getattr(mod, attr) + for attr in ("STATE_DIR", "SESSION_DIR", "PROJECTS_FILE", + "SESSION_INDEX_FILE", "WEBUI_HOST", "WEBUI_PORT", + "WEBUI_URL") + if hasattr(mod, attr) + } + if "api.models" in sys.modules: + models_mod = sys.modules["api.models"] + _SAVED_CONSTANTS["api.models"] = { + attr: getattr(models_mod, attr) + for attr in ("STATE_DIR", "PROJECTS_FILE", "SESSION_DIR") + if hasattr(models_mod, attr) + } + else: + _SAVED_CONSTANTS["api.models"] = {} + _SAVED_CONSTANTS["captured"] = True + + # Acquire the api.profiles module THAT mcp_server's bound functions read. + # Sibling tests (test_profile_path_security.py) deletes api.profiles from + # sys.modules during their setup, then restores the originally-saved + # module reference. The result is that `import api.profiles` returns + # whatever module is currently in sys.modules, which may NOT be the same + # object as `mcp_server.get_active_profile_name`'s closure reference. + # We need to mutate the closure-bound module so mcp_server sees our + # _active_profile assignment. + import api.profiles as fresh_profiles_via_import + # mcp_server.get_active_profile_name is bound at first-import time and + # reads `_active_profile` from its own module's globals via closure. + # That module is the function's __globals__["__name__"] entry in + # sys.modules at first-import time. The most reliable way to find it + # is to follow the bound function back to its module. + bound_get_active = mod.get_active_profile_name + bound_module_name = bound_get_active.__module__ + # Grab whatever Python currently has registered for that name; it may + # or may not be the same object as fresh_profiles_via_import. + # Use the function's __globals__ directly — that's the actual closure + # the function uses for its module-level reads. + bound_globals = bound_get_active.__globals__ + # bound_globals IS the dict from sys.modules[].__dict__ at + # first-import time. Mutating it propagates to all bound functions. + fresh_profiles = sys.modules.get(bound_module_name) + if fresh_profiles is None or fresh_profiles.__dict__ is not bound_globals: + # Sibling tests left a different module in sys.modules. The bound + # functions still use the original globals dict, so we expose a + # ModuleType-like proxy that writes to the original dict. + class _ProxyModule: + def __init__(self, globs): + self.__dict__ = globs + fresh_profiles = _ProxyModule(bound_globals) + + # Re-point api.config module-level constants + cfg.STATE_DIR = state_dir + cfg.SESSION_DIR = state_dir / "sessions" + cfg.WORKSPACES_FILE = state_dir / "workspaces.json" + cfg.SETTINGS_FILE = state_dir / "settings.json" + cfg.LAST_WORKSPACE_FILE = state_dir / "last_workspace.txt" + cfg.PROJECTS_FILE = state_dir / "projects.json" + if hasattr(cfg, 'SESSION_INDEX_FILE'): + cfg.SESSION_INDEX_FILE = state_dir / "sessions" / "_index.json" + + # Re-point mcp_server's imported aliases (they were copied at first + # import and don't pick up cfg mutations automatically). + mod.STATE_DIR = cfg.STATE_DIR + mod.SESSION_DIR = cfg.SESSION_DIR + mod.PROJECTS_FILE = cfg.PROJECTS_FILE + if hasattr(mod, 'SESSION_INDEX_FILE'): + mod.SESSION_INDEX_FILE = cfg.SESSION_INDEX_FILE + + # api.models also imports STATE_DIR / PROJECTS_FILE etc. as module + # constants — re-point those too so load_projects() / save_projects() + # see the fresh STATE_DIR. + if 'api.models' in sys.modules: + models_mod = sys.modules['api.models'] + if hasattr(models_mod, 'STATE_DIR'): + models_mod.STATE_DIR = cfg.STATE_DIR + if hasattr(models_mod, 'PROJECTS_FILE'): + models_mod.PROJECTS_FILE = cfg.PROJECTS_FILE + if hasattr(models_mod, 'SESSION_DIR'): + models_mod.SESSION_DIR = cfg.SESSION_DIR + + # Re-evaluate WEBUI_URL from current env (PR #1895 made it env-aware + # but the value is computed once at module load; tests need to see + # current env state). + mod.WEBUI_HOST = os.environ.get("HERMES_WEBUI_HOST", "127.0.0.1") + mod.WEBUI_PORT = os.environ.get("HERMES_WEBUI_PORT", "8787") + mod.WEBUI_URL = f"http://{mod.WEBUI_HOST}:{mod.WEBUI_PORT}" + + fresh_profiles._active_profile = 'default' + + # Invalidate the root-profile cache (set at module load to detect + # renamed-root profiles, but stale after sibling tests that called + # switch_profile / list_profiles_api in their own setup). + if hasattr(fresh_profiles, '_invalidate_root_profile_cache'): + fresh_profiles._invalidate_root_profile_cache() + elif hasattr(fresh_profiles, '_root_profile_name_cache'): + fresh_profiles._root_profile_name_cache.clear() + fresh_profiles._root_profile_name_cache.add('default') + if hasattr(fresh_profiles, '_root_profile_name_cache_loaded'): + fresh_profiles._root_profile_name_cache_loaded = False + return mod, fresh_profiles + + +async def _call(mod, tool_name, **kwargs): + """Call a tool handler and return parsed JSON.""" + handler = mod.HANDLERS[tool_name] + result = await handler(kwargs) + return json.loads(result[0].text) + + +# ═══════════════════════════════════════════════════════════════════════════ +# Project CRUD +# ═══════════════════════════════════════════════════════════════════════════ + +class TestCreateProject: + @pytest.fixture(autouse=True) + def setup(self): + self.state_dir = _fresh_state_dir() + self.mod, self.profiles = _reimport_mcp() + yield + _cleanup_state_dir(self.state_dir) + + async def test_create_basic(self): + result = await _call(self.mod, "create_project", name="Test Project") + assert "project_id" in result + assert result["name"] == "Test Project" + assert result["profile"] == "default" + assert result["session_count"] == 0 + + async def test_create_with_color(self): + result = await _call(self.mod, "create_project", + name="Colored", color="#ff6600") + assert result["color"] == "#ff6600" + + async def test_create_duplicate_exact_match(self): + await _call(self.mod, "create_project", name="My Project") + result = await _call(self.mod, "create_project", name="My Project") + assert "error" in result + assert "already exists" in result["error"] + + async def test_create_case_sensitive_no_collision(self): + """Exact match: 'MY project' and 'My Project' are different.""" + await _call(self.mod, "create_project", name="My Project") + result = await _call(self.mod, "create_project", name="MY project") + assert "project_id" in result + + async def test_create_empty_name(self): + result = await _call(self.mod, "create_project", name="") + assert "error" in result + + async def test_create_invalid_color(self): + result = await _call(self.mod, "create_project", + name="Bad", color="not-a-color") + assert "error" in result + assert "Invalid color" in result["error"] + + async def test_create_valid_color_formats(self): + for color in ["#fff", "#ff6600", "#ff6600aa"]: + result = await _call(self.mod, "create_project", + name=f"Color-{color}", color=color) + assert result["color"] == color + + +class TestRenameProject: + @pytest.fixture(autouse=True) + def setup(self): + self.state_dir = _fresh_state_dir() + self.mod, self.profiles = _reimport_mcp() + yield + _cleanup_state_dir(self.state_dir) + + async def test_rename_basic(self): + created = await _call(self.mod, "create_project", name="Old") + pid = created["project_id"] + result = await _call(self.mod, "rename_project", + project_id=pid, name="New") + assert result["name"] == "New" + assert result["project_id"] == pid + + async def test_rename_with_color(self): + created = await _call(self.mod, "create_project", name="X") + result = await _call(self.mod, "rename_project", + project_id=created["project_id"], + name="X", color="#000") + assert result["color"] == "#000" + + async def test_rename_not_found(self): + result = await _call(self.mod, "rename_project", + project_id="nonexistent", name="Nope") + assert "error" in result + + async def test_rename_wrong_profile(self): + created = await _call(self.mod, "create_project", name="DefaultOwned") + pid = created["project_id"] + self.profiles._active_profile = 'other' + result = await _call(self.mod, "rename_project", + project_id=pid, name="Stolen") + assert "error" in result + assert "not found" in result["error"].lower() + + +class TestDeleteProject: + @pytest.fixture(autouse=True) + def setup(self): + self.state_dir = _fresh_state_dir() + self.mod, self.profiles = _reimport_mcp() + yield + _cleanup_state_dir(self.state_dir) + + async def test_delete_basic(self): + created = await _call(self.mod, "create_project", name="ToDelete") + pid = created["project_id"] + result = await _call(self.mod, "delete_project", project_id=pid) + assert result["ok"] is True + assert result["deleted"] == "ToDelete" + + async def test_delete_not_found(self): + result = await _call(self.mod, "delete_project", + project_id="nonexistent") + assert "error" in result + + async def test_delete_wrong_profile(self): + created = await _call(self.mod, "create_project", name="Owned") + pid = created["project_id"] + self.profiles._active_profile = 'other' + result = await _call(self.mod, "delete_project", project_id=pid) + assert "error" in result + + async def test_delete_no_auth_refuses_unassign(self): + """Without HERMES_WEBUI_PASSWORD, delete_project must NOT touch + session JSONs. Direct FS writes would bypass _write_session_index() + and leave _index.json holding the stale project_id, causing a + running WebUI to keep grouping sessions under the deleted project. + + The handler should: delete the project from projects.json, leave + every session JSON untouched, leave the index untouched, and + surface a `warning` field telling the operator to set the env var. + """ + from api.config import SESSION_DIR, SESSION_INDEX_FILE + os.environ.pop("HERMES_WEBUI_PASSWORD", None) + + # Create project + a session JSON that points at it + created = await _call(self.mod, "create_project", name="ToDelete") + pid = created["project_id"] + sid = "test_sess_001" + session_path = SESSION_DIR / f"{sid}.json" + session_payload = { + "session_id": sid, + "title": "T", + "project_id": pid, + "messages": [], + } + session_path.write_text(json.dumps(session_payload), encoding="utf-8") + # Index references the session under the project + SESSION_INDEX_FILE.write_text( + json.dumps([{"session_id": sid, "project_id": pid, "title": "T"}]), + encoding="utf-8") + index_before = SESSION_INDEX_FILE.read_text(encoding="utf-8") + session_before = session_path.read_text(encoding="utf-8") + + result = await _call(self.mod, "delete_project", project_id=pid) + + assert result["ok"] is True + assert result["unassigned_sessions"] == 0 + assert "warning" in result + assert "HERMES_WEBUI_PASSWORD" in result["warning"] + # Session JSON untouched + assert session_path.read_text(encoding="utf-8") == session_before + # Index untouched + assert SESSION_INDEX_FILE.read_text(encoding="utf-8") == index_before + + +# ═══════════════════════════════════════════════════════════════════════════ +# Profile Scoping +# ═══════════════════════════════════════════════════════════════════════════ + +class TestProfileScoping: + @pytest.fixture(autouse=True) + def setup(self): + self.state_dir = _fresh_state_dir() + self.mod, self.profiles = _reimport_mcp() + yield + _cleanup_state_dir(self.state_dir) + + async def test_projects_tagged_with_profile(self): + result = await _call(self.mod, "create_project", name="Tagged") + assert result["profile"] == "default" + + async def test_list_projects_respects_profile(self): + # Create under default + await _call(self.mod, "create_project", name="DefaultProject") + + # Switch to other + self.profiles._active_profile = 'other' + await _call(self.mod, "create_project", name="OtherProject") + + # List should only show current profile's projects + projects = await _call(self.mod, "list_projects") + names = [p["name"] for p in projects] + assert "OtherProject" in names + assert "DefaultProject" not in names + + # Switch back + self.profiles._active_profile = 'default' + projects = await _call(self.mod, "list_projects") + names = [p["name"] for p in projects] + assert "DefaultProject" in names + assert "OtherProject" not in names + + async def test_cross_profile_isolation_create(self): + """Same name in different profiles should be allowed.""" + await _call(self.mod, "create_project", name="Shared") + self.profiles._active_profile = 'other' + result = await _call(self.mod, "create_project", name="Shared") + assert "project_id" in result + + async def test_legacy_untagged_hidden_from_non_root_profile(self): + """Untagged projects (no `profile` field) belong to the root profile. + + Mirrors api/routes.py:_profiles_match where a missing profile coerces + to 'default'. A non-root profile must NOT see legacy untagged rows. + """ + # Manually write a legacy untagged project (pre-#1614 schema) + import api.config as _cfg_mod + PROJECTS_FILE = _cfg_mod.PROJECTS_FILE + legacy = [{ + "project_id": "legacy000001", + "name": "LegacyUntagged", + "color": None, + "created_at": 1700000000.0, + # No "profile" field on purpose + }] + PROJECTS_FILE.write_text(json.dumps(legacy), encoding="utf-8") + + # Non-root profile must NOT see it + self.profiles._active_profile = 'other' + projects = await _call(self.mod, "list_projects") + names = [p["name"] for p in projects] + assert "LegacyUntagged" not in names + + # Root profile still sees it (load_projects backfills `profile` + # to 'default', so visibility is preserved for the root). + self.profiles._active_profile = 'default' + projects = await _call(self.mod, "list_projects") + names = [p["name"] for p in projects] + assert "LegacyUntagged" in names + + async def test_legacy_untagged_rename_blocked_from_non_root(self): + """Non-root profile cannot rename a legacy untagged project.""" + import api.config as _cfg_mod + PROJECTS_FILE = _cfg_mod.PROJECTS_FILE + legacy = [{ + "project_id": "legacy000002", + "name": "Legacy", + "color": None, + "created_at": 1700000000.0, + }] + PROJECTS_FILE.write_text(json.dumps(legacy), encoding="utf-8") + self.profiles._active_profile = 'other' + result = await _call(self.mod, "rename_project", + project_id="legacy000002", name="Stolen") + assert "error" in result + + +# ═══════════════════════════════════════════════════════════════════════════ +# Session listing +# ═══════════════════════════════════════════════════════════════════════════ + +class TestListSessions: + @pytest.fixture(autouse=True) + def setup(self): + self.state_dir = _fresh_state_dir() + self.mod, self.profiles = _reimport_mcp() + yield + _cleanup_state_dir(self.state_dir) + + async def test_list_empty(self): + result = await _call(self.mod, "list_sessions") + assert result == [] + + async def test_list_with_limit(self): + result = await _call(self.mod, "list_sessions", limit=10) + assert isinstance(result, list) + + async def test_list_unassigned(self): + result = await _call(self.mod, "list_sessions", unassigned=True) + assert isinstance(result, list) + + +# ═══════════════════════════════════════════════════════════════════════════ +# Session mutations (HTTP API — basic validation only) +# ═══════════════════════════════════════════════════════════════════════════ + +class TestSessionMutations: + @pytest.fixture(autouse=True) + def setup(self): + self.state_dir = _fresh_state_dir() + self.mod, self.profiles = _reimport_mcp() + yield + _cleanup_state_dir(self.state_dir) + + async def test_rename_missing_args(self): + result = await _call(self.mod, "rename_session", + session_id="", title="") + assert "error" in result + + async def test_move_missing_args(self): + result = await _call(self.mod, "move_session", + session_id="", project_id="x") + assert "error" in result + + async def test_move_project_not_found(self): + result = await _call(self.mod, "move_session", + session_id="s1", project_id="nonexistent") + assert "error" in result + + async def test_move_target_owned_by_other_profile_rejected(self): + """A project owned by profile A is invisible to profile B (#1614).""" + created = await _call(self.mod, "create_project", name="ATarget") + pid = created["project_id"] + self.profiles._active_profile = 'other' + result = await _call(self.mod, "move_session", + session_id="any", project_id=pid) + assert "error" in result + assert "not found" in result["error"].lower() + + +# ═══════════════════════════════════════════════════════════════════════════ +# Auth helper +# ═══════════════════════════════════════════════════════════════════════════ + +class TestApiPassword: + @pytest.fixture(autouse=True) + def setup(self): + self.state_dir = _fresh_state_dir() + # Ensure env var is unset for the test + os.environ.pop("HERMES_WEBUI_PASSWORD", None) + self.mod, self.profiles = _reimport_mcp() + yield + _cleanup_state_dir(self.state_dir) + + async def test_no_env_no_settings_returns_none(self): + assert self.mod._api_password() is None + + async def test_password_hash_in_settings_is_ignored(self): + """settings.json holds a hash, not a plaintext password — must NOT + be returned as if it were a usable password.""" + from api.config import STATE_DIR as _SD + (_SD / "settings.json").write_text( + json.dumps({"password_hash": "$2b$12$abcdefghijk"}), + encoding="utf-8") + assert self.mod._api_password() is None + + async def test_env_var_returned(self): + os.environ["HERMES_WEBUI_PASSWORD"] = "secret123" + try: + assert self.mod._api_password() == "secret123" + finally: + os.environ.pop("HERMES_WEBUI_PASSWORD", None) + + +# ═══════════════════════════════════════════════════════════════════════════ +# _profiles_match parity (mcp_server vs api.routes vs api.profiles) +# ═══════════════════════════════════════════════════════════════════════════ +# +# Locks the canonical-helper relocation: mcp_server.py and api/routes.py both +# now import _profiles_match from api/profiles.py. If anyone re-introduces a +# local copy in either module, both the identity check and the input-matrix +# parametrize trip immediately. + +async def test_profiles_match_single_source_of_truth(): + """All three module names resolve to the same canonical object. + + This locks the relocation: mcp_server.py and api/routes.py both import + _profiles_match from api/profiles.py rather than carrying a local copy. + Re-introducing a local definition in either module trips this test + immediately. + + Imported here in a clean module-import context (not via _reimport_mcp, + which would re-execute api/profiles.py and produce a distinct function + object that's behaviorally identical but fails the `is` check). + + NOTE: We swap-in fresh modules but RESTORE the originals at exit so + sibling test files (test_provider_quota_status etc.) that imported + api.profiles at module-load time continue to see the same object + they already have monkeypatch handles into. Otherwise their + `monkeypatch.setattr(profiles, ...)` patches the wrong module object. + """ + # Snapshot the originals; we'll put them back at the end. + saved_modules = { + k: sys.modules[k] + for k in ('mcp_server', 'api.routes', 'api.profiles') + if k in sys.modules + } + # Also snapshot the attributes on the parent `api` package, because + # `import api.routes as r` resolves via `sys.modules['api'].routes`, + # NOT directly via sys.modules['api.routes']. If we don't restore + # the parent attribute, subsequent `import api.routes as r` calls + # bind to the fresh re-imported module even though sys.modules + # holds the original. + import api as _api_parent + saved_api_attrs = {} + for sub in ('routes', 'profiles'): + if hasattr(_api_parent, sub): + saved_api_attrs[sub] = getattr(_api_parent, sub) + + for k in ('mcp_server', 'api.routes', 'api.profiles'): + sys.modules.pop(k, None) + try: + import api.profiles as _profiles_mod + import api.routes as _routes_mod + import mcp_server as _mcp_mod + canonical = _profiles_mod._profiles_match + assert _routes_mod._profiles_match is canonical + assert _mcp_mod._profiles_match is canonical + finally: + # Restore so monkeypatch handles in sibling tests target the right module. + for k in ('mcp_server', 'api.routes', 'api.profiles'): + sys.modules.pop(k, None) + sys.modules.update(saved_modules) + # Restore parent-package attributes too (see above for why). + for sub, mod_obj in saved_api_attrs.items(): + setattr(_api_parent, sub, mod_obj) + + +@pytest.mark.parametrize("a, b", [ + (None, None), + (None, ''), + ('', None), + ('', ''), + (None, 'default'), + ('default', None), + ('default', 'default'), + ('foo', 'foo'), + ('foo', 'bar'), + ('foo', None), + (None, 'foo'), + ('default', 'foo'), + ('foo', 'default'), +]) +async def test_profiles_match_input_matrix(a, b): + """mcp_server._profiles_match agrees with api.routes._profiles_match + on every (row, active) pair across the visibility matrix. + + Note: function-object identity is checked separately in + test_profiles_match_single_source_of_truth — here we only assert + behavioral parity, which is robust to test-fixture re-imports that + clear and re-execute api.profiles.""" + from mcp_server import _profiles_match as mcp_match + from api.routes import _profiles_match as routes_match + assert mcp_match(a, b) == routes_match(a, b) + + +# ═══════════════════════════════════════════════════════════════════════════ +# --profile CLI ordering regression +# ═══════════════════════════════════════════════════════════════════════════ +# +# Maintainer ask: verify that --profile is applied to _active_profile *before* +# any api.models / api.profiles consumer reads the active profile. The risk +# is that if the canonical helpers cached the profile on first read at import +# time, a --profile foo flag passed at startup would bind too late. +# +# Today the helpers read _active_profile lazily (api/profiles.py:173 reads +# the module global at every call) so the override is safe. This test locks +# the behaviour: setting _active_profile = 'foo' before the first list call +# produces results filtered to 'foo', not the default. + +class TestProfileCliOrdering: + @pytest.fixture(autouse=True) + def setup(self): + self.state_dir = _fresh_state_dir() + self.mod, self.profiles = _reimport_mcp() + yield + _cleanup_state_dir(self.state_dir) + + async def test_active_profile_override_takes_effect_before_first_read(self): + """--profile foo must filter list_projects to foo's rows immediately. + + Simulates the CLI override path (mcp_server.py:62-64 sets + _profiles._active_profile = _profile_arg right after import). If a + helper had latched the profile at import time, the override here + would be too late and the test would see 'default'-tagged rows.""" + import api.config as _cfg_mod + PROJECTS_FILE = _cfg_mod.PROJECTS_FILE + # Pre-seed two projects: one for default, one for foo. + seeded = [ + {"project_id": "p_default_0001", "name": "DefaultRow", + "color": None, "profile": "default", "created_at": 1.0}, + {"project_id": "p_foo_0001", "name": "FooRow", + "color": None, "profile": "foo", "created_at": 2.0}, + ] + PROJECTS_FILE.write_text(json.dumps(seeded), encoding="utf-8") + + # Apply the override BEFORE the first list call. This is what + # mcp_server.py:62-64 does after argparse. + self.profiles._active_profile = 'foo' + + projects = await _call(self.mod, "list_projects") + names = [p["name"] for p in projects] + assert "FooRow" in names + assert "DefaultRow" not in names + + +# ═══════════════════════════════════════════════════════════════════════════ +# HTTP wire-format coverage for rename_session / move_session +# ═══════════════════════════════════════════════════════════════════════════ +# +# Maintainer ask: exercise the actual HTTP path so a typo in WEBUI_URL or in +# the request body shape can't slip through validation-only tests. We stand +# up a tiny http.server stub on a free localhost port, point WEBUI_URL at it, +# and capture (path, body) from the requests our handlers issue. This is +# the thing that would have caught the original 8788 vs 8787 mismatch. + +import http.server +import socket +import threading + + +class _RecordingHandler(http.server.BaseHTTPRequestHandler): + """Captures POST path + body, returns canned JSON. Class-level state is + set by the fixture before each test so handlers can cross-reference.""" + captured = None # populated per-test as a list of (path, body, headers) + canned_response = None # populated per-test: dict to be JSON-encoded + + def log_message(self, *args, **kwargs): # noqa: D401 — silence stderr + pass + + def do_POST(self): + length = int(self.headers.get("Content-Length", "0")) + raw = self.rfile.read(length) if length else b"" + try: + body = json.loads(raw.decode("utf-8")) if raw else {} + except Exception: + body = {"_raw": raw.decode("utf-8", errors="replace")} + type(self).captured.append({ + "path": self.path, + "body": body, + "cookie": self.headers.get("Cookie"), + "content_type": self.headers.get("Content-Type"), + }) + payload = json.dumps(type(self).canned_response or {}).encode("utf-8") + self.send_response(200) + self.send_header("Content-Type", "application/json") + self.send_header("Content-Length", str(len(payload))) + self.end_headers() + self.wfile.write(payload) + + +def _free_port() -> int: + s = socket.socket(socket.AF_INET, socket.SOCK_STREAM) + s.bind(("127.0.0.1", 0)) + port = s.getsockname()[1] + s.close() + return port + + +class TestApiWireFormat: + @pytest.fixture(autouse=True) + def setup(self): + self.state_dir = _fresh_state_dir() + # Stand up a recording HTTP server on a free port. We override + # WEBUI_URL on the imported mcp_server module to point at it. + self.port = _free_port() + _RecordingHandler.captured = [] + _RecordingHandler.canned_response = {} + self.httpd = http.server.HTTPServer(("127.0.0.1", self.port), + _RecordingHandler) + self.thread = threading.Thread(target=self.httpd.serve_forever, + daemon=True) + self.thread.start() + + # Disable auth so _api_post() does not attempt a real /api/auth/login. + os.environ.pop("HERMES_WEBUI_PASSWORD", None) + + self.mod, self.profiles = _reimport_mcp() + # Override AFTER import so the value sticks in the loaded module. + self.mod.WEBUI_URL = f"http://127.0.0.1:{self.port}" + yield + self.httpd.shutdown() + self.httpd.server_close() + self.thread.join(timeout=2) + _cleanup_state_dir(self.state_dir) + + async def test_rename_session_posts_to_canonical_path(self): + """rename_session must POST {session_id, title} to /api/session/rename.""" + _RecordingHandler.canned_response = { + "session": {"session_id": "abc123", "title": "Renamed"} + } + result = await _call(self.mod, "rename_session", + session_id="abc123", title="Renamed") + assert len(_RecordingHandler.captured) == 1 + req = _RecordingHandler.captured[0] + assert req["path"] == "/api/session/rename" + assert req["body"] == {"session_id": "abc123", "title": "Renamed"} + assert req["content_type"] == "application/json" + # Handler returns success-shaped result on 200. + assert result["ok"] is True + assert result["session_id"] == "abc123" + assert result["title"] == "Renamed" + assert result["method"] == "api" + + async def test_move_session_posts_to_canonical_path(self): + """move_session (with a project_id) POSTs to /api/session/move + after confirming the project exists locally.""" + # Need a real project so the pre-flight profile check passes. + created = await _call(self.mod, "create_project", name="MoveTarget") + pid = created["project_id"] + _RecordingHandler.canned_response = { + "ok": True, + "session": {"session_id": "s1", "title": "T", "project_id": pid} + } + result = await _call(self.mod, "move_session", + session_id="s1", project_id=pid) + assert len(_RecordingHandler.captured) == 1 + req = _RecordingHandler.captured[0] + assert req["path"] == "/api/session/move" + assert req["body"] == {"session_id": "s1", "project_id": pid} + assert result["ok"] is True + assert result["session_id"] == "s1" + assert result["project_id"] == pid + assert result["method"] == "api" + + async def test_move_session_unassign_sends_null_project_id(self): + """Passing project_id=None must serialize as JSON null (not omitted).""" + _RecordingHandler.canned_response = { + "ok": True, "session": {"session_id": "s1", "project_id": None} + } + result = await _call(self.mod, "move_session", + session_id="s1", project_id=None) + assert len(_RecordingHandler.captured) == 1 + req = _RecordingHandler.captured[0] + assert req["path"] == "/api/session/move" + assert req["body"] == {"session_id": "s1", "project_id": None} + assert result["ok"] is True + + async def test_url_built_from_env_vars(self): + """HERMES_WEBUI_HOST / HERMES_WEBUI_PORT govern WEBUI_URL. + + Locks the maintainer-suggested env-var contract from #1895 review: + the MCP must track the same env vars api/config.py:32-33 reads, so + a non-default WebUI port (e.g. 8788 when 8787 is held by another + service on the host) does not require a code edit.""" + os.environ["HERMES_WEBUI_HOST"] = "10.0.0.42" + os.environ["HERMES_WEBUI_PORT"] = "9999" + try: + mod, _ = _reimport_mcp() + assert mod.WEBUI_HOST == "10.0.0.42" + assert mod.WEBUI_PORT == "9999" + assert mod.WEBUI_URL == "http://10.0.0.42:9999" + finally: + os.environ.pop("HERMES_WEBUI_HOST", None) + os.environ.pop("HERMES_WEBUI_PORT", None) + + async def test_url_default_when_env_unset(self): + """Default upstream port is 8787, matching api/config.py:33.""" + os.environ.pop("HERMES_WEBUI_HOST", None) + os.environ.pop("HERMES_WEBUI_PORT", None) + mod, _ = _reimport_mcp() + assert mod.WEBUI_HOST == "127.0.0.1" + assert mod.WEBUI_PORT == "8787" + assert mod.WEBUI_URL == "http://127.0.0.1:8787"