hermes-agent/tests/run_agent/test_codex_app_server_integration.py
Teknium1 93a0fe6495 feat(codex-runtime): wire codex_app_server runtime into AIAgent
The integration commit. AIAgent.run_conversation() now early-returns to a
new helper _run_codex_app_server_turn() when self.api_mode ==
'codex_app_server', bypassing the chat_completions tool loop entirely.

Three small surgical edits to run_agent.py (~105 LOC total):

1. Line ~1204 (constructor api_mode validation set):
   Add 'codex_app_server' so an explicit api_mode='codex_app_server'
   passed to AIAgent() isn't silently rewritten to 'chat_completions'.

2. Line ~12048 (run_conversation, just before the while loop):
   Early-return to _run_codex_app_server_turn() when self.api_mode is
   'codex_app_server'. Placed AFTER all standard pre-loop setup —
   logging context, session DB, surrogate sanitization, _user_turn_count
   and _turns_since_memory increments, _ext_prefetch_cache, memory
   manager on_turn_start — so behavior outside the model-call loop is
   identical between paths. Default Hermes flow is unchanged when the
   flag is off.

3. End-of-class (line ~15497):
   New method _run_codex_app_server_turn(). Lazy-instantiates one
   CodexAppServerSession per AIAgent (reused across turns), runs the
   turn, splices projected_messages into messages, increments
   _iters_since_skill by tool_iterations (since the chat_completions
   loop normally does that per iteration), fires
   _spawn_background_review on the same cadence as the default path.
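
A minimal sketch of the dispatch described in the three edits above, not the actual run_agent.py code. SketchAgent, StubSession, StubTurn and VALID_API_MODES are hypothetical names used only for illustration; the real pre-loop setup and the default tool loop are omitted.

```python
from dataclasses import dataclass


@dataclass
class StubTurn:
    """Hypothetical stand-in for the result of one codex turn."""
    final_text: str
    projected_messages: list


class StubSession:
    """Hypothetical stand-in for CodexAppServerSession."""

    def run_turn(self, user_input: str) -> StubTurn:
        reply = {"role": "assistant", "content": f"echo: {user_input}"}
        return StubTurn(final_text=reply["content"], projected_messages=[reply])


class SketchAgent:
    # Edit 1: the mode must be in the validation set, otherwise it is
    # rewritten to the default.
    VALID_API_MODES = {"chat_completions", "codex_app_server"}

    def __init__(self, api_mode: str = "chat_completions"):
        if api_mode not in self.VALID_API_MODES:
            api_mode = "chat_completions"
        self.api_mode = api_mode
        self._session = None

    def run_conversation(self, user_message: str) -> dict:
        # Shared pre-loop setup (reduced here to appending the user message).
        messages = [{"role": "user", "content": user_message}]
        # Edit 2: early return placed after the shared setup.
        if self.api_mode == "codex_app_server":
            return self._run_codex_turn(user_message, messages)
        raise NotImplementedError("default tool loop omitted from this sketch")

    def _run_codex_turn(self, user_message: str, messages: list) -> dict:
        # Edit 3: lazily create one session per agent and reuse it.
        if self._session is None:
            self._session = StubSession()
        turn = self._session.run_turn(user_message)
        messages.extend(turn.projected_messages)  # splice, do not re-append user
        return {
            "final_response": turn.final_text,
            "messages": messages,
            "completed": True,
            "partial": False,
        }
```

Calling run_conversation() twice on the same SketchAgent reuses the one session object, which is the property the "reused across turns" note refers to.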

Counter accounting:

  _turns_since_memory  ← already incremented at run_conversation:11817
                         (gated on memory store configured) — codex
                         helper does NOT touch it (would double-count).
  _user_turn_count     ← already incremented at run_conversation:11793
                         — codex helper does NOT touch it.
  _iters_since_skill   ← incremented in the chat_completions loop per
                         tool iteration. Codex helper increments by
                         turn.tool_iterations since the loop is bypassed.
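
The accounting above can be illustrated with a small worked example. This is a sketch under assumptions: Counters, pre_loop and codex_helper are hypothetical names, and the memory gate is reduced to a boolean.

```python
from dataclasses import dataclass


@dataclass
class Counters:
    """Hypothetical holder for the three counters named above."""
    user_turn_count: int = 0
    turns_since_memory: int = 0
    iters_since_skill: int = 0


def pre_loop(c: Counters, memory_configured: bool) -> None:
    # Shared pre-loop setup: runs on both paths, before the early return.
    c.user_turn_count += 1
    if memory_configured:
        c.turns_since_memory += 1


def codex_helper(c: Counters, tool_iterations: int) -> None:
    # Only the skill counter is touched here. The other two were already
    # bumped in pre_loop, so incrementing them again would double-count.
    c.iters_since_skill += tool_iterations


counters = Counters()
for tool_iterations in (3, 2):  # two turns with 3 and 2 tool iterations
    pre_loop(counters, memory_configured=True)
    codex_helper(counters, tool_iterations)
```

After the two turns the turn counters each read 2 and the skill counter reads 5 (3 + 2), matching what the per-iteration increments of the default loop would have produced.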

User message:

  ALREADY appended to messages by run_conversation pre-loop (line 11823)
  before the early-return reaches us. Helper does NOT append again.
  Regression test test_user_message_not_duplicated guards this.

Approval callback wiring:

  Lazy-fetches tools.terminal_tool._get_approval_callback at session
  spawn time, passes to CodexAppServerSession. CLI threads with
  prompt_toolkit get interactive approvals; gateway/cron contexts get
  the codex-side fail-closed deny.
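
A sketch of what "fail-closed" means here, assuming a callback that takes a command string and returns a bool. The real signature behind tools.terminal_tool._get_approval_callback is not shown in this commit, and resolve_approval is a hypothetical helper.

```python
from typing import Callable, Optional

# Assumed callback shape: command text in, approve/deny out.
ApprovalCallback = Callable[[str], bool]


def resolve_approval(callback: Optional[ApprovalCallback], command: str) -> bool:
    """Return True only when a callback exists and explicitly approves."""
    if callback is None:
        # No interactive surface (e.g. gateway or cron): deny by default.
        return False
    try:
        return callback(command) is True
    except Exception:
        # A broken approval prompt must never turn into an approval.
        return False
```

The design choice is that every path other than an explicit True results in a deny, so a missing or failing prompt cannot approve a command by accident.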

Error path:

  Codex session exceptions become a 'partial' result with completed=False
  and a final_response that explicitly tells the user how to switch back:
  'Codex app-server turn failed: ... Fall back to default runtime with
  /codex-runtime auto.' Same return-dict shape as the chat_completions
  path so all callers (gateway, CLI, batch_runner, ACP) work unchanged.
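
The error path can be sketched as follows. run_turn_safely is a hypothetical wrapper, and interpolating the exception text where the message above shows "..." is an assumption; only the keys asserted by the tests are taken from the source.

```python
from typing import Callable


def run_turn_safely(run_turn: Callable[[str], str], user_input: str,
                    messages: list) -> dict:
    """Run one turn and always return the same dict shape."""
    try:
        final_text = run_turn(user_input)
    except Exception as exc:
        return {
            # Assumed wording: exception text placed where the source elides.
            "final_response": (
                f"Codex app-server turn failed: {exc}. "
                "Fall back to default runtime with /codex-runtime auto."
            ),
            "messages": messages,
            "completed": False,
            "partial": True,
            "error": str(exc),
        }
    return {
        "final_response": final_text,
        "messages": messages,
        "completed": True,
        "partial": False,
        "error": None,
    }
```

Because both branches return the same keys, a caller can branch on completed and partial without knowing which runtime produced the result.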

9 new integration tests in tests/run_agent/test_codex_app_server_integration.py:
  - api_mode='codex_app_server' is accepted on AIAgent construction
  - run_conversation returns the expected codex shape
    (final_response, codex_thread_id, codex_turn_id, completed, partial)
  - Projected messages are spliced into messages list
  - _iters_since_skill ticks per tool iteration
  - _user_turn_count delegated to standard flow (not double-counted)
  - User message appears exactly once (regression guard)
  - _spawn_background_review IS invoked (memory/skill review keeps working)
  - chat.completions.create is NEVER called (loop fully bypassed)
  - Session exception → partial result with /codex-runtime auto hint
  - Interrupted turn → partial result with error preserved

Adjacent test runs confirm no regressions:
  - tests/run_agent/test_memory_nudge_counter_hydration.py: green
  - tests/run_agent/test_background_review.py: green
  - tests/run_agent/test_fallback_model.py: green
  - tests/agent/transports/: 249/249 green

Still missing for full feature: /codex-runtime slash command, plugin
migration helper, docs page, live e2e test gated on codex binary. Those
are the remaining followup commits.
2026-05-12 10:26:26 -07:00

"""Integration test for the codex_app_server runtime path through AIAgent.

Verifies that:
- api_mode='codex_app_server' is accepted on AIAgent construction
- run_conversation() takes the early-return path and never enters the
  chat completions loop
- Projected messages from a fake Codex session land in the messages list
- tool_iterations from the codex session tick the skill nudge counter
- Memory nudge counter ticks once per turn
- The returned dict has the same shape as the chat_completions path
"""

from __future__ import annotations

from unittest.mock import patch

import pytest

import run_agent
from agent.transports.codex_app_server_session import CodexAppServerSession, TurnResult


@pytest.fixture
def fake_session(monkeypatch):
    """Replace CodexAppServerSession with a stub that returns a fixed
    TurnResult, so we can drive AIAgent without spawning real codex."""

    def fake_run_turn(self, user_input: str, **kwargs):
        return TurnResult(
            final_text=f"echo: {user_input}",
            projected_messages=[
                {"role": "assistant", "content": None,
                 "tool_calls": [{"id": "exec_1", "type": "function",
                                 "function": {"name": "exec_command",
                                              "arguments": "{}"}}]},
                {"role": "tool", "tool_call_id": "exec_1", "content": "ok"},
                {"role": "assistant", "content": f"echo: {user_input}"},
            ],
            tool_iterations=1,
            interrupted=False,
            error=None,
            turn_id="turn-stub-1",
            thread_id="thread-stub-1",
        )

    monkeypatch.setattr(CodexAppServerSession, "run_turn", fake_run_turn)
    monkeypatch.setattr(
        CodexAppServerSession, "ensure_started", lambda self: "thread-stub-1"
    )


def _make_codex_agent():
    """Construct an AIAgent in codex_app_server mode without contacting any
    real provider. We pass api_mode explicitly so the constructor takes the
    fast path for direct credentials."""
    return run_agent.AIAgent(
        api_key="stub",
        base_url="https://stub.invalid",
        provider="openai",
        api_mode="codex_app_server",
        quiet_mode=True,
        skip_context_files=True,
        skip_memory=True,
    )


class TestApiModeAccepted:
    def test_api_mode_is_codex_app_server(self):
        agent = _make_codex_agent()
        assert agent.api_mode == "codex_app_server"


class TestRunConversationCodexPath:
    def test_run_conversation_returns_codex_shape(self, fake_session):
        agent = _make_codex_agent()
        # No background review fork during tests
        with patch.object(agent, "_spawn_background_review", return_value=None):
            result = agent.run_conversation("hello there")

        assert result["final_response"] == "echo: hello there"
        assert result["completed"] is True
        assert result["partial"] is False
        assert result["error"] is None
        assert result["api_calls"] == 1
        assert result["codex_thread_id"] == "thread-stub-1"
        assert result["codex_turn_id"] == "turn-stub-1"

    def test_projected_messages_are_spliced(self, fake_session):
        agent = _make_codex_agent()
        with patch.object(agent, "_spawn_background_review", return_value=None):
            result = agent.run_conversation("hello")

        msgs = result["messages"]
        # User message + 3 projected (assistant tool_call + tool + assistant text)
        assert len(msgs) >= 4
        assert msgs[0]["role"] == "user"
        assert msgs[0]["content"] == "hello"
        # Last assistant message has the final text
        final = [m for m in msgs if m.get("role") == "assistant"
                 and m.get("content") == "echo: hello"]
        assert final, f"expected final assistant message in {msgs}"

    def test_nudge_counters_tick(self, fake_session):
        """The skill nudge counter must accumulate tool_iterations across
        turns. The memory nudge counter is gated on memory being configured
        (which we skip via skip_memory=True), so we don't assert on it here —
        a separate test below covers that path explicitly."""
        agent = _make_codex_agent()
        agent._iters_since_skill = 0
        agent._user_turn_count = 0
        with patch.object(agent, "_spawn_background_review", return_value=None):
            agent.run_conversation("first")
        assert agent._iters_since_skill == 1  # one tool_iteration in fake turn
        # _user_turn_count is incremented by run_conversation pre-loop, not
        # by the codex helper — confirms we delegate that to the standard flow.
        assert agent._user_turn_count == 1

        with patch.object(agent, "_spawn_background_review", return_value=None):
            agent.run_conversation("second")
        assert agent._iters_since_skill == 2
        assert agent._user_turn_count == 2

    def test_user_message_not_duplicated(self, fake_session):
        """Regression guard: the user message must appear exactly once in
        the messages list. The standard run_conversation pre-loop appends
        it, and the codex helper must NOT append again."""
        agent = _make_codex_agent()
        with patch.object(agent, "_spawn_background_review", return_value=None):
            result = agent.run_conversation("ping unique 12345")

        user_count = sum(
            1 for m in result["messages"]
            if m.get("role") == "user" and m.get("content") == "ping unique 12345"
        )
        assert user_count == 1, f"user message appeared {user_count}× in {result['messages']}"

    def test_background_review_invoked(self, fake_session):
        agent = _make_codex_agent()
        with patch.object(agent, "_spawn_background_review",
                          return_value=None) as spawn:
            agent.run_conversation("ping")
        assert spawn.called

    def test_chat_completions_loop_is_not_entered(self, fake_session):
        """The early-return must bypass the regular API call loop entirely.
        We confirm by patching the SDK call and asserting it's never invoked."""
        agent = _make_codex_agent()
        # The chat_completions loop calls self.client.chat.completions.create(...)
        # If our early-return works, that path is dead.
        with patch.object(agent, "client") as client_mock, patch.object(
            agent, "_spawn_background_review", return_value=None
        ):
            agent.run_conversation("hi")
        assert not client_mock.chat.completions.create.called


class TestErrorHandling:
    def test_session_exception_returns_partial_with_error(self, monkeypatch):
        def boom_run_turn(self, user_input, **kwargs):
            raise RuntimeError("subprocess died")

        monkeypatch.setattr(CodexAppServerSession, "ensure_started",
                            lambda self: "t1")
        monkeypatch.setattr(CodexAppServerSession, "run_turn", boom_run_turn)

        agent = _make_codex_agent()
        with patch.object(agent, "_spawn_background_review", return_value=None):
            result = agent.run_conversation("hi")

        assert result["completed"] is False
        assert result["partial"] is True
        assert "subprocess died" in result["error"]
        assert "codex-runtime auto" in result["final_response"]

    def test_interrupted_turn_marked_partial(self, monkeypatch):
        def interrupted_turn(self, user_input, **kwargs):
            return TurnResult(
                final_text="",
                projected_messages=[],
                tool_iterations=0,
                interrupted=True,
                error="user interrupted",
                turn_id="t",
                thread_id="th",
            )

        monkeypatch.setattr(CodexAppServerSession, "ensure_started",
                            lambda self: "th")
        monkeypatch.setattr(CodexAppServerSession, "run_turn", interrupted_turn)

        agent = _make_codex_agent()
        with patch.object(agent, "_spawn_background_review", return_value=None):
            result = agent.run_conversation("hi")

        assert result["completed"] is False
        assert result["partial"] is True
        assert result["error"] == "user interrupted"