Commit Graph

6 Commits

Author SHA1 Message Date
yoniebans cebd480818 refactor(session-log): drop branch/compress re-point of session_log_file
The attribute no longer exists; nothing to re-point.
2026-05-20 11:44:10 -07:00
Teknium 6cb9917c73 perf(compression): defer feasibility check to first compression attempt (#28957)
`AIAgent.__init__` was eagerly calling
`_check_compression_model_feasibility()` which probes the auxiliary
provider chain and runs `get_model_context_length()` (potentially
network-bound) to decide whether the configured auxiliary model can
fit a full compression-threshold window. That cost ~440ms cold on
every agent construction.

Most `chat -q` invocations finish in 1-5 seconds and never accumulate
enough context to trip the compression threshold, so the feasibility
check is pure overhead. The result is also only consumed when
compression actually fires (the function adjusts the live threshold
downward if the aux model can't fit; absent that mutation, the gate
in `conversation_loop.py:442` would never fire anyway).

Defer to first `compress_context()` call via
`agent._compression_feasibility_checked` sentinel. Runs at most once
per agent lifetime, just before the first compression pass. The
warning storage (`_compression_warning`) and gateway replay
machinery is unchanged — it still emits to status_callback on the
first turn that actually needs compression.

E2E timing (chat -q 'hi', 3 runs each):
                BEFORE   AFTER    delta
  median wall   2.03s    1.86s    -8% (-169ms)
  min wall      1.92s    1.63s    -15% (-293ms)

Real cold-start observation (synthetic 31-turn agent loop): identical
behavior since feasibility check fires once on first compression and
caches. No semantic difference for sessions that DO compress.

UX trade-off: users with broken auxiliary-provider config no longer
see the warning at session start. They see it when compression first
fires — which is exactly when it matters. For users with working
config (the vast majority), the warning never fires anyway, so the
deferral is invisible.

Tests:
- tests/run_agent/test_compression_feasibility.py — 16/16 pass
  (the one test that asserted call-at-init was updated to drive the
  lazy check explicitly via agent._check_compression_model_feasibility())
- Live tmux session: 2-turn conversation + tool call completes clean,
  zero errors in agent.log
2026-05-19 17:27:17 -07:00
Teknium 1634397ddb fix(compress): abort instead of dropping messages when summary LLM fails (#28102)
When auxiliary compression's summary generation returns None (aux model
errored, returned non-JSON, timed out, etc.) the compressor previously
still dropped every middle message between compress_start..compress_end
and replaced them with a static 'Summary generation was unavailable'
placeholder. The session kept going but the user silently lost N turns
of context for nothing.

New behavior: on summary failure, compress() aborts entirely — returns
the input messages unchanged and sets _last_compress_aborted=True. The
existing _summary_failure_cooldown_until gate (30-60s) keeps the aux
model from being burned on every turn. Auto-compress callers detect
the no-op (len(after) == len(before)) and stop looping. The chat is
'frozen' at its current size until the next /compress or /new.

Manual /compress (CLI + gateway) now passes force=True which clears
the cooldown so users can retry immediately after an auto-abort. If
the manual retry also fails, the user gets a visible warning telling
them nothing was dropped and how to retry.

- agent/context_compressor.py: compress() gains force= kwarg; failure
  branch sets _last_compress_aborted and returns messages unchanged
  instead of inserting placeholder.
- run_agent.py: _compress_context() detects abort, surfaces warning,
  skips session-rotation entirely, returns messages unchanged.
- cli.py + gateway/run.py: manual /compress paths pass force=True.
- gateway/run.py: hygiene + /compress handlers detect _last_compress_aborted
  and emit the new 'Compression aborted' warning (gateway.compress.aborted)
  instead of the old 'N historical messages were removed' message.
- locales/*.yaml: new gateway.compress.aborted key in all 16 locales.
- tests: updated to assert the abort contract (messages preserved,
  compression_count not incremented, abort flag set, no placeholder
  leaked). New test_force_true_bypasses_failure_cooldown covers the
  manual-retry path.
2026-05-18 10:19:40 -07:00
glennc 9df9816dab feat(azure-foundry): add Microsoft Entra ID auth
Use azure-identity DefaultAzureCredential for keyless Foundry auth.

Preserve refreshable callable credentials through OpenAI and Anthropic client paths.

Add setup, doctor, auth status, docs, and tests for Entra auth.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-18 10:14:38 -07:00
teknium1 f885be030c fix(auxiliary): resolve xai oauth compression from pool — port to conversation_compression
Original commit 97a32afdc by helix4u targeted _check_compression_model_feasibility
in pre-refactor run_agent.py. The function body now lives in
agent/conversation_compression.py — re-applied the configured-but-unavailable
provider message there.

Co-authored-by: helix4u <4317663+helix4u@users.noreply.github.com>
2026-05-16 23:33:59 -07:00
teknium1 5311d9959e refactor(run_agent): extract context compression to agent/conversation_compression.py
Move four compression-related methods to a dedicated module:

* check_compression_model_feasibility — startup probe + auto-lowered threshold + hard floor
* replay_compression_warning — re-emit stored warning through gateway status_callback
* compress_context — run compressor, split SQLite session, notify plugins+memory
* try_shrink_image_parts_in_messages — image-too-large recovery via re-encode

AIAgent keeps thin forwarder methods so existing call sites and tests
that patch run_agent.AIAgent methods keep working.

tests/run_agent/ + tests/agent/: 4313 passed (same pre-existing
test_auxiliary_client failure as before).

run_agent.py: 15013 -> 14535 lines (-478).
2026-05-16 18:09:33 -07:00