Files
hermes-agent/tests
Teknium 2a18b6283b fix(cache): drop ttl=1h on Portal Qwen — Alibaba upstream is 5m-only (#24702)
PR #24151 routed Portal Qwen (qwen3.6-plus) through the prefix_and_2
long-lived cache layout, attaching {"type":"ephemeral","ttl":"1h"}
markers to the tools[-1] entry and the stable system-prefix block.
That layout works for Portal Claude because Anthropic / OpenRouter on
Anthropic routes honour 1h TTL — but Portal Qwen ultimately proxies to
Alibaba DashScope, which documents a single "ephemeral" TTL of 5
minutes on its Context Cache. The ttl="1h" qualifier is silently
dropped upstream, so the two highest-value breakpoints (tools array +
system prefix) never land. Only the rolling-window 5m markers on the
last 2 messages cache, which matches the observed ~25% read rate.

Fix: keep Portal Qwen on cache_control via _anthropic_prompt_cache_policy
returning (True, False), but drop it from _supports_long_lived_anthropic_cache
so it rides the standard system_and_3 5m layout (system + last 3 messages,
all at 5m). Same 4 breakpoints, all in a TTL the upstream actually honours.

Refs: https://www.alibabacloud.com/help/en/model-studio/context-cache
      https://openrouter.ai/docs/features/prompt-caching (Alibaba Qwen
      section: "TTL: 5 minutes")

- _supports_long_lived_anthropic_cache: Portal scope narrowed back to Claude
- tests: flip the two qwen long-lived expectations to False, retitle
  non_claude_non_qwen_rejected -> non_claude_rejected
2026-05-12 18:34:43 -07:00
..