mirror of
https://github.com/nesquena/hermes-webui.git
synced 2026-05-24 18:50:15 +00:00
ad8e10304c
* fix: remove orphaned i18n keys from top-level LOCALES object

Three Traditional Chinese translation keys (cmd_status, memory_saved, profile_delete_title) were placed outside any locale block between the en and ru blocks in static/i18n.js. They became top-level properties of the LOCALES object, causing them to appear as invalid language options in the Settings > Preferences dropdown. The correct translations already exist in the zh-Hant locale block.

Fixes #1008

* fix: block stale SSE events from polluting new session's DOM

- appendThinking(): guard with !S.session||!S.activeStreamId to drop events from a previous session's SSE stream during a session switch
- appendLiveToolCard(): same guard for consistency
- finalizeThinkingCard(): scroll thinking-card-body to top when scroll is pinned, so completed response is immediately visible
- appendThinking(): auto-scroll thinking card body to bottom while streaming if user is watching (scroll pinned)

* Fix empty agent sessions in sidebar

* fix: resolve cron UI UX issues (icon ambiguity, toast overlap, running status)

Fixes #995 (three sub-issues in the Cron Jobs UI):

1. Dual play icons ambiguous: Resume button now shows a distinct play+bar icon (play triangle + vertical line) instead of the identical triangle used by Run now.
2. Toast notification overlapping header buttons: Added position:relative; z-index:10 to .main-view-header so it stacks above the fixed toast (z-index:100 within its layer).
3. No running status after trigger: After triggering a job, the status badge immediately shows 'running…' with a CSS spinner animation, and polls the cron list every 3s (up to 30s) to refresh when the job completes.

- Added cron_status_running i18n key in all 6 locales (en, es, de, ru, zh, zh-Hant)
- Added .detail-badge.running CSS class with spinner animation
- New functions: _setCronDetailStatus(), _startCronRunningPoll()

* fix(#1011): address review feedback (poll cleanup, badge persistence, 30s fallback)

- _clearCronDetail() now clears _cronRunningPoll interval on navigation
- Poll re-applies 'running' badge after loadCrons() re-render (prevents flicker)
- When poll ends (30s max), detail re-renders with actual status as fallback

* feat: create folder and add space directly from UI (#782)

- After creating a folder via the file tree New folder button, offer to add it as a space via confirm dialog
- Add "Create folder if it doesn't exist" checkbox in the New Space form
- Backend: support create flag in /api/workspaces/add to mkdir before validation
- i18n: 4 new keys (folder_add_as_space_title/msg/btn, workspace_auto_create_folder) in all 6 locales

* fix: validate workspace path before mkdir to prevent orphan directories

Review feedback (critical): the previous code called mkdir() before validate_workspace_to_add(), which meant a rejected path (e.g. system dir) would leave an orphan directory on disk.

New flow:

1. Resolve path and check against blocked system roots BEFORE any mutation
2. mkdir() only if path passes the blocklist check
3. Full validation (exists, is_dir) after mkdir

Also imports _workspace_blocked_roots for the pre-mutation blocklist check.

* fix(#1014): classify model-not-found errors with helpful message

- Add model_not_found error type to streaming.py exception classifier
- Detect 404, 'not found', 'does not exist', 'invalid model' patterns
- Strip HTML tags from provider error messages (nginx 404 pages, etc.)
- Add model_not_found branch to apperror handler in messages.js
- Add i18n key model_not_found_label in all 6 locales
- 15 tests covering detection, sanitization, frontend, and i18n

* feat(ui): add live TPS stat to header

Adds a TPS (Tokens Per Second) chip to the right of the header title bar that updates live while AI output is streaming.

Metering (api/metering.py)

- Tracks per-session output + reasoning tokens via GlobalMeter singleton
- Per-session TPS = total_tokens / elapsed_time
- Global TPS = average of active sessions' TPS values
- HIGH/LOW are max/min of global_tps snapshots over a 60-minute rolling window (only recorded when > 0, so idle periods are excluded)
- Thread-safe with a single lock

Metering events emitted from streaming.py

- Throttled at 100ms from token/reasoning/tool callbacks so the display updates rapidly during fast token streams
- 1Hz ticker as fallback for slow streams (exits when no active sessions)
- Final stats emitted on stream end

Routes (api/routes.py)

- Removed POST /api/metering/interval endpoint (dynamic interval via focus/blur was replaced with simple always-1s-when-active approach)

UI (static/messages.js, index.html, style.css)

- TPS chip in titlebar: shows 'N.N t/s . N.N high . N.N low'
- Default: '0.0 t/s . 0.0 high' when idle
- Display updates on every metering SSE event (throttled to 100ms)

* feat: session restore speed + title gen reasoning hardening (#1025, #1026)

PR #1025 (@franksong2702): Speed up large session restore paths

- GET /api/session?messages=0 now parses only metadata before the messages array
- Metadata-only loads no longer populate the full-session LRU cache
- Frontend lazy fetch uses resolve_model=0 to avoid cold model-catalog lookup
- Hard reload no longer waits for populateModelDropdown() before restoring session

PR #1026 (@franksong2702): Harden auto title generation for reasoning models

- Raises title-gen completion budget to 512 tokens (reasoning-safe)
- Retries once with 1024 tokens on empty content / finish_reason:length
- Applies retry to both auxiliary and active-agent fallback routes
- Preserves underlying failure reason in title_status on local fallback

Co-authored-by: Frank Song <franksong2702@gmail.com>

* feat: session attention indicators in right slot + last_message_at timestamps (#1024)

PR #1024 (@franksong2702): Polish session attention indicators

- Streaming spinners and unread dots now reuse the right-side actions slot
- Running/unread rows hide timestamps; idle/read rows keep right-aligned timestamps
- Date group carets point down when expanded, right when collapsed
- Pinned group no longer repeats pinned-star icon per row
- Running indicators appear immediately after send (local busy state while /api/sessions catches up)
- Sidebar sorting/grouping/timestamps now prefer last_message_at (derived from last real message) so metadata-only saves don't make old sessions appear under Today

Co-authored-by: Frank Song <franksong2702@gmail.com>

* docs: v0.50.207 release notes: 10 PRs, 2169 tests (+36)

---------

Co-authored-by: bergeouss <bergeouss@users.noreply.github.com>
Co-authored-by: Josh <josh@fyul.link>
Co-authored-by: Frank Song <franksong2702@gmail.com>
Co-authored-by: nesquena-hermes <nesquena-hermes@users.noreply.github.com>
188 lines
7.1 KiB
Python
"""
Hermes Web UI -- Streaming performance metering.

Tracks Tokens Per Second (TPS) across all active WebUI sessions, and the
HIGH/LOW TPS values observed over the past 60 minutes. Metering data is
emitted via SSE events so the header label can update live during a stream.

Architecture
────────────
Each streaming session is tracked independently. TPS per session is:

    session_tps = total_tokens / (last_token_ts - first_token_ts)

where total_tokens is output tokens plus reasoning tokens. A session whose
first and last token share a timestamp reports 0.0.

The global tps is the unweighted average of all currently active sessions' TPS
values. Because it is a mean rather than a sum, it does not grow with the
number of concurrent sessions, and a long-running session carries no more
weight than one that has just started.

For HIGH/LOW tracking, every stats snapshot records the current global tps
(only when > 0, so idle periods are skipped) into a rolling 60-minute history.
HIGH and LOW are the max and min of that history, ignoring readings below
1 tps.

The ticker in streaming.py calls get_interval(). It returns 1.0 when at least
one session has received a token within the last 60 seconds, so the header
updates at 1 Hz, and 10.0 when idle, so the ticker exits and no idle readings
are emitted.

Usage from api/streaming.py
─────────────────────────────
    from api.metering import meter

    meter().begin_session(stream_id)                            # stream starts
    meter().record_token(stream_id, running_output)             # per output token (running total)
    meter().record_reasoning(stream_id, running_reasoning_len)  # per reasoning token (running total)
    meter().end_session(stream_id, final_output_tokens)         # stream ends
    meter().get_stats()                                         # payload for the SSE event

The SSE `metering` event payload:
    {
        "tps": 47.3,     # average TPS across active sessions (real-time)
        "high": 52.1,    # highest average TPS observed in the past 60 minutes
        "low": 31.4,     # lowest average TPS (excl. readings < 1 tps, to ignore idle)
        "active": 1,     # sessions currently streaming
    }
"""

from __future__ import annotations

import threading
import time
from dataclasses import dataclass

_HOUR_SECS = 3600.0   # rolling window for HIGH/LOW tracking
_STALE_SECS = 60.0    # consider a session inactive after this


@dataclass
class _SessionMeter:
    output_tokens: int = 0
    reasoning_tokens: int = 0
    first_token_ts: float = 0.0   # time.monotonic() of first token received
    last_token_ts: float = 0.0    # time.monotonic() of last token received

    def total_tokens(self) -> int:
        return self.output_tokens + self.reasoning_tokens

    def tps(self) -> float:
        if self.first_token_ts == 0.0 or self.last_token_ts <= self.first_token_ts:
            return 0.0
        return self.total_tokens() / (self.last_token_ts - self.first_token_ts)


class GlobalMeter:
    """Thread-safe global streaming meter.

    Tracks per-session TPS, averages them for a global tps, and maintains a
    60-minute rolling history of global tps snapshots for HIGH/LOW reporting.
    """

    __slots__ = (
        '_lock',
        '_sessions',      # stream_id -> _SessionMeter
        '_readings',      # [(monotonic_ts, tps), ...] rolling 60-minute history
        '_window_start',  # monotonic ts of current window
    )

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._sessions: dict[str, _SessionMeter] = {}
        self._readings: list[tuple[float, float]] = []
        self._window_start: float = time.monotonic()

    # ── Public API ────────────────────────────────────────────────────────────

    def begin_session(self, stream_id: str) -> None:
        with self._lock:
            self._sessions[stream_id] = _SessionMeter()

    def get_interval(self) -> float:
        """Return 1.0 when sessions are actively receiving tokens, 10.0 when idle.

        Used by the streaming ticker to run at 1 Hz during work and exit when
        there is nothing to measure.
        """
        now = time.monotonic()
        with self._lock:
            # Only count sessions that have received at least one token recently.
            has_active = any(
                s.first_token_ts > 0 and (now - s.last_token_ts) <= _STALE_SECS
                for s in self._sessions.values()
            )
        return 1.0 if has_active else 10.0

    def record_token(self, stream_id: str, running_output_tokens: int) -> None:
        """Store the running output-token total; unknown stream ids are ignored."""
        now = time.monotonic()
        with self._lock:
            s = self._sessions.get(stream_id)
            if s is None:
                return
            if s.first_token_ts == 0.0:
                s.first_token_ts = now
            s.last_token_ts = now
            s.output_tokens = running_output_tokens

    def record_reasoning(self, stream_id: str, running_reasoning_tokens: int) -> None:
        """Store the running reasoning-token total; unknown stream ids are ignored."""
        now = time.monotonic()
        with self._lock:
            s = self._sessions.get(stream_id)
            if s is None:
                return
            if s.first_token_ts == 0.0:
                s.first_token_ts = now
            s.last_token_ts = now
            s.reasoning_tokens = running_reasoning_tokens

    def end_session(self, stream_id: str, final_output_tokens: int, input_tokens: int = 0) -> None:
        """Stop tracking a stream.

        final_output_tokens and input_tokens are accepted but not used here.
        """
        with self._lock:
            self._sessions.pop(stream_id, None)

    def get_stats(self) -> dict:
        """Return the current tps/high/low/active snapshot and update the history."""
        now = time.monotonic()
        with self._lock:
            # Prune stale sessions. Sessions that never received a token are
            # not pruned here; they are removed only by end_session().
            stale = [
                sid for sid, s in self._sessions.items()
                if s.first_token_ts > 0 and (now - s.last_token_ts) > _STALE_SECS
            ]
            for sid in stale:
                self._sessions.pop(sid, None)

            # Reset window if everything went stale
            if not self._sessions:
                self._window_start = now

            # Compute global tps: average of per-session TPS values
            active = [s for s in self._sessions.values() if s.first_token_ts > 0]
            if active:
                global_tps = sum(s.tps() for s in active) / len(active)
            else:
                global_tps = 0.0

            # Prune readings older than 1 hour
            cutoff = now - _HOUR_SECS
            self._readings = [(ts, v) for ts, v in self._readings if ts > cutoff]

            # Only record this snapshot for HIGH/LOW if there is active work.
            # This prevents idle periods from flooding the history and keeps
            # HIGH/LOW meaningful for the past hour of actual throughput.
            if global_tps > 0:
                self._readings.append((now, global_tps))

            # HIGH/LOW from the past hour (skip near-zero idle readings)
            active_readings = [v for _, v in self._readings if v >= 1.0]
            high = max(active_readings) if active_readings else 0.0
            low = min(active_readings) if active_readings else 0.0

            return {
                'tps': round(global_tps, 1),
                'high': round(high, 1),
                'low': round(low, 1),
                # Counts every tracked session, including ones that have
                # begun but not yet received a token.
                'active': len(self._sessions),
            }


# ── Module-level singleton ─────────────────────────────────────────────────────

_meter = GlobalMeter()


def meter() -> GlobalMeter:
    return _meter
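
The arithmetic described in the module docstring can be checked by hand with fixed timestamps. The sketch below is not part of `api/metering.py`: `SessionSample`, `global_tps` and `high_low` are hypothetical stand-ins written for illustration, and the timestamps are made-up numbers rather than real `time.monotonic()` readings. It assumes the same rules as the module: per-session TPS is total tokens over elapsed time, the global figure is the unweighted mean, and HIGH/LOW ignore readings older than one hour or below 1 tps.

```python
from dataclasses import dataclass


@dataclass
class SessionSample:
    """Hypothetical stand-in for one session's counters."""
    output_tokens: int
    reasoning_tokens: int
    first_token_ts: float
    last_token_ts: float

    def tps(self) -> float:
        elapsed = self.last_token_ts - self.first_token_ts
        # No token yet, or all tokens at one timestamp: report 0.0.
        if self.first_token_ts == 0.0 or elapsed <= 0.0:
            return 0.0
        return (self.output_tokens + self.reasoning_tokens) / elapsed


def global_tps(sessions: list[SessionSample]) -> float:
    """Unweighted mean of per-session TPS over sessions that have a token."""
    active = [s for s in sessions if s.first_token_ts > 0]
    if not active:
        return 0.0
    return sum(s.tps() for s in active) / len(active)


def high_low(readings: list[tuple[float, float]], now: float,
             window: float = 3600.0, floor: float = 1.0) -> tuple[float, float]:
    """Max and min of readings inside the window and at or above the floor."""
    recent = [v for ts, v in readings if ts > now - window and v >= floor]
    if not recent:
        return 0.0, 0.0
    return max(recent), min(recent)


fast = SessionSample(100, 20, first_token_ts=10.0, last_token_ts=12.0)  # 120 tokens / 2 s
slow = SessionSample(40, 0, first_token_ts=11.0, last_token_ts=13.0)    # 40 tokens / 2 s
avg = global_tps([fast, slow])

# (timestamp, global tps) snapshots; the first is older than one hour at
# now=4000.0 and the second is below the 1 tps floor, so both are ignored.
readings = [(100.0, 52.0), (2000.0, 0.5), (3000.0, 31.0), (3900.0, 40.0)]
high, low = high_low(readings, now=4000.0)

print(fast.tps(), slow.tps(), avg, high, low)
```

With these inputs the two sessions run at 60 and 20 tokens per second, so the mean is 40 even though the combined throughput is 80; that is the practical consequence of averaging rather than summing.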